Abstract
Against the backdrop of a global need for increased access to mental health services, health organisations are looking to technological advances to improve the delivery of care and lower costs. Since the public launch of OpenAI’s ChatGPT in November 2022, the field of generative artificial intelligence (AI) has received expanding attention. Although generative AI itself is not new, technical advances and the increased accessibility of large language models (LLMs) (eg, OpenAI’s GPT-4 and Google’s Bard) suggest use of these tools could be clinically significant. LLMs are an application of generative AI technology that can summarise and generate content based on training on vast data sets. Unlike search engines, which provide internet links in response to typed entries, chatbots that rely on generative language models can simulate dialogue that resembles human conversation. We examine the potential promise and the risks of using LLMs in mental healthcare today, focusing on their scope to impact the delivery of care, including global equity. Although we caution that LLMs should not be used to disintermediate mental health clinicians, we signal how, if carefully implemented, these tools could in the long term reap benefits for patients and health professionals.
- Machine Learning
- Depression & mood disorders
- Adult psychiatry
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
ChatGPT and mental healthcare: balancing benefits with risks of harms
The WHO estimates that worldwide one in eight people live with mental illness.1 Stigmatisation and human rights violations, combined with lack of resources, including shortfalls in mental health professionals, pose significant barriers to psychiatric care.2 Consequently, clinician time is one of the scarcest resources in healthcare, and psychiatrists report looking to advances in artificial intelligence (AI) to improve efficiencies and assist with administrative tasks.3
Considering these challenges, recent advances in the field of generative AI and their potential to impact healthcare delivery have received considerable attention. Unlike search engines, which provide internet links in response to typed entries, a new generation of chatbots powered by large language models (LLMs), such as OpenAI’s GPT-4, offers responses that resemble human conversation. LLMs are trained on massive amounts of past data to predict the next word in a sequence. This probabilistic process, combined with other technical advances, gives these models an aptitude for recognising, summarising and generating content.
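To make the underlying principle concrete, the following minimal sketch (our illustration, not drawn from any production system) builds a toy bigram model: it counts which word follows which in a small sample of text and then samples the next word from that probability distribution. Real LLMs use neural networks trained on vast corpora, but the probabilistic next-word logic is analogous, and the sketch also shows why outputs can vary from one run to the next.

```python
# Toy illustration of probabilistic next-word prediction (a drastically
# simplified stand-in for the neural next-token prediction used by LLMs).
import random
from collections import Counter, defaultdict

corpus = (
    "the patient reports low mood . the patient reports poor sleep . "
    "the clinician notes low mood and poor appetite ."
).split()

# Count which word follows which (a bigram model).
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def predict_next(word: str) -> str:
    """Sample the next word in proportion to how often it followed `word`."""
    counts = following[word]
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

# Generate a short continuation: the output differs between runs because the
# process is probabilistic rather than deterministic.
word = "the"
sequence = [word]
for _ in range(6):
    word = predict_next(word)
    sequence.append(word)
print(" ".join(sequence))
```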
On the face of it, these tools offer considerable promise to clinicians. In a June 2023 Medical Economics survey in the USA, more than 1 in 10 clinicians reported already using chatbots such as ChatGPT, and nearly 50% expressed an intent to use these technologies in future for data entry, medical scheduling or research.4 From the patient side, given the widespread global use of internet-enabled devices and the ease of access to LLM-powered chatbots, people with stigmatised mental health conditions may be especially inclined to adopt these tools. In light of these rapid advances, we examine the potential promise and risks of LLMs in mental healthcare and offer suggestions to ensure the enormous scope of these innovations is effectively and ethically harnessed.
The potential benefits of generative AI
While AI continues to evolve, some features appear nearly ready for use today, including decreasing administrative burdens, improving documentation and supporting clinical hypothesis generation. Surveys show that psychiatrists both desire and anticipate assistance from advances in AI with administrative tasks,3 and the rapidity with which LLMs generate narrative summaries of complex data strongly suggests potential to reduce work burdens, including in updating clinical records.
Preliminary studies also suggest that LLMs can assist with writing empathic documentation. For example, a study comparing physicians’ and ChatGPT’s written responses to 195 real-world health questions submitted to Reddit’s AskDocs reported that ChatGPT’s responses were, on average, four times longer.5 In addition, a panel of blinded physicians rated ChatGPT’s responses as ‘good’ or ‘very good’ nearly four times more often than those written by doctors, and rated the chatbot’s responses as empathic almost 10 times more often than doctors’ responses.
Other studies suggest that chatbots powered by LLMs could assist mental health peer supporters or clinicians, including those struggling with compassion fatigue, in offering consistently high levels of support in patient-facing interactions. For example, a randomised controlled trial of responses submitted to TalkLife, a social media platform that offers peer support to people with mental health conditions, found that responses written in collaboration with a chatbot called ‘HAILEY’ (short for ‘Human-AI coLlaboration approach for EmpathY’) were more likely to be rated as empathic than human-only responses.6 Peer supporters who self-identified as struggling to offer empathic support were rated as significantly more likely to provide empathic responses in the AI-in-the-loop condition.
Aside from clinician documentation and patient-facing interactions, a key emerging strength of generative AI is hypothesis generation. Encouragingly, preliminary studies show the promise of GPT-4 in generating accurate lists of differential diagnoses, including in complicated clinical cases,7 suggesting the potential of these tools to facilitate brainstorming in diagnostic and treatment decision-making.
The potential harms of generative AI
Using generative AI in mental healthcare also risks harm. LLMs are autoregressive, meaning they predict each new word from the text that precedes it; this probabilistic process means outputs can be inconsistent, often changing depending on how queries are worded. Most LLMs are not exclusively trained on medical texts and lack the capacity to discriminate the quality of the webtext from which they draw their responses, so inferior content is treated in the same way as reliable material, creating risks of harm. For example, transcripts show that a chatbot encouraged a Belgian man to end his life to help stop climate change.8
As noted, LLMs have quickly gained a reputation for their capacity to follow requests, such as writing responses in a specified conversational style, tone or literacy level. However, for a variety of reasons, biases are baked into these models, creating the potential for ‘algorithmic discrimination’ whereby outputs may perpetuate or exacerbate unfair treatment,9 and research shows these models can embed gender, race and disability biases, threatening their equitable application.10 11 The sources of bias are multiple: training data may under-represent clinical populations in medical publications (eg, PubMed) and may absorb stereotyping from social media (eg, Twitter/X, Reddit, Facebook), books, news media and images.12 Human and societal biases can also be introduced via supervised learning techniques, whereby workers, who are very often poorly paid, risk entrenching unwanted stereotypes through data labelling and feedback.
An additional source of harm is the tendency for LLMs to make up patently false information, referred to as ‘hallucinations’.13 This risk, combined with the sheer speed and authoritative tone of the conversational responses offered by LLM-powered chatbots, might render clinicians and patients more vulnerable to misinformation, risking safety. Relatedly, if patients are unaware that a chatbot rather than a human is answering their queries, this could compromise patient trust. For example, in January 2023, a company called Koko publicly apologised for using ChatGPT to write emotional responses while deceiving users into believing the responses were generated by humans.14
Yet, because of the conversational fluency associated with chatbots such as OpenAI’s GPT-4, patients and clinicians may become too trusting and be tempted to input sensitive patient data to solicit seemingly ‘neutral’ advice or recommendations, risking patient privacy. Earlier this year, the American Medical Association issued an advisory cautioning that ChatGPT and other LLM tools are not regulated and that clinicians should avoid entering patient data into generative AI systems.15 Given the added potential for data triangulation, without additional safeguarding measures patients may lose control of their confidential health information.
Suggestions to enhance the ethical and effective use of generative AI
Many mental health professionals may already be concerned about the potential harms of generative AI. Nonetheless, in some cases the benefits of employing these models could outweigh the risks, provided the tools are implemented appropriately. Several suggestions could be considered.
First, health systems must ensure that LLMs uphold or improve current standards of patient safety and, relatedly, that these tools do not perpetuate or compound current inequities in the delivery of care. To this end, robust experimental work with both prompt engineering and tuning of the underlying models is first needed to establish the clinical accuracy and quality of LLM responses. For example, to investigate the potential for biases, LLMs could be tested to explore how the range of differential diagnoses and the quality of clinical notes vary across different patient populations.
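As one illustration of the kind of testing described above, the sketch below shows a hypothetical bias probe, not an established protocol: the same clinical vignette is submitted repeatedly while only a demographic descriptor is varied, and the returned differential diagnoses are collected for comparison. The vignette wording, the demographic categories and the `query_llm` function are all placeholders chosen for illustration; a research team would substitute its own instrument and model interface.

```python
# Hypothetical bias probe: hold a clinical vignette constant, vary only the
# demographic descriptors, and collect the differential diagnoses returned.
from itertools import product

VIGNETTE = (
    "A {age}-year-old {gender} patient presents with two weeks of low mood, "
    "poor sleep and loss of appetite. List the top three differential diagnoses."
)

def query_llm(prompt: str) -> str:
    """Placeholder for a call to the model under evaluation (eg, via its API).
    Replace with a real client call before running any study."""
    raise NotImplementedError

def run_probe() -> dict:
    """Collect model outputs for each demographic variant of the same vignette."""
    results = {}
    for age, gender in product(["25", "70"], ["male", "female", "non-binary"]):
        prompt = VIGNETTE.format(age=age, gender=gender)
        results[(age, gender)] = query_llm(prompt)
    return results

# Researchers would then compare outputs across variants, for example via
# blinded clinician review or automated similarity scoring, to flag
# systematic divergence between demographic groups.
```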
Reducing the potential for algorithmic discrimination will require a multilevel approach.10 12 Thorough attention must be given to the quality of data fed into LLMs, the potential for bias among the human agents involved in labelling and training AI and, therefore, the diversity of participants involved in shaping these technologies.16 Participatory design approaches involving marginalised voices, including but not limited to people from low-income countries and patients from mental health communities, should be fully integrated into activities relating to the development and evaluation of the impact of these tools.
Second, clinicians and patients could be supported with resources and guidance about the limitations and benefits of using LLMs. These tools should not be used to replace clinician judgement, be relied on to complete documentation or fully substitute for human interaction. Instead, via medical curricula and ongoing professional training, clinicians could be supported in understanding how these tools might augment human capacities while remaining under clinician oversight.
Third, civic, health professional and regulatory engagement will also be needed to review privacy concerns related to sensitive patient data in the development and use of LLM chatbots.17 In the USA, there are already efforts to integrate generative AI services into electronic healthcare systems that comply with the privacy standards of the 1996 Health Insurance Portability and Accountability Act.18 In the European Union, under the General Data Protection Regulation, strong reasons must be given to process patient data without informed consent, such as public health justifications, and authorities are currently reviewing whether OpenAI complies with this regulation.19 Patients should be fully engaged in debates about how their health data are managed, and decisions about when thresholds of acceptable use have been met should be informed by patients.
Conclusions
Generative AI has the potential to offer significant benefits in mental healthcare, including in clinical documentation, patient communication and medical decision-making. However, to minimise the risks of harm, these tools need to be more thoroughly studied and monitored. To this end, and for psychiatrists to be equipped to lead policy and practice advances on the role of LLMs in mental healthcare, improvements in digital education will be imperative.
Ethics statements
Patient consent for publication
Footnotes
Twitter @crblease, @JohnTorousMD
Contributors CB wrote the first draft. JT and CB revised the paper until both signed off on it.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.