Using LLMs to Aid Annotation and Collection of Clinically-Enriched Data in Bipolar Disorder and Schizophrenia

2406.12687

Published 6/19/2024 by Ankit Aich, Avery Quynh, Pamela Osseyi, Amy Pinkham, Philip Harvey, Brenda Curtis, Colin Depp, Natalie Parde

cs.CL

Using LLMs to Aid Annotation and Collection of Clinically-Enriched Data in Bipolar Disorder and Schizophrenia

Abstract

NLP in mental health has been primarily social media focused. Real world practitioners also have high case loads and often domain specific variables, of which modern LLMs lack context. We take a dataset made by recruiting 644 participants, including individuals diagnosed with Bipolar Disorder (BD), Schizophrenia (SZ), and Healthy Controls (HC). Participants undertook tasks derived from a standardized mental health instrument, and the resulting data were transcribed and annotated by experts across five clinical variables. This paper demonstrates the application of contemporary language models in sequence-to-sequence tasks to enhance mental health research. Specifically, we illustrate how these models can facilitate the deployment of mental health instruments, data collection, and data annotation with high accuracy and scalability. We show that small models are capable of annotation for domain-specific clinical variables, data collection for mental-health instruments, and perform better then commercial large models.

Create account to get full access

Overview

This paper explores the use of large language models (LLMs) to aid in the annotation and collection of clinically-enriched data for mental health conditions like bipolar disorder and schizophrenia.
The authors investigate how LLMs can be leveraged to enhance the efficiency and quality of data curation processes, which are critical for advancing research and clinical applications in these domains.
The paper presents several key contributions, including the development of novel LLM-based approaches for text annotation and generation, as well as evaluations of these techniques on real-world mental health datasets.

Plain English Explanation

In this paper, the researchers looked at how large language models can be used to help collect and organize data related to mental health conditions like bipolar disorder and schizophrenia. Gathering high-quality, clinically-relevant data is essential for advancing research and developing better treatments in these areas, but it can be a time-consuming and challenging process.

The researchers explored ways that large language models could be leveraged to streamline and improve this data collection and annotation process. They developed new techniques that allow these powerful AI models to assist in tasks like identifying relevant information in text, summarizing key details, and even generating new content that could supplement the existing datasets.

By testing these LLM-based approaches on real mental health data, the researchers were able to demonstrate their potential benefits, such as increased efficiency, better consistency, and the ability to extract more clinically-meaningful insights. This could ultimately help accelerate research and lead to more effective treatments for conditions like bipolar disorder and schizophrenia.

Technical Explanation

The paper presents several novel approaches for using large language models to aid in the annotation and collection of clinically-enriched data for mental health research.

First, the authors develop LLM-based text annotation models that can automatically identify and classify clinically-relevant entities and relations within unstructured text data, such as medical records or social media posts. These models are trained on labeled datasets to learn the patterns and language used to describe symptoms, diagnoses, treatments, and other key concepts.

Second, the researchers explore the use of LLMs for generating synthetic text that can supplement existing mental health datasets. By fine-tuning these models on real patient narratives, they are able to produce new, realistic-sounding content that shares similar characteristics and can be used to expand the training data available to downstream machine learning models.

The paper also includes extensive evaluations of these LLM-based approaches on real-world datasets related to bipolar disorder and schizophrenia. The results demonstrate significant improvements in annotation quality, consistency, and efficiency compared to manual curation efforts. Additionally, the generated synthetic data is shown to enhance the performance of predictive models for these mental health conditions.

Critical Analysis

The paper presents a compelling case for the use of large language models in the annotation and collection of clinically-enriched data for mental health research. The proposed techniques have the potential to dramatically streamline and enhance these crucial data curation processes, which have traditionally been labor-intensive and prone to inconsistencies.

However, the paper also acknowledges several important caveats and limitations. One key concern is the potential for biases inherent in the language models to be propagated or amplified during the annotation and generation tasks. The authors note the need for careful monitoring and mitigation of these biases to ensure the data remains representative and unbiased.

Additionally, while the results on the evaluated datasets are promising, further research is needed to assess the generalizability of these techniques to a wider range of mental health conditions and data sources. The paper also does not address potential privacy and ethical considerations around the use of patient data and the generation of synthetic content.

Overall, this paper represents an important step forward in leveraging the power of large language models to improve the quality and efficiency of mental health research data. However, continued caution and diligence will be necessary to ensure these technologies are developed and deployed responsibly and equitably.

Conclusion

This paper demonstrates the potential for large language models to significantly enhance the annotation and collection of clinically-enriched data for mental health conditions like bipolar disorder and schizophrenia. By automating key data curation tasks and generating synthetic content to supplement existing datasets, these LLM-based approaches could streamline research workflows, improve data quality, and ultimately accelerate the development of more effective treatments and interventions.

While the results are promising, the authors also acknowledge important limitations and areas for further research, particularly around the mitigation of biases and the consideration of ethical implications. As the field of mental health research continues to evolve, the judicious and responsible use of advanced AI technologies like LLMs will be crucial for driving meaningful progress and improving outcomes for those affected by these challenging conditions.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

💬

Large Language Model for Mental Health: A Systematic Review

Zhijun Guo, Alvina Lai, Johan Hilge Thygesen, Joseph Farrington, Thomas Keen, Kezhi Li

Large language models (LLMs) have attracted significant attention for potential applications in digital health, while their application in mental health is subject to ongoing debate. This systematic review aims to evaluate the usage of LLMs in mental health, focusing on their strengths and limitations in early screening, digital interventions, and clinical applications. Adhering to PRISMA guidelines, we searched PubMed, IEEE Xplore, Scopus, and the JMIR using keywords: 'mental health OR mental illness OR mental disorder OR psychiatry' AND 'large language models'. We included articles published between January 1, 2017, and December 31, 2023, excluding non-English articles. 30 articles were evaluated, which included research on mental illness and suicidal ideation detection through text (n=12), usage of LLMs for mental health conversational agents (CAs) (n=5), and other applications and evaluations of LLMs in mental health (n=13). LLMs exhibit substantial effectiveness in detecting mental health issues and providing accessible, de-stigmatized eHealth services. However, the current risks associated with the clinical use might surpass their benefits. The study identifies several significant issues: the lack of multilingual datasets annotated by experts, concerns about the accuracy and reliability of the content generated, challenges in interpretability due to the 'black box' nature of LLMs, and persistent ethical dilemmas. These include the lack of a clear ethical framework, concerns about data privacy, and the potential for over-reliance on LLMs by both therapists and patients, which could compromise traditional medical practice. Despite these issues, the rapid development of LLMs underscores their potential as new clinical aids, emphasizing the need for continued research and development in this area.

5/31/2024

cs.CY cs.AI cs.CL

💬

Large Language Models Perform on Par with Experts Identifying Mental Health Factors in Adolescent Online Forums

Isabelle Lorge, Dan W. Joyce, Andrey Kormilitzin

Mental health in children and adolescents has been steadily deteriorating over the past few years. The recent advent of Large Language Models (LLMs) offers much hope for cost and time efficient scaling of monitoring and intervention, yet despite specifically prevalent issues such as school bullying and eating disorders, previous studies on have not investigated performance in this domain or for open information extraction where the set of answers is not predetermined. We create a new dataset of Reddit posts from adolescents aged 12-19 annotated by expert psychiatrists for the following categories: TRAUMA, PRECARITY, CONDITION, SYMPTOMS, SUICIDALITY and TREATMENT and compare expert labels to annotations from two top performing LLMs (GPT3.5 and GPT4). In addition, we create two synthetic datasets to assess whether LLMs perform better when annotating data as they generate it. We find GPT4 to be on par with human inter-annotator agreement and performance on synthetic data to be substantially higher, however we find the model still occasionally errs on issues of negation and factuality and higher performance on synthetic data is driven by greater complexity of real data rather than inherent advantage.

4/29/2024

cs.CL

Unveiling and Mitigating Bias in Mental Health Analysis with Large Language Models

Yuqing Wang, Yun Zhao, Sara Alessandra Keller, Anne de Hond, Marieke M. van Buchem, Malvika Pillai, Tina Hernandez-Boussard

The advancement of large language models (LLMs) has demonstrated strong capabilities across various applications, including mental health analysis. However, existing studies have focused on predictive performance, leaving the critical issue of fairness underexplored, posing significant risks to vulnerable populations. Despite acknowledging potential biases, previous works have lacked thorough investigations into these biases and their impacts. To address this gap, we systematically evaluate biases across seven social factors (e.g., gender, age, religion) using ten LLMs with different prompting methods on eight diverse mental health datasets. Our results show that GPT-4 achieves the best overall balance in performance and fairness among LLMs, although it still lags behind domain-specific models like MentalRoBERTa in some cases. Additionally, our tailored fairness-aware prompts can effectively mitigate bias in mental health predictions, highlighting the great potential for fair analysis in this field.

6/21/2024

cs.CL

🌀

Bias patterns in the application of LLMs for clinical decision support: A comprehensive study

Raphael Poulain, Hamed Fayyaz, Rahmatollah Beheshti

Large Language Models (LLMs) have emerged as powerful candidates to inform clinical decision-making processes. While these models play an increasingly prominent role in shaping the digital landscape, two growing concerns emerge in healthcare applications: 1) to what extent do LLMs exhibit social bias based on patients' protected attributes (like race), and 2) how do design choices (like architecture design and prompting strategies) influence the observed biases? To answer these questions rigorously, we evaluated eight popular LLMs across three question-answering (QA) datasets using clinical vignettes (patient descriptions) standardized for bias evaluations. We employ red-teaming strategies to analyze how demographics affect LLM outputs, comparing both general-purpose and clinically-trained models. Our extensive experiments reveal various disparities (some significant) across protected groups. We also observe several counter-intuitive patterns such as larger models not being necessarily less biased and fined-tuned models on medical data not being necessarily better than the general-purpose models. Furthermore, our study demonstrates the impact of prompt design on bias patterns and shows that specific phrasing can influence bias patterns and reflection-type approaches (like Chain of Thought) can reduce biased outcomes effectively. Consistent with prior studies, we call on additional evaluations, scrutiny, and enhancement of LLMs used in clinical decision support applications.

4/24/2024

cs.CL cs.LG