Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities

Read original: arXiv:2407.06125 - Published 7/9/2024 by Avinash Anand, Chayan Tank, Sarthak Pol, Vinayak Katoch, Shaina Mehta, Rajiv Ratn Shah

Overview

This paper explores the use of large language models and multimodal analysis for detecting and analyzing depression.
The researchers investigate the performance of various deep learning models, including RoBERTa, BiLSTM, and DepRoBERTa, in identifying and assessing the severity of depression from textual and audio-visual data.
The study also evaluates the potential of large language models like OpenAI GPT 3.5, OpenAI GPT 4, and LLAMA 3 8B Instruct for depression detection and analysis.

Plain English Explanation

The paper explores using advanced AI models, known as large language models, to detect and analyze depression from people's written text and spoken words, as well as their facial expressions and body language in video. The researchers tested different deep learning models, like RoBERTa and BiLSTM, to see how well they could identify signs of depression and measure its severity. They also evaluated the potential of cutting-edge language models like GPT-3.5, GPT-4, and LLAMA for this task. The goal is to develop more accurate and accessible tools for early detection and monitoring of depression, which is a prevalent mental health issue.

Technical Explanation

The paper investigates the use of large language models and multimodal analysis for depression detection and severity assessment. The researchers evaluate the performance of various deep learning models, including RoBERTa, BiLSTM, and DepRoBERTa, on textual and audio-visual data.

For the textual modality, the models are trained on transcribed speech and written text. The audio-visual modality incorporates facial expressions, body language, and speech patterns captured in video recordings. The study also explores the potential of large language models, such as OpenAI GPT 3.5, OpenAI GPT 4, and LLAMA 3 8B Instruct, for depression detection and analysis.
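To make the LLM-based textual route concrete, here is a minimal sketch of how a transcript could be turned into a prompt and how a score could be read back from a model reply. This is not the authors' prompt or pipeline: the prompt wording, the `PHQ-8: <score>` reply format, and the helper names `build_prompt` and `parse_score` are all assumptions made for illustration, and no model API is called.

```python
import re
from typing import Optional

# PHQ-8 sums eight items scored 0-3, so totals range from 0 to 24.
PHQ8_MIN, PHQ8_MAX = 0, 24


def build_prompt(transcript: str) -> str:
    """Build a hypothetical zero-shot prompt asking for a PHQ-8 estimate."""
    return (
        "You are assisting with research on depression screening.\n"
        "Read the interview transcript and estimate the participant's "
        f"PHQ-8 score as an integer from {PHQ8_MIN} to {PHQ8_MAX}.\n"
        "Answer in the form 'PHQ-8: <score>'.\n\n"
        f"Transcript:\n{transcript.strip()}"
    )


def parse_score(reply: str) -> Optional[int]:
    """Extract the score from a model reply; return None if it is absent."""
    match = re.search(r"PHQ-8:\s*(\d+)", reply)
    if match is None:
        return None
    # Clamp to the valid range in case the model answers out of bounds.
    return max(PHQ8_MIN, min(PHQ8_MAX, int(match.group(1))))
```

Returning `None` for an unparseable reply, rather than a default score, keeps missing predictions visible so they are not silently counted in later evaluation.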

The researchers use a combination of techniques, including transfer learning, knowledge distillation, and multimodal fusion, to leverage the strengths of these models and improve the overall performance in identifying and assessing the severity of depression.
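As an illustration of multimodal fusion, the sketch below shows one simple variant, late fusion, where each modality produces its own severity estimate and the estimates are combined by a weighted average. The summary does not say which fusion scheme the paper uses, so the function `late_fusion`, the modality names, and the weights are hypothetical.

```python
from typing import Dict


def late_fusion(predictions: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-modality severity predictions by a weighted average.

    `predictions` maps a modality name to its predicted score;
    `weights` maps the same names to non-negative importance weights.
    """
    total = sum(weights[m] for m in predictions)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(predictions[m] * weights[m] for m in predictions) / total


# Example with made-up numbers: text is trusted twice as much as audio or video.
fused = late_fusion(
    {"text": 8.0, "audio": 12.0, "video": 10.0},
    {"text": 2.0, "audio": 1.0, "video": 1.0},
)
```

In practice the weights would be tuned on a validation split. Feature-level fusion inside a single network is a different design that this sketch does not cover.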

Critical Analysis

The paper presents a comprehensive approach to depression detection and analysis using advanced AI models. However, it is important to note that the reliability and generalizability of such systems may be limited by factors such as the diversity of the training data, the subjectivity of depression assessment, and the potential for bias in the underlying models.

Further research is needed to address these challenges and ensure the ethical and responsible deployment of these technologies, particularly in sensitive mental health contexts. Additionally, the long-term implications of using large language models for mental health monitoring and decision-making should be carefully considered.

Conclusion

This research demonstrates the potential of large language models and multimodal analysis for early detection and monitoring of depression. By leveraging the capabilities of these advanced AI systems, the study aims to develop more accurate and accessible tools for mental health assessment and intervention.

However, the findings also highlight the need for continued research and thoughtful consideration of the ethical and practical implications of deploying such technologies in real-world mental health settings. Ongoing collaboration between researchers, clinicians, and policymakers will be crucial in ensuring the responsible and effective use of these emerging technologies to address the significant challenges posed by depression.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities

Avinash Anand, Chayan Tank, Sarthak Pol, Vinayak Katoch, Shaina Mehta, Rajiv Ratn Shah

Depression has proven to be a significant public health issue, profoundly affecting the psychological well-being of individuals. If it remains undiagnosed, depression can lead to severe health issues, which can manifest physically and even lead to suicide. Generally, diagnosing depression or any other mental disorder involves clinicians and mental health professionals conducting semi-structured interviews alongside supplementary questionnaires, including variants of the Patient Health Questionnaire (PHQ). This approach places significant reliance on the experience and judgment of trained physicians, making the diagnosis susceptible to personal biases. Given that the underlying mechanisms causing depression are still being actively researched, physicians often face challenges in diagnosing and treating the condition, particularly in its early stages of clinical presentation. Recently, significant strides have been made in artificial neural computing to solve problems involving text, image, and speech in various domains. Our analysis has aimed to leverage these state-of-the-art (SOTA) models in our experiments to achieve optimal outcomes across multiple modalities. The experiments were performed on the Extended Distress Analysis Interview Corpus Wizard of Oz (E-DAIC) corpus presented in the Audio/Visual Emotion Challenge (AVEC) 2019. The proposed solutions demonstrate better results achieved by proprietary and open-source Large Language Models (LLMs), which achieved a Root Mean Square Error (RMSE) score of 3.98 on the textual modality, beating the AVEC 2019 challenge baseline results and current SOTA regression analysis architectures. Additionally, the proposed solution achieved an accuracy of 71.43% in the classification task. The paper also includes a novel audio-visual multi-modal network that predicts PHQ-8 scores with an RMSE of 6.51.

7/9/2024
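The abstract above reports two kinds of metrics: RMSE for PHQ-8 score regression and accuracy for classification. The following minimal sketch shows how both can be computed from true and predicted PHQ-8 scores. The cutoff of 10 for the binary label is a widely used screening convention and is assumed here; the abstract does not state which threshold the paper applies, and the function names are illustrative.

```python
import math
from typing import Sequence

# Assumption: a PHQ-8 total of 10 or more is treated as the positive class.
PHQ8_CUTOFF = 10


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between true and predicted scores."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def binary_accuracy(
    y_true: Sequence[float], y_pred: Sequence[float], cutoff: float = PHQ8_CUTOFF
) -> float:
    """Fraction of cases where thresholded predictions match thresholded labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum((t >= cutoff) == (p >= cutoff) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)
```

Because RMSE is expressed in PHQ-8 points, the reported values of 3.98 and 6.51 can be read directly against the 0 to 24 range of the questionnaire.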

We Care: Multimodal Depression Detection and Knowledge Infused Mental Health Therapeutic Response Generation

Palash Moon, Pushpak Bhattacharyya

The detection of depression through non-verbal cues has gained significant attention. Previous research predominantly centred on identifying depression within the confines of controlled laboratory environments, often with the supervision of psychologists or counsellors. Unfortunately, datasets generated in such controlled settings may struggle to account for individual behaviours in real-life situations. In response to this limitation, we present the Extended D-vlog dataset, encompassing a collection of 1,261 YouTube vlogs. Additionally, the emergence of large language models (LLMs) like GPT3.5 and GPT4 has sparked interest in their potential to act like mental health professionals. Yet, the readiness of these LLMs to be used in real-life settings is still a concern, as they can give wrong responses that can harm users. We introduce a virtual agent serving as an initial contact for mental health patients, offering Cognitive Behavioral Therapy (CBT)-based responses. It comprises two core functions: 1. Identifying depression in individuals, and 2. Delivering CBT-based therapeutic responses. Our Mistral model achieved impressive scores of 70.1% and 30.9% for distortion assessment and classification, along with a BERTScore of 88.7%. Moreover, utilizing the TVLT model on our Multimodal Extended D-vlog Dataset yielded outstanding results, with an impressive F1-score of 67.8%.

6/18/2024

Integrating Large Language Models into a Tri-Modal Architecture for Automated Depression Classification

Santosh V. Patapati

Major Depressive Disorder (MDD) is a pervasive mental health condition that affects 300 million people worldwide. This work presents a novel, BiLSTM-based tri-modal model-level fusion architecture for the binary classification of depression from clinical interview recordings. The proposed architecture incorporates Mel Frequency Cepstral Coefficients, Facial Action Units, and uses a two-shot learning based GPT-4 model to process text data. This is the first work to incorporate large language models into a multi-modal architecture for this task. It achieves impressive results on the DAIC-WOZ AVEC 2016 Challenge cross-validation split and Leave-One-Subject-Out cross-validation split, surpassing all baseline models and multiple state-of-the-art models. In Leave-One-Subject-Out testing, it achieves an accuracy of 91.01%, an F1-Score of 85.95%, a precision of 80%, and a recall of 92.86%.

8/20/2024

Mental-Perceiver: Audio-Textual Multimodal Learning for Mental Health Assessment

Jinghui Qin, Changsong Liu, Tianchi Tang, Dahuang Liu, Minghao Wang, Qianying Huang, Yang Xu, Rumin Zhang

Mental disorders, such as anxiety and depression, have become a global issue that affects the regular lives of people across different ages. Without proper detection and treatment, anxiety and depression can hinder the sufferer's study, work, and daily life. Fortunately, recent advancements of digital and AI technologies provide new opportunities for better mental health care and many efforts have been made in developing automatic anxiety and depression assessment techniques. However, this field still lacks a publicly available large-scale dataset that can facilitate the development and evaluation of AI-based techniques. To address this limitation, we have constructed a new large-scale Multi-Modal Psychological assessment corpus (MMPsy) on anxiety and depression assessment of Mandarin-speaking adolescents. The MMPsy contains audios and extracted transcripts of responses from automated anxiety or depression assessment interviews along with the self-reported anxiety or depression evaluations of the participants using standard mental health assessment questionnaires. Our dataset contains over 7,700 post-processed recordings of interviews for anxiety assessment and over 4,200 recordings for depression assessment. Using this dataset, we have developed a novel deep-learning based mental disorder estimation model, named Mental-Perceiver, to detect anxious/depressive mental states from recorded audio and transcript data. Extensive experiments on our MMPsy and the commonly-used DAIC-WOZ datasets have shown the effectiveness and superiority of our proposed Mental-Perceiver model in anxiety and depression detection. The MMPsy dataset will be made publicly available later to facilitate the research and development of AI-based techniques in the mental health care field.

8/23/2024