Exploring Social Media Posts for Depression Identification: A Study on Reddit Dataset

2405.06656

Published 5/14/2024 by Nandigramam Sai Harshit, Nilesh Kumar Sahu, Haroon R. Lone

Abstract

Depression is one of the most common mental disorders affecting an individual's personal and professional life. In this work, we investigated the possibility of utilizing social media posts to identify depression in individuals. To achieve this goal, we conducted a preliminary study where we extracted and analyzed the top Reddit posts made in 2022 from depression-related forums. The collected data were labeled as depressive and non-depressive using UMLS Metathesaurus. Further, the pre-processed data were fed to classical machine learning models, where we achieved an accuracy of 92.28% in predicting the depressive and non-depressive posts.

Overview

Investigated using social media posts to identify depression
Analyzed top Reddit posts from depression-related forums in 2022
Labeled data as depressive or non-depressive using UMLS Metathesaurus
Achieved 92.28% accuracy in predicting depressive and non-depressive posts using classical machine learning models

Plain English Explanation

Depression is a common mental health issue that can significantly impact a person's personal and professional life. In this study, the researchers explored the possibility of using social media posts, specifically from Reddit, to detect depression in individuals.

They collected and analyzed the top Reddit posts made in 2022 from forums related to depression. The researchers then used a medical terminology database called the UMLS Metathesaurus to label the posts as either depressive or non-depressive. This labeled data was then fed into classical machine learning models, which predicted whether a post was depressive or not with 92.28% accuracy.

This research suggests that analyzing social media posts could be a promising way to identify individuals struggling with depression. By using machine learning to analyze social media data, researchers may be able to develop tools that can automatically screen for depression symptoms and potentially provide earlier intervention and support for those in need.

Technical Explanation

The researchers conducted a preliminary study to explore the feasibility of using social media posts to detect depression. They focused on analyzing top Reddit posts from depression-related forums in 2022.

After collecting the data, the researchers used the UMLS Metathesaurus, a comprehensive medical terminology database, to label the posts as either depressive or non-depressive. This labeled dataset was then preprocessed and fed into classical machine learning models, such as Logistic Regression, Random Forest, and Support Vector Machines.
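The labeling-then-classification flow described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the `DEPRESSION_TERMS` set is a hypothetical stand-in for concepts drawn from the UMLS Metathesaurus, the sample posts are invented, and a small hand-written multinomial Naive Bayes classifier is swapped in for the Logistic Regression, Random Forest, and Support Vector Machine models named in the text so that the example needs only the Python standard library.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for depression-related terms from the UMLS Metathesaurus.
DEPRESSION_TERMS = {"depressed", "hopeless", "worthless", "insomnia"}


def tokenize(text):
    """Lowercase a post and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def label_post(text):
    """Return 1 (depressive) if the post contains any lexicon term, else 0."""
    return int(any(tok in DEPRESSION_TERMS for tok in tokenize(text)))


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.word_counts[y].update(tokenize(doc))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, doc):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            # Start from the log prior, then add smoothed log likelihoods.
            score = math.log(self.doc_counts[c] / total_docs)
            for tok in tokenize(doc):
                if tok in self.vocab:  # ignore words never seen in training
                    count = self.word_counts[c][tok] + 1
                    score += math.log(count / (total_words + len(self.vocab)))
            scores[c] = score
        return max(scores, key=scores.get)


# Invented example posts, not taken from the paper's dataset.
posts = [
    "I feel hopeless and worthless every single day",
    "Insomnia again, I am so depressed and tired",
    "Had a great hike with friends this weekend",
    "Excited about my new job and the team",
]
labels = [label_post(p) for p in posts]
model = NaiveBayes().fit(posts, labels)
```

Calling `model.predict(...)` on a new post returns 1 or 0. Because the labels come from keyword matching, the classifier learns to reproduce the lexicon's judgment rather than a clinical diagnosis.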

The researchers reported an accuracy of 92.28% in predicting whether a post was depressive or non-depressive, measured against the UMLS-derived labels. This suggests that classical machine learning models applied to social media text can pick up linguistic signals associated with depression.
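For reference, accuracy is the share of posts whose predicted label matches the assigned label. The sketch below computes it, along with precision and recall for the depressive class, from two label lists. The function name `evaluate` and the label lists are made up for illustration and are unrelated to the paper's data or its 92.28% figure.

```python
def evaluate(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall for a binary labeling task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # Guard against division by zero when a class is never predicted/present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Illustrative labels only: 1 = depressive, 0 = non-depressive.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = evaluate(y_true, y_pred)
```

Here 6 of the 8 predictions match, so `metrics["accuracy"]` is 0.75.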

Critical Analysis

The study provides promising results, but it is important to consider some potential limitations and areas for further research. The data used in this study was limited to Reddit posts, which may not fully capture the diversity of social media platforms and user experiences. Additionally, the labeling of posts as depressive or non-depressive was based on a medical terminology database, which may not always accurately reflect an individual's actual mental state.

Further research could explore the use of more sophisticated natural language processing techniques, such as analyzing named entities in Reddit posts, to gain a deeper understanding of the contextual and emotional aspects of the posts. Incorporating additional data sources, such as user demographics and behavioral patterns, could also improve the accuracy and reliability of depression detection models.

Conclusion

This study demonstrates the potential of using social media data, specifically Reddit posts, to detect signs of depression. By leveraging machine learning algorithms and natural language processing techniques, the researchers were able to achieve a high accuracy in predicting whether a post was depressive or non-depressive.

The findings of this research suggest that social media could be a valuable resource for identifying depression early and prompting intervention. While the current approach has limitations, the promising results highlight the need for further exploration and development of tools that can use social media data to support mental health. Continued advancements in this area could lead to more effective strategies for addressing the widespread challenge of depression.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Multi Class Depression Detection Through Tweets using Artificial Intelligence

Muhammad Osama Nusrat, Waseem Shahzad, Saad Ahmed Jamal

Depression is a significant issue nowadays. As per the World Health Organization (WHO), in 2023, over 280 million individuals are grappling with depression. This is a huge number; if not taken seriously, these numbers will increase rapidly. About 4.89 billion individuals are social media users. People express their feelings and emotions on platforms like Twitter, Facebook, Reddit, Instagram, etc. These platforms contain valuable information which can be used for research purposes. Considerable research has been conducted across various social media platforms. However, certain limitations persist in these endeavors. Particularly, previous studies were only focused on detecting depression and the intensity of depression in tweets. Also, there existed inaccuracies in dataset labeling. In this research work, five types of depression (Bipolar, major, psychotic, atypical, and postpartum) were predicted using tweets from the Twitter database based on lexicon labeling. Explainable AI was used to provide reasoning by highlighting the parts of tweets that represent type of depression. Bidirectional Encoder Representations from Transformers (BERT) was used for feature extraction and training. Machine learning and deep learning methodologies were used to train the model. The BERT model presented the most promising results, achieving an overall accuracy of 0.96.

4/23/2024

cs.CL cs.AI

Diverse Perspectives, Divergent Models: Cross-Cultural Evaluation of Depression Detection on Twitter

Nuredin Ali, Charles Chuankai Zhang, Ned Mayo, Stevie Chancellor

Social media data has been used for detecting users with mental disorders, such as depression. Despite the global significance of cross-cultural representation and its potential impact on model performance, publicly available datasets often lack crucial metadata related to this aspect. In this work, we evaluate the generalization of benchmark datasets to build AI models on cross-cultural Twitter data. We gather a custom geo-located Twitter dataset of depressed users from seven countries as a test dataset. Our results show that depression detection models do not generalize globally. The models perform worse on Global South users compared to Global North. Pre-trained language models achieve the best generalization compared to Logistic Regression, though still show significant gaps in performance on depressed and non-Western users. We quantify our findings and provide several actionable suggestions to mitigate this issue.

6/26/2024

cs.CL

Studying Differential Mental Health Expressions in India

Khushi Shelat, Sunny Rai, Devansh R Jain, Kishen Sivabalan, Young Min Cho, Maitreyi Redkar, Samindara Sawant, Sharath Chandra Guntuku

Psychosocial stressors and the symptomatology of mental disorders vary across cultures. However, current understandings of mental health expressions on social media are predominantly derived from studies in WEIRD (Western, Educated, Industrialized, Rich, and Democratic) contexts. In this paper, we analyze mental health posts on Reddit made by individuals in India, to identify variations in online depression language specific to the Indian context compared to users from the Rest of the World (ROW). Unlike in Western samples, we observe that mental health discussions in India additionally express sadness, use negation, are present-focused, and are related to work and achievement. Illness is uniquely correlated to India, indicating the association between depression and physical health in Indian patients. Two clinical psychologists validated the findings from social media posts and found 95% of the top 20 topics associated with mental health discussions as prevalent in Indians. Significant linguistic variations in online mental health-related language in India compared to ROW, emphasize the importance of developing precision-targeted interventions that are culturally appropriate.

6/18/2024

cs.CY

EmoScan: Automatic Screening of Depression Symptoms in Romanized Sinhala Tweets

Jayathi Hewapathirana, Deshan Sumanathilaka

This work explores the utilization of Romanized Sinhala social media data to identify individuals at risk of depression. A machine learning-based framework is presented for the automatic screening of depression symptoms by analyzing language patterns, sentiment, and behavioural cues within a comprehensive dataset of social media posts. The research has been carried out to compare the suitability of Neural Networks over the classical machine learning techniques. The proposed Neural Network with an attention layer which is capable of handling long sequence data, attains a remarkable accuracy of 93.25% in detecting depression symptoms, surpassing current state-of-the-art methods. These findings underscore the efficacy of this approach in pinpointing individuals in need of proactive interventions and support. Mental health professionals, policymakers, and social media companies can gain valuable insights through the proposed model. Leveraging natural language processing techniques and machine learning algorithms, this work offers a promising pathway for mental health screening in the digital era. By harnessing the potential of social media data, the framework introduces a proactive method for recognizing and assisting individuals at risk of depression. In conclusion, this research contributes to the advancement of proactive interventions and support systems for mental health, thereby influencing both research and practical applications in the field.

4/1/2024

cs.CL cs.CY cs.LG