Identifying Self-Disclosures of Use, Misuse and Addiction in Community-based Social Media Posts

arXiv:2311.09066

Published 6/17/2024 by Chenghao Yang, Tuhin Chakrabarty, Karli R Hochstatter, Melissa N Slavin, Nabila El-Bassel, Smaranda Muresan

cs.CL

Abstract

In the last decade, the United States has lost more than 500,000 people to overdoses involving prescription and illicit opioids, making it a national public health emergency (USDHHS, 2017). Medical practitioners require robust and timely tools that can effectively identify at-risk patients. Community-based social media platforms such as Reddit allow self-disclosure for users to discuss otherwise sensitive drug-related behaviors. We present a moderate-size corpus of 2,500 opioid-related posts from various subreddits labeled with six different phases of opioid use: Medical Use, Misuse, Addiction, Recovery, Relapse, Not Using. For every post, we annotate span-level extractive explanations and crucially study their role both in annotation quality and model development. We evaluate several state-of-the-art models in a supervised, few-shot, or zero-shot setting. Experimental results and error analysis show that identifying the phases of opioid use disorder is highly contextual and challenging. However, we find that using explanations during modeling leads to a significant boost in classification accuracy, demonstrating their beneficial role in a high-stakes domain such as studying the opioid use disorder continuum.

Overview

The United States is facing a severe public health crisis due to the opioid epidemic, with over 500,000 people dying from opioid overdoses in the last decade.
Medical practitioners need effective tools to identify at-risk patients, and social media platforms like Reddit can provide valuable insights into opioid-related behaviors.
Researchers have developed a dataset of 2,500 opioid-related Reddit posts, labeled with six different phases of opioid use, and annotated with span-level explanations.
The researchers evaluate several state-of-the-art models in supervised, few-shot, and zero-shot settings to gauge how difficult the classification task is and how much the explanations contribute to model performance.

Plain English Explanation

The opioid crisis is a major public health problem in the United States, with hundreds of thousands of people dying from opioid overdoses in the last 10 years. Healthcare providers need better tools to identify patients who are at risk of opioid misuse or addiction. Social media platforms, like the discussion forum Reddit, can provide valuable insights into how people discuss and share their experiences with opioid use.

The researchers in this study created a dataset of 2,500 Reddit posts that are related to opioid use. They labeled each post with one of six different stages of opioid use: Medical Use, Misuse, Addiction, Recovery, Relapse, and Not Using. They also highlighted specific parts of each post that explained the user's opioid use stage.

The researchers then tested different machine learning models to see how well they could automatically identify the stage of opioid use based on the Reddit posts. They tried models in different settings, including when they had lots of labeled examples, just a few examples, or no examples at all. The results showed that identifying the stage of opioid use is very challenging, as the context is often complex and nuanced.

However, the researchers found that using the highlighted explanations from the posts helped the models perform significantly better at classifying the opioid use stage. This suggests that the explanations provide important additional information that can aid in understanding the complexities of opioid use disorders.

Technical Explanation

The researchers created a dataset of 2,500 Reddit posts related to opioid use, labeled with six different phases: Medical Use, Misuse, Addiction, Recovery, Relapse, and Not Using. For each post, they also annotated specific spans of text that explained the user's opioid use stage.
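To make the annotation scheme concrete, here is a minimal sketch of how one labeled post could be represented. The six phase names come from the paper; the `AnnotatedPost` class, its field names, the use of character offsets for spans, and the example post are all assumptions made for illustration, not the authors' released format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# The six phases of opioid use named in the paper.
PHASES = ("Medical Use", "Misuse", "Addiction", "Recovery", "Relapse", "Not Using")


@dataclass
class AnnotatedPost:
    """One post with a phase label and span-level explanations (hypothetical layout)."""

    text: str
    label: str
    # Assumed representation: (start, end) character offsets into `text`.
    spans: List[Tuple[int, int]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject labels outside the six-phase scheme.
        if self.label not in PHASES:
            raise ValueError(f"unknown phase: {self.label!r}")
        # Reject spans that are empty or fall outside the post text.
        for start, end in self.spans:
            if not (0 <= start < end <= len(self.text)):
                raise ValueError(f"span out of range: {(start, end)}")

    def explanations(self) -> List[str]:
        """Return the highlighted text for each annotated span."""
        return [self.text[start:end] for start, end in self.spans]


# Invented example post, not taken from the dataset.
text = "I was prescribed oxycodone after surgery and take it as directed."
evidence = "prescribed oxycodone after surgery"
start = text.index(evidence)
example = AnnotatedPost(text, "Medical Use", [(start, start + len(evidence))])
print(example.explanations())
```

Storing offsets rather than copied strings keeps each explanation tied to its exact position in the post, which matters when the same phrase appears more than once.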

They then evaluated several state-of-the-art machine learning models on this dataset in supervised, few-shot, and zero-shot settings. The goal was to assess how accurately the models could classify the opioid use stage expressed in each Reddit post.
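The difference between the zero-shot and few-shot settings can be illustrated with a small prompt builder. This is only a sketch: the `build_prompt` function, the prompt wording, and the demonstration posts are hypothetical and are not the prompts used in the paper. With no demonstrations the prompt is zero-shot; passing a handful of labeled examples makes it few-shot.

```python
from typing import List, Optional, Tuple

# The six phases of opioid use named in the paper.
PHASES = ["Medical Use", "Misuse", "Addiction", "Recovery", "Relapse", "Not Using"]


def build_prompt(post: str, examples: Optional[List[Tuple[str, str]]] = None) -> str:
    """Build a classification prompt (hypothetical wording).

    `examples` is a list of (post_text, phase_label) demonstrations.
    An empty or missing list yields a zero-shot prompt.
    """
    blocks = [
        "Classify the post into one of these phases of opioid use: "
        + ", ".join(PHASES)
        + "."
    ]
    # Few-shot demonstrations, each shown with its gold label.
    for demo_text, demo_label in examples or []:
        blocks.append(f"Post: {demo_text}\nPhase: {demo_label}")
    # The post to classify, with the label left for the model to complete.
    blocks.append(f"Post: {post}\nPhase:")
    return "\n\n".join(blocks)


# Invented demonstrations, not taken from the dataset.
demos = [
    ("Day 30 clean and going to meetings every week.", "Recovery"),
    ("I take more than my doctor told me to.", "Misuse"),
]
zero_shot = build_prompt("Slipped last night after a year sober.")
few_shot = build_prompt("Slipped last night after a year sober.", demos)
print(few_shot)
```

In the supervised setting, by contrast, the labeled posts are used to update the model's weights rather than being placed in the prompt.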

The results showed that identifying the phase of opioid use is highly contextual and challenging, as the language used can be complex and nuanced. However, the researchers found that incorporating the span-level explanations into the modeling process led to a significant boost in classification accuracy. This demonstrates the beneficial role of explanations in a high-stakes domain like studying opioid use disorder.
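The summary does not say how the explanations were fed into the models, so the following is one hypothetical way to do it, not the authors' method: train a text-generation model to emit the annotated explanation before the label, then parse the label back out and score it with plain accuracy. The function names and the `Explanation:` / `Phase:` output format are assumptions.

```python
from typing import List, Optional


def make_target(label: str, explanations: List[str], use_explanations: bool = True) -> str:
    """Build the text a model would be trained to generate (assumed format)."""
    if use_explanations and explanations:
        joined = " ... ".join(explanations)
        return f"Explanation: {joined}\nPhase: {label}"
    # Label-only baseline target.
    return f"Phase: {label}"


def parse_label(output: str) -> Optional[str]:
    """Extract the predicted phase from the last 'Phase:' line, if any."""
    for line in reversed(output.splitlines()):
        if line.startswith("Phase:"):
            return line[len("Phase:"):].strip()
    return None


def accuracy(gold: List[str], predicted: List[Optional[str]]) -> float:
    """Fraction of posts whose predicted phase matches the gold phase."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and the same length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Invented example: the same gold label with and without an explanation.
with_expl = make_target("Relapse", ["used again after six months clean"])
label_only = make_target("Relapse", [], use_explanations=False)
print(with_expl)
print(parse_label(with_expl), parse_label(label_only))
```

Comparing accuracy between models trained on the two target formats would be one way to measure the kind of boost the paper reports.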

Critical Analysis

The researchers acknowledge that their dataset, while moderately sized, may not fully capture the diversity of opioid-related discussions on social media. There could be biases or gaps in the types of posts included, which could limit the generalizability of the findings.

Additionally, while the use of span-level explanations improved model performance, the researchers do not provide a deep analysis of how the explanations are being utilized by the models. Further investigation into the specific mechanisms by which the explanations enhance classification could yield additional insights.

It would also be valuable to explore how this approach could be applied in a real-world clinical setting, where practitioners would need to quickly and accurately identify patients at risk of opioid misuse or addiction. The researchers do not address the potential challenges of deploying such a system in a healthcare context.

Overall, this research represents an important step in leveraging social media data and explainable AI to address the complex and devastating opioid crisis. However, continued work is needed to refine the methods and explore the practical applications of this approach.

Conclusion

This study highlights the potential of using social media data and explainable AI techniques to tackle the opioid epidemic, a critical public health challenge facing the United States. By creating a dataset of opioid-related Reddit posts annotated with explanations of the user's stage of use, the researchers have developed a valuable resource for studying this complex issue.

The finding that incorporating these explanations can significantly improve the accuracy of opioid use stage classification is a promising result, suggesting that explainable AI models may be a valuable tool for healthcare providers in identifying at-risk patients. While further research is needed to address the limitations and explore real-world applications, this work represents an important step forward in leveraging technology to address the devastating opioid crisis.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Decoding the Narratives: Analyzing Personal Drug Experiences Shared on Reddit

Layla Bouzoubaa, Elham Aghakhani, Max Song, Minh Trinh, Rezvaneh Rezapour

Online communities such as drug-related subreddits serve as safe spaces for people who use drugs (PWUD), fostering discussions on substance use experiences, harm reduction, and addiction recovery. Users' shared narratives on these forums provide insights into the likelihood of developing a substance use disorder (SUD) and recovery potential. Our study aims to develop a multi-level, multi-label classification model to analyze online user-generated texts about substance use experiences. For this purpose, we first introduce a novel taxonomy to assess the nature of posts, including their intended connections (Inquisition or Disclosure), subjects (e.g., Recovery, Dependency), and specific objectives (e.g., Relapse, Quality, Safety). Using various multi-label classification algorithms on a set of annotated data, we show that GPT-4, when prompted with instructions, definitions, and examples, outperformed all other models. We apply this model to label an additional 1,000 posts and analyze the categories of linguistic expression used within posts in each class. Our analysis shows that topics such as Safety, Combination of Substances, and Mental Health see more disclosure, while discussions about physiological Effects focus on harm reduction. Our work enriches the understanding of PWUD's experiences and informs the broader knowledge base on SUD and drug use.

6/19/2024

cs.CL

Reddit-Impacts: A Named Entity Recognition Dataset for Analyzing Clinical and Social Effects of Substance Use Derived from Social Media

Yao Ge, Sudeshna Das, Karen O'Connor, Mohammed Ali Al-Garadi, Graciela Gonzalez-Hernandez, Abeed Sarker

Substance use disorders (SUDs) are a growing concern globally, necessitating enhanced understanding of the problem and its trends through data-driven research. Social media are unique and important sources of information about SUDs, particularly since the data in such sources are often generated by people with lived experiences. In this paper, we introduce Reddit-Impacts, a challenging Named Entity Recognition (NER) dataset curated from subreddits dedicated to discussions on prescription and illicit opioids, as well as medications for opioid use disorder. The dataset specifically concentrates on the lesser-studied, yet critically important, aspects of substance use--its clinical and social impacts. We collected data from chosen subreddits using the publicly available Application Programming Interface for Reddit. We manually annotated text spans representing clinical and social impacts reported by people who also reported personal nonmedical use of substances including but not limited to opioids, stimulants and benzodiazepines. Our objective is to create a resource that can enable the development of systems that can automatically detect clinical and social impacts of substance use from text-based social media data. The successful development of such systems may enable us to better understand how nonmedical use of substances affects individual health and societal dynamics, aiding the development of effective public health strategies. In addition to creating the annotated data set, we applied several machine learning models to establish baseline performances. Specifically, we experimented with transformer models like BERT, and RoBERTa, one few-shot learning model DANN by leveraging the full training dataset, and GPT-3.5 by using one-shot learning, for automatic NER of clinical and social impacts. The dataset has been made available through the 2024 SMM4H shared tasks.

5/13/2024

cs.CL cs.AI cs.LG

Analyzing Toxicity in Deep Conversations: A Reddit Case Study

Vigneshwaran Shankaran, Rajesh Sharma

Online social media has become increasingly popular in recent years due to its ease of access and ability to connect with others. One of social media's main draws is its anonymity, allowing users to share their thoughts and opinions without fear of judgment or retribution. This anonymity has also made social media prone to harmful content, which requires moderation to ensure responsible and productive use. Several methods using artificial intelligence have been employed to detect harmful content. However, conversation and contextual analysis of hate speech are still understudied. Most promising works only analyze a single text at a time rather than the conversation supporting it. In this work, we employ a tree-based approach to understand how users behave concerning toxicity in public conversation settings. To this end, we collect both the posts and the comment sections of the top 100 posts from 8 Reddit communities that allow profanity, totaling over 1 million responses. We find that toxic comments increase the likelihood of subsequent toxic comments being produced in online conversations. Our analysis also shows that immediate context plays a vital role in shaping a response rather than the original post. We also study the effect of consensual profanity and observe overlapping similarities with non-consensual profanity in terms of user behavior and patterns.

4/12/2024

cs.CL cs.CY cs.SI

iDRAMA-Scored-2024: A Dataset of the Scored Social Media Platform from 2020 to 2023

Jay Patel, Pujan Paudel, Emiliano De Cristofaro, Gianluca Stringhini, Jeremy Blackburn

Online web communities often face bans for violating platform policies, encouraging their migration to alternative platforms. This migration, however, can result in increased toxicity and unforeseen consequences on the new platform. In recent years, researchers have collected data from many alternative platforms, indicating coordinated efforts leading to offline events, conspiracy movements, hate speech propagation, and harassment. Thus, it becomes crucial to characterize and understand these alternative platforms. To advance research in this direction, we collect and release a large-scale dataset from Scored -- an alternative Reddit platform that sheltered banned fringe communities, for example, c/TheDonald (a prominent right-wing community) and c/GreatAwakening (a conspiratorial community). Over four years, we collected approximately 57M posts from Scored, with at least 58 communities identified as migrating from Reddit and over 950 communities created since the platform's inception. Furthermore, we provide sentence embeddings of all posts in our dataset, generated through a state-of-the-art model, to further advance the field in characterizing the discussions within these communities. We aim to provide these resources to facilitate their investigations without the need for extensive data collection and processing efforts.

5/17/2024

cs.SI cs.CY cs.IR