Mpox Narrative on Instagram: A Labeled Multilingual Dataset of Instagram Posts on Mpox for Sentiment, Hate Speech, and Anxiety Analysis

Read original: arXiv:2409.05292 - Published 9/20/2024 by Nirmalya Thakur

Overview

  • The paper presents a multilingual dataset of 60,127 Instagram posts about the mpox (monkeypox) outbreak, published between July 23, 2022, and September 5, 2024.
  • The dataset, available at https://dx.doi.org/10.21227/7fvc-y093, contains posts in 52 languages.
  • For each post, the dataset includes attributes such as Post ID, Post Description, Date of publication, Language, and an English translation.
  • The authors also performed sentiment analysis, hate speech detection, and anxiety/stress detection on the posts and included those results as separate attributes.

Plain English Explanation

This research paper focuses on social media mining during the recent mpox outbreak. The researchers developed a comprehensive dataset of over 60,000 Instagram posts related to mpox, which were published between July 2022 and September 2024.

The dataset includes posts in 52 different languages, and for each post, the researchers provide key information like the post's ID, text, publication date, and language. They also translated each post into English using the Google Translate API.

After compiling the dataset, the researchers performed several analyses on the posts. They used sentiment analysis to assign each post one of seven emotional classes, including fear, joy, and anger. They also checked each post for hate speech and for signs of anxiety or stress.

This comprehensive dataset and the accompanying analyses provide valuable insights into how people were discussing and reacting to the mpox outbreak on social media. Researchers and public health officials could use this information to better understand the public's concerns and responses during disease outbreaks.

Technical Explanation

The researchers first collected a dataset of 60,127 Instagram posts about the mpox outbreak, published between July 23, 2022, and September 5, 2024. These posts were written in 52 different languages.

For each post, the dataset includes the following attributes:

  • Post ID
  • Post Description
  • Date of publication
  • Language
  • English translation (using Google Translate API)
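
The five attributes above can be pictured as one typed record per post. The following is a minimal sketch only: the column names (`post_id`, `description`, `date`, `language`, `translation`), the ISO date format, and the two sample rows are assumptions made for illustration and are not taken from the published dataset files.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MpoxPost:
    """One dataset row; field names are hypothetical."""
    post_id: str
    description: str
    published: date
    language: str
    translation: str


# Illustrative two-row sample; not real data from the dataset.
SAMPLE_CSV = """post_id,description,date,language,translation
A1,Mpox cases are rising,2022-07-23,en,Mpox cases are rising
B2,La viruela del mono preocupa,2024-09-05,es,Monkeypox is worrying
"""


def load_posts(handle):
    """Parse CSV rows from a file-like object into MpoxPost records."""
    reader = csv.DictReader(handle)
    return [
        MpoxPost(
            post_id=row["post_id"],
            description=row["description"],
            published=date.fromisoformat(row["date"]),
            language=row["language"],
            translation=row["translation"],
        )
        for row in reader
    ]


posts = load_posts(io.StringIO(SAMPLE_CSV))
languages = sorted({p.language for p in posts})
print(len(posts), languages)
```

With the real files, the column names and date format would need to be adjusted to whatever the dataset actually uses.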

After compiling the dataset, the researchers performed several analyses on the posts:

  1. Sentiment Analysis: They classified each post into one of seven sentiment classes: fear, surprise, joy, sadness, anger, disgust, or neutral.
  2. Hate Speech Detection: They identified whether each post contained hate speech or not.
  3. Anxiety/Stress Detection: They determined whether each post indicated any signs of anxiety or stress.

The results of these analyses were included as additional attributes in the dataset.
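
One way to represent the three label attributes in code is sketched below. The seven sentiment class names come from the paper's abstract. The exact label strings stored in the dataset ("hate", "not hate", and so on) and the names `PostLabels` and `parse_labels` are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Sentiment(Enum):
    """The seven sentiment classes named in the paper."""
    FEAR = "fear"
    SURPRISE = "surprise"
    JOY = "joy"
    SADNESS = "sadness"
    ANGER = "anger"
    DISGUST = "disgust"
    NEUTRAL = "neutral"


@dataclass(frozen=True)
class PostLabels:
    sentiment: Sentiment
    hate: bool
    anxiety_stress: bool


# Assumed spellings of the binary labels; the dataset may use others.
HATE_VALUES = {"hate": True, "not hate": False}
ANXIETY_VALUES = {
    "anxiety/stress detected": True,
    "no anxiety/stress detected": False,
}


def parse_labels(sentiment, hate, anxiety):
    """Validate three raw label strings and return typed labels."""
    hate_key = hate.strip().lower()
    anxiety_key = anxiety.strip().lower()
    if hate_key not in HATE_VALUES:
        raise ValueError(f"unknown hate label: {hate!r}")
    if anxiety_key not in ANXIETY_VALUES:
        raise ValueError(f"unknown anxiety/stress label: {anxiety!r}")
    # Sentiment(...) raises ValueError for a name outside the seven classes.
    return PostLabels(
        sentiment=Sentiment(sentiment.strip().lower()),
        hate=HATE_VALUES[hate_key],
        anxiety_stress=ANXIETY_VALUES[anxiety_key],
    )


labels = parse_labels("Fear", "Not Hate", "Anxiety/Stress Detected")
print(labels)
```

Each post carries exactly one sentiment class and two independent binary flags, so a post can be, for example, both neutral in sentiment and flagged for anxiety or stress.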

The sentiment analysis revealed that 27.95% of the posts expressed fear, 2.57% expressed surprise, 8.69% expressed joy, 5.94% expressed sadness, 2.69% expressed anger, 1.53% expressed disgust, and 50.64% were neutral.

In terms of hate speech, 95.75% of the posts did not contain any hate speech, while 4.25% did.

Finally, the researchers found that 72.05% of the posts did not indicate any signs of anxiety or stress, while 27.95% did.
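
To get a feel for the scale behind these percentages, they can be converted back into approximate post counts out of 60,127. This is a rough sketch: the counts are reconstructed from percentages rounded to two decimals, so each may differ from the paper's exact figures by a few posts, and the function name `approx_counts` is made up for illustration.

```python
TOTAL_POSTS = 60_127

# Percentages as reported in the paper.
SENTIMENT_PCT = {
    "fear": 27.95,
    "surprise": 2.57,
    "joy": 8.69,
    "sadness": 5.94,
    "anger": 2.69,
    "disgust": 1.53,
    "neutral": 50.64,
}
HATE_PCT = {"not hate": 95.75, "hate": 4.25}
ANXIETY_PCT = {"no anxiety/stress": 72.05, "anxiety/stress": 27.95}


def approx_counts(pct_by_label, total=TOTAL_POSTS):
    """Convert percentage shares into approximate post counts."""
    return {label: round(total * pct / 100) for label, pct in pct_by_label.items()}


sentiment_counts = approx_counts(SENTIMENT_PCT)
hate_counts = approx_counts(HATE_PCT)
anxiety_counts = approx_counts(ANXIETY_PCT)

# The sentiment shares sum to 100.01 rather than 100 because each was rounded.
sentiment_total_pct = sum(SENTIMENT_PCT.values())

print(sentiment_counts["fear"], hate_counts["hate"], anxiety_counts["anxiety/stress"])
```

Under these assumptions, roughly 16,805 posts were labeled fear and roughly 2,555 were labeled hate. The reported shares for fear and for anxiety/stress are both 27.95%.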

Critical Analysis

The authors provide a dataset and analysis that can help other researchers and public health officials better understand how people discussed and reacted to the mpox outbreak on social media.

One potential limitation of the research is that the dataset is limited to Instagram posts, which may not represent the full breadth of online conversations about mpox. It would be useful to expand the analysis to other social media platforms, such as Twitter or Reddit, to get a more comprehensive understanding of the public's response.

Additionally, the researchers did not provide much information about the accuracy or reliability of the sentiment analysis, hate speech detection, and anxiety/stress detection methods used. It would be helpful to have more details on the performance of these algorithms, as well as any potential biases or limitations.
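
As an illustration of the kind of performance reporting that would address this concern, the sketch below scores a binary classifier (for example, hate versus not hate) against a small hand-labeled sample using precision, recall, and F1. The function name `binary_metrics` and the sample labels are hypothetical. This is not the evaluation procedure used in the paper.

```python
def binary_metrics(gold, predicted):
    """Compute precision, recall, and F1 for boolean labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)        # true positives
    fp = sum(1 for g, p in pairs if not g and p)    # false positives
    fn = sum(1 for g, p in pairs if g and not p)    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical hand-annotated sample (True = hate) and model output.
gold = [True, False, False, True, False, False]
predicted = [True, False, True, False, False, False]

metrics = binary_metrics(gold, predicted)
print(metrics)
```

Because only 4.25% of posts were labeled as hate, per-class metrics like these would be more informative than plain accuracy, which a classifier could inflate by predicting "not hate" for every post.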

Despite these minor concerns, the dataset and analyses presented in this paper represent an important contribution to the field of social media mining and disease outbreak research. Researchers and public health officials can use this information to better understand the public's concerns and reactions during future disease outbreaks, potentially informing their communication and response strategies.

Conclusion

This research paper presents a comprehensive dataset of over 60,000 Instagram posts about the mpox outbreak, along with detailed analyses of the sentiment, hate speech, and anxiety/stress expressed in those posts.

The dataset, which is publicly available, provides valuable insights into how people were discussing and reacting to the mpox outbreak on social media. According to the reported results, 27.95% of the posts expressed fear and the same share indicated anxiety or stress, while 4.25% contained hate speech.

Researchers and public health officials can use this work to better understand the public's response to disease outbreaks and develop more effective communication strategies. Social media data and analysis of this kind can reveal the concerns and sentiments of the general public, which can in turn inform public health policies and interventions.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Mpox Narrative on Instagram: A Labeled Multilingual Dataset of Instagram Posts on Mpox for Sentiment, Hate Speech, and Anxiety Analysis

Nirmalya Thakur

The world is currently experiencing an outbreak of mpox, which has been declared a Public Health Emergency of International Concern by WHO. No prior work related to social media mining has focused on the development of a dataset of Instagram posts about the mpox outbreak. The work presented in this paper aims to address this research gap and makes two scientific contributions to this field. First, it presents a multilingual dataset of 60,127 Instagram posts about mpox, published between July 23, 2022, and September 5, 2024. The dataset, available at https://dx.doi.org/10.21227/7fvc-y093, contains Instagram posts about mpox in 52 languages. For each of these posts, the Post ID, Post Description, Date of publication, language, and translated version of the post (translation to English was performed using the Google Translate API) are presented as separate attributes in the dataset. After developing this dataset, sentiment analysis, hate speech detection, and anxiety or stress detection were performed. This process included classifying each post into (i) one of the sentiment classes, i.e., fear, surprise, joy, sadness, anger, disgust, or neutral, (ii) hate or not hate, and (iii) anxiety/stress detected or no anxiety/stress detected. These results are presented as separate attributes in the dataset. Second, this paper presents the results of performing sentiment analysis, hate speech analysis, and anxiety or stress analysis. The variation of the sentiment classes - fear, surprise, joy, sadness, anger, disgust, and neutral were observed to be 27.95%, 2.57%, 8.69%, 5.94%, 2.69%, 1.53%, and 50.64%, respectively. In terms of hate speech detection, 95.75% of the posts did not contain hate and the remaining 4.25% of the posts contained hate. Finally, 72.05% of the posts did not indicate any anxiety/stress, and the remaining 27.95% of the posts represented some form of anxiety/stress.

9/20/2024

OPSD: an Offensive Persian Social media Dataset and its baseline evaluations

Mehran Safayani, Amir Sartipi, Amir Hossein Ahmadi, Parniyan Jalali, Amir Hossein Mansouri, Mohammad Bisheh-Niasar, Zahra Pourbahman

The proliferation of hate speech and offensive comments on social media has become increasingly prevalent due to user activities. Such comments can have detrimental effects on individuals' psychological well-being and social behavior. While numerous datasets in the English language exist in this domain, few equivalent resources are available for Persian language. To address this gap, this paper introduces two offensive datasets. The first dataset comprises annotations provided by domain experts, while the second consists of a large collection of unlabeled data obtained through web crawling for unsupervised learning purposes. To ensure the quality of the former dataset, a meticulous three-stage labeling process was conducted, and kappa measures were computed to assess inter-annotator agreement. Furthermore, experiments were performed on the dataset using state-of-the-art language models, both with and without employing masked language modeling techniques, as well as machine learning algorithms, in order to establish the baselines for the dataset using contemporary cutting-edge approaches. The obtained F1-scores for the three-class and two-class versions of the dataset were 76.9% and 89.9% for XLM-RoBERTa, respectively.

4/9/2024

Characterizing Online Toxicity During the 2022 Mpox Outbreak: A Computational Analysis of Topical and Network Dynamics

Lizhou Fan, Lingyao Li, Libby Hemphill

Background: Online toxicity, encompassing behaviors such as harassment, bullying, hate speech, and the dissemination of misinformation, has become a pressing social concern in the digital age. The 2022 Mpox outbreak, initially termed Monkeypox but subsequently renamed to mitigate associated stigmas and societal concerns, serves as a poignant backdrop to this issue. Objective: In this research, we undertake a comprehensive analysis of the toxic online discourse surrounding the 2022 Mpox outbreak. Our objective is to dissect its origins, characterize its nature and content, trace its dissemination patterns, and assess its broader societal implications, with the goal of providing insights that can inform strategies to mitigate such toxicity in future crises. Methods: We collected more than 1.6 million unique tweets and analyzed them from five dimensions, including context, extent, content, speaker, and intent. Utilizing BERT-based topic modeling and social network community clustering, we delineated the toxic dynamics on Twitter. Results: We identified five high-level topic categories in the toxic online discourse on Twitter, including disease (46.6%), health policy and healthcare (19.3%), homophobia (23.9%), politics (6.0%), and racism (4.1%). Through the toxicity diffusion networks of mentions, retweets, and the top users, we found that retweets of toxic content were widespread, while influential users rarely engaged with or countered this toxicity through retweets. Conclusions: By tracking topical dynamics, we can track the changing popularity of toxic content online, providing a better understanding of societal challenges. Network dynamics spotlight key social media influencers and their intents, indicating that addressing these central figures in toxic discourse can enhance crisis communication and inform policy-making.

9/4/2024

A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and Other Sources about the 2024 Outbreak of Measles

Nirmalya Thakur, Vanessa Su, Mingchen Shao, Kesha A. Patel, Hongseok Jeong, Victoria Knieling, Andrew Bian

The work of this paper presents a dataset that contains the data of 4011 videos about the ongoing outbreak of measles published on 264 websites on the internet between January 1, 2024, and May 31, 2024. The dataset is available at https://dx.doi.org/10.21227/40s8-xf63. These websites primarily include YouTube and TikTok, which account for 48.6% and 15.2% of the videos, respectively. The remainder of the websites include Instagram and Facebook as well as the websites of various global and local news organizations. For each of these videos, the URL of the video, title of the post, description of the post, and the date of publication of the video are presented as separate attributes in the dataset. After developing this dataset, sentiment analysis (using VADER), subjectivity analysis (using TextBlob), and fine-grain sentiment analysis (using DistilRoBERTa-base) of the video titles and video descriptions were performed. This included classifying each video title and video description into (i) one of the sentiment classes i.e. positive, negative, or neutral, (ii) one of the subjectivity classes i.e. highly opinionated, neutral opinionated, or least opinionated, and (iii) one of the fine-grain sentiment classes i.e. fear, surprise, joy, sadness, anger, disgust, or neutral. These results are presented as separate attributes in the dataset for the training and testing of machine learning algorithms for performing sentiment analysis or subjectivity analysis in this field as well as for other applications. Finally, this paper also presents a list of open research questions that may be investigated using this dataset.

7/19/2024