Towards Weakly-Supervised Hate Speech Classification Across Datasets

Read original: arXiv:2305.02637 - Published 5/28/2024 by Yiping Jin, Leo Wanner, Vishakha Laxman Kadam, Alexander Shvets

Overview

  • Current research on hate speech (HS) recognition is hindered by inconsistent data creation strategies and varying annotation methods.
  • This leads to poor generalization of supervised learning models, as they struggle to perform well on datasets they weren't trained on.
  • The authors propose using extremely weak supervision, which relies only on class names rather than on annotated samples, to address this issue.

Plain English Explanation

The paper explains that current research on detecting hate speech online has a major problem: the data used to train these systems is created and annotated in very different ways from one dataset to the next. As a result, a model trained on one dataset often performs poorly on data that was collected or labeled differently.

To solve this, the researchers suggest an approach called "extremely weak supervision." Instead of training the hate speech detection model on annotated examples, they give it only the category names (like "hate speech" or "not hate speech"). This allows the model to learn patterns without being overly influenced by the specific way a given dataset was collected and labeled.

The paper reports that this weak supervision approach is effective both in in-dataset settings and in cross-dataset settings, where the evaluation data follows a different labeling scheme. This helps address the challenge of building hate speech detection systems whose results hold up, and can be compared, across datasets labeled under different taxonomies.

Technical Explanation

The paper proposes using a weakly-supervised text classification model to address the generalization issues in hate speech (HS) recognition. Rather than relying on detailed annotations of HS examples, the approach only uses the class names (e.g. "hate speech," "not hate speech") during training.
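To make the idea of training from class names alone more concrete, here is a minimal sketch. It is not the model evaluated in the paper: it swaps in a much simpler technique (pseudo-labeling unlabeled texts that mention a class name, then fitting a small Naive Bayes classifier on those pseudo-labels). The class names `hateful` and `friendly`, the toy corpus, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical class names: the only supervision the sketch receives.
CLASS_NAMES = ["hateful", "friendly"]

# Toy unlabeled corpus (invented for illustration, no gold labels).
UNLABELED = [
    "that hateful comment attacked and insulted the new players",
    "hateful replies insulted the referee all night",
    "a friendly message thanked and welcomed the new players",
    "friendly fans welcomed the referee after the match",
    "they insulted everyone in the chat",
    "she thanked and welcomed everyone in the chat",
]


def tokenize(text):
    return text.lower().split()


def pseudo_label(corpus, class_names):
    """Label a text only if it mentions exactly one class name."""
    labelled = []
    for text in corpus:
        tokens = set(tokenize(text))
        hits = [name for name in class_names if name in tokens]
        if len(hits) == 1:
            labelled.append((text, hits[0]))
    return labelled


def train(labelled, class_names):
    """Collect per-class word counts from the pseudo-labelled texts."""
    word_counts = {name: Counter() for name in class_names}
    doc_counts = Counter()
    for text, label in labelled:
        word_counts[label].update(tokenize(text))
        doc_counts[label] += 1
    vocab = set().union(*word_counts.values())
    return word_counts, doc_counts, vocab


def predict(text, model):
    """Naive Bayes with add-one smoothing; unseen words are ignored."""
    word_counts, doc_counts, vocab = model
    n_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        n_words = sum(counts.values())
        score = math.log((doc_counts[label] + 1) / (n_docs + len(word_counts)))
        for token in tokenize(text):
            if token in vocab:
                score += math.log((counts[token] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


pseudo = pseudo_label(UNLABELED, CLASS_NAMES)
model = train(pseudo, CLASS_NAMES)
print(predict("they insulted the visitors", model))  # hateful
print(predict("he welcomed the visitors", model))    # friendly
```

Neither test sentence contains a class name; the classifier relies on words that co-occurred with the class names in the unlabeled corpus. Because no dataset-specific annotations are used, nothing in the sketch depends on how any particular dataset was labeled.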

The authors evaluate this approach in both in-dataset settings (the model is evaluated on the same dataset it was fit on) and cross-dataset settings (the model is evaluated on a different dataset), demonstrating its effectiveness compared to traditional supervised learning. They also conduct a quantitative and qualitative analysis to understand the sources of poor generalizability in HS classification models.
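The in-dataset versus cross-dataset comparison can be pictured as a grid in which every dataset serves once as the source and once as the target. The sketch below shows one way to compute such a grid. The dataset names, the `hs`/`not_hs` labels, the macro-F1 metric, and the majority-class baseline are all assumptions chosen for illustration; the source summary does not specify the paper's metric or datasets.

```python
from collections import Counter
from itertools import product


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def evaluate_grid(datasets, fit, predict):
    """Fit on each source's train split, score on each target's test split.

    Pairs with source == target are in-dataset; all others are cross-dataset.
    """
    results = {}
    for source, target in product(datasets, repeat=2):
        model = fit(datasets[source]["train"])
        texts, gold = zip(*datasets[target]["test"])
        pred = [predict(model, text) for text in texts]
        results[(source, target)] = macro_f1(gold, pred)
    return results


# Stand-in classifier: always predicts the majority label of the train split.
def fit_majority(train_split):
    return Counter(label for _, label in train_split).most_common(1)[0][0]


def predict_majority(model, text):
    return model


# Hypothetical datasets with placeholder texts and different label balances.
datasets = {
    "dataset_a": {
        "train": [("t1", "hs"), ("t2", "hs"), ("t3", "not_hs")],
        "test": [("t4", "hs"), ("t5", "not_hs")],
    },
    "dataset_b": {
        "train": [("u1", "not_hs"), ("u2", "not_hs"), ("u3", "hs")],
        "test": [("u4", "not_hs"), ("u5", "not_hs"), ("u6", "hs")],
    },
}

results = evaluate_grid(datasets, fit_majority, predict_majority)
for (source, target), score in sorted(results.items()):
    setting = "in-dataset" if source == target else "cross-dataset"
    print(f"{source} -> {target} ({setting}): macro-F1 = {score:.3f}")
```

Any pair of `fit` and `predict` callables can be passed to `evaluate_grid`, so a supervised model and a weakly-supervised one could be compared on the same grid.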

The findings suggest that this weakly-supervised technique can help overcome the challenges posed by diverging annotation schemes and unsystematic data creation in current HS research.

Critical Analysis

The paper presents a promising solution to the generalization issues in hate speech detection, but it is important to consider some potential limitations and areas for further research:

  • The experiments were conducted on a limited set of datasets, so more extensive testing is needed to fully validate the approach across a wider range of HS taxonomies and cultural contexts.

  • The qualitative analysis provides insights, but a more systematic investigation of the sources of poor generalizability would be valuable.

Additionally, the authors do not address the potential risks of using weakly-supervised techniques, such as the possible amplification of biases present in the training data. Further research is needed to ensure these models do not inadvertently reinforce harmful stereotypes or discriminatory patterns.

Conclusion

This paper offers a novel approach to improving the generalization of hate speech detection models by using extremely weak supervision. The findings suggest this technique can help overcome the limitations of current HS research, which is hindered by inconsistent data creation and annotation practices.

While more extensive testing is needed, the proposed method shows promise in enabling hate speech detection systems to perform well across diverse datasets and cultural contexts. This could be an important step towards developing more robust and responsible HS moderation tools.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Towards Weakly-Supervised Hate Speech Classification Across Datasets

Yiping Jin, Leo Wanner, Vishakha Laxman Kadam, Alexander Shvets

As pointed out by several scholars, current research on hate speech (HS) recognition is characterized by unsystematic data creation strategies and diverging annotation schemata. Subsequently, supervised-learning models tend to generalize poorly to datasets they were not trained on, and the performance of the models trained on datasets labeled using different HS taxonomies cannot be compared. To ease this problem, we propose applying extremely weak supervision that only relies on the class name rather than on class samples from the annotated data. We demonstrate the effectiveness of a state-of-the-art weakly-supervised text classification model in various in-dataset and cross-dataset settings. Furthermore, we conduct an in-depth quantitative and qualitative analysis of the source of poor generalizability of HS classification models.

5/28/2024

Empirical Evaluation of Public HateSpeech Datasets

Sadar Jaf, Basel Barakat

Despite the extensive communication benefits offered by social media platforms, numerous challenges must be addressed to ensure user safety. One of the most significant risks faced by users on these platforms is targeted hate speech. Social media platforms are widely utilised for generating datasets employed in training and evaluating machine learning algorithms for hate speech detection. However, existing public datasets exhibit numerous limitations, hindering the effective training of these algorithms and leading to inaccurate hate speech classification. This study provides a comprehensive empirical evaluation of several public datasets commonly used in automated hate speech classification. Through rigorous analysis, we present compelling evidence highlighting the limitations of current hate speech datasets. Additionally, we conduct a range of statistical analyses to elucidate the strengths and weaknesses inherent in these datasets. This work aims to advance the development of more accurate and reliable machine learning models for hate speech detection by addressing the dataset limitations identified.

7/18/2024

Trustworthy Hate Speech Detection Through Visual Augmentation

Ziyuan Yang, Ming Yan, Yingyu Chen, Hui Wang, Zexin Lu, Yi Zhang

The surge of hate speech on social media platforms poses a significant challenge, with hate speech detection (HSD) becoming increasingly critical. Current HSD methods focus on enriching contextual information to enhance detection performance, but they overlook the inherent uncertainty of hate speech. We propose a novel HSD method, named trustworthy hate speech detection method through visual augmentation (TrusV-HSD), which enhances semantic information through integration with diffused visual images and mitigates uncertainty with trustworthy loss. TrusV-HSD learns semantic representations by effectively extracting trustworthy information through multi-modal connections without paired data. Our experiments on public HSD datasets demonstrate the effectiveness of TrusV-HSD, showing remarkable improvements over conventional methods.

9/23/2024

Investigating Annotator Bias in Large Language Models for Hate Speech Detection

Amit Das, Zheng Zhang, Fatemeh Jamshidi, Vinija Jain, Aman Chadha, Nilanjana Raychawdhary, Mary Sandage, Lauramarie Pope, Gerry Dozier, Cheryl Seals

Data annotation, the practice of assigning descriptive labels to raw data, is pivotal in optimizing the performance of machine learning models. However, it is a resource-intensive process susceptible to biases introduced by annotators. The emergence of sophisticated Large Language Models (LLMs) like ChatGPT presents a unique opportunity to modernize and streamline this complex procedure. While existing research extensively evaluates the efficacy of LLMs as annotators, this paper delves into the biases present in LLMs, specifically GPT 3.5 and GPT 4o, when annotating hate speech data. Our research contributes to understanding biases in four key categories: gender, race, religion, and disability. Specifically targeting highly vulnerable groups within these categories, we analyze annotator biases. Furthermore, we conduct a comprehensive examination of potential factors contributing to these biases by scrutinizing the annotated data. We introduce our custom hate speech detection dataset, HateSpeechCorpus, to conduct this research. Additionally, we perform the same experiments on the ETHOS (Mollas et al., 2022) dataset for comparative analysis. This paper serves as a crucial resource, guiding researchers and practitioners in harnessing the potential of LLMs for data annotation, thereby fostering advancements in this critical field. The HateSpeechCorpus dataset is available here: https://github.com/AmitDasRup123/HateSpeechCorpus

6/19/2024