NLP for Counterspeech against Hate: A Survey and How-To Guide

Read original: arXiv:2403.20103 - Published 4/1/2024 by Helena Bonaldi, Yi-Ling Chung, Gavin Abercrombie, Marco Guerini

Introduction

Online spaces can enable the spread of hateful content, which is often linked to real-world violence. Witnessing or receiving hateful content can harm mental health and leave victims feeling insecure, which highlights the need to mitigate online hate.

Counterspeech is a promising strategy for opposing online hate: it can be more effective than other moderation approaches while also protecting free speech. Numerous non-governmental organizations have explored counterspeech as a way to combat online hate.

The paper aims to provide a step-by-step guide to conducting Natural Language Processing (NLP) research on counterspeech. To this end, it extensively reviews existing NLP studies and resources, proposes common concepts and best practices, and identifies the limitations and open challenges in the field.

The guide is structured in three main parts: task design, data selection, and evaluation. The appendix provides details on the review methodology used.

Background

This section gives an overview of definitions, strategies, and related tasks for counterspeech. Key points:

Definitions describe counterspeech as a non-aggressive textual response to hate speech that draws on evidence, factual arguments, and alternative viewpoints. Counterspeech is relational: it challenges or condemns hate speech and offers an alternative perspective, with the aim of discouraging hate speech and changing views.

Common counterspeech strategies include presenting facts, pointing out contradictions, warning of consequences, denouncing hateful speech, using humor or sarcasm, and adopting a particular tone. Some strategies, such as a hostile tone, can backfire, so guidelines recommend an empathetic, polite, and constructive tone.

Counterspeech is distinct from related tasks such as hope speech (constructive views that do not challenge hate) and online trolling (aggressive reactions posted for amusement). It overlaps with, but differs from, tasks such as addressing stereotypes, generating prosocial dialogue, and countering misinformation. Counter-argument generation is also related, but a logically valid counter-argument is not necessarily effective counterspeech.

Step 1: Design your task

This step covers three main types of counterspeech task:

Classification: Several studies address detecting counterspeech, classifying users as hateful users or counterspeakers, and identifying the strategies used in counterspeech. They use both traditional classifiers and neural models, and neural models often perform better. Common challenges include handling irony, sarcasm, and negation, and distinguishing counterspeech from other categories.
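As an illustration of the traditional end of this spectrum, here is a minimal sketch of a binary counterspeech detector. It uses multinomial Naive Bayes with add-one smoothing, a classic baseline chosen here for brevity rather than a method taken from any specific surveyed study; the class name `NaiveBayesDetector` and the six training sentences are invented toy examples, not data from a real counterspeech dataset.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and keep alphabetic tokens (apostrophes allowed)."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayesDetector:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / n_docs)  # log prior
            for w in tokenize(text):
                if w in self.vocab:  # skip words never seen in training
                    score += math.log((self.word_counts[c][w] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = c, score
        return best_label

# Hypothetical toy data: three counterspeech-like replies and three unrelated comments.
train_texts = [
    "that is not true, the statistics show otherwise",
    "please stop spreading hate, everyone deserves respect",
    "the facts do not support this claim",
    "lol nice game last night",
    "what time does the match start",
    "i love this song so much",
]
train_labels = ["counterspeech"] * 3 + ["other"] * 3

detector = NaiveBayesDetector().fit(train_texts, train_labels)
print(detector.predict("the statistics do not support hate"))  # → counterspeech
```

With so little data this only demonstrates the mechanics. Bag-of-words features like these cannot capture the irony, sarcasm, and negation that make the task hard, which is one reason neural models tend to do better.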

Selecting counterspeech: One approach is to select responses from a pool of pre-generated counterspeech. This can be more effective than filtering a dataset to extract counterspeech, as counterspeech is relatively rare compared to non-counterspeech.
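A minimal sketch of selection, assuming a small pool of pre-written responses and using bag-of-words cosine similarity between the hateful message and each candidate as the ranking score. Real systems would more likely use learned sentence embeddings or a trained ranker; the function `select_counterspeech`, the example pool, and the example message are hypothetical.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def select_counterspeech(hate_text, pool, k=1):
    """Return the k pool responses most lexically similar to the hateful message."""
    query = bag_of_words(hate_text)
    ranked = sorted(pool, key=lambda r: cosine(query, bag_of_words(r)), reverse=True)
    return ranked[:k]

# Hypothetical pool of pre-written counterspeech.
pool = [
    "Migrants contribute to the economy and pay taxes like everyone else.",
    "Women are just as capable as men in science and engineering.",
    "Judging an entire religion by the actions of a few is unfair.",
]
hate = "Migrants do not pay taxes and drain the economy."
print(select_counterspeech(hate, pool)[0])
# → Migrants contribute to the economy and pay taxes like everyone else.
```

Because the responses are written in advance, their quality can be checked before deployment; the trade-off is that the reply is only as specific as the best candidate in the pool.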

Generating counterspeech: Techniques for generating counterspeech include incorporating relevant knowledge, matching the personality or style of the response, and using fine-tuning or prompting. Key aspects to consider are providing accurate information, showing empathy, and using appropriate strategies and tone. Generation can also leverage translation to address low-resource languages.
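For the prompting route, the considerations above (accurate knowledge, an empathetic tone, an explicit strategy) can be written directly into the prompt. The sketch below only assembles the prompt string; the `GenerationRequest` fields and the prompt wording are illustrative assumptions, the facts are placeholders assumed to come from an external knowledge source, and the call to an actual language model is deliberately left out.

```python
from dataclasses import dataclass, field

@dataclass
class GenerationRequest:
    hate_speech: str
    strategy: str                                        # e.g. "presenting facts"
    knowledge: list[str] = field(default_factory=list)  # facts from an external source (assumed)

def build_prompt(req: GenerationRequest) -> str:
    """Assemble a prompt stating the strategy, the tone, and the grounding knowledge."""
    facts = "\n".join(f"- {fact}" for fact in req.knowledge) or "- (none provided)"
    return (
        "Write a short counterspeech reply to the message below.\n"
        f"Strategy: {req.strategy}.\n"
        "Tone: empathetic, polite and constructive; do not insult the author.\n"
        "Only use the facts listed below and do not invent new ones.\n"
        f"Facts:\n{facts}\n"
        f"Message: {req.hate_speech}\n"
        "Reply:"
    )

prompt = build_prompt(GenerationRequest(
    hate_speech="<hateful message>",
    strategy="presenting facts",
    knowledge=["<fact retrieved from a trusted source>"],
))
print(prompt)
```

Restricting the model to supplied facts is one way to reduce the risk of hallucinated content, though an instruction in the prompt does not guarantee factual output.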

Step 2: Select the data

The paper discusses the choice of whether to collect a new dataset or use an existing one for counterspeech research. It summarizes the main data collection procedures and outlines the characteristics of available counterspeech datasets.

The paper outlines five main data collection approaches: crawling real counterspeech from online sources, crowdsourcing simulated responses, nichesourcing data from experts, hybrid approaches combining automated and manual methods, and fully automated generation of synthetic counterspeech. Each approach has trade-offs in terms of data quantity, conformity to guidelines, diversity, and non-ephemerality.

The paper then describes the available counterspeech datasets, categorizing them by the shape of the interactions (single comments, pairs, dialogues), the targets of hate addressed, the types of hate speech covered, the languages represented, and any additional metadata provided. Key factors to consider when choosing a dataset include the size, number of counterspeech instances, collection procedure, and any supplementary information relevant to the specific research goals.
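To compare datasets along these dimensions, one option is to map each of them onto a shared instance format. The schema below is a hypothetical sketch, not the format of any released dataset: its fields simply mirror the dimensions listed above (interaction shape, target, type of hate, language, metadata), and the angle-bracket strings are placeholders for real text.

```python
from dataclasses import dataclass, field

@dataclass
class CounterspeechInstance:
    turns: list[str]        # one comment, a hate/counterspeech pair, or a whole dialogue
    shape: str              # "single", "pair", or "dialogue"
    target: str             # targeted group, e.g. "migrants"
    hate_type: str          # e.g. "explicit" or "implicit"
    language: str           # language code, e.g. "en"
    is_counterspeech: bool  # whether the instance contains counterspeech
    metadata: dict = field(default_factory=dict)  # extra annotations, e.g. strategy labels

def select_instances(data, language, shapes, targets=None):
    """Keep instances matching the language, interaction shapes and (optionally) targets."""
    return [
        x for x in data
        if x.language == language
        and x.shape in shapes
        and (targets is None or x.target in targets)
    ]

def counterspeech_ratio(data):
    """Share of instances labelled as counterspeech (0.0 for an empty list)."""
    return sum(x.is_counterspeech for x in data) / len(data) if data else 0.0

data = [
    CounterspeechInstance(["<hate>", "<reply>"], "pair", "migrants", "explicit", "en", True),
    CounterspeechInstance(["<comment>"], "single", "women", "implicit", "en", False),
    CounterspeechInstance(["<hate>", "<reply>"], "pair", "migrants", "explicit", "it", True),
]
subset = select_instances(data, "en", {"pair", "dialogue"})
print(len(subset), counterspeech_ratio(data))
```

A ratio like this makes the "number of counterspeech instances" criterion explicit, which matters when counterspeech is sparse in crawled data.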

In conclusion, the choice between collecting new data or using existing datasets should be guided by the task design and the characteristics required. Existing datasets provide an efficient alternative, with the paper outlining the main dimensions to evaluate their suitability.

Step 3: Evaluate

This step summarizes approaches to evaluating counterspeech systems, which fall into classification and generation tasks.

For classification tasks, performance can be assessed using standard metrics like F1, precision, recall, and accuracy, as well as multi-label metrics like hamming loss. Human judgment can also be used to verify classifier performance, and qualitative error analysis can help understand model flaws.
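The standard metrics above reduce to simple counts. Below is a minimal sketch in plain Python, assuming gold and predicted labels are given as parallel lists and, for the multi-label case, as one set of strategy labels per instance; the label names are hypothetical. In practice a library such as scikit-learn provides equivalent, well-tested functions.

```python
def precision_recall_f1(gold, pred, positive):
    """Precision, recall and F1 for one positive class."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def accuracy(gold, pred):
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def hamming_loss(gold_sets, pred_sets, labels):
    """Fraction of (instance, label) slots where gold and prediction disagree."""
    errors = sum(len((g ^ p) & labels) for g, p in zip(gold_sets, pred_sets))
    return errors / (len(gold_sets) * len(labels))

gold = ["cs", "cs", "other", "other"]
pred = ["cs", "other", "cs", "other"]
print(precision_recall_f1(gold, pred, "cs"), accuracy(gold, pred))

strategies = {"facts", "humor", "denouncing"}
gold_sets = [{"facts"}, {"humor", "denouncing"}]
pred_sets = [{"facts", "humor"}, {"humor"}]
print(hamming_loss(gold_sets, pred_sets, strategies))
```

Hamming loss suits strategy identification because a reply can combine several strategies, so partial matches should count as partially correct.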

For generation tasks, evaluations can be extrinsic, measuring the system's real-world impact, or intrinsic, assessing the output itself. Intrinsic automatic metrics assess generated counterspeech against criteria such as surface-level (linguistic) overlap with reference examples, novelty, semantic similarity, and specific attributes such as toxicity, informativeness, and relevance. Human evaluation is also important: annotators are typically asked to judge responses on aspects such as suitability, specificity, grammaticality, coherence, and informativeness. While automatic metrics can evaluate at scale, human evaluation is considered more reliable given the complexity of hate mitigation.
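Two of these automatic criteria are easy to sketch. The first is surface overlap with a reference, here a clipped unigram F1 in the spirit of ROUGE-1. The second is novelty with respect to the training data, here 1 minus the highest word-level Jaccard similarity to any training response. Both are simplified stand-ins: established metric implementations should be preferred in practice, and published novelty formulations may differ from this one.

```python
import re
from collections import Counter

def tokens(text):
    return re.findall(r"[a-z']+", text.lower())

def unigram_f1(candidate, reference):
    """Clipped unigram-overlap F1 between a generated reply and a reference."""
    c, r = Counter(tokens(candidate)), Counter(tokens(reference))
    overlap = sum((c & r).values())  # Counter & keeps the minimum count per word
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0

def novelty(candidate, training_texts):
    """1 - max Jaccard similarity to the training texts: higher means less copied."""
    cand = set(tokens(candidate))
    best = max((jaccard(cand, set(tokens(t))) for t in training_texts), default=0.0)
    return 1.0 - best

print(unigram_f1("everyone deserves respect", "everyone deserves equal respect"))
print(novelty("everyone deserves respect", ["everyone deserves respect"]))
```

Neither score says anything about whether a reply is effective or respectful, which is why the text above treats human evaluation as the more reliable signal.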

Human evaluation often uses expert or trained annotators, but could also involve diverse annotators or individuals representing potential recipients of the counterspeech, like perpetrators or bystanders.

Open challenges

The paper highlights several key open challenges in counterspeech research:

Language and culture: Hate speech is linguistically and culturally specific, requiring tailored responses. For example, the same words can have different discriminatory connotations in different countries.

Sources of hate: The identity of the hate speech perpetrator, along with cultural and geographical factors, should be considered to produce counterspeech targeted at specific groups.

Types of hate: Most studies focus on explicit hate, while implicit biases and stereotypes pose additional challenges due to complex linguistic forms like sarcasm and humor.

Hallucinations: Counterspeech generation can produce factually incorrect content. Using external knowledge sources and detecting inaccuracies can help address this issue.

Evaluation: Existing metrics are limited. Creating test suites to assess different counterspeech strategies and gathering user perspectives could lead to more meaningful evaluation.

Biases in data collection: Choices made during data collection, such as the source platform or the annotators' backgrounds, can introduce biases that affect the content and quality of the counterspeech. Providing dataset cards can help mitigate these issues.
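The test-suite idea from the evaluation challenge can start as simply as a list of hateful inputs, each paired with a property the reply must satisfy. The sketch below assumes a generator exposed as a plain function from text to text. `SuiteCase`, `run_suite`, the keyword-based tone check, and the dummy generator are all hypothetical; a real suite would use more robust checks (for instance a trained toxicity classifier) and many cases per strategy.

```python
import re
from dataclasses import dataclass
from typing import Callable

@dataclass
class SuiteCase:
    name: str
    hate_speech: str
    check: Callable[[str], bool]  # property the generated reply must satisfy

def run_suite(generate, cases):
    """Run each case through the generator and record whether its check passes."""
    return {case.name: case.check(generate(case.hate_speech)) for case in cases}

INSULTS = {"idiot", "stupid"}  # hypothetical stand-in for a proper toxicity check

def words(text):
    return set(re.findall(r"[a-z']+", text.lower()))

cases = [
    SuiteCase("non-hostile tone", "Migrants are all criminals.",
              lambda reply: not (words(reply) & INSULTS)),
    SuiteCase("addresses the target", "Migrants are all criminals.",
              lambda reply: "migrants" in words(reply)),
]

def dummy_generate(hate_speech):
    # Hypothetical placeholder for a real counterspeech generator.
    return "Judging all migrants by the actions of a few is unfair."

print(run_suite(dummy_generate, cases))
```

Reporting results per property rather than as a single score makes it visible which strategy or quality aspect a system fails on.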

Conclusion

The paper offers a step-by-step guide for researchers approaching counterspeech from a natural language processing (NLP) perspective. It first frames the concept of counterspeech and distinguishes it from similar tasks. The following sections outline the steps to take when conducting counterspeech research in NLP, drawing on the literature to show the potential consequences of each choice. Finally, the paper highlights open challenges in the field. The authors emphasize that while counterspeech is a promising approach to online hate, NLP-based systems must be designed carefully to avoid unintended harm, and researchers must remain aware of the implications of their decisions when developing NLP-powered counterspeech solutions.

Limitations

The number of papers in this study may seem small, owing to the relatively limited attention the topic has received so far and to the specific focus on NLP papers that propose a dataset or a classification, selection, or generation task. The survey covered studies from Scopus, arXiv, and the ACL Anthology, following the methodology of previous surveys on abusive language in NLP. The keyword-based search may not have captured every available counterspeech study, but it is a reasonable compromise for searching such large databases. Since the authors already had experience in counterspeech research, their own list of studies was also included in the automated search process.

Ethical considerations

This section discusses the social consequences of engaging in counterspeech and the precautions to take when working on it. Researchers and annotators involved in counterspeech tasks should prioritize their mental well-being, since prolonged exposure to abusive content can harm it. Synthetic data can be a viable option for preserving user privacy, and using simple, stereotyped simulated hate speech can avoid potential negative outcomes. If real data is collected, however, researchers should make sure that doing so does not interfere with the online activities of counterspeakers. Finally, human supervision is still necessary when deploying generation systems in real-life scenarios, as the risks of hallucination and abusive output are still too high to fully automate counterspeech production.

Acknowledgements

Gavin Abercrombie received funding from the EPSRC project 'Equally Safe Online' (EP/W025493/1). Yi-Ling Chung received support from the Ecosystem Leadership Award under the EPSRC Grant EPX03870X1 and The Alan Turing Institute.

Appendix A

This section covers the methodology used to review the literature on counterspeech in natural language processing (NLP). The review follows the PRISMA framework and includes publicly available papers that present computational approaches to text-based tasks related to online counterspeech. The search was conducted across three databases - ACL Anthology, Scopus, and arXiv - using relevant keywords. After removing duplicates and filtering for NLP-related publications that presented data collection, generation, classification, or selection tasks, a total of 43 papers were included in the survey.
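The deduplication and screening stages of such a review can be sketched as two filters over the search results. This is a simplified illustration under stated assumptions: records are hypothetical dictionaries with `title`, `is_nlp`, and `tasks` fields, duplicates are detected by normalized title only, and the relevance judgements are assumed to be already recorded in those fields. The paper titles are placeholders.

```python
import re

ELIGIBLE_TASKS = {"data collection", "generation", "classification", "selection"}

def normalize_title(title):
    """Lowercase and keep only alphanumeric words, so minor formatting differences match."""
    return " ".join(re.findall(r"[a-z0-9]+", title.lower()))

def deduplicate(records):
    """Keep the first record for each normalized title (e.g. the same paper on arXiv and Scopus)."""
    seen, unique = set(), []
    for record in records:
        key = normalize_title(record["title"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

def screen(records):
    """Keep NLP papers that present at least one eligible task."""
    return [r for r in records if r["is_nlp"] and r["tasks"] & ELIGIBLE_TASKS]

records = [
    {"title": "Paper A: Counterspeech Generation", "source": "arXiv", "is_nlp": True, "tasks": {"generation"}},
    {"title": "Paper A - counterspeech generation", "source": "Scopus", "is_nlp": True, "tasks": {"generation"}},
    {"title": "Paper B", "source": "Scopus", "is_nlp": False, "tasks": {"classification"}},
    {"title": "Paper C", "source": "ACL Anthology", "is_nlp": True, "tasks": {"user study"}},
]
included = screen(deduplicate(records))
print([r["title"] for r in included])
```

Keeping each stage as a separate function makes it easy to report how many records each step removes, which is the kind of count a PRISMA flow diagram records.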

The paper also discusses different taxonomies of counterspeech strategies proposed in the literature, including those by Benesch et al., Qian et al., and Vidgen et al. These taxonomies categorize counterspeech based on various approaches, such as presenting facts, pointing out hypocrisy, warning of consequences, using a positive tone, identifying hate keywords, and expressing solidarity with target entities.

Additionally, the paper provides a non-exhaustive list of available datasets for counterspeech-related tasks, such as counter-trolling, hope speech, prosocial dialogue, detoxification, and misinformation countering. It also discusses the common practices for training annotators to recognize, post-edit, and write counterspeech, which typically involve reading guidelines, reviewing examples, practicing the task, and discussing disagreements with experts.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

NLP for Counterspeech against Hate: A Survey and How-To Guide

Helena Bonaldi, Yi-Ling Chung, Gavin Abercrombie, Marco Guerini

In recent years, counterspeech has emerged as one of the most promising strategies to fight online hate. These non-escalatory responses tackle online abuse while preserving the freedom of speech of the users, and can have a tangible impact in reducing online and offline violence. Recently, there has been growing interest from the Natural Language Processing (NLP) community in addressing the challenges of analysing, collecting, classifying, and automatically generating counterspeech, to reduce the huge burden of manually producing it. In particular, researchers have taken different directions in addressing these challenges, thus providing a variety of related tasks and resources. In this paper, we provide a guide for doing research on counterspeech, by describing - with detailed examples - the steps to undertake, and providing best practices that can be learnt from the NLP studies on this topic. Finally, we discuss open challenges and future directions of counterspeech research in NLP.

4/1/2024

Hostile Counterspeech Drives Users From Hate Subreddits

Daniel Hickey, Matheus Schmitz, Daniel M. T. Fessler, Paul E. Smaldino, Kristina Lerman, Goran Murić, Keith Burghardt

Counterspeech -- speech that opposes hate speech -- has gained significant attention recently as a strategy to reduce hate on social media. While previous studies suggest that counterspeech can somewhat reduce hate speech, little is known about its effects on participation in online hate communities, nor which counterspeech tactics reduce harmful behavior. We begin to address these gaps by identifying 25 large hate communities (subreddits) within Reddit and analyzing the effect of counterspeech on newcomers within these communities. We first construct a new public dataset of carefully annotated counterspeech and non-counterspeech comments within these subreddits. We use this dataset to train a state-of-the-art counterspeech detection model. Next, we use matching to evaluate the causal effects of hostile and non-hostile counterspeech on the engagement of newcomers in hate subreddits. We find that, while non-hostile counterspeech is ineffective at keeping users from fully disengaging from these hate subreddits, a single hostile counterspeech comment substantially reduces the future likelihood of engagement. While offering nuance to the understanding of counterspeech efficacy, these results a) leave unanswered the question of whether hostile counterspeech dissuades newcomers from participation in online hate writ large, or merely drives them into less-moderated and more extreme hate communities, and b) raise ethical considerations about hostile counterspeech, which is both comparatively common and might exacerbate rather than mitigate the net level of antagonism in society. These findings underscore the importance of future work to improve counterspeech tactics and minimize unintended harm.

5/29/2024

Hatred Stems from Ignorance! Distillation of the Persuasion Modes in Countering Conversational Hate Speech

Ghadi Alyahya, Abeer Aldayel

Examining the factors that counterspeech uses is at the core of understanding the optimal methods for confronting hate speech online. Various studies have assessed the emotion-based factors used in counterspeech, such as emotional empathy, offensiveness, and hostility. To better understand the counterspeech used in conversations, this study distills persuasion modes into reason, emotion, and credibility and evaluates their use in two types of conversation interactions: closed (multi-turn) and open (single-turn) concerning racism, sexism, and religious bigotry. The evaluation covers the distinct behaviors seen with human-sourced as opposed to machine-generated counterspeech. It also assesses the interplay between the stance taken and the mode of persuasion seen in the counterspeech. Notably, we observe nuanced differences in the counterspeech persuasion modes used in open and closed interactions, especially in terms of the topic, with a general tendency to use reason as a persuasion mode to express the counterpoint to hate comments. The machine-generated counterspeech tends to exhibit an emotional persuasion mode, while human counters lean toward reason. Furthermore, our study shows that reason tends to obtain more supportive replies than other persuasion modes. The findings highlight the potential for incorporating persuasion modes into studies about countering hate speech, as they can serve as an optimal means of explainability and pave the way for the further adoption of the reply's stance and the role it plays in assessing what comprises the optimal counterspeech.

7/17/2024

A Multi-Aspect Framework for Counter Narrative Evaluation using Large Language Models

Jaylen Jones, Lingbo Mo, Eric Fosler-Lussier, Huan Sun

Counter narratives - informed responses to hate speech contexts designed to refute hateful claims and de-escalate encounters - have emerged as an effective hate speech intervention strategy. While previous work has proposed automatic counter narrative generation methods to aid manual interventions, the evaluation of these approaches remains underdeveloped. Previous automatic metrics for counter narrative evaluation lack alignment with human judgment as they rely on superficial reference comparisons instead of incorporating key aspects of counter narrative quality as evaluation criteria. To address prior evaluation limitations, we propose a novel evaluation framework prompting LLMs to provide scores and feedback for generated counter narrative candidates using 5 defined aspects derived from guidelines from counter narrative specialized NGOs. We found that LLM evaluators achieve strong alignment to human-annotated scores and feedback and outperform alternative metrics, indicating their potential as multi-aspect, reference-free and interpretable evaluators for counter narrative evaluation.

4/1/2024