ConspEmoLLM: Conspiracy Theory Detection Using an Emotion-Based Large Language Model

arXiv:2403.06765

Published 5/20/2024 by Zhiwei Liu, Boyang Liu, Paul Thompson, Kailai Yang, Sophia Ananiadou

Abstract

The internet has brought both benefits and harms to society. A prime example of the latter is misinformation, including conspiracy theories, which flood the web. Recent advances in natural language processing, particularly the emergence of large language models (LLMs), have improved the prospects of accurate misinformation detection. However, most LLM-based approaches to conspiracy theory detection focus only on binary classification and fail to account for the important relationship between misinformation and affective features (i.e., sentiment and emotions). Driven by a comprehensive analysis of conspiracy text that reveals its distinctive affective features, we propose ConspEmoLLM, the first open-source LLM that integrates affective information and is able to perform diverse tasks relating to conspiracy theories. These tasks include not only conspiracy theory detection, but also classification of theory type and detection of related discussion (e.g., opinions towards theories). ConspEmoLLM is fine-tuned based on an emotion-oriented LLM using our novel ConDID dataset, which includes five tasks to support LLM instruction tuning and evaluation. We demonstrate that when applied to these tasks, ConspEmoLLM largely outperforms several open-source general domain LLMs and ChatGPT, as well as an LLM that has been fine-tuned using ConDID, but which does not use affective features. This project will be released on https://github.com/lzw108/ConspEmoLLM/.

Overview

This paper proposes ConspEmoLLM, an open-source large language model for conspiracy theory detection that integrates affective information (sentiment and emotions).
The researchers aim to go beyond binary classification: the model also classifies the type of theory and detects related discussion, such as opinions towards theories.
The model is fine-tuned from an emotion-oriented LLM on the authors' ConDID dataset, which includes five tasks for instruction tuning and evaluation.

Plain English Explanation

The researchers have developed a new way to automatically detect conspiracy theories online. Instead of only making a yes/no call on the wording, their model also takes into account the emotions expressed in the text. The idea is that conspiracy theories often have a distinct emotional "signature": they may contain more anger, fear, or distrust, for example.

By fine-tuning an emotion-aware language model on their ConDID dataset, which covers five conspiracy-related tasks, the researchers have taught it to recognize these emotional patterns. When the model encounters new text, it can use the emotions it picks up to help predict whether the text is likely to be a conspiracy theory. It can also do more than a yes/no check, such as classifying the type of theory or detecting related discussion (for example, opinions towards a theory).

The researchers believe this emotion-based approach will be more effective at catching conspiracy theories, especially as they evolve and change over time. It could be a valuable tool for platforms and fact-checkers trying to identify and limit the spread of misinformation online.

Technical Explanation

The researchers developed ConspEmoLLM, a large language model fine-tuned for tasks relating to conspiracy theories. Rather than treating detection as a purely binary classification problem, the model builds on an emotion-oriented LLM so that affective information (sentiment and emotions) informs its predictions.

The model was fine-tuned on ConDID, a dataset the authors built that includes five tasks to support LLM instruction tuning and evaluation. The tasks named in the abstract are conspiracy theory detection, classification of theory type, and detection of related discussion (e.g., opinions towards theories).
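
As a rough illustration of what instruction-tuning data for several conspiracy-related tasks might look like, the sketch below turns a labelled text into an instruction/output pair. The task keys, prompt wording, and the `Record` and `build_record` names are hypothetical and chosen only for this example; the actual ConDID format is not described in this summary.

```python
from dataclasses import dataclass

# Hypothetical prompt templates, one per task. The wording is an assumption
# made for illustration; it is not taken from the ConDID dataset.
TEMPLATES = {
    "detection": "Task: Decide whether the text promotes a conspiracy theory. Answer yes or no.",
    "theory_type": "Task: Name the type of conspiracy theory the text refers to.",
    "related_discussion": "Task: Decide whether the text discusses a conspiracy theory. Answer yes or no.",
}


@dataclass
class Record:
    """One instruction-tuning example: a prompt and the expected answer."""
    instruction: str
    output: str


def build_record(task: str, text: str, label: str) -> Record:
    """Combine a task template, the input text, and its label into one record."""
    if task not in TEMPLATES:
        raise KeyError(f"unknown task: {task}")
    instruction = f"{TEMPLATES[task]}\nText: {text.strip()}\nAnswer:"
    return Record(instruction=instruction, output=label)


if __name__ == "__main__":
    record = build_record("detection", "  The moon landing was staged.  ", "yes")
    print(record.instruction)
    print(record.output)
```

Keeping all tasks in one prompt/answer format is what lets a single model be tuned on them together; only the template changes from task to task.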

Affective information enters through the base model: ConspEmoLLM is fine-tuned from an emotion-oriented LLM rather than a general-domain one. To isolate the effect of this choice, the researchers also compare against an LLM that is fine-tuned on the same ConDID data but does not use affective features.

According to the abstract, ConspEmoLLM largely outperforms several open-source general-domain LLMs and ChatGPT on the ConDID tasks, as well as an LLM fine-tuned on ConDID without affective features. The researchers argue this demonstrates the value of modeling the affective characteristics of conspiracy narratives.
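
Comparisons like this are typically made by scoring each model's predictions against gold labels with a classification metric. This summary does not say which metric the paper reports, so the sketch below assumes macro-averaged F1, a common choice; the `macro_f1` function and the toy labels are illustrative only and are not results from the paper.

```python
from typing import Sequence


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    labels = sorted(set(gold) | set(pred))
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); a class with no TP, FP, or FN scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the paper.
    gold = ["yes", "no", "yes", "no"]
    pred = ["yes", "no", "no", "no"]
    print(round(macro_f1(gold, pred), 4))
```

Macro averaging weights every class equally, which matters when conspiracy texts are a minority of the data and plain accuracy would reward always predicting the majority class.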

Critical Analysis

The researchers make a compelling case for the benefits of an emotion-based approach to conspiracy theory detection. By modeling the emotional rather than just the lexical content, the model can potentially be more robust to evolving conspiracy narratives that may not contain the same keyword triggers.

However, the researchers acknowledge that their dataset and evaluation are limited to a particular domain and language (English). More work is needed to validate the approach on a wider range of conspiracy theories and languages. There may also be cultural or contextual factors that influence the emotional signatures of conspiracy theories that the current model does not capture.

Additionally, while the emotion-based approach is promising, it is not a panacea. Conspiracy theories can still be conveyed in veiled or subtle ways that may not have a clear emotional "tell." The researchers mention this as a limitation, and further work is needed to address more sophisticated attempts at obfuscation.

Overall, the ConspEmoLLM model represents an important step forward in using more sophisticated natural language processing techniques to combat the spread of conspiracy theories and misinformation online. But as with any such system, continued refinement, testing, and thoughtful deployment will be crucial to ensuring it is effective and does not inadvertently cause harm.

Conclusion

This paper presents a new approach to detecting conspiracy theories online that goes beyond binary classification. By fine-tuning an emotion-oriented large language model on the ConDID dataset, the researchers have developed a more nuanced and potentially more robust system for identifying this type of misinformation.

While more work is needed to validate the approach across different domains and languages, the emotion-based ConspEmoLLM model shows promise as a tool for platforms, fact-checkers, and researchers trying to combat the spread of conspiracy theories. By looking beyond just the lexical content of text, this research demonstrates how advanced natural language processing can be leveraged to address complex societal challenges.

This summary was produced with help from an AI and may contain inaccuracies; consult the original source documents.

Related Papers

EmoLLMs: A Series of Emotional Large Language Models and Annotation Tools for Comprehensive Affective Analysis

Zhiwei Liu, Kailai Yang, Tianlin Zhang, Qianqian Xie, Sophia Ananiadou

Sentiment analysis and emotion detection are important research topics in natural language processing (NLP) and benefit many downstream tasks. With the widespread application of LLMs, researchers have started exploring the application of LLMs based on instruction-tuning in the field of sentiment analysis. However, these models only focus on single aspects of affective classification tasks (e.g. sentimental polarity or categorical emotions), and overlook the regression tasks (e.g. sentiment strength or emotion intensity), which leads to poor performance in downstream tasks. The main reason is the lack of comprehensive affective instruction tuning datasets and evaluation benchmarks, which cover various affective classification and regression tasks. Moreover, although emotional information is useful for downstream tasks, existing downstream datasets lack high-quality and comprehensive affective annotations. In this paper, we propose EmoLLMs, the first series of open-sourced instruction-following LLMs for comprehensive affective analysis based on fine-tuning various LLMs with instruction data, the first multi-task affective analysis instruction dataset (AAID) with 234K data samples based on various classification and regression tasks to support LLM instruction tuning, and a comprehensive affective evaluation benchmark (AEB) with 14 tasks from various sources and domains to test the generalization ability of LLMs. We propose a series of EmoLLMs by fine-tuning LLMs with AAID to solve various affective instruction tasks. We compare our model with a variety of LLMs on AEB, where our models outperform all other open-sourced LLMs, and surpass ChatGPT and GPT-4 in most tasks, which shows that the series of EmoLLMs achieve the ChatGPT-level and GPT-4-level generalization capabilities on affective analysis tasks, and demonstrates our models can be used as affective annotation tools.

6/19/2024

cs.CL

Detection of Conspiracy Theories Beyond Keyword Bias in German-Language Telegram Using Large Language Models

Milena Pustet, Elisabeth Steffen, Helena Mihaljević

The automated detection of conspiracy theories online typically relies on supervised learning. However, creating respective training data requires expertise, time and mental resilience, given the often harmful content. Moreover, available datasets are predominantly in English and often keyword-based, introducing a token-level bias into the models. Our work addresses the task of detecting conspiracy theories in German Telegram messages. We compare the performance of supervised fine-tuning approaches using BERT-like models with prompt-based approaches using Llama2, GPT-3.5, and GPT-4 which require little or no additional training data. We use a dataset of $\sim 4{,}000$ messages collected during the COVID-19 pandemic, without the use of keyword filters. Our findings demonstrate that both approaches can be leveraged effectively: For supervised fine-tuning, we report an F1 score of $\sim 0.8$ for the positive class, making our model comparable to recent models trained on keyword-focused English corpora. We demonstrate our model's adaptability to intra-domain temporal shifts, achieving F1 scores of $\sim 0.7$. Among prompting variants, the best model is GPT-4, achieving an F1 score of $\sim 0.8$ for the positive class in a zero-shot setting and equipped with a custom conspiracy theory definition.

4/30/2024

cs.CL cs.AI

RAEmoLLM: Retrieval Augmented LLMs for Cross-Domain Misinformation Detection Using In-Context Learning based on Emotional Information

Zhiwei Liu, Kailai Yang, Qianqian Xie, Christine de Kock, Sophia Ananiadou, Eduard Hovy

Misinformation is prevalent in various fields such as education, politics, health, etc., causing significant harm to society. However, current methods for cross-domain misinformation detection rely on time- and resource-consuming fine-tuning and complex model structures. With the outstanding performance of LLMs, many studies have employed them for misinformation detection. Unfortunately, they focus on in-domain tasks and do not incorporate significant sentiment and emotion features (which we jointly call affect). In this paper, we propose RAEmoLLM, the first retrieval augmented (RAG) LLMs framework to address cross-domain misinformation detection using in-context learning based on affective information. It accomplishes this by applying an emotion-aware LLM to construct a retrieval database of affective embeddings. This database is used by our retrieval module to obtain source-domain samples, which are subsequently used for the inference module's in-context few-shot learning to detect target domain misinformation. We evaluate our framework on three misinformation benchmarks. Results show that RAEmoLLM achieves significant improvements compared to the zero-shot method on three datasets, with the highest increases of 20.69%, 23.94%, and 39.11% respectively. This work will be released on https://github.com/lzw108/RAEmoLLM.

6/18/2024

cs.CL

Evaluating the Efficacy of Large Language Models in Detecting Fake News: A Comparative Analysis

Sahas Koka, Anthony Vuong, Anish Kataria

In an era increasingly influenced by artificial intelligence, the detection of fake news is crucial, especially in contexts like election seasons where misinformation can have significant societal impacts. This study evaluates the effectiveness of various LLMs in identifying and filtering fake news content. Utilizing a comparative analysis approach, we tested four large LLMs -- GPT-4, Claude 3 Sonnet, Gemini Pro 1.0, and Mistral Large -- and two smaller LLMs -- Gemma 7B and Mistral 7B. By using fake news dataset samples from Kaggle, this research not only sheds light on the current capabilities and limitations of LLMs in fake news detection but also discusses the implications for developers and policymakers in enhancing AI-driven informational integrity.

6/12/2024

cs.CL cs.AI