Unveiling Online Conspiracy Theorists: a Text-Based Approach and Characterization

2405.12566

Published 5/22/2024 by Alessandra Recordare, Guglielmo Cola, Tiziano Fagni, Maurizio Tesconi


Abstract

In today's digital landscape, the proliferation of conspiracy theories within the disinformation ecosystem of online platforms represents a growing concern. This paper delves into the complexities of this phenomenon. We conducted a comprehensive analysis of two distinct X (formerly known as Twitter) datasets: one comprising users with conspiracy theorizing patterns and another made of users lacking such tendencies and thus serving as a control group. The distinguishing factors between these two groups are explored across three dimensions: emotions, idioms, and linguistic features. Our findings reveal marked differences in the lexicon and language adopted by conspiracy theorists with respect to other users. We developed a machine learning classifier capable of identifying users who propagate conspiracy theories based on a rich set of 871 features. The results demonstrate high accuracy, with an average F1 score of 0.88. Moreover, this paper unveils the most discriminating characteristics that define conspiracy theory propagators.


Overview

This paper examines the spread of conspiracy theories on online platforms and how the users who propagate them differ from other users.
The researchers analyzed two datasets from X (formerly known as Twitter): one of users who exhibit conspiracy-theorizing patterns and a control group of users who do not.
They looked at differences in emotions, idioms, and linguistic features between the two groups.
The study developed a machine learning classifier, built on 871 features, that identifies users who spread conspiracy theories with an average F1 score of 0.88.

Plain English Explanation

The paper explores the growing problem of conspiracy theories spreading on online platforms. The researchers looked at two groups of users: those who frequently post conspiracy theories, and a control group who don't.

They found that the conspiracy theorists use very different language compared to other users. This includes distinct emotions, common phrases, and overall language patterns. By analyzing these differences, the researchers built a machine learning model that can reliably detect which users are likely to spread conspiracy theories.

The key insight is that it's not just the specific conspiracy theories themselves that set these users apart. Their fundamental way of communicating, from the words they choose to the feelings they express, differs markedly from that of the control group. This suggests conspiracy theorists may inhabit a separate "information ecosystem" online.

Technical Explanation

The researchers conducted a comprehensive analysis of two X (formerly Twitter) datasets. One contained users exhibiting conspiracy theorizing patterns, while the other served as a control group without such tendencies.

They examined these two groups across three dimensions: emotions, idioms, and general linguistic features. The findings reveal significant differences in the language and lexicon used by conspiracy theorists compared to other users.
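The summary does not list the 871 features or the statistical tests used, so the following is only a minimal sketch, assuming tweets are aggregated per user. It maps each user's tweets to a handful of features touching the three dimensions (emotions, idioms, linguistic features) and then ranks features by how strongly they separate the two groups. The lexicons EMOTION_LEXICON and IDIOMS, the helpers user_features, cohens_d and rank_features, and the choice of Cohen's d as the separation measure are hypothetical illustrations, not the authors' method.

```python
import re
from math import sqrt
from statistics import mean, variance

# Hypothetical mini-lexicons, for illustration only; the paper's actual
# emotion and idiom resources are not described in this summary.
EMOTION_LEXICON = {
    "fear": {"afraid", "danger", "threat", "scared"},
    "anger": {"outrage", "lies", "furious", "corrupt"},
}
IDIOMS = ["wake up", "do your own research", "connect the dots"]


def user_features(tweets: list[str]) -> dict[str, float]:
    """Aggregate one user's tweets into a small, hypothetical feature dict."""
    text = " ".join(tweets).lower()
    words = re.findall(r"[a-z']+", text)
    n_words = max(len(words), 1)      # avoid division by zero
    n_tweets = max(len(tweets), 1)
    # Emotion dimension: share of words found in each emotion lexicon.
    feats = {f"emo_{name}": sum(w in lex for w in words) / n_words
             for name, lex in EMOTION_LEXICON.items()}
    # Idiom dimension: idiom occurrences per tweet.
    feats["idiom_rate"] = sum(text.count(p) for p in IDIOMS) / n_tweets
    # Linguistic dimension: two simple surface statistics.
    feats["avg_word_len"] = sum(map(len, words)) / n_words
    feats["exclaim_per_tweet"] = text.count("!") / n_tweets
    return feats


def cohens_d(a: list[float], b: list[float]) -> float:
    """Standardized mean difference with pooled sample SD (needs >= 2 values per group)."""
    na, nb = len(a), len(b)
    pooled = sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled if pooled > 0 else 0.0


def rank_features(group_a: list[dict[str, float]],
                  group_b: list[dict[str, float]]) -> list[tuple[str, float]]:
    """Rank features by |Cohen's d| between two groups of per-user feature dicts."""
    scores = {k: cohens_d([u[k] for u in group_a], [u[k] for u in group_b])
              for k in group_a[0]}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

In use, each user in both datasets would be passed through user_features, and rank_features would be applied to the two resulting lists; features at the top of the ranking are the ones whose group means differ most relative to their spread.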

Building on this, the researchers developed a machine learning classifier capable of identifying users who propagate conspiracy theories. This model leverages a rich set of 871 features and performs well, reaching an average F1 score of 0.88. The paper also uncovers the most discriminating characteristics that define conspiracy theory propagators.
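The summary does not name the classifier or say what the F1 score is averaged over (cross-validation folds, classes, or both). As a hedged illustration only, the sketch below swaps in a plain logistic regression trained by full-batch gradient descent on per-user feature vectors, and computes F1 as 2TP / (2TP + FP + FN) per class together with its macro average over the two classes. The names train_logreg, predict, f1 and macro_f1 are hypothetical; the paper's actual model and evaluation protocol may differ.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X: list[list[float]], y: list[int],
                 lr: float = 0.1, epochs: int = 500) -> tuple[list[float], float]:
    """Fit logistic regression by full-batch gradient descent on log-loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one user: p(y=1 | x) - true label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        # Average the gradient over users and take one descent step.
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w: list[float], b: float, X: list[list[float]]) -> list[int]:
    """Label 1 (hypothetically: conspiracy propagator) when the logit is positive."""
    return [int(sum(wj * xj for wj, xj in zip(w, xi)) + b > 0) for xi in X]


def f1(y_true: list[int], y_pred: list[int], positive: int) -> float:
    """F1 = 2TP / (2TP + FP + FN) for the given positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def macro_f1(y_true: list[int], y_pred: list[int]) -> float:
    """Unweighted mean of the per-class F1 scores for labels 0 and 1."""
    return (f1(y_true, y_pred, 0) + f1(y_true, y_pred, 1)) / 2
```

For example, with y_true = [1, 1, 0, 0] and y_pred = [1, 0, 0, 0], F1 is 2/3 for the positive class and 0.8 for the negative class, so the macro average is about 0.73.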

Critical Analysis

The paper provides valuable insights into the language and communication patterns of conspiracy theorists online. However, it's important to note that the research is limited to X (formerly Twitter) data, and the findings may not extend to other social media platforms or offline contexts.

Additionally, while the machine learning model achieves high accuracy, it's unclear how well it would generalize to new data or handle the evolving nature of conspiracy theories. There may also be concerns around the potential misuse of such a model, such as unfairly targeting or silencing certain users.

Further research is needed to understand the underlying psychological and social factors that drive the spread of conspiracy theories, as well as the broader implications for online discourse and information ecosystems.

Conclusion

This paper sheds light on the distinct communication patterns of conspiracy theorists on online platforms. By analyzing differences in emotions, idioms, and linguistic features, the researchers developed a highly accurate machine learning model to identify users who propagate conspiracy theories.

The findings suggest that conspiracy theorists may inhabit a separate "information ecosystem" online, with their own unique way of expressing themselves. This has important implications for understanding and addressing the proliferation of misinformation and conspiracy theories in the digital age.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers


Detection of Conspiracy Theories Beyond Keyword Bias in German-Language Telegram Using Large Language Models

Milena Pustet, Elisabeth Steffen, Helena Mihaljević

The automated detection of conspiracy theories online typically relies on supervised learning. However, creating respective training data requires expertise, time and mental resilience, given the often harmful content. Moreover, available datasets are predominantly in English and often keyword-based, introducing a token-level bias into the models. Our work addresses the task of detecting conspiracy theories in German Telegram messages. We compare the performance of supervised fine-tuning approaches using BERT-like models with prompt-based approaches using Llama2, GPT-3.5, and GPT-4 which require little or no additional training data. We use a dataset of ~4,000 messages collected during the COVID-19 pandemic, without the use of keyword filters. Our findings demonstrate that both approaches can be leveraged effectively: For supervised fine-tuning, we report an F1 score of ~0.8 for the positive class, making our model comparable to recent models trained on keyword-focused English corpora. We demonstrate our model's adaptability to intra-domain temporal shifts, achieving F1 scores of ~0.7. Among prompting variants, the best model is GPT-4, achieving an F1 score of ~0.8 for the positive class in a zero-shot setting and equipped with a custom conspiracy theory definition.

4/30/2024

cs.CL cs.AI

ConspEmoLLM: Conspiracy Theory Detection Using an Emotion-Based Large Language Model

Zhiwei Liu, Boyang Liu, Paul Thompson, Kailai Yang, Sophia Ananiadou

The internet has brought both benefits and harms to society. A prime example of the latter is misinformation, including conspiracy theories, which flood the web. Recent advances in natural language processing, particularly the emergence of large language models (LLMs), have improved the prospects of accurate misinformation detection. However, most LLM-based approaches to conspiracy theory detection focus only on binary classification and fail to account for the important relationship between misinformation and affective features (i.e., sentiment and emotions). Driven by a comprehensive analysis of conspiracy text that reveals its distinctive affective features, we propose ConspEmoLLM, the first open-source LLM that integrates affective information and is able to perform diverse tasks relating to conspiracy theories. These tasks include not only conspiracy theory detection, but also classification of theory type and detection of related discussion (e.g., opinions towards theories). ConspEmoLLM is fine-tuned based on an emotion-oriented LLM using our novel ConDID dataset, which includes five tasks to support LLM instruction tuning and evaluation. We demonstrate that when applied to these tasks, ConspEmoLLM largely outperforms several open-source general domain LLMs and ChatGPT, as well as an LLM that has been fine-tuned using ConDID, but which does not use affective features. This project will be released on https://github.com/lzw108/ConspEmoLLM/.

5/20/2024

cs.CL

Recontextualized Knowledge and Narrative Coalitions on Telegram

Tom Willaert

A defining characteristic of conspiracy texts is that they negotiate power and identity by recontextualizing prior knowledge. This dynamic has been shown to intensify on social media, where knowledge sources can readily be integrated into antagonistic narratives through hyperlinks. The objective of the present chapter is to further our understanding of this dynamic by surfacing and examining 1) how online conspiracy narratives recontextualize prior knowledge by coupling it with heterogeneous antagonistic elements, and 2) how such recontextualizing narratives operate as connectors around which diverse actors might form narrative coalitions. To this end, the chapter offers an empirical analysis of links to prior knowledge in public messaging channels from the Pushshift Telegram dataset. Using transferable methods from the field of bibliometrics, we find that politically extreme Telegram channels engage with a variety of established knowledge sources, including scientific journals, scientific repositories and other sources associated with the system of scholarly communication. Channels engaging with shared knowledge sources thereby form narrative coalitions ranging from scientific and technological imaginaries to far-right extremist and antisemitic conspiracy theories. Our analysis of these coalitions reveals (i) linguistic, political, and thematic forces that shape conspiracy narratives, (ii) emerging ideological, epistemological and ontological positions associated with online conspiracism, and (iii) how references to shared knowledge contribute to the communicability of conspiracy narratives.

4/30/2024

cs.CY


Verified authors shape X/Twitter discursive communities

Stefano Guarino, Ayoub Mounim, Guido Caldarelli, Fabio Saracco

Community detection algorithms try to extract a mesoscale structure from the available network data, generally avoiding any explicit assumption regarding the quantity and quality of information conveyed by specific sets of edges. In this paper, we show that the core of ideological/discursive communities on X/Twitter can be effectively identified by uncovering the most informative interactions in an authors-audience bipartite network through a maximum-entropy null model. The analysis is performed considering three X/Twitter datasets related to the main political events of 2022 in Italy, using as benchmarks four state-of-the-art algorithms (three descriptive, one inferential), and manually annotating nearly 300 verified users based on their political affiliation. In terms of information content, the communities obtained with the entropy-based algorithm are comparable to those obtained with some of the benchmarks. However, such a methodology on the authors-audience bipartite network: uses just a small sample of the available data to identify the central users of each community; returns a neater partition of the user set in just a few, easy to interpret, communities; clusters well-known political figures in a way that better matches the political alliances when compared with the benchmarks. Our results provide an important insight into online debates, highlighting that online interaction networks are mostly shaped by the activity of a small set of users who enjoy public visibility even outside social media.

5/9/2024

cs.SI cs.CY