Classifying Human-Generated and AI-Generated Election Claims in Social Media

2404.16116

Published 4/29/2024 by Alphaeus Dmonte, Marcos Zampieri, Kevin Lybarger, Massimiliano Albanese, Genya Coulter

Abstract

Politics is one of the most prevalent topics discussed on social media platforms, particularly during major election cycles, where users engage in conversations about candidates and electoral processes. Malicious actors may use this opportunity to disseminate misinformation to undermine trust in the electoral process. The emergence of Large Language Models (LLMs) exacerbates this issue by enabling malicious actors to generate misinformation at an unprecedented scale. Artificial intelligence (AI)-generated content is often indistinguishable from authentic user content, raising concerns about the integrity of information on social networks. In this paper, we present a novel taxonomy for characterizing election-related claims. This taxonomy provides an instrument for analyzing election-related claims, with granular categories related to jurisdiction, equipment, processes, and the nature of claims. We introduce ElectAI, a novel benchmark dataset that consists of 9,900 tweets, each labeled as human- or AI-generated. For AI-generated tweets, the specific LLM variant that produced them is specified. We annotated a subset of 1,550 tweets using the proposed taxonomy to capture the characteristics of election-related claims. We explored the capabilities of LLMs in extracting the taxonomy attributes and trained various machine learning models using ElectAI to distinguish between human- and AI-generated posts and identify the specific LLM variant.

Overview

The paper explores the issue of misinformation on social media, particularly during election cycles, and how the emergence of Large Language Models (LLMs) can exacerbate this problem.
The researchers present a novel taxonomy for characterizing election-related claims, which can be used to analyze the nature and characteristics of these claims.
They introduce a benchmark dataset called ElectAI that consists of tweets labeled as human- or AI-generated, along with the specific LLM variant that produced the AI-generated tweets.
The paper investigates the capabilities of LLMs in extracting the taxonomy attributes and trains various machine learning models to distinguish between human- and AI-generated posts, as well as identify the specific LLM variant.

Plain English Explanation

Social media platforms have become a popular forum for discussing political topics, especially during elections. Malicious actors may take advantage of this by spreading misinformation to undermine trust in the electoral process. The rise of large language models (LLMs) makes it easier for these actors to generate misinformation at scale, and the resulting AI-generated content can be difficult to distinguish from authentic user-generated content.

To address this issue, the researchers have developed a new way to categorize and analyze election-related claims. They have created a detailed taxonomy that covers different aspects of the electoral process, such as jurisdiction, equipment, and the nature of the claims. This taxonomy serves as a tool for understanding the characteristics of these claims.

The researchers have also created a dataset called ElectAI, which contains 9,900 tweets labeled as either human-generated or AI-generated, with the specific LLM variant identified for each AI-generated tweet. This dataset can be used to train machine learning models to tell human-written posts from AI-generated ones and to identify the LLM responsible.

By developing these tools, the researchers hope to provide a way to better understand and address the problem of AI-generated content in the context of elections, which can have significant implications for the integrity of the democratic process.

Technical Explanation

The researchers present a novel taxonomy for characterizing election-related claims, which includes categories related to jurisdiction, equipment, processes, and the nature of the claims. This taxonomy serves as a framework for analyzing the content and characteristics of these claims.
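To make the shape of such an annotation concrete, here is a minimal sketch of how one annotated tweet might be represented. The four field groups follow the categories named above; the class name, the field types, and every example value are assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClaimAnnotation:
    """One taxonomy annotation for a tweet (hypothetical layout)."""

    tweet_text: str
    jurisdiction: Optional[str] = None                   # assumed value, e.g. "county"
    equipment: List[str] = field(default_factory=list)   # assumed value, e.g. ["voting machine"]
    processes: List[str] = field(default_factory=list)   # assumed value, e.g. ["vote counting"]
    claim_nature: Optional[str] = None                   # assumed value, e.g. "fraud"

    def filled_attributes(self) -> List[str]:
        """Return the names of the taxonomy attributes that carry a value."""
        names = []
        if self.jurisdiction:
            names.append("jurisdiction")
        if self.equipment:
            names.append("equipment")
        if self.processes:
            names.append("processes")
        if self.claim_nature:
            names.append("claim_nature")
        return names


# Invented example tweet and labels, for illustration only.
example = ClaimAnnotation(
    tweet_text="Machines in our county flipped votes during counting.",
    jurisdiction="county",
    equipment=["voting machine"],
    processes=["vote counting"],
    claim_nature="fraud",
)
print(example.filled_attributes())
```

Keeping each category group as its own field makes it straightforward to count how often a given attribute is present across an annotated subset.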

The researchers introduce the ElectAI benchmark dataset, which consists of 9,900 tweets, each labeled as either human-generated or AI-generated. For the AI-generated tweets, the specific LLM variant that produced them is also identified. A subset of 1,550 tweets was annotated using the proposed taxonomy to capture the characteristics of the election-related claims.
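The labeling scheme described above (a human/AI label plus a generator name only for AI tweets) can be sketched as a small record type with a consistency check. This is an illustrative layout, not the dataset's actual file format: the class name, the field names, and the placeholder generator names `llm_a` and `llm_b` are all assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TweetRecord:
    """One labeled tweet (hypothetical schema)."""

    text: str
    source: str                       # "human" or "ai"
    generator: Optional[str] = None   # LLM variant name, only set when source == "ai"

    def __post_init__(self):
        # Enforce the rule that only AI-generated tweets name a generator.
        if self.source not in ("human", "ai"):
            raise ValueError(f"unknown source label: {self.source!r}")
        if self.source == "ai" and not self.generator:
            raise ValueError("AI-generated records must name a generator")
        if self.source == "human" and self.generator is not None:
            raise ValueError("human records must not name a generator")


def label_counts(records: List[TweetRecord]) -> Counter:
    """Count records per source label."""
    return Counter(r.source for r in records)


# Invented records with placeholder generator names.
records = [
    TweetRecord("waited two hours to vote today", "human"),
    TweetRecord("Election officials count every ballot carefully.", "ai", "llm_a"),
    TweetRecord("Voting machines are tested before each election.", "ai", "llm_b"),
]
print(label_counts(records))
```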

The researchers then explore the capabilities of LLMs in extracting the taxonomy attributes from the tweets and train various machine learning models using the ElectAI dataset. These models are designed to distinguish between human- and AI-generated posts, as well as identify the specific LLM variant responsible for the AI-generated content.
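This summary does not say which model families the paper trained, so the following is only a stand-in to show the general shape of the human-versus-AI classification task: a tiny multinomial Naive Bayes text classifier written with the standard library. The class name, the whitespace tokenizer, and the four training sentences are invented for illustration and say nothing about the paper's actual models or results.

```python
import math
from collections import Counter, defaultdict
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: List[str], labels: List[str]) -> "NaiveBayesTextClassifier":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods per token.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(text):
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training data; real experiments would use the labeled tweets.
train_texts = [
    "lol cant believe the line at my polling place today",
    "my ballot got mailed late smh",
    "It is important to verify election results through official sources",
    "Election officials ensure that every ballot is counted accurately",
]
train_labels = ["human", "human", "ai", "ai"]

clf = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(clf.predict("officials verify that election results are counted"))
print(clf.predict("smh the line at my polling place"))
```

Because the classifier treats labels as opaque strings, the same code could be fit a second time on AI-generated posts alone, with generator names as labels, to sketch the variant-identification task.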

The findings of this research have implications for detecting and mitigating the spread of AI-generated misinformation on social media platforms, particularly during critical events like elections.

Critical Analysis

The researchers have presented a comprehensive approach to addressing the issue of AI-generated misinformation in the context of election-related discussions on social media. The development of the taxonomy and the ElectAI dataset are valuable contributions to the field, as they provide a structured way to analyze and categorize these claims.

However, the paper does not delve into the potential limitations or biases that may be present in the dataset or the machine learning models. It would be helpful to understand how the dataset was curated, the demographic and geographic representation of the tweets, and any potential biases that may have been introduced during the annotation process.

Additionally, the paper does not discuss the feasibility of deploying these models in real-world scenarios, where the volume and complexity of social media data may pose significant challenges. It would be interesting to see the researchers explore the scalability and robustness of their approach in a more practical setting.

Overall, the research presented in this paper is a promising step towards addressing the crucial problem of AI-generated misinformation and its impact on the integrity of democratic processes. Further research and collaboration with social media platforms and election authorities could help refine and strengthen the proposed solutions.

Conclusion

This paper tackles the growing issue of misinformation on social media, particularly during election cycles, and how the emergence of Large Language Models (LLMs) can exacerbate this problem. The researchers have developed a novel taxonomy for characterizing election-related claims and introduced a benchmark dataset called ElectAI to support their research.

The findings presented in this paper have the potential to contribute to the development of more effective tools for detecting and mitigating AI-generated misinformation on social media platforms. By providing a structured approach to analyzing election-related claims and training machine learning models to distinguish between human- and AI-generated content, the researchers have taken an important step towards safeguarding the integrity of the democratic process.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Charting the Landscape of Nefarious Uses of Generative Artificial Intelligence for Online Election Interference

Emilio Ferrara

Generative Artificial Intelligence (GenAI) and Large Language Models (LLMs) pose significant risks, particularly in the realm of online election interference. This paper explores the nefarious applications of GenAI, highlighting their potential to disrupt democratic processes through deepfakes, botnets, targeted misinformation campaigns, and synthetic identities.

6/5/2024

cs.CY

Quantifying Generative Media Bias with a Corpus of Real-world and Generated News Articles

Filip Trhlik, Pontus Stenetorp

Large language models (LLMs) are increasingly being utilised across a range of tasks and domains, with a burgeoning interest in their application within the field of journalism. This trend raises concerns due to our limited understanding of LLM behaviour in this domain, especially with respect to political bias. Existing studies predominantly focus on LLMs undertaking political questionnaires, which offers only limited insights into their biases and operational nuances. To address this gap, our study establishes a new curated dataset that contains 2,100 human-written articles and utilises their descriptions to generate 56,700 synthetic articles using nine LLMs. This enables us to analyse shifts in properties between human-authored and machine-generated articles, with this study focusing on political bias, detecting it using both supervised models and LLMs. Our findings reveal significant disparities between base and instruction-tuned LLMs, with instruction-tuned models exhibiting consistent political bias. Furthermore, we are able to study how LLMs behave as classifiers, observing their display of political bias even in this role. Overall, for the first time within the journalistic domain, this study outlines a framework and provides a structured dataset for quantifiable experiments, serving as a foundation for further research into LLM political bias and its implications.

6/18/2024

cs.CL cs.AI

Seeing Through AI's Lens: Enhancing Human Skepticism Towards LLM-Generated Fake News

Navid Ayoobi, Sadat Shahriar, Arjun Mukherjee

LLMs offer valuable capabilities, yet they can be utilized by malicious users to disseminate deceptive information and generate fake news. The growing prevalence of LLMs poses difficulties in crafting detection approaches that remain effective across various text domains. Additionally, the absence of precautionary measures for AI-generated news on online social platforms is concerning. Therefore, there is an urgent need to improve people's ability to differentiate between news articles written by humans and those produced by LLMs. By providing cues in human-written and LLM-generated news, we can help individuals increase their skepticism towards fake LLM-generated news. This paper aims to elucidate simple markers that help individuals distinguish between articles penned by humans and those created by LLMs. To achieve this, we initially collected a dataset comprising 39k news articles authored by humans or generated by four distinct LLMs with varying degrees of fake. We then devise a metric named Entropy-Shift Authorship Signature (ESAS) based on the information theory and entropy principles. The proposed ESAS ranks terms or entities, like POS tagging, within news articles based on their relevance in discerning article authorship. We demonstrate the effectiveness of our metric by showing the high accuracy attained by a basic method, i.e., TF-IDF combined with logistic regression classifier, using a small set of terms with the highest ESAS score. Consequently, we introduce and scrutinize these top ESAS-ranked terms to aid individuals in strengthening their skepticism towards LLM-generated fake news.

6/21/2024

cs.CL cs.AI

AI-Generated Faces in the Real World: A Large-Scale Case Study of Twitter Profile Images

Jonas Ricker, Dennis Assenmacher, Thorsten Holz, Asja Fischer, Erwin Quiring

Recent advances in the field of generative artificial intelligence (AI) have blurred the lines between authentic and machine-generated content, making it almost impossible for humans to distinguish between such media. One notable consequence is the use of AI-generated images for fake profiles on social media. While several types of disinformation campaigns and similar incidents have been reported in the past, a systematic analysis has been lacking. In this work, we conduct the first large-scale investigation of the prevalence of AI-generated profile pictures on Twitter. We tackle the challenges of a real-world measurement study by carefully integrating various data sources and designing a multi-stage detection pipeline. Our analysis of nearly 15 million Twitter profile pictures shows that 0.052% were artificially generated, confirming their notable presence on the platform. We comprehensively examine the characteristics of these accounts and their tweet content, and uncover patterns of coordinated inauthentic behavior. The results also reveal several motives, including spamming and political amplification campaigns. Our research reaffirms the need for effective detection and mitigation strategies to cope with the potential negative effects of generative AI in the future.

4/23/2024

cs.CR cs.AI cs.CY cs.LG cs.SI