LLMs left, right, and center: Assessing GPT's capabilities to label political bias from web domains

arXiv:2407.14344, published 7/22/2024 by Raphael Hernandes


Overview

This paper explores whether large language models (LLMs), specifically OpenAI's GPT-4, can label the political bias of news websites from their web domains alone.
The author tests whether GPT-4 can reproduce third-party bias ratings from Media Bias/Fact Check (MBFC) on a seven-point scale running from far-left to far-right.
The study compares GPT-4's classifications with MBFC's, controls for website popularity, and analyzes where the model agrees, disagrees, or declines to answer.

Plain English Explanation

The author wanted to see how well a large language model, GPT-4, could determine the political leanings of news websites when given nothing but their web addresses. The question was whether the AI could place a site on a scale from far-left through center to far-right in the same way human media-bias raters do.

To test this, GPT-4 was asked to rate thousands of news domains on that scale. Its answers were then compared with ratings from Media Bias/Fact Check, a third-party service commonly used in research, to see how closely the two agreed.

The key finding was that, for the sites it chose to rate, GPT-4's ratings lined up closely with MBFC's. However, the model declined to rate about two-thirds of the sites, mostly less popular and less biased ones. Its ratings also leaned slightly further left than MBFC's. This suggests GPT-4 could help classify news sources at scale, but only as a complement to human judgment.

Technical Explanation

The study tests whether GPT-4 can replicate human-generated bias ratings for news websites. Because political labels are subjective, research on news-source diversity often relies on third-party ratings such as those from Ad Fontes Media, AllSides, and Media Bias/Fact Check (MBFC). This study uses MBFC's ratings as the reference and asks GPT-4 to place each news domain on a seven-degree scale from far-left to far-right.

GPT-4 was given only the URL of each news source, not text extracted from the site, and asked to classify its political bias. The model's classifications were then compared against MBFC's ratings for the same sources.
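According to the paper's abstract, GPT-4 rated each news source from its URL alone on a seven-degree scale from far-left to far-right. The Python sketch below shows one way such a setup could be wired up. It illustrates the idea under stated assumptions and is not the author's code. The intermediate label names, the prompt wording in the hypothetical `build_prompt`, and the abstention phrases in `ABSTAIN_MARKERS` were all invented for this example. Replies map to integers from -3 (far-left) to +3 (far-right). Any abstention, or any reply that matches no label, becomes `None`.

```python
from typing import Optional

# Hypothetical label names for a seven-point scale, encoded as ordinal
# scores from -3 (far-left) to +3 (far-right). Not the paper's exact labels.
SCALE = {
    "far-left": -3,
    "left": -2,
    "center-left": -1,
    "center": 0,
    "center-right": 1,
    "right": 2,
    "far-right": 3,
}

# Hypothetical phrases treated as the model declining to classify a domain.
ABSTAIN_MARKERS = ("unknown", "cannot determine", "not sure", "insufficient information")


def build_prompt(domain: str) -> str:
    """Hypothetical URL-only prompt: the model sees the domain, not the site's text."""
    labels = ", ".join(SCALE)
    return (
        f"Rate the political bias of the news website {domain} "
        f"using exactly one of: {labels}. "
        "If you do not know the site, answer 'unknown'."
    )


def parse_rating(reply: str) -> Optional[int]:
    """Map a reply to its ordinal score, or None for abstentions and
    replies that do not match exactly one label."""
    text = reply.strip().lower().rstrip(".")
    if any(marker in text for marker in ABSTAIN_MARKERS):
        return None
    return SCALE.get(text)


if __name__ == "__main__":
    print(build_prompt("example.com"))
    for reply in ["Center-Right.", "unknown", "I cannot determine this site's bias."]:
        print(repr(reply), "->", parse_rating(reply))
```

Treating unparseable replies as abstentions keeps the sketch simple. A real study would likely log them separately, so that formatting failures are not mistaken for genuine refusals.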

For the sources GPT-4 rated, its classifications correlated strongly with MBFC's (Spearman's ρ = .89, n = 5,877, p < 0.001). Because the scale is ordinal, a rank correlation measures whether the two raters order sources the same way, rather than whether they assign identical labels. However, GPT-4 abstained from classifying roughly two-thirds of the dataset. Compared with MBFC, its classifications also showed a slight leftward skew.
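The abstract reports agreement as Spearman's rank correlation between GPT-4's and MBFC's ratings. That correlation is computed on the sources GPT-4 actually rated. The abstract also reports the share of sources it abstained on and a slight leftward skew. The minimal Python sketch below computes all three on ordinal scores from -3 (far-left) to +3 (far-right).

Spearman's ρ is the Pearson correlation of the two rank vectors, with tied scores given their average rank. Tie handling matters here because a seven-point scale produces many ties. The skew measure, the mean of the model's score minus the reference score, is an assumption for illustration; the paper may quantify skew differently. The toy numbers in the usage block are made up.

```python
from typing import List, Optional, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def compare(model: Sequence[Optional[int]], reference: Sequence[int]) -> Tuple[float, float, float]:
    """Return (abstention rate, Spearman's rho, mean signed difference),
    with rho and the difference computed only over sources the model rated."""
    pairs = [(m, r) for m, r in zip(model, reference) if m is not None]
    abstention = 1 - len(pairs) / len(model)
    rho = spearman([m for m, _ in pairs], [r for _, r in pairs])
    # Negative values mean the model places sources further left than the reference.
    skew = sum(m - r for m, r in pairs) / len(pairs)
    return abstention, rho, skew


if __name__ == "__main__":
    gpt_scores = [-2, None, 0, 1, None, 3]  # toy model ratings, None = abstained
    mbfc_scores = [-2, -1, 1, 1, 0, 3]      # toy reference ratings
    print(compare(gpt_scores, mbfc_scores))
```

For real analyses, `scipy.stats.spearmanr` computes the same tie-corrected coefficient and also returns a p-value.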

The analysis also controlled for website popularity using Open PageRank scores. The abstentions were not evenly spread: GPT-4 was particularly likely to decline to classify less popular sources and sources with less pronounced bias.
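The abstract notes that the analysis controlled for website popularity using Open PageRank scores, and that abstentions concentrated among less popular and less biased sources. A simple way to surface such a pattern is to group sources and compute the abstention rate per group, as in the Python sketch below. Several details are illustrative assumptions. These include the record layout and the popularity cut-offs in the hypothetical `popularity_tier`, which assume Open PageRank's 0 to 10 decimal score. The toy records are also invented. The paper's actual popularity control may well use a different method.

```python
from collections import defaultdict
from collections.abc import Callable, Hashable, Iterable
from typing import Dict, Optional, Tuple

# Hypothetical record: (domain, open_pagerank_score, mbfc_score, model_score or None)
Record = Tuple[str, float, int, Optional[int]]


def abstention_by(records: Iterable[Record],
                  key: Callable[[Record], Hashable]) -> Dict[Hashable, float]:
    """Share of sources in each group that the model declined to rate."""
    totals: Dict[Hashable, int] = defaultdict(int)
    abstained: Dict[Hashable, int] = defaultdict(int)
    for rec in records:
        group = key(rec)
        totals[group] += 1
        if rec[3] is None:
            abstained[group] += 1
    return {group: abstained[group] / totals[group] for group in totals}


def popularity_tier(rec: Record) -> str:
    """Bucket by popularity; cut-offs on an assumed 0-10 scale are illustrative only."""
    score = rec[1]
    if score >= 6:
        return "high"
    if score >= 4:
        return "medium"
    return "low"


def bias_strength(rec: Record) -> int:
    """Distance from center on the assumed -3..+3 scale (0 = least biased)."""
    return abs(rec[2])


if __name__ == "__main__":
    toy = [
        ("a.com", 7.1, -2, -2),
        ("b.com", 6.5, 0, None),
        ("c.com", 4.2, 1, 1),
        ("d.com", 3.0, 0, None),
        ("e.com", 2.5, 3, 3),
        ("f.com", 2.1, -1, None),
    ]
    print(abstention_by(toy, popularity_tier))
    print(abstention_by(toy, bias_strength))
```

Passing the grouping rule as a function lets the same helper check both patterns the abstract mentions: popularity and strength of bias.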

Critical Analysis

The study has several limitations. The reference labels come from a single third-party rater, MBFC, and political bias ratings are inherently subjective, so agreement with MBFC is not the same as agreement with ground truth. GPT-4 also declined to rate about two-thirds of the sources, so the high correlation describes only the subset it chose to classify, which skews toward more popular and more clearly partisan sites. Finally, the author recommends further research to test whether the results hold in other settings, languages, and datasets.

Another potential issue is that accurately labeling political bias does not necessarily mean the model understands political ideologies. Since GPT-4 sees only a domain name, its ratings may reflect associations absorbed during training rather than any analysis of the site's content or a deeper grasp of political frameworks and worldviews.

There are also ethical concerns about using these techniques to assess the political leanings of online content automatically. If applied without great care, such tools could be misused to unfairly target or censor particular viewpoints.

Overall, while the results are intriguing, more research is needed to fully understand the strengths, limitations, and implications of using large language models for this type of political analysis.

Conclusion

This paper shows that GPT-4 can reproduce a widely used set of human bias ratings for news websites with high agreement, using nothing more than each site's URL. This makes it a potentially scalable and cost-effective tool for political bias classification. However, its frequent abstentions limit coverage, especially for less popular and less biased sources.

The author cautions that GPT-4's use should complement, not replace, human judgment, in order to mitigate biases in its output. Policymakers, researchers, and the public will also need to consider carefully the societal impacts of using AI to categorize, and potentially censor, political speech online.

Further studies are needed for three purposes. They should test the model's performance across different settings, languages, and datasets, and explore how LLMs perceive and reason about political bias. They should also help ensure these technologies are deployed in ways that respect democratic values and individual rights.

This summary was produced with help from an AI and may contain inaccuracies; consult the original paper (arXiv:2407.14344) for authoritative details.


Related Papers

LLMs left, right, and center: Assessing GPT's capabilities to label political bias from web domains

Raphael Hernandes

This research investigates whether OpenAI's GPT-4, a state-of-the-art large language model, can accurately classify the political bias of news sources based solely on their URLs. Given the subjective nature of political labels, third-party bias ratings like those from Ad Fontes Media, AllSides, and Media Bias/Fact Check (MBFC) are often used in research to analyze news source diversity. This study aims to determine if GPT-4 can replicate these human ratings on a seven-degree scale (far-left to far-right). The analysis compares GPT-4's classifications against MBFC's, and controls for website popularity using Open PageRank scores. Findings reveal a high correlation ($\text{Spearman's } \rho = .89$, $n = 5{,}877$, $p < 0.001$) between GPT-4's and MBFC's ratings, indicating the model's potential reliability. However, GPT-4 abstained from classifying approximately $\frac{2}{3}$ of the dataset, particularly less popular and less biased sources. The study also identifies a slight leftward skew in GPT-4's classifications compared to MBFC's. The analysis suggests that while GPT-4 can be a scalable, cost-effective tool for political bias classification of news websites, its use should complement human judgment to mitigate biases. Further research is recommended to explore the model's performance across different settings, languages, and additional datasets.

7/22/2024


Assessing Political Bias in Large Language Models

Luca Rettenberger, Markus Reischl, Mark Schutera

The assessment of bias within Large Language Models (LLMs) has emerged as a critical concern in contemporary discourse on Artificial Intelligence (AI), given their potential impact on societal dynamics. Recognizing and considering political bias within LLM applications is especially important as we approach the tipping point toward performative prediction. This requires understanding the potential effects, and the societal behavior LLMs can drive at scale through their interplay with human operators. The upcoming European Parliament elections, for example, will not remain unaffected by LLMs. We evaluate the political bias of the currently most popular open-source LLMs (instruct or assistant models) concerning political issues within the European Union (EU) from a German voter's perspective. To do so, we use the Wahl-O-Mat, a voting advice application used in Germany. From the voting advice of the Wahl-O-Mat, we quantify the degree of alignment of LLMs with German political parties. We show that larger models, such as Llama3-70B, tend to align more closely with left-leaning political parties, while smaller models often remain neutral, particularly when prompted in English. The central finding is that LLMs are similarly biased, with low variances in the alignment concerning a specific party. Our findings underline the importance of rigorously assessing and making bias transparent in LLMs to safeguard the integrity and trustworthiness of applications that employ the capabilities of performative prediction and the invisible hand of machine learning prediction and language generation.

6/6/2024


Accuracy and Political Bias of News Source Credibility Ratings by Large Language Models

Kai-Cheng Yang, Filippo Menczer

Search engines increasingly leverage large language models (LLMs) to generate direct answers, and AI chatbots now access the Internet for fresh data. As information curators for billions of users, LLMs must assess the accuracy and reliability of different sources. This paper audits eight widely used LLMs from three major providers -- OpenAI, Google, and Meta -- to evaluate their ability to discern credible and high-quality information sources from low-credibility ones. We find that while LLMs can rate most tested news outlets, larger models more frequently refuse to provide ratings due to insufficient information, whereas smaller models are more prone to hallucination in their ratings. For sources where ratings are provided, LLMs exhibit a high level of agreement among themselves (average Spearman's $\rho = 0.81$), but their ratings align only moderately with human expert evaluations (average $\rho = 0.59$). Analyzing news sources with different political leanings in the US, we observe a liberal bias in credibility ratings yielded by all LLMs in default configurations. Additionally, assigning partisan identities to LLMs consistently results in strong politically congruent bias in the ratings. These findings have important implications for the use of LLMs in curating news and political information.

8/14/2024


Identifying the sources of ideological bias in GPT models through linguistic variation in output

Christina Walker, Joan C. Timoneda

Extant work shows that generative AI models such as GPT-3.5 and 4 perpetuate social stereotypes and biases. One concerning but less explored source of bias is ideology. Do GPT models take ideological stances on politically sensitive topics? In this article, we provide an original approach to identifying ideological bias in generative models, showing that bias can stem from both the training data and the filtering algorithm. We leverage linguistic variation in countries with contrasting political attitudes to evaluate bias in average GPT responses to sensitive political topics in those languages. First, we find that GPT output is more conservative in languages that map well onto conservative societies (i.e., Polish), and more liberal in languages used uniquely in liberal societies (i.e., Swedish). This result provides strong evidence of training data bias in GPT models. Second, differences across languages observed in GPT-3.5 persist in GPT-4, even though GPT-4 is significantly more liberal due to OpenAI's filtering policy. Our main takeaway is that generative model training must focus on high-quality, curated datasets to reduce bias, even if it entails a compromise in training data size. Filtering responses after training only introduces new biases and does not remove the underlying training biases.

9/11/2024