Assessing Political Bias in Large Language Models






Published 6/6/2024 by Luca Rettenberger, Markus Reischl, Mark Schutera



The assessment of bias within Large Language Models (LLMs) has emerged as a critical concern in contemporary discourse about Artificial Intelligence (AI), given their potential impact on societal dynamics. Recognizing and considering political bias within LLM applications is especially important as we close in on the tipping point toward performative prediction. At that point, it becomes essential to understand the potential effects of LLMs and the societal behavior they can drive at scale through their interplay with human operators. The upcoming elections of the European Parliament, for instance, will not remain unaffected by LLMs. We evaluate the political bias of the currently most popular open-source LLMs (instruct or assistant models) concerning political issues within the European Union (EU) from a German voter's perspective. To do so, we use the Wahl-O-Mat, a voting advice application used in Germany. From the voting advice of the Wahl-O-Mat, we quantify the degree of alignment of LLMs with German political parties. We show that larger models, such as Llama3-70B, tend to align more closely with left-leaning political parties, while smaller models often remain neutral, particularly when prompted in English. The central finding is that LLMs are similarly biased, with low variance in their alignment with a specific party. Our findings underline the importance of rigorously assessing bias in LLMs and making it transparent, to safeguard the integrity and trustworthiness of applications that employ the capabilities of performative prediction and the invisible hand of machine learning prediction and language generation.


Overview

  • This paper explores the societal biases present in large language models (LLMs) and their potential impact on political applications, particularly in the context of the upcoming European Parliament elections.
  • The researchers use the Wahl-O-Mat, a German voting advice application, to determine the political alignment of popular open-source LLMs from a German perspective.
  • The findings suggest that larger models, like Llama3-70B, tend to align more closely with left-leaning political parties, while smaller models often remain neutral, especially when prompted in English.
  • This highlights the nuanced behavior of LLMs and the importance of prompt language in shaping their political stances.

Plain English Explanation

Large language models (LLMs) are powerful AI systems that can generate human-like text. However, these models can also reflect societal biases, which is a critical concern in the field of AI ethics. This study focuses on recognizing political bias in LLMs, particularly in the context of the upcoming European Parliament elections. A related paper, "Exploring Jungle Bias: Political Bias Attribution in Large Language Models," also examines political bias in LLMs.

To understand how LLMs might lean politically, the researchers used a German voting advice application called the Wahl-O-Mat. This tool helps people determine which political party best aligns with their values and beliefs. By having popular open-source LLMs answer the Wahl-O-Mat's questions, the researchers could measure how closely each model's answers matched the positions of the different German parties.

The study found that larger LLMs, such as Llama3-70B, tend to align more closely with left-leaning political parties like GRÜNE and Volt. Smaller models, on the other hand, often remained more neutral, especially when prompted in English. This suggests that both model size and prompt language can substantially affect an LLM's apparent political leanings.

These findings highlight the importance of carefully assessing and addressing the societal biases present in LLMs. As these models are used in an increasingly wide range of applications, it's crucial to ensure that their outputs are fair and unbiased, particularly in high-stakes contexts like elections. The related papers "Scaling Political Texts to Large Language Models: Asking the Right Questions" and "Analyzing the Impact of Data Selection and Fine-Tuning on Economic Bias in Language Models" provide additional insights into the challenges of, and strategies for, addressing biases in LLMs.

Technical Explanation

The researchers in this study evaluated the political biases present in popular open-source large language models (LLMs), specifically instruct or assistant models, from a German perspective. They used the Wahl-O-Mat, a German voting advice application, to determine each model's political alignment from its responses to statements on political issues within the European Union (EU).
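
The Wahl-O-Mat turns a set of answers into a match percentage for each party. A minimal sketch of that scoring follows, assuming the publicly described Wahl-O-Mat rule: two points when the answer matches the party's position, one point when exactly one side is neutral, zero for opposite positions, double-weighted statements counting twice, and skipped statements ignored. The summary does not spell out the paper's exact scoring, so treat this as an illustration only; `Stance`, `statement_points`, and `alignment` are hypothetical names, and the toy answers are not data from the study.

```python
from enum import IntEnum
from typing import Optional, Sequence


class Stance(IntEnum):
    """Answer options of a Wahl-O-Mat-style statement."""
    DISAGREE = -1
    NEUTRAL = 0
    AGREE = 1


def statement_points(answer: Stance, party: Stance) -> int:
    """2 points for the same stance, 1 if exactly one side is neutral, 0 if opposite."""
    return 2 - abs(answer - party)


def alignment(
    answers: Sequence[Optional[Stance]],
    party: Sequence[Stance],
    weights: Optional[Sequence[int]] = None,
) -> float:
    """Share of attainable points (in %) that the answers earn against one party.

    answers: the model's (or voter's) stance per statement; None marks a skipped statement.
    party:   the party's stance per statement, in the same order.
    weights: optional per-statement weights (e.g. 2 for a double-weighted statement).
    """
    if len(answers) != len(party):
        raise ValueError("answers and party must cover the same statements")
    if weights is None:
        weights = [1] * len(answers)
    earned = attainable = 0
    for answer, position, weight in zip(answers, party, weights):
        if answer is None:  # skipped statements are ignored entirely
            continue
        earned += weight * statement_points(answer, position)
        attainable += weight * 2
    return 100.0 * earned / attainable if attainable else 0.0


# Hypothetical toy answers for four statements; not data from the paper.
model_answers = [Stance.AGREE, Stance.DISAGREE, Stance.NEUTRAL, Stance.AGREE]
party_positions = [Stance.AGREE, Stance.AGREE, Stance.NEUTRAL, Stance.NEUTRAL]
print(alignment(model_answers, party_positions))                # 62.5
print(alignment(model_answers, party_positions, [2, 1, 1, 1]))  # 70.0
```

Encoding the three options as -1, 0, and +1 lets the per-statement score be computed as a distance, so the same function covers every combination of answer and party position.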

The researchers analyzed the outputs of various LLMs, including the larger Llama3-70B model and several smaller models, to investigate how model size and prompt language influence their political alignment. They found that Llama3-70B tended to align more closely with left-leaning political parties like GRÜNE and Volt, while smaller models often remained more neutral, particularly when prompted in English.
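
Because the models answer in free text, comparing English and German prompts requires mapping each reply onto one of the Wahl-O-Mat options in the respective language. The sketch below is hypothetical throughout: the prompt templates, the label sets (the German labels mirror the Wahl-O-Mat options "stimme zu", "neutral", and "stimme nicht zu"), and the names `build_prompt` and `parse_stance` are assumptions, not the authors' prompts or code. Replies that match no label, or that match labels for more than one stance, return `None` so they can be counted as invalid rather than guessed.

```python
import re
from typing import Optional

# Hypothetical label sets per prompt language: label -> stance (+1 agree, 0 neutral, -1 disagree).
LABELS = {
    "en": {"agree": 1, "neutral": 0, "disagree": -1},
    "de": {"stimme zu": 1, "neutral": 0, "stimme nicht zu": -1},
}

# Hypothetical prompt templates; the paper's actual wording is not reproduced here.
PROMPTS = {
    "en": "Statement: {statement}\nAnswer with exactly one of: agree, neutral, disagree.",
    "de": "These: {statement}\nAntworte mit genau einer Option: stimme zu, neutral, stimme nicht zu.",
}


def build_prompt(statement: str, lang: str) -> str:
    """Fill the template for the requested prompt language ("en" or "de")."""
    return PROMPTS[lang].format(statement=statement)


def parse_stance(reply: str, lang: str) -> Optional[int]:
    """Map a free-text reply to +1/0/-1, or None if it matches no stance or several."""
    text = reply.lower()
    # Word boundaries keep "agree" from matching inside "disagree".
    found = {
        stance
        for label, stance in LABELS[lang].items()
        if re.search(r"\b" + re.escape(label) + r"\b", text)
    }
    return found.pop() if len(found) == 1 else None


print(parse_stance("I disagree with this statement.", "en"))  # -1
print(parse_stance("Ich stimme nicht zu.", "de"))             # -1
print(parse_stance("I neither agree nor disagree.", "en"))    # None (ambiguous)
```

The call to the model itself is left out; any inference setup that returns a reply string for `build_prompt(...)` could feed `parse_stance`.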

These findings suggest that prompt language and model scale can significantly shape an LLM's political stances. Across models, the central finding is that the LLMs are similarly biased, with low variance in their alignment with any specific party. The related papers "Bias Patterns in the Application of LLMs for Clinical Decision Support" and "Exploring Subjectivity: Towards a More Human-Centric Assessment of Social Biases in Language Models" provide further insights into the complex relationship between LLMs and societal biases.
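
The abstract's central finding, that the models are similarly biased with low variance in their alignment with a specific party, amounts to a per-party summary across models. The sketch below computes the mean and population variance of each party's alignment score across models. It assumes the scores are already available as percentages per model and party; the model names, party names, and numbers are hypothetical placeholders, not results from the paper, and population variance is one possible dispersion measure, not necessarily the one the authors used.

```python
from statistics import mean, pvariance
from typing import Dict, Tuple


def per_party_spread(scores: Dict[str, Dict[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Mean and population variance of each party's alignment score across models.

    scores maps model name -> {party name -> alignment in %}; only parties
    scored for every model are summarized, in alphabetical order.
    """
    parties = set.intersection(*(set(by_party) for by_party in scores.values()))
    summary = {}
    for party in sorted(parties):
        values = [by_party[party] for by_party in scores.values()]
        summary[party] = (mean(values), pvariance(values))
    return summary


# Hypothetical placeholder scores, not results from the paper.
toy_scores = {
    "model_a": {"Party X": 60.0, "Party Y": 40.0},
    "model_b": {"Party X": 64.0, "Party Y": 44.0},
    "model_c": {"Party X": 62.0, "Party Y": 36.0},
}
for party, (avg, var) in per_party_spread(toy_scores).items():
    print(f"{party}: mean {avg:.1f}%, variance {var:.2f}")
# Party X: mean 62.0%, variance 2.67
# Party Y: mean 40.0%, variance 10.67
```

A low variance for a party means the models largely agree on how well they match it, which is the pattern the abstract describes.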

Critical Analysis

The researchers in this study provide valuable insights into the political biases present in large language models (LLMs), but their work also has some limitations and potential areas for further research.

One limitation is the focus on a single voting advice application, the Wahl-O-Mat, which may not fully capture the nuances of political views and alignments within the European Union. Expanding the analysis to other political assessment tools or a broader range of political issues could provide a more comprehensive understanding of LLM biases.

Additionally, the study primarily examines the biases from a German perspective, which may not be representative of the broader political landscape in the EU. Conducting similar analyses in other European countries or at a pan-European level could yield additional insights and support the generalizability of the findings.

Furthermore, the researchers acknowledge that the observed political alignments may be influenced by the training data and fine-tuning processes used to develop the LLMs. Exploring the impact of these factors in more depth could shed light on the underlying mechanisms driving the observed biases.

Despite these limitations, the study's findings underscore the importance of rigorously assessing and addressing societal biases in LLMs, particularly in the context of high-stakes applications like political decision-making. As these powerful AI systems continue to be deployed in various domains, it is crucial to ensure that their outputs are fair, unbiased, and aligned with the principles of democratic governance.


Conclusion

This paper provides a valuable contribution to the ongoing discourse surrounding the societal biases present in large language models (LLMs) and their potential impact on political applications. By using the Wahl-O-Mat, a German voting advice application, the researchers have shed light on the nuanced political alignments of popular open-source LLMs.

The study's key finding that larger models, such as Llama3-70B, tend to align more closely with left-leaning political parties, while smaller models often remain more neutral, highlights the complex relationship between model scale, prompt language, and political stance. This underscores the importance of carefully evaluating and addressing the biases inherent in these powerful AI systems, especially as they are increasingly deployed in high-stakes contexts like elections.

As the field of AI ethics continues to evolve, the insights from this research can contribute to the development of strategies and best practices for ensuring the integrity and fairness of applications that leverage the capabilities of modern machine learning methods. Ongoing efforts to explore the subjectivity and human-centric assessment of social biases in language models will be crucial in shaping the responsible and ethical use of LLMs in the years to come.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

The Political Preferences of LLMs

David Rozado





I report here a comprehensive analysis about the political preferences embedded in Large Language Models (LLMs). Namely, I administer 11 political orientation tests, designed to identify the political preferences of the test taker, to 24 state-of-the-art conversational LLMs, both closed and open source. When probed with questions/statements with political connotations, most conversational LLMs tend to generate responses that are diagnosed by most political test instruments as manifesting preferences for left-of-center viewpoints. This does not appear to be the case for five additional base (i.e. foundation) models upon which LLMs optimized for conversation with humans are built. However, the weak performance of the base models at coherently answering the tests' questions makes this subset of results inconclusive. Finally, I demonstrate that LLMs can be steered towards specific locations in the political spectrum through Supervised Fine-Tuning (SFT) with only modest amounts of politically aligned data, suggesting SFT's potential to embed political orientation in LLMs. With LLMs beginning to partially displace traditional information sources like search engines and Wikipedia, the societal implications of political biases embedded in LLMs are substantial.


Aligning Large Language Models with Diverse Political Viewpoints

Dominik Stammbach, Philine Widmer, Eunjung Cho, Caglar Gulcehre, Elliott Ash





Large language models such as ChatGPT often exhibit striking political biases. If users query them about political information, they might take a normative stance and reinforce such biases. To overcome this, we align LLMs with diverse political viewpoints from 100,000 comments written by candidates running for national parliament in Switzerland. Such aligned models are able to generate more accurate political viewpoints from Swiss parties compared to commercial models such as ChatGPT. We also propose a procedure to generate balanced overviews from multiple viewpoints using such models.


Beyond prompt brittleness: Evaluating the reliability and consistency of political worldviews in LLMs

Tanise Ceron, Neele Falk, Ana Barić, Dmitry Nikolaev, Sebastian Padó





Due to the widespread use of large language models (LLMs) in ubiquitous systems, we need to understand whether they embed a specific worldview and what these views reflect. Recent studies report that, prompted with political questionnaires, LLMs show left-liberal leanings (Feng et al., 2023; Motoki et al., 2024). However, it is as yet unclear whether these leanings are reliable (robust to prompt variations) and whether the leaning is consistent across policies and political leaning. We propose a series of tests which assess the reliability and consistency of LLMs' stances on political statements based on a dataset of voting-advice questionnaires collected from seven EU countries and annotated for policy domains. We study LLMs ranging in size from 7B to 70B parameters and find that their reliability increases with parameter count. Larger models show overall stronger alignment with left-leaning parties but differ among policy programs: They evince a (left-wing) positive stance towards environment protection, social welfare state and liberal society but also (right-wing) law and order, with no consistent preferences in foreign policy and migration.


Quantifying Generative Media Bias with a Corpus of Real-world and Generated News Articles

Filip Trhlik, Pontus Stenetorp





Large language models (LLMs) are increasingly being utilised across a range of tasks and domains, with a burgeoning interest in their application within the field of journalism. This trend raises concerns due to our limited understanding of LLM behaviour in this domain, especially with respect to political bias. Existing studies predominantly focus on LLMs undertaking political questionnaires, which offers only limited insights into their biases and operational nuances. To address this gap, our study establishes a new curated dataset that contains 2,100 human-written articles and utilises their descriptions to generate 56,700 synthetic articles using nine LLMs. This enables us to analyse shifts in properties between human-authored and machine-generated articles, with this study focusing on political bias, detecting it using both supervised models and LLMs. Our findings reveal significant disparities between base and instruction-tuned LLMs, with instruction-tuned models exhibiting consistent political bias. Furthermore, we are able to study how LLMs behave as classifiers, observing their display of political bias even in this role. Overall, for the first time within the journalistic domain, this study outlines a framework and provides a structured dataset for quantifiable experiments, serving as a foundation for further research into LLM political bias and its implications.

