Vox Populi, Vox AI? Using Language Models to Estimate German Public Opinion

Read original: arXiv:2407.08563 - Published 7/12/2024 by Leah von der Heyde, Anna-Carolina Haensch, Alexander Wenz


Overview

Researchers investigated whether large language models (LLMs) can accurately estimate public opinion, using the example of vote choice in Germany.
They generated a synthetic sample of personas matching characteristics of a German election survey and asked the LLM GPT-3.5 to predict each respondent's vote choice.
The LLM's predictions did not match the actual survey data, exhibiting a bias towards the Green and Left parties and missing the complex factors influencing individual voter choices.

Plain English Explanation

Researchers wanted to see whether large language models (LLMs) could be used to estimate public opinion, using voting behavior in Germany as an example. LLMs are AI systems trained on vast amounts of text, which enables them to generate human-like responses.

The researchers created fictional people whose characteristics matched those of respondents to a real German election survey. They then asked the LLM GPT-3.5 to predict how each of these people would vote and compared the LLM's predictions to the actual survey data.

Unfortunately, the LLM's predictions did not match the real-world survey results very well. The LLM seemed to have a bias towards the Green and Left political parties, and it failed to capture the nuanced factors that influence how individual voters make their choices.

This study adds to the growing research on the conditions under which LLMs can be used to study public opinion. The findings suggest that LLMs may not accurately represent the full range of opinions in a population, and there are limitations to using them for this purpose.

Technical Explanation

The researchers generated a synthetic sample of personas matching the individual characteristics of respondents from the 2017 German Longitudinal Election Study. They then asked the LLM GPT-3.5 to predict each respondent's vote choice and compared these predictions to the survey-based estimates at both the aggregate and subgroup levels.
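To make the persona approach concrete, here is a minimal Python sketch of how one survey respondent's attributes could be turned into a prompt, and how the model's free-text answer could be mapped back to a party. The attribute set (age, gender, education, region, party identification), the English prompt wording, the alias list, and the helpers `Persona`, `build_prompt`, and `parse_vote` are all illustrative assumptions, not the authors' actual prompt or code; the call to GPT-3.5 itself is omitted.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical persona attributes; the study's real variable set may differ.
@dataclass
class Persona:
    age: int
    gender: str
    education: str
    region: str              # e.g. "East Germany" or "West Germany"
    party_id: Optional[str]  # party identification, None if no attachment

# Illustrative keyword aliases for the main parties in the 2017 election.
PARTY_ALIASES = {
    "CDU/CSU": ["cdu", "csu"],
    "SPD": ["spd", "social democrat"],
    "AfD": ["afd", "alternative for germany"],
    "FDP": ["fdp", "free democrat"],
    "Die Linke": ["linke", "left party"],
    "Grüne": ["grüne", "green"],
}

def build_prompt(p: Persona) -> str:
    """Render one respondent's attributes as a first-person persona prompt."""
    pid = (f"I generally identify with the {p.party_id}."
           if p.party_id else "I do not identify with any party.")
    # The prompt ends mid-sentence so the model completes it with a party.
    return (
        f"I am {p.age} years old, {p.gender}, with {p.education}, "
        f"and I live in {p.region}. {pid} "
        "In the 2017 German federal election, I voted for"
    )

def parse_vote(reply: str) -> str:
    """Return the first party whose alias appears in the reply, else 'other/unclear'."""
    text = reply.lower()
    for party, aliases in PARTY_ALIASES.items():
        if any(alias in text for alias in aliases):
            return party
    return "other/unclear"
```

Keyword matching like this is deliberately naive: a reply that mentions more than one party (for example, in a negation) would be misclassified, so a real pipeline would need stricter output constraints or manual checks.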

The results showed that GPT-3.5 did not accurately predict citizens' vote choice, exhibiting a bias towards the Green and Left parties. While the LLM was able to capture the tendencies of typical voter subgroups, such as partisans, it failed to account for the multifaceted factors that influence individual voter choices.
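The aggregate and subgroup comparisons can be sketched as follows, assuming each respondent has exactly one predicted and one reported vote. The helper names and the two metrics (the difference in party vote shares, and the fraction of exact matches within each subgroup) are hypothetical choices for illustration; the paper's own measures may differ.

```python
from collections import Counter

def vote_shares(votes: list[str]) -> dict[str, float]:
    """Share of respondents choosing each party."""
    counts = Counter(votes)
    total = len(votes)
    return {party: n / total for party, n in counts.items()}

def share_bias(predicted: list[str], observed: list[str]) -> dict[str, float]:
    """Predicted minus observed vote share per party (positive = LLM over-predicts)."""
    pred, obs = vote_shares(predicted), vote_shares(observed)
    parties = set(pred) | set(obs)
    return {p: pred.get(p, 0.0) - obs.get(p, 0.0) for p in sorted(parties)}

def subgroup_agreement(predicted: list[str], observed: list[str],
                       groups: list[str]) -> dict[str, float]:
    """Fraction of respondents in each subgroup whose predicted vote equals the reported vote."""
    hits: Counter = Counter()
    sizes: Counter = Counter()
    for pred, obs, group in zip(predicted, observed, groups):
        sizes[group] += 1
        hits[group] += int(pred == obs)
    return {g: hits[g] / sizes[g] for g in sizes}
```

Under this framing, a consistently positive share difference for a party would correspond to the over-prediction reported for the Greens and the Left, while high agreement among partisans alongside low agreement elsewhere would mirror the subgroup pattern described above.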

By examining the LLM-based prediction of voting behavior in a new context, this study contributes to the growing body of research about the conditions under which LLMs can be leveraged for studying public opinion. The findings point to disparities in opinion representation in LLMs and underscore the limitations in applying them for public opinion estimation.

Critical Analysis

The paper acknowledges several contextual factors that might affect the generalizability of the findings, such as the relationship between the target population and the LLM's training data. The researchers also note that their study only examined vote choice and that LLMs might perform better at estimating public opinion on other topics.

However, the paper does not address the potential biases and limitations inherent in the survey data used as a benchmark. Survey data can also fail to fully capture the complexity of public opinion, particularly on sensitive political topics. It would be valuable to further investigate the discrepancies between LLM predictions and survey results, considering the limitations of both approaches.

Additionally, the study focuses on a single LLM (GPT-3.5) and a specific context (German voting behavior). More research is needed to understand how different LLMs and various public opinion domains might influence the accuracy and reliability of LLM-based estimates.

Conclusion

This study suggests that large language models (LLMs) may not be able to accurately estimate public opinion, at least in the domain of voting behavior in Germany. The LLM GPT-3.5 exhibited biases and failed to capture the nuanced factors influencing individual voter choices.

The findings contribute to our understanding of the limitations of using LLMs for studying public opinion. While LLMs have the potential to complement traditional survey methods, this research highlights the need for caution and further investigation into the conditions under which LLMs can reliably represent the diversity of public opinion.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


Vox Populi, Vox AI? Using Language Models to Estimate German Public Opinion

Leah von der Heyde, Anna-Carolina Haensch, Alexander Wenz

The recent development of large language models (LLMs) has spurred discussions about whether LLM-generated synthetic samples could complement or replace traditional surveys, considering their training data potentially reflects attitudes and behaviors prevalent in the population. A number of mostly US-based studies have prompted LLMs to mimic survey respondents, with some of them finding that the responses closely match the survey data. However, several contextual factors related to the relationship between the respective target population and LLM training data might affect the generalizability of such findings. In this study, we investigate the extent to which LLMs can estimate public opinion in Germany, using the example of vote choice. We generate a synthetic sample of personas matching the individual characteristics of the 2017 German Longitudinal Election Study respondents. We ask the LLM GPT-3.5 to predict each respondent's vote choice and compare these predictions to the survey-based estimates on the aggregate and subgroup levels. We find that GPT-3.5 does not predict citizens' vote choice accurately, exhibiting a bias towards the Green and Left parties. While the LLM captures the tendencies of typical voter subgroups, such as partisans, it misses the multifaceted factors swaying individual voter choices. By examining the LLM-based prediction of voting behavior in a new context, our study contributes to the growing body of research about the conditions under which LLMs can be leveraged for studying public opinion. The findings point to disparities in opinion representation in LLMs and underscore the limitations in applying them for public opinion estimation.

7/12/2024


AI-Augmented Surveys: Leveraging Large Language Models and Surveys for Opinion Prediction

Junsol Kim, Byungkyu Lee

Large language models (LLMs) that produce human-like responses have begun to revolutionize research practices in the social sciences. We develop a novel methodological framework that fine-tunes LLMs with repeated cross-sectional surveys to incorporate the meaning of survey questions, individual beliefs, and temporal contexts for opinion prediction. We introduce two new emerging applications of the AI-augmented survey: retrodiction (i.e., predict year-level missing responses) and unasked opinion prediction (i.e., predict entirely missing responses). Among 3,110 binarized opinions from 68,846 Americans in the General Social Survey from 1972 to 2021, our models based on Alpaca-7b excel in retrodiction (AUC = 0.86 for personal opinion prediction, $\rho$ = 0.98 for public opinion prediction). These remarkable prediction capabilities allow us to fill in missing trends with high confidence and pinpoint when public attitudes changed, such as the rising support for same-sex marriage. On the other hand, our fine-tuned Alpaca-7b models show modest success in unasked opinion prediction (AUC = 0.73, $\rho$ = 0.67). We discuss practical constraints and ethical concerns regarding individual autonomy and privacy when using LLMs for opinion prediction. Our study demonstrates that LLMs and surveys can mutually enhance each other's capabilities: LLMs can broaden survey potential, while surveys can improve the alignment of LLMs.

4/9/2024

Representation Bias in Political Sample Simulations with Large Language Models

Weihong Qi, Hanjia Lyu, Jiebo Luo

This study seeks to identify and quantify biases in simulating political samples with Large Language Models, specifically focusing on vote choice and public opinion. Using the GPT-3.5-Turbo model, we leverage data from the American National Election Studies, German Longitudinal Election Study, Zuobiao Dataset, and China Family Panel Studies to simulate voting behaviors and public opinions. This methodology enables us to examine three types of representation bias: disparities based on the country's language, demographic groups, and political regime types. The findings reveal that simulation performance is generally better for vote choice than for public opinions, more accurate in English-speaking countries, more effective in bipartisan systems than in multi-partisan systems, and stronger in democratic settings than in authoritarian regimes. These results contribute to enhancing our understanding and developing strategies to mitigate biases in AI applications within the field of computational social science.

7/17/2024

Large Language Models can impersonate politicians and other public figures

Steffen Herbold, Alexander Trautsch, Zlata Kikteva, Annette Hautli-Janisz

Modern AI technology like large language models (LLMs) has the potential to pollute the public information sphere with made-up content, which poses a significant threat to the cohesion of societies at large. A wide range of research has shown that LLMs are capable of generating text of impressive quality, including persuasive political speech, text with a pre-defined style, and role-specific content. But there is a crucial gap in the literature: We lack large-scale and systematic studies of how capable LLMs are in impersonating political and societal representatives and how the general public judges these impersonations in terms of authenticity, relevance and coherence. We present the results of a study based on a cross-section of British society that shows that LLMs are able to generate responses to debate questions that were part of a broadcast political debate programme in the UK. The impersonated responses are judged to be more authentic and relevant than the original responses given by the people who were impersonated. This shows two things: (1) LLMs can be made to contribute meaningfully to the public political debate and (2) there is a dire need to inform the general public of the potential harm this can have on society.

7/19/2024