Generative Language Models Exhibit Social Identity Biases

2310.15819

Published 6/18/2024 by Tiancheng Hu, Yara Kyrychenko, Steve Rathje, Nigel Collier, Sander van der Linden, Jon Roozenbeek

cs.CL cs.CY

💬

Abstract

The surge in popularity of large language models has given rise to concerns about biases that these models could learn from humans. We investigate whether ingroup solidarity and outgroup hostility, fundamental social identity biases known from social psychology, are present in 56 large language models. We find that almost all foundational language models and some instruction fine-tuned models exhibit clear ingroup-positive and outgroup-negative associations when prompted to complete sentences (e.g., We are...). Our findings suggest that modern language models exhibit fundamental social identity biases to a similar degree as humans, both in the lab and in real-world conversations with LLMs, and that curating training data and instruction fine-tuning can mitigate such biases. Our results have practical implications for creating less biased large-language models and further underscore the need for more research into user interactions with LLMs to prevent potential bias reinforcement in humans.

Create account to get full access

Overview

The paper investigates whether large language models (LLMs) exhibit fundamental social identity biases, such as ingroup solidarity and outgroup hostility, similar to human biases.
The researchers analyzed 56 LLMs, including foundational models and instruction fine-tuned models, to assess their biases when prompted to complete sentences.
The findings suggest that most LLMs exhibit clear ingroup-positive and outgroup-negative associations, similar to human biases observed in social psychology.
The research has practical implications for creating less biased LLMs and highlights the need for further study on user interactions with LLMs to prevent potential bias reinforcement.

Plain English Explanation

The recent popularity of large language models (LLMs) has raised concerns about the potential biases these models may have learned from the data they were trained on. This paper investigates whether LLMs exhibit fundamental social identity biases, such as favoring their own group (ingroup) and being hostile towards other groups (outgroup), which are commonly observed in human behavior.

The researchers analyzed 56 different LLMs, including foundational models (like GPT-3) and models that have been fine-tuned with specific instructions. When these models were prompted to complete sentences, the researchers found that almost all of them showed clear biases towards their "ingroup" and against "outgroups." This means the LLMs tended to associate positive qualities with their own group and negative qualities with other groups, just like people often do.

These findings suggest that modern language models exhibit human-like social identity biases, even though they are artificial intelligence systems. This is an important discovery because it means these biases could be reinforced when people interact with LLMs in real-world conversations. The research also shows that curating training data and fine-tuning the models can help reduce such biases.

Overall, this study highlights the need for more research into how users interact with LLMs and the potential risks of algorithmic biases being amplified through these interactions. By understanding the biases present in LLMs, researchers and developers can work towards creating more inclusive and less biased AI systems.

Technical Explanation

The researchers used a sentence completion task to assess 56 large language models (LLMs) for the presence of ingroup-positive and outgroup-negative biases, which are fundamental social identity biases observed in human behavior. The models included both foundational language models (e.g., GPT-3) and instruction fine-tuned models.

In the experiment, the LLMs were prompted with sentence starters like "We are..." and "They are..." to measure their associations with various social groups. The researchers analyzed the model outputs to determine the degree of ingroup favoritism and outgroup hostility exhibited by each LLM.

The results showed that almost all of the foundational language models and some of the instruction fine-tuned models displayed clear ingroup-positive and outgroup-negative biases. This suggests that modern LLMs, like humans, have learned to exhibit fundamental social identity biases from the data they were trained on.

The researchers note that curating the training data and fine-tuning the models with specific instructions can help mitigate such biases. However, the findings also highlight the need for more research into user interactions with LLMs to understand how these biases may be reinforced in real-world conversations and potentially influence human perceptions and behavior.

Critical Analysis

The researchers acknowledge several limitations of their study. Firstly, the sentence completion task may not fully capture the nuanced ways in which LLMs can exhibit social biases. Additionally, the study focused on a limited set of 56 LLMs, and the findings may not generalize to the broader landscape of language models.

Furthermore, the paper does not provide a comprehensive analysis of the potential sources of these biases, such as the specific training data or model architectures that may contribute to their development. This makes it challenging to derive clear recommendations for how to effectively address the issue of bias in LLMs.

While the researchers suggest that curating training data and fine-tuning the models can mitigate these biases, the specific techniques and their effectiveness are not thoroughly explored. Additional research is needed to understand the most effective approaches for reducing social identity biases in LLMs.

Additionally, the paper does not delve into the potential consequences of these biases in real-world applications of LLMs, such as their impact on decision-making, user experience, or societal implications. A deeper exploration of these issues would help reinforce the importance of this research and the need for continued efforts to address bias in AI systems.

Conclusion

This paper provides compelling evidence that large language models exhibit fundamental social identity biases, similar to those observed in human behavior. The findings suggest that the increasing prominence of LLMs in various applications raises concerns about the potential for these biases to be reinforced and amplified through user interactions.

The research highlights the need for a more comprehensive understanding of the sources and manifestations of bias in LLMs, as well as the development of effective strategies for mitigating these biases. By addressing these issues, researchers and developers can work towards creating more inclusive and equitable AI systems that do not perpetuate or exacerbate societal prejudices.

Ultimately, this study underscores the importance of ongoing efforts to ensure the responsible development and deployment of large language models, with a focus on promoting fairness, transparency, and accountability in AI systems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

💬

Large Language Models Portray Socially Subordinate Groups as More Homogeneous, Consistent with a Bias Observed in Humans

Messi H. J. Lee, Jacob M. Montgomery, Calvin K. Lai

Large language models (LLMs) are becoming pervasive in everyday life, yet their propensity to reproduce biases inherited from training data remains a pressing concern. Prior investigations into bias in LLMs have focused on the association of social groups with stereotypical attributes. However, this is only one form of human bias such systems may reproduce. We investigate a new form of bias in LLMs that resembles a social psychological phenomenon where socially subordinate groups are perceived as more homogeneous than socially dominant groups. We had ChatGPT, a state-of-the-art LLM, generate texts about intersectional group identities and compared those texts on measures of homogeneity. We consistently found that ChatGPT portrayed African, Asian, and Hispanic Americans as more homogeneous than White Americans, indicating that the model described racial minority groups with a narrower range of human experience. ChatGPT also portrayed women as more homogeneous than men, but these differences were small. Finally, we found that the effect of gender differed across racial/ethnic groups such that the effect of gender was consistent within African and Hispanic Americans but not within Asian and White Americans. We argue that the tendency of LLMs to describe groups as less diverse risks perpetuating stereotypes and discriminatory behavior.

4/29/2024

cs.CL

Understanding Intrinsic Socioeconomic Biases in Large Language Models

Mina Arzaghi, Florian Carichon, Golnoosh Farnadi

Large Language Models (LLMs) are increasingly integrated into critical decision-making processes, such as loan approvals and visa applications, where inherent biases can lead to discriminatory outcomes. In this paper, we examine the nuanced relationship between demographic attributes and socioeconomic biases in LLMs, a crucial yet understudied area of fairness in LLMs. We introduce a novel dataset of one million English sentences to systematically quantify socioeconomic biases across various demographic groups. Our findings reveal pervasive socioeconomic biases in both established models such as GPT-2 and state-of-the-art models like Llama 2 and Falcon. We demonstrate that these biases are significantly amplified when considering intersectionality, with LLMs exhibiting a remarkable capacity to extract multiple demographic attributes from names and then correlate them with specific socioeconomic biases. This research highlights the urgent necessity for proactive and robust bias mitigation techniques to safeguard against discriminatory outcomes when deploying these powerful models in critical real-world applications.

5/30/2024

cs.CL cs.CY cs.LG

💬

How Susceptible are Large Language Models to Ideological Manipulation?

Kai Chen, Zihao He, Jun Yan, Taiwei Shi, Kristina Lerman

Large Language Models (LLMs) possess the potential to exert substantial influence on public perceptions and interactions with information. This raises concerns about the societal impact that could arise if the ideologies within these models can be easily manipulated. In this work, we investigate how effectively LLMs can learn and generalize ideological biases from their instruction-tuning data. Our findings reveal a concerning vulnerability: exposure to only a small amount of ideologically driven samples significantly alters the ideology of LLMs. Notably, LLMs demonstrate a startling ability to absorb ideology from one topic and generalize it to even unrelated ones. The ease with which LLMs' ideologies can be skewed underscores the risks associated with intentionally poisoned training data by malicious actors or inadvertently introduced biases by data annotators. It also emphasizes the imperative for robust safeguards to mitigate the influence of ideological manipulations on LLMs.

6/19/2024

cs.CL cs.CR cs.CY

💬

Laissez-Faire Harms: Algorithmic Biases in Generative Language Models

Evan Shieh, Faye-Marie Vassel, Cassidy Sugimoto, Thema Monroe-White

The rapid deployment of generative language models (LMs) has raised concerns about social biases affecting the well-being of diverse consumers. The extant literature on generative LMs has primarily examined bias via explicit identity prompting. However, prior research on bias in earlier language-based technology platforms, including search engines, has shown that discrimination can occur even when identity terms are not specified explicitly. Studies of bias in LM responses to open-ended prompts (where identity classifications are left unspecified) are lacking and have not yet been grounded in end-consumer harms. Here, we advance studies of generative LM bias by considering a broader set of natural use cases via open-ended prompting. In this laissez-faire setting, we find that synthetically generated texts from five of the most pervasive LMs (ChatGPT3.5, ChatGPT4, Claude2.0, Llama2, and PaLM2) perpetuate harms of omission, subordination, and stereotyping for minoritized individuals with intersectional race, gender, and/or sexual orientation identities (AI/AN, Asian, Black, Latine, MENA, NH/PI, Female, Non-binary, Queer). We find widespread evidence of bias to an extent that such individuals are hundreds to thousands of times more likely to encounter LM-generated outputs that portray their identities in a subordinated manner compared to representative or empowering portrayals. We also document a prevalence of stereotypes (e.g. perpetual foreigner) in LM-generated outputs that are known to trigger psychological harms that disproportionately affect minoritized individuals. These include stereotype threat, which leads to impaired cognitive performance and increased negative self-perception. Our findings highlight the urgent need to protect consumers from discriminatory harms caused by language models and invest in critical AI education programs tailored towards empowering diverse consumers.

4/17/2024

cs.CL cs.AI cs.CY cs.LG