Laissez-Faire Harms: Algorithmic Biases in Generative Language Models

Read original: arXiv:2404.07475 - Published 4/17/2024 by Evan Shieh, Faye-Marie Vassel, Cassidy Sugimoto, Thema Monroe-White

Overview

Generative language models (LMs) have become widely deployed, raising concerns about their potential to perpetuate social biases and harm diverse consumers.
Prior research on bias in LMs has primarily focused on explicit identity prompts, but bias can also occur in open-ended prompts where identity is not specified.
This study examines bias in LM-generated responses to a broader range of natural use cases, considering the potential for harm to minoritized individuals with intersectional identities.

Plain English Explanation

The rapid development of powerful language models, such as ChatGPT, has raised concerns about their ability to perpetuate harmful biases against diverse groups of people. Previous studies on bias in these models have mainly looked at cases where specific identity characteristics, like race or gender, were explicitly mentioned. However, research on earlier language-based technologies has shown that discrimination can occur even when identity is not explicitly stated.

In this study, the researchers took a broader look at how bias manifests when language models are used in open-ended, real-world scenarios, where the prompt does not specify anyone's identity. They examined the responses of five popular language models, including ChatGPT, to a variety of natural prompts to see whether the generated text perpetuated harmful stereotypes or portrayed minoritized groups in a negative or subordinate way.

The researchers found widespread evidence of bias in the language models' outputs. Individuals with intersectional identities, such as members of racial, gender, or sexual orientation minority groups, were hundreds to thousands of times more likely to encounter model-generated content that depicted their identities in a subordinated manner than content offering representative or empowering portrayals. This is concerning because such negative representations, including common stereotypes, can trigger psychological harms such as stereotype threat, which impairs cognitive performance and increases negative self-perception.

The findings highlight the urgent need to protect consumers from discriminatory harms caused by language models and to invest in critical AI education programs that empower diverse consumers to evaluate the AI-generated content they encounter.

Technical Explanation

The researchers examined the responses of five prominent language models (ChatGPT3.5, ChatGPT4, Claude2.0, Llama2, and PaLM2) to a wide range of open-ended prompts, without explicitly specifying any identity characteristics. They analyzed the generated text for evidence of biases that could lead to harms of omission, subordination, and stereotyping for individuals with intersectional racial, gender, and/or sexual orientation identities. The identity groups studied were American Indian or Alaska Native (AI/AN), Asian, Black, Latine, Middle Eastern or North African (MENA), Native Hawaiian or Pacific Islander (NH/PI), Female, Non-binary, and Queer.

The study found that the language models overwhelmingly produced content that perpetuated these harms. Minoritized individuals were hundreds to thousands of times more likely to encounter model-generated outputs that portrayed their identities in a subordinated manner, compared to representative or empowering portrayals. The researchers also documented a prevalence of harmful stereotypes (e.g., "perpetual foreigner") in the model-generated text, which are known to trigger psychological harms like stereotype threat.
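To make the "hundreds to thousands of times more likely" claim concrete, the following is a minimal sketch of how such a ratio could be computed from labeled outputs. It is not the authors' code or metric definition: the function name, the group labels, and the counts are all hypothetical and chosen only to show the arithmetic.

```python
# Illustrative only: hypothetical counts, not data from the paper.

def subordination_ratio(subordinated: int, empowered: int) -> float:
    """Return how many times more often a group appears in a
    subordinated portrayal than in an empowering one."""
    if subordinated < 0 or empowered < 0:
        raise ValueError("counts must be non-negative")
    if empowered == 0:
        # No empowering portrayals at all: the ratio is unbounded.
        return float("inf")
    return subordinated / empowered


# Hypothetical tallies of generated texts, keyed by a placeholder group label.
hypothetical_counts = {
    "group_a": {"subordinated": 1200, "empowered": 3},
    "group_b": {"subordinated": 50, "empowered": 50},
}

for group, counts in hypothetical_counts.items():
    ratio = subordination_ratio(counts["subordinated"], counts["empowered"])
    print(f"{group}: {ratio:.1f}x more likely to be portrayed as subordinated")
```

With these made-up numbers, a ratio of 1.0 means subordinated and empowering portrayals are equally common, while values in the hundreds correspond to the scale of disparity the study reports.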

Critical Analysis

The researchers acknowledge that their study has some limitations. They note that the analysis focused on a specific set of prompts and that the language models may exhibit different biases in other contexts. Additionally, the study did not directly measure the real-world impacts of the observed biases on consumers.

That said, the findings are highly concerning and highlight the urgent need to address bias in language models. While previous research has examined bias mainly through explicit identity prompting, this study demonstrates that bias also appears in open-ended, real-world scenarios where the prompt does not state any identity.

The researchers call for further research to better understand the mechanisms behind these biases and to develop effective strategies for mitigating them. They also emphasize the importance of critical AI education programs that empower diverse consumers to evaluate the content they encounter, particularly AI-generated media and information.

Conclusion

This study provides a sobering look at the extent to which prominent language models can perpetuate harmful biases and stereotypes, even in open-ended scenarios where identity is not explicitly stated. The findings underscore the need for rigorous efforts to address bias in these powerful AI systems, as well as the importance of empowering diverse consumers to critically evaluate the information they encounter online.

By raising awareness of these issues and pushing for more inclusive and ethical AI development, the researchers hope to protect vulnerable populations from the potential harms of language model-generated content. Ongoing research and education will be crucial in ensuring that the rapid advancements in AI technology benefit all members of society, rather than exacerbating existing inequalities.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Laissez-Faire Harms: Algorithmic Biases in Generative Language Models

Evan Shieh, Faye-Marie Vassel, Cassidy Sugimoto, Thema Monroe-White

The rapid deployment of generative language models (LMs) has raised concerns about social biases affecting the well-being of diverse consumers. The extant literature on generative LMs has primarily examined bias via explicit identity prompting. However, prior research on bias in earlier language-based technology platforms, including search engines, has shown that discrimination can occur even when identity terms are not specified explicitly. Studies of bias in LM responses to open-ended prompts (where identity classifications are left unspecified) are lacking and have not yet been grounded in end-consumer harms. Here, we advance studies of generative LM bias by considering a broader set of natural use cases via open-ended prompting. In this laissez-faire setting, we find that synthetically generated texts from five of the most pervasive LMs (ChatGPT3.5, ChatGPT4, Claude2.0, Llama2, and PaLM2) perpetuate harms of omission, subordination, and stereotyping for minoritized individuals with intersectional race, gender, and/or sexual orientation identities (AI/AN, Asian, Black, Latine, MENA, NH/PI, Female, Non-binary, Queer). We find widespread evidence of bias to an extent that such individuals are hundreds to thousands of times more likely to encounter LM-generated outputs that portray their identities in a subordinated manner compared to representative or empowering portrayals. We also document a prevalence of stereotypes (e.g. perpetual foreigner) in LM-generated outputs that are known to trigger psychological harms that disproportionately affect minoritized individuals. These include stereotype threat, which leads to impaired cognitive performance and increased negative self-perception. Our findings highlight the urgent need to protect consumers from discriminatory harms caused by language models and invest in critical AI education programs tailored towards empowering diverse consumers.

4/17/2024

The Psychosocial Impacts of Generative AI Harms

Faye-Marie Vassel, Evan Shieh, Cassidy R. Sugimoto, Thema Monroe-White

The rapid emergence of generative Language Models (LMs) has led to growing concern about the impacts that their unexamined adoption may have on the social well-being of diverse user groups. Meanwhile, LMs are increasingly being adopted in K-20 schools and one-on-one student settings with minimal investigation of potential harms associated with their deployment. Motivated in part by real-world/everyday use cases (e.g., an AI writing assistant) this paper explores the potential psychosocial harms of stories generated by five leading LMs in response to open-ended prompting. We extend findings of stereotyping harms analyzing a total of 150K 100-word stories related to student classroom interactions. Examining patterns in LM-generated character demographics and representational harms (i.e., erasure, subordination, and stereotyping) we highlight particularly egregious vignettes, illustrating the ways LM-generated outputs may influence the experiences of users with marginalized and minoritized identities, and emphasizing the need for a critical understanding of the psychosocial impacts of generative AI tools when deployed and utilized in diverse social contexts.

5/6/2024

Bias and Fairness in Large Language Models: A Survey

Isabel O. Gallegos, Ryan A. Rossi, Joe Barrow, Md Mehrab Tanjim, Sungchul Kim, Franck Dernoncourt, Tong Yu, Ruiyi Zhang, Nesreen K. Ahmed

Rapid advancements of large language models (LLMs) have enabled the processing, understanding, and generation of human-like text, with increasing integration into systems that touch our social sphere. Despite this success, these models can learn, perpetuate, and amplify harmful social biases. In this paper, we present a comprehensive survey of bias evaluation and mitigation techniques for LLMs. We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing, defining distinct facets of harm and introducing several desiderata to operationalize fairness for LLMs. We then unify the literature by proposing three intuitive taxonomies, two for bias evaluation, namely metrics and datasets, and one for mitigation. Our first taxonomy of metrics for bias evaluation disambiguates the relationship between metrics and evaluation datasets, and organizes metrics by the different levels at which they operate in a model: embeddings, probabilities, and generated text. Our second taxonomy of datasets for bias evaluation categorizes datasets by their structure as counterfactual inputs or prompts, and identifies the targeted harms and social groups; we also release a consolidation of publicly-available datasets for improved access. Our third taxonomy of techniques for bias mitigation classifies methods by their intervention during pre-processing, in-training, intra-processing, and post-processing, with granular subcategories that elucidate research trends. Finally, we identify open problems and challenges for future work. Synthesizing a wide range of recent research, we aim to provide a clear guide of the existing literature that empowers researchers and practitioners to better understand and prevent the propagation of bias in LLMs.

7/16/2024

Generative Language Models Exhibit Social Identity Biases

Tiancheng Hu, Yara Kyrychenko, Steve Rathje, Nigel Collier, Sander van der Linden, Jon Roozenbeek

The surge in popularity of large language models has given rise to concerns about biases that these models could learn from humans. We investigate whether ingroup solidarity and outgroup hostility, fundamental social identity biases known from social psychology, are present in 56 large language models. We find that almost all foundational language models and some instruction fine-tuned models exhibit clear ingroup-positive and outgroup-negative associations when prompted to complete sentences (e.g., We are...). Our findings suggest that modern language models exhibit fundamental social identity biases to a similar degree as humans, both in the lab and in real-world conversations with LLMs, and that curating training data and instruction fine-tuning can mitigate such biases. Our results have practical implications for creating less biased large-language models and further underscore the need for more research into user interactions with LLMs to prevent potential bias reinforcement in humans.

6/18/2024