Subtle Biases Need Subtler Measures: Dual Metrics for Evaluating Representative and Affinity Bias in Large Language Models

2405.14555

Published 6/4/2024 by Abhishek Kumar, Sarfaroz Yunusov, Ali Emami


Abstract

Research on Large Language Models (LLMs) has often neglected subtle biases that, although less apparent, can significantly influence the models' outputs toward particular social narratives. This study addresses two such biases within LLMs: representative bias, which denotes a tendency of LLMs to generate outputs that mirror the experiences of certain identity groups, and affinity bias, reflecting the models' evaluative preferences for specific narratives or viewpoints. We introduce two novel metrics to measure these biases: the Representative Bias Score (RBS) and the Affinity Bias Score (ABS), and present the Creativity-Oriented Generation Suite (CoGS), a collection of open-ended tasks such as short story writing and poetry composition, designed with customized rubrics to detect these subtle biases. Our analysis uncovers marked representative biases in prominent LLMs, with a preference for identities associated with being white, straight, and men. Furthermore, our investigation of affinity bias reveals distinctive evaluative patterns within each model, akin to "bias fingerprints". This trend is also seen in human evaluators, highlighting a complex interplay between human and machine bias perceptions.


Overview

This research paper explores two types of biases present in large language models (LLMs): representative bias and affinity bias.
Representative bias refers to the tendency of LLMs to generate outputs that mirror the experiences of certain identity groups.
Affinity bias reflects the models' evaluative preferences for specific narratives or viewpoints.
The researchers introduce two novel metrics, the Representative Bias Score (RBS) and the Affinity Bias Score (ABS), to measure these biases.
They also present the Creativity-Oriented Generation Suite (CoGS), a collection of open-ended tasks like short story writing and poetry composition, designed to detect these subtle biases.

Plain English Explanation

Large language models (LLMs) are powerful AI systems that can generate human-like text. However, prior research on bias in language models, including work on the impact of unstated norms, has shown that these models can exhibit subtle biases that significantly influence their outputs.

This study focuses on two specific types of biases: representative bias and affinity bias. Representative bias refers to the tendency of LLMs to generate outputs that primarily reflect the experiences of certain identity groups, such as being white, straight, or male. Affinity bias, on the other hand, is the models' evaluative preference for specific narratives or viewpoints.

To measure these biases, the researchers developed two new metrics: the Representative Bias Score (RBS) and the Affinity Bias Score (ABS). They also created the Creativity-Oriented Generation Suite (CoGS), a collection of open-ended tasks like short story writing and poetry composition, designed with customized rubrics to detect these subtle biases.

The study found that prominent LLMs exhibit significant representative biases, with a preference for identities associated with being white, straight, and male. Additionally, the investigation of affinity bias revealed distinctive evaluative patterns within each model, akin to "bias fingerprints." Interestingly, this trend was also observed in human evaluators, highlighting the complex interplay between human and machine bias perceptions.

Technical Explanation

The researchers aimed to address the issue of subtle biases in LLMs that can influence their outputs and potentially reinforce certain social narratives. They focused on two specific biases: representative bias and affinity bias.

To measure these biases, the researchers developed two novel metrics:

Representative Bias Score (RBS): This metric quantifies the extent to which an LLM's outputs tend to mirror the experiences of particular identity groups.
Affinity Bias Score (ABS): This metric reflects the model's evaluative preferences for specific narratives or viewpoints.
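The summary does not give the formula behind the Representative Bias Score, so the following is only a minimal sketch of one way such a measurement could be set up, not the paper's definition. It assumes that a model's "default" outputs (prompts that name no identity) are compared with outputs generated for prompts that name a specific identity, and that higher similarity to one identity's outputs suggests the default leans toward that identity. The function names, the input layout, and the bag-of-words similarity (a stand-in for a sentence-embedding model) are all hypothetical.

```python
import math
from collections import Counter
from statistics import mean


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Bag-of-words cosine similarity (assumed stand-in for an embedding model)."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    # Counter returns 0 for missing tokens, so only shared tokens contribute.
    dot = sum(counts_a[token] * counts_b[token] for token in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def representative_similarity(default_outputs, identity_outputs):
    """Mean similarity of default outputs to each identity's outputs.

    default_outputs: list of texts generated with no identity in the prompt.
    identity_outputs: dict mapping an identity label to a list of texts
        generated when the prompt names that identity.
    Returns a dict mapping each identity label to its mean pairwise similarity.
    """
    scores = {}
    for identity, texts in identity_outputs.items():
        pairs = [cosine_similarity(d, t) for d in default_outputs for t in texts]
        scores[identity] = mean(pairs)
    return scores


# Hypothetical toy data, purely for illustration.
default_outputs = ["the quiet town by the sea"]
identity_outputs = {
    "group_a": ["the quiet town by the sea"],
    "group_b": ["rockets launch at dawn"],
}
scores = representative_similarity(default_outputs, identity_outputs)
```

In this toy setup, a large gap between the per-identity values would indicate that the default outputs resemble one group's outputs far more than another's. A real analysis would need many samples per task and a stronger similarity measure.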

The researchers also introduced the Creativity-Oriented Generation Suite (CoGS), a collection of open-ended tasks such as short story writing and poetry composition, designed with customized rubrics to detect biases in large language models.

The study's analysis revealed marked representative biases in prominent LLMs, with a preference for identities associated with being white, straight, and male. Furthermore, the investigation of affinity bias uncovered distinctive evaluative patterns within each model, akin to "bias fingerprints." Interestingly, this trend was also observed in human evaluators, suggesting a complex interplay between human and machine bias perceptions.
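To make the "bias fingerprint" idea concrete, here is a small sketch under stated assumptions; it is not the paper's Affinity Bias Score. It assumes an evaluator (a model or a human) has assigned numeric rubric scores to outputs associated with each identity, and it describes that evaluator by how far each identity's mean score sits above or below the evaluator's overall mean. The function name, the score scale, and the data are hypothetical.

```python
from statistics import mean


def affinity_fingerprint(ratings):
    """Per-identity deviation from an evaluator's overall mean rubric score.

    ratings: dict mapping an identity label to the list of rubric scores
        one evaluator gave to outputs associated with that identity.
    Returns a dict mapping each identity label to (identity mean - overall mean).
    """
    all_scores = [score for scores in ratings.values() for score in scores]
    overall = mean(all_scores)
    return {
        identity: mean(scores) - overall
        for identity, scores in ratings.items()
    }


# Hypothetical rubric scores from two evaluators, purely for illustration.
evaluator_one = {"group_a": [4, 5], "group_b": [2, 3]}
evaluator_two = {"group_a": [3, 3], "group_b": [4, 4]}

fingerprint_one = affinity_fingerprint(evaluator_one)
fingerprint_two = affinity_fingerprint(evaluator_two)
```

Comparing the resulting dictionaries across evaluators shows whether they favor the same groups or different ones, which is the sense in which each evaluator could be said to have its own fingerprint.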

Critical Analysis

The researchers acknowledge several caveats and limitations in their study. For instance, the findings suggest that LLMs used as evaluators can themselves be inconsistent and biased, so further research is needed to understand the extent and nature of these biases.

Additionally, the framework proposed in the paper focuses on measuring biases; the researchers did not explore specific techniques for mitigating them. Future research should investigate effective methods for reducing representative and affinity biases in LLMs.

It's also worth noting that the study's findings are based on a limited set of LLMs and open-ended tasks. Expanding the analysis to a broader range of models and tasks could provide a more comprehensive understanding of these biases and their implications.

Conclusion

This research highlights the importance of scrutinizing the subtle biases present in large language models, as they can significantly influence the models' outputs and reinforce certain social narratives. By introducing novel metrics to measure representative and affinity biases, and the Creativity-Oriented Generation Suite to detect these biases, the researchers have made an important contribution to the field of AI bias analysis.

The study's findings suggest that prominent LLMs exhibit significant representative biases, favoring identities associated with being white, straight, and male. The investigation of affinity bias also revealed distinctive evaluative patterns within each model, akin to "bias fingerprints." This trend was observed in both machine and human evaluators, underscoring the complex interplay between human and machine bias perceptions.

As the use of LLMs continues to expand, it is crucial to quantify and mitigate the biases in these models to ensure their outputs are fair, inclusive, and representative of diverse experiences. This research provides a valuable framework for addressing these important challenges in the development and deployment of large language models.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!
