Evaluating Gender Bias of Pre-trained Language Models in Natural Language Inference by Considering All Labels

Read original: arXiv:2309.09697 - Published 5/21/2024 by Panatchakorn Anantaprayoon, Masahiro Kaneko, Naoaki Okazaki

💬

Overview

Researchers investigated gender biases in pre-trained language models (PLMs) across multiple languages
They proposed a new bias evaluation method called NLI-CoAL that considers all three labels in natural language inference (NLI) tasks
The method creates evaluation datasets in English, Japanese, and Chinese to measure biases in PLMs across languages

Plain English Explanation

Language models, which are AI systems trained on large amounts of text data, can sometimes develop biases - for example, associating certain occupations more strongly with one gender over another. This can lead to problems when these models are used in real-world applications.

In this research, the team focused on evaluating gender biases in language models across different languages - English, Japanese, and Chinese. Rather than just looking at whether a model makes the "right" or "wrong" prediction, they created a more nuanced evaluation method called NLI-CoAL.

NLI is a task where the model has to determine whether a given statement (the "premise") can be inferred from another statement (the "hypothesis"). NLI-CoAL looks at all three possible outcomes - entailment, neutral, and contradiction - to get a better sense of how the model is reasoning and where biases might be creeping in.

The researchers built datasets in each language that capture different types of gender biases, then used NLI-CoAL to measure how the language models performed. This allowed them to identify biases that might have been missed by simpler evaluation methods.

Importantly, they found gender biases not only in English language models, but also in Japanese and Chinese models. This suggests that these issues are widespread and that we need to be vigilant about measuring and addressing them as these technologies are developed and deployed around the world.

Technical Explanation

The researchers first created three groups of evaluation data for the NLI task, each representing a different type of gender bias:

Stereotypical gender biases: Premises and hypotheses that reinforce common gender stereotypes.
Occupational gender biases: Premises and hypotheses related to gender and occupation.
Relational gender biases: Premises and hypotheses involving familial or romantic relationships.

They then defined a bias measure based on the model's output probabilities for the three NLI labels (entailment, neutral, contradiction) across these data groups. This "NLI-CoAL" (Comprehensive Bias Alignment) metric aims to better capture biased inferences that may be associated with specific labels, rather than just looking at overall accuracy.

The researchers conducted experiments to validate their NLI-CoAL metric, showing that it can distinguish biased, incorrect inferences from non-biased incorrect inferences better than a baseline approach. They created NLI datasets in English, Japanese, and Chinese, and used their metric to analyze gender biases in language models across these languages.

The results revealed gender biases in language models for all three languages, confirming that these issues are not limited to English-language models. The researchers were the first to construct such evaluation datasets and measure biases in Japanese and Chinese language models.

Critical Analysis

The researchers acknowledge several limitations in their work. First, the evaluation datasets they created, while comprehensive, may not capture the full range of gender biases present in language models. There may be other types of biases that are not represented.

Additionally, the NLI-CoAL metric, while an improvement over previous approaches, still relies on human judgments to categorize biases into the three data groups. There is inherent subjectivity in this process that could affect the results.

The researchers also note that their analysis focuses on gender biases, but language models may exhibit biases along other demographic dimensions as well, such as race, age, or socioeconomic status. Further research would be needed to understand the intersectional nature of these biases.

Overall, this work represents an important step forward in measuring and understanding gender biases in language models across multiple languages. However, continued effort will be needed to develop more robust and comprehensive bias evaluation techniques, as well as to identify effective strategies for mitigating these biases in the development of language AI systems.

Conclusion

This research paper proposes a new method, NLI-CoAL, for comprehensively evaluating gender biases in pre-trained language models across multiple languages. By considering all three labels in natural language inference tasks, the authors were able to uncover biased inferences that might have been missed by simpler evaluation approaches.

The findings confirm that gender biases are prevalent not only in English language models, but also in Japanese and Chinese models. This underscores the importance of addressing these issues as language AI systems are developed and deployed globally.

While the NLI-CoAL method represents an important advance, the researchers acknowledge limitations and suggest areas for further work, such as exploring biases along other demographic dimensions. Continued research and vigilance will be necessary to ensure language technologies are equitable and avoid perpetuating harmful stereotypes.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

💬

Evaluating Gender Bias of Pre-trained Language Models in Natural Language Inference by Considering All Labels

Panatchakorn Anantaprayoon, Masahiro Kaneko, Naoaki Okazaki

Discriminatory gender biases have been found in Pre-trained Language Models (PLMs) for multiple languages. In Natural Language Inference (NLI), existing bias evaluation methods have focused on the prediction results of one specific label out of three labels, such as neutral. However, such evaluation methods can be inaccurate since unique biased inferences are associated with unique prediction labels. Addressing this limitation, we propose a bias evaluation method for PLMs, called NLI-CoAL, which considers all the three labels of NLI task. First, we create three evaluation data groups that represent different types of biases. Then, we define a bias measure based on the corresponding label output of each data group. In the experiments, we introduce a meta-evaluation technique for NLI bias measures and use it to confirm that our bias measure can distinguish biased, incorrect inferences from non-biased incorrect inferences better than the baseline, resulting in a more accurate bias evaluation. We create the datasets in English, Japanese, and Chinese, and successfully validate the compatibility of our bias measure across multiple languages. Lastly, we observe the bias tendencies in PLMs of different languages. To our knowledge, we are the first to construct evaluation datasets and measure PLMs' bias from NLI in Japanese and Chinese.

5/21/2024

Leveraging Large Language Models to Measure Gender Bias in Gendered Languages

Erik Derner, Sara Sansalvador de la Fuente, Yoan Guti'errez, Paloma Moreda, Nuria Oliver

Gender bias in text corpora used in various natural language processing (NLP) contexts, such as for training large language models (LLMs), can lead to the perpetuation and amplification of societal inequalities. This is particularly pronounced in gendered languages like Spanish or French, where grammatical structures inherently encode gender, making the bias analysis more challenging. Existing methods designed for English are inadequate for this task due to the intrinsic linguistic differences between English and gendered languages. This paper introduces a novel methodology that leverages the contextual understanding capabilities of LLMs to quantitatively analyze gender representation in Spanish corpora. By utilizing LLMs to identify and classify gendered nouns and pronouns in relation to their reference to human entities, our approach provides a nuanced analysis of gender biases. We empirically validate our method on four widely-used benchmark datasets, uncovering significant gender disparities with a male-to-female ratio ranging from 4:1 to 6:1. These findings demonstrate the value of our methodology for bias quantification in gendered languages and suggest its application in NLP, contributing to the development of more equitable language technologies.

6/21/2024

💬

Bias and Fairness in Large Language Models: A Survey

Isabel O. Gallegos, Ryan A. Rossi, Joe Barrow, Md Mehrab Tanjim, Sungchul Kim, Franck Dernoncourt, Tong Yu, Ruiyi Zhang, Nesreen K. Ahmed

Rapid advancements of large language models (LLMs) have enabled the processing, understanding, and generation of human-like text, with increasing integration into systems that touch our social sphere. Despite this success, these models can learn, perpetuate, and amplify harmful social biases. In this paper, we present a comprehensive survey of bias evaluation and mitigation techniques for LLMs. We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing, defining distinct facets of harm and introducing several desiderata to operationalize fairness for LLMs. We then unify the literature by proposing three intuitive taxonomies, two for bias evaluation, namely metrics and datasets, and one for mitigation. Our first taxonomy of metrics for bias evaluation disambiguates the relationship between metrics and evaluation datasets, and organizes metrics by the different levels at which they operate in a model: embeddings, probabilities, and generated text. Our second taxonomy of datasets for bias evaluation categorizes datasets by their structure as counterfactual inputs or prompts, and identifies the targeted harms and social groups; we also release a consolidation of publicly-available datasets for improved access. Our third taxonomy of techniques for bias mitigation classifies methods by their intervention during pre-processing, in-training, intra-processing, and post-processing, with granular subcategories that elucidate research trends. Finally, we identify open problems and challenges for future work. Synthesizing a wide range of recent research, we aim to provide a clear guide of the existing literature that empowers researchers and practitioners to better understand and prevent the propagation of bias in LLMs.

7/16/2024

What is Your Favorite Gender, MLM? Gender Bias Evaluation in Multilingual Masked Language Models

Jeongrok Yu, Seong Ug Kim, Jacob Choi, Jinho D. Choi

Bias is a disproportionate prejudice in favor of one side against another. Due to the success of transformer-based Masked Language Models (MLMs) and their impact on many NLP tasks, a systematic evaluation of bias in these models is needed more than ever. While many studies have evaluated gender bias in English MLMs, only a few works have been conducted for the task in other languages. This paper proposes a multilingual approach to estimate gender bias in MLMs from 5 languages: Chinese, English, German, Portuguese, and Spanish. Unlike previous work, our approach does not depend on parallel corpora coupled with English to detect gender bias in other languages using multilingual lexicons. Moreover, a novel model-based method is presented to generate sentence pairs for a more robust analysis of gender bias, compared to the traditional lexicon-based method. For each language, both the lexicon-based and model-based methods are applied to create two datasets respectively, which are used to evaluate gender bias in an MLM specifically trained for that language using one existing and 3 new scoring metrics. Our results show that the previous approach is data-sensitive and not stable as it does not remove contextual dependencies irrelevant to gender. In fact, the results often flip when different scoring metrics are used on the same dataset, suggesting that gender bias should be studied on a large dataset using multiple evaluation metrics for best practice.

4/11/2024