Multilingual large language models leak human stereotypes across language boundaries

Read original: arXiv:2312.07141 - Published 5/10/2024 by Yang Trista Cao, Anna Sotnikova, Jieyu Zhao, Linda X. Zou, Rachel Rudinger, Hal Daume III

Multilingual large language models leak human stereotypes across language boundaries

Overview

This paper examines how multilingual large language models (MLLMs) can inadvertently learn and reproduce human stereotypes across different languages.
The researchers developed techniques to measure the extent of this "stereotype leakage" in popular MLLM models like XLMR and mT5.
Their findings suggest that these models absorb and amplify societal biases from their training data, with concerning implications for the ethical deployment of such technologies.

Plain English Explanation

The paper looks at how powerful AI language models that can understand and generate text in multiple languages may inadvertently pick up and spread harmful stereotypes. These large language models are trained on huge amounts of online data, which can reflect societal biases and prejudices.

The researchers developed ways to measure how much these biases "leak" across language boundaries when the models are used. For example, they found that if an MLLM is exposed to sexist associations in English, it may start making similar biased associations in other languages it knows, like French or Spanish.

This is concerning because these language models are increasingly being used in real-world applications, from customer service chatbots to automated writing assistants. If they propagate stereotypes, it could reinforce unfair treatment of certain groups. The paper highlights the need to carefully audit and address these ethical issues as MLLMs become more widespread.

Technical Explanation

The researchers used a range of techniques to quantify stereotype leakage in popular multilingual language models like XLMR and mT5.

First, they leveraged stereotype measurement datasets to probe the models' associations between social groups and attributes. This allowed them to detect biases like associating women with home and family, or ethnic minorities with lower socioeconomic status.

Next, they analyzed the models' cross-lingual generalization, showing how biases learned in one language can get transferred to others. For example, an English-trained model may start making similar gender stereotypes in French or Mandarin.

The team also investigated how dataset imbalances and unspecified training norms can exacerbate this stereotype leakage problem.

Overall, the findings demonstrate the troubling tendency of powerful multilingual language models to internalize and propagate human biases across linguistic boundaries. This has significant ethical implications for the real-world deployment of these technologies.

Critical Analysis

The paper provides a rigorous empirical investigation of an important and underexplored issue in ethical AI. By developing novel techniques to quantify stereotype leakage, the researchers make a valuable contribution to the growing body of work on bias in large language models.

That said, the study has some limitations. The stereotype measurement datasets used, while multilingual, may not fully capture the nuanced ways biases manifest across diverse cultural contexts. Additionally, the analysis focuses on a few prominent MLLM architectures, but the findings may not generalize to the rapidly evolving landscape of multilingual language models.

Further research is needed to understand the complex interplay between dataset characteristics, model architectures, and the propagation of stereotypes. Longitudinal studies tracking how these issues evolve as the technology matures would also be valuable.

Ultimately, this work highlights the critical need for deep, multidisciplinary collaboration between AI researchers, ethicists, and domain experts to ensure the responsible development and deployment of multilingual language models. Ongoing vigilance and a commitment to identifying and mitigating societal biases will be essential.

Conclusion

This paper demonstrates that multilingual large language models can inadvertently learn and spread human stereotypes across language boundaries. By developing novel techniques to measure this "stereotype leakage," the researchers have shed light on a concerning ethical issue that must be addressed as these powerful AI systems become more widespread.

The findings underscore the urgent need for comprehensive auditing and mitigation strategies to ensure that multilingual language models respect human rights and promote fairness and inclusion. As the technology continues to advance, ongoing collaboration between diverse stakeholders will be crucial to realizing the transformative potential of these tools while upholding fundamental ethical principles.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

Multilingual large language models leak human stereotypes across language boundaries

Yang Trista Cao, Anna Sotnikova, Jieyu Zhao, Linda X. Zou, Rachel Rudinger, Hal Daume III

Multilingual large language models have been increasingly popular for their proficiency in processing and generating text across various languages. Previous research has shown that the presence of stereotypes and biases in monolingual large language models can be attributed to the nature of their training data, which is collected from humans and reflects societal biases. Multilingual language models undergo the same training procedure as monolingual ones, albeit with training data sourced from various languages. This raises the question: do stereotypes present in one social context leak across languages within the model? In our work, we first define the term ``stereotype leakage'' and propose a framework for its measurement. With this framework, we investigate how stereotypical associations leak across four languages: English, Russian, Chinese, and Hindi. To quantify the stereotype leakage, we employ an approach from social psychology, measuring stereotypes via group-trait associations. We evaluate human stereotypes and stereotypical associations manifested in multilingual large language models such as mBERT, mT5, and GPT-3.5. Our findings show a noticeable leakage of positive, negative, and non-polar associations across all languages. Notably, Hindi within multilingual models appears to be the most susceptible to influence from other languages, while Chinese is the least. Additionally, GPT-3.5 exhibits a better alignment with human scores than other models. WARNING: This paper contains model outputs which could be offensive in nature.

5/10/2024

💬

Do Multilingual Large Language Models Mitigate Stereotype Bias?

Shangrui Nie, Michael Fromm, Charles Welch, Rebekka Gorge, Akbar Karimi, Joan Plepi, Nazia Afsan Mowmita, Nicolas Flores-Herr, Mehdi Ali, Lucie Flek

While preliminary findings indicate that multilingual LLMs exhibit reduced bias compared to monolingual ones, a comprehensive understanding of the effect of multilingual training on bias mitigation, is lacking. This study addresses this gap by systematically training six LLMs of identical size (2.6B parameters) and architecture: five monolingual models (English, German, French, Italian, and Spanish) and one multilingual model trained on an equal distribution of data across these languages, all using publicly available data. To ensure robust evaluation, standard bias benchmarks were automatically translated into the five target languages and verified for both translation quality and bias preservation by human annotators. Our results consistently demonstrate that multilingual training effectively mitigates bias. Moreover, we observe that multilingual models achieve not only lower bias but also superior prediction accuracy when compared to monolingual models with the same amount of training data, model architecture, and size.

7/10/2024

Auditing Large Language Models for Enhanced Text-Based Stereotype Detection and Probing-Based Bias Evaluation

Zekun Wu, Sahan Bulathwela, Maria Perez-Ortiz, Adriano Soares Koshiyama

Recent advancements in Large Language Models (LLMs) have significantly increased their presence in human-facing Artificial Intelligence (AI) applications. However, LLMs could reproduce and even exacerbate stereotypical outputs from training data. This work introduces the Multi-Grain Stereotype (MGS) dataset, encompassing 51,867 instances across gender, race, profession, religion, and stereotypical text, collected by fusing multiple previously publicly available stereotype detection datasets. We explore different machine learning approaches aimed at establishing baselines for stereotype detection, and fine-tune several language models of various architectures and model sizes, presenting in this work a series of stereotypes classifier models for English text trained on MGS. To understand whether our stereotype detectors capture relevant features (aligning with human common sense) we utilise a variety of explanainable AI tools, including SHAP, LIME, and BertViz, and analyse a series of example cases discussing the results. Finally, we develop a series of stereotype elicitation prompts and evaluate the presence of stereotypes in text generation tasks with popular LLMs, using one of our best performing previously presented stereotypes detectors. Our experiments yielded several key findings: i) Training stereotype detectors in a multi-dimension setting yields better results than training multiple single-dimension classifiers.ii) The integrated MGS Dataset enhances both the in-dataset and cross-dataset generalisation ability of stereotype detectors compared to using the datasets separately. iii) There is a reduction in stereotypes in the content generated by GPT Family LLMs with newer versions.

4/3/2024

💬

Generative Language Models Exhibit Social Identity Biases

Tiancheng Hu, Yara Kyrychenko, Steve Rathje, Nigel Collier, Sander van der Linden, Jon Roozenbeek

The surge in popularity of large language models has given rise to concerns about biases that these models could learn from humans. We investigate whether ingroup solidarity and outgroup hostility, fundamental social identity biases known from social psychology, are present in 56 large language models. We find that almost all foundational language models and some instruction fine-tuned models exhibit clear ingroup-positive and outgroup-negative associations when prompted to complete sentences (e.g., We are...). Our findings suggest that modern language models exhibit fundamental social identity biases to a similar degree as humans, both in the lab and in real-world conversations with LLMs, and that curating training data and instruction fine-tuning can mitigate such biases. Our results have practical implications for creating less biased large-language models and further underscore the need for more research into user interactions with LLMs to prevent potential bias reinforcement in humans.

6/18/2024