Indian-BhED: A Dataset for Measuring India-Centric Biases in Large Language Models

Read original: arXiv:2309.08573 - Published 8/12/2024 by Khyati Khandelwal, Manuel Tonneau, Andrew M. Bean, Hannah Rose Kirk, Scott A. Hale

💬

Overview

Large language models (LLMs) are widely used, but can encode societal biases, leading to harmful effects on users.
Most bias research focuses on Western contexts, with less attention to biases in the Global South.
This paper quantifies stereotypical biases in popular LLMs using an Indian-centric dataset called Indian-BhED.

Plain English Explanation

Large language models are AI systems that can generate human-like text. These models are now used by millions of people every day. However, these models can pick up on and perpetuate societal biases, which can lead to harmful effects on the people who use them.

Most of the research on bias in language models has focused on biases in Western countries, like gender and racial biases. Comparatively less attention has been paid to biases in the Global South, such as biases related to caste and religion in India.

In this paper, the researchers quantify the level of stereotypical biases in popular language models using a new dataset called Indian-BhED. This dataset contains examples of stereotypical and anti-stereotypical beliefs about caste and religion in India.

The researchers found that many of the language models they tested, including GPT-2, GPT-2 Large, and GPT 3.5, have a strong tendency to generate stereotypical outputs, especially when it comes to caste (63-79%) and religion (69-72%) in the Indian context. This is higher than the biases traditionally studied in Western contexts, like gender and race.

The paper investigates possible reasons for these harmful biases in language models and suggests ways to reduce both stereotypical and anti-stereotypical biases. The findings highlight the need to include more diverse perspectives when studying fairness in AI and evaluating language models.

Technical Explanation

The researchers used a new dataset called Indian-BhED to quantify stereotypical biases in popular large language models (LLMs). Indian-BhED contains examples of stereotypical and anti-stereotypical beliefs about caste and religion in India.

The researchers tested several widely-used LLMs, including GPT-2, GPT-2 Large, and GPT 3.5. They measured the models' propensity to output stereotypical versus anti-stereotypical statements for each axis of bias (caste and religion).

The results show that the majority of LLMs tested have a strong tendency to generate stereotypical outputs, especially for caste (63-79%) and religion (69-72%). This is notably higher than biases traditionally studied in Western contexts, such as gender and race.

The researchers investigate potential causes for these harmful biases, including the models' training data and architectures. They also propose intervention techniques, such as targeted debiasing and adversarial training, to reduce both stereotypical and anti-stereotypical biases in LLMs.

Critical Analysis

The paper provides valuable insights into the presence of stereotypical biases in LLMs, particularly in the context of the Global South. However, the research is limited to a single dataset (Indian-BhED) and a handful of popular LLMs. Additional datasets and models from diverse cultural contexts would be needed to fully understand the extent and manifestations of bias in LLMs globally.

The authors acknowledge that their dataset may not capture the full complexity of social biases in India, and that further research is needed to develop more comprehensive benchmarks. They also note that their intervention techniques require further testing and refinement to be effective at reducing harmful biases in real-world applications.

It would be valuable for future research to explore the societal impacts of these biases, such as how they may contribute to the perpetuation of systemic discrimination. Additionally, more work is needed to understand the role of LLM developers and users in addressing bias and promoting fairness in AI systems.

Conclusion

This paper highlights the need for a more inclusive and global approach to studying and mitigating biases in large language models. The findings demonstrate that popular LLMs exhibit strong stereotypical biases, particularly in the context of caste and religion in India, which have been underexplored compared to biases in Western contexts.

The researchers' work underscores the importance of incorporating diverse perspectives and experiences when developing AI systems and evaluating their fairness. By expanding the scope of bias research to include the Global South, the field can work towards creating more inclusive and equitable language models that better serve the needs of all users.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

💬

Indian-BhED: A Dataset for Measuring India-Centric Biases in Large Language Models

Khyati Khandelwal, Manuel Tonneau, Andrew M. Bean, Hannah Rose Kirk, Scott A. Hale

Large Language Models (LLMs), now used daily by millions, can encode societal biases, exposing their users to representational harms. A large body of scholarship on LLM bias exists but it predominantly adopts a Western-centric frame and attends comparatively less to bias levels and potential harms in the Global South. In this paper, we quantify stereotypical bias in popular LLMs according to an Indian-centric frame through Indian-BhED, a first of its kind dataset, containing stereotypical and anti-stereotypical examples in the context of caste and religious stereotypes in India. We find that the majority of LLMs tested have a strong propensity to output stereotypes in the Indian context, especially when compared to axes of bias traditionally studied in the Western context, such as gender and race. Notably, we find that GPT-2, GPT-2 Large, and GPT 3.5 have a particularly high propensity for preferring stereotypical outputs as a percent of all sentences for the axes of caste (63-79%) and religion (69-72%). We finally investigate potential causes for such harmful behaviour in LLMs, and posit intervention techniques to reduce both stereotypical and anti-stereotypical biases. The findings of this work highlight the need for including more diverse voices when researching fairness in AI and evaluating LLMs.

8/12/2024

IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context

Nihar Ranjan Sahoo, Pranamya Prashant Kulkarni, Narjis Asad, Arif Ahmad, Tanu Goyal, Aparna Garimella, Pushpak Bhattacharyya

The pervasive influence of social biases in language data has sparked the need for benchmark datasets that capture and evaluate these biases in Large Language Models (LLMs). Existing efforts predominantly focus on English language and the Western context, leaving a void for a reliable dataset that encapsulates India's unique socio-cultural nuances. To bridge this gap, we introduce IndiBias, a comprehensive benchmarking dataset designed specifically for evaluating social biases in the Indian context. We filter and translate the existing CrowS-Pairs dataset to create a benchmark dataset suited to the Indian context in Hindi language. Additionally, we leverage LLMs including ChatGPT and InstructGPT to augment our dataset with diverse societal biases and stereotypes prevalent in India. The included bias dimensions encompass gender, religion, caste, age, region, physical appearance, and occupation. We also build a resource to address intersectional biases along three intersectional dimensions. Our dataset contains 800 sentence pairs and 300 tuples for bias measurement across different demographics. The dataset is available in English and Hindi, providing a size comparable to existing benchmark datasets. Furthermore, using IndiBias we compare ten different language models on multiple bias measurement metrics. We observed that the language models exhibit more bias across a majority of the intersectional groups.

4/4/2024

Exploring Bengali Religious Dialect Biases in Large Language Models with Evaluation Perspectives

Azmine Toushik Wasi, Raima Islam, Mst Rafia Islam, Taki Hasan Rafi, Dong-Kyu Chae

While Large Language Models (LLM) have created a massive technological impact in the past decade, allowing for human-enabled applications, they can produce output that contains stereotypes and biases, especially when using low-resource languages. This can be of great ethical concern when dealing with sensitive topics such as religion. As a means toward making LLMS more fair, we explore bias from a religious perspective in Bengali, focusing specifically on two main religious dialects: Hindu and Muslim-majority dialects. Here, we perform different experiments and audit showing the comparative analysis of different sentences using three commonly used LLMs: ChatGPT, Gemini, and Microsoft Copilot, pertaining to the Hindu and Muslim dialects of specific words and showcasing which ones catch the social biases and which do not. Furthermore, we analyze our findings and relate them to potential reasons and evaluation perspectives, considering their global impact with over 300 million speakers worldwide. With this work, we hope to establish the rigor for creating more fairness in LLMs, as these are widely used as creative writing agents.

7/29/2024

Social Bias in Large Language Models For Bangla: An Empirical Study on Gender and Religious Bias

Jayanta Sadhu, Maneesha Rani Saha, Rifat Shahriyar

The rapid growth of Large Language Models (LLMs) has put forward the study of biases as a crucial field. It is important to assess the influence of different types of biases embedded in LLMs to ensure fair use in sensitive fields. Although there have been extensive works on bias assessment in English, such efforts are rare and scarce for a major language like Bangla. In this work, we examine two types of social biases in LLM generated outputs for Bangla language. Our main contributions in this work are: (1) bias studies on two different social biases for Bangla (2) a curated dataset for bias measurement benchmarking (3) two different probing techniques for bias detection in the context of Bangla. This is the first work of such kind involving bias assessment of LLMs for Bangla to the best of our knowledge. All our code and resources are publicly available for the progress of bias related research in Bangla NLP.

7/8/2024