Exploring Bengali Religious Dialect Biases in Large Language Models with Evaluation Perspectives

Read original: arXiv:2407.18376 - Published 7/29/2024 by Azmine Toushik Wasi, Raima Islam, Mst Rafia Islam, Taki Hasan Rafi, Dong-Kyu Chae

Overview

The paper explores biases in large language models (LLMs) towards Bengali religious dialects.
Researchers evaluated several LLMs on their ability to handle dialects associated with different Bengali religious communities.
The study provides insights into the performance and biases of LLMs when dealing with linguistic diversity in multilingual and multireligious contexts.

Plain English Explanation

Large language models (LLMs) are artificial intelligence systems that can generate human-like text. They are trained on vast amounts of online data, so they can reflect societal biases present in that data.

This paper explores the biases of LLMs towards Bengali religious dialects. Bengali is a widely spoken language in South Asia, with dialects associated with different religious communities. The researchers wanted to understand how well LLMs handle this linguistic diversity and whether they exhibit biases towards certain religious dialects.

By evaluating LLMs on tasks related to Bengali religious dialects, the researchers gained insights into the potential biases and limitations of these models. This can help identify areas where LLMs struggle to accurately process and represent linguistic diversity, which is crucial for developing language technology that can serve diverse communities.

Technical Explanation

The paper presents an evaluation of LLMs on their ability to handle Bengali religious dialects. As the abstract states, the researchers focus on the two main religious dialects, those of the Hindu-majority and Muslim-majority communities, and compare sentences that use each dialect's version of specific words.

They then tested three commonly used LLMs (ChatGPT, Gemini, and Microsoft Copilot) on tasks such as dialect identification, sentiment analysis, and language generation using this data. The results revealed that the models exhibited varying degrees of bias, with some models performing better on specific dialects than others.
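A comparison of this kind can be sketched as a paired-prompt audit: each prompt is written twice, once with the Hindu-majority word and once with the Muslim-majority word, and the model's two responses are compared. The sketch below is an illustration under stated assumptions, not the authors' code. The romanized word pairs, the prompt template, `toy_model` (a stand-in for a real LLM call), and `simple_label` are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Illustrative romanized word pairs (Hindu-majority usage, Muslim-majority usage).
# These are assumed examples, not entries from the paper's data.
WORD_PAIRS: List[Tuple[str, str]] = [
    ("jol", "pani"),    # "water"
    ("snan", "gosol"),  # "bath"
]

# Hypothetical prompt template; only the dialect word changes between the two prompts.
TEMPLATE = "Describe a person who says the word '{word}'."


def build_prompt_pairs(
    pairs: List[Tuple[str, str]], template: str
) -> List[Tuple[str, str]]:
    """Create two prompts per word pair that differ only in the dialect word."""
    return [(template.format(word=h), template.format(word=m)) for h, m in pairs]


def audit(
    model: Callable[[str], str],
    prompt_pairs: List[Tuple[str, str]],
    labeler: Callable[[str], str],
) -> Dict[str, int]:
    """Count how often the labeled responses match or differ across dialects."""
    counts: Counter = Counter()
    for hindu_prompt, muslim_prompt in prompt_pairs:
        label_a = labeler(model(hindu_prompt))
        label_b = labeler(model(muslim_prompt))
        counts["same" if label_a == label_b else "different"] += 1
    return dict(counts)


def toy_model(prompt: str) -> str:
    """Stand-in for an LLM call; deliberately biased on one word for the demo."""
    return "negative description" if "pani" in prompt else "neutral description"


def simple_label(response: str) -> str:
    """Hypothetical labeler: keep only the first word of the response."""
    return response.split()[0]


if __name__ == "__main__":
    pairs = build_prompt_pairs(WORD_PAIRS, TEMPLATE)
    print(audit(toy_model, pairs, simple_label))
```

In a real audit, `toy_model` would be replaced by calls to the system under test and `simple_label` by human annotation or a validated classifier. A high share of "different" outcomes would suggest that the dialect word alone changes the model's behaviour.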

The study also explored potential reasons for these biases, such as the composition of the training data and the inherent challenges of representing linguistic diversity in LLMs. The findings highlight the need for more inclusive models that can effectively handle linguistic variation in multilingual and multireligious contexts.

Critical Analysis

The paper is a valuable contribution to research on bias in LLMs. However, the researchers acknowledge that their study is limited to a specific language and religious context, and more research is needed to establish how far the findings generalize.

Additionally, the paper does not delve deeply into the potential societal implications of the observed biases or provide concrete recommendations for addressing them. Further investigation into the practical applications and real-world impact of these biases would strengthen the research.

Conclusion

This study sheds light on Bengali religious dialect biases in large language models. The findings underscore the need for more rigorous evaluation of these models to ensure they are inclusive and fair, particularly in multilingual and multireligious contexts. Continued efforts to reduce bias in LLMs will be crucial for creating technology that serves the diverse needs of society.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Exploring Bengali Religious Dialect Biases in Large Language Models with Evaluation Perspectives

Azmine Toushik Wasi, Raima Islam, Mst Rafia Islam, Taki Hasan Rafi, Dong-Kyu Chae

While Large Language Models (LLM) have created a massive technological impact in the past decade, allowing for human-enabled applications, they can produce output that contains stereotypes and biases, especially when using low-resource languages. This can be of great ethical concern when dealing with sensitive topics such as religion. As a means toward making LLMs more fair, we explore bias from a religious perspective in Bengali, focusing specifically on two main religious dialects: Hindu and Muslim-majority dialects. Here, we perform different experiments and audit showing the comparative analysis of different sentences using three commonly used LLMs: ChatGPT, Gemini, and Microsoft Copilot, pertaining to the Hindu and Muslim dialects of specific words and showcasing which ones catch the social biases and which do not. Furthermore, we analyze our findings and relate them to potential reasons and evaluation perspectives, considering their global impact with over 300 million speakers worldwide. With this work, we hope to establish the rigor for creating more fairness in LLMs, as these are widely used as creative writing agents.

7/29/2024

Social Bias in Large Language Models For Bangla: An Empirical Study on Gender and Religious Bias

Jayanta Sadhu, Maneesha Rani Saha, Rifat Shahriyar

The rapid growth of Large Language Models (LLMs) has put forward the study of biases as a crucial field. It is important to assess the influence of different types of biases embedded in LLMs to ensure fair use in sensitive fields. Although there have been extensive works on bias assessment in English, such efforts are rare and scarce for a major language like Bangla. In this work, we examine two types of social biases in LLM generated outputs for Bangla language. Our main contributions in this work are: (1) bias studies on two different social biases for Bangla (2) a curated dataset for bias measurement benchmarking (3) two different probing techniques for bias detection in the context of Bangla. This is the first work of such kind involving bias assessment of LLMs for Bangla to the best of our knowledge. All our code and resources are publicly available for the progress of bias related research in Bangla NLP.

7/8/2024

Indian-BhED: A Dataset for Measuring India-Centric Biases in Large Language Models

Khyati Khandelwal, Manuel Tonneau, Andrew M. Bean, Hannah Rose Kirk, Scott A. Hale

Large Language Models (LLMs), now used daily by millions, can encode societal biases, exposing their users to representational harms. A large body of scholarship on LLM bias exists but it predominantly adopts a Western-centric frame and attends comparatively less to bias levels and potential harms in the Global South. In this paper, we quantify stereotypical bias in popular LLMs according to an Indian-centric frame through Indian-BhED, a first of its kind dataset, containing stereotypical and anti-stereotypical examples in the context of caste and religious stereotypes in India. We find that the majority of LLMs tested have a strong propensity to output stereotypes in the Indian context, especially when compared to axes of bias traditionally studied in the Western context, such as gender and race. Notably, we find that GPT-2, GPT-2 Large, and GPT 3.5 have a particularly high propensity for preferring stereotypical outputs as a percent of all sentences for the axes of caste (63-79%) and religion (69-72%). We finally investigate potential causes for such harmful behaviour in LLMs, and posit intervention techniques to reduce both stereotypical and anti-stereotypical biases. The findings of this work highlight the need for including more diverse voices when researching fairness in AI and evaluating LLMs.

8/12/2024

Bias and Fairness in Large Language Models: A Survey

Isabel O. Gallegos, Ryan A. Rossi, Joe Barrow, Md Mehrab Tanjim, Sungchul Kim, Franck Dernoncourt, Tong Yu, Ruiyi Zhang, Nesreen K. Ahmed

Rapid advancements of large language models (LLMs) have enabled the processing, understanding, and generation of human-like text, with increasing integration into systems that touch our social sphere. Despite this success, these models can learn, perpetuate, and amplify harmful social biases. In this paper, we present a comprehensive survey of bias evaluation and mitigation techniques for LLMs. We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing, defining distinct facets of harm and introducing several desiderata to operationalize fairness for LLMs. We then unify the literature by proposing three intuitive taxonomies, two for bias evaluation, namely metrics and datasets, and one for mitigation. Our first taxonomy of metrics for bias evaluation disambiguates the relationship between metrics and evaluation datasets, and organizes metrics by the different levels at which they operate in a model: embeddings, probabilities, and generated text. Our second taxonomy of datasets for bias evaluation categorizes datasets by their structure as counterfactual inputs or prompts, and identifies the targeted harms and social groups; we also release a consolidation of publicly-available datasets for improved access. Our third taxonomy of techniques for bias mitigation classifies methods by their intervention during pre-processing, in-training, intra-processing, and post-processing, with granular subcategories that elucidate research trends. Finally, we identify open problems and challenges for future work. Synthesizing a wide range of recent research, we aim to provide a clear guide of the existing literature that empowers researchers and practitioners to better understand and prevent the propagation of bias in LLMs.

7/16/2024