Evaluating Metrics for Bias in Word Embeddings

Read original: arXiv:2111.07864 - Published 9/14/2024 by Sarah Schroder, Alexander Schulz, Philip Kenneweg, Robert Feldhans, Fabian Hinder, Barbara Hammer

Overview

Word and sentence embeddings have become widely used for NLP tasks, but they can inherit biases from the training data.
Researchers have proposed various metrics to quantify bias in these embeddings, but there are doubts about the effectiveness of these metrics.
This work explores different cosine-based bias metrics, formalizes a bias definition, and proposes a new metric called SAME to address the limitations of existing metrics.

Plain English Explanation

Word and sentence embeddings are a way of representing text data that has become very important for Natural Language Processing (NLP) tasks. These embeddings can capture the meaning and relationships between words or sentences, and have led to significant improvements in NLP performance.

However, it has been found that these embeddings can also inherit biases that are present in the data used to train them. For example, if the training data contains gender stereotypes, the resulting embeddings may reflect those biases. This is problematic, as it can lead to unfair or discriminatory outputs in NLP applications.

Researchers have proposed various metrics to try to measure the biases present in word and sentence embeddings. The most common approach is to use cosine-based metrics, which look at the similarity between certain words or concepts. For example, a metric might check whether the embedding for "nurse" is more similar to "woman" than to "man", and whether the embedding for "doctor" shows the opposite pattern.
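To make the cosine-based comparison concrete, here is a minimal Python sketch of a word-level association score in the style used by WEAT: a word's mean cosine similarity to one attribute group minus its mean similarity to another. The vectors are made-up 3-dimensional toy values (an assumption, not data from any real embedding model), and `cosine` and `association` are hypothetical helpers, not functions from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(word, group_a, group_b):
    """Mean cosine to group A minus mean cosine to group B.

    > 0: the word sits closer to group A; < 0: closer to group B; 0: no preference.
    """
    mean_a = sum(cosine(word, a) for a in group_a) / len(group_a)
    mean_b = sum(cosine(word, b) for b in group_b) / len(group_b)
    return mean_a - mean_b


# Hypothetical 3-d toy vectors; not taken from any real embedding model.
emb = {
    "woman": [1.0, 0.2, 0.0],
    "man": [0.2, 1.0, 0.0],
    "nurse": [0.9, 0.3, 0.4],
    "doctor": [0.5, 0.5, 0.7],
}

print(round(association(emb["nurse"], [emb["woman"]], [emb["man"]]), 3))   # → 0.457
print(round(association(emb["doctor"], [emb["woman"]], [emb["man"]]), 3))  # → 0.0
```

In this toy space "nurse" leans toward "woman", while "doctor" is equidistant from "woman" and "man", so its score is exactly zero.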

But recently, some researchers have raised doubts about these cosine-based metrics. They've found that even when these metrics report low levels of bias, other tests can still reveal biases in the embeddings. This suggests that the existing metrics may not be capturing all the nuances of bias in a robust way.

In this paper, the authors aim to address this issue. They formalize a definition of bias and derive conditions that a good bias metric should satisfy. They then thoroughly analyze the limitations of the existing cosine-based metrics and propose a new metric called SAME that is designed to overcome these shortcomings.

The key idea behind SAME is to not just look at the raw cosine similarity, but to also consider how the similarity between concepts compares to the overall similarity distribution in the embedding space. This allows SAME to better detect biases that might be missed by simpler cosine-based approaches.
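The paper's actual SAME formula is not reproduced in this summary, so the sketch below only illustrates the general idea described above, under stated assumptions: it standardizes a word's raw association against the associations of a reference vocabulary (a z-score). The helper `adjusted_association`, the reference set, and the choice of a z-score are all hypothetical and should not be read as the authors' definition of SAME.

```python
import math
import statistics


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def association(word, group_a, group_b):
    """Raw association: mean cosine to group A minus mean cosine to group B."""
    mean_a = sum(cosine(word, a) for a in group_a) / len(group_a)
    mean_b = sum(cosine(word, b) for b in group_b) / len(group_b)
    return mean_a - mean_b


def adjusted_association(word, group_a, group_b, reference_words):
    """Hypothetical distribution-adjusted score (NOT the paper's SAME formula).

    Expresses the word's raw association as a z-score relative to the raw
    associations of a reference vocabulary.
    """
    baseline = [association(r, group_a, group_b) for r in reference_words]
    mu = statistics.mean(baseline)
    sigma = statistics.pstdev(baseline)
    if sigma == 0:
        raise ValueError("reference associations have zero spread")
    return (association(word, group_a, group_b) - mu) / sigma


# Hypothetical 2-d toy setup: group A points along x, group B along y,
# and the reference vocabulary as a whole already leans toward A.
group_a = [[1.0, 0.0]]
group_b = [[0.0, 1.0]]
reference = [[3.0, 1.0], [1.0, 1.0], [5.0, 2.0]]

for word in ([2.0, 1.0], [5.0, 0.0]):
    raw = association(word, group_a, group_b)
    z = adjusted_association(word, group_a, group_b, reference)
    print(word, round(raw, 2), round(z, 2))
```

Both toy words have a positive raw score, but once the lean of the reference vocabulary is accounted for, [2.0, 1.0] gets a z-score well below 1, while [5.0, 0.0] sits more than two standard deviations above the reference mean.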

By providing a more rigorous and comprehensive way to evaluate bias in word and sentence embeddings, this research aims to help develop more fair and inclusive NLP systems.

Technical Explanation

The paper begins by establishing the importance of word and sentence embeddings for NLP tasks, and the well-known issue of these embeddings inheriting biases from the training data. The authors note that many previous works have attempted to quantify bias in embeddings using cosine-based metrics, but that recent studies have raised doubts about the effectiveness of these metrics.

The authors then formalize a definition of bias, stating that an embedding is biased if the similarity between certain concepts (e.g. "woman" and "nurse") is significantly different from the similarity between other related concepts (e.g. "man" and "doctor"). They derive a set of conditions that a good bias metric should satisfy, such as being able to detect different types of bias and providing interpretable results.
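The informal definition above can be written more precisely under one common convention for cosine-based scores. This is an assumption for illustration, not the paper's own formal definition: a neutral word w is compared against two attribute groups A and B, and the tolerance ε is a free parameter.

```latex
\begin{aligned}
\cos(u, v) &= \frac{u^\top v}{\lVert u \rVert \, \lVert v \rVert} \\
s(w, A, B) &= \frac{1}{|A|} \sum_{a \in A} \cos(w, a) \;-\; \frac{1}{|B|} \sum_{b \in B} \cos(w, b) \\
\text{$w$ is biased w.r.t. } (A, B) &\iff \lvert s(w, A, B) \rvert > \varepsilon, \qquad \varepsilon \ge 0
\end{aligned}
```

For example, with A = {"woman"}, B = {"man"}, and w = "nurse", a positive s(w, A, B) means "nurse" lies closer to "woman" than to "man".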

The paper then provides a detailed analysis of existing cosine-based bias metrics, including the widely used cosine similarity difference (CSD) and cosine directional bias (CDB) metrics. The authors show that these metrics can fail to detect biases in certain cases, such as when the overall distribution of similarities in the embedding space is skewed.
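The summary does not spell out the failure case, so the following is only a plausible illustration (an assumption, not the paper's analysis). Adding one shared offset to every vector leaves the differences between words unchanged but pushes all cosine similarities toward 1, which shrinks a raw cosine-difference score. Mean-centering, a common preprocessing step used here purely for illustration, removes that shared offset. `nurse_score`, `shift`, and `center` are hypothetical helpers.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nurse_score(vectors):
    """Raw cosine difference: cos(nurse, woman) - cos(nurse, man)."""
    return cosine(vectors["nurse"], vectors["woman"]) - cosine(vectors["nurse"], vectors["man"])


def shift(vectors, offset):
    """Add one shared offset to every vector (a common, skewed direction)."""
    return {k: [x + o for x, o in zip(v, offset)] for k, v in vectors.items()}


def center(vectors):
    """Subtract the mean of all vectors from each vector."""
    n = len(vectors)
    dim = len(next(iter(vectors.values())))
    mean = [sum(v[i] for v in vectors.values()) / n for i in range(dim)]
    return {k: [x - m for x, m in zip(v, mean)] for k, v in vectors.items()}


# Hypothetical toy vectors; the offset plays the role of a direction shared by all words.
emb = {"woman": [1.0, 0.0, 0.0], "man": [0.0, 1.0, 0.0], "nurse": [0.8, 0.2, 0.3]}
skewed = shift(emb, [0.0, 0.0, 5.0])

print(round(nurse_score(emb), 3))     # original space
print(round(nurse_score(skewed), 3))  # same relative geometry, much smaller score
print(round(nurse_score(center(emb)), 3), round(nurse_score(center(skewed)), 3))
```

Here the raw score drops from about 0.68 to about 0.02 after the shift, even though no word moved relative to the others, while the mean-centered scores of the two versions agree.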

To address these shortcomings, the authors propose a new metric called SAME. SAME works by not only looking at the raw cosine similarity between concepts, but also considering how that similarity compares to the overall similarity distribution in the embedding space. This allows SAME to better identify biases that might be missed by simpler metrics.

The paper includes a mathematical proof showing that SAME satisfies the conditions for a good bias metric, and the authors also demonstrate the effectiveness of SAME through experiments on various word and sentence embedding datasets.

Critical Analysis

The paper provides a thoughtful and rigorous analysis of the challenges in evaluating bias in word and sentence embeddings, and proposes a novel metric that aims to address the limitations of existing approaches.

One strength of the work is the authors' formal definition of bias and the conditions they derive for a good bias metric. This helps establish a clear theoretical framework for evaluating bias, which is important given the subjective and multifaceted nature of bias.

The in-depth analysis of the existing cosine-based metrics is also valuable, as it sheds light on the specific ways in which these metrics can fail to capture certain types of bias. This helps motivate the need for a more sophisticated approach like SAME.

That said, the paper does not extensively discuss potential limitations or caveats of the SAME metric itself. While the authors provide a mathematical proof of SAME's properties, there may be other edge cases or considerations that are not explored. Additionally, the evaluation of SAME is limited to a few selected datasets and embedding models.

It would also be helpful if the paper provided more discussion on the real-world implications and applications of this work. How might the SAME metric be used in practice to develop fairer NLP systems? What are the broader societal impacts of being able to better measure and mitigate bias in language models?

Overall, this is a well-executed technical paper that makes a meaningful contribution to the challenge of evaluating and addressing bias in word and sentence embeddings. The proposed SAME metric appears promising, but further research and validation would be needed to fully assess its capabilities and limitations.

Conclusion

This paper tackles the important issue of bias in word and sentence embeddings, a core component of many modern NLP systems. The authors formalize a definition of bias, analyze the limitations of existing cosine-based bias metrics, and propose a new metric called SAME that aims to address these shortcomings.

By providing a more rigorous and comprehensive approach to bias evaluation, this research has the potential to help develop fairer and more inclusive language models and NLP applications. The SAME metric, in particular, shows promise as a tool for quantifying biases that previous metrics could miss, which in turn can inform efforts to mitigate them.

While further research and validation would be helpful, this work represents an important step forward in the ongoing effort to build ethical and equitable AI systems that respect and reflect the diversity of human language and experience.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Evaluating Metrics for Bias in Word Embeddings

Sarah Schroder, Alexander Schulz, Philip Kenneweg, Robert Feldhans, Fabian Hinder, Barbara Hammer

Over the last years, word and sentence embeddings have become established as text preprocessing for all kinds of NLP tasks and have improved performance significantly. Unfortunately, it has also been shown that these embeddings inherit various kinds of biases from the training data and thereby pass on biases present in society to NLP solutions. Many papers have attempted to quantify bias in word or sentence embeddings to evaluate debiasing methods or compare different embedding models, usually with cosine-based metrics. However, some recent works have raised doubts about these metrics, showing that even though such metrics report low biases, other tests still show biases. In fact, a great variety of bias metrics and tests have been proposed in the literature without any consensus on the optimal solution. Yet we lack works that evaluate bias metrics on a theoretical level or elaborate the advantages and disadvantages of different bias metrics. In this work, we explore different cosine-based bias metrics. We formalize a bias definition based on ideas from previous works and derive conditions for bias metrics. Furthermore, we thoroughly investigate the existing cosine-based metrics and their limitations to show why these metrics can fail to report biases in some cases. Finally, we propose a new metric, SAME, to address the shortcomings of existing metrics and mathematically prove that SAME behaves appropriately.

9/14/2024

The SAME score: Improved cosine based bias score for word embeddings

Sarah Schroder, Alexander Schulz, Barbara Hammer

With the enormous popularity of large language models, many researchers have raised ethical concerns regarding social biases incorporated in such models. Several methods to measure social bias have been introduced, but apparently these methods do not necessarily agree on the presence or severity of bias. Furthermore, some works have shown theoretical issues or severe limitations with certain bias measures. For that reason, we introduce SAME, a novel bias score for semantic bias in embeddings. We conduct a thorough theoretical analysis as well as experiments to show its benefits compared to similar bias scores from the literature. We further highlight a substantial relation of semantic bias measured by SAME with downstream bias, a connection that has recently been argued to be negligible. Instead, we show that SAME is capable of measuring semantic bias and of identifying potential causes for social bias in downstream tasks.

9/14/2024

Semantic Properties of cosine based bias scores for word embeddings

Sarah Schroder, Alexander Schulz, Fabian Hinder, Barbara Hammer

Plenty of works have brought social biases in language models to attention and proposed methods to detect such biases. As a result, the literature contains a great deal of different bias tests and scores, each introduced with the premise to uncover yet more biases that other scores fail to detect. What is severely lacking in the literature, however, are comparative studies that analyse such bias scores and help researchers to understand the benefits or limitations of the existing methods. In this work, we aim to close this gap for cosine-based bias scores. By building on a geometric definition of bias, we propose requirements for bias scores to be considered meaningful for quantifying biases. Furthermore, we formally analyze cosine-based scores from the literature with regard to these requirements. We underline these findings with experiments to show that the bias scores' limitations have an impact in practical applications.

9/14/2024

Analyzing Correlations Between Intrinsic and Extrinsic Bias Metrics of Static Word Embeddings With Their Measuring Biases Aligned

Taisei Katō, Yusuke Miyao

We examine the abilities of intrinsic bias metrics of static word embeddings to predict whether Natural Language Processing (NLP) systems exhibit biased behavior. A word embedding is one of the fundamental NLP technologies that represents the meanings of words through real vectors, and problematically, it also learns social biases such as stereotypes. An intrinsic bias metric measures bias by examining a characteristic of vectors, while an extrinsic bias metric checks whether an NLP system trained with a word embedding is biased. A previous study found that a common intrinsic bias metric usually does not correlate with extrinsic bias metrics. However, the intrinsic and extrinsic bias metrics did not measure the same bias in most cases, which makes us question whether the lack of correlation is genuine. In this paper, we extract characteristic words from datasets of extrinsic bias metrics and analyze correlations with intrinsic bias metrics using those words to ensure both metrics measure the same bias. We observed moderate to high correlations with some extrinsic bias metrics but little to no correlations with the others. This result suggests that intrinsic bias metrics can predict biased behavior in particular settings but not in others. Experiment code is available on GitHub.

9/17/2024