Fairness Improvement with Multiple Protected Attributes: How Far Are We?

Read original: arXiv:2308.01923 - Published 4/5/2024 by Zhenpeng Chen, Jie M. Zhang, Federica Sarro, Mark Harman

Overview

Existing research on fairness in machine learning (ML) often focuses on a single protected attribute (e.g., race or gender) at a time, which is unrealistic given that many users have multiple protected attributes.
This paper conducts an extensive study on fairness improvement methods that consider multiple protected attributes.
The paper analyzes the effectiveness of 11 state-of-the-art fairness improvement methods across different datasets, metrics, and ML models when considering multiple protected attributes.

Plain English Explanation

Machine learning (ML) systems are increasingly used to make important decisions that affect people's lives, such as loan approvals, job recommendations, and medical diagnoses. It's crucial that these systems are fair and unbiased, treating people equally regardless of their personal characteristics like race, gender, or age.

Most previous research on fairness in ML has focused on improving fairness with respect to a single protected attribute, such as race or gender. However, in reality, many people have multiple protected attributes (e.g., a woman of color). This paper examines what happens when you try to make ML systems fair for multiple protected attributes at the same time.

The researchers tested 11 different methods for improving fairness in ML models. They looked at how well these methods worked across various datasets, performance metrics, and types of ML models, all while considering two or more protected attributes.

The results were eye-opening. The researchers found that improving fairness for one protected attribute often led to a significant decrease in fairness for other, unconsidered attributes. This occurred in up to 88.3% of the scenarios they tested, and in 57.5% on average.

Surprisingly, the researchers also found that the loss in accuracy was about the same whether the methods handled one protected attribute or several, which suggests that accuracy can be maintained when considering multiple protected attributes. However, the effect on other performance metrics, such as F1-score, was about twice as large when handling two protected attributes as when handling one.

These findings have important implications. They suggest that reporting only accuracy as the measure of ML performance is not enough - researchers and developers need to look at a wider range of metrics to fully understand the tradeoffs involved in building fair and equitable AI systems.

Technical Explanation

This paper presents an extensive empirical study on the effectiveness of 11 state-of-the-art fairness improvement methods when considering multiple protected attributes. The researchers evaluated these methods across a variety of datasets, performance metrics, and ML models.

The experimental design involved training ML models to predict a target variable while optimizing for fairness with respect to two or more protected attributes (e.g., race and gender). The researchers then measured the fairness-optimized models on ML performance metrics such as accuracy and F1-score, and on several fairness metrics.
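To make the evaluation setup concrete, here is a minimal sketch of how fairness can be measured for each protected attribute separately and for their intersection. It assumes statistical parity difference (the gap in positive-prediction rates between groups) as the fairness metric; this summary does not say which metrics the paper uses, so that choice, the function names, and the toy data are all illustrative assumptions.

```python
from itertools import product

def positive_rate(preds, mask):
    """Share of positive predictions among the rows selected by mask."""
    selected = [p for p, m in zip(preds, mask) if m]
    return sum(selected) / len(selected) if selected else 0.0

def spd(preds, attr):
    """Absolute statistical parity difference for one binary attribute."""
    rate_1 = positive_rate(preds, [a == 1 for a in attr])
    rate_0 = positive_rate(preds, [a == 0 for a in attr])
    return abs(rate_1 - rate_0)

def intersectional_gap(preds, attr_a, attr_b):
    """Largest gap in positive rate across the non-empty (a, b) subgroups."""
    rates = []
    for a, b in product([0, 1], repeat=2):
        mask = [x == a and y == b for x, y in zip(attr_a, attr_b)]
        if any(mask):
            rates.append(positive_rate(preds, mask))
    return max(rates) - min(rates)

# Hypothetical toy data: 8 people, two binary protected attributes.
sex   = [1, 1, 1, 1, 0, 0, 0, 0]
race  = [1, 1, 0, 0, 1, 1, 0, 0]
preds = [1, 1, 1, 0, 1, 0, 0, 0]

spd_sex = spd(preds, sex)                    # gap for sex alone
spd_race = spd(preds, race)                  # gap for race alone
gap = intersectional_gap(preds, sex, race)   # gap across the four subgroups
print(spd_sex, spd_race, gap)
```

In this toy data each attribute taken alone shows a gap of 0.5, while the gap across the four intersectional subgroups is 1.0, which illustrates why looking at one attribute at a time can understate the disparity.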

The results showed that improving fairness for a single protected attribute can significantly decrease fairness for other unconsidered attributes. This "negative fairness transfer" effect was observed in up to 88.3% of the scenarios tested, with an average of 57.5%.
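The effect described above can be illustrated with a small sketch that compares a fairness gap before and after a hypothetical mitigation step. The metric (statistical parity gap), the tolerance threshold, and the predictions are assumptions made for illustration; they are not the paper's data or its procedure for deciding when fairness has decreased.

```python
def group_rate(preds, attr, group):
    """Share of positive predictions within one group (0.0 if the group is empty)."""
    members = [p for p, a in zip(preds, attr) if a == group]
    return sum(members) / len(members) if members else 0.0

def parity_gap(preds, attr):
    """Absolute gap in positive-prediction rate between groups 1 and 0."""
    return abs(group_rate(preds, attr, 1) - group_rate(preds, attr, 0))

def fairness_change(before, after, attr, tol=1e-9):
    """Label how the parity gap for attr moved between two prediction sets."""
    delta = parity_gap(after, attr) - parity_gap(before, attr)
    if delta > tol:
        return "worsened"
    if delta < -tol:
        return "improved"
    return "unchanged"

# Hypothetical toy data: mitigation targets sex only; race is unconsidered.
sex    = [1, 1, 1, 1, 0, 0, 0, 0]
race   = [1, 1, 0, 0, 1, 1, 0, 0]
before = [1, 1, 1, 0, 1, 0, 0, 0]  # predictions of the original model
after  = [1, 1, 0, 0, 1, 1, 0, 0]  # predictions after mitigating for sex

considered = fairness_change(before, after, sex)
unconsidered = fairness_change(before, after, race)
print(considered, unconsidered)
```

Here the gap for sex drops from 0.5 to 0.0 while the gap for race grows from 0.5 to 1.0, so the considered attribute improves and the unconsidered one worsens.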

Surprisingly, the researchers found little difference in accuracy loss when considering single versus multiple protected attributes. This suggests that accuracy can be maintained when optimizing for fairness across multiple attributes. However, the effect on F1-score was about twice as large when handling two protected attributes compared to one.
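A small worked example shows how accuracy can stay flat while F1-score moves. The labels and predictions below are invented for illustration on an imbalanced toy dataset; they are not results from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def f1_score(y_true, y_pred):
    """F1 for the positive class: 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Hypothetical imbalanced labels: 2 positives out of 10.
y_true  = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
preds_a = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # finds both positives, one false alarm
preds_b = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # misses one positive, no false alarms

acc_a, acc_b = accuracy(y_true, preds_a), accuracy(y_true, preds_b)
f1_a, f1_b = f1_score(y_true, preds_a), f1_score(y_true, preds_b)
print(acc_a, acc_b, f1_a, f1_b)
```

Both prediction sets are 90% accurate, yet their F1-scores differ (0.8 versus roughly 0.67), so a report that lists only accuracy would not distinguish between them.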

These findings highlight the importance of reporting a broader range of performance metrics beyond just accuracy when evaluating fair ML systems. Relying solely on accuracy does not provide a complete picture of the tradeoffs involved in building equitable AI.

Critical Analysis

The paper provides a comprehensive and rigorous analysis of fairness improvement methods for ML models with multiple protected attributes. The experimental design is well-conceived, and the results offer important insights into the complex, and sometimes counterintuitive, relationships between fairness, accuracy, and other performance metrics.

One limitation mentioned by the authors is that their study only considered pairwise combinations of protected attributes. It would be valuable to extend the analysis to scenarios with three or more attributes to better reflect real-world diversity. Additionally, the paper does not delve into the specific reasons behind the "negative fairness transfer" effect, which would be an interesting area for further investigation.

Another potential issue is that the study only evaluates fairness at the model-level, without considering potential disparities in the underlying data or the end-user experience. Unfairness can arise from many sources in the ML pipeline, so a holistic, end-to-end perspective on fairness may be necessary.

Despite these caveats, this paper makes a significant contribution to the growing body of research on fair ML. The findings challenge the common practice of reporting only accuracy and highlight the need for a more comprehensive approach to evaluating and deploying equitable AI systems.

Conclusion

This research paper presents a thorough investigation into the challenges of improving fairness in machine learning models when considering multiple protected attributes. The key takeaway is that optimizing for fairness with respect to a single attribute can often lead to unintended decreases in fairness for other unconsidered attributes.

These findings have important implications for the development of fair and equitable AI systems. They suggest that researchers and practitioners need to look beyond just accuracy and consider a wider range of performance metrics, including various fairness measures, when evaluating and deploying ML models. Only by taking a more holistic view can we ensure that these powerful technologies are used in a way that is truly fair and inclusive for all.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Fairness Improvement with Multiple Protected Attributes: How Far Are We?

Zhenpeng Chen, Jie M. Zhang, Federica Sarro, Mark Harman

Existing research mostly improves the fairness of Machine Learning (ML) software regarding a single protected attribute at a time, but this is unrealistic given that many users have multiple protected attributes. This paper conducts an extensive study of fairness improvement regarding multiple protected attributes, covering 11 state-of-the-art fairness improvement methods. We analyze the effectiveness of these methods with different datasets, metrics, and ML models when considering multiple protected attributes. The results reveal that improving fairness for a single protected attribute can largely decrease fairness regarding unconsidered protected attributes. This decrease is observed in up to 88.3% of scenarios (57.5% on average). More surprisingly, we find little difference in accuracy loss when considering single and multiple protected attributes, indicating that accuracy can be maintained in the multiple-attribute paradigm. However, the effect on F1-score when handling two protected attributes is about twice that of a single attribute. This has important implications for future fairness research: reporting only accuracy as the ML performance metric, which is currently common in the literature, is inadequate.

4/5/2024

Measuring and Mitigating Bias for Tabular Datasets with Multiple Protected Attributes

Manh Khoi Duong, Stefan Conrad

Motivated by the recital (67) of the current corrigendum of the AI Act in the European Union, we propose and present measures and mitigation strategies for discrimination in tabular datasets. We specifically focus on datasets that contain multiple protected attributes, such as nationality, age, and sex. This makes measuring and mitigating bias more challenging, as many existing methods are designed for a single protected attribute. This paper comes with a twofold contribution: Firstly, new discrimination measures are introduced. These measures are categorized in our framework along with existing ones, guiding researchers and practitioners in choosing the right measure to assess the fairness of the underlying dataset. Secondly, a novel application of an existing bias mitigation method, FairDo, is presented. We show that this strategy can mitigate any type of discrimination, including intersectional discrimination, by transforming the dataset. By conducting experiments on real-world datasets (Adult, Bank, COMPAS), we demonstrate that de-biasing datasets with multiple protected attributes is possible. All transformed datasets show a reduction in discrimination, on average by 28%. Further, these datasets do not compromise any of the tested machine learning models' performances significantly compared to the original datasets. Conclusively, this study demonstrates the effectiveness of the mitigation strategy used and contributes to the ongoing discussion on the implementation of the European Union's AI Act.

10/2/2024

What Is Fairness? On the Role of Protected Attributes and Fictitious Worlds

Ludwig Bothmann, Kristina Peters, Bernd Bischl

A growing body of literature in fairness-aware machine learning (fairML) aims to mitigate machine learning (ML)-related unfairness in automated decision-making (ADM) by defining metrics that measure fairness of an ML model and by proposing methods to ensure that trained ML models achieve low scores on these metrics. However, the underlying concept of fairness, i.e., the question of what fairness is, is rarely discussed, leaving a significant gap between centuries of philosophical discussion and the recent adoption of the concept in the ML community. In this work, we try to bridge this gap by formalizing a consistent concept of fairness and by translating the philosophical considerations into a formal framework for the training and evaluation of ML models in ADM systems. We argue that fairness problems can arise even without the presence of protected attributes (PAs), and point out that fairness and predictive performance are not irreconcilable opposites, but that the latter is necessary to achieve the former. Furthermore, we argue why and how causal considerations are necessary when assessing fairness in the presence of PAs by proposing a fictitious, normatively desired (FiND) world in which PAs have no causal effects. In practice, this FiND world must be approximated by a warped world in which the causal effects of the PAs are removed from the real-world data. Finally, we achieve greater linguistic clarity in the discussion of fairML. We outline algorithms for practical applications and present illustrative experiments on COMPAS data.

6/4/2024

Lazy Data Practices Harm Fairness Research

Jan Simson, Alessandro Fabris, Christoph Kern

Data practices shape research and practice on fairness in machine learning (fair ML). Critical data studies offer important reflections and critiques for the responsible advancement of the field by highlighting shortcomings and proposing recommendations for improvement. In this work, we present a comprehensive analysis of fair ML datasets, demonstrating how unreflective yet common practices hinder the reach and reliability of algorithmic fairness findings. We systematically study protected information encoded in tabular datasets and their usage in 280 experiments across 142 publications. Our analyses identify three main areas of concern: (1) a lack of representation for certain protected attributes in both data and evaluations; (2) the widespread exclusion of minorities during data preprocessing; and (3) opaque data processing threatening the generalization of fairness research. By conducting exemplary analyses on the utilization of prominent datasets, we demonstrate how unreflective data decisions disproportionately affect minority groups, fairness metrics, and resultant model comparisons. Additionally, we identify supplementary factors such as limitations in publicly available data, privacy considerations, and a general lack of awareness, which exacerbate these challenges. To address these issues, we propose a set of recommendations for data usage in fairness research centered on transparency and responsible inclusion. This study underscores the need for a critical reevaluation of data practices in fair ML and offers directions to improve both the sourcing and usage of datasets.

6/21/2024