Lazy Data Practices Harm Fairness Research

arXiv:2404.17293

Published 6/21/2024 by Jan Simson, Alessandro Fabris, Christoph Kern

Abstract

Data practices shape research and practice on fairness in machine learning (fair ML). Critical data studies offer important reflections and critiques for the responsible advancement of the field by highlighting shortcomings and proposing recommendations for improvement. In this work, we present a comprehensive analysis of fair ML datasets, demonstrating how unreflective yet common practices hinder the reach and reliability of algorithmic fairness findings. We systematically study protected information encoded in tabular datasets and their usage in 280 experiments across 142 publications. Our analyses identify three main areas of concern: (1) a lack of representation for certain protected attributes in both data and evaluations; (2) the widespread exclusion of minorities during data preprocessing; and (3) opaque data processing threatening the generalization of fairness research. By conducting exemplary analyses on the utilization of prominent datasets, we demonstrate how unreflective data decisions disproportionately affect minority groups, fairness metrics, and resultant model comparisons. Additionally, we identify supplementary factors such as limitations in publicly available data, privacy considerations, and a general lack of awareness, which exacerbate these challenges. To address these issues, we propose a set of recommendations for data usage in fairness research centered on transparency and responsible inclusion. This study underscores the need for a critical reevaluation of data practices in fair ML and offers directions to improve both the sourcing and usage of datasets.

Overview

The paper examines how data practices in machine learning (ML) research on fairness (fair ML) can hinder the reliability and reach of algorithmic fairness findings.
The authors conduct a comprehensive analysis of fair ML datasets, identifying three key areas of concern: 1) lack of representation for certain protected attributes, 2) exclusion of minorities during data preprocessing, and 3) opaque data processing.
The paper proposes recommendations to improve the sourcing and usage of datasets in fairness research, focused on transparency and responsible inclusion.

Plain English Explanation

The paper looks at how the way data is collected and used in machine learning (ML) research on fairness can cause problems. The authors analyzed 280 experiments across 142 fair ML publications and found three main issues:

Some groups of people (based on attributes like race, gender, etc.) are not well-represented in the data used for these studies. This means the research may not be applicable to those underrepresented groups.
Certain minority groups are often excluded from the data during the preprocessing stage, before the research is even conducted. This means the research is not capturing the full picture.
The way the data is processed and transformed is often not explained clearly. This makes it hard to understand how the research findings might apply in the real world.

To address these problems, the authors suggest researchers be more transparent about their data choices and make sure to include a diverse range of people in their studies. This would help ensure the research on algorithmic fairness is reliable and applicable to a wide range of individuals and communities.

Technical Explanation

The paper presents a comprehensive analysis of datasets used in fair ML research. The authors systematically studied the protected information encoded in tabular datasets and how those datasets were used in 280 experiments across 142 publications.

Their analyses identified three key areas of concern:

Lack of representation: Certain protected attributes (e.g. race, disability status) were significantly underrepresented in the data and evaluations.
Exclusion of minorities: During data preprocessing, minority groups were often excluded, skewing the representation in the final datasets.
Opaque data processing: The transformations and processing applied to the data were often not transparently reported, hindering the generalization of fairness research findings.
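The exclusion concern can be made concrete with a small audit. The following is a minimal sketch, not the authors' code: the toy records, the group labels, and the helper names (`group_counts`, `drop_missing`, `retention_by_group`) are all hypothetical. It shows how a routine step such as dropping rows with missing values can remove a larger share of small groups than of the majority group, and how reporting per-group retention makes that visible.

```python
from collections import Counter

# Hypothetical toy records: (protected group, income); None marks a missing value.
records = [
    ("A", 50), ("A", 42), ("A", 61), ("A", 38), ("A", 55), ("A", 47),
    ("B", 40), ("B", None), ("B", 52),
    ("C", None), ("C", 33),
]


def group_counts(rows):
    """Count how many rows belong to each protected group."""
    return Counter(group for group, _ in rows)


def drop_missing(rows):
    """A common preprocessing step: discard rows with a missing value."""
    return [row for row in rows if row[1] is not None]


def retention_by_group(before, after):
    """Fraction of each group's rows that survive preprocessing."""
    n_before = group_counts(before)
    n_after = group_counts(after)
    return {g: n_after.get(g, 0) / n for g, n in n_before.items()}


cleaned = drop_missing(records)
retention = retention_by_group(records, cleaned)
for group, share in sorted(retention.items()):
    print(f"group {group}: {share:.0%} of rows retained")
```

Reporting a table like this alongside each preprocessing step is one simple way to act on the paper's call for transparency; the specific report format here is an assumption, not something the paper prescribes.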

To demonstrate these issues, the authors conducted exemplary analyses on the usage of prominent fair ML datasets. They showed how unreflective data decisions disproportionately affected minority groups, fairness metrics, and model comparisons.
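To illustrate how such data decisions can move a fairness metric, here is a hedged sketch on invented data. It uses the demographic parity gap (the largest difference in positive-prediction rates between groups), a standard metric chosen here for illustration and not necessarily the one used in the paper. The data, group labels, and function names are hypothetical.

```python
# Hypothetical toy data: (group, model prediction), where 1 is the favourable outcome.
predictions = (
    [("A", 1)] * 6 + [("A", 0)] * 4      # group A: 6 of 10 positive
    + [("B", 1)] * 3 + [("B", 0)] * 3    # group B: 3 of 6 positive
    + [("C", 1)] * 1 + [("C", 0)] * 3    # group C: 1 of 4 positive
)


def positive_rates(rows):
    """Share of favourable predictions within each group."""
    totals, positives = {}, {}
    for group, pred in rows:
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + pred
    return {g: positives[g] / totals[g] for g in totals}


def parity_gap(rows):
    """Largest difference in positive-prediction rate between any two groups."""
    rates = positive_rates(rows)
    return max(rates.values()) - min(rates.values())


# Decision 1: keep every group as recorded.
gap_all = parity_gap(predictions)

# Decision 2: silently drop the smallest group (C) during preprocessing.
gap_dropped = parity_gap([row for row in predictions if row[0] != "C"])

# Decision 3: binarize the attribute into "A" versus everyone else.
binarized = [("A" if g == "A" else "other", p) for g, p in predictions]
gap_binarized = parity_gap(binarized)

print(f"all groups kept: {gap_all:.2f}")
print(f"group C dropped: {gap_dropped:.2f}")
print(f"binarized:       {gap_binarized:.2f}")
```

On this toy data the same predictions yield a noticeably smaller gap once group C is dropped or merged, which is the kind of effect that makes unreported preprocessing choices a problem for comparing models across studies.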

The paper also identifies supplementary factors that exacerbate these challenges, such as limitations in publicly available data, privacy considerations, and a general lack of awareness in the research community.

Critical Analysis

The paper provides a valuable critique of data practices in fair ML research, highlighting important shortcomings that undermine the reliability and scope of algorithmic fairness findings.

One limitation noted is the reliance on publicly available datasets, which may not be representative of the real-world diversity and complexity of protected attributes. The authors recommend exploring ways to leverage additional data sources while addressing privacy concerns.

Another potential issue is the inherent trade-off between fairness and other desirable properties, such as model accuracy. The paper does not explore this tension in depth, which could be an area for further research.

Overall, the paper makes a compelling case for a critical reevaluation of data practices in fair ML. By addressing the identified issues around representation, exclusion, and transparency, the field can work towards more robust and inclusive algorithmic fairness research.

Conclusion

This paper highlights the crucial role of data practices in shaping research and practice on fairness in machine learning. By systematically analyzing fair ML datasets, the authors uncover significant shortcomings that undermine the reliability and generalizability of algorithmic fairness findings.

The proposed recommendations for transparent and responsible data usage in fairness research offer a constructive path forward. Addressing these data-related challenges is essential for the responsible advancement of fair ML and ensuring that the benefits of these technologies are equitably distributed across all communities.

This summary was produced with help from an AI and may contain inaccuracies; consult the original paper to verify details.

Related Papers

AIM: Attributing, Interpreting, Mitigating Data Unfairness

Zhining Liu, Ruizhong Qiu, Zhichen Zeng, Yada Zhu, Hendrik Hamann, Hanghang Tong

Data collected in the real world often encapsulates historical discrimination against disadvantaged groups and individuals. Existing fair machine learning (FairML) research has predominantly focused on mitigating discriminative bias in the model prediction, with far less effort dedicated towards exploring how to trace biases present in the data, despite its importance for the transparency and interpretability of FairML. To fill this gap, we investigate a novel research problem: discovering samples that reflect biases/prejudices from the training data. Grounding on the existing fairness notions, we lay out a sample bias criterion and propose practical algorithms for measuring and countering sample bias. The derived bias score provides intuitive sample-level attribution and explanation of historical bias in data. On this basis, we further design two FairML strategies via sample-bias-informed minimal data editing. They can mitigate both group and individual unfairness at the cost of minimal or zero predictive utility loss. Extensive experiments and analyses on multiple real-world datasets demonstrate the effectiveness of our methods in explaining and mitigating unfairness. Code is available at https://github.com/ZhiningLiu1998/AIM.

6/19/2024

cs.LG cs.AI stat.ML

Fairness Improvement with Multiple Protected Attributes: How Far Are We?

Zhenpeng Chen, Jie M. Zhang, Federica Sarro, Mark Harman

Existing research mostly improves the fairness of Machine Learning (ML) software regarding a single protected attribute at a time, but this is unrealistic given that many users have multiple protected attributes. This paper conducts an extensive study of fairness improvement regarding multiple protected attributes, covering 11 state-of-the-art fairness improvement methods. We analyze the effectiveness of these methods with different datasets, metrics, and ML models when considering multiple protected attributes. The results reveal that improving fairness for a single protected attribute can largely decrease fairness regarding unconsidered protected attributes. This decrease is observed in up to 88.3% of scenarios (57.5% on average). More surprisingly, we find little difference in accuracy loss when considering single and multiple protected attributes, indicating that accuracy can be maintained in the multiple-attribute paradigm. However, the effect on F1-score when handling two protected attributes is about twice that of a single attribute. This has important implications for future fairness research: reporting only accuracy as the ML performance metric, which is currently common in the literature, is inadequate.

4/5/2024

cs.LG cs.AI cs.CY cs.SE

Trusting Fair Data: Leveraging Quality in Fairness-Driven Data Removal Techniques

Manh Khoi Duong, Stefan Conrad

In this paper, we deal with bias mitigation techniques that remove specific data points from the training set to aim for a fair representation of the population in that set. Machine learning models are trained on these pre-processed datasets, and their predictions are expected to be fair. However, such approaches may exclude relevant data, making the attained subsets less trustworthy for further usage. To enhance the trustworthiness of prior methods, we propose additional requirements and objectives that the subsets must fulfill in addition to fairness: (1) group coverage, and (2) minimal data loss. While removing entire groups may improve the measured fairness, this practice is very problematic as failing to represent every group cannot be considered fair. In our second concern, we advocate for the retention of data while minimizing discrimination. By introducing a multi-objective optimization problem that considers fairness and data loss, we propose a methodology to find Pareto-optimal solutions that balance these objectives. By identifying such solutions, users can make informed decisions about the trade-off between fairness and data quality and select the most suitable subset for their application.

6/12/2024

cs.LG cs.AI

Measuring and Mitigating Bias for Tabular Datasets with Multiple Protected Attributes

Manh Khoi Duong, Stefan Conrad

Motivated by the recital (67) of the current corrigendum of the AI Act in the European Union, we propose and present measures and mitigation strategies for discrimination in tabular datasets. We specifically focus on datasets that contain multiple protected attributes, such as nationality, age, and sex. This makes measuring and mitigating bias more challenging, as many existing methods are designed for a single protected attribute. This paper comes with a twofold contribution: Firstly, new discrimination measures are introduced. These measures are categorized in our framework along with existing ones, guiding researchers and practitioners in choosing the right measure to assess the fairness of the underlying dataset. Secondly, a novel application of an existing bias mitigation method, FairDo, is presented. We show that this strategy can mitigate any type of discrimination, including intersectional discrimination, by transforming the dataset. By conducting experiments on real-world datasets (Adult, Bank, Compas), we demonstrate that de-biasing datasets with multiple protected attributes is achievable. Further, the transformed fair datasets do not compromise any of the tested machine learning models' performances significantly when trained on these datasets compared to the original datasets. Discrimination was reduced by up to 83% in our experimentation. For most experiments, the disparity between protected groups was reduced by at least 7% and 27% on average. Generally, the findings show that the mitigation strategy used is effective, and this study contributes to the ongoing discussion on the implementation of the European Union's AI Act.

5/30/2024

cs.LG cs.AI