Recovering from Biased Data: Can Fairness Constraints Improve Accuracy?

Read original: arXiv:1912.01094 - Published 8/23/2024 by Avrim Blum, Kevin Stangl

Overview

Multiple fairness constraints have been proposed to address concerns about how demographic groups might be treated unfairly by machine learning classifiers.
This work examines a different motivation: learning from biased training data.
Biased training data can lead to classifiers that are both biased and have suboptimal accuracy on the true data distribution.
The paper investigates the ability of fairness-constrained Empirical Risk Minimization (ERM) to correct this problem.

Plain English Explanation

Machine learning models can sometimes make decisions that unfairly disadvantage certain demographic groups. To address this, researchers have proposed various "fairness constraints" that aim to ensure more equitable treatment across groups.

However, this paper takes a different perspective. It suggests that even if our primary goal is simply to build an accurate model, we may still need to consider fairness interventions. The reason is that the training data itself can be biased: for example, labels for a disadvantaged group may be noisier or skewed toward the negative class, or positive or negative examples from that group may be under-represented.

When the training data is biased in this way, a standard machine learning approach like Empirical Risk Minimization (ERM) can produce a classifier that not only reflects the biases in the data, but also has lower overall accuracy on the true, unbiased data distribution.

The paper examines whether incorporating a specific fairness constraint, Equal Opportunity (which asks for equal true positive rates across groups), can help correct this problem. The authors prove that ERM combined with this constraint recovers the Bayes Optimal Classifier under a range of training data bias models.

This suggests that considering fairness may be important not just for ethical reasons, but also to achieve the best possible predictive performance, especially when working with biased training data.

Technical Explanation

The paper investigates the impact of training data bias on Empirical Risk Minimization (ERM) and the ability of fairness-constrained ERM to recover the Bayes Optimal Classifier.

The authors posit several ways in which training data may be biased, including:

More noisy or negatively biased labeling for members of a disadvantaged group
Decreased prevalence of positive or negative examples from the disadvantaged group
A combination of the above
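The bias models listed above can be made concrete with a small simulation. The following is a minimal sketch, not the authors' code: the function name `apply_bias` and the parameters `beta` (probability of keeping a positive example from the disadvantaged group) and `nu` (probability of flipping such a positive label to negative) are assumptions chosen for illustration.

```python
import numpy as np

def apply_bias(y, group, beta=1.0, nu=0.0, seed=0):
    """Inject two hypothetical kinds of bias into a labeled sample.

    y     : true binary labels (0/1)
    group : 1 for the disadvantaged group, 0 otherwise
    beta  : chance that a disadvantaged-group positive is kept (under-representation)
    nu    : chance that a disadvantaged-group positive is relabeled 0 (label bias)
    Returns (keep_mask, y_biased); train on X[keep_mask], y_biased[keep_mask].
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    group = np.asarray(group)
    # Only true positives in the disadvantaged group are affected.
    target = (group == 1) & (y == 1)
    # Under-representation: drop each targeted example with probability 1 - beta.
    keep = ~target | (rng.random(y.shape[0]) < beta)
    # Label bias: flip each targeted label to 0 with probability nu.
    flip = target & (rng.random(y.shape[0]) < nu)
    y_biased = np.where(flip, 0, y)
    return keep, y_biased
```

Setting `beta < 1` and `nu > 0` together gives the combined case. With the defaults (`beta=1.0`, `nu=0.0`) the sample is returned unchanged, which makes it easy to compare a model trained on biased data with one trained on the original data.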

They show that under these bias models, standard ERM can produce a classifier that is not only biased, but also has suboptimal accuracy on the true data distribution.

The paper then examines the performance of fairness-constrained ERM, focusing on the Equal Opportunity fairness constraint. The authors prove that this approach recovers the Bayes Optimal Classifier under the range of bias models considered.
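To illustrate the idea of Equal Opportunity constrained ERM, here is a toy sketch over a finite set of candidate classifiers. It is an assumption-laden illustration rather than the paper's construction: the helper names, the `tol` parameter, and the eight-example dataset are all hypothetical. Equal Opportunity is taken in its standard sense of equal true positive rates across groups, and the rates are measured on the observed (possibly biased) labels.

```python
import numpy as np

def true_positive_rate(y_true, y_pred):
    """Fraction of examples labeled positive that are also predicted positive."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    positives = y_true == 1
    if not positives.any():
        return float("nan")  # undefined when there are no labeled positives
    return float(y_pred[positives].mean())

def equal_opportunity_gap(y_true, y_pred, group):
    """Absolute difference in true positive rate between group 0 and group 1."""
    y_true, y_pred, group = np.asarray(y_true), np.asarray(y_pred), np.asarray(group)
    tpr0 = true_positive_rate(y_true[group == 0], y_pred[group == 0])
    tpr1 = true_positive_rate(y_true[group == 1], y_pred[group == 1])
    return abs(tpr0 - tpr1)

def constrained_erm(candidates, y_obs, group, tol=0.0):
    """Lowest-error candidate whose Equal Opportunity gap is at most tol.

    candidates maps a name to that classifier's predictions on the sample.
    Returns (name, empirical_error), or (None, inf) if no candidate is feasible.
    """
    y_obs = np.asarray(y_obs)
    best_name, best_err = None, float("inf")
    for name, y_pred in candidates.items():
        y_pred = np.asarray(y_pred)
        gap = equal_opportunity_gap(y_obs, y_pred, group)
        if np.isnan(gap) or gap > tol:
            continue  # violates the fairness constraint
        err = float((y_pred != y_obs).mean())
        if err < best_err:
            best_name, best_err = name, err
    return best_name, best_err

# Toy sample: group 1 truly has three positives, but two were relabeled as 0.
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_obs = np.array([1, 1, 0, 0, 1, 0, 0, 0])
candidates = {
    "h_true": np.array([1, 1, 0, 0, 1, 1, 1, 0]),           # matches the true labels
    "h_reject_group1": np.array([1, 1, 0, 0, 0, 0, 0, 0]),  # rejects all of group 1
}
fair_choice = constrained_erm(candidates, y_obs, group, tol=0.0)
plain_choice = constrained_erm(candidates, y_obs, group, tol=1.0)  # constraint inactive
```

On this toy sample, plain ERM (`tol=1.0`) prefers `h_reject_group1` because it has lower error on the biased labels, while the constrained version (`tol=0.0`) selects `h_true`. A real implementation would search a hypothesis class rather than a two-entry dictionary.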

The authors also briefly discuss other recovery methods, such as reweighting the training data, Equalized Odds, and Demographic Parity. However, the main theoretical results center on the effectiveness of the Equal Opportunity constraint.
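For reference, the other criteria named here can be computed as follows. This is a sketch using the standard textbook definitions of Demographic Parity and Equalized Odds; the reweighting function is a generic cell-balancing scheme assumed for illustration and is not claimed to be the reweighting analyzed in the paper. All function names are hypothetical.

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Difference in positive-prediction rate between group 0 and group 1."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return abs(float(y_pred[group == 0].mean()) - float(y_pred[group == 1].mean()))

def equalized_odds_gap(y_true, y_pred, group):
    """Larger of the TPR gap and the FPR gap between the two groups.

    Assumes every (group, label) cell contains at least one example.
    """
    y_true, y_pred, group = np.asarray(y_true), np.asarray(y_pred), np.asarray(group)
    gaps = []
    for label in (1, 0):  # label 1 gives the TPR gap, label 0 the FPR gap
        r0 = float(y_pred[(group == 0) & (y_true == label)].mean())
        r1 = float(y_pred[(group == 1) & (y_true == label)].mean())
        gaps.append(abs(r0 - r1))
    return max(gaps)

def cell_balancing_weights(y, group):
    """Per-example weights giving every non-empty (group, label) cell equal total weight.

    Weights are scaled so they sum to the number of examples.
    """
    y, group = np.asarray(y), np.asarray(group)
    w = np.zeros(len(y), dtype=float)
    for g in np.unique(group):
        for label in np.unique(y):
            cell = (group == g) & (y == label)
            if cell.any():
                w[cell] = 1.0 / cell.sum()
    return w * len(y) / w.sum()
```

Weights of this kind can be passed to any learner that accepts per-example weights, so that under-represented cells count as much as well-represented ones during training.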

Critical Analysis

The paper provides a thoughtful analysis of how training data bias can impact machine learning models, even when the primary goal is predictive accuracy rather than fairness. The theoretical results on the ability of the Equal Opportunity constraint to recover the optimal, unbiased classifier are quite compelling.

That said, the paper acknowledges several limitations and areas for further research. For example, the bias models considered are relatively simplistic, and real-world training data bias may be more complex. Additionally, the authors note that there may be tradeoffs between fairness and accuracy that are not fully captured by their analysis.

Further empirical validation of these ideas, as well as exploration of other fairness constraints and recovery methods, would be valuable. It would also be interesting to investigate how these findings might apply to more advanced machine learning architectures beyond simple classifiers.

Overall, this paper makes an important contribution by highlighting how fairness considerations can be relevant even when the primary objective is predictive performance rather than equitable treatment. It encourages ML practitioners to think critically about the quality and representativeness of their training data, and to consider fairness interventions as a means to achieve better models.

Conclusion

This paper presents a novel perspective on the role of fairness constraints in machine learning. Rather than viewing them solely as a means to ensure equitable treatment across demographic groups, the authors demonstrate how fairness interventions can also help recover optimal predictive performance when training data is biased.

The key theoretical result is that the Equal Opportunity fairness constraint, when combined with Empirical Risk Minimization, provably recovers the Bayes Optimal Classifier under a range of training data bias models. The practical implication is that fairness interventions are worth considering even for an actor who cares primarily about accuracy.

While the paper acknowledges some limitations and areas for further research, it provides valuable insights that could inform the development of more robust and fair machine learning systems. By highlighting the connections between fairness and accuracy, this work encourages ML practitioners to think holistically about the quality and representativeness of their data, and the potential benefits of fairness-aware approaches.

Related Papers

Recovering from Biased Data: Can Fairness Constraints Improve Accuracy?

Avrim Blum, Kevin Stangl

Multiple fairness constraints have been proposed in the literature, motivated by a range of concerns about how demographic groups might be treated unfairly by machine learning classifiers. In this work we consider a different motivation; learning from biased training data. We posit several ways in which training data may be biased, including having a more noisy or negatively biased labeling process on members of a disadvantaged group, or a decreased prevalence of positive or negative examples from the disadvantaged group, or both. Given such biased training data, Empirical Risk Minimization (ERM) may produce a classifier that not only is biased but also has suboptimal accuracy on the true data distribution. We examine the ability of fairness-constrained ERM to correct this problem. In particular, we find that the Equal Opportunity fairness constraint (Hardt, Price, and Srebro 2016) combined with ERM will provably recover the Bayes Optimal Classifier under a range of bias models. We also consider other recovery methods including reweighting the training data, Equalized Odds, and Demographic Parity. These theoretical results provide additional motivation for considering fairness interventions even if an actor cares primarily about accuracy.

8/23/2024

How Far Can Fairness Constraints Help Recover From Biased Data?

Mohit Sharma, Amit Deshpande

A general belief in fair classification is that fairness constraints incur a trade-off with accuracy, which biased data may worsen. Contrary to this belief, Blum & Stangl (2019) show that fair classification with equal opportunity constraints even on extremely biased data can recover optimally accurate and fair classifiers on the original data distribution. Their result is interesting because it demonstrates that fairness constraints can implicitly rectify data bias and simultaneously overcome a perceived fairness-accuracy trade-off. Their data bias model simulates under-representation and label bias in underprivileged population, and they show the above result on a stylized data distribution with i.i.d. label noise, under simple conditions on the data distribution and bias parameters. We propose a general approach to extend the result of Blum & Stangl (2019) to different fairness constraints, data bias models, data distributions, and hypothesis classes. We strengthen their result, and extend it to the case when their stylized distribution has labels with Massart noise instead of i.i.d. noise. We prove a similar recovery result for arbitrary data distributions using fair reject option classifiers. We further generalize it to arbitrary data distributions and arbitrary hypothesis classes, i.e., we prove that for any data distribution, if the optimally accurate classifier in a given hypothesis class is fair and robust, then it can be recovered through fair classification with equal opportunity constraints on the biased distribution whenever the bias parameters satisfy certain simple conditions. Finally, we show applications of our technique to time-varying data bias in classification and fair machine learning pipelines.

6/4/2024

Intrinsic Fairness-Accuracy Tradeoffs under Equalized Odds

Meiyu Zhong, Ravi Tandon

With the growing adoption of machine learning (ML) systems in areas like law enforcement, criminal justice, finance, hiring, and admissions, it is increasingly critical to guarantee the fairness of decisions assisted by ML. In this paper, we study the tradeoff between fairness and accuracy under the statistical notion of equalized odds. We present a new upper bound on the accuracy (that holds for any classifier), as a function of the fairness budget. In addition, our bounds also exhibit dependence on the underlying statistics of the data, labels and the sensitive group attributes. We validate our theoretical upper bounds through empirical analysis on three real-world datasets: COMPAS, Adult, and Law School. Specifically, we compare our upper bound to the tradeoffs that are achieved by various existing fair classifiers in the literature. Our results show that achieving high accuracy subject to a low-bias could be fundamentally limited based on the statistical disparity across the groups.

5/17/2024

Normalise for Fairness: A Simple Normalisation Technique for Fairness in Regression Machine Learning Problems

Mostafa M. Amin, Bjorn W. Schuller

Algorithms and Machine Learning (ML) are increasingly affecting everyday life and several decision-making processes, where ML has an advantage due to scalability or superior performance. Fairness in such applications is crucial, where models should not discriminate their results based on race, gender, or other protected groups. This is especially crucial for models affecting very sensitive topics, like interview invitation or recidivism prediction. Fairness is not commonly studied for regression problems compared to binary classification problems; hence, we present a simple, yet effective method based on normalisation (FaiReg), which minimises the impact of unfairness in regression problems, especially due to labelling bias. We present a theoretical analysis of the method, in addition to an empirical comparison against two standard methods for fairness, namely data balancing and adversarial training. We also include a hybrid formulation (FaiRegH), merging the presented method with data balancing, in an attempt to face labelling and sampling biases simultaneously. The experiments are conducted on the multimodal dataset First Impressions (FI) with various labels, namely Big-Five personality prediction and interview screening score. The results show the superior performance of diminishing the effects of unfairness better than data balancing, also without deteriorating the performance of the original problem as much as adversarial training. Fairness is evaluated based on the Equal Accuracy (EA) and Statistical Parity (SP) constraints. The experiments present a setup that enhances the fairness for several protected variables simultaneously.

8/21/2024