Are demographically invariant models and representations in medical imaging fair?

Read original: arXiv:2305.01397 - Published 7/4/2024 by Eike Petersen, Enzo Ferrante, Melanie Ganz, Aasa Feragen


Overview

Medical imaging models can encode information about patient demographics like age, race, and sex in their hidden representations
This raises concerns about potential discrimination by these models
The paper explores whether it's desirable to require models not to encode demographic attributes

Plain English Explanation

The paper examines whether medical imaging models should be designed to avoid encoding information about a patient's age, race, or sex. These demographic attributes can get encoded in the hidden layers of the model, which could lead to the model making biased or discriminatory predictions.

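The researchers point out that requiring a model's representations not to depend on demographic attributes implies the standard group fairness notions. Marginal invariance (the representation has the same distribution in every group) implies demographic parity, and class-conditional invariance (the same distribution within each outcome class) implies equalized odds. However, invariance demands more than these notions: it also requires the model's risk scores to be distributed identically across groups, which can equalize away important differences between them.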

Enforcing the traditional fairness definitions directly does not impose these strong constraints. The researchers also note that a model with demographically invariant representations can still take demographic attributes into account when making predictions, which amounts to treating patients unequally. In fact, achieving invariance may require the model to do so.

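In principle, this kind of unequal treatment could be ruled out with counterfactual notions of fairness, but the researchers caution that defining what a medical image would have looked like if a patient's demographic attributes were different is fraught with challenges. They also posit that encoding demographic attributes could actually be beneficial if it lets the model learn a task-specific encoding of demographic features that doesn't rely on social constructs like "race" and "gender".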

Technical Explanation

The paper explores whether medical imaging models should be designed to have representations that are invariant to patient demographics like age, race, and sex. The researchers show that marginal representation invariance implies demographic parity (the rate of positive predictions is the same in every group), and class-conditional representation invariance implies equalized odds (true and false positive rates are the same in every group).
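To make the two group fairness criteria concrete, here is a minimal Python sketch that measures how far a set of binary predictions is from satisfying each one. The function names and the toy labels, predictions, and group memberships are hypothetical illustrations, not code or data from the paper. The toy data also shows a well-known tension: when disease prevalence differs between groups, even a perfect classifier satisfies equalized odds but violates demographic parity.

```python
from collections import defaultdict

def positive_rate(preds):
    """Fraction of predictions equal to 1."""
    return sum(preds) / len(preds)

def max_gap(preds_by_group):
    """Largest difference between per-group positive rates (0.0 if no groups)."""
    rates = [positive_rate(v) for v in preds_by_group.values()]
    return max(rates) - min(rates) if rates else 0.0

def demographic_parity_gap(y_pred, groups):
    """Demographic parity: P(Y_hat = 1 | A = a) should be equal for all groups a."""
    by_group = defaultdict(list)
    for p, a in zip(y_pred, groups):
        by_group[a].append(p)
    return max_gap(by_group)

def equalized_odds_gap(y_true, y_pred, groups):
    """Equalized odds: TPR (y = 1) and FPR (y = 0) should be equal for all groups."""
    gaps = []
    for label in (0, 1):
        by_group = defaultdict(list)
        for t, p, a in zip(y_true, y_pred, groups):
            if t == label:
                by_group[a].append(p)
        gaps.append(max_gap(by_group))
    return max(gaps)

# Hypothetical toy data: group A has a higher prevalence (2/4) than group B (1/4).
groups = ["A"] * 4 + ["B"] * 4
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
y_pred = list(y_true)  # a perfect classifier

print(demographic_parity_gap(y_pred, groups))  # 0.25
print(equalized_odds_gap(y_true, y_pred, groups))  # 0.0
```

Libraries such as Fairlearn provide tested implementations of similar group metrics; the hand-rolled version here only spells out the definitions.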

However, the authors note that representation invariance also requires matching the risk distributions across groups, which can potentially "equalize away" important differences between them. In contrast, directly enforcing demographic parity or equalized odds does not impose these strong constraints on the model.
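The difference between the two kinds of requirement can be seen in a small example. If a risk score is computed from a class-conditionally invariant representation, then within each true class the score must have the same distribution in every group, whereas equalized odds at a fixed threshold only constrains which side of the threshold each score falls on. The sketch below uses hypothetical scores and a hypothetical threshold of 0.5; it measures distribution mismatch with a two-sample Kolmogorov-Smirnov statistic, which is one convenient choice of distance and not one the paper prescribes.

```python
THRESHOLD = 0.5  # hypothetical decision threshold

# Hypothetical risk scores, keyed by (group, true label).
scores = {
    ("A", 1): [0.9, 0.8], ("A", 0): [0.1, 0.3],
    ("B", 1): [0.6, 0.7], ("B", 0): [0.2, 0.4],
}

def positive_rate(sample, threshold=THRESHOLD):
    """Fraction of scores at or above the threshold (TPR for positives, FPR for negatives)."""
    return sum(s >= threshold for s in sample) / len(sample)

def ecdf(sample, t):
    """Empirical CDF of `sample` evaluated at t."""
    return sum(x <= t for x in sample) / len(sample)

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(t) - F_b(t)| over observed points."""
    points = sorted(set(a) | set(b))
    return max(abs(ecdf(a, t) - ecdf(b, t)) for t in points)

for label, name in ((1, "TPR"), (0, "FPR")):
    a, b = scores[("A", label)], scores[("B", label)]
    rate_gap = abs(positive_rate(a) - positive_rate(b))
    print(f"{name} gap: {rate_gap}, KS distance (y={label}): {ks_statistic(a, b)}")
```

Both rate gaps are 0.0, so equalized odds holds at this threshold, yet the KS distances are 1.0 for positives and 0.5 for negatives: a class-conditionally invariant representation would rule these scores out.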

The researchers also observe that representationally invariant models may still take demographic attributes into account when deriving predictions, implying unequal treatment. In fact, achieving representation invariance may require the model to do so.
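A toy example shows how invariance and unequal treatment can coexist. Below, a hypothetical one-dimensional image feature is shifted between two groups. A group-blind encoder passes the feature through unchanged and is therefore not invariant. A hypothetical group-aware encoder that subtracts each group's mean feature produces identically distributed representations, but only by using the group label, so two patients with the same image receive different representations. This is an illustration under simplified assumptions, not the construction used in the paper.

```python
from statistics import mean

# Hypothetical 1-D image feature per patient, by demographic group.
features = {"A": [1.0, 2.0, 3.0], "B": [3.0, 4.0, 5.0]}
group_means = {g: mean(xs) for g, xs in features.items()}

def group_blind_encoder(x):
    """Ignores the group label; the representation is the raw feature."""
    return x

def group_aware_encoder(x, group):
    """Hypothetical encoder that centers the feature using the patient's group mean."""
    return x - group_means[group]

blind = {g: sorted(group_blind_encoder(x) for x in xs) for g, xs in features.items()}
aware = {g: sorted(group_aware_encoder(x, g) for x in xs) for g, xs in features.items()}

# Invariance check: are the empirical representation distributions identical across groups?
print("blind encoder invariant:", blind["A"] == blind["B"])
print("aware encoder invariant:", aware["A"] == aware["B"])

# Unequal treatment: the same image feature (3.0) maps to different representations.
print(group_aware_encoder(3.0, "A"), group_aware_encoder(3.0, "B"))
```

The group-blind encoder treats both patients identically but is not invariant; the group-aware one is invariant but treats them differently, which is the trade-off the authors describe.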

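In theory, such unequal treatment could be prevented using counterfactual notions of (individual) fairness or invariance, but the authors caution that properly defining medical image counterfactuals with respect to demographic attributes is fraught with challenges. The paper also posits that encoding demographic attributes could be advantageous if it enables the model to learn a task-specific encoding of demographic features that does not rely on social constructs such as "race" and "gender".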

Critical Analysis

The paper raises important points about the nuances of enforcing demographic invariance in medical imaging models. While representation invariance implies standard group fairness criteria, the authors highlight that it also imposes stronger constraints with unintended consequences, such as equalizing away meaningful differences between groups.

The discussion of how representationally invariant models may still use demographic attributes for predictions is particularly insightful. This suggests that simply removing demographic information from the model's representations may not be sufficient to ensure fairness.

The paper rightly points out the challenges in defining counterfactual fairness notions for medical images. This is an area that deserves further research, as robust definitions of individual fairness are crucial for high-stakes applications like healthcare.

The researchers' proposal that encoding demographic information could be beneficial, if it allows the model to learn a task-specific encoding of demographic features rather than relying on social constructs, is an interesting perspective. It highlights the need for a more holistic view of fairness that goes beyond simply removing sensitive attributes from the model.

Overall, this paper provides a thoughtful and balanced critique of the common assumption that demographic invariance is necessary and sufficient for fairness in medical imaging. It encourages readers to think critically about the complex trade-offs involved in developing fair AI systems for healthcare.

Conclusion

This paper challenges the assumption that medical imaging models should be designed to have representations that are invariant to patient demographics like age, race, and sex. The researchers argue that such representation invariance is neither necessary nor sufficient for ensuring fairness, and may even lead to unintended consequences.

The paper highlights the nuanced relationship between representation invariance and standard fairness metrics like demographic parity and equalized odds. It also raises important practical and theoretical concerns around the feasibility of defining and enforcing counterfactual fairness notions for medical images.

Ultimately, the researchers suggest that encoding demographic attributes may not be inherently problematic, and could even be advantageous if it allows the model to learn a task-specific encoding of demographic features that does not rely on social constructs. This emphasizes the need for comprehensive fairness assessments that examine predictive performance across diverse patient groups.

The insights from this paper underscore the importance of carefully evaluating the fairness implications of medical imaging models, rather than relying on simplistic notions of demographic invariance. As the use of AI in healthcare continues to grow, this research lends further urgency to calls for rigorous, multifaceted fairness evaluations.
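The paper's abstract frames such evaluations in terms of predictive performance across diverse patient groups. As one hedged illustration, the sketch below reports AUROC separately for each group, computed as the probability that a randomly chosen positive case outranks a randomly chosen negative one (ties count as half). The labels, scores, and group labels are hypothetical, and AUROC is only one of many metrics such an assessment might include. In this toy data the pooled AUROC looks reasonable while one group's performance is poor, which is exactly what a pooled metric can hide.

```python
from collections import defaultdict
from itertools import product

def auroc(labels, scores):
    """AUROC as P(score of a positive > score of a negative), ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # undefined unless both classes are present
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))

def per_group_auroc(labels, scores, groups):
    """Compute AUROC separately within each demographic group."""
    by_group = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        by_group[g][0].append(y)
        by_group[g][1].append(s)
    return {g: auroc(ys, ss) for g, (ys, ss) in sorted(by_group.items())}

# Hypothetical evaluation data for two patient groups.
groups = ["A"] * 4 + ["B"] * 4
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.8, 0.3, 0.6, 0.7, 0.4, 0.5]

print("pooled:", auroc(labels, scores))
print("per group:", per_group_auroc(labels, scores, groups))
```

Here the pooled AUROC is 0.8125, while group A scores 1.0 and group B only 0.25.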

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


Are demographically invariant models and representations in medical imaging fair?

Eike Petersen, Enzo Ferrante, Melanie Ganz, Aasa Feragen

Medical imaging models have been shown to encode information about patient demographics such as age, race, and sex in their latent representation, raising concerns about their potential for discrimination. Here, we ask whether requiring models not to encode demographic attributes is desirable. We point out that marginal and class-conditional representation invariance imply the standard group fairness notions of demographic parity and equalized odds, respectively. In addition, however, they require matching the risk distributions, thus potentially equalizing away important group differences. Enforcing the traditional fairness notions directly instead does not entail these strong constraints. Moreover, representationally invariant models may still take demographic attributes into account for deriving predictions, implying unequal treatment - in fact, achieving representation invariance may require doing so. In theory, this can be prevented using counterfactual notions of (individual) fairness or invariance. We caution, however, that properly defining medical image counterfactuals with respect to demographic attributes is fraught with challenges. Finally, we posit that encoding demographic attributes may even be advantageous if it enables learning a task-specific encoding of demographic features that does not rely on social constructs such as 'race' and 'gender.' We conclude that demographically invariant representations are neither necessary nor sufficient for fairness in medical imaging. Models may need to encode demographic attributes, lending further urgency to calls for comprehensive model fairness assessments in terms of predictive performance across diverse patient groups.

7/4/2024

Using Backbone Foundation Model for Evaluating Fairness in Chest Radiography Without Demographic Data

Dilermando Queiroz, André Anjos, Lilian Berton

Ensuring consistent performance across diverse populations and incorporating fairness into machine learning models are crucial for advancing medical image diagnostics and promoting equitable healthcare. However, many databases do not provide protected attributes or contain unbalanced representations of demographic groups, complicating the evaluation of model performance across different demographics and the application of bias mitigation techniques that rely on these attributes. This study aims to investigate the effectiveness of using the backbone of Foundation Models as an embedding extractor for creating groups that represent protected attributes, such as gender and age. We propose utilizing these groups in different stages of bias mitigation, including pre-processing, in-processing, and evaluation. Using databases in in-distribution and out-of-distribution scenarios, we find that the method can create groups that represent gender in both databases and reduce the difference for the gender attribute by 4.44% in-distribution and 6.16% out-of-distribution. However, the model lacks robustness in handling age attributes, underscoring the need for more fundamentally fair and robust Foundation models. These findings suggest a role in promoting fairness assessment in scenarios where we lack knowledge of attributes, contributing to the development of more equitable medical diagnostics.

8/30/2024

Evaluating Fairness in Large Vision-Language Models Across Diverse Demographic Attributes and Prompts

Xuyang Wu, Yuan Wang, Hsin-Tai Wu, Zhiqiang Tao, Yi Fang

Large vision-language models (LVLMs) have recently achieved significant progress, demonstrating strong capabilities in open-world visual understanding. However, it is not yet clear how LVLMs address demographic biases in real life, especially the disparities across attributes such as gender, skin tone, and age. In this paper, we empirically investigate visual fairness in several mainstream LVLMs and audit their performance disparities across sensitive demographic attributes, based on public fairness benchmark datasets (e.g., FACET). To disclose the visual bias in LVLMs, we design a fairness evaluation framework with direct questions and single-choice question-instructed prompts on visual question-answering/classification tasks. The zero-shot prompting results indicate that, despite enhancements in visual understanding, both open-source and closed-source LVLMs exhibit prevalent fairness issues across different instruct prompts and demographic attributes.

6/27/2024

Hidden or Inferred: Fair Learning-To-Rank with Unknown Demographics

Oluseun Olulana, Kathleen Cachel, Fabricio Murai, Elke Rundensteiner

As learning-to-rank models are increasingly deployed for decision-making in areas with profound life implications, the FairML community has been developing fair learning-to-rank (LTR) models. These models rely on the availability of sensitive demographic features such as race or sex. However, in practice, regulatory obstacles and privacy concerns protect this data from collection and use. As a result, practitioners may either need to promote fairness despite the absence of these features or turn to demographic inference tools to attempt to infer them. Given that these tools are fallible, this paper aims to further understand how errors in demographic inference impact the fairness performance of popular fair LTR strategies. In which cases would it be better to keep such demographic attributes hidden from models versus infer them? We examine a spectrum of fair LTR strategies ranging from fair LTR with and without demographic features hidden versus inferred to fairness-unaware LTR followed by fair re-ranking. We conduct a controlled empirical investigation modeling different levels of inference errors by systematically perturbing the inferred sensitive attribute. We also perform three case studies with real-world datasets and popular open-source inference methods. Our findings reveal that as inference noise grows, LTR-based methods that incorporate fairness considerations into the learning process may increase bias. In contrast, fair re-ranking strategies are more robust to inference errors. All source code, data, and experimental artifacts of our experimental study are available here: https://github.com/sewen007/hoiltr.git

7/25/2024