An investigation into the causes of race bias in AI-based cine CMR segmentation

Read original: arXiv:2408.02462 - Published 8/6/2024 by Tiarna Lee, Esther Puyol-Anton, Bram Ruijsink, Sebastien Roujol, Theodore Barfoot, Shaheim Ogbomo-Harmitt, Miaojing Shi, Andrew P. King

Overview

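Investigates the causes of race bias in AI-based segmentation of cine cardiac magnetic resonance (CMR) images
Runs classification and segmentation experiments on short-axis cine CMR images from Black and White subjects in the UK Biobank, and applies AI interpretability methods to the results
Finds that the distributional shift between races is mostly image-based rather than segmentation-based, with classifiers attending largely to non-heart regions such as subcutaneous fat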

Plain English Explanation

Cardiac magnetic resonance (CMR) imaging is a powerful tool for diagnosing and monitoring heart conditions. Recent advances in artificial intelligence (AI) have enabled automated segmentation of CMR images, which can help clinicians analyze the data more efficiently.

However, there is growing concern that these AI systems may exhibit bias, meaning they perform better or worse depending on a patient's race or ethnicity. This paper investigates the causes of race bias in AI-based cine CMR segmentation. The researchers trained models to predict race from the images and from the segmentations, examined which image regions those models relied on, and tested whether cropping the images tightly around the heart changed the results.

Technical Explanation

The researchers ran a series of classification and segmentation experiments on short-axis cine CMR images acquired from Black and White subjects in the UK Biobank, and applied AI interpretability methods to understand the results. Their starting point is earlier evidence that AI segmentation methods perform differently for different races, depending on how (im)balanced the training data is.
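
Performance disparities of this kind are usually measured by computing a segmentation overlap score separately for each group and comparing the averages. The following is a minimal sketch, assuming the Dice coefficient as the metric (this summary does not state which metric the paper uses). The function names and the toy masks are hypothetical and are not taken from the paper.

```python
import numpy as np

def dice_score(pred, truth):
    """Dice overlap between two binary masks (defined as 1.0 when both are empty)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0
    return 2.0 * np.logical_and(pred, truth).sum() / denom

def mean_dice_by_group(preds, truths, groups):
    """Average the per-image Dice score within each group label."""
    scores = {}
    for pred, truth, group in zip(preds, truths, groups):
        scores.setdefault(group, []).append(dice_score(pred, truth))
    return {g: float(np.mean(v)) for g, v in scores.items()}

# Toy 1-D "masks" purely for illustration (not data from the paper).
truths = [np.array([0, 1, 1, 0]), np.array([0, 1, 1, 0])]
preds = [np.array([0, 1, 1, 0]), np.array([0, 1, 0, 0])]
groups = ["A", "B"]

per_group = mean_dice_by_group(preds, truths, groups)
# The gap between the best and worst group is one simple disparity measure.
gap = max(per_group.values()) - min(per_group.values())
print(per_group, gap)
```

A gap near zero would indicate similar segmentation quality across groups. A real evaluation would also need a statistical test, since small gaps can arise by chance.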

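In the classification experiments, race could be predicted with high accuracy from the images alone, but less accurately from the ground truth segmentations. This suggests that the distributional shift between races, which is often the cause of AI bias, is mostly image-based rather than segmentation-based. The interpretability methods showed that most of the attention in the classification models fell on non-heart regions, such as subcutaneous fat. Race could also be predicted from the latent representations of a biased segmentation model, suggesting that race information is encoded in the model.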
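
The paper's abstract reports that race can be predicted from the latent representations of a biased segmentation model. A common way to run such a check is to fit a simple probe classifier on the latent vectors and measure its held-out accuracy. The sketch below assumes a nearest-centroid probe and synthetic vectors. Both are illustrative choices, not the authors' method or data, and the function name is hypothetical.

```python
import numpy as np

def nearest_centroid_probe(train_x, train_y, test_x):
    """Predict each test label as the class whose training centroid is closest."""
    classes = np.unique(train_y)
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    # distances has shape (n_test, n_classes)
    distances = np.linalg.norm(test_x[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(distances, axis=1)]

rng = np.random.default_rng(0)
# Synthetic stand-ins for latent vectors: two groups with shifted means.
x0 = rng.normal(loc=0.0, scale=1.0, size=(200, 8))
x1 = rng.normal(loc=1.5, scale=1.0, size=(200, 8))
features = np.vstack([x0, x1])
labels = np.array([0] * 200 + [1] * 200)

# Hold out 100 samples so the probe is scored on vectors it has not seen.
order = rng.permutation(len(labels))
train, test = order[:300], order[300:]
predicted = nearest_centroid_probe(features[train], labels[train], features[test])
accuracy = float(np.mean(predicted == labels[test]))
print(f"probe accuracy: {accuracy:.2f}")
```

Held-out accuracy well above 0.5 on a balanced two-group problem indicates that group information can be recovered from the representation. Accuracy near 0.5 would suggest it cannot, at least not by this simple probe.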

Cropping the images tightly around the heart reduced race classification accuracy to around chance level. The same cropping reduced, but did not eliminate, the bias in segmentation performance. The authors also investigated the influence of possible confounders on the observed bias.
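
The paper's abstract describes cropping images tightly around the heart. One simple way to do this is to cut the image to the bounding box of a heart segmentation mask. The sketch below assumes a 2-D image, a binary mask of the same shape, and an optional pixel margin. The function name and the `margin` parameter are hypothetical, and the paper's exact cropping procedure is not given in this summary.

```python
import numpy as np

def crop_to_mask(image, mask, margin=0):
    """Crop a 2-D image to the bounding box of the nonzero mask, plus a margin in pixels."""
    rows = np.any(mask, axis=1)
    cols = np.any(mask, axis=0)
    if not rows.any():
        return image  # no foreground found: return the image unchanged
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    # Expand by the margin while staying inside the image bounds.
    r0 = max(r0 - margin, 0)
    c0 = max(c0 - margin, 0)
    r1 = min(r1 + margin, image.shape[0] - 1)
    c1 = min(c1 + margin, image.shape[1] - 1)
    return image[r0:r1 + 1, c0:c1 + 1]

# Toy 6x6 image with a 2x2 "heart" region (illustrative only).
image = np.arange(36).reshape(6, 6)
mask = np.zeros((6, 6), dtype=int)
mask[2:4, 3:5] = 1

cropped = crop_to_mask(image, mask, margin=1)
print(cropped.shape)
```

A smaller margin removes more of the surrounding tissue, such as the subcutaneous fat that the classifiers were found to attend to, at the cost of discarding context around the heart.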

Critical Analysis

The paper provides a comprehensive and rigorous investigation into the causes of race bias in AI-based CMR segmentation. The authors should be commended for their systematic approach and for highlighting the important issue of algorithmic bias in medical imaging, which can have significant implications for patient care and outcomes.

One potential limitation of the study is that it draws on a single dataset, the UK Biobank, and compares only Black and White subjects, so the findings may not generalize to other populations, AI systems, or medical domains. Additionally, the paper does not explore the societal and ethical implications of these biases, which could be an important area for further research.

Overall, this study represents an important contribution to the growing body of work on bias in AI systems, and its findings on where race information resides in the images could be valuable for researchers and developers working on bias mitigation.

Conclusion

This paper investigates the causes of race bias in AI-based cardiac magnetic resonance (CMR) image segmentation, a critical issue that can have significant implications for patient care and outcomes. Its experiments indicate that the differences between races that drive the bias lie mostly in the images, largely in non-heart regions, rather than in the segmentations, and that cropping tightly around the heart reduces but does not eliminate the bias. These findings could inform the development of more equitable and inclusive AI systems in the medical field.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

An investigation into the causes of race bias in AI-based cine CMR segmentation

Tiarna Lee, Esther Puyol-Anton, Bram Ruijsink, Sebastien Roujol, Theodore Barfoot, Shaheim Ogbomo-Harmitt, Miaojing Shi, Andrew P. King

Artificial intelligence (AI) methods are being used increasingly for the automated segmentation of cine cardiac magnetic resonance (CMR) imaging. However, these methods have been shown to be subject to race bias, i.e. they exhibit different levels of performance for different races depending on the (im)balance of the data used to train the AI model. In this paper we investigate the source of this bias, seeking to understand its root cause(s) so that it can be effectively mitigated. We perform a series of classification and segmentation experiments on short-axis cine CMR images acquired from Black and White subjects from the UK Biobank and apply AI interpretability methods to understand the results. In the classification experiments, we found that race can be predicted with high accuracy from the images alone, but less accurately from ground truth segmentations, suggesting that the distributional shift between races, which is often the cause of AI bias, is mostly image-based rather than segmentation-based. The interpretability methods showed that most attention in the classification models was focused on non-heart regions, such as subcutaneous fat. Cropping the images tightly around the heart reduced classification accuracy to around chance level. Similarly, race can be predicted from the latent representations of a biased segmentation model, suggesting that race information is encoded in the model. Cropping images tightly around the heart reduced but did not eliminate segmentation bias. We also investigate the influence of possible confounders on the bias observed.

8/6/2024

Towards objective and systematic evaluation of bias in artificial intelligence for medical imaging

Emma A. M. Stanley, Raissa Souza, Anthony Winder, Vedant Gulve, Kimberly Amador, Matthias Wilms, Nils D. Forkert

Artificial intelligence (AI) models trained using medical images for clinical tasks often exhibit bias in the form of disparities in performance between subgroups. Since not all sources of biases in real-world medical imaging data are easily identifiable, it is challenging to comprehensively assess how those biases are encoded in models, and how capable bias mitigation methods are at ameliorating performance disparities. In this article, we introduce a novel analysis framework for systematically and objectively investigating the impact of biases in medical images on AI models. We developed and tested this framework for conducting controlled in silico trials to assess bias in medical imaging AI using a tool for generating synthetic magnetic resonance images with known disease effects and sources of bias. The feasibility is showcased by using three counterfactual bias scenarios to measure the impact of simulated bias effects on a convolutional neural network (CNN) classifier and the efficacy of three bias mitigation strategies. The analysis revealed that the simulated biases resulted in expected subgroup performance disparities when the CNN was trained on the synthetic datasets. Moreover, reweighing was identified as the most successful bias mitigation strategy for this setup, and we demonstrated how explainable AI methods can aid in investigating the manifestation of bias in the model using this framework. Developing fair AI models is a considerable challenge given that many and often unknown sources of biases can be present in medical imaging datasets. In this work, we present a novel methodology to objectively study the impact of biases and mitigation strategies on deep learning pipelines, which can support the development of clinical AI that is robust and responsible.

7/2/2024

On Biases in a UK Biobank-based Retinal Image Classification Model

Anissa Alloula, Rima Mustafa, Daniel R McGowan, Bartłomiej W. Papież

Recent work has uncovered alarming disparities in the performance of machine learning models in healthcare. In this study, we explore whether such disparities are present in the UK Biobank fundus retinal images by training and evaluating a disease classification model on these images. We assess possible disparities across various population groups and find substantial differences despite strong overall performance of the model. In particular, we discover unfair performance for certain assessment centres, which is surprising given the rigorous data standardisation protocol. We compare how these differences emerge and apply a range of existing bias mitigation methods to each one. A key insight is that each disparity has unique properties and responds differently to the mitigation methods. We also find that these methods are largely unable to enhance fairness, highlighting the need for better bias mitigation methods tailored to the specific type of bias.

8/7/2024

Slicing Through Bias: Explaining Performance Gaps in Medical Image Analysis using Slice Discovery Methods

Vincent Olesen, Nina Weng, Aasa Feragen, Eike Petersen

Machine learning models have achieved high overall accuracy in medical image analysis. However, performance disparities on specific patient groups pose challenges to their clinical utility, safety, and fairness. This can affect known patient groups - such as those based on sex, age, or disease subtype - as well as previously unknown and unlabeled groups. Furthermore, the root cause of such observed performance disparities is often challenging to uncover, hindering mitigation efforts. In this paper, to address these issues, we leverage Slice Discovery Methods (SDMs) to identify interpretable underperforming subsets of data and formulate hypotheses regarding the cause of observed performance disparities. We introduce a novel SDM and apply it in a case study on the classification of pneumothorax and atelectasis from chest x-rays. Our study demonstrates the effectiveness of SDMs in hypothesis formulation and yields an explanation of previously observed but unexplained performance disparities between male and female patients in widely used chest X-ray datasets and models. Our findings indicate shortcut learning in both classification tasks, through the presence of chest drains and ECG wires, respectively. Sex-based differences in the prevalence of these shortcut features appear to cause the observed classification performance gap, representing a previously underappreciated interaction between shortcut learning and model fairness analyses.

6/19/2024