The Impact of Scanner Domain Shift on Deep Learning Performance in Medical Imaging: an Experimental Study

Read original: arXiv:2409.04368 - Published 9/9/2024 by Gregory Szumel, Brian Guo, Darui Lu, Rongze Gui, Tingyu Wang, Nicholas Konz, Maciej A. Mazurowski

Overview

Medical imaging is crucial for disease diagnosis and treatment, but deep learning models can struggle with performance when applied to data from different scanners or imaging sites.
This study aims to quantify the impact of scanner domain shift on deep learning model performance in medical imaging tasks.
The researchers conduct experiments using various medical imaging datasets and deep learning models to assess the effects of scanner domain shift.

Plain English Explanation

Deep learning models, which are a type of artificial intelligence, have become very powerful at analyzing medical images like X-rays, MRI scans, and CT scans. These models can help doctors diagnose and treat diseases more accurately and efficiently.

However, a major challenge is that the performance of these deep learning models can degrade when they are applied to images from different scanners or imaging sites than the ones they were trained on. This is called "scanner domain shift." The scanners may use different technologies, settings, or protocols, which can cause the images to look quite different, even if they are of the same patient or condition.

This study set out to better understand the impact of scanner domain shift on deep learning model performance in medical imaging tasks. The researchers conducted experiments using various medical imaging datasets and deep learning models to see how much the model performance would drop when applied to data from different scanners.

By quantifying this effect, the researchers hope to inform the development of more robust and generalizable deep learning models that can maintain high performance across different scanners and imaging sites. This is an important step towards making these AI tools more reliable and useful for real-world clinical applications.

Technical Explanation

The researchers designed experiments to quantify the impact of scanner domain shift on deep learning performance in medical imaging. They evaluated convolutional neural networks (CNNs) on automated diagnostic tasks, using datasets from common radiological modalities: X-ray, CT, and MRI.

For each dataset, the researchers split the data by scanner or imaging site, training the models on one set of scanners and evaluating on held-out data from different scanners. This allowed them to measure the performance drop due to the domain shift.
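The split-and-compare protocol described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the record layout (dictionaries with `scanner` and `label` keys), the function names, and the use of accuracy as the metric are all assumptions made for the example.

```python
import random


def split_by_scanner(records, source_scanner, test_fraction=0.2, seed=0):
    """Split records into (source_train, same_scanner_test, cross_scanner_test).

    `records` is a list of dicts with at least a "scanner" key (hypothetical
    layout). Data from `source_scanner` is divided into a training part and a
    held-out same-scanner test part; all other scanners form the
    cross-scanner test set.
    """
    source = [r for r in records if r["scanner"] == source_scanner]
    target = [r for r in records if r["scanner"] != source_scanner]
    rng = random.Random(seed)
    rng.shuffle(source)
    n_test = max(1, int(len(source) * test_fraction))
    return source[n_test:], source[:n_test], target


def accuracy(predict, records):
    """Fraction of records whose predicted label matches the "label" key."""
    correct = sum(1 for r in records if predict(r) == r["label"])
    return correct / len(records)


def domain_shift_drop(predict, same_scanner_test, cross_scanner_test):
    """Same-scanner score minus cross-scanner score (positive means a drop)."""
    return accuracy(predict, same_scanner_test) - accuracy(predict, cross_scanner_test)
```

Here `predict` stands for any trained model wrapped as a function from a record to a label. In a real study the same-scanner split would normally also be grouped by patient, so that no patient appears in both the training and test sets; the sketch leaves that out for brevity.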

The experimental results showed that network performance on data from a different scanner was almost always worse than on same-scanner data. The size of the drop varied across datasets and, on average, across modalities: it was most severe for MRI, moderate for X-ray, and quite small for CT, which the authors attribute to the standardized nature of CT acquisition systems.

The researchers also studied two ways to improve generalization: injecting varying amounts of target-domain data into the training set, and adding noise to the training data.
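The paper's abstract mentions two training-side interventions: injecting target-domain data into the training set and adding noise to the training data. The sketch below shows one way these could look. It is an assumption-laden illustration, not the authors' implementation: the function names are hypothetical, the injected amount is expressed as a fraction of the source training set size, and the noise is Gaussian on pixel values scaled to [0, 1], none of which is specified in the summary.

```python
import random


def inject_target_data(source_train, target_pool, fraction, seed=0):
    """Mix a sample of target-scanner data into the source training set.

    `fraction` is the number of injected target samples relative to
    len(source_train) (an assumed definition). Returns (mixed_train,
    held_out_target) so the injected samples can be excluded from the
    cross-scanner evaluation.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    n_target = min(len(target_pool), round(fraction * len(source_train)))
    chosen = set(rng.sample(range(len(target_pool)), n_target))
    injected = [t for i, t in enumerate(target_pool) if i in chosen]
    held_out = [t for i, t in enumerate(target_pool) if i not in chosen]
    return list(source_train) + injected, held_out


def add_gaussian_noise(pixels, sigma, seed=0):
    """Add zero-mean Gaussian noise to pixel values and clip to [0, 1]."""
    rng = random.Random(seed)
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]
```

Sweeping `fraction` over several values and re-measuring cross-scanner performance on `held_out_target` each time would give a curve of how much target data is needed to close the gap, under the assumptions above.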

Critical Analysis

The study provides valuable empirical insights into the practical challenges of deploying deep learning for medical imaging in the real world. The systematic evaluation across multiple datasets and model architectures lends robustness to the findings.

However, the paper does not delve into the underlying causes of scanner domain shift or provide detailed analysis of the types of artifacts or distributional shifts that lead to performance degradation. Further investigation into the specific scanner-related factors impacting model generalization would be helpful.

Additionally, the mitigation strategies explored, while promising, still leave sizable performance gaps. More advanced domain adaptation or meta-learning techniques may be needed to truly overcome the scanner domain shift problem. Exploring these directions in future work could further strengthen the practical applicability of deep learning in medical imaging.

Conclusion

This study demonstrates that scanner domain shift can significantly degrade the performance of deep learning models in medical imaging tasks. The magnitude of the effect varies across datasets and model architectures, highlighting the need for developing more robust and generalizable deep learning techniques.

The findings underscore the importance of carefully evaluating medical AI systems across diverse real-world deployment scenarios, rather than just on held-out test data from the same distribution. Addressing scanner domain shift is a crucial step towards realizing the full potential of deep learning in clinical practice.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

The Impact of Scanner Domain Shift on Deep Learning Performance in Medical Imaging: an Experimental Study

Gregory Szumel, Brian Guo, Darui Lu, Rongze Gui, Tingyu Wang, Nicholas Konz, Maciej A. Mazurowski

Purpose: Medical images acquired using different scanners and protocols can differ substantially in their appearance. This phenomenon, scanner domain shift, can result in a drop in the performance of deep neural networks which are trained on data acquired by one scanner and tested on another. This significant practical issue is well-acknowledged, however, no systematic study of the issue is available across different modalities and diagnostic tasks. Materials and Methods: In this paper, we present a broad experimental study evaluating the impact of scanner domain shift on convolutional neural network performance for different automated diagnostic tasks. We evaluate this phenomenon in common radiological modalities, including X-ray, CT, and MRI. Results: We find that network performance on data from a different scanner is almost always worse than on same-scanner data, and we quantify the degree of performance drop across different datasets. Notably, we find that this drop is most severe for MRI, moderate for X-ray, and quite small for CT, on average, which we attribute to the standardized nature of CT acquisition systems which is not present in MRI or X-ray. We also study how injecting varying amounts of target domain data into the training set, as well as adding noise to the training data, helps with generalization. Conclusion: Our results provide extensive experimental evidence and quantification of the extent of performance drop caused by scanner domain shift in deep learning across different modalities, with the goal of guiding the future development of robust deep learning models for medical image analysis.

9/9/2024

Domain Shift Analysis in Chest Radiographs Classification in a Veterans Healthcare Administration Population

Mayanka Chandrashekar, Ian Goethert, Md Inzamam Ul Haque, Benjamin McMahon, Sayera Dhaubhadel, Kathryn Knight, Joseph Erdos, Donna Reagan, Caroline Taylor, Peter Kuzmak, John Michael Gaziano, Eileen McAllister, Lauren Costa, Yuk-Lam Ho, Kelly Cho, Suzanne Tamang, Samah Fodeh-Jarad, Olga S. Ovchinnikova, Amy C. Justice, Jacob Hinkle, Ioana Danciu

Objectives: This study aims to assess the impact of domain shift on chest X-ray classification accuracy and to analyze the influence of ground truth label quality and demographic factors such as age group, sex, and study year. Materials and Methods: We used a DenseNet121 model pretrained on the MIMIC-CXR dataset for deep learning-based multilabel classification using ground truth labels from radiology reports extracted using the CheXpert and CheXbert Labeler. We compared the performance of the 14 chest X-ray labels on the MIMIC-CXR and Veterans Healthcare Administration chest X-ray dataset (VA-CXR). The VA-CXR dataset comprises over 259k chest X-ray images spanning the years 2010 to 2022. Results: The validation of ground truth and the assessment of multi-label classification performance across various NLP extraction tools revealed that the VA-CXR dataset exhibited lower disagreement rates than the MIMIC-CXR datasets. Additionally, there were notable differences in AUC scores between models utilizing CheXpert and CheXbert. When evaluating multi-label classification performance across different datasets, minimal domain shift was observed in unseen datasets, except for the label Enlarged Cardiomediastinum. The study year's subgroup analyses exhibited the most significant variations in multi-label classification model performance. These findings underscore the importance of considering domain shifts in chest X-ray classification tasks, particularly concerning study years. Conclusion: Our study reveals the significant impact of domain shift and demographic factors on chest X-ray classification, emphasizing the need for improved transfer learning and equitable model development. Addressing these challenges is crucial for advancing medical imaging and enhancing patient care.

8/1/2024

A Systematic Review of Generalization Research in Medical Image Classification

Sarah Matta, Mathieu Lamard, Philippe Zhang, Alexandre Le Guilcher, Laurent Borderie, Béatrice Cochener, Gwenolé Quellec

Numerous Deep Learning (DL) classification models have been developed for a large spectrum of medical image analysis applications, which promises to reshape various facets of medical practice. Despite early advances in DL model validation and implementation, which encourage healthcare institutions to adopt them, a fundamental question remains: how can these models effectively handle domain shift? This question is crucial to limit DL models' performance degradation. Medical data are dynamic and prone to domain shift, due to multiple factors. Two main shift types can occur over time: 1) covariate shift mainly arising due to updates to medical equipment and 2) concept shift caused by inter-grader variability. To mitigate the problem of domain shift, existing surveys mainly focus on domain adaptation techniques, with an emphasis on covariate shift. More generally, no work has reviewed the state-of-the-art solutions while focusing on the shift types. This paper aims to explore existing domain generalization methods for DL-based classification models through a systematic review of literature. It proposes a taxonomy based on the shift type they aim to solve. Papers were searched and gathered on Scopus till 10 April 2023, and after the eligibility screening and quality evaluation, 77 articles were identified. Exclusion criteria included: lack of methodological novelty (e.g., reviews, benchmarks), experiments conducted on a single mono-center dataset, or articles not written in English. The results of this paper show that learning-based methods are emerging, for both shift types. Finally, we discuss future challenges, including the need for improved evaluation protocols and benchmarks, and envisioned future developments to achieve robust, generalized models for medical image classification.

9/18/2024

Multi-domain improves out-of-distribution and data-limited scenarios for medical image analysis

Ece Ozkan, Xavier Boix

Current machine learning methods for medical image analysis primarily focus on developing models tailored for their specific tasks, utilizing data within their target domain. These specialized models tend to be data-hungry and often exhibit limitations in generalizing to out-of-distribution samples. In this work, we show that employing models that incorporate multiple domains instead of specialized ones significantly alleviates the limitations observed in specialized models. We refer to this approach as multi-domain model and compare its performance to that of specialized models. For this, we introduce the incorporation of diverse medical image domains, including different imaging modalities like X-ray, MRI, CT, and ultrasound images, as well as various viewpoints such as axial, coronal, and sagittal views. Our findings underscore the superior generalization capabilities of multi-domain models, particularly in scenarios characterized by limited data availability and out-of-distribution, frequently encountered in healthcare applications. The integration of diverse data allows multi-domain models to utilize information across domains, enhancing the overall outcomes substantially. To illustrate, for organ recognition, multi-domain model can enhance accuracy by up to 8% compared to conventional specialized models.

7/8/2024