MAMA-MIA: A Large-Scale Multi-Center Breast Cancer DCE-MRI Benchmark Dataset with Expert Segmentations

Read original: arXiv:2406.13844 - Published 7/31/2024 by Lidia Garrucho, Claire-Anne Reidel, Kaisar Kushibar, Smriti Joshi, Richard Osuala, Apostolia Tsirikoglou, Maciej Bobowicz, Javier del Riego, Alessandro Catanese, Katarzyna Gwoździewicz and 23 others

Overview

This paper presents a large-scale, multi-center breast cancer dynamic contrast-enhanced MRI (DCE-MRI) dataset called MAMA-MIA, along with expert segmentations of the tumors.
The dataset is designed to support research in treatment response and survival prediction as well as automatic segmentation of breast cancer in MRI.
The MAMA-MIA dataset comprises 1,506 DCE-MRI cases drawn from four publicly available multi-center collections in The Cancer Imaging Archive (TCIA), making it a valuable resource for the research community.

Plain English Explanation

The paper introduces a large collection of breast cancer MRI scans, called the MAMA-MIA dataset. The 1,506 cases come from four public collections in The Cancer Imaging Archive (TCIA), which together span multiple imaging centers. The dataset also includes detailed markings, reviewed and corrected by medical experts, that identify the location and boundaries of the tumors in the scans.

This dataset is important for two key reasons:

It can help researchers develop better algorithms to predict how well a patient will respond to cancer treatment and how long they are likely to survive. This could lead to more personalized and effective cancer treatment plans.
It provides a large and diverse set of MRI scans for training computer vision models to automatically detect and segment breast tumors in new patients. This could make breast cancer diagnosis faster and more accurate.

By making this high-quality dataset publicly available, the researchers hope to accelerate progress in these important areas of breast cancer research and clinical care.

Technical Explanation

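The MAMA-MIA dataset consists of 1,506 dynamic contrast-enhanced MRI (DCE-MRI) cases of breast cancer patients, sourced from four publicly available multi-center collections in The Cancer Imaging Archive (TCIA). A deep learning model first generated preliminary segmentations, which sixteen experts, averaging 9 years of experience in breast cancer, then corrected to delineate primary tumors and non-mass enhancement areas. Two radiologists also visually inspected the automatic segmentations to support future quality control studies.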

The large scale and diversity of this dataset make it a valuable resource for training and evaluating machine learning models for breast cancer tumor segmentation and treatment response prediction. The dataset covers a wide range of tumor sizes, shapes, and locations, as well as variations in imaging protocols, scanner hardware, and patient demographics.
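One way to probe how well a model copes with this kind of variation is to hold out all cases from one source at a time and train on the rest. The sketch below illustrates that idea with plain Python. The case IDs and source labels are hypothetical placeholders, not the dataset's actual identifiers, and the paper summary does not state that the authors used this protocol.

```python
from collections import defaultdict


def leave_one_group_out(case_groups):
    """Yield (held_out_group, train_ids, test_ids) for each distinct group.

    case_groups maps a case ID to the label of the source it came from
    (for example a center or a collection name).
    """
    by_group = defaultdict(list)
    for case_id, group in case_groups.items():
        by_group[group].append(case_id)

    for held_out in sorted(by_group):
        test_ids = sorted(by_group[held_out])
        train_ids = sorted(
            case_id
            for group, ids in by_group.items()
            if group != held_out
            for case_id in ids
        )
        yield held_out, train_ids, test_ids


# Hypothetical case IDs and source labels, for illustration only.
example_cases = {
    "case_001": "source_A",
    "case_002": "source_A",
    "case_003": "source_B",
    "case_004": "source_C",
    "case_005": "source_C",
}

splits = list(leave_one_group_out(example_cases))
for held_out, train_ids, test_ids in splits:
    print(held_out, len(train_ids), len(test_ids))
```

Each split keeps every case from the held-out source out of training, so the test score reflects performance on scanners and protocols the model has not seen.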

In addition to the raw MRI scans and expert segmentations, the dataset includes 49 harmonized demographic and clinical variables, such as patient age, tumor stage, and treatment outcomes. This information enables researchers to investigate the relationship between tumor appearance and clinical prognosis.

The MAMA-MIA dataset is publicly available. The authors also release pretrained weights for the well-known nnUNet architecture, trained on the full DCE-MRI images and the expert segmentations, which gives future work in this area a ready baseline to compare against.
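Comparing a model's output against an expert segmentation usually comes down to an overlap score. A common choice is the Dice similarity coefficient, 2|A∩B| / (|A| + |B|). The sketch below computes it with NumPy on toy 3D masks. The choice of Dice is an assumption made for illustration, since this summary does not say which metrics the authors report, and the arrays are synthetic rather than real MRI data.

```python
import numpy as np


def dice_coefficient(pred, target):
    """Dice similarity between two binary masks of identical shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")

    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    if total == 0:
        # Both masks are empty: treat this as perfect agreement.
        return 1.0
    return float(2.0 * intersection / total)


# Synthetic 4x4x4 volumes standing in for an expert mask and a model mask.
expert_mask = np.zeros((4, 4, 4), dtype=bool)
expert_mask[1:3, 1:3, 1:3] = True   # 8 voxels

model_mask = np.zeros((4, 4, 4), dtype=bool)
model_mask[1:3, 1:3, 1:4] = True    # 12 voxels, 8 of them shared

score = dice_coefficient(model_mask, expert_mask)
print(f"Dice: {score:.2f}")  # prints "Dice: 0.80"
```

A score of 1.0 means the two masks match exactly and 0.0 means they share no voxels. The same function can compare two annotators' masks to gauge inter-observer agreement.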

Critical Analysis

The MAMA-MIA dataset represents a significant step forward in providing a large-scale, high-quality resource for breast cancer MRI research. However, the authors acknowledge several limitations and areas for further work:

The dataset is limited to DCE-MRI scans, and does not include other MRI modalities (e.g., T2-weighted, diffusion-weighted) that may provide complementary information for tumor analysis.
The expert segmentations, while extensive, may still exhibit some degree of inter-observer variability, which could impact the reliability of the ground truth annotations.
The dataset does not include longitudinal scans for individual patients, which would be valuable for tracking treatment response over time and developing prognostic models.

Future work could address these limitations by expanding the dataset to include a broader range of MRI modalities, investigating methods to ensure consistent segmentation quality, and collecting longitudinal data to support the development of more sophisticated predictive models.

Conclusion

The MAMA-MIA dataset represents a significant contribution to the field of breast cancer MRI research. By providing a large, diverse, and expertly annotated collection of DCE-MRI scans, the dataset enables the development of advanced machine learning models for tumor segmentation and treatment response prediction. This has the potential to improve the accuracy and efficiency of breast cancer diagnosis and management, ultimately leading to better outcomes for patients.

Related Papers

MAMA-MIA: A Large-Scale Multi-Center Breast Cancer DCE-MRI Benchmark Dataset with Expert Segmentations

Lidia Garrucho, Claire-Anne Reidel, Kaisar Kushibar, Smriti Joshi, Richard Osuala, Apostolia Tsirikoglou, Maciej Bobowicz, Javier del Riego, Alessandro Catanese, Katarzyna Gwoździewicz, Maria-Laura Cosaka, Pasant M. Abo-Elhoda, Sara W. Tantawy, Shorouq S. Sakrana, Norhan O. Shawky-Abdelfatah, Amr Muhammad Abdo-Salem, Androniki Kozana, Eugen Divjak, Gordana Ivanac, Katerina Nikiforaki, Michail E. Klontzas, Rosa García-Dosdá, Meltem Gulsun-Akpinar, Oğuz Lafcı, Ritse Mann, Carlos Martín-Isla, Fred Prior, Kostas Marias, Martijn P. A. Starmans, Fredrik Strand, Oliver Díaz, Laura Igual, Karim Lekadir

Current research in breast cancer Magnetic Resonance Imaging (MRI), especially with Artificial Intelligence (AI), faces challenges due to the lack of expert segmentations. To address this, we introduce the MAMA-MIA dataset, comprising 1506 multi-center dynamic contrast-enhanced MRI cases with expert segmentations of primary tumors and non-mass enhancement areas. These cases were sourced from four publicly available collections in The Cancer Imaging Archive (TCIA). Initially, we trained a deep learning model to automatically segment the cases, generating preliminary segmentations that significantly reduced expert segmentation time. Sixteen experts, averaging 9 years of experience in breast cancer, then corrected these segmentations, resulting in the final expert segmentations. Additionally, two radiologists conducted a visual inspection of the automatic segmentations to support future quality control studies. Alongside the expert segmentations, we provide 49 harmonized demographic and clinical variables and the pretrained weights of the well-known nnUNet architecture trained using the DCE-MRI full-images and expert segmentations. This dataset aims to accelerate the development and benchmarking of deep learning models and foster innovation in breast cancer diagnostics and treatment planning.

7/31/2024

BC-MRI-SEG: A Breast Cancer MRI Tumor Segmentation Benchmark

Anthony Bilic, Chen Chen

Binary breast cancer tumor segmentation with Magnetic Resonance Imaging (MRI) data is typically trained and evaluated on private medical data, which makes comparing deep learning approaches difficult. We propose a benchmark (BC-MRI-SEG) for binary breast cancer tumor segmentation based on publicly available MRI datasets. The benchmark consists of four datasets in total, where two datasets are used for supervised training and evaluation, and two are used for zero-shot evaluation. Additionally we compare state-of-the-art (SOTA) approaches on our benchmark and provide an exhaustive list of available public breast cancer MRI datasets. The source code has been made available at https://irulenot.github.io/BC_MRI_SEG_Benchmark.

6/4/2024

Towards Non-invasive and Personalized Management of Breast Cancer Patients from Multiparametric MRI via A Large Mixture-of-Modality-Experts Model

Luyang Luo, Mingxiang Wu, Mei Li, Yi Xin, Qiong Wang, Varut Vardhanabhuti, Winnie CW Chu, Zhenhui Li, Juan Zhou, Pranav Rajpurkar, Hao Chen

Breast magnetic resonance imaging (MRI) is the imaging technique with the highest sensitivity for detecting breast cancer and is routinely used for women at high risk. Despite the comprehensive multiparametric protocol of breast MRI, existing artificial intelligence-based studies predominantly rely on single sequences and have limited validation. Here we report a large mixture-of-modality-experts model (MOME) that integrates multiparametric MRI information within a unified structure, offering a noninvasive method for personalized breast cancer management. We have curated the largest multiparametric breast MRI dataset, involving 5,205 patients from three hospitals in the north, southeast, and southwest of China, for the development and extensive evaluation of our model. MOME demonstrated accurate and robust identification of breast cancer. It achieved comparable performance for malignancy recognition to that of four senior radiologists and significantly outperformed a junior radiologist, with 0.913 AUROC, 0.948 AUPRC, 0.905 F1 score, and 0.723 MCC. Our findings suggest that MOME could reduce the need for biopsies in BI-RADS 4 patients with a ratio of 7.3%, classify triple-negative breast cancer with an AUROC of 0.709, and predict pathological complete response to neoadjuvant chemotherapy with an AUROC of 0.694. The model further supports scalable and interpretable inference, adapting to missing modalities and providing decision explanations by highlighting lesions and measuring modality contributions. MOME exemplifies a discriminative, robust, scalable, and interpretable multimodal model, paving the way for noninvasive, personalized management of breast cancer patients based on multiparametric breast imaging data.

9/4/2024

fastMRI Breast: A publicly available radial k-space dataset of breast dynamic contrast-enhanced MRI

Eddy Solomon, Patricia M. Johnson, Zhengguo Tan, Radhika Tibrewala, Yvonne W. Lui, Florian Knoll, Linda Moy, Sungheon Gene Kim, Laura Heacock

This data curation work introduces the first large-scale dataset of radial k-space and DICOM data for breast DCE-MRI acquired in diagnostic breast MRI exams. Our dataset includes case-level labels indicating patient age, menopause status, lesion status (negative, benign, and malignant), and lesion type for each case. The public availability of this dataset and accompanying reconstruction code will support research and development of fast and quantitative breast image reconstruction and machine learning methods.

6/11/2024