Coalitions of AI-based Methods Predict 15-Year Risks of Breast Cancer Metastasis Using Real-World Clinical Data with AUC up to 0.9

Read original: arXiv:2408.16256 - Published 8/30/2024 by Xia Jiang, Yijun Zhou, Alan Wells, Adam Brufsky


Overview

Breast cancer is one of the two cancers responsible for the most deaths in women, with about 42,000 deaths per year in the US.
While there are over 300,000 new breast cancer diagnoses each year, only a fraction of these result in mortality.
Most women undergo seemingly curative treatment for localized cancer, but a significant fraction later die of metastatic disease, for which current treatments are only temporizing for the vast majority.
Existing prognostic metrics offer little actionable value for 4 of 5 women who seem "cured" after local treatment.
Many women are also exposed unnecessarily to harmful, even potentially fatal, adjuvant therapies, which reduce metastatic recurrence by only about a third.
There is a need for better prognostic tools to target aggressive treatment to those likely to relapse and spare those who were actually cured.
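The one-third figure is a relative reduction, which is why so many treated women gain nothing from adjuvant therapy. The short worked calculation below makes this explicit using the standard definitions of absolute risk reduction (ARR) and number needed to treat (NNT). The baseline recurrence risk r is a placeholder, not a value reported in the paper, and the one-third reduction is treated as exact for simplicity.

```latex
\begin{aligned}
P(\text{recurrence} \mid \text{no adjuvant therapy}) &= r \\
P(\text{recurrence} \mid \text{adjuvant therapy}) &= \left(1 - \tfrac{1}{3}\right) r = \tfrac{2}{3}\, r \\
\text{ARR} &= r - \tfrac{2}{3}\, r = \tfrac{r}{3} \\
\text{NNT} &= \frac{1}{\text{ARR}} = \frac{3}{r}
\end{aligned}
```

For example, with a hypothetical baseline risk of r = 0.3, ARR = 0.1 and NNT = 10: out of 10 treated women, about 7 would not have relapsed anyway, about 2 relapse despite treatment, and only about 1 benefits.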

Plain English Explanation

Breast cancer is a major health issue for women, with a large number of new cases each year. However, deaths from breast cancer are relatively few compared with the number of diagnoses, which suggests that most women are successfully treated for their localized cancer. Even so, a significant portion of women who seem cured later develop metastatic disease, which is much harder to treat effectively.

The current methods used to predict a patient's prognosis and likelihood of relapse offer little actionable guidance for most women who seem to be "cured" after their initial treatment. As a result, many women receive aggressive additional (adjuvant) therapies, even though these treatments reduce the risk of the cancer spreading by only about a third.

The researchers argue that a different approach is needed. Rather than relying on molecular or tumor-marker assays, which are time-consuming, expensive, and often not validated for prognosis, they propose using existing data on clinical and pathological features of the tumor to develop more accurate prognostic algorithms. By applying machine learning techniques to this data, they hope to create prediction models that better identify which patients are truly cured and which are at high risk of relapse, allowing doctors to tailor treatment accordingly.

Technical Explanation

The researchers in this study sought to develop improved prognostic algorithms for breast cancer metastasis. They combined machine learning with grid search (a systematic search over model hyperparameter settings) and Bayesian networks, applying these methods to clinical and histopathological data to build models that predict which patients are likely to develop metastasis, over horizons of up to 15 years, after initial treatment.
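Grid search itself is simple: evaluate every combination of candidate hyperparameter values and keep the best-scoring one. The sketch below is a generic illustration, not the authors' pipeline. It assumes a `score_fn` that, in a real workflow, would train a model with the given settings and return a validation metric such as AUC. The parameter names `learning_rate` and `n_hidden_layers` and the `toy_score` function are hypothetical placeholders, and the Bayesian network component is not shown.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Tuple


def grid_search(
    param_grid: Dict[str, List[Any]],
    score_fn: Callable[[Dict[str, Any]], float],
) -> Tuple[Dict[str, Any], float]:
    """Evaluate every combination in param_grid and return the combination
    with the highest score (ties keep the first combination found)."""
    names = list(param_grid)
    best_params: Dict[str, Any] = {}
    best_score = float("-inf")
    # product(...) enumerates the Cartesian product of all candidate values.
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)  # in practice: e.g. mean validation AUC
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical hyperparameter grid (not the settings used in the paper).
grid = {"learning_rate": [0.001, 0.01, 0.1], "n_hidden_layers": [1, 2, 3]}


def toy_score(p: Dict[str, Any]) -> float:
    # Hypothetical stand-in for "train a model and measure validation AUC";
    # it peaks at learning_rate=0.01, n_hidden_layers=2 purely for illustration.
    return 0.9 - abs(p["learning_rate"] - 0.01) - 0.05 * abs(p["n_hidden_layers"] - 2)


best, score = grid_search(grid, toy_score)
print(best, round(score, 3))
```

Because the number of combinations is the product of the grid sizes (3 × 3 = 9 here), grid search grows expensive quickly as hyperparameters are added, which is why it is usually paired with cross-validation on a modest grid.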

The motivation for this work was the observation that, although breast cancer is common, deaths are relatively few compared with the number of new diagnoses, so most women are cured by treatment of localized disease. Yet a substantial portion of women who undergo seemingly curative treatment later develop metastatic disease that is much harder to treat effectively.

The researchers noted that current prognostic tools are of limited use for the majority of women who appear "cured" after local treatment. Additionally, many patients receive aggressive adjuvant therapies that reduce the risk of metastatic recurrence by only about a third. There is therefore a need for better prognostic models that can identify high-risk patients who require more intensive treatment while sparing low-risk patients unnecessary harm.

Through their machine learning approach, the researchers were able to develop algorithms that achieved an area under the curve (AUC) of up to 0.9 in receiver operating characteristic (ROC) analyses. Importantly, these models relied only on existing clinical and histopathological data, and could therefore be rapidly translated into clinical practice without the need for additional testing.
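To make "AUC of up to 0.9" concrete: the ROC AUC equals the probability that a randomly chosen patient who develops metastasis receives a higher risk score than a randomly chosen patient who does not, with ties counting as half. The sketch below computes it directly from that definition. The labels and scores are made up for illustration and are not data from the paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs in which the
    positive case (label 1) scores higher than the negative case (label 0);
    tied pairs count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up risk scores for 4 hypothetical patients (1 = developed metastasis).
labels = [1, 0, 1, 0]
scores = [0.8, 0.3, 0.4, 0.5]
print(roc_auc(labels, scores))  # 0.75
```

Three of the four positive/negative pairs are ranked correctly here, giving 0.75. The pairwise loop costs O(n_pos × n_neg) and is chosen for readability; for real cohorts, a rank-based computation such as scikit-learn's `roc_auc_score` gives the same value more efficiently.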

Critical Analysis

The researchers acknowledge several limitations to their work. Firstly, the study was retrospective in nature, relying on historical data. Prospective validation of the algorithms would be necessary to ensure their effectiveness in real-world clinical settings.

Additionally, the researchers note that their models, while highly predictive, still misjudge the risk of a meaningful share of patients. There is likely room for further improvement, perhaps by incorporating additional data sources such as biomarkers or genomic profiles.

It would also be valuable to explore how these prognostic algorithms could be integrated into clinical decision-making and to assess their impact on patient outcomes and quality of life. Careful consideration of the ethical and psychological implications of such models would be important as well.

Conclusion

This research represents a promising step towards more accurate and personalized prognostic tools for breast cancer. By leveraging machine learning techniques and existing clinical data, the researchers have developed algorithms that could help identify high-risk patients who would benefit from aggressive treatment, while sparing low-risk patients from unnecessary harm.

While further validation and refinement are needed, this work highlights the potential for advanced AI models to transform cancer management and improve outcomes for breast cancer patients.

This summary was produced with help from an AI and may contain inaccuracies; consult the original paper (arXiv:2408.16256) to verify details.


Related Papers


Coalitions of AI-based Methods Predict 15-Year Risks of Breast Cancer Metastasis Using Real-World Clinical Data with AUC up to 0.9

Xia Jiang, Yijun Zhou, Alan Wells, Adam Brufsky

Breast cancer is one of the two cancers responsible for the most deaths in women, with about 42,000 deaths each year in the US. That there are over 300,000 breast cancers newly diagnosed each year suggests that only a fraction of the cancers result in mortality. Thus, most women undergo seemingly curative treatment for localized cancers, but a significant fraction later succumb to metastatic disease, for which current treatments are only temporizing for the vast majority. The current prognostic metrics are of little actionable value for 4 of the 5 women seemingly cured after local treatment, and many women are exposed to morbid and even mortal adjuvant therapies unnecessarily, with these adjuvant therapies reducing metastatic recurrence by only a third. Thus, there is a need for better prognostics to target aggressive treatment at those who are likely to relapse and spare those who were actually cured. While there is a plethora of molecular and tumor-marker assays in use and under development to detect recurrence early, these are time-consuming, expensive, and still often unvalidated as to actionable prognostic utility. A different approach would use large data techniques to determine clinical and histopathological parameters that would provide accurate prognostics using existing data. Herein, we report on machine learning, together with grid search and Bayesian Networks, to develop algorithms that present an AUC of up to 0.9 in ROC analyses, using only extant data. Such algorithms could be rapidly translated to clinical management as they do not require testing beyond routine tumor evaluations.

8/30/2024

PersonalizedUS: Interpretable Breast Cancer Risk Assessment with Local Coverage Uncertainty Quantification

Alek Frohlich, Thiago Ramos, Gustavo Cabello, Isabela Buzatto, Rafael Izbicki, Daniel Tiezzi

Correctly assessing the malignancy of breast lesions identified during ultrasound examinations is crucial for effective clinical decision-making. However, the current gold standard relies on manual BI-RADS scoring by clinicians, often leading to unnecessary biopsies and a significant mental health burden on patients and their families. In this paper, we introduce PersonalizedUS, an interpretable machine learning system that leverages recent advances in conformal prediction to provide precise and personalized risk estimates with local coverage guarantees and sensitivity, specificity, and predictive values above 0.9 across various threshold levels. In particular, we identify meaningful lesion subgroups where distribution-free, model-agnostic conditional coverage holds, with approximately 90% of our prediction sets containing only the ground truth in most lesion subgroups, thus explicitly characterizing for which patients the model is most suitably applied. Moreover, we make available a curated tabular dataset of 1936 biopsied breast lesions from a recent observational multicenter study and benchmark the performance of several state-of-the-art learning algorithms. We also report a successful case study of the deployed system in the same multicenter context. Concrete clinical benefits include up to a 65% reduction in requested biopsies among BI-RADS 4a and 4b lesions, with minimal to no missed cancer cases.

8/29/2024


Biomarker based Cancer Classification using an Ensemble with Pre-trained Models

Chongmin Lee, Jihie Kim

Certain cancer types, notably pancreatic cancer, are difficult to detect at an early stage, underscoring the importance of discovering the causal relationship between biomarkers and cancer in order to identify cancer efficiently. By allowing specific biomarkers to be detected and monitored non-invasively, liquid biopsies enhance the precision and efficacy of medical interventions, supporting the move towards personalized healthcare. Several machine learning algorithms, such as Random Forest and SVM, are used for classification, but they are inefficient because they require hyperparameter tuning. We leverage a meta-trained Hyperfast model for classifying cancer, achieving the highest AUC of 0.9929 along with robustness, especially on highly imbalanced datasets, compared to other ML algorithms in several binary classification tasks (e.g. breast invasive carcinoma; BRCA vs. non-BRCA). We also propose a novel ensemble model combining the pre-trained Hyperfast model, XGBoost, and LightGBM for multi-class classification tasks, achieving an incremental increase in accuracy (0.9464) while using only 500 PCA features, in contrast to previous studies that used more than 2,000 features for similar results.

6/17/2024

Towards Non-invasive and Personalized Management of Breast Cancer Patients from Multiparametric MRI via A Large Mixture-of-Modality-Experts Model

Luyang Luo, Mingxiang Wu, Mei Li, Yi Xin, Qiong Wang, Varut Vardhanabhuti, Winnie CW Chu, Zhenhui Li, Juan Zhou, Pranav Rajpurkar, Hao Chen

Breast magnetic resonance imaging (MRI) is the imaging technique with the highest sensitivity for detecting breast cancer and is routinely used for women at high risk. Despite the comprehensive multiparametric protocol of breast MRI, existing artificial intelligence-based studies predominantly rely on single sequences and have limited validation. Here we report a large mixture-of-modality-experts model (MOME) that integrates multiparametric MRI information within a unified structure, offering a noninvasive method for personalized breast cancer management. We have curated the largest multiparametric breast MRI dataset, involving 5,205 patients from three hospitals in the north, southeast, and southwest of China, for the development and extensive evaluation of our model. MOME demonstrated accurate and robust identification of breast cancer. It achieved comparable performance for malignancy recognition to that of four senior radiologists and significantly outperformed a junior radiologist, with 0.913 AUROC, 0.948 AUPRC, 0.905 F1 score, and 0.723 MCC. Our findings suggest that MOME could reduce the need for biopsies in BI-RADS 4 patients with a ratio of 7.3%, classify triple-negative breast cancer with an AUROC of 0.709, and predict pathological complete response to neoadjuvant chemotherapy with an AUROC of 0.694. The model further supports scalable and interpretable inference, adapting to missing modalities and providing decision explanations by highlighting lesions and measuring modality contributions. MOME exemplifies a discriminative, robust, scalable, and interpretable multimodal model, paving the way for noninvasive, personalized management of breast cancer patients based on multiparametric breast imaging data.

9/4/2024