Knowledge-driven AI-generated data for accurate and interpretable breast ultrasound diagnoses

arXiv:2407.16634 - Published 7/24/2024 by Haojun Yu, Youcheng Li, Nan Zhang, Zihan Niu, Xuantong Gong, Yanwen Luo, Quanlin Wu, Wangyan Qin, Mengyuan Zhou, Jie Han and 10 others


Overview

Deep learning models have shown great potential to assist radiologists in breast ultrasound (US) diagnoses.
However, their effectiveness is limited by the long-tail distribution of training data, leading to inaccuracies in rare cases.
This study aims to address the challenge of improving diagnostic model performance on rare cases using long-tailed data.

Plain English Explanation

In the field of medical imaging, deep learning models have shown impressive capabilities in assisting radiologists with breast ultrasound (US) diagnoses. These models are trained on large datasets of medical images to learn patterns and make predictions. However, their training data follows a long-tail distribution: a few common lesion types dominate, while rare or unusual cases are underrepresented. As a result, the models are least accurate on exactly those rare cases.
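The size of the tail can be made concrete with two figures from the paper's abstract: the generative model's source data contains 3,749 lesions, of which only 34 are DCIS. The short Python sketch below just does that arithmetic; the helper name `class_share` is illustrative and not from the paper.

```python
# Counts reported in the TAILOR abstract.
TOTAL_SOURCE_LESIONS = 3749
DCIS_SOURCE_LESIONS = 34


def class_share(count: int, total: int) -> float:
    """Fraction of a dataset that belongs to one class."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count / total


share = class_share(DCIS_SOURCE_LESIONS, TOTAL_SOURCE_LESIONS)
print(f"DCIS share of source lesions: {share:.2%}")  # DCIS share of source lesions: 0.91%
print(f"About 1 in {TOTAL_SOURCE_LESIONS / DCIS_SOURCE_LESIONS:.0f} lesions")  # About 1 in 110 lesions
```

With less than 1% of the source lesions, a model trained only on real data sees very few DCIS examples, which is the gap targeted synthetic data is meant to fill.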

To address this challenge, the researchers introduce a pipeline called TAILOR. TAILOR uses a knowledge-driven generative model to produce synthetic (artificially generated) data tailored to the rare cases that diagnostic models struggle with. Using 3,749 real lesions as source data, the generative model can generate millions of breast US images, especially for rare and error-prone cases, and the researchers used this data to build a more robust and accurate diagnostic model.

In a prospective external evaluation, the researchers found that their diagnostic model outperformed the average performance of nine radiologists by 33.5% in specificity (correctly identifying non-cancerous cases) while maintaining the same sensitivity (correctly identifying cancerous cases). The model also outperformed every radiologist by a large margin on ductal carcinoma in situ (DCIS), a non-invasive form of breast cancer, even though the source data contained only 34 DCIS lesions.
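Sensitivity and specificity both come from a confusion matrix of predictions against ground-truth labels. A minimal Python sketch, assuming malignant is the positive class; the `ConfusionCounts` type and every count below are hypothetical, chosen only to show a specificity gain at unchanged sensitivity, and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Binary confusion counts with malignant as the positive class."""
    tp: int  # malignant lesions called malignant
    fn: int  # malignant lesions called benign (missed cancers)
    tn: int  # benign lesions called benign
    fp: int  # benign lesions called malignant (false alarms)


def sensitivity(c: ConfusionCounts) -> float:
    """Share of malignant lesions that are correctly flagged."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of benign lesions that are correctly cleared."""
    return c.tn / (c.tn + c.fp)


# Hypothetical counts (50 malignant, 100 benign lesions), for illustration only.
reader = ConfusionCounts(tp=45, fn=5, tn=60, fp=40)
model = ConfusionCounts(tp=45, fn=5, tn=85, fp=15)

for name, c in (("reader", reader), ("model", model)):
    print(f"{name}: sensitivity={sensitivity(c):.2f}, specificity={specificity(c):.2f}")
# reader: sensitivity=0.90, specificity=0.60
# model: sensitivity=0.90, specificity=0.85
```

Because sensitivity is identical in this toy case, the whole difference shows up as fewer false alarms among benign lesions.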

The researchers believe that the TAILOR approach could be extended to other diseases and imaging modalities, potentially improving the performance of AI-based diagnostic tools across a wide range of medical applications.

Technical Explanation

The researchers developed a pipeline called TAILOR that addresses the challenge of improving diagnostic model performance on rare cases in breast ultrasound (US) imaging. The key components of TAILOR include:

Knowledge-driven Generative Model: The researchers built a generative model that uses 3,749 real lesions as source data and can generate millions of synthetic breast US images, particularly for rare and error-prone cases.
Diagnostic Model: The researchers used the synthetic data generated by the TAILOR pipeline to build a more robust and accurate diagnostic model for breast US images. This model was designed to provide interpretable predictions, allowing for better understanding of the decision-making process.
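The summary says TAILOR targets rare, error-prone cases but does not describe how generation is distributed across cases. As a purely hypothetical illustration of the idea (not TAILOR's method), a fixed generation budget could be split across classes in proportion to how often a current model misclassifies each one on validation data. The function `allocate_budget`, the class names, and the error counts are all assumptions.

```python
def allocate_budget(error_counts: dict[str, int], budget: int) -> dict[str, int]:
    """Hypothetical: split `budget` synthetic images across classes in
    proportion to each class's validation error count.

    Integer arithmetic plus largest-remainder rounding keeps the total exact.
    """
    total = sum(error_counts.values())
    if total <= 0:
        raise ValueError("at least one class needs a positive error count")
    # Floor of each class's proportional share.
    alloc = {k: budget * v // total for k, v in error_counts.items()}
    # Fractional parts, kept as integer numerators over `total`.
    remainders = {k: budget * v % total for k, v in error_counts.items()}
    leftover = budget - sum(alloc.values())
    # Give the leftover images to the classes with the largest fractional parts.
    for k in sorted(error_counts, key=lambda k: remainders[k], reverse=True)[:leftover]:
        alloc[k] += 1
    return alloc


# Hypothetical validation errors per class; names and numbers are illustrative.
errors = {"benign": 10, "invasive": 20, "DCIS": 70}
print(allocate_budget(errors, 1000))  # {'benign': 100, 'invasive': 200, 'DCIS': 700}
```

Proportional-to-error weighting is only one choice; inverse-frequency weighting, or a mix of the two, would also push synthetic data toward the tail.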

In the prospective external evaluation, the diagnostic model outperformed the average performance of nine radiologists by 33.5% in specificity (correctly identifying non-cancerous cases) at the same sensitivity (correctly identifying cancerous cases), and by providing predictions with an interpretable decision-making process it also improved the radiologists' own performance. Additionally, on ductal carcinoma in situ (DCIS), the diagnostic model outperformed all radiologists by a large margin, even though the source data contained only 34 DCIS lesions.
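Comparing specificity "at the same sensitivity" requires turning a model's continuous malignancy score into a yes/no call at an operating point that matches the readers' sensitivity. The summary does not describe the paper's exact protocol, so the Python sketch below is an assumption: it picks the highest threshold whose sensitivity reaches a target and reports specificity there. The `specificity_at_sensitivity` helper, the scores, and the labels are hypothetical.

```python
def specificity_at_sensitivity(scores, labels, target_sens):
    """Return (threshold, sensitivity, specificity) at the highest threshold
    whose sensitivity is at least `target_sens`.

    labels: 1 = malignant, 0 = benign; a lesion is called malignant if score >= threshold.
    """
    if not 0.0 <= target_sens <= 1.0:
        raise ValueError("target_sens must be in [0, 1]")
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both malignant and benign cases")
    # Lowering the threshold can only raise sensitivity and lower specificity,
    # so the first threshold (scanning downward) that meets the target is the best one.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        if tp / positives >= target_sens:
            tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
            return t, tp / positives, tn / negatives
    # Unreachable: at the lowest score every lesion is called malignant (sensitivity 1.0).
    raise AssertionError("no threshold found")


# Hypothetical model scores and ground truth for ten lesions.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]

# Suppose the readers' average sensitivity were 0.8 (hypothetical).
print(specificity_at_sensitivity(scores, labels, 0.8))  # (0.6, 0.8, 0.8)
```

Matching the operating point this way keeps the comparison fair: the model is not allowed to buy specificity by quietly missing more cancers.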

The researchers believe that the TAILOR approach can potentially be extended to various diseases and imaging modalities, highlighting its broader applicability in improving the performance of AI-based diagnostic tools.

Critical Analysis

While the TAILOR pipeline demonstrates promising results in improving diagnostic model performance on rare cases in breast ultrasound imaging, there are a few potential limitations and areas for further research:

Generalizability: The researchers tested the TAILOR pipeline on breast US imaging, but it is unclear how well the approach would translate to other medical imaging modalities or disease areas. Further research is needed to evaluate the generalizability of the TAILOR approach.
Interpretability: The researchers emphasized the interpretability of their diagnostic model, which is an important aspect for building trust in AI-based medical decision support systems. However, the specific mechanisms by which the model arrives at its decisions could be further explored and explained.
Real-world Deployment: While the prospective external evaluation showed promising results, the true test of the TAILOR pipeline's effectiveness would be its performance in real-world clinical settings. Factors such as integration with existing workflows, user adoption, and long-term maintenance would need to be considered.
Ethical Considerations: As with any AI-based medical tool, there are important ethical considerations around bias, fairness, and the potential impact on clinical decision-making. The researchers could have discussed these aspects in more depth.

Overall, the TAILOR pipeline represents an interesting and potentially impactful approach to improving the performance of deep learning models on rare cases in medical imaging. However, further research and real-world testing will be necessary to fully understand the limitations and broader applicability of this approach.

Conclusion

The researchers have developed a novel pipeline called TAILOR that addresses the challenge of improving diagnostic model performance on rare cases in breast ultrasound imaging. By using a knowledge-driven generative model to produce tailored synthetic data, they built a diagnostic model that exceeded the average specificity of nine radiologists by 33.5% at the same sensitivity and outperformed every radiologist on DCIS.

The TAILOR approach holds promise for enhancing the capabilities of AI-based diagnostic tools across a range of medical imaging modalities and disease areas. The ability to generate targeted synthetic data to address long-tail distributions in training data could lead to significant improvements in the performance and reliability of these systems, ultimately benefiting both healthcare providers and patients.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


Knowledge-driven AI-generated data for accurate and interpretable breast ultrasound diagnoses

Haojun Yu, Youcheng Li, Nan Zhang, Zihan Niu, Xuantong Gong, Yanwen Luo, Quanlin Wu, Wangyan Qin, Mengyuan Zhou, Jie Han, Jia Tao, Ziwei Zhao, Di Dai, Di He, Dong Wang, Binghui Tang, Ling Huo, Qingli Zhu, Yong Wang, Liwei Wang

Data-driven deep learning models have shown great capabilities to assist radiologists in breast ultrasound (US) diagnoses. However, their effectiveness is limited by the long-tail distribution of training data, which leads to inaccuracies in rare cases. In this study, we address a long-standing challenge of improving the diagnostic model performance on rare cases using long-tailed data. Specifically, we introduce a pipeline, TAILOR, that builds a knowledge-driven generative model to produce tailored synthetic data. The generative model, using 3,749 lesions as source data, can generate millions of breast-US images, especially for error-prone rare cases. The generated data can be further used to build a diagnostic model for accurate and interpretable diagnoses. In the prospective external evaluation, our diagnostic model outperforms the average performance of nine radiologists by 33.5% in specificity with the same sensitivity, improving their performance by providing predictions with an interpretable decision-making process. Moreover, on ductal carcinoma in situ (DCIS), our diagnostic model outperforms all radiologists by a large margin, with only 34 DCIS lesions in the source data. We believe that TAILOR can potentially be extended to various diseases and imaging modalities.

7/24/2024

PersonalizedUS: Interpretable Breast Cancer Risk Assessment with Local Coverage Uncertainty Quantification

Alek Frohlich, Thiago Ramos, Gustavo Cabello, Isabela Buzatto, Rafael Izbicki, Daniel Tiezzi

Correctly assessing the malignancy of breast lesions identified during ultrasound examinations is crucial for effective clinical decision-making. However, the current gold standard relies on manual BI-RADS scoring by clinicians, often leading to unnecessary biopsies and a significant mental health burden on patients and their families. In this paper, we introduce PersonalizedUS, an interpretable machine learning system that leverages recent advances in conformal prediction to provide precise and personalized risk estimates with local coverage guarantees and sensitivity, specificity, and predictive values above 0.9 across various threshold levels. In particular, we identify meaningful lesion subgroups where distribution-free, model-agnostic conditional coverage holds, with approximately 90% of our prediction sets containing only the ground truth in most lesion subgroups, thus explicitly characterizing for which patients the model is most suitably applied. Moreover, we make available a curated tabular dataset of 1936 biopsied breast lesions from a recent observational multicenter study and benchmark the performance of several state-of-the-art learning algorithms. We also report a successful case study of the deployed system in the same multicenter context. Concrete clinical benefits include up to a 65% reduction in requested biopsies among BI-RADS 4a and 4b lesions, with minimal to no missed cancer cases.

8/29/2024


Breast tumor classification based on self-supervised contrastive learning from ultrasound videos

Yunxin Tang, Siyuan Tang, Jian Zhang, Hao Chen

Background: Breast ultrasound is prominently used in diagnosing breast tumors. At present, many automatic systems based on deep learning have been developed to help radiologists in diagnosis. However, training such systems remains challenging because they are usually data-hungry and demand large amounts of labeled data, which require professional knowledge and are expensive to obtain. Methods: We adopted a triplet network and a self-supervised contrastive learning technique to learn representations from unlabeled breast ultrasound video clips. We further designed a new hard triplet loss to learn representations that particularly discriminate positive and negative image pairs that are hard to recognize. We also constructed a pretraining dataset from breast ultrasound videos (1,360 videos from 200 patients), which includes an anchor sample dataset with 11,805 images, a positive sample dataset with 188,880 images, and a negative sample dataset dynamically generated from video clips. Further, we constructed a finetuning dataset, including 400 images from 66 patients. We transferred the pretrained network to a downstream benign/malignant classification task and compared the performance with other state-of-the-art models, including three models pretrained on ImageNet and a previous contrastive learning model retrained on our datasets. Results and conclusion: Experiments revealed that our model achieved an area under the receiver operating characteristic curve (AUC) of 0.952, which is significantly higher than the others. Further, we assessed the dependence of our pretrained model on the number of labeled data and revealed that <100 samples were required to achieve an AUC of 0.901. The proposed framework greatly reduces the demand for labeled data and holds potential for use in automatic breast ultrasound image diagnosis.

8/21/2024

Large-scale Long-tailed Disease Diagnosis on Radiology Images

Qiaoyu Zheng, Weike Zhao, Chaoyi Wu, Xiaoman Zhang, Lisong Dai, Hengyu Guan, Yuehua Li, Ya Zhang, Yanfeng Wang, Weidi Xie

Developing a generalist radiology diagnosis system can greatly enhance clinical diagnostics. In this paper, we introduce RadDiag, a foundational model supporting 2D and 3D inputs across various modalities and anatomies, using a transformer-based fusion module for comprehensive disease diagnosis. Due to patient privacy concerns and the lack of large-scale radiology diagnosis datasets, we utilize high-quality, clinician-reviewed radiological images available online with diagnosis labels. Our dataset, RP3D-DiagDS, contains 40,936 cases with 195,010 scans covering 5,568 disorders (930 unique ICD-10-CM codes). Experimentally, our RadDiag achieves 95.14% AUC on internal evaluation with the knowledge-enhancement strategy. Additionally, RadDiag can be zero-shot applied or fine-tuned to external diagnosis datasets sourced from various hospitals, demonstrating state-of-the-art results. In conclusion, we show that publicly shared medical data on the Internet is a tremendous and valuable resource that can potentially support building a generalist AI for healthcare.

6/18/2024