Predictive Modeling for Breast Cancer Classification in the Context of Bangladeshi Patients: A Supervised Machine Learning Approach with Explainable AI

Read original: arXiv:2404.04686 - Published 4/9/2024 by Taminul Islam, Md. Alif Sheakh, Mst. Sazia Tahosin, Most. Hasna Hena, Shopnil Akash, Yousef A. Bin Jardan, Gezahign Fentahun Wondmie, Hiba-Allah Nafidi, Mohammed Bourhia

Overview

Breast cancer is one of the leading causes of mortality worldwide and is the most common type of cancer.
Manually diagnosing breast cancer is a time-consuming process that requires significant expertise.
Machine learning and Explainable AI can help classify breast cancer more efficiently and provide insights into the decision-making process.
This study evaluated and compared the performance of five different machine learning methods on a dataset of 500 patients from Dhaka Medical College Hospital.
The study also applied SHAP analysis to the XGBoost model to interpret the model's predictions and understand the impact of each feature.

Plain English Explanation

Breast cancer is a serious health issue that has become increasingly prevalent in recent years. Detecting this type of cancer can be a complex and time-consuming task, requiring a lot of skill and experience from medical professionals. To help make the process more efficient, this study looked at using machine learning and Explainable AI techniques to classify breast cancer cases.

The researchers tested five different machine learning methods on a dataset of 500 patients from a hospital in Dhaka, Bangladesh. These methods included decision tree, random forest, logistic regression, naive Bayes, and XGBoost. The goal was to see which model could most accurately identify whether a patient had breast cancer or not.

In addition to measuring the accuracy of the models, the researchers also used a technique called SHAP analysis to understand how the XGBoost model was making its decisions. This helped provide insights into which factors were most important for the model in classifying the patients.

After evaluating the results, the researchers found that the XGBoost model achieved the highest accuracy, correctly classifying 97% of the cases. This suggests that machine learning can be a powerful tool for improving breast cancer diagnosis and helping to detect the disease more quickly and reliably.

Technical Explanation

This study evaluated and compared five supervised machine learning techniques (decision tree, random forest, logistic regression, naive Bayes, and XGBoost) on a dataset of 500 patients from Dhaka Medical College Hospital. The researchers assessed the classification accuracy, precision, recall, and F1 score of each model.
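To make the four reported metrics concrete, here is a minimal sketch of how they follow from the counts of true and false positives and negatives in a binary classification task. The function name `classification_metrics` and the ten example labels are made up for illustration; they are not the paper's code or data.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with respect to the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for ten patients (1 = malignant, 0 = benign).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy example one malignant case is missed and one benign case is flagged, so accuracy is higher than precision and recall. In a screening setting recall is often watched most closely, because a missed malignant case is usually the costlier error.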

In addition, the study applied SHAP (SHapley Additive exPlanations) analysis to the XGBoost model to interpret its predictions and understand the impact of each feature on the output. SHAP is an Explainable AI method that attributes each individual prediction to the input features using Shapley values from cooperative game theory.
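The idea behind Shapley-value attribution can be sketched in a few lines. The example below computes exact Shapley values by enumerating every feature coalition for a tiny hand-written scoring function. This is not the SHAP library and not the paper's XGBoost model: `shapley_values`, `toy_risk_score`, the three features, and the all-zero baseline are assumptions chosen for illustration. Practical SHAP implementations rely on much faster model-specific algorithms for tree ensembles.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attributions for one input x relative to a baseline.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2**n, so this only suits a handful of features.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i to this coalition.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk_score(features):
    # Hypothetical model (not from the paper): two additive terms
    # plus an interaction between the first and third feature.
    a, b, c = features
    return 0.5 * a + 0.25 * b + 0.25 * a * c


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_risk_score, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term is split evenly between the two features involved. That additive property is what lets SHAP plots show how much each feature pushes a prediction up or down.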

The results showed that the XGBoost model achieved the highest accuracy at 97%, outperforming the other machine learning techniques. The researchers compared these findings with results from other literature in the field of breast cancer diagnosis using machine learning.

Critical Analysis

The study provides a robust evaluation of several machine learning models for breast cancer classification and demonstrates the potential of Explainable AI techniques to interpret the model's decision-making process. However, a few considerations should be kept in mind:

The dataset used in this study was relatively small, with only 500 patients. Larger and more diverse datasets would be needed to further validate the generalizability of these findings. Additionally, the study did not provide details on the specific features used as inputs to the machine learning models, which could limit the reproducibility of the results.
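A rough way to see why sample size matters is to put a confidence interval around a reported accuracy. The sketch below uses the Wilson score interval for a binomial proportion. The held-out test set of 100 patients is purely an assumption for illustration, since this summary does not state how the 500 records were split, and `wilson_interval` is a hypothetical helper name.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# Assumed scenario: 97 correct predictions out of 100 test patients.
low, high = wilson_interval(97, 100)
print(f"95% interval: {low:.3f} to {high:.3f}")

# The same accuracy measured on a test set ten times larger.
low_big, high_big = wilson_interval(970, 1000)
print(f"95% interval: {low_big:.3f} to {high_big:.3f}")
```

Under this assumption, a 97% accuracy measured on 100 patients is compatible with a true accuracy of roughly 92% to 99%, and the interval narrows noticeably when the same accuracy is measured on ten times as many patients.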

While the XGBoost model achieved impressive accuracy, the study did not explore the potential trade-offs between model performance and interpretability. Other research suggests a balance between model complexity and explainability: simpler models such as logistic regression or a single decision tree are easier to inspect directly, while ensembles such as XGBoost typically need post-hoc tools like SHAP to explain their decisions.

Further research could also investigate the integration of these machine learning techniques with other diagnostic tools, such as mammography, to provide a more comprehensive and accurate breast cancer detection system.

Conclusion

This study demonstrates the potential of machine learning, particularly XGBoost, for accurate and efficient breast cancer classification. By combining machine learning with Explainable AI techniques, the researchers were able to not only achieve high predictive accuracy but also gain valuable insights into the decision-making process of the models.

These findings have important implications for improving the diagnosis and treatment of breast cancer, potentially leading to earlier detection and better outcomes for patients. Further research with larger, more diverse datasets and integration with other diagnostic tools could help solidify the role of machine learning in the fight against this devastating disease.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Predictive Modeling for Breast Cancer Classification in the Context of Bangladeshi Patients: A Supervised Machine Learning Approach with Explainable AI

Taminul Islam, Md. Alif Sheakh, Mst. Sazia Tahosin, Most. Hasna Hena, Shopnil Akash, Yousef A. Bin Jardan, Gezahign Fentahun Wondmie, Hiba-Allah Nafidi, Mohammed Bourhia

Breast cancer has rapidly increased in prevalence in recent years, making it one of the leading causes of mortality worldwide. Among all cancers, it is by far the most common. Diagnosing this illness manually requires significant time and expertise. Since detecting breast cancer is a time-consuming process, preventing its further spread can be aided by creating machine-based forecasts. Machine learning and Explainable AI are crucial in classification as they not only provide accurate predictions but also offer insights into how the model arrives at its decisions, aiding in the understanding and trustworthiness of the classification results. In this study, we evaluate and compare the classification accuracy, precision, recall, and F-1 scores of five different machine learning methods using a primary dataset (500 patients from Dhaka Medical College Hospital). Five different supervised machine learning techniques, including decision tree, random forest, logistic regression, naive bayes, and XGBoost, have been used to achieve optimal results on our dataset. Additionally, this study applied SHAP analysis to the XGBoost model to interpret the model's predictions and understand the impact of each feature on the model's output. We compared the accuracy with which several algorithms classified the data, as well as contrasted with other literature in this field. After final evaluation, this study found that XGBoost achieved the best model accuracy, which is 97%.

4/9/2024

Breast Cancer Diagnosis: A Comprehensive Exploration of Explainable Artificial Intelligence (XAI) Techniques

Samita Bai, Sidra Nasir, Rizwan Ahmed Khan, Sheeraz Arif, Alexandre Meyer, Hubert Konik

Breast cancer (BC) stands as one of the most common malignancies affecting women worldwide, necessitating advancements in diagnostic methodologies for better clinical outcomes. This article provides a comprehensive exploration of the application of Explainable Artificial Intelligence (XAI) techniques in the detection and diagnosis of breast cancer. As Artificial Intelligence (AI) technologies continue to permeate the healthcare sector, particularly in oncology, the need for transparent and interpretable models becomes imperative to enhance clinical decision-making and patient care. This review discusses the integration of various XAI approaches, such as SHAP, LIME, Grad-CAM, and others, with machine learning and deep learning models utilized in breast cancer detection and classification. By investigating the modalities of breast cancer datasets, including mammograms, ultrasounds and their processing with AI, the paper highlights how XAI can lead to more accurate diagnoses and personalized treatment plans. It also examines the challenges in implementing these techniques and the importance of developing standardized metrics for evaluating XAI's effectiveness in clinical settings. Through detailed analysis and discussion, this article aims to highlight the potential of XAI in bridging the gap between complex AI models and practical healthcare applications, thereby fostering trust and understanding among medical professionals and improving patient outcomes.

6/4/2024

Explainable artificial intelligence in breast cancer detection and risk prediction: A systematic scoping review

Amirehsan Ghasemi, Soheil Hashtarkhani, David L Schwartz, Arash Shaban-Nejad

With the advances in artificial intelligence (AI), data-driven algorithms are becoming increasingly popular in the medical domain. However, due to the nonlinear and complex behavior of many of these algorithms, decision-making by such algorithms is not trustworthy for clinicians and is considered a black-box process. Hence, the scientific community has introduced explainable artificial intelligence (XAI) to remedy the problem. This systematic scoping review investigates the application of XAI in breast cancer detection and risk prediction. We conducted a comprehensive search on Scopus, IEEE Explore, PubMed, and Google Scholar (first 50 citations) using a systematic search strategy. The search spanned from January 2017 to July 2023, focusing on peer-reviewed studies implementing XAI methods in breast cancer datasets. Thirty studies met our inclusion criteria and were included in the analysis. The results revealed that SHapley Additive exPlanations (SHAP) is the top model-agnostic XAI technique in breast cancer research in terms of usage, explaining the model prediction results, diagnosis and classification of biomarkers, and prognosis and survival analysis. Additionally, the SHAP model primarily explained tree-based ensemble machine learning models. The most common reason is that SHAP is model agnostic, which makes it both popular and useful for explaining any model prediction. Additionally, it is relatively easy to implement effectively and completely suits performant models, such as tree-based models. Explainable AI improves the transparency, interpretability, fairness, and trustworthiness of AI-enabled health systems and medical devices and, ultimately, the quality of care and outcomes.

7/18/2024

Two new feature selection methods based on learn-heuristic techniques for breast cancer prediction: A comprehensive analysis

Kamyab Karimi, Ali Ghodratnama, Reza Tavakkoli-Moghaddam

Breast cancer is not preventable because of its unknown causes. However, its early diagnosis increases patients' recovery chances. Machine learning (ML) can be utilized to improve treatment outcomes in healthcare operations while diminishing costs and time. In this research, we suggest two novel feature selection (FS) methods based upon an imperialist competitive algorithm (ICA) and a bat algorithm (BA) and their combination with ML algorithms. This study aims to enhance diagnostic models' efficiency and present a comprehensive analysis to help clinical physicians make much more precise and reliable decisions than before. K-nearest neighbors, support vector machine, decision tree, Naive Bayes, AdaBoost, linear discriminant analysis, random forest, logistic regression, and artificial neural network are some of the methods employed. This paper applied a distinctive integration of evaluation measures and ML algorithms using the wrapper feature selection based on ICA (WFSIC) and BA (WFSB) separately. We compared two proposed approaches for the performance of the classifiers. Also, we compared our best diagnostic model with previous works reported in the literature survey. Experimentations were performed on the Wisconsin diagnostic breast cancer dataset. Results reveal that the proposed framework that uses the BA with an accuracy of 99.12%, surpasses the framework using the ICA and most previous works. Additionally, the RF classifier in the approach of FS based on BA emerges as the best model and outperforms others regarding its criteria. Besides, the results illustrate the role of our techniques in reducing the dataset dimensions up to 90% and increasing the performance of diagnostic models by over 99%. Moreover, the result demonstrates that there are more critical features than the optimum dataset obtained by proposed FS approaches that have been selected by most ML models.

7/23/2024