Evaluating Machine Learning-based Skin Cancer Diagnosis

Read original: arXiv:2409.03794 - Published 9/9/2024 by Tanish Jain

Overview

This study evaluates the reliability of two deep learning models for skin cancer detection.
The researchers focus on the models' explainability and fairness.
They use the HAM10000 dataset of dermatoscopic images to assess two convolutional neural network architectures: a MobileNet-based model and a custom CNN model.
The models are evaluated on their ability to classify skin lesions into seven categories and distinguish between dangerous and benign lesions.
Explainability is assessed using Saliency Maps and Integrated Gradients, and the results are interpreted by a dermatologist.
Fairness is evaluated using the Equalized Odds metric across sex and skin tone groups.

Plain English Explanation

The study looked at how well two artificial intelligence (AI) models could detect skin cancer. The researchers wanted to see how reliable the models were and how well they could explain their decision-making process. They also checked if the models were fair, meaning they performed equally well for people of different sexes and skin tones.

The researchers used a dataset of skin images to train two different types of deep learning models. One model was based on a popular architecture called MobileNet, and the other was a custom-built model.

To test the models' explainability, the researchers used special techniques that highlight the areas of the skin image that the models focus on when making their predictions. A dermatologist (skin doctor) then reviewed these highlights to see if the models were using relevant information to make their decisions.

To check the fairness of the models, the researchers compared how often the models made mistakes for people of different sexes and skin tones. They found that the models treated the sexes about equally, but their error rates differed noticeably between lighter and darker skin tones. The researchers then applied a technique that adjusts the models' outputs after training, which helped reduce these differences.

Overall, the study shows that while these AI models for skin cancer detection are promising, more work is needed to ensure they are reliable and fair for all people, regardless of their skin tone.

Technical Explanation

The study evaluates the reliability of two convolutional neural network (CNN) architectures for skin lesion classification using the HAM10000 dataset. The two models assessed are a MobileNet-based model and a custom CNN model.

The models are trained to classify skin lesions into the seven HAM10000 categories (e.g., melanoma, melanocytic nevi, benign keratosis-like lesions) and to distinguish between dangerous (malignant) and benign lesions. Explainability is evaluated using Saliency Maps and Integrated Gradients, which highlight the regions of the input image that contribute most to the model's predictions. A dermatologist interprets the explainability results.
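To make the two attribution methods concrete, here is a minimal sketch on a toy model. Everything in it is an assumption for illustration: the "image" is four numbers, the classifier is a small logistic regression with made-up weights, and the baseline is an all-zero input. It is not the paper's implementation, which works on CNNs and real images. A saliency map takes the absolute gradient of the output at the input itself, while Integrated Gradients averages gradients along a straight path from the baseline to the input and scales by the input difference.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x, w, b):
    """Toy differentiable classifier: logistic regression over input values."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def gradient(x, w, b):
    """Analytic gradient of the model output with respect to each input."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]

def saliency(x, w, b):
    """Vanilla saliency: absolute gradient evaluated at the input."""
    return [abs(g) for g in gradient(x, w, b)]

def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann approximation of Integrated Gradients."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight line from the baseline to the input.
        point = [b0 + alpha * (xi - b0) for xi, b0 in zip(x, baseline)]
        for i, g in enumerate(gradient(point, w, b)):
            total[i] += g
    # Average gradient along the path, scaled by (input - baseline).
    return [(xi - b0) * t / steps for xi, b0, t in zip(x, baseline, total)]

# Hypothetical values chosen only for the demonstration.
x = [0.9, 0.1, 0.5, 0.0]
w = [2.0, -1.0, 0.5, 3.0]
b = -1.0
baseline = [0.0, 0.0, 0.0, 0.0]

sal = saliency(x, w, b)
attributions = integrated_gradients(x, baseline, w, b)
print("saliency:", sal)
print("integrated gradients:", attributions)
print("sum of attributions:", sum(attributions))
print("output change:", model(x, w, b) - model(baseline, w, b))
```

Two things are worth noticing in the output. The Integrated Gradients attributions sum (approximately) to the change in model output between the baseline and the input, a property usually called completeness. And the last input, which equals the baseline, gets zero attribution from Integrated Gradients but the largest saliency value, because saliency measures sensitivity rather than contribution. Differences like this are one reason the two maps can tell a reviewer different things.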

Fairness is assessed using the Equalized Odds metric, which measures disparities in false positive and false negative rates across sex and skin tone groups. While both models demonstrate fairness across sex groups, they show significant disparities in error rates between light and dark skin tones.
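As a rough illustration of what an Equalized Odds check involves, the sketch below computes false positive and false negative rates per group and reports the largest gap in each. The labels, predictions, and group names ("A" and "B") are made up for the example and do not come from the paper; label 1 is assumed to mean a dangerous lesion.

```python
def error_rates(y_true, y_pred):
    """Return (false positive rate, false negative rate) for binary labels."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    fpr = fp / negatives if negatives else float("nan")
    fnr = fn / positives if positives else float("nan")
    return fpr, fnr

def equalized_odds_gaps(y_true, y_pred, groups):
    """Per-group error rates plus the largest FPR and FNR differences."""
    per_group = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        per_group[g] = error_rates([y_true[i] for i in idx],
                                   [y_pred[i] for i in idx])
    fprs = [rates[0] for rates in per_group.values()]
    fnrs = [rates[1] for rates in per_group.values()]
    return per_group, max(fprs) - min(fprs), max(fnrs) - min(fnrs)

# Made-up data: eight cases per group, 1 = dangerous, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0] + [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1] + [1, 1, 0, 0, 0, 0, 1, 1]
groups = ["A"] * 8 + ["B"] * 8

per_group, fpr_gap, fnr_gap = equalized_odds_gaps(y_true, y_pred, groups)
print(per_group)
print("FPR gap:", fpr_gap, "FNR gap:", fnr_gap)
```

A classifier satisfies Equalized Odds exactly when both gaps are zero. In practice a small tolerance is used, and the choice of tolerance is a judgment call that this sketch does not address.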

To mitigate these disparities, the researchers apply a Calibrated Equalized Odds postprocessing strategy, which results in improved fairness, particularly in reducing false negative rate differences.
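The summary does not describe how the postprocessing was configured, so the following is only a sketch of the general idea behind calibrated equalized odds as proposed by Pleiss et al. (2017), restricted to the false negative cost. It assumes each group already has reasonably calibrated probability scores. For the group with the lower cost, a fraction alpha of predictions is replaced by that group's base rate, with alpha chosen so the expected generalized false negative rate matches the other group's. The function names, scores, and group labels are hypothetical, and the randomized rule is shown through its expected scores so the example stays deterministic.

```python
def generalized_fnr(y_true, scores):
    """Mean of (1 - score) over truly positive cases."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    return sum(1.0 - s for s in positives) / len(positives)

def mixing_rate(y_true_low, scores_low, target_cost):
    """Probability of swapping a score for the group base rate so the
    lower-cost group's expected generalized FNR rises to target_cost."""
    base_rate = sum(y_true_low) / len(y_true_low)
    cost = generalized_fnr(y_true_low, scores_low)
    trivial_cost = 1.0 - base_rate  # cost of always predicting the base rate
    if trivial_cost <= cost:
        return 0.0, base_rate  # mixing cannot raise the cost any further
    alpha = (target_cost - cost) / (trivial_cost - cost)
    return min(1.0, max(0.0, alpha)), base_rate

def expected_mixed_scores(scores, alpha, base_rate):
    """Expected score under the randomized rule (kept deterministic here)."""
    return [(1.0 - alpha) * s + alpha * base_rate for s in scores]

# Made-up scores: group A currently has the lower false negative cost.
y_a, scores_a = [1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]
y_b, scores_b = [1, 1, 0, 0], [0.7, 0.6, 0.4, 0.3]

cost_a = generalized_fnr(y_a, scores_a)
cost_b = generalized_fnr(y_b, scores_b)
alpha, base_rate_a = mixing_rate(y_a, scores_a, cost_b)
mixed_a = expected_mixed_scores(scores_a, alpha, base_rate_a)

print("cost before:", cost_a, cost_b)
print("alpha:", alpha)
print("cost after:", generalized_fnr(y_a, mixed_a), cost_b)
```

Note the trade-off this construction makes: the gap closes because information is withheld from the better-served group, not because the other group's predictions improve. Whether that is acceptable in a diagnostic setting is a separate question from whether the metric is satisfied.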

Critical Analysis

The study provides valuable insights into the explainability and fairness of deep learning models for skin cancer detection. The use of Saliency Maps and Integrated Gradients to assess explainability is a strength, as it allows for interpretation of the models' decision-making process by a domain expert.

However, the study acknowledges that the models struggle to highlight relevant features for certain lesion types, such as seborrheic keratoses and vascular lesions. This suggests that further improvements in model architecture or training data may be needed before the explanations can be relied on across all lesion categories.

The finding of significant disparities in model performance across skin tone groups is an important limitation that deserves further investigation. While the Calibrated Equalized Odds strategy helps mitigate these issues, the authors note that more research is needed to ensure fairness for diverse populations.

Future studies could explore additional fairness metrics, as well as the impact of factors like dataset composition and model architecture on fairness. Investigating the underlying reasons for the observed disparities could also provide valuable insights to improve the reliability of AI-based skin cancer detection systems.

Conclusion

This study demonstrates the importance of evaluating the explainability and fairness of deep learning models, particularly in high-stakes medical applications like skin cancer detection. While the models show promise in their ability to classify skin lesions, the research highlights the need for further development to ensure reliable and equitable performance across diverse populations.

The findings underscore the crucial role of rigorous model evaluation and the involvement of domain experts, such as dermatologists, to ensure that AI systems are trustworthy and beneficial for all users. As the use of deep learning in healthcare continues to grow, this study serves as a valuable example of the thorough assessment required to deliver on the promise of AI-powered medical tools.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Evaluating Machine Learning-based Skin Cancer Diagnosis

Tanish Jain

This study evaluates the reliability of two deep learning models for skin cancer detection, focusing on their explainability and fairness. Using the HAM10000 dataset of dermatoscopic images, the research assesses two convolutional neural network architectures: a MobileNet-based model and a custom CNN model. Both models are evaluated for their ability to classify skin lesions into seven categories and to distinguish between dangerous and benign lesions. Explainability is assessed using Saliency Maps and Integrated Gradients, with results interpreted by a dermatologist. The study finds that both models generally highlight relevant features for most lesion types, although they struggle with certain classes like seborrheic keratoses and vascular lesions. Fairness is evaluated using the Equalized Odds metric across sex and skin tone groups. While both models demonstrate fairness across sex groups, they show significant disparities in false positive and false negative rates between light and dark skin tones. A Calibrated Equalized Odds postprocessing strategy is applied to mitigate these disparities, resulting in improved fairness, particularly in reducing false negative rate differences. The study concludes that while the models show promise in explainability, further development is needed to ensure fairness across different skin tones. These findings underscore the importance of rigorous evaluation of AI models in medical applications, particularly in diverse population groups.

9/9/2024

Skin Cancer Detection utilizing Deep Learning: Classification of Skin Lesion Images using a Vision Transformer

Carolin Flosdorf, Justin Engelker, Igor Keller, Nicolas Mohr

Skin cancer detection still represents a major challenge in healthcare. Common detection methods can be lengthy and require human assistance which falls short in many countries. Previous research demonstrates how convolutional neural networks (CNNs) can help effectively through both automation and an accuracy that is comparable to the human level. However, despite the progress in previous decades, the precision is still limited, leading to substantial misclassifications that have a serious impact on people's health. Hence, we employ a Vision Transformer (ViT) that has been developed in recent years based on the idea of a self-attention mechanism, specifically two configurations of a pre-trained ViT. We generally find superior metrics for classifying skin lesions after comparing them to base models such as decision tree classifier and k-nearest neighbor (KNN) classifier, as well as to CNNs and less complex ViTs. In particular, we attach greater importance to the performance of melanoma, which is the most lethal type of skin cancer. The ViT-L32 model achieves an accuracy of 91.57% and a melanoma recall of 58.54%, while ViT-L16 achieves an accuracy of 92.79% and a melanoma recall of 56.10%. This offers a potential tool for faster and more accurate diagnoses and an overall improvement for the healthcare sector.

8/27/2024

Enhancing Skin Lesion Diagnosis with Ensemble Learning

Xiaoyi Liu, Zhou Yu, Lianghao Tan, Yafeng Yan, Ge Shi

Skin lesions are an increasingly significant medical concern, varying widely in severity from benign to cancerous. Accurate diagnosis is essential for ensuring timely and appropriate treatment. This study examines the implementation of deep learning methods to assist in the diagnosis of skin lesions using the HAM10000 dataset, which contains seven distinct types of lesions. First, we evaluated three pre-trained models: MobileNetV2, ResNet18, and VGG11, achieving accuracies of 0.798, 0.802, and 0.805, respectively. To further enhance classification accuracy, we developed ensemble models employing max voting, average voting, and stacking, resulting in accuracies of 0.803, 0.82, and 0.83. Building on the best-performing ensemble learning model, stacking, we developed our proposed model, SkinNet, which incorporates a customized architecture and fine-tuning, achieving an accuracy of 0.867 and an AUC of 0.96. This substantial improvement over individual models demonstrates the effectiveness of ensemble learning in improving skin lesion classification.

9/9/2024

Skin Cancer Images Classification using Transfer Learning Techniques

Md Sirajul Islam, Sanjeev Panta

Skin cancer is one of the most common and deadliest types of cancer. Early diagnosis of skin cancer at a benign stage is critical to reducing cancer mortality. To detect skin cancer at an earlier stage an automated system is compulsory that can save the life of many patients. Many previous studies have addressed the problem of skin cancer diagnosis using various deep learning and transfer learning models. However, existing literature has limitations in its accuracy and time-consuming procedure. In this work, we applied five different pre-trained transfer learning approaches for binary classification of skin cancer detection at benign and malignant stages. To increase the accuracy of these models we fine-tune different layers and activation functions. We used a publicly available ISIC dataset to evaluate transfer learning approaches. For model stability, data augmentation techniques are applied to improve the randomness of the input dataset. These approaches are evaluated using different hyperparameters such as batch sizes, epochs, and optimizers. The experimental results show that the ResNet-50 model provides an accuracy of 0.935, F1-score of 0.86, and precision of 0.94.

6/21/2024