DERM12345: A Large, Multisource Dermatoscopic Skin Lesion Dataset with 38 Subclasses

Read original: arXiv:2406.07426 - Published 6/12/2024 by Abdurrahim Yilmaz, Sirin Pekcan Yasar, Gulsum Gencoglan, Burak Temelkuran

Overview

This paper presents DERM12345, a dermatoscopic skin lesion dataset of 12,345 images organized into 5 super classes, 15 main classes, and 38 subclasses.
The images were collected in Turkiye, which spans different skin types in the transition zone between Europe and Asia, and the dataset is intended as a resource for training and evaluating computer vision models in dermatology.
The authors describe the dataset's characteristics, including the number of images, annotations, and class distribution, as well as the data collection and processing methodologies.

Plain English Explanation

This research paper introduces a new dataset called DERM12345, a collection of 12,345 dermatoscopic images of skin lesions, such as moles and skin cancers. Dermatoscopic images are close-up pictures taken through a dermatoscope, the handheld magnifying device dermatologists use to examine the skin. The images were collected in Turkiye, which the authors describe as covering different skin types in the transition zone between Europe and Asia.

The dataset is organized as a hierarchy of 5 super classes, 15 main classes, and 38 subclasses, with each subclass representing a specific type of skin lesion. This allows researchers and developers working on skin condition diagnosis or skin image analysis to train and test their machine learning models at different levels of detail, from broad categories down to individual lesion types.

According to the authors, each subgroup contains high-resolution images with expert annotations. The dataset can therefore serve researchers and clinicians working to develop improved diagnostic tools for skin conditions, which may in turn support earlier detection and better treatment outcomes for patients.

Technical Explanation

DERM12345 comprises 12,345 high-resolution dermatoscopic images labeled across 38 subclasses. The images were collected in Turkiye, which the authors note covers different skin types in the transition zone between Europe and Asia.

Each subgroup in the dataset comes with expert annotations, and the paper provides a detailed analysis of each subgroup. The authors motivate the work by observing that previously published large datasets only partially cover the subclassifications of skin lesions.

The labels form a three-level hierarchy of 5 super classes, 15 main classes, and 38 subclasses, covering a wide range of skin conditions, including melanomas, basal cell carcinomas, nevi, and seborrheic keratosis, among others. This granular classification allows researchers to develop and evaluate computer vision models for fine-grained skin lesion diagnosis, and to analyze failures at whichever level of the hierarchy they occur.
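
To make the three-level label structure concrete, here is a minimal Python sketch of how subclass predictions could be scored at the subclass, main class, and super class levels. It is an illustration only: the label names (`sub_a1`, `main_a`, `super_x`, and so on), the `TAXONOMY` mapping, and the helper functions are hypothetical placeholders, not the dataset's actual labels or any tooling released by the authors.

```python
# Hypothetical taxonomy: subclass -> (main class, super class).
# These names are placeholders, NOT the real DERM12345 labels.
TAXONOMY = {
    "sub_a1": ("main_a", "super_x"),
    "sub_a2": ("main_a", "super_x"),
    "sub_b1": ("main_b", "super_x"),
    "sub_c1": ("main_c", "super_y"),
}


def roll_up(subclass, level):
    """Map a subclass label to the requested level of the hierarchy."""
    main_class, super_class = TAXONOMY[subclass]
    if level == "sub":
        return subclass
    if level == "main":
        return main_class
    if level == "super":
        return super_class
    raise ValueError(f"unknown level: {level!r}")


def accuracy_at_level(y_true, y_pred, level):
    """Fraction of predictions that match the truth after rolling up."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    hits = sum(
        roll_up(t, level) == roll_up(p, level) for t, p in zip(y_true, y_pred)
    )
    return hits / len(y_true)


# Toy example: two subclass errors, one of which stays inside its main class.
y_true = ["sub_a1", "sub_a2", "sub_b1", "sub_c1"]
y_pred = ["sub_a2", "sub_a2", "sub_a1", "sub_c1"]

for level in ("sub", "main", "super"):
    print(level, accuracy_at_level(y_true, y_pred, level))
```

In this toy example, accuracy rises from 0.5 at the subclass level to 0.75 at the main class level and 1.0 at the super class level, which shows how a hierarchy can separate confusions between closely related lesion types from confusions across broad categories.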

The authors describe the dataset's characteristics, such as the number of images, the distribution of classes, and the data collection and processing methodologies. They also provide insights into the potential applications of the dataset, including training and evaluating machine learning models for skin condition diagnosis and image analysis.
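
Class distribution matters in practice because a dataset with many fine-grained subclasses is often imbalanced. The following sketch shows one common way to inspect a label distribution and derive inverse-frequency class weights. The label list is made-up toy data, the function names are hypothetical, and nothing here reflects the actual class counts in DERM12345. The weighting formula, N / (K * n_c), is the same heuristic scikit-learn uses for `class_weight="balanced"`.

```python
from collections import Counter


def class_distribution(labels):
    """Return {label: (count, fraction)} ordered from most to least frequent."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {name: (n, n / total) for name, n in counts.most_common()}


def imbalance_ratio(labels):
    """Ratio between the largest and smallest class counts."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    return max(counts.values()) / min(counts.values())


def inverse_frequency_weights(labels):
    """Per-class weight N / (K * n_c): rare classes get larger weights."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    total = sum(counts.values())
    num_classes = len(counts)
    return {name: total / (num_classes * n) for name, n in counts.items()}


# Toy labels (hypothetical; not real DERM12345 counts).
labels = ["sub_a"] * 6 + ["sub_b"] * 3 + ["sub_c"] * 1

print(class_distribution(labels))
print(imbalance_ratio(labels))
print(inverse_frequency_weights(labels))
```

Weights like these can be passed to a weighted loss function so that rare subclasses contribute more per example during training. Whether that helps on this particular dataset is an empirical question the summary does not address.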

Critical Analysis

The authors have done a commendable job in creating a large, diverse, and well-annotated dataset for dermatoscopic skin lesion analysis. The dataset's breadth and depth provide researchers with a valuable resource for developing and evaluating advanced computer vision models in the field of dermatology.

However, details of the data collection and annotation process, such as the criteria used for selecting the images or the annotators' level of expertise, deserve closer scrutiny. In addition, because the images were collected in a single country, Turkiye, the dataset may not represent skin lesions across all geographic or demographic groups, even though it spans different skin types.

Furthermore, the authors could have explored the potential challenges and ethical considerations in using a dataset of sensitive medical images, such as patient privacy and data security. These aspects are crucial for ensuring the responsible development and deployment of AI-based diagnostic tools in clinical settings.

Despite these limitations, the DERM12345 dataset represents a significant contribution to the field of dermatoscopic skin lesion analysis, and the authors' work lays the foundation for future research and development in this area.

Conclusion

The DERM12345 dataset presented in this paper is a valuable resource for researchers and clinicians working in the field of dermatology. By providing a large, diverse, and well-annotated collection of skin lesion images, the authors have created a comprehensive dataset that can be used to train and evaluate machine learning models for accurate skin condition diagnosis and image analysis.

The dataset's 38 subclasses and expert annotations enable researchers to develop specialized models for a wide range of skin conditions, potentially leading to improved diagnostic tools and earlier detection of skin diseases. DERM12345 is a useful step forward for computer-assisted dermatology, and its availability is likely to support further work in this area of healthcare.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

DERM12345: A Large, Multisource Dermatoscopic Skin Lesion Dataset with 38 Subclasses

Abdurrahim Yilmaz, Sirin Pekcan Yasar, Gulsum Gencoglan, Burak Temelkuran

Skin lesion datasets provide essential information for understanding various skin conditions and developing effective diagnostic tools. They aid the artificial intelligence-based early detection of skin cancer, facilitate treatment planning, and contribute to medical education and research. Published large datasets only partially cover the subclassifications of skin lesions. This limitation highlights the need for more expansive and varied datasets to reduce false predictions and help improve the failure analysis for skin lesions. This study presents a diverse dataset comprising 12,345 dermatoscopic images with 38 subclasses of skin lesions collected in Turkiye, which comprises different skin types in the transition zone between Europe and Asia. Each subgroup contains high-resolution photos and expert annotations, providing a strong and reliable basis for future research. The detailed analysis of each subgroup provided in this study facilitates targeted research endeavors and enhances the depth of understanding regarding the skin lesions. This dataset distinguishes itself through a diverse structure with 5 super classes, 15 main classes, 38 subclasses and its 12,345 high-resolution dermatoscopic images.

6/12/2024

SkinCAP: A Multi-modal Dermatology Dataset Annotated with Rich Medical Captions

Juexiao Zhou, Liyuan Sun, Yan Xu, Wenbin Liu, Shawn Afvari, Zhongyi Han, Jiaoyan Song, Yongzhi Ji, Xiaonan He, Xin Gao

With the widespread application of artificial intelligence (AI), particularly deep learning (DL) and vision-based large language models (VLLMs), in skin disease diagnosis, the need for interpretability becomes crucial. However, existing dermatology datasets are limited in their inclusion of concept-level meta-labels, and none offer rich medical descriptions in natural language. This deficiency impedes the advancement of LLM-based methods in dermatological diagnosis. To address this gap and provide a meticulously annotated dermatology dataset with comprehensive natural language descriptions, we introduce SkinCAP: a multi-modal dermatology dataset annotated with rich medical captions. SkinCAP comprises 4,000 images sourced from the Fitzpatrick 17k skin disease dataset and the Diverse Dermatology Images dataset, annotated by board-certified dermatologists to provide extensive medical descriptions and captions. Notably, SkinCAP represents the world's first such dataset and is publicly available at https://huggingface.co/datasets/joshuachou/SkinCAP.

5/29/2024

Enhancing Skin Disease Classification Leveraging Transformer-based Deep Learning Architectures and Explainable AI

Jayanth Mohan, Arrun Sivasubramanian, V Sowmya, Ravi Vinayakumar

Skin diseases affect over a third of the global population, yet their impact is often underestimated. Automating skin disease classification to assist doctors with their prognosis might be difficult. Nevertheless, due to efficient feature extraction pipelines, deep learning techniques have shown much promise for various tasks, including dermatological disease identification. This study uses a skin disease dataset with 31 classes and compares it with all versions of Vision Transformers, Swin Transformers and DinoV2. The analysis is also extended to compare with benchmark convolution-based architecture presented in the literature. Transfer learning with ImageNet1k weights on the skin disease dataset contributes to a high test accuracy of 96.48% and an F1-Score of 0.9727 using DinoV2, which is almost a 10% improvement over this data's current benchmark results. The performance of DinoV2 was also compared for the HAM10000 and Dermnet datasets to test the model's robustness, and the trained model overcomes the benchmark results by a slight margin in test accuracy and in F1-Score on the 23 and 7 class datasets. The results are substantiated using explainable AI frameworks like GradCAM and SHAP, which provide precise image locations to map the disease, assisting dermatologists in early detection, prompt prognosis, and treatment.

7/23/2024

DermSynth3D: Synthesis of in-the-wild Annotated Dermatology Images

Ashish Sinha, Jeremy Kawahara, Arezou Pakzad, Kumar Abhishek, Matthieu Ruthven, Enjie Ghorbel, Anis Kacem, Djamila Aouada, Ghassan Hamarneh

In recent years, deep learning (DL) has shown great potential in the field of dermatological image analysis. However, existing datasets in this domain have significant limitations, including a small number of image samples, limited disease conditions, insufficient annotations, and non-standardized image acquisitions. To address these shortcomings, we propose a novel framework called DermSynth3D. DermSynth3D blends skin disease patterns onto 3D textured meshes of human subjects using a differentiable renderer and generates 2D images from various camera viewpoints under chosen lighting conditions in diverse background scenes. Our method adheres to top-down rules that constrain the blending and rendering process to create 2D images with skin conditions that mimic in-the-wild acquisitions, ensuring more meaningful results. The framework generates photo-realistic 2D dermoscopy images and the corresponding dense annotations for semantic segmentation of the skin, skin conditions, body parts, bounding boxes around lesions, depth maps, and other 3D scene parameters, such as camera position and lighting conditions. DermSynth3D allows for the creation of custom datasets for various dermatology tasks. We demonstrate the effectiveness of data generated using DermSynth3D by training DL models on synthetic data and evaluating them on various dermatology tasks using real 2D dermatological images. We make our code publicly available at https://github.com/sfu-mial/DermSynth3D.

4/23/2024