SkinCAP: A Multi-modal Dermatology Dataset Annotated with Rich Medical Captions

Read original: arXiv:2405.18004 - Published 5/29/2024 by Juexiao Zhou, Liyuan Sun, Yan Xu, Wenbin Liu, Shawn Afvari, Zhongyi Han, Jiaoyan Song, Yongzhi Ji, Xiaonan He, Xin Gao

Overview

This paper presents SkinCAP, a new multi-modal dermatology dataset of 4,000 skin images paired with rich medical captions.
The dataset is designed to support research on tasks like automated diagnosis, image-to-text generation, and multi-modal understanding in the dermatology domain.
SkinCAP covers a diverse range of skin conditions, annotated by board-certified dermatologists with detailed descriptions, diagnoses, and treatment recommendations.

Plain English Explanation

The researchers have created a new collection of skin images and related medical information, called SkinCAP. This dataset is intended to help develop AI models for dermatology, such as systems that can automatically identify skin conditions or generate detailed text descriptions from images.

SkinCAP contains 4,000 skin images, along with extensive captions written by board-certified dermatologists. These captions provide rich details about each image, including the specific skin condition, its symptoms, how it is diagnosed, and recommended treatments. The dataset covers a wide variety of skin problems, from common rashes to rare diseases.

By making this diverse, expert-annotated dataset publicly available, the researchers hope to advance the state-of-the-art in dermatology AI systems and improve their ability to assist doctors and patients. The dataset could also enable new applications, like AI-powered skin condition diagnosis tools or automated generation of medical reports from skin images.

Technical Explanation

The SkinCAP dataset contains 4,000 skin images, each accompanied by a detailed medical caption describing the condition, symptoms, diagnosis, and treatment recommendations. The images were drawn from two existing collections, the Fitzpatrick 17k skin disease dataset and the Diverse Dermatology Images dataset, and annotated by board-certified dermatologists.

The captions were written by board-certified dermatologists, providing rich textual information about each case. This includes the name of the skin condition, a description of its appearance and symptoms, the clinical findings that lead to the diagnosis, and the recommended course of treatment. The captions typically range from 50 to 200 words in length.
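As a rough illustration of what one image-caption pair might look like in code, the sketch below defines a small record type and a word-count helper for checking caption lengths. The class name, field names, and example records are hypothetical placeholders chosen for this sketch; they are not the schema or contents of the released dataset.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class CaptionedImage:
    """One image-caption pair. Field names are hypothetical, not SkinCAP's schema."""
    image_path: str  # path to the image file
    condition: str   # name of the skin condition
    caption: str     # free-text medical description


def caption_word_count(record: CaptionedImage) -> int:
    """Count whitespace-separated words in a caption."""
    return len(record.caption.split())


def mean_caption_length(records: list[CaptionedImage]) -> float:
    """Average caption length in words across a collection."""
    return mean(caption_word_count(r) for r in records)


# Placeholder records, invented purely for illustration.
records = [
    CaptionedImage("img_0001.jpg", "condition_a", "placeholder caption with five words"),
    CaptionedImage("img_0002.jpg", "condition_b", "another short placeholder"),
]
print(mean_caption_length(records))
```

A helper like this could be used to check a length claim against the actual captions once the real files and column names are known.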

To enable research on tasks like multi-modal understanding and image-to-text generation in dermatology, the dataset is split into training, validation, and test sets. Baseline model performance is reported for several relevant tasks, establishing a strong starting point for future work.
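The summary does not say how the splits are constructed. One common approach, shown here only as a minimal sketch, is a reproducible split stratified by condition label so that each condition appears in every subset in roughly the same proportion. The function name, the 10% validation and test fractions, and the labels are assumptions made for this example, not values from the paper.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_frac=0.1, test_frac=0.1, seed=0):
    """Split record indices into train/val/test, stratified by label.

    labels[i] is the condition label of record i. The fractions and seed
    are illustrative defaults, not values taken from the SkinCAP paper.
    """
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)  # local RNG so the split is reproducible
    splits = {"train": [], "val": [], "test": []}
    for label in sorted(by_label):  # sorted for a deterministic order
        indices = by_label[label]
        rng.shuffle(indices)
        n_val = int(len(indices) * val_frac)
        n_test = int(len(indices) * test_frac)
        splits["val"].extend(indices[:n_val])
        splits["test"].extend(indices[n_val:n_val + n_test])
        splits["train"].extend(indices[n_val + n_test:])
    return splits


# Hypothetical labels: 20 records of one condition, 10 of another.
labels = ["condition_a"] * 20 + ["condition_b"] * 10
splits = stratified_split(labels)
print({name: len(idx) for name, idx in splits.items()})
```

Because counts are truncated per label, very rare conditions end up entirely in the training set; a real pipeline would need an explicit policy for those cases.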

Critical Analysis

The SkinCAP dataset is a valuable resource for AI research in dermatology. By pairing 4,000 skin images with rich, expert-written captions, the researchers address a key limitation of previous dermatology datasets, which offer few concept-level meta-labels and no rich medical descriptions in natural language.

However, the dataset has some potential limitations that could be explored further. For example, the images come from two existing datasets, so SkinCAP inherits whatever biases or quality issues those collections carry. Additionally, while the captions are detailed, they may not capture the full complexity and nuance of real-world clinical decision-making.

Future work could also expand the dataset, for example by adding demographic information or 3D imaging data, or by collecting annotations from a broader set of medical experts. Exploring these directions could lead to further advances in dermatology AI systems.

Conclusion

The SkinCAP dataset is a significant step toward enabling AI research for dermatology. By providing a diverse, expert-annotated collection of skin images and medical captions, it opens up new possibilities for developing AI-powered tools that assist doctors and improve patient care. As the field of dermatology AI continues to evolve, datasets like SkinCAP will play an important role in driving progress and enabling new applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

SkinCAP: A Multi-modal Dermatology Dataset Annotated with Rich Medical Captions

Juexiao Zhou, Liyuan Sun, Yan Xu, Wenbin Liu, Shawn Afvari, Zhongyi Han, Jiaoyan Song, Yongzhi Ji, Xiaonan He, Xin Gao

With the widespread application of artificial intelligence (AI), particularly deep learning (DL) and vision-based large language models (VLLMs), in skin disease diagnosis, the need for interpretability becomes crucial. However, existing dermatology datasets are limited in their inclusion of concept-level meta-labels, and none offer rich medical descriptions in natural language. This deficiency impedes the advancement of LLM-based methods in dermatological diagnosis. To address this gap and provide a meticulously annotated dermatology dataset with comprehensive natural language descriptions, we introduce SkinCAP: a multi-modal dermatology dataset annotated with rich medical captions. SkinCAP comprises 4,000 images sourced from the Fitzpatrick 17k skin disease dataset and the Diverse Dermatology Images dataset, annotated by board-certified dermatologists to provide extensive medical descriptions and captions. Notably, SkinCAP represents the world's first such dataset and is publicly available at https://huggingface.co/datasets/joshuachou/SkinCAP.

5/29/2024

DERM12345: A Large, Multisource Dermatoscopic Skin Lesion Dataset with 38 Subclasses

Abdurrahim Yilmaz, Sirin Pekcan Yasar, Gulsum Gencoglan, Burak Temelkuran

Skin lesion datasets provide essential information for understanding various skin conditions and developing effective diagnostic tools. They aid the artificial intelligence-based early detection of skin cancer, facilitate treatment planning, and contribute to medical education and research. Published large datasets only partially cover the subclassifications of skin lesions. This limitation highlights the need for more expansive and varied datasets to reduce false predictions and help improve the failure analysis for skin lesions. This study presents a diverse dataset comprising 12,345 dermatoscopic images with 38 subclasses of skin lesions collected in Turkiye, which comprises different skin types in the transition zone between Europe and Asia. Each subgroup contains high-resolution photos and expert annotations, providing a strong and reliable basis for future research. The detailed analysis of each subgroup provided in this study facilitates targeted research endeavors and enhances the depth of understanding regarding the skin lesions. This dataset distinguishes itself through a diverse structure with 5 super classes, 15 main classes, 38 subclasses and its 12,345 high-resolution dermatoscopic images.

6/12/2024

Data Alignment for Zero-Shot Concept Generation in Dermatology AI

Soham Gadgil, Mahtab Bigverdi

AI in dermatology is evolving at a rapid pace but the major limitation to training trustworthy classifiers is the scarcity of data with ground-truth concept level labels, which are meta-labels semantically meaningful to humans. Foundation models like CLIP providing zero-shot capabilities can help alleviate this challenge by leveraging vast amounts of image-caption pairs available on the internet. CLIP can be fine-tuned using domain specific image-caption pairs to improve classification performance. However, CLIP's pre-training data is not well-aligned with the medical jargon that clinicians use to perform diagnoses. The development of large language models (LLMs) in recent years has led to the possibility of leveraging the expressive nature of these models to generate rich text. Our goal is to use these models to generate caption text that aligns well with both the clinical lexicon and with the natural human language used in CLIP's pre-training data. Starting with captions used for images in PubMed articles, we extend them by passing the raw captions through an LLM fine-tuned on the field's several textbooks. We find that using captions generated by an expressive fine-tuned LLM like GPT-3.5 improves downstream zero-shot concept classification performance.

9/10/2024

A Novel Perspective for Multi-modal Multi-label Skin Lesion Classification

Yuan Zhang, Yutong Xie, Hu Wang, Jodie C Avery, M Louise Hull, Gustavo Carneiro

The efficacy of deep learning-based Computer-Aided Diagnosis (CAD) methods for skin diseases relies on analyzing multiple data modalities (i.e., clinical+dermoscopic images, and patient metadata) and addressing the challenges of multi-label classification. Current approaches tend to rely on limited multi-modal techniques and treat the multi-label problem as a multiple multi-class problem, overlooking issues related to imbalanced learning and multi-label correlation. This paper introduces the innovative Skin Lesion Classifier, utilizing a Multi-modal Multi-label TransFormer-based model (SkinM2Former). For multi-modal analysis, we introduce the Tri-Modal Cross-attention Transformer (TMCT) that fuses the three image and metadata modalities at various feature levels of a transformer encoder. For multi-label classification, we introduce a multi-head attention (MHA) module to learn multi-label correlations, complemented by an optimisation that handles multi-label and imbalanced learning problems. SkinM2Former achieves a mean average accuracy of 77.27% and a mean diagnostic accuracy of 77.85% on the public Derm7pt dataset, outperforming state-of-the-art (SOTA) methods.

9/20/2024