Expanding the Medical Decathlon dataset: segmentation of colon and colorectal cancer from computed tomography images

Read original: arXiv:2407.21516 - Published 8/1/2024 by I. M. Chernenkiy, Y. A. Drach, S. R. Mustakimova, V. V. Kazantseva, N. A. Ushakov, S. K. Efetov, M. V. Feldsherov

Overview

Colorectal cancer is the third-most common cancer in the Western Hemisphere.
Automated segmentation of the colon and of colorectal cancer in computed tomography (CT) scans is an important problem in medicine.
A system that can accurately segment these structures could enable early detection of colorectal cancer and assist radiologists in their work.
However, most scientific publications on medical image processing use closed, non-public datasets.

Plain English Explanation

The paper presents an extension of the Medical Decathlon dataset with colorectal markups. This new dataset is intended to improve the quality of algorithms for segmenting colorectal structures and detecting colorectal cancer in CT scans.

An experienced radiologist validated the data, sorted it into subsets by quality, and made it publicly available. The researchers then used this dataset to train neural network models based on the UNet architecture, achieving a Dice score of 0.6988 ± 0.3 with 5-fold cross-validation.
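A minimal sketch of 5-fold cross-validation, assuming that whole CT scans (not individual slices) are assigned to folds so that no patient appears in both training and validation data. The source does not describe the exact splitting protocol, nor whether the ± 0.3 is a spread across folds or across individual scans; this sketch pools the per-scan scores from all folds and reports their mean and sample standard deviation. `k_fold_splits`, `cross_validate`, `summarize`, and the `train_and_score` callback are hypothetical names, not the authors' code.

```python
import random
from statistics import mean, stdev


def k_fold_splits(case_ids, k=5, seed=0):
    """Yield (train_ids, val_ids) pairs for k-fold cross-validation.

    Whole scans are assigned to folds, so every case lands in exactly one
    validation fold and never in training and validation at the same time.
    """
    ids = list(case_ids)
    if k < 2 or k > len(ids):
        raise ValueError("k must be between 2 and the number of cases")
    random.Random(seed).shuffle(ids)          # reproducible shuffle
    folds = [ids[i::k] for i in range(k)]     # round-robin fold assignment
    for i, val_ids in enumerate(folds):
        train_ids = [c for j, fold in enumerate(folds) if j != i for c in fold]
        yield train_ids, val_ids


def summarize(scores):
    """Return (mean, sample standard deviation) of a list of Dice scores."""
    return mean(scores), stdev(scores)


def cross_validate(case_ids, train_and_score, k=5, seed=0):
    """Run k-fold CV and summarize the pooled per-case Dice scores.

    train_and_score(train_ids, val_ids) is a hypothetical callback that trains
    a model (for example a UNet) on train_ids and returns one Dice score per
    case in val_ids.
    """
    pooled = []
    for train_ids, val_ids in k_fold_splits(case_ids, k, seed):
        pooled.extend(train_and_score(train_ids, val_ids))
    return summarize(pooled)
```

Called as `cross_validate(case_ids, train_and_score, k=5)`, it returns a (mean, standard deviation) pair of the same form as the reported 0.6988 ± 0.3. Pooling per-scan scores rather than averaging five fold means is a design choice; it tends to give a wider spread, because individual difficult scans are not smoothed out by averaging within a fold.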

The availability of this public dataset and the resulting models should help improve the detection of colorectal cancer in its early stages and make the radiologist's job easier when analyzing CT scans.

Technical Explanation

The researchers extended the Medical Decathlon dataset with colon markups (annotations) made by an experienced radiologist. The radiologist validated the data, sorted it into subsets by quality, and published it in the public domain.

The researchers used this dataset to train neural network models based on the UNet architecture with 5-fold cross-validation, achieving a Dice score of 0.6988 ± 0.3. The Dice coefficient measures the overlap between the model's predicted segmentation and the ground-truth annotation, ranging from 0 (no overlap) to 1 (perfect overlap).
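For a predicted mask A and a ground-truth mask B, the Dice coefficient is 2|A ∩ B| / (|A| + |B|). The following is a minimal NumPy sketch for binary masks; the function name is hypothetical, and returning 1.0 when both masks are empty is a common convention assumed here, not something the source specifies.

```python
import numpy as np


def dice_coefficient(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")
    intersection = np.logical_and(pred, target).sum()  # voxels marked in both masks
    total = pred.sum() + target.sum()                  # |A| + |B|
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumed convention)
    return float(2.0 * intersection / total)
```

For example, two masks of three voxels each that share two voxels score 2 · 2 / (3 + 3) ≈ 0.667.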

Critical Analysis

The paper demonstrates the potential of publicly available datasets to advance research in medical image processing. By making this colorectal dataset accessible, the authors have provided a valuable resource for developing and evaluating segmentation algorithms.

However, a mean Dice score of 0.6988, together with its large spread of ± 0.3, suggests that segmentation accuracy varies considerably and that there is still room for improvement. Potential areas for further research include exploring different neural network architectures, adding data preprocessing techniques, and investigating how the dataset's quality categorization affects model performance.
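One way to examine the effect of the quality categorization, sketched under assumptions: given a per-scan Dice score and the quality subset each scan was assigned to, group the scores by subset and compare their means and spreads. The function name and the quality labels in the usage note are hypothetical; the source does not name its subsets.

```python
from collections import defaultdict
from statistics import mean, stdev


def dice_by_quality(results):
    """Group per-scan Dice scores by quality subset.

    results: iterable of (quality_label, dice) pairs.
    Returns {label: (n_scans, mean_dice, std_dice)}; std is 0.0 for a
    subset with a single scan, since the sample standard deviation needs two.
    """
    groups = defaultdict(list)
    for label, dice in results:
        groups[label].append(dice)
    summary = {}
    for label, scores in groups.items():
        spread = stdev(scores) if len(scores) > 1 else 0.0
        summary[label] = (len(scores), mean(scores), spread)
    return summary
```

With hypothetical labels such as `"good"` and `"acceptable"`, a clearly lower mean or wider spread in one subset would hint that annotation quality contributes to the overall ± 0.3 variation.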

Conclusion

This paper makes an important contribution to medical image processing by introducing a publicly available dataset for segmenting the colon and colorectal cancer in CT scans. The dataset and the neural network models trained on it have the potential to support early detection of colorectal cancer and to simplify radiologists' analysis of medical images.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Expanding the Medical Decathlon dataset: segmentation of colon and colorectal cancer from computed tomography images

I. M. Chernenkiy, Y. A. Drach, S. R. Mustakimova, V. V. Kazantseva, N. A. Ushakov, S. K. Efetov, M. V. Feldsherov

Colorectal cancer is the third-most common cancer in the Western Hemisphere. The segmentation of the colon and colorectal cancer by computed tomography is an urgent problem in medicine. Indeed, a system capable of solving this problem will enable the detection of colorectal cancer at early stages of the disease, facilitate the search for pathology by the radiologist, and significantly accelerate the process of diagnosing the disease. However, scientific publications on medical image processing mostly use closed, non-public data. This paper presents an extension of the Medical Decathlon dataset with colorectal markups in order to improve the quality of segmentation algorithms. An experienced radiologist validated the data, categorized it into subsets by quality, and published it in the public domain. Based on the obtained results, we trained neural network models of the UNet architecture with 5-part cross-validation and achieved a Dice metric quality of 0.6988 ± 0.3. The published markups will improve the quality of colorectal cancer detection and simplify the radiologist's job for study description.

8/1/2024

An interpretable machine learning system for colorectal cancer diagnosis from pathology slides

Pedro C. Neto, Diana Montezuma, Sara P. Oliveira, Domingos Oliveira, João Fraga, Ana Monteiro, João Monteiro, Liliana Ribeiro, Sofia Gonçalves, Stefan Reinhard, Inti Zlobec, Isabel M. Pinto, Jaime S. Cardoso

Considering the profound transformation affecting pathology practice, we aimed to develop a scalable artificial intelligence (AI) system to diagnose colorectal cancer from whole-slide images (WSI). For this, we propose a deep learning (DL) system that learns from weak labels, a sampling strategy that reduces the number of training samples by a factor of six without compromising performance, an approach to leverage a small subset of fully annotated samples, and a prototype with explainable predictions, active learning features and parallelisation. Noting some problems in the literature, we conducted this study on one of the largest colorectal WSI datasets, with approximately 10,500 WSIs, of which 900 are test samples. Furthermore, the robustness of the proposed method is assessed with two additional external datasets (TCGA and PAIP) and a dataset of samples collected directly from the proposed prototype. For each patch-based tile, the proposed method predicts a class based on the severity of the dysplasia and uses that information to classify the whole slide. It is trained with an interpretable mixed-supervision scheme to leverage the domain knowledge introduced by pathologists through spatial annotations. The mixed-supervision scheme allowed for an intelligent sampling strategy, which was evaluated in several different scenarios without compromising performance. On the internal dataset, the method shows an accuracy of 93.44% and a sensitivity of 0.996 in distinguishing positive (low-grade and high-grade dysplasia) from non-neoplastic samples. Performance on the external test samples varied, with TCGA being the most challenging dataset, where the method reached an overall accuracy of 84.91% and a sensitivity of 0.996.

5/2/2024

Validating polyp and instrument segmentation methods in colonoscopy through Medico 2020 and MedAI 2021 Challenges

Debesh Jha, Vanshali Sharma, Debapriya Banik, Debayan Bhattacharya, Kaushiki Roy, Steven A. Hicks, Nikhil Kumar Tomar, Vajira Thambawita, Adrian Krenzer, Ge-Peng Ji, Sahadev Poudel, George Batchkala, Saruar Alam, Awadelrahman M. A. Ahmed, Quoc-Huy Trinh, Zeshan Khan, Tien-Phat Nguyen, Shruti Shrestha, Sabari Nathan, Jeonghwan Gwak, Ritika K. Jha, Zheyuan Zhang, Alexander Schlaefer, Debotosh Bhattacharjee, M. K. Bhuyan, Pradip K. Das, Deng-Ping Fan, Sravanthi Parsa, Sharib Ali, Michael A. Riegler, Pål Halvorsen, Thomas De Lange, Ulas Bagci

Automatic analysis of colonoscopy images has been an active field of research motivated by the importance of early detection of precancerous polyps. However, detecting polyps during the live examination can be challenging due to various factors, such as variation in skill and experience among endoscopists, lack of attentiveness, and fatigue, leading to a high polyp miss rate. Deep learning has emerged as a promising solution to this challenge, as it can assist endoscopists in detecting and classifying overlooked polyps and abnormalities in real time. In addition to the algorithm's accuracy, transparency and interpretability are crucial for explaining the whys and hows of the algorithm's predictions. Further, most algorithms are developed on private data, closed-source code, or proprietary software, and the methods lack reproducibility. Therefore, to promote the development of efficient and transparent methods, we organized the Medico automatic polyp segmentation (Medico 2020) and MedAI: Transparency in Medical Image Segmentation (MedAI 2021) competitions. We present a comprehensive summary, analyze each contribution, highlight the strengths of the best-performing methods, and discuss the possibility of translating such methods into the clinic. For the transparency task, a multidisciplinary team, including expert gastroenterologists, assessed each submission and evaluated the teams based on open-source practices, failure case analysis, ablation studies, and the usability and understandability of evaluations, to gain a deeper understanding of the models' credibility for clinical deployment. Through the comprehensive analysis of the challenge, we not only highlight the advancements in polyp and surgical instrument segmentation but also encourage qualitative evaluation for building more transparent and understandable AI-based colonoscopy systems.

5/8/2024

Towards a Benchmark for Colorectal Cancer Segmentation in Endorectal Ultrasound Videos: Dataset and Model Development

Yuncheng Jiang, Yiwen Hu, Zixun Zhang, Jun Wei, Chun-Mei Feng, Xuemei Tang, Xiang Wan, Yong Liu, Shuguang Cui, Zhen Li

Endorectal ultrasound (ERUS) is an important imaging modality that provides high reliability for diagnosing the depth and boundary of invasion in colorectal cancer. However, the lack of a large-scale ERUS dataset with high-quality annotations hinders the development of automatic ultrasound diagnostics. In this paper, we collected and annotated the first benchmark dataset that covers diverse ERUS scenarios, i.e. colorectal cancer segmentation, detection, and infiltration depth staging. Our ERUS-10K dataset comprises 77 videos and 10,000 high-resolution annotated frames. Based on this dataset, we further introduce a benchmark model for colorectal cancer segmentation, named the Adaptive Sparse-context TRansformer (ASTR). ASTR is designed based on three considerations: scanning mode discrepancy, temporal information, and low computational complexity. For generalizing to different scanning modes, the adaptive scanning-mode augmentation is proposed to convert between raw sector images and linear scan ones. For mining temporal information, the sparse-context transformer is incorporated to integrate inter-frame local and global features. For reducing computational complexity, the sparse-context block is introduced to extract contextual features from auxiliary frames. Finally, on the benchmark dataset, the proposed ASTR model achieves a 77.6% Dice score in rectal cancer segmentation, largely outperforming previous state-of-the-art methods.

8/20/2024