Rethinking Abdominal Organ Segmentation (RAOS) in the clinical scenario: A robustness evaluation benchmark with challenging cases

Read original: arXiv:2406.13674 - Published 6/21/2024 by Xiangde Luo, Zihan Li, Shaoting Zhang, Wenjun Liao, Guotai Wang

Overview

This paper evaluates the robustness of abdominal organ segmentation models in challenging clinical scenarios.
The authors propose a benchmark dataset with diverse, complex cases to assess the performance and limitations of existing segmentation algorithms.
The goal is to identify areas for improvement in abdominal organ segmentation to enable more reliable computer-assisted diagnosis and treatment planning.

Plain English Explanation

Accurate segmentation of abdominal organs in medical images is a crucial task for healthcare applications like disease diagnosis and treatment planning. However, real-world clinical scenarios often present complex cases that can challenge the capabilities of existing segmentation models.

This paper introduces a new benchmark dataset and evaluation framework to assess the robustness of abdominal organ segmentation algorithms in the face of these challenging cases. The dataset includes a diverse range of scenarios, such as rule-based outlier detection for multi-organ classification, segmentation hallucination, and delineation of organs at risk in radiotherapy planning.

By evaluating segmentation models on this benchmark, the authors aim to identify their strengths, limitations, and areas for improvement. This knowledge can then guide the development of more robust and reliable algorithms for real-world clinical applications, ultimately enhancing the quality of computer-assisted diagnosis and treatment.

Technical Explanation

The authors present a comprehensive benchmark, called Rethinking Abdominal Organ Segmentation (RAOS), to evaluate the robustness of abdominal organ segmentation algorithms in challenging clinical scenarios. The benchmark includes a diverse dataset of CT scans with various challenging cases, such as organs at risk delineation, anatomical abnormalities, and image artifacts.

To create the RAOS dataset, the authors collected CT scans from multiple institutions and manually annotated the abdominal organs, including the liver, kidneys, spleen, and pancreas. They then introduced various types of challenging cases, such as organ displacement, occlusion, and segmentation hallucination, to test the segmentation models' performance and robustness.
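To make the idea of segmentation hallucination concrete, here is a minimal sketch of one way to flag it. It is not the authors' code. It assumes that predictions and ground truth are integer label volumes stored as NumPy arrays, and that a hallucination means the model predicts voxels for an organ that has no voxels in the ground truth. The function name `hallucinated_organs` and the `min_voxels` threshold are hypothetical.

```python
import numpy as np

def hallucinated_organs(pred, truth, organ_labels, min_voxels=1):
    """Return the labels predicted by the model but absent from the ground truth.

    Assumption: `pred` and `truth` are integer label volumes of the same shape,
    and an organ counts as hallucinated when the ground truth has no voxel with
    that label while the prediction has at least `min_voxels` of them.
    """
    hallucinated = []
    for label in organ_labels:
        absent_in_truth = not np.any(truth == label)
        predicted_voxels = int(np.count_nonzero(pred == label))
        if absent_in_truth and predicted_voxels >= min_voxels:
            hallucinated.append(label)
    return hallucinated

# Toy volume: organ 1 is present, organ 2 is missing from the ground truth.
truth = np.zeros((4, 4, 4), dtype=np.int64)
truth[:2] = 1
pred = truth.copy()
pred[3, 0, 0] = 2  # the model predicts one voxel of the missing organ

print(hallucinated_organs(pred, truth, organ_labels=[1, 2]))
```

Raising `min_voxels` would ignore isolated stray voxels. Whether that is appropriate depends on the evaluation protocol, which this summary does not specify.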

The authors evaluate several state-of-the-art segmentation algorithms, including U-Net, nnUNet, and TransUNet, on the RAOS benchmark. The results reveal that while these models perform well on standard datasets, they struggle with the challenging cases in the RAOS benchmark, demonstrating the need for more robust and reliable segmentation algorithms.
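Segmentation benchmarks of this kind are commonly scored with the Dice similarity coefficient. The sketch below shows how per-organ Dice scores could be averaged within groups of cases. It is an illustration under stated assumptions, not the paper's evaluation code: the volumes are integer label arrays, the group names are placeholders, and organs absent from both prediction and ground truth are skipped rather than scored.

```python
import numpy as np

def dice_score(pred, truth, label):
    """Dice coefficient for one label; None if the label is absent from both volumes."""
    p = pred == label
    t = truth == label
    denom = int(p.sum()) + int(t.sum())
    if denom == 0:
        return None  # nothing to compare, so skip this organ for this case
    return float(2.0 * np.logical_and(p, t).sum() / denom)

def mean_dice_by_group(cases, organ_labels):
    """Average Dice per group; `cases` is an iterable of (group, pred, truth)."""
    scores = {}
    for group, pred, truth in cases:
        for label in organ_labels:
            d = dice_score(pred, truth, label)
            if d is not None:
                scores.setdefault(group, []).append(d)
    return {group: float(np.mean(vals)) for group, vals in scores.items()}

# Toy case: ground truth has 4 voxels of organ 1, the prediction recovers 3 of them.
truth = np.zeros((2, 2, 2), dtype=np.int64)
truth[0] = 1
pred = truth.copy()
pred[0, 0, 0] = 0

print(dice_score(pred, truth, 1))  # 2 * 3 / (3 + 4)
print(mean_dice_by_group([("example_group", pred, truth)], organ_labels=[1, 2]))
```

Reporting the mean per group rather than over all cases keeps a large group of easy cases from masking poor performance on a small group of hard ones.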

Critical Analysis

The RAOS benchmark presented in this paper is a valuable contribution to the field of abdominal organ segmentation. By introducing a diverse set of challenging cases, the authors have highlighted the limitations of current segmentation algorithms and identified areas for improvement.

One potential limitation of the study is the relatively small size of the RAOS dataset, which may not fully capture the breadth of complex scenarios encountered in real-world clinical practice. Expanding the dataset with more diverse cases could further strengthen the benchmark and provide a more comprehensive evaluation of segmentation models.

Additionally, the authors could have explored the use of multi-modal data, such as combining CT scans with other imaging modalities (e.g., MRI, PET), to investigate whether these approaches could improve the robustness of segmentation algorithms in challenging cases.

Overall, the RAOS benchmark is a valuable contribution that highlights the need for more robust and reliable abdominal organ segmentation algorithms to support clinical decision-making and treatment planning. The insights from this study can guide future research and development in this important field.

Conclusion

The Rethinking Abdominal Organ Segmentation (RAOS) benchmark presented in this paper addresses a critical need in the field of medical image analysis. By evaluating the performance of state-of-the-art segmentation algorithms on a diverse set of challenging clinical cases, the authors have identified key limitations and areas for improvement.

The findings from this study can inform the development of more robust and reliable abdominal organ segmentation algorithms, which are essential for enhancing the accuracy and reliability of computer-assisted diagnosis, treatment planning, and patient care. Continued research and innovation in this area have the potential to significantly improve healthcare outcomes for patients.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Rethinking Abdominal Organ Segmentation (RAOS) in the clinical scenario: A robustness evaluation benchmark with challenging cases

Xiangde Luo, Zihan Li, Shaoting Zhang, Wenjun Liao, Guotai Wang

Deep learning has enabled great strides in abdominal multi-organ segmentation, even surpassing junior oncologists on common cases or organs. However, robustness on corner cases and complex organs remains a challenging open problem for clinical adoption. To investigate model robustness, we collected and annotated the RAOS dataset comprising 413 CT scans (~80k 2D images, ~8k 3D organ annotations) from 413 patients each with 17 (female) or 19 (male) labelled organs, manually delineated by oncologists. We grouped scans based on clinical information into 1) diagnosis/radiotherapy (317 volumes), 2) partial excision without the whole organ missing (22 volumes), and 3) excision with the whole organ missing (74 volumes). RAOS provides a potential benchmark for evaluating model robustness including organ hallucination. It also includes some organs that can be very hard to access on public datasets like the rectum, colon, intestine, prostate and seminal vesicles. We benchmarked several state-of-the-art methods in these three clinical groups to evaluate performance and robustness. We also assessed cross-generalization between RAOS and three public datasets. This dataset and comprehensive analysis establish a potential baseline for future robustness research: https://github.com/Luoxd1996/RAOS.

6/21/2024

Automatic Organ and Pan-cancer Segmentation in Abdomen CT: the FLARE 2023 Challenge

Jun Ma, Yao Zhang, Song Gu, Cheng Ge, Ershuai Wang, Qin Zhou, Ziyan Huang, Pengju Lyu, Jian He, Bo Wang

Organ and cancer segmentation in abdomen Computed Tomography (CT) scans is the prerequisite for precise cancer diagnosis and treatment. Most existing benchmarks and algorithms are tailored to specific cancer types, limiting their ability to provide comprehensive cancer analysis. This work presents the first international competition on abdominal organ and pan-cancer segmentation by providing a large-scale and diverse dataset, including 4650 CT scans with various cancer types from over 40 medical centers. The winning team established a new state-of-the-art with a deep learning-based cascaded framework, achieving average Dice Similarity Coefficient scores of 92.3% for organs and 64.9% for lesions on the hidden multi-national testing set. The dataset and code of top teams are publicly available, offering a benchmark platform to drive further innovations https://codalab.lisn.upsaclay.fr/competitions/12239.

8/23/2024

Automatic segmentation of Organs at Risk in Head and Neck cancer patients from CT and MRI scans

Sébastien Quetin, Andrew Heschl, Mauricio Murillo, Rohit Murali, Shirin A. Enger, Farhad Maleki

Background and purpose: Deep Learning (DL) has been widely explored for Organs at Risk (OARs) segmentation; however, most studies have focused on a single modality, either CT or MRI, not both simultaneously. This study presents a high-performing DL pipeline for segmentation of 30 OARs from MRI and CT scans of Head and Neck (H&N) cancer patients. Materials and methods: Paired CT and MRI-T1 images from 42 H&N cancer patients alongside annotation for 30 OARs from the H&N OAR CT & MR segmentation challenge dataset were used to develop a segmentation pipeline. After cropping irrelevant regions, rigid followed by non-rigid registration of CT and MRI volumes was performed. Two versions of the CT volume, representing soft tissues and bone anatomy, were stacked with the MRI volume and used as input to an nnU-Net pipeline. Modality Dropout was used during the training to force the model to learn from the different modalities. Segmentation masks were predicted with the trained model for an independent set of 14 new patients. The mean Dice Score (DS) and Hausdorff Distance (HD) were calculated for each OAR across these patients to evaluate the pipeline. Results: This resulted in an overall mean DS and HD of 0.777 ± 0.118 and 3.455 ± 1.679, respectively, establishing the state-of-the-art (SOTA) for this challenge at the time of submission. Conclusion: The proposed pipeline achieved the best DS and HD among all participants of the H&N OAR CT and MR segmentation challenge and sets a new SOTA for automated segmentation of H&N OARs.

5/27/2024

Quality assurance of organs-at-risk delineation in radiotherapy

Yihao Zhao, Cuiyun Yuan, Ying Liang, Yang Li, Chunxia Li, Man Zhao, Jun Hu, Wei Liu, Chenbin Liu

The delineation of tumor targets and organs-at-risk is critical in radiotherapy treatment planning. Automatic segmentation can be used to reduce the physician workload and improve consistency. However, quality assurance of automatic segmentation is still an unmet need in clinical practice. The patient data used in our study was a standardized dataset from the AAPM Thoracic Auto-Segmentation Challenge. The OARs included were left and right lungs, heart, esophagus, and spinal cord. Two groups of OARs were generated: the benchmark dataset manually contoured by experienced physicians and the test dataset automatically created using the AccuContour software. A ResNet-152 network was used as the feature extractor, and a one-class support vector classifier was used to classify contours as high or low quality. We evaluate the model performance with balanced accuracy, F-score, sensitivity, specificity and the area under the receiver operating characteristic curve. We randomly generated contour errors to assess the generalization of our method, explored the detection limit, and evaluated the correlations between detection limit and various metrics such as volume, Dice similarity coefficient, Hausdorff distance, and mean surface distance. The proposed one-class classifier outperformed in metrics such as balanced accuracy, AUC, and others. The proposed method showed significant improvement over binary classifiers in handling various types of errors. Our proposed model, which introduces residual network and attention mechanism in the one-class classification framework, was able to detect the various types of OAR contour errors with high accuracy. The proposed method can significantly reduce the burden of physician review for contour delineation.

5/21/2024