Beyond Adapting SAM: Towards End-to-End Ultrasound Image Segmentation via Auto Prompting

Read original: arXiv:2309.06824 - Published 7/9/2024 by Xian Lin, Yangyang Xiang, Li Yu, Zengqiang Yan

Overview

This paper introduces a new model called SAMUS, which is a universal model tailored for ultrasound image segmentation.
The paper also presents an end-to-end version of SAMUS called AutoSAMUS, which automatically generates prompt embeddings to enable segmentation without manual input.
The researchers collected a comprehensive ultrasound dataset of about 30,000 images and 69,000 masks across six object categories to verify their models.
The paper compares SAMUS and AutoSAMUS to state-of-the-art task-specific and SAM-based foundation models and reports that both outperform them.

Plain English Explanation

Medical image segmentation is an important task for computer-aided diagnosis, but the field is dominated by task-specific models that often generalize poorly beyond the tasks they were trained on. The recent Segment Anything Model (SAM) has shown promise for universal image segmentation, but adapting it to medical imaging, particularly ultrasound, runs into two problems: performance degrades severely without proper adaptation, and segmentation remains semi-automatic because it relies on accurate manual prompts.

The researchers address these challenges by proposing SAMUS, a version of SAM that is tailored for ultrasound image segmentation. SAMUS introduces a parallel CNN branch to capture local information, and uses feature and position adapters to help the model learn the characteristics of ultrasound images. This reduces the complexity of training the model compared to simply fine-tuning SAM.

The researchers also introduce AutoSAMUS, an end-to-end version of SAMUS that automatically generates the prompt embeddings needed for segmentation, eliminating the need for manual prompts. This makes the model even more user-friendly and widely applicable.

To validate their models, the researchers collected a large, diverse ultrasound dataset covering 6 object categories. Their extensive experiments show that SAMUS and AutoSAMUS outperform both task-specific models and SAM-based approaches, demonstrating the potential of their auto-prompted SAM-based approach for medical image segmentation.

Technical Explanation

The core innovation of this work is the SAMUS model, which adapts the Segment Anything Model (SAM) to perform well on ultrasound image segmentation. SAMUS introduces a parallel CNN branch that captures local information, which is then combined with the global features from SAM using cross-branch attention.
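
The fusion step described above can be illustrated with a minimal NumPy sketch of single-head cross-attention. It assumes that the tokens from SAM's ViT encoder act as queries and the CNN-branch features act as keys and values, with a residual connection; the function name, shapes, and random projection weights are hypothetical, chosen only for illustration, and are not taken from the SAMUS code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_branch_attention(vit_tokens, cnn_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: ViT tokens (global) query CNN tokens (local).

    Assumed layout: vit_tokens is (N_vit, d), cnn_tokens is (N_cnn, d).
    """
    q = vit_tokens @ w_q                      # (N_vit, d)
    k = cnn_tokens @ w_k                      # (N_cnn, d)
    v = cnn_tokens @ w_v                      # (N_cnn, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (N_vit, N_cnn)
    attn = softmax(scores, axis=-1)           # each row sums to 1
    return vit_tokens + attn @ v              # residual keeps the original ViT features

rng = np.random.default_rng(0)
d = 16                                        # hypothetical feature width
vit_tokens = rng.normal(size=(64, d))         # stand-in for global ViT features
cnn_tokens = rng.normal(size=(256, d))        # stand-in for local CNN features
w_q, w_k, w_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))

fused = cross_branch_attention(vit_tokens, cnn_tokens, w_q, w_k, w_v)
print(fused.shape)  # (64, 16)
```

Because of the residual connection, the ViT features pass through unchanged when the value projection contributes nothing, so the CNN branch only supplements the pretrained representation in this sketch.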

To further bridge the gap between SAM's natural image domain and the ultrasound domain, SAMUS employs a feature adapter and a position adapter. These components help the model learn the unique characteristics of ultrasound images, reducing the complexity of the adaptation process compared to simply fine-tuning SAM.
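
As a rough illustration of what such adapters can look like, the sketch below uses a generic bottleneck adapter for features and an average-pooling resize for the position-embedding grid. Both designs are assumptions made for this example: the summary does not specify the adapters' internals, and the function names, the ReLU activation, and the pooling factor are hypothetical rather than the paper's implementation.

```python
import numpy as np

def feature_adapter(x, w_down, w_up):
    """Generic bottleneck adapter: down-project, ReLU, up-project, add residual.

    Only w_down and w_up would be trained; the backbone that produced x stays frozen.
    """
    hidden = np.maximum(x @ w_down, 0.0)   # (N, r), with r much smaller than d
    return x + hidden @ w_up               # (N, d)

def position_adapter(pos_embed, factor=2):
    """Shrink an (H, W, d) position-embedding grid by average pooling (assumed design)."""
    h, w, d = pos_embed.shape
    blocks = pos_embed.reshape(h // factor, factor, w // factor, factor, d)
    return blocks.mean(axis=(1, 3))        # (H / factor, W / factor, d)

rng = np.random.default_rng(0)
d, r = 16, 4                               # hypothetical feature and bottleneck widths
tokens = rng.normal(size=(64, d))
w_down = rng.normal(size=(d, r)) * 0.1
w_up = np.zeros((r, d))                    # zero init: the adapter starts as the identity

adapted = feature_adapter(tokens, w_down, w_up)
pos_embed = rng.normal(size=(8, 8, d))     # stand-in for a pretrained position grid
small_pos = position_adapter(pos_embed)
print(adapted.shape, small_pos.shape)      # (64, 16) (4, 4, 16)
```

Initializing the up-projection to zero is a common adapter choice: training starts from the pretrained model's behavior and only gradually departs from it, which is one way small added modules can keep training cost low.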

Building on SAMUS, the researchers also present AutoSAMUS, an end-to-end version of the model that automatically generates the prompt embeddings needed for segmentation. This is achieved by introducing an Auto Prompt Generator (APG) module, which replaces the manual prompt encoder used in SAMUS.
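
One way to picture automatic prompt generation is a small set of learnable tokens that attend over the image embedding and come out as prompt embeddings for the mask decoder. The sketch below follows that idea under stated assumptions: the token count, shapes, weights, and the function name `auto_prompt_generator` are hypothetical, and the actual APG in the paper may be structured differently.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def auto_prompt_generator(image_embedding, task_tokens, w_k, w_v):
    """Turn learnable task tokens into image-dependent prompt embeddings.

    image_embedding: (N, d) image-encoder tokens; task_tokens: (P, d) learned queries.
    """
    k = image_embedding @ w_k                                  # (N, d)
    v = image_embedding @ w_v                                  # (N, d)
    scores = task_tokens @ k.T / np.sqrt(task_tokens.shape[-1])
    attn = softmax(scores, axis=-1)                            # (P, N)
    return task_tokens + attn @ v                              # (P, d) prompt embeddings

rng = np.random.default_rng(0)
d = 16                                          # hypothetical embedding width
image_embedding = rng.normal(size=(64, d))      # stand-in for image-encoder output
task_tokens = rng.normal(size=(2, d))           # would be learned during training
w_k, w_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(2))

prompts = auto_prompt_generator(image_embedding, task_tokens, w_k, w_v)
print(prompts.shape)  # (2, 16)
```

The output has a fixed shape regardless of the image, so it can stand in for the embeddings a manual prompt encoder would produce from clicks or boxes, while its values depend on the image content.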

To evaluate their models, the researchers collected an ultrasound dataset of about 30,000 images and 69,000 masks covering six object categories. According to the paper, extensive comparison experiments show both SAMUS and AutoSAMUS outperforming state-of-the-art task-specific models as well as SAM-based foundation models.

Critical Analysis

The paper presents a compelling approach to adapting the Segment Anything Model (SAM) for the medical imaging domain, specifically ultrasound. The researchers' efforts to address the performance degradation and limited generalization of SAM when applied to ultrasound data are commendable.

However, the paper does not delve deeply into the potential limitations of their models. For instance, it would be valuable to understand how the SAMUS and AutoSAMUS models perform on a wider range of medical imaging modalities beyond ultrasound, and whether the adaptations proposed in this work can be easily transferred to other domains.

Additionally, while the authors mention the complexity reduction achieved by their feature and position adapters compared to fine-tuning SAM, it would be helpful to have more quantitative insights into the training efficiency and computational requirements of their models.

Further exploration of edge cases, potential biases, and failure modes of the auto-prompted segmentation approach could also provide valuable insights for researchers and practitioners interested in applying these techniques in real-world medical settings.

Conclusion

This paper adapts the Segment Anything Model (SAM) to ultrasound image segmentation. SAMUS adds a parallel CNN branch with cross-branch attention plus feature and position adapters, and AutoSAMUS builds on it with an auto prompt generator that produces prompt embeddings without manual input.

The researchers' evaluation on a large ultrasound dataset shows SAMUS and AutoSAMUS outperforming state-of-the-art task-specific and SAM-based models. The authors believe the auto-prompted SAM-based approach could become a new paradigm for end-to-end medical image segmentation, with applications in computer-aided diagnosis.

Overall, this research represents an important step forward in bridging the gap between universal image segmentation models and the specific challenges of medical imaging, paving the way for more robust and user-friendly tools to assist healthcare professionals in their diagnostic and treatment tasks.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Beyond Adapting SAM: Towards End-to-End Ultrasound Image Segmentation via Auto Prompting

Xian Lin, Yangyang Xiang, Li Yu, Zengqiang Yan

End-to-end medical image segmentation is of great value for computer-aided diagnosis dominated by task-specific models, usually suffering from poor generalization. With recent breakthroughs brought by the segment anything model (SAM) for universal image segmentation, extensive efforts have been made to adapt SAM for medical imaging but still encounter two major issues: 1) severe performance degradation and limited generalization without proper adaptation, and 2) semi-automatic segmentation relying on accurate manual prompts for interaction. In this work, we propose SAMUS as a universal model tailored for ultrasound image segmentation and further enable it to work in an end-to-end manner denoted as AutoSAMUS. Specifically, in SAMUS, a parallel CNN branch is introduced to supplement local information through cross-branch attention, and a feature adapter and a position adapter are jointly used to adapt SAM from natural to ultrasound domains while reducing training complexity. AutoSAMUS is realized by introducing an auto prompt generator (APG) to replace the manual prompt encoder of SAMUS to automatically generate prompt embeddings. A comprehensive ultrasound dataset, comprising about 30k images and 69k masks and covering six object categories, is collected for verification. Extensive comparison experiments demonstrate the superiority of SAMUS and AutoSAMUS against the state-of-the-art task-specific and SAM-based foundation models. We believe the auto-prompted SAM-based model has the potential to become a new paradigm for end-to-end medical image segmentation and deserves more exploration. Code and data are available at https://github.com/xianlin7/SAMUS.

7/9/2024

AutoProSAM: Automated Prompting SAM for 3D Multi-Organ Segmentation

Chengyin Li, Prashant Khanduri, Yao Qiang, Rafi Ibn Sultan, Indrin Chetty, Dongxiao Zhu

Segment Anything Model (SAM) is one of the pioneering prompt-based foundation models for image segmentation and has been rapidly adopted for various medical imaging applications. However, in clinical settings, creating effective prompts is notably challenging and time-consuming, requiring the expertise of domain specialists such as physicians. This requirement significantly diminishes SAM's primary advantage - its interactive capability with end users - in medical applications. Moreover, recent studies have indicated that SAM, originally designed for 2D natural images, performs suboptimally on 3D medical image segmentation tasks. This subpar performance is attributed to the domain gaps between natural and medical images and the disparities in spatial arrangements between 2D and 3D images, particularly in multi-organ segmentation applications. To overcome these challenges, we present a novel technique termed AutoProSAM. This method automates 3D multi-organ CT-based segmentation by leveraging SAM's foundational model capabilities without relying on domain experts for prompts. The approach utilizes parameter-efficient adaptation techniques to adapt SAM for 3D medical imagery and incorporates an effective automatic prompt learning paradigm specific to this domain. By eliminating the need for manual prompts, it enhances SAM's capabilities for 3D medical image segmentation and achieves state-of-the-art (SOTA) performance in CT-based multi-organ segmentation tasks.

6/28/2024

Ultrasound SAM Adapter: Adapting SAM for Breast Lesion Segmentation in Ultrasound Images

Zhengzheng Tu, Le Gu, Xixi Wang, Bo Jiang

Segment Anything Model (SAM) has recently achieved amazing results in the field of natural image segmentation. However, it is not effective for medical image segmentation, owing to the large domain gap between natural and medical images. In this paper, we mainly focus on ultrasound image segmentation. It is well known that training a foundation model for ultrasound image data is very difficult due to the lack of large-scale annotated ultrasound image data. To address these issues, in this paper, we develop a novel Breast Ultrasound SAM Adapter, termed Breast Ultrasound Segment Anything Model (BUSSAM), which migrates the SAM to the field of breast ultrasound image segmentation by using the adapter technique. To be specific, we first design a novel CNN image encoder, which is fully trained on the BUS dataset. Our CNN image encoder is more lightweight, and focuses more on features of the local receptive field, which provides complementary information to the ViT branch in SAM. Then, we design a novel Cross-Branch Adapter to allow the CNN image encoder to fully interact with the ViT image encoder in the SAM module. Finally, we add both the Position Adapter and the Feature Adapter to the ViT branch to fine-tune the original SAM. The experimental results on AMUBUS and BUSI datasets demonstrate that our proposed model outperforms other medical image segmentation models significantly. Our code will be available at: https://github.com/bscs12/BUSSAM.

4/24/2024

MedSAM-U: Uncertainty-Guided Auto Multi-Prompt Adaptation for Reliable MedSAM

Nan Zhou, Ke Zou, Kai Ren, Mengting Luo, Linchao He, Meng Wang, Yidi Chen, Yi Zhang, Hu Chen, Huazhu Fu

The Medical Segment Anything Model (MedSAM) has shown remarkable performance in medical image segmentation, drawing significant attention in the field. However, its sensitivity to varying prompt types and locations poses challenges. This paper addresses these challenges by focusing on the development of reliable prompts that enhance MedSAM's accuracy. We introduce MedSAM-U, an uncertainty-guided framework designed to automatically refine multi-prompt inputs for more reliable and precise medical image segmentation. Specifically, we first train a Multi-Prompt Adapter integrated with MedSAM, creating MPA-MedSAM, to adapt to diverse multi-prompt inputs. We then employ uncertainty-guided multi-prompt to effectively estimate the uncertainties associated with the prompts and their initial segmentation results. In particular, a novel uncertainty-guided prompts adaptation technique is then applied automatically to derive reliable prompts and their corresponding segmentation outcomes. We validate MedSAM-U using datasets from multiple modalities to train a universal image segmentation model. Compared to MedSAM, experimental results on five distinct modal datasets demonstrate that the proposed MedSAM-U achieves an average performance improvement of 1.7% to 20.5% across uncertainty-guided prompts.

9/4/2024