Slide-SAM: Medical SAM Meets Sliding Window

Read original: arXiv:2311.10121 - Published 4/17/2024 by Quan Quan, Fenghe Tang, Zikang Xu, Heqin Zhu, S. Kevin Zhou


Overview

The Segment Anything Model (SAM) has achieved notable success in 2D segmentation of natural images, but the gap between natural and medical images hinders its direct application to medical image segmentation.
In 3D medical images in particular, SAM struggles to learn the contextual relationships between slices, which limits its practical applicability.
Applying 2D SAM to a 3D image requires prompting the entire volume, which is costly in both time and annotation effort.

Plain English Explanation

The Segment Anything Model (SAM) is a powerful tool for segmenting objects in 2D natural images. However, when it comes to 3D medical images, such as CT scans or MRI data, SAM faces some significant challenges.

Medical images are quite different from natural photographs, and SAM struggles to learn the contextual relationships between the various slices or cross-sections that make up a 3D volume. This limitation means SAM can't fully utilize the 3D information in these medical datasets, reducing its effectiveness.

Additionally, using 2D SAM on a 3D medical image means supplying prompts (like bounding boxes or clicks) across the entire volume rather than on a single slice. This is time-consuming and labor-intensive, which makes it impractical for real-world medical applications.

Technical Explanation

To address these problems, the researchers propose a new model called Slide-SAM. Slide-SAM treats a stack of three adjacent slices from a 3D medical image as a "prediction window." It takes these three slices and any prompts (like points or bounding boxes) on the central slice as inputs, and then predicts segmentation masks for all three slices.
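
As a rough illustration of the "prediction window" idea, the sketch below pulls the three-slice stack around a chosen central slice out of a volume. The name `make_window` and the nested-list representation are assumptions made for this example (a real pipeline would more likely use NumPy arrays or tensors); nothing here is taken from the Slide-SAM code.

```python
from typing import List, Sequence

Slice2D = List[List[float]]  # one H x W slice as nested lists (illustrative only)


def make_window(volume: Sequence[Slice2D], center: int) -> List[Slice2D]:
    """Return [previous slice, central slice, next slice] for index `center`.

    Hypothetical helper: the central slice must have a neighbour on each side.
    """
    if not 1 <= center <= len(volume) - 2:
        raise ValueError("central slice needs a neighbour on both sides")
    return list(volume[center - 1:center + 2])


# Toy volume: 6 slices of 4x4 pixels, each filled with its own slice index.
volume = [[[float(z)] * 4 for _ in range(4)] for z in range(6)]

window = make_window(volume, center=2)
print([s[0][0] for s in window])  # [1.0, 2.0, 3.0]
```

The prompt (a click or a box) would then be attached to the middle element of `window`, and the model would be asked for one mask per slice.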

The masks for the top and bottom slices in this window are then used to generate new prompts for the adjacent slices. By sliding this prediction window forward or backward through the 3D volume, Slide-SAM can perform step-wise segmentation of the entire 3D dataset.
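
The propagation step can be sketched as a loop. This is a minimal sketch under several assumptions, not the authors' implementation: the window advances one slice at a time, the outer slice's mask is turned into a bounding-box prompt via its tight bounding box, propagation stops when a mask comes back empty, and `toy_predict` is a hypothetical stand-in for the model that simply thresholds pixels near the box.

```python
from typing import Callable, Dict, List, Optional, Tuple

Mask = List[List[int]]
Box = Tuple[int, int, int, int]  # (row_min, col_min, row_max, col_max), inclusive


def mask_to_box(mask: Mask) -> Optional[Box]:
    """Tight bounding box of the foreground pixels, or None for an empty mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]


def slide(volume, center: int, box: Box, predict: Callable) -> Dict[int, Mask]:
    """Segment a volume from a single prompt on slice `center` (assumed stride 1)."""
    depth = len(volume)
    if not 1 <= center <= depth - 2:
        raise ValueError("central slice needs a neighbour on both sides")
    first = predict(volume[center - 1:center + 2], box)
    masks = {center - 1: first[0], center: first[1], center + 1: first[2]}
    for step in (+1, -1):              # slide forward, then backward
        c = center + step              # outer slice becomes the next central slice
        while 1 <= c <= depth - 2:
            new_box = mask_to_box(masks[c])
            if new_box is None:        # assumption: stop once the object disappears
                break
            preds = predict(volume[c - 1:c + 2], new_box)
            masks[c + step] = preds[1 + step]  # keep only the newly reached slice
            c += step
    return masks


def toy_predict(window, box: Box) -> List[Mask]:
    """Hypothetical model stub: foreground = positive pixels within 1 px of the box."""
    r0, c0, r1, c1 = box
    return [[[int(r0 - 1 <= r <= r1 + 1 and c0 - 1 <= c <= c1 + 1 and v > 0)
              for c, v in enumerate(row)] for r, row in enumerate(sl)]
            for sl in window]


def make_slice(filled: bool):
    """5x5 slice with a 2x2 bright block at rows 1-2, cols 1-2 when `filled`."""
    return [[1.0 if filled and 1 <= r <= 2 and 1 <= c <= 2 else 0.0
             for c in range(5)] for r in range(5)]


# Toy volume: 7 slices, object present only in slices 1..5.
volume = [make_slice(1 <= z <= 5) for z in range(7)]
masks = slide(volume, center=3, box=(1, 1, 2, 2), predict=toy_predict)
print([z for z in sorted(masks) if any(map(any, masks[z]))])  # [1, 2, 3, 4, 5]
```

A single box on slice 3 is enough to label the whole toy volume, which is the sense in which the sliding window cuts down on prompting. How overlapping predictions are merged and when propagation should stop are design choices this sketch fixes arbitrarily.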

The researchers trained Slide-SAM on multiple public and private medical image datasets, and demonstrated its effectiveness through extensive 3D segmentation experiments. Importantly, Slide-SAM is able to achieve good results with minimal user prompts, making it more practical for real-world medical applications compared to the original SAM model.

Critical Analysis

The researchers acknowledge that while Slide-SAM represents an improvement over applying the original 2D SAM to 3D medical images, there is still room for further refinement and research. For example, the model may struggle with large gaps between slices or significant changes in anatomy between adjacent frames.

Additionally, the researchers note that Slide-SAM, like the original SAM, relies on prompts from users to guide the segmentation process. Reducing or eliminating the need for human-provided prompts could further enhance the model's practicality and usability in medical settings.

Future work could explore ways to better leverage the 3D context within medical images, perhaps through the use of 3D radiance fields or other volumetric representations. Integrating Slide-SAM with other medical image analysis techniques, such as pathological primitive segmentation or zero-shot video analysis, could also unlock new capabilities and applications.

Conclusion

The Slide-SAM model represents an important step forward in adapting the powerful Segment Anything Model to the domain of 3D medical image segmentation. By treating a stack of adjacent slices as a prediction window, Slide-SAM is able to better leverage the 3D context in these datasets, while requiring fewer user prompts than the original SAM.

As the researchers continue to refine and expand Slide-SAM, it has the potential to significantly improve the efficiency and accuracy of medical image analysis tasks, ultimately leading to better patient outcomes. Further advancements in text-image bridging and test-time adaptation could also unlock new capabilities for this promising model.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


Slide-SAM: Medical SAM Meets Sliding Window

Quan Quan, Fenghe Tang, Zikang Xu, Heqin Zhu, S. Kevin Zhou

The Segment Anything Model (SAM) has achieved notable success in two-dimensional segmentation of natural images. However, the substantial gap between medical and natural images hinders its direct application to medical image segmentation tasks. Particularly in 3D medical images, SAM struggles to learn contextual relationships between slices, limiting its practical applicability. Moreover, applying 2D SAM to 3D images requires prompting the entire volume, which is time- and label-consuming. To address these problems, we propose Slide-SAM, which treats a stack of three adjacent slices as a prediction window. It first takes three slices from a 3D volume and point or bounding-box prompts on the central slice as inputs to predict segmentation masks for all three slices. Subsequently, the masks of the top and bottom slices are used to generate new prompts for adjacent slices. Finally, step-wise prediction can be achieved by sliding the prediction window forward or backward through the entire volume. Our model is trained on multiple public and private medical datasets and demonstrates its effectiveness through extensive 3D segmentation experiments, with the help of minimal prompts. Code is available at https://github.com/Curli-quan/Slide-SAM.

4/17/2024

Segment anything model 2: an application to 2D and 3D medical images

Haoyu Dong, Hanxue Gu, Yaqian Chen, Jichen Yang, Yuwen Chen, Maciej A. Mazurowski

Segment Anything Model (SAM) has gained significant attention because of its ability to segment various objects in images given a prompt. The recently developed SAM 2 has extended this ability to video inputs. This opens an opportunity to apply SAM to 3D images, one of the fundamental tasks in the medical imaging field. In this paper, we extensively evaluate SAM 2's ability to segment both 2D and 3D medical images by first collecting 21 medical imaging datasets, including surgical videos, common 3D modalities such as computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET) as well as 2D modalities such as X-ray and ultrasound. Two evaluation settings of SAM 2 are considered: (1) multi-frame 3D segmentation, where prompts are provided to one or multiple slice(s) selected from the volume, and (2) single-frame 2D segmentation, where prompts are provided to each slice. The former only applies to videos and 3D modalities, while the latter applies to all datasets. Our results show that SAM 2 exhibits similar performance as SAM under single-frame 2D segmentation, and has variable performance under multi-frame 3D segmentation depending on the choices of slices to annotate, the direction of the propagation, the predictions utilized during the propagation, etc. We believe our work enhances the understanding of SAM 2's behavior in the medical field and provides directions for future work in adapting SAM 2 to this domain. Our code is available at: https://github.com/mazurowski-lab/segment-anything2-medical-evaluation.

8/23/2024

Medical SAM 2: Segment medical images as video via Segment Anything Model 2

Jiayuan Zhu, Yunli Qi, Junde Wu

In this paper, we introduce Medical SAM 2 (MedSAM-2), an advanced segmentation model that utilizes the SAM 2 framework to address both 2D and 3D medical image segmentation tasks. By adopting the philosophy of taking medical images as videos, MedSAM-2 not only applies to 3D medical images but also unlocks new One-prompt Segmentation capability. That allows users to provide a prompt for just one or a specific image targeting an object, after which the model can autonomously segment the same type of object in all subsequent images, regardless of temporal relationships between the images. We evaluated MedSAM-2 across a variety of medical imaging modalities, including abdominal organs, optic discs, brain tumors, thyroid nodules, and skin lesions, comparing it against state-of-the-art models in both traditional and interactive segmentation settings. Our findings show that MedSAM-2 not only surpasses existing models in performance but also exhibits superior generalization across a range of medical image segmentation tasks. Our code will be released at: https://github.com/MedicineToken/Medical-SAM2

8/6/2024

SAM & SAM 2 in 3D Slicer: SegmentWithSAM Extension for Annotating Medical Images

Zafer Yildiz, Yuwen Chen, Maciej A. Mazurowski

Creating annotations for 3D medical data is time-consuming and often requires highly specialized expertise. Various tools have been implemented to aid this process. Segment Anything Model 2 (SAM 2) offers a general-purpose prompt-based segmentation algorithm designed to annotate videos. In this paper, we adapt this model to the annotation of 3D medical images and offer our implementation in the form of an extension to the popular annotation software: 3D Slicer. Our extension allows users to place point prompts on 2D slices to generate annotation masks and propagate these annotations across entire volumes in either single-directional or bi-directional manners. Our code is publicly available on https://github.com/mazurowski-lab/SlicerSegmentWithSAM and can be easily installed directly from the Extension Manager of 3D Slicer as well.

8/28/2024