CAT-SAM: Conditional Tuning for Few-Shot Adaptation of Segment Anything Model

Read original: arXiv:2402.03631 - Published 7/17/2024 by Aoran Xiao, Weihao Xuan, Heli Qi, Yun Xing, Ruijie Ren, Xiaoqin Zhang, Ling Shao, Shijian Lu

Overview

This paper introduces CAT-SAM, a conditional tuning network for few-shot adaptation of the Segment Anything Model (SAM) to unconventional image domains such as aerial, medical, and non-RGB images.
CAT-SAM keeps the pre-trained SAM frozen and adapts its image encoder and mask decoder jointly through a small number of learnable parameters, using only a few target-domain samples.
The authors report that both CAT-SAM variants achieve superior segmentation performance across 11 unconventional tasks, even in the one-shot adaptation setting.

Plain English Explanation

The Segment Anything Model (SAM) is a powerful deep learning model that can segment objects in general images without task-specific training, guided by simple geometric prompts. However, SAM often struggles with unconventional images, such as aerial, medical, and non-RGB images, whose characteristics differ from those of general images.

To address this, the researchers developed CAT-SAM, which adapts SAM to a new domain using only a few labeled target examples. Rather than retraining SAM, CAT-SAM freezes it and trains a small set of added parameters. A "prompt bridge" passes the mask decoder's prompt token to the image encoder, so the two parts are tuned together instead of separately.

This allows CAT-SAM to carry the capabilities of the general SAM model over to specialized segmentation tasks. The researchers report that both CAT-SAM variants consistently achieve superior segmentation performance across 11 unconventional tasks, even when only a single example is available.

Technical Explanation

The core design of CAT-SAM is a prompt bridge that enables decoder-conditioned joint tuning of the heavyweight image encoder and the lightweight mask decoder. The bridge maps the prompt token of the mask decoder to the image encoder, so that the encoder's adaptation is conditioned on the decoder rather than learned in isolation.

SAM itself stays frozen throughout; only a small number of learnable parameters are trained. The authors develop two tuning strategies for the image encoder, which give two CAT-SAM variants: one injects learnable prompt tokens in the input space, and the other inserts lightweight adapter networks. Keeping the trainable parameter count small matters because only a few target samples are available to fit it.
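
The prompt-bridge idea described in the paper's abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the class names `FrozenEncoderBlock` and `PromptBridge`, the additive way the projected prompt token is combined with the encoder output, and the tiny dimensions are hypothetical and are not taken from the CAT-SAM implementation. The sketch only shows the general pattern of keeping a large block frozen while a small learnable projection conditions it on a decoder prompt token.

```python
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Create a rows x cols matrix of small random weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def count_params(matrix):
    """Number of scalar weights in a matrix."""
    return sum(len(row) for row in matrix)


class FrozenEncoderBlock:
    """Hypothetical stand-in for one frozen image-encoder block."""

    def __init__(self, dim, rng):
        self.weight = make_matrix(dim, dim, rng)  # never updated during tuning

    def __call__(self, image_token):
        return matvec(self.weight, image_token)


class PromptBridge:
    """Hypothetical learnable map from decoder prompt space to encoder space."""

    def __init__(self, decoder_dim, encoder_dim, rng):
        # In this toy setup these are the only weights that would be trained.
        self.proj = make_matrix(encoder_dim, decoder_dim, rng)

    def __call__(self, prompt_token):
        return matvec(self.proj, prompt_token)


def conditioned_forward(block, bridge, image_token, prompt_token):
    """Frozen block output plus a prompt-conditioned residual (assumed additive)."""
    frozen_out = block(image_token)
    condition = bridge(prompt_token)
    return [f + c for f, c in zip(frozen_out, condition)]


rng = random.Random(0)
encoder_dim, decoder_dim = 16, 2
block = FrozenEncoderBlock(encoder_dim, rng)
bridge = PromptBridge(decoder_dim, encoder_dim, rng)

image_token = [1.0] * encoder_dim
prompt_token = [0.5] * decoder_dim
output = conditioned_forward(block, bridge, image_token, prompt_token)

frozen_params = count_params(block.weight)
trainable_params = count_params(bridge.proj)
print(f"frozen: {frozen_params}, trainable: {trainable_params}")
```

In this toy setup the bridge holds 32 weights against 256 frozen ones. The real ratio in CAT-SAM is not stated in the abstract beyond "a small number of learnable parameters".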

In experiments, the researchers evaluated CAT-SAM on 11 unconventional tasks. Both CAT-SAM variants consistently achieved superior target segmentation performance, including under the very challenging one-shot adaptation setup.

Critical Analysis

One potential limitation of CAT-SAM is that the conditional tuning network adds an extra layer of complexity to the overall model, which could make it more challenging to train and deploy in some real-world scenarios. The researchers acknowledge this trade-off, noting that the improved performance on few-shot adaptation tasks may justify the additional complexity in many applications.

Additionally, while CAT-SAM shows strong results on the evaluated tasks, it would be valuable to see how the method performs on an even broader range of segmentation tasks and datasets. Further research could also explore ways to improve the efficiency and scalability of the conditional tuning network, potentially making CAT-SAM practical for more real-world use cases.

Conclusion

CAT-SAM presents a novel approach for adapting the Segment Anything Model to unconventional image domains, including aerial, medical, and non-RGB images, using only a few examples. By bridging the mask decoder's prompt token to the image encoder, the method tunes both components jointly with a small number of learnable parameters while SAM itself stays frozen.

This work demonstrates the potential of using flexible, general-purpose models like SAM as a starting point for specialized segmentation tools, including in medical imaging. Approaches like CAT-SAM may help transfer powerful pre-trained models to real-world applications where labeled data is scarce.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

CAT-SAM: Conditional Tuning for Few-Shot Adaptation of Segment Anything Model

Aoran Xiao, Weihao Xuan, Heli Qi, Yun Xing, Ruijie Ren, Xiaoqin Zhang, Ling Shao, Shijian Lu

The recent Segment Anything Model (SAM) has demonstrated remarkable zero-shot capability and flexible geometric prompting in general image segmentation. However, SAM often struggles when handling various unconventional images, such as aerial, medical, and non-RGB images. This paper presents CAT-SAM, a ConditionAl Tuning network that adapts SAM toward various unconventional target tasks with just few-shot target samples. CAT-SAM freezes the entire SAM and adapts its mask decoder and image encoder simultaneously with a small number of learnable parameters. The core design is a prompt bridge structure that enables decoder-conditioned joint tuning of the heavyweight image encoder and the lightweight mask decoder. The bridging maps the prompt token of the mask decoder to the image encoder, fostering synergic adaptation of the encoder and the decoder with mutual benefits. We develop two representative tuning strategies for the image encoder which leads to two CAT-SAM variants: one injecting learnable prompt tokens in the input space and the other inserting lightweight adapter networks. Extensive experiments over 11 unconventional tasks show that both CAT-SAM variants achieve superior target segmentation performance consistently even under the very challenging one-shot adaptation setup. Project page: https://xiaoaoran.github.io/projects/CAT-SAM

7/17/2024

SAM Fewshot Finetuning for Anatomical Segmentation in Medical Images

Weiyi Xie, Nathalie Willems, Shubham Patil, Yang Li, Mayank Kumar

We propose a straightforward yet highly effective few-shot fine-tuning strategy for adapting the Segment Anything (SAM) to anatomical segmentation tasks in medical images. Our novel approach revolves around reformulating the mask decoder within SAM, leveraging few-shot embeddings derived from a limited set of labeled images (few-shot collection) as prompts for querying anatomical objects captured in image embeddings. This innovative reformulation greatly reduces the need for time-consuming online user interactions for labeling volumetric images, such as exhaustively marking points and bounding boxes to provide prompts slice by slice. With our method, users can manually segment a few 2D slices offline, and the embeddings of these annotated image regions serve as effective prompts for online segmentation tasks. Our method prioritizes the efficiency of the fine-tuning process by exclusively training the mask decoder through caching mechanisms while keeping the image encoder frozen. Importantly, this approach is not limited to volumetric medical images, but can generically be applied to any 2D/3D segmentation task. To thoroughly evaluate our method, we conducted extensive validation on four datasets, covering six anatomical segmentation tasks across two modalities. Furthermore, we conducted a comparative analysis of different prompting options within SAM and the fully-supervised nnU-Net. The results demonstrate the superior performance of our method compared to SAM employing only point prompts (approximately 50% improvement in IoU) and performs on-par with fully supervised methods whilst reducing the requirement of labeled data by at least an order of magnitude.
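
The efficiency argument in this abstract rests on a simple observation: a frozen encoder produces the same embedding for the same image at every epoch, so the embedding can be computed once and cached while only the decoder is trained. The following is a minimal sketch of that caching pattern, assuming a hypothetical `encode_image` function and a toy one-parameter decoder. It is not the authors' code.

```python
stats = {"encoder_calls": 0}
embedding_cache = {}


def encode_image(image):
    """Hypothetical frozen encoder: here, just the mean pixel value."""
    stats["encoder_calls"] += 1
    return sum(image) / len(image)


def cached_embedding(image_id, image):
    """Return the cached embedding, running the frozen encoder only on a miss."""
    if image_id not in embedding_cache:
        embedding_cache[image_id] = encode_image(image)
    return embedding_cache[image_id]


def train_decoder(dataset, epochs=20, lr=0.1):
    """Fit a toy decoder (one scalar weight) by gradient descent on squared error."""
    weight = 0.0
    for _ in range(epochs):
        for image_id, (image, target) in dataset.items():
            embedding = cached_embedding(image_id, image)
            prediction = weight * embedding
            gradient = 2 * (prediction - target) * embedding
            weight -= lr * gradient
    return weight


# Two toy "images" (flat pixel lists); each target is twice the mean pixel value.
dataset = {
    "slice_a": ([0.0, 1.0, 2.0], 2.0),
    "slice_b": ([2.0, 2.0, 2.0], 4.0),
}
decoder_weight = train_decoder(dataset)
print(stats["encoder_calls"], round(decoder_weight, 3))
```

With the cache in place, the encoder runs once per image (twice here) rather than once per image per epoch (40 times here).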

7/8/2024

Multi-scale Contrastive Adaptor Learning for Segmenting Anything in Underperformed Scenes

Ke Zhou, Zhongwei Qiu, Dongmei Fu

Foundational vision models, such as the Segment Anything Model (SAM), have achieved significant breakthroughs through extensive pre-training on large-scale visual datasets. Despite their general success, these models may fall short in specialized tasks with limited data, and fine-tuning such large-scale models is often not feasible. Current strategies involve incorporating adaptors into the pre-trained SAM to facilitate downstream task performance with minimal model adjustment. However, these strategies can be hampered by suboptimal learning approaches for the adaptors. In this paper, we introduce a novel Multi-scale Contrastive Adaptor learning method named MCA-SAM, which enhances adaptor performance through a meticulously designed contrastive learning framework at both token and sample levels. Our Token-level Contrastive adaptor (TC-adaptor) focuses on refining local representations by improving the discriminability of patch tokens, while the Sample-level Contrastive adaptor (SC-adaptor) amplifies global understanding across different samples. Together, these adaptors synergistically enhance feature comparison within and across samples, bolstering the model's representational strength and its ability to adapt to new tasks. Empirical results demonstrate that MCA-SAM sets new benchmarks, outperforming existing methods in three challenging domains: camouflage object detection, shadow segmentation, and polyp segmentation. Specifically, MCA-SAM exhibits substantial relative performance enhancements, achieving a 20.0% improvement in MAE on the COD10K dataset, a 6.0% improvement in MAE on the CAMO dataset, a 15.4% improvement in BER on the ISTD dataset, and a 7.9% improvement in mDice on the Kvasir-SEG dataset.
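
The metrics quoted in this abstract can be made concrete with a small sketch. The code below uses the standard definitions of the Dice coefficient and mean absolute error (MAE) on flat masks. The function names, the toy masks, and the numbers passed to `relative_improvement` are illustrative assumptions, and the exact evaluation protocol of each benchmark may differ.

```python
def dice(pred, target):
    """Dice coefficient 2*|A and B| / (|A| + |B|) for flat binary masks."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_dice(preds, targets):
    """Average Dice over a list of (prediction, target) mask pairs."""
    return sum(dice(p, t) for p, t in zip(preds, targets)) / len(preds)


def mae(pred, target):
    """Mean absolute error between predicted and target pixel values."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def relative_improvement(baseline_error, new_error):
    """Relative reduction of an error metric where lower is better."""
    return (baseline_error - new_error) / baseline_error


pred = [1, 1, 0, 0]
target = [1, 0, 0, 0]
print(dice(pred, target), mae(pred, target))

# Hypothetical error values, chosen only to show the arithmetic.
print(relative_improvement(baseline_error=0.05, new_error=0.04))
```

Because MAE is an error, an "improvement in MAE" means the value went down, whereas an improvement in Dice means the value went up.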

8/13/2024

S-SAM: SVD-based Fine-Tuning of Segment Anything Model for Medical Image Segmentation

Jay N. Paranjape, Shameema Sikder, S. Swaroop Vedula, Vishal M. Patel

Medical image segmentation has been traditionally approached by training or fine-tuning the entire model to cater to any new modality or dataset. However, this approach often requires tuning a large number of parameters during training. With the introduction of the Segment Anything Model (SAM) for prompted segmentation of natural images, many efforts have been made towards adapting it efficiently for medical imaging, thus reducing the training time and resources. However, these methods still require expert annotations for every image in the form of point prompts or bounding box prompts during training and inference, making it tedious to employ them in practice. In this paper, we propose an adaptation technique, called S-SAM, that only trains parameters equal to 0.4% of SAM's parameters and at the same time uses simply the label names as prompts for producing precise masks. This not only makes tuning SAM more efficient than the existing adaptation methods but also removes the burden of providing expert prompts. We call this modified version S-SAM and evaluate it on five different modalities including endoscopic images, x-ray, ultrasound, CT, and histology images. Our experiments show that S-SAM outperforms state-of-the-art methods as well as existing SAM adaptation methods while tuning a significantly less number of parameters. We release the code for S-SAM at https://github.com/JayParanjape/SVDSAM.

8/14/2024