PRISM: A Promptable and Robust Interactive Segmentation Model with Visual Prompts

Read original: arXiv:2404.15028 - Published 4/24/2024 by Hao Li, Han Liu, Dewei Hu, Jiacheng Wang, Ipek Oguz

Overview

Presents PRISM, a Promptable and Robust Interactive Segmentation Model for 3D medical image segmentation
Designed with four principles to achieve robustness: iterative learning, confidence learning, corrective learning, and hybrid design
Validated on four public datasets for tumor segmentation in the colon, pancreas, liver, and kidney
Significantly outperforms state-of-the-art methods, achieving results close to human levels

Plain English Explanation

PRISM is a new machine learning model that precisely segments, or outlines, structures such as tumors in 3D medical images like CT or MRI scans. It is designed to be robust and adaptable: it accepts a variety of user prompts, such as points, boxes, or scribbles, and keeps improving its segmentation over multiple iterations.

The key principles that make PRISM robust are:

Iterative learning: The model uses the user's prompts from previous steps to gradually refine and improve the segmentation.
Confidence learning: PRISM generates multiple candidate segmentations per image and assigns each one a confidence score, which it uses to optimize its predictions.
Corrective learning: After each iteration, PRISM uses a small additional network to fix any mislabeled areas.
Hybrid design: PRISM combines different encoder networks to capture both local and global information in the medical images.

PRISM was tested on challenging medical datasets for identifying tumors in the colon, pancreas, liver, and kidney. It significantly outperformed other state-of-the-art methods, both with and without prompt engineering, and achieved segmentation quality close to human-level performance. The code for PRISM is publicly available at https://github.com/MedICL-VU/PRISM.

Technical Explanation

PRISM is designed to accept various visual prompts as input, including sparse prompts like points, boxes, and scribbles, as well as dense prompts like segmentation masks. The model uses an iterative learning approach, where the segmentation is progressively improved by incorporating information from previous prompt inputs.
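
The loop below is a minimal sketch of how such an iterative, prompt-driven workflow might look; it is not taken from the PRISM code. Masks are modeled as sets of (z, y, x) voxel indices, `toy_segment` is a hypothetical stand-in for the network, and the click-sampling rule in `sample_click` (take a voxel from the larger error region) is an assumption made for illustration.

```python
from typing import Callable, List, Optional, Set, Tuple

Voxel = Tuple[int, int, int]   # (z, y, x) index into a 3D volume
Click = Tuple[Voxel, bool]     # (location, True for foreground / False for background)
Mask = Set[Voxel]              # a binary mask stored as the set of foreground voxels


def sample_click(pred: Mask, truth: Mask) -> Optional[Click]:
    """Simulate one corrective click from the current errors (assumed rule)."""
    missed = truth - pred   # false negatives: a positive click would help
    extra = pred - truth    # false positives: a negative click would help
    if not missed and not extra:
        return None         # prediction already matches the reference
    if len(missed) >= len(extra):
        return (min(missed), True)
    return (min(extra), False)


def toy_segment(clicks: List[Click], prev_mask: Mask) -> Mask:
    """Hypothetical stand-in for the network: applies each click to the prior mask."""
    mask = set(prev_mask)
    for voxel, is_foreground in clicks:
        if is_foreground:
            mask.add(voxel)
        else:
            mask.discard(voxel)
    return mask


def interactive_loop(
    segment: Callable[[List[Click], Mask], Mask],
    truth: Mask,
    max_iters: int = 5,
) -> Tuple[Mask, List[Click]]:
    """Refine a mask over several rounds, feeding back all earlier prompts."""
    clicks: List[Click] = []
    pred: Mask = set()
    for _ in range(max_iters):
        click = sample_click(pred, truth)
        if click is None:
            break
        clicks.append(click)
        # The model receives the full prompt history and its previous output.
        pred = segment(clicks, pred)
    return pred, clicks


truth = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
final_mask, history = interactive_loop(toy_segment, truth)
print(len(history), final_mask == truth)  # prints: 3 True
```

The stand-in model sees every prompt collected so far plus its own previous mask, which is the feedback structure the paragraph above describes. A real model would generalize from each click rather than flip a single voxel.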

A key innovation in PRISM is its confidence learning mechanism, where multiple segmentation heads are used per input image. Each head generates a continuous segmentation map along with a confidence score, and the model uses these scores to optimize its predictions.
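
The summary does not say exactly how the confidence scores are consumed. One plausible reading, assumed here purely for illustration, is that the map from the highest-scoring head is kept at inference time. In this sketch each head is a hypothetical pair of a flattened probability map and a scalar confidence, and `select_most_confident` is an illustrative helper, not a PRISM function.

```python
from typing import List, Sequence, Tuple

Head = Tuple[List[float], float]   # (per-voxel foreground probabilities, confidence score)


def select_most_confident(
    heads: Sequence[Head], threshold: float = 0.5
) -> Tuple[List[bool], int]:
    """Binarize the map from the head with the highest confidence score."""
    if not heads:
        raise ValueError("at least one head is required")
    best = max(range(len(heads)), key=lambda i: heads[i][1])
    prob_map, _ = heads[best]
    return [p >= threshold for p in prob_map], best


# Two hypothetical heads over a flattened three-voxel volume.
heads = [
    ([0.9, 0.2, 0.6], 0.40),
    ([0.1, 0.8, 0.7], 0.90),
]
mask, index = select_most_confident(heads)
print(index, mask)  # prints: 1 [False, True, True]
```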

Additionally, PRISM employs a corrective learning step, where a shallow refinement network is used after each iteration to reassign any mislabeled voxels. This helps the model correct its mistakes over time.
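
As a rough mental model only, the corrective step can be pictured as a residual update: the refinement network proposes a per-voxel adjustment, and voxels whose label changes after the adjustment have been "reassigned". The summary gives no details about the network's inputs or architecture, so everything here is an assumption, including the logit-space formulation and the `correction_logits` argument, which stands in for whatever the shallow network would output.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def refine(
    coarse_logits: List[float],
    correction_logits: List[float],
    threshold: float = 0.5,
) -> Tuple[List[bool], List[int]]:
    """Apply a residual correction and report which voxels changed label."""
    if len(coarse_logits) != len(correction_logits):
        raise ValueError("inputs must cover the same voxels")
    before = [sigmoid(c) >= threshold for c in coarse_logits]
    after = [
        sigmoid(c + d) >= threshold
        for c, d in zip(coarse_logits, correction_logits)
    ]
    flipped = [i for i, (b, a) in enumerate(zip(before, after)) if a != b]
    return after, flipped


# Flattened three-voxel example with hypothetical network outputs.
labels, changed = refine([2.0, -0.5, 0.3], [0.0, 1.5, -1.0])
print(labels, changed)  # prints: [True, True, False] [1, 2]
```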

The hybrid design of PRISM integrates both local and global information encoders to better capture the complex anatomical details and spatial relationships in 3D medical images.

Comprehensive validation of PRISM was performed on four public datasets for tumor segmentation in the colon, pancreas, liver, and kidney. These datasets present challenges due to anatomical variations and ambiguous boundaries, making accurate tumor identification difficult. PRISM significantly outperformed state-of-the-art methods, both with and without prompt engineering, and achieved results close to human-level performance.
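
This summary does not name the evaluation metric. Segmentation quality is commonly reported as the Dice score, so the sketch below assumes that metric to show how a predicted mask would be compared with a reference annotation. Masks are again sets of (z, y, x) voxel indices, and the convention of returning 1.0 for two empty masks is a choice made here, not something stated in the source.

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]


def dice(pred: Set[Voxel], truth: Set[Voxel]) -> float:
    """Dice = 2 * |pred AND truth| / (|pred| + |truth|), in the range [0, 1]."""
    if not pred and not truth:
        return 1.0   # both masks empty: treated as perfect agreement
    return 2.0 * len(pred & truth) / (len(pred) + len(truth))


pred = {(0, 0, 0), (0, 0, 1), (0, 1, 1)}
truth = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
print(round(dice(pred, truth), 3))  # prints: 0.667
```

Two of the three predicted voxels overlap the reference, so the score is 2 * 2 / (3 + 3).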

Critical Analysis

The paper thoroughly evaluates PRISM's performance on a diverse set of medical imaging datasets, demonstrating its robustness and versatility. However, the authors acknowledge that further research is needed to improve PRISM's generalization to even more diverse medical imaging modalities and anatomical structures.

Additionally, while the paper highlights PRISM's ability to achieve near-human-level segmentation quality, it would be valuable to understand the specific use cases and clinical settings where such performance would be most impactful. Further research could explore the practical implications and potential real-world deployment of PRISM in medical workflows.

Conclusion

PRISM is a novel and robust interactive segmentation model that demonstrates significant improvements over state-of-the-art methods for 3D medical image segmentation. Its iterative, confidence-based, and corrective learning approach, combined with a hybrid encoder design, allows it to handle diverse visual prompts and accurately segment challenging anatomical structures.

The public availability of PRISM's code and its promising performance on several medical imaging datasets suggest that this technology could have a significant impact on clinical workflows and decision-making, ultimately enhancing the accuracy and efficiency of medical diagnosis and treatment planning.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

PRISM: A Promptable and Robust Interactive Segmentation Model with Visual Prompts

Hao Li, Han Liu, Dewei Hu, Jiacheng Wang, Ipek Oguz

In this paper, we present PRISM, a Promptable and Robust Interactive Segmentation Model, aiming for precise segmentation of 3D medical images. PRISM accepts various visual inputs, including points, boxes, and scribbles as sparse prompts, as well as masks as dense prompts. Specifically, PRISM is designed with four principles to achieve robustness: (1) Iterative learning. The model produces segmentations by using visual prompts from previous iterations to achieve progressive improvement. (2) Confidence learning. PRISM employs multiple segmentation heads per input image, each generating a continuous map and a confidence score to optimize predictions. (3) Corrective learning. Following each segmentation iteration, PRISM employs a shallow corrective refinement network to reassign mislabeled voxels. (4) Hybrid design. PRISM integrates hybrid encoders to better capture both the local and global information. Comprehensive validation of PRISM is conducted using four public datasets for tumor segmentation in the colon, pancreas, liver, and kidney, highlighting challenges caused by anatomical variations and ambiguous boundaries in accurate tumor identification. Compared to state-of-the-art methods, both with and without prompt engineering, PRISM significantly improves performance, achieving results that are close to human levels. The code is publicly available at https://github.com/MedICL-VU/PRISM.

4/24/2024

PRISM Lite: A lightweight model for interactive 3D placenta segmentation in ultrasound

Hao Li, Baris Oguz, Gabriel Arenas, Xing Yao, Jiacheng Wang, Alison Pouch, Brett Byram, Nadav Schwartz, Ipek Oguz

Placenta volume measured from 3D ultrasound (3DUS) images is an important tool for tracking the growth trajectory and is associated with pregnancy outcomes. Manual segmentation is the gold standard, but it is time-consuming and subjective. Although fully automated deep learning algorithms perform well, they do not always yield high-quality results for each case. Interactive segmentation models could address this issue. However, there is limited work on interactive segmentation models for the placenta. Despite their segmentation accuracy, these methods may not be feasible for clinical use as they require relatively large computational power which may be especially prohibitive in low-resource environments, or on mobile devices. In this paper, we propose a lightweight interactive segmentation model aiming for clinical use to interactively segment the placenta from 3DUS images in real-time. The proposed model adopts the segmentation from our fully automated model for initialization and is designed in a human-in-the-loop manner to achieve iterative improvements. The Dice score and normalized surface Dice are used as evaluation metrics. The results show that our model can achieve superior performance in segmentation compared to state-of-the-art models while using significantly fewer parameters. Additionally, the proposed model is much faster for inference and robust to poor initial masks. The code is available at https://github.com/MedICL-VU/PRISM-placenta.

8/13/2024

ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Biomedical Image

Hallee E. Wong, Marianne Rakic, John Guttag, Adrian V. Dalca

Biomedical image segmentation is a crucial part of both scientific research and clinical care. With enough labelled data, deep learning models can be trained to accurately automate specific biomedical image segmentation tasks. However, manually segmenting images to create training data is highly labor intensive and requires domain expertise. We present ScribblePrompt, a flexible neural network based interactive segmentation tool for biomedical imaging that enables human annotators to segment previously unseen structures using scribbles, clicks, and bounding boxes. Through rigorous quantitative experiments, we demonstrate that given comparable amounts of interaction, ScribblePrompt produces more accurate segmentations than previous methods on datasets unseen during training. In a user study with domain experts, ScribblePrompt reduced annotation time by 28% while improving Dice by 15% compared to the next best method. ScribblePrompt's success rests on a set of careful design decisions. These include a training strategy that incorporates both a highly diverse set of images and tasks, novel algorithms for simulated user interactions and labels, and a network that enables fast inference. We showcase ScribblePrompt in an interactive demo, provide code, and release a dataset of scribble annotations at https://scribbleprompt.csail.mit.edu

7/18/2024

One-Prompt to Segment All Medical Images

Junde Wu, Jiayuan Zhu, Yuanpei Liu, Yueming Jin, Min Xu

Large foundation models, known for their strong zero-shot generalization, have excelled in visual and language applications. However, applying them to medical image segmentation, a domain with diverse imaging types and target labels, remains an open challenge. Current approaches, such as adapting interactive segmentation models like Segment Anything Model (SAM), require user prompts for each sample during inference. Alternatively, transfer learning methods like few/one-shot models demand labeled samples, leading to high costs. This paper introduces a new paradigm toward the universal medical image segmentation, termed 'One-Prompt Segmentation.' One-Prompt Segmentation combines the strengths of one-shot and interactive methods. In the inference stage, with just one prompted sample, it can adeptly handle the unseen task in a single forward pass. We train One-Prompt Model on 64 open-source medical datasets, accompanied by the collection of over 3,000 clinician-labeled prompts. Tested on 14 previously unseen datasets, the One-Prompt Model showcases superior zero-shot segmentation capabilities, outperforming a wide range of related methods. The code and data are released at https://github.com/KidsWithTokens/one-prompt.

4/12/2024