sam-2

Maintainer: meta

Total Score

5

Last updated 9/18/2024

  • Run this model: Run on Replicate
  • API spec: View on Replicate
  • Github link: View on Github
  • Paper link: View on Arxiv


Model overview

SAM 2: Segment Anything in Images and Videos is a foundation model for solving promptable visual segmentation in images and videos. It extends the original Segment Anything Model (SAM) by Meta to support video processing. The model design is a simple transformer architecture with streaming memory for real-time video processing. SAM 2 is trained on the Segment Anything Video (SA-V) dataset, the largest video segmentation dataset to date, providing strong performance across a wide range of tasks and visual domains.

Model inputs and outputs

The SAM 2 model takes an image or video as input and lets users provide prompts (such as points, boxes, or masks) to segment the relevant objects. The outputs include a combined mask covering all segmented objects as well as individual masks for each object.

Inputs

  • Image: The input image to perform segmentation on.
  • Use M2M: A boolean flag that enables an extra mask-to-mask refinement step during automatic mask generation.
  • Points Per Side: The number of points sampled along each side of the image grid used for automatic mask generation.
  • Pred Iou Thresh: A filtering threshold on the model's predicted mask quality (IoU); masks scoring below it are discarded.
  • Stability Score Thresh: A filtering threshold on the stability of each mask under changes to the binarization cutoff; unstable masks are discarded.

Outputs

  • Combined Mask: A single combined mask covering all segmented objects.
  • Individual Masks: An array of individual masks for each segmented object.
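
To call the hosted model programmatically, a minimal sketch with the Replicate Python client might look like the following. The input field names mirror the list above and the values are illustrative defaults; check the API spec linked at the top for the authoritative schema.

```python
# Minimal sketch: calling the hosted sam-2 model with the Replicate Python client.
# Field names follow the inputs listed above; verify them against the API spec.
import replicate

output = replicate.run(
    "meta/sam-2",  # resolves to the latest published version
    input={
        "image": open("photo.jpg", "rb"),   # local file; a URL string also works
        "use_m2m": True,                    # extra mask-to-mask refinement pass
        "points_per_side": 32,              # density of the point sampling grid
        "pred_iou_thresh": 0.88,            # drop masks with low predicted quality
        "stability_score_thresh": 0.95,     # drop masks that are unstable
    },
)

# Per the outputs described above, expect a combined mask plus individual masks.
print(output)
```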

Capabilities

SAM 2 can be used for a variety of visual segmentation tasks, including interactive segmentation, automatic mask generation, and video segmentation and tracking. It builds upon the strong performance of the original SAM model, while adding the capability to process video data.
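
The video capability is exposed through a separate video predictor in the GitHub repository linked above. A rough sketch of that workflow, assuming the sam2 package is installed and a checkpoint has been downloaded (config and checkpoint names below are illustrative, and call signatures may differ slightly between releases):

```python
# Sketch of video segmentation and tracking with the sam2 package from the repo.
# Checkpoint/config paths are illustrative; see the repo README for the real ones.
import torch
from sam2.build_sam import build_sam2_video_predictor

predictor = build_sam2_video_predictor("sam2_hiera_l.yaml", "checkpoints/sam2_hiera_large.pt")

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    state = predictor.init_state("videos/clip_frames/")  # directory of video frames

    # Prompt a single object with one foreground click on frame 0.
    frame_idx, object_ids, masks = predictor.add_new_points_or_box(
        state,
        frame_idx=0,
        obj_id=1,
        points=[[420, 260]],  # (x, y) pixel coordinates
        labels=[1],           # 1 = foreground, 0 = background
    )

    # Propagate the prompt through the rest of the video using the streaming memory.
    for frame_idx, object_ids, masks in predictor.propagate_in_video(state):
        pass  # masks[i] is the predicted mask for object_ids[i] on this frame
```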

What can I use it for?

SAM 2 can be used for a wide range of applications that require precise object segmentation, such as content creation, video editing, autonomous driving, and robotic manipulation. The video processing capabilities make it particularly useful for applications that involve dynamic scenes, such as surveillance, sports analysis, and live event coverage.

Things to try

With SAM 2, you can experiment with different types of prompts (points, boxes, or masks) to see how they affect the segmentation results. You can also try the automatic mask generation feature to quickly isolate objects of interest without manual input. Additionally, the video processing capabilities allow you to track objects across multiple frames, which could be useful for applications like motion analysis or object tracking.
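
As a starting point, here is a hedged sketch of interactive, point-prompted image segmentation with the sam2 package from the GitHub repository. The checkpoint and config names are illustrative, and the coordinates are arbitrary example values.

```python
# Sketch of point-prompted image segmentation with the sam2 package.
# Checkpoint/config names are illustrative; see the GitHub README for details.
import numpy as np
import torch
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

predictor = SAM2ImagePredictor(build_sam2("sam2_hiera_l.yaml", "checkpoints/sam2_hiera_large.pt"))

image = np.array(Image.open("photo.jpg").convert("RGB"))

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(image)
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[500, 375]]),  # one foreground click at (x, y)
        point_labels=np.array([1]),           # 1 = foreground, 0 = background
        multimask_output=True,                # return several candidate masks
    )

best_mask = masks[np.argmax(scores)]  # keep the highest-scoring candidate
```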



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Models


segment-anything-everything

yyjim

Total Score

67

The segment-anything-everything model, developed by Replicate creator yyjim, is a tryout of Meta's Segment Anything Model (SAM). SAM is a powerful AI model that can produce high-quality object masks from input prompts such as points or boxes, and it can be used to generate masks for all objects in an image. It has been trained on a dataset of 11 million images and 1.1 billion masks, giving it strong zero-shot performance on a variety of segmentation tasks. Similar models include ram-grounded-sam from idea-research, which combines SAM with a strong image tagging model, and the official segment-anything model from ybelkada, which provides detailed instructions on how to download and use the model.

Model inputs and outputs

The segment-anything-everything model takes an input image and lets you specify various parameters for mask generation, such as whether to return only the mask (without the original image), the maximum number of masks to return, and thresholds and settings for mask prediction and post-processing.

Inputs

  • image: The input image, provided as a URI.
  • mask_only: A boolean flag indicating whether to return only the mask (without the original image).
  • mask_limit: The maximum number of masks to return. If set to -1 or None, all masks are returned.
  • crop_n_layers: The number of layers of image crops to run mask prediction on. Higher values can produce more accurate masks but take longer to process.
  • box_nms_thresh: The box IoU cutoff used by non-maximal suppression to filter duplicate masks.
  • crop_nms_thresh: The box IoU cutoff used by non-maximal suppression to filter duplicate masks between different crops.
  • points_per_side: The number of points sampled along one side of the image; the total number of points is points_per_side².
  • pred_iou_thresh: A filtering threshold in [0, 1], using the model's predicted mask quality.
  • crop_overlap_ratio: The degree to which crops overlap, as a fraction of the image length.
  • min_mask_region_area: The minimum area (in pixels) for disconnected regions and holes in masks to be removed during post-processing.
  • stability_score_offset: The amount to shift the cutoff when calculating the stability score.
  • stability_score_thresh: A filtering threshold in [0, 1], using the stability of the mask under changes to the cutoff used to binarize the model's mask predictions.
  • crop_n_points_downscale_factor: The factor by which points-per-side is scaled down in each subsequent layer of image crops.

Outputs

  • An array of URIs representing the generated masks.

Capabilities

The segment-anything-everything model can generate high-quality segmentation masks for objects in an image, even without explicit labeling or training on the specific objects. It can segment a wide variety of objects, from household items to natural scenes, using simple input prompts such as points or bounding boxes.

What can I use it for?

The segment-anything-everything model can be useful for a variety of computer vision and image processing applications, such as:

  • Object detection and segmentation: Automatically identify and segment objects of interest in images or videos.
  • Image editing and manipulation: Easily select and extract specific objects from an image for further editing or compositing.
  • Augmented reality: Accurately segment objects in real time for AR applications, such as virtual try-on or object occlusion.
  • Robotics and autonomous systems: Segment objects in the environment to aid in navigation, object manipulation, and scene understanding.

Things to try

One interesting thing to try with the segment-anything-everything model is to experiment with the various input parameters, such as the number of image crops, the point sampling density, and the different threshold settings. Adjusting these parameters can help you find the right balance between mask quality, processing time, and the specific needs of your application.

Another idea is to use the model in combination with other computer vision techniques, such as object detection or instance segmentation, to build more sophisticated pipelines for complex image analysis tasks. The model's zero-shot capabilities can be a powerful addition to a wider range of computer vision tools and workflows.
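
An illustrative call through the Replicate Python client is sketched below. The input names follow the list above, the parameter values are arbitrary examples, and you may need to pin an exact model version rather than relying on the latest one.

```python
# Illustrative call to yyjim/segment-anything-everything via the Replicate client.
# Input names follow the list above; values are examples, not recommended settings.
import urllib.request

import replicate

mask_uris = replicate.run(
    "yyjim/segment-anything-everything",
    input={
        "image": "https://example.com/kitchen.jpg",
        "mask_only": True,             # return masks without the original image
        "mask_limit": 20,              # cap the number of returned masks
        "points_per_side": 32,         # 32 x 32 = 1024 sampled points
        "pred_iou_thresh": 0.88,
        "stability_score_thresh": 0.95,
    },
)

# The output is an array of mask URIs; download each mask for local inspection.
for i, uri in enumerate(mask_uris):
    urllib.request.urlretrieve(uri, f"mask_{i:03d}.png")
```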



segment-anything-automatic

pablodawson

Total Score

3

The segment-anything-automatic model, created by pablodawson, is a version of the Segment Anything Model (SAM) that can automatically generate segmentation masks for all objects in an image. SAM is a powerful AI model developed by Meta AI Research that can produce high-quality object masks from simple input prompts like points or bounding boxes. Similar models include segment-anything-everything and the official segment-anything model.

Model inputs and outputs

The segment-anything-automatic model takes an image as its input and automatically generates segmentation masks for all objects in the image. The model supports various input parameters to control the mask generation process, such as the resize width, the number of crop layers, the non-maximum suppression thresholds, and more.

Inputs

  • image: The input image to generate segmentation masks for.
  • resize_width: The width to resize the image to before running inference (default is 1024).
  • crop_n_layers: The number of layers to run mask prediction on crops of the image (default is 0).
  • box_nms_thresh: The box IoU cutoff used by non-maximal suppression to filter duplicate masks (default is 0.7).
  • crop_nms_thresh: The box IoU cutoff used by non-maximal suppression to filter duplicate masks between different crops (default is 0.7).
  • points_per_side: The number of points to be sampled along one side of the image (default is 32).
  • pred_iou_thresh: A filtering threshold between 0 and 1 using the model's predicted mask quality (default is 0.88).
  • crop_overlap_ratio: The degree to which crops overlap (default is 0.3413333333333333).
  • min_mask_region_area: The minimum area of a mask region to keep after post-processing (default is 0).
  • stability_score_offset: The amount to shift the cutoff when calculating the stability score (default is 1).
  • stability_score_thresh: A filtering threshold between 0 and 1 using the stability of the mask under changes to the cutoff (default is 0.95).
  • crop_n_points_downscale_factor: The factor to scale down the number of points-per-side sampled in each layer (default is 1).

Outputs

  • A URI to the generated segmentation masks for the input image.

Capabilities

The segment-anything-automatic model can automatically generate high-quality segmentation masks for all objects in an image, without requiring any manual input prompts. This makes it a powerful tool for tasks like image analysis, object detection, and image editing. The model's strong zero-shot performance allows it to work well on a variety of image types and scenes.

What can I use it for?

The segment-anything-automatic model can be used for a wide range of applications, including:

  • Image analysis: Automatically detect and segment all objects in an image for further analysis.
  • Object detection: Use the generated masks to identify and locate specific objects within an image.
  • Image editing: Leverage the precise segmentation masks to selectively modify or remove objects in an image.
  • Automation: Integrate the model into image processing pipelines to automate repetitive segmentation tasks.

Things to try

Some interesting things to try with the segment-anything-automatic model include:

  • Experiment with the various input parameters to see how they affect the generated masks, and find the optimal settings for your specific use case.
  • Combine the segmentation masks with other computer vision techniques, such as object classification or instance segmentation, to build more advanced image processing applications.
  • Explore using the model for creative applications, such as image compositing or digital artwork, where the precise segmentation capabilities can be valuable.
  • Compare the performance of the segment-anything-automatic model to similar models, such as segment-anything-everything or the official segment-anything model, to find the best fit for your needs.
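
To build intuition for how points_per_side, crop_n_layers, and crop_n_points_downscale_factor interact, here is a small back-of-the-envelope helper based on the parameter descriptions above. It is not part of the model's API, and the exact cropping behavior inside SAM may differ in detail.

```python
# Rough helper (not part of the model's API): how many points are sampled per
# crop at each crop layer, given the parameters described above.
def points_per_crop(points_per_side: int = 32,
                    crop_n_layers: int = 0,
                    crop_n_points_downscale_factor: int = 1) -> list[int]:
    counts = []
    for layer in range(crop_n_layers + 1):  # layer 0 is the full image
        side = max(1, points_per_side // (crop_n_points_downscale_factor ** layer))
        counts.append(side * side)          # a side x side grid per crop
    return counts

# With the defaults (32 points per side, no extra crop layers) a single grid of
# 32 * 32 = 1024 points is sampled; adding crop layers with a downscale factor
# of 2 halves the grid side length at each successive layer.
print(points_per_crop())           # [1024]
print(points_per_crop(32, 2, 2))   # [1024, 256, 64]
```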



segment-anything-tryout

yyjim

Total Score

2

segment-anything-tryout is a tryout version of the Segment Anything Model (SAM) developed by Meta AI Research. SAM is a powerful image segmentation model that can generate high-quality object masks from input prompts like points or bounding boxes. It has been trained on a massive dataset of 11 million images and 1.1 billion masks, giving it strong zero-shot performance across a variety of segmentation tasks. Similar models like segment-anything-everything and ram-grounded-sam also utilize the SAM approach, demonstrating its broad applicability. The official segment-anything model provides even more details and usage instructions.

Model inputs and outputs

segment-anything-tryout takes two primary inputs: an image and an optional set of prompts such as points or bounding boxes. The model then outputs a set of segmentation masks corresponding to the objects in the image.

Inputs

  • image: The input image to generate masks for.
  • box: Bounding box coordinates [x, y, w, h] to use as a prompt. If not provided, the entire image is used.
  • mask_only: If True, the output only includes the mask(s), without any additional metadata.
  • multimask_output: If True, the output is a list of masks; if False, the output is a single mask.

Outputs

  • An array of URIs pointing to the generated segmentation mask(s) for the input image.

Capabilities

The Segment Anything Model (SAM) has impressive zero-shot capabilities, allowing it to generate accurate segmentation masks without any fine-tuning or additional training. It can handle a variety of object types and scenes, as demonstrated by the example outputs on the project website. This makes SAM a highly versatile tool for image understanding and analysis tasks.

What can I use it for?

segment-anything-tryout and the full SAM model can be used for a wide range of computer vision applications that require accurate object segmentation. Some potential use cases include:

  • Automating photo and image editing tasks by allowing users to easily select and manipulate specific objects.
  • Improving image search and retrieval by enabling more fine-grained queries.
  • Supporting robotic and autonomous systems that need to understand their surroundings.

The model's zero-shot capabilities also make it well suited for rapidly prototyping and exploring new computer vision applications without the need for extensive dataset collection and model training.

Things to try

One interesting aspect of SAM is its ability to generate masks from a variety of input prompts, not just bounding boxes. Try experimenting with different types of prompts, such as clicking on specific points of interest or drawing rough outlines around objects. This can help you understand the model's flexibility and discover new ways to leverage its segmentation capabilities. A box-prompt sketch using the underlying library is shown after this section.

Another avenue to explore is the model's performance on different types of images and scenes. While the examples showcase its ability to handle common objects, you could try challenging it with more complex or unusual imagery to see how it responds. This can help uncover the model's strengths and limitations.
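
For reference, here is a hedged sketch of how a box prompt drives the underlying segment-anything library that this tryout wraps. This is not the hosted model's exact code; note that the library expects boxes in [x0, y0, x1, y1] format, whereas the hosted input above is described as [x, y, w, h].

```python
# Sketch of a box-prompted prediction with Meta's segment-anything library.
# Checkpoint path and the example box are illustrative values.
import numpy as np
from PIL import Image
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = np.array(Image.open("street.jpg").convert("RGB"))
predictor.set_image(image)

# Convert an [x, y, w, h] box (as described above) to the library's XYXY format.
x, y, w, h = 100, 80, 400, 300
box_xyxy = np.array([x, y, x + w, y + h])

masks, scores, _ = predictor.predict(
    box=box_xyxy,
    multimask_output=False,  # single best mask for the boxed object
)
```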



ram-grounded-sam

idea-research

Total Score

1.3K

ram-grounded-sam is an AI model that combines the strengths of the Recognize Anything Model (RAM) and the Grounded-Segment-Anything model. It exhibits exceptional recognition abilities, capable of detecting and segmenting a wide range of common objects in images using free-form text prompts. This model builds upon the powerful Segment Anything Model (SAM) and the Grounding DINO detector to provide a robust and versatile tool for visual understanding tasks.

Model inputs and outputs

The ram-grounded-sam model takes an input image and a text prompt, and generates segmentation masks for the objects and regions described in the prompt. The text prompt can be a free-form description of the objects or scenes of interest, allowing flexible and expressive control over the model's behavior.

Inputs

  • Image: The input image for which the model will generate segmentation masks.
  • Text Prompt: A free-form text description of the objects or scenes of interest in the input image.

Outputs

  • Segmentation Masks: A set of segmentation masks, each corresponding to an object or region described in the text prompt. These masks precisely outline the boundaries of the detected entities.
  • Bounding Boxes: Bounding boxes around the detected objects, which can be useful for tasks like object detection or localization.
  • Confidence Scores: Confidence scores for each detected object, indicating the model's certainty about the presence and precise segmentation of the corresponding entity.

Capabilities

The ram-grounded-sam model can detect and segment a wide variety of common objects and scenes in images, ranging from everyday household items to complex natural landscapes. It can handle prompts that describe multiple objects or scenes, and can accurately segment all the relevant entities. The model's strong zero-shot performance allows it to generalize to new domains and tasks beyond its training data.

What can I use it for?

ram-grounded-sam can be a powerful tool for a variety of computer vision and image understanding tasks. Some potential applications include:

  • Automated image annotation: Automatically generate detailed labels and masks for the contents of images, which is valuable for building and annotating large-scale image datasets.
  • Interactive image editing: Precise segmentation of objects and regions enables intuitive, fine-grained editing where users can select and manipulate specific elements of an image.
  • Visual question answering: The model's ability to understand and segment image contents based on text prompts can be leveraged to build more advanced visual question answering systems.
  • Robotic perception: The model's real-time segmentation capabilities could be integrated into robotic systems to enable more fine-grained visual understanding and interaction with the environment.

Things to try

One interesting aspect of the ram-grounded-sam model is its ability to handle complex and open-ended text prompts. Try providing prompts that describe multiple objects or scenes, or use more abstract or descriptive language to see how the model responds. You can also experiment with challenging or unusual images to test its generalization capabilities.

Another interesting direction to explore is combining ram-grounded-sam with other AI models, such as language models or generative models, to enable more advanced image understanding and manipulation tasks. For example, you could use the model's segmentation outputs to guide the generation of new image content or the editing of existing images.
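
A text-prompted call might look roughly like the sketch below. The input field names ("image", "text_prompt") are assumptions based on the description above rather than the verified Replicate schema for this model, so check the model's API page before relying on them.

```python
# Illustrative sketch only: field names are assumptions based on the description
# above, not the verified schema for idea-research/ram-grounded-sam on Replicate.
import replicate

result = replicate.run(
    "idea-research/ram-grounded-sam",
    input={
        "image": "https://example.com/living_room.jpg",   # assumed field name
        "text_prompt": "sofa, coffee table, floor lamp",  # assumed field name
    },
)

# Per the outputs described above, expect masks plus bounding boxes and
# confidence scores for each detected object.
print(result)
```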
