Segment Anything

Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, Piotr Dollár, Ross Girshick (2023)

Paper Information

arXiv ID

2304.02643

Venue

IEEE International Conference on Computer Vision

Domain

computer vision

SOTA Claim

Yes

Reproducibility

8/10

Contents

Abstract
Methods
Datasets
Results
Limitations
Related Work
External Resources

Abstract

Meta AI Research, FAIR

[Figure: (a) Task: promptable segmentation. A segmentation prompt (for example the text "cat with black ears") and an image are passed to the model, which returns a valid mask. (b) Model: Segment Anything Model (SAM), made up of an image encoder, a prompt encoder, and a lightweight mask decoder.]

Summary

The Segment Anything project introduces a foundation model for image segmentation built from three components: a promptable segmentation task, a segmentation model (SAM), and a data engine used to collect a large dataset, SA-1B. Because the model is promptable, it can be steered through prompt engineering, which supports efficient data annotation and zero-shot transfer to new tasks. SA-1B contains over 1.1 billion masks collected from 11 million licensed images, exceeding existing segmentation datasets in both size and diversity. Human evaluations and experiments indicate that SAM generates high-quality masks and transfers to a range of tasks, including edge detection, object proposal generation, and text-to-mask segmentation. The model still has limitations, notably with ambiguous prompts and fine structures, which should be weighed before applying it in real-world settings.

Methods

This paper employs the following methods:

Segment Anything Model (SAM)
promptable segmentation task
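
The page names an image encoder, a prompt encoder, and a lightweight mask decoder as the parts of SAM. The sketch below is a toy illustration of how such a split could be wired together, not SAM itself: the class `ToyPromptableSegmenter`, its `tolerances` parameter, and the intensity-threshold "decoder" are all hypothetical stand-ins, and it assumes a 2D grayscale image and a single point prompt. It only shows the shape of the interaction: the image is encoded once, each prompt is then decoded cheaply, and several candidate masks are returned because a single click can be ambiguous.

```python
import numpy as np


class ToyPromptableSegmenter:
    """Toy encode-once, prompt-many segmenter. Hypothetical; not SAM."""

    def __init__(self, tolerances=(0.05, 0.15, 0.30)):
        # Hypothetical knob: each tolerance yields one candidate mask.
        self.tolerances = tuple(tolerances)
        self._embedding = None

    def set_image(self, image):
        """Stand-in for the image encoder: runs once per image, then cached."""
        image = np.asarray(image, dtype=np.float64)
        lo, hi = image.min(), image.max()
        # Rescale intensities to [0, 1]; a constant image maps to all zeros.
        if hi > lo:
            self._embedding = (image - lo) / (hi - lo)
        else:
            self._embedding = np.zeros_like(image)

    def predict(self, point):
        """Return one boolean mask per tolerance for a (row, col) click."""
        if self._embedding is None:
            raise RuntimeError("call set_image() before predict()")
        row, col = point
        # Stand-in for the prompt encoder: the click becomes a query value.
        query = self._embedding[row, col]
        # Stand-in for the mask decoder: keep pixels similar to the query.
        distance = np.abs(self._embedding - query)
        return [distance <= tol for tol in self.tolerances]
```

With this toy, calling `set_image` once and then `predict` for several clicks reuses the cached embedding, and the returned masks grow from tight to loose as the tolerance increases.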

Models Used

Segment Anything Model (SAM)

Datasets

The following datasets were used in this research:

SA-1B

Evaluation Metrics

average precision (AP)
mean intersection-over-union (mIoU)
average recall (AR)
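
As a rough guide to how two of these metrics are usually computed on binary masks, here is a minimal NumPy sketch. It is not the paper's evaluation code: the function names are illustrative, treating two empty masks as a perfect match is an assumption, and the ten IoU thresholds from 0.5 to 0.95 follow the common COCO-style convention rather than anything stated on this page.

```python
import numpy as np


def mask_iou(pred, target):
    """Intersection-over-union of two binary masks with the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # assumption: two empty masks count as a perfect match
    return float(np.logical_and(pred, target).sum() / union)


def mean_iou(preds, targets):
    """mIoU: average IoU over paired predicted and ground-truth masks."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if len(preds) == 0:
        raise ValueError("need at least one mask pair")
    return float(np.mean([mask_iou(p, t) for p, t in zip(preds, targets)]))


def average_recall(proposals, ground_truths, thresholds=None):
    """AR: recall of ground-truth masks, averaged over IoU thresholds."""
    if thresholds is None:
        # Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95.
        thresholds = np.linspace(0.5, 0.95, 10)
    if len(ground_truths) == 0:
        raise ValueError("need at least one ground-truth mask")
    # Best IoU any proposal reaches for each ground-truth mask.
    best = np.array([
        max((mask_iou(p, g) for p in proposals), default=0.0)
        for g in ground_truths
    ])
    # A ground-truth mask is recalled at threshold t if best IoU >= t.
    recalls = [(best >= t).mean() for t in thresholds]
    return float(np.mean(recalls))
```

Average precision (AP) additionally needs a confidence score per prediction to rank them, so it is left out of this sketch.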

Results

The SA-1B dataset contains over 1 billion masks from 11 million images, or roughly 100 masks per image on average.
SAM produces high-quality masks across the evaluated tasks, and in some cases it outperforms existing methods.

Limitations

The authors identified the following limitations:

Performance is impacted by ambiguity in user prompts.
May miss fine structural details in segmentation.

Technical Requirements

Number of GPUs: 256
GPU Type: NVIDIA A100

Keywords

foundation model
prompt engineering
large-scale dataset
interactive segmentation
zero-shot transfer

External Resources

Funding: Meta AI
References: 148
Influential Citations: 1023
