← ML Research Wiki / 2304.02643

Segment Anything

Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, Piotr Dollár, Ross Girshick (2023)

Paper Information
arXiv ID
2304.02643
Venue
IEEE International Conference on Computer Vision
Domain
computer vision
SOTA Claim
Yes
Reproducibility
8/10

Abstract

Meta AI Research, FAIR

[Figure: (a) Task: promptable segmentation, where a segmentation prompt (for example, the text "cat with black ears") and an image are passed to the model, which returns a valid mask. (b) Model: Segment Anything Model (SAM), consisting of an image encoder, a prompt encoder, and a lightweight mask decoder.]

Summary

The Segment Anything project introduces a foundation model for image segmentation built from three components: a promptable segmentation task, a segmentation model called SAM, and a data engine used to collect a large dataset named SA-1B. Because the model is promptable, it can be adapted to new tasks through prompt engineering, which supports efficient data annotation and zero-shot transfer across tasks. SA-1B contains over 1.1 billion masks on 11 million licensed images, far more than existing segmentation datasets, and is also more diverse. Human evaluations and experiments show that SAM generates high-quality masks and transfers to several tasks, including edge detection, object proposal generation, and text-to-mask segmentation. The model also has limitations, including difficulty with ambiguous prompts and with fine structures, which should be weighed before it is used in real-world settings.
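To put the dataset figures above in perspective, the short calculation below derives the average number of masks per image. It is only a sketch: the function name is made up for illustration, and the inputs are the rounded counts quoted in this summary (1.1 billion masks, 11 million images), not exact dataset statistics.

```python
def average_masks_per_image(total_masks: int, total_images: int) -> float:
    """Return the mean number of masks per image (hypothetical helper)."""
    if total_images <= 0:
        raise ValueError("total_images must be positive")
    return total_masks / total_images


# Rounded counts taken from the summary text, not exact dataset statistics.
SA1B_MASKS = 1_100_000_000
SA1B_IMAGES = 11_000_000

masks_per_image = average_masks_per_image(SA1B_MASKS, SA1B_IMAGES)
print(masks_per_image)  # 100.0
```

With these rounded counts the dataset averages about 100 masks per image.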

Methods

This paper employs the following methods:

  • Segment Anything Model (SAM)
  • promptable segmentation task

Models Used

  • Segment Anything Model (SAM)

Datasets

The following datasets were used in this research:

  • SA-1B

Evaluation Metrics

  • average precision (AP)
  • mean intersection-over-union (mIoU)
  • average recall (AR)
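As an illustration of the mIoU metric listed above, the following is a minimal pure-Python sketch that computes intersection-over-union for pairs of binary masks and averages the results. The function names, the nested-list mask representation, and the convention of scoring two empty masks as 1.0 are assumptions made for this example; they are not taken from the paper's evaluation code.

```python
from typing import List, Sequence

Mask = Sequence[Sequence[int]]  # rows of 0/1 pixels (assumed representation)


def mask_iou(pred: Mask, gt: Mask) -> float:
    """Intersection-over-union of two binary masks of equal shape."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("masks must have the same shape")
    intersection = 0
    union = 0
    for row_pred, row_gt in zip(pred, gt):
        for p, g in zip(row_pred, row_gt):
            if p and g:
                intersection += 1
            if p or g:
                union += 1
    # Assumed convention: two empty masks count as perfect agreement.
    return intersection / union if union else 1.0


def mean_iou(preds: List[Mask], gts: List[Mask]) -> float:
    """Average IoU over paired predicted and ground-truth masks."""
    if not preds or len(preds) != len(gts):
        raise ValueError("need equally sized, non-empty lists of masks")
    return sum(mask_iou(p, g) for p, g in zip(preds, gts)) / len(preds)


# Two toy 2x2 examples: one half-overlapping pair and one exact match.
predictions = [[[1, 1], [0, 0]], [[0, 1], [0, 1]]]
ground_truth = [[[1, 0], [0, 0]], [[0, 1], [0, 1]]]
print(mean_iou(predictions, ground_truth))  # 0.75
```

The first pair overlaps on one of two foreground pixels (IoU 0.5) and the second pair matches exactly (IoU 1.0), so the mean is 0.75.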

Results

  • The SA-1B dataset contains over 1 billion masks from 11 million images.
  • SAM generates high-quality masks across the reported experiments and outperforms existing methods in some cases.

Limitations

The authors identified the following limitations:

  • Performance degrades when a user prompt is ambiguous, that is, when it could refer to more than one valid object or part.
  • The model may miss fine structures in the segmented object.
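One way to cope with the prompt ambiguity noted above is for a model to return several candidate masks for a single prompt, each with a confidence score, and to keep the highest-scoring one. The sketch below shows only that selection step. The `MaskCandidate` class, its fields, and the example scores are hypothetical and are not the SAM library's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MaskCandidate:
    """A candidate mask for one prompt (hypothetical structure)."""
    description: str  # what the mask covers, for illustration only
    score: float      # model confidence, e.g. an estimated IoU


def pick_best_candidate(candidates: List[MaskCandidate]) -> MaskCandidate:
    """Return the candidate with the highest confidence score."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    return max(candidates, key=lambda c: c.score)


# A single click on a shirt could plausibly mean the logo, the shirt,
# or the whole person. The scores here are made up for the example.
candidates = [
    MaskCandidate("logo on shirt", 0.71),
    MaskCandidate("shirt", 0.88),
    MaskCandidate("whole person", 0.80),
]
best = pick_best_candidate(candidates)
print(best.description)  # shirt
```

Keeping all candidates, rather than only the top one, lets a user or a downstream step override the choice when the top-scoring mask is not the intended one.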

Technical Requirements

  • Number of GPUs: 256
  • GPU Type: NVIDIA A100

Keywords

  • foundation model
  • prompt engineering
  • large-scale dataset
  • interactive segmentation
  • zero-shot transfer
