Promptable Segmentation (2024)
Grounded SAM proposes a framework that combines an open-set object detector (Grounding DINO) with a promptable segmentation model (SAM) to tackle open-world visual tasks. The paper surveys three existing paradigms in open-world visual perception, namely Unified Models, LLMs as Controllers, and Ensembles of Foundation Models, and positions Grounded SAM as a flexible alternative that assembles diverse expert models efficiently. Key capabilities include open-set segmentation from text prompts, automatic image annotation through RAM-Grounded-SAM, and controllable image editing with Grounded-SAM-SD. Effectiveness is validated on the Segmentation in the Wild (SGinW) benchmark, where Grounded SAM improves markedly over previous approaches. Future directions include improving annotation pipelines, using large language models as controllers for computer vision tasks, and creating new datasets.
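The core hand-off is straightforward to reproduce: a text-prompted open-set detector proposes bounding boxes, and those boxes in turn prompt SAM for masks. Below is a minimal sketch of this two-stage pipeline using the Hugging Face `transformers` ports of Grounding DINO and SAM; the checkpoint names, thresholds, and text prompt are illustrative assumptions rather than the paper's released pipeline (the official code is at https://github.com/IDEA-Research/Grounded-Segment-Anything).

```python
# Sketch of the Grounded SAM two-stage pipeline: Grounding DINO boxes -> SAM masks.
# Checkpoints, thresholds, and the prompt are illustrative assumptions.
import torch
from PIL import Image
from transformers import (
    AutoProcessor,
    GroundingDinoForObjectDetection,
    SamModel,
    SamProcessor,
)

image = Image.open("example.jpg").convert("RGB")
text_prompt = "a dog. a frisbee."  # lowercase phrases separated by periods

# Stage 1: open-set detection -- text prompt in, bounding boxes out.
det_processor = AutoProcessor.from_pretrained("IDEA-Research/grounding-dino-tiny")
detector = GroundingDinoForObjectDetection.from_pretrained("IDEA-Research/grounding-dino-tiny")
det_inputs = det_processor(images=image, text=text_prompt, return_tensors="pt")
with torch.no_grad():
    det_outputs = detector(**det_inputs)
boxes = det_processor.post_process_grounded_object_detection(
    det_outputs,
    det_inputs.input_ids,
    box_threshold=0.35,
    text_threshold=0.25,
    target_sizes=[image.size[::-1]],
)[0]["boxes"]  # (num_detections, 4) in xyxy pixel coordinates

# Stage 2: promptable segmentation -- the detected boxes prompt SAM for masks.
sam_processor = SamProcessor.from_pretrained("facebook/sam-vit-base")
sam = SamModel.from_pretrained("facebook/sam-vit-base")
sam_inputs = sam_processor(image, input_boxes=[boxes.tolist()], return_tensors="pt")
with torch.no_grad():
    sam_outputs = sam(**sam_inputs)
masks = sam_processor.image_processor.post_process_masks(
    sam_outputs.pred_masks,
    sam_inputs["original_sizes"],
    sam_inputs["reshaped_input_sizes"],
)[0]  # one set of candidate masks per detected box
```

Because the two stages communicate only through boxes, either model can be swapped out, which is what enables variants such as RAM-Grounded-SAM (an image-tagging model supplies the text prompt) and Grounded-SAM-SD (the masks condition an inpainting model).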
This paper employs the following methods: Grounded SAM (Grounding DINO paired with SAM), RAM-Grounded-SAM for automatic image annotation, and Grounded-SAM-SD for controllable image editing.
The following datasets were used in this research: the Segmentation in the Wild (SGinW) benchmark.