
Microsoft COCO: Common Objects in Context

Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, Piotr Dollár, California Institute of Technology, University of California at Irvine (2014)

Paper Information
arXiv ID
1405.0312
Venue
European Conference on Computer Vision
Domain
Computer vision
SOTA Claim
Yes
Reproducibility
7/10

Abstract

We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. This is achieved by gathering images of complex everyday scenes containing common objects in their natural context. Objects are labeled using per-instance segmentations to aid in precise object localization. Our dataset contains photos of 91 object types that would be easily recognizable by a 4 year old. With a total of 2.5 million labeled instances in 328k images, the creation of our dataset drew upon extensive crowd worker involvement via novel user interfaces for category detection, instance spotting and instance segmentation. We present a detailed statistical analysis of the dataset in comparison to PASCAL, ImageNet, and SUN. Finally, we provide baseline performance analysis for bounding box and segmentation detection results using a Deformable Parts Model.

Summary

This paper presents the Microsoft Common Objects in Context (MS COCO) dataset, aimed at advancing object recognition within the broader scope of scene understanding. The dataset features images of everyday scenes with common objects in their natural contexts, supporting precise object localization through per-instance segmentation. MS COCO consists of 91 object categories and 2.5 million labeled instances spread across 328,000 images. It uniquely facilitates the study of non-iconic views, contextual reasoning between objects, and precise 2D localization. It contrasts with other prominent datasets such as PASCAL and ImageNet in its higher instance density per image and richer contextual information. A detailed statistical analysis compares MS COCO to existing datasets and outlines the challenges of gathering non-iconic images effectively. The dataset was created through crowdsourcing on Amazon Mechanical Turk, with purpose-built user interfaces to ensure accurate labeling and segmentation. The paper also discusses the implications of the dataset for training and evaluating modern image recognition systems, alongside proposed future enhancements such as including 'stuff' categories and additional annotations for better performance evaluation.
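The per-instance annotations described above are distributed in what became the standard COCO JSON layout (top-level `images`, `annotations`, and `categories` lists; boxes as `[x, y, width, height]`). A minimal sketch of indexing such a file by image, with invented example values for illustration:

```python
# A tiny annotation dict in the COCO JSON layout; file names, ids, and
# coordinates here are made up for illustration.
coco = {
    "images": [{"id": 1, "file_name": "kitchen.jpg", "width": 640, "height": 480}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 44,
         "bbox": [120.0, 200.0, 80.0, 60.0],
         "segmentation": [[120, 200, 200, 200, 200, 260, 120, 260]],
         "area": 4800.0, "iscrowd": 0},
    ],
    "categories": [{"id": 44, "name": "bottle", "supercategory": "kitchen"}],
}

# Group annotations by image id, as a dataset loader typically would.
by_image = {}
for ann in coco["annotations"]:
    by_image.setdefault(ann["image_id"], []).append(ann)

for img in coco["images"]:
    anns = by_image.get(img["id"], [])
    print(img["file_name"], len(anns), "instance(s)")
```

In practice one would parse the released annotation files (e.g. with the official COCO API) rather than an inline dict; the grouping step is the same.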

Methods

This paper employs the following methods:

  • Crowdsourcing
  • Instance Segmentation
  • Contextual Reasoning
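The instance segmentation masks are stored either as polygons or, for crowd regions, as uncompressed run-length encodings. A minimal decoder sketch, assuming the COCO convention that runs are laid out in column-major order and the first run counts zeros (function and variable names here are our own):

```python
def rle_decode(counts, height, width):
    """Decode a COCO-style uncompressed RLE into a binary mask.

    `counts` alternates runs of 0s and 1s, starting with 0s, with pixels
    laid out in column-major (Fortran) order, as in the COCO mask format.
    """
    flat = []
    value = 0
    for run in counts:
        flat.extend([value] * run)
        value = 1 - value
    assert len(flat) == height * width, "runs must cover the whole image"
    # Undo the column-major layout: pixel (row r, col c) is flat[c * height + r].
    return [[flat[c * height + r] for c in range(width)] for r in range(height)]

# Toy 3x3 example: two background pixels, two foreground, and so on.
mask = rle_decode([2, 2, 2, 2, 1], 3, 3)
```

The official COCO API decodes a compressed variant of this encoding; the run/column-major logic is the same.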

Models Used

  • DPMv5-P
  • DPMv5-C

Datasets

The following datasets were used in this research:

  • ImageNet
  • PASCAL VOC
  • SUN
  • MS COCO

Evaluation Metrics

  • None specified

Results

  • The MS COCO dataset contains 2.5 million labeled instances in 328,000 images.
  • Models trained on MS COCO perform better on everyday scenes than those trained on prior datasets.
  • MS COCO has an average of 7.7 object instances per image compared to lower counts in other datasets.
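The instances-per-image statistic above can be reproduced from any COCO-style annotation list; a small sketch (field names follow the COCO layout, the toy data is invented):

```python
from collections import defaultdict

def density_stats(annotations, num_images):
    """Average instances per image and distinct categories per image,
    from a COCO-style annotation list."""
    instances = defaultdict(int)
    categories = defaultdict(set)
    for ann in annotations:
        instances[ann["image_id"]] += 1
        categories[ann["image_id"]].add(ann["category_id"])
    # Images with no annotations still count toward the denominator.
    avg_instances = sum(instances.values()) / num_images
    avg_categories = sum(len(s) for s in categories.values()) / num_images
    return avg_instances, avg_categories

# Toy example: two annotations on image 1, one on image 2.
anns = [{"image_id": 1, "category_id": 1},
        {"image_id": 1, "category_id": 2},
        {"image_id": 2, "category_id": 1}]
print(density_stats(anns, 2))  # → (1.5, 1.5)
```

Run over the full MS COCO annotations, the first value is the 7.7 instances-per-image figure reported in the paper.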

Limitations

The authors identified the following limitations:

  • The dataset only includes 'thing' categories and does not yet label 'stuff' categories.
  • Initial segmentation quality varied due to the complexity of the task and varying annotator quality.

Technical Requirements

  • Number of GPUs: None specified
  • GPU Type: None specified

Keywords

dataset, object recognition, scene understanding, segmentation, detection
