Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, Piotr Dollár; California Institute of Technology, University of California at Irvine (2014)
This paper presents the Microsoft Common Objects in Context (MS COCO) dataset, which aims to advance object recognition by treating it as part of the broader problem of scene understanding. The dataset consists of images of everyday scenes containing common objects in their natural context, and each object is labeled with a per-instance segmentation to support precise localization. MS COCO covers 91 object categories, with 2.5 million labeled instances across 328,000 images. It is designed to support the study of non-iconic views, contextual reasoning between objects, and precise 2D localization. Compared with other prominent datasets such as PASCAL and ImageNet, it has more labeled instances per image and more varied contextual information. A detailed statistical analysis compares MS COCO with existing datasets, and the paper describes the challenges of collecting non-iconic images. The dataset was annotated through crowdsourcing on Amazon Mechanical Turk, using annotation interfaces designed to keep labels and segmentations accurate. The paper also discusses the implications of the dataset for training and evaluating modern image recognition systems, along with proposed future enhancements such as adding 'stuff' categories and additional annotations for better performance evaluation.
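As a rough illustration of the instance-density comparison, the average number of labeled instances per image can be estimated from the totals quoted above. This is a back-of-the-envelope sketch that uses the rounded figures in this summary rather than the per-dataset statistics reported in the paper; the constant and function names are illustrative assumptions.

```python
# Rounded totals quoted in the summary above (not exact counts).
TOTAL_INSTANCES = 2_500_000
TOTAL_IMAGES = 328_000


def instances_per_image(instances: int, images: int) -> float:
    """Return the average number of labeled instances per image."""
    if images <= 0:
        raise ValueError("images must be positive")
    return instances / images


density = instances_per_image(TOTAL_INSTANCES, TOTAL_IMAGES)
print(f"~{density:.1f} labeled instances per image")
```

The same function could be applied to the totals of another dataset to compare densities, provided both counts are measured in the same way.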
This paper employs the following methods:
The following datasets were used in this research:
The authors identified the following limitations: