ImageNet Large Scale Visual Recognition Challenge

Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, Li Fei-Fei (2014)

Affiliations: Stanford University, Stanford, CA, USA; University of Michigan, Ann Arbor, MI, USA; Massachusetts Institute of Technology, Cambridge, MA, USA; UNC Chapel Hill, Chapel Hill, NC, USA

Paper Information

arXiv ID

1409.0575

Venue

International Journal of Computer Vision

Domain

Not specified

Contents

Abstract
Methods
Datasets
Results
Limitations
Related Work
External Resources

Abstract

The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare the state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the five years of the challenge, and propose future directions and improvements.

Summary

This paper presents the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a benchmark for object classification and detection covering hundreds of categories and millions of images. It describes how the dataset was created, the difficulties of collecting and annotating ground truth at this scale, and the advances in object recognition that the challenge made possible. The authors analyze the progress made over the years of the challenge, compare state-of-the-art algorithms with human accuracy, and propose future directions for research in large-scale image recognition.

Methods

This paper employs the following methods:

Crowdsourcing
Deep Learning
Convolutional Neural Networks

Models Used

GoogLeNet
SuperVision

Datasets

The following datasets were used in this research:

ImageNet

Evaluation Metrics

Top-5 Error
Mean Average Precision
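
The two metrics listed above can be illustrated with a minimal Python sketch. It assumes predictions arrive as ranked lists of label strings and that detections have already been matched to ground truth; the function names are hypothetical, and the average precision shown is one common non-interpolated variant, not necessarily the exact protocol used in the challenge.

```python
def top5_error(predictions, labels):
    """Fraction of images whose true label is absent from the first five guesses.

    predictions: list of ranked label lists, best guess first (assumed format).
    labels: list of ground-truth labels, one per image.
    """
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal length")
    misses = sum(1 for guesses, truth in zip(predictions, labels)
                 if truth not in guesses[:5])
    return misses / len(labels)


def average_precision(scores, is_true_positive, num_ground_truth):
    """Non-interpolated average precision for one class.

    scores: confidence of each detection.
    is_true_positive: whether each detection was matched to a ground-truth object
        (the matching step itself is assumed to have been done elsewhere).
    num_ground_truth: total number of ground-truth objects of this class.
    """
    if num_ground_truth == 0:
        return 0.0
    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if is_true_positive[i]:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / num_ground_truth


def mean_average_precision(per_class_ap):
    """Mean of per-class average precision values."""
    if not per_class_ap:
        raise ValueError("need at least one class")
    return sum(per_class_ap) / len(per_class_ap)
```

Top-5 error counts a prediction as correct when the true label appears anywhere in the five highest-ranked guesses, so it is more forgiving than top-1 error on images with several plausible labels.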

Results

Image classification error reduced from 28.2% (ILSVRC2010) to 6.7% (ILSVRC2014)
Single-object localization error reduced from 42.5% (ILSVRC2011) to 25.3% (ILSVRC2014)
Object detection mean average precision increased from 22.6% in 2013 to 43.9% in 2014
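
To put the figures listed above on a common scale, the short sketch below computes the relative change for each task. The helper names are hypothetical and the inputs are only the percentages quoted in this section.

```python
def relative_reduction(before, after):
    """Relative drop of an error rate, as a fraction of the starting value."""
    return (before - after) / before


def improvement_factor(before, after):
    """Multiplicative gain of a score such as mean average precision."""
    return after / before


# Percentages as quoted in the Results list.
classification = relative_reduction(28.2, 6.7)
localization = relative_reduction(42.5, 25.3)
detection = improvement_factor(22.6, 43.9)

print(f"classification error cut by {classification:.1%}")
print(f"localization error cut by {localization:.1%}")
print(f"detection mAP multiplied by {detection:.2f}")
```

Relative reduction is used for the error rates because lower is better there, while the detection result is a score where higher is better, so a ratio is the more natural summary.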

Limitations

The authors identified the following limitations:

Annotation errors from non-expert crowd labelers
Challenges of scaling and designing annotations for large datasets

Technical Requirements

Number of GPUs: None specified
GPU Type: None specified

External Resources

Funding: Not specified
References: 124
Influential Citations: 4673
