← ML Research Wiki / 1409.1556

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan and Andrew Zisserman, Visual Geometry Group, University of Oxford (2014)

Paper Information
arXiv ID
1409.1556
Venue
International Conference on Learning Representations
Domain
Computer vision
SOTA Claim
Yes
Reproducibility
8/10

Abstract

In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve the state-of-the-art results. Importantly, we have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

Summary

In this paper, the authors study how convolutional network (ConvNet) depth affects accuracy in large-scale image recognition, and report a significant improvement over prior-art configurations when the depth is pushed to 16-19 weight layers. These findings were the basis of their ImageNet Challenge 2014 submission, which took first place in the localisation track and second place in the classification track. The two best-performing models, referred to here as "Net-D" and "Net-E", are evaluated on ILSVRC-2012 classification and have been released publicly to support further research on deep visual representations. The authors also show that the learned representations generalise well to other datasets, where they achieve state-of-the-art results. The paper details the architectures, the training and evaluation procedure, the implementation, and the improvements over previous ConvNet designs.

Methods

This paper employs the following methods:

  • Convolutional Neural Networks

Models Used

  • Net-D
  • Net-E
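The "16-19 weight layers" figure from the abstract can be checked with a short sketch. The layer layouts below are an assumption: they follow the widely reproduced configurations D and E (3x3 convolutions separated by max-pooling, followed by three fully connected layers) and are not stated on this page. The names CONFIGS, NUM_FC_LAYERS and count_weight_layers are hypothetical, chosen only for illustration.

```python
# Assumed layouts for configurations D and E (not taken from this page).
# An integer is the number of output channels of one convolutional layer;
# "M" marks a max-pooling layer, which has no weights.
CONFIGS = {
    "D": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
          512, 512, 512, "M", 512, 512, 512, "M"],
    "E": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
          512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}

# Assumption: each network ends with three fully connected layers.
NUM_FC_LAYERS = 3


def count_weight_layers(config):
    """Count layers with learnable weights: conv layers plus FC layers."""
    conv_layers = sum(1 for item in config if item != "M")
    return conv_layers + NUM_FC_LAYERS


depths = {name: count_weight_layers(cfg) for name, cfg in CONFIGS.items()}
print(depths)
```

Under these assumptions the count comes to 16 weight layers for Net-D and 19 for Net-E, matching the depth range quoted in the abstract.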

Datasets

The following datasets were used in this research:

  • ImageNet

Evaluation Metrics

  • top-1 error
  • top-5 error
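Both metrics are instances of top-k error: the fraction of images whose true label is not among the k classes the model scores highest. A minimal sketch in plain Python follows; the function name top_k_error and the toy scores are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def top_k_error(scores: Sequence[Sequence[float]],
                labels: Sequence[int],
                k: int) -> float:
    """Fraction of samples whose true label is not in the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the first k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Toy data: 4 samples, 3 classes (so k=2 stands in for top-5 here).
scores = [
    [0.1, 0.7, 0.2],  # true class 1: ranked first
    [0.5, 0.3, 0.2],  # true class 1: ranked second
    [0.6, 0.3, 0.1],  # true class 2: ranked last
    [0.2, 0.2, 0.6],  # true class 2: ranked first
]
labels = [1, 1, 2, 2]

top1 = top_k_error(scores, labels, k=1)
top2 = top_k_error(scores, labels, k=2)
print(top1, top2)
```

With k=1 this gives the top-1 error; with k=5 on a 1000-class problem such as ILSVRC it gives the top-5 error. Because each larger k only adds candidate classes, the top-5 error can never exceed the top-1 error.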

Results

  • Improved accuracy with deeper ConvNets on ILSVRC-2012
  • Took first place in the localisation track and second place in the classification track of the ImageNet Challenge 2014

Limitations

The authors identified the following limitations:

  • Not specified

Technical Requirements

  • Number of GPUs: 4
  • GPU Type: NVIDIA Titan Black

Keywords

deep learning, convolutional neural networks, ImageNet, large-scale recognition
