
Going deeper with convolutions

Christian Szegedy (Google Inc.), Wei Liu (University of North Carolina, Chapel Hill), Yangqing Jia (Google Inc.), Pierre Sermanet (Google Inc.), Scott Reed (University of Michigan), Dragomir Anguelov (Google Inc.), Dumitru Erhan (Google Inc.), Vincent Vanhoucke (Google Inc.), Andrew Rabinovich (Google Inc.) (2014)

Paper Information
arXiv ID
1409.4842
Venue
Computer Vision and Pattern Recognition
Domain
computer vision
SOTA Claim
Yes

Abstract

We propose a deep convolutional neural network architecture codenamed Inception, which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. This was achieved by a carefully crafted design that allows for increasing the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.

Summary

The paper presents a deep convolutional neural network architecture called Inception, specifically the GoogLeNet model, which achieved state-of-the-art results in the ImageNet Large-Scale Visual Recognition Challenge 2014. The architecture emphasizes improved utilization of computing resources through intricate design, increasing both depth and width without exceeding computational budgets. Key insights include using 1x1 convolutions for dimensionality reduction and the incorporation of multi-scale processing. The model achieves significant results in both classification and detection tasks, outperforming previous architectures while using fewer parameters. The paper discusses the importance of efficient architectural choices in the context of mobile and embedded environments, highlighting the balance between accuracy and computational efficiency. The results demonstrate that approximating optimal sparse structures with dense components can yield competitive performance in object detection and image classification tasks, reinforcing the efficacy of the Inception architecture.
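The dimensionality-reduction claim above can be made concrete with a little arithmetic: placing a cheap 1x1 "reduce" convolution before an expensive 5x5 convolution shrinks the channel count the 5x5 filter sees, cutting its weight count by roughly an order of magnitude. A minimal sketch, using hypothetical channel counts chosen only for illustration (not the paper's exact layer sizes):

```python
# Illustrative arithmetic: why a 1x1 "reduce" layer cuts the cost of a 5x5
# convolution. Channel counts here are hypothetical, for illustration only.
def conv_params(c_in, c_out, k):
    """Weight count of a k x k convolution (biases ignored)."""
    return c_in * c_out * k * k

c_in, c_out = 192, 32                                # 5x5 branch in/out channels
direct = conv_params(c_in, c_out, 5)                 # 5x5 applied directly
reduced = (conv_params(c_in, 16, 1)                  # 1x1 reduction to 16 channels
           + conv_params(16, c_out, 5))              # 5x5 on the reduced input

print(direct)   # 153600 weights
print(reduced)  # 15872 weights, ~10x fewer
```

The same ratio carries over to multiply-accumulate operations, since both layers slide over the same spatial grid.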

Methods

This paper employs the following methods:

  • Convolutional Neural Network (CNN)
  • Inception Module
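The structural idea behind the Inception module listed above is four parallel branches (1x1, 3x3, and 5x5 convolutions plus a pooled projection) whose outputs are concatenated along the channel axis. A minimal shape-bookkeeping sketch, with each branch stubbed as a random feature map and hypothetical per-branch channel counts:

```python
import numpy as np

# Sketch of an Inception module's output structure: parallel branches
# concatenated channel-wise. Real branches apply 1x1/3x3/5x5 convolutions
# and pooling; here each branch is stubbed to show only the shape logic.
def inception_concat(h, w, branch_channels=(64, 128, 32, 32)):
    branches = [np.random.rand(c, h, w) for c in branch_channels]
    return np.concatenate(branches, axis=0)  # stack along the channel axis

out = inception_concat(28, 28)
print(out.shape)  # (256, 28, 28): 64 + 128 + 32 + 32 channels
```

Because every branch preserves spatial resolution (via padding and stride 1), concatenation is well defined, and the next module consumes the combined channel stack.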

Models Used

  • GoogLeNet

Datasets

The following datasets were used in this research:

  • ImageNet

Evaluation Metrics

  • Top-1 Accuracy
  • Top-5 Error Rate
  • Mean Average Precision (mAP)
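The two classification metrics above can be computed from raw classifier scores in a few lines; top-k error counts a sample as correct if the true class appears among its k highest-scoring predictions. A minimal sketch with toy scores (mAP, the detection metric, involves box matching and is omitted):

```python
import numpy as np

# Top-k error from classifier scores.
# `scores`: (n_samples, n_classes); `labels`: true class index per sample.
def top_k_error(scores, labels, k):
    top_k = np.argsort(scores, axis=1)[:, -k:]      # k best classes per sample
    hits = np.any(top_k == labels[:, None], axis=1)  # true class among them?
    return 1.0 - hits.mean()

scores = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.2, 0.6]])
labels = np.array([1, 1, 2])  # second sample's true class only ranks 2nd

print(top_k_error(scores, labels, 1))  # ~0.333: one of three missed at top-1
print(top_k_error(scores, labels, 2))  # 0.0: all recovered within top-2
```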

Results

  • Top-5 error of 6.67% in ILSVRC 2014 classification
  • Mean Average Precision (mAP) of 38.02% in ILSVRC 2014 detection

Limitations

The authors identified the following limitations:

  • Not specified

Technical Requirements

  • Number of GPUs: None specified
  • GPU Type: None specified

Keywords

Inception architecture, GoogLeNet, deep learning, convolutional neural networks, ImageNet
