
Going deeper with convolutions

Christian Szegedy (Google Inc.), Wei Liu (University of North Carolina, Chapel Hill), Yangqing Jia (Google Inc.), Pierre Sermanet (Google Inc.), Scott Reed (University of Michigan), Dragomir Anguelov (Google Inc.), Dumitru Erhan (Google Inc.), Vincent Vanhoucke (Google Inc.), Andrew Rabinovich (Google Inc.) (2014)

Paper Information
arXiv ID
1409.4842
Venue
Computer Vision and Pattern Recognition
Domain
computer vision
SOTA Claim
Yes

Abstract

We propose a deep convolutional neural network architecture codenamed Inception, which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. This was achieved by a carefully crafted design that allows for increasing the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.

Summary

The paper presents a deep convolutional neural network architecture called Inception, specifically the GoogLeNet model, which achieved state-of-the-art results in the ImageNet Large-Scale Visual Recognition Challenge 2014. The architecture emphasizes improved utilization of computing resources through intricate design, increasing both depth and width without exceeding computational budgets. Key insights include using 1x1 convolutions for dimensionality reduction and the incorporation of multi-scale processing. The model achieves significant results in both classification and detection tasks, outperforming previous architectures while using fewer parameters. The paper discusses the importance of efficient architectural choices in the context of mobile and embedded environments, highlighting the balance between accuracy and computational efficiency. The results demonstrate that approximating optimal sparse structures with dense components can yield competitive performance in object detection and image classification tasks, reinforcing the efficacy of the Inception architecture.
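The dimensionality-reduction claim above can be made concrete with a little arithmetic: placing a cheap 1x1 "reduce" convolution before an expensive 5x5 convolution shrinks the channel count the 5x5 filter sees, cutting its weight count by roughly an order of magnitude. A minimal sketch, using hypothetical channel counts chosen only for illustration (not the paper's exact layer sizes):

```python
# Illustrative arithmetic: why a 1x1 "reduce" layer cuts the cost of a 5x5
# convolution. Channel counts here are hypothetical, for illustration only.
def conv_params(c_in, c_out, k):
    """Weight count of a k x k convolution (biases ignored)."""
    return c_in * c_out * k * k

c_in, c_out = 192, 32                                # 5x5 branch in/out channels
direct = conv_params(c_in, c_out, 5)                 # 5x5 applied directly
reduced = (conv_params(c_in, 16, 1)                  # 1x1 reduction to 16 channels
           + conv_params(16, c_out, 5))              # 5x5 on the reduced input

print(direct)   # 153600 weights
print(reduced)  # 15872 weights, ~10x fewer
```

The same ratio carries over to multiply-accumulate operations, since both layers slide over the same spatial grid.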

Methods

This paper employs the following methods:

  • Convolutional Neural Network (CNN)
  • Inception Module
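The structural idea behind the Inception module listed above is four parallel branches (1x1, 3x3, and 5x5 convolutions plus a pooled projection) whose outputs are concatenated along the channel axis. A minimal shape-bookkeeping sketch, with each branch stubbed as a random feature map and hypothetical per-branch channel counts:

```python
import numpy as np

# Sketch of an Inception module's output structure: parallel branches
# concatenated channel-wise. Real branches apply 1x1/3x3/5x5 convolutions
# and pooling; here each branch is stubbed to show only the shape logic.
def inception_concat(h, w, branch_channels=(64, 128, 32, 32)):
    branches = [np.random.rand(c, h, w) for c in branch_channels]
    return np.concatenate(branches, axis=0)  # stack along the channel axis

out = inception_concat(28, 28)
print(out.shape)  # (256, 28, 28): 64 + 128 + 32 + 32 channels
```

Because every branch preserves spatial resolution (via padding and stride 1), concatenation is well defined, and the next module consumes the combined channel stack.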

Models Used

  • GoogLeNet

Datasets

The following datasets were used in this research:

  • ImageNet

Evaluation Metrics

  • Top-1 Accuracy
  • Top-5 Error Rate
  • Mean Average Precision (mAP)
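The two classification metrics above can be computed from raw classifier scores in a few lines; top-k error counts a sample as correct if the true class appears among its k highest-scoring predictions. A minimal sketch with toy scores (mAP, the detection metric, involves box matching and is omitted):

```python
import numpy as np

# Top-k error from classifier scores.
# `scores`: (n_samples, n_classes); `labels`: true class index per sample.
def top_k_error(scores, labels, k):
    top_k = np.argsort(scores, axis=1)[:, -k:]      # k best classes per sample
    hits = np.any(top_k == labels[:, None], axis=1)  # true class among them?
    return 1.0 - hits.mean()

scores = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.2, 0.6]])
labels = np.array([1, 1, 2])  # second sample's true class only ranks 2nd

print(top_k_error(scores, labels, 1))  # ~0.333: one of three missed at top-1
print(top_k_error(scores, labels, 2))  # 0.0: all recovered within top-2
```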

Results

  • Top-5 error of 6.67% in ILSVRC 2014 classification
  • Mean Average Precision (mAP) of 38.02% in ILSVRC 2014 detection

Limitations

The authors identified the following limitations:

  • Not specified

Technical Requirements

  • Number of GPUs: None specified
  • GPU Type: None specified

Keywords

Inception architecture, GoogLeNet, deep learning, convolutional neural networks, ImageNet
