Rethinking the Inception Architecture for Computer Vision

Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens (Google Inc); Zbigniew Wojna (University College London) (2015)

Paper Information
arXiv ID
1512.00567
Venue
Computer Vision and Pattern Recognition
Domain
Not specified
SOTA Claim
Yes
Reproducibility
7/10

Abstract

Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks. Since 2014 very deep convolutional networks have become mainstream, yielding substantial gains in various benchmarks. Although increased model size and computational cost tend to translate to immediate quality gains for most tasks (as long as enough labeled data is provided for training), computational efficiency and low parameter count are still enabling factors for various use cases such as mobile vision and big-data scenarios. Here we explore ways to scale up networks that aim at utilizing the added computation as efficiently as possible, through suitably factorized convolutions and aggressive regularization. We benchmark our methods on the ILSVRC 2012 classification challenge validation set and demonstrate substantial gains over the state of the art: 21.2% top-1 and 5.6% top-5 error for single-frame evaluation using a network with a computational cost of 5 billion multiply-adds per inference and with less than 25 million parameters. With an ensemble of 4 models and multi-crop evaluation, we report 3.5% top-5 error and 17.3% top-1 error.

Summary

The paper "Rethinking the Inception Architecture for Computer Vision" by Christian Szegedy et al. discusses advancements in convolutional neural networks, specifically focusing on the Inception architecture. It emphasizes improving model performance while preserving computational efficiency, particularly in constrained environments such as mobile vision. The authors propose several design principles for scaling convolutional networks and outline methods for factorizing convolutions to reduce computation. They benchmark their approach on the ILSVRC 2012 classification challenge validation set, reporting significant improvements over prior state-of-the-art results, including a top-1 error rate of 21.2% and a top-5 error rate of 5.6% with a network that has a low parameter count. They detail enhancements in the Inception-v3 architecture, which builds upon Inception-v2 by incorporating factorized convolutions, grid-size reduction techniques, and auxiliary classifiers, achieving high performance while minimizing computational cost. The paper also introduces a label-smoothing technique that regularizes model predictions and further improves accuracy. Overall, the authors contribute valuable insights into the design and optimization of deep learning architectures for computer vision tasks, showing that low computational cost can coexist with high accuracy.
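The label-smoothing regularization mentioned above replaces one-hot targets with a mixture of the ground-truth distribution and a uniform prior over classes (the paper uses ε = 0.1 over 1000 ImageNet classes). A minimal sketch, using a 4-class toy example rather than the paper's setup:

```python
import numpy as np

def smooth_labels(labels, num_classes, epsilon=0.1):
    """Mix one-hot targets with a uniform distribution:
    true class gets (1 - eps) + eps/K, every other class gets eps/K."""
    onehot = np.eye(num_classes)[labels]
    return (1.0 - epsilon) * onehot + epsilon / num_classes

# Toy example: one sample whose true class is 2, K = 4 classes
q = smooth_labels(np.array([2]), num_classes=4, epsilon=0.1)
# true class: 0.9 + 0.025 = 0.925; each other class: 0.025; rows still sum to 1
```

Training against these softened targets penalizes the network for becoming over-confident in the largest logit, which the paper reports as a small but consistent accuracy gain.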

Methods

This paper employs the following methods:

  • Convolutional Neural Networks
  • Factorized Convolutions
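The savings from factorized convolutions can be seen from a simple parameter count: two stacked 3x3 convolutions cover the same receptive field as one 5x5 at lower cost, and a 3x3 can be further split into a 1x3 followed by a 3x1. A back-of-the-envelope sketch (channel count `C` is an arbitrary illustrative value; bias terms omitted):

```python
def conv_params(k_h, k_w, c_in, c_out):
    """Weight count of a single conv layer (biases omitted)."""
    return k_h * k_w * c_in * c_out

C = 64  # hypothetical channel count, held constant across layers

# One 5x5 conv vs. two stacked 3x3 convs (same receptive field)
five = conv_params(5, 5, C, C)            # 25 * C^2
two_threes = 2 * conv_params(3, 3, C, C)  # 18 * C^2
print(f"5x5 -> 2x(3x3): {1 - two_threes / five:.0%} fewer weights")  # 28%

# One 3x3 conv vs. an asymmetric 1x3 then 3x1 pair
three = conv_params(3, 3, C, C)                           # 9 * C^2
asym = conv_params(1, 3, C, C) + conv_params(3, 1, C, C)  # 6 * C^2
print(f"3x3 -> 1x3 + 3x1: {1 - asym / three:.0%} fewer weights")  # 33%
```

These ratios are independent of `C`; the same factorizations also cut multiply-add counts proportionally, which is what lets Inception-v3 stay under 25 million parameters.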

Models Used

  • Inception-v2
  • Inception-v3

Datasets

The following datasets were used in this research:

  • ILSVRC 2012

Evaluation Metrics

  • Top-1 error
  • Top-5 error
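Both metrics follow the standard ILSVRC definition: a prediction counts as correct at k if the true label appears among the k highest-scoring classes. A small sketch with made-up scores (not from the paper):

```python
import numpy as np

def top_k_error(logits, labels, k):
    """Fraction of samples whose true label is NOT among the k highest scores."""
    topk = np.argsort(-logits, axis=1)[:, :k]   # indices of the k largest scores
    hits = (topk == labels[:, None]).any(axis=1)
    return 1.0 - hits.mean()

# Toy scores for 4 samples over 5 classes (illustrative values only)
logits = np.array([
    [0.1, 0.9, 0.0, 0.0, 0.0],
    [0.8, 0.1, 0.1, 0.0, 0.0],
    [0.2, 0.3, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.4, 0.5],
])
labels = np.array([1, 0, 1, 3])
print(top_k_error(logits, labels, 1))  # 0.5 (last two samples miss at k=1)
print(top_k_error(logits, labels, 2))  # 0.0 (all true labels are in the top 2)
```

Top-5 error is always at most the top-1 error, which is why the paper's headline numbers (21.2% top-1, 5.6% top-5) differ so widely.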

Results

  • 21.2% top-1 error (single-frame evaluation)
  • 5.6% top-5 error (single-frame evaluation)
  • 3.5% top-5 error and 17.3% top-1 error with a 4-model ensemble and multi-crop evaluation

Limitations

The authors identified the following limitations:

  • Not specified

Technical Requirements

  • Number of GPUs: 50
  • GPU Type: NVidia Kepler

Papers Using Similar Methods

External Resources