
Squeeze-and-Excitation Networks

Jie Hu, Li Shen, Gang Sun (University of Oxford; Momenta), 2017

Paper Information
arXiv ID: 1709.01507
Venue: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
Domain: artificial intelligence, deep learning
SOTA Claim: Yes
Reproducibility: 7/10

Abstract

Convolutional neural networks are built upon the convolution operation, which extracts informative features by fusing spatial and channel-wise information together within local receptive fields. In order to boost the representational power of a network, much existing work has shown the benefits of enhancing spatial encoding. In this work, we focus on channels and propose a novel architectural unit, which we term the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels. We demonstrate that by stacking these blocks together, we can construct SENet architectures that generalise extremely well across challenging datasets. Crucially, we find that SE blocks produce significant performance improvements for existing state-of-the-art deep architectures at slight computational cost. SENets formed the foundation of our ILSVRC 2017 classification submission which won first place and significantly reduced the top-5 error to 2.251%, achieving a ∼25% relative improvement over the winning entry of 2016.

Summary

This paper presents Squeeze-and-Excitation Networks (SENets), built from a novel architectural unit, the Squeeze-and-Excitation (SE) block, which enhances convolutional neural networks (CNNs) by adaptively recalibrating channel-wise feature responses. The SE block improves representational power by explicitly modeling interdependencies among channels, yielding networks that generalize well across various datasets. The authors show that stacking SE blocks leads to performance improvements at minimal computational cost, as demonstrated by their winning submission to the ILSVRC 2017 classification challenge, which achieved a top-5 error of 2.251%, a roughly 25% relative improvement over the 2016 winning entry. Extensive evaluations on the ImageNet 2012 dataset highlight the effectiveness of SENets, showing consistent performance gains across architectures such as SE-ResNet, SE-Inception, and SE-ResNeXt, and confirming their broad applicability in CNN design.
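
A brief sketch of this recalibration, following the paper's formulation: for a feature map with channels $\mathbf{u}_c$ of spatial size $H \times W$, reduction ratio $r$, ReLU $\delta$, sigmoid $\sigma$, and bottleneck fully connected weights $\mathbf{W}_1 \in \mathbb{R}^{\frac{C}{r} \times C}$ and $\mathbf{W}_2 \in \mathbb{R}^{C \times \frac{C}{r}}$, the squeeze, excitation, and rescaling steps are

$$z_c = \mathbf{F}_{sq}(\mathbf{u}_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)$$

$$\mathbf{s} = \mathbf{F}_{ex}(\mathbf{z}, \mathbf{W}) = \sigma\big(\mathbf{W}_2\, \delta(\mathbf{W}_1 \mathbf{z})\big)$$

$$\widetilde{\mathbf{x}}_c = \mathbf{F}_{scale}(\mathbf{u}_c, s_c) = s_c \cdot \mathbf{u}_c$$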

Methods

This paper employs the following methods, which are combined in the code sketch after this list:

  • Squeeze-and-Excitation (SE) block
  • Global average pooling
  • Sigmoid gating mechanism
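
Taken together, these operations form the SE block. Below is a minimal PyTorch-style sketch, assuming the paper's default reduction ratio of 16 (the class name `SEBlock` is illustrative, not from an official release):

```python
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-Excitation block: squeeze via global average pooling,
    excitation via a two-layer bottleneck with sigmoid gating, then
    channel-wise rescaling of the input feature map."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Squeeze: global average pooling over the spatial dimensions.
        z = x.mean(dim=(2, 3))                                 # (b, c)
        # Excitation: bottleneck FC -> ReLU -> FC -> sigmoid gating.
        s = torch.sigmoid(self.fc2(torch.relu(self.fc1(z))))   # (b, c)
        # Recalibration: rescale each channel by its learned weight.
        return x * s.view(b, c, 1, 1)
```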

Models Used

  • SE-ResNet (its integration into a residual block is sketched after this list)
  • SE-Inception
  • SE-ResNeXt
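
These variants attach an SE block to each building block of the base architecture; in SE-ResNet the recalibration is applied to the residual branch before it is summed with the identity shortcut. A simplified sketch under that assumption (a plain two-convolution residual block rather than the bottleneck block of SE-ResNet-50, reusing the `SEBlock` sketch above):

```python
import torch
import torch.nn as nn


class SEResidualBlock(nn.Module):
    """Illustrative SE-ResNet-style block: SE recalibration is applied to the
    residual branch before the identity addition. Assumes the SEBlock class
    from the sketch in the Methods section."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.se = SEBlock(channels, reduction)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))
        residual = self.se(residual)      # channel-wise recalibration
        return self.relu(x + residual)    # identity shortcut, then ReLU
```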

Datasets

The following datasets were used in this research:

  • ImageNet 2012
  • Places365-Challenge

Evaluation Metrics

  • Top-1 error
  • Top-5 error (see the top-k error sketch after this list)
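
Both are standard ImageNet error rates. A minimal sketch of how top-k error is typically computed (the function name `topk_error` is illustrative; top-1 corresponds to k=1 and top-5 to k=5):

```python
import torch


def topk_error(logits: torch.Tensor, targets: torch.Tensor, k: int = 5) -> float:
    """Fraction of examples whose true label is not among the k highest-scoring
    predictions. `logits` has shape (batch, num_classes), `targets` shape (batch,)."""
    topk_preds = logits.topk(k, dim=1).indices                  # (batch, k)
    correct = (topk_preds == targets.unsqueeze(1)).any(dim=1)   # (batch,)
    return 1.0 - correct.float().mean().item()
```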

Results

  • SENets won first place in ILSVRC 2017
  • Achieved a 2.251% top-5 error with the ILSVRC 2017 classification submission
  • Consistent performance improvements across various architectures

Limitations

The authors identified the following limitations:

  • None specified

Technical Requirements

  • Number of GPUs: 8
  • GPU Type: NVIDIA Titan X

Keywords

Squeeze-and-Excitation, SE block, convolutional neural networks, channel-wise feature recalibration, attention mechanism, ImageNet
