
Squeeze-and-Excitation Networks

Jie Hu, Li Shen, Gang Sun (University of Oxford; Momenta), 2017

Paper Information
arXiv ID: 1709.01507
Venue: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
Domain: artificial intelligence, deep learning
SOTA Claim: Yes
Reproducibility: 7/10

Abstract

Convolutional neural networks are built upon the convolution operation, which extracts informative features by fusing spatial and channel-wise information together within local receptive fields. In order to boost the representational power of a network, much existing work has shown the benefits of enhancing spatial encoding. In this work, we focus on channels and propose a novel architectural unit, which we term the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels. We demonstrate that by stacking these blocks together, we can construct SENet architectures that generalise extremely well across challenging datasets. Crucially, we find that SE blocks produce significant performance improvements for existing state-of-the-art deep architectures at slight computational cost. SENets formed the foundation of our ILSVRC 2017 classification submission which won first place and significantly reduced the top-5 error to 2.251%, achieving a ∼25% relative improvement over the winning entry of 2016.

Summary

This paper presents Squeeze-and-Excitation Networks (SENets), built from a novel architectural unit, the Squeeze-and-Excitation (SE) block, which enhances convolutional neural networks (CNNs) by adaptively recalibrating channel-wise feature responses. The SE block improves representational power by explicitly modeling interdependencies among channels, yielding networks that generalize well across various datasets. The authors show that stacking SE blocks leads to performance improvements at minimal computational cost, as demonstrated by their winning submission to the ILSVRC 2017 classification challenge, which achieved a top-5 error of 2.251%, a roughly 25% relative improvement over the 2016 winning entry. Extensive evaluations on the ImageNet 2012 dataset highlight the effectiveness of SENets, showing consistent performance gains across architectures such as SE-ResNet, SE-Inception, and SE-ResNeXt, and confirming their broad applicability in CNN design.
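
A brief sketch of this recalibration, following the paper's formulation: for a feature map with channels $\mathbf{u}_c$ of spatial size $H \times W$, reduction ratio $r$, ReLU $\delta$, sigmoid $\sigma$, and bottleneck fully connected weights $\mathbf{W}_1 \in \mathbb{R}^{\frac{C}{r} \times C}$ and $\mathbf{W}_2 \in \mathbb{R}^{C \times \frac{C}{r}}$, the squeeze, excitation, and rescaling steps are

$$z_c = \mathbf{F}_{sq}(\mathbf{u}_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)$$

$$\mathbf{s} = \mathbf{F}_{ex}(\mathbf{z}, \mathbf{W}) = \sigma\big(\mathbf{W}_2\, \delta(\mathbf{W}_1 \mathbf{z})\big)$$

$$\widetilde{\mathbf{x}}_c = \mathbf{F}_{scale}(\mathbf{u}_c, s_c) = s_c \cdot \mathbf{u}_c$$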

Methods

This paper employs the following methods, which are combined in the code sketch after this list:

  • Squeeze-and-Excitation (SE) block
  • Global average pooling
  • Sigmoid gating mechanism
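
Taken together, these operations form the SE block. Below is a minimal PyTorch-style sketch, assuming the paper's default reduction ratio of 16 (the class name `SEBlock` is illustrative, not from an official release):

```python
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-Excitation block: squeeze via global average pooling,
    excitation via a two-layer bottleneck with sigmoid gating, then
    channel-wise rescaling of the input feature map."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Squeeze: global average pooling over the spatial dimensions.
        z = x.mean(dim=(2, 3))                                 # (b, c)
        # Excitation: bottleneck FC -> ReLU -> FC -> sigmoid gating.
        s = torch.sigmoid(self.fc2(torch.relu(self.fc1(z))))   # (b, c)
        # Recalibration: rescale each channel by its learned weight.
        return x * s.view(b, c, 1, 1)
```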

Models Used

  • SE-ResNet (its integration into a residual block is sketched after this list)
  • SE-Inception
  • SE-ResNeXt
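
These variants attach an SE block to each building block of the base architecture; in SE-ResNet the recalibration is applied to the residual branch before it is summed with the identity shortcut. A simplified sketch under that assumption (a plain two-convolution residual block rather than the bottleneck block of SE-ResNet-50, reusing the `SEBlock` sketch above):

```python
import torch
import torch.nn as nn


class SEResidualBlock(nn.Module):
    """Illustrative SE-ResNet-style block: SE recalibration is applied to the
    residual branch before the identity addition. Assumes the SEBlock class
    from the sketch in the Methods section."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.se = SEBlock(channels, reduction)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))
        residual = self.se(residual)      # channel-wise recalibration
        return self.relu(x + residual)    # identity shortcut, then ReLU
```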

Datasets

The following datasets were used in this research:

  • ImageNet 2012
  • Places365-Challenge

Evaluation Metrics

  • Top-1 error
  • Top-5 error (see the top-k error sketch after this list)
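
Both are standard ImageNet error rates. A minimal sketch of how top-k error is typically computed (the function name `topk_error` is illustrative; top-1 corresponds to k=1 and top-5 to k=5):

```python
import torch


def topk_error(logits: torch.Tensor, targets: torch.Tensor, k: int = 5) -> float:
    """Fraction of examples whose true label is not among the k highest-scoring
    predictions. `logits` has shape (batch, num_classes), `targets` shape (batch,)."""
    topk_preds = logits.topk(k, dim=1).indices                  # (batch, k)
    correct = (topk_preds == targets.unsqueeze(1)).any(dim=1)   # (batch,)
    return 1.0 - correct.float().mean().item()
```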

Results

  • SENets won first place in ILSVRC 2017
  • Achieved a 2.251% top-5 error with the ILSVRC 2017 classification submission
  • Consistent performance improvements across various architectures

Limitations

The authors identified the following limitations:

  • None specified

Technical Requirements

  • Number of GPUs: 8
  • GPU Type: NVIDIA Titan X

Keywords

Squeeze-and-Excitation, SE block, convolutional neural networks, channel-wise feature recalibration, attention mechanism, ImageNet
