Venue
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
In this paper we describe a new mobile architecture, MobileNetV2, that improves the state-of-the-art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. We also describe efficient ways of applying these mobile models to object detection in a novel framework we call SSDLite. Additionally, we demonstrate how to build mobile semantic segmentation models through a reduced form of DeepLabv3 which we call Mobile DeepLabv3.

MobileNetV2 is based on an inverted residual structure where the shortcut connections are between the thin bottleneck layers. The intermediate expansion layer uses lightweight depthwise convolutions to filter features as a source of non-linearity. Additionally, we find that it is important to remove non-linearities in the narrow layers in order to maintain representational power. We demonstrate that this improves performance and provide an intuition that led to this design. Finally, our approach allows decoupling of the input/output domains from the expressiveness of the transformation, which provides a convenient framework for further analysis. We measure our performance on ImageNet [1] classification, COCO object detection [2], and VOC image segmentation [3]. We evaluate the trade-offs between accuracy and the number of operations measured by multiply-adds (MAdds), as well as actual latency and the number of parameters.
The paper presents MobileNetV2, a novel mobile architecture that improves state-of-the-art performance for various mobile tasks and benchmarks while optimizing resource efficiency. The architecture is built on an inverted residual structure, leveraging linear bottlenecks that facilitate effective feature representation through an innovative and efficient convolutional block design. The proposed models demonstrate superior performance in image classification (ImageNet), object detection (COCO), and image segmentation (VOC), achieving higher accuracy and lower computational costs compared to existing models like MobileNetV1, ShuffleNet, and YOLOv2. The authors introduce methods like SSDLite for mobile object detection, emphasizing efficiency in memory usage and processing speed.
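The inverted residual block described above (a 1×1 expansion with ReLU6, a lightweight 3×3 depthwise convolution, and a linear 1×1 projection back to the thin bottleneck, with the shortcut connecting the bottlenecks) can be sketched as follows. This is an illustrative NumPy implementation of the stride-1 case only, not the authors' code; the tensor shapes and weight arrays are hypothetical, while the expansion factor of 6 follows the paper's default.

```python
import numpy as np

def relu6(x):
    # ReLU6 non-linearity used throughout MobileNetV2
    return np.clip(x, 0.0, 6.0)

def conv1x1(x, w):
    # Pointwise convolution: x is (H, W, C_in), w is (C_in, C_out)
    return x @ w

def depthwise3x3(x, w):
    # Depthwise 3x3 convolution, stride 1, 'same' zero padding.
    # x is (H, W, C); w is (3, 3, C) — one 3x3 filter per channel.
    H, W, C = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += xp[i:i + H, j:j + W, :] * w[i, j, :]
    return out

def inverted_residual(x, w_expand, w_dw, w_project):
    # 1) Expand the thin input to a wide representation, with ReLU6.
    h = relu6(conv1x1(x, w_expand))
    # 2) Filter spatially with a cheap depthwise convolution, with ReLU6.
    h = relu6(depthwise3x3(h, w_dw))
    # 3) Project back to the bottleneck *linearly* (no non-linearity here,
    #    matching the paper's linear-bottleneck finding).
    h = conv1x1(h, w_project)
    # 4) Shortcut connects the two thin bottlenecks.
    return x + h

# Hypothetical shapes: 8x8 feature map, 16 bottleneck channels, expansion 6.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 16))
w_expand = rng.standard_normal((16, 96)) * 0.1   # 16 -> 16*6 channels
w_dw = rng.standard_normal((3, 3, 96)) * 0.1
w_project = rng.standard_normal((96, 16)) * 0.1  # back to 16 channels
y = inverted_residual(x, w_expand, w_dw, w_project)
```

Note that the residual add requires the input and output bottlenecks to have the same shape; in the full architecture, stride-2 blocks omit the shortcut.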
This paper employs the following methods:
- Inverted Residuals
- Linear Bottlenecks
- Depthwise Separable Convolutions
- MobileNetV2
- MobileNetV1
- DeepLabv3
- SSDLite
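As a rough illustration of why depthwise separable convolutions (listed above) underpin both MobileNet generations, the multiply-add cost of a standard 3×3 convolution can be compared with its separable counterpart (depthwise 3×3 followed by pointwise 1×1). The layer sizes below are hypothetical examples for illustration, not figures from the paper.

```python
def standard_conv_madds(h, w, k, c_in, c_out):
    # Full convolution: every output channel sees every input channel.
    return h * w * k * k * c_in * c_out

def separable_conv_madds(h, w, k, c_in, c_out):
    # Depthwise: one k x k filter per input channel.
    depthwise = h * w * k * k * c_in
    # Pointwise: 1x1 convolution mixing channels.
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise

# Hypothetical layer: 112x112 feature map, 3x3 kernel, 32 -> 64 channels.
std = standard_conv_madds(112, 112, 3, 32, 64)
sep = separable_conv_madds(112, 112, 3, 32, 64)
ratio = std / sep  # approaches k^2 = 9x as c_out grows
```

The saving ratio is k²·c_out / (k² + c_out), so for a 3×3 kernel it approaches 9× for wide layers; here it is roughly 7.9×.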
The following datasets were used in this research:
- ImageNet (image classification)
- COCO (object detection)
- PASCAL VOC (image segmentation)
The paper reports the following key results:
- Achieved state-of-the-art accuracy on ImageNet
- Outperformed YOLOv2 on COCO object detection
- MobileNetV2 SSDLite is 20× more efficient and 10× smaller than YOLOv2
The following training hardware was reported:
- Number of GPUs: 16
- GPU type: not specified
Keywords:
- MobileNetV2
- inverted residuals
- linear bottlenecks
- depthwise separable convolutions
- efficient neural networks