MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam (Google Inc., 2017)

Paper Information
  • arXiv ID: 1704.04861
  • Venue: arXiv.org
  • Domain: computer vision
  • SOTA Claim: Yes
  • Reproducibility: 5/10

Abstract

We present a class of efficient models called MobileNets for mobile and embedded vision applications. MobileNets are based on a streamlined architecture that uses depthwise separable convolutions to build lightweight deep neural networks. We introduce two simple global hyperparameters that efficiently trade off between latency and accuracy. These hyperparameters allow the model builder to choose the right-sized model for their application based on the constraints of the problem. We present extensive experiments on resource and accuracy tradeoffs and show strong performance compared to other popular models on ImageNet classification. We then demonstrate the effectiveness of MobileNets across a wide range of applications and use cases including object detection, fine-grained classification, face attributes and large-scale geo-localization.
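
As a quick reference, the efficiency argument behind the abstract follows from the paper's cost expressions, restated below (D_K is the kernel size, M and N the input and output channel counts, D_F the feature-map width and height):

```latex
% Per-layer mult-add cost, in the paper's notation
\text{standard convolution:} \quad D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F
\text{depthwise separable:}  \quad D_K \cdot D_K \cdot M \cdot D_F \cdot D_F \;+\; M \cdot N \cdot D_F \cdot D_F
\text{reduction factor:}     \quad \frac{1}{N} + \frac{1}{D_K^2}
```

With 3×3 kernels this works out to roughly 8 to 9 times less computation than a standard convolution, at only a small reduction in accuracy according to the paper.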

Summary

This paper presents MobileNets, a class of efficient models for mobile and embedded vision applications, built on a streamlined architecture of depthwise separable convolutions. The authors introduce two global hyperparameters, the width multiplier and the resolution multiplier, which let model builders trade off latency against accuracy to suit their specific application. Extensive experiments show that MobileNets approach the accuracy of other popular models on ImageNet classification while using far fewer parameters and mult-adds. The authors also demonstrate MobileNets' versatility across applications including object detection, fine-grained classification, face attribute classification, and large-scale geo-localization. Overall, MobileNets deliver performance comparable to much larger models at a fraction of the computational and size cost.
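
The two multipliers act on the per-layer cost directly: the width multiplier α thins the number of input and output channels of every layer, while the resolution multiplier ρ shrinks the feature maps by reducing the input resolution. Below is a minimal Python sketch of the per-layer mult-add count under both multipliers, following the paper's cost expression; the function name and the example layer are illustrative assumptions, not values taken from the paper.

```python
def separable_layer_mult_adds(dk, m, n, df, alpha=1.0, rho=1.0):
    """Approximate mult-adds of one depthwise separable layer.

    dk:    depthwise kernel size (3 in MobileNet)
    m, n:  input / output channel counts
    df:    side length of the (square) feature map
    alpha: width multiplier, thins channels in every layer
    rho:   resolution multiplier, set implicitly by the input resolution
    """
    m_a, n_a = alpha * m, alpha * n
    df_r = rho * df
    depthwise = dk * dk * m_a * df_r * df_r   # one dk x dk filter per channel
    pointwise = m_a * n_a * df_r * df_r       # 1x1 conv across channels
    return depthwise + pointwise


# Illustrative layer: 3x3 separable conv, 256 -> 256 channels, 14x14 feature map.
full = separable_layer_mult_adds(3, 256, 256, 14)
slim = separable_layer_mult_adds(3, 256, 256, 14, alpha=0.5, rho=160 / 224)
print(round(full / 1e6, 2), round(slim / 1e6, 2))  # ~13.3 vs ~1.75 million mult-adds
```

Because the pointwise term dominates, the cost falls roughly as α²ρ², which is the quadratic trade-off the paper reports for both multipliers.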

Methods

This paper employs the following methods:

  • Depthwise Separable Convolution (see the code sketch after this list)
  • Width Multiplier
  • Resolution Multiplier
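
As a concrete illustration of the first method, here is a minimal PyTorch sketch of the MobileNet building block: a 3×3 depthwise convolution followed by a 1×1 pointwise convolution, each with batch normalization and ReLU, as described in the paper. PyTorch and the class name are our assumptions for readability; the original models were trained in TensorFlow.

```python
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """Minimal sketch of the MobileNet building block:
    3x3 depthwise conv + BN + ReLU, then 1x1 pointwise conv + BN + ReLU."""

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.depthwise = nn.Sequential(
            # groups=in_ch applies one 3x3 filter per input channel
            nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                      padding=1, groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
        )
        self.pointwise = nn.Sequential(
            # 1x1 conv mixes the depthwise outputs into out_ch channels
            nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))


# Usage: one block of the early network (32 -> 64 channels at 112x112).
block = DepthwiseSeparableConv(32, 64)
out = block(torch.randn(1, 32, 112, 112))  # -> torch.Size([1, 64, 112, 112])
```

Stacking 13 such blocks (plus an initial full convolution, average pooling, and a fully connected classifier) gives the 28-layer MobileNet body described in the paper.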

Models Used

  • MobileNet

Datasets

The following datasets were used in this research:

  • ImageNet
  • Stanford Dogs
  • COCO

Evaluation Metrics

  • Accuracy
  • Mean Average Precision (mAP)

Results

  • Strong accuracy on ImageNet classification compared to other popular models, with far fewer mult-adds and parameters
  • Competitive results when applied to object detection (COCO), fine-grained classification (Stanford Dogs), face attribute classification, and large-scale geo-localization

Limitations

The authors identified the following limitations:

  • None specified

Technical Requirements

  • Number of GPUs: None specified
  • GPU Type: None specified

Keywords

MobileNets, efficient neural networks, depthwise separable convolutions, model compression, hyperparameters
