MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam (Google Inc., 2017)

Paper Information
  • arXiv ID: 1704.04861
  • Venue: arXiv.org
  • Domain: computer vision
  • SOTA Claim: Yes
  • Reproducibility: 5/10

Abstract

We present a class of efficient models called MobileNets for mobile and embedded vision applications. MobileNets are based on a streamlined architecture that uses depthwise separable convolutions to build lightweight deep neural networks. We introduce two simple global hyperparameters that efficiently trade off between latency and accuracy. These hyperparameters allow the model builder to choose the right-sized model for their application based on the constraints of the problem. We present extensive experiments on resource and accuracy tradeoffs and show strong performance compared to other popular models on ImageNet classification. We then demonstrate the effectiveness of MobileNets across a wide range of applications and use cases including object detection, fine-grained classification, face attributes and large-scale geo-localization.
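
As a quick reference, the efficiency argument behind the abstract follows from the paper's cost expressions, restated below (D_K is the kernel size, M and N the input and output channel counts, D_F the feature-map width and height):

```latex
% Per-layer mult-add cost, in the paper's notation
\text{standard convolution:} \quad D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F
\text{depthwise separable:}  \quad D_K \cdot D_K \cdot M \cdot D_F \cdot D_F \;+\; M \cdot N \cdot D_F \cdot D_F
\text{reduction factor:}     \quad \frac{1}{N} + \frac{1}{D_K^2}
```

With 3×3 kernels this works out to roughly 8 to 9 times less computation than a standard convolution, at only a small reduction in accuracy according to the paper.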

Summary

This paper presents MobileNets, a class of efficient models for mobile and embedded vision applications, built on a streamlined architecture of depthwise separable convolutions. The authors introduce two global hyperparameters, the width multiplier and the resolution multiplier, which let model builders trade off latency against accuracy to suit their specific application. Extensive experiments show that MobileNets approach the accuracy of other popular models on ImageNet classification while using far fewer parameters and mult-adds. The authors also demonstrate MobileNets' versatility across applications including object detection, fine-grained classification, face attribute classification, and large-scale geo-localization. Overall, MobileNets deliver performance comparable to much larger models at a fraction of the computational and size cost.
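
The two multipliers act on the per-layer cost directly: the width multiplier α thins the number of input and output channels of every layer, while the resolution multiplier ρ shrinks the feature maps by reducing the input resolution. Below is a minimal Python sketch of the per-layer mult-add count under both multipliers, following the paper's cost expression; the function name and the example layer are illustrative assumptions, not values taken from the paper.

```python
def separable_layer_mult_adds(dk, m, n, df, alpha=1.0, rho=1.0):
    """Approximate mult-adds of one depthwise separable layer.

    dk:    depthwise kernel size (3 in MobileNet)
    m, n:  input / output channel counts
    df:    side length of the (square) feature map
    alpha: width multiplier, thins channels in every layer
    rho:   resolution multiplier, set implicitly by the input resolution
    """
    m_a, n_a = alpha * m, alpha * n
    df_r = rho * df
    depthwise = dk * dk * m_a * df_r * df_r   # one dk x dk filter per channel
    pointwise = m_a * n_a * df_r * df_r       # 1x1 conv across channels
    return depthwise + pointwise


# Illustrative layer: 3x3 separable conv, 256 -> 256 channels, 14x14 feature map.
full = separable_layer_mult_adds(3, 256, 256, 14)
slim = separable_layer_mult_adds(3, 256, 256, 14, alpha=0.5, rho=160 / 224)
print(round(full / 1e6, 2), round(slim / 1e6, 2))  # ~13.3 vs ~1.75 million mult-adds
```

Because the pointwise term dominates, the cost falls roughly as α²ρ², which is the quadratic trade-off the paper reports for both multipliers.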

Methods

This paper employs the following methods:

  • Depthwise Separable Convolution (see the code sketch after this list)
  • Width Multiplier
  • Resolution Multiplier
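
As a concrete illustration of the first method, here is a minimal PyTorch sketch of the MobileNet building block: a 3×3 depthwise convolution followed by a 1×1 pointwise convolution, each with batch normalization and ReLU, as described in the paper. PyTorch and the class name are our assumptions for readability; the original models were trained in TensorFlow.

```python
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """Minimal sketch of the MobileNet building block:
    3x3 depthwise conv + BN + ReLU, then 1x1 pointwise conv + BN + ReLU."""

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.depthwise = nn.Sequential(
            # groups=in_ch applies one 3x3 filter per input channel
            nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                      padding=1, groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
        )
        self.pointwise = nn.Sequential(
            # 1x1 conv mixes the depthwise outputs into out_ch channels
            nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))


# Usage: one block of the early network (32 -> 64 channels at 112x112).
block = DepthwiseSeparableConv(32, 64)
out = block(torch.randn(1, 32, 112, 112))  # -> torch.Size([1, 64, 112, 112])
```

Stacking 13 such blocks (plus an initial full convolution, average pooling, and a fully connected classifier) gives the 28-layer MobileNet body described in the paper.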

Models Used

  • MobileNet

Datasets

The following datasets were used in this research:

  • ImageNet
  • Stanford Dogs
  • COCO

Evaluation Metrics

  • Accuracy
  • Mean Average Precision (mAP)

Results

  • Strong accuracy on ImageNet classification compared to other popular models, with far fewer mult-adds and parameters
  • Competitive results when applied to object detection (COCO), fine-grained classification (Stanford Dogs), face attribute classification, and large-scale geo-localization

Limitations

The authors identified the following limitations:

  • None specified

Technical Requirements

  • Number of GPUs: None specified
  • GPU Type: None specified

Keywords

MobileNets, efficient neural networks, depthwise separable convolutions, model compression, hyperparameters
