Venue
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
In this paper we describe a new mobile architecture, MobileNetV2, that improves the state-of-the-art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. We also describe efficient ways of applying these mobile models to object detection in a novel framework we call SSDLite. Additionally, we demonstrate how to build mobile semantic segmentation models through a reduced form of DeepLabv3 which we call Mobile DeepLabv3.

MobileNetV2 is based on an inverted residual structure where the shortcut connections are between the thin bottleneck layers. The intermediate expansion layer uses lightweight depthwise convolutions to filter features as a source of non-linearity. Additionally, we find that it is important to remove non-linearities in the narrow layers in order to maintain representational power. We demonstrate that this improves performance and provide an intuition that led to this design. Finally, our approach allows decoupling of the input/output domains from the expressiveness of the transformation, which provides a convenient framework for further analysis. We measure our performance on ImageNet [1] classification, COCO object detection [2], and VOC image segmentation [3]. We evaluate the trade-offs between accuracy and the number of operations measured by multiply-adds (MAdds), as well as actual latency and the number of parameters.
The paper presents MobileNetV2, a novel mobile architecture that improves state-of-the-art performance for various mobile tasks and benchmarks while optimizing resource efficiency. The architecture is built on an inverted residual structure, leveraging linear bottlenecks that facilitate effective feature representation through an innovative and efficient convolutional block design. The proposed models demonstrate superior performance in image classification (ImageNet), object detection (COCO), and image segmentation (VOC), achieving higher accuracy and lower computational costs compared to existing models like MobileNetV1, ShuffleNet, and YOLOv2. The authors introduce methods like SSDLite for mobile object detection, emphasizing efficiency in memory usage and processing speed.
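The inverted residual block described above (a 1×1 expansion with ReLU6, a lightweight 3×3 depthwise convolution, and a linear 1×1 projection back to the thin bottleneck, with the shortcut connecting the bottlenecks) can be sketched as follows. This is an illustrative NumPy implementation of the stride-1 case only, not the authors' code; the tensor shapes and weight arrays are hypothetical, while the expansion factor of 6 follows the paper's default.

```python
import numpy as np

def relu6(x):
    # ReLU6 non-linearity used throughout MobileNetV2
    return np.clip(x, 0.0, 6.0)

def conv1x1(x, w):
    # Pointwise convolution: x is (H, W, C_in), w is (C_in, C_out)
    return x @ w

def depthwise3x3(x, w):
    # Depthwise 3x3 convolution, stride 1, 'same' zero padding.
    # x is (H, W, C); w is (3, 3, C) — one 3x3 filter per channel.
    H, W, C = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += xp[i:i + H, j:j + W, :] * w[i, j, :]
    return out

def inverted_residual(x, w_expand, w_dw, w_project):
    # 1) Expand the thin input to a wide representation, with ReLU6.
    h = relu6(conv1x1(x, w_expand))
    # 2) Filter spatially with a cheap depthwise convolution, with ReLU6.
    h = relu6(depthwise3x3(h, w_dw))
    # 3) Project back to the bottleneck *linearly* (no non-linearity here,
    #    matching the paper's linear-bottleneck finding).
    h = conv1x1(h, w_project)
    # 4) Shortcut connects the two thin bottlenecks.
    return x + h

# Hypothetical shapes: 8x8 feature map, 16 bottleneck channels, expansion 6.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 16))
w_expand = rng.standard_normal((16, 96)) * 0.1   # 16 -> 16*6 channels
w_dw = rng.standard_normal((3, 3, 96)) * 0.1
w_project = rng.standard_normal((96, 16)) * 0.1  # back to 16 channels
y = inverted_residual(x, w_expand, w_dw, w_project)
```

Note that the residual add requires the input and output bottlenecks to have the same shape; in the full architecture, stride-2 blocks omit the shortcut.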
This paper employs the following methods:
- Inverted Residuals
- Linear Bottlenecks
- Depthwise Separable Convolutions
- MobileNetV2
- MobileNetV1
- DeepLabv3
- SSDLite
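As a rough illustration of why depthwise separable convolutions (listed above) underpin both MobileNet generations, the multiply-add cost of a standard 3×3 convolution can be compared with its separable counterpart (depthwise 3×3 followed by pointwise 1×1). The layer sizes below are hypothetical examples for illustration, not figures from the paper.

```python
def standard_conv_madds(h, w, k, c_in, c_out):
    # Full convolution: every output channel sees every input channel.
    return h * w * k * k * c_in * c_out

def separable_conv_madds(h, w, k, c_in, c_out):
    # Depthwise: one k x k filter per input channel.
    depthwise = h * w * k * k * c_in
    # Pointwise: 1x1 convolution mixing channels.
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise

# Hypothetical layer: 112x112 feature map, 3x3 kernel, 32 -> 64 channels.
std = standard_conv_madds(112, 112, 3, 32, 64)
sep = separable_conv_madds(112, 112, 3, 32, 64)
ratio = std / sep  # approaches k^2 = 9x as c_out grows
```

The saving ratio is k²·c_out / (k² + c_out), so for a 3×3 kernel it approaches 9× for wide layers; here it is roughly 7.9×.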
The following datasets were used in this research:
- ImageNet (image classification)
- COCO (object detection)
- PASCAL VOC (image segmentation)
The paper reports the following key results:
- Achieved state-of-the-art accuracy on ImageNet
- Outperformed YOLOv2 on COCO object detection
- MobileNetV2 SSDLite is 20× more efficient and 10× smaller than YOLOv2
The following training hardware was reported:
- Number of GPUs: 16
- GPU type: not specified
Keywords:
- MobileNetV2
- inverted residuals
- linear bottlenecks
- depthwise separable convolutions
- efficient neural networks