SSD: Single Shot MultiBox Detector

Wei Liu (UNC Chapel Hill), Dragomir Anguelov (Zoox Inc.), Dumitru Erhan (Google Inc.), Christian Szegedy (Google Inc.), Scott Reed (University of Michigan, Ann-Arbor), Cheng-Yang Fu (UNC Chapel Hill), Alexander C. Berg (UNC Chapel Hill) (2015)

Paper Information

arXiv ID

1512.02325

Venue

European Conference on Computer Vision

Domain

computer vision

SOTA Claim

Yes

Code

Available

Reproducibility

8/10

Contents

Abstract
Methods
Datasets
Results
Limitations
Related Work
External Resources

Abstract

We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. SSD is simple relative to methods that require object proposals because it completely eliminates proposal generation and subsequent pixel or feature resampling stages and encapsulates all computation in a single network. This makes SSD easy to train and straightforward to integrate into systems that require a detection component. Experimental results on the PASCAL VOC, COCO, and ILSVRC datasets confirm that SSD has competitive accuracy to methods that utilize an additional object proposal step and is much faster, while providing a unified framework for both training and inference. For 300 × 300 input, SSD achieves 74.3% mAP on VOC2007 test at 59 FPS on an Nvidia Titan X and for 512 × 512 input, SSD achieves 76.9% mAP, outperforming a comparable state-of-the-art Faster R-CNN model. Compared to other single stage methods, SSD has much better accuracy even with a smaller input image size. Code is available at: https://github.com/weiliu89/caffe/tree/ssd .
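The default-box idea in the abstract can be illustrated with a minimal sketch. The pyramid sizes, scales, and aspect ratios below are hypothetical values chosen for illustration, not the configuration used by the authors, and `default_boxes` is an illustrative helper rather than a function from the released code.

```python
import itertools
import math


def default_boxes(fmap_size, scale, aspect_ratios):
    """Return (cx, cy, w, h) default boxes in [0, 1] image coordinates
    for one square feature map with `fmap_size` cells per side."""
    boxes = []
    for row, col in itertools.product(range(fmap_size), repeat=2):
        # Centre of the feature map cell, normalised to the image.
        cx = (col + 0.5) / fmap_size
        cy = (row + 0.5) / fmap_size
        for ar in aspect_ratios:
            # Every aspect ratio keeps the same area: w * h == scale ** 2.
            w = scale * math.sqrt(ar)
            h = scale / math.sqrt(ar)
            boxes.append((cx, cy, w, h))
    return boxes


# Hypothetical pyramid of (feature map side, box scale) pairs:
# fine maps get small boxes, coarse maps get large boxes.
PYRAMID = [(8, 0.2), (4, 0.45), (2, 0.7), (1, 0.9)]
ASPECT_RATIOS = (1.0, 2.0, 0.5)  # assumed for illustration

all_boxes = [
    box
    for size, scale in PYRAMID
    for box in default_boxes(size, scale, ASPECT_RATIOS)
]
print(len(all_boxes))  # (64 + 16 + 4 + 1) cells * 3 ratios = 255
```

In a detector built this way, the network would output one vector of class scores and four box adjustments for each entry in `all_boxes`, so the set of candidate boxes is fixed in advance and no proposal stage is needed.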

Summary

The paper introduces SSD (Single Shot MultiBox Detector), a method for real-time object detection using a single convolutional neural network. SSD eliminates the proposal generation step used by earlier detectors and instead predicts class scores and bounding box adjustments for a fixed set of default boxes tiled across multiple feature maps. Evaluated on PASCAL VOC, COCO, and ILSVRC, SSD is competitive in accuracy with proposal-based methods such as Faster R-CNN while running much faster, and it is more accurate than other single-stage methods such as YOLO. Key contributions include handling objects of various sizes through predictions from multi-scale feature maps and a training procedure that uses hard negative mining and extensive data augmentation, which yields high accuracy even with low-resolution inputs.
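Hard negative mining, mentioned in the summary, can be sketched as follows. Because most default boxes match no object, only the background boxes with the highest loss are kept for training. The 3:1 cap on negatives per positive and the function name `hard_negative_mining` are assumptions made here for illustration; this page does not state the ratio.

```python
def hard_negative_mining(losses, is_positive, neg_pos_ratio=3):
    """Select which default boxes contribute to the classification loss.

    losses        -- per-box classification loss (list of floats)
    is_positive   -- per-box flag, True if the box matched a ground-truth object
    neg_pos_ratio -- assumed cap on the number of negatives kept per positive
    Returns the sorted indices of the boxes to keep.
    """
    positives = [i for i, pos in enumerate(is_positive) if pos]
    negatives = [i for i, pos in enumerate(is_positive) if not pos]
    # Hardest negatives first: background boxes with the largest loss.
    negatives.sort(key=lambda i: losses[i], reverse=True)
    keep = positives + negatives[: neg_pos_ratio * len(positives)]
    return sorted(keep)


losses = [0.1, 2.3, 0.05, 1.7, 0.9, 0.2, 0.4]
is_positive = [False, True, False, False, False, False, False]
print(hard_negative_mining(losses, is_positive))  # [1, 3, 4, 6]
```

Capping the negatives this way keeps the positive and background examples in each batch roughly balanced, instead of letting the large number of easy background boxes dominate the loss.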

Methods

This paper employs the following methods:

Convolutional Neural Network

Models Used

SSD

Datasets

The following datasets were used in this research:

PASCAL VOC
COCO
ILSVRC

Evaluation Metrics

mAP
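
As a rough illustration of the metric, mAP is the mean over object classes of the average precision (AP), where a detection counts as correct if its intersection over union (IoU) with an unmatched ground-truth box exceeds a threshold. The sketch below assumes the all-point interpolated AP; this page does not say which AP variant or IoU threshold was used, and all function names are illustrative.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(is_true_positive, num_ground_truth):
    """Area under the interpolated precision-recall curve for one class.

    is_true_positive -- detections sorted by descending confidence, True if
                        the detection matched a ground-truth box
    num_ground_truth -- number of ground-truth boxes (assumed > 0)
    """
    tp = 0
    precisions, recalls = [], []
    for rank, hit in enumerate(is_true_positive, start=1):
        tp += int(hit)
        precisions.append(tp / rank)
        recalls.append(tp / num_ground_truth)
    # Interpolate: precision at each rank becomes the max precision to its right.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """Mean of the per-class AP values (dict: class name -> AP)."""
    return sum(ap_per_class.values()) / len(ap_per_class)
```

For example, `mean_average_precision({"cat": 1.0, "dog": 0.5})` returns `0.75`.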

Results

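SSD300 (300 × 300 input) achieves 74.3% mAP on VOC2007 test at 59 FPS on an Nvidia Titan X
SSD512 (512 × 512 input) achieves 76.9% mAP on VOC2007 test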
SSD512 outperforms Faster R-CNN by 1.7% mAP

Limitations

The authors identified the following limitations:

Sensitivity to bounding box size, with weaker performance on smaller objects
Potential for further improvements in default box tiling for better detection performance on small objects

Technical Requirements

Number of GPUs: 1
GPU Type: Nvidia Titan X

Keywords

Single Shot MultiBox Detector, SSD, real-time detection, convolutional neural network, multi-scale detection

Papers Using Similar Methods

External Resources

Funding: Not specified
References: 34
Influential Citations: 3602
