Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun, Microsoft Research, University of Science and Technology of China (2015)

Paper Information

arXiv ID

1506.01497

Venue

IEEE Transactions on Pattern Analysis and Machine Intelligence

Domain

computer vision

SOTA Claim

Yes

Code

Available

Reproducibility

7/10

Contents

Abstract
Methods
Datasets
Results
Limitations
Related Work
External Resources

Abstract

State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet [7] and Fast R-CNN [5] have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. RPNs are trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features. For the very deep VGG-16 model [18], our detection system has a frame rate of 5 fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007 (73.2% mAP) and 2012 (70.4% mAP) using 300 proposals per image. The code will be released.

Summary

The paper presents Faster R-CNN, a unified object detection architecture that moves toward real-time speed by combining a Region Proposal Network (RPN) with Fast R-CNN. The RPN shares full-image convolutional features with the detection network, so region proposals become nearly cost-free compared with traditional region proposal methods. With the VGG-16 model, the system runs at 5 fps on a GPU (including all steps) and achieves state-of-the-art accuracy on the PASCAL VOC datasets, with a mean Average Precision (mAP) of 73.2% on VOC 2007 and 70.4% on VOC 2012, using 300 proposals per image. The authors detail the RPN architecture, the alternating optimization used to train the RPN and Fast R-CNN with shared features, and comparisons with traditional proposal methods.

Methods

This paper employs the following methods:

Region Proposal Network (RPN)
Fast R-CNN
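
To illustrate how an RPN can predict "object bounds and objectness scores at each position", the sketch below generates the reference boxes (anchors) against which such predictions are made. This is a minimal NumPy sketch, not the authors' code. The anchor scales (128, 256, 512 pixels), the aspect ratios (1:2, 1:1, 2:1), and the feature stride of 16 come from the paper rather than from this page; the function name `make_anchors`, the cell-center convention, and the 38 x 50 feature-map size are assumptions chosen for illustration.

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (feat_h * feat_w * k, 4) anchors as [x1, y1, x2, y2] in pixels."""
    # k = len(scales) * len(ratios) base shapes; each has area scale**2
    # and height/width ratio r.
    shapes = np.array([(s / np.sqrt(r), s * np.sqrt(r))
                       for s in scales for r in ratios])  # (k, 2) as (w, h)

    # One anchor center per feature-map cell, mapped back to image pixels.
    # Placing the center in the middle of the cell is an assumption.
    cx = (np.arange(feat_w) + 0.5) * stride
    cy = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)                          # each (feat_h, feat_w)
    centers = np.stack([cx.ravel(), cy.ravel()], axis=1)  # (H*W, 2)

    # Pair every center with every base shape.
    half = shapes / 2.0
    top_left = centers[:, None, :] - half[None, :, :]
    bottom_right = centers[:, None, :] + half[None, :, :]
    return np.concatenate([top_left, bottom_right], axis=2).reshape(-1, 4)

# Hypothetical feature map of 38 x 50 cells (roughly a 600 x 800 image at stride 16).
anchors = make_anchors(feat_h=38, feat_w=50)
print(anchors.shape)  # (17100, 4): 38 * 50 positions, 9 anchors each
```

For each of these anchors the RPN would output an objectness score and four box-regression offsets; those learned layers are omitted here.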

Models Used

VGG-16
ZF net

Datasets

The following datasets were used in this research:

PASCAL VOC 2007
PASCAL VOC 2012

Evaluation Metrics

mean Average Precision (mAP)
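
As a rough illustration of the metric, the sketch below computes intersection-over-union (IoU) between two boxes and an 11-point interpolated average precision (AP) for a single class; mAP is the mean of AP over all classes. It assumes the PASCAL VOC 2007 convention (a detection counts as correct when its IoU with an unmatched ground-truth box is at least 0.5, and AP is averaged over 11 recall levels). The function names and the toy inputs are hypothetical, and this is not the official VOC evaluation code.

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def average_precision_11pt(scores, is_tp, num_gt):
    """11-point interpolated AP for one class.

    scores: confidence of each detection
    is_tp:  1 if that detection matched a ground-truth box, else 0
    num_gt: number of ground-truth boxes of this class
    """
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_tp, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / num_gt
    precision = cum_tp / (cum_tp + cum_fp)

    ap = 0.0
    for t in np.linspace(0.0, 1.0, 11):
        mask = recall >= t
        # Interpolated precision: best precision at any recall >= t.
        ap += (precision[mask].max() if mask.any() else 0.0) / 11.0
    return ap

# Toy example: two detections, both correct, two ground-truth boxes.
ap_example = average_precision_11pt([0.9, 0.8], [1, 1], num_gt=2)
```

In the toy example both detections are correct and both ground-truth boxes are found, so precision is 1 at every recall level.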

Results

Achieves state-of-the-art object detection accuracy of 73.2% mAP on PASCAL VOC 2007 and 70.4% mAP on PASCAL VOC 2012
Frame rate of 5 fps (including all steps) on a GPU with the VGG-16 model, using 300 proposals per image
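
The figure of 300 proposals per image refers to the boxes passed from the RPN to the detector after ranking and de-duplication. A common way to do this is greedy non-maximum suppression (NMS) followed by a top-N cut, sketched below in NumPy. The IoU threshold of 0.7 is the value reported in the paper, not on this page; the function name `nms_top_n` and the toy boxes are hypothetical, and this sketch is not the authors' implementation.

```python
import numpy as np

def nms_top_n(boxes, scores, iou_thresh=0.7, top_n=300):
    """Greedy NMS; returns indices of at most top_n kept boxes, best first."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = np.argsort(-scores)  # highest objectness score first
    keep = []
    while order.size > 0 and len(keep) < top_n:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the kept box with every remaining box.
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        overlap = inter / (areas[i] + areas[rest] - inter)
        # Drop boxes that overlap the kept box too strongly.
        order = rest[overlap <= iou_thresh]
    return keep

# Toy example: the second box nearly duplicates the first and is suppressed.
toy_boxes = [[0, 0, 10, 10], [0.5, 0.5, 10.5, 10.5], [20, 20, 30, 30]]
toy_scores = [0.9, 0.8, 0.7]
kept = nms_top_n(toy_boxes, toy_scores)
```

Lowering `top_n` trades recall for speed, since the detector classifies fewer regions per image.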

Limitations

The authors identified the following limitations:

None specified

Technical Requirements

Number of GPUs: None specified
GPU Type: NVIDIA K40

Keywords

region proposal networks
deep learning
object detection
fast r-cnn

External Resources

Funding: Not specified
References: 47
Influential Citations: 8930