Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. University of Science and Technology of China; Microsoft Research (2015)

Paper Information

  • arXiv ID: 1506.01497
  • Venue: IEEE Transactions on Pattern Analysis and Machine Intelligence
  • Domain: computer vision
  • SOTA Claim: Yes
  • Code: None specified
  • Reproducibility: 7/10

Abstract

State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet [7] and Fast R-CNN [5] have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. RPNs are trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features. For the very deep VGG-16 model [18], our detection system has a frame rate of 5 fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007 (73.2% mAP) and 2012 (70.4% mAP) using 300 proposals per image. The code will be released.

Summary

The paper presents Faster R-CNN, a unified detection architecture that pairs a Region Proposal Network (RPN) with the Fast R-CNN detector. Because the RPN shares full-image convolutional features with the detection network, proposals become nearly cost-free compared with running a separate region proposal algorithm. With the VGG-16 model, the system runs at 5 fps on a GPU (including all steps) and reaches state-of-the-art accuracy on PASCAL VOC, with a mean Average Precision (mAP) of 73.2% on VOC 2007 and 70.4% on VOC 2012, using 300 proposals per image. The authors describe the RPN architecture, the alternating optimization used to train RPN and Fast R-CNN on shared features, and comparisons with traditional proposal methods.
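
To make "predicts object bounds and objectness scores at each position" more concrete, the following is a minimal sketch of how reference boxes (anchors) could be laid out over a feature map. The 3 scales (128, 256, 512 pixels), 3 aspect ratios, and stride of 16 follow the defaults described in the paper; the function names `make_base_anchors` and `shift_anchors` and the cell-centre placement are assumptions made for illustration, not the authors' released code.

```python
import math


def make_base_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchors centred on the origin.

    Each anchor has area scale**2 and a height/width ratio equal to `ratio`.
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            w = scale / math.sqrt(ratio)
            h = scale * math.sqrt(ratio)
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return anchors


def shift_anchors(base, feat_h, feat_w, stride=16):
    """Replicate the base anchors at every feature-map position.

    Assumption: each anchor set is centred on the middle of its
    stride x stride cell in image coordinates.
    """
    shifted = []
    for row in range(feat_h):
        for col in range(feat_w):
            cx = col * stride + stride / 2
            cy = row * stride + stride / 2
            for x1, y1, x2, y2 in base:
                shifted.append((x1 + cx, y1 + cy, x2 + cx, y2 + cy))
    return shifted
```

In this sketch a feature map of height H and width W yields 9·H·W anchors; an RPN would predict one objectness score and one box refinement for each of them.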

Methods

This paper employs the following methods:

  • Region Proposal Network (RPN)
  • Fast R-CNN
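
As a rough illustration of how RPN output could be handed to Fast R-CNN, the sketch below applies greedy non-maximum suppression (NMS) to scored boxes and keeps the top-ranked survivors. The IoU threshold of 0.7 and the 300-proposal budget follow the paper's reported settings; the helper names `iou` and `select_proposals` are hypothetical, and this plain-Python loop stands in for whatever optimized routine a real system would use.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_proposals(boxes, scores, iou_thresh=0.7, top_n=300):
    """Greedy NMS over score-sorted boxes; return indices of kept boxes.

    A box is dropped when it overlaps an already-kept, higher-scoring
    box by more than `iou_thresh`. At most `top_n` indices are returned.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
            if len(keep) == top_n:
                break
    return keep
```

The returned indices are ordered by decreasing objectness score, so truncating the list gives a smaller proposal budget without re-running NMS.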

Models Used

  • VGG-16
  • ZF net

Datasets

The following datasets were used in this research:

  • PASCAL VOC 2007
  • PASCAL VOC 2012

Evaluation Metrics

  • mean Average Precision (mAP)
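
For readers unfamiliar with the metric, here is a minimal sketch of per-class average precision using the 11-point interpolation associated with the PASCAL VOC 2007 protocol, with mAP taken as the mean over classes. It assumes detections have already been matched to ground truth (the `is_true_positive` flags) and that `num_gt` is greater than zero; the function names are hypothetical, and later VOC editions compute the area under the full precision-recall curve instead.

```python
def average_precision_11pt(scores, is_true_positive, num_gt):
    """11-point interpolated AP for a single class.

    scores           -- detection confidences
    is_true_positive -- matching flags, one per detection
    num_gt           -- number of ground-truth objects (assumed > 0)
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    recalls, precisions = [], []
    for rank, i in enumerate(order, start=1):
        if is_true_positive[i]:
            tp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / rank)

    ap = 0.0
    for k in range(11):
        threshold = k / 10
        # Interpolated precision: best precision at recall >= threshold.
        best = max(
            (p for r, p in zip(recalls, precisions) if r >= threshold),
            default=0.0,
        )
        ap += best / 11
    return ap


def mean_average_precision(per_class_ap):
    """mAP is the unweighted mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)
```

A reported figure such as 73.2% mAP corresponds to `mean_average_precision` applied to the AP values of all object classes in the benchmark.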

Results

  • State-of-the-art detection accuracy: 73.2% mAP on PASCAL VOC 2007 and 70.4% mAP on PASCAL VOC 2012, using 300 proposals per image
  • Frame rate of 5 fps on a GPU (including all steps) with the VGG-16 model

Limitations

The authors identified the following limitations:

  • None specified

Technical Requirements

  • Number of GPUs: None specified
  • GPU Type: NVIDIA K40

Keywords

region proposal networks, deep learning, object detection, Fast R-CNN
