Venue
IEEE Transactions on Pattern Analysis and Machine Intelligence
State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet [7] and Fast R-CNN [5] have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. RPNs are trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features. For the very deep VGG-16 model [18], our detection system has a frame rate of 5 fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007 (73.2% mAP) and 2012 (70.4% mAP) using 300 proposals per image. The code will be released.
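The phrase "at each position" can be made concrete with a small sketch. At every feature-map position the RPN scores a fixed set of k reference boxes ("anchors") for objectness and regresses four offsets per anchor. The defaults below (feature stride 16, scales of 128, 256 and 512 pixels, aspect ratios 0.5, 1 and 2, so k = 9) and the (tx, ty, tw, th) offset parameterization are assumptions chosen to match the commonly cited VGG-16 configuration, not details stated in this summary; the helpers `make_anchors` and `decode` are hypothetical names.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def make_anchors(feat_h: int, feat_w: int, stride: int = 16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)) -> List[Box]:
    """Place len(scales) * len(ratios) anchors at every feature-map position (assumed defaults)."""
    anchors: List[Box] = []
    for i in range(feat_h):
        for j in range(feat_w):
            # centre of this feature-map cell, mapped back to image coordinates
            cx = (j + 0.5) * stride
            cy = (i + 0.5) * stride
            for s in scales:
                for r in ratios:
                    # keep the anchor area at s * s; r is the height / width ratio
                    w = s / math.sqrt(r)
                    h = s * math.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def decode(anchor: Box, deltas: Tuple[float, float, float, float]) -> Box:
    """Turn predicted (tx, ty, tw, th) offsets into a box relative to one anchor."""
    x1, y1, x2, y2 = anchor
    wa, ha = x2 - x1, y2 - y1
    xa, ya = x1 + wa / 2, y1 + ha / 2
    tx, ty, tw, th = deltas
    # centre shifts are scaled by anchor size; width/height scale in log space
    x = xa + tx * wa
    y = ya + ty * ha
    w = wa * math.exp(tw)
    h = ha * math.exp(th)
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)
```

For a 2 x 3 feature map this yields 2 x 3 x 9 = 54 anchors, and decoding all-zero offsets returns the anchor unchanged. Predicting offsets relative to anchors lets one small convolutional head cover many box shapes at a single position.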
The paper presents Faster R-CNN, a unified object detection architecture aimed at real-time speeds that integrates a Region Proposal Network (RPN) with Fast R-CNN. The RPN shares full-image convolutional features with the detection network, so generating region proposals adds little computation on top of detection and no longer forms the bottleneck it was for earlier pipelines. With the very deep VGG-16 model, the system runs at 5 fps (including all steps) on a GPU while achieving state-of-the-art accuracy on the PASCAL VOC benchmarks: a mean Average Precision (mAP) of 73.2% on VOC 2007 and 70.4% on VOC 2012, using 300 proposals per image. The authors detail the RPN architecture and the alternating optimization used to train the RPN and Fast R-CNN with shared convolutional features, and compare RPN proposals against traditional region proposal methods.
This paper employs the following methods:
- Region Proposal Network (RPN)
- Fast R-CNN
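Handing Fast R-CNN the "300 proposals per image" mentioned in the abstract requires ranking the dense, overlapping RPN outputs and removing near-duplicates. A minimal sketch, assuming greedy non-maximum suppression (NMS) on objectness scores followed by keeping the top N survivors; the IoU threshold of 0.7 and the helpers `iou` and `select_proposals` are assumptions for illustration, not code from the paper.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_proposals(boxes: Sequence[Box], scores: Sequence[float],
                     iou_thresh: float = 0.7, top_n: int = 300) -> List[int]:
    """Greedy NMS by descending objectness score; return at most top_n box indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if len(keep) == top_n:
            break
        # drop a box that overlaps an already kept, higher-scoring box too strongly
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep
```

With boxes `(0, 0, 10, 10)`, `(0, 0, 10, 9)` and `(20, 20, 30, 30)` scored 0.9, 0.8 and 0.7, the second box (IoU 0.9 with the first) is suppressed and indices `[0, 2]` are kept. This quadratic loop is only for clarity; practical implementations vectorize the IoU computation.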
The following datasets were used in this research:
- PASCAL VOC 2007
- PASCAL VOC 2012
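The following evaluation metric was used:
- mean Average Precision (mAP)
The main results are:
- Achieves state-of-the-art object detection accuracy of 73.2% mAP on PASCAL VOC 2007 and 70.4% mAP on PASCAL VOC 2012, using 300 proposals per image
- Frame rate of 5 fps (including all steps) on a GPU with the VGG-16 model, at the accuracy above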
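The mAP figures above are the mean, over object classes, of per-class average precision (AP). A minimal sketch, assuming each detection of one class has already been labelled true or false positive (PASCAL VOC uses an IoU criterion of 0.5 against a not-yet-matched ground-truth box) and the detections are sorted by descending score. VOC 2007 uses 11-point interpolated AP, while VOC 2010 and later (including 2012) use the area under the full precision envelope. The function names are hypothetical, and this is not the official VOC development kit (it ignores, for example, "difficult" objects).

```python
from typing import List, Sequence, Tuple


def precision_recall(tp_flags: Sequence[bool], num_gt: int) -> Tuple[List[float], List[float]]:
    """Cumulative precision/recall over score-sorted detections (assumes num_gt > 0)."""
    precisions: List[float] = []
    recalls: List[float] = []
    tp = 0
    for k, is_tp in enumerate(tp_flags, start=1):
        tp += int(is_tp)
        precisions.append(tp / k)
        recalls.append(tp / num_gt)
    return precisions, recalls


def ap_11_point(precisions: Sequence[float], recalls: Sequence[float]) -> float:
    """VOC2007-style AP: mean over t in {0, 0.1, ..., 1} of max precision at recall >= t."""
    total = 0.0
    for step in range(11):
        t = step / 10
        candidates = [p for p, r in zip(precisions, recalls) if r >= t]
        total += max(candidates) if candidates else 0.0
    return total / 11


def ap_all_point(precisions: Sequence[float], recalls: Sequence[float]) -> float:
    """Area under the monotonically non-increasing precision envelope."""
    mrec = [0.0] + list(recalls) + [1.0]
    mpre = [0.0] + list(precisions) + [0.0]
    # sweep right to left so each precision becomes the max of everything to its right
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # recall steps of zero width contribute nothing to the sum
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


def mean_ap(per_class_aps: Sequence[float]) -> float:
    """mAP: unweighted mean of per-class AP values."""
    return sum(per_class_aps) / len(per_class_aps)
```

For three detections flagged true, false, true against two ground-truth objects, the 11-point AP is 28/33 (about 0.848) and the all-point AP is 5/6 (about 0.833), illustrating that the two VOC protocols can give slightly different numbers for the same detections.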
The following compute resources were reported:
- Number of GPUs: not specified
- GPU Type: NVIDIA K40
Keywords
region proposal networks
deep learning
object detection
fast r-cnn