← ML Research Wiki / 1504.08083

Fast R-CNN

Ross Girshick [email protected] Microsoft Research (2015)

Paper Information
arXiv ID: 1504.08083
Domain: Computer vision
SOTA Claim: Yes
Code: https://github.com/rbgirshick/fast-rcnn
Reproducibility: 7/10

Abstract

This paper proposes Fast R-CNN, a clean and fast framework for object detection. Compared to traditional R-CNN and its accelerated version, SPPnet, Fast R-CNN trains networks using a multi-task loss in a single training stage. The multi-task loss simplifies learning and improves detection accuracy. Unlike SPPnet, all network layers can be updated during fine-tuning. We show that this difference has practical ramifications for very deep networks, such as VGG16, where mAP suffers when only the fully-connected layers are updated. Compared to "slow" R-CNN, Fast R-CNN is 9× faster at training VGG16 for detection, 213× faster at test-time, and achieves a significantly higher mAP on PASCAL VOC 2012. Compared to SPPnet, Fast R-CNN trains VGG16 3× faster, tests 10× faster, and is more accurate. Fast R-CNN is implemented in Python and C++ and is available under the open-source MIT License at https://github.com/rbgirshick/fast-rcnn.

Summary

The paper introduces Fast R-CNN, an object detection framework that improves both efficiency and accuracy over earlier methods such as R-CNN and SPPnet. Fast R-CNN simplifies training by using a single-stage multi-task loss that shares convolutional features across proposals and enables back-propagation through all layers, yielding large gains in speed and accuracy. The framework is shown to be 9× faster during training and 213× faster at test time than traditional R-CNN, while achieving a mean Average Precision (mAP) of 66% on the PASCAL VOC 2012 dataset. The authors also examine the importance of fine-tuning the convolutional layers in deep networks and the effects of different sampling strategies and loss functions used in training.
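The single-stage multi-task loss combines a log loss over class scores with a smooth L1 loss on the bounding-box regression targets, gated so that background RoIs (class 0) contribute no localization term. A minimal NumPy sketch of this loss (function names and the λ=1 default are illustrative, not from the paper's code):

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1 loss used for bounding-box regression:
    0.5*x^2 if |x| < 1, else |x| - 0.5."""
    ax = np.abs(x)
    return np.where(ax < 1, 0.5 * x ** 2, ax - 0.5)

def fast_rcnn_loss(cls_scores, bbox_pred, gt_class, gt_bbox_targets, lam=1.0):
    """Multi-task loss for one RoI: L = L_cls + lam * [u >= 1] * L_loc."""
    # Classification term: log loss over softmax probabilities.
    exp = np.exp(cls_scores - cls_scores.max())
    probs = exp / exp.sum()
    l_cls = -np.log(probs[gt_class])
    # Localization term: smooth L1 over the 4 box targets,
    # only for non-background classes (u >= 1).
    if gt_class >= 1:
        l_loc = smooth_l1(bbox_pred - gt_bbox_targets).sum()
    else:
        l_loc = 0.0
    return l_cls + lam * l_loc
```

For a background RoI the loss reduces to the classification term alone, which is what lets one network learn classification and box regression jointly instead of the separate post-hoc SVM and regressor stages of R-CNN.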

Methods

This paper employs the following methods:

  • Multi-task loss
  • Back-propagation
  • Stochastic Gradient Descent
  • Truncated SVD
  • Image-centric sampling
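Of the methods above, truncated SVD is the one used purely for test-time acceleration: a large fully-connected weight matrix W (u × v) is approximated by its top-t singular vectors, so one layer becomes two smaller layers with t·(u + v) parameters instead of u·v. A minimal sketch, assuming a plain NumPy weight matrix (the function name is illustrative):

```python
import numpy as np

def compress_fc(W, t):
    """Rank-t truncated SVD of an FC weight matrix W (u x v):
    W ≈ U_t @ diag(S_t) @ Vt_t.
    Returns the two replacement layers' weight matrices."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    W1 = S[:t, None] * Vt[:t]  # first layer, shape (t, v)
    W2 = U[:, :t]              # second layer, shape (u, t)
    return W1, W2
```

At inference the original layer `x -> W @ x` is replaced by `x -> W2 @ (W1 @ x)`; the paper reports this trades a small drop in mAP for a substantial reduction in FC-layer compute.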

Models Used

  • VGG16
  • CaffeNet (AlexNet)
  • VGG_CNN_M_1024

Datasets

The following datasets were used in this research:

  • PASCAL VOC 2012
  • ImageNet

Evaluation Metrics

  • mAP
  • Average Recall (AR)

Results

  • Fast R-CNN achieves a mAP of 66% on PASCAL VOC 2012
  • 9× faster training compared to R-CNN
  • 213× faster test-time than R-CNN
  • 3× faster training than SPPnet
  • 10× faster testing compared to SPPnet

Limitations

The authors identified the following limitations:

  • Fast R-CNN relies on object proposals that may affect detection quality
  • SVD compression may introduce a small drop in mAP

Technical Requirements

  • Number of GPUs: None specified
  • GPU Type: None specified

Keywords

Fast R-CNN, object detection, multi-task loss, RoI pooling, single-stage training
