← ML Research Wiki / 1504.08083

Fast R-CNN

Ross Girshick [email protected] Microsoft Research (2015)

Paper Information
arXiv ID: 1504.08083
Domain: Computer vision
SOTA Claim: Yes
Code: https://github.com/rbgirshick/fast-rcnn
Reproducibility: 7/10

Abstract

This paper proposes Fast R-CNN, a clean and fast framework for object detection. Compared to traditional R-CNN and its accelerated version, SPPnet, Fast R-CNN trains networks using a multi-task loss in a single training stage. The multi-task loss simplifies learning and improves detection accuracy. Unlike SPPnet, all network layers can be updated during fine-tuning. We show that this difference has practical ramifications for very deep networks, such as VGG16, where mAP suffers when only the fully-connected layers are updated. Compared to "slow" R-CNN, Fast R-CNN is 9× faster at training VGG16 for detection, 213× faster at test-time, and achieves a significantly higher mAP on PASCAL VOC 2012. Compared to SPPnet, Fast R-CNN trains VGG16 3× faster, tests 10× faster, and is more accurate. Fast R-CNN is implemented in Python and C++ and is available under the open-source MIT License at https://github.com/rbgirshick/fast-rcnn.

Summary

The paper introduces Fast R-CNN, an object detection framework that improves both efficiency and accuracy over earlier methods such as R-CNN and SPPnet. Fast R-CNN simplifies training by using a single-stage multi-task loss that shares convolutional features across proposals and enables back-propagation through all layers, yielding large gains in speed and accuracy. The framework is shown to be 9× faster during training and 213× faster at test time than traditional R-CNN, while achieving a mean Average Precision (mAP) of 66% on the PASCAL VOC 2012 dataset. The authors also examine the importance of fine-tuning the convolutional layers in deep networks and the effects of different sampling strategies and loss functions used in training.
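The single-stage multi-task loss combines a log loss over class scores with a smooth L1 loss on the bounding-box regression targets, gated so that background RoIs (class 0) contribute no localization term. A minimal NumPy sketch of this loss (function names and the λ=1 default are illustrative, not from the paper's code):

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1 loss used for bounding-box regression:
    0.5*x^2 if |x| < 1, else |x| - 0.5."""
    ax = np.abs(x)
    return np.where(ax < 1, 0.5 * x ** 2, ax - 0.5)

def fast_rcnn_loss(cls_scores, bbox_pred, gt_class, gt_bbox_targets, lam=1.0):
    """Multi-task loss for one RoI: L = L_cls + lam * [u >= 1] * L_loc."""
    # Classification term: log loss over softmax probabilities.
    exp = np.exp(cls_scores - cls_scores.max())
    probs = exp / exp.sum()
    l_cls = -np.log(probs[gt_class])
    # Localization term: smooth L1 over the 4 box targets,
    # only for non-background classes (u >= 1).
    if gt_class >= 1:
        l_loc = smooth_l1(bbox_pred - gt_bbox_targets).sum()
    else:
        l_loc = 0.0
    return l_cls + lam * l_loc
```

For a background RoI the loss reduces to the classification term alone, which is what lets one network learn classification and box regression jointly instead of the separate post-hoc SVM and regressor stages of R-CNN.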

Methods

This paper employs the following methods:

  • Multi-task loss
  • Back-propagation
  • Stochastic Gradient Descent
  • Truncated SVD
  • Image-centric sampling
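Of the methods above, truncated SVD is the one used purely for test-time acceleration: a large fully-connected weight matrix W (u × v) is approximated by its top-t singular vectors, so one layer becomes two smaller layers with t·(u + v) parameters instead of u·v. A minimal sketch, assuming a plain NumPy weight matrix (the function name is illustrative):

```python
import numpy as np

def compress_fc(W, t):
    """Rank-t truncated SVD of an FC weight matrix W (u x v):
    W ≈ U_t @ diag(S_t) @ Vt_t.
    Returns the two replacement layers' weight matrices."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    W1 = S[:t, None] * Vt[:t]  # first layer, shape (t, v)
    W2 = U[:, :t]              # second layer, shape (u, t)
    return W1, W2
```

At inference the original layer `x -> W @ x` is replaced by `x -> W2 @ (W1 @ x)`; the paper reports this trades a small drop in mAP for a substantial reduction in FC-layer compute.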

Models Used

  • VGG16
  • CaffeNet (AlexNet)
  • VGG_CNN_M_1024

Datasets

The following datasets were used in this research:

  • PASCAL VOC 2012
  • ImageNet

Evaluation Metrics

  • mAP
  • Average Recall (AR)

Results

  • Fast R-CNN achieves a mAP of 66% on PASCAL VOC 2012
  • 9× faster training compared to R-CNN
  • 213× faster test-time than R-CNN
  • 3× faster training than SPPnet
  • 10× faster testing compared to SPPnet

Limitations

The authors identified the following limitations:

  • Fast R-CNN relies on object proposals that may affect detection quality
  • SVD compression may introduce a small drop in mAP

Technical Requirements

  • Number of GPUs: None specified
  • GPU Type: None specified

Keywords

Fast R-CNN, object detection, multi-task loss, RoI pooling, single-stage training
