Mask R-CNN

Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick; Facebook AI Research (FAIR) (2017)

Paper Information

arXiv ID

1703.06870

Domain

Artificial Intelligence, Computer Vision

SOTA Claim

Yes

Code

Available

Reproducibility

8/10

Contents

Abstract
Methods
Datasets
Results
Limitations
Related Work
External Resources

Abstract

We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without tricks, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition. Code will be made available.

Summary

The paper introduces Mask R-CNN, a framework for object instance segmentation that extends Faster R-CNN with a mask prediction branch running in parallel with the existing bounding-box recognition branch. The extension adds only a small overhead to Faster R-CNN, and the resulting system runs at about 5 fps. The authors report top single-model results in all three tracks of the COCO challenges: instance segmentation, bounding-box object detection, and person keypoint detection. Because segmentation depends on precise pixel alignment, the paper introduces the RoIAlign layer, which avoids the coarse spatial quantization of RoIPool. The same framework generalizes to other instance-level tasks such as human pose estimation. The authors state that code will be made publicly available to support further research.
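
As an illustration of the parallel mask branch: the paper trains it with a per-pixel sigmoid and an average binary cross-entropy loss computed only on the mask channel of the ground-truth class, so the other class channels do not contribute. The sketch below is a minimal NumPy version under that description. The function name `mask_loss` and the array shapes are assumptions made for illustration, not the authors' code.

```python
import numpy as np

def mask_loss(mask_logits, gt_mask, gt_class):
    """Average binary cross-entropy on the ground-truth class channel only.

    mask_logits: (K, m, m) raw mask outputs, one channel per class (assumed layout).
    gt_mask:     (m, m) binary ground-truth mask for this region.
    gt_class:    integer index of the ground-truth class.
    """
    # Select the single channel that belongs to the ground-truth class.
    logits = np.asarray(mask_logits, dtype=float)[gt_class]
    target = np.asarray(gt_mask, dtype=float)
    # Numerically stable sigmoid cross-entropy on logits z with target t:
    # max(z, 0) - z * t + log(1 + exp(-|z|))
    per_pixel = (np.maximum(logits, 0.0)
                 - logits * target
                 + np.log1p(np.exp(-np.abs(logits))))
    return float(per_pixel.mean())

# Usage: three classes, a 2x2 mask, ground-truth class 1.
logits = np.zeros((3, 2, 2))
gt = np.array([[1, 0], [0, 1]])
loss = mask_loss(logits, gt, gt_class=1)
```

Because only one channel is read, the classes do not compete within the mask branch; classification is left to the box branch.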

Methods

This paper employs the following methods:

Mask R-CNN
RoIAlign
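
To make the RoIAlign idea concrete: RoIPool rounds region boundaries and bins to integer feature-map cells, whereas RoIAlign keeps real-valued coordinates, reads the feature map with bilinear interpolation at a few regularly spaced points per bin, and aggregates them. The following is a minimal single-channel NumPy sketch. `bilinear_sample`, `roi_align`, `out_size`, and `samples` are hypothetical names, and average aggregation over 2x2 samples per bin is one of the options the paper describes, not the only one.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly interpolate a 2D feature map at a real-valued (y, x)."""
    h, w = feat.shape
    # Clamp so that all four neighbours stay inside the map.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * feat[y0, x0] + dx * feat[y0, x1]
    bottom = (1 - dx) * feat[y1, x0] + dx * feat[y1, x1]
    return (1 - dy) * top + dy * bottom

def roi_align(feat, roi, out_size=2, samples=2):
    """Pool an RoI into out_size x out_size bins without rounding coordinates.

    roi = (y1, x1, y2, x2) in feature-map units, kept as floats.
    """
    y1, x1, y2, x2 = roi
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            vals = []
            for a in range(samples):
                for b in range(samples):
                    # Regularly spaced sample points inside bin (i, j).
                    sy = y1 + (i + (a + 0.5) / samples) * bin_h
                    sx = x1 + (j + (b + 0.5) / samples) * bin_w
                    vals.append(bilinear_sample(feat, sy, sx))
            out[i, j] = np.mean(vals)  # average aggregation
    return out

# Usage: a feature map whose value equals its column index.
feat = np.tile(np.arange(8, dtype=float), (8, 1))
pooled = roi_align(feat, roi=(1.3, 1.3, 5.3, 5.3))
```

On this ramp the two output columns come out as 2.3 and 4.3, the exact horizontal centres of the two bins, so the fractional offset of the RoI is preserved instead of being rounded away.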

Models Used

Faster R-CNN
ResNet-50
ResNet-101
ResNeXt-101
Feature Pyramid Network (FPN)

Datasets

The following datasets were used in this research:

COCO
Cityscapes

Evaluation Metrics

AP (Average Precision)
AP50 (AP at IoU threshold 0.50)
AP75 (AP at IoU threshold 0.75)
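
The AP50 and AP75 variants differ only in how much overlap a predicted mask needs with a ground-truth mask to count as a match. Below is a minimal sketch of that overlap test, assuming binary NumPy masks. `mask_iou` and `matches_at` are illustrative names. The full COCO AP additionally ranks detections by score, integrates precision over recall, and averages over IoU thresholds from 0.50 to 0.95, none of which is shown here.

```python
import numpy as np

def mask_iou(pred, gt):
    """Intersection over union of two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 0.0  # both masks empty; treated as no overlap in this sketch
    return float(np.logical_and(pred, gt).sum() / union)

def matches_at(pred, gt, iou_threshold):
    """True if the predicted mask overlaps the ground truth enough to match."""
    return mask_iou(pred, gt) >= iou_threshold

# Usage: the prediction covers the top 2 rows, the ground truth the top 3 rows.
pred = np.zeros((4, 4), dtype=bool)
pred[:2, :] = True
gt = np.zeros((4, 4), dtype=bool)
gt[:3, :] = True
iou = mask_iou(pred, gt)          # 8 shared pixels over 12 in the union
hit_50 = matches_at(pred, gt, 0.50)
hit_75 = matches_at(pred, gt, 0.75)
```

Here the same prediction counts as a match under the AP50 criterion but not under AP75, which is why the stricter metric is more sensitive to mask localization quality.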

Results

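Top single-model results at publication time on all three COCO tracks: instance segmentation, bounding-box object detection, and person keypoint detection
35.7 mask AP on COCO test-dev with a ResNet-101-FPN backbone
RoIAlign improves mask accuracy by a relative 10% to 50%, with larger gains under stricter localization metrics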

Limitations

The authors identified the following limitations:

Not specified

Technical Requirements

Number of GPUs: 8
GPU Type: Not specified

Keywords

Mask R-CNN, object detection, instance segmentation, deep learning, CNN, RoIAlign

External Resources

Funding: Facebook AI Research (FAIR)
References: 40
Influential Citations: 3854
