COCO-O

Object Detection Benchmark

Performance Over Time

Showing 45 results | Metric: Average mAP

Top Performing Models

Rank	Model	Paper	Average mAP	Date	Code
1	EVA	EVA: Exploring the Limits of Masked Visual Representation Learning at Scale	57.80	2022-11-14	📦 rwightman/pytorch-image-models 📦 open-mmlab/mmselfsup 📦 baaivision/eva
2	DETA (Swin-L)	NMS Strikes Back	48.50	2022-12-12	📦 jozhang97/deta
3	GLIP-L (Swin-L)	Grounded Language-Image Pre-training	48.00	2021-12-07	📦 microsoft/GLIP 📦 brown-palm/ObjectPrompt 📦 rsCPSyEu/ovd_cod
4	GRiT (ViT-H)	GRiT: A Generative Region-to-text Transformer for Object Understanding	42.90	2022-12-01	📦 JialianW/GRiT
5	DINO (Swin-L)	DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection	42.10	2022-03-07	📦 IDEA-Research/Grounded-Segment-Anything 📦 PaddlePaddle/PaddleDetection 📦 lucasjinreal/yolov7_d2
6	CBNetV2 (Swin-L)	CBNet: A Composite Backbone Network Architecture for Object Detection	39.00	2021-07-01	📦 PaddlePaddle/PaddleDetection 📦 shinya7y/UniverseNet 📦 VDIGPKU/CBNetV2 📦 epsilon-deltta/epsilon-deltta
7	ConvNeXt-XL (Cascade Mask R-CNN)	A ConvNet for the 2020s	37.50	2022-01-10	📦 keras-team/keras 📦 rwightman/pytorch-image-models 📦 pytorch/vision
8	InternImage-L (Cascade Mask R-CNN)	InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions	37.00	2022-11-10	📦 opengvlab/internimage 📦 OpenGVLab/M3I-Pretraining 📦 chenller/mmseg-extension
9	DyHead (Swin-L)	Dynamic Head: Unifying Object Detection Heads with Attentions	35.30	2021-06-15	📦 open-mmlab/mmdetection 📦 microsoft/DynamicHead 📦 Coldestadam/DynamicHead
10	ViTDet (ViT-H)	Exploring Plain Vision Transformer Backbones for Object Detection	34.30	2022-03-30	📦 facebookresearch/detectron2 📦 PaddlePaddle/PaddleDetection 📦 alibaba/EasyCV

All Papers (45)

EVA: Exploring the Limits of Masked Visual Representation Learning at Scale

2022

EVA

rwightman/pytorch-image-models open-mmlab/mmselfsup

NMS Strikes Back

2022

DETA (Swin-L)

jozhang97/deta

Grounded Language-Image Pre-training

2021

GLIP-L (Swin-L)

microsoft/GLIP brown-palm/ObjectPrompt rsCPSyEu/ovd_cod

GRiT: A Generative Region-to-text Transformer for Object Understanding

2022

GRiT (ViT-H)

JialianW/GRiT

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

2022

DINO (Swin-L)

IDEA-Research/Grounded-Segment-Anything PaddlePaddle/PaddleDetection

CBNet: A Composite Backbone Network Architecture for Object Detection

2021

CBNetV2 (Swin-L)

PaddlePaddle/PaddleDetection shinya7y/UniverseNet

A ConvNet for the 2020s

2022

ConvNeXt-XL (Cascade Mask R-CNN)

keras-team/keras rwightman/pytorch-image-models

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

2022

InternImage-L (Cascade Mask R-CNN)

opengvlab/internimage OpenGVLab/M3I-Pretraining chenller/mmseg-extension

Dynamic Head: Unifying Object Detection Heads with Attentions

2021

DyHead (Swin-L)

open-mmlab/mmdetection microsoft/DynamicHead Coldestadam/DynamicHead

Exploring Plain Vision Transformer Backbones for Object Detection

2022

ViTDet (ViT-H)

facebookresearch/detectron2 PaddlePaddle/PaddleDetection

Vision Transformer Adapter for Dense Predictions

2022

ViT-Adapter (BEiTv2-L)

czczup/vit-adapter chenller/mmseg-extension

Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone

2022

FIBER-B (Swin-B)

microsoft/fiber

Instances as Queries

2021

QueryInst (Swin-L)

open-mmlab/mmdetection hustvl/QueryInst

YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications

2022

YOLOv6-L6

PaddlePaddle/PaddleDetection meituan/yolov6

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

2022

YOLOv7-E6E

pjreddie/darknet AlexeyAB/darknet

MViTv2: Improved Multiscale Vision Transformers for Classification and Detection

2021

MViTV2-H (Cascade Mask R-CNN)

rwightman/pytorch-image-models facebookresearch/detectron2

Robust and Accurate Object Detection via Adversarial Learning

2021

Det-AdvProp (EfficientNet-B5)

google/automl MindSpore-scientific-2/code-5 MindSpore-scientific-2/code-4

YOLOv4: Optimal Speed and Accuracy of Object Detection

2020

YOLOv4-P6

tensorflow/models pjreddie/darknet

YOLOX: Exceeding YOLO Series in 2021

2021

YOLOX-X

open-mmlab/mmdetection PaddlePaddle/PaddleDetection

Probabilistic two-stage detection

2021

CenterNet2 (R2-101-DCN)

xingyizhou/CenterNet2 smart-car-lab/Centernet2-mmdetction aim-uofa/DiverGen

Grounded Language-Image Pre-training

2021

GLIP-T (Swin-T)

microsoft/GLIP brown-palm/ObjectPrompt rsCPSyEu/ovd_cod

EfficientDet: Scalable and Efficient Object Detection

2019

EfficientDet-D5 (EfficientNet-B5)

tensorflow/models PaddlePaddle/PaddleDetection

PVT v2: Improved Baselines with Pyramid Vision Transformer

2021

PVTv2-B5 (Mask R-CNN)

rwightman/pytorch-image-models open-mmlab/mmdetection

VarifocalNet: An IoU-aware Dense Object Detector

2020

VFNet (RX-101-64x4d)

open-mmlab/mmdetection hyz-xmaster/VarifocalNet

GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond

2019

GCNet (RX-101-32x4d-DCN)

open-mmlab/mmdetection open-mmlab/mmsegmentation

Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection

2020

GFLv2 (R2-101-DCN)

PaddlePaddle/PaddleDetection implus/GFocalV2

RepPoints V2: Verification Meets Regression for Object Detection

2020

RepPointsV2 (RX-101-64x4d-DCN)

Scalsol/RepPointsV2

USB: Universal-Scale Object Detection Benchmark

2021

UniverseNet (R2-101-DCN)

shinya7y/UniverseNet

YOLOX: Exceeding YOLO Series in 2021

2021

YOLOX-S

open-mmlab/mmdetection PaddlePaddle/PaddleDetection

You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection

2021

YOLOS-B (ViT-B)

huggingface/transformers hustvl/YOLOS

Dynamic Head: Unifying Object Detection Heads with Attentions

2021

DyHead (ResNet-50)

open-mmlab/mmdetection microsoft/DynamicHead Coldestadam/DynamicHead

Hybrid Task Cascade for Instance Segmentation

2019

HTC (ResNet-50)

open-mmlab/mmdetection PaddlePaddle/PaddleDetection

Deformable DETR: Deformable Transformers for End-to-End Object Detection

2020

Deformable-DETR (ResNet-50)

PaddlePaddle/PaddleDetection fundamentalvision/Deformable-DETR

Cascade R-CNN: High Quality Object Detection and Instance Segmentation

2019

Cascade R-CNN (ResNet-50)

open-mmlab/mmdetection zhaoweicai/cascade-rcnn

Mask R-CNN

2017

Mask R-CNN (ResNet-50)

tensorflow/models facebookresearch/detectron2

End-to-End Object Detection with Transformers

2020

DETR (ResNet-50)

huggingface/transformers tensorflow/models

Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection

2019

ATSS (ResNet-50)

open-mmlab/mmdetection RangiLyu/nanodet

FCOS: Fully Convolutional One-Stage Object Detection

2019

FCOS (ResNet-50)

open-mmlab/mmdetection pytorch/vision

Focal Loss for Dense Object Detection

2017

RetinaNet (ResNet-50)

tensorflow/models facebookresearch/detectron2

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

2015

Faster R-CNN (ResNet-50-FPN)

facebookresearch/detectron2 open-mmlab/mmdetection

YOLOv3: An Incremental Improvement

2018

YOLOv3 (DarkNet-53)

open-mmlab/mmdetection ultralytics/yolov3

SSD: Single Shot MultiBox Detector

2015

SSD (VGG-16)

open-mmlab/mmdetection serengil/deepface

Exploring Plain Vision Transformer Backbones for Object Detection

2022

ViTDet (ViT-H)

facebookresearch/detectron2 PaddlePaddle/PaddleDetection

USB: Universal-Scale Object Detection Benchmark

2021

UniverseNet (R2-101-DCN)

shinya7y/UniverseNet

Mask R-CNN

2017

Mask R-CNN (ResNet-50)

tensorflow/models facebookresearch/detectron2

Model	Paper	Average mAP	Date
EVA	EVA: Exploring the Limits of Masked Visual Repres…	57.80	2022-11-14
DETA (Swin-L)	NMS Strikes Back	48.50	2022-12-12
GLIP-L (Swin-L)	Grounded Language-Image Pre-training	48.00	2021-12-07
GRiT (ViT-H)	GRiT: A Generative Region-to-text Transformer for…	42.90	2022-12-01
DINO (Swin-L)	DINO: DETR with Improved DeNoising Anchor Boxes f…	42.10	2022-03-07
CBNetV2 (Swin-L)	CBNet: A Composite Backbone Network Architecture …	39.00	2021-07-01
ConvNeXt-XL (Cascade Mask R-CNN)	A ConvNet for the 2020s	37.50	2022-01-10
InternImage-L (Cascade Mask R-CNN)	InternImage: Exploring Large-Scale Vision Foundat…	37.00	2022-11-10
DyHead (Swin-L)	Dynamic Head: Unifying Object Detection Heads wit…	35.30	2021-06-15
ViTDet (ViT-H)	Exploring Plain Vision Transformer Backbones for …	34.30	2022-03-30
ViT-Adapter (BEiTv2-L)	Vision Transformer Adapter for Dense Predictions	34.25	2022-05-17
FIBER-B (Swin-B)	Coarse-to-Fine Vision-Language Pre-training with …	33.70	2022-06-15
QueryInst (Swin-L)	Instances as Queries	33.20	2021-05-05
YOLOv6-L6	YOLOv6: A Single-Stage Object Detection Framework…	32.50	2022-09-07
YOLOv7-E6E	YOLOv7: Trainable bag-of-freebies sets new state-…	32.00	2022-07-06
MViTV2-H (Cascade Mask R-CNN)	MViTv2: Improved Multiscale Vision Transformers f…	30.90	2021-12-02
Det-AdvProp (EfficientNet-B5)	Robust and Accurate Object Detection via Adversar…	30.80	2021-03-23
YOLOv4-P6	YOLOv4: Optimal Speed and Accuracy of Object Dete…	30.40	2020-04-23
YOLOX-X	YOLOX: Exceeding YOLO Series in 2021	30.30	2021-07-18
CenterNet2 (R2-101-DCN)	Probabilistic two-stage detection	29.50	2021-03-12
GLIP-T (Swin-T)	Grounded Language-Image Pre-training	29.10	2021-12-07
EfficientDet-D5 (EfficientNet-B5)	EfficientDet: Scalable and Efficient Object Detec…	28.50	2019-11-20
PVTv2-B5 (Mask R-CNN)	PVT v2: Improved Baselines with Pyramid Vision Tr…	28.20	2021-06-25
VFNet (RX-101-64x4d)	VarifocalNet: An IoU-aware Dense Object Detector	28.00	2020-08-31
GCNet (RX-101-32x4d-DCN)	GCNet: Non-local Networks Meet Squeeze-Excitation…	26.00	2019-04-25
GFLv2 (R2-101-DCN)	Generalized Focal Loss V2: Learning Reliable Loca…	25.10	2020-11-25
RepPointsV2 (RX-101-64x4d-DCN)	RepPoints V2: Verification Meets Regression for O…	24.90	2020-07-16
UniverseNet (R2-101-DCN)	USB: Universal-Scale Object Detection Benchmark	24.80	2021-03-25
YOLOX-S	YOLOX: Exceeding YOLO Series in 2021	20.60	2021-07-18
YOLOS-B (ViT-B)	You Only Look at One Sequence: Rethinking Transfo…	20.00	2021-06-01
DyHead (ResNet-50)	Dynamic Head: Unifying Object Detection Heads wit…	19.30	2021-06-15
HTC (ResNet-50)	Hybrid Task Cascade for Instance Segmentation	19.10	2019-01-22
Deformable-DETR (ResNet-50)	Deformable DETR: Deformable Transformers for End-…	18.50	2020-10-08
Cascade R-CNN (ResNet-50)	Cascade R-CNN: High Quality Object Detection and …	18.20	2019-06-24
Mask R-CNN (ResNet-50)	Mask R-CNN	17.10	2017-03-20
DETR (ResNet-50)	End-to-End Object Detection with Transformers	17.10	2020-05-26
ATSS (ResNet-50)	Bridging the Gap Between Anchor-based and Anchor-…	16.80	2019-12-05
FCOS (ResNet-50)	FCOS: Fully Convolutional One-Stage Object Detect…	16.70	2019-04-02
RetinaNet (ResNet-50)	Focal Loss for Dense Object Detection	16.60	2017-08-07
Faster R-CNN (ResNet-50-FPN)	Faster R-CNN: Towards Real-Time Object Detection …	16.40	2015-06-04
YOLOv3 (DarkNet-53)	YOLOv3: An Incremental Improvement	14.80	2018-04-08
SSD (VGG-16)	SSD: Single Shot MultiBox Detector	13.60	2015-12-08
ViTDet (ViT-H)	Exploring Plain Vision Transformer Backbones for …	7.89	2022-03-30
UniverseNet (R2-101-DCN)	USB: Universal-Scale Object Detection Benchmark	1.86	2021-03-25
Mask R-CNN (ResNet-50)	Mask R-CNN	-0.11	2017-03-20
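
The last three rows of the table repeat model names that already appear higher up and carry much lower values, one of them negative. The page does not say why, so they may report a different quantity than Average mAP. Below is a minimal sketch, assuming the rows are tab-separated as shown, of how such rows could be flagged before the numbers are used. The helper names (`parse_rows`, `repeated_models`, `out_of_range`) and the 0 to 100 range are illustrative assumptions, not part of the benchmark.

```python
from collections import Counter

# Sample rows copied from the table above (model, paper, Average mAP, date).
SAMPLE = "\n".join([
    "EVA\tEVA: Exploring the Limits of Masked Visual Repres…\t57.80\t2022-11-14",
    "ViTDet (ViT-H)\tExploring Plain Vision Transformer Backbones for …\t34.30\t2022-03-30",
    "Mask R-CNN (ResNet-50)\tMask R-CNN\t17.10\t2017-03-20",
    "ViTDet (ViT-H)\tExploring Plain Vision Transformer Backbones for …\t7.89\t2022-03-30",
    "Mask R-CNN (ResNet-50)\tMask R-CNN\t-0.11\t2017-03-20",
])


def parse_rows(text):
    """Split tab-separated leaderboard lines into dicts with a float score."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        model, paper, score, date = line.split("\t")
        rows.append({"model": model, "paper": paper,
                     "score": float(score), "date": date})
    return rows


def repeated_models(rows):
    """Return model names that appear in more than one row, sorted."""
    counts = Counter(row["model"] for row in rows)
    return sorted(name for name, n in counts.items() if n > 1)


def out_of_range(rows, low=0.0, high=100.0):
    """Return rows whose score falls outside the assumed percentage range."""
    return [row for row in rows if not low <= row["score"] <= high]


rows = parse_rows(SAMPLE)
print(repeated_models(rows))
print([(r["model"], r["score"]) for r in out_of_range(rows)])
```

On the sample rows this reports ViTDet (ViT-H) and Mask R-CNN (ResNet-50) as repeated and flags the -0.11 entry as out of range. Whether the flagged rows should be dropped or relabeled depends on what they measure, which the page does not state.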
