Model | Paper | Score | Date
EVA | EVA: Exploring the Limits of Masked Visual Repres… | 57.80 | 2022-11-14
DETA (Swin-L) | NMS Strikes Back | 48.50 | 2022-12-12
GLIP-L (Swin-L) | Grounded Language-Image Pre-training | 48.00 | 2021-12-07
GRiT (ViT-H) | GRiT: A Generative Region-to-text Transformer for… | 42.90 | 2022-12-01
DINO (Swin-L) | DINO: DETR with Improved DeNoising Anchor Boxes f… | 42.10 | 2022-03-07
CBNetV2 (Swin-L) | CBNet: A Composite Backbone Network Architecture … | 39.00 | 2021-07-01
ConvNeXt-XL (Cascade Mask R-CNN) | A ConvNet for the 2020s | 37.50 | 2022-01-10
InternImage-L (Cascade Mask R-CNN) | InternImage: Exploring Large-Scale Vision Foundat… | 37.00 | 2022-11-10
DyHead (Swin-L) | Dynamic Head: Unifying Object Detection Heads wit… | 35.30 | 2021-06-15
ViTDet (ViT-H) | Exploring Plain Vision Transformer Backbones for … | 34.30 | 2022-03-30
ViT-Adapter (BEiTv2-L) | Vision Transformer Adapter for Dense Predictions | 34.25 | 2022-05-17
FIBER-B (Swin-B) | Coarse-to-Fine Vision-Language Pre-training with … | 33.70 | 2022-06-15
QueryInst (Swin-L) | Instances as Queries | 33.20 | 2021-05-05
YOLOv6-L6 | YOLOv6: A Single-Stage Object Detection Framework… | 32.50 | 2022-09-07
YOLOv7-E6E | YOLOv7: Trainable bag-of-freebies sets new state-… | 32.00 | 2022-07-06
MViTV2-H (Cascade Mask R-CNN) | MViTv2: Improved Multiscale Vision Transformers f… | 30.90 | 2021-12-02
Det-AdvProp (EfficientNet-B5) | Robust and Accurate Object Detection via Adversar… | 30.80 | 2021-03-23
YOLOv4-P6 | YOLOv4: Optimal Speed and Accuracy of Object Dete… | 30.40 | 2020-04-23
YOLOX-X | YOLOX: Exceeding YOLO Series in 2021 | 30.30 | 2021-07-18
CenterNet2 (R2-101-DCN) | Probabilistic two-stage detection | 29.50 | 2021-03-12
GLIP-T (Swin-T) | Grounded Language-Image Pre-training | 29.10 | 2021-12-07
EfficientDet-D5 (EfficientNet-B5) | EfficientDet: Scalable and Efficient Object Detec… | 28.50 | 2019-11-20
PVTv2-B5 (Mask R-CNN) | PVT v2: Improved Baselines with Pyramid Vision Tr… | 28.20 | 2021-06-25
VFNet (RX-101-64x4d) | VarifocalNet: An IoU-aware Dense Object Detector | 28.00 | 2020-08-31
GCNet (RX-101-32x4d-DCN) | GCNet: Non-local Networks Meet Squeeze-Excitation… | 26.00 | 2019-04-25
GFLv2 (R2-101-DCN) | Generalized Focal Loss V2: Learning Reliable Loca… | 25.10 | 2020-11-25
RepPointsV2 (RX-101-64x4d-DCN) | RepPoints V2: Verification Meets Regression for O… | 24.90 | 2020-07-16
UniverseNet (R2-101-DCN) | USB: Universal-Scale Object Detection Benchmark | 24.80 | 2021-03-25
YOLOX-S | YOLOX: Exceeding YOLO Series in 2021 | 20.60 | 2021-07-18
YOLOS-B (ViT-B) | You Only Look at One Sequence: Rethinking Transfo… | 20.00 | 2021-06-01
DyHead (ResNet-50) | Dynamic Head: Unifying Object Detection Heads wit… | 19.30 | 2021-06-15
HTC (ResNet-50) | Hybrid Task Cascade for Instance Segmentation | 19.10 | 2019-01-22
Deformable-DETR (ResNet-50) | Deformable DETR: Deformable Transformers for End-… | 18.50 | 2020-10-08
Cascade R-CNN (ResNet-50) | Cascade R-CNN: High Quality Object Detection and … | 18.20 | 2019-06-24
Mask R-CNN (ResNet-50) | Mask R-CNN | 17.10 | 2017-03-20
DETR (ResNet-50) | End-to-End Object Detection with Transformers | 17.10 | 2020-05-26
ATSS (ResNet-50) | Bridging the Gap Between Anchor-based and Anchor-… | 16.80 | 2019-12-05
FCOS (ResNet-50) | FCOS: Fully Convolutional One-Stage Object Detect… | 16.70 | 2019-04-02
RetinaNet (ResNet-50) | Focal Loss for Dense Object Detection | 16.60 | 2017-08-07
Faster R-CNN (ResNet-50-FPN) | Faster R-CNN: Towards Real-Time Object Detection … | 16.40 | 2015-06-04
YOLOv3 (DarkNet-53) | YOLOv3: An Incremental Improvement | 14.80 | 2018-04-08
SSD (VGG-16) | SSD: Single Shot MultiBox Detector | 13.60 | 2015-12-08

Model | Paper | Score | Date
ViTDet (ViT-H) | Exploring Plain Vision Transformer Backbones for … | 7.89 | 2022-03-30
UniverseNet (R2-101-DCN) | USB: Universal-Scale Object Detection Benchmark | 1.86 | 2021-03-25
Mask R-CNN (ResNet-50) | Mask R-CNN | -0.11 | 2017-03-20