| CoCa | CoCa: Contrastive Captioners are Image-Text Found… | 82.70 | 2022-05-04 |
| LiT | LiT: Zero-Shot Transfer with Locked-image text Tu… | 82.50 | 2021-11-15 |
| BASIC | Combined Scaling for Zero-shot Transfer Learning | 82.30 | 2021-11-19 |
| ViT-H/14 | An Image is Worth 16x16 Words: Transformers for I… | 82.10 | 2020-10-22 |
| EVA-02-CLIP-E/14+ | EVA-CLIP: Improved Training Techniques for CLIP a… | 79.60 | 2023-03-27 |
| Baseline (ViT-G/14) | Model soups: averaging weights of multiple fine-t… | 79.03 | 2022-03-10 |
| Model soups (ViT-G/14) | Model soups: averaging weights of multiple fine-t… | 78.52 | 2022-03-10 |
| MAWS (ViT-6.5B) | The effectiveness of MAE pre-pretraining for bill… | 77.90 | 2023-03-23 |
| MAWS (ViT-2B) | The effectiveness of MAE pre-pretraining for bill… | 75.80 | 2023-03-23 |
| MAWS (ViT-H) | The effectiveness of MAE pre-pretraining for bill… | 72.60 | 2023-03-23 |
| CLIP | Learning Transferable Visual Models From Natural … | 72.30 | 2021-02-26 |
| ALIGN | Combined Scaling for Zero-shot Transfer Learning | 72.20 | 2021-11-19 |
| WiSE-FT | Robust fine-tuning of zero-shot models | 72.10 | 2021-09-04 |
| ViT-e | PaLI: A Jointly-Scaled Multilingual Language-Imag… | 72.00 | 2022-09-14 |
| ViT-G/14 | Scaling Vision Transformers | 70.53 | 2021-06-08 |
| SWAG (ViT H/14) | Revisiting Weakly Supervised Pre-Training of Visu… | 69.50 | 2022-01-20 |
| NS (Eff.-L2) | Scaling Vision Transformers | 68.50 | 2021-06-08 |
| RegNetY 128GF (Platt) | Revisiting Weakly Supervised Pre-Training of Visu… | 64.30 | 2022-01-20 |
| LLE (ViT-H/14, MAE, Edge Aug) | A Whac-A-Mole Dilemma: Shortcuts Come in Multiple… | 60.78 | 2022-12-09 |
| SEER (RegNet10B) | Vision Models Are More Robust And Fair When Pretr… | 60.20 | 2022-02-16 |
| ViT H/14 (Platt) | Revisiting Weakly Supervised Pre-Training of Visu… | 60.00 | 2022-01-20 |
| BiT-L (ResNet-152x4) | Big Transfer (BiT): General Visual Representation… | 58.70 | 2019-12-24 |
| ViT L/16 (Platt) | Revisiting Weakly Supervised Pre-Training of Visu… | 57.30 | 2022-01-20 |
| ViT B/16 (Bamboo) | Bamboo: Building Mega-Scale Vision Dataset Contin… | 53.90 | 2022-03-15 |
| AR-L (Opt Relevance) | Optimizing Relevance Maps of Vision Transformers … | 52.00 | 2022-06-02 |
| ALIGN-MRL | Matryoshka Representation Learning | 51.60 | 2022-05-26 |
| ViT-B/16 (ANN-1.3B) | Billion-Scale Pretraining with Vision Transformer… | 50.70 | 2021-08-12 |
| ViT-B/16 (512x512) + Pyramid | Pyramid Adversarial Training Improves ViT Perform… | 49.39 | 2021-11-30 |
| ResNet-101 (JFT-300M) | Billion-Scale Pretraining with Vision Transformer… | 49.10 | 2021-08-12 |
| ViT B/16 | Revisiting Weakly Supervised Pre-Training of Visu… | 48.90 | 2022-01-20 |
| ViT-B/32 | Billion-Scale Pretraining with Vision Transformer… | 48.40 | 2021-08-12 |
| ViT-B/16 (512x512) + Pixel | Pyramid Adversarial Training Improves ViT Perform… | 47.53 | 2021-11-30 |
| AR-B (Opt Relevance) | Optimizing Relevance Maps of Vision Transformers … | 47.10 | 2022-06-02 |
| BiT-M (ResNet-152x4) | Big Transfer (BiT): General Visual Representation… | 47.00 | 2019-12-24 |
| ViT-B/16 (512x512) | Pyramid Adversarial Training Improves ViT Perform… | 46.68 | 2021-11-30 |
| ViT-B (Discrete 512x512) | Discrete Representations Strengthen Vision Transf… | 46.62 | 2021-11-20 |
| AR-L | Optimizing Relevance Maps of Vision Transformers … | 46.50 | 2022-06-02 |
| ViT-L (Opt Relevance) | Optimizing Relevance Maps of Vision Transformers … | 43.20 | 2022-06-02 |
| CLIP L | Optimal Representations for Covariate Shift | 42.80 | 2021-12-31 |
| ResNet-50 (JFT-300M) | Billion-Scale Pretraining with Vision Transformer… | 42.50 | 2021-08-12 |
| ViT-B (Opt Relevance) | Optimizing Relevance Maps of Vision Transformers … | 42.20 | 2022-06-02 |
| CLIP L (LAION) | Optimal Representations for Covariate Shift | 42.10 | 2021-12-31 |
| AR-B | Optimizing Relevance Maps of Vision Transformers … | 41.40 | 2022-06-02 |
| RegViT on 384x384 + Adv Pyramid | Pyramid Adversarial Training Improves ViT Perform… | 39.79 | 2021-11-30 |
| ResNet-152 + GenInt with Transfer | Generative Interventions for Causal Learning | 39.38 | 2020-12-22 |
| AR-S (Opt Relevance) | Optimizing Relevance Maps of Vision Transformers … | 39.30 | 2022-06-02 |
| ResNet-50 (Bamboo) | Bamboo: Building Mega-Scale Vision Dataset Contin… | 38.80 | 2022-03-15 |
| RegViT on 384x384 + Adv Pixel | Pyramid Adversarial Training Improves ViT Perform… | 37.41 | 2021-11-30 |
| ViT-L | Optimizing Relevance Maps of Vision Transformers … | 37.40 | 2022-06-02 |
| DeiT-L (Opt Relevance) | Optimizing Relevance Maps of Vision Transformers … | 36.30 | 2022-06-02 |
| BiT-S (ResNet-152x4) | Big Transfer (BiT): General Visual Representation… | 36.00 | 2019-12-24 |
| RegViT on 384x384 | Pyramid Adversarial Training Improves ViT Perform… | 35.59 | 2021-11-30 |
| ViT-B | Optimizing Relevance Maps of Vision Transformers … | 35.10 | 2022-06-02 |
| RegViT on 384x384 + Random Pyramid | Pyramid Adversarial Training Improves ViT Perform… | 34.83 | 2021-11-30 |
| AR-S | Optimizing Relevance Maps of Vision Transformers … | 34.30 | 2022-06-02 |
| RegViT on 384x384 + Random Pixel | Pyramid Adversarial Training Improves ViT Perform… | 34.12 | 2021-11-30 |
| RegViT (RandAug) + Adv Pyramid | Pyramid Adversarial Training Improves ViT Perform… | 32.92 | 2021-11-30 |
| DeiT-S (Opt Relevance) | Optimizing Relevance Maps of Vision Transformers … | 31.60 | 2022-06-02 |
| ResNet-50 + CGC | Context-Gated Convolution | 31.53 | 2019-10-12 |
| DeiT-L | Optimizing Relevance Maps of Vision Transformers … | 31.40 | 2022-06-02 |
| Discrete ViT + Pixel | Pyramid Adversarial Training Improves ViT Perform… | 30.98 | 2021-11-30 |
| Discrete ViT + Pyramid | Pyramid Adversarial Training Improves ViT Perform… | 30.28 | 2021-11-30 |
| RegViT (RandAug) + Adv Pixel | Pyramid Adversarial Training Improves ViT Perform… | 30.11 | 2021-11-30 |
| Discrete ViT | Pyramid Adversarial Training Improves ViT Perform… | 29.95 | 2021-11-30 |
| RegViT (RandAug) + Random Pyramid | Pyramid Adversarial Training Improves ViT Perform… | 29.41 | 2021-11-30 |
| RegViT (RandAug) | Pyramid Adversarial Training Improves ViT Perform… | 29.30 | 2021-11-30 |
| ResNet-50 + GroupNorm | Improving robustness against common corruptions b… | 29.20 | 2020-06-30 |
| ResNet-50 + RoHL | Improving robustness against common corruptions b… | 29.20 | 2020-06-30 |
| RegViT (RandAug) + Random Pixel | Pyramid Adversarial Training Improves ViT Perform… | 28.72 | 2021-11-30 |
| MLP-Mixer + Pyramid | Pyramid Adversarial Training Improves ViT Perform… | 28.60 | 2021-11-30 |
| ResNet-50 + FixUp | Improving robustness against common corruptions b… | 28.50 | 2020-06-30 |
| ResNet-50 + MixUp (rescaled) | On Mixup Regularization | 28.37 | 2020-06-10 |
| DeiT-S | Optimizing Relevance Maps of Vision Transformers … | 28.30 | 2022-06-02 |
| ResNet-18 + GenInt with Transfer | Generative Interventions for Causal Learning | 27.03 | 2020-12-22 |
| MLP-Mixer | Pyramid Adversarial Training Improves ViT Perform… | 25.90 | 2021-11-30 |
| RELICv2 | Pushing the limits of self-supervised ResNets: Ca… | 25.90 | 2022-01-13 |
| ViT + MixUp | Pyramid Adversarial Training Improves ViT Perform… | 25.65 | 2021-11-30 |
| C-BYOL | Compressive Visual Representations | 25.50 | 2021-09-27 |
| MLP-Mixer + Pixel | Pyramid Adversarial Training Improves ViT Perform… | 24.75 | 2021-11-30 |
| BYOL (BG_RM) | Characterizing and Improving the Robustness of Se… | 23.90 | 2021-03-23 |
| RELIC | Pushing the limits of self-supervised ResNets: Ca… | 23.80 | 2022-01-13 |
| BYOL | Pushing the limits of self-supervised ResNets: Ca… | 23.00 | 2022-01-13 |
| SwAV (BG_RM) | Characterizing and Improving the Robustness of Se… | 21.90 | 2021-03-23 |
| ViT + CutMix | Pyramid Adversarial Training Improves ViT Perform… | 21.61 | 2021-11-30 |
| MoCo-v2 (BG_Swaps) | Characterizing and Improving the Robustness of Se… | 20.80 | 2021-03-23 |
| C-SimCLR | Compressive Visual Representations | 20.80 | 2021-09-27 |
| DILEMMA | Representation Learning by Detecting Incorrect Lo… | 20.51 | 2022-04-10 |
| ResNet-50 (ImageNet-Captions) | Data Determines Distributional Robustness in Cont… | 18.70 | 2022-05-03 |
| ViT | Pyramid Adversarial Training Improves ViT Perform… | 17.36 | 2021-11-30 |
| ResNet34-RPG | Compact and Optimal Deep Learning with Recurrent … | 16.50 | 2021-07-15 |
| CLIP (CC12M pretrain) | Robust Cross-Modal Representation Learning with P… | 15.24 | 2022-04-10 |
| SimCLR | Pushing the limits of self-supervised ResNets: Ca… | 14.60 | 2022-01-13 |
| ResNet-152 (FRCNN-ag-ad, VOC) | Class-agnostic Object Detection | 13.20 | 2020-11-28 |
| BigBiGAN (RevNet-50 4×) | Self-Supervised Learning for Large-Scale Unsuperv… | 4.92 | 2020-08-24 |