| Mixer-B/8-SAM | When Vision Transformers Outperform ResNets witho… | 76.50 | 2021-06-03 |
| ViT-B/16-SAM | When Vision Transformers Outperform ResNets witho… | 73.60 | 2021-06-03 |
| ResNet-152x2-SAM | When Vision Transformers Outperform ResNets witho… | 71.90 | 2021-06-03 |
| ResNet-50 | Deep Residual Learning for Image Recognition | 63.90 | 2015-12-10 |
| AugMix (ResNet-50) | AugMix: A Simple Data Processing Method to Improv… | 58.90 | 2019-12-05 |
| Stylized ImageNet (ResNet-50) | ImageNet-trained CNNs are biased towards texture;… | 58.50 | 2018-11-29 |
| DeepAugment (ResNet-50) | The Many Faces of Robustness: A Critical Analysis… | 57.80 | 2020-06-29 |
| PRIME (ResNet-50) | PRIME: A few primitives can boost robustness to c… | 57.10 | 2021-12-27 |
| RVT-Ti* | Towards Robust Vision Transformer | 56.10 | 2021-05-17 |
| PRIME with JSD (ResNet-50) | PRIME: A few primitives can boost robustness to c… | 53.70 | 2021-12-27 |
| DeepAugment+AugMix (ResNet-50) | The Many Faces of Robustness: A Critical Analysis… | 53.20 | 2020-06-29 |
| RVT-S* | Towards Robust Vision Transformer | 52.30 | 2021-05-17 |
| Sequencer2D-L | Sequencer: Deep LSTM for Image Classification | 51.90 | 2022-05-04 |
| RVT-B* | Towards Robust Vision Transformer | 51.30 | 2021-05-17 |
| ConvFormer-B36 | MetaFormer Baselines for Vision | 48.90 | 2022-10-24 |
| ConvFormer-B36 (384) | MetaFormer Baselines for Vision | 47.80 | 2022-10-24 |
| CAFormer-B36 | MetaFormer Baselines for Vision | 46.10 | 2022-10-24 |
| Pyramid Adversarial Training Improves ViT | Pyramid Adversarial Training Improves ViT Perform… | 46.08 | 2021-11-30 |
| CAFormer-B36 (384) | MetaFormer Baselines for Vision | 45.00 | 2022-10-24 |
| DiscreteViT | Discrete Representations Strengthen Vision Transf… | 44.74 | 2021-11-20 |
| SEER (RegNet10B) | Vision Models Are More Robust And Fair When Pretr… | 43.90 | 2022-02-16 |
| FAN-L-Hybrid+STL | Fully Attentional Networks with Self-emerging Tok… | 43.40 | 2024-01-08 |
| Pyramid Adversarial Training Improves ViT (Im21k) | Pyramid Adversarial Training Improves ViT Perform… | 42.16 | 2021-11-30 |
| VOLO-D5+HAT | Improving Vision Transformers by Revisiting High-… | 40.30 | 2022-04-03 |
| GPaCo (ViT-L) | Generalized Parametric Contrastive Learning | 39.70 | 2022-09-26 |
| Discrete Adversarial Distillation (ViT-B,224) | Distilling Out-of-Distribution Robustness from Vi… | 34.90 | 2023-11-02 |
| ConvFormer-B36 (IN21K) | MetaFormer Baselines for Vision | 34.70 | 2022-10-24 |
| MAE+DAT (ViT-H) | Enhance the Visual Representation via Discrete Ad… | 34.39 | 2022-09-16 |
| MAE (ViT-H, 448) | Masked Autoencoders Are Scalable Vision Learners | 33.50 | 2021-11-11 |
| ConvFormer-B36 (IN21K, 384) | MetaFormer Baselines for Vision | 33.50 | 2022-10-24 |
| LLE (ViT-H/14, MAE, Edge Aug) | A Whac-A-Mole Dilemma: Shortcuts Come in Multiple… | 33.10 | 2022-12-09 |
| ConvNeXt-XL (Im21k, 384) | A ConvNet for the 2020s | 31.80 | 2022-01-10 |
| CAFormer-B36 (IN21K) | MetaFormer Baselines for Vision | 31.70 | 2022-10-24 |
| LLE (ViT-B/16, SWAG, Edge Aug) | A Whac-A-Mole Dilemma: Shortcuts Come in Multiple… | 31.30 | 2022-12-09 |
| CAFormer-B36 (IN21K, 384) | MetaFormer Baselines for Vision | 29.60 | 2022-10-24 |
| FAN-Hybrid-L(IN-21K, 384) | Understanding The Robustness in Vision Transforme… | 28.90 | 2022-04-26 |
| CAR-FT (CLIP, ViT-L/14@336px) | Context-Aware Robust Fine-Tuning | 10.30 | 2022-11-29 |
| Model soups (ViT-G/14) | Model soups: averaging weights of multiple fine-t… | 4.54 | 2022-03-10 |
| Model soups (BASIC-L) | Model soups: averaging weights of multiple fine-t… | 3.90 | 2022-03-10 |