| AIMv2-3B (448 res) | Multimodal Autoregressive Pre-training of Large V… | 85.90 | 2024-11-21 |
| Hiera-H (448px) | Hiera: A Hierarchical Vision Transformer without … | 83.80 | 2023-06-01 |
| MAE (ViT-H, 448) | Masked Autoencoders Are Scalable Vision Learners | 83.40 | 2021-11-11 |
| AIMv2-3B | Multimodal Autoregressive Pre-training of Large V… | 81.50 | 2024-11-21 |
| AIMv2-1B | Multimodal Autoregressive Pre-training of Large V… | 79.70 | 2024-11-21 |
| AIMv2-H | Multimodal Autoregressive Pre-training of Large V… | 77.90 | 2024-11-21 |
| AIMv2-L | Multimodal Autoregressive Pre-training of Large V… | 76.00 | 2024-11-21 |
| FixSENet-154 | Fixing the train-test resolution discrepancy | 75.40 | 2019-06-14 |
| b_22DeiT-LT(ours) | DeiT-LT Distillation Strikes Back for Vision Tran… | 75.10 | 2024-04-03 |
| SEB+EfficientNet-B5 | On the Eigenvalues of Global Covariance Pooling f… | 72.30 | 2022-05-26 |
| TransFG | TransFG: A Transformer Architecture for Fine-grai… | 71.70 | 2021-03-14 |
| iSQRT-COV-Net | Deep CNNs Meet Global Covariance Pooling: Better … | 14.63 | 2019-04-15 |
| MetaFormer (MetaFormer-2,384,extra_info) | MetaFormer: A Unified Meta Framework for Fine-Gra… | | 2022-03-05 |
| MetaFormer (MetaFormer-2,384) | MetaFormer: A Unified Meta Framework for Fine-Gra… | | 2022-03-05 |
| IncResNetV2 SE | The iNaturalist Species Classification and Detect… | | 2017-07-20 |
| SpineNet-143 | SpineNet: Learning Scale-Permuted Backbone for Re… | | 2019-12-10 |
| MetaSAug | MetaSAug: Meta Semantic Augmentation for Long-Tai… | | 2021-03-23 |
| Graph-RISE (40M) | Graph-RISE: Graph-Regularized Image Semantic Embe… | | 2019-02-14 |