
ImageNet-A

Zero-Shot Transfer Image Classification Benchmark

Performance Over Time

[Chart: Accuracy (Private) of the 12 benchmark results plotted against publication date.]

Top Performing Models

| Rank | Model | Paper | Accuracy (Private) | Date | Code |
| --- | --- | --- | --- | --- | --- |
| 1 | CoCa | CoCa: Contrastive Captioners are Image-Text Foundation Models | 90.20 | 2022-05-04 | mlfoundations/open_clip, facebookresearch/multimodal, lucidrains/CoCa-pytorch |
| 2 | LiT-22B | Scaling Vision Transformers to 22 Billion Parameters | 90.10 | 2023-02-10 | lucidrains/flash-cosine-sim-attention |
| 3 | LiT ViT-e | PaLI: A Jointly-Scaled Multilingual Language-Image Model | 88.00 | 2022-09-14 | google-research/big_vision |
| 4 | EVA-CLIP-18B | EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters | 87.30 | 2024-02-06 | baaivision/EVA |
| 5 | BASIC | Combined Scaling for Zero-shot Transfer Learning | 85.60 | 2021-11-19 | - |
| 6 | InternVL-C | InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | 83.80 | 2023-12-21 | opengvlab/internvl, opengvlab/internvl-mmdetseg |
| 7 | EVA-CLIP-E/14+ | EVA-CLIP: Improved Training Techniques for CLIP at Scale | 82.10 | 2023-03-27 | baaivision/eva, PaddlePaddle/PaddleMIX, Yui010206/CREMA, jaehong31/raccoon |
| 8 | LiT-tuning | LiT: Zero-Shot Transfer with Locked-image text Tuning | 79.40 | 2021-11-15 | mlfoundations/open_clip, google-research/vision_transformer, google-research/big_vision, laion-ai/clip_benchmark, eify/clip_benchmark |
| 9 | CLIP | Learning Transferable Visual Models From Natural Language Supervision | 77.20 | 2021-02-26 | openai/CLIP, mlfoundations/open_clip, towhee-io/towhee |
| 10 | ALIGN | Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision | 75.80 | 2021-02-11 | facebookresearch/metaclip, kakaobrain/coyo-dataset, MicPie/clasp, willard-yuan/video-text-retrieval-papers, pwc-1/Paper-8 |
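
Zero-shot transfer on this benchmark means the model classifies ImageNet-A images without any training on ImageNet-A labels: each class name is embedded by the model's text encoder, and an image is assigned to the class whose text embedding is most similar to its image embedding. Below is a minimal sketch of that protocol using the mlfoundations/open_clip library listed above; the ViT-B-32 checkpoint, the "a photo of a ..." prompt template, the three class names, and the image path are illustrative assumptions, not the setup of any leaderboard entry.

```python
import torch
import open_clip
from PIL import Image

# Load any open_clip checkpoint; ViT-B-32 is chosen here purely for brevity
# (the leaderboard entries above use far larger models).
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

# ImageNet-A covers 200 ImageNet classes; three placeholder names stand in
# for the full list here.
class_names = ["goldfish", "great white shark", "junco"]
prompts = [f"a photo of a {name}" for name in class_names]

with torch.no_grad():
    # Embed one prompt per class and L2-normalize: this is the entire
    # "zero-shot classifier" -- no ImageNet-A training is involved.
    text_features = model.encode_text(tokenizer(prompts))
    text_features /= text_features.norm(dim=-1, keepdim=True)

    # Embed an image ("example.jpg" is a placeholder path) and pick the
    # class whose text embedding has the highest cosine similarity.
    image = preprocess(Image.open("example.jpg")).unsqueeze(0)
    image_features = model.encode_image(image)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    similarity = image_features @ text_features.T

print(class_names[similarity.argmax(dim=-1).item()])
```

Benchmark accuracy is this prediction averaged over the ImageNet-A test set; published results also typically ensemble several prompt templates per class (CLIP, for instance, averages text embeddings over a large set of templates) rather than using a single prompt as above.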
