LiT-22B | Scaling Vision Transformers to 22 Billion Paramet… | 87.60 | 2023-02-10
LiT ViT-e | PaLI: A Jointly-Scaled Multilingual Language-Imag… | 84.90 | 2022-09-14
CoCa | CoCa: Contrastive Captioners are Image-Text Found… | 82.70 | 2022-05-04
EVA-CLIP-18B | EVA-CLIP-18B: Scaling CLIP to 18 Billion Paramete… | 82.20 | 2024-02-06
LiT-tuning | LiT: Zero-Shot Transfer with Locked-image text Tu… | 81.10 | 2021-11-15
InternVL-C | InternVL: Scaling up Vision Foundation Models and… | 80.60 | 2023-12-21
EVA-CLIP-E/14+ | EVA-CLIP: Improved Training Techniques for CLIP a… | 79.60 | 2023-03-27
CLIP | Learning Transferable Visual Models From Natural … | 72.30 | 2021-02-26
PaLI | PaLI: A Jointly-Scaled Multilingual Language-Imag… | 42.62 | 2022-09-14