| CoCa | CoCa: Contrastive Captioners are Image-Text Found… | 96.50 | 2022-05-04 |
| LiT ViT-e | PaLI: A Jointly-Scaled Multilingual Language-Imag… | 96.10 | 2022-09-14 |
| LiT-22B | Scaling Vision Transformers to 22 Billion Paramet… | 96.00 | 2023-02-10 |
| BASIC | Combined Scaling for Zero-shot Transfer Learning | 95.70 | 2021-11-19 |
| EVA-CLIP-18B | EVA-CLIP-18B: Scaling CLIP to 18 Billion Paramete… | 95.70 | 2024-02-06 |
| EVA-CLIP-E/14+ | EVA-CLIP: Improved Training Techniques for CLIP a… | 94.50 | 2023-03-27 |
| LiT-tuning | LiT: Zero-Shot Transfer with Locked-image text Tu… | 93.90 | 2021-11-15 |
| ALIGN | Scaling Up Visual and Vision-Language Representat… | 92.20 | 2021-02-11 |
| CLIP | Learning Transferable Visual Models From Natural … | 88.90 | 2021-02-26 |
| AltCLIP | AltCLIP: Altering the Language Encoder in CLIP fo… | 87.20 | 2022-11-12 |
| PaLI | PaLI: A Jointly-Scaled Multilingual Language-Imag… | 81.97 | 2022-09-14 |