| CoCa | CoCa: Contrastive Captioners are Image-Text Found… | 90.20 | 2022-05-04 |
| LiT-22B | Scaling Vision Transformers to 22 Billion Paramet… | 90.10 | 2023-02-10 |
| LiT ViT-e | PaLI: A Jointly-Scaled Multilingual Language-Imag… | 88.00 | 2022-09-14 |
| EVA-CLIP-18B | EVA-CLIP-18B: Scaling CLIP to 18 Billion Paramete… | 87.30 | 2024-02-06 |
| BASIC | Combined Scaling for Zero-shot Transfer Learning | 85.60 | 2021-11-19 |
| InternVL-C | InternVL: Scaling up Vision Foundation Models and… | 83.80 | 2023-12-21 |
| EVA-CLIP-E/14+ | EVA-CLIP: Improved Training Techniques for CLIP a… | 82.10 | 2023-03-27 |
| LiT-tuning | LiT: Zero-Shot Transfer with Locked-image text Tu… | 79.40 | 2021-11-15 |
| CLIP | Learning Transferable Visual Models From Natural … | 77.20 | 2021-02-26 |
| ALIGN | Scaling Up Visual and Vision-Language Representat… | 75.80 | 2021-02-11 |
| AltCLIP | AltCLIP: Altering the Language Encoder in CLIP fo… | 69.50 | 2022-11-12 |
| PaLI | PaLI: A Jointly-Scaled Multilingual Language-Imag… | 44.70 | 2022-09-14 |