Model | Paper | Score | Date
M2-Encoder | M2-Encoder: Advancing Bilingual Image-Text Unders… | 88.50 | 2024-01-29
CoCa | CoCa: Contrastive Captioners are Image-Text Found… | 86.30 | 2022-05-04
LiT-22B | Scaling Vision Transformers to 22 Billion Paramet… | 85.90 | 2023-02-10
BASIC | Combined Scaling for Zero-shot Transfer Learning | 85.70 | 2021-11-19
LiT ViT-e | PaLI: A Jointly-Scaled Multilingual Language-Imag… | 85.40 | 2022-09-14
LiT-tuning | LiT: Zero-Shot Transfer with Locked-image text Tu… | 84.50 | 2021-11-15
IMP-MoE-L | Alternating Gradient Descent and Mixture-of-Exper… | 83.90 | 2023-05-10
EVA-CLIP-18B | EVA-CLIP-18B: Scaling CLIP to 18 Billion Paramete… | 83.80 | 2024-02-06
InternVL-C | InternVL: Scaling up Vision Foundation Models and… | 83.20 | 2023-12-21
MAWS (ViT-2B) | The effectiveness of MAE pre-pretraining for bill… | 82.10 | 2023-03-23
EVA-CLIP-E/14+ | EVA-CLIP: Improved Training Techniques for CLIP a… | 82.00 | 2023-03-27
MAWS (ViT-H) | The effectiveness of MAE pre-pretraining for bill… | 81.10 | 2023-03-23
REACT | Learning Customized Visual Models with Retrieval-… | 78.50 | 2023-01-17
ALIGN | Scaling Up Visual and Vision-Language Representat… | 76.40 | 2021-02-11
CLIP(ViT-L/14-336px) | Learning Transferable Visual Models From Natural … | 76.20 | 2021-02-26
AltCLIP | AltCLIP: Altering the Language Encoder in CLIP fo… | 74.50 | 2022-11-12
PaLI | PaLI: A Jointly-Scaled Multilingual Language-Imag… | 72.11 | 2022-09-14
Diffusion Classifier (zero-shot) | Your Diffusion Model is Secretly a Zero-Shot Clas… | 61.40 | 2023-03-28
CLIP (ResNet50) | Learning Transferable Visual Models From Natural … | 59.60 | 2021-02-26
CLIP | Learning Transferable Visual Models From Natural … | 31.30 | 2021-02-26