ViT-L-14 (LAION400M) | CREPE: Can Vision-Language Foundation Models Reas… | 47.86 | 2022-12-13
ViT-B-16+240 (LAION400M) | CREPE: Can Vision-Language Foundation Models Reas… | 46.53 | 2022-12-13
ViT-B-16 (LAION400M) | CREPE: Can Vision-Language Foundation Models Reas… | 44.93 | 2022-12-13
Swin-T (MosaiCLIP, CC-12M) | Coarse-to-Fine Contrastive Learning in Image-Text… | 44.50 | 2023-05-23
RN-50 (MosaiCLIP, CC-12M) | Coarse-to-Fine Contrastive Learning in Image-Text… | 44.40 | 2023-05-23
ViT-B-32 (LAION400M) | CREPE: Can Vision-Language Foundation Models Reas… | 42.75 | 2022-12-13
MosaiCLIP (YFCC-FT) | Coarse-to-Fine Contrastive Learning in Image-Text… | 41.50 | 2023-05-23
RN-50 (NegCLIP, CC-12M) | Coarse-to-Fine Contrastive Learning in Image-Text… | 41.40 | 2023-05-23
MosaiCLIP (CC-FT) | Coarse-to-Fine Contrastive Learning in Image-Text… | 40.90 | 2023-05-23
RN50 (YFCC15M) | CREPE: Can Vision-Language Foundation Models Reas… | 39.85 | 2022-12-13
Swin-T (NegCLIP, CC-12M) | Coarse-to-Fine Contrastive Learning in Image-Text… | 39.60 | 2023-05-23
CLIP (YFCC-FT) | Coarse-to-Fine Contrastive Learning in Image-Text… | 39.50 | 2023-05-23
RN101 (YFCC15M) | CREPE: Can Vision-Language Foundation Models Reas… | 39.50 | 2022-12-13
NegCLIP (YFCC-FT) | Coarse-to-Fine Contrastive Learning in Image-Text… | 39.00 | 2023-05-23
CLIP-FT (YFCC-FT) | Coarse-to-Fine Contrastive Learning in Image-Text… | 38.30 | 2023-05-23
NegCLIP (CC-FT) | Coarse-to-Fine Contrastive Learning in Image-Text… | 37.50 | 2023-05-23
Swin-T (CLIP, CC-12M) | Coarse-to-Fine Contrastive Learning in Image-Text… | 37.30 | 2023-05-23
RN-50 (CLIP, CC-12M) | Coarse-to-Fine Contrastive Learning in Image-Text… | 36.70 | 2023-05-23
CLIP-FT (CC-FT) | Coarse-to-Fine Contrastive Learning in Image-Text… | 35.60 | 2023-05-23
CLIP (CC-FT) | Coarse-to-Fine Contrastive Learning in Image-Text… | 35.00 | 2023-05-23
RN50 (CC12M) | CREPE: Can Vision-Language Foundation Models Reas… | 34.88 | 2022-12-13
Random | CREPE: Can Vision-Language Foundation Models Reas… | 20.00 | 2022-12-13