CREPE (Compositional REPresentation Evaluation)

Image Retrieval Benchmark

Showing 22 results. Metric: Recall@1 (HN-Atom + HN-Comp, SC)

Top Performing Models

Rank	Model	Paper	Recall@1 (HN-Atom + HN-Comp, SC)	Date	Code
1	ViT-L-14 (LAION400M)	CREPE: Can Vision-Language Foundation Models Reason Compositionally?	47.86	2022-12-13	raivnlab/crepe
2	ViT-B-16+240 (LAION400M)	CREPE: Can Vision-Language Foundation Models Reason Compositionally?	46.53	2022-12-13	raivnlab/crepe
3	ViT-B-16 (LAION400M)	CREPE: Can Vision-Language Foundation Models Reason Compositionally?	44.93	2022-12-13	raivnlab/crepe
4	Swin-T (MosaiCLIP, CC-12M)	Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality	44.50	2023-05-23	-
5	RN-50 (MosaiCLIP, CC-12M)	Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality	44.40	2023-05-23	-
6	ViT-B-32 (LAION400M)	CREPE: Can Vision-Language Foundation Models Reason Compositionally?	42.75	2022-12-13	raivnlab/crepe
7	MosaiCLIP (YFCC-FT)	Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality	41.50	2023-05-23	-
8	RN-50 (NegCLIP, CC-12M)	Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality	41.40	2023-05-23	-
9	MosaiCLIP (CC-FT)	Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality	40.90	2023-05-23	-
10	RN50 (YFCC15M)	CREPE: Can Vision-Language Foundation Models Reason Compositionally?	39.85	2022-12-13	raivnlab/crepe

All Papers (22)

Model	Paper	Year	Code
ViT-L-14 (LAION400M)	CREPE: Can Vision-Language Foundation Models Reason Compositionally?	2022	raivnlab/crepe
ViT-B-16+240 (LAION400M)	CREPE: Can Vision-Language Foundation Models Reason Compositionally?	2022	raivnlab/crepe
ViT-B-16 (LAION400M)	CREPE: Can Vision-Language Foundation Models Reason Compositionally?	2022	raivnlab/crepe
Swin-T (MosaiCLIP, CC-12M)	Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality	2023	-
RN-50 (MosaiCLIP, CC-12M)	Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality	2023	-
ViT-B-32 (LAION400M)	CREPE: Can Vision-Language Foundation Models Reason Compositionally?	2022	raivnlab/crepe
MosaiCLIP (YFCC-FT)	Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality	2023	-
RN-50 (NegCLIP, CC-12M)	Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality	2023	-
MosaiCLIP (CC-FT)	Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality	2023	-
RN50 (YFCC15M)	CREPE: Can Vision-Language Foundation Models Reason Compositionally?	2022	raivnlab/crepe
Swin-T (NegCLIP, CC-12M)	Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality	2023	-
CLIP (YFCC-FT)	Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality	2023	-
RN101 (YFCC15M)	CREPE: Can Vision-Language Foundation Models Reason Compositionally?	2022	raivnlab/crepe
NegCLIP (YFCC-FT)	Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality	2023	-
CLIP-FT (YFCC-FT)	Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality	2023	-
NegCLIP (CC-FT)	Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality	2023	-
Swin-T (CLIP, CC-12M)	Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality	2023	-
RN-50 (CLIP, CC-12M)	Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality	2023	-
CLIP-FT (CC-FT)	Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality	2023	-
CLIP (CC-FT)	Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality	2023	-
RN50 (CC12M)	CREPE: Can Vision-Language Foundation Models Reason Compositionally?	2022	raivnlab/crepe
Random	CREPE: Can Vision-Language Foundation Models Reason Compositionally?	2022	raivnlab/crepe


Model	Paper	Recall@1 (HN-Atom + HN-Comp, SC)	Date
ViT-L-14 (LAION400M)	CREPE: Can Vision-Language Foundation Models Reason Compositionally?	47.86	2022-12-13
ViT-B-16+240 (LAION400M)	CREPE: Can Vision-Language Foundation Models Reason Compositionally?	46.53	2022-12-13
ViT-B-16 (LAION400M)	CREPE: Can Vision-Language Foundation Models Reason Compositionally?	44.93	2022-12-13
Swin-T (MosaiCLIP, CC-12M)	Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality	44.50	2023-05-23
RN-50 (MosaiCLIP, CC-12M)	Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality	44.40	2023-05-23
ViT-B-32 (LAION400M)	CREPE: Can Vision-Language Foundation Models Reason Compositionally?	42.75	2022-12-13
MosaiCLIP (YFCC-FT)	Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality	41.50	2023-05-23
RN-50 (NegCLIP, CC-12M)	Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality	41.40	2023-05-23
MosaiCLIP (CC-FT)	Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality	40.90	2023-05-23
RN50 (YFCC15M)	CREPE: Can Vision-Language Foundation Models Reason Compositionally?	39.85	2022-12-13
Swin-T (NegCLIP, CC-12M)	Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality	39.60	2023-05-23
CLIP (YFCC-FT)	Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality	39.50	2023-05-23
RN101 (YFCC15M)	CREPE: Can Vision-Language Foundation Models Reason Compositionally?	39.50	2022-12-13
NegCLIP (YFCC-FT)	Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality	39.00	2023-05-23
CLIP-FT (YFCC-FT)	Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality	38.30	2023-05-23
NegCLIP (CC-FT)	Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality	37.50	2023-05-23
Swin-T (CLIP, CC-12M)	Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality	37.30	2023-05-23
RN-50 (CLIP, CC-12M)	Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality	36.70	2023-05-23
CLIP-FT (CC-FT)	Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality	35.60	2023-05-23
CLIP (CC-FT)	Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality	35.00	2023-05-23
RN50 (CC12M)	CREPE: Can Vision-Language Foundation Models Reason Compositionally?	34.88	2022-12-13
Random	CREPE: Can Vision-Language Foundation Models Reason Compositionally?	20.00	2022-12-13
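
For readers unfamiliar with the metric, the sketch below shows one common way Recall@1 is computed when each image is scored against a small set of candidate captions. It is an illustration only, not the evaluation code from raivnlab/crepe. The function name `recall_at_1`, the score layout (correct caption in column 0), and the similarity values are all hypothetical. The choice of five candidates per image is an assumption made only because it would be consistent with the Random row at 20.00; the page itself does not state the candidate count.

```python
from typing import Sequence


def recall_at_1(scores: Sequence[Sequence[float]], positive_index: int = 0) -> float:
    """Return the percentage of queries whose top-scoring candidate is correct.

    Each row holds one image's similarity to every candidate caption.
    `positive_index` is the column of the ground-truth caption (assumed
    layout); the other columns are hard negatives.
    """
    hits = 0
    for row in scores:
        # Index of the highest-scoring candidate (first one wins on ties).
        best = max(range(len(row)), key=lambda i: row[i])
        if best == positive_index:
            hits += 1
    return 100.0 * hits / len(scores)


# Made-up similarity scores for 4 images: 1 correct caption (column 0)
# plus 4 hard negatives each. Five candidates is an assumption.
scores = [
    [0.31, 0.22, 0.18, 0.25, 0.10],  # correct caption ranked first
    [0.20, 0.35, 0.15, 0.12, 0.08],  # a hard negative scores higher
    [0.40, 0.10, 0.12, 0.30, 0.05],  # correct caption ranked first
    [0.15, 0.14, 0.16, 0.13, 0.12],  # a hard negative scores higher
]

result = recall_at_1(scores)        # 2 of 4 correct
chance_level = 100.0 / len(scores[0])  # uniform guessing over 5 candidates
```

Under these assumptions, uniform random guessing over five candidates gives an expected Recall@1 of 20, which is why the Random row serves as the floor against which the other entries can be read.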