
PhotoChat

Image Retrieval Benchmark

Leaderboard

Showing 5 results. Metric: R1 (Recall@1, %)

Rank	Model	Paper	R1 (%)	Date	Code
1	PaCE	PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts	15.20	2023-05-24	AlibabaResearch/DAMO-ConvAI
2	VLMo	VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts	11.50	2021-11-03	microsoft/unilm, ylsung/vl-merging
3	ViLT	ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision	11.50	2021-02-05	huggingface/transformers, dandelin/vilt, glamor-usc/climb
4	SCAN	Stacked Cross Attention for Image-Text Matching	10.40	2018-03-21	kuanghuei/SCAN, MysteryVaibhav/SCAN, adlnlp/attention_vl
5	DE++	PhotoChat: A Human-Human Dialogue Dataset with Photo Sharing Behavior for Joint Image-Text Modeling	9.00	2021-07-06	-
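R1 is the percentage of queries whose correct photo is ranked first among the candidates. The sketch below illustrates the metric under two assumptions. First, the model produces a square similarity matrix in which row i scores dialogue i against every candidate photo. Second, the correct photo for dialogue i sits in column i. The function name `recall_at_k` is hypothetical, and this page does not state the leaderboard's exact protocol (candidate pool size, tie handling).

```python
import numpy as np


def recall_at_k(similarity: np.ndarray, k: int = 1) -> float:
    """Return the percentage of queries whose ground-truth image is in the top k.

    Assumption (hypothetical setup): similarity[i, j] scores query i (a dialogue)
    against candidate image j, and the ground-truth image for query i is column i.
    """
    n = similarity.shape[0]
    # Column indices of the k highest-scoring candidates for each query.
    top_k = np.argsort(-similarity, axis=1)[:, :k]
    # A query is a hit if its own index appears among its top-k candidates.
    hits = (top_k == np.arange(n)[:, None]).any(axis=1)
    return 100.0 * float(hits.mean())


if __name__ == "__main__":
    # Toy 3x3 example: queries 0 and 2 rank their own image first; query 1 does not.
    sim = np.array([[0.9, 0.1, 0.2],
                    [0.3, 0.2, 0.8],
                    [0.1, 0.4, 0.7]])
    print(recall_at_k(sim, k=1))
```

With k=1 this reduces to checking whether the arg-max candidate is the ground truth. Larger k gives the R@5 and R@10 variants that retrieval papers often report alongside R@1.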

[Figure: Performance over time. R1 by publication date for SCAN (2018), ViLT (2021), DE++ (2021), VLMo (2021), and PaCE (2023).]
