Flickr30k

Cross-Modal Retrieval Benchmark

Showing 23 results | Metric: Image-to-text R@1
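As a rough illustration of the metric, image-to-text R@1 is commonly computed as the percentage of query images whose top-ranked caption is one of that image's own ground-truth captions. The sketch below assumes a precomputed image-by-caption similarity matrix; the function name `image_to_text_recall_at_k` and the inputs `sim` and `caption_to_image` are hypothetical and not taken from any of the listed papers or repositories.

```python
def image_to_text_recall_at_k(sim, caption_to_image, k=1):
    """Percentage of images with at least one correct caption in the top k.

    sim: list of rows, one per image; sim[i][j] is the similarity score
         between image i and caption j (hypothetical input format).
    caption_to_image: caption_to_image[j] is the index of the image that
         caption j describes.
    """
    hits = 0
    for image_idx, scores in enumerate(sim):
        # Rank caption indices from most to least similar to this image.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        # The image is a hit if any of its top-k captions belongs to it.
        if any(caption_to_image[j] == image_idx for j in ranked[:k]):
            hits += 1
    return 100.0 * hits / len(sim)


# Toy example: 3 images, 2 ground-truth captions each.
caption_to_image = [0, 0, 1, 1, 2, 2]
sim = [
    [0.9, 0.1, 0.2, 0.3, 0.0, 0.1],  # best caption is 0 (correct)
    [0.8, 0.1, 0.5, 0.4, 0.2, 0.0],  # best caption is 0 (wrong image)
    [0.1, 0.2, 0.3, 0.1, 0.2, 0.7],  # best caption is 5 (correct)
]
r_at_1 = image_to_text_recall_at_k(sim, caption_to_image, k=1)
r_at_2 = image_to_text_recall_at_k(sim, caption_to_image, k=2)
```

In this toy case two of the three images retrieve a correct caption at rank 1, and all three do so within the top 2. Individual papers may differ in test split and evaluation protocol, so scores are only comparable under the same setup.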

Top Performing Models

| Rank | Model | Paper | Image-to-text R@1 (%) | Date | Code |
|---|---|---|---|---|---|
| 1 | ERNIE-ViL 2.0 | ERNIE-ViL 2.0: Multi-view Contrastive Learning for Image-Text Pre-training | 93.30 | 2022-09-30 | PaddlePaddle/ERNIE |
| 2 | X2-VLM (large) | X$^2$-VLM: All-In-One Pre-trained Model For Vision-Language Tasks | 91.80 | 2022-11-22 | zengyan-97/x-vlm, zengyan-97/x2-vlm |
| 3 | VAST | VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | 91.00 | 2023-05-29 | TXH-mercury/VALOR, txh-mercury/vast |
| 4 | X2-VLM (base) | X$^2$-VLM: All-In-One Pre-trained Model For Vision-Language Tasks | 90.40 | 2022-11-22 | zengyan-97/x-vlm, zengyan-97/x2-vlm |
| 5 | BEiT-3 | Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks | 90.30 | 2022-08-22 | microsoft/unilm, lyan62/data-curation |
| 6 | OmniVL (14M) | OmniVL:One Foundation Model for Image-Language and Video-Language Tasks | 87.90 | 2022-09-15 | - |
| 7 | X-VLM (base) | Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts | 86.90 | 2021-11-16 | zengyan-97/x-vlm |
| 8 | VSE-Gradient | Dissecting Deep Metric Learning Losses for Image-Text Retrieval | 86.30 | 2022-10-21 | microsoft/VSE_Gradient, littleredxh/vse-gradient |
| 9 | ALIGN | Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision | 84.90 | 2021-02-11 | facebookresearch/metaclip, kakaobrain/coyo-dataset, MicPie/clasp, willard-yuan/video-text-retrieval-papers, pwc-1/Paper-8 |
| 10 | IAIS | Learning Relation Alignment for Calibrated Cross-modal Retrieval | 76.86 | 2021-05-28 | lancopku/IAIS |

All Papers (23)