Flickr30k

Image-to-Text Retrieval Benchmark

Showing 11 results | Metric: Recall@1

Top Performing Models

| Rank | Model | Paper | Recall@1 | Date | Code |
|------|-------|-------|----------|------|------|
| 1 | InternVL-G-FT (finetuned, w/o ranking) | InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | 97.90 | 2023-12-21 | opengvlab/internvl, opengvlab/internvl-mmdetseg |
| 2 | BLIP-2 ViT-G (zero-shot, 1K test set) | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | 97.60 | 2023-01-30 | huggingface/transformers, salesforce/lavis, thudm/visualglm-6b |
| 3 | ONE-PEACE (finetuned, w/o ranking) | ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | 97.60 | 2023-05-18 | modelscope/modelscope, OFA-Sys/ONE-PEACE |
| 4 | InternVL-C-FT (finetuned, w/o ranking) | InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | 97.20 | 2023-12-21 | opengvlab/internvl, opengvlab/internvl-mmdetseg |
| 5 | BLIP-2 ViT-L (zero-shot, 1K test set) | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | 96.90 | 2023-01-30 | huggingface/transformers, salesforce/lavis, thudm/visualglm-6b |
| 6 | ERNIE-ViL 2.0 | ERNIE-ViL 2.0: Multi-view Contrastive Learning for Image-Text Pre-training | 96.10 | 2022-09-30 | PaddlePaddle/ERNIE |
| 7 | ALBEF | Align before Fuse: Vision and Language Representation Learning with Momentum Distillation | 95.90 | 2021-07-16 | salesforce/lavis, salesforce/ALBEF, facebookresearch/multimodal |
| 8 | ALBEF | HADA: A Graph-based Amalgamation Framework in Image-text Retrieval | 92.60 | 2023-01-11 | m2man/hada, m2man/HADA-LAVIS |
| 9 | UNITER | HADA: A Graph-based Amalgamation Framework in Image-text Retrieval | 87.30 | 2023-01-11 | m2man/hada, m2man/HADA-LAVIS |
| 10 | GSMN | A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval | 76.40 | 2021-06-04 | m2man/LGSGM |
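
For readers unfamiliar with the metric, the following is a minimal sketch of how Recall@K is commonly computed for image-to-text retrieval: each image is used as a query, all captions are ranked by similarity, and the image counts as a hit if at least one of its own captions appears in the top K. The function name `image_to_text_recall_at_k`, the argument layout, and the toy scores are illustrative assumptions, not taken from any of the listed papers or repositories, and individual papers may differ in protocol (for example zero-shot versus finetuned, or which test split is used).

```python
def image_to_text_recall_at_k(similarity, caption_image_ids, k=1):
    """Return Recall@K (in percent) for image-to-text retrieval.

    similarity[i][c]     : score between image i and caption c (higher is better)
    caption_image_ids[c] : index of the image that caption c describes
    Both names are hypothetical; this is a sketch, not a reference implementation.
    """
    n_captions = len(caption_image_ids)
    if not similarity:
        raise ValueError("similarity must contain at least one image row")
    if not 1 <= k <= n_captions:
        raise ValueError("k must be between 1 and the number of captions")

    hits = 0
    for image_id, scores in enumerate(similarity):
        if len(scores) != n_captions:
            raise ValueError("each row needs one score per caption")
        # Rank caption indices from highest to lowest score for this image.
        ranked = sorted(range(n_captions), key=lambda c: scores[c], reverse=True)
        # The image is a hit if any of its own captions is in the top k.
        if any(caption_image_ids[c] == image_id for c in ranked[:k]):
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy data (made up): 3 images, 2 captions per image.
caption_image_ids = [0, 0, 1, 1, 2, 2]
similarity = [
    [0.9, 0.2, 0.1, 0.3, 0.0, 0.1],  # image 0: best caption is its own
    [0.1, 0.8, 0.4, 0.5, 0.2, 0.0],  # image 1: best caption belongs to image 0
    [0.0, 0.1, 0.2, 0.1, 0.3, 0.7],  # image 2: best caption is its own
]

print(f"R@1 = {image_to_text_recall_at_k(similarity, caption_image_ids, k=1):.2f}")  # R@1 = 66.67
print(f"R@2 = {image_to_text_recall_at_k(similarity, caption_image_ids, k=2):.2f}")  # R@2 = 100.00
```

In the toy example, image 1 misses at K=1 because a caption of image 0 scores highest, but it is recovered at K=2. The scores in the table above are reported on the same 0 to 100 scale.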
