Flickr30k

Dataset Information
Modalities: Images, Texts
Languages: English
Introduced: 2014

Overview

The Flickr30k dataset contains roughly 31,000 images collected from Flickr, each paired with 5 reference sentences written by human annotators.

Source: Guiding Long-Short Term Memory for Image Caption Generation

Image Source: Dual-Path Convolutional Image-Text Embedding with Instance Loss

Variants: Flickr30k, Flickr, Flickr30k Captions test, Flickr30K 1K test
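The image-to-caption layout described in the overview can be sketched as a simple mapping from an image identifier to its list of reference sentences. The snippet below is a minimal illustration, not the dataset's official loader: the function names, the `(image_id, caption)` pair format, and the toy file names are all assumptions made for the example.

```python
from collections import defaultdict

# Per the overview: each image has 5 reference sentences.
CAPTIONS_PER_IMAGE = 5


def group_captions(pairs):
    """Group (image_id, caption) pairs into {image_id: [captions]}."""
    grouped = defaultdict(list)
    for image_id, caption in pairs:
        grouped[image_id].append(caption)
    return dict(grouped)


def check_caption_counts(grouped, expected=CAPTIONS_PER_IMAGE):
    """Return the sorted image ids whose caption count differs from `expected`."""
    return sorted(i for i, caps in grouped.items() if len(caps) != expected)


# Hypothetical toy records; real file names and captions will differ.
toy_pairs = [("img_001.jpg", f"caption {k}") for k in range(5)]
toy_pairs += [("img_002.jpg", f"caption {k}") for k in range(4)]  # one caption missing

grouped = group_captions(toy_pairs)
bad = check_caption_counts(grouped)  # ids that do not have exactly 5 captions
```

A check like this is a cheap sanity test after parsing an annotation file. At 5 captions per image, about 31,000 images correspond to about 155,000 image-caption pairs.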

Associated Benchmarks

This dataset is used in 5 benchmarks:

Recent Benchmark Submissions

Task | Model | Paper | Date
Cross-Modal Retrieval | 3SHNet | 3SHNet: Boosting Image-Sentence Retrieval via … | 2024-04-26
Cross-Modal Retrieval | DSMD | Dynamic Self-adaptive Multiscale Distillation from … | 2024-04-16
Image-to-Text Retrieval | InternVL-C-FT (finetuned, w/o ranking) | InternVL: Scaling up Vision Foundation … | 2023-12-21
Image-to-Text Retrieval | InternVL-G-FT (finetuned, w/o ranking) | InternVL: Scaling up Vision Foundation … | 2023-12-21
Cross-Modal Retrieval | VAST | VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation … | 2023-05-29
Image-to-Text Retrieval | ONE-PEACE (finetuned, w/o ranking) | ONE-PEACE: Exploring One General Representation … | 2023-05-18
Image Retrieval | MaMMUT (ours) | MaMMUT: A Simple Architecture for … | 2023-03-29
Cross-Modal Retrieval | RCAR | Plug-and-Play Regulators for Image-Text Matching | 2023-03-23
Image-to-Text Retrieval | BLIP-2 ViT-G (zero-shot, 1K test set) | BLIP-2: Bootstrapping Language-Image Pre-training with … | 2023-01-30
Image Retrieval | BLIP-2 ViT-G (zero-shot, 1K test set) | BLIP-2: Bootstrapping Language-Image Pre-training with … | 2023-01-30
Image Retrieval | BLIP-2 ViT-L (zero-shot, 1K test set) | BLIP-2: Bootstrapping Language-Image Pre-training with … | 2023-01-30
Image-to-Text Retrieval | BLIP-2 ViT-L (zero-shot, 1K test set) | BLIP-2: Bootstrapping Language-Image Pre-training with … | 2023-01-30
Image-to-Text Retrieval | UNITER | HADA: A Graph-based Amalgamation Framework … | 2023-01-11
Image Retrieval | UNITER | HADA: A Graph-based Amalgamation Framework … | 2023-01-11
Image Retrieval | HADA | HADA: A Graph-based Amalgamation Framework … | 2023-01-11
Image Retrieval | ALBEF | HADA: A Graph-based Amalgamation Framework … | 2023-01-11
Image-to-Text Retrieval | ALBEF | HADA: A Graph-based Amalgamation Framework … | 2023-01-11
Cross-Modal Retrieval | X2-VLM (base) | X$^2$-VLM: All-In-One Pre-trained Model For … | 2022-11-22
Cross-Modal Retrieval | X2-VLM (large) | X$^2$-VLM: All-In-One Pre-trained Model For … | 2022-11-22
Semi Supervised Learning for Image Captioning | CapDec | Text-Only Training for Image Captioning … | 2022-11-01

Research Papers

Recent papers with results on this dataset: