Flickr30k

Name: Flickr30k
Published: 2014-01-01
License: Custom (research-only, non-commercial)

Dataset Information

Modalities

Images, Texts

Languages

English

Introduced

2014

Homepage

Official Website

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

The Flickr30k dataset contains approximately 31,000 images collected from Flickr, each paired with 5 reference sentences written by human annotators.

Source: Guiding Long-Short Term Memory for Image Caption Generation

Image Source: Dual-Path Convolutional Image-Text Embedding with Instance Loss

Variants: Flickr30k, Flickr, Flickr30k Captions test, Flickr30K 1K test
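Because every image carries five reference sentences, a common first step is to group caption records by image. The sketch below assumes a hypothetical line format of `<image_file>#<caption_index>\t<caption text>`; the function name `group_captions` and the sample records are illustrative only and are not taken from the official distribution, so check the actual annotation files before relying on this layout.

```python
from collections import defaultdict


def group_captions(lines):
    """Group caption records by image (hypothetical record format).

    Each line is assumed to look like "<image_file>#<caption_index>\t<caption>".
    Returns a dict mapping image file name -> list of captions ordered by index.
    """
    by_image = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        key, caption = line.split("\t", 1)          # split key from caption text
        image_file, idx = key.rsplit("#", 1)        # "img.jpg#3" -> ("img.jpg", "3")
        by_image[image_file][int(idx)] = caption.strip()
    # Order each image's captions by their caption index.
    return {img: [caps[i] for i in sorted(caps)] for img, caps in by_image.items()}


# Toy records for two made-up images, five captions each.
sample = [f"img_{c}.jpg#{i}\tcaption {i} for image {c}" for c in "ab" for i in range(5)]
groups = group_captions(sample)
```

With real annotations, checking that every entry of `groups` has exactly five captions is a quick sanity test of the parsing.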

Associated Benchmarks

This dataset is used in 5 benchmarks:

Image Retrieval - Metrics: Recall@10, Recall@5, Recall@1, Recall@Sum, Image-to-text R@1, Image-to-text R@10, Image-to-text R@5, QPS (queries per second)
Phrase Grounding - Metrics: Pointing Game Accuracy
Semi Supervised Learning for Image Captioning - Metrics: CIDEr
Cross-Modal Retrieval - Metrics: Image-to-text R@1, Image-to-text R@5, Image-to-text R@10, Text-to-image R@1, Text-to-image R@5, Text-to-image R@10
Image-to-Text Retrieval - Metrics: Recall@1, Recall@5, Recall@10, Recall@Sum
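The retrieval metrics above (R@1, R@5, R@10 in each direction, and Recall@Sum) can be computed from an image-caption similarity matrix. The sketch below assumes that caption `j` belongs to image `j // 5`, that an image-to-text query counts as a hit when any of its five captions appears in the top K, and that Recall@Sum is the sum of the six recall values; this follows a convention common in image-text matching papers, but individual leaderboards may define it differently. The function names are hypothetical.

```python
import numpy as np


def i2t_recall(sim, k, caps_per_image=5):
    """Image-to-text Recall@K in percent.

    sim[i, j] is the similarity between image i and caption j; caption j is
    assumed to belong to image j // caps_per_image. An image is a hit if ANY
    of its ground-truth captions is among its K most similar captions.
    """
    n_images = sim.shape[0]
    hits = 0
    for i in range(n_images):
        order = np.argsort(-sim[i])  # caption indices, most similar first
        gt = set(range(i * caps_per_image, (i + 1) * caps_per_image))
        if gt.intersection(order[:k].tolist()):
            hits += 1
    return 100.0 * hits / n_images


def t2i_recall(sim, k, caps_per_image=5):
    """Text-to-image Recall@K in percent: is the caption's image in its top K?"""
    n_captions = sim.shape[1]
    hits = 0
    for j in range(n_captions):
        order = np.argsort(-sim[:, j])  # image indices, most similar first
        if j // caps_per_image in order[:k]:
            hits += 1
    return 100.0 * hits / n_captions


def recall_sum(sim, ks=(1, 5, 10), caps_per_image=5):
    """Sum of image-to-text and text-to-image Recall@K over the given K values."""
    return sum(
        i2t_recall(sim, k, caps_per_image) + t2i_recall(sim, k, caps_per_image)
        for k in ks
    )
```

For a test split of 1,000 images with five captions each (as in the 1K test variant), `sim` would be a 1,000 × 5,000 matrix, and a perfect model would reach a Recall@Sum of 600.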

Recent Benchmark Submissions

Task	Model	Paper	Date
Cross-Modal Retrieval	3SHNet	3SHNet: Boosting Image-Sentence Retrieval via …	2024-04-26
Cross-Modal Retrieval	DSMD	Dynamic Self-adaptive Multiscale Distillation from …	2024-04-16
Image-to-Text Retrieval	InternVL-C-FT (finetuned, w/o ranking)	InternVL: Scaling up Vision Foundation …	2023-12-21
Image-to-Text Retrieval	InternVL-G-FT (finetuned, w/o ranking)	InternVL: Scaling up Vision Foundation …	2023-12-21
Cross-Modal Retrieval	VAST	VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation …	2023-05-29
Image-to-Text Retrieval	ONE-PEACE (finetuned, w/o ranking)	ONE-PEACE: Exploring One General Representation …	2023-05-18
Image Retrieval	MaMMUT (ours)	MaMMUT: A Simple Architecture for …	2023-03-29
Cross-Modal Retrieval	RCAR	Plug-and-Play Regulators for Image-Text Matching	2023-03-23
Image-to-Text Retrieval	BLIP-2 ViT-G (zero-shot, 1K test set)	BLIP-2: Bootstrapping Language-Image Pre-training with …	2023-01-30
Image Retrieval	BLIP-2 ViT-G (zero-shot, 1K test set)	BLIP-2: Bootstrapping Language-Image Pre-training with …	2023-01-30
Image Retrieval	BLIP-2 ViT-L (zero-shot, 1K test set)	BLIP-2: Bootstrapping Language-Image Pre-training with …	2023-01-30
Image-to-Text Retrieval	BLIP-2 ViT-L (zero-shot, 1K test set)	BLIP-2: Bootstrapping Language-Image Pre-training with …	2023-01-30
Image-to-Text Retrieval	UNITER	HADA: A Graph-based Amalgamation Framework …	2023-01-11
Image Retrieval	UNITER	HADA: A Graph-based Amalgamation Framework …	2023-01-11
Image Retrieval	HADA	HADA: A Graph-based Amalgamation Framework …	2023-01-11
Image Retrieval	ALBEF	HADA: A Graph-based Amalgamation Framework …	2023-01-11
Image-to-Text Retrieval	ALBEF	HADA: A Graph-based Amalgamation Framework …	2023-01-11
Cross-Modal Retrieval	X2-VLM (base)	X$^2$-VLM: All-In-One Pre-trained Model For …	2022-11-22
Cross-Modal Retrieval	X2-VLM (large)	X$^2$-VLM: All-In-One Pre-trained Model For …	2022-11-22
Semi Supervised Learning for Image Captioning	CapDec	Text-Only Training for Image Captioning …	2022-11-01

Research Papers

Recent papers with results on this dataset:
