YouCook2

Name: YouCook2
Published: 2018-01-01
License: Custom

Dataset Information

Modalities

Videos, Texts

Languages

English

Homepage

http://youcook2.eecs.umich.edu/

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

YouCook2 was introduced in 2018 and is described by its authors as the largest task-oriented instructional video dataset in the vision community. It contains 2,000 long, untrimmed videos covering 89 cooking recipes, or about 22 videos per recipe on average. The procedure steps in each video are annotated with temporal boundaries and described by imperative English sentences. The videos were downloaded from YouTube and are all shot from a third-person viewpoint. They are unconstrained: individuals cook in their own homes and film with non-fixed cameras. The dataset covers a wide range of recipe types and cooking styles from around the world.

Source: http://youcook2.eecs.umich.edu/
Image Source: https://competitions.codalab.org/competitions/20594

Variants: YouCook2
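
The annotation structure described in the overview (procedure steps with temporal boundaries and imperative sentences) can be pictured with a minimal sketch. The class names, field names, video ID, recipe, timings, and sentences below are all hypothetical and chosen for illustration; they are not the official annotation schema or real dataset entries.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One annotated procedure step (hypothetical schema, not the official format)."""
    start_sec: float  # temporal boundary: segment start, in seconds
    end_sec: float    # temporal boundary: segment end, in seconds
    sentence: str     # imperative English description of the step

    @property
    def duration(self) -> float:
        return self.end_sec - self.start_sec


@dataclass
class VideoAnnotation:
    """All annotated steps of one untrimmed video (hypothetical schema)."""
    video_id: str
    recipe: str
    duration_sec: float
    steps: List[Step]

    def annotated_fraction(self) -> float:
        """Fraction of the video covered by steps, assuming steps do not overlap."""
        return sum(s.duration for s in self.steps) / self.duration_sec


# Invented example values, not taken from the dataset.
example = VideoAnnotation(
    video_id="demo_video",
    recipe="grilled cheese",
    duration_sec=200.0,
    steps=[
        Step(10.0, 30.0, "spread butter on the bread"),
        Step(40.0, 70.0, "place cheese between the slices"),
        Step(90.0, 140.0, "grill the sandwich until golden"),
    ],
)

print(example.annotated_fraction())
```

Because the videos are untrimmed, the annotated steps typically cover only part of each video; the `annotated_fraction` helper makes that gap explicit for a single video.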

Associated Benchmarks

This dataset is used in 5 benchmarks:

Video Captioning - Metrics: BLEU-4, BLEU-3, CIDEr, ROUGE-L, METEOR
Video Retrieval - Metrics: text-to-video R@1, text-to-video R@5, text-to-video R@10, text-to-video Median Rank, text-to-video Mean Rank
Dense Video Captioning - Metrics: CIDEr, METEOR, SODA, BLEU-4, ROUGE-L, F1, Precision, Recall
Long Video Retrieval (Background Removed) - Metrics: Cap. Avg. R@1, Cap. Avg. R@5, Cap. Avg. R@10, DTW R@1, DTW R@5, DTW R@10, OTAM R@1, OTAM R@5, OTAM R@10
Zero-Shot Video Retrieval - Metrics: text-to-video R@1, text-to-video R@5, text-to-video R@10, text-to-video Mean Rank, text-to-video Median Rank
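
The text-to-video retrieval metrics listed above (R@1, R@5, R@10, Median Rank, Mean Rank) are commonly derived from a text-by-video similarity matrix. The following is a minimal sketch under two assumptions that individual benchmarks may not share: video i is the single correct match for text i, and ties are resolved in favor of the correct video. The function names and the toy similarity values are illustrative only.

```python
from statistics import mean, median
from typing import Dict, List, Sequence


def text_to_video_ranks(sim: Sequence[Sequence[float]]) -> List[int]:
    """Return the 1-based rank of the correct video for each text query.

    Assumes sim[i][j] is the similarity between text i and video j,
    and that video i is the ground-truth match for text i.
    """
    ranks = []
    for i, row in enumerate(sim):
        # Rank = 1 + number of videos scored strictly higher than the match.
        ranks.append(1 + sum(1 for score in row if score > row[i]))
    return ranks


def retrieval_metrics(sim: Sequence[Sequence[float]],
                      ks: Sequence[int] = (1, 5, 10)) -> Dict[str, float]:
    """Compute R@K (in percent), median rank, and mean rank."""
    ranks = text_to_video_ranks(sim)
    metrics = {f"R@{k}": 100.0 * sum(r <= k for r in ranks) / len(ranks) for k in ks}
    metrics["Median Rank"] = float(median(ranks))
    metrics["Mean Rank"] = float(mean(ranks))
    return metrics


# Toy 4x4 similarity matrix with invented values.
toy_sim = [
    [0.9, 0.1, 0.2, 0.3],  # correct video ranked 1st
    [0.8, 0.5, 0.1, 0.2],  # correct video ranked 2nd
    [0.1, 0.2, 0.7, 0.3],  # correct video ranked 1st
    [0.6, 0.7, 0.8, 0.4],  # correct video ranked 4th
]

toy_metrics = retrieval_metrics(toy_sim)
print(toy_metrics)
```

Higher R@K is better, while lower median and mean rank are better. With only four candidate videos in the toy matrix, R@5 and R@10 are trivially 100.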

Recent Benchmark Submissions

Task	Model	Paper	Date
Dense Video Captioning	HiCM²	HiCM²: Hierarchical Compact Memory Modeling …	2024-12-19
Dense Video Captioning	CM²	Do You Remember? Dense Video …	2024-04-11
Video Captioning	MA-LMM	MA-LMM: Memory-Augmented Large Multimodal Model …	2024-04-08
Zero-Shot Video Retrieval	Norton	Multi-granularity Correspondence Learning from Long-term …	2024-01-30
Long Video Retrieval (Background Removed)	Norton	Multi-granularity Correspondence Learning from Long-term …	2024-01-30
Video Retrieval	OmniVec (pretrained)	OmniVec: Learning robust representations with …	2023-11-07
Video Retrieval	OmniVec	OmniVec: Learning robust representations with …	2023-11-07
Zero-Shot Video Retrieval	HowToCaption	HowToCaption: Prompting LLMs to Transform …	2023-10-07
Video Captioning	HowToCaption	HowToCaption: Prompting LLMs to Transform …	2023-10-07
Zero-Shot Video Retrieval	VAST, HowToCaption-finetuned	HowToCaption: Prompting LLMs to Transform …	2023-10-07
Video Captioning	COSA	COSA: Concatenated Sample Pretrained Vision-Language …	2023-06-15
Video Retrieval	VAST	VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation …	2023-05-29
Video Captioning	VAST	VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation …	2023-05-29
Video Captioning	UniVL + MELTR	MELTR: Meta Loss Transformer for …	2023-03-23
Video Retrieval	UniVL + MELTR	MELTR: Meta Loss Transformer for …	2023-03-23
Video Captioning	TextKG	Text with Knowledge Graph Augmented …	2023-03-22
Dense Video Captioning	GVL	Learning Grounded Vision-Language Representation for …	2023-03-11
Dense Video Captioning	Vid2Seq	Vid2Seq: Large-Scale Pretraining of a …	2023-02-27
Long Video Retrieval (Background Removed)	TempCLR	TempCLR: Temporal Alignment Representation with …	2022-12-28
Video Retrieval	VideoCoCa (zero-shot)	VideoCoCa: Video-Text Modeling with Zero-Shot …	2022-12-09

Research Papers

Recent papers with results on this dataset:
