
YouCook2

Dense Video Captioning Benchmark

Performance Over Time

Showing 6 results | Metric: CIDEr

Rank	Model	Paper	CIDEr	Date	Code
1	E2vidD6-MASSalign-BiD	Multimodal Pretraining for Dense Video Captioning	39.03	2020-11-10	google-research-datasets/Video-Timeline-Tags-ViTT
2	HiCM²	HiCM²: Hierarchical Compact Memory Modeling for Dense Video Captioning	32.51	2024-12-19	ailab-kyunghee/HiCM2-DVC
3	CM²	Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval	28.43	2024-04-11	ailab-kyunghee/cm2_dvc
4	Vid2Seq	Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning	7.90	2023-02-27	google-research/scenic, antoyang/VidChapters, KastanDay/video-pretrained-transformer
5	GVL	Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos	4.91	2023-03-11	zjr2000/gvl
6	PDVC (TSN features, no SCST)	End-to-End Dense Video Captioning with Parallel Decoding	4.42	2021-08-17	ttengwang/pdvc, aim3-ruc/youmakeup_challenge2022

[Figure: timeline of CIDEr results on YouCook2 by publication year (2020 to 2024), with each point labeled by model name and linked code repository.]
