📊 Showing 6 results | 📏 Metric: CIDEr
Rank | Model | Paper | CIDEr | Date | Code |
---|---|---|---|---|---|
1 | E2vidD6-MASSalign-BiD 📚 | Multimodal Pretraining for Dense Video Captioning | 39.03 | 2020-11-10 | 📦 google-research-datasets/Video-Timeline-Tags-ViTT |
2 | HiCM² 📚 | HiCM²: Hierarchical Compact Memory Modeling for Dense Video Captioning | 32.51 | 2024-12-19 | 📦 ailab-kyunghee/HiCM2-DVC |
3 | CM² | Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval | 28.43 | 2024-04-11 | 📦 ailab-kyunghee/cm2_dvc |
4 | Vid2Seq 📚 | Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning | 7.90 | 2023-02-27 | 📦 google-research/scenic 📦 antoyang/VidChapters 📦 KastanDay/video-pretrained-transformer |
5 | GVL | Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos | 4.91 | 2023-03-11 | 📦 zjr2000/gvl |
6 | PDVC (TSN features, no SCST) | End-to-End Dense Video Captioning with Parallel Decoding | 4.42 | 2021-08-17 | 📦 ttengwang/pdvc 📦 aim3-ruc/youmakeup_challenge2022 |