
LSMDC

Video retrieval benchmark on the Large Scale Movie Description Challenge (LSMDC) dataset (Rohrbach et al., 2015).

Performance Over Time

Showing 38 results | Metric: text-to-video R@1

Top Performing Models

Rank | Model | Paper | text-to-video R@1 (%) | Date | Code
1 | EMCL-Net++ | Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations | 53.70 | 2022-11-21 | jpthu17/emcl, jpthu17/diffusionret, jpthu17/HBI, jpthu17/dicosa
2 | InternVideo2-6B | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | 46.40 | 2024-03-22 | opengvlab/internvideo, opengvlab/internvideo2
3 | vid-TLDR (UMT-L) | vid-TLDR: Training Free Token merging for Light-weight Video Transformer | 43.10 | 2024-03-20 | mlvlab/vid-tldr
4 | UMT-L (ViT-L/16) | Unmasked Teacher: Towards Training-Efficient Video Foundation Models | 43.00 | 2023-03-28 | opengvlab/unmasked_teacher
5 | HunYuan_tvr (huge) | Tencent Text-Video Retrieval: Hierarchical Cross-Modal Interactions with Multi-Level Representations | 40.40 | 2022-04-07 | -
6 | COSA | COSA: Concatenated Sample Pretrained Vision-Language Foundation Model | 39.40 | 2023-06-15 | txh-mercury/cosa
7 | mPLUG-2 | mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | 34.40 | 2023-02-01 | modelscope/modelscope, x-plug/mplug-owl, alibaba/AliceMind, X-PLUG/mPLUG-2
8 | VALOR | VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | 34.20 | 2023-04-17 | TXH-mercury/VALOR
9 | InternVideo | InternVideo: General Video Foundation Models via Generative and Discriminative Learning | 34.00 | 2022-12-06 | opengvlab/internvideo, yingsen1/unimd
10 | CLIP-ViP | CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment | 30.70 | 2022-09-14 | microsoft/xpretrain
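
For reference, text-to-video R@1 is the percentage of text queries whose ground-truth video is ranked first among all candidate videos. Below is a minimal sketch of the computation in Python/NumPy, assuming a precomputed text-video similarity matrix with queries paired 1:1 to videos by index; the function name and example data are illustrative, not taken from any of the papers above.

```python
import numpy as np

def text_to_video_recall_at_k(sim: np.ndarray, k: int = 1) -> float:
    """Recall@k for text-to-video retrieval.

    sim[i, j] = similarity between text query i and video j.
    Assumes the ground-truth video for query i is video i
    (the usual 1:1 pairing in LSMDC-style evaluation).
    """
    # Rank videos for each query from most to least similar.
    ranking = np.argsort(-sim, axis=1)
    # A query counts as a hit if its paired video is in the top k.
    gt = np.arange(sim.shape[0])[:, None]
    hits = (ranking[:, :k] == gt).any(axis=1)
    return float(hits.mean()) * 100  # reported as a percentage

# Illustrative example: 1000 queries with random similarity scores.
rng = np.random.default_rng(0)
sim = rng.normal(size=(1000, 1000))
print(f"R@1: {text_to_video_recall_at_k(sim, k=1):.2f}")
```

The companion metrics often reported alongside R@1 (R@5, R@10) follow from the same function by changing k.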
