MSVD

Video Retrieval Benchmark

Performance Over Time

Showing 24 results. Metric: text-to-video R@1

Top Performing Models

| Rank | Model | Paper | text-to-video R@1 | Date | Code |
|---|---|---|---|---|---|
| 1 | InternVideo2-6B | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | 61.40 | 2024-03-22 | opengvlab/internvideo, opengvlab/internvideo2 |
| 2 | HunYuan_tvr (huge) | Tencent Text-Video Retrieval: Hierarchical Cross-Modal Interactions with Multi-Level Representations | 59.00 | 2022-04-07 | - |
| 3 | InternVideo | InternVideo: General Video Foundation Models via Generative and Discriminative Learning | 58.40 | 2022-12-06 | opengvlab/internvideo, yingsen1/unimd |
| 4 | HunYuan_tvr | Tencent Text-Video Retrieval: Hierarchical Cross-Modal Interactions with Multi-Level Representations | 58.20 | 2022-04-07 | - |
| 5 | vid-TLDR (UMT-L) | vid-TLDR: Training Free Token merging for Light-weight Video Transformer | 57.90 | 2024-03-20 | mlvlab/vid-tldr |
| 6 | VLAB | VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending | 57.50 | 2023-05-22 | - |
| 7 | MDMMT-2 | MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization | 56.80 | 2022-03-14 | - |
| 8 | Side4Video | Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning | 56.10 | 2023-11-27 | whwu95/ATM, HJYao00/Side4Video |
| 9 | CAMoE | Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss | 51.80 | 2021-09-09 | starmemda/camow, starmemda/CAMoE |
| 10 | Cap4Video | Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval? | 51.80 | 2022-12-31 | whwu95/Cap4Video, whwu95/text4vis, whwu95/GPT4Vis, whwu95/BIKE |
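
For readers unfamiliar with the metric, text-to-video R@1 (recall at rank 1) is generally understood as the percentage of text queries for which the ground-truth video is ranked first among all candidate videos. The sketch below illustrates that definition on a toy similarity matrix. It is not taken from any of the listed papers: the function name `recall_at_k`, the toy scores, and the optimistic tie handling are all assumptions, and individual papers may follow different evaluation protocols on MSVD (for example in how multiple captions per video are treated).

```python
def recall_at_k(similarity, gt_video_index, k=1):
    """Percentage of text queries whose ground-truth video is in the top-k.

    similarity[i][j] is the score between text query i and video j
    (higher means more similar). gt_video_index[i] is the index of the
    correct video for query i. Both names are illustrative assumptions.
    """
    if not similarity:
        raise ValueError("similarity matrix is empty")
    hits = 0
    for scores, gt in zip(similarity, gt_video_index):
        # Zero-based rank of the ground-truth video: the number of videos
        # that score strictly higher. Ties are resolved optimistically.
        rank = sum(1 for s in scores if s > scores[gt])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy data: 3 text queries x 3 videos, with made-up similarity scores.
sim = [
    [0.9, 0.1, 0.3],  # query 0: ground-truth video 0 is ranked first
    [0.2, 0.4, 0.8],  # query 1: video 2 outranks ground-truth video 1
    [0.1, 0.2, 0.7],  # query 2: ground-truth video 2 is ranked first
]
gt = [0, 1, 2]

r1 = recall_at_k(sim, gt, k=1)  # 2 of 3 queries hit at rank 1
r2 = recall_at_k(sim, gt, k=2)  # all 3 queries hit within the top 2
print(f"R@1 = {r1:.2f}, R@2 = {r2:.2f}")
```

Under this definition, a score of 61.40 would mean that about 61 out of every 100 text queries retrieve their correct video at the top position.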

All Papers (24)