
MSVD

Zero-Shot Video Retrieval Benchmark

Performance Over Time

Showing 14 results. Metric: text-to-video R@1.

Top Performing Models

| Rank | Model | Paper | text-to-video R@1 | Date | Code |
|------|-------|-------|-------------------|------|------|
| 1 | InternVideo2-6B | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | 59.30 | 2024-03-22 | opengvlab/internvideo, opengvlab/internvideo2 |
| 2 | InternVideo2-1B | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | 58.10 | 2024-03-22 | opengvlab/internvideo, opengvlab/internvideo2 |
| 3 | VAST, HowToCaption-finetuned | HowToCaption: Prompting LLMs to Transform Video Annotations at Scale | 54.80 | 2023-10-07 | ninatu/howtocaption |
| 4 | LanguageBind (ViT-L/14) | LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | 54.10 | 2023-10-03 | PKU-YuanGroup/Video-LLaVA, PKU-YuanGroup/MoE-LLaVA, pku-yuangroup/languagebind |
| 5 | LanguageBind (ViT-H/14) | LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | 53.90 | 2023-10-03 | PKU-YuanGroup/Video-LLaVA, PKU-YuanGroup/MoE-LLaVA, pku-yuangroup/languagebind |
| 6 | vid-TLDR (UMT-L) | vid-TLDR: Training Free Token merging for Light-weight Video Transformer | 50.00 | 2024-03-20 | mlvlab/vid-tldr |
| 7 | UMT-L (ViT-L/16) | Unmasked Teacher: Towards Training-Efficient Video Foundation Models | 49.00 | 2023-03-28 | opengvlab/unmasked_teacher |
| 8 | HowToCaption | HowToCaption: Prompting LLMs to Transform Video Annotations at Scale | 44.50 | 2023-10-07 | ninatu/howtocaption |
| 9 | MILES | MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval | 44.40 | 2022-04-26 | tencentarc/mcq |
| 10 | Y. Ge et al. | Bridging Video-text Retrieval with Multiple Choice Questions | 43.60 | 2022-01-13 | towhee-io/towhee, tencentarc/mcq |
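The table's headline metric, text-to-video R@1, is the percentage of text queries whose ground-truth video is ranked first among all candidate videos by the model's similarity scores. A minimal sketch of that computation (the function name and toy similarity matrix are illustrative, not from any of the listed codebases):

```python
import numpy as np

def recall_at_k(similarity: np.ndarray, k: int = 1) -> float:
    """Text-to-video Recall@K as a percentage.

    similarity[i, j] is the model's score between text query i and
    video j; the ground-truth video for query i is assumed to be
    video i (the usual setup for MSVD-style retrieval evaluation).
    """
    # Sort videos for each query by descending similarity, then find
    # the position of the correct (diagonal) video in that ranking.
    order = np.argsort(-similarity, axis=1)
    ranks = np.argmax(order == np.arange(len(similarity))[:, None], axis=1)
    # A query counts as a hit if its video appears in the top K.
    return float(np.mean(ranks < k) * 100)

# Toy example: 3 queries, 3 videos; only query 0 ranks its video first.
sim = np.array([[0.9, 0.1, 0.2],
                [0.3, 0.2, 0.8],
                [0.1, 0.7, 0.4]])
print(round(recall_at_k(sim, k=1), 2))  # → 33.33
```

R@5 and R@10 (also commonly reported on this benchmark) follow by changing `k`; higher is better for all three.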

All Papers (14)