LSMDC

Zero-Shot Video Retrieval Benchmark

Performance Over Time

Showing 16 results | Metric: text-to-video R@1

Top Performing Models

| Rank | Model | Paper | text-to-video R@1 | Date | Code |
|---|---|---|---|---|---|
| 1 | InternVideo2-6B | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | 33.80 | 2024-03-22 | opengvlab/internvideo, opengvlab/internvideo2 |
| 2 | InternVideo2-1B | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | 32.00 | 2024-03-22 | opengvlab/internvideo, opengvlab/internvideo2 |
| 3 | VAST, HowToCaption-finetuned | HowToCaption: Prompting LLMs to Transform Video Annotations at Scale | 27.70 | 2023-10-07 | ninatu/howtocaption |
| 4 | UMT-L (ViT-L/16) | Unmasked Teacher: Towards Training-Efficient Video Foundation Models | 25.20 | 2023-03-28 | opengvlab/unmasked_teacher |
| 5 | mPLUG-2 | mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | 24.10 | 2023-02-01 | modelscope/modelscope, x-plug/mplug-owl, alibaba/AliceMind, X-PLUG/mPLUG-2 |
| 6 | BT-Adapter | BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning | 19.50 | 2023-09-27 | farewellthree/BT-Adapter |
| 7 | HiTeA-17M | HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training | 18.30 | 2022-12-30 | - |
| 8 | InternVideo | InternVideo: General Video Foundation Models via Generative and Discriminative Learning | 17.60 | 2022-12-06 | opengvlab/internvideo, yingsen1/unimd |
| 9 | HowToCaption | HowToCaption: Prompting LLMs to Transform Video Annotations at Scale | 17.30 | 2023-10-07 | ninatu/howtocaption |
| 10 | Yatai Ji et al. | Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning | 17.20 | 2022-11-24 | iigroup/scl |
