
ActivityNet

Zero-Shot Video Retrieval Benchmark

Performance Over Time

Metric: text-to-video R@1 (12 results)
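For readers unfamiliar with the metric, text-to-video R@1 is the percentage of text queries for which the correct video is ranked first. The sketch below shows one common way to compute it from a text-by-video similarity matrix. It assumes a one-to-one pairing in which query i matches video i, and a 0-100 scale; both are assumptions for illustration, as are the function name `text_to_video_recall_at_k` and the toy scores. The page does not say how each listed entry was evaluated.

```python
import numpy as np


def text_to_video_recall_at_k(sim, k=1):
    """Recall@k in percent for text-to-video retrieval.

    sim[i, j] is the similarity between text query i and video j.
    Assumption: the ground-truth video for query i is video i.
    """
    sim = np.asarray(sim, dtype=float)
    # Score each query assigns to its correct video (the diagonal).
    correct = np.diag(sim)
    # Zero-based rank: how many videos score strictly higher than the correct one.
    ranks = (sim > correct[:, None]).sum(axis=1)
    # A query counts as a hit when its correct video is within the top k.
    return 100.0 * float((ranks < k).mean())


# Toy similarity matrix: 3 text queries x 3 videos (made-up values).
sim = np.array([
    [0.9, 0.1, 0.2],  # correct video ranked first
    [0.3, 0.2, 0.8],  # correct video ranked last
    [0.1, 0.4, 0.7],  # correct video ranked first
])

r1 = text_to_video_recall_at_k(sim, k=1)
print(f"R@1 = {r1:.2f}")
```

In this toy case two of the three queries retrieve their video at rank 1, so R@1 is about 66.67. Ties are resolved in favor of the correct video here, which is a design choice of this sketch rather than a benchmark rule.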

Top Performing Models

| Rank | Model | Paper | text-to-video R@1 | Date | Code |
|---|---|---|---|---|---|
| 1 | InternVideo2-6B | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | 63.20 | 2024-03-22 | opengvlab/internvideo, opengvlab/internvideo2 |
| 2 | InternVideo2-1B | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | 60.40 | 2024-03-22 | opengvlab/internvideo, opengvlab/internvideo2 |
| 3 | GRAM | Gramian Multimodal Representation Learning and Alignment | 59.00 | 2024-12-16 | ispamm/GRAM, luigisigillo/gwit |
| 4 | UMT-L (ViT-L/16) | Unmasked Teacher: Towards Training-Efficient Video Foundation Models | 42.80 | 2023-03-28 | opengvlab/unmasked_teacher |
| 5 | vid-TLDR (UMT-L) | vid-TLDR: Training Free Token merging for Light-weight Video Transformer | 42.80 | 2024-03-20 | mlvlab/vid-tldr |
| 6 | LanguageBind (ViT-H/14) | LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | 41.00 | 2023-10-03 | PKU-YuanGroup/Video-LLaVA, PKU-YuanGroup/MoE-LLaVA, pku-yuangroup/languagebind |
| 7 | LanguageBind (ViT-L/14) | LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | 38.40 | 2023-10-03 | PKU-YuanGroup/Video-LLaVA, PKU-YuanGroup/MoE-LLaVA, pku-yuangroup/languagebind |
| 8 | BT-Adapter | BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning | 37.00 | 2023-09-27 | farewellthree/BT-Adapter |
| 9 | VideoCoCa | VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | 34.50 | 2022-12-09 | - |
| 10 | Singularity-temporal-5M | Revealing Single Frame Bias for Video-and-Language Learning | 30.80 | 2022-06-07 | jayleicn/ClipBERT, jayleicn/singularity |

All Papers (12)