
VATEX

Video Retrieval Benchmark


Metric: text-to-video R@1 (%). 12 results in total; the top 10 are listed below.
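Text-to-video R@1 is the percentage of text queries for which the matching video is ranked first among all candidate videos. The sketch below computes it from a query-by-video similarity matrix. It assumes exactly one ground-truth video per text query and counts ties in the query's favour; the function name and the tie rule are illustrative choices, not the official VATEX evaluation code.

```python
import numpy as np


def text_to_video_recall_at_k(sim: np.ndarray, gt_video: np.ndarray, k: int = 1) -> float:
    """Percentage of text queries whose ground-truth video ranks in the top k.

    sim[i, j]   : similarity between text query i and video j (higher = better).
    gt_video[i] : index of the single video that text query i describes (assumption).
    """
    # Similarity of each query to its own ground-truth video.
    gt_scores = sim[np.arange(sim.shape[0]), gt_video]
    # 0-based rank = number of videos scored strictly higher than the ground truth,
    # so ties are resolved in the query's favour (an assumed convention).
    ranks = (sim > gt_scores[:, None]).sum(axis=1)
    return 100.0 * float(np.mean(ranks < k))


# Toy example: 3 text queries, 3 videos; query i describes video i.
sim = np.array([
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],  # query 1 ranks the wrong video (index 2) first
    [0.1, 0.2, 0.7],
])
gt = np.array([0, 1, 2])
print(text_to_video_recall_at_k(sim, gt, k=1))  # 2 of 3 queries hit at rank 1
```

Here R@1 is about 66.67 (2 of 3 queries), while R@2 would be 100.0 because query 1's correct video is ranked second.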

Top Performing Models

| Rank | Model | Paper | Text-to-video R@1 (%) | Date | Code |
|---|---|---|---|---|---|
| 1 | GRAM | Gramian Multimodal Representation Learning and Alignment | 87.70 | 2024-12-16 | ispamm/GRAM, luigisigillo/gwit |
| 2 | VAST | VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | 83.00 | 2023-05-29 | TXH-mercury/VALOR, txh-mercury/vast |
| 3 | VALOR | VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | 78.50 | 2023-04-17 | TXH-mercury/VALOR |
| 4 | InternVideo2-6B | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | 75.50 | 2024-03-22 | opengvlab/internvideo, opengvlab/internvideo2 |
| 5 | Unmasked Teacher | Unmasked Teacher: Towards Training-Efficient Video Foundation Models | 72.00 | 2023-03-28 | opengvlab/unmasked_teacher |
| 6 | InternVideo | InternVideo: General Video Foundation Models via Generative and Discriminative Learning | 71.10 | 2022-12-06 | opengvlab/internvideo, yingsen1/unimd |
| 7 | Side4Video | Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning | 68.80 | 2023-11-27 | whwu95/ATM, HJYao00/Side4Video |
| 8 | Cap4Video | Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval? | 66.60 | 2022-12-31 | whwu95/Cap4Video, whwu95/text4vis, whwu95/GPT4Vis, whwu95/BIKE |
| 9 | TS2-Net | TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval | 59.10 | 2022-07-16 | yuqi657/ts2_net |
| 10 | LAFF | Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval | 59.10 | 2021-12-03 | ruc-aimc-lab/laff |

All Papers (12)