Showing 3 results | Metric: text-to-video R@1
Rank | Model | Paper | text-to-video R@1 (%) | Date | Code |
---|---|---|---|---|---|
1 | TESTA (ViT-B/16) | TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding | 24.90 | 2023-10-29 | renshuhuai-andy/testa |
2 | VINDLU | VindLU: A Recipe for Effective Video-and-Language Pretraining | 18.40 | 2022-12-09 | klauscc/vindlu |
3 | LF-VILA | Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning | 13.60 | 2022-10-12 | microsoft/xpretrain |