Showing 5 results | Metric: text-to-video R@1
Rank | Model | Paper | text-to-video R@1 | Date | Code |
---|---|---|---|---|---|
1 | TESTA (ViT-B/16) | TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding | 83.40 | 2023-10-29 | renshuhuai-andy/testa |
2 | LF-VILA | Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning | 69.70 | 2022-10-12 | microsoft/xpretrain |
3 | VINDLU | VindLU: A Recipe for Effective Video-and-Language Pretraining | 67.80 | 2022-12-09 | klauscc/vindlu |
4 | Frozen | Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval | 53.80 | 2021-04-01 | towhee-io/towhee, m-bain/webvid, m-bain/frozen-in-time, princetonvisualai/mqvr, willard-yuan/video-text-retrieval-papers |
5 | QB-Norm+TT-CE+ | Cross Modal Retrieval with Querybank Normalisation | 15.10 | 2021-12-23 | ioanacroi/qb-norm |
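
For readers unfamiliar with the metric, here is a minimal sketch of how text-to-video R@1 (Recall@1) is commonly computed: the percentage of text queries whose matching video is ranked first. It is an illustration only, not the evaluation code of any listed model. The function name `recall_at_k`, the toy similarity matrix, and the assumption that query `i` matches video `i` are all hypothetical.

```python
def recall_at_k(similarity, k=1):
    """Return Recall@k as a percentage.

    Assumption: similarity[i][j] is the score between text query i and
    video j, and the ground-truth video for query i is video i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Zero-based rank of the correct video: how many videos score
        # strictly higher. Ties are resolved in favor of the correct video.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy 3x3 example: queries 0 and 2 rank their own video first, query 1 does not.
sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.1, 0.2, 0.7],
]
r1 = recall_at_k(sim, k=1)
r2 = recall_at_k(sim, k=2)
print(f"R@1 = {r1:.2f}, R@2 = {r2:.2f}")
```

Under this reading, a score such as 83.40 in the table would mean that roughly 83 of every 100 text queries retrieve the correct video at rank 1.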