Showing 5 results | Metric: text-to-video R@1
Rank | Model | Paper | text-to-video R@1 | Date | Code |
---|---|---|---|---|---|
1 | GRAM | Gramian Multimodal Representation Learning and Alignment | 83.90 | 2024-12-16 | ispamm/GRAM, luigisigillo/gwit |
2 | InternVideo2-6B | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | 71.50 | 2024-03-22 | opengvlab/internvideo, opengvlab/internvideo2 |
3 | InternVideo2-1B | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | 70.40 | 2024-03-22 | opengvlab/internvideo, opengvlab/internvideo2 |
4 | VideoCoCa | VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | 53.20 | 2022-12-09 | - |
5 | InternVideo | InternVideo: General Video Foundation Models via Generative and Discriminative Learning | 49.50 | 2022-12-06 | opengvlab/internvideo, yingsen1/unimd |
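
For readers unfamiliar with the metric, text-to-video R@1 is commonly read as Recall@1: the percentage of text queries whose matching video is ranked first. The sketch below is a minimal illustration of that reading, not the evaluation code of any model listed above. It assumes a square similarity matrix in which text query `i` is paired with video `i`; the function name `text_to_video_recall_at_k` and the toy scores are hypothetical.

```python
import numpy as np

def text_to_video_recall_at_k(similarity, k=1):
    """Recall@k in percent for text-to-video retrieval.

    Assumption: similarity[i, j] is the score between text query i and
    video j, and the ground-truth video for text i is video i.
    """
    similarity = np.asarray(similarity, dtype=float)
    n = similarity.shape[0]
    # Score of the ground-truth video for each text query (the diagonal).
    correct = similarity[np.arange(n), np.arange(n)]
    # Rank = number of videos scoring strictly higher than the correct one.
    # Ties therefore count in favor of the correct video.
    ranks = (similarity > correct[:, None]).sum(axis=1)
    # A query is a hit when its correct video falls within the top k.
    return 100.0 * float((ranks < k).mean())

# Toy scores (made up): queries 0 and 1 rank their own video first,
# query 2 ranks video 0 above its own video.
sim = [
    [0.9, 0.1, 0.2],
    [0.3, 0.8, 0.4],
    [0.7, 0.2, 0.5],
]
print(round(text_to_video_recall_at_k(sim, k=1), 2))  # 66.67
print(round(text_to_video_recall_at_k(sim, k=2), 2))  # 100.0
```

Published scores also depend on the dataset split and evaluation protocol, so numbers in the table are only comparable when those match.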