| InternVideo2-6B | InternVideo2: Scaling Foundation Models for Multi… | 63.20 | 2024-03-22 |
| InternVideo2-1B | InternVideo2: Scaling Foundation Models for Multi… | 60.40 | 2024-03-22 |
| GRAM | Gramian Multimodal Representation Learning and Al… | 59.00 | 2024-12-16 |
| UMT-L (ViT-L/16) | Unmasked Teacher: Towards Training-Efficient Vide… | 42.80 | 2023-03-28 |
| vid-TLDR (UMT-L) | vid-TLDR: Training Free Token merging for Light-w… | 42.80 | 2024-03-20 |
| LanguageBind(ViT-H/14) | LanguageBind: Extending Video-Language Pretrainin… | 41.00 | 2023-10-03 |
| LanguageBind(ViT-L/14) | LanguageBind: Extending Video-Language Pretrainin… | 38.40 | 2023-10-03 |
| BT-Adapter | BT-Adapter: Video Conversation is Feasible Withou… | 37.00 | 2023-09-27 |
| VideoCoCa | VideoCoCa: Video-Text Modeling with Zero-Shot Tra… | 34.50 | 2022-12-09 |
| Singularity-temporal-5M | Revealing Single Frame Bias for Video-and-Languag… | 30.80 | 2022-06-07 |
| InternVideo | InternVideo: General Video Foundation Models via … | 30.70 | 2022-12-06 |
| Singularity-temporal-17M | Revealing Single Frame Bias for Video-and-Languag… | 30.60 | 2022-06-07 |