| Model | Paper | Score | Date |
|---|---|---|---|
| InternVideo2-6B | InternVideo2: Scaling Foundation Models for Multi… | 61.40 | 2024-03-22 |
| HunYuan_tvr (huge) | Tencent Text-Video Retrieval: Hierarchical Cross-… | 59.00 | 2022-04-07 |
| InternVideo | InternVideo: General Video Foundation Models via … | 58.40 | 2022-12-06 |
| HunYuan_tvr | Tencent Text-Video Retrieval: Hierarchical Cross-… | 58.20 | 2022-04-07 |
| vid-TLDR (UMT-L) | vid-TLDR: Training Free Token merging for Light-w… | 57.90 | 2024-03-20 |
| VLAB | VLAB: Enhancing Video Language Pre-training by Fe… | 57.50 | 2023-05-22 |
| MDMMT-2 | MDMMT-2: Multidomain Multimodal Transformer for V… | 56.80 | 2022-03-14 |
| Side4Video | Side4Video: Spatial-Temporal Side Network for Mem… | 56.10 | 2023-11-27 |
| CAMoE | Improving Video-Text Retrieval by Multi-Stream Co… | 51.80 | 2021-09-09 |
| Cap4Video | Cap4Video: What Can Auxiliary Captions Do for Tex… | 51.80 | 2022-12-31 |
| CenterCLIP (ViT-B/16) | CenterCLIP: Token Clustering for Efficient Text-V… | 50.60 | 2022-05-02 |
| X-CLIP | X-CLIP: End-to-End Multi-grained Contrastive Lear… | 50.40 | 2022-07-15 |
| DMAE (ViT-B/32) | Dual-Modal Attention-Enhanced Text-Video Retrieva… | 48.70 | 2023-09-20 |
| QB-Norm+CLIP2Video | Cross Modal Retrieval with Querybank Normalisation | 48.00 | 2021-12-23 |
| DiffusionRet+QB-Norm | DiffusionRet: Generative Text-Video Retrieval wit… | 47.90 | 2023-03-17 |
| PAU | Prototype-based Aleatoric Uncertainty Quantificat… | 47.30 | 2023-09-29 |
| X-Pool | X-Pool: Cross-Modal Language-Video Attention for … | 47.20 | 2022-03-28 |
| DiffusionRet | DiffusionRet: Generative Text-Video Retrieval wit… | 46.60 | 2023-03-17 |
| CLIP4Clip | CLIP4Clip: An Empirical Study of CLIP for End to … | 46.20 | 2021-04-18 |
| LAFF | Lightweight Attentional Feature Fusion: A New Bas… | 45.40 | 2021-12-03 |
| CLIP | A Straightforward Framework For Video Retrieval U… | 37.00 | 2021-02-24 |
| FROZEN | Frozen in Time: A Joint Video and Image Encoder f… | 33.70 | 2021-04-01 |
| SSML | Noise Estimation Using Density Estimation for Sel… | 20.30 | 2020-03-06 |
| Collaborative Experts | Use What You Have: Video Retrieval Using Represen… | 19.80 | 2019-07-31 |