SG-DETR (w/ PT) | Saliency-Guided DETR for Moment Retrieval and Hig… | 71.10 | 2024-10-02
LLaVA-MR | LLaVA-MR: Large Language-and-Vision Assistant for… | 70.65 | 2024-11-21
FlashVTG | FlashVTG: Feature Layering and Adaptive Score Han… | 70.32 | 2024-12-18
SG-DETR | Saliency-Guided DETR for Moment Retrieval and Hig… | 70.20 | 2024-10-02
InternVideo2-6B | InternVideo2: Scaling Foundation Models for Multi… | 70.03 | 2024-03-22
InternVideo2-1B | InternVideo2: Scaling Foundation Models for Multi… | 68.36 | 2024-03-22
VideoChat-T (FT) | TimeSuite: Improving MLLMs for Long Video Underst… | 67.10 | 2024-10-25
UniMD+Sync. | UniMD: Towards Unifying Moment Retrieval and Temp… | 63.98 | 2024-04-07
UnLoc-L | UnLoc: A Unified Framework for Video Localization… | 60.80 | 2023-08-21
BAM-DETR | BAM-DETR: Boundary-Aligned Moment Detection Trans… | 59.95 | 2023-11-30
BM-DETR | Background-aware Moment Detection for Video Momen… | 59.48 | 2023-06-05
UVCOM | Bridging the Gap: A Unified Video Comprehension F… | 59.25 | 2023-11-28
CG-DETR | Correlation-Guided Query-Dependency Calibration f… | 58.44 | 2023-11-15
LLMEPET | Prior Knowledge Integration via LLM Encoding and … | 58.31 | 2024-07-21
UnLoc-B | UnLoc: A Unified Framework for Video Localization… | 58.10 | 2023-08-21
QD-DETR (Only Video) | Query-Dependent Video Representation for Moment R… | 57.31 | 2023-03-24
video-mamba-suite | Video Mamba Suite: State Space Model as a Versati… | 57.18 | 2024-03-14
Moment-DETR w/ PT (on 10K HowTo100M videos) | QVHighlights: Detecting Moments and Highlights in… | 55.65 | 2021-07-20
Moment-DETR | QVHighlights: Detecting Moments and Highlights in… | 53.63 | 2021-07-20
LD-DETR | LD-DETR: Loop Decoder DEtection TRansformer for V… | 53.44 | 2025-01-18
VideoLights-B-pt | VideoLights: Feature Refinement and Cross-Task Al… | 52.94 | 2024-12-02
UMT (VO) | UMT: Unified Multi-modal Transformers for Joint V… | 49.35 | 2022-03-23
UMT (VA) | UMT: Unified Multi-modal Transformers for Joint V… | 48.31 | 2022-03-23
VideoChat-T (ZS) | TimeSuite: Improving MLLMs for Long Video Underst… | 45.43 | 2024-10-25
SimVTP | SimVTP: Simple Video Text Pre-training with Maske… | 44.70 | 2022-12-07