Human-centric Spatio-Temporal Video Grounding
The newly proposed HC-STVG task aims to localize a target person both spatially and temporally in an untrimmed video. For this task, we collect a new benchmark dataset with spatio-temporal annotations of the target persons in complex multi-person scenes, together with full interaction and rich action information.
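To make "localizing a person spatio-temporally" concrete, the sketch below represents a localization as a tube (one bounding box per frame) and scores how well a predicted tube overlaps an annotated one. It is only an illustration: the `Tube` class, the `(x1, y1, x2, y2)` box convention, and the function names are hypothetical and do not reflect the dataset's actual annotation format or its official evaluation code. The overlap measure used here (summed per-frame box IoU over shared frames, divided by the number of frames covered by either tube) is one common choice for this kind of task.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical box convention: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class Tube:
    """One person's spatio-temporal localization: a box per frame index."""
    boxes: Dict[int, Box]


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def tube_overlap(pred: Tube, gt: Tube) -> float:
    """Sum of per-frame IoU on shared frames, divided by frames in either tube."""
    shared = pred.boxes.keys() & gt.boxes.keys()
    either = pred.boxes.keys() | gt.boxes.keys()
    if not either:
        return 0.0
    return sum(box_iou(pred.boxes[t], gt.boxes[t]) for t in shared) / len(either)


# Toy example: identical boxes, but the prediction starts 5 frames late.
gt = Tube({t: (0.0, 0.0, 10.0, 10.0) for t in range(10, 20)})
pred = Tube({t: (0.0, 0.0, 10.0, 10.0) for t in range(15, 25)})
score = tube_overlap(pred, gt)
```

In the toy example the boxes match perfectly on the 5 shared frames, but the two tubes together span 15 frames, so the score is penalized for the temporal misalignment alone.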
Variants: HC-STVG1
This dataset is used in 1 benchmark:
Task | Model | Paper | Date |
---|---|---|---|
Spatio-Temporal Video Grounding | TA-STVG | Knowing Your Target: Target-Aware Transformer … | 2025-02-16 |
Spatio-Temporal Video Grounding | CG-STVG | Context-Guided Spatio-Temporal Video Grounding | 2024-01-03 |
Spatio-Temporal Video Grounding | TubeDETR | TubeDETR: Spatio-Temporal Video Grounding with … | 2022-03-30 |