VidSTG

Dataset Information
License
Unknown
Homepage

Overview

VidSTG is a spatio-temporal video grounding dataset built on the video relation dataset VidOR. The goal of the Spatio-Temporal Video Grounding (STVG) task is to localize the spatio-temporal section of an untrimmed video that matches a given sentence depicting an object. VidOR contains 7,000, 835, and 2,165 videos for training, validation, and testing, respectively; VidSTG contains 5,563, 618, and 743 videos for the same three splits.

Source: https://github.com/Guaranteer/VidSTG-Dataset
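
To make the task description concrete, the following is a minimal Python sketch of how a grounded "spatio-temporal section" could be represented and compared against a reference. The names Tube, box_iou, and tube_iou are hypothetical and chosen for illustration; they do not reflect the VidSTG annotation schema, and the overlap score shown is an illustrative choice, not necessarily the metric used by the benchmark.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical box type for illustration: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class Tube:
    """A sentence paired with one box per frame (assumed layout, not VidSTG's)."""
    query: str
    boxes: Dict[int, Box]  # frame index -> box of the described object

    @property
    def temporal_span(self) -> Tuple[int, int]:
        """First and last frame index covered by the tube."""
        frames = sorted(self.boxes)
        return frames[0], frames[-1]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def tube_iou(pred: Tube, gt: Tube) -> float:
    """Sum of per-frame box IoU over shared frames, divided by the
    number of frames covered by either tube (illustrative score)."""
    shared = pred.boxes.keys() & gt.boxes.keys()
    covered = pred.boxes.keys() | gt.boxes.keys()
    if not covered:
        return 0.0
    total = sum(box_iou(pred.boxes[f], gt.boxes[f]) for f in shared)
    return total / len(covered)
```

A score of this kind penalizes both spatial errors (boxes that miss the object) and temporal errors (frames predicted outside the reference span), which is why the tube is stored per frame and not as a single box plus a time interval.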

Variants: VidSTG

Associated Benchmarks

This dataset is used in 1 benchmark:

Recent Benchmark Submissions

Task                            | Model    | Paper                                             | Date
Spatio-Temporal Video Grounding | TA-STVG  | Knowing Your Target: Target-Aware Transformer …   | 2025-02-16
Spatio-Temporal Video Grounding | CG-STVG  | Context-Guided Spatio-Temporal Video Grounding    | 2024-01-03
Spatio-Temporal Video Grounding | TubeDETR | TubeDETR: Spatio-Temporal Video Grounding with …  | 2022-03-30

Research Papers

Recent papers with results on this dataset: