VidSTG

Name: VidSTG
License: Unknown

Dataset Information

Homepage

Official Website

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

The VidSTG dataset is a spatio-temporal video grounding dataset built on the video relation dataset VidOR. The goal of the Spatio-Temporal Video Grounding (STVG) task is to localize the spatio-temporal tube (the time span and the per-frame bounding boxes) in an untrimmed video that matches a given sentence describing an object. Query sentences are either declarative or interrogative, and the benchmark reports results separately for each form. VidOR contains 7,000, 835, and 2,165 videos for training, validation, and testing, respectively; VidSTG contains 5,563, 618, and 743 videos for training, validation, and testing, respectively.
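As a quick sanity check on the split sizes quoted above, the following Python sketch tallies them. It compares video counts only and makes no assumption about which VidOR splits the VidSTG videos were drawn from; the dictionary names are illustrative.

```python
# Split sizes as stated in the overview (names are illustrative).
VIDOR_SPLITS = {"train": 7000, "val": 835, "test": 2165}
VIDSTG_SPLITS = {"train": 5563, "val": 618, "test": 743}


def total(splits: dict[str, int]) -> int:
    """Sum the number of videos across all splits."""
    return sum(splits.values())


vidor_total = total(VIDOR_SPLITS)    # 10,000 videos
vidstg_total = total(VIDSTG_SPLITS)  # 6,924 videos

# VidSTG's video count relative to VidOR's (a count ratio only).
coverage = vidstg_total / vidor_total

for name, count in VIDSTG_SPLITS.items():
    print(f"{name}: {count} videos ({count / vidstg_total:.1%} of VidSTG)")
print(f"VidSTG total: {vidstg_total} ({coverage:.1%} of VidOR's {vidor_total})")
```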

Source: https://github.com/Guaranteer/VidSTG-Dataset

Variants: VidSTG

Associated Benchmarks

This dataset is used in one benchmark:

Spatio-Temporal Video Grounding - Metrics: Declarative m_vIoU, Declarative vIoU@0.3, Declarative vIoU@0.5, Interrogative m_vIoU, Interrogative vIoU@0.3, Interrogative vIoU@0.5
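For reference, the STVG literature (for example, the TubeDETR paper) commonly defines vIoU for one sample as the sum of per-frame box IoUs over the frames where the predicted and ground-truth tubes overlap in time, divided by the number of frames in their temporal union. m_vIoU averages vIoU over the evaluated samples, and vIoU@R is the fraction of samples whose vIoU exceeds R; the declarative and interrogative variants apply the same computation to each sentence form separately. The Python sketch below implements that definition under stated assumptions: the `Tube` class, its frame-to-box mapping, and the (x1, y1, x2, y2) box format are hypothetical, and the official evaluation code may differ in details such as frame sampling or box conventions.

```python
from dataclasses import dataclass, field

# Hypothetical box format: (x1, y1, x2, y2) corner coordinates.
Box = tuple[float, float, float, float]


@dataclass
class Tube:
    """Hypothetical spatio-temporal tube: frame index -> bounding box."""
    boxes: dict[int, Box] = field(default_factory=dict)


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def viou(pred: Tube, gt: Tube) -> float:
    """vIoU = (1/|S_u|) * sum over t in S_i of IoU(pred_t, gt_t).

    S_i and S_u are the intersection and union of the frame indices
    covered by the predicted and ground-truth tubes.
    """
    s_i = pred.boxes.keys() & gt.boxes.keys()
    s_u = pred.boxes.keys() | gt.boxes.keys()
    if not s_u:
        return 0.0
    return sum(box_iou(pred.boxes[t], gt.boxes[t]) for t in s_i) / len(s_u)


def summarize(scores: list[float], thresholds=(0.3, 0.5)) -> dict[str, float]:
    """m_vIoU (mean vIoU) and vIoU@R (fraction of samples with vIoU > R)."""
    n = len(scores)
    summary = {"m_vIoU": sum(scores) / n}
    for r in thresholds:
        summary[f"vIoU@{r}"] = sum(s > r for s in scores) / n
    return summary


if __name__ == "__main__":
    box = (0.0, 0.0, 10.0, 10.0)
    gt = Tube({t: box for t in range(0, 10)})    # ground truth: frames 0-9
    pred = Tube({t: box for t in range(5, 15)})  # prediction: frames 5-14
    score = viou(pred, gt)  # 5 overlapping frames / 15 union frames
    print(summarize([score]))
```

In the toy example the boxes match exactly on the 5 shared frames, but the tubes cover 15 frames in total, so vIoU is 5/15. This is why vIoU penalizes temporal misalignment even when the spatial boxes are perfect. Published results usually report these values as percentages.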

Recent Benchmark Submissions

Task	Model	Paper	Date
Spatio-Temporal Video Grounding	TA-STVG	Knowing Your Target: Target-Aware Transformer …	2025-02-16
Spatio-Temporal Video Grounding	CG-STVG	Context-Guided Spatio-Temporal Video Grounding	2024-01-03
Spatio-Temporal Video Grounding	TubeDETR	TubeDETR: Spatio-Temporal Video Grounding with …	2022-03-30

Research Papers

Recent papers with results on this dataset:

