MultiSports

Name: MultiSports
Published: 2021-05-16
License: CC BY-NC 4.0

Modalities: Videos



Overview

Spatio-temporal action detection is an important and challenging problem in video understanding. Existing action detection benchmarks are limited: they either contain only a small number of instances per trimmed video or focus on low-level atomic actions. This paper presents MultiSports, a new multi-person dataset of spatio-temporally localized sports actions. We first analyze the key ingredients of a realistic and challenging dataset for spatio-temporal action detection and propose three criteria: (1) multi-person scenes with motion-dependent identification, (2) well-defined action boundaries, and (3) relatively fine-grained classes of high complexity. Based on these guidelines, we build MultiSports v1.0 by selecting 4 sports classes, collecting 3200 video clips, and annotating 37701 action instances with 902k bounding boxes. The dataset is characterized by high diversity, dense annotation, and high quality. With its realistic setting and detailed annotations, MultiSports exposes the intrinsic challenges of spatio-temporal action detection. We hope that MultiSports can serve as a standard benchmark for spatio-temporal action detection in the future.

Variants: MultiSports

Associated Benchmarks

This dataset is used in 2 benchmarks:

Action Detection - Metrics: Frame-mAP 0.5, Video-mAP 0.2, Video-mAP 0.5
Open Vocabulary Action Detection - Metrics: val mAP
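
The numbers in metric names such as Frame-mAP 0.5 and Video-mAP 0.2 are conventionally read as intersection-over-union (IoU) thresholds: a detection counts as correct only if its overlap with a ground-truth annotation reaches that value. The sketch below illustrates the two overlap measures commonly used for this, per-frame box IoU and spatio-temporal tube IoU (temporal IoU multiplied by the mean box IoU over shared frames). It is a minimal illustration under those assumptions, not the official MultiSports evaluation code; the names `box_iou` and `tube_iou` and the dictionary-based tube representation are hypothetical.

```python
from typing import Dict, Tuple

# Assumed representations (hypothetical, for illustration only):
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Tube = Dict[int, Box]                    # frame index -> box on that frame


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def tube_iou(a: Tube, b: Tube) -> float:
    """Spatio-temporal IoU: temporal IoU times mean box IoU on shared frames."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    temporal = len(shared) / len(a.keys() | b.keys())
    spatial = sum(box_iou(a[f], b[f]) for f in shared) / len(shared)
    return temporal * spatial


def is_match(overlap: float, threshold: float) -> bool:
    """A detection matches a ground truth if the overlap reaches the threshold."""
    return overlap >= threshold
```

Under this reading, a frame-level detection would be checked with `is_match(box_iou(pred, gt), 0.5)`, and a predicted tube with `is_match(tube_iou(pred_tube, gt_tube), 0.2)` or `0.5`. Computing mAP additionally requires ranking detections by score and averaging precision per class, which is omitted here.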

Recent Benchmark Submissions

Task	Model	Paper	Date
Action Detection	SiA	Scaling Open-Vocabulary Action Detection	2025-04-04
Open Vocabulary Action Detection	SiA	Scaling Open-Vocabulary Action Detection	2025-04-04
Action Detection	HIT	Holistic Interaction Transformer Network for …	2022-10-23

Research Papers

Recent papers with results on this dataset:

Scaling Open-Vocabulary Action Detection (2025)
Holistic Interaction Transformer Network for Action Detection (2022)
