MultiTHUMOS

Dataset Information
Modalities: Videos
Introduced: 2018
License: CC BY 4.0

Overview

The MultiTHUMOS dataset contains dense, multilabel, frame-level action annotations for 30 hours across 400 videos in the THUMOS'14 action detection dataset. It consists of 38,690 annotations of 65 action classes, with an average of 1.5 labels per frame and 10.5 action classes per video.

Source: http://ai.stanford.edu/~syyeung/everymoment.html
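To make "dense, multilabel, frame-level" concrete, the sketch below expands interval annotations into a binary frame-by-class matrix and computes the two per-video statistics quoted in the overview (labels per frame and distinct classes per video). The `(class_id, start_sec, end_sec)` tuple layout, the frame rate, and the helper names are assumptions made for illustration; they are not taken from the dataset's release files, and only the class count of 65 comes from the description above.

```python
import numpy as np

NUM_CLASSES = 65  # number of action classes stated in the dataset description


def intervals_to_frame_labels(intervals, num_frames, fps, num_classes=NUM_CLASSES):
    """Expand (class_id, start_sec, end_sec) intervals into a binary matrix.

    Hypothetical helper: the tuple layout is an assumed annotation format.
    Returns an array of shape (num_frames, num_classes) in which entry
    [t, c] is 1 when class c is active at frame t.
    """
    labels = np.zeros((num_frames, num_classes), dtype=np.uint8)
    for class_id, start_sec, end_sec in intervals:
        # Convert seconds to frame indices and clip to the video length.
        first = max(0, int(np.floor(start_sec * fps)))
        last = min(num_frames, int(np.ceil(end_sec * fps)))
        labels[first:last, class_id] = 1
    return labels


def labels_per_frame(labels):
    """Average number of simultaneously active classes per frame."""
    return float(labels.sum(axis=1).mean())


def classes_per_video(labels):
    """Number of distinct classes that occur anywhere in the video."""
    return int(labels.any(axis=0).sum())


# Toy 4-second clip at an assumed 10 fps with three overlapping actions.
toy_intervals = [(0, 0.0, 2.0), (3, 1.0, 3.0), (7, 2.5, 4.0)]
toy_labels = intervals_to_frame_labels(toy_intervals, num_frames=40, fps=10)

print(toy_labels.shape)
print(labels_per_frame(toy_labels))
print(classes_per_video(toy_labels))
```

Because several classes can be active in the same frame, each row of the matrix is a multi-hot vector rather than a single class index. The toy clip's labels-per-frame value exceeds 1 wherever actions overlap, which is the property the dataset's reported average of 1.5 labels per frame reflects.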

Variants: Multi-THUMOS, MultiTHUMOS

Associated Benchmarks

This dataset is used in 2 benchmarks:

Recent Benchmark Submissions

Task | Model | Paper | Date
Temporal Action Localization | DualDETR (I3D-rgb) | Dual DETRs for Multi-Label Temporal … | 2024-03-31
Temporal Action Localization | TriDet (VideoMAEv2) | Temporal Action Localization with Enhanced … | 2023-09-11
Temporal Action Localization | TriDet (I3D-rgb) | Temporal Action Localization with Enhanced … | 2023-09-11
Action Detection | PAT | PAT: Position-Aware Transformer for Dense … | 2023-08-09
Temporal Action Localization | TemporalMaxer | TemporalMaxer: Maximize Temporal Context with … | 2023-03-16
Temporal Action Localization | PointTAD | PointTAD: Multi-Label Temporal Action Detection … | 2022-10-20
Temporal Action Localization | MS-TCT | MS-TCT: Multi-Scale Temporal ConvTransformer for … | 2021-12-07
Temporal Action Localization | MLAD | Modeling Multi-Label Action Dependencies for … | 2021-03-04

Research Papers

Recent papers with results on this dataset: