The MultiTHUMOS dataset provides dense, multilabel, frame-level action annotations for 30 hours of video across 400 videos from the THUMOS'14 action detection dataset. It comprises 38,690 annotations spanning 65 action classes, with an average of 1.5 labels per frame and 10.5 action classes per video.
Source: http://ai.stanford.edu/~syyeung/everymoment.html
Variants: Multi-THUMOS, MultiTHUMOS
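The dense multilabel annotation scheme described above can be pictured as a binary frame-by-class matrix, where several classes may be active on the same frame. The sketch below is purely illustrative: the class indices, segment boundaries, and clip length are made-up values, not taken from the actual MultiTHUMOS annotation files.

```python
import numpy as np

# Hypothetical illustration of dense multilabel frame annotations as a
# (num_frames x num_classes) binary matrix. MultiTHUMOS has 65 classes;
# the segments below are invented for this example.
NUM_CLASSES = 65
num_frames = 300  # e.g. a 10 s clip at 30 fps

labels = np.zeros((num_frames, NUM_CLASSES), dtype=np.uint8)

# Each annotation marks one class as active over a frame interval [start, end).
annotations = [
    (0, 30, 120),   # class 0 active on frames 30..119
    (7, 90, 200),   # class 7 overlaps class 0 on frames 90..119
    (12, 0, 300),   # class 12 active for the whole clip
]
for cls, start, end in annotations:
    labels[start:end, cls] = 1

# Average number of simultaneous labels per frame
# (the real dataset averages 1.5 labels per frame).
avg_labels_per_frame = labels.sum(axis=1).mean()
print(round(float(avg_labels_per_frame), 2))  # → 1.67 for this toy clip
```

Because multiple segments overlap in time, per-frame metrics (such as frame-level mAP) are the natural evaluation target for this kind of annotation, rather than single-label frame classification.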
This dataset is used in 2 benchmarks:
| Task | Model | Paper | Date |
|---|---|---|---|
| Temporal Action Localization | DualDETR (I3D-rgb) | Dual DETRs for Multi-Label Temporal … | 2024-03-31 |
| Temporal Action Localization | TriDet (VideoMAEv2) | Temporal Action Localization with Enhanced … | 2023-09-11 |
| Temporal Action Localization | TriDet (I3D-rgb) | Temporal Action Localization with Enhanced … | 2023-09-11 |
| Action Detection | PAT | PAT: Position-Aware Transformer for Dense … | 2023-08-09 |
| Temporal Action Localization | TemporalMaxer | TemporalMaxer: Maximize Temporal Context with … | 2023-03-16 |
| Temporal Action Localization | PointTAD | PointTAD: Multi-Label Temporal Action Detection … | 2022-10-20 |
| Temporal Action Localization | MS-TCT | MS-TCT: Multi-Scale Temporal ConvTransformer for … | 2021-12-07 |
| Temporal Action Localization | MLAD | Modeling Multi-Label Action Dependencies for … | 2021-03-04 |
Recent papers with results on this dataset: