We construct a fine-grained video dataset organized by both semantic and temporal structures, where each structure contains two-level annotations.
For the semantic structure, the action-level labels describe the action types performed by athletes, and the step-level labels describe the sub-action types of consecutive steps in the procedure, where adjacent steps in each action procedure belong to different sub-action types. A combination of sub-action types produces an action type.
For the temporal structure, the action-level labels mark the temporal boundaries of a complete action instance performed by an athlete. During this annotation process, we discard all incomplete action instances and filter out slow-motion playbacks. The step-level labels are the starting frames of consecutive steps in the action procedure.
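The two-level annotation scheme described above can be pictured as a small record type. The following is a minimal sketch, not the dataset's actual file format: the class names, field names, and the example labels (`example_dive`, `takeoff`, `somersault`, `entry`) are assumptions chosen for illustration, and the frame boundaries are assumed to be inclusive.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    sub_action_type: str  # step-level semantic label (hypothetical name)
    start_frame: int      # step-level temporal label: starting frame of the step


@dataclass
class ActionInstance:
    action_type: str      # action-level semantic label (hypothetical name)
    start_frame: int      # action-level temporal boundary, assumed inclusive
    end_frame: int        # action-level temporal boundary, assumed inclusive
    steps: List[Step]

    def validate(self) -> None:
        """Check the constraints stated in the description above."""
        if self.start_frame > self.end_frame:
            raise ValueError("action boundary is empty")
        for step in self.steps:
            if not self.start_frame <= step.start_frame <= self.end_frame:
                raise ValueError("step starts outside the action boundary")
        for prev, cur in zip(self.steps, self.steps[1:]):
            # Consecutive steps must appear in temporal order.
            if cur.start_frame <= prev.start_frame:
                raise ValueError("step starting frames must increase")
            # Adjacent steps belong to different sub-action types.
            if cur.sub_action_type == prev.sub_action_type:
                raise ValueError("adjacent steps share a sub-action type")

    def step_segments(self) -> List[Tuple[str, int, int]]:
        """Derive (sub_action_type, first_frame, last_frame) per step.

        Each step is assumed to last until the frame before the next
        step starts; the last step runs to the end of the action.
        """
        bounds = [s.start_frame for s in self.steps] + [self.end_frame + 1]
        return [
            (s.sub_action_type, bounds[i], bounds[i + 1] - 1)
            for i, s in enumerate(self.steps)
        ]


if __name__ == "__main__":
    # Illustrative values only; these are not taken from the dataset.
    instance = ActionInstance(
        action_type="example_dive",
        start_frame=100,
        end_frame=199,
        steps=[Step("takeoff", 100), Step("somersault", 130), Step("entry", 180)],
    )
    instance.validate()
    print(instance.step_segments())
```

Storing only the starting frame of each step mirrors the description: the end of a step is implied by the start of the next one, so the segments can be derived rather than annotated twice.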
Variants: FineDiving
This dataset is used in 1 benchmark:
| Task | Model | Paper | Date |
|---|---|---|---|
| Action Quality Assessment | RICA^2 (Deterministic) | RICA2: Rubric-Informed, Calibrated Assessment of … | 2024-08-04 |
| Action Quality Assessment | RICA^2 | RICA2: Rubric-Informed, Calibrated Assessment of … | 2024-08-04 |
| Action Quality Assessment | FineParser | FineParser: A Fine-grained Spatio-temporal Action … | 2024-05-11 |
| Action Quality Assessment | NeuroSymbolic-AQA | Hierarchical NeuroSymbolic Approach for Comprehensive … | 2024-03-20 |
Recent papers with results on this dataset: