The CrossTask dataset contains instructional videos collected for 83 different tasks. For each task, an ordered list of steps with manual descriptions is provided. The dataset is divided into two parts: 18 primary and 65 related tasks. Videos for the primary tasks are collected manually and annotated with temporal step boundaries. Videos for the related tasks are collected automatically and have no annotations.
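To illustrate what temporal step-boundary annotations look like in practice, here is a minimal parsing sketch. It assumes a per-video CSV layout of `(step_index, start_sec, end_sec)` rows; the exact file format is an assumption for illustration, not confirmed by the description above.

```python
import csv
import io

# Hypothetical sample: one annotated video with three steps, each row giving
# a step index and its temporal boundaries in seconds (assumed layout).
sample_annotation = """1,10.5,24.0
2,30.2,41.7
3,55.0,63.3
"""

def parse_step_annotations(text):
    """Parse rows of (step_index, start_sec, end_sec) into a list of dicts."""
    steps = []
    for row in csv.reader(io.StringIO(text)):
        steps.append({
            "step": int(row[0]),
            "start": float(row[1]),
            "end": float(row[2]),
        })
    return steps

steps = parse_step_annotations(sample_annotation)
print(len(steps), steps[0]["start"], steps[-1]["end"])
```

Only the 18 primary tasks would carry such annotation files; the 65 related tasks ship without them.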
Source: CrossTask
Image Source: https://arxiv.org/pdf/1903.08225v2.pdf
Variants: CrossTask
This dataset is used in 1 benchmark:
| Task | Model | Paper | Date |
|---|---|---|---|
| Temporal Action Localization | VideoCLIP | VideoCLIP: Contrastive Pre-training for Zero-shot … | 2021-09-28 |
| Temporal Action Localization | TACo | TACo: Token-aware Cascade Contrastive Learning … | 2021-08-23 |
| Temporal Action Localization | VLM | VLM: Task-agnostic Video-Language Model Pre-training … | 2021-05-20 |
| Temporal Action Localization | Text-Video Embedding | HowTo100M: Learning a Text-Video Embedding … | 2019-06-07 |
| Temporal Action Localization | Fully-supervised upper-bound | Cross-task weakly supervised learning from … | 2019-03-19 |
| Temporal Action Localization | Zhukov | Cross-task weakly supervised learning from … | 2019-03-19 |
| Temporal Action Localization | Alayrac | Unsupervised Learning from Narrated Instruction … | 2015-06-30 |