CrossTask

Dataset Information
Modalities: Videos, Texts
Introduced: 2019
License: Unknown

Overview

The CrossTask dataset contains instructional videos collected for 83 different tasks. For each task, an ordered list of steps with manual descriptions is provided. The dataset is divided into two parts: 18 primary tasks and 65 related tasks. Videos for the primary tasks were collected manually and are annotated with temporal step boundaries. Videos for the related tasks were collected automatically and have no annotations.
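The structure described above can be sketched as a small in-memory model. This is only an illustration, not the dataset's actual file format or loader: the class names (`Task`, `Video`, `StepAnnotation`), the field names, and the helper functions are all hypothetical, and it assumes step boundaries are stored as start and end times in seconds.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class StepAnnotation:
    """One annotated step occurrence in a video (hypothetical layout)."""
    step_index: int    # 0-based position in the task's ordered step list
    start_sec: float   # assumed unit: seconds
    end_sec: float


@dataclass
class Video:
    video_id: str
    # Primary-task videos carry temporal step boundaries;
    # related-task videos have no annotations, modeled here as None.
    annotations: Optional[List[StepAnnotation]] = None


@dataclass
class Task:
    name: str
    steps: List[str]   # ordered step descriptions
    is_primary: bool   # True for primary tasks, False for related tasks
    videos: List[Video] = field(default_factory=list)


def annotated_videos(tasks: List[Task]) -> List[Video]:
    """Return videos from primary tasks that actually have annotations."""
    return [
        v
        for t in tasks
        if t.is_primary
        for v in t.videos
        if v.annotations is not None
    ]


def step_segments(task: Task, video: Video) -> List[Tuple[str, float, float]]:
    """Map a video's annotations to (step description, start, end), by start time."""
    if video.annotations is None:
        return []
    ordered = sorted(video.annotations, key=lambda a: a.start_sec)
    return [(task.steps[a.step_index], a.start_sec, a.end_sec) for a in ordered]
```

Modeling missing annotations as `None` rather than an empty list keeps "not annotated" (related tasks) distinct from "annotated, but no step occurs", which matters when selecting videos for evaluation.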

Source: CrossTask
Image Source: https://arxiv.org/pdf/1903.08225v2.pdf

Variants: CrossTask

Associated Benchmarks

This dataset is used in 1 benchmark:

Recent Benchmark Submissions

Task | Model | Paper | Date
Temporal Action Localization | VideoCLIP | VideoCLIP: Contrastive Pre-training for Zero-shot … | 2021-09-28
Temporal Action Localization | TACo | TACo: Token-aware Cascade Contrastive Learning … | 2021-08-23
Temporal Action Localization | VLM | VLM: Task-agnostic Video-Language Model Pre-training … | 2021-05-20
Temporal Action Localization | Text-Video Embedding | HowTo100M: Learning a Text-Video Embedding … | 2019-06-07
Temporal Action Localization | Fully-supervised upper-bound | Cross-task weakly supervised learning from … | 2019-03-19
Temporal Action Localization | Zhukov | Cross-task weakly supervised learning from … | 2019-03-19
Temporal Action Localization | Alayrac | Unsupervised Learning from Narrated Instruction … | 2015-06-30

Research Papers

Recent papers with results on this dataset: