Assembly101

Dataset Information
Modalities: Videos
Languages: English
Introduced: 2022
License: Unknown
Homepage

Overview

Assembly101 is a procedural activity dataset featuring 4321 videos of people assembling and disassembling 101 "take-apart" toy vehicles. Participants work without fixed instructions, so the sequences show rich, natural variation in action ordering, mistakes, and corrections. Assembly101 is the first multi-view action dataset, with 8 static and 4 egocentric recordings captured simultaneously. Sequences are annotated with more than 100K coarse and 1M fine-grained action segments, plus 18M 3D hand poses. The authors benchmark three action understanding tasks: recognition, anticipation, and temporal segmentation. They also propose a novel task of detecting mistakes. The recording format and rich annotations make it possible to investigate generalization to new toys, cross-view transfer, long-tailed distributions, and pose versus appearance. The authors envision Assembly101 as a new challenge for investigating a range of activity understanding problems.

Image Source: https://assembly-101.github.io/

Variants: Assembly101

Associated Benchmarks

This dataset is used in 3 benchmarks: 3D Action Recognition, Action Anticipation, and Action Segmentation.

Recent Benchmark Submissions

| Task | Model | Paper | Date |
|---|---|---|---|
| 3D Action Recognition | CHASE(CTR-GCN) | CHASE: Learning Convex Hull Adaptive … | 2024-10-09 |
| 3D Action Recognition | HandFormer-B/21 | On the Utility of 3D … | 2024-03-14 |
| Action Segmentation | LTContext | How Much Temporal Long-Term Context … | 2023-08-22 |
| 3D Action Recognition | ISTA-Net | Interactive Spatiotemporal Token Attention Network … | 2023-07-14 |
| Action Anticipation | Goal Consistency | Action Anticipation with Goal Consistency | 2023-06-26 |
| Action Segmentation | UVAST | Unified Fully and Timestamp Supervised … | 2022-09-01 |
| Action Segmentation | ASFormer | ASFormer: Transformer for Action Segmentation | 2021-10-16 |
| Action Segmentation | C2F-TCN | Coarse to Fine Multi-Resolution Temporal … | 2021-05-23 |
| 3D Action Recognition | RGBPoseConv3D | Revisiting Skeleton-based Action Recognition | 2021-04-28 |
| Action Segmentation | MS-TCN++ | MS-TCN++: Multi-Stage Temporal Convolutional Network … | 2020-06-16 |
| Action Anticipation | TempAgg | Temporal Aggregate Representations for Long-Range … | 2020-06-01 |
| 3D Action Recognition | MS-G3D | Disentangling and Unifying Graph Convolutions … | 2020-03-31 |
| 3D Action Recognition | TSM | TSM: Temporal Shift Module for … | 2018-11-20 |
| 3D Action Recognition | 2s-AGCN | Two-Stream Adaptive Graph Convolutional Networks … | 2018-05-20 |
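
The submissions listed above can be tallied per task with a few lines of Python. This is only an illustrative sketch: the rows are transcribed by hand from the table, and the names `SUBMISSIONS` and `submissions_per_task` are hypothetical helpers, not part of any Assembly101 tooling.

```python
from collections import Counter

# (task, model, date) rows transcribed by hand from the submissions table.
# SUBMISSIONS is a hypothetical name used only for this sketch.
SUBMISSIONS = [
    ("3D Action Recognition", "CHASE(CTR-GCN)", "2024-10-09"),
    ("3D Action Recognition", "HandFormer-B/21", "2024-03-14"),
    ("Action Segmentation", "LTContext", "2023-08-22"),
    ("3D Action Recognition", "ISTA-Net", "2023-07-14"),
    ("Action Anticipation", "Goal Consistency", "2023-06-26"),
    ("Action Segmentation", "UVAST", "2022-09-01"),
    ("Action Segmentation", "ASFormer", "2021-10-16"),
    ("Action Segmentation", "C2F-TCN", "2021-05-23"),
    ("3D Action Recognition", "RGBPoseConv3D", "2021-04-28"),
    ("Action Segmentation", "MS-TCN++", "2020-06-16"),
    ("Action Anticipation", "TempAgg", "2020-06-01"),
    ("3D Action Recognition", "MS-G3D", "2020-03-31"),
    ("3D Action Recognition", "TSM", "2018-11-20"),
    ("3D Action Recognition", "2s-AGCN", "2018-05-20"),
]


def submissions_per_task(rows):
    """Count how many listed submissions belong to each task."""
    return Counter(task for task, _model, _date in rows)


counts = submissions_per_task(SUBMISSIONS)
for task, n in counts.most_common():
    print(f"{task}: {n}")
```

The tally covers only the recent submissions shown on this page, so it says nothing about the full history of each benchmark.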

Research Papers

Recent papers with results on this dataset: