EPIC-KITCHENS-100

Dataset Information
Modalities: Videos, Texts
Languages: English
Introduced: 2020

Overview

This paper introduces a pipeline for scaling EPIC-KITCHENS, the largest dataset in egocentric vision. The effort culminates in EPIC-KITCHENS-100, a collection of 100 hours of footage (20M frames, 90K actions) in 700 variable-length videos, capturing long-term unscripted activities in 45 environments with head-mounted cameras. Compared with its previous version (EPIC-KITCHENS-55), EPIC-KITCHENS-100 was annotated using a novel pipeline that yields denser annotations (54% more actions per minute) and more complete annotations of fine-grained actions (128% more action segments). The collection also enables evaluating the "test of time", i.e. whether models trained on data collected in 2018 can generalise to new footage collected under the same hypotheses albeit "two years on".
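The headline figures above imply a few averages that are handy when sizing experiments. The minimal Python sketch below derives them only from the rounded totals stated in this overview (100 hours, 20M frames, 90K actions, 700 videos). The names `DatasetScale` and `increase_factor` are hypothetical, and because the inputs are rounded, the results are approximate dataset-wide averages rather than properties of any individual video.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetScale:
    """Headline totals as stated in the overview (rounded figures)."""
    hours: float
    frames: int
    actions: int
    videos: int

    @property
    def minutes(self) -> float:
        return self.hours * 60

    def actions_per_minute(self) -> float:
        # Average annotation density over all footage.
        return self.actions / self.minutes

    def mean_fps(self) -> float:
        # Implied average frame rate; individual videos may differ.
        return self.frames / (self.hours * 3600)

    def mean_video_minutes(self) -> float:
        # Videos are variable-length, so this is only an average.
        return self.minutes / self.videos

    def actions_per_video(self) -> float:
        return self.actions / self.videos


def increase_factor(percent_more: float) -> float:
    """Convert an 'X% more' claim into a multiplicative factor."""
    return 1 + percent_more / 100


# Hypothetical instance built from the overview's rounded totals.
ek100 = DatasetScale(hours=100, frames=20_000_000, actions=90_000, videos=700)

if __name__ == "__main__":
    print(f"actions per minute: {ek100.actions_per_minute():.1f}")  # 15.0
    print(f"mean fps:           {ek100.mean_fps():.1f}")            # 55.6
    print(f"mean video minutes: {ek100.mean_video_minutes():.1f}")  # 8.6
    print(f"actions per video:  {ek100.actions_per_video():.1f}")   # 128.6
    print(f"+128% as factor:    {increase_factor(128):.2f}")        # 2.28
```

The `increase_factor` helper makes the relative claims concrete: "128% more action segments" means roughly 2.28 times as many segments as EPIC-KITCHENS-55, not 1.28 times.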
The dataset is aligned with 6 challenges: action recognition (full and weak supervision, counted as separate challenges), action detection, action anticipation, cross-modal retrieval (from captions), and unsupervised domain adaptation for action recognition. For each challenge, we define the task and provide baselines and evaluation metrics.

Variants: EPIC-KITCHENS-100

Associated Benchmarks

This dataset is used in 5 benchmarks.

Recent Benchmark Submissions

Task | Model | Paper | Date
Action Recognition | LLaVAction | LLaVAction: evaluating and training multi-modal … | 2025-03-24
Action Recognition | LVMAE | Extending Video Masked Autoencoders to … | 2024-11-20
Action Anticipation | S-GEAR | Semantically Guided Representation Learning For … | 2024-07-02
Action Anticipation | PlausiVL | Can't make an Omelette without … | 2024-05-30
Action Recognition | TIM | TIM: A Time Interval Machine … | 2024-04-08
Action Recognition | CAST(ViT-B/16) | CAST: Cross-Attention in Space and … | 2023-11-30
Temporal Action Localization | AdaTAD (verb, VideoMAE-L) | End-to-End Temporal Action Detection with … | 2023-11-28
Action Recognition | Avion (ViT-L) | Training a Large Video Model … | 2023-09-28
Action Recognition | TAdaConvNeXtV2-S | Temporally-Adaptive Models for Efficient Video … | 2023-08-10
Action Recognition | TAdaFormer-L/14 | Temporally-Adaptive Models for Efficient Video … | 2023-08-10
Temporal Action Localization | TemporalMaxer (verb) | TemporalMaxer: Maximize Temporal Context with … | 2023-03-16
Temporal Action Localization | TriDet (verb) | TriDet: Temporal Action Detection with … | 2023-03-13
Audio Classification | Audiovisual Masked Autoencoder (Audiovisual, Single) | Audiovisual Masked Autoencoders | 2022-12-09
Audio Classification | Audiovisual Masked Autoencoder (Audio-only, Single) | Audiovisual Masked Autoencoders | 2022-12-09
Audio Classification | Audiovisual Masked Autoencoder (Video-only, Single) | Audiovisual Masked Autoencoders | 2022-12-09
Action Recognition | LaViLa (TimeSformer-L) | Learning Video Representations from Large … | 2022-12-08
Action Anticipation | InAViT | Interaction Region Visual Transformer for … | 2022-11-25
Action Anticipation | AFFT | Anticipative Feature Fusion Transformer for … | 2022-10-23
Audio Classification | PlayItBackX3 | Play It Back: Iterative Attention … | 2022-10-20
Action Recognition | M&M (WTS 60M) | M&M Mix: A Multimodal Multiview … | 2022-06-20

Research Papers

Recent papers with results on this dataset: