EPIC-KITCHENS-100

Name: EPIC-KITCHENS-100
Published: 2020-06-23
License: CC BY-NC 4.0

Dataset Information

Modalities: Videos, Texts
Languages: English
Introduced: 2020
License: CC BY-NC 4.0


Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

This paper introduces the pipeline used to scale up EPIC-KITCHENS, the largest dataset in egocentric vision. The effort culminates in EPIC-KITCHENS-100, a collection of 100 hours, 20M frames and 90K actions in 700 variable-length videos, capturing long-term unscripted activities in 45 environments using head-mounted cameras. Compared to its previous version (EPIC-KITCHENS-55), EPIC-KITCHENS-100 has been annotated using a novel pipeline that allows denser (54% more actions per minute) and more complete (128% more action segments) annotations of fine-grained actions. This collection also enables evaluating the "test of time", i.e. whether models trained on data collected in 2018 can generalise to new footage collected under the same hypotheses albeit "two years on".
The dataset is aligned with 6 challenges: action recognition (full and weak supervision), action detection, action anticipation, cross-modal retrieval (from captions), as well as unsupervised domain adaptation for action recognition. For each challenge, we define the task, provide baselines and evaluation metrics.
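As a rough sanity check on scale, the headline figures above can be turned into average rates. The sketch below uses only the rounded numbers quoted in the overview (100 hours, 20M frames, 90K actions, 700 videos), so the results are approximations rather than official statistics; the function and variable names are illustrative.

```python
# Headline figures quoted in the overview (rounded, so results are approximate).
HOURS = 100
FRAMES = 20_000_000
ACTIONS = 90_000
VIDEOS = 700

def headline_rates(hours, frames, actions, videos):
    """Derive average rates from the rounded headline figures."""
    minutes = hours * 60
    seconds = hours * 3600
    return {
        "actions_per_minute": actions / minutes,
        "frames_per_second": frames / seconds,
        "minutes_per_video": minutes / videos,
    }

rates = headline_rates(HOURS, FRAMES, ACTIONS, VIDEOS)
for name, value in rates.items():
    print(f"{name}: {value:.1f}")
# actions_per_minute: 15.0
# frames_per_second: 55.6
# minutes_per_video: 8.6
```

These are dataset-wide averages: the videos are variable-length, so individual recordings can differ substantially from the mean.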

Variants: EPIC-KITCHENS-100

Associated Benchmarks

This dataset is used in 5 benchmarks:

Temporal Action Localization - Metrics: Avg mAP (0.1-0.5), mAP IOU@0.1, mAP IOU@0.2, mAP IOU@0.3, mAP IOU@0.4, mAP IOU@0.5
Audio Classification - Metrics: Top-1 Action, Top-1 Noun, Top-1 Verb, Top-5 Action, Top-5 Noun, Top-5 Verb
Action Recognition - Metrics: Action@1, Verb@1, Noun@1, GFLOPs
Action Anticipation - Metrics: Recall@5, Top-5 Verb, Top-5 Noun
Unsupervised Domain Adaptation - Metrics: Average Accuracy
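To make the metric names above more concrete, here is a minimal Python sketch of two building blocks behind them. It is not the official evaluation code. It assumes that the 0.1-0.5 range in the localization metrics refers to temporal IoU thresholds, that an action is a (verb, noun) pair, and that an action prediction counts as correct at top-1 only when both its verb and its noun are correct. All function names and the toy data are hypothetical.

```python
def temporal_iou(pred, gt):
    """Intersection-over-union of two (start, end) segments, in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def top_k_hit(scores, label, k):
    """True if `label` is among the k highest-scoring class indices."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return label in ranked[:k]

def top1_accuracies(samples):
    """samples: list of (verb_scores, noun_scores, verb_label, noun_label).

    Returns (verb, noun, action) top-1 accuracy. Assumption: an action is
    correct only when both its verb and its noun are correct.
    """
    verb = noun = action = 0
    for v_scores, n_scores, v_label, n_label in samples:
        v_ok = top_k_hit(v_scores, v_label, 1)
        n_ok = top_k_hit(n_scores, n_label, 1)
        verb += v_ok
        noun += n_ok
        action += v_ok and n_ok
    n = len(samples)
    return verb / n, noun / n, action / n

# Toy localization example: one predicted segment against one ground truth.
iou = temporal_iou((2.0, 6.0), (4.0, 8.0))  # overlap 2 s, union 6 s
matched = [t for t in (0.1, 0.2, 0.3, 0.4, 0.5) if iou >= t]

# Toy recognition example: 3 verb classes, 2 noun classes.
samples = [
    ([0.7, 0.2, 0.1], [0.1, 0.9], 0, 1),  # verb and noun correct
    ([0.3, 0.6, 0.1], [0.8, 0.2], 0, 0),  # verb wrong, noun correct
    ([0.1, 0.1, 0.8], [0.4, 0.6], 2, 0),  # verb correct, noun wrong
    ([0.5, 0.4, 0.1], [0.3, 0.7], 0, 1),  # verb and noun correct
]
print(top1_accuracies(samples))  # (0.75, 0.75, 0.5)
```

In the toy localization case the IoU is 1/3, so the prediction would count as a match at thresholds 0.1 to 0.3 but not at 0.4 or 0.5. A full mAP computation additionally ranks predictions by confidence and averages precision per class, which is omitted here.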

Recent Benchmark Submissions

Task	Model	Paper	Date
Action Recognition	LLaVAction	LLaVAction: evaluating and training multi-modal …	2025-03-24
Action Recognition	LVMAE	Extending Video Masked Autoencoders to …	2024-11-20
Action Anticipation	S-GEAR	Semantically Guided Representation Learning For …	2024-07-02
Action Anticipation	PlausiVL	Can't make an Omelette without …	2024-05-30
Action Recognition	TIM	TIM: A Time Interval Machine …	2024-04-08
Action Recognition	CAST(ViT-B/16)	CAST: Cross-Attention in Space and …	2023-11-30
Temporal Action Localization	AdaTAD (verb, VideoMAE-L)	End-to-End Temporal Action Detection with …	2023-11-28
Action Recognition	Avion (ViT-L)	Training a Large Video Model …	2023-09-28
Action Recognition	TAdaConvNeXtV2-S	Temporally-Adaptive Models for Efficient Video …	2023-08-10
Action Recognition	TAdaFormer-L/14	Temporally-Adaptive Models for Efficient Video …	2023-08-10
Temporal Action Localization	TemporalMaxer (verb)	TemporalMaxer: Maximize Temporal Context with …	2023-03-16
Temporal Action Localization	TriDet (verb)	TriDet: Temporal Action Detection with …	2023-03-13
Audio Classification	Audiovisual Masked Autoencoder (Audiovisual, Single)	Audiovisual Masked Autoencoders	2022-12-09
Audio Classification	Audiovisual Masked Autoencoder (Audio-only, Single)	Audiovisual Masked Autoencoders	2022-12-09
Audio Classification	Audiovisual Masked Autoencoder (Video-only, Single)	Audiovisual Masked Autoencoders	2022-12-09
Action Recognition	LaViLa (TimeSformer-L)	Learning Video Representations from Large …	2022-12-08
Action Anticipation	InAViT	Interaction Region Visual Transformer for …	2022-11-25
Action Anticipation	AFFT	Anticipative Feature Fusion Transformer for …	2022-10-23
Audio Classification	PlayItBackX3	Play It Back: Iterative Attention …	2022-10-20
Action Recognition	M&M (WTS 60M)	M&M Mix: A Multimodal Multiview …	2022-06-20

Research Papers

Recent papers with results on this dataset:
