Ego4D

Dataset Information

Modalities: Videos

Overview

Ego4D is a massive-scale egocentric video dataset and benchmark suite. It offers 3,025 hours of daily life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 855 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards with consenting participants and robust de-identification procedures where relevant. Ego4D dramatically expands the volume of diverse egocentric video footage publicly available to the research community. Portions of the video are accompanied by audio, 3D meshes of the environment, eye gaze, stereo, and/or synchronized videos from multiple egocentric cameras at the same event. Furthermore, a host of new benchmark challenges are presented, centered around understanding the first-person visual experience in the past (querying an episodic memory), present (analyzing hand-object manipulation, audio-visual conversation, and social interactions), and future (forecasting activities). By publicly sharing this massive annotated dataset and benchmark suite, the aim is to push the frontier of first-person perception.

Description from: Facebook AI

Paper: Ego4D: Around the World in 3,000 Hours of Egocentric Video

GitHub: https://github.com/EGO4D

Variants: Ego4D, Ego4D MQ val, Ego4D MQ test

Associated Benchmarks

This dataset is used in 4 benchmarks:

Recent Benchmark Submissions

| Task | Model | Paper | Date |
|---|---|---|---|
| Natural Language Queries | DeCafNet-100% | DeCafNet: Delegate and Conquer for … | 2025-05-22 |
| Natural Language Queries | DeCafNet-50% (no NaQ) | DeCafNet: Delegate and Conquer for … | 2025-05-22 |
| Natural Language Queries | DeCafNet-50% | DeCafNet: Delegate and Conquer for … | 2025-05-22 |
| Short-term Object Interaction Anticipation | SOIA-DOD | Short-term Object Interaction Anticipation with … | 2024-07-08 |
| Natural Language Queries | EgoVideo | EgoVideo: Exploring Egocentric Foundation Model … | 2024-06-26 |
| Short-term Object Interaction Anticipation | EgoVideo | EgoVideo: Exploring Egocentric Foundation Model … | 2024-06-26 |
| Natural Language Queries | UniMD+Sync. | UniMD: Towards Unifying Moment Retrieval … | 2024-04-07 |
| Natural Language Queries | RGNet | RGNet: A Unified Clip Retrieval … | 2023-12-11 |
| Natural Language Queries | EgoVLPv2 | EgoVLPv2: Egocentric Video-Language Pre-training with … | 2023-07-11 |
| Short-term Object Interaction Anticipation | GANOv2 | Guided Attention for Next Active … | 2023-05-25 |
| Short-term Object Interaction Anticipation | InternVideo | InternVideo-Ego4D: A Pack of Champion … | 2022-11-17 |
| Natural Language Queries | InternVideo | InternVideo-Ego4D: A Pack of Champion … | 2022-11-17 |
| Future Hand Prediction | InternVideo | InternVideo-Ego4D: A Pack of Champion … | 2022-11-17 |
| State Change Object Detection | InternVideo | InternVideo-Ego4D: A Pack of Champion … | 2022-11-17 |
| Natural Language Queries | ReLER@ZJU-Alibaba | ReLER@ZJU-Alibaba Submission to the Ego4D … | 2022-07-01 |
| Natural Language Queries | EgoVLP | Egocentric Video-Language Pretraining | 2022-06-03 |
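
The submissions listed above can be tallied by task to see how the recent activity is distributed. The following is a minimal sketch, not part of the dataset's tooling: the `SUBMISSIONS` list is a hand transcription of the rows above (task, model, date), and the helper name `submissions_per_task` is hypothetical.

```python
from collections import Counter

# Hand-transcribed (task, model, date) rows from the submissions listing above.
# This list is an illustrative assumption, not an official Ego4D data file.
NLQ = "Natural Language Queries"
STA = "Short-term Object Interaction Anticipation"
FHP = "Future Hand Prediction"
SCOD = "State Change Object Detection"

SUBMISSIONS = [
    (NLQ, "DeCafNet-100%", "2025-05-22"),
    (NLQ, "DeCafNet-50% (no NaQ)", "2025-05-22"),
    (NLQ, "DeCafNet-50%", "2025-05-22"),
    (STA, "SOIA-DOD", "2024-07-08"),
    (NLQ, "EgoVideo", "2024-06-26"),
    (STA, "EgoVideo", "2024-06-26"),
    (NLQ, "UniMD+Sync.", "2024-04-07"),
    (NLQ, "RGNet", "2023-12-11"),
    (NLQ, "EgoVLPv2", "2023-07-11"),
    (STA, "GANOv2", "2023-05-25"),
    (STA, "InternVideo", "2022-11-17"),
    (NLQ, "InternVideo", "2022-11-17"),
    (FHP, "InternVideo", "2022-11-17"),
    (SCOD, "InternVideo", "2022-11-17"),
    (NLQ, "ReLER@ZJU-Alibaba", "2022-07-01"),
    (NLQ, "EgoVLP", "2022-06-03"),
]


def submissions_per_task(rows):
    """Count how many listed submissions target each task."""
    return Counter(task for task, _model, _date in rows)


counts = submissions_per_task(SUBMISSIONS)
# Print tasks from most to least frequently targeted.
for task, n in counts.most_common():
    print(f"{task}: {n}")
```

In this listing, four distinct tasks appear, and Natural Language Queries accounts for most of the recent entries.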

Research Papers

Recent papers with results on this dataset: