EGTEA

EGTEA Gaze+

Dataset Information
Introduced: 2018
License: Unknown

Overview

Extended GTEA Gaze+
EGTEA Gaze+ is a large-scale dataset for first-person-view (FPV) actions and gaze. It subsumes GTEA Gaze+ and comes with HD videos (1280x960), audio, gaze tracking data, frame-level action annotations, and pixel-level hand masks at sampled frames.
Specifically, EGTEA Gaze+ contains 28 hours of de-identified video of cooking activities from 86 unique sessions of 32 subjects. These videos come with audio and gaze tracking (30 Hz). We have further provided human annotations of actions (human-object interactions) and hand masks.
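
As a rough sense of scale, the figures above can be turned into per-session and per-subject averages. The sketch below is back-of-the-envelope arithmetic only: it assumes the 28 hours figure is exact and that gaze is sampled uniformly at 30 Hz with no dropped samples, so the real counts in the released files will differ.

```python
# Figures quoted in the dataset description.
TOTAL_HOURS = 28
NUM_SESSIONS = 86
NUM_SUBJECTS = 32
GAZE_RATE_HZ = 30  # gaze tracking rate

total_seconds = TOTAL_HOURS * 3600

# Average recording length per session, in minutes.
avg_session_minutes = TOTAL_HOURS * 60 / NUM_SESSIONS

# Average number of sessions recorded per subject.
avg_sessions_per_subject = NUM_SESSIONS / NUM_SUBJECTS

# Upper-bound estimate of gaze samples (assumes no tracking dropouts).
approx_gaze_samples = total_seconds * GAZE_RATE_HZ

print(f"average session length: {avg_session_minutes:.1f} min")
print(f"sessions per subject:   {avg_sessions_per_subject:.2f}")
print(f"approx. gaze samples:   {approx_gaze_samples:,}")
```

The averages say nothing about the spread: individual sessions may be much shorter or longer than the mean.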

The action annotations include 10,325 instances of fine-grained actions, such as "Cut bell pepper" or "Pour condiment (from) condiment container into salad".

The hand annotations consist of 15,176 hand masks drawn from 13,847 video frames.
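
The annotation counts can be related to each other with simple arithmetic. The sketch below derives the action density per hour and the average number of hand masks per annotated frame. The last step rests on an assumption not stated in the source: that every annotated frame carries either one or two masks. Under that assumption only, the surplus of masks over frames equals the number of two-mask frames.

```python
# Figures quoted in the dataset description.
TOTAL_HOURS = 28
ACTION_INSTANCES = 10_325
HAND_MASKS = 15_176
MASK_FRAMES = 13_847

# Average number of annotated action instances per hour of video.
actions_per_hour = ACTION_INSTANCES / TOTAL_HOURS

# Average number of hand masks per annotated frame.
masks_per_frame = HAND_MASKS / MASK_FRAMES

# ASSUMPTION (not from the source): each annotated frame has 1 or 2 masks.
# Then frames_with_two + MASK_FRAMES == HAND_MASKS.
frames_with_two_masks = HAND_MASKS - MASK_FRAMES

print(f"actions per hour:        {actions_per_hour:.2f}")
print(f"masks per frame:         {masks_per_frame:.3f}")
print(f"two-mask frames (est.):  {frames_with_two_masks}")
```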

Source: http://cbs.ic.gatech.edu/fpv/

Associated Benchmarks

This dataset is used in 2 benchmarks:

Recent Benchmark Submissions

Task | Model | Paper | Date
Action Anticipation | InAViT | Interaction Region Visual Transformer for … | 2022-11-25
Action Anticipation | Abstract Goal | Predicting the Next Action by … | 2022-09-12
Long-tail Learning | CDB-loss (3D-ResNeXt101) | Class-Wise Difficulty-Balanced Loss for Solving … | 2020-10-05
Long-tail Learning | CB Loss | Class-Balanced Loss Based on Effective … | 2019-01-16
Long-tail Learning | Focal loss (3D-ResNeXt101) | Focal Loss for Dense Object … | 2017-08-07
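
Two of the long-tail entries above, focal loss and class-balanced (CB) loss, are loss-reweighting schemes. The following is a minimal sketch of their commonly cited formulations, not the configuration used for the benchmark results. The class counts are hypothetical, the `beta` and `gamma` defaults are illustrative, and rescaling the weights to sum to the number of classes is one common convention assumed here.

```python
import math


def focal_loss(p_true: float, gamma: float = 2.0) -> float:
    """Focal loss for one sample: -(1 - p_t)^gamma * log(p_t).

    p_true is the predicted probability of the correct class.
    With gamma = 0 this reduces to ordinary cross-entropy.
    """
    return -((1.0 - p_true) ** gamma) * math.log(p_true)


def class_balanced_weights(class_counts, beta: float = 0.999):
    """Per-class weights proportional to (1 - beta) / (1 - beta^n).

    n is the number of training samples in the class; rarer classes
    receive larger weights. Weights are rescaled to sum to the number
    of classes (an assumed normalization convention).
    """
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in class_counts]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


# Hypothetical long-tailed class counts, for illustration only.
counts = [2000, 200, 20]
weights = class_balanced_weights(counts)

for n, w in zip(counts, weights):
    print(f"class with {n:>4} samples -> weight {w:.3f}")

# Easy examples (high p_true) are down-weighted more than hard ones.
print(f"focal loss at p=0.9: {focal_loss(0.9):.4f}")
print(f"focal loss at p=0.5: {focal_loss(0.5):.4f}")
```

In practice the class-balanced weights multiply a base loss such as cross-entropy or focal loss, so the two ideas can be combined.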
