EGTEA Gaze+
Extended GTEA Gaze+
EGTEA Gaze+ is a large-scale dataset for first-person-view (FPV) actions and gaze. It subsumes GTEA Gaze+ and provides HD video (1280x960), audio, gaze tracking data, frame-level action annotations, and pixel-level hand masks on sampled frames.
Specifically, EGTEA Gaze+ contains 28 hours of (de-identified) cooking activities from 86 unique sessions of 32 subjects. The videos come with audio and 30 Hz gaze tracking. We have further provided human annotations of actions (human-object interactions) and hand masks.
The action annotations include 10,325 instances of fine-grained actions, such as "Cut bell pepper" or "Pour condiment (from) condiment container into salad".
The hand annotations consist of 15,176 hand masks drawn from 13,847 frames sampled from the videos.
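Frame-level action annotations of this kind are typically consumed as (video, start frame, end frame, action label) tuples. The sketch below assumes a hypothetical comma-separated layout; the actual EGTEA Gaze+ annotation file format and the session name shown are illustrative, not taken from the dataset release.

```python
import csv
import io

def load_action_annotations(text):
    """Parse hypothetical CSV rows of the form:
    video_id,start_frame,end_frame,action_label
    into (video_id, start, end, label) tuples."""
    rows = []
    for video_id, start, end, label in csv.reader(io.StringIO(text)):
        rows.append((video_id, int(start), int(end), label))
    return rows

# Hypothetical example row (session name and frame range are made up):
sample = "OP01-R01-PastaSalad,120,245,Cut bell pepper\n"
print(load_action_annotations(sample))
```

Loading instances into tuples like this makes it easy to slice clips out of the HD videos or to count instances per action class.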
Source: http://cbs.ic.gatech.edu/fpv/
Image Source: http://cbs.ic.gatech.edu/fpv/
This dataset is used in 2 benchmarks:
| Task | Model | Paper | Date |
|---|---|---|---|
| Action Anticipation | InAViT | Interaction Region Visual Transformer for … | 2022-11-25 |
| Action Anticipation | Abstract Goal | Predicting the Next Action by … | 2022-09-12 |
| Long-tail Learning | CDB-loss (3D-ResNeXt101) | Class-Wise Difficulty-Balanced Loss for Solving … | 2020-10-05 |
| Long-tail Learning | CB Loss | Class-Balanced Loss Based on Effective … | 2019-01-16 |
| Long-tail Learning | Focal loss (3D-ResNeXt101) | Focal Loss for Dense Object … | 2017-08-07 |
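The long-tail learning entries above reweight the training loss by class frequency, which matters here because the 10,325 action instances are spread very unevenly across classes. For example, the Class-Balanced (CB) loss weights each class by the inverse "effective number" of its samples, (1 - β) / (1 - β^n). A minimal sketch of that weighting (the β value and counts are illustrative):

```python
import numpy as np

def class_balanced_weights(counts, beta=0.999):
    """Class-Balanced loss weights: w_y = (1 - beta) / (1 - beta**n_y),
    normalized so the weights sum to the number of classes."""
    counts = np.asarray(counts, dtype=float)
    effective_num = 1.0 - np.power(beta, counts)  # effective number of samples per class
    weights = (1.0 - beta) / effective_num
    return weights / weights.sum() * len(counts)

# A frequent class (1000 instances) gets a much smaller weight
# than a rare class (10 instances):
print(class_balanced_weights([1000, 10]))
```

These per-class weights are then multiplied into a standard loss (e.g. softmax cross-entropy or focal loss) during training.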
Recent papers with results on this dataset: