We construct Gaze-CIFAR-10, a gaze-augmented image dataset based on the standard CIFAR-10 benchmark, enhanced with human eye-tracking annotations collected using the HTC VIVE Pro Eye headset. The original CIFAR-10 dataset consists of 60,000 color images across 10 categories, each with a resolution of $32 \times 32$ pixels. To enable reliable human gaze tracking, all images are upsampled to $1024 \times 1024$ using the Real-ESRGAN model.
For each image, we collect a sequence of 176 eye-gaze coordinates, normalized to the range $[0, 224]$ (with the lower-left corner of the image as the origin) to match the input resolution of Vision Transformers (ViTs). Gaze data were collected from 20 participants. The dataset contains 50,000 training images and 10,000 test images, each paired with a synchronized gaze trajectory.
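Because the gaze coordinates use a lower-left origin while a ViT's patch grid is indexed from the top-left, aligning fixations with patch tokens requires a y-axis flip. The sketch below (an illustration, not part of the released dataset tooling; the 224-pixel input and 16-pixel patch size are assumptions matching a standard ViT-B/16) maps normalized gaze points to flat patch indices:

```python
import numpy as np

def gaze_to_patch_indices(gaze_xy, image_size=224, patch_size=16):
    """Map gaze points in [0, image_size] to flat ViT patch indices.

    gaze_xy: (N, 2) array of (x, y) points with the origin at the
    lower-left corner, as in Gaze-CIFAR-10.
    Returns (N,) patch indices in row-major (top-left first) order.
    """
    gaze_xy = np.asarray(gaze_xy, dtype=float)
    x = gaze_xy[:, 0]
    # Flip y so the origin matches the ViT's top-left patch grid.
    y = image_size - gaze_xy[:, 1]
    grid = image_size // patch_size  # 14 patches per side for 224 / 16
    col = np.clip((x // patch_size).astype(int), 0, grid - 1)
    row = np.clip((y // patch_size).astype(int), 0, grid - 1)
    return row * grid + col

# A fixation at the image center lands in the middle of the 14x14 grid.
center_idx = gaze_to_patch_indices([[112.0, 112.0]])[0]  # patch 105
```

A per-image gaze trajectory of shape (176, 2) can be passed directly to this function to obtain the sequence of attended patch tokens.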
Variants: Gaze-CIFAR-10
This dataset is used in 1 benchmark:
| Task | Model | Paper | Date |
|---|---|---|---|
| Image Classification | DSGE-ConvNeXtV2 | Gaze-Guided Learning: Avoiding Shortcut Bias … | 2025-04-08 |
| Image Classification | DSGE-ViT | Gaze-Guided Learning: Avoiding Shortcut Bias … | 2025-04-08 |