Gaze-CIFAR-10

Dataset Information
Modalities: Images, Time series, Tracking
Languages: English
Introduced: 2025

Overview

We construct Gaze-CIFAR-10, a gaze-augmented image dataset based on the standard CIFAR-10 benchmark, enhanced with human eye-tracking annotations collected using the HTC VIVE Pro Eye headset. The original CIFAR-10 dataset consists of 60,000 color images across 10 categories, each with a resolution of $32 \times 32$ pixels. To enable reliable human gaze tracking, all images are upsampled to $1024 \times 1024$ using the Real-ESRGAN model.

For each image, we collect a sequence of 176 eye-gaze coordinates. The coordinates are normalized to the range $[0, 224]$, with the origin at the lower-left corner of the image, to match the input resolution of Vision Transformers (ViTs). Gaze data were collected from 20 participants. The dataset contains 50,000 training images and 10,000 test images, each paired with its synchronized gaze trajectory.
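Because the gaze coordinates use a lower-left origin while image arrays are usually indexed from the top-left, the vertical axis has to be flipped before gaze points can be aligned with image patches. The following is a minimal sketch of that conversion, not the dataset's official tooling. The function name `gaze_to_patch_counts`, the 16-pixel patch size, and the list-of-pairs input layout are assumptions made for illustration; only the $[0, 224]$ range and the lower-left origin come from the description above.

```python
IMAGE_SIZE = 224                 # gaze coordinate range given in the description
PATCH_SIZE = 16                  # assumption: 16x16 ViT-style patches
GRID = IMAGE_SIZE // PATCH_SIZE  # 14 patches per side under that assumption


def gaze_to_patch_counts(gaze_xy):
    """Count gaze samples per patch (hypothetical helper).

    gaze_xy: iterable of (x, y) pairs in [0, 224], origin at the
             lower-left corner of the image.
    Returns a GRID x GRID list of lists indexed [row][col], with
    row 0 at the top of the image.
    """
    counts = [[0] * GRID for _ in range(GRID)]
    for x, y in gaze_xy:
        # Flip the vertical axis: lower-left origin -> top-left origin.
        y_top = IMAGE_SIZE - y
        # Clamp so that points exactly on the far edge (224) land in the last patch.
        col = min(max(int(x // PATCH_SIZE), 0), GRID - 1)
        row = min(max(int(y_top // PATCH_SIZE), 0), GRID - 1)
        counts[row][col] += 1
    return counts


# Usage: three illustrative gaze samples (made-up values, not dataset entries).
example = [(0.0, 0.0), (8.0, 8.0), (224.0, 224.0)]
patch_counts = gaze_to_patch_counts(example)
```

A per-patch count grid like this is one simple way to turn a 176-point trajectory into a fixed-size map that lines up with a patch-based model; other encodings, such as keeping the raw ordered sequence, preserve timing information that the counts discard.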

Variants: Gaze-CIFAR-10

Associated Benchmarks

This dataset is used in 1 benchmark:

Recent Benchmark Submissions

| Task | Model | Paper | Date |
|---|---|---|---|
| Image Classification | DSGE-ConvNeXtV2 | Gaze-Guided Learning: Avoiding Shortcut Bias … | 2025-04-08 |
| Image Classification | DSGE-ViT | Gaze-Guided Learning: Avoiding Shortcut Bias … | 2025-04-08 |

Research Papers

Recent papers with results on this dataset: