EgoSchema

Dataset Information
Modalities: Videos
Introduced: 2023
License: Unknown

Overview

EgoSchema is a very long-form video question-answering dataset and benchmark for evaluating the long-video understanding capabilities of modern vision and language systems. Derived from Ego4D, EgoSchema consists of over 5000 human-curated multiple-choice question-answer pairs spanning over 250 hours of real video data and covering a very broad range of natural human activity and behavior.

Variants: EgoSchema, EgoSchema (subset), EgoSchema (fullset)

Associated Benchmarks

This dataset is used in 1 benchmark.

Recent Benchmark Submissions

Task | Model | Paper | Date
Visual Question Answering (VQA) | Lyra-Pro | Lyra: An Efficient and Speech-Centric … | 2024-12-12

Research Papers

Recent papers with results on this dataset: