VLEP

Video-and-Language Event Prediction

Dataset Information
Modalities: Videos, Texts
Introduced: 2020
License: Unknown

Overview

VLEP contains 28,726 future event prediction examples (along with their rationales) drawn from 10,234 video clips. Each example (see Figure 1) consists of a Premise Event (a short video clip with dialogue), a Premise Summary (a text summary of the premise event), and two candidate natural-language Future Events, each with a Rationale, all written by people. The clips are on average 6.1 seconds long and are harvested from diverse, event-rich sources: TV shows and YouTube lifestyle vlogs.
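
The structure of an example described above can be sketched as a small Python data model. This is only an illustration under stated assumptions: the class names, field names, and sample values below are hypothetical and do not reflect the dataset's actual file format or annotation schema.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FutureEvent:
    """One candidate future event and the rationale written for it."""
    text: str
    rationale: str


@dataclass(frozen=True)
class VlepExample:
    """Hypothetical in-memory layout of one example (not the official schema)."""
    clip_id: str           # assumed identifier of the premise video clip
    dialogue: str          # dialogue that accompanies the premise clip
    premise_summary: str   # text summary of the premise event
    future_events: Tuple[FutureEvent, FutureEvent]  # the two candidates

    def __post_init__(self) -> None:
        # The overview states that each example carries two future events.
        if len(self.future_events) != 2:
            raise ValueError("expected exactly two candidate future events")


# Placeholder values invented for illustration; not taken from the dataset.
example = VlepExample(
    clip_id="clip_0001",
    dialogue="Speaker A: Could you pass me the menu?",
    premise_summary="A person at a restaurant table asks for the menu.",
    future_events=(
        FutureEvent("The person reads the menu.", "They just asked for it."),
        FutureEvent("The person leaves the table.", "They seem impatient."),
    ),
)

# Counts quoted in the overview: examples and source clips.
NUM_EXAMPLES = 28_726
NUM_CLIPS = 10_234
examples_per_clip = NUM_EXAMPLES / NUM_CLIPS  # average examples per clip
```

Freezing the dataclasses and checking the number of candidates at construction time keeps a malformed record from propagating silently; a real loader would adapt these fields to whatever the released annotation files actually contain.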

Source: What is More Likely to Happen Next? Video-and-Language Future Event Prediction

Variants: VLEP

Associated Benchmarks

This dataset is used in 1 benchmark:

Recent Benchmark Submissions

Task | Model | Paper | Date
Video Question Answering | LLaMA-VQA | Large Language Models are Temporal … | 2023-10-24

Research Papers

Recent papers with results on this dataset: