VideoInstruct

Video Instruction Dataset

Dataset Information
Modalities: Videos, Texts
Languages: English
Introduced: 2023

Overview

Video Instruction Dataset is used to train Video-ChatGPT. It consists of 100,000 high-quality video instruction pairs. The dataset was built with a combination of human-assisted and semi-automatic annotation techniques. These methods create question-answer pairs related to:

  1. Video summarization
  2. Description-based question-answers (exploring spatial, temporal, relationships, and reasoning concepts)
  3. Creative/generative question-answers
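As a rough illustration of what a video instruction pair could look like in code, the sketch below models one pair and tallies pairs by the three question-answer types listed above. The class name, field names, category labels, and sample records are all hypothetical and chosen for illustration only; they are not the dataset's actual schema or contents.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical labels mirroring the three question-answer types listed above.
CATEGORIES = ("summarization", "description", "creative")


@dataclass(frozen=True)
class InstructionPair:
    """One video instruction pair (illustrative schema, not the real one)."""

    video_id: str  # hypothetical identifier of the source video
    question: str
    answer: str
    category: str  # expected to be one of CATEGORIES

    def __post_init__(self):
        # Reject labels outside the three listed types.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")


def count_by_category(pairs):
    """Return the number of pairs per category, including empty categories."""
    counts = Counter(pair.category for pair in pairs)
    return {name: counts.get(name, 0) for name in CATEGORIES}


# Made-up records, used only to exercise the helper.
samples = [
    InstructionPair("vid_001", "Summarize the video.", "A person cooks pasta.", "summarization"),
    InstructionPair("vid_001", "What happens after the water boils?", "Pasta is added.", "description"),
    InstructionPair("vid_002", "Write a short poem about this scene.", "Waves roll in...", "creative"),
]

print(count_by_category(samples))
```

A fixed tuple of category names keeps the tally stable even when a category has no samples, which can make it easier to check how balanced a subset is.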

Variants: VideoInstruct

Associated Benchmarks

This dataset is used in 3 benchmarks:

Recent Benchmark Submissions

Task | Model | Paper | Date
Video-based Generative Performance Benchmarking | TS-LLaVA-34B | TS-LLaVA: Constructing Visual Tokens through … | 2024-11-17
Video-based Generative Performance Benchmarking (Correctness of Information) | TS-LLaVA-34B | TS-LLaVA: Constructing Visual Tokens through … | 2024-11-17
Video-based Generative Performance Benchmarking | PPLLaVA-7B-dpo | PPLLaVA: Varied Video Sequence Understanding … | 2024-11-04
Video-based Generative Performance Benchmarking (Correctness of Information) | PPLLaVA-7B | PPLLaVA: Varied Video Sequence Understanding … | 2024-11-04
Video-based Generative Performance Benchmarking | PPLLaVA-7B | PPLLaVA: Varied Video Sequence Understanding … | 2024-11-04
Video-based Generative Performance Benchmarking (Correctness of Information) | SlowFast-LLaVA-34B | SlowFast-LLaVA: A Strong Training-Free Baseline … | 2024-07-22
Video-based Generative Performance Benchmarking | SlowFast-LLaVA-34B | SlowFast-LLaVA: A Strong Training-Free Baseline … | 2024-07-22
Video-based Generative Performance Benchmarking | VideoGPT+ | VideoGPT+: Integrating Image and Video … | 2024-06-13
VCGBench-Diverse | VideoGPT+ | VideoGPT+: Integrating Image and Video … | 2024-06-13
Video-based Generative Performance Benchmarking (Correctness of Information) | VideoGPT+ | VideoGPT+: Integrating Image and Video … | 2024-06-13
Video-based Generative Performance Benchmarking (Correctness of Information) | PLLaVA-34B | PLLaVA : Parameter-free LLaVA Extension … | 2024-04-25
Video-based Generative Performance Benchmarking | PLLaVA-34B | PLLaVA : Parameter-free LLaVA Extension … | 2024-04-25
Video-based Generative Performance Benchmarking (Correctness of Information) | MiniGPT4-video-7B | MiniGPT4-Video: Advancing Multimodal LLMs for … | 2024-04-04
Video-based Generative Performance Benchmarking (Correctness of Information) | ST-LLM | ST-LLM: Large Language Models Are … | 2024-03-30
Video-based Generative Performance Benchmarking | ST-LLM-7B | ST-LLM: Large Language Models Are … | 2024-03-30
Video-based Generative Performance Benchmarking | IG-VLM-GPT4v | An Image Grid Can Be … | 2024-03-27
Video-based Generative Performance Benchmarking | LITA-13B | LITA: Language Instructed Temporal-Localization Assistant | 2024-03-27
Video-based Generative Performance Benchmarking | CAT-7B | CAT: Enhancing Multimodal Large Language … | 2024-03-07
Video-based Generative Performance Benchmarking | VLM-RLAIF | Tuning Large Multimodal Models for … | 2024-02-06
Video-based Generative Performance Benchmarking (Correctness of Information) | VTimeLLM | VTimeLLM: Empower LLM to Grasp … | 2023-11-30

Research Papers

Recent papers with results on this dataset: