VideoInstruct

Video-based Generative Performance Benchmarking (Correctness of Information) Benchmark

Performance Over Time

[Chart omitted] Showing 18 results. Metric: gpt-score

Top Performing Models

Rank | Model | Paper | gpt-score | Date | Code
1 | PPLLaVA-7B | PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance | 3.85 | 2024-11-04 | farewellthree/ppllava
2 | PLLaVA-34B | PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning | 3.60 | 2024-04-25 | magic-research/PLLaVA
3 | TS-LLaVA-34B | TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models | 3.55 | 2024-11-17 | tingyu215/ts-llava
4 | SlowFast-LLaVA-34B | SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models | 3.48 | 2024-07-22 | apple/ml-slowfast-llava
5 | VideoChat2_HD_mistral | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | 3.40 | 2023-11-28 | opengvlab/ask-anything, magic-research/PLLaVA, bytedance/tarsier
6 | VideoGPT+ | VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | 3.27 | 2024-06-13 | mbzuai-oryx/videogpt-plus
7 | ST-LLM | ST-LLM: Large Language Models Are Effective Temporal Learners | 3.23 | 2024-03-30 | TencentARC/ST-LLM
8 | MiniGPT4-video-7B | MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens | 3.08 | 2024-04-04 | Vision-CAIR/MiniGPT4-video, pwc-1/Paper-9
9 | VideoChat2 | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | 3.02 | 2023-11-28 | opengvlab/ask-anything, magic-research/PLLaVA, bytedance/tarsier
10 | Chat-UniVi | Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding | 2.89 | 2023-11-14 | pku-yuangroup/chat-univi, skyworkai/moh, skyworkai/moe-plus-plus, pku-yuangroup/video-bench

All Papers (18)