
VideoInstruct

VCGBench-Diverse Benchmark

Performance Over Time

📊 Showing 6 results | 📏 Metric: mean

Rank	Model	Paper	mean	Date	Code
1	VideoGPT+	VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding	2.47	2024-06-13	📦 mbzuai-oryx/videogpt-plus
2	Chat-UniVi	Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding	2.29	2023-11-14	📦 pku-yuangroup/chat-univi 📦 skyworkai/moh 📦 skyworkai/moe-plus-plus 📦 pku-yuangroup/video-bench
3	VideoChat2	MVBench: A Comprehensive Multi-modal Video Understanding Benchmark	2.20	2023-11-28	📦 opengvlab/ask-anything 📦 magic-research/PLLaVA 📦 bytedance/tarsier
4	BT-Adapter	BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning	2.19	2023-09-27	📦 farewellthree/BT-Adapter
5	VTimeLLM	VTimeLLM: Empower LLM to Grasp Video Moments	2.17	2023-11-30	📦 huangb23/vtimellm
6	Video-ChatGPT	Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models	2.08	2023-06-08	📦 mbzuai-oryx/video-chatgpt 📦 qiujihao19/artemis

