Showing 6 results | Metric: mean
Rank | Model | Paper | Mean | Date | Code |
---|---|---|---|---|---|
1 | VideoGPT+ | VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | 2.47 | 2024-06-13 | 📦 mbzuai-oryx/videogpt-plus |
2 | Chat-UniVi | Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding | 2.29 | 2023-11-14 | 📦 pku-yuangroup/chat-univi 📦 skyworkai/moh 📦 skyworkai/moe-plus-plus 📦 pku-yuangroup/video-bench |
3 | VideoChat2 | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | 2.20 | 2023-11-28 | 📦 opengvlab/ask-anything 📦 magic-research/PLLaVA 📦 bytedance/tarsier |
4 | BT-Adapter | BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning | 2.19 | 2023-09-27 | 📦 farewellthree/BT-Adapter |
5 | VTimeLLM | VTimeLLM: Empower LLM to Grasp Video Moments | 2.17 | 2023-11-30 | 📦 huangb23/vtimellm |
6 | Video-ChatGPT | Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models | 2.08 | 2023-06-08 | 📦 mbzuai-oryx/video-chatgpt 📦 qiujihao19/artemis |