Showing 6 results. Metric: Accuracy.
| Rank | Model | Paper | Accuracy (%) | Date | Code |
|---:|---|---|---:|---|---|
| 1 | Flash-VStream | Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams | 61.60 | 2024-06-12 | IVGSZ/Flash-VStream |
| 2 | Vista-LLaMA | Vista-LLaMA: Reliable Video Narrator via Equal Distance to Visual Tokens | 60.70 | 2023-12-12 | - |
| 3 | VideoChat | VideoChat: Chat-Centric Video Understanding | 56.60 | 2023-05-10 | opengvlab/ask-anything |
| 4 | MovieChat+ | MovieChat+: Question-aware Sparse Memory for Long Video Question Answering | 54.80 | 2024-04-26 | rese1f/MovieChat |
| 5 | Video-ChatGPT | Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models | 54.60 | 2023-06-08 | mbzuai-oryx/video-chatgpt, qiujihao19/artemis |
| 6 | MovieChat | MovieChat: From Dense Token to Sparse Memory for Long Video Understanding | 49.90 | 2023-07-31 | rese1f/MovieChat |