📊 Showing 6 results | 📏 Metric: Accuracy (Top-1)
Rank | Model | Paper | Accuracy (Top-1) | Date | Code |
---|---|---|---|---|---|
1 | Oryx (34B) | Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution | 71.40 | 2024-09-19 | 📦 oryx-mllm/oryx |
2 | BIMBA-LLaVA-Qwen2-7B | BIMBA: Selective-Scan Compression for Long-Range Video Question Answering | 68.51 | 2025-03-12 | 📦 md-mohaiminul/BIMBA |
3 | InternVideo2 (8B) | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | 63.40 | 2024-03-22 | 📦 opengvlab/internvideo 📦 opengvlab/internvideo2 |
4 | VideoLLaMA2 (72B) | VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | 57.50 | 2024-06-11 | 📦 damo-nlp-sg/videollama2 📦 damo-nlp-sg/videollama3 📦 damo-nlp-sg/inf-clip |
5 | TraveLER | TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering | 50.20 | 2024-04-01 | 📦 traveler-framework/traveler |
6 | Flamingo | Perception Test: A Diagnostic Benchmark for Multimodal Video Models | 0.46 | 2023-05-23 | 📦 deepmind/perception_test |