📊 Showing 7 results | 📏 Metric: AnswerExactMatch (Question Answering)
Rank | Model | Paper | AnswerExactMatch (Question Answering, %) | Date | Code |
---|---|---|---|---|---|
1 | CREMA | CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion | 54.60 | 2024-02-08 | 📦 Yui010206/CREMA |
2 | Situation3D | Situational Awareness Matters in 3D Vision Language Reasoning | 52.60 | 2024-06-11 | 📦 YunzeMan/Situation3D |
3 | Lexicon3D | Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding | 50.70 | 2024-09-05 | 📦 yunzeman/lexicon3d |
4 | LM4VisualEncoding | Frozen Transformers in Language Models Are Effective Visual Encoder Layers | 48.09 | 2023-10-19 | 📦 ziqipang/lm4visualencoding 📦 zhixinlai/llmboostmedical |
5 | ScanQA (w/ auxiliary loss) 📚 | SQA3D: Situated Question Answering in 3D Scenes | 47.20 | 2022-10-14 | 📦 SilongYong/SQA3D |
6 | ScanQA | SQA3D: Situated Question Answering in 3D Scenes | 46.58 | 2022-10-14 | 📦 SilongYong/SQA3D |
7 | MCAN | Deep Modular Co-Attention Networks for Visual Question Answering | 43.42 | 2019-06-25 | 📦 MILVLG/mcan-vqa 📦 apugoneappu/ask_me_anything 📦 apugoneappu/vqa_visualise |