The MATH-Vision (MATH-V) dataset is a meticulously curated collection of 3,040 high-quality mathematical problems with visual contexts, sourced from real math competitions. Spanning 16 distinct mathematical disciplines and graded across 5 levels of difficulty, the dataset provides a comprehensive and diverse set of challenges for evaluating the mathematical reasoning abilities of large multimodal models (LMMs).
Extensive experiments reveal a notable gap between current LMMs and human performance on MATH-V, underscoring the need for further advances in LMMs. Moreover, the detailed categorization by subject and difficulty supports a thorough error analysis of LMMs, offering valuable insights to guide future research and development.
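Because every problem carries a subject and a difficulty label, model predictions can be sliced along both axes for the kind of error analysis described above. The sketch below is a minimal illustration, not the paper's evaluation code: it assumes hypothetical prediction records with `subject`, `level`, and `is_correct` fields and aggregates accuracy per subject and per difficulty level.

```python
from collections import defaultdict

def accuracy_breakdown(records):
    """Aggregate accuracy by subject and by difficulty level.

    `records` is a list of dicts with hypothetical fields:
      subject (str), level (int, 1-5), is_correct (bool).
    """
    by_subject = defaultdict(lambda: [0, 0])  # subject -> [correct, total]
    by_level = defaultdict(lambda: [0, 0])    # level   -> [correct, total]
    for r in records:
        for bucket in (by_subject[r["subject"]], by_level[r["level"]]):
            bucket[0] += int(r["is_correct"])
            bucket[1] += 1
    ratios = lambda d: {k: c / t for k, (c, t) in sorted(d.items())}
    return ratios(by_subject), ratios(by_level)

# Toy usage with made-up records; a real run would score all 3,040 answers.
records = [
    {"subject": "analytic geometry", "level": 2, "is_correct": True},
    {"subject": "analytic geometry", "level": 4, "is_correct": False},
    {"subject": "topology", "level": 3, "is_correct": False},
]
per_subject, per_level = accuracy_breakdown(records)
print(per_subject)  # {'analytic geometry': 0.5, 'topology': 0.0}
print(per_level)    # {2: 1.0, 3: 0.0, 4: 0.0}
```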
Variants: MATH-V
This dataset is used in 1 benchmark:
| Task | Model | Paper | Date |
|---|---|---|---|
| Multimodal Reasoning | GPT4V | Measuring Multimodal Mathematical Reasoning with … | 2024-02-22 |
| Multimodal Reasoning | Gemini Pro | Measuring Multimodal Mathematical Reasoning with … | 2024-02-22 |
| Multimodal Reasoning | Qwen-VL-Max | Measuring Multimodal Mathematical Reasoning with … | 2024-02-22 |
| Multimodal Reasoning | InternLM-XComposer2-VL | Measuring Multimodal Mathematical Reasoning with … | 2024-02-22 |