MMR-Benchmark

Multi-Modal Reading Benchmark

Dataset Information

Modalities: Images, Texts
Languages: English
Introduced: 2024

Overview

The Multi-Modal Reading (MMR) benchmark comprises 550 annotated question-answer pairs spanning 11 distinct tasks that involve text, fonts, visual elements, bounding boxes, spatial relations, and grounding, each with carefully designed evaluation metrics.
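For orientation, below is a minimal sketch of how a benchmark with this structure might be loaded and inspected with the Hugging Face `datasets` library. The repository id "MMR-benchmark/MMR", the split name, and the "task" field are assumptions for illustration, not confirmed by this page; check the dataset homepage for the actual identifiers.

```python
# Minimal sketch: load a multimodal QA benchmark and count pairs per task.
# The repo id and field names below are hypothetical placeholders.
from collections import Counter

from datasets import load_dataset

# Hypothetical repo id and split; substitute the real ones from the homepage.
ds = load_dataset("MMR-benchmark/MMR", split="test")

# The page reports 550 QA pairs across 11 tasks; tally them per task,
# assuming each example carries a "task" field.
task_counts = Counter(example["task"] for example in ds)
for task, n in sorted(task_counts.items()):
    print(f"{task}: {n} QA pairs")
```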

Variants: MMR-Benchmark

Associated Benchmarks

This dataset is used in one benchmark, MMR total; recent submissions are listed below.

Recent Benchmark Submissions

Task      | Model          | Paper                                          | Date
MMR total | GPT-4o         | GPT-4o: Visual perception performance of …     | 2024-06-14
MMR total | Idefics-2-8B   | What matters when building vision-language …   | 2024-05-03
MMR total | Phi-3-Vision   | Phi-3 Technical Report: A Highly …             | 2024-04-22
MMR total | InternVL2-8B   | InternVL: Scaling up Vision Foundation …       | 2023-12-21
MMR total | InternVL2-1B   | InternVL: Scaling up Vision Foundation …       | 2023-12-21
MMR total | Monkey-Chat-7B | Monkey: Image Resolution and Text …            | 2023-11-11
MMR total | GPT-4V         | The Dawn of LMMs: Preliminary …                | 2023-09-29
MMR total | Qwen-vl-max    | Qwen-VL: A Versatile Vision-Language Model …   | 2023-08-24
MMR total | Qwen-vl-plus   | Qwen-VL: A Versatile Vision-Language Model …   | 2023-08-24
MMR total | Idefics-80B    | OBELICS: An Open Web-Scale Filtered …          | 2023-06-21
MMR total | LLaVA-1.5-13B  | Visual Instruction Tuning                      | 2023-04-17
MMR total | LLaVA-NEXT-13B | Visual Instruction Tuning                      | 2023-04-17
MMR total | LLaVA-NEXT-34B | Visual Instruction Tuning                      | 2023-04-17

Research Papers

Recent papers with results on this dataset: