Multi-Modal Reading Benchmark
The Multi-Modal Reading (MMR) Benchmark includes 550 annotated question-answer pairs across 11 distinct tasks involving text, fonts, visual elements, bounding boxes, spatial relations, and grounding, with carefully designed evaluation metrics.
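The benchmark's exact schema is not reproduced here, but a minimal sketch can illustrate what an annotated question-answer pair and a simple scoring check might look like. The field names (`task`, `question`, `answer`, `bbox`) and the exact-match check below are assumptions for illustration, not the dataset's actual format or its official metrics.

```python
# Hypothetical sketch of a single MMR-style annotation and a toy scoring check.
# All field names and the exact-match metric are assumptions for illustration.
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class MMRExample:
    task: str                                           # e.g. "text recognition", "grounding"
    question: str
    answer: str
    bbox: Optional[Tuple[int, int, int, int]] = None    # present for grounding-style tasks


def exact_match(prediction: str, example: MMRExample) -> bool:
    """Case- and whitespace-insensitive match against the reference answer."""
    return prediction.strip().lower() == example.answer.strip().lower()


if __name__ == "__main__":
    ex = MMRExample(
        task="text recognition",
        question="What word appears in the highlighted region?",
        answer="Benchmark",
        bbox=(120, 48, 310, 96),
    )
    print(exact_match("benchmark", ex))  # True
```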
Variants: MMR-Benchmark
This dataset is used in 1 benchmark:
Task | Model | Paper | Date |
---|---|---|---|
MMR total | GPT-4o | GPT-4o: Visual perception performance of … | 2024-06-14 |
MMR total | Idefics-2-8B | What matters when building vision-language … | 2024-05-03 |
MMR total | Phi-3-Vision | Phi-3 Technical Report: A Highly … | 2024-04-22 |
MMR total | InternVL2-8B | InternVL: Scaling up Vision Foundation … | 2023-12-21 |
MMR total | InternVL2-1B | InternVL: Scaling up Vision Foundation … | 2023-12-21 |
MMR total | Monkey-Chat-7B | Monkey: Image Resolution and Text … | 2023-11-11 |
MMR total | GPT-4V | The Dawn of LMMs: Preliminary … | 2023-09-29 |
MMR total | Qwen-vl-max | Qwen-VL: A Versatile Vision-Language Model … | 2023-08-24 |
MMR total | Qwen-vl-plus | Qwen-VL: A Versatile Vision-Language Model … | 2023-08-24 |
MMR total | Idefics-80B | OBELICS: An Open Web-Scale Filtered … | 2023-06-21 |
MMR total | LLaVA-1.5-13B | Visual Instruction Tuning | 2023-04-17 |
MMR total | LLaVA-NEXT-13B | Visual Instruction Tuning | 2023-04-17 |
MMR total | LLaVA-NEXT-34B | Visual Instruction Tuning | 2023-04-17 |
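The leaderboard above reports a single "MMR total" figure per model. As a rough sketch, assuming the total is a plain aggregate of the 11 per-task scores (the benchmark may weight or combine tasks differently), the computation could look like this:

```python
# Minimal sketch of turning per-task scores into one leaderboard number.
# Assumes "MMR total" is a simple sum of per-task scores; this aggregation
# rule is an assumption, not the benchmark's documented procedure.
from typing import Dict


def mmr_total(per_task_scores: Dict[str, float]) -> float:
    """Aggregate per-task scores into a single overall figure (assumed to be a sum)."""
    return sum(per_task_scores.values())


if __name__ == "__main__":
    # Toy per-task scores for three of the 11 tasks, values invented for the example.
    scores = {"text_recognition": 42.0, "font_recognition": 37.5, "grounding": 51.0}
    print(mmr_total(scores))  # 130.5
```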