MMNeedle

Multimodal Needle in a Haystack

Dataset Information
Modalities: Images, Texts
Languages: English
Introduced: 2024
License: CC BY 4.0

Overview

We introduce the MultiModal Needle-in-a-haystack (MMNeedle) benchmark, designed to assess the long-context capabilities of multimodal large language models (MLLMs). In addition to multi-image input, we use image stitching to further increase the input context length, and we develop a protocol that automatically generates labels for sub-image-level retrieval. MMNeedle stress-tests an MLLM's ability to locate a target sub-image (the needle) within a set of images (the haystack), given textual instructions and descriptions of image contents. The task requires understanding extensive visual context and retrieving information effectively from long-context image inputs.
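The stitching and automatic labeling described above can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual code: the function names `stitch_grid` and `build_haystack`, the toy nested-list image representation, and the (image index, row, column) label format are all assumptions made for the example.

```python
import random
from typing import List, Tuple

# Assumption: a toy grayscale image is a list of pixel rows.
Image = List[List[int]]


def stitch_grid(tiles: List[List[Image]]) -> Image:
    """Stitch a grid of equally sized tiles into one larger image."""
    stitched: Image = []
    for tile_row in tiles:
        tile_height = len(tile_row[0])
        for y in range(tile_height):
            row: List[int] = []
            for tile in tile_row:
                row.extend(tile[y])  # concatenate tiles left to right
            stitched.append(row)
    return stitched


def build_haystack(
    sub_images: List[Image],
    num_images: int,
    grid: int,
    needle: Image,
    seed: int = 0,
) -> Tuple[List[Image], Tuple[int, int, int]]:
    """Hide `needle` among distractor tiles and return (haystack, label).

    The label (image index, row, column) is known by construction,
    so no manual annotation is needed. The label format is hypothetical.
    """
    rng = random.Random(seed)
    needed = num_images * grid * grid - 1
    distractors = iter(rng.sample(sub_images, needed))
    label = (rng.randrange(num_images), rng.randrange(grid), rng.randrange(grid))
    haystack = []
    for m in range(num_images):
        tiles = [
            [needle if (m, r, c) == label else next(distractors) for c in range(grid)]
            for r in range(grid)
        ]
        haystack.append(stitch_grid(tiles))
    return haystack, label


# Toy usage: ten 2x2 distractor tiles, one needle tile filled with 999.
pool = [[[v, v], [v, v]] for v in range(10)]
needle = [[999, 999], [999, 999]]
haystack, label = build_haystack(pool, num_images=2, grid=2, needle=needle)
```

Because the needle position is chosen by the generator, a model's answer can be scored by exact match against `label`. A real pipeline would stitch actual image files with an imaging library rather than nested lists.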

Variants: MMNeedle

Associated Benchmarks

This dataset is used in 2 benchmarks:

  • Hallucination
  • Long-Context Understanding

Recent Benchmark Submissions

Task                        Model                     Paper                                                           Date
Long-Context Understanding  IDEFICS2-8B               What matters when building vision-language …                    2024-05-03
Long-Context Understanding  LLaVA-Llama-3             LLaVA-UHD: an LMM Perceiving Any …                              2024-03-18
Long-Context Understanding  Gemini Pro 1.5            Gemini 1.5: Unlocking multimodal understanding …                2024-03-08
Long-Context Understanding  Gemini Pro 1.0            Gemini: A Family of Highly …                                    2023-12-19
Long-Context Understanding  mPLUG-Owl-v2              mPLUG-Owl2: Revolutionizing Multi-modal Large Language …        2023-11-07
Long-Context Understanding  CogVLM-17B                CogVLM: Visual Expert for Pretrained …                          2023-11-06
Long-Context Understanding  CogVLM2-Llama-3           CogVLM: Visual Expert for Pretrained …                          2023-11-06
Long-Context Understanding  InstructBLIP-Flan-T5-XXL  InstructBLIP: Towards General-purpose Vision-Language Models …  2023-05-11
Long-Context Understanding  InstructBLIP-Vicuna-13B   InstructBLIP: Towards General-purpose Vision-Language Models …  2023-05-11
Long-Context Understanding  GPT-4o                    GPT-4 Technical Report                                          2023-03-15
Long-Context Understanding  GPT-4V                    GPT-4 Technical Report                                          2023-03-15

Research Papers

Recent papers with results on this dataset: