MMNeedle

Name: MMNeedle
Published: 2024-06-17
License: CC BY 4.0

Multimodal Needle in a Haystack

Dataset Information

Modalities

Images, Texts

Languages

English

Introduced

2024

License

CC BY 4.0

Homepage

Official Website

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

We introduce the MultiModal Needle-in-a-haystack (MMNeedle) benchmark, designed to assess the long-context capabilities of multimodal large language models (MLLMs). Besides multi-image input, we employ image stitching to further increase the input context length, and develop a protocol to automatically generate labels for sub-image-level retrieval. MMNeedle stress-tests an MLLM's ability to locate a target sub-image (needle) within a set of images (haystack) based on textual instructions and descriptions of image contents. Succeeding at this task requires understanding extensive visual contexts and retrieving information effectively from long-context image inputs.
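The stitching and automatic labeling described above can be illustrated with a minimal sketch. It is not the benchmark's actual code: the function names `stitch_grid` and `build_haystack`, the use of nested lists as stand-ins for images, the zero-based (image, row, column) label, and the uniformly random needle position are all assumptions made for illustration.

```python
import random


def stitch_grid(sub_images, n):
    """Stitch n*n equally sized sub-images into one image, in row-major order.

    Each sub-image is a 2-D list of pixel values (a stand-in for a real image).
    """
    if len(sub_images) != n * n:
        raise ValueError("expected exactly n*n sub-images")
    height = len(sub_images[0])
    stitched = []
    for grid_row in range(n):
        tiles = sub_images[grid_row * n:(grid_row + 1) * n]
        for y in range(height):
            row = []
            for tile in tiles:
                row.extend(tile[y])  # concatenate tiles side by side
            stitched.append(row)
    return stitched


def build_haystack(distractors, num_images, n, needle, rng):
    """Place the needle among distractors and return (haystack, label).

    The label is a hypothetical zero-based (image_index, row, col) triple,
    derived directly from the position where the needle was inserted.
    """
    per_image = n * n
    total = num_images * per_image
    if len(distractors) < total - 1:
        raise ValueError("not enough distractor sub-images")
    tiles = list(distractors[:total - 1])
    position = rng.randrange(total)  # assumed: uniform random placement
    tiles.insert(position, needle)
    haystack = [
        stitch_grid(tiles[i * per_image:(i + 1) * per_image], n)
        for i in range(num_images)
    ]
    image_index, cell = divmod(position, per_image)
    row, col = divmod(cell, n)
    return haystack, (image_index, row, col)
```

Because the label is computed from the insertion position, no manual annotation is needed; the same idea would carry over to real image arrays with any imaging library.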

Variants: MMNeedle

Associated Benchmarks

This dataset is used in 2 benchmarks:

Hallucination
Long-Context Understanding - Metrics: 1 Image, 4*4 Stitching, Exact Accuracy; 1 Image, 8*8 Stitching, Exact Accuracy; 1 Image, 2*2 Stitching, Exact Accuracy; 10 Images, 1*1 Stitching, Exact Accuracy; 10 Images, 2*2 Stitching, Exact Accuracy; 10 Images, 4*4 Stitching, Exact Accuracy; 10 Images, 8*8 Stitching, Exact Accuracy
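As a rough illustration of how an exact-accuracy score for one setting (for example, 10 Images with 4*4 Stitching) might be computed, the sketch below counts a prediction as correct only when every component of the predicted location matches the label. This interpretation of "Exact Accuracy", the (image_index, row, col) tuple format, and the function name `exact_accuracy` are assumptions, not details confirmed by this page.

```python
def exact_accuracy(predictions, labels):
    """Fraction of samples whose predicted location matches the label exactly.

    Each item is a hypothetical (image_index, row, col) tuple; a partial
    match (e.g. right image, wrong cell) counts as incorrect.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(1 for pred, true in zip(predictions, labels) if pred == true)
    return correct / len(labels)
```

For instance, `exact_accuracy([(0, 1, 1), (1, 0, 0)], [(0, 1, 1), (1, 0, 1)])` scores one of two samples as correct.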

Recent Benchmark Submissions

Task	Model	Paper	Date
Long-Context Understanding	IDEFICS2-8B	What matters when building vision-language …	2024-05-03
Long-Context Understanding	LLaVA-Llama-3	LLaVA-UHD: an LMM Perceiving Any …	2024-03-18
Long-Context Understanding	Gemini Pro 1.5	Gemini 1.5: Unlocking multimodal understanding …	2024-03-08
Long-Context Understanding	Gemini Pro 1.0	Gemini: A Family of Highly …	2023-12-19
Long-Context Understanding	mPLUG-Owl-v2	mPLUG-Owl2: Revolutionizing Multi-modal Large Language …	2023-11-07
Long-Context Understanding	CogVLM-17B	CogVLM: Visual Expert for Pretrained …	2023-11-06
Long-Context Understanding	CogVLM2-Llama-3	CogVLM: Visual Expert for Pretrained …	2023-11-06
Long-Context Understanding	InstructBLIP-Flan-T5-XXL	InstructBLIP: Towards General-purpose Vision-Language Models …	2023-05-11
Long-Context Understanding	InstructBLIP-Vicuna-13B	InstructBLIP: Towards General-purpose Vision-Language Models …	2023-05-11
Long-Context Understanding	GPT-4o	GPT-4 Technical Report	2023-03-15
Long-Context Understanding	GPT-4V	GPT-4 Technical Report	2023-03-15

Research Papers

Recent papers with results on this dataset:
