MMNeedle

Long-Context Understanding Benchmark

Metric: 1 Image, 4×4 Stitching, Exact Accuracy (%). Results reported: 11.
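
As a rough illustration of what this metric measures, the sketch below scores predictions in the single-image, 4×4 setting. It assumes, based on our reading of the MMNeedle setup rather than anything stated on this page, that each haystack image is a 4×4 grid of stitched sub-images and that a prediction counts as exact only when the image index, row, and column all match the needle's true position. It also assumes every sample contains a needle. `Location`, `cell_to_location`, and `exact_accuracy` are hypothetical helpers, not part of any official MMNeedle code.

```python
from dataclasses import dataclass
from typing import Sequence

GRID = 4  # assumed 4x4 stitching: each haystack image is a 4x4 grid of sub-images


@dataclass(frozen=True)
class Location:
    """Hypothetical needle position: stitched image index, then grid row and column."""
    image: int  # always 0 in the 1-image setting
    row: int    # 0-based row within the grid
    col: int    # 0-based column within the grid


def cell_to_location(image: int, cell: int, grid: int = GRID) -> Location:
    """Convert a flat, row-major sub-image index (0 .. grid*grid - 1) to a Location."""
    if not 0 <= cell < grid * grid:
        raise ValueError(f"cell {cell} is outside a {grid}x{grid} grid")
    return Location(image=image, row=cell // grid, col=cell % grid)


def exact_accuracy(preds: Sequence[Location], targets: Sequence[Location]) -> float:
    """Percentage of samples whose predicted image, row, and column all match the target."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not targets:
        raise ValueError("at least one sample is required")
    # dataclasses compare field by field, so == is an exact-position check
    hits = sum(p == t for p, t in zip(preds, targets))
    return 100.0 * hits / len(targets)


if __name__ == "__main__":
    targets = [cell_to_location(0, 5), cell_to_location(0, 10)]  # (row 1, col 1), (row 2, col 2)
    preds = [Location(0, 1, 1), Location(0, 2, 3)]               # second is off by one column
    print(exact_accuracy(preds, targets))  # 50.0
```

Under this assumed definition, a prediction with the right row but the wrong column gets no credit at all.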

Top Performing Models

Rank	Model	Paper	1 Image, 4×4 Stitching, Exact Accuracy (%)	Paper date	Code
1	GPT-4o	GPT-4 Technical Report	94.60	2023-03-15	openai/evals, shmsw25/factscore, unispac/visual-adversarial-examples-jailbreak-large-language-models
2	Gemini Pro 1.5	Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context	90.34	2024-03-08	dlvuldet/primevul
3	GPT-4V	GPT-4 Technical Report	86.09	2023-03-15	openai/evals, shmsw25/factscore, unispac/visual-adversarial-examples-jailbreak-large-language-models
4	LLaVA-Llama-3	LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images	43.80	2024-03-18	thunlp/llava-uhd
5	Gemini Pro 1.0	Gemini: A Family of Highly Capable Multimodal Models	29.53	2023-12-19	valdecy/pybibx
6	IDEFICS2-8B	What matters when building vision-language models?	18.90	2024-05-03	-
7	CogVLM2-Llama-3	CogVLM: Visual Expert for Pretrained Language Models	7.30	2023-11-06	thudm/cogvlm, THUDM/CogAgent, 2024-MindSpore-1/Code2, MS-P3/code5
8	InstructBLIP-Flan-T5-XXL	InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning	3.80	2023-05-11	salesforce/lavis, tabtoyou/kollava, pwc-1/Paper-9, MS-P3/code3
9	mPLUG-Owl-v2	mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration	1.90	2023-11-07	x-plug/mplug-owl, X-PLUG/mPLUG-Owl
10	CogVLM-17B	CogVLM: Visual Expert for Pretrained Language Models	0.00	2023-11-06	thudm/cogvlm, THUDM/CogAgent, 2024-MindSpore-1/Code2, MS-P3/code5
11	InstructBLIP-Vicuna-13B	InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning	0.00	2023-05-11	salesforce/lavis, tabtoyou/kollava