ViP-Bench

Making Large Multimodal Models Understand Arbitrary Visual Prompts

Dataset Information
Modalities: Images, Texts, Interactive
Languages: English
Introduced: 2023

Overview

ViP-Bench is a benchmark for assessing how well large multimodal models understand visual prompts at the region level. It covers six capabilities: recognition, OCR, knowledge, math, relationship reasoning, and language generation. The benchmark contains 303 image-question pairs and is intended as a foundation for future research into multimodal models that accept arbitrary visual prompts.

Variants: ViP-Bench

Associated Benchmarks

This dataset is used in 1 benchmark: Visual Question Answering.

Recent Benchmark Submissions

Task | Model | Paper | Date
Visual Question Answering | LLaVA-NeXT-Inst-IT-Vicuna-7B (Visual Prompt) | Inst-IT: Boosting Multimodal Instance Understanding … | 2024-12-04
Visual Question Answering | LLaVA-NeXT-Inst-IT-Qwen2-7B (Visual Prompt) | Inst-IT: Boosting Multimodal Instance Understanding … | 2024-12-04
Visual Question Answering | ViP-LLaVA-13B (Visual Prompt) | Making Large Language Models Better … | 2023-10-31
Visual Question Answering | LLaVA-1.5-13B (Visual Prompt) | Improved Baselines with Visual Instruction … | 2023-10-05
Visual Question Answering | LLaVA-1.5-13B (Coordinates) | Improved Baselines with Visual Instruction … | 2023-10-05
Visual Question Answering | Qwen-VL-Chat (Visual Prompt) | Qwen-VL: A Versatile Vision-Language Model … | 2023-08-24
Visual Question Answering | Qwen-VL-Chat (Coordinates) | Qwen-VL: A Versatile Vision-Language Model … | 2023-08-24
Visual Question Answering | GPT4ROI 7B (ROI) | GPT4RoI: Instruction Tuning Large Language … | 2023-07-07
Visual Question Answering | Shikra-7B (Coordinates) | Shikra: Unleashing Multimodal LLM's Referential … | 2023-06-27
Visual Question Answering | Kosmos-2 (Discrete Token) | Kosmos-2: Grounding Multimodal Large Language … | 2023-06-26
Visual Question Answering | InstructBLIP-13B (Visual Prompt) | InstructBLIP: Towards General-purpose Vision-Language Models … | 2023-05-11
Visual Question Answering | GPT-4V-turbo-detail:low (Visual Prompt) | GPT-4 Technical Report | 2023-03-15
Visual Question Answering | GPT-4V-turbo-detail:high (Visual Prompt) | GPT-4 Technical Report | 2023-03-15

Research Papers

Recent papers with results on this dataset: