InfoSeek

Visual Information Seeking

Dataset Information

Modalities: Images, Text
Languages: English
Introduced: 2023

Overview

In this project, we introduce InfoSeek, a visual question answering dataset tailored to information-seeking questions that cannot be answered with common sense knowledge alone. Using InfoSeek, we analyze various pre-trained visual question answering models and gain insights into their characteristics. Our findings reveal that state-of-the-art pre-trained multi-modal models (e.g., PaLI-X, BLIP2) struggle to answer visual information-seeking questions, but that fine-tuning on InfoSeek enables them to draw on fine-grained knowledge acquired during pre-training.
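
As a concrete illustration of the zero-shot setting described above, the sketch below queries the public BLIP-2 checkpoint with an information-seeking question via Hugging Face transformers. The checkpoint ID is the released Salesforce model; the image URL and question are hypothetical placeholders, and this is a minimal sketch rather than the paper's evaluation pipeline.

```python
# Minimal zero-shot VQA sketch in the spirit of the InfoSeek evaluation,
# using the public BLIP-2 checkpoint from Hugging Face transformers.
# The image URL and question are illustrative placeholders, not items
# from the InfoSeek release.
import requests
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b"
).to(device)

# An information-seeking question asks for fine-grained knowledge about
# the entity in the image, not something readable off the pixels alone.
image = Image.open(
    requests.get("https://example.com/landmark.jpg", stream=True).raw
)
prompt = "Question: In which year was this building completed? Answer:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(device)
output_ids = model.generate(**inputs, max_new_tokens=20)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```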

Variants: InfoSeek

Associated Benchmarks

This dataset is used in 2 benchmarks: Visual Question Answering (VQA) and Retrieval.

Recent Benchmark Submissions

Task | Model | Paper | Date
Visual Question Answering (VQA) | RA-VQAv2 w/ PreFLMR | PreFLMR: Scaling Up Fine-Grained Late-Interaction … | 2024-02-13
Retrieval | PreFLMR | PreFLMR: Scaling Up Fine-Grained Late-Interaction … | 2024-02-13
Visual Question Answering (VQA) | PaLI-X | PaLI-X: On Scaling up a … | 2023-05-29
Visual Question Answering (VQA) | CLIP + PaLM (540B) | Can Pre-trained Vision and Language … | 2023-02-23
Visual Question Answering (VQA) | CLIP + FiD | Can Pre-trained Vision and Language … | 2023-02-23
Visual Question Answering (VQA) | PaLI | Can Pre-trained Vision and Language … | 2023-02-23
Visual Question Answering (VQA) | BLIP2 | BLIP-2: Bootstrapping Language-Image Pre-training with … | 2023-01-30

Research Papers

Recent papers with results on this dataset: