A multi-modal multi-task VQA dataset for remote sensing
Earth vision research typically focuses on extracting the locations and categories of geospatial objects but neglects the relations between objects and comprehensive reasoning. Motivated by city planning needs, we develop a multi-modal multi-task VQA dataset (EarthVQA) to advance relational reasoning-based judging, counting, and comprehensive analysis. The EarthVQA dataset contains 6,000 images, corresponding semantic masks, and 208,593 QA pairs that embed urban and rural governance requirements.
Characteristics:
Multi-level annotations: The paired image-mask-QA annotations support relational reasoning-based remote sensing visual question answering (see the loading sketch below).
Applicable QA pairs: All QA pairs are designed according to actual city planning needs.
Variants: EarthVQA
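To make the image-mask-QA pairing concrete, below is a minimal Python/PyTorch loading sketch. The directory layout, file names, and `qa.json` schema are assumptions for illustration only; the actual EarthVQA release may organize its files differently.

```python
import json
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset


class EarthVQADataset(Dataset):
    """Minimal sketch of an image-mask-QA loader.

    Assumed (hypothetical) layout:
        root/images/<id>.png   RGB remote sensing image
        root/masks/<id>.png    semantic mask (one class label per pixel)
        root/qa.json           {"<id>": [{"question": ..., "answer": ...}, ...]}
    """

    def __init__(self, root):
        self.root = Path(root)
        with open(self.root / "qa.json") as f:
            qa = json.load(f)
        # Flatten so each sample is one (image_id, question, answer) triple;
        # an image with several QA pairs yields several samples.
        self.samples = [
            (image_id, pair["question"], pair["answer"])
            for image_id, pairs in qa.items()
            for pair in pairs
        ]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        image_id, question, answer = self.samples[idx]
        image = Image.open(self.root / "images" / f"{image_id}.png").convert("RGB")
        mask = Image.open(self.root / "masks" / f"{image_id}.png")
        return {"image": image, "mask": mask, "question": question, "answer": answer}
```

Returning the semantic mask alongside each QA pair reflects the dataset's intent: the mask provides object-level grounding that a relational reasoning model can attend to when answering.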
This dataset is used in 1 benchmark:
| Task | Model | Paper | Date |
|---|---|---|---|
| Visual Question Answering | SOBA | EarthVQA: Towards Queryable Earth via … | 2023-12-19 |