RSICD

Name: RSICD
License: Unknown

Remote Sensing Image Captioning Dataset

Dataset Information

Modalities

Images

License

Unknown

Homepage

Official Website

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

The Remote Sensing Image Captioning Dataset (RSICD) is a dataset for the remote sensing image captioning task. It contains more than ten thousand remote sensing images collected from Google Earth, Baidu Map, MapABC, and Tianditu. All images are fixed at 224×224 pixels, though their resolutions vary. In total there are 10,921 images, each described by five sentences.

Source: https://github.com/201528014227051/RSICD_optimal
Image Source: https://github.com/201528014227051/RSICD_optimal

Variants: RSICD
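
As a rough illustration of the figures quoted in the overview, the sketch below models one captioned image and checks it against the stated image size and caption count. The `CaptionedImage` type, its field names, and the `problems` helper are hypothetical and chosen only for this example; they do not reflect the dataset's actual file layout or annotation schema.

```python
from __future__ import annotations

from dataclasses import dataclass

# Figures taken from the overview text.
IMAGE_SIZE = (224, 224)     # width, height in pixels
CAPTIONS_PER_IMAGE = 5      # sentences per image
NUM_IMAGES = 10921          # total number of images


@dataclass(frozen=True)
class CaptionedImage:
    """Hypothetical record type; NOT the dataset's real schema."""
    image_id: str
    size: tuple[int, int]
    captions: tuple[str, ...]

    def problems(self) -> list[str]:
        """Return a list of mismatches against the overview's stated figures."""
        issues: list[str] = []
        if self.size != IMAGE_SIZE:
            issues.append(f"{self.image_id}: size {self.size} != {IMAGE_SIZE}")
        if len(self.captions) != CAPTIONS_PER_IMAGE:
            issues.append(
                f"{self.image_id}: {len(self.captions)} captions, "
                f"expected {CAPTIONS_PER_IMAGE}"
            )
        return issues


def expected_caption_count(num_images: int = NUM_IMAGES) -> int:
    """Total sentences implied by a fixed number of captions per image."""
    return num_images * CAPTIONS_PER_IMAGE
```

With the overview's numbers, `expected_caption_count()` gives 10,921 × 5 = 54,605 sentences in total.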

Associated Benchmarks

This dataset is used in 2 benchmarks:

Cross-Modal Retrieval - Metrics: Mean Recall, Image-to-Text R@1, Text-to-Image R@1
Image-to-Text Retrieval - Metrics: Image-to-Text R@1
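
The recall metrics listed above can be sketched as follows. This is a minimal illustration, not any benchmark's official evaluation script: the function names, the similarity-matrix input, and the ground-truth format are assumptions made for the example. It also assumes that Mean Recall is the average of R@K over both retrieval directions with K in {1, 5, 10}, a common convention in retrieval papers that the page itself does not spell out.

```python
import numpy as np


def recall_at_k(similarity: np.ndarray, ground_truth: list[set[int]], k: int) -> float:
    """Percentage of queries with at least one correct candidate in the top k.

    similarity:   (n_queries, n_candidates) score matrix, higher is better.
    ground_truth: ground_truth[i] is the set of correct candidate indices for query i.
    """
    # Sort candidates by descending score; a stable sort keeps ties deterministic.
    top_k = np.argsort(-similarity, axis=1, kind="stable")[:, :k]
    hits = [bool(set(row.tolist()) & ground_truth[i]) for i, row in enumerate(top_k)]
    return 100.0 * float(np.mean(hits))


def mean_recall(
    sim_image_text: np.ndarray,
    captions_of_image: list[set[int]],
    image_of_caption: list[set[int]],
    ks: tuple[int, ...] = (1, 5, 10),  # assumed convention, see lead-in
) -> float:
    """Average of R@K over image-to-text and text-to-image retrieval."""
    scores = []
    for k in ks:
        scores.append(recall_at_k(sim_image_text, captions_of_image, k))
        scores.append(recall_at_k(sim_image_text.T, image_of_caption, k))
    return float(np.mean(scores))


# Toy example: 2 images with 2 captions each (the dataset itself has 5 per image).
sim = np.array([[0.9, 0.8, 0.1, 0.2],
                [0.3, 0.7, 0.6, 0.5]])
captions_of_image = [{0, 1}, {2, 3}]
image_of_caption = [{0}, {0}, {1}, {1}]

i2t_r1 = recall_at_k(sim, captions_of_image, k=1)
t2i_r1 = recall_at_k(sim.T, image_of_caption, k=1)
```

Because each image has several valid captions, an image-to-text query counts as a hit when any one of its captions appears in the top K, whereas a text-to-image query has exactly one correct image.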

Recent Benchmark Submissions

Task	Model	Paper	Date
Cross-Modal Retrieval	HarMA (w/ GeoRSCLIP)	Efficient Remote Sensing with Harmonized …	2024-04-28
Cross-Modal Retrieval	DOVE	Direction-Oriented Visual-semantic Embedding Model for …	2023-10-12
Cross-Modal Retrieval	PE-RSITR (MRS-Adapter)	Parameter-Efficient Transfer Learning for Remote …	2023-08-24
Cross-Modal Retrieval	GeoRSCLIP-FT	RS5M and GeoRSCLIP: A Large …	2023-06-20
Image-to-Text Retrieval	GeoRSCLIP-FT	RS5M and GeoRSCLIP: A Large …	2023-06-20
Cross-Modal Retrieval	RemoteCLIP	RemoteCLIP: A Vision Language Foundation …	2023-06-19
Cross-Modal Retrieval	GaLR	Remote Sensing Cross-Modal Text-Image Retrieval …	2022-04-21
Cross-Modal Retrieval	AMFMN	Exploring a Fine-Grained Multiscale Method …	2022-04-21

Research Papers

Recent papers with results on this dataset:
