RSICD

Remote Sensing Image Captioning Dataset

Dataset Information
Modalities: Images
License: Unknown

Overview

The Remote Sensing Image Captioning Dataset (RSICD) is a dataset for the remote sensing image captioning task. It contains more than ten thousand remote sensing images collected from Google Earth, Baidu Map, MapABC, and Tianditu. The images are fixed at 224×224 pixels, with various spatial resolutions. The dataset contains 10,921 remote sensing images in total, each with five sentence descriptions.
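The properties stated above (10,921 images, 224×224 pixels, five descriptions per image) can be turned into a small sanity check for anyone loading the data. The following is a minimal sketch only: the `CaptionedImage` record, its field names, and the `validate` helper are hypothetical and do not reflect the dataset's actual file schema, which this page does not describe.

```python
from dataclasses import dataclass
from typing import List, Tuple

NUM_IMAGES = 10921        # total image count stated in the overview
CAPTIONS_PER_IMAGE = 5    # five sentence descriptions per image
IMAGE_SIZE = (224, 224)   # fixed pixel dimensions stated in the overview


@dataclass
class CaptionedImage:
    """Hypothetical in-memory record, not the dataset's real schema."""
    filename: str
    size: Tuple[int, int]   # (width, height) in pixels
    captions: List[str]


def validate(record: CaptionedImage) -> None:
    """Raise ValueError if a record does not match the stated properties."""
    if record.size != IMAGE_SIZE:
        raise ValueError(
            f"{record.filename}: expected size {IMAGE_SIZE}, got {record.size}"
        )
    if len(record.captions) != CAPTIONS_PER_IMAGE:
        raise ValueError(
            f"{record.filename}: expected {CAPTIONS_PER_IMAGE} captions, "
            f"got {len(record.captions)}"
        )


def total_captions(num_images: int = NUM_IMAGES) -> int:
    """Number of descriptions implied by the stated counts."""
    return num_images * CAPTIONS_PER_IMAGE
```

Under these stated counts, `total_captions()` gives 10,921 × 5 = 54,605 descriptions. Running `validate` over every loaded record is a cheap way to catch a partial download or a mismatched annotation file before training.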

Source: https://github.com/201528014227051/RSICD_optimal
Image Source: https://github.com/201528014227051/RSICD_optimal

Variants: RSICD

Associated Benchmarks

This dataset is used in 2 benchmarks:

Recent Benchmark Submissions

Task                    | Model                  | Paper                                                    | Date
Cross-Modal Retrieval   | HarMA (w/ GeoRSCLIP)   | Efficient Remote Sensing with Harmonized …               | 2024-04-28
Cross-Modal Retrieval   | DOVE                   | Direction-Oriented Visual-semantic Embedding Model for … | 2023-10-12
Cross-Modal Retrieval   | PE-RSITR (MRS-Adapter) | Parameter-Efficient Transfer Learning for Remote …       | 2023-08-24
Cross-Modal Retrieval   | GeoRSCLIP-FT           | RS5M and GeoRSCLIP: A Large …                            | 2023-06-20
Image-to-Text Retrieval | GeoRSCLIP-FT           | RS5M and GeoRSCLIP: A Large …                            | 2023-06-20
Cross-Modal Retrieval   | RemoteCLIP             | RemoteCLIP: A Vision Language Foundation …               | 2023-06-19
Cross-Modal Retrieval   | GaLR                   | Remote Sensing Cross-Modal Text-Image Retrieval …        | 2022-04-21
Cross-Modal Retrieval   | AMFMN                  | Exploring a Fine-Grained Multiscale Method …             | 2022-04-21

Research Papers

Recent papers with results on this dataset: