Remote Sensing Image Captioning Dataset
The Remote Sensing Image Captioning Dataset (RSICD) is a dataset for the remote sensing image captioning task. It contains more than ten thousand remote sensing images collected from Google Earth, Baidu Map, MapABC, and Tianditu. The images are fixed at 224×224 pixels but vary in spatial resolution. The dataset totals 10,921 images, each with five sentence descriptions.
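The figures in the description can be turned into a small sanity check. The sketch below is only illustrative: the `CaptionedImage` record, its field names, and the helper functions are hypothetical and do not reflect the dataset's actual file layout or any official loader. It assumes each image carries exactly five captions and measures 224×224 pixels, as stated above.

```python
from dataclasses import dataclass
from typing import List

# Figures taken from the dataset description.
NUM_IMAGES = 10921
CAPTIONS_PER_IMAGE = 5
IMAGE_SIZE = (224, 224)  # (width, height) in pixels


@dataclass
class CaptionedImage:
    """Hypothetical record: one image with its sentence descriptions."""
    filename: str
    width: int
    height: int
    captions: List[str]


def is_valid(record: CaptionedImage) -> bool:
    """Check a record against the size and caption count in the description."""
    return (
        (record.width, record.height) == IMAGE_SIZE
        and len(record.captions) == CAPTIONS_PER_IMAGE
        and all(caption.strip() for caption in record.captions)
    )


def total_captions(num_images: int = NUM_IMAGES) -> int:
    """Number of sentences implied by a fixed caption count per image."""
    return num_images * CAPTIONS_PER_IMAGE
```

With these assumptions, `total_captions()` gives 10921 × 5 = 54605 sentences in total, and `is_valid` rejects any record whose size or caption count differs from the stated figures.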
Source: https://github.com/201528014227051/RSICD_optimal
Image Source: https://github.com/201528014227051/RSICD_optimal
Variants: RSICD
This dataset is used in 2 benchmarks:
Task | Model | Paper | Date |
---|---|---|---|
Cross-Modal Retrieval | HarMA (w/ GeoRSCLIP) | Efficient Remote Sensing with Harmonized … | 2024-04-28 |
Cross-Modal Retrieval | DOVE | Direction-Oriented Visual-semantic Embedding Model for … | 2023-10-12 |
Cross-Modal Retrieval | PE-RSITR (MRS-Adapter) | Parameter-Efficient Transfer Learning for Remote … | 2023-08-24 |
Cross-Modal Retrieval | GeoRSCLIP-FT | RS5M and GeoRSCLIP: A Large … | 2023-06-20 |
Image-to-Text Retrieval | GeoRSCLIP-FT | RS5M and GeoRSCLIP: A Large … | 2023-06-20 |
Cross-Modal Retrieval | RemoteCLIP | RemoteCLIP: A Vision Language Foundation … | 2023-06-19 |
Cross-Modal Retrieval | GaLR | Remote Sensing Cross-Modal Text-Image Retrieval … | 2022-04-21 |
Cross-Modal Retrieval | AMFMN | Exploring a Fine-Grained Multiscale Method … | 2022-04-21 |
Recent papers with results on this dataset: