SCICAP is a large-scale image captioning dataset that contains real-world scientific figures and captions. SCICAP was constructed using more than two million images from over 290,000 papers collected and released by arXiv.
Image source: https://arxiv.org/pdf/2110.11624v1.pdf
Variants: SCICAP
This dataset is used in one benchmark.