Wikipedia-based Image Text
The Wikipedia-based Image Text (WIT) dataset is a large multimodal, multilingual dataset. WIT consists of a curated set of 37.6 million entity-rich image-text examples, covering 11.5 million unique images across 108 Wikipedia languages. Its size makes WIT suitable as a pretraining dataset for multimodal machine learning models.
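The two headline counts also show how much image reuse the dataset contains. There are more examples than unique images, so many images must appear in more than one example. The short sketch below computes the average number of examples per unique image from the figures quoted above. It gives the mean only and says nothing about how reuse is distributed across images or languages.

```python
# Headline counts quoted in the dataset description.
TOTAL_EXAMPLES = 37_600_000
UNIQUE_IMAGES = 11_500_000


def examples_per_image(total_examples: int, unique_images: int) -> float:
    """Average number of image-text examples that share one unique image."""
    if unique_images <= 0:
        raise ValueError("unique_images must be positive")
    return total_examples / unique_images


ratio = examples_per_image(TOTAL_EXAMPLES, UNIQUE_IMAGES)
print(f"{ratio:.2f} examples per unique image")  # prints "3.27 examples per unique image"
```

On average, each unique image is paired with a little over three text examples.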
Key Advantages
A few unique advantages of WIT:
Variants: WIT
This dataset is used in 1 benchmark:
Task | Model | Paper | Date |
---|---|---|---|
Image Retrieval | WIT-ALL | WIT: Wikipedia-based Image Text Dataset … | 2021-03-02 |
Image Retrieval | CC (Conceptual Captions) | WIT: Wikipedia-based Image Text Dataset … | 2021-03-02 |
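This page does not say which metric the image retrieval benchmark reports. Image-text retrieval is commonly scored with Recall@K, the fraction of queries whose correct match ranks among the top K candidates, so the following is offered only as an illustrative sketch of that metric. It assumes a square similarity matrix in which query `i` is paired with candidate `i`. The function name `recall_at_k` and the toy scores are hypothetical and do not come from the benchmark.

```python
from typing import Sequence


def recall_at_k(similarity: Sequence[Sequence[float]], k: int) -> float:
    """Fraction of queries whose paired candidate ranks in the top k.

    similarity[i][j] is the score between query i and candidate j.
    Assumption: the correct candidate for query i is candidate i.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not similarity:
        raise ValueError("similarity matrix must not be empty")
    hits = 0
    for i, row in enumerate(similarity):
        # Rank candidate indices by descending similarity to query i.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Toy 3x3 scores (made up for illustration, not benchmark data).
toy_scores = [
    [0.9, 0.1, 0.3],  # query 0: correct candidate 0 ranks first
    [0.2, 0.4, 0.8],  # query 1: correct candidate 1 ranks second
    [0.1, 0.7, 0.6],  # query 2: correct candidate 2 ranks second
]
print(recall_at_k(toy_scores, 1))  # 0.3333333333333333
print(recall_at_k(toy_scores, 2))  # 1.0
```

Sorting each row costs O(n log n) per query. That is fine for small evaluation sets. Large-scale retrieval would normally use a top-k selection or an approximate nearest-neighbour index instead.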