WIT

Name: WIT
Published: 2021-03-02
License: Unknown

Wikipedia-based Image Text

Dataset Information

Modalities

Images, Texts

Languages

Multilingual

Introduced

2021

License

Unknown

Homepage

Official Website

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

The Wikipedia-based Image Text (WIT) dataset is a large multimodal, multilingual dataset. WIT is composed of a curated set of 37.6 million entity-rich image-text examples with 11.5 million unique images across 108 Wikipedia languages. Its size enables WIT to be used as a pretraining dataset for multimodal machine learning models.

Key Advantages

WIT has a few unique advantages:

The largest multimodal dataset (at the time of writing) by number of image-text examples.
A massively multilingual dataset (the first of its kind), covering more than 100 languages.
A diverse collection of concepts and real-world entities.
A source of challenging real-world test sets.

Variants: WIT

Associated Benchmarks

This dataset is used in 1 benchmark:

Image Retrieval - Metrics: R@1, R@5
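
The R@1 and R@5 metrics are recall at K: the fraction of text queries for which the correct image appears among the top K retrieved results. This page does not describe the evaluation protocol, so the following is only a minimal sketch of the usual definition, assuming one correct image per query. The function name `recall_at_k` and the toy image IDs are hypothetical and are not taken from the WIT paper or its code.

```python
from typing import Sequence


def recall_at_k(
    ranked_ids: Sequence[Sequence[str]],
    gold_ids: Sequence[str],
    k: int,
) -> float:
    """Fraction of queries whose gold image is among the top-k retrieved images.

    Assumes exactly one correct (gold) image per query.
    """
    if len(ranked_ids) != len(gold_ids):
        raise ValueError("one ranking per query is required")
    if not gold_ids:
        raise ValueError("at least one query is required")
    hits = sum(
        1 for ranking, gold in zip(ranked_ids, gold_ids) if gold in ranking[:k]
    )
    return hits / len(gold_ids)


# Toy example: four text queries, each with a ranked list of retrieved image IDs.
rankings = [
    ["img_a", "img_b", "img_c"],  # gold at rank 1
    ["img_d", "img_e", "img_f"],  # gold at rank 2
    ["img_g", "img_h", "img_i"],  # gold not retrieved
    ["img_j", "img_k", "img_l"],  # gold at rank 3
]
gold = ["img_a", "img_e", "img_z", "img_l"]

r1 = recall_at_k(rankings, gold, k=1)  # 1 of 4 queries hit -> 0.25
r5 = recall_at_k(rankings, gold, k=5)  # 3 of 4 queries hit -> 0.75
print(f"R@1 = {r1:.2f}, R@5 = {r5:.2f}")
```

Because R@5 counts every hit that R@1 counts, R@5 is always at least as large as R@1 for the same rankings.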

Recent Benchmark Submissions

Task	Model	Paper	Date
Image Retrieval	WIT-ALL	WIT: Wikipedia-based Image Text Dataset …	2021-03-02
Image Retrieval	CC (Conceptual Captions)	WIT: Wikipedia-based Image Text Dataset …	2021-03-02

Research Papers

Recent papers with results on this dataset:

WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning (2021)

External Links:

WIT
