ToTTo

Dataset Information
Modalities: Texts
Languages: English
Introduced: 2020
License: Unknown

Overview

ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples. It defines a controlled generation task: given a Wikipedia table and a set of highlighted table cells, produce a one-sentence description of those cells.
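To make the "controlled" part concrete, here is a minimal sketch of how a table plus highlighted cells might be flattened into a single input string for a sequence-to-sequence model. It assumes a simplified, hypothetical representation: a table is a list of rows of `Cell` objects, row 0 is a header row when all its cells are marked as headers, and highlighted cells are `(row_index, column_index)` pairs. The `Cell` class, the `linearize_highlighted` function, and the tag strings are illustrative only. They are not the released data's field names or the preprocessing used by any model listed below, and merged cells (row or column spans) are ignored.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Cell:
    # Hypothetical simplified cell: real ToTTo tables also carry span information.
    value: str
    is_header: bool = False


def linearize_highlighted(
    table: List[List[Cell]],
    highlighted: Sequence[Tuple[int, int]],
    page_title: str = "",
) -> str:
    """Flatten only the highlighted cells (plus optional page title) into one string.

    Each highlighted cell is paired with its column header, taken from row 0
    when every cell in that row is marked as a header. Non-highlighted cells
    are dropped, which is what restricts the target sentence to the selection.
    """
    parts: List[str] = []
    if page_title:
        parts.append(f"<page_title> {page_title} </page_title>")

    # Treat row 0 as the header row only if it is entirely made of header cells.
    header_row: Optional[List[Cell]] = (
        table[0] if table and all(c.is_header for c in table[0]) else None
    )

    # Emit cells in reading order (row-major), regardless of input order.
    for r, c in sorted(highlighted):
        cell = table[r][c]
        header = header_row[c].value if header_row and r != 0 else None
        if header:
            parts.append(f"<cell> {cell.value} <col_header> {header} </col_header> </cell>")
        else:
            parts.append(f"<cell> {cell.value} </cell>")
    return " ".join(parts)


if __name__ == "__main__":
    table = [
        [Cell("Year", True), Cell("Film", True), Cell("Role", True)],
        [Cell("2019"), Cell("Example Film"), Cell("Lead")],
        [Cell("2020"), Cell("Another Film"), Cell("Cameo")],
    ]
    print(linearize_highlighted(table, [(1, 1), (1, 0)], page_title="Jane Doe"))
```

With the toy table above, only the 2019 row's "Year" and "Film" cells reach the model input; the target would then be a single sentence about those two cells. A model would be trained to map strings like this to the reference description.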

During dataset creation, tables from English Wikipedia are matched with (noisy) descriptions. Each table cell mentioned in a description is highlighted, and the descriptions are then iteratively cleaned and corrected so that they faithfully reflect the content of the highlighted cells.

Source: Google Research Datasets

Variants: ToTTo

Associated Benchmarks

This dataset is used in one benchmark.

Recent Benchmark Submissions

Task | Model | Paper | Date
Data-to-Text Generation | LATTICE (T5-base) | Robust (Controlled) Table-to-Text Generation with … | 2022-05-08
Data-to-Text Generation | T5 | The GEM Benchmark: Natural Language … | 2021-02-02
Data-to-Text Generation | T5-3B | Text-to-Text Pre-Training for Data-to-Text Tasks | 2020-05-21
Data-to-Text Generation | BERT-to-BERT | ToTTo: A Controlled Table-To-Text Generation … | 2020-04-29
Data-to-Text Generation | Pointer Generator | ToTTo: A Controlled Table-To-Text Generation … | 2020-04-29
Data-to-Text Generation | NCP+CC (Puduppully et al., 2019) | ToTTo: A Controlled Table-To-Text Generation … | 2020-04-29

Research Papers

Recent papers with results on this dataset: