Gazeta

Dataset Information
Modalities: Texts
Languages: Russian
License: Unknown

Overview

Gazeta is a dataset for automatic summarization of Russian news. It consists of 63,435 text-summary pairs. To form the training, validation, and test sets, the pairs were sorted by time: the first 52,400 pairs are used for training, the next 5,265 for validation, and the remaining 5,770 for testing.

Source: https://github.com/IlyaGusev/gazeta
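
The chronological split described in the overview can be sketched as follows. This is a minimal illustration, not the authors' actual preprocessing code: the record layout (a dict with a "date" field holding an ISO-formatted string, plus "text" and "summary") and the function name chronological_split are assumptions made for this example. Only the three split sizes come from the description above.

```python
# Split sizes as stated in the overview.
TRAIN_SIZE, VAL_SIZE, TEST_SIZE = 52_400, 5_265, 5_770

# The three sizes add up to the full dataset of 63,435 pairs.
assert TRAIN_SIZE + VAL_SIZE + TEST_SIZE == 63_435


def chronological_split(pairs, train_size=TRAIN_SIZE, val_size=VAL_SIZE):
    """Sort pairs by time and cut them into train/validation/test by position.

    Assumes each pair is a dict with a hypothetical "date" field holding an
    ISO-formatted string, so that string order matches chronological order.
    """
    ordered = sorted(pairs, key=lambda pair: pair["date"])
    train = ordered[:train_size]
    val = ordered[train_size:train_size + val_size]
    test = ordered[train_size + val_size:]  # everything that remains
    return train, val, test


if __name__ == "__main__":
    # Toy data with made-up dates, only to show the call shape.
    toy = [
        {"date": "2020-03-01", "text": "c", "summary": "c"},
        {"date": "2020-01-01", "text": "a", "summary": "a"},
        {"date": "2020-02-01", "text": "b", "summary": "b"},
    ]
    train, val, test = chronological_split(toy, train_size=1, val_size=1)
    print(len(train), len(val), len(test))
```

Splitting by position after a time sort means every validation and test article is newer than every training article, which is the property the description above implies.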

Variants: Gazeta

Associated Benchmarks

This dataset is used in 1 benchmark:

Recent Benchmark Submissions

Task | Model | Paper | Date
Text Summarization | Finetuned mBART | Dataset for Automatic Summarization of … | 2020-06-19

Research Papers

Recent papers with results on this dataset: