Gazeta

Name: Gazeta
License: Unknown

Dataset Information

Modalities

Texts

Languages

Russian

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

Gazeta is a dataset for automatic summarization of Russian news. It consists of 63,435 text-summary pairs. To form the training, validation, and test sets, the pairs were sorted by time: the first 52,400 pairs are used as the training set, the following 5,265 pairs as the validation set, and the remaining 5,770 pairs as the test set.
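
The time-ordered split described above can be sketched roughly as follows. This is an illustrative sketch only, not the authors' actual preprocessing code: the function name chronological_split and the record field "date" are hypothetical, and only the three split sizes come from the description.

```python
from datetime import date

# Split sizes as stated in the dataset description.
TRAIN_SIZE = 52_400
VAL_SIZE = 5_265
TEST_SIZE = 5_770


def chronological_split(records, train_size=TRAIN_SIZE, val_size=VAL_SIZE):
    """Sort records by time, then cut them into train/validation/test.

    Assumption: each record is a dict with a sortable "date" field
    (a hypothetical field name chosen for this sketch).
    """
    ordered = sorted(records, key=lambda r: r["date"])
    train = ordered[:train_size]
    val = ordered[train_size:train_size + val_size]
    test = ordered[train_size + val_size:]  # everything that remains
    return train, val, test


# Toy usage with five made-up records and much smaller split sizes.
toy = [{"date": date(2020, 1, d), "text": "...", "summary": "..."}
       for d in (5, 1, 3, 2, 4)]
toy_train, toy_val, toy_test = chronological_split(toy, train_size=3, val_size=1)
```

Splitting by time rather than at random means the test set holds only articles published after every training article, so a model is evaluated on news it could not have seen during training.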

Source: https://github.com/IlyaGusev/gazeta

Variants: Gazeta

Associated Benchmarks

This dataset is used in 1 benchmark:

Text Summarization - Metrics: ROUGE-1, ROUGE-2, ROUGE-L, BLEU, Meteor

Recent Benchmark Submissions

Task	Model	Paper	Date
Text Summarization	Finetuned mBART	Dataset for Automatic Summarization of …	2020-06-19

Research Papers

Recent papers with results on this dataset:

Dataset for Automatic Summarization of Russian News (2020)
