XSum

Dataset Information
Modalities: Texts
Languages: English
Introduced: 2018
License: Unknown
Homepage

Overview

The Extreme Summarization (XSum) dataset is designed for evaluating abstractive single-document summarization systems. The goal is to produce a short, one-sentence news summary that answers the question “What is the article about?”. The dataset consists of 226,711 news articles, each accompanied by a one-sentence summary. The articles were collected from the BBC (2010 to 2017) and cover a wide variety of domains (e.g., News, Politics, Sports, Weather, Business, Technology, Science, Health, Family, Education, Entertainment and Arts). The official random split contains 204,045 (90%), 11,332 (5%) and 11,334 (5%) documents in the training, validation and test sets, respectively.
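As a quick sanity check on the split figures quoted above, the following minimal sketch recomputes each split's share of the total. The names `SPLITS`, `TOTAL` and `split_shares` are illustrative only and are not part of any official XSum tooling; the counts are copied from the overview.

```python
# Split sizes as quoted in the overview (not read from the dataset itself).
SPLITS = {"train": 204_045, "validation": 11_332, "test": 11_334}
TOTAL = 226_711  # total number of articles quoted in the overview


def split_shares(splits):
    """Return each split's share of all documents, as a percentage."""
    total = sum(splits.values())
    return {name: 100 * count / total for name, count in splits.items()}


shares = split_shares(SPLITS)

# The three splits should account for every article in the dataset.
assert sum(SPLITS.values()) == TOTAL

for name, pct in shares.items():
    print(f"{name}: {SPLITS[name]:,} documents ({pct:.2f}%)")
```

The percentages are approximate: the split is not an exact 90/5/5 division, which is why the validation and test sets differ by two documents.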

Source: https://arxiv.org/pdf/1808.08745.pdf

Variants: XSum, X-Sum

Associated Benchmarks

This dataset is used in 4 benchmarks:

Recent Benchmark Submissions

Task                  | Model         | Paper                                           | Date
Text Summarization    | SRformer-BART | Segmented Recurrent Transformer: An Efficient … | 2023-05-24
Extreme Summarization | PEGASUS       | The GEM Benchmark: Natural Language …           | 2021-02-02

Research Papers

Recent papers with results on this dataset: