The Extreme Summarization (XSum) dataset is a benchmark for evaluating abstractive single-document summarization systems. The goal is to create a short, new one-sentence summary answering the question “What is the article about?”. The dataset consists of 226,711 news articles, each accompanied by a one-sentence summary. The articles are collected from the BBC (2010 to 2017) and cover a wide variety of domains (e.g., News, Politics, Sports, Weather, Business, Technology, Science, Health, Family, Education, Entertainment and Arts). The official random split contains 204,045 (90%), 11,332 (5%) and 11,334 (5%) documents in the training, validation and test sets, respectively.
Source: https://arxiv.org/pdf/1808.08745.pdf
Variants: XSum, X-Sum
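The split sizes quoted in the description can be checked with a little arithmetic. The following is a minimal sketch that only uses the counts stated above; the variable names (`splits`, `total`, `shares`) are illustrative and do not come from the dataset or its paper.

```python
# Split sizes as reported in the dataset description above.
splits = {"train": 204_045, "validation": 11_332, "test": 11_334}

# The three splits should add up to the full collection of 226,711 articles.
total = sum(splits.values())

# Share of each split as a percentage of the total.
shares = {name: 100 * count / total for name, count in splits.items()}

print(f"total documents: {total}")
for name, count in splits.items():
    print(f"{name:>10}: {count:>7} documents ({shares[name]:.2f}%)")
```

The counts sum to the stated total, and the shares round to the 90/5/5 proportions given in the description.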
This dataset is used in 4 benchmarks:
Task | Model | Paper | Date |
---|---|---|---|
Text Summarization | SRformer-BART | Segmented Recurrent Transformer: An Efficient … | 2023-05-24 |
Extreme Summarization | PEGASUS | The GEM Benchmark: Natural Language … | 2021-02-02 |
Recent papers with results on this dataset: