BookSum is a collection of datasets for long-form narrative summarization. It covers source documents from the literature domain, such as novels, plays, and stories. It also includes highly abstractive, human-written summaries at three levels of granularity and increasing difficulty: paragraph, chapter, and book level. The domain and structure of the dataset pose a unique set of challenges for summarization systems, including processing very long documents, handling non-trivial causal and temporal dependencies, and capturing rich discourse structures.
BookSum contains summaries for 142,753 paragraphs, 12,293 chapters and 436 books.
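The counts are reported per level, and they are heavily skewed toward paragraphs. A quick Python calculation shows each level's share of all summarized units. It assumes the three counts can simply be added together, treating each summarized paragraph, chapter, or book as one unit. This is an illustration only, not an official dataset statistic.

```python
# Number of summarized units per granularity level, as reported for BookSum.
# Assumption: each summarized paragraph, chapter, or book counts as one unit.
counts = {
    "paragraph": 142_753,
    "chapter": 12_293,
    "book": 436,
}

total = sum(counts.values())
# Fraction of all summarized units that falls at each level.
shares = {level: n / total for level, n in counts.items()}

print(f"total summarized units: {total:,}")
for level, share in shares.items():
    print(f"{level:>9}: {share:.1%}")
```

This prints a total of 155,482 units, split roughly 91.8% paragraph, 7.9% chapter, and 0.3% book level. The hardest level, full books, is therefore also the scarcest.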
Variants: BookSum
This dataset is used in one benchmark:
Task | Model | Paper | Date |
---|---|---|---|
Text Summarization | Echoes-Extractive-Abstractive | Echoes from Alexandria: A Large … | 2023-06-07 |
Text Summarization | BART-LS | Adapting Pretrained Text-to-Text Models for … | 2022-09-21 |
Text Summarization | Top Down Transformer (AdaPool) (464M) | Long Document Summarization with Top-down … | 2022-03-15 |
Recent papers with results on this dataset: