BookSum

Dataset Information
Modalities: Texts
Languages: English
Introduced: 2021

Overview

BookSum is a collection of datasets for long-form narrative summarization. It covers source documents from the literature domain, such as novels, plays, and stories, and includes highly abstractive, human-written summaries at three levels of granularity of increasing difficulty: paragraph, chapter, and book. The domain and structure of this dataset pose a unique set of challenges for summarization systems, including processing very long documents, non-trivial causal and temporal dependencies, and rich discourse structures.

BookSum contains summaries for 142,753 paragraphs, 12,293 chapters and 436 books.
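
To make the three granularity levels concrete, the following is a minimal Python sketch of how one summarization example and the reported summary counts might be represented. The class names, field names, and the helper function are hypothetical and are not taken from the BookSum release; only the three counts come from the text above.

```python
from dataclasses import dataclass
from enum import Enum


class Granularity(Enum):
    """The three summary levels described in the overview."""
    PARAGRAPH = "paragraph"
    CHAPTER = "chapter"
    BOOK = "book"


@dataclass(frozen=True)
class SummaryExample:
    # Hypothetical record layout; these field names are assumptions,
    # not the actual schema of the BookSum files.
    source_text: str
    summary: str
    level: Granularity


# Summary counts as stated in the overview.
REPORTED_COUNTS = {
    Granularity.PARAGRAPH: 142_753,
    Granularity.CHAPTER: 12_293,
    Granularity.BOOK: 436,
}


def share_by_level(counts):
    """Return the fraction of all summaries found at each level."""
    total = sum(counts.values())
    return {level: n / total for level, n in counts.items()}


# A placeholder example at the chapter level.
example = SummaryExample(
    source_text="(chapter text would go here)",
    summary="(human-written summary would go here)",
    level=Granularity.CHAPTER,
)

total_summaries = sum(REPORTED_COUNTS.values())
shares = share_by_level(REPORTED_COUNTS)
```

Under these assumptions, paragraph-level summaries make up the large majority of examples, while book-level summaries are by far the rarest and pair with the longest source documents.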

Variants: BookSum

Associated Benchmarks

This dataset is used in 1 benchmark:

Recent Benchmark Submissions

Task                Model                                  Paper                                          Date
Text Summarization  Echoes-Extractive-Abstractive          Echoes from Alexandria: A Large …              2023-06-07
Text Summarization  BART-LS                                Adapting Pretrained Text-to-Text Models for …  2022-09-21
Text Summarization  Top Down Transformer (AdaPool) (464M)  Long Document Summarization with Top-down …    2022-03-15

Research Papers

Recent papers with results on this dataset: