BookSum

Name: BookSum
Published: 2021-05-18
License: BSD-3 License

Dataset Information

Modalities: Texts
Languages: English
Introduced: 2021
License: BSD-3 License
Homepage: Official Website

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

BookSum is a collection of datasets for long-form narrative summarization. It covers source documents from the literature domain, such as novels, plays, and stories, and includes highly abstractive, human-written summaries at three levels of granularity of increasing difficulty: paragraph, chapter, and book level. The domain and structure of this dataset pose a unique set of challenges for summarization systems, including processing very long documents, non-trivial causal and temporal dependencies, and rich discourse structures.

BookSum contains summaries for 142,753 paragraphs, 12,293 chapters and 436 books.

Variants: BookSum

Associated Benchmarks

This dataset is used in 1 benchmark:

Text Summarization - Metrics: ROUGE, ROUGE-2, ROUGE-L
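
The ROUGE metrics listed above compare a generated summary with a reference summary by counting overlapping units. The following is a minimal sketch of the idea, not the official scorer used for the benchmark: it assumes lowercased whitespace tokenization, applies no stemming, and reports F1 only. The function names (rouge_n, rouge_l) are hypothetical and chosen for illustration.

```python
from collections import Counter


def _ngrams(tokens, n):
    """Count the n-grams in a token list (empty if the list is shorter than n)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def _f1(overlap, cand_total, ref_total):
    """Combine precision and recall into F1, returning 0.0 when nothing overlaps."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N F1: clipped n-gram overlap between two strings.

    Assumption: whitespace tokenization on lowercased text, no stemming.
    """
    cand = _ngrams(candidate.lower().split(), n)
    ref = _ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((cand & ref).values())
    return _f1(overlap, sum(cand.values()), sum(ref.values()))


def _lcs_length(a, b):
    """Length of the longest common subsequence, using two rolling rows."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """Simplified ROUGE-L F1 based on the longest common subsequence of tokens."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return _f1(_lcs_length(cand, ref), len(cand), len(ref))
```

For example, rouge_n("the cat sat on the mat", "the cat lay on the mat", n=2) counts three shared bigrams out of five on each side. Scores from published systems are typically computed with an established ROUGE package, so values from this sketch may differ from reported numbers.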

Recent Benchmark Submissions

Task	Model	Paper	Date
Text Summarization	Echoes-Extractive-Abstractive	Echoes from Alexandria: A Large …	2023-06-07
Text Summarization	BART-LS	Adapting Pretrained Text-to-Text Models for …	2022-09-21
Text Summarization	Top Down Transformer (AdaPool) (464M)	Long Document Summarization with Top-down …	2022-03-15

Research Papers

Recent papers with results on this dataset:

External Links:

BookSum
