LitBank

Dataset Information
Modalities: Texts
Introduced: 2019
License: Creative Commons Attribution 4.0 International

Overview

LitBank is an annotated dataset of 100 works of English-language fiction to support tasks in natural language processing and the computational humanities, described in more detail in the following publications:

  • David Bamman, Sejal Popat and Sheng Shen (2019), "An Annotated Dataset of Literary Entities," NAACL 2019.
  • Matthew Sims, Jong Ho Park and David Bamman (2019), "Literary Event Detection," ACL 2019.
  • David Bamman, Olivia Lewke and Anya Mansoor (2020), "An Annotated Dataset of Coreference in English Literature," LREC 2020.

LitBank currently contains annotations for entities, events, entity coreference, and quotation attribution in a sample of approximately 2,000 words from each of the 100 texts, totaling 210,532 tokens.

LitBank is licensed under a Creative Commons Attribution 4.0 International License.

Variants: LitBank

Associated Benchmarks

This dataset is used in 1 benchmark.

Recent Benchmark Submissions

Task                    Model                                    Paper                                           Date
Coreference Resolution  Maverick_incr                            Maverick: Efficient and Accurate Coreference …  2024-07-31
Coreference Resolution  longdoc S (OntoNotes + PreCo + LitBank)  On Generalization in Coreference Resolution     2021-09-20

Research Papers

Recent papers with results on this dataset: