DEplain-APA-doc

Dataset Information
Modalities: Texts
Languages: German
Introduced: 2023

Overview

DEplain-APA-doc: A German Parallel Corpus for Document Simplification on News Texts

DEplain is a dataset of parallel, professionally written and manually aligned simplifications in plain German (“plain DE”, or in German, “Einfache Sprache”). It consists of four subcorpora: DEplain-APA-doc, DEplain-APA-sent, DEplain-web-doc, and DEplain-web-sent.

DEplain-APA-doc consists of approximately 500 news document pairs. The data is available upon request; see https://doi.org/10.5281/zenodo.7674560 for more information. The corpus can be used for German text simplification, and more specifically for document-level simplification.
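As a rough illustration of working with document pairs, the sketch below models one pair and computes a simple length ratio between the simplified and the original text. The `DocPair` class, its field names, and the `compression_ratio` helper are hypothetical and not part of the dataset release. The actual file layout is not described here, and the example sentences are invented rather than taken from the corpus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocPair:
    """One complex/simplified document pair (hypothetical layout, not the corpus format)."""
    complex_text: str
    simple_text: str


def compression_ratio(pair: DocPair) -> float:
    """Return the whitespace-token count of the simplification divided by that of the original."""
    n_complex = len(pair.complex_text.split())
    n_simple = len(pair.simple_text.split())
    if n_complex == 0:
        raise ValueError("complex_text must contain at least one token")
    return n_simple / n_complex


# Toy pair invented for illustration; not taken from DEplain-APA-doc.
toy = DocPair(
    complex_text="Die Regierung hat am Dienstag umfangreiche Maßnahmen beschlossen.",
    simple_text="Die Regierung hat neue Regeln beschlossen.",
)
ratio = compression_ratio(toy)
```

A ratio below 1 means the simplified text is shorter than the original when counted in whitespace tokens. This is only a coarse descriptive statistic, not an evaluation metric for simplification quality.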

Variants: DEplain-APA-doc

Associated Benchmarks

This dataset is used in 1 benchmark:

  • Text Simplification

Recent Benchmark Submissions

Task | Model | Paper | Date
Text Simplification | long-mBART (trained on DEplain-APA-doc) | DEPLAIN: A German Parallel Corpus … | 2023-05-30
Text Simplification | long-mBART (trained on DEplain-APA-doc & DEplain-web-doc) | DEPLAIN: A German Parallel Corpus … | 2023-05-30
Text Simplification | long-mBART (trained on DEplain-web-doc) | DEPLAIN: A German Parallel Corpus … | 2023-05-30

Research Papers

Recent papers with results on this dataset: