DEplain is a new dataset of parallel, professionally written and manually aligned simplifications in plain German ("plain DE"; in German, "Einfache Sprache"). DEplain consists of four main subcorpora: DEplain-APA-doc, DEplain-APA-sent, DEplain-web-doc, and DEplain-web-sent.
DEplain-APA-sent consists of approx. 500 news document pairs and approx. 13k sentence pairs. All sentence pairs are manually aligned. The data is available upon request; see https://doi.org/10.5281/zenodo.7674560 for more information. The corpus can be used for German text simplification, and more specifically for sentence simplification.
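As a rough illustration of how manually aligned sentence pairs might be consumed for sentence simplification, the following minimal sketch reads (complex, simple) pairs from a CSV file. The column names `original` and `simplification`, the CSV layout, and the helper `read_pairs` are assumptions made for this example, and the sample sentence is invented rather than taken from the corpus. The format of the actual release may differ.

```python
import csv
import io

# Invented sample in an ASSUMED CSV layout (columns "original" and
# "simplification"); the real release may use other names or formats.
SAMPLE = (
    "original,simplification\n"
    '"Die Regierung hat ein neues Gesetz beschlossen, das ab Januar gilt.",'
    '"Die Regierung hat ein neues Gesetz gemacht. Es gilt ab Januar."\n'
)


def read_pairs(handle):
    """Yield (complex, simple) sentence pairs from an aligned CSV file."""
    for row in csv.DictReader(handle):
        yield row["original"].strip(), row["simplification"].strip()


# Hypothetical usage: an in-memory file stands in for the requested data.
pairs = list(read_pairs(io.StringIO(SAMPLE)))
for source, target in pairs:
    print(f"{source}\n  -> {target}")
```

Each pair can then serve as one source/target training example for a sequence-to-sequence model.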
Variants: DEplain-APA-sent
This dataset is used in 1 benchmark:
Task | Model | Paper | Date |
---|---|---|---|
Text Simplification | mBART (trained on DEplain-APA-sent & DEplain-web-sent) | DEPLAIN: A German Parallel Corpus … | 2023-05-30 |
Text Simplification | mBART (trained on DEplain-APA-sent) | DEPLAIN: A German Parallel Corpus … | 2023-05-30 |