DEplain is a new dataset of parallel, professionally written and manually aligned simplifications in plain German (“plain DE”; in German: “Einfache Sprache”). DEplain consists of four main subcorpora: DEplain-APA-doc, DEplain-APA-sent, DEplain-web-doc, and DEplain-web-sent.
DEplain-APA-doc consists of approx. 500 news document pairs. The data is available upon request; see https://doi.org/10.5281/zenodo.7674560 for more information. The corpus can be used for German text simplification, more specifically for document-level simplification.
Variants: DEplain-APA-doc
This dataset is used in 1 benchmark:
Task | Model | Paper | Date |
---|---|---|---|
Text Simplification | long-mBART (trained on DEplain-APA-doc) | DEPLAIN: A German Parallel Corpus … | 2023-05-30 |
Text Simplification | long-mBART (trained on DEplain-APA-doc & DEplain-web-doc) | DEPLAIN: A German Parallel Corpus … | 2023-05-30 |
Text Simplification | long-mBART (trained on DEplain-web-doc) | DEPLAIN: A German Parallel Corpus … | 2023-05-30 |