Newsela

Dataset Information

Languages: English
Introduced: 2015
License: Unknown

Overview

The Newsela dataset was introduced by Xu et al. in 2015 in their research on text simplification. It is a corpus of thousands of news articles, each professionally leveled to multiple reading complexities. The dataset is made available to academic partners upon request and is often used as a benchmark for text simplification, as well as for research on text difficulty. Note that Newsela is distinct from the NELA datasets, which are collections of news articles for studying media bias and other applications.

Variants: Newsela

Associated Benchmarks

This dataset is used in 1 benchmark: Text Simplification

Recent Benchmark Submissions

Task | Model | Paper | Date
Text Simplification | Edit-Unsup-TS | Iterative Edit-Based Unsupervised Sentence Simplification | 2020-06-17
Text Simplification | CRF Alignment + Transformer | Neural CRF Model for Sentence … | 2020-05-05
Text Simplification | EditNTS | EditNTS: An Neural Programmer-Interpreter Model … | 2019-06-19
Text Simplification | S2S-Cluster-FA | Complexity-Weighted Loss and Diverse Reranking … | 2019-04-04
Text Simplification | DMASS + DCSS | Integrating Transformer and Paraphrase Rules … | 2018-10-26
Text Simplification | Pointer + Multi-task Entailment and Paraphrase Generation | Dynamic Multi-Level Multi-Task Learning for … | 2018-06-19
Text Simplification | NSELSTM-S | Sentence Simplification with Memory-Augmented Neural … | 2018-04-20
Text Simplification | NSELSTM-B | Sentence Simplification with Memory-Augmented Neural … | 2018-04-20
Text Simplification | DRESS | Sentence Simplification with Deep Reinforcement … | 2017-03-31
Text Simplification | DRESS-LS | Sentence Simplification with Deep Reinforcement … | 2017-03-31

Research Papers

Recent papers with results on this dataset: