DuReader

Dataset Information
Modalities: Texts
Languages: Chinese
Introduced: 2018
License: Unknown

Overview

DuReader is a large-scale, open-domain Chinese machine reading comprehension dataset. It consists of 200K questions, 420K answers, and 1M documents. The questions and documents come from Baidu Search and Baidu Zhidao, and the answers are manually generated. The dataset also provides question type annotations: each question was manually labeled with an answer type (Entity, Description, or YesNo) and as either Fact or Opinion.

Source: https://arxiv.org/pdf/1711.05073v4.pdf
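
The two-part question annotation described in the overview (an answer type plus a Fact/Opinion label) can be pictured as a small record type. The following is only a minimal sketch: the class names, field names, and sample records are hypothetical and are not taken from the dataset's actual file format.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class AnswerType(Enum):
    # The three answer types named in the overview.
    ENTITY = "Entity"
    DESCRIPTION = "Description"
    YES_NO = "YesNo"


class Nature(Enum):
    # The second annotation axis named in the overview.
    FACT = "Fact"
    OPINION = "Opinion"


@dataclass
class Question:
    # Hypothetical field names; the real DuReader schema may differ.
    text: str
    answer_type: AnswerType
    nature: Nature
    answers: List[str] = field(default_factory=list)
    documents: List[str] = field(default_factory=list)


def label_counts(questions):
    """Count questions per (answer type, nature) pair."""
    return Counter((q.answer_type, q.nature) for q in questions)


# Invented placeholder records, used only to exercise the sketch.
sample = [
    Question("q1", AnswerType.ENTITY, Nature.FACT, ["a1"]),
    Question("q2", AnswerType.YES_NO, Nature.OPINION, ["a2", "a3"]),
    Question("q3", AnswerType.ENTITY, Nature.FACT),
]
counts = label_counts(sample)
```

Because the two labels are independent, every question falls into one of six (answer type, nature) combinations, which is what `label_counts` tallies.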

Variants: DuReader

Associated Benchmarks

This dataset is used in 1 benchmark:

Recent Benchmark Submissions

Task                            | Model           | Paper                                  | Date
Open-Domain Question Answering  | ERNIE 2.0 Large | ERNIE 2.0: A Continual Pre-training …  | 2019-07-29
Open-Domain Question Answering  | ERNIE 2.0 Base  | ERNIE 2.0: A Continual Pre-training …  | 2019-07-29

Research Papers

Recent papers with results on this dataset: