DuReader

Name: DuReader
Published: 2018-01-01
License: Unknown

Dataset Information

Modalities

Texts

Languages

Chinese

Introduced

2018

License

Unknown

Homepage

Official Website

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

DuReader is a large-scale, open-domain Chinese machine reading comprehension dataset. It consists of 200K questions, 420K answers, and 1M documents. The questions and documents are drawn from Baidu Search and Baidu Zhidao, and the answers are manually generated. The dataset also provides question type annotations: each question is manually labeled as Entity, Description, or YesNo, and additionally as either Fact or Opinion.

Source: https://arxiv.org/pdf/1711.05073v4.pdf

Variants: DuReader
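
The two-axis question type annotation described in the overview can be modeled as a pair of enumerations. The following is a minimal sketch, not the dataset's actual schema: the class names `AnswerKind`, `Factuality`, and `QuestionLabel`, and the helper `all_label_combinations`, are hypothetical and chosen only for illustration. Only the label values themselves (Entity, Description, YesNo, Fact, Opinion) come from the description above.

```python
from dataclasses import dataclass
from enum import Enum


class AnswerKind(Enum):
    """First annotation axis: what kind of answer the question expects."""
    ENTITY = "Entity"
    DESCRIPTION = "Description"
    YES_NO = "YesNo"


class Factuality(Enum):
    """Second annotation axis: whether the question asks for fact or opinion."""
    FACT = "Fact"
    OPINION = "Opinion"


@dataclass(frozen=True)
class QuestionLabel:
    # Hypothetical container; these field names are assumptions and are
    # not taken from the released dataset files.
    answer_kind: AnswerKind
    factuality: Factuality


def all_label_combinations():
    """Enumerate every (answer kind, factuality) pair a question can carry."""
    return [QuestionLabel(kind, fact) for kind in AnswerKind for fact in Factuality]


if __name__ == "__main__":
    for label in all_label_combinations():
        print(label.answer_kind.value, label.factuality.value)
```

Because each question carries one value from each axis, the scheme yields six possible combined labels, which is what `all_label_combinations` enumerates.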

Associated Benchmarks

This dataset is used in 1 benchmark:

Open-Domain Question Answering - Metrics: EM
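
EM is commonly read as exact match: a prediction counts as correct only if it equals one of the reference answers. The sketch below illustrates that idea under stated assumptions. The whitespace-stripping normalization and the function names `normalize`, `exact_match`, and `em_score` are hypothetical, and the benchmark's own evaluation script may normalize text differently.

```python
from typing import List, Sequence


def normalize(text: str) -> str:
    # Assumed normalization: remove all whitespace. The benchmark's real
    # scoring script may apply different or additional rules.
    return "".join(text.split())


def exact_match(prediction: str, references: Sequence[str]) -> bool:
    """True if the normalized prediction equals any normalized reference."""
    pred = normalize(prediction)
    return any(pred == normalize(ref) for ref in references)


def em_score(predictions: List[str], references: List[Sequence[str]]) -> float:
    """Percentage of predictions that exactly match one of their references."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)


if __name__ == "__main__":
    preds = ["北京", "上海"]
    refs = [["北京"], ["广州", "深圳"]]
    print(em_score(preds, refs))
```

In the usage example, the first prediction matches its single reference and the second matches neither of its two references, so one of two predictions is counted as correct.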

Recent Benchmark Submissions

Task	Model	Paper	Date
Open-Domain Question Answering	ERNIE 2.0 Large	ERNIE 2.0: A Continual Pre-training …	2019-07-29
Open-Domain Question Answering	ERNIE 2.0 Base	ERNIE 2.0: A Continual Pre-training …	2019-07-29

Research Papers

Recent papers with results on this dataset:

ERNIE 2.0: A Continual Pre-training Framework for Language Understanding (2019)

External Links:

DuReader
