RACE

ReAding Comprehension dataset from Examinations

Dataset Information
Modalities: Texts
Languages: English
Introduced: 2017
Homepage:
Overview

The ReAding Comprehension dataset from Examinations (RACE) is a machine reading comprehension dataset consisting of 27,933 passages and 97,867 questions drawn from English exams for Chinese students aged 12-18. RACE comprises two subsets: RACE-M (28,293 questions), from middle school exams, and RACE-H (69,574 questions), from high school exams. Each question is associated with four candidate answers, exactly one of which is correct. The data generation process of RACE differs from that of most machine reading comprehension datasets: rather than generating questions and answers by heuristics or crowd-sourcing, the questions in RACE were written by domain experts specifically to test human reading skills.
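The structure described above (a passage, a question, four labeled options, and one correct answer, scored by accuracy) can be sketched as a small data model. This is an illustrative sketch only; the field names and the sample item are assumptions for demonstration, not taken from the dataset or any particular loader.

```python
from dataclasses import dataclass

@dataclass
class RaceExample:
    """One RACE-style item: a passage, a question, four options, and a gold label."""
    article: str        # the exam passage
    question: str       # the question about the passage
    options: list       # exactly 4 candidate answers
    answer: str         # gold label, one of "A", "B", "C", "D"

def accuracy(examples, predictions):
    """Fraction of questions answered correctly -- the standard RACE metric."""
    correct = sum(ex.answer == pred for ex, pred in zip(examples, predictions))
    return correct / len(examples)

# Hypothetical item for illustration; not an actual RACE question.
ex = RaceExample(
    article="Tom walked to the library after school to return his books...",
    question="Where did Tom go after school?",
    options=["Home", "The library", "The park", "A shop"],
    answer="B",
)
print(accuracy([ex], ["B"]))  # 1.0
```

Systems are typically evaluated separately on RACE-M and RACE-H, with an overall accuracy reported over the combined test set.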

Source: Dynamic Fusion Networks for Machine Reading Comprehension

Variants: RACE

Associated Benchmarks

This dataset is used in 2 benchmarks:

Recent Benchmark Submissions

Task | Model | Paper | Date
Reading Comprehension | Orca 2-13B | Orca 2: Teaching Small Language … | 2023-11-18
Reading Comprehension | Orca 2-7B | Orca 2: Teaching Small Language … | 2023-11-18
Reading Comprehension | GPT-NeoX (one-shot) | BloombergGPT: A Large Language Model … | 2023-03-30
Reading Comprehension | OPT 66B (one-shot) | BloombergGPT: A Large Language Model … | 2023-03-30
Reading Comprehension | BLOOM 176B (one-shot) | BloombergGPT: A Large Language Model … | 2023-03-30
Reading Comprehension | Bloomberg GPT (one-shot) | BloombergGPT: A Large Language Model … | 2023-03-30
Reading Comprehension | LLaMA 7B (zero-shot) | LLaMA: Open and Efficient Foundation … | 2023-02-27
Reading Comprehension | LLaMA 65B (zero-shot) | LLaMA: Open and Efficient Foundation … | 2023-02-27
Reading Comprehension | LLaMA 33B (zero-shot) | LLaMA: Open and Efficient Foundation … | 2023-02-27
Reading Comprehension | LLaMA 13B (zero-shot) | LLaMA: Open and Efficient Foundation … | 2023-02-27
Reading Comprehension | PaLM 62B (zero-shot) | PaLM: Scaling Language Modeling with … | 2022-04-05
Reading Comprehension | PaLM 8B (zero-shot) | PaLM: Scaling Language Modeling with … | 2022-04-05
Reading Comprehension | PaLM 540B (zero-shot) | PaLM: Scaling Language Modeling with … | 2022-04-05
Reading Comprehension | HAT (Encoder) | Hierarchical Learning for Generation with … | 2021-04-15
Reading Comprehension | ALBERT (Ensemble) | Improving Machine Reading Comprehension with … | 2020-11-06
Reading Comprehension | B10-10-10 | Funnel-Transformer: Filtering out Sequential Redundancy … | 2020-06-05
Reading Comprehension | DeBERTa-large | DeBERTa: Decoding-enhanced BERT with Disentangled … | 2020-06-05
Question Answering | GPT-3 175B (few-shot, k=32) | Language Models are Few-Shot Learners | 2020-05-28
Reading Comprehension | GPT-3 175B (zero-shot) | Language Models are Few-Shot Learners | 2020-05-28
Reading Comprehension | GPT-3 175B (0-shot) | Language Models are Few-Shot Learners | 2020-05-28

Research Papers

Recent papers with results on this dataset: