NewsQA

Dataset Information
Modalities: Texts
Languages: English
Introduced: 2017

Overview

The NewsQA dataset is a crowd-sourced machine reading comprehension dataset of 120,000 question-answer pairs.

  • Documents are CNN news articles.
  • Questions are written by human users in natural language.
  • Answers may be multiword passages of the source text.
  • Questions may be unanswerable.
  • NewsQA is collected using a three-stage, siloed process:
      • Questioners see only an article’s headline and highlights.
      • Answerers see the question and the full article, then select an answer passage.
      • Validators see the article, the question, and a set of answers, which they rank.
  • According to its authors, NewsQA is more natural and more challenging than previous reading comprehension datasets.
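The answer format described above (answers as passages of the article, possibly no answer at all, gathered from several independent answerers) can be modeled with a small data structure. The following is a minimal Python sketch under stated assumptions: the class NewsQAExample, its field names, the (start, end) character-offset span encoding, and the helpers consensus_span and answer_text are all hypothetical and do not reflect the official NewsQA file format. Because NewsQA validators rank candidate answers, the plurality vote used here is a simplification, not the dataset's actual validation procedure.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical span encoding: (start, end) character offsets into the article, end exclusive.
Span = Tuple[int, int]


@dataclass
class NewsQAExample:
    """Hypothetical record layout; field names are illustrative, not the official schema."""
    article: str   # full article text (what answerers see)
    question: str  # written by a questioner who saw only the headline and highlights
    # One entry per answerer; None means the answerer found no answer in the article.
    answer_spans: List[Optional[Span]] = field(default_factory=list)


def consensus_span(spans: List[Optional[Span]]) -> Optional[Span]:
    """Return the most frequent answer (None counts as a vote for "no answer").

    Simplified plurality vote standing in for validation; not NewsQA's exact procedure.
    Ties go to the answer seen first. An empty list yields None.
    """
    if not spans:
        return None
    winner, _count = Counter(spans).most_common(1)[0]
    return winner


def answer_text(example: NewsQAExample) -> Optional[str]:
    """Slice the consensus span out of the article, or return None if unanswerable."""
    span = consensus_span(example.answer_spans)
    if span is None:
        return None
    start, end = span
    return example.article[start:end]


# Usage with made-up data: two answerers chose "Miami", one chose a longer passage.
article = "The storm hit Miami on Tuesday, officials said."
ex = NewsQAExample(article, "Where did the storm hit?", [(14, 19), (14, 19), (14, 30)])
print(answer_text(ex))  # → Miami
```

Representing "no answer" as None inside the span list lets unanswerable questions go through the same vote as ordinary answers, so a plurality of None votes marks the question as unanswerable.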

Source: https://www.microsoft.com/en-us/research/project/newsqa-dataset/
Image Source: Trischler et al.

Variants: NewsQA

Associated Benchmarks

This dataset is used in one benchmark.

Recent Benchmark Submissions

Task | Model | Paper | Date
Question Answering | OpenAI/o3-2025-01-31-high | o3-mini vs DeepSeek-R1: Which One … | 2025-01-30
Question Answering | deepseek-r1 | DeepSeek-R1: Incentivizing Reasoning Capability in … | 2025-01-22
Question Answering | OpenAI/GPT-4o | GPT-4o as the Gold Standard: … | 2024-10-03
Question Answering | Google/Gemini 2.5 Pro | Gemini 1.5: Unlocking multimodal understanding … | 2024-03-08
Question Answering | DyREX | DyREx: Dynamic Query Representation for … | 2022-10-26
Question Answering | OpenAI/o1-2024-12-17-high | 0/1 Deep Neural Networks via … | 2022-06-19
Question Answering | Riple/Saanvi-v0.1 | Time-series Transformer Generative Adversarial Networks | 2022-05-23
Question Answering | LinkBERT (large) | LinkBERT: Pretraining Language Models with … | 2022-03-29
Question Answering | xAI/grok-3-1212 | XAI for Transformers: Better Explanations … | 2022-02-15
Question Answering | OpenAI/o4-mini-2025-05-01-high | Thinking Like Transformers | 2021-06-13
Question Answering | SpanBERT | SpanBERT: Improving Pre-training by Representing … | 2019-07-24
Question Answering | DecaProp | Densely Connected Attention Propagation for … | 2018-11-10
Question Answering | MINIMAL(Dyn) | Efficient and Robust Question Answering … | 2018-05-21
Question Answering | AMANDA | A Question-Focused Multi-Factor Attention Network … | 2018-01-25
Question Answering | FastQAExt | Making Neural QA as Simple … | 2017-03-14
Question Answering | Riple/Saanvi-v0.5-DeepAnalysis | DeepSense: A Unified Deep Learning … | 2016-11-07

Research Papers

Recent papers with results on this dataset: