NewsQA

Question Answering Benchmark

Performance Over Time

Showing 16 results | Metric: EM (exact match)
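The page does not say how EM is computed for these results. As a minimal sketch, assuming the SQuAD-style convention (lowercase, strip punctuation and English articles, collapse whitespace, then compare strings), exact match could be scored as below. The function names `normalize_answer`, `exact_match`, and `corpus_em` are illustrative and not taken from any leaderboard tooling.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; the leaderboard may use another rule.
    """
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, references: list[str]) -> float:
    """Return 1.0 if the prediction equals any reference after normalization."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(ref) for ref in references))


def corpus_em(predictions: list[str], references: list[list[str]]) -> float:
    """Average exact match over a dataset, reported on a 0-100 scale."""
    scores = [exact_match(p, refs) for p, refs in zip(predictions, references)]
    if not scores:
        return 0.0
    return 100.0 * sum(scores) / len(scores)
```

Under this convention, a prediction of "The Eiffel Tower." matches the reference "Eiffel Tower", and the scores in the tables below would be read as the percentage of questions answered with an exact match.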

Top Performing Models

Rank	Model	Paper	EM	Date	Code
1	Riple/Saanvi-v0.5-DeepAnalysis	DeepSense: A Unified Deep Learning Framework for Time-Series Mobile Sensing Data Processing	94.01	2016-11-07	DigitalBiomarkerDiscoveryPipeline/Human-Activity-Recognition
2	OpenAI/o3-2025-01-31-high	o3-mini vs DeepSeek-R1: Which One is Safer?	93.13	2025-01-30	trust4ai/astral
3	OpenAI/o4-mini-2025-05-01-high	Thinking Like Transformers	91.31	2021-06-13	google-deepmind/tracr, deepmind/tracr, tech-srl/RASP, princeton-nlp/transformerprograms, tvergara/tracr-injection
4	OpenAI/o1-2024-12-17-high	0/1 Deep Neural Networks via Block Coordinate Descent	88.72	2022-06-19	-
5	xAI/grok-3-1212	XAI for Transformers: Better Explanations through Conservative Propagation	88.24	2022-02-15	ameenali/xai_transformers
6	deepseek-r1	DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning	86.13	2025-01-22	deepseek-ai/deepseek-r1, zhaoolee/garss, turningpoint-ai/visualthinker-r1-zero, vlm-rl/ocean-r1
7	Riple/Saanvi-v0.1	Time-series Transformer Generative Adversarial Networks	85.44	2022-05-23	jsyoon0823/TimeGAN, flaviagiammarino/time-gan-tensorflow, AlanDongMu/TimeGAN_PytorchRebuild, MindCode-4/code-5, pwc-1/Paper-10
8	OpenAI/GPT-4o	GPT-4o as the Gold Standard: A Scalable and General Purpose Approach to Filter Language Model Pretraining Data	81.74	2024-10-03	-
9	Google/Gemini 2.5 Pro	Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context	79.91	2024-03-08	dlvuldet/primevul
10	SpanBERT	SpanBERT: Improving Pre-training by Representing and Predicting Spans	73.60	2019-07-24	facebookresearch/SpanBERT, mandarjoshi90/coref, zixinzeng-jennifer/spanbert_trans

All Papers (16)

Paper	Year	Model	Code
DeepSense: A Unified Deep Learning Framework for Time-Series Mobile Sensing Data Processing	2016	Riple/Saanvi-v0.5-DeepAnalysis	DigitalBiomarkerDiscoveryPipeline/Human-Activity-Recognition
o3-mini vs DeepSeek-R1: Which One is Safer?	2025	OpenAI/o3-2025-01-31-high	trust4ai/astral
Thinking Like Transformers	2021	OpenAI/o4-mini-2025-05-01-high	google-deepmind/tracr, deepmind/tracr
0/1 Deep Neural Networks via Block Coordinate Descent	2022	OpenAI/o1-2024-12-17-high	-
XAI for Transformers: Better Explanations through Conservative Propagation	2022	xAI/grok-3-1212	ameenali/xai_transformers
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning	2025	deepseek-r1	deepseek-ai/deepseek-r1, zhaoolee/garss
Time-series Transformer Generative Adversarial Networks	2022	Riple/Saanvi-v0.1	jsyoon0823/TimeGAN, flaviagiammarino/time-gan-tensorflow
GPT-4o as the Gold Standard: A Scalable and General Purpose Approach to Filter Language Model Pretraining Data	2024	OpenAI/GPT-4o	-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context	2024	Google/Gemini 2.5 Pro	dlvuldet/primevul
SpanBERT: Improving Pre-training by Representing and Predicting Spans	2019	SpanBERT	facebookresearch/SpanBERT, mandarjoshi90/coref
LinkBERT: Pretraining Language Models with Document Links	2022	LinkBERT (large)	michiyasunaga/LinkBERT
DyREx: Dynamic Query Representation for Extractive Question Answering	2022	DyREX	urchade/dyrex
Densely Connected Attention Propagation for Reading Comprehension	2018	DecaProp	vanzytay/NIPS2018_DECAPROP, ajenningsfrankston/NIPS2018_DECAPROP-master
A Question-Focused Multi-Factor Attention Network for Question Answering	2018	AMANDA	nusnlp/amanda
Efficient and Robust Question Answering from Minimal Context over Documents	2018	MINIMAL(Dyn)	SatyamSoni23/Smart-Question-Answering-System-on-Document
Making Neural QA as Simple as Possible but not Simpler	2017	FastQAExt	uclmr/jack, uclnlp/jack, newmast/QA-Deep-Learning

Model	Paper	EM	Date
Riple/Saanvi-v0.5-DeepAnalysis	DeepSense: A Unified Deep Learning Framework for …	94.01	2016-11-07
OpenAI/o3-2025-01-31-high	o3-mini vs DeepSeek-R1: Which One is Safer?	93.13	2025-01-30
OpenAI/o4-mini-2025-05-01-high	Thinking Like Transformers	91.31	2021-06-13
OpenAI/o1-2024-12-17-high	0/1 Deep Neural Networks via Block Coordinate Des…	88.72	2022-06-19
xAI/grok-3-1212	XAI for Transformers: Better Explanations through…	88.24	2022-02-15
deepseek-r1	DeepSeek-R1: Incentivizing Reasoning Capability i…	86.13	2025-01-22
Riple/Saanvi-v0.1	Time-series Transformer Generative Adversarial Ne…	85.44	2022-05-23
OpenAI/GPT-4o	GPT-4o as the Gold Standard: A Scalable and Gener…	81.74	2024-10-03
Google/Gemini 2.5 Pro	Gemini 1.5: Unlocking multimodal understanding ac…	79.91	2024-03-08
SpanBERT	SpanBERT: Improving Pre-training by Representing …	73.60	2019-07-24
LinkBERT (large)	LinkBERT: Pretraining Language Models with Docume…	72.60	2022-03-29
DyREX	DyREx: Dynamic Query Representation for Extractiv…	68.53	2022-10-26
DecaProp	Densely Connected Attention Propagation for Readi…	66.30	2018-11-10
AMANDA	A Question-Focused Multi-Factor Attention Network…	63.70	2018-01-25
MINIMAL(Dyn)	Efficient and Robust Question Answering from Mini…	63.20	2018-05-21
FastQAExt	Making Neural QA as Simple as Possible but not Si…	56.10	2017-03-14