
NewsQA

Question Answering Benchmark

Performance Over Time

Showing 16 results | Metric: EM
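The page does not spell out how the EM metric is computed. Assuming EM means exact match as used in SQuAD-style extractive QA evaluation (lowercase, strip punctuation and English articles, collapse whitespace, then compare strings), a minimal sketch might look like the following. The function names `normalize_answer`, `exact_match`, and `em_score` are illustrative and are not taken from any official NewsQA evaluation script; the benchmark's actual normalization rules may differ.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Assumed SQuAD-style normalization; the real benchmark may differ."""
    text = text.lower()
    # Drop ASCII punctuation characters.
    text = "".join(ch for ch in text if ch not in string.punctuation)
    # Remove English articles as whole words.
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    # Collapse runs of whitespace.
    return " ".join(text.split())


def exact_match(prediction: str, references: list[str]) -> bool:
    """True if the prediction equals any reference after normalization."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(ref) for ref in references)


def em_score(predictions: list[str], references: list[list[str]]) -> float:
    """Percentage of predictions that exactly match a reference answer."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)
```

Under these assumptions, `em_score(["The Eiffel Tower.", "Paris"], [["Eiffel Tower"], ["London"]])` scores one hit out of two, so scores on the 0 to 100 scale used in the table would be read as the percentage of questions answered exactly.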

Top Performing Models

| Rank | Model | Paper | EM | Date | Code |
|------|-------|-------|----|------|------|
| 1 | Riple/Saanvi-v0.5-DeepAnalysis | DeepSense: A Unified Deep Learning Framework for Time-Series Mobile Sensing Data Processing | 94.01 | 2016-11-07 | DigitalBiomarkerDiscoveryPipeline/Human-Activity-Recognition |
| 2 | OpenAI/o3-2025-01-31-high | o3-mini vs DeepSeek-R1: Which One is Safer? | 93.13 | 2025-01-30 | trust4ai/astral |
| 3 | OpenAI/o4-mini-2025-05-01-high | Thinking Like Transformers | 91.31 | 2021-06-13 | google-deepmind/tracr, deepmind/tracr, tech-srl/RASP, princeton-nlp/transformerprograms, tvergara/tracr-injection |
| 4 | OpenAI/o1-2024-12-17-high | 0/1 Deep Neural Networks via Block Coordinate Descent | 88.72 | 2022-06-19 | - |
| 5 | xAI/grok-3-1212 | XAI for Transformers: Better Explanations through Conservative Propagation | 88.24 | 2022-02-15 | ameenali/xai_transformers |
| 6 | deepseek-r1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | 86.13 | 2025-01-22 | deepseek-ai/deepseek-r1, zhaoolee/garss, turningpoint-ai/visualthinker-r1-zero, vlm-rl/ocean-r1 |
| 7 | Riple/Saanvi-v0.1 | Time-series Transformer Generative Adversarial Networks | 85.44 | 2022-05-23 | jsyoon0823/TimeGAN, flaviagiammarino/time-gan-tensorflow, AlanDongMu/TimeGAN_PytorchRebuild, MindCode-4/code-5, pwc-1/Paper-10 |
| 8 | OpenAI/GPT-4o | GPT-4o as the Gold Standard: A Scalable and General Purpose Approach to Filter Language Model Pretraining Data | 81.74 | 2024-10-03 | - |
| 9 | Google/Gemini 2.5 Pro | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | 79.91 | 2024-03-08 | dlvuldet/primevul |
| 10 | SpanBERT | SpanBERT: Improving Pre-training by Representing and Predicting Spans | 73.60 | 2019-07-24 | facebookresearch/SpanBERT, mandarjoshi90/coref, zixinzeng-jennifer/spanbert_trans |

All Papers (16)