MAWPS

Math Word Problem Solving Benchmark

Performance Over Time

Showing 15 results | Metric: Accuracy (%)

Top Performing Models

Rank	Model	Paper	Accuracy (%)	Date	Code
1	OpenMath-CodeLlama-70B (w/ code) 📚	OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	95.70	2024-02-15	📦 kipok/nemo-skills
2	MsAT-DeductReasoner	Learning Multi-Step Reasoning by Solving Arithmetic Tasks	94.30	2023-06-02	📦 TianduoWang/MsAT
3	ATHENA (roberta-large)	ATHENA: Mathematical Reasoning with Thought Expansion	93.00	2023-11-02	📦 the-jb/athena-math
4	Multi-view 📚	Multi-View Reasoning: Consistent Contrastive Learning for Math Word Problem	92.30	2022-10-21	📦 zwq2018/multi-view-consistency-for-mwp
5	Exp-Tree	An Expression Tree Decoding Strategy for Mathematical Equation Generation	92.30	2023-10-14	📦 zwq2018/multi-view-consistency-for-mwp
6	ATHENA (roberta-base)	ATHENA: Mathematical Reasoning with Thought Expansion	92.20	2023-11-02	📦 the-jb/athena-math
7	Roberta-DeductReasoner	Learning to Reason Deductively: Math Word Problem Solving as Complex Relation Extraction	92.00	2022-03-19	📦 allanj/deductive-mwp
8	DeBERTa (PM + VM) 📚	Math Word Problem Solving by Generating Linguistic Variants of Problem Statements	91.00	2023-06-24	📦 starscream-11813/variational-mathematical-reasoning
9	Graph2Tree with RoBERTa	Are NLP Models really able to Solve Simple Math Word Problems?	88.70	2021-03-12	📦 arkilpatel/SVAMP 📦 debjitpaul/refiner 📦 vedantgaur/symbolic-mwp-reasoning
10	GTS with RoBERTa	Are NLP Models really able to Solve Simple Math Word Problems?	88.50	2021-03-12	📦 arkilpatel/SVAMP 📦 debjitpaul/refiner 📦 vedantgaur/symbolic-mwp-reasoning

All Papers (15)

Paper	Year	Model	Code
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	2024	OpenMath-CodeLlama-70B (w/ code)	kipok/nemo-skills
Learning Multi-Step Reasoning by Solving Arithmetic Tasks	2023	MsAT-DeductReasoner	TianduoWang/MsAT
ATHENA: Mathematical Reasoning with Thought Expansion	2023	ATHENA (roberta-large)	the-jb/athena-math
Multi-View Reasoning: Consistent Contrastive Learning for Math Word Problem	2022	Multi-view	zwq2018/multi-view-consistency-for-mwp
An Expression Tree Decoding Strategy for Mathematical Equation Generation	2023	Exp-Tree	zwq2018/multi-view-consistency-for-mwp
ATHENA: Mathematical Reasoning with Thought Expansion	2023	ATHENA (roberta-base)	the-jb/athena-math
Learning to Reason Deductively: Math Word Problem Solving as Complex Relation Extraction	2022	Roberta-DeductReasoner	allanj/deductive-mwp
Math Word Problem Solving by Generating Linguistic Variants of Problem Statements	2023	DeBERTa (PM + VM)	starscream-11813/variational-mathematical-reasoning
Are NLP Models really able to Solve Simple Math Word Problems?	2021	Graph2Tree with RoBERTa	arkilpatel/SVAMP debjitpaul/refiner vedantgaur/symbolic-mwp-reasoning
Are NLP Models really able to Solve Simple Math Word Problems?	2021	GTS with RoBERTa	arkilpatel/SVAMP debjitpaul/refiner vedantgaur/symbolic-mwp-reasoning
Llama 2: Open Foundation and Fine-Tuned Chat Models	2023	LLaMA 2-Chat	facebookresearch/llama llamafamily/llama-chinese
Math Word Problem Solving by Generating Linguistic Variants of Problem Statements	2023	GPT-3.5 turbo (175B)	starscream-11813/variational-mathematical-reasoning
Math Word Problem Solving by Generating Linguistic Variants of Problem Statements	2023	GPT-J	starscream-11813/variational-mathematical-reasoning
Math Word Problem Solving by Generating Linguistic Variants of Problem Statements	2023	GPT-3 text-curie-001 (13B)	starscream-11813/variational-mathematical-reasoning
Math Word Problem Solving by Generating Linguistic Variants of Problem Statements	2023	GPT-3 text-babbage-001 (6.7B)	starscream-11813/variational-mathematical-reasoning

Model	Paper	Accuracy (%)	Date
OpenMath-CodeLlama-70B (w/ code)	OpenMathInstruct-1: A 1.8 Million Math Instructio…	95.70	2024-02-15
MsAT-DeductReasoner	Learning Multi-Step Reasoning by Solving Arithmet…	94.30	2023-06-02
ATHENA (roberta-large)	ATHENA: Mathematical Reasoning with Thought Expan…	93.00	2023-11-02
Multi-view	Multi-View Reasoning: Consistent Contrastive Lear…	92.30	2022-10-21
Exp-Tree	An Expression Tree Decoding Strategy for Mathemat…	92.30	2023-10-14
ATHENA (roberta-base)	ATHENA: Mathematical Reasoning with Thought Expan…	92.20	2023-11-02
Roberta-DeductReasoner	Learning to Reason Deductively: Math Word Problem…	92.00	2022-03-19
DeBERTa (PM + VM)	Math Word Problem Solving by Generating Linguisti…	91.00	2023-06-24
Graph2Tree with RoBERTa	Are NLP Models really able to Solve Simple Math W…	88.70	2021-03-12
GTS with RoBERTa	Are NLP Models really able to Solve Simple Math W…	88.50	2021-03-12
LLaMA 2-Chat	Llama 2: Open Foundation and Fine-Tuned Chat Mode…	82.40	2023-07-18
GPT-3.5 turbo (175B)	Math Word Problem Solving by Generating Linguisti…	80.30	2023-06-24
GPT-J	Math Word Problem Solving by Generating Linguisti…	9.90	2023-06-24
GPT-3 text-curie-001 (13B)	Math Word Problem Solving by Generating Linguisti…	4.09	2023-06-24
GPT-3 text-babbage-001 (6.7B)	Math Word Problem Solving by Generating Linguisti…	2.76	2023-06-24
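
The rows above can also be read as a timeline. The following is a minimal sketch, not part of the benchmark itself, that lists which entries set a new highest accuracy as of their date. It assumes the Date column is the date each result was published; the names `RESULTS` and `best_over_time` are hypothetical and introduced only for this illustration.

```python
from datetime import date

# (model, accuracy %, date) rows copied from the results table above.
RESULTS = [
    ("OpenMath-CodeLlama-70B (w/ code)", 95.70, "2024-02-15"),
    ("MsAT-DeductReasoner", 94.30, "2023-06-02"),
    ("ATHENA (roberta-large)", 93.00, "2023-11-02"),
    ("Multi-view", 92.30, "2022-10-21"),
    ("Exp-Tree", 92.30, "2023-10-14"),
    ("ATHENA (roberta-base)", 92.20, "2023-11-02"),
    ("Roberta-DeductReasoner", 92.00, "2022-03-19"),
    ("DeBERTa (PM + VM)", 91.00, "2023-06-24"),
    ("Graph2Tree with RoBERTa", 88.70, "2021-03-12"),
    ("GTS with RoBERTa", 88.50, "2021-03-12"),
    ("LLaMA 2-Chat", 82.40, "2023-07-18"),
    ("GPT-3.5 turbo (175B)", 80.30, "2023-06-24"),
    ("GPT-J", 9.90, "2023-06-24"),
    ("GPT-3 text-curie-001 (13B)", 4.09, "2023-06-24"),
    ("GPT-3 text-babbage-001 (6.7B)", 2.76, "2023-06-24"),
]


def best_over_time(rows):
    """Return the entries that set a new highest accuracy, in date order."""
    # Sort by date, then by accuracy (descending) so that the best
    # result published on a given day is considered first.
    ordered = sorted(rows, key=lambda r: (date.fromisoformat(r[2]), -r[1]))
    frontier = []
    best = float("-inf")
    for model, acc, day in ordered:
        # Only a strictly higher accuracy counts as a new best; ties do not.
        if acc > best:
            frontier.append((day, model, acc))
            best = acc
    return frontier


for day, model, acc in best_over_time(RESULTS):
    print(f"{day}  {acc:6.2f}  {model}")
```

Under these assumptions, five of the 15 entries set a new best when they were published: Graph2Tree with RoBERTa, Roberta-DeductReasoner, Multi-view, MsAT-DeductReasoner, and OpenMath-CodeLlama-70B (w/ code). Exp-Tree ties Multi-view at 92.30 but appears later, so it is not counted.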