SVAMP

Math Word Problem Solving Benchmark

Showing 23 results | Metric: Execution Accuracy

Top Performing Models

Rank	Model	Paper	Execution Accuracy	Date	Code
1	GPT-4 DUP	Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems	94.20	2024-04-23	📦 whu-zqh/dup
2	GPT-4 (Teaching-Inspired)	Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models	93.90	2024-10-10	📦 sallytan13/teaching-inspired-prompting
3	GPT-4 (Model Selection)	Automatic Model Selection with Large Language Models for Reasoning	93.70	2023-05-23	📦 xuzhao0/model-selection-reasoning
4	GPT-4 (PHP)	Progressive-Hint Prompting Improves Reasoning in Large Language Models	91.90	2023-04-19	📦 chuanyang-Zheng/Progressive-Hint
5	OpenMath-CodeLlama-70B (w/ code) 📚	OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	87.80	2024-02-15	📦 kipok/nemo-skills
6	MathCoder-L-70B 📚	MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning	84.90	2023-10-05	📦 mathllm/mathcoder
7	MMOS-CODE-34B(0-shot) 📚	An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning	80.60	2024-02-23	📦 cyzhh/MMOS
8	MMOS-DeepSeekMath-7B(0-shot) 📚	An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning	79.30	2024-02-23	📦 cyzhh/MMOS
9	MMOS-CODE-7B(0-shot) 📚	An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning	76.40	2024-02-23	📦 cyzhh/MMOS
10	LLaMA 2-Chat	Llama 2: Open Foundation and Fine-Tuned Chat Models	69.20	2023-07-18	📦 facebookresearch/llama 📦 llamafamily/llama-chinese 📦 flagalpha/llama2-chinese

All Papers (23)

Paper	Year	Model	Code
Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems	2024	GPT-4 DUP	whu-zqh/dup
Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models	2024	GPT-4 (Teaching-Inspired)	sallytan13/teaching-inspired-prompting
Automatic Model Selection with Large Language Models for Reasoning	2023	GPT-4 (Model Selection)	xuzhao0/model-selection-reasoning
Progressive-Hint Prompting Improves Reasoning in Large Language Models	2023	GPT-4 (PHP)	chuanyang-Zheng/Progressive-Hint
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	2024	OpenMath-CodeLlama-70B (w/ code)	kipok/nemo-skills
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning	2023	MathCoder-L-70B	mathllm/mathcoder
An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning	2024	MMOS-CODE-34B(0-shot)	cyzhh/MMOS
An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning	2024	MMOS-DeepSeekMath-7B(0-shot)	cyzhh/MMOS
An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning	2024	MMOS-CODE-7B(0-shot)	cyzhh/MMOS
Llama 2: Open Foundation and Fine-Tuned Chat Models	2023	LLaMA 2-Chat	facebookresearch/llama llamafamily/llama-chinese
Math Word Problem Solving by Generating Linguistic Variants of Problem Statements	2023	DeBERTa	starscream-11813/variational-mathematical-reasoning
Large Language Models are Zero-Shot Reasoners	2022	PaLM (zero-shot, CoT)	kojima-takeshi188/zero_shot_cot skytliang/multi-agents-debate
Large Language Models are Zero-Shot Reasoners	2022	PaLM (zero-shot)	kojima-takeshi188/zero_shot_cot skytliang/multi-agents-debate
Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning	2023	SYRELM (Vicuna 13B)	joykirat18/syrelm
ATHENA: Mathematical Reasoning with Thought Expansion	2023	ATHENA (roberta-large)	the-jb/athena-math
Learning Multi-Step Reasoning by Solving Arithmetic Tasks	2023	MsAT-DeductReasoner	TianduoWang/MsAT
Learning to Reason Deductively: Math Word Problem Solving as Complex Relation Extraction	2022	Roberta-DeductReasoner	allanj/deductive-mwp
ATHENA: Mathematical Reasoning with Thought Expansion	2023	ATHENA (roberta-base)	the-jb/athena-math
Are NLP Models really able to Solve Simple Math Word Problems?	2021	Graph2Tree with RoBERTa	arkilpatel/SVAMP debjitpaul/refiner vedantgaur/symbolic-mwp-reasoning
Are NLP Models really able to Solve Simple Math Word Problems?	2021	GTS with RoBERTa	arkilpatel/SVAMP debjitpaul/refiner vedantgaur/symbolic-mwp-reasoning
Are NLP Models really able to Solve Simple Math Word Problems?	2021	LSTM Seq2Seq with RoBERTa	arkilpatel/SVAMP debjitpaul/refiner vedantgaur/symbolic-mwp-reasoning
Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning	2023	SYRELM (GPT-J)	joykirat18/syrelm
Are NLP Models really able to Solve Simple Math Word Problems?	2021	Transformer with RoBERTa	arkilpatel/SVAMP debjitpaul/refiner vedantgaur/symbolic-mwp-reasoning

Model	Paper	Execution Accuracy	Date
GPT-4 DUP	Achieving >97% on GSM8K: Deeply Understanding the…	94.20	2024-04-23
GPT-4 (Teaching-Inspired)	Teaching-Inspired Integrated Prompting Framework:…	93.90	2024-10-10
GPT-4 (Model Selection)	Automatic Model Selection with Large Language Mod…	93.70	2023-05-23
GPT-4 (PHP)	Progressive-Hint Prompting Improves Reasoning in …	91.90	2023-04-19
OpenMath-CodeLlama-70B (w/ code)	OpenMathInstruct-1: A 1.8 Million Math Instructio…	87.80	2024-02-15
MathCoder-L-70B	MathCoder: Seamless Code Integration in LLMs for …	84.90	2023-10-05
MMOS-CODE-34B(0-shot)	An Empirical Study of Data Ability Boundary in LL…	80.60	2024-02-23
MMOS-DeepSeekMath-7B(0-shot)	An Empirical Study of Data Ability Boundary in LL…	79.30	2024-02-23
MMOS-CODE-7B(0-shot)	An Empirical Study of Data Ability Boundary in LL…	76.40	2024-02-23
LLaMA 2-Chat	Llama 2: Open Foundation and Fine-Tuned Chat Mode…	69.20	2023-07-18
DeBERTa	Math Word Problem Solving by Generating Linguisti…	63.50	2023-06-24
PaLM (zero-shot, CoT)	Large Language Models are Zero-Shot Reasoners	62.10	2022-05-24
PaLM (zero-shot)	Large Language Models are Zero-Shot Reasoners	58.80	2022-05-24
SYRELM (Vicuna 13B)	Frugal LMs Trained to Invoke Symbolic Solvers Ach…	56.65	2023-12-09
ATHENA (roberta-large)	ATHENA: Mathematical Reasoning with Thought Expan…	54.80	2023-11-02
MsAT-DeductReasoner	Learning Multi-Step Reasoning by Solving Arithmet…	48.90	2023-06-02
Roberta-DeductReasoner	Learning to Reason Deductively: Math Word Problem…	47.30	2022-03-19
ATHENA (roberta-base)	ATHENA: Mathematical Reasoning with Thought Expan…	45.60	2023-11-02
Graph2Tree with RoBERTa	Are NLP Models really able to Solve Simple Math W…	43.80	2021-03-12
GTS with RoBERTa	Are NLP Models really able to Solve Simple Math W…	41.00	2021-03-12
LSTM Seq2Seq with RoBERTa	Are NLP Models really able to Solve Simple Math W…	40.30	2021-03-12
SYRELM (GPT-J)	Frugal LMs Trained to Invoke Symbolic Solvers Ach…	40.10	2023-12-09
Transformer with RoBERTa	Are NLP Models really able to Solve Simple Math W…	38.90	2021-03-12