SVAMP

Math Word Problem Solving Benchmark

Metric: Execution Accuracy (%). 23 results are reported in total; the top 10 are listed below.
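The page does not define how execution accuracy is computed, and the papers in the table may extract answers differently. As a rough illustration only, here is a minimal Python sketch of one plausible reading. It assumes that each model emits a Python program that stores its result in a variable named `answer` (a hypothetical convention). It also assumes that a problem counts as solved when the executed value matches the gold numeric answer within a small tolerance, and that failed executions count as wrong. `run_program`, `execution_accuracy`, `SAFE_BUILTINS`, and the `tol` parameter are illustrative names, not part of any benchmark harness.

```python
import math
from typing import Optional, Sequence

# Hypothetical whitelist: only a few harmless builtins are exposed to generated code.
SAFE_BUILTINS = {"abs": abs, "round": round, "int": int, "float": float,
                 "min": min, "max": max, "sum": sum}


def run_program(program: str) -> Optional[float]:
    """Execute a generated program and return its `answer` variable as a float.

    Assumes the (hypothetical) convention that the final result is stored in
    `answer`. Any exception, a missing `answer`, or a non-numeric value is
    treated as a failed execution and returns None.
    """
    env = {"__builtins__": SAFE_BUILTINS}
    try:
        exec(program, env)  # NOTE: unsafe on untrusted output; sandbox in practice
        return float(env["answer"])
    except Exception:
        return None


def execution_accuracy(programs: Sequence[str], gold: Sequence[float],
                       tol: float = 1e-4) -> float:
    """Percentage of programs whose executed answer matches gold within `tol`."""
    if len(programs) != len(gold):
        raise ValueError("programs and gold answers must align one-to-one")
    if not gold:
        raise ValueError("need at least one problem")
    correct = 0
    for program, target in zip(programs, gold):
        predicted = run_program(program)
        # Absolute tolerance only, since word-problem answers are plain numbers.
        if predicted is not None and math.isclose(predicted, target,
                                                  rel_tol=0.0, abs_tol=tol):
            correct += 1
    return 100.0 * correct / len(gold)


# Usage: three toy problems, one solved, one wrong, one that crashes.
programs = [
    "apples = 5\neaten = 2\nanswer = apples - eaten",  # 3.0, matches gold 3
    "answer = 12 / 4",                                  # 3.0, gold is 4
    "answer = undefined_name + 1",                      # NameError, counted wrong
]
gold = [3, 4, 7]
print(round(execution_accuracy(programs, gold), 2))  # 33.33
```

Treating crashes as incorrect, rather than skipping them, keeps the denominator fixed at the number of problems. This way a model cannot raise its score by failing to execute on hard items. The restricted builtins only limit accidental side effects; they are not a security boundary.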

Top Performing Models

Rank | Model | Paper | Execution Accuracy (%) | Date | Code
1 | GPT-4 DUP | Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems | 94.20 | 2024-04-23 | whu-zqh/dup
2 | GPT-4 (Teaching-Inspired) | Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models | 93.90 | 2024-10-10 | sallytan13/teaching-inspired-prompting
3 | GPT-4 (Model Selection) | Automatic Model Selection with Large Language Models for Reasoning | 93.70 | 2023-05-23 | xuzhao0/model-selection-reasoning
4 | GPT-4 (PHP) | Progressive-Hint Prompting Improves Reasoning in Large Language Models | 91.90 | 2023-04-19 | chuanyang-Zheng/Progressive-Hint
5 | OpenMath-CodeLlama-70B (w/ code) 📚 | OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset | 87.80 | 2024-02-15 | kipok/nemo-skills
6 | MathCoder-L-70B 📚 | MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning | 84.90 | 2023-10-05 | mathllm/mathcoder
7 | MMOS-CODE-34B(0-shot) 📚 | An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning | 80.60 | 2024-02-23 | cyzhh/MMOS
8 | MMOS-DeepSeekMath-7B(0-shot) 📚 | An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning | 79.30 | 2024-02-23 | cyzhh/MMOS
9 | MMOS-CODE-7B(0-shot) 📚 | An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning | 76.40 | 2024-02-23 | cyzhh/MMOS
10 | LLaMA 2-Chat | Llama 2: Open Foundation and Fine-Tuned Chat Models | 69.20 | 2023-07-18 | facebookresearch/llama, llamafamily/llama-chinese, flagalpha/llama2-chinese

All Papers (23)