MATH

Math Word Problem Solving Benchmark

Performance Over Time

Showing 132 results | Metric: Accuracy

Top Performing Models

| Rank | Model | Paper | Accuracy (%) | Date | Code |
|---|---|---|---|---|---|
| 1 | Qwen2.5-Math-72B-Instruct(TIR,Greedy) | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement | 88.10 | 2024-09-18 | - |
| 2 | GPT-4 Turbo (MACM, w/code, voting) | MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems | 87.92 | 2024-04-06 | bin123apple/macm |
| 3 | Qwen2.5-Math-72B-Instruct(COT,Greedy) | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement | 85.90 | 2024-09-18 | - |
| 4 | Qwen2.5-Math-7B-Instruct(TIR,Greedy) | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement | 85.20 | 2024-09-18 | - |
| 5 | GPT-4-code model (CSV, w/ code, SC, k=16) | Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification | 84.30 | 2023-08-15 | kipok/nemo-skills |
| 6 | Qwen2-Math-72B-Instruct(greedy) | Qwen2 Technical Report | 84.00 | 2024-07-15 | qwenlm/qwen1.5, qwenlm/qwen2, vicentvankor/sun-shine |
| 7 | Qwen2.5-Math-7B-Instruct(COT,Greedy) | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement | 83.60 | 2024-09-18 | - |
| 8 | Qwen2.5-Math-1.5B-Instruct(TIR,Greedy) | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement | 79.90 | 2024-09-18 | - |
| 9 | OpenMath2-Llama3.1-70B (majority@256) | OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data | 79.60 | 2024-10-02 | NVIDIA/NeMo-Skills |
| 10 | OpenMath2-Llama3.1-8B (majority@256) | OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data | 76.10 | 2024-10-02 | NVIDIA/NeMo-Skills |

All Papers (132)