Zhihong Shao (DeepSeek-AI; Tsinghua University), Peiyi Wang (DeepSeek-AI; Peking University), Qihao Zhu (DeepSeek-AI; Peking University), Runxin Xu (DeepSeek-AI), Junxiao Song (DeepSeek-AI), Xiao Bi (DeepSeek-AI), Haowei Zhang (DeepSeek-AI), Mingchuan Zhang (DeepSeek-AI), Y. K. Li (DeepSeek-AI), Y. Wu (DeepSeek-AI), Daya Guo (DeepSeek-AI) (2024)
This paper introduces DeepSeekMath, a domain-specific language model for mathematical reasoning that significantly outperforms existing open-source models and approaches the performance of proprietary models such as GPT-4. The authors present DeepSeekMath 7B, continually pre-trained on 120 billion math-related tokens mined from Common Crawl, together with natural language and code data. The model scores 51.7% on the competition-level MATH benchmark without external toolkits or voting techniques, and reaches 60.9% on MATH with self-consistency over 64 samples. Key contributions include a carefully engineered data-selection pipeline for high-quality pre-training data and a new reinforcement learning algorithm, Group Relative Policy Optimization (GRPO), a PPO variant that improves memory efficiency while strengthening mathematical reasoning. Evaluations show that DeepSeekMath 7B excels across a range of benchmarks, outperforming much larger models such as Minerva 540B, and also improves on multilingual tasks. Despite these successes, the authors note weaknesses on geometry problems and on certain mathematical tasks the model still cannot handle effectively.
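GRPO's central idea, as described in the paper, is to drop PPO's learned critic and instead baseline each sampled output against the other outputs drawn for the same question. Below is a minimal sketch of that group-relative advantage computation, assuming scalar outcome rewards; the function name and the 0/1 reward scheme are illustrative, not taken from the paper.

```python
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style baseline: score each sampled output relative to the
    other outputs drawn for the same question, instead of querying a
    separate learned value (critic) model as PPO does."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:  # all outputs scored identically; no learning signal
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]

# Example: 4 sampled solutions to one question, rewarded 1.0 if the
# final answer is correct and 0.0 otherwise.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # [1.0, -1.0, -1.0, 1.0]
```

The 60.9% self-consistency figure refers to majority voting over the final answers extracted from 64 sampled solutions per problem; a sketch of that vote (the helper name is hypothetical):

```python
from collections import Counter

def self_consistency_vote(final_answers: list[str]) -> str:
    # Majority vote: the answer produced most often across samples wins.
    return Counter(final_answers).most_common(1)[0][0]
```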
This paper employs the following methods:
- A data-selection pipeline that iteratively mines high-quality math-related pages from Common Crawl
- Continued pre-training on 120 billion math tokens mixed with natural language and code data
- Group Relative Policy Optimization (GRPO), a memory-efficient PPO variant for reinforcement learning
The following datasets were used in this research:
- The DeepSeekMath Corpus: 120 billion math-related tokens sourced from Common Crawl
- Natural language and code data mixed into pre-training
- The MATH benchmark, among other mathematical and multilingual benchmarks, for evaluation
The authors identified the following limitations:
- Weaker capability on geometry problems than on other mathematical domains
- Certain mathematical tasks, such as theorem proving, that the model still cannot handle effectively