Xiao Bi, Deli Chen, Guanting Chen, Shanhuang Chen, Damai Dai, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Zhe Fu, Huazuo Gao, Kaige Gao, Wenjun Gao, Ruiqi Ge, Kang Guan, Daya Guo, Jianzhong Guo, Guangbo Hao, Zhewen Hao, Ying He, Wenjie Hu, Panpan Huang, Erhang Li, Guowei Li, Jiashi Li, Yao Li, Y. K. Li, Wenfeng Liang, Fangyun Lin, A. X. Liu, Bo Liu, Wen Liu, Xiaodong Liu, Xin Liu, Yiyuan Liu, Haoyu Lu, Shanghao Lu, Fuli Luo, Shirong Ma, Xiaotao Nie, Tian Pei, Yishi Piao, Junjie Qiu, Hui Qu, Tongzheng Ren, Zehui Ren, Chong Ruan, Zhangli Sha, Zhihong Shao, Junxiao Song, Xuecheng Su, Jingxiang Sun, Yaofeng Sun, Minghui Tang, Bingxuan Wang, Peiyi Wang, Shiyu Wang, Yaohui Wang, Yongji Wang, Tong Wu, Y. Wu, Xin Xie, Zhenda Xie, Ziwei Xie, Yiliang Xiong, Hanwei Xu, R. X. Xu, Yanhong Xu, Dejian Yang, Yuxiang You, Shuiping Yu, Xingkai Yu, B. Zhang, Haowei Zhang, Lecong Zhang, Liyue Zhang, Mingchuan Zhang, Minghua Zhang, Wentao Zhang, Yichao Zhang, Chenggang Zhao, Yao Zhao, Shangyan Zhou, Shunfeng Zhou, Qihao Zhu, Yuheng Zou, DeepSeek-AI (2024)
The paper introduces DeepSeek LLM, a project focused on scaling open-source large language models (LLMs) with a long-term perspective. It examines scaling laws for model performance, reporting dataset construction and scaling experiments which indicate that data quality shapes the optimal scaling behavior. The authors present a pretraining dataset of 2 trillion tokens, primarily in Chinese and English, and evaluate the DeepSeek LLM 67B model, showing competitive performance against LLaMA-2 70B and GPT-3.5 on benchmarks covering mathematics, coding, and reasoning. The paper emphasizes hyperparameter optimization, scaling behavior, and fine-tuning techniques such as supervised fine-tuning (SFT) and direct preference optimization (DPO) for improving conversational ability and model alignment. Evaluations also include safety assessments intended to keep responses aligned with human values and to mitigate harmful outputs. The study concludes with a discussion of limitations and prospects for future development of the DeepSeek LLM project.
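For orientation on the scaling-law discussion above: compute-optimal allocation results of this kind are conventionally expressed as power laws in the training compute budget. The form below is a generic sketch using standard notation (C, N, D, a, b), not the specific parameterization or fitted coefficients reported in the paper:

    N_{\text{opt}} \propto C^{a}, \qquad D_{\text{opt}} \propto C^{b}, \qquad a + b = 1 \ \text{when} \ C \approx N \cdot D,

where C is the training compute budget, N the model scale, D the number of training tokens, and a, b are empirically fitted exponents. The paper additionally fits power laws of this kind for near-optimal batch size and learning rate as functions of compute.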
This paper employs the following methods: scaling-law analysis of hyperparameters and of model/data allocation, large-scale pretraining, supervised fine-tuning (SFT), and direct preference optimization (DPO).
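As a concrete reference for the DPO method named above, the following is a minimal PyTorch sketch of the standard DPO objective; the function name, tensor shapes, and beta value are illustrative assumptions, and this is not the authors' training code.

# Minimal sketch of the direct preference optimization (DPO) loss, assuming
# per-sequence log-probabilities are already computed for the trained policy and
# a frozen reference model on preferred ("chosen") and dispreferred ("rejected") responses.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp: torch.Tensor,
             policy_rejected_logp: torch.Tensor,
             ref_chosen_logp: torch.Tensor,
             ref_rejected_logp: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Encourage the policy to prefer chosen over rejected responses,
    measured relative to the reference model and scaled by beta."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # -log sigmoid(beta * margin), averaged over the batch
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()

# Toy usage with random per-sequence log-probabilities for a batch of 4 preference pairs.
if __name__ == "__main__":
    lp = lambda: torch.randn(4)
    print(dpo_loss(lp(), lp(), lp(), lp()).item())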
The following datasets were used in this research: a 2-trillion-token pretraining corpus constructed by the authors, primarily in Chinese and English, together with public evaluation benchmarks spanning mathematics, coding, and reasoning.
The authors identify limitations of the current models and discuss prospects for future development of the DeepSeek LLM project.