QWEN2 TECHNICAL REPORT

An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jianxin Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, Kai Dang, Keming Lu, Keqin Chen, Kexin Yang, Mei Li, Mingfeng Xue, Na Ni, Pei Zhang, Peng Wang, Ru Peng, Rui Men, Ruize Gao, Runji Lin, Shijie Wang, Shuai Bai, Sinan Tan, Tianhang Zhu, Tianhao Li, Tianyu Liu, Wenbin Ge, Xiaodong Deng, Xiaohuan Zhou, Xingzhang Ren, Xinyu Zhang, Xipin Wei, Xuancheng Ren, Xuejing Liu, Yang Fan, Yang Yao, Yichang Zhang, Yu Wan, Yunfei Chu, Yuqiong Liu, Zeyu Cui, Zhenru Zhang, Zhifang Guo, Zhihao Fan (2024)

Paper Information

  • arXiv ID: 2407.10671
  • Venue: arXiv.org
  • Domain: Not specified
  • SOTA Claim: Yes
  • Reproducibility: 8/10

Abstract

This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to 72 billion, featuring dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, and exhibits competitive performance relative to proprietary models across diverse benchmarks on language understanding, generation, multilingual proficiency, coding, mathematics, and reasoning.

Summary

The Qwen2 Technical Report presents the Qwen2 series, a family of language models spanning 0.5 to 72 billion parameters and including both dense and Mixture-of-Experts variants. The series improves on its predecessor Qwen1.5 in language understanding, generation, and multilingual capability, and is competitive with proprietary models. The flagship base model, Qwen2-72B, performs strongly across benchmarks, scoring 84.2 on MMLU and 89.5 on GSM8K, while the instruction-tuned variant, Qwen2-72B-Instruct, posts competitive results on MT-Bench and Arena-Hard. The weights are released through platforms such as Hugging Face and GitHub to facilitate community use and innovation. Key ingredients include architectural refinements to the Transformer, such as Grouped Query Attention and Dual Chunk Attention, and pre-training on over 7 trillion tokens of data that includes multilingual and code-related content. The report also describes ongoing efforts to mitigate benchmark contamination and safety concerns as part of responsible AI development.
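Because the weights are openly released, the models can be loaded with the Hugging Face transformers library. The snippet below is a minimal sketch rather than the authors' code; it assumes the checkpoints are published on the Hub under names of the form Qwen/Qwen2-<size>-Instruct and that the transformers and accelerate packages are installed.

```python
# Minimal sketch: loading and querying an instruction-tuned Qwen2 checkpoint
# with Hugging Face transformers. The checkpoint name is an assumption; swap
# in another released size (e.g. 0.5B, 1.5B, 72B) as needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-7B-Instruct"  # assumed Hub name following the Qwen2 naming pattern
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Build a chat prompt with the model's chat template and generate a reply.
messages = [{"role": "user", "content": "Summarize grouped query attention in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```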

Methods

This paper employs the following methods:

  • Mixture-of-Experts (MoE)
  • Transformer
  • Grouped Query Attention (GQA) (see the sketch after this list)
  • Dual Chunk Attention (DCA)
  • Direct Preference Optimization (DPO)
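To make the attention variant concrete, here is a minimal sketch of Grouped Query Attention, in which several query heads share each key/value head so the KV cache shrinks. The function name, tensor shapes, and head counts are illustrative assumptions, not the report's implementation.

```python
# Minimal sketch of Grouped Query Attention (GQA); hypothetical shapes, not the authors' code.
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v, num_kv_groups):
    """q: (batch, num_q_heads, seq, head_dim); k, v: (batch, num_kv_heads, seq, head_dim),
    with num_q_heads = num_kv_heads * num_kv_groups."""
    batch, num_q_heads, seq, head_dim = q.shape
    num_kv_heads = k.shape[1]
    assert num_q_heads == num_kv_heads * num_kv_groups
    # Each key/value head serves `num_kv_groups` query heads: repeat K and V
    # along the head dimension so they line up with the query heads.
    k = k.repeat_interleave(num_kv_groups, dim=1)
    v = v.repeat_interleave(num_kv_groups, dim=1)
    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
    return F.softmax(scores, dim=-1) @ v

# Toy example: 8 query heads sharing 2 key/value heads (4 query heads per group).
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 2, 16, 64)
v = torch.randn(1, 2, 16, 64)
out = grouped_query_attention(q, k, v, num_kv_groups=4)
print(out.shape)  # torch.Size([1, 8, 16, 64])
```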

Models Used

  • Qwen2-0.5B
  • Qwen2-1.5B
  • Qwen2-7B
  • Qwen2-57B-A14B
  • Qwen2-72B
  • Qwen2-72B-Instruct

Datasets

The following datasets were used in this research:

  • MMLU
  • GPQA
  • HumanEval
  • GSM8K
  • BBH
  • MT-Bench
  • Arena-Hard
  • LiveCodeBench

Evaluation Metrics

  • MMLU
  • GPQA
  • HumanEval
  • GSM8K
  • BBH
  • MT-Bench
  • Arena-Hard
  • LiveCodeBench

Results

  • The base Qwen2-72B model achieves 84.2 on MMLU, 37.9 on GPQA, 64.6 on HumanEval, 89.5 on GSM8K, and 82.4 on BBH.
  • Qwen2-72B-Instruct scores 9.1 on MT-Bench, 48.1 on Arena-Hard, and 35.7 on LiveCodeBench.

Limitations

The authors identified the following limitations:

  • Concerns regarding contamination and safety in data handling.
  • Performance gaps in English comprehension and instruction-following relative to proprietary models.

Technical Requirements

  • Number of GPUs: None specified
  • GPU Type: None specified
