← ML Research Wiki / 2306.08568

WizardCoder: Empowering Code Large Language Models with Evol-Instruct

Ziyang Luo, Can Xu, Pu Zhao, Qingfeng Sun, Xiubo Geng, Wenxiang Hu, Chongyang Tao, Jing Ma, Qingwei Lin, Daxin Jiang (Hong Kong Baptist University; Microsoft, 2023)

Paper Information
arXiv ID
2306.08568
Venue
International Conference on Learning Representations
Domain
Natural language processing and software engineering
SOTA Claim
Yes
Code
Available
Reproducibility
8/10

Abstract

Code Large Language Models (Code LLMs), such as StarCoder, have demonstrated exceptional performance in code-related tasks. However, most existing models are solely pre-trained on extensive raw code data without instruction fine-tuning. In this paper, we introduce WizardCoder, which empowers Code LLMs with complex instruction fine-tuning, by adapting the Evol-Instruct method to the domain of code. Through comprehensive experiments on four prominent code generation benchmarks, namely HumanEval, HumanEval+, MBPP, and DS-1000, we unveil the exceptional capabilities of our model. It surpasses all other open-source Code LLMs by a substantial margin. Moreover, our model even outperforms the largest closed LLMs, Anthropic's Claude and Google's Bard, on HumanEval and HumanEval+. Our code, model weights, and data are public at https://github.com/nlpxucan/WizardLM.

Summary

The paper introduces WizardCoder, a Code Large Language Model (LLM) that leverages the Evol-Instruct method to enhance code generation capabilities through instruction fine-tuning. Unlike prior models that rely solely on extensive raw code data, WizardCoder focuses on creating complex instruction datasets specific to coding tasks. The authors conducted experiments on four code generation benchmarks: HumanEval, HumanEval+, MBPP, and DS-1000, demonstrating that WizardCoder outperforms existing open-source and even some closed-source models, including Claude and Bard. The paper details the methodology, experimental setup, and results, providing insights into the effectiveness of the Evol-Instruct adaptations for code-related tasks.

Methods

This paper employs the following methods:

  • Evol-Instruct
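Evol-Instruct iteratively rewrites seed instructions into more complex variants using an LLM, producing a fine-tuning dataset of progressively harder coding tasks. The loop can be sketched as follows; the evolution prompts below are paraphrases in the spirit of the paper, and `llm_complete` is a hypothetical prompt-to-completion callable, not the paper's actual templates or API:

```python
import random

# Paraphrased code-evolution heuristics; the original work uses its own
# carefully worded prompt templates.
EVOLUTION_PROMPTS = [
    "Add new constraints and requirements to the following programming task.",
    "Replace a common requirement in the task with a less common one.",
    "If the task can be solved in only a few steps, add more reasoning steps.",
    "Provide a piece of erroneous code as a misleading reference.",
    "Propose a higher time or space complexity requirement.",
]

def evolve_instruction(instruction: str, llm_complete, rounds: int = 1) -> str:
    """Iteratively rewrite an instruction into a harder variant.

    `llm_complete` is a hypothetical callable mapping a prompt string
    to the model's completion string.
    """
    for _ in range(rounds):
        heuristic = random.choice(EVOLUTION_PROMPTS)
        prompt = (
            f"{heuristic}\n\n#Given Task#:\n{instruction}\n\n#Rewritten Task#:\n"
        )
        instruction = llm_complete(prompt).strip()
    return instruction
```

In the paper, instructions evolved this way (seeded from a code instruction dataset) are then used to fine-tune StarCoder, yielding WizardCoder.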

Models Used

  • WizardCoder
  • StarCoder
  • Claude
  • Bard

Datasets

The following datasets were used in this research:

  • HumanEval
  • HumanEval+
  • MBPP
  • DS-1000
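Benchmarks such as HumanEval score a model by functional correctness: each generated completion is appended to the problem's prompt, executed, and run against hidden unit tests. A minimal sketch of that check, assuming the benchmark supplies `prompt`, `test_code` (defining a `check` function), and `entry_point` fields; real harnesses additionally sandbox execution with timeouts, which this sketch omits:

```python
def check_candidate(prompt: str, completion: str,
                    test_code: str, entry_point: str) -> bool:
    """Functionally check one candidate, HumanEval-style:
    execute prompt + completion to define the function, then run
    the benchmark's `check` against it. Returns True on pass.
    """
    env: dict = {}
    try:
        exec(prompt + completion, env)   # defines the target function
        exec(test_code, env)             # defines `check(candidate)`
        env["check"](env[entry_point])   # raises AssertionError on failure
        return True
    except Exception:
        return False
```

Running many sampled completions through a check like this yields the per-problem pass counts from which pass@k is estimated.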

Evaluation Metrics

  • pass@1
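pass@1 is the probability that a single sampled completion passes all unit tests. In the common unbiased formulation (from the HumanEval paper), pass@k = 1 - C(n-c, k)/C(n, k) for n samples of which c are correct; for k = 1 this reduces to c/n. A small numerically stable sketch:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k),
    computed as a stable running product over k factors.
    n = total samples, c = correct samples."""
    if n - c < k:
        return 1.0  # every size-k subset contains a correct sample
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Example: 10 samples, 3 correct -> pass@1 is about 0.3 (i.e., c/n)
p = pass_at_k(10, 3, 1)
```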

Results

  • WizardCoder surpasses all open-source Code LLMs by a substantial margin in terms of code generation
  • Outperforms closed-source models like Claude and Bard on HumanEval and HumanEval+ benchmarks

Limitations

The authors identified the following limitations:

  • Despite achieving impressive performance, WizardCoder still falls significantly behind the SOTA LLM, GPT-4. Future work will address this gap.

Technical Requirements

  • Number of GPUs: None specified
  • GPU Type: None specified

Keywords

  • Large Language Models
  • Code generation
  • Instruction fine-tuning
  • Evol-Instruct
  • Benchmarking
