← ML Research Wiki / 2306.08568

WizardCoder: Empowering Code Large Language Models with Evol-Instruct

Ziyang Luo, Can Xu, Pu Zhao, Qingfeng Sun, Xiubo Geng, Wenxiang Hu, Chongyang Tao, Jing Ma, Qingwei Lin, Daxin Jiang (Hong Kong Baptist University; Microsoft, 2023)

Paper Information
arXiv ID
2306.08568
Venue
International Conference on Learning Representations
Domain
Natural language processing and software engineering
SOTA Claim
Yes
Code
Available
Reproducibility
8/10

Abstract

Code Large Language Models (Code LLMs), such as StarCoder, have demonstrated exceptional performance in code-related tasks. However, most existing models are solely pre-trained on extensive raw code data without instruction fine-tuning. In this paper, we introduce WizardCoder, which empowers Code LLMs with complex instruction fine-tuning, by adapting the Evol-Instruct method to the domain of code. Through comprehensive experiments on four prominent code generation benchmarks, namely HumanEval, HumanEval+, MBPP, and DS-1000, we unveil the exceptional capabilities of our model. It surpasses all other open-source Code LLMs by a substantial margin. Moreover, our model even outperforms the largest closed LLMs, Anthropic's Claude and Google's Bard, on HumanEval and HumanEval+. Our code, model weights, and data are public at https://github.com/nlpxucan/WizardLM.

Summary

The paper introduces WizardCoder, a Code Large Language Model (LLM) that leverages the Evol-Instruct method to enhance code generation capabilities through instruction fine-tuning. Unlike prior models that rely solely on extensive raw code data, WizardCoder focuses on creating complex instruction datasets specific to coding tasks. The authors conducted experiments on four code generation benchmarks: HumanEval, HumanEval+, MBPP, and DS-1000, demonstrating that WizardCoder outperforms existing open-source and even some closed-source models, including Claude and Bard. The paper details the methodology, experimental setup, and results, providing insights into the effectiveness of the Evol-Instruct adaptations for code-related tasks.

Methods

This paper employs the following methods:

  • Evol-Instruct
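Evol-Instruct iteratively rewrites seed instructions into more complex variants using an LLM, producing a fine-tuning dataset of progressively harder coding tasks. The loop can be sketched as follows; the evolution prompts below are paraphrases in the spirit of the paper, and `llm_complete` is a hypothetical prompt-to-completion callable, not the paper's actual templates or API:

```python
import random

# Paraphrased code-evolution heuristics; the original work uses its own
# carefully worded prompt templates.
EVOLUTION_PROMPTS = [
    "Add new constraints and requirements to the following programming task.",
    "Replace a common requirement in the task with a less common one.",
    "If the task can be solved in only a few steps, add more reasoning steps.",
    "Provide a piece of erroneous code as a misleading reference.",
    "Propose a higher time or space complexity requirement.",
]

def evolve_instruction(instruction: str, llm_complete, rounds: int = 1) -> str:
    """Iteratively rewrite an instruction into a harder variant.

    `llm_complete` is a hypothetical callable mapping a prompt string
    to the model's completion string.
    """
    for _ in range(rounds):
        heuristic = random.choice(EVOLUTION_PROMPTS)
        prompt = (
            f"{heuristic}\n\n#Given Task#:\n{instruction}\n\n#Rewritten Task#:\n"
        )
        instruction = llm_complete(prompt).strip()
    return instruction
```

In the paper, instructions evolved this way (seeded from a code instruction dataset) are then used to fine-tune StarCoder, yielding WizardCoder.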

Models Used

  • WizardCoder
  • StarCoder
  • Claude
  • Bard

Datasets

The following datasets were used in this research:

  • HumanEval
  • HumanEval+
  • MBPP
  • DS-1000
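Benchmarks such as HumanEval score a model by functional correctness: each generated completion is appended to the problem's prompt, executed, and run against hidden unit tests. A minimal sketch of that check, assuming the benchmark supplies `prompt`, `test_code` (defining a `check` function), and `entry_point` fields; real harnesses additionally sandbox execution with timeouts, which this sketch omits:

```python
def check_candidate(prompt: str, completion: str,
                    test_code: str, entry_point: str) -> bool:
    """Functionally check one candidate, HumanEval-style:
    execute prompt + completion to define the function, then run
    the benchmark's `check` against it. Returns True on pass.
    """
    env: dict = {}
    try:
        exec(prompt + completion, env)   # defines the target function
        exec(test_code, env)             # defines `check(candidate)`
        env["check"](env[entry_point])   # raises AssertionError on failure
        return True
    except Exception:
        return False
```

Running many sampled completions through a check like this yields the per-problem pass counts from which pass@k is estimated.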

Evaluation Metrics

  • pass@1
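pass@1 is the probability that a single sampled completion passes all unit tests. In the common unbiased formulation (from the HumanEval paper), pass@k = 1 - C(n-c, k)/C(n, k) for n samples of which c are correct; for k = 1 this reduces to c/n. A small numerically stable sketch:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k),
    computed as a stable running product over k factors.
    n = total samples, c = correct samples."""
    if n - c < k:
        return 1.0  # every size-k subset contains a correct sample
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Example: 10 samples, 3 correct -> pass@1 is about 0.3 (i.e., c/n)
p = pass_at_k(10, 3, 1)
```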

Results

  • WizardCoder surpasses all open-source Code LLMs by a substantial margin in terms of code generation
  • Outperforms closed-source models like Claude and Bard on HumanEval and HumanEval+ benchmarks

Limitations

The authors identified the following limitations:

  • Despite achieving impressive performance, WizardCoder still falls significantly behind the SOTA LLM, GPT-4. Future work will address this gap.

Technical Requirements

  • Number of GPUs: None specified
  • GPU Type: None specified

Keywords

  • Large Language Models
  • Code generation
  • Instruction fine-tuning
  • Evol-Instruct
  • Benchmarking
