
WizardLM: Empowering Large Language Models to Follow Complex Instructions

Can Xu (Microsoft), Qingfeng Sun (Microsoft), Kai Zheng (Microsoft), Xiubo Geng (Microsoft), Pu Zhao (Microsoft), Jiazhan Feng (Peking University), Chongyang Tao (Microsoft), Qingwei Lin (Microsoft), Daxin Jiang (Microsoft), 2023

Paper Information
  • arXiv ID: 2304.12244
  • Venue: arXiv.org
  • Domain: Not specified
  • SOTA Claim: Yes
  • Reproducibility: 4/10

Abstract

Training large language models (LLMs) with open-domain instruction following data brings colossal success. However, manually creating such instruction data is very time-consuming and labor-intensive. Moreover, humans may struggle to produce high-complexity instructions. In this paper, we show an avenue for creating large amounts of instruction data with varying levels of complexity using LLM instead of humans. Starting with an initial set of instructions, we use our proposed Evol-Instruct to rewrite them step by step into more complex instructions. Then, we mix all generated instruction data to fine-tune LLaMA. We call the resulting model WizardLM. Human evaluations on a complexity-balanced test bed and Vicuna's testset show that instructions from Evol-Instruct are superior to human-created ones. By analyzing the human evaluation results of the high complexity part, we demonstrate that outputs from our WizardLM model are preferred to outputs from OpenAI ChatGPT. In GPT-4 automatic evaluation, WizardLM achieves more than 90% capacity of ChatGPT on 17 out of 29 skills. Even though WizardLM still lags behind ChatGPT in some aspects, our findings suggest that fine-tuning with AI-evolved instructions is a promising direction for enhancing LLMs. Our code and data are public at https://github.com

Summary

This paper presents WizardLM, a large language model fine-tuned to follow complex instructions using instruction data generated by a method called Evol-Instruct. The authors argue that generating open-domain instruction data with an LLM can overcome the main drawbacks of manual creation, which is labor-intensive and rarely yields high-complexity instructions. Evol-Instruct rewrites an initial set of simple instructions step by step into more complex ones, producing roughly 250,000 evolved instructions that are used to fine-tune LLaMA into WizardLM. Human evaluations show that WizardLM's outputs are preferred over those of models trained on human-created instruction data, and even over ChatGPT's on the high-complexity portion of the test set; in GPT-4 automatic evaluation, WizardLM reaches more than 90% of ChatGPT's capacity on 17 of 29 skills. The authors conclude that fine-tuning with AI-evolved instructions is a promising direction for enhancing language model capabilities.
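The evolution loop described above can be illustrated with a short, hypothetical sketch. The `llm_complete` callable and both prompt templates below are assumptions made for illustration; the paper defines its own set of in-depth evolving prompts (add constraints, deepen, concretize, increase reasoning steps, complicate input) and an in-breadth prompt, along with elimination rules for failed evolutions.

```python
import random
from typing import Callable, List

# Illustrative prompt templates (assumptions, not the paper's exact wording).
IN_DEPTH_TEMPLATE = (
    "Rewrite the following instruction into a more complex version that a human "
    "can still understand and respond to:\n\n{instruction}"
)
IN_BREADTH_TEMPLATE = (
    "Create a brand-new instruction in the same domain as the one below, "
    "but rarer and of comparable difficulty:\n\n{instruction}"
)

def evolve(
    seed_instructions: List[str],
    llm_complete: Callable[[str], str],  # wrapper around a chat-completion API (assumed)
    rounds: int = 4,
) -> List[str]:
    """Evolve a seed instruction pool for several rounds, keeping every generation."""
    pool = list(seed_instructions)
    frontier = list(seed_instructions)
    for _ in range(rounds):
        next_frontier = []
        for instruction in frontier:
            template = random.choice([IN_DEPTH_TEMPLATE, IN_BREADTH_TEMPLATE])
            evolved = llm_complete(template.format(instruction=instruction)).strip()
            # The paper eliminates failed evolutions (e.g., copies of the input or
            # degenerate rewrites); a trivial stand-in check is used here.
            if evolved and evolved != instruction:
                pool.append(evolved)
                next_frontier.append(evolved)
        frontier = next_frontier
    return pool
```

Each round adds roughly one evolved instruction per item in the current frontier, so a few rounds over the Alpaca seed set are consistent in scale with the roughly 250,000-instruction pool mentioned in the summary.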

Methods

This paper employs the following methods:

  • Evol-Instruct

Models Used

  • LLaMA
  • ChatGPT
  • WizardLM

Datasets

The following datasets were used in this research:

  • Alpaca
  • ShareGPT
  • Evol-Instruct testset
  • Vicuna testset

Evaluation Metrics

  • Human Evaluation Win Rate
  • GPT-4 Automatic Evaluation (see the aggregation sketch after this list)
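Both metrics are aggregate statistics over per-instruction judgments. The sketch below shows one plausible aggregation, assuming "win"/"tie"/"loss" labels for the human evaluation and per-response GPT-4 scores for the automatic evaluation; the paper's exact scoring protocol may differ.

```python
from collections import Counter
from typing import Iterable, Sequence

def win_tie_loss_rates(labels: Iterable[str]) -> dict:
    """Turn per-instruction judgments ('win', 'tie', 'loss') into fractions."""
    counts = Counter(labels)
    total = sum(counts.values()) or 1
    return {outcome: counts.get(outcome, 0) / total for outcome in ("win", "tie", "loss")}

def capacity_vs_baseline(model_scores: Sequence[float], baseline_scores: Sequence[float]) -> float:
    """Ratio of summed GPT-4 scores for a model versus a baseline such as ChatGPT;
    an assumed reading of the 'more than 90% capacity of ChatGPT' figure."""
    return sum(model_scores) / sum(baseline_scores)

# Example usage with made-up numbers:
print(win_tie_loss_rates(["win", "win", "tie", "loss"]))  # {'win': 0.5, 'tie': 0.25, 'loss': 0.25}
print(capacity_vs_baseline([8.0, 7.5], [9.0, 8.0]))       # ~0.91
```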

Results

  • WizardLM significantly outperforms Vicuna on the Evol-Instruct testset and in human evaluations.
  • Human annotators prefer WizardLM's outputs over ChatGPT's on high-complexity instructions.

Limitations

The authors identified the following limitations:

  • The method may pose challenges for scalability and reliability.
  • The test set may not encompass all real-world scenarios for LLM application.

Technical Requirements

  • Number of GPUs: 8
  • GPU Type: V100 (see the launch sketch below)
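The listing above implies multi-GPU data-parallel fine-tuning. Below is a minimal, hypothetical launch sketch using Hugging Face Transformers; every hyperparameter is an illustrative placeholder rather than the paper's reported configuration, and `train.py` is a hypothetical training script.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wizardlm-sft",        # hypothetical output directory
    per_device_train_batch_size=2,    # illustrative; sized to fit V100 memory
    gradient_accumulation_steps=8,
    learning_rate=2e-5,               # common SFT value; not confirmed from the paper
    num_train_epochs=3,
    fp16=True,                        # V100 GPUs do not support bfloat16
    logging_steps=10,
    save_strategy="epoch",
)

# Launched across the 8 GPUs with, for example:
#   torchrun --nproc_per_node=8 train.py
```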
