
INSTRUCTION TUNING WITH GPT-4

Baolin Peng, Chunyuan Li, Pengcheng He, Michel Galley, Jianfeng Gao (Microsoft Research, 2023)

Paper Information
  • arXiv ID: 2304.03277
  • Venue: arXiv.org
  • Domain: Natural Language Processing
  • SOTA Claim: Yes
  • Code: Released (data and codebase at https://instruction-tuning-with-gpt-4.github.io/)
  • Reproducibility: 8/10

Abstract

Prior work has shown that finetuning large language models (LLMs) using machine-generated instruction-following data enables such models to achieve remarkable zero-shot capabilities on new tasks, and no human-written instructions are needed. In this paper, we present the first attempt to use GPT-4 to generate instruction-following data for LLM finetuning. Our early experiments on instruction-tuned LLaMA models show that the 52K English and Chinese instruction-following data generated by GPT-4 leads to superior zero-shot performance on new tasks to the instruction-following data generated by previous state-of-the-art models. We also collect feedback and comparison data from GPT-4 to enable a comprehensive evaluation and reward model training. We make our data generated using GPT-4 as well as our codebase publicly available.¹

*Equal contribution. ¹ https://instruction-tuning-with-gpt-4.github.io/ (Note: this is a preliminary release, and the authors state they will continue to expand the dataset and will finetune larger models.)
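
To make the data-generation step concrete, the snippet below sketches how instruction-following data can be produced by prompting GPT-4 with existing Alpaca-style instructions, as the abstract describes. This is a minimal sketch assuming the OpenAI Python client; the prompt formatting, file names, and decoding settings are illustrative assumptions, not the authors' exact pipeline.

```python
# Minimal sketch: generate instruction-following data by querying GPT-4
# with existing Alpaca-style instructions. File names, prompt formatting,
# and decoding settings are illustrative assumptions.
import json
from openai import OpenAI  # assumes the official OpenAI Python client (>= 1.0)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def build_prompt(example: dict) -> str:
    """Format an Alpaca-style example (instruction plus optional input)."""
    if example.get("input"):
        return f"{example['instruction']}\n\n{example['input']}"
    return example["instruction"]

def generate_responses(examples: list[dict], model: str = "gpt-4") -> list[dict]:
    """Ask GPT-4 for a response to each instruction and keep the triples."""
    data = []
    for ex in examples:
        completion = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": build_prompt(ex)}],
            max_tokens=512,
        )
        data.append({
            "instruction": ex["instruction"],
            "input": ex.get("input", ""),
            "output": completion.choices[0].message.content,
        })
    return data

if __name__ == "__main__":
    # "alpaca_instructions.json" is a hypothetical file of Alpaca-style examples.
    with open("alpaca_instructions.json") as f:
        examples = json.load(f)
    with open("gpt4_instruction_data.json", "w") as f:
        json.dump(generate_responses(examples[:10]), f, indent=2, ensure_ascii=False)
```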

Summary

This paper presents the first attempt to use GPT-4 to generate instruction-following data for finetuning large language models (LLMs) and assesses its effect on zero-shot performance on new tasks. The authors show that the instruction-following data, comprising 52K examples in English and Chinese, yields significantly better performance than data generated by previous state-of-the-art models. Their approach also collects GPT-4 feedback and comparison data to evaluate model outputs and to train reward models. Empirical studies validate the effectiveness of GPT-4-generated data and suggest best practices for building instruction-following LLMs. The paper outlines plans to expand data collection and to further improve the models with reinforcement learning from human feedback.
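
As a concrete reference for the reward-model training mentioned above, the sketch below implements the standard pairwise ranking objective on GPT-4 comparison data: the model learns to assign a higher scalar reward to the response GPT-4 rated higher. The backbone checkpoint, pooling strategy, and data layout are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of a pairwise reward-model objective on GPT-4 comparison data.
# Backbone, pooling, and tokenization details are illustrative assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

class RewardModel(torch.nn.Module):
    def __init__(self, backbone_name: str = "facebook/opt-1.3b"):  # assumed backbone
        super().__init__()
        self.backbone = AutoModel.from_pretrained(backbone_name)
        self.value_head = torch.nn.Linear(self.backbone.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        hidden = self.backbone(input_ids=input_ids,
                               attention_mask=attention_mask).last_hidden_state
        # Score each sequence using the value head on its last non-padding token.
        last_idx = attention_mask.sum(dim=1) - 1
        pooled = hidden[torch.arange(hidden.size(0)), last_idx]
        return self.value_head(pooled).squeeze(-1)

def ranking_loss(rm, tokenizer, prompt, chosen, rejected):
    """-log sigmoid(r(chosen) - r(rejected)): prefer the GPT-4-preferred response."""
    batch = tokenizer([prompt + chosen, prompt + rejected],
                      return_tensors="pt", padding=True, truncation=True)
    rewards = rm(batch["input_ids"], batch["attention_mask"])
    return -F.logsigmoid(rewards[0] - rewards[1])

# Usage sketch:
# rm = RewardModel(); tok = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
# loss = ranking_loss(rm, tok, "Explain photosynthesis.", good_answer, bad_answer)
```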

Methods

This paper employs the following methods:

  • Instruction-tuning (supervised finetuning; see the sketch after this list)
  • Self-Instruct tuning
  • Reinforcement Learning from Human Feedback (RLHF)
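
The instruction-tuning method above amounts to supervised finetuning of a causal LM on (instruction, response) pairs, with the loss computed only on response tokens. The sketch below illustrates this; the checkpoint name, prompt template, and hyperparameters are illustrative assumptions, not the paper's exact setup.

```python
# Minimal supervised instruction-tuning sketch (Alpaca-style prompt):
# finetune a causal LM on (instruction, response) pairs, masking the prompt
# tokens out of the loss. Checkpoint and hyperparameters are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

PROMPT = ("Below is an instruction that describes a task. "
          "Write a response that appropriately completes the request.\n\n"
          "### Instruction:\n{instruction}\n\n### Response:\n")

def build_example(tokenizer, instruction, response, max_len=512):
    prompt_ids = tokenizer(PROMPT.format(instruction=instruction),
                           add_special_tokens=False)["input_ids"]
    response_ids = tokenizer(response, add_special_tokens=False)["input_ids"]
    response_ids = response_ids + [tokenizer.eos_token_id]
    input_ids = (prompt_ids + response_ids)[:max_len]
    labels = ([-100] * len(prompt_ids) + response_ids)[:max_len]  # -100 = ignored
    return torch.tensor([input_ids]), torch.tensor([labels])

model_name = "huggyllama/llama-7b"  # assumed LLaMA checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

input_ids, labels = build_example(tokenizer, "Name three primary colors.",
                                  "Red, blue, and yellow.")
loss = model(input_ids=input_ids, labels=labels).loss  # causal LM cross-entropy
loss.backward()
optimizer.step()
```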

Models Used

  • LLaMA-GPT4
  • LLaMA-GPT4-CN

Datasets

The following datasets were used in this research:

  • Alpaca (source of the 52K instructions reused as prompts for GPT-4)
  • GPT-4-generated instruction-following data (52K examples, English and Chinese)

Evaluation Metrics

  • ROUGE-L (longest-common-subsequence overlap with reference answers; see the sketch below)
  • GPT-4-based automatic evaluation scores
  • Human evaluation of alignment (helpfulness, honesty, harmlessness)
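
ROUGE-L measures the longest common subsequence (LCS) between a generated response and a reference, combined into an F-measure. The self-contained sketch below computes the plain token-level F1 variant, using whitespace tokenization as a simplification.

```python
# Plain ROUGE-L (LCS-based F1) between a hypothesis and a reference,
# with simple lowercased whitespace tokenization as a simplification.
def lcs_length(a: list[str], b: list[str]) -> int:
    """Dynamic-programming length of the longest common subsequence."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f1(hypothesis: str, reference: str) -> float:
    hyp, ref = hypothesis.lower().split(), reference.lower().split()
    lcs = lcs_length(hyp, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(hyp), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

print(rouge_l_f1("red blue and yellow",
                 "the three primary colors are red blue and yellow"))
```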

Results

  • GPT-4 data leads to superior zero-shot performance on new tasks
  • LLaMA models trained with GPT-4 show comparable performance to GPT-4 itself
  • GPT-4-generated instruction-following data enhances alignment with human values

Limitations

The authors identified the following limitations:

  • Current dataset size is limited to 52K instances; plans for expansion are suggested
  • Reward models are used only to rerank candidate responses at decoding time and are not yet integrated into training via RLHF (see the sketch below)
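
To make the decoding-time use of the reward model concrete, the sketch below performs best-of-n reranking: sample n candidate responses from the finetuned LM, score each with a trained reward model, and return the highest-scoring one. The generation settings are illustrative, and using a single shared tokenizer for both models is a simplifying assumption.

```python
# Best-of-n decoding sketch: sample several responses, score them with a
# reward model, and keep the top-ranked one. Settings are illustrative.
import torch

@torch.no_grad()
def best_of_n(model, tokenizer, reward_model, prompt: str, n: int = 4) -> str:
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, do_sample=True, top_p=0.9,
                             max_new_tokens=256, num_return_sequences=n)
    prompt_len = inputs["input_ids"].shape[1]
    candidates = [tokenizer.decode(o[prompt_len:], skip_special_tokens=True)
                  for o in outputs]
    # Score "prompt + candidate" pairs (assumes the tokenizer has a pad token).
    batch = tokenizer([prompt + c for c in candidates],
                      return_tensors="pt", padding=True, truncation=True)
    scores = reward_model(batch["input_ids"], batch["attention_mask"])
    return candidates[int(scores.argmax())]
```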

Technical Requirements

  • Number of GPUs: None specified
  • GPU Type: None specified

Keywords

GPT-4, instruction tuning, LLMs, self-instruct, evaluation, reward models

External Resources

  • Project page (data and code): https://instruction-tuning-with-gpt-4.github.io/