Ever since the Turing Test was proposed in the 1950s, researchers have explored how machines can master language intelligence. Language is essentially a complex, intricate system of human expression governed by grammatical rules, and developing capable artificial intelligence (AI) algorithms that can comprehend and master a language poses a significant challenge. Over the past two decades, language modeling has been widely studied as a major approach to language understanding and generation, evolving from statistical language models to neural language models. More recently, pre-trained language models (PLMs), built by pre-training Transformer models over large-scale corpora, have shown strong capabilities in solving various natural language processing (NLP) tasks. Since researchers found that model scaling leads to improved model capacity, they have further investigated the scaling effect by increasing the parameter scale to ever larger sizes. Interestingly, when the parameter scale exceeds a certain level, these enlarged language models not only achieve significant performance improvements but also exhibit special abilities (e.g., in-context learning) that are not present in small-scale language models (e.g., BERT). To distinguish language models at different parameter scales, the research community has coined the term large language models (LLMs) for PLMs of significant size (e.g., containing tens or hundreds of billions of parameters). Research on LLMs has recently been advanced by both academia and industry, and a remarkable milestone is the launch of ChatGPT, a powerful AI chatbot built on LLMs, which has attracted widespread attention from society. The technical evolution of LLMs is having an important impact on the entire AI community and may revolutionize the way we develop and use AI algorithms. Considering this rapid technical progress, in this survey we review the recent advances of LLMs by introducing the background, key findings, and mainstream techniques. In particular, we focus on four major aspects of LLMs: pre-training, adaptation tuning, utilization, and capacity evaluation. We also summarize the available resources for developing LLMs and discuss remaining issues and future directions. This survey provides an up-to-date review of the literature on LLMs, which can be a useful resource for both researchers and engineers.

[Figure: timeline of representative large language models released in recent years, with publicly available models marked (e.g., GPT-NeoX-20B, OPT, BLOOM, GLM, CodeGen) alongside closed models (e.g., GPT-4, PaLM, ChatGPT, Gopher, Chinchilla, LaMDA).]

We categorize these corpora into six groups: Books, CommonCrawl, Reddit links, Wikipedia, Code, and others.

Books. BookCorpus [109] is a commonly used dataset in previous small-scale models (e.g., GPT [119] and GPT-2 [26]), consisting of over 11,000 books covering a wide range of topics and genres (e.g., novels and biographies). Another large-scale book corpus is Project Gutenberg [110], consisting of over 70,000 literary books including novels, essays, poetry, drama, history, science, philosophy, and other types of works in the public domain. It is currently one of the largest open-source book collections and is used in the training of MT-NLG [97] and LLaMA [57]. As for Books1 [55] and Books2 [55] used in GPT-3 [55], they are much larger than BookCorpus but have not been publicly released so far.

CommonCrawl. CommonCrawl [120] is one of the largest open-source web crawling databases, containing a petabyte-scale volume of data that has been widely used as training data for existing LLMs.
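To make the corpus discussion above more concrete, here is a minimal, self-contained sketch of mixture sampling over several pre-training sources. The corpus contents and mixture weights are placeholder assumptions for illustration, not the actual data or ratios used by any model named in this survey.

```python
import random

# Placeholder documents standing in for the corpus groups discussed above;
# real pipelines stream from disk or object storage, then deduplicate and filter.
corpora = {
    "books": ["<book text 1>", "<book text 2>"],
    "commoncrawl": ["<web page 1>", "<web page 2>", "<web page 3>"],
    "wikipedia": ["<wiki article 1>", "<wiki article 2>"],
    "code": ["<source file 1>"],
}

# Hypothetical mixture weights; actual weights vary per model and are often
# chosen to over-sample higher-quality sources such as books and Wikipedia.
weights = {"books": 0.15, "commoncrawl": 0.60, "wikipedia": 0.15, "code": 0.10}


def sample_pretraining_docs(n, seed=0):
    """Yield n (source, document) pairs, picking a corpus per draw by weight."""
    rng = random.Random(seed)
    names = list(corpora)
    probs = [weights[name] for name in names]
    for _ in range(n):
        source = rng.choices(names, weights=probs, k=1)[0]
        yield source, rng.choice(corpora[source])


for source, doc in sample_pretraining_docs(5):
    print(source, doc)
```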
This survey reviews recent advances in large language models (LLMs), emphasizing the evolution from statistical and neural language models to pre-trained Transformer models. It details the effects of scaling on model capacity, highlights emergent abilities such as in-context learning, and discusses key aspects of LLMs including pre-training, adaptation tuning, utilization, and capacity evaluation. The paper also addresses challenges such as alignment with human values and the technical difficulty of training LLMs, and provides a comprehensive literature review with recommendations for future work.
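To illustrate what in-context learning means in practice, the sketch below assembles a few-shot prompt from labeled demonstrations; the task, examples, and the placeholder `generate` function are assumptions made for this example rather than anything specified by the survey.

```python
# In-context (few-shot) learning sketch: the model is not fine-tuned; it is
# expected to infer the task from demonstrations placed directly in the prompt.

def generate(prompt: str) -> str:
    # Placeholder for a call to an LLM inference API or a locally hosted model.
    raise NotImplementedError

demonstrations = [
    ("The movie was a delight from start to finish.", "positive"),
    ("I regret buying this product.", "negative"),
]
query = "The service was slow but the food was excellent."

prompt = "Classify the sentiment of each review as positive or negative.\n\n"
for text, label in demonstrations:
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {query}\nSentiment:"

# prediction = generate(prompt)  # with a capable LLM, typically "positive"
print(prompt)
```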
This paper discusses the following models:
- ChatGPT
- GPT-3
- GPT-4
- OPT
- PaLM
- BERT
- ERNIE 3.0
- LLaMA
- Gopher
- Codex
- Jurassic-1
- GShard
The following datasets and benchmarks are discussed in this research:
- CommonCrawl
- BookCorpus
- Project Gutenberg
- C4
- Wikipedia
- APPS
- HumanEval
- SQuAD
The following evaluation metrics are referenced (a minimal computation sketch follows this list):
- BLEU
- ROUGE
- Accuracy
- F1-score
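As a minimal sketch of how the simpler metrics above are computed (BLEU and ROUGE involve n-gram matching and are omitted here), the snippet below implements accuracy and binary F1 from scratch on made-up labels.

```python
def accuracy(preds, golds):
    """Fraction of predictions that exactly match the gold labels."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)


def binary_f1(preds, golds, positive="positive"):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(p == positive and g == positive for p, g in zip(preds, golds))
    fp = sum(p == positive and g != positive for p, g in zip(preds, golds))
    fn = sum(p != positive and g == positive for p, g in zip(preds, golds))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up predictions and gold labels, purely for illustration.
preds = ["positive", "negative", "positive", "positive"]
golds = ["positive", "negative", "negative", "positive"]
print(accuracy(preds, golds))   # 0.75
print(binary_f1(preds, golds))  # 0.8
```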
Key findings highlighted in the survey include:
- Emergent abilities of LLMs (e.g., in-context learning) that appear only beyond a certain model scale
- Significant performance improvements in LLMs with increased scale (see the scaling-law sketch after this list)
- Pre-training over large-scale corpora improves generalization and downstream task performance
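One commonly cited quantitative form of this scaling effect is the Chinchilla-style scaling law of Hoffmann et al., which models pre-training loss as a function of parameter count and training tokens; the sketch below gives only the symbolic form, not fitted constants.

```latex
% Chinchilla-style scaling law: expected loss L as a function of model size N
% (parameters) and data size D (training tokens); E, A, B, \alpha, \beta are
% constants fitted empirically to training runs.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```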
The authors identified the following limitations:
- Difficulty in training capable LLMs due to resource demands
- Challenges in aligning LLMs with human preferences
- Issues with hallucination and knowledge recency
Compute details reported:
- Number of GPUs: None specified
- GPU Type: None specified