Ever since the Turing Test was proposed in the 1950s, researchers have explored how machines can master language intelligence. Language is essentially a complex, intricate system of human expression governed by grammatical rules, and developing capable artificial intelligence (AI) algorithms that can comprehend and master a language poses a significant challenge. Over the past two decades, language modeling has been widely studied as a major approach to language understanding and generation, evolving from statistical language models to neural language models. More recently, pre-trained language models (PLMs), built by pre-training Transformer models over large-scale corpora, have shown strong capabilities in solving various natural language processing (NLP) tasks. Since researchers found that model scaling leads to improved model capacity, they have further investigated the scaling effect by increasing the parameter scale to ever larger sizes. Interestingly, when the parameter scale exceeds a certain level, these enlarged language models not only achieve significant performance improvements but also exhibit special abilities (e.g., in-context learning) that are not present in small-scale language models (e.g., BERT). To distinguish language models at different parameter scales, the research community has coined the term large language models (LLMs) for PLMs of significant size (e.g., containing tens or hundreds of billions of parameters). Research on LLMs has recently been advanced by both academia and industry, and a remarkable milestone is the launch of ChatGPT, a powerful AI chatbot built on LLMs, which has attracted widespread attention from society. The technical evolution of LLMs is having an important impact on the entire AI community and may revolutionize the way we develop and use AI algorithms. Considering this rapid technical progress, in this survey we review the recent advances of LLMs by introducing the background, key findings, and mainstream techniques. In particular, we focus on four major aspects of LLMs: pre-training, adaptation tuning, utilization, and capacity evaluation. We also summarize the available resources for developing LLMs and discuss remaining issues and future directions. This survey provides an up-to-date review of the literature on LLMs, which can be a useful resource for both researchers and engineers.

[Figure: timeline of representative large language models released in recent years, with publicly available models marked (e.g., GPT-NeoX-20B, OPT, BLOOM, GLM, CodeGen) alongside closed models (e.g., GPT-4, PaLM, ChatGPT, Gopher, Chinchilla, LaMDA).]

We categorize these corpora into six groups: Books, CommonCrawl, Reddit links, Wikipedia, Code, and others.

Books. BookCorpus [109] is a commonly used dataset in previous small-scale models (e.g., GPT [119] and GPT-2 [26]), consisting of over 11,000 books covering a wide range of topics and genres (e.g., novels and biographies). Another large-scale book corpus is Project Gutenberg [110], consisting of over 70,000 literary books including novels, essays, poetry, drama, history, science, philosophy, and other types of works in the public domain. It is currently one of the largest open-source book collections and is used in the training of MT-NLG [97] and LLaMA [57]. As for Books1 [55] and Books2 [55] used in GPT-3 [55], they are much larger than BookCorpus but have not been publicly released so far.

CommonCrawl. CommonCrawl [120] is one of the largest open-source web crawling databases, containing a petabyte-scale volume of data that has been widely used as training data for existing LLMs.
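To make the corpus discussion above more concrete, here is a minimal, self-contained sketch of mixture sampling over several pre-training sources. The corpus contents and mixture weights are placeholder assumptions for illustration, not the actual data or ratios used by any model named in this survey.

```python
import random

# Placeholder documents standing in for the corpus groups discussed above;
# real pipelines stream from disk or object storage, then deduplicate and filter.
corpora = {
    "books": ["<book text 1>", "<book text 2>"],
    "commoncrawl": ["<web page 1>", "<web page 2>", "<web page 3>"],
    "wikipedia": ["<wiki article 1>", "<wiki article 2>"],
    "code": ["<source file 1>"],
}

# Hypothetical mixture weights; actual weights vary per model and are often
# chosen to over-sample higher-quality sources such as books and Wikipedia.
weights = {"books": 0.15, "commoncrawl": 0.60, "wikipedia": 0.15, "code": 0.10}


def sample_pretraining_docs(n, seed=0):
    """Yield n (source, document) pairs, picking a corpus per draw by weight."""
    rng = random.Random(seed)
    names = list(corpora)
    probs = [weights[name] for name in names]
    for _ in range(n):
        source = rng.choices(names, weights=probs, k=1)[0]
        yield source, rng.choice(corpora[source])


for source, doc in sample_pretraining_docs(5):
    print(source, doc)
```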
This survey reviews recent advances in large language models (LLMs), emphasizing the evolution from statistical and neural language models to pre-trained Transformer models. It details the effects of scaling on model capacity, highlights emergent abilities such as in-context learning, and discusses key aspects of LLMs including pre-training, adaptation tuning, utilization, and capacity evaluation. The paper also addresses challenges such as alignment with human values and the technical difficulty of training LLMs, and provides a comprehensive literature review with recommendations for future work.
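To illustrate what in-context learning means in practice, the sketch below assembles a few-shot prompt from labeled demonstrations; the task, examples, and the placeholder `generate` function are assumptions made for this example rather than anything specified by the survey.

```python
# In-context (few-shot) learning sketch: the model is not fine-tuned; it is
# expected to infer the task from demonstrations placed directly in the prompt.

def generate(prompt: str) -> str:
    # Placeholder for a call to an LLM inference API or a locally hosted model.
    raise NotImplementedError

demonstrations = [
    ("The movie was a delight from start to finish.", "positive"),
    ("I regret buying this product.", "negative"),
]
query = "The service was slow but the food was excellent."

prompt = "Classify the sentiment of each review as positive or negative.\n\n"
for text, label in demonstrations:
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {query}\nSentiment:"

# prediction = generate(prompt)  # with a capable LLM, typically "positive"
print(prompt)
```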
This paper discusses the following models:
- ChatGPT
- GPT-3
- GPT-4
- OPT
- PaLM
- BERT
- ERNIE 3.0
- LLaMA
- Gopher
- Codex
- Jurassic-1
- GShard
The following datasets and benchmarks are discussed in this research:
- CommonCrawl
- BookCorpus
- Project Gutenberg
- C4
- Wikipedia
- APPS
- HumanEval
- SQuAD
The following evaluation metrics are referenced (a minimal computation sketch follows this list):
- BLEU
- ROUGE
- Accuracy
- F1-score
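As a minimal sketch of how the simpler metrics above are computed (BLEU and ROUGE involve n-gram matching and are omitted here), the snippet below implements accuracy and binary F1 from scratch on made-up labels.

```python
def accuracy(preds, golds):
    """Fraction of predictions that exactly match the gold labels."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)


def binary_f1(preds, golds, positive="positive"):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(p == positive and g == positive for p, g in zip(preds, golds))
    fp = sum(p == positive and g != positive for p, g in zip(preds, golds))
    fn = sum(p != positive and g == positive for p, g in zip(preds, golds))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up predictions and gold labels, purely for illustration.
preds = ["positive", "negative", "positive", "positive"]
golds = ["positive", "negative", "negative", "positive"]
print(accuracy(preds, golds))   # 0.75
print(binary_f1(preds, golds))  # 0.8
```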
Key findings highlighted in the survey include:
- Emergent abilities of LLMs (e.g., in-context learning) that appear only beyond a certain model scale
- Significant performance improvements in LLMs with increased scale (see the scaling-law sketch after this list)
- Pre-training over large-scale corpora improves generalization and downstream task performance
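One commonly cited quantitative form of this scaling effect is the Chinchilla-style scaling law of Hoffmann et al., which models pre-training loss as a function of parameter count and training tokens; the sketch below gives only the symbolic form, not fitted constants.

```latex
% Chinchilla-style scaling law: expected loss L as a function of model size N
% (parameters) and data size D (training tokens); E, A, B, \alpha, \beta are
% constants fitted empirically to training runs.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```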
The authors identified the following limitations:
- Difficulty in training capable LLMs due to resource demands
- Challenges in aligning LLMs with human preferences
- Issues with hallucination and knowledge recency
Compute details reported:
- Number of GPUs: None specified
- GPU Type: None specified