
Large Language Models: A Survey

Shervin Minaee (Applied Scientist, Amazon Inc), Tomas Mikolov (Senior Researcher, CIIRC CTU), Narjes Nikzad (Cologne University of Applied Sciences), Meysam Chenaghlu, Richard Socher, Xavier Amatriain (VP of Product, AI and Compute Enablement, Google Inc), Jianfeng Gao (VP of Deep Learning Group, Microsoft Research) (2024)

Paper Information
arXiv ID: 2402.06196
Venue: arXiv.org
Domain: Artificial Intelligence / Natural Language Processing
SOTA Claim: Yes
Reproducibility: 5/10

Abstract

Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks, since the release of ChatGPT in November 2022. LLMs' ability of general-purpose language understanding and generation is acquired by training billions of the model's parameters on massive amounts of text data, as predicted by scaling laws [1], [2]. The research area of LLMs, while very recent, is evolving rapidly in many different ways. In this paper, we review some of the most prominent LLMs, including three popular LLM families (GPT, LLaMA, PaLM), and discuss their characteristics, contributions and limitations. We also give an overview of techniques developed to build and augment LLMs. We then survey popular datasets prepared for LLM training, fine-tuning, and evaluation, review widely used LLM evaluation metrics, and compare the performance of several popular LLMs on a set of representative benchmarks. Finally, we conclude the paper by discussing open challenges and future research directions.
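
The scaling laws cited in the abstract ([1], [2]) predict model loss as a function of parameter count and training-token count. As a hedged illustration (not code from the survey), the sketch below implements the Chinchilla-style parametric form L(N, D) = E + A/N^α + B/D^β from Hoffmann et al. (2022); the coefficient values are illustrative placeholders of roughly the reported magnitude, not fitted constants.

```python
# Illustrative sketch of a parametric scaling law, L(N, D) = E + A/N**alpha + B/D**beta.
# Coefficients below are placeholder values for demonstration, not the survey's numbers.
def scaling_law_loss(N: float, D: float,
                     E: float = 1.7, A: float = 400.0, B: float = 410.0,
                     alpha: float = 0.34, beta: float = 0.28) -> float:
    """Predicted loss for a model with N parameters trained on D tokens."""
    return E + A / N**alpha + B / D**beta

# A 70B-parameter model trained on 1.4T tokens vs. the same model on 0.3T tokens:
print(scaling_law_loss(70e9, 1.4e12))  # lower predicted loss (more data)
print(scaling_law_loss(70e9, 0.3e12))  # higher predicted loss (less data)
```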

Summary

This paper presents a survey of Large Language Models (LLMs) since the advent of models like ChatGPT. It traces the evolution of language models from statistical language models to the current generation, including the GPT, LLaMA, and PaLM families. The authors review methods for building and augmenting LLMs, along with the datasets and metrics used for evaluation. They highlight the emergent abilities of LLMs, their architectures, and the open challenges in the field, as well as future research directions for making LLMs more efficient, capable, and reliable.

Methods

This paper covers the following methods (a minimal sketch of the attention and MoE computations follows the list):

  • Transformer
  • RNN
  • LSTM
  • GRU
  • Mixture of Experts (MoE)
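
As a hedged illustration of two of the listed methods (not code from the survey), the sketch below shows scaled dot-product attention, the core operation of the Transformer, and a toy top-k gating function of the kind used in Mixture-of-Experts layers. Shapes and names are illustrative assumptions.

```python
# Minimal sketch, not from the survey: Transformer attention and MoE routing.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)     # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V, causal=False):
    """Q, K, V: (seq_len, d) arrays; returns (seq_len, d)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                # similarity of every query to every key
    if causal:                                   # decoder-style mask: no attention to future tokens
        scores = np.where(np.tril(np.ones_like(scores, dtype=bool)), scores, -1e9)
    return softmax(scores) @ V                   # attention-weighted sum of values

def top_k_routing(token, expert_weights, k=2):
    """Route one token (d,) to k of E experts given gating weights (d, E)."""
    logits = token @ expert_weights              # gating score for each expert
    top = np.argsort(logits)[-k:]                # indices of the k highest-scoring experts
    gates = softmax(logits[top])                 # renormalise over the chosen experts
    return top, gates

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                      # 4 tokens, model dimension 8
print(scaled_dot_product_attention(x, x, x, causal=True).shape)  # (4, 8)
print(top_k_routing(x[0], rng.normal(size=(8, 16))))             # 2 expert indices + gate values
```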

Models Used

  • GPT-1
  • GPT-2
  • GPT-3
  • GPT-4
  • LLaMA
  • PaLM
  • Codex
  • WebGPT
  • InstructGPT

Datasets

The following datasets were used in this research:

  • Natural Questions
  • MMLU
  • MBPP
  • HumanEval
  • APPS
  • RACE
  • SQuAD
  • BoolQ
  • MultiRC
  • GSM8K
  • MATH
  • HellaSwag
  • AI2 Reasoning Challenge (ARC)
  • PIQA
  • SIQA
  • OpenBookQA
  • TruthfulQA
  • HotpotQA
  • ToolQA
  • GPT4Tools

Evaluation Metrics

  • Accuracy
  • Precision
  • Recall
  • F1-score
  • ROUGE
  • BLEU
  • pass@k (see the estimator sketch after this list)
  • Exact Match (EM)
  • Human Equivalence Score (HEQ)
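
Most of these metrics are standard; pass@k, used for code-generation benchmarks such as HumanEval and MBPP, is the least common. As a hedged illustration (not the survey's code), the sketch below implements the unbiased pass@k estimator popularised by Chen et al. (2021).

```python
# Unbiased pass@k estimator: generate n samples per problem, count c that pass
# the unit tests, and estimate the probability that at least one of k samples is correct.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:                    # every size-k subset contains a correct sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 200 samples per problem, 37 of which pass the tests
print(pass_at_k(200, 37, 1))   # 0.185 (equals c/n for k=1)
print(pass_at_k(200, 37, 10))  # ~0.88
```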

Results

  • Overview of LLM families such as GPT, LLaMA, and PaLM
  • Discussion of emergent abilities of LLMs
  • Comparison of LLM performance on a set of representative benchmarks

Limitations

The authors identified the following limitations:

  • LLMs may generate hallucinations
  • LLMs can lack state/memory
  • Limited access to current or up-to-date information (see the retrieval sketch after this list)
  • Resource-intensive training and serving
  • Variability in responses based on prompts
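
Several of these limitations, notably hallucination and limited access to current information, are commonly mitigated with retrieval-augmented generation, one of the augmentation techniques the survey discusses. The sketch below is a hedged toy illustration, not the authors' implementation: a bag-of-words retriever stands in for a real embedding model, and the resulting grounded prompt would be passed to any LLM completion endpoint.

```python
# Toy retrieval-augmented generation (RAG) sketch: retrieve the most relevant
# documents for a question and prepend them to the prompt as context.
from collections import Counter
import math

def embed(text: str) -> Counter:
    return Counter(text.lower().split())          # toy bag-of-words "embedding"

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_prompt(question: str, corpus: list[str], top_k: int = 2) -> str:
    q = embed(question)
    ranked = sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)
    context = "\n".join(ranked[:top_k])
    # The grounded prompt is then sent to whatever LLM is being used.
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")

docs = [
    "The 2024 survey arXiv:2402.06196 reviews the GPT, LLaMA, and PaLM families.",
    "BLEU and ROUGE are n-gram overlap metrics for generation quality.",
]
print(build_prompt("Which LLM families does the survey review?", docs))
```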

Technical Requirements

  • Number of GPUs: None specified
  • GPU Type: None specified

Keywords

Large Language Models, LLMs, transformers, datasets, evaluation metrics, fine-tuning, prompt engineering, multimodal models, cost-effective training, alignment, hallucination
