
LLaMA: Open and Efficient Foundation Language Models

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothee Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample (Meta AI, 2023)

Paper Information
  • arXiv ID: 2302.13971
  • Venue: arXiv.org
  • Domain: Artificial Intelligence
  • SOTA Claim: Yes
  • Reproducibility: 8/10

Abstract

We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.

Summary

This paper presents LLaMA, a collection of foundation language models developed by Meta AI, ranging from 7B to 65B parameters. The authors train exclusively on publicly available datasets and show that LLaMA-13B outperforms the much larger GPT-3 (175B) on most benchmarks, while LLaMA-65B is competitive with Chinchilla-70B and PaLM-540B. The motivation behind LLaMA is to reach strong performance by training smaller models on more tokens, and to democratize access to high-quality language models. The paper details the architecture, training data, and training method, reports performance across a wide range of benchmarks, and analyzes biases and toxicity in the models' outputs.

Methods

This paper employs the following methods (a brief optimizer-setup sketch follows the list):

  • Transformer
  • AdamW optimizer
  • Byte-pair encoding (BPE)
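
A minimal sketch of the optimizer setup, assuming PyTorch and a stand-in model: the paper reports AdamW with β1 = 0.9, β2 = 0.95, weight decay 0.1, gradient clipping at 1.0, and a cosine schedule decaying to 10% of the peak learning rate (plus 2,000 warmup steps, omitted here for brevity). The model, peak learning rate, and step count below are placeholders, not the paper's exact configuration.

```python
# Illustrative sketch (not the authors' code): AdamW with a cosine schedule,
# roughly matching the training recipe reported in the paper.
import torch
from torch import nn

model = nn.Linear(4096, 4096)   # stand-in for the Transformer (hypothetical)

max_lr = 1.5e-4                 # placeholder peak learning rate
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=max_lr,
    betas=(0.9, 0.95),          # beta_1, beta_2 as reported in the paper
    weight_decay=0.1,           # weight decay reported in the paper
)

total_steps = 100_000           # placeholder; set by token budget and batch size
# Cosine decay down to 10% of the peak learning rate, as described in the paper.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=total_steps, eta_min=0.1 * max_lr
)

for step in range(10):          # a few demo steps with a dummy loss
    loss = model(torch.randn(8, 4096)).pow(2).mean()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # clipping at 1.0
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()
```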

Models Used

  • LLaMA-7B
  • LLaMA-13B
  • LLaMA-33B
  • LLaMA-65B
  • GPT-3
  • Chinchilla
  • PaLM
  • Gopher
  • BLOOM
  • OPT

Datasets

The following datasets were used in this research; a sketch of the approximate sampling mixture follows the list:

  • CommonCrawl
  • C4
  • GitHub
  • Wikipedia
  • Gutenberg
  • Books3
  • ArXiv
  • Stack Exchange
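
A sketch of drawing pre-training documents from these sources in proportion to the data mixture reported in the paper; the percentages are reproduced approximately from the paper's data-mixture table, and `sample_source` is an illustrative helper, not the authors' pipeline.

```python
# Illustrative sketch: sample a data source according to the approximate
# per-source proportions of the pre-training mixture reported in the paper.
import random

MIXTURE = {
    "CommonCrawl": 0.670,
    "C4": 0.150,
    "GitHub": 0.045,
    "Wikipedia": 0.045,
    "Books (Gutenberg + Books3)": 0.045,
    "ArXiv": 0.025,
    "StackExchange": 0.020,
}

def sample_source(rng: random.Random) -> str:
    """Pick a data source with probability proportional to its mixture weight."""
    sources, weights = zip(*MIXTURE.items())
    return rng.choices(sources, weights=weights, k=1)[0]

rng = random.Random(0)
print([sample_source(rng) for _ in range(5)])
```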

Evaluation Metrics

  • Exact Match
  • Pass@1 (code generation)
  • Pass@100 (code generation; see the pass@k sketch after this list)
  • Accuracy
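
Pass@1 and Pass@100 are reported for the code-generation benchmarks (HumanEval and MBPP) and are conventionally computed with the unbiased pass@k estimator introduced with Codex (Chen et al., 2021). A minimal sketch, assuming n generations per problem of which c pass the unit tests:

```python
# Illustrative sketch: unbiased pass@k estimator (Chen et al., 2021),
# the standard way Pass@1 / Pass@100 are computed for code benchmarks.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples drawn from n generations
    (of which c are correct) passes the unit tests."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 200 generations per problem, 37 of them correct.
print(round(pass_at_k(n=200, c=37, k=1), 3))    # 0.185
print(round(pass_at_k(n=200, c=37, k=100), 3))  # close to 1.0
```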

Results

  • LLaMA-13B outperforms GPT-3 on most benchmarks
  • LLaMA-65B competes well with Chinchilla-70B and PaLM-540B
  • LLaMA-65B achieves state-of-the-art performance on Natural Questions and TriviaQA
  • LLaMA models show competitive performance in zero-shot and few-shot tasks across multiple benchmarks

Limitations

The authors identified the following limitations:

  • Model biases in generated outputs
  • Potential toxicity in outputs
  • Dependency on dataset quality

Technical Requirements

  • Number of GPUs: 2048
  • GPU Type: NVIDIA A100 80GB (a back-of-the-envelope training-time estimate follows)
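
A rough check of wall-clock training time from these figures, using the per-GPU throughput the paper reports for the 65B model (about 380 tokens/s/GPU) and its 1.4T-token budget; treat the numbers as approximate.

```python
# Rough wall-clock estimate for the 65B model from the reported figures:
# ~380 tokens/s per GPU on 2048 A100-80GB GPUs, over a 1.4T-token dataset.
tokens_total = 1.4e12          # training tokens for the largest models
tokens_per_sec_per_gpu = 380   # approximate throughput reported in the paper
num_gpus = 2048

seconds = tokens_total / (tokens_per_sec_per_gpu * num_gpus)
days = seconds / 86_400
print(f"~{days:.0f} days")     # on the order of 21 days, matching the paper
```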

Keywords

Language models, Transformers, Scaling laws, Open-source datasets
