
LLaMA: Open and Efficient Foundation Language Models

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothee Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample (Meta AI, 2023)

Paper Information
  • arXiv ID: 2302.13971
  • Venue: arXiv.org
  • Domain: Artificial Intelligence
  • SOTA Claim: Yes
  • Reproducibility: 8/10

Abstract

We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.

Summary

This paper presents LLaMA, a collection of foundation language models developed by Meta AI, ranging from 7B to 65B parameters. The authors train exclusively on publicly available datasets and show that LLaMA-13B outperforms the much larger GPT-3 (175B) on most benchmarks, while LLaMA-65B is competitive with Chinchilla-70B and PaLM-540B. The motivation behind LLaMA is to reach strong performance by training smaller models on more tokens, and to democratize access to high-quality language models. The paper details the architecture, training data, and training method, reports performance across a wide range of benchmarks, and analyzes biases and toxicity in the models' outputs.

Methods

This paper employs the following methods (a brief optimizer-setup sketch follows the list):

  • Transformer
  • AdamW optimizer
  • Byte-pair encoding (BPE)
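
A minimal sketch of the optimizer setup, assuming PyTorch and a stand-in model: the paper reports AdamW with β1 = 0.9, β2 = 0.95, weight decay 0.1, gradient clipping at 1.0, and a cosine schedule decaying to 10% of the peak learning rate (plus 2,000 warmup steps, omitted here for brevity). The model, peak learning rate, and step count below are placeholders, not the paper's exact configuration.

```python
# Illustrative sketch (not the authors' code): AdamW with a cosine schedule,
# roughly matching the training recipe reported in the paper.
import torch
from torch import nn

model = nn.Linear(4096, 4096)   # stand-in for the Transformer (hypothetical)

max_lr = 1.5e-4                 # placeholder peak learning rate
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=max_lr,
    betas=(0.9, 0.95),          # beta_1, beta_2 as reported in the paper
    weight_decay=0.1,           # weight decay reported in the paper
)

total_steps = 100_000           # placeholder; set by token budget and batch size
# Cosine decay down to 10% of the peak learning rate, as described in the paper.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=total_steps, eta_min=0.1 * max_lr
)

for step in range(10):          # a few demo steps with a dummy loss
    loss = model(torch.randn(8, 4096)).pow(2).mean()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # clipping at 1.0
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()
```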

Models Used

  • LLaMA-7B
  • LLaMA-13B
  • LLaMA-33B
  • LLaMA-65B
  • GPT-3
  • Chinchilla
  • PaLM
  • Gopher
  • BLOOM
  • OPT

Datasets

The following datasets were used in this research; a sketch of the approximate sampling mixture follows the list:

  • CommonCrawl
  • C4
  • GitHub
  • Wikipedia
  • Gutenberg
  • Books3
  • ArXiv
  • Stack Exchange
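
A sketch of drawing pre-training documents from these sources in proportion to the data mixture reported in the paper; the percentages are reproduced approximately from the paper's data-mixture table, and `sample_source` is an illustrative helper, not the authors' pipeline.

```python
# Illustrative sketch: sample a data source according to the approximate
# per-source proportions of the pre-training mixture reported in the paper.
import random

MIXTURE = {
    "CommonCrawl": 0.670,
    "C4": 0.150,
    "GitHub": 0.045,
    "Wikipedia": 0.045,
    "Books (Gutenberg + Books3)": 0.045,
    "ArXiv": 0.025,
    "StackExchange": 0.020,
}

def sample_source(rng: random.Random) -> str:
    """Pick a data source with probability proportional to its mixture weight."""
    sources, weights = zip(*MIXTURE.items())
    return rng.choices(sources, weights=weights, k=1)[0]

rng = random.Random(0)
print([sample_source(rng) for _ in range(5)])
```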

Evaluation Metrics

  • Exact Match
  • Pass@1 (code generation)
  • Pass@100 (code generation; see the pass@k sketch after this list)
  • Accuracy
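
Pass@1 and Pass@100 are reported for the code-generation benchmarks (HumanEval and MBPP) and are conventionally computed with the unbiased pass@k estimator introduced with Codex (Chen et al., 2021). A minimal sketch, assuming n generations per problem of which c pass the unit tests:

```python
# Illustrative sketch: unbiased pass@k estimator (Chen et al., 2021),
# the standard way Pass@1 / Pass@100 are computed for code benchmarks.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples drawn from n generations
    (of which c are correct) passes the unit tests."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 200 generations per problem, 37 of them correct.
print(round(pass_at_k(n=200, c=37, k=1), 3))    # 0.185
print(round(pass_at_k(n=200, c=37, k=100), 3))  # close to 1.0
```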

Results

  • LLaMA-13B outperforms GPT-3 on most benchmarks
  • LLaMA-65B competes well with Chinchilla-70B and PaLM-540B
  • LLaMA-65B achieves state-of-the-art performance on Natural Questions and TriviaQA
  • LLaMA models show competitive performance in zero-shot and few-shot tasks across multiple benchmarks

Limitations

The authors identified the following limitations:

  • Model biases in generated outputs
  • Potential toxicity in outputs
  • Dependency on dataset quality

Technical Requirements

  • Number of GPUs: 2048
  • GPU Type: NVIDIA A100 80GB (a back-of-the-envelope training-time estimate follows)
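
A rough check of wall-clock training time from these figures, using the per-GPU throughput the paper reports for the 65B model (about 380 tokens/s/GPU) and its 1.4T-token budget; treat the numbers as approximate.

```python
# Rough wall-clock estimate for the 65B model from the reported figures:
# ~380 tokens/s per GPU on 2048 A100-80GB GPUs, over a 1.4T-token dataset.
tokens_total = 1.4e12          # training tokens for the largest models
tokens_per_sec_per_gpu = 380   # approximate throughput reported in the paper
num_gpus = 2048

seconds = tokens_total / (tokens_per_sec_per_gpu * num_gpus)
days = seconds / 86_400
print(f"~{days:.0f} days")     # on the order of 21 days, matching the paper
```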

Keywords

Language models, Transformers, Scaling laws, Open-source datasets
