Domain
Artificial Intelligence
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.
This paper presents LLaMA, a collection of foundation language models developed by Meta AI, ranging from 7B to 65B parameters. The authors train exclusively on publicly available datasets and show that models such as LLaMA-13B outperform much larger counterparts like GPT-3 (175B) on most benchmarks. The motivation behind LLaMA is to reach strong performance at smaller model sizes by training on more tokens, reducing inference cost and democratizing access to high-quality language models. The paper details the architecture, training data, and training methods, reports performance across a wide range of benchmarks, and examines biases and toxicity in the generated outputs.
This paper employs the following methods (a minimal code sketch follows the model list below):
- Transformer
- AdamW optimizer
- Byte-pair encoding (BPE)
The following models are trained (the LLaMA variants) or used as baselines for comparison:
- LLaMA-7B
- LLaMA-13B
- LLaMA-33B
- LLaMA-65B
- GPT-3
- Chinchilla
- PaLM
- Gopher
- BLOOM
- OPT
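As a rough illustration of how these ingredients fit together, the sketch below wires a single pre-normalized decoder block into an AdamW training step in PyTorch. It is a simplified stand-in rather than the authors' implementation: it uses standard LayerNorm and SiLU in place of the RMSNorm, SwiGLU, and rotary-embedding variants described in the paper, and all dimensions, learning rate, and weight decay are placeholder values.

```python
# Minimal, illustrative sketch: a decoder-only Transformer block trained with
# AdamW. Dimensions and hyperparameters are placeholders, NOT the LLaMA config.
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, dim: int = 256, n_heads: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim),
            nn.SiLU(),
            nn.Linear(4 * dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal mask: each position attends only to earlier positions.
        seq_len = x.size(1)
        mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
        h = self.norm1(x)                       # pre-normalization
        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + attn_out
        x = x + self.mlp(self.norm2(x))
        return x

model = DecoderBlock()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)

# One toy optimization step on random token embeddings.
x = torch.randn(2, 16, 256)
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
```

In practice the tokenizer feeding such a model would use byte-pair encoding, and many such blocks would be stacked; this sketch only shows the per-block structure and the optimizer choice.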
The following datasets were used in this research:
- CommonCrawl
- C4
- Gutenberg
- Books3
- ArXiv
- Stack Exchange
The following evaluation metrics were used (a pass@k sketch follows this list):
- Exact Match
- Pass@1
- Pass@100
- Accuracy
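Exact match and accuracy are direct comparisons against reference answers. For code generation, pass@k is commonly reported with the unbiased estimator introduced in the Codex paper (Chen et al., 2021); a minimal sketch, assuming n generated samples per problem of which c pass the unit tests:

```python
# Minimal sketch of the unbiased pass@k estimator (Chen et al., 2021).
# n: samples generated per problem, c: samples passing the tests, k: budget.
# The example values below are illustrative only.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples drawn from n passes."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 200 samples, 30 correct -> estimate pass@1 and pass@100.
print(round(pass_at_k(200, 30, 1), 3))    # 0.15
print(round(pass_at_k(200, 30, 100), 3))  # close to 1.0
```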
The paper reports the following results:
- LLaMA-13B outperforms GPT-3 (175B) on most benchmarks
- LLaMA-65B competes well with Chinchilla-70B and PaLM-540B
- LLaMA-65B achieves state-of-the-art performance on Natural Questions and TriviaQA
- LLaMA models show competitive performance in zero-shot and few-shot tasks across multiple benchmarks
The authors identified the following limitations:
- Model biases in generated outputs
- Potential toxicity in outputs
- Dependency on dataset quality
The following compute resources were used for training (a rough training-time estimate follows this list):
- Number of GPUs: 2048
- GPU Type: NVIDIA A100 80GB
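The paper reports a throughput of roughly 380 tokens/sec/GPU for the 65B model on this hardware and a training set of about 1.4T tokens; a back-of-the-envelope estimate of the resulting training time (figures taken from the paper, rounding approximate):

```python
# Back-of-the-envelope training-time estimate for LLaMA-65B, using the
# throughput and token count reported in the paper (approximate figures).
tokens_per_sec_per_gpu = 380        # reported throughput for the 65B model
num_gpus = 2048                     # NVIDIA A100 80GB
total_tokens = 1.4e12               # ~1.4T training tokens

seconds = total_tokens / (tokens_per_sec_per_gpu * num_gpus)
days = seconds / 86_400
print(f"~{days:.0f} days")          # roughly 21 days, matching the paper
```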
This work relates to the following topics:
- Language models
- Transformers
- Scaling laws
- Open-source datasets