Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling. Stella Biderman, Hailey Schoelkopf, Quentin Anthony, Herbie Bradley, Kyle O'Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, USVSN Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika, Oskar van der Wal (2023). Affiliations include Booz Allen Hamilton (McLean, USA), Yale University (New Haven, USA), University of Cambridge (UK), Stability AI, Datasaur.ai (USA), Indraprastha Institute of Information Technology Delhi (India), and the Institute for Logic, Language and Computation, University of Amsterdam (Netherlands).
The paper introduces Pythia, a suite of 16 large language models (LLMs) ranging from 70M to 12B parameters, all trained on public data seen in the exact same order. Pythia is designed to facilitate research across NLP by providing public access to 154 intermediate checkpoints per model, along with tooling to reconstruct the exact training data each checkpoint has seen. The authors present case studies enabled by this setup: how memorization evolves over training, how term frequency in the pretraining corpus affects few-shot performance, and how gender bias can be reduced by intervening on the training data. Throughout, the suite's consistent architecture and controlled training conditions allow fine-grained experiments on how training data shapes model behavior.
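Since public, ordered access to checkpoints is the suite's central contribution, a short example makes it concrete. Below is a minimal sketch, assuming the Hugging Face `transformers` library and the released `EleutherAI/pythia-*` models on the Hugging Face Hub, where intermediate checkpoints are exposed as branches named `step<N>`:

```python
# Minimal sketch: loading an intermediate Pythia checkpoint from the
# Hugging Face Hub by selecting a "step<N>" revision branch.
from transformers import AutoTokenizer, GPTNeoXForCausalLM

model = GPTNeoXForCausalLM.from_pretrained(
    "EleutherAI/pythia-70m-deduped",  # smallest model, deduplicated-Pile variant
    revision="step3000",              # one of the released training checkpoints
)
tokenizer = AutoTokenizer.from_pretrained(
    "EleutherAI/pythia-70m-deduped", revision="step3000"
)

inputs = tokenizer("Hello, I am", return_tensors="pt")
tokens = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(tokens[0]))
```

Swapping the `revision` string walks the same model through training time, which is what enables the paper's analyses of how behavior changes across checkpoints.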
This paper employs the following methods:
- Training 16 transformer LMs (8 sizes from 70M to 12B parameters, each trained on both the Pile and a deduplicated Pile) with a consistent architecture and an identical training data order
- Releasing 154 intermediate checkpoints per model, plus tooling to reconstruct the exact training data seen up to any checkpoint
- Case studies on memorization dynamics (see the sketch after this list), the effect of term frequency on few-shot performance, and reducing gender bias by modifying training data
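As a concrete illustration of the memorization case study, here is a minimal sketch of a k-extractability-style memorization test (a sequence counts as memorized if greedy decoding from its first k tokens reproduces the true continuation exactly). `is_memorized` is a hypothetical helper, and k=32 is an illustrative choice rather than necessarily the paper's exact setting:

```python
# Sketch of a k-extractability-style memorization check.
# `is_memorized` is a hypothetical helper; k=32 is illustrative.
import torch


def is_memorized(model, token_ids: torch.Tensor, k: int = 32) -> bool:
    """token_ids: 1-D tensor of at least 2*k token ids from the training corpus."""
    prompt = token_ids[:k].unsqueeze(0)  # (1, k) prompt taken from the corpus
    target = token_ids[k : 2 * k]        # the true next k tokens
    with torch.no_grad():
        out = model.generate(prompt, max_new_tokens=k, do_sample=False)
    # Compare the greedily generated continuation against the true continuation.
    return torch.equal(out[0, k : 2 * k].cpu(), target.cpu())
```

Because Pythia fixes the data order, such a check can be run at each checkpoint to track when in training a given sequence becomes memorized.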
The following datasets were used in this research:
- The Pile, an 825 GiB open-source English text corpus
- A near-deduplicated version of the Pile (each model size is trained on both variants)