
OLMo: Accelerating the Science of Language Models

Dirk Groeneveld, Iz Beltagy, Noah A. Smith, Hannaneh Hajishirzi, et al. (Allen Institute for Artificial Intelligence)

Paper Information

arXiv ID: 2402.00838
Venue: Annual Meeting of the Association for Computational Linguistics
Domain: natural language processing
SOTA Claim: Yes
Code:
Reproducibility: 8/10

Abstract

Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed. Given the importance of these details in scientifically studying these models, including their biases and potential risks, we believe it is essential for the research community to have access to powerful, truly open LMs. To this end, we have built OLMo, a competitive, truly Open Language Model, to enable the scientific study of language models. Unlike most prior efforts that have only released model weights and inference code, we release OLMo alongside open training data and training and evaluation code. We hope this release will empower the open research community and inspire a new wave of innovation.

Summary

The paper introduces OLMo, a truly open language model aimed at facilitating scientific research in natural language processing (NLP). The authors emphasize the need for accessible, fully documented language models so that their biases, risks, and capabilities can be studied directly. OLMo is released as a complete framework: not only model weights, but also the training data, training logs, and evaluation tools, allowing the research community to replicate and build upon the work. OLMo uses a decoder-only transformer architecture, and the release includes multiple model variants trained on the Dolma dataset. The authors use established tools and methods for adaptation and evaluation, including fine-tuning procedures tailored for general chat capabilities. Evaluation results show that OLMo is competitive with existing open models on a range of NLP tasks, and the authors highlight areas for improvement and future research directions.

Methods

This paper employs the following methods:

  • Transformer
  • ZeRO optimizer
  • AdamW optimizer
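The methods list names the AdamW optimizer. As a minimal sketch (not the paper's implementation), the decoupled-weight-decay update for a single scalar parameter looks like this; the hyperparameter values are illustrative defaults, not OLMo's exact settings:

```python
import math

def adamw_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.95,
               eps=1e-8, weight_decay=0.1):
    """One AdamW update for a single scalar parameter.

    AdamW applies weight decay directly to the parameter ("decoupled"),
    rather than folding it into the gradient as classic L2 regularization
    would. Hyperparameters here are illustrative, not the paper's.
    """
    m = beta1 * m + (1 - beta1) * grad          # first-moment EMA of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment EMA of gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction (t = step, 1-based)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * (m_hat / (math.sqrt(v_hat) + eps)
                          + weight_decay * theta)
    return theta, m, v
```

The ZeRO optimizer listed above is orthogonal to this update rule: it shards the optimizer state (`m`, `v`) and parameters across GPUs so the same AdamW math runs without each device holding a full copy.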

Models Used

  • OLMo-7B
  • LLaMA
  • MPT-7B

Datasets

The following datasets were used in this research:

  • Dolma

Evaluation Metrics

  • Perplexity
  • Accuracy
  • MMLU (benchmark accuracy)
  • ToxiGen (toxicity benchmark)
  • TruthfulQA (truthfulness benchmark)
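Perplexity, the first metric above, is the exponential of the average negative log-likelihood per token. A minimal sketch, assuming per-token natural-log probabilities are already available from the model:

```python
import math

def perplexity(token_log_probs):
    """Corpus perplexity from per-token natural-log probabilities.

    Lower is better; a model that assigns uniform probability 1/k to
    every token has perplexity exactly k.
    """
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# Example: uniform probability 1/4 over 4 tokens -> perplexity 4.0
ppl = perplexity([math.log(0.25)] * 4)
```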

Results

  • OLMo shows competitive performance against models like LLaMA and MPT-7B on NLP tasks.
  • Instruction tuning significantly improves performance and safety metrics.

Limitations

The authors identified the following limitations:

  • Potential biases and toxic content in training data.
  • Challenges in training and adapting large models.
  • Existing evaluations may not fully represent user interactions with language models.

Technical Requirements

  • Number of GPUs: 32
  • GPU Type: AMD MI250X, NVIDIA A100
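For a rough sense of the compute behind these hardware figures, the standard 6·N·D rule of thumb (Kaplan et al.) estimates dense-transformer training cost from parameter count N and token count D. The 7B/2T figures below are illustrative assumptions, not the paper's exact accounting:

```python
def training_flops(n_params, n_tokens):
    """Approximate dense-transformer training FLOPs via the 6*N*D rule:
    roughly 2*N*D for the forward pass and 4*N*D for the backward pass."""
    return 6 * n_params * n_tokens

# Illustrative: a 7B-parameter model over ~2T tokens (assumed figures).
flops = training_flops(7e9, 2e12)  # ~8.4e22 FLOPs
```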

Keywords

language models, transformer architecture, open datasets, model training, evaluation benchmarks
