
Baichuan 2: Open Large-scale Language Models

Aiyuan Yang, Bin Xiao, Bingning Wang, Borong Zhang, Chao Yin, Chenxu Lv, Da Pan, Dian Wang, Dong Yan, Fan Yang, Fei Deng, Feng Wang, Feng Liu, Guangwei Ai, Guosheng Dong, Haizhou Zhao, Hang Xu, Haoze Sun, Hongda Zhang, Hui Liu, Jiaming Ji, Jian Xie, Juntao Dai, Kun Fang, Lei Su, Liang Song, Lifeng Liu, Liyun Ru, Luyao Ma, Mang Wang, Mickel Liu, Mingan Lin, Nuolan Nie, Peidong Guo, Ruiyang Sun, Tao Zhang, Tianpeng Li, Tianyu Li, Wei Cheng, Weipeng Chen, Xiangrong Zeng, Xiaochuan Wang, Xiaoxi Chen, Xin Men, Xin Yu, Xuehai Pan, Yanjun Shen, Yiding Wang, Yiyu Li, Youxin Jiang, Yuchen Gao, Yupeng Zhang, Zenan Zhou, Zhiying Wu (Baichuan Inc., 2023)

Paper Information
arXiv ID
2309.10305
Venue
arXiv.org
Domain
Natural language processing, artificial intelligence
SOTA Claim
Yes
Code
Available
Reproducibility
8/10

Abstract

Large language models (LLMs) have demonstrated remarkable performance on a variety of natural language tasks based on just a few examples of natural language instructions, reducing the need for extensive feature engineering. However, most powerful LLMs are closed-source or limited in their capability for languages other than English. In this technical report, we present Baichuan 2, a series of large-scale multilingual language models containing 7 billion and 13 billion parameters, trained from scratch, on 2.6 trillion tokens. Baichuan 2 matches or outperforms other open-source models of similar size on public benchmarks like MMLU, CMMLU, GSM8K, and HumanEval. Furthermore, Baichuan 2 excels in vertical domains such as medicine and law. We will release all pre-training model checkpoints to benefit the research community in better understanding the training dynamics of Baichuan 2.

Summary

Baichuan 2 introduces a series of large-scale multilingual language models with 7 billion and 13 billion parameters, trained from scratch on 2.6 trillion tokens, a substantially larger corpus than the one used for the original Baichuan models. Baichuan 2 performs strongly on public benchmarks and in vertical domains such as medicine and law. The report emphasizes the open release of these models, including pre-training checkpoints, to improve transparency and the community's understanding of training dynamics. The paper details the modifications made to the Transformer architecture, the data sourcing and cleaning methods, and the training optimizations employed to improve efficiency and performance. It also describes the alignment procedure for the chat models, which combines supervised fine-tuning with reinforcement learning from human feedback (RLHF) to improve interaction quality and safety. The results show that Baichuan 2 outperforms many existing open-source models of similar size across a range of tasks, demonstrating its capabilities in multilingual contexts and vertical applications.
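The alignment pipeline described above begins with supervised fine-tuning on human-labeled prompt/response pairs. As a minimal illustration of that step (a generic sketch, not the authors' implementation; the function and tensor names are placeholders), a typical SFT objective masks the prompt tokens and computes next-token cross-entropy only over the response:

```python
import torch
import torch.nn.functional as F

def sft_loss(model, prompt_ids, response_ids):
    """Next-token cross-entropy over response tokens only (prompt masked out).

    `model` is any causal LM returning logits of shape [batch, seq, vocab];
    the exact tokenization and sample packing used for Baichuan 2 may differ.
    """
    input_ids = torch.cat([prompt_ids, response_ids], dim=1)   # [B, Tp + Tr]
    labels = input_ids.clone()
    labels[:, : prompt_ids.size(1)] = -100                     # ignore prompt positions

    logits = model(input_ids).logits                           # [B, T, V]
    shift_logits = logits[:, :-1, :]                           # position t predicts t + 1
    shift_labels = labels[:, 1:]
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=-100,
    )
```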

Methods

This paper employs the following methods (an illustrative sketch of the reward-modeling step used in RLHF follows the list):

  • Transformer
  • Reinforcement Learning from Human Feedback (RLHF)
  • Supervised Fine-Tuning (SFT)
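For the RLHF stage, a reward model is trained on human preference comparisons before reinforcement learning. Below is a minimal sketch of the standard pairwise (Bradley-Terry style) ranking loss such reward models commonly use; the module layout and names are illustrative assumptions, not the paper's code:

```python
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Scalar reward head on top of a causal-LM backbone (illustrative only)."""

    def __init__(self, backbone, hidden_size):
        super().__init__()
        self.backbone = backbone                     # LM that can return hidden states
        self.value_head = nn.Linear(hidden_size, 1)

    def forward(self, input_ids):
        hidden = self.backbone(input_ids, output_hidden_states=True).hidden_states[-1]
        return self.value_head(hidden[:, -1, :]).squeeze(-1)   # reward from last token

def preference_loss(reward_model, chosen_ids, rejected_ids):
    """-log sigmoid(r_chosen - r_rejected): rank the preferred response higher."""
    r_chosen = reward_model(chosen_ids)
    r_rejected = reward_model(rejected_ids)
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```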

Models Used

  • Baichuan 2-7B
  • Baichuan 2-13B
  • Baichuan 2-7B-Chat
  • Baichuan 2-13B-Chat
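The chat checkpoints listed above are openly released. A minimal loading sketch follows; the Hugging Face repository name and the `chat` helper are assumptions based on the project's public release and may differ from the official instructions:

```python
# Minimal usage sketch. `baichuan-inc/Baichuan2-7B-Chat` and `model.chat(...)` are
# assumptions based on the public release; check the official README for the
# exact repository names and interface.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "baichuan-inc/Baichuan2-7B-Chat"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

messages = [{"role": "user", "content": "Summarize Baichuan 2 in one sentence."}]
response = model.chat(tokenizer, messages)   # custom helper shipped with the model code
print(response)
```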

Datasets

The following datasets were used in this research (an illustrative scoring sketch follows the list):

  • MMLU
  • CMMLU
  • GSM8K
  • HumanEval
  • MedQA
  • JEC-QA
  • Flores-101
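Several of these benchmarks (MMLU, CMMLU, MedQA, JEC-QA) are multiple-choice. A common way to score a causal LM on such tasks, shown below as a sketch rather than the paper's exact harness, is to compare the log-likelihood the model assigns to each candidate answer given the few-shot prompt:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def choice_logprob(model, tokenizer, prompt, choice):
    """Sum of log-probabilities assigned to `choice` when appended to `prompt`.

    Assumes tokenizing `prompt` yields a prefix of tokenizing `prompt + choice`,
    the usual simplification in evaluation harnesses.
    """
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + choice, return_tensors="pt").input_ids
    logits = model(full_ids).logits[:, :-1, :]               # position t predicts t + 1
    log_probs = F.log_softmax(logits, dim=-1)
    targets = full_ids[:, 1:]
    token_lp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return token_lp[:, prompt_ids.size(1) - 1 :].sum().item()

def predict(model, tokenizer, prompt, choices=("A", "B", "C", "D")):
    """Pick the answer option whose continuation the model finds most likely."""
    scores = [choice_logprob(model, tokenizer, prompt, f" {c}") for c in choices]
    return choices[max(range(len(choices)), key=lambda i: scores[i])]
```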

Evaluation Metrics

  • None specified

Results

  • Baichuan 2 matches or outperforms other open-source models on benchmarks like MMLU, CMMLU, GSM8K, and HumanEval.
  • Baichuan 2-7B achieves nearly 30% higher performance than Baichuan 1-7B on general benchmarks.
  • Shows significant improvement on math and code problems, nearly doubling the results of Baichuan 1.

Limitations

The authors identified the following limitations:

  • Not specified

Technical Requirements

  • Number of GPUs: 1024
  • GPU Type: NVIDIA A800
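As a rough back-of-the-envelope check on this setup (a standard approximation, not a figure reported by the authors), pre-training compute can be estimated with the common C ≈ 6 × parameters × tokens rule:

```python
# Illustrative estimate only, using the standard C ≈ 6 * N * D approximation;
# the authors do not report this figure.
params = 13e9        # Baichuan 2-13B parameters
tokens = 2.6e12      # pre-training tokens

flops = 6 * params * tokens
print(f"~{flops:.1e} training FLOPs")   # ≈ 2.0e+23 for the 13B model
```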

Keywords

Large language models, Multilingual NLP, Transformer architecture, Reinforcement Learning from Human Feedback, Model safety, Scaling laws

Papers Using Similar Methods

External Resources