Gemma Team, Google DeepMind (2024)
This paper introduces Gemma, a family of lightweight open models built from the research and technology behind Gemini. Gemma models show strong performance on language understanding, reasoning, and safety benchmarks. The models are released in two sizes (2 billion and 7 billion parameters), each with both pretrained and fine-tuned checkpoints, and Gemma outperforms similarly sized open models on 11 of 18 text-based tasks. The authors argue that responsible releases of large language models are critical for improving the safety of frontier models and for spurring further innovation. Training combined large datasets with a modern transformer architecture, with an emphasis on reducing potential harm and on validating performance through rigorous evaluation. The report also provides detailed assessments of the models' limitations and outlines directions for responsible deployment and community engagement in AI development.
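The released checkpoints can be used with standard open-source tooling. As a minimal sketch of running the instruction-tuned 2B model (assuming the Hugging Face Transformers library and the `google/gemma-2b-it` model ID, neither of which is specified in the summary above):

```python
# Minimal sketch: loading a released Gemma checkpoint for inference.
# Assumes the Hugging Face Transformers library and the "google/gemma-2b-it"
# model ID; these are assumptions, not details taken from the paper summary.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b-it"  # instruction-tuned 2B checkpoint (assumed ID)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Format a single-turn conversation using the tokenizer's chat template.
messages = [{"role": "user", "content": "Explain what a lightweight open model is."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

# Generate a response and decode only the newly produced tokens.
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Access to the checkpoints may require accepting the model license on the hosting platform; the snippet is a usage sketch rather than a setup guide.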
This paper employs the following methods: decoder-only transformer language modeling, supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), and benchmark-based capability and safety evaluations.
The following datasets were used in this research: a large, primarily English pretraining corpus of web documents, mathematics, and code, plus prompt-response pairs for instruction tuning.
The authors identified the following limitations: