Gemma: Open Models Based on Gemini Research and Technology

Gemma Team, Google DeepMind, Andreas Hutter, Andrei Terzis, Angelos Kulik, Anushan Filos, Aurelien Fernando, Danila Boffy, Edouard Sinopalnikov, Gabriela Leurent, Geoffrey Surita, Jilin Cideron, Karthik Chen, Kathy Raveendran, Kehang Meier-Hellstern, Kevin Han, Kritika Robinson, Le Muralidharan, Leonard Hou, Lev Berrada, Luheng Proleev, Marie He, Mark Pellat, Matt Sherwood, Matthias Hoffman, Nicola Grundmann, Nikola De Cao, Nino Momchev, Noah Vieillard, Peter Constant, Piotr Liu, Qiao Stanczyk, Ruba Zhang, Seliem Haroun, Siddhartha El-Sayed, Tianhe Brahma, Kevin Yu, Tom Le Paine, Yingjie Miao, Yuanzhong Xu (2024)

Paper Information
arXiv ID: 2403.08295
Venue: arXiv.org
Domain: artificial intelligence, natural language processing, machine learning
SOTA Claim: Yes
Code: Available
Reproducibility: 8/10

Abstract

This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Gemma outperforms similarly sized open models on 11 out of 18 text-based tasks, and we present comprehensive evaluations of safety and responsibility aspects of the models, alongside a detailed description of model development. We believe the responsible release of LLMs is critical for improving the safety of frontier models, and for enabling the next wave of LLM innovations.

Summary

This paper introduces Gemma, a family of lightweight open models developed using the research and technology behind Gemini. Gemma models show strong performance in language understanding, reasoning, and safety across a range of benchmarks. The models are available in two sizes (2 billion and 7 billion parameters), with both pretrained and instruction-tuned checkpoints released. Gemma outperforms similarly sized open models on 11 of 18 text-based tasks, and the authors argue that responsible releases of large language models are important for improving frontier-model safety and enabling further innovation. The models are transformer decoders trained on up to 6 trillion tokens of primarily English web text, mathematics, and code, then adapted with supervised fine-tuning and reinforcement learning from human feedback. The paper also presents structured safety and responsibility evaluations, discusses known limitations, and outlines directions for responsible deployment and community engagement in AI development.

Methods

This paper employs the following methods (an illustrative fine-tuning sketch follows the list):

  • Transformer
  • Reinforcement Learning from Human Feedback
  • Supervised Fine-Tuning
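
To make the last entry concrete, the sketch below shows the token-level loss masking that supervised fine-tuning on dialogue data typically uses, written against Gemma's published turn markers (`<start_of_turn>` / `<end_of_turn>`). It is a minimal illustration, not the authors' pipeline: the model id, example dialogue, and masking convention are assumptions, and the actual SFT data mixture and hyperparameters are not released.

```python
# Minimal, illustrative SFT step on a single dialogue example using Gemma's
# published chat markers. NOT the authors' training code; downloading the
# weights requires accepting the Gemma terms of use on the Hugging Face Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/gemma-2b")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")

prompt = ("<start_of_turn>user\nExplain attention in one sentence.<end_of_turn>\n"
          "<start_of_turn>model\n")
response = "Attention lets each token weight every other token by relevance.<end_of_turn>"

batch = tok(prompt + response, return_tensors="pt")
labels = batch["input_ids"].clone()

# Mask the prompt tokens with -100 so the cross-entropy loss is computed only
# on the model's turn (the usual SFT convention).
prompt_len = tok(prompt, return_tensors="pt")["input_ids"].shape[1]
labels[:, :prompt_len] = -100

out = model(**batch, labels=labels)   # causal-LM cross-entropy on response tokens
out.loss.backward()                   # an optimizer step would follow in real training
```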

Models Used

  • Gemma 2B
  • Gemma 7B
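
Both sizes are released as pretrained and instruction-tuned checkpoints, distributed in several formats (Keras, JAX/Flax, and Hugging Face `transformers`, among others). A minimal chat-style inference sketch against the `transformers` distribution of the instruction-tuned 7B checkpoint might look like the following; the model id `google/gemma-7b-it` and the generation settings are illustrative choices, not prescriptions from the paper.

```python
# Illustrative inference with the instruction-tuned Gemma 7B checkpoint via
# Hugging Face transformers (one of several published distributions). Access
# to the weights requires accepting the Gemma terms of use on the Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/gemma-7b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-7b-it")

messages = [{"role": "user", "content": "Summarize the transformer in one sentence."}]
input_ids = tok.apply_chat_template(messages, add_generation_prompt=True,
                                    return_tensors="pt")

out = model.generate(input_ids, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```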

Datasets

The following datasets were used in this research:

  • None specified

Evaluation Metrics

  • None specified

Results

  • Gemma outperforms similarly sized open models on 11 out of 18 text-based tasks.
  • Gemma 7B has a 61.2% positive win rate against Mistral v0.2 7B Instruct on instruction-following tasks.
  • Gemma models demonstrate strong performance on mathematics and coding benchmarks.
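
For context on reading the second result: a positive win rate is the share of pairwise human comparisons that the model wins, typically with ties set aside. The helper below is hypothetical and the comparison counts are invented; only the 61.2% headline figure comes from the paper.

```python
# Hypothetical illustration of the arithmetic behind a positive win rate in a
# pairwise human evaluation. The counts are invented for the example.
def positive_win_rate(wins: int, losses: int) -> float:
    """Share of decided comparisons won by the model (ties set aside)."""
    return wins / (wins + losses)

print(f"{positive_win_rate(wins=306, losses=194):.1%}")  # -> 61.2%
```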

Limitations

The authors identified the following limitations:

  • The release of open model weights is irreversible, and the potential harms of open models are not yet fully characterized.

Technical Requirements

  • Accelerator type: TPUv5e
  • Accelerator count: 4096 chips (16 pods) for Gemma 7B; 512 chips (2 pods) for Gemma 2B (see the illustrative mesh sketch below)
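
The paper reports that Gemma 7B was trained across 16 TPUv5e pods of 256 chips each, combining model sharding and data replication over the fleet. The JAX sketch below only illustrates how such a two-axis device mesh can be declared; it is not the authors' code, and on a machine with fewer devices the mesh simply collapses to the available hardware.

```python
# Illustrative two-axis device mesh in JAX: one plausible way to lay out
# 16 pods x 256 TPUv5e chips for mixed data/model parallelism. Not the
# authors' training setup.
import numpy as np
import jax
from jax.sharding import Mesh

devices = np.asarray(jax.devices())
pods = max(1, devices.size // 256)          # 16 on the full 4096-chip fleet
per_pod = devices.size // pods              # 256 at full scale
mesh = Mesh(devices[: pods * per_pod].reshape(pods, per_pod),
            axis_names=("data", "model"))
print(mesh.shape)                           # {'data': 16, 'model': 256} at full scale
```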

Keywords

open models, transformers, language understanding, safety, responsible deployment, instruction tuning
