Bradley Brown ([email protected]), Jordan Juravsky, Ryan Ehrlich ([email protected]), Ronald Clark ([email protected]), Quoc V. Le §, Christopher Ré, Azalia Mirhoseini ([email protected])
Department of Computer Science, Stanford University; ‡ University of Oxford; § Google DeepMind (2024)
This paper investigates scaling inference compute for large language models (LLMs) through repeated sampling of candidate solutions. It establishes that increasing the number of samples yields significant improvements in coverage, the fraction of problems solved by at least one generated sample. The gains are especially pronounced in domains such as coding and formal proofs, where automatic verification tools can identify correct samples. Across several datasets, repeated sampling substantially raises problem-solving rates, in some cases allowing weaker models to outperform the single-sample performance of state-of-the-art models. A key finding is an approximately log-linear relationship between the number of samples and coverage, suggestive of scaling laws for inference. The authors also highlight a challenge in domains lacking automatic verifiers: common sample-selection methods, such as majority voting and reward-model scoring, fail to keep pace with coverage as sampling increases. The paper therefore emphasizes the importance of precise sample-selection methods and draws attention to the limitations of repeated sampling across tasks and models.
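The coverage metric described above can be sketched with the standard unbiased pass@k-style estimator: given n samples per problem of which c are correct, it estimates the probability that at least one of k randomly chosen samples is correct. A minimal Python sketch, where the function names and the per-problem averaging are illustrative assumptions rather than the paper's exact code:

```python
from math import comb

def coverage_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one correct sample among k),
    given n total samples of which c are correct (pass@k estimator)."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

def benchmark_coverage(results: list[tuple[int, int]], k: int) -> float:
    """Fraction of problems solved by at least one of k samples.
    `results` holds (n_samples, n_correct) pairs, one per problem."""
    return sum(coverage_at_k(n, c, k) for n, c in results) / len(results)
```

For example, a problem with 1 correct sample out of 2 gives `coverage_at_k(2, 1, 1) == 0.5`, and averaging over problems yields the benchmark-level coverage curve that the paper plots against the sample budget k.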