CI at Scale: Lean, Green, and Fast

(2025)

Paper Information

  • arXiv ID: 2501.03440
  • Venue: arXiv.org

Abstract

Maintaining a "green" mainline branch, where all builds pass successfully, is crucial but challenging in fast-paced, large-scale software development environments, particularly with concurrent code changes in large monorepos. SubmitQueue, a system designed to address these challenges, speculatively executes builds and only lands changes with successful outcomes. However, despite its effectiveness, the system faces inefficiencies in resource utilization, leading to a high rate of premature build aborts and delays in landing smaller changes blocked by larger conflicting ones. This paper introduces enhancements to SubmitQueue, focusing on optimizing resource usage and improving build prioritization. Central to this is our innovative probabilistic model, which distinguishes between changes with shorter and longer build times to prioritize builds for more efficient scheduling. By leveraging a machine learning model to predict build times and incorporating this into the probabilistic framework, we expedite the landing of smaller changes blocked by conflicting larger time-consuming changes. Additionally, introducing a concept of speculation threshold ensures that only the most likely builds are executed, reducing unnecessary resource consumption. After implementing these enhancements across Uber's major monorepos (Go, iOS, and Android), we observed a reduction in Continuous Integration (CI) resource usage by approximately 53%, CPU usage by 44%, and P95 waiting times by 37%. These improvements highlight the enhanced efficiency of SubmitQueue in managing large-scale software changes while maintaining a green mainline.
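The speculation threshold described above can be sketched as a simple filter over candidate speculative builds: only builds whose estimated probability of being needed clears the threshold are executed. This is an illustrative sketch, not the paper's implementation; the function name, candidate labels, probabilities, and threshold value are all hypothetical.

```python
# Hedged sketch of a speculation threshold: of all speculative builds,
# run only those whose estimated probability of being needed clears
# the threshold. Names and numbers are illustrative assumptions.

def select_builds(candidates, threshold):
    """candidates: (build_id, probability) pairs, where probability
    estimates how likely this speculative build's result is the one
    that will actually be landed on the mainline."""
    return [bid for bid, p in candidates if p >= threshold]

# Three speculative builds for two queued changes A and B:
candidates = [
    ("B after A passes", 0.80),   # likely path
    ("B after A fails", 0.20),    # unlikely path
    ("B alone", 0.05),            # very unlikely path
]
selected = select_builds(candidates, threshold=0.25)
```

With this threshold only the most likely speculative build runs, which is the mechanism the abstract credits for reducing unnecessary resource consumption.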

Summary

The paper presents enhancements to the SubmitQueue system used in continuous integration (CI) for large-scale software development. It addresses the challenge of maintaining a "green" mainline branch in the presence of concurrent code changes across extensive monorepos. The authors introduce a probabilistic model that distinguishes changes with shorter build times from those with longer ones, improving build prioritization. After deploying these enhancements, the authors report significant reductions in CI resource usage (53%), CPU usage (44%), and P95 waiting times (37%). The paper systematically discusses the background, the limitations of the existing system, and the proposed solutions, which use machine learning to predict build times and reduce waiting times for smaller changes.

Methods

This paper employs the following methods:

  • Probabilistic Modeling
  • Speculative Execution
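Speculative execution in a submit queue means building a change against every possible outcome (pass or fail) of the changes queued ahead of it, so the correct result is ready whichever way earlier builds go. The sketch below, with hypothetical change names, enumerates those targets and shows why the count grows exponentially, motivating the prioritization and pruning the paper adds.

```python
# Illustrative sketch: enumerate speculative build targets for a queue
# of pending changes. The result of change i depends on whether each
# earlier change passes or fails, so it needs up to 2**i speculative
# builds. Change names are hypothetical.
from itertools import product

def speculative_targets(pending):
    """Return (change, outcomes-of-earlier-changes) pairs covering
    every pass/fail combination of the changes queued ahead."""
    targets = []
    for i, change in enumerate(pending):
        for outcomes in product(("pass", "fail"), repeat=i):
            targets.append((change, outcomes))
    return targets

# Three queued changes need 1 + 2 + 4 = 7 speculative builds:
tree = speculative_targets(["C1", "C2", "C3"])
```

The exponential blow-up in `tree` is what makes the speculation threshold and build-time-aware prioritization worthwhile.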

Models Used

  • NGBoost
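The paper uses NGBoost to predict build-time *distributions* rather than point estimates. The stdlib-only sketch below stands in for that model with plain normal distributions, to show how distributional predictions can feed prioritization, e.g. estimating the chance a small change's build finishes before a large conflicting one's. All means and standard deviations are hypothetical.

```python
# Hedged sketch: given distributional build-time predictions (NGBoost
# in the paper; plain normals here for a self-contained example),
# estimate P(build A finishes before build B). Numbers are illustrative.
from math import sqrt
from statistics import NormalDist

def prob_finishes_first(mean_a, sd_a, mean_b, sd_b):
    """P(A's build time < B's build time), assuming independent
    normally distributed predictions (e.g. in minutes)."""
    diff = NormalDist(mean_a - mean_b, sqrt(sd_a**2 + sd_b**2))
    return diff.cdf(0.0)

# A small change (~5 min) blocked behind a large one (~40 min)
# is almost certain to finish first, so it can be prioritized:
p = prob_finishes_first(5.0, 2.0, 40.0, 10.0)
```

Ranking conflicting changes by such probabilities is one way a scheduler can land smaller changes ahead of larger, time-consuming ones.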

Datasets

The following datasets were used in this research:

  • None specified

Evaluation Metrics

  • P95 waiting times
  • CPU usage
  • CI resource usage

Results

  • Reduction in CI resource usage by 53%
  • Reduction in CPU usage by 44%
  • Reduction in P95 waiting times by 37%

Technical Requirements

  • Number of GPUs: None specified
  • GPU Type: None specified
  • Compute Requirements: None specified
