Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning

(2025)

Paper Information
arXiv ID: 2506.17204

Abstract

Effectively scaling up deep reinforcement learning models has proven notoriously difficult due to network pathologies during training, motivating various targeted interventions such as periodic reset and architectural advances such as layer normalization. Instead of pursuing more complex modifications, we show that introducing static network sparsity alone can unlock further scaling potential beyond their dense counterparts with state-of-the-art architectures. This is achieved through simple one-shot random pruning, where a predetermined percentage of network weights are randomly removed once before training. Our analysis reveals that, in contrast to naively scaling up dense DRL networks, such sparse networks achieve both higher parameter efficiency for network expressivity and stronger resistance to optimization challenges like plasticity loss and gradient interference. We further extend our evaluation to visual and streaming RL scenarios, demonstrating the consistent benefits of network sparsity. Our code is publicly available at GitHub.

Summary

This paper investigates the role of static network sparsity in improving the scalability of deep reinforcement learning (DRL) models. Scaling DRL networks has traditionally relied on targeted interventions against training pathologies, such as periodic resets, or architectural advances such as layer normalization. The authors instead show that simple one-shot random pruning, which removes a fixed fraction of network weights once before training, yields higher parameter efficiency and stronger resistance to optimization pathologies such as plasticity loss and gradient interference. Extensive experiments show that the resulting sparse networks outperform their dense counterparts as model size grows, retaining learning capacity where naively scaled dense networks degrade. These benefits generalize across DRL scenarios, including visual and streaming RL.
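
As a concrete illustration of the one-shot random pruning described above, the sketch below randomly prunes a fixed fraction of weights in a small MLP before training begins. It is a minimal example assuming a PyTorch implementation; the layer sizes and the 90% sparsity level are illustrative and not taken from the paper, whose experiments build on architectures such as SimBa.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def one_shot_random_prune(model: nn.Module, sparsity: float = 0.9) -> nn.Module:
    """Randomly zero out a fixed fraction of weights in every Linear layer,
    once, before any training step. The masks stay fixed afterwards, so this
    corresponds to the static-sparsity setting."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            # Creates weight_orig / weight_mask; the effective weight is
            # weight_orig * weight_mask, so pruned connections stay inactive.
            prune.random_unstructured(module, name="weight", amount=sparsity)
    return model

# Illustrative MLP (not the paper's exact architecture or width).
net = nn.Sequential(nn.Linear(64, 1024), nn.ReLU(),
                    nn.Linear(1024, 1024), nn.ReLU(),
                    nn.Linear(1024, 1))
one_shot_random_prune(net, sparsity=0.9)  # prune once, then train as usual
```

Because `torch.nn.utils.prune` stores the mask as a buffer and re-applies it on every forward pass, the sparsity pattern chosen here remains fixed for the rest of training.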

Methods

This paper employs the following methods:

  • One-shot random pruning
  • Static sparse training (SST); contrasted with DST in the sketch after this list
  • Dynamic sparse training (DST)
  • Soft Actor-Critic (SAC)
  • Deep Deterministic Policy Gradient (DDPG)
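
The distinction between static and dynamic sparse training in the list above can be made concrete with a small mask-based training loop. The sketch below is a minimal illustration, not the paper's implementation: it assumes a PyTorch setup, a toy regression loss in place of an RL objective, and a SET/RigL-style drop-and-grow rule as the stand-in for DST.

```python
import torch
import torch.nn as nn

def make_random_masks(model: nn.Module, sparsity: float) -> dict:
    """One fixed binary mask per Linear weight, drawn once at initialization."""
    return {name: (torch.rand_like(p) >= sparsity).float()
            for name, p in model.named_parameters() if p.dim() == 2}

@torch.no_grad()
def apply_masks(model: nn.Module, masks: dict) -> None:
    """Zero out masked weights; called after every optimizer step under SST."""
    for name, p in model.named_parameters():
        if name in masks:
            p.mul_(masks[name])

@torch.no_grad()
def drop_and_grow(model: nn.Module, masks: dict, drop_frac: float = 0.1) -> None:
    """DST-style mask update (SET/RigL flavour, assumed for illustration):
    drop the smallest-magnitude active weights and regrow the same number
    at random previously-inactive positions, keeping sparsity constant."""
    params = dict(model.named_parameters())
    for name, mask in masks.items():
        w, flat_mask = params[name], mask.view(-1)
        active_idx = flat_mask.nonzero().squeeze(1)
        inactive_idx = (flat_mask == 0).nonzero().squeeze(1)
        n = min(int(drop_frac * active_idx.numel()), inactive_idx.numel())
        if n == 0:
            continue
        order = w.abs().flatten()[active_idx].argsort()
        flat_mask[active_idx[order[:n]]] = 0.0                                    # drop
        flat_mask[inactive_idx[torch.randperm(inactive_idx.numel())[:n]]] = 1.0   # grow

# Toy training loop: the loss is a placeholder for an actor/critic objective.
net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
masks = make_random_masks(net, sparsity=0.8)
apply_masks(net, masks)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(200):
    loss = net(torch.randn(32, 8)).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    apply_masks(net, masks)          # SST: the masks themselves never change
    # A DST variant would additionally update the masks periodically, e.g.:
    # if step % 100 == 0: drop_and_grow(net, masks, drop_frac=0.1)
```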

Models Used

  • SimBa
  • Soft Actor-Critic (SAC)
  • Deep Deterministic Policy Gradient (DDPG)

Datasets

The following datasets were used in this research:

  • DeepMind Control Suite (DMC)
  • Atari-100k

Evaluation Metrics

  • Stable rank (SRank)
  • Fraction of Active Units (FAU); both metrics are illustrated in the computation sketch after this list
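
Both diagnostics can be computed from a batch of hidden-layer features. The sketch below is a minimal, assumed implementation: the 99% spectrum threshold for SRank and the "active if positive for any input in the batch" criterion for FAU follow common usage in the plasticity literature and may differ in detail from the paper's exact definitions.

```python
import torch

def srank(features: torch.Tensor, delta: float = 0.01) -> int:
    """Effective/stable rank of a (batch x dim) feature matrix: the smallest k
    such that the top-k singular values capture a (1 - delta) fraction of the
    spectrum. delta = 0.01 is a commonly used threshold (an assumption here)."""
    s = torch.linalg.svdvals(features)
    cumulative = torch.cumsum(s, dim=0) / s.sum()
    return int((cumulative < 1.0 - delta).sum().item()) + 1

def fraction_of_active_units(post_relu: torch.Tensor, eps: float = 0.0) -> float:
    """FAU: fraction of hidden units whose post-activation output exceeds eps
    for at least one input in the (batch x units) batch."""
    active = (post_relu > eps).any(dim=0)
    return active.float().mean().item()

# Illustrative usage on random features (stand-ins for a critic's hidden layer).
feats = torch.relu(torch.randn(256, 512) @ torch.randn(512, 512))
print(srank(feats), fraction_of_active_units(feats))
```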

Results

  • Sparse networks outperform their dense counterparts when model size is scaled up
  • Introducing sparsity improves parameter efficiency and extends scaling beyond the limits of dense architectures
  • Sparsity mitigates optimization pathologies such as plasticity loss and gradient interference, allowing larger models to keep improving

Technical Requirements

  • Number of GPUs: None specified
  • GPU Type: None specified
  • Compute Requirements: None specified

External Resources