Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning

(2025)

Paper Information
arXiv ID: 2506.17204

Abstract

Effectively scaling up deep reinforcement learning models has proven notoriously difficult due to network pathologies during training, motivating various targeted interventions such as periodic reset and architectural advances such as layer normalization. Instead of pursuing more complex modifications, we show that introducing static network sparsity alone can unlock further scaling potential beyond their dense counterparts with state-of-the-art architectures. This is achieved through simple one-shot random pruning, where a predetermined percentage of network weights are randomly removed once before training. Our analysis reveals that, in contrast to naively scaling up dense DRL networks, such sparse networks achieve both higher parameter efficiency for network expressivity and stronger resistance to optimization challenges like plasticity loss and gradient interference. We further extend our evaluation to visual and streaming RL scenarios, demonstrating the consistent benefits of network sparsity. Our code is publicly available at GitHub.

Summary

This paper investigates the role of static network sparsity in improving the scalability of deep reinforcement learning (DRL) models. Scaling DRL networks has traditionally relied on targeted interventions against training pathologies, such as periodic resets, or architectural advances such as layer normalization. The authors instead show that simple one-shot random pruning, which removes a fixed fraction of network weights once before training, yields higher parameter efficiency and stronger resistance to optimization pathologies such as plasticity loss and gradient interference. Extensive experiments show that the resulting sparse networks outperform their dense counterparts as model size grows, retaining learning capacity where naively scaled dense networks degrade. These benefits generalize across DRL scenarios, including visual and streaming RL.
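
As a concrete illustration of the one-shot random pruning described above, the sketch below randomly prunes a fixed fraction of weights in a small MLP before training begins. It is a minimal example assuming a PyTorch implementation; the layer sizes and the 90% sparsity level are illustrative and not taken from the paper, whose experiments build on architectures such as SimBa.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def one_shot_random_prune(model: nn.Module, sparsity: float = 0.9) -> nn.Module:
    """Randomly zero out a fixed fraction of weights in every Linear layer,
    once, before any training step. The masks stay fixed afterwards, so this
    corresponds to the static-sparsity setting."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            # Creates weight_orig / weight_mask; the effective weight is
            # weight_orig * weight_mask, so pruned connections stay inactive.
            prune.random_unstructured(module, name="weight", amount=sparsity)
    return model

# Illustrative MLP (not the paper's exact architecture or width).
net = nn.Sequential(nn.Linear(64, 1024), nn.ReLU(),
                    nn.Linear(1024, 1024), nn.ReLU(),
                    nn.Linear(1024, 1))
one_shot_random_prune(net, sparsity=0.9)  # prune once, then train as usual
```

Because `torch.nn.utils.prune` stores the mask as a buffer and re-applies it on every forward pass, the sparsity pattern chosen here remains fixed for the rest of training.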

Methods

This paper employs the following methods:

  • One-shot random pruning
  • Static sparse training (SST); contrasted with DST in the sketch after this list
  • Dynamic sparse training (DST)
  • Soft Actor-Critic (SAC)
  • Deep Deterministic Policy Gradient (DDPG)
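
The distinction between static and dynamic sparse training in the list above can be made concrete with a small mask-based training loop. The sketch below is a minimal illustration, not the paper's implementation: it assumes a PyTorch setup, a toy regression loss in place of an RL objective, and a SET/RigL-style drop-and-grow rule as the stand-in for DST.

```python
import torch
import torch.nn as nn

def make_random_masks(model: nn.Module, sparsity: float) -> dict:
    """One fixed binary mask per Linear weight, drawn once at initialization."""
    return {name: (torch.rand_like(p) >= sparsity).float()
            for name, p in model.named_parameters() if p.dim() == 2}

@torch.no_grad()
def apply_masks(model: nn.Module, masks: dict) -> None:
    """Zero out masked weights; called after every optimizer step under SST."""
    for name, p in model.named_parameters():
        if name in masks:
            p.mul_(masks[name])

@torch.no_grad()
def drop_and_grow(model: nn.Module, masks: dict, drop_frac: float = 0.1) -> None:
    """DST-style mask update (SET/RigL flavour, assumed for illustration):
    drop the smallest-magnitude active weights and regrow the same number
    at random previously-inactive positions, keeping sparsity constant."""
    params = dict(model.named_parameters())
    for name, mask in masks.items():
        w, flat_mask = params[name], mask.view(-1)
        active_idx = flat_mask.nonzero().squeeze(1)
        inactive_idx = (flat_mask == 0).nonzero().squeeze(1)
        n = min(int(drop_frac * active_idx.numel()), inactive_idx.numel())
        if n == 0:
            continue
        order = w.abs().flatten()[active_idx].argsort()
        flat_mask[active_idx[order[:n]]] = 0.0                                    # drop
        flat_mask[inactive_idx[torch.randperm(inactive_idx.numel())[:n]]] = 1.0   # grow

# Toy training loop: the loss is a placeholder for an actor/critic objective.
net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
masks = make_random_masks(net, sparsity=0.8)
apply_masks(net, masks)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(200):
    loss = net(torch.randn(32, 8)).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    apply_masks(net, masks)          # SST: the masks themselves never change
    # A DST variant would additionally update the masks periodically, e.g.:
    # if step % 100 == 0: drop_and_grow(net, masks, drop_frac=0.1)
```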

Models Used

  • SimBa
  • Soft Actor-Critic (SAC)
  • Deep Deterministic Policy Gradient (DDPG)

Datasets

The following datasets were used in this research:

  • DeepMind Control Suite (DMC)
  • Atari-100k

Evaluation Metrics

  • Stable rank (SRank)
  • Fraction of Active Units (FAU); both metrics are illustrated in the computation sketch after this list
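
Both diagnostics can be computed from a batch of hidden-layer features. The sketch below is a minimal, assumed implementation: the 99% spectrum threshold for SRank and the "active if positive for any input in the batch" criterion for FAU follow common usage in the plasticity literature and may differ in detail from the paper's exact definitions.

```python
import torch

def srank(features: torch.Tensor, delta: float = 0.01) -> int:
    """Effective/stable rank of a (batch x dim) feature matrix: the smallest k
    such that the top-k singular values capture a (1 - delta) fraction of the
    spectrum. delta = 0.01 is a commonly used threshold (an assumption here)."""
    s = torch.linalg.svdvals(features)
    cumulative = torch.cumsum(s, dim=0) / s.sum()
    return int((cumulative < 1.0 - delta).sum().item()) + 1

def fraction_of_active_units(post_relu: torch.Tensor, eps: float = 0.0) -> float:
    """FAU: fraction of hidden units whose post-activation output exceeds eps
    for at least one input in the (batch x units) batch."""
    active = (post_relu > eps).any(dim=0)
    return active.float().mean().item()

# Illustrative usage on random features (stand-ins for a critic's hidden layer).
feats = torch.relu(torch.randn(256, 512) @ torch.randn(512, 512))
print(srank(feats), fraction_of_active_units(feats))
```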

Results

  • Sparse networks outperform their dense counterparts when model size is scaled up
  • Introducing sparsity improves parameter efficiency and extends scaling beyond the limits of dense architectures
  • Sparsity mitigates optimization pathologies such as plasticity loss and gradient interference, allowing larger models to keep improving

Technical Requirements

  • Number of GPUs: None specified
  • GPU Type: None specified
  • Compute Requirements: None specified

External Resources