
SparseGPT: Massive Language Models Can be Accurately Pruned in One-Shot

Elias Frantar, Dan Alistarh (2023)

Paper Information

  • arXiv ID: 2301.00774
  • Venue: International Conference on Machine Learning
  • Domain: Natural language processing
  • SOTA Claim: Yes
  • Code: Available
  • Reproducibility: 8/10

Abstract

We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models. We can execute SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, in under 4.5 hours, and can reach 60% unstructured sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time. SparseGPT generalizes to semi-structured (2:4 and 4:8) patterns, and is compatible with weight quantization approaches. The code is available at: https://github.com/IST-DASLab/sparsegpt.
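The abstract mentions semi-structured 2:4 and 4:8 sparsity patterns, which hardware such as NVIDIA's sparse tensor cores can accelerate. The snippet below is a minimal PyTorch sketch of what an N:M pattern looks like; the magnitude-based selection rule and the function name `apply_n_m_sparsity` are illustrative assumptions, not the paper's method, since SparseGPT selects and reconstructs weights using second-order information.

```python
import torch

def apply_n_m_sparsity(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """Zero out all but the n largest-magnitude weights in every group of m
    consecutive weights along the input dimension (the N:M pattern).

    Plain magnitude is used as the selection criterion purely for illustration;
    SparseGPT chooses which weights to drop (and updates the survivors) using
    Hessian-based information.
    """
    out_features, in_features = weight.shape
    assert in_features % m == 0, "input dimension must be divisible by m"

    groups = weight.reshape(out_features, in_features // m, m)
    # Indices of the (m - n) smallest-magnitude weights in each group.
    _, drop_idx = torch.topk(groups.abs(), k=m - n, dim=-1, largest=False)
    mask = torch.ones_like(groups)
    mask.scatter_(-1, drop_idx, 0.0)
    return (groups * mask).reshape(out_features, in_features)

# Example: prune a random linear weight matrix to the 2:4 pattern.
w = torch.randn(8, 16)
w_sparse = apply_n_m_sparsity(w, n=2, m=4)
assert (w_sparse.reshape(8, -1, 4) != 0).sum(-1).max() <= 2
```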

Summary

This paper introduces SparseGPT, a method for one-shot pruning of large generative pretrained transformer (GPT) family models to at least 50% sparsity without retraining and with minimal loss of accuracy. SparseGPT runs efficiently on massive models such as OPT-175B and BLOOM-176B, removing a large fraction of their weights while preserving performance, as measured by perplexity and zero-shot accuracy. The authors also study how model size affects prunability and find that larger models tolerate higher sparsity with less degradation. In addition, SparseGPT can be combined with weight quantization for further compression. Experiments confirm its advantage over traditional pruning baselines such as magnitude pruning, particularly at the 10-100 billion-parameter scale.
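For a concrete picture of the layer-wise, post-training workflow described above, here is a simplified PyTorch sketch: gather a small calibration batch, estimate a per-layer Hessian proxy from the layer's inputs, score weights, and zero the lowest-scoring ones in one shot. It deliberately omits SparseGPT's core contributions (the efficient sequential reconstruction of surviving weights and the block-wise Hessian updates that make 100B+ models tractable), so treat it as an assumption-laden illustration of the calibrate-score-mask workflow rather than the authors' algorithm.

```python
import torch

@torch.no_grad()
def prune_linear_one_shot(layer: torch.nn.Linear,
                          calib_inputs: torch.Tensor,
                          sparsity: float = 0.5,
                          damp: float = 0.01) -> None:
    """Simplified one-shot, post-training pruning of a single linear layer.

    Scores each weight with the OBS-style saliency w_ij^2 / [H^-1]_jj, where
    H = X^T X is estimated from a small calibration batch, then zeroes the
    lowest-scoring fraction in place. No retraining is performed.
    """
    W = layer.weight.data                      # (out_features, in_features)
    X = calib_inputs.reshape(-1, W.shape[1])   # (num_tokens, in_features)

    H = X.T @ X                                # input-covariance "Hessian" proxy
    H += damp * torch.mean(torch.diag(H)) * torch.eye(W.shape[1])
    H_inv_diag = torch.linalg.inv(H).diag()

    scores = W.pow(2) / H_inv_diag             # broadcast over output rows
    k = int(sparsity * W.numel())
    threshold = scores.flatten().kthvalue(k).values
    W[scores <= threshold] = 0.0

# Usage: prune one layer to 50% unstructured sparsity with stand-in calibration data.
lin = torch.nn.Linear(512, 512)
calib = torch.randn(128, 512)
prune_linear_one_shot(lin, calib, sparsity=0.5)
```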

Methods

This paper employs the following methods:

  • SparseGPT

Models Used

  • OPT-175B
  • BLOOM-176B

Datasets

The following datasets were used in this research:

  • C4
  • WikiText2
  • PTB

Evaluation Metrics

  • Perplexity
  • Zero-shot accuracy
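Both metrics follow standard recipes. As a reference point for the perplexity numbers, the sketch below shows the common WikiText2 evaluation loop with Hugging Face `transformers` and `datasets`; the small `facebook/opt-125m` checkpoint, the 2048-token window, and the non-overlapping stride are assumptions chosen to keep the example runnable and do not reproduce the paper's exact evaluation code.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

@torch.no_grad()
def wikitext2_perplexity(model_name: str = "facebook/opt-125m",
                         seq_len: int = 2048) -> float:
    """Perplexity on the WikiText2 test split: concatenate the corpus, split it
    into non-overlapping windows, and average the per-token negative
    log-likelihood before exponentiating."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()

    text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1",
                                    split="test")["text"])
    ids = tok(text, return_tensors="pt").input_ids

    nlls = []
    for start in range(0, ids.shape[1] - seq_len, seq_len):
        window = ids[:, start:start + seq_len]
        out = model(window, labels=window)     # out.loss is the mean NLL per token
        nlls.append(out.loss)
    return torch.exp(torch.stack(nlls).mean()).item()

print(f"WikiText2 perplexity: {wikitext2_perplexity():.2f}")
```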

Results

  • Models pruned with SparseGPT reach 50-60% unstructured sparsity with a negligible increase in perplexity, and larger models show a smaller perplexity increase at the same sparsity level.

Limitations

The authors identified the following limitations:

  • Not specified

Technical Requirements

  • Number of GPUs: 5
  • GPU Type: NVIDIA A100 80GB

Keywords

sparse pruning, large language models, transformer models, post-training pruning, model compression

Papers Using Similar Methods

External Resources