
MambaIR: A Simple Baseline for Image Restoration with State-Space Model

Hang Guo (Tsinghua Shenzhen International Graduate School, Tsinghua University), Jinmin Li (Tsinghua Shenzhen International Graduate School, Tsinghua University), Tao Dai (College of Computer Science and Software Engineering, Shenzhen University), Zhihao Ouyang (ByteDance Inc), Xudong Ren (Tsinghua Shenzhen International Graduate School, Tsinghua University), Shu-Tao Xia (Tsinghua Shenzhen International Graduate School, Tsinghua University; Peng Cheng Laboratory) (2024)

Paper Information
arXiv ID: 2402.15648
Venue: European Conference on Computer Vision
Domain: Computer Vision
SOTA Claim: Yes
Code: https://github.com/csguoh/MambaIR
Reproducibility: 7/10

Abstract

Recent years have seen significant advancements in image restoration, largely attributed to the development of modern deep neural networks, such as CNNs and Transformers. However, existing restoration backbones often face the dilemma between global receptive fields and efficient computation, hindering their application in practice. Recently, the Selective Structured State Space Model, especially the improved version Mamba, has shown great potential for long-range dependency modeling with linear complexity, which offers a way to resolve the above dilemma. However, the standard Mamba still faces certain challenges in low-level vision such as local pixel forgetting and channel redundancy. In this work, we introduce a simple but effective baseline, named MambaIR, which introduces both local enhancement and channel attention to improve the vanilla Mamba. In this way, our MambaIR takes advantage of the local pixel similarity and reduces the channel redundancy. Extensive experiments demonstrate the superiority of our method, for example, MambaIR outperforms SwinIR by up to 0.45 dB on image SR, using similar computational cost but with a global receptive field. Code is available at https://github.com/csguoh/MambaIR.

Summary

This paper presents MambaIR, a new baseline for image restoration built on the Selective Structured State Space Model (Mamba). The standard Mamba model faces challenges in low-level vision tasks, namely local pixel forgetting and channel redundancy. MambaIR addresses these limitations by incorporating local enhancement and channel attention. Extensive experiments demonstrate that MambaIR outperforms existing models such as SwinIR on image super-resolution while maintaining similar computational cost. The approach shows promising scalability and robustness across various restoration tasks, including denoising and JPEG compression artifact reduction.
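
For context, Mamba builds on the linear state-space formulation; the recap below uses the standard S4/Mamba notation as a general reminder and is not reproduced from the paper's own equations.

$$
h'(t) = A\,h(t) + B\,x(t), \qquad y(t) = C\,h(t)
$$

Discretizing with step size $\Delta$ (zero-order hold) gives the recurrence computed in practice,

$$
h_t = \bar{A}\,h_{t-1} + \bar{B}\,x_t, \qquad y_t = C\,h_t, \qquad
\bar{A} = e^{\Delta A}, \quad \bar{B} = (\Delta A)^{-1}\big(e^{\Delta A} - I\big)\,\Delta B,
$$

and Mamba's "selective" mechanism makes $B$, $C$, and $\Delta$ input-dependent, enabling content-aware long-range modeling at linear cost.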

Methods

This paper employs the following methods; a minimal structural sketch of the core block follows the list:

  • Mamba
  • Selective Structured State Space Model
  • Residual State Space Blocks (RSSBs)
  • Vision State-Space Module (VSSM)
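
As a rough illustration of how the pieces above fit together, the following PyTorch sketch wraps a vision state-space module with a convolutional local-enhancement branch and channel attention inside a residual block. The `PlaceholderVSSM`, layer names, and exact ordering are assumptions for illustration only, not the authors' implementation (the real VSSM performs a 2D selective scan).

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(self.pool(x))


class PlaceholderVSSM(nn.Module):
    """Stand-in for the Vision State-Space Module (the selective 2D scan is omitted)."""
    def __init__(self, channels: int):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        return self.proj(x)


class RSSB(nn.Module):
    """Residual State-Space Block: global SSM mixing + local enhancement + channel attention."""
    def __init__(self, channels: int):
        super().__init__()
        self.norm1 = nn.GroupNorm(1, channels)  # LayerNorm-like normalization over channels
        self.vssm = PlaceholderVSSM(channels)   # global, linear-complexity token mixing
        self.norm2 = nn.GroupNorm(1, channels)
        self.local = nn.Sequential(             # local enhancement to counter pixel forgetting
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.ca = ChannelAttention(channels)    # intended to reduce channel redundancy

    def forward(self, x):
        x = x + self.vssm(self.norm1(x))
        x = x + self.ca(self.local(self.norm2(x)))
        return x


# Shape check: an RSSB keeps the (B, C, H, W) layout.
# RSSB(64)(torch.randn(1, 64, 48, 48)).shape -> torch.Size([1, 64, 48, 48])
```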

Models Used

  • Mamba
  • SwinIR

Datasets

The following datasets were used in this research:

  • DIV2K
  • Flickr2K
  • Set5
  • Set14
  • B100
  • Urban100
  • Manga109
  • BSD500
  • WED
  • BSD68
  • Kodak24
  • McMaster
  • SIDD
  • DND

Evaluation Metrics

  • PSNR
  • SSIM
  • L1 loss (training objective)
  • Charbonnier loss (training objective; a minimal sketch of PSNR and this loss follows the list)
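
The sketch below shows PSNR as typically computed for restoration benchmarks and the Charbonnier loss in its common form; the `eps` value and the data range are generic defaults, not values taken from the paper.

```python
import torch


def psnr(pred: torch.Tensor, target: torch.Tensor, data_range: float = 1.0) -> torch.Tensor:
    """Peak signal-to-noise ratio in dB for images scaled to [0, data_range]."""
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(data_range ** 2 / mse)


def charbonnier_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-3) -> torch.Tensor:
    """Smooth L1-like penalty sqrt((x - y)^2 + eps^2), averaged over all elements."""
    return torch.sqrt((pred - target) ** 2 + eps ** 2).mean()
```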

Results

  • MambaIR outperforms SwinIR by up to 0.45 dB on image SR
  • Achieves superior performance on image denoising and JPEG compression artifact reduction

Limitations

The authors identified the following limitations of the standard Mamba when applied to low-level vision, which MambaIR is designed to address:

  • Local pixel forgetting
  • Channel redundancy in the standard Mamba model

Technical Requirements

  • Number of GPUs: 8
  • GPU Type: NVIDIA V100

Keywords

Image Restoration, State-Space Model, Global Receptive Field, Long-range Dependency Modeling
