Venue: European Conference on Computer Vision
Recent years have seen significant advancements in image restoration, largely attributed to the development of modern deep neural networks such as CNNs and Transformers. However, existing restoration backbones often face a dilemma between global receptive fields and efficient computation, hindering their application in practice. Recently, the Selective Structured State Space Model, especially its improved version Mamba, has shown great potential for long-range dependency modeling with linear complexity, offering a way to resolve this dilemma. However, the standard Mamba still faces certain challenges in low-level vision, such as local pixel forgetting and channel redundancy. In this work, we introduce a simple but effective baseline, named MambaIR, which introduces both local enhancement and channel attention to improve the vanilla Mamba. In this way, our MambaIR takes advantage of local pixel similarity and reduces channel redundancy. Extensive experiments demonstrate the superiority of our method; for example, MambaIR outperforms SwinIR by up to 0.45 dB on image SR with similar computational cost but a global receptive field. Code is available at https://github.com/csguoh/MambaIR.
This paper presents MambaIR, a new baseline for image restoration leveraging the Selective Structured State Space Model (Mamba). The standard Mamba model faces challenges in low-level vision tasks, such as local pixel forgetting and channel redundancy. MambaIR addresses these limitations by incorporating local enhancement and channel attention. Extensive experiments demonstrate that MambaIR outperforms existing models such as SwinIR on image super-resolution while maintaining similar computational cost. The approach also shows promising scalability and robustness across other restoration tasks, including denoising and JPEG compression artifact reduction.
This paper employs the following methods:
- Mamba
- Selective Structured State Space Model
- Residual State Space Blocks (RSSBs)
- Vision State-Space Module (VSSM)
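At the core of these modules is a selective state-space scan: a linear recurrence whose input/output projections vary per step, giving linear-time long-range modeling. The following is a minimal illustrative sketch of such a scan (names, shapes, and the simplified Euler discretization of B are assumptions for exposition, not the authors' implementation):

```python
import numpy as np

def selective_ssm_scan(x, A, B, C, delta):
    """Minimal 1-D selective state-space scan (illustrative sketch only).

    x:     (L,) input sequence, e.g. flattened pixel features
    A:     (N,) diagonal state transition, shared across steps
    B, C:  (L, N) input/output projections, varying per step ("selective")
    delta: (L,) per-step discretization step sizes
    """
    L, N = B.shape
    h = np.zeros(N)   # hidden state
    y = np.zeros(L)   # output sequence
    for t in range(L):
        # zero-order-hold discretization of the continuous-time transition
        A_bar = np.exp(delta[t] * A)     # (N,)
        B_bar = delta[t] * B[t]          # (N,), simplified Euler form
        h = A_bar * h + B_bar * x[t]     # state update
        y[t] = C[t] @ h                  # readout
    return y
```

Because each step only updates a fixed-size state, the scan costs O(L·N), in contrast to the O(L²) cost of full self-attention over the same sequence.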
The following datasets were used in this research:
- DIV2K
- Flickr2K
- Set5
- Set14
- B100
- Urban100
- Manga109
- BSD500
- WED
- BSD68
- Kodak24
- McMaster
- SIDD
- DND
The following evaluation metrics and loss functions were used:
- PSNR
- SSIM
- L1 loss
- Charbonnier loss
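These metrics and losses are standard in restoration work; a minimal numpy sketch (illustrative, not the paper's implementation) of PSNR and the Charbonnier loss:

```python
import numpy as np

def psnr(img1, img2, data_range=255.0):
    """Peak signal-to-noise ratio in dB (higher is better)."""
    mse = np.mean((img1.astype(np.float64) - img2.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)

def charbonnier_loss(pred, target, eps=1e-3):
    """Smooth L1-like loss: mean of sqrt(diff^2 + eps^2) over pixels."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    return np.mean(np.sqrt(diff * diff + eps * eps))
```

The Charbonnier loss behaves like L1 for large errors but is differentiable at zero, which is why it is often preferred over plain L1 for training restoration networks.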
The paper reports the following results:
- MambaIR outperforms SwinIR by up to 0.45 dB on image SR
- Achieves superior performance on image denoising and JPEG compression artifact reduction
The authors identified the following limitations:
- Local pixel forgetting
- Channel redundancy in the standard Mamba model
The experiments used the following hardware:
- Number of GPUs: 8
- GPU Type: NVIDIA V100
Keywords:
- Image Restoration
- State-Space Model
- Global Receptive Field
- Long-range Dependency Modeling