← ML Research Wiki / 2402.02491

VM-UNet: Vision Mamba UNet for Medical Image Segmentation

Jiacheng Ruan (Shanghai Jiao Tong University), Suncheng Xiang (Shanghai Jiao Tong University), 2024

Paper Information

  • arXiv ID: 2402.02491
  • Venue: arXiv.org
  • Domain: medical image segmentation
  • Code: https://github.com/JCruan519/VM-UNet
  • Reproducibility: 7/10

Abstract

In the realm of medical image segmentation, both CNN-based and Transformer-based models have been extensively explored. However, CNNs exhibit limitations in long-range modeling capabilities, whereas Transformers are hampered by their quadratic computational complexity. Recently, State Space Models (SSMs), exemplified by Mamba, have emerged as a promising approach. They not only excel in modeling long-range interactions but also maintain a linear computational complexity. In this paper, leveraging state space models, we propose a U-shape architecture model for medical image segmentation, named Vision Mamba UNet (VM-UNet). Specifically, the Visual State Space (VSS) block is introduced as the foundation block to capture extensive contextual information, and an asymmetrical encoder-decoder structure is constructed. We conduct comprehensive experiments on the ISIC17, ISIC18, and Synapse datasets, and the results indicate that VM-UNet performs competitively in medical image segmentation tasks. To our best knowledge, this is the first medical image segmentation model constructed based on the pure SSM-based model. We aim to establish a baseline and provide valuable insights for the future development of more efficient and effective SSM-based segmentation systems. Our code is available at https://github.com/JCruan519/VM-UNet.

Summary

This paper introduces VM-UNet, a medical image segmentation model built purely on State Space Models (SSMs). It targets two known weaknesses of earlier designs: the limited long-range modeling of CNNs and the quadratic computational complexity of Transformers. The model comprises three main components: an encoder built from Visual State Space (VSS) blocks for feature extraction, a decoder for output restoration, and simple skip connections between the two. Experiments on the ISIC17, ISIC18, and Synapse datasets show competitive segmentation results, which the authors present as a baseline for future SSM-based approaches. The authors also outline future directions for improving segmentation efficiency, including module design and compression strategies.

Methods

This paper employs the following methods:

  • State Space Models (SSMs)
  • Visual State Space (VSS) block
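
For readers new to SSMs, the linear-complexity claim can be illustrated with the standard discretized recurrence h_t = A_bar * h_(t-1) + B_bar * x_t, y_t = C * h_t, which visits each sequence element once. The NumPy sketch below is a generic, non-selective 1-D scan with a diagonal state matrix and zero-order-hold discretization. It is not the VSS block or Mamba's selective scan, and the names `ssm_scan`, `a_diag`, and `delta` are assumptions made for illustration.

```python
import numpy as np

def ssm_scan(x, a_diag, b, c, delta):
    """Run a diagonal linear SSM over a 1-D sequence in O(L * N) time.

    x: (L,) input sequence; a_diag, b, c: (N,) diagonal state matrix,
    input vector, and output vector; delta: scalar step size.
    All names here are illustrative, not taken from the paper.
    """
    # Zero-order-hold discretization for a diagonal A:
    #   A_bar = exp(delta * A),  B_bar = (A_bar - 1) / A * B
    a_bar = np.exp(delta * a_diag)
    b_bar = (a_bar - 1.0) / a_diag * b
    h = np.zeros_like(a_diag)
    y = np.empty_like(x, dtype=float)
    for t, x_t in enumerate(x):
        h = a_bar * h + b_bar * x_t   # state update
        y[t] = np.dot(c, h)           # readout
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
a_diag = -np.arange(1.0, 5.0)  # negative entries keep the recurrence stable
b = np.ones(4)
c = np.ones(4) / 4
y = ssm_scan(x, a_diag, b, c, delta=0.1)
```

The loop does a fixed amount of work per time step, so cost grows linearly with sequence length, in contrast to the pairwise comparisons of self-attention.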

Models Used

  • VM-UNet
  • Mamba
  • VMamba

Datasets

The following datasets were used in this research:

  • ISIC17
  • ISIC18
  • Synapse

Evaluation Metrics

  • Mean Intersection over Union (mIoU)
  • Dice Similarity Coefficient (DSC)
  • Accuracy (Acc)
  • Sensitivity (Sen)
  • Specificity (Spe)
  • 95% Hausdorff Distance (HD95)
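
As a reminder of how the overlap-based metrics above relate to one another, the sketch below computes IoU, DSC, Acc, Sen, and Spe for a single binary mask from confusion-matrix counts. It is a minimal illustration, not the paper's evaluation code: the function name `binary_seg_metrics` and the toy masks are assumptions, mIoU would additionally average IoU over classes or images, and HD95 is omitted because it requires boundary distance computations.

```python
import numpy as np

def binary_seg_metrics(pred, target):
    """Confusion-matrix metrics for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = int(np.sum(pred & target))
    tn = int(np.sum(~pred & ~target))
    fp = int(np.sum(pred & ~target))
    fn = int(np.sum(~pred & target))

    def ratio(num, den):
        # Return 0.0 when the denominator is zero instead of dividing.
        return num / den if den else 0.0

    return {
        "iou": ratio(tp, tp + fp + fn),
        "dsc": ratio(2 * tp, 2 * tp + fp + fn),
        "acc": ratio(tp + tn, tp + tn + fp + fn),
        "sen": ratio(tp, tp + fn),
        "spe": ratio(tn, tn + fp),
    }

# Toy 2x4 masks (hypothetical data): TP=2, FP=1, FN=1, TN=4.
pred = np.array([[1, 1, 0, 0],
                 [1, 0, 0, 0]])
target = np.array([[1, 0, 0, 0],
                   [1, 1, 0, 0]])
metrics = binary_seg_metrics(pred, target)
```

For these toy masks, IoU is 2/4 and DSC is 4/6. This matches the general relation DSC = 2 * IoU / (1 + IoU), so DSC is never lower than IoU for the same mask.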

Results

  • VM-UNet demonstrates competitive performance in medical image segmentation tasks on ISIC17, ISIC18, and Synapse datasets.
  • Establishes a baseline for pure SSM-based segmentation models.
  • Reported results are competitive with the compared baselines on the evaluated metrics; the abstract does not claim uniform superiority over state-of-the-art models.

Limitations

The authors identified the following limitations:

  • Additional specialized modules may be needed to further improve segmentation performance.
  • The current parameter count of about 30M may limit real-world applications without further optimization.
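
To put the 30M figure in context, the arithmetic below estimates weights-only storage at a few common numeric precisions. This is a back-of-the-envelope sketch: the parameter count is the approximate value quoted above, the byte widths are the standard sizes of each dtype, and activations, optimizer state, and framework overhead are deliberately ignored. The helper name `weight_megabytes` is hypothetical.

```python
PARAMS = 30_000_000  # approximate count quoted in the limitation above

# Standard storage width of each numeric type, in bytes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

def weight_megabytes(n_params, dtype):
    """Weights-only storage in decimal megabytes (1 MB = 10**6 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e6

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_megabytes(PARAMS, dtype):.0f} MB")
```

Under these assumptions the weights take about 120 MB in float32, 60 MB in float16, and 30 MB in int8, which is one reason compression is listed among the future directions.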

Technical Requirements

  • Number of GPUs: 1
  • GPU Type: NVIDIA RTX A6000

Keywords

  • visual state space (VSS) blocks
  • medical image segmentation
  • state space models (SSMs)
  • UNet architecture
  • deep learning
