Domain
medical image segmentation
In the realm of medical image segmentation, both CNN-based and Transformer-based models have been extensively explored. However, CNNs exhibit limitations in long-range modeling capabilities, whereas Transformers are hampered by their quadratic computational complexity. Recently, State Space Models (SSMs), exemplified by Mamba, have emerged as a promising approach. They not only excel in modeling long-range interactions but also maintain linear computational complexity. In this paper, leveraging state space models, we propose a U-shape architecture model for medical image segmentation, named Vision Mamba UNet (VM-UNet). Specifically, the Visual State Space (VSS) block is introduced as the foundation block to capture extensive contextual information, and an asymmetrical encoder-decoder structure is constructed. We conduct comprehensive experiments on the ISIC17, ISIC18, and Synapse datasets, and the results indicate that VM-UNet performs competitively in medical image segmentation tasks. To the best of our knowledge, this is the first medical image segmentation model built on a pure SSM-based model. We aim to establish a baseline and provide valuable insights for the future development of more efficient and effective SSM-based segmentation systems. Our code is available at https://github.com/JCruan519/VM-UNet.
This paper introduces VM-UNet, a novel medical image segmentation model based on pure State Space Models (SSMs), specifically designed to overcome the limitations of conventional CNN and Transformer architectures in handling long-range dependencies with efficiency. The model comprises three main components: an encoder with Visual State Space (VSS) blocks for feature extraction, a decoder for output restoration, and simple skip connections to optimize performance. Extensive experiments are conducted on ISIC17, ISIC18, and Synapse datasets, demonstrating that VM-UNet achieves competitive segmentation results, thereby establishing a baseline for future SSM-based approaches. The authors emphasize the model's potential applications and outline future directions for improving segmentation efficiency, including module design and compression strategies.
This paper employs the following methods:
- State Space Models (SSMs)
- Visual State Space (VSS) block
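The linear-complexity claim for SSMs can be illustrated with a minimal sketch of a discrete state space recurrence. This is not the VSS block or Mamba's input-dependent selective scan: it assumes a 1-D input sequence, a diagonal state matrix, and fixed parameters, and the names `ssm_scan`, `a`, `b`, and `c` are hypothetical. It only shows that one pass over the sequence, with a fixed-size state, is enough to let every output depend on all earlier inputs.

```python
def ssm_scan(x, a, b, c):
    """Run a discrete linear state space recurrence over a 1-D sequence.

    Assumed (illustrative) form with a diagonal state matrix:
        h[t] = a * h[t-1] + b * x[t]   (elementwise over state channels)
        y[t] = sum(c * h[t])           (scalar readout)
    """
    h = [0.0] * len(a)  # hidden state, one entry per state channel
    ys = []
    for x_t in x:  # a single pass: cost grows linearly with len(x)
        h = [a_i * h_i + b_i * x_t for a_i, h_i, b_i in zip(a, h, b)]
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return ys


# An impulse at t=0 decays through the state, so later outputs still "see" it.
impulse_response = ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0])
```

Each step touches only the fixed-size state `h`, so the total work is proportional to the sequence length, in contrast to self-attention, which compares every pair of positions.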
The following datasets were used in this research:
- ISIC17
- ISIC18
- Synapse
The following evaluation metrics were used in this research:
- Mean Intersection over Union (mIoU)
- Dice Similarity Coefficient (DSC)
- Accuracy (Acc)
- Sensitivity (Sen)
- Specificity (Spe)
- 95% Hausdorff Distance (HD95)
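For reference, the overlap-based metrics in this list can be computed from the confusion counts of a binary mask. The sketch below is illustrative only and is not taken from the VM-UNet code: the function name `segmentation_metrics`, the flat 0/1 list inputs, and the choice to return 0.0 when a denominator is empty are all assumptions. It returns the IoU of a single foreground mask (mIoU would average such values over classes or images) and does not cover HD95, which requires boundary distances rather than pixel counts.

```python
def segmentation_metrics(pred, target):
    """Compute overlap metrics for two flat sequences of 0/1 labels."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    # Confusion counts for the foreground class (label 1).
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 0)

    def ratio(num, den):
        # Assumed convention: an undefined ratio is reported as 0.0.
        return num / den if den else 0.0

    return {
        "iou": ratio(tp, tp + fp + fn),          # Intersection over Union
        "dsc": ratio(2 * tp, 2 * tp + fp + fn),  # Dice Similarity Coefficient
        "acc": ratio(tp + tn, tp + tn + fp + fn),
        "sen": ratio(tp, tp + fn),               # Sensitivity (recall)
        "spe": ratio(tn, tn + fp),               # Specificity
    }


# One pixel each of TP, FP, FN, and TN.
metrics = segmentation_metrics([1, 1, 0, 0], [1, 0, 1, 0])
```

For a single mask, DSC and IoU are monotonically related (DSC = 2·IoU / (1 + IoU)), so they rank predictions identically even though their values differ.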
The authors report the following results:
- VM-UNet demonstrates competitive performance in medical image segmentation tasks on the ISIC17, ISIC18, and Synapse datasets.
- Establishes a baseline for pure SSM-based segmentation models.
- Achieves superior results compared to state-of-the-art models on evaluated metrics.
The authors identified the following limitations:
- Potential need for additional specialized modules to further enhance segmentation tasks.
- Initial parameter count of 30M may limit real-world applications without optimization.
The following compute resources were used:
- Number of GPUs: 1
- GPU Type: NVIDIA RTX A6000
Keywords
visual state space (VSS) blocks
medical image segmentation
state space models (SSMs)
UNet architecture
deep learning