
ADE20K

Semantic Segmentation Benchmark


Metric: validation mIoU (229 results)
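Validation mIoU is the mean, over classes, of per-class intersection over union, IoU = TP / (TP + FP + FN), where the pixel counts are usually accumulated over the whole validation split before dividing. Because every per-class IoU lies in [0, 1], mIoU reported as a percentage cannot exceed 100. The following is a minimal sketch assuming NumPy, integer label maps, 255 as the ignored label, and classes with an empty union excluded from the mean. These conventions vary between codebases and are assumptions here, not the evaluation protocol of any listed paper. For ADE20K's standard 150-class setting you would pass num_classes=150.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count (ground truth, prediction) pixel pairs; rows index the ground-truth class.

    ignore_index=255 is an assumed convention for unlabeled pixels.
    """
    mask = target != ignore_index
    idx = num_classes * target[mask].astype(np.int64) + pred[mask].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def mean_iou(conf):
    """Return (mIoU in percent, per-class IoU array) from a confusion matrix."""
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp  # predicted as class c, labeled otherwise
    fn = conf.sum(axis=1) - tp  # labeled class c, predicted otherwise
    union = tp + fp + fn
    iou = np.full(conf.shape[0], np.nan)
    valid = union > 0  # assumption: classes absent from both maps are skipped
    iou[valid] = tp[valid] / union[valid]
    return 100.0 * np.nanmean(iou), iou

if __name__ == "__main__":
    pred = np.array([[0, 0, 1], [1, 2, 2]])
    target = np.array([[0, 1, 1], [1, 2, 255]])  # last pixel is ignored
    miou, per_class = mean_iou(confusion_matrix(pred, target, num_classes=3))
    print(f"{miou:.2f}")  # per-class IoU 0.5, 2/3, 1.0 -> 72.22
```

In a full evaluation, the per-image confusion matrices would be summed over the validation set and mean_iou called once on the total, so large and small images contribute by pixel count rather than equally.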

Top Performing Models

Rank | Model | Paper | Validation mIoU (%) | Date | Code
1 | InternImage-H (M3I Pre-training) | InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions | 1310.00 (sic; mIoU cannot exceed 100) | 2022-11-10 | opengvlab/internimage, OpenGVLab/M3I-Pretraining, chenller/mmseg-extension
2 | ViT-P (InternImage-H) | The Missing Point in Vision Transformers for Universal Image Segmentation | 63.60 | 2025-05-26 | sajjad-sh33/vit-p
3 | ONE-PEACE | ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | 63.00 | 2023-05-18 | modelscope/modelscope, OFA-Sys/ONE-PEACE
4 | InternImage-H | InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions | 62.90 | 2022-11-10 | opengvlab/internimage, OpenGVLab/M3I-Pretraining, chenller/mmseg-extension
5 | M3I Pre-training (InternImage-H) | Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information | 62.90 | 2022-11-17 | OpenGVLab/M3I-Pretraining
6 | BEiT-3 | Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks | 62.80 | 2022-08-22 | microsoft/unilm, lyan62/data-curation
7 | EVA | EVA: Exploring the Limits of Masked Visual Representation Learning at Scale | 62.30 | 2022-11-14 | rwightman/pytorch-image-models, open-mmlab/mmselfsup, baaivision/eva
8 | ViT-P (OneFormer, InternImage-H) | The Missing Point in Vision Transformers for Universal Image Segmentation | 61.60 | 2025-05-26 | sajjad-sh33/vit-p
9 | ViT-Adapter-L (Mask2Former, BEiTv2 pretrain) | Vision Transformer Adapter for Dense Predictions | 61.50 | 2022-05-17 | czczup/vit-adapter, chenller/mmseg-extension
10 | FD-SwinV2-G | Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation | 61.40 | 2022-05-27 | SwinTransformer/Feature-Distillation

All Papers (229)

Your ViT is Secretly an Image Segmentation Model (2025)
EoMT (DINOv2-L, single-scale, 512x512)

Could Giant Pretrained Image Models Extract Universal Representations? (2022)
Frozen Backbone, SwinV2-G-ext22K (Mask2Former)

Vision Transformers with Patch Diversification (2021)
PatchDiverse + Swin-L (multi-scale test, UperNet, ImageNet-22K pretrain)

Sequential Ensembling for Semantic Segmentation (2022)
Sequential Ensemble (SegFormer)
Sequential Ensemble (DeepLabv3+)

Pyramid Scene Parsing Network (2016)
PSPNet (ResNet-152)
PSPNet (ResNet-101)
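Some entries are labeled single-scale and others multi-scale test. Multi-scale testing typically runs the model on rescaled (and often horizontally flipped) copies of each image, maps the per-pixel class probabilities back to the input resolution, averages them, and takes the per-pixel argmax. The following is a minimal sketch assuming NumPy and a hypothetical `predict_logits` callable that returns an (H, W, C) array. It uses nearest-neighbour resizing only to stay self-contained (real pipelines usually interpolate bilinearly). The default scale set (0.5 to 1.75) is one commonly seen in segmentation toolkits and is an assumption, not necessarily what any listed paper used.

```python
import numpy as np

def resize_nearest(arr, out_h, out_w):
    """Nearest-neighbour resize over the first two axes of an (H, W, ...) array."""
    h, w = arr.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return arr[rows[:, None], cols[None, :]]

def softmax(x, axis=-1):
    """Numerically stable softmax over the class axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_scale_predict(image, predict_logits,
                        scales=(0.5, 0.75, 1.0, 1.25, 1.5, 1.75), flip=True):
    """Average class probabilities over rescaled (and optionally flipped) views.

    predict_logits is a hypothetical model: (h, w, ...) image -> (h, w, C) logits.
    """
    h, w = image.shape[:2]
    total = None
    for s in scales:
        sh, sw = max(1, round(h * s)), max(1, round(w * s))
        scaled = resize_nearest(image, sh, sw)
        views = [(scaled, False)]
        if flip:
            views.append((scaled[:, ::-1], True))
        for view, flipped in views:
            probs = softmax(predict_logits(view))
            if flipped:
                probs = probs[:, ::-1]  # undo the horizontal flip
            probs = resize_nearest(probs, h, w)  # back to input resolution
            total = probs if total is None else total + probs
    return total.argmax(axis=-1)

if __name__ == "__main__":
    img = np.zeros((8, 8))
    img[:, 4:] = 1.0
    # Toy stand-in for a segmentation network: class 1 wherever the pixel is bright.
    toy_model = lambda x: np.stack([1.0 - x, x], axis=-1) * 5.0
    print(multi_scale_predict(img, toy_model))
```

Averaging probabilities rather than hard labels lets confident views outvote uncertain ones; single-scale evaluation corresponds to scales=(1.0,) with flip=False.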