Florence-CoSwin-H
|
Florence: A New Foundation Model for Computer Vis…
|
99.02
|
2021-11-22
|
|
Meta Pseudo Labels (EfficientNet-L2)
|
Meta Pseudo Labels
|
98.80
|
2020-03-23
|
|
BiT-L (ResNet)
|
Big Transfer (BiT): General Visual Representation…
|
98.46
|
2019-12-24
|
|
PNASNet-5
|
Progressive Neural Architecture Search
|
96.20
|
2017-12-02
|
|
GhostNetV3 1.6x
|
GhostNetV3: Exploring the Training Strategies for…
|
95.20
|
2024-04-17
|
|
ResNeXt-101 64x4
|
Aggregated Residual Transformations for Deep Neur…
|
94.70
|
2016-11-16
|
|
GhostNetV3 1.3x
|
GhostNetV3: Exploring the Training Strategies for…
|
94.50
|
2024-04-17
|
|
GhostNetV3 1.0x
|
GhostNetV3: Exploring the Training Strategies for…
|
93.30
|
2024-04-17
|
|
GhostNetV3 0.5x
|
GhostNetV3: Exploring the Training Strategies for…
|
88.50
|
2024-04-17
|
|
Unicom (ViT-L/14@336px) (Finetuned)
|
Unicom: Universal and Compact Representation Lear…
|
88.30
|
2023-04-12
|
|
Bamboo (Bamboo-H)
|
A Study on Transformer Configuration and Training…
|
87.10
|
2022-05-21
|
|
Bamboo (Bamboo-L)
|
A Study on Transformer Configuration and Training…
|
86.30
|
2022-05-21
|
|
TinySaver(ConvNeXtV2_h, 0.01 Acc drop)
|
Tiny Models are the Computational Saver for Large…
|
86.24
|
2024-03-26
|
|
Refiner-ViT-L
|
Refiner: Refining Self-attention for Vision Trans…
|
86.03
|
2021-06-07
|
|
TinySaver(ConvNeXtV2_h, 0.5 Acc drop)
|
Tiny Models are the Computational Saver for Large…
|
85.75
|
2024-03-26
|
|
TinySaver(Swin_large, 0.5 Acc drop)
|
Tiny Models are the Computational Saver for Large…
|
85.74
|
2024-03-26
|
|
TinySaver(Swin_large, 1.0 Acc drop)
|
Tiny Models are the Computational Saver for Large…
|
85.24
|
2024-03-26
|
|
Bamboo (Bamboo-B)
|
A Study on Transformer Configuration and Training…
|
84.20
|
2022-05-21
|
|
AIM-7B
|
Scalable Pre-training of Large Autoregressive Ima…
|
84.00
|
2024-01-16
|
|
DynamicViT-LV-M/0.8
|
DynamicViT: Efficient Vision Transformers with Dy…
|
83.90
|
2021-06-03
|
|
TinySaver(EfficientFormerV2_l, 0.01 Acc drop)
|
Tiny Models are the Computational Saver for Large…
|
83.52
|
2024-03-26
|
|
KAT-B*
|
Kolmogorov-Arnold Transformer
|
82.80
|
2024-09-16
|
|
ReViT-B
|
ReViT: Enhancing Vision Transformers Feature Dive…
|
82.40
|
2024-02-17
|
|
ConvNeXt-T-Hermite
|
Polynomial, trigonometric, and tropical activatio…
|
82.34
|
2025-02-03
|
|
ConvMixer-1536/20
|
Patches Are All You Need?
|
82.20
|
2022-01-24
|
|
DIFFQ (λ=1e−2)
|
Differentiable Model Compression via Pseudo Quant…
|
82.00
|
2021-04-20
|
|
DeiT-B
|
Kolmogorov-Arnold Transformer
|
81.80
|
2024-09-16
|
|
SimpleNetV1-9m-correct-labels
|
Lets keep it simple, Using simple architectures t…
|
81.24
|
2016-08-22
|
|
ResNeXt-101 (Debiased+CutMix)
|
Shape-Texture Debiased Neural Network Training
|
81.20
|
2020-10-12
|
|
SimpleNetV1-5m-correct-labels
|
Lets keep it simple, Using simple architectures t…
|
79.12
|
2016-08-22
|
|
ViT-B/16
|
Kolmogorov-Arnold Transformer
|
79.10
|
2024-09-16
|
|
ConvMLP-S
|
ConvMLP: Hierarchical Convolutional MLPs for Visi…
|
76.80
|
2021-09-09
|
|
SimpleNetV1-small-075-correct-labels
|
Lets keep it simple, Using simple architectures t…
|
75.66
|
2016-08-22
|
|
FF
|
Do You Even Need Attention? A Stack of Feed-Forwa…
|
74.90
|
2021-05-06
|
|
SimpleNetV1-9m
|
Lets keep it simple, Using simple architectures t…
|
74.17
|
2016-08-22
|
|
SimpleNetV1-5m
|
Lets keep it simple, Using simple architectures t…
|
71.94
|
2016-08-22
|
|
GAC-SNN MS-ResNet-34
|
Gated Attention Coding for Training High-performa…
|
70.42
|
2023-08-12
|
|
SimpleNetV1-small-05-correct-labels
|
Lets keep it simple, Using simple architectures t…
|
69.11
|
2016-08-22
|
|
SimpleNetV1-small-075
|
Lets keep it simple, Using simple architectures t…
|
68.15
|
2016-08-22
|
|
SimpleNetV1-small-05
|
Lets keep it simple, Using simple architectures t…
|
61.52
|
2016-08-22
|
|
EfficientNet-B2
|
EfficientNet: Rethinking Model Scaling for Convol…
|
|
2019-05-28
|
|
RDNet-L (384 res)
|
DenseNets Reloaded: Paradigm Shift Beyond ResNets…
|
|
2024-03-28
|
|
DAT-T++
|
DAT++: Spatially Dynamic Vision Transformer with …
|
|
2023-09-04
|
|
MAWS (ViT-2B)
|
The effectiveness of MAE pre-pretraining for bill…
|
|
2023-03-23
|
|
FasterViT-5
|
FasterViT: Fast Vision Transformers with Hierarch…
|
|
2023-06-09
|
|
CoCa (finetuned)
|
CoCa: Contrastive Captioners are Image-Text Found…
|
|
2022-05-04
|
|
Model soups (BASIC-L)
|
Model soups: averaging weights of multiple fine-t…
|
|
2022-03-10
|
|
Model soups (ViT-G/14)
|
Model soups: averaging weights of multiple fine-t…
|
|
2022-03-10
|
|
DaViT-G
|
DaViT: Dual Attention Vision Transformers
|
|
2022-04-07
|
|
DaViT-H
|
DaViT: Dual Attention Vision Transformers
|
|
2022-04-07
|
|
SwinV2-G
|
Swin Transformer V2: Scaling Up Capacity and Reso…
|
|
2021-11-18
|
|
MAWS (ViT-6.5B)
|
The effectiveness of MAE pre-pretraining for bill…
|
|
2023-03-23
|
|
Meta Pseudo Labels (EfficientNet-B6-Wide)
|
Meta Pseudo Labels
|
|
2020-03-23
|
|
RevCol-H
|
Reversible Column Networks
|
|
2022-12-22
|
|
EVA
|
EVA: Exploring the Limits of Masked Visual Repres…
|
|
2022-11-14
|
|
M3I Pre-training (InternImage-H)
|
Towards All-in-one Pre-training via Maximizing Mu…
|
|
2022-11-17
|
|
ViT-L/16 (384res, distilled from ViT-22B)
|
Scaling Vision Transformers to 22 Billion Paramet…
|
|
2023-02-10
|
|
InternImage-H
|
InternImage: Exploring Large-Scale Vision Foundat…
|
|
2022-11-10
|
|
MaxViT-XL (512res, JFT)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
AIMv2-3B (448 res)
|
Multimodal Autoregressive Pre-training of Large V…
|
|
2024-11-21
|
|
MAWS (ViT-H)
|
The effectiveness of MAE pre-pretraining for bill…
|
|
2023-03-23
|
|
MaxViT-L (512res, JFT)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
MaxViT-XL (384res, JFT)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
DiscreteViT
|
Discrete Representations Strengthen Vision Transf…
|
|
2021-11-20
|
|
ViT-M@224 (cosub)
|
Co-training $2^L$ Submodels for Visual Recognition
|
|
2022-12-09
|
|
NFNet-F4+
|
High-Performance Large-Scale Image Recognition Wi…
|
|
2021-02-11
|
|
MaxViT-L (384res, JFT)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
MOAT-4 22K+1K
|
MOAT: Alternating Mobile Convolution and Attentio…
|
|
2022-10-04
|
|
FD (CLIP ViT-L-336)
|
Contrastive Learning Rivals Masked Image Modeling…
|
|
2022-05-27
|
|
Last Layer Tuning with Newton Step (ViT-G/14))
|
Differentially Private Image Classification from …
|
|
2022-11-24
|
|
TokenLearner L/8 (24+11)
|
TokenLearner: What Can 8 Learned Tokens Do for Im…
|
|
2021-06-21
|
|
MaxViT-B (512res, JFT)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
MAWS (ViT-L)
|
The effectiveness of MAE pre-pretraining for bill…
|
|
2023-03-23
|
|
MogaNet-XL (384res)
|
MogaNet: Multi-order Gated Aggregation Network
|
|
2022-11-07
|
|
MViTv2-H (512 res, ImageNet-21k pretrain)
|
MViTv2: Improved Multiscale Vision Transformers f…
|
|
2021-12-02
|
|
MaxViT-XL (512res, 21K)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
MaxViT-B (384res, JFT)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
ALIGN (EfficientNet-L2)
|
Scaling Up Visual and Vision-Language Representat…
|
|
2021-02-11
|
|
EfficientNet-L2-475 (SAM)
|
Sharpness-Aware Minimization for Efficiently Impr…
|
|
2020-10-03
|
|
ViT-B/16
|
Scaling Vision Transformers to 22 Billion Paramet…
|
|
2023-02-10
|
|
VAN-B6 (22K, 384res)
|
Visual Attention Network
|
|
2022-02-20
|
|
ViC-MAE (ViT-L)
|
ViC-MAE: Self-Supervised Representation Learning …
|
|
2023-03-21
|
|
BEiT-L (ViT; ImageNet-22K pretrain)
|
BEiT: BERT Pre-Training of Image Transformers
|
|
2021-06-15
|
|
SWAG (ViT H/14)
|
Revisiting Weakly Supervised Pre-Training of Visu…
|
|
2022-01-20
|
|
ViT-H/14
|
An Image is Worth 16x16 Words: Transformers for I…
|
|
2020-10-22
|
|
CoAtNet-3 @384
|
CoAtNet: Marrying Convolution and Attention for A…
|
|
2021-06-09
|
|
MaxViT-XL (384res, 21K)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
OpenCLIP ViT-H/14
|
Reproducible scaling laws for contrastive languag…
|
|
2022-12-14
|
|
AIMv2-3B
|
Multimodal Autoregressive Pre-training of Large V…
|
|
2024-11-21
|
|
FixEfficientNet-L2
|
Fixing the train-test resolution discrepancy: Fix…
|
|
2020-03-18
|
|
ViTAE-H + MAE (448)
|
ViTAEv2: Vision Transformer Advanced by Exploring…
|
|
2022-02-21
|
|
MaxViT-L (512res, 21K)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
MViTv2-L (384 res, ImageNet-21k pretrain)
|
MViTv2: Improved Multiscale Vision Transformers f…
|
|
2021-12-02
|
|
NoisyStudent (EfficientNet-L2)
|
Self-training with Noisy Student improves ImageNe…
|
|
2019-11-11
|
|
MaxViT-B (512res, 21K)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
Top-k DiffSortNets (EfficientNet-L2)
|
Differentiable Top-k Classification Learning
|
|
2022-06-15
|
|
Adlik-ViT-SG+Swin_large+Convnext_xlarge(384)
|
A ConvNet for the 2020s
|
|
2022-01-10
|
|
V-MoE-H/14 (Every-2)
|
Scaling Vision with Sparse Mixture of Experts
|
|
2021-06-10
|
|
AIMv2-L
|
Multimodal Autoregressive Pre-training of Large V…
|
|
2024-11-21
|
|
HVT Large
|
HVT: A Comprehensive Vision Framework for Learnin…
|
|
2024-09-25
|
|
FixEfficientNet-B3
|
Fixing the train-test resolution discrepancy: Fix…
|
|
2020-03-18
|
|
MaxViT-L (384res, 21K)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
PeCo (ViT-H, 448)
|
PeCo: Perceptual Codebook for BERT Pre-training o…
|
|
2021-11-24
|
|
DFN-5B H/14-378 + PrefixedIter Decoder
|
Unconstrained Open Vocabulary Image Classificatio…
|
|
2024-07-15
|
|
dBOT ViT-H (CLIP as Teacher)
|
Exploring Target Representations for Masked Autoe…
|
|
2022-09-08
|
|
MambaVision-L3
|
MambaVision: A Hybrid Mamba-Transformer Vision Ba…
|
|
2024-07-10
|
|
CAFormer-B36 (384 res, 21K)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
AIMv2-1B
|
Multimodal Autoregressive Pre-training of Large V…
|
|
2024-11-21
|
|
VIT-H/14
|
Scaling Vision with Sparse Mixture of Experts
|
|
2021-06-10
|
|
ViT-H@224 (cosub)
|
Co-training $2^L$ Submodels for Visual Recognition
|
|
2022-12-09
|
|
UniRepLKNet-XL++
|
UniRepLKNet: A Universal Perception Large-Kernel …
|
|
2023-11-27
|
|
InternImage-XL
|
InternImage: Exploring Large-Scale Vision Foundat…
|
|
2022-11-10
|
|
MViTv2-H (mageNet-21k pretrain)
|
MViTv2: Improved Multiscale Vision Transformers f…
|
|
2021-12-02
|
|
Mixer-H/14 (JFT-300M pre-train)
|
MLP-Mixer: An all-MLP Architecture for Vision
|
|
2021-05-04
|
|
UniRepLKNet-L++
|
UniRepLKNet: A Universal Perception Large-Kernel …
|
|
2023-11-27
|
|
dBOT ViT-L (CLIP as Teacher)
|
Exploring Target Representations for Masked Autoe…
|
|
2022-09-08
|
|
RepLKNet-XL
|
Scaling Up Your Kernels to 31x31: Revisiting Larg…
|
|
2022-03-13
|
|
ConvNeXt-XL (ImageNet-22k)
|
A ConvNet for the 2020s
|
|
2022-01-10
|
|
MAE (ViT-H, 448)
|
Masked Autoencoders Are Scalable Vision Learners
|
|
2021-11-11
|
|
ViT-L/16
|
An Image is Worth 16x16 Words: Transformers for I…
|
|
2020-10-22
|
|
HorNet-L (GF)
|
HorNet: Efficient High-Order Spatial Interactions…
|
|
2022-07-28
|
|
CvT-W24 (384 res, ImageNet-22k pretrain)
|
CvT: Introducing Convolutions to Vision Transform…
|
|
2021-03-29
|
|
InternImage-L
|
InternImage: Exploring Large-Scale Vision Foundat…
|
|
2022-11-10
|
|
CoAtNet-3 (21k)
|
CoAtNet: Marrying Convolution and Attention for A…
|
|
2021-06-09
|
|
ConvFormer-B36 (384 res, 21K)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
ASF-former-B
|
Adaptive Split-Fusion Transformer
|
|
2022-04-26
|
|
ASF-former-S
|
Adaptive Split-Fusion Transformer
|
|
2022-04-26
|
|
PeCo (ViT-H, 224)
|
PeCo: Perceptual Codebook for BERT Pre-training o…
|
|
2021-11-24
|
|
ViT-L@224 (cosub)
|
Co-training $2^L$ Submodels for Visual Recognition
|
|
2022-12-09
|
|
CAFormer-M36 (384 res, 21K)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
CSWin-L (384 res,ImageNet-22k pretrain)
|
CSWin Transformer: A General Vision Transformer B…
|
|
2021-07-01
|
|
DaViT-L (ImageNet-22k)
|
DaViT: Dual Attention Vision Transformers
|
|
2022-04-07
|
|
DiNAT-Large (11x11ks; 384res; Pretrained on IN22K@224)
|
Dilated Neighborhood Attention Transformer
|
|
2022-09-29
|
|
AIMv2-H
|
Multimodal Autoregressive Pre-training of Large V…
|
|
2024-11-21
|
|
V-MoE-L/16 (Every-2)
|
Scaling Vision with Sparse Mixture of Experts
|
|
2021-06-10
|
|
DiNAT-Large (384x384; Pretrained on ImageNet-22K @ 224x224)
|
Dilated Neighborhood Attention Transformer
|
|
2022-09-29
|
|
data2vec 2.0
|
Efficient Self-supervised Learning with Contextua…
|
|
2022-12-14
|
|
UniRepLKNet-B++
|
UniRepLKNet: A Universal Perception Large-Kernel …
|
|
2023-11-27
|
|
HVT Huge
|
HVT: A Comprehensive Vision Framework for Learnin…
|
|
2024-09-25
|
|
CAFormer-B36 (224 res, 21K)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
UniNet-B6
|
UniNet: Unified Architecture Search with Convolut…
|
|
2022-07-12
|
|
DiNAT_s-Large (384res; Pretrained on IN22K@224)
|
Dilated Neighborhood Attention Transformer
|
|
2022-09-29
|
|
Swin-L
|
Swin Transformer: Hierarchical Vision Transformer…
|
|
2021-03-25
|
|
EfficientNetV2-XL (21k)
|
EfficientNetV2: Smaller Models and Faster Training
|
|
2021-04-01
|
|
VOLO-D5+HAT
|
Improving Vision Transformers by Revisiting High-…
|
|
2022-04-03
|
|
EfficientNetV2 (PolyLoss)
|
PolyLoss: A Polynomial Expansion Perspective of C…
|
|
2022-04-26
|
|
ELSA-VOLO-D5 (512*512)
|
ELSA: Enhanced Local Self-Attention for Vision Tr…
|
|
2021-12-23
|
|
Swin-L@224 (cosub)
|
Co-training $2^L$ Submodels for Visual Recognition
|
|
2022-12-09
|
|
CoAtNet-2 (21k)
|
CoAtNet: Marrying Convolution and Attention for A…
|
|
2021-06-09
|
|
FixEfficientNet-B7
|
Fixing the train-test resolution discrepancy: Fix…
|
|
2020-03-18
|
|
FAN-L-Hybrid++
|
Understanding The Robustness in Vision Transforme…
|
|
2022-04-26
|
|
ColorMAE-Green-ViTB-1600
|
ColorMAE: Exploring data-independent masking stra…
|
|
2024-07-17
|
|
SwinV2-B
|
Swin Transformer V2: Scaling Up Capacity and Reso…
|
|
2021-11-18
|
|
VOLO-D5
|
VOLO: Vision Outlooker for Visual Recognition
|
|
2021-06-24
|
|
PatchConvNet-L120-21k-384
|
Augmenting Convolutional networks with attention-…
|
|
2021-12-27
|
|
16-TokenLearner B/16 (21)
|
TokenLearner: What Can 8 Learned Tokens Do for Im…
|
|
2021-06-21
|
|
MAE+DAT (ViT-H)
|
Enhance the Visual Representation via Discrete Ad…
|
|
2022-09-16
|
|
VAN-B5 (22K, 384res)
|
Visual Attention Network
|
|
2022-02-20
|
|
UniNet-B5
|
UniNet: Unified Architecture Search with Convolut…
|
|
2022-07-12
|
|
ConvFormer-B36 (224 res, 21K)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
MAE (ViT-H)
|
Masked Autoencoders Are Scalable Vision Learners
|
|
2021-11-11
|
|
Hiera-H
|
Hiera: A Hierarchical Vision Transformer without …
|
|
2023-06-01
|
|
CAFormer-S36 (384 res, 21K)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
ConvFormer-M36 (384 res, 21K)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
NoisyStudent (EfficientNet-B7)
|
Self-training with Noisy Student improves ImageNe…
|
|
2019-11-11
|
|
DaViT-B (ImageNet-22k)
|
DaViT: Dual Attention Vision Transformers
|
|
2022-04-07
|
|
VAN-B6 (22K)
|
Visual Attention Network
|
|
2022-02-20
|
|
MAWS (ViT-B)
|
The effectiveness of MAE pre-pretraining for bill…
|
|
2023-03-23
|
|
EfficientNetV2-L (21k)
|
EfficientNetV2: Smaller Models and Faster Training
|
|
2021-04-01
|
|
CAIT-M36-448
|
Going deeper with Image Transformers
|
|
2021-03-31
|
|
VOLO-D4
|
VOLO: Vision Outlooker for Visual Recognition
|
|
2021-06-24
|
|
NFNet-F5 w/ SAM w/ augmult=16
|
Drawing Multiple Augmentation Samples Per Image D…
|
|
2021-05-27
|
|
µ2Net (ViT-L/16)
|
An Evolutionary Approach to Dynamic Introduction …
|
|
2022-05-25
|
|
ViT-B @384 (DeiT III, 21k)
|
DeiT III: Revenge of the ViT
|
|
2022-04-14
|
|
MaxViT-B (512res)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
FixEfficientNet-B6
|
Fixing the train-test resolution discrepancy: Fix…
|
|
2020-03-18
|
|
MOAT-3 1K only
|
MOAT: Alternating Mobile Convolution and Attentio…
|
|
2022-10-04
|
|
Heinsen Routing + BEiT-large 16 224
|
An Algorithm for Routing Vectors in Sequences
|
|
2022-11-20
|
|
CLCNet (S:ViT+D:EffNet-B7) (retrain)
|
CLCNet: Rethinking of Ensemble Modeling with Clas…
|
|
2022-05-19
|
|
CAFormer-M36 (224 res, 21K)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
VAN-B4 (22K, 384res)
|
Visual Attention Network
|
|
2022-02-20
|
|
data2vec (ViT-H)
|
data2vec: A General Framework for Self-supervised…
|
|
2022-02-07
|
|
DiNAT_s-Large (224x224; Pretrained on ImageNet-22K @ 224x224)
|
Dilated Neighborhood Attention Transformer
|
|
2022-09-29
|
|
MKD ViT-L
|
Meta Knowledge Distillation
|
|
2022-02-16
|
|
TinyViT-21M-512-distill (512 res, 21k)
|
TinyViT: Fast Pretraining Distillation for Small …
|
|
2022-07-21
|
|
PatchConvNet-B60-21k-384
|
Augmenting Convolutional networks with attention-…
|
|
2021-12-27
|
|
CaiT-M-48-448
|
Going deeper with Image Transformers
|
|
2021-03-31
|
|
NFNet-F6 w/ SAM
|
High-Performance Large-Scale Image Recognition Wi…
|
|
2021-02-11
|
|
CLCNet (S:ViT+D:VOLO-D3) (retrain)
|
CLCNet: Rethinking of Ensemble Modeling with Clas…
|
|
2022-05-19
|
|
CLCNet (S:ConvNeXt-L+D:EffNet-B7) (retrain)
|
CLCNet: Rethinking of Ensemble Modeling with Clas…
|
|
2022-05-19
|
|
MViTv2-L (384 res)
|
MViTv2: Improved Multiscale Vision Transformers f…
|
|
2021-12-02
|
|
SReT-S (384 res, ImageNet-1K only)
|
Sliced Recursive Transformer
|
|
2021-11-09
|
|
MaxViT-L (384res)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
UniRepLKNet-S++
|
UniRepLKNet: A Universal Perception Large-Kernel …
|
|
2023-11-27
|
|
FixEfficientNet-B5
|
Fixing the train-test resolution discrepancy: Fix…
|
|
2020-03-18
|
|
ConvFormer-S36 (384 res, 21K)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
NoisyStudent (EfficientNet-B6)
|
Self-training with Noisy Student improves ImageNe…
|
|
2019-11-11
|
|
Swin-B
|
Swin Transformer: Hierarchical Vision Transformer…
|
|
2021-03-25
|
|
CAFormer-B36 (384 res)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
LV-ViT-L
|
All Tokens Matter: Token Labeling for Training Be…
|
|
2021-04-22
|
|
FixResNeXt-101 32x48d
|
Fixing the train-test resolution discrepancy
|
|
2019-06-14
|
|
HCGNet-B
|
Gated Convolutional Networks with Hybrid Connecti…
|
|
2019-08-26
|
|
MaxViT-B (384res)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
ViT-B@224 (cosub)
|
Co-training $2^L$ Submodels for Visual Recognition
|
|
2022-12-09
|
|
Our SP-ViT-L|384
|
SP-ViT: Learning 2D Spatial Priors for Vision Tra…
|
|
2022-06-15
|
|
VOLO-D3
|
VOLO: Vision Outlooker for Visual Recognition
|
|
2021-06-24
|
|
BEiT-L (ViT; ImageNet 1k pretrain)
|
BEiT: BERT Pre-Training of Image Transformers
|
|
2021-06-15
|
|
VAN-B5 (22K)
|
Visual Attention Network
|
|
2022-02-20
|
|
UniFormer-L (384 res)
|
UniFormer: Unifying Convolution and Self-attentio…
|
|
2022-01-24
|
|
AdvProp (EfficientNet-B7)
|
Adversarial Examples Improve Image Recognition
|
|
2019-11-21
|
|
NFNet-F5 w/ SAM
|
High-Performance Large-Scale Image Recognition Wi…
|
|
2021-02-11
|
|
Swin-B@224 (cosub)
|
Co-training $2^L$ Submodels for Visual Recognition
|
|
2022-12-09
|
|
TinyViT-21M-384-distill (384 res, 21k)
|
TinyViT: Fast Pretraining Distillation for Small …
|
|
2022-07-21
|
|
EfficientNetV2-M (21k)
|
EfficientNetV2: Smaller Models and Faster Training
|
|
2021-04-01
|
|
CAFormer-M36 (384 res)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
TransNeXt-Base (IN-1K supervised, 384)
|
TransNeXt: Robust Foveal Visual Perception for Vi…
|
|
2023-11-28
|
|
MIRL (ViT-B-48)
|
Masked Image Residual Learning for Scaling Deeper…
|
|
2023-09-25
|
|
MaxViT-S (512res)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
CAIT-M-24
|
Going deeper with Image Transformers
|
|
2021-03-31
|
|
UniNet-B5
|
UniNet: Unified Architecture Search with Convolut…
|
|
2021-10-08
|
|
NoisyStudent (EfficientNet-B5)
|
Self-training with Noisy Student improves ImageNe…
|
|
2019-11-11
|
|
ConvFormer-M36 (224 res, 21K)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
CAIT-M-36
|
Going deeper with Image Transformers
|
|
2021-03-31
|
|
GPaCo (ViT-L)
|
Generalized Parametric Contrastive Learning
|
|
2022-09-26
|
|
Omnivore (Swin-L)
|
Omnivore: A Single Model for Many Visual Modaliti…
|
|
2022-01-20
|
|
Our SP-ViT-M|384
|
SP-ViT: Learning 2D Spatial Priors for Vision Tra…
|
|
2022-06-15
|
|
TransNeXt-Small (IN-1K supervised, 384)
|
TransNeXt: Robust Foveal Visual Perception for Vi…
|
|
2023-11-28
|
|
VOLO-D2
|
VOLO: Vision Outlooker for Visual Recognition
|
|
2021-06-24
|
|
EfficientViT-L2 (r384)
|
EfficientViT: Multi-Scale Linear Attention for Hi…
|
|
2022-05-29
|
|
XCiT-L24
|
XCiT: Cross-Covariance Image Transformers
|
|
2021-06-17
|
|
SparK (ConvNeXt-Large, 384)
|
Designing BERT for Convolutional Networks: Sparse…
|
|
2023-01-09
|
|
NFNet-F5
|
High-Performance Large-Scale Image Recognition Wi…
|
|
2021-02-11
|
|
MAE (ViT-L)
|
Masked Autoencoders Are Scalable Vision Learners
|
|
2021-11-11
|
|
FixEfficientNet-B4
|
Fixing the train-test resolution discrepancy: Fix…
|
|
2020-03-18
|
|
DAT-B++ (384x384)
|
DAT++: Spatially Dynamic Vision Transformer with …
|
|
2023-09-04
|
|
NFNet-F4
|
High-Performance Large-Scale Image Recognition Wi…
|
|
2021-02-11
|
|
ConvNeXt-B@224 (cosub)
|
Co-training $2^L$ Submodels for Visual Recognition
|
|
2022-12-09
|
|
PiT-B@224 (cosub)
|
Co-training $2^L$ Submodels for Visual Recognition
|
|
2022-12-09
|
|
GTP-ViT-B-Patch8/P20
|
GTP-ViT: Efficient Vision Transformers via Graph-…
|
|
2023-11-06
|
|
CAFormer-S36 (224 res, 21K)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
XCiT-M24
|
XCiT: Cross-Covariance Image Transformers
|
|
2021-06-17
|
|
Fix-EfficientNet-B8 (MaxUp + CutMix)
|
MaxUp: A Simple Way to Improve Generalization of …
|
|
2020-02-20
|
|
KDforAA (EfficientNet-B8)
|
Circumventing Outliers of AutoAugment with Knowle…
|
|
2020-03-25
|
|
ViT-L
|
DeiT III: Revenge of the ViT
|
|
2022-04-14
|
|
FasterViT-6
|
FasterViT: Fast Vision Transformers with Hierarch…
|
|
2023-06-09
|
|
SEER (RG-10B)
|
Vision Models Are More Robust And Fair When Pretr…
|
|
2022-02-16
|
|
MaxViT-T (384res)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
VAN-B4 (22K)
|
Visual Attention Network
|
|
2022-02-20
|
|
EfficientNetV2-L
|
EfficientNetV2: Smaller Models and Faster Training
|
|
2021-04-01
|
|
FixEfficientNet-B8
|
Fixing the train-test resolution discrepancy: Fix…
|
|
2020-03-18
|
|
UniFormer-L
|
UniFormer: Unifying Convolution and Self-attentio…
|
|
2022-01-24
|
|
SCARLET-C
|
SCARLET-NAS: Bridging the Gap between Stability a…
|
|
2019-08-16
|
|
ViT-B @224 (DeiT III, 21k)
|
DeiT III: Revenge of the ViT
|
|
2022-04-14
|
|
dBOT ViT-B (CLIP as Teacher)
|
Exploring Target Representations for Masked Autoe…
|
|
2022-09-08
|
|
CAFormer-S36 (384 res)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
ConvFormer-B36 (384 res)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
NFNet-F3
|
High-Performance Large-Scale Image Recognition Wi…
|
|
2021-02-11
|
|
ViT-H @224 (DeiT-III + AugSub)
|
Masking meets Supervision: A Strong Learning Alli…
|
|
2023-06-20
|
|
XCiT-S24
|
XCiT: Cross-Covariance Image Transformers
|
|
2021-06-17
|
|
ConvFormer-M36 (384 res)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
EfficientViT-L2 (r288)
|
EfficientViT: Multi-Scale Linear Attention for Hi…
|
|
2022-05-29
|
|
ViT-L@384 (attn finetune)
|
Three things everyone should know about Vision Tr…
|
|
2022-03-18
|
|
Our SP-ViT-L
|
SP-ViT: Learning 2D Spatial Priors for Vision Tra…
|
|
2022-06-15
|
|
Mini-Swin-B@384
|
MiniViT: Compressing Vision Transformers with Wei…
|
|
2022-04-14
|
|
Wave-ViT-L
|
Wave-ViT: Unifying Wavelet and Transformers for V…
|
|
2022-07-11
|
|
KDforAA (EfficientNet-B7)
|
Circumventing Outliers of AutoAugment with Knowle…
|
|
2020-03-25
|
|
HaloNet4 (base 128, Conv-12)
|
Scaling Local Self-Attention for Parameter Effici…
|
|
2021-03-23
|
|
AdvProp (EfficientNet-B8)
|
Adversarial Examples Improve Image Recognition
|
|
2019-11-21
|
|
CAFormer-B36 (224 res)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
ConvNeXt-L (384 res)
|
A ConvNet for the 2020s
|
|
2022-01-10
|
|
EfficientNet-B8 (RandAugment)
|
RandAugment: Practical automated data augmentatio…
|
|
2019-09-30
|
|
BiFormer-B* (IN1k ptretrain)
|
BiFormer: Vision Transformer with Bi-Level Routin…
|
|
2023-03-15
|
|
GTP-EVA-L/P8
|
GTP-ViT: Efficient Vision Transformers via Graph-…
|
|
2023-11-06
|
|
PatchConvNet-S60-21k-512
|
Augmenting Convolutional networks with attention-…
|
|
2021-12-27
|
|
AlphaNet-A0
|
AlphaNet: Improved Training of Supernets with Alp…
|
|
2021-02-16
|
|
CAFormer-S18 (384 res, 21K)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
ConvFormer-S36 (224 res, 21K)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
ConvFormer-S36 (384 res)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
CAIT-S-36
|
Going deeper with Image Transformers
|
|
2021-03-31
|
|
FasterViT-4
|
FasterViT: Fast Vision Transformers with Hierarch…
|
|
2023-06-09
|
|
ResNeXt-101 32x48d
|
Exploring the Limits of Weakly Supervised Pretrai…
|
|
2018-05-02
|
|
BiT-M (ResNet)
|
Big Transfer (BiT): General Visual Representation…
|
|
2019-12-24
|
|
ViT-L/16 Dosovitskiy et al. (2021)
|
MLP-Mixer: An all-MLP Architecture for Vision
|
|
2021-05-04
|
|
Omnivore (Swin-B)
|
Omnivore: A Single Model for Many Visual Modaliti…
|
|
2022-01-20
|
|
NFNet-F2
|
High-Performance Large-Scale Image Recognition Wi…
|
|
2021-02-11
|
|
ScaleNet-50
|
Data-Driven Neuron Allocation for Scale Aggregati…
|
|
2019-04-20
|
|
NoisyStudent (EfficientNet-B4)
|
Self-training with Noisy Student improves ImageNe…
|
|
2019-11-11
|
|
CAIT-S-48
|
Going deeper with Image Transformers
|
|
2021-03-31
|
|
ViT-L @224 (DeiT-III + AugSub)
|
Masking meets Supervision: A Strong Learning Alli…
|
|
2023-06-20
|
|
CLCNet (S:D1+D:D5)
|
CLCNet: Rethinking of Ensemble Modeling with Clas…
|
|
2022-05-19
|
|
ViT-H @224 (DeiT III)
|
DeiT III: Revenge of the ViT
|
|
2022-04-14
|
|
HyenaPixel-Bidirectional-Former-B36
|
HyenaPixel: Global Image Context with Convolutions
|
|
2024-02-29
|
|
VOLO-D1
|
VOLO: Vision Outlooker for Visual Recognition
|
|
2021-06-24
|
|
CAFormer-M36 (224 res)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
ResNeXt-101 32x32d
|
Exploring the Limits of Weakly Supervised Pretrai…
|
|
2018-05-02
|
|
DeiT-B 384
|
Training data-efficient image transformers & dist…
|
|
2020-12-23
|
|
MaxViT-L (224res)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
EfficientNetV2-M
|
EfficientNetV2: Smaller Models and Faster Training
|
|
2021-04-01
|
|
MKD ViT-B
|
Meta Knowledge Distillation
|
|
2022-02-16
|
|
SP-ViT-S|384
|
SP-ViT: Learning 2D Spatial Priors for Vision Tra…
|
|
2022-06-15
|
|
XCiT-S12
|
XCiT: Cross-Covariance Image Transformers
|
|
2021-06-17
|
|
CAIT-S-24
|
Going deeper with Image Transformers
|
|
2021-03-31
|
|
ResNet200_vd_26w_4s_ssld
|
Semi-Supervised Recognition under a Noisy and Fin…
|
|
2020-06-18
|
|
MixMIM-B
|
MixMAE: Mixed and Masked Autoencoder for Efficien…
|
|
2022-05-26
|
|
CAFormer-S18 (384 res)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
ConvFormer-S18 (384 res, 21K)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
EfficientNet-B7 (RandAugment)
|
RandAugment: Practical automated data augmentatio…
|
|
2019-09-30
|
|
ViT-B @384 (DeiT III)
|
DeiT III: Revenge of the ViT
|
|
2022-04-14
|
|
MambaVision-L
|
MambaVision: A Hybrid Mamba-Transformer Vision Ba…
|
|
2024-07-10
|
|
MaxViT-B (224res)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
CaiT-S24
|
Which Transformer to Favor: A Comparative Analysi…
|
|
2023-08-18
|
|
CAIT-XS-36
|
Going deeper with Image Transformers
|
|
2021-03-31
|
|
ViT-L @224 (DeiT III)
|
DeiT III: Revenge of the ViT
|
|
2022-04-14
|
|
Our SP-ViT-M
|
SP-ViT: Learning 2D Spatial Priors for Vision Tra…
|
|
2022-06-15
|
|
FastViT-MA36
|
FastViT: A Fast Hybrid Vision Transformer using S…
|
|
2023-03-24
|
|
HyenaPixel-Former-B36
|
HyenaPixel: Global Image Context with Convolutions
|
|
2024-02-29
|
|
EfficientNetV2-S (21k)
|
EfficientNetV2: Smaller Models and Faster Training
|
|
2021-04-01
|
|
CvT-21 (384 res, ImageNet-22k pretrain)
|
CvT: Introducing Convolutions to Vision Transform…
|
|
2021-03-29
|
|
DAT-B++ (224x224)
|
DAT++: Spatially Dynamic Vision Transformer with …
|
|
2023-09-04
|
|
InternImage-B
|
InternImage: Exploring Large-Scale Vision Foundat…
|
|
2022-11-10
|
|
FasterViT-3
|
FasterViT: Fast Vision Transformers with Hierarch…
|
|
2023-06-09
|
|
TinyViT-21M-distill (21k)
|
TinyViT: Fast Pretraining Distillation for Small …
|
|
2022-07-21
|
|
Wave-ViT-B
|
Wave-ViT: Unifying Wavelet and Transformers for V…
|
|
2022-07-11
|
|
SReT-B (384 res, ImageNet-1K only)
|
Sliced Recursive Transformer
|
|
2021-11-09
|
|
MViT-B-24
|
Multiscale Vision Transformers
|
|
2021-04-22
|
|
ActiveMLP-L
|
Active Token Mixer
|
|
2022-03-11
|
|
DAT-B (384 res, IN-1K only)
|
Vision Transformer with Deformable Attention
|
|
2022-01-03
|
|
MIRL(ViT-S-54)
|
Masked Image Residual Learning for Scaling Deeper…
|
|
2023-09-25
|
|
ConvFormer-B36 (224 res)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
RDNet-L
|
DenseNets Reloaded: Paradigm Shift Beyond ResNets…
|
|
2024-03-28
|
|
ResNeXt-101 32x16d (semi-weakly sup.)
|
Billion-scale semi-supervised learning for image …
|
|
2019-05-02
|
|
ELSA-VOLO-D1
|
ELSA: Enhanced Local Self-Attention for Vision Tr…
|
|
2021-12-23
|
|
TransNeXt-Small (IN-1K supervised, 224)
|
TransNeXt: Robust Foveal Visual Perception for Vi…
|
|
2023-11-28
|
|
Next-ViT-L @384
|
Next-ViT: Next Generation Vision Transformer for …
|
|
2022-07-12
|
|
VVT-L (384 res)
|
Vicinity Vision Transformer
|
|
2022-06-21
|
|
BoTNet T7
|
Bottleneck Transformers for Visual Recognition
|
|
2021-01-27
|
|
FixEfficientNetB4
|
Fixing the train-test resolution discrepancy: Fix…
|
|
2020-03-18
|
|
MogaNet-L
|
MogaNet: Multi-order Gated Aggregation Network
|
|
2022-11-07
|
|
LITv2-B|384
|
Fast Vision Transformers with HiLo Attention
|
|
2022-05-26
|
|
NFNet-F1
|
High-Performance Large-Scale Image Recognition Wi…
|
|
2021-02-11
|
|
DAT-S++
|
DAT++: Spatially Dynamic Vision Transformer with …
|
|
2023-09-04
|
|
Sequencer2D-L↑392
|
Sequencer: Deep LSTM for Image Classification
|
|
2022-05-04
|
|
SE-CoTNetD-152
|
Contextual Transformer Networks for Visual Recogn…
|
|
2021-07-26
|
|
AMD(ViT-B/16)
|
Asymmetric Masked Distillation for Pre-Training S…
|
|
2023-11-06
|
|
DaViT-B
|
DaViT: Dual Attention Vision Transformers
|
|
2022-04-07
|
|
FastViT-SA36
|
FastViT: A Fast Hybrid Vision Transformer using S…
|
|
2023-03-24
|
|
ReXNet-R_3.0
|
Rethinking Channel Dimensions for Efficient Model…
|
|
2020-07-02
|
|
CAFormer-S36 (224 res)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
EfficientViT-L1 (r224)
|
EfficientViT: Multi-Scale Linear Attention for Hi…
|
|
2022-05-29
|
|
ConvFormer-M36 (224 res)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
GC ViT-B
|
Global Context Vision Transformers
|
|
2022-06-20
|
|
ResNeSt-269
|
ResNeSt: Split-Attention Networks
|
|
2020-04-19
|
|
CoAtNet-3
|
CoAtNet: Marrying Convolution and Attention for A…
|
|
2021-06-09
|
|
MaxViT-S (224res)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
GPIPE
|
GPipe: Efficient Training of Giant Neural Network…
|
|
2018-11-16
|
|
DeBiFormer-B
|
DeBiFormer: Vision Transformer with Deformable Ag…
|
|
2024-10-11
|
|
ConvFormer-S18 (384 res)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
EfficientNet-B7
|
EfficientNet: Rethinking Model Scaling for Convol…
|
|
2019-05-28
|
|
RDNet-B
|
DenseNets Reloaded: Paradigm Shift Beyond ResNets…
|
|
2024-03-28
|
|
DiNAT-Base
|
Dilated Neighborhood Attention Transformer
|
|
2022-09-29
|
|
ResNet-RS-50 (160 image res)
|
Revisiting ResNets: Improved Training and Scaling…
|
|
2021-03-13
|
|
ColorNet (RHYLH with Conv Layer)
|
ColorNet: Investigating the importance of color s…
|
|
2019-02-01
|
|
ViT-B@384 (attn finetune)
|
Three things everyone should know about Vision Tr…
|
|
2022-03-18
|
|
BiFormer-S* (IN1k ptretrain)
|
BiFormer: Vision Transformer with Bi-Level Routin…
|
|
2023-03-15
|
|
SReT-S (512 res, ImageNet-1K only)
|
Sliced Recursive Transformer
|
|
2021-11-09
|
|
LambdaResNet200
|
LambdaNetworks: Modeling Long-Range Interactions …
|
|
2021-02-17
|
|
Fix_ResNet50_vd_ssld
|
Semi-Supervised Recognition under a Noisy and Fin…
|
|
2020-06-18
|
|
MogaNet-B
|
MogaNet: Multi-order Gated Aggregation Network
|
|
2022-11-07
|
|
TResNet-XL
|
TResNet: High Performance GPU-Dedicated Architect…
|
|
2020-03-30
|
|
ResNeXt-101 32x8d (semi-weakly sup.)
|
Billion-scale semi-supervised learning for image …
|
|
2019-05-02
|
|
NAT-Base
|
Neighborhood Attention Transformer
|
|
2022-04-14
|
|
Assemble-ResNet152
|
Compounding the Performance Improvements of Assem…
|
|
2020-01-17
|
|
BoTNet T7-320
|
Bottleneck Transformers for Visual Recognition
|
|
2021-01-27
|
|
ViP-B|384
|
Visual Parser: Representing Part-whole Hierarchie…
|
|
2021-07-13
|
|
RegnetY16GF@224 (cosub)
|
Co-training $2^L$ Submodels for Visual Recognition
|
|
2022-12-09
|
|
EfficientViT-B3 (r288)
|
EfficientViT: Multi-Scale Linear Attention for Hi…
|
|
2022-05-29
|
|
InternImage-S
|
InternImage: Exploring Large-Scale Vision Foundat…
|
|
2022-11-10
|
|
UniNet-B4
|
UniNet: Unified Architecture Search with Convolut…
|
|
2021-10-08
|
|
FasterViT-2
|
FasterViT: Fast Vision Transformers with Hierarch…
|
|
2023-06-09
|
|
TransNeXt-Tiny (IN-1K supervised, 224)
|
TransNeXt: Robust Foveal Visual Perception for Vi…
|
|
2023-11-28
|
|
DeiT-B
|
Training data-efficient image transformers & dist…
|
|
2020-12-23
|
|
ViT-B @224 (DeiT-III + AugSub)
|
Masking meets Supervision: A Strong Learning Alli…
|
|
2023-06-20
|
|
MambaVision-B
|
MambaVision: A Hybrid Mamba-Transformer Vision Ba…
|
|
2024-07-10
|
|
RevBiFPN-S6
|
RevBiFPN: The Fully Reversible Bidirectional Feat…
|
|
2022-06-28
|
|
ResNeXt-101 32×16d
|
Exploring the Limits of Weakly Supervised Pretrai…
|
|
2018-05-02
|
|
FBNetV5-F-CLS
|
FBNetV5: Neural Architecture Search for Multiple …
|
|
2021-11-19
|
|
ViT-B-36x1
|
Three things everyone should know about Vision Tr…
|
|
2022-03-18
|
|
ViT-B-18x2
|
Three things everyone should know about Vision Tr…
|
|
2022-03-18
|
|
XCiT-M (+MixPro)
|
MixPro: Data Augmentation with MaskMix and Progre…
|
|
2023-04-24
|
|
DGMMC-S
|
Performance of Gaussian Mixture Model Classifiers…
|
|
2024-10-17
|
|
NoisyStudent (EfficientNet-B3)
|
Self-training with Noisy Student improves ImageNe…
|
|
2019-11-11
|
|
CAS-ViT-T
|
CAS-ViT: Convolutional Additive Self-attention Vi…
|
|
2024-08-07
|
|
CAFormer-S18 (224 res, 21K)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
CAIT-XS-24
|
Going deeper with Image Transformers
|
|
2021-03-31
|
|
ConvFormer-S36 (224 res)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
LV-ViT-M
|
All Tokens Matter: Token Labeling for Training Be…
|
|
2021-04-22
|
|
VVT-L (224 res)
|
Vicinity Vision Transformer
|
|
2022-06-21
|
|
CoAtNet-2
|
CoAtNet: Marrying Convolution and Attention for A…
|
|
2021-06-09
|
|
Conformer-B
|
Conformer: Local Features Coupling Global Represe…
|
|
2021-05-09
|
|
PatchConvNet-B120
|
Augmenting Convolutional networks with attention-…
|
|
2021-12-27
|
|
GPaCo (Vit-B)
|
Generalized Parametric Contrastive Learning
|
|
2022-09-26
|
|
LambdaResNet152
|
LambdaNetworks: Modeling Long-Range Interactions …
|
|
2021-02-17
|
|
EfficientNet-B6
|
EfficientNet: Rethinking Model Scaling for Convol…
|
|
2019-05-28
|
|
GC ViT-S
|
Global Context Vision Transformers
|
|
2022-06-20
|
|
BoTNet T6
|
Bottleneck Transformers for Visual Recognition
|
|
2021-01-27
|
|
PiT-B
|
Rethinking Spatial Dimensions of Vision Transform…
|
|
2021-03-30
|
|
DeepMAD-89M
|
DeepMAD: Mathematical Architecture Design for Dee…
|
|
2023-03-05
|
|
EfficientNetV2-S
|
EfficientNetV2: Smaller Models and Faster Training
|
|
2021-04-01
|
|
Our SP-ViT-S
|
SP-ViT: Learning 2D Spatial Priors for Vision Tra…
|
|
2022-06-15
|
|
UniRepLKNet-S
|
UniRepLKNet: A Universal Perception Large-Kernel …
|
|
2023-11-27
|
|
DeBiFormer-S
|
DeBiFormer: Vision Transformer with Deformable Ag…
|
|
2024-10-11
|
|
Wave-ViT-S
|
Wave-ViT: Unifying Wavelet and Transformers for V…
|
|
2022-07-11
|
|
TNT-B
|
Transformer in Transformer
|
|
2021-02-27
|
|
ResNeSt-200
|
ResNeSt: Split-Attention Networks
|
|
2020-04-19
|
|
AmoebaNet-A
|
Regularized Evolution for Image Classifier Archit…
|
|
2018-02-05
|
|
CLCNet (S:B4+D:B7)
|
CLCNet: Rethinking of Ensemble Modeling with Clas…
|
|
2022-05-19
|
|
ResNet-RS-270 (256 image res)
|
Revisiting ResNets: Improved Training and Scaling…
|
|
2021-03-13
|
|
SENet-350
|
Bottleneck Transformers for Visual Recognition
|
|
2021-01-27
|
|
ViT-B @224 (DeiT III)
|
DeiT III: Revenge of the ViT
|
|
2022-04-14
|
|
DiNAT-Small
|
Dilated Neighborhood Attention Transformer
|
|
2022-09-29
|
|
Transformer local-attention (NesT-B)
|
Nested Hierarchical Transformer: Towards Accurate…
|
|
2021-05-26
|
|
PVTv2-B4
|
PVT v2: Improved Baselines with Pyramid Vision Tr…
|
|
2021-06-25
|
|
CA-Swin-S (+MixPro)
|
MixPro: Data Augmentation with MaskMix and Progre…
|
|
2023-04-24
|
|
GTP-ViT-L/P8
|
GTP-ViT: Efficient Vision Transformers via Graph-…
|
|
2023-11-06
|
|
ConvFormer-S18 (224 res, 21K)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
RDNet-S
|
DenseNets Reloaded: Paradigm Shift Beyond ResNets…
|
|
2024-03-28
|
|
DAT-S
|
Vision Transformer with Deformable Attention
|
|
2022-01-03
|
|
NAT-Small
|
Neighborhood Attention Transformer
|
|
2022-04-14
|
|
QnA-ViT-Base
|
Learned Queries for Efficient Local Attention
|
|
2021-12-21
|
|
RevBiFPN-S5
|
RevBiFPN: The Fully Reversible Bidirectional Feat…
|
|
2022-06-28
|
|
Pyramid ViG-B
|
Vision GNN: An Image is Worth Graph of Nodes
|
|
2022-06-01
|
|
Container Container
|
Container: Context Aggregation Network
|
|
2021-06-02
|
|
UniNet-B2
|
UniNet: Unified Architecture Search with Convolut…
|
|
2021-10-08
|
|
Twins-SVT-L
|
Twins: Revisiting the Design of Spatial Attention…
|
|
2021-04-28
|
|
TransBoost-ViT-S
|
TransBoost: Improving the Best ImageNet Performan…
|
|
2022-05-26
|
|
XCiT-S
|
Which Transformer to Favor: A Comparative Analysi…
|
|
2023-08-18
|
|
MaxViT-T (224res)
|
MaxViT: Multi-Axis Vision Transformer
|
|
2022-04-04
|
|
Wave-ViT-S
|
Which Transformer to Favor: A Comparative Analysi…
|
|
2023-08-18
|
|
LITv2-B
|
Fast Vision Transformers with HiLo Attention
|
|
2022-05-26
|
|
MultiGrain PNASNet (500px)
|
MultiGrain: a unified image embedding for classes…
|
|
2019-02-14
|
|
MAE (ViT-L)
|
Masked Autoencoders Are Scalable Vision Learners
|
|
2021-11-11
|
|
PAT-B
|
Pattern Attention Transformer with Doughnut Kernel
|
|
2022-11-30
|
|
HyenaPixel-Attention-Former-S18
|
HyenaPixel: Global Image Context with Convolutions
|
|
2024-02-29
|
|
FixEfficientNet-B2
|
Fixing the train-test resolution discrepancy: Fix…
|
|
2020-03-18
|
|
CAFormer-S18 (224 res)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
IPT-B
|
IncepFormer: Efficient Inception Transformer with…
|
|
2022-12-06
|
|
ViTAE-B-Stage
|
ViTAE: Vision Transformer Advanced by Exploring I…
|
|
2021-06-07
|
|
ResT-Large
|
ResT: An Efficient Transformer for Visual Recogni…
|
|
2021-05-28
|
|
AutoFormer-base
|
AutoFormer: Searching Transformers for Visual Rec…
|
|
2021-07-01
|
|
NFNet-F0
|
High-Performance Large-Scale Image Recognition Wi…
|
|
2021-02-11
|
|
SE-ResNeXt-101, 64x4d, S=2(320px)
|
Towards Better Accuracy-efficiency Trade-offs: Di…
|
|
2020-11-30
|
|
ResMLP-B24/8
|
ResMLP: Feedforward networks for image classifica…
|
|
2021-05-07
|
|
EfficientViT-B3 (r224)
|
EfficientViT: Multi-Scale Linear Attention for Hi…
|
|
2022-05-29
|
|
BoTNet T5
|
Bottleneck Transformers for Visual Recognition
|
|
2021-01-27
|
|
HyenaPixel-Bidirectional-Former-S18
|
HyenaPixel: Global Image Context with Convolutions
|
|
2024-02-29
|
|
PatchConvNet-B60
|
Augmenting Convolutional networks with attention-…
|
|
2021-12-27
|
|
SigLIP B/16 + PrefixedIter Decoder
|
Unconstrained Open Vocabulary Image Classificatio…
|
|
2024-07-15
|
|
ViT-B (hMLP + BeiT)
|
Three things everyone should know about Vision Tr…
|
|
2022-03-18
|
|
MultiGrain R50-AA-500
|
MultiGrain: a unified image embedding for classes…
|
|
2019-02-14
|
|
MNv4-Hybrid-L
|
MobileNetV4 -- Universal Models for the Mobile Ec…
|
|
2024-04-16
|
|
UniFormer-S
|
UniFormer: Unifying Convolution and Self-attentio…
|
|
2022-01-24
|
|
ViT-S @384 (DeiT III)
|
DeiT III: Revenge of the ViT
|
|
2022-04-14
|
|
MogaNet-S
|
MogaNet: Multi-order Gated Aggregation Network
|
|
2022-11-07
|
|
GC ViT-T
|
Global Context Vision Transformers
|
|
2022-06-20
|
|
ResNeXt-101 32x4d (semi-weakly sup.)
|
Billion-scale semi-supervised learning for image …
|
|
2019-05-02
|
|
Sequencer2D-L
|
Sequencer: Deep LSTM for Image Classification
|
|
2022-05-04
|
|
sMLPNet-B (ImageNet-1k)
|
Sparse MLP for Image Recognition: Is Self-Attenti…
|
|
2021-09-12
|
|
SE-ResNeXt-101, 64x4d, S=2(416px)
|
Towards Better Accuracy-efficiency Trade-offs: Di…
|
|
2020-11-30
|
|
ResNet-50 (Adversarial Autoaugment)
|
Adversarial AutoAugment
|
|
2019-12-24
|
|
TinyViT-5M
|
TinyViT: Fast Pretraining Distillation for Small …
|
|
2022-07-21
|
|
CvT-21 (384 res)
|
CvT: Introducing Convolutions to Vision Transform…
|
|
2021-03-29
|
|
T2T-ViT-14|384
|
Tokens-to-Token ViT: Training Vision Transformers…
|
|
2021-01-28
|
|
CeiT-S (384 finetune res)
|
Incorporating Convolution Designs into Visual Tra…
|
|
2021-03-22
|
|
LV-ViT-S
|
All Tokens Matter: Token Labeling for Training Be…
|
|
2021-04-22
|
|
MOAT-0 1K only
|
MOAT: Alternating Mobile Convolution and Attentio…
|
|
2022-10-04
|
|
EfficientNet-B5
|
EfficientNet: Rethinking Model Scaling for Convol…
|
|
2019-05-28
|
|
Transformer local-attention (NesT-S)
|
Nested Hierarchical Transformer: Towards Accurate…
|
|
2021-05-26
|
|
ViL-Medium-D
|
Multi-Scale Vision Longformer: A New Vision Trans…
|
|
2021-03-29
|
|
SE-CoTNetD-101
|
Contextual Transformer Networks for Visual Recogn…
|
|
2021-07-26
|
|
Next-ViT-B
|
Next-ViT: Next Generation Vision Transformer for …
|
|
2022-07-12
|
|
CoAtNet-1
|
CoAtNet: Marrying Convolution and Attention for A…
|
|
2021-06-09
|
|
LITv2-M
|
Fast Vision Transformers with HiLo Attention
|
|
2022-05-26
|
|
MambaVision-S
|
MambaVision: A Hybrid Mamba-Transformer Vision Ba…
|
|
2024-07-10
|
|
Shift-B
|
When Shift Operation Meets Vision Transformer: An…
|
|
2022-01-26
|
|
MultiGrain PNASNet (450px)
|
MultiGrain: a unified image embedding for classes…
|
|
2019-02-14
|
|
Meta Pseudo Labels (ResNet-50)
|
Meta Pseudo Labels
|
|
2020-03-23
|
|
UniRepLKNet-T
|
UniRepLKNet: A Universal Perception Large-Kernel …
|
|
2023-11-27
|
|
HyenaPixel-Former-S18
|
HyenaPixel: Global Image Context with Convolutions
|
|
2024-02-29
|
|
TinyViT-11M-distill (21k)
|
TinyViT: Fast Pretraining Distillation for Small …
|
|
2022-07-21
|
|
ReXNet-R_2.0
|
Rethinking Channel Dimensions for Efficient Model…
|
|
2020-07-02
|
|
QnA-ViT-Small
|
Learned Queries for Efficient Local Attention
|
|
2021-12-21
|
|
NAT-Tiny
|
Neighborhood Attention Transformer
|
|
2022-04-14
|
|
PVTv2-B3
|
PVT v2: Improved Baselines with Pyramid Vision Tr…
|
|
2021-06-25
|
|
PatchConvNet-S120
|
Augmenting Convolutional networks with attention-…
|
|
2021-12-27
|
|
FasterViT-1
|
FasterViT: Fast Vision Transformers with Hierarch…
|
|
2023-06-09
|
|
ViL-Base-D
|
Multi-Scale Vision Longformer: A New Vision Trans…
|
|
2021-03-29
|
|
CycleMLP-B5
|
CycleMLP: A MLP-like Architecture for Dense Predi…
|
|
2021-07-21
|
|
MultiGrain SENet154 (450px)
|
MultiGrain: a unified image embedding for classes…
|
|
2019-02-14
|
|
DeepVit-L* (DeiT training recipe)
|
DeepViT: Towards Deeper Vision Transformer
|
|
2021-03-22
|
|
ViT-S @224 (DeiT III, 21k)
|
DeiT III: Revenge of the ViT
|
|
2022-04-14
|
|
MKD ViT-S
|
Meta Knowledge Distillation
|
|
2022-02-16
|
|
ViT-S@224 (cosub)
|
Co-training $2^L$ Submodels for Visual Recognition
|
|
2022-12-09
|
|
PAT-S
|
Pattern Attention Transformer with Doughnut Kernel
|
|
2022-11-30
|
|
TinyViT-21M
|
TinyViT: Fast Pretraining Distillation for Small …
|
|
2022-07-21
|
|
sMLPNet-S (ImageNet-1k)
|
Sparse MLP for Image Recognition: Is Self-Attenti…
|
|
2021-09-12
|
|
RevBiFPN-S4
|
RevBiFPN: The Fully Reversible Bidirectional Feat…
|
|
2022-06-28
|
|
ZenNAS (0.8ms)
|
Zen-NAS: A Zero-Shot NAS for High-Performance Dee…
|
|
2021-02-01
|
|
Pyramid ViG-M
|
Vision GNN: An Image is Worth Graph of Nodes
|
|
2022-06-01
|
|
SwinV2-Ti
|
Which Transformer to Favor: A Comparative Analysi…
|
|
2023-08-18
|
|
gSwin-S
|
gSwin: Gated MLP Vision Model with Hierarchical S…
|
|
2022-08-24
|
|
MultiGrain SENet154 (400px)
|
MultiGrain: a unified image embedding for classes…
|
|
2019-02-14
|
|
Swin-S + GFSA
|
Graph Convolutions Enrich the Self-Attention in T…
|
|
2023-12-07
|
|
CAS-ViT-M
|
CAS-ViT: Convolutional Additive Self-attention Vi…
|
|
2024-08-07
|
|
CvT-13 (384 res)
|
CvT: Introducing Convolutions to Vision Transform…
|
|
2021-03-29
|
|
ResNet50_vd_ssld
|
Semi-Supervised Recognition under a Noisy and Fin…
|
|
2020-06-18
|
|
ConvFormer-S18 (224 res)
|
MetaFormer Baselines for Vision
|
|
2022-10-24
|
|
MViT-B-16
|
Multiscale Vision Transformers
|
|
2021-04-22
|
|
ResNeSt-101
|
ResNeSt: Split-Attention Networks
|
|
2020-04-19
|
|
DeiT-B (+MixPro)
|
MixPro: Data Augmentation with MaskMix and Progre…
|
|
2023-04-24
|
|
MNv4-Conv-L
|
MobileNetV4 -- Universal Models for the Mobile Ec…
|
|
2024-04-16
|
|
IPT-S
|
IncepFormer: Efficient Inception Transformer with…
|
|
2022-12-06
|
|
ViL-Medium-W
|
Multi-Scale Vision Longformer: A New Vision Trans…
|
|
2021-03-29
|
|
GFNet-H-B
|
Global Filter Networks for Image Classification
|
|
2021-07-01
|
|
Oct-ResNet-152 (SE)
|
Drop an Octave: Reducing Spatial Redundancy in Co…
|
|
2019-04-10
|
|
Harm-SE-RNX-101 64x4d (320x320, Mean-Max Pooling)
|
Harmonic Convolutional Networks based on Discrete…
|
|
2020-01-18
|
|
GTP-LV-ViT-M/P8
|
GTP-ViT: Efficient Vision Transformers via Graph-…
|
|
2023-11-06
|
|
FunMatch - T384+224 (ResNet-50)
|
Knowledge distillation: A good teacher is patient…
|
|
2021-06-09
|
|
CA-Swin-T (+MixPro)
|
MixPro: Data Augmentation with MaskMix and Progre…
|
|
2023-04-24
|
|
CaiT-S + GFSA
|
Graph Convolutions Enrich the Self-Attention in T…
|
|
2023-12-07
|
|
RDNet-T
|
DenseNets Reloaded: Paradigm Shift Beyond ResNets…
|
|
2024-03-28
|
|
MultiGrain SENet154 (500px)
|
MultiGrain: a unified image embedding for classes…
|
|
2019-02-14
|
|
VAN-B2
|
Visual Attention Network
|
|
2022-02-20
|
|
DaViT-T
|
DaViT: Dual Attention Vision Transformers
|
|
2022-04-07
|
|
ReXNet_3.0
|
Rethinking Channel Dimensions for Efficient Model…
|
|
2020-07-02
|
|
Sequencer2D-M
|
Sequencer: Deep LSTM for Image Classification
|
|
2022-05-04
|
|
CrossViT-18+
|
CrossViT: Cross-Attention Multi-Scale Vision Tran…
|
|
2021-03-27
|
|
Shift-S
|
When Shift Operation Meets Vision Transformer: An…
|
|
2022-01-26
|
|
HRFormer-B
|
HRFormer: High-Resolution Transformer for Dense P…
|
|
2021-10-18
|
|
BoTNet T4
|
Bottleneck Transformers for Visual Recognition
|
|
2021-01-27
|
|
PVT-M (+MixPro)
|
MixPro: Data Augmentation with MaskMix and Progre…
|
|
2023-04-24
|
|
EfficientViT-B2 (r256)
|
EfficientViT: Multi-Scale Linear Attention for Hi…
|
|
2022-05-29
|
|
DiNAT-Tiny
|
Dilated Neighborhood Attention Transformer
|
|
2022-09-29
|
|
ELSA-Swin-T
|
ELSA: Enhanced Local Self-Attention for Vision Tr…
|
|
2021-12-23
|
|
MambaVision-T2
|
MambaVision: A Hybrid Mamba-Transformer Vision Ba…
|
|
2024-07-10
|
|
NASNET-A(6)
|
Learning Transferable Architectures for Scalable …
|
|
2017-07-21
|
|
RVT-B*
|
Towards Robust Vision Transformer
|
|
2021-05-17
|
|
CMA(ViT-B/16)
|
Enhanced OoD Detection through Cross-Modal Alignm…
|
|
2025-03-24
|
|
FBNetV5-C-CLS
|
FBNetV5: Neural Architecture Search for Multiple …
|
|
2021-11-19
|
|
MultiGrain PNASNet (400px)
|
MultiGrain: a unified image embedding for classes…
|
|
2019-02-14
|
|
ViT-S-24x2
|
Three things everyone should know about Vision Tr…
|
|
2022-03-18
|
|
FastViT-SA24
|
FastViT: A Fast Hybrid Vision Transformer using S…
|
|
2023-03-24
|
|
FixEfficientNet-B1
|
Fixing the train-test resolution discrepancy: Fix…
|
|
2020-03-18
|
|
EfficientNet-B4
|
EfficientNet: Rethinking Model Scaling for Convol…
|
|
2019-05-28
|
|
MViTv2-T
|
MViTv2: Improved Multiscale Vision Transformers f…
|
|
2021-12-02
|
|
DeiT-B
|
Training data-efficient image transformers & dist…
|
|
2020-12-23
|
|
T2T-ViTt-24
|
Tokens-to-Token ViT: Training Vision Transformers…
|
|
2021-01-28
|
|
ViT-S
|
Which Transformer to Favor: A Comparative Analysi…
|
|
2023-08-18
|
|
CvT-21
|
CvT: Introducing Convolutions to Vision Transform…
|
|
2021-03-29
|
|
TransNeXt-Micro (IN-1K supervised, 224)
|
TransNeXt: Robust Foveal Visual Perception for Vi…
|
|
2023-11-28
|
|
FixResNet-50 Billion-scale@224
|
Fixing the train-test resolution discrepancy
|
|
2019-06-14
|
|
Next-ViT-S
|
Next-ViT: Next Generation Vision Transformer for …
|
|
2022-07-12
|
|
SCARLET-A4
|
SCARLET-NAS: Bridging the Gap between Stability a…
|
|
2019-08-16
|
|
Sequencer2D-S
|
Sequencer: Deep LSTM for Image Classification
|
|
2022-05-04
|
|
LeViT-384
|
LeViT: a Vision Transformer in ConvNet's Clothing…
|
|
2021-04-02
|
|
CrossViT-18
|
CrossViT: Cross-Attention Multi-Scale Vision Tran…
|
|
2021-03-27
|
|
MetaFormer PoolFormer-M48
|
MetaFormer Is Actually What You Need for Vision
|
|
2021-11-22
|
|
ConViT-B+
|
ConViT: Improving Vision Transformers with Soft C…
|
|
2021-03-19
|
|
TransBoost-ConvNext-T
|
TransBoost: Improving the Best ImageNet Performan…
|
|
2022-05-26
|
|
M2D-T
|
Mamba2D: A Natively Multi-Dimensional State-Space…
|
|
2024-12-20
|
|
NoisyStudent (EfficientNet-B2)
|
Self-training with Noisy Student improves ImageNe…
|
|
2019-11-11
|
|
ResNet-152 (A2 + reg)
|
ResNet strikes back: An improved training procedu…
|
|
2021-10-01
|
|
ConViT-B
|
ConViT: Improving Vision Transformers with Soft C…
|
|
2021-03-19
|
|
DeiT-B with iRPE-K
|
Rethinking and Improving Relative Position Encodi…
|
|
2021-07-29
|
|
Mega
|
Mega: Moving Average Equipped Gated Attention
|
|
2022-09-21
|
|
ViT-B/16-224+HTM
|
TokenMixup: Efficient Attention-guided Token-leve…
|
|
2022-10-14
|
|
ColorNet
|
ColorNet: Investigating the importance of color s…
|
|
2019-02-01
|
|
T2T-ViT-24
|
Tokens-to-Token ViT: Training Vision Transformers…
|
|
2021-01-28
|
|
ViT-S-48x1
|
Three things everyone should know about Vision Tr…
|
|
2022-03-18
|
|
Visformer-S
|
Visformer: The Vision-friendly Transformer
|
|
2021-04-26
|
|
MobileViTv3-S
|
MobileViTv3: Mobile-Friendly Vision Transformer w…
|
|
2022-09-30
|
|
CrossViT-15+
|
CrossViT: Cross-Attention Multi-Scale Vision Tran…
|
|
2021-03-27
|
|
MambaVision-T
|
MambaVision: A Hybrid Mamba-Transformer Vision Ba…
|
|
2024-07-10
|
|
GLiT-Bases
|
GLiT: Neural Architecture Search for Global and L…
|
|
2021-07-07
|
|
EViT (delete)
|
Which Transformer to Favor: A Comparative Analysi…
|
|
2023-08-18
|
|
STViT-Swin-Ti
|
Which Transformer to Favor: A Comparative Analysi…
|
|
2023-08-18
|
|
BossNet-T1
|
BossNAS: Exploring Hybrid CNN-transformers with B…
|
|
2021-03-23
|
|
CAIT-XXS-36
|
Going deeper with Image Transformers
|
|
2021-03-31
|
|
CvT-13-NAS
|
CvT: Introducing Convolutions to Vision Transform…
|
|
2021-03-29
|
|
ViTAE-S-Stage
|
ViTAE: Vision Transformer Advanced by Exploring I…
|
|
2021-06-07
|
|
T2T-ViTt-19
|
Tokens-to-Token ViT: Training Vision Transformers…
|
|
2021-01-28
|
|
Evo-LeViT-384*
|
Evo-ViT: Slow-Fast Token Evolution for Dynamic Vi…
|
|
2021-08-03
|
|
ConViT-S+
|
ConViT: Improving Vision Transformers with Soft C…
|
|
2021-03-19
|
|
DeepVit-L
|
DeepViT: Towards Deeper Vision Transformer
|
|
2021-03-22
|
|
SENet-152
|
Bottleneck Transformers for Visual Recognition
|
|
2021-01-27
|
|
ResNeXt-101 32x8d
|
Exploring the Limits of Weakly Supervised Pretrai…
|
|
2018-05-02
|
|
TransBoost-Swin-T
|
TransBoost: Improving the Best ImageNet Performan…
|
|
2022-05-26
|
|
ResNeXt-101, 64x4d, S=2(224px)
|
Towards Better Accuracy-efficiency Trade-offs: Di…
|
|
2020-11-30
|
|
ToMe-ViT-S
|
Which Transformer to Favor: A Comparative Analysi…
|
|
2023-08-18
|
|
AMD(ViT-S/16)
|
Asymmetric Masked Distillation for Pre-Training S…
|
|
2023-11-06
|
|
PatchConvNet-S60
|
Augmenting Convolutional networks with attention-…
|
|
2021-12-27
|
|
AlphaNet-A6
|
AlphaNet: Improved Training of Supernets with Alp…
|
|
2021-02-16
|
|
Pyramid ViG-S
|
Vision GNN: An Image is Worth Graph of Nodes
|
|
2022-06-01
|
|
ConvNeXt-T
|
A ConvNet for the 2020s
|
|
2022-01-10
|
|
FasterViT-0
|
FasterViT: Fast Vision Transformers with Hierarch…
|
|
2023-06-09
|
|
CeiT-S
|
Incorporating Convolution Designs into Visual Tra…
|
|
2021-03-22
|
|
NEXcepTion-S
|
From Xception to NEXcepTion: New Design Decisions…
|
|
2022-12-16
|
|
GC ViT-XT
|
Global Context Vision Transformers
|
|
2022-06-20
|
|
Container-Light
|
Container: Context Aggregation Network
|
|
2021-06-02
|
|
ViL-Small
|
Multi-Scale Vision Longformer: A New Vision Trans…
|
|
2021-03-29
|
|
PVTv2-B2
|
PVT v2: Improved Baselines with Pyramid Vision Tr…
|
|
2021-06-25
|
|
ActiveMLP-T
|
Active Token Mixer
|
|
2022-03-11
|
|
LITv2-S
|
Fast Vision Transformers with HiLo Attention
|
|
2022-05-26
|
|
DAT-T
|
Vision Transformer with Deformable Attention
|
|
2022-01-03
|
|
EViT (fuse)
|
Which Transformer to Favor: A Comparative Analysi…
|
|
2023-08-18
|
|
GTP-LV-ViT-S/P8
|
GTP-ViT: Efficient Vision Transformers via Graph-…
|
|
2023-11-06
|
|
Diffusion Classifier
|
Your Diffusion Model is Secretly a Zero-Shot Clas…
|
|
2023-03-28
|
|
T2T-ViT-19
|
Tokens-to-Token ViT: Training Vision Transformers…
|
|
2021-01-28
|
|
ResNet-101 (224 res, Fast Knowledge Distillation)
|
A Fast Knowledge Distillation Framework for Visua…
|
|
2021-12-02
|
|
Discrete Adversarial Distillation (ViT-B, 224)
|
Distilling Out-of-Distribution Robustness from Vi…
|
|
2023-11-02
|
|
DeBiFormer-T
|
DeBiFormer: Vision Transformer with Deformable Ag…
|
|
2024-10-11
|
|
RVT-S*
|
Towards Robust Vision Transformer
|
|
2021-05-17
|
|
PiT-S
|
Rethinking Spatial Dimensions of Vision Transform…
|
|
2021-03-30
|
|
sMLPNet-T (ImageNet-1k)
|
Sparse MLP for Image Recognition: Is Self-Attenti…
|
|
2021-09-12
|
|
ViL-Base-W
|
Multi-Scale Vision Longformer: A New Vision Trans…
|
|
2021-03-29
|
|
Swin-T+SSA
|
The Information Pathways Hypothesis: Transformers…
|
|
2023-06-02
|
|
AOGNet-40M-AN
|
Attentive Normalization
|
|
2019-08-04
|
|
FBNetV5
|
FBNetV5: Neural Architecture Search for Multiple …
|
|
2021-11-19
|
|
ResNet-200
|
Parametric Contrastive Learning
|
|
2021-07-26
|
|
RepMLPNet-L256
|
RepMLPNet: Hierarchical Vision MLP with Re-parame…
|
|
2021-12-21
|
|
NEXcepTion-TP
|
From Xception to NEXcepTion: New Design Decisions…
|
|
2022-12-16
|
|
NAT-Mini
|
Neighborhood Attention Transformer
|
|
2022-04-14
|
|
DiNAT-Mini
|
Dilated Neighborhood Attention Transformer
|
|
2022-09-29
|
|
ResNet-152 (A2)
|
ResNet strikes back: An improved training procedu…
|
|
2021-10-01
|
|
MEAL V2 (ResNet-50) (380 res)
|
MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1…
|
|
2020-09-17
|
|
gSwin-T
|
gSwin: Gated MLP Vision Model with Hierarchical S…
|
|
2022-08-24
|
|
FBNetV5-A-CLS
|
FBNetV5: Neural Architecture Search for Multiple …
|
|
2021-11-19
|
|
T2T-ViT-14
|
Beyond Self-attention: External Attention using T…
|
|
2021-05-05
|
|
QnA-ViT-Tiny
|
Learned Queries for Efficient Local Attention
|
|
2021-12-21
|
|
AutoFormer-small
|
AutoFormer: Searching Transformers for Visual Rec…
|
|
2021-07-01
|
|
Shift-T
|
When Shift Operation Meets Vision Transformer: An…
|
|
2022-01-26
|
|
BoTNet T3
|
Bottleneck Transformers for Visual Recognition
|
|
2021-01-27
|
|
FastViT-T12
|
FastViT: A Fast Hybrid Vision Transformer using S…
|
|
2023-03-24
|
|
CvT-13
|
CvT: Introducing Convolutions to Vision Transform…
|
|
2021-03-29
|
|
ResNet-152 (SAM)
|
Sharpness-Aware Minimization for Efficiently Impr…
|
|
2020-10-03
|
|
UniRepLKNet-N
|
UniRepLKNet: A Universal Perception Large-Kernel …
|
|
2023-11-27
|
|
CloFormer-S
|
Rethinking Local Perception in Lightweight Vision…
|
|
2023-03-31
|
|
LeViT-256
|
LeViT: a Vision Transformer in ConvNet's Clothing…
|
|
2021-04-02
|
|
ReXNet_2.0
|
Rethinking Channel Dimensions for Efficient Model…
|
|
2020-07-02
|
|
SE-CoTNetD-50
|
Contextual Transformer Networks for Visual Recogn…
|
|
2021-07-26
|
|
CoAtNet-0
|
CoAtNet: Marrying Convolution and Attention for A…
|
|
2021-06-09
|
|
gMLP-B
|
Pay Attention to MLPs
|
|
2021-05-17
|
|
CoE-Large + CondConv
|
Collaboration of Experts: Achieving 80% Top-1 Acc…
|
|
2021-07-08
|
|
GTP-DeiT-B/P8
|
GTP-ViT: Efficient Vision Transformers via Graph-…
|
|
2023-11-06
|
|
NEXcepTion-T
|
From Xception to NEXcepTion: New Design Decisions…
|
|
2022-12-16
|
|
DeiT-S-24 + GFSA
|
Graph Convolutions Enrich the Self-Attention in T…
|
|
2023-12-07
|
|
NoisyStudent (EfficientNet-B1)
|
Self-training with Noisy Student improves ImageNe…
|
|
2019-11-11
|
|
TinyViT-11M
|
TinyViT: Fast Pretraining Distillation for Small …
|
|
2022-07-21
|
|
Transformer local-attention (NesT-T)
|
Nested Hierarchical Transformer: Towards Accurate…
|
|
2021-05-26
|
|
ResNet-200 (Supervised Contrastive)
|
Supervised Contrastive Learning
|
|
2020-04-23
|
|
T2T-ViT-14
|
Tokens-to-Token ViT: Training Vision Transformers…
|
|
2021-01-28
|
|
CrossViT-15
|
CrossViT: Cross-Attention Multi-Scale Vision Tran…
|
|
2021-03-27
|
|
PyConvResNet-101
|
Pyramidal Convolution: Rethinking Convolutional N…
|
|
2020-06-20
|
|
ViT-B/16 (RPE w/ GAB)
|
Understanding Gaussian Attention Bias of Vision T…
|
|
2023-05-08
|
|
MobileOne-S4 (distill)
|
MobileOne: An Improved One millisecond Mobile Bac…
|
|
2022-06-08
|
|
DeiT-S with iRPE-QKV
|
Rethinking and Improving Relative Position Encodi…
|
|
2021-07-29
|
|
ViT-S @224 (DeiT III)
|
DeiT III: Revenge of the ViT
|
|
2022-04-14
|
|
BiFormer-T (IN1k pretrain)
|
BiFormer: Vision Transformer with Bi-Level Routin…
|
|
2023-03-15
|
|
UniNet-B0
|
UniNet: Unified Architecture Search with Convolut…
|
|
2022-07-12
|
|
LocalViT-S
|
LocalViT: Bringing Locality to Vision Transformers
|
|
2021-04-12
|
|
SENet-101
|
Bottleneck Transformers for Visual Recognition
|
|
2021-01-27
|
|
GFNet-S
|
Which Transformer to Favor: A Comparative Analysi…
|
|
2023-08-18
|
|
ResNet-200 (Adversarial Autoaugment)
|
Adversarial AutoAugment
|
|
2019-12-24
|
|
MultiGrain PNASNet (300px)
|
MultiGrain: a unified image embedding for classes…
|
|
2019-02-14
|
|
ResNet-152
|
Parametric Contrastive Learning
|
|
2021-07-26
|
|
ConViT-S
|
ConViT: Improving Vision Transformers with Soft C…
|
|
2021-03-19
|
|
Swin-T
|
Swin Transformer: Hierarchical Vision Transformer…
|
|
2021-03-25
|
|
Res2Net-101
|
Res2Net: A New Multi-scale Backbone Architecture
|
|
2019-04-02
|
|
PVT-S (+MixPro)
|
MixPro: Data Augmentation with MaskMix and Progre…
|
|
2023-04-24
|
|
TransBoost-ResNet50-StrikesBack
|
TransBoost: Improving the Best ImageNet Performan…
|
|
2022-05-26
|
|
ResNeSt-50
|
ResNeSt: Split-Attention Networks
|
|
2020-04-19
|
|
DeiT-S with iRPE-QK
|
Rethinking and Improving Relative Position Encodi…
|
|
2021-07-29
|
|
DeiT-S-12 + GFSA
|
Graph Convolutions Enrich the Self-Attention in T…
|
|
2023-12-07
|
|
CAS-ViT-S
|
CAS-ViT: Convolutional Additive Self-attention Vi…
|
|
2024-08-07
|
|
EfficientNet-B3
|
EfficientNet: Rethinking Model Scaling for Convol…
|
|
2019-05-28
|
|
VAN-B1
|
Visual Attention Network
|
|
2022-02-20
|
|
RevBiFPN-S3
|
RevBiFPN: The Fully Reversible Bidirectional Feat…
|
|
2022-06-28
|
|
ResNet-152x2-SAM
|
When Vision Transformers Outperform ResNets witho…
|
|
2021-06-03
|
|
DynamicViT-S
|
Which Transformer to Favor: A Comparative Analysi…
|
|
2023-08-18
|
|
ResNet-101 (SAMix)
|
Boosting Discriminative Visual Representation Lea…
|
|
2021-11-30
|
|
ViTAE-13M
|
ViTAE: Vision Transformer Advanced by Exploring I…
|
|
2021-06-07
|
|
ResNet-101 (AutoMix)
|
AutoMix: Unveiling the Power of Mixup for Stronge…
|
|
2021-03-24
|
|
ResNet-101
|
Parametric Contrastive Learning
|
|
2021-07-26
|
|
CaiT-XXS-24
|
Going deeper with Image Transformers
|
|
2021-03-31
|
|
DeiT-S with iRPE-K
|
Rethinking and Improving Relative Position Encodi…
|
|
2021-07-29
|
|
CentroidViT-S (arXiv, 2021-02)
|
Centroid Transformers: Learning to Abstract with …
|
|
2021-02-17
|
|
ResMLP-S24
|
ResMLP: Feedforward networks for image classifica…
|
|
2021-05-07
|
|
MNv4-Hybrid-M
|
MobileNetV4 -- Universal Models for the Mobile Ec…
|
|
2024-04-16
|
|
TinyViT-5M-distill (21k)
|
TinyViT: Fast Pretraining Distillation for Small …
|
|
2022-07-21
|
|
CoE-Large
|
Collaboration of Experts: Achieving 80% Top-1 Acc…
|
|
2021-07-08
|
|
MEAL V2 (ResNet-50) (224 res)
|
MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1…
|
|
2020-09-17
|
|
TokenLearner-ViT-8
|
Which Transformer to Favor: A Comparative Analysi…
|
|
2023-08-18
|
|
ResNeSt-50-fast
|
ResNeSt: Split-Attention Networks
|
|
2020-04-19
|
|
TransBoost-ResNet152
|
TransBoost: Improving the Best ImageNet Performan…
|
|
2022-05-26
|
|
ResNet-200 (Fast AA)
|
Fast AutoAugment
|
|
2019-05-01
|
|
CaiT-XXS (+MixPro)
|
MixPro: Data Augmentation with MaskMix and Progre…
|
|
2023-04-24
|
|
FastViT-SA12
|
FastViT: A Fast Hybrid Vision Transformer using S…
|
|
2023-03-24
|
|
AlphaNet-A5
|
AlphaNet: Improved Training of Supernets with Alp…
|
|
2021-02-16
|
|
ResNet-50+AutoDropout+RandAugment
|
AutoDropout: Learning Dropout Patterns to Regular…
|
|
2021-01-05
|
|
FairNAS-B
|
FairNAS: Rethinking Evaluation Fairness of Weight…
|
|
2019-07-03
|
|
ResNeXt-101 (CutMix)
|
CutMix: Regularization Strategy to Train Strong C…
|
|
2019-05-13
|
|
Attention-92
|
Residual Attention Network for Image Classificati…
|
|
2017-04-23
|
|
NAT-M4
|
Neural Architecture Transfer
|
|
2020-05-12
|
|
IPT-T
|
IncepFormer: Efficient Inception Transformer with…
|
|
2022-12-06
|
|
GLiT-Smalls
|
GLiT: Neural Architecture Search for Global and L…
|
|
2021-07-07
|
|
HCGNet-C
|
Gated Convolutional Networks with Hybrid Connecti…
|
|
2019-08-26
|
|
DVT (T2T-ViT-12)
|
Not All Images are Worth 16x16 Words: Dynamic Tra…
|
|
2021-05-31
|
|
UniNet-B1
|
UniNet: Unified Architecture Search with Convolut…
|
|
2021-10-08
|
|
DeiT-S (T2)
|
ResNet strikes back: An improved training procedu…
|
|
2021-10-01
|
|
ResNet50 (A1)
|
ResNet strikes back: An improved training procedu…
|
|
2021-10-01
|
|
gSwin-VT
|
gSwin: Gated MLP Vision Model with Hierarchical S…
|
|
2022-08-24
|
|
ResNet-34 (X-volution, stage3)
|
X-volution: On the unification of convolution and…
|
|
2021-06-04
|
|
ReXNet_1.5
|
Rethinking Channel Dimensions for Efficient Model…
|
|
2020-07-02
|
|
iAFF-ResNeXt-50-32x4d
|
Attentional Feature Fusion
|
|
2020-09-29
|
|
UniRepLKNet-P
|
UniRepLKNet: A Universal Perception Large-Kernel …
|
|
2023-11-27
|
|
FixEfficientNet-B0
|
Fixing the train-test resolution discrepancy: Fix…
|
|
2020-03-18
|
|
ConvMLP-L
|
ConvMLP: Hierarchical Convolutional MLPs for Visi…
|
|
2021-09-09
|
|
ResNet-50 (224 res, Fast Knowledge Distillation)
|
A Fast Knowledge Distillation Framework for Visua…
|
|
2021-12-02
|
|
HVT Base
|
HVT: A Comprehensive Vision Framework for Learnin…
|
|
2024-09-25
|
|
Inception ResNet V2
|
Inception-v4, Inception-ResNet and the Impact of …
|
|
2016-02-23
|
|
RandWire-WS
|
Exploring Randomly Wired Neural Networks for Imag…
|
|
2019-04-02
|
|
WideNet-H
|
Go Wider Instead of Deeper
|
|
2021-07-25
|
|
CoE-Small + CondConv + PWLU
|
Collaboration of Experts: Achieving 80% Top-1 Acc…
|
|
2021-07-08
|
|
BasisNet-MV3
|
BasisNet: Two-stage Model Synthesis for Efficient…
|
|
2021-05-07
|
|
AlphaNet-A4
|
AlphaNet: Improved Training of Supernets with Alp…
|
|
2021-02-16
|
|
MogaNet-T (256res)
|
MogaNet: Multi-order Gated Aggregation Network
|
|
2022-11-07
|
|
LeViT-192
|
LeViT: a Vision Transformer in ConvNet's Clothing…
|
|
2021-04-02
|
|
CloFormer-XS
|
Rethinking Local Perception in Lightweight Vision…
|
|
2023-03-31
|
|
ResNet-101
|
Bottleneck Transformers for Visual Recognition
|
|
2021-01-27
|
|
ResNet-200
|
Identity Mappings in Deep Residual Networks
|
|
2016-03-16
|
|
MNv4-Conv-M
|
MobileNetV4 -- Universal Models for the Mobile Ec…
|
|
2024-04-16
|
|
RegNetY-8.0GF
|
Designing Network Design Spaces
|
|
2020-03-30
|
|
ViT-B/16-SAM
|
When Vision Transformers Outperform ResNets witho…
|
|
2021-06-03
|
|
TransBoost-ResNet101
|
TransBoost: Improving the Best ImageNet Performan…
|
|
2022-05-26
|
|
SKNet-101
|
Selective Kernel Networks
|
|
2019-03-15
|
|
FixResNet-50 CutMix
|
Fixing the train-test resolution discrepancy
|
|
2019-06-14
|
|
CSPResNeXt-50 + Mish
|
Mish: A Self Regularized Non-Monotonic Activation…
|
|
2019-08-23
|
|
kNN-CLIP
|
Revisiting a kNN-based Image Classification Syste…
|
|
2022-04-03
|
|
FastViT-S12
|
FastViT: A Fast Hybrid Vision Transformer using S…
|
|
2023-03-24
|
|
GC ViT-XXT
|
Global Context Vision Transformers
|
|
2022-06-20
|
|
CSPResNeXt-50 (Mish+Aug)
|
CSPNet: A New Backbone that can Enhance Learning …
|
|
2019-11-27
|
|
DVT (T2T-ViT-10)
|
Not All Images are Worth 16x16 Words: Dynamic Tra…
|
|
2021-05-31
|
|
GPaCo (ResNet-50)
|
Generalized Parametric Contrastive Learning
|
|
2022-09-26
|
|
ResMLP-36
|
ResMLP: Feedforward networks for image classifica…
|
|
2021-05-07
|
|
Grafit (ResNet-50)
|
Grafit: Learning fine-grained image representatio…
|
|
2020-11-25
|
|
LeViT-128
|
LeViT: a Vision Transformer in ConvNet's Clothing…
|
|
2021-04-02
|
|
ResT-Small
|
ResT: An Efficient Transformer for Visual Recogni…
|
|
2021-05-28
|
|
GTP-DeiT-S/P8
|
GTP-ViT: Efficient Vision Transformers via Graph-…
|
|
2023-11-06
|
|
ReXNet_1.3
|
Rethinking Channel Dimensions for Efficient Model…
|
|
2020-07-02
|
|
WideNet-L
|
Go Wider Instead of Deeper
|
|
2021-07-25
|
|
ResNet-50 (SAMix)
|
Boosting Discriminative Visual Representation Lea…
|
|
2021-11-30
|
|
AlphaNet-A3
|
AlphaNet: Improved Training of Supernets with Alp…
|
|
2021-02-16
|
|
ResMLP-24
|
ResMLP: Feedforward networks for image classifica…
|
|
2021-05-07
|
|
EdgeNeXt-S
|
EdgeNeXt: Efficiently Amalgamated CNN-Transformer…
|
|
2022-06-21
|
|
TinyNet (GhostNet-A)
|
Model Rubik's Cube: Twisting Resolution, Depth an…
|
|
2020-10-28
|
|
MobileOne-S4
|
MobileOne: An Improved One millisecond Mobile Bac…
|
|
2022-06-08
|
|
RegNetY-4.0GF
|
Designing Network Design Spaces
|
|
2020-03-30
|
|
SENet-50
|
Bottleneck Transformers for Visual Recognition
|
|
2021-01-27
|
|
ScaleNet-152
|
Data-Driven Neuron Allocation for Scale Aggregati…
|
|
2019-04-20
|
|
LIP-ResNet-101
|
LIP: Local Importance-based Pooling
|
|
2019-08-12
|
|
RedNet-152
|
Involution: Inverting the Inherence of Convolutio…
|
|
2021-03-10
|
|
ResNet-50 (AutoMix)
|
AutoMix: Unveiling the Power of Mixup for Stronge…
|
|
2021-03-24
|
|
PS-KD (ResNet-152 + CutMix)
|
Self-Knowledge Distillation with Progressive Refi…
|
|
2020-06-22
|
|
ResNet-101 (JFT-300M Finetuning)
|
Revisiting Unreasonable Effectiveness of Data in …
|
|
2017-07-10
|
|
RVT-Ti*
|
Towards Robust Vision Transformer
|
|
2021-05-17
|
|
Multiscale DEQ (MDEQ-XL)
|
Multiscale Deep Equilibrium Models
|
|
2020-06-15
|
|
DenseNet-169 (H4*)
|
How to Use Dropout Correctly on Residual Networks…
|
|
2023-02-13
|
|
AlphaNet-A2
|
AlphaNet: Improved Training of Supernets with Alp…
|
|
2021-02-16
|
|
AA-ResNet-152
|
Attention Augmented Convolutional Networks
|
|
2019-04-22
|
|
FixResNet-50
|
Fixing the train-test resolution discrepancy
|
|
2019-06-14
|
|
MobileOne-S2 (distill)
|
MobileOne: An Improved One millisecond Mobile Bac…
|
|
2022-06-08
|
|
PiT-XS
|
Rethinking Spatial Dimensions of Vision Transform…
|
|
2021-03-30
|
|
UniNet-B0
|
UniNet: Unified Architecture Search with Convolut…
|
|
2021-10-08
|
|
RedNet-101
|
Involution: Inverting the Inherence of Convolutio…
|
|
2021-03-10
|
|
ResNet-50 (UDA)
|
Unsupervised Data Augmentation for Consistency Tr…
|
|
2019-04-29
|
|
ScaleNet-101
|
Data-Driven Neuron Allocation for Scale Aggregati…
|
|
2019-04-20
|
|
TransBoost-ResNet50
|
TransBoost: Improving the Best ImageNet Performan…
|
|
2022-05-26
|
|
Co-ResNet-152
|
Contextual Convolutional Neural Networks
|
|
2021-08-17
|
|
MobileNetV3_large_x1_0_ssld
|
Semi-Supervised Recognition under a Noisy and Fin…
|
|
2020-06-18
|
|
RevBiFPN-S2
|
RevBiFPN: The Fully Reversible Bidirectional Feat…
|
|
2022-06-28
|
|
ConvMLP-M
|
ConvMLP: Hierarchical Convolutional MLPs for Visi…
|
|
2021-09-09
|
|
Xception
|
Xception: Deep Learning with Depthwise Separable …
|
|
2016-10-07
|
|
MixNet-L
|
MixConv: Mixed Depthwise Convolutional Kernels
|
|
2019-07-22
|
|
SpineNet-143
|
SpineNet: Learning Scale-Permuted Backbone for Re…
|
|
2019-12-10
|
|
Mixer-B/8-SAM
|
When Vision Transformers Outperform ResNets witho…
|
|
2021-06-03
|
|
InceptionV3 (FRN layer)
|
Filter Response Normalization Layer: Eliminating …
|
|
2019-11-21
|
|
ResNet-152 + SWA
|
Averaging Weights Leads to Wider Optima and Bette…
|
|
2018-03-14
|
|
ECA-Net (ResNet-152)
|
ECA-Net: Efficient Channel Attention for Deep Con…
|
|
2019-10-08
|
|
AlphaNet-A1
|
AlphaNet: Improved Training of Supernets with Alp…
|
|
2021-02-16
|
|
CeiT-T (384 finetune res)
|
Incorporating Convolution Designs into Visual Tra…
|
|
2021-03-22
|
|
CeiT-T
|
Incorporating Convolution Designs into Visual Tra…
|
|
2021-03-22
|
|
NoisyStudent (EfficientNet-B0)
|
Self-training with Noisy Student improves ImageNe…
|
|
2019-11-11
|
|
EfficientNet-B1
|
EfficientNet: Rethinking Model Scaling for Convol…
|
|
2019-05-28
|
|
ResNet-50
|
Bottleneck Transformers for Visual Recognition
|
|
2021-01-27
|
|
SGE-ResNet101
|
Spatial Group-wise Enhance: Improving Semantic Fe…
|
|
2019-05-23
|
|
RepVGG-B2
|
RepVGG: Making VGG-style ConvNets Great Again
|
|
2021-01-11
|
|
ResNet-50
|
Puzzle Mix: Exploiting Saliency and Local Statist…
|
|
2020-09-15
|
|
ResNet-50
|
AutoDropout: Learning Dropout Patterns to Regular…
|
|
2021-01-05
|
|
CAS-ViT-XS
|
CAS-ViT: Convolutional Additive Self-attention Vi…
|
|
2024-08-07
|
|
SReT-LT (Fast Knowledge Distillation)
|
A Fast Knowledge Distillation Framework for Visua…
|
|
2021-12-02
|
|
PVTv2-B1
|
PVT v2: Improved Baselines with Pyramid Vision Tr…
|
|
2021-06-25
|
|
ResNet-50-DW (Deformable Kernels)
|
Deformable Kernels: Adapting Effective Receptive …
|
|
2019-10-07
|
|
ECA-Net (ResNet-101)
|
ECA-Net: Efficient Channel Attention for Deep Con…
|
|
2019-10-08
|
|
MobileViTv3-1.0
|
MobileViTv3: Mobile-Friendly Vision Transformer w…
|
|
2022-09-30
|
|
EdgeFormer-S
|
ParC-Net: Position Aware Circular Convolution wit…
|
|
2022-03-08
|
|
UniRepLKNet-F
|
UniRepLKNet: A Universal Perception Large-Kernel …
|
|
2023-11-27
|
|
TransBoost-EfficientNetB0
|
TransBoost: Improving the Best ImageNet Performan…
|
|
2022-05-26
|
|
Visformer-Ti
|
Visformer: The Vision-friendly Transformer
|
|
2021-04-26
|
|
ResMLP-12 (distilled, class-MLP)
|
ResMLP: Feedforward networks for image classifica…
|
|
2021-05-07
|
|
RepMLP-Res50
|
RepMLP: Re-parameterizing Convolutions into Fully…
|
|
2021-05-05
|
|
Res2Net-50-299
|
Res2Net: A New Multi-scale Backbone Architecture
|
|
2019-04-02
|
|
ResNet-152
|
Deep Residual Learning for Image Recognition
|
|
2015-12-10
|
|
HRFormer-T
|
HRFormer: High-Resolution Transformer for Dense P…
|
|
2021-10-18
|
|
RepVGG-B2g4
|
RepVGG: Making VGG-style ConvNets Great Again
|
|
2021-01-11
|
|
DVT (T2T-ViT-7)
|
Not All Images are Worth 16x16 Words: Dynamic Tra…
|
|
2021-05-31
|
|
SRM-ResNet-101
|
SRM : A Style-based Recalibration Module for Conv…
|
|
2019-03-26
|
|
DenseNet-161 + SWA
|
Averaging Weights Leads to Wider Optima and Bette…
|
|
2018-03-14
|
|
CoaT-Ti
|
Which Transformer to Favor: A Comparative Analysi…
|
|
2023-08-18
|
|
FBNetV5-AC-CLS
|
FBNetV5: Neural Architecture Search for Multiple …
|
|
2021-11-19
|
|
ResNet-50 (CutMix)
|
CutMix: Regularization Strategy to Train Strong C…
|
|
2019-05-13
|
|
ReXNet_1.0-relabel
|
Re-labeling ImageNet: from Single to Multi-Labels…
|
|
2021-01-13
|
|
MobileViT-S
|
MobileViT: Light-weight, General-purpose, and Mob…
|
|
2021-10-05
|
|
ResNet-34 (SAMix)
|
Boosting Discriminative Visual Representation Lea…
|
|
2021-11-30
|
|
EfficientNet-B0
|
EfficientNet: Rethinking Model Scaling for Convol…
|
|
2019-05-28
|
|
RedNet-50
|
Involution: Inverting the Inherence of Convolutio…
|
|
2021-03-10
|
|
ResNet-50 + DropBlock (0.9 kp, 0.1 label smoothing)
|
DropBlock: A regularization method for convolutio…
|
|
2018-10-30
|
|
Poly-SA-ViT-S
|
Which Transformer to Favor: A Comparative Analysi…
|
|
2023-08-18
|
|
EfficientNet-B0 (CondConv)
|
CondConv: Conditionally Parameterized Convolution…
|
|
2019-04-10
|
|
ResNet-101
|
Deep Residual Learning for Image Recognition
|
|
2015-12-10
|
|
MultiGrain R50-AA-224
|
MultiGrain: a unified image embedding for classes…
|
|
2019-02-14
|
|
Pyramid ViG-Ti
|
Vision GNN: An Image is Worth Graph of Nodes
|
|
2022-06-01
|
|
LocalViT-PVT
|
LocalViT: Bringing Locality to Vision Transformers
|
|
2021-04-12
|
|
ResNet-50 (LIP Bottleneck-256)
|
LIP: Local Importance-based Pooling
|
|
2019-08-12
|
|
WRN-50-2-bottleneck
|
Wide Residual Networks
|
|
2016-05-23
|
|
MobileViTv2-1.0
|
Separable Self-attention for Mobile Vision Transf…
|
|
2022-06-06
|
|
MobileOne-S3
|
MobileOne: An Improved One millisecond Mobile Bac…
|
|
2022-06-08
|
|
ResNet50 (A3)
|
ResNet strikes back: An improved training procedu…
|
|
2021-10-01
|
|
ZenNet-400M-SE
|
Zen-NAS: A Zero-Shot NAS for High-Performance Dee…
|
|
2021-02-01
|
|
RegNetY-1.6GF
|
Designing Network Design Spaces
|
|
2020-03-30
|
|
HVT-S-1
|
Scalable Vision Transformers with Hierarchical Po…
|
|
2021-03-19
|
|
Perceiver (FF)
|
Perceiver: General Perception with Iterative Atte…
|
|
2021-03-04
|
|
ReXNet_1.0
|
Rethinking Channel Dimensions for Efficient Model…
|
|
2020-07-02
|
|
ViTAE-6M
|
ViTAE: Vision Transformer Advanced by Exploring I…
|
|
2021-06-07
|
|
DenseNet-264
|
Densely Connected Convolutional Networks
|
|
2016-08-25
|
|
ResMLP-S12
|
ResMLP: Feedforward networks for image classifica…
|
|
2021-05-07
|
|
TinyNet-A + RA
|
Model Rubik's Cube: Twisting Resolution, Depth an…
|
|
2020-10-28
|
|
ResNet-50 (Fast AA)
|
Fast AutoAugment
|
|
2019-05-01
|
|
SReT-T
|
Sliced Recursive Transformer
|
|
2021-11-09
|
|
RedNet-38
|
Involution: Inverting the Inherence of Convolutio…
|
|
2021-03-10
|
|
SGE-ResNet50
|
Spatial Group-wise Enhance: Improving Semantic Fe…
|
|
2019-05-23
|
|
WideNet-B
|
Go Wider Instead of Deeper
|
|
2021-07-25
|
|
EfficientNet-B0
|
AutoDropout: Learning Dropout Patterns to Regular…
|
|
2021-01-05
|
|
ACNet (ResNet-50)
|
Adaptively Connected Neural Networks
|
|
2019-04-07
|
|
RegNetY-800MF
|
Designing Network Design Spaces
|
|
2020-03-30
|
|
ECA-Net (ResNet-50)
|
ECA-Net: Efficient Channel Attention for Deep Con…
|
|
2019-10-08
|
|
DenseNet-201
|
Densely Connected Convolutional Networks
|
|
2016-08-25
|
|
MobileOne-S2
|
MobileOne: An Improved One millisecond Mobile Bac…
|
|
2022-06-08
|
|
R-Mix (ResNet-50)
|
Expeditious Saliency-guided Mix-up through Random…
|
|
2022-12-09
|
|
ResNetV2-50 (FRN layer)
|
Filter Response Normalization Layer: Eliminating …
|
|
2019-11-21
|
|
FBNetV5-AR-CLS
|
FBNetV5: Neural Architecture Search for Multiple …
|
|
2021-11-19
|
|
MogaNet-XT (256res)
|
MogaNet: Multi-order Gated Aggregation Network
|
|
2022-11-07
|
|
ReXNet_0.9
|
Rethinking Channel Dimensions for Efficient Model…
|
|
2020-07-02
|
|
Prodpoly
|
Deep Polynomial Neural Networks
|
|
2020-06-20
|
|
SCARLET-B
|
SCARLET-NAS: Bridging the Gap between Stability a…
|
|
2019-08-16
|
|
GLiT-Tinys
|
GLiT: Neural Architecture Search for Global and L…
|
|
2021-07-07
|
|
DenseNet-169
|
Densely Connected Convolutional Networks
|
|
2016-08-25
|
|
GreedyNAS-C
|
GreedyNAS: Towards Fast One-Shot NAS with Greedy …
|
|
2020-03-25
|
|
ResNet-50-D
|
Bag of Tricks for Image Classification with Convo…
|
|
2018-12-04
|
|
Inception v3
|
What do Deep Networks Like to See?
|
|
2018-03-22
|
|
MKD ViT-T
|
Meta Knowledge Distillation
|
|
2022-02-16
|
|
GreedyNAS-A
|
GreedyNAS: Towards Fast One-Shot NAS with Greedy …
|
|
2020-03-25
|
|
SkipblockNet-L
|
Bias Loss for Mobile Neural Networks
|
|
2021-07-23
|
|
CI2P-ViT
|
Compress image to patches for Vision Transformer
|
|
2025-02-14
|
|
SSAL-Resnet50
|
Contextual Classification Using Self-Supervised A…
|
|
2021-01-07
|
|
UniRepLKNet-A
|
UniRepLKNet: A Universal Perception Large-Kernel …
|
|
2023-11-27
|
|
CloFormer-XXS
|
Rethinking Local Perception in Lightweight Vision…
|
|
2023-03-31
|
|
MixNet-M
|
MixConv: Mixed Depthwise Convolutional Kernels
|
|
2019-07-22
|
|
ResNet50 (FSGDM)
|
On the Performance Analysis of Momentum Method: A…
|
|
2024-11-29
|
|
SCARLET-A
|
SCARLET-NAS: Bridging the Gap between Stability a…
|
|
2019-08-16
|
|
TransBoost-MobileNetV3-L
|
TransBoost: Improving the Best ImageNet Performan…
|
|
2022-05-26
|
|
ViTAE-T-Stage
|
ViTAE: Vision Transformer Advanced by Exploring I…
|
|
2021-06-07
|
|
GreedyNAS-B
|
GreedyNAS: Towards Fast One-Shot NAS with Greedy …
|
|
2020-03-25
|
|
Perona Malik (Perona and Malik, 1990)
|
Learning Visual Representations for Transfer Lear…
|
|
2020-11-03
|
|
PVT-T (+MixPro)
|
MixPro: Data Augmentation with MaskMix and Progre…
|
|
2023-04-24
|
|
MobileViTv3-XS
|
MobileViTv3: Mobile-Friendly Vision Transformer w…
|
|
2022-09-30
|
|
MnasNet-A3
|
MnasNet: Platform-Aware Neural Architecture Searc…
|
|
2018-07-31
|
|
ViL-Tiny-RPB
|
Multi-Scale Vision Longformer: A New Vision Trans…
|
|
2021-03-29
|
|
ConViT-Ti+
|
ConViT: Improving Vision Transformers with Soft C…
|
|
2021-03-19
|
|
TransBoost-ResNet34
|
TransBoost: Improving the Best ImageNet Performan…
|
|
2022-05-26
|
|
LIP-DenseNet-BC-121
|
LIP: Local Importance-based Pooling
|
|
2019-08-12
|
|
ResNet-50 (X-volution, stage3)
|
X-volution: On the unification of convolution and…
|
|
2021-06-04
|
|
MUXNet-l
|
MUXConv: Information Multiplexing in Convolutiona…
|
|
2020-03-31
|
|
DeiT-B
|
Training data-efficient image transformers & dist…
|
|
2020-12-23
|
|
MobileViTv3-0.75
|
MobileViTv3: Mobile-Friendly Vision Transformer w…
|
|
2022-09-30
|
|
Mixer-B/16
|
MLP-Mixer: An all-MLP Architecture for Vision
|
|
2021-05-04
|
|
Perceiver
|
Perceiver: General Perception with Iterative Atte…
|
|
2021-03-04
|
|
SkipblockNet-M
|
Bias Loss for Mobile Neural Networks
|
|
2021-07-23
|
|
ResNet-34 (AutoMix)
|
AutoMix: Unveiling the Power of Mixup for Stronge…
|
|
2021-03-24
|
|
ResNet-50 MLPerf v0.7 - 2512 steps
|
A Large Batch Optimizer Reality Check: Traditiona…
|
|
2021-02-12
|
|
DenseNAS-A
|
Densely Connected Search Space for More Flexible …
|
|
2019-06-23
|
|
MobileOne-S1
|
MobileOne: An Improved One millisecond Mobile Bac…
|
|
2022-06-08
|
|
MoGA-A
|
MoGA: Searching Beyond MobileNetV3
|
|
2019-08-04
|
|
RevBiFPN-S1
|
RevBiFPN: The Fully Reversible Bidirectional Feat…
|
|
2022-06-28
|
|
LocalViT-TNT
|
LocalViT: Bringing Locality to Vision Transformers
|
|
2021-04-12
|
|
SALG-ST
|
Semantic-Aware Local-Global Vision Transformer
|
|
2022-11-27
|
|
RedNet-26
|
Involution: Inverting the Inherence of Convolutio…
|
|
2021-03-10
|
|
FractalNet-34
|
FractalNet: Ultra-Deep Neural Networks without Re…
|
|
2016-05-24
|
|
MixNet-S
|
MixConv: Mixed Depthwise Convolutional Kernels
|
|
2019-07-22
|
|
CoordConv ResNet-50
|
An Intriguing Failing of Convolutional Neural Net…
|
|
2018-07-09
|
|
LeViT-128S
|
LeViT: a Vision Transformer in ConvNet's Clothing…
|
|
2021-04-02
|
|
GhostNet ×1.3
|
GhostNet: More Features from Cheap Operations
|
|
2019-11-27
|
|
LR-Net-26
|
Local Relation Networks for Image Recognition
|
|
2019-04-25
|
|
FastViT-T8
|
FastViT: A Fast Hybrid Vision Transformer using S…
|
|
2023-03-24
|
|
MobileViTv2-0.75
|
Separable Self-attention for Mobile Vision Transf…
|
|
2022-06-06
|
|
MnasNet-A2
|
MnasNet: Platform-Aware Neural Architecture Searc…
|
|
2018-07-31
|
|
PAWS (ResNet-50, 10% labels)
|
Semi-Supervised Learning of Visual Features by No…
|
|
2021-04-28
|
|
RegNetY-600MF
|
Designing Network Design Spaces
|
|
2020-03-30
|
|
ShuffleNet V2
|
ShuffleNet V2: Practical Guidelines for Efficient…
|
|
2018-07-30
|
|
VAN-B0
|
Visual Attention Network
|
|
2022-02-20
|
|
AsymmNet-Large ×1.0
|
AsymmNet: Towards ultralight convolution neural n…
|
|
2021-04-15
|
|
FairNAS-A
|
FairNAS: Rethinking Evaluation Fairness of Weight…
|
|
2019-07-03
|
|
ViTAE-T
|
ViTAE: Vision Transformer Advanced by Exploring I…
|
|
2021-06-07
|
|
MUXNet-m
|
MUXConv: Information Multiplexing in Convolutiona…
|
|
2020-03-31
|
|
ResNet-50
|
Deep Residual Learning for Image Recognition
|
|
2015-12-10
|
|
MnasNet-A1
|
MnasNet: Platform-Aware Neural Architecture Searc…
|
|
2018-07-31
|
|
MobileNet V3-Large 1.0
|
Searching for MobileNetV3
|
|
2019-05-06
|
|
DiCENet
|
DiCENet: Dimension-wise Convolutions for Efficien…
|
|
2019-06-08
|
|
MultiGrain NASNet-A-Mobile (350px)
|
MultiGrain: a unified image embedding for classes…
|
|
2019-02-14
|
|
Ghost-ResNet-50 (s=2)
|
GhostNet: More Features from Cheap Operations
|
|
2019-11-27
|
|
DenseNet-121
|
Densely Connected Convolutional Networks
|
|
2016-08-25
|
|
Single-Path NAS
|
Single-Path NAS: Designing Hardware-Efficient Con…
|
|
2019-04-05
|
|
WaveMix-192/16 (level 3)
|
WaveMix: A Resource-efficient Neural Network for …
|
|
2022-05-28
|
|
FBNet-C
|
FBNet: Hardware-Aware Efficient ConvNet Design vi…
|
|
2018-12-09
|
|
ESPNetv2
|
ESPNetv2: A Light-weight, Power Efficient, and Ge…
|
|
2018-11-28
|
|
MobileViT-XS
|
MobileViT: Light-weight, General-purpose, and Mob…
|
|
2021-10-05
|
|
LocalViT-T
|
LocalViT: Bringing Locality to Vision Transformers
|
|
2021-04-12
|
|
RandWire-WS (small)
|
Exploring Randomly Wired Neural Networks for Imag…
|
|
2019-04-02
|
|
AutoFormer-tiny
|
AutoFormer: Searching Transformers for Visual Rec…
|
|
2021-07-01
|
|
MobileNetV2 (1.4)
|
MobileNetV2: Inverted Residuals and Linear Bottle…
|
|
2018-01-13
|
|
FairNAS-C
|
FairNAS: Rethinking Evaluation Fairness of Weight…
|
|
2019-07-03
|
|
ReXNet_0.6
|
Rethinking Channel Dimensions for Efficient Model…
|
|
2020-07-02
|
|
Proxyless
|
ProxylessNAS: Direct Neural Architecture Search o…
|
|
2018-12-02
|
|
PiT-Ti
|
Rethinking Spatial Dimensions of Vision Transform…
|
|
2021-03-30
|
|
DY-MobileNetV2 ×1.0
|
Dynamic Convolution: Attention over Convolution K…
|
|
2019-12-07
|
|
RegNetY-400MF
|
Designing Network Design Spaces
|
|
2020-03-30
|
|
PDC
|
Augmenting Deep Classifiers with Polynomial Neura…
|
|
2021-04-16
|
|
Ghost-ResNet-50 (s=4)
|
GhostNet: More Features from Cheap Operations
|
|
2019-11-27
|
|
SReT-ExT
|
Sliced Recursive Transformer
|
|
2021-11-09
|
|
GhostNet ×1.0
|
GhostNet: More Features from Cheap Operations
|
|
2019-11-27
|
|
DeiT-T (+MixPro)
|
MixPro: Data Augmentation with MaskMix and Progre…
|
|
2023-04-24
|
|
MNv4-Conv-S
|
MobileNetV4 -- Universal Models for the Mobile Ec…
|
|
2024-04-16
|
|
DeiT-Ti with iRPE-K
|
Rethinking and Improving Relative Position Encodi…
|
|
2021-07-29
|
|
TransBoost-ResNet18
|
TransBoost: Improving the Best ImageNet Performan…
|
|
2022-05-26
|
|
Wide ResNet-50 (edge-popup)
|
What's Hidden in a Randomly Weighted Neural Netwo…
|
|
2019-11-29
|
|
ResNet-18 (MEAL V2)
|
MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1…
|
|
2020-09-17
|
|
ConViT-Ti
|
ConViT: Improving Vision Transformers with Soft C…
|
|
2021-03-19
|
|
RevBiFPN-S0
|
RevBiFPN: The Fully Reversible Bidirectional Feat…
|
|
2022-06-28
|
|
DY-MobileNetV2 ×0.75
|
Dynamic Convolution: Attention over Convolution K…
|
|
2019-12-07
|
|
ResNet-18 (FT w/ ResNet-34 teacher)
|
torchdistill: A Modular, Configuration-Driven Fra…
|
|
2020-11-25
|
|
EfficientFormer-V2-S0
|
Which Transformer to Favor: A Comparative Analysi…
|
|
2023-08-18
|
|
DY-ResNet-18
|
Dynamic Convolution: Attention over Convolution K…
|
|
2019-12-07
|
|
ECA-Net (MobileNetV2)
|
ECA-Net: Efficient Channel Attention for Deep Con…
|
|
2019-10-08
|
|
MobileNet-224 (CGD)
|
Compact Global Descriptor for Neural Networks
|
|
2019-07-23
|
|
MobileOne-S0 (distill)
|
MobileOne: An Improved One millisecond Mobile Bac…
|
|
2022-06-08
|
|
LocalViT-T2T
|
LocalViT: Bringing Locality to Vision Transformers
|
|
2021-04-12
|
|
MobileViTv3-0.5
|
MobileViTv3: Mobile-Friendly Vision Transformer w…
|
|
2022-09-30
|
|
ResNet-18 (SAMix)
|
Boosting Discriminative Visual Representation Lea…
|
|
2021-11-30
|
|
ResNet-50
|
On the adequacy of untuned warmup for adaptive op…
|
|
2019-10-09
|
|
ResNet-18 (AutoMix)
|
AutoMix: Unveiling the Power of Mixup for Stronge…
|
|
2021-03-24
|
|
MobileNetV2
|
MobileNetV2: Inverted Residuals and Linear Bottle…
|
|
2018-01-13
|
|
Ours
|
QuantNet: Learning to Quantize by Learning within…
|
|
2020-09-10
|
|
ResNet-18 (PAD-L2 w/ ResNet-34 teacher)
|
torchdistill: A Modular, Configuration-Driven Fra…
|
|
2020-11-25
|
|
MUXNet-s
|
MUXConv: Information Multiplexing in Convolutiona…
|
|
2020-03-31
|
|
MobileOne-S0
|
MobileOne: An Improved One millisecond Mobile Bac…
|
|
2022-06-08
|
|
ResNet-18 (KD w/ ResNet-34 teacher)
|
torchdistill: A Modular, Configuration-Driven Fra…
|
|
2020-11-25
|
|
EdgeNeXt-XXS
|
EdgeNeXt: Efficiently Amalgamated CNN-Transformer…
|
|
2022-06-21
|
|
ResNet-18 (L2 w/ ResNet-34 teacher)
|
torchdistill: A Modular, Configuration-Driven Fra…
|
|
2020-11-25
|
|
MobileViTv3-XXS
|
MobileViTv3: Mobile-Friendly Vision Transformer w…
|
|
2022-09-30
|
|
ResNet-18 (CRD w/ ResNet-34 teacher)
|
torchdistill: A Modular, Configuration-Driven Fra…
|
|
2020-11-25
|
|
ShuffleNet
|
ShuffleNet: An Extremely Efficient Convolutional …
|
|
2017-07-04
|
|
MobileNet-224 ×1.25
|
MobileNets: Efficient Convolutional Neural Networ…
|
|
2017-04-17
|
|
ResNet-18 (tf-KD w/ ResNet-18 teacher)
|
torchdistill: A Modular, Configuration-Driven Fra…
|
|
2020-11-25
|
|
PVTv2-B0
|
PVT v2: Improved Baselines with Pyramid Vision Tr…
|
|
2021-06-25
|
|
MobileViTv2-0.5
|
Separable Self-attention for Mobile Vision Transf…
|
|
2022-06-06
|
|
ResNet-18 (SSKD w/ ResNet-34 teacher)
|
torchdistill: A Modular, Configuration-Driven Fra…
|
|
2020-11-25
|
|
DY-MobileNetV3-Small
|
Dynamic Convolution: Attention over Convolution K…
|
|
2019-12-07
|
|
HVT-Ti-1
|
Scalable Vision Transformers with Hierarchical Po…
|
|
2021-03-19
|
|
DY-MobileNetV2 ×0.5
|
Dynamic Convolution: Attention over Convolution K…
|
|
2019-12-07
|
|
AsymmNet-Large ×0.5
|
AsymmNet: Towards ultralight convolution neural n…
|
|
2021-04-15
|
|
Heteroscedastic (InceptionResNet-v2)
|
Correlated Input-Dependent Label Noise in Large-S…
|
|
2021-05-19
|
|
AsymmNet-Small ×1.0
|
AsymmNet: Towards ultralight convolution neural n…
|
|
2021-04-15
|
|
FireCaffe (GoogLeNet)
|
FireCaffe: near-linear acceleration of deep neura…
|
|
2015-10-31
|
|
Graph-RISE (40M)
|
Graph-RISE: Graph-Regularized Image Semantic Embe…
|
|
2019-02-14
|
|
ReActNet-A (BN-Free)
|
"BNN - BN = ?": Training Binary Neural Networks w…
|
|
2021-04-16
|
|
ResNet34 (FSGDM)
|
On the Performance Analysis of Momentum Method: A…
|
|
2024-11-29
|
|
DY-ResNet-10
|
Dynamic Convolution: Attention over Convolution K…
|
|
2019-12-07
|
|
MUXNet-xs
|
MUXConv: Information Multiplexing in Convolutiona…
|
|
2020-03-31
|
|
PAWS (ResNet-50, 1% labels)
|
Semi-Supervised Learning of Visual Features by No…
|
|
2021-04-28
|
|
GhostNet ×0.5
|
GhostNet: More Features from Cheap Operations
|
|
2019-11-27
|
|
OTTT
|
Online Training Through Time for Spiking Neural N…
|
|
2022-10-09
|
|
DY-MobileNetV2 ×0.35
|
Dynamic Convolution: Attention over Convolution K…
|
|
2019-12-07
|
|
BBG (ResNet-34)
|
Balanced Binary Neural Networks with Gated Residu…
|
|
2019-09-26
|
|
BBG (ResNet-18)
|
Balanced Binary Neural Networks with Gated Residu…
|
|
2019-09-26
|
|
FireCaffe (AlexNet)
|
FireCaffe: near-linear acceleration of deep neura…
|
|
2015-10-31
|
|
HMAX
|
0/1 Deep Neural Networks via Block Coordinate Des…
|
|
2022-06-19
|
|
ViT-Large
|
An Image is Worth 16x16 Words: Transformers for I…
|
|
2020-10-22
|
|
CCT-14/7x2
|
Escaping the Big Data Paradigm with Compact Trans…
|
|
2021-04-12
|
|
MambaVision-L2
|
MambaVision: A Hybrid Mamba-Transformer Vision Ba…
|
|
2024-07-10
|
|
ONE-PEACE
|
ONE-PEACE: Exploring One General Representation M…
|
|
2023-05-18
|
|
AIMv2-2B
|
Multimodal Autoregressive Pre-training of Large V…
|
|
2024-11-21
|
|
InternImage-DCNv3-G (M3I Pre-training)
|
InternImage: Exploring Large-Scale Vision Foundat…
|
|
2022-11-10
|
|