Knowledge Distillation on ImageNet

Performance Over Time

50 results tracked; metric: Top-1 accuracy (%).

Top Performing Models

| Rank | Model | Paper | Top-1 accuracy (%) | Date | Code |
|------|-------|-------|--------------------|------|------|
| 1 | ScaleKD (T: BEiT-L, S: ViT-B/14) | ScaleKD: Strong Vision Transformers Could Be Excellent Teachers | 86.43 | 2024-11-11 | deep-optimization/scalekd |
| 2 | ScaleKD (T: Swin-L, S: ViT-B/16) | ScaleKD: Strong Vision Transformers Could Be Excellent Teachers | 85.53 | 2024-11-11 | deep-optimization/scalekd |
| 3 | ScaleKD (T: Swin-L, S: ViT-S/16) | ScaleKD: Strong Vision Transformers Could Be Excellent Teachers | 83.93 | 2024-11-11 | deep-optimization/scalekd |
| 4 | ScaleKD (T: Swin-L, S: Swin-T) | ScaleKD: Strong Vision Transformers Could Be Excellent Teachers | 83.80 | 2024-11-11 | deep-optimization/scalekd |
| 5 | KD++ (T: RegNetY-16GF, S: ViT-B) | Improving Knowledge Distillation via Regularizing Feature Norm and Direction | 83.60 | 2023-05-26 | wangyz1608/knowledge-distillation-via-nd |
| 6 | VkD (T: RegNetY-160, S: DeiT-S) | $V_kD$: Improving Knowledge Distillation using Orthogonal Projections | 82.90 | 2024-03-10 | roymiles/vkd |
| 7 | SpectralKD (T: Swin-S, S: Swin-T) | SpectralKD: A Unified Framework for Interpreting and Distilling Vision Transformers via Spectral Analysis | 82.70 | 2024-12-26 | thy960112/SpectralKD |
| 8 | ScaleKD (T: Swin-L, S: ResNet-50) | ScaleKD: Strong Vision Transformers Could Be Excellent Teachers | 82.55 | 2024-11-11 | deep-optimization/scalekd |
| 9 | DiffKD (T: Swin-L, S: Swin-T) | Knowledge Diffusion for Distillation | 82.50 | 2023-05-25 | hunto/diffkd |
| 10 | DIST (T: Swin-L, S: Swin-T) | Knowledge Distillation from A Stronger Teacher | 82.30 | 2022-05-21 | yoshitomo-matsubara/torchdistill, hunto/dist_kd, hunto/image_classification_sota |
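
In the Model column, T: and S: denote the teacher and student architectures; the reported number is the student's Top-1 accuracy on the ImageNet validation set, i.e. the fraction of images whose highest-scoring predicted class matches the ground-truth label. The snippet below is a minimal sketch of how that metric is computed; the `student` model and `val_loader` are hypothetical placeholders, not artifacts from any of the listed papers.

```python
import torch

@torch.no_grad()
def top1_accuracy(model, loader, device="cuda"):
    """Fraction of samples whose argmax prediction equals the label, in percent."""
    model.eval()
    correct, total = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        logits = model(images)              # (batch, 1000) class scores
        preds = logits.argmax(dim=1)        # highest-scoring class per image
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return 100.0 * correct / total

# Hypothetical usage with a distilled student and an ImageNet validation loader:
# acc = top1_accuracy(student, val_loader)
# print(f"Top-1 accuracy: {acc:.2f}%")
```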

All Papers (50)

- Knowledge Diffusion for Distillation (2023): DiffKD (T: Swin-L, S: ResNet-50)
- Distilling the Knowledge in a Neural Network (2015): ADLIK-MO (T: ResNet-101, S: ResNet-50)
- Distilling the Knowledge in a Neural Network (2015): KD (T: ResNet-34, S: ResNet-18), the vanilla KD baseline sketched below
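
Distilling the Knowledge in a Neural Network (Hinton et al., 2015) is the baseline behind the plain KD entries: the student is trained to match the teacher's temperature-softened class probabilities alongside the usual cross-entropy on hard labels. The sketch below is a minimal PyTorch rendering of that loss; the temperature `T` and weight `alpha` are typical assumed defaults, not settings taken from the leaderboard entries.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Vanilla knowledge distillation loss (Hinton et al., 2015).

    KL divergence between temperature-softened teacher and student
    distributions, combined with cross-entropy on the hard labels.
    T and alpha here are common choices, not values from any specific paper.
    """
    # Soft targets: teacher and student probabilities at temperature T.
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    log_soft_student = F.log_softmax(student_logits / T, dim=1)
    # The T**2 factor keeps gradient magnitudes comparable across temperatures.
    distill = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)
    # Hard-label supervision on the unscaled student logits.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1.0 - alpha) * ce
```

During training the teacher is kept frozen in eval mode with gradients disabled, and only the student parameters are updated; the more recent methods in the table above build on this setup with additional feature-level or projection-based objectives.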