Knowledge Distillation on ImageNet

Performance Over Time

50 results tracked; metric: Top-1 accuracy (%).

Top Performing Models

| Rank | Model | Paper | Top-1 accuracy (%) | Date | Code |
|------|-------|-------|--------------------|------|------|
| 1 | ScaleKD (T: BEiT-L, S: ViT-B/14) | ScaleKD: Strong Vision Transformers Could Be Excellent Teachers | 86.43 | 2024-11-11 | deep-optimization/scalekd |
| 2 | ScaleKD (T: Swin-L, S: ViT-B/16) | ScaleKD: Strong Vision Transformers Could Be Excellent Teachers | 85.53 | 2024-11-11 | deep-optimization/scalekd |
| 3 | ScaleKD (T: Swin-L, S: ViT-S/16) | ScaleKD: Strong Vision Transformers Could Be Excellent Teachers | 83.93 | 2024-11-11 | deep-optimization/scalekd |
| 4 | ScaleKD (T: Swin-L, S: Swin-T) | ScaleKD: Strong Vision Transformers Could Be Excellent Teachers | 83.80 | 2024-11-11 | deep-optimization/scalekd |
| 5 | KD++ (T: RegNetY-16GF, S: ViT-B) | Improving Knowledge Distillation via Regularizing Feature Norm and Direction | 83.60 | 2023-05-26 | wangyz1608/knowledge-distillation-via-nd |
| 6 | VkD (T: RegNetY-160, S: DeiT-S) | $V_kD$: Improving Knowledge Distillation using Orthogonal Projections | 82.90 | 2024-03-10 | roymiles/vkd |
| 7 | SpectralKD (T: Swin-S, S: Swin-T) | SpectralKD: A Unified Framework for Interpreting and Distilling Vision Transformers via Spectral Analysis | 82.70 | 2024-12-26 | thy960112/SpectralKD |
| 8 | ScaleKD (T: Swin-L, S: ResNet-50) | ScaleKD: Strong Vision Transformers Could Be Excellent Teachers | 82.55 | 2024-11-11 | deep-optimization/scalekd |
| 9 | DiffKD (T: Swin-L, S: Swin-T) | Knowledge Diffusion for Distillation | 82.50 | 2023-05-25 | hunto/diffkd |
| 10 | DIST (T: Swin-L, S: Swin-T) | Knowledge Distillation from A Stronger Teacher | 82.30 | 2022-05-21 | yoshitomo-matsubara/torchdistill, hunto/dist_kd, hunto/image_classification_sota |
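
In the Model column, T: and S: denote the teacher and student architectures; the reported number is the student's Top-1 accuracy on the ImageNet validation set, i.e. the fraction of images whose highest-scoring predicted class matches the ground-truth label. The snippet below is a minimal sketch of how that metric is computed; the `student` model and `val_loader` are hypothetical placeholders, not artifacts from any of the listed papers.

```python
import torch

@torch.no_grad()
def top1_accuracy(model, loader, device="cuda"):
    """Fraction of samples whose argmax prediction equals the label, in percent."""
    model.eval()
    correct, total = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        logits = model(images)              # (batch, 1000) class scores
        preds = logits.argmax(dim=1)        # highest-scoring class per image
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return 100.0 * correct / total

# Hypothetical usage with a distilled student and an ImageNet validation loader:
# acc = top1_accuracy(student, val_loader)
# print(f"Top-1 accuracy: {acc:.2f}%")
```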

All Papers (50)

- Knowledge Diffusion for Distillation (2023): DiffKD (T: Swin-L, S: ResNet-50)
- Distilling the Knowledge in a Neural Network (2015): ADLIK-MO (T: ResNet-101, S: ResNet-50)
- Distilling the Knowledge in a Neural Network (2015): KD (T: ResNet-34, S: ResNet-18), the vanilla KD baseline sketched below
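
Distilling the Knowledge in a Neural Network (Hinton et al., 2015) is the baseline behind the plain KD entries: the student is trained to match the teacher's temperature-softened class probabilities alongside the usual cross-entropy on hard labels. The sketch below is a minimal PyTorch rendering of that loss; the temperature `T` and weight `alpha` are typical assumed defaults, not settings taken from the leaderboard entries.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Vanilla knowledge distillation loss (Hinton et al., 2015).

    KL divergence between temperature-softened teacher and student
    distributions, combined with cross-entropy on the hard labels.
    T and alpha here are common choices, not values from any specific paper.
    """
    # Soft targets: teacher and student probabilities at temperature T.
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    log_soft_student = F.log_softmax(student_logits / T, dim=1)
    # The T**2 factor keeps gradient magnitudes comparable across temperatures.
    distill = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)
    # Hard-label supervision on the unscaled student logits.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1.0 - alpha) * ce
```

During training the teacher is kept frozen in eval mode with gradients disabled, and only the student parameters are updated; the more recent methods in the table above build on this setup with additional feature-level or projection-based objectives.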