iNaturalist

Image Classification Benchmark

Performance Over Time

Showing 18 results | Metric: Top 1 Accuracy
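Top 1 Accuracy, the metric reported on this page, is the percentage of evaluation images whose highest-scoring predicted class equals the ground-truth label. The snippet below is a minimal sketch of that computation, assuming per-class scores are available as plain Python lists. The function name `top1_accuracy` and the toy inputs are illustrative only and do not come from any of the listed repositories.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return top-1 accuracy as a percentage (hypothetical helper)."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one example")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the highest score is the top-1 prediction.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Toy example: four images, two classes (made-up scores, not benchmark data).
toy_scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
toy_labels = [1, 0, 0, 0]
toy_result = top1_accuracy(toy_scores, toy_labels)
```

In the toy example the third image is misclassified, so three of the four predictions count as correct.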

Top Performing Models

Rank	Model	Paper	Top 1 Accuracy	Date	Code
1	AIMv2-3B (448 res)	Multimodal Autoregressive Pre-training of Large Vision Encoders	85.90	2024-11-21	apple/ml-aim
2	Hiera-H (448px)	Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles	83.80	2023-06-01	huggingface/pytorch-image-models, facebookresearch/hiera, leondgarse/keras_cv_attention_models, birder/birder
3	MAE (ViT-H, 448)	Masked Autoencoders Are Scalable Vision Learners	83.40	2021-11-11	facebookresearch/mae, lightly-ai/lightly, open-mmlab/mmselfsup
4	AIMv2-3B	Multimodal Autoregressive Pre-training of Large Vision Encoders	81.50	2024-11-21	apple/ml-aim
5	AIMv2-1B	Multimodal Autoregressive Pre-training of Large Vision Encoders	79.70	2024-11-21	apple/ml-aim
6	AIMv2-H	Multimodal Autoregressive Pre-training of Large Vision Encoders	77.90	2024-11-21	apple/ml-aim
7	AIMv2-L	Multimodal Autoregressive Pre-training of Large Vision Encoders	76.00	2024-11-21	apple/ml-aim
8	FixSENet-154	Fixing the train-test resolution discrepancy	75.40	2019-06-14	facebookresearch/FixRes, libffcv/ffcv-imagenet, kun-woo-park/Deeplearning_project_STL_10
9	b_22DeiT-LT(ours)	DeiT-LT Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets	75.10	2024-04-03	val-iisc/DeiT-LT, pwc-1/Paper-8
10	SEB+EfficientNet-B5	On the Eigenvalues of Global Covariance Pooling for Fine-grained Visual Recognition	72.30	2022-05-26	KingJamesSong/DifferentiableSVD

All Papers (18)

Multimodal Autoregressive Pre-training of Large Vision Encoders

2024

AIMv2-3B (448 res)

apple/ml-aim

Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles

2023

Hiera-H (448px)

huggingface/pytorch-image-models facebookresearch/hiera

Masked Autoencoders Are Scalable Vision Learners

2021

MAE (ViT-H, 448)

facebookresearch/mae lightly-ai/lightly

Multimodal Autoregressive Pre-training of Large Vision Encoders

2024

AIMv2-3B

apple/ml-aim

Multimodal Autoregressive Pre-training of Large Vision Encoders

2024

AIMv2-1B

apple/ml-aim

Multimodal Autoregressive Pre-training of Large Vision Encoders

2024

AIMv2-H

apple/ml-aim

Multimodal Autoregressive Pre-training of Large Vision Encoders

2024

AIMv2-L

apple/ml-aim

Fixing the train-test resolution discrepancy

2019

FixSENet-154

facebookresearch/FixRes libffcv/ffcv-imagenet kun-woo-park/Deeplearning_project_STL_10

DeiT-LT Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets

2024

b_22DeiT-LT(ours)

val-iisc/DeiT-LT pwc-1/Paper-8

On the Eigenvalues of Global Covariance Pooling for Fine-grained Visual Recognition

2022

SEB+EfficientNet-B5

KingJamesSong/DifferentiableSVD

TransFG: A Transformer Architecture for Fine-grained Recognition

2021

TransFG

TACJu/TransFG skchen1993/TrangFG

Deep CNNs Meet Global Covariance Pooling: Better Representation and Generalization

2019

iSQRT-COV-Net

jiangtaoxie/fast-MPN-COV jiangtaoxie/MPN-COV ZhangLi-CS/GCP_Optimization

MetaFormer: A Unified Meta Framework for Fine-Grained Recognition

2022

MetaFormer (MetaFormer-2,384,extra_info)

dqshuai/metaformer salluru007/papers

MetaFormer: A Unified Meta Framework for Fine-Grained Recognition

2022

MetaFormer (MetaFormer-2,384)

dqshuai/metaformer salluru007/papers

The iNaturalist Species Classification and Detection Dataset

2017

IncResNetV2 SE

tensorflow/models deeplearning-wisc/knn-ood

SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization

2019

SpineNet-143

tensorflow/models tensorflow/tpu

MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition

2021

MetaSAug

BIT-DA/MetaSAug

Graph-RISE: Graph-Regularized Image Semantic Embedding

2019

Graph-RISE (40M)

tensorflow/neural-structured-learning

Model	Paper	Top 1 Accuracy	Date
AIMv2-3B (448 res)	Multimodal Autoregressive Pre-training of Large V…	85.90	2024-11-21
Hiera-H (448px)	Hiera: A Hierarchical Vision Transformer without …	83.80	2023-06-01
MAE (ViT-H, 448)	Masked Autoencoders Are Scalable Vision Learners	83.40	2021-11-11
AIMv2-3B	Multimodal Autoregressive Pre-training of Large V…	81.50	2024-11-21
AIMv2-1B	Multimodal Autoregressive Pre-training of Large V…	79.70	2024-11-21
AIMv2-H	Multimodal Autoregressive Pre-training of Large V…	77.90	2024-11-21
AIMv2-L	Multimodal Autoregressive Pre-training of Large V…	76.00	2024-11-21
FixSENet-154	Fixing the train-test resolution discrepancy	75.40	2019-06-14
b_22DeiT-LT(ours)	DeiT-LT Distillation Strikes Back for Vision Tran…	75.10	2024-04-03
SEB+EfficientNet-B5	On the Eigenvalues of Global Covariance Pooling f…	72.30	2022-05-26
TransFG	TransFG: A Transformer Architecture for Fine-grai…	71.70	2021-03-14
iSQRT-COV-Net	Deep CNNs Meet Global Covariance Pooling: Better …	14.63	2019-04-15
MetaFormer (MetaFormer-2,384,extra_info)	MetaFormer: A Unified Meta Framework for Fine-Gra…		2022-03-05
MetaFormer (MetaFormer-2,384)	MetaFormer: A Unified Meta Framework for Fine-Gra…		2022-03-05
IncResNetV2 SE	The iNaturalist Species Classification and Detect…		2017-07-20
SpineNet-143	SpineNet: Learning Scale-Permuted Backbone for Re…		2019-12-10
MetaSAug	MetaSAug: Meta Semantic Augmentation for Long-Tai…		2021-03-23
Graph-RISE (40M)	Graph-RISE: Graph-Regularized Image Semantic Embe…		2019-02-14