ImageNet

Dataset Information
Modalities
Images
Languages
English
Introduced
2009
Homepage

Overview

The ImageNet dataset contains 14,197,122 images annotated according to the WordNet hierarchy. Since 2010, the dataset has been used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a benchmark for image classification and object detection.
The publicly released dataset contains a set of manually annotated training images. A set of test images is also released, with the manual annotations withheld.
ILSVRC annotations fall into one of two categories: (1) image-level annotation of a binary label for the presence or absence of an object class in the image, e.g., “there are cars in this image” but “there are no tigers,” and (2) object-level annotation of a tight bounding box and class label around an object instance in the image, e.g., “there is a screwdriver centered at position (20,25) with width of 50 pixels and height of 30 pixels”.
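The two annotation types can be pictured as small records. The sketch below is only an illustration, not the ILSVRC file format: the class names `ImageLevelLabel` and `BoundingBox` and the method `to_corners` are hypothetical, and the box is assumed to be stored as a center point plus width and height, as in the screwdriver example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageLevelLabel:
    """Binary presence/absence label for one object class in one image."""
    class_name: str
    present: bool


@dataclass(frozen=True)
class BoundingBox:
    """Object-level annotation: class label plus a box in center format (pixels)."""
    class_name: str
    center_x: float
    center_y: float
    width: float
    height: float

    def to_corners(self):
        """Return (x_min, y_min, x_max, y_max) for this box."""
        half_w = self.width / 2
        half_h = self.height / 2
        return (
            self.center_x - half_w,
            self.center_y - half_h,
            self.center_x + half_w,
            self.center_y + half_h,
        )


# Image-level labels from the text: cars present, tigers absent.
labels = [ImageLevelLabel("car", True), ImageLevelLabel("tiger", False)]

# Object-level annotation from the text: a screwdriver centered at (20, 25),
# 50 pixels wide and 30 pixels high.
box = BoundingBox("screwdriver", 20, 25, 50, 30)
corners = box.to_corners()
print(corners)
```

With the numbers quoted in the text, the left edge of the box lands at x = -5, outside the image, so the screwdriver example is best read as illustrative. A real pipeline would typically clip boxes to the image bounds.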
The ImageNet project does not own the copyright of the images, so only thumbnails and URLs of the images are provided.

  • Total number of non-empty WordNet synsets: 21,841
  • Total number of images: 14,197,122
  • Number of images with bounding box annotations: 1,034,908
  • Number of synsets with SIFT features: 1000
  • Number of images with SIFT features: 1.2 million
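
Two derived figures help put these counts in perspective. The snippet below is plain arithmetic on the numbers listed above; the variable names are illustrative.

```python
# Counts copied from the statistics above.
num_synsets = 21_841
num_images = 14_197_122
num_images_with_boxes = 1_034_908

# Average number of images per non-empty synset.
images_per_synset = num_images / num_synsets

# Share of images that carry bounding box annotations.
box_fraction = num_images_with_boxes / num_images

print(f"{images_per_synset:.0f} images per synset on average")
print(f"{box_fraction:.1%} of images have bounding boxes")
```

That works out to roughly 650 images per synset on average, with about 7.3% of all images carrying bounding box annotations.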

Source: ImageNet Large Scale Visual Recognition Challenge
Image Source: https://cs.stanford.edu/people/karpathy/cnnembed/

Variants: imagenet-1k, ImageNet sigma50, ImageNet sigma250, ImageNet sigma200, ImageNet sigma150, ImageNet sigma100, ImageNet100, ImageNet (linear), ImageNet (finetuned), ImageNet - 5-shot, ImageNet - 5 labeled data per class, ImageNet - 2 labeled data per class, ImageNet - 1-shot, ImageNet - 1 labeled data per class, ImageNet - 10-shot, ImageNet - 0.2% labeled data, ImageNetV2, ImageNet V2, ImageNet - 10% labeled data, ImageNet - 1% labeled data, ImageNet - 0-Shot, ImageNet

Associated Benchmarks

This dataset is used in 24 benchmarks.

Recent Benchmark Submissions

Task | Model | Paper | Date
Image Reconstruction | MGVQ (16x16x8) | MGVQ: Could VQ-VAE Beat VAE? … | 2025-07-14
Image Reconstruction | MGVQ (16x16x4) | MGVQ: Could VQ-VAE Beat VAE? … | 2025-07-14
Prompt Engineering | MMRL++ | MMRL++: Parameter-Efficient and Interaction-Aware Representation … | 2025-05-15
Image Reconstruction | GigaTok-XL-XXL | GigaTok: Scaling Visual Tokenizers to … | 2025-04-11
Image Classification | CMA(ViT-B/16) | Enhanced OoD Detection through Cross-Modal … | 2025-03-24
Image Clustering | PRO-DSC | Exploring a Principled Framework for … | 2025-03-21
Prompt Engineering | MMRL | MMRL: Multi-Modal Representation Learning for … | 2025-03-11
Image Classification | CI2P-ViT | Compress image to patches for … | 2025-02-14
Image Classification | ConvNeXt-T-Hermite | Polynomial, trigonometric, and tropical activations | 2025-02-03
Knowledge Distillation | SpectralKD (T:Cait-S24 S:DeiT-T) | SpectralKD: A Unified Framework for … | 2024-12-26
Knowledge Distillation | SpectralKD (T:Swin-S S:Swin-T) | SpectralKD: A Unified Framework for … | 2024-12-26
Knowledge Distillation | SpectralKD (T:Cait-S24 S:DeiT-S) | SpectralKD: A Unified Framework for … | 2024-12-26
Image Classification | M2D-T | Mamba2D: A Natively Multi-Dimensional State-Space … | 2024-12-20
Image Reconstruction | OptVQ (16x16x4) | Preventing Local Pitfalls in Vector … | 2024-12-19
Image Reconstruction | OptVQ (16x16x8) | Preventing Local Pitfalls in Vector … | 2024-12-19
Image Reconstruction | IBQ (16x16) | Taming Scalable Visual Tokenizer for … | 2024-12-03
Image Classification | ResNet34 (FSGDM) | On the Performance Analysis of … | 2024-11-29
Image Classification | ResNet50 (FSGDM) | On the Performance Analysis of … | 2024-11-29
Image Classification | AIMv2-1B | Multimodal Autoregressive Pre-training of Large … | 2024-11-21
Image Classification | AIMv2-3B (448 res) | Multimodal Autoregressive Pre-training of Large … | 2024-11-21

Research Papers

Recent papers with results on this dataset: