Machine Learning Benchmarks

Browse 186 benchmarks across 12 tasks
Browse by Category

10-shot image generation

FQL-Driving

FQL-driving

📊 1 results
📏 Metrics: 0-shot MRR

FlyingThings3D

FlyingThings3D is a synthetic dataset for optical flow, disparity and scene flow estimation. It consists of everyday objects flying along …

📊 1 results
📏 Metrics: 0..5sec

MEAD

Multi-view Emotional Audio-visual Dataset

📊 1 results
📏 Metrics: 12k

Music21

Music21 is an untrimmed video dataset crawled by keyword query from YouTube. It contains music performances belonging to 21 categories. …

📊 1 results
📏 Metrics: 0..5sec

2D Semantic Segmentation

CamVid

CamVid (Cambridge-driving Labeled Video Database) is a road/driving scene understanding database which was originally captured as five video sequences with …

📊 1 results
📏 Metrics: mIoU

GF-PA66 3D XCT

Stack of 2D gray images of glass fiber-reinforced polyamide 66 (GF-PA66) 3D X-ray Computed Tomography (XCT) specimen. Usage: 2D/3D image …

📊 1 results
📏 Metrics: Jaccard (Mean)

WaterScenes

A Multi-Task 4D Radar-Camera Fusion Dataset for Autonomous Driving on Water Surfaces. WaterScenes, the first …

📊 1 results
📏 Metrics: mIoU

WildScenes

WildScenes is a bi-modal benchmark dataset consisting of multiple large-scale, sequential traversals in natural environments, including semantic annotations in high-resolution …

📊 5 results
📏 Metrics: mIoU, mIoU (Temporal DA), mIoU (Env DA)

xBD

The xBD dataset contains over 45,000 km² of polygon-labeled pre- and post-disaster imagery. The dataset provides the post-disaster imagery …

📊 5 results
📏 Metrics: Weighted Average F1-score, Localization F1-score, Classification F1-score

Adversarial Attack

CIFAR-10

The CIFAR-10 database (Canadian Institute For Advanced Research database) is a large collection of natural color images. It has a …

📊 6 results
📏 Metrics: Attack: PGD20, Attack: AutoAttack, Attack: DeepFool, Robust Accuracy

CIFAR-100

The CIFAR-100 dataset (Canadian Institute for Advanced Research, 100 classes) is a subset of the Tiny Images dataset and consists …

📊 2 results
📏 Metrics: Attack: AutoAttack

WSJ0-2mix

WSJ0-2mix is a speech recognition corpus of speech mixtures using utterances from the Wall Street Journal (WSJ0) corpus. Source: [Deep …

📊 1 results
📏 Metrics: SDR

Adversarial Defense

CIFAR-10

The CIFAR-10 database (Canadian Institute For Advanced Research database) is a large collection of natural color images. It has a …

📊 8 results
📏 Metrics: Accuracy, Attack: AutoAttack, Robust Accuracy

CIFAR-100

The CIFAR-100 dataset (Canadian Institute for Advanced Research, 100 classes) is a subset of the Tiny Images dataset and consists …

📊 3 results
📏 Metrics: autoattack

ImageNet

The ImageNet dataset contains 14,197,122 annotated images according to the WordNet hierarchy. Since 2010 the dataset has been used in the …

📊 1 results
📏 Metrics: Accuracy

MNIST

The MNIST database (Modified National Institute of Standards and Technology database) is a large collection of handwritten digits. It has …

📊 2 results
📏 Metrics: Accuracy, Inference speed

Adversarial Robustness

AdvGLUE

Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to quantitatively and thoroughly explore and evaluate the vulnerabilities of modern large-scale …

📊 10 results
📏 Metrics: Accuracy

CIFAR-10

The CIFAR-10 database (Canadian Institute For Advanced Research database) is a large collection of natural color images. It has a …

📊 5 results
📏 Metrics: Accuracy, Robust Accuracy, Attack: AutoAttack

CIFAR-100

The CIFAR-100 dataset (Canadian Institute for Advanced Research, 100 classes) is a subset of the Tiny Images dataset and consists …

📊 2 results
📏 Metrics: Clean Accuracy, AutoAttacked Accuracy

ImageNet

The ImageNet dataset contains 14,197,122 annotated images according to the WordNet hierarchy. Since 2010 the dataset has been used in the …

📊 4 results
📏 Metrics: Accuracy

ImageNet-A

The ImageNet-A dataset consists of real-world, unmodified, and naturally occurring examples that are misclassified by ResNet models. Source: [On Robustness …

📊 4 results
📏 Metrics: Accuracy

ImageNet-C

ImageNet-C is an open source data set that consists of algorithmically generated corruptions (blur, noise) applied to the ImageNet test-set. …

📊 4 results
📏 Metrics: mean Corruption Error (mCE)

Stylized ImageNet

The Stylized-ImageNet dataset is created by removing local texture cues in ImageNet while retaining global shape information on natural images …

📊 4 results
📏 Metrics: Accuracy

Classification

Adult

Data Set Information: Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records …

📊 1 results
📏 Metrics: AUROC

BIOSCAN_1M_Insect Dataset

In an effort to catalog insect biodiversity, we propose a new large dataset of hand-labelled insect images, the BIOSCAN-1M Insect …

📊 2 results
📏 Metrics: Macro F1

BiasBios

The purpose of this dataset was to study gender bias in occupations. Online biographies, written in English, were collected to …

📊 1 results
📏 Metrics: 1:1 Accuracy

BoolQ

BoolQ is a question answering dataset for yes/no questions containing 15942 examples. These questions are naturally occurring – they are …

📊 2 results
📏 Metrics: Test Accuracy

Brain Tumor MRI Dataset

This dataset is a combination of the following three datasets: figshare, the SARTAJ dataset, and Br35H. This dataset contains 7022 …

📊 1 results
📏 Metrics: F1 score

CIFAKE: Real and AI-Generated Synthetic Images

The quality of AI-generated images has rapidly increased, leading to concerns of authenticity and trustworthiness. CIFAKE is a dataset that …

📊 1 results
📏 Metrics: Validation Accuracy

CIFAR-100

The CIFAR-100 dataset (Canadian Institute for Advanced Research, 100 classes) is a subset of the Tiny Images dataset and consists …

📊 1 results
📏 Metrics: Accuracy

CIFAR-10C

Common corruptions dataset for CIFAR10

📊 1 results
📏 Metrics: Accuracy on Brightness Corrupted Images

COVID-19 Image Data Collection

Contains hundreds of frontal view X-rays and is the largest public resource for COVID-19 image and prognostic data, making it …

📊 1 results
📏 Metrics: Accuracy

CWRU Bearing Dataset

Data was collected for normal bearings, single-point drive end and fan end defects. Data was collected at 12,000 samples/second and …

📊 1 results
📏 Metrics: 10 fold Cross validation

Chest X-Ray Images (Pneumonia)

The normal chest X-ray (left panel) depicts clear lungs without any areas of abnormal opacification in the image. Bacterial pneumonia …

📊 1 results
📏 Metrics: Accuracy

ForgeryNet

We construct the ForgeryNet dataset, an extremely large face forgery dataset with unified annotations in image- and video-level data across …

📊 3 results
📏 Metrics: AUC, Accuracy

Full-body Parkinson’s disease dataset

A public data set of walking full-body kinematics and kinetics in individuals with Parkinson’s disease

📊 7 results
📏 Metrics: F1-score (weighted)

HOWS

HOWS-CL-25 (Household Objects Within Simulation dataset for Continual Learning) is a synthetic dataset especially designed for object classification on mobile …

📊 1 results
📏 Metrics: Overall accuracy after last sequence

HRF

The HRF dataset is a dataset for retinal vessel segmentation which comprises 45 images and is organized as 15 subsets. …

📊 1 results
📏 Metrics: Accuracy

IRFL: Image Recognition of Figurative Language

The IRFL dataset consists of idioms, similes, and metaphors with matching figurative and literal images, as well as two novel …

📊 1 results
📏 Metrics: 1-of-100 Accuracy

ISIC 2019

The goal for ISIC 2019 is to classify dermoscopic images among nine different diagnostic categories. 25,331 images are available for training across …

📊 1 results
📏 Metrics: Balanced Multi-Class Accuracy

ImageNet C-OOD (class-out-of-distribution)

This dataset was presented as part of the ICLR 2023 paper "A framework for benchmarking Class-out-of-distribution detection and its application …

📊 5 results
📏 Metrics: Detection AUROC (severity 0), Detection AUROC (severity 5), Detection AUROC (severity 10)

InDL

In this work, we introduce the In-Diagram Logic (InDL) dataset, an innovative resource crafted to rigorously evaluate the …

📊 9 results
📏 Metrics: Average Recall

LES-AV

This data set comprises 22 fundus images with their corresponding manual annotations for the blood vessels, separated as arteries and …

📊 1 results
📏 Metrics: Accuracy

Liver-US

The Liver-US dataset is a comprehensive collection of high-quality ultrasound images of the liver, including both normal and abnormal cases. …

📊 1 results
📏 Metrics: AUC

MHIST

The minimalist histopathology image analysis dataset (MHIST) is a binary classification dataset of 3,152 fixed-size images of colorectal polyps, each …

📊 6 results
📏 Metrics: Accuracy

MedSecId

The process by which sections in a document are demarcated and labeled is known as section identification. Such sections are …

📊 1 results
📏 Metrics: 1 shot Micro-F1

MixedWM38

MixedWM38 Dataset (WaferMap) has more than 38,000 wafer maps, including 1 normal pattern, 8 single defect patterns, and 29 mixed defect …

📊 1 results
📏 Metrics: Accuracy, MCC

MuReD Dataset

Early detection of retinal diseases is one of the most important means of preventing partial or permanent blindness in patients. …

📊 1 results
📏 Metrics: ML F1, ML mAP, ML AUC

N-CARS

A large real-world event-based dataset for object classification. Source: HATS: Histograms of Averaged Time Surfaces for Robust Event-based Object Classification

📊 6 results
📏 Metrics: Accuracy (%), Architecture, Representation, Representation Time (ms / 100 ms events), Inference Time, Params (M)

N-ImageNet

The N-ImageNet dataset is an event-camera counterpart for the ImageNet dataset. The dataset is obtained by moving an event camera …

📊 9 results
📏 Metrics: Accuracy (%)

RITE

The RITE (Retinal Images vessel Tree Extraction) is a database that enables comparative studies on segmentation or classification of arteries …

📊 1 results
📏 Metrics: Accuracy

RSSCN7

The RSSCN7 dataset contains satellite images acquired from Google Earth, originally collected for remote sensing scene classification. We …

📊 1 results
📏 Metrics: 1:1 Accuracy

RTE

The Recognizing Textual Entailment (RTE) datasets come from a series of textual entailment challenges. Data from RTE1, RTE2, RTE3 and …

📊 2 results
📏 Metrics: Test Accuracy

SGD

The Schema-Guided Dialogue (SGD) dataset consists of over 20k annotated multi-domain, task-oriented conversations between a human and a virtual assistant. …

📊 1 results
📏 Metrics: F1 (Seqeval)

SHD - Adding

This dataset is based on the Spiking Heidelberg Digits (SHD) dataset. Sample inputs consist of two spike encoded digits sampled …

📊 3 results
📏 Metrics: Accuracy (%)

SPOTS-10

The SPOTS-10 dataset is an extensive collection of grayscale images showcasing diverse patterns found in ten animal species. Specifically, SPOTS-10 …

📊 9 results
📏 Metrics: Accuracy

SST-2

The Stanford Sentiment Treebank is a corpus with fully labeled parse trees that allows for a complete analysis of the …

📊 2 results
📏 Metrics: Test Accuracy

Sentiment140

Sentiment140 is a dataset that allows you to discover the sentiment of a brand, product, or topic on Twitter. Source: …

📊 1 results
📏 Metrics: Accuracy

SimGas

This dataset consists of computer-generated images for gas leakage segmentation. It features diverse backgrounds, interfering foreground objects, and precise ground …

📊 1 results
📏 Metrics: Frame Level Accuracy

Sound-based drone fault classification using multitask learning

arXiv: https://arxiv.org/abs/2304.11708. Accepted at the 29th International Congress on Sound and Vibration (ICSV29). The drone has been used for various …

📊 1 results
📏 Metrics: macro f1 score (A(100), B(100), C(100) Avg.)

TACM12K

Table-ACM12K (TACM12K) is a relational table dataset derived from the ACM heterogeneous graph dataset. It includes four tables: papers, authors, …

📊 1 results
📏 Metrics: Accuracy

TCGA

📊 1 results
📏 Metrics: AUPRC, AUROC

TLF2K

Table-LastFm2K (TLF2K) is a relational table dataset derived from the classical LastFM2K dataset. It contains three tables: artists, user_artists, and …

📊 1 results
📏 Metrics: Accuracy

TML1M

Table-MovieLens1M (TML1M) is a relational table dataset derived from the classical MovieLens1M dataset. It consists of three tables: users, movies, …

📊 1 results
📏 Metrics: Accuracy

WSC

The Winograd Schema Challenge was introduced both as an alternative to the Turing Test and as a test of a …

📊 2 results
📏 Metrics: Test Accuracy

WiC

WiC is a benchmark for the evaluation of context-sensitive word embeddings. WiC is framed as a binary classification task. Each …

📊 2 results
📏 Metrics: Test Accuracy

XImageNet-12

Enlarges the dataset to understand how image background affects computer vision ML models, with the following topics: Blur Background …

📊 3 results
📏 Metrics: Robustness Score

Fairness

DiveFace

A new face annotation dataset with balanced distribution between genders and ethnic origins. Source: [SensitiveNets: Learning Agnostic Representations with Application …

📊 1 results
📏 Metrics: Degree of Bias (DoB)

MORPH

MORPH is a facial age estimation dataset, which contains 55,134 facial images of 13,617 subjects ranging from 16 to 77 …

📊 1 results
📏 Metrics: Degree of Bias (DoB)

UTKFace

The UTKFace dataset is a large-scale face dataset with long age span (range from 0 to 116 years old). The …

📊 1 results
📏 Metrics: Degree of Bias (DoB)

Handwritten Text Recognition

Belfort

This dataset includes minutes of the Belfort municipal council drawn up between 1790 and 1946. Documents include …

📊 4 results
📏 Metrics: CER (%), WER (%)

Bentham

Bentham manuscripts refers to a large set of documents that were written by the renowned English philosopher and reformer Jeremy …

📊 1 results
📏 Metrics: CER

Digital Peter

Digital Peter is a dataset of Peter the Great's manuscripts annotated for segmentation and text recognition. The dataset may be …

📊 1 results
📏 Metrics: CER

HKR

The database is written in Cyrillic and shares the same 33 characters. Besides these characters, the Kazakh alphabet also contains …

📊 1 results
📏 Metrics: CER

IAM

The IAM database contains 13,353 images of handwritten lines of text created by 657 writers. The texts those writers transcribed …

📊 16 results
📏 Metrics: CER, WER

IAM(line-level)

The IAM database contains 13,353 images of handwritten lines of text created by 657 writers. The texts those writers transcribed …

📊 5 results
📏 Metrics: Test CER, Test WER

LAM(line-level)

Handwritten Text Recognition (HTR) is an open problem at the intersection of Computer Vision and Natural Language Processing. The main …

📊 6 results
📏 Metrics: Test CER, Test WER

READ 2016

This dataset arises from the READ project (Horizon 2020). The dataset consists of a subset of documents from the Ratsprotokolle …

📊 2 results
📏 Metrics: CER (%), WER (%)

READ2016(line-level)

This dataset arises from the READ project (Horizon 2020). The dataset consists of a subset of documents from the Ratsprotokolle …

📊 5 results
📏 Metrics: Test CER, Test WER

SIMARA

We propose a new database for information extraction from historical handwritten documents. The corpus includes 5,393 finding aids …

📊 1 results
📏 Metrics: CER (%), WER (%)

Saint Gall

Saint Gall dataset contains handwritten historical manuscripts written in Latin that date back to the 9th century. It consists of …

📊 1 results
📏 Metrics: CER

Image Classification

AIDER

Dataset aimed to do automated aerial scene classification of disaster events from on-board a UAV. Source: [Deep-Learning-Based Aerial Image Classification …

📊 1 results
📏 Metrics: Test F1 score

AIDERV2

The dataset contains aerial images covering three commonly occurring natural disasters (earthquake/collapsed buildings, flood, wildfire/fire) and a normal class; do …

📊 1 results
📏 Metrics: Test F1 score

AmsterTime

AmsterTime dataset offers a collection of 2,500 well-curated images matching the same scene from a street view matched to historical …

📊 1 results
📏 Metrics: Accuracy

ArtDL

ArtDL is a novel painting data set for iconography classification composed of images collected from online sources. Most of the …

📊 1 results
📏 Metrics: Average Precision, F1

BreakHis

The Breast Cancer Histopathological Image Classification (BreakHis) is composed of 9,109 microscopic images of breast tumor tissue collected from 82 …

📊 2 results
📏 Metrics: Average Test Accuracy over all magnifications

CARS196

CARS196 is composed of 16,185 car images of 196 classes.

📊 1 results
📏 Metrics: Accuracy

CIFAR-10

The CIFAR-10 database (Canadian Institute For Advanced Research database) is a large collection of natural color images. It has a …

📊 241 results
📏 Metrics: Percentage correct, Top-1 Accuracy, Accuracy, Parameters, Top 1 Accuracy, F1, Cross Entropy Loss

CIFAR-100

The CIFAR-100 dataset (Canadian Institute for Advanced Research, 100 classes) is a subset of the Tiny Images dataset and consists …

📊 197 results
📏 Metrics: Percentage correct, PARAMS, Accuracy, Top 1 Accuracy

CINIC-10

CINIC-10 is a dataset for image classification. It has a total of 270,000 images, 4.5 times that of CIFAR-10. It …

📊 9 results
📏 Metrics: Accuracy, FLOPS, PARAMS

CUB-200-2011

The Caltech-UCSD Birds-200-2011 (CUB-200-2011) dataset is the most widely used dataset for fine-grained visual categorization tasks. It contains 11,788 images of …

📊 1 results
📏 Metrics: Accuracy

Caltech-256

Caltech-256 is an object recognition dataset containing 30,607 real-world images, of different sizes, spanning 257 classes (256 object classes and …

📊 4 results
📏 Metrics: Accuracy

Causal3DIdent

Update on 3DIdent, where we introduce six additional object classes (Hare, Dragon, Cow, Armadillo, Horse, and Head), and impose a …

📊 2 results
📏 Metrics: Accuracy

Chaoyang

Chaoyang dataset contains 1111 normal, 842 serrated, 1404 adenocarcinoma, 664 adenoma, and 705 normal, 321 serrated, 840 adenocarcinoma, 273 adenoma …

📊 1 results
📏 Metrics: Accuracy

Clothing1M

Clothing1M contains 1M clothing images in 14 classes. It is a dataset with noisy labels, since the data is collected …

📊 49 results
📏 Metrics: Accuracy

ColonINST-v1 (Seen)

ColonINST is a large-scale instruction tuning dataset designed for multimodal analysis in colonoscopy. This dataset comprises 62 categories, 303,001 colonoscopy …

📊 17 results
📏 Metrics: Accuracy

ColonINST-v1 (Unseen)

ColonINST is a large-scale instruction tuning dataset designed for multimodal analysis in colonoscopy. This dataset comprises 62 categories, 303,001 colonoscopy …

📊 17 results
📏 Metrics: Accuracy

Colored-MNIST(with spurious correlation)

This is a dataset with spurious correlations which can be used to evaluate machine learning methods for out-of-distribution generalization, causal …

📊 6 results
📏 Metrics: Accuracy

DF20

Danish Fungi 2020 (DF20) is a fine-grained dataset and benchmark. The dataset, constructed from observations submitted to the Danish Fungal …

📊 19 results
📏 Metrics: Top-1, Top-3, F1 - macro

DF20 - Mini

Danish Fungi 2020 (DF20) is a novel fine-grained dataset and benchmark. The dataset, constructed from observations submitted to the Danish …

📊 19 results
📏 Metrics: Top-1, Top-3, F1 - macro

DTD

The Describable Textures Dataset (DTD) contains 5640 texture images in the wild. They are annotated with human-centric attributes inspired by …

📊 11 results
📏 Metrics: Accuracy

DVS128 Gesture

Comprises 11 hand gesture categories from 29 subjects under 3 illumination conditions. Source: [A Low Power, Fully Event-Based Gesture Recognition …

📊 1 results
📏 Metrics: Accuracy

ESC-50

The ESC-50 dataset is a labeled collection of 2000 environmental audio recordings suitable for benchmarking methods of environmental sound classification. …

📊 1 results
📏 Metrics: Top 1 Accuracy

EuroSAT

EuroSAT is a dataset and deep learning benchmark for land use and land cover classification. The dataset is based on …

📊 14 results
📏 Metrics: Accuracy (%)

EuroSAT-SAR

A SAR version of the EuroSAT dataset. The images were collected from Sentinel-1 GRD products (two bands VV and VH) …

📊 3 results
📏 Metrics: Overall Accuracy

FEMNIST

See paper: Caldas, Sebastian, et al. "Leaf: A benchmark for federated settings." arXiv preprint arXiv:1812.01097 (2018).

📊 1 results
📏 Metrics: Accuracy

FMD (materials)

Sharan, Lavanya, Ruth Rosenholtz, and Edward Adelson. "Material perception: What can you see in a brief glance?." Journal of Vision …

📊 1 results
📏 Metrics: Accuracy (%)

Fashion-MNIST

Fashion-MNIST is a dataset comprising 28×28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per …

📊 33 results
📏 Metrics: Percentage error, Accuracy, Trainable Parameters, NMI, Power consumption

FlickrLogos-32

Object detection benchmark for logo detection. Images are natural scenes. Each image contains multiple objects, and each image has a …

📊 3 results
📏 Metrics: Accuracy

Food-101

The Food-101 dataset consists of 101 food categories with 750 training and 250 test images per category, making a total …

📊 10 results
📏 Metrics: Accuracy (%)

Food-101N

The Food-101N dataset is introduced in "CleanNet: Transfer Learning for Scalable Image Training with Label Noise" (CVPR'18). It is an …

📊 4 results
📏 Metrics: Accuracy

GTSRB

The German Traffic Sign Recognition Benchmark (GTSRB) contains 43 classes of traffic signs, split into 39,209 training images and 12,630 …

📊 1 results
📏 Metrics: F1

GasHisSDB

Four pathologists from Longhua Hospital, Shanghai University of Traditional Chinese Medicine, provided 600 gastric cancer pathology images at …

📊 8 results
📏 Metrics: Accuracy, Precision, F1-Score

Gaze-CIFAR-10

We construct Gaze-CIFAR-10, a gaze-augmented image dataset based on the standard CIFAR-10 benchmark, enhanced with human eye-tracking annotations collected using …

📊 2 results
📏 Metrics: 1:1 Accuracy

HErlev

📊 1 results
📏 Metrics: Accuracy

Id Pattern Dataset

After defining a taxonomy of the main stone deterioration patterns and anomalies, we selected 354 highly representative images of stone-built …

📊 3 results
📏 Metrics: Percentage correct

ImageNet

The ImageNet dataset contains 14,197,122 annotated images according to the WordNet hierarchy. Since 2010 the dataset has been used in the …

📊 1020 results
📏 Metrics: Top 1 Accuracy, Number of params, GFLOPs, Hardware Burden, Top 5 Accuracy, Operations per network pass

ImageNet-100 (TEMI Split)

This split was introduced in TEMI (BMVC 2023) Adaloglou, Nikolas, Felix Michels, Hamza Kalisch, and Markus Kollmann. "Exploring the Limits …

📊 2 results
📏 Metrics: Percentage correct, Params

ImageNet-32

Imagenet32 is a huge dataset made up of small images called the down-sampled version of Imagenet. Imagenet32 is composed of …

📊 1 results
📏 Metrics: Top 1 Error

ImageNet-64

Imagenet64 is a massive dataset of small images called the down-sampled version of Imagenet. Imagenet64 comprises 1,281,167 training data and …

📊 1 results
📏 Metrics: Top 1 Error

ImageNet-9

ImageNet-9 consists of images with different amounts of background and foreground signal, which you can use to measure the extent …

📊 1 results
📏 Metrics: Top 1 Accuracy

ImageNet-P

ImageNet-P consists of noise, blur, weather, and digital distortions. The dataset has validation perturbations; has difficulty levels; has CIFAR-10, Tiny …

📊 1 results
📏 Metrics: Top 5 Accuracy

ImageNet-Sketch

ImageNet-Sketch data set consists of 50,889 images, approximately 50 images for each of the 1000 ImageNet classes. The data set …

📊 1 results
📏 Metrics: Accuracy

Imagenette

Imagenette is a subset of 10 easily classified classes from Imagenet (bench, English springer, cassette player, chain saw, church, French …

📊 2 results
📏 Metrics: Accuracy

Intel Image Classification

This is image data of natural scenes around the world. The data contains around 25k images of size …

📊 2 results
📏 Metrics: Accuracy

JFT-300M

JFT-300M is an internal Google dataset used for training image classification models. Images are labeled using an algorithm that uses …

📊 4 results
📏 Metrics: prec@1

KMNIST

📊 1 results
📏 Metrics: Accuracy

KTH-TIPS2

The KTH-TIPS (Textures under varying Illumination, Pose and Scale) image database was created to extend the CUReT database in two …

📊 1 results
📏 Metrics: Accuracy (%)

Kuzushiji-MNIST

Kuzushiji-MNIST is a drop-in replacement for the MNIST dataset (28x28 grayscale, 70,000 images). Since MNIST restricts us to 10 classes, …

📊 14 results
📏 Metrics: Accuracy, Error, Trainable Parameters

Kvasir

The KVASIR Dataset was released as part of the medical multimedia challenge presented by MediaEval. It is based on images …

📊 3 results
📏 Metrics: Accuracy, F1

LIMUC

The LIMUC dataset is the largest publicly available labeled ulcerative colitis dataset; it comprises 11,276 images from 564 patients and …

📊 1 results
📏 Metrics: Quadratic Weighted Kappa

LabelMe

LabelMe database is a large collection of images with ground truth labels for object detection and recognition. The annotations come …

📊 1 results
📏 Metrics: Test Accuracy

Large Labelled Logo Dataset (L3D)

It is composed of around 770k color 256x256 RGB images extracted from the European Union Intellectual Property Office (EUIPO) …

📊 2 results
📏 Metrics: Eval F1

LeafNet

The PlantVillage dataset, with over 54,000 images spanning 14 plant species and 26 disease types, has been widely used for …

📊 1 results
📏 Metrics: Accuracy (Top-1)

MAMe

The MAMe dataset contains images of high-resolution and variable shape of artworks from 3 different museums: - The Metropolitan Museum …

📊 4 results
📏 Metrics: Acc

MNIST

The MNIST database (Modified National Institute of Standards and Technology database) is a large collection of handwritten digits. It has …

📊 74 results
📏 Metrics: Percentage error, Accuracy, Trainable Parameters, Cross Entropy Loss, Epochs, Top 1 Accuracy

Malaria Dataset

The dataset contains a total of 27,558 cell images with equal instances of parasitized and uninfected cells. Source: Malaria Dataset

📊 1 results
📏 Metrics: Acc. (test), PARAMS

MultiMNIST

The MultiMNIST dataset is generated from MNIST. The training and tests are generated by overlaying a digit on top of …

📊 1 results
📏 Metrics: Percentage error

N-Caltech 101

The Neuromorphic-Caltech101 (N-Caltech101) dataset is a spiking version of the original frame-based Caltech101 dataset. The original dataset contained both a …

📊 1 results
📏 Metrics: Accuracy

N-MNIST

The Neuromorphic-MNIST (N-MNIST) dataset is a spiking version of the original frame-based MNIST dataset. It consists of the …

📊 4 results
📏 Metrics: Accuracy

NCT-CRC-HE-100K

The NCT-CRC-HE-100K dataset is a set of 100,000 non-overlapping image patches extracted from 86 H&E stained human cancer tissue slides …

📊 1 results
📏 Metrics: F1

ObjectNet

ObjectNet is a test set of images collected directly using crowd-sourcing. ObjectNet is unique as the objects are captured at …

📊 94 results
📏 Metrics: Top-1 Accuracy, Top-5 Accuracy

OmniBenchmark

Omni-Realm Benchmark (OmniBenchmark) is a diverse (21 semantic realm-wise datasets) and concise (realm-wise datasets have no concepts overlapping) benchmark for …

📊 22 results
📏 Metrics: Average Top-1 Accuracy

Oracle-MNIST

We introduce the Oracle-MNIST dataset, comprising 28×28 grayscale images of 30,222 ancient characters from 10 categories, for benchmarking pattern …

📊 4 results
📏 Metrics: Accuracy, Trainable Parameters

Oxford-IIIT Pet Dataset

The Oxford-IIIT Pet Dataset has 37 categories with roughly 200 images for each class. The images have large variations …

📊 3 results
📏 Metrics: Accuracy, PARAMS, FLOPS

Oxford-IIIT Pets

The Oxford-IIIT Pet Dataset is a 37-category pet dataset with roughly 200 images for each class. The images have large …

📊 6 results
📏 Metrics: Accuracy, Per-Class Accuracy

PASCAL VOC 2007

PASCAL VOC 2007 is a dataset for image recognition. The twenty object classes that have been selected are: Person: person …

📊 1 results
📏 Metrics: Accuracy

PRImA

The Prima head pose dataset consists of 2790 images of 15 persons recorded twice. Pitch values lie in the interval …

📊 1 results
📏 Metrics: Percentage correct

Places205

The Places205 dataset is a large-scale scene-centric dataset with 205 common scene categories. The training dataset contains around 2,500,000 images …

📊 15 results
📏 Metrics: Top 1 Accuracy

Places365

The Places365 dataset is a scene recognition dataset. It is composed of 10 million images comprising 434 scene classes. There …

📊 6 results
📏 Metrics: Top 1 Accuracy

PlantDoc

PlantDoc is a dataset for visual plant disease detection. The dataset contains 2,598 data points in total across 13 plant …

📊 1 results
📏 Metrics: PARAMS, Accuracy

PlantVillage

The PlantVillage dataset consists of 54303 healthy and unhealthy leaf images divided into 38 categories by species and disease.

📊 1 results
📏 Metrics: Accuracy, F1, Testing Ratio

QMNIST

The exact pre-processing steps used to construct the MNIST dataset have long been lost. This leaves us with no reliable …

📊 1 results
📏 Metrics: Accuracy (%)

RESISC45

RESISC45 dataset is a dataset for Remote Sensing Image Scene Classification (RESISC). It contains 31,500 RGB images of size 256×256 …

📊 16 results
📏 Metrics: Top 1 Accuracy, F1, zero-shot Acc

Red MiniImageNet 20% label noise

Part of the Controlled Noisy Web Labels Dataset.

📊 5 results
📏 Metrics: Accuracy

Red MiniImageNet 40% label noise

Part of the Controlled Noisy Web Labels Dataset.

📊 5 results
📏 Metrics: Accuracy

Red MiniImageNet 80% label noise

Part of the Controlled Noisy Web Labels Dataset.

📊 5 results
📏 Metrics: Accuracy

SIPaKMeD

📊 1 results
📏 Metrics: Accuracy

STL-10

The STL-10 is an image dataset derived from ImageNet and popularly used to evaluate algorithms of unsupervised feature learning or …

📊 97 results
📏 Metrics: Percentage correct, FLOPS, PARAMS

SUN397

The Scene UNderstanding (SUN) database contains 899 categories and 130,519 images. There are 397 well-sampled categories to evaluate numerous state-of-the-art …

📊 1 results
📏 Metrics: Accuracy

SVHN

Street View House Numbers (SVHN) is a digit classification benchmark dataset that contains 600,000 32×32 RGB images of printed digits …

📊 57 results
📏 Metrics: Percentage error, Percentage correct

So2Sat LCZ42

So2Sat LCZ42 consists of local climate zone (LCZ) labels of about half a million Sentinel-1 and Sentinel-2 image patches in …

📊 1 results
📏 Metrics: Accuracy

Sports10

Games dataset containing 100,000 gameplay images of 175 video games across 10 sports genres: American football, basketball, bike …

📊 1 results
📏 Metrics: Validation Accuracy

Stanford Cars

The Stanford Cars dataset consists of 196 classes of cars with a total of 16,185 images, taken from the rear. …

📊 24 results
📏 Metrics: Accuracy

Stanford Online Products

Stanford Online Products (SOP) dataset has 22,634 classes with 120,053 product images. The first 11,318 classes (59,551 images) are split …

📊 1 results
📏 Metrics: Accuracy

Visual Wake Words

Visual Wake Words represents a common microcontroller vision use-case of identifying whether a person is present in the image or …

📊 4 results
📏 Metrics: Accuracy

VizWiz-Classification

Our goal is to improve upon the status quo for designing image classification models trained in one domain that perform …

📊 1 results
📏 Metrics: Accuracy

WebVision

The WebVision dataset is designed to facilitate the research on learning visual representation from noisy web data. It is a …

📊 2 results
📏 Metrics: Top 1 Accuracy, Top 5 Accuracy

iNaturalist

The iNaturalist 2017 dataset (iNat) contains 675,170 training and validation images from 5,089 natural fine-grained categories. Those categories belong to …

📊 18 results
📏 Metrics: Top 1 Accuracy, Top 5 Accuracy, Top 3 Error, Overall

iWildCam2020-WILDS

The iWildCam2020-WILDS dataset is a variant of the iWildCam 2020 dataset. iWildCam2020-WILDS is a benchmark dataset designed to test OOD …

📊 6 results
📏 Metrics: Accuracy (Top-1)

smallNORB

The smallNORB dataset is a dataset for 3D object recognition from shape. It contains images of 50 toys belonging to …

📊 6 results
📏 Metrics: Classification Error

Model extraction

UML Classes With Specs

Repository for UML-English data. This repository contains the data used for "Extraction of UML Class Diagrams from Natural Language …

📊 1 results
📏 Metrics: Exact Match

Red Teaming

SUDO Dataset

SUDO is a benchmark of 50 real-world malicious tasks designed to evaluate LLM-based computer agents in live desktop and web …

📊 1 results
📏 Metrics: Attack Success Rate

Text Generation

CNN/Daily Mail

CNN/Daily Mail is a dataset for text summarization. Human generated abstractive summary bullets were generated from news stories in CNN …

📊 1 results
📏 Metrics: ROUGE-L

COCO Captions

COCO Captions contains over one and a half million captions describing over 330,000 images. For the training and validation images, …

📊 4 results
📏 Metrics: BLEU-2, BLEU-3, BLEU-4, BLEU-5

CSL

CSL is a synthetic dataset introduced in Murphy et al. (2019) to test the expressivity of GNNs. In particular, graphs …

📊 1 results
📏 Metrics: ROUGE-L

CommonGen

CommonGen is constructed through a combination of crowdsourced and existing caption corpora, and consists of 79k commonsense descriptions over 35k unique …

📊 4 results
📏 Metrics: CIDEr, METEOR, BLEU-4, SPICE

Czech restaurant information

Czech restaurant information is a dataset for NLG in task-oriented spoken dialogue systems with Czech as the target language. It …

📊 3 results
📏 Metrics: METEOR

DART

DART is a large dataset for open-domain structured data record to text generation. DART consists of 82,191 examples across different …

📊 3 results
📏 Metrics: BLEU, METEOR, FactSpotter

DailyDialog

DailyDialog is a high-quality multi-turn open-domain English dialog dataset. It contains 13,118 dialogues split into a training set with 11,118 …

📊 1 results
📏 Metrics: BLEU-1, BLEU-2, BLEU-3, BLEU-4

HarmfulQA

As a part of our research efforts toward making LLMs safer for public …

📊 1 results
📏 Metrics: ASR

LCSTS

LCSTS is a large Chinese short text summarization corpus constructed from the Chinese microblogging website Sina Weibo, which …

📊 1 results
📏 Metrics: ROUGE-L

OpenWebText

OpenWebText is an open-source recreation of the WebText corpus. The text is web content extracted from URLs shared on Reddit …

📊 2 results
📏 Metrics: eval_loss

ROCStories

ROCStories is a collection of commonsense short stories. The corpus consists of 100,000 five-sentence stories. Each story logically follows everyday …

📊 4 results
📏 Metrics: BLEU-1, Perplexity

ReDial

ReDial (Recommendation Dialogues) is an annotated dataset of dialogues, where users recommend movies to each other. The dataset consists of …

📊 4 results
📏 Metrics: Distinct-3, Distinct-4, Distinct-2, Perplexity

SciQ

The SciQ dataset contains 13,679 crowdsourced science exam questions about Physics, Chemistry and Biology, among others. The questions are in …

📊 3 results
📏 Metrics: Accuracy