16k

ConceptNet

ConceptNet is a knowledge graph that connects words and phrases of natural language with labeled edges. Its knowledge is collected …

UAVDB is a high-resolution RGB video dataset meticulously designed for UAV detection tasks across diverse scales and complex backgrounds. Comprising …

A real-world dataset with a hyper-accurate digital counterpart and comprehensive ground-truth annotation. Dataset content: 200 sequences (~400 mins), 398 …

📊 1 results

📏 Metrics: Accuracy, Completeness, Precision

Aria Synthetic Environments is a large-scale, fully simulated dataset created by …

📊 1 results

📏 Metrics: Accuracy, Completeness, Precision, Recall

DTU

DTU MVS 2014 is a multi-view stereo dataset, which is an order of magnitude larger in number of scenes and …

📊 20 results

📏 Metrics: Overall, Acc, Comp

Scan2CAD

Scan2CAD is an alignment dataset based on 1506 ScanNet scans with 97607 annotated keypoint pairs between 14225 (3049 unique) CAD …

📊 2 results

📏 Metrics: Average Accuracy

ScanNet

ScanNet is an instance-level indoor RGB-D dataset that includes both 2D and 3D data. It is a collection of labeled …

📊 1 results

📏 Metrics: 3DIoU, Chamfer Distance, L1

ShapeNet

ShapeNet is a large scale repository for 3D CAD models developed by researchers from Stanford University, Princeton University and the …

WFDD is a dataset for benchmarking anomaly detection methods with a focus on textile inspection. It includes 4101 woven fabric …

📊 1 results

📏 Metrics: Detection AUROC, Segmentation AUPRO, Segmentation AUROC

voraus-AD

voraus-AD contains machine data of a collaborative robot, which moves a can by performing an industrial pick-and-place task. The samples …

📊 3 results

📏 Metrics: Avg. Detection AUROC

AutoML

Wine

These data are the results of a chemical analysis of wines grown in the same region in Italy but derived …

📊 1 results

📏 Metrics: accuracy

BIG-bench Machine Learning

BIG-bench

The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their …

📊 1 results

📏 Metrics: Accuracy

Chatbot

AlpacaEval

The AlpacaEval set contains 805 instructions from self-instruct, open-assistant, vicuna, koala, hh-rlhf. Those were selected so that the AlpacaEval ranking …

Table-ACM12K (TACM12K) is a relational table dataset derived from the ACM heterogeneous graph dataset. It includes four tables: papers, authors, …

📊 1 results

📏 Metrics: Accuracy

97 synthetic datasets

97 synthetic datasets consists of 97 datasets (as illustrated in the figure) and can be used to test graph-based clustering …

📊 1 results

📏 Metrics: HIT-THE-BEST, Rank difference

Fashion-MNIST

Fashion-MNIST is a dataset comprising 28×28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per …

📊 6 results

📏 Metrics: ARI, F1-score, NMI

MNIST

The MNIST database (Modified National Institute of Standards and Technology database) is a large collection of handwritten digits. It has …

📊 6 results

📏 Metrics: ARI, F1-score, NMI

Olivetti face

This dataset contains a set of face images taken between April 1992 and April 1994 at AT&T Laboratories Cambridge.

Permuted MNIST is an MNIST variant that consists of 70,000 images of handwritten digits from 0 to 9, where 60,000 …

📊 3 results

📏 Metrics: Average Accuracy, MLP Hidden Layers-width, Pretrained/Transfer Learning, BWT

Continual Pretraining

AG News

AG News (AG’s News Corpus) is a subdataset of AG's corpus of news articles constructed by assembling titles and description …

📊 1 results

📏 Metrics: F1 - macro

SciERC

The SciERC dataset is a collection of 500 scientific abstracts annotated with scientific entities, their relations, and coreference clusters. The abstracts …

📊 1 results

📏 Metrics: F1 (macro)

Contrastive Learning

10,000 People - Human Pose Recognition Data

10,000 People - Human Pose Recognition Data. This dataset includes indoor and outdoor scenes. This dataset covers males and females. …

CIFAR-10

The CIFAR-10 database (Canadian Institute For Advanced Research database) is a large collection of natural color images. It has a …

📊 5 results

📏 Metrics: Percentage error

ImageNet

The ImageNet dataset contains 14,197,122 annotated images according to the WordNet hierarchy. Since 2010, the dataset has been used in the …

📊 17 results

📏 Metrics: Accuracy (%)

Decision Making

NASA C-MAPSS

Engine degradation simulation was carried out using C-MAPSS. Four different sets were simulated under different combinations of operational conditions and …

CIFAR-10

The CIFAR-10 database (Canadian Institute For Advanced Research database) is a large collection of natural color images. It has a …

📊 15 results

📏 Metrics: NLL (bits/dim), Log-likelihood (nats)

Caltech-101

The Caltech101 dataset contains images from 101 object categories (e.g., “helicopter”, “elephant” and “chair” etc.) and a background category that …

📊 3 results

📏 Metrics: Negative ELBO, NLL, MMD-L2, COV-L2

MNIST

The MNIST database (Modified National Institute of Standards and Technology database) is a large collection of handwritten digits. It has …

📊 6 results

📏 Metrics: NLL (bits/dim), Log-likelihood (nats), MMD-L2, COV-L2, NLL

Dimensionality Reduction

EMNIST

EMNIST (extended MNIST) has 4 times more data than MNIST. It is a set of handwritten digits with a 28 …

📊 2 results

📏 Metrics: Classification Accuracy

Domain Adaptation

General Classification

📏 Metrics: Per-Class Accuracy (1-shot), Per-Class Accuracy (2-shots), Per-Class Accuracy (5-shots), Per-Class Accuracy (10-shots), Per-Class Accuracy (20-shots)

SUN

When glancing at a magazine, or browsing the Internet, we are continuously being exposed to photographs. Despite this overflow …

📊 5 results

📏 Metrics: Per-Class Accuracy (1-shot), Per-Class Accuracy (2-shots), Per-Class Accuracy (5-shots), Per-Class Accuracy (10-shots)

Incremental Learning

MLT17

📊 1 results

📏 Metrics: Acc

Inductive logic programming

RuDaS

Logical rules are a popular knowledge representation language in many domains. Recently, neural networks have been proposed to support the …

📊 4 results

📏 Metrics: H-Score, R-Score

Interpretable Machine Learning

CUB-200-2011

The Caltech-UCSD Birds-200-2011 (CUB-200-2011) dataset is the most widely used dataset for fine-grained visual categorization tasks. It contains 11,788 images of …

📊 2 results

📏 Metrics: Top 1 Accuracy

Logical Reasoning

LingOly

This dataset is a benchmark for complex reasoning abilities in large language models, drawing on United Kingdom Linguistics Olympiad problems …

📊 11 results

📏 Metrics: Delta_NoContext, Exact Match Accuracy

RuWorldTree

RuWorldTree is a QA dataset with multiple-choice elementary-level science questions, which evaluate the understanding of core science facts. Motivation: The …

📊 4 results

📏 Metrics: Accuracy

Winograd Automatic

The Winograd schema challenge composes tasks with syntactic ambiguity, which can be resolved with logic and reasoning. Motivation: The dataset …

📊 4 results

📏 Metrics: Accuracy

Long-tail Learning

COCO-MLT

COCO-MLT is created from MS COCO-2017, containing 1,909 images from 80 classes. The maximum number of training images per class …

📊 11 results

📏 Metrics: Average mAP

EGTEA

Extended GTEA Gaze+ (EGTEA Gaze+) is a large-scale dataset for FPV actions and gaze. It subsumes GTEA Gaze+ and comes …

📊 3 results

📏 Metrics: Average Precision, Average Recall

ImageNet-LT

ImageNet Long-Tailed is a subset of the ImageNet dataset consisting of 115.8K images from 1000 categories, with maximally 1280 images per …

📊 65 results

📏 Metrics: Top-1 Accuracy

Lot-insts

LoT-insts contains over 25k classes whose frequencies are naturally long-tail distributed. Its test set is drawn from four different subsets: many-, medium-, …

📊 1 results

📏 Metrics: Macro-F1

MIMIC-CXR-LT

MIMIC-CXR-LT. We construct a single-label, long-tailed version of MIMIC-CXR in a similar manner. MIMIC-CXR is a multi-label classification dataset with …

📊 15 results

📏 Metrics: Balanced Accuracy

NIH-CXR-LT

NIH-CXR-LT. NIH ChestXRay14 contains over 100,000 chest X-rays labeled with 14 pathologies, plus a “No Findings” class. We construct a …

📊 15 results

📏 Metrics: Balanced Accuracy

Places-LT

Places-LT has an imbalanced training set with 62,500 images for 365 classes from Places-2. The class frequencies follow a natural …

📊 28 results

📏 Metrics: Top-1 Accuracy, Top 1 Accuracy

VOC-MLT

We construct the long-tailed version of VOC from its 2012 train-val set. It contains 1,142 images from 20 classes, with …

📊 11 results

📏 Metrics: Average mAP

mini-ImageNet-LT

mini-ImageNet was proposed in "Matching Networks for One Shot Learning" for few-shot learning evaluation, in an attempt to have a dataset …

📊 1 results

📏 Metrics: Error Rate

Medical Report Generation

HistGen WSI-Report Dataset

This dataset is composed of 7,753 pairs of whole slide images and their corresponding diagnostic reports, extracted from the TCGA …

📊 1 results

📏 Metrics: BLEU-4

IU X-Ray

IU X-ray (Demner-Fushman et al., 2016) is a set of chest X-ray images paired with their corresponding diagnostic reports. The …

📊 1 results

📏 Metrics: BLEU-4, BLEU-1, BLEU-2, BLEU-3, CIDEr, METEOR, ROUGE

MIMIC-CXR

MIMIC-CXR from Massachusetts Institute of Technology presents 371,920 chest X-rays associated with 227,943 imaging studies from 65,079 patients. The studies …

ImageNet

The ImageNet dataset contains 14,197,122 annotated images according to the WordNet hierarchy. Since 2010, the dataset has been used in the …

📊 12 results

📏 Metrics: Top-1

QNLI

The QNLI (Question-answering NLI) dataset is a Natural Language Inference dataset automatically derived from the Stanford Question Answering Dataset v1.1 …

📊 2 results

📏 Metrics: Accuracy

Model extraction

UML Classes With Specs

clintox

The ClinTox dataset compares drugs approved by the FDA and drugs that have failed clinical trials for toxicity reasons. The …

📊 18 results

📏 Metrics: ROC-AUC, Molecules (M)

Multi-Label Classification

CheXpert

The CheXpert dataset contains 224,316 chest radiographs of 65,240 patients with both frontal and lateral views available. The task is …

📊 11 results

📏 Metrics: AVERAGE AUC ON 14 LABEL, NUM RADS BELOW CURVE

ChestX-ray14

ChestX-ray14 is a medical imaging dataset which comprises 112,120 frontal-view X-ray images of 30,805 unique patients (collected from the year of 1992 …

📊 4 results

📏 Metrics: Average AUC on 14 label, Macro F1

MIMIC-CXR

MIMIC-CXR from Massachusetts Institute of Technology presents 371,920 chest X-rays associated with 227,943 imaging studies from 65,079 patients. The studies …

📊 1 results

📏 Metrics: Average AUC on 14 label

MLRSNet

MLRSNet is a multi-label high spatial resolution remote sensing dataset for semantic scene understanding. It provides different perspectives of …

📊 2 results

📏 Metrics: F1-score

MRNet

The MRNet dataset consists of 1,370 knee MRI exams performed at Stanford University Medical Center. The dataset contains 1,104 (80.6%) …

📊 1 results

📏 Metrics: Average AUC, AUC on Abnormality (ABN), AUC on ACL Tear (ACL), AUC on Meniscus Tear (MEN), Average Accuracy, Accuracy on Abnormality (ABN), Accuracy on ACL Tear (ACL), Accuracy on Meniscus Tear (MEN)

NUS-WIDE

The NUS-WIDE dataset contains 269,648 images with a total of 5,018 tags collected from Flickr. These images are manually annotated …

📊 9 results

📏 Metrics: MAP

OpenImages-v6

OpenImages V6 is a large-scale dataset consisting of 9 million training images, 41,620 validation samples, and 125,456 test samples. …

📊 4 results

📏 Metrics: mAP

PASCAL VOC 2007

PASCAL VOC 2007 is a dataset for image recognition. The twenty object classes that have been selected are: Person: person …

📊 16 results

📏 Metrics: mAP

Multi-Label Text Classification

📏 Metrics: Error

ChestX-ray14

ChestX-ray14 is a medical imaging dataset which comprises 112,120 frontal-view X-ray images of 30,805 unique patients (collected from the year of 1992 …

📊 1 results

📏 Metrics: delta_m

NYUv2

The NYU-Depth V2 data set is comprised of video sequences from a variety of indoor scenes as recorded by both …

📊 2 results

📏 Metrics: Mean IoU

QM9

QM9 provides quantum chemical properties (at DFT level) for a relevant, consistent, and comprehensive chemical space of small organic molecules. …

📊 2 results

📏 Metrics: ∆m%

UTKFace

The UTKFace dataset is a large-scale face dataset with a long age span (ranging from 0 to 116 years old). The …

📊 1 results

📏 Metrics: delta_m

Multi-agent Reinforcement Learning

SMAC-Exp

The StarCraft Multi-Agent Challenges+ requires agents to learn completion of multi-stage tasks and usage of environmental factors without precise reward …

📊 1 results

📏 Metrics: Median Win Rate

Multiple Instance Learning

📏 Metrics: Accuracy (%), FLOPS, PARAMS

Novel Class Discovery

SVHN

Street View House Numbers (SVHN) is a digit classification benchmark dataset that contains 600,000 32×32 RGB images of printed digits …

📊 1 results

📏 Metrics: Clustering Accuracy

Optical Character Recognition (OCR)

FSNS - Test

Arabic handwriting dataset.

📊 3 results

📏 Metrics: Sequence error

I2L-140K

Introduced by Singh, Sumeet S. “Teaching Machines to Code: Neural Markup Generation with Visual Attention.” ArXiv abs/1802.05415 (2018): n. pag. …

📊 2 results

📏 Metrics: BLEU

VideoDB's OCR Benchmark Public Collection

This dataset leverages VideoDB's Public Collection to offer a diverse range of videos featuring text-containing scenes. It …

📊 5 results

📏 Metrics: Average Accuracy, Character Error Rate (CER), Word Error Rate (WER)

im2latex-100k

A prebuilt dataset for OpenAI's image-2-latex task. Includes a total of ~100k formulas and images split into train, validation …

📊 1 results

📏 Metrics: BLEU

Outlier Detection

ECG5000

The original dataset for "ECG5000" is a 20-hour long ECG downloaded from Physionet. The name is BIDMC Congestive Heart Failure …

📊 2 results

📏 Metrics: Accuracy

Fashion-MNIST

Fashion-MNIST is a dataset comprising 28×28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per …

📊 1 results

📏 Metrics: AUROC

SKAB

SKAB is designed for evaluating algorithms for anomaly detection. The benchmark currently includes 30+ datasets plus Python modules for algorithms’ …

📊 1 results

📏 Metrics: Average F1

Partial Label Learning

ISIC 2019

The goal for ISIC 2019 is to classify dermoscopic images among nine different diagnostic categories. 25,331 images are available for training across …

📊 1 results

📏 Metrics: Balanced Multi-Class Accuracy

M-VAD Names

The dataset contains the annotations of characters' visual appearances, in the form of tracks of face bounding boxes, and the …

The dataset has two years of user awards on a question-answering website: each user received a sequence of badges and …

iris

The Iris flower data set or Fisher's Iris data set is a multivariate data set introduced by the British statistician, …

📊 1 results

📏 Metrics: 10 Images, 4*4 Stitching, Exact Accuracy

Reinforcement Learning (RL)

ProcGen

Procgen Benchmark includes 16 simple-to-use procedurally-generated environments which provide a direct measure of how quickly a reinforcement learning agent learns …

📊 2 results

📏 Metrics: Mean Normalized Performance

Representation Learning

Animals-10

It contains about 28K medium-quality animal images belonging to 10 categories: dog, cat, horse, spider, butterfly, chicken, sheep, cow, …

📊 1 results

📏 Metrics: 1:1 Accuracy

SciDocs

SciDocs evaluation framework consists of a suite of evaluation tasks designed for document-level tasks. Source: Allen Institute for AI

📊 7 results

📏 Metrics: Avg.

Sports10

Games dataset containing 100,000 Gameplay Images of 175 Video Games across 10 Sports Genres - AMERICAN FOOTBALL, BASKETBALL, BIKE …

The ToolLens dataset consists of 18,770 concise yet intentionally multifaceted queries, each associated with 1 to 3 verified tools out …

📊 1 results

📏 Metrics: COMP@

Semantic Similarity

BIOSSES

The BIOSSES data set comprises a total of 100 sentence pairs, all of which were selected from the "[TAC2 Biomedical Summarization Track …

📊 3 results

📏 Metrics: Pearson Correlation

CHIP-STS

CHIP Semantic Textual Similarity, a dataset for sentence similarity in the non-i.i.d. (non-independent and identically distributed) setting, is used for …

📊 1 results

📏 Metrics: Macro F1

SICK

The Sentences Involving Compositional Knowledge (SICK) dataset is a dataset for compositional distributional semantics. It includes a large number of …

📊 5 results

📏 Metrics: MSE, Pearson Correlation, Spearman Correlation

Sparse Learning

ImageNet

The ImageNet dataset contains 14,197,122 annotated images according to the WordNet hierarchy. Since 2010, the dataset has been used in the …

Office-Home

Office-Home is a benchmark dataset for domain adaptation which contains 4 domains where each domain consists of 65 categories. The …

📊 5 results

📏 Metrics: Accuracy

Retinal Fundus MultiDisease Image Dataset (RFMiD)

According to the WHO, World report on vision 2019, the number of visually impaired people worldwide is estimated to be …

📊 1 results

📏 Metrics: AUROC

Two-sample testing

HIGGS Data Set

The data has been produced using Monte Carlo simulations. The first 21 features (columns 2-22) are kinematic properties measured by …

An open-ended VideoQA benchmark that aims to: i) provide a well-defined evaluation by including five correct answer annotations per question …

📊 1 results

📏 Metrics: Accuracy

Zero-shot Generalization

CALVIN

CALVIN (Composing Actions from Language and Vision) is an open-source simulated benchmark to learn long-horizon language-conditioned robot manipulation tasks.

📊 5 results

📏 Metrics: Avg. sequence length

parameter-efficient fine-tuning

BoolQ

BoolQ is a question answering dataset for yes/no questions containing 15942 examples. These questions are naturally occurring – they are …

📊 4 results

📏 Metrics: Accuracy (% )

HellaSwag

HellaSwag is a challenge dataset for evaluating commonsense NLI that is especially hard for state-of-the-art models, though its questions are …

📊 3 results

📏 Metrics: Accuracy (% )

WinoGrande

WinoGrande is a large-scale dataset of 44k problems, inspired by the original WSC design, but adjusted to improve both the …

📊 3 results

📏 Metrics: Accuracy (% )

regression

California Housing Prices

Median house prices for California districts derived from the 1990 census. This is the dataset used in …

📊 3 results

📏 Metrics: R2 Score, lambda

Car_Price_Prediction

In this dataset we added [Company Name, Car Model, Car Type, Fuel Type, Transmission, Engine (cc), Mileage, Kms_driven, Buyers, Horsepower …

📊 1 results

📏 Metrics: R Squared

Concrete Compressive Strength

Concrete is the most important material in civil engineering. The concrete compressive strength is a highly nonlinear function of age …

📊 3 results

📏 Metrics: R2 Score, lambda

Medical Cost Personal Dataset

This dataset contains demographic and personal health information for individuals, along with the corresponding medical insurance charges billed to them. …

📊 3 results

📏 Metrics: R2 Score, lambda
