Machine Learning Benchmarks

Browse 286 benchmarks across 30 tasks
Browse by Category

10-shot image generation

FQL-Driving

FQL-driving

📊 1 results
📏 Metrics: 0-shot MRR

FlyingThings3D

FlyingThings3D is a synthetic dataset for optical flow, disparity and scene flow estimation. It consists of everyday objects flying along …

📊 1 results
📏 Metrics: 0..5sec

MEAD

Multi-view Emotional Audio-visual Dataset

📊 1 results
📏 Metrics: 12k

Music21

Music21 is an untrimmed video dataset crawled by keyword query from Youtube. It contains music performances belonging to 21 categories. …

📊 1 results
📏 Metrics: 0..5sec

3D Generation

E.T. the Exceptional Trajectories

📊 4 results
📏 Metrics: FD_ClaTr, ClaTr-Score, Classifier-F1

3D Interacting Hand Pose Estimation

InterHand2.6M

The InterHand2.6M dataset is a large-scale real-captured dataset with accurate GT 3D interacting hand poses, used for 3D hand pose …

📊 8 results
📏 Metrics: MPJPE Test, MPVPE Test, MRRPE Test

Ancestor-descendant prediction

WN18RR

WN18RR is a link prediction dataset created from WN18, which is a subset of WordNet. WN18 consists of 18 relations …

📊 1 results
📏 Metrics: mAP-0%, mAP-50%, mAP-100%

Anomaly Detection

ADNI

Alzheimer's Disease Neuroimaging Initiative (ADNI) is a multisite study that aims to improve clinical trials for the prevention and treatment …

📊 1 results
📏 Metrics: AUC

AG News

AG News (AG’s News Corpus) is a subdataset of AG's corpus of news articles constructed by assembling titles and description …

📊 1 results
📏 Metrics: AUROC

BTAD

The BTAD (beanTech Anomaly Detection) dataset is a real-world industrial anomaly dataset. The dataset contains a total of 2830 …

📊 12 results
📏 Metrics: Detection AUROC, Segmentation AUROC, Segmentation AP, Segmentation AUPRO

CIFAR-10

The CIFAR-10 database (Canadian Institute For Advanced Research database) is a large collection of natural color images. It has a …

📊 1 results
📏 Metrics: Mean AUC

COCO-OOC

COCO-OOC goes beyond standard object detection to ask the question: Which objects are out-of-context (OOC)? Given an image with a …

📊 1 results
📏 Metrics: AUC

CUHK Avenue

Avenue Dataset contains 16 training and 21 testing video clips. The videos are captured in CUHK campus avenue with 30652 …

📊 29 results
📏 Metrics: AUC, RBDC, TBDC, FPS

DIOR

📊 4 results
📏 Metrics: ROC AUC

Fashion-MNIST

Fashion-MNIST is a dataset comprising 28×28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per …

📊 10 results
📏 Metrics: ROC AUC

Fishyscapes

Fishyscapes is a public benchmark for uncertainty estimation in a real-world task of semantic segmentation for urban driving. It evaluates …

📊 8 results
📏 Metrics: AP, FPR95

Forest CoverType

Predicting forest cover type from cartographic variables only (no remotely sensed data). The actual forest cover type for a given …

📊 1 results
📏 Metrics: AUC

Hyper-Kvasir Dataset

The HyperKvasir dataset contains 110,079 images and 374 videos capturing anatomical landmarks as well as pathological and normal findings. A total …

📊 5 results
📏 Metrics: AUC

IITB Corridor

An abnormal activity dataset for research use that contains 483,566 annotated frames. Source: [Multi-timescale Trajectory Prediction for Abnormal Human Activity …

📊 1 results
📏 Metrics: AUC

ITDD

The Industrial Textile Defect Detection (ITDD) dataset includes 1885 industrial textile images categorized into 4 categories: cotton fabric, dyed fabric, …

📊 1 results
📏 Metrics: Detection AUROC, Segmentation AUROC

InsPLAD

InsPLAD is a Dataset for Power Line Asset Inspection containing 10,607 high-resolution Unmanned Aerial Vehicles colour images. It contains 17 …

📊 4 results
📏 Metrics: Detection AUROC

KDD Cup 1999

This is the data set used for The Third International Knowledge Discovery and Data Mining Tools Competition, which was held …

📊 1 results
📏 Metrics: F1-Score

Kaggle-Credit Card Fraud Dataset

The dataset contains transactions made by credit cards in September 2013 by European cardholders. This dataset presents transactions that occurred …

📊 1 results
📏 Metrics: AUC

LAG

Includes 5,824 fundus images labeled with either positive glaucoma (2,392) or negative glaucoma (3,432). Source: [Attention Based Glaucoma Detection: A …

📊 4 results
📏 Metrics: AUC

Lost and Found

Lost and Found is a novel lost-cargo image sequence dataset comprising more than two thousand frames with pixelwise annotations of …

📊 4 results
📏 Metrics: AP, FPR

MIT-BIH Arrhythmia Database

The MIT-BIH Arrhythmia Database contains 48 half-hour excerpts of two-channel ambulatory ECG recordings, obtained from 47 subjects studied by the …

📊 1 results
📏 Metrics: F1 score

MNIST

The MNIST database (Modified National Institute of Standards and Technology database) is a large collection of handwritten digits. It has …

📊 5 results
📏 Metrics: ROC AUC

MPDD

MPDD is a dataset aimed at benchmarking visual defect detection methods in industrial metal parts manufacturing. It consists of more …

📊 14 results
📏 Metrics: Detection AUROC, Segmentation AUROC, Segmentation AUPRO

MVTEC 3D-AD

MVTec 3D Anomaly Detection Dataset (MVTec 3D-AD) is a comprehensive 3D dataset for the task of unsupervised anomaly detection and …

📊 2 results
📏 Metrics: Segmentation AUPRO, Detection AUROC, Segmentation AUROC

MVTec LOCO AD

MVTec Logical Constraints Anomaly Detection (MVTec LOCO AD) dataset is intended for the evaluation of unsupervised anomaly localization algorithms. The …

📊 35 results
📏 Metrics: Avg. Detection AUROC, Detection AUROC (only logical), Detection AUROC (only structural), Segmentation AU-sPRO (until FPR 5%)

Musk v1

The Musk dataset describes a set of molecules, and the objective is to detect musks from non-musks. This dataset describes …

📊 1 results
📏 Metrics: F1-Score

ODDS

Outliers or anomalies are instances that do not conform to the norm of a dataset. Outlier detection is an important …

📊 3 results
📏 Metrics: AUROC, F1

PAD Dataset

Multi-pose Anomaly Detection (MAD) dataset, which represents the first attempt to evaluate the performance of pose-agnostic anomaly detection. The MAD …

📊 2 results
📏 Metrics: Detection AUROC, Segmentation AUROC

Road Anomaly

This dataset contains images of unusual dangers which can be encountered by a vehicle on the road – animals, rocks, …

📊 9 results
📏 Metrics: AP, FPR95

SMD

A dataset for time-series anomaly detection.

📊 1 results
📏 Metrics: Recall, precision, F1, F1-score

SVHN

Street View House Numbers (SVHN) is a digit classification benchmark dataset that contains 600,000 32×32 RGB images of printed digits …

📊 1 results
📏 Metrics: Mean AUC

ShanghaiTech

The Shanghaitech dataset is a large-scale crowd counting dataset. It consists of 1198 annotated crowd images. The dataset is divided …

📊 28 results
📏 Metrics: AUC, RBDC, TBDC

ShanghaiTech Campus

The ShanghaiTech Campus dataset has 13 scenes with complex light conditions and camera angles. It contains 130 abnormal events and …

📊 1 results
📏 Metrics: AUC-ROC

Street Scene

Street Scene is a dataset for video anomaly detection. Street Scene consists of 46 training and 35 testing high resolution …

📊 1 results
📏 Metrics: AUC, RBDC, TBDC

TII-SSRC-23

The TII-SSRC-23 dataset offers a comprehensive collection of network traffic patterns, meticulously compiled to support the development and research of …

📊 1 results
📏 Metrics: AUC

Thyroid

Thyroid is a dataset for detection of thyroid diseases, in which patients diagnosed with hypothyroid or subnormal are anomalies against …

📊 2 results
📏 Metrics: AUC, Average Precision, F1-Score

UBnormal

UBnormal is a new supervised open-set benchmark composed of multiple virtual scenes for video anomaly detection. Unlike existing data sets, …

📊 13 results
📏 Metrics: AUC, RBDC, TBDC

UCF-Crime

The UCF-Crime dataset is a large-scale dataset of 128 hours of videos. It consists of 1900 long and untrimmed real-world …

📊 1 results
📏 Metrics: AUC

UCR Anomaly Archive

The UCR Anomaly Archive is a collection of 250 uni-variate time series collected in human medicine, biology, meteorology and industry. …

📊 24 results
📏 Metrics: Average F1, AUC ROC

UCSD Ped2

The UCSD Anomaly Detection Dataset was acquired with a stationary camera mounted at an elevation, overlooking pedestrian walkways. The crowd …

📊 9 results
📏 Metrics: AUC, FPS

UEA time-series datasets

Five datasets used in the NeurTraL-AD paper. RacketSports (RS): accelerometer and gyroscope recordings of players playing four different racket sports. Each …

📊 3 results
📏 Metrics: Avg. ROC-AUC

Vehicle Claims

The code to create the dataset is available here. The dataset used in the paper is available on github - …

📊 2 results
📏 Metrics: AUC

VisA

The VisA dataset contains 12 subsets corresponding to 12 different objects as shown in the above figure. There are 10,821 …

📊 45 results
📏 Metrics: Detection AUROC, Segmentation AUPRO (until 30% FPR), F1-Score, Segmentation AUPRO, Segmentation AUROC

WFDD

WFDD is a dataset for benchmarking anomaly detection methods with a focus on textile inspection. It includes 4101 woven fabric …

📊 1 results
📏 Metrics: Detection AUROC, Segmentation AUPRO, Segmentation AUROC

voraus-AD

voraus-AD contains machine data of a collaborative robot, which moves a can by performing an industrial pick-and-place task. The samples …

📊 3 results
📏 Metrics: Avg. Detection AUROC

Atomic number classification

CHILI-100K

The CHILI-100K dataset is a large-scale graph dataset (with overall >183M nodes, >1.2B edges) of nanomaterials generated from experimentally determined …

📊 9 results
📏 Metrics: F1-score (Weighted)

CHILI-3K

The CHILI-3K dataset is a medium-scale graph dataset (with overall >6M nodes, >49M edges) of mono-metallic oxide nanomaterials generated from …

📊 9 results
📏 Metrics: F1-score (Weighted)

Classification

Adult

Data Set Information: Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records …

📊 1 results
📏 Metrics: AUROC

BIOSCAN_1M_Insect Dataset

In an effort to catalog insect biodiversity, we propose a new large dataset of hand-labelled insect images, the BIOSCAN-1M Insect …

📊 2 results
📏 Metrics: Macro F1

BiasBios

The purpose of this dataset was to study gender bias in occupations. Online biographies, written in English, were collected to …

📊 1 results
📏 Metrics: 1:1 Accuracy

BoolQ

BoolQ is a question answering dataset for yes/no questions containing 15942 examples. These questions are naturally occurring – they are …

📊 2 results
📏 Metrics: Test Accuracy

Brain Tumor MRI Dataset

This dataset is a combination of the following three datasets: figshare, SARTAJ dataset, and Br35H. This dataset contains 7022 …

📊 1 results
📏 Metrics: F1 score

CIFAKE: Real and AI-Generated Synthetic Images

The quality of AI-generated images has rapidly increased, leading to concerns of authenticity and trustworthiness. CIFAKE is a dataset that …

📊 1 results
📏 Metrics: Validation Accuracy

CIFAR-100

The CIFAR-100 dataset (Canadian Institute for Advanced Research, 100 classes) is a subset of the Tiny Images dataset and consists …

📊 1 results
📏 Metrics: Accuracy

CIFAR-10C

Common corruptions dataset for CIFAR10

📊 1 results
📏 Metrics: Accuracy on Brightness Corrupted Images

COVID-19 Image Data Collection

Contains hundreds of frontal view X-rays and is the largest public resource for COVID-19 image and prognostic data, making it …

📊 1 results
📏 Metrics: Accuracy

CWRU Bearing Dataset

Data was collected for normal bearings, single-point drive end and fan end defects. Data was collected at 12,000 samples/second and …

📊 1 results
📏 Metrics: 10 fold Cross validation

Chest X-Ray Images (Pneumonia)

The normal chest X-ray (left panel) depicts clear lungs without any areas of abnormal opacification in the image. Bacterial pneumonia …

📊 1 results
📏 Metrics: Accuracy

ForgeryNet

We construct the ForgeryNet dataset, an extremely large face forgery dataset with unified annotations in image- and video-level data across …

📊 3 results
📏 Metrics: AUC, Accuracy

Full-body Parkinson’s disease dataset

A public data set of walking full-body kinematics and kinetics in individuals with Parkinson’s disease

📊 7 results
📏 Metrics: F1-score (weighted)

HOWS

HOWS-CL-25 (Household Objects Within Simulation dataset for Continual Learning) is a synthetic dataset especially designed for object classification on mobile …

📊 1 results
📏 Metrics: Overall accuracy after last sequence

HRF

The HRF dataset is a dataset for retinal vessel segmentation which comprises 45 images and is organized as 15 subsets. …

📊 1 results
📏 Metrics: Accuracy

IRFL: Image Recognition of Figurative Language

The IRFL dataset consists of idioms, similes, and metaphors with matching figurative and literal images, as well as two novel …

📊 1 results
📏 Metrics: 1-of-100 Accuracy

ISIC 2019

The goal for ISIC 2019 is to classify dermoscopic images among nine different diagnostic categories. 25,331 images are available for training across …

📊 1 results
📏 Metrics: Balanced Multi-Class Accuracy

ImageNet C-OOD (class-out-of-distribution)

This dataset was presented as part of the ICLR 2023 paper "A framework for benchmarking Class-out-of-distribution detection and its application …

📊 5 results
📏 Metrics: Detection AUROC (severity 0), Detection AUROC (severity 5), Detection AUROC (severity 10)

InDL

In this work, we introduce the In-Diagram Logic (InDL) dataset, an innovative resource crafted to rigorously evaluate the …

📊 9 results
📏 Metrics: Average Recall

LES-AV

This data set comprises 22 fundus images with their corresponding manual annotations for the blood vessels, separated as arteries and …

📊 1 results
📏 Metrics: Accuracy

Liver-US

The Liver-US dataset is a comprehensive collection of high-quality ultrasound images of the liver, including both normal and abnormal cases. …

📊 1 results
📏 Metrics: AUC

MHIST

The minimalist histopathology image analysis dataset (MHIST) is a binary classification dataset of 3,152 fixed-size images of colorectal polyps, each …

📊 6 results
📏 Metrics: Accuracy

MedSecId

The process by which sections in a document are demarcated and labeled is known as section identification. Such sections are …

📊 1 results
📏 Metrics: 1 shot Micro-F1

MixedWM38

MixedWM38 Dataset (WaferMap) has more than 38,000 wafer maps, including 1 normal pattern, 8 single defect patterns, and 29 mixed defect …

📊 1 results
📏 Metrics: Accuracy, MCC

MuReD Dataset

Early detection of retinal diseases is one of the most important means of preventing partial or permanent blindness in patients. …

📊 1 results
📏 Metrics: ML F1, ML mAP, ML AUC

N-CARS

A large real-world event-based dataset for object classification. Source: HATS: Histograms of Averaged Time Surfaces for Robust Event-based Object Classification

📊 6 results
📏 Metrics: Accuracy (%), Architecture, Representation, Representation Time (ms / 100 ms events), Inference Time, Params (M)

N-ImageNet

The N-ImageNet dataset is an event-camera counterpart for the ImageNet dataset. The dataset is obtained by moving an event camera …

📊 9 results
📏 Metrics: Accuracy (%)

RITE

The RITE (Retinal Images vessel Tree Extraction) is a database that enables comparative studies on segmentation or classification of arteries …

📊 1 results
📏 Metrics: Accuracy

RSSCN7

The RSSCN7 dataset contains satellite images acquired from Google Earth, originally collected for remote sensing scene classification. We …

📊 1 results
📏 Metrics: 1:1 Accuracy

RTE

The Recognizing Textual Entailment (RTE) datasets come from a series of textual entailment challenges. Data from RTE1, RTE2, RTE3 and …

📊 2 results
📏 Metrics: Test Accuracy

SGD

The Schema-Guided Dialogue (SGD) dataset consists of over 20k annotated multi-domain, task-oriented conversations between a human and a virtual assistant. …

📊 1 results
📏 Metrics: F1 (Seqeval)

SHD - Adding

This dataset is based on the Spiking Heidelberg Digits (SHD) dataset. Sample inputs consist of two spike encoded digits sampled …

📊 3 results
📏 Metrics: Accuracy (%)

SPOT-10

The SPOTS-10 dataset is an extensive collection of grayscale images showcasing diverse patterns found in ten animal species. Specifically, SPOTS-10 …

📊 9 results
📏 Metrics: Accuracy

SST-2

The Stanford Sentiment Treebank is a corpus with fully labeled parse trees that allows for a complete analysis of the …

📊 2 results
📏 Metrics: Test Accuracy

Sentiment140

Sentiment140 is a dataset that allows you to discover the sentiment of a brand, product, or topic on Twitter. Source: …

📊 1 results
📏 Metrics: Accuracy

SimGas

This dataset consists of computer-generated images for gas leakage segmentation. It features diverse backgrounds, interfering foreground objects, and precise ground …

📊 1 results
📏 Metrics: Frame Level Accuracy

Sound-based drone fault classification using multitask learning

arXiv: https://arxiv.org/abs/2304.11708. Accepted at the 29th International Congress on Sound and Vibration (ICSV29). The drone has been used for various …

📊 1 results
📏 Metrics: macro f1 score (A(100), B(100), C(100) Avg.)

TACM12K

Table-ACM12K (TACM12K) is a relational table dataset derived from the ACM heterogeneous graph dataset. It includes four tables: papers, authors, …

📊 1 results
📏 Metrics: Accuracy

TCGA

📊 1 results
📏 Metrics: AUPRC, AUROC

TLF2K

Table-LastFm2K (TLF2K) is a relational table dataset derived from the classical LastFM2K dataset. It contains three tables: artists, user_artists, and …

📊 1 results
📏 Metrics: Accuracy

TML1M

Table-MovieLens1M (TML1M) is a relational table dataset derived from the classical MovieLens1M dataset. It consists of three tables: users, movies, …

📊 1 results
📏 Metrics: Accuracy

WSC

The Winograd Schema Challenge was introduced both as an alternative to the Turing Test and as a test of a …

📊 2 results
📏 Metrics: Test Accuracy

WiC

WiC is a benchmark for the evaluation of context-sensitive word embeddings. WiC is framed as a binary classification task. Each …

📊 2 results
📏 Metrics: Test Accuracy

XImageNet-12

Enlarge the dataset to understand how image background affects the computer vision ML model. With the following topics: Blur Background …

📊 3 results
📏 Metrics: Robustness Score

Community Detection

Citeseer

The CiteSeer dataset consists of 3312 scientific publications classified into one of six classes. The citation network consists of 4732 …

📊 1 results
📏 Metrics: ACC, NMI

Cora

The Cora dataset consists of 2708 scientific publications classified into one of seven classes. The citation network consists of 5429 …

📊 1 results
📏 Metrics: NMI, ACC

DBLP

The DBLP is a citation network dataset. The citation data is extracted from DBLP, ACM, MAG (Microsoft Academic Graph), and …

📊 1 results
📏 Metrics: F1-Score

Pubmed

The PubMed dataset consists of 19717 scientific publications from PubMed database pertaining to diabetes classified into one of three classes. …

📊 1 results
📏 Metrics: ACC, NMI

Distance regression

CHILI-100K

The CHILI-100K dataset is a large-scale graph dataset (with overall >183M nodes, >1.2B edges) of nanomaterials generated from experimentally determined …

📊 8 results
📏 Metrics: MSE

CHILI-3K

The CHILI-3K dataset is a medium-scale graph dataset (with overall >6M nodes, >49M edges) of mono-metallic oxide nanomaterials generated from …

📊 8 results
📏 Metrics: MSE

Graph Classification

ADNI

Alzheimer's Disease Neuroimaging Initiative (ADNI) is a multisite study that aims to improve clinical trials for the prevention and treatment …

📊 1 results
📏 Metrics: Accuracy

AIDS

AIDS is a graph dataset. It consists of 2000 graphs representing molecular compounds which are constructed from the AIDS Antiviral …

📊 2 results
📏 Metrics: Accuracy, Inference Time (ms)

CIFAR-10

The CIFAR-10 database (Canadian Institute For Advanced Research database) is a large collection of natural color images. It has a …

📊 1 results
📏 Metrics: Accuracy

COLLAB

COLLAB is a scientific collaboration dataset. A graph corresponds to a researcher’s ego network, i.e., the researcher and its collaborators …

📊 35 results
📏 Metrics: Accuracy, Accuracy (10-fold)

CSL

CSL is a synthetic dataset introduced in Murphy et al. (2019) to test the expressivity of GNNs. In particular, graphs …

📊 1 results
📏 Metrics: Acc

Citeseer

The CiteSeer dataset consists of 3312 scientific publications classified into one of six classes. The citation network consists of 4732 …

📊 1 results
📏 Metrics: Accuracy

Cora

The Cora dataset consists of 2708 scientific publications classified into one of seven classes. The citation network consists of 5429 …

📊 1 results
📏 Metrics: Accuracy

Digits

The DIGITS dataset consists of 1797 8×8 grayscale images (1439 for training and 360 for testing) of handwritten digits. Source: …

📊 1 results
📏 Metrics: Accuracy

ENZYMES

ENZYMES is a dataset of 600 protein tertiary structures obtained from the BRENDA enzyme database. The ENZYMES dataset contains 6 …

📊 49 results
📏 Metrics: Accuracy, Accuracy (10-fold)

HCP Aging

Lifespan HCP Release 2.0 includes cross-sectional visit 1 (V1) preprocessed structural and functional imaging data, unprocessed V1 imaging data for …

📊 1 results
📏 Metrics: Accuracy

IMDB-BINARY

IMDB-BINARY is a movie collaboration dataset that consists of the ego-networks of 1,000 actors/actresses who played roles in movies in …

📊 8 results
📏 Metrics: Accuracy, Accuracy (10-fold)

IMDB-MULTI

IMDB-MULTI is a relational dataset that consists of a network of 1000 actors or actresses who played roles in movies …

📊 1 results
📏 Metrics: Accuracy, Accuracy (10-fold)

IPC-grounded

📊 2 results
📏 Metrics: Accuracy

MNIST

The MNIST database (Modified National Institute of Standards and Technology database) is a large collection of handwritten digits. It has …

📊 13 results
📏 Metrics: Accuracy

MUTAG

In particular, MUTAG is a collection of nitroaromatic compounds and the goal is to predict their mutagenicity on Salmonella typhimurium. …

📊 62 results
📏 Metrics: Accuracy, Accuracy (10-fold), Mean Accuracy, Accuracy (10 fold)

MUV

The Maximum Unbiased Validation (MUV) dataset is a benchmark dataset selected from PubChem BioAssay. It was created by applying a …

📊 2 results
📏 Metrics: ROC-AUC

Mutagenicity

Mutagenicity is a chemical compound dataset of drugs, which can be categorized into two classes: mutagen and non-mutagen. Source: [Hierarchical …

📊 5 results
📏 Metrics: Accuracy

NCI1

The NCI1 dataset comes from the cheminformatics domain, where each input graph is used as representation of a chemical compound: …

📊 59 results
📏 Metrics: Accuracy, Accuracy (10-fold)

NCI109

TUDataset: A collection of benchmark datasets for learning with graphs

📊 33 results
📏 Metrics: Accuracy

OASIS

A dataset for single-image 3D in the wild consisting of annotations of detailed 3D geometry for 140,000 images. Source: [OASIS: …

📊 1 results
📏 Metrics: Accuracy

PROTEINS

PROTEINS is a dataset of proteins that are classified as enzymes or non-enzymes. Nodes represent the amino acids and two …

📊 88 results
📏 Metrics: Accuracy, Accuracy (10 fold), Inference Time (ms)

PTC

PTC is a collection of 344 chemical compounds represented as graphs which report the carcinogenicity for rats. There are 19 …

📊 34 results
📏 Metrics: Accuracy

Pubmed

The PubMed dataset consists of 19717 scientific publications from PubMed database pertaining to diabetes classified into one of three classes. …

📊 1 results
📏 Metrics: Test Accuracy

REDDIT-12K

Reddit12k contains 11929 graphs each corresponding to an online discussion thread where nodes represent users, and an edge represents the …

📊 1 results
📏 Metrics: Accuracy (10 fold)

REDDIT-BINARY

REDDIT-BINARY consists of graphs corresponding to online discussions on Reddit. In each graph, nodes represent users, and there is an …

📊 9 results
📏 Metrics: Accuracy, Accuracy (10-fold)

SIDER

SIDER contains information on marketed medicines and their recorded adverse drug reactions. The information is extracted from public documents and …

📊 2 results
📏 Metrics: ROC-AUC

Synthetic Dynamic Networks

This dataset accompanies the paper "Learning the mechanisms of network growth" by the same authors. The dataset contains 6733 networks …

📊 3 results
📏 Metrics: Accuracy

Tox21

The Tox21 data set comprises 12,060 training samples and 647 test samples that represent chemical compounds. There are 801 "dense …

📊 3 results
📏 Metrics: ROC-AUC

UK Biobank Brain MRI

UK Biobank participants have generously provided a very wide range of information about their health and well-being since recruitment began …

📊 1 results
📏 Metrics: Accuracy

UPFD-GOS

The Gossipcop variant of the UPFD dataset for benchmarking. Please refer to the UPFD dataset for more details of the …

📊 8 results
📏 Metrics: Accuracy (%)

UPFD-POL

The PolitiFact variant of the UPFD dataset for benchmarking. Please refer to the UPFD dataset for more details of the …

📊 8 results
📏 Metrics: Accuracy (%)

Wine

These data are the results of a chemical analysis of wines grown in the same region in Italy but derived …

📊 1 results
📏 Metrics: Accuracy

clintox

The ClinTox dataset compares drugs approved by the FDA and drugs that have failed clinical trials for toxicity reasons. The …

📊 2 results
📏 Metrics: ROC-AUC

Graph Clustering

Citeseer

The CiteSeer dataset consists of 3312 scientific publications classified into one of six classes. The citation network consists of 4732 …

📊 8 results
📏 Metrics: ACC, NMI, ARI, F1, Precision, F score

Cora

The Cora dataset consists of 2708 scientific publications classified into one of seven classes. The citation network consists of 5429 …

📊 8 results
📏 Metrics: ACC, NMI, ARI, F1, Precision, F score

Pubmed

The PubMed dataset consists of 19717 scientific publications from PubMed database pertaining to diabetes classified into one of three classes. …

📊 6 results
📏 Metrics: ACC, NMI, ARI, F score

Graph Matching

PASCAL VOC

The PASCAL Visual Object Classes (VOC) 2012 dataset contains 20 object categories including vehicles, household, animals, and other: aeroplane, bicycle, …

📊 13 results
📏 Metrics: F1 score, matching accuracy

RARE

RARE consists of English AMR pairs with similarity scores that reflect the structural differences between them. Given that AMRs are …

📊 4 results
📏 Metrics: Spearman Correlation

SPair-71k

SPair-71k contains 70,958 image pairs with diverse variations in viewpoint and scale. Compared to previous datasets, it is significantly larger …

📊 6 results
📏 Metrics: matching accuracy

Graph Property Prediction

QM9

QM9 provides quantum chemical properties (at DFT level) for a relevant, consistent, and comprehensive chemical space of small organic molecules. …

📊 5 results
📏 Metrics: Standardized MAE, logMAE, alpha (ma), gap (meV)

Graph Question Answering

GQA

The GQA dataset is a large-scale visual question answering dataset with real images from the Visual Genome dataset and balanced …

📊 1 results
📏 Metrics: Accuracy

Graph Ranking

ZINC

ZINC is a free database of commercially-available compounds for virtual screening. ZINC contains over 230 million purchasable compounds in ready-to-dock, …

📊 4 results
📏 Metrics: Kendall's Tau

Graph Regression

GlassTemp

The GlassTemp dataset is collected from Polyinfo. It uses monomers as polymer graphs to predict the property of glass transition …

📊 1 results
📏 Metrics: RMSE

PCQM4Mv2-LSC

PCQM4Mv2 is a quantum chemistry dataset originally curated under the PubChemQC project. Based on the PubChemQC, we define a meaningful …

📊 20 results
📏 Metrics: Validation MAE, Test MAE

QM9

QM9 provides quantum chemical properties (at DFT level) for a relevant, consistent, and comprehensive chemical space of small organic molecules. …

📊 1 results
📏 Metrics: Inference Time (ms)

ZINC

ZINC is a free database of commercially-available compounds for virtual screening. ZINC contains over 230 million purchasable compounds in ready-to-dock, …

📊 25 results
📏 Metrics: MAE

Graph Representation Learning

COMA

CoMA contains 17,794 meshes of the human face in various expressions. Source: DEMEA: Deep Mesh Autoencoders for Non-Rigidly Deforming Objects

📊 1 results
📏 Metrics: Error (mm)

Hand Pose Estimation

3DPW

The 3D Poses in the Wild dataset is the first dataset in the wild with accurate 3D poses for evaluation. …

📊 1 results
📏 Metrics: MPJPE

COCO-WholeBody

COCO-WholeBody is an extension of COCO dataset with whole-body annotations. There are 4 types of bounding boxes (person box, face …

📊 2 results
📏 Metrics: keypoint AP

Custom FINNgers

A dataset with 3200 images (200 for each number quantity on each hand).

📊 1 results
📏 Metrics: 1:1 Accuracy

ICVL

📊 1 results
📏 Metrics: Error (mm)

K2HPD

Includes 100K depth images under challenging scenarios. Source: Human Pose Estimation from Depth Images via Inference Embedded Multi-task Learning

📊 1 results
📏 Metrics: PDJ@5mm

Image Paragraph Captioning

Image Paragraph Captioning

The Image Paragraph Captioning dataset allows researchers to benchmark their progress in generating paragraphs that tell a story about an …

📊 4 results
📏 Metrics: BLEU-4, METEOR, CIDEr

Initial Structure to Relaxed Energy (IS2RE)

OC20

Open Catalyst 2020 is a dataset for catalysis in chemical engineering. Focusing on molecules that are important in renewable energy …

📊 4 results
📏 Metrics: Energy MAE

Knowledge Graph Completion

DBP-5L (English)

DBP-5L is a multilingual KG dataset containing 5 KGs in English, French, Japanese, Greek, and Spanish. The dataset is used …

📊 3 results
📏 Metrics: MRR

DBP-5L (Greek)

DBP-5L is a multilingual KG dataset containing 5 KGs in English, French, Japanese, Greek, and Spanish. The dataset is used …

📊 3 results
📏 Metrics: MRR

DBP-5L (French)

DBP-5L is a multilingual KG dataset containing 5 KGs in English, French, Japanese, Greek, and Spanish. The dataset is used …

📊 3 results
📏 Metrics: MRR

FB15k-237

FB15k-237 is a link prediction dataset created from FB15k. While FB15k consists of 1,345 relations, 14,951 entities, and 592,213 triples, …

📊 3 results
📏 Metrics: Hits@10, Hits@1, Hits@3, MR, MRR

WN18RR

WN18RR is a link prediction dataset created from WN18, which is a subset of WordNet. WN18 consists of 18 relations …

📊 2 results
📏 Metrics: Hits@3, Hits@1, Hits@10

Knowledge Graphs

JerichoWorld

JerichoWorld is a dataset that enables the creation of learning agents that can build knowledge graph-based world models of interactive …

📊 5 results
📏 Metrics: Set accuracy

MARS (Multimodal Analogical Reasoning dataSet)

Analogical reasoning is fundamental to human cognition and holds an important place in various fields. However, previous studies mainly focus …

📊 8 results
📏 Metrics: MRR

Link Prediction

ACM

The ACM dataset contains papers published in KDD, SIGMOD, SIGCOMM, MobiCOMM, and VLDB and are divided into three classes (Database, …

📊 1 results
📏 Metrics: AP, AUC

AbstRCT - Neoplasm

The AbstRCT dataset consists of randomized controlled trials retrieved from the MEDLINE database via PubMed search. The trials are annotated …

📊 1 results
📏 Metrics: F1

Aristo-v4

The Aristo Tuple KB contains a collection of high-precision, domain-targeted (subject,relation,object) tuples extracted from text using a high-precision extraction pipeline, …

📊 1 results
📏 Metrics: Hits@1, Hits@10, Hits@3, MRR

CDCP

The Cornell eRulemaking Corpus – CDCP is an argument mining corpus annotated with argumentative structure information capturing the evaluability of …

📊 1 results
📏 Metrics: F1

COLLAB

COLLAB is a scientific collaboration dataset. A graph corresponds to a researcher’s ego network, i.e., the researcher and its collaborators …

📊 1 results
📏 Metrics: Hits

Citeseer

The CiteSeer dataset consists of 3312 scientific publications classified into one of six classes. The citation network consists of 4732 …

📊 12 results
📏 Metrics: AUC, AP, Accuracy, ACC

CoDEx Large

CoDEx comprises a set of knowledge graph completion datasets extracted from Wikidata and Wikipedia that improve upon existing knowledge graph …

📊 6 results
📏 Metrics: MRR, Hits@1, Hits@3, Hits@10

CoDEx Medium

CoDEx comprises a set of knowledge graph completion datasets extracted from Wikidata and Wikipedia that improve upon existing knowledge graph …

📊 7 results
📏 Metrics: MRR, Hits@1, Hits@3, Hits@10

CoDEx Small

CoDEx comprises a set of knowledge graph completion datasets extracted from Wikidata and Wikipedia that improve upon existing knowledge graph …

📊 6 results
📏 Metrics: MRR, Hits@1, Hits@3, Hits@10

Cora

The Cora dataset consists of 2708 scientific publications classified into one of seven classes. The citation network consists of 5429 …

📊 11 results
📏 Metrics: AUC, AP, Accuracy, ACC

DBLP

DBLP is a citation network dataset. The citation data is extracted from DBLP, ACM, MAG (Microsoft Academic Graph), and …

📊 3 results
📏 Metrics: AUC, AP

DRI Corpus

The Dr. Inventor Multi-Layer Scientific Corpus (DRI Corpus) includes 40 Computer Graphics papers, selected by domain experts. Each paper of …

📊 1 results
📏 Metrics: F1

Decagon

Bio-decagon is a dataset for the polypharmacy side effect identification problem, framed as a multirelational link prediction problem in a two-layer …

📊 2 results
📏 Metrics: AUROC, AUPRC, mAP@50

Douban

We release Douban Conversation Corpus, comprising a training data set, a development set and a test set for retrieval based …

📊 2 results
📏 Metrics: AUC

FB122

📊 4 results
📏 Metrics: HITS@3, Hits@5, Hits@10, MRR

FB15k

The FB15k dataset contains knowledge base relation triples and textual mentions of Freebase entity pairs. It has a total of …

📊 10 results
📏 Metrics: MRR, Hits@1, Hits@3, Hits@10, MR, MRR raw, Hits@5

FB15k-237

FB15k-237 is a link prediction dataset created from FB15k. While FB15k consists of 1,345 relations, 14,951 entities, and 592,213 triples, …

📊 70 results
📏 Metrics: Hits@1, Hits@3, Hits@10, MRR, MR, training time (s), Hit@1, Hit@10

GDELT

The GDELT Project monitors the world by analyzing global news from various sources. Here are …

📊 10 results
📏 Metrics: MRR

GO21

GO21 is a biomedical knowledge graph that models genes, proteins, drugs, and the hierarchy of the biological processes they participate …

📊 1 results
📏 Metrics: Hit@1, Hits@10, Hits@3, MRR

KG20C

KG20C is a knowledge graph about high-quality papers from 20 top computer science conferences. It can serve as a …

📊 1 results
📏 Metrics: MRR, Hits@1, Hits@3, Hits@10

NELL-995

NELL-995 KG Completion Dataset

📊 3 results
📏 Metrics: Hits@1, Hits@10, MRR, Mean AP, HITS@3

PPI

protein roles—in terms of their cellular functions from gene ontology—in various protein-protein interaction (PPI) graphs, with each graph corresponding to …

📊 1 results
📏 Metrics: AP, AUC, Accuracy

Pubmed

The PubMed dataset consists of 19717 scientific publications from PubMed database pertaining to diabetes classified into one of three classes. …

📊 13 results
📏 Metrics: AUC, AP, Accuracy, ACC

SINS

SINS is a database of continuous real-life audio recordings in a home environment. The home is a vacation home and …

📊 1 results
📏 Metrics: Scaled time-delay embeddings

TSP/HCP Benchmark set

This is a benchmark set for the traveling salesman problem (TSP) with characteristics that are different from the existing benchmark sets. …

📊 4 results
📏 Metrics: F1

UMLS

The Unified Medical Language System (UMLS) is a comprehensive resource that integrates and disseminates essential terminology, classification standards, and coding …

📊 9 results
📏 Metrics: Hits@10, MR

WN18

The WN18 dataset has 18 relations scraped from WordNet for roughly 41,000 synsets, resulting in 141,442 triplets. It was found …

📊 33 results
📏 Metrics: Hits@10, Hits@3, Hits@1, MRR, MR, training time (s)

WN18RR

WN18RR is a link prediction dataset created from WN18, which is a subset of WordNet. WN18 consists of 18 relations …

📊 69 results
📏 Metrics: Hits@10, Hits@3, Hits@1, MRR, MR

Wiki

📊 1 results
📏 Metrics: AUC

Wikidata5M

Wikidata5m is a million-scale knowledge graph dataset with an aligned corpus. This dataset integrates the Wikidata knowledge graph and Wikipedia pages. …

📊 12 results
📏 Metrics: MRR, Hits@10, Hits@1, Hits@3

YAGO3-10

YAGO3-10 is a benchmark dataset for knowledge base completion. It is a subset of YAGO3 (which itself is an extension of …

📊 17 results
📏 Metrics: Hits@1, Hits@3, Hits@10, MRR, MR

Yelp

The Yelp Dataset is a valuable resource for academic research, teaching, and learning. It provides a rich collection of real-world …

📊 8 results
📏 Metrics: HR@10, AUC, nDCG@10

Link Sign Prediction

Epinions

The Epinions dataset is built from a who-trust-whom online social network of a general consumer review site Epinions.com. Members of …

📊 1 results
📏 Metrics: AUC, Accuracy, Macro-F1

Slashdot

The Slashdot dataset is a relational dataset obtained from Slashdot. Slashdot is a technology-related news website known for its specific …

📊 1 results
📏 Metrics: AUC, Accuracy, Macro-F1

Molecular Property Prediction

MUV

The Maximum Unbiased Validation (MUV) dataset is a benchmark dataset selected from PubChem BioAssay. It was created by applying a …

📊 2 results
📏 Metrics: ROC-AUC

MoleculeNet

MoleculeNet is a large scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and …

📊 5 results
📏 Metrics: AUC

PCBA

PCBA dataset 11 is a collection of high-quality dose-response data, formulated as a multitask learning benchmark from 128 high-throughput screening …

📊 1 results
📏 Metrics: ROC-AUC

QM7

QM7 dataset is a subset of the GDB-13 database. GDB-13 contains nearly 1 billion stable and synthetically accessible organic molecules. …

📊 7 results
📏 Metrics: MAE

QM8

QM8 dataset is a collection of molecular data used for studying quantum mechanical calculations of electronic spectra and excited state …

📊 7 results
📏 Metrics: MAE

QM9

QM9 provides quantum chemical properties (at DFT level) for a relevant, consistent, and comprehensive chemical space of small organic molecules. …

📊 7 results
📏 Metrics: MAE

SIDER

SIDER contains information on marketed medicines and their recorded adverse drug reactions. The information is extracted from public documents and …

📊 16 results
📏 Metrics: ROC-AUC

Tox21

The Tox21 data set comprises 12,060 training samples and 647 test samples that represent chemical compounds. There are 801 "dense …

📊 17 results
📏 Metrics: ROC-AUC

clintox

The ClinTox dataset compares drugs approved by the FDA and drugs that have failed clinical trials for toxicity reasons. The …

📊 18 results
📏 Metrics: ROC-AUC, Molecules (M)

Node Classification

AMZ Computers

AMZ Computers is a co-purchase graph extracted from Amazon, where nodes represent products, edges represent the co-purchased relations of products, …

📊 5 results
📏 Metrics: Accuracy

AVA

AVA is a project that provides audiovisual annotations of video for improving our understanding of human activity. Each of the …

📊 4 results
📏 Metrics: mAP

Amazon Photo

📊 10 results
📏 Metrics: Accuracy

Amazon-Fraud

Amazon-Fraud is a multi-relational graph dataset built upon the Amazon review dataset, which can be used in evaluating graph-based node …

📊 3 results
📏 Metrics: AUC-ROC

Brazil Air-Traffic

📊 7 results
📏 Metrics: Accuracy

CLUSTER

CLUSTER is a node classification task generated with Stochastic Block Models, which are widely used to model communities in social …

📊 12 results
📏 Metrics: Accuracy

CellTypeGraph Benchmark

Classifying all cells in an organ is a relevant and difficult problem from plant developmental biology. We here abstract the …

📊 1 results
📏 Metrics: Top-1 accuracy, class-average Accuracy

Chameleon (48%/32%/20% fixed splits)

Node classification on Chameleon with the fixed 48%/32%/20% splits provided by Geom-GCN.

📊 4 results
📏 Metrics: Accuracy

Chameleon (60%/20%/20% random splits)

Node classification on Chameleon with 60%/20%/20% random splits for training/validation/test.

📊 2 results
📏 Metrics: Accuracy

Citeseer

The CiteSeer dataset consists of 3312 scientific publications classified into one of six classes. The citation network consists of 4732 …

📊 65 results
📏 Metrics: Accuracy, Training Split, Validation, 1:1 Accuracy, Accuracy (%), Inference Time (ms)

Citeseer (48%/32%/20% fixed splits)

Node classification on Citeseer with the fixed 48%/32%/20% splits provided by Geom-GCN.

📊 26 results
📏 Metrics: 1:1 Accuracy, Accuracy

Cora

The Cora dataset consists of 2708 scientific publications classified into one of seven classes. The citation network consists of 5429 …

📊 69 results
📏 Metrics: Accuracy, Training Split, Validation, 1:1 Accuracy, Inference Time (ms)

Cora (48%/32%/20% fixed splits)

Node classification on Cora with the fixed 48%/32%/20% splits provided by Geom-GCN.

📊 26 results
📏 Metrics: 1:1 Accuracy, Accuracy

Cornell

📊 58 results
📏 Metrics: Accuracy, Accuracy (%)

Cornell (48%/32%/20% fixed splits)

Node classification on Cornell with the fixed 48%/32%/20% splits provided by Geom-GCN.

📊 3 results
📏 Metrics: Accuracy

Cornell (60%/20%/20% random splits)

Node classification on Cornell with 60%/20%/20% random splits for training/validation/test.

📊 36 results
📏 Metrics: 1:1 Accuracy

DBLP

DBLP is a citation network dataset. The citation data is extracted from DBLP, ACM, MAG (Microsoft Academic Graph), and …

📊 6 results
📏 Metrics: Accuracy, Micro F1, Inference Time (ms), Macro F1

Film (60%/20%/20% random splits)

Node classification on Film with 60%/20%/20% random splits for training/validation/test.

📊 35 results
📏 Metrics: 1:1 Accuracy

Film (48%/32%/20% fixed splits)

Node classification on Film with the fixed 48%/32%/20% splits provided by Geom-GCN.

📊 1 results
📏 Metrics: Accuracy

MUTAG

In particular, MUTAG is a collection of nitroaromatic compounds and the goal is to predict their mutagenicity on Salmonella typhimurium. …

📊 4 results
📏 Metrics: Accuracy

MuMiN-large

This is the large version of the MuMiN dataset.

📊 4 results
📏 Metrics: Claim Classification Macro-F1, Tweet Classification Macro-F1

MuMiN-medium

This is the medium version of the MuMiN dataset.

📊 4 results
📏 Metrics: Claim Classification Macro-F1, Tweet Classification Macro-F1

MuMiN-small

This is the small version of the MuMiN dataset.

📊 4 results
📏 Metrics: Claim Classification Macro-F1, Tweet Classification Macro-F1

NELL

NELL is a dataset built from the Web via an intelligent agent called Never-Ending Language Learner. This agent attempts to …

📊 4 results
📏 Metrics: Accuracy

PATTERN

PATTERN is a node classification task generated with Stochastic Block Models, which are widely used to model communities in social …

📊 11 results
📏 Metrics: Accuracy

PPI

protein roles—in terms of their cellular functions from gene ontology—in various protein-protein interaction (PPI) graphs, with each graph corresponding to …

📊 23 results
📏 Metrics: F1, Micro-F1, Micro F1, Macro-F1

Penn94

Node classification on Penn94

📊 31 results
📏 Metrics: Accuracy

Placenta

Placenta is a benchmark dataset for node classification in an underexplored domain: predicting microanatomical tissue structures from cell graphs in …

📊 5 results
📏 Metrics: Accuracy (%)

PubMed (48%/32%/20% fixed splits)

Node classification on PubMed with the fixed 48%/32%/20% splits provided by Geom-GCN.

📊 26 results
📏 Metrics: 1:1 Accuracy, Accuracy

PubMed (60%/20%/20% random splits)

Node classification on PubMed with 60%/20%/20% random splits for training/validation/test.

📊 35 results
📏 Metrics: 1:1 Accuracy

Pubmed

The PubMed dataset consists of 19717 scientific publications from PubMed database pertaining to diabetes classified into one of three classes. …

📊 63 results
📏 Metrics: Accuracy, Training Split, F1, Validation, Accuracy (%), F1-Score, Inference Time (ms)

Reddit

The Reddit dataset is a graph dataset from Reddit posts made in the month of September, 2014. The node label …

📊 15 results
📏 Metrics: Accuracy, Micro-F1

Squirrel (48%/32%/20% fixed splits)

Node classification on Squirrel with the fixed 48%/32%/20% splits provided by Geom-GCN.

📊 4 results
📏 Metrics: Accuracy

Squirrel (60%/20%/20% random splits)

Node classification on Squirrel with 60%/20%/20% random splits for training/validation/test.

📊 36 results
📏 Metrics: 1:1 Accuracy

Texas (48%/32%/20% fixed splits)

Node classification on Texas with the fixed 48%/32%/20% splits provided by Geom-GCN.

📊 2 results
📏 Metrics: Accuracy

USA Air-Traffic

Leonardo Filipe Rodrigues Ribeiro, Pedro H. P. Saverese, and Daniel R. Figueiredo. struc2vec: Learning node representations from structural identity.

📊 7 results
📏 Metrics: Accuracy

Wiki

📊 1 results
📏 Metrics: AUC, Macro F1, Micro F1

Wiki-CS

Wiki-CS is a Wikipedia-based dataset for benchmarking Graph Neural Networks. The dataset is constructed from Wikipedia categories, specifically 10 classes …

📊 6 results
📏 Metrics: Accuracy

Wisconsin (48%/32%/20% fixed splits)

Node classification on Wisconsin with the fixed 48%/32%/20% splits provided by Geom-GCN.

📊 2 results
📏 Metrics: Accuracy

Yelp-Fraud

Yelp-Fraud is a multi-relational graph dataset built upon the Yelp spam review dataset, which can be used in evaluating graph-based …

📊 5 results
📏 Metrics: AUC-ROC

amazon-ratings

amazon-ratings is a product co-purchasing network based on data from SNAP datasets

📊 4 results
📏 Metrics: Accuracy (%)

genius

Node classification on genius

📊 25 results
📏 Metrics: Accuracy, 1:1 Accuracy

minesweeper

minesweeper is a synthetic graph emulating the eponymous game.

📊 4 results
📏 Metrics: AUCROC

questions

Questions is an interaction graph of users of a question-answering website based on data provided by Yandex Q.

📊 3 results
📏 Metrics: AUCROC

roman-empire

Roman-empire is a word dependency graph based on the Roman Empire article from the English Wikipedia.

📊 7 results
📏 Metrics: Accuracy (%)

tolokers

Tolokers is a network of crowdsourcing platform workers based on data provided by Toloka.

📊 4 results
📏 Metrics: AUCROC

twitch-gamers

Node classification on twitch-gamers

📊 1 results
📏 Metrics: Accuracy

Outlier Detection

ECG5000

The original dataset for "ECG5000" is a 20-hour long ECG downloaded from Physionet. The name is BIDMC Congestive Heart Failure …

📊 2 results
📏 Metrics: Accuracy

Fashion-MNIST

Fashion-MNIST is a dataset comprising 28×28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per …

📊 1 results
📏 Metrics: AUROC

SKAB

SKAB is designed for evaluating algorithms for anomaly detection. The benchmark currently includes 30+ datasets plus Python modules for algorithms’ …

📊 1 results
📏 Metrics: Average F1

Point Cloud Classification

PointCloud-C

PointCloud-C is the very first test-suite for point cloud robustness analysis under corruptions. - Two sets: ModelNet-C for point cloud …

📊 23 results
📏 Metrics: mean Corruption Error (mCE)

Recommendation Systems

Amazon Beauty

This dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links …

📊 5 results
📏 Metrics: Hit@10, nDCG@10, NDCG

Amazon Fashion

This dataset is a subset of the Amazon reviews dataset that contains fashion-related products.

📊 4 results
📏 Metrics: HitRatio@10 (100 Neg. Samples), nDCG@10 (100 Neg. Samples), AUC, nDCG@10 (500 Neg. Samples), Hit@10, NDCG

Amazon Men

This dataset is a subset of the Amazon reviews dataset that contains men's products.

📊 3 results
📏 Metrics: Hit@10, nDCG@10, NDCG

Amazon Product Data

This dataset contains product reviews and metadata from Amazon, including 142.8 million reviews spanning May 1996 - July 2014. This …

📊 1 results
📏 Metrics: AUC, F1

Amazon-Book

📊 15 results
📏 Metrics: nDCG@20, Recall@20, HR@10, NDCG@10, HR@50, NDCG@50

Ciao

The Ciao dataset contains ratings given by users to items, and also contains item category information. The data comes …

📊 1 results
📏 Metrics: Hits@10, Hits@20, nDCG@10, nDCG@20

Delicious

Delicious: This data set contains tagged web pages retrieved from the website delicious.com. Source: [Text segmentation on multilabel documents: …

📊 1 results
📏 Metrics: NDCG, Recall@20

Douban

We release Douban Conversation Corpus, comprising a training data set, a development set and a test set for retrieval based …

📊 5 results
📏 Metrics: RMSE, NDCG, Recall@20, AUC, HR@10, HR@100, PSP@10, nDCG@10, nDCG@100

Epinions

The Epinions dataset is built form a who-trust-whom online social network of a general consumer review site Epinions.com. Members of …

📊 4 results
📏 Metrics: MAE, RMSE, MAP@20, MRR@20, NDCG@20

Gowalla

Gowalla is a location-based social networking website where users share their locations by checking-in. The friendship network is undirected and …

📊 13 results
📏 Metrics: nDCG@20, Recall@20, HR@10, HR@100, PSP@10, nDCG@10, nDCG@100

Pinterest

The Pinterest dataset contains more than 1 million images associated with Pinterest users who have “pinned” them. Source: https://openaccess.thecvf.com/content_iccv_2015/papers/Geng_Learning_Image_and_ICCV_2015_paper.pdf

📊 1 results
📏 Metrics: nDCG@10, Hits@10, Hits@20, nDCG@20

PixelRec

An image cover dataset for short-video recommendation.

📊 1 results
📏 Metrics: Hit@10

Polyvore

This dataset contains 21,889 outfits from polyvore.com, in which 17,316 are for training, 1,497 for validation and 3,076 for testing. …

📊 3 results
📏 Metrics: AUC, Accuracy

ReDial

ReDial (Recommendation Dialogues) is an annotated dataset of dialogues, where users recommend movies to each other. The dataset consists of …

📊 7 results
📏 Metrics: Recall@1, Recall@10, Recall@50

WeChat

The WeChat dataset for fake news detection contains more than 20k news labelled as fake news or not.

📊 2 results
📏 Metrics: AUC, P@10

Yelp

The Yelp Dataset is a valuable resource for academic research, teaching, and learning. It provides a rich collection of real-world …

📊 2 results
📏 Metrics: NDCG, NDCG@20, Recall@20

Yelp2018

The Yelp2018 dataset is adopted from the 2018 edition of the Yelp challenge, wherein local businesses like restaurants and bars …

📊 11 results
📏 Metrics: NDCG@20, Recall@20, HR@10, HR@100, PSP@10, nDCG@10, nDCG@100

Unsupervised Anomaly Detection

AnoShift

AnoShift is a large-scale anomaly detection benchmark, which focuses on splitting the test data based on its temporal distance to …

📊 15 results
📏 Metrics: ROC-AUC FAR, ROC-AUC IID, ROC-AUC NEAR, ROC-AUC-ID (In-Distribution setup)

Caltech-101

The Caltech101 dataset contains images from 101 object categories (e.g., “helicopter”, “elephant” and “chair”) and a background category that …

📊 1 results
📏 Metrics: AUC (outlier ratio = 0.5)

DAGM2007

This is a synthetic dataset for defect detection on textured surfaces. It was originally created for a competition at the …

📊 1 results
📏 Metrics: Detection AUROC

Fashion-MNIST

Fashion-MNIST is a dataset comprising 28×28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per …

📊 1 results
📏 Metrics: AUC (outlier ratio = 0.5)

KolektorSDD

The dataset is constructed from images of defective production items that were provided and annotated by Kolektor Group d.o.o. The …

📊 1 results
📏 Metrics: Segmentation AUROC

KolektorSDD2

KolektorSDD2 is a surface-defect detection dataset with over 3000 images containing several types of defects, obtained while addressing a real-world …

📊 3 results
📏 Metrics: Segmentation AP, Segmentation AUROC, Detection AP, Segmentation AUPRO

PRONTO

The PRONTO heterogeneous benchmark dataset is based on an industrial-scale multiphase flow facility. It includes data from heterogeneous sources, including …

📊 1 results
📏 Metrics: AUC, Best Delay, Best F1, F1

Reuters-21578

The Reuters-21578 dataset is a collection of documents with news articles. The original corpus has 10,369 documents and a vocabulary …

📊 1 results
📏 Metrics: AUC (outlier ratio = 0.5)

SMAP

Soil Moisture Active Passive (SMAP) dataset is a dataset of soil samples and telemetry information using the Mars rover by …

📊 7 results
📏 Metrics: F1, Precision, Recall, AUC

SMD

A dataset for time-series anomaly detection.

📊 1 results
📏 Metrics: Precision

TIMo

TIMo (Time-of-Flight Indoor Monitoring) is a dataset of infrared and depth videos intended for use in Anomaly Detection and …

📊 1 results
📏 Metrics: AUROC

Vehicle Claims

The code to create the dataset is available here. The dataset used in the paper is available on GitHub - …

📊 9 results
📏 Metrics: AUC