Machine Learning Benchmarks

Browse 202 benchmarks across 33 tasks

Browse by Category

1 Image, 2*2 Stitchi

FQL-Driving

FQL-driving

📊 1 results

📏 Metrics: 0..5sec

10-shot image generation

FQL-Driving

FQL-driving

📊 1 results

📏 Metrics: 0-shot MRR

FlyingThings3D

FlyingThings3D is a synthetic dataset for optical flow, disparity and scene flow estimation. It consists of everyday objects flying along …

📊 1 results

📏 Metrics: 0..5sec

MEAD

Multi-view Emotional Audio-visual Dataset

📊 1 results

📏 Metrics: 12k

Music21

Music21 is an untrimmed video dataset crawled by keyword query from YouTube. It contains music performances belonging to 21 categories. …

📊 1 results

📏 Metrics: 0..5sec

Action Detection

Charades

The Charades dataset is composed of 9,848 videos of daily indoor activities with an average length of 30 seconds, involving …

📊 15 results

📏 Metrics: mAP

MultiSports

Spatio-temporal action detection is an important and challenging problem in video understanding. The existing action detection benchmarks are limited in …

📊 2 results

📏 Metrics: Frame-mAP 0.5, Video-mAP 0.2, Video-mAP 0.5

MultiTHUMOS

The MultiTHUMOS dataset contains dense, multilabel, frame-level action annotations for 30 hours across 400 videos in the THUMOS'14 action detection …

📊 1 results

📏 Metrics: mAP

TSU

Toyota Smarthome Untrimmed (TSU) is a dataset for activity detection in long untrimmed videos. The dataset contains 536 videos with …

📊 1 results

📏 Metrics: Frame-mAP

TTStroke-21 ME21

This task offers researchers an opportunity to test their fine-grained classification methods for detecting and recognizing strokes in table tennis …

📊 2 results

📏 Metrics: IoU, mAP

TTStroke-21 ME22

TTStroke-21 for MediaEval 2022. The task is of interest to researchers in the areas of machine learning (classification), visual content …

📊 2 results

📏 Metrics: IoU, mAP

UCF Sports

The UCF Sports dataset consists of a set of actions collected from various sports which are typically featured on broadcast …

📊 5 results

📏 Metrics: Frame-mAP 0.5, Video-mAP 0.2, Video-mAP 0.5

UCF101-24

📊 15 results

📏 Metrics: Frame-mAP 0.5, Video-mAP 0.1, Video-mAP 0.2, Video-mAP 0.5

Action Recognition

ActivityNet

The ActivityNet dataset contains 200 different types of activities and a total of 849 hours of videos collected from YouTube. …

📊 16 results

📏 Metrics: mAP

Animal Kingdom

Animal Kingdom is a large and diverse dataset that provides multiple annotated tasks to enable a more thorough understanding of …

📊 2 results

📏 Metrics: mAP

BAR

Biased Action Recognition (BAR) dataset is a real-world image dataset categorized as six action classes which are biased to distinct …

📊 4 results

📏 Metrics: Accuracy

Charades

The Charades dataset is composed of 9,848 videos of daily indoor activities with an average length of 30 seconds, involving …

📊 1 results

📏 Metrics: mAP

Charades-Ego

Contains 68,536 activity instances in 68.8 hours of first and third-person video, making it one of the largest and most …

📊 6 results

📏 Metrics: mAP

DVS128 Gesture

Comprises 11 hand gesture categories from 29 subjects under 3 illumination conditions. Source: [A Low Power, Fully Event-Based Gesture Recognition …

📊 1 results

📏 Metrics: Accuracy (%)

Drone-Action

Website: https://asankagp.github.io/droneaction/

📊 2 results

📏 Metrics: Top-1 Accuracy

EPIC-KITCHENS-100

This paper introduces the pipeline to scale the largest dataset in egocentric vision EPIC-KITCHENS. The effort culminates in EPIC-KITCHENS-100, a …

📊 30 results

📏 Metrics: Action@1, Verb@1, Noun@1, GFLOPs

EPIC-KITCHENS-55

The EPIC-KITCHENS-55 dataset comprises a set of 432 egocentric videos recorded by 32 participants in their kitchens at 60fps with …

📊 1 results

📏 Metrics: Top-1 Accuracy

EgoGesture

The EgoGesture dataset contains 2,081 RGB-D videos, 24,161 gesture samples and 2,953,224 frames from 50 distinct subjects. Source: http://www.nlpr.ia.ac.cn/iva/yfzhang/datasets/egogesture.html Image …

📊 1 results

📏 Metrics: Top-1 Accuracy, Top-5 Accuracy

H2O (2 Hands and Objects)

We present a comprehensive framework for egocentric interaction recognition using markerless 3D annotations of two hands manipulating objects. To this …

📊 10 results

📏 Metrics: Actions Top-1, RGB, Hand Pose, Object Pose, Object Label

HAA500

HAA500 is a manually annotated human-centric atomic action dataset for action recognition on 500 classes with over 591k labeled frames. …

📊 4 results

📏 Metrics: Top-1 (%)

HACS

HACS is a dataset for human action recognition. It uses a taxonomy of 200 action classes, which is identical to …

📊 8 results

📏 Metrics: Top 1 Accuracy, Top 5 Accuracy

HMDB51

The HMDB51 dataset is a large collection of realistic videos from various sources, including movies and web videos. The dataset …

📊 1 results

📏 Metrics: Accuracy

IndustReal

IndustReal is an ego-centric, multi-modal dataset where 27 participants are challenged to perform assembly and maintenance procedures on a construction-toy …

📊 1 results

📏 Metrics: Top-1, Top-5

Jester (Gesture Recognition)

Jester Gesture Recognition dataset includes 148,092 labeled video clips of humans performing basic, pre-defined hand gestures in front of a …

📊 3 results

📏 Metrics: Val

MECCANO

The MECCANO dataset is the first dataset of egocentric videos to study human-object interactions in industrial-like settings. The MECCANO dataset …

📊 1 results

📏 Metrics: Top-1 Accuracy

MTL-AQA

A new multitask action quality assessment (AQA) dataset, the largest to date, comprising more than 1,600 diving samples; contains …

📊 1 results

📏 Metrics: Position Accuracy, Armstand Accuracy, Rotation Type Accuracy, No. of Somersaults Accuracy, No. of Twists Accuracy

Mimetics

📊 2 results

📏 Metrics: mAP

N-UCLA

The Multiview 3D event dataset was captured by me and Xiaohan Nie at UCLA. It contains RGB, depth and human …

📊 1 results

📏 Metrics: Accuracy (Cross-Subject), Accuracy (Cross-View)

NTU RGB+D

NTU RGB+D is a large-scale dataset for RGB-D human action recognition. It involves 56,880 samples of 60 action classes collected …

📊 21 results

📏 Metrics: Accuracy (CS), Accuracy (CV)

NTU RGB+D 120

NTU RGB+D 120 is a large-scale dataset for RGB+D human action recognition, which is collected from 106 distinct subjects and …

📊 16 results

📏 Metrics: Accuracy (Cross-Setup), Accuracy (Cross-Subject)

Okutama-Action

A new video dataset for aerial view concurrent human action detection. It consists of 43 minute-long fully-annotated sequences with 12 …

📊 2 results

📏 Metrics: Accuracy

Penn Action

The Penn Action Dataset contains 2326 video sequences of 15 different actions and human joint annotations for each sequence. Source: …

📊 2 results

📏 Metrics: Accuracy

RareAct

RareAct is a video dataset of unusual actions, including actions like “blend phone”, “cut keyboard” and “microwave shoes”. It aims …

📊 3 results

📏 Metrics: mWAP

RoCoG-v2

RoCoG-v2 (Robot Control Gestures) is a dataset intended to support the study of synthetic-to-real and ground-to-air video domain adaptation. It …

📊 1 results

📏 Metrics: Top-1 Accuracy

Skeleton-Mimetics

A dataset derived from the recently introduced Mimetics dataset. Source: Quo Vadis, Skeleton Action Recognition?

📊 1 results

📏 Metrics: Accuracy

Something-Something V1

The 20BN-SOMETHING-SOMETHING dataset is a large collection of labeled video clips that show humans performing pre-defined basic actions with everyday …

📊 66 results

📏 Metrics: Top 1 Accuracy, Top 5 Accuracy, Param., GFLOPs

Something-Something V2

The 20BN-SOMETHING-SOMETHING V2 dataset is a large collection of labeled video clips that show humans performing pre-defined basic actions with …

📊 116 results

📏 Metrics: Top-1 Accuracy, Top-5 Accuracy, Parameters, GFLOPs

Sports-1M

The Sports-1M dataset consists of over a million videos from YouTube. The videos in the dataset can be obtained through …

📊 8 results

📏 Metrics: Video hit@1, Video hit@5, Clip Hit@1

THUMOS14

The THUMOS14 (THUMOS 2014) dataset is a large-scale video dataset that includes 1,010 videos for validation and 1,574 videos for …

📊 1 results

📏 Metrics: Accuracy

UAV-Human

UAV-Human is a large dataset for human behavior understanding with UAVs. It contains 67,428 multi-modal video sequences and 119 subjects …

📊 4 results

📏 Metrics: Top 1 Accuracy

UCF101

UCF101 dataset is an extension of UCF50 and consists of 13,320 video clips, which are classified into 101 categories. These …

📊 76 results

📏 Metrics: 3-fold Accuracy, Accuracy, Accuracy 20%Test

UTD-MHAD

The UTD-MHAD dataset consists of 27 different actions performed by 8 subjects. Each subject repeated the action 4 times, …

📊 1 results

📏 Metrics: Accuracy

Volleyball

Volleyball is a video action recognition dataset. It has 4830 annotated frames that were handpicked from 55 videos with 9 …

📊 3 results

📏 Metrics: Accuracy

Win-Fail Action Understanding

First of its kind paired win-fail action understanding dataset with samples from the following domains: “General Stunts,” “Internet Wins-Fails,” “Trick …

📊 1 results

📏 Metrics: 2-Class Accuracy

Activity Recognition

RWF-2000

A database with 2,000 videos captured by surveillance cameras in real-world scenes. Source: [RWF-2000: An Open Large Scale Video Database …

📊 4 results

📏 Metrics: Accuracy

Stanford40

The Stanford 40 Action Dataset contains images of humans performing 40 actions. In each image, we provide a bounding box …

📊 2 results

📏 Metrics: Top-3 Accuracy (%)

Change Point Detection

SKAB

SKAB is designed for evaluating algorithms for anomaly detection. The benchmark currently includes 30+ datasets plus Python modules for algorithms’ …

📊 1 results

📏 Metrics: NAB (standard), NAB (lowFP), NAB (LowFN)

TSSB

The time series segmentation benchmark (TSSB) currently contains 75 annotated time series (TS) with 1-9 segments. Each TS is constructed …

📊 3 results

📏 Metrics: Relative Change Point Distance, Covering

Classification

Adult

Data Set Information: Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records …

📊 1 results

📏 Metrics: AUROC

BIOSCAN_1M_Insect Dataset

In an effort to catalog insect biodiversity, we propose a new large dataset of hand-labelled insect images, the BIOSCAN-1M Insect …

📊 2 results

📏 Metrics: Macro F1

BiasBios

The purpose of this dataset was to study gender bias in occupations. Online biographies, written in English, were collected to …

📊 1 results

📏 Metrics: 1:1 Accuracy

BoolQ

BoolQ is a question answering dataset for yes/no questions containing 15942 examples. These questions are naturally occurring – they are …

📊 2 results

📏 Metrics: Test Accuracy

Brain Tumor MRI Dataset

This dataset is a combination of the following three datasets: figshare, SARTAJ dataset and Br35H. This dataset contains 7022 …

📊 1 results

📏 Metrics: F1 score

CIFAKE: Real and AI-Generated Synthetic Images

The quality of AI-generated images has rapidly increased, leading to concerns of authenticity and trustworthiness. CIFAKE is a dataset that …

📊 1 results

📏 Metrics: Validation Accuracy

CIFAR-100

The CIFAR-100 dataset (Canadian Institute for Advanced Research, 100 classes) is a subset of the Tiny Images dataset and consists …

📊 1 results

📏 Metrics: Accuracy

CIFAR-10C

Common corruptions dataset for CIFAR10

📊 1 results

📏 Metrics: Accuracy on Brightness Corrupted Images

COVID-19 Image Data Collection

Contains hundreds of frontal view X-rays and is the largest public resource for COVID-19 image and prognostic data, making it …

📊 1 results

📏 Metrics: Accuracy

CWRU Bearing Dataset

Data was collected for normal bearings, single-point drive end and fan end defects. Data was collected at 12,000 samples/second and …

📊 1 results

📏 Metrics: 10-fold cross-validation

Chest X-Ray Images (Pneumonia)

The normal chest X-ray (left panel) depicts clear lungs without any areas of abnormal opacification in the image. Bacterial pneumonia …

📊 1 results

📏 Metrics: Accuracy

ForgeryNet

We construct the ForgeryNet dataset, an extremely large face forgery dataset with unified annotations in image- and video-level data across …

📊 3 results

📏 Metrics: AUC, Accuracy

Full-body Parkinson’s disease dataset

A public data set of walking full-body kinematics and kinetics in individuals with Parkinson’s disease

📊 7 results

📏 Metrics: F1-score (weighted)

HOWS

HOWS-CL-25 (Household Objects Within Simulation dataset for Continual Learning) is a synthetic dataset especially designed for object classification on mobile …

📊 1 results

📏 Metrics: Overall accuracy after last sequence

HRF

The HRF dataset is a dataset for retinal vessel segmentation which comprises 45 images and is organized as 15 subsets. …

📊 1 results

📏 Metrics: Accuracy

IRFL: Image Recognition of Figurative Language

The IRFL dataset consists of idioms, similes, and metaphors with matching figurative and literal images, as well as two novel …

📊 1 results

📏 Metrics: 1-of-100 Accuracy

ISIC 2019

The goal for ISIC 2019 is to classify dermoscopic images among nine different diagnostic categories. 25,331 images are available for training across …

📊 1 results

📏 Metrics: Balanced Multi-Class Accuracy

ImageNet C-OOD (class-out-of-distribution)

This dataset was presented as part of the ICLR 2023 paper A framework for benchmarking Class-out-of-distribution detection and its application …

📊 5 results

📏 Metrics: Detection AUROC (severity 0), Detection AUROC (severity 5), Detection AUROC (severity 10)

InDL

In this work, we introduce the In-Diagram Logic (InDL) dataset, an innovative resource crafted to rigorously evaluate the …

📊 9 results

📏 Metrics: Average Recall

LES-AV

This data set comprises 22 fundus images with their corresponding manual annotations for the blood vessels, separated as arteries and …

📊 1 results

📏 Metrics: Accuracy

Liver-US

The Liver-US dataset is a comprehensive collection of high-quality ultrasound images of the liver, including both normal and abnormal cases. …

📊 1 results

📏 Metrics: AUC

MHIST

The minimalist histopathology image analysis dataset (MHIST) is a binary classification dataset of 3,152 fixed-size images of colorectal polyps, each …

📊 6 results

📏 Metrics: Accuracy

MedSecId

The process by which sections in a document are demarcated and labeled is known as section identification. Such sections are …

📊 1 results

📏 Metrics: 1 shot Micro-F1

MixedWM38

MixedWM38 Dataset (WaferMap) has more than 38,000 wafer maps, including 1 normal pattern, 8 single defect patterns, and 29 mixed defect …

📊 1 results

📏 Metrics: Accuracy, MCC

MuReD Dataset

Early detection of retinal diseases is one of the most important means of preventing partial or permanent blindness in patients. …

📊 1 results

📏 Metrics: ML F1, ML mAP, ML AUC

N-CARS

A large real-world event-based dataset for object classification. Source: HATS: Histograms of Averaged Time Surfaces for Robust Event-based Object Classification

📊 6 results

📏 Metrics: Accuracy (%), Architecture, Representation, Representation Time (ms / 100 ms events), Inference Time, Params (M)

N-ImageNet

The N-ImageNet dataset is an event-camera counterpart for the ImageNet dataset. The dataset is obtained by moving an event camera …

📊 9 results

📏 Metrics: Accuracy (%)

RITE

The RITE (Retinal Images vessel Tree Extraction) is a database that enables comparative studies on segmentation or classification of arteries …

📊 1 results

📏 Metrics: Accuracy

RSSCN7

The RSSCN7 dataset contains satellite images acquired from Google Earth, which were originally collected for remote sensing scene classification. We …

📊 1 results

📏 Metrics: 1:1 Accuracy

RTE

The Recognizing Textual Entailment (RTE) datasets come from a series of textual entailment challenges. Data from RTE1, RTE2, RTE3 and …

📊 2 results

📏 Metrics: Test Accuracy

SGD

The Schema-Guided Dialogue (SGD) dataset consists of over 20k annotated multi-domain, task-oriented conversations between a human and a virtual assistant. …

📊 1 results

📏 Metrics: F1 (Seqeval)

SHD - Adding

This dataset is based on the Spiking Heidelberg Digits (SHD) dataset. Sample inputs consist of two spike encoded digits sampled …

📊 3 results

📏 Metrics: Accuracy (%)

SPOTS-10

The SPOTS-10 dataset is an extensive collection of grayscale images showcasing diverse patterns found in ten animal species. Specifically, SPOTS-10 …

📊 9 results

📏 Metrics: Accuracy

SST-2

The Stanford Sentiment Treebank is a corpus with fully labeled parse trees that allows for a complete analysis of the …

📊 2 results

📏 Metrics: Test Accuracy

Sentiment140

Sentiment140 is a dataset that allows you to discover the sentiment of a brand, product, or topic on Twitter. Source: …

📊 1 results

📏 Metrics: Accuracy

SimGas

This dataset consists of computer-generated images for gas leakage segmentation. It features diverse backgrounds, interfering foreground objects, and precise ground …

📊 1 results

📏 Metrics: Frame Level Accuracy

Sound-based drone fault classification using multitask learning

arXiv: https://arxiv.org/abs/2304.11708. Accepted at the 29th International Congress on Sound and Vibration (ICSV29). The drone has been used for various …

📊 1 results

📏 Metrics: macro f1 score (A(100), B(100), C(100) Avg.)

TACM12K

Table-ACM12K (TACM12K) is a relational table dataset derived from the ACM heterogeneous graph dataset. It includes four tables: papers, authors, …

📊 1 results

📏 Metrics: Accuracy

TCGA

📊 1 results

📏 Metrics: AUPRC, AUROC

TLF2K

Table-LastFm2K (TLF2K) is a relational table dataset derived from the classical LastFM2K dataset. It contains three tables: artists, user_artists, and …

📊 1 results

📏 Metrics: Accuracy

TML1M

Table-MovieLens1M (TML1M) is a relational table dataset derived from the classical MovieLens1M dataset. It consists of three tables: users, movies, …

📊 1 results

📏 Metrics: Accuracy

WSC

The Winograd Schema Challenge was introduced both as an alternative to the Turing Test and as a test of a …

📊 2 results

📏 Metrics: Test Accuracy

WiC

WiC is a benchmark for the evaluation of context-sensitive word embeddings. WiC is framed as a binary classification task. Each …

📊 2 results

📏 Metrics: Test Accuracy

XImageNet-12

Enlarges the dataset to understand how image backgrounds affect computer vision ML models, with the following topics: Blur Background …

📊 3 results

📏 Metrics: Robustness Score

EEG Decoding

CWL EEG/fMRI Dataset

EEG/fMRI data from 8 subjects doing a simple eyes-open/eyes-closed task are provided on this webpage. The EEG/fMRI data …

📊 1 results

📏 Metrics: Pearson Correlation

Fault Diagnosis

Digital twin-supported deep learning for fault diagnosis

This is a dataset used to test digital twin-supported deep learning for fault diagnosis: - A digital twin model for …

📊 2 results

📏 Metrics: Accuracy

Human Activity Recognition

HAR

The Human Activity Recognition Dataset has been collected from 30 subjects performing six different activities (Walking, Walking Upstairs, Walking Downstairs, …

📊 1 results

📏 Metrics: Accuracy, F1 Macro, Macro-F1

PAMAP2

The PAMAP2 Physical Activity Monitoring dataset contains data of 18 different physical activities (such as walking, cycling, playing soccer, etc.), …

📊 2 results

📏 Metrics: NMI, ARI, Accuracy, Macro F1

Imputation

Adult

Data Set Information: Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records …

📊 1 results

📏 Metrics: Test error

PhysioNet Challenge 2012

The PhysioNet Challenge 2012 dataset is publicly available and contains the de-identified records of 8000 patients in Intensive Care Units …

📊 1 results

📏 Metrics: AUROC

Sprites

The Sprites dataset contains 60 pixel color images of animated characters (sprites). There are 672 sprites, 500 for training, 100 …

📊 1 results

📏 Metrics: MSE

Li-ion State of Health Estimation

NASA Li-ion Dataset

Experiments on Li-Ion batteries. Charging and discharging at different temperatures. Records the impedance as the damage criterion. The data set …

📊 1 results

📏 Metrics: mean absolute error

Lip Reading

LRW

The Lip Reading in the Wild (LRW) dataset is a large-scale audio-visual database that contains 500 different words from over 1,000 …

📊 1 results

📏 Metrics: WER

Mathematical Question Answering

GeoS

GeoS is a dataset for automatic math problem solving. It is a dataset of SAT plane geometry questions where every …

📊 1 results

📏 Metrics: Accuracy (%)

Geometry3K

A new large-scale geometry problem-solving dataset - 3,002 multi-choice geometry problems - dense annotations in formal language for the diagrams …

📊 8 results

📏 Metrics: Accuracy (%)

Multivariate Time Series Forecasting

Electricity

Abstract: Measurements of electric power consumption in one household with a one-minute sampling rate over a period of almost 4 …

📊 1 results

📏 Metrics: MSE

ExtMarker

Three-dimensional position of external markers placed on the chest and abdomen of healthy individuals breathing during intervals from 73s to …

📊 1 results

📏 Metrics: Jitter, MAE, Maximum error, RMSE, normalized RMSE

MIMIC-III

The Medical Information Mart for Intensive Care III (MIMIC-III) dataset is a large, de-identified and publicly-available collection of medical records. …

📊 4 results

📏 Metrics: MSE, NegLL

MuJoCo

MuJoCo (multi-joint dynamics with contact) is a physics engine used to implement environments to benchmark Reinforcement Learning methods.

📊 5 results

📏 Metrics: MSE (10^-2, 50% missing)

PhysioNet Challenge 2012

The PhysioNet Challenge 2012 dataset is publicly available and contains the de-identified records of 8000 patients in Intensive Care Units …

📊 4 results

📏 Metrics: MSE (10^-3), MSE stdev

Traffic

Abstract: The task for this dataset is to forecast the spatio-temporal traffic volume based on the historical traffic volume and …

📊 1 results

📏 Metrics: MSE

Weather

Weather is recorded every 10 minutes for the whole year of 2020, which contains 21 meteorological indicators, such as air temperature, …

📊 1 results

📏 Metrics: MSE

Prediction

QM9

QM9 provides quantum chemical properties (at DFT level) for a relevant, consistent, and comprehensive chemical space of small organic molecules. …

📊 4 results

📏 Metrics: Edit Distance

Remaining Useful Lifetime Estimation

NASA C-MAPSS-2

The generation of data-driven prognostics models requires the availability of datasets with run-to-failure trajectories. In order to contribute to the …

📊 1 results

📏 Metrics: Score

Stock Market Prediction

Astock

(1) Provides financial news for each specific stock. (2) Provides various stock technical factors and fundamental factors for each stock.

📊 16 results

📏 Metrics: Accuracy, F1-score, Recall, Precision

stocknet

This repository releases a comprehensive dataset for stock movement prediction from tweets and historical stock prices. Please cite …

📊 1 results

📏 Metrics: F1

Synthetic Data Generation

UNSW-NB15

UNSW-NB15 is a network intrusion dataset. It contains nine different attacks, including DoS, worms, backdoors, and fuzzers. The dataset contains …

📊 2 results

📏 Metrics: EMD

Time Series Analysis

PhysioNet Challenge 2012

The PhysioNet Challenge 2012 dataset is publicly available and contains the de-identified records of 8000 patients in Intensive Care Units …

📊 7 results

📏 Metrics: F1

Speech Commands

Speech Commands is an audio dataset of spoken words designed to help train and evaluate keyword spotting systems.

📊 6 results

📏 Metrics: % Test Accuracy, % Test Accuracy (Raw Data)

Time Series Anomaly Detection

MSL

This dataset contains expert-labeled telemetry anomaly data from the Mars Science Laboratory (MSL) rover, Curiosity. Real spacecraft and Curiosity rover …

📊 1 results

📏 Metrics: AUPR, F1 Score, Recall, Precision

SMAP

Soil Moisture Active Passive (SMAP) dataset is a dataset of soil samples and telemetry information using the Mars rover by …

📊 1 results

📏 Metrics: AUPR, F1 Score, Recall, Precision

SMD

A dataset for time-series anomaly detection.

📊 1 results

📏 Metrics: AUPR, F1 Score, Recall, Precision

UCR Anomaly Archive

The UCR Anomaly Archive is a collection of 250 univariate time series collected in human medicine, biology, meteorology and industry. …

📊 16 results

📏 Metrics: Accuracy

Time Series Classification

BorealTC

Recorded with a Husky A200 wheeled UGV, BorealTC contains 116 min of Inertial Measurement Unit (IMU), motor current, and wheel …

📊 2 results

📏 Metrics: Accuracy (5-fold)

ECG200

📊 1 results

📏 Metrics: Accuracy (30-fold)

ECG5000

The original dataset for "ECG5000" is a 20-hour long ECG downloaded from Physionet. The name is BIDMC Congestive Heart Failure …

📊 1 results

📏 Metrics: Accuracy (30-fold)

EigenWorms

Caenorhabditis elegans is a roundworm commonly used as a model organism in the study of genetics. The movement of these …

📊 8 results

📏 Metrics: % Test Accuracy

PhysioNet Challenge 2012

The PhysioNet Challenge 2012 dataset is publicly available and contains the de-identified records of 8000 patients in Intensive Care Units …

📊 27 results

📏 Metrics: AUC, AUC Stdev, AUPRC, AUROC

SHAPES

SHAPES is a dataset of synthetic images designed to benchmark systems for understanding of spatial and logical relations among multiple …

📊 9 results

📏 Metrics: Accuracy, NLL

Time Series Forecasting

ETTh1 (96)

The Electricity Transformer Temperature (ETT) is a crucial indicator in the electric power long-term deployment. This dataset consists of 2 …

📊 2 results

📏 Metrics: MAE, MSE

Extreme Events > Natural Disasters > Hurricane

A new spatio-temporal benchmark dataset (Hurricane) is suited for forecasting during extreme events and anomalies. The dataset is provided through …

📊 1 results

📏 Metrics: RMSE

MLO-Cn2

The Mauna Loa Seeing Study was performed by the EOL/Integrated Surface Flux System team, capturing surface meteorology and flux products …

📊 7 results

📏 Metrics: RMSE

PeMSD7

PeMSD7 is traffic data in District 7 of California consisting of the traffic speed of 228 sensors while the period …

📊 3 results

📏 Metrics: 9 steps MAE

USNA-Cn2 (short-duration)

The USNA long-term scintillation study is a continuing effort to characterize and measure optical turbulence in the near-maritime boundary layer. …

📊 5 results

📏 Metrics: RMSE

Weather

Weather is recorded every 10 minutes for the whole year of 2020, which contains 21 meteorological indicators, such as air temperature, …

📊 1 results

📏 Metrics: MAE, MSE

Time Series Prediction

Data Collected with Package Delivery Quadcopter Drone

This experiment was performed in order to empirically measure the energy use of small, electric Unmanned Aerial Vehicles (UAVs). We …

📊 1 results

📏 Metrics: Average mean absolute error

Time Series Regression

FinSen

Enhancing Financial Market Predictions: Causality-Driven Feature Selection. This paper introduces the FinSen dataset, which revolutionizes financial market analysis by integrating …

📊 1 results

📏 Metrics: Mean MSE

MLO-Cn2

The Mauna Loa Seeing Study was performed by the EOL/Integrated Surface Flux System team, capturing surface meteorology and flux products …

📊 5 results

📏 Metrics: RMSE

USNA-Cn2 (long-term)

The USNA long-term scintillation study is a continuing effort to characterize and measure optical turbulence in the near-maritime boundary layer. …

📊 9 results

📏 Metrics: RMSE

USNA-Cn2 (short-duration)

The USNA long-term scintillation study is a continuing effort to characterize and measure optical turbulence in the near-maritime boundary layer. …

📊 10 results

📏 Metrics: RMSE

Traffic Prediction

Beijing Traffic

The Beijing Traffic Dataset collects traffic speeds at 5-minute granularity for 3126 roadway segments in Beijing between 2022/05/12 and 2022/07/25.

📊 1 results

📏 Metrics: MAE

EXPY-TKY

EXPY-TKY contains the traffic speed information and the corresponding traffic incident information at 10-minute intervals for 1843 expressway road links …

📊 8 results

📏 Metrics: 1 step MAE, 3 step MAE, 6 step MAE

LargeST

In this work, we propose LargeST as a new benchmark dataset (see Figure 1), with the goal of facilitating the …

📊 5 results

📏 Metrics: SD MAE, GBA MAE, GLA MAE, CA MAE

METR-LA

METR-LA is a dataset for traffic prediction.

📊 14 results

📏 Metrics: MAE @ 12 step, 12 steps MAE, 12 steps MAPE, 12 steps RMSE, MAE @ 3 step

NYCBike1

Bike flow data of New York City with grid 16x8.

📊 3 results

📏 Metrics: MAE @ in, MAE @ out, MAPE (%) @ in, MAPE (%) @ out

NYCBike2

Bike flow data of New York City.

📊 3 results

📏 Metrics: MAE @ in, MAE @ out, MAPE (%) @ in, MAPE (%) @ out

NYCTaxi

Taxi flow data of New York City with grid 20x10.

📊 4 results

📏 Metrics: MAE @ in, MAE @ out, MAPE (%) @ in, MAPE (%) @ out

PEMS-BAY

PEMS-BAY is a dataset for traffic prediction.

📊 11 results

📏 Metrics: MAE @ 12 step, RMSE

PeMS04

PeMS04 is a traffic forecasting benchmark.

📊 9 results

📏 Metrics: 12 Steps MAE, FLOPs(M), MAE, MAPE, Parameters(K), RMSE

PeMS07

PeMS07 is a traffic forecasting benchmark.

📊 12 results

📏 Metrics: MAE@1h

PeMS08

PeMS08 is a traffic forecasting dataset.

📊 10 results

📏 Metrics: MAE@1h, FLOPs(M), MAE, MAPE, Parameters(K), RMSE

PeMSD4

The dataset refers to the traffic speed data in San Francisco Bay Area, containing 307 sensors on 29 roads. The …

📊 10 results

📏 Metrics: 12 steps MAE, 12 steps MAPE, 12 steps RMSE

PeMSD7

PeMSD7 is traffic data in District 7 of California consisting of the traffic speed of 228 sensors while the period …

📊 7 results

📏 Metrics: 12 steps MAE, 12 steps MAPE, 12 steps RMSE

PeMSD8

This dataset contains the traffic data in San Bernardino from July to August in 2016, with 170 detectors on 8 …

📊 10 results

📏 Metrics: 12 steps MAE, 12 steps MAPE, 12 steps RMSE, MAE@1h

Q-Traffic

Q-Traffic is a large-scale traffic prediction dataset, which consists of three sub-datasets: query sub-dataset, traffic speed sub-dataset and road network …

📊 1 results

📏 Metrics: MAPE

SZ-Taxi

Taxi speed data at 15-minute intervals from 156 sensors on major roads of Luohu District in Shenzhen, China, from Jan. …

📊 4 results

📏 Metrics: MAE @ 15min, MAE @ 30min, MAE @ 45min, MAE @ 60min

Trajectory Modeling

NBA SportVU

The NBA SportVU dataset contains player and ball trajectories for 631 games from the 2015-2016 NBA season. The raw tracking …

📊 1 results

📏 Metrics: 1x1 NLL

Trajectory Prediction

ApolloScape

ApolloScape is a large dataset consisting of over 140,000 video frames (73 street scene videos) from various locations in China …

📊 1 results

📏 Metrics: ADE, FDE

Apolloscape Trajectory

Our trajectory dataset consists of camera-based images, LiDAR scanned point clouds, and manually annotated trajectories. It is collected under various …

📊 1 results

📏 Metrics: ADE

Argoverse

Argoverse is a tracking benchmark with over 30K scenarios collected in Pittsburgh and Miami. Each scenario is a sequence of …

📊 1 results

📏 Metrics: MR (K=6), brier-minFDE (K=6), minADE (K=6), minFDE (K=6)

ETH

ETH is a dataset for pedestrian detection. The testing set contains 1,804 images in three video clips. The dataset is …

📊 4 results

📏 Metrics: Avg AMD/AMV 8/12

GTA-IM Dataset

The GTA Indoor Motion dataset (GTA-IM) emphasizes human-scene interactions in indoor environments. It consists of HD RGB-D image …

📊 1 results

📏 Metrics: ADE, FDE, STB

HEV-I

Honda Egocentric View-Intersection Dataset (HEV-I) is introduced to enable research on traffic participants interaction modelling, future object localization, as well …

📊 2 results

📏 Metrics: ADE(0.5), ADE(1.0), ADE(1.5), FDE(1.5), FIOU(1.5)

JAAD

JAAD is a dataset for studying joint attention in the context of autonomous driving. The focus is on pedestrian and …

📊 4 results

📏 Metrics: MSE(0.5), MSE(1.0), MSE(1.5), C_MSE(1.5), CF_MSE(1.5)

PIE

PIE is a new dataset for studying pedestrian behavior in traffic. PIE contains over 6 hours of footage recorded in …

📊 4 results

📏 Metrics: MSE(0.5), MSE(1.0), MSE(1.5), C_MSE(1.5), CF_MSE(1.5)

PROX

A dataset composed of 12 different 3D scenes and RGB sequences of 20 subjects moving in and interacting with the …

📊 1 result

📏 Metrics: ADE, FDE, STB

SDD

SDD dataset contains a variety of indoor and outdoor scenes, designed for Image Defocus Deblurring. There are 50 indoor scenes …

📊 1 result

📏 Metrics: mADEK @4.8s, mFDEK @4.8s

UCY

The UCY dataset consists of real pedestrian trajectories with rich multi-human interaction scenarios captured at 2.5 Hz (Δt=0.4s). It is …

📊 1 result

📏 Metrics: Avg AMD/AMV 8/12

nuScenes

The nuScenes dataset is a large-scale autonomous driving dataset. The dataset has 3D bounding boxes for 1000 scenes collected in …

📊 11 results

📏 Metrics: MinADE_5, MinADE_10, MissRateTopK_2_5, MissRateTopK_2_10, MinFDE_1, OffRoadRate

Transferability

classification benchmark

This benchmark includes 11 image classification datasets that were used to evaluate the transferability of metrics. Datasets include FGVC Aircraft, …

📊 6 results

📏 Metrics: Kendall's Tau

Video Prediction

BAIR Robot Pushing

Dataset of 64x64 images of a robot pushing objects on a table top. From Berkeley AI Research (BAIR). Source: Self-Supervised …

📊 6 results

📏 Metrics: FVD

Cityscapes

Cityscapes is a large-scale database which focuses on semantic understanding of urban street scenes. It provides semantic, instance-wise, and dense …

📊 3 results

📏 Metrics: LPIPS, MS-SSIM

DAVIS 2017

DAVIS17 is a dataset for video object segmentation. It contains a total of 150 videos - 60 for training, 30 …

📊 2 results

📏 Metrics: LPIPS, MS-SSIM

Human3.6M

The Human3.6M dataset is one of the largest motion capture datasets, which consists of 3.6 million human poses and corresponding …

📊 6 results

📏 Metrics: SSIM, MSE, MAE

KITTI

KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) is one of the most popular datasets for use in mobile …

📊 3 results

📏 Metrics: LPIPS, MS-SSIM

KTH

The effort to create a non-trivial and publicly available dataset for action recognition was initiated at the KTH Royal Institute …

📊 28 results

📏 Metrics: FVD, SSIM, PSNR, LPIPS, Cond, Train, Pred, Params (M), MSE, Diversity

MPI Sintel

MPI (Max Planck Institute) Sintel is a dataset for optical flow evaluation that has 1064 synthesized stereo images and ground …

📊 1 result

📏 Metrics: LPIPS, PSNR, SSIM, ST-RRED

Moving MNIST

The Moving MNIST dataset contains 10,000 video sequences, each consisting of 20 frames. In each video sequence, two digits move …

📊 24 results

📏 Metrics: MSE, MAE, SSIM, LPIPS, PSNR

Something-Something V2

The 20BN-SOMETHING-SOMETHING V2 dataset is a large collection of labeled video clips that show humans performing pre-defined basic actions with …

📊 1 result

📏 Metrics: FVD

Sprites

The Sprites dataset contains 60 pixel color images of animated characters (sprites). There are 672 sprites, 500 for training, 100 …

📊 1 result

📏 Metrics: MSE

Vimeo90K

The Vimeo-90K is a large-scale high-quality video dataset for lower-level video processing. It proposes three different video processing tasks: frame …

📊 2 results

📏 Metrics: LPIPS, MS-SSIM

YouTube-8M

The YouTube-8M dataset is a large scale video dataset, which includes more than 7 million videos with 4716 classes labeled …

📊 1 result

📏 Metrics: Average PSNR

Video Quality Assessment

KoNViD-1k

Subjective video quality assessment (VQA) strongly depends on semantics, context, and the types of visual distortions. A lot of existing …

📊 20 results

📏 Metrics: PLCC

LIVE Livestream

LIVE Livestream is a database for Video Quality Assessment (VQA), specifically designed for live streaming VQA research. The dataset is …

📊 3 results

📏 Metrics: SRCC

LIVE-ETRI

The video deployed parameter space is continuously increasing to provide more realistic and immersive experiences to global streaming and social …

📊 4 results

📏 Metrics: SRCC

LIVE-FB LSVQ

No-reference (NR) perceptual video quality assessment (VQA) is a complex, unsolved, and important problem to social and streaming media applications. …

📊 13 results

📏 Metrics: PLCC

LIVE-VQC

The great variations of videographic skills in videography, camera designs, compression and processing protocols, communication and bandwidth environments, and displays …

📊 19 results

📏 Metrics: PLCC

LIVE-YT-HFR

LIVE-YT-HFR comprises 480 videos with 6 different frame rates, obtained from 16 diverse contents. Source: [Subjective and Objective Quality …

📊 3 results

📏 Metrics: SRCC

MSU FR VQA Database

The dataset was created for the video quality assessment problem. It was formed with 36 clips from Vimeo, which were selected …

📊 6 results

📏 Metrics: SRCC, PLCC, KLCC

MSU NR VQA Database

The dataset was created for the video quality assessment problem. It was formed with 36 clips from Vimeo, which were selected …

📊 17 results

📏 Metrics: SRCC, PLCC, KLCC, Type

MSU SR-QA Dataset

Our dataset is composed of videos from the MSU Video Upscalers Benchmark Dataset, the MSU Video Super-Resolution Benchmark Dataset and the MSU Super-Resolution …

📊 40 results

📏 Metrics: SROCC, PLCC, KLCC, Type

YouTube-UGC

This YouTube dataset is a sampling from thousands of User Generated Content (UGC) as uploaded to YouTube distributed under the …

📊 17 results

📏 Metrics: PLCC

Weather Forecasting

NOAA Atmospheric Temperature Dataset

This dataset contains meteorological observations (temperature) at the land-based weather stations located in the United States, collected from the Online …

📊 4 results

📏 Metrics: MAE (t+1), MAE (t+10)

SEVIR

SEVIR is an annotated, curated and spatio-temporally aligned dataset containing over 10,000 weather events that each consist of 384 km …

📊 5 results

📏 Metrics: MSE, mCSI

Shifts

The Shifts Dataset is a dataset for evaluation of uncertainty estimates and robustness to distributional shift. The dataset, which has …

📊 2 results

📏 Metrics: R-AUC MSE

Regression

California Housing Prices

Median house prices for California districts derived from the 1990 census. This is the dataset used in …

📊 3 results

📏 Metrics: R2 Score, lambda

Car_Price_Prediction

In this dataset we added [Company Name, Car Model, Car Type, Fuel Type, Transmission, Engine (cc), Mileage, Kms_driven, Buyers, Horsepower …

📊 1 result

📏 Metrics: R Squared

Concrete Compressive Strength

Concrete is the most important material in civil engineering. The concrete compressive strength is a highly nonlinear function of age …

📊 3 results

📏 Metrics: R2 Score, lambda

Medical Cost Personal Dataset

This dataset contains demographic and personal health information for individuals, along with the corresponding medical insurance charges billed to them. …

📊 3 results

📏 Metrics: R2 Score, lambda