Machine Learning Benchmarks: Music

Browse 22 benchmarks across 10 tasks

1 Image, 2*2 Stitchi

FQL-Driving

FQL-driving

📊 1 result
📏 Metrics: 0..5sec

10-shot image generation

FQL-Driving

FQL-driving

📊 1 result
📏 Metrics: 0-shot MRR

FlyingThings3D

FlyingThings3D is a synthetic dataset for optical flow, disparity and scene flow estimation. It consists of everyday objects flying along …

📊 1 result
📏 Metrics: 0..5sec

MEAD

Multi-view Emotional Audio-visual Dataset

📊 1 result
📏 Metrics: 12k

Music21

Music21 is an untrimmed video dataset crawled by keyword query from YouTube. It contains music performances belonging to 21 categories. …

📊 1 result
📏 Metrics: 0..5sec

Audio Generation

AudioCaps

AudioCaps is a dataset of sounds with event descriptions that was introduced for the task of audio captioning, with sounds …

📊 23 results
📏 Metrics: FD_openl3, FAD, FD, KL_passt, IS, CLAP_LAION, CLAP_MS
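The FD and FAD columns are Fréchet distances between two Gaussians, one fitted to embeddings of reference audio and one fitted to embeddings of generated audio: FD = ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}). The suffix on variants such as FD_openl3 appears to name the embedding network. The NumPy sketch below computes the distance from already-extracted embeddings, assumed to be a clips × features array; it does not reproduce any toolkit's audio preprocessing, so its values will not match published results exactly. Function names are illustrative.

```python
import numpy as np


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    sigma1, sigma2 = np.asarray(sigma1, float), np.asarray(sigma2, float)

    # Symmetric square root of sigma1 from its eigendecomposition.
    w, v = np.linalg.eigh(sigma1)
    sqrt_s1 = (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

    # sigma1 @ sigma2 has the same eigenvalues as the symmetric PSD matrix
    # sqrt_s1 @ sigma2 @ sqrt_s1, so Tr((sigma1 sigma2)^(1/2)) is the sum of
    # the square roots of those eigenvalues (clipped for round-off).
    m = sqrt_s1 @ sigma2 @ sqrt_s1
    tr_covmean = np.sum(np.sqrt(np.clip(np.linalg.eigvalsh(m), 0.0, None)))

    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)


def fd_from_embeddings(real, generated):
    """FD between two embedding sets (rows = clips, columns = features)."""
    real, generated = np.asarray(real, float), np.asarray(generated, float)
    # atleast_2d keeps single-feature covariances as 1x1 matrices.
    return frechet_distance(
        real.mean(axis=0), np.atleast_2d(np.cov(real, rowvar=False)),
        generated.mean(axis=0), np.atleast_2d(np.cov(generated, rowvar=False)),
    )
```

Computing the matrix square root through a symmetric eigendecomposition avoids a general `sqrtm` call and its small imaginary round-off components.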

Music Auto-Tagging

MagnaTagATune

The MagnaTagATune dataset contains 25,863 music clips. Each clip is a 29-second-long excerpt belonging to one of 5,223 songs, 445 …

📊 3 results
📏 Metrics: PR-AUC, ROC AUC
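Auto-tagging metrics are computed per tag and then averaged across tags. The sketch below assumes macro-averaging; it computes ROC AUC as the probability that a randomly chosen positive clip outranks a randomly chosen negative one (ties count one half), and PR-AUC as average precision, a common estimator of the area under the precision-recall curve. Ties in the average-precision ranking are broken by input order, which library implementations may handle differently. All names are hypothetical.

```python
def roc_auc(scores, labels):
    """ROC AUC via pairwise ranking (Mann-Whitney statistic)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative clip")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of precision@k taken at the rank k of each positive clip."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            total += hits / k
    if hits == 0:
        raise ValueError("need at least one positive clip")
    return total / hits


def macro_average(metric, score_matrix, label_matrix):
    """Average a per-tag metric over tags; rows are clips, columns are tags."""
    n_tags = len(score_matrix[0])
    per_tag = [
        metric([row[t] for row in score_matrix], [row[t] for row in label_matrix])
        for t in range(n_tags)
    ]
    return sum(per_tag) / n_tags
```

For example, `macro_average(roc_auc, scores, labels)` gives the tag-averaged ROC AUC for a clips × tags score matrix.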

TimeTravel

TimeTravel contains 29,849 counterfactual rewritings, each with the original story, a counterfactual event, and a human-generated revision of the original story …

📊 1 result
📏 Metrics: 0..5sec

Music Generation

Song Describer Dataset

The Song Describer Dataset (SDD) contains ~1.1k captions for 706 permissively licensed music recordings. It is designed for use in …

📊 1 result
📏 Metrics: FAD VGG

Music Modeling

JSB Chorales

The JSB chorales are a set of short, four-voice pieces of music noted for their stylistic homogeneity. The chorales were …

📊 9 results
📏 Metrics: NLL, Parameters

Nottingham

The Nottingham Dataset is a collection of 1200 American and British folk songs. Source: [Rethinking Recurrent Latent Variable Model for …

📊 8 results
📏 Metrics: NLL, Parameters
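Both Music Modeling benchmarks report NLL. A common convention for JSB Chorales and Nottingham is to represent each piece as a binary piano-roll (one 0/1 indicator per pitch per time step) and to report the negative log-likelihood per time step. The sketch below assumes that convention, with a model that outputs an independent Bernoulli probability for each pitch at each step; the function name and the `eps` clamp are illustrative.

```python
import math


def frame_nll(probs, rolls, eps=1e-12):
    """Mean negative log-likelihood (in nats) per time step of a piano-roll.

    probs[t][k]: model probability that pitch k sounds at step t.
    rolls[t][k]: ground-truth 0/1 indicator for the same pitch and step.
    """
    if not rolls or len(probs) != len(rolls):
        raise ValueError("probs and rolls must be non-empty and aligned")
    total = 0.0
    for p_frame, y_frame in zip(probs, rolls):
        for p, y in zip(p_frame, y_frame):
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            # Bernoulli log-likelihood of the observed indicator.
            total -= math.log(p) if y else math.log(1.0 - p)
    return total / len(rolls)
```

Summing over pitches and averaging over time steps makes the number comparable across pieces of different lengths, though papers may differ on this normalization.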

Music Question Answering

MusicQA

We propose the MusicQA dataset for training music-enabled question-answering models; it is used to train and evaluate our MU-LLaMA model. …

📊 3 results
📏 Metrics: BLEU, METEOR, ROUGE, BERT Score
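BLEU, METEOR and ROUGE all compare a generated answer with a reference text. As one example, the sketch below computes a simplified ROUGE-L F1 from the longest common subsequence (LCS) of word tokens: precision is LCS / candidate length and recall is LCS / reference length. It assumes a single reference, whitespace tokenization, lowercasing, no stemming and β = 1, so its values will differ from official ROUGE implementations. Function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous row of a
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Simplified ROUGE-L F1 between a candidate and one reference string."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Because LCS respects word order without requiring contiguity, "the cat sat" against "the cat sat on the mat" has perfect precision but only half the recall.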

Music Source Separation

MUSDB18

MUSDB18 is a dataset of 150 full-length music tracks (~10 hours in total) of different genres along with their isolated …

📊 20 results
📏 Metrics: SDR (avg), SDR (vocals), SDR (drums), SDR (bass), SDR (other)

MUSDB18-HQ

MUSDB18-HQ is a high-quality version of the MUSDB18 music tracks dataset. The high-quality dataset consists of the same 150 songs, …

📊 12 results
📏 Metrics: SDR (avg), SDR (bass), SDR (drums), SDR (others), SDR (vocals)
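The MUSDB18 and MUSDB18-HQ columns report signal-to-distortion ratio (SDR) per stem. The sketch below computes a simplified, whole-signal SDR, 10 log10(||s||² / ||s − ŝ||²), for one mono source and averages it over stems. It assumes that "SDR (avg)" is the mean over the four MUSDB18 stems (vocals, drums, bass, other). Many published MUSDB18 numbers come from the museval toolkit (windowed BSSEval v4 with median aggregation), which this does not replicate. Names and the `eps` guard are illustrative.

```python
import math


def sdr(reference, estimate, eps=1e-10):
    """Simplified SDR in dB: energy of the reference over energy of the error."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal = sum(s * s for s in reference)
    error = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    # eps avoids division by zero for silent references or perfect estimates.
    return 10.0 * math.log10((signal + eps) / (error + eps))


def average_sdr(references, estimates):
    """Mean SDR over stems, e.g. keys 'vocals', 'drums', 'bass', 'other'."""
    return sum(sdr(references[k], estimates[k]) for k in references) / len(references)
```

Halving the error energy raises SDR by about 3 dB, which is why small dB differences between leaderboard entries can still be audible.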

Slakh2100

The Synthesized Lakh (Slakh) Dataset is a dataset for audio source separation that is synthesized from the Lakh MIDI Dataset …

📊 1 result
📏 Metrics: SDR (bass), SDR (drums), SI-SDRi (Bass), SI-SDRi (Drums), SI-SDRi (Guitar), SI-SDRi (Piano)
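Slakh2100 also reports SI-SDRi, the improvement in scale-invariant SDR over the unprocessed mixture. SI-SDR removes the means, projects the estimate onto the reference to obtain the optimally scaled target αs with α = ⟨ŝ, s⟩ / ||s||², and compares the energy of that target with the energy of the residual. The sketch below is a minimal mono implementation under those assumptions; the function names and the `eps` guard are hypothetical.

```python
import math


def si_sdr(reference, estimate, eps=1e-10):
    """Scale-invariant SDR in dB for one mono source."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    n = len(reference)
    # Zero-mean both signals so a DC offset does not count as signal.
    s = [x - sum(reference) / n for x in reference]
    e = [x - sum(estimate) / n for x in estimate]
    # Optimal scaling of the reference: projection of the estimate onto it.
    alpha = sum(a * b for a, b in zip(e, s)) / (sum(a * a for a in s) + eps)
    target = [alpha * a for a in s]
    residual = [b - t for b, t in zip(e, target)]
    return 10.0 * math.log10(
        (sum(t * t for t in target) + eps) / (sum(r * r for r in residual) + eps)
    )


def si_sdr_improvement(reference, estimate, mixture):
    """SI-SDRi: gain of the estimate over using the raw mixture as the estimate."""
    return si_sdr(reference, estimate) - si_sdr(reference, mixture)
```

Rescaling the estimate leaves SI-SDR unchanged (up to the `eps` guard), which is the point of the metric: a separator is not penalized for getting only the output gain wrong.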

Music Transcription

MAESTRO

The MAESTRO dataset contains over 200 hours of paired audio and MIDI recordings from ten years of the International Piano-e-Competition. The …

📊 6 results
📏 Metrics: Onset F1

MAPS

MAPS – standing for MIDI Aligned Piano Sounds – is a database of MIDI-annotated piano recordings. MAPS has been designed …

📊 6 results
📏 Metrics: Onset F1

MusicNet

MusicNet is a collection of 330 freely-licensed classical music recordings, together with over 1 million annotated labels indicating the precise …

📊 6 results
📏 Metrics: APS, Number of params

Slakh2100

The Synthesized Lakh (Slakh) Dataset is a dataset for audio source separation that is synthesized from the Lakh MIDI Dataset …

📊 6 results
📏 Metrics: note-level F-measure-no-offset (Fno), Onset F1

URMP

URMP (University of Rochester Multi-Modal Musical Performance) is a dataset for facilitating audio-visual analysis of musical performances. The dataset comprises …

📊 3 results
📏 Metrics: Onset F1
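Onset F1, reported for MAESTRO, MAPS, Slakh2100 and URMP, counts an estimated note as correct when its pitch matches a reference note and its onset falls within a tolerance window, with each note matched at most once. The sketch below assumes notes given as (onset in seconds, MIDI pitch) pairs and a 50 ms tolerance, the default onset tolerance in mir_eval's transcription metrics. It uses greedy one-to-one matching, a simplification of the maximum bipartite matching mir_eval performs, so edge cases can differ. Names are illustrative.

```python
def onset_f1(reference, estimated, tolerance=0.05):
    """Note-level onset F1; notes are (onset_seconds, midi_pitch) pairs."""
    used = [False] * len(estimated)
    matched = 0
    for ref_onset, ref_pitch in sorted(reference):
        best, best_dist = None, None
        for i, (est_onset, est_pitch) in enumerate(estimated):
            if used[i] or est_pitch != ref_pitch:
                continue
            dist = abs(est_onset - ref_onset)
            # Keep the closest unmatched same-pitch note inside the window.
            if dist <= tolerance and (best is None or dist < best_dist):
                best, best_dist = i, dist
        if best is not None:
            used[best] = True
            matched += 1
    if matched == 0:
        return 0.0
    precision = matched / len(estimated)
    recall = matched / len(reference)
    return 2 * precision * recall / (precision + recall)
```

The "no-offset" F-measure (Fno) listed for Slakh2100 uses the same onset-and-pitch criterion and ignores note offsets.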

Visual Speech Recognition

LRS2

The Oxford-BBC Lip Reading Sentences 2 (LRS2) dataset is one of the largest publicly available datasets for lip reading sentences …

📊 2 results
📏 Metrics: Word Error Rate (WER)

LRS3-TED

LRS3-TED is a multi-modal dataset for visual and audio-visual speech recognition. It includes face tracks from over 400 hours of …

📊 3 results
📏 Metrics: Word Error Rate (WER)
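Both lip-reading benchmarks report word error rate, WER = (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions in a minimum edit-distance alignment and N is the number of reference words. The sketch below computes it with a two-row Levenshtein table over whitespace tokens. It assumes the transcripts are already normalized (case, punctuation), and the function name is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level Levenshtein distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the previous prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion of r
                cur[j - 1] + 1,          # insertion of h
                prev[j - 1] + (r != h),  # substitution, or free match
            ))
        prev = cur
    return prev[-1] / len(ref)
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.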