FlyingThings3D is a synthetic dataset for optical flow, disparity and scene flow estimation. It consists of everyday objects flying along …
Music21 is an untrimmed video dataset crawled by keyword query from Youtube. It contains music performances belonging to 21 categories. …
AudioCaps is a dataset of sounds with event descriptions that was introduced for the task of audio captioning, with sounds …
MagnaTagATune dataset contains 25,863 music clips. Each clip is a 29-seconds-long excerpt belonging to one of the 5223 songs, 445 …
TimeTravel contains 29,849 counterfactual rewritings, each with the original story, a counterfactual event, and human-generated revision of the original story …
The Song Describer Dataset (SDD) contains ~1.1k captions for 706 permissively licensed music recordings. It is designed for use in …
The JSB chorales are a set of short, four-voice pieces of music well-noted for their stylistic homogeneity. The chorales were …
The Nottingham Dataset is a collection of 1200 American and British folk songs. Source: [Rethinking Recurrent Latent Variable Model for …
We propose the MusicQA dataset to train Music-enabled question-answering models and is used for training and evaluating our MU-LLaMA model. …
The MUSDB18 is a dataset of 150 full lengths music tracks (~10h duration) of different genres along with their isolated …
MUSDB18-HQ is a high-quality version of the MUSDB18 music tracks dataset. The high-quality dataset consists of the same 150 songs, …
The Synthesized Lakh (Slakh) Dataset is a dataset for audio source separation that is synthesized from the Lakh MIDI Dataset …
The MAESTRO dataset contains over 200 hours of paired audio and MIDI recordings from ten years of International Piano-e-Competition. The …
MAPS – standing for MIDI Aligned Piano Sounds – is a database of MIDI-annotated piano recordings. MAPS has been designed …
MusicNet is a collection of 330 freely-licensed classical music recordings, together with over 1 million annotated labels indicating the precise …
The Synthesized Lakh (Slakh) Dataset is a dataset for audio source separation that is synthesized from the Lakh MIDI Dataset …
URMP (University of Rochester Multi-Modal Musical Performance) is a dataset for facilitating audio-visual analysis of musical performances. The dataset comprises …
The Oxford-BBC Lip Reading Sentences 2 (LRS2) dataset is one of the largest publicly available datasets for lip reading sentences …
LRS3-TED is a multi-modal dataset for visual and audio-visual speech recognition. It includes face tracks from over 400 hours of …