FlyingThings3D is a synthetic dataset for optical flow, disparity and scene flow estimation. It consists of everyday objects flying along …
Music21 is an untrimmed video dataset crawled by keyword query from Youtube. It contains music performances belonging to 21 categories. …
CamVid (Cambridge-driving Labeled Video Database) is a road/driving scene understanding database which was originally captured as five video sequences with …
A stack of 2D grayscale images from a 3D X-ray Computed Tomography (XCT) scan of a glass fiber-reinforced polyamide 66 (GF-PA66) specimen. Usage: 2D/3D image …
WaterScenes: A Multi-Task 4D Radar-Camera Fusion Dataset for Autonomous Driving on Water Surfaces. WaterScenes, the first …
WildScenes is a bi-modal benchmark dataset consisting of multiple large-scale, sequential traversals in natural environments, including semantic annotations in high-resolution …
The xBD dataset contains over 45,000 km² of polygon-labeled pre- and post-disaster imagery. The dataset provides the post-disaster imagery …
CochlScene is a dataset for acoustic scene classification. The dataset consists of 76k samples collected from 831 participants in 13 …
TAU Urban Acoustic Scenes 2019 Mobile development dataset consists of 10-second audio segments from 10 acoustic scenes: airport, indoor shopping …
TAU Urban Acoustic Scenes 2019 development dataset consists of 10-second audio segments from 10 acoustic scenes: airport, indoor shopping mall, …
The TUT Acoustic Scenes 2017 dataset is a collection of recordings from various acoustic scenes all from distinct locations. For …
The dataset for this task is the TUT Urban Acoustic Scenes 2018 dataset, consisting of recordings from various acoustic scenes. …
The Easy Communications (EasyCom) dataset is a world-first dataset designed to help mitigate the cocktail party effect from an augmented-reality …
Audioset is an audio event dataset, which consists of over 2M human-annotated 10-second video clips. These clips are collected from …
CREMA-D is an emotional multimodal actor data set of 7,442 original clips from 91 actors. These clips were from 48 …
This paper introduces the pipeline used to scale the largest dataset in egocentric vision, EPIC-KITCHENS. The effort culminates in EPIC-KITCHENS-100, a …
EPIC-SOUNDS is a large scale dataset of audio annotations capturing temporal extents and class labels within the audio stream of …
The ESC-50 dataset is a labeled collection of 2000 environmental audio recordings suitable for benchmarking methods of environmental sound classification. …
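ESC-50 clips are named `{FOLD}-{CLIP_ID}-{TAKE}-{TARGET}.wav`, so the prescribed 5-fold cross-validation split and class label can be read straight from the filename. A minimal sketch (helper and field names are ours, not part of the dataset):

```python
# Parse ESC-50 filenames of the form {FOLD}-{CLIP_ID}-{TAKE}-{TARGET}.wav,
# per the dataset's documented naming scheme (sketch; field names are ours).
def parse_esc50_name(filename: str) -> dict:
    stem = filename.rsplit(".", 1)[0]
    fold, clip_id, take, target = stem.split("-")
    return {
        "fold": int(fold),      # cross-validation fold, 1-5
        "clip_id": clip_id,     # Freesound source clip ID
        "take": take,           # letter disambiguating takes from one source clip
        "target": int(target),  # class index, 0-49
    }

info = parse_esc50_name("1-100032-A-0.wav")
print(info["fold"], info["target"])   # → 1 0
```

Grouping files by the recovered `fold` field reproduces the official evaluation splits without touching the metadata CSV.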
Freesound Dataset 50k (or FSD50K for short) is an open dataset of human-labeled sound events containing 51,197 Freesound clips unequally …
The Respiratory Sound database was originally compiled to support the scientific challenge organized at the Int. Conf. on Biomedical and Health Informatics …
A large-scale reference dataset for bioacoustics. MeerKAT is a 1068h large-scale dataset containing data from audio-recording collars worn by free-ranging …
Dataset for multimodal skill assessment, focused on assessing a piano player's skill level. Annotations include the player's skill level and song difficulty …
The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) contains 7,356 files (total size: 24.8 GB). The database contains …
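RAVDESS filenames encode their labels as seven two-digit fields (modality, vocal channel, emotion, intensity, statement, repetition, actor), so annotations can be recovered without a separate metadata file. A sketch following the published naming convention (helper names are ours):

```python
# Decode a RAVDESS filename: seven two-digit dash-separated fields
# (modality-vocal_channel-emotion-intensity-statement-repetition-actor),
# per the dataset's published convention. The label maps below are copied
# from that convention; treat this as a sketch, not an official API.
EMOTIONS = {1: "neutral", 2: "calm", 3: "happy", 4: "sad",
            5: "angry", 6: "fearful", 7: "disgust", 8: "surprised"}

def parse_ravdess_name(filename: str) -> dict:
    parts = [int(p) for p in filename.rsplit(".", 1)[0].split("-")]
    modality, channel, emotion, intensity, statement, repetition, actor = parts
    return {
        "vocal_channel": "speech" if channel == 1 else "song",
        "emotion": EMOTIONS[emotion],
        "intensity": "strong" if intensity == 2 else "normal",
        "actor": actor,
        "actor_sex": "male" if actor % 2 == 1 else "female",  # odd IDs are male
    }

print(parse_ravdess_name("03-01-06-01-02-01-12.wav"))
# → fearful speech, normal intensity, actor 12 (female)
```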
The Spiking Heidelberg Digits (SHD) dataset is an audio-based classification dataset of 1k spoken digits ranging from zero to nine …
The SSC dataset is a spiking version of the Speech Commands dataset released by Google (Speech Commands). SSC was generated …
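Spiking datasets like SHD and SSC store each sample as a list of (spike time, input channel) events; a common preprocessing step is binning those events into a dense count tensor before feeding a spiking or recurrent network. A minimal numpy sketch (the 10 ms bin width is an illustrative choice; 700 channels matches SHD's cochlear-model input):

```python
import numpy as np

# Bin (spike time, input channel) events into a dense [time_bins x channels]
# count tensor, a typical first step for SHD/SSC-style spike data.
# Bin width is an illustrative hyperparameter, not a dataset constant.
def bin_spikes(times, units, duration, n_channels, bin_ms=10.0):
    bin_s = bin_ms / 1000.0
    n_bins = int(np.ceil(duration / bin_s))
    dense = np.zeros((n_bins, n_channels), dtype=np.int32)
    # Clip events landing exactly at `duration` into the last bin.
    bins = np.minimum((np.asarray(times) / bin_s).astype(int), n_bins - 1)
    np.add.at(dense, (bins, np.asarray(units)), 1)   # accumulate spike counts
    return dense

x = bin_spikes(times=[0.001, 0.004, 0.012], units=[5, 5, 7],
               duration=1.0, n_channels=700)
print(x.shape, x[0, 5], x[1, 7])   # → (100, 700) 2 1
```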
Speech Commands is an audio dataset of spoken words designed to help train and evaluate keyword spotting systems.
The UCR Time Series Archive, introduced in 2002, has become an important resource in the time series data mining …
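The archive's canonical baseline is 1-nearest-neighbour classification of z-normalized series under Euclidean distance. A minimal numpy sketch (real use would read the archive's tab-separated train/test files; the toy series below are ours):

```python
import numpy as np

# Canonical UCR baseline: z-normalize each series, then classify with
# 1-nearest-neighbour under Euclidean distance (sketch only).
def znorm(a):
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=-1, keepdims=True)) / (a.std(axis=-1, keepdims=True) + 1e-8)

def one_nn(train_x, train_y, test_x):
    tr, te = znorm(train_x), znorm(test_x)
    # Pairwise squared Euclidean distances, shape [n_test, n_train].
    d = ((te[:, None, :] - tr[None, :, :]) ** 2).sum(-1)
    return np.asarray(train_y)[d.argmin(axis=1)]

train = [[0, 1, 0, 1], [5, 5, 0, 0]]
pred = one_nn(train, ["wave", "step"], [[0, 2, 0, 2]])
print(pred)   # → ['wave']
```

Z-normalization makes the match invariant to offset and amplitude, which is why `[0, 2, 0, 2]` lands on the oscillating class rather than the step.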
VocalSound is a free dataset consisting of 21,024 crowdsourced recordings of laughter, sighs, coughs, throat clearing, sneezes, and sniffs from …
AudioCaps is a dataset of sounds with event descriptions that was introduced for the task of audio captioning, with sounds …
Audioset is an audio event dataset, which consists of over 2M human-annotated 10-second video clips. These clips are collected from …
DSD100 is a dataset of 100 full-length music tracks of different styles, along with their isolated drums, …
Audioset is an audio event dataset, which consists of over 2M human-annotated 10-second video clips. These clips are collected from …
AudioCaps is a dataset of sounds with event descriptions that was introduced for the task of audio captioning, with sounds …
Clotho is an audio captioning dataset, consisting of 4981 audio samples, and each audio sample has five captions (a total …
ASAP is a dataset of 222 digital musical scores aligned with 1068 performances (more than 92 hours) of Western classical …
This data set includes beat and bar annotations of the ballroom dataset, introduced by Gouyon et al. [1]. [1] Gouyon …
This dataset includes the beat and downbeat annotations for Beatles albums. The annotations are provided by M. E. P. Davies …
35 recordings of Candombe music with beat and downbeat annotations.
The gtzan8 audio dataset contains 1000 tracks, each 30 seconds long. There are 10 genres, each containing 100 tracks, which …
The Groove MIDI Dataset (GMD) is composed of 13.6 hours of aligned MIDI and (synthesized) audio of human-performed, tempo-aligned expressive …
GuitarSet is a dataset of high-quality guitar recordings and rich annotations. It contains 360 excerpts, each 30 seconds in length. The …
J. Hockman, M. E. Davies, and I. Fujinaga, “One in the jungle: Downbeat detection in hardcore, jungle, and drum and …
S. W. Hainsworth and M. D. Macleod, “Particle filtering applied to musical tempo tracking,” EURASIP Journal on Advances in Signal …
Beats, downbeats, and functional structural annotations for 912 Pop tracks. Nieto, O., McCallum, M., Davies, M., Robertson, A., Stark, A., …
V. Eremenko, E. Demirel, B. Bozkurt, and X. Serra, “Audio-aligned jazz harmony dataset for automatic chord transcription and corpus-based research,” in …
F. Gouyon, “A computational approach to rhythm description — Audio features for the computation of rhythm periodicity functions and their …
A. Holzapfel, M. E. Davies, J. R. Zapata, J. L. Oliveira, and F. Gouyon, “Selective sampling for beat tracking evaluation,” …
J. Driedger, H. Schreiber, W. B. de Haas, and M. Müller, “Towards automatically correcting tapped beat annotations for music recordings.” …
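Beat annotation sets like those above are typically scored with an F-measure that counts an estimated beat as correct when it falls within a tolerance window of an annotated beat (±70 ms is the commonly used default, not something these datasets mandate). A minimal sketch of the usual greedy one-to-one matching:

```python
# Beat-tracking F-measure with a +/-70 ms tolerance window; greedy
# one-to-one matching of estimates to annotations (a sketch of the common
# evaluation scheme, not a reference implementation).
def beat_f_measure(estimates, annotations, tol=0.07):
    annotations = sorted(annotations)
    used = [False] * len(annotations)   # each annotation may match once
    hits = 0
    for e in sorted(estimates):
        for i, a in enumerate(annotations):
            if not used[i] and abs(e - a) <= tol:
                used[i] = True
                hits += 1
                break
    precision = hits / len(estimates) if estimates else 0.0
    recall = hits / len(annotations) if annotations else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Third estimate is 100 ms off, outside the window: 2 of 3 beats hit.
print(beat_f_measure([0.51, 1.02, 1.60], [0.5, 1.0, 1.5]))   # → 0.666...
```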
Data Set Information: Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records …
In an effort to catalog insect biodiversity, we propose a new large dataset of hand-labelled insect images, the BIOSCAN-1M Insect …
The purpose of this dataset was to study gender bias in occupations. Online biographies, written in English, were collected to …
BoolQ is a question answering dataset for yes/no questions containing 15942 examples. These questions are naturally occurring – they are …
This dataset is a combination of the following three datasets: figshare, the SARTAJ dataset, and Br35H. This dataset contains 7022 …
The quality of AI-generated images has rapidly increased, leading to concerns of authenticity and trustworthiness. CIFAKE is a dataset that …
The CIFAR-100 dataset (Canadian Institute for Advanced Research, 100 classes) is a subset of the Tiny Images dataset and consists …
Common corruptions dataset for CIFAR10
Contains hundreds of frontal view X-rays and is the largest public resource for COVID-19 image and prognostic data, making it …
Data was collected for normal bearings, single-point drive end and fan end defects. Data was collected at 12,000 samples/second and …
The normal chest X-ray (left panel) depicts clear lungs without any areas of abnormal opacification in the image. Bacterial pneumonia …
We construct the ForgeryNet dataset, an extremely large face forgery dataset with unified annotations in image- and video-level data across …
A public data set of walking full-body kinematics and kinetics in individuals with Parkinson’s disease
HOWS-CL-25 (Household Objects Within Simulation dataset for Continual Learning) is a synthetic dataset especially designed for object classification on mobile …
The HRF dataset is a dataset for retinal vessel segmentation which comprises 45 images and is organized as 15 subsets. …
The IRFL dataset consists of idioms, similes, and metaphors with matching figurative and literal images, as well as two novel …
The goal of ISIC 2019 is to classify dermoscopic images among nine diagnostic categories. 25,331 images are available for training across …
This dataset was presented as part of the ICLR 2023 paper “A framework for benchmarking Class-out-of-distribution detection and its application …
In this work, we introduce the In-Diagram Logic (InDL) dataset, an innovative resource crafted to rigorously evaluate the …
This data set comprises 22 fundus images with their corresponding manual annotations for the blood vessels, separated as arteries and …
The Liver-US dataset is a comprehensive collection of high-quality ultrasound images of the liver, including both normal and abnormal cases. …
The minimalist histopathology image analysis dataset (MHIST) is a binary classification dataset of 3,152 fixed-size images of colorectal polyps, each …
The process by which sections in a document are demarcated and labeled is known as section identification. Such sections are …
The MixedWM38 dataset (WaferMap) has more than 38,000 wafer maps, including 1 normal pattern, 8 single defect patterns, and 29 mixed defect …
Early detection of retinal diseases is one of the most important means of preventing partial or permanent blindness in patients. …
A large real-world event-based dataset for object classification. Source: HATS: Histograms of Averaged Time Surfaces for Robust Event-based Object Classification
The N-ImageNet dataset is an event-camera counterpart for the ImageNet dataset. The dataset is obtained by moving an event camera …
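Event-camera datasets such as these deliver streams of (x, y, timestamp, polarity) events rather than frames; a common classification baseline accumulates the events into a two-channel count image that a conventional CNN can consume. A hedged numpy sketch (resolution and channel ordering are illustrative choices):

```python
import numpy as np

# Accumulate raw event-camera events (x, y, polarity) into a two-channel
# event-count frame, a common first step for classifying event streams with
# conventional CNNs. Channel 0 holds OFF events, channel 1 ON events.
def events_to_frame(xs, ys, ps, width, height):
    frame = np.zeros((2, height, width), dtype=np.int32)
    # ps must be 0/1 polarities; np.add.at handles repeated pixel hits.
    np.add.at(frame, (np.asarray(ps), np.asarray(ys), np.asarray(xs)), 1)
    return frame

f = events_to_frame(xs=[3, 3, 7], ys=[2, 2, 5], ps=[1, 1, 0],
                    width=16, height=16)
print(f.shape, f[1, 2, 3], f[0, 5, 7])   # → (2, 16, 16) 2 1
```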
RITE (Retinal Images vessel Tree Extraction) is a database that enables comparative studies on segmentation or classification of arteries …
The RSSCN7 dataset contains satellite images acquired from Google Earth, originally collected for remote sensing scene classification. We …
The Recognizing Textual Entailment (RTE) datasets come from a series of textual entailment challenges. Data from RTE1, RTE2, RTE3 and …
The Schema-Guided Dialogue (SGD) dataset consists of over 20k annotated multi-domain, task-oriented conversations between a human and a virtual assistant. …
This dataset is based on the Spiking Heidelberg Digits (SHD) dataset. Sample inputs consist of two spike encoded digits sampled …
The SPOTS-10 dataset is an extensive collection of grayscale images showcasing diverse patterns found in ten animal species. Specifically, SPOTS-10 …
The Stanford Sentiment Treebank is a corpus with fully labeled parse trees that allows for a complete analysis of the …
Sentiment140 is a dataset that allows you to discover the sentiment of a brand, product, or topic on Twitter. Source: …
This dataset consists of computer-generated images for gas leakage segmentation. It features diverse backgrounds, interfering foreground objects, and precise ground …
arXiv: https://arxiv.org/abs/2304.11708. Accepted at the 29th International Congress on Sound and Vibration (ICSV29). The drone has been used for various …
Table-ACM12K (TACM12K) is a relational table dataset derived from the ACM heterogeneous graph dataset. It includes four tables: papers, authors, …
Table-LastFm2K (TLF2K) is a relational table dataset derived from the classical LastFM2K dataset. It contains three tables: artists, user_artists, and …
Table-MovieLens1M (TML1M) is a relational table dataset derived from the classical MovieLens1M dataset. It consists of three tables: users, movies, …
The Winograd Schema Challenge was introduced both as an alternative to the Turing Test and as a test of a …
WiC is a benchmark for the evaluation of context-sensitive word embeddings. WiC is framed as a binary classification task. Each …
Enlarges the dataset to study how the image background affects computer vision ML models, covering the following topics: Blur Background …
The quality of AI-generated images has rapidly increased, leading to concerns of authenticity and trustworthiness. CIFAKE is a dataset that …
The DFDC (Deepfake Detection Challenge) is a dataset for deepfake detection consisting of more than 100,000 videos. The DFDC dataset …
FaceForensics is a video dataset consisting of more than 500,000 frames containing faces from 1004 videos that can be used …
FaceForensics++ is a forensics dataset consisting of 1000 original video sequences that have been manipulated with four automated face manipulation …
FakeAVCeleb is a novel Audio-Video Deepfake dataset that not only contains deepfake videos but respective synthesized cloned audios as well. …
Localized Audio Visual DeepFake Dataset (LAV-DF). Paper: Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method …
This CSTR VCTK Corpus includes speech data uttered by 110 English speakers with various accents. Each speaker reads out about …
ASAP is a dataset of 222 digital musical scores aligned with 1068 performances (more than 92 hours) of Western classical …
This data set includes beat and bar annotations of the ballroom dataset, introduced by Gouyon et al. [1]. [1] Gouyon …
This dataset includes the beat and downbeat annotations for Beatles albums. The annotations are provided by M. E. P. Davies …
35 recordings of Candombe music with beat and downbeat annotations.
The gtzan8 audio dataset contains 1000 tracks, each 30 seconds long. There are 10 genres, each containing 100 tracks, which …
The Groove MIDI Dataset (GMD) is composed of 13.6 hours of aligned MIDI and (synthesized) audio of human-performed, tempo-aligned expressive …
GuitarSet is a dataset of high-quality guitar recordings and rich annotations. It contains 360 excerpts, each 30 seconds in length. The …
J. Hockman, M. E. Davies, and I. Fujinaga, “One in the jungle: Downbeat detection in hardcore, jungle, and drum and …
S. W. Hainsworth and M. D. Macleod, “Particle filtering applied to musical tempo tracking,” EURASIP Journal on Advances in Signal …
Beats, downbeats, and functional structural annotations for 912 Pop tracks. Nieto, O., McCallum, M., Davies, M., Robertson, A., Stark, A., …
V. Eremenko, E. Demirel, B. Bozkurt, and X. Serra, “Audio-aligned jazz harmony dataset for automatic chord transcription and corpus-based research,” in …
J. Driedger, H. Schreiber, W. B. de Haas, and M. Müller, “Towards automatically correcting tapped beat annotations for music recordings.” …
The EMOTIC dataset, named after EMOTions In Context, is a database of images with people in real environments, annotated with …
1000 songs have been selected from the Free Music Archive (FMA). The excerpts which were annotated are available in the same …
Fer2013 contains approximately 30,000 grayscale facial images of different expressions, with size restricted to 48×48, and the main labels of …
The MSP-Podcast corpus contains speech segments from podcast recordings which are perceptually annotated using crowdsourcing. The collection of this corpus …
The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) contains 7,356 files (total size: 24.8 GB). The database contains …
The SEED dataset contains subjects' EEG signals recorded while they were watching film clips. The film clips are carefully selected so …
The ESC-50 dataset is a labeled collection of 2000 environmental audio recordings suitable for benchmarking methods of environmental sound classification. …
Freesound Dataset 50k (or FSD50K for short) is an open dataset of human-labeled sound events containing 51,197 Freesound clips unequally …
Urban Sound 8K is an audio dataset that contains 8732 labeled sound excerpts (<=4s) of urban sounds from 10 classes: …
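UrbanSound8K filenames follow `[fsID]-[classID]-[occurrenceID]-[sliceID].wav` (the bundled metadata CSV carries the same fields). A sketch of recovering the class label from a filename (helper and field names are ours; the class list follows the dataset's 0-9 class IDs):

```python
# UrbanSound8K filenames follow [fsID]-[classID]-[occurrenceID]-[sliceID].wav,
# per the dataset README. Class order below matches the dataset's class IDs.
URBANSOUND8K_CLASSES = [
    "air_conditioner", "car_horn", "children_playing", "dog_bark", "drilling",
    "engine_idling", "gun_shot", "jackhammer", "siren", "street_music",
]

def parse_us8k_name(filename: str) -> dict:
    fs_id, class_id, occurrence, slice_id = filename.rsplit(".", 1)[0].split("-")
    return {
        "freesound_id": int(fs_id),          # source recording on Freesound
        "class_id": int(class_id),           # 0-9
        "label": URBANSOUND8K_CLASSES[int(class_id)],
        "occurrence": int(occurrence),       # sound occurrence within the recording
        "slice": int(slice_id),              # excerpt taken from that occurrence
    }

print(parse_us8k_name("100032-3-0-0.wav")["label"])   # → dog_bark
```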
CaseHOLD (Case Holdings On Legal Decisions) is a law dataset comprising over 53,000 multiple choice questions to identify the …
The Describable Textures Dataset (DTD) contains 5640 texture images in the wild. They are annotated with human-centric attributes inspired by …
Eurosat is a dataset and deep learning benchmark for land use and land cover classification. The dataset is based on …
We built a large lung CT scan dataset for COVID-19 by curating data from 7 public datasets listed in the …
MR Movie Reviews is a dataset for use in sentiment-analysis experiments. Available are collections of movie-review documents labeled with respect …
Microsoft Research Paraphrase Corpus (MRPC) is a corpus consisting of 5,801 sentence pairs collected from newswire articles. Each pair is …
MedConceptsQA is an open-source medical concepts QA benchmark. The benchmark can be found here: https://huggingface.co/datasets/ofir408/MedConceptsQA
The MedNLI dataset consists of sentence pairs developed by physicians from the Past Medical History section of MIMIC-III clinical …
The task of PubMedQA is to answer research questions with yes/no/maybe (e.g.: Do preoperative statins reduce atrial fibrillation after coronary …
The Scene UNderstanding (SUN) database contains 899 categories and 130,519 images. There are 397 well-sampled categories to evaluate numerous state-of-the-art …
The Stanford Cars dataset consists of 196 classes of cars with a total of 16,185 images, taken from the rear. …
UCF101 dataset is an extension of UCF50 and consists of 13,320 video clips, which are classified into 101 categories. These …
NSynth is a dataset of one-shot instrumental notes, containing 305,979 musical notes with unique pitch, timbre and envelope. The …
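Each NSynth note carries a MIDI pitch label; mapping that label to a fundamental frequency uses the standard equal-temperament relation f = 440 · 2^((m − 69) / 12), with MIDI note 69 pinned to A4 = 440 Hz. A small sketch for working with the pitch annotations:

```python
# Convert a MIDI pitch label (as used in NSynth annotations) to a
# fundamental frequency in Hz via the standard equal-temperament relation,
# anchored at A4 (MIDI 69) = 440 Hz.
def midi_to_hz(midi: int) -> float:
    return 440.0 * 2.0 ** ((midi - 69) / 12.0)

print(round(midi_to_hz(69), 1), round(midi_to_hz(60), 1))   # → 440.0 261.6
```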
OpenMIC-2018 is an instrument recognition dataset containing 20,000 examples of Creative Commons-licensed music available on the Free Music Archive. Each …
Automatic language identification is a challenging problem. Discriminating between closely related languages is especially difficult. This paper presents a machine-learning …
OpenSubtitles is a collection of multilingual parallel corpora. The dataset is compiled from a large database of movie and TV subtitles …
The Universal Dependencies (UD) project seeks to develop cross-linguistically consistent treebank annotation of morphology and syntax for multiple languages. The …
VoxForge is an open speech dataset that was set up to collect transcribed speech for use with Free and Open …
The Respiratory Sound database was originally compiled to support the scientific challenge organized at the Int. Conf. on Biomedical and Health Informatics …
The Song Describer Dataset (SDD) contains ~1.1k captions for 706 permissively licensed music recordings. It is designed for use in …
This data set includes beat and bar annotations of the ballroom dataset, introduced by Gouyon et al. [1]. [1] Gouyon …
The gtzan8 audio dataset contains 1000 tracks, each 30 seconds long. There are 10 genres, each containing 100 tracks, which …
This dataset contains 200 famous songs in different genres (mostly in rock) and the beats and downbeat annotations are provided …
DESED is a dataset for recognizing sound event classes in domestic environments. The dataset is designed to …
L3DAS21 is a dataset for 3D audio signal processing. It consists of a 65 hours 3D audio corpus, accompanied with …
WildDESED is an extension of the original DESED dataset, created to reflect various domestic scenarios by incorporating complex and unpredictable …
L3DAS21 is a dataset for 3D audio signal processing. It consists of a 65 hours 3D audio corpus, accompanied with …
The PodcastFillers dataset consists of 199 full-length podcast episodes in English with manually annotated filler words and automatically generated transcripts. …
The RWCP Sound Scene Database includes non-speech sounds recorded in an anechoic room, reconstructed signals in various rooms, impulse responses …
The Sony-TAu Realistic Spatial Soundscapes 2022 (STARSS22) dataset consists of recordings of real scenes captured with a high-channel-count spherical microphone array …
The TAU-NIGENS Spatial Sound Events 2021 dataset contains multiple spatial sound-scene recordings, consisting of sound events of distinct categories integrated …
The DNS Challenge at INTERSPEECH 2020 was intended to promote collaborative research in single-channel speech enhancement aimed at maximizing the perceptual …
The EARS-WHAM dataset mixes speech from the EARS dataset with real noise recordings from the WHAM! dataset. Speech and noise …
The Easy Communications (EasyCom) dataset is a world-first dataset designed to help mitigate the cocktail party effect from an augmented-reality …
The Audio Signal and Information Processing Lab at Westlake University, in collaboration with AISHELL, has released the Real-recorded and annotated …
Uses the same clean speech as VoiceBank+DEMAND but more noise types. Features much lower SNRs ([−10, −5, 0, 5, 10, 15, …
VoiceBank+DEMAND is a noisy speech database for training speech enhancement algorithms and TTS models. The database was designed to train …
The WSJ0 Hipster Ambient Mixtures (WHAM!) dataset pairs each two-speaker mixture in the wsj0-2mix dataset with a unique noise background …
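Noisy-speech corpora like these are constructed by scaling a noise recording so the resulting mixture hits a chosen signal-to-noise ratio. A minimal numpy sketch of that rescaling step (an illustration of the general recipe, not any dataset's official generation script):

```python
import numpy as np

# Mix clean speech with noise at a target SNR (dB) by rescaling the noise:
# scale is chosen so that P_speech / (scale^2 * P_noise) = 10^(snr/10).
def mix_at_snr(speech, noise, snr_db):
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    p_s = np.mean(speech ** 2)
    p_n = np.mean(noise ** 2)
    scale = np.sqrt(p_s / (p_n * 10.0 ** (snr_db / 10.0)))
    return speech + scale * noise

rng = np.random.default_rng(0)
s = rng.standard_normal(16000)
n = rng.standard_normal(16000)
mix = mix_at_snr(s, n, snr_db=5.0)
# Verify: the achieved SNR of the scaled noise against the speech is 5 dB.
scaled_noise = mix - s
print(round(10 * np.log10(np.mean(s**2) / np.mean(scaled_noise**2)), 2))  # → 5.0
```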
AISHELL-1 is a corpus for speech recognition research and building speech recognition systems for Mandarin. Source: [AISHELL-1: An Open-Source Mandarin …
AISHELL-2 contains 1000 hours of clean read-speech data recorded on iOS devices and is free for academic usage. Source: [AISHELL-2: Transforming Mandarin ASR …
Common Voice is an audio dataset that consists of unique MP3 files with corresponding text files. There are 9,283 recorded …
The Easy Communications (EasyCom) dataset is a world-first dataset designed to help mitigate the cocktail party effect from an augmented-reality …
GigaSpeech, an evolving, multi-domain English speech recognition corpus with 10,000 hours of high quality labeled audio suitable for supervised training, …
This noisy speech test set is created from Google Speech Commands v2 [1] and the Musan dataset [2]. It could …
The Oxford-BBC Lip Reading Sentences 2 (LRS2) dataset is one of the largest publicly available datasets for lip reading sentences …
LRS3-TED is a multi-modal dataset for visual and audio-visual speech recognition. It includes face tracks from over 400 hours of …
Continuous speech separation (CSS) is an approach to handling overlapped speech in conversational audio signals. A real recorded dataset, called …
MediaSpeech is a media speech dataset (you might have guessed this) built with the purpose of testing Automated Speech Recognition …
Spoken Language Understanding Evaluation (SLUE) is a suite of benchmark tasks for spoken language understanding evaluation. It consists of limited-size …
SPGISpeech (pronounced “speegie-speech”) is a large-scale transcription dataset, freely available for academic research. SPGISpeech is a collection of 5,000 hours …
Speech Commands is an audio dataset of spoken words designed to help train and evaluate keyword spotting systems.
The TED-LIUM corpus consists of English-language TED talks. It includes transcriptions of these talks. The audio is sampled at 16kHz. …
The TIMIT Acoustic-Phonetic Continuous Speech Corpus is a standard dataset used for evaluation of automatic speech recognition systems. It consists …
Overall duration per microphone: about 36 hours (31 hrs train / 2.5 hrs dev / 2.5 hrs test). Count of …
We introduced a Vietnamese speech recognition dataset in the medical domain comprising 16h of labeled medical speech, 1000h of unlabeled …
WenetSpeech is a multi-domain Mandarin corpus consisting of 10,000+ hours high-quality labeled speech, 2,400+ hours weakly labelled speech, and about …
The English data for voice building was obtained, prepared and provided to the challenge by Lessac Technologies Inc., having originally …
This is a public domain speech dataset consisting of 13,100 short audio clips of a single speaker reading passages from …
LibriTTS is a multi-speaker English corpus of approximately 585 hours of read English speech at 24kHz sampling rate, prepared by …
AudioCaps is a dataset of sounds with event descriptions that was introduced for the task of audio captioning, with sounds …
Audioset is an audio event dataset, which consists of over 2M human-annotated 10-second video clips. These clips are collected from …
A synthetic sound mixture specification dataset for the Target Sound Extraction (TSE) task. Dataset samples consist of a .jams file …
AudioCaps is a dataset of sounds with event descriptions that was introduced for the task of audio captioning, with sounds …
Clotho is an audio captioning dataset, consisting of 4981 audio samples, and each audio sample has five captions (a total …
We propose Localized Narratives, a new form of multimodal image annotations connecting vision and language. We ask annotators to describe …
We introduce a new audio dataset called SoundDescs that can be used for tasks such as text to audio retrieval, …
This is a public domain speech dataset consisting of 13,100 short audio clips of a single speaker reading passages from …
Trinity Gesture Dataset includes 23 takes, totalling 244 minutes of motion capture and audio of a male native English speaker …
The MusicBench dataset is a music audio-text pair dataset that was designed for text-to-music generation purpose and released along with …
MusicCaps is a dataset composed of 5.5k music-text pairs, with rich text descriptions provided by human experts. For each 10-second …
This CSTR VCTK Corpus includes speech data uttered by 110 English speakers with various accents. Each speaker reads out about …