FSDSoundScapes

Dataset Information
Introduced: 2022

Overview

A synthetic sound-mixture specification dataset for the Target Sound Extraction (TSE) task. Each sample consists of a .jams file specifying the mixture components and a metadata file with the target labels. Mixtures are 6 seconds long and contain 3 to 5 unique foreground sounds over a 6-second background sound. Each sample is provided with 3 target labels, and the sounds corresponding to all target labels are guaranteed to be present in the mixture. Foreground sounds are drawn from FSDKaggle2018 and background sounds from TAU Urban Acoustic Scenes 2019.
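The constraints in this overview can be written down as a small consistency check. The sketch below is illustrative only: `MixtureSpec` and its field names are hypothetical and do not reflect the actual .jams or metadata schema. It also assumes that "unique" means distinct class labels and that target labels are drawn from the foreground classes.

```python
from __future__ import annotations

from dataclasses import dataclass

# Values taken from the overview text.
MIXTURE_SECONDS = 6.0
MIN_FOREGROUND, MAX_FOREGROUND = 3, 5
NUM_TARGETS = 3


@dataclass
class MixtureSpec:
    """Hypothetical in-memory view of one sample (not the on-disk schema)."""
    duration: float
    foreground_labels: list[str]
    background_duration: float
    target_labels: list[str]


def check_sample(spec: MixtureSpec) -> list[str]:
    """Return the violated constraints; an empty list means the sample is consistent."""
    problems = []
    if spec.duration != MIXTURE_SECONDS:
        problems.append("mixture is not 6 seconds long")
    if spec.background_duration != MIXTURE_SECONDS:
        problems.append("background is not 6 seconds long")
    n_fg = len(spec.foreground_labels)
    # Assumption: "unique" is read as distinct class labels.
    if len(set(spec.foreground_labels)) != n_fg:
        problems.append("foreground sounds are not unique")
    if not MIN_FOREGROUND <= n_fg <= MAX_FOREGROUND:
        problems.append("foreground count outside 3 to 5")
    if len(spec.target_labels) != NUM_TARGETS:
        problems.append("expected exactly 3 target labels")
    # Assumption: every target label must name a foreground sound in the mixture.
    if not set(spec.target_labels) <= set(spec.foreground_labels):
        problems.append("a target label is missing from the mixture")
    return problems
```

Calling `check_sample` on a spec built from a parsed sample returns a list of human-readable problems, which makes it easy to log every violated constraint rather than stopping at the first one.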

Split

Train: 50K
Val: 5K
Test: 10K
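As a rough sense of scale, the split sizes can be converted into hours of rendered audio. This is a back-of-the-envelope sketch that assumes "K" means thousands and that every mixture is exactly 6 seconds long. Because the samples are mixture specifications, the figures describe audio once it is synthesized, not audio stored in the dataset.

```python
CLIP_SECONDS = 6  # mixture length stated in the overview
SPLITS = {"train": 50_000, "val": 5_000, "test": 10_000}  # assumes K = 1,000


def split_hours(splits, clip_seconds=CLIP_SECONDS):
    """Map each split name to its total rendered duration in hours."""
    return {name: count * clip_seconds / 3600 for name, count in splits.items()}


hours = split_hours(SPLITS)
total_hours = sum(hours.values())

for name, value in hours.items():
    print(f"{name}: {value:.1f} h")
print(f"total: {total_hours:.1f} h")
```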

Variants: FSDSoundScapes

Associated Benchmarks

This dataset is used in 1 benchmark:

Recent Benchmark Submissions

Task: Target Sound Extraction
Model: Waveformer
Paper: Real-Time Target Sound Extraction
Date: 2022-11-04
