EARS-WHAM

Dataset Information
Modalities: Speech
Languages: English
Introduced: 2024

Overview

The EARS-WHAM dataset mixes speech from the EARS dataset with real noise recordings from the WHAM! dataset. Speech and noise files are mixed at signal-to-noise ratios (SNRs) sampled at random from the range [−2.5, 17.5] dB. The SNR is computed from loudness, K-weighted, relative to full scale (LKFS), as standardized in ITU-R BS.1770. This gives a more perceptually meaningful scaling and excludes silent regions from the SNR computation.
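The mixing step described above can be sketched as follows. This is a minimal illustration, not the dataset's actual generation script: it substitutes a plain RMS level in dB for the BS.1770 loudness measurement, so it applies neither K-weighting nor gating of silent regions. The names `rms_level_db`, `mix_at_snr`, and the `level_db` parameter are hypothetical, and the signals are synthetic noise standing in for real speech and noise files.

```python
import math
import random

# SNR range in dB, as stated in the dataset description.
SNR_RANGE_DB = (-2.5, 17.5)


def rms_level_db(samples):
    """Stand-in level measure: RMS level in dB.

    Assumption: this is NOT BS.1770 loudness (no K-weighting, no gating).
    """
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(max(mean_square, 1e-24))


def mix_at_snr(speech, noise, snr_db, level_db=rms_level_db):
    """Scale `noise` so that level(speech) - level(noise) == snr_db, then add.

    Assumes `noise` is at least as long as `speech`; extra noise is truncated.
    """
    noise = noise[: len(speech)]
    gain_db = level_db(speech) - level_db(noise) - snr_db
    gain = 10.0 ** (gain_db / 20.0)
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise


# Synthetic placeholder signals (one second at an assumed 16 kHz rate).
rng = random.Random(0)
speech = [0.1 * rng.gauss(0.0, 1.0) for _ in range(16000)]
noise = [0.3 * rng.gauss(0.0, 1.0) for _ in range(16000)]

# Draw one SNR uniformly from the stated range and build the mixture.
snr_db = rng.uniform(*SNR_RANGE_DB)
mixture, scaled_noise = mix_at_snr(speech, noise, snr_db)

# The level difference after scaling should match the drawn SNR.
achieved_snr_db = rms_level_db(speech) - rms_level_db(scaled_noise)
```

To move closer to the described procedure, a BS.1770 loudness meter could be passed in as the `level_db` callable; the gain computation itself would stay the same.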

Variants: EARS-WHAM

Associated Benchmarks

This dataset is used in 1 benchmark:

Recent Benchmark Submissions

Task | Model | Paper | Date
Speech Enhancement | Schrödinger Bridge (PESQ loss) | Investigating Training Objectives for Generative … | 2024-09-16
Speech Enhancement | Schrödinger Bridge | Schrödinger Bridge for Generative Speech … | 2024-07-22
Speech Enhancement | Demucs v4 | Hybrid Transformers for Music Source … | 2022-11-15
Speech Enhancement | SGMSE+ | Speech Enhancement and Dereverberation with … | 2022-08-11
Speech Enhancement | CDiffuSE | Conditional Diffusion Probabilistic Model for … | 2022-02-10
Speech Enhancement | Conv-TasNet | Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude … | 2018-09-20

Research Papers

Recent papers with results on this dataset: