The EARS-WHAM dataset mixes speech from the EARS dataset with real noise recordings from the WHAM! dataset. Speech and noise files are mixed at signal-to-noise ratios (SNRs) sampled uniformly at random from [−2.5, 17.5] dB. The SNR is computed using loudness, K-weighted, relative to full scale (LKFS), as standardized in ITU-R BS.1770, which yields a more perceptually meaningful scaling and excludes silent regions from the SNR computation.
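As a rough illustration of this mixing procedure, the Python sketch below scales a noise recording so that the speech-to-noise loudness difference, measured in LKFS via the third-party `pyloudnorm` package (an ITU-R BS.1770 meter whose gating also discards silent regions), matches a randomly drawn SNR. This is a hedged sketch of how such a mixture could be built, not the official EARS-WHAM generation script; the function name `mix_at_lkfs_snr` and the exact gain convention are assumptions.

```python
import numpy as np
import pyloudnorm as pyln  # ITU-R BS.1770 integrated loudness (LKFS/LUFS)

def mix_at_lkfs_snr(speech: np.ndarray, noise: np.ndarray, sr: int, snr_db: float) -> np.ndarray:
    """Mix speech and noise at a target SNR defined on BS.1770 loudness.

    The integrated loudness measurement is gated, so silent regions do not
    contribute to the SNR, mirroring the dataset description above.
    """
    meter = pyln.Meter(sr)
    speech_lkfs = meter.integrated_loudness(speech)
    noise_lkfs = meter.integrated_loudness(noise)
    # Choose a gain (in dB) for the noise so that
    # speech_lkfs - (noise_lkfs + gain_db) == snr_db
    gain_db = speech_lkfs - noise_lkfs - snr_db
    noise_scaled = noise * 10.0 ** (gain_db / 20.0)
    return speech + noise_scaled

# Example: draw an SNR uniformly from [-2.5, 17.5] dB, as in EARS-WHAM
rng = np.random.default_rng(0)
snr_db = rng.uniform(-2.5, 17.5)
```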
Variants: EARS-WHAM
This dataset is used in 1 benchmark:
| Task | Model | Paper | Date |
|---|---|---|---|
| Speech Enhancement | Schrödinger Bridge (PESQ loss) | Investigating Training Objectives for Generative … | 2024-09-16 |
| Speech Enhancement | Schrödinger Bridge | Schrödinger Bridge for Generative Speech … | 2024-07-22 |
| Speech Enhancement | Demucs v4 | Hybrid Transformers for Music Source … | 2022-11-15 |
| Speech Enhancement | SGMSE+ | Speech Enhancement and Dereverberation with … | 2022-08-11 |
| Speech Enhancement | CDiffuSE | Conditional Diffusion Probabilistic Model for … | 2022-02-10 |
| Speech Enhancement | Conv-TasNet | Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude … | 2018-09-20 |