VocalSound is a free dataset consisting of 21,024 crowdsourced recordings of laughter, sighs, coughs, throat clearing, sneezes, and sniffs from 3,365 unique subjects. The VocalSound dataset also contains meta-information such as speaker age, gender, native language, country, and health condition.
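As a rough illustration of how the six sound classes and the per-speaker meta-information might be represented in code, here is a minimal Python sketch. The `Recording` class, its field names, and the `class_counts` helper are hypothetical and chosen only for illustration; they are not the dataset's actual file layout or schema, which should be taken from the official release.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional

# The six sound classes named in the dataset description.
# The exact label strings used in the released files are an assumption here.
VOCAL_SOUND_CLASSES = (
    "laughter",
    "sigh",
    "cough",
    "throat clearing",
    "sneeze",
    "sniff",
)
CLASS_TO_INDEX = {name: i for i, name in enumerate(VOCAL_SOUND_CLASSES)}


@dataclass(frozen=True)
class Recording:
    """Hypothetical record: one recording plus the speaker's meta-information."""

    subject_id: str
    label: str
    age: Optional[int] = None
    gender: Optional[str] = None
    native_language: Optional[str] = None
    country: Optional[str] = None
    health_condition: Optional[str] = None

    def label_index(self) -> int:
        # Map the class name to an integer target for a classifier.
        return CLASS_TO_INDEX[self.label]


def class_counts(recordings: Iterable[Recording]) -> Counter:
    """Count how many recordings fall into each sound class."""
    return Counter(r.label for r in recordings)
```

With a list of such records, `class_counts(records)` gives a quick check of class balance before training an audio classifier, and `label_index()` supplies the integer target for each recording.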
Variants: VocalSound
This dataset is used in 1 benchmark:
| Task | Model | Paper | Date |
|---|---|---|---|
| Audio Classification | Qwen-Audio | Qwen-Audio: Advancing Universal Audio Understanding … | 2023-11-14 |
| Audio Classification | VocalSound Baseline | Vocalsound: A Dataset for Improving … | 2022-05-06 |