The SALMon dataset and benchmark was introduced in the paper "A Suite for Acoustic Language Model Evaluation", with the goal of evaluating how well speech language models model different kinds of acoustic elements.
It consists of several sub-tasks, each with 200 pairs of recordings: one considered positive and one negative. The positive recording is meant to be the more likely, realistic sample, whereas the negative is made less likely in some specific way.
The sub-tasks fall into two main categories: acoustic consistency and semantic-acoustic alignment. In acoustic consistency, the positive sample is a real recording and the negative has the same spoken content but with an acoustic feature (e.g. speaker) that changes mid-recording. In the alignment sub-tasks, the positive recording is one where the text matches the acoustic element (e.g. sentiment) and the negative is one where they do not match.
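The pairwise setup described above suggests a simple way to score a model on one sub-task: count how often it rates the positive recording as more likely than the negative. The following is a minimal sketch under that assumption, not the benchmark's official scoring code. The function name `pairwise_accuracy`, the choice to count ties as failures, and the example scores are all hypothetical.

```python
from typing import Iterable, Tuple


def pairwise_accuracy(pairs: Iterable[Tuple[float, float]]) -> float:
    """Return the fraction of pairs where the positive score is strictly higher.

    Each item is (positive_score, negative_score), e.g. model log-likelihoods
    for the two recordings in a pair. Ties count as failures (an assumption).
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (positive, negative) score pair")
    correct = sum(1 for pos, neg in pairs if pos > neg)
    return correct / len(pairs)


# Hypothetical log-likelihoods for three pairs; the second pair is a failure
# because the negative recording received the higher score.
example_scores = [(-120.5, -131.2), (-98.0, -97.4), (-210.3, -215.9)]
example_accuracy = pairwise_accuracy(example_scores)
```

With a metric of this form, a model that cannot distinguish the two recordings would be expected to score around 0.5 on a sub-task, since each pair is a binary choice.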
See also the homepage or HuggingFace.
Variants: SALMon
This dataset is used in 1 benchmark:
Task | Model | Paper | Date |
---|---|---|---|
Language Modelling | LAST 350M | LAST: Language Model Aware Speech … | 2024-09-05 |
Language Modelling | LAST 1.3B | LAST: Language Model Aware Speech … | 2024-09-05 |
Language Modelling | Spirit-LM (Expr.) | Spirit LM: Interleaved Spoken and … | 2024-02-08 |
Language Modelling | Spirit-LM (base) | Spirit LM: Interleaved Spoken and … | 2024-02-08 |
Language Modelling | TWIST 1.3B | Textually Pretrained Speech Language Models | 2023-05-22 |
Language Modelling | TWIST 7B | Textually Pretrained Speech Language Models | 2023-05-22 |
Language Modelling | TWIST 350M | Textually Pretrained Speech Language Models | 2023-05-22 |
Language Modelling | pGSLM | Text-Free Prosody-Aware Generative Spoken Language … | 2021-09-07 |
Recent papers with results on this dataset: