SALMon

Dataset Information
Modalities: Audio
Languages: English
Introduced: 2024
License
Homepage

Overview

The SALMon dataset and benchmark were introduced in the paper "A Suite for Acoustic Language Model Evaluation" to evaluate how well speech language models capture different kinds of acoustic elements.

It consists of several sub tasks, each with 200 pairs of recordings: one considered positive and one negative. The positive recording is meant to be the more likely, realistic sample, whereas the negative is made less likely in some specific way.
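The pairwise setup suggests a simple accuracy measure: the fraction of pairs where a model scores the positive recording above the negative one. The sketch below assumes each recording has already been reduced to a single likelihood-style score. The function name and the example scores are hypothetical and not taken from the SALMon release.

```python
from typing import Iterable, Tuple


def pairwise_accuracy(pairs: Iterable[Tuple[float, float]]) -> float:
    """Return the fraction of (positive, negative) score pairs in which
    the positive recording receives a strictly higher score."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no pairs given")
    # A tie counts as a miss, since the positive was not preferred.
    correct = sum(1 for pos, neg in pairs if pos > neg)
    return correct / len(pairs)


# Made-up scores for four pairs (assumption: higher means more likely).
example_scores = [(-10.2, -11.5), (-8.0, -7.5), (-9.1, -9.9), (-12.0, -12.0)]
example_accuracy = pairwise_accuracy(example_scores)
```

Under this measure a model that cannot tell the two recordings apart would land near 0.5, so results are easiest to read against that chance level.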

The sub tasks fall into two main categories: acoustic consistency and semantic-acoustic alignment. In the acoustic consistency sub tasks, the positive sample is a real recording and the negative has the same spoken content but an acoustic feature (e.g. speaker) changes mid-recording. In the alignment sub tasks, the positive recording is one where the text matches the acoustic element (e.g. sentiment) and the negative is one where they do not match.
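Because the sub tasks belong to two categories, per-task results can be summarised per category. The following is a minimal sketch of that grouping. The sub task identifiers, the mapping and the accuracy values are all hypothetical, chosen only to mirror the speaker and sentiment examples above.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict

# Hypothetical sub task identifiers mapped to the two categories.
CATEGORY_OF: Dict[str, str] = {
    "speaker_consistency": "acoustic consistency",
    "sentiment_alignment": "semantic-acoustic alignment",
}


def mean_accuracy_by_category(
    task_accuracy: Dict[str, float],
    category_of: Dict[str, str] = CATEGORY_OF,
) -> Dict[str, float]:
    """Average per-task accuracies within each category."""
    grouped = defaultdict(list)
    for task, accuracy in task_accuracy.items():
        grouped[category_of[task]].append(accuracy)
    return {category: mean(values) for category, values in grouped.items()}


# Made-up accuracies, for illustration only.
example_summary = mean_accuracy_by_category(
    {"speaker_consistency": 0.75, "sentiment_alignment": 0.5}
)
```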

See also the homepage or HuggingFace.

Variants: SALMon

Associated Benchmarks

This dataset is used in 1 benchmark:

  • Language Modelling

Recent Benchmark Submissions

Task                Model              Paper                                                 Date
Language Modelling  LAST 350M          LAST: Language Model Aware Speech …                   2024-09-05
Language Modelling  LAST 1.3B          LAST: Language Model Aware Speech …                   2024-09-05
Language Modelling  Spirit-LM (Expr.)  Spirit LM: Interleaved Spoken and …                   2024-02-08
Language Modelling  Spirit-LM (base)   Spirit LM: Interleaved Spoken and …                   2024-02-08
Language Modelling  TWIST 1.3B         Textually Pretrained Speech Language Models           2023-05-22
Language Modelling  TWIST 7B           Textually Pretrained Speech Language Models           2023-05-22
Language Modelling  TWIST 350M         Textually Pretrained Speech Language Models           2023-05-22
Language Modelling  pGSLM              Text-Free Prosody-Aware Generative Spoken Language …  2021-09-07

Research Papers

Recent papers with results on this dataset: