BERSt

Basic Emotion Random phrase Shouts

Dataset Information
Modalities
Audio
Languages
English
Introduced
2025
License
Creative Commons Attribution 4.0
Homepage

Overview

BERSt Dataset

We release the BERSt Dataset for various speech recognition tasks, including Automatic Speech Recognition (ASR) and Speech Emotion Recognition (SER).
Overview

4526 single phrase recordings (~3.75h)
98 professional actors
19 phone positions
7 emotion classes
3 vocal intensity levels
varied regional and non-native English accents
nonsense phrases covering all English phonemes

Data collection

The BERSt dataset was collected in home environments using a variety of smartphone microphones (the phone model is available as metadata). Participants were located around the globe and represent varying regional accents in English (UK, Canada, multi-state USA, Australia), including a subset of non-native English speakers whose first languages include French, Russian, Hindi etc. The data includes 13 nonsense phrases, for use cases that must be robust to linguistic context and high surprisal. Participants were prompted to speak, raise their voice and shout each phrase while moving their phone to various distances and locations in their home, as well as with various obstructions to the microphone, e.g. in a backpack.

Baseline results from various state-of-the-art methods for ASR and SER show that this dataset remains challenging, and we encourage researchers to use this data to fine-tune and benchmark their models in these difficult conditions, which represent possible real-world situations.

The affect annotations are those provided to the actors; they have not been validated through perception studies. The speech annotations, however, have been checked and adjusted for mistakes in the speech.
Data splits and organisation

For each phone position and phrase, the actors provided a single recording covering the three vocal intensity levels; these raw audio files are available.

Metadata in CSV format corresponds to the files split per utterance, with noise and silence before and after the speech removed; these clips can be found inside clean_clips for each data split.

We provide train, validation and test splits.

There is no speaker cross-over between splits; the test and validation sets each contain 10 speakers not seen in the training set.
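The speaker-disjoint property described above can be verified directly from the per-utterance metadata. Below is a minimal sketch in Python; the column names (`speaker_id`, `split`) and the toy rows are assumptions for illustration only and may differ from the actual CSV files.

```python
# Hypothetical sketch: verify that no speaker appears in more than one
# data split. Field names ("speaker_id", "split") are assumed, not
# taken from the BERSt metadata schema.

def check_speaker_disjoint(rows):
    """Group speaker IDs by split and verify the splits share no speakers."""
    by_split = {}
    for row in rows:
        by_split.setdefault(row["split"], set()).add(row["speaker_id"])
    splits = list(by_split)
    for i, a in enumerate(splits):
        for b in splits[i + 1:]:
            if by_split[a] & by_split[b]:  # non-empty intersection = cross-over
                return False
    return True

# Toy metadata standing in for the real per-utterance CSV rows
rows = [
    {"speaker_id": "s01", "split": "train"},
    {"speaker_id": "s02", "split": "train"},
    {"speaker_id": "s03", "split": "validation"},
    {"speaker_id": "s04", "split": "test"},
]
print(check_speaker_disjoint(rows))  # True: no speaker cross-over
```

In practice the rows would come from reading each split's metadata CSV (e.g. with `csv.DictReader`) rather than being constructed inline.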

Metadata Details

Actor count
    98
Gender counts
    Woman: 61
    Man: 34
    Non-Binary: 1
    Prefer not to disclose: 2
Current daily language counts
    English: 95
    Norwegian: 1
    Russian: 1
    French: 1
First language counts
    English: 75
    Non English: 23
        Spanish: 6
        French: 3
        Portuguese: 3
        Chinese: 2
        Norwegian: 1
        Mandarin: 1
        Tagalog: 1
        Italian: 1
        Hungarian: 1
        Russian: 1
        Hindi: 1
        Swahili: 1
        Croatian: 1

Pre-split Data counts
Emotion counts
fear: 236
neutral: 234
disgust: 232
joy: 224
anger: 223
surprise: 210
sadness: 201
Distance counts
Near body: 627
1-2m away: 324
Other side of room: 316
Outside of room: 293

Variants: BERSt

Associated Benchmarks

This dataset is used in 1 benchmark:

Recent Benchmark Submissions

Task Model Paper Date
Speech Emotion Recognition DAWN-hidden-SVM BERSting at the Screams: A … 2025-04-30
Speech Emotion Recognition Wav2Small-VAD-SVM BERSting at the Screams: A … 2025-04-30
Speech Emotion Recognition Speechbrain Wav2Vec2 BERSting at the Screams: A … 2025-04-30

Research Papers

Recent papers with results on this dataset: