Basic Emotion Random phrase Shouts
BERSt Dataset
We release the BERSt Dataset for various speech recognition tasks, including Automatic Speech Recognition (ASR) and Speech Emotion Recognition (SER).
Overview
4526 single phrase recordings (~3.75h)
98 professional actors
19 phone positions
7 emotion classes
3 vocal intensity levels
varied regional and non-native English accents
nonsense phrases covering all English Phonemes
Data collection
The BERSt dataset was collected in home environments using various smartphone microphones (the phone model is available as metadata). Participants were located around the globe and represent varying regional accents of English: UK, Canada, USA (multi-state) and Australia. A subset of the data comes from non-native English speakers, including French, Russian and Hindi speakers, among others. The data includes 13 nonsense phrases, for use cases that must be robust to linguistic context and high surprisal. Participants were prompted to speak, raise their voice and shout each phrase while moving their phone to various distances and locations in their home, as well as with various obstructions to the microphone, e.g. in a backpack.
Baseline results of various state-of-the-art methods for ASR and SER show that this dataset remains a challenging task. We encourage researchers to use this data to fine-tune and benchmark their models in these difficult conditions, which represent possible real-world situations.
Affect annotations are those provided to the actors; they have not been validated through perception. The speech annotations, however, have been checked and adjusted to account for mistakes in the speech.
Data splits and organisation
For each phone position and phrase, the actors provided a single recording covering the three vocal intensity levels. These raw audio files are available.
Metadata in CSV format corresponds to the files split per utterance, with noise and silence before and after speech removed. These clips are found inside clean_clips for each data split.
We provide train, validation and test splits.
There is no speaker cross-over between splits: the test and validation sets each contain 10 speakers not seen in the training set.
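The no-cross-over property can be verified directly from the metadata. The following is a minimal sketch, not the authors' tooling: the column names `split` and `actor_id` are assumptions made for illustration (the actual CSV headers may differ), and the rows shown are toy data rather than real BERSt entries.

```python
from itertools import combinations


def speakers_by_split(rows, split_key="split", speaker_key="actor_id"):
    """Group speaker IDs by split name.

    `split_key` and `speaker_key` are hypothetical column names;
    adjust them to match the real metadata CSV.
    """
    groups = {}
    for row in rows:
        groups.setdefault(row[split_key], set()).add(row[speaker_key])
    return groups


def overlapping_speakers(groups):
    """Return {(split_a, split_b): shared_ids} for each pair of splits that shares speakers."""
    overlaps = {}
    for a, b in combinations(sorted(groups), 2):
        shared = groups[a] & groups[b]
        if shared:
            overlaps[(a, b)] = shared
    return overlaps


# Toy metadata rows standing in for rows read with csv.DictReader.
rows = [
    {"split": "train", "actor_id": "a01"},
    {"split": "train", "actor_id": "a02"},
    {"split": "validation", "actor_id": "a03"},
    {"split": "test", "actor_id": "a04"},
]

# An empty dict means no speaker appears in more than one split.
print(overlapping_speakers(speakers_by_split(rows)))
```

With real data, the `rows` list would be replaced by the rows of each split's metadata file, tagged with the split they came from.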
Metadata Details
Actor count
98
Gender counts
Woman: 61
Man: 34
Non-Binary: 1
Prefer not to disclose: 2
Current daily language counts
English: 95
Norwegian: 1
Russian: 1
French: 1
First language counts
English: 75
Non-English: 23
Spanish: 6
French: 3
Portuguese: 3
Chinese: 2
Norwegian: 1
Mandarin: 1
Tagalog: 1
Italian: 1
Hungarian: 1
Russian: 1
Hindi: 1
Swahili: 1
Croatian: 1
Pre-split Data counts
Emotion counts
fear: 236
neutral: 234
disgust: 232
joy: 224
anger: 223
surprise: 210
sadness: 201
Distance counts
Near body: 627
1-2m away: 324
Other side of room: 316
Outside of room: 293
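As a quick consistency check, the emotion and distance tallies listed above can be summed and compared. The sketch below simply hard-codes the published numbers; the variable names are illustrative.

```python
# Pre-split counts copied from the lists above.
emotion_counts = {
    "fear": 236,
    "neutral": 234,
    "disgust": 232,
    "joy": 224,
    "anger": 223,
    "surprise": 210,
    "sadness": 201,
}
distance_counts = {
    "Near body": 627,
    "1-2m away": 324,
    "Other side of room": 316,
    "Outside of room": 293,
}

emotion_total = sum(emotion_counts.values())
distance_total = sum(distance_counts.values())

# Both tallies should describe the same set of recordings.
print(emotion_total, distance_total)  # 1560 1560

# Share of each emotion class, to gauge class balance.
for name, count in emotion_counts.items():
    print(f"{name}: {count / emotion_total:.1%}")
```

Both totals come to 1560, so the two breakdowns are consistent with each other, and the emotion classes are roughly balanced.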
This dataset is used in 1 benchmark:
Task | Model | Paper | Date |
---|---|---|---|
Speech Emotion Recognition | DAWN-hidden-SVM | BERSting at the Screams: A … | 2025-04-30 |
Speech Emotion Recognition | Wav2Small-VAD-SVM | BERSting at the Screams: A … | 2025-04-30 |
Speech Emotion Recognition | Speechbrain Wav2Vec2 | BERSting at the Screams: A … | 2025-04-30 |