BERSt

Name: BERSt
Published: 2025-04-30
License: Creative Commons Attribution 4.0

Basic Emotion Random phrase Shouts

Dataset Information

Modalities

Audio

Languages

English

Introduced

2025

Overview

BERSt Dataset

We release the BERSt Dataset for various speech recognition tasks, including Automatic Speech Recognition (ASR) and Speech Emotion Recognition (SER).

Overview

4526 single phrase recordings (~3.75h)
98 professional actors
19 phone positions
7 emotion classes
3 vocal intensity levels
varied regional and non-native English accents
nonsense phrases covering all English phonemes

Data collection

The BERSt dataset was collected in home environments using various smartphone microphones (the phone model is available as metadata). Participants were located around the globe and represent varying regional accents in English: UK, Canada, USA (multi-state) and Australia. A subset of the data comes from non-native English speakers, whose first languages include French, Russian and Hindi, among others. The data includes 13 nonsense phrases, for use cases that must be robust to linguistic context and high surprisal. Participants were prompted to speak, raise their voice and shout each phrase while moving their phone to various distances and locations in their home, as well as with various obstructions to the microphone, e.g. in a backpack.

Baseline results of various state-of-the-art methods for ASR and SER show that this dataset remains a challenging task, and we encourage researchers to use this data to fine-tune and benchmark their models in these difficult conditions, which represent possible real-world situations.

Affect annotations are those provided to the actors; they have not been validated through perception. The speech annotations, however, have been checked and adjusted for mistakes in the speech.

Data splits and organisation

For each phone position and phrase, the actors provided a single recording covering the three vocal intensity levels. These raw audio files are available.

Metadata in CSV format corresponds to the files split per utterance, with noise and silence before and after the speech removed. These files are found inside clean_clips for each data split.

We provide a test, train and validation split.

There is no speaker cross-over between splits: the test and validation sets each contain 10 speakers not seen in the training set.
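
The speaker-disjoint property described above can be checked from the metadata. The following is a minimal sketch, assuming the metadata rows expose a speaker identifier and a split label; the column names actor_id and split, and the toy rows, are hypothetical and not taken from the dataset, so adjust them to the real CSV headers.

```python
import csv
import io
from collections import defaultdict


def speakers_by_split(rows, speaker_col="actor_id", split_col="split"):
    """Group speaker identifiers by the split they appear in.

    The column names are assumptions for illustration only.
    """
    groups = defaultdict(set)
    for row in rows:
        groups[row[split_col]].add(row[speaker_col])
    return dict(groups)


def overlapping_speakers(groups):
    """Return {(split_a, split_b): shared speakers} for every overlapping pair."""
    names = sorted(groups)
    overlaps = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            shared = groups[a] & groups[b]
            if shared:
                overlaps[(a, b)] = shared
    return overlaps


# Toy metadata standing in for the real CSV files (hypothetical rows).
toy_csv = io.StringIO(
    "file,actor_id,split\n"
    "a.wav,spk01,train\n"
    "b.wav,spk02,train\n"
    "c.wav,spk03,validation\n"
    "d.wav,spk04,test\n"
)
groups = speakers_by_split(csv.DictReader(toy_csv))

# An empty dict means no speaker appears in more than one split.
print(overlapping_speakers(groups))
```

With real data, the rows from each split's metadata file would be passed in instead of the toy CSV; any non-empty result would indicate speaker leakage between splits.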

Metadata Details

Actor count
    98
Gender counts
    Woman: 61
    Man: 34
    Non-Binary: 1
    Prefer not to disclose: 2
Current daily language counts
    English: 95
    Norwegian: 1
    Russian: 1
    French: 1
First language counts
    English: 75
    Non English: 23
        Spanish: 6
        French: 3
        Portuguese: 3
        Chinese: 2
        Norwegian: 1
        Mandarin: 1
        Tagalog: 1
        Italian: 1
        Hungarian: 1
        Russian: 1
        Hindi: 1
        Swahili: 1
        Croatian: 1

Pre-split data counts

Emotion counts
    fear: 236
    neutral: 234
    disgust: 232
    joy: 224
    anger: 223
    surprise: 210
    sadness: 201
Distance counts
    Near body: 627
    1-2m away: 324
    Other side of room: 316
    Outside of room: 293
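
As a rough reference point for SER results, the pre-split emotion counts listed above give trivial baselines. The sketch below uses only the numbers from this page; the variable names are illustrative, and because these are pre-split counts, the per-split figures will differ somewhat.

```python
# Pre-split counts copied from the listing above.
emotion_counts = {
    "fear": 236,
    "neutral": 234,
    "disgust": 232,
    "joy": 224,
    "anger": 223,
    "surprise": 210,
    "sadness": 201,
}
distance_counts = {
    "Near body": 627,
    "1-2m away": 324,
    "Other side of room": 316,
    "Outside of room": 293,
}

total = sum(emotion_counts.values())

# Accuracy of always predicting the most frequent emotion.
majority_label, majority_count = max(emotion_counts.items(), key=lambda kv: kv[1])
majority_baseline = majority_count / total

# Accuracy expected from guessing uniformly among the seven classes.
uniform_chance = 1 / len(emotion_counts)

print(total, sum(distance_counts.values()))                 # 1560 1560
print(majority_label, f"{majority_baseline:.3f}")           # fear 0.151
print(f"{uniform_chance:.3f}")                              # 0.143
```

The classes are close to balanced, so the majority-class baseline sits only slightly above uniform chance.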

Variants: BERSt

Associated Benchmarks

This dataset is used in 1 benchmark:

Speech Emotion Recognition - Metrics: Unweighted Accuracy (UA), Weighted Accuracy (WA)
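
This page does not define the two metrics. The sketch below assumes the convention common in the SER literature, where Weighted Accuracy (WA) is the overall fraction of correct predictions and Unweighted Accuracy (UA) is the mean of the per-class recalls; the function names and toy labels are illustrative only.

```python
from collections import Counter


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by all predictions."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean per-class recall, so every class counts equally."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy example: three "anger" clips (two correct) and one "joy" clip (correct).
y_true = ["anger", "anger", "anger", "joy"]
y_pred = ["anger", "anger", "joy", "joy"]

print(weighted_accuracy(y_true, y_pred))              # 0.75
print(round(unweighted_accuracy(y_true, y_pred), 3))  # 0.833
```

The two values diverge whenever classes are imbalanced, which is why both are usually reported together.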

Recent Benchmark Submissions

Task	Model	Paper	Date
Speech Emotion Recognition	DAWN-hidden-SVM	BERSting at the Screams: A …	2025-04-30
Speech Emotion Recognition	Wav2Small-VAD-SVM	BERSting at the Screams: A …	2025-04-30
Speech Emotion Recognition	Speechbrain Wav2Vec2	BERSting at the Screams: A …	2025-04-30

Research Papers

Recent papers with results on this dataset:

BERSting at the Screams: A Benchmark for Distanced, Emotional and Shouted Speech Recognition (2025)
