RAVDESS

Ryerson Audio-Visual Database of Emotional Speech and Song

Dataset Information
Modalities: Videos, Audio, Speech
Languages: English
Introduced: 2020
Homepage:

Overview

The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) contains 7,356 files (total size: 24.8 GB). It features 24 professional actors (12 female, 12 male) vocalizing two lexically matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions; song includes calm, happy, sad, angry, and fearful expressions. Each expression is produced at two levels of emotional intensity (normal, strong), and there is an additional neutral expression. All conditions are available in three modality formats: audio-only (16-bit, 48 kHz, .wav), audio-video (720p H.264, AAC 48 kHz, .mp4), and video-only (no sound). Note that there are no song files for Actor_18.
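The file count above can be cross-checked from the conditions listed in the overview. The sketch below is illustrative arithmetic only, and it relies on one assumption that the overview does not state: that each actor records every statement twice (`REPETITIONS = 2`). All constant and function names are hypothetical.

```python
# Conditions taken from the overview paragraph.
SPEECH_EMOTIONS = ["calm", "happy", "sad", "angry", "fearful", "surprise", "disgust"]
SONG_EMOTIONS = ["calm", "happy", "sad", "angry", "fearful"]
INTENSITIES = 2          # normal, strong
STATEMENTS = 2           # two lexically matched statements
MODALITIES = 3           # audio-only, audio-video, video-only
ACTORS = 24
ACTORS_WITHOUT_SONG = 1  # Actor_18 has no song files

# Assumption (not stated in the overview): every trial is recorded twice.
REPETITIONS = 2


def trials_per_actor(emotions):
    """Trials one actor records in one vocal channel, for a single modality."""
    # Each emotion at two intensities, plus one neutral expression.
    expressions = len(emotions) * INTENSITIES + 1
    return expressions * STATEMENTS * REPETITIONS


speech_files = trials_per_actor(SPEECH_EMOTIONS) * ACTORS * MODALITIES
song_files = (
    trials_per_actor(SONG_EMOTIONS) * (ACTORS - ACTORS_WITHOUT_SONG) * MODALITIES
)
total_files = speech_files + song_files

print(speech_files, song_files, total_files)
```

Under that assumption the speech and song counts sum to the 7,356 files quoted in the overview, which suggests the condition list is complete as described.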

Paper: The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English
Source:

Variants: RAVDESS

Associated Benchmarks

This dataset is used in 5 benchmarks:

Recent Benchmark Submissions

Task | Model | Paper | Date
Emotion Recognition | MultiMAE-DER | MultiMAE-DER: Multimodal Masked Autoencoder for … | 2024-04-28
Audio Classification | ASM-RH-A | Mixer is more than just … | 2024-02-28
Speech Emotion Recognition | VQ-MAE-S-12 (Frame) + Query2Emo | A vector quantized masked autoencoder … | 2023-04-21
Emotion Recognition | Intermediate-Attention-Fusion | Self-attention fusion for audiovisual emotion … | 2022-01-26

Research Papers

Recent papers with results on this dataset: