RealMAN

Name: RealMAN
Published: 2024-06-28
License: Unknown

A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization

Dataset Information

Modalities

Audio, Speech

Languages

English, Chinese

Introduced

2024

License

Unknown

Homepage

Official Website

Overview

The Audio Signal and Information Processing Lab at Westlake University, in collaboration with AISHELL, has released the Real-recorded and annotated Microphone Array speech&Noise (RealMAN) dataset, which provides annotated multi-channel speech and noise recordings for dynamic speech enhancement and localization:

Microphone array: A 32-channel array of high-fidelity microphones is used for recording.
Speech source: A loudspeaker plays the source speech signals (about 35 hours of Mandarin speech).
Recording duration and scene: A total of 83.7 hours of speech (about 48.3 hours with a static speaker and 35.4 hours with a moving speaker) was recorded in 32 different scenes, and 144.5 hours of background noise were recorded in 31 different scenes. Both the speech and the noise recording scenes cover a range of common indoor, outdoor, semi-outdoor and transportation environments, which supports training general-purpose speech enhancement and source localization networks.
Annotation: Speaker location is annotated by automatically detecting the loudspeaker in images from an omnidirectional fisheye camera. The target clean speech for speech enhancement is the direct-path signal, obtained by filtering the source speech signal with an estimated direct-path propagation filter.
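
The annotation step above describes the clean target as the source speech filtered by a direct-path propagation filter. The following is a minimal sketch of that idea only, assuming the simplest possible filter: a pure delay plus a single attenuation tap. The function name `apply_direct_path` and the parameters `delay_samples` and `gain` are hypothetical; the filter actually estimated for RealMAN is not specified on this page and is likely more detailed.

```python
def apply_direct_path(source, delay_samples, gain):
    """Convolve `source` with a toy direct-path filter (delay + attenuation).

    Assumption: the direct path is modeled as one delayed, scaled tap.
    This is an illustration, not the dataset's estimated filter.
    """
    # Impulse response: zeros up to the delay, then a single tap of height `gain`.
    impulse_response = [0.0] * delay_samples + [gain]

    # Plain linear convolution, written out so the sketch needs no dependencies.
    out = [0.0] * (len(source) + len(impulse_response) - 1)
    for n, x in enumerate(source):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


# Hypothetical source samples standing in for the played-back speech.
source = [1.0, 0.5, -0.25]
target = apply_direct_path(source, delay_samples=2, gain=0.5)
```

Under this toy model the target is simply the source shifted by the propagation delay and scaled by the path attenuation, with no reverberation or noise components.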

Variants: RealMAN

Associated Benchmarks

This dataset is used in 2 benchmarks:

Speech Enhancement - Metrics: DNSMOS, DNSMOS BAK, DNSMOS OVRL, DNSMOS SIG, PESQ-WB
Automatic Speech Recognition (ASR) - Metrics: CER
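
For readers unfamiliar with the ASR metric listed above, CER (character error rate) is commonly computed as the character-level edit distance between the reference and the hypothesis, divided by the reference length. The sketch below follows that common definition; the function names `edit_distance` and `cer` are illustrative, and the benchmark's own scoring script may normalize text differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Because insertions count as errors, CER can exceed 1.0 when the hypothesis is much longer than the reference.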

Recent Benchmark Submissions

Task	Model	Paper	Date
Speech Enhancement	CleanMel-L-map	CleanMel: Mel-Spectrogram Enhancement for Improving …	2025-02-27
Automatic Speech Recognition (ASR)	CleanMel-L-mask	CleanMel: Mel-Spectrogram Enhancement for Improving …	2025-02-27

Research Papers

Recent papers with results on this dataset:

CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR (2025)

External Links:

RealMAN
