AliMeeting

Multi-Channel Multi-Party Meeting Transcription Challenge

Dataset Information
Modalities: Audio
Languages: Chinese
Introduced: 2021

Overview

The AliMeeting corpus consists of 120 hours of recorded Mandarin meeting data. It includes far-field data collected by an 8-channel microphone array as well as near-field data collected by headset microphones. Each meeting session has 2 to 4 speakers, the speaker overlap ratio varies from session to session, and the sessions were recorded in rooms of different sizes.
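To make "speaker overlap ratio" concrete, here is a minimal sketch of one common way to compute it from speaker turn timestamps. The definition used here (overlapped speech time divided by total speech time) is an assumption, not one taken from the corpus documentation, and the function name `overlap_ratio` and the `(start, end)` segment format are hypothetical.

```python
from typing import List, Tuple

# One speaker turn as (start_sec, end_sec); hypothetical format for illustration.
Segment = Tuple[float, float]


def overlap_ratio(segments: List[Segment]) -> float:
    """Fraction of speech time during which two or more speakers are active.

    Assumed definition: overlapped speech time / total speech time.
    """
    # Sweep line: +1 when a turn starts, -1 when it ends.
    events = []
    for start, end in segments:
        if end > start:  # skip empty or inverted turns
            events.append((start, 1))
            events.append((end, -1))
    # Tuples sort by time first; at equal times, ends (-1) come before starts (+1).
    events.sort()

    speech = 0.0      # time with at least one active speaker
    overlapped = 0.0  # time with at least two active speakers
    active = 0
    prev_time = 0.0
    for time, delta in events:
        span = time - prev_time
        if active >= 1:
            speech += span
        if active >= 2:
            overlapped += span
        active += delta
        prev_time = time
    return overlapped / speech if speech > 0 else 0.0


# Two speakers: 0-10 s and 5-15 s overlap for 5 s out of 15 s of speech.
ratio = overlap_ratio([(0.0, 10.0), (5.0, 15.0)])
```

Turns from all speakers in a session go into one list. The sweep counts how many speakers are active at each instant, so the function handles any number of speakers in a session.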

Variants: AliMeeting

Associated Benchmarks

This dataset is used in 1 benchmark:

Recent Benchmark Submissions

Task                 Model  Paper                                              Date
Speaker Diarization  SOND   Speaker Embedding-aware Neural Diarization: an …   2022-03-18

Research Papers

Recent papers with results on this dataset: