MediaSpeech

Dataset Information
Modalities
Speech
Languages
French, Arabic, Turkish, Spanish
Introduced
2021
License
Homepage

Overview

MediaSpeech is a media speech dataset built for benchmarking the performance of Automatic Speech Recognition (ASR) systems. It consists of short speech segments automatically extracted from media videos available on YouTube and manually transcribed, with some pre- and post-processing. The dataset contains 10 hours of speech per language. This release covers French, Arabic, Turkish and Spanish, and is part of a larger private dataset.

Source: MediaSpeech: Multilanguage ASR Benchmark and Dataset

Variants: MediaSpeech
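
Because each language release is essentially paired audio and transcripts, a small loader is enough to iterate over it. The sketch below is a minimal example, assuming the archives unpack into one directory per language containing FLAC audio files with same-named .txt transcripts; the directory names and file extensions here are assumptions, not guaranteed by the release.

    from pathlib import Path

    import soundfile as sf

    # Assumed layout (not guaranteed by the release): one directory per language,
    # e.g. MediaSpeech/FR, holding FLAC audio files with same-named .txt transcripts.
    DATA_ROOT = Path("MediaSpeech/FR")

    def iter_utterances(root):
        """Yield (waveform, sample_rate, transcript) for each audio/transcript pair."""
        for audio_path in sorted(root.glob("*.flac")):
            transcript_path = audio_path.with_suffix(".txt")
            if not transcript_path.exists():
                continue  # skip audio without a matching transcript
            waveform, sample_rate = sf.read(str(audio_path))
            transcript = transcript_path.read_text(encoding="utf-8").strip()
            yield waveform, sample_rate, transcript

    if __name__ == "__main__":
        total_seconds = 0.0
        for waveform, sample_rate, _ in iter_utterances(DATA_ROOT):
            total_seconds += len(waveform) / sample_rate
        # Each language split should come out close to the stated 10 hours.
        print(f"{total_seconds / 3600:.2f} hours of speech")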

Associated Benchmarks

This dataset is used in 1 benchmark: Speech Recognition.

Recent Benchmark Submissions

Task                 Model        Paper                                            Date
Speech Recognition   Quartznet    MediaSpeech: Multilanguage ASR Benchmark and …   2021-03-30
Speech Recognition   Wit          MediaSpeech: Multilanguage ASR Benchmark and …   2021-03-30
Speech Recognition   Azure        MediaSpeech: Multilanguage ASR Benchmark and …   2021-03-30
Speech Recognition   VOSK         MediaSpeech: Multilanguage ASR Benchmark and …   2021-03-30
Speech Recognition   Google       MediaSpeech: Multilanguage ASR Benchmark and …   2021-03-30
Speech Recognition   wav2vec      MediaSpeech: Multilanguage ASR Benchmark and …   2021-03-30
Speech Recognition   Deepspeech   MediaSpeech: Multilanguage ASR Benchmark and …   2021-03-30
Speech Recognition   Silero       MediaSpeech: Multilanguage ASR Benchmark and …   2021-03-30
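
Submissions to an ASR benchmark like this are typically compared by word error rate (WER). A minimal scoring loop might look like the sketch below, which pairs a hypothetical transcribe callable with the reference transcripts (reusing the iter_utterances helper from the loader sketch above) and scores it with the jiwer package; the model interface is a placeholder, not part of the dataset.

    import jiwer

    def evaluate_wer(transcribe, utterances):
        """Corpus-level WER of a transcription callable over (waveform, sample_rate, text) tuples."""
        references, hypotheses = [], []
        for waveform, sample_rate, reference in utterances:
            hypotheses.append(transcribe(waveform, sample_rate))  # placeholder ASR call
            references.append(reference)
        return jiwer.wer(references, hypotheses)

    # Usage (hypothetical model object):
    # wer = evaluate_wer(my_model.transcribe, iter_utterances(DATA_ROOT))
    # print(f"WER: {wer:.3f}")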

Research Papers

Recent papers with results on this dataset: