GigaSpeech

Dataset Information
Modalities: Audio, Speech
Languages: English
Introduced: 2021
License: Unknown

Overview

GigaSpeech is an evolving, multi-domain English speech recognition corpus with 10,000 hours of high-quality labeled audio suitable for supervised training and 40,000 hours of total audio suitable for semi-supervised and unsupervised training.

Variants: GigaSpeech, GigaSpeech DEV, GigaSpeech TEST

Associated Benchmarks

This dataset is used in 1 benchmark:

Recent Benchmark Submissions

Task | Model | Paper | Date
Speech Recognition | Conformer/Transformer-AED | GigaSpeech: An Evolving, Multi-domain ASR … | 2021-06-13

Research Papers

Recent papers with results on this dataset: