GigaSpeech is an evolving, multi-domain English speech recognition corpus with 10,000 hours of high-quality labeled audio suitable for supervised training and 40,000 hours of total audio suitable for semi-supervised and unsupervised training.
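As a quick check on the figures in the description, the Python sketch below works out how much of the total audio lies outside the high-quality labeled portion. It assumes, as the wording "total audio" suggests, that the 40,000 total hours include the 10,000 labeled hours. The constant names are illustrative and not part of any GigaSpeech tooling.

```python
# Hours quoted in the GigaSpeech description.
# Assumption: the total figure includes the labeled hours.
LABELED_HOURS = 10_000  # high-quality labeled audio for supervised training
TOTAL_HOURS = 40_000    # all audio, usable for semi-/unsupervised training

# Audio outside the high-quality labeled set.
remaining_hours = TOTAL_HOURS - LABELED_HOURS
# Share of the total audio that is in the labeled set.
labeled_fraction = LABELED_HOURS / TOTAL_HOURS

print(remaining_hours)   # 30000
print(labeled_fraction)  # 0.25
```

Under this assumption, about a quarter of the audio carries high-quality labels. The remaining 30,000 hours are the extra material that semi-supervised or unsupervised methods can draw on.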
Variants: GigaSpeech, GigaSpeech DEV, GigaSpeech TEST
This dataset is used in 1 benchmark:
Task | Model | Paper | Date |
---|---|---|---|
Speech Recognition | Conformer/Transformer-AED | GigaSpeech: An Evolving, Multi-domain ASR … | 2021-06-13 |