The TED-LIUM corpus consists of English-language TED talks together with their transcriptions. The audio is sampled at 16 kHz. Depending on the release, the corpus contains between 118 and 452 hours of transcribed speech.
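To give a rough sense of scale, the sketch below converts the stated durations into sample counts and an approximate uncompressed size. Only the 16 kHz sample rate and the 118 and 452 hour figures come from the description above; the 16-bit sample width and single channel are assumptions made for illustration, and the helper name `audio_footprint` is hypothetical. The actual size on disk depends on the file format and compression used by each release.

```python
SAMPLE_RATE_HZ = 16_000  # from the corpus description
BYTES_PER_SAMPLE = 2     # assumption: 16-bit PCM samples
CHANNELS = 1             # assumption: mono audio


def audio_footprint(hours: float) -> tuple[int, int]:
    """Return (sample_count, size_in_bytes) for `hours` of uncompressed audio."""
    seconds = hours * 3600
    samples = int(seconds * SAMPLE_RATE_HZ * CHANNELS)
    return samples, samples * BYTES_PER_SAMPLE


# Lower and upper bounds of the stated range of transcribed speech.
for hours in (118, 452):
    samples, size = audio_footprint(hours)
    print(f"{hours} h: {samples:,} samples, about {size / 1e9:.1f} GB uncompressed")
```

Under these assumptions the smallest and largest figures correspond to roughly 13.6 GB and 52.1 GB of raw audio, which is a useful order-of-magnitude check when planning storage or preprocessing.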
Variants: TED-LIUM, Tedlium
This dataset is used in 1 benchmark:
Task | Model | Paper | Date |
---|---|---|---|
Speech Recognition | Whisper-LLaMa-7b | HyPoradise: An Open Baseline for … | 2023-09-27 |
Speech Recognition | ConformerXXL-PS | BigSSL: Exploring the Frontier of … | 2021-09-27 |
Recent papers with results on this dataset: