TED-LIUM

Dataset Information
Modalities: Audio
Languages: English
Introduced: 2012

Overview

The TED-LIUM corpus consists of English-language TED talks and their transcriptions, with audio sampled at 16 kHz. Depending on the release, it contains between 118 and 452 hours of transcribed speech.
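
To give a feel for what 16 kHz and 118 to 452 hours mean in practice, the sketch below converts hours of speech into sample counts and a rough raw storage size. The 16 kHz rate and the hour figures come from the overview. Treating the audio as uncompressed mono 16-bit PCM is an assumption made only for the size estimate, and the helper names are hypothetical. The hour figures refer to transcribed speech, so the actual audio files may be larger.

```python
SAMPLE_RATE_HZ = 16_000  # TED-LIUM sample rate, as stated in the overview
SECONDS_PER_HOUR = 3_600
BYTES_PER_SAMPLE = 2     # assumption: mono 16-bit PCM, uncompressed


def total_samples(hours: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of mono audio samples in `hours` of audio (hypothetical helper)."""
    return round(hours * SECONDS_PER_HOUR * sample_rate_hz)


def pcm16_gigabytes(hours: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Estimated raw size in GB (10**9 bytes) under the mono 16-bit PCM assumption."""
    return total_samples(hours, sample_rate_hz) * BYTES_PER_SAMPLE / 1e9


# Smallest and largest amounts of transcribed speech mentioned in the overview.
for hours in (118, 452):
    print(f"{hours} h -> {total_samples(hours):,} samples, "
          f"~{pcm16_gigabytes(hours):.1f} GB as 16-bit PCM")
```

Running it prints roughly 6.8 billion samples (about 13.6 GB) for 118 hours and roughly 26 billion samples (about 52.1 GB) for 452 hours. These are back-of-the-envelope figures, not the size of any official download.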

Variants: TED-LIUM, Tedlium

Associated Benchmarks

This dataset is used in 1 benchmark.

Recent Benchmark Submissions

Task                 Model              Paper                                     Date
Speech Recognition   Whisper-LLaMa-7b   HyPoradise: An Open Baseline for …        2023-09-27
Speech Recognition   ConformerXXL-PS    BigSSL: Exploring the Frontier of …       2021-09-27

Research Papers

Recent papers with results on this dataset: