TED-LIUM

Name: TED-LIUM
Published: 2012-05-01
License: CC BY-NC-ND 3.0

Dataset Information

Modalities: Audio
Languages: English
Introduced: 2012
License: CC BY-NC-ND 3.0
Homepage: Official Website

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

The TED-LIUM corpus consists of English-language TED talks with accompanying transcriptions. The audio is sampled at 16 kHz. Depending on the release, the corpus contains between 118 and 452 hours of transcribed speech.

Variants: TED-LIUM, Tedlium

Associated Benchmarks

This dataset is used in 1 benchmark:

Speech Recognition - Metrics: Word Error Rate (WER)
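
Word Error Rate is the word-level edit distance between a reference transcript and a system hypothesis, divided by the number of reference words. The following is a minimal sketch of that calculation, not the scoring tool used by any particular benchmark submission. The function name word_error_rate is hypothetical, and the sketch assumes whitespace tokenization with no text normalization (casing, punctuation), which real evaluations usually apply first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption: both strings are already normalized and are split on
    whitespace. Real scoring pipelines typically normalize text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    # One substitution (sat -> sit) and one deletion (the) over 6 words.
    print(f"WER: {word_error_rate(ref, hyp):.3f}")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.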

Recent Benchmark Submissions

Task	Model	Paper	Date
Speech Recognition	Whisper-LLaMa-7b	HyPoradise: An Open Baseline for …	2023-09-27
Speech Recognition	ConformerXXL-PS	BigSSL: Exploring the Frontier of …	2021-09-27

Research Papers

Recent papers with results on this dataset:

External Links:

TED-LIUM
