LJSpeech

Name: LJSpeech
License: Public domain

The LJ Speech Dataset

Dataset Information

Modalities

Texts, Audio

Languages

English

License

Public domain

Homepage

Official Website: https://keithito.com/LJ-Speech-Dataset/

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

This is a public domain speech dataset consisting of 13,100 short audio clips of a single speaker reading passages from 7 non-fiction books. A transcription is provided for each clip. Clips vary in length from 1 to 10 seconds and have a total length of approximately 24 hours. The texts were published between 1884 and 1964, and are in the public domain. The audio was recorded in 2016-17 by the LibriVox project and is also in the public domain.

Source: The LJ Speech Dataset
Image Source: https://keithito.com/LJ-Speech-Dataset/
Audio Source: https://keithito.com/LJ-Speech-Dataset/

Variants: LJSpeech
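
As a rough illustration of the figures in the overview, the sketch below derives the average clip length implied by 13,100 clips and roughly 24 hours of audio, and pairs clip IDs with their transcriptions. It assumes the pipe-delimited metadata layout (clip ID, transcription, normalized transcription) that the dataset is distributed with; that layout is not described on this page, and the sample rows and function names are hypothetical placeholders, not real dataset entries.

```python
import io

CLIP_COUNT = 13_100   # number of clips, from the overview
TOTAL_HOURS = 24      # approximate total duration, from the overview


def mean_clip_seconds(total_hours: float, clip_count: int) -> float:
    """Average clip duration in seconds implied by the totals."""
    return total_hours * 3600 / clip_count


def parse_metadata(lines):
    """Yield (clip_id, transcription) pairs from pipe-delimited rows.

    Assumed row layout: clip ID | transcription | normalized transcription.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("|")
        yield fields[0], fields[1]


# Hypothetical sample rows, for illustration only.
SAMPLE = (
    "clip-0001|First example sentence.|First example sentence.\n"
    "clip-0002|Dr. Smith arrived in 1884.|Doctor Smith arrived in eighteen eighty-four.\n"
)

rows = list(parse_metadata(io.StringIO(SAMPLE)))
average = round(mean_clip_seconds(TOTAL_HOURS, CLIP_COUNT), 1)
print(rows)
print(average)  # about 6.6 seconds, inside the stated 1 to 10 second range
```

To read an actual copy of the dataset, the same parser could be pointed at an open file handle instead of the in-memory sample.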

Associated Benchmarks

This dataset is used in 2 benchmarks:

Text-To-Speech Synthesis - Metrics: Audio Quality MOS, Pleasantness MOS, Word Error Rate (WER), MOS, WER (%)
Speech Synthesis - Metrics: Mean Opinion Score
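
For readers unfamiliar with the metrics listed above, here is a minimal sketch of how word error rate and a mean opinion score can be computed. It is an assumption-laden illustration rather than the evaluation protocol of any listed paper: WER is taken as word-level edit distance divided by reference length after lowercasing and whitespace splitting, and MOS as the plain average of listener ratings (commonly given on a 1 to 5 scale). The function names and example inputs are hypothetical.

```python
from statistics import mean


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def mean_opinion_score(ratings):
    """Average of individual listener ratings."""
    return mean(ratings)


# One word dropped from a six-word reference gives a WER of 1/6.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
mos = mean_opinion_score([4, 5, 4, 3])
print(f"WER: {wer:.3f}, WER (%): {100 * wer:.1f}, MOS: {mos}")
```

Published results typically differ in text normalization and in how the hypothesis transcript is obtained, so numbers from this sketch are not directly comparable with the leaderboard.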

Recent Benchmark Submissions

Task	Model	Paper	Date
Text-To-Speech Synthesis	Matcha-TTS	Matcha-TTS: A fast TTS architecture …	2023-09-06
Text-To-Speech Synthesis	OverFlow	OverFlow: Putting flows on top …	2022-11-13
Text-To-Speech Synthesis	NaturalSpeech	NaturalSpeech: End-to-End Text to Speech …	2022-05-09
Text-To-Speech Synthesis	FastSpeech 2 + HiFiGAN	NaturalSpeech: End-to-End Text to Speech …	2022-05-09
Text-To-Speech Synthesis	VITS	NaturalSpeech: End-to-End Text to Speech …	2022-05-09
Text-To-Speech Synthesis	FastDiff (4 steps)	FastDiff: A Fast Conditional Diffusion …	2022-04-21
Text-To-Speech Synthesis	FastDiff-TTS	FastDiff: A Fast Conditional Diffusion …	2022-04-21
Speech Synthesis	BDDM vocoder	BDDM: Bilateral Denoising Diffusion Models …	2022-03-25
Speech Synthesis	Neural HMM Ablation with 1 state per phone	Neural HMMs are all you …	2021-08-30
Speech Synthesis	Neural HMM	Neural HMMs are all you …	2021-08-30
Text-To-Speech Synthesis	Grad-TTS + HiFiGAN (1000 steps)	Grad-TTS: A Diffusion Probabilistic Model …	2021-05-13
Speech Synthesis	DiffWave LARGE	DiffWave: A Versatile Diffusion Model …	2020-09-21
Text-To-Speech Synthesis	FastSpeech 2 + HiFiGAN	FastSpeech 2: Fast and High-Quality …	2020-06-08
Text-To-Speech Synthesis	Glow-TTS + HiFiGAN	Glow-TTS: A Generative Flow for …	2020-05-22
Text-To-Speech Synthesis	Flowtron	Flowtron: an Autoregressive Flow-based Generative …	2020-05-12
Text-To-Speech Synthesis	Tacotron 2	Flowtron: an Autoregressive Flow-based Generative …	2020-05-12
Text-To-Speech Synthesis	FastSpeech (Mel + WaveGlow)	FastSpeech: Fast, Robust and Controllable …	2019-05-22
Text-To-Speech Synthesis	Merlin	FastSpeech: Fast, Robust and Controllable …	2019-05-22
Text-To-Speech Synthesis	Transformer TTS (Mel + WaveGlow)	Neural Speech Synthesis with Transformer …	2018-09-19

Research Papers

Recent papers with results on this dataset:
