LJSpeech

Text-To-Speech Synthesis Benchmark

Performance Over Time

Showing 15 results | Metric: Audio Quality MOS

Top Performing Models

Rank	Model	Paper	Audio Quality MOS	Date	Code
1	NaturalSpeech	NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality	4.56	2022-05-09	microsoft/NeuralSpeech, daniilrobnikov/vits2, heatz123/naturalspeech
2	VITS	NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality	4.43	2022-05-09	microsoft/NeuralSpeech, daniilrobnikov/vits2, heatz123/naturalspeech
3	Grad-TTS + HiFiGAN (1000 steps)	Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech	4.37	2021-05-13	huawei-noah/Speech-Backbones, keonlee9420/DiffGAN-TTS, keonlee9420/DiffSinger
4	Glow-TTS + HiFiGAN	Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search	4.34	2020-05-22	coqui-ai/TTS, jaywalnut310/glow-tts, supertone-inc/super-monotonic-align
5	FastSpeech 2 + HiFiGAN	NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality	4.34	2022-05-09	microsoft/NeuralSpeech, daniilrobnikov/vits2, heatz123/naturalspeech
6	FastSpeech 2 + HiFiGAN	FastSpeech 2: Fast and High-Quality End-to-End Text to Speech	4.32	2020-06-08	coqui-ai/TTS, PaddlePaddle/PaddleSpeech, TensorSpeech/TensorflowTTS
7	FastDiff (4 steps)	FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis	4.28	2022-04-21	Rongjiehuang/ProDiff, Rongjiehuang/FastDiff
8	FastDiff-TTS	FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis	4.03	2022-04-21	Rongjiehuang/ProDiff, Rongjiehuang/FastDiff
9	Transformer TTS (Mel + WaveGlow)	Neural Speech Synthesis with Transformer Network	3.88	2018-09-19	PaddlePaddle/PaddleSpeech, as-ideas/TransformerTTS, soobinseo/transformer-tts
10	FastSpeech (Mel + WaveGlow)	FastSpeech: Fast, Robust and Controllable Text to Speech	3.84	2019-05-22	coqui-ai/TTS, PaddlePaddle/PaddleSpeech, ming024/FastSpeech2

All Papers (15)

Paper	Year	Model	Code
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality	2022	NaturalSpeech	microsoft/NeuralSpeech, daniilrobnikov/vits2, heatz123/naturalspeech
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality	2022	VITS	microsoft/NeuralSpeech, daniilrobnikov/vits2, heatz123/naturalspeech
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech	2021	Grad-TTS + HiFiGAN (1000 steps)	huawei-noah/Speech-Backbones, keonlee9420/DiffGAN-TTS
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search	2020	Glow-TTS + HiFiGAN	coqui-ai/TTS, jaywalnut310/glow-tts
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality	2022	FastSpeech 2 + HiFiGAN	microsoft/NeuralSpeech, daniilrobnikov/vits2, heatz123/naturalspeech
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech	2020	FastSpeech 2 + HiFiGAN	coqui-ai/TTS, PaddlePaddle/PaddleSpeech
FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis	2022	FastDiff (4 steps)	Rongjiehuang/ProDiff, Rongjiehuang/FastDiff
FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis	2022	FastDiff-TTS	Rongjiehuang/ProDiff, Rongjiehuang/FastDiff
Neural Speech Synthesis with Transformer Network	2018	Transformer TTS (Mel + WaveGlow)	PaddlePaddle/PaddleSpeech, as-ideas/TransformerTTS
FastSpeech: Fast, Robust and Controllable Text to Speech	2019	FastSpeech (Mel + WaveGlow)	coqui-ai/TTS, PaddlePaddle/PaddleSpeech
Matcha-TTS: A fast TTS architecture with conditional flow matching	2023	Matcha-TTS	shivammehta25/Matcha-TTS
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis	2020	Flowtron	NVIDIA/flowtron, NVIDIA/radtts, KathyReid/opensource-voice-tools
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis	2020	Tacotron 2	NVIDIA/flowtron, NVIDIA/radtts, KathyReid/opensource-voice-tools
OverFlow: Putting flows on top of neural transducers for better TTS	2022	OverFlow	coqui-ai/TTS, shivammehta25/OverFlow
FastSpeech: Fast, Robust and Controllable Text to Speech	2019	Merlin	coqui-ai/TTS, PaddlePaddle/PaddleSpeech

Model	Paper	Audio Quality MOS	Date
NaturalSpeech	NaturalSpeech: End-to-End Text to Speech Synthesi…	4.56	2022-05-09
VITS	NaturalSpeech: End-to-End Text to Speech Synthesi…	4.43	2022-05-09
Grad-TTS + HiFiGAN (1000 steps)	Grad-TTS: A Diffusion Probabilistic Model for Tex…	4.37	2021-05-13
Glow-TTS + HiFiGAN	Glow-TTS: A Generative Flow for Text-to-Speech vi…	4.34	2020-05-22
FastSpeech 2 + HiFiGAN	NaturalSpeech: End-to-End Text to Speech Synthesi…	4.34	2022-05-09
FastSpeech 2 + HiFiGAN	FastSpeech 2: Fast and High-Quality End-to-End Te…	4.32	2020-06-08
FastDiff (4 steps)	FastDiff: A Fast Conditional Diffusion Model for …	4.28	2022-04-21
FastDiff-TTS	FastDiff: A Fast Conditional Diffusion Model for …	4.03	2022-04-21
Transformer TTS (Mel + WaveGlow)	Neural Speech Synthesis with Transformer Network	3.88	2018-09-19
FastSpeech (Mel + WaveGlow)	FastSpeech: Fast, Robust and Controllable Text to…	3.84	2019-05-22
Matcha-TTS	Matcha-TTS: A fast TTS architecture with conditio…	3.84	2023-09-06
Flowtron	Flowtron: an Autoregressive Flow-based Generative…	3.67	2020-05-12
Tacotron 2	Flowtron: an Autoregressive Flow-based Generative…	3.52	2020-05-12
OverFlow	OverFlow: Putting flows on top of neural transduc…	3.37	2022-11-13
Merlin	FastSpeech: Fast, Robust and Controllable Text to…	2.40	2019-05-22
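
The "Performance Over Time" view can be approximated from the 15 rows above by keeping only the entries that set a new best MOS in date order. The following is a minimal sketch, not the wiki's own charting logic: the RESULTS list is copied by hand from the table, and running_best is a hypothetical helper name introduced here for illustration. Two caveats apply. The dates are those of the reporting paper as listed in the table, so a baseline such as VITS carries the date of the paper that evaluated it. MOS values from different papers also come from different listening tests, so the comparison is indicative at best.

```python
from datetime import date

# (model, Audio Quality MOS, date) rows copied by hand from the table above.
RESULTS = [
    ("NaturalSpeech", 4.56, date(2022, 5, 9)),
    ("VITS", 4.43, date(2022, 5, 9)),
    ("Grad-TTS + HiFiGAN (1000 steps)", 4.37, date(2021, 5, 13)),
    ("Glow-TTS + HiFiGAN", 4.34, date(2020, 5, 22)),
    ("FastSpeech 2 + HiFiGAN", 4.34, date(2022, 5, 9)),
    ("FastSpeech 2 + HiFiGAN", 4.32, date(2020, 6, 8)),
    ("FastDiff (4 steps)", 4.28, date(2022, 4, 21)),
    ("FastDiff-TTS", 4.03, date(2022, 4, 21)),
    ("Transformer TTS (Mel + WaveGlow)", 3.88, date(2018, 9, 19)),
    ("FastSpeech (Mel + WaveGlow)", 3.84, date(2019, 5, 22)),
    ("Matcha-TTS", 3.84, date(2023, 9, 6)),
    ("Flowtron", 3.67, date(2020, 5, 12)),
    ("Tacotron 2", 3.52, date(2020, 5, 12)),
    ("OverFlow", 3.37, date(2022, 11, 13)),
    ("Merlin", 2.40, date(2019, 5, 22)),
]


def running_best(results):
    """Return the entries that set a new best MOS, in date order.

    Hypothetical helper for illustration; not part of the wiki.
    """
    best = []
    # Sort by date; on equal dates, visit the higher MOS first so that
    # only the top score of that day can register as a new best.
    ordered = sorted(results, key=lambda row: (row[2], -row[1]))
    for model, mos, when in ordered:
        if not best or mos > best[-1][1]:
            best.append((model, mos, when))
    return best


if __name__ == "__main__":
    for model, mos, when in running_best(RESULTS):
        print(f"{when.isoformat()}  {mos:.2f}  {model}")
```

Ties on the same date are resolved in favor of the higher score, which is one reasonable choice among several; a chart that plots every submission rather than only the running best would simply skip the filtering step.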