LRS2

Lipreading Benchmark

Showing 18 results | Metric: Word Error Rate (WER, %; lower is better)
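For readers unfamiliar with the metric, the following is a minimal sketch of how a per-utterance WER is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` is hypothetical, and the whitespace tokenisation is an assumption; the papers listed here may normalise text differently and typically report a corpus-level figure (total errors over total reference words).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent for one reference/hypothesis pair.

    Assumes plain whitespace tokenisation and no text normalisation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 100.0 * prev[-1] / len(ref)
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.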

Top Performing Models

| Rank | Model | Paper | WER (%) | Date | Code |
|---|---|---|---|---|---|
| 1 | LIBS | Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers | 65.29 | 2019-11-26 | zju-vipa/KamalEngine |
| 2 | TM-CTC + extLM | Deep Audio-Visual Speech Recognition | 54.70 | 2018-09-06 | lordmartian/deep_avsr, smeetrs/deep_avsr, exgc/avmust-ted, amitai1992/AutomatedLipReading |
| 3 | CTC + KD ASR | ASR is all you need: cross-modal distillation for lip reading | 53.20 | 2019-11-28 | - |
| 4 | Hybrid CTC / Attention | Audio-Visual Speech Recognition With A Hybrid CTC/Attention Architecture | 50.00 | 2018-09-28 | - |
| 5 | LF-MMI TDNN | Audio-visual Recognition of Overlapped speech for the LRS2 dataset | 48.86 | 2020-01-06 | - |
| 6 | TM-seq2seq + extLM | Deep Audio-Visual Speech Recognition | 48.30 | 2018-09-06 | lordmartian/deep_avsr, smeetrs/deep_avsr, exgc/avmust-ted, amitai1992/AutomatedLipReading |
| 7 | Multi-head Visual-Audio Memory | Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading | 44.50 | 2022-04-04 | ms-dot-k/Multi-head-Visual-Audio-Memory |
| 8 | MoCo + wav2vec (w/o extLM) | Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition | 43.20 | 2022-02-24 | lumia-group/leveraging-self-supervised-learning-for-avsr |
| 9 | Hybrid CTC / Attention | End-to-end Audio-visual Speech Recognition with Conformers | 39.10 | 2021-02-12 | zziz/pwc, mpc001/Visual_Speech_Recognition_for_Multiple_Languages, mpc001/auto_avsr |
| 10 | CTC/Attention | Visual Speech Recognition for Multiple Languages in the Wild | 32.90 | 2022-02-26 | mpc001/Visual_Speech_Recognition_for_Multiple_Languages, david-gimeno/lip-rtve |

All Papers (18)