| LIBS | Hearing Lips: Improving Lip Reading by Distilling… | 65.29 | 2019-11-26 |
| TM-CTC + extLM | Deep Audio-Visual Speech Recognition | 54.70 | 2018-09-06 |
| CTC + KD ASR | ASR is all you need: cross-modal distillation for… | 53.20 | 2019-11-28 |
| Hybrid CTC / Attention | Audio-Visual Speech Recognition With A Hybrid CTC… | 50.00 | 2018-09-28 |
| LF-MMI TDNN | Audio-visual Recognition of Overlapped speech for… | 48.86 | 2020-01-06 |
| TM-seq2seq + extLM | Deep Audio-Visual Speech Recognition | 48.30 | 2018-09-06 |
| Multi-head Visual-Audio Memory | Distinguishing Homophenes Using Multi-Head Visual… | 44.50 | 2022-04-04 |
| MoCo + wav2vec (w/o extLM) | Leveraging Unimodal Self-Supervised Learning for … | 43.20 | 2022-02-24 |
| Hybrid CTC / Attention | End-to-end Audio-visual Speech Recognition with C… | 39.10 | 2021-02-12 |
| CTC/Attention | Visual Speech Recognition for Multiple Languages … | 32.90 | 2022-02-26 |
| VTP | Sub-word Level Lip Reading With Visual Attention | 28.90 | 2021-10-14 |
| SyncVSR | SyncVSR: Data-Efficient Visual Speech Recognition… | 28.90 | 2024-06-18 |
| CTC/Attention (LRW+LRS2/3+AVSpeech) | Visual Speech Recognition for Multiple Languages … | 25.50 | 2022-02-26 |
| VTP (more data) | Sub-word Level Lip Reading With Visual Attention | 22.60 | 2021-10-14 |
| RAVEn Large | Jointly Learning Visual and Auditory Speech Repre… | 18.60 | 2022-12-12 |
| SyncVSR | SyncVSR: Data-Efficient Visual Speech Recognition… | 16.50 | 2024-06-18 |
| USR | Unified Speech Recognition: A Single Model for Au… | 15.40 | 2024-11-04 |
| Auto-AVSR | Auto-AVSR: Audio-Visual Speech Recognition with A… | 14.60 | 2023-03-25 |
|