TM-seq2seq | Deep Audio-Visual Speech Recognition | 7.20 | 2018-09-06
EG-seq2seq | Discriminative Multi-modality Speech Recognition | 6.80 | 2020-05-12
RNN-T | Recurrent Neural Network Transducer for Audio-Vis… | 4.50 | 2019-11-08
Hyb-Conformer | End-to-end Audio-visual Speech Recognition with C… | 2.30 | 2021-02-12
Zero-AVSR | Zero-AVSR: Zero-Shot Audio-Visual Speech Recognit… | 1.50 | 2025-03-08
AV-HuBERT Large | Robust Self-Supervised Audio-Visual Speech Recogn… | 1.40 | 2022-01-05
RAVEn Large | Jointly Learning Visual and Auditory Speech Repre… | 1.40 | 2022-12-12
DistillAV | Audio-Visual Representation Learning via Knowledg… | 1.30 | 2025-02-09
CTC/Attention | Auto-AVSR: Audio-Visual Speech Recognition with A… | 0.90 | 2023-03-25
Llama-AVSR | Large Language Models are Strong Audio-Visual Spe… | 0.77 | 2024-09-18
Whisper-Flamingo | Whisper-Flamingo: Integrating Visual Features int… | 0.76 | 2024-06-14
MMS-LLaMA | MMS-LLaMA: Efficient LLM-based Audio-Visual Speec… | 0.74 | 2025-03-14