Showing 4 results | Metric: Word Error Rate (WER)
Rank | Model | Paper | Word Error Rate (WER) | Date | Code |
---|---|---|---|---|---|
1 | RAVEn Large | Jointly Learning Visual and Auditory Speech Representations from Raw Data | 1.40 | 2022-12-12 | ahaliassos/raven |
2 | AV-HuBERT Large | Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction | 1.30 | 2022-01-05 | facebookresearch/av_hubert, guxm2021/MM_ALT |
3 | Llama-AVSR | Large Language Models are Strong Audio-Visual Speech Recognition Learners | 0.81 | 2024-09-18 | umbertocappellazzo/llama-avsr |
4 | Whisper | Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation | 0.68 | 2024-06-14 | roudimit/whisper-flamingo |
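
For readers unfamiliar with the metric in the table, WER is conventionally defined as the word-level edit distance (substitutions + deletions + insertions) between a hypothesis transcript and a reference transcript, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the evaluation code used by any of the listed papers; the function name `word_error_rate` and the whitespace tokenization are assumptions made for this example, and real evaluations usually apply text normalization first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a fraction: word-level edit distance / reference length.

    Hypothetical helper for illustration; tokenizes on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"{wer:.2%}")  # prints 33.33%
```

Lower values are better, and a WER can exceed 1.0 when the hypothesis contains many inserted words. The table does not state its units; leaderboards of this kind commonly report WER as a percentage, but that is an assumption here.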