CAS-VSR-W1k (LRW-1000)

Lipreading Benchmark

Showing 9 results | Metric: Top-1 Accuracy (%)
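
As a reminder of what the metric measures, here is a minimal sketch of Top-1 accuracy: the percentage of test clips for which the highest-scoring class matches the ground-truth word label. The function name `top1_accuracy` and the toy scores below are illustrative assumptions, not taken from the benchmark or from any listed paper's code.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return the percentage of samples whose top-scoring class equals the label."""
    if len(scores) != len(labels) or len(labels) == 0:
        raise ValueError("scores and labels must be non-empty and of equal length")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the highest score in this row (argmax).
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Hypothetical toy example: 4 clips, 3 word classes.
scores = [
    [0.1, 0.7, 0.2],  # predicted class 1
    [0.6, 0.3, 0.1],  # predicted class 0
    [0.2, 0.2, 0.6],  # predicted class 2
    [0.5, 0.4, 0.1],  # predicted class 0
]
labels = [1, 0, 2, 1]

accuracy = top1_accuracy(scores, labels)
print(f"{accuracy:.2f}")  # 75.00
```

Reporting the value as a percentage matches the scale of the scores in the table (for example, 58.20).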

Top Performing Models

Rank	Model	Paper	Top-1 Accuracy (%)	Date	Code
1	SyncVSR (Word Boundary)	SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization	58.20	2024-06-18	KAIST-AILab/SyncVSR
2	3D Conv + ResNet-18 + MS-TCN + Multi-Head Visual-Audio Memory	Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading	53.80	2022-04-04	ms-dot-k/Multi-head-Visual-Audio-Memory
3	3D-ResNet + Bi-GRU + MixUp + Label Smooth + Cosine LR (Word Boundary)	Learn an Effective Lip Reading Model without Pains	-	2020-11-15	Fengdalu/learn-an-effective-lip-reading-model-without-pains
4	3D Conv + ResNet-18 + Bi-GRU + Visual-Audio Memory	Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video	-	2022-04-04	ms-dot-k/Visual-Audio-Memory
5	3D-ResNet + Bi-GRU + MixUp + Label Smooth + Cosine LR	Learn an Effective Lip Reading Model without Pains	-	2020-11-15	Fengdalu/learn-an-effective-lip-reading-model-without-pains
6	3D Conv + ResNet-18 + Bi-GRU (Face Cutout)	Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition	-	2020-03-06	sailordiary/deep-face-vsr
7	DFTN	Deformation Flow Based Two-Stream Network for Lip Reading	-	2020-03-12	jingyunx/Deformation-Flow-Based-Two-stream-Network
8	GLMIM	Mutual Information Maximization for Effective Lip Reading	-	2020-03-13	xing96/MIM-lipreading
9	PCPG	Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence Lip-Reading	-	2020-03-09	-

All Papers (9)

SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization

2024

SyncVSR (Word Boundary)

KAIST-AILab/SyncVSR

Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading

2022

3D Conv + ResNet-18 + MS-TCN + Multi-Head Visual-Audio Memory

ms-dot-k/Multi-head-Visual-Audio-Memory

Learn an Effective Lip Reading Model without Pains

2020

3D-ResNet + Bi-GRU + MixUp + Label Smooth + Cosine LR (Word Boundary)

Fengdalu/learn-an-effective-lip-reading-model-without-pains

Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video

2022

3D Conv + ResNet-18 + Bi-GRU + Visual-Audio Memory

ms-dot-k/Visual-Audio-Memory

Learn an Effective Lip Reading Model without Pains

2020

3D-ResNet + Bi-GRU + MixUp + Label Smooth + Cosine LR

Fengdalu/learn-an-effective-lip-reading-model-without-pains

Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition

2020

3D Conv + ResNet-18 + Bi-GRU (Face Cutout)

sailordiary/deep-face-vsr

Deformation Flow Based Two-Stream Network for Lip Reading

2020

DFTN

jingyunx/Deformation-Flow-Based-Two-stream-Network

Mutual Information Maximization for Effective Lip Reading

2020

GLMIM

xing96/MIM-lipreading

Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence Lip-Reading

2020

PCPG
