LRS2

Lip Reading Sentences 2

Dataset Information
Modalities: Videos, Texts, Audio
Introduced: 2017

Overview

The Oxford-BBC Lip Reading Sentences 2 (LRS2) dataset is one of the largest publicly available datasets for lip reading sentences in the wild. It consists mainly of news and talk shows from BBC programs. Each sentence is up to 100 characters long. The training, validation and test sets are divided according to broadcast date. The dataset is challenging because it contains thousands of speakers without speaker labels and large variation in head pose. The pre-training set contains 96,318 utterances, the training set 45,839 utterances, the validation set 1,082 utterances and the test set 1,242 utterances.
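To put the split sizes quoted above in proportion, here is a minimal Python sketch that tabulates them. The counts are copied from the overview. The names `LRS2_SPLITS` and `split_fractions` are illustrative only and are not part of any official LRS2 tooling.

```python
# Utterance counts per split, copied from the overview text.
# The dictionary and function names are hypothetical, chosen for this sketch.
LRS2_SPLITS = {
    "pre-train": 96_318,
    "train": 45_839,
    "validation": 1_082,
    "test": 1_242,
}


def split_fractions(splits):
    """Return each split's share of the total utterance count."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}


total_utterances = sum(LRS2_SPLITS.values())
fractions = split_fractions(LRS2_SPLITS)

# Print one row per split: name, utterance count, share of the total.
for name, count in LRS2_SPLITS.items():
    print(f"{name:<12}{count:>9,}{fractions[name]:>8.1%}")
print(f"{'total':<12}{total_utterances:>9,}")
```

On these figures, the pre-training set accounts for roughly two thirds of all utterances, while the validation and test sets are each under one percent.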

Source: Audio-visual Recognition of Overlapped speech for the LRS2 dataset
Image Source: https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs2.html

Variants: LRS2

Associated Benchmarks

This dataset is used in 7 benchmarks.

Recent Benchmark Submissions

Task | Model | Paper | Date
Lipreading | USR | Unified Speech Recognition: A Single … | 2024-11-04
Lipreading | SyncVSR | SyncVSR: Data-Efficient Visual Speech Recognition … | 2024-06-18
Lipreading | SyncVSR | SyncVSR: Data-Efficient Visual Speech Recognition … | 2024-06-18
Automatic Speech Recognition (ASR) | Whisper | Whisper-Flamingo: Integrating Visual Features into … | 2024-06-14
Audio-Visual Speech Recognition | Whisper-Flamingo | Whisper-Flamingo: Integrating Visual Features into … | 2024-06-14
Speech Separation | TDFNet (MHSA + Shared) | TDFNet: An Efficient Audio-Visual Speech … | 2024-01-25
Speech Separation | TDFNet-large | TDFNet: An Efficient Audio-Visual Speech … | 2024-01-25
Speech Separation | TDFNet-small | TDFNet: An Efficient Audio-Visual Speech … | 2024-01-25
Automatic Speech Recognition (ASR) | Whisper-LLaMA | Whispering LLaMA: A Cross-Modal Generative … | 2023-10-10
Speech Separation | RTFS-Net-6 | RTFS-Net: Recurrent Time-Frequency Modelling for … | 2023-09-29
Speech Separation | RTFS-Net-12 | RTFS-Net: Recurrent Time-Frequency Modelling for … | 2023-09-29
Speech Separation | RTFS-Net-4 | RTFS-Net: Recurrent Time-Frequency Modelling for … | 2023-09-29
Speech Separation | IIANet | IIANet: An Intra- and Inter-Modality … | 2023-08-16
Automatic Speech Recognition (ASR) | CTC/Attention | Auto-AVSR: Audio-Visual Speech Recognition with … | 2023-03-25
Lipreading | Auto-AVSR | Auto-AVSR: Audio-Visual Speech Recognition with … | 2023-03-25
Audio-Visual Speech Recognition | CTC/Attention | Auto-AVSR: Audio-Visual Speech Recognition with … | 2023-03-25
Speech Separation | CTCNet | An Audio-Visual Speech Separation Model … | 2022-12-21
Speech Recognition | RAVEn Large | Jointly Learning Visual and Auditory … | 2022-12-12
Lipreading | RAVEn Large | Jointly Learning Visual and Auditory … | 2022-12-12
Lipreading | Multi-head Visual-Audio Memory | Distinguishing Homophenes Using Multi-Head Visual-Audio … | 2022-04-04

Research Papers

Recent papers with results on this dataset: