CAS-VSR-S101

Name: CAS-VSR-S101
Published: 2024-06-01
License: Free for academic use

Dataset Information

Modalities

Videos, Texts, Audio, Speech

Languages

Chinese

Introduced

2024

License

Free for academic use

Homepage

Official Website

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

CAS-VSR-S101 is a new large-scale, in-the-wild Mandarin dataset with 101.1 hours of data. The videos are sourced from broadcast news and conversational programs in Chinese, covering a highly diverse set of topics, speakers and filming conditions. Utterance lengths are naturally distributed between 0.01s and 10.57s, and image quality and resolution vary. News accounts for 82.4% of the programs. 70.4% of the utterances depict news anchors, hosts and correspondents, while 29.6% depict interviewees and guests. Male and female appearances are relatively balanced, at a ratio of approximately 1.5 : 1. The dataset is divided into train, validation and test sets by TV channel to minimize speaker overlap, at a ratio of roughly 8 : 1 : 1.5 in terms of duration. The validation and test sets are composed of programs broadcast on provincial TV channels.
The dataset is available under a license for academic use.
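As a rough illustration of what the stated split implies, the sketch below converts the approximate 8 : 1 : 1.5 duration ratio and the 101.1-hour total into per-split hour estimates. The function name and the resulting figures are illustrative assumptions derived only from the numbers quoted above; they are not official split sizes, and the actual durations may differ because the ratio is given as approximate.

```python
# Rough estimate of per-split duration from the figures quoted in the overview.
# These are back-of-the-envelope values, not official split statistics.

TOTAL_HOURS = 101.1                                        # total duration stated in the overview
SPLIT_RATIO = {"train": 8.0, "validation": 1.0, "test": 1.5}  # approximate duration ratio


def estimate_split_hours(total_hours, ratio):
    """Distribute total_hours across splits in proportion to the ratio weights."""
    weight_sum = sum(ratio.values())
    return {name: total_hours * weight / weight_sum for name, weight in ratio.items()}


estimates = estimate_split_hours(TOTAL_HOURS, SPLIT_RATIO)
for name, hours in estimates.items():
    print(f"{name}: about {hours:.1f} h")
```

Because the sets are split by TV channel rather than by random sampling, the real durations depend on how much material each channel contributes, so these values should be read only as an order-of-magnitude guide.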

Variants: CAS-VSR-S101

Associated Benchmarks

This dataset is used in 3 benchmarks:

Speech Recognition - Metrics: Word Error Rate (WER)
Audio-Visual Speech Recognition - Metrics: Word Error Rate (WER)
Lipreading - Metrics: Word Error Rate (WER)
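All three benchmarks report Word Error Rate. A minimal sketch of the usual definition follows: the token-level edit distance (substitutions, deletions and insertions) between a reference transcript and a hypothesis, divided by the number of reference tokens. The function names are hypothetical, and how Mandarin text is split into tokens (words or characters) is an assumption left to the caller; the benchmarks' own scoring scripts may differ.

```python
# Minimal WER sketch: Levenshtein distance over token sequences,
# normalized by the reference length. Tokenization is left to the caller.

def edit_distance(ref_tokens, hyp_tokens):
    """Return the minimum number of substitutions, deletions and insertions."""
    # prev[j] holds the distance between the processed reference prefix
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp_tokens) + 1))
    for i, ref_tok in enumerate(ref_tokens, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hyp_tokens, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_error_rate(ref_tokens, hyp_tokens):
    """Edit distance divided by the number of reference tokens."""
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


# Usage with placeholder tokens: one substitution and one deletion over four tokens.
example_wer = word_error_rate(["a", "b", "c", "d"], ["a", "x", "c"])
print(f"WER = {example_wer:.2f}")
```

Note that the value can exceed 1.0 when the hypothesis contains many insertions, since the denominator is the reference length only.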

Recent Benchmark Submissions

No recent benchmark submissions available for this dataset.

Research Papers

No papers with results on this dataset found.

External Links:

CAS-VSR-S101
