CAS-VSR-S101

Dataset Information
Modalities
Videos, Texts, Audio, Speech
Languages
Chinese
Introduced
2024
Homepage

Overview

A new large-scale, in-thewild Mandarin dataset, CAS-VSR-S101 with 101.1 hours of data. The videos are sourced from broadcast news and conversational programs in Chinese, covering a highly diverse set of topics, speakers and filming conditions. The lengths of the utterances are naturally distributed between 0.01s and 10.57s, and image qualities and resolutions vary. News accounts for 82.4% of the programs. 70.4% of the utterances depict news anchors, hosts and correspondents, while 29.6% are those of interviewees and guests. In addition, at a ratio of approximately 1.5 : 1, male and female appearances are relatively balanced. It is divided into train, validation and test sets by TV channels to minimize speaker
overlap, and at a ratio of roughly 8 : 1 : 1.5 in terms of duration. The validation and test sets are composed of programs broadcast on provincial TV channels.
The dataset is available for academic use under a license.

Variants: CAS-VSR-S101

Associated Benchmarks

This dataset is used in 3 benchmarks:

Recent Benchmark Submissions

No recent benchmark submissions available for this dataset.

Research Papers

No papers with results on this dataset found.