CVSS

Name: CVSS
Published: 2022-01-11
License: CC BY 4.0

Dataset Information

Modalities

Texts, Audio, Speech

Languages

English, French, Spanish, German, Italian, Chinese, Japanese, Russian, Portuguese, Arabic, Catalan, Dutch, Estonian, Indonesian, Latvian, Persian, Slovenian, Swedish, Tamil, Turkish, Welsh, Mongolian

Introduced

2022

License

CC BY 4.0

Homepage

Official Website

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

CVSS is a massively multilingual-to-English speech-to-speech translation (S2ST) corpus, covering sentence-level parallel S2ST pairs from 21 languages into English. CVSS is derived from the Common Voice speech corpus and the CoVoST 2 speech-to-text translation (ST) corpus by synthesizing the translation text from CoVoST 2 into speech using state-of-the-art TTS systems.
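
The derivation described above can be pictured as a join followed by a synthesis step. The following is a minimal sketch, not the authors' pipeline: the `S2STPair` record, the `build_pairs` helper, the `clip_id` join key, and the `fake_tts` stand-in are all hypothetical names introduced for illustration, and a real build would call an actual TTS system and read the corpora's own file formats.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class S2STPair:
    """One sentence-level pair: source speech plus English text and speech."""
    clip_id: str         # hypothetical key shared by both corpora
    source_lang: str     # language of the source clip, e.g. "fr"
    source_audio: str    # path to the source-language speech clip
    target_text: str     # English translation text
    target_audio: bytes  # English speech synthesized from target_text


def build_pairs(
    source_clips: Dict[str, str],
    translations: Dict[str, str],
    source_lang: str,
    synthesize: Callable[[str], bytes],
) -> List[S2STPair]:
    """Join clips and translations on clip_id, then synthesize English speech."""
    pairs = []
    for clip_id in sorted(source_clips):
        text = translations.get(clip_id)
        if text is None:
            continue  # skip clips that have no translation text
        pairs.append(
            S2STPair(clip_id, source_lang, source_clips[clip_id], text, synthesize(text))
        )
    return pairs


def fake_tts(text: str) -> bytes:
    """Stand-in for a real TTS system (assumption): returns the text as bytes."""
    return text.encode("utf-8")
```

Passing the synthesizer in as a callable keeps the join logic independent of whichever TTS system is used, so a test stub can replace it.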

Variants: CVSS

Associated Benchmarks

This dataset is used in 1 benchmark:

Speech-to-Speech Translation - Metrics: ASR-BLEU, Parameters
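
ASR-BLEU scores translated speech by transcribing it with a speech recognizer and computing BLEU between the transcripts and the reference translation text. The sketch below illustrates the idea under simplifying assumptions: `corpus_bleu` is a bare-bones BLEU (whitespace tokens, lowercasing, no smoothing) rather than a standard scoring tool, and `transcribe` is a hypothetical callable standing in for whatever ASR model an evaluation uses. Scores from this sketch are not comparable to published numbers.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams of a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Simplified corpus-level BLEU on a 0-100 scale, one reference per sentence."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.lower().split(), ref.lower().split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Counter intersection clips each n-gram count to the reference count.
            matches[n - 1] += sum((h_counts & r_counts).values())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives a zero score
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity * math.exp(log_precision)


def asr_bleu(
    audio_clips: Sequence[bytes],
    references: List[str],
    transcribe: Callable[[bytes], str],
) -> float:
    """Transcribe each generated clip, then score transcripts against references."""
    transcripts = [transcribe(clip) for clip in audio_clips]
    return corpus_bleu(transcripts, references)
```

Because the score depends on the recognizer as well as the translation model, results are only comparable when the same ASR system and text normalization are used.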

Recent Benchmark Submissions

Task	Model	Paper	Date
Speech-to-Speech Translation	SeamlessM4T Large	SeamlessM4T: Massively Multilingual & Multimodal …	2023-08-22
Speech-to-Speech Translation	SeamlessM4T Medium	SeamlessM4T: Massively Multilingual & Multimodal …	2023-08-22

Research Papers

Recent papers with results on this dataset:

SeamlessM4T: Massively Multilingual & Multimodal Machine Translation (2023)

