CVSS is a massively multilingual-to-English speech-to-speech translation (S2ST) corpus, covering sentence-level parallel S2ST pairs from 21 languages into English. CVSS is derived from the Common Voice speech corpus and the CoVoST 2 speech-to-text translation (ST) corpus by synthesizing the translation text from CoVoST 2 into speech using state-of-the-art TTS systems.
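The derivation described above is a join-then-synthesize pipeline. Each Common Voice source clip is matched with its CoVoST 2 English translation, and that translation text is rendered to target-language audio by a TTS system. The following is a minimal, hypothetical sketch of that pairing step. The class names, field names, and the `build_s2st_pairs` and `placeholder_tts` helpers are illustrative assumptions, not part of any released CVSS tooling, and the TTS step is a stub rather than a real synthesizer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CommonVoiceClip:
    """Hypothetical record for one source-language Common Voice utterance."""
    clip_id: str
    language: str
    audio: List[float]  # source-language waveform samples


@dataclass
class S2STPair:
    """Hypothetical record for one sentence-level speech-to-speech pair."""
    clip_id: str
    source_language: str
    source_audio: List[float]
    target_text: str          # English translation text (from CoVoST 2 in CVSS)
    target_audio: List[float]  # English speech synthesized from target_text


def placeholder_tts(text: str) -> List[float]:
    """Stub standing in for a real TTS system (assumption): one silent sample per character."""
    return [0.0] * len(text)


def build_s2st_pairs(
    clips: List[CommonVoiceClip],
    translations: Dict[str, str],
    tts: Callable[[str], List[float]],
) -> List[S2STPair]:
    """Join source clips with their English translations and synthesize target speech."""
    pairs: List[S2STPair] = []
    for clip in clips:
        text = translations.get(clip.clip_id)
        if text is None:
            continue  # skip clips that have no English translation
        pairs.append(
            S2STPair(
                clip_id=clip.clip_id,
                source_language=clip.language,
                source_audio=clip.audio,
                target_text=text,
                target_audio=tts(text),  # synthesize the translation into English speech
            )
        )
    return pairs


if __name__ == "__main__":
    clips = [CommonVoiceClip("c1", "fr", [0.1, 0.2]), CommonVoiceClip("c2", "de", [0.3])]
    pairs = build_s2st_pairs(clips, {"c1": "Hello."}, placeholder_tts)
    print(len(pairs), pairs[0].target_text)
```

Clips without a matching translation are dropped, so the resulting pairs cover only the source utterances that have an English translation available.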
Variants: CVSS
This dataset is used in 1 benchmark:
Task | Model | Paper | Date |
---|---|---|---|
Speech-to-Speech Translation | SeamlessM4T Large | SeamlessM4T: Massively Multilingual & Multimodal … | 2023-08-22 |
Speech-to-Speech Translation | SeamlessM4T Medium | SeamlessM4T: Massively Multilingual & Multimodal … | 2023-08-22 |