MuST-C is currently the largest publicly available one-to-many multilingual corpus for speech translation. It covers eight language directions, from English to German, Spanish, French, Italian, Dutch, Portuguese, Romanian and Russian. The corpus consists of audio recordings, transcriptions and translations of English TED talks, and it comes with predefined training, validation and test splits.
Source: One-to-Many Multilingual End-to-End Speech Translation
Image Source: https://mt.fbk.eu/must-c
Variants: MuST-C EN->DE, MuST-C
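The corpus pairs each segment of English audio with its English transcription and a translation into one of the eight target languages. The following Python sketch shows one way to turn that into (audio, transcript, translation) triplets. It assumes the layout of the public release: one directory per direction, such as `en-de/data/<split>/`, with segment metadata carrying `wav`, `offset` and `duration` fields, and with transcripts and translations stored one segment per line. These field names, the directory layout and the helpers `direction_dir` and `align_segments` are illustrative assumptions. They are not taken from the description above.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List

# The eight target languages named in the description; English is always the source.
TARGET_LANGUAGES = ["de", "es", "fr", "it", "nl", "pt", "ro", "ru"]


@dataclass
class Segment:
    wav: str          # talk-level audio file containing this segment (assumed field)
    offset: float     # segment start within the talk, in seconds (assumed field)
    duration: float   # segment length in seconds (assumed field)
    transcript: str   # English source transcription
    translation: str  # target-language translation


def direction_dir(root: Path, tgt: str, split: str) -> Path:
    """Hypothetical helper: directory of one direction and split,
    assuming an en-<tgt>/data/<split> layout."""
    if tgt not in TARGET_LANGUAGES:
        raise ValueError(f"unsupported target language: {tgt}")
    return root / f"en-{tgt}" / "data" / split


def align_segments(
    meta: List[Dict], transcripts: List[str], translations: List[str]
) -> List[Segment]:
    """Hypothetical helper: zip per-segment metadata with line-aligned
    transcripts and translations into speech translation triplets."""
    if not (len(meta) == len(transcripts) == len(translations)):
        raise ValueError("metadata, transcripts and translations must have equal length")
    return [
        Segment(m["wav"], float(m["offset"]), float(m["duration"]), src.strip(), tgt.strip())
        for m, src, tgt in zip(meta, transcripts, translations)
    ]


if __name__ == "__main__":
    # Toy in-memory example; the values are made up for illustration.
    meta = [{"wav": "ted_1.wav", "offset": "12.5", "duration": "3.2"}]
    segments = align_segments(meta, ["Hello.\n"], ["Hallo.\n"])
    print(segments[0])
    print(direction_dir(Path("MUSTC"), "de", "train").as_posix())
```

Because every direction shares the same English source side, the same alignment step applies unchanged to all eight language pairs. Only the target-language file differs.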
This dataset is used in 1 benchmark:
Task | Model | Paper | Date |
---|---|---|---|
Speech-to-Text Translation | Transformer with Adapters | Lightweight Adapter Tuning for Multilingual … | 2021-06-02 |
Speech-to-Text Translation | Dual-decoder Transformer | Dual-decoder Transformer for Joint Automatic … | 2020-11-02 |
Recent papers with results on this dataset: