FLEURS

Few-shot Learning Evaluation of Universal Representations of Speech

Dataset Information
Modalities: Texts, Audio
Introduced: 2022

Overview

We introduce FLEURS, the Few-shot Learning Evaluation of Universal Representations of Speech benchmark. FLEURS is an n-way parallel speech dataset in 102 languages, built on top of the FLoRes-101 machine translation benchmark, with approximately 12 hours of speech supervision per language. FLEURS can be used for a variety of speech tasks, including Automatic Speech Recognition (ASR), Speech Language Identification (Speech LangID), Translation, and Retrieval. In this paper, we provide baselines for these tasks based on multilingual pre-trained models such as mSLAM. The goal of FLEURS is to enable speech technology in more languages and to catalyze research in low-resource speech understanding.
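
To make "n-way parallel" concrete: the same source sentences are available in every language, so records can be aligned across languages through a shared sentence identifier. The following is a minimal sketch on made-up toy records. The field names `id`, `lang`, and `text`, the toy sentences, and the helper `align_parallel` are hypothetical and are not taken from the dataset's actual schema.

```python
from collections import defaultdict

# Toy records standing in for dataset rows (invented for illustration only).
TOY_RECORDS = [
    {"id": 1, "lang": "en_us", "text": "hello"},
    {"id": 1, "lang": "ko_kr", "text": "안녕하세요"},
    {"id": 2, "lang": "en_us", "text": "thank you"},
    {"id": 2, "lang": "ko_kr", "text": "감사합니다"},
    {"id": 3, "lang": "en_us", "text": "good night"},  # no ko_kr counterpart
]


def align_parallel(records, langs):
    """Group records by sentence id; keep only ids present in every requested language."""
    by_id = defaultdict(dict)
    for rec in records:
        by_id[rec["id"]][rec["lang"]] = rec["text"]

    wanted = set(langs)
    return {
        sid: {lang: texts[lang] for lang in langs}
        for sid, texts in sorted(by_id.items())
        if wanted.issubset(texts)  # iterating a dict yields its keys (languages)
    }


aligned = align_parallel(TOY_RECORDS, ["en_us", "ko_kr"])
for sid, texts in aligned.items():
    print(sid, texts)
```

Sentence 3 is dropped here because it lacks a `ko_kr` entry. An aligned view of this kind is what translation and retrieval setups typically rely on, since each row pairs the same content across languages.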

Variants: FLUERS Korean, google/fleurs tr_tr, google/fleurs tg_tj, google/fleurs ta_in, google/fleurs ro, google/fleurs pt_br, GOOGLE/FLEURS - PS_AF, google/fleurs ps_af, google/fleurs ko_kr, google/fleurs ja_jp, google/fleurs id_id, google/fleurs he_il, google/fleurs gl_es, google/fleurs cmn_hans_cn, google/fleurs ca, google/fleurs am_et, Google FLEURS, google/fleurs, FLEURS ASR, ERR2020, Common Voice 11.0, FLEURS, Fleurs (English)
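
Several of the variant labels above follow a `google/fleurs <config>` pattern, which reads like a Hugging Face Hub repository id followed by a per-language configuration name. Assuming that reading, the sketch below splits such labels into `(repo, config)` pairs and returns `None` for free-form labels. The helper `parse_variant` and the regular expression are illustrative assumptions, not part of any library.

```python
import re

# Matches "google/fleurs ko_kr" and the "GOOGLE/FLEURS - PS_AF" spelling.
# Assumption: config names are letters joined by underscores.
_VARIANT = re.compile(
    r"^(google/fleurs)(?:\s+-)?\s+([a-z]+(?:_[a-z]+)*)$",
    re.IGNORECASE,
)


def parse_variant(label):
    """Return (repo, config) for 'google/fleurs <config>' labels, else None."""
    match = _VARIANT.match(label.strip())
    if match is None:
        return None
    return match.group(1).lower(), match.group(2).lower()


labels = ["google/fleurs ko_kr", "GOOGLE/FLEURS - PS_AF", "google/fleurs", "Common Voice 11.0"]
parsed = [parse_variant(label) for label in labels]
print(parsed)
```

If the Hugging Face `datasets` library is installed, a parsed pair could plausibly be passed on as `datasets.load_dataset(repo, config)`; that step needs network access and is left out of the sketch.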

Associated Benchmarks

This dataset is used in 1 benchmark.
