Sagalee

Dataset Information
Modalities: Audio, Speech
Languages: Oromo
Introduced: 2025

Overview

A speech recognition dataset for the Oromo language.

📊 Key features of Sagalee:
* 100 hours of read speech
* 283 gender-balanced speakers
* Covers multiple dialects of the Oromo language
* Open source for research use
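The two headline figures above imply an average amount of recorded speech per speaker. The sketch below works out that mean (total minutes divided by speaker count). It assumes "100 hours" is exact, although the card may be rounding. It gives only an average and says nothing about how recording time is actually distributed across speakers.

```python
# Figures taken from the dataset card. "100 hours" is treated as exact here,
# although the card may be rounding (assumption).
TOTAL_HOURS = 100
NUM_SPEAKERS = 283


def mean_minutes_per_speaker(total_hours: float, num_speakers: int) -> float:
    """Return the mean recorded speech per speaker, in minutes."""
    if num_speakers <= 0:
        raise ValueError("num_speakers must be positive")
    # Convert hours to minutes, then divide evenly across speakers.
    return total_hours * 60 / num_speakers


if __name__ == "__main__":
    avg = mean_minutes_per_speaker(TOTAL_HOURS, NUM_SPEAKERS)
    print(f"{avg:.1f} min per speaker on average")
```

With the card's figures this comes to roughly 21 minutes of read speech per speaker on average (6000 / 283).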

Variants: Sagalee

Associated Benchmarks

This dataset is used in one benchmark.

Recent Benchmark Submissions

Task | Model | Paper | Date
Automatic Speech Recognition (ASR) | Whisper-largev3-finetuned | Sagalee: an Open Source Automatic … | 2025-02-01
Automatic Speech Recognition (ASR) | Conformer | Sagalee: an Open Source Automatic … | 2025-02-01
