Banglish

Dataset Information
Modalities: Audio
Languages: English, Bengali
Introduced: 2021
License: Unknown
Homepage: not listed

Overview

A Bilingual Dataset for Bangla and English Voice Commands

Colloquial Bangla has adopted many English words as a result of colonial influence, and conversational Bangla is commonly spoken as a mixture of English and Bangla. This phenomenon is known as code-switching (CS), defined as the continuous alternation between two languages within a single conversation. In Bangla natural language processing, it is therefore often necessary to associate a single base command with its many variants, each spoken in a different mixture of English and Bangla. To facilitate this, we have curated a dataset centered on common browser commands.
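To make the mapping between a base command and its code-switched variants concrete, here is a minimal sketch that works on text transcripts. It assumes the audio has already been transcribed. The command labels (`open_new_tab`, `close_tab`), the romanized phrases, and the helper names are hypothetical illustrations, not entries or tooling from the dataset itself.

```python
from typing import Dict, List, Optional

# Hypothetical examples only: these labels and romanized transcripts are NOT
# taken from the Banglish dataset; they merely illustrate English/Bangla mixing.
VARIANTS: Dict[str, List[str]] = {
    "open_new_tab": ["open new tab", "new tab kholo", "notun tab open koro"],
    "close_tab": ["close tab", "tab bondho koro", "tab close koro"],
}


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def build_lookup(variants: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert base -> variants into a flat variant -> base dictionary."""
    lookup: Dict[str, str] = {}
    for base, phrases in variants.items():
        for phrase in phrases:
            lookup[normalize(phrase)] = base
    return lookup


LOOKUP = build_lookup(VARIANTS)


def to_base_command(transcript: str) -> Optional[str]:
    """Return the base command for a transcript, or None if it is unknown."""
    return LOOKUP.get(normalize(transcript))
```

Exact-match lookup is the simplest possible design and only handles variants that were listed in advance. A real system working from audio would more likely train a classifier over such base/variant pairs.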

Variants: Banglish

Associated Benchmarks

This dataset is used in 1 benchmark:

