JamPatoisNLI provides the first dataset for natural language inference in a creole language,
Jamaican Patois. Many of the most-spoken
low-resource languages are creoles. These
languages commonly have a lexicon derived
from a major world language and a distinctive grammar reflecting the languages of the
original speakers and the process of language
birth by creolization. This gives them a distinctive place in exploring the effectiveness of
transfer from large monolingual or multilingual pretrained models. While our work, along
with previous work, shows that transfer from
these models to low-resource languages that
are unrelated to languages in their training set
is not very effective, we would expect stronger
results from transfer to creoles. Indeed, our
experiments show considerably better results
from few-shot learning of JamPatoisNLI than
for such unrelated languages, and help us begin to understand how the unique relationship
between creoles and their high-resource base
languages affects cross-lingual transfer. JamPatoisNLI, which consists of naturally occurring
premises and expert-written hypotheses, is a
step towards steering research into a traditionally underserved language and a useful benchmark for understanding cross-lingual NLP.
Variants: JamPatoisNLI
This dataset is used in 1 benchmark:
| Task | Model | Paper | Date |
|---|---|---|---|
| Natural Language Inference | roberta-unfrozen | JamPatoisNLI: A Jamaican Patois Natural … | 2022-12-07 |
| Natural Language Inference | bert-uncased-unfrozen | JamPatoisNLI: A Jamaican Patois Natural … | 2022-12-07 |