MASSIVE is a parallel dataset of more than 1M utterances across 51 languages, annotated for the Natural Language Understanding (NLU) tasks of intent prediction and slot annotation. The utterances span 60 intents and 55 slot types. MASSIVE was created by localizing the SLURP dataset, which is composed of general Intelligent Voice Assistant single-shot interactions.
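To make the two annotation layers concrete, here is a minimal sketch of how one utterance could carry both an intent label and slot annotations. It assumes slots are written inline in a bracketed form such as `[slot_type : slot text]`; the record layout, the field names `intent` and `annot_utt`, the example sentence, and the helper `parse_annotated_utterance` are illustrative assumptions, not taken from this page.

```python
import re

# Assumed inline annotation style: "[slot_type : slot text]".
SLOT_PATTERN = re.compile(r"\[\s*([^\[\]:]+?)\s*:\s*([^\[\]]+?)\s*\]")


def parse_annotated_utterance(annotated):
    """Split a bracket-annotated utterance into tokens and BIO slot tags."""
    tokens, tags = [], []
    pos = 0
    for match in SLOT_PATTERN.finditer(annotated):
        # Words before this slot belong to no slot.
        for tok in annotated[pos:match.start()].split():
            tokens.append(tok)
            tags.append("O")
        slot_type, slot_text = match.group(1), match.group(2)
        # First word of a slot gets B-, the remaining words get I-.
        for i, tok in enumerate(slot_text.split()):
            tokens.append(tok)
            tags.append(("B-" if i == 0 else "I-") + slot_type)
        pos = match.end()
    # Words after the last slot belong to no slot.
    for tok in annotated[pos:].split():
        tokens.append(tok)
        tags.append("O")
    return tokens, tags


# Hypothetical record: one intent label per utterance, slots marked inline.
example = {
    "intent": "alarm_set",
    "annot_utt": "wake me up at [time : nine am] on [date : friday]",
}

tokens, tags = parse_annotated_utterance(example["annot_utt"])
for tok, tag in zip(tokens, tags):
    print(f"{tok}\t{tag}")
```

Intent prediction is then a single-label classification over the whole utterance, while slot annotation is a per-token tagging problem over the BIO labels produced above.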
This dataset is used in 2 benchmarks:
Task | Model | Paper | Date |
---|---|---|---|
Intent Classification | mT5 Base (encoder-only) | MASSIVE: A 1M-Example Multilingual Natural … | 2022-04-18 |
Intent Classification | mT5 Base (text-to-text) | MASSIVE: A 1M-Example Multilingual Natural … | 2022-04-18 |
Intent Classification | XLM-R Base | MASSIVE: A 1M-Example Multilingual Natural … | 2022-04-18 |
Slot Filling | XLM-R Base | MASSIVE: A 1M-Example Multilingual Natural … | 2022-04-18 |
Slot Filling | mT5 Base (encoder-only) | MASSIVE: A 1M-Example Multilingual Natural … | 2022-04-18 |
Slot Filling | mT5 Base (text-to-text) | MASSIVE: A 1M-Example Multilingual Natural … | 2022-04-18 |
Recent papers with results on this dataset: