MASSIVE

Dataset Information
Modalities: Texts
Introduced: 2022

Overview

MASSIVE is a parallel dataset of more than 1M utterances across 51 languages, annotated for two Natural Language Understanding tasks: intent prediction and slot filling. The utterances span 60 intents and 55 slot types. MASSIVE was created by localizing the SLURP dataset, which consists of general single-shot interactions with an Intelligent Voice Assistant.
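To make the slot annotation concrete, the sketch below converts an annotated utterance into token-level BIO tags, the usual input format for slot-filling models. It assumes the bracket notation `[slot_type : value]` used in the `annot_utt` field of the MASSIVE release; the function name `annot_to_bio` and the example utterance are hypothetical and for illustration only. It also assumes whitespace tokenization, which does not hold for locales written without spaces (for example Japanese or Chinese), so treat it as a minimal sketch rather than a reference preprocessing pipeline.

```python
import re

# Assumed MASSIVE `annot_utt` notation: each slot span is written "[slot_type : value]".
SLOT_PATTERN = re.compile(r"\[\s*([^\]:]+?)\s*:\s*([^\]]+?)\s*\]")


def annot_to_bio(annot_utt: str) -> list[tuple[str, str]]:
    """Convert an annotated utterance into (token, BIO tag) pairs.

    Hypothetical helper. Tokens are split on whitespace, a simplifying
    assumption that fails for scripts written without spaces.
    """
    pairs: list[tuple[str, str]] = []
    pos = 0
    for match in SLOT_PATTERN.finditer(annot_utt):
        # Text between the previous slot and this one lies outside any slot.
        for tok in annot_utt[pos:match.start()].split():
            pairs.append((tok, "O"))
        slot_type, value = match.group(1), match.group(2)
        # First token of the span gets B-, the rest get I-.
        for i, tok in enumerate(value.split()):
            prefix = "B" if i == 0 else "I"
            pairs.append((tok, f"{prefix}-{slot_type}"))
        pos = match.end()
    # Trailing text after the last slot.
    for tok in annot_utt[pos:].split():
        pairs.append((tok, "O"))
    return pairs


if __name__ == "__main__":
    example = "wake me up at [time : five am] [date : this week]"
    for token, tag in annot_to_bio(example):
        print(token, tag)
```

Running the script prints one token per line with its tag, e.g. `five B-time` and `am I-time`. The regular expression lets slot values contain colons (such as `5:00`) because only the first colon after the slot type is treated as the separator.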

Variants: MASSIVE

Associated Benchmarks

This dataset is used in 2 benchmarks: intent classification and slot filling.

Recent Benchmark Submissions

Task | Model | Paper | Date
Intent Classification | mT5 Base (encoder-only) | MASSIVE: A 1M-Example Multilingual Natural … | 2022-04-18
Intent Classification | mT5 Base (text-to-text) | MASSIVE: A 1M-Example Multilingual Natural … | 2022-04-18
Intent Classification | XLM-R Base | MASSIVE: A 1M-Example Multilingual Natural … | 2022-04-18
Slot Filling | XLM-R Base | MASSIVE: A 1M-Example Multilingual Natural … | 2022-04-18
Slot Filling | mT5 Base (encoder-only) | MASSIVE: A 1M-Example Multilingual Natural … | 2022-04-18
Slot Filling | mT5 Base (text-to-text) | MASSIVE: A 1M-Example Multilingual Natural … | 2022-04-18

Research Papers

Recent papers with results on this dataset: