MASSIVE

Name: MASSIVE
Published: 2022-04-18
License: CC BY 4.0

Dataset Information

Modalities: Texts
Introduced: 2022

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

MASSIVE is a parallel dataset of more than 1M utterances across 51 languages, annotated for two Natural Language Understanding (NLU) tasks: intent prediction and slot annotation. The utterances span 60 intents and 55 slot types. MASSIVE was created by localizing the SLURP dataset, which consists of general Intelligent Voice Assistant single-shot interactions.
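To make the two annotation layers concrete, the following is a minimal sketch of how one annotated utterance could be split into plain text and slots. It assumes slots are written inline as `[slot_type : value]`; the helper name `parse_annotated_utterance`, the example sentence, and the intent label are illustrative assumptions, not taken from this page.

```python
import re

# Assumption: slot spans are marked inline as "[slot_type : value]".
SLOT_PATTERN = re.compile(r"\[\s*([^\[\]:]+?)\s*:\s*([^\[\]]+?)\s*\]")


def parse_annotated_utterance(annotated):
    """Return (plain_text, slots) for an utterance with inline slot brackets."""
    # Collect every (slot_type, value) pair in order of appearance.
    slots = [(m.group(1), m.group(2)) for m in SLOT_PATTERN.finditer(annotated)]
    # Replace each bracketed span with its value to recover the plain utterance.
    plain_text = SLOT_PATTERN.sub(lambda m: m.group(2), annotated)
    return plain_text, slots


# Illustrative example (hypothetical utterance and intent label).
example = {
    "intent": "alarm_set",
    "annotated": "wake me up at [time : nine am] on [date : friday]",
}

text, slots = parse_annotated_utterance(example["annotated"])
print(example["intent"])
print(text)
print(slots)
```

Each utterance therefore carries one intent label for the whole sentence and zero or more typed slot spans inside it.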

Variants: MASSIVE

Associated Benchmarks

This dataset is used in 2 benchmarks:

Intent Classification - Metrics: Intent Accuracy
Slot Filling - Metrics: Slot F1 Score
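The two metrics above can be sketched as follows. This is one common formulation, not necessarily the exact scoring used by the benchmark: intent accuracy is the fraction of utterances with the correct intent, and slot F1 is assumed here to be micro-averaged over exact `(slot_type, value)` matches. The function names and the sample labels are hypothetical.

```python
from collections import Counter


def intent_accuracy(gold_intents, predicted_intents):
    """Fraction of utterances whose predicted intent equals the gold intent."""
    if len(gold_intents) != len(predicted_intents):
        raise ValueError("gold and predicted lists must have the same length")
    if not gold_intents:
        return 0.0
    correct = sum(g == p for g, p in zip(gold_intents, predicted_intents))
    return correct / len(gold_intents)


def slot_f1(gold_slots, predicted_slots):
    """Micro-averaged F1 over exact (slot_type, value) matches per utterance."""
    if len(gold_slots) != len(predicted_slots):
        raise ValueError("gold and predicted lists must have the same length")
    true_positives = gold_total = predicted_total = 0
    for gold, predicted in zip(gold_slots, predicted_slots):
        gold_counts, predicted_counts = Counter(gold), Counter(predicted)
        # Multiset intersection: a gold slot can be matched at most once.
        true_positives += sum((gold_counts & predicted_counts).values())
        gold_total += sum(gold_counts.values())
        predicted_total += sum(predicted_counts.values())
    if true_positives == 0:
        return 0.0
    precision = true_positives / predicted_total
    recall = true_positives / gold_total
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold labels and predictions for two utterances.
gold_intents = ["alarm_set", "weather_query"]
pred_intents = ["alarm_set", "play_music"]
gold_slots = [[("time", "nine am"), ("date", "friday")], [("place_name", "boston")]]
pred_slots = [[("time", "nine am")], [("place_name", "boston"), ("date", "today")]]

print(intent_accuracy(gold_intents, pred_intents))
print(slot_f1(gold_slots, pred_slots))
```

Micro-averaging pools counts across all utterances before computing precision and recall, so frequent slot types weigh more than rare ones; a macro average per slot type would be a different design choice.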

Recent Benchmark Submissions

Task	Model	Paper	Date
Intent Classification	mT5 Base (encoder-only)	MASSIVE: A 1M-Example Multilingual Natural …	2022-04-18
Intent Classification	mT5 Base (text-to-text)	MASSIVE: A 1M-Example Multilingual Natural …	2022-04-18
Intent Classification	XLM-R Base	MASSIVE: A 1M-Example Multilingual Natural …	2022-04-18
Slot Filling	XLM-R Base	MASSIVE: A 1M-Example Multilingual Natural …	2022-04-18
Slot Filling	mT5 Base (encoder-only)	MASSIVE: A 1M-Example Multilingual Natural …	2022-04-18
Slot Filling	mT5 Base (text-to-text)	MASSIVE: A 1M-Example Multilingual Natural …	2022-04-18

Research Papers

Recent papers with results on this dataset:

MASSIVE: A 1M-Example Multilingual Natural Language Understanding Dataset with 51 Typologically-Diverse Languages (2022)
