SGD

Schema-Guided Dialogue

Dataset Information
Modalities: Texts
Languages: English

Overview

The Schema-Guided Dialogue (SGD) dataset consists of over 20k annotated multi-domain, task-oriented conversations between a human and a virtual assistant. These conversations involve interactions with services and APIs spanning 20 domains, ranging from banks and events to media, calendar, travel, and weather. For most of these domains, the dataset contains multiple APIs, many of which have overlapping functionality but different interfaces, reflecting common real-world scenarios. The wide range of available annotations can be used for intent prediction, slot filling, dialogue state tracking, policy imitation learning, language generation, and user simulation learning, among other tasks in large-scale virtual assistants. In addition, the evaluation set contains unseen domains and services, which makes it possible to quantify performance in zero-shot or few-shot settings.

Source: The Schema-Guided Dialogue Dataset
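
The two ideas in the overview (services with overlapping functionality but different interfaces, and evaluation on services unseen during training) can be illustrated with a minimal Python sketch. Everything here is hypothetical: the ServiceSchema class, the service names, and the slot names are invented for illustration and are not the dataset's actual file format or contents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceSchema:
    """Hypothetical, simplified stand-in for one service's schema."""
    name: str
    domain: str
    intents: tuple
    slots: tuple


# Two made-up services in the same domain: the functionality overlaps,
# but the interfaces (intent and slot names) differ.
FLIGHTS_A = ServiceSchema(
    "Flights_A", "travel", ("SearchFlight",), ("origin", "destination", "date")
)
FLIGHTS_B = ServiceSchema(
    "Flights_B", "travel", ("FindFlights",), ("from_city", "to_city", "departure_date")
)


def empty_state(schema):
    """Build an unfilled dialogue state keyed by the service's own slot names."""
    return {slot: None for slot in schema.slots}


def unseen_services(train, evaluation):
    """Return names of evaluation services that never appear in training."""
    seen = {schema.name for schema in train}
    return sorted(schema.name for schema in evaluation if schema.name not in seen)


# A state tracker has to fill a different set of slots for each service,
# even though both services cover the same kind of request.
state_b = empty_state(FLIGHTS_B)

# Services present only in the evaluation split form the zero-shot portion.
zero_shot = unseen_services(train=[FLIGHTS_A], evaluation=[FLIGHTS_A, FLIGHTS_B])
```

In this sketch, a model trained only on Flights_A would be evaluated on Flights_B without ever having seen its slot names, which is the kind of setting the unseen evaluation services are meant to measure.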

Variants: SGD

Associated Benchmarks

This dataset is used in 2 benchmarks:

Recent Benchmark Submissions

Task                            Model   Paper                                          Date
Task-Oriented Dialogue Systems  T5      The GEM Benchmark: Natural Language …          2021-02-02
Task-Oriented Dialogue Systems  BART    The GEM Benchmark: Natural Language …          2021-02-02
Classification                  SGD_ss  A Sequence-to-Sequence Approach to Dialogue …  2020-11-18

Research Papers

Recent papers with results on this dataset: