SEDE

Name: SEDE
Published: 2021-06-09
License: Unknown

Stack Exchange Data Explorer

Dataset Information

Modalities

Texts

Introduced

2021

License

Unknown

Homepage

Official Website

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

SEDE is a dataset of 12,023 complex and diverse SQL queries paired with their natural language titles and descriptions, written by real users of the Stack Exchange Data Explorer in the course of natural interaction. These pairs exhibit a variety of real-world challenges that have rarely been reflected in other semantic parsing datasets. The goal of the dataset is to take a significant step towards evaluating Text-to-SQL models in a real-world setting. SEDE contains at least 10 times more SQL query templates (queries after canonicalization and anonymization of values) than other Text-to-SQL datasets, and has the most diverse set of utterances and SQL queries (in terms of 3-grams) of all single-domain datasets. The real-world challenges it introduces include under-specification, use of parameters in queries, date manipulation, and more.
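
To make the notion of a "query template" concrete, the following is a minimal sketch of how literal values might be anonymized so that queries differing only in their constants collapse to the same template. It is not the procedure used to build SEDE: the function names `to_template` and `count_templates`, the placeholder tokens, and the regular expressions are all assumptions made for illustration, including the assumption that query parameters are written between double hash marks (for example `##UserId:int##`).

```python
import re

# All patterns and placeholder tokens below are illustrative assumptions,
# not the canonicalization actually used to build the dataset.
_STRING = re.compile(r"'(?:[^']|'')*'")      # single-quoted SQL string literal
_PARAM = re.compile(r"##[^#]+##")            # assumed parameter syntax, e.g. ##UserId:int##
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")   # integer or decimal literal
_SPACE = re.compile(r"\s+")


def to_template(sql: str) -> str:
    """Replace literal values with placeholders and normalize case and spacing."""
    t = _STRING.sub("<str>", sql)    # strings first, since they may contain digits
    t = _PARAM.sub("<param>", t)     # parameters before numbers (defaults may be numeric)
    t = _NUMBER.sub("<num>", t)
    t = _SPACE.sub(" ", t).strip()
    return t.lower()


def count_templates(queries: list[str]) -> int:
    """Number of distinct templates among the given queries."""
    return len({to_template(q) for q in queries})


q1 = "SELECT TOP 10 Id FROM Users WHERE Reputation > 1000 AND DisplayName = 'Jon'"
q2 = "select top 5  Id from Users where Reputation > 42 and DisplayName = 'O''Brien'"
q3 = "SELECT Id FROM Posts WHERE OwnerUserId = ##UserId:int##"

print(to_template(q1))
print(count_templates([q1, q2, q3]))
```

Here `q1` and `q2` differ only in their constants and spacing, so they map to one template, while `q3` forms a second. A regex-based approach like this is only a rough approximation; a real canonicalization step would more plausibly operate on a parsed query.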

Variants: SEDE

Associated Benchmarks

This dataset is used in 1 benchmark:

Text-To-SQL - Metrics: PCM-F1 (dev), PCM-F1 (test)

Recent Benchmark Submissions

Task	Model	Paper	Date
Text-To-SQL	T5-Large	Text-to-SQL in the Wild: A …	2021-06-09

Research Papers

Recent papers with results on this dataset:

Text-to-SQL in the Wild: A Naturally-Occurring Dataset Based on Stack Exchange Data (2021)

External Links:

SEDE
