KaggleDBQA

KaggleDBQA: Realistic Text-to-SQL dataset

Dataset Information
Modalities: Texts
Languages: English
Introduced: 2021

Overview

KaggleDBQA is a challenging cross-domain evaluation dataset for text-to-SQL, built from real Web databases that retain their domain-specific data types and original formatting, and paired with unrestricted questions.

It expands upon contemporary cross-domain text-to-SQL datasets in three key aspects:
(1) Its databases are pulled from real-world data sources and not normalized.
(2) Its questions are authored in environments that mimic natural question answering.
(3) It also provides database documentation that contains rich in-domain knowledge.
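To make the three aspects concrete, here is a minimal Python sketch of what one evaluation item might look like. Everything in it is an assumption made for illustration: the `crime` table, its terse column names, the `column_docs` descriptions, the `describe_schema` helper, and the question/SQL pair are hypothetical and are not taken from KaggleDBQA itself.

```python
import sqlite3

# Hypothetical, un-normalized table with terse column names
# (illustrative only; NOT an actual KaggleDBQA database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crime (yr INTEGER, st TEXT, vc_cnt INTEGER)")
conn.executemany(
    "INSERT INTO crime VALUES (?, ?, ?)",
    [(2019, "WA", 120), (2019, "OR", 80), (2020, "WA", 150)],
)

# Hypothetical column documentation of the kind a parser could take as extra input.
column_docs = {
    "yr": "calendar year of the report",
    "st": "two-letter US state code",
    "vc_cnt": "number of violent crimes reported",
}


def describe_schema(table, docs):
    """Render each column with its description as a plain-text schema fragment."""
    return "\n".join(f"{table}.{col}: {desc}" for col, desc in docs.items())


# One question/SQL pair; the SQL is hand-written here, not model output.
question = "How many violent crimes were reported in Washington in 2020?"
gold_sql = "SELECT vc_cnt FROM crime WHERE st = 'WA' AND yr = 2020"

schema_text = describe_schema("crime", column_docs)
result = conn.execute(gold_sql).fetchall()

print(schema_text)
print(question)
print(result)
```

Without the documentation, a column name such as `vc_cnt` gives a parser little to match against the phrase "violent crimes", which is why the accompanying documentation matters for databases that keep their original naming.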

Variants: KaggleDBQA

Associated Benchmarks

This dataset is used in 1 benchmark:

Recent Benchmark Submissions

Task         Model     Paper                                              Date
Text-To-SQL  RAT-SQL   KaggleDBQA: Realistic Evaluation of Text-to-SQL …  2021-06-22
Text-To-SQL  Edit-SQL  KaggleDBQA: Realistic Evaluation of Text-to-SQL …  2021-06-22

Research Papers

Recent papers with results on this dataset: