WikiTableQuestions

Dataset Information
Modalities: Texts, Tabular
Languages: English
Introduced: 2015

Overview

WikiTableQuestions is a question answering dataset over semi-structured tables. It consists of question-answer pairs on HTML tables. It was constructed by selecting data tables from Wikipedia that contain at least 8 rows and 5 columns, and Amazon Mechanical Turk workers were then asked to write trivia questions about each table. WikiTableQuestions contains 22,033 questions. The questions were not generated from predefined templates but were written by hand, so they exhibit high linguistic variability. Compared with previous knowledge-base datasets, it covers nearly 4,000 unique column headers, and therefore far more relations than closed-domain datasets or datasets for querying knowledge bases. Its questions span a wide range of domains and require operations such as table lookup, aggregation, superlatives (argmax, argmin), arithmetic, joins, and unions.
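To make the operation types listed above concrete, here is a minimal Python sketch over a small hypothetical medal table. The table, the fictional nations, and every helper name (`meets_size_criterion`, `lookup`, `count`, `total`, `argmax`, `argmin`, `difference`, `union`) are illustrative assumptions, not part of the dataset or any official tooling. Each function stands for one kind of operation a model would need to carry out to answer a question of that type; joins are omitted for brevity.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Predicate = Callable[[Row], bool]

# Hypothetical medal table (fictional nations, made-up counts), NOT from the dataset.
# Cells are kept as strings, as they would be when read from a scraped HTML table.
HEADER = ["Rank", "Nation", "Gold", "Silver", "Bronze"]
RAW_ROWS = [
    ["1", "Alpha", "5", "2", "1"],
    ["2", "Beta", "4", "3", "2"],
    ["3", "Gamma", "3", "3", "3"],
    ["4", "Delta", "2", "4", "1"],
    ["5", "Epsilon", "2", "1", "4"],
    ["6", "Zeta", "1", "2", "2"],
    ["7", "Eta", "1", "0", "3"],
    ["8", "Theta", "0", "2", "1"],
]
TABLE: List[Row] = [dict(zip(HEADER, row)) for row in RAW_ROWS]


def meets_size_criterion(table: List[Row], min_rows: int = 8, min_cols: int = 5) -> bool:
    """Table selection rule: at least 8 rows and 5 columns."""
    if not table:
        return False
    return len(table) >= min_rows and len(table[0]) >= min_cols


def lookup(table: List[Row], key_col: str, key: str, target_col: str) -> str:
    """Table lookup, e.g. 'How many silver medals did Beta win?'"""
    for row in table:
        if row[key_col] == key:
            return row[target_col]
    raise KeyError(f"{key!r} not found in column {key_col!r}")


def count(table: List[Row], pred: Predicate) -> int:
    """Aggregation (count), e.g. 'How many nations won at least 3 golds?'"""
    return sum(1 for row in table if pred(row))


def total(table: List[Row], col: str) -> int:
    """Aggregation (sum), e.g. 'How many gold medals were awarded in total?'"""
    return sum(int(row[col]) for row in table)


def argmax(table: List[Row], col: str, return_col: str) -> str:
    """Superlative, e.g. 'Which nation won the most bronze medals?'"""
    return max(table, key=lambda row: int(row[col]))[return_col]


def argmin(table: List[Row], col: str, return_col: str) -> str:
    """Superlative, e.g. 'Which nation won the fewest gold medals?'"""
    return min(table, key=lambda row: int(row[col]))[return_col]


def difference(table: List[Row], key_col: str, a: str, b: str, col: str) -> int:
    """Arithmetic, e.g. 'How many more golds did Alpha win than Delta?'"""
    return int(lookup(table, key_col, a, col)) - int(lookup(table, key_col, b, col))


def union(table: List[Row], pred_a: Predicate, pred_b: Predicate, return_col: str) -> List[str]:
    """Union of two conditions, e.g. 'Which nations won 4+ golds or 4+ bronzes?'"""
    return [row[return_col] for row in table if pred_a(row) or pred_b(row)]


if __name__ == "__main__":
    print(meets_size_criterion(TABLE))                            # True
    print(lookup(TABLE, "Nation", "Beta", "Silver"))              # 3
    print(count(TABLE, lambda r: int(r["Gold"]) >= 3))            # 3
    print(total(TABLE, "Gold"))                                   # 18
    print(argmax(TABLE, "Bronze", "Nation"))                      # Epsilon
    print(argmin(TABLE, "Gold", "Nation"))                        # Theta
    print(difference(TABLE, "Nation", "Alpha", "Delta", "Gold"))  # 3
    print(union(TABLE, lambda r: int(r["Gold"]) >= 4,
                lambda r: int(r["Bronze"]) >= 4, "Nation"))       # ['Alpha', 'Beta', 'Epsilon']
```

Python's `max` and `min` return the first extremal row they encounter, so this sketch resolves ties by table order; the toy data is chosen so that every superlative has a unique answer.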

Source: Explaining Queries over Web Tables to Non-Experts
Image Source: https://ppasupat.github.io/WikiTableQuestions/

Variants: WikiTableQuestions

Associated Benchmarks

This dataset is used in 2 benchmarks:

Recent Benchmark Submissions

Task | Model | Paper | Date
Semantic Parsing | ARTEMIS-DA | ARTEMIS-DA: An Advanced Reasoning and … | 2024-12-18
Semantic Parsing | TabLaP | Accurate and Regret-aware Numerical Problem … | 2024-10-10
Semantic Parsing | SynTQA (GPT) | SynTQA: Synergistic Table-based Question Answering … | 2024-09-25
Semantic Parsing | SynTQA (RF) | SynTQA: Synergistic Table-based Question Answering … | 2024-09-25
Semantic Parsing | SynTQA (Oracle) | SynTQA: Synergistic Table-based Question Answering … | 2024-09-25
Semantic Parsing | NormTab+TabSQLify | NormTab: Improving Symbolic Reasoning in … | 2024-06-25
Semantic Parsing | NormTab (Targeted) + SQL | NormTab: Improving Symbolic Reasoning in … | 2024-06-25
Semantic Parsing | Tab-PoT | Efficient Prompting for LLM-based Generative … | 2024-06-14
Semantic Parsing | TabSQLify (col+row) | TabSQLify: Enhancing Reasoning Capabilities of … | 2024-04-15
Question Answering | TabSQLify (col+row) | TabSQLify: Enhancing Reasoning Capabilities of … | 2024-04-15
Question Answering | ChatGPT 3.5 SpatialFormat | LAPDoc: Layout-Aware Prompting for Documents | 2024-02-15
Semantic Parsing | CABINET | CABINET: Content Relevance based Noise … | 2024-02-02
Semantic Parsing | Chain-of-Table | Chain-of-Table: Evolving Tables in the … | 2024-01-09
Semantic Parsing | Mix SC | Rethinking Tabular Data Understanding with … | 2023-12-27
Semantic Parsing | LEVER | LEVER: Learning to Verify Language-to-Code … | 2023-02-16
Semantic Parsing | Dater | Large Language Models are Versatile … | 2023-01-31
Semantic Parsing | ReasTAP-Large | ReasTAP: Injecting Table Reasoning Skills … | 2022-10-22
Semantic Parsing | Binder | Binding Language Models in Symbolic … | 2022-10-06
Semantic Parsing | OmniTab-Large | OmniTab: Pretraining with Natural and … | 2022-07-08
Semantic Parsing | T5-3b (UnifiedSKG) | UnifiedSKG: Unifying and Multi-Tasking Structured … | 2022-01-16

Research Papers

Recent papers with results on this dataset: