TAT-QA

Dataset Information
Modalities: Texts, Tables
Languages: English
Introduced: 2021

Overview

TAT-QA (Tabular And Textual dataset for Question Answering) is a large-scale QA dataset that aims to stimulate progress in QA research over more complex and realistic tabular and textual data, especially questions that require numerical reasoning.

The unique features of TAT-QA include:

  • The context is hybrid, comprising a semi-structured table and at least two relevant paragraphs that describe, analyze, or complement the table;
  • The questions were created by humans with rich financial knowledge, and most of them are practical;
  • The answer forms are diverse, including single spans, multiple spans, and free-form answers;
  • Answering the questions usually requires various numerical reasoning capabilities, including addition (+), subtraction (-), multiplication (×), division (/), counting, comparison, sorting, and their compositions;
  • In addition to the ground-truth answers, the corresponding derivations and scales are also provided where applicable.
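The features above describe hybrid contexts, derivations, and scales in prose. The sketch below shows one way such a record could be represented and checked in Python. The field names (`table`, `paragraphs`, `derivation`, `answer`, `scale`), the figures, and the helper functions `cell`, `eval_derivation`, and `check_answer` are hypothetical and invented for illustration; they are not the official TAT-QA file format or evaluation code.

```python
import ast
import math
import operator

# Operators allowed in a derivation (assumption: plain +, -, *, / arithmetic only).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def eval_derivation(expr: str) -> float:
    """Safely evaluate an arithmetic derivation such as '(1,496 - 1,360) / 1,360'."""
    # Drop thousands separators ('1,496' -> '1496') before parsing.
    tree = ast.parse(expr.replace(",", ""), mode="eval")

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return float(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported expression: {expr!r}")

    return _eval(tree)


def cell(table, row_label, col_label) -> str:
    """Look up a cell by its row label (first column) and column header (first row)."""
    col = table[0].index(col_label)
    for row in table[1:]:
        if row[0] == row_label:
            return row[col]
    raise KeyError(row_label)


# Hypothetical hybrid context: one table plus descriptive paragraphs (invented data).
context = {
    "table": [
        ["", "2019", "2018"],
        ["Revenue", "1,496", "1,360"],
        ["Cost of sales", "902", "845"],
    ],
    "paragraphs": [
        "Revenue increased in 2019, driven by higher subscription sales.",
        "All amounts are presented in millions of dollars.",
    ],
}

# Hypothetical question whose derivation combines two table cells.
question = {
    "question": "What was the percentage change in revenue between 2018 and 2019?",
    "derivation": "(1,496 - 1,360) / 1,360 * 100",
    "answer": 10.0,
    "scale": "percent",
}


def check_answer(q: dict) -> bool:
    """Return True if the derivation reproduces the stated numeric answer."""
    return math.isclose(eval_derivation(q["derivation"]), q["answer"], rel_tol=1e-9)


if __name__ == "__main__":
    new = cell(context["table"], "Revenue", "2019")
    old = cell(context["table"], "Revenue", "2018")
    print(f"({new} - {old}) / {old} * 100")  # derivation rebuilt from table cells
    value = eval_derivation(question["derivation"])
    print(f"{value:.2f} {question['scale']}")  # the scale travels with the number
    print(check_answer(question))
```

Parsing the derivation with `ast` instead of calling `eval` restricts the check to arithmetic expressions. Questions that rely on counting, comparison, or sorting would need separate handling in a sketch like this.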

In total, TAT-QA contains 16,552 questions associated with 2,757 hybrid contexts from real-world financial reports.

Variants: TAT-QA

Associated Benchmarks

This dataset is used in one benchmark.

Recent Benchmark Submissions

Task               | Model | Paper                                    | Date
Question Answering | TagOp | TAT-QA: A Question Answering Benchmark … | 2021-05-17

Research Papers

Recent papers with results on this dataset: