SQuAD

Stanford Question Answering Dataset

Dataset Information
Modalities
Texts
Languages
English
Introduced
2016
License
Homepage

Overview

The Stanford Question Answering Dataset (SQuAD) is a collection of question-answer pairs derived from Wikipedia articles. In SQuAD, the correct answer to a question can be any span (a contiguous sequence of tokens) in the given passage. Because the questions and answers are written by humans through crowdsourcing, the dataset is more diverse than some other question-answering datasets. SQuAD1.1 contains 107,785 question-answer pairs on 536 articles. SQuAD2.0, the latest version, combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to the answerable ones.

Source: Deep Learning Based Text Classification: A Comprehensive Review
Image Source: https://rajpurkar.github.io/SQuAD-explorer/explore/v2.0/dev/Prime_number.html
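
The span-based answer format and the unanswerable questions of SQuAD2.0 can be illustrated with a minimal Python sketch. It assumes the field names of the released JSON files (`question`, `answers`, `text`, `answer_start`, `is_impossible`), with `answer_start` taken to be a character offset into the passage. The passage, the two questions, and the helper `extract_answer` are made up for illustration and are not part of the dataset or any library.

```python
from typing import Optional

# Illustrative passage (not taken from the dataset).
CONTEXT = "The Stanford Question Answering Dataset was built from Wikipedia articles."

# Hypothetical answerable record: the answer is a span of the passage,
# located by its character offset.
ANSWERABLE = {
    "question": "What was the dataset built from?",
    "answers": [
        {
            "text": "Wikipedia articles",
            "answer_start": CONTEXT.index("Wikipedia articles"),
        }
    ],
    "is_impossible": False,
}

# Hypothetical SQuAD2.0-style unanswerable record: it looks like an ordinary
# question, but the passage contains no answer.
UNANSWERABLE = {
    "question": "Who funded the dataset?",
    "answers": [],
    "is_impossible": True,
}


def extract_answer(context: str, qa: dict) -> Optional[str]:
    """Return the first gold answer span, or None if the question is unanswerable."""
    if qa.get("is_impossible", False) or not qa["answers"]:
        return None
    answer = qa["answers"][0]
    start = answer["answer_start"]
    span = context[start:start + len(answer["text"])]
    # The gold answer must appear verbatim in the passage at the stated offset.
    if span != answer["text"]:
        raise ValueError("answer text does not match the passage at answer_start")
    return span


print(extract_answer(CONTEXT, ANSWERABLE))
print(extract_answer(CONTEXT, UNANSWERABLE))
```

A system evaluated on SQuAD2.0 therefore has to do two things: select a span when one exists, and abstain when the passage does not support an answer.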

Variants: squad_bn, squad_adversarial, qg_squad, The Stanford Question Answering Dataset, squad_v2, SQuAD, SQuAD2.0 dev, SQuAD2.0, SQuAD1.1 dev, SQuAD1.1

Associated Benchmarks

This dataset is used in 3 benchmarks: Data-free Knowledge Distillation, Question Answering, and Question Generation.

Recent Benchmark Submissions

Task | Model | Paper | Date
Data-free Knowledge Distillation | GOLD (T5-base) | GOLD: Generalized Knowledge Distillation via … | 2024-03-28
Question Answering | Blended RAG | Blended RAG: Improving RAG (Retriever-Augmented … | 2024-03-22
Data-free Knowledge Distillation | Prompt2Model (T5-base) | Prompt2Model: Generating Deployable Models from … | 2023-08-23
Data-free Knowledge Distillation | ProGen (T5-base) | ProGen: Progressive Zero-shot Dataset Generation … | 2022-10-22
Data-free Knowledge Distillation | ZeroGen (T5-base) | ZeroGen: Efficient Zero-shot Learning via … | 2022-02-16
Question Answering | RAG-end2end | Fine-tune the Entire RAG Architecture … | 2021-06-22
Question Generation | Info-HCVAE | Generating Diverse and Consistent QA … | 2020-05-28
Question Generation | HCVAE | Generating Diverse and Consistent QA … | 2020-05-28

Research Papers

Recent papers with results on this dataset: