Stanford Question Answering Dataset
The Stanford Question Answering Dataset (SQuAD) is a collection of question-answer pairs derived from Wikipedia articles. In SQuAD, the correct answer to a question can be any sequence of tokens in the given text. Because the questions and answers are produced by humans through crowdsourcing, SQuAD is more diverse than some other question-answering datasets. SQuAD1.1 contains 107,785 question-answer pairs on 536 articles. SQuAD2.0, the latest version, combines the more than 100,000 questions in SQuAD1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to the answerable ones.
Source: Deep Learning Based Text Classification: A Comprehensive Review
Image Source: https://rajpurkar.github.io/SQuAD-explorer/explore/v2.0/dev/Prime_number.html
Variants: squad_bn, squad_adversarial, qg_squad, The Stanford Question Answering Dataset, squad_v2, SQuAD, SQuAD2.0 dev, SQuAD2.0, SQuAD1.1 dev, SQuAD1.1
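Because every answer in SQuAD is a span of the passage, and SQuAD2.0 adds questions with no answer at all, it can help to see how one record is laid out. The sketch below walks a tiny hand-written sample in the SQuAD v2.0 JSON layout and checks that each stored character offset really points at the answer text. The passage, questions, and ids are made up for illustration, and `iter_examples` is a hypothetical helper name, not part of any official SQuAD tooling.

```python
# Illustrative record in the SQuAD v2.0 JSON layout.
# The passage, questions, and ids are invented for this sketch.
sample = {
    "version": "v2.0",
    "data": [
        {
            "title": "Prime_number",
            "paragraphs": [
                {
                    "context": "A prime number is a natural number greater than 1.",
                    "qas": [
                        {
                            "id": "example-0001",
                            "question": "What kind of number is a prime number?",
                            "is_impossible": False,
                            # answer_start is a character offset into context
                            "answers": [{"text": "natural number", "answer_start": 20}],
                        },
                        {
                            "id": "example-0002",
                            "question": "Who first defined prime numbers?",
                            # SQuAD2.0-style unanswerable question: no answer span
                            "is_impossible": True,
                            "answers": [],
                        },
                    ],
                }
            ],
        }
    ],
}


def iter_examples(dataset):
    """Yield (id, question, span) triples; span is None when unanswerable.

    Hypothetical helper: span is a (start, end) pair of character offsets
    such that context[start:end] equals the answer text.
    """
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                # SQuAD1.1 files have no is_impossible field, so default to False.
                if qa.get("is_impossible", False):
                    yield qa["id"], qa["question"], None
                    continue
                for answer in qa["answers"]:
                    start = answer["answer_start"]
                    end = start + len(answer["text"])
                    if context[start:end] != answer["text"]:
                        raise ValueError(f"offset mismatch in {qa['id']}")
                    yield qa["id"], qa["question"], (start, end)


examples = list(iter_examples(sample))
for qa_id, question, span in examples:
    print(qa_id, question, span)
```

Returning `None` for unanswerable questions keeps the two cases distinct for downstream code, which matters for SQuAD2.0 where a system is expected to abstain rather than guess a span.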
This dataset is used in 3 benchmarks:
Task | Model | Paper | Date |
---|---|---|---|
Data-free Knowledge Distillation | GOLD (T5-base) | GOLD: Generalized Knowledge Distillation via … | 2024-03-28 |
Question Answering | Blended RAG | Blended RAG: Improving RAG (Retriever-Augmented … | 2024-03-22 |
Data-free Knowledge Distillation | Prompt2Model (T5-base) | Prompt2Model: Generating Deployable Models from … | 2023-08-23 |
Data-free Knowledge Distillation | ProGen (T5-base) | ProGen: Progressive Zero-shot Dataset Generation … | 2022-10-22 |
Data-free Knowledge Distillation | ZeroGen (T5-base) | ZeroGen: Efficient Zero-shot Learning via … | 2022-02-16 |
Question Answering | RAG-end2end | Fine-tune the Entire RAG Architecture … | 2021-06-22 |
Question Generation | Info-HCVAE | Generating Diverse and Consistent QA … | 2020-05-28 |
Question Generation | HCVAE | Generating Diverse and Consistent QA … | 2020-05-28 |
Recent papers with results on this dataset: