WikiQA

Wikipedia open-domain Question Answering

Dataset Information
Modalities: Texts
Languages: English
Introduced: 2015

Overview

The WikiQA corpus is a publicly available set of question and sentence pairs, collected and annotated for research on open-domain question answering. To reflect the real information needs of general users, the questions were drawn from Bing query logs. Each question is linked to a Wikipedia page that potentially contains the answer. Because the summary section of a Wikipedia page provides the basic and usually most important information about its topic, sentences from this section were used as candidate answers. The corpus includes 3,047 questions and 29,258 sentences, of which 1,473 sentences are labeled as answer sentences to their corresponding questions.

Source: http://aka.ms/WikiQA
Image Source: Yang et al
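
The pair structure and the counts quoted in the overview can be illustrated with a minimal Python sketch. The record type, its field names, and the toy rows are hypothetical and are not taken from the corpus or its distribution format; only the three corpus-level counts come from the description above. The last figure is a lower bound that assumes each labeled answer sentence belongs to exactly one question.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """One question/candidate-sentence pair (hypothetical field names)."""
    question: str
    sentence: str
    is_answer: bool


def group_by_question(pairs):
    """Map each question to its list of (sentence, is_answer) candidates."""
    grouped = defaultdict(list)
    for p in pairs:
        grouped[p.question].append((p.sentence, p.is_answer))
    return dict(grouped)


# Toy, made-up rows for illustration only; not taken from the corpus.
toy_pairs = [
    Pair("what is a glacier", "A glacier is a persistent body of dense ice.", True),
    Pair("what is a glacier", "Glaciers form where snow accumulates.", False),
    Pair("who wrote hamlet", "Hamlet is a tragedy.", False),
]
grouped = group_by_question(toy_pairs)

# Corpus-level counts quoted in the overview.
NUM_QUESTIONS = 3047
NUM_SENTENCES = 29258
NUM_ANSWER_SENTENCES = 1473

# Average number of candidate sentences attached to a question.
candidates_per_question = NUM_SENTENCES / NUM_QUESTIONS

# Share of candidate sentences that carry a positive label.
answer_fraction = NUM_ANSWER_SENTENCES / NUM_SENTENCES

# Assumption: each answer sentence belongs to one question, so at most
# NUM_ANSWER_SENTENCES questions can have an answer at all.
min_questions_without_answer = NUM_QUESTIONS - NUM_ANSWER_SENTENCES

print(f"{candidates_per_question:.1f} candidate sentences per question on average")
print(f"{answer_fraction:.1%} of candidate sentences are labeled as answers")
print(f"at least {min_questions_without_answer} questions have no answer sentence")
```

Under these counts, a question has roughly 9.6 candidate sentences on average and about 5% of all candidates are positive, so any model trained on the pairs sees a strongly imbalanced label distribution.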

Variants: WikiQA

Associated Benchmarks

This dataset is used in 1 benchmark:

Recent Benchmark Submissions

| Task | Model | Paper | Date |
|---|---|---|---|
| Question Answering | TANDA-DeBERTa-V3-Large + ALL | Structural Self-Supervised Objectives for Transformers | 2023-09-15 |
| Question Answering | RLAS-BIABC | RLAS-BIABC: A Reinforcement Learning-Based Answer … | 2023-01-07 |
| Question Answering | DeBERTa-V3-Large + ALL | Pre-training Transformer Models with Sentence-Level … | 2022-05-20 |
| Question Answering | RoBERTa-Base + SSP | Pre-training Transformer Models with Sentence-Level … | 2022-05-20 |
| Question Answering | DeBERTa-Large + SSP | Pre-training Transformer Models with Sentence-Level … | 2022-05-20 |
| Question Answering | RoBERTa-Base Joint MSPP | Paragraph-based Transformer Pre-training for Multi-Sentence … | 2022-05-02 |
| Question Answering | TANDA-RoBERTa (ASNQ, WikiQA) | TANDA: Transfer and Adapt Pre-Trained … | 2019-11-11 |
| Question Answering | RE2 | Simple and Effective Text Matching … | 2019-08-01 |
| Question Answering | Comp-Clip + LM + LC | A Compare-Aggregate Model with Latent … | 2019-05-30 |
| Question Answering | PairwiseRank + Multi-Perspective CNN | Noise Contrastive Estimation and Negative … | 2018-09-06 |
| Question Answering | SWEM-concat | Baseline Needs More Love: On … | 2018-05-24 |
| Question Answering | HyperQA | Hyperbolic Representation Learning for Fast … | 2017-07-25 |
| Question Answering | MMA-NSE attention | Neural Semantic Encoders | 2016-07-14 |
| Question Answering | Key-Value Memory Network | Key-Value Memory Networks for Directly … | 2016-06-09 |
| Question Answering | LDC | Sentence Similarity Learning by Lexical … | 2016-02-23 |
| Question Answering | AP-CNN | Attentive Pooling Networks | 2016-02-11 |
| Question Answering | LSTM (lexical overlap + dist output) | Neural Variational Inference for Text … | 2015-11-19 |
| Question Answering | Attentive LSTM | Neural Variational Inference for Text … | 2015-11-19 |
| Question Answering | LSTM | Neural Variational Inference for Text … | 2015-11-19 |
| Question Answering | Bigram-CNN (lexical overlap + dist output) | Deep Learning for Answer Sentence … | 2014-12-04 |

Research Papers

Recent papers with results on this dataset: