HotpotQA

Name: HotpotQA
Published: 2018-01-01
License: CC BY-SA 4.0

Dataset Information

Modalities: Texts
Languages: English
Introduced: 2018
License: CC BY-SA 4.0

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

HotpotQA is a question answering dataset built from English Wikipedia. It contains about 113K crowd-sourced questions, each constructed so that answering it requires the introductory paragraphs of two Wikipedia articles. Each question comes with its two gold paragraphs and a list of the sentences in those paragraphs that crowdworkers identified as supporting facts necessary to answer it.
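
To make the shape of one example concrete, here is a minimal Python sketch of a question with its two gold paragraphs and supporting facts. The field names (`question`, `answer`, `context`, `supporting_facts`) and the (title, sentence index) encoding are assumptions for illustration, and the towers and sentences are invented rather than taken from the dataset.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class HotpotExample:
    """Hypothetical container for one question; field names are assumptions."""
    question: str
    answer: str
    # Each entry: (article title, sentences of that article's intro paragraph).
    context: List[Tuple[str, List[str]]]
    # Each entry: (article title, index of a supporting sentence in that paragraph).
    supporting_facts: List[Tuple[str, int]]


def supporting_sentences(ex: HotpotExample) -> List[str]:
    """Resolve (title, index) pairs to the sentence text they point at."""
    by_title = dict(ex.context)
    return [by_title[title][idx] for title, idx in ex.supporting_facts]


# Invented comparison-style question over two made-up articles.
example = HotpotExample(
    question="Which is taller, North Tower or South Tower?",
    answer="North Tower",
    context=[
        ("North Tower", ["North Tower is a tower in Exampleville.",
                         "It is 300 metres tall."]),
        ("South Tower", ["South Tower is a tower in Sampleton.",
                         "It is 250 metres tall."]),
    ],
    supporting_facts=[("North Tower", 1), ("South Tower", 1)],
)

print(supporting_sentences(example))
```

Storing supporting facts as (title, index) pairs rather than raw text keeps them comparable as sets, which is convenient when scoring predicted supporting facts against the annotation.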

HotpotQA features a diverse range of reasoning strategies, including questions that hinge on an entity missing from the question, intersection questions (What satisfies property A and property B?), and comparison questions, where two entities are compared by a common attribute, among others. In the few-document distractor setting, QA models are given ten paragraphs that are guaranteed to contain the gold paragraphs; in the open-domain fullwiki setting, models are given only the question and the whole of Wikipedia.

Models are evaluated on answer accuracy and explainability. Answer accuracy is measured as the overlap between the predicted and gold answers, using exact match (EM) and unigram F1. Explainability is measured by how well the predicted supporting fact sentences match the human annotation (Supporting Fact EM/F1). A joint metric is also reported, which encourages systems to perform well on both tasks simultaneously.
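
The metrics described above can be sketched in a few lines of Python. This is a simplified illustration, not the official evaluation script: the normalization step (lowercasing, stripping punctuation and English articles) is an assumption borrowed from common extractive QA practice, and the joint F1 shown here, which multiplies answer and supporting fact precision and recall before combining them, is one plausible reading of "joint metric" that should be checked against the official scorer. All function names are hypothetical.

```python
import re
import string
from collections import Counter

_PUNCT = set(string.punctuation)


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def _prf(num_same, num_pred, num_gold):
    """Precision, recall and F1 from overlap counts."""
    if num_same == 0:
        return 0.0, 0.0, 0.0
    p, r = num_same / num_pred, num_same / num_gold
    return p, r, 2 * p * r / (p + r)


def answer_em(pred, gold):
    """Exact match on normalized answer strings."""
    return float(normalize(pred) == normalize(gold))


def answer_prf(pred, gold):
    """Unigram precision/recall/F1 between predicted and gold answers."""
    pred_toks, gold_toks = normalize(pred).split(), normalize(gold).split()
    overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    return _prf(overlap, len(pred_toks), len(gold_toks))


def support_prf(pred_facts, gold_facts):
    """Set-based precision/recall/F1 over (title, sentence index) pairs."""
    pred_set, gold_set = set(pred_facts), set(gold_facts)
    return _prf(len(pred_set & gold_set), len(pred_set), len(gold_set))


def joint_f1(ans_prf, sup_prf):
    """Assumed joint F1: products of precisions and of recalls, then F1."""
    p, r = ans_prf[0] * sup_prf[0], ans_prf[1] * sup_prf[1]
    return 2 * p * r / (p + r) if p + r > 0 else 0.0


# Invented prediction: answer slightly too long, one supporting fact wrong.
ans = answer_prf("north tower building", "North Tower")
sup = support_prf([("North Tower", 1), ("South Tower", 0)],
                  [("North Tower", 1), ("South Tower", 1)])
print(answer_em("the North Tower.", "North Tower"), ans[2], sup[2], joint_f1(ans, sup))
```

In this invented case the answer F1 is about 0.8 and the supporting fact F1 is 0.5, while the joint F1 comes out at about 0.4, lower than either: a system cannot score well jointly by getting only one of the two tasks right.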

Source: Answering Complex Open-domain Questions Through Iterative Query Generation
Image Source: Yang et al.

Variants: HotpotQA, hotpot_qa

Associated Benchmarks

This dataset is used in 2 benchmarks:

Question Answering - Metrics: JOINT-F1, ANS-EM, ANS-F1, SUP-EM, SUP-F1, JOINT-EM
Retrieval - Metrics: Queries per second
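
For the retrieval benchmark, queries per second is simply the number of queries processed divided by the wall-clock time taken. A minimal Python sketch follows; the `toy_retrieve` function and its two-document corpus are invented stand-ins for a real retriever, and the timing loop is an assumed methodology, not the one any listed submission used.

```python
import time

# Invented two-document corpus standing in for a real index.
CORPUS = {
    "North Tower": "north tower is 300 metres tall",
    "South Tower": "south tower is 250 metres tall",
}


def toy_retrieve(query):
    """Return the title whose text shares the most words with the query."""
    q_words = set(query.lower().split())
    return max(CORPUS, key=lambda t: len(q_words & set(CORPUS[t].split())))


def queries_per_second(retrieve, queries):
    """Time a retriever over a batch of queries and report throughput."""
    start = time.perf_counter()
    for q in queries:
        retrieve(q)
    elapsed = time.perf_counter() - start
    return len(queries) / elapsed if elapsed > 0 else float("inf")


queries = ["how tall is north tower", "how tall is south tower"] * 500
print(queries_per_second(toy_retrieve, queries))
```

Throughput measured this way depends heavily on hardware, corpus size, and whether indexing time is included, so figures are only comparable when those conditions match.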

Recent Benchmark Submissions

Task	Model	Paper	Date
Retrieval	BM25S	BM25S: Orders of magnitude faster …	2024-07-04
Retrieval	Rank-BM25	BM25S: Orders of magnitude faster …	2024-07-04
Retrieval	Elasticsearch	BM25S: Orders of magnitude faster …	2024-07-04
Question Answering	Beam Retrieval	End-to-End Beam Retrieval for Multi-Hop …	2023-08-17
Question Answering	Chain-of-Skills	Chain-of-Skills: A Configurable Model for …	2023-05-04
Question Answering	AISO	Adaptive Information Seeking for Open-Domain …	2021-09-14
Question Answering	HopRetriever + Sp-search	HopRetriever: Retrieve Hops over Wikipedia …	2020-12-31
Question Answering	IRRR+	Answering Open-Domain Questions of Varying …	2020-10-23
Question Answering	IRRR	Answering Open-Domain Questions of Varying …	2020-10-23
Question Answering	Recursive Dense Retriever	Answering Complex Open-Domain Questions with …	2020-09-27
Question Answering	DDRQA	Answering Any-hop Open-domain Questions with …	2020-09-16
Question Answering	BigBird-etc	Big Bird: Transformers for Longer …	2020-07-28
Question Answering	Quark + SemanticRetrievalMRS IR	A Simple Yet Strong Pipeline …	2020-04-14
Question Answering	Robustly Fine-tuned Graph-based Recurrent Retriever	Learning to Retrieve Reasoning Paths …	2019-11-24
Question Answering	HGN + SemanticRetrievalMRS IR	Hierarchical Graph Network for Multi-hop …	2019-11-09
Question Answering	KGNN	Multi-Paragraph Reasoning with Knowledge-enhanced Graph …	2019-11-06
Question Answering	GoldEn Retriever	Answering Complex Open-domain Questions Through …	2019-10-15
Question Answering	SemanticRetrievalMRS	Revealing the Importance of Semantic …	2019-09-17
Question Answering	MUPPET	Multi-Hop Paragraph Retrieval for Open-Domain …	2019-06-15
Question Answering	DecompRC	Multi-hop Reading Comprehension through Question …	2019-06-07

Research Papers

Recent papers with results on this dataset:
