DROP

Discrete Reasoning Over Paragraphs

Dataset Information
Modalities: Texts
Languages: English
Introduced: 2019
Homepage: https://allennlp.org/drop

Overview

Discrete Reasoning Over Paragraphs (DROP) is a crowdsourced, adversarially created benchmark of 96k questions, in which a system must resolve references in a question, perhaps to multiple input positions, and perform discrete operations over them (such as addition, counting, or sorting). These operations require a much more comprehensive understanding of paragraph content than prior datasets demanded. The questions are posed over passages extracted from Wikipedia articles. The dataset is split into a training set of about 77,000 questions, a development set of around 9,500 questions, and a hidden test set similar in size to the development set.
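To illustrate the kind of discrete operations DROP targets, the toy sketch below (the passage and questions are invented for illustration, not drawn from the dataset) extracts numbers from a short passage and performs counting, addition, and a max-style comparison over them:

```python
import re

# Invented toy passage; real DROP passages come from Wikipedia
# and are considerably longer.
passage = ("The Bears scored on a 12-yard touchdown pass, "
           "then added field goals of 25, 38, and 41 yards.")

# Step 1: resolve the relevant spans (here, the field-goal yardages).
match = re.search(r"(\d+), (\d+), and (\d+) yards", passage)
field_goals = [int(n) for n in match.groups()]

# Counting: "How many field goals were kicked?"
count = len(field_goals)      # 3

# Addition: "How many total field-goal yards were gained?"
total = sum(field_goals)      # 25 + 38 + 41 = 104

# Sorting/comparison: "How long was the longest field goal?"
longest = max(field_goals)    # 41

print(count, total, longest)  # → 3 104 41
```

The point of the adversarial crowdsourcing is that answers like these cannot be read off a single span; the system must combine several resolved numbers with an explicit operation.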

Source: https://allennlp.org/drop
Paper: DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs

Variants: DROP Test, DROP, DROP (3-Shot)

Associated Benchmarks

This dataset is used in 1 benchmark:

Recent Benchmark Submissions

Task               | Model                                            | Paper                                  | Date
Question Answering | PaLM 540B (Self Improvement, Self Consistency)   | Large Language Models Can Self-Improve | 2022-10-20
Question Answering | PaLM 540B (Self Consistency)                     | Large Language Models Can Self-Improve | 2022-10-20
Question Answering | PaLM 540B (Self Improvement, CoT Prompting)      | Large Language Models Can Self-Improve | 2022-10-20
Question Answering | PaLM 540B (Self Improvement, Standard Prompting) | Large Language Models Can Self-Improve | 2022-10-20
Question Answering | PaLM 540B (CoT Prompting)                        | Large Language Models Can Self-Improve | 2022-10-20
Question Answering | PaLM 540B (Standard Prompting)                   | Large Language Models Can Self-Improve | 2022-10-20
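The "Self Consistency" submissions above sample several chain-of-thought completions per question and take a majority vote over their final answers. A minimal sketch of that voting step, with placeholder answers standing in for real model outputs:

```python
from collections import Counter

def self_consistency_vote(sampled_answers):
    """Return the most frequent final answer among sampled completions."""
    return Counter(sampled_answers).most_common(1)[0][0]

# Hypothetical final answers parsed from five sampled CoT completions.
samples = ["104", "104", "97", "104", "98"]
print(self_consistency_vote(samples))  # → 104
```

The vote discards the reasoning chains themselves and keeps only the answer that most samples agree on, which is what makes the aggregation step cheap relative to the sampling.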

Research Papers

Recent papers with results on this dataset: