Discrete Reasoning Over Paragraphs
DROP (Discrete Reasoning Over Paragraphs) is a crowdsourced, adversarially-created benchmark of roughly 96k questions, in which a system must resolve references in a question, possibly to multiple positions in the input, and perform discrete operations over them (such as addition, counting, or sorting). These operations require a far more comprehensive understanding of paragraph content than prior reading-comprehension datasets demanded. The questions are posed over passages extracted from Wikipedia articles. The dataset is split into a training set of about 77,000 questions, a development set of around 9,500 questions, and a hidden test set similar in size to the development set.
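To make the discrete-operations requirement concrete, below is a minimal, self-contained Python sketch. The passage and question are invented for illustration, not drawn from DROP, and a real system must also decide which numbers the question refers to and which operation to apply rather than hard-coding them.

```python
import re

# A DROP-style passage/question pair: the answer (65) never appears as a
# span in the text, so it must be computed, not copied.
passage = (
    "The Bears scored on a 25-yard field goal in the first quarter "
    "and added a 40-yard field goal in the third quarter."
)
question = "How many total yards of field goals did the Bears kick?"

# Extract every number mentioned in the passage.
numbers = [int(n) for n in re.findall(r"\d+", passage)]

# This sketch hard-codes addition over all extracted numbers; a real DROP
# system must infer both the relevant numbers and the operation from the
# question itself.
answer = sum(numbers)
print(answer)  # 65
```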
Source: https://allennlp.org/drop
Variants: DROP Test, DROP, DROP (3-Shot)
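For readers who want to inspect the data directly, a minimal loading sketch using the Hugging Face `datasets` library is shown below. The `"drop"` Hub id and the field names (`passage`, `question`, `answers_spans`) reflect the public Hub card and should be verified against the version you actually download; the hidden test set is not distributed, so only the train and validation splits are available this way.

```python
from datasets import load_dataset  # pip install datasets

# Load DROP from the Hugging Face Hub (train and validation splits only).
drop = load_dataset("drop")
print(drop)  # split names and sizes

example = drop["train"][0]
# Field names here are assumptions based on the public Hub card; check
# drop["train"].features to confirm the schema of your version.
print(example["question"])
print(example["passage"][:300])
print(example["answers_spans"])
```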
This dataset is used in 1 benchmark:
| Task | Model | Paper | Date |
|---|---|---|---|
| Question Answering | PaLM 540B (Self-Improvement, Self-Consistency) | Large Language Models Can Self-Improve | 2022-10-20 |
| Question Answering | PaLM 540B (Self-Consistency) | Large Language Models Can Self-Improve | 2022-10-20 |
| Question Answering | PaLM 540B (Self-Improvement, CoT Prompting) | Large Language Models Can Self-Improve | 2022-10-20 |
| Question Answering | PaLM 540B (Self-Improvement, Standard Prompting) | Large Language Models Can Self-Improve | 2022-10-20 |
| Question Answering | PaLM 540B (CoT Prompting) | Large Language Models Can Self-Improve | 2022-10-20 |
| Question Answering | PaLM 540B (Standard Prompting) | Large Language Models Can Self-Improve | 2022-10-20 |