OpenBookQA

OBQA

Dataset Information
Modalities: Texts
Languages: English
Introduced: 2018

Overview

OpenBookQA is a new kind of question-answering dataset modeled after open-book exams for assessing human understanding of a subject. It consists of 5,957 multiple-choice elementary-level science questions (4,957 train, 500 dev, 500 test), which probe the understanding of a small “book” of 1,326 core science facts and the application of these facts to novel situations. For training, the dataset includes a mapping from each question to the core science fact it was designed to probe. Answering OpenBookQA questions also requires broad common knowledge that is not contained in the book. By design, both a retrieval-based algorithm and a word co-occurrence algorithm answer the questions incorrectly.
Additionally, the dataset includes a collection of 5,167 crowd-sourced common knowledge facts, and an expanded version of the train/dev/test questions where each question is associated with its originating core fact, a human accuracy score, a clarity score, and an anonymized crowd-worker ID.
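
To make the description above concrete, here is a minimal Python sketch of how one record of the expanded version might be represented, together with a check of the split sizes quoted in the overview. The class name, field names, and the example question are hypothetical illustrations, not the dataset's actual schema or contents; consult the files on the homepage for the real field names.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Split sizes as stated in the overview.
SPLIT_SIZES: Dict[str, int] = {"train": 4957, "dev": 500, "test": 500}


@dataclass
class OpenBookQuestion:
    """One multiple-choice question. All field names are hypothetical."""

    stem: str                            # the question text
    choices: Dict[str, str]              # answer label -> answer text
    answer_key: str                      # label of the correct choice
    core_fact: Optional[str] = None      # originating "book" fact
    human_score: Optional[float] = None  # human accuracy score
    clarity: Optional[float] = None      # clarity score
    worker_id: Optional[str] = None      # anonymized crowd-worker ID

    def correct_answer(self) -> str:
        """Return the text of the correct choice."""
        return self.choices[self.answer_key]


def total_questions(split_sizes: Dict[str, int]) -> int:
    """Sum the question counts over all splits."""
    return sum(split_sizes.values())


# A made-up question for illustration only; it is not taken from the dataset.
example = OpenBookQuestion(
    stem="Which object is most likely to conduct electricity?",
    choices={
        "A": "a wooden spoon",
        "B": "a copper wire",
        "C": "a rubber band",
        "D": "a glass cup",
    },
    answer_key="B",
    core_fact="Metals are good conductors of electricity.",
)

if __name__ == "__main__":
    print(total_questions(SPLIT_SIZES))  # train + dev + test
    print(example.correct_answer())
```

The optional fields mirror the extra annotations the expanded version is said to carry, so the same class could hold both the plain and the expanded records under these assumptions.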

Source: https://allenai.org/data/open-book-qa
Image Source: https://arxiv.org/pdf/1809.02789.pdf

Variants: OpenBookQA, OBQA

Associated Benchmarks

This dataset is used in 2 benchmarks:

Recent Benchmark Submissions

Task | Model | Paper | Date
Question Answering | LLaMA-3 8B+MoSLoRA | Mixture-of-Subspaces in Low-Rank Adaptation | 2024-06-16
Question Answering | LLaMA-3 8B + MixLoRA | MixLoRA: Enhancing Large Language Models … | 2024-04-22
Question Answering | LLaMA-2 7B + MixLoRA | MixLoRA: Enhancing Large Language Models … | 2024-04-22
Question Answering | LLaMA-2 13B + MixLoRA | MixLoRA: Enhancing Large Language Models … | 2024-04-22
Question Answering | PaLM 2-M (1-shot) | PaLM 2 Technical Report | 2023-05-17
Question Answering | PaLM 2-S (1-shot) | PaLM 2 Technical Report | 2023-05-17
Question Answering | PaLM 2-L (1-shot) | PaLM 2 Technical Report | 2023-05-17
Question Answering | LaMini-F-T5 783M | LaMini-LM: A Diverse Herd of … | 2023-04-27
Question Answering | GPT-2-XL 1.5B | LaMini-LM: A Diverse Herd of … | 2023-04-27
Question Answering | FLAN-T5-Large 783M | LaMini-LM: A Diverse Herd of … | 2023-04-27
Question Answering | LaMini-GPT 1.5B | LaMini-LM: A Diverse Herd of … | 2023-04-27
Question Answering | LaMini-T5 738M | LaMini-LM: A Diverse Herd of … | 2023-04-27
Question Answering | T5-Large 738M | LaMini-LM: A Diverse Herd of … | 2023-04-27
Question Answering | BLOOM 176B (2-shot) | BloombergGPT: A Large Language Model … | 2023-03-30
Question Answering | GPT-NeoX 50B (2-shot) | BloombergGPT: A Large Language Model … | 2023-03-30
Question Answering | OPT 66B (one-shot) | BloombergGPT: A Large Language Model … | 2023-03-30
Question Answering | Bloomberg GPT 50B (1-shot) | BloombergGPT: A Large Language Model … | 2023-03-30
Question Answering | GrapeQA: CANP | GrapeQA: GRaph Augmentation and Pruning … | 2023-03-22
Question Answering | GrapeQA: PEGA | GrapeQA: GRaph Augmentation and Pruning … | 2023-03-22
Question Answering | GrapeQA: PEGA+CANP | GrapeQA: GRaph Augmentation and Pruning … | 2023-03-22

Research Papers

Recent papers with results on this dataset: