OpenBookQA

Name: OpenBookQA
Published: 2018-01-01
License: Custom

OBQA

Dataset Information

Modalities

Texts

Languages

English

Introduced

2018

License

Custom

Homepage

Official Website

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

OpenBookQA is a question-answering dataset modeled after open-book exams, which assess human understanding of a subject. It consists of 5,957 multiple-choice, elementary-level science questions (4,957 train, 500 dev, 500 test) that probe understanding of a small “book” of 1,326 core science facts and the application of those facts to novel situations. For training, the dataset includes a mapping from each question to the core science fact it was designed to probe. Answering the questions also requires broad common knowledge that is not contained in the book. By design, the questions are answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm.
The dataset also includes a collection of 5,167 crowd-sourced common knowledge facts and an expanded version of the train/dev/test questions in which each question is associated with its originating core fact, a human accuracy score, a clarity score, and an anonymized crowd-worker ID.

Source: https://allenai.org/data/open-book-qa
Image Source: https://arxiv.org/pdf/1809.02789.pdf
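
The structure described in the overview can be sketched in a few lines of Python. This is a minimal illustration, not the dataset's actual schema: the class name, the field names, and the example item are all assumptions invented here. Only the split sizes come from the overview.

```python
from dataclasses import dataclass
from typing import Dict, List

# Split sizes as stated in the overview.
SPLIT_SIZES: Dict[str, int] = {"train": 4957, "dev": 500, "test": 500}


@dataclass
class Question:
    """One multiple-choice item. Field names are illustrative assumptions."""
    stem: str            # the question text
    choices: List[str]   # candidate answers
    answer_index: int    # position of the correct choice in `choices`
    core_fact: str       # the "book" fact the question was designed to probe


def total_questions(split_sizes: Dict[str, int]) -> int:
    """Sum the per-split question counts."""
    return sum(split_sizes.values())


# Hypothetical item, invented for illustration only (not taken from the dataset).
example = Question(
    stem="Which object is most likely to conduct electricity?",
    choices=["a wooden spoon", "a suit of armor", "a rubber glove", "a glass cup"],
    answer_index=1,
    core_fact="Metals conduct electricity.",
)

print(total_questions(SPLIT_SIZES))
print(example.choices[example.answer_index])
```

In this invented item the core fact alone does not settle the answer. The solver also needs the common knowledge that a suit of armor is made of metal, which is the kind of out-of-book knowledge the overview refers to.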

Variants: OpenBookQA, OBQA

Associated Benchmarks

This dataset is used in 2 benchmarks:

Question Answering - Metrics: Accuracy
Text Generation - Metrics: Accuracy
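
Both benchmarks above are scored by accuracy. For multiple-choice questions this is usually the fraction of questions whose predicted answer label matches the gold label. The sketch below assumes answers are given as label strings such as "A" to "D"; the function name and the label format are illustrative assumptions, not part of any official evaluation script.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Return the fraction of predictions that equal the gold answer labels."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Hypothetical predictions for four questions; three of them match the gold labels.
print(accuracy(["A", "B", "C", "D"], ["A", "B", "D", "D"]))
```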

Recent Benchmark Submissions

Task	Model	Paper	Date
Question Answering	LLaMA-3 8B+MoSLoRA	Mixture-of-Subspaces in Low-Rank Adaptation	2024-06-16
Question Answering	LLaMA-3 8B + MixLoRA	MixLoRA: Enhancing Large Language Models …	2024-04-22
Question Answering	LLaMA-2 7B + MixLoRA	MixLoRA: Enhancing Large Language Models …	2024-04-22
Question Answering	LLaMA-2 13B + MixLoRA	MixLoRA: Enhancing Large Language Models …	2024-04-22
Question Answering	PaLM 2-M (1-shot)	PaLM 2 Technical Report	2023-05-17
Question Answering	PaLM 2-S (1-shot)	PaLM 2 Technical Report	2023-05-17
Question Answering	PaLM 2-L (1-shot)	PaLM 2 Technical Report	2023-05-17
Question Answering	LaMini-F-T5 783M	LaMini-LM: A Diverse Herd of …	2023-04-27
Question Answering	GPT-2-XL 1.5B	LaMini-LM: A Diverse Herd of …	2023-04-27
Question Answering	FLAN-T5-Large 783M	LaMini-LM: A Diverse Herd of …	2023-04-27
Question Answering	LaMini-GPT 1.5B	LaMini-LM: A Diverse Herd of …	2023-04-27
Question Answering	LaMini-T5 738M	LaMini-LM: A Diverse Herd of …	2023-04-27
Question Answering	T5-Large 738M	LaMini-LM: A Diverse Herd of …	2023-04-27
Question Answering	BLOOM 176B (2-shot)	BloombergGPT: A Large Language Model …	2023-03-30
Question Answering	GPT-NeoX 50B (2-shot)	BloombergGPT: A Large Language Model …	2023-03-30
Question Answering	OPT 66B (one-shot)	BloombergGPT: A Large Language Model …	2023-03-30
Question Answering	Bloomberg GPT 50B (1-shot)	BloombergGPT: A Large Language Model …	2023-03-30
Question Answering	GrapeQA: CANP	GrapeQA: GRaph Augmentation and Pruning …	2023-03-22
Question Answering	GrapeQA: PEGA	GrapeQA: GRaph Augmentation and Pruning …	2023-03-22
Question Answering	GrapeQA: PEGA+CANP	GrapeQA: GRaph Augmentation and Pruning …	2023-03-22

Research Papers

Recent papers with results on this dataset:

Mixture-of-Subspaces in Low-Rank Adaptation (2024)
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts (2024)
PaLM 2 Technical Report (2023)
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions (2023)
BloombergGPT: A Large Language Model for Finance (2023)
GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering (2023)
Large Language Models Can Self-Improve (2022)
Clues Before Answers: Generation-Enhanced Multiple-Choice QA (2022)
GNN is a Counter? Revisiting GNN for Question Answering (2021)
QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering (2021)
Fusing Context Into Knowledge Graph for Commonsense Question Answering (2020)
Language Models are Few-Shot Learners (2020)
UnifiedQA: Crossing Format Boundaries With a Single QA System (2020)
Careful Selection of Knowledge to solve Open Book Question Answering (2019)
Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering (2018)

