OpenBookQA

Question Answering Benchmark

40 results. Metric: accuracy (%).
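Every entry on this leaderboard is scored by accuracy, meaning the percentage of test questions for which the model's chosen answer option matches the gold answer. Below is a minimal Python sketch of that computation. It assumes predictions and gold answers are given as choice labels such as "A" to "D". The function name `accuracy` and the sample labels are hypothetical and do not come from any listed paper's evaluation code.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Return the percentage of questions whose predicted label equals the gold label."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count exact label matches, then express them as a percentage.
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


# Hypothetical labels for five multiple-choice questions (not real OpenBookQA data).
preds = ["A", "C", "B", "D", "A"]
golds = ["A", "C", "D", "D", "B"]
print(accuracy(preds, golds))  # 60.0
```

Three of the five hypothetical predictions match, so the score is 60.0.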

Top Performing Models

| Rank | Model | Paper | Accuracy (%) | Date | Code |
|---|---|---|---|---|---|
| 1 | PaLM 540B (Self Improvement, Self Consistency) | Large Language Models Can Self-Improve | 94.40 | 2022-10-20 | - |
| 2 | PaLM 540B (Self Improvement, CoT Prompting) | Large Language Models Can Self-Improve | 93.00 | 2022-10-20 | - |
| 3 | PaLM 540B (Self Improvement, Standard-Prompting) | Large Language Models Can Self-Improve | 92.00 | 2022-10-20 | - |
| 4 | PaLM 540B (Self Consistency) | Large Language Models Can Self-Improve | 90.00 | 2022-10-20 | - |
| 5 | GrapeQA: PEGA+CANP | GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering | 90.00 | 2023-03-22 | - |
| 6 | GenMC 11B | Clues Before Answers: Generation-Enhanced Multiple-Choice QA | 89.80 | 2022-04-30 | nju-websoft/genmc |
| 7 | AristoRoBERTa + Graph Soft Counter | GNN is a Counter? Revisiting GNN for Question Answering | 87.40 | 2021-10-07 | - |
| 8 | UnifiedQA 11B | UnifiedQA: Crossing Format Boundaries With a Single QA System | 87.20 | 2020-05-02 | allenai/unifiedqa, facebookresearch/metaicl |
| 9 | LLaMA-3 8B+MoSLoRA | Mixture-of-Subspaces in Low-Rank Adaptation | 86.80 | 2024-06-16 | wutaiqiang/moslora |
| 10 | PaLM 540B (CoT Prompting) | Large Language Models Can Self-Improve | 86.40 | 2022-10-20 | - |
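Several PaLM 540B entries carry the label "Self Consistency". In self-consistency decoding, the model samples several chain-of-thought reasoning paths and returns the final answer that appears most often. The sketch below covers only that voting step and is a generic illustration, not the setup used in the listed paper. It assumes sampling and answer extraction have already produced one choice label per path. The function name `self_consistency_vote` is hypothetical.

```python
from collections import Counter
from typing import Sequence


def self_consistency_vote(sampled_answers: Sequence[str]) -> str:
    """Return the most frequent final answer among sampled reasoning paths.

    Hypothetical helper: only the majority vote is shown. Counter.most_common
    orders equal counts by first appearance, so ties go to the earliest
    sampled answer among the tied ones.
    """
    if not sampled_answers:
        raise ValueError("need at least one sampled answer")
    answer, _count = Counter(sampled_answers).most_common(1)[0]
    return answer


# Hypothetical final answers extracted from seven sampled reasoning paths.
samples = ["B", "B", "C", "B", "A", "C", "B"]
print(self_consistency_vote(samples))  # B
```

"B" wins with four of seven votes. The tie-breaking rule here is an arbitrary choice for the sketch.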

All Papers (40)

Large Language Models Can Self-Improve (2022): PaLM 540B (Self Improvement, Self Consistency)
Large Language Models Can Self-Improve (2022): PaLM 540B (Self Improvement, CoT Prompting)
Large Language Models Can Self-Improve (2022): PaLM 540B (Self Improvement, Standard-Prompting)
Large Language Models Can Self-Improve (2022): PaLM 540B (Self Consistency)
GNN is a Counter? Revisiting GNN for Question Answering (2021): AristoRoBERTa + Graph Soft Counter
Large Language Models Can Self-Improve (2022): PaLM 540B (CoT Prompting)
Large Language Models Can Self-Improve (2022): PaLM 540B (Standard-Prompting)
Language Models are Few-Shot Learners (2020): GPT-3 175B (few-shot, k=32)