| Model | Paper | Score | Date |
|---|---|---|---|
| PaLM 540B (Self Improvement, Self Consistency) | Large Language Models Can Self-Improve | 94.40 | 2022-10-20 |
| PaLM 540B (Self Improvement, CoT Prompting) | Large Language Models Can Self-Improve | 93.00 | 2022-10-20 |
| PaLM 540B (Self Improvement, Standard-Prompting) | Large Language Models Can Self-Improve | 92.00 | 2022-10-20 |
| PaLM 540B (Self Consistency) | Large Language Models Can Self-Improve | 90.00 | 2022-10-20 |
| GrapeQA: PEGA+CANP | GrapeQA: GRaph Augmentation and Pruning to Enhanc… | 90.00 | 2023-03-22 |
| GenMC 11B | Clues Before Answers: Generation-Enhanced Multipl… | 89.80 | 2022-04-30 |
| AristoRoBERTa + Graph Soft Counter | GNN is a Counter? Revisiting GNN for Question Ans… | 87.40 | 2021-10-07 |
| UnifiedQA 11B | UnifiedQA: Crossing Format Boundaries With a Sing… | 87.20 | 2020-05-02 |
| LLaMA-3 8B+MoSLoRA | Mixture-of-Subspaces in Low-Rank Adaptation | 86.80 | 2024-06-16 |
| PaLM 540B (CoT Prompting) | Large Language Models Can Self-Improve | 86.40 | 2022-10-20 |
| LLaMA-3 8B + MixLoRA | MixLoRA: Enhancing Large Language Models Fine-Tun… | 84.80 | 2024-04-22 |
| PaLM 540B (Standard-Prompting) | Large Language Models Can Self-Improve | 84.40 | 2022-10-20 |
| TTTTT 3B | Fusing Context Into Knowledge Graph for Commonsen… | 83.20 | 2020-12-09 |
| LLaMA-2 13B + MixLoRA | MixLoRA: Enhancing Large Language Models Fine-Tun… | 83.00 | 2024-04-22 |
| AristoRoBERTa + QA-GNN | QA-GNN: Reasoning with Language Models and Knowle… | 82.80 | 2021-04-13 |
| QA-GNN | QA-GNN: Reasoning with Language Models and Knowle… | 82.80 | 2021-04-13 |
| DEKCOR | Fusing Context Into Knowledge Graph for Commonsen… | 82.40 | 2020-12-09 |
| GrapeQA: PEGA | GrapeQA: GRaph Augmentation and Pruning to Enhanc… | 82.00 | 2023-03-22 |
| LLaMA-2 7B + MixLoRA | MixLoRA: Enhancing Large Language Models Fine-Tun… | 81.60 | 2024-04-22 |
| AristoRoBERTa | QA-GNN: Reasoning with Language Models and Knowle… | 77.80 | 2021-04-13 |
| BiLSTM max-out question-match (science fact + common knowledge fact) | Can a Suit of Armor Conduct Electricity? A New Da… | 76.90 | 2018-09-08 |
| Careful Selection | Careful Selection of Knowledge to solve Open Book… | 72.00 | 2019-07-24 |
| GrapeQA: CANP | GrapeQA: GRaph Augmentation and Pruning to Enhanc… | 66.20 | 2023-03-22 |
| GPT-3 175B (few-shot, k=32) | Language Models are Few-Shot Learners | 65.40 | 2020-05-28 |
| PaLM 2-L (1-shot) | PaLM 2 Technical Report | 58.50 | 2023-05-17 |
| OPT 66B (one-shot) | BloombergGPT: A Large Language Model for Finance | 58.00 | 2023-03-30 |
| PaLM 2-S (1-shot) | PaLM 2 Technical Report | 57.40 | 2023-05-17 |
| BiLSTM max-out question-match (WordNet + science fact) | Can a Suit of Armor Conduct Electricity? A New Da… | 56.30 | 2018-09-08 |
| PaLM 2-M (1-shot) | PaLM 2 Technical Report | 56.20 | 2023-05-17 |
| BiLSTM max-out question-match (with a science fact) | Can a Suit of Armor Conduct Electricity? A New Da… | 55.80 | 2018-09-08 |
| Bloomberg GPT 50B (1-shot) | BloombergGPT: A Large Language Model for Finance | 51.60 | 2023-03-30 |
| BLOOM 176B (2-shot) | BloombergGPT: A Large Language Model for Finance | 47.20 | 2023-03-30 |
| GPT-NeoX 50B (2-shot) | BloombergGPT: A Large Language Model for Finance | 44.20 | 2023-03-30 |
| LaMini-GPT 1.5B | LaMini-LM: A Diverse Herd of Distilled Models fro… | 39.80 | 2023-04-27 |
| LaMini-T5 738M | LaMini-LM: A Diverse Herd of Distilled Models fro… | 36.00 | 2023-04-27 |
| LaMini-F-T5 783M | LaMini-LM: A Diverse Herd of Distilled Models fro… | 34.00 | 2023-04-27 |
| T5-Large 738M | LaMini-LM: A Diverse Herd of Distilled Models fro… | 32.80 | 2023-04-27 |
| GPT-2-XL 1.5B | LaMini-LM: A Diverse Herd of Distilled Models fro… | 32.00 | 2023-04-27 |
| FLAN-T5-Large 783M | LaMini-LM: A Diverse Herd of Distilled Models fro… | 31.20 | 2023-04-27 |
| Random chance baseline | Can a Suit of Armor Conduct Electricity? A New Da… | 25.00 | 2018-09-08 |