| GPT-4o (HPT) | Hierarchical Prompting Taxonomy: A Universal Eval… | 92.54 | 2024-06-18 |
| DeBERTaV3-large+KEAR | Human Parity on CommonsenseQA: Augmenting Self-At… | 91.20 | 2021-12-06 |
| PaLM 2 (few‑shot, CoT, SC) | PaLM 2 Technical Report | 90.40 | 2023-05-17 |
| KEAR | Human Parity on CommonsenseQA: Augmenting Self-At… | 89.40 | 2021-12-06 |
| DEKCOR | Fusing Context Into Knowledge Graph for Commonsen… | 83.30 | 2020-12-09 |
| Unicorn 11B (fine-tuned) | UNICORN on RAINBOW: A Universal Commonsense Reaso… | 79.30 | 2021-03-24 |
| MUPPET Roberta Large | Muppet: Massive Multi-task Representations with P… | 79.20 | 2021-01-26 |
| UnifiedQA 11B (fine-tuned) | UnifiedQA: Crossing Format Boundaries With a Sing… | 79.10 | 2020-05-02 |
| DRAGON | Deep Bidirectional Language-Knowledge Graph Pretr… | 78.20 | 2022-10-17 |
| T5-XXL 11B (fine-tuned) | UnifiedQA: Crossing Format Boundaries With a Sing… | 78.10 | 2020-05-02 |
| Albert Lan et al. (2020) (ensemble) | ALBERT: A Lite BERT for Self-supervised Learning … | 76.50 | 2019-09-26 |
| UnifiedQA 11B (zero-shot) | UnifiedQA: Crossing Format Boundaries With a Sing… | 76.20 | 2020-05-02 |
| QA-GNN | QA-GNN: Reasoning with Language Models and Knowle… | 76.10 | 2021-04-13 |
| XLNet+GraphReason | Graph-Based Reasoning over Heterogeneous External… | 75.30 | 2019-09-09 |
| GrapeQA: PEGA | GrapeQA: GRaph Augmentation and Pruning to Enhanc… | 73.50 | 2023-03-22 |
| RoBERTa+HyKAS Ma et al. (2019) | Towards Generalizable Neuro-Symbolic Systems for … | 73.20 | 2019-10-30 |
| GPT-3 Direct Finetuned | Human Parity on CommonsenseQA: Augmenting Self-At… | 73.00 | 2021-12-06 |
| STaR (on GPT-J) | STaR: Bootstrapping Reasoning With Reasoning | 72.30 | 2022-03-28 |
| RoBERTa-Large 355M | RoBERTa: A Robustly Optimized BERT Pretraining Ap… | 72.10 | 2019-07-26 |
| STaR without Rationalization (on GPT-J) | STaR: Bootstrapping Reasoning With Reasoning | 68.80 | 2022-03-28 |
| OPT 66B (1-shot) | BloombergGPT: A Large Language Model for Finance | 66.40 | 2023-03-30 |
| Bloomberg GPT 50B (1-shot) | BloombergGPT: A Large Language Model for Finance | 65.50 | 2023-03-30 |
| CAGE-reasoning | Explain Yourself! Leveraging Language Models for … | 64.70 | 2019-06-06 |
| BLOOM 176B (1-shot) | BloombergGPT: A Large Language Model for Finance | 64.20 | 2023-03-30 |
| UnifiedQA 440M (fine-tuned) | UnifiedQA: Crossing Format Boundaries With a Sing… | 64.00 | 2020-05-02 |
| BART-large 440M (fine-tuned) | UnifiedQA: Crossing Format Boundaries With a Sing… | 62.50 | 2020-05-02 |
| BERT_CSlarge | Align, Mask and Select: A Simple Method for Incor… | 62.20 | 2019-08-19 |
| GPT-NeoX 20B (1-shot) | BloombergGPT: A Large Language Model for Finance | 60.40 | 2023-03-30 |
| GPT-J Direct Finetuned | STaR: Bootstrapping Reasoning With Reasoning | 60.00 | 2022-03-28 |
| KagNet | KagNet: Knowledge-Aware Graph Networks for Common… | 58.90 | 2019-09-04 |
| BERT-LARGE | CommonsenseQA: A Question Answering Challenge Tar… | 55.90 | 2018-11-02 |
| UL2 20B (chain-of-thought + self-consistency) | UL2: Unifying Language Learning Paradigms | 55.70 | 2022-05-10 |
| Few-shot CoT LaMDA 137B | STaR: Bootstrapping Reasoning With Reasoning | 55.60 | 2022-03-28 |
| UL2 20B (chain-of-thought) | UL2: Unifying Language Learning Paradigms | 51.40 | 2022-05-10 |
| Few-shot CoT GPT-J | STaR: Bootstrapping Reasoning With Reasoning | 36.60 | 2022-03-28 |
| UL2 20B (zero-shot) | UL2: Unifying Language Learning Paradigms | 34.20 | 2022-05-10 |
| Chain of thought ASDiv | Chain-of-Thought Prompting Elicits Reasoning in L… | 28.60 | 2022-01-28 |
| Few-shot Direct GPT-J | STaR: Bootstrapping Reasoning With Reasoning | 20.90 | 2022-03-28 |