| Model | Paper | Score | Date |
|---|---|---|---|
| PaLM 540B (finetuned) | PaLM: Scaling Language Modeling with Pathways | 100.00 | 2022-04-05 |
| Vega v2 6B (KD-based prompt transfer) | Toward Efficient Language Model Pretraining and D… | 99.40 | 2022-12-04 |
| ST-MoE-32B 269B (fine-tuned) | ST-MoE: Designing Stable and Transferable Sparse … | 99.20 | 2022-02-17 |
| UL2 20B (fine-tuned) | UL2: Unifying Language Learning Paradigms | 99.00 | 2022-05-10 |
| DeBERTa-Ensemble | DeBERTa: Decoding-enhanced BERT with Disentangled… | 98.40 | 2020-06-05 |
| Turing NLR v5 XXL 5.4B (fine-tuned) | Toward Efficient Language Model Pretraining and D… | 98.20 | 2022-12-04 |
| DeBERTa-1.5B | DeBERTa: Decoding-enhanced BERT with Disentangled… | 96.80 | 2020-06-05 |
| PaLM 2-L (1-shot) | PaLM 2 Technical Report | 96.00 | 2023-05-17 |
| T5-XXL 11B (fine-tuned) | Exploring the Limits of Transfer Learning with a … | 94.80 | 2019-10-23 |
| FLAN 137B (prompt-tuned) | Finetuned Language Models Are Zero-Shot Learners | 94.00 | 2021-09-03 |
| GPT-3 175B (few-shot, k=32) | Language Models are Few-Shot Learners | 92.00 | 2020-05-28 |
| T5-XL 3B (fine-tuned) | Exploring the Limits of Transfer Learning with a … | 92.00 | 2019-10-23 |
| FLAN 137B (zero-shot) | Finetuned Language Models Are Zero-Shot Learners | 91.00 | 2021-09-03 |
| ST-MoE-L 4.1B (fine-tuned) | ST-MoE: Designing Stable and Transferable Sparse … | 91.00 | 2022-02-17 |
| GPT-3 175B (0-shot) | Language Models are Few-Shot Learners | 91.00 | 2020-05-28 |
| T0-3B (CoT fine-tuned) | The CoT Collection: Improving Zero-shot and Few-s… | 90.90 | 2023-05-23 |
| RoBERTa-Winogrande-ft 355M (fine-tuned) | WinoGrande: An Adversarial Winograd Schema Challe… | 90.60 | 2019-07-24 |
| PaLM 2-M (1-shot) | PaLM 2 Technical Report | 90.00 | 2023-05-17 |
| Flipped-3B | Guess the Instruction! Flipped Learning Makes Lan… | 89.88 | 2022-10-06 |
| PaLM 2-S (1-shot) | PaLM 2 Technical Report | 89.00 | 2023-05-17 |
| GPT-NeoX (one-shot) | BloombergGPT: A Large Language Model for Finance | 88.00 | 2023-03-30 |
| FLAN 137B (few-shot, k=16) | Finetuned Language Models Are Zero-Shot Learners | 87.00 | 2021-09-03 |
| GPT-3 175B (1-shot) | Language Models are Few-Shot Learners | 87.00 | 2020-05-28 |
| RoBERTa-ft 355M (fine-tuned) | WinoGrande: An Adversarial Winograd Schema Challe… | 86.40 | 2019-07-24 |
| Bloomberg GPT (one-shot) | BloombergGPT: A Large Language Model for Finance | 86.00 | 2023-03-30 |
| OPT 66B (one-shot) | BloombergGPT: A Large Language Model for Finance | 86.00 | 2023-03-30 |
| GPT-3 13B (few-shot, k=32) | Language Models are Few-Shot Learners | 86.00 | 2020-05-28 |
| KiC-770M | Knowledge-in-Context: Towards Knowledgeable Semi-… | 85.30 | 2022-10-28 |
| UL2 20B (0-shot) | UL2: Unifying Language Learning Paradigms | 85.00 | 2022-05-10 |
| RoBERTa-Winogrande 355M (fine-tuned) | WinoGrande: An Adversarial Winograd Schema Challe… | 84.40 | 2019-07-24 |
| Neo-6B (QA + WS) | Ask Me Anything: A simple strategy for prompting … | 84.00 | 2022-10-05 |
| BLOOM 176B (one-shot) | BloombergGPT: A Large Language Model for Finance | 84.00 | 2023-03-30 |
| T5-Large 770M (fine-tuned) | Exploring the Limits of Transfer Learning with a … | 83.40 | 2019-10-23 |
| BERT-SocialIQA 340M | SocialIQA: Commonsense Reasoning about Social Int… | 83.40 | 2019-04-22 |
| Hybrid H3 2.7B (0-shot, logit scoring) | Hungry Hungry Hippos: Towards Language Modeling w… | 81.00 | 2022-12-28 |
| BERT-large 340M | SocialIQA: Commonsense Reasoning about Social Int… | 80.80 | 2019-04-22 |
| RoE-3B | Exploring the Benefits of Training Expert Languag… | 79.25 | 2023-02-07 |
| sMLP – deterministic 9.4B (0-shot) | Efficient Language Modeling with Sparse all-MLP | 79.00 | 2022-03-14 |
| KELM (finetuning BERT-large based single model) | KELM: Knowledge Enhanced Pre-Trained Language Rep… | 78.00 | 2021-09-09 |
| AlexaTM 20B | AlexaTM 20B: Few-Shot Learning Using a Large-Scal… | 78.00 | 2022-08-02 |
| Neo-6B (few-shot) | Ask Me Anything: A simple strategy for prompting … | 77.00 | 2022-10-05 |
| Hybrid H3 2.7B (3-shot, logit scoring) | Hungry Hungry Hippos: Towards Language Modeling w… | 77.00 | 2022-12-28 |
| Causal Strength w/multi-word predicates (presumably on WinoGrande?) | WinoGrande: An Adversarial Winograd Schema Challe… | 76.40 | 2019-07-24 |
| Gshard 9B | Efficient Language Modeling with Sparse all-MLP | 76.00 | 2022-03-14 |
| Switch Transformer 9B | Efficient Language Modeling with Sparse all-MLP | 75.00 | 2022-03-14 |
| GPT-3 Large 760M (0-shot) | Language Models are Few-Shot Learners | 73.00 | 2020-05-28 |
| T5-Base 220M (fine-tuned) | Exploring the Limits of Transfer Learning with a … | 71.20 | 2019-10-23 |
| Hybrid H3 125M (0-shot, logit scoring) | Hungry Hungry Hippos: Towards Language Modeling w… | 67.00 | 2022-12-28 |
| Hybrid H3 125M (0-shot, rank classification) | Hungry Hungry Hippos: Towards Language Modeling w… | 67.00 | 2022-12-28 |
| Pointwise Mutual Information (on 10M stories) | WinoGrande: An Adversarial Winograd Schema Challe… | 65.40 | 2019-07-24 |
| HASH Layers 10B (0-shot) | Efficient Language Modeling with Sparse all-MLP | 64.00 | 2022-03-14 |
| Base Layers 10B (0-shot) | Efficient Language Modeling with Sparse all-MLP | 63.00 | 2022-03-14 |
| N-Grammer 343M | N-Grammer: Augmenting Transformers with latent n-… | 60.00 | 2022-07-13 |
| Neo-6B (QA) | Ask Me Anything: A simple strategy for prompting … | 58.20 | 2022-10-05 |
| H3 125M (0-shot, rank classification) | Hungry Hungry Hippos: Towards Language Modeling w… | 51.00 | 2022-12-28 |
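Several of the zero- and few-shot rows above name a scoring scheme in parentheses ("logit scoring" vs. "rank classification", e.g. the Hybrid H3 entries). The sketch below illustrates how such multiple-choice scoring is commonly implemented for a Winograd-style binary-choice item with a causal language model. It is a minimal sketch, not the cited papers' code: the model name (`gpt2`), the prompt, and the candidate answers are placeholders, and the exact definitions of the two schemes in the cited papers may differ from the length-normalised/unnormalised split shown here.

```python
# Hypothetical sketch of zero-shot multiple-choice scoring with a causal LM.
# Assumptions: Hugging Face transformers + PyTorch; "gpt2" is only a placeholder model.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sequence_logprob(prompt: str, continuation: str) -> float:
    """Sum of log-probabilities the model assigns to `continuation` given `prompt`."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    cont_ids = tokenizer(continuation, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, cont_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Logits at position i predict token i+1, so the continuation tokens are
    # predicted by the slice starting one position before the continuation.
    log_probs = torch.log_softmax(logits[0, prompt_ids.shape[1] - 1 : -1], dim=-1)
    return log_probs.gather(1, cont_ids[0].unsqueeze(1)).sum().item()

# Illustrative Winograd-style item (not taken from any benchmark split).
prompt = ("The trophy doesn't fit into the brown suitcase because it is too large. "
          "What is too large? Answer:")
candidates = [" the trophy", " the suitcase"]

# Unnormalised scoring: pick the candidate with the highest total log-probability.
scores = [sequence_logprob(prompt, c) for c in candidates]
# A common length-normalised variant: divide by the candidate's token count so
# longer answers are not penalised for having more tokens.
normalised = [s / len(tokenizer(c).input_ids) for s, c in zip(scores, candidates)]

print("unnormalised choice:", candidates[scores.index(max(scores))])
print("normalised choice:  ", candidates[normalised.index(max(normalised))])
```

In practice the choice between the two variants matters mainly when candidate answers differ in length; for the single-word referents typical of Winograd-style items the two often agree, which is consistent with the identical scores reported for the two Hybrid H3 125M rows above.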