| Model | Paper | Score | Date |
| --- | --- | --- | --- |
| PaLM 540B (fine-tuned) | PaLM: Scaling Language Modeling with Pathways | 100.00 | 2022-04-05 |
| Vega v2 6B (KD-based prompt transfer) | Toward Efficient Language Model Pretraining and D… | 99.20 | 2022-12-04 |
| ST-MoE-L 4.1B (fine-tuned) | ST-MoE: Designing Stable and Transferable Sparse … | 98.20 | 2022-02-17 |
| ST-MoE-32B 269B (fine-tuned) | ST-MoE: Designing Stable and Transferable Sparse … | 98.00 | 2022-02-17 |
| Turing NLR v5 XXL 5.4B (fine-tuned) | Toward Efficient Language Model Pretraining and D… | 97.60 | 2022-12-04 |
| DeBERTa-1.5B | DeBERTa: Decoding-enhanced BERT with Disentangled… | 97.20 | 2020-06-05 |
| T5-XXL 11B (fine-tuned) | Exploring the Limits of Transfer Learning with a … | 96.80 | 2019-10-23 |
| T5-Large 770M (fine-tuned) | Exploring the Limits of Transfer Learning with a … | 94.40 | 2019-10-23 |
| T5-Base 220M (fine-tuned) | Exploring the Limits of Transfer Learning with a … | 94.00 | 2019-10-23 |
| PaLM 2-L (one-shot) | PaLM 2 Technical Report | 87.50 | 2023-05-17 |
| PaLM 2-S (one-shot) | PaLM 2 Technical Report | 82.10 | 2023-05-17 |
| PaLM 2-M (one-shot) | PaLM 2 Technical Report | 80.40 | 2023-05-17 |
| GPT-3 175B (few-shot) | Language Models are Few-Shot Learners | 75.60 | 2020-05-28 |
| N-Grammer 343M | N-Grammer: Augmenting Transformers with latent n-… | 67.90 | 2022-07-13 |
| AlexaTM 20B | AlexaTM 20B: Few-Shot Learning Using a Large-Scal… | 67.90 | 2022-08-02 |
| BloombergGPT (one-shot) | BloombergGPT: A Large Language Model for Finance | 53.57 | 2023-03-30 |
| GPT-3 175B (few-shot, k=32) | Language Models are Few-Shot Learners | 52.00 | 2020-05-28 |
| GPT-NeoX (one-shot) | BloombergGPT: A Large Language Model for Finance | 48.21 | 2023-03-30 |
| BLOOM 176B (one-shot) | BloombergGPT: A Large Language Model for Finance | 48.21 | 2023-03-30 |
| OPT 66B (one-shot) | BloombergGPT: A Large Language Model for Finance | 44.64 | 2023-03-30 |