| Model | Paper | Score | Date |
| --- | --- | --- | --- |
| Turing NLR v5 XXL 5.4B (fine-tuned) | Toward Efficient Language Model Pretraining and D… | 96.40 | 2022-12-04 |
| ST-MoE-32B 269B (fine-tuned) | ST-MoE: Designing Stable and Transferable Sparse … | 95.10 | 2022-02-17 |
| PaLM 540B (fine-tuned) | PaLM: Scaling Language Modeling with Pathways | 94.60 | 2022-04-05 |
| DeBERTa-1.5B | DeBERTa: Decoding-enhanced BERT with Disentangled… | 94.50 | 2020-06-05 |
| Vega v2 6B (fine-tuned) | Toward Efficient Language Model Pretraining and D… | 94.40 | 2022-12-04 |
| T5-11B | Exploring the Limits of Transfer Learning with a … | 94.10 | 2019-10-23 |
| PaLM 2-L (one-shot) | PaLM 2 Technical Report | 93.80 | 2023-05-17 |
| T5-XXL 11B (fine-tuned) | Exploring the Limits of Transfer Learning with a … | 93.40 | 2019-10-23 |
| PaLM 2-M (one-shot) | PaLM 2 Technical Report | 92.40 | 2023-05-17 |
| GESA 500M | Integrating a Heterogeneous Graph with Entity-awa… | 92.20 | 2023-07-19 |
| PaLM 2-S (one-shot) | PaLM 2 Technical Report | 92.10 | 2023-05-17 |
| LUKE-Graph | LUKE-Graph: A Transformer-based Approach with Gat… | 91.50 | 2023-03-12 |
| LUKE 483M | LUKE: Deep Contextualized Entity Representations … | 91.20 | 2020-10-02 |
| GPT-3 175B (one-shot) | Large Language Models are Zero-Shot Reasoners | 90.20 | 2022-05-24 |
| KELM (fine-tuned RoBERTa-large, single model) | KELM: Knowledge Enhanced Pre-Trained Language Rep… | 89.60 | 2021-09-09 |
| ST-MoE-L 4.1B (fine-tuned) | ST-MoE: Designing Stable and Transferable Sparse … | 88.90 | 2022-02-17 |
| AlexaTM 20B | AlexaTM 20B: Few-Shot Learning Using a Large-Scal… | 88.40 | 2022-08-02 |
| FLAN 137B (prompt-tuned) | Finetuned Language Models Are Zero-Shot Learners | 85.10 | 2021-09-03 |
| BloombergGPT 50B (1-shot) | BloombergGPT: A Large Language Model for Finance | 82.80 | 2023-03-30 |
| OPT 66B (1-shot) | BloombergGPT: A Large Language Model for Finance | 82.50 | 2023-03-30 |
| GPT-3 Large 760M (0-shot) | Language Models are Few-Shot Learners | 82.10 | 2020-05-28 |
| Switch Transformer 9B | Efficient Language Modeling with Sparse all-MLP | 79.90 | 2022-03-14 |
| BLOOM 176B (1-shot) | BloombergGPT: A Large Language Model for Finance | 78.00 | 2023-03-30 |
| KELM (fine-tuned BERT-large, single model) | KELM: Knowledge Enhanced Pre-Trained Language Rep… | 76.70 | 2021-09-09 |
| sMLP – deterministic 9.4B (0-shot) | Efficient Language Modeling with Sparse all-MLP | 73.40 | 2022-03-14 |
| FLAN 137B (zero-shot) | Finetuned Language Models Are Zero-Shot Learners | 72.50 | 2021-09-03 |
| GShard 9B | Efficient Language Modeling with Sparse all-MLP | 72.40 | 2022-03-14 |
| GPT-NeoX 20B (1-shot) | BloombergGPT: A Large Language Model for Finance | 67.90 | 2023-03-30 |
| HASH Layers 10B (0-shot) | Efficient Language Modeling with Sparse all-MLP | 67.20 | 2022-03-14 |
| Base Layers 10B (0-shot) | Efficient Language Modeling with Sparse all-MLP | 60.70 | 2022-03-14 |
| BERT-Base (single model) | BERT: Pre-training of Deep Bidirectional Transfor… | 56.07 | 2018-10-11 |
| DocQA + ELMo | ReCoRD: Bridging the Gap between Human and Machin… | 46.70 | 2018-10-30 |
| N-Grammer 343M | N-Grammer: Augmenting Transformers with latent n-… | 29.90 | 2022-07-13 |