| Model | Paper | Score | Date |
|---|---|---|---|
| PaLM-540B (Few-Shot) | PaLM: Scaling Language Modeling with Pathways | 89.70 | 2022-04-05 |
| PaLM 2-L (one-shot) | PaLM 2 Technical Report | 86.90 | 2023-05-17 |
| GPT-3 175B (Few-Shot) | Language Models are Few-Shot Learners | 86.40 | 2020-05-28 |
| LLaMA-65B+CFG (Zero-Shot) | Stay on topic with Classifier-Free Guidance | 84.00 | 2023-06-30 |
| LLaMA-30B+CFG (zero-shot) | Stay on topic with Classifier-Free Guidance | 83.90 | 2023-06-30 |
| PaLM 2-M (one-shot) | PaLM 2 Technical Report | 83.70 | 2023-05-17 |
| LLaMA-13B+CFG (zero-shot) | Stay on topic with Classifier-Free Guidance | 82.20 | 2023-06-30 |
| PaLM-540B (One-Shot) | PaLM: Scaling Language Modeling with Pathways | 81.80 | 2022-04-05 |
| GLaM 62B/64E (One-Shot) | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | 80.90 | 2021-12-13 |
| PaLM 2-S (one-shot) | PaLM 2 Technical Report | 80.70 | 2023-05-17 |
| GLM-130B (bidirectional attention) | GLM-130B: An Open Bilingual Pre-trained Model | 80.20 | 2022-10-05 |
| SparseGPT (175B, 2:4 Sparsity) | SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | 79.47 | 2023-01-02 |
| SparseGPT (175B, 4:8 Sparsity) | SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | 78.77 | 2023-01-02 |
| PaLM-540B (Zero-Shot) | PaLM: Scaling Language Modeling with Pathways | 77.90 | 2022-04-05 |
| Chinchilla (Zero-Shot) | Training Compute-Optimal Large Language Models | 77.70 | 2022-03-29 |
| SparseGPT (175B, 50% Sparsity) | SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | 76.51 | 2023-01-02 |
| GPT-3 175B (Zero-Shot) | Language Models are Few-Shot Learners | 76.20 | 2020-05-28 |
| OPT-175B | SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | 75.59 | 2023-01-02 |
| GPT-3 13B (Zero-Shot) | Language Models are Few-Shot Learners | 72.50 | 2020-05-28 |
| GLM-XXLarge (bidirectional) | GLM: General Language Model Pretraining with Autoregressive Blank Infilling | 72.35 | 2021-03-18 |
| Pythia 12B (0-shot) | Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | 70.46 | 2023-04-03 |
| GPT-3 6.7B (Zero-Shot) | Language Models are Few-Shot Learners | 70.30 | 2020-05-28 |
| Mamba-2.8B | Mamba: Linear-Time Sequence Modeling with Selective State Spaces | 69.20 | 2023-12-01 |
| Pythia 6.9B (0-shot) | Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | 67.28 | 2023-04-03 |
| GLM-XXLarge (unidirectional) | GLM: General Language Model Pretraining with Autoregressive Blank Infilling | 67.18 | 2021-03-18 |
| GPT-3 2.7B (Zero-Shot) | Language Models are Few-Shot Learners | 67.10 | 2020-05-28 |
| Universal Transformer (w/ dynamic halting) | Universal Transformers | 56.25 | 2018-07-10 |
| Residual Shuffle-Exchange network | Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences | 54.34 | 2020-04-06 |
| Gated-Attention Reader (+ features) | Broad Context Language Modeling as Reading Comprehension | 49.00 | 2016-10-26 |
| Pythia 6.9B (Zero-Shot) | Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | 4.45 | 2023-04-03 |
| Pythia 12B (Zero-Shot) | Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | 3.92 | 2023-04-03 |
| OPT-175B (50% Sparsity) | SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | 0.02 | 2023-01-02 |
| Megatron-Turing NLG 530B (Few-Shot) | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model | | 2022-01-28 |