| Model | Paper | Accuracy (%) | Date |
|---|---|---|---|
| Mistral-Nemo 12B (HPT) | Hierarchical Prompting Taxonomy: A Universal Eval… | 99.87 | 2024-06-18 |
| ST-MoE-32B 269B (fine-tuned) | ST-MoE: Designing Stable and Transferable Sparse … | 92.40 | 2022-02-17 |
| PaLM 540B (fine-tuned) | PaLM: Scaling Language Modeling with Pathways | 92.20 | 2022-04-05 |
| Turing NLR v5 XXL 5.4B (fine-tuned) | Toward Efficient Language Model Pretraining and D… | 92.00 | 2022-12-04 |
| T5-XXL 11B (fine-tuned) | Exploring the Limits of Transfer Learning with a … | 91.20 | 2019-10-23 |
| PaLM 2-L (1-shot) | PaLM 2 Technical Report | 90.90 | 2023-05-17 |
| UL2 20B (fine-tuned) | UL2: Unifying Language Learning Paradigms | 90.80 | 2022-05-10 |
| Vega v2 6B (fine-tuned) | Toward Efficient Language Model Pretraining and D… | 90.50 | 2022-12-04 |
| DeBERTa-1.5B | DeBERTa: Decoding-enhanced BERT with Disentangled… | 90.40 | 2020-06-05 |
| PaLM 2-M (1-shot) | PaLM 2 Technical Report | 88.60 | 2023-05-17 |
| ST-MoE-L 4.1B (fine-tuned) | ST-MoE: Designing Stable and Transferable Sparse … | 88.60 | 2022-02-17 |
| PaLM 2-S (1-shot) | PaLM 2 Technical Report | 88.10 | 2023-05-17 |
| MUPPET RoBERTa Large | Muppet: Massive Multi-task Representations with P… | 87.50 | 2021-01-26 |
| FLAN 137B (prompt-tuned) | Finetuned Language Models Are Zero-Shot Learners | 86.30 | 2021-09-03 |
| RoBERTa-large 355M + Entailment as Few-shot Learner | Entailment as Few-Shot Learner | 86.00 | 2021-04-29 |
| T5-Large 770M (fine-tuned) | Exploring the Limits of Transfer Learning with a … | 85.40 | 2019-10-23 |
| LLaMA 65B (0-shot) | LLaMA: Open and Efficient Foundation Language Mod… | 85.30 | 2023-02-27 |
| LLaMA 2 70B (0-shot) | Llama 2: Open Foundation and Fine-Tuned Chat Mode… | 85.00 | 2023-07-18 |
| FLAN 137B (4-shot) | Finetuned Language Models Are Zero-Shot Learners | 84.60 | 2021-09-03 |
| MUPPET RoBERTa Base | Muppet: Massive Multi-task Representations with P… | 83.80 | 2021-01-26 |
| Chinchilla 70B (0-shot) | Training Compute-Optimal Large Language Models | 83.70 | 2022-03-29 |
| LLaMA 2 34B (0-shot) | Llama 2: Open Foundation and Fine-Tuned Chat Mode… | 83.70 | 2023-07-18 |
| LLaMA 33B (0-shot) | LLaMA: Open and Efficient Foundation Language Mod… | 83.10 | 2023-02-27 |
| FLAN 137B (0-shot) | Finetuned Language Models Are Zero-Shot Learners | 82.90 | 2021-09-03 |
| LLaMA 2 13B (0-shot) | Llama 2: Open Foundation and Fine-Tuned Chat Mode… | 81.70 | 2023-07-18 |
| T5-Base 220M (fine-tuned) | Exploring the Limits of Transfer Learning with a … | 81.40 | 2019-10-23 |
| BERT-MultiNLI 340M (fine-tuned) | BoolQ: Exploring the Surprising Difficulty of Nat… | 80.40 | 2019-05-24 |
| Gopher (0-shot) | Scaling Language Models: Methods, Analysis & Insi… | 79.30 | 2021-12-08 |
| LLaMA 13B (0-shot) | LLaMA: Open and Efficient Foundation Language Mod… | 78.10 | 2023-02-27 |
| LLaMA 2 7B (0-shot) | Llama 2: Open Foundation and Fine-Tuned Chat Mode… | 77.40 | 2023-07-18 |
| LLaMA-2 13B + MixLoRA | MixLoRA: Enhancing Large Language Models Fine-Tun… | 77.10 | 2024-04-22 |
| LLaMA 7B (0-shot) | LLaMA: Open and Efficient Foundation Language Mod… | 76.50 | 2023-02-27 |
| T5-Small 60M (fine-tuned) | Exploring the Limits of Transfer Learning with a … | 76.40 | 2019-10-23 |
| GPT-3 175B (few-shot, k=32) | Language Models are Few-Shot Learners | 76.40 | 2020-05-28 |
| BiDAF-MultiNLI (fine-tuned) | BoolQ: Exploring the Surprising Difficulty of Nat… | 75.57 | 2019-05-24 |
| LLaMA-3 8B + MixLoRA | MixLoRA: Enhancing Large Language Models Fine-Tun… | 75.00 | 2024-04-22 |
| BloombergGPT 50B (1-shot) | BloombergGPT: A Large Language Model for Finance | 74.60 | 2023-03-30 |
| LLaMA3 + MoSLoRA | Mixture-of-Subspaces in Low-Rank Adaptation | 74.60 | 2024-06-16 |
| GPT-1 117M (fine-tuned) | BoolQ: Exploring the Surprising Difficulty of Nat… | 72.87 | 2019-05-24 |
| LLaMA-2 7B + MixLoRA | MixLoRA: Enhancing Large Language Models Fine-Tun… | 72.70 | 2024-04-22 |
| BiDAF + ELMo (fine-tuned) | BoolQ: Exploring the Surprising Difficulty of Nat… | 71.41 | 2019-05-24 |
| OPT-IML 175B | OPT-IML: Scaling Language Model Instruction Meta … | 71.40 | 2022-12-22 |
| AlexaTM 20B | AlexaTM 20B: Few-Shot Learning Using a Large-Scal… | 69.40 | 2022-08-02 |
| Neo-6B (QA + WS) | Ask Me Anything: A simple strategy for prompting … | 67.20 | 2022-10-05 |
| OPT-IML 30B | OPT-IML: Scaling Language Model Instruction Meta … | 66.90 | 2022-12-22 |
| Neo-6B (few-shot) | Ask Me Anything: A simple strategy for prompting … | 66.50 | 2022-10-05 |
| N-Grammer 343M | N-Grammer: Augmenting Transformers with latent n-… | 65.00 | 2022-07-13 |
| Neo-6B (QA) | Ask Me Anything: A simple strategy for prompting … | 64.90 | 2022-10-05 |
| OPT 30B (0-shot) | OPT-IML: Scaling Language Model Instruction Meta … | 64.00 | 2022-12-22 |
| UL2 20B (0-shot) | UL2: Unifying Language Learning Paradigms | 63.10 | 2022-05-10 |
| Majority baseline | BoolQ: Exploring the Surprising Difficulty of Nat… | 62.17 | 2019-05-24 |
| Hybrid H3 1.3B (0-shot, logit scoring) | Hungry Hungry Hippos: Towards Language Modeling w… | 61.70 | 2022-12-28 |
| OPT-IML 1.3B (0-shot) | OPT-IML: Scaling Language Model Instruction Meta … | 61.50 | 2022-12-22 |
| Shakti-LLM (2.5B) | SHAKTI: A 2.5 Billion Parameter Small Language Mo… | 61.10 | 2024-10-15 |
| Hybrid H3 2.7B (3-shot, logit scoring) | Hungry Hungry Hippos: Towards Language Modeling w… | 60.60 | 2022-12-28 |
| GPT-3 175B (0-shot) | Language Models are Few-Shot Learners | 60.50 | 2020-05-28 |
| OPT 1.3B (0-shot) | OPT-IML: Scaling Language Model Instruction Meta … | 60.50 | 2022-12-22 |
| OPT 175B | OPT-IML: Scaling Language Model Instruction Meta … | 60.10 | 2022-12-22 |
| Hybrid H3 125M (0-shot, logit scoring) | Hungry Hungry Hippos: Towards Language Modeling w… | 59.60 | 2022-12-28 |
| OPT 66B (1-shot) | BloombergGPT: A Large Language Model for Finance | 57.50 | 2023-03-30 |
| Hybrid H3 125M (3-shot, logit scoring) | Hungry Hungry Hippos: Towards Language Modeling w… | 56.10 | 2022-12-28 |
| Hybrid H3 125M (3-shot, rank classification) | Hungry Hungry Hippos: Towards Language Modeling w… | 56.10 | 2022-12-28 |
| BLOOM 176B (1-shot) | BloombergGPT: A Large Language Model for Finance | 52.90 | 2023-03-30 |
| Hyena | Hyena Hierarchy: Towards Larger Convolutional Lan… | 51.80 | 2023-02-21 |
| GPT-NeoX 20B (1-shot) | BloombergGPT: A Large Language Model for Finance | 46.40 | 2023-03-30 |
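The scores above are consistent with accuracy on the BoolQ validation split: the Majority baseline row (62.17) matches the share of "yes" answers in BoolQ's 3,270 validation questions. As a sanity check, here is a minimal sketch that reproduces that row, assuming the Hugging Face `datasets` copy of BoolQ (dataset id `boolq`, with `question`, `passage`, and boolean `answer` fields); it simply predicts the most common label for every example.

```python
# Minimal sketch: reproduce the "Majority baseline" row by always
# predicting the most frequent label on BoolQ's validation split.
# Assumes `pip install datasets` and the `boolq` dataset on the HF Hub.
from collections import Counter

from datasets import load_dataset

val = load_dataset("boolq", split="validation")

# `answer` is a boolean label (True = "yes", False = "no").
labels = [ex["answer"] for ex in val]
majority_label, majority_count = Counter(labels).most_common(1)[0]

accuracy = majority_count / len(labels)
print(f"Majority label: {majority_label}")            # expected: True
print(f"Majority-baseline accuracy: {accuracy:.2%}")  # expected: ~62.17%
```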