| Model | Paper | Accuracy (%) | Date |
|---|---|---|---|
| ST-MoE-32B 269B (fine-tuned) | ST-MoE: Designing Stable and Transferable Sparse … | 96.10 | 2022-02-17 |
| Unicorn 11B (fine-tuned) | UNICORN on RAINBOW: A Universal Commonsense Reaso… | 91.30 | 2021-03-24 |
| CompassMTL 567M with Tailor | Task Compass: Scaling Multi-task Pre-training wit… | 90.50 | 2022-10-12 |
| CompassMTL 567M | Task Compass: Scaling Multi-task Pre-training wit… | 89.60 | 2022-10-12 |
| UnifiedQA 11B (fine-tuned) | UnifiedQA: Crossing Format Boundaries With a Sing… | 89.40 | 2020-05-02 |
| GPT-4 (5-shot) | GPT-4 Technical Report | 87.50 | 2023-03-15 |
| ExDeBERTa 567M | Task Compass: Scaling Multi-task Pre-training wit… | 87.00 | 2022-10-12 |
| LLaMA-2 13B + MixLoRA | MixLoRA: Enhancing Large Language Models Fine-Tun… | 86.30 | 2024-04-22 |
| LLaMA-3 8B + MoSLoRA | Mixture-of-Subspaces in Low-Rank Adaptation | 85.80 | 2024-06-16 |
| PaLM 2-L (1-shot) | PaLM 2 Technical Report | 83.00 | 2023-05-17 |
| LLaMA-3 8B + MixLoRA | MixLoRA: Enhancing Large Language Models Fine-Tun… | 82.10 | 2024-04-22 |
| ST-MoE-L 4.1B (fine-tuned) | ST-MoE: Designing Stable and Transferable Sparse … | 81.70 | 2022-02-17 |
| GPT-3.5 (5-shot) | GPT-4 Technical Report | 81.60 | 2023-03-15 |
| PaLM 540B (0-shot) | PaLM: Scaling Language Modeling with Pathways | 81.10 | 2022-04-05 |
| Camelidae-8×34B | Parameter-Efficient Sparsity Crafting from Dense … | 80.90 | 2024-01-05 |
| PaLM 2-M (1-shot) | PaLM 2 Technical Report | 79.20 | 2023-05-17 |
| RoBERTa-Winogrande 355M (fine-tuned) | WinoGrande: An Adversarial Winograd Schema Challe… | 79.10 | 2019-07-24 |
| PaLM 2-S (1-shot) | PaLM 2 Technical Report | 77.90 | 2023-05-17 |
| Mixtral 8x7B (0-shot) | Mixtral of Experts | 77.20 | 2024-01-08 |
| PaLM 62B (0-shot) | PaLM: Scaling Language Modeling with Pathways | 77.00 | 2022-04-05 |
| PaLM-cont 62B (0-shot) | PaLM: Scaling Language Modeling with Pathways | 77.00 | 2022-04-05 |
| LLaMA 65B (0-shot) | LLaMA: Open and Efficient Foundation Language Mod… | 77.00 | 2023-02-27 |
| LLaMA-2 7B + MixLoRA | MixLoRA: Enhancing Large Language Models Fine-Tun… | 76.80 | 2024-04-22 |
| LLaMA 33B (0-shot) | LLaMA: Open and Efficient Foundation Language Mod… | 76.00 | 2023-02-27 |
| Mistral 7B (0-shot) | Mistral 7B | 75.30 | 2023-10-10 |
| Chinchilla 70B (0-shot) | Training Compute-Optimal Large Language Models | 74.90 | 2022-03-29 |
| Mistral 7B (0-shot) | Mixtral of Experts | 74.20 | 2024-01-08 |
| phi-1.5-web 1.3B (0-shot) | Textbooks Are All You Need II: phi-1.5 technical … | 74.00 | 2023-09-11 |
| UnifiedQA 406M (fine-tuned) | UnifiedQA: Crossing Format Boundaries With a Sing… | 73.30 | 2020-05-02 |
| LLaMA 13B (0-shot) | LLaMA: Open and Efficient Foundation Language Mod… | 73.00 | 2023-02-27 |
| FLAN 137B (few-shot, k=16) | Finetuned Language Models Are Zero-Shot Learners | 72.80 | 2021-09-03 |
| G-DAUG-Combo + RoBERTa-Large | Generative Data Augmentation for Commonsense Reas… | 71.40 | 2020-04-24 |
| FLAN 137B (0-shot) | Finetuned Language Models Are Zero-Shot Learners | 71.20 | 2021-09-03 |
| Branch-Train-MiX 4x7B (sampling top-1 expert) | Branch-Train-MiX: Mixing Expert LLMs into a Mixtu… | 70.60 | 2024-03-12 |
| GPT-3 175B (0-shot) | Language Models are Few-Shot Learners | 70.20 | 2020-05-28 |
| Gopher 280B (0-shot) | Scaling Language Models: Methods, Analysis & Insi… | 70.10 | 2021-12-08 |
| LLaMA 7B (0-shot) | LLaMA: Open and Efficient Foundation Language Mod… | 70.10 | 2023-02-27 |
| BLOOM 176B (1-shot) | BloombergGPT: A Large Language Model for Finance | 67.00 | 2023-03-30 |
| Pythia 12B (5-shot) | Pythia: A Suite for Analyzing Large Language Mode… | 66.60 | 2023-04-03 |
| OPT 66B (1-shot) | BloombergGPT: A Large Language Model for Finance | 66.10 | 2023-03-30 |
| BERT-Winogrande 345M (fine-tuned) | WinoGrande: An Adversarial Winograd Schema Challe… | 64.90 | 2019-07-24 |
| BloombergGPT (1-shot) | BloombergGPT: A Large Language Model for Finance | 64.10 | 2023-03-30 |
| Pythia 12B (0-shot) | Pythia: A Suite for Analyzing Large Language Mode… | 63.90 | 2023-04-03 |
| RoE-3B | Exploring the Benefits of Training Expert Languag… | 61.60 | 2023-02-07 |
| Pythia 6.9B (0-shot) | Pythia: A Suite for Analyzing Large Language Mode… | 60.90 | 2023-04-03 |
| GPT-NeoX (1-shot) | BloombergGPT: A Large Language Model for Finance | 60.60 | 2023-03-30 |
| FLAN-T5-Large 783M | LaMini-LM: A Diverse Herd of Distilled Models fro… | 59.90 | 2023-04-27 |
| Pythia 2.8B (0-shot) | Pythia: A Suite for Analyzing Large Language Mode… | 59.40 | 2023-04-03 |
| RoBERTa-DPR 355M (0-shot) | WinoGrande: An Adversarial Winograd Schema Challe… | 58.90 | 2019-07-24 |
| ALBERT-xxlarge 235M | Back to Square One: Artifact Detection, Training … | 58.70 | 2021-04-16 |
| Flipped-3B | Guess the Instruction! Flipped Learning Makes Lan… | 58.56 | 2022-10-06 |
| GPT-2-XL 1.5B | LaMini-LM: A Diverse Herd of Distilled Models fro… | 58.30 | 2023-04-27 |
| T0-3B (CoT fine-tuned) | The CoT Collection: Improving Zero-shot and Few-s… | 57.50 | 2023-05-23 |
| GPT-3 Large 760M (0-shot) | Language Models are Few-Shot Learners | 57.40 | 2020-05-28 |
| RoBERTa-base 125M | Back to Square One: Artifact Detection, Training … | 56.30 | 2021-04-16 |
| LaMini-F-T5 783M | LaMini-LM: A Diverse Herd of Distilled Models fro… | 56.00 | 2023-04-27 |
| LaMini-GPT 1.5B | LaMini-LM: A Diverse Herd of Distilled Models fro… | 56.00 | 2023-04-27 |
| BERT-large 345M | Back to Square One: Artifact Detection, Training … | 55.60 | 2021-04-16 |
| KiC-770M | Knowledge-in-Context: Towards Knowledgeable Semi-… | 55.30 | 2022-10-28 |
| T5-Large 738M | LaMini-LM: A Diverse Herd of Distilled Models fro… | 55.20 | 2023-04-27 |
| LaMini-T5 738M | LaMini-LM: A Diverse Herd of Distilled Models fro… | 54.90 | 2023-04-27 |
| RoBERTa-large 355M | Back to Square One: Artifact Detection, Training … | 54.90 | 2021-04-16 |
| sMLP – deterministic 9.4B (0-shot) | Efficient Language Modeling with Sparse all-MLP | 54.30 | 2022-03-14 |
| Switch Transformer 9B (0-shot) | Efficient Language Modeling with Sparse all-MLP | 53.40 | 2022-03-14 |
| BERT-base 110M | Back to Square One: Artifact Detection, Training … | 53.10 | 2021-04-16 |
| ALBERT-base 11M | Back to Square One: Artifact Detection, Training … | 52.80 | 2021-04-16 |
| BERT-large 345M (0-shot) | WinoGrande: An Adversarial Winograd Schema Challe… | 51.90 | 2019-07-24 |
| HASH Layers 10B (0-shot) | Efficient Language Modeling with Sparse all-MLP | 51.70 | 2022-03-14 |
| Gshard 9B (0-shot) | Efficient Language Modeling with Sparse all-MLP | 51.10 | 2022-03-14 |
| Base Layers 10B (0-shot) | Efficient Language Modeling with Sparse all-MLP | 51.00 | 2022-03-14 |
| BERT-DPR 345M (0-shot) | WinoGrande: An Adversarial Winograd Schema Challe… | 51.00 | 2019-07-24 |
| Random baseline | Back to Square One: Artifact Detection, Training … | 50.00 | 2021-04-16 |
| RoBERTa-large 355M (0-shot) | WinoGrande: An Adversarial Winograd Schema Challe… | 50.00 | 2019-07-24 |
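For working with these results programmatically, a minimal Python sketch is shown below. The `Entry` dataclass and the `ENTRIES` subset are illustrative (only a handful of rows are transcribed; the full table follows the same shape), not part of any published tooling.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    """One leaderboard row: model name, source paper, accuracy (%), publication date."""
    model: str
    paper: str
    accuracy: float
    date: str  # ISO yyyy-mm-dd, as in the table

# A few rows transcribed from the table above; extend in the same shape as needed.
ENTRIES = [
    Entry("ST-MoE-32B 269B (fine-tuned)", "ST-MoE", 96.10, "2022-02-17"),
    Entry("GPT-4 (5-shot)", "GPT-4 Technical Report", 87.50, "2023-03-15"),
    Entry("PaLM 540B (0-shot)", "PaLM", 81.10, "2022-04-05"),
    Entry("LLaMA 65B (0-shot)", "LLaMA", 77.00, "2023-02-27"),
    Entry("Random baseline", "Back to Square One", 50.00, "2021-04-16"),
]

# Rank by accuracy, descending, reproducing the table's ordering.
for rank, e in enumerate(sorted(ENTRIES, key=lambda e: e.accuracy, reverse=True), start=1):
    print(f"{rank:>2}. {e.accuracy:5.2f}  {e.model}")
```

Keeping each row as an immutable record makes it easy to re-sort by date instead of accuracy, or to filter to a single evaluation setting (e.g. only `(0-shot)` entries) with a list comprehension.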