| Model | Paper | Accuracy (%) | Date |
|---|---|---|---|
| Unicorn 11B (fine-tuned) | UNICORN on RAINBOW: A Universal Commonsense Reaso… | 90.10 | 2021-03-24 |
| LLaMA3 8B+MoSLoRA | Mixture-of-Subspaces in Low-Rank Adaptation | 89.70 | 2024-06-16 |
| CompassMTL 567M with Tailor | Task Compass: Scaling Multi-task Pre-training wit… | 88.30 | 2022-10-12 |
| LLaMA-3 8B + MixLoRA | MixLoRA: Enhancing Large Language Models Fine-Tun… | 87.60 | 2024-04-22 |
| DeBERTa-Large 304M | Two is Better than Many? Binary Classification as… | 87.40 | 2022-10-29 |
| CompassMTL 567M | Task Compass: Scaling Multi-task Pre-training wit… | 87.30 | 2022-10-12 |
| LLaMA-2 13B + MixLoRA | MixLoRA: Enhancing Large Language Models Fine-Tun… | 86.80 | 2024-04-22 |
| Shakti-LLM (2.5B) | SHAKTI: A 2.5 Billion Parameter Small Language Mo… | 86.20 | 2024-10-15 |
| DeBERTa-Large 304M (classification-based) | Two is Better than Many? Binary Classification as… | 85.90 | 2022-10-29 |
| ExDeBERTa 567M | Task Compass: Scaling Multi-task Pre-training wit… | 85.50 | 2022-10-12 |
| UnifiedQA 3B | UnifiedQA: Crossing Format Boundaries With a Sing… | 85.30 | 2020-05-02 |
| PaLM 2-L (1-shot) | PaLM 2 Technical Report | 85.00 | 2023-05-17 |
| Mixtral 8x7B (0-shot) | Mixtral of Experts | 83.60 | 2024-01-08 |
| PaLM 2-M (1-shot) | PaLM 2 Technical Report | 83.20 | 2023-05-17 |
| LLaMA-2 7B + MixLoRA | MixLoRA: Enhancing Large Language Models Fine-Tun… | 83.20 | 2024-04-22 |
| Mistral 7B (0-shot) | Mistral 7B | 83.00 | 2023-10-10 |
| LLaMA 65B (0-shot) | LLaMA: Open and Efficient Foundation Language Mod… | 82.80 | 2023-02-27 |
| LLaMA 2 70B (0-shot) | Llama 2: Open Foundation and Fine-Tuned Chat Mode… | 82.80 | 2023-07-18 |
| Camelidae-8×34B | Parameter-Efficient Sparsity Crafting from Dense … | 82.70 | 2024-01-05 |
| LLaMA 33B (0-shot) | LLaMA: Open and Efficient Foundation Language Mod… | 82.30 | 2023-02-27 |
| PaLM 2-S (1-shot) | PaLM 2 Technical Report | 82.20 | 2023-05-17 |
| Mistral 7B (0-shot) | Mixtral of Experts | 82.20 | 2024-01-08 |
| MT-NLG 530B (0-shot) | Megatron-LM: Training Multi-Billion Parameter Lan… | 82.00 | 2019-09-17 |
| LLaMA 2 34B (0-shot) | Llama 2: Open Foundation and Fine-Tuned Chat Mode… | 81.90 | 2023-07-18 |
| Gopher 280B (0-shot) | Scaling Language Models: Methods, Analysis & Insi… | 81.80 | 2021-12-08 |
| Chinchilla 70B (0-shot) | Training Compute-Optimal Large Language Models | 81.80 | 2022-03-29 |
| FLAN 137B (few-shot, k=10) | Finetuned Language Models Are Zero-Shot Learners | 81.70 | 2021-09-03 |
| OPT-175B | SparseGPT: Massive Language Models Can Be Accurat… | 81.07 | 2023-01-02 |
| GPT-3 175B (0-shot) | Language Models are Few-Shot Learners | 81.00 | 2020-05-28 |
| SparseGPT 175B (50% Sparsity) | SparseGPT: Massive Language Models Can Be Accurat… | 80.63 | 2023-01-02 |
| FLAN 137B (0-shot) | Finetuned Language Models Are Zero-Shot Learners | 80.50 | 2021-09-03 |
| LLaMA 2 13B (0-shot) | Llama 2: Open Foundation and Fine-Tuned Chat Mode… | 80.50 | 2023-07-18 |
| LLaMA 13B (0-shot) | LLaMA: Open and Efficient Foundation Language Mod… | 80.10 | 2023-02-27 |
| LLaMA 7B (0-shot) | LLaMA: Open and Efficient Foundation Language Mod… | 79.80 | 2023-02-27 |
| SparseGPT 175B (4:8 Sparsity) | SparseGPT: Massive Language Models Can Be Accurat… | 79.54 | 2023-01-02 |
| SparseGPT 175B (2:4 Sparsity) | SparseGPT: Massive Language Models Can Be Accurat… | 79.54 | 2023-01-02 |
| RoBERTa-Large 355M | RoBERTa: A Robustly Optimized BERT Pretraining Ap… | 79.40 | 2019-07-26 |
| LLaMA 2 7B (0-shot) | Llama 2: Open Foundation and Fine-Tuned Chat Mode… | 78.80 | 2023-07-18 |
| BloombergGPT 50B (1-shot) | BloombergGPT: A Large Language Model for Finance | 77.90 | 2023-03-30 |
| OPT 66B (1-shot) | BloombergGPT: A Large Language Model for Finance | 77.60 | 2023-03-30 |
| RoBERTa-large 355M (fine-tuned) | PIQA: Reasoning about Physical Commonsense in Nat… | 77.10 | 2019-11-26 |
| phi-1.5-web (1.3B) | Textbooks Are All You Need II: phi-1.5 technical … | 77.00 | 2023-09-11 |
| BLOOM 176B (1-shot) | BloombergGPT: A Large Language Model for Finance | 77.00 | 2023-03-30 |
| Pythia 12B (5-shot) | Pythia: A Suite for Analyzing Large Language Mode… | 76.70 | 2023-04-03 |
| Open-LLaMA-3B-v2 | Sheared LLaMA: Accelerating Language Model Pre-tr… | 76.20 | 2023-10-10 |
| Pythia 12B (0-shot) | Pythia: A Suite for Analyzing Large Language Mode… | 76.00 | 2023-04-03 |
| Sheared-LLaMA-2.7B | Sheared LLaMA: Accelerating Language Model Pre-tr… | 75.80 | 2023-10-10 |
| GPT-NeoX 20B (1-shot) | BloombergGPT: A Large Language Model for Finance | 75.80 | 2023-03-30 |
| Pythia 6.9B (0-shot) | Pythia: A Suite for Analyzing Large Language Mode… | 75.20 | 2023-04-03 |
| Sheared-LLaMA-1.3B | Sheared LLaMA: Accelerating Language Model Pre-tr… | 73.40 | 2023-10-10 |
| sMLP - deterministic 9.4B (0-shot) | Efficient Language Modeling with Sparse all-MLP | 73.00 | 2022-03-14 |
| GPT-3 Large 760M (0-shot) | Language Models are Few-Shot Learners | 72.90 | 2020-05-28 |
| FLAN-T5-Large 783M | LaMini-LM: A Diverse Herd of Distilled Models fro… | 72.20 | 2023-04-27 |
| LaMini-GPT 1.5B | LaMini-LM: A Diverse Herd of Distilled Models fro… | 71.30 | 2023-04-27 |
| LaMini-F-T5 783M | LaMini-LM: A Diverse Herd of Distilled Models fro… | 70.60 | 2023-04-27 |
| GPT-2-XL 1.5B | LaMini-LM: A Diverse Herd of Distilled Models fro… | 70.50 | 2023-04-27 |
| Pythia 1B (5-shot) | Pythia: A Suite for Analyzing Large Language Mode… | 70.40 | 2023-04-03 |
| GPT-2-small 124M (fine-tuned) | PIQA: Reasoning about Physical Commonsense in Nat… | 69.20 | 2019-11-26 |
| GShard 9B | Efficient Language Modeling with Sparse all-MLP | 68.10 | 2022-03-14 |
| LaMini-T5 738M | LaMini-LM: A Diverse Herd of Distilled Models fro… | 67.20 | 2023-04-27 |
| BERT-large 340M (fine-tuned) | PIQA: Reasoning about Physical Commonsense in Nat… | 66.80 | 2019-11-26 |
| BERT-Large 340M | BERT: Pre-training of Deep Bidirectional Transfor… | 66.70 | 2018-10-11 |
| Base Layers 10B (0-shot) | Efficient Language Modeling with Sparse all-MLP | 63.80 | 2022-03-14 |
| HASH Layers 10B (0-shot) | Efficient Language Modeling with Sparse all-MLP | 63.80 | 2022-03-14 |
| T5-Large 738M | LaMini-LM: A Diverse Herd of Distilled Models fro… | 55.90 | 2023-04-27 |
| OPT-175B (50% Sparsity) | SparseGPT: Massive Language Models Can Be Accurat… | 54.73 | 2023-01-02 |
| Random chance baseline | PIQA: Reasoning about Physical Commonsense in Nat… | 50.00 | 2019-11-26 |