| Model | Paper | Accuracy (%) | Date |
|---|---|---|---|
| CompassMTL 567M with Tailor | Task Compass: Scaling Multi-task Pre-training wit… | 96.10 | 2022-10-12 |
| CompassMTL 567M | Task Compass: Scaling Multi-task Pre-training wit… | 95.60 | 2022-10-12 |
| DeBERTa-Large 304M (classification-based) | Two is Better than Many? Binary Classification as… | 95.60 | 2022-10-29 |
| GPT-4 (10-shot) | GPT-4 Technical Report | 95.30 | 2023-03-15 |
| LLaMA3+MoSLoRA | Mixture-of-Subspaces in Low-Rank Adaptation | 95.00 | 2024-06-16 |
| DeBERTa-Large 304M | Two is Better than Many? Binary Classification as… | 94.70 | 2022-10-29 |
| LLaMA-2 13B + MixLoRA | MixLoRA: Enhancing Large Language Models Fine-Tun… | 94.70 | 2024-04-22 |
| Unicorn 11B (fine-tuned) | UNICORN on RAINBOW: A Universal Commonsense Reaso… | 93.90 | 2021-03-24 |
| LLaMA-3 8B + MixLoRA | MixLoRA: Enhancing Large Language Models Fine-Tun… | 93.30 | 2024-04-22 |
| LLaMA-2 7B + MixLoRA | MixLoRA: Enhancing Large Language Models Fine-Tun… | 93.10 | 2024-04-22 |
| DeBERTa++ | DeBERTa: Decoding-enhanced BERT with Disentangled… | 93.00 | 2020-06-05 |
| ELECTRA-Large 335M (fine-tuned on DiscoSense and HellaSwag) | DiscoSense: Commonsense Reasoning with Discourse … | 91.50 | 2022-10-22 |
| PaLM 2-L (1-shot) | PaLM 2 Technical Report | 87.40 | 2023-05-17 |
| ELECTRA-Large 335M (fine-tuned on HellaSwag) | DiscoSense: Commonsense Reasoning with Discourse … | 86.90 | 2022-10-22 |
| PaLM 2-M (1-shot) | PaLM 2 Technical Report | 86.70 | 2023-05-17 |
| MUPPET RoBERTa Large | Muppet: Massive Multi-task Representations with P… | 86.40 | 2021-01-26 |
| LLaMA 65B + CFG (0-shot) | Stay on topic with Classifier-Free Guidance | 86.30 | 2023-06-30 |
| Falcon-180B (0-shot) | The Falcon Series of Open Language Models | 85.90 | 2023-11-28 |
| PaLM 2-S (1-shot) | PaLM 2 Technical Report | 85.60 | 2023-05-17 |
| GPT-3.5 (10-shot) | GPT-4 Technical Report | 85.50 | 2023-03-15 |
| RoBERTa-Large Ensemble | RoBERTa: A Robustly Optimized BERT Pretraining Ap… | 85.50 | 2019-07-26 |
| LLaMA 30B + CFG (0-shot) | Stay on topic with Classifier-Free Guidance | 85.30 | 2023-06-30 |
| LLaMA 2 70B (0-shot) | Llama 2: Open Foundation and Fine-Tuned Chat Mode… | 85.30 | 2023-07-18 |
| HyKAS+CSKG | Towards Generalizable Neuro-Symbolic Systems for … | 85.00 | 2019-10-30 |
| LLaMA 65B (0-shot) | LLaMA: Open and Efficient Foundation Language Mod… | 84.20 | 2023-02-27 |
| PaLM-540B (Few-Shot) | PaLM: Scaling Language Modeling with Pathways | 83.80 | 2022-04-05 |
| PaLM-540B (1-shot) | PaLM: Scaling Language Modeling with Pathways | 83.60 | 2022-04-05 |
| ExDeBERTa 567M | Task Compass: Scaling Multi-task Pre-training wit… | 83.60 | 2022-10-12 |
| PaLM-540B (0-shot) | PaLM: Scaling Language Modeling with Pathways | 83.40 | 2022-04-05 |
| LLaMA 2 34B (0-shot) | Llama 2: Open Foundation and Fine-Tuned Chat Mode… | 83.30 | 2023-07-18 |
| Camelidae-8×34B (10-shot) | Parameter-Efficient Sparsity Crafting from Dense … | 83.20 | 2024-01-05 |
| LLaMA 33B (0-shot) | LLaMA: Open and Efficient Foundation Language Mod… | 82.80 | 2023-02-27 |
| Falcon-40B (0-shot) | The Falcon Series of Open Language Models | 82.70 | 2023-11-28 |
| Megatron-Turing NLG 530B (Few-Shot) | Using DeepSpeed and Megatron to Train Megatron-Tu… | 82.40 | 2022-01-28 |
| Qwen2idae-16x14B (10-shot) | Parameter-Efficient Sparsity Crafting from Dense … | 82.30 | 2024-01-05 |
| LLaMA 13B + CFG (0-shot) | Stay on topic with Classifier-Free Guidance | 82.10 | 2023-06-30 |
| RoBERTa-Large 355M | RoBERTa: A Robustly Optimized BERT Pretraining Ap… | 81.70 | 2019-07-26 |
| Mistral 7B (0-shot) | Mistral 7B | 81.30 | 2023-10-10 |
| Chinchilla 70B (0-shot) | Training Compute-Optimal Large Language Models | 80.80 | 2022-03-29 |
| LLaMA 2 13B (0-shot) | Llama 2: Open Foundation and Fine-Tuned Chat Mode… | 80.70 | 2023-07-18 |
| Megatron-Turing NLG 530B (1-shot) | Using DeepSpeed and Megatron to Train Megatron-Tu… | 80.20 | 2022-01-28 |
| GPT-3 175B (few-shot, k=32) | Language Models are Few-Shot Learners | 79.30 | 2020-05-28 |
| Gopher 280B (0-shot) | Scaling Language Models: Methods, Analysis & Insi… | 79.20 | 2021-12-08 |
| LLaMA 13B (0-shot) | LLaMA: Open and Efficient Foundation Language Mod… | 79.20 | 2023-02-27 |
| GPT-3 (0-shot) | Language Models are Few-Shot Learners | 78.90 | 2020-05-28 |
| LLaMA 2 7B (0-shot) | Llama 2: Open Foundation and Fine-Tuned Chat Mode… | 77.20 | 2023-07-18 |
| Falcon-7B (0-shot) | The Falcon Series of Open Language Models | 76.30 | 2023-11-28 |
| LLaMA 7B (0-shot) | LLaMA: Open and Efficient Foundation Language Mod… | 76.10 | 2023-02-27 |
| BloombergGPT 50B (1-shot) | BloombergGPT: A Large Language Model for Finance | 73.90 | 2023-03-30 |
| OPT 66B (1-shot) | BloombergGPT: A Large Language Model for Finance | 73.50 | 2023-03-30 |
| BLOOM 176B (1-shot) | BloombergGPT: A Large Language Model for Finance | 73.20 | 2023-03-30 |
| Sheared-LLaMA-2.7B (50B) | Sheared LLaMA: Accelerating Language Model Pre-tr… | 70.80 | 2023-10-10 |
| GPT-NeoX 20B (1-shot) | BloombergGPT: A Large Language Model for Finance | 68.40 | 2023-03-30 |
| Open-LLaMA-3B-v2 | Sheared LLaMA: Accelerating Language Model Pre-tr… | 67.60 | 2023-10-10 |
| Mamba-2.8B | Mamba: Linear-Time Sequence Modeling with Selecti… | 66.10 | 2023-12-01 |
| Sheared-LLaMA-1.3B (50B) | Sheared LLaMA: Accelerating Language Model Pre-tr… | 60.70 | 2023-10-10 |
| FLAN 137B (3-shot) | Finetuned Language Models Are Zero-Shot Learners | 59.20 | 2021-09-03 |
| Mamba-1.4B | Mamba: Linear-Time Sequence Modeling with Selecti… | 59.10 | 2023-12-01 |
| FLAN 137B (0-shot) | Finetuned Language Models Are Zero-Shot Learners | 56.70 | 2021-09-03 |
| sMLP – deterministic 9.4B (0-shot) | Efficient Language Modeling with Sparse all-MLP | 54.50 | 2022-03-14 |
| Switch Transformer 9B | Efficient Language Modeling with Sparse all-MLP | 52.50 | 2022-03-14 |
| GPT-3 Large 760M (0-shot) | Language Models are Few-Shot Learners | 51.00 | 2020-05-28 |
| GPT-2-XL 1.5B | LaMini-LM: A Diverse Herd of Distilled Models fro… | 50.90 | 2023-04-27 |
| OPT-6.7B | LLM in a flash: Efficient Large Language Model In… | 50.30 | 2023-12-12 |
| LLM in a Flash (OPT-6.7B with Predictor) | LLM in a flash: Efficient Large Language Model In… | 49.80 | 2023-12-12 |
| FLAN-T5-Large 783M | LaMini-LM: A Diverse Herd of Distilled Models fro… | 48.70 | 2023-04-27 |
| LaMini-GPT 1.5B | LaMini-LM: A Diverse Herd of Distilled Models fro… | 48.30 | 2023-04-27 |
| BERT-Large 340M | HellaSwag: Can a Machine Really Finish Your Sente… | 47.30 | 2019-05-19 |
| LaMini-F-T5 783M | LaMini-LM: A Diverse Herd of Distilled Models fro… | 43.70 | 2023-04-27 |
| GPT-1 117M | HellaSwag: Can a Machine Really Finish Your Sente… | 41.70 | 2019-05-19 |
| Flipped-3B | Guess the Instruction! Flipped Learning Makes Lan… | 41.60 | 2022-10-06 |
| T0-3B (CoT fine-tuned) | The CoT Collection: Improving Zero-shot and Few-s… | 41.10 | 2023-05-23 |
| LaMini-T5 738M | LaMini-LM: A Diverse Herd of Distilled Models fro… | 40.60 | 2023-04-27 |
| BERT-Base 110M | HellaSwag: Can a Machine Really Finish Your Sente… | 40.50 | 2019-05-19 |
| T5-Large 738M | LaMini-LM: A Diverse Herd of Distilled Models fro… | 38.90 | 2023-04-27 |
| Gshard 9B | Efficient Language Modeling with Sparse all-MLP | 38.00 | 2022-03-14 |
| LSTM + BERT-Base | HellaSwag: Can a Machine Really Finish Your Sente… | 36.20 | 2019-05-19 |
| RoE-3B | Exploring the Benefits of Training Expert Languag… | 34.60 | 2023-02-07 |
| ESIM + ELMo | HellaSwag: Can a Machine Really Finish Your Sente… | 33.30 | 2019-05-19 |
| HASH Layers 10B (0-shot) | Efficient Language Modeling with Sparse all-MLP | 33.00 | 2022-03-14 |
| LSTM + GloVe | HellaSwag: Can a Machine Really Finish Your Sente… | 31.70 | 2019-05-19 |
| fastText | HellaSwag: Can a Machine Really Finish Your Sente… | 31.60 | 2019-05-19 |
| LSTM + ELMo | HellaSwag: Can a Machine Really Finish Your Sente… | 31.40 | 2019-05-19 |
| Base Layers 10B (0-shot) | Efficient Language Modeling with Sparse all-MLP | 30.20 | 2022-03-14 |
| KiC-770M | Knowledge-in-Context: Towards Knowledgeable Semi-… | 29.60 | 2022-10-28 |
| Random chance baseline | HellaSwag: Can a Machine Really Finish Your Sente… | 25.00 | 2019-05-19 |