| Model | Paper | Score | Date |
| --- | --- | --- | --- |
| PSQ (Chen et al., 2020) | A Statistical Framework for Low-bitwidth Training… | 86.80 | 2020-10-27 |
| Q8BERT (Zafrir et al., 2019) | Q8BERT: Quantized 8Bit BERT | 84.80 | 2019-10-14 |
| Q-BERT (Shen et al., 2020) | Q-BERT: Hessian Based Ultra Low Precision Quantiz… | 84.70 | 2019-09-12 |
| KiC-770M | Knowledge-in-Context: Towards Knowledgeable Semi-… | 74.00 | 2022-10-28 |
| Flipped-3B | Guess the Instruction! Flipped Learning Makes Lan… | 71.05 | 2022-10-06 |
| RoE-3B | Exploring the Benefits of Training Expert Languag… | 64.01 | 2023-02-07 |
| ELC-BERT-base 98M (zero init) | Not all layers are equally as important: Every La… | 63.00 | 2023-11-03 |
| ELC-BERT-small 24M | Not all layers are equally as important: Every La… | 55.40 | 2023-11-03 |
| LTG-BERT-base 98M | Not all layers are equally as important: Every La… | 54.70 | 2023-11-03 |
| LTG-BERT-small 24M | Not all layers are equally as important: Every La… | 53.70 | 2023-11-03 |
| PaLM 2-S (1-shot) | PaLM 2 Technical Report | | 2023-05-17 |
| Vega v2 6B (KD-based prompt transfer) | Toward Efficient Language Model Pretraining and D… | | 2022-12-04 |
| PaLM 540B (fine-tuned) | PaLM: Scaling Language Modeling with Pathways | | 2022-04-05 |
| Turing NLR v5 XXL 5.4B (fine-tuned) | Toward Efficient Language Model Pretraining and D… | | 2022-12-04 |
| ST-MoE-32B 269B (fine-tuned) | ST-MoE: Designing Stable and Transferable Sparse … | | 2022-02-17 |
| DeBERTa-1.5B | DeBERTa: Decoding-enhanced BERT with Disentangled… | | 2020-06-05 |
| MUPPET RoBERTa Large | Muppet: Massive Multi-task Representations with P… | | 2021-01-26 |
| DeBERTaV3-large | DeBERTaV3: Improving DeBERTa using ELECTRA-Style … | | 2021-11-18 |
| T5-XXL 11B | SMART: Robust and Efficient Fine-Tuning for Pre-t… | | 2019-11-08 |
| T5-XXL 11B (fine-tuned) | Exploring the Limits of Transfer Learning with a … | | 2019-10-23 |
| ST-MoE-L 4.1B (fine-tuned) | ST-MoE: Designing Stable and Transferable Sparse … | | 2022-02-17 |
| UL2 20B (fine-tuned) | UL2: Unifying Language Learning Paradigms | | 2022-05-10 |
| SMART-RoBERTa | SMART: Robust and Efficient Fine-Tuning for Pre-t… | | 2019-11-08 |
| FLAN 137B (prompt-tuned) | Finetuned Language Models Are Zero-Shot Learners | | 2021-09-03 |
| T5-XL 3B | Exploring the Limits of Transfer Learning with a … | | 2019-10-23 |
| RoBERTa-large 355M + Entailment as Few-shot Learner | Entailment as Few-Shot Learner | | 2021-04-29 |
| ALBERT | ALBERT: A Lite BERT for Self-supervised Learning … | | 2019-09-26 |
| Adv-RoBERTa ensemble | StructBERT: Incorporating Language Structures int… | | 2019-08-13 |
| RoBERTa | RoBERTa: A Robustly Optimized BERT Pretraining Ap… | | 2019-07-26 |
| RoBERTa (ensemble) | RoBERTa: A Robustly Optimized BERT Pretraining Ap… | | 2019-07-26 |
| T5-Large 738M | LaMini-LM: A Diverse Herd of Distilled Models fro… | | 2023-04-27 |
| T5-Large 770M | Exploring the Limits of Transfer Learning with a … | | 2019-10-23 |
| RoBERTa-large 355M + EFL + UCA | Entailment as Few-Shot Learner | | 2021-04-29 |
| PaLM 540B (1-shot) | PaLM: Scaling Language Modeling with Pathways | | 2022-04-05 |
| XLNet (single model) | XLNet: Generalized Autoregressive Pretraining for… | | 2019-06-19 |
| RoBERTa-large 355M (MLP quantized vector-wise, fine-tuned) | LLM.int8(): 8-bit Matrix Multiplication for Trans… | | 2022-08-15 |
| OPT-IML 175B | OPT-IML: Scaling Language Model Instruction Meta … | | 2022-12-22 |
| FLAN 137B (8-shot) | Finetuned Language Models Are Zero-Shot Learners | | 2021-09-03 |
| FLAN 137B (0-shot) | Finetuned Language Models Are Zero-Shot Learners | | 2021-09-03 |
| OPT-IML 30B | OPT-IML: Scaling Language Model Instruction Meta … | | 2022-12-22 |
| PaLM 2-M (1-shot) | PaLM 2 Technical Report | | 2023-05-17 |
| T0-3B (CoT fine-tuned) | The CoT Collection: Improving Zero-shot and Few-s… | | 2023-05-23 |
| ERNIE 2.0 Large | ERNIE 2.0: A Continual Pre-training Framework for… | | 2019-07-29 |
| T5-Base 220M | Exploring the Limits of Transfer Learning with a … | | 2019-10-23 |
| MLM+ del-span | CLEAR: Contrastive Learning for Sentence Represen… | | 2020-12-31 |
| PaLM 540B (5-shot) | PaLM: Scaling Language Modeling with Pathways | | 2022-04-05 |
| PaLM 2-L (1-shot) | PaLM 2 Technical Report | | 2023-05-17 |
| SpanBERT | SpanBERT: Improving Pre-training by Representing … | | 2019-07-24 |
| Neo-6B (QA + WS) | Ask Me Anything: A simple strategy for prompting … | | 2022-10-05 |
| BigBird | Big Bird: Transformers for Longer Sequences | | 2020-07-28 |
| ERNIE 2.0 Base | ERNIE 2.0: A Continual Pre-training Framework for… | | 2019-07-29 |
| RealFormer | RealFormer: Transformer Likes Residual Attention | | 2020-12-21 |
| SqueezeBERT | SqueezeBERT: What can computer vision teach NLP a… | | 2020-06-19 |
| PaLM 540B (0-shot) | PaLM: Scaling Language Modeling with Pathways | | 2022-04-05 |
| SMART-BERT | SMART: Robust and Efficient Fine-Tuning for Pre-t… | | 2019-11-08 |
| SMART | SMART: Robust and Efficient Fine-Tuning for Pre-t… | | 2019-11-08 |
| BERT-large 340M | BERT: Pre-training of Deep Bidirectional Transfor… | | 2018-10-11 |
| T5-Small | Exploring the Limits of Transfer Learning with a … | | 2019-10-23 |
| TinyBERT-4 14.5M | TinyBERT: Distilling BERT for Natural Language Un… | | 2019-09-23 |
| data2vec | data2vec: A General Framework for Self-supervised… | | 2022-02-07 |
| BloombergGPT 50B (1-shot) | BloombergGPT: A Large Language Model for Finance | | 2023-03-30 |
| FNet-Large | FNet: Mixing Tokens with Fourier Transforms | | 2021-05-09 |
| GPT-3 175B (few-shot, k=32) | Language Models are Few-Shot Learners | | 2020-05-28 |
| ERNIE | ERNIE: Enhanced Language Representation with Info… | | 2019-05-17 |
| AlexaTM 20B | AlexaTM 20B: Few-Shot Learning Using a Large-Scal… | | 2022-08-02 |
| LaMini-GPT 1.5B | LaMini-LM: A Diverse Herd of Distilled Models fro… | | 2023-04-27 |
| SenseBERT-base 110M | SenseBERT: Driving Some Sense into BERT | | 2019-08-15 |
| OPT-IML 1.3B | OPT-IML: Scaling Language Model Instruction Meta … | | 2022-12-22 |
| TinyBERT-6 67M | TinyBERT: Distilling BERT for Natural Language Un… | | 2019-09-23 |
| LaMini-F-T5 783M | LaMini-LM: A Diverse Herd of Distilled Models fro… | | 2023-04-27 |
| DistilBERT 66M | DistilBERT, a distilled version of BERT: smaller,… | | 2019-10-02 |
| Neo-6B (QA) | Ask Me Anything: A simple strategy for prompting … | | 2022-10-05 |
| UL2 20B (0-shot) | UL2: Unifying Language Learning Paradigms | | 2022-05-10 |
| OPT 175B | OPT-IML: Scaling Language Model Instruction Meta … | | 2022-12-22 |
| N-Grammer 343M | N-Grammer: Augmenting Transformers with latent n-… | | 2022-07-13 |
| Hybrid H3 125M (0-shot, logit scoring) | Hungry Hungry Hippos: Towards Language Modeling w… | | 2022-12-28 |
| Neo-6B (few-shot) | Ask Me Anything: A simple strategy for prompting … | | 2022-10-05 |
| OPT 30B | OPT-IML: Scaling Language Model Instruction Meta … | | 2022-12-22 |
| Hybrid H3 125M (3-shot, logit scoring) | Hungry Hungry Hippos: Towards Language Modeling w… | | 2022-12-28 |
| Hybrid H3 125M (3-shot, rank classification) | Hungry Hungry Hippos: Towards Language Modeling w… | | 2022-12-28 |
| 24hBERT | How to Train BERT with an Academic Budget | | 2021-04-15 |
| BLOOM 176B (1-shot) | BloombergGPT: A Large Language Model for Finance | | 2023-03-30 |
| LaMini-T5 738M | LaMini-LM: A Diverse Herd of Distilled Models fro… | | 2023-04-27 |
| OPT 66B (1-shot) | BloombergGPT: A Large Language Model for Finance | | 2023-03-30 |
| OPT 1.3B | OPT-IML: Scaling Language Model Instruction Meta … | | 2022-12-22 |
| GPT-NeoX 20B (1-shot) | BloombergGPT: A Large Language Model for Finance | | 2023-03-30 |
| H3 125M (0-shot, rank classification) | Hungry Hungry Hippos: Towards Language Modeling w… | | 2022-12-28 |
| GPT-2-XL 1.5B | LaMini-LM: A Diverse Herd of Distilled Models fro… | | 2023-04-27 |
| H3 125M (3-shot, rank classification) | Hungry Hungry Hippos: Towards Language Modeling w… | | 2022-12-28 |