| Model | Paper | Score | Date |
|---|---|---|---|
| PSQ (Chen et al., 2020) | A Statistical Framework for Low-bitwidth Training… | 94.50 | 2020-10-27 |
| Q-BERT (Shen et al., 2020) | Q-BERT: Hessian Based Ultra Low Precision Quantiz… | 93.00 | 2019-09-12 |
| Q8BERT (Zafrir et al., 2019) | Q8BERT: Quantized 8Bit BERT | 93.00 | 2019-10-14 |
| 24hBERT | How to Train BERT with an Academic Budget | 90.60 | 2021-04-15 |
| ALBERT | ALBERT: A Lite BERT for Self-supervised Learning … | | 2019-09-26 |
| StructBERT RoBERTa ensemble | StructBERT: Incorporating Language Structures int… | | 2019-08-13 |
| ALICE | SMART: Robust and Efficient Fine-Tuning for Pre-t… | | 2019-11-08 |
| MT-DNN-SMART | SMART: Robust and Efficient Fine-Tuning for Pre-t… | | 2019-11-08 |
| RoBERTa (ensemble) | RoBERTa: A Robustly Optimized BERT Pretraining Ap… | | 2019-07-26 |
| T5-11B | Exploring the Limits of Transfer Learning with a … | | 2019-10-23 |
| T5-3B | Exploring the Limits of Transfer Learning with a … | | 2019-10-23 |
| DeBERTaV3-large | DeBERTaV3: Improving DeBERTa using ELECTRA-Style … | | 2021-11-18 |
| DeBERTa (large) | DeBERTa: Decoding-enhanced BERT with Disentangled… | | 2020-06-05 |
| XLNet (single model) | XLNet: Generalized Autoregressive Pretraining for… | | 2019-06-19 |
| T5-Large 770M | Exploring the Limits of Transfer Learning with a … | | 2019-10-23 |
| RoBERTa-large 355M (MLP quantized vector-wise, fine-tuned) | LLM.int8(): 8-bit Matrix Multiplication for Trans… | | 2022-08-15 |
| ERNIE 2.0 Large | ERNIE 2.0: A Continual Pre-training Framework for… | | 2019-07-29 |
| RoBERTa-large 355M + Entailment as Few-shot Learner | Entailment as Few-Shot Learner | | 2021-04-29 |
| SpanBERT | SpanBERT: Improving Pre-training by Representing … | | 2019-07-24 |
| TRANS-BLSTM | TRANS-BLSTM: Transformer with Bidirectional LSTM … | | 2020-03-16 |
| T5-Base | Exploring the Limits of Transfer Learning with a … | | 2019-10-23 |
| ASA + RoBERTa | Adversarial Self-Attention for Language Understan… | | 2022-06-25 |
| MLM+ subs+ del-span | CLEAR: Contrastive Learning for Sentence Represen… | | 2020-12-31 |
| ERNIE 2.0 Base | ERNIE 2.0: A Continual Pre-training Framework for… | | 2019-07-29 |
| BERT-LARGE | BERT: Pre-training of Deep Bidirectional Transfor… | | 2018-10-11 |
| BigBird | Big Bird: Transformers for Longer Sequences | | 2020-07-28 |
| RealFormer | RealFormer: Transformer Likes Residual Attention | | 2020-12-21 |
| ASA + BERT-base | Adversarial Self-Attention for Language Understan… | | 2022-06-25 |
| ERNIE | ERNIE: Enhanced Language Representation with Info… | | 2019-05-17 |
| data2vec | data2vec: A General Framework for Self-supervised… | | 2022-02-07 |
| Charformer-Tall | Charformer: Fast Character Transformers via Gradi… | | 2021-06-23 |
| SenseBERT-base 110M | SenseBERT: Driving Some Sense into BERT | | 2019-08-15 |
| TinyBERT-6 67M | TinyBERT: Distilling BERT for Natural Language Un… | | 2019-09-23 |
| T5-Small | Exploring the Limits of Transfer Learning with a … | | 2019-10-23 |
| DistilBERT 66M | DistilBERT, a distilled version of BERT: smaller,… | | 2019-10-02 |
| SqueezeBERT | SqueezeBERT: What can computer vision teach NLP a… | | 2020-06-19 |
| Nyströmformer | Nyströmformer: A Nyström-Based Algorithm for Appr… | | 2021-02-07 |
| TinyBERT-4 14.5M | TinyBERT: Distilling BERT for Natural Language Un… | | 2019-09-23 |
| FNet-Large | FNet: Mixing Tokens with Fourier Transforms | | 2021-05-09 |
| LM-CPPF RoBERTa-base | LM-CPPF: Paraphrasing-Guided Data Augmentation fo… | | 2023-05-29 |
| SMART-BERT | SMART: Robust and Efficient Fine-Tuning for Pre-t… | | 2019-11-08 |
| SMART-RoBERTa | SMART: Robust and Efficient Fine-Tuning for Pre-t… | | 2019-11-08 |