| LTG-BERT-base 98M | Not all layers are equally as important: Every La… | 82.70 | 2023-11-03 |
| ELC-BERT-base 98M | Not all layers are equally as important: Every La… | 82.60 | 2023-11-03 |
| LTG-BERT-small 24M | Not all layers are equally as important: Every La… | 77.60 | 2023-11-03 |
| ELC-BERT-small 24M | Not all layers are equally as important: Every La… | 76.10 | 2023-11-03 |
| PSQ (Chen et al., 2020) | A Statistical Framework for Low-bitwidth Training… | 67.50 | 2020-10-27 |
| Q-BERT (Shen et al., 2020) | Q-BERT: Hessian Based Ultra Low Precision Quantiz… | 65.10 | 2019-09-12 |
| Q8BERT (Zafrir et al., 2019) | Q8BERT: Quantized 8Bit BERT | 65.00 | 2019-10-14 |
| 24hBERT | How to Train BERT with an Academic Budget | 57.10 | 2021-04-15 |
| BERT+TDA | Can BERT eat RuCoLA? Topological Data Analysis to… | 0.73 | 2023-04-04 |
| RoBERTa+TDA | Can BERT eat RuCoLA? Topological Data Analysis to… | 0.70 | 2023-04-04 |
| RemBERT | RuCoLA: Russian Corpus of Linguistic Acceptability | 0.60 | 2022-10-23 |
| En-BERT + TDA | Acceptability Judgements via Examining the Topolo… | 0.57 | 2022-05-19 |
| En-BERT + TDA + PCA | Acceptability Judgements via Examining the Topolo… |  | 2022-05-19 |
| deberta-v3-base+tasksource | tasksource: A Dataset Harmonization Framework for… |  | 2023-01-14 |
| RoBERTa-large 355M + Entailment as Few-shot Learner | Entailment as Few-Shot Learner |  | 2021-04-29 |
| FNet-Large | FNet: Mixing Tokens with Fourier Transforms |  | 2021-05-09 |
| T5-11B | Exploring the Limits of Transfer Learning with a … |  | 2019-10-23 |
| StructBERTRoBERTa ensemble | StructBERT: Incorporating Language Structures int… |  | 2019-08-13 |
| ALBERT | ALBERT: A Lite BERT for Self-supervised Learning … |  | 2019-09-26 |
| XLNet (single model) | XLNet: Generalized Autoregressive Pretraining for… |  | 2019-06-19 |
| FLOATER-large | Learning to Encode Position for Transformer with … |  | 2020-03-13 |
| RoBERTa-large 355M (MLP quantized vector-wise, fine-tuned) | LLM.int8(): 8-bit Matrix Multiplication for Trans… |  | 2022-08-15 |
| ERNIE 2.0 Base | ERNIE 2.0: A Continual Pre-training Framework for… |  | 2019-07-29 |
| MT-DNN | Multi-Task Deep Neural Networks for Natural Langu… |  | 2019-01-31 |
| RoBERTa (ensemble) | RoBERTa: A Robustly Optimized BERT Pretraining Ap… |  | 2019-07-26 |
| T5-XL 3B | Exploring the Limits of Transfer Learning with a … |  | 2019-10-23 |
| SpanBERT | SpanBERT: Improving Pre-training by Representing … |  | 2019-07-24 |
| MLM+ del-span+ reorder | CLEAR: Contrastive Learning for Sentence Represen… |  | 2020-12-31 |
| ERNIE 2.0 Large | ERNIE 2.0: A Continual Pre-training Framework for… |  | 2019-07-29 |
| T5-Large 770M | Exploring the Limits of Transfer Learning with a … |  | 2019-10-23 |
| BERT-LARGE | BERT: Pre-training of Deep Bidirectional Transfor… |  | 2018-10-11 |
| data2vec | data2vec: A General Framework for Self-supervised… |  | 2022-02-07 |
| RealFormer | RealFormer: Transformer Likes Residual Attention |  | 2020-12-21 |
| BigBird | Big Bird: Transformers for Longer Sequences |  | 2020-07-28 |
| ERNIE | ERNIE: Enhanced Language Representation with Info… |  | 2019-05-17 |
| Charformer-Tall | Charformer: Fast Character Transformers via Gradi… |  | 2021-06-23 |
| T5-Base | Exploring the Limits of Transfer Learning with a … |  | 2019-10-23 |
| DistilBERT 66M | DistilBERT, a distilled version of BERT: smaller,… |  | 2019-10-02 |
| SqueezeBERT | SqueezeBERT: What can computer vision teach NLP a… |  | 2020-06-19 |
| TinyBERT-4 14.5M | TinyBERT: Distilling BERT for Natural Language Un… |  | 2019-09-23 |
| T5-Small | Exploring the Limits of Transfer Learning with a … |  | 2019-10-23 |
| LM-CPPF RoBERTa-base | LM-CPPF: Paraphrasing-Guided Data Augmentation fo… |  | 2023-05-29 |