| Model | Paper | Score | Date |
|---|---|---|---|
| T5-3B | Exploring the Limits of Transfer Learning with a … | 92.50 | 2019-10-23 |
| T5-Large | Exploring the Limits of Transfer Learning with a … | 92.40 | 2019-10-23 |
| T5-11B | Exploring the Limits of Transfer Learning with a … | 91.90 | 2019-10-23 |
| MT-DNN-SMART | SMART: Robust and Efficient Fine-Tuning for Pre-t… | 91.70 | 2019-11-08 |
| BigBird | Big Bird: Transformers for Longer Sequences | 91.50 | 2020-07-28 |
| Charformer-Tall | Charformer: Fast Character Transformers via Gradi… | 91.40 | 2021-06-23 |
| RoBERTa-large 355M + Entailment as Few-shot Learner | Entailment as Few-Shot Learner | 91.00 | 2021-04-29 |
| T5-Base | Exploring the Limits of Transfer Learning with a … | 90.70 | 2019-10-23 |
| PSQ (Chen et al., 2020) | A Statistical Framework for Low-bitwidth Training… | 90.40 | 2020-10-27 |
| Q8BERT (Zafrir et al., 2019) | Q8BERT: Quantized 8Bit BERT | 89.70 | 2019-10-14 |
| T5-Small | Exploring the Limits of Transfer Learning with a … | 89.70 | 2019-10-23 |
| BERT-LARGE | BERT: Pre-training of Deep Bidirectional Transfor… | 89.30 | 2018-10-11 |
| Q-BERT (Shen et al., 2020) | Q-BERT: Hessian Based Ultra Low Precision Quantiz… | 88.20 | 2019-09-12 |
| ALBERT | ALBERT: A Lite BERT for Self-supervised Learning … | | 2019-09-26 |
| RoBERTa (ensemble) | RoBERTa: A Robustly Optimized BERT Pretraining Ap… | | 2019-07-26 |
| StructBERTRoBERTa ensemble | StructBERT: Incorporating Language Structures int… | | 2019-08-13 |
| FLOATER-large | Learning to Encode Position for Transformer with … | | 2020-03-13 |
| SMART | SMART: Robust and Efficient Fine-Tuning for Pre-t… | | 2019-11-08 |
| RoBERTa-large 355M (MLP quantized vector-wise, fine-tuned) | LLM.int8(): 8-bit Matrix Multiplication for Trans… | | 2022-08-15 |
| SpanBERT | SpanBERT: Improving Pre-training by Representing … | | 2019-07-24 |
| XLNet (single model) | XLNet: Generalized Autoregressive Pretraining for… | | 2019-06-19 |
| AutoBERT-Zero (Large) | AutoBERT-Zero: Evolving BERT Backbone from Scratch | | 2021-07-15 |
| MLM+ del-word+ reorder | CLEAR: Contrastive Learning for Sentence Represen… | | 2020-12-31 |
| AutoBERT-Zero (Base) | AutoBERT-Zero: Evolving BERT Backbone from Scratch | | 2021-07-15 |
| DistilBERT 66M | DistilBERT, a distilled version of BERT: smaller,… | | 2019-10-02 |
| MobileBERT | MobileBERT: a Compact Task-Agnostic BERT for Reso… | | 2020-04-06 |
| ERNIE | ERNIE: Enhanced Language Representation with Info… | | 2019-05-17 |
| FNet-Large | FNet: Mixing Tokens with Fourier Transforms | | 2021-05-09 |
| SqueezeBERT | SqueezeBERT: What can computer vision teach NLP a… | | 2020-06-19 |
| 24hBERT | How to Train BERT with an Academic Budget | | 2021-04-15 |
| ERNIE 2.0 Large | ERNIE 2.0: A Continual Pre-training Framework for… | | 2019-07-29 |
| TinyBERT-6 67M | TinyBERT: Distilling BERT for Natural Language Un… | | 2019-09-23 |
| RealFormer | RealFormer: Transformer Likes Residual Attention | | 2020-12-21 |
| RoBERTa + SubRegWeigh (K-means) | SubRegWeigh: Effective and Efficient Annotation W… | | 2024-09-10 |
| SMARTRoBERTa | SMART: Robust and Efficient Fine-Tuning for Pre-t… | | 2019-11-08 |
| TinyBERT-4 14.5M | TinyBERT: Distilling BERT for Natural Language Un… | | 2019-09-23 |
| ERNIE 2.0 Base | ERNIE 2.0: A Continual Pre-training Framework for… | | 2019-07-29 |
| GenSen | Learning General Purpose Distributed Sentence Rep… | | 2018-03-30 |
| InferSent | Supervised Learning of Universal Sentence Represe… | | 2017-05-05 |
| Nyströmformer | Nyströmformer: A Nyström-Based Algorithm for Appr… | | 2021-02-07 |
| BERT-Base | Intrinsic Dimensionality Explains the Effectivene… | | 2020-12-22 |
| BERT-Large | Intrinsic Dimensionality Explains the Effectivene… | | 2020-12-22 |
| SMART-BERT | SMART: Robust and Efficient Fine-Tuning for Pre-t… | | 2019-11-08 |