| Model | Paper | Perplexity | Date |
|---|---|---:|---|
| OPT-175B (50% Sparsity) | SparseGPT: Massive Language Models Can Be Accurat… | 234.77 | 2023-01-02 |
| Grave et al. (2016) - LSTM | Improving Neural Language Models with a Continuou… | 99.30 | 2016-12-13 |
| Inan et al. (2016) - Variational LSTM (tied) (h=650) | Tying Word Vectors and Word Classifiers: A Loss F… | 87.70 | 2016-11-04 |
| Inan et al. (2016) - Variational LSTM (tied) (h=650) + augmented loss | Tying Word Vectors and Word Classifiers: A Loss F… | 87.00 | 2016-11-04 |
| Grave et al. (2016) - LSTM + continuous cache pointer | Improving Neural Language Models with a Continuou… | 68.90 | 2016-12-13 |
| EGRU | Efficient recurrent architectures through activit… | 68.90 | 2022-06-13 |
| Melis et al. (2017) - 1-layer LSTM (tied) | On the State of the Art of Evaluation in Neural L… | 65.90 | 2017-07-18 |
| AWD-LSTM | Regularizing and Optimizing LSTM Language Models | 65.80 | 2017-08-07 |
| AWD-LSTM + ATOI | Alleviating Sequence Information Loss with Data O… | 64.73 | 2019-09-18 |
| AWD-LSTM 3-layer with Fraternal dropout | Fraternal Dropout | 64.10 | 2017-10-31 |
| AWD-LSTM-DRILL | Deep Residual Output Layers for Neural Language G… | 61.90 | 2019-05-14 |
| AWD-FWM Schlag et al. (2020) | Learning Associative Inference Using Fast Weight … | 61.65 | 2020-11-16 |
| AWD-LSTM-MoS | Breaking the Softmax Bottleneck: A High-Rank RNN … | 61.45 | 2017-11-10 |
| AWD-LSTM-MoS + Partial Shuffle | Partially Shuffling the Training Data to Improve … | 59.98 | 2019-03-11 |
| AWD-LSTM-DOC | Direct Output Connection for a High-Rank Language… | 58.03 | 2018-08-30 |
| AWD-LSTM-DOC + Partial Shuffle | Partially Shuffling the Training Data to Improve … | 57.85 | 2019-03-11 |
| Mogrifier LSTM | Mogrifier LSTM | 55.10 | 2019-09-04 |
| Ensemble of All | Advancing State of the Art in Language Modeling | 53.73 | 2023-11-28 |
| AWD-LSTM-DOC x5 | Direct Output Connection for a High-Rank Language… | 53.09 | 2018-08-30 |
| AWD-LSTM + continuous cache pointer | Regularizing and Optimizing LSTM Language Models | 52.00 | 2017-08-07 |
| AWD-LSTM + dynamic eval | Dynamic Evaluation of Neural Sequence Models | 44.30 | 2017-09-21 |
| AWD-LSTM-DRILL + dynamic eval | Deep Residual Output Layers for Neural Language G… | 42.00 | 2019-05-14 |
| AWD-LSTM-MoS + dynamic eval | Breaking the Softmax Bottleneck: A High-Rank RNN … | 40.68 | 2017-11-10 |
| GL-LWGC + AWD-MoS-LSTM + dynamic eval | Gradual Learning of Recurrent Neural Networks | 40.46 | 2017-08-29 |
| Past Decode Reg. + AWD-LSTM-MoS + dyn. eval. | Improved Language Modeling by Decoding the Past | 40.30 | 2018-08-14 |
| FRAGE + AWD-LSTM-MoS + dynamic eval | FRAGE: Frequency-Agnostic Word Representation | 39.14 | 2018-09-18 |
| adversarial + AWD-LSTM-MoS + dynamic eval | Improving Neural Language Modeling via Adversaria… | 38.65 | 2019-06-10 |
| Mogrifier LSTM + dynamic eval | Mogrifier LSTM | 38.60 | 2019-09-04 |
| BERT-Large-CAS | Language Models with Transformers | 34.10 | 2019-04-20 |
| GPT-2 (fine-tuned) | Hydra: A System for Large Multi-Model Deep Learni… | 15.17 | 2021-10-16 |
| SparseGPT (175B, 2:4 Sparsity) | SparseGPT: Massive Language Models Can Be Accurat… | 8.73 | 2023-01-02 |
| SparseGPT (175B, 4:8 Sparsity) | SparseGPT: Massive Language Models Can Be Accurat… | 8.45 | 2023-01-02 |
| OPT-175B | SparseGPT: Massive Language Models Can Be Accurat… | 8.34 | 2023-01-02 |
| SparseGPT (175B, 50% Sparsity) | SparseGPT: Massive Language Models Can Be Accurat… | 8.21 | 2023-01-02 |
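The scores above are language-model perplexities: the exponential of the average negative log-likelihood per token, so lower is better. A minimal sketch of the metric (the `perplexity` helper is illustrative, not taken from any of the cited papers):

```python
import math

def perplexity(log_probs):
    """Perplexity = exp of the average negative log-likelihood
    (natural log) over a sequence of token log-probabilities."""
    return math.exp(-sum(log_probs) / len(log_probs))

# Sanity check: a model that assigns uniform probability over a
# 50-word vocabulary to every token has perplexity exactly 50.
uniform = [math.log(1 / 50)] * 10
print(round(perplexity(uniform), 2))  # → 50.0
```

Intuitively, a perplexity of 50 means the model is, on average, as uncertain as if it were choosing uniformly among 50 equally likely next tokens at each step.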