WikiText-2

Language Modelling Benchmark

Performance Over Time

Showing 34 results. Metric: test perplexity (lower is better).
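Test perplexity is the exponential of the model's average per-token negative log-likelihood on the test set. Below is a minimal sketch of that computation, assuming PyTorch and a pre-computed tensor of logits; the function name and shapes are illustrative, not taken from any paper listed here.

```python
import math

import torch
import torch.nn.functional as F


def perplexity(logits: torch.Tensor, targets: torch.Tensor) -> float:
    """exp(mean token-level cross-entropy) over the evaluation set.

    logits:  (num_tokens, vocab_size) unnormalized next-token scores
    targets: (num_tokens,) gold next-token ids
    """
    nll = F.cross_entropy(logits, targets, reduction="mean")
    return math.exp(nll.item())
```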

Top Performing Models

| Rank | Model | Paper | Test perplexity | Date | Code |
|------|-------|-------|-----------------|------|------|
| 1 | OPT-175B (50% Sparsity) | SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | 234.77 | 2023-01-02 | nvidia/tensorrt-model-optimizer, ist-daslab/sparsegpt, nvlabs/maskllm |
| 2 | Grave et al. (2016), LSTM | Improving Neural Language Models with a Continuous Cache | 99.30 | 2016-12-13 | dmlc/gluon-nlp, salesforce/awd-lstm-lm, uclanlp/NamedEntityLanguageModel |
| 3 | Inan et al. (2016), Variational LSTM (tied, h=650) | Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling | 87.70 | 2016-11-04 | JianGoForIt/YellowFin_Pytorch, rdspring1/PyTorch_GBW_LM, floydhub/word-language-model, InnerPeace-Wu/im2p-tensorflow, Ravoxsg/Word-level-language-modeling |
| 4 | Inan et al. (2016), Variational LSTM (tied, h=650) + augmented loss | Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling | 87.00 | 2016-11-04 | JianGoForIt/YellowFin_Pytorch, rdspring1/PyTorch_GBW_LM, floydhub/word-language-model, InnerPeace-Wu/im2p-tensorflow, Ravoxsg/Word-level-language-modeling |
| 5 | Grave et al. (2016), LSTM + continuous cache pointer | Improving Neural Language Models with a Continuous Cache | 68.90 | 2016-12-13 | dmlc/gluon-nlp, salesforce/awd-lstm-lm, uclanlp/NamedEntityLanguageModel |
| 6 | EGRU | Efficient recurrent architectures through activity sparsity and sparse back-propagation through time | 68.90 | 2022-06-13 | khaleelkhan/evnn |
| 7 | Melis et al. (2017), 1-layer LSTM (tied) | On the State of the Art of Evaluation in Neural Language Models | 65.90 | 2017-07-18 | deepmind/lamb |
| 8 | AWD-LSTM | Regularizing and Optimizing LSTM Language Models | 65.80 | 2017-08-07 | google-research/google-research, fastai/fastai, dmlc/gluon-nlp |
| 9 | AWD-LSTM + ATOI | Alleviating Sequence Information Loss with Data Overlapping and Prime Batch Sizes | 64.73 | 2019-09-18 | nkcr/overlap-ml |
| 10 | AWD-LSTM 3-layer with Fraternal dropout | Fraternal Dropout | 64.10 | 2017-10-31 | kondiz/fraternal-dropout |
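To reproduce a figure like those in the table, the usual recipe is a sliding-window evaluation over the concatenated test split. Below is a hedged sketch using the Hugging Face `datasets` and `transformers` libraries, with GPT-2 as a stand-in model (no model from the table is implied); the stride value and the blank-line join are common conventions, not requirements, and the per-window token accounting is an approximation.

```python
import math

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# WikiText-2 test split, concatenated into one long token stream.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
encodings = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

max_length = model.config.n_positions  # 1024 for GPT-2
stride = 512
seq_len = encodings.input_ids.size(1)

nlls, prev_end = [], 0
for begin in range(0, seq_len, stride):
    end = min(begin + max_length, seq_len)
    trg_len = end - prev_end  # tokens newly scored in this window
    input_ids = encodings.input_ids[:, begin:end]
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100  # context-only tokens are not scored
    with torch.no_grad():
        out = model(input_ids, labels=target_ids)
    nlls.append(out.loss * trg_len)  # loss is a per-token mean; undo it
    prev_end = end
    if end == seq_len:
        break

print(f"test perplexity: {math.exp(torch.stack(nlls).sum() / prev_end):.2f}")
```

Note that perplexity is only comparable across rows when the tokenization matches: the word-level numbers in the table are not directly comparable to subword-level evaluations like the GPT-2 sketch above.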

All Papers (34)

- Fraternal Dropout (2017): AWD-LSTM 3-layer with Fraternal dropout
- Improved Language Modeling by Decoding the Past (2018): Past Decode Reg. + AWD-LSTM-MoS + dyn. eval.