WikiText-103

Language Modelling Benchmark

Performance Over Time

Showing 83 results | Metric: Test perplexity (lower is better)
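Test perplexity is the exponential of the average per-token negative log-likelihood of the test set under the model. A minimal sketch of that computation follows; the `model` interface, window length, and lack of batching are simplifying assumptions for illustration, not any particular submission's evaluation code.

```python
import math

import torch
import torch.nn.functional as F


def test_perplexity(model, token_ids, context_len=512, device="cpu"):
    """exp(mean per-token cross-entropy) over a tokenized test stream."""
    model.eval()
    ids = torch.as_tensor(token_ids, dtype=torch.long, device=device)
    total_nll, total_tokens = 0.0, 0
    with torch.no_grad():
        # Score the stream in non-overlapping windows:
        # predict token t+1 from tokens <= t within each window.
        for start in range(0, ids.numel() - 1, context_len):
            targets = ids[start + 1 : start + 1 + context_len]
            inputs = ids[start : start + targets.numel()]
            logits = model(inputs)  # assumed shape: (len, vocab)
            total_nll += F.cross_entropy(logits, targets, reduction="sum").item()
            total_tokens += targets.numel()
    return math.exp(total_nll / total_tokens)
```

Scoring in fixed non-overlapping windows penalizes models that can condition on longer history; published numbers often carry recurrent state across segments or use longer or sliding contexts, so treat this only as the definition of the metric.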

Top Performing Models

| Rank | Model | Paper | Test perplexity | Date | Code |
|------|-------|-------|-----------------|------|------|
| 1 | LSTM | Fast Parametric Learning with Activation Memorization | 36.40 | 2018-03-27 | - |
| 2 | GCNN-8 | Language Modeling with Gated Convolutional Networks | 37.20 | 2016-12-23 | facebookresearch/fairseq, mhagiwara/nanigonet, Rishit-dagli/GLU |
| 3 | Neural cache model (size = 2,000) | Improving Neural Language Models with a Continuous Cache | 40.80 | 2016-12-13 | dmlc/gluon-nlp, salesforce/awd-lstm-lm, uclanlp/NamedEntityLanguageModel |
| 4 | Neural cache model (size = 100) | Improving Neural Language Models with a Continuous Cache | 44.80 | 2016-12-13 | dmlc/gluon-nlp, salesforce/awd-lstm-lm, uclanlp/NamedEntityLanguageModel |
| 5 | GCNN-8 | Language Modeling with Gated Convolutional Networks | 44.90 | 2016-12-23 | facebookresearch/fairseq, mhagiwara/nanigonet, Rishit-dagli/GLU |
| 6 | TCN | An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling | 45.19 | 2018-03-04 | timeseriesAI/tsai, locuslab/TCN, philipperemy/keras-tcn |
| 7 | LSTM | Improving Neural Language Models with a Continuous Cache | 48.70 | 2016-12-13 | dmlc/gluon-nlp, salesforce/awd-lstm-lm, uclanlp/NamedEntityLanguageModel |
| 8 | LSTM | How much complexity does an RNN architecture need to learn syntax-sensitive dependencies? | 52.73 | 2020-05-17 | bhattg/Decay-RNN-ACL-SRW2020 |
| 9 | GRU | How much complexity does an RNN architecture need to learn syntax-sensitive dependencies? | 53.78 | 2020-05-17 | bhattg/Decay-RNN-ACL-SRW2020 |
| 10 | Decay RNN | How much complexity does an RNN architecture need to learn syntax-sensitive dependencies? | 76.67 | 2020-05-17 | bhattg/Decay-RNN-ACL-SRW2020 |
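The two "Neural cache model" rows differ only in how many recent (hidden state, next word) pairs are kept. The cited paper interpolates the base LM distribution with a nonparametric cache that attends from the current hidden state back to the stored states; a rough sketch of one decoding step is below, with `theta` (sharpness) and `lam` (interpolation weight) as illustrative placeholders rather than the paper's tuned values.

```python
import torch
import torch.nn.functional as F


def cache_next_token_probs(logits, hidden, cache_states, cache_words,
                           theta=0.3, lam=0.1):
    """One step of a continuous-cache LM (after Grave et al., 2016).

    cache_states: (N, d) hidden states of the last N positions.
    cache_words:  (N,) long tensor of the token ids that followed them.
    theta and lam are illustrative defaults, not tuned values.
    """
    p_model = F.softmax(logits, dim=-1)  # (vocab,)
    if cache_states.numel() == 0:
        return p_model  # empty cache: fall back to the base model
    # Match the current hidden state against the cached states ...
    weights = F.softmax(theta * cache_states @ hidden, dim=-1)  # (N,)
    # ... and scatter that attention mass onto the words that followed them.
    p_cache = torch.zeros_like(p_model).index_add_(0, cache_words, weights)
    return (1.0 - lam) * p_model + lam * p_cache
```

A larger cache (2,000 vs. 100 entries) lets the model point back across most of a WikiText-103 article rather than only the last few sentences, which lines up with the lower perplexity reported in the table.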

All Papers (83)

Random Feature Attention (2021), model: Rfa-Gate-Gaussian-Stateful (Small)

Random Feature Attention (2021), model: Rfa-Gate-Gaussian-Stateful (Big)

∞-former: Infinite Memory Transformer (2021), model: ∞-former (Sticky memories + initialized GPT-2 Small)