| Model | Paper | BPC | Date |
|---|---|---|---|
| td-LSTM (Zhang et al., 2016) | Architectural Complexity Measures of Recurrent Ne… | 1.63 | 2016-02-26 |
| td-LSTM-large | Architectural Complexity Measures of Recurrent Ne… | 1.49 | 2016-02-26 |
| BFN | Bayesian Flow Networks | 1.41 | 2023-08-14 |
| Unregularised mLSTM | Multiplicative LSTM for sequence modelling | 1.40 | 2016-09-26 |
| BN LSTM | Recurrent Batch Normalization | 1.36 | 2016-03-30 |
| LayerNorm HM-LSTM | Hierarchical Multiscale Recurrent Neural Networks | 1.29 | 2016-09-06 |
| Large RHN | Recurrent Highway Networks | 1.27 | 2016-07-12 |
| Large mLSTM +emb +WN +VD | Multiplicative LSTM for sequence modelling | 1.27 | 2016-09-26 |
| Bipartite flows (8 flows) | Discrete Flows: Invertible Generative Models of D… | 1.23 | 2019-05-24 |
| mLSTM + dynamic eval | Dynamic Evaluation of Neural Sequence Models | 1.19 | 2017-09-21 |
| 12-layer Character Transformer Model | Character-Level Language Modeling with Deeper Sel… | 1.18 | 2018-08-09 |
| PAR Transformer 24B | Pay Attention when Required | 1.18 | 2020-09-09 |
| 64-layer Character Transformer Model | Character-Level Language Modeling with Deeper Sel… | 1.13 | 2018-08-09 |
| 12L Transformer + 8K adaptive span | Adaptive Attention Span in Transformers | 1.11 | 2019-05-19 |
| All-attention network - 18 layers | Augmenting Self-attention with Persistent Memory | 1.11 | 2019-07-02 |
| BP-Transformer - 12 Layers | BP-Transformer: Modelling Long-Range Context via … | 1.11 | 2019-11-11 |
| Transformer-LS (small) | Long-Short Transformer: Efficient Transformers fo… | 1.09 | 2021-07-05 |
| Transformer-XL - 24 layers | Transformer-XL: Attentive Language Models Beyond … | 1.08 | 2019-01-09 |
| All-attention network - 36 layers | Augmenting Self-attention with Persistent Memory | 1.08 | 2019-07-02 |
| 24L Transformer + 8K adaptive span | Adaptive Attention Span in Transformers | 1.07 | 2019-05-19 |
| Transformer-XL + RMS dynamic eval + decay | Dynamic Evaluation of Transformer Language Models | 1.04 | 2019-04-17 |
| Focus | Focus Your Attention (with Adaptive IIR Filters) | 0.98 | 2023-05-24 |
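The scores above are bits per character (BPC), the standard metric for character-level language modelling: the average negative log₂-likelihood the model assigns to each character, so lower is better. As a quick reference, here is a minimal sketch of the usual conversion from a per-character cross-entropy reported in nats (as most deep-learning losses are); the 0.75 input is a hypothetical value chosen only so the result lands near the Transformer-XL row:

```python
import math

def bpc(cross_entropy_nats: float) -> float:
    """Convert a per-character cross-entropy in nats (natural log,
    as most deep-learning losses report it) to bits per character."""
    return cross_entropy_nats / math.log(2)

# Hypothetical loss, for illustration only:
# 0.75 nats/char -> 0.75 / ln 2 ≈ 1.08 BPC (cf. the Transformer-XL row).
print(f"{bpc(0.75):.2f} BPC")
```

Equivalently, 2^BPC is the per-character perplexity, so the spread in this table from 1.63 down to 0.98 BPC corresponds to cutting per-character perplexity from roughly 3.1 to roughly 2.0.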