Language Modelling | BFN | Bayesian Flow Networks | 2023-08-14 |
Language Modelling | Focus | Focus Your Attention (with Adaptive … | 2023-05-24 |
Language Modelling | Transformer-LS (small) | Long-Short Transformer: Efficient Transformers for … | 2021-07-05 |
Language Modelling | PAR Transformer 24B | Pay Attention when Required | 2020-09-09 |
Language Modelling | BP-Transformer - 12 Layers | BP-Transformer: Modelling Long-Range Context via … | 2019-11-11 |
Language Modelling | All-attention network - 18 layers | Augmenting Self-attention with Persistent Memory | 2019-07-02 |
Language Modelling | All-attention network - 36 layers | Augmenting Self-attention with Persistent Memory | 2019-07-02 |
Language Modelling | Bipartite flows (8 flows) | Discrete Flows: Invertible Generative Models … | 2019-05-24 |
Language Modelling | 12L Transformer + 8K adaptive span | Adaptive Attention Span in Transformers | 2019-05-19 |
Language Modelling | 24L Transformer + 8K adaptive span | Adaptive Attention Span in Transformers | 2019-05-19 |
Language Modelling | Transformer-XL + RMS dynamic eval + decay | Dynamic Evaluation of Transformer Language … | 2019-04-17 |
Language Modelling | Transformer-XL - 24 layers | Transformer-XL: Attentive Language Models Beyond … | 2019-01-09 |
Language Modelling | 12-layer Character Transformer Model | Character-Level Language Modeling with Deeper … | 2018-08-09 |
Language Modelling | 64-layer Character Transformer Model | Character-Level Language Modeling with Deeper … | 2018-08-09 |
Language Modelling | mLSTM + dynamic eval | Dynamic Evaluation of Neural Sequence … | 2017-09-21 |
Language Modelling | Unregularised mLSTM | Multiplicative LSTM for sequence modelling | 2016-09-26 |
Language Modelling | Large mLSTM +emb +WN +VD | Multiplicative LSTM for sequence modelling | 2016-09-26 |
Language Modelling | LayerNorm HM-LSTM | Hierarchical Multiscale Recurrent Neural Networks | 2016-09-06 |
Language Modelling | Large RHN | Recurrent Highway Networks | 2016-07-12 |
Language Modelling | BN LSTM | Recurrent Batch Normalization | 2016-03-30 |