📊 Showing 3 results | 📏 Metric: Perplexity (lower is better)
Rank | Model | Paper | Perplexity | Date | Code |
---|---|---|---|---|---|
1 | FLASH-Quad-8k | Transformer Quality in Linear Time | 15.00 | 2022-02-21 | 📦 lucidrains/FLASH-pytorch 📦 zhuiyitechnology/gau-alpha |
2 | Combiner-Axial-8k | Combiner: Full Attention Transformer with Sparse Computation Cost | 16.49 | 2021-07-12 | 📦 google-research/google-research 📦 mindspore-courses/External-Attention-MindSpore |
3 | Combiner-Fixed-8k | Combiner: Full Attention Transformer with Sparse Computation Cost | 16.60 | 2021-07-12 | 📦 google-research/google-research 📦 mindspore-courses/External-Attention-MindSpore |
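For reference, the perplexity values ranked above are the exponential of the mean per-token cross-entropy. A minimal sketch of that relationship is below; the tensor shapes, vocabulary size, and random inputs are placeholders, not the evaluation setup used in the cited papers.

```python
import torch
import torch.nn.functional as F

# Perplexity = exp(mean cross-entropy per token), the metric used to rank
# the models in the table above. `logits` and `targets` are placeholder
# tensors standing in for a language model's predictions and gold token ids.
logits = torch.randn(4, 128, 32000)          # (batch, sequence, vocab)
targets = torch.randint(0, 32000, (4, 128))  # (batch, sequence)

# Flatten to (batch * sequence, vocab) and (batch * sequence,) for cross_entropy.
loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
perplexity = torch.exp(loss)
print(f"perplexity: {perplexity.item():.2f}")
```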