📊 Showing 3 results | 📏 Metric: Perplexity (lower is better)
Rank | Model | Paper | Perplexity | Date | Code |
---|---|---|---|---|---|
1 | FLASH-Quad-8k | Transformer Quality in Linear Time | 15.00 | 2022-02-21 | 📦 lucidrains/FLASH-pytorch 📦 zhuiyitechnology/gau-alpha |
2 | Combiner-Axial-8k | Combiner: Full Attention Transformer with Sparse Computation Cost | 16.49 | 2021-07-12 | 📦 google-research/google-research 📦 mindspore-courses/External-Attention-MindSpore |
3 | Combiner-Fixed-8k | Combiner: Full Attention Transformer with Sparse Computation Cost | 16.60 | 2021-07-12 | 📦 google-research/google-research 📦 mindspore-courses/External-Attention-MindSpore |
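For reference, the perplexity values ranked above are the exponential of the mean per-token cross-entropy. A minimal sketch of that relationship is below; the tensor shapes, vocabulary size, and random inputs are placeholders, not the evaluation setup used in the cited papers.

```python
import torch
import torch.nn.functional as F

# Perplexity = exp(mean cross-entropy per token), the metric used to rank
# the models in the table above. `logits` and `targets` are placeholder
# tensors standing in for a language model's predictions and gold token ids.
logits = torch.randn(4, 128, 32000)          # (batch, sequence, vocab)
targets = torch.randint(0, 32000, (4, 128))  # (batch, sequence)

# Flatten to (batch * sequence, vocab) and (batch * sequence,) for cross_entropy.
loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
perplexity = torch.exp(loss)
print(f"perplexity: {perplexity.item():.2f}")
```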