A multilingual language-model benchmark composed of 40+ languages spanning several scripts and linguistic families, containing around 40 billion characters, and aimed at accelerating research on multilingual modeling.
Variants: Wiki-40B
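A minimal sketch of how the dataset might be loaded for experimentation, assuming it is exposed through TensorFlow Datasets under the name `wiki40b` with per-language configurations (e.g. `wiki40b/en`) and records carrying `wikidata_id` and `text` features; these names are assumptions and should be checked against the catalog entry.

```python
# Sketch: load the (assumed) English configuration of Wiki-40B via TensorFlow Datasets.
import tensorflow_datasets as tfds

# "wiki40b/en" is assumed to be the English configuration; swap the language code as needed.
ds = tfds.load("wiki40b/en", split="train", shuffle_files=True)

for example in ds.take(1):
    # Each record is expected to hold the cleaned article text plus a Wikidata ID.
    print(example["wikidata_id"].numpy().decode("utf-8"))
    print(example["text"].numpy().decode("utf-8")[:200])
```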
This dataset is used in 3 benchmarks:
| Task | Model | Paper | Date |
|---|---|---|---|
| Benchmarking | OutEffHop-Bert_base | Outlier-Efficient Hopfield Layers for Large … | 2024-04-04 |
| Quantization | OutEffHop-Bert_base | Outlier-Efficient Hopfield Layers for Large … | 2024-04-04 |
| Language Modelling | FLASH-Quad-8k | Transformer Quality in Linear Time | 2022-02-21 |
| Language Modelling | Combiner-Axial-8k | Combiner: Full Attention Transformer with … | 2021-07-12 |
| Language Modelling | Combiner-Fixed-8k | Combiner: Full Attention Transformer with … | 2021-07-12 |
Recent papers with results on this dataset: