
The Pile

Language Modelling Benchmark

Performance Over Time

Showing 39 results · Metric: Bits per byte (lower is better)

Top Performing Models

| Rank | Model | Paper | Bits per byte | Date | Code |
|---|---|---|---|---|---|
| 1 | Smaller Transformer 126M (pre-trained) | Need a Small Specialized Language Model? Plan Early! | 33.00 | 2024-02-02 | - |
| 2 | OPT 125M | Knowledge Unlearning for Mitigating Privacy Risks in Language Models | 32.26 | 2022-10-04 | joeljang/knowledge-unlearning, shreya1313/llm-unlearning |
| 3 | Larger Transformer 771M (pre-trained) | Need a Small Specialized Language Model? Plan Early! | 28.10 | 2024-02-02 | - |
| 4 | OPT 1.3B | Knowledge Unlearning for Mitigating Privacy Risks in Language Models | 19.55 | 2022-10-04 | joeljang/knowledge-unlearning, shreya1313/llm-unlearning |
| 5 | GPT-Neo 125M | Knowledge Unlearning for Mitigating Privacy Risks in Language Models | 17.83 | 2022-10-04 | joeljang/knowledge-unlearning, shreya1313/llm-unlearning |
| 6 | OPT 2.7B | Knowledge Unlearning for Mitigating Privacy Risks in Language Models | 17.81 | 2022-10-04 | joeljang/knowledge-unlearning, shreya1313/llm-unlearning |
| 7 | Smaller Transformer 126M (fine-tuned) | Need a Small Specialized Language Model? Plan Early! | 12.00 | 2024-02-02 | - |
| 8 | GPT-Neo 1.3B | Knowledge Unlearning for Mitigating Privacy Risks in Language Models | 11.46 | 2022-10-04 | joeljang/knowledge-unlearning, shreya1313/llm-unlearning |
| 9 | Transformer 125M | Hungry Hungry Hippos: Towards Language Modeling with State Space Models | 10.70 | 2022-12-28 | hazyresearch/safari, hazyresearch/h3, lindermanlab/S5 |
| 10 | GPT-Neo 2.7B | Knowledge Unlearning for Mitigating Privacy Risks in Language Models | 10.44 | 2022-10-04 | joeljang/knowledge-unlearning, shreya1313/llm-unlearning |
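Bits per byte normalizes a model's cross-entropy loss by the byte length of the evaluated text, which makes scores comparable across tokenizers. As a minimal sketch (the helper name and the example numbers are illustrative, not from any listed paper), the conversion from a summed negative log-likelihood in nats to bits per byte looks like:

```python
import math

def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a corpus-level negative log-likelihood (in nats)
    into bits per byte: divide by ln(2) to get bits, then by the
    number of UTF-8 bytes in the evaluated text."""
    return total_nll_nats / (math.log(2) * total_bytes)

# Hypothetical example: 700,000 nats of total loss over a
# 1,000,000-byte corpus gives roughly 1.01 bits per byte.
print(bits_per_byte(700_000, 1_000_000))
```

Because the denominator counts raw bytes rather than tokens, a model with a larger vocabulary gains no artificial advantage from packing more text into each token.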

All Papers (39)

- Need a Small Specialized Language Model? Plan Early! (2024): Smaller Transformer 126M (pre-trained)
- Need a Small Specialized Language Model? Plan Early! (2024): Larger Transformer 771M (pre-trained)
- Need a Small Specialized Language Model? Plan Early! (2024): Smaller Transformer 126M (fine-tuned)
- Need a Small Specialized Language Model? Plan Early! (2024): Larger Transformer 771M (fine-tuned)
- Test-Time Training on Nearest Neighbors for Large Language Models (2023): GPT-2 Large 774M (test-time training on nearest neighbors)