| Model | Paper | Result | Date |
|---|---|---|---|
| Smaller Transformer 126M (pre-trained) | Need a Small Specialized Language Model? Plan Early! | 33.00 | 2024-02-02 |
| OPT 125M | Knowledge Unlearning for Mitigating Privacy Risks in Language Models | 32.26 | 2022-10-04 |
| Larger Transformer 771M (pre-trained) | Need a Small Specialized Language Model? Plan Early! | 28.10 | 2024-02-02 |
| OPT 1.3B | Knowledge Unlearning for Mitigating Privacy Risks in Language Models | 19.55 | 2022-10-04 |
| GPT-Neo 125M | Knowledge Unlearning for Mitigating Privacy Risks in Language Models | 17.83 | 2022-10-04 |
| OPT 2.7B | Knowledge Unlearning for Mitigating Privacy Risks in Language Models | 17.81 | 2022-10-04 |
| Smaller Transformer 126M (fine-tuned) | Need a Small Specialized Language Model? Plan Early! | 12.00 | 2024-02-02 |
| GPT-Neo 1.3B | Knowledge Unlearning for Mitigating Privacy Risks in Language Models | 11.46 | 2022-10-04 |
| Transformer 125M | Hungry Hungry Hippos: Towards Language Modeling with State Space Models | 10.70 | 2022-12-28 |
| GPT-Neo 2.7B | Knowledge Unlearning for Mitigating Privacy Risks in Language Models | 10.44 | 2022-10-04 |
| Hybrid H3 125M | Hungry Hungry Hippos: Towards Language Modeling with State Space Models | 10.20 | 2022-12-28 |
| Larger Transformer 771M (fine-tuned) | Need a Small Specialized Language Model? Plan Early! | 10.00 | 2024-02-02 |
| GPT-2 Small 124M (pre-trained) | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | 1.23 | 2020-12-31 |
| GPT-2 Medium 355M (pre-trained) | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | 1.09 | 2020-12-31 |
| GPT-2 Large 774M (pre-trained) | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | 1.08 | 2020-12-31 |
| GPT-2 XL 1.5B (pre-trained) | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | 1.05 | 2020-12-31 |
| GPT-3 Ada 350M (pre-trained) | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | 0.96 | 2020-12-31 |
| GPT-3 Babbage 1.3B (pre-trained) | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | 0.87 | 2020-12-31 |
| Test-Time Fine-Tuning with SIFT + GPT-2 (124M) | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 0.86 | 2024-10-10 |
| GPT-2 Large 774M (test-time training on nearest neighbors) | Test-Time Training on Nearest Neighbors for Large Language Models | 0.85 | 2023-05-29 |
| Llama-3.2-Instruct 1B | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 0.81 | 2024-10-10 |
| GPT-3 Curie 6.7B (pre-trained) | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | 0.80 | 2020-12-31 |
| Test-Time Fine-Tuning with SIFT + GPT-2 (774M) | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 0.76 | 2024-10-10 |
| GPT-3 | GLM-130B: An Open Bilingual Pre-trained Model | 0.74 | 2022-10-05 |
| Llama-3.2-Instruct 3B | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 0.74 | 2024-10-10 |
| Gemma-2 2B | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 0.72 | 2024-10-10 |
| GPT-3 Davinci 175B (pre-trained) | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | 0.72 | 2020-12-31 |
| Llama-3.2 1B | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 0.70 | 2024-10-10 |
| Phi-3 3.8B | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 0.68 | 2024-10-10 |
| Phi-3 7B | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 0.68 | 2024-10-10 |
| Gemma-2 9B | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 0.67 | 2024-10-10 |
| Phi-3 14B | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 0.65 | 2024-10-10 |
| Jurassic-1 | GLM-130B: An Open Bilingual Pre-trained Model | 0.65 | 2022-10-05 |
| Llama-3.2 3B | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 0.64 | 2024-10-10 |
| GLM-130B | GLM-130B: An Open Bilingual Pre-trained Model | 0.63 | 2022-10-05 |
| Gemma-2 27B | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 0.63 | 2024-10-10 |
| Test-Time Fine-Tuning with SIFT + Llama-3.2 (1B) | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 0.61 | 2024-10-10 |
| Test-Time Fine-Tuning with SIFT + Phi-3 (3.8B) | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 0.60 | 2024-10-10 |
| Test-Time Fine-Tuning with SIFT + Llama-3.2 (3B) | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 0.56 | 2024-10-10 |
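Note that the results above are not on a single scale: the sub-1.3 values come from papers that report bits per byte (BPB) on The Pile (as The Pile and GLM-130B papers do), while the values above 10 are token-level test perplexities (as in the H3 and knowledge-unlearning results). Assuming a tokenizer that encodes $B$ bytes of evaluation text as $T$ tokens, the two metrics are related by:

$$
\mathrm{BPB} \;=\; \frac{T}{B}\,\log_2(\mathrm{PPL})
\qquad\Longleftrightarrow\qquad
\mathrm{PPL} \;=\; 2^{\,(B/T)\,\mathrm{BPB}}
$$

For illustration only: at a hypothetical rate of 0.25 tokens per byte (roughly typical of GPT-2-style BPE on English text), a BPB of 0.72 would correspond to a token-level perplexity of about $2^{0.72/0.25} \approx 7.4$. The actual conversion depends on each model's tokenizer, so cross-metric rows should not be ranked against each other directly.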