| Model | Paper | Result | Date |
|---|---|---|---|
| Smaller Transformer 126M (pre-trained) | Need a Small Specialized Language Model? Plan Early! | 33.00 | 2024-02-02 |
| OPT 125M | Knowledge Unlearning for Mitigating Privacy Risks in Language Models | 32.26 | 2022-10-04 |
| Larger Transformer 771M (pre-trained) | Need a Small Specialized Language Model? Plan Early! | 28.10 | 2024-02-02 |
| OPT 1.3B | Knowledge Unlearning for Mitigating Privacy Risks in Language Models | 19.55 | 2022-10-04 |
| GPT-Neo 125M | Knowledge Unlearning for Mitigating Privacy Risks in Language Models | 17.83 | 2022-10-04 |
| OPT 2.7B | Knowledge Unlearning for Mitigating Privacy Risks in Language Models | 17.81 | 2022-10-04 |
| Smaller Transformer 126M (fine-tuned) | Need a Small Specialized Language Model? Plan Early! | 12.00 | 2024-02-02 |
| GPT-Neo 1.3B | Knowledge Unlearning for Mitigating Privacy Risks in Language Models | 11.46 | 2022-10-04 |
| Transformer 125M | Hungry Hungry Hippos: Towards Language Modeling with State Space Models | 10.70 | 2022-12-28 |
| GPT-Neo 2.7B | Knowledge Unlearning for Mitigating Privacy Risks in Language Models | 10.44 | 2022-10-04 |
| Hybrid H3 125M | Hungry Hungry Hippos: Towards Language Modeling with State Space Models | 10.20 | 2022-12-28 |
| Larger Transformer 771M (fine-tuned) | Need a Small Specialized Language Model? Plan Early! | 10.00 | 2024-02-02 |
| GPT-2 Small 124M (pre-trained) | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | 1.23 | 2020-12-31 |
| GPT-2 Medium 355M (pre-trained) | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | 1.09 | 2020-12-31 |
| GPT-2 Large 774M (pre-trained) | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | 1.08 | 2020-12-31 |
| GPT-2 XL 1.5B (pre-trained) | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | 1.05 | 2020-12-31 |
| GPT-3 Ada 350M (pre-trained) | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | 0.96 | 2020-12-31 |
| GPT-3 Babbage 1.3B (pre-trained) | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | 0.87 | 2020-12-31 |
| Test-Time Fine-Tuning with SIFT + GPT-2 (124M) | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 0.86 | 2024-10-10 |
| GPT-2 Large 774M (test-time training on nearest neighbors) | Test-Time Training on Nearest Neighbors for Large Language Models | 0.85 | 2023-05-29 |
| Llama-3.2-Instruct 1B | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 0.81 | 2024-10-10 |
| GPT-3 Curie 6.7B (pre-trained) | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | 0.80 | 2020-12-31 |
| Test-Time Fine-Tuning with SIFT + GPT-2 (774M) | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 0.76 | 2024-10-10 |
| GPT-3 | GLM-130B: An Open Bilingual Pre-trained Model | 0.74 | 2022-10-05 |
| Llama-3.2-Instruct 3B | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 0.74 | 2024-10-10 |
| Gemma-2 2B | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 0.72 | 2024-10-10 |
| GPT-3 Davinci 175B (pre-trained) | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | 0.72 | 2020-12-31 |
| Llama-3.2 1B | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 0.70 | 2024-10-10 |
| Phi-3 3.8B | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 0.68 | 2024-10-10 |
| Phi-3 7B | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 0.68 | 2024-10-10 |
| Gemma-2 9B | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 0.67 | 2024-10-10 |
| Phi-3 14B | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 0.65 | 2024-10-10 |
| Jurassic-1 | GLM-130B: An Open Bilingual Pre-trained Model | 0.65 | 2022-10-05 |
| Llama-3.2 3B | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 0.64 | 2024-10-10 |
| GLM-130B | GLM-130B: An Open Bilingual Pre-trained Model | 0.63 | 2022-10-05 |
| Gemma-2 27B | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 0.63 | 2024-10-10 |
| Test-Time Fine-Tuning with SIFT + Llama-3.2 (1B) | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 0.61 | 2024-10-10 |
| Test-Time Fine-Tuning with SIFT + Phi-3 (3.8B) | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 0.60 | 2024-10-10 |
| Test-Time Fine-Tuning with SIFT + Llama-3.2 (3B) | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 0.56 | 2024-10-10 |
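Note that the results above are not on a single scale: the sub-1.3 values come from papers that report bits per byte (BPB) on The Pile (as The Pile and GLM-130B papers do), while the values above 10 are token-level test perplexities (as in the H3 and knowledge-unlearning results). Assuming a tokenizer that encodes $B$ bytes of evaluation text as $T$ tokens, the two metrics are related by:

$$
\mathrm{BPB} \;=\; \frac{T}{B}\,\log_2(\mathrm{PPL})
\qquad\Longleftrightarrow\qquad
\mathrm{PPL} \;=\; 2^{\,(B/T)\,\mathrm{BPB}}
$$

For illustration only: at a hypothetical rate of 0.25 tokens per byte (roughly typical of GPT-2-style BPE on English text), a BPB of 0.72 would correspond to a token-level perplexity of about $2^{0.72/0.25} \approx 7.4$. The actual conversion depends on each model's tokenizer, so cross-metric rows should not be ranked against each other directly.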