Shakti-LLM (2.5B) | SHAKTI: A 2.5 Billion Parameter Small Language Mo… | 68.40 | 2024-10-15
CoA | Chain-of-Action: Faithful and Multimodal Question… | 67.30 | 2024-03-26
ToT | Tree of Thoughts: Deliberate Problem Solving with… | 66.60 | 2023-05-17
CoA w/o actions | Chain-of-Action: Faithful and Multimodal Question… | 63.30 | 2024-03-26
LLaMA 65B | LLaMA: Open and Efficient Foundation Language Mod… | 53.00 | 2023-02-27
LLaMA 33B | LLaMA: Open and Efficient Foundation Language Mod… | 48.00 | 2023-02-27
Auto-CoT | Automatic Chain of Thought Prompting in Large Lan… | 42.20 | 2022-10-07
LLaMA 13B | LLaMA: Open and Efficient Foundation Language Mod… | 41.00 | 2023-02-27
LLaMA 7B | LLaMA: Open and Efficient Foundation Language Mod… | 29.00 | 2023-02-27
GPT-4 (RLHF) | GPT-4 Technical Report | 0.59 | 2023-03-15
Mistral-7B-Instruct-v0.2 + TruthX | TruthX: Alleviating Hallucinations by Editing Lar… | 0.56 | 2024-02-27
LLaMa-2-7B-Chat + TruthX | TruthX: Alleviating Hallucinations by Editing Lar… | 0.54 | 2024-02-27
LLaMA-2-Chat-13B + Representation Control (Contrast Vector) | Representation Engineering: A Top-Down Approach t… | 0.54 | 2023-10-02
LLaMA-2-Chat-7B + Representation Control (Contrast Vector) | Representation Engineering: A Top-Down Approach t… | 0.48 | 2023-10-02
Gopher 280B (zero-shot, Our Prompt + Choices) | Scaling Language Models: Methods, Analysis & Insi… | 0.30 | 2021-12-08
GAL 120B | Galactica: A Large Language Model for Science | 0.26 | 2022-11-16
Gopher 7.1B (zero-shot, QA prompts) | Scaling Language Models: Methods, Analysis & Insi… | 0.25 | 2021-12-08
GAL 30B | Galactica: A Large Language Model for Science | 0.24 | 2022-11-16
Gopher 7.1B (zero-shot, Our Prompt + Choices) | Scaling Language Models: Methods, Analysis & Insi… | 0.23 | 2021-12-08
Gopher 1.4B (zero-shot, QA prompts) | Scaling Language Models: Methods, Analysis & Insi… | 0.23 | 2021-12-08
GPT-2 1.5B | TruthfulQA: Measuring How Models Mimic Human Fals… | 0.22 | 2021-09-08
Gopher 1.4B (zero-shot, Our Prompt + Choices) | Scaling Language Models: Methods, Analysis & Insi… | 0.22 | 2021-12-08
GPT-3 175B | TruthfulQA: Measuring How Models Mimic Human Fals… | 0.21 | 2021-09-08
OPT 175B | Galactica: A Large Language Model for Science | 0.21 | 2022-11-16
GPT-J 6B | TruthfulQA: Measuring How Models Mimic Human Fals… | 0.20 | 2021-09-08
UnifiedQA 3B | TruthfulQA: Measuring How Models Mimic Human Fals… | 0.19 | 2021-09-08
GAL 125M | Galactica: A Large Language Model for Science | 0.19 | 2022-11-16
GAL 1.3B | Galactica: A Large Language Model for Science | 0.19 | 2022-11-16
GAL 6.7B | Galactica: A Large Language Model for Science | 0.19 | 2022-11-16
Gopher 280B (zero-shot, QA prompts) | Scaling Language Models: Methods, Analysis & Insi… | | 2021-12-08