| Model | Paper | Score | Date |
|---|---|---:|---|
| EG-CFG (DeepSeek-V3-0324) | Execution Guided Line-by-Line Code Generation | 96.60 | 2025-06-12 |
| QualityFlow (Sonnet-3.5) | QualityFlow: An Agentic Workflow for Program Synt… | 94.20 | 2025-01-20 |
| o1-mini + MapCoder (Hamming.ai) | MapCoder: Multi-Agent Code Generation for Competi… | 93.20 | 2024-05-18 |
| MGDebugger (DeepSeek-V3-0324) | From Code to Correctness: Closing the Last Mile o… | 92.40 | 2024-10-02 |
| GPT-4 + AgentCoder | AgentCoder: Multi-Agent-based Code Generation wit… | 91.80 | 2023-12-20 |
| CodeSim (GPT4o) | CODESIM: Multi-Agent Code Generation and Problem … | 90.70 | 2025-02-08 |
| GPT-3.5 Turbo (ChatGPT) + AgentCoder | AgentCoder: Multi-Agent-based Code Generation wit… | 89.90 | 2023-12-20 |
| MapCoder (GPT-4o) | MapCoder: Multi-Agent Code Generation for Competi… | 89.70 | 2024-05-18 |
| GPT-4 (ChatGPT Plus) | How Does Naming Affect LLMs on Code Analysis Task… | 87.50 | 2023-07-24 |
| LPW (GPT-4o) | Planning-Driven Programming: A Large Language Mod… | 84.80 | 2024-11-21 |
| AFlow (GPT-4o-mini) | AFlow: Automating Agentic Workflow Generation | 83.40 | 2024-10-14 |
| GPT-3.5 Turbo (ChatGPT) | How Does Naming Affect LLMs on Code Analysis Task… | 83.20 | 2023-07-24 |
| EG-CFG (DeepSeek Coder 1.3b Instruct) | Execution Guided Line-by-Line Code Generation | 83.20 | 2025-06-12 |
| MapCoder (GPT-4) | MapCoder: Multi-Agent Code Generation for Competi… | 83.10 | 2024-05-18 |
| o1-mini + Language Agent Tree Search (Hamming.ai) | Language Agent Tree Search Unifies Reasoning Acti… | 82.30 | 2023-10-06 |
| GPT-4 (Bing Chat) | How Does Naming Affect LLMs on Code Analysis Task… | 82.00 | 2023-07-24 |
| GPT-3.5 Turbo + Language Agent Tree Search | Language Agent Tree Search Unifies Reasoning Acti… | 81.10 | 2023-10-06 |
| MGDebugger (CodeQwen1.5) | From Code to Correctness: Closing the Last Mile o… | 80.80 | 2024-10-02 |
| GPT-4 (Self-Debugging with unit tests + trace) | Teaching Large Language Models to Self-Debug | 80.20 | 2023-04-11 |
| GPT-4 (few-shot) | DeepSeek-Coder: When the Large Language Model Mee… | 80.00 | 2024-01-25 |
| Bard (PaLM 2/chat-bison-001) | How Does Naming Affect LLMs on Code Analysis Task… | 76.20 | 2023-07-24 |
| GPT-3.5 Turbo (Self-Debugging with unit tests + trace) | Teaching Large Language Models to Self-Debug | 72.80 | 2023-04-11 |
| Claude | How Does Naming Affect LLMs on Code Analysis Task… | 71.40 | 2023-07-24 |
| code-davinci-002 175B (Self-Debugging with unit tests + trace) | Teaching Large Language Models to Self-Debug | 70.80 | 2023-04-11 |
| GPT-3.5 Turbo (few-shot) | DeepSeek-Coder: When the Large Language Model Mee… | 70.80 | 2024-01-25 |
| DeepSeek-Coder-Instruct 33B (few-shot) | DeepSeek-Coder: When the Large Language Model Mee… | 70.00 | 2024-01-25 |
| GPT-3.5 Turbo + INTERVENOR | INTERVENOR: Prompting the Coding Ability of Large… | 69.80 | 2023-11-16 |
| code-davinci-002 175B + LEVER | LEVER: Learning to Verify Language-to-Code Genera… | 68.90 | 2023-02-16 |
| code-davinci-002 175B + CodeT | CodeT: Code Generation with Generated Tests | 67.70 | 2022-07-21 |
| GPT-3.5 Turbo (3-shot) | Teaching Large Language Models to Self-Debug | 67.60 | 2023-04-11 |
| code-davinci-002 175B + Reviewer | Coder Reviewer Reranking for Code Generation | 66.90 | 2022-11-29 |
| code-davinci-002 175B + Coder-Reviewer | Coder Reviewer Reranking for Code Generation | 66.40 | 2022-11-29 |
| StarCoder2-15B | StarCoder 2 and The Stack v2: The Next Generation | 66.20 | 2024-02-29 |
| DeepSeek-Coder-Base 33B (few-shot) | DeepSeek-Coder: When the Large Language Model Mee… | 66.00 | 2024-01-25 |
| Code Llama - Python 70B (3-shot) | Code Llama: Open Foundation Models for Code | 65.50 | 2023-08-24 |
| DeepSeek-Coder-Instruct 6.7B (few-shot) | DeepSeek-Coder: When the Large Language Model Mee… | 65.40 | 2024-01-25 |
| code-davinci-002 175B + MBR-Exec | Coder Reviewer Reranking for Code Generation | 63.00 | 2022-11-29 |
| Code Llama 70B (3-shot) | Code Llama: Open Foundation Models for Code | 62.40 | 2023-08-24 |
| Code Llama - Instruct 70B (3-shot) | Code Llama: Open Foundation Models for Code | 62.20 | 2023-08-24 |
| code-davinci-001 175B + CodeT | CodeT: Code Generation with Generated Tests | 61.90 | 2022-07-21 |
| code-davinci-002 175B (3-shot) | Teaching Large Language Models to Self-Debug | 61.40 | 2023-04-11 |
| Unnatural Code Llama 34B (3-shot) | Code Llama: Open Foundation Models for Code | 61.20 | 2023-08-24 |
| Mixtral 8x7B (3-shot) | Mixtral of Experts | 60.70 | 2024-01-08 |
| DeepSeek-Coder-Base 6.7B (few-shot) | DeepSeek-Coder: When the Large Language Model Mee… | 60.60 | 2024-01-25 |
| code-davinci-001 175B + MBR-Exec | Natural Language to Code Translation with Executi… | 58.20 | 2022-04-25 |
| Code Llama - Instruct 34B (3-shot) | Code Llama: Open Foundation Models for Code | 57.00 | 2023-08-24 |
| Code Llama - Python 34B (3-shot) | Code Llama: Open Foundation Models for Code | 56.20 | 2023-08-24 |
| code-cushman-001 12B (CodeT) | CodeT: Code Generation with Generated Tests | 55.40 | 2022-07-21 |
| Code Llama 34B (3-shot) | Code Llama: Open Foundation Models for Code | 55.00 | 2023-08-24 |
| StarCoder 15.5B (Self-Debugging with unit tests + trace) | Teaching Large Language Models to Self-Debug | 53.20 | 2023-04-11 |
| StarCoder 15.5B | StarCoder: may the source be with you! | 52.70 | 2023-05-09 |
| GPT-3.5 Turbo | Code Llama: Open Foundation Models for Code | 52.20 | 2023-08-24 |
| WizardCoder 15B | WizardCoder: Empowering Code Large Language Model… | 51.80 | 2023-06-14 |
| PaLM 2-S* (few-shot) | PaLM 2 Technical Report | 50.00 | 2023-05-17 |
| CodeGen-Mono 16B + CodeT | CodeT: Code Generation with Generated Tests | 49.50 | 2022-07-21 |
| Code Llama - Instruct 13B (3-shot) | Code Llama: Open Foundation Models for Code | 49.40 | 2023-08-24 |
| DeepSeek-Coder-Instruct 1.3B (few-shot) | DeepSeek-Coder: When the Large Language Model Mee… | 49.40 | 2024-01-25 |
| StarCoderBase 15.5B | StarCoder: may the source be with you! | 49.00 | 2023-05-09 |
| Code Llama - Python 13B (3-shot) | Code Llama: Open Foundation Models for Code | 49.00 | 2023-08-24 |
| Qwen2idae-16x14B (4-shot) | Parameter-Efficient Sparsity Crafting from Dense … | 48.60 | 2024-01-05 |
| code-cushman-001 12B + MBR-Exec | Coder Reviewer Reranking for Code Generation | 48.30 | 2022-11-29 |
| Code Llama - Python 7B (3-shot) | Code Llama: Open Foundation Models for Code | 47.60 | 2023-08-24 |
| Mistral 7B (3-shot) | Mistral 7B | 47.50 | 2023-10-10 |
| CodeGen 16B + MBR-Exec | Coder Reviewer Reranking for Code Generation | 47.30 | 2022-11-29 |
| StarCoder 15.5B (3-shot) | Teaching Large Language Models to Self-Debug | 47.20 | 2023-04-11 |
| PaLM Coder 540B | PaLM: Scaling Language Modeling with Pathways | 47.00 | 2022-04-05 |
| Code Llama 13B (3-shot) | Code Llama: Open Foundation Models for Code | 47.00 | 2023-08-24 |
| CodeGen 16B + Coder-Reviewer | Coder Reviewer Reranking for Code Generation | 46.20 | 2022-11-29 |
| DeepSeek-Coder-Base 1.3B (few-shot) | DeepSeek-Coder: When the Large Language Model Mee… | 46.20 | 2024-01-25 |
| GPT-3.5 Turbo (few-shot) | INTERVENOR: Prompting the Coding Ability of Large… | 45.40 | 2023-11-16 |
| Llama 2 70B (zero-shot) | Llama 2: Open Foundation and Fine-Tuned Chat Mode… | 45.00 | 2023-07-18 |
| Code Llama - Instruct 7B (3-shot) | Code Llama: Open Foundation Models for Code | 44.40 | 2023-08-24 |
| CodeGen 16B + Reviewer | Coder Reviewer Reranking for Code Generation | 44.10 | 2022-11-29 |
| phi-1.5-web 1.3B | Textbooks Are All You Need II: phi-1.5 technical … | 43.50 | 2023-09-11 |
| Branch-Train-Merge 4x7B (top-2) | Branch-Train-MiX: Mixing Expert LLMs into a Mixtu… | 42.60 | 2024-03-12 |
| Code Llama 7B (3-shot) | Code Llama: Open Foundation Models for Code | 41.40 | 2023-08-24 |
| Camelidae-8×34B (4-shot) | Parameter-Efficient Sparsity Crafting from Dense … | 41.40 | 2024-01-05 |
| GPT-3.5 Turbo (0-shot) | INTERVENOR: Prompting the Coding Ability of Large… | 39.80 | 2023-11-16 |
| Branch-Train-MiX 4x7B (sampling top-2 experts) | Branch-Train-MiX: Mixing Expert LLMs into a Mixtu… | 39.40 | 2024-03-12 |
| LLaMA 65B (0-shot) | LLaMA: Open and Efficient Foundation Language Mod… | 37.70 | 2023-02-27 |
| PaLM 540B | PaLM: Scaling Language Modeling with Pathways | 36.80 | 2022-04-05 |
| SantaCoder 1.1B | StarCoder: may the source be with you! | 35.00 | 2023-05-09 |
| InCoder 6.7B + CodeT | CodeT: Code Generation with Generated Tests | 34.40 | 2022-07-21 |
| Llama 2 34B (0-shot) | Llama 2: Open Foundation and Fine-Tuned Chat Mode… | 33.00 | 2023-07-18 |
| Llama 2 13B (0-shot) | Llama 2: Open Foundation and Fine-Tuned Chat Mode… | 30.60 | 2023-07-18 |
| LLaMA 33B (0-shot) | LLaMA: Open and Efficient Foundation Language Mod… | 30.20 | 2023-02-27 |
| InCoder 6.7B + MBR-Exec | Coder Reviewer Reranking for Code Generation | 26.70 | 2022-11-29 |
| InCoder 6.7B + Coder-Reviewer | Coder Reviewer Reranking for Code Generation | 26.10 | 2022-11-29 |
| InCoder 6.7B + Reviewer | Coder Reviewer Reranking for Code Generation | 24.40 | 2022-11-29 |
| CodeGeeX-13B | CodeGeeX: A Pre-Trained Model for Code Generation… | 24.40 | 2023-03-30 |
| LLaMA 13B (0-shot) | LLaMA: Open and Efficient Foundation Language Mod… | 22.00 | 2023-02-27 |
| Llama 2 7B (0-shot) | Llama 2: Open Foundation and Fine-Tuned Chat Mode… | 20.80 | 2023-07-18 |
| InCoder 6.7B (0-shot) | InCoder: A Generative Model for Code Infilling an… | 19.40 | 2022-04-12 |
| LLaMA 7B (0-shot) | LLaMA: Open and Efficient Foundation Language Mod… | 17.70 | 2023-02-27 |
| GPT-3.5 Turbo + FlowGenScrum + Test | SOEN-101: Code Generation by Emulating Software P… |  | 2024-03-23 |