MBPP

Mostly Basic Python Problems

Dataset Information
Introduced: 2021
License: Unknown

Overview

The benchmark consists of around 1,000 crowd-sourced Python programming problems designed to be solvable by entry-level programmers. The problems cover programming fundamentals, standard library functionality, and similar introductory topics. Each problem consists of a task description, a reference code solution, and three automated test cases.
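To make the problem format concrete, here is a minimal sketch of how one problem and its test cases might be represented and checked. The `Problem` class, its field names, the `passes_all_tests` helper, and the sample task are all hypothetical illustrations, not the dataset's actual schema or one of its entries. The sketch assumes each test case is a bare `assert` statement.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    """Hypothetical container for the three parts of a problem."""
    description: str   # natural-language task description
    solution: str      # reference code solution
    tests: list[str]   # automated test cases, assumed to be assert statements


def passes_all_tests(code: str, tests: list[str]) -> bool:
    """Run `code`, then each test, in a fresh namespace; True if nothing raises."""
    namespace: dict = {}
    try:
        exec(code, namespace)        # defines the candidate function(s)
        for test in tests:
            exec(test, namespace)    # a failing assert raises AssertionError
    except Exception:
        return False
    return True


# Made-up example in the spirit of the benchmark, not an actual dataset entry.
example = Problem(
    description="Write a function to return the sum of the squares of a list of numbers.",
    solution="def sum_squares(nums):\n    return sum(n * n for n in nums)",
    tests=[
        "assert sum_squares([1, 2, 3]) == 14",
        "assert sum_squares([]) == 0",
        "assert sum_squares([-2]) == 4",
    ],
)

# The reference solution should pass; a deliberately wrong candidate should not.
reference_ok = passes_all_tests(example.solution, example.tests)
buggy_ok = passes_all_tests(
    "def sum_squares(nums):\n    return sum(nums)", example.tests
)
```

In this sketch a candidate counts as correct only if all of its test cases pass. Because `exec` runs arbitrary code, a real evaluation of model-generated solutions would normally run them in a sandboxed process with a timeout.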

Variants: MBPP

Associated Benchmarks

This dataset is used in 1 benchmark:

Recent Benchmark Submissions

Task | Model | Paper | Date
Code Generation | EG-CFG (DeepSeek Coder 1.3b Instruct) | Execution Guided Line-by-Line Code Generation | 2025-06-12
Code Generation | EG-CFG (DeepSeek-V3-0324) | Execution Guided Line-by-Line Code Generation | 2025-06-12
Code Generation | CodeSim (GPT4o) | CODESIM: Multi-Agent Code Generation and … | 2025-02-08
Code Generation | QualityFlow (Sonnet-3.5) | QualityFlow: An Agentic Workflow for … | 2025-01-20
Code Generation | LPW (GPT-4o) | Planning-Driven Programming: A Large Language … | 2024-11-21
Code Generation | AFlow(GPT-4o-mini) | AFlow: Automating Agentic Workflow Generation | 2024-10-14
Code Generation | MGDebugger (DeepSeek-V3-0324) | From Code to Correctness: Closing … | 2024-10-02
Code Generation | MGDebugger (CodeQwen1.5) | From Code to Correctness: Closing … | 2024-10-02
Code Generation | MapCoder (GPT-4o) | MapCoder: Multi-Agent Code Generation for … | 2024-05-18
Code Generation | MapCoder (GPT-4) | MapCoder: Multi-Agent Code Generation for … | 2024-05-18
Code Generation | o1-mini + MapCoder (Hamming.ai) | MapCoder: Multi-Agent Code Generation for … | 2024-05-18
Code Generation | GPT-3.5 Turbo + FlowGenScrum + Test | SOEN-101: Code Generation by Emulating … | 2024-03-23
Code Generation | Branch-Train-MiX 4x7B (sampling top-2 experts) | Branch-Train-MiX: Mixing Expert LLMs into … | 2024-03-12
Code Generation | Branch-Train-Merge 4x7B (top-2) | Branch-Train-MiX: Mixing Expert LLMs into … | 2024-03-12
Code Generation | StarCoder2-15B | StarCoder 2 and The Stack … | 2024-02-29
Code Generation | DeepSeek-Coder-Instruct 33B (few-shot) | DeepSeek-Coder: When the Large Language … | 2024-01-25
Code Generation | DeepSeek-Coder-Base 6.7B (few-shot) | DeepSeek-Coder: When the Large Language … | 2024-01-25
Code Generation | DeepSeek-Coder-Base 33B (few-shot) | DeepSeek-Coder: When the Large Language … | 2024-01-25
Code Generation | GPT-4 (few-shot) | DeepSeek-Coder: When the Large Language … | 2024-01-25
Code Generation | GPT-3.5 Turbo (few-shot) | DeepSeek-Coder: When the Large Language … | 2024-01-25

Research Papers

Recent papers with results on this dataset: