HumanEvalPack is an extension of OpenAI's HumanEval benchmark that covers 6 languages (Python, JavaScript, Java, Go, C++, Rust) across 3 tasks (HumanEvalFix, HumanEvalExplain, HumanEvalSynthesize). The evaluation suite was fully created by humans.
Variants: HumanEvalPack
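The sketch below shows one way to load a single language split of HumanEvalPack. The Hugging Face hub ID `bigcode/humanevalpack`, the config name `python`, and the field names (`task_id`, `prompt`, `buggy_solution`, `test`) are assumptions based on the public release, not details taken from this page.

```python
from datasets import load_dataset

# Minimal sketch: load the Python split of HumanEvalPack.
# Hub ID and config name are assumptions based on the public release.
ds = load_dataset("bigcode/humanevalpack", "python", split="test")

sample = ds[0]
# Field names below are assumed from the public dataset card.
print(sample["task_id"])                # e.g. "Python/0"
print(sample["prompt"][:200])           # function signature + docstring
print(sample["buggy_solution"][:200])   # input for the HumanEvalFix (repair) task
print(sample["test"][:200])             # unit tests used to score candidate fixes
```

Each of the 6 languages is exposed as its own config, so the same pattern applies with, for example, `"rust"` or `"go"` in place of `"python"`.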
This dataset is used in 1 benchmark:
| Task | Model | Paper | Date |
|---|---|---|---|
| Program Repair | MGDebugger (DeepSeek-Coder-V2-Lite) | From Code to Correctness: Closing … | 2024-10-02 |