Extended test cases for HumanEval, along with generated code.
Variants: HumanEval-ET
This dataset is used in 1 benchmark:
| Task | Model | Paper | Date |
|---|---|---|---|
| Code Generation | EG-CFG (DeepSeek-V3-0324) | Execution Guided Line-by-Line Code Generation | 2025-06-12 |
| Code Generation | LPW (GPT-4o) | Planning-Driven Programming: A Large Language … | 2024-11-21 |
Recent papers with results on this dataset: