PECC

PECC: Problem Extraction and Coding Challenges

Dataset Information

Modalities

Texts

Languages

English

Introduced

2024

License

MIT

Homepage

Official Website

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

Recent advancements in large language models (LLMs) have showcased their exceptional abilities across various tasks, such as code generation, problem-solving, and reasoning. Existing benchmarks evaluate these tasks in isolation, yet the extent to which LLMs can understand prose-style tasks, identify the underlying problems, and then generate appropriate code solutions is still unexplored. Addressing this gap, we introduce PECC, a novel benchmark of 2,396 problems derived from Advent of Code (AoC) challenges and Project Euler. Unlike conventional benchmarks, PECC requires LLMs to interpret narrative-embedded problems, extract requirements, and generate executable code. A key feature of our dataset is the complexity added by natural language prompting in chat-based evaluations, mirroring real-world instruction ambiguities. Results show that model performance varies between narrative and neutral problems, and that the math-based Euler subset is particularly challenging: GPT-3.5-Turbo passes 50% of the AoC challenges but only 8% of the Euler problems. By probing the limits of LLMs' capabilities, our benchmark provides a framework to monitor and assess the subsequent progress of LLMs as universal problem solvers.

Variants: PECC

Associated Benchmarks

This dataset is used in 1 benchmark:

Code Generation - Metrics: Pass@3
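
This page does not spell out how Pass@3 is computed. As a rough sketch, assuming it means that a problem counts as solved when at least one of up to three attempts produces the correct answer, the score could be aggregated as below. The function names and the example outcomes are hypothetical and are not taken from the PECC code or results.

```python
from typing import Dict, List


def solved_within_k(attempts: List[bool], k: int = 3) -> bool:
    """Return True if any of the first k attempts passed."""
    return any(attempts[:k])


def pass_at_k(results: Dict[str, List[bool]], k: int = 3) -> float:
    """Fraction of problems solved within k attempts (assumed definition)."""
    if not results:
        return 0.0
    solved = sum(solved_within_k(attempts, k) for attempts in results.values())
    return solved / len(results)


# Hypothetical per-problem outcomes, one boolean per attempt (not real PECC data).
example = {
    "aoc-problem-a": [False, True],                   # solved on the second attempt
    "aoc-problem-b": [False, False, False],           # never solved
    "euler-problem-a": [True],                        # solved on the first attempt
    "euler-problem-b": [False, False, False, True],   # fourth attempt falls outside k=3
}

print(pass_at_k(example, k=3))  # 2 of 4 problems solved -> 0.5
```

Under this reading, attempts beyond the third are ignored, so a model that only succeeds on a fourth try receives no credit for that problem.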

Recent Benchmark Submissions

Task	Model	Paper	Date
Code Generation	Claude 3 Haiku	PECC: Problem Extraction and Coding …	2024-04-29
Code Generation	GPT-3.5 Turbo	PECC: Problem Extraction and Coding …	2024-04-29
Code Generation	codechat-bison	PECC: Problem Extraction and Coding …	2024-04-29
Code Generation	chat-bison	PECC: Problem Extraction and Coding …	2024-04-29
Code Generation	Mixtral-8x7B-Instruct	PECC: Problem Extraction and Coding …	2024-04-29
Code Generation	Phi-3-mini-128k-instruct	PECC: Problem Extraction and Coding …	2024-04-29
Code Generation	WizardLM-2-7B	PECC: Problem Extraction and Coding …	2024-04-29
Code Generation	Llama-3-8B-Instruct	PECC: Problem Extraction and Coding …	2024-04-29

Research Papers

Recent papers with results on this dataset:

PECC: Problem Extraction and Coding Challenges (2024)
