APPS

Name: APPS
Published: 2021-05-20
License: Unknown

Automated Programming Progress Standard

Dataset Information

Modalities

Texts

Languages

English

Introduced

2021

License

Unknown

Homepage

Official Website

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

The APPS dataset consists of problems collected from open-access coding websites, including Codeforces and Kattis. The APPS benchmark attempts to mirror how human programmers are evaluated by posing coding problems in unrestricted natural language and evaluating the correctness of solutions. The problems range in difficulty from introductory to collegiate competition level and measure both coding ability and problem-solving.

The Automated Programming Progress Standard, abbreviated APPS, consists of 10,000 coding problems in total, with 131,836 test cases for checking solutions and 232,444 ground-truth solutions written by humans. Problem statements can be long: the average length is 293.2 words. The data are split evenly into training and test sets, with 5,000 problems each. In the test set, every problem has multiple test cases, and the average number of test cases per problem is 21.2. Each test case is designed specifically for its problem, which allows program functionality to be evaluated rigorously.
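To make the test-case evaluation described above concrete, here is a minimal sketch of checking one candidate program against a problem's test cases. It assumes each test case is a pair of standard-input text and expected standard-output text, and that candidates are Python programs; the function name `passes_all_tests` and the whitespace-insensitive comparison are illustrative choices, not the official APPS harness.

```python
import subprocess
import sys


def passes_all_tests(solution_code, test_cases, timeout=5.0):
    """Return True if the program produces the expected output on every test case.

    solution_code: Python source of the candidate solution (assumed format).
    test_cases: list of (stdin_text, expected_stdout) pairs (assumed format).
    """
    for stdin_text, expected in test_cases:
        try:
            result = subprocess.run(
                [sys.executable, "-c", solution_code],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False  # treat a timeout as a failed test
        if result.returncode != 0:
            return False  # runtime error or non-zero exit
        # Compare token by token so trailing whitespace does not matter.
        if result.stdout.split() != expected.split():
            return False
    return True


if __name__ == "__main__":
    candidate = "a, b = map(int, input().split())\nprint(a + b)"
    print(passes_all_tests(candidate, [("1 2\n", "3\n"), ("10 -4\n", "6\n")]))
```

Requiring every test case to pass is the strict criterion; a softer alternative would report the fraction of test cases passed per problem.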

Source: Measuring Coding Challenge Competence With APPS

Variants: APPS

Associated Benchmarks

This dataset is used in 1 benchmark:

Code Generation - Metrics: Pass@1; Introductory, Interview, and Competition variants of each of Pass@1, Pass@5, Pass@1000, and Pass@any
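For readers unfamiliar with Pass@k metrics, the sketch below shows one widely used way to estimate them: generate n samples per problem, count the c samples that pass all tests, and compute 1 - C(n-c, k) / C(n, k). This page does not state which estimator each submission used, so treat the choice as an assumption; the function name `pass_at_k` is illustrative.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate the probability that at least one of k samples is correct.

    n: total samples generated for the problem
    c: number of those samples that pass all test cases
    k: sample budget being scored (1 <= k <= n)
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) counts the k-subsets containing no correct sample;
    # it is 0 when fewer than k incorrect samples exist.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical numbers: 10 samples, 3 of them correct.
    print(pass_at_k(10, 3, 1))
    print(pass_at_k(10, 3, 5))
```

A benchmark-level score is then the mean of this per-problem value over all problems in a difficulty tier (Introductory, Interview, or Competition).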

Recent Benchmark Submissions

Task	Model	Paper	Date
Code Generation	CodeSim (GPT4)	CODESIM: Multi-Agent Code Generation and …	2025-02-08
Code Generation	LPW (GPT-4o)	Planning-Driven Programming: A Large Language …	2024-11-21
Code Generation	MapCoder APPS-150-cherrypicked (GPT-4)	MapCoder: Multi-Agent Code Generation for …	2024-05-18
Code Generation	deepseek-ai/deepseek-coder-6.7b-instruct	DeepSeek-Coder: When the Large Language …	2024-01-25
Code Generation	MoTCoder-32B-V1.5	MoTCoder: Elevating Large Language Models …	2023-12-26
Code Generation	MoTCoder-7B-V1.5	MoTCoder: Elevating Large Language Models …	2023-12-26
Code Generation	CodeChain+WizardCoder-15b	CodeChain: Towards Modular Code Generation …	2023-10-13
Code Generation	WizardCoder-15b	CodeChain: Towards Modular Code Generation …	2023-10-13
Code Generation	code-davinci-002 175B	CodeT: Code Generation with Generated …	2022-07-21
Code Generation	code-davinci-002 175B (CodeT)	CodeT: Code Generation with Generated …	2022-07-21
Code Generation	CodeRL+CodeT5	CodeRL: Mastering Code Generation through …	2022-07-05
Code Generation	GPT-J 6B (Finetuned)	CodeRL: Mastering Code Generation through …	2022-07-05
Code Generation	GPT-Neo 2.7B (Finetuned)	CodeRL: Mastering Code Generation through …	2022-07-05
Code Generation	GPT2 1.5B (Finetuned)	CodeRL: Mastering Code Generation through …	2022-07-05
Code Generation	AlphaCode 1B	Competition-Level Code Generation with AlphaCode	2022-02-08
Code Generation	AlphaCode 1B Filtered from 50000	Competition-Level Code Generation with AlphaCode	2022-02-08
Code Generation	Codex 12B (Raw)	Evaluating Large Language Models Trained …	2021-07-07
Code Generation	GPT-Neo 2.7B	Measuring Coding Challenge Competence With …	2021-05-20

Research Papers

Recent papers with results on this dataset:
