APPS

Automated Programming Progress Standard

Dataset Information
Modalities: Texts
Languages: English
Introduced: 2021
License: Unknown

Overview

The APPS dataset consists of problems collected from open-access coding websites, including Codeforces and Kattis. The APPS benchmark attempts to mirror how human programmers are evaluated: it poses coding problems in unrestricted natural language and evaluates the correctness of the submitted solutions. The problems range in difficulty from introductory to collegiate competition level and measure both coding ability and problem-solving.

The Automated Programming Progress Standard, abbreviated APPS, consists of 10,000 coding problems in total, with 131,836 test cases for checking solutions and 232,444 ground-truth solutions written by humans. Problems can be complicated, as the average length of a problem is 293.2 words. The data are split evenly into training and test sets, with 5,000 problems each. In the test set, every problem has multiple test cases, and the average number of test cases is 21.2. Each test case is specifically designed for the corresponding problem, enabling us to rigorously evaluate program functionality.
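As a rough illustration of how per-problem test cases can be used to check a candidate program, the sketch below runs a Python solution against (input, expected output) pairs and counts how many pass. It is a minimal sketch under stated assumptions, not the benchmark's official harness: the function names `run_candidate` and `evaluate`, the stdin/stdout problem format, the 5-second timeout, and the whitespace-insensitive output comparison are all illustrative choices that do not come from the text above.

```python
import subprocess
import sys
from typing import List, Optional, Tuple


def run_candidate(source: str, stdin_text: str, timeout: float = 5.0) -> Optional[str]:
    """Run candidate Python source in a fresh interpreter.

    Returns the captured stdout, or None if the program crashes or times out.
    (Hypothetical helper; the timeout value is an assumption.)
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", source],
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    if proc.returncode != 0:
        return None
    return proc.stdout


def evaluate(source: str, test_cases: List[Tuple[str, str]]) -> dict:
    """Check a candidate against (stdin, expected stdout) pairs.

    Outputs are compared token by token after splitting on whitespace,
    an assumed convention that ignores trailing newlines and spacing.
    """
    passed = 0
    for stdin_text, expected in test_cases:
        actual = run_candidate(source, stdin_text)
        if actual is not None and actual.split() == expected.split():
            passed += 1
    total = len(test_cases)
    return {
        "passed": passed,
        "total": total,
        # A solution counts as fully correct only if every test case passes.
        "all_passed": total > 0 and passed == total,
    }
```

Calling `evaluate` with a candidate's source string and a list of test pairs returns the pass counts for that problem. Because candidate programs are untrusted, a real evaluation would run them in an isolated sandbox rather than directly on the host as this sketch does.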

Source: Measuring Coding Challenge Competence With APPS


Variants: APPS

Associated Benchmarks

This dataset is used in 1 benchmark:

  • Code Generation

Recent Benchmark Submissions

Task | Model | Paper | Date
Code Generation | CodeSim (GPT4) | CODESIM: Multi-Agent Code Generation and … | 2025-02-08
Code Generation | LPW (GPT-4o) | Planning-Driven Programming: A Large Language … | 2024-11-21
Code Generation | MapCoder APPS-150-cherrypicked (GPT-4) | MapCoder: Multi-Agent Code Generation for … | 2024-05-18
Code Generation | deepseek-ai/deepseek-coder-6.7b-instruct | DeepSeek-Coder: When the Large Language … | 2024-01-25
Code Generation | MoTCoder-32B-V1.5 | MoTCoder: Elevating Large Language Models … | 2023-12-26
Code Generation | MoTCoder-7B-V1.5 | MoTCoder: Elevating Large Language Models … | 2023-12-26
Code Generation | CodeChain+WizardCoder-15b | CodeChain: Towards Modular Code Generation … | 2023-10-13
Code Generation | WizardCoder-15b | CodeChain: Towards Modular Code Generation … | 2023-10-13
Code Generation | code-davinci-002 175B | CodeT: Code Generation with Generated … | 2022-07-21
Code Generation | code-davinci-002 175B (CodeT) | CodeT: Code Generation with Generated … | 2022-07-21
Code Generation | CodeRL+CodeT5 | CodeRL: Mastering Code Generation through … | 2022-07-05
Code Generation | GPT-J 6B (Finetuned) | CodeRL: Mastering Code Generation through … | 2022-07-05
Code Generation | GPT-Neo 2.7B (Finetuned) | CodeRL: Mastering Code Generation through … | 2022-07-05
Code Generation | GPT2 1.5B (Finetuned) | CodeRL: Mastering Code Generation through … | 2022-07-05
Code Generation | AlphaCode 1B | Competition-Level Code Generation with AlphaCode | 2022-02-08
Code Generation | AlphaCode 1B Filtered from 50000 | Competition-Level Code Generation with AlphaCode | 2022-02-08
Code Generation | Codex 12B (Raw) | Evaluating Large Language Models Trained … | 2021-07-07
Code Generation | GPT-Neo 2.7B | Measuring Coding Challenge Competence With … | 2021-05-20

Research Papers

Recent papers with results on this dataset: