RES-Q: Evaluating Code-Editing Large Language Model Systems at the Repository Scale
RES-Q is a natural-language-instruction benchmark for evaluating **R**epository **E**diting **S**ystems, consisting of 100 handcrafted repository editing tasks derived from real GitHub commits. Given an edit instruction and a code repository, RES-Q evaluates an LLM system's ability to interpret the instruction, gather relevant information from the repository, and construct an appropriate edit.
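The snippet below is a rough Python sketch of the kind of evaluation loop this setup implies, not the official RES-Q harness: each task is assumed to provide an instruction, a repository path, and a test command (the field names and the `edit_system` callable here are hypothetical).

```python
# Hypothetical sketch of scoring one RES-Q-style task.
# Task fields ("instruction", "repo_path", "test_command") and the
# edit_system callable are illustrative assumptions, not the official API.
import subprocess
from pathlib import Path
from typing import Callable


def run_task(task: dict, edit_system: Callable[[str, Path], str]) -> bool:
    """Hand the instruction and repository to an editing system, apply the
    patch it returns, and report whether the task's checks pass afterward."""
    repo = Path(task["repo_path"])

    # The system under test interprets the instruction, gathers context
    # from the repository, and returns a unified diff with its proposed edit.
    patch = edit_system(task["instruction"], repo)

    # Apply the proposed edit to a working copy of the repository.
    apply = subprocess.run(["git", "apply", "-"], input=patch, text=True, cwd=repo)
    if apply.returncode != 0:
        return False  # a malformed or non-applying patch counts as a failure

    # The task passes when the repository's test command succeeds post-edit.
    tests = subprocess.run(task["test_command"], shell=True, cwd=repo)
    return tests.returncode == 0
```

A harness along these lines would run every task in a fresh repository checkout and report the fraction of tasks whose checks pass, which is how the systems in the table below can be compared.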
Variants: RES-Q
This dataset is used in 1 benchmark:
| Task | Model | Paper | Date |
|---|---|---|---|
| Code Generation | QurrentOS-coder + Claude 3.5 Sonnet | RES-Q: Evaluating Code-Editing Large Language … | 2024-06-24 |
| Code Generation | QurrentOS-coder + GPT-4o | RES-Q: Evaluating Code-Editing Large Language … | 2024-06-24 |
| Code Generation | QurrentOS-coder + GPT-4 Turbo | RES-Q: Evaluating Code-Editing Large Language … | 2024-06-24 |
| Code Generation | QurrentOS-coder + Claude 3 Opus | RES-Q: Evaluating Code-Editing Large Language … | 2024-06-24 |
| Code Generation | QurrentOS-coder + GPT-4 | RES-Q: Evaluating Code-Editing Large Language … | 2024-06-24 |
| Code Generation | QurrentOS-coder + Gemini 1.5 Pro | RES-Q: Evaluating Code-Editing Large Language … | 2024-06-24 |
| Code Generation | QurrentOS-coder + DeepSeek-Coder-V2 | RES-Q: Evaluating Code-Editing Large Language … | 2024-06-24 |
| Code Generation | QurrentOS-coder + Llama 3 70b | RES-Q: Evaluating Code-Editing Large Language … | 2024-06-24 |
| Code Generation | QurrentOS-coder + Qwen-72B-Instruct | RES-Q: Evaluating Code-Editing Large Language … | 2024-06-24 |