SAFIM

Name: SAFIM
Published: 2024-03-07
License: CC-BY-4.0

Syntax-Aware Fill-in-the-Middle

Dataset Information

Modalities

Texts

Languages

English

Introduced

2024

License

CC-BY-4.0

Homepage

Official Website

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

Syntax-Aware Fill-in-the-Middle (SAFIM) is a benchmark for evaluating Large Language Models (LLMs) on the code Fill-in-the-Middle (FIM) task. It has three subtasks: Algorithmic Block Completion, Control-Flow Expression Completion, and API Function Call Completion. To minimize the impact of data contamination on evaluation results, SAFIM is sourced from code submitted between April 2022 and January 2023.
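To make the FIM setup concrete, here is a minimal Python sketch of masking a control-flow expression and building a prompt from the surrounding code. It is an illustration only, not SAFIM's prompt format or evaluation harness: the sentinel strings, the helper names (`split_at_span`, `build_psm_prompt`, `splice`), and the toy `absolute` function are all assumptions made for this example.

```python
# Hypothetical sentinel strings; real models define their own special tokens.
FIM_PREFIX = "<PRE>"
FIM_SUFFIX = "<SUF>"
FIM_MIDDLE = "<MID>"


def split_at_span(code: str, start: int, end: int) -> tuple[str, str, str]:
    """Split code into (prefix, middle, suffix) around the masked span."""
    return code[:start], code[start:end], code[end:]


def build_psm_prompt(prefix: str, suffix: str) -> str:
    """Prefix-suffix-middle layout: the model continues after FIM_MIDDLE."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def splice(prefix: str, completion: str, suffix: str) -> str:
    """Reassemble a full program from the context and a candidate completion."""
    return prefix + completion + suffix


# Toy program (not taken from the dataset); the masked span is the `if` condition.
source = "def absolute(x):\n    if x < 0:\n        return -x\n    return x\n"
condition = "x < 0"
start = source.index(condition)

prefix, middle, suffix = split_at_span(source, start, start + len(condition))
prompt = build_psm_prompt(prefix, suffix)

# A candidate completion is judged by splicing it back into the program.
candidate = splice(prefix, "x < 0", suffix)
```

In this sketch a completion would be checked by running the reassembled program against tests rather than by string match, so a semantically equivalent condition could also pass. Whether a given benchmark scores completions this way should be confirmed against its own documentation.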

Authors: Linyuan Gong, Sida Wang, Mostafa Elhoushi, Alvin Cheung
Paper: https://arxiv.org/abs/2403.04814
Hugging Face Dataset: https://huggingface.co/datasets/gonglinyuan/safim
Leaderboard: https://safimbenchmark.com
Code & Submission Instructions: https://github.com/gonglinyuan/safim

The SAFIM benchmark is partially derived from problem descriptions and code solutions from https://codeforces.com. Under the Codeforces license, you may publish the texts of Codeforces problems in any open sources, but you must preserve a direct link to the site.

Variants: SAFIM

Associated Benchmarks

This dataset is used in 1 benchmark:

Code Completion - Metrics: Average, Algorithmic, Control, API
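The listing does not say how the Average metric is derived from the three subtask metrics. A minimal sketch, assuming it is the unweighted mean of the Algorithmic, Control, and API scores; the function name and the numbers below are placeholders, not reported results.

```python
from statistics import mean


def average_score(algorithmic: float, control: float, api: float) -> float:
    """Unweighted mean of the three subtask scores (an assumption, see above)."""
    return mean([algorithmic, control, api])


# Placeholder values for illustration only; not taken from any leaderboard.
placeholder_scores = {"algorithmic": 40.0, "control": 50.0, "api": 60.0}
overall = average_score(**placeholder_scores)
```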

Recent Benchmark Submissions

Task	Model	Paper	Date
Code Completion	deepseek-coder-33b-base	Evaluation of LLMs on Syntax-Aware …	2024-03-07
Code Completion	deepseek-coder-6.7b-base	Evaluation of LLMs on Syntax-Aware …	2024-03-07
Code Completion	starcoderbase	Evaluation of LLMs on Syntax-Aware …	2024-03-07
Code Completion	gpt-4-1106-preview	Evaluation of LLMs on Syntax-Aware …	2024-03-07
Code Completion	CodeLlama-13b-hf	Evaluation of LLMs on Syntax-Aware …	2024-03-07
Code Completion	deepseek-coder-1.3b-base	Evaluation of LLMs on Syntax-Aware …	2024-03-07
Code Completion	CodeLlama-34b-hf	Evaluation of LLMs on Syntax-Aware …	2024-03-07
Code Completion	CodeLlama-7b-hf	Evaluation of LLMs on Syntax-Aware …	2024-03-07
Code Completion	gpt-3.5-turbo-0301	Evaluation of LLMs on Syntax-Aware …	2024-03-07
Code Completion	incoder-6B	Evaluation of LLMs on Syntax-Aware …	2024-03-07
Code Completion	codegen-16B-multi	Evaluation of LLMs on Syntax-Aware …	2024-03-07
Code Completion	codegen-2B-multi	Evaluation of LLMs on Syntax-Aware …	2024-03-07
Code Completion	incoder-1B	Evaluation of LLMs on Syntax-Aware …	2024-03-07
Code Completion	codegen-6B-multi	Evaluation of LLMs on Syntax-Aware …	2024-03-07
Code Completion	codegen-350M-multi	Evaluation of LLMs on Syntax-Aware …	2024-03-07

Research Papers

Recent papers with results on this dataset:

Evaluation of LLMs on Syntax-Aware Code Fill-in-the-Middle Tasks (2024)
