Turbulence

Code Generation Benchmark

Performance Over Time

Metric: CorrSc (5 results)

Rank	Model	Paper	CorrSc	Date	Code
1	GPT-4	Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code	0.85	2023-12-22	shahinhonarvar/turbulence-benchmark
2	GPT-3.5-Turbo	Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code	0.62	2023-12-22	shahinhonarvar/turbulence-benchmark
3	CodeLlama:13B-4bit-quantised	Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code	0.33	2023-12-22	shahinhonarvar/turbulence-benchmark
4	CodeLlama:7B-4bit-quantised	Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code	0.29	2023-12-22	shahinhonarvar/turbulence-benchmark
5	Command	Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code	0.06	2023-12-22	shahinhonarvar/turbulence-benchmark
