Showing 5 results | Metric: CorrSc
Rank | Model | Paper | CorrSc | Date | Code |
---|---|---|---|---|---|
1 | GPT-4 | Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code | 0.85 | 2023-12-22 | shahinhonarvar/turbulence-benchmark |
2 | GPT-3.5-Turbo | Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code | 0.62 | 2023-12-22 | shahinhonarvar/turbulence-benchmark |
3 | CodeLlama:13B-4bit-quantised | Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code | 0.33 | 2023-12-22 | shahinhonarvar/turbulence-benchmark |
4 | CodeLlama:7B-4bit-quantised | Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code | 0.29 | 2023-12-22 | shahinhonarvar/turbulence-benchmark |
5 | Command | Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code | 0.06 | 2023-12-22 | shahinhonarvar/turbulence-benchmark |