Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Romain Sauvestre, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve (Meta AI, 2023)
Code Llama introduces a family of large language models designed specifically for code generation and programming tasks, built on Llama 2. The models, released in three variants (Code Llama, Code Llama - Python, and Code Llama - Instruct), range from 7B to 70B parameters and support zero-shot instruction following and infilling, alongside long-context fine-tuning that allows input sequences of up to 100k tokens. The models achieve state-of-the-art results among open models on coding benchmarks, scoring up to 67% on HumanEval and 65% on MBPP. The training recipe specializes a pretrained foundation model on code, using a multitask objective that combines autoregressive and causal infilling prediction. Additionally, the models underwent instruction fine-tuning for improved safety and helpfulness, yielding stronger results on safety-related benchmarks such as TruthfulQA, ToxiGen, and BOLD. The paper argues that such specialized models are needed to optimize performance in complex coding settings, and it provides guidelines for responsible usage given the potential risks associated with code generation.
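The infilling objective mentioned above trains the model to predict a missing middle span given the code before and after it. A minimal sketch of how such a prefix-suffix-middle prompt might be assembled; the sentinel strings here are illustrative placeholders, not the exact special tokens used by the paper's tokenizer:

```python
def build_infill_prompt(prefix: str, suffix: str) -> str:
    """Assemble a prefix-suffix-middle (PSM) infilling prompt.

    The <PRE>/<SUF>/<MID> markers are hypothetical stand-ins; a real
    Code Llama tokenizer reserves dedicated special tokens for these roles.
    """
    return f"<PRE> {prefix} <SUF>{suffix} <MID>"

# The model generates the missing middle after the <MID> marker,
# conditioned on both the surrounding prefix and suffix.
prompt = build_infill_prompt(
    prefix="def add(a, b):\n    ",
    suffix="\n    return result",
)
```

At inference time, the text the model emits after the final marker is spliced between the prefix and suffix, which is what enables editor-style completion in the middle of a file rather than only at the end.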
This paper employs the following methods: code specialization of a pretrained Llama 2 foundation model, a multitask objective combining autoregressive and infilling prediction, long-context fine-tuning, and instruction fine-tuning.
The following datasets were used in this research: HumanEval and MBPP for code-generation evaluation, and TruthfulQA, ToxiGen, and BOLD for safety evaluation.
The authors identified the following limitations: code generation carries inherent risks, which motivates the paper's guidelines for responsible usage.