Metric: Inst-level loose-accuracy
| Rank | Model | Paper | Inst-level loose accuracy (%) | Date | Code |
|---|---|---|---|---|---|
| 1 | AutoIF (Llama3 70B) | Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models | 90.40 | 2024-06-19 | QwenLM/AutoIF |
| 2 | AutoIF (Qwen2 72B) | Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models | 88.00 | 2024-06-19 | QwenLM/AutoIF |
| 3 | GPT-4 | Instruction-Following Evaluation for Large Language Models | 85.37 | 2023-11-14 | google-research/google-research, deepseek-ai/deepseek-llm, josejg/instruction_following_eval, lightblue-tech/M-IFEval |
| 4 | PaLM 2 S | Instruction-Following Evaluation for Large Language Models | 59.11 | 2023-11-14 | google-research/google-research, deepseek-ai/deepseek-llm, josejg/instruction_following_eval, lightblue-tech/M-IFEval |