Instruction Following Evaluation Dataset
This dataset evaluates the instruction-following ability of large language models. It contains 500+ prompts, each carrying instructions such as "write an article with more than 800 words" or "wrap your response with double quotation marks".
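Instructions of this kind can be checked by a program rather than by a human judge. The following is a minimal sketch of what such checks might look like for the two instructions quoted above. The function names and the whitespace-based word count are assumptions made for illustration; this is not the dataset's official checking code.

```python
def has_more_than_n_words(response: str, n: int = 800) -> bool:
    """Hypothetical check: True if the response has more than n words.

    Words are counted by splitting on whitespace, which is an assumption;
    an official checker may tokenize differently.
    """
    return len(response.split()) > n


def is_wrapped_in_double_quotes(response: str) -> bool:
    """Hypothetical check: True if the whole response is wrapped in double quotes.

    Leading and trailing whitespace is ignored before checking.
    """
    text = response.strip()
    return len(text) >= 2 and text.startswith('"') and text.endswith('"')


def fraction_followed(responses: list[str], check) -> float:
    """Share of responses that pass a given check (0.0 for an empty list)."""
    if not responses:
        return 0.0
    return sum(1 for r in responses if check(r)) / len(responses)
```

A score for a model on one instruction type could then be computed as `fraction_followed(responses, is_wrapped_in_double_quotes)`, where `responses` is a list of model outputs collected for the matching prompts.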
Variants: IFEval
This dataset is used in 1 benchmark:
Task | Model | Paper | Date |
---|---|---|---|
Instruction Following | AutoIF (Llama3 70B) | Self-play with Execution Feedback: Improving … | 2024-06-19 |
Instruction Following | AutoIF (Qwen2 72B) | Self-play with Execution Feedback: Improving … | 2024-06-19 |
Instruction Following | GPT-4 | Instruction-Following Evaluation for Large Language … | 2023-11-14 |
Instruction Following | PaLM 2 S | Instruction-Following Evaluation for Large Language … | 2023-11-14 |
Recent papers with results on this dataset: