IFEval

Instruction Following Evaluation Dataset

Dataset Information
Modalities: Texts
Languages: English
Introduced: 2023
License: Unknown

Overview

This dataset evaluates the instruction-following ability of large language models. It contains more than 500 prompts, each carrying instructions whose fulfillment can be checked directly from the response, such as "write an article with more than 800 words" or "wrap your response with double quotation marks".
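Instructions of this kind can be checked with simple deterministic code rather than human judgment. The Python sketch below illustrates the idea for the two example instructions quoted above. It is not the dataset's official checker: the function names (has_more_than_n_words, is_wrapped_in_double_quotes, follows_all) are hypothetical, and counting words by splitting on whitespace is an assumption.

```python
from typing import Callable, Iterable


def has_more_than_n_words(response: str, n: int) -> bool:
    """Return True if the response has more than n whitespace-separated words.

    Assumption: a "word" is any run of non-whitespace characters.
    """
    return len(response.split()) > n


def is_wrapped_in_double_quotes(response: str) -> bool:
    """Return True if the response, ignoring outer whitespace, starts and
    ends with a double quotation mark."""
    text = response.strip()
    return len(text) >= 2 and text.startswith('"') and text.endswith('"')


def follows_all(response: str, checks: Iterable[Callable[[str], bool]]) -> bool:
    """Return True only if the response passes every check (hypothetical helper)."""
    return all(check(response) for check in checks)


if __name__ == "__main__":
    response = '"' + "word " * 801 + '"'
    checks = [
        lambda r: has_more_than_n_words(r, 800),
        is_wrapped_in_double_quotes,
    ]
    print(follows_all(response, checks))
```

Keeping each check as a small pure function makes it easy to combine several instructions attached to one prompt and to report which individual instruction a response failed.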

Variants: IFEval

Associated Benchmarks

This dataset is used in 1 benchmark:

  • Instruction Following

Recent Benchmark Submissions

Task                   Model                Paper                                                     Date
Instruction Following  AutoIF (Llama3 70B)  Self-play with Execution Feedback: Improving …            2024-06-19
Instruction Following  AutoIF (Qwen2 72B)   Self-play with Execution Feedback: Improving …            2024-06-19
Instruction Following  GPT-4                Instruction-Following Evaluation for Large Language …     2023-11-14
Instruction Following  PaLM 2 S             Instruction-Following Evaluation for Large Language …     2023-11-14

Research Papers

Recent papers with results on this dataset: