IFEval

Name: IFEval
Published: 2023-11-14
License: Unknown

Instruction Following Evaluation Dataset

Dataset Information

Modalities

Texts

Languages

English

Introduced

2023

License

Unknown

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

This dataset evaluates the instruction-following ability of large language models. It contains 500+ prompts with instructions such as "write an article with more than 800 words" or "wrap your response with double quotation marks".
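Instructions like the two quoted above can be checked programmatically. The following is a minimal sketch of how such checks might look. The function names and the whitespace-based word count are assumptions made for illustration and are not taken from the dataset's own evaluation code.

```python
def has_more_than_n_words(response: str, n: int) -> bool:
    """Hypothetical check: the response has strictly more than n words.

    Words are counted by splitting on whitespace (an assumption).
    """
    return len(response.split()) > n


def is_wrapped_in_double_quotes(response: str) -> bool:
    """Hypothetical check: the response starts and ends with a double quote.

    Leading and trailing whitespace is ignored before checking.
    """
    text = response.strip()
    return len(text) >= 2 and text.startswith('"') and text.endswith('"')


# Example usage with a short response
reply = '"This is a short quoted reply."'
print(has_more_than_n_words(reply, 800))
print(is_wrapped_in_double_quotes(reply))
```

Each check returns a plain boolean, so the result for a prompt with several instructions can be stored as a list of pass/fail values.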

Variants: IFEval

Associated Benchmarks

This dataset is used in 1 benchmark:

Instruction Following - Metrics: Inst-level loose-accuracy, Inst-level strict-accuracy, Prompt-level loose-accuracy, Prompt-level strict-accuracy
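The prompt-level and instruction-level metrics aggregate the same pass/fail results in different ways. The sketch below assumes the usual reading: a prompt counts as correct only if all of its instructions are followed, while instruction-level accuracy pools every instruction across prompts. The function names and the sample data are hypothetical. The strict and loose variants differ in how each pass/fail value is decided, and the sketch takes those values as given input.

```python
from typing import List


def prompt_level_accuracy(results: List[List[bool]]) -> float:
    """Fraction of prompts for which every instruction was followed."""
    return sum(all(per_prompt) for per_prompt in results) / len(results)


def inst_level_accuracy(results: List[List[bool]]) -> float:
    """Fraction of individual instructions followed, pooled over all prompts."""
    flat = [ok for per_prompt in results for ok in per_prompt]
    return sum(flat) / len(flat)


# Hypothetical results: one inner list per prompt, one bool per instruction
results = [[True, True], [True, False], [True]]
print(prompt_level_accuracy(results))  # 2 of 3 prompts fully followed
print(inst_level_accuracy(results))    # 4 of 5 instructions followed
```

Because one failed instruction fails the whole prompt, prompt-level accuracy can never exceed instruction-level accuracy on the same results.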

Recent Benchmark Submissions

Task	Model	Paper	Date
Instruction Following	AutoIF (Llama3 70B)	Self-play with Execution Feedback: Improving …	2024-06-19
Instruction Following	AutoIF (Qwen2 72B)	Self-play with Execution Feedback: Improving …	2024-06-19
Instruction Following	GPT-4	Instruction-Following Evaluation for Large Language …	2023-11-14
Instruction Following	PaLM 2 S	Instruction-Following Evaluation for Large Language …	2023-11-14

Research Papers

Recent papers with results on this dataset:
