MT-Bench

Name: MT-Bench
Published: 2023-06-09
License: Unknown

Dataset Information

Modalities: Texts
Languages: English
Introduced: 2023

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

This dataset contains 3.3K expert-level pairwise human preferences over responses that 6 models generated for the 80 MT-Bench questions. The 6 models are GPT-4, GPT-3.5, Claude-v1, Vicuna-13B, Alpaca-13B, and LLaMA-13B. The annotators are mostly graduate students with expertise in the topic area of each question.
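As a rough illustration of how pairwise preferences like these can be summarized, the sketch below tallies a per-model win rate from a handful of made-up votes. The record layout (`model_a`, `model_b`, `winner` with values `model_a`, `model_b`, or `tie`) and the helper name `win_rates` are assumptions chosen for the example, not a schema confirmed by this page, and the toy votes are not taken from the dataset.

```python
from collections import defaultdict


def win_rates(votes):
    """Return each model's win rate over the non-tie comparisons it appears in.

    `votes` is an iterable of dicts with the assumed keys
    "model_a", "model_b", and "winner" ("model_a", "model_b", or "tie").
    """
    wins = defaultdict(int)
    games = defaultdict(int)
    for vote in votes:
        if vote["winner"] == "tie":
            continue  # ties count for neither side in this simple tally
        a, b = vote["model_a"], vote["model_b"]
        games[a] += 1
        games[b] += 1
        winner = a if vote["winner"] == "model_a" else b
        wins[winner] += 1
    return {model: wins[model] / games[model] for model in games}


# Toy votes, invented for illustration only.
toy_votes = [
    {"model_a": "gpt-4", "model_b": "vicuna-13b", "winner": "model_a"},
    {"model_a": "llama-13b", "model_b": "vicuna-13b", "winner": "model_b"},
    {"model_a": "gpt-4", "model_b": "llama-13b", "winner": "tie"},
    {"model_a": "vicuna-13b", "model_b": "gpt-4", "winner": "model_b"},
]

rates = win_rates(toy_votes)
for model, rate in sorted(rates.items()):
    print(f"{model}: {rate:.2f}")
```

Skipping ties is only one possible convention; counting a tie as half a win for each side is an equally reasonable choice and would change the numbers.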

Variants: MT-Bench

Associated Benchmarks

This dataset is used in 1 benchmark:

Text Generation - Metrics: score

Recent Benchmark Submissions

No recent benchmark submissions available for this dataset.

Research Papers

No papers with results on this dataset found.

External Links:

Papers with Code Entry
