This dataset was collected to assess dialog evaluation metrics. In the paper "USR: An Unsupervised and Reference Free Evaluation Metric for Dialog" (Mehri and Eskenazi, 2020), the authors collected this data to measure the quality of several existing word-overlap and embedding-based metrics, as well as their newly proposed USR metric.
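The description above does not spell out how metric quality is measured. A common approach, assumed here, is to correlate each metric's scores with human quality ratings of the same responses. The following is a minimal sketch of a Spearman rank correlation in plain Python; the function names and the score values are hypothetical illustrations, not data from this dataset or code from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical scores for five responses (illustration only).
human_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
metric_scores = [0.10, 0.35, 0.30, 0.70, 0.90]

rho = spearman(human_scores, metric_scores)
print(f"Spearman correlation: {rho:.3f}")
```

Under this assumption, a metric whose scores rank responses more like the human annotators do receives a correlation closer to 1.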
Variants: USR-TopicalChat
This dataset is used in 1 benchmark:
| Task | Model | Paper | Date |
|---|---|---|---|
| Dialogue Evaluation | MDD-Eval | MDD-Eval: Self-Training on Augmented Data … | 2021-12-14 |
| Dialogue Evaluation | USR | USR: An Unsupervised and Reference … | 2020-05-01 |
| Dialogue Evaluation | USR - DR (x = c) | USR: An Unsupervised and Reference … | 2020-05-01 |
| Dialogue Evaluation | USR - MLM | USR: An Unsupervised and Reference … | 2020-05-01 |
| Dialogue Evaluation | USR - DR (x = f) | USR: An Unsupervised and Reference … | 2020-05-01 |