A Translation Accuracy Challenge Set
ACES is a dataset covering 68 phenomena, ranging from simple perturbations at the word/character level to more complex errors based on discourse and real-world knowledge. It can be used to evaluate a wide range of machine translation metrics.
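As a rough illustration of how a challenge set of this kind might be used to evaluate a metric, the sketch below assumes each example pairs a good translation with an incorrect one and checks how often the metric prefers the good one. The record layout (`ContrastiveExample`), the function `kendall_tau_like`, the toy metric `token_overlap`, and the two example sentences are all hypothetical and are not taken from the dataset or its tooling. Counting ties as failures is a choice made for this sketch.

```python
from typing import Callable, Iterable, NamedTuple


class ContrastiveExample(NamedTuple):
    # Hypothetical record layout, not the dataset's actual schema.
    reference: str
    good_translation: str
    incorrect_translation: str


def kendall_tau_like(
    examples: Iterable[ContrastiveExample],
    metric: Callable[[str, str], float],
) -> float:
    """Return (concordant - discordant) / total, a value in [-1, 1].

    An example is concordant when the metric scores the good translation
    strictly higher than the incorrect one. Ties count as discordant here.
    """
    concordant = 0
    discordant = 0
    for ex in examples:
        good_score = metric(ex.good_translation, ex.reference)
        bad_score = metric(ex.incorrect_translation, ex.reference)
        if good_score > bad_score:
            concordant += 1
        else:
            discordant += 1
    total = concordant + discordant
    if total == 0:
        raise ValueError("no examples to evaluate")
    return (concordant - discordant) / total


def token_overlap(hypothesis: str, reference: str) -> float:
    """Toy surface metric: fraction of reference word types found in the hypothesis."""
    hyp = set(hypothesis.lower().split())
    ref = set(reference.lower().split())
    return len(hyp & ref) / len(ref) if ref else 0.0


# Made-up examples: a word-level substitution and a unit error.
examples = [
    ContrastiveExample(
        reference="the cat sat on the mat",
        good_translation="the cat sat on the mat",
        incorrect_translation="the dog sat on the mat",
    ),
    ContrastiveExample(
        reference="he paid 50 dollars",
        good_translation="he paid fifty dollars",
        incorrect_translation="he paid 50 euros",
    ),
]

score = kendall_tau_like(examples, token_overlap)
print(score)
```

In this toy run the surface metric catches the word substitution but cannot separate the unit error from a harmless paraphrase, which is the kind of weakness a challenge set is meant to expose.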
Variants: ACES
This dataset is used in 1 benchmark: