SyntaxGym, adapted for interventional interpretability.
Variants: CausalGym
This dataset is used in one benchmark.