GSM-Plus is an adversarial grade-school math dataset created by perturbing the widely used GSM8K dataset. Motivated by the capability taxonomy for solving math problems in Polya's principles, the paper identifies 5 perspectives to guide the development of GSM-Plus.
GSM-Plus can be used to evaluate the robustness of current LLMs in mathematical reasoning.
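One simple way to quantify robustness is to compare a model's accuracy on the original GSM8K questions with its accuracy on the perturbed variants. The sketch below is illustrative only: the helper names `accuracy` and `relative_drop` and the correctness flags are hypothetical, and the relative-drop formula is an assumption made for this example, not necessarily the metric reported in the paper.

```python
def accuracy(flags):
    """Return the fraction of True entries (0.0 for an empty list)."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def relative_drop(seed_correct, perturbed_correct):
    """Relative accuracy drop from seed questions to perturbed variants.

    Assumed definition for illustration: (seed_acc - perturbed_acc) / seed_acc.
    """
    seed_acc = accuracy(seed_correct)
    perturbed_acc = accuracy(perturbed_correct)
    if seed_acc == 0.0:
        return 0.0  # no seed question solved, so a drop is not meaningful
    return (seed_acc - perturbed_acc) / seed_acc


# Hypothetical per-question results (True = model answered correctly).
seed_correct = [True, True, True, False]
perturbed_correct = [True, False, True, False, False, True, False, False]

drop = relative_drop(seed_correct, perturbed_correct)
print(f"seed accuracy:      {accuracy(seed_correct):.3f}")
print(f"perturbed accuracy: {accuracy(perturbed_correct):.3f}")
print(f"relative drop:      {drop:.3f}")
```

A larger relative drop suggests the model's answers on the seed questions depend more on surface form than on the underlying reasoning.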
Variants: GSM-Plus
This dataset is used in 1 benchmark:
Task | Model | Paper | Date |
---|---|---|---|
Math Word Problem Solving | GPT-4 | GSM-Plus: A Comprehensive Benchmark for … | 2024-02-29 |
Recent papers with results on this dataset: