Multi-Reference Multi-Source Evaluation Dataset for Chinese Grammatical Error Correction
MuCGEC is a multi-reference, multi-source evaluation dataset for Chinese Grammatical Error Correction (CGEC). It consists of 7,063 sentences collected from three Chinese-as-a-Second-Language (CSL) learner sources. Each sentence was corrected by three annotators, and their corrections were then reviewed by an expert, yielding an average of 2.3 references per sentence.
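A per-sentence average below three is what one would expect if identical corrections from different annotators are counted as a single reference. That merging step is an assumption here, not something stated above. The following minimal sketch shows the arithmetic on hypothetical toy data; the sentence IDs, correction labels, and function names are illustrative only and do not reflect the dataset's actual file format.

```python
from statistics import mean

# Hypothetical toy data: three annotator corrections per source sentence.
# Labels such as "A" and "B" stand in for full corrected sentences.
toy_annotations = {
    "s1": ["A", "A", "B"],  # two annotators agree -> 2 distinct references
    "s2": ["A", "B", "C"],  # all differ           -> 3 distinct references
    "s3": ["A", "A", "A"],  # all agree            -> 1 distinct reference
    "s4": ["A", "B", "C"],  # all differ           -> 3 distinct references
}


def distinct_references(corrections):
    """Drop duplicate corrections while keeping first-seen order (assumed merging rule)."""
    return list(dict.fromkeys(corrections))


def average_references(annotations):
    """Mean number of distinct references per source sentence."""
    return mean(len(distinct_references(c)) for c in annotations.values())


avg_refs = average_references(toy_annotations)
print(f"{avg_refs:.2f} references per sentence")
```

On this toy input the four sentences contribute 2, 3, 1, and 3 distinct references, so the average is 9 / 4 = 2.25.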
Variants: MuCGEC
This dataset is used in 1 benchmark:
| Task | Model | Paper | Date |
|---|---|---|---|
| Grammatical Error Correction | GEC-DI (LM+GED) | Improving Seq2Seq Grammatical Error Correction … | 2023-10-23 |