MuCGEC

Name: MuCGEC
Published: 2022-04-23
License: Apache-2.0

Multi-Reference Multi-Source Evaluation Dataset for Chinese Grammatical Error Correction

Dataset Information

Modalities

Texts

Languages

Chinese

Introduced

2022

License

Apache-2.0

Homepage

Official Website

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

MuCGEC is a multi-reference, multi-source evaluation dataset for Chinese Grammatical Error Correction (CGEC). It consists of 7,063 sentences collected from three different Chinese-as-a-Second-Language (CSL) learner sources. Each sentence was corrected by three annotators, and their corrections were then reviewed by an expert, yielding an average of 2.3 references per sentence.

Variants: MuCGEC

Associated Benchmarks

This dataset is used in 1 benchmark:

Grammatical Error Correction - Metrics: F0.5
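
As a rough illustration of the F0.5 metric on a multi-reference dataset, the sketch below computes precision, recall, and F-beta from edit counts, and picks the best-matching reference for each sentence. It is a minimal sketch under stated assumptions, not the official MuCGEC scorer: the `Edit` tuple format, the function names, the per-sentence best-reference rule, and the convention that an empty denominator yields a score of 1.0 are all assumptions made for illustration. How edits are extracted from Chinese text is not covered here.

```python
from typing import List, Set, Tuple

# Hypothetical edit representation: (start offset, end offset, replacement text).
Edit = Tuple[int, int, str]


def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta from edit counts; beta = 0.5 weights precision more than recall."""
    # Assumed convention: an empty denominator counts as a perfect score.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def edit_counts(hyp: Set[Edit], ref: Set[Edit]) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) for one reference."""
    return len(hyp & ref), len(hyp - ref), len(ref - hyp)


def best_reference_counts(hyp: Set[Edit], refs: List[Set[Edit]]) -> Tuple[int, int, int]:
    """Pick the counts of the reference that gives this sentence its highest F0.5."""
    return max((edit_counts(hyp, ref) for ref in refs), key=lambda c: f_beta(*c))


def corpus_f05(hyps: List[Set[Edit]], refs_per_sentence: List[List[Set[Edit]]]) -> float:
    """Accumulate best-reference counts over all sentences, then compute F0.5."""
    tp = fp = fn = 0
    for hyp, refs in zip(hyps, refs_per_sentence):
        s_tp, s_fp, s_fn = best_reference_counts(hyp, refs)
        tp, fp, fn = tp + s_tp, fp + s_fp, fn + s_fn
    return f_beta(tp, fp, fn)


if __name__ == "__main__":
    # Toy example: one sentence, two system edits, two alternative references.
    system = {(0, 1, "a"), (2, 3, "b")}
    references = [{(0, 1, "a")}, {(0, 1, "a"), (2, 3, "b"), (4, 5, "c")}]
    print(round(corpus_f05([system], [references]), 4))
```

Because beta is 0.5, a system that proposes few but correct edits scores higher than one that proposes many edits with the same number of hits, which matches the usual preference in grammatical error correction for avoiding unnecessary changes.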

Recent Benchmark Submissions

Task	Model	Paper	Date
Grammatical Error Correction	GEC-DI (LM+GED)	Improving Seq2Seq Grammatical Error Correction …	2023-10-23

Research Papers

Recent papers with results on this dataset:

Improving Seq2Seq Grammatical Error Correction via Decoding Interventions (2023)
