CPP

Chinese Polyphones with Pinyin

Dataset Information
Languages: Chinese
License: Unknown
Homepage

Overview

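CPP (Chinese Polyphones with Pinyin) is a benchmark dataset of more than 99,000 sentences for Chinese polyphone disambiguation, the task of choosing the correct pinyin reading for a character that has more than one pronunciation.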
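To make the task concrete, here is a minimal sketch of polyphone disambiguation scored by accuracy. It is a toy illustration only: the lexicon, the word-level rules, the example sentences, and the function names `disambiguate` and `accuracy` are all hypothetical. It reflects neither the CPP file format nor the method of any model listed on this page.

```python
# Toy illustration of polyphone disambiguation. Everything below is hypothetical:
# it is NOT the CPP data format and NOT how g2pM or g2pW work.

# Candidate pinyin readings (tone numbers) for two polyphonic characters.
READINGS = {"行": ["xing2", "hang2"], "长": ["chang2", "zhang3"]}

# Hand-written context rules: a known word fixes the reading of its character.
WORD_READINGS = {
    "银行": ("行", "hang2"),   # bank
    "行走": ("行", "xing2"),   # to walk
    "长大": ("长", "zhang3"),  # to grow up
    "长度": ("长", "chang2"),  # length
}


def disambiguate(sentence: str, index: int) -> str:
    """Return a pinyin reading for the polyphonic character at sentence[index]."""
    char = sentence[index]
    for word, (target, reading) in WORD_READINGS.items():
        if target != char:
            continue
        # Align the word so that its target character sits at `index`.
        start = index - word.index(target)
        if start >= 0 and sentence[start:start + len(word)] == word:
            return reading
    # No rule matched: fall back to the first listed reading.
    return READINGS[char][0]


def accuracy(examples) -> float:
    """Fraction of (sentence, index, gold_reading) triples predicted correctly."""
    correct = sum(disambiguate(s, i) == gold for s, i, gold in examples)
    return correct / len(examples)


examples = [
    ("我去银行取钱", 3, "hang2"),
    ("他在路上行走", 4, "xing2"),
    ("孩子长大了", 2, "zhang3"),
    ("这一行代码", 2, "hang2"),  # no rule covers this context, so the fallback is wrong
]

print(accuracy(examples))  # 0.75
```

The last example shows why fixed rules fall short: the right reading depends on context that a small lexicon does not cover, which is the gap that learned models evaluated on a benchmark like this aim to close.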

Source: g2pM: A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset

Variants: CPP

Associated Benchmarks

This dataset is used in 1 benchmark: polyphone disambiguation.

Recent Benchmark Submissions

Task                      Model          Paper                                            Date
Polyphone disambiguation  g2pW           g2pW: A Conditional Weighted Softmax …           2022-03-20
Polyphone disambiguation  g2pM (BERT)    g2pM: A Neural Grapheme-to-Phoneme Conversion …  2020-04-07
Polyphone disambiguation  g2pM (BiLSTM)  g2pM: A Neural Grapheme-to-Phoneme Conversion …  2020-04-07

Research Papers

Recent papers with results on this dataset: