Homophones and Tonal Patterns in English-Chinese Transliteration

Oi Yee Kwong
Department of Chinese, Translation and Linguistics
City University of Hong Kong
Tat Chee Avenue, Kowloon, Hong Kong
Abstract

The abundance of homophones in Chinese significantly increases the number of similarly acceptable candidates in English-to-Chinese transliteration (E2C). Dialectal factors also lead to different transliteration practices. We compare E2C between Mandarin Chinese and Cantonese, and report work in progress on dealing with homophones and tonal patterns despite potentially skewed distributions of individual Chinese characters in the training data.

1 Introduction

This paper addresses the problem of automatic English-Chinese forward transliteration (referred to as E2C hereafter).

There are only a few hundred Chinese characters commonly used in names, but their combina[…]

[…] to overcome the problem and model the character choice directly. Meanwhile, Chinese is a typical tonal language, and tone information can help distinguish certain homophones. Phoneme mapping studies, however, seldom make use of tone information. Transliteration is also an open problem, as new names come up every day and there is no absolute or one-to-one transliterated version of any name. Although direct orthographic mapping has implicitly or partially modelled tone information via individual characters, such a model depends heavily on the availability of training data and can be skewed by the distribution of a particular homophone, thus precluding an acceptable transliteration alternative. We therefore propose to model sound and tone together in E2C. In this way we attempt to deal with homophones more reasonably, especially when the training data is limited.
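To make the homophone argument concrete, here is a toy Python sketch contrasting a purely character-level estimate with one that pools evidence by syllable and tone. It is an illustration under stated assumptions, not the model proposed in this paper: the five-character lexicon, its tone-numbered Hanyu Pinyin readings, the helper names (`char_level`, `tone_level`, `candidates`) and the counts used below are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon: character -> Hanyu Pinyin syllable with tone number.
PINYIN = {"斯": "si1", "思": "si1", "司": "si1", "史": "shi3", "士": "shi4"}


def char_level(counts):
    """Relative frequency of each observed character (direct orthographic mapping).

    Characters absent from the training counts get no score at all.
    """
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


def tone_level(counts, lexicon):
    """Pool character counts by syllable+tone, so homophones share one mass."""
    pooled = Counter()
    for ch, n in counts.items():
        pooled[lexicon[ch]] += n
    total = sum(pooled.values())
    return {syl: n / total for syl, n in pooled.items()}


def candidates(syl_probs, lexicon):
    """Give every character with an observed syllable+tone that syllable's score.

    Scores are not renormalised over characters; they only indicate which
    homophones remain admissible and how strongly their reading is supported.
    """
    by_syl = defaultdict(list)
    for ch, syl in lexicon.items():
        by_syl[syl].append(ch)
    return {ch: p for syl, p in syl_probs.items() for ch in by_syl[syl]}


if __name__ == "__main__":
    # Hypothetical counts for one English segment in a small training set.
    counts = {"斯": 8, "史": 2}
    print(char_level(counts))
    print(candidates(tone_level(counts, PINYIN), PINYIN))
```

With these hypothetical counts, `char_level` never proposes 思 or 司 because they were never observed. Pooling by syllable and tone lets them share the si1 score, while 士 (shi4) stays excluded because its tone differs from the observed 史 (shi3). Choosing among the admitted homophones would still need further evidence.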