A Double Hidden HMM and a CRF for Segmentation Tasks with Pinyin’s Finals

Huixing Jiang, Zhe Dong
Center for Intelligence Science and Technology
Beijing University of Posts and Telecommunications
Beijing, China
[email protected] [email protected]

Abstract

We participated in the open and closed tracks on four corpora of the Chinese word segmentation tasks in the CIPS-SIGHAN-2010 Bake-offs. In our experiments, we used Chinese inner phonology information in all tracks. For the open tracks, we proposed a double-hidden-layer HMM (DHHMM) in which Chinese inner phonology information was used as one hidden layer and the BIO tags as the other hidden layer. N-best results were first generated with the DHHMM, and the best one was then selected using a new lexical statistic measure. For the closed tracks,

inner features of Chinese characters. It naturally contributes to the identification of Out-Of-Vocabulary (OOV) words.

In our work, Chinese phonology information is used as a basic feature of Chinese characters in all models. For the open tracks, we propose a new double-hidden-layer HMM in which phonology information is built in as a hidden layer, and a new lexical association is proposed to deal with the OOV problem and the domain adaptation problem. For the closed tracks, a CRF model is used, combined with Chinese inner phonology information. We used the CRF++ package, version 0.43, by Taku Kudo¹.

In the remaining sections of this paper, we first introduce Chinese phonology in Section 2.
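The two per-character layers mentioned here, phonology information and BIO tags, can be pictured with a small sketch. It is only an illustration under stated assumptions: the finals table is a hypothetical toy lookup rather than the lexicon used in the paper, the function names are invented for this example, and only the B and I tags are produced, since this section does not say how the O tag is assigned.

```python
# Hypothetical toy table mapping characters to pinyin finals
# (bei -> ei, jing -> ing, hen -> en, da -> a). A real system would
# use a full pronunciation lexicon; this one is for illustration only.
TOY_FINALS = {"北": "ei", "京": "ing", "很": "en", "大": "a"}


def bio_tags(words):
    """Return one tag per character: B starts a word, I continues it."""
    tags = []
    for word in words:
        if not word:  # skip empty tokens so no stray B tag is emitted
            continue
        tags.extend(["B"] + ["I"] * (len(word) - 1))
    return tags


def layered_view(words, finals=TOY_FINALS):
    """Pair each character with its pinyin final and its segmentation tag."""
    chars = [ch for word in words for ch in word]
    return [
        (ch, finals.get(ch, "UNK"), tag)
        for ch, tag in zip(chars, bio_tags(words))
    ]


if __name__ == "__main__":
    # A segmented sentence: "北京 / 很 / 大"
    for row in layered_view(["北京", "很", "大"]):
        print(row)
```

Each output row lists a character together with the two labels attached to it, which is the kind of per-character information a sequence model such as an HMM or a CRF can draw on.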