Discriminative Method for Japanese Kana-Kanji Input Method Hiroyuki Tokunaga Daisuke Okanohara Shinsuke Mori Preferred Infrastructure Inc. Kyoto University 2-40-1 Hongo, Bunkyo-ku, Tokyo Yoshidahonmachi, Sakyo-ku, Kyoto 113-0033, Japan 606-8501, Japan tkng,hillbig @preferred.jp
[email protected] { } Abstract Early kana-kanji conversion systems employed heuristic rules, such as maximum-length-word The most popular type of input method matching. in Japan is kana-kanji conversion, conver- With the growth of statistical natural language sion from a string of kana to a mixed kanji- processing, data-driven methods were applied for kana string. kana-kanji conversion. Most existing studies on However there is no study using discrim- kana-kanji conversion have used probability mod- inative methods like structured SVMs for els, especially generative models. Although gen- kana-kanji conversion. One of the reasons erative models have some advantages, a number is that learning a discriminative model of studies on natural language tasks have shown from a large data set is often intractable. that discriminative models tend to overcome gen- However, due to progress of recent re- erative models with respect to accuracy. searches, large scale learning of discrim- However, there have been no studies using only inative models become feasible in these a discriminative model for kana-kanji conversion. days. One reason for this is that learning a discriminative model from a large data set is often intractable. In the present paper, we investigate However, due to progress in recent research on whether discriminative methods such as machine learning, large-scale learning of discrim- structured SVMs can improve the accu- inative models has become feasible.