Discriminative Method for Japanese Kana-Kanji Input Method
Hiroyuki Tokunaga, Daisuke Okanohara
Preferred Infrastructure Inc.
2-40-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan
{tkng,hillbig}@preferred.jp

Shinsuke Mori
Kyoto University
Yoshidahonmachi, Sakyo-ku, Kyoto 606-8501, Japan
[email protected]

Abstract

The most popular type of input method in Japan is kana-kanji conversion: conversion from a string of kana to a mixed kanji-kana string. However, there has been no study using discriminative methods such as structured SVMs for kana-kanji conversion. One of the reasons is that learning a discriminative model from a large data set is often intractable. However, due to recent progress in machine learning research, large-scale learning of discriminative models has become feasible. In the present paper, we investigate whether discriminative methods such as structured SVMs can improve the accuracy of kana-kanji conversion. To the best of our knowledge, this is the first study comparing a generative model and a discriminative model for kana-kanji conversion. An experiment revealed that a discriminative method can improve the performance by approximately 3%.

1 Introduction

Kana-kanji conversion is conversion from kana characters to kanji characters, and is the most popular way of inputting Japanese text from keyboards. Generally, one kana string corresponds to many kanji strings, so proposing what the user wants to input is not trivial. Figure 1 shows how input keys are processed.

Two types of problems are encountered in kana-kanji conversion. One is how to reduce conversion errors, and the other is how to recover smoothly when a conversion failure has occurred. In the present study, we focus on the problem of reducing conversion errors.

Early kana-kanji conversion systems employed heuristic rules, such as maximum-length-word matching. With the growth of statistical natural language processing, data-driven methods were applied to kana-kanji conversion. Most existing studies on kana-kanji conversion have used probability models, especially generative models. Although generative models have some advantages, a number of studies on natural language tasks have shown that discriminative models tend to outperform generative models with respect to accuracy.

However, there have been no studies using only a discriminative model for kana-kanji conversion. One reason for this is that learning a discriminative model from a large data set is often intractable. However, due to progress in recent research on machine learning, large-scale learning of discriminative models has become feasible.

The present paper describes how to apply a discriminative model to kana-kanji conversion. The remainder of the present paper is organized as follows. In the next section, we present a brief description of Japanese text input and define the kana-kanji conversion problem. In Section 3, we describe related research. Section 4 provides the baseline method of the present study, i.e., kana-kanji conversion based on a probabilistic language model. In Section 5, we describe how to apply the discriminative method to kana-kanji conversion. Section 6 presents the experimental results, and conclusions are presented in Section 7.

2 Japanese Text Input and Kana-Kanji Conversion

Japanese text is composed of several scripts. The primary components are hiragana and katakana (we refer to them collectively as kana) and kanji.

The number of kana characters is less than a hundred. In many cases, we input romanized kana characters from the keyboard, and one of the tasks of a Japanese input method is transliterating them into kana.
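To make this transliteration step concrete, the following is a minimal sketch of a greedy longest-match romaji-to-kana converter. The table, names, and matching strategy are our own illustrative assumptions, not a description of any particular input method; a real conversion table has hundreds of entries.

# Toy romaji-to-kana table; a real input method also handles
# sokuon (small tsu), "n" disambiguation, punctuation, and so on.
ROMAJI_TABLE = {
    "ka": "か", "ga": "が", "ku": "く", "to": "と",
    "shu": "しゅ", "u": "う",
}
MAX_KEY_LEN = max(map(len, ROMAJI_TABLE))

def romaji_to_kana(s: str) -> str:
    out, i = [], 0
    while i < len(s):
        # Try the longest table key first (greedy longest match).
        for length in range(min(MAX_KEY_LEN, len(s) - i), 0, -1):
            chunk = s[i:i + length]
            if chunk in ROMAJI_TABLE:
                out.append(ROMAJI_TABLE[chunk])
                i += length
                break
        else:
            # No table entry matches: pass the character through unchanged.
            out.append(s[i])
            i += 1
    return "".join(out)

print(romaji_to_kana("kagakutogakushuu"))  # -> かがくとがくしゅう

The resulting kana string is the input shown in Figure 1; the remaining, much harder problem is converting it into the intended kanji string.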
The problem is how to input kanji. The number of kanji is more than ten thousand, and ordinary keyboards in Japan do not have a sufficient number of keys to enable the user to input such characters directly.

In order to address this problem, a number of methods have been proposed and investigated. Handwriting recognition systems, for example, have been successfully implemented. Another successful method is kana-kanji conversion, which is currently the most commonly used method for inputting Japanese text due to its fast input speed and the low initial cost of learning it.

[Figure 1: A graph constructed from an input string. The input is "ka ga ku to ga ku shu u" (romanized), i.e., かがくとがくしゅう (kana); candidate nodes between BOS and EOS include 科学, 火, 蛾, 蚊, が, 苦, と, 学習, 咎, and 終. Some nodes are omitted for simplicity.]

2.1 Kana-kanji conversion

Kana-kanji conversion is the problem of converting a given kana string into a mixed kanji-kana string. For simplicity, in the following we refer to a mixed kanji-kana string as a kanji string.

What should be noted here is that kana-kanji conversion is conversion from a kana sentence to a kanji sentence. One of the key points of kana-kanji conversion is that an entire sentence can be converted at once; this is why kana-kanji conversion achieves a high input speed.

Since the input string is a non-segmented sentence, every kana-kanji conversion system must be able to segment a kana sentence into words. This is not a trivial problem, and recent kana-kanji conversion software often employs statistical methods.

Although there is a measure of freedom with respect to the design of a kana-kanji conversion system, in the present paper we discuss kana-kanji conversion systems that comply with the following procedure (a minimal sketch of both steps is given after the list):

1. Construct a graph from the input string. We must first construct a graph that represents all possible conversions; an example of such a graph is given in Figure 1. We must use a dictionary in order to construct this graph. Since all edges are directed from the start node to the goal node, the graph is a directed acyclic graph (DAG).

2. Select the most likely path in the graph. This task is broken into two parts. The first is setting the cost of each vertex and edge; the cost is often determined by supervised learning methods. The second is finding the most likely path in the graph; the Viterbi algorithm is used for this task.
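Below is the promised sketch of both steps, using a toy dictionary and toy word costs. All names and numbers are our own illustrative assumptions; in a real system the costs would be negative log probabilities estimated as described in Section 2.2, and edge (bigram) costs would be used in addition to the vertex (word) costs used here.

from math import inf

# Toy dictionary from kana readings to candidate words, with toy costs.
DICT = {
    "かがく": ["科学", "化学"],
    "と": ["と", "戸"],
    "がくしゅう": ["学習"],
}
WORD_COST = {"科学": 1.0, "化学": 1.5, "と": 0.5, "戸": 2.0, "学習": 1.0}

def convert(kana: str) -> list[str]:
    n = len(kana)
    # Step 1: build the DAG. ends_at[j] lists (i, word) pairs such that
    # the dictionary word spans kana[i:j].
    ends_at = [[] for _ in range(n + 1)]
    for i in range(n):
        for j in range(i + 1, n + 1):
            for word in DICT.get(kana[i:j], []):
                ends_at[j].append((i, word))
    # Step 2: Viterbi search for the cheapest path from BOS (position 0)
    # to EOS (position n). best[j] holds (cost, back-pointer).
    best = [(inf, None)] * (n + 1)
    best[0] = (0.0, None)
    for j in range(1, n + 1):
        for i, word in ends_at[j]:
            cost = best[i][0] + WORD_COST[word]
            if cost < best[j][0]:
                best[j] = (cost, (i, word))
    # Recover the cheapest word sequence by following back-pointers.
    words, j = [], n
    while j > 0:
        i, word = best[j][1]
        words.append(word)
        j = i
    return words[::-1]

print(convert("かがくとがくしゅう"))  # -> ['科学', 'と', '学習']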
Formally, the task of kana-kanji conversion can be defined as follows. Let x be an input, an unsegmented sentence written in kana. Let y be a path, i.e., a sequence of words. When we write the jth word in y as y_j, y can be denoted as y = y_1 y_2 \ldots y_n, where n is the number of words in path y.

Let \mathcal{Y} be the set of candidate paths in the DAG built from the input sentence x. The goal is to select a correct path y^* from all candidate paths in \mathcal{Y}.

2.2 Parameter estimation with a probabilistic language model

If we define kana-kanji conversion as a graph search problem, all edges and vertices must have costs. There are two ways to estimate these parameters. One is to use a language model, an approach proposed in the 1990s, and the other is to use a discriminative model, as in the present study.

This subsection describes kana-kanji conversion based on probabilistic language models, treated as a baseline in the present paper. In this method, given a kana string, we calculate the probability of each conversion candidate and then present the candidate that has the highest probability.

We denote the probability of a conversion candidate y given x by P(y \mid x). Using Bayes' theorem, we can transform this expression as follows:

P(y \mid x) = \frac{P(x \mid y) P(y)}{P(x)} \propto P(y) P(x \mid y),

where P(y) is the language model and P(x \mid y) is the kana-kanji model.

The language model P(y) assigns a high probability to word sequences that are likely to occur. Word n-gram models or class n-gram models are often used in natural language processing. The n-gram model is defined as follows:

P(y) = \prod_j P(y_j \mid y_{j-1}, y_{j-2}, \ldots, y_{j-n+1}),

where n - 1 indicates the length of the history used to estimate the probability.

If we adopt a word bigram (n = 2) model, then P(y) is decomposed into the following form:

P(y) = \prod_j P(y_j \mid y_{j-1}).    (1)

Mori et al. (1999) proposed the following decomposition model for the kana-kanji model:

P(x \mid y) = \prod_j P(x_j \mid y_j),

where each P(x_j \mid y_j) is the fraction of times that the word y_j is read as x_j. Given that d(x_j, y_j) is the number of times word y_j is read as x_j, P(x_j \mid y_j) can be written as follows:

P(x_j \mid y_j) = \frac{d(x_j, y_j)}{\sum_i d(x_i, y_j)}.

2.3 Smoothing

In Eq. (1), if any P(y_j \mid y_{j-1}) is zero, then P(y) is also estimated to be zero. This means that P(y) is always zero whenever a word that does not appear in the training corpus appears in y. Since we use the language model to determine which sentence is more natural as a Japanese sentence, assigning zero probability to every such candidate does not help us. This is referred to as the zero frequency problem.

We apply smoothing to prevent this problem. In the present paper, we implemented both additive smoothing and interpolated Kneser-Ney smoothing (Chen and Goodman, 1998; Teh, 2006). Surprisingly, additive smoothing outperformed interpolated Kneser-Ney smoothing, so we adopt additive smoothing in the experiments.

3 Related Research

Mori et al.