Proper Name Machine Translation from Japanese to Japanese Sign Language
Taro Miyazaki, Naoto Kato, Seiki Inoue, Shuichi Umeda, Makiko Azuma, Nobuyuki Hiruma
NHK Science & Technology Research Laboratories, Tokyo, Japan
{miyazaki.t-jw, katou.n-ga, inoue.s-li, umeda.s-hg, azuma.m-ia, hiruma.n-dy}@nhk.or.jp

Yuji Nagashima
Faculty of Information, Kogakuin University, Tokyo, Japan
[email protected]

Language Technology for Closely Related Languages and Language Variants (LT4CloseLang), pages 67–75, October 29, 2014, Doha, Qatar. © 2014 Association for Computational Linguistics

Abstract

This paper describes machine translation of proper names from Japanese to Japanese Sign Language (JSL). "Proper name transliteration" is a kind of machine translation of proper names between spoken languages and involves character-to-character conversion based on pronunciation. However, transliteration methods cannot be applied to Japanese-JSL machine translation because proper names in JSL are composed of words rather than characters. Our method involves not only pronunciation-based translation but also sense-based translation, because kanji, which are ideograms that compose most Japanese proper names, are closely related to JSL words. These translation methods are trained from parallel corpora. The sense-based translation part is trained via phrase alignment in sentence pairs in a Japanese and JSL corpus. The pronunciation-based translation part is trained from a Japanese proper name corpus and then post-processed with transformation rules. We conducted a series of evaluation experiments and obtained an accuracy of 75.3%, 19.7 points higher than the baseline method. We also developed a Japanese-JSL proper name translation system, in which the translated proper names are visualized with CG animations.

1 Introduction

Sign language is a visual language in which sentences are created using the fingers, hands, head, face, and lips. For deaf people, sign language is easier to understand than spoken language because it is their mother tongue. To convey the meaning of sentences in spoken language to deaf people, the sentences need to be translated into sign language.

To provide more information with sign language, we have been studying machine translation from Japanese to Japanese Sign Language (JSL). As shown in Figure 1, our translation system automatically translates Japanese text into JSL computer graphics (CG) animations. The system consists of two major processes: text translation and CG synthesis. Text translation translates word sequences in Japanese into word sequences in JSL. CG synthesis generates seamless motion transitions between sign word motions by using a motion interpolation technique. To improve the machine translation system, we have been tackling several problems with translating into JSL. In this paper, we focus on the problem of proper name translation, because proper names occur frequently in TV news programs and are hard to translate with conventional methods.

Figure 1: Japanese-JSL translation system overview

Proper name translation is one of the major topics of machine translation. In particular, there are many methods that work with spoken language, such as "proper name transliteration," which means character-to-character conversion based on pronunciation (Knight et al., 1998; Goto et al., 2003; Virga et al., 2003; Li et al., 2004; Finch et al., 2010; Sudoh et al., 2013). However, transliteration methods cannot be applied to Japanese-JSL proper name translation because proper names in JSL are not composed of characters but rather of sign words. To translate proper names using sign words, sense-based translation is required. Sense-based translation translates kanji, which are ideograms that compose most Japanese proper names, into closely related JSL words. Moreover, although several methods have been proposed to translate sentences into sign language, there is as yet no method to translate proper names (Massó et al., 2010; San-Segundo et al., 2010; Morrissey, 2011; Stein et al., 2012; Mazzei, 2012; Lugaresi et al., 2013).

This paper describes proper name translation from Japanese into JSL. The method involves sense-based translation and pronunciation-based translation. Both conversions are based on a statistical machine translation framework. The sense-based translation is a character-wise translation learned from phrase pairs in a Japanese-JSL corpus. The pronunciation-based translation is a character-wise translation learned from a Japanese proper name corpus and is post-processed with transformation rules. We conducted a series of evaluation experiments and obtained good results. We also developed a proper name translation system from Japanese to JSL, in which the translated proper names are visualized with CG animations.

2 Proper Names in JSL

2.1 Types of proper name in JSL

In JSL, proper name representations are classified into four types, as follows.

Type 1: Sense-based case
Here, each character in a Japanese proper name is translated into a sign word in JSL. Most characters that make up Japanese proper names are kanji. Kanji are ideograms, i.e., each kanji represents a concept, so they can be translated into JSL words that carry the same concepts.
For example, in the Japanese place name "香川 (Kagawa)," the kanji "香 (aroma)" and "川 (river)" are respectively translated into the sign words "AROMA" [1] and "RIVER." Accordingly, the translation of "香川 (Kagawa)" is "AROMA / RIVER" in JSL.

Type 2: Pronunciation-based case
Here, the pronunciations of the kanji are transliterated into the Japanese kana alphabet. The kana are visualized by fingerspelling [2]. The transliteration in this case is not a spelling-based transformation from the source language because kanji are not phonograms [3].
For example, in the Japanese personal name "茂木" (Motegi, written in kana as "モテギ"), the two kanji "茂" and "木" are respectively transliterated into the kana "モテ (mote)" and "ギ (gi)." Each of the three kana, "モ (mo)," "テ (te)" and "ギ (gi)," is fingerspelled in JSL.

Type 3: Mixed case
This type combines Type 1 and Type 2. That is, some of the characters in the proper name are translated into sign words and the others are transliterated into kana and then visualized by fingerspelling. For example, in the Japanese place name "長野" (Nagano, written in kana as "ナガノ"), the kanji "長" is translated into the sign word "LONG" and "野" is transliterated into the kana "ノ (no)."

Type 4: Idiomatic case
These proper names are traditionally defined as fixed representations in JSL.

[1] The words in JSL are represented using capitalized English words. This notation method is called "glosses" in the sign language research community.
[2] All of the kana can be visualized by fingerspelling in JSL.
[3] For example, the character "木" is pronounced "ki," "gi," "moku," "boku," etc. Which pronunciation is used depends on context, meaning, or idiom.

2.2 Analysis of Proper Name Types in Corpora

To investigate the frequencies of these four types in corpora, we analyzed a geographical dictionary (JFD, 2009) of place names and our corpus (mentioned in section 4.2.1) of persons' names. Table 1 shows the results of the analysis.

Table 1: Analysis of proper name types

                Type 1   Type 2   Type 3   Type 4
Place name       43%       3%      10%      44%
Persons' name    60%      14%      21%      21%

Proper names of Types 1, 2 and 3 needed to be translated, while those of Type 4 needed to be registered in an idiomatic translation dictionary of proper names. Furthermore, the proper name translations of Types 1, 2 and 3 reduce to sense-based translations and/or pronunciation-based translations.

Our translation method performs sense-based translation and pronunciation-based translation on the basis of statistical machine translation (SMT) methods. The next section describes this method.

3 Our translation method

3.1 Sense-based translation

3.1.1 Basic method (baseline)

The sense-based translation uses SMT, and the translation probabilities (i.e., a lexicon model in SMT) are trained on our news corpus consisting of sentence pairs in Japanese and JSL. The basic method of training the lexicon model uses the corpus in a sentence-by-sentence manner (Figure 2-(a)). It segments the sentences into characters in Japanese and into words in JSL.

Sentence 1
JP   香川は朝から晴れるでしょう (It will be fine from the morning in Kagawa)
JSL  AROMA / RIVER / MORNING / FROM / FINE / DREAM

Sentence 2
JP   香/川/は/朝/か/ら/晴/れ/る/で/し/ょ/う (It will be fine from the morning in Kagawa)
JSL  AROMA / RIVER / MORNING / FROM / FINE / DREAM

We took the basic method above to be the baseline method for the evaluations.

3.1.2 Our method

Our method uses the corpus in a phrase-by-phrase manner. To use the phrase-segmented corpus, the method consists of two steps. The first step aligns Japanese phrases to JSL phrases in each of the sentence pairs in the corpus by using many-to-many word alignment. Using the results of the alignment, each sentence pair is divided into phrase pairs. The second step segments the phrases into characters in Japanese and trains the sense-based translation part on the phrase pairs (Figure 2-(b)).

Let us illustrate our method using Sentence 1. The first step is dividing a sentence into phrase pairs. We use alignment pairs, the result of the many-to-many word alignment, as the phrase pairs. The alignment pairs are combined into phrase pairs, as shown in Phrase 1 below.

Phrase 1
JP1   香川 / は (in Kagawa)
JSL1  AROMA / RIVER
JP2   朝 / から (from the morning)
JSL2  MORNING / FROM
JP3   晴れる / でしょ / う (it will be fine)
JSL3  FINE / DREAM
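
As a rough illustration of how sentence-level and phrase-level training pairs differ, the following Python sketch segments the Japanese side of the Kagawa example into characters and lists which JSL words each character co-occurs with. It is only a sketch under stated assumptions, not the authors' implementation: the helper names `to_char_pair` and `candidates` are hypothetical, the phrase pairs are copied by hand from Phrase 1 rather than produced by a word aligner, and plain co-occurrence stands in for the SMT lexicon model.

```python
# Toy data copied from the Kagawa example in the text.
sentence_pair = (
    "香川は朝から晴れるでしょう",
    ["AROMA", "RIVER", "MORNING", "FROM", "FINE", "DREAM"],
)

# Phrase pairs as in Phrase 1. In the paper these come from many-to-many
# word alignment; here they are written out by hand (assumption).
phrase_pairs = [
    ("香川は", ["AROMA", "RIVER"]),
    ("朝から", ["MORNING", "FROM"]),
    ("晴れるでしょう", ["FINE", "DREAM"]),
]


def to_char_pair(japanese, jsl_words):
    """Segment the Japanese side into characters; keep JSL as sign words."""
    return list(japanese), list(jsl_words)


def candidates(char, pairs):
    """Return the JSL words that co-occur with `char` in any training pair.

    Co-occurrence is a stand-in for the lexicon model (assumption): it only
    shows which translations the model could possibly learn for `char`.
    """
    found = []
    for japanese, jsl_words in pairs:
        chars, words = to_char_pair(japanese, jsl_words)
        if char in chars:
            found.extend(w for w in words if w not in found)
    return found


# Sentence-by-sentence (baseline): every JSL word in the sentence is a candidate.
baseline = candidates("香", [sentence_pair])
# Phrase-by-phrase: only the JSL words of the aligned phrase remain.
phrase_based = candidates("香", phrase_pairs)

print(baseline)      # ['AROMA', 'RIVER', 'MORNING', 'FROM', 'FINE', 'DREAM']
print(phrase_based)  # ['AROMA', 'RIVER']
```

In this toy setting the character "香" competes with six JSL words when whole sentences are used but with only two when phrase pairs are used, which suggests why restricting training to aligned phrases may give a sharper lexicon model.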