Word Order Does NOT Differ Significantly Between Chinese and Japanese

Chenchen Ding†‡•, Masao Utiyama†◦, Eiichiro Sumita†◦, Mikio Yamamoto‡•

†Multilingual Translation Laboratory, National Institute of Information and Communications Technology, 3-5 Hikaridai, Seikacho, Sorakugun, Kyoto, 619-0289, Japan
‡Department of Computer Science, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8573, Japan
•{tei@mibel., myama@}cs.tsukuba.ac.jp
◦{mutiyama, eiichiro.sumita}@nict.go.jp

Abstract

We propose a pre-reordering approach for Japanese-to-Chinese statistical machine translation (SMT). The approach uses dependency structure and manually designed reordering rules to arrange the morphemes of Japanese sentences into a Chinese-like word order before a baseline phrase-based (PB) SMT system is applied. Experimental results on the ASPEC-JC data show that the improvement of the proposed pre-reordering approach is slight on BLEU and moderate on RIBES, compared with the organizer's baseline PB SMT system. The approach also shows improvement in human evaluation. We observe that word order does not differ much between the two languages, even though Japanese is a subject-object-verb (SOV) language and Chinese is an SVO language.

1 Introduction

State-of-the-art techniques in statistical machine translation (SMT) (Koehn et al., 2003; Koehn et al., 2007) perform well when translating between languages with relatively similar word orders (Koehn, 2005). However, word reordering is a problematic issue for language pairs with significantly different word orders, such as translation between a subject-verb-object (SVO) language and a subject-object-verb (SOV) language (Isozaki et al., 2012).

To resolve the word reordering problem in SMT, one line of research handles word reordering as a separate pre-process, which is referred to as pre-reordering. In pre-reordering, the word order on the source side is arranged into the target-side word order before a standard SMT system is applied, in both the training and decoding phases.

An effective rule-based approach, head finalization, has been proposed for English-to-Japanese translation (Isozaki et al., 2012). The approach takes advantage of the head-final property of Japanese on the target side. It designs a head finalization rule that moves the head word based on the parsing result of a head-driven phrase structure grammar (HPSG) parser. Generally, the idea can be applied to other SVO-to-Japanese translation tasks, such as Chinese-to-Japanese translation (Dan et al., 2012).

However, head finalization cannot be applied to the reverse translation task, i.e., Japanese-to-SVO translation, which is a more difficult task. Specifically, Japanese-to-English translation has been studied, and several rule-based pre-reordering approaches have been proposed that take advantage of the characteristics of Japanese and English (Komachi et al., 2006; Katz-Brown and Collins, 2008; Sudoh et al., 2011; Hoshino et al., 2013; Ding et al., 2014). A comparison of these approaches is reported in Ding et al. (2014).

Because both Chinese and English are SVO languages, transferring Japanese-to-English approaches to Japanese-to-Chinese translation is a natural idea. Based on the framework of Ding et al. (2014), we propose dependency-based pre-reordering rules for Japanese-to-Chinese translation in this paper. Contrary to our expectations, the experimental results on the ASPEC-JC data show that rule-based pre-reordering does not improve Japanese-to-Chinese translation as significantly as it improves Japanese-to-English translation. In a further investigation, we find that a basic reason is that the word order is actually similar between Chinese and Japanese. Chinese and English, though both are SVO languages, have very different properties.

Proceedings of the 1st Workshop on Asian Translation (WAT2014), pages 77-82, Tokyo, Japan, 4th October 2014. © 2014 Copyright is held by the author(s).

  Japanese: 自分の妻さえ まだ 伴れて行った 事が ないのです
  Chinese: 连自己的妻子 也 没有 带去过
Figure 1: Japanese tends to use a formal noun (in the dashed-line box) to "wrap" a long clause.

  Japanese: 妻には 私の意味が 解らないのです
  Chinese: 妻子 不理解 我的意思
Figure 2: A nominative phrase in Japanese becomes an accusative phrase in Chinese.

2 Characteristics of Chinese and Japanese

Ding et al. (2014) proposed three kinds of rules (verb, noun, and copula rules) to arrange the order of Japanese chunks¹, with two further morpheme-level rules to achieve a more accurate pre-reordering. However, their system cannot be directly applied to Japanese-to-Chinese translation. Generally, Chinese shares many more characteristics with Japanese than English does, as listed below.

• Chinese is head-final in noun phrases, just as Japanese is.
• Chinese has no clear boundary between verbs and prepositions. Coverb and serial verb constructions are common in Chinese, which somewhat resembles Japanese.

Although it seems that pre-reordering for Japanese-to-Chinese may be easier than for Japanese-to-English, several factors make the case more complex, as listed below.

• Chinese has a strong tendency to avoid long attributes before nouns, while long attributes are acceptable in Japanese, especially for formal nouns (Fig. 1).
• Between Chinese and Japanese, transitive and intransitive verbs do not correspond in some cases. Essentially, the case frames of verbs are not well matched between the two languages (Fig. 2).

Because of these issues, pre-reordering cannot be conducted by looking only at specific Japanese functional morphemes, such as the various case-markers, which are often used in Japanese-to-English pre-reordering approaches. We design three new inter-chunk rules and modify the intra/extra-chunk rules of Ding et al. (2014).

¹ i.e., bunsetsu

3 Proposed Approach

We use three inter-chunk rules, classified by the head morpheme of the head chunk. They are the formal-noun rule, the stative-verb rule, and the dynamic-verb rule.

Formal-noun rule
Head-initialization is conducted. We only apply the rule to chunks with the formal noun koto as their head morpheme.

Stative-verb rule
The modifier chunk with the nominative case-marker ga is moved after the head chunk. We only apply the rule to chunks whose head morpheme is one of the verbs aru, iru, and dekiru.

Dynamic-verb rule
The following modifier chunks are moved after the head chunk.
• with an accusative case-marker wo
• with a quotation particle to
• with a formal noun as a head morpheme
We apply the rule to chunks whose head morpheme is a verb other than the three verbs used in the stative-verb rule.

We also use extra/intra-chunk rules to arrange certain functional morphemes.

Intra-chunk rule
Auxiliary verbs, except ta and copula morphemes, are moved before the head morpheme within a chunk.

Extra-chunk rule
Functional morphemes attached to nouns, except the genitive case-marker no and parallel markers, are moved before the range governed by the chunk.

  Original Japanese: 8種類の TEC分解菌を 純粋分離する ことが できた
  Pre-reordered Japanese: できた ことが 純粋分離する 8種類の TEC分解菌を (rules applied: stative-verb, formal-noun, dynamic-verb; two morphemes are marked as deleted)
  Chinese: 能够 纯粹分离 8种 TEC分解菌
Figure 3: Example of the three inter-chunk rules: formal-noun rule, stative-verb rule, and dynamic-verb rule. The alignment between the pre-reordered Japanese and the Chinese was made manually. With the three rules, the Japanese morphemes are arranged in a Chinese-like order.
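As a rough illustration of how the three inter-chunk rules of Section 3 interact, the following Python sketch applies them to the sentence in Figure 3. It is not the authors' implementation: the Chunk fields, the romanized lemmas, and the function names are assumptions made for this example, and the dependency tree is written by hand rather than produced by CaboCha. Morpheme deletion and the intra- and extra-chunk rules are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Verbs handled by the stative-verb rule (named in Section 3).
STATIVE_VERBS = {"aru", "iru", "dekiru"}


@dataclass
class Chunk:
    """One bunsetsu chunk. All fields are a hypothetical encoding for this sketch."""
    surface: str            # chunk text
    head: str               # romanized lemma of the head morpheme
    kind: str               # "verb", "formal_noun", or "other"
    marker: Optional[str]   # trailing particle: "ga", "wo", "to" (quotation only), ...
    parent: Optional[int]   # index of the chunk this one modifies; None for the root


def moves_after(mod: Chunk, head: Chunk) -> bool:
    """Decide whether a modifier chunk is placed after its head chunk."""
    # Formal-noun rule: head-initialization for chunks headed by koto.
    if head.kind == "formal_noun" and head.head == "koto":
        return True
    # Stative-verb rule: only the nominative (ga) modifier moves.
    if head.kind == "verb" and head.head in STATIVE_VERBS:
        return mod.marker == "ga"
    # Dynamic-verb rule: wo, quotation to, or a formal-noun modifier moves.
    if head.kind == "verb":
        return mod.marker in {"wo", "to"} or mod.kind == "formal_noun"
    return False


def reorder(chunks: List[Chunk]) -> List[str]:
    """Linearize the dependency tree, keeping every modifier subtree contiguous."""
    children: Dict[int, List[int]] = {i: [] for i in range(len(chunks))}
    root = 0
    for i, chunk in enumerate(chunks):
        if chunk.parent is None:
            root = i  # assumes exactly one root chunk
        else:
            children[chunk.parent].append(i)

    def linearize(i: int) -> List[str]:
        before: List[str] = []
        after: List[str] = []
        for m in children[i]:  # modifiers keep their original relative order
            target = after if moves_after(chunks[m], chunks[i]) else before
            target.extend(linearize(m))
        return before + [chunks[i].surface] + after

    return linearize(root)


# The Figure 3 sentence with a hand-written dependency tree.
figure3 = [
    Chunk("8種類の", "shurui", "other", "no", 1),
    Chunk("TEC分解菌を", "kin", "other", "wo", 2),
    Chunk("純粋分離する", "suru", "verb", None, 3),
    Chunk("ことが", "koto", "formal_noun", "ga", 4),
    Chunk("できた", "dekiru", "verb", None, None),
]
print(" ".join(reorder(figure3)))
# できた ことが 純粋分離する 8種類の TEC分解菌を
```

The recursion moves whole subtrees, so the genitive modifier 8種類の stays directly in front of TEC分解菌を while that phrase as a whole moves behind the verb, which gives the pre-reordered order shown in Figure 3.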
  Left example (intra-chunk and dynamic-verb rules; one morpheme is marked as deleted):
    Original Japanese: 今後の 対応を 急がねばならない
    Pre-reordered Japanese: ねばならない急が 今後の 対応を
    Chinese: 必须加快 今后的 对策
  Right example (extra-chunk rule):
    Original Japanese: 隣の ベッドの 枕の 上に
    Pre-reordered Japanese: に 隣の ベッドの 枕の 上
    Chinese: 在 相邻 病床的 枕头 上
Figure 4: Examples of intra- and extra-chunk rules. The left example shows an intra-chunk move within a verb chunk, leading to a more accurate reordering of Japanese functional morphemes, where the underlined parts are the corresponding translations. The right example shows an extra-chunk move. A Japanese post-positioned case-marker is moved to the left-most position of the range it governs. This move gives the Japanese postpositional phrase the same order as the Chinese prepositional phrase.

We further delete several Japanese functional morphemes that do not have exact Chinese translations. They are as follows.

• topic marker wa
• nominative case-marker ga
• accusative case-marker wo
• conjunctive particle te

We illustrate examples of our pre-reordering approach in Figs. 3 and 4.

4 Experiment

We tested our approach on the ASPEC-JC data (Nakazawa et al., 2014). For the source-side Japanese sentences, we used MeCab (IPA dictionary)² for morpheme analysis and CaboCha³ (Kudo and Matsumoto, 2002) for chunking and dependency parsing. We used the Stanford Chinese Word Segmenter⁴ (Tseng et al., 2005) with the Chinese Penn Treebank standard (CTB) to segment each Chinese sentence. We used the phrase-based (PB) translation system in Moses⁵ (Koehn et al., 2007) as a baseline SMT system. Word alignment was automatically generated by GIZA++⁶ (Och and Ney, 2003) with the default setting of Moses, and symmetrized by the grow-diag-final-and heuristics (Koehn et al., 2003). In phrase extraction, the max-phrase-length was 7, with the GoodTuring option in scoring.

² http://mecab.googlecode.com/svn/trunk/mecab/doc/index.html
³ https://code.google.com/p/cabocha/
⁴ http://www-nlp.stanford.edu/software/segmenter.shtml

... Chinese and Japanese is very high even without pre-reordering, which suggests that Japanese word order is much more similar to that of Chinese than to that of English. Then, we conducted more comparisons. We calculated the average τ of French and English, which are usually referred to as a language pair with similar word order, on Europarl V7 data⁸ (Koehn, 2005). As a result, the score is 0.898, which is