
Controlling Japanese Honorifics in English-to-Japanese Neural Machine Translation

Weston Feely*, Eva Hasler†, Adrià de Gispert†
SDL Research
*Los Angeles, CA, †Cambridge, UK
{wfeely,ehasler,agispert}@sdl.com

Abstract

In Japanese, different levels of honorific speech are used to convey respect, deference, humility, formality and social distance. In this paper, we present a method for controlling the level of formality of Japanese output in English-to-Japanese neural machine translation (NMT). By using heuristics to identify honorific verb forms, we classify Japanese sentences in parallel text as being one of three levels of speech: informal, polite, or formal. The English source side is marked with a token that identifies the level of honorific speech present in the Japanese target side. We use this parallel text to train an English-Japanese NMT model capable of producing Japanese translations in different honorific speech styles for the same English input sentence.

Japanese sentence and transcription | Formality
駅の近くにたくさんのお店がある。 (eki-no chikaku-ni takusan-no omise-ga aru) | Informal
駅の近くにたくさんのお店があります。 (eki-no chikaku-ni takusan-no omise-ga arimasu) | Polite
駅の近くにたくさんのお店がございます。 (eki-no chikaku-ni takusan-no omise-ga gozaimasu) | Formal

Table 1: Three sentences meaning "There are many shops near the train station", in different levels of formality.

1 Introduction

Languages differ in the way they express the same ideas depending on social context. In English, different words or phrases are used in a more casual or familiar context compared to a more formal context. In languages such as Japanese or Korean, formality distinctions are grammatically encoded using a system of honorifics. These honorifics are part of Japanese verbal morphology, which allows the same concept to be expressed in multiple levels of formality by altering the inflection of the main verb of the sentence. The examples in Table 1 show one sentence in three different levels of formality. In all three examples the meaning is the same, but the inflection of the main verb is different.

For example, when speaking with family, close friends, or others of equal social status, the informal ある (aru "there are") is used. When speaking to superiors, strangers, or older individuals, the polite expression あります (arimasu "there are") is used. When expressing deference or humility, the formal expression ございます (gozaimasu "there are") is used. In this paper we use the terms informal, polite, and formal to refer to these three levels of formality, as shown in Table 1. Traditional Japanese grammar may make finer-grained, more nuanced distinctions than this.

It is important to note that these formality distinctions in Japanese are not optional. All sentences must use one verb inflection or another, so speakers are always making a choice of what level of formality to use depending on social context. While Japanese grammatically encodes this nuance, English does not, so when translating from English into Japanese, a translator must choose one level of formality or another. This poses a challenge for English-Japanese NMT, since for a translation to be adequate it needs to both capture the meaning of the source sentence and use the appropriate level of formality.

We propose a method to allow English-Japanese NMT to produce translations in a particular level of formality, using an additional feature on the source side marking the desired level of formality to be used in the translation. With this feature provided at both training and test time, a single NMT system can learn to distinguish these levels of formality and produce multiple translations for the same input sentence. We evaluate our approach on multiple data sets and show that it successfully produces sentences in the requested level of formality.

Proceedings of the 6th Workshop on Asian Translation, pages 45–53, Hong Kong, China, November 4, 2019. © 2019 Association for Computational Linguistics.

English Source | The number at the bottom of the list drops off.
Modified English Source | <polite> The number at the bottom of the list drops off.
Japanese Target | リストの一番下にある番号がリストから削除されます。 (risuto-no ichiban shita-ni aru bangō-ga risuto kara sakujo saremasu)

Table 2: Attaching a single token to the beginning of an English training data source sentence, based on the predicted formality of the Japanese target side.

Apart from yielding more consistent outputs, it improves general translation quality as measured by BLEU on all data sets. We see particularly strong gains on the polite and formal portions of the test sets. We also release the following resources that were developed as part of our work towards formality-aware NMT:

• A set of manual formality labels for a portion of the Tanaka corpus
• Code for a rule-based formality converter which can be applied as a translation post-processing step

We hope that these resources will spur further research on translation into Japanese.

2 Formality-Aware NMT

This section describes our approach for creating a formality-aware English-Japanese NMT system.

2.1 Choosing Formality in Translation

Our proposed method starts with identifying the formality of every Japanese target sentence in our parallel training corpus. We can determine that the Japanese sentence is informal, polite, or formal based on the inflection of the main verb of the sentence, which is often the last word in the sentence. For example, in Table 2 the suffix ます (masu) at the end of the Japanese target sentence is a common politeness marker that identifies this as a polite sentence. This is particular to Japanese grammar, and from the English source sentence alone we cannot determine what level of formality the Japanese translation should have. So to inform our English-Japanese NMT system what formality level we are translating into, we attach the token <polite> to the beginning of the English source sentence. For every sentence pair in our training corpus, we attach such a token to the beginning of the English source side, depending on the formality of the Japanese target side.

At test time the resulting English-Japanese NMT model will need to be provided the same kind of informal, polite, or formal tokens at the beginning of every English input sentence to be translated. This allows the user of the NMT system to choose which level of formality they would like their Japanese translation to use. There are applications where these labels could be determined automatically from the context; we leave this for future work, as our current data sets do not have context beyond the sentence level.

2.2 Automatic Identification of Honorifics

In order to label our training and test data with these formality tokens, we need to be able to identify the formality of a Japanese sentence automatically. To do this we look for the presence or absence of certain Japanese honorific verb forms as a heuristic. We created a set of common verbs and verbal inflections that correspond to each formality level, such as the informal expression じゃなかった (janakatta "was not"); the suffixes です (desu) and ます (masu), which attach to verb stems to express politeness; as well as several honorific and humble verbs such as なさいます (nasaimasu "to do", honorific) and 致します (itashimasu "to do", humble), which are used in formal social contexts to either show respect to the listener or show humility from the speaker, respectively. The full set of verb forms can be found in Table 3.

We apply our heuristics to a 21 million sentence Japanese monolingual corpus, composed of web-crawled text from multiple domains. We categorize sentences into three classes, which we label informal, polite, or formal, by looking for the verb forms in Table 3. We start with the formal verb forms. If any of these verbs are present, we consider the sentence to be formal; if not, then we proceed to looking for the polite verb forms, then the informal verb forms. If none of the verb forms in Table 3 are present in the sentence, it is ignored.
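The cascade just described (check formal forms first, then polite, then informal) can be sketched in a few lines. This is an illustrative sketch, not the authors' released code: it matches only a small subset of the Table 3 forms by substring, whereas the real heuristics operate on KyTea-tokenized text, and the `<label>` token format is an assumption.

```python
# Small subset of the Table 3 verb forms, for illustration only.
FORMAL_FORMS = ["ございます", "いらっしゃいます", "致します"]
POLITE_FORMS = ["です", "ます", "ました"]
INFORMAL_FORMS = ["だ", "だった", "じゃない"]

def classify_formality(ja_sentence):
    """Return 'formal', 'polite', or 'informal', or None if nothing matches.

    Formal forms are checked first, then polite, then informal, so a formal
    verb like gozaimasu is not mislabeled polite just because it ends in masu.
    """
    for label, forms in (("formal", FORMAL_FORMS),
                         ("polite", POLITE_FORMS),
                         ("informal", INFORMAL_FORMS)):
        if any(form in ja_sentence for form in forms):
            return label
    return None  # uncategorized sentences are ignored

def tag_source(en_sentence, ja_sentence):
    """Prepend the target-side formality token to the English source.

    The "<label>" surface form of the token is an assumption here; any
    reserved token the NMT vocabulary can learn would serve the same role.
    """
    label = classify_formality(ja_sentence)
    if label is None:
        return None
    return "<{}> {}".format(label, en_sentence)
```

Applied to the Table 1 sentences, the sketch labels the aru/arimasu/gozaimasu variants informal, polite, and formal respectively.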

Formality | Verb forms
Informal | だ (da), だった (datta), じゃない (janai), じゃなかった (janakatta), だろう (darou), だから (dakara), だけど (dakedo), だって (datte), だっけ (dakke), そうだ (souda), ようだ (youda)
Polite | です (desu), でした (deshita), ない (nai), なかった (nakatta), ます (masu), ました (mashita), ません (masen), ましょう (mashou), でしょう (deshou), ください (kudasai), なさい (nasai), である (dearu), ではない (dewanai)
Formal | ございます (gozaimasu), いらっしゃいます (irasshaimasu), おります (orimasu), なさいます (nasaimasu), 致します (itashimasu), ご覧になります (goranninarimasu), 拝見します (haikenshimasu), お目に掛かります (omenikakarimasu), おいでになります (oideninarimasu), 伺います (ukagaimasu), 参ります (mairimasu), 存知します (zonjishimasu), 存じ上げます (zonjiagemasu), 召し上がります (meshiagarimasu), 頂く (itadaku), 頂きます (itadakimasu), 頂いて (itadaite), 差しあげます (sashiagemasu), 下さいます (kudasaimasu), おっしゃいます (osshaimasu), 申し上げます (moushiagemasu)

Table 3: Common verbs and suffixes for each level of formality, used as identifying heuristics.

From the original 21 million sentences, 1 million were unable to be categorized by our heuristics. We hypothesize that a text classifier trained on the resulting 20 million sentences selected by our heuristics will learn more nuanced distinctions in word choice and style than the heuristics alone, which only identify a small set of verb forms. We tokenize this data set with the KyTea morphological analyzer (Neubig et al., 2011a) and train a model on the tokenized monolingual data and labels with the text classification tools provided by the FastText toolkit (Joulin et al., 2017), using word trigram features.

To evaluate our classifier's performance, we enlisted the help of a Japanese linguist to make formality judgments on a small test set of 150 Japanese sentences drawn from the publicly-available Tanaka corpus (Tanaka, 2001). Out of these 150 sentences, 68 were labeled informal, 45 polite, and 37 formal by the annotator. These sentences and annotations will be made publicly available alongside the publication of this paper.

 | Informal | Polite | Formal
Precision | 1.00 | 0.82 | 0.72
Recall | 0.74 | 0.91 | 0.97
F1 | 0.85 | 0.86 | 0.83

Table 4: Evaluation scores of labels produced by the formality classifier compared to gold test set labels for each formality category (n=150).

Table 4 contains a precision-recall evaluation of our formality classifier, showing strong F1 scores for all three classes. The formality classifier output matches the exact classifications made by our heuristic rules on this test set, but we hypothesize that it generalizes better to unseen text and therefore use it in our translation experiments. The results show that our classifier has higher precision but lower recall on the informal category, and higher recall but lower precision on the polite and formal categories.

2.3 Rule-Based Formality Conversion

We also compare our method of formality-aware NMT with a simple rule-based tool which converts a Japanese sentence from one level of formality to another. This is done by identifying the main verb in a Japanese sentence and either replacing the verb itself or just the verbal inflection with the inflection for the desired level of formality. The code will be made available open-source alongside this publication.

Rule-based formality conversion is non-trivial, since there are many conjugations to consider for a single verb, which differ based on the class the verb belongs to.
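To make the suffix swapping concrete, here is a toy sketch of polite-past to plain-past conversion. The rules below are hypothetical illustrations keyed on the kana preceding -mashita; the released tool additionally relies on KyTea tokenization, part-of-speech tags, and a verb-class dictionary.

```python
# Toy polite-to-informal conversion rules for past-tense verbs.
# Longer (verb-class-specific) patterns are listed before the generic one.
PAST_RULES = [
    ("きました", "いた"),  # aruki-mashita -> arui-ta
    ("ぎました", "いだ"),  # oyogi-mashita -> oyoi-da
    ("しました", "した"),  # hanashi-mashita -> hanashi-ta
    ("ちました", "った"),  # machi-mashita -> mat-ta
    ("みました", "んだ"),  # nomi-mashita -> non-da
    ("びました", "んだ"),  # asobi-mashita -> ason-da
    ("にました", "んだ"),  # shini-mashita -> shin-da
    ("りました", "った"),  # tori-mashita -> tot-ta
    ("いました", "った"),  # kai-mashita -> kat-ta
    ("ました", "た"),      # ru-verbs: tabe-mashita -> tabe-ta
]

def polite_past_to_informal(verb):
    """Convert a polite past-tense verb to its informal (plain) past form."""
    for polite_suffix, plain_suffix in PAST_RULES:
        if verb.endswith(polite_suffix):
            return verb[: -len(polite_suffix)] + plain_suffix
    return verb  # no rule applies; leave the verb unchanged
```

The prose example that follows, 歩きました (arukimashita) becoming 歩いた (aruita), corresponds to the first rule above.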

For example, to convert the verb 歩きました (arukimashita, "walked", polite) to an informal inflection, the polite suffix ました (mashita) is removed from the stem of the verb and a new suffix is appended to create 歩いた (aruita). The き (ki) at the end of the verb stem marks this as a verb in a particular verb class. All verbs with ki at the end of their stem belong to the same class and have the same conjugation pattern.

In order to compare this rule-based method to our English-Japanese formality-aware NMT, we can simply take our baseline NMT system, trained without the formality tokens described above in Section 2.1, and apply the rules to convert the NMT output into the desired level of formality. However, this rule-based method is imperfect and relies on tokenization and part-of-speech information from the KyTea morphological analyzer. Incorrect part-of-speech tags, or tokenization that does not match our rule-based tool's dictionary, will lead to errors in changing verbal inflection. In our evaluation, we show how using this rule-based method compares to our formality-aware NMT.

3 Evaluation

In this section, we evaluate the translation quality of our formality-aware NMT models as well as their ability to produce the desired formality level in the output.

3.1 Datasets

We use three publicly-available parallel data sets for our NMT experiments: the Asian Scientific Paper Excerpt Corpus (ASPEC) (Nakazawa et al., 2016), a corpus of scientific paper abstracts; the Japanese-English Subtitle Corpus (JESC) (Pryzant et al., 2018), a corpus of sentence-aligned movie and television subtitles; and the Kyoto Free Translation Task (KFTT) (Neubig, 2011b), a corpus of Wikipedia data about the city of Kyoto. In our experiments we use the standard training and test sets for each parallel corpus. We also use a proprietary parallel training data set, which contains web-crawled data from a mix of domains, and a corresponding test set. Training and test set sizes are reported in Table 5.

Dataset | Train | Test
Proprietary | 23,781,990 | 300
ASPEC | 1,000,000 | 1,812
JESC | 3,237,376 | 2,001
KFTT | 329,882 | 1,160

Table 5: Parallel training and test data set sizes in number of sentences.

3.2 Experimental Setup

The English source of each bitext was tokenized with the Moses (Koehn et al., 2007) tokenizer.perl script, and the Japanese target was tokenized with KyTea. We limit sentence length to 80 tokens on either side of the bitext and train a Moses truecaser for the English source side, except for the JESC data set, because the English side of the JESC corpus is already entirely lowercased. We use separate 32k subword unit vocabularies (Sennrich et al., 2016b) for source and target. We use the resulting tokenized, truecased, subworded training data to train our NMT models.

Our experiments use the Transformer (Vaswani et al., 2018) to train NMT models. We use 512 hidden units, 6 hidden layers, 8 attention heads, and a batch size of 4096. We train for 200k training steps using the Adam optimizer (Kingma and Ba, 2015). For each parallel corpus, we train a formality-aware NMT model by classifying the formality of the Japanese target side and attaching a corresponding feature to the beginning of each English source segment, identifying the target as being informal, polite, or formal. For comparison, we also train a baseline NMT model without these formality annotations.

3.3 Experimental Results

To evaluate our formality-aware NMT models, we first need to choose the right level of formality for each sentence in the test sets. We do this by applying the formality classifier to the test reference and prepending the predicted labels to the source side of each test sentence. We then provide this input to our formality-aware NMT models and compare the output to test set translations from our baseline NMT using BLEU (Papineni et al., 2002). We tokenize the Japanese MT output and reference using KyTea before computing BLEU. We evaluate on the overall test set, as well as on each separate portion of the test set where the test reference was classified as informal, polite, or formal. Table 6 shows our results on the test set using BLEU.
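The test-time preparation just described, classify the reference and prepend its label to the source, can be sketched as follows. The classifier interface and the `<label>` token format are assumptions for illustration; the stub classifier below stands in for the FastText model.

```python
def prepare_test_input(en_source, ja_reference, classify):
    """Label the reference translation, then prepend that label to the
    English source before decoding (sketch of the procedure above)."""
    label = classify(ja_reference)
    return "<{}> {}".format(label, en_source)

# Stub classifier standing in for the trained FastText model
# (illustration only; it keys on two common polite suffixes).
def stub_classifier(ja_sentence):
    if "ます" in ja_sentence or "ません" in ja_sentence:
        return "polite"
    return "informal"

tagged = prepare_test_input("There's nothing to apologize for.",
                            "謝ることはありません", stub_classifier)
```

Here `tagged` carries the polite label, so the formality-aware model is asked for a polite translation of the input sentence.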

3.3.1 Performance of Rule-Based Conversion

We first evaluate the performance of the rule-based conversion method described in Section 2.3. The rule-based tool currently has the capability to convert to informal or polite verbal inflections, but lacks rules for formal verb inflections; thus, we only report results on the informal and polite sections of our test sets.

As shown in Table 6, the rule-based method yields improvements on the informal test portion for all models except ASPEC, where performance remains the same. On the polite portion, we only see gains for the JESC model, but a notable decrease in performance on ASPEC and no changes for the proprietary and KFTT test sets. This shows that while it is possible to adjust the formality level through post-processing, it is a non-trivial task and will require more work to improve the coverage of the tool. However, the rule-based tool could also be used for other tasks, such as creating additional synthetic training data.

3.3.2 Performance of Formality-Aware NMT

The BLEU scores in Table 6 show that on the overall test set, our formality-aware NMT models show an improvement over the baseline NMT models. This holds true both for the model trained on the proprietary training data set and for the models trained on the publicly-available training data sets. Out of the models trained on publicly-available data, the ASPEC model shows the smallest improvement (+0.3 BLEU), the KFTT model improves more (+0.9 BLEU), and the JESC model shows the highest improvement (+1.5 BLEU).

When looking at the individual portions of the test set, as identified by our classifier, we see a larger quality improvement for the model trained on proprietary data on the informal and formal sections of its test set, and a smaller improvement on the polite section. The ASPEC formality-aware NMT is not better on the informal section of its test set, but there are larger gains in quality on the polite and formal test sections. The JESC and KFTT models improve on all three sections, with the largest gains seen in the formal section. Finally, formality-aware NMT improves over the rule-based method for all models and test sections, indicating that the NMT model is more effective at producing the desired formality level in context.

3.3.3 Evaluating Formality Levels

Since choosing the appropriate formality level in Japanese is very important to conform with social norms, we want to show that our formality-aware NMT models can provide translations in the desired level of formality. As our test sets do not have gold labels from a human annotator for each reference, we use our formality classifier to predict the level of formality for both the MT output and the test reference, and compute F1 scores using the predicted reference labels.

Our F1 comparison in Table 7 shows to what extent the formality-aware NMT output matches the predicted formality level of the reference translation when the system is provided with the correct input label. The F1 scores for the formality-aware NMT are high for all three levels of formality, above 0.9 in all cases except the formal portion of KFTT. We also see a big improvement over the baseline NMT models for each test set, especially in the polite and formal categories. From this we conclude that our formality-aware NMT models can produce a translation in the desired level of formality.

An imbalance of the training data may partly explain the difference in quality improvement across the three formality sections of the test sets. Table 8 shows how much of the training data for each data set was classified as being informal, polite, or formal. The proprietary data set contains mostly polite and informal data. In contrast, the majority of each of the three publicly available data sets is informal data, with a much smaller portion of polite data. For all data sets there is very little formal data, leading to the weak baseline performance on that category. By modelling formality levels more explicitly, our models are better able to compensate for the inherent bias towards informal style.

4 Analysis and Examples

To show some concrete examples of our formality-aware translations, Table 9 contains an example of the MT output from the JESC formality-aware NMT model and the corresponding JESC NMT baseline trained without formality annotations. For this single English source sentence, there are multiple different MT outputs depending on which formality label is attached to the source before passing it to the NMT model for translation. The informal expression ない (nai "there is not") is used in the MT output by both the baseline model and the formality-aware NMT model when 'informal' is attached to the source segment.

Dataset | Model | Overall | Informal | Polite | Formal
Proprietary | Baseline NMT | 24.6 | 17.8 | 28.0 | 17.4
Proprietary | Rule-Based Conversion | - | 18.3 | 28.0 | -
Proprietary | Formality-Aware NMT | 25.5 | 18.7 | 28.4 | 22.3
ASPEC | Baseline NMT | 43.0 | 42.8 | 45.7 | 33.0
ASPEC | Rule-Based Conversion | - | 42.8 | 44.5 | -
ASPEC | Formality-Aware NMT | 43.3 | 42.8 | 47.1 | 43.9
JESC | Baseline NMT | 18.8 | 18.0 | 19.4 | 22.1
JESC | Rule-Based Conversion | - | 18.7 | 20.6 | -
JESC | Formality-Aware NMT | 20.3 | 19.4 | 21.9 | 29.3
KFTT | Baseline NMT | 24.6 | 26.0 | 18.1 | 11.4
KFTT | Rule-Based Conversion | - | 26.4 | 18.1 | -
KFTT | Formality-Aware NMT | 25.5 | 26.5 | 20.7 | 19.3

Table 6: KyTea-tokenized BLEU, comparing baseline NMT, rule-based conversion, and formality-aware NMT models on the held-out test sets.

Dataset | Model | Informal | Polite | Formal
Internal | Baseline NMT | 0.59 | 0.82 | 0.29
Internal | Formality-Aware NMT | 0.97 | 0.99 | 0.91
ASPEC | Baseline NMT | 0.95 | 0.67 | 0.19
ASPEC | Formality-Aware NMT | 1.00 | 1.00 | 0.96
JESC | Baseline NMT | 0.85 | 0.50 | 0.28
JESC | Formality-Aware NMT | 1.00 | 0.99 | 0.96
KFTT | Baseline NMT | 0.93 | 0.41 | 0.00
KFTT | Formality-Aware NMT | 1.00 | 0.96 | 0.74

Table 7: F1 scores for each formality category when comparing predicted labels for MT output and reference translation.
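The per-class scores reported in Table 4 and Table 7 can be computed along the following lines. This is a plain-Python sketch assuming the gold (or predicted reference) labels and the system's labels are available as parallel lists.

```python
def prf1(gold, pred, label):
    """Precision, recall, and F1 for one class, treating `label` as positive.

    `gold` and `pred` are parallel lists of formality labels such as
    "informal", "polite", "formal".
    """
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For Table 7, `gold` would hold the classifier's labels for the reference translations and `pred` its labels for the MT output.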

The polite expression ありません (arimasen "there is not") is used by the formality-aware NMT model when 'polite' is attached, and the formal expression ございません (gozaimasen "there is not") is used by the formality-aware NMT model when 'formal' is attached. All of these expressions have the same meaning, but correspond correctly to the desired level of formality. Here the formality-aware NMT model is closest to the reference when 'informal' is attached, but all three formality-aware NMT translations are equally adequate.

Table 10 shows another example of the MT output, from the internal formality-aware NMT model and the corresponding internal NMT baseline trained without formality annotations. Again there are multiple different MT outputs depending on which formality label is attached to the source. The informal expression 戦う (tatakau "to fight") is used in the MT output by the formality-aware NMT model when 'informal' is attached to the source segment. The polite expression 戦います (tatakaimasu "to fight") is used by the baseline NMT model and by the formality-aware NMT model when 'polite' is attached, and the formal expression 戦いをいたします (tatakai-wo itashimasu "to do battle", humble) is used by the formality-aware NMT model when 'formal' is attached. All of these expressions have the same meaning, but correspond correctly to the desired level of formality.

Train | Informal | Polite | Formal
Internal | 41.5% | 56.9% | 1.6%
ASPEC | 87.6% | 11.2% | 1.2%
JESC | 80.7% | 18.3% | 0.9%
KFTT | 90.9% | 8.4% | 0.7%

Table 8: Percentage of each training data set classified as informal, polite, or formal.
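The percentages in Table 8 amount to counting the classifier's outputs over each training set's target side; a minimal sketch:

```python
from collections import Counter

def label_distribution(labels):
    """Return the percentage share of each formality label in `labels`."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}
```

Run over the classifier's labels for, say, the JESC training targets, this would reproduce the corresponding row of Table 8.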

50 Source There’s nothing to apologize for. 謝ることなんかなにもないさ Reference ayamaru koto nanka nanimo nai sa NMT Model MT Output and Transcription 謝ることなんて何もない Baseline ayamaru koto nante nanimo nai 別に謝ることはない Formality-Aware - Informal betsu ni ayamaru koto wa nai 謝ることはありません Formality-Aware - Polite ayamaru koto wa arimasen お詫びのことは何もございません Formality-Aware - Formal owabi no koto wa nanimo gozaimasen

Table 9: Example output from the JESC NMT baseline model and formality-aware NMT model, when each formality level is attached to the source segment.

5 Related Work

Sennrich et al. (2016a) showed that side constraints can be added to the source side of a parallel text to provide control over the politeness of translation output in an English-German translation task. Following this paper's suggestion, we take a similar approach towards Japanese honorifics.

Niu et al. (2017) also use a similar approach, termed "Formality-Sensitive Machine Translation", in a French-English translation task. In Niu et al. (2018), French-English parallel text with formality features is combined with English-English parallel text, where the source and target are of similar meaning but different formality, to create a multi-task model that performs both formality-sensitive MT and monolingual formality transfer.

In related work on Japanese-English NMT, Yamagishi et al. (2016) use a side-constraint approach to control the voice (active or passive) of an English translation. Takeno et al. (2017) apply side constraints more broadly to control translation length, bidirectional decoding, domain adaptation, and unaligned target word generation.

Our paper follows the modeling approach introduced by Johnson et al. (2017), who showed that adding a token to the source side of parallel text allows for training a single NMT model on data for multiple language pairs. Their token specifies the desired target language, allowing the user control over the language of the machine translation output, even for source-target language pairs that were not seen during training, which they call "zero-shot" translation. The same approach has been successfully used in other applications, such as in distinguishing standard versus back-translated parallel corpora (Caswell et al., 2019).

6 Conclusion

We have shown how the distinctions between levels of formality in the Japanese language can be learned by an NMT model, by identifying Japanese honorifics in parallel training data and labeling the source side with an additional feature. We find that this technique provides control over the honorifics present in the MT output and provides an improvement in translation quality, particularly in polite and formal sentences in each test set. This improvement holds for models trained on proprietary data as well as for models trained on three widely-used publicly available Japanese data sets. In future work, we would like to explore augmenting the training data for each of the comparisons we showed. We would like to explore creating artificial English-Japanese data by doing a rule-based transformation of the Japanese side of the bitext into different formality levels. We would also like to do further human evaluation of our Japanese formality classifier and the NMT models we trained, and we may explore applying this technique to English-Korean NMT, because Korean also has a similar system of honorifics.

References

Isaac Caswell, Ciprian Chelba, and David Grangier. 2019. Tagged back-translation. In Proceedings of the Fourth Conference on Machine Translation, pages 53–63, Florence, Italy. Association for Computational Linguistics.

51 Source King Arthur’s knights do battle with a killer rabbit. 円卓の騎士たちはキラーラビットと戦う。 Reference entaku-no kishitachi-wa kira¯ rabitto-to tatakau NMT Model MT Output and Transcription アーサー王の騎士はキラーウサギと戦います。 Baseline as¯ a¯ o-no¯ kishitachi-wa kira¯ usagi-to tatakaimasu キングアーサーの騎士たちはキラーウサギと戦う。 Formality-Aware - Informal kingu as¯ a-no¯ kishitachi-wa kira¯ usagi-to tatakau キングアーサーの騎士たちはキラーウサギと戦います。 Formality-Aware - Polite kingu as¯ a-no¯ kishitachi-wa kira¯ usagi-to tatakaimasu キングアーサーの騎士たちはキラーウサギと戦いをいたします。 Formality-Aware - Formal kingu as¯ a-no¯ kishitachi-wa kira¯ usagi-to tatakai-wo itashimasu

Table 10: Example output from internal NMT baseline model and formality-aware NMT model, when each for- mality level is attached to the source segment.

Graham Neubig. 2011b. The Kyoto free translation task.

Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, et al. 2017. Google's multilingual neural machine translation system: Enabling zero-shot translation. Transactions of the Association for Computational Linguistics, 5:339–351.

Graham Neubig, Yosuke Nakata, and Shinsuke Mori. 2011a. Pointwise prediction for robust, adaptable Japanese morphological analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 529–533. Association for Computational Linguistics.

Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomáš Mikolov. 2017. Bag of tricks for efficient text classification. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 427–431. Association for Computational Linguistics.

Diederik P. Kingma and Jimmy Lei Ba. 2015. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations.

Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions, pages 177–180. Association for Computational Linguistics.

Xing Niu, Marianna Martindale, and Marine Carpuat. 2017. A study of style in machine translation: Controlling the formality of machine translation output. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2814–2819.

Xing Niu, Sudha Rao, and Marine Carpuat. 2018. Multi-task neural models for translating between styles within and across languages. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1008–1021.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318. Association for Computational Linguistics.

Reid Pryzant, Youngjoo Chung, Dan Jurafsky, and Denny Britz. 2018. JESC: Japanese-English subtitle corpus. In Proceedings of the 11th Language Resources and Evaluation Conference. European Language Resources Association.

Toshiaki Nakazawa, Manabu Yaguchi, Kiyotaka Uchimoto, Masao Utiyama, Eiichiro Sumita, Sadao Kurohashi, and Hitoshi Isahara. 2016. ASPEC: Asian scientific paper excerpt corpus. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pages 2204–2208. European Language Resources Association (ELRA).

Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016a. Controlling politeness in neural machine translation via side constraints. In Proceedings of NAACL-HLT 2016, pages 35–40. Association for Computational Linguistics.

Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016b. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715–1725. Association for Computational Linguistics.

Shunsuke Takeno, Masaaki Nagata, and Kazuhide Yamamoto. 2017. Controlling target features in neural machine translation via prefix constraints. In Proceedings of the 4th Workshop on Asian Translation, pages 55–63.

Yasuhito Tanaka. 2001. Compilation of a multilingual parallel corpus. In Proceedings of PACLING 2001, pages 265–268.

Ashish Vaswani, Samy Bengio, Eugene Brevdo, François Chollet, Aidan Gomez, Stephan Gouws, Llion Jones, Łukasz Kaiser, Nal Kalchbrenner, Niki Parmar, Ryan Sepassi, Noam Shazeer, and Jakob Uszkoreit. 2018. Tensor2Tensor for neural machine translation. In Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Papers), pages 193–199. Association for Machine Translation in the Americas.

Hayahide Yamagishi, Shin Kanouchi, Takayuki Sato, and Mamoru Komachi. 2016. Controlling the voice of a sentence in Japanese-to-English neural machine translation. In Proceedings of the 3rd Workshop on Asian Translation, pages 203–210.
