Facilitating Terminology Translation with Target Lemma Annotations

Toms Bergmanis†‡ and Mārcis Pinnis†‡
†Tilde / Vienības gatve 75A, Riga, Latvia
‡Faculty of Computing, University of Latvia / Raiņa bulv. 19, Riga, Latvia
Abstract

Most of the recent work on terminology integration in machine translation has assumed that terminology translations are given already inflected in forms that are suitable for the target language sentence. In the day-to-day work of professional translators, however, this is seldom the case, as translators work with bilingual glossaries where terms are given in their dictionary forms; finding the right target language form is part of the translation process. We argue that the requirement for a priori specified target language forms is unrealistic and impedes the practical applicability of previous work. In this work, we propose to train machine translation systems using a source-side data augmentation method¹ that annotates randomly selected source language words with their tar-

has assumed that the correct morphological forms are a priori known (Hokamp and Liu, 2017; Post and Vilar, 2018; Hasler et al., 2018; Dinu et al., 2019; Song et al., 2020; Susanto et al., 2020; Dougal and Lonsdale, 2020). Thus, previous work has approached terminology translation predominantly as a problem of making sure that the decoder’s output contains lexically and morphologically pre-specified target language terms. While useful in some cases and some languages, such approaches fall short of addressing terminology translation into morphologically complex languages where each word can have many morphological surface forms.

For terminology translation to be viable for translation into morphologically complex languages, terminology constraints have to be soft.
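As a rough illustration of the source-side augmentation idea named in the title and abstract, the sketch below attaches target language lemmas to randomly selected source words. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `annotate_source`, the inline `<lemma>` tag format, the word-alignment dictionary, and the sampling rate are all hypothetical choices made for this example, and the toy English-Latvian data is invented.

```python
import random


def annotate_source(tokens, alignment, target_lemmas, rate=0.3, seed=None):
    """Annotate randomly selected source tokens with target lemmas.

    tokens:        list of source-language words
    alignment:     dict mapping a source index to a target index
                   (hypothetical; assumed to come from a word aligner)
    target_lemmas: list holding the lemma of each target-language token
    rate:          probability of annotating an aligned source token
    seed:          optional seed for reproducible sampling

    The inline "<lemma>...</lemma>" format is an assumption of this sketch.
    """
    rng = random.Random(seed)
    annotated = []
    for i, token in enumerate(tokens):
        annotated.append(token)
        j = alignment.get(i)
        # Only aligned tokens can be annotated; sample each one independently.
        if j is not None and rng.random() < rate:
            annotated.append(f"<lemma>{target_lemmas[j]}</lemma>")
    return annotated


# Toy data: "the cat sleeps" aligned to a two-word Latvian sentence.
source = ["the", "cat", "sleeps"]
alignment = {1: 0, 2: 1}           # "cat" -> target 0, "sleeps" -> target 1
target_lemmas = ["kaķis", "gulēt"]  # dictionary forms of the target words

print(" ".join(annotate_source(source, alignment, target_lemmas, rate=1.0)))
```

Because the annotation carries only the dictionary form, a system trained on such data still has to produce the inflected form that fits the target sentence, which is what makes the constraint soft rather than a fixed surface string. At inference time, the same annotation slot could presumably be filled from a bilingual glossary instead of from aligned training data.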