arXiv:1904.00930v1 [cs.CL] 1 Apr 2019
Lost in Interpretation: Predicting Untranslated Terminology in Simultaneous Interpretation

Nikolai Vogler and Craig Stewart and Graham Neubig
Language Technologies Institute
Carnegie Mellon University
fnikolaiv,cas1,[email protected]

Abstract

Simultaneous interpretation, the translation of speech from one language to another in real-time, is an inherently difficult and strenuous task. One of the greatest challenges faced by interpreters is the accurate translation of difficult terminology like proper names, numbers, or other entities. Intelligent computer-assisted interpreting (CAI) tools that could analyze the spoken word and detect terms likely to be untranslated by an interpreter could reduce translation error and improve interpreter performance. In this paper, we propose a task of predicting which terminology simultaneous interpreters will leave untranslated, and examine methods that perform this task using supervised sequence taggers. We describe a number of task-specific features explicitly designed to indicate when an interpreter may struggle with translating a word. Experimental results on a newly-annotated version of the NAIST Simultaneous Translation Corpus (Shimizu et al., 2014) indicate the promise of our proposed method.1

1 Code is available at https://github.com/nvog/lost-in-interpretation. Term annotations for the NAIST Simultaneous Translation Corpus will be provided upon request after confirmation that you have access to the corpus, available at https://ahcweb01.naist.jp/resource/stc/.

1 Introduction

Simultaneous interpretation (SI) is the act of translating speech in real-time with minimal delay, and is crucial in facilitating international commerce, government meetings, or judicial settings involving non-native language speakers (Bendazzoli and Sandrelli, 2005; Hewitt et al., 1998). However, SI is a cognitively demanding task that requires both active listening to the speaker and careful monitoring of the interpreter's own output. Even accomplished interpreters with years of training can struggle with unfamiliar concepts, fast-paced speakers, or memory constraints (Lambert and Moser-Mercer, 1994; Liu et al., 2004). Human short-term memory is particularly at odds with the simultaneous interpreter as he or she must consistently recall and translate specific terminology uttered by the speaker (Lederer, 1978; Darò and Fabbro, 1994). Despite psychological findings that rare words have long access times (Balota and Chumbley, 1985; Jescheniak and Levelt, 1994; Griffin and Bock, 1998), listeners expect interpreters to quickly understand the source words and generate accurate translations. Therefore, professional simultaneous interpreters often work in pairs (Millán and Bartrina, 2012); while one interpreter performs, the other notes certain challenging items, such as dates, lists, names, or numbers (Jones, 2002).

Computers are ideally suited to the task of recalling items given their ability to store large amounts of information, which can be accessed almost instantaneously. As a result, there has been recent interest in developing computer-assisted interpretation (CAI; Plancqueel and Werner; Fantinuoli (2016, 2017b)) tools that have the ability to display glossary terms mentioned by a speaker, such as names, numbers, and entities, to an interpreter in a real-time setting. Such systems have the potential to reduce cognitive load on interpreters by allowing them to concentrate on fluent and accurate production of the target message.

These tools rely on automatic speech recognition (ASR) to transcribe the source speech, and display terms occurring in a prepared glossary. While displaying all terminology in a glossary achieves high recall of terms, it suffers from low precision. This could potentially have the unwanted effect of cognitively overwhelming the interpreter with too many term suggestions (Stewart et al., 2018). Thus, an important desideratum of this technology is to only provide terminology assistance when the interpreter requires it. For instance, an NLP tool that learns to predict only terms an interpreter is likely to miss could be integrated into a CAI system, as suggested in Fig. 1.

[Figure 1 diagram: Speaker (source) → Streaming ASR → Feature Extraction → Proposed Terminology Tagger → Display Results → Interpreter (target)]

Figure 1: The simultaneous interpretation process, which could be augmented by our proposed terminology tagger embedded in a computer-assisted interpreting interface on the interpreter's computer. In this system, automatic speech recognition transcribes the source speech, from which features are extracted, input into the tagger, and term predictions are displayed on the interface in real-time. Finally, machine translations of the terms can be suggested.

In this paper, we introduce the task of predicting the terminology that simultaneous interpreters are likely to leave untranslated using only information about the source speech and text. We approach the task by implementing a supervised, sliding window, SVM-based tagger imbued with delexicalized features designed to capture whether words are likely to be missed by an interpreter. We additionally contribute new manual annotations for untranslated terminology on a seven talk subset of an existing interpreted TED talk corpus (Shimizu et al., 2014). In experiments on the newly-annotated data, we find that intelligent term prediction can increase average precision over the heuristic baseline by up to 30%.

2 Untranslated Terminology in SI

Before we describe our supervised model to predict untranslated terminology in SI, we first define the task and describe how to create annotated data for model training.

2.1 Defining Untranslated Terminology

Formally, we define untranslated terminology with respect to a source sentence $S$, sentence created by a translator $R$, and sentence created by an interpreter $I$. Specifically, we define any consecutive sequence of words $s_{i:j}$, where $0 \le i \le N - 1$ (inclusive) and $i < j \le N$ (exclusive), in source sentence $S_{0:N}$ that satisfies the following criteria to be an untranslated term:

• Termhood: It consists of only numbers or nouns. We specifically focus on numbers or nouns for two reasons: (1) based on the interpretation literature, these categories contain items that are most consistently difficult to recall (Jones, 2002; Gile, 2009), and (2) these words tend to have less ambiguity in their translations than other types of words, making it easier to have confidence in the translations proposed to interpreters.

• Relevance: A translation of $s_{i:j}$, we denote $t$, occurs in a sentence-aligned reference translation $R$ produced by a translator in an offline setting. This indicates that in a time-unconstrained scenario, the term should be translated.

• Interpreter Coverage: It is not translated, literally or non-literally, by the interpreter in interpreter output $I$. This may reasonably allow us to conclude that translation thereof may have presented a challenge, resulting in the content not being conveyed.

Importantly, we note that the phrase untranslated terminology entails words that are either dropped mistakenly, intentionally due to the interpreter deciding they are unnecessary to carry across the meaning, or mistranslated. We contrast this with literal and non-literal term coverage, which encompasses words translated in a verbatim and a paraphrastic way, respectively.

2.2 Creating Term Annotations

To obtain data with labels that satisfy the previous definition of untranslated terminology, we can leverage existing corpora containing sentence-aligned source, translation, and simultaneous interpretation data. Several of these resources exist, such as the NAIST Simultaneous Translation Corpus (STC) (Shimizu et al., 2014) and the European Parliament Translation and Interpreting Corpus (EPTIC) (Bernardini et al., 2016). Next, we process the source sentences, identifying terms that satisfy the termhood, relevance, and interpreter coverage criteria listed previously.

• Termhood Tests: To check termhood for each source word in the input, we first part-of-speech (POS) tag the input, then check the tag of the word and discard any that are not nouns or numbers.

• Relevance and Interpreter Coverage Tests: Next, we need to measure relevancy (whether a corresponding target-language term appears in translated output), and interpreter coverage (whether a corresponding term does not appear in interpreted output). An approximation to this is whether one of the translations listed in a bilingual dictionary appears in the translated or interpreted outputs respectively, and as a first pass we identify all source terms with the corresponding target-language translations. However, we found that this automatic method did not suffice to identify many terms due to lack of dictionary

Src:    In California, there has been a [40] percent decline in the [Sierra snowpack].
Tags:   O  O  O  O  O  O  I  O  O  O  O  I  I
Interp (English gloss): California 4 percent decline

Figure 2: A source sentence and its corresponding interpretation. Untranslated terms are surrounded by brackets and each word in the term is labeled with an I-tag. The interpreter mistakes the term 40 for 4, and omits Sierra snowpack.

the label I, and all others are assigned a label O, as shown in Fig. 2.

3 Predicting Untranslated Terminology

With supervised training data in hand, we can create a model for predicting untranslated terminology that could potentially be used to provide interpreters with real-time assistance. In this section, we outline a couple baseline models, and then describe an SVM-based tagging model, which we specifically tailor to untranslated terminology prediction for SI by introducing a number of hand-crafted features.

3.1 Heuristic Baselines

In order to compare with current methods for term suggestion in CAI, such as Fantinuoli (2017a), we first introduce a couple of heuristic baselines.

• Select noun/# POS tag: Our first baseline recalls all words that meet the termhood requirement from §2. Thus, it will achieve perfect recall at the cost of precision, which will equal the percentage of I-tags in the data.
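The noun/number baseline described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Penn Treebank-style tag set, and the hand-assigned POS tags for the Figure 2 sentence are all assumptions made for illustration. It tags every noun or number as I and then measures token-level precision and recall against the gold I/O labels from Figure 2.

```python
from typing import List, Tuple

# Assumption: Penn Treebank-style POS tags, where noun tags start
# with "NN" and cardinal numbers are tagged "CD".
def is_term_candidate(pos: str) -> bool:
    return pos.startswith("NN") or pos == "CD"

def baseline_tags(pos_tags: List[str]) -> List[str]:
    """Label every noun/number token I and everything else O."""
    return ["I" if is_term_candidate(p) else "O" for p in pos_tags]

def precision_recall(pred: List[str], gold: List[str]) -> Tuple[float, float]:
    """Token-level precision and recall of the I label."""
    true_pos = sum(p == "I" and g == "I" for p, g in zip(pred, gold))
    n_pred = pred.count("I")
    n_gold = gold.count("I")
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    return precision, recall

# Toy data: the source sentence from Figure 2. The POS tags are
# hand-assigned for illustration, not produced by a real tagger.
tokens = ["In", "California,", "there", "has", "been", "a", "40",
          "percent", "decline", "in", "the", "Sierra", "snowpack."]
pos = ["IN", "NNP", "EX", "VBZ", "VBN", "DT", "CD",
       "NN", "NN", "IN", "DT", "NNP", "NN"]
gold = ["O", "O", "O", "O", "O", "O", "I",
        "O", "O", "O", "O", "I", "I"]

pred = baseline_tags(pos)
precision, recall = precision_recall(pred, gold)
print(list(zip(tokens, pred)))
print(f"precision={precision:.2f} recall={recall:.2f}")
```

On this single sentence the baseline recovers all three gold term tokens but also flags California, percent, and decline, which is the recall-for-precision trade-off the baseline is meant to expose.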