
Pragmatic information in translation: a corpus-based study of tense and mood in English and German

Anita Ramm†, Ekaterina Lapshinova-Koltunski‡, Alexander Fraser†
†Center for Information and Language Processing, LMU Munich
‡Saarland University

Abstract

Tense and mood are important linguistic phenomena to consider in natural language processing (NLP) research. We consider the correspondence between English and German tense and mood in translation. Human translators do not find this correspondence easy, and as we will show through careful analysis, there are no simplistic ways to map tense and mood from one language to another. Our observations about the challenges of human translation of tense and mood have important implications for multilingual NLP. Of particular importance is the challenge of modeling tense and mood in rule-based, phrase-based statistical and neural machine translation.

1 Introduction

This paper analyzes tense and mood in English and German from the perspective of the data commonly used to train MT systems or to model tense/mood, namely freely available bilingual texts. The need for a thorough analysis of tense/mood in parallel texts arises from the high degree of variation between the two languages, which results in a many-to-many relation in tense/mood translation between English and German. In particular, frequently occurring unintuitive tense correspondences and the low frequency of many tense/mood combinations are problematic for different NLP tasks that use parallel corpora. We study the correspondences in a large English-German parallel corpus and explain them in terms of different pragmatic factors: contextual constraints in terms of genre/user preferences or textual properties, and tense interchangeability. We compare the English and German morpho-syntactic tense sets, which suffer from tense correspondence gaps in both directions, and discuss the impact of the translation process on the tense/mood variability in our data.

Finally, we look at the modeling of tense and mood for machine translation, pointing to important features needed to transfer tenses between languages. Our analysis indicates that bilingual modeling of tense and mood cannot be done properly by considering solely lexical/syntactic features (e.g., words, POS tags, etc.), a finding also supported by previous work (Ye et al., 2006). Instead, it requires the incorporation of pragmatic information, which is currently not directly accessible to most NLP systems. We summarize the pragmatic information required and provide a list of available tools for automatic annotation with the respective information, which will be of direct use in future efforts to solve this difficult modeling task.

In the following, we present theoretical issues and related work (Section 2) and a quantitative analysis of the usage of tense/mood correspondences in English-German parallel data and of their modeling in the context of MT (Section 3), and we summarize the findings in Section 4.

2 Theoretical issues and related work

2.1 Contrasts in English and German tense and mood systems

As known from contrastive linguistics (König and Gast, 2012; Hawkins, 2015), English and German share a common ground of six morpho-syntactic tenses: present/Präsens, simple past/Präteritum, present perfect/Perfekt, past perfect/Plusquamperfekt, future I/Futur I and future II/Futur II. Table 1, which we created to show the correspondence between the two languages, summarizes them with examples in both languages. In English, each of the tenses has a progressive variant. The German tense system does not mark the progressive aspect explicitly, but German has a larger set of subjunctive tense forms. While a few of them have direct morpho-syntactic counterparts in English, most of them correspond to indicative tenses in English.

Morph. tense | English synt. tense | English example | German synt. tense | German example
-------------|---------------------|-----------------|--------------------|---------------
present | present simple | (I) read | Präsens | (Ich) lese
present | present progressive | (I) am reading | |
present | present perfect | (I) have read | Perfekt | (Ich) habe gelesen
present | present perfect progressive | (I) have been reading | |
present | future I | (I) will read; (I) am going to read | Futur I | (Ich) werde lesen
present | future I progressive | (I) will be reading; (I) am going to be reading | |
present | future II | (I) will have read | Futur II | (Ich) werde gelesen haben
present | future II progressive | (I) will have been reading | |
past | past simple | (I) read | Präteritum | (Ich) las
past | past progressive | (I) was reading | |
past | past perfect | (I) had read | Plusquamperfekt | (Ich) hatte gelesen
past | past perfect progressive | (I) had been reading | |
present* | conditional I | (I) would read | Konjunktiv II | (Ich) würde lesen
present* | conditional I progressive | (I) would be reading | |
past* | conditional II | (I) would have read | Konjunktiv II | (Ich) hätte gelesen
past* | conditional II progressive | (I) would have been reading | |
present* | | | Konjunktiv I | (Er) lese; (Er) werde lesen

Table 1: List of the tenses in English and German in active voice. The table indicates the tense correspondences in terms of their morpho-syntactic structure.

The meaning of a specific tense form may vary considerably too. We summarize the contrasts related to the meaning of the English and German tenses described by König and Gast (2012) in Table 2. These contrasts concern different aspects such as the time (past, futurate, future, etc.) and the relation to the moment of utterance (resultative, universal, narrative). In other words, the (non-)parallelism of the respective tenses can be established by considering specific semantic properties of a given verb and the utterance in which that verb occurs. Different aspects in the English tense system have different impacts on the use of tenses. For instance, in [...] to the [...] tense, the present progressive can be used in the futurate context. In German, Präsens can almost always be used to refer to the future. The English progressive tenses lack direct counterparts in German and are therefore translated into a number of different German tenses.

English and German also differ greatly with respect to grammatical mood. In German, the subjunctive is expressed in the verbal morphology and interacts with the German tense system, changing the time of an utterance. German distinguishes between two subjunctive morpho-syntactic forms: Konjunktiv I and Konjunktiv II. The latter is used in conditional and counterfactual utterances. Usually, sentences with Konjunktiv II are composed of at least two clauses. There are, however, also free factive occurrences of Konjunktiv II in a simple sentence, see Example (1). Such sentences may, for instance, indicate politeness.

(1) Ich hätte gern ein Glas Wasser.
    I have gladly a glass water.
    'I'd like to have a glass of water.'

Both Konjunktiv I and Konjunktiv II can be used in the context of reported speech. Note, however, that the use of the subjunctive is not grammatically required to signal reported speech. In fact, the two Konjunktiv forms and the indicative mood are often used interchangeably in reported speech (Csipak, 2015). For the English subjunctive mood, König and Gast (2012) prefer the term quasi-subjunctive, since a distinct subjunctive form exists in English only for the verb be. Other forms used in subjunctive contexts correspond to the infinitive.

The German Präsens and Futur I are interchangeable in many contexts. In the futurate use, Präsens is usually combined with a temporal phrase which points to the future; in (2), the adverbial morgen provides the respective temporal information. However, the temporal phrase is not always overtly given in the sentence under consideration: in (3), the verb kommen in the present tense refers to the future, which is evident only from the preceding sentence.

(2) Ich komme morgen.
    I come tomorrow.
    'I'll come tomorrow.'

(3) Kommst du morgen? Ja, ich komme.
    Come you tomorrow? Yes, I come.
    'I'll come tomorrow.'

Another prominent example of tense interchangeability in German is related to the past tenses.

Tense (German/English) | Use | German | English
-----------------------|-----|--------|--------
Präsens/present tense | non-past | Ich schlafe von 12 bis 7. | I sleep from midnight to seven.
Präsens/present tense | futurate | Morgen weiß ich das. | → future tense (I will know that tomorrow.)
Präteritum/simple past | past time | Ich schlief den ganzen Tag. | I slept the whole day.
Futur I/future tense | future time | Ich werde schlafen. | I will sleep. I am going to sleep.
Perfekt/present perfect | resultative | Jemand hat mein Auto gestohlen. | Someone has stolen my car.
Perfekt/present perfect | existential | Ich habe (schon mal) Tennis gespielt. | I have played tennis.
Perfekt/present perfect | hot news | Kanzler Schröder ist zurückgetreten. | Chancellor Schröder has resigned.
Perfekt/present perfect | universal | → Präsens (Ich lebe hier seit 2 Jahren.) | I have lived here for two years.
Perfekt/present perfect | narrative | Ich bin gestern im Theater gewesen. | → past tense (I was at the theater yesterday.)
Futur II/future perfect | future results | Ich werde das bis morgen erledigt haben. | I will have done this by tomorrow.
Plusquamperfekt/past perfect | pre-past | Ich hatte geschlafen. | I had slept.

Table 2: Meaning of tenses in English and German (König and Gast, 2012, p. 92)
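Table 2 can also be read as a lookup that is keyed by the tense form together with its use, which makes the dependence on the use category explicit. The following is only a minimal sketch of that reading: the names `DE_TO_EN` and `english_counterparts` are hypothetical, and the entries merely transcribe the rows of Table 2 (rows without an arrow are mapped to the English tense paired with the German one in the first column).

```python
# (German tense, use) -> English tense, transcribed from Table 2.
# Hypothetical illustration; not a complete model of tense correspondence.
DE_TO_EN = {
    ("Präsens", "non-past"): "present tense",
    ("Präsens", "futurate"): "future tense",
    ("Präsens", "universal"): "present perfect",
    ("Präteritum", "past time"): "simple past",
    ("Futur I", "future time"): "future tense",
    ("Perfekt", "resultative"): "present perfect",
    ("Perfekt", "existential"): "present perfect",
    ("Perfekt", "hot news"): "present perfect",
    ("Perfekt", "narrative"): "past tense",
    ("Futur II", "future results"): "future perfect",
    ("Plusquamperfekt", "pre-past"): "past perfect",
}


def english_counterparts(german_tense):
    """Return all English tenses that Table 2 lists for one German tense."""
    return {en for (de, _use), en in DE_TO_EN.items() if de == german_tense}


# The German tense form alone does not determine the English tense:
praesens_targets = english_counterparts("Präsens")
perfekt_targets = english_counterparts("Perfekt")
```

In this sketch, Präsens maps to three different English tenses and Perfekt to two, so a model that sees only the German tense form cannot choose among them without information about the use.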

There are some fine-grained differences between the respective tenses, but at least Präteritum and Perfekt are interchangeable in many contexts, see Sammon (2002). In fact, the dominance of either form is a matter of the author's preference or of contextual constraints, see 2.2 below. For instance, Perfekt is often used in spoken language, while Präteritum is more frequently used in writing. Furthermore, there is a certain lexical preference: auxiliaries and modals are more frequently used in Präteritum than in Perfekt.

2.2 Contextual constraints

Contextual constraints on tense/mood usage have been analyzed mostly in a monolingual context. For example, Weinrich (2001) differentiates between two groups of German tenses: (i) discussing (Präsens, Perfekt, Futur I, Futur II) and (ii) narrative (Präteritum, Plusquamperfekt, and the subjunctives Konjunktiv I and Konjunktiv II). His classification is relevant for genre differentiation. For instance, the narrative tenses are mostly found in written German (e.g., literary works), while the discussing tenses are more often used in spoken language. In a multilingual context, there exist a few studies that analyze the role of tense/mood in the functional variation of language called register variation. Biber (1995) uses preferences for specific tenses and moods as linguistic indicators for specific registers in a number of languages. Neumann (2013) presents a contrastive corpus-based study of English and German (including translations), in which tense frequency is used among other textual properties to induce the goal type of the text (one of the parameters of register variation): argumentation, narration, instruction, etc. She observed that the frequencies of present vs. past tense across texts from different registers exhibit different (i.e., domain-specific) distributions: past tenses are rather typical for narrative texts, while present tense forms are more typical for argumentative texts such as political essays, popular science articles, etc. These findings are in line with the classification of tenses proposed by Weinrich (2001). In addition to contextual constraints expressed in genre or register, the translation of tenses may also follow a set of rules defined for a specific translation project. For instance, the translation guidelines of the European Commission for German require that session minutes or reports be written in the present tense.¹

2.3 Tense and mood in human translation

Tense and mood were analyzed in previous studies on English-German translation (Teich, 2003; Neumann, 2013). However, a systematic description of tense/mood transformation patterns for this language pair has been missing until our work. At the same time, translation studies provide us with valuable information on how the translation process affects translated texts, which, as a result, differ from non-translated texts in both the source language (SL) and the target language (TL). These differences are reflected in the features of translated language (Gellerstam, 1986; Baker, 1993). Two of these translation features are important for bilingual modeling of tense and mood: (i) shining through and (ii) normalization. The former indicates the closeness of the translation to the source (Teich, 2003), whereas the latter is related to the tendency to conform to (and exaggerate) the patterns typical for the TL (Baker, 1993). We would observe shining through in our data if the tenses used in the sources are preserved in the translations. While there is much parallelism with respect to tense in the two languages under analysis, many cases may expose a TL-specific usage of tense, which may differ considerably from the form given in the source due to a smaller set of tenses available in the system. Following Teich (2003), shining through is less prominent and normalization is more prominent when translating into a language which has fewer options with respect to a specific grammatical system. This means that our parallel texts may expose great variation in tense translation. Finally, parallel corpora represent a concatenation of translations produced by many different translators. Therefore, we expect that the observed variation in tense translations can be affected by the preferences of a specific translator.

2.4 Tense and mood in machine translation and NLP

In the context of rule-based MT, e.g., in EUROTRA (Copeland et al., 1991), the translation of tense and mood relies on an interlingua representation to which the SL sentence is mapped, and which is then mapped to the [...] of the TL. This mapping follows a set of manually defined rules which make use of different kinds of information. The rules for English-German formulated within EUROTRA indicate that tense cannot be considered in isolation, but rather in combination with other related linguistic features such as aspect and Aktionsart. Thus, specific modality, as well as voice properties, need to be considered in the bilingual modeling of tense and mood.

Recently, there have been attempts to model tense and mood automatically for different NLP tasks. In the monolingual context, for instance, Tajiri et al. (2012) used a tense classification model for detecting and correcting tense in texts produced by learners of English. In the bilingual context, Ye et al. (2006) presented an empirical study of the features needed to train a classification model for predicting English tenses given source sentences in Chinese. Gispert and Mariño (2008), Loáiciga et al. (2014) and Ramm and Fraser (2016) presented work on building tense classification models which are used to improve tense choice in statistical MT systems for English-Spanish, English-French and English-German, respectively. While Loáiciga et al. (2014) reported encouraging results, Gispert and Mariño (2008) and Ramm and Fraser (2016) left unanswered questions about the appropriate method and the necessary contextual information for modeling tense and mood in a bilingual context.

3 Analyses

3.1 Tense and mood in human translation

Data and tools. Since one of our aims is to serve the task of machine translation, our contrastive analysis of tense and mood in English and German relies on the parallel corpora provided for the WMT15 shared tasks on machine translation (Bojar et al., 2015). We make use of the News corpus (news articles, 272k sentences), the Europarl corpus (1.9 million sentences) and the Crawl corpus, a large collection of mixed-domain bilingual documents retrieved from the Internet (2.4 million sentences). In addition, we also consider Pattr², a medical corpus (1.8 million sentences). In this way, we have a constellation of various domains (as they are called in NLP) or registers/genres (as they are called in the studies described in 2.2 above). The corpora are tokenized with the standard tokenizer provided with the SMT toolkit Moses (Koehn et al., 2007) and parsed with the Mate parser (Bohnet and Nivre, 2012), which provides dependency parse trees for both languages and, for German, a morphological analysis of words. Both sides of the parallel corpora are annotated with tense, mood and voice information using the TMV annotator (Ramm et al., 2017). The English-German verb pairs annotated with the respective information are then extracted by (i) automatically computing word alignments of the parallel texts with Giza++ (Och and Ney, 2003) and (ii) identifying pairs of verbal complexes (VCs) in the aligned, annotated parallel data (see the example in Figure 1). In our analyses, we do not differentiate between translation directions, because we are interested in all transformations possible for the analyzed language pair.

[Figure 1: dependency parses and TMV annotations of a word-aligned sentence pair. English: "New drugs may slow lung and ovarian cancer ." (tense: present, mood: indicative, voice: active), with may (token 3) and slow (token 4) forming the VC. German: "Neue Medikamente könnten Lungen- und Eierstockkrebs verlangsamen ." (tense: present, mood: subjunctive, voice: active), with the aligned VC formed by tokens 3 and 7.]

Figure 1: Word-aligned, parsed English-German parallel sentence pair with TMV annotations. Parallel VC: may slow ↔ können verlangsamen, tense/mood pair present/indicative ↔ present/subjunctive.

Indicative tense. As already mentioned in Section 2.1, the English progressive tenses are translated into a number of different German tenses. Figure 2 illustrates the frequency distribution of the German correspondences of the English present perfect (progressive) in our data. It is striking that both English tense forms correspond to three different German tenses in most cases: Präsens, Perfekt and Präteritum, with Perfekt being the most prominent equivalent. Considering the two German past tenses together, it becomes clear that both present perfect tense forms correspond to one of the German past tenses more often than to the present tense. Progressiveness also seems to have a large impact on the translation of the present perfect into German: ca. 77% of the non-progressive forms correspond to one of the German past tenses, whereas 56% of the progressive cases do so. In other words, the present perfect progressive is still preferably transferred into one of the German past tenses. However, the German Präsens corresponds to this English tense more often than to the non-progressive variant.

Subjunctive mood. The frequency distribution of the tense correspondences between subjunctive forms based on news texts is shown in Figure 3. As expected, the German Konjunktiv tense forms are equivalents of all English indicative tense forms in the dataset at hand. Konjunktiv II is a more frequent equivalent than Konjunktiv I. Assuming that conditional and counterfactual situations in English are described with conditional forms, it is quite unexpected that the other English tense forms correspond to the German Konjunktiv II (used to indicate conditional contexts) more often than to Konjunktiv I (used to indicate reported speech) in our translation data. A possible explanation is that in the news data, Konjunktiv II is used to express reported speech more often than the Konjunktiv I form. When expressing non-factual events, English conditionals can be seen as direct counterparts of the German Konjunktiv II: Figure 4 shows that Konjunktiv II is the most frequent equivalent for all four English conditional tense forms in our data. Further frequent counterparts are Präteritum for the conditional I and Perfekt for the conditional II.

Finite vs. non-finite verbal complexes. Our data shows that the usage of non-finite VCs in the two languages varies considerably. For instance, in the News corpus, 16.7% of all VCs in English are non-finite, while this is the case for only 7.9% of the German VCs. A similar ratio is found in the Europarl corpus, in which 18.2% of the English VCs and 6.2% of the German VCs are non-finite. Figure 5 indicates that the majority of the English non-finite VCs have German finite VCs as equivalents. These translation equivalents pose an interesting problem in the context of MT. When translating from English to German, MT needs to generate a finite clause for the given non-finite source clause. In particular, it needs to generate a finite German VC in a tense form for which there is no obvious evidence in the source.

Tense interchangeability. In the data from the News corpus, we identified 190 occurrences of the auxiliary sein (to be) in one of the composed past tenses (i.e., Perfekt and Plusquamperfekt) in active voice, in contrast to 10,247 occurrences in the simple past tense Präteritum. This lexical preference is also found for a few additional full verbs in German, as shown by counts derived from the Crawl corpus: denken (819 vs. 354), stehen (3083 vs. 98), geben (7220 vs. 1523), ziehen (1565 vs. 145). The data suggests the same preference for the passive voice: denken (184 vs. 38), geben (1517 vs. 395), ziehen (380 vs. 78).

Contextual specifics. Direct comparison of the tense frequencies extracted from the German data (Figure 6) shows variation in the usage of tenses in different domains (or registers). Präsens is the most frequent tense form in all corpora, but its relative frequency differs across them.

¹ https://ec.europa.eu/info/sites/info/files/german_style_guide_de_0.pdf
² http://www.cl.uni-heidelberg.de/statnlpgroup/pattr/de-en.tar.gz
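The verb-pair extraction step described under "Data and tools" can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes word alignments in the `i-j` text format (0-based source and target token indices) that is commonly produced after symmetrizing Giza++ output, and it assumes that each VC is available as a set of token indices with tense, mood and voice labels. The `VC` record, the function names and the "most shared alignment links" pairing heuristic are all hypothetical.

```python
from collections import namedtuple

# Hypothetical record for an annotated verbal complex (VC).
# tokens: set of 0-based token indices that make up the VC.
VC = namedtuple("VC", ["tokens", "tense", "mood", "voice"])


def parse_alignment(line):
    """Parse 'i-j' pairs (source index, target index) into a list of tuples."""
    return [tuple(int(x) for x in pair.split("-")) for pair in line.split()]


def aligned_vc_pairs(en_vcs, de_vcs, alignment):
    """Pair each English VC with the German VC sharing the most alignment links.

    Assumed heuristic: English VCs without any link into a German VC are skipped.
    """
    pairs = []
    for en in en_vcs:
        best, best_links = None, 0
        for de in de_vcs:
            links = sum(1 for i, j in alignment
                        if i in en.tokens and j in de.tokens)
            if links > best_links:
                best, best_links = de, links
        if best is not None:
            pairs.append((en, best))
    return pairs


# Sentence pair from Figure 1 (0-based indices; the alignment links are made up
# for illustration):
# EN: New drugs may slow lung and ovarian cancer .
# DE: Neue Medikamente könnten Lungen- und Eierstockkrebs verlangsamen .
alignment = parse_alignment("0-0 1-1 2-2 3-6 4-3 5-4 6-5 7-5 8-7")
en_vcs = [VC({2, 3}, "present", "indicative", "active")]
de_vcs = [VC({2, 6}, "present", "subjunctive", "active")]
pairs = aligned_vc_pairs(en_vcs, de_vcs, alignment)
tmv_pairs = [((en.tense, en.mood), (de.tense, de.mood)) for en, de in pairs]
```

For the Figure 1 sentence pair, the sketch yields the tense/mood pair present/indicative ↔ present/subjunctive. Pairing by shared links is only one plausible choice; it does not handle VCs that are split across several target VCs.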

0.3

0.1

0.05

0.01

0.005

0.001

Präsens Präteritum Perfekt Pluperfekt Futur I Futur II Konj I pres Konj I past Konj II pres Konj II past

Figure 2: German correspondences of the English present perfect (progressive) in the Europarl corpus.
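Distributions such as the one in Figure 2 are relative frequencies of German tenses per English tense, computed over the extracted verb pairs. The following is a minimal sketch of that computation; the function names are hypothetical, and the pair counts are toy values chosen for illustration only, not figures from the corpora.

```python
from collections import Counter, defaultdict

# German past tenses considered together in the discussion of Figure 2.
PAST_TENSES = {"Perfekt", "Präteritum", "Plusquamperfekt"}


def correspondence_distribution(pairs):
    """Map each English tense to {German tense: relative frequency}."""
    counts = defaultdict(Counter)
    for en_tense, de_tense in pairs:
        counts[en_tense][de_tense] += 1
    dist = {}
    for en_tense, counter in counts.items():
        total = sum(counter.values())
        dist[en_tense] = {de: n / total for de, n in counter.items()}
    return dist


def past_share(de_dist):
    """Share of one English tense that maps to any German past tense."""
    return sum(p for de, p in de_dist.items() if de in PAST_TENSES)


# Toy input (invented counts): (English tense, German tense) per aligned VC pair.
toy_pairs = (
    [("presPerf", "Perfekt")] * 3
    + [("presPerf", "Präsens")]
    + [("presPerfProg", "Präsens")] * 2
    + [("presPerfProg", "Perfekt")] * 2
)
dist = correspondence_distribution(toy_pairs)
non_prog_past = past_share(dist["presPerf"])      # 0.75 for the toy data
prog_past = past_share(dist["presPerfProg"])      # 0.5 for the toy data
```

Summing over the past tenses, as `past_share` does, corresponds to the comparison made in the text between the non-progressive and progressive forms.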

[Figure 3: bar chart; y-axis: relative frequency (0 to 1); x-axis: English tense forms (pres, past, futureI, condI, condII, presProg, pastProg, presPerf, pastPerf, futureII, futureIProg, condIProg, condIIProg, toInfinitive, presPerfProg, pastPerfProg, futureIIProg); series: Konjunktiv I, Konjunktiv II.]

Figure 3: English correspondences of the German Konjunktiv tenses in the Europarl corpus.

[Figures 4 and 5: bar charts of relative frequencies. Recovered labels: German forms Präsens, Perfekt, Präteritum, Pluperfekt, Futur I, Futur II, Konjunktiv I, Konjunktiv II, Infinitive; English forms conditional1, conditional2, conditional1Pr, conditional2Pr; corpora News, Europarl.]

Figure 4: German correspondences of the English conditionals in the News corpus.

Figure 5: German correspondences of the English non-finite VCs (gerunds and to-infinitives).

For instance, in News, the relative frequency of Präsens is 9 percentage points lower than in Europarl (0.66 vs. 0.75), while Präsens represents 97% of the tense forms in Pattr (medical texts). We also observe variation in the use of the past tenses within the respective corpora. For instance, in Europarl, Präteritum and Perfekt have almost equal relative frequency (0.08 and 0.10, respectively), whereas News clearly prefers the narrative tense Präteritum over the discussing tense Perfekt (0.19 vs. 0.08, respectively).

[Figure 6: bar chart; y-axis: relative frequency (ticks from 0.001 to 0.9); x-axis: German tense forms (Präsens, Präteritum, Perfekt, Pluperfekt, Futur I, Futur II); series: News, Europarl, Crawl, Pattr.]

Figure 6: Relative frequencies of the indicative active tense forms in four different German corpora.

3.2 Modeling tense and mood

Many-to-many relation. The corpus analyses of human translations presented in Section 3.1 show that the monolingual as well as the bilingual tense-related specifics of English and German result in a many-to-many relation. Figure 7 illustrates this relation on the basis of the distributions of tense transformation patterns derived from our data. A formal description of this many-to-many relation requires knowledge on different linguistic levels: lexical, syntactic and semantic/pragmatic.

One of the reasons for the many-to-many relation is the different granularity of the tense systems in the two languages. While there are tenses in English which do not have a direct counterpart in German, some tense forms in German (Konjunktiv I) do not have a direct counterpart in English either.

[Figure 7: bar chart; y-axis: relative frequency (0 to 1); x-axis: English forms (pres, past, condI, futureI, condII, futureII, gerund, presProg, pastProg, presPerf, pastPerf, condIProg, futureIProg, condIIProg, toInfinitive, presPerfProg, pastPerfProg, futureIIProg); series: Präsens, Perfekt, Präteritum, Pluperfekt, Futur I, Futur II, Konjunktiv I, Konjunktiv II.]

Figure 7: Distribution of tense translations derived from the News, Europarl and Crawl corpus.

Tense/mood-related contextual features. For automatic modeling of tense and mood, the textual characteristics discussed in the preceding sections need to be mapped to specific contextual information overtly given in a sentence. The respective contextual features are summarized in Table 3. Many of these features can be derived from parsed and POS-tagged data. However, some of them require access to other annotation tools, as well as to lexical databases which include information about semantic properties of the words. Automatic annotation of the temporal ordering can be done with the tool TARSQI (Verhagen et al., 2005). Information about tense, mood and voice of the VCs in the English texts can be obtained with the TMV annotator (Ramm et al., 2017). Information about Aktionsart in terms of state, event and progress can be gained from the output of the tool Sitent (Friedrich and Palmer, 2016). Currently, no tools are publicly available for automatic identification of conditionals in English, which is important for the translation of the German subjunctive mood. However, the set of syntactic rules described by Olivas et al. (2005) can be reused to identify the respective contexts in English. To our knowledge, there are no publicly available tools for automatic annotation of texts with genre and/or domain information, although there has been ongoing research in this area (Santini, 2007; Sharoff et al., 2010; Petrenz, 2014; Biber and Egbert, 2016).

Textual property | Lexical/syntactic level | Annotation tool availability
-----------------|-------------------------|-----------------------------
Tense | VC, main verb | POS tagging and parse trees
Tense | tense, mood, voice | TMVannotator (Ramm et al., 2017)
Tense | temporal expressions (NPs and PPs): head, preposition, adjective, adverb | TARSQI (Verhagen et al., 2005); POS tagging + parse trees
Tense | temporal ordering | TARSQI (Verhagen et al., 2005)
Aspect | auxiliary (combination) | POS tagging and parse trees + mapping rules
Aktionsart | event/state/progress | sitent (Friedrich and Palmer, 2016)
Aktionsart | NP: determiner, quantifier | parse trees
Aktionsart | NP: number | POS tagging
Aktionsart | NP semantic properties: mass, count | WordNet
Domain/genre | | -
Reported speech | | QSample (Scheible et al., 2016)
Conditional clauses | | -

Table 3: Mapping of the different textual properties to the corresponding lexical/syntactic levels. Column Tool availability lists tools for automatic annotation of the English texts with the respective information.

As seen from Table 3, the textual properties carrying pragmatic information represent their own subtasks in NLP. Tools for annotation of the respective information are mostly based on classification models that use a wide range of subtask-related information. While predicted annotations are correct in many cases, they may also be erroneous and consequently have a negative impact on training a tense/mood translation model. Instead of using the outputs of many different tools, which requires a complex processing pipeline, one might train a model directly with the features used to train the models for predicting each of the relevant textual properties.

The information on the distribution of the various tenses and moods in the bilingual data is also important. Therefore, training data should be carefully preselected to account for this specific distribution.

4 Discussion and Conclusion

The paper describes a contrastive analysis of English and German tense and mood by means of parallel data. We provide an overview of the categories available in both language systems, point to the existing asymmetries, providing corpus-based evidence from human translations, and formulate assumptions on their impact on MT. Our translation data shows a considerable amount of variation in tense/mood translation between the two languages. Translations also vary a lot, leading to many unexpected tense/mood correspondences. The observed variation may be explained by a number of different factors which are related not only to the differences on the lexical/syntactic level between the considered languages, but also to a number of pragmatic factors, including the process of translation. We also show that modeling tense/mood for MT requires additional information beyond the morpho-syntactic properties of the source, and we discuss tools for obtaining this information, which can (and should) be used in future modeling research. Beyond directly improving the modeling, an interesting future consideration would be to give the user of the translation system control of document-level tense and mood choices (e.g., by introducing a parameter for how choices for tense and mood in [...] should be made).

References

Mona Baker. 1993. Corpus linguistics and translation studies: Implications and applications. In M. Baker, G. Francis, and E. Tognini-Bonelli, editors, Text and Technology: In Honour of John Sinclair, pages 233–250. Benjamins, Amsterdam.

Douglas Biber. 1995. Dimensions of Register Variation: A Cross-Linguistic Comparison. Cambridge University Press.

Douglas Biber and Jesse Egbert. 2016. Using grammatical features for automatic register identification in an unrestricted corpus of documents from the open web. Journal of Research Design and Statistics in Linguistics and Communication Science, 2.

Bernd Bohnet and Joakim Nivre. 2012. A Transition-Based System for Joint Part-of-Speech Tagging and Labeled Non-Projective Dependency Parsing. In Proceedings of EMNLP-CoNLL, Jeju, Korea.

Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Barry Haddow, Matthias Huck, Chris Hokamp, Philipp Koehn, Varvara Logacheva, Christof Monz, Matteo Negri, Matt Post, Carolina Scarton, Lucia Specia, and Marco Turchi. 2015. Findings of the 2015 workshop on statistical machine translation. In Proceedings of the Tenth Workshop on Statistical Machine Translation, pages 1–46, Lisbon, Portugal. Association for Computational Linguistics.

Charles Copeland, Jacques Durand, Steven Krauwer, and Bente Maegaard. 1991. The Eurotra linguistic specifications. Technical report, Office for Official Publications of the Commission of the European Communities, Brussels/Luxembourg. Studies in Machine Translation and Natural Language Processing 1.

Eva Csipak. 2015. Free factive subjunctives in German. Doctoral thesis. Niedersächsische Staats- und Universitätsbibliothek Göttingen.

Annemarie Friedrich and Alexis Palmer. 2016. Situation entity types: automatic classification of clause-level aspect. In Proceedings of ACL, Berlin, Germany.

Martin Gellerstam. 1986. Translationese in Swedish novels translated from English. In L. Wollin and H. Lindquist, editors, Translation Studies in Scandinavia, pages 88–95. CWK Gleerup, Lund.

Adrià de Gispert and Jose B. Mariño. 2008. On the impact of morphology in English to Spanish statistical MT. Speech Communication, 50(11-12):1034–1046.

J.A. Hawkins. 2015. A Comparative Typology of English and German: Unifying the Contrasts. Routledge Library Editions: The English Language. Taylor & Francis.

Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of ACL, demonstration session, Prague, Czech Republic.

Ekkehard König and Volker Gast. 2012. Understanding English-German Contrasts. Number 29 in Grundlagen der Anglistik und Amerikanistik. Erich Schmidt Verlag.

Sharid Loáiciga, Thomas Meyer, and Andrei Popescu-Belis. 2014. English-French Verb Phrase Alignment in Europarl for Tense Translation Modeling. In Proceedings of the 9th Language Resources and Evaluation Conference (LREC), Reykjavik, Iceland.

Stella Neumann. 2013. Contrastive Register Variation: A Quantitative Approach to the Comparison of English and German. Trends in Linguistics. Studies and Monographs. De Gruyter Mouton.

Franz Josef Och and Hermann Ney. 2003. A Systematic Comparison of Various Statistical Alignment Models. Computational Linguistics, 29(1):19–51.

José A. Olivas, Cristina Puente, and Andrea Tejado. 2005. Searching for causal relations in text documents for ontological application. In Proceedings of ICAI, Las Vegas, Nevada, USA.

Philipp Petrenz. 2014. Cross-Lingual Genre Classification. Ph.D. thesis, School of Informatics, University of Edinburgh, Scotland.

Anita Ramm and Alexander M. Fraser. 2016. Modeling verbal inflection for English to German SMT. In Proceedings of WMT, Berlin, Germany.

Anita Ramm, Sharid Loáiciga, Annemarie Friedrich, and Alexander Fraser. 2017. Annotating tense, mood and voice for English, French and German. In Proceedings of ACL, demonstration session, Vancouver, Canada.

Geoff Sammon. 2002. Exploring English Grammar. Cornelsen Verlag.

Marina Santini. 2007. Automatic Identification of Genre in Web Pages. Doctoral thesis. University of Brighton.

Christian Scheible, Roman Klinger, and Sebastian Padó. 2016. Model Architectures for Quotation Detection. In Proceedings of ACL, Berlin, Germany.

Serge Sharoff, Zhili Wu, and Katja Markert. 2010. The Web Library of Babel: evaluating genre collections. In Proceedings of LREC, Malta.

Toshikazu Tajiri, Mamoru Komachi, and Yuji Matsumoto. 2012. Tense and aspect error correction for ESL learners using global context. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL): Short Papers - Volume 2, Jeju Island, Korea.

Elke Teich. 2003. Cross-Linguistic Variation in System and Text. A Methodology for the Investigation of Translations and Comparable Texts. Mouton de Gruyter, Berlin.

Marc Verhagen, Inderjeet Mani, Roser Saurí, Robert Knippen, Jess Littman, and James Pustejovsky. 2005. Automating Temporal Annotation with TARSQI. In Proceedings of ACL, demonstration session, Ann Arbor, Michigan, USA.

Harald Weinrich. 2001. Tempus. Besprochene und erzählte Welt, 6 edition. C.H.Beck.

Yang Ye, Victoria Li Fossum, and Steven Abney. 2006. Latent Features in Automatic Tense Translation between Chinese and English. In Proceedings of the Seventh SIGHAN Workshop on Chinese Language Processing, Sydney, Australia.