A Corpus-Based Study of Tense and Mood in English and German
Pragmatic information in translation: a corpus-based study of tense and mood in English and German

Anita Ramm†, Ekaterina Lapshinova-Koltunski‡, Alexander Fraser†
†Center for Information and Language Processing, LMU Munich
‡Saarland University

Abstract

Grammatical tense and mood are important linguistic phenomena to consider in natural language processing (NLP) research. We consider the correspondence between English and German tense and mood in translation. Human translators do not find this correspondence easy, and as we will show through careful analysis, there are no simplistic ways to map tense and mood from one language to another. Our observations about the challenges of human translation of tense and mood have important implications for multilingual NLP. Of particular importance is the challenge of modeling tense and mood in rule-based, phrase-based statistical and neural machine translation.

1 Introduction

This paper analyzes tense and mood in English and German from the perspective of the data commonly used to train MT systems or to model tense/mood, namely freely available bilingual texts. The need for a thorough analysis of tense/mood in parallel texts arises from the fact that there is a high degree of variation between the two languages, resulting in a many-to-many relation in the tense/mood translation between English and German. In particular, frequently occurring unintuitive tense correspondences and the low frequency of many tense/mood combinations are problematic for different NLP tasks using parallel corpora. We study the correspondences in a large English-German parallel corpus and explain them from the point of view of different pragmatic factors: contextual constraints in terms of genre/user preferences or textual properties, and tense interchangeability. We compare the English and German morpho-syntactic tense sets, which exhibit tense correspondence gaps in both directions, and discuss the impact of the translation process on the tense/mood variability in our data.

Finally, we look at the modeling of tense and mood for machine translation, pointing to important features needed to transfer tenses between languages. Our analysis indicates that bilingual modeling of tense and mood cannot be properly done by considering solely lexical/syntactic features (e.g., words, POS tags, etc.), a finding also supported by previous work (Ye et al., 2006). Instead, incorporation of pragmatic information is required, which is currently not directly accessible to most NLP systems. We summarize the pragmatic information required and provide a list of available tools for automatic annotation with the respective information, which will be of direct use in future efforts to solve this difficult modeling task.

In the following, we present theoretical issues and related work (Section 2) and a quantitative analysis of the usage of the tense/mood correspondences in English-German parallel data and their modeling in the context of MT (Section 3), and we summarize the findings in Section 4.

2 Theoretical issues and related work

2.1 Contrasts in English and German tense and mood systems

As is known from contrastive grammars (König and Gast, 2012; Hawkins, 2015), English and German share a common ground of six morpho-syntactic tenses: present/Präsens, simple past/Präteritum, present perfect/Perfekt, past perfect/Plusquamperfekt, future I/Futur I and future II/Futur II. We summarize those with examples in both languages in Table 1, which we created to show the correspondence between these languages. In English, each of the tenses has a progressive variant. The German tense system does not have an explicit marking of the progressive aspect. But German has a larger set of subjunctive tense forms. While a few of them have direct morpho-syntactic counterparts in English, most of them correspond to indicative tenses in English.

Morph. tense  English synt. tense          English example                                   German synt. tense  German example
present       present simple               (I) read                                          Präsens             (Ich) lese
              present progressive          (I) am reading
              present perfect              (I) have read                                     Perfekt             (Ich) habe gelesen
              present perfect progressive  (I) have been reading
              future I                     (I) will read / (I) am going to read              Futur I             (Ich) werde lesen
              future I progressive         (I) will be reading / (I) am going to be reading
              future II                    (I) will have read                                Futur II            (Ich) werde gelesen haben
              future II progressive        (I) will have been reading
past          past simple                  (I) read                                          Präteritum          (Ich) las
              past progressive             (I) was reading
              past perfect                 (I) had read                                      Plusquamperfekt     (Ich) hatte gelesen
              past perfect progressive     (I) had been reading
present*      conditional I                (I) would read                                    Konjunktiv II       (Ich) würde lesen
              conditional I progressive    (I) would be reading
past*         conditional II               (I) would have read                               Konjunktiv II       (Ich) hätte gelesen
              conditional II progressive   (I) would have been reading
present*                                                                                     Konjunktiv I        (Er) lese / (Er) werde lesen

Table 1: List of the tenses in English and German in active voice. The table indicates the tense correspondences in terms of their morpho-syntactic structure.

The meaning of a specific tense form may vary considerably, too. We summarize the contrasts related to the meaning of the English and German tenses described by König and Gast (2012) in Table 2. This description refers to different aspects such as the time reference (past, futurate, future, etc.) and the relation to the moment of utterance (resultative, universal, narrative). In other words, the (non-)parallelism of the respective tenses can be established by considering specific semantic properties of a given verb and the utterance that the respective verb occurs in. Different aspects in the English tense system have different impacts on the use of tenses. For instance, in contrast to the simple present tense, the present progressive can be used in the futurate context. In German, Präsens can almost always be used to refer to the future. The English progressive tense lacks direct counterparts in German and is therefore translated into a number of different German tenses.

Use             German                                    English
Präsens/present tense
non-past        Ich schlafe von 12 bis 7.                 I sleep from midnight to seven.
futurate        Morgen weiß ich das.                      → future tense (I will know that tomorrow.)
Präteritum/simple past
past time       Ich schlief den ganzen Tag.               I slept the whole day.
Futur I/future tense
future time     Ich werde schlafen.                       I will sleep. / I am going to sleep.
Perfekt/present perfect
resultative     Jemand hat mein Auto gestohlen.           Someone has stolen my car.
existential     Ich habe (schon mal) Tennis gespielt.     I have played tennis.
hot news        Kanzler Schröder ist zurückgetreten.      Chancellor Schröder has resigned.
universal       → Präsens (Ich lebe hier seit 2 Jahren.)  I have lived here for two years.
narrative       Ich bin gestern im Theater gewesen.       → past tense (I was in theater yesterday.)
Futur II/future perfect
future results  Ich werde das bis morgen erledigt haben.  I will have done this by tomorrow.
Plusquamperfekt/past perfect
pre-past        Ich hatte geschlafen.                     I had slept.

Table 2: Meaning of tenses in English and German (König and Gast, 2012, p. 92)

English and German also differ greatly with respect to grammatical mood. In German, the subjunctive is expressed in the verbal morphology and interacts with the German tense system, changing the time of an utterance. German distinguishes between two subjunctive morpho-syntactic forms: Konjunktiv I and Konjunktiv II. The latter is used in the context of conditional and counterfactual utterances. Usually, sentences with Konjunktiv II are composed of at least two clauses. There are, however, also free factive occurrences of Konjunktiv II, where it occurs in a simple sentence, see Example (1). Such sentences may, for instance, indicate politeness.

(1) Ich hätte gern ein Glas Wasser.
    I have gladly a glass water.
    'I'd like to have a glass of water.'

Both Konjunktiv I and Konjunktiv II can be used in the context of reported speech. Note, however, that the use of the subjunctive mood is not grammatically required to signal reported speech. In fact, the two Konjunktiv forms and the indicative mood are often used interchangeably in reported speech (Csipak, 2015). For the English subjunctive mood, König and Gast (2012) prefer the term quasi-subjunctive, since the subjunctive mood in English exists only for the verb be. Other forms used in subjunctive contexts correspond to the infinitives.

The German Präsens and Futur I are interchangeable in many contexts. In the futurate use, Präsens is usually combined with a temporal phrase which points to the future; in (2), the adverbial morgen provides the respective temporal information. However, the temporal phrase is not always overtly given in the sentence under consideration: in (3), the verb kommen in the present tense refers to the future, which is evident only from the preceding sentence.

(2) Ich komme morgen.
    I come tomorrow.
    'I'll come tomorrow.'

(3) Kommst du morgen? Ja, ich komme.
    Come you tomorrow? Yes, I come.
    'I'll come tomorrow.'

Another prominent example of tense interchangeability in German is related to the past tenses. There are some fine-grained differences between the respective tenses, but at least Präteritum and Perfekt are interchangeable in many contexts, see Sammon (2002). In fact, the dominance of either of the forms is a matter of the author's preference or contextual constraints, see Section 2.2 below. For instance, Perfekt is often used in spoken language, while Präteritum is more frequently used in writing. Furthermore, there is a certain lexical preference: auxiliaries and modals are more frequently

ent (i.e., domain-specific) distributional specifics: past tenses are rather typical for narrative texts, while present tense verbs are more typical for argumentative texts such as political essays, popular science articles, etc. These findings are in line with the classification of tenses proposed by Weinrich (2001). In addition to contextual constraints expressed in genre or register, translation of tenses may also follow a set of rules defined for a specific translation project.
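
To make the morpho-syntactic pairing of Table 1 concrete, the following Python sketch encodes it as a lookup table. It is only an illustration: the names EN_TO_DE and german_counterpart are hypothetical and not part of the study, and the mapping reflects the structural alignment of the table, not a translation rule. As discussed above, the correspondences actually observed in translation are many-to-many and depend on pragmatic factors that such a table cannot capture.

```python
# Illustrative lookup built from the rows of Table 1 (active voice only).
# EN_TO_DE and german_counterpart are hypothetical names, not from the paper.
EN_TO_DE = {
    "present simple": "Präsens",
    "present perfect": "Perfekt",
    "future I": "Futur I",
    "future II": "Futur II",
    "past simple": "Präteritum",
    "past perfect": "Plusquamperfekt",
    "conditional I": "Konjunktiv II",
    "conditional II": "Konjunktiv II",
}

PROGRESSIVE = " progressive"


def german_counterpart(english_tense: str) -> str:
    """Return the German form aligned with an English tense in Table 1."""
    base = english_tense
    # German has no explicit progressive marking, so a progressive variant
    # is looked up under its non-progressive base tense.
    if base.endswith(PROGRESSIVE):
        base = base[: -len(PROGRESSIVE)]
        if base in ("present", "past"):
            base += " simple"  # "present progressive" -> "present simple"
    return EN_TO_DE[base]


if __name__ == "__main__":
    for tense in ("present progressive", "past perfect", "conditional II progressive"):
        print(tense, "->", german_counterpart(tense))
```

Note that the table is not invertible: both conditional I and conditional II are aligned with Konjunktiv II, and Konjunktiv I has no English entry at all, which is one small instance of the correspondence gaps described in this section.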