Modeling Verbal Inflection for English to German

Total Page:16

File Type:pdf, Size:1020Kb

Modeling Verbal Inflection for English to German Modeling verbal inflection for English to German SMT Anita Ramm Alexander Fraser IMS CIS University of Stuttgart, Germany University of Munich, Germany [email protected] [email protected] Abstract not handle the verbs, ensuring neither that verbs appear in the correct position (which is a problem German verbal inflection is frequently due to the highly divergent word order of English wrong in standard statistical machine and German), nor that verbs are correctly inflected translation approaches. German verbs (problematic due to the richer system of verbal in- agree with subjects in person and num- flection in German). In many cases, verbs do not ber, and they bear information about mood match their subjects (in person and number) which and tense. For subject–verb agreement, makes understanding of translations difficult. In we parse German MT output to iden- addition to person and number, the German verbal tify subject–verb pairs and ensure that the inflection also includes information about tense verb agrees with the subject. We show and mood. If these are wrong (i.e. do not cor- that this approach improves subject-verb respond to the tense/mood in the source), very agreement. We model tense/mood transla- important information, such as point of time and tion from English to German by means of modality of an action/state expressed by the verb, a statistical classification model. Although is incorrect. This can lead to false understanding our model shows good results on well- of the overall sentence. formed data, it does not systematically improve tense and mood in MT output. In this paper, we reimplement the nominal in- Reasons include the need for discourse flection modeling for translation to German pre- knowledge, dependency on the domain, sented by Fraser et al. (2012) and combine it and stylistic variety in how tense/mood is with the reordering of the source data (Gojun and translated. We present a thorough analysis Fraser, 2012). In a novel extension, we present a of these problems. method for correction of the agreement errors, and an approach for modeling the translation of tense 1 Introduction and mood from English into German. While the subject-verb agreement problems are dealt with Statistical machine translation of English into Ger- successfully, modeling of tense/mood translation man faces two main problems involving verbs: (i) is problematic due to many reasons which we will correct placement of the verbs, and (ii) generation analyze in detail. of the appropriate inflection for the verb. The position of verbs in German and English In Section 2, we give an overview of the pro- differs greatly and often large-range reorderings cessing pipeline for handling verbal inflection. are needed to place the German verbs in the cor- The method for handling subject–verb agreement rect positions. Gojun and Fraser (2012) showed errors is described in Section 3, while modeling of that the preordering approach applied on English– tense/mood translation is presented in Section 4. to–German SMT overcomes large problems with The impact of the proposed methods for modeling both missing and misplaced verbs. verbal inflection on the quality of the MT output is Fraser et al. (2012) proposed an approach for shown in Section 5. An extensive discussion of the handling inflectional problems in English to Ger- problems related to modeling tense/mood is given man SMT, focusing on the problems of sparsity in Section 6. Finally, future work is presented in caused by nominal inflection. However, they do Section 7. 21 Proceedings of the First Conference on Machine Translation, Volume 1: Research Papers, pages 21–31, Berlin, Germany, August 11-12, 2016. c 2016 Association for Computational Linguistics MT: reordered EN + stemmed DE 2 Overall architecture + DE inflection generation new drugs might lung , ovarian cancer slow 2.1 Ensuring correct German verb placement neue Medikamantekonnte Lungen− und Eierstockkrebs verlangsamen Different positions of verbs in English and Ger- Agreement Tense/Mood − parse − derive features man often require word movements over a large − find SV pairs − predict tense/mood with CRF Medikamente + können distance. This leads to two problems in German − map subject morph to verb translations generated by SMT systems concern- ing the verbs: either the verbs are not generated at 3.Pl Past.Subj all, or they are placed incorrectly. Generate To ensure that our MT output contains the maxi- − SMOR − stem + morph features mum number of (correctly placed) finite verbs, we können + 3.Pl.Past.Subj reorder English prior to training and translation us- ing a small set of reordering rules originally de- könnten scribed by Gojun and Fraser (2012). The verbs in the English part of the training, tuning and testing Figure 1: Processing pipeline. The verbal inflec- data are moved to the positions typical for German tion modeling consists of two components: (i) a which increases the syntactic similarity of English component for deriving agreement features person and German sentences. We train an SMT system and number, and (ii) a component for predicting on the reordered English and apply it to the re- tense and mood. The inflected verbs are generated ordered English test set. with SMOR (Schmid et al., 2004), a morphology This approach has good results in terms of the generation tool for German. position of the verbs in German translations. How- ever, the problem of incorrect verbal inflection baseline by identifying finite verbs in the baseline is unresolved. In fact, the reordering makes the MT output, predicting their morphological fea- agreement problems even worse due to move- tures and finally producing the correct inflected ments of verbs away from their subjects (cf. Sec- output (see Figure 1). tion 3.1). Verbal morphological features include informa- tion about person/number, as well as tense and 2.2 Inflection of the German SMT output mood. Particularly the modeling of tense/mood Fraser et al. (2012) proposed a method for han- translation is interesting: in this paper, we present dling nominal inflection for English to German a method to model the translation of English tense SMT. They work with a stemmed representation of and mood into German considering all German the German words in which certain morphological tenses/moods in a single model. In addition, features such as case, number, etc. are omitted. we present a detailed discussion which is, to our After the translation step, for nominal stemmed knowledge, the first deep analysis of this topic. words in the MT output, morphological features The processing pipeline is given in Figure 1. are predicted using a set of pre-trained classifiers After translation of the reordered English input to and finally surface forms are generated resulting a German stem-like representation, the nominal in fully-inflected German MT output. feature prediction is performed followed by our In their approach, the verbs are neither stemmed novel verbal feature prediction. Finally, the entire nor inflected, but instead handled as normal words. German MT output is inflected by combining the Thus, in the translation step, the decoder (in inter- stems and the predicted features to produce sur- action with the German language model) decides face forms (normal words). on the inflected verb forms in the final MT output. 3 Correction of the subject–verb 2.3 Adding verbal inflection modeling agreement As a baseline SMT system, we use a system trained on the reordered English sentences (cf. 3.1 Problem description Section 2.1) and stemmed German data with nom- In many languages, the subject is located near the inal inflection modeling as a post-processing step corresponding finite verb. However, in languages (cf. Section 2.2). In our system, we extend the such as German, the subject might be very far from 22 Data avg dist in words >5 words that I1.Sg. that yesterday to you said1.Sg. News 3.9 24% Europarl 3.7 22% Crawled 2.9 15% dass ich1.Sg. dir das gestern gesagt habe1.Sg. Table 1: Subject–verb distances in German texts. Figure 2: Example of a subject–verb distance caused by the reordering of the English clause the verb. We extracted subject–verb pairs from ’that I said that yesterday to you’. German corpora and computed their distances. The results are summarized in Table 1. News and Europarl are composed of more com- 3.2 Parsing for detection of subject–verb plex sentences than the corpus crawled from the pairs internet. While in the crawled data, there are more sentences with smaller subject–verb dis- tances, News and Europarl expose larger distances Agreement correction depends on correct identifi- between subjects and finite verbs. cation of subject–verb pairs. Although we work with English parses where the subjects can be cor- Although the average distance in words is rather rectly identified in many cases, this information small, there is a fair amount of subject–verb pairs source seems not to be sufficient. Problematic are with distance larger than 5 words (in Europarl syntactic divergences where the English subject 22%, in News 25%) which are problematic for does not correspond to the German subject. training the translation system. Even for small distances, it is not guaranteed that the agreement Initially, we aimed at predicting agreement fea- is generated correctly due to the missing appro- tures. However, we were not able to build a clas- priate translation phrases. Moreover, the German sifier with satisfying results due to the problems language model trained on the same data would mentioned above. We thus applied a method im- probably have problems to extract n-grams which plemented in Depfix (Rosa et al., 2012). They ensure the correct subject–verb agreement for all parse the MT output, extract subject–verb pairs possible subject–verb combinations.
Recommended publications
  • On Root Modality and Thematic Relations in Tagalog and English*
    Proceedings of SALT 26: 775–794, 2016 On root modality and thematic relations in Tagalog and English* Maayan Abenina-Adar Nikos Angelopoulos UCLA UCLA Abstract The literature on modality discusses how context and grammar interact to produce different flavors of necessity primarily in connection with functional modals e.g., English auxiliaries. In contrast, the grammatical properties of lexical modals (i.e., thematic verbs) are less understood. In this paper, we use the Tagalog necessity modal kailangan and English need as a case study in the syntax-semantics of lexical modals. Kailangan and need enter two structures, which we call ‘thematic’ and ‘impersonal’. We show that when they establish a thematic dependency with a subject, they express necessity in light of this subject’s priorities, and in the absence of an overt thematic subject, they express necessity in light of priorities endorsed by the speaker. To account for this, we propose a single lexical entry for kailangan / need that uniformly selects for a ‘needer’ argument. In thematic constructions, the needer is the overt subject, and in impersonal constructions, it is an implicit speaker-bound pronoun. Keywords: modality, thematic relations, Tagalog, syntax-semantics interface 1 Introduction In this paper, we observe that English need and its Tagalog counterpart, kailangan, express two different types of necessity depending on the syntactic structure they enter. We show that thematic constructions like (1) express necessities in light of priorities of the thematic subject, i.e., John, whereas impersonal constructions like (2) express necessities in light of priorities of the speaker. (1) John needs there to be food left over.
    [Show full text]
  • Classifiers: a Typology of Noun Categorization Edward J
    Western Washington University Western CEDAR Modern & Classical Languages Humanities 3-2002 Review of: Classifiers: A Typology of Noun Categorization Edward J. Vajda Western Washington University, [email protected] Follow this and additional works at: https://cedar.wwu.edu/mcl_facpubs Part of the Modern Languages Commons Recommended Citation Vajda, Edward J., "Review of: Classifiers: A Typology of Noun Categorization" (2002). Modern & Classical Languages. 35. https://cedar.wwu.edu/mcl_facpubs/35 This Book Review is brought to you for free and open access by the Humanities at Western CEDAR. It has been accepted for inclusion in Modern & Classical Languages by an authorized administrator of Western CEDAR. For more information, please contact [email protected]. J. Linguistics38 (2002), I37-172. ? 2002 CambridgeUniversity Press Printedin the United Kingdom REVIEWS J. Linguistics 38 (2002). DOI: Io.IOI7/So022226702211378 ? 2002 Cambridge University Press Alexandra Y. Aikhenvald, Classifiers: a typology of noun categorization devices.Oxford: OxfordUniversity Press, 2000. Pp. xxvi+ 535. Reviewedby EDWARDJ. VAJDA,Western Washington University This book offers a multifaceted,cross-linguistic survey of all types of grammaticaldevices used to categorizenouns. It representsan ambitious expansion beyond earlier studies dealing with individual aspects of this phenomenon, notably Corbett's (I99I) landmark monograph on noun classes(genders), Dixon's importantessay (I982) distinguishingnoun classes fromclassifiers, and Greenberg's(I972) seminalpaper on numeralclassifiers. Aikhenvald'sClassifiers exceeds them all in the number of languages it examines and in its breadth of typological inquiry. The full gamut of morphologicalpatterns used to classify nouns (or, more accurately,the referentsof nouns)is consideredholistically, with an eye towardcategorizing the categorizationdevices themselvesin terms of a comprehensiveframe- work.
    [Show full text]
  • The Term Declension, the Three Basic Qualities of Latin Nouns, That
    Chapter 2: First Declension Chapter 2 covers the following: the term declension, the three basic qualities of Latin nouns, that is, case, number and gender, basic sentence structure, subject, verb, direct object and so on, the six cases of Latin nouns and the uses of those cases, the formation of the different cases in Latin, and the way adjectives agree with nouns. At the end of this lesson we’ll review the vocabulary you should memorize in this chapter. Declension. As with conjugation, the term declension has two meanings in Latin. It means, first, the process of joining a case ending onto a noun base. Second, it is a term used to refer to one of the five categories of nouns distinguished by the sound ending the noun base: /a/, /ŏ/ or /ŭ/, a consonant or /ĭ/, /ū/, /ē/. First, let’s look at the three basic characteristics of every Latin noun: case, number and gender. All Latin nouns and adjectives have these three grammatical qualities. First, case: how the noun functions in a sentence, that is, is it the subject, the direct object, the object of a preposition or any of many other uses? Second, number: singular or plural. And third, gender: masculine, feminine or neuter. Every noun in Latin will have one case, one number and one gender, and only one of each of these qualities. In other words, a noun in a sentence cannot be both singular and plural, or masculine and feminine. Whenever asked ─ and I will ask ─ you should be able to give the correct answer for all three qualities.
    [Show full text]
  • 30. Tense Aspect Mood 615
    30. Tense Aspect Mood 615 Richards, Ivor Armstrong 1936 The Philosophy of Rhetoric. Oxford: Oxford University Press. Rockwell, Patricia 2007 Vocal features of conversational sarcasm: A comparison of methods. Journal of Psycho- linguistic Research 36: 361−369. Rosenblum, Doron 5. March 2004 Smart he is not. http://www.haaretz.com/print-edition/opinion/smart-he-is-not- 1.115908. Searle, John 1979 Expression and Meaning. Cambridge: Cambridge University Press. Seddiq, Mirriam N. A. Why I don’t want to talk to you. http://notguiltynoway.com/2004/09/why-i-dont-want- to-talk-to-you.html. Singh, Onkar 17. December 2002 Parliament attack convicts fight in court. http://www.rediff.com/news/ 2002/dec/17parl2.htm [Accessed 24 July 2013]. Sperber, Dan and Deirdre Wilson 1986/1995 Relevance: Communication and Cognition. Oxford: Blackwell. Voegele, Jason N. A. http://www.jvoegele.com/literarysf/cyberpunk.html Voyer, Daniel and Cheryl Techentin 2010 Subjective acoustic features of sarcasm: Lower, slower, and more. Metaphor and Symbol 25: 1−16. Ward, Gregory 1983 A pragmatic analysis of epitomization. Papers in Linguistics 17: 145−161. Ward, Gregory and Betty J. Birner 2006 Information structure. In: B. Aarts and A. McMahon (eds.), Handbook of English Lin- guistics, 291−317. Oxford: Basil Blackwell. Rachel Giora, Tel Aviv, (Israel) 30. Tense Aspect Mood 1. Introduction 2. Metaphor: EVENTS ARE (PHYSICAL) OBJECTS 3. Polysemy, construal, profiling, and coercion 4. Interactions of tense, aspect, and mood 5. Conclusion 6. References 1. Introduction In the framework of cognitive linguistics we approach the grammatical categories of tense, aspect, and mood from the perspective of general cognitive strategies.
    [Show full text]
  • Corpus Study of Tense, Aspect, and Modality in Diglossic Speech in Cairene Arabic
    CORPUS STUDY OF TENSE, ASPECT, AND MODALITY IN DIGLOSSIC SPEECH IN CAIRENE ARABIC BY OLA AHMED MOSHREF DISSERTATION Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Linguistics in the Graduate College of the University of Illinois at Urbana-Champaign, 2012 Urbana, Illinois Doctoral Committee: Professor Elabbas Benmamoun, Chair Professor Eyamba Bokamba Professor Rakesh M. Bhatt Assistant Professor Marina Terkourafi ABSTRACT Morpho-syntactic features of Modern Standard Arabic mix intricately with those of Egyptian Colloquial Arabic in ordinary speech. I study the lexical, phonological and syntactic features of verb phrase morphemes and constituents in different tenses, aspects, moods. A corpus of over 3000 phrases was collected from religious, political/economic and sports interviews on four Egyptian satellite TV channels. The computational analysis of the data shows that systematic and content morphemes from both varieties of Arabic combine in principled ways. Syntactic considerations play a critical role with regard to the frequency and direction of code-switching between the negative marker, subject, or complement on one hand and the verb on the other. Morph-syntactic constraints regulate different types of discourse but more formal topics may exhibit more mixing between Colloquial aspect or future markers and Standard verbs. ii To the One Arab Dream that will come true inshaa’ Allah! عربية أنا.. أميت دمها خري الدماء.. كما يقول أيب الشاعر العراقي: بدر شاكر السياب Arab I am.. My nation’s blood is the finest.. As my father says Iraqi Poet: Badr Shaker Elsayyab iii ACKNOWLEDGMENTS I’m sincerely thankful to my advisor Prof. Elabbas Benmamoun, who during the six years of my study at UIUC was always kind, caring and supportive on the personal and academic levels.
    [Show full text]
  • 1 Noun Classes and Classifiers, Semantics of Alexandra Y
    1 Noun classes and classifiers, semantics of Alexandra Y. Aikhenvald Research Centre for Linguistic Typology, La Trobe University, Melbourne Abstract Almost all languages have some grammatical means for the linguistic categorization of noun referents. Noun categorization devices range from the lexical numeral classifiers of South-East Asia to the highly grammaticalized noun classes and genders in African and Indo-European languages. Further noun categorization devices include noun classifiers, classifiers in possessive constructions, verbal classifiers, and two rare types: locative and deictic classifiers. Classifiers and noun classes provide a unique insight into how the world is categorized through language in terms of universal semantic parameters involving humanness, animacy, sex, shape, form, consistency, orientation in space, and the functional properties of referents. ABBREVIATIONS: ABS - absolutive; CL - classifier; ERG - ergative; FEM - feminine; LOC – locative; MASC - masculine; SG – singular 2 KEY WORDS: noun classes, genders, classifiers, possessive constructions, shape, form, function, social status, metaphorical extension 3 Almost all languages have some grammatical means for the linguistic categorization of nouns and nominals. The continuum of noun categorization devices covers a range of devices from the lexical numeral classifiers of South-East Asia to the highly grammaticalized gender agreement classes of Indo-European languages. They have a similar semantic basis, and one can develop from the other. They provide a unique insight into how people categorize the world through their language in terms of universal semantic parameters involving humanness, animacy, sex, shape, form, consistency, and functional properties. Noun categorization devices are morphemes which occur in surface structures under specifiable conditions, and denote some salient perceived or imputed characteristics of the entity to which an associated noun refers (Allan 1977: 285).
    [Show full text]
  • Higher-Order Grammatical Features in Multilingual BERT Isabel Papadimitriou Ethan A
    Deep Subjecthood: Higher-Order Grammatical Features in Multilingual BERT Isabel Papadimitriou Ethan A. Chi Stanford University Stanford University [email protected] [email protected] Richard Futrell Kyle Mahowald University of California, Irvine University of California, Santa Barbara [email protected] [email protected] Abstract We investigate how Multilingual BERT (mBERT) encodes grammar by examining how the high-order grammatical feature of morphosyntactic alignment (how different languages define what counts as a “subject”) is manifested across the embedding spaces of different languages. To understand if and how morphosyntactic alignment affects contextual embedding spaces, we train classifiers to recover the subjecthood of mBERT embeddings in transitive sentences (which do not contain overt information about morphosyntactic alignment) and then evaluate Figure 1: Top: Illustration of the difference between them zero-shot on intransitive sentences alignment systems. A (for agent) is notation used for (where subjecthood classification depends on the transitive subject, and O for the transitive ob- alignment), within and across languages. We ject:“The lawyer chased the dog.” S denotes the find that the resulting classifier distributions intransitive subject: “The lawyer laughed.” The blue reflect the morphosyntactic alignment of their circle indicates which roles are marked as “subject” in training languages. Our results demonstrate each system. that mBERT representations are influenced by Bottom: Illustration of the training and test process. high-level grammatical features that are not We train a classifier to distinguish A from O arguments manifested in any one input sentence, and that using the BERT contextual embeddings, and test the this is robust across languages. Further ex- classifier’s behavior on intransitive subjects (S).
    [Show full text]
  • Declension of Nouns
    DECLENSION OF NOUNS In English, the relationship between words in a sentence depends primarily on word order. The difference between the god desires the girl and the girl desires the god is immediately apparent to us. Latin does not depend on word order for basic meaning, but on inflections (changes in the endings of words) to indicate the function of words within a sentence. Thus the god desires the girl can be expressed in Latin deus puellam desiderat, puellam deus desiderat, or desiderat puellam deus without any change in basic meaning. The accusative ending of puellam shows that the girl is being acted upon (i.e., is the object of the verb) and is not the actor (i.e., the subject of the verb). Similarly, the nominative form of deus shows that the god is the actor (agent) in the sentence, not the object of the verb. The inflection of nouns is called declension. The individual declensions are called cases, and together they form the case system. Nouns, pronouns, adjectives and participles are declined in six Cases: nominative, genitive, dative, accusative, ablative, and vocative and two Numbers (singular and plural). (The locative, an archaic case, existed in the classical period only for a few words). Nominative Indicates the subject of a sentence. (The boy loves the book). Genitive Indicates possession. (The boy loves the girl’s book). Dative Indicates indirect object. (The boy gave the book to the girl). Accusative Indicates direct object. (The boy loves the book). Ablative Answers the questions from where? by what means? how? from what cause? in what manner? when? or where? The ablative is used to show separation (from), instrumentality or means (by, with), accompaniment (with), or locality (at).
    [Show full text]
  • The Position of Subjects*
    Lingua 85 (1991) 21 l-258. North-Holland 211 The position of subjects* Hilda Koopman and Dominique Sportiche Department of Linguistics, UCLA, Los Angeles, CA 90024, USA Grammatical theories all use in one form or another the concept of canonical position of a phrase. If this notion is used in the syntax, when comparing the two sentences: (la) John will see Bill. (1 b) Bill John will see. we say that Bill occupies its canonical position in (la) but not in (lb). Adopting the terminology of the Extended Standard Theory, we can think of the canonical position of a phrase as its D-structure position. Since the concept of canonical position is available, it becomes legitimate to ask of each syntactic unit in a given sentence what its canonical position is, relative to the other units of the sentence. The central question we address in this article is: what is the canonical position of subjects1 Starting with English, we propose that the structure of an English clause is as in (2): * The first section of this article has circulated as part of Koopman and Sportiche (1988) and is a written version of talks given in various places. It was given in March 1985 at the GLOW conference in Brussels as Koopman and Sportiche (1985), at the June 1985 CLA meeting in Montreal, at MIT and Umass Amherst in the winter of 1986, and presented at UCLA and USC since. The input of these audiences is gratefully acknowledged. The second section is almost completely new. 1 For related ideas on what we call the canonical postion of subjects, see Contreras (1987), Kitagawa (1986) Kuroda (1988), Speas (1986) Zagona (1982).
    [Show full text]
  • The Grammatical Category of Modality∗
    The grammatical category of modality∗ Valentine Hacquard University of Maryland, College Park, Maryland, United States [email protected] Abstract In many languages, the same words are used to express epistemic and root modality. These modals further tend to interact with tense and aspect in systematic ways, based on their interpretation. Is this pattern accidental, or a consequence of grammar or mean- ing? I address this question by: (i) comparing `grammatical' modals to verbs/adjectives that share meanings with modals, but not the same scope constraints; (ii) examining pat- terns of grammaticalization from `lexical' to `grammatical' modality; (iii) comparing scope interactions in languages where modals are 'polysemous' and in those where they are not. 1 Introduction In many languages, the same modal words are used to express a variety of `root' and `epistemic' meanings. English may, for instance, can express deontic or epistemic possibility. About half of the 200+ languages in [vdAA05] have a single form that is used to express both kinds of modality. Yet, in many other languages, modal markers are unambiguously determined for meaning. In the Kratzerian tradition ([Kra81, Kra91, Kra12]), modals are lexically specified only for force (as existential or universal quantifiers over worlds), and the various meanings a `polysemous'1 modal expresses arise from the modal combining with various modal bases and ordering sources. This account, based on the case of polysemous languages, easily extends to non-polysemous ones: a modal can further lexically specify the kind of modal base and ordering source it allows, restricting its meaning to a single epistemic or root meaning. An alternative account based on non-polysemous cases would provide separate lexical entries for the various modals, and extend to polysemous cases by postulating ambiguity.
    [Show full text]
  • Basic German: a Grammar and Workbook
    BASIC GERMAN: A GRAMMAR AND WORKBOOK Basic German: A Grammar and Workbook comprises an accessible reference grammar and related exercises in a single volume. It introduces German people and culture through the medium of the language used today, covering the core material which students would expect to encounter in their first years of learning German. Each of the 28 units presents one or more related grammar topics, illustrated by examples which serve as models for the exercises that follow. These wide-ranging and varied exercises enable the student to master each grammar point thoroughly. Basic German is suitable for independent study and for class use. Features include: • Clear grammatical explanations with examples in both English and German • Authentic language samples from a range of media • Checklists at the end of each Unit to reinforce key points • Cross-referencing to other grammar chapters • Full exercise answer key • Glossary of grammatical terms Basic German is the ideal reference and practice book for beginners but also for students with some knowledge of the language. Heiner Schenke is Senior Lecturer in German at the University of Westminster and Karen Seago is Course Leader for Applied Translation at the London Metropolitan University. Other titles available in the Grammar Workbooks series are: Basic Cantonese Intermediate Cantonese Basic Chinese Intermediate Chinese Intermediate German Basic Polish Intermediate Polish Basic Russian Intermediate Russian Basic Welsh Intermediate Welsh Titles of related interest published
    [Show full text]
  • Modeling the Position and Inflection of Verbs in English to German
    Modeling the position and inflection of verbs in English to German machine translation Von der Fakult¨atInformatik, Elektrotechnik und Informationstechnik der Universit¨at Stuttgart zur Erlangung der W¨urdeeines Doktors der Philosophie (Dr. phil.) genehmigte Abhandlung. Vorgelegt von Anita Ramm aus Vukovar, Kroatien Hauptberichter Prof. Dr. Alexander Fraser 1. Mitberichterin PD Dr. Sabine Schulte im Walde 2. Mitberichter Prof. Dr. Jonas Kuhn Tag der m¨undlichen Pr¨ufung:23.03.2018 Institut f¨urMaschinelle Sprachverarbeitung der Universit¨atStuttgart 2018 Abstract Machine translation (MT) is automatic translation of speech or text from a source lan- guage (SL) into a target language (TL). Machine translation is performed without hu- man interaction: a text or a speech signal is used as input to a computer program which automatically generates translation of the input data. There are two main approaches to machine translation: rule-based and corpus-based. Rule-based MT systems rely on manually or semi-automatically acquired rules which describe lexical, as well as syntactic correspondences between SL and TL. Corpus-based methods learn such correspondences automatically using large sets of parallel texts, i.e., texts which are translations of each other. Corpus-based methods such as statistical machine translation (SMT) and neural machine translation (NMT) rely on statistics which express the probability of translat- ing a specific SL translation unit into a specific TL unit (typically, word and/or word sequence). While SMT is a combination of different statistical models, NMT makes use of a neural network which encodes parallel SL and TL contexts in a more sophisticated way. Many problems have been observed when translating from English into German using SMT.
    [Show full text]