Inducing German Semantic Verb Classes from Purely Syntactic Subcategorisation Information

Total Page:16

File Type:pdf, Size:1020Kb

Inducing German Semantic Verb Classes from Purely Syntactic Subcategorisation Information Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 223-230. Inducing German Semantic Verb Classes from Purely Syntactic Subcategorisation Information Sabine Schulte im Walde Chris Brew Institut für Maschinelle Sprachverarbeitung Department of Linguistics Universität Stuttgart The Ohio State University Azenbergstraße 12, 70174 Stuttgart, Germany Columbus, USA, OH 43210-1298 [email protected] [email protected] Abstract since it provides a principled basis for filling gaps in available lexical knowledge. For example, the En- The paper describes the application of k- glish verb classification has been used for applica- Means, a standard clustering technique, tions such as machine translation (Dorr, 1997), word to the task of inducing semantic classes sense disambiguation (Dorr and Jones, 1996), and for German verbs. Using probability document classification (Klavans and Kan, 1998). distributions over verb subcategorisation Various attempts have been made to infer conve- frames, we obtained an intuitively plausi- niently observable morpho-syntactic and semantic ble clustering of 57 verbs into 14 classes. properties for English verb classes (Dorr and Jones, The automatic clustering was evaluated 1996; Lapata, 1999; Stevenson and Merlo, 1999; against independently motivated, hand- Schulte im Walde, 2000; McCarthy, 2001). constructed semantic verb classes. A series of post-hoc cluster analyses ex- To our knowledge this is the first work to ob- plored the influence of specific frames and tain German verb classes automatically. We used frame groups on the coherence of the verb a robust statistical parser (Schmid, 2000) to ac- classes, and supported the tight connec- quire purely syntactic subcategorisation information tion between the syntactic behaviour of for verbs. The information was provided in form the verbs and their lexical meaning com- of probability distributions over verb frames for ponents. each verb. There were two conditions: the first with relatively coarse syntactic verb subcategorisa- tion frames, the second a more delicate classifica- 1 Introduction tion subdividing the verb frames of the first con- A long-standing linguistic hypothesis asserts a tight dition using prepositional phrase information (case connection between the meaning components of a plus preposition). In both conditions verbs were verb and its syntactic behaviour: To a certain ex- clustered using k-Means, an iterative, unsupervised, tent, the lexical meaning of a verb determines its be- hard clustering method with well-known properties, haviour, particularly with respect to the choice of its cf. (Kaufman and Rousseeuw, 1990). The goal of a arguments. The theoretical foundation has been es- series of cluster analyses was (i) to find good values tablished in extensive work on semantic verb classes for the parameters of the clustering process, and (ii) such as (Levin, 1993) for English and (Vázquez to explore the role of the syntactic frame descrip- et al., 2000) for Spanish: each verb class contains tions in verb classification, to demonstrate the im- verbs which are similar in their meaning and in their plicit induction of lexical meaning components from syntactic properties. syntactic properties, and to suggest ways in which From a practical point of view, a verb classifi- the syntactic information might further be refined. cation supports Natural Language Processing tasks, Our long term goal is to support the development of high-quality and large-scale lexical resources. sitional phrases, according to their frequencies in the corpus. Prepositional phrases are referred to by 2 Syntactic Descriptors for Verb Frames case and preposition, such as ‘Dat.mit’, ‘Akk.für’. The resulting lexical subcategorisation for reden and The syntactic subcategorisation frames for German the frame type np whose total joint probability is verbs were obtained by unsupervised learning in a 0.35820, is displayed in Table 2 (for probability val- statistical grammar framework (Schulte im Walde et ues 1%). al., 2001): a German context-free grammar contain- ing frame-predicting grammar rules and information Refined Frame Prob about lexical heads was trained on 25 million words np:Akk.über acc / ‘about’ 0.11981 np:Dat.von dat / ‘about’ 0.11568 of a large German newspaper corpus. The lexi- np:Dat.mit dat / ‘with’ 0.06983 calised version of the probabilistic grammar served np:Dat.in dat / ‘in’ 0.02031 as source for syntactic descriptors for verb frames Table 2: Refined np distribution for reden (Schulte im Walde, 2002b). The verb frame types contain at most three The subcategorisation frame descriptions were for- arguments. Possible arguments in the frames mally evaluated by comparing the automatically are nominative (n), dative (d) and accusative (a) generated verb frames against manual definitions in noun phrases, reflexive pronouns (r), prepositional the German dictionary Duden – Das Stilwörterbuch phrases (p), expletive es (x), non-finite clauses (i), (Dudenredaktion, 2001). The F-score was 65.30% finite clauses (s-2 for verb second clauses, s-dass for with and 72.05% without prepositional phrase in- dass-clauses, s-ob for ob-clauses, s-w for indirect formation: the automatically generated data is both wh-questions), and copula constructions (k). For easy to produce in large quantities and reliable example, subcategorising a direct (accusative case) enough to serve as proxy for human judgement object and a non-finite clause would be represented (Schulte im Walde, 2002a). by nai. We defined a total of 38 subcategorisation frame types, according to the verb subcategorisa- 3 German Semantic Verb Classes tion potential in the German grammar (Helbig and Semantic verb classes have been defined for sev- Buscha, 1998), with few further restrictions on ar- eral languages, with dominant examples concern- gument combination. ing English (Levin, 1993) and Spanish (Vázquez et We extracted verb-frame distributions from the al., 2000). The basic linguistic hypothesis underly- trained lexicalised grammar. Table 1 shows an ing the construction of the semantic classes is that example distribution for the verb glauben ‘to verbs in the same class share both meaning compo- think/believe’ (for probability values 1%). nents and syntactic behaviour, since the meaning of Frame Prob a verb is supposed to influence its behaviour in the ns-dass 0.27945 sentence, especially with regard to the choice of its ns-2 0.27358 np 0.09951 arguments. n 0.08811 We hand-constructed a concise classification with na 0.08046 14 semantic verb classes for 57 German verbs before ni 0.05015 nd 0.03392 we initiated any clustering experiments. We have on nad 0.02325 hand a larger set of verbs and a more elaborate clas- nds-2 0.01011 sification, but choose to work on the smaller set for Table 1: Probability distribution for glauben the moment, since an important component of our research program is an informative post-hoc analysis We also created a more delicate version of subcate- which becomes infeasible with larger datasets. The gorisation frames that discriminates between differ- semantic aspects and majority of verbs are closely ent kinds of pp-arguments. This was done by dis- related to Levin’s English classes. They are consis- tributing the frequency mass of prepositional phrase tent with the German verb classification in (Schu- frame types (np, nap, ndp, npr, xp) over the prepo- macher, 1986) as far as the relevant verbs appear in his less extensive semantic ‘fields’. 4 Clustering Methodology 1. Aspect: anfangen, aufhören, beenden, begin- Clustering is a standard procedure in multivariate nen, enden data analysis. It is designed to uncover an inher- 2. Propositional Attitude: ahnen, denken, ent natural structure of the data objects, and the glauben, vermuten, wissen equivalence classes induced by the clusters provide 3. Transfer of Possession (Obtaining): bekom- a means for generalising over these objects. In our men, erhalten, erlangen, kriegen 4. Transfer of Possession (Supply): bringen, case, clustering is realised on verbs: the data objects liefern, schicken, vermitteln, zustellen are represented by verbs, and the data features for 5. Manner of Motion: fahren, fliegen, rudern, describing the objects are realised by a probability segeln distribution over syntactic verb frame descriptions. 6. Emotion: ärgern, freuen Clustering is applicable to a variety of areas in 7. Announcement: ankündigen, bekanntgeben, Natural Language Processing, e.g. by utilising eröffnen, verkünden class type descriptions such as in machine transla- 8. Description: beschreiben, charakterisieren, tion (Dorr, 1997), word sense disambiguation (Dorr darstellen, interpretieren and Jones, 1996), and document classification (Kla- 9. Insistence: beharren, bestehen, insistieren, vans and Kan, 1998), or by applying clusters for pochen smoothing such as in machine translation (Prescher 10. Position: liegen, sitzen, stehen et al., 2000), or probabilistic grammars (Riezler et 11. Support: dienen, folgen, helfen, unterstützen al., 2000). 12. Opening: öffnen, schließen We performed clustering by the k-Means algo- 13. Consumption: essen, konsumieren, lesen, rithm as proposed by (Forgy, 1965), which is an un- saufen, trinken supervised hard clustering method assigning data 14. Weather: blitzen, donnern, dämmern, nieseln, objects to exactly ¡ clusters. Initial verb clusters are regnen, schneien iteratively re-organised by assigning each verb to its The class size is between 2 and 6, no verb
Recommended publications
  • Basic German: a Grammar and Workbook
    BASIC GERMAN: A GRAMMAR AND WORKBOOK Basic German: A Grammar and Workbook comprises an accessible reference grammar and related exercises in a single volume. It introduces German people and culture through the medium of the language used today, covering the core material which students would expect to encounter in their first years of learning German. Each of the 28 units presents one or more related grammar topics, illustrated by examples which serve as models for the exercises that follow. These wide-ranging and varied exercises enable the student to master each grammar point thoroughly. Basic German is suitable for independent study and for class use. Features include: • Clear grammatical explanations with examples in both English and German • Authentic language samples from a range of media • Checklists at the end of each Unit to reinforce key points • Cross-referencing to other grammar chapters • Full exercise answer key • Glossary of grammatical terms Basic German is the ideal reference and practice book for beginners but also for students with some knowledge of the language. Heiner Schenke is Senior Lecturer in German at the University of Westminster and Karen Seago is Course Leader for Applied Translation at the London Metropolitan University. Other titles available in the Grammar Workbooks series are: Basic Cantonese Intermediate Cantonese Basic Chinese Intermediate Chinese Intermediate German Basic Polish Intermediate Polish Basic Russian Intermediate Russian Basic Welsh Intermediate Welsh Titles of related interest published
    [Show full text]
  • Modeling the Position and Inflection of Verbs in English to German
    Modeling the position and inflection of verbs in English to German machine translation Von der Fakult¨atInformatik, Elektrotechnik und Informationstechnik der Universit¨at Stuttgart zur Erlangung der W¨urdeeines Doktors der Philosophie (Dr. phil.) genehmigte Abhandlung. Vorgelegt von Anita Ramm aus Vukovar, Kroatien Hauptberichter Prof. Dr. Alexander Fraser 1. Mitberichterin PD Dr. Sabine Schulte im Walde 2. Mitberichter Prof. Dr. Jonas Kuhn Tag der m¨undlichen Pr¨ufung:23.03.2018 Institut f¨urMaschinelle Sprachverarbeitung der Universit¨atStuttgart 2018 Abstract Machine translation (MT) is automatic translation of speech or text from a source lan- guage (SL) into a target language (TL). Machine translation is performed without hu- man interaction: a text or a speech signal is used as input to a computer program which automatically generates translation of the input data. There are two main approaches to machine translation: rule-based and corpus-based. Rule-based MT systems rely on manually or semi-automatically acquired rules which describe lexical, as well as syntactic correspondences between SL and TL. Corpus-based methods learn such correspondences automatically using large sets of parallel texts, i.e., texts which are translations of each other. Corpus-based methods such as statistical machine translation (SMT) and neural machine translation (NMT) rely on statistics which express the probability of translat- ing a specific SL translation unit into a specific TL unit (typically, word and/or word sequence). While SMT is a combination of different statistical models, NMT makes use of a neural network which encodes parallel SL and TL contexts in a more sophisticated way. Many problems have been observed when translating from English into German using SMT.
    [Show full text]
  • Grammar and Word List; • Grammar Explanations Marked with This Symbol
    49510_einleger_01_24 18.03.2005 8:42 Uhr Seite 1 by Herrad Meese Listening and exercise materials for beginners, by Herrad Meese Translations for Series 1: Episodes 1–26 In this booklet you will find translations of the • overview pages of the episodes; • the exercise instructions: Ü • information boxes at the start of the individual attachments: scripts of the audio scenes, solutions, grammar and word list; • grammar explanations marked with this symbol: The Symbols of the overview pages for each episode: Here you will find information on the Here you will find overviews of the contents of the episodes and audio scenes and information which Germany's socio-cultural background. CD the scenes are on. Information Listening Here you will find an overview of This introduces important grammatical the speech intentions and the key structures and teaches you to reco- vocabulary of the episode. gnise them. Understanding Recognising expressions structures Here you can practise the vocabulary. This contains pointers to structures You will also find a list of useful which already occur in the episode but expressions and phrases which are will not be dealt with systematically important for your vocabulary. until later. Remembering A bit more expressions grammar? Here you can test how much you’ve Here you will find learning tips and learned and what you still need to summaries of grammatical structures work on. and some of the vocabulary. Tips Test 1 49510_einleger_01_24 18.03.2005 8:42 Uhr Seite 2 P. 5 OVERVIEW EPISODE 1 P. 9 OVERVIEW EPISODE 2 Information Information Listening strategies Listening strategies • Focus on the sounds.
    [Show full text]
  • THE USE of the PRETERITE and the PRESENT PERFECT in ENGLISH and GERMAN. a CONTRASTIVE ANALYSIS. Institutt for Litteratur, Områ
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by NORA - Norwegian Open Research Archives THE USE OF THE PRETERITE AND THE PRESENT PERFECT IN ENGLISH AND GERMAN. A CONTRASTIVE ANALYSIS. Institutt for litteratur, områdestudier og europeiske språk Det humanistiske fakultet Universitetet i Oslo ENG4190 Høstsemesteret 2011 Jennifer Zug Veileder: Johan Elsness This master thesis deals with the use of the present perfect and the preterite in both English and German. The aim of the investigation is to find out to what extent the use of the present perfect and the preterite vary in English and German and how the original forms are translated into the other language, i.e. whether specific translation patterns can be found. To begin with, a theoretical part summarizes earlier researches by various authors before the main part of the thesis is presented. The research part is a corpus-based analysis, using the Oslo Multilingual Corpus. First of all, the English original preterite forms are looked upon. Three semantic categories could be found in my investigation in which the preterite is used in English: single events, sequences and events occurring regularly. Additionally, the preterite is often used in combination with definite time adverbials. The same categories could also be found when considering the German preterite. Looking at the English translations of the German preterite forms, it is striking that most instances, in total over 90%, were translated into the preterite in English as well. Considering the German translations of the English preterite forms, the preterite is also the preferred tense used in the translations, however, the present perfect appears also frequently in the translations, especially in sentences containing direct speech.
    [Show full text]
  • Modeling Verbal Inflection for English to German
    Modeling verbal inflection for English to German SMT Anita Ramm Alexander Fraser IMS CIS University of Stuttgart, Germany University of Munich, Germany [email protected] [email protected] Abstract not handle the verbs, ensuring neither that verbs appear in the correct position (which is a problem German verbal inflection is frequently due to the highly divergent word order of English wrong in standard statistical machine and German), nor that verbs are correctly inflected translation approaches. German verbs (problematic due to the richer system of verbal in- agree with subjects in person and num- flection in German). In many cases, verbs do not ber, and they bear information about mood match their subjects (in person and number) which and tense. For subject–verb agreement, makes understanding of translations difficult. In we parse German MT output to iden- addition to person and number, the German verbal tify subject–verb pairs and ensure that the inflection also includes information about tense verb agrees with the subject. We show and mood. If these are wrong (i.e. do not cor- that this approach improves subject-verb respond to the tense/mood in the source), very agreement. We model tense/mood transla- important information, such as point of time and tion from English to German by means of modality of an action/state expressed by the verb, a statistical classification model. Although is incorrect. This can lead to false understanding our model shows good results on well- of the overall sentence. formed data, it does not systematically improve tense and mood in MT output.
    [Show full text]
  • Islands of Resilience: the History of the German Strong Verbs from a Systemic Point of View*
    Islands of Resilience: The history of the German strong verbs from a systemic point of view* Robert Mailhammer Abstract Existing accounts of the German strong verbs show notable confusion about their historical development and remarkable diversity with regard to their systemic conceptualisation. Using a principal parts approach, this paper brings out the drastic changes the system of the Germanic strong verbs has undergone since its genesis. In particular, it is shown that the loss of the paradigmatic connection between the phonological root structure of a verb and its ablaut pattern is the key alteration that transformed the strong verbs from a highly homogenous and learner-friendly system to a complicated array of inflection classes whose predictability is no longer based on the citation form stored in the mental lexicon. However, this paper also reveals that it is exactly the remnants of the old system that keep the German strong verbs together as a group and even ensure a limited degree of productivity. The conclusions drawn for a synchronic representation are that the traditional concept of inflection classes, based on the phonological root structure, may be of more value than generally believed and that the conceptualisation of the paradigms of the strong verbs involving stem forms with different ablaut grades as principal parts is supported from the diachronic as well as from the synchronic point of view. Keywords Modern German – strong verbs – principal parts – Minimal Generalisation Hypothesis – word and paradigm morphology 1. Introduction Grammars of Modern Standard German have had difficulties in coming to grips with the group of so- called strong verbs.
    [Show full text]
  • Bi-Particle Adverbs, Pos-Tagging and the Recognition of German Separable Prefix Verbs
    Zurich Open Repository and Archive University of Zurich Main Library Strickhofstrasse 39 CH-8057 Zurich www.zora.uzh.ch Year: 2016 Bi-particle adverbs, PoS-tagging and the recognition of german separable prefix verbs Volk, Martin ; Clematide, Simon ; Graën, Johannes ; Ströbel, Phillip Abstract: In this paper we propose an algorithm for computing the full lemma of German verbs that occur in sentences with a separated prefix. The algorithm is meant for large-scale corpus annotation. It relies on Part-of-Speech tags and works with 97% precision when the tags are correct. Unfortunately there are multi-word adverbs with particles that are homographs with separated verb particles and prepositions. Since the usage as separated particle and preposition is much more frequent, these multi-word adverbs are often incorrectly tagged. We show that special treatment of these bi-particle adverbs improves the re-attachment of separated verb particles. Posted at the Zurich Open Repository and Archive, University of Zurich ZORA URL: https://doi.org/10.5167/uzh-126372 Conference or Workshop Item Accepted Version Originally published at: Volk, Martin; Clematide, Simon; Graën, Johannes; Ströbel, Phillip (2016). Bi-particle adverbs, PoS- tagging and the recognition of german separable prefix verbs. In: KONVENS 2016, Bochum, 19 Septem- ber 2016 - 21 September 2016. Bi-particle Adverbs, PoS-Tagging and the Recognition of German Separable Prefix Verbs Martin Volk, Simon Clematide, Johannes Graen,¨ Phillip Strobel¨ University of Zurich Institute of Computational Linguistics [email protected] Abstract to compute the correct verb lemma. Unfortunately, Part-of-Speech taggers (like the TreeTagger) assign In this paper we propose an algorithm the lemma locally and do not consider the long- for computing the full lemma of German distance dependency between the verb and the pre- verbs that occur in sentences with a sepa- fix.
    [Show full text]
  • German Verbs & Essential of Grammar
    GERMAN VERBS & ESSENTIAL OF GRAMMAR: V. 2 - PT. E PDF, EPUB, EBOOK Charles James | 144 pages | 19 May 2008 | McGraw-Hill Education - Europe | 9780071498036 | English, German | London, United States German Verbs & Essential of Grammar: v. 2 - Pt. E PDF Book Conjugated verbs are used to express the characteristics of person, number, tense, voice and mode in the German language. Gerunds in - ung are feminine and have regular plurals in - en. The basic form is created by putting the word zu before the infinitive. Irregular verbs are, for the most part, strong verbs. When attached, these prefixes are always stressed. The most common permanent prefixes found in German are ver- , ge- , be- , er- , ent- or emp- , and zer-. There are more than strong and irregular verbs, but there is a gradual tendency for strong verbs to become weak. On the other hand, this form is often used in fun or mocking expressions, because the imputed behaviour that is content of the mocking can be merged into a single word. Note that in the explicitly feminine form a second syllable er is omitted, if the infinitive ends on ern or eren. Thank you for supporting ielanguages. German sentence structure normally places verbs in second position or final position. For complex infinitives, adverbial phrases and object phrases are ignored, they do not affect this process; except something else is mentioned. In general we use the present tense, when we are speaking about the future. If one of the two meanings is figurative, the inseparable version stands for this figurative meaning:. With verbs whose roots end in el or er , the e of the infinitive suffix is dropped.
    [Show full text]
  • On Verb Types Causing Pronominal Adverbs in German Language
    Proceedings ON VERB TYPES CAUSING PRONOMINAL ADVERBS IN GERMAN LANGUAGE Herri Akhmad Bukhori State University of Malang [email protected]; [email protected] Abstract Pronominal adverbs in German language are formed by a combination of a locative adverb da ‘there’ and a preposition. The preposition emerges in a construction of a pronominal adverb coming from a preposition as a part of the prepositional verb. This study aims at describing verb types causing the pronominal adverb. Data are collected from German sentences containing pronominal adverbia taken from four famous German novels, namely Winnetou I, II, III, Friede auf Erden and a German short story Herr der Diebe. The analysis method used in the study is a distributional method with substitution technique and permutation. The classification of verb types is based on theory proposed by Quirk and Djajasudarma. The result of analysis shows that (prepositional) verb types causing pronominal adverbs in German language belong to (1) activity verbs, (2) verbs of innert perception and cognition, (3) relational verbs, and (4) verbs of bodily sensation. Key words: verb types, pronominal adverb, German language audiovisual aids INTRODUCTION Verbs are one of the categories or known as parts of speech that usually serve to predicate in a clause or a sentence. In some languages, such as the Indo-European languages according to Sternemann and Urschmidt (1989), verbs have morphological features such as tenses, aspects, personal, or numbers. Most of the verbs represent a semantic element of action, state, or process. Kridalaksana (2001: 226) reveals that verbs in Indonesian are characterized by the possibility of beginning with the word ‘tidak’ (not) and it is not possible to begin with ‘sangat’ (very), more.
    [Show full text]
  • Nominalization in German
    Nominalization in German Tatjana Scheffler November 12, 2005 1 Introduction This paper deals with nominalizations in German. The first part summarizes the facts about German nominalizations. There are many different types of nomi- nalizations in German. We discuss three kinds of nominalizations in particular: infinitival nominals like das Laufen (walking), so-called “stem”-derived nomi- nals like Fahrt (trip, ride), and, prominently, -ung nominals like Verschwendung (wastefulness). After an introduction to the types of nominals (section 2.1), we discuss which verbs can or cannot form the different types of nominaliza- tions (2.2), followed by the semantics of German nominalizations (2.3) and their syntactic behavior (2.4). The picture described in the first part of the paper is compatible with a theory where the semantics of the root that the verb and its nominalization have in common determines which forms can be constructed, and what their syntactic behavior (for example, argument structure) will be. This situation is an argument for Distributed Morphology, which claims that roots do not carry categorical features, and all syntactic behavior that differentiates between categories (e.g., nouns, verbs, adjectives) is determined by functional projections that derive full morphological complexes of the roots. However, there are at least two issues relating to German nominalizations that pose a problem for this framework. They are discussed in the second part of this paper. The first topic is raising. Raising nominals, in contrast to raising verbs, do not exist. Section 3.1 elaborates how this fact can be explained even under the hypothesis that noun roots do not differ from verb roots.
    [Show full text]
  • Modeling Deponency in Germanic Preterite-Present Verbs Using Datr
    University of Kentucky UKnowledge Theses and Dissertations--Linguistics Linguistics 2017 MODELING DEPONENCY IN GERMANIC PRETERITE-PRESENT VERBS USING DATR Marie G. Bourgerie Hunter University of Kentucky, [email protected] Digital Object Identifier: https://doi.org/10.13023/ETD.2017.378 Right click to open a feedback form in a new tab to let us know how this document benefits ou.y Recommended Citation Bourgerie Hunter, Marie G., "MODELING DEPONENCY IN GERMANIC PRETERITE-PRESENT VERBS USING DATR" (2017). Theses and Dissertations--Linguistics. 23. https://uknowledge.uky.edu/ltt_etds/23 This Master's Thesis is brought to you for free and open access by the Linguistics at UKnowledge. It has been accepted for inclusion in Theses and Dissertations--Linguistics by an authorized administrator of UKnowledge. For more information, please contact [email protected]. STUDENT AGREEMENT: I represent that my thesis or dissertation and abstract are my original work. Proper attribution has been given to all outside sources. I understand that I am solely responsible for obtaining any needed copyright permissions. I have obtained needed written permission statement(s) from the owner(s) of each third-party copyrighted matter to be included in my work, allowing electronic distribution (if such use is not permitted by the fair use doctrine) which will be submitted to UKnowledge as Additional File. I hereby grant to The University of Kentucky and its agents the irrevocable, non-exclusive, and royalty-free license to archive and make accessible my work in whole or in part in all forms of media, now or hereafter known. I agree that the document mentioned above may be made available immediately for worldwide access unless an embargo applies.
    [Show full text]
  • Determining the Placement of German Verbs in English–To–German SMT
    Determining the placement of German verbs in English–to–German SMT Anita Gojun Alexander Fraser Institute for Natural Language Processing University of Stuttgart, Germany {gojunaa, fraser}@ims.uni-stuttgart.de Abstract German language model, making the translations difficult to understand. When translating English to German, exist- A common approach for handling the long- ing reordering models often cannot model range reordering problem within PSMT is per- the long-range reorderings needed to gen- forming syntax-based or part-of-speech-based erate German translations with verbs in the (POS-based) reordering of the input as a prepro- correct position. We reorder English as a cessing step before translation (e.g., Collins et al. preprocessing step for English-to-German SMT. We use a sequence of hand-crafted (2005), Gupta et al. (2007), Habash (2007), Xu reordering rules applied to English parse et al. (2009), Niehues and Kolss (2009), Katz- trees. The reordering rules place English Brown et al. (2011), Genzel (2010)). verbal elements in the positions within the We reorder English to improve the translation clause they will have in the German transla- to German. The verb reordering process is im- tion. This is a difficult problem, as German plemented using deterministic reordering rules on verbal elements can appear in different po- English parse trees. The sequence of reorderings sitions within a clause (in contrast with En- glish verbal elements, whose positions do is derived from the clause type and the composi- not vary as much). We obtain a significant tion of a given verbal complex (a (possibly dis- improvement in translation performance.
    [Show full text]