This Thesis Has Been Submitted in Fulfilment of the Requirements for a Postgraduate Degree (E.G
Total Page:16
File Type:pdf, Size:1020Kb
This thesis has been submitted in fulfilment of the requirements for a postgraduate degree (e.g. PhD, MPhil, DClinPsychol) at the University of Edinburgh. Please note the following terms and conditions of use: This work is protected by copyright and other intellectual property rights, which are retained by the thesis author, unless otherwise stated. A copy can be downloaded for personal non-commercial research or study, without prior permission or charge. This thesis cannot be reproduced or quoted extensively from without first obtaining permission in writing from the author. The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the author. When referring to this work, full bibliographic details including the author, title, awarding institution and date of the thesis must be given. Incorporating Pronoun Function into Statistical Machine Translation Liane Guillou I V N E R U S E I T H Y T O H F G E R D I N B U Doctor of Philosophy Institute for Language, Cognition and Computation School of Informatics University of Edinburgh 2016 Abstract Pronouns are used frequently in language, and perform a range of functions. Some pronouns are used to express coreference, and others are not. Languages and genres differ in how and when they use pronouns and this poses a prob- lem for Statistical Machine Translation (SMT) systems (Le Nagard and Koehn, 2010; Hardmeier and Federico, 2010; Novák, 2011; Guillou, 2012; Weiner, 2014; Hardmeier, 2014). Attention to date has focussed on coreferential (anaphoric) pronouns with NP antecedents, which when translated from English into a lan- guage with grammatical gender, must agree with the translation of the head of the antecedent. Despite growing attention to this problem, little progress has been made, and little attention has been given to other pronouns. The central claim of this thesis is that pronouns performing different func- tions in text should be handled differently by SMT systems and when evaluating pronoun translation. This motivates the introduction of a new framework to categorise pronouns according to their function: Anaphoric/cataphoric reference, event reference, extra-textual reference, pleonastic, addressee reference, speaker reference, generic reference, or other function. Labelling pronouns according to their function also helps to resolve instances of functional ambiguity arising from the same pronoun in the source language having multiple functions, each with dif- ferent translation requirements in the target language. The categorisation frame- work is used in corpus annotation, corpus analysis, SMT system development and evaluation. I have directed the annotation and conducted analyses of a parallel corpus of English-German texts called ParCor (Guillou et al., 2014), in which pronouns are manually annotated according to their function. This provides a first step toward understanding the problems that SMT systems face when translating pro- nouns. In the thesis, I show how analysis of manual translation can prove useful in identifying and understanding systematic differences in pronoun use between two languages and can help inform the design of SMT systems. In particular, the anal- ysis revealed that the German translations in ParCor contain more anaphoric and pleonastic pronouns than their English originals, reflecting differences in pronoun use. This raises a particular problem for the evaluation of pronoun translation. Automatic evaluation methods that rely on reference translations to assess pro- noun translation, will not be able to provide an adequate evaluation when the i reference translation departs from the original source-language text. I also show how analysis of the output of state-of-the-art SMT systems can reveal how well current systems perform in translating different types of pronouns and indicate where future efforts would be best directed. The analysis revealed that biases in the training data, for example arising from the use of “it” and “es” as both anaphoric and pleonastic pronouns in both English and German, is a problem that SMT systems must overcome. SMT systems also need to disambiguate the function of those pronouns with ambiguous surface forms so that each pronoun may be translated in an appropriate way. To demonstrate the value of this work, I have developed an automated post- editing system in which automated tools are used to construct ParCor-style anno- tations over the source-language pronouns. The annotations are then used to re- solve functional ambiguity for the pronoun “it” with separate rules applied to the output of a baseline SMT system for anaphoric vs. non-anaphoric instances. The system was submitted to the DiscoMT 2015 shared task on pronoun translation for English-French. As with all other participating systems, the automatic post- editing system failed to beat a simple phrase-based baseline. A detailed analysis, including an oracle experiment in which manual annotation replaces the auto- mated tools, was conducted to discover the causes of poor system performance. The analysis revealed that the design of the rules and their strict application to the SMT output are the biggest factors in the failure of the system. The lack of automatic evaluation metrics for pronoun translation is a limiting factor in SMT system development. To alleviate this problem, Christian Hard- meier and I have developed a testing regimen called PROTEST comprising (1) a hand-selected set of pronoun tokens categorised according to the different prob- lems that SMT systems face and (2) an automated evaluation script. Pronoun translations can then be automatically compared against a reference translation, with mismatches referred for manual evaluation. The automatic evaluation was applied to the output of systems submitted to the DiscoMT 2015 shared task on pronoun translation. This again highlighted the weakness of the post-editing system, which performs poorly due to its focus on producing gendered pronoun translations, and its inability to distinguish between pleonastic and event refer- ence pronouns. ii Lay Summary Pronouns are used frequently in language, to serve different functions. These dif- fer somewhat from language to language, and this poses a problem for Statistical Machine Translation (SMT) systems. Focussing on cases where a pronoun refers to an entity introduced earlier in the text with a noun phrase (its antecedent) – where the pronoun serves an anaphoric function – when translating from En- glish into a language with grammatical gender such as French or German, the anaphoric pronoun must agree with the translation of its antecedent. Despite growing attention to this problem, little progress has been made, and little at- tention has been given to pronouns serving other functions. The central claim of this thesis is that pronouns performing different func- tions in text should be handled differently by SMT systems and when evaluating pronoun translation. This motivates the introduction of a new framework to categorise pronouns according to their function. Labelling pronouns according to their function also helps to resolve instances of functional ambiguity arising from the same pronoun in the source language having multiple functions, each with different translation requirements in the target language. The categorisation framework is used in corpus annotation, corpus analysis, SMT system develop- ment and evaluation. I have directed the annotation and conducted analyses of a collection of paral- lel English-German texts called ParCor, in which pronouns are manually anno- tated according to their function. This provides a first step toward understanding the problems that SMT systems face when translating pronouns. An analysis of manual translation revealed differences in the use of anaphoric and pleonastic (“dummy”) pronouns between English and German. An analysis of the output of state-of-the-art SMT systems highlighted (1) the need to disambiguate the func- tion of those pronouns with ambiguous surface forms so that each pronoun may be translated in an appropriate way and (2) the need to further sub-categorise anaphoric pronouns according to the translation requirements of the target lan- guage. To demonstrate the value of this work, I have developed an automated post- editing system which aims to identify pronouns that have been translated in- correctly and correct them, using rules that differ for anaphoric and for non- anaphoric pronouns. The system was submitted to the DiscoMT 2015 shared iii task on pronoun translation for English-French. As with all other participating systems, my system failed to beat a simple baseline. A detailed analysis revealed that the design of the rules and their strict application to the SMT output are the biggest factors in the failure of the system. The lack of automatic evaluation metrics for pronoun translation is a limiting factor in SMT system development. To alleviate this problem, Christian Hard- meier and I have developed a testing regimen called PROTEST comprising (1) a hand-selected set of pronoun tokens categorised according to the different prob- lems that SMT systems face and (2) an automated evaluation script. Pronoun translations can then be automatically compared against a human-authored refer- ence translation, with mismatches referred for manual evaluation. The automatic evaluation was applied to the output of systems submitted to the DiscoMT 2015 shared task on pronoun translation. This again highlighted the weakness of the post-editing system,