Master Reference
Total Page:16
File Type:pdf, Size:1020Kb
Master Suitability of Neural Machine Translation for Producing Linguistically Accessible Text: Exploring the Effects of Pre-Editing on Easy-to-Read Administrative Documents KAPLAN, Abigail Abstract Governments have the potential to improve the civic inclusion of people with intellectual disabilities living in multilingual societies by providing administrative documents in an Easy-to-Read format and in the preferred language of the target audience. This study aims to address the obstacles keeping Swiss and French administrations from successful accessible communication with their English-speaking citizens with disabilities. It assesses the suitability of a free and public neural machine translation system, DeepL, for generating accessible English from Easy-to-Read French source texts. With the goal of increasing the linguistic accessibility of the texts produced by the system, it proposes the introduction of a pre-editing step. A four-part study investigated the issue from three different angles: translation quality, accessibility, and readability. Findings show that pre-editing improves two of the three factors of linguistic accessibility examined, making this type of tool more suitable for producing Easy-to-Read English translations. Reference KAPLAN, Abigail. Suitability of Neural Machine Translation for Producing Linguistically Accessible Text: Exploring the Effects of Pre-Editing on Easy-to-Read Administrative Documents. Master : Univ. Genève, 2021 Available at: http://archive-ouverte.unige.ch/unige:151207 Disclaimer: layout of this document may differ from the published version. 1 / 1 Abigail R. Kaplan Suitability of Neural Machine Translation for Producing Linguistically Accessible Text Exploring the Effects of Pre-Editing on Easy-to-Read Administrative Documents Directrice : Silvia Rodríguez Vázquez Jurée : Pierrette Bouillon Mémoire présenté à la Faculté de traduction et d’interprétation (Département de traduction, Unité d’anglais) pour l’obtention de la Maîtrise universitaire en traduction, mention Traduction et communication spécialisée multilingue Université de Genève Janvier 2021 ii Déclaration attestant le caractère original du travail effectué J’affirme avoir pris connaissance des documents d’information et de prévention du plagiat émis par l’Université de Genève et la Faculté de traduction et d’interprétation (notamment la Directive en matière de plagiat des étudiant-e-s, le Règlement d’études des Maîtrises universitaires en traduction et du Certificat complémentaire en traduction de la Faculté de traduction et d’interprétation ainsi que l’Aide-mémoire à l’intention des étudiants préparant un mémoire de Ma en traduction). J’atteste que ce travail est le fruit d’un travail personnel et a été rédigé de manière autonome. Je déclare que toutes les sources d’information utilisées sont citées de manière complète et précise, y compris les sources sur Internet. Je suis consciente que le fait de ne pas citer une source ou de ne pas la citer correctement est constitutif de plagiat et que le plagiat est considéré comme une faute grave au sein de l’Université, passible de sanctions. Au vu de ce qui précède, je déclare sur l’honneur que le présent travail est original. Nom et prénom : Lieu / date / signature : iii AcknoWledgements First, my sincere gratitude to Silvia Rodríguez Vázquez, for your guidance, enthusiasm, attention, and care. You did not give up hope despite the many curveballs life threw at us along the way. Thank you for inspiring this project through your own Work in the fields of translation technologies and accessibility, and for believing in me enough to orchestrate the incredible opportunity of presenting a portion of this research at the 2019 Klaara Conference. I would also like to thank Pierrette Bouillon, who opened my eyes to the fascinating and complex world of machine translation, and all of the other smart and dedicated FTI professors who have shared their knowledge and helped shape me into a capable budding professional. Special thanks to my six colleagues who graciously agreed to give up their time and post-editing energy to participate in my (long) study, and to all of my FTI friends and classmates. You awe me with both your creativity and your compassion, and you motivate me to strive for excellence in our field. To my mom, dad, Jake, and Allan, thank you for knowing When to nudge and knowing When to sympathize. Thank you for showing me nothing but love and patience. Thank you for being my readers, my listeners, and always my biggest cheerleaders. iv Abstract Governments have the potential to improve the civic inclusion of people with intellectual disabilities living in multilingual societies by providing administrative documents in an Easy-to- Read format and in the preferred language of the target audience. However, in the case of Switzerland and France, Easy-to-Read is not widely used for French administrative communication, and even less so for English. This study aims to address the obstacles that could be keeping the Swiss and French administrations from publishing more translated Easy-to-Read documents, or in other Words, the barriers to successful accessible communication with their English-speaking citizens with disabilities. To this end, it investigates the suitability of a free and public neural machine translation system, DeepL, for generating linguistically accessible English from Easy-to-Read French source texts. With the goal of increasing the linguistic accessibility of the texts produced by the system, it proposes the introduction of a pre-editing step, in which formatting is removed from the French text prior to translation. A four-part study examined the issue from three different angles of linguistic accessibility: translation quality, accessibility, and readability. In the first three parts of the study, translation quality and accessibility were manually assessed by way of DQF-MQM error annotation, measurements of post-editing effort, and Easy- to-Read guideline violation annotation. The fourth part featured an automatic evaluation of readability. Findings allow us to conclude that adding a pre-editing step, Which clarifies sentence boundaries for the machine, does in fact improve two of the three factors of linguistic accessibility examined, translation quality and accessibility, and make this type of tool more suitable for producing Easy-to-Read English translations. Keywords: neural machine translation; controlled language; Easy-to-Read; accessibility; accessible communication v Table of Contents List of Figures ......................................................................................................................................... viii List of Tables ........................................................................................................................................... ix List of Abbreviations ................................................................................................................................. x Chapter 1: Introduction ....................................................................................................... 1 1.1 Motivation ..................................................................................................................................... 1 1.2 Research context .......................................................................................................................... 3 1.3 Research goals, questions, and hypotheses ............................................................................... 5 1.4 Materials and methods ................................................................................................................. 6 1.5 Thesis structure ............................................................................................................................ 7 Chapter 2: Controlled Language and Machine Translation ......................................................8 2.1 Introduction .................................................................................................................................. 8 2.2 Controlled language ..................................................................................................................... 8 2.2.1 Defining controlled language ................................................................................................................................ 8 2.2.2 Branches and applications of controlled language ............................................................................................. 9 2.2.2.1 Machine-oriented controlled language (MOCL) ........................................................................................ 9 2.2.2.2 Human-oriented controlled language (HOCL) ........................................................................................10 2.2.3 Easy to Read/Facile à lire et à comprendre .............................................................................................................13 2.2.3.1 History and development of Easy to Read ...............................................................................................14 2.2.3.2 Limitations of E2R/FALC .........................................................................................................................17 2.3 Machine translation ................................................................................................................... 20 2.3.1 Defining machine translation .............................................................................................................................20