Assessing the Comprehensibility and Perception of Machine Translations
A PILOT STUDY

Word count: 16,804
Iris Ghyselen
Student number: 01400320
Supervisor: Prof. dr. Lieve Macken
Master's thesis submitted to obtain the degree of Master in Translation, in the field of Applied Linguistics
Academic year: 2017 – 2018

COPYRIGHT STATEMENT

The author and the supervisor(s) give permission to make this study as a whole available for consultation for personal use. Any other use is subject to the restrictions of copyright law, in particular the obligation to cite the source explicitly when quoting data from this study.

ACKNOWLEDGMENTS

There are several people who deserve my profound gratitude for their help and support while I was writing this dissertation.

In the first place I would like to thank all the respondents who filled in my questionnaire. That small act of kindness meant a great deal to me. The questionnaire required some time and attention to fill in and was distributed in a period when people were being overwhelmed on social media with all kinds of questionnaires for other theses. Therefore, I am very grateful to each and every one of them who helped me.

Secondly, I would like to express my deep sense of gratitude to my supervisor, Prof. Dr. Lieve Macken, who has helped me with everything I could ask for and who far exceeded my expectations of any supervisor. This dissertation could not have been completed without her support.
I want to thank Céline Van De Walle as well, for taking the time to help me with the language, style and structure of my text by providing some very useful feedback.

My parents' support has also been instrumental for me while writing this dissertation. They were prepared to listen to me at all times and helped me to find respondents. Moreover, their love and support have helped me throughout my entire education and I am very grateful for the chance to pursue my studies and by extension my dreams.

Furthermore, I would like to extend these thanks to my entire family for supporting me in every way that they could. From visiting me on my Erasmus exchange to sending me good luck cards and texts during stressful exam periods, they have really been there for me.

Lastly, I want to thank my friends for being there whenever I needed them. The library sessions with them made writing this dissertation so much more fun.

ABSTRACT

This dissertation addresses the results of reading comprehension tests and perception questions on both human translated and raw (unedited) machine translated texts. These translations are based on three source texts of the English Machine Translation Evaluation version (CREG-MT-eval) of the Corpus of Reading Comprehension Exercises (CREG). The author of this dissertation produced the human translations herself, and the neural machine translation engines used are DeepL and Google Translate. The experiment was undertaken via a SurveyMonkey questionnaire, which 99 respondents filled in. The questionnaire contained five reading comprehension questions, as well as five perception questions. The translations were shown before the respondents answered the questions, but not while they were answering them. The results show that respondents can tell whether a translation is a human or a machine translation and that the human translations receive the best clarity scores. The mistakes that bother readers most have to do with grammar, sentence length, level of idiomaticity and incoherence.
Comprehension is best with human translations when respondents are asked directly, but the comprehension questions show that the human translation only performs best for one text, with DeepL scoring better for the other two. As for the machine translations, there is no definite answer as to which machine translation tool performs better.

TABLE OF CONTENTS

1 Introduction
2 Literature study
2.1 Comprehensibility
2.2 Quality
2.2.1 General text quality
2.2.2 Quality of translated texts
2.2.3 Quality evaluation
2.2.3.1 Human evaluation
A. Scoring
B. Reading comprehension
2.2.3.2 Automatic evaluation
2.3 Error typology
2.4 MT approaches
2.4.1 RBMT
2.4.2 SMT
2.4.3 NMT
2.4.3.1 Google Translate
2.4.3.2 DeepL
2.4.3.3 Typical errors of NMT
3 Methodology
4 Applied error typology
5 Results
5.1 General results
5.2 Text-specific questions
5.2.1 Human or machine translation
5.2.1.1 Human translation
A. HT labelled as MT
B. HT labelled as HT
5.2.1.2 Google Translate
A. GT labelled as HT
B. GT labelled as MT
5.2.1.3 DeepL
A. DL labelled as HT
B. DL labelled as MT
5.2.1.4 Summary
5.2.2 Clarity score
5.2.3 Comprehension
5.2.4 Notable mistakes
5.3 General results for comprehension questions
5.4 Comprehension questions text 1
5.5 Comprehension questions text 2
5.6 Comprehension questions text 3
5.7 Linguists versus non-linguists
6 Conclusion and discussion