MASTERARBEIT / MASTER’S THESIS

Titel der Masterarbeit / Title of the Master's Thesis: HFWL Glossaries: A Set of Multilingual Glossaries Based on a Quantitative Corpus-Based Analysis of the Europarl Corpus

verfasst von / submitted by Bc. Marek Paulovič, BA

angestrebter akademischer Grad / in partial fulfilment of the requirements for the degree of Master of Arts (MA)

Wien, 2016 / Vienna 2016

Studienkennzahl lt. Studienblatt / degree programme code as it appears on the student record sheet: A 065 331 342
Studienrichtung lt. Studienblatt / degree programme as it appears on the student record sheet: Masterstudium Dolmetschen UG2002 Deutsch Englisch
Betreut von / Supervisor: Univ.-Prof. Mag. Dr. Gerhard Budin

EUROPARL CORPUS HWFL GLOSSARIES 3

ACKNOWLEDGMENTS

Firstly, I would like to express my sincere gratitude to my advisor, Univ.-Prof. Mag. Dr. Gerhard Budin, for his continuous and immense support, his patience, his motivation, and his considerable knowledge in this field. His guidance helped me throughout the research and writing of this thesis. I could not have imagined having a better advisor and mentor for my master's thesis. Besides my advisor, I would like to thank PhDr. Jana Rejšková and Mgr. Věra Kloudová, Ph.D. for their insightful comments and their help with identifying problematic internationalisms from both the didactic and the linguistic perspective. My sincere thanks also go to Beth Garner for her immense help with proofreading this thesis. Without her encouragement, I might not have decided to write it in English. I would also like to thank my loved ones, family and friends, who supported me throughout the entire process, both by keeping me harmonious and by helping me put the pieces together. Last but not least, I would like to thank God for the good health and wellbeing that were necessary to complete this thesis.


Abstract

The main goal of this thesis was to experimentally create a set of multilingual glossaries (English/German/Czech/Slovak) of the most frequent words used in the European Parliament, by means of a quantitative corpus-based lexical analysis of the Europarl corpus. The resulting glossaries are intended to help advanced non-native speakers of English acquire active vocabulary in their language combination for effective oral communication in political discourse. Throughout the thesis, an inductive, transdisciplinary approach is applied: creating a useful language-learning tool constituted the main research problem, which was to be solved by applying knowledge from various disciplines such as English language teaching, speech science, lexicology, terminology, and language for special purposes with a special focus on politolinguistics, as well as computer-assisted vocabulary learning and, finally, corpus linguistics as the working method. Based on our theoretical review, it can be confirmed that: a) learning from a HFWL is the most effective method of vocabulary acquisition in terms of time invested and results delivered, b) vocabulary learning aimed at automatization may lead to improved speech output, and c) political language can, to a certain extent, be regarded as a language for special purposes. The results of the literature review are summarized in a proposed cognitive model of speech production. In the practical part, the theoretical knowledge of corpus linguistics was applied to create a set of four multilingual glossaries. This thesis puts the case for using multilingual glossaries based on high frequency word lists as a didactic tool, but the hypothesis still remains to be validated in further experimental studies.

Keywords: Europarl corpus, glossary compilation, keyword analysis, LSP, political language, vocabulary learning, corpus-based lexical analysis, HFWL


Abstrakt

Die vorliegende Arbeit befasst sich mit der Erstellung vier multilingualer Glossare (Englisch/Deutsch/Tschechisch/Slowakisch) der häufigsten Wörter im Europäischen Parlament, deren Auswahl auf einer quantitativen, korpusgestützten, lexikalischen Analyse des Europarl Korpus basiert. Das Ziel war es, anhand der Korpusanalyse ein Hilfsmittel für fortgeschrittene englische Nicht-Muttersprachler zusammenzustellen, das den Erwerb des aktiven Wortschatzes im politischen Diskurs erleichtert und folglich zur Verbesserung der mündlichen Sprachkompetenz führt. Die Masterarbeit basiert auf dem induktiven transdisziplinären Ansatz: Das Vorhaben, dieses Hilfsmittel zusammenzustellen, stellt das Forschungsproblem dar. Dieses wurde durch die Anwendung von Wissen aus unterschiedlichen akademischen Disziplinen gelöst, nämlich dem Sprachunterricht, der Sprachwissenschaft, der Lexikologie, der Terminologie, der Fachsprachenforschung, genauer gesagt der Politolinguistik, dem computergestützten Sprachunterricht und zu guter Letzt der Korpuslinguistik als Untersuchungsmethode. Auf der Grundlage der recherchierten Informationen konnte eruiert werden, dass a) das Erlernen von Häufigkeitswortschatz, gemessen an den Lernergebnissen und dem Zeitaufwand, die effektivste Lernmethode für den Wortschatzerwerb ist, b) ein Wortschatzerwerb, der auf die Automatisierung des aktiven Wortschatzes abzielt, zur Verbesserung der mündlichen Sprachkompetenz führen kann, und c) die politische Sprache in gewissem Maße als Fachsprache betrachtet werden kann. Die Ergebnisse der Literaturrecherche wurden beim Entwurf eines synthetisierten Modells für die mündliche Sprachproduktion angewendet. Die in der Masterarbeit präsentierten vier Glossare basieren auf einer quantitativen Korpusanalyse des Europarl Korpus und einer qualitativen Analyse der identifizierten 3000 häufigsten Wörter und der 1500 häufigsten Schlüsselwörter.
In der Masterarbeit wird vorgeschlagen, das Lernen von Glossaren, die auf Häufigkeitswortschatz basieren, als didaktische Methode anzuwenden; diese Hypothese ist jedoch noch in weiteren experimentellen Studien zu verifizieren.

Schlüsselwörter: Europarl Korpus, Erstellung von Glossaren, linguistische Schlüsselwortanalyse, politische Sprache, Politolinguistik, Wortschatzerwerb, computergestützter Spracherwerb, korpusgestützte lexikalische Analyse, HFWL


TABLE OF CONTENTS

Introduction ...... 10
Study aims ...... 11
Chapters Summary ...... 12
Theoretical Framework for Inter- and Transdisciplinarity: Towards Transdisciplinarity in Translation Studies ...... 14
Introduction ...... 14
Translation Studies ...... 14
Transdisciplinarity ...... 16
Transdisciplinarity in Transcultural Communication ...... 18
Transdisciplinarity in Interpreting Studies ...... 19
Conclusion ...... 20
Main Theoretical Framework for Glossary Compilation ...... 21
Introduction ...... 21
The Usage of High Frequency Word Lists in Second Language Learning & Teaching ...... 22
History and usage of high frequency word lists ...... 22
Vocabulary size in regard to HFWL ...... 24
Vocabulary categorization ...... 26
Vocabulary acquisition ...... 28
Limitation of HFWL ...... 32
Conclusion and suggestion for correct vocabulary acquisition ...... 32
Speech Science ...... 33
Speech production theory ...... 33
Automatization of language processes in special regard to bilingual lexicon ...... 36
Concept of shared attention ...... 38
Theoretical implications ...... 41
Conclusion ...... 44
Theoretical Framework for Glossary Compilation: Other Disciplines and Reviewed Knowledge ...... 45
Introduction ...... 45
Language for special purposes ...... 46
Politolinguistics ...... 48


Lexicological and Terminological Studies ...... 54
Lexicology ...... 55
Terminology ...... 59
Computer-assisted Language Learning (CALL) ...... 63
Anki ...... 64
InterpretBank ...... 65
Conclusion ...... 66
Corpus Linguistics as the Method for Creating LSP Glossaries ...... 67
Introduction ...... 67
Corpora ...... 67
Definition ...... 67
Qualitative aspects of a corpus ...... 68
Quantitative aspects of a corpus ...... 70
Modern corpora in the machine readable form ...... 72
Corpus documentation ...... 73
Corpus types ...... 73
Corpus annotation ...... 74
Corpora limitations ...... 78
The Europarl Corpus and Other Corpora & Language Resources in the Context of EU Institutions ...... 80
Parallel Language Resources Issued by the EU ...... 80
Other parallel language resources not issued directly by the EU ...... 82
The Europarl corpus ...... 82
Corpus Linguistics ...... 84
Definition and history ...... 84
Strengths and weaknesses of corpus analysis ...... 86
Qualitative vs. quantitative corpus analysis ...... 87
Corpus-based and corpus-driven linguistics ...... 88
Corpus linguistics application ...... 90
Corpus analysis software ...... 101
Main features and methods used in corpus analysis software ...... 105
Conclusion ...... 118
Europarl Corpus Analysis ...... 120
Introduction ...... 120
Glossary Templates ...... 120


Glossary 1 ...... 121
Glossary 2 ...... 121
Glossary 3 ...... 122
Glossary 4 ...... 123
Glossaries overlapping ...... 124
Corpus Linguistics Software & Language Resources ...... 124
Parallel corpora ...... 125
Phase 1: Quantitative Corpus Analysis of the Europarl Corpus ...... 128
Files Preparation & Processing ...... 128
Wordlist generation ...... 129
Keyword list generation ...... 130
Phase 2: Manual Selection of Words for Glossaries from the Keyword List ...... 137
Keyword list categorization ...... 137
Percentage of verbs identified in top 3000 keywords ...... 139
Klein's typology in the Europarl corpus ...... 140
Comparison of results from the keyword list and the wordlist: proper nouns ...... 140
Comparison of results from the keyword list and the wordlist: abbreviations and specialized vocabulary used in Glossary 4 ...... 143
Comparison of results from the keyword list and the wordlist on the selected 1500 keywords ...... 145
Discussion ...... 147
Keyword list: target language equivalents ...... 148
Phase 3: Completing Glossaries through Additional Relevant Information ...... 149
Glossary 1 ...... 150
Glossary 2 ...... 151
Glossary 3 ...... 153
Glossary 4 ...... 155
Discussion ...... 156
Conclusion ...... 161
List of References ...... 164
Appendix A ...... 180
Appendix B ...... 181
Appendix C ...... 196
Appendix D ...... 198
Appendix E ...... 216


Appendix F ...... 221


Introduction

When I pondered a suitable topic for my master's thesis, my first criterion was that the paper be practical, not only for me but also for the interpreting community. I remember quite vividly from my master's studies in conference interpreting how insufficient vocabulary knowledge can thwart one's attempts at flawless interpreting. Bearing in mind that the highest accolade for an interpreter is to be employed by a supranational institution such as the European Parliament or the United Nations, and that many speeches used in university interpreting training come from a political setting, I assumed that it would be of great interest to interpreting students, graduates and novice interpreters like me to grasp the political language used in this discourse. After a preliminary study of research options, I came to the conclusion that it would be an asset to investigate the Europarl corpus, which contains numerous proceedings of the European Parliament. The investigation would take place on a quantitative lexical level, and the most frequent words would be extracted. My search for suitable methods brought my attention to corpus linguistics, a field completely new and unknown to me. Exploring this new area, I became increasingly aware of the blurred confines of translation studies as an academic discipline and of the importance of transdisciplinarity, which plays a major role in translation research. This led me to the thought that if translation studies is becoming so intertwined with other disciplines, borrowing their knowledge and methods for research, it is only natural to expect the research findings of translation studies to enrich, and be of value to, users beyond the interpreting community.
Therefore, I suppose that creating a multilingual glossary based on the Europarl corpus for novice interpreters, as was my initial incentive, could also benefit other target groups such as students of political science, sociolinguists and applicants for various positions in European institutions. This thesis is based on an inductive approach: we first defined a need or problem, then looked for theoretical knowledge that would support our assumption. Next, we searched for suitable tools and methods for optimally reaching our research objective. Finally, we attempted to create a useful glossary with pedagogical implications in accordance with terminological standards. Here we have to point out that without prior


knowledge of second language teaching, we would only have stumbled in the dark towards a glossary conception based on intuition rather than solid scientific background knowledge. For this reason, we also decided to explore the usage of high frequency word lists (HFWL) in the language teaching literature. We identified some pedagogical standards, but the literature review did not answer two inherent questions: firstly, why it is important to learn high-frequency vocabulary in the first place, and secondly, how vocabulary acquisition is reflected in second language production. We searched deeper, and after inquiring into the fields of lexicology and speech production, we believe we found answers that closed the gap in our uncertainty about the purpose, usage and effectiveness of learning vocabulary from high frequency word lists. We were also aware that creating a functional glossary is only the first part of the process. Even in the best-case scenario of creating a scientifically proven, effective glossary for grasping the political language of the EU, it would not mean much if the glossary did not reach its target audience. For this reason, we also had to think, firstly, about finding an effective interface and, secondly, about disseminating the glossary to its potential users. We sought an optimal solution by consulting the literature on computer-assisted vocabulary learning (CAVL) and the available software for interpreters, identifying the key players and creating glossaries that could be imported into the software. We sincerely hope that we have successfully accomplished this transdisciplinary journey and managed to answer all the important questions arising from our research.

Study aims

The aim of this study is to use quantitative corpus-based analysis to identify the most common words that could help non-native English speakers improve their speech proficiency in political discourse. We believe that this can be achieved by creating a glossary of the most frequent words used in MEPs' speeches in the European Parliament. To support this assumption, we aim to confirm three research hypotheses: a) A glossary of the most frequent words can be a useful tool for language learning, AND


b) Learning vocabulary and its subsequent automatization may lead to better speech production, AND c) The political language of the European Parliament is distinguished by its special vocabulary and can be classified as a language for special purposes.

Chapters Summary

This paper is divided into two parts: a theoretical part and a practical part. The theoretical part comprises the first four chapters; the practical part, the last one. The first chapter, Theoretical Framework for Inter- and Transdisciplinarity, introduces the reader to the concept of transdisciplinary research with a special emphasis on translation and interpreting studies. It should clarify why transdisciplinarity is needed to achieve our research objective. The second chapter, Main Theoretical Framework for Glossary Compilation, deals to a great extent with high frequency word lists and the concept of speech production. It describes the usage and efficiency of high frequency word lists in language teaching and explains how the automatization of vocabulary through learning high frequency word lists could benefit the quality of spoken language. This required consulting language teaching and speech science as academic disciplines. The aim of the third chapter, Supportive Theoretical Framework for Glossary Compilation, is to provide the knowledge needed for proper glossary compilation and distribution. Its focus ranges from defining political language as a language for special purposes, through important insights from lexicology and terminology, to exploring the possibilities of computer-assisted language learning. Finally, the fourth chapter, Corpus Linguistics as the Method for Creating LSP Glossaries, covers corpora and corpus linguistics as the methodology used for reaching our research objective. It discusses corpora, their types, usage, limitations and annotation, and corpus linguistics as a methodology with a great transdisciplinary overlap.
Given the scope of the paper, substantial attention is paid to corpus analysis, its main tools, statistical methods and their practical usage. The practical part of this paper, discussed in chapter 5, describes the procedure followed in the corpus analysis of the Europarl corpus and the subsequent glossary compilation based on the corpus analysis findings. It also introduces the four types of glossaries that


are presented as the result of the optimal utilization of the knowledge gathered in the theoretical part.
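The quantitative core of the practical part, extracting the most frequent words from a corpus, can be sketched in a few lines of code. The following Python snippet is only an illustrative sketch under simplifying assumptions: it uses a deliberately naive tokenizer and a made-up two-sentence sample text in place of the actual Europarl files, whereas the analysis in this thesis relies on dedicated corpus analysis software.

```python
from collections import Counter
import re

def high_frequency_word_list(text, top_n):
    """Return the top_n most frequent word tokens in `text`, case-folded.

    Tokenization here is deliberately naive (letters and apostrophes only);
    real corpus software applies proper tokenization and often lemmatization.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(top_n)

# Hypothetical miniature "corpus" standing in for the Europarl files.
sample = ("The European Parliament debates the budget. "
          "The Parliament adopts the budget resolution.")

print(high_frequency_word_list(sample, 3))
# → [('the', 4), ('parliament', 2), ('budget', 2)]
```

A keyword analysis, as used for some of the glossaries described in chapter 5, would additionally compare these frequencies against a reference corpus rather than merely rank raw counts.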


Theoretical Framework for Inter- and Transdisciplinarity: Towards Transdisciplinarity in Translation Studies

Introduction

As outlined in the introduction, we consider it particularly important to elucidate transdisciplinarity in translation studies in order to make a clear link between the research methods and knowledge used for the corpus analysis and the possible applications of its results. We will first briefly define the terms translation studies and transdisciplinarity, and then illustrate their practical application in our study.

Translation Studies

Translation studies, or translatology, as presented and described by Baker, is the academic discipline concerned with the study of translation. Interest in translation is as old as human civilization, and there is a vast body of literature on the subject dating back at least to the first century BC. Yet as an academic discipline, translation studies is relatively young: it was not until the second half of the 20th century that scholars began to discuss the need to conduct systematic research on translation and to develop coherent translation theories. Before that, translation was present in academia only as part of other disciplines such as comparative literature or contrastive linguistics. Nowadays, however, translation studies comprises a great number of fields, for instance technical or literary translation, subtitling, dubbing, and conference, liaison or community interpreting, encompassing in essence both the translation and formulation of written texts and the oral translation of spoken language (interpreting). Translation studies is also understood to cover the whole spectrum of research and pedagogical activities, from developing theoretical frameworks to conducting individual case studies to engaging in practical matters such as training translators and interpreters and developing criteria for their assessment (Baker, 2001). However, with the rising demand for various specializations within translation studies, and through academic endeavors to define these specializations strictly in order to justify them with regard to the others, there has come unnecessary fragmentation


within translation studies, in which disciplines seeking their own individuality have forgotten that they should rather complement each other. An eloquent discussion of this topic can be found in Baker (2001): In the course of attempting to find its place among other academic disciplines and to synthesize the insights it has gained from other fields of knowledge, translation studies has occasionally experienced periods of fragmentation: of approaches, schools, methodologies, and even sub-fields within the discipline. At a conference held in Dublin in May 1995 for instance, some delegates called for establishing an independent discipline of interpreting studies, because theoretical models in translation studies by and large ignore interpreting and are therefore irrelevant to those interested in this field. This is true to a large extent, just as it is true that within interpreting studies itself far more attention has traditionally been paid to simultaneous CONFERENCE INTERPRETING than to other areas such as COMMUNITY INTERPRETING and liaison interpreting. However, the answer in both cases cannot lie in splitting the discipline into smaller factions, since fragmentation can only weaken the position of both translation and interpreting in the academy. The answer must surely lie in working towards greater unity and a more balanced representation of all areas of the discipline in research activities and in theoretical discussions (Baker, 2001, p. 279). The attempt to establish an independent discipline is based on numerous reasonable and solid arguments (see Pöchhacker, 1994, 2004) that clearly distinguish interpreting, the rendering of spoken language, from translation, the rendering of written language.
There might also be a purely semantic reason behind these attempts at distinction: the English language has no unifying term for interpreting and translation that does not favor one over the other, as opposed to German, Slovak or Czech (Translationswissenschaft: Übersetzen und Dolmetschen / Translatológia: prekladateľstvo a tlmočníctvo / Translatologie: překladatelství a tlumočnictví vs. translation studies: translation and interpreting). There can be no doubt that it is tremendously important to distinguish between interpreting and translation, but not to the extent of separating them completely from each other, as Baker points out. Baker calls on translation scholars to unify and to recognize that no approach, however sophisticated, can provide the answers to all the questions raised in the


discipline, nor the tools and methodology required for conducting research in all areas of translation studies. She points out that there can be no benefit in setting various approaches in opposition to each other, nor in resisting the integration of insights achieved through the application of various research tools, regardless of their origin (Baker, 2001). This complementarity, rather than mutual exclusivity, of the individual areas of translation studies suggests that applying transdisciplinarity, a widely accepted approach in the natural and technical sciences, could be key to evolving translation studies. It should help the discipline address emerging issues more effectively and practically.

Transdisciplinarity

Transdisciplinarity is a term describing the application of various disciplines with an inductive approach. The research question comes from outside, from the real world, and researchers are expected to consult and apply knowledge from the relevant scientific disciplines to address the issue. This approach differs from interdisciplinary research, in which the research problem lies in the common interest of two disciplines, and from multidisciplinary research, in which multiple disciplines share a broad subject of investigation. The Harvard website for transdisciplinary research defines transdisciplinarity as: research efforts conducted by investigators from different disciplines working jointly to create new conceptual, theoretical, methodological, and translational innovations that integrate and move beyond discipline-specific approaches to address a common problem (Harvard, 2015, para. 1). Because transdisciplinarity tends to be mistakenly conflated with interdisciplinarity, we offer a clear distinction, which can be seen in the definition of interdisciplinary research provided by Aboelela, Larson, Bakken, Carrasquillo, Formicola, Glied and Gebbie: Interdisciplinary Research is any study or group of studies undertaken by scholars from two or more distinct scientific disciplines. The research is based upon a conceptual model that links or integrates theoretical frameworks from those disciplines, uses study design and methodology that is not limited to any one field, and requires the use of perspectives and skills of the involved disciplines throughout multiple phases of the research process (Aboelela et al. 2007, p. 341).


A graphical explanation of the disciplinary approaches can be seen in the image below. Examples of the respective overlaps: a) Disciplinary: Epistemologies, assumptions, knowledge, skills, methods within the boundary of a discipline, e.g. Physics, History, Philosophy; b) Multidisciplinary: Using the knowledge/understanding of more than one discipline, e.g. Physics and History; c) Interdisciplinary: Using the epistemologies/methods of one discipline within another, e.g. Biochemistry, Astrophysics; d) Transdisciplinary: Focus on an issue such as pollution or hunger both within and beyond discipline boundaries with the possibility of new perspectives (Holistic Education Network, 2011, para. 3).

Figure 1. Explanation of disciplinary approaches. Holistic Education Network of Tasmania. 2011. Transdisciplinary Inquiry. Retrieved from http://www.hent.org/transdisciplinary.htm.

Transdisciplinary research can be described as a practically oriented, problem-solving, holistic approach that is not constrained by the limitations of a disciplinary framework, but encourages thinking "out of the box" and bringing in new perspectives. Nicolescu (1999) argues that transdisciplinarity is radically different from multidisciplinarity and interdisciplinarity because its goal, the understanding of the present world, cannot be accomplished within the framework of disciplinary research, in which


the latter two remain. It is nonetheless not antagonistic but complementary to them (Nicolescu, 1999; Stavridou & Ferreira, 2010). Transdisciplinarity in an academic discipline should not be understood as an absence of disciplinary knowledge, or as a hint of its underdevelopment relative to other disciplines, but rather as a possibility of broadening the scientific view and increasing academic self-reflection. This should lead to more flexibility and dynamism within the academic discipline (Bartoňková, 2002).

Transdisciplinarity in Transcultural Communication

The transdisciplinary approach can be well illustrated by transcultural communication. Transcultural studies identified that the area of intercultural communication does not cover all the needs arising in a modern, turbulent society full of a myriad of cultures and subcultures, as suggested by Cooke (2007). It is evident that globalization has brought with it new phenomena causing cultural "mingling". Consequently, intercultural communication, which investigated communication between cultures mainly on a national level, could no longer address the new communication needs of an ever more complex society. It was rightly identified that communication should not only consider cultural differences on a national level, but also take a closer look at the individual situation, concerns, needs and communication style of "societies in society", which may comprise different ethnic groups, subcultures, followers of modern trends, etc. To communicate effectively means, firstly, to understand the communication partner and to look at the world through their eyes, and secondly, to use this understanding and knowledge to adapt one's communication style to their language.
For instance, the communication style of a hipster or a paleo enthusiast, much-discussed cultural phenomena in 2015, might not be exceptionally different from that of mainstream society, but understanding them and adapting the language accordingly can increase the effectiveness of the communicated message. Adapting the language to the target audience is one of the most fundamental principles of media, PR and transcultural communication, which effectively borrow knowledge from other academic disciplines to fulfill their purpose, namely to solve specific communication challenges. Thus, transcultural communication as an academic discipline is not about merely collecting knowledge from other disciplines to build its theoretical knowledge base, but rather about finding applications for that knowledge in specific research questions.


There are many professional fields outside translation and interpreting, such as media communication, PR, business and the corporate environment, as well as andragogy, that approach problem solving with a similarly inductive mindset. Therefore, students of transcultural studies can easily apply their skills and knowledge in a variety of professions. This is supported by the findings of Höller (2008), who interviewed graduates of translation studies working outside the translation field on the applicability and usefulness of their acquired transcultural competencies in other job spheres.

Transdisciplinarity in Interpreting Studies

Taking interpreting as another example of transdisciplinarity, it is reasonable to assume that transdisciplinarity played a key role in establishing interpreting as an academic field. Interpreting scholars first sought to address practical problems in interpreting performance and interpreter training, borrowing knowledge from translation studies, linguistics, psychology, stress research, the cognitive sciences, rhetoric and many others (see Pöchhacker, 2004). However, with the academic discipline firmly established in recent decades, transdisciplinarity has gradually been replaced by interdisciplinarity. Transdisciplinarity can also be identified in the core competences of an interpreter. Besides the relevant interpreting expertise (i.e. expert knowledge of the source and target languages and sufficient specialized field knowledge, which may vary from assignment to assignment), interpreters are also required to possess interpersonal, social, communicative and intercultural competences, as well as various personal traits and abilities1 (Pöchhacker, 2004; Baumann, 2013; Krüger, 2013). Baumann (2013) also points out that interpreting studies has broadened its focus as a result of globalization, highlights the importance of swift adaptation to new challenges, and calls for more inter- and transdisciplinarity in interpreting studies.
These arguments clearly suggest that interpreting studies is also an inductive scientific discipline that can greatly benefit from the transdisciplinary approach. Therefore, it might well return to transdisciplinarity by solving practical issues not only within, but also above and beyond translation studies.

1 The importance of the individual aspects varies across interpreting settings.


Conclusion

This theoretical overview has served as an argument for the transdisciplinary approach used in this thesis, which can be outlined through the procedure by which we pursued our research objectives. The first inherent question concerned creating a tool that could help non-native English speakers achieve better speaking proficiency in parliamentary language. We hypothesized that a multilingual glossary of the most frequent words from the political discourse could serve this purpose. After establishing from our literature review that no such frequency-based glossaries for parliamentary language exist, we set out to create one. Initially, we searched for the optimal way to create the list of words for this glossary. This led us to corpus linguistics and to the Europarl corpus as the most relevant source. Next, to back our assumption about the usefulness of the glossary, we searched for supportive knowledge in the relevant disciplines, which brought us above and beyond what we initially expected to explore. Finally, after finding arguments supporting our assumptions, we consulted other scientific disciplines for the proper compilation and presentation of the glossary. We believe that our problem-oriented research procedure reflects a transdisciplinary approach and hope that it will lead to a broadening of scientific knowledge and to some practical application. This thesis puts the case for using multilingual glossaries based on high frequency word lists as didactic tools, but this hypothesis still remains to be verified in further experimental studies. As the reader will see in the following chapters, we have applied and synthesized various knowledge and methods from corpus linguistics, language teaching, speech science, language for special purposes, terminology, politolinguistics and interpreting studies in order to reach our research objective.
The research was oriented more towards practical application than towards contributing to the large body of knowledge in translation studies. For this reason, we decided to apply the transdisciplinary approach explained in this introductory chapter.


Main Theoretical Framework for Glossary Compilation

Introduction

In this chapter, we will consult various academic disciplines in order to find answers to practical questions arising from our intention to compile a multilingual glossary based on a quantitative lexical analysis of the Europarl corpus. In the first part of this chapter, we will pay considerable attention to two main academic disciplines that can answer the fundamental questions of why it is important to create word lists for second language acquisition and how they can benefit the learner. To answer these questions, we will first discuss the usage of word lists in pedagogy and second language acquisition. We will start with the history and usage of high frequency word lists (HFWL), explain the connection between vocabulary size and the usage of HFWL, discuss vocabulary categorization and acquisition, and finally present some limitations of HFWL together with suggestions for correct vocabulary acquisition with the help of HFWL. Secondly, we will take a closer look at speech production itself in order to provide a scientific basis for the pedagogical implications stated above. This includes speech production theory, an explanation of the automatization of language processes with special regard to the bilingual lexicon, and the introduction of the concept of shared attention. We will then present a synthesized model based on the knowledge gained from the relevant scientific disciplines. In the second part of this chapter, practical knowledge from relevant academic fields will be gathered for reasonable vocabulary selection and glossary compilation in accordance with existing standards.
These include lexicology, terminology, language for special purposes (LSP) with special regard to politolinguistics, and computer-assisted language learning (CALL) focused on computer-assisted vocabulary learning (CAVL). Before diving into various scientific fields in search of answers to our fundamental questions that would validate our intention to create a multilingual glossary based on a quantitative lexical analysis, it is necessary to define the term glossary.


A glossary is basically a list of terms2 in a particular field of knowledge with definitions and explanations in one or more languages (see Vanopstal, Vander, Laureys & Buysschaert, 2009; Pearson & Bowker, 2002). It originates from the Latin word glossarium, which refers to a collection of glosses. Gloss stems from the Greek word glossa, which denotes the explanation of a specialized expression or a difficult word (Vanopstal et al., 2009). The amount of information contained in glossaries may vary, as may the level of detail, which usually depends on the purpose for which a glossary is intended. The glossary compilation process falls under terminology, which sets the relevant standards and guidelines (Pearson & Bowker, 2002). There are a number of reasons for building a glossary. These may include identifying equivalents in a foreign language for terms known in the native language, providing explanations of unknown specialized terms, or creating a language resource that can later be used for teaching, learning or speech production in written or spoken form. As we have already mentioned, the aim of this thesis is to create a glossary of the most common words used in the political discourse of the European Parliament. The glossary should serve as a useful and relevant learning tool with the potential to boost learners’ second language speaking proficiency in the political language. To achieve this, many other academic fields must be reviewed and discussed in the next chapters, going above and beyond terminology, lexicology and translation studies. We sincerely hope that our findings will prove relevant, feasible and valid enough to back the hypothesis mentioned in the introduction.

The Usage of High Frequency Word Lists in Second Language Learning & Teaching

History and usage of high frequency word lists. The review of literature and webpages supports the widely accepted opinion that learning vocabulary in general, and high frequency word lists (HFWL) in particular, should be an important component of learning a language. Teaching HFWL has been shown to be an important aspect of early childhood education, improving the reading, writing and spelling proficiency of native English pupils.

2 See page 58.


The first HFWL for teaching purposes was compiled by Dolch in 1936 on the assumption of a rough correlation between vocabulary knowledge and text difficulty. It consists of 220 words believed to make up 50–75% of the reading material encountered by students. The list was later used for the classification of reading level by grades and is presumably still used in classrooms today (Leibert, 1991). Fry expanded on the well-established Dolch list in 1996 and produced several HFWLs based on the author’s long-term research. The last Fry list consists of the 1,000 most frequent words and is believed to be the most widely used HFWL (Farrell, Osenga & Hunter, 2013). Despite clear pedagogical implications, it must be noted that these word lists are used mostly for teaching primary school pupils or beginning ESL students, and hence are far from useful for advanced ESL speakers. However, some HFWLs have also been produced for more advanced English speakers. We will now briefly describe the history and development of HFWLs for academic and ESL purposes. The General Service List (GSL), created by West in 1953, is considered to be the first HFWL created for ESL learning purposes. This list of the 2,000 most frequent words was envisioned by the author to be “the selection of English most suitable to set as a first objective for foreign learners” that “is trying to simplify English for the learner” (West, 1953, p. 45, as cited in Graham, 2008, p. 12). This HFWL was of superior quality: despite being produced manually, it included semantic differentiation, a feature that was unfortunately not adopted as a standard by other HFWLs for a long time after the GSL was created, even during the advent of corpus linguistics brought about by computer advancement. The Academic Word List (AWL), created by Coxhead (1998), examined the most frequently occurring words of written academic texts by their range, frequency and uniformity of occurrence.
Interestingly, the GSL was used here as a stop list: the AWL focused on the most frequent words outside the first 2,000 words of the GSL. The list itself covers only 10% of all running words in the academic corpus used, and this percentage would be expected to be even lower in general English; nevertheless, these words have been identified as strong carriers of meaning and as important for vocabulary acquisition.


Gardner and Davies (2013) took HFWL development one step further by improving the AWL, creating a searchable web database based on the COCA corpus3, and providing useful information about contemporary HFWLs and their pedagogical implications. One notable HFWL is the Longman Communication 3000. The above-mentioned HFWLs were made predominantly for academic purposes and created from corpora of written academic texts. The Longman Communication 3000, however, gives an important insight into both written and spoken English by categorizing the list for both registers. The analysis of the Longman Corpus Network (390 million words) shows that these 3,000 most frequent words account for 86% of the language. The authors claim the list to be a very powerful tool for the development of good comprehension and communication skills in English. The list was further developed into an electronic dictionary of contemporary English (LDOCE) with powerful features facilitating the learning process (tests, phrase banks, examples in context, etc.) (Longman Communication 3000, 2003). The latter example shows clearly that learning vocabulary can nowadays be enhanced with computer-assisted vocabulary learning (CAVL). This trend is also supported by distinguished scholars in the field. For instance, Nation (2001) sees great potential in using CAVL for a significant boost in the effectiveness of teaching and learning new vocabulary4. Vocabulary size in regard to HFWL. In the previous chapter we introduced HFWLs consisting of several thousand of the most frequent words. Although such lists are important for the automatization of vocabulary5, we must point out here that an advanced ESL speaker is expected to have a much larger vocabulary than the ones suggested in the HFWLs above. It is also natural to expect that these high frequency words only create the backbone of the language, and that the actual meaning of a speech utterance is conveyed by less frequent words.
On the other hand, if the learner does not yet possess sufficient

3 See http://www.wordandphrase.info/academic/
4 A brief account of CAVL will be given in the next chapters.
5 The automatization of language processes will be discussed on page 35.


proficiency in the most frequent words, there is little sense in focusing on other vocabulary until these are well learned (Waring & Nation, 1997). When speaking about vocabulary size, a clear distinction between active and passive vocabulary must be made. The sizes of active and passive vocabulary correlate, but the active vocabulary6 is as a rule always smaller than the passive or receptive vocabulary7. Additionally, active vocabulary acquisition is reported to be slower and less effective than receptive acquisition (Laufer, 1998; Nemali, 2010). The study by Cervatiuc (2008b) suggests that the average receptive vocabulary size of highly proficient university-educated non-native English speakers ranges between 13,500 and 20,000 word families8, which is comparable to university-educated English native speakers with an average of 17,200 word families. For comparison, the most conservative rule of thumb says that a native speaker has a vocabulary size of roughly up to 20,000 words, which is expected to grow by about 1,000 words per year (Goulden, Nation & Read, 1990). Surprisingly, a similar learning speed may also be achieved by non-native speakers, and it can be even higher if the learning is done in a second language environment. Some studies (Meara & Jones, 1990; Milton & Meara, 1995) found that vocabulary acquisition by the advanced exchange students participating in them averaged between 2,500 and 2,650 words per year. This would imply that an ESL learner would need only 6.5 years to achieve a near-native proficiency level (Cervatiuc, 2008a, 2008b), but we must not forget that students usually do not have the opportunity to study abroad for longer than 6–12 months. Therefore, even though the higher effectiveness of long-term immersion compared to learning in the traditional classroom context cannot be denied (Serrander, 2011), it should rather be considered

6 Active vocabulary represents all the words that a person can actually use at will in speech production or writing – in other words, the lexical command (Leńko-Szymańska, 2002).
7 Receptive vocabulary size is understood as “all the words in a person’s language repertoire, i.e. words that a person can comprehend and respond to, even if the person cannot produce those words” (Chong, 2011, p. 1231).
8 A word family consists of a headword, its inflected forms, and its closely related derived forms (Nation, 2001b).


a short-term boost in vocabulary expansion than a steady rate of vocabulary acquisition. For this reason, it is advisable to hold more modest and realistic views on the speed of vocabulary acquisition, which varies greatly with the learning techniques used and the time and effort devoted to vocabulary study. Vocabulary categorization. As far as the active vocabulary is concerned, it is very difficult to provide statistical data about its size, unlike in the case of receptive or passive vocabulary. Numerous reliable statistical tests have been developed for receptive vocabulary, but the free active vocabulary store does not lend itself easily to measurement because it comprises both qualitative and quantitative dimensions, such as lexical density, lexical originality, lexical variation, sophistication, semantic variation, etc. (Leńko-Szymańska, 2002). Regarding the counting of active vocabulary size itself, one proposed quantitative method worth mentioning is “the beyond 2000 measure”, in which researchers analyze a speech sample of ESL learners and count only the words not included in the top 2,000 most frequent words (Eubank et al., 1995). However, the results can be influenced by many variables (topic, speech corpus size, etc.) and we do not take the view that they can deliver reasonably reliable statistics. It would be very useful to have such information for filtering generally known words out of the keyword results, but since there are no reliable data on this matter, we will neither provide statistical data on the average active vocabulary of an English native or non-native speaker in this paper, nor use it as a theoretical basis for manipulating the word frequency analysis by introducing stop lists based on average active vocabulary. There is another very important aspect of mapping vocabulary size. Many studies of native speakers’ vocabulary growth and size treat all words as equal in value to the learner.
However, frequency-based research has shown very strikingly that this is not the case and that some words are indeed much more useful than others. According to Nation (2001), words can be divided into the following four categories based on their usage: a) high frequency words, b) academic words, c) technical words, and d) low-frequency words. The first category includes the high frequency words identified for general English. This category consists mainly of function words conveying grammatical information (Van Gelderen, 2005) and some common content words that convey meaning (e.g. production, government). Corpus research suggests that 80% of the running words in a text are high


frequency words. In order to exclude common and function words from the analysis of the Europarl corpus, we will place the 3,000 most frequently used words on a stop list that filters them out of our keyword results9. The second category includes academic words, which are widely represented in academic texts but tend to account for no more than 9% of all running words in general English. Academic words are sometimes called sub-technical vocabulary, which contains not technical words but rather the formal vocabulary creating the structure of academic language (Nation, 2001b). The third category comprises technical words. These are related to the topic and subject area of the text. Technical words are reasonably common in technical texts, where they cover about 5% of the running words, but not common elsewhere. Yet technical words typically carry a significant portion of the meaning of a text, and their active knowledge is necessary for effective communication in a language for special purposes (LSP). This category is of particular interest to this thesis, and since we believe that our quantitative lexical analysis will identify a significant portion of such words, we will describe it in more detail. Technical words are usually associated with specialized vocabulary. A specialized vocabulary can be prepared by systematically restricting the range of topics or language uses investigated, i.e. by creating a specialized corpus (e.g. the Europarl corpus), counting word frequencies and comparing them with more general corpora. This process is also called keyword analysis and will be described in more detail in the chapter Main features and methods used in corpus analysis software. Nation (2001) points out that learning the technical vocabulary is particularly important for a general understanding of LSP and significantly more effective for LSP acquisition.
For instance, adding the AWL to the GSL’s 2,000 high frequency words raises the coverage of academic texts from 78% to 86%, which means that only about one word in seven will be unknown, instead of more than one in five. This is a very significant change. However, if the learner had moved on to the third 1,000 most frequent words instead, he or she would have gained only 4% extra coverage rather than the AWL’s 10%. Thus, learning LSP vocabulary helps significantly in

9 The stop list is explained in more detail in the chapter Main features and methods used in corpus analysis software.


understanding and producing LSP texts. As a result, teachers and learners should treat specialized vocabulary like high frequency vocabulary during the learning process and pay considerable attention to acquiring competence in using both types of vocabulary (Nation, 2001). The fourth category includes the low-frequency words that cannot be put into the academic or technical word categories. They also carry a significant portion of meaning, but since they do not appear frequently, it is more reasonable to infer their meaning from context through incidental learning than to focus on learning them methodically. Vocabulary acquisition. Our thesis deals with expanding the active vocabulary in a language for special purposes (LSP), more particularly the language of EU parliamentary settings. As already mentioned, the active vocabulary is smaller and harder to acquire than the passive one. Since we are aware of these two drawbacks of active vocabulary and at the same time of its importance for speech production, from now on we will focus on explaining active vocabulary acquisition. There are many topics one could concentrate on, such as long-term vs. short-term memory, remembering, the retrieval process, forgetting, associations, learning strategies, mnemonics, etc., but since the aim of this thesis is only to produce a relevant frequency-based English LSP word list and to support its usefulness with theoretical knowledge, these topics will not be discussed here. We will rather focus on the concepts of incidental learning and implicit vocabulary learning. To explain incidental learning, we will briefly touch on the subject of vocabulary acquisition. There are two main hypotheses for vocabulary acquisition: the implicit vocabulary learning hypothesis and the explicit vocabulary learning hypothesis.
The implicit vocabulary learning hypothesis holds that the meaning of new words is acquired at the unconscious level as a result of abstraction from repeated exposure in a range of contexts. This knowledge is also represented subconsciously in the brain, as so-called “tacit knowledge”, as proposed by Chomsky (Krashen, 1989). This so-called incidental learning is typical of children acquiring their native language vocabulary, but it applies to adults as well. Research studies directly demonstrate that readers acquire vocabulary from text without paying attention to the learning process. The teaching method that applies reading for vocabulary acquisition and encourages readers to read large amounts of easy material is called extensive reading (Day & Bamford, 2002). In


general, people who read more know more vocabulary, and this relationship appears to be causal in that it holds even when intelligence is controlled for (Ellis, 1995). The explicit vocabulary learning hypothesis states that the acquisition of new words can be strongly facilitated by the use of a range of metacognitive strategies that include three steps: firstly, noticing that a word is unfamiliar; secondly, making attempts to infer the word’s meaning from context or dictionaries; and thirdly, making attempts to consolidate this new understanding by repetition and associational learning strategies (e.g. mnemonics) (Krashen, 1989). A great many strategies for optimal vocabulary acquisition can be found in the relevant literature. Incidental learning can also happen within explicit vocabulary learning, depending on which words a learner chooses to learn. Therefore, we can differentiate between learning an HFWL and learning incidentally by encountering an unknown word and becoming familiar with its meaning in one way or another (depending on the chosen learning technique). Some scholars claim that learning new words through guessing their meaning in context is actually the most important vocabulary learning strategy of all (Nation, 2001b), but there are also many staunch advocates who point to learning vocabulary lists as the most effective method, for instance: The suggestion that learners should directly learn vocabulary from cards, to a large degree out of context, may be seen by some teachers as a step back to outdated methods of learning and not in agreement with a communicative approach to language learning. This may be so, but the research evidence supporting the use of such an approach as one part of a vocabulary learning program is strong (Nation & Waring, 1997, para. 20). There is a very large number of studies showing the effectiveness of deliberately learning vocabulary from word lists or word cards in terms of the amount and speed of learning.
This type of learning is often compared with incidental learning or with learning from context. Here it is important to note that research on learning from context shows that this type of learning does occur, but that it requires learners to engage in large amounts of reading and listening, because the learning is slow, small, and cumulative. Given the same amount of time, deliberate learning of decontextualized vocabulary has


been found to consistently surpass vocabulary learning in context, even though some studies argue otherwise10. Be that as it may, this should not call learning from context into question; quite the contrary. As Nation and Waring further point out, learning from context is by far the most important vocabulary learning strategy. However, for fast vocabulary expansion, it is not sufficient by itself. Therefore, it is strongly advisable to combine both techniques to achieve the best results (Nation & Waring, 1997; Berns, 2010). Interestingly, direct learning of vocabulary from cards has also proven to have a slightly higher benefit than learning through focused grammar instruction. Nation and Waring (1997) conclude that deliberate learning from word cards is a learning tool suitable at any level of vocabulary proficiency and that frequency information provides a rational basis for making sure that learners get the best return on their vocabulary learning effort. However, there are still some objections worth mentioning. Learning an HFWL seems at first glance to be an effective systematic approach: the learner can learn the 2,000 most frequent words that account for about 80% of the language, then move on to the 3,000 most frequent words accounting for 84%, and so on. The practical problem is that learning vocabulary, like attaining excellence in other domains, follows a curve of diminishing returns. In other words, learning 3,000 words does not bring the same time-and-effort-to-effect ratio as learning 2,000 words (Cobb, n.d.). The question then arises of how to achieve the ultimate goal of 95% coverage, which according to studies is sufficient for understanding a text without difficulties (Fry & Kress, 2006).

Figure 2. Set of word lists ranked by frequency and their percentage of coverage of the BNC Corpus. From Cobb, n.d. Retrieved from http://www.lextutor.ca/research/6k.gif
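The diminishing returns visible in Figure 2 can be illustrated with a toy calculation. The sketch below assumes word frequencies follow Zipf's law (rank r has frequency proportional to 1/r) over a hypothetical vocabulary of 100,000 word types; the numbers are illustrative only and are not measurements from the BNC.

```python
# Toy illustration of diminishing returns in vocabulary coverage,
# assuming an idealized Zipfian frequency distribution (an assumption
# for illustration, not real corpus data).
from math import fsum

VOCAB_SIZE = 100_000
weights = [1 / r for r in range(1, VOCAB_SIZE + 1)]
total = fsum(weights)

def coverage(top_k: int) -> float:
    """Share of running text covered by the top_k most frequent words."""
    return fsum(weights[:top_k]) / total

for k in (1_000, 2_000, 3_000, 6_000):
    print(f"top {k:>5}: {coverage(k):.1%}")
```

Under this idealized model, each additional thousand words adds a visibly smaller slice of coverage than the previous one, which is exactly the shape of the curve in Figure 2.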

10 See Ying (2010).


A good approach seems to be learning an HFWL and then continuing to expand the vocabulary through incidental learning, in which the learner chooses what is relevant for his or her vocabulary. However, there is a risk of learning words that are rare, irrelevant for usual conversation, or that will not recur often enough in other texts to be remembered effectively. Cobb (n.d.) proposes a good solution to this problem: combining a general HFWL with an HFWL obtained from a domain of the learner’s particular interest. This is the idea behind academic word lists, which are said to greatly improve reading proficiency in academic texts. Nation and Waring (1997) note that there are certain words that are not necessarily frequent in the language at large, but are very frequent in academic texts. These words were later brought together as the AWL mentioned above. Cobb suggests that by combining the 2,000 most frequent words with the AWL, a total of 2,570 words, it is possible to cover 90% of all words in the special domain of academic texts (Cobb, n.d.). Taking this assumption to be correct, we have decided to draw an analogy between academic texts and languages for special purposes, in our case the parliamentary language. We take the view that by combining the 3,000 most common words in English (used as a filter or stop list in the corpus analysis) with 2,000 keywords identified as unusually frequent in comparison to other texts, we can achieve a similar efficiency in acquiring the parliamentary language as in the case of combining a general HFWL with the AWL. We also expect that initial learning of the glossary will lead to more efficient vocabulary retrieval than one would achieve in the same amount of time through incidental learning.
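The procedure described above — counting word frequencies in a specialized corpus, comparing them with a reference corpus, and filtering out the most common general words via a stop list — can be sketched as follows. The corpora, the function name, and the simple relative-frequency score are placeholders for illustration; real keyword analyses typically rely on statistics such as log-likelihood, as implemented in corpus analysis software.

```python
from collections import Counter

def keywords(specialized_text: str, reference_text: str,
             stop_list_size: int = 3000, top_n: int = 2000) -> list[str]:
    """Rank words that are unusually frequent in the specialized corpus.

    Words among the `stop_list_size` most frequent words of the
    reference corpus are treated as general vocabulary and filtered out.
    """
    spec = Counter(specialized_text.lower().split())
    ref = Counter(reference_text.lower().split())
    stop_list = {w for w, _ in ref.most_common(stop_list_size)}

    spec_total = sum(spec.values()) or 1
    ref_total = sum(ref.values()) or 1

    def score(word: str) -> float:
        # Simple relative-frequency ratio; the +1 smooths words that
        # are absent from the reference corpus.
        return (spec[word] / spec_total) / ((ref[word] + 1) / ref_total)

    candidates = [w for w in spec if w not in stop_list]
    return sorted(candidates, key=score, reverse=True)[:top_n]

# Tiny invented example, not real corpus data:
spec = "honourable members the directive on the directive was adopted"
ref = "the cat sat on the mat and the dog sat on the log"
print(keywords(spec, ref, stop_list_size=3, top_n=5))
```

In this toy run, the general words on the reference-derived stop list are excluded, and the domain word "directive" rises to the top of the keyword ranking, mirroring on a miniature scale what the Europarl analysis is intended to do with a 3,000-word stop list and 2,000 keywords.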


Limitations of HFWL

One very important issue with HFWLs is the fact that frequency counts largely ignore words with multiple meanings. A general problem with word frequency approaches is that more common words are likely to have multiple meanings. For this reason, where technical vocabulary is also high frequency vocabulary, teachers should help learners to see the connections and differences between the high frequency meanings and the technical uses (Nation, 2001a). It is also important to mention that such uncommon meanings, especially in LSP (e.g. house as a dwelling vs. the House of Commons), may pose a problem in corpus analysis because they might be inadvertently filtered out by the selected stop lists. We are aware of this limitation. The proposed solution is to go through the words in the stop list manually, to select words with expected multiple meanings, and to conduct a qualitative corpus analysis with concordance searches to find collocations suggesting special usage in the political parliamentary language. Conclusion and suggestions for correct vocabulary acquisition. To sum up the importance of vocabulary acquisition from the teaching perspective, we would also like to emphasize that proponents of communicative teaching strategies, as opposed to the formal teaching of grammar, agree that the growth of vocabulary, regardless of the technique, can only enhance the natural acquisition of language competence. From the practical point of view, students are generally expected to learn lists of vocabulary whether they are directly encouraged to do so or not. Without too much effort, a student can learn well over 30 words per hour by studying vocabulary lists (Ellis, 1995).
Stahl and Fairbanks (1986) performed an analysis of approximately one hundred independent studies comparing the effectiveness of vocabulary instruction methods and found the following:
a) Vocabulary instruction is a useful adjunct to natural learning from context;
b) The methods which produced the highest effects on comprehension and vocabulary measures were those involving both definitional and contextual information about each word;
c) Methods providing several exposures to each word were more beneficial than drill-and-practice methods;
d) Mnemonic techniques produced consistently strong effects;
e) Methods which provided a breadth of knowledge about each word to be learned, drawn from multiple contexts, had a particularly good effect on the later understanding of texts in which those words were used.


From the gathered knowledge about teaching vocabulary, we remain confident that the pedagogical implications are generally in favor of creating frequency-based vocabulary lists for LSP. It has been shown that HFWLs are a very effective method for vocabulary acquisition in LSP. In order to prove their importance for ESL speakers not only from the pedagogical perspective, we decided to look deeper into the area of the cognitive sciences. In the next chapter, we will put the case, from the speech science perspective, that learning from HFWLs should help improve speech production output.

Speech Science

In the previous chapter we introduced the importance of frequency-based glossaries from the didactic perspective of second language acquisition. In order to back up the authors’ claims, we decided to look for supportive knowledge in speech and cognitive science on whether an improved vocabulary is reflected in better lexical competence and, consequently, in more effective speech production. In order to explain the benefits of glossaries for speech quality improvement, we should first look at the speech production process itself, which encompasses both speech production in the mother tongue (hereafter referred to as L1) and in a foreign language (referred to as L2). Speech production theory. A description of the native speaker’s language production system suggests that producing or comprehending speech is indeed a complex task involving many sub-stages (Crookes, 1991). All speech production researchers agree that language production has four important components: a) conceptualization, i.e. planning what one wants to say; b) formulation, which includes the grammatical, lexical, and phonological encoding of the message; c) articulation, in other words, the production of speech sounds; and d) self-monitoring, which involves checking the correctness and appropriateness of the produced output (Crookes, 1991). In L1 production, planning the message requires attention, whereas formulation and articulation are automatic. Therefore, the processing mechanisms can work in parallel, which makes L1 speech generally smooth and fast. Researchers also agree that one of the basic mechanisms involved in speech production is activation spreading, a term adapted from brain research based on findings


in neurological studies. The speech-processing system is assumed to consist of hierarchical levels (conceptualization, formulation, articulation) among which information is transmitted in terms of activation spreading, i.e. exchanges of simple signals among brain cells in the neural network. In knowledge stores such as the lexicon and conceptual memory, activation spreads from one item to related items, and decisions are made on the basis of the activation level at so-called nodes, which represent various units such as concepts, word forms, phonemes, etc. The stronger the connection, the lower the cognitive load and the higher the degree of automatization (Kormos, 2014). Kormos gives an exhaustive account of speech production and second language acquisition research in her recently published book (2014), stating that the most widely used theoretical framework in language production research for both L1 and L2 is Levelt’s blueprint of the speaker. We will use this model for further elaboration on the speech process. Levelt’s blueprint of the speaker incorporates the four components of language production (conceptualization, formulation, articulation, self-monitoring). It argues that speech production is modular, i.e. it can be described through the functioning of a number of processing components that are relatively autonomous, consecutive (one cannot start before it receives input from the preceding one), and simultaneous (they are performed more or less in parallel). The model distinguishes two principal components, the rhetorical/semantic/syntactic system and the phonological/phonetic system, and supposes the existence of three knowledge stores: the mental lexicon, the syllabary (containing gestural scores, i.e. chunks of automatized movements for producing the syllables of a given language), and the store containing the speaker’s knowledge of the external and internal world (Levelt, 1999; Kormos, 2014). The model is illustrated below:

Figure 3. Levelt’s (1999a) blueprint of the speaker. Copyright 1999 by Oxford University Press.


Levelt’s blueprint of the speaker can also be used to describe L2 speech production, but one must bear in mind that L2 speakers face some additional difficulties compared to L1 speakers. We can categorize them into four main areas. Firstly, on the semantic level, some words in L1 may have no equivalent in L2 and must be described, or vice versa. This can be illustrated, for instance, by the English word privacy, which has no direct equivalent in Russian. Secondly, on the lexical level, the majority of studies suggest that the conceptual specification contained in the preverbal plan activates both L1 and L2 items in the mental lexicon, making words in both languages compete as candidates for lexical encoding. Additionally, the lexical bank in L2 might be limited or not automatized; the degree of these limitations correlates with language proficiency. If the speech process is not fully automatized, this might result in a much higher cognitive load than in L1. To put it in lay terms, in a foreign language we say what we can, but in our mother tongue we can express what we want. The lexical level is particularly important for our thesis because its aim is to introduce a glossary that should reduce the cognitive load of advanced ESL speakers at the lexical level in EU parliamentary language settings.
Thirdly, on the syntactic level, it is reasonable to expect that considerable differences in syntactic structure between L1 and L2 (e.g. transitivity of verbs, gender system) may result in additional cognitive load if the syntactic structures in L2 were not fully automatized during the learning process. Fourthly, on the phonological level, articulation of words in L2 is also believed to require more attention, but this is rather the case for ESL beginners, who rely heavily on L1 syllable programs, than for advanced L2 speakers, who adopt and master the syllable programs of L2 (Kormos, 2014). Additionally, L2 speakers have to face problems in communication, mainly a) resource deficits, b) processing time pressure, c) perceived deficiencies in one's own language output, d) issues related to speech comprehension, and e) lexical, phonological or semantic interference from L1. L2 speakers have to make a conscious effort to overcome these problems. These efforts are called communication strategies and tend to improve with a snowball effect as the second language is mastered.

Automatization of language processes with special regard to the bilingual lexicon. In general, the bilingual lexicon is built through cross-linguistic activation and encompasses processing differences of phonological and semantic information and effects of specific word types (cognate status and frequency). Studies have shown that processing is the result of internal factors (such as proficiency), situational factors (such as L2 status, mode, and recency), linguistic features (such as psychotypology) and learning context (such as immersion) (Serrander, 2011). We would now like to focus on specific areas relevant to the use of glossaries in language learning, which mainly concern the automatization of the bilingual lexicon. Serrander offers an interesting elaboration on automatization in the bilingual lexicon:

It is frequently emphasized that repetition and practice are essential in L2 learning.
Learning L2 words means building mental representations for them in the bilingual mental lexicon. Practicing these words means practicing the process of mapping the L2 segments onto their corresponding mental representations. In this sense, it is the L2 processing skills that improve as a result of practice and
repetition. The improvement referred to here is called automaticity (Serrander, 2011, p. 19).

Speech science and research in second language acquisition provide an interesting insight into the correlation between frequently used words and their effective usage, also claiming that automatization plays a central role in forming the mental lexicon. Numerous studies have proven the so-called word frequency effect, which manifests itself in improved performance on high-frequency words compared to low-frequency words in almost any language task (Serrander, 2011). This implies that automatizing certain words that are expected to appear in certain contexts (specific vocabulary) can shape the mental lexicon and “instill” these words in active memory. In other words, the speaker does not need to access his or her long-term declarative memory but can make use of learned automatisms and use the saved cognitive energy for tackling other challenges. This should consequently result in improved speed and accuracy or, more generally, in better lexical proficiency (DeKeyser, 1997). As already mentioned, L2 speakers are at a considerable disadvantage compared to L1 speakers, who perform many subtasks automatically. Therefore, it is important for L2 learners to acquire automatization of these processes through intentional repetition or long exposure to the right language stimuli. Levelt proposes that the acquisition of skill in speech production, as with any other skill, "consists essentially of automation of low level plans or units of activity" (1978, p. 57, as cited in Crookes, 1991, p. 118). The decision of which section of the language will be practiced depends to a large extent on the learner and his or her form of planning. Pre-planning of an utterance provides a way for less well automatized sections of the system to be used, and intentionally improved. This decision-making is important since what is not used will not get automatized.
Formal practice is believed to transform explicit knowledge of L2 into implicit knowledge, which is important for better language production (Crookes, 1991). However, the improvement of speech production lies not only in intentional learning, but also in the monitoring of produced speech. If learners monitor their own speech output, an utterance produced successfully on one occasion may be noted and reused thereafter with increasing automaticity. This has pedagogical implications for using the glossary not only for learning the most frequent words, but also for using them in
context, for instance, in a learning activity during which students try to put them into more complex sentences by using word clues or sentence fragments. To sum up, when some degree of automaticity is attained, a cognitive activity can be performed without attentional control, or at least with less effort and improved efficiency (less involvement of attention/awareness) (Young & Stanton, 2002; Serrander, 2011). Such automatic skills are consequently said to be unconscious, effortless, smooth and fluent, which is especially important when simultaneously dealing with other subtasks in L2 speech processing, as mentioned above. The importance of saving cognitive resources for other subtasks can be inferred from Levelt’s model and is better illustrated in the next section on Kahneman’s attention model, which describes the allocation of attention to various cognitive activities.

Concept of shared attention. Originally, it was believed that the human mind cannot execute more than one complex cognitive task simultaneously. In the cognitive sciences, complex tasks are described as controlled processes: conscious, limited in capacity, requiring attentional resources and effort, and capable of being applied at will in changing circumstances (e.g. learning). They stand in opposition to unconscious automatic processes, which have no capacity limitation, require no cognitive effort, and are difficult to modify once learned (e.g. walking, breathing) (Quinlan, 2008). The generally accepted view of a fixed, single channel of limited capacity for controlled processes was challenged by psychologists in the 1970s (e.g. Gerver), who postulated replacing the notion by that of a “fixed-capacity central processor, whose activity could be distributed over several tasks within the limits of the total processing capacity available” (Pöchhacker, 2004, p. 116).
This notion gave birth to attention-sharing models that, as we will see later, were also modified and implemented in theories of interpreting studies. Still, the concept of multitasking in the human brain has some vociferous opponents among neuroscientists. They claim that the human brain only appears to be capable of multitasking: the distraction of several activities performed simultaneously is claimed to thwart the cognitive ability to perform multiple tasks
effectively11 (see Medina, 2008). However, we do not need to refute this claim, because it still supports the notion that the human brain is indeed capable of multitasking, albeit with a much reduced effect compared to situations in which attention is focused entirely on one complex activity. Nonetheless, we will assume that the position endorsed by Gerver is correct and the human brain is capable of multitasking. A very suitable model for explaining this notion is Kahneman’s model of attention and effort12. He assumes that there is a single pool of attentional resources13 which can be divided intentionally among multiple tasks with great flexibility. Accordingly, attention can be focused on one particular activity, or can be divided between multiple activities. With increasing difficulty of the tasks, the attention demand rises. If two complex tasks compete for common resources whose combined demand exceeds the available resources, performing both tasks at the same time becomes difficult. Attention is here understood as a limited supply of resources, whereas the effort caused by arousal may vary slightly based on a person’s motivation. Too much motivation can result in stress causing performance decline (the Yerkes-Dodson law), but that belongs rather to the area of stress research, which we do not intend to discuss here (Kahneman, 1973; Styles, 2006).

Figure 4. Kahneman’s model of attention and effort. Kahneman. 1973. A capacity model for attention. Retrieved from

11 Medina cites, among many scientific studies, research findings showing that the danger of driving while talking on a cellphone cannot be reduced by using hands-free devices, because the driver is distracted by the conversation itself rather than by the hand movements needed to operate a cellphone. 12 Another well-accepted model is Multiple Resource Theory (see Basil, 2012), but we deemed Kahneman’s model more suitable for establishing the link between cognition, speech production and interpreting. 13 It could be compared to computer memory (random access memory, RAM), which is distributed among various processes by the operating system.


http://www.princeton.edu/~kahneman/docs/attention_and_effort/attention_lo_quality.pdf

Kahneman’s model is rather broad in its generalization and not directly focused on language processing, but it offers a solid theoretical basis for further work in speech science. Another model based on the attention-sharing principle is worth mentioning, although it also includes processes above and beyond language production. This model was created by Gile to explain attention sharing during simultaneous and consecutive interpreting. Gile’s effort model was constructed to account for so-called “problem triggers” that cause difficulties in interpreting (compound technical terms, proper names, numbers). He hypothesized that interpreters usually work at the limit of their processing capacity, and these occasional problem triggers may lead to an inability to interpret the whole segment correctly, which results in a sharp temporary decline in the output quality of the interpreter (Gile, 1999). The effort model describes operational constraints and is not intended to describe the architecture of processing in terms of particular mental structures and information-processing flows. The model for simultaneous interpreting originally included three main efforts: listening and analysis (L), speech production (P), and short-term memory operations (M). It is important to note that they have non-automatic components; therefore, all three require attentional resources. The three efforts were later supplemented by the coordination effort (C), which helps allocate cognitive resources for these complex
tasks at will. The total effort, not necessarily a purely arithmetic sum of the described efforts, should be lower than the total cognitive capacity (TotC) if the interpreter is to produce adequate, high-quality interpreting. Should the sum of all efforts exceed the TotC, for instance when running into problem triggers while interpreting, the output quality will decline. This competition hypothesis is intuitively accepted by interpreting studies scholars and is explicit in many anecdotal accounts of difficulties that interpreters encounter. It has also not generated any criticism when presented to cognitive scientists (Gile, 1999).

Gile’s effort model (c stands for capacity for the respective effort) (Gile, 1999):

TotC = c(L) + c(P) + c(M) + c(C)

It is beyond doubt that consecutive and simultaneous interpreting require more subtasks than speech production either in L1 or in L2, but this theory can also give us an important insight into the concept of attention sharing linked to speech production.
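The competition hypothesis behind Gile’s formula can be illustrated with a minimal numeric sketch. All capacity values below are invented purely for illustration; Gile himself stresses that the total is not a purely arithmetic sum:

```python
# Illustrative sketch of Gile's effort model for simultaneous interpreting.
# Capacity units are arbitrary; all numeric values are invented for illustration.

def interpreting_load(listening, production, memory, coordination):
    """Total processing demand as the sum of the four efforts (Gile, 1999)."""
    return listening + production + memory + coordination

TOTAL_CAPACITY = 10  # TotC: the interpreter's available capacity (assumed value)

# Normal passage: demand stays below capacity, output quality is adequate.
normal = interpreting_load(listening=3, production=3, memory=2, coordination=1)

# A "problem trigger" (e.g. a string of numbers) raises listening and memory demand.
trigger = interpreting_load(listening=5, production=3, memory=4, coordination=1)

print(normal, normal <= TOTAL_CAPACITY)    # 9 True  -> adequate output
print(trigger, trigger <= TOTAL_CAPACITY)  # 13 False -> temporary quality decline
```

The sketch merely restates the competition hypothesis in executable form: as soon as the summed demand exceeds TotC, the model predicts a drop in output quality.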

Theoretical implications. Proceeding on the assumption that language processing is a difficult task requiring simultaneous attention to several subtasks (Levelt’s model) and that the human brain is capable of allocating attention to multiple processes simultaneously (Kahneman’s model), we postulate a hypothesis for L2 speech production: if the sum of efforts on cognitive language processes partially exceeds the cognitive capacity, reducing the cognitive load of one subtask can help decrease the overall cognitive load and consequently lead to speech quality improvement (fluency, speed, flawlessness, clarity, structure, etc.). Accordingly, we propose an L2 speech production model based on Levelt’s, Kahneman’s and Gile’s work that integrates both the speech production theory and the theory of shared cognitive resources. In this model, the sum14 of the total cognitive capacity (TotC), or alternatively the maximum cognitive capacity (Mcrc), accounts for the efforts allocated to individual

14 Also not purely in the arithmetical sense.


subtasks in L2 speech production (conceptualization /C/, formulation /F/, articulation /A/, self-monitoring /S/) together with the coordination effort (CE), minus the degree of automatization (DoA) of the respective subtasks. The model is expressed in the following formulas:

TotC = C + (F - DoA) + (A - DoA) + S + CE
Mcrc = C + (F - DoA) + (A - DoA) + S + CE

It is important to distinguish between TotC and Mcrc. The total available cognitive capacity (TotC) represents the total capacity allocated during arousal, which might change slightly based on the speaker’s intention, motivation, fatigue, and other external factors. The maximum cognitive capacity (Mcrc), on the other hand, stands for the cognitive limits of the speaker, which cannot be exceeded. The TotC is depicted in the model as the area inside the dashed lines and the Mcrc as the outer area bounded by the solid lines.

Figure 5. A synthesized model for oral speech production linked to the theory of shared attention.


The degree of automatization is a difficult parameter to measure, but it has to be taken into account at several levels of speech production. There are three possible outcomes of reducing the cognitive load of one speech production subtask (in our case at the lexical level, which is included in the formulation stage): a) The total cognitive demand (TcD) exceeds the total capacity (TotC) or the maximum cognitive resource capacity (Mcrc) even after the cognitive load of one subtask has been reduced. Consequently, reducing the cognitive load does not help avoid language difficulties, but the saved resources can be allocated to other subtasks, resulting in a partial improvement of the language output. b) TcD would have been higher than TotC or Mcrc if the cognitive load of one subtask had not been reduced. Consequently, reducing the cognitive load results in successful coping with TcD and improved language output (the original hypothesis). c) TcD is lower than Mcrc and/or lower than TotC. Consequently, reducing the cognitive load of one subtask has no effect on the language output, but might help save mental energy and avoid fatigue to some extent. Naturally, one cannot expect that learning and automatizing certain vocabulary will spare the learner all other unpleasant surprises at the lexical level. This model only proposes that acquiring specific LSP vocabulary might be effective in reducing the cognitive load and consequently in improving the language output in the same specific context (discourse). The question of which vocabulary to acquire for which topic is a subject for further research. The aim of this thesis is neither to validate nor to test the proposed model or hypothesis, but to prepare a frequency-based glossary using knowledge from the relevant scientific disciplines; the literature review merely resulted in this theory.
We make no claim as to the completeness of the proposed theory, but use it as a theoretical framework supporting and explaining the usefulness of creating a frequency-based glossary of the Europarl corpus.
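The proposed model and the three possible outcomes can likewise be sketched numerically. The function name, the capacity values and the DoA deduction below are all invented for illustration and carry no empirical weight:

```python
# Illustrative sketch of the proposed L2 speech production model.
# All numeric values are invented; the model makes no claim about
# concrete units of cognitive capacity.

def total_demand(concept, formulation, articulation, monitoring,
                 coordination, doa_formulation=0.0, doa_articulation=0.0):
    """TcD = C + (F - DoA) + (A - DoA) + S + CE."""
    return (concept
            + (formulation - doa_formulation)
            + (articulation - doa_articulation)
            + monitoring
            + coordination)

MCRC = 10.0  # maximum cognitive capacity (cannot be exceeded)
TOTC = 9.0   # total capacity actually allocated (varies with motivation, fatigue)

# Without vocabulary automatization (DoA = 0), the demand exceeds TotC:
before = total_demand(2, 4, 2, 1, 1)
# Automatizing high-frequency vocabulary reduces the formulation load:
after = total_demand(2, 4, 2, 1, 1, doa_formulation=2.0)

print(before, before <= TOTC)  # 10.0 False -> output quality declines
print(after, after <= TOTC)    # 8.0 True  -> demand fits within capacity (case b)
```

With these invented numbers the sketch reproduces outcome b): without the DoA deduction the demand exceeds TotC; with it, the demand fits within the available capacity.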


Conclusion

In the first part of this chapter, we focused on answering the question of why a HFWL for LSP acquisition should be created on the basis of a quantitative corpus-based lexical analysis. We first described the history and purpose of HFWLs and how they can benefit vocabulary acquisition, and then discussed various models that explain speech production. Combining the results of previous interdisciplinary research in speech science and interpreting studies, we introduced the hypothesis that speech production quality may improve if the cognitive load is decreased at the lexical level through vocabulary automatization. In the next part, we will present practical knowledge from other disciplines that can be applied to the compilation of a functional glossary for LSP. These disciplines include terminology, lexicology, language for special purposes (LSP) with a special emphasis on politolinguistics, and computer-assisted language learning (CALL) together with computer-assisted vocabulary learning (CAVL). In other words, whereas the first theoretical part of this thesis dealt with the rationale for a glossary based on the Europarl corpus, the second theoretical part should provide the knowledge needed for an effective and purposeful glossary compilation drawn from the abovementioned disciplines.


Theoretical Framework for Glossary Compilation: Other Disciplines and Reviewed Knowledge

Introduction

In the previous chapter, we identified that a glossary created from a HFWL could be a useful learning tool. In this chapter, we will seek to explain how the selection of words should be organized, performed and presented. To accomplish this goal, we consulted the following scientific disciplines: lexicology, terminology, language for special purposes (LSP), politolinguistics and computer-assisted vocabulary learning (CAVL). Before delving into these disciplines, we had to get an overview of the freely available glossaries on the EU that could serve as a springboard for our further work. We found that there are a great many glossaries on topics discussed at the European political level available on the internet, either from the EU institutions or from third parties. Their quality, quantity and specialization vary, as does the range of language combinations on offer. Since the Web is like a perpetually expanding universe, there would be no point in attempting to map all publicly available glossaries; the only peril lies in finding the most relevant one, like looking for a needle in a haystack. Our short review of web content led us to two conclusions. Firstly, to our knowledge, there is no EU glossary based on the most frequent words; all available glossaries tend to be compiled around topical words as a result of manual selection. Secondly, these glossaries mainly include highly specialized vocabulary. This is vastly important for getting versed in the respective topic, but such glossaries already assume perfect knowledge of the semi-technical words typical of the discourse. It is reasonable not to include semi-technical vocabulary in specialized glossaries, since it creates only the basis for LSP communication.
On the other hand, a learner without tacit knowledge of the semi-technical vocabulary must become familiar with it through extensive reading: learning the highly technical vocabulary would be useless if the learner could not embed those words in the style particular to the political discourse. As we mentioned earlier in the chapter on vocabulary acquisition, extensive reading takes a long time and is less effective than learning a HFWL such as the AWL.
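To make the contrast with manually compiled topical glossaries concrete, the frequency-based selection pursued in this thesis can be sketched in a few lines of Python. The sample sentences and the naive tokenizer are invented for illustration; a real corpus pipeline would also lemmatize and filter function words:

```python
from collections import Counter
import re

def high_frequency_words(text, n=5):
    """Return the n most frequent word forms in a text.

    A naive HFWL sketch: words are lowercased and extracted with a
    simple regular expression, with no lemmatization or stop-word filtering.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(n)

# Invented two-sentence "corpus" for illustration:
sample = ("The Commission proposes the regulation. "
          "The Parliament debates the regulation and the Commission replies.")

print(high_frequency_words(sample, 3))
```

Even this toy example shows why raw frequency lists must be post-processed: the function word the dominates the count, which is precisely why the practical part of the thesis works with filtered and lemmatized frequency lists rather than raw token counts.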


Therefore, we tried to find a way to include the semi-technical vocabulary and to back our hypothesis with knowledge from politolinguistics. This is covered in more detail in the chapter on politolinguistics and in the chapter on language for special purposes (LSP), which gives a more general overview of working with specialized language. Next, this chapter gives an account of lexicology (homonymy and synonymy with respect to their subtle differences, etymology with regard to internationalisms, neologisms, false friends, lexicometry) and terminology (frameworks for glossary compilation) for their significant contribution to the research subject. Finally, to answer the question of how to present the glossaries in a functional and appealing way, we consulted computer-assisted vocabulary learning (CAVL).

Language for special purposes

Language for special purposes, also known as LSP, is in simple terms “the language that is used to discuss specialized fields of knowledge” (Bowker & Pearson, 2002, p. 25). Such a field of knowledge may include everything from professional activities to hobbies, provided that it treats a restricted subject. LSP can also be explained as the opposite of LGP, or language for general purposes, which is used to talk about general things in a variety of common everyday situations. A more sophisticated definition of LSP is offered by Picht & Draskau: “a formalized and codified variety of language, used for special purposes with the function of communicating information of a specialist nature at any level in the most economic, precise and unambiguous terms possible” (Picht & Draskau, 1985, p. 3). LSP can also be understood as a sublanguage with its own specific morphology, syntax and lexis. A sublanguage in respect to LSP is defined as “the particular language used in a body of texts dealing with a circumscribed subject area (often reports or articles of a technical specialty or science subfield), in which the authors of the documents share a common vocabulary and common habits of word usage” (Hirschman & Sager, 1982, p. 28).


While LGP is mastered at a high proficiency level by almost any native speaker of the respective language, this is not the case for LSP, which is used for communication in specialized fields. LSP speakers do not only use specialized terminology; they also have comprehensive knowledge of the subject. Therefore, knowing the terminology alone might not necessarily lead to better understanding. There is some degree of overlap between LGP and LSP. It is caused firstly by the necessity to use some general words that create the fundamental structure for expressing ideas in a sentence (e.g. function words), and secondly by the use of LGP words in LSP with another, special meaning. Similarly, some specialized words find their way into the general vocabulary as they become topics of the general discourse (Bowker & Pearson, 2002). This could lead us to the assumption that LSP is simply LGP with some specialized vocabulary. However, this would not be exactly correct. Bowker & Pearson point out that LSP may have unique ways of combining terms and/or arranging information that differ from LGP. They illustrate this with a lexical example from chemistry: … you may need to know what verbs are generally used with the noun 'experiment'. Based on your knowledge of LGP, you might assume that the verb 'do' can be used (e.g. 'to do an experiment'), but a search in a specialized corpus demonstrates that in the LSP of chemistry, experiments are typically 'conducted' or 'carried out'. This type of search also reveals another interesting feature of the LSP of chemistry - it shows that the use of passive constructions is very common (e.g. 'the experiment was conducted' rather than 'X conducted the experiment') (Bowker & Pearson, 2002, p. 27). As we can see, LSP can differ from LGP in a number of ways that should not be overlooked.
Specialized vocabulary, stylistic features and collocations contribute to the formation of LSP, and speakers must abide by these conventions if they want to communicate in LSP effectively. Since LSP is mainly used for communication in specialized fields, it is not surprising that research in LSP is more closely connected to research at business schools and translation institutes and to information retrieval and natural language processing than to pure linguistics departments (Hjørland & Nicolaisen, 2005).


Communication in LSP can take place in three different forms, depending on the expertise of the communication partners. Firstly, we can speak about expert-to-expert communication, in which experts share a common background, understand the meaning of specific terms and phrases, and do not need to provide explanations of them. The second type is communication between experts and semi-experts, i.e. students or experts from related fields. In this case, experts will probably use specialized terminology, but will accompany it with explanations where necessary. The third type is communication between experts and non-experts, in which the expert will use fewer terms and try to provide simplified explanations in LGP (Bowker & Pearson, 2002)15. With respect to the target audience of the MEPs’ speeches at the European Parliament, we assume that the first and the second type of communication apply. This will be explained in more detail in the next chapter on politolinguistics.

Politolinguistics. The political discourse used in parliamentary speeches, among other direct or indirect manifestations of human communication, follows certain linguistic patterns. These patterns, mostly distinguishable as lexical and semantic differences, represent a certain language variety, which is studied in political linguistics or politolinguistics. To understand the context of the verbatim records of parliamentary speeches in the European Parliament, we considered it necessary to consult this area of study.

15 I have read an interesting example about the correct usage of language in education. The Czech mathematician Vít Hejný introduced in the 20th century a special approach to teaching elementary school mathematics through experiments and the children's own experience rather than through repetitive learning and explanations. The fundamental principle of his approach is to present new information as part of a familiar pattern that the child easily visualizes. If children are asked to solve the equation 7+x=10, they would probably not know what to do without prior knowledge of equations. However, if children were asked to “find the number that should replace the asterisk in the arithmetical problem 7+*=10”, they would easily come to the correct solution. This demonstrates clearly that language matters and that one should adapt his or her usage to the needs and knowledge of the communication partner.


Politolinguistics, or political linguistics, is a multidisciplinary research area that draws empirical input from neighboring domains such as linguistic pragmatics, text linguistics, discourse analysis, corpus linguistics, translation and literary studies, politology, social psychology, sociology, anthropology, philosophy and rhetoric (PL, n.d.1). The multitude of research perspectives offered by the various disciplines is enormous, as is the complexity of the research areas on which politolinguistics can focus. Therefore, a generally accepted definition of political linguistics is lacking, as is a clear establishment of the boundaries between politolinguistics and other scientific disciplines. The term politolinguistics was first introduced in 1996 by Armin Burkhardt (in German Politolinguistik) as a description of the area of linguistics specializing in the critical analysis of political language (Burkhardt, 1996). In general, the terms politolinguistics, political linguistics and political linguistic studies embody a multitude of research disciplines that deal with the overlap of the research areas of politics and language. These disciplines, each with its own perspective and methods of examination, require a shared interest in the particular research subject in order to provide a synthetic analysis of the subject concerned. Examples include the corpus analysis of word frequency (personal pronouns like I, me and my and other political words) in Obama’s speeches compared to his predecessors’ (RLG, 2014), studying Putin’s body language (AFP, 2014), lexicometric studies of parliamentary or unparliamentary language across time and culture (e.g. Thompson, 2011; Smith, 2015), examining political jargon (Myers, 2015) or the lexical/pragmatic analysis of Merkel’s speeches (Strauß, 2010).
Viewed solely from the perspective of linguistics, political discourse can be examined on the macro, meso and micro levels, which implies a considerable number of possible approaches to the same subject. To briefly mention a few of these areas, texts can be analyzed on the syntactic level (number of words, number and complexity of nominal phrases, sentences, accent and focus), the semantic level (positive/negative word connotations, the addressing of thorny issues) and the pragmatic level (rhetorical expressions, degree of persuasion) (Klein, 2014). The strength of politolinguistics lies in its ability to provide a more adequate analytical framework than the monodisciplinary approaches of political science and linguistics (Cedroni, 2010); yet, to our mind, the weakness of political linguistics stems
from the great variety of disciplines and subject areas concerned. Consequently, it is problematic to find a generally accepted definition of its subject, scope and methods, and, in the end, of its place among the neighboring disciplines. With regard to its subject and scope, political linguistic studies may indeed cover a broad area, depending on the definition of what is considered political. This can range from politicians’ speeches to general political discourse in the media and political discussions. Political language is, namely, difficult to define strictly as a language variety, as pointed out by Dieckmann: Political language, considered as a whole, shows sufficient commonalities neither on the linguistic level nor in the constellational features of speech that could span the election campaign brochure, the deliberations in a parliamentary group meeting or a cabinet meeting, the parliamentary debate, the Federal President's New Year's address, the political panel discussion on television or the chanting at a demonstration, to say nothing of a collective wage negotiation, an application to a public authority and its official decision, or the verdict of a judge (Dieckmann, 2005, p. 22, our translation). In terms of vocabulary, political language does not differ much from general everyday language16 (Weber, 2011). Political speeches in the parliament share certain textual properties, such as politeness formulas used to address other MPs, specific forms of impoliteness and other typical dialogical features (van Dijk, 2004). As suggested above, this section only touches the tip of the iceberg of the various research scopes and disciplines that political linguistics encompasses, but it does provide some relevant theoretical background for a lexical study of the speeches in the EU Parliament.
In Klein’s view, the language of politics cannot be regarded as a specialist language due to its many overlaps with general language; on the other hand, he notes specialist elements in political language that are typical for languages for special purposes. To shed some light on this issue, he divided political vocabulary into four categories:

16 It must be noted that the connotations of general words in a political context may differ from the connotations these words typically carry in general language, but, semantic differences aside, we can see numerous similarities on the lexical level.


institutional vocabulary, vocabulary of specialist fields, general interactional vocabulary, and ideological vocabulary. A short overview of this categorization is given in the following figure.

Figure 6. Political vocabulary according to Klein. Graphic design by Girnth (2010). Retrieved from http://www.bpb.de/cache/images/9/42689-3x2-galerie.jpg?59DE5


The institutional vocabulary designates governmental institutions (e.g. the parliament), institutional roles (e.g. senator), codified norms of institutional actions (treaty), and institutional processes or states (public hearing). The vocabulary of specialist fields accounts for the specialist registers used in the political context; these fields may include social policy, finance, the environment, culture, etc. Here it is difficult to differentiate between political vocabulary and other jargon or everyday language due to significant overlap. The general interaction vocabulary includes terms which occur in everyday language interaction, not only in parliamentary settings, such as treat, demand, or suggest. Such words occur frequently in politics and can be considered typical of political language, but they are used in general language as well. The ideological vocabulary accounts for expressions denoting values and principles, such as freedom, justice, nation, terrorism, etc. This category is linked rather to semantics (flag words, stigma words), because such words carry a significant emotional load and are often used to influence recipients (appellative function), but it can also serve for lexical categorization. We would like to point out that Klein’s distinction between the institutional vocabulary, the vocabulary of specialist fields and the general interaction vocabulary is, in our view, crucial for the glossary compilation for one particular reason. The first two groups represent terminology similar to the specialized vocabulary found in LSP terminological entries. The general interaction vocabulary, by contrast, though commonly used in non-specialized settings as well, still shapes the political jargon and discourse and can be seen as the necessary prerequisite for the usage of the more specialized words.
It serves as the glue binding the more technical words together and enables the speaker to express himself in a language similar to an LSP. To put it metaphorically, the general interaction vocabulary in political settings represents the invisible roots of a tree: the various branches, as highly specialized terminology, and indeed the tree itself could not hold together without the foundation provided by the general interaction vocabulary. This is the very reason why Klein does not consider political language to be a specialized language, while at the same time admitting that it uses specialized terminology.


To sum up this chapter, we have shown that political language cannot be fully regarded as a specialist language in spite of sharing several characteristics of languages for special purposes. On the other hand, it has been shown that political language represents a language variety and that its specific vocabulary clearly requires special attention in learning, especially for foreign language speakers who wish to communicate in parliamentary discourse. Klein’s categorization of political vocabulary is an important reference that we consider fundamental for understanding the commonalities and differences of political language with regard to other specialist fields. By reviewing the literature in the field of politolinguistics, we also came to the conclusion that our quantitative lexical corpus analysis of word frequencies in parliamentary speeches cannot give an exhaustive account of political jargon, because the frequency of political vocabulary varies widely depending on the particular term and context (Townson, 1992). Without consulting politolinguistics, categorizing glossary terms would be based only on intuition. Still, we have to keep in mind that the aim of this thesis is to create a glossary based on theoretical knowledge, not to validate the hypothesis that the produced glossaries are an effective learning tool. We can only base that assumption on the findings from the various scientific disciplines that we consulted while creating the glossary. In order to gauge the usefulness of a parliamentary glossary based on a quantitative frequency-based analysis of the Europarl corpus, we reached out to renowned experts in political studies to ask whether the glossary would, in their opinion, be an asset for students of political science. We introduced the keyword list and asked: Do you think that learning a vocabulary list of the 2,000 most frequent words from the Europarl corpus can be an asset for a person interested in political science?

Statement by Mgr. Andrea Tittelová, founder of the civic organization Youth Politics Education, member of JEF Europe, political activist: Myslím, že ma zmysel sa naučiť takúto súdobú terminológiu v praxi ak človek pracuje a ide pracovať do danej inštitúcie. Ma zmysel ak ho čakajú EPSO testy alebo na akademickej pôde. Terminológiu som sa učila v inštitúciách, škole vo Washingtone a Euparlamente a veľa na mojich štúdiách v odbore Európske štúdiá, kde bolo prirodzené vedieť tieto veci. Na EPSO testy17 som sa neučila nikdy. Myslím si, že je potrebné sa učiť veci a aj jazyk keď treba intenzívne pracovať v odbore a potom pomôže prax s týmto jazykom (A. Tittelová, personal communication, November 8, 2015). Translation: I think that it is reasonable to learn such contemporary terminology in practice if someone works in or is about to start working for an EU institution. It is also advisable if he or she has to go through the EPSO exams, or if this terminology is required for his or her academic practice. Personally speaking, I learned political terminology in political institutions, at the University of Washington, in the European Parliament, and above all during my master’s degree in European Studies, where we were naturally required to be well versed in political vocabulary. I have never studied for the EPSO exams. I take the view that it is important to work both on the knowledge and on the language of the subject; practice then helps to get the ball rolling. As we can see, Mgr. Tittelová is in favour of such a vocabulary learning tool and considers it useful for people who need to grasp political language. She herself learned the necessary vocabulary through context during her studies. This implies that the proposed set of glossaries as a learning tool for LSP communication might be more useful for people who are only beginning to familiarize themselves with the specialized topics and the relevant LSP than for those who have already acquired both the field knowledge and the associated language. We conclude both from the theoretical review and from this practical survey that our tool might be an asset for people learning the basics of the EU parliamentary language.

Lexicological and Terminological Studies

The corpus-based analysis of words would be unthinkable without prior knowledge borrowed from lexical studies, including lexicology, lexicography, lexicometry and terminology.
We will first define these terms and explain their areas of focus, and then move on to the knowledge and practical solutions they offer that are relevant for our study.

17 EPSO exams: tests administered by the European Personnel Selection Office for fixed-term contracts with the EU institutions.


It is also important to mention that while some authors differentiate between lexicology and lexicography as the practical component of lexicology, and between terminology and terminography, we decided not to make such distinctions. The reason is that we treat them all as theoretical background that we apply practically in our glossary compilation; therefore, there is no point in differentiating which of them is more theoretical or practical. Lexicology. Lexicology is a branch of linguistics, the science of language, and is broadly defined as “the study of the lexicon or lexis (specified as the vocabulary or total stock of words of a language)” (Lipka, 1992, p. 1). The term stems from two Greek morphemes: lexis, meaning “word, phrase”, and logos, meaning “learning, a department of knowledge”. The literal meaning implies that lexicology is the science of the word, but this would be an oversimplification, for every linguistic discipline takes account of words in one way or another, approaching them from a different perspective (e.g. phonetics investigating the phonetic structure of words, grammar their grammatical structure, etc.) (Davletbaeva, 2010). Still, despite its specific focus, lexicology should not be strictly separated from phonology and grammar, the other constituents building the language system (Singh, 1982). At the same time, lexicology has its own aims and methods of scientific research. Its basic task consists of the study and systematic description of vocabulary in respect to its origin, development and current use (Davletbaeva, 2010). The lexical level, in other words the scope of linguistic analysis in lexicology, is concerned with words, variable word groups, phraseological units, and morphemes. Linguists proceed on the assumption that the word is the basic unit of a language system, understood as a structural and semantic entity within that system (Davletbaeva, 2010).
Lexicology can give us practical insights into the theory of homonymy and synonymy that are useful when compiling a glossary. For instance, as we learn in Lipka (1992), synonyms, as words having the same or nearly the same meaning, can pose a threat in language interaction if one is unaware of their subtle differences (e.g. jail as a facility for incarceration of less than one year vs. prison for sentences of more than one year). Therefore, it is important to look for any such specifications in dictionary resources in order to provide a clear distinction in the glossary when necessary.


Another interesting finding related to a multilingual glossary is provided by etymology, a subdiscipline of lexicology. Etymological research shows that the stock of borrowed words is considerably larger than the stock of words originating in the respective language. For instance, the English vocabulary consists of only about 30% native words, with the rest borrowed from other languages (Davletbaeva, 2010). This high proportion of borrowed words suggests that a large number of words found in the language resources for our glossaries’ target languages will in fact be either borrowings or established internationalisms. Davletbaeva notes that words borrowed a long time ago are practically indistinguishable from native words without a thorough etymological analysis (for examples, see Žigo, 2001), whereas internationalisms entering a language as neologisms can easily be traced to their foreign source. As we know, neologisms can be a source of confusion, since some of them might not be standardized by codification and thus should not be used in formal language. As long as a neologism is codified in the language, there are no practical problems associated with its use. However, if the word has not been codified and is marked as colloquial or non-standard, the speaker might provoke negative reactions by using it (suggesting a lack of linguistic competence, a low education level, nonchalance, etc.). This phenomenon is called lexical interference (as opposed to other types of interference, e.g. phonetic). Interferences in general can be seen as “transfers of elements of one language into the learning of another” (Skiba, 1997). Lexical interferences refer to the practice of borrowing words from one language and converting them to sound more natural in another (Skiba, 1997).
Interference is a common topic in language teaching, because L2 learners have a propensity towards using lexical and grammatical structures from their native language (see Bhela, 1999), but it should be emphasized that interference can also happen the other way around, i.e. from L2 into L1. This is mainly a concern of interpreters and translators and is well documented in interpreting studies (e.g. see Lamberger-Felber & Schneider, 2009). The words typically prone to interference are so-called false friends and the already mentioned international words, whose standard usage is often unclear. False friends (in French faux amis, a term coined by Koessler and Derocquigny in 1928) are two words that are graphically or phonetically similar or equivalent in two or more languages while having different meanings. They are considered an extreme trap for translators, interpreters, and in general, for any bilingual person. They can be classified into two groups: chance false friends and semantic false friends. Chance false friends are similar or equivalent words without any semantic or etymological connection, whereas semantic false friends share a similar or equivalent form because they are etymologically related: such words share the same etymological origin but have developed different meanings in each language over time (Chamizo-Domínguez, 2006). Semantic false friends are further divided into two categories: full false friends (completely different meanings) and partial false friends (several meanings, one of which coincides with the meaning of the word in the other language) (Chamizo-Domínguez, 2006). An example of full false friends in French and Slovak is the term vin rouge, which sounds to a Slovak speaker like ružový (pink), suggesting ružové víno (rosé wine), even though vin rouge stands for red wine. It is important to note that some false friends are more obvious than others, and without proper lexical knowledge one can cause much embarrassment. An interesting example was a misinterpretation by a Slovak host at the Peugeot factory who welcomed a delegation of 50 Frenchmen in a big hall with a warm smile, intending to say “and now we would like to initiate you to ….”, but actually saying in French: “Et maintenant, nous voudrons vous introduire” …. Whereas false friends have the same origin and different meanings, international words usually have the same origin and the same meaning. The only peril with internationalisms is whether or not they are codified. International words are words borrowed by several languages and are significant for communication across different countries. Examples are philosophy, politics, progress (Davletbaeva, 2010), or robot, which actually comes from Czech (Robot, n.d.).
Here we consider it wise to link internationalisms with Zipf’s law, an area related to lexicology. George Kingsley Zipf, the author of the law, postulated that a language tends to correct itself and strives permanently for an equilibrium state, a tendency also known as the Principle of Least Effort (Van de Walle & Willems, 2006). Should there be no equivalent word for an important phenomenon expressed in a foreign language, the native language tends to borrow the foreign word with a greater or lesser degree of adaptation. Whether such a word eventually becomes codified depends on time and on how widespread the word is in the general population. However, before the word actually becomes codified, one should refrain from using it in official communication18 (Kubišová, 2015).

Speaking about proper lexis in regard to the learning process, it is also important to mention the difference between errors and mistakes. Ellis says that errors reflect gaps in the learner’s knowledge and that they occur because the learner is unaware that the expression is incorrect. Mistakes, on the other hand, Ellis describes as occasional lapses in performance that occur when a learner is unable to perform what he or she knows (Ellis, 1997). We take the view that by listing potential lexical perils in our glossary, we can both teach the unknown and reinforce the known among language learners, which should lead to the elimination, or at least the reduction, of both learners’ mistakes and errors. The lexical knowledge mentioned in this chapter is tremendously important in order to avoid unintentional errors and the spread of incorrect language usage in glossary compilation. Although we have only scratched the surface of this linguistic discipline, we could already identify some important aspects that must be considered in the process of glossary compilation. The lesson taken from lexicology is to compare words against references of codified language and to highlight language perils for glossary users. This contemplation led to the creation of Glossary 2, which specifically deals with correct and incorrect expressions in the target languages. Identifying appropriate sources as lexical references of the standard language will be described in more detail on page 124.

Lexicometry. Setting aside the meaning of words and treating them solely as a countable sum of occurrences in a text, we touch on the concept of lexicometry.
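This counting view of vocabulary can be illustrated with a short script. The sketch below is our own illustrative example (the function name and sample sentence are not part of the thesis’s actual toolchain): it ranks words by raw frequency, the elementary operation underlying HFWLs, and in large corpora the product of rank and frequency stays roughly constant, as Zipf’s law (discussed below) predicts.

```python
from collections import Counter

def rank_frequency(text):
    """Count word occurrences and return (rank, word, frequency) tuples,
    most frequent first -- the raw material of a lexicometric analysis."""
    words = text.lower().split()
    ranked = Counter(words).most_common()
    return [(rank, word, freq) for rank, (word, freq) in enumerate(ranked, start=1)]

sample = ("the house voted on the proposal and the committee "
          "approved the proposal after the debate on the budget")
for rank, word, freq in rank_frequency(sample)[:5]:
    # On a real corpus, rank * freq would stay roughly constant (Zipf's law);
    # a toy sentence is far too small to show this cleanly.
    print(rank, word, freq, rank * freq)
```

On a corpus the size of Europarl, the same counting step, applied per language, yields the frequency lists from which the glossary keywords were later selected.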
Lexicometry, as a methodology rooted in mathematical and statistical linguistics, approaches language without preconceived ideas about it (Bondi & Scott, 2010), which makes it very flexible for creating an overall statistical picture. Lexicometry offers some important insights that can be useful in glossary compilation. Its practical results are reflected in the creation of HFWLs and other wordlists or keyword lists for languages for special purposes, which facilitate vocabulary acquisition.

18 An interesting example is the word selfie, which is being considered for codification by Slovak linguists, as opposed to the words button, post, and wall, which are common in internet communication but not as widespread or lacking a proper Slovak equivalent (Kubišová, 2015; Sudor, 2015).

From a general point of view, it is important to come back to Zipf’s law. The weak version of Zipf’s law proposes that words are not evenly distributed across texts: instead, a small number of words are very common and a very large number of words are very rare (Sterbenz, 2013). In more exact terms, it states that the frequency of occurrence of a word multiplied by its rank (with rank understood as the place this word has in the list of all the words of a text ordered by frequency) is constant. In this sense, the ten most frequent words make up about 25% of the language, the 100 most frequent about 50%, and 50,000 words about 95%; to account for the last 5%, one would need a vocabulary of more than a million words. This law has recently been confirmed again on a corpus of 21st-century texts comprising more than 2 billion word tokens (Sterbenz, 2013), but Zipf had already proved its validity in 1935 on several other languages (Van de Walle & Willems, 2006). Although lexicometry is considered only a method within corpus linguistics and statistical linguistics, its results are of great importance, with further application in these disciplines. Since our quantitative lexicological corpus-based analysis of the EP Corpus is based on statistical results, we cannot overlook the importance of lexicometry for our study. Fortunately, no in-depth knowledge of statistics and lexicometry is needed in this study, because the corpus software does all the counting itself. Nevertheless, the contribution of lexicometry to corpus linguistics in general and to our study in particular should not be left unacknowledged.

Terminology. Terminology can be understood both as “the structured set of concepts and their representations in a specific subject field”19 (Wright & Budin, 1997, p.
325), and as the scientific discipline that studies “the structure, formation, development, usage and management of terminologies in various subject fields” (ISO 1087-1, 2000). The fundamental principles of terminology science are based on the terminology triangle consisting of a term, a concept and an object. The object is understood as an entity existing in the world (Schmitz, 2006); the concept as “cognitive representatives” for objects (Felber & Budin, 1989), or “units of knowledge created by a unique combination of characteristics” (ISO 1087-1, 2000); and the term as “a verbal designation of a general concept in a specific subject field” (ISO 1087-1, 2000).

19 E.g. the terminology of medicine, engineering, sociology, etc.

A term may consist of one or more words (single-word terms vs. multiword terms), which brings us to a clear distinction from lexicology. Terminology deals with terms consisting of one or more words referring to a particular object, whereas lexicology is the study of the form, meaning and use of words and is rather interested in the meaning of individual words (with the exception of phraseology, which studies words forming a certain phrase or idiom) (Terminology, n.d.). The importance of terminology in both senses has grown in recent decades due to the increase in knowledge in almost all technological, economic, political and cultural fields (Schmitz, 2006), intensified through its dissemination over the internet. The spread of knowledge is becoming less limited to a closed group of specialists or to one language community20, and terminology plays a vital role in efficient knowledge transfer. Terminological work takes place at universities, at associations of various specialists and at companies, in either one or more languages, and represents an important part of modern knowledge management. We can only assume that the importance of terminology will gradually increase in the years to come. Limitations of the study in regard to terminology. Still, there are two main objections that could question the need for terminology as a linguistic discipline in the theoretical background of our corpus analysis. Firstly, one might object that the high frequency words identified as keywords in our glossary do not belong to a special vocabulary, and thus there is no need to consult terminology as one of the disciplines. Indeed, on examining the keyword list, there are not many highly technical words among the selected most frequent keywords.
20 This proliferation of information is a double-edged sword. On the one hand, there are positive examples such as medical websites that promote public awareness about diseases and health issues, or Snowden’s whistleblowing that shed light on many questionable practices of the US government; on the other hand, the dark side of this availability of information can be seen in the breeding ground it provides for cyberchondria and conspiracy theories.

However, if we consider the categorization by Trimble and Trimble (1978), who differentiate between highly technical terms (terms unique to a particular domain), a bank of technical terms (from which all disciplines can draw), and sub-technical terms (common words that have taken on special meanings in a specific field or discourse), we realize that highly technical words are in a considerable minority and thus not expected to appear frequently. The second valid objection could be that the quantitative keyword analysis has identified keywords as the most frequent words of a corpus compared to a reference corpus, but has not identified the most frequent terms; thus this is rather a lexicographic than a terminological work. For instance, the corpus analysis identifies the word European and gives an overview of how often this word collocates with other words, such as union, project, or initiative, but does not provide any information about the terms that are created from these collocations. This objection cannot be refuted, either: the corpus analysis cannot identify the most frequent terms unless these terms are preprogrammed in a special list. Regarding the first objection, we must bear in mind the results of other lexical studies, which show that highly technical vocabulary does not tend to be as frequent in a text as other, more common words (more in chapter vocabulary categorization). Therefore, it is reasonable to expect that our corpus analysis will identify only the tip of the iceberg of the vast terminological vocabulary covered in the EU parliamentary language. Still, it is important to note that the aim of this study was to carefully examine the keywords, not to deliver an exhaustive terminological work. To our mind, challenges on the lexical level of speech production are caused more frequently by sub-technical words that are not fully automated than by the occasional stumble over a highly technical term.
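The collocation lookup mentioned above can be sketched in a few lines. The snippet below is an illustrative assumption on our part (the sample sentence and function name are our own, not the corpus software actually used in this thesis): it counts the immediate neighbours of a node word such as European, a first step toward spotting multiword term candidates.

```python
from collections import Counter

def collocates(tokens, node, window=1):
    """Count words occurring within `window` positions of the node word --
    a naive first pass at finding candidates such as 'European union'."""
    hits = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    hits[tokens[j]] += 1
    return hits

tokens = ("the european union backed the european project and "
          "the european initiative").split()
print(collocates(tokens, "european").most_common(3))
```

Real concordance software additionally filters function words and applies association measures, but the counting principle is the same.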
We expect the glossaries compiled from the keyword analysis to be a useful aid for improving communication skills in the EU parliamentary language as an LSP, not to become a terminological resource. In regard to the second objection, we do not intend to refute it either. We are aware of this considerable disadvantage of the quantitative analysis; therefore, we will seek to compensate for it in the qualitative analysis. In one part of the qualitative analysis, the keywords will be examined for collocations that could together form a term. This analysis will be corpus-driven and dependent on the researcher’s intuition and careful consideration. The results will be summarized in Glossary 3, which will focus on explaining the most frequent terms. Nonetheless, even if our arguments against the two abovementioned objections were proved false, there would still be one important reason for consulting terminology in this corpus analysis: terminology provides a useful framework for adequate glossary compilation. It offers standards of good practice in terminology management and serves as a theoretical springboard for our glossary compilation. Thanks to ISO 10241-1, which standardizes various glossary properties, we could come up with an optimal solution for the glossary layout and the relevant information fields to suit our lexicological needs. Terminological standards for glossary compilation. Terminology has also served as a great inspiration for Glossary 2, which deals with false friends. Proceeding from ISO 10241-1, we decided to add categories of approved and deprecated terms for each language. To elucidate the concept of approved and deprecated terms, as well as of a standard entry according to terminological standards, let us quote Pearson (1998, p. 23): Entries for terms will contain as a rule, a term number, the preferred terms, agreed definition, the field or subfield in which the term is to be used, related terms, and deprecated terms. They will not contain any indication of usage in terms of common collocates or grammatical restrictions. Deprecated terms are those terms which have been, or still are, used to refer to the same concept. By stipulating that they are now deprecated, the standardizing authority is attempting to prohibit further use of such terms. Standardized terms are not always new terms in the sense that they do not suddenly come into existence because a standardizing body decrees it. Standardized terms are generally terms that have already been coined by users of the terminology.
What the standardizing body does is give its seal of approval to one term and make recommendations for preferring that particular term over others which may have been used to describe the same concept in the past. As we can infer from the citation, the glossaries presented in this thesis (explained further in the practical part in chapter glossary templates) do not strictly follow the terminological standards. Glossaries 3 and 4 include various lexical information related to the word, and the other glossaries may be considered as inspired by the terminological standards, yet not strictly following them. This is the case particularly for Glossary 2, which was inspired by the concept of standard and deprecated terms. We deemed it particularly important to point out the linguistic perils of non-standard language and of possible false friends among internationalisms. Identifying such words and terms will be an arduous task requiring not only exhaustive terminological research, but also consultations with language experts. The list of resources used for Glossary 3 will be further described in chapter corpus linguistics software & language resources.
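The entry fields that Pearson lists can be pictured as a simple record. The sketch below is a loose illustration of that structure only; the field names, the sample definition and the deprecated variant are hypothetical choices of ours, not taken from ISO 10241-1 or from the actual glossaries.

```python
from dataclasses import dataclass, field

@dataclass
class GlossaryEntry:
    """Standards-inspired entry, loosely following Pearson's list: term
    number, preferred term, agreed definition, subject field, related
    terms, and deprecated terms."""
    number: int
    preferred_term: str
    definition: str
    subject_field: str
    related_terms: list = field(default_factory=list)
    deprecated_terms: list = field(default_factory=list)

entry = GlossaryEntry(
    number=1,
    preferred_term="committee",
    definition="a body of members dealing with a particular policy area",  # illustrative
    subject_field="EU parliamentary procedure",
    related_terms=["subcommittee"],
    deprecated_terms=["commission panel"],  # hypothetical deprecated variant
)
print(entry.preferred_term, entry.deprecated_terms)
```

Grouping approved and deprecated designations in one record is precisely what Glossary 2 does on paper for standard versus non-standard expressions.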

Computer-assisted Language Learning (CALL)

We have pondered the question of how to make the presented glossaries easily accessible to the target audience and, at the same time, how to harness the potential of modern technology in the most effective way. We understood that providing an electronic spreadsheet document might not be sufficient. Therefore, we looked for an option to provide the data in a format accessible to various learning programs that is portable, easy to use, and capable of facilitating the learning process. In this chapter, we would like to briefly introduce the learning programs that we chose both for the dissemination of the glossaries and for their effective teaching capabilities. To reach this objective, we had to consult the field of computer-assisted language learning (CALL). CALL has traditionally been defined as the use of the computer in the language arts classroom for instruction (Fox, 1993). However, many things have changed since the first definitions were coined, and nowadays basically any program that facilitates the learning process could be called a CALL tool. There is a whole array of CALL tools, ranging from grammar exercises, virtual classrooms, vocabulary trainers, quizzes and games, to whole websites containing publishers’ web materials as enhancements to their textbooks, and metasites as collections of resources. Still, CALL tools share some common attributes, as proposed by Hubbard:
- Learning efficiency: learners are able to pick up language knowledge or skills faster or with less effort;
- Learning effectiveness: learners retain language knowledge or skills longer, make deeper associations and/or learn more of what they need;
- Access: learners can get materials or experience interactions that would otherwise be difficult or impossible to get or do;


- Convenience: learners can study and practice with equal effectiveness across a wider range of times and places;
- Motivation: learners enjoy the language learning process more and thus engage more fully;
- Institutional efficiency: learners require less teacher time or fewer or less expensive resources (Hubbard, 2009, p. 2).

In recent years, we have witnessed a boom in portable technology. Smartphones and tablets now have the processing capacity of small computers and can equally be used for CALL. We reviewed recently published web content (2013-2015) and searched for the most popular and widely used multi-platform vocabulary training programs (also known as computer-assisted vocabulary learning, CAVL, tools) that could also accommodate our needs for vocabulary import. After careful consideration, we decided on Anki and InterpretBank, which we will now briefly describe. Anki. Anki is a multiplatform program that uses flashcards for remembering information. It is based on two learning methods: recall testing and spaced repetition. Recall testing consists of being asked a question and trying to remember the answer. It stands in contrast to passive study, in which one reads, watches or listens to material without pausing to consider whether one knows the answer. The act of recalling knowledge is said to strengthen the memory and to increase the chances that the subject will be able to remember it again, while at the same time pointing out difficult questions that need further attention. The spacing effect is based on scientific findings dating back to 1885, which showed that subjects tend to remember things more effectively if reviews are spread over time instead of massed in a single session. The flashcard system (cards with a front side that serves as a question and a back side that provides an instant answer) utilizes this theory for learning purposes.
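The recall-testing and spacing principles just described can be sketched as a minimal Leitner-style scheduler. The box count and the doubling intervals below are illustrative assumptions of ours, not the actual parameters used by Anki or SuperMemo.

```python
MAX_BOX = 5  # illustrative number of Leitner boxes

def review(box, correct):
    """Leitner-style update: a correctly recalled card moves to a higher
    box (reviewed less often); a missed card drops back to box 1."""
    return min(box + 1, MAX_BOX) if correct else 1

def interval_days(box):
    """Illustrative review intervals per box: 1, 2, 4, 8, 16 days."""
    return 2 ** (box - 1)

# A card recalled twice, missed once, then recalled again:
box = 1
for correct in (True, True, False, True):
    box = review(box, correct)
print(box, interval_days(box))
```

The single failure sends the card back to box 1, so after the final success it sits in box 2 with a short interval: exactly the "difficult questions get more attention" behaviour described above.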
The system was popularized by Sebastian Leitner in the 1970s, who developed a method of studying flashcards based on categorizing them according to the subject's recall performance. According to our review, Anki is probably the most popular and most widely used flashcard learning software. It can be used for learning almost any information, for it supports images, audio, video and scientific markup via LaTeX. Anki uses the SM2 algorithm, based on the Leitner system, which was originally developed for the program


SuperMemo. Anki, unlike SuperMemo, is free of charge, open source, and runs on multiple platforms (Windows, Mac OS X, Linux/FreeBSD, Android and iOS). Flashcards can be easily imported in CSV format and later searched for in an open database21 (Elmes, n.d.).

InterpretBank. InterpretBank is software designed for professional interpreters to create and manage glossaries. It supports glossary sharing, synchronization across multiple computers, and various import and export functions. InterpretBank consists of three modes or utilities: TermMode, ConferenceMode and MemoryMode. TermMode is the heart of the program and enables users to create and manage their glossaries in one interface, to access information from the Web (definitions of specific terms on preconfigured trusted websites, simple lookup operations and automatic machine translation) and to import and export glossaries. ConferenceMode serves for instantly looking up glossary terms in the booth. The software has powerful search options that can be very useful in the conference booth and provide a great alternative to classical paper glossaries. MemoryMode is a simple tool for learning a glossary. It shows the glossary terms alternately in the source and in the target language (similarly to flashcards). MemoryMode is recommended for rehearsing small glossaries (50-100 words) before the beginning of an interpreting assignment (Fantinuoli, 2015). The development of computer tools for interpreters can be seen as a rather novel approach in conference interpreting, because interpreters and interpreting students still tend to rely heavily on paper glossaries regardless of advances in computing. Future trends in the use of computer tools for interpreters such as InterpretBank are hard to predict, but some preliminary studies (e.g. Gacek, 2015) have already shown their usefulness and increased effectiveness compared to classical paper glossaries.
Therefore, we stay optimistic and think that providing our glossaries in the InterpretBank format will be a good choice in terms of our target audience and of future trends in interpreting, notwithstanding the fact that the end user group of interpreting students is relatively small compared to other L2 speakers interested in EU parliamentary language or in LSP in general. Our glossaries in the InterpretBank format can therefore not only be used as a default database for testing the functionality of the program, but also

21 See https://ankiweb.net/shared/decks/


could serve for learning the most frequent words of the EP corpus and act as a basis for further building of conference terminology. While Anki is a popular program used by a wide community of learners of many subjects, InterpretBank is specialized software developed for interpreters, with a small user community; on the other hand, there is a higher chance that students of conference interpreting will take an interest in learning the glossary. Since we saw the necessity to carefully balance the need to make the glossary accessible to potential users against the need to effectively address young interpreting students, who might use it the most, we decided to use these two programs. Even though the Anki community is considerably bigger than the community of interpreting students, and only a small fraction of Anki members might be interested in parliamentary language, sharing the glossaries in this community was evaluated as a good option for reaching potential users. Therefore, we deem the combination of Anki and InterpretBank the best solution for making the glossaries publicly accessible and easy to use.
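As an illustration of the dissemination format, glossary entries can be written to a delimiter-separated text file of the kind Anki's import function accepts, one flashcard per line with front and back in separate columns. The glossary rows below are invented examples, not entries from our actual glossaries:

```python
import csv

# Hypothetical glossary entries: English term and its German equivalent.
glossary = [
    ("committee", "der Ausschuss"),
    ("rapporteur", "der Berichterstatter"),
    ("amendment", "der Änderungsantrag"),
]

# Write one flashcard per row: column 1 = card front, column 2 = card back.
with open("glossary_anki.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter=";")
    writer.writerows(glossary)
```

The same tabular data can be exported with the language columns swapped to produce cards for the opposite translation direction.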

Conclusion

In this chapter, we consulted other academic disciplines for proper glossary compilation. These included language for special purposes, and its special area of politolinguistics, for the correct selection and categorization of words in the generated keyword list; lexicology, to identify and address possible language perils at the lexical level; terminology, for creating the glossary design and learning the proper procedure for glossary compilation; and finally, computer-assisted language learning (CALL), for reaching out to the target audience and facilitating the learning process. In summary, these disciplines provided important knowledge for the proper selection, organization and presentation of keywords in the glossaries.


Corpus Linguistics as the Method for Creating LSP Glossaries

Introduction

This chapter seeks to explain the importance of corpora and corpus linguistics in language studies in general, and in creating the intended LSP glossary of the most frequent political terms used in the proceedings of the European Parliament in particular. In the first part of the chapter, the reader will be introduced to corpora and their various aspects, types and properties. Then, a short selection of language resources related to corpora, the EU and our area of focus will be presented. The second part of the chapter sets out to investigate corpus linguistics as a methodology and looks more closely at its history, purpose and importance in a transdisciplinary context. It also introduces the main features and functions of corpus linguistics, such as corpus analysis, corpus linguistics software and corpus linguistics tools. The chapter should provide sufficient background knowledge to understand the process of the quantitative corpus-based lexical analysis described in the next chapter. It must be pointed out that this chapter is not meant to provide a description of corpus design, because the corpus used in this study had been compiled according to corpus linguistics standards and can thus be understood as a reliable source on the whole. Information about proper corpus compilation may be found in the relevant literature (e.g. Bowker & Pearson, 2002).

Corpora

Definition. A corpus is, in simple terms, a body of naturally occurring language. In Latin, the word corpus means body, hence it can be used to refer to any body of text (McEnery & Wilson, 2001). There is a considerable number of definitions of a corpus; in this section, rather than enumerating all the known ones, we will discuss how the accepted definitions narrow in scope. The Oxford dictionary defines corpus in regard to language as “a collection of written texts, especially the entire works of a particular author or a body of writing on a particular subject”, or as “a collection of written or spoken material in machine-readable form, assembled for the purpose of linguistic research“ (Corpus, n.d.). The


latter definition implies the usage of corpora in modern computer-assisted corpus linguistics. A similar definition is provided by Baker, who defines a corpus as “any collection of running texts … held in electronic form and analyzable automatically or semi-automatically, rather than manually” (Baker, 1994, p. 226). As we can see, a corpus is a prerequisite for corpus linguistics, which will be described in more detail in the next chapter. Next, it is important to distinguish between the terms corpus and collection, or archive. Not every compilation of texts or archive makes a corpus. A corpus is rather to be understood as a collection of texts gathered with a special linguistic objective in mind, or from a linguistic perspective, and according to objective criteria set for the respective corpus. For this reason, Sinclair prefers to define corpora in a less flexible way in order to make them more useful to language studies, and postulates four main criteria that should characterize a corpus: considerable quantity, authentic quality, plain text simplicity, and documentation of compilation criteria, annotations etc. These criteria refer to corpora as samples of a language in a broad sense and of a discourse in a narrow sense. This is reflected in Sinclair’s definition of a corpus: “A corpus is a collection of pieces of language text in electronic form, selected according to external criteria to represent, as far as possible, a language or language variety as a source of data for linguistic research” (Sinclair, 2004, p. 19). McEnery and Wilson (2001) suggest criteria similar to Sinclair’s, namely sampling and representativeness (quality), finite size (quantity), machine-readable form (plain text simplicity) and a standard reference (documentation). These will be elucidated in more detail in the chapters to follow.

Qualitative aspects of a corpus. The quality of a corpus encompasses its sampling and representativeness, its balance and its authenticity.
Sampling and representativeness stand for the criterion that seeks to make a corpus maximally representative of the variety under examination. The aim is to provide a reasonably accurate picture of the tendencies of the language variety examined, as well as of their proportions. This variety can be understood and specified in many ways, from a particular author to a whole subgenre, a genre, or even the whole written discourse. The exact aim of the analysis lies in the hands of the


researchers, and the representativeness of the corpus, or the design criteria for its compilation, must be adjusted to the envisaged expectations and needs accordingly. There are two options for data collection: first, researchers could analyze every single utterance in that variety; or second, they could construct a smaller sample of the variety. The first option is clearly problematic because a corpus is finite, but language is not. Language is in fact living and evolving; the number of utterances is constantly increasing and is theoretically infinite. Thus, analyzing every utterance would be an unending and impossible task. Chomsky criticizes corpus linguistics by stating that corpora, however large they become with modern technology, will always be skewed: some utterances will be excluded either by rarity or by chance (McEnery & Wilson, 2001). Moreover, as he further elaborates, the standard method of the sciences “is not to accumulate huge masses of unanalyzed data and to try to draw some generalizations from them” (Chomsky, 2004, p. 97), therefore there is no point in doing so. Full coverage, the first option mentioned, is reasonable and achievable only in specialized areas or in the case of dead languages with a finite number of texts22. We should bear in mind that Chomsky’s criticism does not mean that corpus linguistics is faulty. He speaks in absolute terms and rightfully challenges this area of research by pointing out the perils that corpus researchers might encounter. Instead of arguing, corpus linguists should rather seek to establish ways to construct much less biased and more representative corpora. From the perspective of corpus analysis, it is extremely important to have clear research objectives and to use a maximally representative and relevant corpus which provides the researcher with an accurate picture of the tendencies of that variety, as well as of their proportions (McEnery & Wilson, 2001). According to Leech (2014, p.
27), a corpus is representative when “the findings based on its contents can be generalized to a larger hypothetical corpus”.
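The sampling option described above can be operationalized in various ways; one simple approach is proportional random sampling from categorized candidate texts. The following is a toy illustration with invented categories and counts, not a recipe from the literature:

```python
import random

# Hypothetical pool of candidate texts, each labelled with its category.
pool = {
    "debates":   [f"debate_{i}" for i in range(100)],
    "reports":   [f"report_{i}" for i in range(60)],
    "questions": [f"question_{i}" for i in range(40)],
}

def sample_corpus(pool, total, seed=0):
    """Draw a random sample that preserves the category proportions of the pool."""
    rng = random.Random(seed)
    pool_size = sum(len(texts) for texts in pool.values())
    sample = {}
    for category, texts in pool.items():
        k = round(total * len(texts) / pool_size)   # proportional quota
        sample[category] = rng.sample(texts, k)
    return sample

# Proportions 100:60:40 out of 200 yield quotas of 10, 6 and 4 texts.
sample = sample_corpus(pool, total=20)
```

The quota step is what keeps the sample balanced: each category's share of the sample mirrors its share of the pool, rather than being drawn from the pool as one undifferentiated mass.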

22 E.g. the Indus Valley civilization around Harappa and Mohenjo-daro flourished between 2500 and 1900 BC, and the total amount of written material found (700 inscribed objects) remains all there is to represent the language used by the civilization. There might be new archeological findings in the future, but it is unlikely that they would alter the extent of this stock of text (McEnery & Wilson, 2001).


The second qualitative criterion mentioned is corpus balance. The balance of a corpus goes hand in hand with its representativeness: while representativeness concerns selecting texts for a corpus so that they represent a wider population, corpus balance refers to the distribution of the various text categories within the corpus. The purpose is to build a manageably small-scale model of the linguistic material that is to be studied (McEnery & Gabrielatos, 2006). When compiling a corpus (selecting texts for it), both representativeness and balance should be considered, and specific criteria derived from the research objectives should be put in place (e.g. random sampling methods, demographic sampling etc.). The last qualitative factor of a corpus mentioned here is its authenticity. Since corpus linguistics (more in the chapters to follow) is the study of real language, or language in use, a corpus must include real instances of the language or language variety under study (Lang, 2010). It is therefore expected that corpora are built from genuine communication between people, rather than from artificially invented sentences. Authentic corpora can then find application in language teaching, dictionary building, translation and other fields, further elaborated in the chapter Corpus linguistics application. To conclude, it is important to mention that the degrees of authenticity, balance and representativeness are not precisely definable and attainable goals, and are hence difficult to measure quantitatively. Nevertheless, they must be used to guide the design of a corpus and the selection of its components. These qualitative criteria are the very core prerequisite for building a corpus that can be used for making claims about a larger population.

Quantitative aspects of a corpus. The quantitative criterion for corpora is the size of the corpus.
In recent years, we have witnessed dramatic progress in available corpora, which have grown exponentially in size. Corpus linguistics has indeed thrived under technological advancement, and thanks to bigger corpora generating more reliable results in corpus analysis, it has found application in various other fields (see the chapter Corpus linguistics application). In general, the commonly shared belief, especially in corpus-driven linguistics, is that bigger corpora offer more reliable results and a more accurate picture of the language under scrutiny. Sinclair points out the perils of small corpora used for making claims about a broader population:


“There is no virtue in being small. Small is not beautiful; it is simply a limitation. If within the dimensions of a small corpus, using corpus techniques, you can get the results that you wish to get, then your methodology is above reproach - but the results will be extremely limited...” (Sinclair, 2004, p. 189). Although the size of a corpus matters and significantly improves its representativeness, it cannot be seen as a sole guarantee of it. Again, the optimal size of a corpus depends vastly on the research goals and on the range of the investigated linguistic phenomena (grammar, lexis) in a particular language portion (language variety, specific text group, discourse etc.) (Blecha, 2012). This consideration applies to both the investigated corpus and the reference corpus, to the kind of query anticipated from users, and to the methodology considered for the study. Regarding general language claims, small corpora tend to be representative only of certain high-frequency linguistic features (McEnery & Wilson, 2001). This implies that Chomsky’s criticism of the great limitedness of corpora compared to unending language was correct. Yet nowadays this criticism is correct only in absolute terms, since modern corpora of hundreds of millions of words, or the web as a corpus with a rough estimate of 100 trillion words (Gatto, 2014), can fairly represent the language as it is. Next, small specialized corpora (explained in the next chapter) do not need to be as vast in volume as general corpora to provide an accurate and overall picture of the specific text group that they represent. In fact, a well-designed small corpus might provide more useful information than a large corpus that is not customized to the researcher’s needs (Bowker & Pearson, 2002). There are several studies that utilized small corpora and provided accurate findings about the broader population that they represented (see Anthony, 2013).
This implies that the size of a specialized corpus also goes hand in hand with the size of the whole population. There are statistical tools for determining the optimal size of a corpus with regard to its representativeness (see Pastor & Seghiri, 2007), but in general, the optimal size of a corpus is a vexing question and there is no panacea (Blecha, 2012). To conclude the discussion about corpus size, we should consider that the value of a corpus depends not on its size, but on the kind of information that we can extract from it (Anthony, 2013). The only characteristic regarding size that all corpora


have in common is their finite size. The exceptions are the web, which is sometimes advocated as a corpus (e.g. Gatto, 2014), and monitor corpora, which will be briefly described in the next chapter.

Modern corpora in machine-readable form. As mentioned in the previous chapter, the modern understanding of corpora explicitly expects them to be stored in electronic form and to be easily searched, processed or augmented (e.g. with corpus annotation). The other options are to store a corpus in written or spoken form. Since corpora are meant to document both written and spoken language, they can be recorded in both forms. Still, for a reasonable corpus analysis, it is necessary to transfer corpora into machine-readable form. This clearly poses obstacles (e.g. OCR recognition of written text and transcription of spoken language into electronic form), but rapid improvements in IT help overcome these problems. As a result, nowadays only a few corpora exist solely in written or spoken form (McEnery & Wilson, 2001). Keeping large corpora in such forms has become an act of curiosity rather than a systematic approach. For instance, a crowdfunding initiative plans to print a 1,000-volume edition of Wikipedia (Flood, 2014) to create an absurdly large collection of Wikipedia texts in written form. Considering the price of printing, handling and storing large corpora in written form, not to mention the inability to search through them, there are very few good reasons not to follow the trend of digitizing texts. In sum, machine-readable corpora possess three advantages over corpora in written or spoken formats. Firstly, they can be searched and manipulated at high speed. As we will show in the chapter Corpus linguistics application, modern computers have revolutionized corpus linguistics and opened new avenues of research.
Computer-assisted corpus linguistics tools enable researchers to search for a particular word in a corpus, to calculate the number of occurrences, to sort linguistic data, or to display searched words in context. These are only the basic functions of corpus tools, and they would be impossible to perform on large corpora without the use of computers (McEnery & Wilson, 2001). Secondly, machine-readable corpora can be easily enriched with additional information that further improves the search options and increases the informative value


of the corpus (McEnery & Wilson, 2001). This process is called annotation and will be further discussed on page 73. Thirdly, electronic corpora can be gathered more quickly than those in written or spoken form (Bowker & Pearson, 2002). There are plenty of software tools capable of compiling corpora either from the web (web spiders, used for instance for the Europarl corpus by Koehn) or from a collection of digital texts (e.g. Corpus).

Corpus documentation. The standard practice is to provide documentation alongside a corpus, explaining its design and encoding: information about the purpose, size, date, format, distribution of genres, character set used, annotation, markup system and the various tags used in the corpus, and other useful information (Baker, Hardie & McEnery, 2006). This information gives useful insight into the corpus and represents something like a “title page” of the corpus. This short elaboration has shown how important corpus characteristics are for correct corpus compilation. It goes without saying that a poorly designed corpus will inevitably lead to poor results. However, as will be shown in the chapters to follow, a well-built corpus is a prerequisite for a corpus study, but it is not the only essential component. The other two fundamental components necessary for a successful corpus study must also be in place, namely the right selection of corpus software and the human intuition to interpret data derived from the corpus (Baker, 2012).

Corpus types. The following categorization represents a short overview of various corpus types (see Pearson, 1998; Bowker & Pearson, 2002; Blecha, 2012)23:

Scope
General (reference) corpora: are designed to be a broadly homogeneous representation of all relevant language varieties and the characteristic vocabulary (e.g. the BNC). Monitor corpora increase their size at a predetermined rate of flow and allow scholars to monitor and record changes in the language and its use.
Specialized corpora: many corpora do not fit any known category in corpus classification and are therefore simply classified as specialized corpora. They contain a high

23 Examples of corpora in mentioned categories can be found here: http://cl.indiana.edu/~md7/13/615/slides/05-corpora/05-corpora-2x3.pdf


proportion of unusual features that differ from general language and are restricted to a particular field, text type or demographic group.

Language
Monolingual corpora: consist of textual information in a single language.
Bi-/multilingual corpora: corpora made up of texts in two or more languages. They can be further divided into comparable corpora (the same context, but not the same texts) and parallel corpora (exact translations, usually simultaneously produced texts in multiple languages).

Time frame
Diachronic corpora: represent a particular language over a long period of time, which can show the evolution of the language (e.g. the Helsinki Corpus of English Texts, containing texts from 700 to 1700).
Synchronic corpora: show language use within a limited time frame.

Size
Open corpora: are constantly expanded and used in projects where information about, and a record of, the latest developments is necessary. They tend to be unbalanced both in genre and in input texts, and are usually larger than closed corpora.
Closed corpora: most modern corpora are closed, which means that they are not augmented after their compilation. The aim of closed corpora is not to keep track of the latest developments in a language, but rather to seek a balance of texts within the corpus.

Form
Written corpora: contain written textual information.
Spoken corpora: contain transcripts of originally spoken language (conversations, broadcasts, lectures etc.).

Author
Learner corpora: are compiled from the written or spoken utterances of non-native speakers of a language (foreign language learners).

Corpus annotation. Corpus annotation can be defined as the process of “adding such interpretative, linguistic information to an electronic corpus of spoken and/or written language data” (Leech, 1997, p. 2).
In a broader sense, it might be understood as both textual information and interpretative linguistic analysis, and in the more common narrow sense as the encoding of linguistic analyses in the corpus text. It is important to distinguish


between corpus annotation, which is interpretative, subjective linguistic information reflecting that the analysis of human language is to some degree the product of the human mind’s understanding of a text (e.g. the annotation of a word that might be ambiguous), and corpus markup, which records objective information (e.g. the sex of a speaker, discourse details etc.) (Xiao, 2010). There are various reasons for corpus annotation. In general, the purpose of annotation is to enrich a corpus with linguistic information that enables researchers to compare similarities and look for patterns. In particular, there are three main reasons for corpus annotation: better information extraction (in order to extract information, some patterns must first be built in); re-usability and higher value (proper annotation is an expensive and time-consuming business, and an annotated corpus is much more valuable than the original corpus); and multifunctionality (a corpus can be used for a multitude of purposes – the final product of annotation can be used as “raw material” e.g. in speech recognition software, SMT, language teaching, the development of natural language processing software etc.) (Rayson, 2003; Garside et al., 2013). The multitude of purposes for which annotated corpora can be used is reflected in a great variety of annotation types. Corpus annotation should adhere to proposed standards, which are: recoverability (reversion to the raw corpus must be possible), extractable annotation, and available documentation (the annotation scheme; how, where and by whom the annotations were applied; some account of the quality of the annotation). Corpus annotation is encoded in tags, which are codes assigned to various language features of the respective word (Dash, 2010). This area comprises various encoding styles which will not be discussed further. In this thesis, we describe POS tagging and lemmatization as the most common types, and briefly outline the other annotation types.

Lemmas.
Lemmatization is one of the most basic types of annotation and can be categorized as lexical annotation. A lemma is “a set of lexical forms having the same stem and belonging to the same major word class, differing only in inflection and / or spelling” (Francis and Kučera, 1982, p. 1). Thus, lemmatization can be understood as the process in which words in various inflected forms (e.g. declensions of nouns, conjugations of verbs, spelling variants etc.) are grouped together under their basic form


(lemma). The purpose of lemmatization is to enable corpus users to make generalizations about the behavior of groups of words in cases where their individual differences are irrelevant (Knowles & Zuraidah Mohd Don, 2004). Lemmatization makes it possible to treat all the forms of a word together, to produce frequency and distribution information about the lemma, to retrieve results in various forms by searching for the headword, etc. For instance, as discussed in the chapter Wordlist generation on page 128, the basic word “be” was found in the Europarl corpus under various forms such as am, are, been, is, was etc. The corpus analysis software counted both the frequency of the basic form relative to other words and the individual frequencies of all forms within the basic form. Lemmatization can be performed by using an existing form-lemma database, by a (semi-)automatic approach called stemming (cutting off characters in a predefined manner to arrive at basic word forms), or by a combination of both strategies, which is useful for texts with special terminology that might not be included in the available lemma lists (Gries & Berez, to appear). It is important to note that lemmatization groups words together on a lexical, but not a semantic or syntactic, level, which might lead to ambiguity for some words.

Part-of-speech tagging: syntactic and morphological annotation. The second most common annotation type is POS (part-of-speech) tagging. POS tagging is “the identification of the morphosyntactic class of each word form using lexical and contextual information” (Paroubek, 2008, p. 99). POS tagging is important in various areas because some words can belong to more than one word class, which changes their meaning and pronunciation.
Leech (2004) illustrates POS tagging on the word present with a simple tagging scheme consisting of an underscore symbol and a code for the word category:
Present_NN1 (singular common noun)
Present_VVB (base form of a lexical verb)
Present_JJ (general adjective)
There are various tagsets and tagging programs (taggers) with differing accuracy, undergoing constant improvement by the linguistics community. The precision of POS tagging is highly dependent on many factors, such as the language represented in the corpus and its complexity, the corpus texts, the kind of tagger and its training for the relevant corpora etc. The precision for English texts varies between 90% and 95% (Gries & Berez, to appear). Still, POS tagging only scratches the surface of the complex field of syntactic annotation.
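The two annotation types just described can be combined in a small sketch: starting from tokens in the word_TAG format illustrated above, we strip off the POS tags and group inflected forms under their lemma with a hand-made form-lemma table, so that all forms of "be" are counted together, as in our wordlist generation. The tag codes and the tiny lemma table below are our own illustrative assumptions, not an excerpt from any real tagset or database:

```python
from collections import Counter

# A few POS-tagged tokens in the word_TAG style illustrated by Leech
# (the tag codes here are invented for illustration).
tagged = "This_DT0 is_VBZ good_AJ0 and_CJC these_DT0 are_VBB good_AJ0".split()

# Minimal hand-made form-lemma table (in practice a full database is used).
form_to_lemma = {"is": "be", "are": "be", "am": "be", "was": "be",
                 "these": "this"}

lemma_freq = Counter()
for token in tagged:
    word, _, tag = token.partition("_")     # split off the POS tag
    form = word.lower()
    lemma_freq[form_to_lemma.get(form, form)] += 1

# "is" and "are" are now counted together under the lemma "be",
# and "this"/"these" under the lemma "this".
```

This is exactly the kind of aggregation that lets a frequency list report one entry for a lemma instead of scattering its inflected forms across the ranking.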


There are many other, more sophisticated corpus linguistic annotation methods that reveal important insights into the grammatical characteristics of a text based on the specific needs of linguistic researchers (Garside, Leech & McEnery, 2013).

Other types of corpus annotation.
- Phonetic annotation: can be generated automatically from an orthographic transcription via a pronunciation lexicon and/or rule-based algorithms, but manual transcription is vastly time-consuming (1 minute of speech: 40-60 minutes of work) (Gries & Berez, to appear). It is widespread in speech linguistics and uses mainly corpora produced in laboratory situations (Garside et al., 2013).
- Prosodic annotation: also wide in scope, from paralinguistic features (stress, intonation, pauses, mispronounced words etc.) in the EPIC interpreting corpus to diacritic notations for capturing tonal phrase accents. This annotation type is not very common and is still in its early stages (Gries & Berez, to appear).
- Semantic annotation: adds a label to every word in a text indicating its semantic field. This type of annotation enables finding different words with the same meaning and, conversely, distinguishing between different meanings of the same word (Wilson, 2013).
- Pragmatic annotation: adds information about the kinds of speech act that occur in a spoken dialogue. For instance, the word okay may on different occasions be an acknowledgement, a request for feedback, or an acceptance (Leech, 2004).
- Stylistic annotation: adds information about speech and thought presentation, such as direct speech, indirect speech etc.
- Discoursal annotation: adds information about causality, contrast and temporality. There are two types of discourse relations: the first refers to relations that are signaled explicitly via discourse connectives, and the second to relations that can be inferred without explicit signaling.
- Gestures/sign language annotation: used for the annotation of nonverbal aspects of spoken language (Gries & Berez, to appear).
- Alignment of multilingual corpora: parallel corpora may contain translations of texts from a source language into one or more other languages, aligned in units consisting of words, phrases or sentences, which can be used in SMT (Gries & Berez, to appear).


- Annotation of learner corpora: learner corpora can be annotated for the errors made by English language students. Annotating a learner corpus is challenging also for the other types of annotation (lemmatization, POS, etc.), because non-native English is more likely to contain non-standard grammar, spelling or syntax, which poses a problem for automated lemmatization tools trained on edited and mostly correct English texts (Gries & Berez, to appear).

Certain corpus annotations are useful or even necessary for some disciplines, whereas they are useless for others. For instance, in language teaching, lemmatization helps students to identify the total number of surface forms of a lemma, and POS tagging to explore which words are frequently used in combination with a certain word (e.g. looking for all adjectives commonly used in front of the word trend). Similarly, semantic annotation might be useful for elaborating on word differences, whereas prosodic annotation would be more or less useless in this context. Another example is the use of annotation in SMT, which can similarly derive great advantage from semantic and syntactic annotations (the above-mentioned word present will have different translations for its respective word categories), but has little use for paralinguistic annotations, which, on the contrary, are of great importance in interpreting studies. To summarize, corpus annotation is a very useful way of enhancing the value of a corpus and creating new possibilities for working with it. On the other hand, corpus annotation also has some drawbacks. Sinclair offers an interesting critique: “One cozy consequence of using tagged text is that the description which produces the tags in the first place is not challenged – it is protected. The corpus data can only be observed through the tags: that is to say, anything the tags are not sensitive to will be missed” (Sinclair, 2004, p. 191).

Corpora limitations.
In order to conduct proper research, one must also be aware of the specific limitations of corpora. As already mentioned, Chomsky fervently criticized corpus linguistics in its early stages for its zealous interpretation of corpus data instead of using language introspection, and for the limitations of corpora compared to the infinite possibilities of language. Another criticism is offered by Widdowson, who faults corpora for their inability to describe member categories in ethnomethodological terms, to provide contextually appropriate insight into the encoded text, and for presenting decontextualized language (Widdowson, 2000). The criticism of corpus linguistics was legitimate and it
was ultimately a force for good, for it led corpus linguistics to adjust and improve its methods of corpus building and corpus analysis. Nowadays, corpus linguists are mostly aware of the following limitations of corpora, according to Bianchi (2012): Firstly, corpora present language out of its context, that is, without context other than the textual one, such as social and pragmatic context, visual context, etc. Despite the fact that there are ways to include contextual information in a corpus, such annotations are time-consuming and consequently very rarely used. Secondly, any corpus must be considered a limited sample of language that shows only its own content. This content can easily be interpreted wrongly outside its context and without thorough examination24. Therefore, it is very important to treat quantitative data carefully and not to jump to quick conclusions and generalizations. Thirdly, a corpus can provide information about whether some occurrences are used or frequent, but not whether they are correct (e.g. from the perspective of standard grammar). Similarly, corpus linguists cannot say that something is not possible simply because it was not found in a corpus (Bianchi, 2012). Fourthly, corpora provide linguistic evidence, but not linguistic information. They do not automatically provide answers to linguistic questions; thus analysis and intuition are always needed to make sense of the data (Bianchi, 2012). In sum, corpora are a useful tool for many areas of research, as will be shown in the chapter corpus linguistics application on page 89, but one should also be aware of their limitations.

24 “For instance, finding two times more occurrences in the corpus for left-handed than right handed can lead to wrong conclusion like left-handed people enjoy higher social status and public presence than right-handed people, or imply that there are more left-handed people than right-handed. Careful qualitative analysis reveals that right-handedness is considered to be the norm and left-handedness is a deviation from the norm that is likely to be mentioned, but still, it is important to recognize that this is an interpretation of evidence rather than a fact” (Hunston, 2002, p. 66, as cited in Bianchi, 2012, p. 52).
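The kind of raw frequency data discussed above is straightforward to produce. The sketch below is a minimal illustration in Python; the sample sentences and the helper name `word_frequencies` are invented for illustration and are not taken from the Europarl corpus:

```python
from collections import Counter
import re

def word_frequencies(text: str) -> Counter:
    """Count surface-form frequencies: lower-case the text and keep word-like tokens."""
    tokens = re.findall(r"[a-zA-Z']+", text.lower())
    return Counter(tokens)

sample = ("The Parliament adopted the report. "
          "The report was adopted after the debate.")
freq = word_frequencies(sample)
print(freq.most_common(3))  # function words such as 'the' dominate the top ranks
```

As the limitations above stress, such counts say nothing about correctness or context; they only show what this particular sample contains.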


The Europarl Corpus and Other Corpora & Language Resources in the Context of EU Institutions The European Union (EU) is the result of nearly 50 years of effort for peace, cooperation, stability and prosperity among what are today 28 member states. Its origin can be traced back to the European Coal and Steel Community and the European Economic Community, formed by six countries in 1951 and 1958 respectively. The EU can be described as an economic and political union. With its 500 million citizens, it holds a strong position in world trade (the second strongest, surpassing the US). The EU has a unique institutional set-up in which member states relinquished part of their sovereignty to EU institutions that represent the EU on a national and international level. The five most important institutional bodies are the European Parliament, the European Council, the European Commission, the Court of Justice, and the European Central Bank (European Commission, 2014). Parallel Language Resources Issued by the EU. With its multilingual policy and all 24 official languages enjoying equal status, the EU institutions require translation and interpreting services, which they mostly cover with their own language departments. This requires a substantial workforce and resources. To improve the efficiency of interlingual communication in the EU community, the EU institutions have supported various projects aimed at developing language resources. Their motivation covers mainly the development of more business potential, the improvement of democracy through transparency of information, the maintenance of the EU’s linguistic diversity, and the preservation of the EU’s cultural diversity (Steinberger, Ebrahim, Poulis, Carrasco-Benitez, Schlüter, Przybyszewski & Gilbro, 2014). Some of the freely available resources related to corpus linguistics will be mentioned in the next chapter. According to Steinberger et al. 
(2014), the EU organizations have released a number of large multilingual parallel linguistic resources in cooperation with the Joint Research Centre (JRC), the in-house science service of the European Commission. The next paragraphs will serve as a brief introduction to them. JRC-Acquis: the first sentence-aligned and pre-processed corpus issued by the European Commission. The latest version comprises 22 languages and is built on documents from the Eur-Lex website, which were made public but not customized for specific linguistic needs.


DGT-Acquis: a family of four multilingual parallel corpora covering 23 languages, produced by the European Commission’s Directorate General for Translation (DGT). DGT-Acquis is also built on Eur-Lex data, which stem from the Official Journal of the European Union (OJ). DCEP (Digital Corpus of the European Parliament): the DCEP is the latest EU corpus. It is accompanied by tools that allow users to produce separate sentence-aligned corpora for each of the 276 language pairs, and it contains the majority of the documents published on the European Parliament’s website. The corpus includes various categories such as reports, adopted texts, written answers to questions, written questions, national or EU-wide press releases, motions and minutes of plenary meetings, with a total of 103 million English words. To avoid overlap with the Europarl corpus, the DCEP does not contain the verbatim reports of the EP’s plenary sessions. The corpus is further subdivided by language and document type. DGT-TM: a translation memory (TM) covering 24 languages and 552 language pairs, 58 of them directly. It provides sentence-alignment data of reasonable quality in considerable quantity. Its further improvement and customization for direct import into statistical machine translation (SMT) systems account for the fact that DGT-TM is the language resource most used by human translators. ECDC-TM: a TM provided by the European Centre for Disease Prevention and Control. The TM was produced from translations of its website into all 24 of the EU’s official languages. EAC-TM: a TM provided by the EC Directorate General for Education and Culture (EAC). It was created from translations of project & funding applications and reports from the EU programs Lifelong Learning Programme (LLP) and Youth in Action Programme. Although smaller in size, it covers a wide range of domains, namely education, training, culture, youth and sports. 
In addition to the usual 22 languages, it also includes documents in Icelandic, Norwegian and Turkish, for these countries have participated in the respective EU programs. EAC-TM is expected to be enriched every year with new data (Steinberger et al., 2014). Additionally, we consider it wise to also include in this list the vocabulary thesaurus EuroVoc, which may serve as a powerful terminological database for people interested in standardized EU terminology. EuroVoc is a multilingual thesaurus that was originally
built up specifically for processing the documentary information of the EU institutions. It covers a great variety of fields related both to the EU community and to the national perspectives of EU member states, with a certain emphasis on parliamentary activities. EuroVoc is a controlled vocabulary whose aim is to support information management and dissemination services. It adheres to the newest ISO standards25, includes 23 EU languages plus Albanian and Serbian, and provides users with the preferred term in each language and its preferred equivalents in the other languages (Europa.eu, n.d.). Other parallel language resources not issued directly by the EU. There are also parallel language resources that are outcomes of private initiatives or of EU-funded projects. Steinberger et al. mention the following resources in their latest report (2014): The Europarl Corpus: more in the next chapter (Steinberger, 2014). The Multext project: aimed at developing standards, tools, corpora and linguistic resources for multiple languages, currently covering 18 languages. Multext-East, a spin-off of Multext, developed morpho-syntactic specifications and language resources for 18 central and eastern European languages with English as a hub language. OPUS open parallel corpus collection: a collection of translated texts obtained from the web. Its coverage has reached over 150 languages in altogether 5 billion aligned translation units. European Parliament Interpreting Corpus (EPIC): EPIC is a trilingual parallel corpus (English, Spanish and Italian in all combinations) of European Parliament speeches and their corresponding interpretations. It contains 357 speeches with roughly 177 thousand words in video format and their corresponding interpretations in audio format. 
The corpus has been orthographically transcribed and annotated for scholarly purposes in interpreting studies (annotation of paralinguistic features, truncated and mispronounced words) (Russo, Bendazzoli, Monti, Sandrelli, Baroni, Bernardini & Mead, 2011; SSLMIT, 2015).

25 ISO 25964 – Thesauri and Interoperability with other Vocabularies

The Europarl corpus. Special attention in this master thesis should be given to the Europarl corpus, which is the subject of our research. The Europarl corpus (EP corpus), compiled between 1996 and 2011 by Koehn, contains the verbatim minutes of debates at the European Parliament. It initially covered 11 languages and was later extended to 21 official languages, with up to about 60 million words per language. The EP corpus is one of the biggest multilingual corpora that are freely available (Zahurul & Mehler, 2012; Steinberger et al., 2014). The EP corpus was extracted from the website of the European Parliament using web crawling techniques. The goal of the extraction and processing was to generate sentence-aligned text that could be used for training statistical machine translation (SMT) systems. The EP corpus has found wide application both in SMT and in natural language processing (NLP), whose progress is driven by the availability of natural language data. The final product, partially supported by the EuroMatrixPlus project funded by the European Commission, comes with the source release of English files, preprocessing tools and a sentence aligner, and a parallel corpus of 21 European languages paired with English (Koehn, 2005). Detailed statistical data about the EP corpus can be found in the chapter Keyword list categorization on page 138. For our glossary compilation it is important to note that in the debates of the European Parliament, members of the European Parliament (MEPs) usually speak their own native language. These statements are afterwards transcribed, edited and translated into the other official languages of the EU. Therefore, the register of the Europarl corpus is either transcribed spoken language (speech edited and homogenized by language specialists) or written-to-be-spoken language, when MEPs read aloud their written discourse (Cartoni, Zufferey & Meyer, 2013). This very nature made it the best fit for the intended mapping of spoken parliamentary language with quantitative corpus-based lexical analysis. 
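The Europarl release distributes each language pair as two line-aligned plain-text files, one sentence per line. As a brief illustration of how such sentence-aligned data can be consumed, here is a minimal sketch; the file names in the commented example follow the version-7 naming convention and are illustrative, not prescribed by this thesis:

```python
from itertools import islice

def aligned_pairs(src_path: str, tgt_path: str):
    """Yield (source, target) sentence pairs from two line-aligned files."""
    with open(src_path, encoding="utf-8") as src, open(tgt_path, encoding="utf-8") as tgt:
        for s, t in zip(src, tgt):
            yield s.strip(), t.strip()

# Example (hypothetical paths): inspect the first three pairs of a German-English release.
# for de, en in islice(aligned_pairs("europarl-v7.de-en.de", "europarl-v7.de-en.en"), 3):
#     print(de, "|||", en)
```

Because alignment is positional, any tool that edits one side of the pair must preserve the line count, or the pairing silently breaks.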
Despite the wide application of the Europarl corpus in SMT and NLP, the corpus has received little attention in academic disciplines such as lexicology, politolinguistics and translation studies. Still, a few studies can be found that demonstrate the usefulness of the EP corpus for linguistics. We will briefly mention some of them. Firstly, Cartoni et al. (2013) customized the Europarl corpus by identifying and adding missing language tags, which enabled them to create language combinations other than those offered in the original EP corpus. This customized corpus was used for investigating causal connectives in English and French in both original and translated
texts (Zufferey & Cartoni, 2012) and for comparing the influence of various source languages on two target languages (Zufferey & Cartoni, 2014). Secondly, Zahurul & Mehler (2012) also customized the EP corpus, using it to investigate simplification in translation. They concluded that the generally accepted type-token ratio is useful for the German-English or German-French language pair, but that it fails as a universally valid indicator of the tendency of translators to simplify their target texts in other languages. Thirdly, Stoyokova, Simkova, Majchrakova & Gajdosova (2015) used the EP corpus for detecting and evaluating time expressions in Bulgarian and Slovak. They concluded that the corpus is a reliable source for comparative linguistic research on the respective languages. These are just examples, not an exhaustive account of the possible applications of the Europarl corpus. To conclude, the EP corpus represents a rich, reliable and useful source of data for linguistic research. It is up to linguistic scholars to harness the full potential of the corpus for their specific needs and research objectives.
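As an aside, the type-token ratio used by Zahurul & Mehler is simply the number of distinct word forms (types) divided by the number of running words (tokens). A minimal sketch, where the tokenization rule is a simplifying assumption of ours:

```python
import re

def type_token_ratio(text: str) -> float:
    """Distinct word forms (types) divided by running words (tokens)."""
    tokens = re.findall(r"[a-zA-Z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

print(type_token_ratio("the cat sat on the mat"))  # 5 types / 6 tokens
```

Since the TTR falls as text length grows, only samples of comparable size can be meaningfully compared, which hints at why a raw TTR struggles as a universally valid simplification indicator.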

Corpus Linguistics Definition and history. Linguistics has various branches that focus on a certain aspect of language, such as syntax, semantics, phonology or grammar. Traditionally, corpus linguistics was defined as “the branch of linguistics that studies language on the basis of corpora, i.e. bodies of texts assembled in a principled way” (Johansson, 1995, p. 19). However, this conception is rather misleading – corpus linguistics is not the study of any particular language aspect, as opposed to branches such as phonology, syntax or grammar. Instead, it is rather a set of procedures and methods for studying a language. Corpus linguistics should thus be understood as a broad concept applicable to many aspects of language enquiry, one that serves to obtain and analyze data qualitatively and quantitatively, rather than as a theory of language or even a separate branch of linguistics. Despite not being a linguistic field itself, it does allow us to differentiate between approaches taken to the study of language, which creates a series of linguistic areas (e.g. corpus-based syntax as opposed to non-corpus-based syntax) (McEnery & Wilson, 2001).


As with many scientific areas, defining the term corpus linguistics is a rather challenging task. An interesting elaboration on this is offered by Taylor: “There is no shortage of overt discussion among theorists of what corpus linguistics is or should be. However, at the same time, to the casual observer or new arrival there might also appear to be a bewildering variety of definitions and descriptions” (Taylor, 2008, p. 179). The term corpus linguistics is described by McEnery and Wilson simply as “the study of language based on examples from real life language use” (McEnery & Wilson, 2001, p. 1). A more sophisticated definition is provided by Müller and Waibel, who define it as “the study of language by means of naturally occurring language sample” (Müller & Waibel, n.d.). It may be a contentious issue whether to define corpus linguistics strictly as a methodology or as a linguistic field. Mukherjee (2005) argues that the terms discipline and methodology are not mutually exclusive and draws an analogy with the introduction of microscopes leading to the creation of the discipline of microbiology: “I would contend that corpus linguistics represents both a new method (in terms of computer-aided descriptive linguistics) and a new research discipline (in terms of a new approach to language description)” (2005, p. 86). One point upon which all writers agree when defining corpus linguistics is that it is empirical, for it examines and draws conclusions from attested language use rather than from intuitions. This does not mean that intuitions play no role in corpus linguistics, but they do not provide the data for analysis, nor do they supersede the empirical evidence. Another central aspect of modern corpus linguistics is the use of computers. 
The pivotal role of computers in corpus linguistics has led to a strong association with computational linguistics and even to coinages such as computer corpus linguistics or computer-assisted corpus linguistics. Computers have enabled researchers to collect, store and manage vast amounts of data quickly and inexpensively, and to analyze large amounts of language data with ease using specialized computer software:
The use of computers “gives us the ability to comprehend, and to account for, the contents of corpora in a way which was not dreamed of in the pre-computational era of corpus linguistics” (Leech, 1992, p. 106, as cited in McEnery, 2006, p. 4). This has enabled the wide application of corpus linguistics in various other scientific disciplines and in everyday use. From a historical point of view, corpus linguistics in academia can be divided into three stages: early corpus linguistics (before the 1950s), Chomsky’s criticism (1950s), and modern corpus linguistics (1950s to now) (McEnery & Wilson, 2001). By introducing his concepts of language competence and language performance, Chomsky rightfully criticized corpus linguistics, arguing that no corpus, however large, can truly represent real language and thus serve as a source of evidence in linguistic inquiry. He understood the corpus as a collection of external utterances that are the result of language performance. In his view, language performance may be affected by various external factors and thus only poorly mirrors language competence. A great deal has been written on this thorny issue; for more information, see Čermák (2003). The way for modern corpus linguistics was paved by the work of Henry Kucera and W. Nelson Francis, who in 1967 performed a computational analysis of present-day American English that was used for creating the first corpus-based dictionary. This innovative step set modern corpus linguistics in motion, and it is to this day considered one of the most influential studies in the field (Meyer, 2002). Strengths and weaknesses of corpus analysis. Corpus analysis has certain notable advantages and disadvantages, as suggested by Vander et al. (2011). The advantages are authentic data sources, easily comparable data and replicable results from available corpora. 
The disadvantages can be seen in methodological risks that also apply to other empirical disciplines, such as false conclusions derived from insufficient representativeness of the data, and methodological errors and/or misinterpretation of statistical data. The authenticity of the data sources is an advantage because the texts were produced in a largely natural context. Although corpus data might be messy and noisy compared to experimental data, they surpass experimental data in sheer size, which would be difficult, if not impossible, to reproduce in artificial settings.
The second advantage lies in the possibility to directly test, validate or replicate linguistic studies thanks to straightforward statistical data. This leaves little room for vague statements such as “Type X is rather untypical” or “Y is marginally acceptable”. The disadvantages, or perils, of corpus linguistics arise from incorrect methodology and human error. Firstly, making claims about the representativeness of a corpus might be somewhat tricky. One must bear in mind that findings can be generalized to a larger population only to the extent that the corpus is representative of the target population. Unfortunately, some authors are quite happy to generalize more liberally. Secondly, corpus data, as samples of naturally produced texts, are not always suitable for particular research needs. For instance, large available corpora of journalistic data might be unsuited for use as a general corpus due to the very peculiar register of the discourse (journalistic rules and restrictions on sentence length, high information density, omission of unimportant words, changes by editors and typesetters, etc.). Such characteristics can undermine the research purpose and result in false statements and observations. Thirdly, some analysts tend to rely heavily on a brief examination of the top most frequent results, which can lead to a complete lack of statistical significance testing and to multifactorial phenomena being studied monofactorially, disregarding complex interactions with other factors (Vander et al., 2011). Qualitative vs. quantitative corpus analysis. As the terms imply, qualitative and quantitative corpus analysis represent two different, yet not necessarily incompatible, perspectives on corpus data. Quantitative analysis usually precedes and serves as a basis for qualitative analysis, which examines the data more carefully based on decisions also derived from the quantitative results. 
Another viewpoint, offered by Schmied (1993), is to see qualitative analysis as the precursor to quantitative analysis. This is because the categories for classification must first be identified before linguistic phenomena can be classified and counted. Quantitative research on a sampled corpus allows findings to be compared with a larger population based on statistical data. It provides statistically reliable and generalizable results by focusing on classifying linguistic features, counting occurrences
and their frequencies, and deriving complex statistical models in an attempt to explain what is observed (McEnery, 2001). Qualitative research, on the other hand, describes aspects of usage in the language and provides real-life examples of particular language phenomena. It enables very fine distinctions to be drawn with regard to the great ambiguity inherent in human language. In qualitative analyses, rare phenomena should receive the same attention as more frequent ones. Qualitative research can provide greater richness and precision, but its main disadvantage is that its findings cannot be extended to wider populations with the same degree of certainty as those of quantitative analysis (McEnery, 2001). The two approaches differ in scope but are complementary in their results. Corpus linguists can benefit as much as any field from multi-method research combining both qualitative and quantitative perspectives on the same phenomena (McEnery, 2001). In fact, most scientific research draws on the advantages of both approaches and combines them to a certain degree. Corpus linguistics software tools are also built to provide both qualitative and quantitative analysis of a given corpus (word frequency lists and keyword lists vs. concordance search, n-grams, etc. – explained in more detail in the next chapters). The presented study also encompasses both qualitative and quantitative analysis. The quantitative analysis, as the dominant approach, comprises generating statistical data (wordlist, keyword list, number of occurrences, keyword frequency, etc.) that serve as a basis for the selection of words included in the glossary. The qualitative analysis, on the other hand, looks closely at selected words and investigates their usage in the corpus (collocations, n-grams, concordances, etc.). Corpus-based and corpus-driven linguistics. Another notable area where differences emerge between corpus linguists concerns forming and testing linguistic hypotheses. 
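The concordance searches mentioned above can be illustrated with a minimal keyword-in-context (KWIC) sketch; the sample sentence, the helper name `kwic` and the context window are illustrative choices of ours, not tied to any particular concordancer:

```python
import re

def kwic(text: str, keyword: str, window: int = 3):
    """Return (left context, keyword, right context) for every hit of keyword."""
    tokens = re.findall(r"[a-zA-Z']+", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits

sample = "the committee approved the report and the council adopted the report today"
for left, kw, right in kwic(sample, "report"):
    print(f"{left:>25} [{kw}] {right}")
```

Aligning every hit on the keyword, as concordancers do, is what makes recurring collocational patterns visible at a glance.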
Here we differentiate between corpus-based and corpus-driven language study, as suggested by Tognini-Bonelli (2001). Corpus-based studies use corpus data in order to “expound, test or exemplify theories and descriptions that were formulated before large corpora became available to inform language study” (Tognini-Bonelli, 2001, p. 65). The corpus-based approach takes an existing theory as a starting point and later corrects/revises this theory in the light of corpus evidence. In extreme cases, however, strict adherence to validating a language theory may lead to a lower commitment to the corpus data as a whole or even to
discarding inconvenient corpus evidence that does not support the formulated pre-corpus theory. Corpus-based linguistics uses corpus annotation and various classifications as standard procedures in corpus analysis (McEnery & Gabrielatos, 2006). The corpus-driven approach, on the other hand, rejects the characterization of corpus linguistics as a method and claims instead that the corpus itself should be the sole source of language hypotheses, in other words, that it should serve as an empirical basis from which lexicographers extract their data and detect linguistic phenomena without prior assumptions and expectations (Tognini-Bonelli, 2001, pp. 84-85). The corpus-driven approach advocates very large corpora and makes no serious effort to achieve representativeness, claiming that the representativeness of a large corpus corrects itself once the corpus becomes big enough. This may be true to a certain degree, but, especially in small corpora, the claim of self-balancing is rather overstated, and it has therefore been sharply criticized and even disproved in some studies. Next, as opposed to corpus-based studies, corpus annotation is deemed undesirable in corpus-driven linguistics because it already carries traces of language theories. This is also a rather rigid point of view, because taking corpus-driven linguistics too far would require someone with no educational background related to language use, which would make him or her free from preconceived theories. A further contrast to corpus-based linguistics is that corpus-driven linguistics makes no distinction between lexis, syntax, pragmatics, semantics and discourse, instead using a holistic approach with only one level of language description, namely the functionally complete unit of meaning. In general, the corpus-driven approach claims to be a new paradigm within which a whole language can be described (McEnery et al., 2006). 
The distinction between the corpus-based and corpus-driven approaches in corpus studies is rather fuzzy. In general, corpus-based is often used in a broad sense to describe both approaches (McEnery et al., 2006). In the presented study, we also combine the corpus-based and the corpus-driven approach. The corpus-based approach can be identified first of all in the main hypothesis, which we are trying to support with the literature review, namely that glossaries based on word frequency in a specific corpus may be an asset to ESL speakers interested in the particular language area (in this case the political variety of the EU parliamentary language). Next, we make use of other linguistic research outputs such as lemmatization, word frequency lists and a stop list, with a reasonable belief about their
correctness, for conducting the corpus analysis. This makes the present corpus study to a high degree dependent on and influenced by other research findings and already postulated language theories. On the other hand, we also apply the corpus-driven approach, mainly in the qualitative analysis. We attempted to look at the qualitative data with genuine interest and no bias, firstly to identify language patterns typical of the selected words and secondly to document them in the glossary (concordances, n-grams, collocations, etc.). We also did not use annotation tools, as we considered them of no use for this corpus analysis. To conclude, the presented corpus study entails both the corpus-based and the corpus-driven approach, with the former predominant in the quantitative and the latter predominant in the qualitative analysis. The next chapter will provide an overview of possible applications of corpus analysis in various language disciplines. The aim of this chapter is to demonstrate the synergy and the inter- and transdisciplinarity that arise from harnessing corpus linguistics in a broad spectrum of other language studies. Corpus linguistics application. Corpus analysis in everyday use. If we understand corpus analysis in a broad sense as an automated text analysis method, we could presume that it is the cornerstone on which modern digital technology is built. Search algorithms used in programs from web search engines, websites and databanks to simple text processors have to be able to wade through data and deliver search results that may vary in their complexity (various kinds of data, not only texts). This process is fully automated (e.g. search bots as automated programs searching the web for predefined queries) and omnipresent in its results. Nowadays we can fully enjoy all the benefits of computer advancement without being aware of the complicated processes behind the results. 
It is difficult to grasp the stunning complexity of the digital technology making up our modern life and to realize how much of the modern world is based on statistical analysis. Similarly, corpus analysis and corpus linguistics in the narrow sense, in their contribution to linguistics, still open up so many possible applications that it is nearly impossible to give an exhaustive account of all of them. In the next chapter, we will try to give a concise overview of possible uses of corpus linguistics in various linguistic disciplines. Corpus linguistics and translation studies. It is not an overstatement to say that corpus linguistics has revolutionized translation studies, because it played the key role in
creating the fundamental principles underpinning statistical machine translation (SMT). SMT utilizes statistical methods borrowed from corpus linguistics to analyze text samples (mainly source texts with their respective translations) and draws conclusions based on the results. The first attempts to automate the translation process date back to the 1950s, but they were deemed unfit for practical use26. The first breakthrough, after a quiet period, took place in 1990 when IBM introduced the first SMT model, called Candide. The idea behind SMT is that whole sequences of words can be translated instead of using word-for-word translation. Candide utilized a word-aligned parallel corpus, but modern SMT tools are also trained on monolingual data. Thanks to corpus linguistics, SMT is able to harness linguistic data with great efficiency and produce better and more accurate results. The quality has improved exponentially in recent years, thanks also to the advancement of the large corpora on which SMT is vastly dependent (Hutchins, 2010). Nowadays, we may all use SMT at our fingertips, from the automatic translation of e-mails and webpages on PCs and mobile devices to speech translation. Although SMT is still far from perfect, the rapid development of SMT sparked by corpus linguistics and statistical methods has borne fruit and is fit for everyday use (for instance Linguee and Google Translate for lay users on the one hand, and CAT27 tools like Trados, Across and MemoQ on the other). Apart from its practical application, corpus linguistics has also proved useful in theoretical translation studies, as corpus-based translation studies. Saldanha (2009) points out that corpus linguistics and descriptive translation studies both focus on ‘attested’ language. Corpus linguistics uses, and can provide translation studies research with, authentic, naturally occurring texts instead of intuitive, invented, isolated sentences. 
Both disciplines are interested in performance rather than competence, and they study the full range of varieties of language production, including spontaneous, non-edited language use as well as edited, usually written, language. Neither grammaticality nor translation quality is necessarily a prerequisite in this regard. This does not necessarily mean, however, that one of the criteria for compiling a specialized corpus cannot be translation

26 The US government decided to cancel its funding for machine translation in 1966 based on the ALPAC report (Automatic Language Processing Advisory Committee), which argued that machine translation would never produce high-quality translations. 27 CAT – computer-assisted translation


quality (for instance if it is to be used as a resource for assessing translations or for translator training), but the selected translations should be ‘attested and real’, rather than translations created for the purpose of translation training or translation assessment (Saldanha, 2009). Saldanha also explains the synergy of combining descriptive translation studies with modern computer-assisted corpus linguistics. She suggests that descriptive translation studies encourage moving away from the traditional comparison of translations against source texts, which originally focused on evaluating degrees of equivalence and faithfulness. The object of a descriptive approach is instead to explain translated texts in their own terms, not as mere reproductions of other works. This requires identifying linguistic patterns that are repeated across large numbers of translations, for which electronic corpora are particularly suitable. Texts and text types are studied comparatively across text corpora both in descriptive translation studies and in corpus linguistics; hence corpus linguistics is particularly useful in translation studies (Saldanha, 2009). Corpus-based translation studies have also elaborated on various other key aspects, such as the representativeness of a corpus, the advantages and disadvantages of parallel corpora, the potential benefits of using corpora in translator training, and the need to relate the results to their context (Bendazzoli & Sandrelli, 2009). Corpus linguistics and interpreting studies. Interpreting, defined as “a form of Translation in which a first and final rendition in another language is produced on the basis of a one-time presentation of an utterance in a source language” (Pöchhacker, 2004, p. 11), can also utilize corpus linguistics as a method in numerous ways, following in the footsteps of translation studies.
Corpora of interpreted speeches and their transcripts have been used in interpreting studies since its beginnings, for both research (analysis, observation, documentation, experimentation etc.) and teaching purposes. This area of interpreting research is called corpus-based interpreting studies, and it was first proposed by Baker (1993) as “the way to analyse features of translated texts as texts in their own right” (Bendazzoli, 2010). Originally, only small, manually compiled corpora were used; large interpreting corpora started to emerge only a few years ago (EPIC, ComInDat, FOOTIE, DIRSI etc.). The process of building interpreting corpora remains the same: the systematic and purposeful recording of authentic source-language and interpreted speeches and their transcription and, where applicable, annotation. However, there


are very few studies which actually make use of corpus linguistics methods, and there is much to be explored (for some examples, see Shlesinger, 2009). It is up to IT technicians, computational linguists and interpreting scholars to pool their expertise and work together to compile machine-readable corpora (Bendazzoli & Sandrelli, 2009; Bendazzoli, 2010). For instance, at the lexical level, Pöchhacker (2004) suggests that research might begin by considering such aspects as word frequency and lexical variability, move on to non-redundant items such as proper names and numbers, and pay special attention to semantic phenomena like false cognates in a given language pair, non-standard and culture-bound usage, and even ‘creative’ or humorous language use. Pöchhacker also points out that corpus-linguistic procedures have lately been applied in the text-statistical analysis of verbal features in large quantities of machine-readable text, which gave fresh impetus to inquiry into the relative orality of texts in interpreting (e.g. the quantitative analysis of voiced hesitations in the ICSB Corpus; Pöchhacker, 1994, 1995). There are also a number of difficulties arising from compiling a corpus for interpreting studies. Firstly, there is limited access to authentic data: it is difficult to obtain the collaboration of conference speakers and organizers for confidentiality reasons on the one hand, and the consent of professional interpreters, who tend to perceive scientific research as an attempt to evaluate the quality of their work, on the other. Secondly, there are many variables that must be controlled to obtain reliable quantitative results. These variables range from the type of interpreter-mediated event, which determines the role of all participants (conference interpreting vs. liaison interpreting), the interpreting mode (simultaneous vs. consecutive), and the variability of speakers (native vs. foreign language used, accent, expertise, expectations etc.)
to the written-to-spoken continuum, which includes the use of visual information, the interpreters’ performance, educational background etc. Thirdly, when researchers do manage to collect sufficiently homogeneous and representative data, they are faced with the problem of how to transcribe it – what to transcribe, which conventions to use and how to encode the data to make semi-automated or automated analysis possible. Since there are no internationally accepted standards, transcripts are usually produced according to conventions that are consistent with the research project's objectives (Bendazzoli, 2010). These and other problems associated with corpus-based interpreting studies are illustrated by Pöchhacker:


Quite apart from the thorny issue of segmenting the flow of speech into punctuation-delimited clauses and sentences, paralinguistic and nonverbal discourse features can be incorporated in machine-readable transcriptions only with great difficulty. Thus, until advances in speech signal detection and electronic text encoding (see Cencini and Aston, 2002) make it easier to overcome the written-language bias of corpus linguistics, studies of the paralinguistic features of orality in interpreting will have to rely on the intensive ‘manual’ analysis of limited scale corpora, albeit with ever more advanced technological support (Pöchhacker, 2004, p. 140). The lack of homogeneous and representative data resulting from these obstacles is not a big hurdle in interpreter teaching or evaluation, but it poses a serious problem for testing interpreting theories. There is clearly a call for more corpus-based research in this regard (Pöchhacker, 2004; Bendazzoli & Sandrelli, 2009). These pessimistic outlooks stem from the very nature of the specific requirements of using corpus linguistics in interpreting studies. Although the development of corpus-based interpreting studies (CIS) has clearly lagged behind corpus-based studies of written translation (CTS) in terms of corpus size, number of studies and pedagogical application (Bendazzoli, 2009), we should bear in mind that the very reason for this lies in the specific obstacles to using corpora in interpreting studies discussed above. This does not mean, however, that corpus linguistics can hardly find application in interpreting studies. On the contrary, its contribution to interpreting studies is evident, and it is reasonable to expect that once the specific requirements of interpreting studies are met through advances in computing, corpus linguistics will unveil its full potential in this discipline as well. Corpus linguistics and lexical/terminological studies.
In lexicography, empirical data were used long before the discipline of corpus linguistics emerged. Some of the first applications of corpus methods can be seen in the work of Samuel Johnson, who illustrated his dictionary with examples from literature, or in the 19th-century Oxford English Dictionary, which was compiled from citation slips used to study and illustrate word usage. Corpus linguistics has provided lexicography with powerful tools to analyze and identify words and phrases from many millions of words of text in a fraction of the time that would be needed to search for the terms manually. Even though modern lexicography relies heavily on corpus linguistics, it must not be forgotten that careful human consideration and


manual analysis cannot be replaced by statistical methods (Bowker & Pearson, 2002). Instead, the two can work in synergy and enhance one another in a number of ways. Firstly, by combining computer-assisted corpus linguistics and human revision, dictionaries and glossaries can be produced and revised much more quickly and accurately than ever before, thus providing up-to-date information about language. Evidence shows that pre-corpus dictionaries tended to take rare word senses into account “but missed important, common ones” (Klosa, 2007, p. 111). Nowadays, the evidence found in a large general corpus can help to identify the most common modern meaning of a word, although this must still be treated with caution: frequency alone is not enough, and lexicographers also need to look at the distribution of the word. Secondly, definitions can be more complete and precise, since a large number of natural examples are examined. Thirdly, corpus data contain a rich amount of textual information, which makes it possible to identify the usage of particular words as being typical of particular regional varieties, genres etc. This is especially useful in discourse studies and dialectology. Fourthly, large, constantly growing corpora that are today automatically compiled from websites enable lexicographers to track new words entering the language, existing words slightly changing their meanings, or the balance of their use across genres. More and more dictionary websites are changing from static “editorial” dictionaries into dynamic platforms enhanced by automatically generated corpus-driven content (Lindemann, 2013). Fifthly, finite corpora also have an important role in lexical studies in the area of quantification: they make it possible to rapidly produce reliable frequency counts and to subdivide these across various dimensions according to the language varieties in which the words are used.
And lastly, corpus linguistics makes it possible to call up word combinations rather than individual words; together with mutual information tools that establish relationships between co-occurring words, this makes the treatment of phrases and collocations more systematic than was previously possible. Lexicography still has to tackle many challenges arising from using corpus linguistics (e.g. the failure-to-find fallacy, or identifying the core meaning of a word) (McEnery, 2001; Mitkov, 2005). Corpus linguistics can be equally beneficial to terminology and terminography as it is to lexicography. Although terminology is different in scope from


lexicography, in essence they both study the words forming a language’s wordstock. Budin establishes a clear link between terminography and corpus linguistics: Today we could hardly find any terminological activity without using computational methods. Corpus linguistics is a paradigm using computational methods, that enable to do empirical discourse studies and leveraging their quantitative coverage of discourse by many orders of magnitude. Corpus linguistics has become the common denominator and joint methodological basis for a sociolinguistic discourse analysis as for terminological investigation, in particular of a socio-terminological investigation (Budin, 2010, p. 21). Corpus linguistics and grammar. Grammatical (or syntactic) studies have been, along with lexical studies, the most frequent type of research applying corpus linguistics, and it is even believed that corpus linguistics has radically changed grammar research (Conrad, 2000). Corpus analysis is used in syntactic research both for testing grammatical theory against real-life language and for empirical description and the inductive generation of theory, the two main approaches in syntax. It offers the potential for representative quantification of a whole language variety and a methodology for processing empirical data to test hypotheses. Research using corpus analysis focuses on grammatical frequencies (e.g. of various English clause types) (McEnery, 2001). Computer-assisted corpus analysis with parsing tools helps considerably in investigating large amounts of quantitative data that would otherwise be too complicated to handle. Conversely, syntactic studies have also enhanced corpus linguistics and contributed greatly to other disciplines (e.g. language teaching) by compiling parsing patterns for computer-assisted corpus analysis (Clark, Fox & Lappin, 2010). Corpus linguistics and lexicogrammar.
Lexicogrammar is a level of linguistic structure where lexis and grammar combine into one and are not regarded as strictly independent, as they are in syntactic and lexical studies. It incorporates both the formal and the functional features of syntactic structure, focusing on the wording of a text (Morley, 2000; Berber Sardinha, 2000). Corpus linguistics is said to have made a clear and major contribution to English linguistics with regard to lexicogrammar. Since English corpus linguists tended to view language lexically in order to get lexical answers to what had traditionally been treated as grammatical questions, lexicogrammar was created to offer a unifying perspective on these issues. Recently, there has been an increasing number of


studies investigating a grammatical phenomenon through the lexis involved, or studies focusing on collocational behavior with regard to semantic and grammatical properties. However, this brings with it higher requirements on annotation tools (category-based methods), which are time-consuming to develop, as opposed to a raw corpus analyzed through word-based methods (McEnery, Xiao & Yukio, 2006). Corpus linguistics and speech research. Corpus linguistics has also proven particularly useful in certain areas of speech research that were until recently solely dependent on speech elicited under artificial conditions. Speech research encounters difficulties resulting from orality similar to those in interpreting studies. Although recordings in language labs are of better audio quality, spoken corpora provide a broad sample of speech extending over a wide selection of variables (gender, age, class, genre etc.), naturalistic speech outside controlled laboratory conditions (enhanced sample authenticity), and prosodic annotations that enable speech researchers to investigate speech closely at the phonological level (McEnery, 2001; Cole & Hasegawa-Johnson, 2012). Corpus linguistics and language teaching. Corpus-based linguistic research has provided increasingly clear and accurate descriptions of native and learner languages and has furnished language teaching with new insights into language structure and use. Exploring a corpus facilitates comparing native intuitions with actual use, and shifts learning from prescriptive to descriptive methods. Large corpora not only make it possible to identify frequent patterns, but also afford enough data to examine rare ones. Apart from downloadable corpora, there is a wide selection of free general or specialized online corpora and corpus tools, not to mention that the web itself can be used as a rich corpus by language learners (Fagan, 2005; Römer, 2010; Boulton, 2010; Diemer, 2011; Krieger, 2003).
From a practical point of view, corpus linguistics has been widely used in calculating the readability of books used by teachers at primary schools (Nagy & Stahl, 2007). It is believed that introducing corpus linguistics into language teaching could be an asset because it would change the role of students from passive recipients of didactic material to active participants exploring the language, and, similarly, the role of teachers from authoritative sources of knowledge to guides and facilitators. There is a growing interest in using corpus linguistics in language teaching (Gabrielatos, 2005;


Thomas, 2009). This encompasses the use of so-called experiential learning, which is believed to be more effective than classical didactic methods. Experiential learning involves the four stages of the so-called learning cycle: concrete experience, reflective observation, abstract conceptualization and active experimentation (Kolb & Boyatzis, 2001), and it leads to greater engagement by students and consequently to better memory retention, as some studies suggest (e.g. Pérez-Sabater et al., 2011). Clearly, it requires a proper methodology28 and commitment both from students and from teachers, who need to become familiar with using corpus analysis software. Corpus analysis can provide learners with useful insights into the language and has several advantages that printed sources, dictionaries or learners’ intuition lack. Its main advantages lie in the possibility of searching easily and quickly for language utterances in context, improving research skills and validating one’s intuition against real-life language examples. These advantages apply both to general and to specialized language (Bowker & Pearson, 2002). Corpus linguistics in language teaching is regarded as a novel approach, but some teaching scholars still call for a degree of caution because language teaching has allegedly had a propensity for marketing-driven, uncritical acceptance of new methods (Gabrielatos, 2005). Corpus linguistics and discourse studies. For corpus linguistics, a discourse is the totality of texts produced by a community of language users who identify themselves as members of a social group on the basis of the commonality of their world views (Teubert, 2005). Discourse analysts usually examine small numbers of very specialized texts within the investigated discourse in order to identify particular social practices.
In addition to small corpora, discourse analysts tend to make use of large corpora as reference data to discover how far certain features are distinctive of the examined discourse and how far they occur elsewhere in the language as a whole. There is a notable history of using corpus analysis methods in the field of discourse analysis, but the amount of corpus-based research is rather small, due to the fact that this field focuses on context, which is poorly represented in the available large corpora. The amount of conversational data available has

28 “Not just what to learn, but how to learn it” (Johns, 2002, p. 110, as cited in Boulton, 2010, p. 18).


increased in recent years, and more research in this field is expected (McEnery, 2001; Weninger, 2010). Corpus linguistics and cultural studies. Using corpus linguistics in cultural studies is helpful for revealing cultural differences between two countries with the same language by investigating word usage, frequency and collocations in two comparable corpora (e.g. US vs. UK culture). It can also provide an interesting look into cultural generalities or particularities within one language, for instance by investigating the collocations of new words (e.g. identifying collocations of the word metrosexual in gender studies) (Hyland et al., 2012). This application of corpus linguistics started only a few decades ago (e.g. Hofland & Johansson, 1982; Leech & Fallon, 1992), as large electronic language corpora became available. It seems to be a promising field of study that may also contribute to language teaching, given that effective second language acquisition is to a great extent intertwined with understanding the target culture (McEnery, 2001; Cooke, 2007). Corpus linguistics and social psychology. Social psychologists are required to use naturalistic data that cannot be reproduced in laboratory conditions, while at the same time they need to quantify and test their theories on authentic speech utterances rather than rely on qualitative data. Researchers have relied on naturally occurring texts such as newspapers, diaries, reports etc. However, the main problem lies in the fact that such texts are in the written mode, and social psychologists prefer to analyze spoken language because most human interaction takes place in the spoken mode. With the advancement of electronic spoken corpora, this is no longer an obstacle, and social psychologists are now able to analyze vast quantities of data with ease using various corpus analysis tools (McEnery, 2001). Corpus linguistics and dialectology.
Dialectology, as an empirical field of linguistics comparing language varieties, tends to concentrate on experiments and on less controlled sampling than corpora, and is usually focused on vocabulary and pronunciation. However, with the increase in dialect corpora, new research possibilities emerge in which corpus linguistics can assist in expanding the knowledge of dialectology. It has already proven effective in testing two theories of language variation: Quirk et al.’s common core hypothesis and Braj Kachru’s “Circles of English”, which postulates many unique world ‘Englishes’ as varieties (McEnery, 2001; Anderwald & Szmrecsanyi, 2008).


Corpus linguistics and sociolinguistics. Sociolinguistics is also an empirical discipline; it studies the relationship between language and society, such as the social functions of language and the way people speak differently in different social contexts (Holmes, 2013). It primarily uses collections of research-specific data rather than more general corpora. Yet corpora can provide sociolinguistics with representative samples of naturalistic data that can be quantified, and thus open new areas for research. Some examples of the use of corpus analysis in sociolinguistics are the examination of masculine bias in American and British English by Kjellmer (1986) and Garnham et al.’s (1981) study of the occurrence of speech errors in natural conversational English. Such studies could also harness the potential of corpus analysis within the neighboring field of psycholinguistics: despite the fact that this field comprises various areas for experimentation (from measuring the time needed to position a semantic boundary in reading to changes in eye movements), it is believed that corpus linguistics will prove its value here in the near future (McEnery, 2001). Corpus linguistics application: conclusion. Corpus linguistics has also been used to some degree in semantics, stylistics, and historical linguistics. As we have seen, corpus linguistics proves to be useful in various linguistic fields. Léon (2005) explains the particular need for corpus linguistics: …what is called ‘Corpus Linguistics’ covers various heterogeneous fields ranging from lexicography, descriptive linguistics, applied linguistics – language teaching or Natural Language Processing – to domains where corpora are needed because introspection cannot be used, such as studies of language variation, dialect, register and style, or diachronic studies (2005, p. 36).
By listing this broad, yet not exhaustive, range of applications of corpus linguistics, the author aims to make the case that the results of corpus analysis may likewise be an asset for various end users with different fields of expertise. In this case, the output of the corpus analysis is a set of glossaries of the most frequently used terms in the Europarl corpus, which may be useful for students of interpreting and translation studies, students of political science, language enthusiasts and employees of national or international institutions alike. In the next chapter, the author will attempt to identify and discuss possible uses of the EU parliamentary glossary based on the Europarl corpus analysis.


Corpus analysis software. As mentioned on page 84, it was the advancement of computers that opened a new avenue of research in corpus linguistics. This advancement was reflected mainly in the compilation of new corpora and the development of new software. Corpus analysis software is nowadays an indispensable tool for corpus analysis. In this thesis we can barely touch the tip of the iceberg of this topic. However, the purpose of this chapter is not to give an exhaustive account of available corpus analysis software, but rather to provide an overview that the reader can use as a springboard for further investigation of the subject. Corpus linguistics software is mainly used by three types of users: lexicographers, linguistics researchers and students, and language teachers and learners. Translators are also said to use corpus analysis software, but they tend to be neglected as a user category, and their specific needs are rarely kept in mind during further development (Jääskeläinen & Mauranen, 2000). Nevertheless, the available corpus analysis tools generally fit the specific needs of translators, even though there is still much room for improvement. It is reasonable to expect that the first corpus analysis software tailor-made for translators could lead to a wide adoption of corpus analysis software in the translation industry (Wilkinson, 2011). History of corpus linguistics software. Computer-assisted corpus linguistics has gone through different stages, which can be reflected in four generations of corpus linguistics software. The first generation appeared in the 1960s and 1970s. These programs ran on mainframe computers and were able to perform only simple tasks such as word counts and concordance searches. The processing speed was very slow by today’s standards (e.g. 1,000 lines of poetry took 4 minutes to process). The second generation (1980s/1990s) was also limited in functions and speed, but could be run on personal computers.
The third generation includes the corpus analysis software further described in the next chapter. It began appearing in the 1990s and is still being improved and developed today. Software of this generation is able to perform complex tasks, offers a user-friendly interface, and, unlike the previous generation, also supports languages other than English and encodings outside the ANSI character set.


The fourth generation comprises web-based corpus analysis interfaces that enable users to search through vast corpora that are automatically compiled from the web. This compensates for the limitation of the third generation, which was technically able to process no more than about 100 million words. While the third generation suits the needs of specialized corpora, the fourth generation can fully harness technological advancement for considerably larger quantitative corpus analyses of automatically compiled general corpora. Some examples of fourth-generation software are Wmatrix, corpus.byu.edu, Sketch Engine and Google Ngram (McEnery & Hardie, 2000). Typology of corpus linguistics software. Corpus linguistics software represents the tools that can work with electronic corpora. They essentially consist of two main functions: a feature for generating wordlists and a feature for generating concordances (Bowker & Pearson, 2002), but the majority of today’s corpus linguistics software offers a variety of other functions. Corpus analysis software can be categorized according to its usage into several categories. Firstly, we distinguish between computer-based (stand-alone) tools and online tools. Computer-based tools require that both the software and the corpora be installed and stored on the user’s PC. The most popular computer-based tools are WordSmith, MonoConc, and AntConc as a freeware alternative. Online corpus tools, on the other hand, allow users to access the corpus or corpora from any computer via an online interface. In this case, both corpora and software are stored on a server. Some examples of online tools are Intellitext, Corpus-Eye and the Corpora at BYU29 (Granger & Paquot, 2012; Dillon, 2015). It is important to note that, using the advanced search functions of search engines, or search sites especially customized for corpus searches, the web itself can also be treated as one big corpus. One must only know how to search it effectively.
Secondly, corpus linguistic tools can be categorized according to whether they can be used only with a particular corpus (corpus-related tools) or independently (corpus-independent tools). Some corpus tools were designed as part of a specific corpus project

29 Detailed information about corpus linguistics resources can be found here:
http://courses.washington.edu/englhtml/engl560/corplingresources.htm
http://www.corpora4learning.net/resources/materials.html


or for a specific purpose. Examples of corpus-related tools are XAIRA and BNCweb, two highly specialized interfaces designed to access the British National Corpus (BNC). Corpus-independent tools, on the other hand, can be used for analyzing any corpus within the range of supported formats (MonoConc, AntConc, WordSmith etc.) (Granger & Paquot, 2012). Thirdly, a differentiation can be made between corpus software for using prepared corpora and software that searches the web as a corpus. The majority of corpus software is used to access a corpus that was compiled with a specific linguistic research objective in mind. However, the web also offers a plethora of linguistic data and can thus be treated as a vast corpus with very large quantities of text in many languages. Consequently, search engines can be viewed as corpus tools that also respond to a query. Although they are not designed specifically for linguistic purposes, they can prove useful when advanced search options are applied. Some specific tools have been developed which sit between the search engine and the user. They are especially customized for linguistic searches and offer options similar to standard corpus linguistic tools; these are called web concordancers. The leading system is WebCorp (Barlow, n.d.; Granger & Paquot, 2012). Fourthly, with the increasing diversity of corpus software users, it is important to differentiate between simple and advanced tools. Simple corpus tools offer only basic features such as concordancing, collocations, wordlist and keyword analysis or n-grams, and serve users who intend to perform less complex tasks and analyses. Examples of simple corpus tools are AntConc and Conc Easy. Advanced corpus tools (e.g. Sketch Engine, XAIRA etc.) are designed rather for users performing complex analyses, such as lexicographers, who deploy advanced functions such as CQL searches or word sketches (Granger & Paquot, 2012).
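The two core functions named in the typology above, wordlist generation and concordancing, are conceptually simple. The following Python sketch is my own minimal illustration of the principle, not the implementation of any of the tools mentioned; the tokenization rule is deliberately naive.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens (a simplistic rule)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def wordlist(tokens: list[str]) -> list[tuple[str, int]]:
    """Frequency-sorted wordlist, the first core function of corpus tools."""
    return Counter(tokens).most_common()

def concordance(tokens: list[str], node: str, span: int = 3) -> list[str]:
    """KWIC-style concordance: the node word with `span` words of context."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            lines.append(f"{left} [{node}] {right}")
    return lines

text = "The Parliament adopted the report. The report was debated in the Parliament."
toks = tokenize(text)
print(wordlist(toks)[:3])           # the three most frequent tokens
print(concordance(toks, "report"))  # every occurrence of "report" in context
```

Real tools add sorting of concordance lines, collocation statistics and annotation handling on top of exactly these two primitives.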
Corpus analysis software. Because corpus analysis software is a very broad topic encompassing a great variety of specialized tools, this chapter will focus only on the corpus analysis software used in this thesis: WordSmith, AntConc and ParaConc, which are only three examples of the many software packages available for carrying out corpus linguistic research. WordSmith. WordSmith is a software tool developed by Mike Scott since 1996. It has become one of the most popular corpus linguistics software packages and presumably the


most widely used (Nesselhauf, 2005; Römer & Wulff, 2010; Granger & Paquot, 2012; Müller & Waibel, n.d.). It offers a very comprehensive package of corpus analysis tools, such as a wordlist and keyword list generator, concordance search with KWIC (Key Word in Context) results, and utilities for sorting tags and file management. The new version, WordSmith 6, comes with other powerful tools such as word clouds and chargrams (Scott, 2015). AntConc. AntConc is a very popular and reliable multi-platform alternative to commercial programs. It was released in 2002 by Laurence Anthony and has since been further developed for specific use in the classroom. The freeware includes the most common functions, such as word and keyword frequency generators, a concordancer with KWIC, tools for cluster and lexical bundle analysis, and a plot tool analyzing the distribution of a word throughout a text. Unlike WordSmith, which consists of three separate programs for the wordlist generator, the keyword list generator and the concordancer, AntConc integrates all tools within a single program and is installation-free. AntConc can be extremely effective for working with small specialized corpora, but the author acknowledges its limitations in the fast processing of large corpora and in handling annotated data in HTML/XML format (Anthony, 2005). ParaConc. ParaConc is a commercial bilingual and multilingual concordance program released in 1999 by Michael Barlow. It is particularly useful for contrastive analyses, language learning and translation studies/training because it can display aligned translations of searches in bilingual and multilingual corpora in a KWIC layout. A successful search depends on the presence of alignment segments in the corpora. ParaConc can also identify so-called hot words, displaying a ranked list of the most common translations of a particular word found in the corpora. It also offers frequency statistics, e.g. corpus and collocate frequency, and more advanced search options. Discussion.
Although corpus linguistics as an empirical method should provide great replicability and verifiability of research results, practical experience shows that this is not always the case. Different software can yield different results even with the same settings. The reasons behind these discrepancies are complex, but the most common one worth mentioning is that different software deals with words differently. For instance, Anthony (2013) points out that AntConc counts shortened words with apostrophes (e.g. we're) as two words, in contrast to MonoConc and WordSmith, which mistakenly treat them as
one single word. Corpus linguists should be aware that the replicability and verifiability of their studies depend not only on standardized settings, but also on the software tools used. It is important to choose the right software tool based on the research scope and the characteristics of the investigated corpus. Anthony draws an interesting analogy between corpus linguistics and astronomy, which has a wide range of research scopes, from individual planets and their trajectories to the observation of solar systems and whole galaxies, and which applies the most appropriate tool to each research scope (Anthony, 2013). Main features and methods used in corpus analysis software. In this chapter, we will briefly introduce the main tools used in corpus analysis. Corpus linguists have access to a range of techniques that can be implemented in the analysis of a text. The quantitative corpus analysis of a raw corpus mainly includes computing word frequency and keyword frequency lists, which will be given more attention due to the scope of this thesis. These are usually compared to results obtained from other corpora, usually larger ones representing a certain norm. The qualitative corpus analysis, on the other hand, includes more specific searches for a concrete word in context. This involves the application of corpus linguistics tools such as concordance and collocation search, n-grams or plot search. The word list and the keyword list, as the main tools used in our corpus analysis, will be described in more detail than the other corpus analysis tools. Word frequency lists. Generating a word frequency list (also referred to as a wordlist or frequency-sorted word list) is one of the easiest and fastest ways to analyze a corpus using corpus linguistics software, and usually the first thing linguists tend to examine. However, there is much to know behind a seemingly easy task consisting of a few clicks in a program. 
A wordlist is the basic foundation for corpus analysis and has long been part of the standard methodology for exploiting corpora. Sinclair (1991, p. 30) outlined that “Anyone studying a text is likely to need to know how often each different word form occurs in it”. Archer defines frequency word analysis as the “construction of word lists, using automatic computational techniques, which can then be analyzed in a number of ways, depending on one’s interest(s)” (Archer, 2009, p. 2).


In order to understand the purpose of a wordlist, we should first elaborate on the meaning of word frequency. Word frequency is a) the placing of statistical data about occurrence on language, b) an instantiation of the claim that “linguistics is the scientific study of language” (all theoretical claims can be demonstrated on statistical results), c) a promise of precision and objectivity in linguistics, although the outcome might also be imprecision and relativity30, d) statistical information that needs interpretation through contextualization, whence the relativity and comparison, and e) not a science but a methodology, which lends itself to replicability (Archer, 2009). Frequency word lists can be used for a variety of purposes, for example for determining whether a collection of texts was written by the same author, whether the most frequent words suggest a potentially meaningful pattern, or for language teaching (Archer, 2009; Smith, Kilgarriff & Sommers, 2008). Scott, the creator of the WordSmith corpus analysis tools, also mentions some other corpus linguistics reasons, such as studying the type of vocabulary used, identifying common word clusters, comparing the frequency of a word in different text files or across genres, comparing the frequencies of cognate words or translation equivalents between different languages, and getting a concordance for words in the list (Scott, 2015). Many dictionaries also directly or indirectly make use of statistical information about word frequency (Rayson, 2003). Though the field of corpus linguistics seems to be relatively new, the review of literature suggests that the first large word frequency count was published by Kaeding in 1897/98, who conducted a manual lexicometrical study of a corpus with approximately 11 million running words. It was an ambitious project that took several years and required a considerable workforce. 
Similar research was reported to continue throughout the 20th century before computer-assisted corpus linguistics was introduced (Graham, 2008).

30 This does not happen because words would be miscounted, but because of how the frequencies relate to use in the English language as a whole. A word might be identified with a high frequency not because of its wide usage in the language as a whole, but because it appears frequently in a much smaller number of texts within the corpus. To correct this, range or dispersion statistics should be applied, but this still leaves room for false interpretation of statistical data (Rayson, 2003).


The early research into word frequency demonstrates a particular effort not merely to count words, but to categorize them semantically and to give some practical value to the created word lists (e.g. West’s General Service List of 1953 for ESL learners). With the advancement of computer technology and the subsequently increasing availability of corpora, corpus linguistics research has become incomparably faster and easier. This development has spurred great interest in the venerable topic of word frequency. However, the vast increase in the popularity of early lexicometry raised the quantity of research far more than its quality. Preller (1967) called for more valid, up-to-date lists compiled by more objective procedures, which implies a lack of standardized methodology that might have resulted in inconsistent methods and consequently inaccurate study results. Bearing this peril in mind, we strove for the highest achievable accuracy in generating word lists of the Europarl corpus, and for this reason we applied the most standardized methods, as described in the practical part of this thesis. A wordlist can be arranged alphabetically, in order of first occurrence or in frequency order, and it can contain information on the word rank, the word frequency, the total number of running words (tokens) and the total number of word types. If necessary, a wordlist can be abridged by applying a stop list, a control list that excludes from the generated wordlist all words it contains. A stop list is used for filtering out undesired words such as function words (Bianchi, 2012). In computer-assisted corpus linguistics, a wordlist is usually stored electronically in one of various wordlist formats (e.g. *.lst in the WordSmith software). Alternatively, it can also be stored in a plain *.txt file or in an Excel table. The majority of corpus linguistics software supports both import and export of the above-mentioned formats. 
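To make the procedure concrete, the steps described above (tokenizing, counting running words and types, and applying a stop list) can be sketched in a few lines of Python. This is an illustrative sketch, not the method of any particular tool; the tokenization rule and the sample sentence are our own assumptions.

```python
from collections import Counter
import re

def build_wordlist(text, stop_list=None):
    """Tokenize a raw text and return (rank, word, frequency, percent) rows."""
    # Lowercase tokenization on word characters; the apostrophe is kept so
    # that "we're" stays one token (different tools treat this differently).
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    total = len(tokens)                      # running words (tokens)
    counts = Counter(tokens)
    if stop_list:                            # filter out undesired words
        for word in stop_list:
            counts.pop(word, None)
    rows = []
    for rank, (word, freq) in enumerate(counts.most_common(), start=1):
        rows.append((rank, word, freq, 100.0 * freq / total))
    return rows, total, len(counts)          # rows, tokens, types

text = "The Parliament votes. The Parliament debates the budget."
rows, tokens, types_ = build_wordlist(text, stop_list={"the"})
# with "the" filtered out, "parliament" becomes the top-ranked word
```

The same frequency-sorted rows could then be exported to a plain *.txt or spreadsheet file, as mentioned above.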
There is a great availability of various downloadable wordlists, many of which are free (e.g. the BNC wordlists). The main function of a wordlist is to give statistical information about the words used in the corpus. For illustration, a wordlist generated in WordSmith 6 consists of the following categories for each word: the word rank, the number of all occurrences, its frequency as a percentage of all the running words in the corpus, the number of texts in which the word appears, the percentage of texts in which the word appears, and all lemmas found for the respective word. Wordlists can
also be compiled from texts within a particular language variety, for instance computer science or psychiatry, or further classified and categorized by POS. A wordlist without lemmatization and POS-tagging provides rough information that does not take into account issues such as polysemy, homonymy and different word classes. Consequently, untagged entries must be manually checked for differentiation in context (e.g. the word bank as the bank of a river or as a financial institution) (Francis & Kučera, 1982; Leech, 2001; Rayson, 2003; Bianchi, 2012). Generating a wordlist is a prerequisite for creating a keyword list. Keyword list. Keyword analysis is a very popular statistical procedure used mainly in corpus linguistics. The keyword list is the result of comparing two word frequency lists using computational statistical techniques (Archer, 2009). This comparison compares the frequencies of each word in the study corpus and the reference corpus and shows unusually high or low word frequencies in the investigated corpus compared to a certain norm (Berber-Sardinha, 2000). It gives the researcher an insight into the similarities and differences of both corpora and answers questions such as what topics are discussed in the investigated corpus. To understand the purpose of a keyword list, we should first define the term keyword as it is used in corpus linguistics. A keyword in this sense is something different from a topic word or an important word, which are generally used as synonyms for keywords, because in corpus linguistics the keyness of a keyword is defined by frequency. A word is a keyword “if it occurs in a text at least as many times as a user has specified as a minimum frequency, and its frequency in the text when compared with its frequency in a reference corpus is such that its statistical probability as computed by an appropriate procedure” (Baker, 2004, p. 346). 
Another widely accepted definition was introduced by Scott: “A key word may be defined as a word which occurs with unusual frequency in a given text. This does not mean high frequency but unusual frequency by comparison with a reference corpus of some kind” (Scott, 1997, p. 236). This also implies that a word can have an unusually low frequency compared to a reference corpus. Such words are called negative keywords. Paquot (2009) proposes a modest amendment to the definition of a keyword:
Keywords of a specific corpus are lexical items that are evenly distributed across its component parts (corpus sections or texts) and display a higher frequency and a wider range than in a reference corpus of some kind (Paquot, 2009, p. 19). Earlier writing outside the field of corpus linguistics that referred to keywords focused intuitively on words believed to embody important concepts reflecting societal or cultural concerns, also referred to as distinctive words. This traces back to Firth, who introduced the concept of focal and pivotal words in 1935, and to Williams’ book on keywords. It is important to note that these were not always single words, but also phrases or collocations that together bear a meaning (Rayson, 2003). But even in computational linguistics, keywords need not always be the result of computational statistical analysis. For instance, Stubbs introduces the term cultural keywords as “words which capture important social and political facts about a community” (1996, p. 172). He puts the case for manual identification through intuition combined with a systematic method of searching. As far as cultural keywords are concerned, there is also a broader definition of a keyword that is generally shared outside corpus linguistics. The definition comes from Williams, who paved the way for a research tradition in cultural analysis. He defines keywords as “significant, binding words in certain activities and their interpretation; they are significant, indicative words in certain forms of thought” (Williams, 1964, p. 13). This definition could lead us to the broad generalization that all of the most frequent words excluding function words are keywords for a certain text. Still, as mentioned above, keywords in corpus linguistics are generally understood as words identified by keyword analysis in corpus linguistics software. There is no claim that statistically identified keywords would match those selected by human readers. 
This does not mean, however, that either the manual or the automated approach is inferior to the other, or that the two could not complement each other. Rayson points out the usefulness of keyword analysis in corpus linguistics, stating that the linguistic features worthy of microscopic analysis requiring careful human consideration can be suggested by the macroscopic study of a keyword analysis, rather than by intuition or previous research studies (Rayson, 2003). The hypothesis of this master thesis is ultimately built on the premise of such data-driven investigation. There are three types of words usually identified as keywords in statistical keyword analysis: firstly, proper nouns; secondly, keywords that humans would also
identify as keywords because they are indicators of the aboutness of a particular text; and finally, high frequency words such as prepositions, function words etc., which might be an indicator of style rather than aboutness (Baker, 2004). Problems with generating keywords. First, one practical problem arises from the number of statistically identified keywords, which is usually higher than the amount a researcher can analyze and process. This can be solved by reducing the number of keywords to a certain limit (e.g. ½+1) or by obtaining a significant subset of keywords, as suggested by Berber Sardinha (1999). The second problem arises from the selection of the investigated and reference corpora. It must be taken into account that all corpora differ in a multitude of ways. For instance, one might be interested in examining a keyword list because of one particular dimension of difference, such as a difference of genre, domain or region. However, the keyword list might be dominated by other differences in which the researcher is not interested. Keyword lists tend to work best if the corpora are very well matched in all regards but the one in question. This leads us to the important issue of preparing and/or selecting the investigated corpus and the reference corpus. The general rule is that the more heterogeneous the corpus, the harder it is to identify exactly what a keyword reveals about the research corpus. Another problem might arise when the reference corpus is not big enough. Berber-Sardinha (1999) concluded from his research that a reference corpus five times bigger than the investigated corpus is sufficient for a proper keyword analysis. A larger reference corpus makes no difference, but a smaller one could cause some keywords to be left out of the keyword analysis (Kilgarriff, 2009; Paquot, 2009). 
Thirdly, if a word is the topic of one text in the corpus, it may be used in that text with high frequency and consequently be identified as a keyword, yet appear in only a single text or just a few texts. Such a word is called a local keyword as opposed to a global keyword. Statistical measures are computed on the basis of absolute frequencies and cannot account for the fact that corpora inherently vary internally; consequently, they cannot distinguish between global and local keywords (Paquot, 2009). This burstiness should be treated by applying range or dispersion measures or other statistical techniques (Kilgarriff, 2009; Gries, 2008).
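The notion of range mentioned above can be illustrated with a short sketch: counting, for each word, the number of texts in which it occurs and filtering out bursty local keywords. The corpus files, the words and the cut-off of two texts are hypothetical.

```python
from collections import Counter

def word_range(texts):
    """Return, for each word, the number of texts it appears in (its range)."""
    range_count = Counter()
    for text in texts:
        # a set so that repeated occurrences within one text count once
        for word in set(text.lower().split()):
            range_count[word] += 1
    return range_count

# Three hypothetical corpus texts: "tariff" is bursty (one text only),
# while "budget" is evenly dispersed across all three.
texts = [
    "budget tariff tariff tariff tariff",
    "budget vote",
    "budget debate",
]
ranges = word_range(texts)
min_range = 2   # arbitrary cut-off: require presence in at least 2 texts
global_keywords = {w for w, r in ranges.items() if r >= min_range}
# "tariff" is excluded as a local keyword despite its high raw frequency
```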


The fourth problem is purely statistical. It is not possible to divide by zero; therefore it is not clear what to do about words which are present in the investigated corpus but absent from the general reference corpus. There is a vivid debate on this matter, and corpus linguists have to borrow solutions from statistics. The simplest solution is to add 1 to all frequencies to avoid counting from zero, but there are more sophisticated methods to tackle this problem (Kilgarriff, 2009). The fifth problem also involves mathematics. Any ratio by which keywords occur more frequently in the investigated corpus than in the reference corpus can be relative and tricky. For instance, if we find 10 times more occurrences of a keyword in the investigated corpus than in the reference corpus, e.g. 100 vs. 10 occurrences, we may assume that this is normal. However, if we happen to find words with a frequency per million of 10,000 occurrences in the investigated corpus compared to only 1,000 in the reference corpus, the same ratio would represent a striking difference worth investigating. Therefore, a word ratio should be interpreted individually, taking into account the many factors involved (Kilgarriff, 2009). Keyword list analysis. A keyword analysis by default involves at least two wordlist files. Typically, one will be the word list generated from the examined corpus (the one under consideration), and the other will be generated from a reference corpus. Modern corpus linguistics tools can also conduct multiple comparisons of keyword and wordlist files in batches. Optionally, the researcher may choose a stop list if he or she wants to weed out some specific words, for instance the commonest words such as ‘the’ and ‘of’. The researcher may also set a minimum frequency threshold, a maximal calculating frequency, the maximum of wanted words, the maximum p-value and the statistical test (log-likelihood or chi-squared, explained in more detail in the next chapter). 
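A minimal sketch of the add-one remedy for the zero-frequency problem discussed above, combined with normalization to frequencies per million words, might look as follows; the figures are invented for illustration.

```python
def keyness_ratio(freq_study, tokens_study, freq_ref, tokens_ref):
    """Frequency-per-million ratio with add-one smoothing, so that a word
    absent from the reference corpus does not force division by zero."""
    pm_study = 1_000_000 * (freq_study + 1) / tokens_study
    pm_ref = 1_000_000 * (freq_ref + 1) / tokens_ref
    return pm_study / pm_ref

# a word occurring 50 times in a 1M-token study corpus
# but never in a 5M-token reference corpus
ratio = keyness_ratio(50, 1_000_000, 0, 5_000_000)
```

As the fifth problem above warns, the resulting ratio still has to be interpreted against the absolute frequencies involved.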
The final output of a keyword analysis is the keyword list identifying two types of keywords: positive and negative keywords. Positive keywords are those which are unusually frequent compared to a reference corpus or wordlist. Negative keywords, on the other hand, are those which are unusually infrequent in the target corpus or wordlist (Rayson, 2003). The keyword list is displayed in a table with the words, their rank, the keyness rank and additional statistical information (e.g. in WordSmith the frequency in the source text, the percentage representing that frequency, the frequency in the reference corpus, and the
reference corpus frequency as a percentage). From that point on, it is up to the researcher how much he or she can infer from the statistical data provided, depending on the examination scope, experience and knowledge, and the particularities of the investigated language (Gabrielatos, 2013). Statistical tests in the keyword analysis. As already mentioned, the keyword list is the result of a statistical comparison of the words’ keyness. “The keyness of a keyword represents the value of log-likelihood or Chi-square statistics; in other words it provides an indicator of a keyword’s importance as a content descriptor for the appeal. The significance (p value) represents the probability that this keyness is accidental” (Biber, Connor, Upton, Anthony & Gladkov, 2007, p. 138). There is a vivid debate about which statistical test is the most appropriate for a keyword analysis. There is no exact answer to this question because it depends on many factors, e.g. the corpus size or the examination scope. Paquot (2009) also points out that the results of a keyword analysis are largely dependent on a number of arbitrary cut-off points, such as the probability threshold, a minimum frequency, a minimum number of texts in which a keyword appears, or a minimum coefficient of dispersion. Interesting discussions comparing various statistical techniques are offered, for instance, by Rayson (2003), Paquot (2009) and Gabrielatos (2013). For the purpose of this master thesis, we will not dwell on the details of statistical computing, but only briefly mention the most common statistical tests. Chi-squared: introduced by Pearson in 1904 to test the independence of two variables. It is applicable to a general two-dimensional contingency table and was used broadly in corpus linguistics before the introduction of Dunning’s log-likelihood test (Rayson, 2003; Paquot, 2009). 
The main problem of using the chi-squared test in corpus linguistics lies in the fact that it computes on samples (two texts, two sets of texts, etc.) that are randomly drawn from the same population, and a random and even distribution of words does not apply to languages, which are vastly influenced by many factors (topic, register, style etc.) (Sardinha, 1999). Dunning’s log-likelihood: introduced by the author as a solution to the problem of the chi-squared test, which takes a normal distribution of words throughout texts for granted. This test uses a parametric analysis based on the binomial or multinomial distribution as a better alternative for smaller texts and is based on a contingency table. It compares the observed frequencies with the frequencies expected under the null hypothesis. The log-likelihood
test is preferable to Pearson’s chi-squared test in general (Williams, 1976, as cited in Rayson, 2003, p. 49) and is the most widely used test in corpus linguistics. T-test: The t-test looks at the difference between the means of two different groups and evaluates the significance of such differences. However, similarly to the chi-squared test, it can deliver valid results only if the data is normally distributed, which is not the case for word counts in general (Paquot, 2009). Wilcoxon-Mann-Whitney test: the WMW test is regarded as the non-parametric equivalent of the t-test for two independent samples. It differs slightly in the null hypothesis and is computed on ranked scores rather than word frequencies. The main criticism of the WMW test in corpus linguistics concerns its ignoring the actual frequency of occurrences, which means discarding most of the evidence researchers have about the distribution of words (Rayson, 2003; Paquot, 2009). The latter two tests, the t-test and the Wilcoxon-Mann-Whitney test, have been used only experimentally for keyword extraction. Paquot (2009) compared these two with the log-likelihood test that is mostly preferred by corpus linguists, concluded that they are viable options for keyword analysis, and called for implementing them as alternatives to the log-likelihood and chi-squared tests in corpus linguistics tools. Matrix: rather a software tool than a statistical test, but its novel approach makes it noteworthy in this discussion. The Matrix was developed by Rayson (2003) and is based on a macroscopic analysis informing about the microscopic level. The Matrix integrates the log-likelihood test and the comparison of corpora with word-class and lexical semantic tagging, which is, according to the author, vital in defining a practical data-driven approach. 
No corpus linguistics software has to date combined annotation-awareness with the comparison of frequency lists in the way offered by the Matrix method. The author states that its main limitations are the language and the size of the corpus that can be processed (Rayson, 2003). For this very reason, we decided not to use it in our analysis of the considerably large Europarl corpus. This leaves ample room for further research that could yield interesting comparisons. Matrix can be used for identifying differences between corpora and has been shown to have applications in studying social differentiation in the use of English vocabulary, the profiling of learner English, and document analysis (Rayson, 2003).
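For illustration, the log-likelihood keyness discussed in this section can be computed directly from observed and expected frequencies. The following sketch follows the common Dunning/Rayson formulation for a single word in two corpora; the example frequencies are invented.

```python
import math

def log_likelihood(freq_study, tokens_study, freq_ref, tokens_ref):
    """Keyness of one word via the Dunning log-likelihood statistic,
    comparing a study corpus against a reference corpus."""
    combined = freq_study + freq_ref
    total = tokens_study + tokens_ref
    # expected frequencies under the null hypothesis of equal use
    expected_study = tokens_study * combined / total
    expected_ref = tokens_ref * combined / total
    ll = 0.0
    if freq_study > 0:
        ll += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll

# equal relative frequencies in both corpora -> keyness of (nearly) zero
assert abs(log_likelihood(10, 1000, 100, 10000)) < 1e-9
# invented keyword example; values above roughly 6.63 correspond to
# p < .01 (chi-squared distribution with one degree of freedom)
ll_value = log_likelihood(120, 100_000, 30, 100_000)
```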


It is important to note that most corpus linguistics software offers only two types of statistical tests: log-likelihood and chi-squared. It is questionable whether the log-likelihood or chi-squared tests are always the best statistical measures to identify the words that are particularly characteristic of a corpus (Paquot, 2009). One must also bear in mind that statistical tests cannot be relied on in absolute terms and that no test is exempt from drawbacks (Bianchi, 2012). Keyword analysis has been used in a large number of corpus linguistics research papers; we can mention only a few of them to illustrate its applications. To gain descriptive accounts of particular genres, Tribble derived a keyword list by comparing a corpus of romantic fiction with a general corpus and found evidence suggesting spoken language features in romantic fiction, such as more frequent first and second person pronouns and fewer complex noun phrases (Tribble, 2000). In discourse studies, Fairclough compared a corpus of “New Labour” with a corpus of older Labour texts to analyze how Labour’s ideological stance had changed over time to stress business interests and competition. This could be inferred from keywords emerging in the new corpus such as partnership, deliver, deal, new and business (Fairclough, 2000). In vocabulary studies, Rayson, Leech & Hodges (1997) conducted selective quantitative analyses of the demographically sampled spoken English component of the BNC, and Rayson (2003) compared two Labour corpora of different periods to demonstrate the discourse shift in political language over time using his own statistical tool called Matrix. Keywords have also been used in many fields other than linguistics. They have proved extremely useful in information retrieval and automatic abstracting. 
It can also be inferred from internet searches that the use of keywords is the basis for web searches and the SEO ranking of a webpage in search engines. In conclusion, it is also appropriate to illustrate the difference between a word frequency list and a keyword list. Baker draws an interesting comparison, saying that a keyword list “gives a measure of saliency, whereas a simple word list only provides frequency” (2006, p. 123). Keywords are an extremely rapid and useful way of directing researchers to text elements that are unusually frequent or infrequent, helping to remove
researcher bias and paving the way for more complex analyses of linguistic phenomena. However, one must bear in mind that a keyword list provides the researcher only with language patterns, which require interpretation in order to answer specific research questions. A keyword analysis may lead the researcher to overplay differences rather than similarities, and it draws attention to differences at the lexical level, whereas semantic, grammatical or pragmatic differences remain unnoticed (Baker, 2004). Concordance search. A concordance is defined in the Collins Cobuild English Language Dictionary (1987) as “an alphabetical list of the words in a book or a set of books which also says where each word can be found and often how it is used”. Sinclair (1991, p. 32) claims that “a concordance is a collection of the occurrences of a word-form, each in its own textual environment”. Concordance programs, or concordancers, are then understood as programs capable of such searches. Concordancers produce a list of linguistic phenomena (usually words) located in a corpus, displayed in the center and shown within their context. If the corpus is lemmatized and tagged for grammatical categories, the concordance search can utilize this information to provide more advanced search options (Bianchi, 2012). This display mode is also known as key words in context, or KWIC (Rayson, 2003). Although concordancers build the core of computer-assisted corpus linguistics, they can be traced back to the pre-computer era, when concordancing required immense manual effort and labor. Some examples are the concordances of the Holy Bible compiled in the Middle Ages by hundreds of monks (Kezhen, 2015), and a complete concordance to Shakespeare in 1881 by Cowden-Clarke, which took 16 years to finish (Rayson, 2003). Interestingly, modern concordancing was born from the vision of Roberto Busa, a Jesuit priest, and Thomas J. Watson Sr., the pioneer of modern computing31.

31 Busa had a scholarly interest in the linguistic examination of the medieval philosopher St Thomas Aquinas. He is said to be the first person to produce a machine-readable corpus (10,000 sentences). Busa wrote one sentence per card and indexed those containing the word in. However, he soon found out that he needed to improve his means of searching if he was ever to study anything other than the word in. Therefore, he went to see the IBM CEO Thomas Watson, who suggested that he transfer the sentences to punch cards, which allowed early IBM computers to search through the corpus and retrieve results on a word-by-word basis (McEnery & Wilson, 2001).
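The KWIC display described above, with the node word centered in its textual environment, can be mimicked with a simple sketch; the sentence and the node word are our own illustration.

```python
def kwic(tokens, node, width=3):
    """Return key-word-in-context lines for every occurrence of `node`,
    with `width` tokens of context on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # right-align the left context so node words line up vertically
            lines.append(f"{left:>30}  [{tok}]  {right}")
    return lines

tokens = ("the committee adopted the report and the council "
          "rejected the report yesterday").split()
concordance = kwic(tokens, "report", width=2)
for line in concordance:
    print(line)
```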


Concordancing is very useful in both monolingual and multilingual corpora because it gives an important insight into context, and it is useful for lexicologists, language teachers and translators alike (Rayson, 2003; Wilkinson, 2011; Lamy & Klarskov Mortensen, 2012; Kezhen, 2015). Collocations / n-grams. Linguists argue that not single words, but word phrases constitute the true vocabulary of a language (Teubert, 2004). The basic and famous maxim provided by Firth (1957), one of the most prominent British linguists, is “You shall know a word by the company it keeps!” With this axiom, he points out that some words are more likely than others to be found together and to form a common meaning. For such habitual word combinations, Firth coined the term collocations. It is expected that a phrase as a whole carries more information than the sum of its individual components (Wang, McCallum & Wei, 2007). Collocations can be seen as a result of language evolution. They share three main characteristics: firstly, their meaning cannot be directly derived from the meaning of their parts (non-compositionality, e.g. kick the bucket); secondly, components of a collocation cannot be freely substituted with words bearing similar meaning (non-substitutability, e.g. yellow wine instead of white wine); and thirdly, they cannot be freely modified with additional words or through grammatical transformations (non-modifiability, e.g. apple of your eye cannot be changed to apple of your beautiful eye). Additionally, they exhibit some commonalities, such as arbitrariness (it is not clear why pick up means learn quickly), domain dependency (e.g. interest rate), and a tendency to form recurrent and cohesive lexical clusters, which means that the presence of one component strongly suggests the rest of the collocation (United could imply Kingdom or Nations, but less likely united tribes, despite the fact that in some contexts it would be quite correct) (Anagnostou & Weir, 2007). 
Simply speaking, collocations represent a way of expressing ideas in a language. This poses a great difficulty for second language speakers because their language output can never sound natural unless they use the right collocations. The main characteristics of collocations lead us to their definition. Collocations in a broader sense are understood under, and named with, various terms, such as
multi-word expressions, multi-word units, n-grams32, idioms and co-occurrences, even though some distinctions between these are to be made. A satisfactory general definition of collocations was given by Choueka (1988): A collocation is a sequence of two or more consecutive words, that has characteristics of a syntactic and semantic unit, and whose exact and unambiguous meaning cannot be derived directly from the meaning or connotation of its components. Semantics offers a classification of collocations according to their meaning, in which two types are distinguished: lexical collocations and empirical collocations. Lexical collocations represent a series of more or less transparently fixed expressions, ranging from idiomatic expressions/set phrases (e.g. a school of fish) to multiword expressions (e.g. credit card). The term empirical collocations refers to the general fact that some words tend to appear more frequently than others in the same linguistic environment (Bianchi, 2012). Bianchi also summarized different perspectives on collocations that help in determining the importance of collocation in corpus linguistics: ... From a textual point of view, “collocation is the occurrence of two or more words within a short space of each other in a text” (Sinclair, 1991, p. 170). From a statistical point of view, it is “the relationship a lexical item has with items that appear with greater than random probability in its (textual) context” (Hoey, 1991, p. 6). Finally, from a psychological or associative perspective, collocation is the expectations (or ‘expectancies’, in Firthian terms) that native speakers have of encountering a given word in the same environment as another one (Leech, 1974) (Bianchi, 2012, p. 49). In corpus linguistics, we understand collocations as words which occur in the neighborhood of a search word (Scott, 2015). 
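This neighborhood notion of collocation can be sketched as a simple window count; the toy sentence echoes the United Kingdom/Nations example above, and a real collocation tool would additionally apply a statistical association measure such as mutual information.

```python
from collections import Counter

def collocates(tokens, node, window=2):
    """Count words co-occurring with `node` within +/- `window` tokens."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:          # skip the node word itself
                    counts[tokens[j]] += 1
    return counts

tokens = "the united kingdom and the united nations and united tribes".split()
counts = collocates(tokens, "united", window=1)
# immediate right-hand collocates of "united": kingdom, nations, tribes
```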
Statistical measurements of collocations are very reliable (see Anagnostou & Weir, 2007), and corpus linguistics is therefore very helpful in computational lexicography as an important part of dictionary compilation,

32 In an n-gram, n stands for the number of words in the cluster containing the search word.


training machine translation33 and natural language generation, language teaching and translation training alike, as already mentioned in previous chapters. Collocation is a term used predominantly in lexicology; from the statistical point of view, it is also advisable to use the term n-grams. In computational linguistics, n-grams are sequences of words that are expected to appear in a string with a certain probability (Brouwer & de Kok, 2010). It is important to note that the main limitation of our quantitative corpus-based lexicological analysis is its inability to treat proper nouns and other complex phrases (e.g. the European Union) as single units in the wordlist and keyword list, which goes against the view described above. Nevertheless, we try to compensate for this limitation by manually identifying 100 words that will be examined in depth on a qualitative basis. These words will be provided with collocations for the political discourse, not only those identified in the Europarl corpus but also collocations listed in dictionaries (for detailed information, see page 152). Conclusion. The goal of this chapter was to establish the context, background and importance of corpus linguistics in our research project. To better introduce corpus linguistics, the first part of the chapter dealt with corpora as the source material of corpus linguistics, discussing their usage, features and characteristics. Next came a short section on EU-related corpus-like linguistic resources, including the Europarl corpus, the source corpus used in the present study. The third part of the chapter covered corpus linguistics at length: its history, methodological approaches and transdisciplinary applications, along with an insight into corpus linguistics software and its main tools.

33 SMT systems tend to treat unlearned phrases as individual words, which leads to comical translations. For instance, the awkward rendering Where she married herself, here she married herself is meant to convey she appeared out of thin air (Czech idiom: Kde se vzala, tu se vzala), and guilty basements (correctly wine cellars) is a verbatim translation of the weak collocation vinné sklípky.


The knowledge presented in this last theoretical chapter should help the reader better understand the quantitative corpus-based lexical analysis of the Europarl corpus. The procedure used in our study will be described in detail in the next chapter.


Europarl Corpus Analysis

Introduction This chapter describes in detail the lexical corpus analysis conducted on the Europarl corpus, whose primary aim was to identify the most frequent words used in the proceedings of the European Parliament. It is believed that this keyword frequency list, based on the whole Europarl corpus, will reflect the specific vocabulary used by Members of the European Parliament (MEPs) and could give an important insight into this language variety. This chapter will firstly discuss the glossary templates designed for this study; secondly, it will present the language and corpus linguistics resources used in the corpus analysis and the glossary compilation process; and thirdly, it will provide a detailed description of the corpus analysis and glossary compilation procedure to ensure the verifiability of the study. The corpus analysis can be divided into 3 phases. The first phase represents the quantitative corpus analysis of the Europarl corpus, encompassing the generation of the wordlist and the keyword list. The main output of the first phase is the keyword list, which served as the basis for vocabulary selection in compiling the terminological glossary. The second phase included manual examination of the keyword list, dividing words into various categories to better identify suitable words for the respective glossaries, and obtaining their language equivalents from reliable language sources. The third phase of the corpus analysis involved adding relevant information to glossary entries using corpus linguistic tools (concordance and collocation search, n-grams). Glossary Templates As described in more detail in previous chapters, the main goal of this master thesis is to compile a glossary of the most frequent words used in spoken EU parliamentary language. After preliminary research in terminology and lexical studies, we already had some glossary templates in mind that could be used after the keyword analysis.
These were later refined after consultations and careful consideration to optimally suit the learner's needs. In this chapter, we will present 4 glossary types and explain their purpose.


It goes without saying that different words require a different approach: some information might be redundant for some word categories but very important for others. Therefore, we decided to create 4 individual glossaries that will be made freely available through various channels summarized later. After a careful study of ISO/FDIS 10241-2, which sets standards for terminological entries, we decided to adapt our glossaries to the specific purposes they should serve. Glossary 1. The first glossary (hereafter referred to as Glossary 1) includes all 2000 words identified in the keyword list, both in alphabetical and frequency order, with their respective translations in all target languages (German, Czech, and Slovak). The purpose of Glossary 1 was solely to list all 2000 keywords without any additional statistical or lexical information. Accepted synonyms identified in lexicographic resources were inserted between slashes (e.g. translation /synonym/, /synonym/). Glossary 1 was saved in 3 different formats (an *.xls file, an InterpretBank glossary file, and a deck for the flashcard application Anki), which allows for learning the vocabulary on the go and should enhance the didactic value of Glossary 1 for language learners. The translation sources were identified in separate columns only in the *.xls file, which enables the reader to check the lexicographic source. The sources were abbreviated to save space and improve clarity; the list of all abbreviations is provided at the end of the glossary. Identifying the source of target language equivalents was deemed redundant in the other glossary formats and omitted.
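A flashcard export of the kind described, e.g. for Anki, is typically just delimiter-separated plain text that the application imports as one note per line. A minimal sketch; the field layout and the two sample entries are illustrative assumptions, not rows from the actual glossary:

```python
import csv
import io

# Hypothetical Glossary 1 rows: English keyword and DE/CZ/SK equivalents.
entries = [
    ("rapporteur", "Berichterstatter", "zpravodaj", "spravodajca"),
    ("subsidiarity", "Subsidiarität", "subsidiarita", "subsidiarita"),
]

def to_anki_tsv(rows):
    """Serialize glossary rows as tab-separated text suitable for flashcard import."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()

print(to_anki_tsv(entries))
```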

Figure 7. Example of Glossary 1 in *.xls table.

Glossary 2. The second glossary (Glossary 2) consists of selected internationalisms that may or may not be translated in a similar form across languages. It includes English keywords and various equivalents in the target languages. Each equivalent was assigned to the category of preferred, admitted or deprecated term based on the gathered linguistic information. The glossary also includes lexicographic sources (indicated as in Glossary 1) and optional notes for each language. The purpose


of this glossary was to highlight the internationalisms in our keyword list that are used in the target languages and to identify the most common false friends, which are clearly a source of confusion for many language students. The didactic value of this glossary will be particularly appreciated by students striving for correct language usage in political settings. The inspiration for creating this glossary stems from the terminological ISO standards and from the particular importance placed on correct target language usage in interpreting (see Makarová, 2004).

Figure 8. Glossary 2 template. (Columns: Categorization, Translation, Notes, Source; rows: the EN source word with its KW no., and the GE, CZ and SK equivalents, each categorized as preferred, admitted or deprecated.)

Glossary 3. The third glossary is the most comprehensive of the presented glossaries. It comprises statistical data (wordlist rank, KW rank, number of occurrences), equivalents in the target languages, synonyms and collocations in all languages, n-grams in English, the English definition and 2 examples of contextual use in the source language. In addition, lexicographic sources are provided for target language equivalents, synonyms, collocations and the English definition (as in Glossary 1). The authors carefully selected 200 terms out of the 2000 keywords for this glossary. The main criterion for selection was the complexity of the term with regard to its various collocations and meanings, and its significance as a keyword. The purpose of Glossary 3 is to offer a detailed insight into the usage of the discussed terms in all languages. The pedagogical value of this glossary lies in observing the selected keywords in context with their natural collocations, which should lead language students to a deeper understanding of how the selected terms are used.


Figure 9. Glossary 3 template

(Fields: the term with part of speech; sources for GE, CZ and SK; synonyms in EN, DE, CZ and SK; wordlist rank; keyword rank; occurrences; definition; collocations in EN, GE, CZ and SK; further English collocations; n-grams with sample collocates; context.)

Glossary 4. The last glossary, Glossary 4, takes the form of a typical glossary of specialized terminology. In the context of EU parliamentary language, it includes special terms that require certain background knowledge, not only for better understanding of the individual terms but also for grasping the whole context of the speech segment in which they occur. Therefore, we selected the following attributes as relevant for this glossary: original term, equivalents in the 3 target languages, statistical data (occurrences and KW rank), definition in English, and notes. This should allow language students not only to understand a particular term from the English definition provided and compare it with its equivalents in the other languages, but also to gain valuable insight into the statistical data about its usage in the Europarl corpus. The pedagogical value of this glossary lies in pointing out specialized terms for learning purposes that would otherwise be learnt through incidental reading at a much slower rate (Jenkins, Stein & Wysocki, 1984). Figure 10. Glossary 4 template.

(Fields: sample term with part of speech; occurrence count; KW rank; sample translations in GE, CZ and SK with their sources; definition; notes.)


Glossary overlap. Since we compiled 4 individual glossaries with different scopes, some terms may overlap. Glossary 1 includes all 2000 most frequent keywords from the keyword list, some of which are discussed in glossaries 2, 3 and 4. While Glossary 1 thus includes all terms used in the other glossaries, minimal overlap is expected between glossaries 2, 3 and 4. It might occur, however, that certain terms discussed in the glossaries are identified during the corpus analysis as frequent collocates of other terms.

Corpus Linguistics Software & Language Resources Our endeavor to obtain the most accurate, reliable and replicable results led us to use WordSmith Tools, a de facto standard in computer corpus linguistics, for the overall statistical analysis of the Europarl corpus. The reason for this decision was explained in more detail in the discussion of corpus linguistics software on page 101: the results of a quantitative corpus analysis may vary slightly depending on the software tools and the statistical test used. Keeping possible deviations to a minimum was particularly important because the output of the quantitative corpus analysis, the keyword list, served as the basis for term selection during the glossary compilation. Computing the statistical analysis was a rather easy task that did not require much time or effort. Conducting an in-depth analysis of Europarl for selected terms, however, is a vastly time-consuming procedure. Since we saw no reason not to use AntConc, a free alternative tool for computer-assisted corpus analysis, for the in-depth lexical analysis, we proceeded with AntConc despite the fact that WordSmith 5.0 and 6.0 seemed to work faster in concordance and collocation searches. This enabled us to work independently of the university computer lab and without any time restrictions. We used the AntConc n-gram, collocation and concordance searches to provide additional terminological information in Glossary 3, and occasionally WordSmith 5.0 and 6.0 to verify several doubtful results. Additionally, we used the demo version of ParaConc 1.0 for searching the parallel Europarl corpora. The other lexicographic sources can be divided into two categories: the first includes parallel corpora and the second dictionaries. However, with


the advent of information technology, many online tools can be considered hybrids of corpora and dictionaries, which blurs this attempted classification. The lexicographic sources used in the glossary compilation and introduced in the next chapter are the Europarl parallel corpora, Linguee and Glosbe in the first category, and IATE and Lingea in the second. Parallel corpora. Europarl parallel corpora. The main parallel corpora to be used in this study are obviously the Europarl parallel corpora for our target languages. We downloaded them from the official Europarl corpus website (latest release: v7 from 2011) in the language combinations GE-EN, CZ-EN and SK-EN. The statistical data for these corpora are as follows: Figure 11. Statistical data of parallel corpora for GE, CZ, and SK. Koehn. 2011. Sizes of parallel corpora after sentence aligning, tokenizing, and removing XML. Retrieved from http://www.statmt.org/europarl/.

The Europarl parallel corpora were used mainly for context search and for identifying correct target language translations and synonyms. As mentioned above, the software used for this purpose was ParaConc 1.0, a parallel concordancer for contrastive corpus analysis. We also experimented with the freeware AntPConc 1.1, but due to difficulties loading the large text files in this free alternative, we used ParaConc 1.0 instead. The demo version of the software works very well and meets all needs for an initial screening of parallel corpora: it is fully customizable, and its limitations (150 results, blocked saving and copying of results) did not prove to be an obstacle for the analysis. It also supports UTF-8 character encoding for Slovak and Czech. The Europarl parallel corpora come pre-aligned and cleared of XML tags, so no further processing was required before loading them into ParaConc. A very useful function


worth mentioning is the hot words tool, which sorts all translations of a specific word in the target language by frequency. The other frequency tools in ParaConc were not used because the analyzed parallel corpora make up only a part of the whole Europarl corpus, for which we had already generated all the statistical data needed.
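The idea behind the hot words function, ranking target-side words that co-occur with a given source word in aligned segments, can be approximated in a few lines. This is a rough sketch, not ParaConc's actual algorithm, and the two aligned pairs are invented sample data:

```python
from collections import Counter

def hot_words(aligned_pairs, source_word):
    """Rank target-side words by how often they occur in segments
    whose source side contains `source_word`."""
    counts = Counter()
    for src, tgt in aligned_pairs:
        if source_word in src.lower().split():
            counts.update(tgt.lower().split())
    return counts.most_common()

pairs = [
    ("the committee agreed", "der Ausschuss stimmte zu"),
    ("the committee met", "der Ausschuss tagte"),
]
print(hot_words(pairs, "committee"))
```

With realistic data, frequent candidate translations rise to the top of the list, while function words would normally be filtered out with a stop list.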

Other corpus sources used for identifying adequate translations were Linguee and Glosbe, both accessed through their online web interfaces. Linguee. Linguee is a unique online translation tool combining an editorial dictionary and a large corpus with a powerful search engine. The unique process of building the corpus is explained below: (Linguee is) A specialized computer program, a web crawler, automatically searches the internet for multiple language webpages, particularly from professionally translated websites of companies, organizations, and universities, EU documents, patent specifications etc. These pages are detected automatically, and the translated sentences and words are extracted. The texts are then evaluated by a machine-learning algorithm which filters out the high quality translations for display. This system is capable of autonomously learning new quality criteria to tell apart good translations and bad ones…. Through this training process, our algorithm is continuously learning to find thousands of such correlations and reliably extract the best translations autonomously. Our computers have already compared and evaluated more than 10 trillion sentences. However, at the end of the day, just the top 0.01 per cent, i.e. 100 million of the translated sentences, were retained (About Linguee, 2015, “Where does the text content come from?” para. 1). Linguee was founded by the former Google post-doctoral researcher Dr. Gereon Frahling and Leonard Fink in 2008. Since then, the project has grown rapidly in both quality and quantity (due to a constantly improved search algorithm and user interface, the development of an offline application for mobile devices, the addition of 218 new language pairs including Slovak and Czech in 2014, etc.).
Apart from the precisely tuned automatic corpus building, numerous editors are constantly enhancing the dictionary, adding new entries and correcting existing ones, in order to provide a reliable and comprehensive editorial dictionary (Linguee, 2015).


Glosbe. Glosbe is also a hybrid between a corpus and a dictionary. It was created in 2011 and draws on both open-source free databases and user contributions (corpora, translation memories etc.). It aspires to become the biggest online dictionary, and its greatest advantage is the sheer number of available languages: it boasts around 7000 languages in various combinations, covering the vast majority of existing languages. However, unlike Linguee, Glosbe has been developed and improved voluntarily by the online community, much like Wikipedia; hence, without paid professional editors, the quality of its dictionaries depends heavily on the contributed corpora and translation memories (Glosbe, 2014)34. Dictionaries. There is a plethora of comprehensive dictionaries freely available online for the most popular languages (e.g. for English and German the Duden Wörterbuch, the Cambridge and Oxford dictionaries, etc.) that could be used in our lexicographic search for correct translation equivalents. Unfortunately, this is not the case for less common languages such as Slovak and Czech. Although freely available online resources for these languages have experienced a boom in the last few years, the great expansion of multilingual corpora has not been reflected in the quality of free online dictionaries. A possible explanation is that offering dictionaries online for free is not profitable for Czech and Slovak companies due to high development costs and low revenues from online advertising: they have a much smaller online target audience, and thus lower revenue from online marketing, than companies developing dictionaries for the more common language combinations with millions of visitors daily. Instead, the companies offer free online dictionaries limited in size and use their websites to promote the full versions of their products35. Lingea.
The most notable quality dictionaries for Slovak and Czech are developed by the Czech company Lingea s. r. o. The company, founded in 1997, states that it has developed the most comprehensive dictionaries on the market, available in both

34 https://glosbe.com/
35 Free online web dictionaries at www.slovnik.cz / www.slovnik.sk contain about 50% of the database of the full version of PC Translator 2010, developed by Teos. Free online web dictionaries at www.slovniky.lingea.cz and www.slovniky.lingea.sk contain 30% of the database of the Platinum Lingea dictionaries.


printed and electronic versions. The Platinum dictionary contains 110,000 English keywords, over 100,000 Slovak/Czech keywords, and 155,000 idioms and phrases (Lingea, 2015a, 2015b). After discussing our glossary project with the sales department, we were granted one month of free web access to the full version of the dictionary. In this master thesis, we used their German-Czech, German-Slovak, English-Czech and English-Slovak Platinum dictionaries as the preferred dictionary source for Czech and Slovak, next to IATE. IATE. IATE (“Inter-Active Terminology for Europe”) is the EU's inter-institutional terminology database. IATE has been used in the EU institutions and agencies since summer 2004 for the collection, dissemination and shared management of EU-specific terminology. The project was launched in 1999 with the objective of providing a web-based infrastructure for all EU terminology resources, enhancing the availability and standardization of the information. IATE incorporates all of the existing terminology databases of the EU's translation services into a single, interactive and accessible interinstitutional database (IATE, 2015)36. IATE enables searching for a particular term in two or more languages simultaneously. In this introductory chapter, we introduced the design and functions of the individual glossaries and discussed the main lexicographic sources used in the glossary compilation. In the next chapters, we will provide a detailed description of the quantitative corpus-based lexical analysis of the Europarl corpus and discuss individual contextual findings. Regarding the identification of lexicographic sources in the glossaries, it should be noted that all sources are identified by assigned numbers and listed at the end of each glossary.

Phase 1: Quantitative Corpus Analysis of the Europarl Corpus Files Preparation & Processing. For the purpose of the analysis, we downloaded the latest version of the Europarl corpus (source release v7 from May 15th, 2012) and proceeded with the English monolingual corpus, which consists of 9672 text files (*.txt) with a total size of 346,100 kB. The length of the individual files varies broadly due to the special requirements for sentence alignment used in the multilingual Europarl subcorpora and due to

36 http://iate.europa.eu/


the data mining procedure used in creating the Europarl corpus (see Koehn, 2005). The corpus can be merged into a single text file with TXTcollector (Bluefive software), but since most corpus software can analyze numerous text files simultaneously, this step is not necessary. To keep the results of the keyword list analysis as standardized as possible, we decided to use WordSmith, presumably the most widely used software for corpus analysis (Nesselhauf, 2005; Römer & Wulff, 2010; Granger & Paquot, 2012; Müller & Waibel, n.d.), with the log-likelihood statistical test, also preferred in corpus linguistics (Paquot & Bestgen, 2009). Because WordSmith is a commercial product and the demo version of the latest release (WordSmith 6.0) lists only the first 100 entries in its wordlist and keyword list generators, we had to rely on the older version 5.0, preinstalled in a computer lab at the Centre for Translation Studies at the University of Vienna. To our knowledge, the latest version, 6.0, had not been purchased by the University for academic purposes at the time of the analysis. WordSmith 6.0 might offer some new functions, but in essence the older version performs the statistical corpus analysis equally well. Furthermore, WordSmith 5.0 proved faster at processing and searching the vast amount of data in the Europarl corpus than WordSmith 6.0 and AntConc 3.4.3. Nevertheless, this small observation has its limitations, as hardware differences (university PC vs. high-end notebook) were not considered. Wordlist generation. After feeding WordSmith 5.0 all the files of the Europarl English monolingual corpus, we generated a wordlist using the lemma list by Someya available on the WordSmith Tools webpage. It currently contains 40,569 words (tokens) in 14,762 lemma groups. The author compiled the lemma list in 1998 and states that it “is still far from being complete”.
This is a limitation that must be accepted, given that creating an exhaustive customized lemma list for the Europarl corpus would be vastly time-consuming. As we found out later, some keywords appeared mistakenly several times in the generated keyword list in various lexical forms because they were not included in the lemma list. The generated wordlist contained 65,519 unique words. Each lemmatized word appeared with a general count equal to the sum of all inflected occurrences identified for the respective lexeme, while its inflected forms also kept their own positions in the word order. For instance, as indicated


in figure 12 below, the function word “be” with its inflected variations be [494970], am [70150], are [413451], been [147678], being [59314], is [931007], ´m [7095], was [124640] and were [56200] appears in position 2, but we also find the inflections is and are in positions 9 and 18. It appears that lemmatization helps identify the basic form of a word (lemma), but still leaves all inflections of the respective word in the wordlist.

Figure 12. Generated wordlist in WordSmith 5.0, opened and viewed through the demo version of WordSmith 6.0.

Legend from the WordSmith Tools Manual:
1. the word
2. its frequency
3. its frequency as a percent of the running words in the text(s) the word list was made from
4. the number of texts each word appeared in
5. that number as a percentage of the whole corpus of texts
(WordSmith Tools Manual, 2015, “WordList display”, para. 1)
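The lemmatization behaviour described above, where a lemma's count is the sum of its inflected forms while the forms keep their own entries, can be sketched as follows; the one-entry lemma list is a toy stand-in for Someya's list:

```python
from collections import Counter

# Toy lemma list: lemma -> inflected forms (abridged stand-in for Someya's list).
LEMMAS = {"be": ["am", "are", "is", "was", "were", "been", "being"]}

def lemmatized_wordlist(tokens):
    """Frequency list in which each lemma's count sums its inflected forms,
    while the inflected forms keep their own entries (as observed in WordSmith)."""
    counts = Counter(tokens)
    for lemma, forms in LEMMAS.items():
        counts[lemma] = counts.get(lemma, 0) + sum(counts[f] for f in forms)
    return counts

tokens = "this is a test and these are tests that were run".split()
wl = lemmatized_wordlist(tokens)
print(wl["be"], wl["is"], wl["are"])  # → 3 1 1
```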

Keyword list generation. As mentioned in the theoretical part on keyword lists on page 107, keyword lists are used to identify unusually high frequencies of words in comparison with some norm. The standard procedure in all corpus analysis programs is first to generate a frequency wordlist from the corpus files and second, to compare the


wordlist with one or more other frequency wordlists, usually generated from a much larger corpus or corpora that should represent a certain norm. The vexing question is what this norm should really be, since it may vary in scale and focus, from typical words used in essays written by second language speakers of English to a more general wordlist comprising texts from written or oral sources of different styles and natures, all balanced in a careful linguistic fashion. The main objective of our corpus analysis was to create the most accurate keyword list that would reflect the specific register used in the European Parliament. We intended to use the generated keyword list to create a glossary of spoken EU parliamentary language that could be used as a didactic tool in various fields of specialization by non-native English speakers. Therefore, we deemed a frequency list based on a corpus representing general British English, with a carefully balanced selection of texts from various styles and registers, the best option. We identified the British National Corpus (BNC) as the most suitable source for this comparison. The BNC contains up to 100 million orthographic words and is one of the most important corpora used in computational linguistics. It represents a wide cross-section of British English from the latter part of the 20th century in written (82%) and spoken (18%) form. The written part includes, for example, extracts from regional and national newspapers, academic books and essays, popular fiction, memoranda etc., whereas the spoken part comprises orthographic transcriptions of unscripted informal conversations (recorded by volunteers of different ages, regions and social classes in a demographically balanced way) and of spoken language collected in contexts ranging from formal business or government meetings to radio shows.
However, the exact ratio of written to spoken sources remains unclear: the website states that BNC World comprises 90% written and 10% spoken texts, whereas the table below suggests that the split is approximately 82% to 18% (BNC, 2015).

Figure 13. Composition of the BNC World Edition. BNC. 2009. The BNC in numbers. Retrieved from http://www.natcorp.ox.ac.uk/corpus/index.xml?ID=numbers.


Regardless of the exact ratio, the underrepresentation of spoken language in the BNC might seem a significant limitation at first glance, considering that its wordlist serves as the reference for keyword identification in the Europarl corpus, which consists solely of spoken language. On the other hand, we must bear in mind that for keyword identification for didactic purposes, the quantity and variety of general English words matters more than the mode in which these words were uttered. A considerable advantage of the BNC is that scholars do not need to analyze the whole corpus (size: 4.5 GB) to obtain the word frequency list. This has already been done by Adam Kilgarriff, who has made a significant contribution to computer-assisted corpus linguistics by making an exhaustive part of his linguistic research freely available. This includes the word frequency list of BNC World, used as a springboard by many corpus linguists around the world. Various features of this word frequency list, such as lemmatization and POS tags, can be used for specific purposes in in-depth linguistic analyses, but these will not be described in detail, since the topic is outside the scope of this master thesis. One issue worth mentioning is that we did not use the latest version of the BNC. The British National Corpus was compiled between 1991 and 1994. Although no text has been added since the completion of the project, the corpus was slightly revised in its second (BNC World, 2001) and third (BNC XML Edition, 2007) editions. To our knowledge, gathered from internet research in 2015, the only freely available BNC word frequency list was created by Adam Kilgarriff from the second edition, BNC World. This wordlist seems to have been distributed freely on many webpages and in manuals devoted to corpus linguistics. The official BNC website supports our findings, stating:


When we last checked (2009), none of the following had been updated to use the BNC XML Edition. The counts they provide all relate to the earlier BNC World Edition. Adam Kilgarriff has produced word frequency list for the BNC World Edition, available from his webpage. (BNC, 2009). We were also confused by the release and update dates stated at the bottom of Kilgarriff's webpage. Strangely enough, the last update stated on the webpage dates from 1996, several years before the second edition of the BNC (BNC World, 2001) was released. The only logical conclusion is that the text of the webpage was updated while the update information at its bottom remained obsolete.

Figure 14. Dates inconsistency issue. Kilgarriff. 1996. BNC Database and Word Frequency Lists. Retrieved from https://www.kilgarriff.co.uk/bnc-readme.html.

The process of creating the keyword list included downloading the BNC World frequency wordlist (4 MB zipped, 14 MB unzipped), adding the BNC wordlist to WordSmith 5.0 as a reference wordlist, loading the non-hyphenated lemma list by Someya37, and preparing a so-called stop list, whose main function is to eliminate words that the researcher does not want to include in the generated keyword list. For the keyword generation, the default settings of WordSmith 5.0 were used, as illustrated in the screen capture below.

37 The lemma list with hyphens by Someya generated obscure and faulty results.


Figure 15. Default KW settings used in WordSmith 5.0 during the keyword analysis.

We used the stop list to exclude the most frequent words such as basic nouns, adjectives, function words, etc. The reason behind this decision is that ESL learners already have active knowledge of these words, and their inclusion in the keyword list would not bring any additional didactic value; on the contrary, adding them would interfere with learning the keywords. The underlying thorny issue, however, is to define which and how many words belong to the active vocabulary an advanced ESL speaker needs in order to speak the language fluently. As discussed in more detail in the theoretical part on vocabulary acquisition, linguists and educationalists are particularly sensitive about this question. The general claim is that there is no exact answer due to the great many factors involved (Libermann, 2008). Still, as a rule of thumb, we can proceed on the assumption that a vocabulary of the 3000 most frequent words provides coverage of around 95% of


common texts (Fry & Kress, 2006). Therefore, these 3000 words should be part of the active vocabulary of advanced ESL speakers. We decided to create a stop list from the word frequency list of a BNC subcorpus for spoken English compiled by Leech, Rayson and Wilson (rank frequency order: spoken English; not lemmatized, 4841 words in total38). We believe that the most frequent words are better reflected in a spoken corpus than in a written one. After saving the *.txt file from the website, we opened the file, copied the data into an Excel table, cleared all columns except the plain words, and cleaned the list of inconsequential wildcards39, letters (36), and interjections (56). The vast majority of proper nouns (193) and numbers (47) were left in the wordlist in order not to manipulate the frequency word list extensively. The only deleted proper nouns (12) were manually identified as potential keywords in EU politics40; leaving them in the stop list could lead to their omission from the generated keyword list, which was not desirable for the Europarl corpus analysis. Additionally, we had to delete duplicates resulting from POS tagging in the BNC spoken English word frequency list. For instance, the lemma “about” appeared in the list twice – first as a preposition (position 69) and then as an adverb (position 172). Removing duplicate words (308 out of 3308) could easily be done with the “remove duplicates” tool in MS Excel 2013. Afterwards, we saved the only remaining column with 3000 words (from originally 3410) in plain text format (*.txt) and imported it as a stop list in WordSmith 5.0. For reference, please see the file BNC Spoken word frequency list.xlsx. It might be important to note that the BNC subcorpus of spoken English also contained some slang words and interjections (e.g. eh, aye, aha, er, etc.) that were not expected to occur in the Europarl corpus and were therefore also deleted from the stop list to free up space for more important words in general spoken English. The reason is

38 http://ucrel.lancs.ac.uk/bncfreq/flists.html
39 The WordSmith stop list does not support wildcards.
40 Deleted proper nouns: England, Wales, America, Scotland, Germany, United, States, Ireland, West, Thatcher, Australia and Western. The other proper nouns either did not make it into the top 3000 most frequent words (e.g. Russia, Egypt) or referred to domestic topics (e.g. Brighton, Glasgow).


the following: the Europarl corpus consists of verbatim transcripts of EU proceedings that were mined from the EU webpage, and it is evident that these transcripts were professionally proofread prior to their online release. Consequently, corpus linguists will not find slang and interjections in the Europarl corpus, due to the official status and register prescribed for released proceedings, as opposed to corpora also focused on paralinguistic features such as ELRA or EPIC. Thus, leaving interjections in the stop list would be of no use for the Europarl corpus analysis. Computing the keyword list with all the mentioned resources was an easy and quick task. Finally, the generated KW list was saved both as an *.xls file and as a WordSmith KW list file.
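The stop-list preparation described above (de-duplicating the POS-tagged frequency list and keeping the top word forms) can be sketched in a few lines of Python; the input pairs and file name are illustrative assumptions, not the actual worksheet.

```python
def prepare_stop_list(entries, limit=3000):
    """Drop POS-tag duplicates (e.g. 'about' listed as preposition and as
    adverb) while preserving rank-frequency order, then keep the first
    `limit` word forms - mirroring Excel's 'remove duplicates' step."""
    seen, words = set(), []
    for word, _pos in entries:
        if word not in seen:
            seen.add(word)
            words.append(word)
        if len(words) == limit:
            break
    return words

# A toy input in rank-frequency order, with 'about' tagged twice:
entries = [("the", "Det"), ("about", "Prep"), ("of", "Prep"), ("about", "Adv")]
stop_words = prepare_stop_list(entries, limit=3)

# The result would then be written as the plain-text stop list for WordSmith:
# with open("stop_list.txt", "w", encoding="utf-8") as f:
#     f.write("\n".join(stop_words))
```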

Figure 16. Generated KW list exported into an Excel sheet.

Legend from the WordSmith manual:
1. each key word
2. its frequency in the source text(s) which these key words are key in (Freq. column below)
3. the % that frequency represents
4. its frequency in the reference corpus (RC. Freq. column)
5. the reference corpus frequency as a %
6. keyness (chi-square or log likelihood statistic)
7. p value


8. lemmas (any which have been joined to each other) (WordSmith Tools Manual, 2015, “KeyWord list display”, para. 1)

The file can be accessed in the additional electronic resources accompanying this paper.
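Item 6, keyness, is a chi-square or log-likelihood statistic. As an illustration, a minimal implementation of the standard log-likelihood formulation (a sketch of the statistic, not WordSmith's internal code) might look like this:

```python
import math

def log_likelihood(freq_focus, size_focus, freq_ref, size_ref):
    """Log-likelihood keyness for a word occurring `freq_focus` times in a
    focus corpus of `size_focus` tokens and `freq_ref` times in a reference
    corpus of `size_ref` tokens. Higher values = more 'key'."""
    a, b, c, d = freq_focus, freq_ref, size_focus, size_ref
    e1 = c * (a + b) / (c + d)   # expected frequency in the focus corpus
    e2 = d * (a + b) / (c + d)   # expected frequency in the reference corpus
    ll = 0.0
    if a:
        ll += a * math.log(a / e1)
    if b:
        ll += b * math.log(b / e2)
    return 2 * ll

# A word distributed proportionally in both corpora is not key at all:
balanced = log_likelihood(100, 1_000_000, 100, 1_000_000)   # 0.0
```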

Phase 2: Manual Selection of Words for Glossaries from the Keyword List

Keyword list categorization. The automatic selection of keywords has already been done by the corpus linguistics software, as described in phase 1. This would imply that creating the glossaries merely requires selecting the top 1500 keywords and supplementing them with target language equivalents. However, as already mentioned in the chapter about glossary templates, the actual process of glossary compilation goes much further. It requires a careful manual selection of words for the respective glossaries. Therefore, the most frequent words had to be further categorized in order to be correctly assigned to the relevant glossaries later on. Since this process fully relies on human intuition and background knowledge, its overall objectivity cannot be asserted. Still, we have put some safeguards in place to minimize the risk and to document the whole procedure for possible replication, in the form of four filters in an Excel sheet. The first filter distinguishes whether the word is a verb, an internationalism, a proper noun, an abbreviation or an incorrect word that mistakenly appeared in the results. The second filter follows Klein’s typology of political terms (see the chapter about politolinguistics). The third filter is meant for a preliminary assignment of selected words to the respective glossaries (1, 2, 3, 4). Careful consideration of each term should provide a certain degree of reasonable objectivity to the word selection for glossaries 2, 3, and 4. The result of this manual selection can be seen in columns J-M in the keyword list Excel file. For this initial corpus screening, the EP parallel corpus for English – Slovak was used and analyzed with ParaConc using the functions “hotwords” and “collocate frequency data”, which proved to be the fastest alternative. The fourth filter was used for word omissions.
After the initial screening of the generated keyword list, it became obvious that a considerable number of words had to be left out of the final glossaries, having been judged to add no value to the glossaries and/or to have appeared in the keyword list by mistake. These left-outs are well documented in the fourth filter in five categories: 1) words that presumably appeared in the generated keyword list by mistake (e.g. Â, the, €); 2) words that appeared to be too easy, general or irrelevant (e.g. fishing); 3) proper names (e.g. Afghan); 4) politicians (e.g. Putin, Prodi); 5) repetitions or flexemes unrecognized by the lemma list (e.g. NGOs). Regarding the last category, the final keyword list for the first glossary was sorted alphabetically for another manual check of whether all unidentified lemmas were correctly marked as duplicates. More precisely, most keywords identified as verbs in the gerund or third person singular, all nouns in the plural provided they were also found in the singular, and all spelling variants between US and UK English (e.g. authorise vs. authorize) were manually marked as unrecognized lemmas and left out of the final list. On the other hand, present and past participles of verbs that form adjectives were left in the final selection of keywords. This manual check was repeated three times to reduce the risk of human error to an acceptable level. Due to difficulties with grouping lemmas, mistakes in keyword generation and a considerable number of proper nouns, the total number of omissions is rather high, accounting for 35.97% of all words in the initial keyword list of the top 3000 words. The statistics can be seen in the table below.

Table 1. Statistical results of the manual categorization.

TOTAL ANALYSED WORDS: 3000

Set 1 (initial analysis of 3000 words)
Abbreviations                          133     4.43%
Verbs                                  554    18.43%

Set 2: Klein's typology (1921 unique keywords)
institutional vocabulary (1)           187     9.73%
vocabulary of specialist fields (2)     58     3.02%
general interaction vocabulary (3)    1618    84.23%
ideological vocabulary (4)              58     3.02%

Set 3: initial categorization of glossaries (1921 unique keywords)
Glossary 1 (all words)                1500    78.08%
Glossary 2 (Internationalisms)         470    24.47%
Glossary 3 (detailed)                  132     6.87%
Glossary 4 (definitions)               142     7.39%

Set 4 (omissions)
Incorrectly generated words (1)        220     7.33%
Common words (2)                        41     1.37%
Proper names (3)                       225     7.50%
Proper names (politicians) (4)         271     9.03%
Lemmas (5)                             322    10.73%
Omissions total                       1079    35.97%
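The four-filter routing applied during the manual categorization can be illustrated with a small sketch; the field names, category labels and example rows are our own shorthand, not the actual worksheet columns.

```python
# Filter 2: Klein's typology labels; Filter 4: the five omission categories.
KLEIN = {1: "institutional", 2: "specialist fields",
         3: "general interaction", 4: "ideological"}
OMIT = {1: "generated by mistake", 2: "too common", 3: "proper name",
        4: "politician", 5: "unrecognized lemma/duplicate"}

def route_keyword(row):
    """Return the glossary (1-4) a keyword row is assigned to by filter 3,
    or None if filter 4 marks it for omission."""
    if row.get("omit") in OMIT:
        return None
    return row.get("glossary")

rows = [
    {"word": "council",  "klein": 1, "glossary": 1},
    {"word": "Putin",    "omit": 4},            # politician -> left out
    {"word": "NGOs",     "omit": 5},            # unrecognized flexeme
    {"word": "strategy", "klein": 3, "glossary": 3},
]
kept = [r["word"] for r in rows if route_keyword(r) is not None]
klein_labels = [KLEIN[r["klein"]] for r in rows if "klein" in r]
```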

Percentage of verbs identified in the top 3000 keywords. The keyword analysis showed a slightly higher percentage of verbs than is usual for GSL conversational English. We identified 554 verbs among the 3000 analyzed words, which accounts for 18.43%. Interestingly, the corpus evidence used in the ‘Longman Grammar of Spoken and Written English’ estimates that the distribution of verbs in conversation stands at roughly 12.5% (125,000 verbs in every one million words) (Biber, Johansson, Leech, Conrad & Finegan, 1999). It is important to bear in mind that our figures come from the top 3000 keywords identified in the corpus analysis, and this slight increase only suggests that a) verbs might be slightly more frequent among the top keywords than in the whole keyword list, and b) verbs obtained slightly higher keyness than nouns in the keyword generation process. It is also important to highlight that computing the exact noun-to-verb ratio, a standard procedure in quantitative corpus analysis, was not the focus of the present study. Future studies in quantitative corpus analysis of the Europarl corpus and its comparison to other corpora are therefore suggested. Such studies could confirm the generally suggested strong tie between genre/text type and the frequency of nouns and verbs, or in simpler terms, whether texts oriented towards information promote the use of nouns, while narration is rather characterized by a higher incidence of verbs (McEnery, Xiao & Tono, 2006). Since we did not primarily analyze the Europarl corpus for orality markers, we consider it appropriate to mention here the results of a comparative annotation analysis of the Europarl corpus focused on part of speech and syntactic function (the concrete statistics from this study can be seen in appendix F): The Europarl corpus, in particular, is atypical for speech, and in many regards closer to running text, most likely a consequence of it consisting of formal


monologue, with an abstract public in mind rather than an individual turn-taker. Thus, the Europarl data boasts the longest words and longest sentences, and scores highest for subordination and infinitive subclauses, as well as the rare past participle subclauses, all of which considerably complicate syntactic trees. ... While the chat corpus consistently scored high on a number of orality markers, both the Enron e-mail data and the Europarl parliamentary transcripts proved to be atypical as representative sources of spoken language data, with - for instance - a low pronoun count in the former and a very high degree of linguistic complexity in the latter (Bick, 2010, p. 727, 729).

Klein’s typology in the Europarl corpus. As mentioned in the chapter on politolinguistics, political vocabulary cannot be fully regarded as an LSP, but it resembles one in some specialist elements. The initial analysis of the top 3000 identified keywords supports this hypothesis: out of 1921 unique keywords, 85% can be regarded as belonging to the category of general interaction vocabulary. The institutional vocabulary accounts for only 9%, the vocabulary of specialist fields for 3%, and the ideological vocabulary for 3%. The rather small number of words in categories other than general interaction vocabulary could be explained by the fact that the keyword generation treated each word individually and did not take collocations into account. As a result, words that would normally group into a term belonging to a category other than the third one were identified manually as isolated words ripped out of context and categorized as general interaction vocabulary. It is difficult to estimate how the percentages for the respective categories would change, but sharp differences are rather unlikely.
Generally speaking, we could apply the Pareto principle: roughly 80% of political language consists of general vocabulary, and only 20% can be assigned to the other three categories. The full lists of the first, second and fourth categories can be found in appendix A. Here we should also point out again that the political language categorization is a product of manual selection dependent on the author’s linguistic knowledge and intuition, and thus prone to error.

Comparison of results from the keyword list and the wordlist: proper nouns. In this discussion we would like to point to several findings resulting from the comparison of the generated wordlist and the keyword list with regard to proper nouns. At first glance, proper nouns are strikingly frequent in the top 3000 keywords. The reason


for this can be attributed to the statistical algorithm, which reacted very sensitively to the unusually high frequency of proper nouns in the EP corpus compared to the reference BNC corpus. In simpler terms, if words appeared rarely in the reference BNC corpus and moderately frequently in the Europarl corpus, they acquired a higher KW ranking than words with the same number of occurrences in the Europarl corpus but a higher frequency in the reference corpus. There may be two opposing views on this issue. The first view is that this is an undesirable outcome, because such words cannot be used for learning purposes and they only took the place of other, more frequent and important words that could have appeared among the top 3000 analyzed words (or the 1500 selected unique keywords, respectively). This objection cannot be denied – these words truly take the place of other more important and more frequently used words. Nevertheless, this highly sensitive keyword analysis also brings some advantages, as we will see in the next chapter about abbreviations. To tackle these unexpected results, all proper nouns were identified and left out of the final selection of 1500 keywords. The full list of these words can be seen in the corpus analysis working sheet by selecting the third category (proper names such as names of countries) and the fourth category (names of politicians) in column M in the sheet named Processing filter. The statistics gathered from the proper nouns are also worth mentioning: as we can see, nine out of ten proper nouns from the random selection would not make it into the HFWL of the EP corpus. This sends a clear message that keyword list generation may lead to some undesirable results. The problem could be tackled by setting the minimal number of occurrences in the KW settings to 500+, but such a setting could lead to leaving out other quite important, less frequent words.
If we were to create only a glossary based on high frequency words compared to a general reference corpus, it would be suitable to apply this setting. In our case, leaving out less frequent yet important words was not the intention; therefore, we deem the default sensitive setting for the keyword analysis, combined with the manual selection of undesired words, appropriate.
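The sensitivity described here can be demonstrated with a compact log-likelihood keyness calculation over hypothetical frequencies; the corpus sizes approximate the EP corpus and the BNC, and the two example words are invented.

```python
import math

def keyness(a, b, c=54_000_000, d=100_000_000):
    """Log-likelihood keyness: a = occurrences in the EP corpus, b =
    occurrences in the reference BNC; c and d are the corpus sizes."""
    e1, e2 = c * (a + b) / (c + d), d * (a + b) / (c + d)
    ll = (a * math.log(a / e1) if a else 0.0) + (b * math.log(b / e2) if b else 0.0)
    return 2 * ll

# Two words with the same 500 occurrences in the EP corpus: the one that is
# nearly absent from the BNC scores far higher - hence the proper-noun effect.
rare_in_bnc = keyness(500, 10)
common_in_bnc = keyness(500, 2000)
```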

Table 2. Statistical results of a random sample from the proper nouns identified in the analyzed 3000 words (words above 3000 that would not come to a typical HFWL list are in italics).


Word (roughly every 300th word) | KW rank | HFWL rank | Occurrences | BNC occurrences in a comparable sample of 60 million words (note 41) | BNC corpus vs. EP corpus (%)

Copenhagen     292    1736    3259    164     5.02%
Hercegovina    590   24310    1250     16     1.30%
Albanian       895    4025    1321    150    11.36%
Belarusian    1207    5760     457      3     0.59%
Azores        1497    5341     526     39     7.49%
Basel         1796    5895     438     42     9.62%
Beijing       2171    4078     851    272    31.98%
Fabra         2430    9519     163      1     0.33%
Kosovar       2727    9915     150      2     1.44%
Galician      2999    9324     170     13     7.62%

Figure 17. The distribution of proper nouns (group 3 + group 4 – names of politicians) in the 3000 analyzed words.

[Bar chart: counts of Group 3 and Group 4 proper nouns in each 300-keyword band from 1–300 to 2700+.]

41 The BNC corpus contains 100 million words, whereas the EP Corpus contains 54 million. Therefore, we took the number of occurrences from the BNC corpus and calculated 60% to match the size of the EP Corpus.


As we can see in the figure, group 4 (names of politicians) started to appear after the first 300 keywords and quickly became dominant after the first 1000 keywords. This implies that keyword analysis beyond 1000 keywords may yield undesirable results if the keyword list settings are not set to a higher minimal threshold.

Comparison of results from the keyword list and the wordlist: abbreviations and specialized vocabulary used in Glossary 4. In this discussion we would like to point to several findings resulting from the comparison of the generated wordlist and the keyword list regarding the abbreviations and specialized vocabulary that served as a basis for creating Glossary 4. Table 3 shows a very similar result to the analysis of proper nouns: eight out of ten words in this category would never appear in the HFWL of the top 3000 keywords. While in the case of proper nouns their appearance in the HFWL was deemed unwanted, the situation takes a swift turn in the case of the specialized words and abbreviations used in Glossary 4, which tend to be less frequent but carry a significant portion of meaning. Like proper nouns, abbreviations and specialized vocabulary would not normally appear among the 3000 most frequent words identified in the HFWL due to their low frequency. However, thanks to the statistical keyword analysis, they ranked higher in the keyword list because they had an even lower frequency in the reference BNC corpus. Thus we could conclude that identifying specialized terminology using keyword analysis is a double-edged sword. On the one hand, the researcher might identify words that would not normally appear in the word frequency list, but he or she must count on manually weeding out undesired words like proper nouns. Should the researcher refrain from a sensitive keyword analysis by setting the minimal occurrences limit higher, he or she risks not identifying the specialized terminology hidden in the low frequency spectrum.

Table 3. Statistical results of a manual selection from the abbreviations and specialized vocabulary identified in the analyzed 3000 words (words above 3000 that would not come to a typical HFWL list are in italics).


Word (roughly every 300th word) | KW rank | HFWL rank | Occurrences | BNC occurrences in a comparable sample of 60 million words | BNC corpus vs. EP corpus (%)

WTO          95    1025    6136      6     0.11%
OSCE        520    3244    1263      1     0.09%
GDP         692    2062    2502    488    19.49%
GMO        1135    5423     513      1     0.11%
FYROM      1246    5904     436      0     0.00%
OECD       1701    3873     930    245    26.36%
ACTA       1949    7358     276      8     2.74%
Tobin      2183    7113     298     22     7.43%
Sharia     2640    7920     237     22     9.11%
ecolabel   2787   10351     137      2     1.18%

The words for this comparison were manually selected because we did not expect some specialized abbreviations to be found in the BNC corpus. Still, many of these words were very rarely distributed in the BNC corpus, which confirms that they belong to the specialized vocabulary of an LSP. Abbreviations and specialized vocabulary seem to be fairly evenly distributed throughout the 3000 analyzed keywords, with abbreviations starting to dominate in the last 1000 words.

Figure 18. Distribution of specialized vocabulary for Glossary 4 in 3000 analyzed keywords.


[Bar chart with legend “Glossary 4 abbr. / Glossary 4 others”, showing counts in each 300-keyword band from 1–300 to 2700+.]

Comparison of results from the keyword list and the wordlist for the selected 1500 keywords. The last comparison we would like to draw is between the selected keywords, the generated HFWL and the BNC corpus. In the table below we can see a slight change from the trends observed when comparing the HFWL and the keyword list of abbreviations and proper nouns. While in their case nine out of ten words would never appear in the HFWL of the 3000 most frequent words, nearly half of the general words included in the top 1500 keywords would also appear in the HFWL. Since 85% of the identified keywords consist of general vocabulary (see chapter vocabulary acquisition), categorized here according to Klein as general interaction vocabulary, such an outcome was to be expected. The result supports Zipf’s law (see page 56) and Nation’s categorization of vocabulary (see pages 25-27), which both suggest that the most frequent words, creating the basic structures of a language, are the more common ones. Please note that the comparison with the BNC corpus in the table below is purely informative, as many words may have other semantic meanings owing to which they gain a better ranking (e.g. the word proposal in the EP corpus: mostly in the sense of a suggestion; in the BNC corpus: other meanings, such as a wedding proposal).


Table 4. Statistical results of a manual selection from the top 1500 unique keywords identified in the analyzed 3000 keywords (words above 3000 that would not come to a typical HFWL list are in italics).

Word (sample of every 100th word + the 1st keyword) | KW rank | HFWL rank | Occurrences | BNC occurrences in a comparable sample of 60 million words | % of EP Corpus in BNC corpus

proposal           4     116   77463   2226     2.87%
infrastructure   127    1112   10160    528     5.20%
label            267    3236    7818   1069    13.68%
compliance       420    1504    3940    691    17.53%
underlie         559   12678    1684    102     6.03%
deprive          695    6413    2683    177     6.60%
genocide         840    3557    1050     66     6.27%
inter            981    2734    1656    306    18.46%
realistic       1138    1965     604    988   163.61%
honour          1283    2111     676   1416   209.53%
repeatedly      1140    2525    1152    668    57.98%
repression      1591    3277     744    362    48.70%
demonstrator    1761    5797     461     45     9.72%
tasks           1928    1678     231   1939   839.45%
adjust          2098    4708     405    576   142.27%
informing       2275    4411     615    228    37.14%

Figure 19. KW rank vs. HFWL rank in a sample of 1500 keywords.


[Line chart titled “KW rank vs. HWFL rank in 1500 keywords included in glossaries”; y-axis 0–7000, series “KW rank” and “HWFL rank” plotted over the analyzed words.]

Discussion. By filtering out the 3000 most frequent words of the BNC corpus using the stop list, the keyword list generator identified all the other most frequent words and compared them to the BNC corpus, in which, as can be seen, they were mostly represented by substantially lower numbers. It is hard to say whether the HFWL or the keyword analysis is the better method for selecting subtechnical words (page 25) for general interaction in a specialized field. After a careful comparison of the HFWL to the keyword list, the reader can see that the difference in general vocabulary between the ranking in the HFWL and the keyword list starts to wane after the first 3000 words (14 out of 16 words in the sample of the 1500 top keywords are ranked within the first 6000 high frequency words). Therefore, to create a large glossary focused on general interaction vocabulary, it would be wiser either to set the minimal threshold for a keyword to 300 in a corpus of a similar size, or to use solely the wordlist generator with the stop list. Of course, this is only a speculation based on the results of this corpus analysis. In our view, the decision to use either the wordlist or the keyword list generator for creating a glossary for general interaction vocabulary in a particular LSP context depends on many factors, such as the number of words intended for the glossary, the size of the corpus, and the settings used in the wordlist and keyword list generation


process. Considering that this was a bold experiment, we cannot provide advice or guidelines about the proper approach – we merely wanted to show that it is possible to identify important lexical elements for glossary compilation from a corpus analysis, and to encourage other scholars to use it for similar purposes. Regarding the Europarl corpus analysis, in which we analyzed 3000 words and included 1500 in glossaries, we conclude that in the upper range of identified words (1500-3000), identifying the general interaction vocabulary was on the whole slightly more effective using the keyword analysis than creating a classical HFWL. Moreover, by combining the keyword analysis with the manual selection, we managed to extract important abbreviations and terms that would otherwise have been omitted from a classical HFWL.

Keyword list: target language equivalents. The selection process for the glossary set does not only consist of selecting keywords, but also of selecting the correct language equivalents to be used in the glossaries. This is an intricate task that requires consulting all the language resources introduced in the chapter about linguistic sources. We decided that it is more reasonable to collect the equivalents for each word from the language resources in one table than to search the language resources individually during glossary compilation. It is true that looking up language equivalents for all top 1500 keywords – 1500 keywords x 10 language resources x 3 entries, i.e. up to 45,000 entries – is a difficult and time consuming task. Nevertheless, we believe that the additional value created outweighs the time and effort invested in this process; in the end, each of the 1500 keywords requires a target language equivalent at least for Glossary 1.
Besides, this procedure has two major advantages. Firstly, collecting all target language equivalents from the presented language sources together provides a solid reference for deciding on the right ones. Secondly, the resulting summary of target language equivalents brings more transparency to the whole decision making process. For this purpose, a separate Excel sheet was created, with three columns assigned to each language resource for the first listed equivalent and two further synonyms, if applicable. Some language resources (e.g. IATE) provide translation equivalents for all our target languages, but some do not. This is the case for Linguee.com, which provides revised dictionary entries for German, but not for Slovak and Czech (these are only supported by a sentence-aligned parallel corpus), and for the dictionary Lingea, which


provides translations from English into Slovak and Czech, but not into German. Similarly, some words could not be found in a language resource at all (this applied mostly to IATE, which includes specialized terminology rather than general English words). In such cases, the column is marked N/A. The layout of the Excel sheet can be seen in figure 20, and the file itself can be downloaded from the supporting electronic documentation.

Figure 20. Screenshot of the summarizing Excel sheet.
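The layout of the summarizing sheet can be sketched as follows; the resource list and the helper are illustrative (three slots per resource, "N/A" where a resource has no entry), not the actual file structure.

```python
RESOURCES = ["IATE", "Linguee", "Lingea"]   # illustrative subset of the resources

def resource_row(keyword, lookups):
    """Build one sheet row: the keyword followed by up to three equivalents
    per resource, padded with empty cells, or 'N/A' for missing entries."""
    row = [keyword]
    for res in RESOURCES:
        entries = (lookups.get(res) or ["N/A"])[:3]
        row.extend(entries + [""] * (3 - len(entries)))
    return row

# Slovak equivalents as an example; Linguee has no revised entry here:
row = resource_row("proposal", {"IATE": ["návrh"], "Lingea": ["návrh", "ponuka"]})
```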

The process of selecting terms for the glossaries from the gathered language resources can be simply described as highlighting the most frequent target language equivalents. This selection was later used as the source for compiling Glossary 1, which in turn served as the main reference source for the other glossaries.

Phase 3: Completing Glossaries through Additional Relevant Information

This chapter discusses the final phase of glossary compilation, which consisted of adding additional information to glossaries 2, 3, and 4. Each of these glossaries follows a certain purpose that was explained in the chapter about glossary templates. In this chapter we describe the actual process of glossary compilation. Searching for, evaluating and adding relevant information to the glossaries was also an arduous and time consuming task requiring meticulous effort. The additional information provided in all four glossaries can be divided into three categories: statistical information (e.g. number of occurrences, word rank, etc.) obtained


from the corpus analysis, additional linguistic information (e.g. synonyms, n-grams, KWIC sentence examples), and reference information about the source of the included additional information. Each type of information has its own specific purpose. Statistical information serves as a reference for the importance of a particular word in the source text. We considered it useful to include statistical data in Glossary 3, which provides very detailed information about a narrow selection of the most frequent words. These words might have different rankings in the keyword list, yet very high importance from a lexical point of view, as they can be combined into common noun-verb phrases, expressions, etc. Furthermore, the decision was made to include basic statistical information in Glossary 4, which is intended to explain various proper nouns and terms identified in the corpus analysis. Such statistical information can give an interesting insight into basic figures obtained from the corpus analysis that would otherwise remain unknown to the target user of the glossaries. Additional linguistic information represents important linguistic knowledge that may benefit the learner for the lexical reasons mentioned in the previous paragraph. This includes information about admitted, preferred and deprecated words in Glossary 2; synonyms, collocations, n-grams and KWIC sentences in Glossary 3; and definitions with additional notes in Glossary 4. All of this should serve to highlight important connections worth remembering. Reference information is particularly important for the verifiability of the information compiled in the glossaries. References to sources are provided either as hyperlinks in the glossaries or labeled with numbers linked to their reference.

Glossary 1. The first glossary was compiled from the summarizing Excel sheet described in the previous chapter.
While this Excel sheet served as a basis for creating Glossary 1, the glossary itself was further used as the basis for the target language equivalents in glossaries 2, 3, and 4. Glossary 1 was built simply by choosing 3-4 preferred translations from the various language sources summarized in the Excel sheet and referring to their source. The words are sorted by their keyword rank, not alphabetically. The compilation of Glossary 1 required two steps: firstly, identifying the preferred translations and synonyms in the target language, and secondly, creating a format compatible with CAVL programs. As mentioned earlier, our intention was to create three formats for this glossary: an Excel sheet that serves as documentation of the English words,
their target equivalents and the language sources used, and two databases for learning purposes: one for InterpretBank and one for Anki. The exact procedure can be described as follows:

1) Preparing the Excel sheet for import: this consisted of deleting the information sources, merging the 3-4 columns for each target language into one column using the CONCATENATE function, and adding " / " to separate the words merged from the individual rows.

2) Importing the adjusted copy of Glossary 1 into Anki: before the actual import, the table had to be saved as a text file in Excel and subsequently re-saved in UTF-8 format in Windows Notepad for each language combination (EN-DE, EN-SK, EN-CZ). A combination of more than two languages (e.g. EN-DE-SK) could not be created. Once the glossary was successfully imported, it was saved in the Anki deck package format (*.apkg). Afterwards, the Anki deck was uploaded to the public database on AnkiWeb42 under the name Europarl Corpus: The top 1500 most frequent keywords in the European Parliament.

3) Importing the adjusted copy of Glossary 1 into InterpretBank: the process was very similar to importing the glossary into Anki. There was no need to create separate language combinations, as InterpretBank is capable of importing multilingual glossaries. The glossary was imported straight from an Excel sheet and saved in TermMode's exchange format (*.TMEX). We are also particularly happy to confirm that Claudio Fantinuoli, the creator of InterpretBank, agreed to share and promote our glossary on the InterpretBank website. We believe that this step will lead to the dissemination of the glossary among its target audience of students of conference interpreting, who can not only test the functionality of the program but also take advantage of learning the vocabulary in the MemoryMode (C. Fantinuoli, personal communication, December 3, 2015).

Glossary 2.
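As a supplement to the Glossary 1 export procedure described earlier: the column-merging (CONCATENATE) and UTF-8 export steps that were performed manually in Excel and Notepad could equally be scripted. The sketch below uses hypothetical column names and invented sample entries, not the actual glossary data.

```python
# Hypothetical rows from the Glossary 1 Excel sheet: an English keyword
# followed by up to four candidate translations per target language.
rows = [
    {"en": "propose", "de": ["vorschlagen", "beantragen", "anregen"]},
    {"en": "debate",  "de": ["Debatte", "Aussprache"]},
]

def merge_translations(candidates):
    """Mimic the CONCATENATE step: join the 3-4 translation
    columns into one field, separated by ' / '."""
    return " / ".join(candidates)

def export_for_anki(rows, path):
    """Write a tab-separated UTF-8 text file of the kind Anki's
    importer accepts as front/back card fields."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(row["en"] + "\t" + merge_translations(row["de"]) + "\n")

export_for_anki(rows, "europarl_en_de.txt")
```

One such file would be produced per language combination (EN-DE, EN-SK, EN-CZ), matching the two-column card format described above.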
The second glossary (Glossary 2) consists of selected internationalisms that may or may not be translated in a similar form across languages. It must be highlighted that, in identifying potential problem triggers, the focus lay on Slovak and Czech interference. German had a purely informative function.

42 https://ankiweb.net/shared/decks/


The whole procedure of compiling the glossary consisted of several steps. Firstly, all keywords that could be considered internationalisms were manually selected by assigning them to glossary category 2 during the initial corpus analysis. This manual evaluation was deliberately lenient in order to minimize the risk of omitting important words: it seemed more feasible to select some unsuitable keywords with little chance of acting as internationalisms than to risk leaving the more important ones out of the initial selection. The 384 words identified among the 1500 unique keywords in this initial selection were further divided into the following categories:

1. International words codified in the target language (Slovak/Czech) for which there is no other equivalent, and thus no peril of incorrect usage (e.g. democracy);

2. International words codified in the target language, but with a potential for semantic shift (e.g. the word assistance is suitable in some contexts in the target languages, but not in others). The speaker must be aware of the semantic differences when using these words in order to avoid misunderstandings;

3. High risk of interference or false friends. This category includes words that should not be used in an international form, but should rather be replaced by a more suitable target language equivalent;

4. Low risk of interference. These words do not pose as high a risk as those in the third category, and their usage would sound unnatural except in specialized fields;

5. Minimal risk of interference. Words that sound very unnatural when rendered as Slovak internationalisms. Still, there is a chance that a speaker uses them under high cognitive load.

The categories were revised twice by the author, and the selection of words from categories 2 and 3 (164 words) was further discussed with PhDr. Jana Rejšková from the Centre of Translation Studies at the Faculty of Philosophy and Arts of Charles University in Prague.
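The five-way risk scale above can be thought of as a set of category assignments from which the subset submitted for expert review (categories 2 and 3) is filtered. The words and assignments below are a hypothetical illustration, not the actual 384-word selection.

```python
# Hypothetical category assignments from the manual evaluation:
# 1 = codified, no risk ... 5 = minimal risk (see the scale above).
categorized = {
    "democracy": 1,
    "assistance": 2,    # potential semantic shift
    "actual": 3,        # high risk of interference / false friend
    "mandate": 4,
    "subsidiarity": 5,
}

def select_for_review(assignments, categories=(2, 3)):
    """Return the words whose risk category warrants expert
    consultation, as was done for the 164 category-2/3 words."""
    return sorted(w for w, cat in assignments.items() if cat in categories)

print(select_for_review(categorized))
```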
The purpose of the consultation was to identify which of the selected words are prone to being used as internationalisms but cannot be used in formal Czech, and/or which of them pose a risk of incorrect usage for interpreting students. Ms. Rejšková further divided all words from categories 2 and 3 into:

1) Words that can be used interchangeably in the Czech language;

2) Words that can be used as internationalisms only in certain contexts, while in other contexts they should be avoided;
3) Words that should under no circumstances be used as derived internationalisms in the Czech language.

We must express our sincere thanks to Ms. Rejšková for her kind support of our endeavor to create an effective tool for promoting correct language usage. Since Slovak and Czech are very similar languages, the result of this consultation was incorporated into Glossary 2 for both languages and served as a basis for further investigation. This investigation consisted of searching linguistic resources issued by the Ľ. Štúr Institute of Linguistics for Slovak and by the Institute of the Czech Language for Czech43. Compiling the glossary required considerable attention in the scrupulous investigation of codified, admitted and deprecated terms. The final version of Glossary 2 consists of 51 words that a) might have a different meaning in the target language, or b) should not be used as internationalisms. Appendix B contains a sample of the glossary and the full list of words. The full version of the glossary can be downloaded from the electronic link included in appendix A. The gradual weeding-out procedure applied to Glossary 2 was not used for Glossary 3, for which we intended to include only the top 100 keywords, or for Glossary 4, in which we included abbreviations from the top 1500 unique keywords.

Glossary 3. Glossary 3 is by far the most comprehensive glossary in the whole collection. Our ambition was to provide in-depth insight into the particularities of a narrow selection of words that were intuitively identified as rich in collocations. We believe that this glossary can provide specific information about word usage and consequently give greater confidence to speakers who study the glossary and take its words into their active vocabulary. Although mainly aimed at the English language (definitions, number of synonyms, N-grams, collocations and context usage), Glossary 3 provides

43 Mainly consulted sources: for Slovak, http://slovnik.juls.savba.sk (the main codified dictionary) and the journals Linguistic Journal, The Culture of the Word, and Slovak Speech; for Czech, http://ssjc.ujc.cas.cz/, http://www.neologismy.cz/, http://bara.ujc.cas.cz/ssjc/, http://bara.ujc.cas.cz/psjc/


useful information in the three other languages as well. This was achieved by consulting several linguistic resources. The selection of words for Glossary 3 went through three stages. Firstly, the words were categorized during the initial analysis of the top 3000 keywords. Secondly, the selected words were sorted alphabetically in order to compare neighboring words in their various inflections (e.g. endorsement, endorse, endorsed) and consequently to decide which part of speech would best represent each word's semantic group in the glossary. Thirdly, the selection was examined once again to narrow the number of words down to the final 100. For this purpose, a filter was set up on column R: we first marked the candidates among the top 1500 keywords with 5 and then marked the final choice with 9. These marks allowed us to swiftly change the selection during the glossary compilation process whenever some words were later regarded as less important. Column R now contains only the mark 9, which was used for the final selection of the 100 most interesting keywords from the top 1500 unique keywords. The template described in the chapter on glossary templates was used for each word in the glossary; it was slightly modified by leaving out some categories for which no reliable information could be found. For each word in the glossary, we first added the target language equivalents and synonyms from Glossary 1. We then looked up English synonyms in the online Collins dictionary, and definitions (those not relevant for political discourse were left out) and English collocations in the online Oxford dictionary. We then complemented the dictionary search by consulting the Linguee.de parallel corpus resources and the Leo dictionary. The results provided us with suggestions for the most common collocations in English and German. Afterwards, we tried to find their language equivalents in Slovak and Czech, which in fact proved to be the biggest challenge in compiling Glossary 3.
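The flag-and-filter selection on column R described above can be sketched in a few lines; the records and field names below are hypothetical stand-ins for the actual spreadsheet.

```python
# Hypothetical keyword records with the selection flag kept in a
# dedicated field (column R in the spreadsheet): 5 marks the broad
# first pass over the top 1500 keywords, 9 the final 100-word choice.
keywords = [
    {"word": "propose",   "kw_rank": 1,    "flag": 9},
    {"word": "endorse",   "kw_rank": 57,   "flag": 5},
    {"word": "stringent", "kw_rank": 1058, "flag": None},
]

def final_selection(records, mark=9):
    """Keep only the records carrying the given selection mark,
    sorted by keyword rank so the glossary order is preserved."""
    chosen = [r for r in records if r["flag"] == mark]
    return sorted(chosen, key=lambda r: r["kw_rank"])

print([r["word"] for r in final_selection(keywords)])
```

Changing the mark (e.g. back to 5) restores the broader candidate list, which is what made the selection easy to revise during compilation.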
Finding German translations was not difficult thanks to Linguee.de's large database for this language combination, but for Slovak and Czech the process proved rather difficult. We therefore applied a cross-search (English>>German>>Slovak/Czech) and occasionally had to confirm the results with Google when a common phrase for Slovak and Czech could not be found in Linguee. From our experience we daresay that working with the German-English parallel corpora on Linguee.de is, thanks to the large database, much more
convenient and effective than using it for Slovak and Czech. This collocation search was a time-consuming process, but we believe the result is worth it. For identifying N-grams, the demo version of ParaConc was used instead of AntConc or WordSmith due to its quick response time. Clearly, searching the full corpus of 54 million words with WordSmith or AntConc would be more suitable than searching the parallel Slovak-English corpus of 15 million words, but an initial comparison of all three programs showed an insignificant difference between using the whole EP corpus and the 3.5-times smaller parallel EN-SK corpus, given such vast numbers of words. Searching for words in context likewise consisted of querying the Europarl corpus in ParaConc; interesting KWIC results were further investigated in context and later copied into the glossary. We added 3-4 sentences for each word to provide a better understanding of its usage in context. The glossary also includes references to each language resource and basic statistical information from the quantitative corpus analysis (word occurrences, wordlist rank, and keyword rank). Glossary 3 does not contain basic grammatical information such as the part-of-speech category, plural forms of nouns, or present/past participles of verbs. We concluded that this basic information is known to the advanced English learner and thus redundant.

Glossary 4. Glossary 4 can be seen as a typical LSP glossary displaying the term, its translation equivalents in the target languages, an English definition, and additional information where necessary. The glossary contains 97 of the 143 keywords that were initially identified as appropriate in the first analysis of 3000 keywords. The rest of the keywords were either later assessed as inappropriate or fell outside the spectrum of the 1500 unique keywords.
The original aim was to keep the glossary compact, but we then came to the conclusion that displaying the key information on the left and additional information about the term on the right would best strike the balance between information coverage and compactness. Glossary 4 contains references to ensure better verifiability of the provided information. The glossary is also intentionally not sorted alphabetically, but by the keyword list rank of the respective words, so that the reader can start studying the most important words first. It also occasionally offers guidance on the correct usage of the selected term. The major information source for language equivalents was the IATE database, which includes admitted, preferred and deprecated words for some terms. IATE proved to be the best source
for language equivalents for the various proper nouns used in the EU context. Definitions and additional information for glossary words were gathered from IATE, from the official websites of the institutions or initiatives represented by the glossary word, and from other relevant and reliable information sources found on the internet. Glossary 4 mainly consists of abbreviations, and the vast majority of the words in this glossary would normally not appear in a high frequency wordlist. It also contains references to some obsolete terms identified in the corpus analysis and provides information about the preferred usage and the current state of affairs. Based on these results, we conclude that Glossary 4 benefited most from the glossary compilation procedure based on keyword list generation. However, a note of caution is due here, since many terms consisting of two or more words could not be identified through the keyword analysis and consequently did not appear in the results (e.g. the European Union, the European Commission etc.). Even though such terms are mostly part of the fundamental vocabulary in the political context, this limitation of corpus-based lexical keyword analysis means we cannot exclude the possibility that some less frequent, yet important compound words or terms were left out. It would be possible to tackle this issue through word cluster examination, but this procedure was beyond the scope of this paper.
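A word cluster examination of the kind suggested above could be sketched as a simple n-gram count: frequent clusters surface multiword terms such as "european union" that a single-word keyword analysis misses. The sample text below is invented for illustration, not taken from the corpus.

```python
from collections import Counter

def clusters(tokens, n=2, min_freq=2):
    """Count n-word clusters; those above a frequency threshold are
    candidate multiword terms that single-word lists cannot capture."""
    grams = Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )
    return [(g, c) for g, c in grams.most_common() if c >= min_freq]

# Tiny illustrative sample, not the actual Europarl corpus.
text = ("the european union must act and the european union will act "
        "because the commission proposed it")
print(clusters(text.split()))
```

On a real corpus, the stop list of the 3000 most frequent BNC words could be applied to the resulting clusters as well, so that pairs like "and the" do not drown out genuine terms.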

Discussion

Compiling the glossaries required considerably more time and effort than initially anticipated. Copying about 60,000 dictionary entries into an Excel sheet was a monotonous and laborious, yet unavoidable task. Similarly, creating the glossaries required meticulous effort in searching for and comparing relevant information. The biggest challenge in compiling Glossary 1 was selecting the correct synonyms; in Glossary 2, it was consulting language resources for preferred, admitted and deprecated terms (the language resources available online proved ineffective in many cases); in Glossary 3, it was the detailed search for identical collocations in all four languages together with entering a substantial amount of information; and in Glossary 4, it was striking a balance between the amount of information given about the respective term and the compactness of the glossary. In the end, the final results of our work are as follows: Glossary 1 contains the top 1500 English keywords; Glossary 2, 51 tricky internationalisms; Glossary 3, 100 manually identified
keywords of high importance; and Glossary 4, 100 abbreviations or terms related to EU political discourse. We remain confident that creating LSP glossaries from a keyword list based on a quantitative corpus-based lexical analysis is a viable, if not advisable, option, but some drawbacks suggest that there is ample room for improvement and a need for caution when using this method. Here we would particularly like to address the following issues: the up-to-dateness of this corpus analysis from the perspective of discourse and lexical analysis, lemmatization, the interpretation of the statistical occurrences of the identified keywords, and finally, the pros and cons of the keyword analysis.

Regarding up-to-dateness, the Europarl corpus consists of texts spanning a long period, from April 1996 until November 2011. The corpus could hence serve as a very useful resource for discourse analysis of European matters across various time periods. It is reasonable to expect that the main issues discussed in 1996 differed from those in 2000, which could be statistically demonstrated by applying the keyword list analysis. Our keyword analysis could serve well as a springboard for identifying topic words in the Europarl corpus. This could be achieved by taking the results of our quantitative corpus analysis, applying the manual selection of the general interaction vocabulary, and combining the selection with the stop list of the 3000 most frequent words in the BNC corpus. A further study with more focus on political discourse analysis could reveal interesting findings and is therefore suggested. However, one important thing to keep in mind here is that the keyword analysis ranks words by their statistical keyness, not by their number of occurrences. Discourse analysis researchers might therefore prefer to compare wordlists rather than keyword lists.
Discourse analysts must also take into consideration two shortcomings of the Europarl corpus: the lack of up-to-date data (December 2011 to the present) and possibly the sheer size of the corpus, whose analysis may nevertheless lead to overly general results. As for the currency of the corpus, unfortunately, no further releases are planned, since the proceedings are no longer translated into all official languages (Koehn, personal communication, April 10, 2015). Regarding up-to-dateness from the lexical perspective, we conclude that the lack of recent data (December 2011 to the present) should not pose a problem. The only concern is the obsoleteness of some terms included in Glossary 4 and, at the same time, the absence of new terms which have appeared recently or have started to be used more
frequently in political debates (ISIL, TTIP etc.). Still, we can proceed on the assumption that the learner follows the current course of events on his or her own initiative. After all, from the learner's perspective it is important first to grasp the general interaction vocabulary for active language use and then to learn the specific LSP vocabulary important for the respective topic.

As for lemmatization, we must admit that the lemmatization process did not succeed in grouping the words of the generated wordlist and keyword list as originally intended. The outcome can be regarded from two perspectives, as either positive or negative. The positive side is that since the wordlist and the keyword list did not treat such words as one group and did not list only one representative of the whole morphological family (e.g. propose, proposal, proposed), all such words with high keyness could appear in Glossary 1. As mentioned previously, Glossary 1 consists only of the word in the source language and a few equivalents in each target language. If morphologically similar words were represented by only one word from the whole group, the learner would be required to think of all the words from the same morphological group in the target language, which would go against the original intention of helping to automatize the political vocabulary. Therefore, on the one hand, there are more similar words in Glossary 1; on the other hand, they can be learned more easily with their exact target language equivalents. One could clearly object that an advanced learner should already be able to form morphologically similar words after learning only one of them, but as we mentioned in the chapter on vocabulary acquisition, what is not automatized cannot be fully applied. For this reason, learning these words by heart is advisable for advanced learners without previous experience in political language who want to dive in and acquire the general interaction vocabulary.
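The grouping that a successful lemmatization pass would have produced can be illustrated with a deliberately crude suffix-stripping sketch; a real lemmatizer would of course be dictionary-based, so this is only a toy model of the morphological families discussed above.

```python
from collections import defaultdict

def crude_stem(word):
    """A deliberately naive stemmer, for illustration only: strip a
    few common suffixes so that morphological relatives collide."""
    for suffix in ("ation", "ment", "ed", "al", "es", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: len(word) - len(suffix)]
    return word

def family_groups(words):
    """Bucket words by their crude stem, mimicking the grouping a
    successful lemmatization pass would have produced."""
    groups = defaultdict(list)
    for w in words:
        groups[crude_stem(w)].append(w)
    return dict(groups)

print(family_groups(["propose", "proposal", "proposed", "debate"]))
```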
The absence of lemmatization in the keyword list also created room for a deliberate decision about which words of the same morphological group would stay and which would not. As for the disadvantages of the unsuccessful lemmatization process, it is crystal clear that these words created an additional burden (initial categorization of the keyword list, careful examination of the alphabetically sorted keyword lists, unnecessary translation of these words etc.). It is difficult to decide whether the benefits outweigh the additional effort. In any case, we are not entitled to make a statement on this, because
the glossaries and their effectiveness are based purely on theoretical assumptions and have never been experimentally tested in practice.

As regards the statistical significance of the occurrences of the words included in the top 1500 keywords, the reader might be under the impression that, firstly, the words included in the glossaries occur so rarely that it makes no sense to include them at all, and secondly, that the selection of the top keywords should have been limited to a fraction of this amount. At first glance, the statistical data support this view. If we consider the first keyword "propose", with 77,463 occurrences in the 54 million words of the Europarl corpus, we see that the top keyword from our list appears on average once in every 696 words. The word "stringent", the 750th word on the keyword list with the KW rank 1058, appears in the whole corpus only 1392 times, which shrinks its presence to once in a vast 51,000 words on average. If we consider the average speed of spoken language to be 120 words per minute, the word would come up once every 7 hours. The last word in Glossary 1, "recourse", with the KW rank 2266 and a surprisingly higher frequency of 2342 occurrences, appears on average once every 23,000 words. At first look, these statistical results might seem tremendously discouraging for learners willing to automatize the words in the glossaries. However, one must consider the importance of these words in the overall context and their more frequent repetition in certain contexts. Perhaps more importantly, these results should be compared to the frequency of functional and fundamental words, which is lower than one might expect. For instance, the most basic word "the", an article almost indispensable to any sentence yet carrying no meaning, appears in the BNC corpus once every 16 words. The 10th most frequent word in the BNC corpus, "it", appears once in every 108 words, and the 100th most frequent word once in every 1043 words.
Considering that these and the remaining 2997 words were used to create the stop list of the 3000 most frequent words, which were filtered out of the keyword analysis as too general or simple, it is no wonder that the words included in the top 1500 keyword list show rather astonishingly low frequencies. It is therefore important to look at the statistics wisely and with a certain caution, and not to jump to the conclusion that these words are too rare to form a useful language resource for learners. The words included in the top 1500 keywords might show rather low frequency in absolute terms, but they are important carriers of meaning and, as the keyword analysis shows, play an important role in the political discourse of the European Parliament, which can partially be considered a language for special purposes.
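The back-of-the-envelope figures above all follow from one formula: the average interval between occurrences is the corpus size divided by the occurrence count, and the expected listening time is that interval divided by the speaking rate. A quick check, assuming a rounded corpus size of 54 million words:

```python
CORPUS_SIZE = 54_000_000   # approximate size of the Europarl corpus
SPEECH_RATE = 120          # words per minute of spoken language

def recurrence_interval(occurrences, corpus_size=CORPUS_SIZE):
    """Average number of running words between two occurrences."""
    return corpus_size / occurrences

def minutes_until_heard(occurrences, rate=SPEECH_RATE):
    """Expected listening time before the word comes up once."""
    return recurrence_interval(occurrences) / rate

# 'propose' (77,463 occurrences) vs. 'recourse' (2,342 occurrences)
for word, occ in [("propose", 77_463), ("recourse", 2_342)]:
    print(f"{word}: one in ~{recurrence_interval(occ):,.0f} words, "
          f"~{minutes_until_heard(occ):.0f} min of speech")
```

With the rounded corpus size, the computed intervals come out close to the figures cited above (roughly one in 700 words for "propose", one in 23,000 for "recourse").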

EUROPARL CORPUS HWFL GLOSSARIES 160

The last matter to which we would like to give particular attention in this chapter is the effectiveness of the keyword analysis and its application in glossary compilation. Here, too, caution is advised. As mentioned in the theoretical part, the keyword list is a method that yields statistical results according to the set parameters. The decision to use keyword list analysis depends on many factors that greatly influence the results. Yet even in the best-case scenario of an optimal investigation corpus and reference corpus with the most suitable settings, there are still some perils to be aware of. In the theoretical part, on page 112, we listed several problems arising from the statistical nature of computing keywords in quantitative corpus analysis. Here, at the end of the practical part, we would like to repeat a warning that might otherwise stay unnoticed in the theoretical part. It is a caution about statistical keyword list analysis provided by the author of WordSmith Tools, M. Scott:

Suppose you process a text about a farmer growing 3 crops (wheat, oats and chick-peas) and suffering from 3 problems (rain, wind, drought). If each of these crops is equally important in the text, and each of the 3 problems takes one paragraph each to explain, the human reader may decide that all three crops are equally key and all three problems equally key. But in English these three crop-terms and weather-terms vary enormously in frequency (chick-peas and drought least frequent). WordSmith's KW analysis will necessarily give a higher keyness value to the rarer words. So it is generally unsafe to rely on the order of KWs in a KW list (Scott, 2015, para. 5).

This elaboration clearly shows that keyword analysis might be unreliable for in-depth discourse and lexical analysis due to false positives.
However, from the statistical point of view, there are no false positives or neglected keywords, only words with lower and higher keyness depending on the investigation and reference corpora, sorted in the keyword list accordingly. The keyness by itself is not a problem; it is the human interpretation that might lead to erroneous conclusions. Nevertheless, it must be underlined that such a risk was not a primary concern of our corpus analysis: the bigger the corpus and the number of analyzed words, the smaller the risk of false interpretation should be. Still, we consider it wise to conclude our discussion by highlighting the risks of the method for glossary compilation that we have defended throughout the whole thesis.
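For readers who wish to experiment with keyness themselves: WordSmith-style keyword analyses commonly rank words by a log-likelihood statistic in the spirit of Dunning's measure. The sketch below uses invented frequencies to reproduce Scott's point that, at equal counts in the study corpus, the word that is rarer in the reference corpus receives the higher keyness.

```python
import math

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness: compare a word's observed frequencies
    against the frequencies expected if the study and reference
    corpora shared one common distribution."""
    total = freq_study + freq_ref
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    ll = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:
            ll += observed * math.log(observed / expected)
    return 2 * ll

# Scott's caveat in numbers: equal study-corpus counts, but the word
# that is rarer in the reference corpus ('drought') scores higher
# than the commoner one ('rain'). Frequencies are invented.
rain = log_likelihood(50, 100_000, 9_000, 10_000_000)
drought = log_likelihood(50, 100_000, 400, 10_000_000)
print(rain, drought)
```

Note that the statistic is positive in both directions of deviation, so a separate check of observed versus expected frequency is needed to tell positively key words from negatively key ones.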

EUROPARL CORPUS HWFL GLOSSARIES 161

Conclusion

This master thesis has argued that the use of corpus linguistics in general, and keyword analysis in particular, can be a useful way to create LSP glossaries. The decision to use corpus linguistics for glossary compilation resulted from the original endeavor to create a useful tool that could help students with the acquisition of EU-related vocabulary. We hypothesized that a multilingual glossary of the most frequent words from the political discourse of the European Parliament could help students master the most important vocabulary of the EU parliamentary context. Throughout the thesis, the inductive approach was applied: the search for an optimal vocabulary learning tool represented the central research question, which we tried to address by consulting relevant scientific disciplines and applying their knowledge. Based on the review of the theoretical knowledge from English teaching, we confirmed that learning high frequency word lists is a common, useful and preferred practice and that foreign language speakers need to expand their active vocabulary in order to produce spoken language output more effectively. Further investigation into speech science, undertaken to determine whether vocabulary automatization may lead to improved spoken language output, supported our assumption. After examining different speech production models and linking them to relevant cognitive models, we postulated a synthesized model that incorporates both the speech production process and the concept of shared attention. This model shows that increasing automatization of one subtask of speech production (in our case, on the lexical level) leads to a redistribution of the saved cognitive capacity to other subtasks, and may consequently result in improved speech production quality. The above created a solid theoretical background in favor of creating a high frequency word list from the EU political context.
Further investigation seeking to answer the inherent question of how to create this tool brought us to the Europarl corpus, which contains the verbatim reports of the proceedings of the European Parliament, and to corpus linguistics as a method for utilizing corpus resources. We learned that generating a keyword list would be in many respects more suitable for LSP purposes than a classical high frequency word list, as it can fully harness both the potential of statistical
analysis and the sheer size of the Europarl corpus. Yet it also carries certain perils that a corpus researcher must be aware of. After conducting the quantitative corpus analysis of the Europarl corpus, it was important to correctly interpret and utilize the results. For this reason, scientific disciplines such as lexicology, terminology and language for special purposes, with a special emphasis on politolinguistics, were consulted. The gathered theoretical knowledge helped us make sound decisions, which included creating several glossaries to address specific issues such as the selection of keywords for the glossaries and the template and distribution of the glossaries. We understand that we have only scratched the surface of corpus linguistics and that it is much more complex than the elaboration introduced in this master thesis. Nor can we state that we have successfully overcome all the hidden risks and issues confronted in the process of glossary compilation, because a) full expertise in the scientific disciplines consulted in our inductive problem-solving approach to the research question is missing, and b) the risk of human error and false interpretation cannot be fully eliminated. Still, we do hope that our inductive approach pointed us in the right direction in our objective of providing all necessary safeguards for the correctness of the glossaries. As regards the evaluation of corpus linguistics as a method for LSP glossary compilation, we remain confident that it has wide application not only in terminology and lexicology, but also in language teaching as well as in translation and interpreting studies. In fact, we highly recommend incorporating corpus linguistics into classes for students of translation and interpreting in the form of practical training that seeks to solve questions closely related to translation and interpreting studies.
From the student perspective, it can be said that everything covered only on a theoretical basis in the form of lectures, without practical exercises during the studies, will soon be forgotten. Corpus linguistics is, in our view, a method in which students should be actively engaged and trained. It must be highlighted that the Europarl corpus is an important resource that leaves ample room for further studies, including in-depth lexical, semantic and discourse analysis. Our aim was only to extract the most frequent words and use them as a basis for glossary compilation, but the corpus offers a fruitful area for further work. Unfortunately, apart from machine translation and natural language processing, for which the corpus was
originally compiled, little attention has been devoted to the Europarl corpus by the linguistic community. Further studies on this matter are therefore suggested.

The result of our master thesis is a set of four glossaries that should cover the most frequent vocabulary of the European Parliament. The first glossary consists of 1500 unique keywords with several target equivalents in each language (EN/DE/CZ/SK). The second glossary deals with internationalisms that may pose a risk to correct language use; it describes the correct usage of 51 words prone to false usage as mistaken internationalisms. The third glossary can be regarded as the most comprehensive, as it provides in-depth insight into the correct use of the keywords in question. It includes word definitions, synonyms and collocations in each language, the word's use in context, as well as some statistical data. The final, fourth glossary was created in answer to the keyword analysis finding that there is a considerable number of words that would normally not appear in a high frequency wordlist but are important carriers of meaning and clearly require further elaboration. This glossary therefore provides explanations of such keywords in the form of definitions and additional information, supported by statistical information and target language equivalents. Glossary 1 is the main glossary and includes all 1500 top keywords, while glossaries 2, 3, and 4 consist of a narrower selection and fulfill a specific purpose. With regard to future lexical studies of the Europarl corpus aimed at similar practical learning applications, further investigation into the use of internationalisms and their correct usage in the respective target languages would be worthwhile.
The results of the problem-oriented theoretical review of the relevant scientific disciplines confirmed our hypotheses, namely that a) a glossary of the most frequent words can be a useful tool for language learning, and b) learning vocabulary and its subsequent automatization may lead to better speech production; they partially confirmed the third hypothesis, c) that the political language of the European Parliament is distinguished by its special vocabulary and can to some extent be classified as a language for special purposes. We believe that by validating the research hypotheses and creating the glossary set we have fulfilled our research objective, and that our work will be an asset not only to students interested in the most frequently used words in the European Parliament, but also to the academic community, owing to the problem-oriented approach adopted in this thesis.

EUROPARL CORPUS HWFL GLOSSARIES 164

List of References

Aboelela, S. W., Larson, E., Bakken, S., Carrasquillo, O., Formicola, A., Glied, S. A., . . . Gebbie, K. M. (2007). Defining Interdisciplinary Research: Conclusions from a Critical Review of the Literature. Health Services Research (42), 329–346. doi:10.1111/j.1475-6773.2006.00621.x AFP. (2014, March 8). Pentagon studying Putin's body language. The Telegraph. Retrieved 12 21, 2015, from http://www.telegraph.co.uk/news/worldnews/europe/russia/10684769/Pentagon- studying-Putins-body-language.html Alsaif, A. (2012). Human and Automatic Annotation of Discourse Relations for Arabic (Doctoral dissertation or master’s thesis). Retrieved 12 21, 2015, from http://etheses.whiterose.ac.uk/3129/1/Amal_PhD_thesis_November.pdf Anagnostou, N. K., & Weir, G. R. (2007). Average collocation frequency as an indicator of semantic complexity. ICTATLL Workshop 2007 Preprints, (pp. 43-48). Hiroshima. Anderwald, L., & Szmrecsanyi, B. (2008). Corpus linguistics and dialectology. In A. Lüdeling, & M. Kytö, Corpus linguistics: an international handbook (II ed., pp. 1126-1140). Berlin/New York: de Gruyter. Anthony, L. (2005). AntConc: design and development of a freeware corpus analysis toolkit for the technical writing classroom. Proceedings of the International Professional Communication Conference, (pp. 729-737). Anthony, L. (2013). A critical look at software tools in corpus linguistics. Linguistic Research, 30 (2), 141-161. Archer, D. (2009). What's in a word-list?: Investigating word frequency and keyword extraction. Farnham: Ashgate Pub. Baker, M. (1993). Corpus Linguistics and Translation Studies: Implications and Applications. In M. Baker, G. Francis, & E. Tognini-Bonelli (Eds.), Text and Technology: In honour of John Sinclair (pp. 233-250). Amsterdam/Philadelphia: John Benjamins. doi:10.1075/z.64.15bak Baker, M. (1994). Corpora in Translation Studies: An Overview and Some Suggestions for Future Research. Target, 7 (2), 223-243. doi:10.1075/target.7.2.03bak Baker, M. (2001). 
Routledge Encyclopedia of Translation Studies. London/New York: Routledge. Baker, P. (2004). Querying keywords : questions of difference, frequency and sense in keywords analysis. Journal of English Linguistics, 32 (4), 346-359. Baker, P. (2006). Using Corpora in Discourse Analysis. London/New York: Continuum.

EUROPARL CORPUS HWFL GLOSSARIES 165

Baker, P. (2012). Acceptable bias?: Using corpus linguistics methods with critical discourse analysis. Critical Discourse Studies, 3 (9), 247-256. Baker, P., Hardie, A., & McEnery, T. (2006). A Glossary of Corpus Linguistics. Edinburgh: Edinburgh University Press. Barlow, M. (n.d.). Online Searches. Retrieved 12 21, 2015, from Web concordances: http://www.athel.com/web_concordance.html Barlow, M. (1999). MonoConc 1.5 and ParaConc. International Journal of Corpus Linguistics, 4 (1), 173–184. Barlow, M. (2002). ParaConc: Concordance software for multilingual parallel corpora. Language Resources for Translation work and Research, Proceedings of the Third International Conference on Language Resources and Evaulation. Las Palmas. Retrieved 12 21, 2015, from http:// www.mt-archive.info/LREC-2002- Barlow.pdf Bartoňková, H. (2002). Andragogika. Olomouc: Vydavatelství Univerzity Palackého v Olomouci. Basil, M. D. (2012). Multiple Resource Theory. In N. M. Seel (Ed.), Encyclopedia of the Sciences of Learning (pp. 2384-2385). New York: Springer US. doi:10.1007/978- 1-4419-1428-6_25 Baumann, K.-D. (2013). Fachkommunikative Dolmetschkompetenz aus interdisziplinärer Perspektive. In K.-D. Baumann, & H. Kalverkämper, Theorie und Praxis des Dolmetschens und Übersetzens in fachlichen Kontexten (TRANSÜD. Arbeiten zur Theorie und Praxis des Übersetzens und Dolmetschens ed., Vol. 63, pp. 31-49). Berlin: Frank & Timme. Bendazzoli, C. (2010). The European Parliament as a Source of Material for Research into Simultaneous Interpreting: Advantages and Limitations. In L. N. Zybatow (Ed.), Translationswissenschaft — Stand und Perspektiven. Frankfurt am Mein: Peter Lang. Bendazzoli, C., & Sandrelli, A. (2009). Corpus-based Interpreting Studies: Eary Work and Future Prospects. Revista Tradumàtica 07 L’aplicació dels corpus lingüístics a la traducció. Retrieved from http://www.fti.uab.es/tradumatica/revista/num7/articles/08/08art.htm Berber-Sardinha, T. (1999). 
Using KeyWords in textanalysis: Practical aspects. DIRECT Papers (42), 1-9. Berber-Sardinha, T. (2000). Comparing Corpora with WordSmith Tools: How Large Must the Reference Corpus Be? In Proceedings of the Workshop on Comparing Corpora, Volume 9, (s. 7-13). Hong Kong. Berns, M. (2010). Concise Encyclopedia of Applied Linguistics. Oxford: Elsevier.

EUROPARL CORPUS HWFL GLOSSARIES 166

Bhela, B. (1999). Native language interference in learning a second language: Exploratory case studies of native language interference with target language usage. International Education Journal, 1 (1). Retrieved from http://ehlt.flinders.edu.au/education/iej/articles/v1n1/bhela/bhela.pdf Bianchi, F. (2012). Culture, corpora and semantics. Methodological issues in using elicited and corpus data for cultural comparison. Lecce: Università del Salento - Coordinamento SIBA. Biber, D., Connor, U., Upton, A., Anthony, M., & Gladkov, K. (2007). Discourse on the Move: Using corpus analysis to describe discourse structure. In iscourse (s. 121- 151). Amsterdam: John Benjamin. Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman Grammar of Spoken and Written English. Pearson Education. Bick, E. (2010). Degrees of orality in speech-like corpora: Comparative annotation of chat and e-mail corpora. In Proc. of the 24th Pacific Asia Conference on Language, Information and (pp. 721—729). Sendai: Waseda University. Blecha, J. (2012). Building Specialized Corpora (Master's diploma thesis). Brno: Masaryk University. Boulton, A. (2010). Data-Driven Learning: On Paper, in Practice. In T. Harris, & M. M. Jaén (Eds.), Corpus Linguistics in Language Teaching (pp. 17-52). Bern: Peter Lang. Bowker, L., & Pearson, J. (2002). Working with Specialized Language: A Practical Guide to Using Corpora. London/New York: Routledge. Budin, G. (2010). Socio-terminology and computational terminology – toward an integrated, corpus-based research approach. In R. e. De Cilia (Ed.), Discourse, Politics, Identity (s. 21-31). Tübingen: Stauffenburg Verlag. Burkhardt, A. (1996). Politolinguistik. Versuch einer Ortsbestimmung. In 1996, J. Klein, & H. Diekmannshenke (Eds.), Sprachstrategien und Dialogblockaden. Linguistische und politikwissenschaftliche Studien zur politischen Kommunikation (pp. 75-100). Berlin: De Gruyter. Cartoni, B., Zufferey, S., & Meyer, T. (2013). 
Using the Europarl corpus for cross- linguistic research. Belgian Journal of Linguistics, 27 (1), 23-42. doi:10.1075/bjl.27.02car Cedroni, L. (2010, 11 10). Politolinguistics: towards a new analysis of political discourse. Retrieved 12 21, 2015, from http://sspnet.eu/2010/11/politolinguistics-towards-a- new-analysis-of-political-discourse-2/ Cervatiuc, A. (2008a). ESL Vocabulary Acquisition: Target and Approach. The Internet TESL Journal, 16(1). Retrieved from http://iteslj.org/Articles/Cervatiuc- VocabularyAcquisition.html

EUROPARL CORPUS HWFL GLOSSARIES 167

Cervatiuc, A. (2008b). Highly Proficient Adult Non-Native English Speakers' Perceptions of their Second Language Vocabulary (PhD thesis). Ottawa: Library and Archives Canada = Bibliothèque et Archives Canada. Clark, A., Fox, C., & Lappin, S. (Eds.). (2010). The Handbook of Computational Linguistics and Natural Language Processing. Wiley-Blackwell. Cobb, T. (n.d. ). Why & how to use frequency lists to learn words. Retrieved 12 21, 2015, from Vocab Research Resources: http://www.lextutor.ca/research/ Cole, J., & Hasegawa-Johnson, M. (2012). Corpus phonology with speech resources. In C. Fougeron, M. Huffman, & C. Abigail (Ed.), Handbook of Laboratory Phonology. Comp (pp. 431-440). Oxford: Oxford University Press. Conference, P. (n.d.). Political Linguistics Conferences. Retrieved from Political Linguistics: http://pl.ils.uw.edu.pl/ Conrad, S. (2000). Will Corpus Linguistics Revolutionize Grammar Teaching in the 21st Century? TESOL Quarterly, 34 (3), pp. 548-560. Corpus. (n.d.). In Oxford British & World English Dictionary. Retrieved from http://www.oxforddictionaries.com/definition/english/corpus?searchDictCode=a ll Coxhead, A. (1998). An Academic Word List. ELI Occasional Publications #18, School of Linguistics and Applied Language Studies. ELI Occasional Publications, 18. Crookes, G. (1991). Second language speech production research: A methodologically oriented review. Studies in Second Language Acquisition, 13 (02), 113—32. doi:10.1017/S0272263100009918 Čermák, F. (2003). Today's Corpus Linguistics: Some Open Questions. International Journal of Corpus Linguistics, 7 (2), 265-282. Dash, N. S. (2010). Corpus linguistics: a general introduction. Presented in the Workshop on Corpus Normalization, Linguistic Data Consortium for the Indian Languages (LDCIL), Central Institute of Indian Languages, (pp. 01-25). Mysore. Retrieved from www.ciil.org/ldc-il/ Davletbaeva, D. N. (2010). Lectures on English Lexicology. Kazan: The Academy of Sciences. Day, R. R., & Bamford, J. 
(2002). Top ten principles for teaching extensive reading. Reading in a Foreign Language, 14 (2). Retrieved 12 21, 2015, from http://nflrc.hawaii.edu/rfl/October2002/ de Kok, D., & Brouwer, H. (2011). Natural Language Processing for the Working Programmer. nlpwp.org. DeKeyser, R. M. (1997). Beyond explicit rule learning: Automatizing second language morphosyntax. Studies in Second Language Acquisition, 19, 195–222.

EUROPARL CORPUS HWFL GLOSSARIES 168

Delcloque, P. (2000). An Illustrated History Of Computer-Assisted Language Learning: The History of CALL Web Exhibition. Retrieved from History of CALL: http://www.ict4lt.org/en/History_of_CALL.pdf Dictionary, O. E. (n.d.). Where Does the Word Robot Come From? Retrieved 12 21, 2015, from http://www.englishlanguagefaqs.com/2012/09/where-does-word-robot- come-from.html Dieckmann, W. (n.d.). Deutsch politisch — politische Sprache im Gefüge des Deutschen. In J. Kilian (Ed.), Sprache und Politik: Deutsch im Demokratischen Staat (pp. 11- 30). Mannheim: Dudenredaktion. Diemer, S. (2011). Corpus linguistics with Google? Proceedings of ISLE 2. Boston. Dillon, G. (2015, May). Corpus Resources. Retrieved from Resources for English language study: http://courses.washington.edu/englhtml/engl560/corplingresources.htm Ellis, N. E. (1995). The Psychology of Foreign Language Vocabulary Acquisition: Implications for CALL. (D. Green, & P. Meara, Eds.) International Journal of Computer Assisted Language Learning (CALL), 8, 103-128. Ellis, R. (1997). Second Language Acquisition. Oxford: Oxford University Press. Elmes, D. (n.d.). Anki 2.0 User Manual. Retrieved 12 21, 2015, from http://ankisrs.net/docs/manual.html Europa.eu. (n.d.). About EuroVoc. Retrieved 12 21, 2015, from EuroVoc: http://eurovoc.europa.eu/drupal/?q=abouteurovoc European Commission. (2014). The European Union Explained: How the EU works. Luxembourg: Luxembourg: Publications Office of the European Union. doi:10.2775/11255 Fagan, D. S. (2005). Using Corpus Linguistics to Teach ESL. Proceedings of the CATESOL State Conference. San Diego: San Diego State University. Retrieved from http://64.8.104.26/Fagan.pdf Fairclough, N. (2000). New Labour, New Language? London: Routledge. Fantinuoli, C. (n.d.). InterpretBank: Software for Interpreters. Retrieved 12 21, 2015, from http://www.interpretbank.de/ Farrell, L., Osenga, T., & Hunter, M. (2013). Comparing the Dolch and Fry High Frequency Word Lists. 
Retrieved 12 21, 2015, from Readsters: http://www.readsters.com/wp-content/uploads/ComparingDolchAndFryLists.pdf Felber, H., & Budin, G. (1989). Terminologie in Theorie und Praxis. Tübingen: Narr. Firth, J. R. (1957). A synopsis of linguistic theory 1930–55. In T. P. Society, Studies in Linguistic Analysis (pp. 1-32). Oxford: Blackwell UK.

EUROPARL CORPUS HWFL GLOSSARIES 169

Flood, A. (2014, February, 20). Wikipedia 1,000-volume print edition planned. The Guardian. Retrieved 12 21, 2015, from http://www.theguardian.com/books/2014/feb/20/wikipedia-1000-volume-print- edition-crowdfunding Francis, W. N., & Kučera, H. (1982). Frequency Analysis of English Usage. Boston: Houghton Mifflin. Frequency Lists. (2009). Retrieved from British Natinal Corpus: http://www.natcorp.ox.ac.uk/using/index.xml?ID=freq Fry, E. (2004). The vocabulary teacher’s book of lists. San Francisco: Jossey-Bass. Fry, E., & Kress, J. E. (2006). The Reading Teacher's Book Of Lists. Jossey-Bass. Gabrielatos, C. (2005, March). Corpora and Language Teaching: Just a fling or wedding bells? The Electronic Journal for English as a Second Language, 8 (4). Retrieved from http://www.tesl-ej.org/ej32/a1.html Gabrielatos, C. (2013). Corpus Linguistics 2 Keyword Analysis. In Dubrovnik Fall School in Linguistic Methods. Retrieved 12 21, 2015, from http://webcache.googleusercontent.com/search?q=cache:LgFyvCTataEJ:reposito ry.edgehill.ac.uk/5932/1/NTNU.Dubrovnik.Keyness.pdf+&cd=1&hl=en&ct=cln k&gl=sk Gacek, M. (2015). Softwarelösungen für DolmetscherInnen (Master's diploma thesis). Vienna: University of Vienna. Retrieved from http://othes.univie.ac.at/35667/ Gardner, D., & Davies, M. (2013). A New Academic Vocabulary List. Applied Linguistics, 35 (3), 305-327. doi:10.1093/applin/amt015 Garside, R., Leech, G., & McEnery, T. (Eds.). (1997). Corpus Annotation: Linguistic Information from Computer Text Corpora. Longmann. Gatto, M. (2014). Web As Corpus: Theory and Practice. London/New York: Bloomsbury. Gile, D. (1999). Testing the Effort Models’ tightrope hypothesis in simultaneous interpreting - A contribution. Journal of Linguistics, 23 (23), 153-172. Glosbe. (n.d.). About Glosbe. Cit. 21. 12 2015. Retrieved from http://blog.glosbe.com/post/85119355711/about-glosbe Goulden, R., Nation, P., & Read, J. (1990). How Large Can a Receptive Vocabulary Be? Applied Linguistics, 11 (4), 341-363. 
Graham, A. (2008). The Effects of Homography on Computer-generated High Frequency Word Lists. All Theses and Dissertations, Paper 1617. Granger, S., & Paquot, M. (2012). Electronic Lexicography. Oxford: OUP .

EUROPARL CORPUS HWFL GLOSSARIES 170

Gries, S. T. (2011). Methodological and interdisciplinary stance in Corpus Linguistics. In V. Viana, S. Zyngier, & G. Barnbrook, Perspectives on Corpus Linguistics (s. 81- 97). Gries, T. S. (2008). Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics, 13(4), 403-437. doi:10.1075/ijcl.13.4.02gri Gries, T. S., & Berez, A. L. (to appear). Linguistic annotation in/for corpus linguistics. In N. Ide, & J. Pustejovsky (Ed.), Handbook of Linguistic Annotation. Berlin/New York: Springer. Harvard Transdisciplinary Research in Energetics and Cancer Center. (2015, 12 21). Retrieved from Harvard T.H. Chan, School of Public Health: http://www.hsph.harvard.edu/trec/about-us/definitions/ He, Y. (2010). A Study of L2 Vocabulary Learning Strategies (thesis). Kristianstad: The School of Teaching Education, Kristianstad University. Hirschman, L., & Sager, N. (1982). Automatic information formatting of a medical sublanguage. In R. Kittredge, & L. Lehrberger (Eds.), Sublanguage; studies of language in restricted semantic domains (pp. 27-80). Berlin/New York: Walter de Gruyter. Hjřrland , B., & Nicolaisen , J. (Eds.). (2005). The Epistemological Lifeboat: Epistemology and Philosophy of Science for Information Scientists. Retrieved 12 21, 2015, from http://www.iva.dk/jni/lifeboat/ Höller, B. (2008). Die FremdgängerInnen. In M. Kaiser-Cooke (Ed.), Das Entenprinzip. Translation aus neuen Perspektiven (pp. 81-141). Frankfurt am Main: Peter Lang. Holmes, J. (2013). An Introduction to Sociolinguistics. New York: Routledge. Hubbard, P. (Ed.). (2009). Hubbard, P. (Ed.) (2009). Computer Assisted Language Learning: Vol 1 (Critical Concepts in Linguistics). London: Routledge. London: Routledge. Hugh Bernard Fox, I. (1993). A study of ESL teachers and the relationship between their attitudes about computer-assisted language learning and their expectations of minority students (PhD thesis). Kingsville: Texas A&M University. Hutschins, J. (2010). 
Machine translation: a concise history. (C. S. Wai, Ed.) Journal of Translation Studies, 13 (1-2), 29-70. Hyland, K., Meng Huat, C., & Handford, M. (2012). Corpus Applications in Applied Linguistics. London: Continuum. Chamizo-Domínguez, P. (2006). False Friends. In K. Brown (Ed.), Encyclopedia of language & linguistics (pp. 425-429). New York: Elsevier.

EUROPARL CORPUS HWFL GLOSSARIES 171

Chomsky, N. (1962). A transformational approach to syntax. In A. A. Hill (Ed.), In Proceedings of the Third Texas Conference on Problems of Linguistic Analysis in English, May 9–12, 1958 (pp. 124–148). Austin: Univ. of Texas Press. Chomsky, N. (2004). (Interviewed by Andor, Jozsef). The master and his performance: An interview with Noam Chomsky. Intercultural Pragmatics, 1 (1), 93-111. Chong, I., & Burger, A. (2011). Receptive Vocabulary. In S. Goldstein, & J. A. Naglieri (Eds.), Encyclopedia of Child Behavior and Development (pp. 1231-1231). New York: Springer US. doi:10.1007/978-0-387-79061-9_2359 Choueka, Y. (1988). Looking for needles in a haystack. RIAO 88 Conference Proceedings, (pp. 609–623). Cambridge. Jenkins, J. R., Stein, M. L., & Wysocki, K. (1984). Learning Vocabulary Through Reading. 21, pp. 767-787. Retrieved from http://www.jstor.org/stable/1163000 Johansson, S. (1995). Mens sana in corpore sano: On the Role of Corpora in Linguistic Research. he European English Messenger, 4 (2), 19-25. Kahneman, D. (1973). Attention and Effort. New Jersey: Prentice Hall. Kaiser-Cooke, M. (2008). Warum ich sehe, was du siehst. In M. Kaiser-Cooke (Ed.), Das Entenprinzip. Translation aus neuen Perspektiven (pp. 13-17). Frankfurt am Main: Peter Lang. Kezhen, L. (2015). The Use of Concordance Programs in English Lexical Teaching in High School. Higher Education of Social Science, 8 (1), 60-65. doi:10.3968/6267 Kilgarriff, A. (2001). Comparing corpora. International Journal of Corpus Linguistics, 6 (1), 1-37. Kilgarriff, A. (2006). BNC database and word frequency lists. Retrieved from http://www.kilgarriff.co.uk/bnc-readme.html Kilgarriff, A. (2009). Simple Maths for Keywords. Liverpool: Lexical Computing Ltd. Klein, J. (Ed.). (1989). Politische Semantik: bedeutungsanalytische und sprachkritische Beiträge zur politischen Sprachverwendung. Opladen: Westdeutscher Verlag. Klein, J. (2014). Grundlagen der Politolinguistik: Ausgewählte Aufsätze. Berlin: Frank & Timme. 
Klosa, A. (2007). Korpusgestützte Lexikographie: besser, schneller, umfangreicher. In W. Kallmeyer, & G. Zifonun (Eds.), Sprachkorpora. Datenmengen und Erkenntnisfortschritt (pp. 105-122). Berlin/New York: de Gruyter. Knowles, G., & Zuraidah Mohd Don. (2004). The notion of a "lemma" : headwords roots and lexical sets. International journal of corpus linguistics, 9(1). Koehn, P. (2005). Europarl: A Parallel Corpus for Statistical Machine Translation. MT Summit X. Phuket.

EUROPARL CORPUS HWFL GLOSSARIES 172

Kolb, D. A., & Boyatzis, R. E. (2001). Experiential Learning Theory: Previous Research and New Directions. In R. J. Sternberg, & L.-f. Zhang (Eds.), Perspectives on cognitive, learning, and Cognitive Styles. L. Erlbaum. Kormos, J. (2014). Speech Production and Second Language Acquisition. London/New York: Routledge. Krieger, D. (2003). Corpus Linguistics: What It Is and How It Can Be to Teaching. The Internet TESL Journal, 9 (3). Retrieved from http://iteslj.org/Articles/Krieger- Corpus.html Krüger, E. (2013). Lob des bilateralen Dolmetschens. Eine didaktische Betrachtung. In K.-D. Baumann, & H. Kalverkämper, Theorie und Praxis des Dolmetschens und Übersetzens in fachlichen Kontexten (TRANSÜD. Arbeiten zur Theorie und Praxis des Übersetzens und Dolmetschens ed., Vol. 63, pp. 285-295). Berlink: Frank & Timme. Kubišová, J. (2015, June 17). Je vám bližšie selfie alebo „svojka“? Ako hovoríte, tak bude. Aktuality.sk. Retrieved 12 21, 2015, from http://www.aktuality.sk/clanok/277470/pouzivate-selfie-alebo-svojku-ako- hovorite-tak-bude/ Lamy , M. N., & Klarskov Mortensen, H. J. (2012). Using concordance programs in the Modern Foreign Languages classroom. (G. Davies, Ed.) Module 2.4 in Davies G. (ed.) Information and Communications Technology for Language Teachers (ICT4LT). Retrieved 12 21, 2015, from http://www.ict4lt.org/en/en_mod2-4.htm Leech, G. (1992). Corpora and theories of linguistic performance. In J. Svartvik, Directions in Corpus Linguistics (s. 105-22). Berlin: de Gruyter. Leech, G. (1993). Corpus Annotation Schemes. Literary and Linguistic Computing, 8 (4), 275-281. Leech, G. (2005). Adding Linguistic Annotation. In M. Wynne (Ed.), Developing Linguistic Corpora: a Guide to Good Practice. Oxford: Oxbow Books. Retrieved 12 21, 2015, from http://ahds.ac.uk/linguistic-corpora Leech, G. (2014). The state of art in corpus linguistics. In K. Aijmer, & B. Altenberg (Eds.), English Corpus Linguistics (pp. 8-29). London: Longman. Leech, G., Rayson, P., & Wilson, A. 
(2001). Companion Website for: Word Frequencies in Written and Spoken English based on the British National Corpus. Retrieved from http://ucrel.lancs.ac.uk/bncfreq/lists/2_2_spokenvwritten.txt Leibert, R. E. (1991). The Dolch List Revisited - An Analysis of Pupil Responses Then and Now. Reading Horizons, 31(3). Léon, J. (2005). Claimed and unclaimed sources of corpus linguistics. Henry Sweet Society Bulletin (44), 36–50.

EUROPARL CORPUS HWFL GLOSSARIES 173

Levelt, W. (1999). The neurocognition of language. In C. Brown, & P. Hagoort (Eds.). Oxford: Oxford Press. Lexicology. (n.d.). In Oxford British & World Dictionary. Retrieved from http://www.oxforddictionaries.com/definition/english/lexicology Liberman, M. (2008). Comparing the Vocabularies of different languages. Retrieved from Language Log: http://itre.cis.upenn.edu/~myl/languagelog/archives/005514.html Lindermann, D. (2013). Bilingual Lexicography and Corpus Methods. The Example of German-Basque as Language Pair. 5th International Conference on Corpus Linguistics (pp. 105-122). Alicante: Elsevier. Lingea. (n.d.). Lexicon 5 Anglický slovník Platinum. Cit. 21. 12 2015. Retrieved from http://www.lingea.sk/lexicon5-anglicky-platinum.html Lingea. (n.d.). O firme. Cit. 21. 12 2015. Retrieved from http://www.lingea.sk/o-firme Linguee. (n.d.). About Lnguee. Cit. 21. 12 2015. Retrieved from http://www.linguee.com/english-french/page/about.php Lipka, L. (1992). An outline of English lexicology. Tübingen: Max Niemeyer. Longman Communication 3000. (18. March 2003). Longman English Dictionary - LDOCE. Pearson ESL. Makarová, V. (2004). Tlmočenie : Hraničná oblasť medzi vedou, skúsenosťou a umením možného. Bratislava: Stimul. McEnery, T., & Gabrielatos, C. (2006). English corpus linguistics. In B. Aarts, & A. McMahon (Eds.), The Handbook of English Linguistics (pp. 33-71). Oxford: Blackwell. McEnery, T., & Hardie, A. (2012). Corpus linguistics: Method, theory and practice. Cambridge: Cambridge University Press. McEnery, T., & Wilson, A. (2001). Corpus Linguistics: An Introduction. Edinburgh: Edinburgh University Press. McEnery, T., Xiao, R., & Yukio, T. (2006). Corpus-based Language Studies: An Advanced Resource Book. London/New York: Routledge. Meara, P., & Jones., G. (1990). The Eurocentres Vocabulary Size Tests. Zurich: Eurocentres. Meyer, C. F. (2002). English Corpus Linguistics: An Introduction. Cambridge: Cambridge University Press. Milton, J., & Meara, P. (1995). 
How periods abroad affect vocabulary growth in a foreign language. ITL - International Journal of Applied Linguistics(107/108), 17-34. Mitkov, R. (2005). The Oxford Handbook of Computational Linguistics. Oxford: OUP.

EUROPARL CORPUS HWFL GLOSSARIES 174

Morley, G. D. (2000). Syntax in Functional Grammar. London/New York: Continuum. Mukherjee, J. (2005). English ditransitive verbs: Aspects of theory, description and a usage-based model. Amsterdam/New York: Rodopi. Müller, F., & Waibel, B. (n.d.). Corpus linguistics - an introduction. Retrieved from uni.freiburg.de: http://www.anglistik.uni- freiburg.de/seminar/abteilungen/sprachwissenschaft/ ls_mair/corpus-linguistics. Myers, R. (2015, April 24). Political jargon: a plain language guide. GQ Magazine. Retrieved 12 21, 2015, from http://www.gq- magazine.co.uk/comment/articles/2015-04/24/political-jargon-explained-guide Nagy, W. E., & Stahl, S. A. (2007). Teaching Word Meanings. London/New York: Routledge. Nation, P. (2001). How many high frequency words are there in English? In M. Gill, A. W. Johnson, L. M. Koski, R. D. Sell, & B. Wårvik (Eds.), Language, Learning and Literature: Studies Presented to Håkan Ringbom English Department Publications 4 (pp. 167-181). Turku: Åbo Akademi University. Nation, P. (2001b). Learning Vocabulary in Another Language. Cambridge University Press. Nation, P., & Waring, R. (1997). Vocabulary size, text coverage, and word lists. In M. McCarthy, & N. Schmitt (Eds.), Vocabulary: Description, acquisition, pedagogy (pp. 6-19). New York: Cambridge University Press. Nesselhauf, N. (October 2005). Corpus Linguistics: A Practical Introduction. Retrieved from as.uni-heidelberg.de: http://www.as.uni- heidelberg.de/personen/Nesselhauf/files/Corpus%20Linguistics%20Practical%2 0Introduction.pdf Nicolescu, B. (1999). The Transdisciplinary Evolution of Learning. Talk at the American Educational Research Association (AERA), Annual Meeting, Round-Table "" Overcoming the Underdevelopment of Learning : a Trandsdisciplinary View"", with the participation of Leon Lederman (Nobel Prize of Physics), Jan Visser, Ron Burne. Montréal. Tognini-Bonelli, T. (2001.). Corpus Linguistics at Work. Amsterdam: John Benjamins. Paquot, M., & Bestgen, Y. (2009). 
Distinctive words in academic writing: A comparison of three statistical tests for keyword extraction. In M. Hundt, D. Schreier, & A. H. Jucker (Eds.), Corpora: Pragmatics and Discourse (pp. 243-265). Amsterdam: Rodopi. Paroubek, P. (2008). Evaulating Part-of-Speech Tagging and Parsing. In L. Dybkjær, H. Hemsen, & W. Minker (Eds.), Evaluation of Text and Speech Systems (pp. 99- 124). Dordrecht: Springer.

EUROPARL CORPUS HWFL GLOSSARIES 175

Pastor, G. C., & Seghiri, M. (2007). Specialized Corpora for Translators: A Quantitative Method to Determine Representativeness. Translation Journal, 11 (3). Pearson, J. (1998). Terms in context. Amsterdam/New York: John Benjamins. Perez-Sabater, C. (2011). Active Learning to improve long-term knowledge retention. Proceedings of the XII Simposio Internacional de Comunicación Social, (pp. 75- 79). Santiago de Cuba. Picht, H., & Draskau, J. (1985). Terminology: An introduction. Guildford: University of Surrey. Pöchhacker, F. (1994). Simultandolmetschen als komplexes Handeln. Tübingen: Gunter Narr. Pöchhacker, F. (1995). Slips and Shifts in Simultaneous Interpreting. In Y. Gambier, & J. Tommola, Translation and Knowledge (pp. 57-100). Turku: University of Turku. Preller, A. G. (1967). Some Problems Involved in Compiling Word Frequency Lists. The Modern Language Journal (51), 399–402. doi:10.1111/j.1540- 4781.1967.tb06724.x Quinlan, P. T., & Dyson, B. J. (2008). Cognitive Psychology. Essex: Pearson Education. R.L.G. (2014, January 30). The president's words of choice. The Economist. Retrieved 12 21, 2015, from http://www.economist.com/blogs/democracyinamerica/2014/01/politics-and- linguistics Rayson, P. (2003). Matrix: A statistical method and software tool for linguistic analysis through corpus comparison (PhD thesis). Lancaster: Computer Science Computer Department, Lancaster University. Rayson, P., Leech, G., & Hodges, M. (1997). Social differentiation in the use of English vocabulary: someanalyses ofthe conversational component of the Corpus. International Journal of Corpus Linguistics, 2 (1), 132-152. Robot. (n.d.). In Online Etymology Dictionary. (D. Harper, Ed.) Retrieved 12 21, 2015, from http://www.etymonline.com/index.php?term=robot Römer, U., & Wulff, S. (2010). Applying corpus methods to written academic texts: Exploration of MICUSP. Journal of Writing Research(2), 99-127. Russo , M., Bendazzoli, C., Monti, C., Sandrelli, A., Baroni, M., Bernardini, S., . . 
. Mead, P. (2011). European Parliament Interpretation Corpus (EPIC). Saldanha, G. (2009). Principles of Corpus Linguistics and Their Application to Translation Studies. Tradumática, 7. Scott, M. (1997). PC analysis of key words - and key key words. System, 25(2), 233-245. doi:http://dx.doi.org/10.1016/S0346-251X(97)00011-0

EUROPARL CORPUS HWFL GLOSSARIES 176

Scott, M. (2016). WordSmith Tools Manual. Stroud: Lexical Analysis Software. Retrieved from http://lexically.net/downloads/version6/HTML/index.html?getting_started.htm Serrander, U. (2011). Bilingual Lexical Processing in Single Word Production: Swedish learners of Spanish and the effects of L2 immersion (PhD thesis). Uppsala: Upsala Universitet. Shlesinger, M. (2009). Towards a definition of Interpretese: An inermodal, corpus-based study. In G. Hansen, A. Chesterman, & H. Gerzymisch-Arbogast (Eds.), Efforts and Models in Interpreting and Translation Research: A tribute to Daniel Gile (pp. 237-255). Amsterdam/Philadelphia: John Benjamins. Schiemd, J. (1993). Qualitative and quantitative research approaches to English relative constructions. In C. Souter, & E. Atwell, Corpus Based Computational Linguistics. Amsterdam: Rodopi. Sinclair, J. (Ed.). (1987). Collins Cobuild English Language Dictionary. London: Collins. Sinclair, J. (1991). Corpus concordance collocation. Oxford: Oxford University Press. Sinclair, J. (1995). Corpus typology – a framework for classification. In G. Melchers, & B. Warren (Eds.), Studies in Anglistics (pp. 17-33). Stockholm: Almqvist & Wiksell. Sinclair, J. (2004). Trust the text: Language, corpus and discourse. London: Routledge. Sinclair, J. (2005). Corpus and Text-Basic Principles. In M. Wynne (Ed.), Developing Linguistic Corpora: a Guide to Good Practice (pp. 1-16). Oxford: Oxford Books. Retrieved 12 21, 2015, from http://icar.univ- lyon2.fr/ecole_thematique/contaci/documents/Baude/wynne.pdf Singh, R. A. (1982). An Introduction to Lexicography. Central Institute of Indian Languages. Skiba, R. (1997). Code Switching as a Countenance of Language Interference. The Internet TESL Journal, 3 (10). Retrieved from http://iteslj.org/Articles/Skiba- CodeSwitching.html Smith, M. (2015, February 26). Unparliamentary language: The rude words banned from the House of Commons. Mirror. 
Retrieved 12 21, 2015, from http://www.mirror.co.uk/news/uk-news/unparliamentary-language-rude-words- banned-5234983 Smith, S., Kilgarriff, A., & Sommers, S. (2008). Making better wordlists for ELT: Harvesting vocabulary lists from the web using WebBootCat. Conference and Workshop on TEFL and Applied Linguistics. Taoyuan. Stahl, S. A., & Fairbanks, M. M. (1986). The effects of vocabulary instruction: A model- based meta-analysis. Review of Educational Research, 56, 72-110.

Stavridou, I., & Ferreira, A. (2010). Multi-, inter- and trans-disciplinary research promoted by the European Cooperation in Science and Technology (COST): Lessons and experiments [Research report]. Retrieved December 21, 2015, from https://hal.inria.fr/inria-00512712v1/document
Steinberger, R., Ebrahim, M., Poulis, A., Carrasco-Benitez, M., Schlüter, P., Przybyszewski, M., & Gilbro, S. (2014). An overview of the European Union's highly multilingual parallel corpora. Language Resources and Evaluation Journal, 48(4), 679-707. doi:10.1007/s10579-014-9277-0
Stoykova, V., Simkova, M., Majchrakova, D., & Gajdosova, K. (2015). Detecting time expressions for Bulgarian and Slovak language from electronic text corpora. Procedia - Social and Behavioral Sciences, The Proceedings of 5th World Conference on Learning, Teaching and Educational Leadership (pp. 257–260). Elsevier. doi:10.1016/j.sbspro.2015.04.178
Strauß, M. (2011). Politolinguistik: Lexikalische und pragmatische Analyse politischer Sprache: Am Beispiel der parlamentarischen Rede Merkels zum Haushaltsgesetz 2010 [Politolinguistics: A lexical and pragmatic analysis of political language, illustrated by Merkel's parliamentary speech on the 2010 budget law]. München: GRIN Verlag.
Stubbs, M. (1996). Text and corpus analysis: Computer-assisted studies of language. Oxford: Blackwell.
Styles, E. A. (2006). The Psychology of Attention. Sussex/New York: Psychology Press.
Sudor, K. (2015, July 1). Sibyla Mislovičová: Slová neni a sranda spisovnými nebudú, selfie asi áno [Sibyla Mislovičová: The words 'neni' and 'sranda' will not become standard, 'selfie' probably will]. Denník N. Retrieved December 21, 2015, from https://dennikn.sk/174423/sibyla-mislovicova-slova-neni-a-sranda-spisovnymi-nebudu-selfie-asi-ano/
Taylor, C. (2008). What is corpus linguistics? What the data says. ICAME Journal, 32, 179-200.
Terminology. (n.d.). In Oxford British & World Dictionary. Retrieved from http://www.oxforddictionaries.com/definition/english/terminology?searchDictCode=all
Teubert, W. (2004). Applied Corpus Linguistics: A Multidimensional Perspective (T. A. Upton & U. Connor, Eds.). Amsterdam/New York: Rodopi.
Teubert, W. (2005). My version of corpus linguistics. International Journal of Corpus Linguistics, 10(1), 1-13.
Thomas, M. (2009). Handbook of Research on Web 2.0 and Second Language Learning. IGI Global.
Thompson, E. (2011, December 14). The 106 things you can't say in Parliament. iPolitics. Retrieved December 21, 2015, from http://ipolitics.ca/2011/12/14/the-106-things-you-cant-say-in-parliament/

Townson, M. (1992). Mother-tongue and Fatherland: Language and Politics in German. Manchester: Manchester University Press.
Transdisciplinary inquiry: Incorporating holistic principles. (n.d.). Holistic Education Network. Retrieved December 21, 2015, from http://www.hent.org/transdisciplinary.htm
Tribble, C. (2000). Genres, keywords, teaching: Towards a pedagogic account of the language of project proposals. In T. McEnery, & L. Burnard (Eds.), Rethinking Language Pedagogy from a Corpus Perspective (pp. 75-90). Frankfurt: Peter Lang.
Van de Walle, J., & Willems, K. (2006). Zipf, George Kingsley (1902–1950). In K. Brown (Ed.), Encyclopedia of language & linguistics (pp. 756-757). New York: Elsevier.
van Dijk, T. A. (2004). Text and context of parliamentary debates. In P. Bayley (Ed.), Cross-Cultural Perspectives on Parliamentary Discourse (pp. 339-372). Amsterdam: Benjamins.
Vanopstal, K., Vander Stichele, R., Laureys, G., & Buysschaert, J. (2009). Vocabularies and retrieval tools in biomedicine: Disentangling the terminological knot. Journal of Medical Systems, 33(4), 527-543. doi:10.1007/s10916-009-9389-z
Wang, X., McCallum, A., & Wei, X. (2007). Topical n-grams: Phrase and topic discovery, with an application to information retrieval. Proceedings of the 2007 IEEE International Conference on Data Mining (ICDM'07) (pp. 697-702).
Waring, R., & Nation, P. (1997). Vocabulary size, text coverage, and word lists. In N. Schmitt, & M. McCarthy (Eds.), Vocabulary: Description, Acquisition and Pedagogy (pp. 6-19). New York: Cambridge University Press.
Weber, K. (2011). A genre analysis of the American presidential (Mag. phil. thesis). Retrieved from http://othes.univie.ac.at/14956/1/2011-05-30_0307593.pdf
Weninger, C. (2010, October). The lexico-grammar of partnerships: Corpus patterns of 'facilitated agency'. Text & Talk - An Interdisciplinary Journal of Language, Discourse & Communication Studies, 30(5), 591-613. Retrieved from http://www.reference-global.com/toc/text/2010/30/5?ai=sb&ui=w6&af=H
Widdowson, H. G. (2002). On the limitations of linguistics applied. Applied Linguistics, 21(1), 3-5.
Wilkinson, M. (2011). WordSmith Tools: The best corpus analysis program for translators? Translation Journal, 15(3). Retrieved December 21, 2015, from http://translationjournal.net/journal/57corpus.htm
Williams, R. (1976). Keywords: A Vocabulary of Culture and Society. London: Fontana.
Wright, S. E., & Budin, G. (Eds.). (1997). Handbook of terminology management. Amsterdam/Philadelphia: John Benjamins.

Xiao, Z. (2010). Corpus creation. In N. Indurkhya, & F. Damerau (Eds.), The Handbook of Natural Language Processing (2nd ed., pp. 147-165). London: CRC Press.
Young, M. S., & Stanton, N. A. (2002). Attention and automation: New perspectives on mental underload and performance. Theoretical Issues in Ergonomics Science, 3(2), 178-194. doi:10.1080/14639220210123789
Zahurul, I., & Mehler, A. (2012). Customization of the Europarl corpus for translation studies. Proceedings of the Eighth International Conference on Language Resources and Evaluation (pp. 2505-2510). Istanbul: European Language Resources Association.
Zufferey, S., & Cartoni, B. (2012). English and French causal connectives in contrast. Languages in Contrast, 12(2), 232–250.
Zufferey, S., & Cartoni, B. (2014). A multifactorial analysis of explicitation in translation. Target, 26(3), 361-384.
Žigo, P. (2001). Lexikálne prevzatia na slovensko-rakúskom pomedzí [Lexical borrowings on the Slovak-Austrian border]. Slavica Slovaca, 36(1), 3-12.

Appendix A

The corpus analysis files and glossaries are available at the following link:

http://bit.ly/EP_corpusanalysis_glossaries_files

The *.zip archive contains:

Files
- BNC World wordlist, used as a reference corpus
- Stop list created from the BNC spoken subcorpus (3,000 words)
- Generated wordlist from the Europarl corpus (various formats and versions)
- Generated keyword list from the Europarl corpus (various formats and versions)
- Corpus analysis working sheet (contains the filters used in keyword selection and categorization)

Glossaries
- Glossary 1 (*.xls, *.apkg, *.TMEX)
- Glossary 2 (*.docx)
- Glossary 3 (*.docx)
- Glossary 4 (*.docx)

Version updates are documented in the text file Readme.txt. The files can be updated upon request.
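The keyword list in this archive was produced by comparing word frequencies in the Europarl corpus against the BNC reference wordlist. As a minimal sketch of the kind of keyness statistic such keyword tools compute (assuming Dunning's log-likelihood, one of the measures WordSmith Tools offers; the corpus sizes and frequencies below are purely illustrative, not figures from this study):

```python
import math

def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Dunning log-likelihood keyness score for one word: how strongly
    its frequency in the target corpus departs from the reference corpus."""
    total = size_target + size_ref
    joint = freq_target + freq_ref
    # Expected frequencies if the word were equally common in both corpora.
    expected_t = size_target * joint / total
    expected_r = size_ref * joint / total
    ll = 0.0
    if freq_target > 0:
        ll += freq_target * math.log(freq_target / expected_t)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_r)
    return 2 * ll

# Toy numbers (hypothetical): a word occurring 950 times in a 1M-word
# Europarl sample vs. 40 times in a 100M-word reference corpus scores
# highly; a word with proportionally equal frequencies scores 0.
score = log_likelihood(950, 1_000_000, 40, 100_000_000)
```

Words are then ranked by this score, and the top of the ranking forms the keyword list that the filters in Appendix B are applied to.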

Appendix B

Custom Word Lists from the EP Corpus Analysis

Contents

1 Filter 1: Abbreviations & verbs
1.1 Abbreviations
1.2 Verbs

2 Filter 2: Klein’s typology: list of categories except general interaction vocabulary
2.1 Group 1
2.2 Group 2
2.3 Group 4

3 Filter 3: Glossaries categorization: summary of words in glossaries except Glossary 1
3.1 Glossary 2
3.1.1 Initial categorization (groups 1-5)
3.1.1.1 Group 1
3.1.1.2 Group 2
3.1.1.3 Group 3
3.1.1.4 Group 4
3.1.1.5 Group 5
3.1.2 Categories 2 and 3 (semantic shifts & false friends)
3.1.3 Third categorization for the final list used for the glossary
3.2 Glossary 3
3.2.1 Initial selection
3.2.2 Final selection
3.3 Glossary 4

4 Filter 4: Word omissions
4.1 Group 1: mistakes
4.2 Group 2: general/irrelevant
4.3 Group 3: proper nouns
4.4 Group 4: proper nouns (politicians)
4.5 Group 5: unrecognized lemmas
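Each of the filters listed above sets aside one class of words (abbreviations, proper nouns, OCR/encoding mistakes, etc.) before the remaining candidates are categorized into glossaries. A hypothetical sketch of how such filter sets could be applied in sequence (the function name and the tiny filter sets are illustrative, not part of the thesis workflow):

```python
def apply_filters(keywords, filters):
    """Pass keywords through the named filter sets in order; return the
    surviving words plus a record of what each filter removed."""
    removed = {name: [] for name in filters}
    kept = []
    for word in keywords:
        for name, wordset in filters.items():
            if word in wordset:
                removed[name].append(word)
                break  # first matching filter claims the word
        else:
            kept.append(word)  # survived every filter
    return kept, removed

# Tiny illustrative filter sets drawn from the lists in this appendix.
filters = {
    "abbreviations": {"NATO", "ECB"},
    "proper nouns": {"LISBON", "TURKEY"},
    "mistakes": {"MRSÂ"},
}
kept, removed = apply_filters(
    ["NATO", "RATIFY", "LISBON", "MRSÂ", "SUBSIDIARITY"], filters)
# kept -> ["RATIFY", "SUBSIDIARITY"]
```

Whatever survives all four filters feeds the glossary categorization summarized in Filter 3.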

1 Filter 1: abbreviations and verbs 1.1 Abbreviations PPE, ACP, FR, PL, NOS, NGOS, ECU, PT, MRSÂ, NL, BSE, ECB, PSE, IGC, GMOS, ECOFIN, OSCE, HU, COD, NATO, EUROS, NGL, CNS, ES, EIB, SK, EMU, USD, GDP, VISAS, PHARE, CFP, PNR, ELDR, HÃ, PÃ, LÃ, EL, SIS, EDF, GSP, SME, NEO, FI, ILO, CÃ, GMO, EGF, FYROM, DG, LEZ, EPP, INTERREG, ECR, BG, ECO, FEIRA, INI, GRÃ, ICT, EEAS, EEC, ESDP, INTRA, IND, ALE, GBP, ITER, VAT, COREPER, ERDF, NGO, NSCH, UKIP, NAFO, EFSA, ECSC, EAGGF, EPLP, ALTENER, SSEL, IMO, EURES, EIT, EMAS, GM, TACS, DRC, EP, KLA, MFF, DAPHNE, ICCAT, CMO, UCITS, ICAO, FTA, GATS, MDGS, ETS, GNSS, KIVU, EASA, ESMA, MAGP, AARHUS, IUU, SYN, SADC, FIFG, UNMIK, ERTMS, EGNOS, UPE, ICTY, UNHCR, CSDP, CCCTB, TSE, EMCDDA, ECJ, SAPARD, ECHR, NEPAD, JPA, ECALL, KFOR, HIPC 1.2 Verbs RATIFY, TACKLE, EMPHASISE, REGULATE, COMPLY, EXPORT, ALLOCATE, DEFEND, FACILITATE, SIMPLIFY, REGRET, FULFIL, GOVERN, CONFIRM, PARTICIPATE, ENTITLE, CLARIFY, UNDERLINE, PURSUE, REITERATE, ASSURE, SOLVE, RESTRUCTURE, ABSTAIN, OBLIGE, DECLARE, DESERVE, THREATEN, ACKNOWLEDGE, JUSTIFY, REVISE, ADAPT, INCORPORATE, DISCHARGE, RESTRICT, DELAY, REPEAT, UNDERMINE, COOPERATE, REINFORCE, COMPLICATE, CONSTITUTE, ADVOCATE, PROPOSES, DISAPPOINT, INVEST, SUCCEED, DISABLE, SUSPEND, CONSOLIDATE, ANNOUNCE, BURDEN, PUBLISH, PATENT, ATTACH, INSIST, VIOLATE, PERMIT, ASSOCIATE, ENLARGE, ARISE, ENVISAGE, EXCLUDE, MODIFY, ELIMINATE, ABOLISH, ENSHRINE, INVITE, STIPULATE, POSTPONE, UPHOLD, RECOMMEND, ADVERTISE, PROHIBIT, INITIATE, EMERGE, DEVOTE, OBTAIN, QUALIFY, REPRESENT, CONVINCED, COUNTERFEIT, APPAL, ACCOMPANY, OBSERVE, WEAKEN, WARN, AWAIT, REFUSE, PREVAIL, UNDERLIE, WITHDRAW, SATISFY, COMBINE, EMPLOY, PROCEED, CONSULT, POSE, DEPEND, ANALYSE, REAFFIRM, ENTAIL, OPT, TREAT, STRENGTHENED, INTERVENE, INTENDS, DEVASTATE, RESTORE, REMOVE, DUMP, RECYCLE, COMPARE, DISTRIBUTE, PRESERVE, STRIVE, CORRESPOND, FORMULATE, RENEW, DESTROY, COMMEND, EVALUATE, UNIFY, RESUME, ACCUSE, DEPRIVE, IGNORE, SURROUND, EXPLOIT, HARMONISE, CONFINE, 
DEEPEN, DEDICATE, DIVIDE, ENFORCE, BELONG, STATED, TORTURE, CONNECT, ACCEDE, ADHERE, INTENSIFY, EQUIP, APPRECIATE, DISTORT, DEEM, DEPLORE, CONSIDER, OVERCOME, DENOUNCE, REPLACE, APPLAUD, EXERT, RESOLVED, RESERVE, APPOINT, EMPHASIZE, EXACERBATE, CLOTHE, OVERWHELM, PAVE, ABANDON, ENDANGER, PUNISH, DESIGNATE, CLONE, CONCENTRATE, EXPOSE, ALLOCATED, ISOLATE, REACT, DISREGARD, CONFUSE, OCCUPY, RECONCILE, DENY, TRANSLATE, GENERATE, INSPIRE, UNDERPIN, CONSTITUTES, CONFRONT, DEPLOY, DISPLACE, ALLEGE, DISAGREE,

FINANCE, EXCEED, ERADICATE, WORSEN, PERPETRATE, OUTLINE, IMPRISON, ENTRUST, PERSECUTE, HINDER, MOBILISE, ASSIST, ACCELERATE, STIMULATE, ANTICIPATE, DISPUTE, MOTIVATE, SMUGGLE, CLAUSE, INFRINGE, CENSURE, CONTAMINATE, LEGISLATE, CONSIST, TIGHTEN, REASSURE, SPECIFY, ASTONISH, ORIGINATE, SEEKS, STIPULATES, THREATS, POPULATE, HARMONIZE, EMBARK, FORESEE, RETAIN, ROOT, SCOURGE, DELETE, VERIFY, FEED, SUSTAIN, HONOUR, UNDERESTIMATE, TRANSFER, ADJOURN, DEPLETE, CROP, AGREES, DETERIORATE, CONTAINS, INTERPRET, TOLERATE, MODERNISE, WITNESS, REMIT, INJURE, MANUFACTURE, HAMPER, PROMOTE, ABIDE, DETAIN, JEOPARDISE, SOVEREIGN, OWE, ARRIVE, ASSUME, DEVISE, AFFIRM, EXPAND, CONSUME, SLAUGHTER, CONTRADICT, BOOST, RESPECTS, REPORT, COMPEL, DIFFER, RECTIFY, BENCHMARK, ALIGN, LAG, ORGANIZE, PERTAIN, EDUCATE, CRITICISE, INCUR, RECAST, UNDERGO, REOPEN, DETAINEE, EXPIRE, RAIL, PRIORITISE, CONTRIBUTES, ENDORSES, BROADEN, CONVENE, CREATE, UNDERLINES, LIBERALISE, AMAZE, SEIZE, DISSEMINATE, CONTRAVENE, LAY, ACCOMPLISH, PLEDGE, DIGNIFY, DISAPPEAR, ENRICH, PRECEDE, POLISH, ADAPTING, INFECT, REFRAIN, UNDERMINES, PRESCRIBE, REFUND, GRATIFY, MONITORED, DERIVE, PROCLAIM, CODIFY, MODERNISING, DIVERSIFY, REBUILD, ATTAIN, MITIGATE, OPPRESS, DICTATE, COMPENSATE, DEFINING, ENACT, PROMPT, CONCLUDED, PROVEN, WIDEN, RATIFYING, RECONSIDER, COLLABORATION, STAKE, WITNESSING, FRUSTRATE, CEASE, EMPOWER, ALLEVIATE, RELAUNCH, PREMISE, OBSTRUCT, RESIDE, RELOCATE, INVOKE, PROSECUTE, SACRIFICE, CONFER, MISUSE, ORGANISE, LEND, REFER, COMPETE, CONCUR, EXPEL, CLASSIFY, FREEZE, SITUATE, STRIKE, ENCOMPASS, ENTAILS, VETO, STABILISE, STEM, DISCARDS, NOTIFY, PENALISE, FLOUT, PARALYSE, EXEMPT, VIOLATES, TRANSMIT, DISCARD, ADJUST, PRESIDE, IMPEDE, FALSIFY, ASPIRE, FLEE, DISTURB, ALLOT, LICENSE, AFFLICT, ENVISAGED, SWIFT, HORRIFY, COMPILE, EXECUTE, EVOLVE, ACQUIRE, CONFIRMS, FOCUSE, ERODE, RESTATE, REPEAL, IMPLY, PRESUPPOSE, TOY, SADDEN, CONVEY, ESCALATE, COMPRISE, RECOGNIZE, FOSTER, ECHO, EXCITE, REVISIT, IMPOVERISH, ALLUDE, 
DIMINISH, DECREASE, CONCEAL, WAIVE, CLEANSE, CONFISCATE, RESOUND, INSTIGATE, OVERSHADOW, SUM, TRANSFORM, TRAMPLE, DEPORT, IMPACTS, ABDUCT, LOCATE, ADOPTS, SHADOW, CIRCUMVENT, CIRCULATE, LAPSE, REINSTATE, SUPERVISE, ADVOCATES, REIMBURSE, FORGE, LULL, RAISES, CURTAIL, SQUANDER, HARM, ACCUSTOM, STEER, CLARIFIES, COLLABORATE, ASSERT, ADMINISTER, REAFFIRMS, DISBURSE, INCITE, RAVAGE, DIFFERENTIATE, NEGLECT, CULMINATE, HECKLE, RESOLUTE, HUMILIATE, DISCOURAGE, COMMUNICATE, ELAPSE, REUNITE, AUTHORIZE, RENDER, FIRE, ENTRENCH, EXAGGERATE, APOLOGISE, ILLUSTRATE, AUTHORISE, MISUNDERSTAND, POSES, DOWNGRADE, CROW, CERTIFY, FORBID, BROADCAST, CALCULATE, TEMPT, EXTRADITE, BRING, ENVISAGES, DECENTRALIZE, REDISTRIBUTE, COIN, TIRE, CONVICT,

ENSURE, REITERATES, APPROVES, INCLINE, GADDAFI, OMIT, OUGHT, ASTOUND, PREJUDGE, DOMINATE, RENOUNCE, PLUNDER, UNLEASH, URGES, LEVY, DELUDE, WITHHOLD

2 Filter 2: Klein’s typology: list of categories except general interaction vocabulary 2.1 Group 1 COMMISSIONER, AMENDMENT, DIRECTIVE, RAPPORTEUR, PRESIDENCY, IMPLEMENT, COMMISSION, ENLARGEMENT, COLLEAGUE, COHESION, BUDGETARY, DRAFT, TABLED, UNION, SUMMIT, LEGISLATIVE, GUIDELINE, REPRESENTATIVE, CONVENTION, AMEND, WTO, MEP, HONOURABLE, SMES, EUROPEANS, INTERGOVERNMENTAL, SUBSIDIARITY, VISA, PLENARY, DELEGATION, PACT, PPE, ACP, RATIFY, TREATY, DECLARATION, MANDATE, PARAGRAPH, NOS, INTERINSTITUTIONAL, ABSTAIN, PROTOCOL, PACKAGE, CHARTER, EUROPOL, KYOTO, OMBUDSMAN, BILATERAL, CONSTITUTE, REFERENDUM, PROCEEDING, ACQUIS, ECB, RATIFICATION, PSE, CONSTITUTIONAL, IGC, MULTILATERAL, PRESIDENTS, DOSSIER, ALDE, OLAF, ECOFIN, OSCE, NATO, GUE, COUNCIL, COMITOLOGY, CFSP, INCENTIVE, FRONTEX, EIB, EMU, VERTS, EUROSTAT, EUROZONE, AMENDED, MEDA, VIS, PHARE, CONFEDERAL, UEN, EURATOM, CFP, ENFORCEMENT, PNR, TACIS, DIRECTORATE, EUROJUST, GALILEO, SIS, NATURA, GSP, SME, ILO, TROIKA, EGF, ERASMUS, EURODAC, FYROM, DG, NABUCCO, EPP, INTERREG, ECR, UCLAF, ASEM, ICT, EEAS, EEC, ESDP, SOCRATES, RECTIFY, DOSSIERS, SOLVIT, PLURALISM, EPAS, IND, ASEAN, QUAESTORS, MULTILATERALISM, INTERPARLIAMENTARY, BUREAU, OECD, ESF, EUROBAROMETER, TRANSNISTRIA, EUROGROUP, ITER, UNILATERALLY, COREPER, ERDF, EUROMED, GNI, EFD, UKIP, MUNDUS, ACTA, ENP, NAFO, EFSA, ECSC, EUROVIGNETTE, MONTERREY, EAGGF, TOBIN, EURES, EIT, EMAS, TACS, DRC, EP, KLA, MFF, DAPHNE, ICCAT, UCITS, ICAO, FTA, INTERGROUP, GATS, MDGS, HEMICYCLE, ETS, GNSS, EASA, ESMA, AARHUS, SADC, FIFG, UNMIK, ERTMS, ECOLABEL, EGNOS, UPE, ICTY, UNHCR, CSDP, CCCTB, TSE, EMCDDA, ECJ, SAPARD, ECHR, NEPAD, ECALL, HIPC 2.2 Group 2 FISHERY, EMISSION, AGRICULTURAL, FARMER, IMPOSE, MONETARY, SUBSTANCE, TOURISM, ECU, BIODIVERSITY, INFRINGEMENT, TOBACCO, AVIATION, REPERCUSSION, BIOFUELS, FISCAL, COUNTERFEITING, DEFICIT, PESTICIDE, ECOSYSTEM, GREENHOUSE, LIIKANEN, TUNA, ADDITIVE, BROADBAND, GMO, CONTAMINATE, MULTINATIONALS, PROPORTIONALITY, INTEROPERABILITY, WATERWAY, 
POLLUTANT, BIOMETRIC, ECO, MACROECONOMIC, LAMFALUSSY, FLEXICURITY, BLUEFIN, INTRA, DOPING, ADDITIVES, HAKE, DESERTIFICATION,

PHTHALATES, WYNN, BIOFUEL, GM, DIOXIN, CONSTITUENT, BIOGAS, CMO, PHARMACOVIGILANCE, BIOCIDAL, MICROFINANCE, CARCINOGENIC, IUU, NANOTECHNOLOGY, NANOMATERIALS, EXCISE 2.3 Group 4 EXTREMISM, LIBERALISE, FUNDAMENTALISM, MILITARISATION, IMPERIALIST, RENATIONALISATION, UNDEMOCRATIC, SEMITISM, SUPERSTATE, DEMONSTRATION, AUSTERITY, INTOLERANCE, TOTALITARIANISM, LEONARDO, RADICALISATION, ISLAMIST

3 Filter 3: Glossaries categorization: summary of words in glossaries except Glossary 1 3.1 Glossary 2 3.1.1 Initial categorization (groups 1-5) 3.1.1.1 Group 1 COMMISSIONER, COMMISSION, REFORM, COLLEAGUE, DEMOCRACY, SOLIDARITY, TERRORISM, EMISSION, DEMOCRAT, HUMANITARIAN, FARMER, GLOBALISATION, INITIATIVES, LIBERALISATION, DISCRIMINATION, SANCTION, PARTNERSHIP, INTERGOVERNMENTAL, SUBSIDIARITY, VISA, DELEGATION, TERRORIST, PACT, MANDATE, PARAGRAPH, TOURISM, INSTITUTIONAL, REFORMS, AUDITOR, PREVENTION, QUOTA, INTERINSTITUTIONAL, REGIME, DISCUSSIONS, PASSENGER, POLITICIAN, BILATERAL, SOCIALIST, ELEMENT, REFERENDUM, INVEST, STRATEGIC, BIODIVERSITY, ADMINISTRATIVE, TECHNOLOGIES, PATENT, MOBILITY, MULTILATERAL, TRANSATLANTIC, CONTINENT, HARMONISED, SOCIO, OPERATOR, MILLENNIUM, EFFICIENT, TOBACCO, AMBITION, TRANSITIONAL, LEGISLATOR, CORRUPTION, PETITION, INNOVATIVE, STABILISATION, TECHNOLOGICAL, XENOPHOBIA, DISTRIBUTE, ENVIRONMENTALLY, GREENS, TEXTILE, EVALUATION, TERRITORIAL, JUDICIAL, RECONSTRUCTION, EUROZONE, QUOTE, HARMONISE, TRANSPOSITION, DICTATORSHIP, COORDINATE, LEGITIMACY, DEFICIT, INDICATOR, DEMOCRATICALLY, AQUACULTURE, GENOCIDE, MEDICINAL, SENSITIVE, RACISM, PREVENTIVE, PESTICIDE, GLOBALISED, CLONE, TRAGIC, ECOSYSTEM, UNILATERAL, MACRO, ECOLOGICAL, INTER, INSPIRE, FORUM, SPHERE, TRAGEDY, FINANCE, DISCRIMINATORY, ALLIANCE, MOBILISE, ASSIST, STIMULATE, TUNA, BUREAUCRACY, PHARMACEUTICAL, BROADBAND, ECONOMICALLY, LOBBYIST, REALISTIC, CRITICISM, CENSURE, SITUATIONS, CONSULTATIONS, EMBARGO, MULTINATIONALS, GEOPOLITICAL, INTERCULTURAL, NEOLIBERAL, PROCEDURAL, UNIVERSAL, BIOMETRIC, PHENOMENON, CHECKS, HISTORIC, IMMUNITY, TOLERATE, SYNERGIES, DICTATOR, PANDEMIC, MACROECONOMIC, MONOPOLY, LIBERALS,

SYNERGY, SCANDALOUS, RECIPROCITY, DISCRIMINATE, TOTALITARIAN, TELECOMMUNICATIONS, GEOGRAPHICAL, LIBERALISED, CONDITIONALITY, RIGOROUS, ORIENT, INHUMANE, DECENTRALISED, CRITICISE, DOPING, PRIORITISE, BIOTECHNOLOGY, EXTREMISM, LIBERALISE, ETHIC, TREND, FUNDAMENTALISM, FEDERALIST, HECTARE, CRIMINALS, INTERNATIONALLY, LOBBY, IMPERIALIST, DIPLOMATIC, DEMONSTRATOR, COLLABORATION, ANTIBIOTIC, SEMITISM, INDUSTRIALISED, BIO, INCLUSIVE, COUNTERPRODUCTIVE, CATASTROPHE, VACCINE, QUALIFIED, PROSTITUTION, INGREDIENT, TSUNAMI, SYSTEMATICALLY, INADEQUATE, JUDICIARY, EUROCONTROL, UNDECLARED, INTERMODAL, INTOLERABLE, MULTINATIONAL, CIVILISATION, PHTHALATES, ELECTRONIC, LICENSE, PORNOGRAPHY, DEMONSTRATION, ORIENTED, DIPLOMACY, BIOFUEL, AUTOMOTIVE, INTERREGIONAL, CAMPAIGNS, DIOXIN, BIOGAS, SEMESTER, COSMETIC, CYBER, LATIN, BIOTECHNOLOGICAL, DIGITAL, MINISTERIAL, BIOCIDES, REPRODUCTIVE, INTOLERANCE, GEOSTRATEGIC, SOYA, REFORMED, EPIDEMIC, TOTALITARIANISM, BIOCIDAL, LOCATE, COORDINATOR, ORIENTATION, ORGANISM, MICROFINANCE, PRODUCTIVE, BARBARIC, INTEGRITY, CENTRALISED, PROBLEMATIC, CARCINOGENIC, PAEDOPHILIA, RADICALISATION, HUMANE, COMMUNICATE, MASSACRE, EUROSCEPTIC, NANOTECHNOLOGY, UNCONTROLLED, MULTIFUNCTIONAL, MEDIATION, THEMATIC, EUROPASS, AGRO, DESTABILISATION, NANOMATERIALS, BRUTAL, BLOCKADE, DECENTRALISATION, INTERMODALITY, BANKING 3.1.1.2 Group 2 PROCEDURE, INSTRUMENT, MONITOR, ASPECT, CRITERION, LEGISLATIVE, COORDINATION, INTEGRATE, UNACCEPTABLE, PLENARY, DECLARATION, HARMONISATION, ASSISTANCE, EXPORT, IMMIGRANT, TRANSPARENT, ALLOCATE, FACILITATE, LABELLING, PARTICIPATION, COMPLICATE, CANDIDATE, SUSPEND, PUBLISH, ASSOCIATE, REGULATORY, REVISION, PENALTY, MIGRATION, COUNCIL, CRISES, DONOR, AUTHORISATION, IGNORE, COHERENT, FISCAL, LEGITIMATE, MODERNISATION, SECTORAL, CONFEDERAL, AIRLINES, ISOLATE, ADEQUATE, PERSPECTIVES, QUALIFICATION, SPECIFY, COEXISTENCE, GLOBALLY, SECRETARIAT, TRANSIT, PROACTIVE, MODERNISE, COLLECTIVE, MULTILINGUALISM, RAPID, TRANSNATIONAL, DYNAMIC, SOVEREIGN, 
STANDARDISATION, COMPATIBLE, MILITARISATION, INTERPARLIAMENTARY, INFECT, PRIMARILY, COMPENSATE, VACCINATION, CERTIFICATION, CONSISTENCY, TRANSACTION, DELEGATE, ORGANISE, ABSURD, STABILISE, DEGRADE, DYNAMISM, BENEFICIAL, COMPLIMENT, EUROVIGNETTE, PRECEDENCE, EXPORTER, PRECEDENT, VACCINATE, CODIFICATION, ESCALATION, REGIONALISATION, UNPRECEDENTED, ADEQUATELY, AUTHORISE, PARAMETER, DOMINATE 3.1.1.3 Group 3

COOPERATION, PRESIDENCY, RESOLUTION, GUARANTEE, AGRICULTURAL, CONVENTION, AGRICULTURE, MONETARY, ADOPTION, SUBSTANCE, BUDGETS, DIMENSION, ADAPT, COOPERATE, CONSENSUS, CONSOLIDATE, CREDIBILITY, MODIFY, ELIMINATE, RATIFIED, INITIATE, DEADLINE, PROSPERITY, DEMOCRATISATION, RECYCLE, APPLICANT, COHERENCE, CLARIFICATION, PENSION, MOBILISATION, INVESTOR, ALLOCATED, ASPIRATION, PROTECTIONISM, IMPORTS, GENERATE, CONFRONT, EXTREMIST, INSPECTIONS, PERSECUTE, JOURNALIST, ACCELERATE, CONSOLIDATION, CONTAMINATE, COMPENSATION, UPDATE, ADMINISTRATIONS, PERSECUTION, IMPERATIVE, CONCESSION, INTERPRET, FACILITATION, MODIFIED, INTERVENTIONS, LIMITS, MISSIONS, EXPAND, HAZARDOUS, ALLOCATION, DIVERSIFY, EXEMPLARY, PROMPT, SENSIBLE, SCENARIO, MUNICIPALITY, RELOCATE, DESTRUCTION, CORRECTIONS, BENEFITING, EXECUTIONS, CORRECTION, SITUATE, PARALYSE, DIVERSIFICATION, PERSECUTED, SUPERSTATE, GRADUAL, VERIFICATION, SUPPLEMENT, ESCALATE, CONFISCATE, MARGINALISED, TRANSFORM, DEPORT, MARGINALISATION, INCONSISTENCY, PROHIBITION, LEGISLATIONS, CONVENTION'S, COLLABORATE, CULMINATE, RECIPIENT, RELOCATION, CALCULATE, COOPERATIVE, CONTAMINATION, SOLVENCY, DECENTRALIZE, REDISTRIBUTE, INCLINE, INSTALLATIONS 3.1.1.4 Group 4 APPLAUSE, CONSTITUTIONAL, GUARANTEED, OPERATIONAL, INTEROPERABILITY, PROLIFERATION, LIBERALIZATION, ABSTENTION, REDUCTIONS 3.1.1.5 Group 5 MARKETS, MULTIANNUAL, RESPONSIBILITIES, INVESTMENTS, VIOLATION, CONCENTRATE, PILLARS, ACTIVIST, REPRESSION, PRECEDE, REINTEGRATION, TRIBUTE, MODULATION, COGENERATION, BLACKLIST, MIERT, INTERLOCUTOR 3.1.2 Categories 2 and 3 (semantic shifts & false friends) further discussed with Ms. 
Rejšková COOPERATION, PRESIDENCY, RESOLUTION, PROCEDURE, GUARANTEE, INSTRUMENT, MONITOR, ASPECT, CRITERION, LEGISLATIVE, AGRICULTURAL, CONVENTION, COORDINATION, INTEGRATE, AGRICULTURE, MONETARY, UNACCEPTABLE, ADOPTION, PLENARY, DECLARATION, HARMONISATION, ASSISTANCE, SUBSTANCE, BUDGETS, EXPORT, IMMIGRANT, TRANSPARENT, ALLOCATE, FACILITATE, DIMENSION, LABELLING, PARTICIPATION, ADAPT, COOPERATE, CONSENSUS, COMPLICATE, CANDIDATE, SUSPEND, CONSOLIDATE, CREDIBILITY, PUBLISH, ASSOCIATE, REGULATORY, MODIFY, ELIMINATE, RATIFIED, REVISION, PENALTY, INITIATE, MIGRATION, DEADLINE,

COUNCIL, CRISES, PROSPERITY, DONOR, AUTHORISATION, DEMOCRATISATION, RECYCLE, IGNORE, COHERENT, FISCAL, LEGITIMATE, APPLICANT, COHERENCE, MODERNISATION, CLARIFICATION, SECTORAL, CONFEDERAL, PENSION, MOBILISATION, INVESTOR, AIRLINES, ALLOCATED, ISOLATE, ASPIRATION, PROTECTIONISM, IMPORTS, ADEQUATE, GENERATE, CONFRONT, PERSPECTIVES, EXTREMIST, INSPECTIONS, PERSECUTE, JOURNALIST, ACCELERATE, QUALIFICATION, CONSOLIDATION, CONTAMINATE, COMPENSATION, SPECIFY, UPDATE, ADMINISTRATIONS, COEXISTENCE, GLOBALLY, PERSECUTION, IMPERATIVE, SECRETARIAT, TRANSIT, CONCESSION, PROACTIVE, INTERPRET, MODERNISE, COLLECTIVE, MULTILINGUALISM, RAPID, TRANSNATIONAL, DYNAMIC, SOVEREIGN, FACILITATION, MODIFIED, INTERVENTIONS, LIMITS, MISSIONS, STANDARDISATION, EXPAND, HAZARDOUS, COMPATIBLE, MILITARISATION, INTERPARLIAMENTARY, ALLOCATION, INFECT, PRIMARILY, DIVERSIFY, COMPENSATE, VACCINATION, EXEMPLARY, PROMPT, CERTIFICATION, CONSISTENCY, SENSIBLE, SCENARIO, TRANSACTION, MUNICIPALITY, RELOCATE, DESTRUCTION, CORRECTIONS, BENEFITING, DELEGATE, ORGANISE, EXECUTIONS, CORRECTION, SITUATE, ABSURD, STABILISE, DEGRADE, DYNAMISM, PARALYSE, BENEFICIAL, COMPLIMENT, EUROVIGNETTE, DIVERSIFICATION, PERSECUTED, SUPERSTATE, GRADUAL, VERIFICATION, PRECEDENCE, SUPPLEMENT, EXPORTER, PRECEDENT, ESCALATE, VACCINATE, CODIFICATION, CONFISCATE, MARGINALISED, TRANSFORM, DEPORT, MARGINALISATION, ESCALATION, REGIONALISATION, INCONSISTENCY, UNPRECEDENTED, PROHIBITION, LEGISLATIONS, ADEQUATELY, CONVENTION'S, COLLABORATE, CULMINATE, RECIPIENT, AUTHORISE, RELOCATION, CALCULATE, COOPERATIVE, CONTAMINATION, SOLVENCY, DECENTRALIZE, REDISTRIBUTE, PARAMETER, INCLINE, INSTALLATIONS, DOMINATE 3.1.3 Third categorization for the final list used for the glossary (category 2 + 3 from filter internationalisms 2). 
PRESIDENCY, PROCEDURE, GUARANTEE, LEGISLATIVE, AGRICULTURAL, AGRICULTURE, UNACCEPTABLE, SUBSTANCE, BUDGETS, FACILITATE, LABELLING, PARTICIPATION, ASSOCIATE, PENALTY, DEADLINE, COUNCIL, EVALUATION, JUDICIAL, AQUACULTURE, AIRLINES, PERSECUTE, JOURNALIST, UPDATE, ADMINISTRATIONS, INTERPRET, RAPID, TRANSNATIONAL, FACILITATION, CONSUME, HAZARDOUS, INTERPARLIAMENTARY, PROMPT, COLLABORATION, SENSIBLE, MUNICIPALITY, BENEFITING, EXECUTIONS, BENEFICIAL, EUROVIGNETTE, GRADUAL, PRECEDENCE, SUPPLEMENT, VACCINATE, INCONSISTENCY, UNPRECEDENTED, LEGISLATIONS, COLLABORATE, RECIPIENT, CALCULATE, SOLVENCY, REDISTRIBUTE 3.2 Glossary 3

3.2.1 Initial selection PROPOSAL, AMENDMENT, DIRECTIVE, ADOPT, PROPOSE, IMPLEMENT, INVOLVE, COHESION, DRAFT, COMPROMISE, APPROVE, COMBAT, TABLED, PREPARE, ACCORD, OBJECTIVES, GUIDELINE, AMEND, HIGHLIGHT, COMMIT, RESOLVE, SAFEGUARD, IMPOSE, BIND, ENDORSE, RATIFY, TREATY, SUBSIDY, ENTITLE, REITERATE, ENHANCE, ABSTAIN, ACKNOWLEDGE, INCORPORATE, UNDERMINE, REINFORCE, VIOLATIONS, PROCEEDING, ANNEX, EFFICIENCY, BURDEN, COMPLIANCE, ENVISAGE, WITHDRAW, PROCEED, ENTAIL, OPT, INTERVENE, PARLIAMENTARIAN, CONFINE, PRETEXT, DISPARITY, MOTIONS, DENOUNCE, APPOINT, EXACERBATE, INFRINGEMENTS, STOCKS, PREREQUISITE, MORATORIUM, DISPARITIES, ADVANCE, IMPRISON, IMPUNITY, IMPETUS, INFRINGE, STIPULATES, EMBARK, RETAIN, ADJOURN, DETERIORATE, MERIT, JEOPARDISE, OWE, AFFIRM, OBJECTION, CONTRADICT, FLEXICURITY, BENCHMARK, EXEMPTIONS, PERTAIN, INCUR, WAIVER, PLURALISM, CONVENE, CONSOLIDATING, UNDERLINES, CONTRAVENE, ENTIRETY, WARRANT, ATTAIN, PRECONDITIONS, ACCEDING, TIMEFRAME, IRRESPECTIVE, ALLEVIATE, RELAUNCH, PREMISE, CABOTAGE, AGGRAVATE, CONCUR, EXPEL, ASSENT, ENTAILS, LIVELIHOOD, ALLOT, COMMITTEE, HEARINGS, REPEAL, AUSTERITY, INTERFERE, RECOURSE, CUTBACK, PRECARIOUS, FOSTER, ALLUDE, DIMINISH, CONCEAL, WAIVE, CLEANSE, BORNE, UNEQUIVOCAL, DERIVATIVE, OVERSHADOW, BOTTLENECK, TRAMPLE, FLAGRANT, CIRCUMVENT, REINSTATE, CURTAIL, REDRESS, DISBURSE, RAVAGE, HECKLE, AUTHORIZE, COMMERCIALISATION, GADDAFI, ASTOUND, REFOULEMENT, RENOUNCE, HEARTEN, LEVY, BEFIT 3.2.2 Final selection PROPOSAL, AMENDMENT, DIRECTIVE, ADOPT, PROPOSE, IMPLEMENT, INVOLVE, COHESION, DRAFT, COMPROMISE, APPROVE, COMBAT, TABLED, PREPARE, ACCORD, OBJECTIVES, GUIDELINE, AMEND, HIGHLIGHT, COMMIT, IMPOSE, BIND, ENDORSE, RATIFY, TREATY, SUBSIDY, ENTITLE, REITERATE, ENHANCE, ABSTAIN, ACKNOWLEDGE, INCORPORATE, UNDERMINE, REINFORCE, PROCEEDING, ANNEX, EFFICIENCY, BURDEN, COMPLIANCE, ENVISAGE, WITHDRAW, PROCEED, ENTAIL, OPT, INTERVENE, PARLIAMENTARIAN, CONFINE, PRETEXT, DISPARITY, MOTIONS, DENOUNCE, APPOINT, EXACERBATE, STOCKS, 
PREREQUISITE, MORATORIUM, ADVANCE, IMPRISON, IMPUNITY, IMPETUS, INFRINGE, EMBARK, RETAIN, ADJOURN, DETERIORATE, MERIT, JEOPARDISE, OWE, AFFIRM, OBJECTION, CONTRADICT, FLEXICURITY, BENCHMARK, PERTAIN, INCUR, WAIVER, PLURALISM, CONVENE, CONTRAVENE, ENTIRETY, WARRANT, ATTAIN, TIMEFRAME, IRRESPECTIVE, ALLEVIATE, RELAUNCH, PREMISE, CABOTAGE, AGGRAVATE, CONCUR, EXPEL, ASSENT,

LIVELIHOOD, ALLOT, HEARINGS, REPEAL, AUSTERITY, INTERFERE, RECOURSE 3.3 Glossary 4 ENLARGEMENT, TRANSPARENCY, ACCESSION, SUSTAINABLE, SUMMIT, WTO, MEP, ASYLUM, REFUGEE, SMES, SCHENGEN, PPE, ACP, NOS, ECU, EUROPOL, KYOTO, OMBUDSMAN, ACQUIS, ECB, PSE, IGC, ALDE, OLAF, OSCE, NATO, GUE, COMITOLOGY, CFSP, FRONTEX, EIB, EMU, VERTS, EUROSTAT, GDP, MEDA, VIS, PHARE, HAMAS, UEN, EURATOM, CFP, TRIALOGUE, PNR, TACIS, EUROJUST, GALILEO, MERCOSUR, SIS, EDF, LIIKANEN, NATURA, GSP, SME, ILO, GMO, TROIKA, EGF, ERASMUS, PROPORTIONALITY, EURODAC, FYROM, DG, NABUCCO, EPP, INTERREG, ECR, UCLAF, ASEM, LAMFALUSSY, ICT, EEAS, EEC, ESDP, SOCRATES, SOLVIT, EPAS, ASEAN, QUAESTORS, MULTILATERALISM, OECD, ESF, EUROBAROMETER, TRANSNISTRIA, RENATIONALISATION, EUROGROUP, ITER, COMPLEMENTARITY, COREPER, ERDF, EUROMED, GNI, EFD, UKIP, MUNDUS, ACTA, ENP, NAFO, EFSA, ECSC, MONTERREY, EAGGF, TOBIN, EURES, EIT, EMAS, MAGHREB, TACS, DRC, KLA, MFF, DAPHNE, ICCAT, CMO, UCITS, ICAO, PHARMACOVIGILANCE, FTA, HEMICYCLE, ETS, GNSS, SHARIA, EASA, ESMA, AARHUS, IUU, SADC, FIFG, ERTMS, ECOLABEL, EGNOS, ICTY, UNHCR, CSDP, CCCTB, ECJ, SAPARD, ECHR, NEPAD, EXCISE, ECALL, KFOR, HIPC

4 Filter 4: Word omissions 4.1 Group 1: mistakes CONJURER, Â, THE, €, MRÂ, COMMISSION€™S, FR, PL, POLITIC, Ã, PARLIAMENT€™S, PT, MRSÂ, UNION€™S, TRANS, EU€™S, EUROPE€™S, EURÂ, SV, PEOPLE€™S, BÃ, DOHA, SOLANA, HU, RO, ES, SK, RÃ, MÃ, SÃ, Å, COUNCIL€™S, WOMEN€™S, Î, STATES€™, TODAY€™S, JÃ, HÃ, WALLSTRÃ, LOMÃ, PÃ, LÃ, EL, GÃ, €™, ARTICLEÂ, CANCà N, LIIKANEN, CITIZENS€™, GUANTÃ, GONZÃ, FÃ, MALMSTRÃ, FI, JOSÃ, CÃ, COUNTRY€™S, NDEZ, SCHÃ, GARCÃ, €˜THE, BERÃ, €˜NO€™, LEZ, MARÃ, RAPPORTEUR€™S, WORLD€™S, BG, MARTÃ, INI, GRÃ, CS, PIDLA, NOÂ, MADAMÂ, SCHRÃ, PLATH, DEN, SJÃ, COUNTRIES€™, GROSSETà TE, AMENDMENTÂ, MAIJ, GRADIN, AÂ, COMMUNITY€™S, SEPPÃ, WEGGEN, STEDT, ZU, HOUSE€™S, SL, COMMISSIONERÂ, PEIJS, €˜YES€™, PRESIDENCY€™S, DELL'ALBA, YEAR€™S, VÃ, NAÃ, YEARS€™, CEDERSCHIÃ, LALUMIÃ, LINKOHR, WORKERS€™, VEZ, MIGUÃ, HELMS, NEN, IÂ, €˜EUROPEAN, GROUP€™S, PLOOIJ, LVAREZ, MEMBERS€™, ANDRÃ, KOVÃ, PRONK, CHILDREN€™S, COMMITTEE€™S, TOMORROW€™S, SCHWAIGER, COMMISSIONER€™S, BUSHILL, PARAGRAPHÂ, GUTIÃ, EUROPEANÂ, RREZ, KLAß, EUROPE€™, D'Ã, PRESIDENTÂ, MACCORMICK, GOVERNMENT€™S, GORSEL, PRÃ, RUSSIA€™S, ONE€™S, POTOÄ, ESTÃ, MARTEN, GAUZÃ, TTERING, ORBÃ, VALLÃ, CABROL, RFLER, QUADRAS, IMBENI, ROJO, CHÃ, MARSET, SMET, GHILARDOTTI, REPORT€™S, HERNÃ, ROMANIA€™S, LEGHOLD, BRINKHORST, RY, MCCARTIN, GRAÃ, SARYUSZ, ISRAEL€™S, SAKELLARIOU, EMPT, USDÂ, MATHIES, COMMUNIQUÃ, CRÃ, NEYRA, FERNÃ, CHINA€™S, MOLLAR, DEÂ, FRANÃ, JOVÃ, RLING, MAGP, €˜WE, THEORIN, CONVENTION€™S, BNER, TELKÃ, WULF, IRAN€™S, DYBKJÃ, KÃ, JIMÃ, TAXPAYERS€™, REZ, BOOGERD, POOS, Ä, PRETS, POMÃ, BONTEMPI, MEMBERÂ, TÃ, SUANZES, TEBORG, FIORI, JANUARYÂ, ZÃ, DESAMA, €˜A, PIMENTA, METTEN, PUBLIC€™S, MPER, NÃ, RIIS, HULTEN, CALAN, SZÃ, EFÄ, OVIÄ, SANDBÃ, EISMA, FRAHM 4.2 Group 2: general/irrelevant MADAM, EUR, FIRSTLY, EU'S, SECONDLY, THIRDLY, LASTLY, NL, BSE, VICE, NONETHELESS, COD, EARMARK, EUROS, CNS, BANANA, FOURTHLY, USD, LAUNDER, NOON, TOPIC, TOPICAL, TODAY'S, NEO, FARMING, BEEF, KILOMETRE, BANANAS, GBP, TENS, NSCH, MANOEUVRE, 
ALTENER, SSEL, VEGETABLE, FACTO, EMPTIVE, THIRDS, GOOGLE, NORD, HOC, OPEL 4.3 Group 3: proper nouns LISBON, EURO, TURKEY, RUSSIA, KOSOVO, MEDITERRANEAN, PALESTINIAN, BELARUS, BARROSO, BRUSSELS, ROMA, PRODI, BALKANS, UKRAINE, AMSTERDAM, GREECE, CYPRUS, ROMANIA, COPENHAGEN, STRASBOURG, BULGARIA, TURKISH, PORTUGAL, LUXEMBOURG, CZECH,

SWEDISH, AFGHANISTAN, BALTIC, PORTUGUESE, CHECHNYA, SERBIA, HUNGARIAN, IRAN, SWEDEN, DÃ, CROATIA, SLOVAKIA, POLAND, TAMPERE, ISRAELI, FINLAND, FISCHLER, AUSTRIAN, HERZEGOVINA, NETHERLANDS, AUSTRIA, GAZA, MACEDONIA, BELGIAN, GEORGIA, MOLDOVA, SPANISH, ALBANIA, MOROCCO, BURMA, FINNISH, BARCELONA, LAEKEN, TUNISIA, CYPRIOT, DANISH, CUBA, TURKEY'S, CONGO, BOSNIA, DARFUR, CAUCASUS, COTONOU, HELSINKI, LITHUANIA, ALBANIAN, TALIBAN, PUTIN, TIMOR, TIBET, SAKHAROV, SARKOZY, SLOVENIA, SPAIN, PALESTINIANS, BALKAN, STOCKHOLM, ZIMBABWE, ESTONIA, BELGIUM, THESSALONIKI, SUDAN, NAMO, SLOVENIAN, HUNGARY, LATVIA, GOTHENBURG, ITALY, MALTA, BELARUSIAN, SCHREYER, DUTCH, DENMARK, GENEVA, MAASTRICHT, DANUBE, LIBYA, RUSSIAN, TIBETAN, FEIRA, PAKISTAN, AFGHAN, ROMANIAN, ALGERIA, BERLUSCONI, OBAMA, TURKEY€™S, YUGOSLAVIA, FLORENZ, CYPRIOTS, PALESTINE, HAITI, ROTHLEY, AZORES, LAMPEDUSA, LANNOYE, MEDINA, KURDISH, KYRGYZSTAN, BROEK, SLOVAK, CHECHEN, SAHARAN, SYRIA, SAHARA, ROTH, TUNISIAN, BURUNDI, HRKOP, HAGUE, GALICIA, KOREA, KALININGRAD, COLOMBIA, BASEL, CUBAN, ICELAND, BULGARIAN, DAYTON, ANNAN, BURMESE, CANARY, KOSOVO'S, ISRAELIS, RWANDA, MYANMAR, NCHEZ, AZERBAIJAN, GUANTANAMO, CHERNOBYL, BASQUE, MAURITANIA, BIOMASS, FUKUSHIMA, MOROCCAN, CHAD, CONGOLESE, LV, IRANIAN, ASIA, ENVISAGED, ANKARA, BEIJING, ALGERIAN, OSSETIA, GROSCH, LAMA, QAEDA, LEBANON, INDONESIA, UKRAINIAN, MINSK, CUNHA, BISSAU, UZBEKISTAN, RGENSEN, CAMPANIA, FABRA, GUINEA, CROATIA'S, LAHTI, PETERSBERG, HARMONIZED, SAHRAWI, ALBANIANS, SUDANESE, TURKMENISTAN, JOHANNESBURG, KOSOVARS, HLE, NIGERIA, GIL, ESSEN, TIBETANS, FABRE, KIVU, BRENNER, KOSOVAR, GUIN, BALI, TURK, KURDS, ROMANO, MADRID, SAMLAND, LANGENHAGEN, YUGOSLAV, SEVILLE, SRPSKA, SEATTLE, NER, MCKENNA, CARPEGNA, BELGRADE, RUSSIA'S, TAIWAN, STENMARCK, OSLO 4.4 Group 4: proper nouns (politicians) VERHEUGEN, BROK, NGL, SANTER, SCHULZ, MONTI, BARNIER, FRATTINI, POETTERING, SWOBODA, BOLKESTEIN, PALACIO, ASHTON, POSSELT, REDING, OOMEN, VITORINO, JARZEMBOWSKI, BENDIT, CRESPO, LEHNE, 
RUIJTEN, MILOSEVIC, WALDNER, LAMY, REHN, BARROT, FERRERO, BONINO, LUKASHENKO, DEMONSTRATE, WOGAU, MCCREEVY, RANDZIO, OOSTLANDER, LANGEN, BOURLANGES, BANGEMANN, COHN, KARAS, VERHOFSTADT, ROMPUY, JUNCKER, MUGABE, VELZEN, COELHO, BEHRENDT, MERKEL, HAUTALA, GRAEFE, FONTAINE, BLOKLAND, MULDER, STERCKX, BARINGDORF, DIMAS, FERBER, DAUL, THEATO, ALMUNIA, LEINEN, GEBHARDT, HAUG, TITLEY, IZQUIERDO, SALAFRANCA, NIELSON, DUPUIS, DUISENBERG, TAJANI, HAARDER, HATZIDAKIS, FATUZZO, PATTEN, PIRKER, NASSAUER, KOFI, KALLAS, LUDFORD, FRASSONI, AZNAR, MANDELSON, ELLES, BUITENWEG, SUU, TRAKATELLIS,

EUROPARL CORPUS HWFL GLOSSARIES 193

SCHROEDTER, NAPOLITANO, TRICHET, KYI, VIGO, PIEBALGS, DIAMANTOPOULOU, STAES, LANCKER, LAMASSOURE, ECHELON, BOWIS, ALAVANOS, CHIRAC, KATIFORIS, WURTZ, LULLING, BUZEK, BERTENS, BREYER, TURMES, KAUPPI, SOLBES, MORILLON, MAATEN, SAVARY, BELDER, LAGENDIJK, HOWITT, CAPPATO, KROES, GALEOTE, BUSQUIN, PAPAYANNAKIS, SACCONI, TANNOCK, CAUDRON, MAES, THORS, FRAGA, FAVA, ROURE, COLOM, TINDEMANS, WIJSENBEEK, ROBLES, BJERREGAARD, PAASILINNA, GARGANI, FIGUEIREDO, ORDEN, ZANA, DIAMANDOUROS, OETTINGER, MARGALLO, KARAMANOU, KRATSA, BONDE, OREJA, BORRELL, GARRIGA, AHERN, MORGANTINI, GOLLNISCH, MONNET, ZAPATERO, BANOTTI, SCHMID, DEPREZ, AUBRESPY, DIMITRAKOPOULOS, BOEL, MUSCARDINI, CASACA, DIRECTIVE'S, NAGORNO, DALAI, LANGE, ROCARD, BLAK, TSATSOS, LIESE, AELVOET, VARELA, DERMAN, FARAGE, SOUCHET, ORTEGA, ROTHE, KREISSL, LEYLA, SCHNELLHARDT, SILGUY, ORTUONDO, MOURA, NEVES, RIBEIRO, WEILER, VOGGENHUBER, BURG, POLLEDO, KEYSER, GOEBBELS, VALLELERSUNDI, ROSSA, WIERSMA, BUTTIGLIONE, MENRAD, VIRRANKOSKI, ANGELILLI, GUCHT, GAHRTON, JOUYET, PIECYK, STUBB, MARFIL, NIEBLER, CAMPOS, BLOTTNITZ, THYSSEN, GIANSILY, JEGGLE, ANDERSSON, RAPKAY, MYLLER, ELMAR, NAPOLETANO, KIRKHOPE, KLAMT, ISLER, PAPOUTSIS, PAULSEN, MARTENS, KABILA, AUNG, LEWANDOWSKI, VELD, TSAGAROPOULOU, HIERONYMI, GOEPEL, DALLI, HOPPENSTEDT, FLAUTRE, WIJKMAN, RADWAN, CORBEY, SPINELLI, KINDERMANN, KUHNE, STAUNER, PITTELLA, ERIKA, HECKE, GAHLER, REINFELDT, RASMUSSEN, CERCAS, CROWLEY, WORTMANN, KOUCHNER, LIENEMANN, QUECEDO, BULLMANN, CORNILLET, ANASTASSOPOULOS, CARNERO, SARNEZ, SHALIT, SCHIERHUBER, STOCKMANN, SADDAM, STIHLER, SCHWAB, FISCHER, MORATINOS, NEYTS, BERTHU, DEM, ARAFAT, NETANYAHU, CUSHNAHAN, SCHUMAN, ROMEVA, RUGOVA

4.5 Group 5: unrecognized lemmas

AMENDMENTS, GENTLEMAN, ADOPTED, NEGOTIATIONS, PARLIAMENT'S, FISHERIES, CONSUMERS, VOTING, RAPPORTEURS, COMBATING, EMISSIONS, EUROPE'S, MEPS, IMPLEMENTING, MONITORING, DIRECTIVES, PARLIAMENTS, DEMOCRATS, FISHING, RELATING, PROMOTING, ENTERPRISES, GUIDELINES, RESOLUTIONS, CONCLUSIONS,
NGOS, APPROPRIATION, AMENDING, ADOPTING, GUARANTEES, FARMERS, ESTABLISHING, SANCTIONS, ACHIEVING, PRODUCERS, ENTERPRISE, FISHERMAN, DISASTERS, MINORITIES, RAPPORTEUR'S, PROPOSES, VIOLATIONS, COMMISSIONERS, REPRESENTATIVES, REFUGEES, SUBSIDIES, PEOPLES, COMPETENCES, DEBATING, GUARANTEEING, SUBSTANCES, OBLIGATIONS, PETITIONS, QUOTAS, MECHANISMS, OBSTACLES, FISHERMEN, GMOS, RECOMMENDATIONS, SOCIALISTS, INFRASTRUCTURES, DECLARATIONS, RESPECTING, NEGOTIATING, APPLIES, CONGRATULATING, AIRLINE, UNANIMITY, DEBATED,
PREVENTING, INTENDS, PRESIDENCIES, OPERATORS, SHORTCOMINGS, SAFEGUARDING, UTMOST, UNDERTAKINGS, PROVIDER, GRANTING, VISAS, DRAFTING, TAXPAYERS, PRESIDENCY'S, DESERVES, UNANIMOUS, TABLING, COMPROMISES, HIGHLIGHTED, VOTES, RESTRUCTURING, DELEGATIONS, IMPROVEMENTS, DEALT, DOCUMENTS, DEROGATIONS, DUMPING, PORTS, MAINSTREAMING, MIGRANTS, EMPHASIZE, FOODSTUFFS, INFRINGEMENTS, DEADLINES, ELDR, TACKLING, HARMONIZATION, SIMPLIFYING, PARLIAMENTARIANS, DISPARITIES, HARMONISING, RESOLVING, DEFICITS, LAUNDERING, RESERVATIONS, CONSTITUTES, TERRORISTS, REMARKS, NEIGHBOURHOOD, COMMISSIONER'S, VESSELS, BUREAUCRATIC, COMMUNAUTAIRE, COORDINATING, FINANCED, IMBALANCES, FACILITATING, PASSENGERS, COOPERATING, PREPARATIONS, PEOPLE'S, REFERENDUMS, INCENTIVES, DEMOCRACIES, FLEETS, GOVERNING, STIPULATES, RESTRICTIONS, HARMONIZE, DELAYS, REGIMES, OMBUDSMAN'S, MIGRATORY, DONORS, EXTENDING, APPROVING, PERPETRATORS, ENABLE, PENALTIES, RELOCATIONS, RECOGNISES, WEAPONS, CONTAINS, PARTNERSHIPS, CONVENTIONS, LOBBYISTS, INDICATORS, BURDENS, PREPARING, REFERENDA, REGULATING, ENHANCING, COMBATED, DISTORTIONS, BENEFICIARIES, STRENGTHENS, DELIBERATIONS, EMPHASISES, LANDMINES, ATTACHES, EXEMPTIONS, LIBERALISING, ORGANIZE, CONDOLENCES, PARAGRAPHS, DICTATORSHIPS, CONCLUDES, CONTRIBUTING, REFORMING, PESTICIDES, CONTRIBUTES, ENDORSES, CONSOLIDATING, UNDERLINES, MONOPOLIES, ECOSYSTEMS, SHIPYARDS, ELIMINATING, ADAPTING, INEQUALITIES, UNDERMINING, UNDERMINES, FRONTIERS, PARTICIPATING, MODERNISING, AMBITIONS, AUTHORISATIONS, TERRITORIES, CONDEMNING, PRECONDITIONS, ALE, ACCEDING, DEFENDING, LOOPHOLES, ENTREPRENEURS, RATIFYING, ENLARGEMENTS, FOCUSING, WITNESSING, ABOLISHING, FORA, IMPOSING, CLARIFYING, DIRECTORATES, REINFORCING, JEOPARDISING, ANNEXES, CONCLUDING, FULFILLING, GLOBALIZATION, CONDEMNS, REFER, LEGISLATORS, CLARIFICATIONS, DISMANTLING, MOBILISING, BENCHMARKING, ENTAILS, REGULATORS, RECITALS, DISCARDS, PROSPECTS, TRANSPOSING, VIOLATES, SUBMITTING, BENCHMARKS, TRAGEDIES, CIVILIANS, POSTPONEMENT, 
DICTATORS, EPLP, BOOSTING, COMPLIES, ECB'S, EUROBONDS, IMO, VACCINES, EMPHASISING, XENOPHOBIC, CONFIRMS, COMMITTEE, ENDEAVOURING, CORNERSTONES, POLLUTING, RESTRICTING, TRAFFICKERS, REJECTING, ERADICATING, EUROSCEPTICS, MASSACRES, CIVILISATIONS, REQUESTING, AUTHORISING, FORUMS, UPHOLDING, TRANSFERS, TARIFFS, ABSTAINING, PRIORITISING, MANDATES, ALZHEIMER'S, LIVELIHOODS, EXTREMISTS, TONNES, EXPECTATIONS, GROWERS, COMBATTING, IMMUNITIES, SMUGGLING, HOUSE'S, OBLIGES, EUROPOL, TEXTILES, ADOPTS, EMERGING, ADVOCATES, RECOMMENDS, COORDINATORS, CROPS, RAISES, MUNICIPALITIES, PREREQUISITES, CLARIFIES, VIOLATING, REAFFIRMS,
TRIALOGUES, OCCASIONS, EXPRESS, JOURNALISTS, BARROSO'S, OBSERVERS, PRESERVING, AUTHORIZE, ALLOCATING, FULFILS, EXPLANATIONS, POSES, LICENCES, JUSTIFIES, ENVISAGES, REACTORS, RECYCLING, SIGNATURES, REITERATES, SUGGESTIONS, DERIVATIVES, APPROVES, EXCEPTIONS, DESIGNATIONS, ENTITIES, NEGOTIATORS, REGRETS, EARMARKING, PENALISING, NANOTECHNOLOGIES, RELAUNCHING, TRANSACTIONS


Appendix C a) The EP corpus screen capture (running text in Slovak with tags, opened in AntConc through the File View feature)


b) KWIC in ParaConc of sentence-aligned text (EN-SK)

c) Manual categorization of the generated keyword list in Microsoft Excel
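The pipeline captured above (running text, generated word list, manual categorization) can be approximated in a few lines. This is a minimal sketch under stated assumptions, not the AntConc implementation: the tokenizer regex and the `min_len` filter are illustrative choices.

```python
from collections import Counter
import re

def frequency_list(text, min_len=2):
    """Rank word forms in running text by raw frequency: roughly the
    word-list step performed in AntConc before the keyword list is
    categorized by hand in Excel."""
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    counts = Counter(t for t in tokens if len(t) >= min_len)
    return counts.most_common()  # (word, count) pairs, most frequent first

sample = ("the commission adopted the proposal and the parliament "
          "debated the proposal before the vote")
print(frequency_list(sample)[:2])  # the two most frequent word forms
```

A keyword list would additionally compare these raw frequencies against a reference corpus; only the frequency-list step is sketched here.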


Appendix E

Glossary 1 sample:

English (source) | German | Czech | Slovak | SRC GE | SRC CZ | SRC SVK
PROPOSAL | Vorschlag, r / Antrag, r / Angebot, s | návrh / nabídka | návrh / ponuka | b, f, h | c, h | 1, h, c
COMMISSIONER | Kommissar, r / Beauftragte(r), r, e / Mitglied der Kommission, r / Kommissionsmitglied, r | komisář / člen komise / zastupitel | komisár / splnomocnenec / člen komisie | b, f | 1, c | c, 1
AMENDMENT | (Ab)änderung, e / Novelle, e / Änderungsantrag, r | změna / dodatek / zlepšení | zmena / novela / dodatok | b, 1 | 1, c | 1, h, c
CITIZEN | (Staats)bürger, r / Staatsangehörige, r | občan / státní příslušník / obyvatel / civilista | občan / obyvateľ / civilista | 1, b, f, h | c, h | 1, c, h
DIRECTIVE | Richtlinie, e / Vereinbarung, e / Rechtsvorschrift, e / Anordnung, e | směrnice / pokyn / direktiva | smernica / direktíva / predpis | 1, b | 1, h | 1, c, h
COOPERATION | Zusammenarbeit, e / Kooperation, e / Mitwirkung, e | spolupráce / součinnost / kooperace | spolupráca / kooperácia / súčinnosť | b, h | 1, c, h | 1, c, h
ADOPT | annehmen / verabschieden / genehmigen | přijmout | prijať | b, 1 | b, 1 | b, 1
RAPPORTEUR | Berichterstatter, r | zpravodaj / pozorovatel | spravodajca / pozorovateľ | 1, h, c, f | 1, h, c | 1, h, c
PROPOSE | beantragen / Antrag stellen; machen | navrhnout / předložit | navrhnúť / predložiť | c | c | c
UNITE | (ver)einen / (ver)einigen / sich verbinden | spojit (se) / sjednotit se | spojiť (sa) / zjednotiť | b, f, h | c, h | c, h
GENTLEMEN | Herren | pánové | páni | b, f, h | c | c
PRESIDENCY | Vorsitz, r | předsednictví | predsedníctvo | 1, b, f, h | c, h | c, h
RESOLUTION | Beschluss, r / Entschließung, e / Resolution, e | usnesení / dohoda | uznesenie / prehlásenie | 1, b, f, h | 1, c, h | 1, c, h
IMPLEMENT | verwirklichen / in Kraft setzen / umsetzen | zrealizovat / uskutečnit / zavádět | zaviesť / uskutočniť / vykonať | 1, b, f | c, h | 1,
NEGOTIATION | Absprache, e / Verhandlung, e / Begebung, e | vyjednávání / smlouvání / jednání | rokovanie / vyjednávanie | b, f | h, c | c, h
PROCEDURE | Vorgehen(sweise), e / (Arbeits)verfahren, s / Ablauf, r | postup / soudní řízení, proces / jednání | postup / súdny proces, konanie / spôsob | 1, b, f, h | c, h | c, h
INVOLVE | einbeziehen / umfassen / mit sich bringen | týkat se | týkať sa | b | c, h | c, h
(ON) BEHALF (OF) | im Auftrag von / im Namen von / im Interesse von / in Vertretung | jménem / ve jménu / za | v mene / za | b, f, h | c | c
CONSUMER | Verbraucher, r / Konsument, r / Abnehmer, r | spotřebitel, spotřebitelský | spotrebiteľ, spotrebiteľský | 1, b, f, h | c, h | c, h
VOTE | Stimme, e; (ab)stimmen / wählen / Stimme abgeben | hlasovat / "the vote" = celkový počet hlasů / volit | hlasovať / "the vote" = celkový počet hlasov / voliť | b, f, h | c, h | c, h
COMMISSION | Kommission, e / Ausschuss, r // Provision, e | komise / výbor / provize, odměna, poplatek | komisia / výbor / poplatok, provízia | 1, b, f, h | 1, c, h | 1, c, h
GUARANTEE | Gewährleistung; gewährleisten / Bürgschaft; bürgen / Sicherheit; sicherstellen | (za)ručit / zajistit / poskytovat záruku | (za)ručiť / zabezpečiť / poskytovať; dať záruku | b, f, h | c, h | c, h
RELATE | zusammenhängen / sich beziehen auf etw. / zu tun haben | týkat se / mít souvislost; spojitost | mať súvislosť; spojitosť / súvisieť / týkať sa | b, f, h | c, h | c, h
REFORM | reformieren; Reform, e / (ver)bessern; (Ver)besserung, e / neu gestalten; Neugestaltung, e | reforma, zreformovat / napravit / zlepšit | reformovať, reformovanie / napraviť / zlepšiť | b, f, h | c, h | c, h
ENLARGEMENT | Erweiterung, e | rozšíření / zvětšení / expanze | rozšírenie / zväčšenie / expanzia | 1, b, f, h | c, h | c, h
STRENGTHEN | verstärken; Verstärkung, e / bestärken | zpřísnit / posílit / zpevnit | sprísniť / podporiť / posilniť | 1, b, f, h | c, h | c, h


Glossary 2 sample:

Categorization Translation Notes Source EN source word presidency KW no.: 18 GE preferred Vorsitz, r preferred předsednictví prezidentství, -í s úřad a hodnost 1 admitted prezidenta 96 prezidentství, -í s hodnost, funkce, úřad, činnost prezidenta; doba, po kt. úřad urč. prezidenta trvá CZ

Poměr v Google search: předsednictví vs. prezidentství: 352 000 vs. 1890 (u prezidentství se výsledky vážou převážně s konkrétní osobou prezidenta). deprecated prezidentství. preferred predsedníctvo 1. volený vrcholný predstaviteľ 1 admitted republiky: p. Slovenskej republiky 93, c 2. často nenáležite predseda (významnej) inštitúcie: p. akadémie vied, p. Konfederácie odborových zväzov, p. akciovej spoločnosti (93).

Predsedníctvo sa v SK používa viac ako prezidentstvo a hoci sa nepodarilo dopátrať, že prezidentstvo je jednoznačne nevhodné (v oficiálnych slovníkoch sa uvádza), dôrazne sa odporúča používať slovo predsedníctvo vzhľadom na jeho rozšírenie. Pomer výsledkov predsedníctvo vs. prezidentstvo v Google vyhľadávaní pomocou funkcie site:.sk: 157 000 vs. 51. deprecated prezidentstvo

Categorization Translation Notes Source EN source word procedure KW no.: 23 GE preferred (Arbeits)Verfahren, s preferred postup a) procedura: procedura, -y ž 95, 96 admitted procedura složitější postup při něj. konání, jednání, 95, 96 řízení ap.: úřední, soudní p.; léčebná p. CZ b) postup: účelný způsob práce n. jednání, metoda, (účelné) jednání. deprecated N/A
Upřednostňuje se spíše používat slovo postup než procedura. Poměr výsledků v Google search: 20 800 000 vs. 359 000 preferred postup a) procedúra: -y ž. ‹l› zložitejší postup 93 admitted procedúra, konanie pri nejakom konaní, správaní, riadení a 93, c pod.: úradná, súdna p.; liečebná p. úkon; výp. tech. postup realizácie algoritmu; uzavretá samostatná časť počítačového programu; b) spôsob konania: schváliť p. vyslanej delegácie SK

(ustálený) spôsob práce, metóda: technologický p.

Odporúča sa uprednostniť slovo postup. Pomer v Google search medzi slovami postup a procedúra: 2 170 000 vs. 97 400. deprecated N/A
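The preferred/admitted/deprecated decisions in these entries repeatedly rest on comparing search-engine hit counts for a native synonym against an internationalism (předsednictví vs. prezidentství, postup vs. procedúra). The heuristic can be sketched as follows; the cut-off ratio of 100 is an illustrative assumption, not a value stated in the thesis, and hit counts are entered manually, as in the entries above.

```python
def classify_internationalism(native_hits, intl_hits, ratio_threshold=100):
    """Label an internationalism relative to its native synonym by
    relative web-search frequency: if the native form outnumbers it
    by more than ratio_threshold, treat the internationalism as
    deprecated, otherwise as admitted."""
    if intl_hits == 0:
        return "deprecated"
    ratio = native_hits / intl_hits
    return "deprecated" if ratio > ratio_threshold else "admitted"

# Counts quoted in the entries above:
# predsednictvi vs. prezidentstvi:   352 000 vs.  1 890
# postup vs. procedura:            2 170 000 vs. 97 400
print(classify_internationalism(352_000, 1_890))
print(classify_internationalism(2_170_000, 97_400))
```

With these counts the first pair yields "deprecated" and the second "admitted", matching the categorization chosen in the entries.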

Categorization Translation Notes Source EN source word guarantee KW no.: 29 GE preferred Garantie, e, garantieren, gewährleisten b preferred zaručit, zabezpečit poskytnout záruku, garanci za něco, 95 admitted garantovat zaručovat (se), zaručit (se): g. něčí práva; g. 95 CZ kvalitu výrobku (95). Slovo garantovat je podle Google search deprecated N/A běžně dostupné (454 000 výsledků). preferred zaručiť (sa), zabezpečiť 93, c SK Podľa Google search je internacionalizmus admitted garantovať garantovať bežne rozšírený (136 000 výsledkov) 93 deprecated N/A

Categorization Translation Notes Source EN source word KW no.: 79 GE preferred preferred zákonodárný/legislativní obě slova je možné používat jako synonyma, ale je zapotřebí volit správný ekvivalent na základě kolokací se sousedícím slovem. 95 admitted N/A

CZ a) zákonodárný/é (Google search: 96 300) vydávající zákony, týkající se práva vydávat zákony: z-é Národní shromáždění; z-á moc, funkce legislativní; z. program; práv. z. sbor deprecated N/A
b) legislativní/é týkající se zákonodárství; zákonodárný; směřující k vydávání právních předpisů vůbec: l. činnost parlamentu; l. moc, právo; l. opatření, úprava; l. období; - l. pracovník (95).

zákonodárný: 22 300 výsl.; -á: 33 300 výsl.; -é: 69 400 výsl. vs. legislativní: 561 000 výsl.; -á: 890 výsl.; -e: 8050 výsl. preferred zákonodarný/legislatívny: obe slová sa dajú v slovenčine používať ako synonymá, aj keď používanie slova legislatívny prevažuje nad výskytom slova zákonodarný. admitted N/A

a) zákonodarný: súvisiaci s vydávaním zákonov: z-á činnosť, moc, iniciatíva.

SK b): legislatívny: l. orgán, l-a činnosť, l-a úprava, l-e opatrenie, l-e predpisy; l-e oddelenie ministerstva v ktorom sa pripravujú návrhy zákonov (93).

Google search pre legislatívny: 108 000 výsl.; -a 53 000 výsl.; -e: 261 000 výsl. vs. zákonodarný: 14 600 výsl.; -a 71 výsl.; -e 80 výsl. deprecated N/A

Categorization Translation Notes Source EN source word agricultural KW no.: 83 GE preferred landwirtschaftlich preferred zemědělský slovo agrikulturní se ve slovníku neologizmů c nenachází (95) a jako přídavné jméno se sice CZ admitted uvádí (96), ale jeho výskyt je značně omezený (978 výsledků v Google search) deprecated agrikulturní preferred poľnohospodársky Slovo agrikultúrny sa síce nachádza v slovníkoch admitted JULS (93) a iných cudzojazyčných slovníkoch, ale podľa výsledkov Google search sa v praxi takmer nepoužíva.

Čo sa týka predpony „agro“, jej používanie definuje jazykovedný časopis Kultúra slova:

agroprodukt gen. ‐u, muž. (gr. + lat.) poľnohospodársky produkt, poľnohospodársky výrobok: Hlavnými požiadavkami rezortu je podporiť vývoz vlastných agroproduktov. (TLAČ) deprecated agrikultúrny
Slovo agroprodukt zložené z časti agro‐, ktorá vyjadruje vzťah k poľnohospodárstvu (z gréckeho slova agros, resp. latinského ager — pole; lat. koreň agri‐ sa uplatnil v staršom slove agrikultúra — poľnohospodárstvo), a z podstatného mena produkt (z latinského slova productum — výrobok, výtvor) môžeme preložiť ako poľnohospodársky produkt, príp. poľnohospodársky výrobok. Medzi neologizmami s časťou agro‐ nájdeme také, ktoré odrážajú novú realitu, napr. agropodnikateľ, agroprivatizátor, agrobiznis, agroturistika (vidiecka turistika, ktorá môže byť spojená aj s prácou na poli), ďalej významovo príbuzné slová agrorezort, agrosektor, agrooblasť či príležitostne používané slová ako agrovýdavky, agrodotácie, agropolitika, agroobchod (98).

Categorization Translation Notes Source EN source word agriculture KW no.: 132 GE preferred Landwirtschaft, e preferred zemědělství Slovo agrikultura se také oficiálně uvádí ve c admitted N/A slovnících (96), ale jeho výskyt v Google search je velice omezený (3000 výsledků) CZ a nenachází se také ani mezi neologizmy (95). Proto se toto slovo doporučuje používat deprecated agrikultura jenom v kontextu odborné terminologie. preferred poľnohospodárstvo Podobne ako v predchádzajúcom prípade je c admitted N/A slovo agrikultúra oficiálne uvádzané v slovníkoch, ale výsledky Google search deprecated agrikultúra SK nenasvedčujú jeho častému používaniu (80 výsl.). Odporúča sa vyhýbať sa jeho používaniu, príp. ho obmedziť len na odborný kontext.

Categorization Translation Notes Source EN source word unacceptable KW no.: 151 GE preferred inakzeptabel preferred nepřijatelný Slovo neakceptovatelný se nenachází c admitted N/A v kodifikovaných slovnících (96), ale bylo nalezeno ve slovníku neologizmů (95). Přesto se téměř vůbec nepoužívá, a proto se doporučuje se mu spíše vyhnout. CZ

Google search: neakceptovatelný: 100 výsl.; -á: 80 výsl.; -é: 30 100 výsl. vs. nepřijatelný: 8090 výsl.; -á: 41 výsl.; -é: 521 000 výsl. deprecated neakceptovatelný
K zajímavému zjištění ale přijdeme při porovnání antonyma akceptovatelný, kde se situace mění. Slovo akceptovatelný by mělo být z hlediska používání upřednostňováno před slovem přijatelný.

Google search pro akceptovatelný: 375 000 výsl.; -á: 374 000 výsl.; -é: 373 000 výsl. vs. Přijatelný: 30 200 výsl.; -á 9200 výsl.; -é: 15 900 výsl. preferred neprijateľný Slovo neakceptovateľný sa uvádza 1, c admitted neakceptovateľný v kodifikovaných slovníkoch a je pomerne 93 rozšírené, avšak z dôvodu častejšieho používania slova neprijateľný sa odporúča uprednostňovať toto slovensky znejúcejšie synonymum.

Google search pre neakceptovateľný: 6900 výsl.; -á 4300 výsl.; -é 20 300 výsl. vs. neprijateľný 21 500 výsl.; -á 52 200 výsl.; -é SK 174 000 výsl.

Naopak, antonymum akceptovateľný sa používa tak zriedka, že sa odporúča ho vôbec nepoužívať

Google search pre akceptovateľný: 86 výsl.; - á: 2950 výsl.; -é: 5780 výsl. vs. prijateľný: 48 200 výsl.; -á: 71 000 výsl.; -é: 158 000 deprecated N/A výsl.

Categorization Translation Notes Source EN source word substance KW no.: 214 GE preferred Substanz, e preferred látka, hmota slovo substance je spíše abstraktní pojem, admitted N/A proto se anglické slovo substance překládá spíše jako látka nebo hmota

substance, -e ž. (z lat.) kniž. 1. základ věcí a jevů; podstata 1: (duše) je CZ poslední s-í jedině jistou (Vrchl.); hmota houbovité s. (Havlasa) 2. hmotný základ; podstata 4, majetek, jmění: žít ze své s.; s. přípravku byly stále měněny látky; ekon. (v kapit.) s. podniku jeho trvalá not zařízení; kapitálová s. oběžný i stálý kapitál recommended substance podniku (96).
preferred látka, hmota Anglické slovo substance je častejšie 1, c admitted N/A významovo bližšie slovenskému ekvivalentu látka alebo hmota, než slovu substancia.

a) látka: forma hmoty majúca isté fyzikálne al. chem. zloženie, materiál: stavebná, SK pohonná l., plastické l-y;

b) substancia -ie ž. filoz. al. kniž. podstata (význ. 1), základ (vecí, javov): duchovná, not hmotná s.; recommended substancia rozpustné, rastlinné s-ie látky; (93).

Categorization Translation Notes Source EN source word budget KW no.: 215 GE preferred Budget, s preferred rozpočet V kodifikovaných slovnících se slovo budžet 1, c admitted N/A nenachází, ale je možné najít několik výsledků přes Google search (1490 CZ výsledků), kde se dá rovněž dopátrat i k některým definicím. V kodifikovaném deprecated jazyce toto slovo ale neexistuje. preferred rozpočet Slovo budget sa v kodifikovaných 1, c admitted N/A slovníkoch nenachádza. Dá sa v nich nájsť slovenský neologizmus budžet, ale v Google search má len do 100 výsledkov:

budžet -u m. ‹a < f› ekon. zastar. rozpočet SK (najmä štátny); (93).

Tento anglicizmus je ale vnímaný skôr negatívne a patrí do neformálneho hovorového jazyka, ako informuje jazykovedný časopis Slovenská reč. Ide deprecated budget o tzv. xenos alebo profesionalizmus (99).


Glossary 3

TERM proposal, (n) Sources: GE: Vorschlag, r CZ: návrh SK: návrh b c a
syn EN Suggestion, plan, programme, scheme
syn DE Antrag, r; Angebot, s | Wordlist rank 116 | b, f
syn CZ nabídka | Keyword rank 4 | c, h
syn SK ponuka | Occurrence 77 463 | c, h
Definition a) A plan or suggestion, especially a formal or written one, put forward for consideration or discussion by others b) The action of proposing a plan or suggestion: 5
Collocations EN Concerning/related to proposal, formulate p. c
Collocations GE Was den Vorschlag anbelangt, V. formulieren, c
Collocations CZ Týkající se návrhu, sformulovat/vypracovat N., c
Collocations SK Týkajúci sa návrhu, vypracovať/zformulovať návrh c
Further English Collocations ADJ. concrete | detailed | controversial | compromise | peace, reform, research, etc. QUANT. package, set The government outlined a new set of proposals on human rights. VERB + PROPOSAL formulate | outline | bring forward, make, put forward, submit | accept, back, support, welcome I welcome the proposal to reduce taxes for the poorly paid. | block, oppose, reject, vote against | push through The government could face defeat if it tries to push through the controversial proposals. | drop, withdraw | consider, discuss PREP. ~ concerning/relating to proposals concerning the use of land | ~ for The Ministry submitted a proposal for lower speed limits on motorways. > Special page at MEETING e
N:grams council’s, legislative, commission’s /N/; /N/ for, to, from, concerning
Context There is strong support for the establishment of a special European civil protection force to be mobilised in such situations, and always, of course, as a supplement to national action, in accordance with the Barnier proposal.

I fully agree with the proposal that the strategy of simplifying the legal environment should have political priority.

This is the be-all and end-all and the Commission's rule is this: no new proposal without a comprehensive impact assessment, and no new proposal without an impact assessment which has been scrutinised by the Impact Assessment Board.


TERM amendment, (n) Sources: GE: Novelle, e CZ: změna SK: zmena b c c
syn EN Addition, adjustment, revision, adaptation, change, improvement d
syn DE (Ab)Änderung, e; Änderungsantrag, r; Zusatz, r | Wordlist rank 219 | b, 1, f
syn CZ dodatek; pozměňovací návrh; novela zákona | Keyword rank 6 | c, h
syn SK dodatok; zlepšenie; doplnok | Occurrence 31 190 | c, h, 1
Definition a) A minor change in a document. b) A change or addition to a legal or statutory document: 5
Collocations EN Put forward amendments, move an A., adopt A. c
Collocations GE Änderungsanträge vorlegen, Änderungsantrag einbringen, Änderungen vornehmen, annehmen, übernehmen c
Collocations CZ Předložit změny/pozměňovací návrh, detto, přijmout Z c
Collocations SK Predložiť pozmeňujúce a doplňujúce návrhy, detto, prijať Z a D. c
Further English Collocations ADJ. important, major, significant A major amendment was introduced into the legislation. | minor, slight, small | draft, proposed | detailed | constitutional VERB + AMENDMENT introduce, make | draft The committee does not adequately consult others when drafting amendments. | move, propose, put forward, suggest, table He moved an amendment limiting capital punishment to certain very serious crimes. | withdraw She withdrew her amendment and left the meeting. | accept, adopt, approve, carry, pass, ratify, support, vote for Parliament accepted the amendment and the bill was passed. On a free vote, the amendment was carried by 292 votes to 246. | oppose, reject | be subject to The programme is subject to amendment. PREP. without ~ The new clause was accepted without amendment. | ~ to an amendment to the Clean Water Act e
N:grams
Context We tend to agonise for months over this or that amendment, but often put no effort into finding out whether the legislation has had its desired effect.

I would like to ask everyone here to support the amendments that have been discussed and introduced, especially Amendment 44, where, at the request of the Council, in addition to vehicles we introduce the concepts of wagons and inland waterway vessels, in order to avoid any possible misunderstanding, and I would ask you to vote in favour of this.

Then Amendment 16 could be omitted as unnecessary, or we could vote against it.


TERM directive, (n) Sources: GE: Anordnung, e CZ: směrnice SK: smernica b 1 1
syn EN Order, instruction, imperative d
syn DE Richtlinie, e; Vereinbarung, e; Rechtsvorschrift, e | Wordlist rank 159 | 1
syn CZ pokyn; direktiva; nařízení | Keyword rank 8 | 1, h
syn SK vyhlásenie, direktíva, nariadenie, predpis | Occurrence 45 738 | 1, h
Definition legal act which is binding, as to the result to be achieved, upon each Member State to which it is addressed, but leaves to the national authorities the choice of form and methods 1
Collocations EN Issue a directive, adopt a directive c
Collocations GE Eine Richtlinie erlassen/veröffentlichen, R. verabschieden c
Collocations CZ Vydat směrnici, přijmout směrnici c
Collocations SK Vydať smernicu, prijať smernicu c
Further English Collocations ADJ. clear Don't start anything without a clear directive from management. | general | important | draft, proposed | EU, European (Commission/Union), government, ministerial | policy, political | banking, environmental, etc. VERB + DIRECTIVE issue The EU issued a new drinking water directive. | adopt, agree (on/upon), approve, sign | comply with, implement All companies must comply with the new directive. | block, oppose DIRECTIVE + VERB come into force A new EU directive on maternity leave will come into force next month. | require sth The directive requires member states to designate sites of special scientific interest. PREP. in accordance with a/the ~ They acted in accordance with the latest directive from Brussels. | in a/the ~ The proposals are contained in a European directive on wild birds. | under a/the ~ Private health services will be allowed under the directive. | ~ from a directive from the European Commission | ~ on a directive on data protection PHRASES the provisions/terms of a directive e
N:grams
Context We already realised that when we were discussing the directive on the internal market in services.

Member of the Commission. - Mr President, the oral question tabled by Mr Gargani, on behalf of the Committee on Legal Affairs, gives me the opportunity to provide you with an update on where the Commission stands regarding the 14th Company Law Directive and the European Private Company (EPC).

When can we expect the long-awaited disability directive that will put real legislative weight behind equality for people with disabilities?


TERM adopt, (v) Sources: GE: annehmen CZ: přijmout SK: prijať b c c
syn EN Accept, maintain, approve, take up, embrace, decide on d
syn DE verabschieden, genehmigen, einwilligen, erlassen | Wordlist rank 684 | b, 1
syn CZ přejmout; osvojit si (názory, metody); zaujmout postoj | Keyword rank 11 | c
syn SK osvojiť si; prebrať; prevziať (názory, stratégie a pod.); zaujať postoj | Occurrence 39 117 | c
Definition a) Take up or start to use or follow (an idea, method, or course of action) b) Take on or assume (an attitude or position) c) Formally approve or accept (a report or suggestion) d) (Of a local authority) accept responsibility for the maintenance of (a road). 5
Collocations EN Adopt a law, adopt a plan, adopt an attitude c
Collocations GE Ein Gesetz verabschieden/annehmen, einen Plan übernehmen, eine Haltung einnehmen c
Collocations CZ Přijmout zákon, přijmout plán, přijmout/zaujmout postoj c
Collocations SK Prijať zákon, prijať plán, prijať/zaujať postoj c
Further English Collocations take and use sth ADV. formally, officially, immediately, urgently, subsequently The policy has not yet been formally adopted. VERB + ADOPT tend to | decide to | be forced to PREP. towards the policies employers adopt towards the labour force E, a
N:grams
Context Finally, the regulations that we wish to adopt, or rather to combine from many different sources, thereby reducing the number, are based on the recommendations drawn up by the UN for the transport of dangerous goods by road, rail and inland waterways, which together account for over 110 billion tonnes/km per annum within the European Union.

As a result, the Commission urges the Member States to adopt a more ambitious approach when future proposals are submitted to them in this area.

Member of the Commission. - (CS) Madam President, ladies and gentlemen, Council Directive 1999/70/EC concerning the framework agreement on fixed-term work requires the Member States to adopt such measures that would prevent abuse arising from the use of successive fixed-term employment contracts.


Glossary 4 sample

TERM EU enlargement, (n) Occurrence: 14 522 KW rank: 35 GE:, CZ:, SK:
Definition: European Union (EU) enlargement describes the process of admitting new member states to the EU. To qualify for EU membership, a state must meet the 'Copenhagen criteria' (2).
Notes: Corpus analysis suggests that EU enlargement is a term used to discuss the process from the point of view of the EU. For referring to individual countries, the word EU accession is preferred.

TERM EU accession, (n) Occurrence: 11 202 KW rank: 48 GE:, CZ:, SK: 1 2 3
Definition: The process of joining the EU (4)
Notes: Current candidate countries are Albania, Macedonia, Montenegro, Serbia, and Turkey (3).

TERM sustainable, (a.) Occurrence: 11 481 KW rank: 50 1 2 3
Definition: a) Conserving an ecological balance by avoiding depletion of natural resources (more relevant in political context). Also: b) Able to be maintained at a certain rate or level, c) Able to be upheld or defended (5).
Context: more likely in the EU context: sustainable development, mobility, energy, economic, fishing, transport, tourism, agriculture.

TERM summit, (n) Occurrence: 13 252 KW rank: 75 GE:, CZ:, SK: 1 2 3
Definition: (political context) an important formal meeting between leaders of governments from two or more countries (6)
Notes: Summit conference, meeting

TERM WTO, (abbr) Occurrence: 6136 KW rank: 95 GE:, CZ:, SK: 1 2 3
Definition: The World Trade Organization (WTO) deals with the global rules of trade between nations. Its main function is to ensure that trade flows as smoothly, predictably and freely as possible (7).
Notes: The WTO provides a forum for negotiating agreements aimed at reducing obstacles to international trade and ensuring a level playing field for all, thus contributing to economic growth and development. The WTO also provides a legal and institutional framework for the implementation and monitoring of these agreements, as well as for settling disputes arising from their interpretation and application (7).


TERM MEP, (abbr) Occurrence: 5833 KW rank: 105 GE: Mitglieder des Europäischen Parlaments CZ: Poslanci Evropského parlamentu SK: Poslanci Európskeho parlamentu 16 16 16
Definition: Members of (European) Parliament - persons elected to the European Parliament
Notes: The European Parliament is made up of 751 Members elected in the 28 Member States of the enlarged European Union. Since 1979 MEPs have been elected by direct universal suffrage for a five-year period. Each country decides on the form its election will take, but must guarantee equality of the sexes and a secret ballot. EU elections are by proportional representation (9).

TERM refugee, (n) Occurrence: 6585 KW rank: 136 GE:, CZ:, SK:
Definition: Refugees are persons that receive protection under the Geneva Convention in a Member State (10). Compare: Asylum seekers are persons submitting a claim for refugee status, pending a legal procedure (10).
Notes: EU Member States take decisions on how asylum seekers and refugees are treated (10). The Common European Asylum System is a first step towards establishing harmonised standards for asylum seekers and refugees in the EU (10).

TERM Schengen, (n) Occurance: 4415 KW rank: 147 GE:, CZ: SK: Definition: Notes: The Schengen Area and the European Union Shengen agreement: are two completely different zones that shall not be An intergovernmental agreement on misinterpreted (12). the relaxation of border controls between participatingEuropean countries, The border-free Schengen Area guarantees free first signed in Schengen, Luxembourg, movement to more than 400 million EU citizens, as in June 1985 A revised version of the agreement well as to many non-EU nationals, businessmen, wasincorporated into the European Union in 1999 tourists or other persons legally present on the EU and widened to include non-EU members of a territory (13). similar Nordic union (5). d

TERM Europol (n) Occurance: 2301 KW rank: 315 GE:, CZ: SK: Definition: Europol is the European Union’s law Notes: Unique services enforcement agency whose main goal is to help •support centre for law enforcement operations achieve asafer Europe for the benefit of all EU •hub for criminal information and organisations citizens. It assists the European Union’s Member •centre for law enforcement expertise States in their fight against serious international •one of the largest concentrations of analytical crime and terrorism (14). capability in the EU •produces regular assessments and reports •high-security, 24/7 operational centre •central platform for law enforcement experts from the European Union countries (14).

EUROPARL CORPUS HWFL GLOSSARIES 212

TERM EU enlargement (n) | Occurrence: 14 522 | KW rank: 35
GE: Erweiterung (1) | CZ: rozšíření (1) | SK: rozširovanie (1)
Definition: European Union (EU) enlargement describes the process of admitting new member states to the EU. To qualify for EU membership, a state must meet the "Copenhagen criteria" (2).
Notes: Corpus analysis suggests that EU enlargement is a term used to discuss the process from the point of view of the EU. For referring to individual countries, the term EU accession is preferred.

TERM (EU) accession (n) | Occurrence: 11 202 | KW rank: 48
GE: Beitritt, r (1) | CZ: přistoupení (2) | SK: pristúpenie, prístup (3)
Definition: a) the process of joining the EU (4); b) the act whereby a state accepts the offer or the opportunity to become a party to a treaty already negotiated and signed by other states; it has the same legal effect as ratification and usually occurs after the treaty has entered into force.
Notes: Current candidate countries are Albania, Macedonia, Montenegro, Serbia, and Turkey (3).

TERM sustainable (adj) | Occurrence: 11 481 | KW rank: 50
GE: nachhaltig (1) | CZ: udržitelný (1) | SK: udržateľný (1)
Definition: a) conserving an ecological balance by avoiding depletion of natural resources (more relevant in the political context); also: b) able to be maintained at a certain rate or level; c) able to be upheld or defended (5).
Context: most likely in the EU context: sustainable development, mobility, energy, economy, fishing, transport, tourism, agriculture.

TERM summit (n) | Occurrence: 13 252 | KW rank: 75
GE: Gipfel(treffen) (1) | CZ: summit, schůzka na nejvyšší úrovni (c) | SK: samit, stretnutie na najvyššej úrovni (c)
Definition: (in the political context) an important formal meeting between leaders of governments from two or more countries (6).
Notes: also: summit conference, summit meeting.

TERM WTO (abbr) | Occurrence: 6136 | KW rank: 95
GE: Welthandelsorganisation (1) | CZ: (WTO) Světová obchodní organizace (1) | SK: (WTO) Svetová obchodná organizácia (1)
Definition: The World Trade Organization (WTO) deals with the global rules of trade between nations. Its main function is to ensure that trade flows as smoothly, predictably and freely as possible (7).
Notes: The WTO provides a forum for negotiating agreements aimed at reducing obstacles to international trade and ensuring a level playing field for all, thus contributing to economic growth and development. The WTO also provides a legal and institutional framework for the implementation and monitoring of these agreements, as well as for settling disputes arising from their interpretation and application (7).


TERM MEP (abbr) | Occurrence: 5833 | KW rank: 105
GE: Mitglieder des Europäischen Parlaments (16) | CZ: poslanci Evropského parlamentu (16) | SK: poslanci Európskeho parlamentu (16)
Definition: Members of the (European) Parliament; persons elected to the European Parliament.
Notes: The European Parliament is made up of 751 Members elected in the 28 Member States of the enlarged European Union. Since 1979, MEPs have been elected by direct universal suffrage for a five-year period. Each country decides on the form its election will take, but must guarantee equality of the sexes and a secret ballot. EU elections are by proportional representation (9).

TERM refugee (n) | Occurrence: 6585 | KW rank: 136
GE: Flüchtling, r | CZ: uprchlík | SK: utečenec
Definition: Refugees are persons that receive protection under the Geneva Convention in a Member State (10). Compare: asylum seekers are persons submitting a claim for refugee status, pending a legal procedure (10).
Notes: EU Member States take decisions on how asylum seekers and refugees are treated (10). The Common European Asylum System is a first step towards establishing harmonised standards for asylum seekers and refugees in the EU (10).

TERM Schengen (n) | Occurrence: 4415 | KW rank: 147
GE: Schengen (1) | CZ: Schengen (1) | SK: Schengen (1)
Definition: Schengen agreement: an intergovernmental agreement on the relaxation of border controls between participating European countries, first signed in Schengen, Luxembourg, in June 1985. A revised version of the agreement was incorporated into the European Union in 1999 and widened to include non-EU members of a similar Nordic union (5).
Notes: The Schengen Area and the European Union are two completely different zones that must not be confused (12). The border-free Schengen Area guarantees free movement to more than 400 million EU citizens, as well as to many non-EU nationals, businessmen, tourists or other persons legally present on EU territory (13).

TERM Europol (n) | Occurrence: 2301 | KW rank: 315
GE: (Europol) Europäisches Polizeiamt (1) | CZ: (Europol) Evropský policejní úřad (1) | SK: (Europol) Európsky policajný úrad (1)
Definition: Europol, or the European Police Office, is the European Union's law enforcement agency whose main goal is to help achieve a safer Europe for the benefit of all EU citizens. It assists the European Union's Member States in their fight against serious international crime and terrorism (14).
Notes: Unique services:
• support centre for law enforcement operations
• hub for criminal information and organisations
• centre for law enforcement expertise
• one of the largest concentrations of analytical capability in the EU
• produces regular assessments and reports
• high-security, 24/7 operational centre
• central platform for law enforcement experts from the European Union countries (14)


TERM Kyoto (n) | Occurrence: 2405 | KW rank: 328
GE: Kyoto-Protokoll; Protokoll von Kyoto zum Rahmenübereinkommen der Vereinten Nationen über Klimaänderungen (1) | CZ: Kjótský protokol; Kjótský protokol k Rámcové úmluvě Organizace spojených národů o změně klimatu (1) | SK: Kjótsky protokol; Kjótsky protokol k Rámcovému dohovoru Organizácie Spojených národov o zmene klímy (1)
Definition: The Kyoto Protocol is an international agreement linked to the United Nations Framework Convention on Climate Change, which commits its Parties by setting internationally binding emission reduction targets (15).
Notes: Recognizing that developed countries are principally responsible for the current high levels of GHG emissions in the atmosphere as a result of more than 150 years of industrial activity, the Protocol places a heavier burden on developed nations under the principle of "common but differentiated responsibilities" (15). The Kyoto Protocol was adopted in Kyoto, Japan, on 11 December 1997. In Doha, Qatar, on 8 December 2012, the "Doha Amendment to the Kyoto Protocol" was adopted (15).

TERM (European) ombudsman (n) | Occurrence: 2889 | KW rank: 337
GE: (Europäischer) Bürgerbeauftragter (1) | CZ: (evropský) veřejný ochránce práv (1) | SK: európsky ombudsman (1)
Definition: The term "ombudsman" originates from Sweden, meaning an official appointed to receive and investigate complaints (Merriam-Webster, 2011). Introduced in the Treaty of Maastricht (1 November 1993) under the "People of Europe" initiative, the European Ombudsman is the institution for citizens of the European Union (EU) to report to if they believe any EU institution has been guilty of maladministration. The European Parliament will then ask the Ombudsman to review the complaint and, if the complaint is founded, carry out an investigation.
Notes: The general protocol for investigating complaints is that the European Ombudsman approaches the institution or body of the EU against which a complaint has been filed; the Ombudsman looks for a peaceful solution to put right the case of maladministration. If this fails, the European Ombudsman makes a special report to the European Parliament, which will then investigate further. In the view of the citizens, the European Ombudsman has helped improve the quality of EU administration, receiving over one thousand complaints annually; however, the organisation's main problems have been a lack of resources and actually defining the term maladministration.

TERM acquis communautaire (n) | Occurrence: 1910 | KW rank: 373
GE: Besitzstand der Gemeinschaft (1) | CZ: acquis Společenství (1) | SK: acquis Spoločenstva (1)
Definition: The acquis is the body of common rights and obligations that is binding on all the EU member states (42). Also: Community acquis (1).
Notes: It is constantly evolving and comprises:
- the content, principles and political objectives of the Treaties;
- legislation adopted pursuant to the Treaties and the case law of the Court of Justice;
- declarations and resolutions adopted by the Union;
- instruments under the Common Foreign and Security Policy;
- international agreements concluded by the Union and those entered into by the member states among themselves within the sphere of the Union's activities (42).

TERM ECB (abbr) | Occurrence: 1875 | KW rank: 375
GE: EZB, Europäische Zentralbank (1) | CZ: ECB, Evropská centrální banka (1) | SK: ECB, Európska centrálna banka (1)
Definition: The European Central Bank (ECB) is the central bank for Europe's single currency, the euro. The ECB's main task is to maintain the euro's purchasing power and price stability in the euro area. The euro area comprises the 19 European Union countries that have introduced the euro since 1999 (43).
Notes: The ECB's basic tasks:
- the definition and implementation of monetary policy for the euro area;
- the conduct of foreign exchange operations;
- the holding and management of the official foreign reserves of the euro area countries (portfolio management);
- the promotion of the smooth operation of payment systems (43).

TERM PES (PSE) (abbr) | Occurrence: 1811 | KW rank: 402
GE: Sozialdemokratische Partei Europas (1) | CZ: (PES) Strana evropských socialistů (47) | SK: (PES) Strana európskych socialistov (46)
Definition: The Party of European Socialists (PES; formerly PSE (45)) brings together the Socialist, Social Democratic and Labour Parties of the European Union (EU). There are 33 full member parties from the 28 EU member states and Norway. In addition, there are 13 associate and 12 observer parties (44).
Notes: Its aims include:
- the strengthening of the socialist and social democratic movement;
- contributing to forming a European awareness and to expressing the political will of the citizens;
- defining common policies for the European Union and influencing the decisions of the European institutions;
- leading the European election campaign with a common strategy and visibility (44).

TERM ALDE (ELDR) (abbr) | Occurrence: 1353 (674) | KW rank: 488 (901)
GE: (ALDE) Fraktion der Allianz der Liberalen und Demokraten für Europa (1) | CZ: (ALDE) Skupina Aliance liberálů a demokratů pro Evropu (1) | SK: Skupina Aliancie liberálov a demokratov za Európu (1)
Definition: Political party active in the EU: the Alliance of Liberals and Democrats for Europe Party (ALDE), formerly the European Liberal Democrats (ELDR) (41).
Notes: Formerly the European Liberal Democrat and Reform (ELDR) party; on 10 November 2012, European Liberal Democrat delegates voted overwhelmingly to change the name of the party to Alliance of Liberals and Democrats for Europe Party (ALDE) (41).
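Each record above follows the same fixed layout: term, part of speech, corpus frequency, keyword rank, three target-language equivalents with source indices, a definition, and notes. For readers who want to reuse the glossaries electronically, that layout maps onto a simple record type. The following Python sketch is illustrative only; the GlossaryEntry class and its field names are not part of the thesis.

```python
from dataclasses import dataclass


@dataclass
class GlossaryEntry:
    """One record of the HFWL glossary, mirroring the printed layout."""
    term: str
    pos: str           # part of speech, e.g. "n" or "abbr"
    occurrence: int    # raw frequency in the EP corpus
    kw_rank: int       # rank in the keyword list
    translations: dict # language code -> (equivalent, source index)
    definition: str = ""
    notes: str = ""


# Example record built from the "summit" entry above
summit = GlossaryEntry(
    term="summit", pos="n", occurrence=13252, kw_rank=75,
    translations={
        "GE": ("Gipfel(treffen)", "1"),
        "CZ": ("summit, schůzka na nejvyšší úrovni", "c"),
        "SK": ("samit, stretnutie na najvyššej úrovni", "c"),
    },
    definition="an important formal meeting between leaders of "
               "governments from two or more countries",
)

# Entries in this appendix are ordered by keyword rank, so sorting
# on kw_rank reproduces the printed order of the glossary.
entries = sorted([summit], key=lambda e: e.kw_rank)
```

A record type of this kind also makes it straightforward to export the glossaries to CSV or a terminology exchange format for use in CAT tools.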


Appendix E

Additional sources used during the glossary compilation:

a EP Corpus

b Linguee.com

c Lingea Dictionaries

d Collins dictionary

e http://oxforddictionary.so8848.com/

f Leo

g Google search

h Glosbe.com

1 http://iate.europa.eu/

2 http://www.politics.co.uk/reference/eu-enlargement

3 http://ec.europa.eu/enlargement/countries/check-current-status/index_en.htm

4 http://ec.europa.eu/enlargement/policy/steps-towards-joining/index_en.htm

5 http://www.oxforddictionaries.com/

6 http://dictionary.cambridge.org/dictionary/english/

7 https://www.wto.org/

8 http://www.europarl.europa.eu/sides/getDoc.do?pubRef=-//EP//TEXT+RULES-EP+20060703+RULE-001+DOC+XML+V0//EN&language=EN&navigationBar=YES

9 http://www.europarl.europa.eu/meps/en/about-meps.html

10 http://www.ecre.org/refugees/refugees/refugees-in-the-eu.html

11 http://www.allacronyms.com/EUROMED

12 http://www.schengenvisainfo.com/schengen-visa-countries-list/

13 http://ec.europa.eu/dgs/home-affairs/what-we-do/policies/borders-and-visas/schengen/index_en.htm

14 https://www.europol.europa.eu/content/page/about-us


15 http://unfccc.int/kyoto_protocol/items/2830.php

16 http://www.icnl.org/research/monitor/osce.html

17 http://www.businessdictionary.com/definition/North-Atlantic-Treaty-Organization-NATO.html

18 http://www.nato.int/nato-welcome/index.html

19 http://www.guengl.eu/group/about

20 http://ec.europa.eu/transparency/regcomitology/index.cfm?do=FAQ.FAQ#1

21 https://www.fas.org/sgp/crs/row/R41959.pdf

22 http://frontex.europa.eu/about-frontex/mission-and-tasks/

23 http://www.investopedia.com/

24 http://www.greens-efa.eu/staff/press-webcommunications-and-multimedia/about-us/48-who-we-are.html

25 http://ec.europa.eu/eurostat/about/overview

26 http://www.merriam-webster.com/dictionary/parliamentarian

27 http://www.pravda.sk/trendove-temy/zostatnenie/

28 https://edis.ifas.ufl.edu/wc137

29 http://ec.europa.eu/dgs/home-affairs/what-we-do/policies/borders-and-visas/visa-information-system/index_en.htm

30 http://www.europarl.europa.eu/enlargement/briefings/33a1_en.htm#summary

31 http://www.mdcr.cz/cs/Evropska_unie/Fondy_EU/PHARE.htm

32 http://archiv.vlada.gov.sk/phare/

33 http://www.nctc.gov/site/groups/hamas.html

34 http://www.britannica.com/topic/Hamas

35 http://en.euabc.com/word/931

36 http://eur-lex.europa.eu/legal-content/CS/TXT/?uri=uriserv:xy0024

37 https://ec.europa.eu/energy/en/topics/nuclear-energy

38 http://ec.europa.eu/fisheries/cfp/index_en.htm

39 http://slovnik-cizich-slov.abz.cz/web.php/slovo/trialog


40 http://www.europarl.europa.eu/news/en/news-room/content/20150217STO24619/html/Much-Ado-About-PNR

41 http://www.aldeparty.eu/en/about/the-alde-party

42 http://ec.europa.eu/enlargement/policy/glossary/terms/acquis_en.htm

43 https://www.ecb.europa.eu/ecb/tasks/html/index.en.html

44 http://www.pes.eu/about_us

45 http://tilastokeskus.fi/til/euvaa/2009/euvaa_2009_2009-06-12_tie_001_en.html

46 http://www.europskaunia.sk/socialisticka_skupina0

47 https://www.euroskop.cz/8626/sekce/strana-evropskych-socialistu/

48 http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=uriserv:r17003

49 http://www.welcomeurope.com/european-funds/tacis-270+170.html#tab=onglet_details

50 http://www.esa.int/Our_Activities/Navigation/The_future_-_Galileo/What_is_Galileo

51 http://www.cfr.org/trade/mercosur-south-americas-fractious-trade-bloc/p12762

52 http://www.schengen.euroiuris.sk/index.php?link=schengensky_informacny_system

53 https://ec.europa.eu/europeaid/funding/funding-instruments-programming/funding-instruments/european-development-fund_en

54 https://www.google.sk/search?q=natura+2000+program&oq=natura+2000+program&aqs=chrome..69i57j0l5.2810j0j9&sourceid=chrome&es_sm=93&ie=UTF-8

55 http://www.nature.cz/natura2000-design3/sub-text.php?id=2102

56 http://ec.europa.eu/trade/policy/countries-and-regions/development/generalised-scheme-of-preferences/

57 http://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32003H0361&from=EN

58 http://ec.europa.eu/growth/smes/business-friendly-environment/sme-definition/index_en.htm

59 http://www.gmo-compass.org/eng/news/country_reports/

60 http://ec.europa.eu/social/main.jsp?catId=326

61 http://www.europeanlawmonitor.org/eu-legal-principles/eu-law-what-is-the-principle-of-proportionality-a-subsidiarity.html

62 https://www.wsws.org/en/articles/2013/07/13/nabu-j13.html

63 http://www.etrend.sk/trend-archiv/rok-2013/cislo-26/projekt-nabucco-padol.html


64 http://www.finance.cz/zpravy/finance/217880-rwe-eu-musi-vice-podporit-projekt-nabucco-hlavne-financne/

65 http://www.epp.eu/about-us/history/

66 http://ec.europa.eu/regional_policy/en/policy/cooperation/european-territorial/

67 http://ecrgroup.eu/about-us/the-ecr-in-the-european-parliament/

68 http://www.mfcr.cz/cs/zahranicni-sektor/ochrana-financnich-zajmu/financni-zajmy-evropskych-spolecenstvi/urad-pro-boj-proti-podvodum-olaf/zakladni-informace-olaf

69 http://www.infoplease.com/encyclopedia/history/european-economic-community.html

70 http://www.euractiv.cz/vzdelavani0/link-dossier/vzdelavaci-programy-eu

71 http://www.asean.org/asean/about-asean

72 http://www.europarl.europa.eu/aboutparliament/en/20150201PVL00010/Organisation-and-rules

73 http://slovnik-cizich-slov.abz.cz/web.php/hledat?cizi_slovo=multilateralni&typ_hledani=prefix

74 http://www.oecd.org/about/

75 http://ec.europa.eu/esf/main.jsp?catId=35&langId=en

76 http://www.mzv.cz/chisinau/cz/informace_pro_cesty_a_pobyt_konzularni/pro_obcany_cr/o_cestovani_do_podnestri_doporuceni_a.html

77 http://sk.sciencegraph.net/wiki/Vyvlastnenie

78 http://business.center.cz/business/pojmy/p1752-etatizace.aspx

79 https://managementmania.com/sk/hruby-narodny-dochodok-hnd

80 http://eeas.europa.eu/enp/

81 http://www.efsa.europa.eu/en/aboutefsa

82 http://thelawdictionary.org/

83 http://siteresources.worldbank.org/KFDLP/Resources/461197-1122319506554/What_is_the_Monterrey_Consensus.pdf

84 http://www.rozvojovka.cz/humanitarni-pomoc-a-rozvojova-spoluprace

85 http://www.mvro.sk/sk/rozvojova-spolupraca/rozvojova-pomoc-vo-svete

86 http://www.eurolabour.org.uk/about

87 http://www.euractiv.sk/danova-politika/analyza/tobinova-dan-trojsky-kon-eu-018490


88 https://www.euroskop.cz/9008/20178/clanek/financni-dan-72-z-kapes-londynske-city/

89 http://www.newworldencyclopedia.org/entry/Maghreb

90 http://www.collinsdictionary.com/

91 http://www.merriam-webster.com/dictionary/

92 http://www.diffen.com/difference/Effectiveness_vs_Efficiency

93 http://slovnik.juls.savba.sk/

94 http://ec.europa.eu/social/main.jsp?catId=102

95 http://www.neologismy.cz/

96 http://prirucka.ujc.cas.cz/

97 http://www.juls.savba.sk/ediela/sr/1987/5/sr1987-5-lq.pdf

98 http://www.juls.savba.sk/ediela/ks/2001/1/ks2001-1.html

99 http://www.juls.savba.sk/ediela/sr/2013/5/SR2013-5.pdf


Appendix F

Linguistic comparison of the EP corpus and other corpora

Source: Bick, E.: Degrees of orality in speech-like corpora: Comparative annotation of chat and e-mail corpora. In: Proc. of the 24th Pacific Asia Conference on Language, Information and Computation, pp. 721-729. Waseda University, Sendai (2010)