
University of Gothenburg
Language Technology Programme
May, 2008

FROM CORPUS TO LANGUAGE CLASSROOM:
reusing Stockholm Umeå Corpus in a vocabulary exercise generator SCORVEX

Master Thesis, 30 points
Author: Elena Volodina
Supervisor: Lars Borin

Elena Volodina. 2008. Master Thesis, GU. From Corpus to Language Classroom: Reusing Stockholm Umeå Corpus in a Vocabulary Exercise Generator SCORVEX.

Abstract

This master's thesis evaluates the Stockholm Umeå Corpus (SUC) as a source of teaching materials for learners of Swedish as a second language. The evaluation has been carried out both theoretically and practically.

On the theoretical side, readability tests have been run on all SUC texts to analyze whether appropriate texts can be automatically selected for each proficiency level. To make the readability analysis more “vocabulary aware”, a lexical frequency profile of each text has been collected, analyzed and embedded into the final readability score assigned to each text. SUC has proven to be a rich source of texts of different proficiency levels appropriate for language training purposes. Advantages and disadvantages of SUC as a source of pedagogical materials have been identified in the course of the work.

On the practical side, as a side effect of the theoretical analysis, a pedagogical tool, SCORVEX (Swedish CORpus-based Vocabulary EXercise generator), has been designed and implemented. The existing modules of SCORVEX demonstrate to what extent it is possible to generate pedagogically acceptable vocabulary items with SUC as the only language resource. In the thesis I demonstrate how wordbank items, multiple-choice items and c-tests can be automatically generated for a specified proficiency level, word frequency band and word class. In yes/no items, potential words are generated on the basis of existing morphemes. All four modules are therefore “language-aware”. Access to frequency data obtained from SUC is the prerequisite for exercise generation, whereas the SUC text archive is the only source of texts, sentences and words for vocabulary items.

This thesis will hopefully awaken interest among teachers in testing this generator in real-life conditions, and may even convince some teachers of the usefulness of this pedagogical tool. Numerous directions for further development of the software are outlined in the thesis.

CONTENTS

ABSTRACT
CONTENTS
    List of Tables
    List of Figures
    List of Abbreviations
1. INTRODUCTION
    1.1 VOCABULARY ACQUISITION - A FEW WORDS
    1.2 EXERCISE GENERATORS - BACKGROUND AND RELATED RESEARCH
    1.3 IDEA AND CENTRAL ISSUES OF THIS ESSAY
    1.4 METHOD
    1.5 STRUCTURE OF THE THESIS
    1.6 NOVELTY AND APPLICABILITY
2. ICALL FOR SWEDISH: OVERVIEW
    2.1 CALL - OVERVIEW OF DEVELOPMENT
    2.2 ICALL - OVERVIEW OF DEVELOPMENT
    2.3 SWEDISH AS A SECOND/FOREIGN LANGUAGE
        2.3.1 Teaching/Testing Swedish as a Second Language
        2.3.2 Research within Swedish as a Second Language. Linguistic & Pedagogical Perspectives
    2.4 CALL APPLICATIONS FOR SWEDISH AS L2
    2.5 ICALL APPLICATIONS FOR SWEDISH AS L2
        2.5.1 GRIM
        2.5.2 IT-based Collaborative Learning in Grammar (ITG)
        2.5.3 VISL - Visual Interactive Syntax Learning
        2.5.4 Ville & DEAL
        2.5.5 ARTUR
        2.5.6 VocabTool
        2.5.7 Lingus
        2.5.8 Wordfinder
        2.5.9 Squirrel
        2.5.10 Didax
        2.5.11 Other projects
    2.6 NL RESOURCES AND NLP TOOLS FOR SWEDISH
3. USE OF CORPUS IN THE EXERCISE GENERATOR
    3.1 GENERAL ON CORPORA IN SECOND LANGUAGE ACQUISITION
    3.2 OVERVIEW OF SWEDISH CORPORA
    3.3 GENERAL ON SUC AND ITS ROLE IN THE EXERCISE GENERATOR
    3.4 SOME WORDS ON THE NOTIONS OF “WORD” AND “LEMMA”
    3.5 SUC AS A SOURCE OF FREQUENCY INFORMATION
        3.5.1 The FL in yes/no items
        3.5.2 The FL in automatic selection of target vocabulary items from texts
        3.5.3 The FL in selection of distractors for multiple-choice items
        3.5.4 The FL in search of authentic texts. LFP calculation
    3.6 SUC AS A SOURCE OF AUTHENTIC EXAMPLES
        3.6.1 Readability Indices
        3.6.2 Lexical Difficulty Measures
        3.6.3 Test setting
        3.6.4 Test results, generalizations and conclusions
        3.6.5 Algorithm for text selection
        3.6.6 Algorithm for sentence selection
4. VOCABULARY