Building Teaching Materials for Adult English Language Learners

Total Page:16

File Type:pdf, Size:1020Kb

Building Teaching Materials for Adult English Language Learners INTERACTION, AUTHENTICITY AND SPOKEN CORPORA: BUILDING TEACHING MATERIALS FOR ADULT ENGLISH LANGUAGE LEARNERS A thesis presented to the faculty of the Graduate School of Western Carolina University in partial fulfillment of the requirements for the degree of Master of Arts in Teaching English to Speakers of Other Languages (TESOL). By Courtney Cunningham Director: Dr. Chandrika Balasubramanian Associate Professor of Applied Linguistics and Director Graduate TESOL program English Department Committee Members: Dr. Christopher Blake, English Department Dr. Randi Reppen, English Department, Northern Arizona University 2 ACKNOWLEDGEMENTS I would like to begin by thanking my committee members, who have each guided me through this process and contributed to my research. Chris Blake and Chandrika Balasubramanian have provided me with a thorough and fulfilling education in TESOL, without which, none of this would have been possible. Chris‘s suggestions for my writing make it more complete. He also continually offers me support and positive words when they are most needed. I owe my interest and passion for corpus linguistics, in particular, to Chandrika, whose enthusiasm for it is contagious. Chandrika, as my director, has played a vital role in making this thesis a reality. Without the help and guidance of Randi Reppen, this thesis would not be the product that it is. I thank her for allowing me access to the Cambridge Corpus of Spoken North American English and for giving my thesis the direction that it needed. I value each committee member and am forever grateful that they were a part of my thesis process. I would also like to thank my fellow graduate students and friends who have shared ideas with me and listened to mine. I owe particular thanks to Alan Wray, who offered his assistance in editing and made suggestions so that my language was clearer. Finally, I would like to thank the participants of this study for allowing me to interview them and whose responses shed much light on adult English language education in the United States. I dedicate this thesis to adult English language learners in the U.S. I hope that in some way my work contributes to increasing their English language skills and the betterment of their lives in the U.S. 3 TABLE OF CONTENTS Page Abstract……………………………………………………………………………………7 Chapter 1: Introduction……………………………………………………………………9 Introduction………………………………………………………………………. 9 Statement of Problem…………………………………………………………….10 Purpose of Study…………………………………………………………………14 Description of Study……………………………………………………………..15 Research Questions………………………………………………………………18 Significance to Field….………………………………………………………….18 Ethical Considerations….………………………………………………………..19 Chapter 2: Review of Literature………………………………………………………....20 Introduction……………………………………………………………………...20 Foreign-Born Population………………………………………………………...21 Adult ESL Programs…………………………………………………………….25 Effective Teaching for Adult ELLs……………………………………………..29 A Corpus-based Approach to Language Teaching………………………………37 Corpus-based Teaching Materials………………………………………………50 4 Summary……………………………………………………………………….51 Chapter 3: Methodology……………………………………………………………….53 Introduction…………………………………………………………………….53 Setting………………………………………………………………………….55 Sample/Participants…………………………………………………………….56 Data Collection/Procedures……………………………………………..……..57 Interviews with Participants……………………………………………………57 Textbook Evaluations…………………………………………………………..60 Corpora in Materials Development…………………………………………….62 Chapter 4: Results……………………………………………………………………...64 Introduction……………………………………………………………………64 Interviews…………………………………………………………………….. 66 Administrator Interviews………………………………………………66 Teacher Interviews……………………………………………………..72 Student Interviews……………………………………………………..78 Textbook Evaluations………………………………………………………….94 Side by Side…………………………………………………………….95 5 Step Forward: Language for Everyday Life……………………………….101 Interchange…………………………………………………………………….110 Touchstone…………………………..………………………………………….117 Chapter 5: Discussion…………………………………………………………………. 128 Introduction…………………………………………………………………....128 Discussion of Corpora for Language Teaching………………………………..129 Limitations……………………………………………………………………..136 Recommendations for Future Research………………………………………..137 Conclusion……………………………………………………………………..138 List of Tables……………………….………………………………………………….. 6 References…………………………………………………………………………….. 140 Appendices……………………………………………………………………………..149 Appendix 1: Interview Questions for Program Administrators and Instructor…149 Appendix 2: Interview Questions for Target Learner Population……..……… 150 Appendix 3: Textbook Evaluation Overview and Checklist…….......................151 Appendix 4: Questionnaire: Textbook Evaluation…………..…………………153 Appendix 5: Consent Form………………………………………………..….. 154 6 LIST OF TABLES Page Table 1: Learner Population Overview………………………………………………...57 Table 2: Learner Population Details…..………………………………………………. 79 Table 3: Frequency of English Language Use…………………….…………………....81 Table 4: Difficult Aspects of Learning English…………………………………………83 Table 5: The Most Important English Language Skill………………………………… 85 Table 6: Teaching Activities that Work the Best……………………………………… 86 Table 7: Reasons for Wanting to Improve English……………………………………. 90 Table 8: Goals to Accomplish While in the Program………………………………….. 91 Table 9: Preferred Teaching Activities………………………………………………….93 7 ABSTRACT INTERACTION, AUTHENTICITY AND SPOKEN CORPORA: BUILDING TEACHING MATERIALS FOR ADULT ENGLISH LANGUAGE LEARNERS Courtney Cunningham, MA Western Carolina University Director: Dr. Chandrika Balasubramanian This study investigated the needs and challenges of adult ELLs in the community college setting in the United States. The study was conducted in Western North Carolina (WNC), where administrators, teachers, and students of three different community colleges were interviewed. Interviews determined the needs and challenges of this group of learners, the language skills they are most interested in acquiring, and how effective current teaching materials are in helping meet their needs. Interviews were transcribed to detect patterns in participant responses. The learners were primarily interested in increasing speaking and listening skills so that that they could communicate in their communities in situations they encounter on a regular basis. Results from the interviews, as well as extensive research conducted regarding effective ELT for adult ELLs helped establish criteria for teaching materials that would be beneficial for this group of learners, specifically with speaking and listening. Textbook evaluations were created and applied to three textbooks that were commonly used within all of the community colleges that participated in the study. The evaluations were also applied to a corpus-based textbook that was created using the Cambridge Corpus of Spoken North American English, which focuses on spoken English, to compare with the other textbooks. The findings of this 8 study suggest that there needs to be teaching materials, both textbook and computer programs, created for this group of learners that focus on the qualities of native spoken English within situations that adult ELLs in the U.S. will encounter in their day to day lives. 9 CHAPTER 1: INTRODUCTION Introduction Developing a successful adult English language program is an in-depth process in which educators must combine research and practice to ensure students are provided the most effective and productive learning environment possible. A successful program is one that has appropriate courses which systematically follow the progression of students‘ learning and needs (Celce-Murcia, 2001; Graves, 2001; Richards, 2001). Non-credit adult English as a Second Language (ESL) programs provide foreign-born adults living in the United States an opportunity to learn English, but programs are not as strong as they would be if innovative approaches based on research and practice were implemented (Chisman & Crandall, 2007; Chisman, 2008; Crandall & Sheppard, 2004; McHugh, Gelatt, & Fix, 2007; Rodriguez, Burt, Peyton, and Ueland, 2009). Research has determined that bringing innovative approaches that have proven successful into the majority of programs in the U.S. is a challenge; teachers and administrators are responsible for bringing the research and practice into the classroom, but are often limited in time and resources (Chisman & Crandall, 2007; Chisman, 2008; Crandall & Sheppard, 2004) However, since textbooks and teaching materials are the most commonly utilized resources in the language classroom a logical solution would be to incorporate innovative research and practice into those materials. Designing textbooks and teaching materials using corpus-based grammar and data, and incorporating the ‗real‘ language that adult English language learners (ELLs) in the U.S. frequently encounter, will benefit non-credit ESL programs and increase English proficiency in the foreign- 10 born adult population in the United States. The purpose of this study is to determine how to strengthen teaching materials that enhance learning through innovative teaching approaches and content that is relevant and useful to the learners, derived from corpora of American Spoken English. Statement of the Problem The foreign-born population in the U.S. continues to grow, with an increase of 6.9 million between 2000 and 2008 (―Migration Policy Institute‖, 2010). While the young children and youth of this population are receiving English language instruction in the public schools, adults are accountable for their English
Recommended publications
  • MASC: the Manually Annotated Sub-Corpus of American English
    MASC: The Manually Annotated Sub-Corpus of American English Nancy Ide*, Collin Baker**, Christiane Fellbaum†, Charles Fillmore**, Rebecca Passonneau†† *Vassar College Poughkeepsie, New York USA **International Computer Science Institute Berkeley, California USA †Princeton University Princeton, New Jersey USA ††Columbia University New York, New York USA E-mail: [email protected], [email protected], [email protected], [email protected], [email protected] Abstract To answer the critical need for sharable, reusable annotated resources with rich linguistic annotations, we are developing a Manually Annotated Sub-Corpus (MASC) including texts from diverse genres and manual annotations or manually-validated annotations for multiple levels, including WordNet senses and FrameNet frames and frame elements, both of which have become significant resources in the international computational linguistics community. To derive maximal benefit from the semantic information provided by these resources, the MASC will also include manually-validated shallow parses and named entities, which will enable linking WordNet senses and FrameNet frames within the same sentences into more complex semantic structures and, because named entities will often be the role fillers of FrameNet frames, enrich the semantic and pragmatic information derivable from the sub-corpus. All MASC annotations will be published with detailed inter-annotator agreement measures. The MASC and its annotations will be freely downloadable from the ANC website, thus providing
    [Show full text]
  • Background and Context for CLASP
    Background and Context for CLASP Nancy Ide, Vassar College The Situation Standards efforts have been on-going for over 20 years Interest and activity mainly in Europe in 90’s and early 2000’s Text Encoding Initiative (TEI) – 1987 Still ongoing, used mainly by humanities EAGLES/ISLE Developed standards for morpho-syntax, syntax, sub-categorization, etc. (links on CLASP wiki) Corpus Encoding Standard (now XCES - http://www.xces.org) Main Aspects" ! Harmonization of formats for linguistic data and annotations" ! Harmonization of descriptors in linguistic annotation" ! These two are often mixed, but need to deal with them separately (see CLASP wiki)" Formats: The Past 20 Years" 1987 TEI Myriad of formats 1994 MULTEXT, CES ~1996 XML 2000 ISO TC37 SC4 2001 LAF model introduced now LAF/GrAF, ISO standards Myriad of formats Actually…" ! Things are better now" ! XML use" ! Moves toward common models, especially in Europe" ! US community seeing the need for interoperability " ! Emergence of common processing platforms (GATE, UIMA) with underlying common models " Resources 1990 ! WordNet gains ground as a “standard” LR ! Penn Treebank, Wall Street Journal Corpus World Wide Web ! British National Corpus ! EuroWordNet XML ! Comlex ! FrameNet ! American National Corpus Semantic Web ! Global WordNet ! More FrameNets ! SUMO ! VerbNet ! PropBank, NomBank ! MASC present NLP software 1994 ! MULTEXT > LT tools, LT XML 1995 ! GATE (Sheffield) 1996 1998 ! Alembic Workbench ! ATLAS (NIST) 2003 ! What happened to this? 200? ! Callisto ! UIMA Now: GATE
    [Show full text]
  • Informatics 1: Data & Analysis
    Informatics 1: Data & Analysis Lecture 12: Corpora Ian Stark School of Informatics The University of Edinburgh Friday 27 February 2015 Semester 2 Week 6 http://www.inf.ed.ac.uk/teaching/courses/inf1/da Student Survey Final Day ! ESES: The Edinburgh Student Experience Survey http://www.ed.ac.uk/students/surveys Please log on to MyEd before 1 March to complete the survey. Help guide what we do at the University of Edinburgh, improving your future experience here and that of the students to follow. Ian Stark Inf1-DA / Lecture 12 2015-02-27 Lecture Plan XML We start with technologies for modelling and querying semistructured data. Semistructured Data: Trees and XML Schemas for structuring XML Navigating and querying XML with XPath Corpora One particular kind of semistructured data is large bodies of written or spoken text: each one a corpus, plural corpora. Corpora: What they are and how to build them Applications: corpus analysis and data extraction Ian Stark Inf1-DA / Lecture 12 2015-02-27 Homework ! Tutorial Exercises Tutorial 5 exercises went online earlier this week. In these you use the xmllint command-line tool to check XML validity and run your own XPath queries. Reading T. McEnery and A. Wilson. Corpus Linguistics. Second edition, Edinburgh University Press, 2001. Chapter 2: What is a corpus and what is in it? (§2.2.2 optional) Photocopied handout, also available from the ITO. Ian Stark Inf1-DA / Lecture 12 2015-02-27 Remote Access to DICE ! Much coursework can be done on your own machines, but sometimes it’s important to be able to connect to and use DICE systems.
    [Show full text]
  • The Expanding Horizons of Corpus Analysis
    The expanding horizons of corpus analysis Brian MacWhinney Carnegie Mellon University Abstract By including a focus on multimedia interactions linked to transcripts, corpus linguistics can vastly expand its horizons. This expansion will rely on two continuing developments. First, we need to develop easily used methods for each of the ten analytic methods we have examined, including lexical analyses, QDA (qualitative data analysis), automatic tagging, language profiles, group comparisons, change scores, error analysis, feedback studies, conversation analysis, and modeling. Second, we need to work together to construct a unified database for language studies and related sciences. This database must be grounded on the principles of open access, data-sharing, interoperability, and integrated structure. It must provide powerful tools for searching, multilinguality, and multimedia analysis. If we can build this infrastructure, we will be able to explore more deeply the key questions underlying the structure and functioning of language, as it emerges from the intermeshing of processes operative on eight major timeframes. 1. Introduction Corpus linguistics has benefitted greatly from continuing advances in computer and Internet hardware and software. These advances have made it possible to develop facilities such as BNCweb (bncweb.lancs.ac.uk), LDC (Linguistic Data Consortium) online, the American National Corpus (americannationalcorpus. org), TalkBank (talkbank.org), and CHILDES (childes.psy.cmu.edu). In earlier periods, these corpora were limited to written and transcribed materials. However, most newer corpora now include transcripts linked to either audio or video recordings. The development of this newer corpus methodology is facilitated by technology which makes it easy to produce high-quality video recordings of face-to-face interactions.
    [Show full text]
  • Unit 3: Available Corpora and Software
    Corpus building and investigation for the Humanities: An on-line information pack about corpus investigation techniques for the Humanities Unit 3: Available corpora and software Irina Dahlmann, University of Nottingham 3.1 Commonly-used reference corpora and how to find them This section provides an overview of commonly-used and readily available corpora. It is also intended as a summary only and is far from exhaustive, but should prove useful as a starting point to see what kinds of corpora are available. The Corpora here are divided into the following categories: • Corpora of General English • Monitor corpora • Corpora of Spoken English • Corpora of Academic English • Corpora of Professional English • Corpora of Learner English (First and Second Language Acquisition) • Historical (Diachronic) Corpora of English • Corpora in other languages • Parallel Corpora/Multilingual Corpora Each entry contains the name of the corpus and a hyperlink where further information is available. All the information was accurate at the time of writing but the information is subject to change and further web searches may be required. Corpora of General English The American National Corpus http://www.americannationalcorpus.org/ Size: The first release contains 11.5 million words. The final release will contain 100 million words. Content: Written and Spoken American English. Access/Cost: The second release is available from the Linguistic Data Consortium (http://projects.ldc.upenn.edu/ANC/) for $75. The British National Corpus http://www.natcorp.ox.ac.uk/ Size: 100 million words. Content: Written (90%) and Spoken (10%) British English. Access/Cost: The BNC World Edition is available as both a CD-ROM or via online subscription.
    [Show full text]
  • English Corpus Linguistics: an Introduction - Charles F
    Cambridge University Press 0521808790 - English Corpus Linguistics: An Introduction - Charles F. Meyer Index More information Index Aarts, Bas, 4, 102 Biber, Douglas, et al. (1999) 14 adequacy, 2–3, 10–11 Birmingham Corpus, 15, 142 age, 49–50 Blachman, Edward, 76–7 Altenberg, Bengt, 26–7 BNC see British National Corpus AMALGAM Tagging Project, 86–7, 89 Brill, Eric, 86 American National Corpus, 24, 84, 142 Brill Tagger, 86–8 American Publishing House for the Blind British National Corpus (BNC), 143 Corpus, 17, 142 annotation, 84 analyzing a corpus, 100 composition, 18, 31t, 34, 36, 38, 40–1, 49 determining suitability, 103–7, 107t copyright, 139–40 exploring a corpus, 123–4 planning, 30–2, 33, 43, 51, 138 extracting information: defining parameters, record keeping, 66 107–9; coding and recording, 109–14, research using, 15, 36 112t; locating relevant constructions, speech samples, 59 114–19, 116f, 118f tagging, 87 framing research question, 101–3 time-frame, 45 future prospects, 140–1 British National Corpus (BNC) Sampler, see also pseudo-titles (corpus analysis 139–40, 143 case study); statistical analysis Brown Corpus, xii, 1, 143 anaphors, 97 genre variation, 18 annotation, 98–9 length, 32 future prospects, 140 research using, 6, 9–10, 12, 42, 98, 103 grammatical markup, 81 sampling methodology, 44 parsing, 91–6, 98, 140 tagging, 87, 90 part-of-speech markup, 81 time-frame, 45 structural markup, 68–9, 81–6 see also FROWN (Freiburg–Brown) tagging, 86–91, 97–8, 111, 117–18, 140 Corpus types, 81 Burges, Jen, 52 appositions, 42, 98 Burnard,
    [Show full text]
  • A Study of Issues and Techniques for Creating Core Vocabulary Lists for English As an International Language
    A STUDY OF ISSUES AND TECHNIQUES FOR CREATING CORE VOCABULARY LISTS FOR ENGLISH AS AN INTERNATIONAL LANGUAGE BY C. JOSEPH SORELL A thesis submitted to Victoria University of Wellington in fulfilment of the requirements for the degree of Doctor of Philosophy Victoria University of Wellington 2013 ABSTRACT Core vocabulary lists have long been a tool used by language learners and instructors seeking to facilitate the initial stages of foreign language learning (Fries & Traver, 1960: 2). In the past, these lists were typically based on the intuitions of experienced educators. Even before the advent of computer technology in the mid-twentieth century, attempts were made to create such lists using objective methodologies. These efforts regularly fell short, however, and – in the end – had to be tweaked subjectively. Now, in the 21st century, this is unfortunately still true, at least for those lists whose methodologies have been published. Given the present availability of sizable English- language corpora from around the world and affordable personal computers, this thesis seeks to fill this methodological gap by answering the research question: How can valid core vocabulary lists for English as an International Language be created? A practical taxonomy is proposed based on Biber’s (1988, 1995) multi-dimensional analysis of English texts. This taxonomy is based on correlated linguistic features and reasonably covers representative spoken and written texts in English. The four-part main study assesses the variance in vocabulary data within each of the four key text types: interactive (face-to-face conversation), academic exposition, imaginative narrative, and general reported exposition. The variation in word types found at progressive intervals in corpora of various sizes is measured using the Dice coefficient, a coefficient originally used to measure species variation in different biotic regions (Dice, 1945).
    [Show full text]
  • Towards a Global, Multilingual Framenet
    LREC 2020 WORKSHOP Language Resources and Evaluation Conference 11–16 May 2020 International FrameNet Workshop 2020 Towards a Global, Multilingual FrameNet PROCEEDINGS Edited by: Tiago T. Torrent, Collin F. Baker, Oliver Czulo, Kyoko Ohara and Miriam R. L. Petruck Proceedings of the LREC International FrameNet Workshop 2020: Towards a Global, Multilingual FrameNet Edited by: Tiago T. Torrent, Collin F. Baker, Oliver Czulo, Kyoko Ohara and Miriam R. L. Petruck ISBN: 979-10-95546-58-0 EAN: 9791095546580 For more information: European Language Resources Association (ELRA) 9 rue des Cordelières 75013, Paris France http://www.elra.info Email: [email protected] c European Language Resources Association (ELRA) These Workshop Proceedings are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License ii Introduction This workshop continues the series of International FrameNet Workshops, based on frame semantics and FrameNets around the world, including meetings in Berkeley, California in 2013 (sponsored by the Swedish FrameNet group), in Juiz de Fora, Brazil in 2016 (sponsored by FrameNet Brasil), and in Miyazaki, Japan in 2018 (in conjunction with the LREC conference there). The last of these was specifically designed to bring together two sometimes separate groups of researchers, those concentrating on frame semantics and those concentrating on construction grammar, which Charles J. Fillmore and many of the same colleagues developed in parallel over a period of several decades. The call for papers of the current conference emphasized that the workshop would welcome studies of both theoretical and practical issues, and we are fortunate to have strong contributions of both types, sometimes within a single paper.
    [Show full text]
  • The Architecture of a Multipurpose Australian National Corpus
    The Architecture of a Multipurpose Australian National Corpus Pam Peters Macquarie University 1. Introduction Collaborative planning by Australian researchers for a large national corpus presents us with quite diverse views on: (a) what a corpus is, (b) what research agendas it should support, (c) which varieties of discourse it should contain, (d) how many languages it could include, (e) how the material might be collected, and (f) what kinds of annotation are needed to add value to the texts. The challenge for us all is to develop a comprehensive plan and architecture for a corpus which will encompass as many research agendas as possible. A modular design which allows independent compilation of segments of the notional whole recommends itself, so long as common systems of metadata and annotation can be set down at the start. 2. What is a Corpus? Divergent understandings of what a corpus is reflect some aspects of the word’s history, and the different academic disciplines which have made use of them. In literary studies since the earlier C18, the Latin word corpus has been used to refer to the body of work by a single author, for example, Shakespeare, or set of them, such as the romantic poets. It implies the total output of those authors, not a sampling. In linguistic science since the 1960s, corpus has referred to a structured collection of texts sampled from various types of discourse, including written and – since the 1980s – spoken as well (e.g., Collins Cobuild). It is thus deliberately heterogenous, and designed to be in some sense ‘representative’ of a standard language or variety of it.
    [Show full text]
  • Representation Learning for Information Extraction
    Representation Learning for Information Extraction by Ehsan Amjadian A thesis submitted to the Faculty of Graduate and Postdoctoral Affairs in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Cognitive Science Carleton University Ottawa, Ontario ©2019 Ehsan Amjadian Abstract Distributed representations, predominantly acquired via neural networks, have been applied to natural language processing tasks including speech recognition and machine translation with a success comparable to sophisticated state-of-the-art algo- rithms. The present thesis offers an investigation of the application of such represen- tations to information extraction. Specifically, I explore the suitability of applying shallow distributed representations to the automatic terminology extraction task, as well as the bridging reference resolution task. I created a dataset as a gold standard for automatic term extraction in the mathematical education domain. I carefully as- sessed the performance of the existing terminology extraction methods on this dataset. Then, I introduce a novel method for automatic terminology extraction for one word terms, and I evaluate the performance of the novel algorithm in various terminological domains. The introduced algorithm leverages the distributed representation of words from the local and global perspectives to encode syntactic, semantic, association, and frequency information at the same time. Furthermore, this novel algorithm can be trained with a minimal number of data points. I show that the algorithm is robust to the change of domain, and that information can be transferred from one technical domain to another, leveraging what we call anchor words with consistent semantics shared between the domains. As for the bridging reference resolution task, a dataset is built on the letter portion of the Open American National Corpus and I compare the performance of a preliminary method against a majority class baseline.
    [Show full text]
  • ΓE 77 Computational Linguistics Essays/ Project Topics
    ΓE 77 Computational Linguistics Essays/ Project Topics 1. General Information • Instead of EXAMS • 2.000 to 3.000 words +15% (depends on the topic) • Essay (1-student) OR Project (group of 2-3 students) • Starting date: 2 Apr 2018 • Topic notification date: until 18 Apr 2018 • (Short) Presentation date: Last lecture • Deadline: 20 Jun 2018 2. Summary • (Research | Survey | Literature Review) Papers • Existing NLP Tools (testing | evaluation) • Annotation Project • Corpus data Project • Phonological, Morphological, Syntactic, Semantic Project • Language Learning Platform evaluation Languages: English or Greek or another language 3. Essay Topics • You select an area and I will recommend some articles/books to get you started. One book chapter or big article is going to be the core part of an essay; several articles will be the additional and supplementary bibliography. • Research Areas: Machine Translation, Machine Learning, Summarization, Question Answering, Sentiment Analysis, Information Extraction, Reference Resolution, Speech Analysis/ Synthesis, Thesauri and Ontologies, Building Corpora, Corpus annotation, Language Learning Platforms, etc. You should use examples from English plus another language. Also you can have joint research areas. You can check the book of Speech and Language Processing for additional research areas. 4. Project Topics 4.1. Annotation Project (APHASIA) OR any other spoken data (even video) • Using a tool for multi-level annotations (ELAN), you will annotate aphasic speech data based on an annotation template. • Task 1: Multi-level annotation of a file (2-4 minutes). • Task 2: Point out the linguistic errors and speech events. • Task 3: Present the aphasic speech and its characteristics/parameters based on annotations. • Task 4: Short evaluation of the tool and the annotation template.
    [Show full text]
  • New Kazakh Parallel Text Corpora with On-Line Access
    New Kazakh parallel text corpora with on-line access Zhandos Zhumanov1, Aigerim Madiyeva2 and Diana Rakhimova3 al-Farabi Kazakh National University, Laboratory of Intelligent Information Systems, Almaty, Kazakhstan [email protected], [email protected], [email protected] Abstract. This paper presents a new parallel resource – text corpora – for Ka- zakh language with on-line access. We describe 3 different approaches to col- lecting parallel text and how much data we managed to collect using them, par- allel Kazakh-English text corpora collected from various sources and aligned on sentence level, and web accessible corpus management system that was set up using open source tools – corpus manager Mantee and web GUI KonText. As a result of our work we present working web-accessible corpus management sys- tem to work with collected corpora. Keywords: parallel text corpora, Kazakh language, corpus management system 1 Introduction Linguistic text corpora are large collections of text used for different language studies. They are used in linguistics and other fields that deal with language studies as an ob- ject of study or as a resource. Text corpora are needed for almost any language study since they are basically a representation of the language itself. In computer linguistics text corpora are used for various parsing, machine translation, speech recognition, etc. As shown in [1] text corpora can be classified by many categories: Size: small (to 1 million words); medium (from 1 to 10 million words); large (more than 10 million words). Number of Text Languages: monolingual, bilingual, multilingual. Language of Texts: English, German, Ukrainian, etc. Mode: spoken; written; mixed.
    [Show full text]