SELECTING and CREATING a WORD LIST for ENGLISH LANGUAGE TEACHING by Deny A
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
The Spoken BNC2014: Designing and Building a Spoken Corpus Of
The Spoken BNC2014 Designing and building a spoken corpus of everyday conversations Robbie Love i, Claire Dembry ii, Andrew Hardie i, Vaclav Brezina i and Tony McEnery i i Lancaster University / ii Cambridge University Press This paper introduces the Spoken British National Corpus 2014, an 11.5-million-word corpus of orthographically transcribed conversations among L1 speakers of British English from across the UK, recorded in the years 2012–2016. After showing that a survey of the recent history of corpora of spo- ken British English justifies the compilation of this new corpus, we describe the main stages of the Spoken BNC2014’s creation: design, data and metadata collection, transcription, XML encoding, and annotation. In doing so we aim to (i) encourage users of the corpus to approach the data with sensitivity to the many methodological issues we identified and attempted to overcome while com- piling the Spoken BNC2014, and (ii) inform (future) compilers of spoken corpora of the innovations we implemented to attempt to make the construction of cor- pora representing spontaneous speech in informal contexts more tractable, both logistically and practically, than in the past. Keywords: Spoken BNC2014, transcription, corpus construction, spoken corpora 1. Introduction The ESRC Centre for Corpus Approaches to Social Science (CASS) 1 at Lancaster University and Cambridge University Press have compiled a new, publicly- accessible corpus of present-day spoken British English, gathered in informal con- texts, known as the Spoken British National Corpus 2014 (Spoken BNC2014). This 1. The research presented in this paper was supported by the ESRC Centre for Corpus Approaches to Social Science, ESRC grant reference ES/K002155/1. -
Teaching and Learning Academic Vocabulary
Musa Nushi Shahid Beheshti University Homa Jenabzadeh Shahid Beheshti University Teaching and Learning Academic Vocabulary Developing learners' lexical competence through vocabulary instruction has always been high on second language teachers' agenda. This paper will be focusing on the importance of academic vocabulary and how to teach such vocabulary to adult EFL/ESL learners at intermediate and higher levels of proficiency in the English language. It will also introduce several techniques, mostly those that engage students’ cognitive abilities, which can in turn facilitate the process of teaching and learning academic vocabulary. Several online tools that can assist academic vocabulary teaching and learning will also be introduced. The paper concludes with the introduction and discussion of four important academic vocabulary word lists which can help second language teachers with the identification and selection of academic vocabulary in their instructional planning. Keywords: Vocabulary, Academic, Second language, Word lists, Instruction. 1. Introduction Vocabulary is essential to conveying meaning in a second language (L2). Schmitt (2010) notes that L2 learners seem cognizant of the importance of vocabulary in language learning, as evidenced by their tendency to carry dictionaries, and not grammar books, around with them. It has even been suggested that the main difference between intermediate and advanced L2 learners lies not in how complex their grammatical knowledge is but in how expanded and developed their mental lexicon is (Lewis, 1997). Similarly, McCarthy (1990) observes that "no matter how well the student learns grammar, no matter how successfully the sounds of L2 are mastered, without words to express a wide range of meanings, communication in an L2 just cannot happen in any meaningful way" (p. -
POS Tagging for Improving Code-Switching Identification In
POS Tagging for Improving Code-Switching Identification in Arabic Mohammed Attia1 Younes Samih2 Ali Elkahky1 Hamdy Mubarak2 Ahmed Abdelali2 Kareem Darwish2 1Google LLC, New York City, USA 2Qatar Computing Research Institute, HBKU Research Complex, Doha, Qatar 1{attia,alielkahky}@google.com 2{ysamih,hmubarak,aabdelali,kdarwish}@hbku.edu.qa Abstract and sociolinguistic reasons. Second, existing the- oretical and computational linguistic models are When speakers code-switch between their na- tive language and a second language or lan- based on monolingual data and cannot adequately guage variant, they follow a syntactic pattern explain or deal with the influx of CS data whether where words and phrases from the embedded spoken or written. language are inserted into the matrix language. CS has been studied for over half a century This paper explores the possibility of utiliz- from different perspectives, including theoretical ing this pattern in improving code-switching linguistics (Muysken, 1995; Parkin, 1974), ap- identification between Modern Standard Ara- plied linguistics (Walsh, 1969; Boztepe, 2003; Se- bic (MSA) and Egyptian Arabic (EA). We try to answer the question of how strong is the tati, 1998), socio-linguistics (Barker, 1972; Heller, POS signal in word-level code-switching iden- 2010), psycho-linguistics (Grosjean, 1989; Prior tification. We build a deep learning model en- and Gollan, 2011; Kecskes, 2006), and more re- riched with linguistic features (including POS cently computational linguistics (Solorio and Liu, tags) that outperforms the state-of-the-art re- 2008a; Çetinoglu˘ et al., 2016; Adel et al., 2013b). sults by 1.9% on the development set and 1.0% In this paper, we investigate the possibility of on the test set. -
Defining Vocabulary Words Grades
Vocabulary Instruction Booster Session 2: Defining Vocabulary Words Grades 5–8 Vaughn Gross Center for Reading and Language Arts at The University of Texas at Austin © 2014 Texas Education Agency/The University of Texas System Acknowledgments Vocabulary Instruction Booster Session 2: Defining Vocabulary Words was developed with funding from the Texas Education Agency and the support and talent of many individuals whose names do not appear here, but whose hard work and ideas are represented throughout. These individuals include national and state reading experts; researchers; and those who work for the Vaughn Gross Center for Reading and Language Arts at The University of Texas at Austin and the Texas Education Agency. Vaughn Gross Center for Reading and Language Arts College of Education The University of Texas at Austin www.meadowscenter.org/vgc Manuel J. Justiz, Dean Greg Roberts, Director Texas Education Agency Michael L. Williams, Commissioner of Education Monica Martinez, Associate Commissioner, Standards and Programs Development Team Meghan Coleman, Lead Author Phil Capin Karla Estrada David Osman Jennifer B. Schnakenberg Jacob Williams Design and Editing Matthew Slater, Editor Carlos Treviño, Designer Special thanks to Alice Independent School District in Alice, Texas Vaughn Gross Center for Reading and Language Arts at The University of Texas at Austin © 2014 Texas Education Agency/The University of Texas System Vaughn Gross Center for Reading and Language Arts at The University of Texas at Austin © 2014 Texas Education Agency/The University of Texas System Introduction Explicit and robust vocabulary instruction can make a significant difference when we are purposeful in the words we choose to teach our students. -
E-Language: Communication in the Digital Age Dawn Knight1, Newcastle University
e-Language: Communication in the Digital Age Dawn Knight1, Newcastle University 1. Introduction Digital communication in the age of ‘web 2.0’ (that is the second generation of in the internet: an internet focused driven by user-generated content and the growth of social media) is becoming ever-increasingly embedded into our daily lives. It is impacting on the ways in which we work, socialise, communicate and live. Defining, characterising and understanding the ways in which discourse is used to scaffold our existence in this digital world is, therefore, emerged as an area of research that is a priority for applied linguists (amongst others). Corpus linguists are ideally situated to contribute to this work as they have the appropriate expertise to construct, analyse and characterise patterns of language use in large- scale bodies of such digital discourse (labelled ‘e-language’ here - also known as Computer Mediated Communication, CMC: see Walther, 1996; Herring, 1999 and Thurlow et al., 2004, and ‘netspeak’, Crystal, 2003: 17). Typically, forms of e-language are technically asynchronous insofar as each of them is ‘stored at the addressee’s site until they can be ‘read’ by the recipient (Herring 2007: 13). They do not require recipients to be present/ready to ‘receive’ the message at the same time that it is sent, as spoken discourse typically does (see Condon and Cech, 1996; Ko, 1996 and Herring, 2007). However, with the increasing ubiquity of digital communication in daily life, the delivery and reception of digital messages is arguably becoming increasingly synchronous. Mobile apps, such as Facebook, WhatsApp and I-Message (the Apple messaging system), for example, have provisions for allowing users to see when messages are being written, as well as when they are received and read by the. -
Aspects of Idiom*
University of Calgary PRISM: University of Calgary's Digital Repository Calgary (Working) Papers in Linguistics Volume 07, Winter 1982 1982-01 Aspects of idiom* Sayers, Coral University of Calgary Sayers, C. (1982). Aspects of idiom*. Calgary Working Papers in Linguistics, 7(Winter), 13-32. http://hdl.handle.net/1880/51297 journal article Downloaded from PRISM: https://prism.ucalgary.ca Aspects of Idiom* Coral Sayers • 1. Introduction Weinreich (1969) defines an idiom as "a complex expression • whose meaning cannot be derived from the meanings of its elements." This writer has collected examples of idioms from the English, German, Australian English, and Quebec French dialects (Appendix I) in order to examine the properties of the idiom, to explore in the literature the current concepts of idiom, and to relate relevant knowledge gained by these processes to the teaching of English as a Second Language. An idiom may be a word, a "lexical idiom," or, if it has a more complicated constituent structure, a "phrasal idiom." (Katz and Postal, 1964, cited by Fraser, 1970:23) As the lexical idiom, domi nated by a single branch syntactic category (e.g. "squatter," a noun) does not present as much difficulty as does the phrasal idiom, this paper will focus upon the latter. Despite the difficulties presented by idioms to the Trans formational Grammar model, e.g. the contradiction of the claim that virtually all sentences have a low occurrence probability and frequency (Coulmas, 1979), the consensus of opinion seems to be that idioms can be accommodated in the grammatical description. Phrasal idioms of English should be considered as a "more complicated variety of mono morphemic lexical entries." (Fraser, 1970:41) 2. -
Investigating Vocabulary in Academic Spoken English
INVESTIGATING VOCABULARY IN ACADEMIC SPOKEN ENGLISH: CORPORA, TEACHERS, AND LEARNERS BY THI NGOC YEN DANG A thesis submitted to the Victoria University of Wellington in fulfilment of the requirements for the degree of Doctor of Philosophy in Applied Linguistics Victoria University of Wellington 2017 Abstract Understanding academic spoken English is challenging for second language (L2) learners at English-medium universities. A lack of vocabulary is a major reason for this difficulty. To help these learners overcome this challenge, it is important to examine the nature of vocabulary in academic spoken English. This thesis presents three linked studies which were conducted to address this need. Study 1 examined the lexical coverage in nine spoken and nine written corpora of four well-known general high-frequency word lists: West’s (1953) General Service List (GSL), Nation’s (2006) BNC2000, Nation’s (2012) BNC/COCA2000, and Brezina and Gablasova’s (2015) New-GSL. Study 2 further compared the BNC/COCA2000 and the New-GSL, which had the highest coverage in Study 1. It involved 25 English first language (L1) teachers, 26 Vietnamese L1 teachers, 27 various L1 teachers, and 275 Vietnamese English as a Foreign Language learners. The teachers completed 10 surveys in which they rated the usefulness of 973 non-overlapping items between the BNC/COCA2000 and the New- GSL for their learners in a five-point Likert scale. The learners took the Vocabulary Levels Test (Nation, 1983, 1990; Schmitt, Schmitt, & Clapham, 2001), and 15 Yes/No tests which measured their knowledge of the 973 words. Study 3 involved compiling two academic spoken corpora, one academic written corpus, and one non-academic spoken corpus. -
Modeling and Encoding Traditional Wordlists for Machine Applications
Modeling and Encoding Traditional Wordlists for Machine Applications Shakthi Poornima Jeff Good Department of Linguistics Department of Linguistics University at Buffalo University at Buffalo Buffalo, NY USA Buffalo, NY USA [email protected] [email protected] Abstract Clearly, descriptive linguistic resources can be This paper describes work being done on of potential value not just to traditional linguis- the modeling and encoding of a legacy re- tics, but also to computational linguistics. The source, the traditional descriptive wordlist, difficulty, however, is that the kinds of resources in ways that make its data accessible to produced in the course of linguistic description NLP applications. We describe an abstract are typically not easily exploitable in NLP appli- model for traditional wordlist entries and cations. Nevertheless, in the last decade or so, then provide an instantiation of the model it has become widely recognized that the devel- in RDF/XML which makes clear the re- opment of new digital methods for encoding lan- lationship between our wordlist database guage data can, in principle, not only help descrip- and interlingua approaches aimed towards tive linguists to work more effectively but also al- machine translation, and which also al- low them, with relatively little extra effort, to pro- lows for straightforward interoperation duce resources which can be straightforwardly re- with data from full lexicons. purposed for, among other things, NLP (Simons et al., 2004; Farrar and Lewis, 2007). 1 Introduction Despite this, it has proven difficult to create When looking at the relationship between NLP significant electronic descriptive resources due to and linguistics, it is typical to focus on the dif- the complex and specific problems inevitably as- ferent approaches taken with respect to issues sociated with the conversion of legacy data. -
THE IMPACT of USING WORDLISTS in the LANGUAGE CLASSROOM on STUDENTS’ VOCABULARY ACQUISITION Gülçin Coşgun Ozyegin University
International Journal of English Language Teaching Vol.4, No.3, pp.49-66, March 2016 ___Published by European Centre for Research Training and Development UK (www.eajournals.org) THE IMPACT OF USING WORDLISTS IN THE LANGUAGE CLASSROOM ON STUDENTS’ VOCABULARY ACQUISITION Gülçin Coşgun Ozyegin University ABSTRACT: Vocabulary has always been an area of interest for many researchers since words represent “the building block upon which knowledge of the second language can be built” and without them people cannot convey the intended meaning (Abrudan, 2010). Nation (1988) emphasised if teachers want to help their learners to deal with unknown words, it would be better to spend more time on vocabulary learning strategies rather than spending time on individual words. However, Schmitt in Schmitt and McCarthy (1997:200) stated that among vocabulary learning strategies only ‘guessing from context’ and ‘key word method’ have been investigated in depth. Therefore, there is need for more research on vocabulary learning whose pedagogical implications may contribute to the field of second language learning. Considering the above- mentioned issues, vocabulary is a worthwhile field to investigate. Hence, this paper aims at proposing a framework for vocabulary teaching strategy in English as a foreign language context. KEYWORDS: Wordlists, Vocabulary Teaching, Word walls, Vocabulary Learning Strategies INTRODUCTION Vocabulary has always been an area of interest and concern for many researchers and teachers alike since words represent “the building block upon which knowledge of the second language can be built” and without them people cannot convey the intended meaning (Abrudan, 2010). Anderson and Freebody (1981) state that a strong correlation exists between vocabulary and academic achievement. -
FERSIWN GYMRAEG ISOD the National Corpus of Contemporary
FERSIWN GYMRAEG ISOD The National Corpus of Contemporary Welsh Project Report, October 2020 Authors: Dawn Knight1, Steve Morris2, Tess Fitzpatrick2, Paul Rayson3, Irena Spasić and Enlli Môn Thomas4. 1. Introduction 1.1. Purpose of this report This report provides an overview of the CorCenCC project and the online corpus resource that was developed as a result of work on the project. The report lays out the theoretical underpinnings of the research, demonstrating how the project has built on and extended this theory. We also raise and discuss some of the key operational questions that arose during the course of the project, outlining the ways in which they were answered, the impact of these decisions on the resource that has been produced and the longer-term contribution they will make to practices in corpus-building. Finally, we discuss some of the applications and the utility of the work, outlining the impact that CorCenCC is set to have on a range of different individuals and user groups. 1.2. Licence The CorCenCC corpus and associated software tools are licensed under Creative Commons CC-BY-SA v4 and thus are freely available for use by professional communities and individuals with an interest in language. Bespoke applications and instructions are provided for each tool (for links to all tools, refer to section 10 of this report). When reporting information derived by using the CorCenCC corpus data and/or tools, CorCenCC should be appropriately acknowledged (see 1.3). § To access the corpus visit: www.corcencc.org/explore § To access the GitHub site: https://github.com/CorCenCC o GitHub is a cloud-based service that enables developers to store, share and manage their code and datasets. -
An Analysis of the Merriam-Webster's Advanced Learner's English Dictionary^ Second Edition
An Analysis of the Merriam-Webster's Advanced Learner's English Dictionary^ Second Edition Takahiro Kokawa Y ukiyoshi Asada JUNKO SUGIMOTO TETSUO OSADA Kazuo Iked a 1. Introduction The year 2016 saw the publication of the revised Second Edition of the Merriam-Webster^ Advanced Learner }s English Dictionary (henceforth MWALED2)y after the eight year interval since the first edition was put on the market in 2008. We would like to discuss in this paper how it has, or has not changed through nearly a decade of updating between the two publications. In fact,when we saw the striking indigo color on the new edition ’s front and back covers, which happens to be prevalent among many EFL dictionaries on the market nowadays, yet totally different from the emerald green appearance of the previous edition, MWALED1, we anticipated rather extensive renovations in terms of lexicographic fea tures, ways of presentation and the descriptions themselves, as well as the extensiveness of information presented in MWALED2. However, when we compare the facts and figures presented in the blurbs of both editions, we find little difference between the two versions —‘more than 160,000 example sentences ,’ ‘100,000 words and phrases with definitions that are easy to understand/ ‘3,000 core vocabulary words ,’ ’32,000 IPA pronunciations ,,‘More than 22,000 idioms, verbal collocations, and commonly used phrases from Ameri can and British English/ (More than 12,000 usage labels, notes, and paragraphs’ and 'Original drawings and full-color art aid understand ing [sic]5 are the common claims by the two editions. -
All-Words Word Sense Disambiguation Using Concept Embeddings
All-words Word Sense Disambiguation Using Concept Embeddings Rui Suzuki, Kanako Komiya, Masayuki Asahara, Minoru Sasaki, Hiroyuki Shinnou Ibaraki University, 4-12-1 Nakanarusawa, Hitachi, Ibaraki JAPAN, National Institute for Japanese Language and Linguistics, 10-2 Midoricho, Tachikawa, Tokyo, JAPAN, [email protected], kanako.komiya.nlp, minoru.sasaki.01, hiroyuki.shinnou.0828g @vc.ibaraki.ac.jp, [email protected] Abstract All-words word sense disambiguation (all-words WSD) is the task of identifying the senses of all words in a document. Since the sense of a word depends on the context, such as the surrounding words, similar words are believed to have similar sets of surrounding words. We therefore predict the target word senses by calculating the distances between the surrounding word vectors of the target words and their synonyms using word embeddings. In addition, we introduce the new idea of concept embeddings, constructed from concept tag sequences created from the results of previous prediction steps. We predict the target word senses using the distances between surrounding word vectors constructed from word and concept embeddings, via a bootstrapped iterative process. Experimental results show that these concept embeddings were able to improve the performance of Japanese all-words WSD. Keywords: word sense disambiguation, all-words, unsupervised 1. Introduction many words have the same article numbers. Several words Word sense disambiguation (WSD) involves identifying the can have the same precise article number, even when the senses of words in documents. In particular, the WSD task semantic breaks are considered. where the senses of all the words in a document are disam- biguated is referred to as all-words WSD.