Multilingual Code-Switching Identification Via LSTM Recurrent Neural Networks

Total Page:16

File Type:pdf, Size:1020Kb

Multilingual Code-Switching Identification Via LSTM Recurrent Neural Networks Multilingual Code-switching Identification via LSTM Recurrent Neural Networks Younes Samih Suraj Maharjan Dept. of Computational Linguistics Dept. of Computer Science Heinrich Heine University, University of Houston Düsseldorf, Germany Houston, TX, 77004 [email protected] [email protected] Mohammed Attia Laura Kallmeyer Thamar Solorio Google Inc. Dept. of Computational Linguistics Dept. of Computer Science New York City Heinrich Heine University, University of Houston NY, 10011 Düsseldorf, Germany Houston, TX, 77004 [email protected] [email protected] [email protected] Abstract tweets, SMS messages, user comments on the ar- ticles, blogs, etc., this phenomenon is becoming This paper describes the HHU-UH-G system more pervasive. Code-switching does not only occur submitted to the EMNLP 2016 Second Work- shop on Computational Approaches to Code across sentences (inter-sentential) but also within the Switching. Our system ranked first place for same sentence (intra-sentential), adding a substan- Arabic (MSA-Egyptian) with an F1-score of tial complexity dimension to the automatic process- 0.83 and second place for Spanish-English ing of natural languages (Das and Gambäck, 2014). with an F1-score of 0.90. The HHU-UH- This phenomenon is particularly dominant in multi- G system introduces a novel unified neural lingual societies (Milroy and Muysken, 1995), mi- network architecture for language identifica- grant communities (Papalexakis et al., 2014), and tion in code-switched tweets for both Spanish- in other environments due to social changes through English and MSA-Egyptian dialect. The sys- tem makes use of word and character level rep- education and globalization (Milroy and Muysken, resentations to identify code-switching. For 1995). There are also some social, pragmatic and the MSA-Egyptian dialect the system does not linguistic motivations for code-switching, such as rely on any kind of language-specific knowl- the the intent to express group solidarity, establish edge or linguistic resources such as, Part Of authority (Chang and Lin, 2014), lend credibility, or Speech (POS) taggers, morphological analyz- make up for lexical gaps. ers, gazetteers or word lists to obtain state-of- the-art performance. It is not necessary for code-switching to oc- cur only between two different languages like Spanish-English (Solorio and Liu, 2008), Mandarin- 1 Introduction Taiwanese (Yu et al., ) and Turkish-German (Özlem Code-switching can be defined as the act of al- Çetinoglu, 2016), but it can also happen between ternating between elements of two or more lan- three languages, e.g. Bengali, English and Hindi guages or language varieties within the same ut- (Barman et al., 2014), and in some extreme cases terance. The main language is sometimes re- between six languages: English, French, German, ferred to as the ‘host language’, and the embed- Italian, Romansh and Swiss German (Volk and ded language as the ‘guest language’ (Yeh et al., Clematide, 2014). Moreover, this phenomenon can 2013). Code-switching is a wide-spread linguis- occur between two different dialects of the same lan- tic phenomenon in modern informal user-generated guage as between Modern Standard Arabic (MSA) data, whether spoken or written. With the advent and Egyptian Dialect (Elfardy and Diab, 2012), of social media, such as Facebook posts, Twitter or MSA and Moroccan Arabic (Samih and Maier, 50 Proceedings of the Second Workshop on Computational Approaches to Code Switching, pages 50–59, Austin, TX, November 1, 2016. c 2016 Association for Computational Linguistics 2016a; Samih and Maier, 2016b). The current inant language. Their system obtained an accuracy shared task is limited to two scenarios: a) code- of 96.9% at word-level language identification task. switching between two distinct languages: Spanish- The task of detecting code-switching points is English, b) and two language varieties: MSA- generally cast as a sequence labeling problem. Its Egyptian Dialect. difficulty depends largely on the language pair be- With the massive increase in code-switched writ- ing processed. ings in user-generated content, it has become im- Several projects have treated code-switching be- perative to develop tools and methods to handle and tween MSA and Egyptian Arabic. For example, El- process this type of data. Identification of languages fardy et al. (2013) present a system for the detec- used in the sentence is the first step in doing any kind tion of code-switching between MSA and Egyptian of text analysis. For example, most data found in so- Arabic which selects a tag based on the sequence cial media produced by bilingual people is a mixture with a maximum marginal probability, considering of two languages. In order to process or translate this 5-grams. A later version of the system is named data to some other language, the first step will be to AIDA2 (Al-Badrashiny et al., 2015) and it is a more detect text chunks and identify which language each complex hybrid system that incorporates different chunk belongs to. The other categories like named classifiers and components such as language mod- entities, mixed, ambiguous and other are also impor- els, a named entity recognizer, and a morphological tant for further language processing. analyzer. The classification strategy is built as a cas- cade voting system, whereby a conditional Random 2 Related Works Field (CRF) classifier tags each word based on the decisions from four other underlying classifiers. Code-switching has attracted considerable attention The participants of the “First Workshop on Com- in theoretical linguistics and sociolinguistics over putational Approaches to Code Switching” had ap- several decades. However, until recently there has plied a wide range of machine learning and sequence not been much work on the computational pro- learning algorithms with some using additional cessing of code-switched data. The first compu- online resources like English dictionary, Hindi- tational treatment of this linguistic phenomenon Nepali wiki, dbpedia, online dumps, LexNorm, can be found in (Joshi, 1982). He introduces a etc. to tackle the problem of language detec- grammar-based system for parsing and generating tion in code-switched tweets on Nepali-English, code-switched data. More recently, the detection of Spanish-English, Mandarin-English and MSA Di- code-switching has gained traction, starting with the alects (Solorio et al., 2014). For MSA-Dialects, work of (Solorio and Liu, 2008), and culminating in two CRF-based systems, a system using language- the first shared task at the “First Workshop on Com- independent extended Markov models, and a system putational Approaches to Code Switching” (Solorio using a CRF autoencoder have been presented; the et al., 2014). Moreover, there have been efforts latter proved to be the most successful. in creating and annotating code-switching resources The majority of the systems dealing with word- (Özlem Çetinoglu, 2016; Elfardy and Diab, 2012; level language identification in code-switching rely Maharjan et al., 2015; Lignos and Marcus, 2013). on linguistic resources (such as named entity Maharjan et al. (2015) used a user-centric approach gazetteers and word lists) and linguistic informa- to collect code-switched tweets for Nepali-English tion (such as POS tags and morphological analysis), and Spanish-English language pairs. They used two and they use machine learning methods that have methods, namely a dictionary based approach and been typically used with sequence labeling prob- CRF GE and obtained an F1 score of 86% and 87% lems, such as support vector machine (SVM), con- for Spanish-English and Nepali-English respectively ditional random fields (CRF) and n-gram language at word level language identification task. Lig- models. Very few, however, have recently turned nos and Marcus (2013) collected a large number of to recurrent neural networks (RNN) and word em- monolingual Spanish and English tweets and used bedding with remarkable success. (Chang and Lin, ratio list method to tag each token with by its dom- 2014) used a RNN architecture and combined it 51 with pre-trained word2vector skip-gram word em- beddings, a log bilinear model that allows words with similar contexts to have similar embeddings. The word2vec embeddings were trained on a large Twitter corpus of random samples without filtering by language, assuming that different languages tend to share different contexts, allowing embeddings to provide good separation between languages. They showed that their system outperforms the best SVM- based systems reported in the EMNLP’14 Code- Switching Workshop. Vu and Schultz (2014) proposed to adapt the re- current neural network language model to different code-switching behaviors and even use them to gen- erate artificial code-switching text data. Adel et al. (2013) investigated the application of RNN lan- guage models and factored language models to the task of identifying code-switching in speech, and re- ported a significant improvement compared to the traditional n-gram language model. Our work is similar to that of Chang and Lin (2014) in that we use RNNs and word embed- dings. The difference is that we use long-short- term memory (LSTM) with the added advantage of the memory cells that efficiently capture long- Figure 1: System Architecture. distance dependencies. We also combine word- level with character-level representation to obtain morphology-like information on words. where ht is the hidden states vector, W denotes weight matrix, b denotes bias vector and f is the ac- 3 Model tivation function of the hidden layer. Theoretically RNN can learn long distance dependencies, still in In this section, we will provide a brief description practice they fail due the vanishing/exploding gra- of LSTM, and introduce the different components dient (Bengio et al., 1994). To solve this problem , of our code-switching detection model. The archi- Hochreiter and Schmidhuber (1997) introduced the tecture of our system, shown in Figure 1, bears re- long short-term memory RNN (LSTM). The idea semblance to the models introduced by Huang et al.
Recommended publications
  • Why Is Language Typology Possible?
    Why is language typology possible? Martin Haspelmath 1 Languages are incomparable Each language has its own system. Each language has its own categories. Each language is a world of its own. 2 Or are all languages like Latin? nominative the book genitive of the book dative to the book accusative the book ablative from the book 3 Or are all languages like English? 4 How could languages be compared? If languages are so different: What could be possible tertia comparationis (= entities that are identical across comparanda and thus permit comparison)? 5 Three approaches • Indeed, language typology is impossible (non- aprioristic structuralism) • Typology is possible based on cross-linguistic categories (aprioristic generativism) • Typology is possible without cross-linguistic categories (non-aprioristic typology) 6 Non-aprioristic structuralism: Franz Boas (1858-1942) The categories chosen for description in the Handbook “depend entirely on the inner form of each language...” Boas, Franz. 1911. Introduction to The Handbook of American Indian Languages. 7 Non-aprioristic structuralism: Ferdinand de Saussure (1857-1913) “dans la langue il n’y a que des différences...” (In a language there are only differences) i.e. all categories are determined by the ways in which they differ from other categories, and each language has different ways of cutting up the sound space and the meaning space de Saussure, Ferdinand. 1915. Cours de linguistique générale. 8 Example: Datives across languages cf. Haspelmath, Martin. 2003. The geometry of grammatical meaning: semantic maps and cross-linguistic comparison 9 Example: Datives across languages 10 Example: Datives across languages 11 Non-aprioristic structuralism: Peter H. Matthews (University of Cambridge) Matthews 1997:199: "To ask whether a language 'has' some category is...to ask a fairly sophisticated question..
    [Show full text]
  • Manual for Language Test Development and Examining
    Manual for Language Test Development and Examining For use with the CEFR Produced by ALTE on behalf of the Language Policy Division, Council of Europe © Council of Europe, April 2011 The opinions expressed in this work are those of the authors and do not necessarily reflect the official policy of the Council of Europe. All correspondence concerning this publication or the reproduction or translation of all or part of the document should be addressed to the Director of Education and Languages of the Council of Europe (Language Policy Division) (F-67075 Strasbourg Cedex or [email protected]). The reproduction of extracts is authorised, except for commercial purposes, on condition that the source is quoted. Manual for Language Test Development and Examining For use with the CEFR Produced by ALTE on behalf of the Language Policy Division, Council of Europe Language Policy Division Council of Europe (Strasbourg) www.coe.int/lang Contents Foreword 5 3.4.2 Piloting, pretesting and trialling 30 Introduction 6 3.4.3 Review of items 31 1 Fundamental considerations 10 3.5 Constructing tests 32 1.1 How to define language proficiency 10 3.6 Key questions 32 1.1.1 Models of language use and competence 10 3.7 Further reading 33 1.1.2 The CEFR model of language use 10 4 Delivering tests 34 1.1.3 Operationalising the model 12 4.1 Aims of delivering tests 34 1.1.4 The Common Reference Levels of the CEFR 12 4.2 The process of delivering tests 34 1.2 Validity 14 4.2.1 Arranging venues 34 1.2.1 What is validity? 14 4.2.2 Registering test takers 35 1.2.2 Validity
    [Show full text]
  • Error Linguistics and the Teaching and Learning of Written English in Nigerian Universities As a Second Language Environment
    International Journal of Humanities and Social Science Invention (IJHSSI) ISSN (Online): 2319 – 7722, ISSN (Print): 2319 – 7714 www.ijhssi.org ||Volume 9 Issue 6 Ser. I || June 2020 || PP 61-68 Error Linguistics and the Teaching and Learning of Written English in Nigerian Universities as a Second Language Environment MARTIN C. OGAYI Department of English and Literary Studies Ebonyi State University, Abakaliki. Abstract: Alarming decline of the written English language proficiency of graduates of Nigerian universities has been observed for a long period of time. This has engendered a startling and besetting situation in which most graduates of Nigerian universities cannot meet the English language demands of their employers since they cannot write simple letters, memoranda and reports in their places of work. Pedagogy has sought different methods or strategies to remedy the trend. Applied linguistics models have been developed towards the improvement of second or foreign language learning through the use of diagnostic tools such as error analysis. This paper focuses on the usefulness and strategies of error analysis as diagnostic tool and strategy for second language teaching exposes the current state of affairs in English as a Second Language learning in Nigerian sociolinguistic environment highlights the language-learning theories associated with error analysis, and, highlights the factors that hinder effective error analysis of written ESL production in Nigerian universities. The paper also discusses the role of positive corrective feedback as a beneficial error treatment that facilitates second language learning. Practical measures for achieving more efficient and more productive English as a Second Language teaching and learning in Nigerian universities have also been proffered in the paper.
    [Show full text]
  • The Study of Errors and Feedback in Second Language Acquisition
    IIUC STUDIES ISSN 1813-7733 Vol.- 8, December 2011 (p 131-140) The Study of Errors and Feedback in Second Language Acquisition (SLA) Research: Strategies used by the ELT practitioners in Bangladesh to address the errors their students make in learning English Md. Maksud Ali* Abstract: The study of errors and feedback is one of the major issues in Second Language Acquisition (SLA) research. The research in this area is so important because it gives the English language teaching (ELT) practitioners an opportunity to have an insight in understanding the nature of learners’ errors and in giving feedback to learners. Following quantitative method, this paper carries out an empirical cross-sectional survey research on errors and feedback in SLA in the context of Bangladesh in order to generalize the way in which the Bangladeshi ELT practitioners view their students’ errors and the ways they correct the errors. Key Words: Second Language (L2) Learners’ Errors, Mother Tongue (L1) Interference, Interlanguage, Contrastive Analysis (CA), Error Analysis (EA), Personal Competence, Fossilization, Feedback, Learner Autonomy 1 Introduction The study of errors made by the Second Language Learners (SLL) has been one of the major concerns in Second Language Acquisition (SLA) research. Earlier, it was generally considered that Second language (L2) Learners’ errors were the result of mother tongue (L1) interference. However, later there was a reaction against the view and some researchers came up with a new orthodoxy that ‘vast majority of errors’ were not due to L1 interference but rather they were because of their ‘unique linguistic * Lecturer, Department of English Language & Literature, IIUC.
    [Show full text]
  • ECU International Student Writing Colloquium
    INTERNATIONAL STUDENT WRITING COLLOQUIUM Working with International Student Writers: Perspectives from the Field of Second Language Writing February 10-11, 2021 12:00pm – 2:00pm ~All Sessions Delivered via Zoom~ Working with International Student Writers: Perspectives from the Field of Second Language Writing Program Description In this informal, virtual colloquium, world-renowned experts in the field of second language writing share their perspectives and tips on working with international student writers. While sessions target faculty who work with international student writers, faculty from throughout the UNC System are encouraged and welcome to attend. Program organized by Dr. Mark Johnson, Associate Professor of TESOL and Applied Linguistics, East Carolina University®. Program sponsored by the ECU Office of Global Affairs and the ECU Graduate School. Register now! Working with International Student Writers: Perspectives from the Field of Second Language Writing Program Schedule Time Speaker 12:00 pm – 12:45 pm Dr. Charlene Polio 12:45 pm – 1:30 pm Dr. Dana Ferris February 10, 2021 10, February 1:30 pm – 2:00 pm Question and Answer Session Time Speaker 2021 12:00 pm – 12:45 pm Dr. Christine Feak 12:45 pm – 1:30 pm Dr. Paul Kei Matsuda February 11, 1:30 pm – 2:00 pm Question and Answer Session Working with International Student Writers: Perspectives from the Field of Second Language Writing Wednesday, February 10, 12:00 – 12:45pm Speaker: Dr. Charlene Polio Title: Promoting Language and Genre Awareness across Contexts: Being All Things to All People Abstract: A genre approach to teaching writing may focus on a specific genre within a specific field, but we rarely have the luxury of teaching homogeneous groups of students, who do not have diverse goals and needs, particularly at lower proficiency levels.
    [Show full text]
  • Research on Language and Learning: Implications for Language Teaching
    Research on Language and Learning: Implications for Language Teaching EVA ALCÓN' Universitat Jaume 1 ABSTRACT Taking into account severa1 limitations of communicative language teaching (CLT), this paper calls for the need to consider research on language use and learning through communication as a basis for language teaching. It will be argued that a reflective approach towards language teaching and learning might be generated, which is explained in terms of the need to develop a context-sensitive pedagogy and in terms of teachers' and learners' development. KEYWORDS: Language use, teaching, leaming. INTRODUCTION The limitations of the concept of method becomes obvious in the literature on language teaching methodology. This is also referred to as the pendulum metaphor, that is to say, a method is proposed as a reform or rejection of a previously accepted method, it is applied in the language classroom, and eventually it is criticised or extended (for a review see Alcaraz et al., 1993; Alcón, 2002a; Howatt, 1984; Richards & Rodgers, 1986; Sánchez, 1993, 1997). Furthermore, * Addressfor correspondence: Eva Alcón Soler, Dpto. Filología Inglesa y Románica, Facultat Cikncies Humanes i Socials, Universitat Jaume 1, Avda. Sos Baynat, s/n, 12071 Castellón. Telf. 964-729605, Fax: 964-729261. E-mail: [email protected] O Servicio de Publicaciones. Universidad de Murcia. All rights reserved. IJES, VOL 4 ((1,2004, pp. 173-196 174 Eva Alcón as reported by Nassaji (2000), throughout the history of English language teaching methodology, there seems to be a dilema over focused analytic versus unfocused experiential language teaching. While the former considers learning as the development of formal rule-based knowledge, the latter conceptualises learning as the result of naturalistic use of language.
    [Show full text]
  • Foreign Language Teaching and Learning Aleidine Kramer Moeller University of Nebraska–Lincoln, [email protected]
    University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Faculty Publications: Department of Teaching, Department of Teaching, Learning and Teacher Learning and Teacher Education Education 2015 Foreign Language Teaching and Learning Aleidine Kramer Moeller University of Nebraska–Lincoln, [email protected] Theresa Catalano University of Nebraska-Lincoln, [email protected] Follow this and additional works at: http://digitalcommons.unl.edu/teachlearnfacpub Part of the Bilingual, Multilingual, and Multicultural Education Commons, Educational Methods Commons, International and Comparative Education Commons, and the Teacher Education and Professional Development Commons Moeller, Aleidine Kramer and Catalano, Theresa, "Foreign Language Teaching and Learning" (2015). Faculty Publications: Department of Teaching, Learning and Teacher Education. 196. http://digitalcommons.unl.edu/teachlearnfacpub/196 This Article is brought to you for free and open access by the Department of Teaching, Learning and Teacher Education at DigitalCommons@University of Nebraska - Lincoln. It has been accepted for inclusion in Faculty Publications: Department of Teaching, Learning and Teacher Education by an authorized administrator of DigitalCommons@University of Nebraska - Lincoln. Published in J.D. Wright (ed.), International Encyclopedia for Social and Behavioral Sciences 2nd Edition. Vol 9 (Oxford: Pergamon Press, 2015), pp. 327-332. doi: 10.1016/B978-0-08-097086-8.92082-8 Copyright © 2015 Elsevier Ltd. Used by permission. digitalcommons.unl.edu Foreign Language Teaching and Learning Aleidine J. Moeller and Theresa Catalano 1. Department of Teaching, Learning and Teacher Education, University of Nebraska–Lincoln, USA Abstract Foreign language teaching and learning have changed from teacher-centered to learner/learning-centered environments. Relying on language theories, research findings, and experiences, educators developed teaching strategies and learn- ing environments that engaged learners in interactive communicative language tasks.
    [Show full text]
  • Contributors
    Contributors Editors Xuesong (Andy) Gao is an associate professor in the School of Education, the University of New South Wales (Australia). His current research and teaching interests are in the areas of learner autonomy, sociolinguistics, vocabulary studies, language learning narratives and language teacher education. His major publications appear in journals including Applied Linguistics, Educational Studies, English Language Teaching Journal, Journal of Multilingual and Multicultural Development, Language Teaching Research, Research Papers in Education, Studies in Higher Education, System, Teaching and Teacher Education, TESOL Quarterly and World Englishes. In addition, he has published one research mono- graph (Strategic Language Learning) and co-edited a volume on identity, motivation and autonomy with Multilingual Matters. He is a co-editor for the System journal and serves on the editorial and advisory boards for journals including The Asia Pacific Education Researcher, Journal of Language, Identity and Education and Teacher Development. Hayriye Kayi-Aydar is an assistant professor of English applied linguistics at the University of Arizona. Her research works with discourse, narra- tive and English as a second language (ESL) pedagogy, at the intersections of the poststructural second language acquisition (SLA) approaches and interactional sociolinguistics. Her specific research interests are agency, identity and positioning in classroom talk and teacher/learner narratives. Her most recent work investigates how language teachers from different ethnic and racial backgrounds construct professional identities and how they position themselves in relation to others in contexts that include English language learners. Her work on identity and agency has appeared in various peer-reviewed journals, such as TESOL Quarterly, Teaching and Teacher Education, ELT Journal, Critical Inquiry in Language Stud- ies, Journal of Language, Identity, and Education, Journal of Latinos and Education, System and Classroom Discourse.
    [Show full text]
  • Dictionary Writing System
    Dictionary Writing System (DWS) + Corpus Query Package (CQP): The Case of TshwaneLex Gilles-Maurice de Schryver, Department of African Languages and Cul- tures, Ghent University, Ghent, Belgium; Xhosa Department, University of the Western Cape, Bellville, Republic of South Africa; and TshwaneDJe HLT, Pretoria, Republic of South Africa ([email protected]), and Guy De Pauw, CNTS — Language Technology Group, University of Antwerp, Belgium; and School of Computing and Informatics, University of Nairobi, Kenya ([email protected]) Abstract: In this article the integrated corpus query functionality of the dictionary compilation software TshwaneLex is analysed. Attention is given to the handling of both raw corpus data and annotated corpus data. With regard to the latter it is shown how, with a minimum of human effort, machine learning techniques can be employed to obtain part-of-speech tagged corpora that can be used for lexicographic purposes. All points are illustrated with data drawn from English and Northern Sotho. The tools and techniques themselves, however, are language-independent, and as such the encouraging outcomes of this study are far-reaching. Keywords: LEXICOGRAPHY, DICTIONARY, SOFTWARE, DICTIONARY WRITING SYS- TEM (DWS), CORPUS QUERY PACKAGE (CQP), TSHWANELEX, CORPUS, CORPUS ANNO- TATION, PART-OF-SPEECH TAGGER (POS-TAGGER), MACHINE LEARNING, NORTHERN SOTHO (SESOTHO SA LEBOA) Samenvatting: Woordenboekaanmaaksysteem + corpusanalysepakket: een studie van TshwaneLex. In dit artikel wordt het geïntegreerde corpusanalysepakket van het woordenboekaanmaaksysteem TshwaneLex geanalyseerd. Aandacht gaat zowel naar het verwer- ken van onbewerkte corpusdata als naar geannoteerde corpusdata. Wat het laatste betreft wordt aangetoond hoe, met een minimum aan intellectuele arbeid, automatische leertechnieken met suc- ces kunnen worden ingezet om corpora voor lexicografische doeleinden aan te maken waarin de woordklassen expliciet worden vermeld.
    [Show full text]
  • Error Correction in the Second Language Classroom
    VOLUME 11 ISSUE 2 FALL 2007 FeatURED THEME: Error correction in the second ERROR CORRECTION language classroom by Shawn Loewen The topic of error correction in the second language (L2) classroom tends to spark controversy among both language teachers and L2 acquisition researchers. Teachers may have very strong views about error correction, based on their own previous L2 learning experiences, or they may be more ambivalent, particularly if they have been following the debate among L2 researchers on the topic. Depending on which journal articles a teacher reads, he or she will find error correction described on a continuum ranging from ineffective and possibly harmful (e.g., Truscott, 1999) to beneficial (e.g., Russell & Spada, 2006) and possibly even essential for some grammatical structures (White, 1991). Furthermore, teachers may be confronted with students’ opinions about error correction since students are on the receiving end and often have their own views of if and how it should happen in the classroom. Given these widely varying views, what is a teacher to do? This article will address this question by exploring some of the current thinking about error correction in the field of L2 acquisition research. Before going further, it is important to clarify the context in which error correction will be considered. Second language instruction can be conceptualized as falling into two broad categories: meaning-focused instruction and form-focused instruction (Long, 1996; Ellis, 2001). Meaning-focused instruction is characterized by communicative language teaching and involves no direct, explicit attention to language form. The L2 is seen as a vehicle for learners to express their ideas.
    [Show full text]
  • Macro-Sociolinguistics Page 1 MACRO SOCIOLINGUISTICS
    MACRO SOCIOLINGUISTICS: INSIGHT LANGUAGE Rohib Adrianto Sangia Abstract: Language can be studied internally and externally. As externally, Sociolinguistics as the branch of linguistics looked or put position in relation to language speakers in the community, because in human society is no longer as individuals, will remain as a social community. Sociolinguistics concerns with two aspects of civilization, language and society, there are appropriate terms which are micro and macro in sociolinguistics. The main differences of them are micro-sociolinguistics or sociolinguistics –in narrow sense- is the study of language in relation to society, while macro-sociolinguistics or the sociology of language is the study of society in relation to language. Macro-sociolinguistics focuses such as social factors, exactly the interaction between language and dialect, the study of the decline and stabilization of minority languages, bilingualism developmental stability in a particular group. Keywords: Language, Sociolinguistics, Macro Sociolinguistics, Bilingualism. INTRODUCTION Language is a communication tool that people use to interact with each other. By mastering the language of humans can know the content of the world through science and new knowledge and had never imagined before. As a means of communication and interaction that is only possessed by humans, language can be studied internally and externally (Thomason and Kaufman, 1988: 22). Internally means the study made against internal elements such as language course, the structure of phonological, morphological, and syntactical alone. While externally meaningful study was conducted to things or factors outside the language, but in the use of language itself, speech community or the environment. Language may refer to the specific capacity in humans to obtain and use a complex system of communication, or to a specific agency of a complex communication system.
    [Show full text]
  • Typology, Documentation, Description, and Typology
    Typology, Documentation, Description, and Typology Marianne Mithun University of California, Santa Barbara Abstract If the goals of linguistic typology, are, as described by Plank (2016): (a) to chart linguistic diversity (b) to seek out order or even unity in diversity knowledge of the current state of the art is an invaluable tool for almost any linguistic endeavor. For language documentation and description, knowing what distinctions, categories, and patterns have been observed in other languages makes it possible to identify them more quickly and thoroughly in an unfamiliar language. Knowing how they differ in detail can prompt us to tune into those details. Knowing what is rare cross-linguistically can ensure that unusual features are richly documented and prominent in descriptions. But if documentation and description are limited to filling in typological checklists, not only will much of the essence of each language be missed, but the field of typology will also suffer, as new variables and correlations will fail to surface, and our understanding of deeper factors behind cross-linguistic similarities and differences will not progress. 1. Typological awareness as a tool Looking at the work of early scholars such as Franz Boas and Edward Sapir, it is impossible not to be amazed at the richness of their documentation and the insight of their descriptions of languages so unlike the more familiar languages of Europe. It is unlikely that Boas first arrived on Baffin Island forewarned to watch for velar/uvular distinctions and ergativity. Now more than a century later, an awareness of what distinctions can be significant in languages and what kinds of systems recur can provide tremendous advantages, allowing us to spot potentially important features sooner and identify patterns on the basis of fewer examples.
    [Show full text]