Code-Mixing on Sesame Street: Dawn of the Adversarial Polyglots

Total Page:16

File Type:pdf, Size:1020Kb

Code-Mixing on Sesame Street: Dawn of the Adversarial Polyglots Code-Mixing on Sesame Street: Dawn of the Adversarial Polyglots Samson Tanx\ Shafiq Jotyxz xSalesforce Research Asia \National University of Singapore zNanyang Technological University fsamson.tan,[email protected] Abstract Multilingual models have demonstrated im- pressive cross-lingual transfer performance. However, test sets like XNLI are monolingual at the example level. In multilingual commu- (a) Aligned words across sentences nities, it is common for polyglots to code-mix when conversing with each other. Inspired by this phenomenon, we present two strong black- box adversarial attacks (one word-level, one phrase-level) for multilingual models that push their ability to handle code-mixed sentences to the limit. The former uses bilingual dictionar- ies to propose perturbations and translations of the clean example for sense disambiguation. (b) Extracted candidate perturbations The latter directly aligns the clean example with its translations before extracting phrases as perturbations. Our phrase-level attack has (c) Final multilingual adversary a success rate of 89.75% against XLM-Rlarge, Figure 1: BUMBLEBEE’s three key stages of adver- bringing its average accuracy of 79.85 down sary generation: (a) Align words in the matrix (En- to 8.18 on XNLI. Finally, we propose an effi- glish) and embedded sentences (top: Indonesian, cient adversarial training scheme that trains in bottom: Chinese); (b) Extract candidate perturba- the same number of steps as the original model and show that it improves model accuracy.1 tions from embedded sentences; (c) Construct final adversary by maximizing the target model’s loss. 1 Introduction multilingual societies (e.g., Singapore, Papua New The past year has seen incredible breakthroughs in Guinea, etc.), it is common for multilingual inter- cross-lingual generalization with the advent of mas- locutors to produce sentences by mixing words, sive multilingual models that aim to learn universal phrases, and even grammatical structures from the language representations (Pires et al., 2019; Wu languages in their repertoires (Matras and Sakel, and Dredze, 2019; Conneau et al., 2020b). These 2007). This is known as code-mixing (Poplack models have demonstrated impressive cross-lingual et al., 1988), a phenomenon common in casual con- transfer abilities: simply fine-tuning them on task versational environments such as social media and data from a high resource language such as English text messages.2 Hence, it is crucial for NLP sys- after pretraining on monolingual corpora was suffi- tems serving multilingual communities to be robust cient to manifest such abilities. This was observed to code-mixing if they are to understand and estab- even for languages with different scripts and no lish rapport with their users (Tay, 1989; Bawa et al., vocabulary overlap (K et al., 2020). 2020) or defend against adversarial polyglots. However, transferring from one language to an- Although gold standard data (Bali et al., 2014; other is insufficient for NLP systems to understand Patwa et al., 2020) is important for definitively multilingual speakers in an increasingly multilin- evaluating code-mixed text processing ability, such gual world (Aronin and Singleton, 2008). In many datasets are expensive to collect and annotate. The 1Code: github.com/salesforce/adversarial-polyglots 2Examples of real code-mixing in Appendix A. dizzying range of potential language combinations on task data from a high-resource language such as further compounds the immensity of such an effort. English. This is known as cross-lingual transfer. We posit that performance on appropriately crafted adversaries could act as a lower bound of Code-mixed text processing. Previous research a model’s ability to generalize to the distribution on code-mixed text processing focused on con- simulated by said adversaries, an idea akin to worst- structing formal grammars (Joshi, 1982) and token- case analysis (Divekar, 1984). For example, Tan level language identification (Bali et al., 2014; et al.(2020b) showed that an NLP system that was Solorio et al., 2014; Barman et al., 2014), be- robust to morphological adversaries was less per- fore progressing to named entity recognition and plexed by dialectal text exhibiting morphological part-of-speech tagging (Ball and Garrette, 2018; variation. Likewise, if a system is robust to code- AlGhamdi and Diab, 2019; Aguilar and Solorio, mixed adversaries constructed from some set of lan- 2020). Recent work explores code-mixing in guages, it is reasonable to expect it to also perform higher-level tasks such as question answering and better on real code-mixed text in those languages. task-oriented dialogue (Chandu et al., 2019; Ahn While they may not fully model the intricacies of et al., 2020). Muller et al.(2020) demonstrate real code-mixing (Sridhar and Sridhar, 1980), we mBERT’s ability to transfer to an unseen dialect by believe that they can be useful in the absence of exploiting its speakers’ tendency to code-mix. appropriate evaluation data. Hence, we: A key challenge of developing models that are robust to code-mixing is the availability of code- • Propose two strong black-box adversarial attacks mixed datasets. Hence, Winata et al.(2019) use a targeting the cross-lingual generalization ability pointer-generator network to generate synthetically of massive multilingual representations (Fig. 1), code-mixed sentences while Pratapa et al.(2018) demonstrating their effectiveness on state-of-the- explore the use of parse trees for the same purpose. art models for natural language inference and Yang et al.(2020) propose to improve machine question answering. To our knowledge, these are translation with “code-switching pretraining”, re- the first two multilingual adversarial attacks. placing words with their translations in a simi- • Propose an efficient adversarial training scheme lar manner to masked language modeling (Devlin that takes the same number of steps as standard et al., 2019). These word pairs are constructed supervised training and show that it creates more from monolingual corpora using cosine similarity. language-invariant representations, improving ac- Sitaram et al.(2019) provide a comprehensive sur- curacy in the absence of lexical overlap. vey of code-mixed language processing. 2 Related Work Word-level adversaries. Modified inputs aimed at disrupting a model’s predictions are known as Multilingual classifiers. Low resource lan- adversarial examples (Szegedy et al., 2014). In guages often lack support due to the high cost of an- NLP, perturbations can be applied at the character, notating data for supervised learning. An approach subword, word, phrase, or sentence levels. to tackle this challenge is to build cross-lingual rep- Early word-level adversarial attacks (Ebrahimi resentations that only need to be trained on task et al., 2018; Blohm et al., 2018) made use of the data from a high resource language to perform well target model’s gradients to flip individual words to on another under-resourced language (Klementiev trick the model into making the wrong prediction. et al., 2012). Artetxe and Schwenk(2019) pre- However, while the perturbations were adversarial sented the first general purpose multilingual rep- for the target model, perturbed word’s original se- resentation using a BiLSTM encoder. Following mantics was often not preserved. This could result the success of Transformer models (Vaswani et al., in the expected prediction changing and making 2017), recent multilingual models like mBERT (De- the model appear more brittle than it actually is. vlin et al., 2019), Unicoder (Huang et al., 2019), Later research addressed this by searching for ad- and XLM-R (Conneau et al., 2020a) take the versarial rules (Ribeiro et al., 2018) or by constrain- pretraining!fine-tuning paradigm into the multi- ing the candidate perturbations to the k nearest lingual realm by pretraining Transformer encoders neighbors in the embedding space (Alzantot et al., on unlabeled monolingual corpora with various lan- 2018; Michel et al., 2019; Ren et al., 2019; Zhang guage modeling objectives before fine-tuning them et al., 2019; Li et al., 2019; Jin et al., 2020). Zang Original P: The girl that can help me is all the way across town. H: There is no one who can help me. Adversary P: olan girl that can help me is all the way across town. H: dw§ ¯ one who can help me. Prediction Before: Contradiction After: Entailment Original P: We didn’t know where they were going. H: We didn’t know where the people were traveling to. Adversary P: We didn’t know where they were going. H: We didn’t know where les gens allaient. Prediction Before: Entailment After: Neutral Original P: Well it got to where there’s two or three aircraft arrive in a week and I didn’t know where they’re flying to. H: There are never any aircraft arriving. Adversary P: общем, дошло до mahali there’s two or three aircraft arrive in a week and I didn’t know where they’re flying to. H: Îe¡有 aircraft arriving. Prediction Before: Contradiction After: Entailment Table 1: BUMBLEBEE adversaries found for XLM-R on XNLI (P: Premise; H: Hypothesis). et al.(2020) take another approach by making use rules, from different languages in a single sentence. of a annotated sememes to disambiguate polyse- This is distinguished from code-switching, which mous words, while Tan et al.(2020a) perturb only occurs at the inter-sentential level (Kachru, 1978). the words’ morphology and encourage semantic Extreme code-mixing. Inspired by the prolifer- preservation via a part-of-speech constraint. Other ation of real-life code-mixing and polyglots, we approaches make use of language models to gener- propose POLYGLOSS and BUMBLEBEE, two multi- ate candidate perturbations (Garg and Ramakrish- lingual adversarial attacks that adopt the persona of nan, 2020; Han et al., 2020). Wallace et al.(2019) an adversarial code-mixer. We focus on the lexical find phrases that act as universally adversarial per- component of code-mixing, where some words in a turbations when prepended to clean inputs.
Recommended publications
  • Variations in Language Use Across Gender: Biological Versus Sociological Theories
    UC Merced Proceedings of the Annual Meeting of the Cognitive Science Society Title Variations in Language Use across Gender: Biological versus Sociological Theories Permalink https://escholarship.org/uc/item/1q30w4z0 Journal Proceedings of the Annual Meeting of the Cognitive Science Society, 28(28) ISSN 1069-7977 Authors Bell, Courtney M. McCarthy, Philip M. McNamara, Danielle S. Publication Date 2006 Peer reviewed eScholarship.org Powered by the California Digital Library University of California Variations in Language Use across Gender: Biological versus Sociological Theories Courtney M. Bell (cbell@ mail.psyc.memphis.edu Philip M. McCarthy ([email protected]) Danielle S. McNamara ([email protected]) Institute for Intelligent Systems University of Memphis Memphis, TN38152 Abstract West, 1975; West & Zimmerman, 1983) and overlap We examine gender differences in language use in light of women’s speech (Rosenblum, 1986) during conversations the biological and social construction theories of gender. than the reverse. On the other hand, other research The biological theory defines gender in terms of biological indicates no gender differences in interruptions (Aries, sex resulting in polarized and static language differences 1996; James & Clarke, 1993) or insignificant differences based on sex. The social constructionist theory of gender (Anderson & Leaper, 1998). However, potentially more assumes gender differences in language use depend on the context in which the interaction occurs. Gender is important than citing the differences, is positing possible contextually defined and fluid, predicting that males and explanations for why they might exist. We approach that females use a variety of linguistic strategies. We use a problem here by testing the biological and social qualitative linguistic approach to investigate gender constructionist theories (Bergvall, 1999; Coates & differences in language within a context of marital conflict.
    [Show full text]
  • Contesting Regimes of Variation: Critical Groundwork for Pedagogies of Mobile Experience and Restorative Justice
    Robert W. Train Sonoma State University, California CONTESTING REGIMES OF VARIATION: CRITICAL GROUNDWORK FOR PEDAGOGIES OF MOBILE EXPERIENCE AND RESTORATIVE JUSTICE Abstract: This paper examines from a critical transdisciplinary perspective the concept of variation and its fraught binary association with standard language as part of the conceptual toolbox and vocabulary for language educators and researchers. “Variation” is shown to be imbricated a historically-contingent metadiscursive regime in language study as scientific description and education supporting problematic speaker identities (e.g., “non/native”, “heritage”, “foreign”) around an ideology of reduction through which complex sociolinguistic and sociocultural spaces of diversity and variability have been reduced to the “problem” of governing people and spaces legitimated and embodied in idealized teachers and learners of languages invented as the “zero degree of observation” (Castro-Gómez 2005; Mignolo 2011) in ongoing contexts of Western modernity and coloniality. This paper explores how regimes of variation have been constructed in a “sociolinguistics of distribution” (Blommaert 2010) constituted around the delimitation of borders—linguistic, temporal, social and territorial—rather than a “sociolinguistics of mobility” focused on interrogating and problematizing the validity and relevance of those borders in a world characterized by diverse transcultural and translingual experiences of human flow and migration. This paper reframes “variation” as mobile modes-of-experiencing- the-world in order to expand the critical, historical, and ethical vocabularies and knowledge base of language educators and lay the groundwork for pedagogies of experience that impact human lives in the service of restorative social justice. Keywords: metadiscursive regimes w sociolinguistic variation w standard language w sociolinguistics of mobility w pedagogies of experience Train, Robert W.
    [Show full text]
  • Language Ideologies:Bridging the Gap Between Social Structures and Local Practices Introduction to the Colloquium
    Language Ideologies:Bridging the Gap between Social Structures and Local Practices Introduction to the Colloquium Brigitta Busch ¨ Jürgen Spitzmüller University of Vienna ¨ Department of Linguistics Sociolinguistics Symposium öw Murcia, wÏ/.Ï/ö.wÏ Bridging what? Introduction to the Colloquium Busch/Spitzmüller Bridging what? Stance and Metapragmatics Indexical Anchors ‚ Local indexicality – stance and social positions Programme ‚ Social indexicality – language ideologies ö¨öö Bridging what? Introduction to the Colloquium Busch/Spitzmüller Bridging what? Stance and Metapragmatics Indexical Anchors ‚ Local indexicality – stance and social positions Programme ‚ Social indexicality – language ideologies ö¨öö Social Positioning and Stance (as Local Practices) Introduction to the Colloquium Busch/Spitzmüller ‚ Davies,Bronwyn/Harré, Rom (wRR.). Positioning. The Discourse Production of Selves. In: Journal for the Theory Bridging what? of Social Behaviour ö./w, pp. ÿé–Ïé. Stance and Metapragmatics ‚ Wortham, Stanton (ö...). Interactional Positioning and Indexical Anchors Narrative Self-Construction. In: Narrative Inquiry Programme wR/wóÅ-wÏÿ . ‚ Englebretson, Robert (ed.) (ö..Å). Stancetaking in Discourse. Subjectivity,Evaluation, Interaction. Amsterdam/Philadelphia: Benjamins (Pragmatics & Beyond, N. S. wÏÿ). ‚ Deppermann,Arnulf (ö.wó). Positioning. In:Anna de Fina/Alexandra Georgakopoulou (eds.): The Handbook of Narrative Analysis.Oxford: Wiley Blackwell, pp. éÏR–éÏÅ. ‚ amongst many more é¨öö Social Positioning and Stance (as Local Practices – within Discursive Frames) Introduction to the Colloquium Busch/Spitzmüller Bridging what? ‚ Bamberg, Michael (wRRÅ). Positioning Between Structure Stance and and Performance. In: Journal of Narrative and Life History Metapragmatics Å/w-ÿ, pp. ééó–éÿö. Indexical Anchors ‚ Bamberg, Michael/Georgakopoulou,Alexandra (ö..Ï). Programme Small Stories as a New Perspective in Narrative and Identity Analysis. In: Text and Talk öÏ/é, pp.
    [Show full text]
  • Introduction: Documenting Variation in Endangered Languages
    Language Documentation & Conservation Special Publication No. 13 (July 2017) Documenting Variation in Endangered Languages ed. by Kristine A. Hildebrandt, Carmen Jany, and Wilson Silva, pp. 1-5 http://nlfrc.hawaii.edu/ldc/ 1 http://handle.net/24746 Introduction: Documenting Variation in Endangered Languages Kristine A. Hildebrandt, Carmen Jany, and Wilson Silva Southern Illinois University Edwardsville, California State University, San Bernadino, University of Rochester The papers in this special publication are the result of presentations and fol- low-up dialogue on emergent and alternative methods to documenting variation in endangered, minority, or otherwise under-represented languages. Recent decades have seen a burgeoning interest in many aspects of language documentation and field linguistics (Chelliah & de Reuse 2010, Crowley & Thieberger 2007, Gippert et al 2006, Grenoble 2010, Newman & Ratliff 2001, Sakel & Everett 2012, Woodbury 2011).1 There is also a great deal of material dealing with language variation in major languages (Bassiouney 2009, Eckert 2000, Eckert & Rickford 2001, Hinskens 2005, Labov 1972a, 1972b, 1994, 2001, 2006, 2012, Murray & Simon 2006, Wolfram & Schilling-Estes 2005). In contrast, intersections of language variation in endangered and minority languages are still few in number. Yet examples of those few cases published on the intersection of language documentation and language variation reveal exciting poten- tials for linguistics as a discipline, challenging and supporting classical models, creating new models and predictions. For instance, Stanford’s study of Sui (China) (2009) demonstrates that while socio-economic class in indigenous communities is un-illuminating, clan is a useful predictor of lexical variation. Likewise, phonological variation (Clarke 2009) may be more productively observed across different territorial groups in Innu (Canada), highlighting the role of “covert hierarchy” as a social factor.
    [Show full text]
  • JOURNAL of LANGUAGE and LINGUISTIC STUDIES ISSN: 1305-578X Journal of Language and Linguistic Studies, 13(1), 379-398; 2017
    Available online at www.jlls.org JOURNAL OF LANGUAGE AND LINGUISTIC STUDIES ISSN: 1305-578X Journal of Language and Linguistic Studies, 13(1), 379-398; 2017 The impact of non-native English teachers’ linguistic insecurity on learners’ productive skills Giti Ehtesham Daftaria*, Zekiye Müge Tavilb a Gazi University, Ankara, Turkey b Gazi University, Ankara, Turkey APA Citation: Daftari, G.E &Tavil, Z. M. (2017). The Impact of Non-native English Teachers’ Linguistic Insecurity on Learners’ Productive Skills. Journal of Language and Linguistic Studies, 13(1), 379-398. Submission Date: 28/11/2016 Acceptance Date:04/13/2017 Abstract The discrimination between native and non-native English speaking teachers is reported in favor of native speakers in literature. The present study examines the linguistic insecurity of non-native English speaking teachers (NNESTs) and investigates its influence on learners' productive skills by using SPSS software. The eighteen teachers participating in this research study are from different countries, mostly Asian, and they all work in a language institute in Ankara, Turkey. The learners who participated in this work are 300 intermediate, upper-intermediate and advanced English learners. The data related to teachers' linguistic insecurity were collected by questionnaires, semi-structured interviews and proficiency tests. Pearson Correlation and ANOVA Tests were used and the results revealed that NNESTs' linguistic insecurity, neither female nor male teachers, is not significantly correlated with the learners' writing and speaking scores. © 2017 JLLS and the Authors - Published by JLLS. Keywords: linguistic insecurity, non-native English teachers, productive skills, questionnaire, interview, proficiency test 1. Introduction There is no doubt today that English is the unrivaled lingua franca of the world with the largest number of non-native speakers.
    [Show full text]
  • Demographic Dialectal Variation in Social Media: a Case Study of African-American English
    Demographic Dialectal Variation in Social Media: A Case Study of African-American English Su Lin Blodgett† Lisa Green∗ Brendan O’Connor† †College of Information and Computer Sciences ∗Department of Linguistics University of Massachusetts Amherst Abstract As many of these dialects have traditionally ex- isted primarily in oral contexts, they have histor- Though dialectal language is increasingly ically been underrepresented in written sources. abundant on social media, few resources exist Consequently, NLP tools have been developed from for developing NLP tools to handle such lan- text which aligns with mainstream languages. With guage. We conduct a case study of dialectal the rise of social media, however, dialectal language language in online conversational text by in- is playing an increasingly prominent role in online vestigating African-American English (AAE) on Twitter. We propose a distantly supervised conversational text, for which traditional NLP tools model to identify AAE-like language from de- may be insufficient. This impacts many applica- mographics associated with geo-located mes- tions: for example, dialect speakers’ opinions may sages, and we verify that this language fol- be mischaracterized under social media sentiment lows well-known AAE linguistic phenomena. analysis or omitted altogether (Hovy and Spruit, In addition, we analyze the quality of existing 2016). Since this data is now available, we seek to language identification and dependency pars- analyze current NLP challenges and extract dialectal ing tools on AAE-like text, demonstrating that they perform poorly on such text compared to language from online data. text associated with white speakers. We also Specifically, we investigate dialectal language in provide an ensemble classifier for language publicly available Twitter data, focusing on African- identification which eliminates this disparity American English (AAE), a dialect of Standard and release a new corpus of tweets containing AAE-like language.
    [Show full text]
  • 4 Variation in Language and Gender
    98 Suzanne Romaine 4 Variation in Language and Gender SUZANNE ROMAINE 1 Introduction This chapter addresses some of the main research methods, trends, and findings concerning variation in language and gender. Most of the studies examined here have employed what can be referred to as quantitative variationist meth- odology (sometimes also called the quantitative paradigm or variation theory) to reveal and analyze sociolinguistic patterns, that is, correlations between variable features of the kind usually examined in sociolinguistic studies of urban speech communities (e.g. postvocalic /r/ in New York City, glottalization in Glasgow, initial /h/ in Norwich, etc.), and external social factors such as social class, age, sex, network, and style (see Labov 1972a). When such large-scale systematic research into sociolinguistic variation began in the 1960s, its main focus was to illuminate the relationship between language and social structure more generally, rather than the relationship between language and gender specifically. However, the category of sex (un- derstood simply as a binary division between males and females) was often included as a major social variable and instances of gender variation (or sex differentiation, as it was generally called) were noted in relation to other socio- linguistic patterns, particularly, social class and stylistic differentiation. Because the way in which research questions are formed has a bearing on the findings, some of the basic methodological assumptions and the historical context in which the variationist approach emerged are discussed briefly in section 2. The general findings are the focus of section 3, with special reference to connections between sex differentiation, social class stratification, and style shifting.
    [Show full text]
  • From 'Sex Differences' to Gender Variation in Sociolinguistics
    University of Pennsylvania Working Papers in Linguistics Volume 8 Issue 3 Selected Papers from NWAV 30 Article 4 2002 From 'sex differences' to gender variation in sociolinguistics. Mary Bucholtz Follow this and additional works at: https://repository.upenn.edu/pwpl Recommended Citation Bucholtz, Mary (2002) "From 'sex differences' to gender variation in sociolinguistics.," University of Pennsylvania Working Papers in Linguistics: Vol. 8 : Iss. 3 , Article 4. Available at: https://repository.upenn.edu/pwpl/vol8/iss3/4 This paper is posted at ScholarlyCommons. https://repository.upenn.edu/pwpl/vol8/iss3/4 For more information, please contact [email protected]. From 'sex differences' to gender variation in sociolinguistics. This working paper is available in University of Pennsylvania Working Papers in Linguistics: https://repository.upenn.edu/pwpl/vol8/iss3/4 From 'Sex Differences' to Gender Variation in Sociolinguistics 4- MaryjjBucholtz I1n thIntroductioe past decaden, the sociolinguisti'} c study of gender variation has taken new directions, both theoretically and methodologically. This redirection has made the linguistic subfields of language and gender and variationist so­ ciolinguistics relevant to each other in new ways. Within sociolinguistics, issues of gender emerged primarily as the study of "sex differences," in which the focus of analysis wasjj the quantifiable difference between women's and men's use: of particular linguistic variables, especially phonological variables. While thesefquestions were vitally important, their motivation was often less an interest in women or men per se than in under­ standing the social processes that actuate and advance linguistic change. Consequently, the close relationship between language and gender and quantitative sociolinguistics in the early years of both subfields became looser over time, as scholars pursued separate sets of questions with separate theoretical and methodological tools.
    [Show full text]
  • Scalar Effects of Social Networks on Language Variation
    [This is a pre-print version of an article to appear in Language Variation and Change. Please consult the final version if citing.] Scalar effects of social networks on language variation Devyani Sharma Queen Mary, University of London ABSTRACT The role of social networks in language variation has been studied using a wide range of metrics. This study critically examines the effect of different dimensions of networks on different aspects of language variation. Three dimensions of personal network (ethnicity, nationality, diversity) are evaluated in relation to three levels of language structure (phonetic form, accent range, language choice) over three generations of British Asians. The results indicate a scaling of network influences. The two metrics relating to qualities of an individual’s ties are more historically and culturally specific, whereas the network metric that relates to the structure of an individual’s social world appears to exert a more general effect on accent repertoires across generations. This two-tier typology—network qualities (more culturally contingent) and network structures (more general)—facilitates an integrated understanding of previous studies and a more structured methodology for studying the effect of social networks on language. INTRODUCTION Social networks have long been recognized as central to how people speak. Some studies have identified powerful general network mechanisms in language change (e.g., Milroy and Milroy, 1978), while others have focused on more community-specific dynamics (e.g., Gal, 1978). A broad picture of how and why language is influenced by social networks thus tends to be complicated by the difficulty of comparing across studies and the tendency to select network measures relevant to the community in question.
    [Show full text]
  • From Sociolinguistic Variation to Socially Strategic Stylisation
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by White Rose Research Online This is a repository copy of From sociolinguistic variation to socially strategic stylisation. White Rose Research Online URL for this paper: http://eprints.whiterose.ac.uk/86140/ Version: Accepted Version Article: Snell, J orcid.org/0000-0002-0337-7212 (2010) From sociolinguistic variation to socially strategic stylisation. Journal of Sociolinguistics, 14 (5). pp. 630-656. ISSN 1360-6441 https://doi.org/10.1111/j.1467-9841.2010.00457.x © 2010, Wiley. This is the peer reviewed version of the following article: Snell, J (2010) From sociolinguistic variation to socially strategic stylisation. Journal of Sociolinguistics, 14 (5). pp. 630-656 , which has been published in final form at http://dx.doi.org/10.1111/j.1467-9841.2010.00457.x. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving. Reuse Unless indicated otherwise, fulltext items are protected by copyright with all rights reserved. The copyright exception in section 29 of the Copyright, Designs and Patents Act 1988 allows the making of a single copy solely for the purpose of non-commercial research or private study within the limits of fair dealing. The publisher or other rights-holder may allow further reproduction and re-use of this version - refer to the White Rose Research Online record for this item. Where records identify the publisher as the copyright holder, users can verify any specific terms of use on the publisher’s website.
    [Show full text]
  • Relations Between Formal Linguistic Insecurity and the Perception of Linguistic Insecurity
    Relations between formal linguistic insecurity and the perception of linguistic insecurity. A quantitative study on linguistic insecurity in an educational environment at the Valencian Community (Spain) Josep M. Baldaquí Escandell Department of Catalan Studies / IIFV, University of Alicante, Spain Ap. Correus 99, E-03080. Alacant (Spain) (Received XX Month Year; final version received XX Month Year) What is the relationship between the awareness of linguistic prestige and the security or insecurity in the use of minoritized languages? Is formal linguistic insecurity (as initially described by Labov) the same as the speakers’ perception of linguistic insecurity? Which are the variables related to the various types of linguistic insecurity in bilingual environments? How does schooling affect this phenomenon? This paper attempts to provide an answer to these questions, based on the results of a research project on linguistic insecurity (LI) carried out in a social context, the Valencian Autonomous Community (Spain), where there are two official languages (Spanish and Catalan) competing against each other, but with a very different degree of social recognition. The choice of an academic environment for this study is justified by the importance that the LI phenomenon may have within the context of language learning in multilingual contexts.1 Keywords: linguistic insecurity; Catalan language; linguistic prestige Introduction Since Labov (1966) formulated the initial definition of the notion of linguistic insecurity (LI), the studies on this construct have expanded and modified the notion, which is nowadays much more complex than Labov’s original proposal. According to Labov (1966), LI is a measurement of the speaker’s perception of the prestige of certain linguistic forms, compared to the ones the speaker remembers he or she normally uses.
    [Show full text]
  • Language Variation and Change Concepts and Definitions
    Language Variation and Change Concepts and definitions WS 2014/15, Campus Essen Raymond Hickey, English Linguistics 1 Speaker contact and its effects 2 Accent 1) A reference to pronunciation, i.e. the collection of phoneticfeatures which allow speakers to be identified regionally and/or socially. Frequently it indicates that someone does not speak the standardform of a language, cf. He speaks with a strong accent. 2) The stress placed on a syllable of a word or the type of stress used by a language (volume, length and/or pitch). In the International Phonetic Alphabet primary accent is shown with a superscript vertical stroke placed before the stressed syllable as in polite [p@"laIt]. A subscript stroke indicates secondary stress, e.g. a "black\bird (compound word) versus a "black "bird (syntactic group). 3 Rule A formulation of a regular process in a language and/or the internalised knowledge of this process (from early language acquisition). Forinstance, the rule for forming an adverb from an adjective consists of adding -ly to the latter as in quick > quickly. Rules are not watertight, e.g. friendly is an adjective although it ends in -ly, because most processes in languages are regular with a small number of exceptions. This may be the residue of an historical process which was not carried through to completion or may be due to the fact that a form does not match the input necessary for a rule, cf. friendly which cannot take the adverbial ending /-li/. Because native speakers do not experience difficulty in mastering exceptions there is rarely a ‘cleanup’operation in a language to remove exceptions.
    [Show full text]