Exploring Automated Formant Analysis for Comparative Variationist Study of Heritage Cantonese and English
Naomi Cui¹, Minyi Zhu¹, Vina Law¹, Holman Tse² & Naomi Nagy¹
¹University of Toronto, ²University of Pittsburgh

HERITAGE LANGUAGE VARIATION AND CHANGE IN TORONTO
HTTP://PROJECTS.CHASS.UTORONTO.CA/NGN/HLVC

What is the HLVC Project?
• A large-scale project investigating language use and change in heritage (non-official) languages spoken in Toronto.
• Goals
  – To document and describe heritage languages spoken by immigrants and two generations of their descendants
  – To create a corpus available for research on language change
  – To push variationist research beyond its monolingually oriented core by focusing on heritage language use among multilingual speakers
  – To develop a framework for research on heritage languages and contact

A Sample of Previous HLVC Work
Languages: Cantonese, Faetar, Italian, Korean, Russian, Ukrainian
• VOT: ✓ in five of the six languages
• Ø-subject: ✓ in four languages
• Borrowing: ✓ in one language
• Vowels: * this presentation

Vowels
• Vowels are very well researched in sociolinguistics, but there is very little work on vowel variation and change in languages other than English.
• A large body of research has made possible the development of new technologies and techniques that make vowel analysis easier
  – Example: FAVE (Rosenfelder et al. 2011)
[Image from Wikipedia]

Goals of Current Project
• To determine the extent to which the vowel systems of Cantonese and English may be mutually influencing each other in Toronto
• To extend the use of automated forced alignment and formant extraction as tools for the sociolinguistic study of contact-induced change in Heritage Cantonese
  – Prosodylab-Aligner (Gorman et al. 2011) to be adopted

Methodological Problems
• Large amount of data in the HLVC Corpus (~40 speakers/language)
  – Manual formant measurements take a lot of time.
• FAVE is designed to work only on English.
• Could Prosodylab-Aligner be a viable alternative?

[Figure dated 30 Dec. 2007: languages spoken in Toronto, labelled Italian, Chinese, Cantonese, Punjabi, Portuguese, Spanish, Tagalog, Urdu, Tamil, Polish]

Contrasting Demographics
| Language | Mother-tongue speakers (2011 Census) | Ethnic origin (2006 Census) | Est. in Toronto | Speakers come from |
|---|---|---|---|---|
| Cantonese | 170,000+ | 537,000 | 1951 | Hong Kong |
| Italian | 166,000 | 466,000 | 1908 | Calabria |
| Russian | 78,000 | 58,505 | 1916 | St. Petersburg, Moscow |
| Korean | 51,000 | 55,000 | 1967 | Seoul |
| Ukrainian | 26,000 | 122,000 | 1913 | Lviv |
| Faetar | <100? | 300? | 1950 | Faeto, Celle di St. Vito (Apulia, Italy) |
Sources: www40.statcan.ca/l01/cst01/demo12c-eng.htm; www12.statcan.gc.ca/census-recensement/2011/dp-pd/prof/index.cfm?Lang=E

[Map: Lviv, Ukraine, 1913; Western Poland, 1911; Budapest, Hungary, 1885; Faeto, Italy, 1950]

Cantonese vs. English Vowel Space
[Figure: Cantonese and English vowel charts (images from Wikipedia)]
• Cantonese: allophonic lowering of /i/ before velars (Yue-Hashimoto 1972)
• Similar Canadian English vowels:
  – see /si/ vs. Cantonese si1 /si˦/ 詩 ‘poem’
  – sick /sɪk/ vs. Cantonese sik1 [sɪk˦] 識 ‘to know’
• ???

Expected Outcome
[Diagram: 1st and 2nd generations positioned between Heritage Language / Culture and English/Canadian]

Data
• Two sets of hour-long sociolinguistic interviews from two generations of speakers who identify as Hong Kong Chinese and claim Cantonese as a heritage language
  – Not from the same speakers, however.
• Interviews in English from the Contact in the City Corpus (CinC) (Hoffman and Walker 2010)
• Interviews in Cantonese from the HLVC Corpus (Nagy 2009, 2011)
  – Example: “Ngo5 fu6 mou5 yat1 gau2 cat1 yi6 lin4 lei4 dou3 do1 leon1 do1.” (“My parents came to Toronto in 1972.”)

Speaker Sample
| Generation | Sex | Cantonese | English |
|---|---|---|---|
| 1 (ages 42–82) | Male | C1M62A, C1M59A | TO.035, TO.038 |
| 1 (ages 42–82) | Female | C1F78A, C1F54A, C1F82A | TO.030, TO.037, TO.039 |
| 2 (ages 16–44) | Male | C2M44A | TO.029 |
| 2 (ages 16–44) | Female | C2F16A, C2F21B | TO.031, TO.056 |
| Total | | N=8 | N=8 |

Methods – English Data
1. Sentence-level time alignment (manual) using ELAN
2. Word- and phoneme-level time alignment (automated) with FAVE
   • http://fave.ling.upenn.edu

Prosodylab-Aligner (Gorman et al. 2011)
• A Python script used to perform text-to-audio speech alignment
• Supports training on arbitrary data
  – → Given input from language X, it can be trained to handle acoustic data from language X
• Requirements
  – At least one hour of audio in total (.wav files; chunks are OK)
  – Matching .lab files (.txt files readable by Prosodylab-Aligner) for each .wav file
  – A customized dictionary

Methods – Cantonese Data
1. Interviews transcribed in ELAN by native speakers of Cantonese using Jyutping romanization
   – Manual sentence-level alignment
2. To create input readable by Prosodylab-Aligner, a PRAAT script was used to create smaller .wav files with a matching .txt file for each annotation.

PRAAT Script
[Example output chunks:]
• C1F54A_IV_2074.wav, translation: “Because at that time, China was at war.”
• C1F54A_IV_2105.wav, translation: “And then the Communist Party came, and then ...”

Training and Evaluation
• .wav files and matching .lab files are put in a training directory
• Prosodylab-Aligner uses the training directory and the dictionary to build a training model
  – Custom dictionary in the format of the CMU Pronouncing Dictionary
• Prosodylab-Aligner uses the training model to evaluate the same files in the same directory

TextGrid Output of Prosodylab-Aligner
[Figure: TextGrid generated by Prosodylab-Aligner]

Another PRAAT Script: Formant Extraction
• Formant information extracted from the Prosodylab-Aligner-generated TextGrids and matching .wav files using a PRAAT script
• Output: tab-delimited .txt file

Vowel Normalization
• http://ncslaap.lib.ncsu.edu/tools/norm/norm1.php
• Labov ANAE (vowel-extrinsic) method used

Prepping for Rbrul
• Tab-delimited .txt file generated by NORM, with normalized values for vowel formants
• New columns added for variables
• Ready for statistical analysis with Rbrul (Johnson)

Variables of Interest
• External factors
  – Generation
  – Gender
  – Age
• Internal factors
  – Following segment
  – Tone
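The Vowel Normalization step above applies the Labov ANAE method through the NORM web suite. As a minimal sketch of what that method computes (not NORM's own code), the Python below scales every token of a speaker by a single factor, F′ = F · exp(G − S), where S is the mean of ln F1 and ln F2 pooled over all of that speaker's tokens and G is the grand mean of S across speakers. It then averages normalized F1/F2 per generation for one vowel, the kind of summary reported for /i/ in the results. Assumptions: the column names (`speaker`, `generation`, `vowel`, `F1`, `F2`), the speaker IDs and the formant values are hypothetical, and G is estimated from the speakers supplied, whereas the ANAE used a fixed G from its own sample, so the numbers need not match NORM's output.

```python
import csv
import io
import math
from collections import defaultdict
from statistics import mean


def anae_normalize(rows):
    """Sketch of Labov ANAE normalization: F' = F * exp(G - S).

    rows: dicts with at least 'speaker', 'F1', 'F2' (raw formants in Hz).
    Returns copies of the rows with added 'F1_norm' and 'F2_norm'.
    """
    # S: per-speaker mean of ln(F1) and ln(F2), both formants pooled together
    logs = defaultdict(list)
    for r in rows:
        logs[r["speaker"]] += [math.log(float(r["F1"])), math.log(float(r["F2"]))]
    s = {spk: mean(vals) for spk, vals in logs.items()}

    # G: grand mean of the speaker log-means (assumption: estimated from this sample)
    g = mean(s.values())

    out = []
    for r in rows:
        factor = math.exp(g - s[r["speaker"]])  # one scaling factor per speaker
        out.append({**r,
                    "F1_norm": float(r["F1"]) * factor,
                    "F2_norm": float(r["F2"]) * factor})
    return out


def means_by_generation(rows, vowel):
    """Mean normalized F1, F2 and token count for one vowel, per generation."""
    groups = defaultdict(list)
    for r in rows:
        if r["vowel"] == vowel:
            groups[r["generation"]].append(r)
    return {
        gen: (round(mean(t["F1_norm"] for t in toks)),
              round(mean(t["F2_norm"] for t in toks)),
              len(toks))
        for gen, toks in sorted(groups.items())
    }


# Hypothetical tab-delimited formant file (made-up speakers and values)
SAMPLE = (
    "speaker\tgeneration\tvowel\tF1\tF2\n"
    "spkA\t1\ti\t420\t2100\n"
    "spkA\t1\ta\t850\t1400\n"
    "spkB\t2\ti\t390\t2300\n"
    "spkB\t2\ta\t900\t1500\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
print(means_by_generation(anae_normalize(rows), "i"))
```

Because one factor multiplies both formants of every token, a speaker's F2/F1 ratios are unchanged; only that speaker's overall formant scale is shifted toward the group, and after normalization every speaker's pooled log-mean equals G.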
[Figure: Cantonese vowel charts (F2′ horizontal, F1′ vertical). Panels: Hong Kong homeland Cantonese (image from Wikipedia); Toronto Cantonese, 8 speakers pooled, Labov ANAE (speaker-extrinsic) normalization. Vowel labels include I, YU, U, E, O, A, AA, ING, IK.]

Toronto Anglo English vs. CAN English
[Figure: vowel charts (F2 horizontal, F1 vertical). Panels: Toronto Anglo English, based on means from Roeder 2012, Boberg 2008, and Roeder & Jarmasz 2010; Toronto Cantonese heritage English, based on means of 7 speakers. Vowel labels include IY, IH, EH, AE, UW, OW, AH, AA.]

F1 and F2 Means for /i/ in Open Syllables
| Variety | Gen | F1 | F2 | Tokens |
|---|---|---|---|---|
| Cantonese (8 speakers)* | 1 | 439 | 2044 | 3207 |
| | 2 | 423 | 2106 | 857 |
| | All | 435 | 2057 | 4064 |
| CAN English (11 speakers)** | 1 | 454 | 2096 | 1545 |
| | 2 | 434 | 2324 | 2370 |
| | All | 441 | 2234 | 3925 |
| Toronto Anglo English | | 474 | 2011 | |
• Cantonese: Gen 2 has a higher and more fronted /i/.
• CAN English: Gen 2 has a higher and more fronted /i/.
• Anglo English has the lowest /i/.
• *p < 0.05 (Cantonese F1 and F2); **p < 0.01 (CAN English F1 and F2)

Discussion of Results
• Evidence of generational change is clear, with the same general developmental trend in both languages.
  – Raising and fronting of /i/ for Gen 2 in both CAN and CAN ENG
• The relative positions of /i/ and /ɪ/ differ in CAN and ENG.
• No /u/ fronting is observed in CAN, but there is some fronting in CAN ENG.
• How these changes result from contact with English (if that is the case) appears to be quite complex
  – Further research is required to better understand how.
• Note – Tone not considered as a factor – Variaon and change in other vowels not considered – No homeland data available 28 Discussion of Methodology • Without human intervenIon, automacally extracted data creates reasonable vowel plots • A promising avenue for future research on vowel variaon and change in heritage languages • But need to check and compare results with manual formant extracIon 29 Future Work • Assessing accuracy of automated alignment and formant extracIon By aempIng to replicate results using manual methods • Expanding to more vowels and more speakers – 8 speakers for this analysis, ~ 40 CAN speakers in Corpus – Comparing homeland data • Expanding to other heritage languages – Italian, Faetar, Russian, Ukrainian, Korean 30 감사합니다 дякую Grazie molto Спасибо 多謝! gratsiə namuor:ə HLVC RAs: Rick Grimm Paulina Lyskawa Sarah Truong Cameron Abma Dongkeun Han Rosa Mastri Dylan Uscher Vanessa Bertone Natalia Harhaj Timea Molnár Ka-man Wong Ulyana Bila Taisa Hewka Jamie Oh Olivia Yu Rosanna Calla Melania Hrycyna Maria Parascandolo Minyi Zhu Minji Cha Michael Iannozzi Rita Pang Collaborators: Karen Chan Diana Kim Andrew Peters Yoonjung Kang Joanna Chociej Janyce Kim Tiina ReBane Alexei Kochetov Sheila Chung Iryna Kulyk Hoyeon Rim James Walker Tiffany Chung Mariana Kuzela Will Sawkiw Funding: Courtney Clinton Ann Kwon Maksym Shkvorets SSHRC, University of Radu Craioveanu Alex La GamBa Vera Riche~ Smith Toronto, Marco Covi Carmela La Rosa Anna Shalaginova Shevchenko Derek Denis Natalia Lapinskaya KonstanIn Shapoval Foundaon Tonia Djogovic Kris Lee Yi Qing Sim Joyce Fok Nikki Lee Mario So Gao Paolo Frasca Olga Levitski Awet Tekeste Ma Gardner Arash Loi Josephine Tong HTTP://PROJECTS.CHASS.UTORONTO.CA/NGN/HLVC 31 References • Boberg, Charles. 2008. “Regional phoneIc differenIaon in Standard Canadian English.” Journal of English Linguiscs 36/2: 129-154. • Gorman, Kyle, Jonathan Howell & Michael Wagner. (2011). 
Prosodylab-Aligner: A tool for forced alignment of laboratory speech. Proceedings of AcousIcs Week in Canada, QueBec City. • Hoffman, M. F., & Walker, J. A. (2010). Ethnolects and the city: Ethnic orientaon and linguisIc variaon in Toronto English. Language Variaon and Change, 22, 37-67. • LoBanov • Nagy, Naomi. (2009). Heritage Language Variaon and Change in Toronto. hAp:// projects.chass.utoronto.ca/ngn/HLVC. • Roeder, ReBecca. 2012. “The Canadian Shi‚ in Two Ontario CiIes.” Special Issue of World Englishes: Autonomy and Homogeneity in Canadian English 31,4: 478-492. Guest editors Stefan Dollinger and Sandra Clarke. • Roeder, ReBecca and Lidia-Gabriela Jarmasz. 2010. “The Canadian Shi‚ in Toronto.” Revue canadienne de linguisque/Canadian Journal of LinguisIcs 55,3: 387-404.