Exploring automated formant analysis for comparative variationist study of Heritage Cantonese and English
Naomi Cui¹, Minyi Zhu¹, Vina Law¹, Holman Tse² & Naomi Nagy¹
¹University of Toronto ²University of Pittsburgh
HERITAGE LANGUAGE VARIATION AND CHANGE IN TORONTO
HTTP://PROJECTS.CHASS.UTORONTO.CA/NGN/HLVC
What is the HLVC Project?
• Large-scale project investigating language use and change in heritage (non-official) languages spoken in Toronto.
• Goals
  – To document and describe heritage languages spoken by immigrants and 2 generations of their descendants
  – To create a corpus available for research on language change
  – To push variationist research beyond its monolingually-oriented core by focusing on heritage language use among multilingual speakers
  – To develop a framework for research on heritage languages and contact
A Sample of Previous HLVC Work
Cantonese Faetar Italian Korean Russian Ukrainian
VOT ✓ ✓ ✓ ✓ ✓
Ø-subject ✓ ✓ ✓ ✓
Borrowing ✓
Vowels *
* This presentation
Vowels
• Very well researched in sociolinguistics, but very little work on vowel variation and change in languages other than English.
• Large body of research has made possible the development of new technologies/techniques to make vowel analysis easier
  – Example: FAVE (Rosenfelder et al 2011)
Image from Wikipedia
Goals of Current Project
• To determine the extent to which the vowel systems of Cantonese and English may be mutually influencing each other in Toronto
• To extend the use of automated forced alignment and formant extraction as tools for the sociolinguistic study of contact-induced change in Heritage Cantonese.
  – Prosodylab-Aligner (Gorman et al 2011) to be adopted
Methodological Problems
• Large amount of data in HLVC Corpus (~40 speakers/language)
  – Manual formant measurements take a lot of time.
• FAVE designed to work only on English
• Could Prosodylab-Aligner be a viable alternative?
[Figure dated 30 Dec. 2007, with language labels: Italian, Chinese, Cantonese, Punjabi, Portuguese, Spanish, Tagalog, Urdu, Tamil, Polish]
Contrasting demographics

Language   MT speakers (2011 Census)  Ethnic Origin (2006 Census)  Est. in TO  Speakers come from
Cantonese  170,000+                   537,000                      1951        Hong Kong
Italian    166,000                    466,000                      1908        Calabria
Russian    78,000                     58,505                       1916        St. Petersburg, Moscow
Korean     51,000                     55,000                       1967        Seoul
Ukrainian  26,000                     122,000                      1913        Lviv
Faetar     <100?                      300?                         1950        Faeto, Celle di St. Vito (Apulia, Italy)

www40.statcan.ca/l01/cst01/demo12c-eng.htm; www12.statcan.gc.ca/census-recensement/2011/dp-pd/prof/index.cfm?Lang=E
Lviv, Ukraine 1913
Western Poland, 1911
Budapest, Hungary, 1885
Faeto, Italy 1950
Cantonese vs. English Vowel Space
Images from Wikipedia
Allophonic lowering of /i/ before velars (Yue-Hashimoto 1972)
  si1, /si˦/, 詩, ‘poem’
  sik1, [sɪk˦], 識, ‘to know’
Similar Canadian English Vowels
  see, /si/
  sick, /sɪk/
???
Expected outcome
[Diagram: 1st and 2nd generation, between Heritage Language / Culture and English/Canadian]
Data
• Two sets of hour-long sociolinguistic interviews from 2 generations of speakers identified as Hong Kong Chinese and who claim Cantonese as a heritage language
  – Not from the same speakers, however.
Interviews in English from the Contact in the City Corpus (CinC) (Hoffman and Walker 2010)
Interviews in Cantonese from the HLVC Corpus (Nagy 2009, 2011)
“Ngo5 fu6 mou5 yat1 gau2 cat1 yi6 lin4 lei4 dou3 do1 leon1 do1.”
“My parents came to Toronto in 1972.”
Speaker Sample

Generation       Sex     CANTONESE               ENGLISH
1 (Ages: 42-82)  Male    C1M62A, C1M59A          TO.035, TO.038
1 (Ages: 42-82)  Female  C1F78A, C1F54A, C1F82A  TO.030, TO.037, TO.039
2 (Ages: 16-44)  Male    C2M44A                  TO.029
2 (Ages: 16-44)  Female  C2F16A, C2F21B          TO.031, TO.056
Total                    N=8                     N=8
Methods - English Data
1. Sentence-level time alignment (manual) using ELAN
2. Word- and phoneme-level time alignment (automated) with FAVE
  • http://fave.ling.upenn.edu
Prosodylab-Aligner (Gorman et al 2011)
• A Python script used to perform text-to-audio speech alignment
• Supports training on arbitrary data
  – → With any input from language X, can be trained to deal with acoustic data from language X
• Requirements
  – At least a total of one hour of audio (.wav file in chunks OK)
  – Matching .lab files (.txt files readable by Prosodylab-Aligner) for each .wav file
  – A customized dictionary
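The requirements listed above can be checked before any training run. What follows is a minimal sketch, assuming all chunks sit in one directory and that a chunk X.wav pairs with X.lab. The function names are hypothetical helpers written for illustration and are not part of Prosodylab-Aligner.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of a PCM .wav file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_training_dir(directory, min_seconds=3600.0):
    """Summarize whether a directory meets the listed requirements.

    Hypothetical helper: returns (total_seconds, missing_lab, enough_audio),
    where missing_lab lists the .wav files that have no matching .lab file.
    """
    directory = Path(directory)
    total = 0.0
    missing_lab = []
    for wav_path in sorted(directory.glob("*.wav")):
        total += wav_duration_seconds(wav_path)
        if not wav_path.with_suffix(".lab").exists():
            missing_lab.append(wav_path.name)
    return total, missing_lab, total >= min_seconds
```

The one-hour threshold is exposed as a parameter so the same check can be reused on a small pilot set.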
Methods – Cantonese Data
1. Interviews transcribed by native speakers of Cantonese using Jyutping Romanization in ELAN
  – Manual sentence-level alignment
2. To create input readable by Prosodylab-Aligner, PRAAT script used to create smaller .wav files with matching .txt files for each annotation.
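The chunking in step 2 was done with a PRAAT script that is not reproduced on the slides. As an illustration only, the same idea can be sketched in Python with the standard library, assuming the sentence-level annotations are already available as (start, end, transcript) tuples in seconds. The function name, the numbering scheme for output files, and the default .lab extension are assumptions, not the project's actual conventions.

```python
import wave
from pathlib import Path


def split_by_annotations(wav_path, annotations, out_dir, prefix, text_ext=".lab"):
    """Write one small .wav and one matching text file per annotation.

    annotations: iterable of (start_sec, end_sec, transcript) tuples.
    Returns the list of .wav paths written (hypothetical naming: prefix_0001.wav, ...).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(wav_path), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        for index, (start, end, text) in enumerate(annotations, start=1):
            first = int(round(start * rate))
            count = max(0, int(round(end * rate)) - first)
            src.setpos(first)                      # jump to the annotation start
            frames = src.readframes(count)         # read only this annotation
            chunk_path = out_dir / f"{prefix}_{index:04d}.wav"
            with wave.open(str(chunk_path), "wb") as dst:
                dst.setparams(params)              # frame count is corrected on close
                dst.writeframes(frames)
            chunk_path.with_suffix(text_ext).write_text(text + "\n", encoding="utf-8")
            written.append(chunk_path)
    return written
```

Each output pair shares a base name, which is what lets an aligner match audio to transcript.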
PRAAT Script
C1F54A_IV_2074.wav
Translation: “Because at that time, China was at war.”
C1F54A_IV_2105.wav
Translation: “And then the Communist Party came, and then ...”
Training and Evaluation
• .wav files and matching .lab files put in a Training directory
• Prosodylab-Aligner uses Training directory and dictionary to build a training model
  – Custom dictionary in the format of The CMU Pronouncing Dictionary
• Prosodylab-Aligner uses training model to evaluate the same files in the same directory
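A CMU-style dictionary lists one word per line followed by its space-separated phone labels. The slides do not show the project's phone inventory, so the sketch below is purely illustrative: it assumes each Jyutping syllable is split into an onset label and a rime label, with the tone digit kept in the word but dropped from the phones. Both function names and the resulting phone set are hypothetical.

```python
# Jyutping initials, multi-letter ones first so "ng", "gw", "kw" win over "n", "g", "k".
INITIALS = ("ng", "gw", "kw", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "w", "z", "c", "s", "j")


def syllable_to_phones(syllable):
    """Split a Jyutping syllable such as 'sik1' into [onset, rime] labels."""
    base = syllable.rstrip("0123456789").lower()   # drop the tone digit
    if base in ("m", "ng"):                        # syllabic nasals have no rime
        return [base.upper()]
    for initial in INITIALS:
        if base.startswith(initial) and len(base) > len(initial):
            return [initial.upper(), base[len(initial):].upper()]
    return [base.upper()]                          # onsetless syllable


def dictionary_lines(syllables):
    """Return sorted, de-duplicated 'WORD  PHONE PHONE' lines."""
    entries = {s.upper(): syllable_to_phones(s) for s in syllables}
    return [f"{word}  {' '.join(phones)}" for word, phones in sorted(entries.items())]
```

For example, `dictionary_lines(["sik1", "si1"])` yields the lines `SI1  S I` and `SIK1  S IK`. Whatever inventory is actually used, every token in the .lab files needs an entry in the dictionary.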
Textgrid Output of Prosodylab-Aligner
Another PRAAT script: formant extraction
• Formant information extracted from Prosodylab-Aligner-generated Textgrids and matching .wav files using PRAAT script
• Output: Tab-delimited .txt file
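Once the tab-delimited output exists, a quick per-vowel summary is a useful sanity check before normalization. The slides do not show the file's column layout, so this sketch assumes a header row with hypothetical column names "vowel", "F1" and "F2".

```python
import csv
from collections import defaultdict
from statistics import mean


def vowel_means(tsv_path):
    """Return {vowel: (mean F1, mean F2, token count)} from a tab-delimited file.

    Assumes hypothetical header names 'vowel', 'F1', 'F2'.
    """
    tokens = defaultdict(list)
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            tokens[row["vowel"]].append((float(row["F1"]), float(row["F2"])))
    return {
        vowel: (mean(f1 for f1, _ in vals), mean(f2 for _, f2 in vals), len(vals))
        for vowel, vals in tokens.items()
    }
```

Implausible means or very low token counts for a vowel would point to alignment or measurement problems worth checking by hand.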
Vowel Normalization
• http://ncslaap.lib.ncsu.edu/tools/norm/norm1.php
• Labov ANAE (Vowel Extrinsic) method used
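The ANAE approach is a log-mean method: each speaker's formants are multiplied by a scaling factor exp(G − S), where S is the speaker's mean log formant value and G is the corresponding grand mean over speakers. The sketch below illustrates that idea under stated assumptions: it pools ln(F1) and ln(F2) over all tokens to obtain G. The NORM site may compute or fix G differently, so this is not a reimplementation of that tool, and the function name is hypothetical.

```python
import math
from collections import defaultdict


def anae_normalize(tokens):
    """Log-mean (ANAE-style) normalization.

    tokens: list of (speaker, F1, F2) tuples in Hz.
    Returns a list of (speaker, F1', F2') with each speaker scaled by exp(G - S).
    """
    logs_by_speaker = defaultdict(list)
    for speaker, f1, f2 in tokens:
        logs_by_speaker[speaker].extend((math.log(f1), math.log(f2)))
    all_logs = [v for logs in logs_by_speaker.values() for v in logs]
    grand_mean = sum(all_logs) / len(all_logs)                 # G (pooled, an assumption)
    factor = {
        spk: math.exp(grand_mean - sum(logs) / len(logs))      # exp(G - S)
        for spk, logs in logs_by_speaker.items()
    }
    return [(spk, f1 * factor[spk], f2 * factor[spk]) for spk, f1, f2 in tokens]
```

Because every formant of a speaker is multiplied by the same factor, two speakers whose vowel spaces differ only by a uniform scaling end up with identical normalized values.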
Prepping for Rbrul
• Tab-delimited .txt file generated by NORM with normalized values for vowel formants
• New columns added for variables
• Ready for statistical analysis with Rbrul (Johnson)
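Some of the added columns can be derived from the speaker codes themselves. The sketch below assumes that a code such as C1F78A encodes language, generation, sex and age in that order, followed by a disambiguating letter. That reading is inferred from the speaker sample and is not stated on the slides. The function and column names are hypothetical.

```python
import re

# Assumed layout: language letter, generation digit, sex (M/F), age, suffix letter.
SPEAKER_CODE = re.compile(
    r"^(?P<lang>[A-Z])(?P<gen>\d)(?P<sex>[MF])(?P<age>\d+)(?P<suffix>[A-Z])$"
)


def speaker_factors(code):
    """Return generation, sex and age parsed from a code like 'C1F78A'."""
    match = SPEAKER_CODE.match(code)
    if match is None:
        raise ValueError(f"unrecognized speaker code: {code!r}")
    return {
        "Generation": int(match["gen"]),
        "Sex": match["sex"],
        "Age": int(match["age"]),
    }


def add_factor_columns(rows):
    """Return copies of the row dicts with the parsed factor columns added.

    Each row is expected to carry a hypothetical 'speaker' key.
    """
    return [{**row, **speaker_factors(row["speaker"])} for row in rows]
```

Codes in a different format, such as the TO.035 style used for the English data, raise an error rather than being silently mis-parsed, so their factors would need to come from a separate lookup.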
Variables of Interest
• External Factors
  – Generation
  – Gender
  – Age
• Internal Factors
  – Following Segment
  – Tone
Cantonese Vowel Charts
[Figure: vowel plots (F1’ by F2’) for Toronto CAN (8 speakers), Labov ANAE (speaker extrinsic), all speakers pooled, alongside a Hong Kong Homeland CAN chart (image from Wikipedia). Plotted vowels: YU, I, ING, IK, U, E, O, A, AA.]
Toronto Anglo ENG vs CAN ENG
[Figure: vowel plots (F1 by F2). Toronto Anglo English (AVG), based on means from Roeder 2012, Boberg 2008, Roeder & Jarmasz 2010; Toronto CAN Heritage English (AllSpkrs), based on means of 7 speakers. Plotted vowels: UW, IY, IH, OW, EH, AH, AE, AA.]
F1 and F2 Means for /i/ in open syllables
[Diagram: 1st and 2nd generation, between Heritage Language / Culture and English/Canadian]
Cantonese (8 speakers)
Gen  F1*  F2*   Tokens
1    439  2044  3207
2    423  2106  857
All  435  2057  4064
• Gen 2 has higher and more fronted /i/

CAN English (11 speakers)
Gen  F1**  F2**  Tokens
1    454   2096  1545
2    434   2324  2370
All  441   2234  3925
• Gen 2 has higher and more fronted /i/

Toronto Anglo English
F1   F2
474  2011
• Anglo English has the lowest /i/.

• *p < 0.05
• **p < 0.01
Discussion of Results
• Evidence of generational change is clear, with the same general developmental trend in both languages.
  – Raising and fronting of /i/ for Gen 2 in both CAN and CAN ENG
• Relative positions of /i/ and /ɪ/ are different in CAN and ENG.
• Lack of /u/ fronting in CAN observed, but some fronting in CAN ENG
• How these changes result from contact with English (if that is the case) appears to be quite complex; further research is required to better understand how.
• Note
  – Tone not considered as a factor
  – Variation and change in other vowels not considered
  – No homeland data available
Discussion of Methodology
• Without human intervention, automatically extracted data creates reasonable vowel plots
• A promising avenue for future research on vowel variation and change in heritage languages
• But need to check and compare results with manual formant extraction
Future Work
• Assessing accuracy of automated alignment and formant extraction by attempting to replicate results using manual methods
• Expanding to more vowels and more speakers
  – 8 speakers for this analysis, ~ 40 CAN speakers in Corpus
  – Comparing homeland data
• Expanding to other heritage languages
  – Italian, Faetar, Russian, Ukrainian, Korean
감사합니다 дякую Grazie molto Спасибо 多謝! gratsiə namuor:ə
HLVC RAs: Cameron Abma, Vanessa Bertone, Ulyana Bila, Rosanna Calla, Minji Cha, Karen Chan, Joanna Chociej, Sheila Chung, Tiffany Chung, Courtney Clinton, Radu Craioveanu, Marco Covi, Derek Denis, Tonia Djogovic, Joyce Fok, Paolo Frasca, Matt Gardner, Rick Grimm, Dongkeun Han, Natalia Harhaj, Taisa Hewka, Melania Hrycyna, Michael Iannozzi, Diana Kim, Janyce Kim, Iryna Kulyk, Mariana Kuzela, Ann Kwon, Alex La Gamba, Carmela La Rosa, Natalia Lapinskaya, Kris Lee, Nikki Lee, Olga Levitski, Arash Lotfi, Paulina Lyskawa, Rosa Mastri, Timea Molnár, Jamie Oh, Maria Parascandolo, Rita Pang, Andrew Peters, Tiina Rebane, Hoyeon Rim, Will Sawkiw, Maksym Shkvorets, Vera Riche Smith, Anna Shalaginova, Konstantin Shapoval, Yi Qing Sim, Mario So Gao, Awet Tekeste, Josephine Tong, Sarah Truong, Dylan Uscher, Ka-man Wong, Olivia Yu, Minyi Zhu
Collaborators: Yoonjung Kang, Alexei Kochetov, James Walker
Funding: SSHRC, University of Toronto, Shevchenko Foundation
References
• Boberg, Charles. (2008). “Regional phonetic differentiation in Standard Canadian English.” Journal of English Linguistics 36/2: 129-154.
• Gorman, Kyle, Jonathan Howell & Michael Wagner. (2011). Prosodylab-Aligner: A tool for forced alignment of laboratory speech. Proceedings of Acoustics Week in Canada, Quebec City.
• Hoffman, M. F., & Walker, J. A. (2010). Ethnolects and the city: Ethnic orientation and linguistic variation in Toronto English. Language Variation and Change, 22, 37-67.
• Lobanov
• Nagy, Naomi. (2009). Heritage Language Variation and Change in Toronto. http://projects.chass.utoronto.ca/ngn/HLVC.
• Roeder, Rebecca. (2012). “The Canadian Shift in Two Ontario Cities.” Special Issue of World Englishes: Autonomy and Homogeneity in Canadian English 31,4: 478-492. Guest editors Stefan Dollinger and Sandra Clarke.
• Roeder, Rebecca and Lidia-Gabriela Jarmasz. (2010). “The Canadian Shift in Toronto.” Revue canadienne de linguistique/Canadian Journal of Linguistics 55,3: 387-404.
• Rosenfelder, Ingrid; Fruehwald, Joe; Evanini, Keelan and Jiahong Yuan. (2011). FAVE (Forced Alignment and Vowel Extraction) Program Suite. http://fave.ling.upenn.edu.
• Wittenburg, Peter, H. Brugman, Albert Russel, A. Klassmann, and Han Sloetjes. (2006). ELAN: a Professional Framework for Multimodality Research. Proceedings of LREC 2006, Fifth International Conference on Language Resources and Evaluation.
• Yue Hashimoto, Oi-kan. (1972). Phonology of Cantonese. Cambridge University Press.
HERITAGE LANGUAGE VARIATION AND CHANGE IN TORONTO
HTTP://PROJECTS.CHASS.UTORONTO.CA/NGN/HLVC
EMAIL: [email protected]
FOR TODAY’S SLIDES: EMAIL: [email protected]