Exploring automated formant analysis for comparative variationist study of Heritage and English

Naomi Cui¹, Minyi Zhu¹, Vina Law¹, Holman Tse² & Naomi Nagy¹
¹University of Toronto ²University of Pittsburgh

HERITAGE LANGUAGE VARIATION AND CHANGE IN TORONTO
HTTP://PROJECTS.CHASS.UTORONTO.CA/NGN/HLVC

What is the HLVC Project?

• Large-scale project investigating language use and change in heritage (non-official) languages spoken in Toronto.
• Goals
  – To document and describe heritage languages spoken by immigrants and two generations of their descendants
  – To create a corpus available for research on language change
  – To push variationist research beyond its monolingually oriented core by focusing on heritage language use among multilingual speakers
  – To develop a framework for research on heritage languages and contact

A Sample of Previous HLVC Work

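Variables studied so far across the six HLVC languages (Cantonese, Faetar, Italian, Korean, Russian, Ukrainian):
• VOT: 5 languages
• Ø-subject: 4 languages
• Borrowing: 1 language
• Vowels: this presentation (Cantonese)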

Vowels

• Vowels are very well researched in sociolinguistics, but there has been very little work on vowel variation and change in languages other than English.
• This large body of research has made possible the development of new technologies and techniques that make vowel analysis easier
  – Example: FAVE (Rosenfelder et al. 2011)


Goals of Current Project

• To determine the extent to which the vowel systems of Cantonese and English may be mutually influencing each other in Toronto
• To extend the use of automated forced alignment and formant extraction as tools for the sociolinguistic study of contact-induced change in Heritage Cantonese
  – Tool adopted: Prosodylab-Aligner (Gorman et al. 2011)

Methodological Problems

• Large amount of data in the HLVC Corpus (~40 speakers per language)
  – Manual formant measurement takes a lot of time.
• FAVE is designed to work only on English.
• Could Prosodylab-Aligner be a viable alternative?

[Image dated 30 Dec. 2007 naming the languages Italian, Chinese, Cantonese, Punjabi, Portuguese, Spanish, Tagalog, Urdu, Tamil, Polish]

Contrasting demographics

Language | MT speakers (2011 Census) | Ethnic origin (2006 Census) | Est. in TO | Speakers come from
Cantonese | 170,000+ | 537,000 | 1951 |
Italian | 166,000 | 466,000 | 1908 | Calabria
Russian | 78,000 | 58,505 | 1916 | St. Petersburg, Moscow
Korean | 51,000 | 55,000 | 1967 | Seoul
Ukrainian | 26,000 | 122,000 | 1913 | Lviv
Faetar | <100? | 300? | 1950 | Faeto, Celle di St. Vito (Apulia, Italy)

(MT = mother tongue; TO = Toronto)

Sources: www40.statcan.ca/l01/cst01/demo12c-eng.htm; www12.statcan.gc.ca/census-recensement/2011/dp-pd/prof/index.cfm?Lang=E

[Photos: Lviv, Ukraine, 1913; Western Poland, 1911; Budapest, Hungary, 1885; Faeto, Italy, 1950]

Cantonese vs. English Vowel Space

Images from Wikipedia

Allophonic lowering of /i/ before velars (Yue-Hashimoto 1972), with similar Canadian English vowels:
• see /si/ ~ Cantonese si1 /si˦/ 詩 ‘poem’
• sick /sɪk/ ~ Cantonese sik1 [sɪk˦] 識 ‘to know’
???

Expected outcome

[Diagram: 1st and 2nd generation speakers placed on a continuum between Heritage Language / Culture and English/Canadian]

Data
• Two sets of hour-long sociolinguistic interviews with 2 generations of speakers who identify as Hong Kong Chinese and claim Cantonese as a heritage language
  – The two sets do not come from the same speakers, however.

• Interviews in English from the Contact in the City Corpus (CinC) (Hoffman and Walker 2010)
• Interviews in Cantonese from the HLVC Corpus (Nagy 2009, 2011)

“Ngo5 fu6 mou5 yat1 gau2 cat1 yi6 lin4 lei4 dou3 do1 leon1 do1.”
“My parents came to Toronto in 1972.”

Speaker Sample

Generation | Sex | CANTONESE | ENGLISH
1 (ages 42–82) | Male | C1M62A, C1M59A | TO.035, TO.038
1 (ages 42–82) | Female | C1F78A, C1F54A, C1F82A | TO.030, TO.037, TO.039
2 (ages 16–44) | Male | C2M44A | TO.029
2 (ages 16–44) | Female | C2F16A, C2F21B | TO.031, TO.056
Total | | N=8 | N=8

Methods – English Data

1. Sentence-level time alignment (manual) using ELAN

2. Word- and phoneme-level time alignment (automated) with FAVE (http://fave.ling.upenn.edu)
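Step 2 needs the ELAN sentence boundaries in a form FAVE can read. Below is a minimal sketch of that hand-off. It assumes FAVE-align's five-column tab-delimited transcript layout (speaker ID, speaker name, start time, end time, text), which should be checked against the FAVE documentation. It also assumes the ELAN annotations have already been exported as tuples. The function name to_fave_transcript is hypothetical.

```python
def to_fave_transcript(annotations):
    """Render (speaker_id, speaker_name, start_s, end_s, text) tuples as tab-delimited rows.

    Assumed layout (check against FAVE docs): five tab-separated columns per
    line, one line per sentence-level annotation exported from ELAN.
    """
    lines = []
    for speaker_id, speaker_name, start, end, text in annotations:
        if end <= start:
            raise ValueError(f"annotation ends before it starts: {text!r}")
        # Collapse tabs/newlines inside the text so every row keeps exactly five columns.
        clean = " ".join(text.split())
        lines.append("\t".join([speaker_id, speaker_name, f"{start:.3f}", f"{end:.3f}", clean]))
    return "\n".join(lines) + "\n"
```

Times are written with millisecond resolution. Rejecting zero-length annotations early is cheaper than discovering them as alignment failures later.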

Prosodylab-Aligner (Gorman et al. 2011)

• A Python script that performs forced alignment of transcribed text to speech audio
• Supports training on arbitrary data
  – i.e., given input from language X, it can be trained to align acoustic data from language X
• Requirements
  – At least one hour of audio in total (.wav files in chunks are OK)
  – A matching .lab file (a plain-text transcription readable by Prosodylab-Aligner) for each .wav file
  – A customized dictionary
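The three requirements above can be checked before training starts. Below is a minimal sketch. It assumes a flat Training directory of .wav/.lab pairs sharing a base name, and a CMU-style dictionary file whose headwords are matched case-insensitively. check_training_dir and read_dictionary_words are hypothetical helpers, not part of Prosodylab-Aligner.

```python
import wave
from pathlib import Path

MIN_SECONDS = 3600.0  # "at least a total of one hour of audio"


def wav_seconds(path):
    """Duration of a PCM .wav file in seconds."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def read_dictionary_words(path):
    """Headwords of a CMU-style dictionary: first token of each non-comment line, upper-cased."""
    words = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.strip() and not line.startswith(";;;"):
            words.add(line.split()[0].upper())
    return words


def check_training_dir(directory, dictionary_words):
    """Return (total_seconds, wavs_missing_lab, out_of_dictionary_words)."""
    total = 0.0
    missing_lab = []
    oov = set()
    for wav in sorted(Path(directory).glob("*.wav")):
        total += wav_seconds(wav)
        lab = wav.with_suffix(".lab")  # matching transcription, same base name
        if not lab.exists():
            missing_lab.append(wav.name)
            continue
        for word in lab.read_text(encoding="utf-8").split():
            if word.upper() not in dictionary_words:
                oov.add(word)
    return total, missing_lab, sorted(oov)
```

Each result points to an unmet requirement: a total below MIN_SECONDS, a non-empty missing-.lab list, or any out-of-dictionary word.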

Methods – Cantonese Data
1. Interviews transcribed in ELAN by native speakers of Cantonese, using Romanization
   – Manual sentence-level alignment

2. To create input readable by Prosodylab-Aligner, a PRAAT script was used to create smaller .wav files, each with a matching .txt file, one per annotation.
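The slides do not reproduce the PRAAT script. Below is a rough Python equivalent using the standard wave module. It assumes the sentence-level ELAN annotations have been exported as (start, end, text) tuples in seconds. The naming scheme only imitates the example file names (e.g. C1F54A_IV_2074.wav). Numbering annotations in order is a guess at what the number means. Transcriptions are written with the .lab extension listed in the requirements above. cut_annotations is a hypothetical name.

```python
import wave
from pathlib import Path


def cut_annotations(wav_path, annotations, out_dir, speaker):
    """Write one .wav/.lab pair per (start_s, end_s, text) annotation; return the base names."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    names = []
    with wave.open(str(wav_path), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        for index, (start, end, text) in enumerate(annotations, start=1):
            first = int(round(start * rate))
            last = min(int(round(end * rate)), src.getnframes())
            if last <= first or not text.strip():
                continue  # skip empty or zero-length annotations
            src.setpos(first)
            frames = src.readframes(last - first)
            name = f"{speaker}_IV_{index}"  # hypothetical naming scheme
            with wave.open(str(out_dir / f"{name}.wav"), "wb") as dst:
                dst.setparams(params)  # same channels/sample width/rate; frame count is patched on close
                dst.writeframes(frames)
            (out_dir / f"{name}.lab").write_text(text.strip() + "\n", encoding="utf-8")
            names.append(name)
    return names
```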

PRAAT Script

C1F54A_IV_2074.wav
Translation: “Because at that time, was at war.”

C1F54A_IV_2105.wav
Translation: “And then the Communist Party came, and then ...”

Training and Evaluation

• .wav files and matching .lab files are placed in a Training directory

• Prosodylab-Aligner uses the Training directory and the dictionary to build a trained model
  – Custom dictionary in the format of the CMU Pronouncing Dictionary
• Prosodylab-Aligner uses the trained model to align (evaluate) the same files in the same directory
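The slides do not show how the custom dictionary was built. Below is one possible sketch. It assumes the .lab files hold space-separated romanized syllables and uses a hypothetical phone inventory: each syllable is split into onset, nucleus and coda, and the tone digit is dropped. The onset and coda lists and the function names are illustrative, not the project's. Output lines follow the CMU dictionary layout: a headword, two spaces, then space-separated phones.

```python
# Hypothetical segmentation of romanized Cantonese syllables (longest match first).
SYLLABIC_NASALS = {"m", "ng"}
ONSETS = ("gw", "kw", "ng", "b", "p", "m", "f", "d", "t", "n", "l",
          "g", "k", "h", "z", "c", "s", "j", "w", "y")
CODAS = ("ng", "p", "t", "k", "m", "n")


def pronounce(syllable):
    """Split a syllable such as 'sik1' into phones ['S', 'I', 'K']; the tone digit is dropped."""
    base = syllable.lower().rstrip("0123456789")
    if base in SYLLABIC_NASALS:
        return [base.upper()]
    onset = next((o for o in ONSETS if base.startswith(o) and len(base) > len(o)), "")
    rime = base[len(onset):]
    coda = next((c for c in CODAS if rime.endswith(c) and len(rime) > len(c)), "")
    nucleus = rime[: len(rime) - len(coda)]
    return [p.upper() for p in (onset, nucleus, coda) if p]


def build_dictionary(lab_texts):
    """Return CMU-style lines 'WORD  PH1 PH2 ...', one per distinct word, sorted."""
    words = {w.upper() for text in lab_texts for w in text.split()}
    return [f"{w}  {' '.join(pronounce(w))}" for w in sorted(words)]
```

Dropping the tone digit pools all tones under one vowel label, which gives each phone more training tokens. The alternative is to append the digit to the nucleus, much as the CMU dictionary appends stress digits to vowels; that would let tone be read back from the alignment.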

TextGrid Output of Prosodylab-Aligner

Another PRAAT script: formant extraction
• Formant information extracted from the Prosodylab-Aligner-generated TextGrids and matching .wav files using a PRAAT script
• Output: tab-delimited .txt file
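Before formants can be measured, each vowel interval in the aligner's TextGrid has to be located and a measurement point chosen. Below is a minimal standard-library sketch. It assumes a long-format (ooTextFile) TextGrid with an interval tier named "phones" (the tier name is an assumption) and measures at the interval midpoint. The formant values themselves would still come from PRAAT at those times. read_tiers and vowel_midpoints are hypothetical names.

```python
import re

# One interval entry of a long-format TextGrid: xmin, xmax, then a quoted label.
INTERVAL = re.compile(r'xmin = ([-\d.eE+]+)\s+xmax = ([-\d.eE+]+)\s+text = "([^"]*)"')


def read_tiers(textgrid_text):
    """Parse a long-format TextGrid into {tier_name: [(start, end, label), ...]}."""
    tiers = {}
    # Split on "item [1]:", "item [2]:", ...; chunk 0 is the file header.
    for chunk in re.split(r"item \[\d+\]:", textgrid_text)[1:]:
        name = re.search(r'name = "([^"]*)"', chunk)
        if name is None:
            continue
        tiers[name.group(1)] = [(float(a), float(b), label)
                                for a, b, label in INTERVAL.findall(chunk)]
    return tiers


def vowel_midpoints(textgrid_text, vowels, tier="phones"):
    """Return (label, start, end, midpoint) for every interval whose label is in `vowels`."""
    rows = []
    for start, end, label in read_tiers(textgrid_text).get(tier, []):
        if label in vowels:
            rows.append((label, start, end, (start + end) / 2))
    return rows
```

The parser is deliberately minimal. It ignores point tiers and does not handle labels containing escaped double quotes.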

Vowel Normalization

• NORM: http://ncslaap.lib.ncsu.edu/tools/norm/norm1.php
• Labov ANAE (vowel extrinsic) method used
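The Labov ANAE method rescales all of a speaker's formants by a single factor computed from that speaker's log formant mean, so within-speaker ratios are preserved. Below is a minimal sketch. It assumes natural logarithms over F1 and F2 only, and the group constant G = 6.896874 reported for the ANAE (Telsur) speakers in the NORM documentation. Verify both against NORM before relying on this. anae_normalize is a hypothetical name.

```python
import math
from collections import defaultdict

G_ANAE = 6.896874  # assumed group log-mean (ANAE Telsur value, as reported by NORM)


def anae_normalize(tokens):
    """tokens: list of (speaker, F1_Hz, F2_Hz). Returns list of (speaker, F1', F2').

    For each speaker, S = mean of ln(F1) and ln(F2) over all tokens;
    scaling factor = exp(G - S); F' = F * factor.
    """
    logs = defaultdict(list)
    for speaker, f1, f2 in tokens:
        logs[speaker].extend((math.log(f1), math.log(f2)))
    factor = {s: math.exp(G_ANAE - sum(v) / len(v)) for s, v in logs.items()}
    return [(s, f1 * factor[s], f2 * factor[s]) for s, f1, f2 in tokens]
```

After normalization, every speaker's log-mean equals G. This is what makes the F1'/F2' values of different speakers comparable on one chart.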

Prepping for Rbrul

• Tab-delimited .txt file generated by NORM, with normalized values for vowel formants
• New columns added for the variables
• Ready for statistical analysis with Rbrul (Johnson)
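The Cantonese speaker codes (e.g. C1F54A, C2M44A) appear to encode generation, sex and age, consistent with the generation groups and age ranges in the speaker table. Below is a minimal sketch of adding those columns to the NORM output. Both that reading of the codes and the input column name "speaker" are assumptions. add_social_factors and add_factors_to_tsv are hypothetical names.

```python
import csv
import re

# Hypothetical reading of codes like C1F54A: C + generation + sex + age + suffix letter.
CODE = re.compile(r"^C(?P<gen>\d)(?P<sex>[MF])(?P<age>\d+)[A-Z]$")


def add_social_factors(rows, speaker_col="speaker"):
    """Return copies of dict rows with generation, sex and age columns added."""
    out = []
    for row in rows:
        m = CODE.match(row[speaker_col])
        if m is None:
            raise ValueError(f"unrecognized speaker code: {row[speaker_col]}")
        new = dict(row)
        new["generation"] = m["gen"]
        new["sex"] = "male" if m["sex"] == "M" else "female"
        new["age"] = m["age"]
        out.append(new)
    return out


def add_factors_to_tsv(src, dst, speaker_col="speaker"):
    """Read a tab-delimited file object, append the factor columns, write tab-delimited output."""
    reader = csv.DictReader(src, delimiter="\t")
    rows = add_social_factors(reader, speaker_col)
    fields = list(reader.fieldnames) + ["generation", "sex", "age"]
    writer = csv.DictWriter(dst, fieldnames=fields, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
```

Failing loudly on an unrecognized code keeps mislabeled rows out of the statistical analysis.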

Variables of Interest

• External factors
  – Generation
  – Gender
  – Age
• Internal factors
  – Following segment
  – Tone
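The two internal factors can be coded from the aligned output by looking at the next phone and at the syllable's final digit. Below is a sketch. It assumes uppercase phone labels such as S, I, K and hypothetical pause labels "sil" and "sp". The velar set is what the /i/-lowering-before-velars hypothesis needs. The function names are hypothetical.

```python
VELARS = {"K", "NG"}        # hypothetical labels for velar phones
PAUSES = {"", "sil", "sp"}  # hypothetical silence/pause labels


def following_segment(phones, index):
    """Code the segment after phones[index] as 'velar', 'other', or 'pause'."""
    nxt = phones[index + 1] if index + 1 < len(phones) else ""
    if nxt in PAUSES:
        return "pause"
    return "velar" if nxt in VELARS else "other"


def tone_of(syllable):
    """Return the tone digit ending a romanized syllable, or 'NA' if there is none."""
    return syllable[-1] if syllable[-1:].isdigit() else "NA"
```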

Cantonese Vowel Charts

[Figure: Cantonese vowel charts plotting normalized F2' (horizontal) against F1' (vertical). Left: Toronto CAN (8 speakers, all speakers pooled), Labov ANAE (speaker extrinsic) normalization; vowels I, YU, U, E, O, A, AA, plus ING and IK. Right: Hong Kong homeland CAN (image from Wikipedia).]

Toronto Anglo ENG vs CAN ENG

[Figure: vowel charts plotting F2 (horizontal) against F1 (vertical); vowels plotted include IY, IH, EH, AE, AA, AH, OW, UW. Left: Toronto Anglo English (AVG), based on means from Roeder 2012, Boberg 2008, and Roeder & Jarmasz 2010. Right: Toronto CAN heritage English (AllSpkrs), based on means of 7 speakers.]

F1 and F2 Means for /i/ in Open Syllables

Cantonese (8 speakers)
Gen | F1* | F2* | Tokens
1 | 439 | 2044 | 3207
2 | 423 | 2106 | 857
All | 435 | 2057 | 4064

CAN English (11 speakers)
Gen | F1** | F2** | Tokens
1 | 454 | 2096 | 1545
2 | 434 | 2324 | 2370
All | 441 | 2234 | 3925

Toronto Anglo English: F1 474, F2 2011

• Cantonese: Gen 2 has a higher and more fronted /i/.
• CAN English: Gen 2 has a higher and more fronted /i/.
• Anglo English has the lowest /i/ (highest F1).
* p < 0.05; ** p < 0.01
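The direction statements under the tables rest on two conventions: a lower F1 means a higher vowel, and a higher F2 means a fronter one. Below is a small sketch that checks them. It also pools the generation rows into a token-weighted mean. The weighting is an assumption about how the All rows were computed; for the Cantonese rows it reproduces the All row to within one Hz. pooled and describe_shift are hypothetical names.

```python
def pooled(rows):
    """rows: list of (F1, F2, tokens). Return token-weighted mean F1, F2 and total tokens."""
    n = sum(t for _, _, t in rows)
    f1 = sum(a * t for a, _, t in rows) / n
    f2 = sum(b * t for _, b, t in rows) / n
    return f1, f2, n


def describe_shift(gen1, gen2):
    """Compare (F1, F2) means: lower F1 = higher vowel, higher F2 = fronter vowel."""
    d1, d2 = gen2[0] - gen1[0], gen2[1] - gen1[1]
    height = "higher" if d1 < 0 else "lower" if d1 > 0 else "same height"
    backness = "fronter" if d2 > 0 else "backer" if d2 < 0 else "same backness"
    return height, backness
```

For the Cantonese rows, pooled([(439, 2044, 3207), (423, 2106, 857)]) gives about (435.6, 2057.1, 4064). describe_shift returns ("higher", "fronter") for both the Cantonese and the CAN English generation means.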

Discussion of Results

• Evidence of generational change is clear, with the same general developmental trend in both languages
  – Raising and fronting of /i/ for Gen 2 in both CAN and CAN ENG
• The relative positions of /i/ and /ɪ/ differ between CAN and ENG.
• /u/ fronting is not observed in CAN, but there is some fronting in CAN ENG.
• How these changes result from contact with English (if that is the case) appears to be quite complex; further research is required to better understand how.
• Note
  – Tone not considered as a factor
  – Variation and change in other vowels not considered
  – No homeland data available

Discussion of Methodology

• Without human intervention, automatically extracted data produce reasonable vowel plots
• A promising avenue for future research on vowel variation and change in heritage languages
• But results need to be checked against manual formant extraction

Future Work

• Assessing the accuracy of automated alignment and formant extraction by attempting to replicate the results with manual methods
• Expanding to more vowels and more speakers
  – 8 speakers in this analysis; ~40 CAN speakers in the corpus
  – Comparing with homeland data
• Expanding to other heritage languages
  – Italian, Faetar, Russian, Ukrainian, Korean

Thank you: 감사합니다 (Korean), дякую (Ukrainian), Grazie molto (Italian), Спасибо (Russian), 多謝! (Cantonese), gratsiə namuor:ə (Faetar)

HLVC RAs: Rick Grimm, Paulina Lyskawa, Sarah Truong, Cameron Abma, Dongkeun Han, Rosa Mastri, Dylan Uscher, Vanessa Bertone, Natalia Harhaj, Timea Molnár, Ka-man Wong, Ulyana Bila, Taisa Hewka, Jamie Oh, Olivia Yu, Rosanna Calla, Melania Hrycyna, Maria Parascandolo, Minyi Zhu, Minji Cha, Michael Iannozzi, Rita Pang, Karen Chan, Diana Kim, Andrew Peters, Joanna Chociej, Janyce Kim, Tiina Rebane, Sheila Chung, Iryna Kulyk, Hoyeon Rim, Tiffany Chung, Mariana Kuzela, Will Sawkiw, Courtney Clinton, Ann Kwon, Maksym Shkvorets, Radu Craioveanu, Alex La Gamba, Vera Riche Smith, Marco Covi, Carmela La Rosa, Anna Shalaginova, Derek Denis, Natalia Lapinskaya, Konstantin Shapoval, Tonia Djogovic, Kris Lee, Yi Qing Sim, Joyce Fok, Nikki Lee, Mario So Gao, Paolo Frasca, Olga Levitski, Awet Tekeste, Matt Gardner, Arash Loi, Josephine Tong

Collaborators: Yoonjung Kang, Alexei Kochetov, James Walker

Funding: SSHRC, University of Toronto, Shevchenko Foundation


References

• Boberg, Charles. 2008. “Regional phonetic differentiation in Standard Canadian English.” Journal of English Linguistics 36(2): 129–154.
• Gorman, Kyle, Jonathan Howell & Michael Wagner. 2011. “Prosodylab-Aligner: A tool for forced alignment of laboratory speech.” Proceedings of Acoustics Week in Canada, Quebec City.
• Hoffman, M. F. & J. A. Walker. 2010. “Ethnolects and the city: Ethnic orientation and linguistic variation in Toronto English.” Language Variation and Change 22: 37–67.
• Lobanov
• Nagy, Naomi. 2009. Heritage Language Variation and Change in Toronto. http://projects.chass.utoronto.ca/ngn/HLVC.
• Roeder, Rebecca. 2012. “The Canadian Shift in Two Ontario Cities.” World Englishes 31(4): 478–492. Special issue: Autonomy and Homogeneity in Canadian English, guest editors Stefan Dollinger and Sandra Clarke.
• Roeder, Rebecca & Lidia-Gabriela Jarmasz. 2010. “The Canadian Shift in Toronto.” Revue canadienne de linguistique/Canadian Journal of Linguistics 55(3): 387–404.
• Rosenfelder, Ingrid, Joe Fruehwald, Keelan Evanini & Jiahong Yuan. 2011. FAVE (Forced Alignment and Vowel Extraction) Program Suite. http://fave.ling.upenn.edu.
• Wittenburg, Peter, H. Brugman, Albert Russel, A. Klassmann & Han Sloetjes. 2006. “ELAN: A professional framework for multimodality research.” Proceedings of LREC 2006, Fifth International Conference on Language Resources and Evaluation.
• Yue-Hashimoto, Oi-kan. 1972. Phonology of Cantonese. Cambridge University Press.
