INTERSPEECH 2011

Electroglottograph and Acoustic Cues for Phonation Contrasts in Taiwan Min Falling Tones

Ho-hsien Pan1, Mao-Hsu Chen1, Shao-Ren Lyu1

1 Department of Foreign Languages and Literatures, National Chiao Tung University, Hsinchu Taiwan [email protected], [email protected], [email protected]

including closed (contact) quotients obtained using a hybrid Abstract method (CQ_H) and Peak Increase in Contac (PIC), This study explored the effective articulatory and acoustic distinguished modal, breathy, and creaky phonations in parameters for distinguishing Taiwan Min falling unchecked Hmong and distinguished tense and lax phonations in Yi [6, 8]. tones 53 and 31 and checked tones 5 and 3. Data were Acoustic parameters including spectral tilts which measure the collected from Zhangzhou, Quanzhou, and mixed accents in amplitude differences between the first harmonic (H1) and the northern, central, and southern Taiwan. Results showed that strongest harmonic in the first formant (A1), second formant EGG parameters, Contact Quotients (CQ) and Peak Increase (A2) and third formant (A3), i.e. H1*-H2* (corrected), H1*- in Contact (PIC) were not effective in distinguishing checked A1* (corrected), H1*-A2* (corrected), H1*-A3*, and Cepstral from unchecked tones across speakers. In contrast, f0 contour Peak Prominence (CPP) distinguished phonation contrasts in and Cepstral Peak Prominence (CPP) consistently Hmong and Yi [6, 8]. distinguished checked tones from unchecked tones across A previous fiberoptic study of Taiwan Min checked speakers. The f0 onset was highest for 53, followed in with final unreleased stops [p t, k, ǚ] produced with order by tone 3, 5 and 31. The f0 contours of tone 5 were the tones 5 and 3, showed glottalization with adduction of highest in the later half of the vowels. CPP measures of ventricular folds for checked tone syllables in citation form. checked tones were higher than those of unchecked tones in However, glottalization sometimes disappeared when checked the latter portion of vowels. tones were placed in a sentence or phrase [9]. An inverse Index Terms: Electroglottoraph, glottal wave form, f0, filtered oral airflow study found low airflow and a long closed Cepstral peak prominence, Taiwan Min, phonation. phase in the glottal waveform of checked tones produced by some speakers [10]. A more recent laryngoscopic study found 1. Introduction consistent adduction of glottal folds for Taiwan Min checked tones, but less or no adduction of ventricular folds or In languages with mixed pitch and phonation tone systems, the aryepiglottic folds for younger speakers [11]. effective acoustic parameters for characterizing phonation Although airflow, fiberoptic, and laryngoscopic studies types vary across languages. As the production of pitch for found glottalization for checked tones, spectral tilt as lexical tones and qualities both involve larynx, the measured by H1-A2 were not effective for distinguishing variations in pitch may contribute to the variation in voice checked from unchecked tones [12]. Since acoustic data is qualities. In Burmese, inverse filter acoustic waveforms were more readily available during speech processing, it is essential 10.21437/Interspeech.2011-267 used to characterize lexical tones produced with modal or to identify the effective acoustic parameters for documenting creaky voice qualities accompanying lexical tones. It was phonation contrast in Taiwan Min. found that peak flow and the asymmetry between rate of rise Following the study on Yi and Hmong, the current study and fall in glottal pulses distinguished creaky and modal explores the effectiveness of EGG parameters including CQ phonation [1]. Most previous studies on phonation contrasts and PIC and acoustic parameters including spectral tilt use articulatory data to supplement acoustic data. For measures and CPP in order to characterize reliable cues for example, both glottal airflow and acoustic data were studied to Taiwan Min checked tones. As Taiwan Min tone 5 is characterize breathy, creaky, and modal phonation types currently undergoing during which the f0 onset accompanying three lexical tones in Green Mong. The closed of tone 5 was lowered from high to mid f0 range, the phase duration of glottal airflow distinguished phonation types identification of the effective EGG and acoustical parameters as did amplitude difference between first harmonic (H1) and can help explore how voice quality vary along with the f0 second harmonic (H2), f0, vowel duration, vowel quality and change [13]. It is hypothesized that voice quality contrast in voice onset time [2, 3, 4, 5]. The harmonic amplitude terms of spectral tilt and Cepstral Peak Prominence will be difference is referred to as spectral tilt (H1-H2), maintained to help distinguishing checked tones from Additionally, both airflow and acoustic data were used to unchecked tones while the contrast in f0 onset is undergoing study languages including Jingpho, Hani, Yi (Nasu) and Wa, changes. However, there should be inter-speaker variations as with tense / lax phonation contrasts accompanying lexical sound change spread around different regions and ages. tonal contrasts. Lax vowels were found to have a greater flow to pressure ratio and a greater ratio of f0 to H2 for lax vowels 2. Method [6]. An Electroglottoraph (EGG) study of Hanoi Vietnamese found an increase in the open quotient for -final rhyme syllables and glottalization for nasal-final rhyme 2.1. Subjects syllables [7]. In an articulatory and acoustic study of two languages with phonation and lexical tone contrasts, namely Following the isoglosses defined by Ang [14], we recruited Hmong and Yi, it was found that two EGG parameters, eight native speakers from Taiwan who spoke Zhangzhou (㻛

Copyright © 2011 ISCA 649 28-31 August 2011, Florence, Italy ⶆ), Quanzhou (㱱ⶆ), and mixed accents. All speakers were A one-way ANOVA (checked vs. unchecked tones) was able to speak Mandarin and were above 40 years old, as used to analyze the EGG and acoustic parameters at 9 vowel shown in Table 1. time points for each speaker. Table 1. Subject backgrounds. F:Female,M:Male, N: Northern, C: Central, S: Southern

>40 years old N. Zhangzhou 1 M, 1F N. Quanzhou 1 F C. Zhangzhou 1 F C. Quanzhou 1 M, 1F Figure 1: Acoustic waveform (top panel) and EGG S. Mixed 2 F waveform (lower panel).

2.2. Corpus 3. Result There were 390 randomized monosyllabic items in the reading list, including target items with checked tones 5 and 3, control If Taiwan Min checked tones are produced with tense/ creaky items with tones 53 and 31, and filler items with tone 55, 13 voice, then the checked tones should have longer CQ and and 33. higher PIC values. Additionally, the spectral tilt of checked tones should show a less steep spectral slope, thus having 2.3. Instruments smaller values of H1*-H2*, H1*-A1*, H1*-A2* and H1*-A3*. The CPP values tend to be less for phonation with breathiness. Glottal Enterprise EGG system EG2-PCS and TEV Tm-728 II microphones were used to simultaneously pick up acoustic 3.1. EGG parameters and EGG data which were then recorded with Audacity onto a laptop. The EGG and acoustic signals were then separated As shown in Table 2, only five of the eight speakers showed into different channels, inverted and analyzed with EggWorks significant differences in the mean contact quotients (CQ_H) [15]. The acoustic signals were tagged using Praat to mark during the nine vowel time points. CQs at vowel midpoint vowel boundaries, and then analyzed with VoiceSauce to were significantly different among checked 5 and 3 and obtain spectral tilt and CPP measurements [16]. unchecked tones 53 and 31 for these five speakers only. As shown in Table 2, again significantly different PIC 2.4. Procedure values between checked and unchecked tones can only be found among six of the eight speakers. Consequently, neither During the recording, two electrodes were placed on speakers’ CQ_H nor PIC was effective measures in documenting necks while the microphone was placed within 15 cm in front phonation type across speakers. of all speakers’ mouths. Speakers first read the number of each token and then paused before reading each target token Table 2. EGG, spectral tilt and CPP at nine vowel time points on the reading list. Speakers paused after every 100 tokens to in checked tones 5 and 3 and unchecked tones 53 and 31 take a rest. The recording lasted for about 40 minutes. syllables. NZ: Northern Zhangzhou, NQ: Northern Quanzhou, CZ: Central Zhangzhou, CQ: Central Quanzhou, SM: 2.5. Data analysis Southern Mixed, F: Female, M: Male. ** p<.01 10 20 30 40 50 60 70 80 90 As shown in Figure 1, for EGG signals, two parameters, % namely the CQ of glottal waveform and instantaneous peak NZ1F ** ** ** ** rate of change in contact (Peak Increase in Contact, PIC), were NZ1M calculated at nine equal intervals (10% through 90% during vowel phonation) using EggWorks. CQ was calculated using NQ1F ** ** ** ** ** a hybrid method (CQ_H). In this method, the edges of the CZ1F ** ** ** ** contacting phase of glottal pulses are defined using two

CQ_H CQ1F methods. The beginning of the contact phase is the positive peak in the first derivative of the EGG signal. The end of the CQ1M contact phase is arbitrarily determined using a 25% threshold SM1F ** ** ** [16]. The PIC is the peak positive value in the derivative of SM2F ** ** the EGG signal, equivalent to DECPA measure [17]. VoiceSauce was used to query acoustic parameters NZ1F ** including f0, spectral tilt (corrected H1*-H2*, H1*-A1*, H1*- NZ1M A2*, H1*-A3*) and CPP [16, 18]. F0 was estimated using the NQ1F ** ** ** ** STRAIGHT algorithm at 1 ms intervals [19]. Since vowel height and tongue advancement were not controlled for, the CZ1F ** ** ** ** corrected amplitude of the first three formant, bandwidth, and PIC CQ1F corresponding spectral tilts H1*-H2*, H1*-A1*, H1*-A2*, CQ1M ** ** ** H1*-A3* was calculated using Snack Sound Toolkit [20]. CPP is a measure of cepstral peak amplitude normalized over SM1F ** ** ** all amplitudes [21]. CPP measures represent how prominent SM2F ** ** ** ** ** ** the cepstral peak is against “background noise,” and indicates NZ1F ** ** ** how periodic the signals are.

650 and thelowestfortone31. 5 tone for highest vowel midpoint(50%)f0contours werethe tone31. tone3,5andfinally After the followed by 53, tone for highest As showninFigure2,thef0onset wasthe F0 3.2.1. Acousticparameters 3.2.

CPP H1*-A3* H1*-A2* H1*-A1* H1*-H2* SM2F SM1F CQ1M CQ1F CZ1F NQ1F NZ1M NZ1F SM2F SM1F CQ1M CQ1F CZ1F NQ1F NZ1M NZ1F SM2F SM1F CQ1M CQ1F CZ1F NQ1F NZ1M NZ1F SM2F SM1F CQ1M CQ1F CZ1F NQ1F NZ1M NZ1F SM2F SM1F CQ1M CQ1F CZ1F NQ1F NZ1M ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** **

** ** ** ** ** ** ** ** ** ** **

** ** ** ** ** ** ** ** **

** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** **

** ** ** ** ** ** ** ** ** ** ** ** **

** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** **

** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** **

651

CPP CPP

CPP CPP 16 18 20 22 24 26 28 30 16 18 20 22 24 26 28 30 CepstralPeakProminence(CPP) 3.2.3. Figure 2: and H1*-A2*atthe80%voweltimepoints. with significantdifferencesinH1*-A1* speakers five were tilt differenceswereobservedamong allspeakers. At most spectral significant which in was nosinglevoweltimepoint distinguishedallspeakers’data. However, effectively there twospectraltiltmeasures that only the were H1*-A2* and from uncheckedtones.H1*-A1* tones checked distinguish to values tilt spectral use to Table 2revealshoweffectiveitis SpectralTilt 3.2.2. 16 18 20 22 24 26 28 30 16 18 20 22 24 26 28 30 31 53 31 53 02 04 06 70 60 50 40 30 20 10 70 60 50 40 30 20 10 3 5 3 5 31 53 31 53 02 04 06 70 60 50 40 30 20 10 3 5 02 04 06 70 60 50 40 30 20 10 3 5 31 andcheckedtones5,3. Figure 3:CPPof 31 53 31 53 3 5 3 5 31 53 31 53 3 5 3 5 log(strF0) F0 contoursofTaiwanMintones53,3,5,and3. 1.8 1.9 2.0 2.1 2.2 2.3 2.4 31 53 31 53 3 5 3 5 31 53 31 53 3 5 31 53 3 5 3 5 1020304050607080901000 Data Points Data Points Data Points NZ1F Data Points NQ1F CQ1F SM1F 31 53 53 31 3 5 3 5 31 53 31 53 tone 31 tone 53 tone 3 tone 5 53 31 5 3 3 5 3 5 ( ( % % ( ( % ) ) 31 53 % 3 5 ) 31 53 31 53 53 31 ) 5 3 3 5 3 5 31 53 Taiwan Min unchecked tones 53 and TaiwanMinuncheckedtones53 53 31 3 5 3 5 31 53 5 3 tone 31 31 tone 53 tone 3 tone 5 tone 53 31 53 31 3 5 3 5 31 53 53 31 3 5 3 5 Data Points (%) Points Data 31 53 5 3

53 31 31 53 CPP

3 CPP 5 3 5 31 53 16 18 20 22 24 26 28 30 31 53 16 18 20 22 24 26 28 30 CPP3 5

3 5 CPP 53 31 16 18 20 22 24 26 28 30 31 53 16 18 20 22 24 26 28 30 02 04 06 70 60 50 40 30 20 10 31 53 5 3 02 04 06 70 60 50 40 30 20 10 3 5 3 5 31 53 31 53 02 04 06 70 60 50 40 30 20 10 3 5 02 04 06 70 60 50 40 30 20 10 3 5 31 53 3 5 31 53 31 53 5 3 3 5 31 53 31 53 3 5 3 5 31 53 3 5 53 31 31 53 3 5 3 5 31 53 31 53 3 5 3 5 Data Points Data Points Points Data 53 31 3 5 Data Points Points Data Data Points Points Data NZ1M CZ1F CQ1M 31 53 31 53 SM2F 3 5 3 5 31 53 53 31 3 5 3 5 53 31 3 5 ( ( % % ( ( ) % ) % 31 53 ) 31 53 ) 3 5 5 3 31 53 53 31 3 5 3 5 31 53 31 53 3 5 3 5 31 53 53 31 3 5 3 5

31 53 31 53 3 5 3 5 31 53 31 53 3 5 3 5 As shown in Table 2, CPP significantly differentiated checked [5] Garellek, M., Keating, P. 2010. The acoustic consequences of tones from unchecked tones produced by all speakers at 20% phonation and tone interactions in Jalapa Mazatec. UCLA and 70% vowel time points (NZ1F: 20% F(1, 310)=21.41, Working Papers in Phonetics, 108, 141-163. p<.01, 70% F(1, 310)=13.79, p<.01; NZ1M: 20% F(1, [6] Keating, P., Esposito, C., Garellek, M., Khan, S., Kuang, J. 2010. Phonation contrast across languages. UCLA Working 323)=7.91, p<.01, 70% F(1, 323)=49.82, p<.01; NQ1F: 20% Papers in Phonetics, 108, 188-202. F(1, 309)=20.52, p<.01, 70% F(1, 309)= 19.29, p<.01; CZ1F: [7] Michaud, A. 2004. Final consonants and glottalization: New 20% F(1, 324)=35.45, p<.01, 70% F(1, 324)=55.8, p<.01; perspective from Hanoi Vietnamese, Phonetica, 61, 119-146. CQ1F: 20% F(1, 316)=21.39, p<.01, 70% F(1, 316)=11.38, [8] Maddieson, I., Ladefoged, P. 1985 “Tense” and “lax” in four p<.01; CQ1M: 20% F(1, 315)=30.85, p<.01, 70% F(1, minority languages of China. UCLA Working Papers in 315)=11.63, p<.01; SM1F: 20% F(1, 325)=41.55, p<.01, 70% Phonetics, 60, 59-83. F(1, 325)=25.33, p<.01; SM2F: 20% F(1, 325)=42.22, p<.01, [9] Iwata, R., Sawashima, M., Hirose, H. 1981. Laryngeal adjustments for -final stops in . Annual 70% F(1, 325)=31.82, p<.01). Bulletin Research Institute Logopediatrics and Phoniatrics. 15, Figure 3 shows detailed CPP values from 10% to 70% 45-54. vowel time points in checked tones 5 and 3, and unchecked [10] Pan, H.-h. 1991. Voice quality of Amoy falling tones. J. Acoust tones 53 and 31. It was found that the CPP values of checked Soc Am. 90, 2345. tones were lower than unchecked tones at the 20% time point, [11] Edmondson, J. A., Chang, Y.-c., Huang, H.-c., Hsieh, F.-f., but higher than unchecked tones at the 70% time point. Peng, Y. 2010. Reinforcing voiceless stop coda in Taiwanese, In sum, EGG parameters, including Contact Quotient Vietnamese and other east and southeast Asian languages: (CQ_H) and Peak Increase in Contact (PIC), did not reveal Laryngoscopic case studies. LabPhon 12. New Mexico [12] Pan, H.-h. 2005. Voice Quality of Falling Tones in Taiwan Min. consistent pattern across speakers. Among the acoustic Proc. Interspeech 2005 Lisboa, 1401-1404. parameters, f0 contours and Cepstral Peak Prominence (CPP) [13] Chen, S.-c. 2009. ⎘䀋救⋿婆⃫枛䲣䴙⍲昘春ℍ倚婧䘬嬲䔘 showed consistent cross speaker patterns at 20% and 70% 冯嬲⊾⎘䀋救⋿婆䘬⫿堐婧㞍↮㜸 The vowel system vowel point times. Inter-speaker variation in spectral tilt such change and the Yin-/Yang-entering tonal variations in as H1*-H2*, H1*-A1* and H1*-A2* may indicate dialectal Taiwanese . J. of Taiwanese Languages and variation. Literature ⎘䀋婆㔯䞼䨞, 3, 157-178. [14] Ang, U.-j. 2009. ⎘䀋⛘⋨救⋿婆䘬㕡妨栆✳冯㕡妨↮⋨ On 4. Discussion the dialect typologies and isoglosses of Taiwan Min. J. of Taiwanese Languages and Literature ⎘䀋婆妨䞼䨞, 3, 239- The lack of consistent results using EGG indicators of 309. phonation type indicates that further articulatory studies are [15] EggWorks. necessary for characterizing the phonation types of Taiwan http://www.linguistics.ucla.edu/faciliti/facilities/physiology/egg Min checked tones. Acoustic measures are more consistent in .htm, 2010. identifying phonation types. CPP measures indicate periodic [16] Shue, Y.-L., Keating P., Vicenik, C. 2009. VoiceSauce: A voicing is more consistent within the production of the latter program for voice analysis. J. Acoust Soc Am., 126, 2221. [17] Michaud, A. 2004. A measurement from Electroglottography: half of syllables carrying a checked tone. DECPA, and its application in prosody. Speech Prosody 2004, Tone 5 is undergoing a sound in change in Taiwan Min. 633-636. Laryscopic study showed no or little adduction of ventricular [18] Iseli, M., Shue, Y.-L., Alwan, A. 2007. Age, sex and vowel folds for checked tones produced by younger generation. The dependencies of acoustic measures related to the voice source, J. inter-speaker variation observed in spectral tilt, such as H1*- of Acoust. Soc, Am., 121, 2283-2295. H2*, H1*-A1* and H1*-A2*, may reflect this sound change. [19] Kawahara, H., de Cheveigh, A., Patterson, R.D.1998. an While CPP consistently distinguish checked from unchecked instaneous-frequency-based pitch extraction method for high tones, spectral tilt parameters may show dialectal variation quality speech transformation: revised TEMPO in STAIGHT- suite. Proc. ICSLP’98, Sydney, Australia. due to sound change. For future study, CPP and spectral tilt [20] Sjölander, K. 2004. Snack sound toolkit, KTH Stockholm, measures could be used to document how the phonation of Sweden: http://www.speech.kth.se/snack. checked tones vary among young and middle aged speakers in [21] Hillenbrand, J., Cleveland, R.A., Erickson, R.L. 1994. Acoustic different dialect regions. correlates of breathy vocal quality, J. of speech and Hearing Research, 37, 769-778. 5. Acknowledgements This project was supported by National Science Council in Taiwan (NSC 98-2401-H-009-040-MY3). The software VoiceSauce and EggWorks were provided by the UCLA phonetics lab. We would like to thank Yi-chu Ke for help in designing the corpus. 6. References [1] Javkin, H. R., Maddieson, I.1983. An inverse filtering study of Burmese creaky voice. UCLA Working Papers in Phonetics, 57, 115-125. [2] Andruski, J., Ratliff, M. 2000. Phonation types in production of phonological tone: the case of Green Mong, J. of the International Phonetic Association, 30(1/2), 37-61. [3] Andruski, J. E. 2006. Tone clarity in mixed pitch / phonation- type tones. J. of Phonetics, 34, 388-404. [4] Esposito, C. 2006. The Effects of Linguistic Experience on the Perception of Phonation. UCLA Ph.D. dissertation.

652