Acoustic Differences Between English /T/ Glottalization and Phrasal Creak
Total Page:16
File Type:pdf, Size:1020Kb
INTERSPEECH 2016 September 8–12, 2016, San Francisco, USA Acoustic differences between English /t/ glottalization and phrasal creak Marc Garellek1, Scott Seyfarth1 1Department of Linguistics, UC San Diego [email protected], [email protected] (a) Non-glottalized, non-creaky (b) Glottalized, non-creaky Abstract 5000 5000 In American English, the presence of creaky voice can derive 4000 4000 from distinct linguistic processes, including phrasal creak (pro- 3000 3000 2000 2000 Frequency Frequency (Hz) longed irregular voicing, often at edges of prosodic phrases) and Frequency (Hz) coda /t/ glottalization (when the alveolar closure for syllable- 1000 1000 0 0 final /t/ is replaced by or produced simultaneously with glottal (c) Non-glottalized, creaky (d) Glottalized, creaky constriction). Previous work has shown that listeners can differ- 5000 5000 entiate words in phrasal creak from those with /t/ glottalization, 4000 4000 which suggests that there are acoustic differences between the 3000 3000 Frequency Frequency (Hz) creaky voice derived from phrasal creak and /t/ glottalization. 2000 Frequency (Hz) 2000 In this study, we analyzed vowels preceding syllable-final /t/ in 1000 1000 the Buckeye Corpus, which includes audio recordings of spon- 0 0 taneous speech from 40 speakers of American English. Tokens Time Time were coded for presence of phrasal creak (prolonged irregular voicing extending beyond the target syllable) and /t/ glottaliza- Figure 1: Sample spectrograms of ‘about,’ with presence of glot- tion (whether the /t/ was produced only with glottal constric- talization crossed with phrasal creak. The creaky tokens (panels tion). Eleven spectral measures of voice quality, including both c and d) are differentiated from the non-creaky ones (panels a harmonic and noise measures, were extracted automatically and and b) by irregularity throughout the vowels and voiced por- discriminant analyses were performed. The results indicate that tions of the word. In the glottalized productions (panels b and the discriminant functions can classify these sources of creaky d), there is also a distinct region of strong glottal irregularity voice above chance, and that Cepstral Peak Prominence, a mea- that is localized to the end of the syllable. sure of harmonics-to-noise ratio, is important for distinguishing phrasal creak from glottalization. Index Terms: creaky voice, phonation, glottalization, voice may span several words and appears on any voiced sound. In- deed, some speakers produce creaky voice over large portions 1. Introduction of phrases, and not just at the ends of prosodic domains. Such Creaky voice tends to have a low and irregular F0, as well as cases might be used to index speaker identity [12]. In our increased vocal fold constriction, relative to modal voice. How- study, we refer to both phrase-final creak and longer durations ever, previous studies have shown that numerous subtypes of of creaky voice as ‘phrasal creak.’ creaky voice exist [1, 2, 3]: some forms of creaky voice are Because /t/ glottalization and phrasal creak act on differ- constricted but have a regular F0, whereas other types are not ent domains (only on /t/ for glottalization; over long strings of constricted but are irregular in F0 [3, 4]. In this study, we test segments for phrasal creak), these two sources of creaky voice whether different linguistic sources of creaky voice are pro- can co-occur on a particular word. For example, the bottom- duced with different acoustic characteristics. right panel of Figure 1 shows a token of ‘about’ with both /t/ In American English, creaky voice has several distinct lin- glottalization (as seen by the lack of stop closure and [t] release guistic origins. First, creaky voice may occur as a result of burst) and phrasal creak (as seen by the irregular voicing over so-called ‘/t/ glottalization,’ the phenomenon in which glottal the entire word). Can listeners disambiguate between different constriction is produced simultaneously with or instead of /t/ linguistic sources of creaky voice? It has been shown that En- [5,6,7,8]. For example, the word ‘about’ can be produced with- glish listeners are able to disambiguate words with /t/ glottaliza- out glottalization as [@baUt], or as glottalized [@baUP]. Glottal- tion (e.g. ‘atlas’ [æPl@s]) from non-glottalized counterparts (e.g. ized and non-glottalized tokens of ‘about’ are shown in the top ‘Alice’ [æl@s]), even when the latter occur in phrasal creak [13]. row of Figure 1. The glottalized token (top-right, panel b) shows This suggests that there are systematic acoustic differences be- an increasing glottal period and greater F0 irregularity towards tween /t/ glottalization and phrasal creak. However, this previ- as the vowel progresses, with no identifiable stop closure or re- ous study used stimuli from laboratory speech recorded by only lease burst associated with the coda /t/. In contrast, a clear one phonetically trained speaker. This leaves it unclear as to stop closure and release burst can be seen in the non-glottalized which acoustic features are perceptually relevant and robust for token (top-left, panel a). disambiguating /t/ glottalization from phrasal creak, and which Another source of creaky voice in American English is may be idiosyncratic. phrase-final creak, the phenomenon in which ends of prosodic In this paper, we investigate whether English speakers pro- phrases are creaky [9,10]. In contrast to /t/ glottalization, which duce /t/ glottalization and phrasal creak with different kinds of affects only coda /t/ of a single word [11], phrase-final creak creaky voice in spontaneous speech that are acoustically well- Copyright 2016 ISCA 1054 http://dx.doi.org/10.21437/Interspeech.2016-1472 discriminated. If indeed they do, this would suggest that dif- (HNR05), and subharmonics-to-harmonics ratio (SHR), which ferent subtypes of creaky voice can be utilized for different lin- measures the amplitude difference between the harmonics and guistic purposes (e.g. /t/ glottalization vs. phrasal creak) – that subharmonics in the signal [21]. Because F0 tends to be irreg- is, subtypes of creaky voice can contrast within a particular lan- ular during prototypical creaky voice, we expect both CPP and guage. HNR05 to be lower in creakier voice qualities [3, 22]. More- over, the frequent occurrence of period-doubled phonation dur- 2. Corpus data ing creaky voice should increase the SHR [3]. The seven re- maining measures were estimates of spectral tilt in various fre- The recordings used in this study were obtained from the Buck- quency bands. All of these seven measures are expected to be eye Corpus of spontaneous American English [14], which con- lower in creakier voice qualities, because of the increased glot- tains speech of 40 adults (gender-balanced) from Ohio. The tal constriction and/or abrupt glottal closure [3,22,23]. Because recordings took place in a quiet room and were digitized at 16 these measures also vary as a function of vowel quality, they kHz with 16-bit resolution. The corpus has canonical phonemic were corrected for vowel formants, following [24], which al- transcriptions for each word (generated by automatic alignment lows for cross-vowel comparisons in voice quality. Corrected software), as well as narrow phonetic transcriptions, which were measures are shown with asterisks. made by corpus annotators who hand-corrected the labeling and Due to the importance of correct F0 tracking for many of segmentation of the canonical phonemic transcriptions. these measures, we took two steps to exclude tokens that were We analyzed the following subset of the corpus, which likely mistracked. First, we excluded 414 tokens which con- is taken from the dataset constructed for and described in tained at least one analysis frame in which F0 was either half or [15], summarized here with additional exclusions and hand- double the F0 value in the previous frame (within a 10% mar- corrections. Words with syllable-final /t/ in the canonical form gin). While F0 may drop rapidly during creaky voice, an octave were extracted from the corpus, based on a syllabification al- jump is most likely to be a tracking error. Second, we stan- gorithm adapted from [16]. We selected only tokens of coda dardized F0 within two groups within each speaker: modal [t] /t/ that were preceded by a vowel at least 50 ms long (as short tokens, and tokens with phrasal creak and/or glottalization. We samples are problematic for voice analyses), were not part of a excluded tokens with a mean F0 > 2.5 standard deviations from complex coda (e.g., excluding ‘kept’, ‘rant’), and which were each speaker’s modal or creaky mean, as appropriate. realized phonetically as a voiceless stop [t] or as a glottal stop Finally, outliers > 2.5 standard deviations from each [P]. 11,947 tokens in the corpus met these initial criteria. We speaker’s mean on any of the acoustic measures were excluded hand-inspected each token, and excluded 346 tokens that were (excluding F0; 11.1% of the total data). Overall, 8,936 vowels disfluent, were misaligned, or had noisy or clipped recordings. were analyzed, whose distributions into the four groups (non- Glottal stops were identified based on the provided corpus glottalized non-creaky, non-glottalized creaky, glottalized non- annotations, and verified during hand-inspection. An instance creaky, and glottalized creaky) are shown in Table 3. of /t/ was considered to be realized as a glottal stop if it had irregular voicing localized to the onset and offset of the target coda, and no [t] release burst. The provided annotations were Table 1: Acoustic measures used in the discriminant analysis. found to be largely accurate: of the 6,044 glottal stops meeting Included in the analysis were averages of each measure, as well the criteria, only 462 (7.6%) were hand-corrected to a [t], and as changes between the first and final third of the target vowel.