
Contributions of modal and creaky voice to the perception of habitual pitch

Lisa Davidson

New York University

Department of Linguistics, New York University, 10 Washington Place, New York, NY 10003


ABSTRACT

In some languages, creaky voice is used over relatively long stretches of speech as a prosodic element, to convey emotion, and/or stylistically. A primary acoustic and perceptual cue to creaky voice quality is a low fundamental frequency. Previous research has shown that listeners can make fine-grained comparisons of speakers’ habitual modal pitch, but this study focuses on how a combination of modal and non-modal phonation affects the perception of habitual pitch. A perception experiment assesses whether listeners are more likely to rate a speaker’s utterance as being holistically lower in pitch if it contains both modal and creaky voice as compared to fully modal speech. Results indicate that for female American English speakers with higher modal pitch, the inclusion of creaky voice leads listeners to rate such utterances as lower in pitch compared to fully modal utterances, but that this is not the case for speakers with lower modal pitch. These results are consistent with studies showing that pitch perception interacts with non-modal phonation, and relate to previous observations that speakers may utilize non-modal phonation to manipulate their intended habitual pitch.


1. INTRODUCTION. Creaky voice refers to a non-modal voice quality that is often characterized by a low fundamental frequency (F0), semi-regular glottal opening periods, and strong damping between glottal pulses (Garellek, 2019; Gerratt and Kreiman, 2001; Keating et al., 2015; Laver, 1980). In addition to languages that use creaky phonation contrastively (e.g., Avelino, 2010; Blankenship, 2002; Esposito, 2012; Garellek, 2012; Garellek and Keating, 2011; Gerfen and Baker, 2005), creaky voice has been observed as a prosodic element, for example as an end-of-sentence marker in languages like English, Mandarin, and Finnish (Belotel-Grenie and Grenie, 2004; Callier, 2013; Epstein, 2002; Garellek and Seyfarth, 2016; Henton and Bladon, 1987; Kreiman, 1982; Ogden, 2001; Redi and Shattuck-Hufnagel, 2001; Slifka, 2006). Creaky voice has also been studied in sociolinguistic and affective contexts, where it has been shown to convey emotion, speaker identity, and meaning, including, for example, irritation in Hanoi Vietnamese (Nguyen et al., 2013), seeking commiseration in Lachixío Zapotec and Tzeltal Maya (Sicoli, 2015), identification with a Chicano gangster persona (Mendoza-Denton, 2011), and the marking of parentheticals in American English (Lee, 2015). In the United States in the early 21st century, creaky phonation has also captured the attention of both researchers and the popular press as being especially characteristic of the speech of younger female (mainly white) speakers (Abdelli-Beruh et al., 2014; Hageman, 2013; Khazan, 2014; Palermo, 2013; Wolf, 2015; Wolk et al., 2012; Yuasa, 2010), though acoustic analyses of speech samples indicate that differences between men and women in production are small (Abdelli-Beruh et al., 2016; Irons and Alexander, 2016; Melvin and Clopper, 2015; Pratt, 2018).
While production studies have endeavored to examine who implements creaky voice and why, acoustic analyses of voice quality confirm that one of the main properties of creaky voice is a low F0. In terms of articulation, the term ‘creaky voice’ is usually used to refer to a collection of phonation types, including, for example, ‘prototypical creaky voice’ (low F0, semi-regular periodicity, and strong damping between glottal pulses), ‘multiply pulsed creak’ (glottal openings that alternate in higher and lower amplitude, with a perceived low F0, with the entire duration between higher amplitudes being semi-periodic), and ‘non-constricted creak’ (F0 is low and irregular, but these properties are accompanied by glottal spreading and higher airflow, not constriction) (Batliner et al., 1993; Davidson, 2019; Drugman et al., 2014; Garellek, 2019; Gerratt and Kreiman, 2001; Ishi et al., 2008; Keating et al., 2015; Laver, 1980; Redi and Shattuck-Hufnagel, 2001; Slifka, 2006). Most studies have also shown that the portions of speech identified as creaky voice using articulatory criteria usually have an upper boundary for F0 below 100Hz (Hollien and Wendahl, 1968; Hollien and Michel, 1968; McGlone, 1967; McGlone and Shipp, 1971; Murry, 1971). In a summary of the literature and their own stimuli, Blomgren et al. (1998) report that a typical range for creaky voice is about 20Hz to 80Hz. Moreover, the range for creaky voice is reported to be very similar for both men and women, though the average F0 ranges for modal phonation are approximately 100-160Hz for males and 170-260Hz for females (Blomgren et al., 1998; Keating and Kuo, 2012; Pepiot, 2014; Titze, 1994).

Perception studies also show that one of the most consistent cues that listeners use to identify creaky voice is low pitch (Hollien and Wendahl, 1968). Kuang and Liberman (2016) find that when periodic stimuli are manipulated to contain glottal pulse irregularity as is found in creaky voice, listeners are more likely to report hearing lower pitch. Relatedly, in Cantonese, creaky voice often accompanies Tone 4, the low falling tone, and listeners report hearing Tone 4 more often when presented with a stimulus containing creak than when listening to one that does not (Yu and Lam, 2014). Davidson (2018a) shows that in addition to identifying creaky voice when it is actually present in a multi-word utterance, listeners also false alarm on male speakers who have particularly low modal F0.

The roles of pitch in the production and perception of creaky voice coincide in studies speculating that one reason creaky voice could be implemented is that it has the effect of lowering a speaker’s perceived holistic pitch (Anderson et al., 2014; Yuasa, 2010; Zimman, 2018). Zimman (2018) comments that creak may be a strategy that transgender men use in order to access a lower pitch range if they consider their voices to be otherwise relatively high pitched. Anderson et al.
(2014) and Yuasa (2010) make note of the possibility that lower pitch has potential benefits in professional settings for women, following studies which have shown that both women and men are judged to be more dominant if they have lower pitch (Borkowska and Pawlowski, 2011; Feinberg et al., 2008; Fraccaro et al., 2013; Puts et al., 2007). Similarly, in the political realm, voters prefer candidates of both genders to have lower pitch (Klofstad, 2015; Klofstad et al., 2012).

These observations present the intriguing possibility that some groups could use creaky voice to lower their perceived habitual pitch, but studies of how creaky voice is implemented introduce some complications. It is not the case that speakers who employ creaky voice in their speech do so in 100% of an utterance. Though studies of creaky voice in spontaneous speech are limited to date, previous studies using read speech suggest that the proportion of creaky syllables in a sentence is usually much less than half of the sentence (probably lower than 1/3 of a sentence, though this is difficult to determine since studies use different metrics such as number or proportion of creaky syllables, proportion of creaky voice per minute, or number of sentences containing creaky voice with no indication of how long it lasted) (Behrman and Akhund, 2016; Irons and Alexander, 2016; Melvin and Clopper, 2015; Oliveira et al., 2016). Irons and Alexander (2016) did include a spontaneous speech portion in their study using the Kane et al. (2013) automated creak detector, and found that about 26% of men’s speech and 8% of women’s speech was produced with creaky voice. Pratt (2018) examined data from interviews with 12 male and 12 female Californian high school students, finding that the speakers range from 0.9% to 36.7% for the total amount of creak in the whole interview, with no real pattern for male vs. female speakers. These data suggest, then, that a listener will often be faced with a preponderance of modal (or other) voice for a given speaker, with a smaller proportion of that speaker’s speech being produced as creaky voice. In that case, are listeners aware of both a speaker’s habitual F0 in the modal voice and the perceived pitch of the creaky voice portions?

The purpose of this study is to test how listeners assess speakers’ habitual pitch with stimuli that contain both modal and creaky voice. Previous research shows that listeners are capable of relatively fine-grained distinctions in comparing speakers’ habitual pitch when presented with whole sentences (Davidson, 2018b), but the effect of creaky phonation on the holistic determination of pitch has not yet been explored.
In the current study, listeners are presented with fully modal, wholly creaky, and partially creaky (first half modal, second half creaky) utterances from four female American speakers and are asked to judge the overall pitch of the utterance using a scale from low pitch to high pitch. For the modal voice portions (both fully modal, and the modal portion of the partially creaky utterance), two speakers represent lower habitual pitch (~150Hz) and two speakers have higher habitual pitch (~200Hz). In perceiving the speakers’ pitch, there are two likely options:

a. If listeners are able to separate creaky voice from modal voice and rate speakers only on the portions they assume to be their “normal” (i.e. modal) pitch, then it is expected that the listeners will rate both fully modal and partially creaky utterances as relatively high for the higher pitched speakers and relatively low for the lower pitched speakers, with no difference in rating between fully modal and partially creaky. The perceived pitch of whole creak utterances is predicted to be low regardless of the speakers’ modal pitch.

b. If instead listeners factor the creaky portion into the overall perceived pitch of the utterance in the partially creaky utterances, then it is expected that there will be a three-way distinction between fully modal, partially creaky, and whole creak in how listeners rate the overall pitch of the utterance. A variation of this result may be that a three-way distinction will be found for the higher pitched speakers, while listeners rate the fully modal and partially creaky utterances the same for the lower pitched speakers, if the modal F0 is already low enough for listeners to place these speakers at the low end of the scale (a floor effect).

The next section contains acoustic information about the stimuli and the results of a rating study for spontaneous utterances produced by four female speakers that represented either fully modal, partially creaky, or wholly creaky voice quality.

2. METHOD

2.1. PARTICIPANTS. The participants were 71 (self-reported) native speakers of American English (28F, 43M, ages 22-71) who were recruited using Amazon Mechanical Turk (MTurk). Participants were required to have an IP address in the United States to complete the study. Participants came from all of the broad geographic regions of the United States. They were paid $2.25 for about 10 minutes of their time. One caveat about this participant set is that though they were asked not to participate if they were bilingual or had learned another language before 5 years of age, this could not be guaranteed. Indeed, previous testing has shown that participants can forget this requirement, and when presented with a demographic questionnaire at the end of the study, they will report having learned multiple languages as a child. Thus, participants were asked again at the end whether English was their only native language, and if not, which other languages they were native speakers of. For this study, all of the participants claimed to have spoken only American English. Moreover, no participant claimed to have more than an “intermediate” level in another language. Another limitation of the participant set is that they were not asked for information about their racial or ethnic backgrounds, which could affect how pitch is perceived.

2.2. STIMULI. The stimuli for the study consist of naturally produced utterances from four white female podcast hosts who are speakers of ‘standardized’ American English, who record in professional settings, and who all worked in public radio previously or currently. The speakers ranged in age from 32-44 years old. The utterances were 3-4 word phrases taken from the ends of sentences (mean duration = 909ms; e.g., ‘many months ago’, ‘kids one day’). They were divided into three voice quality categories: fully modal, whole creak, and partially creaky. For each speaker, there were 6 unique utterances in each category, for a total of 72 utterances that were rated by each participant. In partially creaky utterances, creak was produced on the second 50% of the utterance (e.g. “my older fr̰ḭḛn̰ds a̰r̰ḛ”). The presence of creak was identified by visual inspection of waveforms and spectrograms in Praat (Boersma and Weenink, 2018). As noted earlier, creaky voice can be characterized by a few different acoustic criteria, and all of the tokens in this study were categorized as either prototypical or multiply pulsed creak (Keating et al., 2015), which listeners consider to be equally good implementations of creaky voice (Davidson, 2019). Prototypical creak is characterized by a low and irregular F0, with a long closed phase, and multiply pulsed creak contains two sets of glottal openings which alternate regularly in amplitude and length and are perceived as having the low pitch corresponding to the whole duration of the multiple glottal openings (see examples in Figure 1). The creak type stayed relatively consistent throughout the short utterances. For each speaker and within each voice quality category, half of the utterances used prototypical creak and half used multiply pulsed creak. The files were normalized to 70dB SPL. Intonationally, the modal tokens all contained flat or slightly falling F0.
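The size of the stimulus set follows directly from the crossed design described above. The following is a minimal sketch in Python, not the author's materials: the speaker labels are the ones the paper uses for the four speakers, the per-participant shuffle mirrors the random presentation order used in the test phase, and the function names and the seed argument are hypothetical.

```python
import itertools
import random

# Design factors as described in the text.
SPEAKERS = ["Hi-1", "Hi-2", "Lo-1", "Lo-2"]
VOICE_QUALITIES = ["fully modal", "partially creaky", "whole creak"]
UTTERANCES_PER_CELL = 6


def build_stimulus_list():
    """Return one (speaker, voice quality, item index) tuple per stimulus."""
    return [
        (speaker, quality, item)
        for speaker, quality in itertools.product(SPEAKERS, VOICE_QUALITIES)
        for item in range(1, UTTERANCES_PER_CELL + 1)
    ]


def presentation_order(stimuli, participant_seed):
    """Shuffle a copy of the list; the seed stands in for a participant ID (assumption)."""
    order = list(stimuli)
    random.Random(participant_seed).shuffle(order)
    return order


stimuli = build_stimulus_list()
print(len(stimuli))  # 4 speakers x 3 categories x 6 utterances = 72
```

Each participant would then rate all 72 items, with `presentation_order(stimuli, seed)` giving one possible randomized sequence.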

-----

FIGURE 1 ABOUT HERE

-----

The average F0 of the modal voice for each speaker was either relatively high (~200Hz) or relatively low (~150Hz), while the creaky portions range from an average of 68Hz to 83Hz. The details are shown in Table 1. The speakers were labeled Hi-1, Hi-2, Lo-1, and Lo-2. F0 was extracted using the STRAIGHT algorithm in VoiceSauce (Shue et al., 2011). Because there were different phonemes in each file and many are not adequate for pitch measurements, the measurements were taken over the longest non-final vowel (average duration 100ms) in the modal and creaky portions of each file in order to maintain as much consistency as possible. While different vowels can have different intrinsic F0s (Shadle, 1985; Whalen and Levitt, 1995), the measurements were averaged over at least 5 different vowels, including high, low, back and front vowels, for both creaky and modal vowels for each speaker. This should minimize potential effects of intrinsic differences since they are similarly averaged for all of the speakers. In addition to F0, two acoustic measures of voice quality, H1*-H2* and harmonics-to-noise ratio (HNR), are also provided in Table 1 since they have been shown to distinguish creaky voice from modal voice in a number of languages. Both measures are obtained with VoiceSauce. H1*-H2* is the difference between the amplitude (in dB) of the first harmonic and the second harmonic, with the ‘*’ referring to a correction for the effects of formant values and bandwidths. Lower values of H1*-H2* have been shown to be correlated with increased glottal constriction (Kreiman et al., 2012; Samlan and Story, 2011), and studies of creaky voice demonstrate that H1*-H2* is lower for creaky voice than for modal voice in English and other languages (e.g., Blankenship, 2002; Davidson, 2018a; Esposito, 2012; Garellek, 2019; Seyfarth and Garellek, 2015).
HNR is a measure of the difference between the amplitudes of the harmonic and inharmonic components of the source spectrum (in dB), and previous work has shown that decreased values for HNR correlate with creaky voice since glottal openings in creaky voice are not typically completely periodic (Blankenship, 2002; Esposito, 2012; Garellek, 2012; Garellek and Seyfarth, 2016). Although studies relating spectral and noise measures to the perception of pitch in speech are sparse, there is some evidence that even when F0 stays constant, listeners will report hearing a higher pitch when the spectral slope is steeper (a characteristic of breathy voice) and lower when the spectral slope is flat (a characteristic of tense voice) (Kuang and Liberman, 2018). While Kuang and Liberman (2018) does not directly reflect on creaky voice, the study does indicate that fluctuations in voice quality measures likely affect how listeners perceive pitch. In Table 1, values for H1*-H2* and HNR are in the expected direction for almost every speaker and condition (H1*-H2* and HNR are lower for creaky portions of the stimuli than for modal). However, an exception is the H1*-H2* value for Hi-2 in the whole creak condition (6.71dB), which is positive and similar to her modal values, while all other speakers have negative H1*-H2* values for the creaky stimuli. This aspect of Hi-2’s whole creak stimuli will be further discussed in Section 4.

In addition to the test stimuli, five sound files with only modal voice from different female speakers were also prepared for the practice phase. The sentence “Fine soap saves tender skin” from the UW/NU corpus was used (Panfili et al., 2017). The average F0 of these utterances ranged from 155Hz to 251Hz, increasing in approximately 30Hz intervals from the minimum to the maximum. Because there are no speakers with particularly low pitch in this corpus, the average F0 was lowered 40Hz from the original speakers using the PSOLA algorithm in Praat for the utterances with the lowest pitch.
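The sign conventions of the two voice quality measures described in this section can be made concrete with a small Python sketch. This is illustrative only and does not reproduce VoiceSauce: the formant correction indicated by the ‘*’ and VoiceSauce's own HNR estimation are omitted, and the function names and the power-ratio formulation of HNR are assumptions made for the example.

```python
import math


def h1_minus_h2(h1_db, h2_db):
    """Uncorrected H1-H2: amplitude of the first harmonic minus the second, in dB.

    A negative result means the second harmonic is stronger than the first,
    the direction the text associates with creaky voice.
    """
    return h1_db - h2_db


def hnr_db(harmonic_power, noise_power):
    """Harmonics-to-noise ratio in dB from (hypothetical) power estimates."""
    if harmonic_power <= 0 or noise_power <= 0:
        raise ValueError("power values must be positive")
    return 10.0 * math.log10(harmonic_power / noise_power)


# Made-up input values, chosen only to show the direction of each measure.
print(h1_minus_h2(60.0, 63.5))  # -3.5
print(hnr_db(100.0, 1.0))       # 20.0
```

Under these definitions, less periodic voicing raises the noise term and therefore lowers HNR, which is the pattern reported for creaky portions in Table 1.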

-----

TABLE 1 ABOUT HERE

-----

2.3. PROCEDURE. The experiment was conducted on MTurk using the Experigen experimental platform (Becker and Levine, 2013). Listeners clicked ‘I Accept’ after reading a statement about informed consent. They were told to put on headphones and adjust the volume to a listening level that would be comfortable for them. The first screen explained that the term pitch refers to the rate of vibration of a speaker’s vocal folds, and that on average, male speakers have a lower pitch than female speakers do, but even within male or female speakers, the pitch of some talkers can either be relatively higher or lower than average. It also explained that the experiment consisted of rating each female speaker’s pitch on a scale of 1 (low) to 9 (high) in comparison to other female American English voices that they had experience with. The next screen contained the 5 practice items, which were randomly ordered. Participants clicked on each sound and then chose a number from 1-9 from radio buttons that appeared underneath the sound file button. The next item appeared on the screen once they made a response to the preceding trial. Participants were allowed to listen to a sound file as many times as they wanted. No feedback was provided.


For the test phase, the stimulus items were presented in a different random order to each participant. Participants clicked to play each item, and then were asked to rate its pitch from 1 (“lowest pitch”) to 9 (“highest pitch”). They could listen to each item at most two times. After the test phase, the participants completed a demographic questionnaire which included questions such as their age, their gender (male, female, other, no response), where they had lived, and what other languages they had learned and how proficient they were. The demographic questionnaire is given in Appendix A.

3. RESULTS. The primary question in this study is whether the presence of creaky voice in the partially creaky stimuli affects how listeners rate the holistic pitch of the utterance. A linear mixed effects regression using lme4 and lmerTest in R (Bates et al., 2018; Kuznetsova et al., 2013) was carried out with speaker (Hi-1, Hi-2, Lo-1, Lo-2) and amount of creak (modal, partially creaky, whole creak) as fixed effects and rating (1-9) as the dependent measure. Participant was included as a random intercept. Speaker was sum-coded, but modal was used as the baseline for the amount of creak. Planned comparisons were used to determine whether there were differences between the three amounts of creak for each speaker individually. A plot of the results is shown in Figure 2, and the statistical results are reported in Table 2. The findings show that almost every main effect and interaction is significant, with the exception of Hi-1*partial creak and Lo-2*whole creak. To better interpret these results, the amount of creak was compared for each speaker individually in a linear mixed effects regression with amount of creak as a fixed effect and rating as the dependent measure. Results for each speaker individually show that for Hi-1 and Hi-2, listeners give significantly higher pitch ratings for modal than for partial than for whole (Hi-1: modal vs. partial, β = -.46, t = -5.32, p < .0001; modal vs. whole, β = -1.81, t = -21.08, p < .0001; partial vs. whole, β = 1.36, t = 15.77, p < .0001; Hi-2: modal vs. partial, β = -1.20, t = -13.81, p < .0001; modal vs. whole, β = -1.80, t = -20.72, p < .0001; partial vs. whole, β = .59, t = 6.89, p < .0001). In contrast, for both Lo-1 and Lo-2, there is no significant difference between modal and partial, but both of these have significantly higher pitch ratings than whole (Lo-1: modal vs. partial, β = -.14, t = -1.70, p = .09; modal vs. whole, β = -1.05, t = -12.11, p < .0001; partial vs. whole, β = .90, t = 10.42, p < .0001; Lo-2: modal vs. partial, β = .08, t = 0.86, p = .39; modal vs. whole, β = -1.59, t = -16.54, p < .0001; partial vs. whole, β = 1.67, t = 17.40, p < .0001).
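The two coding schemes named in the model description can be illustrated with a short Python sketch. It only builds the contrast columns; it does not fit the model, and the analysis itself was run with lme4 in R. The helper names are hypothetical, and treating the last speaker level as the one coded -1 follows the usual sum-coding convention rather than anything stated in the paper.

```python
def sum_coding(levels):
    """Sum (deviation) coding: k-1 columns, with the last level coded -1 throughout."""
    k = len(levels)
    coding = {}
    for i, level in enumerate(levels):
        if i < k - 1:
            coding[level] = [1 if j == i else 0 for j in range(k - 1)]
        else:
            coding[level] = [-1] * (k - 1)
    return coding


def treatment_coding(levels, baseline):
    """Treatment (dummy) coding: the baseline level is all zeros."""
    others = [level for level in levels if level != baseline]
    return {
        level: [1 if level == other else 0 for other in others]
        for level in levels
    }


speaker_codes = sum_coding(["Hi-1", "Hi-2", "Lo-1", "Lo-2"])
creak_codes = treatment_coding(["modal", "partial", "whole"], baseline="modal")

print(speaker_codes["Lo-2"])  # [-1, -1, -1]
print(creak_codes["modal"])   # [0, 0]
print(creak_codes["whole"])   # [0, 1]
```

With this setup, coefficients for amount of creak are read as differences from the modal baseline, while speaker coefficients are read as deviations from the mean across speakers.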


-----

FIGURE 2 ABOUT HERE

-----

-----

TABLE 2 ABOUT HERE

-----

Another relevant analysis is the comparison of ratings across all of the speakers in each condition, to determine whether listeners distinguish between the speakers’ habitual pitch not only in modal voice (which is expected), but also when the speakers use creak. Comparisons within each amount of creak condition, given in Table 3, show there are significantly higher ratings for Hi-2 than for Hi-1 in the modal condition, which is consistent with the 16Hz difference between these speakers in Table 1, but no significant difference for the low-pitched speakers, who have only a 3Hz difference in the modal condition. In the partial creak condition, Hi-1 is rated as having higher pitch than Hi-2. This is not expected based on the modal pitch in the partially creaky condition (see Table 1), but the average of the modal and creaky portions of these stimuli is higher for Hi-1 (138Hz) than for Hi-2 (125.5Hz). In the whole creak condition, there are significant differences between all the speakers in the order Hi-2 > Hi-1 > Lo-1 > Lo-2. This does not completely match up with the average measured F0 for this condition (68, 84, 78, 72Hz, respectively), though only Hi-2 is evidently out of place. This is further discussed in Section 4.
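The claim that only Hi-2 is out of place in the whole creak condition can be checked mechanically. The Python sketch below uses the four average F0 values and the rating order reported in the text, with the values assigned to speakers in the order the text lists them; the helper names are hypothetical and the check is an illustration, not part of the reported analysis.

```python
# Average whole-creak F0 (Hz) per speaker, as reported in the text.
WHOLE_CREAK_F0_HZ = {"Hi-1": 84, "Hi-2": 68, "Lo-1": 78, "Lo-2": 72}

# Reported order of pitch ratings in the whole creak condition, highest first.
RATING_ORDER = ["Hi-2", "Hi-1", "Lo-1", "Lo-2"]


def order_by_f0(f0_by_speaker):
    """Speakers sorted from highest to lowest measured F0."""
    return sorted(f0_by_speaker, key=f0_by_speaker.get, reverse=True)


def out_of_place(rating_order, f0_order):
    """Speakers whose removal alone makes the two orders identical."""
    mismatched = []
    for speaker in rating_order:
        ratings_without = [s for s in rating_order if s != speaker]
        f0_without = [s for s in f0_order if s != speaker]
        if ratings_without == f0_without:
            mismatched.append(speaker)
    return mismatched


f0_order = order_by_f0(WHOLE_CREAK_F0_HZ)
print(f0_order)                              # ['Hi-1', 'Lo-1', 'Lo-2', 'Hi-2']
print(out_of_place(RATING_ORDER, f0_order))  # ['Hi-2']
```

Removing Hi-2 leaves Hi-1 > Lo-1 > Lo-2 in both orders, so the ratings of the other three speakers track their measured F0.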

-----

TABLE 3 ABOUT HERE

-----

4. DISCUSSION. This study explores whether listeners perceive the holistic pitch of a multi-word utterance to be lower when half of the utterance contains creaky voice as compared to fully modal utterances by the same speaker. The results demonstrate that for the higher pitched speakers, listeners did in fact rate the partially creaky utterances as having a lower pitch, even though the modal portions of these utterances are as high or higher in pitch than the fully modal utterances (see Table 1). For lower pitched speakers, however, there was no difference in pitch rating for the modal vs. partially creaky conditions, suggesting that if a speaker’s habitual pitch is already low enough, creaky voice is not helpful in further lowering the perceived pitch. As the average pitch for American English speaking women has been reported to be between 200-230Hz (Keating and Kuo, 2012; Pepiot, 2014; Simpson, 2009), a habitual pitch that is at least 50Hz lower than the average is likely to be considered already quite low for an American woman, though it remains to be seen what value is “low enough” for creaky voice to stop having an effect. For all speakers, the ratings for the whole creak condition are significantly lower than for partial creak.

The findings at least for the higher pitched speakers suggest that listeners are not attending just to the modal portion of an utterance when holistically determining a speaker’s habitual pitch, though this study cannot address whether listeners are averaging over the whole utterance (both modal and creaky portions), or if a different type of weighting occurs. Moreover, the creak in these stimuli only occurs over the second 50% of the utterance, since this is the typical location of phrasal creak in American English. It is therefore currently unclear whether listeners’ perception of the holistic pitch of partially creaky utterances would generalize to creak in different locations.

The results of this study also complement previous research showing that listeners can use speech samples ranging from vowels to whole sentences either to identify where in the speaker’s range a particular utterance falls, or to compare holistically across speakers (Bishop and Keating, 2012; Davidson, 2018b; Honorof and Whalen, 2005; Lee et al., 2010).
Listeners were sensitive to a ~15Hz difference in the modal voice of the two higher pitched speakers, but treated the 3Hz average difference among the lower pitched speakers as equivalent. In the partial creak condition, the ratings did not correspond to the F0 of the modal portion of these utterances, but for the higher-pitched speakers, they were consistent with a combination of the modal and creaky portions. A new result in this study is that listeners also appear to be sensitive to perceived pitch even in creaky voice, but it is not the case that the lowest creaky pitch necessarily maps onto the lowest pitch rating. While the pitch ratings among Hi-1, Lo-1, and Lo-2 are what would be expected (84Hz > 78Hz > 72Hz), Hi-2 at 68Hz actually has the highest pitch rating.


One reason for this discrepancy between Hi-2’s low F0 for whole creak but relatively high pitch ratings may be related to her anomalously high value for H1*-H2*, which is similar to the measurements for her modal tokens (and different from the expected low value for the creaky portion of the partially creaky stimuli). While individual speakers each have different values for spectral tilt measures like H1*-H2* and the literature has not confirmed that a positive versus negative difference reliably maps onto modal vs. creaky voice, it is nevertheless notable that Hi-2’s positive H1*-H2* value for whole creak is similar to the values for both her own modal voice and to Hi-1’s modal voice values (between 4-8dB), while the low-pitched speakers’ modal values are lower (between 0-2dB) (see Table 1). Kreiman and Gerratt (2010) find that English-speaking listeners are sensitive to a difference in H1-H2 of at least 3.6dB, so if the relatively high values of H1*-H2* of Hi-1 and Hi-2 are associated with a higher perceived pitch, then the unexpectedly high value of H1*-H2* for Hi-2’s whole creak tokens could cause the raters to perceive these stimuli as having a higher pitch than F0 alone would indicate. Both the perceptual integration of creaky and modal voice in perceiving pitch, and the potential effect of an outlying spectral measure like H1-H2, are consistent with research which has shown that listeners are sensitive to multiple acoustic components in the perception of pitch. As reviewed in Sections 1 and 2, spectral tilt and noise cues can also affect the perception of pitch in short stretches of non-modal phonation such as breathy, tense, and creaky voice (Kuang and Liberman, 2016; Kuang and Liberman, 2018; Silverman, 2003; Yu and Lam, 2014), though these cues seem to play a secondary role as compared to F0 (Bishop and Keating, 2012; Lee, 2009).
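The reasoning about the 3.6dB sensitivity threshold can be written out as a simple comparison. The Python sketch below is illustrative only: the threshold is the one attributed to Kreiman and Gerratt (2010) for uncorrected H1-H2, applying it to the corrected H1*-H2* values is an assumption, the 6.71dB value is Hi-2's whole creak value from the text, and the comparison values are hypothetical placeholders rather than measurements from Table 1.

```python
# Smallest H1-H2 difference listeners are reported to be sensitive to (dB).
H1H2_JND_DB = 3.6


def perceptibly_different(value_a_db, value_b_db, jnd_db=H1H2_JND_DB):
    """True if two spectral tilt values differ by at least the threshold."""
    return abs(value_a_db - value_b_db) >= jnd_db


hi2_whole_creak_db = 6.71        # reported in the text
hypothetical_creaky_db = -1.0    # placeholder for a typical negative creaky value
hypothetical_modal_db = 5.0      # placeholder within the 4-8dB modal range

print(perceptibly_different(hi2_whole_creak_db, hypothetical_creaky_db))  # True
print(perceptibly_different(hi2_whole_creak_db, hypothetical_modal_db))   # False
```

On these assumptions, Hi-2's whole creak tokens would be distinguishable in spectral tilt from creaky tokens with negative values, but not from modal tokens in the higher pitched speakers' range.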
This study was partially informed by research that has speculated that certain speakers could be implementing creaky voice in order to lower their perceived habitual pitch (Anderson et al., 2014; Yuasa, 2010; Zimman, 2018). While it remains to be seen whether the role of creaky voice in lowering the perception of the habitual pitch of a speaker scales up to entire conversations, these results suggest that especially higher pitched speakers who wish to come across as having a lower pitch could conceivably implement creaky voice to achieve this goal. However, it would appear that such a tactic could come at a cost. As some have reported (Anderson et al., 2014; Irons and Alexander, 2016; Ligon et al., 2018), at least some listeners have negative perceptions of creaky voice. For example, Anderson et al. (2014) tested whether creaky voice had a measurably negative effect on how listeners rated personality characteristics of speakers using creaky voice (‘vocal fry’) in a workplace-like setting, finding that creaky versions of the phrase ‘Thank you for considering me for this opportunity’ are given fewer positive ratings than the modal tokens. This view has been present in the popular press as well, where women are explicitly counseled to eliminate vocal fry if they wish to be taken seriously in the workplace (Chappelow, 2012; Hageman, 2013; Khazan, 2014; Mo, 2016; Wolf, 2015). However, as Ligon et al. (2018) imply, female or higher pitched speakers in particular may be boxed in on all sides, since such speakers could use creaky voice both to lower their pitch and to avoid sounding “aggressive” or “obnoxious”, which were adjectives assigned to the loud voice quality in Ligon et al., or “feminine” and “sweet”, which were assigned to high-pitched voice. In comparison, low-pitched voice was assigned almost all positive attributes, such as “cool, mature, confident”, and also “manly” (note that all of the speakers in their study were female). Different results in Parker and Borrie (2018) show that speech rate and the speaker’s pitch in modal speech also affected how listeners judged female speakers’ likeability and intelligence. In particular, at lower pitch and faster speech rates, creaky voice lowered ratings of likeability and intelligence, but faster speech at higher pitch led to higher scores on likeability when creak was present. Thus, it could be that there is only a limited combination of pitch and voice quality options that are socially preferred for female and high-pitched speakers, and that it is difficult for speakers to achieve this ‘Goldilocks’ middle ground.

In conclusion, this study demonstrates that when listeners hear utterances with a combination of modal and creaky voice, they rate speakers with a higher modal pitch as having lower pitch on the stimuli with partial creak, while ratings of partially creaky utterances for speakers with a lower modal pitch are not different as compared to fully modal utterances.
This suggests that listeners integrate the low F0 of creaky voice into their overall holistic perception of the pitch of the utterance, though if the modal pitch is already at the lower end of the range for speakers (in this case, white, professional female speakers of American English), the addition of creak at the end of the utterance may not further lower the perception of that speaker’s pitch. In addition, this study showed that listeners seem to distinguish between pitch even in fully creaky utterances, and that other measures of voice quality, like H1-H2, may contribute to perceived pitch especially if their values are notably out of line with what is expected. In the future, more investigation of how non-modal phonation in general, and measures of spectral tilt and noise more specifically, interact with F0 is important for shedding light on how listeners perceive pitch, especially in multi-word utterances.

5. REFERENCES

ABDELLI-BERUH, NASSIMA, WOLK, LESLIE and SLAVIN, DIANNE. 2014. Prevalence of vocal fry in young adult male American English speakers. Journal of Voice 28.185-90

ABDELLI-BERUH, NASSIMA, DRUGMAN, THOMAS and RED OWL, R. H. 2016. Occurrence frequencies of acoustic patterns of vocal fry in American English speakers. Journal of Voice 30.759.e11-59.e20. http://www.sciencedirect.com/science/article/pii/S0892199715002118

ANDERSON, RINDY, KLOFSTAD, CASEY, MAYEW, WILLIAM and VENKATACHALAM, MOHAN. 2014. Vocal fry may undermine the success of young women in the labor market. PLOS ONE 9.e97506. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0097506

AVELINO, HERIBERTO. 2010. Acoustic and electroglottographic analyses of nonpathological, nonmodal phonation. Journal of Voice 24.270-80

BATES, DOUGLAS, MAECHLER, MARTIN, BOLKER, BEN, WALKER, STEVEN, CHRISTENSEN, RUNE, SINGMANN, HENRIK, DAI, BIN, SCHEIPL, FABIAN, GROTHENDIECK, GABOR, GREEN, PETER and FOX, JOHN. 2018. lme4: Linear mixed-effects models using Eigen and S4. R package version 1.1-19. http://CRAN.R-project.org/package=lme4

BATLINER, ANTON, BURGER, SUSANNE, JOHNE, B. and KIESSLING, ANDREAS. 1993. MÜSLI: A classification scheme for laryngealizations. Proceedings of the ESCA workshop on prosody.176-79

BECKER, MICHAEL and LEVINE, JONATHAN. 2013. Experigen–an online experiment platform. http://becker.phonologist.org/experigen

BEHRMAN, ALISON and AKHUND, ALI. 2016. The effect of loud voice and clear speech on the use of vocal fry in women. Folia Phoniatrica et Logopaedica 68.159-66

BELOTEL-GRENIE, AGNES and GRENIE, MICHEL. 2004. The creaky voice phonation and the organisation of Chinese discourse. Proceedings of the International Symposium on Tonal Aspects of Languages, Beijing, China. https://www.isca-speech.org/archive/tal2004/papers/tal4_005.pdf

BISHOP, JASON and KEATING, PATRICIA. 2012. Perception of pitch location within a speaker’s range: Fundamental frequency, voice quality and speaker sex. Journal of the Acoustical Society of America 132.1100-12

BLANKENSHIP, BARBARA. 2002. The timing of nonmodal phonation in vowels. Journal of Phonetics 30.163-91

BLOMGREN, MICHAEL, CHEN, YANG, NG, MANWA and GILBERT, HARVEY R. 1998. Acoustic, aerodynamic, physiologic, and perceptual properties of modal and vocal fry registers. Journal of the Acoustical Society of America 103.2649-58

BOERSMA, PAUL and WEENINK, DAVID. 2018. Praat: Doing phonetics by computer [Computer program]. version 6.0.23.

BORKOWSKA, BARBARA and PAWLOWSKI, BOGUSLAW. 2011. Female voice frequency in the context of dominance and attractiveness perception. Animal Behaviour 82.55-59.

CALLIER, PATRICK. 2013. Linguistic Context and the Social Meaning of Voice Quality Variation. Georgetown University: PhD dissertation.

CHAPPELOW, CRAIG. 2012. The Verbal Tic of Doom: Why the “Vocal Fry” Is Killing Your Job Search. Fast Company. Retrieved on March 13, 2019. https://www.fastcompany.com/1834461/verbal-tic-doom-why-vocal-fry-killing-your-job-search

DAVIDSON, LISA. 2018a. The effects of pitch, gender, and prosodic context on the identification of creaky voice. Phonetica, to appear. https://www.karger.com/DOI/10.1159/000490948

DAVIDSON, LISA. 2018b. Perception of relative pitch of sentence-length utterances. Journal of the Acoustical Society of America Express Letters 144.EL89-EL94

DAVIDSON, LISA. 2019. Perceptual coherence of creaky voice qualities. Proceedings of the International Congress of Phonetic Sciences 2019, 1-5. Melbourne, Australia.

DRUGMAN, THOMAS, KANE, JOHN and GOBL, CHRISTER. 2014. Data-driven detection and analysis of the patterns of creaky voice. Computer Speech and Language 28.1233-53

EPSTEIN, MELISSA. 2002. Voice Quality and Prosody in English. UCLA: PhD dissertation.

ESPOSITO, CHRISTINA. 2012. An acoustic and electroglottographic study of White Hmong tone and phonation. Journal of Phonetics 40.466-76

FEINBERG, DAVID, DEBRUINE, LISA, JONES, BENEDICT and PERRETT, DAVID. 2008. The role of femininity and averageness of voice pitch in aesthetic judgments of women's voices. Perception 37.615-23.

FRACCARO, PAUL J., O'CONNOR, JILLIAN J. M., RE, DANIEL E., JONES, BENEDICT C., DEBRUINE, LISA M. and FEINBERG, DAVID R. 2013. Faking it: deliberately altered voice pitch and vocal attractiveness. Animal Behaviour 85.127-36.

GARELLEK, MARC. 2012. The timing and sequencing of coarticulated non-modal phonation in English and White Hmong. Journal of Phonetics 40.152-61

GARELLEK, MARC. 2019. The phonetics of voice. Handbook of Phonetics, ed. by William Katz and Peter Assmann. New York: Routledge.

GARELLEK, MARC and KEATING, PATRICIA. 2011. The acoustic consequences of phonation and tone interactions in Jalapa Mazatec. Journal of the International Phonetic Association 41.185-205.

GARELLEK, MARC and SEYFARTH, SCOTT. 2016. Acoustic differences between English /t/ and phrasal creak. Proceedings of Interspeech 2016.1054-58

GERFEN, CHIP and BAKER, KIRK. 2005. The production and perception of laryngealized vowels in Coatzospan Mixtec. Journal of Phonetics 33.311-34.

GERRATT, BRUCE and KREIMAN, JODY. 2001. Toward a taxonomy of nonmodal phonation. Journal of Phonetics 29.365-81

HAGEMAN, WILLIAM. 2013. Stop! You're hurting my ears! Chicago Tribune. Retrieved on March 13, 2019. https://www.chicagotribune.com/lifestyles/sc-fam-0723-voice-control-20130723-story.html

HENTON, CAROLINE and BLADON, ANTHONY. 1987. Creak as a sociophonetic marker. Language, speech and mind: studies in honor of Victoria A. Fromkin, ed. by Larry Hyman and C Li, 3-29. London: Routledge.

HOLLIEN, HARRY and WENDAHL, RONALD. 1968. Perceptual study of vocal fry. Journal of the Acoustical Society of America 43.506-09

HOLLIEN, HARRY and MICHEL, J.F. 1968. Vocal fry as a phonational register. Journal of Speech and Hearing Research 11.600-04

HONOROF, DOUGLAS and WHALEN, D. H. 2005. Perception of pitch location within a speaker’s F0 range. Journal of the Acoustical Society of America 117.2193-200

IRONS, SARAH T. and ALEXANDER, JESSICA E. 2016. Vocal fry in realistic speech: Acoustic characteristics and perceptions of vocal fry in spontaneously produced and read speech. The Journal of the Acoustical Society of America 140.3397-97.

ISHI, CARLOS, SAKAKIBARA, KEN-ICHI, ISHIGURO, HIROSHI and HAGITA, NORIHIRO. 2008. A method for automatic detection of vocal fry. IEEE Transactions on Audio, Speech and Language Processing 16.47-56

KANE, JOHN, DRUGMAN, THOMAS and GOBL, CHRISTER. 2013. Improved automatic detection of creak. Computer Speech & Language 27.1028-47.

KEATING, PATRICIA and KUO, GRACE. 2012. Comparison of speaking fundamental frequency in English and Mandarin. Journal of the Acoustical Society of America 132.1050-60

KEATING, PATRICIA, GARELLEK, MARC and KREIMAN, JODY. 2015. Acoustic properties of different kinds of creaky voice. Proceedings of the 18th International Congress of Phonetic Sciences, ed. by The Scottish Consortium for ICPhS 2015. Glasgow, Scotland.

KHAZAN, OLGA. 2014. Vocal Fry May Hurt Women's Job Prospects. The Atlantic. Retrieved on March 13, 2019. https://www.theatlantic.com/business/archive/2014/05/employers-look-down-on-women-with-vocal-fry/371811/

KLOFSTAD, CASEY. 2015. Candidate voice pitch influences election outcomes. Political Psychology 37.725-38

KLOFSTAD, CASEY, ANDERSON, RINDY and PETERS, SUSAN. 2012. Sounds like a winner: voice pitch influences perception of leadership capacity in both men and women. Proceedings of the Royal Society B 279.2698-704

KREIMAN, JODY. 1982. Perception of sentence and paragraph boundaries in natural conversation. Journal of Phonetics 10.163-75

KREIMAN, JODY and GERRATT, BRUCE. 2010. Perceptual sensitivity to first harmonic amplitude in the voice source. The Journal of the Acoustical Society of America 128.2085-89. https://asa.scitation.org/doi/abs/10.1121/1.3478784

KREIMAN, JODY, SHUE, YEN-LIANG, CHEN, GANG, ISELI, MARKUS, GERRATT, BRUCE, NEUBAUER, JUERGEN and ALWAN, ABEER. 2012. Variability in the relationships among voice quality, harmonic amplitudes, open quotient, and glottal area waveform shape in sustained phonation. The Journal of the Acoustical Society of America 132.2625-32

KUANG, JIANJING and LIBERMAN, MARK. 2016. The effect of vocal fry on pitch perception. Proceedings of the 41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), ed. by Zhi Ding, Wenjun Zhang and Zhi-Quan Luo. Shanghai: IEEE.

KUANG, JIANJING and LIBERMAN, MARK. 2018. Integrating voice quality cues in the pitch perception of speech and non-speech utterances. Frontiers in Psychology 9. https://www.frontiersin.org/article/10.3389/fpsyg.2018.02147

KUZNETSOVA, ALEXANDRA, BROCKHOFF, PER BRUUN and CHRISTENSEN, RUNE. 2013. lmerTest: Tests for random and fixed effects for linear mixed effect models.

LAVER, JOHN. 1980. The Phonetic Description of Voice Quality. Cambridge: Cambridge University Press.

LEE, CHAO-YANG. 2009. Identifying isolated, multispeaker Mandarin tones from brief acoustic input: A perceptual and acoustic study. Journal of the Acoustical Society of America 125.1125-37

LEE, CHAO-YANG, DUTTON, LAUREN and RAM, GAYATRI. 2010. The role of speaker gender identification in relative fundamental frequency height estimation from multispeaker, brief speech segments. Journal of the Acoustical Society of America 128.384-88

LEE, SINAE. 2015. Creaky voice as a phonational device marking parenthetical segments in talk. Journal of Sociolinguistics 19.275-302

LIGON, CLAIRE, ROUNTREY, CARRIE, VAIDYA RANK, NOOPUR, HULL, MICHAEL and KHIDR, ALIAA. 2018. Perceived desirability of vocal fry among female speech communication disorders graduate students. Journal of Voice, to appear.

MCGLONE, ROBERT. 1967. Air flow during vocal fry phonation. Journal of Speech, Language and Hearing Research 10.299-304

MCGLONE, ROBERT and SHIPP, THOMAS. 1971. Some physiologic correlates of vocal fry phonation. Journal of Speech, Language and Hearing Research 14.769-75

MELVIN, SHANNON and CLOPPER, CYNTHIA G. 2015. Gender variation in creaky voice and fundamental frequency. Proceedings of the 18th International Congress of Phonetic Sciences, ed. by The Scottish Consortium for ICPhS 2015. Glasgow, Scotland.

MENDOZA-DENTON, NORMA. 2011. The semiotic hitchhiker's guide to creaky voice: Circulation and gendered hardcore in a Chicana/o gang persona. Journal of Linguistic Anthropology 21.261-80

MO, KELSEY. 2016. Women should avoid using "vocal fry" in the workplace. The State Press. Retrieved on March 13, 2019. http://www.statepress.com/article/2016/10/spopinion-women-should-stop-using-vocal-fry-in-the-workplace

MURRY, THOMAS. 1971. Subglottal pressure and airflow measures during vocal fry phonation. Journal of Speech, Language and Hearing Research 14.544-51

NGUYEN, THI-LAN, MICHAUD, ALEXIS, TRAN, DO-DAT and MAC, DANG-KHOA. 2013. The interplay of intonation and complex lexical tones: how speaker attitudes affect the realization of glottalization on Vietnamese sentence-final particles. Proceedings of Interspeech 2013, Lyon, France. https://www.isca-speech.org/archive/archive_papers/interspeech_2013/i13_3522.pdf

OGDEN, RICHARD. 2001. Turn transition, creak and glottal stop in Finnish talk-in-interaction. Journal of the International Phonetic Association 31.139-52

OLIVEIRA, GISELE, DAVIDSON, ASHIRA, HOLCZER, RACHELLE, KAPLAN, SARA and PARETZKY, ADINA. 2016. A Comparison of the Use of Glottal Fry in the Spontaneous Speech of Young and Middle-Aged American Women. Journal of Voice 30.684-87.

PALERMO, ELIZABETH. 2013. Is the Way You Talk Killing Your Career? Business News Daily. Retrieved on March 13, 2019. https://www.businessnewsdaily.com/4020-vocal-fry-career-damage.html

PANFILI, L.M., HAYWOOD, J., MCCLOY, D.R., SOUZA, P.E. and WRIGHT, R.A. 2017. The UW/NU Corpus, Version 2.0

PARKER, MICHELLE and BORRIE, STEPHANIE. 2018. Judgments of intelligence and likability of young adult female speakers of American English: The influence of vocal fry and the surrounding acoustic-prosodic context. Journal of Voice 32.538-45

PEPIOT, ERWAN. 2014. Male and female speech: a study of mean f0, f0 range, phonation type and speech rate in Parisian French and American English speakers. Proceedings of Speech Prosody 7, ed. by Nick Campbell, Dafydd Gibbon and Daniel Hirst. Dublin, Ireland.

PRATT, TERESA. 2018. Affective sociolinguistic style: an ethnography of embodied linguistic variation in an arts high school. Stanford University: PhD Dissertation.

PUTS, DAVID ANDREW, HODGES, CAROLYN R., CÁRDENAS, RODRIGO A. and GAULIN, STEVEN J. C. 2007. Men's voices as dominance signals: vocal fundamental and formant frequencies influence dominance attributions among men. Evolution and Human Behavior 28.340-44.

REDI, LAURA and SHATTUCK-HUFNAGEL, STEFANIE. 2001. Variation in the realization of glottalization in normal speakers. Journal of Phonetics 29.407-29.

SAMLAN, ROBIN and STORY, BRAD. 2011. Relation of structural and vibratory kinematics of the vocal folds to two acoustic measures of breathy voice based on computational modeling. Journal of Speech, Language and Hearing Research 54.1267-83

SEYFARTH, SCOTT and GARELLEK, MARC. 2015. Coda glottalization in American English. Proceedings of the 18th International Congress of Phonetic Sciences. http://idiom.ucsd.edu/~mgarellek/files/Seyfarth_Garellek_2015_ICPhS.pdf

SHADLE, CHRISTINE H. 1985. Intrinsic fundamental frequency of vowels in sentence context. The Journal of the Acoustical Society of America 78.1562-67. https://asa.scitation.org/doi/abs/10.1121/1.392792

SHUE, YEN-LIANG, KEATING, PATRICIA, VICENIK, CHAD and YU, KRISTINE. 2011. VoiceSauce: A program for voice analysis. Proceedings of the XVII International Congress of Phonetic Sciences, 1846-49. Hong Kong: International Phonetic Association.

SICOLI, MARK. 2015. Voice Registers. Handbook of Discourse Analysis, ed. by Deborah Tannen, Heidi Hamilton and Deborah Schiffrin, 105-26. Chichester: John Wiley and Sons.

SILVERMAN, DANIEL. 2003. Pitch discrimination during breathy versus modal phonation. Papers in Laboratory Phonology VI, ed. by John Local, Richard Ogden and Rosalind Temple, 293-304. Cambridge: Cambridge University Press.

SIMPSON, ADRIAN. 2009. Phonetic differences between male and female speech. Language and Linguistics Compass 3.621-40

SLIFKA, JANET. 2006. Some physiological correlates to regular and irregular phonation at the end of an utterance. Journal of Voice 20.171-86

TITZE, INGO. 1994. Principles of Voice Production. Englewood Cliffs, NJ: Prentice Hall.

WHALEN, D. H. and LEVITT, ANDREA G. 1995. The universality of intrinsic F0 of vowels. Journal of Phonetics 23.349-66.

WOLF, NAOMI. 2015. Young women, give up the vocal fry and reclaim your strong female voice. The Guardian. Retrieved on March 13, 2019. https://www.theguardian.com/commentisfree/2015/jul/24/vocal-fry-strong-female-voice

WOLK, LESLIE, ABDELLI-BERUH, NASSIMA and SLAVIN, DIANNE. 2012. Habitual use of vocal fry in young adult female speakers. Journal of Voice 26.111-16

YU, KRISTINE and LAM, HIU-WAI. 2014. The role of creaky voice in Cantonese tonal perception. Journal of the Acoustical Society of America 136.1320-33

YUASA, IKUKO PATRICIA. 2010. Creaky voice: a new feminine voice quality for young urban-oriented upwardly mobile American women? American Speech 85.315-37

ZIMMAN, LAL. 2018. Transgender voices: Insights on identity, embodiment, and the gender of the voice. Language and Linguistics Compass 12.e12284. https://onlinelibrary.wiley.com/doi/abs/10.1111/lnc3.12284

1 Since Anderson et al. (2014) found that listener age may affect subjective evaluation on at least one of their categories, another version of this analysis was carried out with age as a continuous variable. While age did not have a significant main effect, there were some significant interactions between age and individual speakers. Crucially, however, there were no significant interactions between age and amount of creak. This analysis revealed that while older listeners occasionally rated individual speakers differently than the younger listeners did (either relatively higher or relatively lower depending on the speaker), the patterns among the modal, partially creaky, and whole creak conditions remained the same for all speakers regardless of listener age. Since the purpose of this study is to examine the effect of amount of creak on the holistic perception of pitch, and this did not interact with age, the age factor is not reported on further here. Likewise, an analysis including participant sex as a continuous variable showed neither a main effect of sex nor interactions with either individual speakers or amount of creak. Therefore, sex is also excluded from the analysis.