
MEASURES OF VOICE ONSET TIME: A METHODOLOGICAL STUDY

Rebecca Rae

A Thesis

Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

May 2018

Committee:

Ronald Scherer, Advisor

Jason Whitfield

Brent Archer

© 2018

Rebecca C. Rae

All Rights Reserved

ABSTRACT

Ronald Scherer, Advisor

The current investigation aimed to compare four different measurement approaches for the determination of voice onset time (VOT) for the six English stop consonants, where VOT is defined as the “burst to onset of ”. The signals of interest were the wideband airflow, microphone, electroglottograph (EGG), and the spectrographic display. A primary question was whether the use of the wideband airflow signal results in shorter VOT measurements. Two adult males and two adult females produced “CV-the-CV” utterances (e.g., “pɑ the pɑ”) containing the six English stop consonants across two conditions, habitual vs. clear speech. Visual measurements were made from the burst to the initial detection of phonation (IDP). The wideband airflow gave the shortest VOT measures. For habitual speech and for the voiceless stop consonants, the airflow signal revealed glottal airflow oscillations on average 1.7 ms sooner than the microphone signal, 8.1 ms sooner than the EGG signal, and 13.6 ms sooner than spectrographic formant detection. The VOT differences between the airflow and microphone signals were not significant. For voiced stop consonants, the airflow signal typically also gave VOT values similar to those of the microphone signal, and detected phonation on average 6 ms sooner than the formant excitation and 5 ms sooner than the electroglottograph signal. The study emphasizes the finding that the initial detection of phonation often appears earlier after the burst for the wideband airflow and microphone signals in comparison to the electroglottograph and spectrographic signals.

ACKNOWLEDGMENTS

I would like to thank my advisor, Dr. Ronald Scherer, for all of his help and guidance over the past two years. Dr. Scherer served as an excellent role model and his willingness to give his time and knowledge so generously has been very much appreciated. I would also like to thank my committee members, Dr. Jason Whitfield and Dr. Brent Archer, for providing help with statistics and document edits, and providing useful and insightful suggestions throughout the course of this research project. Finally, I would particularly like to thank my parents and all of my family for all of their support, not only on this endeavor but also on each endeavor I have embarked on throughout my academic and non-academic career.

TABLE OF CONTENTS

INTRODUCTION
  Stop Consonants and Voice Onset Time
  Voice Onset Time
  VOT Measures for Adults
  Influential Conditions
    Speech Rate
    Speaker Sex
    Speaker Age
    Place of Articulation
    Vowel Height
    Linguistic Task
  Methods Used to Measure Voice Onset Time
    Wideband Spectrogram
    Acoustic Displays
    Electroglottograph
    Wideband Airflow
  Statement of the Problem
  Research Objectives

CHAPTER I. METHODS
  IRB Approval
  Participants
  Voice, Speech, and Health Questionnaire
  Speech Stimuli
  Recording Environment
  Recording Protocol
  Data Analysis
  Statistics

CHAPTER II. RESULTS
  Measurement Methodology
  Measurement Location
  Influential Conditions
    Voicing Mode
    Place of Articulation

CHAPTER III. DISCUSSION
  Measurement Methodology
  Measurement Location
  Influential Conditions
    Voicing Mode
    Place of Articulation
  Clinical Relevance

CHAPTER IV. CONCLUSIONS

REFERENCES

APPENDIX A. INSTITUTIONAL REVIEW BOARD APPROVAL LETTER

APPENDIX B. VOICE, SPEECH, AND HEALTH QUESTIONNAIRE

LIST OF TABLES

Table

1 Voiceless Stop Consonant VOT Data (ms)
2 Voiced Stop Consonant VOT Data (ms)
3 Previously Studied Influential Factors and Conditions Relative to VOT
4 Previously Implemented VOT Analysis Methods
5 Measurement Methodology Fixed Effect Estimates
6 Measurement Location Fixed Effect Estimates
7 Voiced VOT Mode Distribution for Habitual Speech Production
8 Frequency of Pre-voicing Relative to Consonant Place of Articulation for Voiced Stop Consonants
9 Average Voiceless Stop Consonant VOT Data for Habitual Speech Production
10 Average Voiced Stop Consonant VOT Data for Habitual Speech Production
11 Average /p/ VOT Values in Habitual Speech Production
12 Average /t/ VOT Values in Habitual Speech Production
13 Average /k/ VOT Values in Habitual Speech Production
14 Average /b/ VOT Values in Habitual Speech Production
15 Average /d/ VOT Values in Habitual Speech Production
16 Average /g/ VOT Values in Habitual Speech Production

LIST OF FIGURES

Figure

1 Utility of the Wideband Airflow Signal for Burst Detection – Example A
2 Utility of the Wideband Airflow Signal for Burst Detection – Example B
3 Demonstration of the Segmentation Between the Burst and First Valley as the Initial Detection of Phonation (IDP) for the Determination of VOT – Example A
4 Demonstration of the Segmentation Between the Burst and First Valley as the Initial Detection of Phonation (IDP) for the Determination of VOT – Example B
5 Demonstration of the Points of Demarcation for the Measurements of VOT in the Coupled Microphone and Spectrogram Displays
6 Demonstration of the Points of Demarcation for the Measurements of VOT in the Wideband Airflow Signal
7 Demonstration of the Points of Demarcation for the Measurements of VOT in the Electroglottograph
8 Average /p/ VOT Values in Habitual Speech Production
9 Average /t/ VOT Values in Habitual Speech Production
10 Average /k/ VOT Values in Habitual Speech Production
11 Average /b/ VOT Values in Habitual Speech Production
12 Average /d/ VOT Values in Habitual Speech Production
13 Average /g/ VOT Values in Habitual Speech Production
14 VOT Distribution for Voiced and Voiceless Stops

INTRODUCTION

Stop Consonants and Voice Onset Time

Stop consonants are phonemes produced in the presence of a complete obstruction in the vocal tract (Pickett, 1999). During production of a stop consonant, air pressure builds in the vocal tract behind the articulatory obstruction until the articulators separate (the burst), followed by the release of air and the accompanying frication noise (Edwards, 1981). This may or may not be followed by an aspiration noise. In the English language, the stop consonant phonemes are [p, b, t, d, k, g]. Based on the distinctive feature theory proposed by Chomsky and Halle (1968), the phonemes [b, d, g] are classified as “+ voice” because they are produced with phonation, and the phonemes [p, t, k] are classified as “– voice” because they are produced without phonation. When attempting to distinguish within phonemic categories, this “voiced” and “voiceless” dichotomy can be very useful.

The early work of Lisker and Abramson (1964) analyzed this voiced – voiceless dichotomy in stop consonant productions and approached an explanation and definition of voice onset time with the following statement:

“The two features correlated with voicing and aspiration - periodic pulsing at the frequency of the voice pitch and noise in the frequency range of the higher formants - have an interesting relation to one another, at least in the case of the stops in English; each feature tends to be prominent in spectrograms only where the other is absent. Thus if a portion of a spectrographic pattern indicates the presence of voicing, then the noise feature is absent or much obscured, while if noise is strongly marked then periodic pulsing is usually not discernible. Now if we locate a pattern segment with reference to the instant of release of the stop closure, - and this event is marked by an abrupt increase in the amplitude and frequency spread of the signal - then we may define the amount or degree of voicing of a stop as the duration of the time interval by which the onset of periodic pulsing either precedes or follows release. … a difference of voicing not only separates voiced from voiceless stops, but that it equally well distinguishes aspirated from unaspirated stops, where the latter are both commonly called voiceless. The noise feature of aspiration, instead of being considered coordinate with voicing, is then regarded simply as the automatic concomitant of a large delay in voice onset. In English, at least, this seems reasonable: /b d g/ and /p t k/ probably differ everywhere in the time of voice onset relative to release, but in certain positions the presence of aspiration noise tells us something about the absolute magnitude of delay in the onset time following /p t k/ releases. … it seems reasonable to begin a study aimed at finding the acoustic features which serve as cues for the manner differentiation of stops by fixing attention on the timing relation between voice onset and the release of occlusion.” (p. 387)

“Wide-band spectrograms of the recordings were made, and from these, voice onset times were measured by marking off the interval between the release of the stop and the onset of glottal vibration, that is, voicing. The point of voicing onset was determined by locating the first of the regularly spaced vertical striations which indicate glottal pulsing, while the instant of release was found by fixing the point where the pattern shows an abrupt change in overall spectrum.” (p. 389; bolding for emphasis)

Voice Onset Time

Voice onset time (VOT) is defined as “the time difference between the release of the oral constriction for production and the onset of vocal-fold vibration” (Lisker & Abramson, 1964, 1967). VOT is typically stated as a numerical value in milliseconds (ms). VOT can be viewed as a summative variable of articulator-laryngeal coordination as it reflects temporal control between the larynx, lips, tongue, and jaw (Auzou, Ozsancak, Morris, Jan, Eustache, & Hannequin, 2000; Baken & Orlikoff, 2000). Consequently, VOT can be utilized as a functional tool to categorize acoustic aspects of a range of developmental, neuromotor, or linguistic disorders (Auzou et al., 2000; Baken & Orlikoff, 2000). Numerous researchers have explored VOT variability relative to physiological, pathological, and linguistic task differences (see below for details).

VOT Measures for Adults

Stop consonants in the English language are typically placed into one of three distinct VOT categories. Voiced phonemes /b/, /d/, and /g/ are categorized as either “short voicing lag” or “voicing lead”. During the production of “short voicing lag” phonemes, voicing occurs concurrently with or just after the burst, which results in a VOT value very close to zero or a small positive value typically under approximately 20 milliseconds (Smith, 1978; Swartz, 1991; Lin & Wang, 2011). During the production of “voicing lead” phonemes, voicing occurs before the burst, which results in a negative VOT value (Smith, 1978; McCrea & Morris, 2005). For “short voicing lag” phonemes, VOT values tend to range between 0 and 25 milliseconds with a median value of 10 milliseconds (Auzou et al., 2000). In the English language, the voiced stop consonant phonemes /b/, /d/, and /g/ are produced more frequently as “short voicing lag” phonemes (Lisker & Abramson, 1964, 1967; Smith, 1978). Lisker and Abramson (1964) found individual speakers typically utilize one voicing mode (“voicing lead” vs. “short voicing lag”) exclusively for voiced stop consonant production. In contrast, a study performed by Smith (1978) found speakers do not exclusively use one voicing mode over another and tend to switch between “voicing lead” and “short voicing lag” modes depending on the linguistic context. Additionally, Smith (1978) found the voicing mode for voiced stop consonant phonemes significantly correlates with place of articulation: speakers produced the bilabial /b/ phoneme as “voicing lead” 56% of the time, the alveolar phoneme /d/ as “voicing lead” approximately 50% of the time, and the velar /g/ phoneme as “voicing lead” 39% of the time. The voiceless stop consonant phonemes /p/, /t/, and /k/ are classified as “long voicing lag” phonemes. During the production of “long voicing lag” phonemes, voicing occurs well after the burst, which results in positive VOT values. A number of studies examining VOT values in the English language have contributed to the development of VOT data for each of these phonemes (Tables 1 and 2). These studies report a wide range of data for each stop consonant due to varying research methodologies and different controls of influential variables. Overall data trend analysis of the current English language VOT research literature reveals native English speakers primarily use only “short voicing lag” or “long voicing lag” voicing modes. For “long voicing lag” phonemes, VOT values tend to range between 60 and 100 milliseconds with a median value of 75 milliseconds (Auzou et al., 2000). Auzou et al. (2000) have made a formal call for the establishment of more definitive normative VOT data based on their findings from a critical review of the measurements of VOT.

Table 1: Voiceless Stop Consonant VOT Data (ms). Average VOT values from previous research studies for the production of voiceless stop consonants in phrases and single words. Means, standard deviations, and the corresponding ranges of the data are reported. The values reported are in milliseconds (ms).

| Researchers | # of Participants | /p/ avg. | /p/ SD | /p/ range | /t/ avg. | /t/ SD | /t/ range | /k/ avg. | /k/ SD | /k/ range | Speaking Context |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Lisker and Abramson, 1964 | 4 | 58 | N/A | 20:120 | 70 | N/A | 30:105 | 80 | N/A | 50:135 | monosyllabic CV words |
| Lisker and Abramson, 1964 | 4 | 28 | N/A | 10:45 | 39 | N/A | 15:70 | 43 | N/A | 30:85 | monosyllabic CV words in conversation and narrative |
| Klatt, 1975 | 3 | 47 | ≈11 | 42:50 | 65 | ≈11 | 53:77 | 70 | ≈11 | 66:74 | “say CV again” |
| Sweeting and Baken, 1982 | 30 | 77 | 4.8 | N/A | - | - | - | - | - | - | “It’s a CV” |
| Hardcastle et al., 1985 | 4 | 65.3 | N/A | N/A | 80 | N/A | N/A | 82.8 | N/A | N/A | single words |
| Caruso and Burton, 1987 | 8 | 62.5 | 25.4 | N/A | 71.9 | 19.4 | N/A | 74.8 | 16.0 | N/A | “Say CV again” |
| Lee et al., 1988 | 10 | - | - | - | 72.5 | 26.9 | N/A | - | - | - | “twenty one” |
| Forrest, Weismer, and Turner, 1989 | 8 | 46 | 10 | N/A | - | - | - | - | - | - | “Buy Bobby a poppy” |
| Baum and Ryan, 1993 | 10 | 85 | N/A | N/A | 95 | N/A | N/A | 110 | N/A | N/A | “Please say CVC” |
| Brown, Morris, and Weiss, 1993 | 12 F | 60.3 | 14.4 | 33:79 | 62.5 | 12.6 | 39:79 | - | - | - | “Speak CV again” |
| Yang, 1993 | 7 | 77 | - | 47:142 | 95 | - | 54:193 | 88 | - | 45:131 | monosyllabic or disyllabic CV |
| Robb, Gilbert, and Lerman, 2005 | 10 F | 73 | 15 | N/A | 83 | 18 | N/A | 88 | 16 | N/A | “It’s a CV” |
| Robb, Gilbert, and Lerman, 2005 | 10 M | 58 | 13 | N/A | 76 | 15 | N/A | 78 | 12 | N/A | “It’s a CV” |
| Morris, McCrea, and Herring, 2008 | 40 F | 58.8 | 15.7 | N/A | 78.5 | 17.7 | N/A | 75.1 | 17.1 | N/A | CV syllables |
| Morris, McCrea, and Herring, 2008 | 40 M | 55.6 | 19.2 | N/A | 69.2 | 16.2 | N/A | 73.4 | 17.8 | N/A | CV syllables |
| Overall Average (SD) | - | 60.8 | (14.7) | - | 73.7 | (14.5) | - | 79.0 | (16.8) | - | - |

Note. CV = consonant-vowel, avg. = mean, SD = Standard Deviation, N/A = value could not be determined due to logistical reasons

Table 2: Voiced Stop Consonant VOT Data (ms). Average VOT values from previous research studies for the production of voiced stop consonants in phrases and single words. Means, standard deviations, and the corresponding ranges of the data are reported. The values reported are in milliseconds (ms).

| Researchers | # of Participants | /b/ avg. | /b/ SD | /b/ range | /d/ avg. | /d/ SD | /d/ range | /g/ avg. | /g/ SD | /g/ range | Speaking Context |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Lisker and Abramson, 1964 | 4 | 1 | N/A | 0:5 | 5 | N/A | 0:25 | 21 | N/A | 0:35 | monosyllabic CV words |
| Lisker and Abramson, 1964 | 4 | 7 | N/A | 0:15 | 9 | N/A | 0:25 | 17 | N/A | 0:13 | monosyllabic CV words in conversation and narrative |
| Klatt, 1975 | 3 | 11 | ≈5 | 6:12 | 17 | ≈5 | 11:23 | 27 | ≈5 | 19:36 | “say CV again” |
| Sweeting and Baken, 1982 | 30 | 13.9 | 1.0 | N/A | - | - | - | - | - | - | “It’s a CV” |
| Caruso and Burton, 1987 | 8 | 19.7 | 8.5 | N/A | 21.4 | 4.9 | N/A | 35.2 | 8.4 | N/A | “Say CV again” |
| Forrest, Weismer, and Turner, 1989 | 8 | 12 | 3 | N/A | - | - | - | - | - | - | “Buy Bobby a poppy” |
| Baum and Ryan, 1993 | 10 | 10 | N/A | N/A | 15 | N/A | N/A | 30 | N/A | | “Please say CVC” |
| Brown, Morris, and Weiss, 1993 | 12 F | 8.4 | 6.8 | 0:26 | 12.6 | 6.4 | 0:26 | - | - | - | “Speak CV again” |
| Yang, 1993 | 7 | 17 | N/A | 9:32 | 20 | N/A | 14:2 | 31 | N/A | 15:57 | monosyllabic or disyllabic CV words |
| Robb, Gilbert, and Lerman, 2005 | 10 F | 4 | 4 | N/A | 12 | 7 | N/A | 18 | 7 | N/A | “It’s a CV” |
| Robb, Gilbert, and Lerman, 2005 | 10 M | 6 | 4 | N/A | 15 | 5 | N/A | 22 | 6 | N/A | “It’s a CV” |
| Morris, McCrea, and Herring, 2008 | 40 F | 14.9 | 6.5 | N/A | 20.6 | 9.7 | N/A | 30.7 | 12.5 | N/A | CV syllables |
| Morris, McCrea, and Herring, 2008 | 40 M | 15.5 | 7.8 | N/A | 19.5 | 8.0 | N/A | 30.0 | 12.5 | N/A | CV syllables |
| Average of Averages (SD) | - | 11.2 | (5.5) | - | 14.2 | (5.2) | - | 25.7 | (6.1) | - | - |

Note. CV = consonant-vowel, avg. = mean, SD = Standard Deviation, N/A = value could not be determined due to logistical reasons
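The “Overall Average (SD)” row of Table 1 appears to be an unweighted mean of the study means, with the spread given as a sample standard deviation across studies. The sketch below checks that reading for the /p/ column only. It is an illustration in Python under that assumption, not the procedure documented for building the table, and the function name is hypothetical.

```python
from statistics import mean, stdev

# /p/ mean VOT values (ms) as listed in Table 1, one per table row.
# Studies that report women and men separately contribute two values.
P_MEANS_MS = [58, 28, 47, 77, 65.3, 62.5, 46, 85, 60.3, 77, 73, 58, 58.8, 55.6]


def average_of_averages(study_means):
    """Return the unweighted mean and the sample SD of a list of study means."""
    return mean(study_means), stdev(study_means)


avg, sd = average_of_averages(P_MEANS_MS)
print(f"/p/ overall average: {avg:.1f} ms (SD {sd:.1f})")
```

Under this reading the /p/ column rounds to 60.8 (14.7), in agreement with the table. Because each row counts once regardless of sample size, a study with 3 speakers carries the same weight as one with 40.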

Influential Conditions

As VOT is a durational measure, it can change as a result of physiological differences (e.g., age), differences in pathological status (e.g., depression), and linguistic task differences (e.g., speech rate; McCrea & Morris, 2005; Auzou et al., 2005). Potential effects of these various conditions tend to differ across the literature due to varying research methodologies and designs as outlined in Table 3.

Speech Rate. VOT has been shown to vary as a function of speech rate, with faster speech rates resulting in shorter VOT values (Baum & Ryan, 1993; Kessinger, 1997; Volaitis & Miller, 1992; Diehl, Souther, & Convis, 1980; Miller, Green, & Reeves, 1986).

Speaker Sex. Mixed findings in the literature reveal an undetermined effect of speaker sex on VOT production. Some studies show no main effect of sex on VOT (Morris, McCrea, & Herring, 2008; Whiteside, Hanson, & Cowell, 2004; Yu, De Nil, & Pang, 2015), whereas other studies show evidence of a main effect of sex on VOT (Thomas, 2012; Karlsson, Zetterholm, & Sullivan, 2004; Swartz, 1992; Ryalls, Zipprer, & Baldauff, 1997). Overall, studies showing a main effect of sex on VOT duration found men to have shorter VOT values than women for both voiced and voiceless stops (Thomas, 2012; Swartz, 1992; Ryalls, 1997). However, one study by Karlsson, Zetterholm, and Sullivan (2004) found men produce longer VOT values than women. It should also be noted that some researchers have reported mixed findings with regard to the effect of speaker sex. One study found a significant effect of sex on VOT for voiceless stops but no significant effect of sex on VOT for voiced stops (Robb, Gilbert, & Lerman, 2005).

Table 3: Previously Studied Influential Factors and Conditions Relative to VOT.

Speech Rate
  Significant effect: Significant effect of speech rate; faster speech rates equate to shorter VOT values (Baum and Ryan, 1993; Kessinger, 1997; Volaitis and Miller, 1992; Diehl, Souther, and Convis, 1980; Miller, Green, Reeves, 1986; Jancke, 1994)
  No effect: N/A (no studies found)

Speaker Sex
  Significant effect:
  - Significant effect of sex; females produce longer VOTs (Swartz, 1992; Ryalls, 1997; Robb, Gilbert, Lerman, 2005; Thomas, 2012)
  - Significant effect of sex for voiced stops; males produce longer VOT (Karlsson, Zetterholm, Sullivan, 2004)
  No effect: No significant effect of sex (Whiteside, Hanson, Cowell, 2004; Robb, Gilbert, and Lerman, 2005; Morris, McCrea, Herring, 2008; Yu et al., 2015)

Speech Clarity
  Significant effect: Significant effect of clear speech on VOT; VOT longer for V- stops in the word initial position for clear vs. habitual speech
  No effect: N/A (no studies found)

Fundamental Frequency
  Significant effect: Significant effect for V- stops; VOT shorter for high Fo than lower to mid Fo (McCrea & Morris, 2005)
  No effect: No main effect of Fo for V+ stops (McCrea & Morris, 2005)

Lung Volume
  Significant effect:
  - VOT generally longer at high lung volumes and shorter at low lung volumes (Hoit, Solomon, Hixon, 1993)
  - Significant effect of altitude on VOT; VOT shorter at higher altitudes; attributed to smaller available lung volumes (Lieberman, Protopapas, and Kanki, 1995)
  No effect: N/A (no studies found)

Age
  Significant effect: Significant effect of age group on VOT (Thomas, 2012; Yu, 2015)
  No effect:
  - No significant effect of age for phonemes /b/ and /p/ (Sweeting and Baken, 1982)
  - No significant effect of age for velar stop production (Petrosino, Colcord, Kurcz, Yonker, 1993)
  - No significant effect of age for phonemes /b, p, t, d/ when comparing five year old children to adults (Koenig, 2000)

Place of Articulation
  Significant effect: Significant effect of place of articulation; velar > alveolar > bilabial (Klatt, 1975; Smith, 1978; Volaitis and Miller, 1992; Baum and Ryan, 1993; Jancke, 1994; Kessinger, 1997; Robb, Gilbert, Lerman, 2005; Fischer & Goberman, 2010)
  No effect: N/A (no studies found)

Trained vs. Untrained Singers
  Significant effect: N/A (no studies found)
  No effect: No significant difference between trained and untrained female singers for VOT production (McCrea, Morris, Richard, 2007)

Hormones
  Significant effect: N/A (no studies found)
  No effect: No significant menstrual cycle phase effect on overall VOT duration (Whiteside, Hanson, and Cowell, 2004)

Neurological Condition
  Significant effect: N/A (no studies found)
  No effect:
  - No significant effect of PD on VOT values (Fischer and Goberman, 2010)
  - No significant effect of Alzheimer’s disease on VOT production (Baker, Ryalls, Brice, & Whiteside, 2007)
  - No significant effect of ALS on VOT values (Caruso and Burton, 1987)

Dysphagia
  Significant effect: N/A (no studies found)
  No effect: No consistent significant effect on VOT production; some statistical significance for only a few of the participants (Ryalls et al., 1999)

Vocal Pathologies
  Significant effect: N/A (no studies found)
  No effect: No significant effect of vocal nodules on VOT production (Marciniec, 2009)

PWS vs. “Typical” Controls
  Significant effect: N/A (no studies found)
  No effect: No significant group difference for people who stutter (Watson and Alfonso, 1982)

Vocal Intensity/Loudness
  - No studies have directly examined the effects of vocal loudness on VOT duration in adult speakers
  - Knuttila (2011) reported this same finding; however, Knuttila provided a prediction about the effects of speaker “loudness” on VOT based on the Hoit et al., 1993 study on lung volume and VOT.

Race/Ethnicity
  Significant effect: Significant effect of race; African American individuals, on average, produce more pre-voicing VOTs in comparison to Caucasian individuals (Ryalls, 1997)
  No effect: N/A (no studies found)

Succeeding Vowel Height
  Significant effect:
  - Significant effect of vowel height on VOT; VOT ratios longer for high compared to low vowels (Fischer & Goberman, 2010)
  - Significant effect of vowel height on VOT; VOT is 15% longer before the high vowels /ɪ/ and /u/ than before mid to low vowels /æ/ and /ɑ/ (Klatt, 1975)
  No effect: No significant effect of vowel context on velar VOT (Petrosino, Colcord, Kurcz, Yonker, 1993)

Linguistic Task
  Significant effect:
  - Significant effect of syllable number; VOT longer in monosyllable words in comparison to multisyllabic words/utterances (Yu et al., 2015)
  - VOT for /p, t, k/ in two syllable words on avg. 8% shorter than in the corresponding one syllable word (Klatt, 1975)
  - Longer utterance equates to a shorter VOT value (Lisker and Abramson, 1964)
  No effect: N/A (no studies found)

Depression
  Significant effect: Significant effect of depression on VOT; people with a diagnosis of depression have significantly shorter VOT values (Flint, Black, Campbell-Taylor, Gailey, and Levinton, 1992)
  No effect: N/A (no studies found)

Pulmonary Insufficiencies
  Significant effect: N/A (no studies found)
  No effect: No significant effect of obstructive pulmonary disease (asthma) for VOT of word initial /t/ (Lee, Chamberlain, Loudon, and Stemple, 1988)

Environmental Setting
  Significant effect: Significant effect of environmental setting (inside sound booth vs. non-laboratory setting) on VOT values; men and women produce shorter VOT values in a non-laboratory setting (Robb, Gilbert, Lerman, 2005)
  No effect: N/A (no studies found)

Note. V+ = voiced stops /b, d, g/; V- = voiceless stops /p, t, k/

Speaker Age. Speaker age is another area of VOT research that has resulted in mixed findings. Some researchers have found a main effect of age (Thomas, 2012; Yu, De Nil, & Pang, 2015) where older individuals have longer VOT values, whereas other researchers have found no main effect of age on VOT duration (Sweeting & Baken, 1982; Petrosino, Colcord, Kurcz, & Yonker, 1993; Koenig, 2000).

Place of Articulation. Numerous researchers have found a significant effect of place of articulation on overall VOT duration for voiced and voiceless stop consonants. Researchers have found velar stops have the longest VOT values, bilabial stops the shortest VOT values, and alveolar stops intermediate VOT values (Smith, 1978; Fischer & Goberman, 2010; Baum & Ryan, 1993; Kessinger, 1997; Klatt, 1975; Volaitis & Miller, 1992; Robb, Gilbert, & Lerman, 2005; Jancke, 1994).

Vowel Height. A few researchers have found VOT values to be longer when the consonant is succeeded by the high vowel /i/ in comparison to the low vowel /a/ (Fischer & Goberman, 2010; Klatt, 1975). Klatt (1975) reported VOT is 15% longer before the high vowels /ɪ/ and /u/ than before mid to low vowels /æ/ and /ɑ/. Specifically examining velar stops, Petrosino et al. (1993) found no effect of vowel height on VOT.

Linguistic Task. The length of the linguistic task (as indicated by the number of syllables) appears to correlate with VOT duration. Studies have shown the longer the linguistic task (e.g., phrase or sentence production vs. syllable production), the shorter the corresponding VOT value (Klatt, 1975; Yu, De Nil, & Pang, 2015; Lisker & Abramson, 1964).

Additionally, researchers have examined the effects of fundamental frequency, training in singing, lung volume, hormones, neurological condition, dysphagia, vocal pathology, fluency disorders, race/ethnicity, hearing capability, depression, pulmonary insufficiency, and environmental setting on overall VOT duration. Studies that have examined the potential impacts of these variables are outlined in Table 3.

Methods Used to Measure Voice Onset Time

Wideband Spectrogram. Lisker and Abramson’s (1964) pioneering cross-linguistic study originally proposed the use of the wideband spectrogram as an acoustic display to accurately measure VOT. Wideband spectrograms display a time by frequency by amplitude representation of speech output in which time is on the x-axis, frequency on the y-axis, and amplitude on the z-axis, shown as the darkness of the depicted frequencies (Marciniec, 2009). Lisker and Abramson indicated that analysis of the wideband spectrogram reveals a number of salient acoustic cues vital to determining VOT. Salient acoustic cues in the spectrum directly correspond to physiological correlates of stop consonant production. The release of vocal tract constriction is demarcated in the spectrogram by an abrupt spectral change, and the onset of voicing is demarcated by the first regularly spaced vertical striation found in the formant structure. In some utterances, it becomes too difficult to accurately determine the location of the release burst or the onset of voicing on the spectrogram (Auzou et al., 2000). Detecting the burst becomes problematic if a speaker fails to achieve full articulatory closure in his or her production of a stop consonant, as is the case with individuals who have various neurological conditions (Auzou et al., 2000). Determining the onset of regular striations on the spectrogram becomes difficult in certain linguistic contexts due to the effects of voicing in the preceding and succeeding phonemes (Lisker & Abramson, 1964). In addition, identifying the location of formant excitation depends on the strength of the glottal source as well as the clarity of the spectrogram, and thus is considered a potentially difficult task for the operator. More recently, Abramson and Whalen (2017) indicated that using the wideband spectrogram as an accurate measure of VOT becomes problematic at times due to a significant amount of ambiguity and smearing in the temporal dimension. Smearing of the signal in the temporal dimension makes it difficult to determine the onset of the first regularly spaced vertical striation, which is indicative of the physiological onset of voicing. In order to account for these limitations, researchers have adopted acoustic oscillographic signals in an effort to measure VOT more precisely.

Acoustic Displays. Several researchers have adopted the use of the time-synced acoustic oscillographic display in conjunction with the spectrographic display in the attempt to obtain an accurate measure of VOT. In the oscillographic display of the microphone signal, often called the audio signal, the release of the vocal tract constriction is demarcated as an overt transient and the onset of voicing is demarcated by the initiation of “periodicity” (it is noted that human voicing is not periodic but quasi-periodic or nearly periodic, but the earlier literature repeatedly uses the word “periodic” in the VOT context). For voiceless stop phonemes, frication and aspiration are identified as noise signals located between the burst and the onset of voicing.

When precise measurement using the spectrogram is difficult due to smearing in the temporal dimension, the audio signal can help further differentiate what is less well observed in the spectrogram. Consistently locating the transient (burst) on the audio signal and accurately determining initiation of phonatory oscillation may also be difficult due to the ambiguity of the audio signal, but the timing issues present with the spectrogram are absent.

Digitized audio signal speech analysis typically utilizes manual identification of both the burst and initiation of voicing in the signal. Consequently, the VOT data rely on researcher judgement to establish an accurate VOT value. Once a researcher determines the location of the burst and onset of voicing, the software provides the moment in time for those events, and thus the elapsed time difference between the two selected points, the latter being the VOT.
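The elapsed-time step described above amounts to a subtraction and a unit conversion. A minimal sketch in Python follows, assuming the two events were marked as sample indices in a recording with a known sampling rate. The function name, argument names, and example values are hypothetical and are not taken from the software used in any of the cited studies.

```python
def vot_ms(burst_sample, voicing_onset_sample, sample_rate_hz):
    """Return VOT in milliseconds from two manually marked sample indices.

    VOT is the voicing onset time minus the burst time, so a voicing onset
    marked before the burst gives a negative value (voicing lead).
    """
    if sample_rate_hz <= 0:
        raise ValueError("sample rate must be positive")
    elapsed_samples = voicing_onset_sample - burst_sample
    return elapsed_samples * 1000.0 / sample_rate_hz


# Hypothetical marks on a recording sampled at 20 kHz.
print(vot_ms(12000, 13300, 20000))  # voicing marked after the burst
print(vot_ms(12000, 11000, 20000))  # voicing marked before the burst
```

The computation itself is trivial; the variability discussed in this section comes from where the operator places the two marks, not from the arithmetic.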

An important point to be made for the current project is that, across the literature, researchers have interpreted the initiation of voicing in the signal differently (Table 4). In addition to identifying the first glottal striation in the spectrogram, at which formant structure has been excited, researchers working with the audio signal have historically selected the earliest overt “peak”, and in other studies the “valley”, in the quasi-periodic audio signal. In other words, researchers typically select the first lowest point of amplitude or first highest point of amplitude in the quasi-periodic audio signal. It is noted there may be initial vocal fold oscillation that does not create sufficient acoustic energy to create glottal striations and excite the formants. Consequently, measuring from the burst to the first glottal striation may yield longer VOT values for voiceless stops than measuring from the burst to the moment when glottal oscillation actually begins, marking voicing onset, which can be seen in the audio signal. In cases where a valley or peak of the initial quasi-periodic signal is chosen, a full half of the phonatory cycle may separate these two measures. Such a choice ignores the true onset, which would be the moment in time when the nearly periodic phenomenon begins. These differences in measurement methodology may in part contribute to the variability in the VOT values reported in the literature (Tables 1 and 2).
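The size of the peak-versus-valley discrepancy can be estimated from the fundamental frequency, since half a phonatory cycle lasts half of one period. The Python sketch below works through that arithmetic. The function name is hypothetical, the cycle is assumed to be symmetric, and the fundamental frequencies of 100 Hz and 200 Hz are round illustrative values rather than measurements from this study.

```python
def half_cycle_ms(f0_hz):
    """Duration of half a phonatory cycle, in ms, at fundamental frequency f0_hz."""
    if f0_hz <= 0:
        raise ValueError("fundamental frequency must be positive")
    period_ms = 1000.0 / f0_hz  # one full cycle
    return period_ms / 2.0      # assumed peak-to-valley spacing (symmetric cycle)


for f0 in (100.0, 200.0):  # illustrative values only
    print(f"F0 = {f0:.0f} Hz -> half cycle = {half_cycle_ms(f0):.1f} ms")
```

At these illustrative values, choosing the first peak rather than the first valley would shift a VOT measurement by roughly 2.5 to 5 ms.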

Electroglottograph. Since the early 1980s, there has been a steady increase in the use of electroglottography (EGG) for research and clinical applications. Baken (1992) attributed this increase to an increased interest in vocal disorders and advances in vocal physiology. Electroglottographic signals have also been used to obtain VOT measurements. Electroglottography is a non-invasive, innocuous, and inexpensive recording procedure that can be used to study limited and inferred aspects of the vibratory behavior of the vocal folds during speech production (Titze, 1990; Baken, 1992). A high frequency, low amperage, electric signal is passed between two electrodes positioned on the two laryngeal laminae (Titze, 1990).

Table 4: Previously Implemented VOT Analysis Methods. Direct quotations and corresponding illustrative examples of measurement methodology are provided where applicable.

Researchers: Lisker and Abramson, 1964
Method: Wideband Spectrogram
Methodology: “Wide-band spectrograms of the recordings were made, and from these, voice onset times were measured by marking off the interval between the release of the stop and onset of glottal vibration… The point of voicing onset was determined by locating the first of the regularly spaced vertical striations which indicate glottal pulsing, while the instant of release was found by fixing the point where the pattern shows an abrupt overall change in the spectrum.” (p. 389)

Researchers: Klatt, 1975
Method: Wideband Spectrogram
Methodology: “The VOT is indicated by the sudden onset of vertical striations in the second and higher formants…. vertical lines have been drawn on the spectrograms at plosive release, at the end of visible frication noise, and at voicing onset.” (p. 687)

Researchers: Watson and Alfonso, 1982
Method: Wideband Spectrogram
Methodology: “Obtained from wide-band spectrograms… onset of voiced vowels was taken to be the first regular vocal fold pulse”. (p. 225)
Illustration Example: Illustration not provided by researchers

Researchers: Caruso and Burton, 1987
Method: Wideband Spectrogram
Methodology: “Measured from the onset of the oral release of the stop-plosive to the onset of voicing as indicated by the first regularly occurring vertical striation in the second and higher formants.” (p. 81)
Illustration Example: Illustration not provided by researchers

Researchers: Forrest, Weismer and Turner, 1989
Method: Wideband Spectrogram
Methodology: “Wideband spectrograms were made… all subsequent measures were made from these digitalized records …. onsets of formant transitions were defined by a minimum of 20-Hz change over a 20 ms interval,” (p. 2611)

Researchers: Petrosino, Colcord, Kurcz, and Yonker, 1993
Method: Wideband Spectrogram and Oscillogram
Methodology: “VOT was measured from the plosive release burst to the first distinct vertical voicing striation for the succeeding vowel …. waveform display was used to substantiate onset of either the release burst (spike) or voicing (periodicity) for the vowel” (p. 86)
Illustration Example: Illustration not provided by researchers

Researchers: Baum and Ryan, 1993
Method: Oscillogram
Methodology: “…. calculated from the onset of the target-initial stop consonant burst to the onset of periodicity correspond to the following vowel.” (p. 433)
Illustration Example: Illustration not provided by researchers

Researchers: Brown, Morris, and Weiss, 1993
Method: “oscillographic traces of intraoral pressure and the voicing signal” (p. 311)
Methodology: “Voice onset time was derived from the point of the sudden drop in pressure, which represents to consonant release, as depicted by point B, to the onset of voicing as indicated by point C.” (p. 331)

Researchers: Jancke, 1994
Method: “analogue processing”
Methodology: “Original recordings were played back and filtered by two band pass filters to obtain signals indicating phonation and articulation” (p. 25)

Researchers: Lieberman, Protopapas, 1995
Method: Oscillogram
Methodology: “Cursors have been placed at the onset of the burst that was caused by opening the lips and at the onset of periodicity that indicated vocal fold vibration.” (p. 858)

Researchers: Ryalls, Gustafson, and Santini, 1999
Method: Oscillogram and Auditory Cues
Methodology: “The VOT interval was located by placing a cursor at the onset of the burst, which indicated the release of the stop consonant. The second cursor was placed at the highest point of the first regularly appearing period of the vowel …. the examiner also listened to the marked portion to ensure the burst had been properly isolated” (p. 171)
Illustration Example: Illustration not provided by researchers

Researchers: Koenig, 2000
Method: Unsmoothed Airflow Signal
Methodology: “The two events that define VOT, namely stop release and voicing onset, were located in the airflow signal as shown in Figure 1. The stop releases in the target syllable were defined according to the local peak in the second time derivative of the smoothed flow signal, representing the rapid airflow increase that occurs upon oral release. Voicing onset was set at the first visible pulse in the original (lightly smoothed) flow signal, and VOT was calculated as the difference between these two times…. Therefore, tokens of /b, d/ that showed continuous voicing through the closure were assigned a VOT values of 0 ms.”

Researchers: Karlsson, Zetterhold, and Sullivan (2004)
Method: Oscillographic Waveform
Methodology: “The release of the plosive was marked at the last zero crossing in the waveform before a transient … the onset of voicing was marked at the last zero crossing before the onset of periodicity in the waveform” (p. 317)

Researchers: Whiteside, Hanson, and Cowell, 2004
Method: Oscillographic Waveform and Sound Pressure Waveforms
Methodology: “Onset of voicing (marked by the first visible sign of low frequency periodic acoustic activity in the spectrograms).... point of closure release was taken as the transient burst of the plosive’s release… where measures of VOT needed validation, sound pressure waveforms were used” (p. 45)
Illustration Example: Illustration not provided by researchers

Researchers: McCrea and Morris, 2005
Method: Oscillogram and Spectrogram
Methodology: “A time marker was placed at the onset of the noise burst of each stop and another marker at the onset of steady-state vocal fold vibration. Steady-state vocal fold vibration was determined using the combined appearance of the first vertical striation in the first and second formants on the sound spectrogram and the first downward peak of the complex vowel waveform on the oscillogram trace.” (p. 1016)
Illustration Example: Illustration not provided by researchers

Researchers: Robb, Gilbert, and Lerman, 2005
Method: Wideband Spectrogram
Methodology: “… left cursor was positioned at the burst release and the right cursor was placed at the first instance of vocal fold vibration at the level of the second formant... instances in which voicing occurred before the burst release were measured by placing the left cursor at the onset of the pre-burst voicing and the right cursor was positioned at the burst release” (p. 128)
Illustration Example: Illustration not provided by researchers

Researchers: Lane and Perkell, 2005
Method: Wideband Spectrogram and Time Waveforms
Methodology: “The acoustic beginning of the consonant (cessation of higher frequency energy for the preceding vowel) … and the onset of voicing for the following vowel (beginning of the first regular glottal cycle).” (p. 1335)

Researchers: Morris, Gorham-Rowan, and Herring, 2007
Method: Oscillogram and Wideband Spectrogram
Methodology: “The start of the noise which signifies the release of the stop consonant was detected on the spectrogram and marked using a hand controlled cursor. The onset of vocal fold vibration was detected on the oscillogram and marked using a second hand controlled cursor. The onset of vocal fold vibration was defined as the first upward peak of regularly occurring oscillations of the waveform representing vocal fold movement. The VOT was the time that elapsed between the plosive release and the onset of vocal fold vibration”. (p. 115)
Illustration Example: Illustration not provided by researchers

Researchers: McCrea and Morris, 2007
Method: Oscillogram and Wideband Spectrogram
Methodology: “VOT was measured by placing a time marker at the onset of the noise burst of each stop, and another market at the onset of steady-state vocal fold vibration. Steady-state vocal fold vibration was determined using the combined appearance of the first vertical striation in the second formant on the sound spectrogram and the first down ward peak of the complex vowel waveform on the oscillogram trace.”
Illustration Example: Illustration not provided by researchers

Researchers: Torre and Barlow, 2009
Method: Wideband Spectrogram
Methodology: “Determination of the steady-state portion of each vowel and of VOT was subjective… Praat software provided the specific objective frequency data for F0 and the three formant frequencies.” (p. 328)
Illustration Example: Illustration not provided by researchers

Researchers: Morris, McCrea, and Herring, 2008
Method: Oscillogram and Spectrogram
Methodology: “Onset of the VOT was defined as the burst of noise energy in the spectrographic display …offset of VOT was defined as the first peak of the regular wave motion on the oscillographic display that was simultaneous with the glottal voicing bar on the spectrogram”. (p. 31)
Illustration Example: Illustration not provided by researchers

Researchers: Fischer and Goberman, 2010
Method: Raw Audio Waveform and Wide-band Spectrogram
Methodology: “VOT was determined by measuring the interval from the onset of the initial stop burst to the onset of periodicity associated with the vowel… vertical cursors were placed at these two time points and the time between the cursors was calculated as VOT … the beginning/end of the vocalic nucleus were determined by the presence of the first formant (F1) combined with energy of a higher formant” (p. 24-25)
Illustration Example: Illustration not provided by researchers

Researchers: Thomas, 2015
Method: Oscillogram and Wideband Spectrogram
Methodology: “Manually measured for each CV token using vertical cursors… a vertical cursor was placed at the onset of the spectral burst on the spectrogram, and the second vertical cursor was placed on the onset of glottal vibration” (p. 12)

Researchers: Nelson and Wedel, 2017
Method: Oscillogram and Wideband Spectrogram
Methodology: “Each word was annotated by hand in Praat for the beginning of the stop closure, the beginning of the burst, and the beginning of the following … We excluded tokens with pre-voicing, tokens with no identifiable burst, and tokens with closure durations or voice onset times that were more than 2 standard deviations from the specific speaker’s mean for that consonant.” (p. 55)
Illustration Example: Illustration not provided by researchers

The recorded electroglottographic signal represents changes in the overall conductance between these electrodes. During the glottal closed phase, the vocal fold tissues meet one another and conductance increases (Titze, 1990). As the vocal folds “peel apart” from one another, there is a decrease in overall conductance (Titze, 1990). The EGG signal is the demodulation of the conductance change throughout the phonatory cycle, with a waveshape that is determined by the degree of glottal adduction, subglottal pressure, vocal fold length, and size of the vocal folds. The waveform becomes obvious as soon as the vocal folds touch during phonatory onset. Most researchers assume (and research supports the notion) that the EGG waveform shape then corresponds to the changing contact area between the two vocal folds during phonation (Scherer et al., 1988; Hampala et al., 2016).

Speech research benefits greatly from the use of EGG as it generates a signal essentially free of supraglottal and acoustic influence (Baken, 1992). This allows for accurate simultaneous measurement of other relevant variables such as the wideband airflow and the microphone signal. One significant issue with the use of EGG is that the path between the electrodes is not a straight line due to the heterogeneity of the neck tissues and their resulting resistive properties (Baken, 1992). The neck simply behaves as a volume conductor and the electrical current from the electrodes radiates from one electrode and converges on the other (Baken, 1992). As a result, researchers interested in using EGG must take into account the impedance effects of the neck tissues (Baken, 1992). A limitation of electroglottography is that it relies on vocal fold contact to show a significant change in the EGG waveform; thus, it is unable to capture the first few cycles of vocal fold oscillation that occur prior to the initiation of contact when the vocal folds initially vibrate without contact. For the production of initial stop consonants, vocal fold oscillation would occur prior to vocal fold contact per se. Thus, for voiceless stop consonants, the time from the burst to the first indication of glottal activity using the EGG should typically result in a VOT value that is too long (by several glottal cycles), assuming that the intent for the VOT measure is to capture the moment in time when any indication of vocal fold oscillation is present.

Wideband Airflow. Aerodynamic assessment of vocal function primarily focuses on obtaining estimates of average glottal airflow and average subglottal air pressures (Kent & Ball, 2000). Recently, airflow has begun to emerge as a useful tool for analyzing speech production tasks. Aerodynamic assessment utilizes noninvasive procedures to measure airflow, typically through use of an aerodynamically controlled facemask system. The Glottal Enterprises aerodynamic system, which is often referred to as the “Rothenberg mask”, is a commonly used pneumotachograph in clinical speech science research. The aerodynamic system consists of a facemask with holes covered with wire mesh, an intraoral air pressure transducer with small tube attachment, and an additional air pressure transducer for pneumotach-measurement of airflow through the mask during speech production. The output airflow value provided by the mask is proportional to the pressure drop across the mask.

The airflow is “wideband” because the acoustic flow is captured from zero Hz to approximately 2000 Hz, so that the influence of the first two formants is obvious in the airflow signal (which looks quite similar to the microphone signal). For the proposed research project, the wideband nature of the airflow signal provides greater sensitivity to time and to airflow change than average airflow does. This results in the potential capturing of airflow changes as the result of initial vocal fold oscillation even when the acoustic signal is not strong.

One significant issue with the measurement of wideband airflow using the controlled facemask system is ensuring there is a complete seal between the participant’s face and the transducer mask (Holmberg, Hillman, Perkell, Guiod, & Goldman, 1995; May & Scherer, 2017). Acquiring a perfect seal is vital to measuring valid airflow as even a small leak in the mask can have a large impact on the amplitude-based flow data (Holmberg et al., 1995; May & Scherer, 2017).

Statement of the Problem

There have been numerous studies of voice onset time through spectrographic, oscillographic (microphone), and glottographic measurement approaches. Only one previous research methodology (Koenig, 2000) has taken into account airflow as a potentially accurate signal to measure the burst and phonatory onset for VOT measurement (as far as the present researchers are aware). It is possible that the most accurate measure of when the vocal folds actually begin their oscillation during phonatory onset is revealed through the airflow signal, because airflow may be more sensitive to this initial oscillation by showing the modulation of the airflow than the acoustic signal is in showing acoustic modulation.

Thus, the current research project will comparatively study four different measurement approaches for the determination of voice onset time for the production of the six English stop consonants /p, t, k, b, d, g/. The signals of interest are the acoustic or audio signal, the spectrographic display, the electroglottographic (EGG) signal, and the wideband airflow. The approach is visual marking by cursor by an operator while viewing the displays, the traditional method, rather than by automatic extraction of events (Jancke, 1994).

This study introduces the phrase “initial detection of phonation” (IDP), which is the moment in time of the initial indication of vocal fold or glottal oscillation in the waveform of the acoustic, airflow, or electroglottographic signal. This is visualized as the initial modulation of the signal whose period of oscillation is strongly related to that of the immediately subsequent voicing cycles.

Research Objectives

The research questions to be answered are:

1) When definitive glottal oscillatory action is inferred from the wideband airflow signal, when does this occur in the acoustic, spectrographic, and EGG signals?

Hypothesis: The initial detection of phonation (IDP) observed in the acoustic and EGG signals will occur at the same time, or lag behind (occur at a later time), in comparison to the IDP in the wideband airflow signal. Preliminary investigation (the pilot project for this thesis) suggested that there may be an earlier IDP in the wideband airflow signal in comparison to the acoustic and EGG signals for the production of the six CV words. The “definitive” glottal action in the spectrogram will be identified as the presence of energy in the first and second formant region. It is noted that if the IDP for the airflow is earlier than for the other signals, the VOT for voiceless stops and for voiced stops with positive VOT values will be of shorter duration. However, the initiation of airflow from the zero baseline may still be a useful tool in identifying the presence of the burst for negative VOT values.

2) For the signals studied, what is the resulting order of the duration of the VOT for voiceless stop consonants?

Hypothesis: The order from shortest to longest VOT values will be the airflow signal, the audio signal, the EGG signal, and then the moment in the spectrographic display when there is sufficient acoustic excitation to show energy in the first and second formants of the vowel. The first hypothesis suggests that the airflow may give the shortest VOT values. The EGG signal should provide longer VOT values compared to the audio signal because the presence of the EGG signal waveform relies on vocal fold contact, which should occur after a short duration of vocal fold oscillation as the vocal folds move closer during dynamic adduction. The hypothesis suggests that there may not be sufficient acoustic energy present to excite formants until there is near vocal fold closure, and thus the spectral presence of formants may occur at nearly the same time as the EGG onset. For voiced stops with a positive VOT value, the longest VOT may again be with the EGG signal and spectral formant presence.

3) How do the VOT values vary across the conditions examined?

Hypothesis: VOT will vary as a function of place of articulation and voicing. Numerous researchers have found a significant effect of place of articulation on overall VOT duration, with velar stop production resulting in the longest VOT values and bilabial stop production resulting in the shortest VOT values (see the review in the introduction section).

CHAPTER I. METHODS

IRB Approval

The current study was reviewed and approved (approval number 1046800-2) by the Bowling Green State University Institutional Review Board (Appendix A).

Participants

The current research study recruited four healthy participants (2 adult males and 2 adult females) aged 21 to 22 years (Appendix B). All participants were native speakers of English, had normal voice and speech, reported no history of receiving training in professional speech, voice, or singing, and reported no history of smoking or current illness. All participants provided signed informed consent before initiation of the study and received a fifteen-dollar gift card for their participation in the study.

Voice, Speech, and Health Questionnaire

Prior to recording speech utterances, each participant was asked to complete a brief health, speech, and voice questionnaire, which is attached as Appendix B. The main purpose of this form was to help account for individual participant characteristics that may influence the participant’s voice and speech on the day of the recording.

Speech Stimuli

Each of the six stop consonants (/p, b, t, d, k, g/) was paired with the vowel /ɑ/ and produced in the carrier phrase “CV-the-CV” (e.g., “pɑ the pɑ”) spoken with equal stress on each of the CV syllables. This phrase was selected in order to attempt to equate the stress level of the first syllable across the utterance. Each CV pair was produced with independent changes in speech clarity level (habitual vs. clear). Other varying factors taken into account during further analysis included method of measurement for the VOT (airflow, audio, EGG, spectral), consonant place of articulation (bilabial, alveolar, velar), and mode of voicing (voiceless stop, voiced stop). A total of 144 tokens (4 participants x 6 consonants x 2 conditions x 3 utterances/condition) were recorded and measured.
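The token count stated above can be checked by enumerating the design. The sketch below is illustrative only: the participant labels are hypothetical placeholders, and the repetition indices simply stand for the three utterances per condition.

```python
from itertools import product

participants = ["P1", "P2", "P3", "P4"]      # hypothetical labels for the 4 speakers
consonants = ["p", "b", "t", "d", "k", "g"]  # the six English stop consonants
conditions = ["habitual", "clear"]           # speech clarity levels
repetitions = [1, 2, 3]                      # utterances per condition

# Every combination of speaker, consonant, condition, and repetition is one token.
tokens = list(product(participants, consonants, conditions, repetitions))

print(len(tokens))  # 4 x 6 x 2 x 3 = 144
```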

Recording Environment

A sound-treated booth approximately 4 feet by 6.33 feet by 6.5 feet was utilized for data collection and recording (the voice lab in room 181 in the BGSU Health and Human Services building). Participants were seated comfortably in a chair and given adequate time to become accustomed to the sound booth and its furnishings before data collection commenced.

Recording Protocol

Before the commencement of recording, the primary investigator and participant went through a period of training to familiarize the participant with the recording procedures and to allow him or her to become comfortable with the research equipment. Speech stimuli were typed in boldface on a piece of white paper and participants were asked to read the stimuli aloud while holding the stimuli sheet perpendicular in front of the face. For each utterance, participants were required to place a sterilized facemask firmly on their face, wear a non-invasive electroglottograph electrode neckband, and wear a headband-mounted microphone. The Glottal Enterprises MSIF-2 aerodynamic facemask system was used in the current study.

Electroglottograph electrodes were placed firmly on the participant’s neck on either side of the thyroid cartilage in order to attempt to acquire an adequate EGG signal. A headband-mounted microphone (Radio Shack Corporation, Model 33-3301) was selected in order to maintain a constant mouth-to-microphone distance for all productions. The wideband airflow, microphone, and EGG signals were simultaneously recorded using a DATAQ A/D converter (DI-2108 Series, DATAQ Instruments, Akron, Ohio) and Windaq software at a sampling rate of 20,000 Hz per channel. Signal timing differences due to acoustic travel time were explored and found to be negligible. Recorded signals were stored on a computer hard drive for later data analysis.
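To put the sampling rate and the acoustic travel time on a common scale, the following sketch computes the time between samples at 20,000 Hz and the delay for sound to cover a short distance. The speed of sound (343 m/s) and the 5 cm distance are assumptions chosen purely for illustration; the actual mouth-to-microphone distance is not reported here.

```python
SAMPLING_RATE_HZ = 20_000    # per-channel sampling rate stated in the protocol
SPEED_OF_SOUND_M_S = 343.0   # assumed value for air at room temperature


def sample_period_ms(rate_hz: float) -> float:
    """Time between consecutive samples, in milliseconds."""
    return 1000.0 / rate_hz


def travel_time_ms(distance_m: float, speed_m_s: float = SPEED_OF_SOUND_M_S) -> float:
    """Time for sound to travel `distance_m`, in milliseconds."""
    return distance_m / speed_m_s * 1000.0


period = sample_period_ms(SAMPLING_RATE_HZ)  # 0.05 ms between samples
delay = travel_time_ms(0.05)                 # hypothetical 5 cm path

print(f"sample period: {period:.3f} ms, travel time over 5 cm: {delay:.3f} ms")
```

Under these assumed values the delay is a fraction of a millisecond, which is small relative to VOT values measured in tens of milliseconds.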

Data Analysis

For habitual speech tasks, three productions of each stimulus were analyzed for data comparisons across participants. The VOT values were all positive for the voiceless stop consonants, and mostly positive for the voiced stop consonants. Productions with pre-voicing prior to the burst were not included in the summary analyses as these productions accounted for only 11.7% of the overall data set. Each stimulus was manually annotated using Praat computerized software and vertical cursors at distinct measurement locations.

For the microphone and airflow signals, the first cursor was placed at the location of the burst signifying the release of stop consonant closure. When accurate determination of the burst location in the microphone signal was difficult to visualize, coupled views of the wideband airflow and microphone signal were utilized. The airflow signal was used to substantiate the location of the burst in the microphone signal as the release of stop closure is marked in the airflow signal by an overt change in slope and is thus more sensitive to burst detection (Figures 1 and 2). A second vertical cursor was placed at the point of initial detection of phonation (IDP) in the succeeding vowel. The initial detection of phonation was demarcated as the first valley of the first regularly occurring signal oscillation that was related in an obvious way to phonation. The time elapsed between the two vertical cursors was taken as the VOT (Figures 3 and 4).

[Panels: Wideband Airflow; Microphone Signal]

Figure 1. Utility of the Wideband Airflow Signal for Burst Detection – Example A. Demonstration of the utility of the wideband airflow signal for determination of the location of the burst when it is less identifiable in the microphone signal for the production of a /kɑ/ token.

[Panels: Wideband Airflow; Microphone Signal]

Figure 2. Utility of the Wideband Airflow Signal for Burst Detection – Example B. Demonstration of the utility of the wideband airflow signal for determination of the location of the burst when it is less identifiable in the microphone signal due to extraneous noise in the microphone signal for the production of a /tɑ/ token.

Additional locations for the initial detection of phonation were also measured due to varying measurement methodologies found in the VOT research literature (Figures 5 and 6). Additional VOT measurement values were determined using the first peak, second valley, second peak, and third valley as the initial detection of phonation in the oscillation of the wideband airflow and microphone signals.
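The five measurement locations (first valley, first peak, second valley, second peak, third valley) can be illustrated on a synthetic waveform. The sketch below is not the procedure used in this study, where cursors were placed visually; it is an automated stand-in that finds local extrema. The function names, the idealized silent interval after the burst, and the 200 Hz oscillation are all assumptions made for the example.

```python
import math

SAMPLING_RATE_HZ = 20_000  # sampling rate reported in the recording protocol


def first_extrema(signal, start=1, count=5):
    """Return up to `count` (kind, index) local extrema found at or after `start`."""
    found = []
    for i in range(max(start, 1), len(signal) - 1):
        prev, cur, nxt = signal[i - 1], signal[i], signal[i + 1]
        if cur < prev and cur <= nxt:
            found.append(("valley", i))
        elif cur > prev and cur >= nxt:
            found.append(("peak", i))
        if len(found) == count:
            break
    return found


def samples_to_ms(n_samples, rate_hz=SAMPLING_RATE_HZ):
    """Convert a sample count to milliseconds."""
    return n_samples / rate_hz * 1000.0


# Synthetic token (hypothetical): burst at sample 0, then 1000 silent samples
# (50 ms), then a 200 Hz oscillation (100 samples per cycle) starting at a valley.
silence = [0.0] * 1000
voicing = [-math.cos(2 * math.pi * k / 100) for k in range(400)]
signal = silence + voicing

labels = ["V1", "P1", "V2", "P2", "V3"]
extrema = first_extrema(signal)
# With the burst at sample 0, each extremum index converts directly to a VOT value.
vot_by_location = {label: samples_to_ms(idx) for label, (_, idx) in zip(labels, extrema)}

for label, value in vot_by_location.items():
    print(label, round(value, 1))
```

In this idealized case each successive location adds one half cycle (2.5 ms at 200 Hz) to the measured VOT, which is the effect that the choice of peak or valley has on reported values.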

For the wideband spectrogram (Figure 5), the initial detection of phonation was measured as the first distinct voicing striation in the first and second formant (Lisker and Abramson, 1964; Klatt, 1975; Watson and Alfonso, 1982; Caruso and Burton, 1987; Forrest, Weismer and Turner, 1989; Petrosino, Colcord, Kurcz, and Yonker, 1993; Robb, Gilbert, and Lerman, 2005; Morris, McCrea, and Herring, 2008; Fischer and Goberman, 2010; Thomas, 2015).

For the electroglottograph signal, the initial detection of phonation was measured at the first glottal pulse, which indicates the initiation of vocal fold contact (Figure 7).

Statistics

A linear mixed model analysis was conducted to determine the extent to which measurement approach (the use of the wideband airflow or microphone signal or electroglottographic signal or the spectrogram) and measurement location (first valley, first peak, second valley, second peak, third valley) affected VOT. To examine the effect of measurement approach, a model was constructed using measurement methodology (Levels: Flow-IDP, MIC-IDP, EGG, Spectrogram), voicing characteristic (voiceless, voiced) and the associated interaction effect as fixed effects. To examine the effect of measurement location, a model was constructed using location (Levels: first valley, first peak, second valley, second peak, third valley), measurement approach (wideband airflow, microphone signal), and the interaction as fixed effects. The random effects structure for both models included voicing characteristic, place of articulation, and trial as random slope terms, and participant as the random intercept term.

[Panels: Wideband Airflow; Microphone Signal; Spectrogram; Electroglottograph]

Figure 3. Demonstration of the Segmentation Between the Burst and First Valley as the Initial Detection of Phonation (IDP) for the Determination of VOT – Example A. There is a simultaneous onset of phonation (IDP) of the wideband airflow and microphone displays, with a later onset of phonation indicated in the spectrogram (the vertical bar) and EGG signal (end of the segmentation) for the production of a /pɑ/ token.

[Panels: Wideband Airflow; Microphone Signal; Spectrogram; Electroglottograph. Annotation: 46 ms difference]

Figure 4. Demonstration of the Segmentation Between the Burst and First Valley as the Initial Detection of Phonation (IDP) for the Determination of VOT – Example B. There is a simultaneous onset (IDP) of the wideband airflow and microphone displays with a later onset by approx. 7 glottal cycles in the spectrogram for the production of a /tɑ/ token. This example also demonstrates the occurrence of the EGG waveform indicating phonation onset before the phonatory onset indicated by the spectrogram.

[Panels: Wideband Airflow; Spectrogram. Marked points: B/V1, P1, V2, P2, V3, FE]

Legend: B = phonatory onset; P1 = first peak; P2 = second peak; V1 = first valley; V2 = second valley; V3 = third valley; FE = formant excitation

Figure 5. Demonstration of the Points of Demarcation for the Measurements of VOT in the Coupled Microphone and Spectrogram Displays. Multiple locations were analyzed for the initial detection of phonation (IDP) based on the current VOT research literature measurement methodologies, for the production of the /pɑ/ token of Figure 3.

[Panel: Wideband Airflow. Marked points: B/V1, P1, V2, P2, V3]

Legend: B = phonatory onset; P1 = first peak; P2 = second peak; V1 = first valley; V2 = second valley; V3 = third valley

Figure 6. Demonstration of the Points of Demarcation for the Measurements of VOT in the Wideband Airflow Signal. Another demonstration of the individual points of demarcation for the measurement of VOT in the wideband airflow signal based on the current VOT research literature measurement methodologies, for the production of the /pɑ/ token of Figure 3.

[Panel: Electroglottograph. Marked points: E; interval B-E]

Legend: E = EGG onset (VF contact); B = phonatory onset

Figure 7. Demonstration of the Points of Demarcation for the Measurements of VOT in the Electroglottograph. Demonstration of the measurement methodology for the determination of the initial detection of phonation in the electroglottograph signal for the production of the /pɑ/ token of Figure 3.

CHAPTER II. RESULTS

Measurement Methodology

Results of the linear mixed model analysis revealed the voiceless stop consonant VOT measurements significantly differed between measurement approaches. Table 5 below reports the fixed effect estimates. The VOT measurements of voiceless stops made using the microphone display at its first valley location were not significantly different from the measurements made using the wideband airflow display at its first valley location, p>0.05. The VOT measurements of voiceless stops made using the EGG display at the first glottal pulse were significantly different from the measurements made using the wideband airflow display and its first valley location, p<0.0001 (by 8.1 ms). The VOT measurements of voiceless stops made using the wideband spectrogram and its first formant excitation were significantly different from the measurements made using the wideband airflow display and its first valley location, p <0.0001 (by 13.6 ms).

The only difference between measurement approaches observed for the voiced stop VOT measures was between the wideband airflow or microphone signals and the spectrographic display. The VOT measures of voiced stops made using the wideband spectrogram at its first formant excitation were significantly different from the measurements made using the wideband airflow display and its first valley location, p = 0.007 (by 6.3 ms).

Table 5. Measurement Methodology Fixed Effect Estimates. VOT estimate values are calculated average VOT values across participants and place of articulation for each measurement methodology. The reported data are separated by voicing characteristics for each measurement methodology. The p-values are comparisons of the VOT measure with the wideband airflow at the first valley location.

                        VOT Estimate (ms)   Standard Error   df        t-value    p-value
Voiceless Stop Consonants (/p, t, k/)
  Wideband Airflow      61.778              5.526            3.07      11.18
  Microphone Signal     63.487              1.898            261.01    0.900      0.036881
  Electroglottograph    69.898              1.898            261.01    4.279      <0.0001 ***
  Spectrogram           75.425              1.898            261.01    7.191      <0.0001 ***
Voiced Stop Consonants (/b, d, g/)
  Wideband Airflow      12.243              4.425            4.06      -11.194    0.000332
  Microphone Signal     12.335              2.703            261.01    -0.598     0.550179
  Electroglottograph    17.417              2.703            261.01    -1.09      0.276805
  Spectrogram           18.542              2.703            261.01    -2.719     0.0070 **

Note. ** = p<0.05; *** = p<0.001; df = degrees of freedom

Measurement Location

The measurement locations are the first valley of the airflow and microphone signals, their first peak, their second valley, their second peak, and their third valley (see Figures 5 and 6). Results of the linear mixed model revealed that measurement location significantly influenced VOT. Compared to the airflow first valley location, all other measurement locations were significantly longer, p<0.05 for all contrasts. No significant differences between measurement approaches (airflow and microphone) were observed at any measurement location, p>0.05 for all contrasts. These results are reasonable given that each adjacent measurement location is approximately one-half a phonation vibratory cycle (Figures 5 and 6). Table 6 below reports the fixed effect estimates.
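The half-cycle spacing of adjacent measurement locations can be checked with simple arithmetic on the wideband airflow estimates reported in Table 6. The following is a minimal sketch, not part of the original analysis. The implied period and fundamental frequency are only rough illustrations, because the estimates are pooled across speakers, voicing, and place of articulation; the variable names are hypothetical.

```python
# Wideband airflow VOT estimates (ms) from Table 6, in measurement order.
airflow_ms = {
    "IDP/V1": 37.1540,
    "P1": 39.7816,
    "V2": 42.177,
    "P2": 45.1353,
    "V3": 47.5358,
}

values = list(airflow_ms.values())

# Time between each adjacent pair of locations (valley to peak, peak to valley).
steps_ms = [later - earlier for earlier, later in zip(values, values[1:])]
mean_step_ms = sum(steps_ms) / len(steps_ms)

# Assumption: if each step is about half a vibratory cycle, two steps
# approximate one period. This is an illustration only, since the
# estimates are pooled across male and female speakers.
implied_period_ms = 2 * mean_step_ms
implied_f0_hz = 1000 / implied_period_ms

print([round(s, 2) for s in steps_ms])
print(round(mean_step_ms, 2), round(implied_period_ms, 2), round(implied_f0_hz, 1))
```

The adjacent steps all fall between roughly 2.4 and 3.0 ms, which is consistent with each step being a similar fraction of a glottal cycle.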

Influential Conditions

Voicing Mode. As expected, voiced stops exhibited significantly shorter VOTs than voiceless stops regardless of the measurement approach, p<0.001. All of the voiceless stops had positive VOT values (all had voicing lag times).

Table 6. Measurement Location Fixed Effect Estimates. VOT estimate values are calculated average VOT values across voicing, place of articulation, and subject.

                     VOT Estimate (ms)   Standard Error   Degrees of Freedom   t-value   p-value
Wideband Airflow
  IDP/V1             37.1540             1.0157           360.6                3.492
  P1                 39.7816             1.2916           684                  2.034     0.0423 *
  V2                 42.177              1.2916           684                  3.889     0.0001 ***
  P2                 45.1353             1.2916           684                  6.179     <0.0001 ***
  V3                 47.5358             1.2916           684                  8.038     <0.0001 ***
Microphone Signal
  IDP/V1             38.0653             1.2916           684                  0.706     0.480685
  P1                 39.5879             1.8266           684                  -0.106    0.915589
  V2                 42.2162             1.8266           684                  0.021     0.982886
  P2                 44.8824             1.8266           684                  -0.138    0.889934
  V3                 47.9077             1.8266           684                  0.204     0.838724

Note. IDP/V1 = initial detection of phonation/first valley; P1 = first peak; V2 = second valley; P2 = second peak; V3 = third valley; * = p<0.05; *** = p<0.001

For the voiced stop consonants, the frequency of occurrence of each voicing mode (voicing lead vs. voicing lag) is given in Table 7 for each speaker. Overall, across speakers, voiced phonemes were produced with pre-voicing 11.7% of the time.

Table 7. Voiced VOT Mode Distribution for Habitual Speech Production. Individual and overall participant frequency in percent occurrence of voicing mode usage (voicing lead vs. voicing lag) for voiced consonant productions.

Participant   Voicing Lead            Voicing Lag
              (negative VOT value)    (positive VOT value)
F1            6.7%                    93.3%
F2            6.7%                    93.3%
M1            33.3%                   66.7%
M2            0%                      100%
Overall       11.7%                   88.3%
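As a brief check on the overall figure, the 11.7% pre-voicing rate can be reproduced from the individual percentages in Table 7. This sketch assumes that each participant contributed the same number of voiced tokens, so that the overall rate is the unweighted mean of the four individual rates; that assumption and the variable names are illustrative and are not stated in the table itself.

```python
# Voicing lead (pre-voicing) percentages per participant, from Table 7.
lead_pct = {"F1": 6.7, "F2": 6.7, "M1": 33.3, "M2": 0.0}

# Assumption: equal token counts per participant, so an unweighted mean applies.
overall_lead_pct = sum(lead_pct.values()) / len(lead_pct)
overall_lag_pct = 100 - overall_lead_pct

print(round(overall_lead_pct, 1), round(overall_lag_pct, 1))
```

Under that assumption the result agrees with the "Overall" row of Table 7.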

The bilabial /b/ phoneme was produced as prevoiced 0% of the time, the alveolar /d/ phoneme was produced as prevoiced 13% of the time, and the velar /g/ phoneme was produced as prevoiced 33% of the time (Table 8).

Table 8. Frequency of Pre-voicing Relative to Consonant Place of Articulation for Voiced Stop Consonants.

Stop Consonant   Voicing Lead            Voicing Lag
                 (negative VOT value)    (positive VOT value)
/b/              0%                      100%
/d/              13%                     87%
/g/              33%                     67%

Although both habitual and clear speech tokens were produced by the participants, the thesis results and discussion will use only the habitual speech production tokens. In addition, since only 11.7% of the voiced stop consonants had a negative VOT value, those tokens will be excluded from further discussion because the emphasis of this study is the relationship among the airflow, microphone, EGG, and spectral aspects relative to positive VOT values. Numerous research studies have also discounted negative VOT values based on the rationale of Klatt (1975), who reasoned that the presence of voicing prior to consonant release is not phonemic in the English language. In other words, productions with negative, zero, or short-lag VOTs are all allophones of voiced stop consonants in the English language and are not distinctive for differentiation between voiced and voiceless stops (Swartz, 1992).

Place of Articulation. For voiceless stop consonants, the alveolar phoneme /t/ provided the shortest VOT values, the velar phoneme /k/ provided the longest VOT values, and the bilabial phoneme /p/ provided an intermediate VOT value across the airflow and microphone signal approaches (Table 9). These results are somewhat inconsistent with the typical finding that the bilabial voiceless stop /p/ has the shortest VOT. For voiced stop consonants, the bilabial phoneme /b/ provided the shortest VOT values, the velar phoneme /g/ provided the longest VOT value, and the alveolar phoneme /d/ provided an intermediate VOT value across the four measurement approaches (Table 10), consistent with the literature. The individual phoneme VOT values and corresponding graphic representations for habitual speech production tasks are provided on the subsequent pages.

Table 9. Average Voiceless Stop Consonant VOT Data for Habitual Speech Production.

VOT (ms)                /p/     /t/     /k/
Wideband Airflow
  IDP/First Valley      61.8    60.5    62.4
  First Peak            64.3    63.0    64.8
  Second Valley         67.0    66.1    68.0
  Second Peak           69.9    69.0    70.8
  Third Valley          72.0    71.4    73.2
Microphone
  IDP/First Valley      63.4    62.3    64.1
  First Peak            66.2    64.9    66.8
  Second Valley         68.7    67.6    69.6
  Second Peak           72.0    70.5    72.5
  Third Valley          74.6    73.4    75.2
Electroglottograph      69.0    70.1    70.0
Spectrogram             74.3    76.7    74.6

Note. underlined values = shortest VOT value across measurement approaches
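The averages in Table 9 can be used to recompute how much later each approach places phonation onset relative to the wideband airflow IDP. The sketch below is illustrative only; it uses the IDP/first valley rows for the airflow (FLW) and microphone (MIC) signals together with the electroglottograph (EGG) and spectrogram (SP) rows, and the function names are hypothetical.

```python
# Average voiceless VOT (ms) at the initial detection of phonation, from Table 9.
vot_ms = {
    "FLW": {"p": 61.8, "t": 60.5, "k": 62.4},
    "MIC": {"p": 63.4, "t": 62.3, "k": 64.1},
    "EGG": {"p": 69.0, "t": 70.1, "k": 70.0},
    "SP": {"p": 74.3, "t": 76.7, "k": 74.6},
}


def lag_vs_airflow(method):
    """Per-consonant difference (ms) between a method and the airflow signal."""
    return {c: round(vot_ms[method][c] - vot_ms["FLW"][c], 1) for c in "ptk"}


def mean_lag(method):
    """Mean difference (ms) across /p/, /t/, and /k/."""
    lags = lag_vs_airflow(method).values()
    return round(sum(lags) / len(lags), 1)


for method in ("MIC", "EGG", "SP"):
    print(method, lag_vs_airflow(method), mean_lag(method))
```

The mean differences computed this way (1.7, 8.1, and 13.6 ms) match the values reported for the voiceless stops, and the /p/ differences for the EGG and spectrogram are 7.2 and 12.5 ms.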

Table 10. Average Voiced Stop Consonant VOT Data for Habitual Speech Production

VOT (ms)                /b/     /d/     /g/
Wideband Airflow
  IDP/First Valley      8.5     11.9    16.6
  First Peak            11.4    14.7    19.1
  Second Valley         13.1    16.7    20.7
  Second Peak           14.7    20.5    24.5
  Third Valley          18.4    22.7    26.1
Microphone
  IDP/First Valley      9.0     11.9    16.3
  First Peak            11.4    14.0    18.2
  Second Valley         13.5    16.7    20.7
  Second Peak           15.0    20.3    22.9
  Third Valley          19.4    22.7    26.1
Electroglottograph      12.5    18.2    21.7
Spectrogram             14.0    18.8    23.1

Note. underlined values = shortest VOT value across measurement approaches
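The place-of-articulation ordering can be read directly from Tables 9 and 10 by sorting the consonants within each measurement approach. The following sketch is a simple illustration of that comparison using the IDP/first valley rows for the airflow (FLW) and microphone (MIC) signals; the helper name is hypothetical.

```python
# Average VOT (ms) at the initial detection of phonation, from Tables 9 and 10.
voiceless = {
    "FLW": {"p": 61.8, "t": 60.5, "k": 62.4},
    "MIC": {"p": 63.4, "t": 62.3, "k": 64.1},
    "EGG": {"p": 69.0, "t": 70.1, "k": 70.0},
    "SP": {"p": 74.3, "t": 76.7, "k": 74.6},
}
voiced = {
    "FLW": {"b": 8.5, "d": 11.9, "g": 16.6},
    "MIC": {"b": 9.0, "d": 11.9, "g": 16.3},
    "EGG": {"b": 12.5, "d": 18.2, "g": 21.7},
    "SP": {"b": 14.0, "d": 18.8, "g": 23.1},
}


def place_order(by_consonant):
    """Consonants sorted from shortest to longest average VOT."""
    return sorted(by_consonant, key=by_consonant.get)


voiceless_orders = {m: place_order(v) for m, v in voiceless.items()}
voiced_orders = {m: place_order(v) for m, v in voiced.items()}

for method in voiceless:
    print(method, voiceless_orders[method], voiced_orders[method])
```

For the voiced stops every approach gives the order /b/, /d/, /g/. For the voiceless stops the airflow and microphone signals give /t/, /p/, /k/, whereas the EGG and spectrogram averages give /p/, /k/, /t/.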

Table 11. Average /p/ VOT Values in Habitual Speech Production. Average VOT values in milliseconds for production of the /p/ phoneme in the phrase “pɑ the pɑ” in a habitual speech production task.

              FLW     MIC     EGG     Spectrogram
V1            61.8    63.4    –       –
P1            64.3    66.2    –       –
V2            67.0    68.7    –       –
P2            69.9    72.0    –       –
V3            72.0    74.6    –       –
EGG           –       –       69.0    –
Spectrogram   –       –       –       74.3

Note. VOT = voice onset time; V = valley; P = peak; FLW = wideband airflow signal; MIC = microphone signal; EGG = electroglottograph

[Figure: chart of average /p/ VOT values in habitual speech. Vertical axis: VOT (ms). Horizontal axis: IDP/V1, P1, V2, P2, V3, EGG, SP. Legend: FLW, MIC, EGG, SP.]

Figure 8. Average /p/ VOT Values in Habitual Speech. Schematic representation of average VOT values in milliseconds for production of the /p/ phoneme in the phrase “pɑ the pɑ” in a habitual speech production task. Note. FLW = wideband airflow; MIC = microphone signal; EGG = electroglottograph; SP = spectrogram; IDP = initial detection of phonation; V = valley; P = peak

Table 12. Average /t/ VOT Values in Habitual Speech Production. Average VOT values in milliseconds for production of the /t/ phoneme in the phrase “tɑ the tɑ” in a habitual speech production task.

              FLW     MIC     EGG     Spectrogram
V1            60.5    62.3    –       –
P1            63.0    64.9    –       –
V2            66.1    67.6    –       –
P2            69.0    70.5    –       –
V3            71.4    73.4    –       –
EGG           –       –       70.1    –
Spectrogram   –       –       –       76.7

Note. VOT = voice onset time; V = valley; P = peak; FLW = wideband airflow signal; MIC = microphone signal; EGG = electroglottograph

[Figure: chart of average /t/ VOT values in habitual speech. Vertical axis: VOT (ms). Horizontal axis: IDP/V1, P1, V2, P2, V3, EGG, SP. Legend: FLW, MIC, EGG, SP.]

Figure 9. Average /t/ VOT Values in Habitual Speech Production. Schematic representation of average VOT values in normal speech production for the /t/ phoneme. Note. FLW = wideband airflow; MIC = microphone signal; EGG = electroglottograph; SP = spectrogram; IDP = initial detection of phonation; V = valley; P = peak

Table 13. Average /k/ VOT Values in Habitual Speech Production. Average VOT values in milliseconds for production of the /k/ phoneme in the phrase “kɑ the kɑ” in a habitual speech production task.

              FLW     MIC     EGG     Spectrogram
V1            62.4    64.1    –       –
P1            64.8    66.8    –       –
V2            68.0    69.6    –       –
P2            70.8    72.5    –       –
V3            73.2    75.2    –       –
EGG           –       –       70.0    –
Spectrogram   –       –       –       74.6

Note. VOT = voice onset time; V = valley; P = peak; FLW = wideband airflow signal; MIC = microphone signal; EGG = electroglottograph

[Figure: chart of average /k/ VOT values in habitual speech. Vertical axis: VOT (ms). Horizontal axis: IDP/V1, P1, V2, P2, V3, EGG, SP. Legend: FLW, MIC, EGG, SP.]

Figure 10. Average /k/ VOT Values in Habitual Speech Production. Schematic representation of average VOT values in normal speech production for the /k/ phoneme. Note. FLW = wideband airflow; MIC = microphone signal; EGG = electroglottograph; SP = spectrogram; IDP = initial detection of phonation; V = valley; P = peak

Table 14. Average /b/ VOT Values in Habitual Speech Production. Average VOT values in milliseconds for production of the /b/ phoneme in the phrase “bɑ the bɑ” in a habitual speech production task.

              FLW     MIC     EGG     Spectrogram
V1            8.5     9.0     –       –
P1            11.4    11.4    –       –
V2            13.1    13.5    –       –
P2            14.7    15.0    –       –
V3            18.4    19.4    –       –
EGG           –       –       12.5    –
Spectrogram   –       –       –       14.0

Note. VOT = voice onset time; V = valley; P = peak; FLW = wideband airflow signal; MIC = microphone signal; EGG = electroglottograph

[Figure: chart of average /b/ VOT values in habitual speech. Vertical axis: VOT (ms). Horizontal axis: IDP/V1, P1, V2, P2, V3, EGG, SP. Legend: FLW, MIC, EGG, SP.]

Figure 11. Average /b/ VOT Values in Habitual Speech Production. Schematic representation of average VOT values in normal speech production for the /b/ phoneme. Note. FLW = wideband airflow; MIC = microphone signal; EGG = electroglottograph; SP = spectrogram; IDP = initial detection of phonation; V = valley; P = peak

Table 15. Average /d/ VOT Values in Habitual Speech Production. Average VOT values in milliseconds for production of the /d/ phoneme in the phrase “dɑ the dɑ” in habitual speech production task.

              FLW     MIC     EGG     Spectrogram
V1            11.9    11.9    –       –
P1            14.7    14.0    –       –
V2            16.7    16.7    –       –
P2            20.5    20.3    –       –
V3            22.7    22.7    –       –
EGG           –       –       18.2    –
Spectrogram   –       –       –       18.8

Note. VOT = voice onset time; V = valley; P = peak; FLW = wideband airflow signal; MIC = microphone signal; EGG = electroglottograph

[Figure: chart of average /d/ VOT values in habitual speech. Vertical axis: VOT (ms). Horizontal axis: IDP/V1, P1, V2, P2, V3, EGG, SP. Legend: FLW, MIC, EGG, SP.]

Figure 12. Average /d/ VOT Values in Habitual Speech Production. Schematic representation of average VOT values for habitual speech production for the /d/ phoneme. Note. FLW = wideband airflow; MIC = microphone signal; EGG = electroglottograph; SP = spectrogram; IDP = initial detection of phonation; V = valley; P = peak.

Table 16. Average /g/ VOT Values in Habitual Speech Production. Average VOT values in milliseconds for production of the /g/ phoneme in the phrase “gɑ the gɑ” in habitual speech production task.

              FLW     MIC     EGG     Spectrogram
V1            16.6    16.3    –       –
P1            19.1    18.2    –       –
V2            20.7    20.7    –       –
P2            24.5    22.9    –       –
V3            26.1    26.1    –       –
EGG           –       –       21.7    –
Spectrogram   –       –       –       23.1

Note. VOT = voice onset time; V = valley; P = peak; FLW = wideband airflow signal; MIC = microphone signal; EGG = electroglottograph

[Figure: chart of average /g/ VOT values in habitual speech. Vertical axis: VOT (ms). Horizontal axis: IDP/V1, P1, V2, P2, V3, EGG, SP. Legend: FLW, MIC, EGG, SP.]

Figure 13. Average /g/ VOT Values in Habitual Speech Production. Schematic representation of average VOT values in habitual speech production task for the /g/ phoneme. Note. FLW = wideband airflow; MIC = microphone signal; EGG = electroglottograph; SP = spectrogram; IDP = initial detection of phonation; V = valley; P = peak

CHAPTER III. DISCUSSION

Measurement Methodology

The current study found a significant effect of measurement approach on VOT duration for voiceless stop consonant production. For voiceless stops, VOT measurements made using the EGG and spectrographic displays were significantly longer in comparison to those made with the wideband airflow signal, by 8.1 ms and 13.6 ms, respectively. For positive voiced stop consonant production, VOT measurements made using the spectrographic display were significantly longer in comparison to those made with the wideband airflow signal, by 6.3 ms. The VOT measurements made using the microphone signal were not significantly different from the wideband airflow VOT values for both voiced and voiceless productions (e.g., the difference for voiceless stop consonants of 1.7 ms, shorter for airflow, was not statistically significant, and the difference for voiced stop consonants of 0.10 ms, shorter for airflow, also was not significant).

To the knowledge of the present researchers, only one previous study has taken into account the airflow signal as a potentially accurate signal to measure the burst and phonatory onset for VOT measurement. Koenig (2000) utilized the wideband airflow to measure VOT for production of the /p, b, t, d/ phonemes in healthy men, women, and children and found similar VOT values as in this study. The airflow VOT values reported by Koenig (2000) for the /b/, /t/, and /d/ phonemes are all within the standard deviations of the airflow VOT values of the current study.

No previous VOT studies have utilized the electroglottograph (EGG) signal (as far as the present researchers are aware) for the determination of VOT. This may be due to the fact the EGG signal cannot be utilized to accurately determine the location of the burst and the EGG signal waveform will always occur at a later time point as the signal relies on vocal fold contact. However, it should be noted in the current study the EGG signal often detected vocal fold contact prior to formant excitation in the spectrographic display. In addition, the EGG signal served as a useful tool for delineating between articulatory factors and true pre-voicing when this determination was unclear for pre-burst voiced VOTs.

In regards to the spectrographic display, the wideband airflow display revealed glottal oscillations on average 13.6 ms sooner than the spectrographic display for voiceless stops and 6.3 ms sooner for voiced stops. These differences were statistically significant. A number of researchers have historically utilized the spectrographic display as a measurement tool for the determination of VOT based on the recommendations of Lisker and Abramson (1964) (Klatt, 1975; Sweeting and Baken, 1982; Watson and Alfonso, 1982; Caruso and Burton, 1987; Forrest, Weismer, and Turner, 1989; Robb, Gilbert, and Lerman, 2005; Torre and Barlow, 2009). The early works of Lisker and Abramson (1964) recommended the initiation of periodicity be demarcated in the spectrographic display by locating the first regularly occurring vertical striation in the second and higher formants. However, given the definition of VOT as the release of oral constriction to the onset of vocal-fold vibration, it is possible the wideband airflow and microphone signal are more sensitive to detecting the first few cycles of vocal fold oscillation that occur prior to vocal fold contact and formant excitation. Choosing to use the wideband airflow or microphone signal therefore results in a significantly shorter VOT than using the spectrographic display. The 13.6 ms difference found in this study is superimposed on the VOT distribution presented by Kent and Read (1996) in Figure 14, suggesting that the difference is meaningful relative to the VOT ranges used to distinguish voicing categories. Previous works of Smith, Hillenbrand, and Ingrisano (1986) questioned the validity of the spectrogram for accurate determination of VOT with two investigators finding overall average disagreements of 8 ms and 12 ms, respectively, between spectrographic and oscillographic display analyses, findings supported by the research reported here.

Figure 14. VOT Distribution for Voiced and Voiceless Stops. Figure 6-3 from Kent and Read, 1992, p. 108, with the caption: “Distribution of voice onset time (VOT) values for voiced and voiceless stops, showing approximate VOT ranges for voicing lead, short voicing lag, and long voicing lag.” Superimposed (with arbitrary placement) as the double-arrow line is the time segment 13.6 ms, the time between VOT measures using the wideband airflow or microphone signal vs using the formant excitation seen in a spectrogram.

Measurement Location

Multiple VOT measurement values were determined using the first valley, first peak, second valley, second peak, and third valley as the initial detection of phonation (IDP) in the oscillation of the wideband airflow and microphone signals. Multiple IDP locations were chosen due to varying measurement methodologies found in the VOT research literature. Compared to the first valley IDP location, all other measurement locations resulted in significantly longer VOT values. The significant differences between these distinct measurement locations may help account for the wide variations in VOT data found across research studies and methodologies. Researchers have often historically interpreted the initiation of periodicity in the microphone signal differently across research methodologies.

In the current study, the initial detection of phonation was demarcated as the first valley of the first regularly occurring signal oscillation that was related in an obvious way to phonation. This demarcation of the first valley for the IDP in the current study is similar to the measurement methodology of Karlsson, Zetterholm, and Sullivan (2004) in which the onset of voicing was defined as “the last zero crossing before the onset of periodicity in the waveform” (p. 317). The point of demarcation of the first peak as the first IDP in the microphone and wideband airflow signals is similar to the methodology of Ryalls, Gustafson, and Santini (1999) in which the IDP was demarcated as “the highest point of the first regularly appearing period of the vowel” (p. 171). Additionally, the methodology implemented by Morris, Gorham-Rowan, and Herring (2007) defined the onset of vocal fold oscillation as “the first upward peak of regularly occurring oscillations of the waveform representing vocal fold movement” (p. 31). Studies defining initiation of vocal fold oscillation similar to that of second valley or second peak were not found by the current researchers, but these locations often were present prior to the spectrographic and electroglottographic points of historical measurement.
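The alternative IDP locations (first valley, first peak, second valley, second peak, third valley) can be illustrated with a small numerical sketch. The measurements in the current study were made visually; the code below is not the study's procedure. It is a hypothetical automated analogue applied to a noise-free synthetic signal, and the function name, sampling rate, oscillation frequency, and onset time are all assumptions chosen for illustration. Real airflow or microphone signals would require filtering and a criterion for "regularly occurring" oscillation that this sketch does not attempt.

```python
import numpy as np


def locate_extrema(signal, fs_hz, burst_idx, n_points=5):
    """Times (ms after the burst) of alternating valleys and peaks,
    starting with the first valley found after the burst sample."""
    x = np.asarray(signal, dtype=float)
    slope = np.diff(x)
    times_ms = []
    want_valley = True
    for i in range(max(burst_idx, 0) + 1, len(x) - 1):
        is_valley = slope[i - 1] < 0 and slope[i] > 0  # falling then rising
        is_peak = slope[i - 1] > 0 and slope[i] < 0    # rising then falling
        if (want_valley and is_valley) or (not want_valley and is_peak):
            times_ms.append((i - burst_idx) * 1000.0 / fs_hz)
            want_valley = not want_valley
            if len(times_ms) == n_points:
                break
    return times_ms


# Synthetic example (all values assumed): burst at sample 0, silence for
# 10 ms, then a 200 Hz oscillation that begins with a downward swing.
fs_hz = 8000
n = np.arange(240)
onset_idx = 80                      # 10 ms at 8000 Hz
samples_per_cycle = 40              # 200 Hz at 8000 Hz
signal = np.where(
    n >= onset_idx,
    -np.sin(2 * np.pi * (n - onset_idx) / samples_per_cycle),
    0.0,
)

labels = ["IDP/V1", "P1", "V2", "P2", "V3"]
vot_by_location = dict(zip(labels, locate_extrema(signal, fs_hz, burst_idx=0)))
print(vot_by_location)
```

In this synthetic case each successive location adds half of the 5 ms cycle, so the choice of IDP location alone shifts the VOT value by 2.5 ms per step.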

Influential Conditions

Voicing Mode. The relatively low frequency of pre-voicing for the production of voiced stop consonants in the current study is consistent with the findings of a number of previous research studies. Morris, McCrea, and Herring (2008) measured 720 voiced VOTs across three vowel contexts (/ɑ/, /i/, /u/) for the three voiced consonants and found only 1% of the tokens were pre-voiced. Another study performed by Swartz (1992) classified voiced VOT values for the /d/ phonemes as six distinctive VOT types, of which two types had a clear negative VOT. Swartz (1992) found pre-voicing occurred 11.6% of the time within the sample, which is consistent with the overall frequency of voicing mode usage in the current study.

In contrast to the low occurrence of pre-voicing in the current study, Smith (1978) found pre-voicing occurs more frequently across two vowel contexts (/a/ and /i/), with pre-voicing occurring 56% of the time for bilabials, 50% of the time for alveolars, and 39% of the time for velars. However, Smith found a higher frequency of pre-voicing for voiced phonemes preceding the high front /i/ vowel than for those preceding the low back /ɑ/ vowel. The current study only pertains to the low back /ɑ/ vowel, which may account for this difference in speaker voicing mode usage.

Place of Articulation. Previous VOT research found a significant effect of place of articulation on VOT duration in which bilabials result in the shortest VOT values, velars the longest VOT values, and alveolars an intermediate VOT value (Klatt, 1975; Smith, 1978; Volaitis and Miller, 1992; Baum and Ryan, 1993; Jancke, 1994; Kessinger and Blumstein, 1997; Robb, Gilbert, and Lerman, 2005; Fischer and Goberman, 2010). The current research findings were consistent with the VOT research literature only for voiced stop phonemes in normal speech production. The voiceless stop phoneme data for normal speech production are inconsistent with the previous VOT research literature. In normal speech production, the voiceless velar phoneme /k/ provided the longest VOT value, the voiceless alveolar phoneme /t/ provided the shortest VOT value, and the bilabial phoneme /p/ provided an intermediate VOT value across the airflow and microphone signal approaches (Table 9). The general differences in overall VOT duration as the result of articulatory dynamics are explained in the literature by a number of factors, including aerodynamics, articulatory movement velocity, and differences in the mass of the articulators (Cho and Ladefoged, 1999). Relative to aerodynamics, the volume of the oral cavity behind the point of constriction for a velar stop consonant is smaller in comparison to that of the alveolar and bilabial stop consonants, and thus has a greater overall air pressure behind the occlusion at the beginning of the release. Cho and Ladefoged (1999) concluded that the higher the air pressure in the vocal tract, the longer the amount of time required for the pressure behind the closure to fall and allow the initiation of vocal fold oscillation. Relative to articulatory contact, velar stop consonants are produced with a larger contact area in comparison to alveolar stops and even more so in comparison to bilabial stop consonants. As a result, the release of constriction is typically slower for velar stop consonants in comparison to the alveolar and bilabial stop consonants, resulting in longer overall VOT values.

Clinical Relevance

Relative to future clinical and research applications, the wideband airflow signal can be utilized to help substantiate the location of the burst and phonatory onset in the microphone signal. For articulatory conditions of incomplete closure or a repeated opening-closing-opening gesture for a stop consonant release, the wideband airflow signal may give more accurate timing values and physiological insight into the articulatory motions, since the airflow will be zero when there is closure and positive when the articulators actually separate (and negative if the articulators separate but the oral, or vocal tract, volume increases). In addition, the location on the airflow signal where glottal oscillation occurs, that is, where the modulating airflow occurs due to initial phonation, also helps to indicate at what point of glottal abduction the initial detection of phonation (IDP) occurs.

If the current methodology were to be implemented using the very first occurrence of the IDP, the VOT values would be substantially shorter in comparison to the previous VOT research literature that has been based on spectrographic formant excitation locations. Given the definition of VOT as burst to onset of vocal fold oscillation, it seems logical that one should adopt a strategy in which one indeed identifies glottal oscillation the first time it occurs, compared to formant excitation of the spectrographic signal and vocal fold contact of the electroglottographic signal. Clinically this is important because it is necessary to know quite accurately how coordinative activities are timed among themselves.

VOT is often viewed as a summative variable of articulator-laryngeal coordination as it reflects temporal control between the larynx, lips, tongue, and jaw. The simultaneity of the microphone and wideband airflow signals found in the current study helps provide a better understanding and orientation to articulatory-laryngeal and respiratory coordination relative to phonation onset.

CHAPTER IV. CONCLUSIONS

The current research project comparatively examined four measurement approaches for the determination of voice onset time (VOT) for stop consonant production. The approaches were the use of the wideband airflow, microphone, and electroglottographic signals, and the spectrogram display of the microphone signal. The VOT measure was from the articulatory burst to the initial detection of phonation (IDP), which for the airflow and microphone signal was the first occurrence of signal oscillation that corresponded to glottal cycles. The results of this study indicated the following:

1) Use of the initial detection of phonation (IDP) resulted in similar measures of VOT for the wideband airflow and microphone signals, but much shorter intervals than for the electroglottographic signal (about 7.2 ms shorter) and for the spectrographic display of excitation of the formants (about 12.5 ms shorter).

2) VOT varied with place of articulation, although not always consistently with the literature, and with voicing characteristic (with voiced stops having shorter VOT values than voiceless stops).

3) The articulatory burst is easy to identify in the wideband airflow signal when the acoustic excitation in the microphone signal is not easy to identify.

4) Using the microphone and wideband airflow signals simultaneously allows each to reinforce the other for the detection of the articulatory burst as well as the initial detection of phonation. For the latter, the glottal oscillation may be subtle during the first cycles but is present simultaneously on both the airflow and microphone signals, which is especially helpful when attempting to distinguish glottal oscillation effects from random noise or other non-glottal effects.

Future research should consider the use of high-speed video in order to verify the simultaneity of vocal fold motion with the airflow and microphone variation that is related to the initial detection of phonation. Future research should also utilize a larger sample size in order to determine if the current findings are consistent among a larger sample of participants. Additionally, the current study only examined syllable-initial productions preceding the /a/ vowel, and VOT is known to vary in regards to linguistic task and vowel height; thus, future research should examine differences in measurement approaches in a variety of linguistic contexts. Research examining other influential conditions such as speech rate and speech clarity should also be explored.

REFERENCES

Auzou, P., Ozsancak, C., Morris, R. J., Jan, M., Eustache, F., Hannequin, D. (2000). Voice onset time in aphasia, apraxia of speech and dysarthria: A review. Clinical Linguistics & Phonetics, 14, 131–150.

Abramson, A., & Whalen, D. Voice onset time (VOT) at 50: Theoretical and practical issues in measuring voicing distinctions. Journal of Phonetics, 63, 75–86.

Baken, R.J. (1992). Electroglottography. Journal of Voice, 6 (2), 98–110.

Baken, R.J., Orlikoff, R.F. (2000). Clinical Measurement of Speech and Voice – Second Edition. San Diego, CA: Singular Thomson Learning.

Baker, J., Ryalls, J., Brice, A., Whiteside, J. (2007). Voice onset time in speakers with Alzheimer’s disease. Clinical Linguistics & Phonetics, 21, 11–12.

Baum, S. R., & Ryan, L. (1993). Rate of speech in aphasia: Voice onset time. Brain and Language, 44, 431–445.

Brown, W.S., Morris, R., Weiss, R. (1993). Comparative methods for measurement of VOT. Journal of Phonetics, 21, 329–336.

Caruso, A. J. & Burton, E. K. (1987). Temporal acoustic measures of dysarthria associated with amyotrophic lateral sclerosis. Journal of Speech and Hearing Research, 30, 80–87.

Chomsky, Noam & Halle, Morris (1968). The Sound Pattern of English. New York: Harper and Row. pp. 293-292.

Cho, T., Ladefoged, P. (1999). Variation and universals in VOT: Evidence from 18 languages. Journal of Phonetics, 27, 207–229.

Davis, K. (1995). Phonetic and phonological contrasts in the acquisition of voicing: Voice onset time production in Hindi and English. Journal of Child Language, 22, 275–305.

Diehl, R., Souther, A., Convis, C. (1980). Conditions on rate normalization in . Perception and Psychophysics, 27 (5), 435–443.

Edwards, T.J. (1981). Multiple features analysis of intervocalic English plosives. The Journal of the Acoustical Society of America, 69, 535–547.

Fischer, E., Goberman, A. (2010). Voice onset time in Parkinson disease. Journal of Communication Disorders, 43, 21–34.

Flint, A., Black, S., Campbell-Taylor, I., Gailey, G., Levinton, C. (1992). Acoustic analysis in the differentiation of Parkinson’s disease and major depression. Journal of Psycholinguistic Research, 21 (5), 383–399.

Forrest, K., Weismer, G., & Turner, G. S. (1989). Kinematic, acoustic, and perceptual analyses of connected speech produced by Parkinsonian and normal geriatric adults. The Journal of the Acoustical Society of America, 85, 2608–2622.

Hampala V., Garcia, M., Svec J.G., Scherer, R., and Herbst, C.T. (2016). Relationship between the electroglottographic signal and vocal fold contact area. Journal of Voice, 30 (2), 161–171.

Hardcastle, W. J., Barry, R. A. M., & Clark, C. J. (1985). Articulatory and voicing characteristics of adult dysarthric and verbal dyspraxic speakers: An instrumental study. British Journal of Disorders of Communication, 20, 249–270.

Hoit, J., Solomon, N., & Hixon, T. (1993). Effect of lung volume on voice onset time (VOT). Journal of Speech and Hearing Research, 36, 516–521.

Holmberg, E., Hillman, R., Perkell, J., Guiod, P., Goldman, S. (1995). Comparisons among aerodynamic, electroglottographic, and acoustic spectral measures of female voice. Journal of Speech and Hearing Research, 38, 1212–1223.

Jancke, L. (1994). Variability and duration of voice onset time and phonation in stuttering and nonstuttering adults. Journal of Fluency Disorders, 19, 21–37.

Karlsson, F., Zetterholm, E., & Sullivan, K. P. (2004). Development of a gender difference in voice onset time. In Proceedings of the 10th Australian international conference on speech science & technology (pp. 316–321).

Kent, R.D., & Ball, M.J. (2000). Voice quality measurement. San Diego, California: Singular Publishing Group.

Kent, R.D., & Read, C. (1996). The Acoustic Analysis of Speech. San Diego: Singular Publishing Group, Inc.

Kessinger, R., & Blumstein, S. (1997). Effects of speaking rate on voice-onset time in Thai, French, and English. Journal of Phonetics, 25, 143–168.

Klatt, D. H. (1975). Voice onset time, frication, and aspiration in word-initial consonant clusters. Journal of Speech and Hearing Research, 18, 686–706.

Knuttila, E. (2011). The effects of vocal loudness and speaking rate on voice-onset time in typically developing children and children with cochlear implants. Retrieved from ProQuest Digital Dissertations.

Koenig, L. (2000). Laryngeal factors in voiceless consonant production in men, women, and 5-year-olds. Journal of Speech, Language, and Hearing Research, 43, 1211–1228.

Lane, H., Perkell, J. (2005). Control of Voice-onset time in the absence of hearing: A review. Journal of Speech, Language, and Hearing Research, 48 (6), 1334–1343.

Lee, L., Chamberlain, L. G., Loudon, R. G. and Stemple, J. C. (1988). Speech segment durations produced by healthy and asthmatic subjects. Journal of Speech and Hearing Disorders, 53, 186–193.

Lieberman, P., Knudson, R., & Mead, J. (1969). Determination of the rate of change of

fundamental frequency with respect to subglottal air pressure during sustained phonation.

Journal of the Acoustical Society of America, 45, 1537–1545

Lin, Chi-Yueh & Wang, Hsiao-Chuan. (2011). Automatic estimation of voice onset time for

word-initial stops by applying random forest to onset detection. The Journal of the

Acoustical Society of America, 130 (1), 514-525

Lisker, L., & Abramson, A. (1964). A cross-language study of voicing in initial stops: Acoustical

measurements. Word, 20, 384–422.

Lisker, L., & Abramson, A (1967). Some effects of context on voice onset time in English stops.

Language and Speech, 10, 1–28.

Marciniec, S. (2009). Voice onset time of women with vocal nodules. Retrieved from ProQuest

Digital Dissertations.

May, N. and Scherer, R. (in press). Aerodynamic consequences of a pneumotachograph mask

leak. Journal of Voice.

McCrea, C. R., & Morris, R. J. (2005). The effects of fundamental frequency level on voice onset

time in normal adult male speakers. Journal of Speech, Language, and Hearing

Research, 48, 1013-1024.

McCrea, C. R., & Morris, R. J. (2007). Voice onset time for female trained and untrained singers

during speech and singing. Journal of Communication Disorders, 40, 418-431.

Miller, J. L., Green, K. P., & Reeves, A. (1986). Speaking rate and segments: A look at the relation

between speech production and speech perception for the voicing contrast. Phonetica, 43,

106-115.

Morris, R. J., Gorham-Rowan, M. N., & Herring, K. D. (2007). Voice onset time in women as a

function of oral contraceptive use. Journal of Voice, 23(1), 114-118.

Morris, R. J., McCrea, C. R., & Herring, K. D. (2008). Voice onset time differences between adult

males and females: Isolated syllables. Journal of Phonetics, 36, 308-317.

Nelson, N. R., & Wedel, A. (2017). The phonetic specificity of competitions: Contrastive

hyperarticulation of voice onset time in conversational English. Journal of Phonetics, 64,

51-70.

Petrosino, L., Colcord, R. D., Kurcz, K. B., & Yonder, R. J. (1993). Voice onset time of velar stop

productions in aged speakers. Perceptual and Motor Skills, 76, 83–88.

Pickett, M. (1980). The Sounds of Speech Communication. Baltimore: University Park Press.

Robb, M., Gilbert, H., & Lerman, J. (2005). Influence of gender and environmental setting on

VOT. Folia Phoniatrica et Logopaedica, 57, 125–133.

Ryalls, J., Gustafson, K., & Santini, C. (1999). Preliminary investigation of voice onset time

production in persons with dysphagia. Dysphagia, 14, 169-175.

Ryalls, J., Simon, M., & Thomason, J. (2004). Voice onset time production in older Caucasian-

and African-Americans. Journal of Multilingual Communication Disorders, 2, 61–67.

Scherer, R. C., Druker, D. G., & Titze, I. R. (1988). Electroglottography and direct measurement of

vocal fold contact area. In O. Fujimura (Ed.), Vocal Physiology: Voice Production, Mechanisms, and

Function (pp. 279-291). New York: Raven Press.

Smith, (1978). Effect of place of articulation and vowel environment on voiced stop consonant

production. Glossa, 12, 163-175.

Swartz, B. (1992). Gender differences in voice onset time. Perceptual and Motor Skills, 75,

983–992.

Sweeting, P. M., & Baken, R. J. (1982). Voice onset time in the normal-aged population. Journal

of Speech, Language, and Hearing Research, 25, 129-134.

Thomas, R.M. (2012). Effect of age and sex on voice onset time in healthy individuals. Retrieved

from ProQuest Digital Dissertations.

Titze, I. (1990). Interpretation of the electroglottographic signal. Journal of Voice, 4(1), 1-9.

Torre, P., & Barlow, J. A. (2009). Age-related changes in acoustic characteristics of adult

speech. Journal of Communication Disorders, 42(5), 324-333.

Volaitis, L. E., & Miller, J. L. (1992). Phonetic prototypes: Influence of place of articulation and

speaking rate on the internal structure of voicing categories. The Journal of the

Acoustical Society of America, 92, 723–735.

Watson, B., & Alfonso, P. (1982). A comparison of LRT and VOT between stutterers and

nonstutterers. Journal of Fluency Disorders, 7, 219-241.

Whiteside, S., Hanson, A., & Cowell, P. (2004). Hormones and temporal components of speech:

differences and effects of menstrual cyclicity on speech. Neuroscience Letters, 367,

44-47.

Yang, B. (1993). A voice onset time comparison of English and Korean stop consonants.

Dongeui Journal, 20, 41-59.

Yu, V., De Nil, L., & Pang, E. (2015). Effects of age, sex and syllable number on voice onset time:

Evidence from children’s voiceless aspirated stops. Language and Speech, 58(2),

152-167.

APPENDIX A. INSTITUTIONAL REVIEW BOARD APPROVAL LETTER

APPENDIX B. HEALTH, SPEECH, AND VOICE QUESTIONNAIRE

Please check the box that applies to you. If you have a question about anything on this questionnaire please ask one of the experimenters.

Yes No

1. Are you currently healthy?

2. Do you currently suffer from allergies?

3. Do you currently smoke?

4. Do you consume alcoholic beverages?

5. Does your voice and speech today sound representative of your voice and speech on a daily basis?

6. Have you ever received voice therapy?

7. Have you ever received speech therapy?

8. Are you currently receiving voice therapy?

9. Are you currently receiving speech therapy?

10. Have you ever received professional speech training?

11. Have you ever received professional voice training?

12. Are you currently receiving professional speech training?

13. Are you currently receiving professional voice training?

Name: ______ Date: ______