
STEADINESS OF SCALES BY UNTRAINED ADULT FEMALES

Michelle Mary Bretl

A Thesis

Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

May 2018

Committee:

Ronald C. Scherer, Advisor

Jason A. Whitfield

© 2018

Michelle Bretl

All Rights Reserved

ABSTRACT

Ronald C. Scherer, Advisor

For a singer, register transitions can be challenging to navigate. A singer must perceive where the transition is occurring and apply adjustments to smooth the transition. Within register transitions, one may experience various types of vocal instabilities. The primary aim of this research was to study the production mechanisms characterizing vocal instabilities in the untrained female singer. This was an exploratory study that included five untrained female singers who produced “normal” octave scales as well as scales sung as smoothly as they could. Simultaneous recordings of airflow, microphone, and electroglottographic (EGG) signals were analyzed. The scales were divided into three groups based on the perceptual consensus of level of smoothness. Unsteady scales contained aphonic segments, abrupt fluctuations, obvious intensity changes, and unexpected fundamental frequency (fo) variations. Subtler unsteady scales exhibited noticeable but “understated” quality changes, increased speed of fo changes or overshoots, and smaller yet evident intensity variations. The participants often produced perceivably smooth scales with minimal instability. Results suggest that untrained female singers are capable of producing perceptually smooth scales across register transitions. However, within some of these perceptually smooth scales, subtle changes and disturbances were noticed that resulted in the perception of minor instabilities. These subtleties are often seen more clearly within the airflow signal, EGG signal shifts, and fo rate of change than they are heard. For the unsteady scales, the more obvious instabilities were seen within nearly all measures, most notably in the airflow, fo, and intensity contours, and in EGG waveform width and height. This study offers insights into a wider range of steadiness of vocal production where objective recordings reveal subtle changes that are difficult to hear.

This thesis is dedicated to my teacher and mentor of 10 years, Dr. Kari Ragan.

You believed in me when I was just a kid with big dreams but few plans.

ACKNOWLEDGMENTS

I am extremely grateful to my advisor, Dr. Ronald Scherer, who has continuously supported my goals through his expertise, guidance, and patience. He always found time to answer my questions, challenge my thinking, and encourage my creativity. This project would not be the same without his willingness to support my ideas and interests.

I am very thankful to my parents, John and Teresa Bretl, who were my first teachers. They instilled in me both the joy and importance of continued education and critical thinking in a world with so many unanswered questions. Their commitment to becoming experts in their respective fields inspires me to aim high and never settle for the information I already know.

My high school and undergraduate mentor, Dr. Kari Ragan, played an immense role in carving the path I have set out on, which would likely look very different without her guidance. She was always one of the first people to reinforce my changing interests and plans, and she never told me I couldn’t do it, even if it might be challenging. I was so lucky to have her as a mentor in such formative and crucial years.

Finally, I am incredibly appreciative of my fiancé, Nick Picatti. He has filled so many different roles over the past year and a half, including my cheerleader, assistant, counselor, friend, and sounding board. But most importantly and consistently, he has been my partner and main source of support. From the beginning of this project, he has shown continued interest in what I am doing and learning, and continues to ask when he can read the finished product. We have not only survived the 2,000+ mile separation but have thrived both individually and as a unit. Thank you so much for your patience and willingness to let me be a little selfish.

TABLE OF CONTENTS

CHAPTER I: INTRODUCTION
    Vocal Register Definitions
    Register Transition Research
    Vocal Instabilities
    Current Study
CHAPTER II: METHODS
    Subjects
    Instrumentation
    Tasks
    Recording Procedures
    Measures
        Identifying instabilities
    Statistical Design
    Pilot Study
CHAPTER III: RESULTS
    Smooth Group
    Middle Group
    Unsteady Group
CHAPTER IV: DISCUSSION
CHAPTER V: CLINICAL IMPLICATIONS AND FUTURE DIRECTIONS
CHAPTER VI: CONCLUSIONS
REFERENCES
APPENDIX A: EXPANDED INSTABILITIES TABLE
APPENDIX B: IRB APPROVAL

LIST OF FIGURES

Figure

1 FP1 Smooth Scale Contours
2 Trend-removed EGG signals in FP1 Smooth scale
3 fo derivative over time for the FP1 Smooth scale
4 fo derivative over frequency for the FP1 Smooth scale
5 FP2 Smooth Scale 1 Contours
6 fo derivative over time for the FP2 Smooth scale 1
7 fo derivative over frequency for the FP2 Smooth scale 1
8 FP2 Smooth Scale 2 Contours
9 Trend-removed EGG signals across FP2 Smooth scale 2
10 fo derivative over time for the FP2 Smooth scale 2
11 fo derivative over frequency for the FP2 Smooth scale 2
12 FP3 Middle Scale Contours
13 Spectral slope comparisons for FP3 Middle Scale
14 Trend-removed EGG signals for FP3 Middle Scale
15 fo derivative over time for the FP3 Middle scale
16 fo derivative over frequency for the FP2 Middle scale
17 FP2 Middle Scale Contours
18 Spectral slope comparisons for FP2 Middle Scale
19 Trend-removed EGG signals for FP2 Middle Scale
20 FP5 Unsteady Scale 1 Contours
21 Raw EGG signal across FP5 Unsteady Scale
22 Spectral slope comparisons for FP5 Unsteady Scale
23 fo derivative over time for the FP5 Unsteady scale
24 fo derivative over frequency for the FP5 Unsteady scale
25 FP1 Unsteady Scale Contours
26 FP1 Unsteady Scale Contours – first intensity drop
27 FP1 Unsteady Scale Contours – second intensity drop
28 FP1 Unsteady Scale raw EGG signal
29 fo derivative over time for the FP1 Unsteady scale
30 fo derivative over frequency for the FP1 Unsteady scale
31 FP2 Unsteady Scale Contours
32 Trend-removed EGG signals for FP2 Unsteady Scale
33 Spectral slope comparisons for FP2 Unsteady Scale
34 fo derivative over time for the FP2 Unsteady scale
35 fo derivative over frequency for the FP2 Unsteady scale

LIST OF TABLES

Table

1 Participant experience specifications
2 Instabilities across all analyzed scales

CHAPTER I: INTRODUCTION

Vocal Register Definitions

Vocal registers have been defined and described by professionals in many fields, including physicians, voice scientists, singers, and voice teachers. Hollien (1974) defined the three speaking registers that speech-language pathologists use today: pulse (vocal fry), modal (chest) register, and loft (falsetto) register. For adult males, the fo ranges across these registers are approximately 1-70 Hz (fry), 75-500 Hz (modal), and 150-750 Hz (falsetto). For adult females, the ranges are approximately 1-70 Hz (fry), 130-750 Hz (modal), and 220-1700 Hz (falsetto) (Hollien, 1974). Colton and Hollien (1973) suggested a similar definition but included measures of subglottal air pressure and individual perception to supplement their definition. Perceptual measures of each register are important in voice science and singing, where professionals (e.g., voice teachers or speech-language pathologists) use perception in order to teach clients and students proper awareness of their own voices. Hollien’s 1974 article, “On Vocal Registers,” discussed necessary considerations for defining vocal registers in the future: “I maintain that, before the existence of a particular vocal register can be established, it must be operationally defined: 1) perceptually, 2) acoustically, 3) physiologically and 4) aerodynamically” (p. 2). Hollien sees the significance of each element in a register and discusses the need for a proper definition before scientists can determine the separation of registers. As described later in this thesis, a similar orientation to describing registration characteristics will involve examining perceptual, acoustic, airflow, and glottographic signals and analyses, supporting the notion that registration variation is a complex phenomenon.
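The approximate fo ranges quoted above overlap considerably, which is one reason fo alone cannot identify a register. The following is a minimal illustrative sketch, not part of the thesis analysis: it stores the quoted ranges in a lookup table and lists every register whose range contains a given fo. The names `REGISTER_FO_RANGES_HZ` and `candidate_registers` are hypothetical and introduced only for this example.

```python
# Approximate fo ranges (Hz) per speaking register, as quoted in the text
# from Hollien (1974). The table layout and names are illustrative assumptions.
REGISTER_FO_RANGES_HZ = {
    "male": {"fry": (1, 70), "modal": (75, 500), "falsetto": (150, 750)},
    "female": {"fry": (1, 70), "modal": (130, 750), "falsetto": (220, 1700)},
}


def candidate_registers(fo_hz, sex):
    """Return every register whose quoted fo range contains fo_hz."""
    ranges = REGISTER_FO_RANGES_HZ[sex]
    return [name for name, (low, high) in ranges.items() if low <= fo_hz <= high]


# 300 Hz lies inside both the modal and the falsetto range for adult females.
print(candidate_registers(300, "female"))
# 50 Hz lies only inside the fry range for adult males.
print(candidate_registers(50, "male"))
```

Because several registers can be returned for one fo value, a lookup of this kind can only narrow the candidates; the perceptual, acoustic, physiological, and aerodynamic criteria discussed above are still needed to tell them apart.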

It is important for speech-language pathologists to make distinctions between registers in the speaking voice and registers in the singing voice. In the 19th century, Manuel Garcia, an historically important teacher, pioneered the definition of singing registers based on his teaching experience. “The word register means, a series of consecutive and homogeneous sounds produced by the same mechanism, and different essentially from other sounds originating in mechanical means of a different kind” (Garcia, 1924, p. 4). This definition may seem clear and concrete at first glance. However, this specification of a mechanical adjustment, rather than a description of laryngeal and acoustic factors combined, resulted in many discussions on the topic. Titze (1988) classified the passaggi (another word for the transitions between registers) of the voice into two possible types of transitions. The first transition is explained by changes in periodicity, where there is a pulsing perceivable to the ear created by excitation when phonating in vocal fry but not in modal, where the individual pulsations are not heard. The second transition is an alteration in the timbre of the voice, where different timbres are expected to be distinguishing characteristics between modal and falsetto. This transformation in the quality is defined as an “abrupt quality change,” acoustically in the spectral slope and physiologically in glottal adduction (p. 183), when shifting between modal and falsetto. Titze’s view of registration assigns measurable parameters to each relatively pure register to differentiate
where one register zone ends and the next begins. His idea of quality in the second or timbre transition refers to the spectrum for each register, based on the relationship between the harmonics and resonances, as well as the spectral balance and slope.

Murry et al. (1998) expanded the literature of register distinction by considering different registers in terms of acoustic signals and by viewing the configuration of the glottis. Glottal configuration can affect aerodynamics, and the study hypothesized how the shape of the glottis (whether complete or incomplete glottal closure) might differentiate registers. Nathalie Henrich (2006) suggested that the discrepancy within the topic stemmed from insufficient specific information rather than the breadth of literature and opinions. She outlined the differing approaches to the definition of registers, explaining how speaking and singing registers vary. Speaking within one register is defined as tones produced by the same laryngeal mechanism, while singing within one register is defined as tones produced with the same (or similar) vocal quality (Henrich, 2006). The definition used to explain singing registers then supports the notion that many other registers can exist in singing. Additionally, Henrich appears to suggest that multiple laryngeal mechanisms could be associated with the same register as long as the perceptual “vocal quality” does not change. Roubeau et al. (2009) added that registers of the voice can be viewed as distinct regions over the entire frequency range of the voice,
varying from person to person. These register differentiations are a result of certain manipulations of the larynx and vocal tract. The numerous explanations and definitions of vocal registration continue to fuel many discussions across different professions, resulting in further investigations of the interaction between registers and therefore also register transitions.

Register Transition Research

For a singer, register transitions, that is, singing from one register into an adjacent register, can be challenging to navigate. Not only does a singer need to feel and perceive where and when the transition is occurring or will occur, but the singer also needs to apply any available technique or adjustment to smooth the transition (a smooth transition is typically desired and required in classical singing, whereas in yodeling, for example, an abrupt register change is expected). Research on the science of register transitions has used a variety of approaches and measures to describe those transitions.

Biomechanical modeling and excised laryngeal modeling of the transition between the chest and falsetto registers have provided evidence of laryngeal and acoustic bifurcations (Švec et al., 1999; Tokuda et al., 2010). These junctions are parametrically displayed through “period doubling, period tripling, and irregular vibrations” (Švec et al., 1999, p. 1526). Švec et al. highlighted a short amount of time where both register systems were activated, defined as “a one-dimensional example of a region of coexistence of the two registers” (p. 1529). Furthermore, Tokuda et al. (2010) found that, while manipulating the subglottal resonator has little effect on the register transition, supraglottal resonances significantly affect the register transition between the chest and head/falsetto registers. When the length of tubing above and below the sound source was varied, where length changes alter the resonance frequencies, a pitch and amplitude jump was observed at the point where the chest-to-falsetto register transition occurs. These jumps support the idea that supraglottal resonances can significantly influence register transitions (Tokuda et al., 2010).

Register transitions have been studied in human participants through the use of sung intervals or leaps between registers. Miller et al. (2002) discovered a characteristic leap interval (CLI) that distinguished natural registers in singing, where the interval was shorter in women than in men. This interval was determined by examining fo, electroglottographic (EGG) measures of vocal fold contact, and glottal closed quotient. Švec et al. (2008) continued to explore register transitions through sung leaps across register boundaries. The register transitions were those from a single female subject who was capable of producing three distinct vocal registers (and qualities): whistle, head, and chest registers. The study looked at transitions from chest to head register and from head to whistle register, as well as some calculated pitch jumps. Frequency locations for the transitions were pinpointed and information on adduction and vocal fold vibration was collected on the basis of closed quotient measurements. Continuing with the measurement of human register transitions, Bernadin et al. (2014) used acoustic features and EGG measures to characterize the register transition. The study then outlined the use of the derivative of the EGG signal in female singers. This derivative gives robust information regarding the opening and closing moments of the glottis.

Morris, Okerlund, and Dolly (2012) explored acoustical and physiological adjustments made by singers. The singers sang a triad on the vowel [a:] and audio and EGG signals were analyzed. The authors found that the singers were physically managing and adjusting their resonance strategy to match their EGG closed quotient value. Morris, Okerlund, and Bernadin (2015) further analyzed register shifts in female singers, but studied a separate calculation of a “closing quotient” in order to better distinguish any change in the quotient over a register shift. This new quotient calculation included the differentiated EGG signal as well as the differentiated audio signal. Overall, the results indicated that the singers maintained a relatively constant closing quotient, even across register transitions. Finally, Morris, Okerlund, and Craven (2016) set out to identify the specific spectral and EGG signal differences through the primo passaggio of female singers. The singers sang the [ɑ:] vowel spanning one octave across the chest and head registers. The closed quotient rating determined from the EGG signal showed a small decrease as pitch ascended, but some singers exhibited different closed quotient results.

The interaction of the thyroarytenoid (TA) and cricothyroid (CT) muscles in terms of register transitions has also been studied. It was asked whether a register transition is marked by a change in the activity of these dominant muscles in the larynx. Kochis-Jennings et al. (2012) studied whether productions of pitches within different registers (including chest, chestmix, headmix, and head registers) showed distinctive acoustic characteristics, TA and CT muscle activity, and adduction ratings, to better understand the effect of pitch and register on these changes. Participants phonated 3-5 frequencies (Eb4–F4, Eb4–G4,
E4–Ab4, or F4–A4 or Bb4, depending on ). They found that the participants increased

TA muscle activity and vocal fold adduction in shifting from head to headmix to chestmix to chest register with sustained . CT muscle activity was not related to register shift on the same pitch, but rather on change of pitch through the registers. Kochis-Jennings et al. (2014) followed up on these results, further exploring dominant muscle activity across registers.

Electromyography (EMG) was used to measure activity of each of the muscles during pitch glides through the same four registers. Results showed that for frequency production above 300 Hz, either the CT muscle activity dominated or the two muscle activities (CT and TA) were nearly equal, regardless of register.

Titze (2014) also contributed to the discussion of muscle dominance by encouraging the inclusion of glottal air pressures at the level of the vocal folds, which affect perception of pitch, loudness, and vowel identity. These pressures included intraglottal air pressure and transglottal air pressure. “Glottal adductory geometry,” as Titze calls the shape of the phonating glottis, helps evaluate the registration of the voice. Angular glottal shape alone can lead to unstable vocal registration; however, when subglottal and transglottal pressures are adjusted, registration will likely stabilize. Herbst et al. (2015) developed an in-depth exploratory study of vocal register shifts and the various factors that mark this shift in a male singer. This study required the singer to sustain the vowel [i:] at one frequency, shifting between registers; videokymographic, EGG, airflow, subglottal pressure, and voice source data were collected. The study ultimately determined that the register transitions were well defined in the EGG waveform and in the changing glottal flow and subglottal pressure signals. Similar changes were seen when the singer manipulated adduction quantities as well as subglottal pressure.

Vocal Instabilities

Vocal fry is a register of the voice characterized by very low fundamental frequencies. When under control, this register serves a purpose in communication and can even be used as a therapeutic technique (Cielo et al., 2011). However, constant use of vocal fry may be associated with vocal folds that have been damaged, manifesting as unstable productions of pitch and loudness.

Echternach et al. (2016) discussed the effects of vocal fold lesions on the passaggi and surrounding pitch regions. Four of the seven professional singers in the study experienced major instability in phonation across the passaggio regions due to mass lesions, while phonation above and below the vocal register transitions was nearly stable. The female singers who demonstrated instability in the transition as a result of the added mass only displayed instability in the upper passaggio, while the lower passaggio remained relatively unaffected (Echternach et al., 2016).

Furthermore, the natural instability of the adolescent voice change can often be recognized in the transition between registers. A main characteristic of the adolescent voice is the unsteadiness of pitch, likely due to the rapid growth of the muscles and organs involved in speech (Boltezar, Buger, & Zargi, 1997). An adolescent may experience a relatively fast drop in the frequency at which he speaks, causing the voice to have frequent register shifts during this change. Puberphonia is another example where the voice experiences instabilities. The person uses the falsetto register predominantly, but the anatomical and physiological changes warrant a change to a lower speaking pitch (Vaidya & Vyas, 2006). The pull between habitual action and larynx-growth accommodation causes instability in the voice of one who experiences puberphonia.

In contrast, there are times in singing when register deviation, and sometimes instability, is purposeful and meaningful in context. Many roles in musical theater require males to sing in a falsetto register. The male falsetto voice can be used for comedic reasons, where the voice may utilize the falsetto register for certain words (sung or spoken) or pitches. This register may also be recommended for ballads in order to achieve a more sensitive timbre. Sundberg et al. (2011) discussed the different voice types by which a singer in the Peking opera may be classified. Several voice types are labeled by the use of a “small” (falsetto) voice.

Several studies have outlined parametric measures that define an unstable nature in a register transition. For example, a “register break” at a register transition has features such as abrupt pitch changes and amplitude jumps (Roubeau et al., 1987; Švec et al., 1999). Furthermore, instabilities are seen in the EGG signal as distinct peaks differing from a more consistent pattern (Selamtzis & Ternstrom, 2014). Selamtzis and Ternstrom (2014) also found, however, that instabilities in a register transition may be incorrectly identified in a signal with a low signal-to-noise ratio due to the similarity between the EGG signals of noise and vocal instability.

Tokuda et al. (2007) approached the idea of register transitions and voice instabilities through the use of excised larynges and biomechanical modeling. The experiment had the larynges and models exhibit vibrations similar to those of the chest and falsetto registers. Using a three-mass model, Tokuda et al. examined a variety of instabilities, including frequency jumps with hysteresis, as well as a demonstration of subharmonics and chaos near those frequency jumps. Tokuda et al. (2010) then assessed the role of vocal tract resonators in the biomechanical modeling of transitions from chest to falsetto registers. Simulations that varied subglottal pressure indicated that changes in subglottal pressure affect amplitude and hysteresis. Additionally, supraglottal length and shape greatly influenced the overall transition from the chest register to the falsetto register, while the subglottal length showed little influence (Tokuda et al., 2010).

In the existing research, vocal “instability” in singing or speaking is not well defined. That is, the possible reasons and explanations for perceived instabilities or inconsistencies in the voice have not yet been fully explored. Švec et al. (1999) discussed “vocal breaks,” defined as “a sudden transition from one voice register to another,” across the first singing register transition (p. 98). These quick breaks or register shifts are signature instabilities mentioned in the literature, as it is possible to recreate them in modeling. The idea of “chaos” is also often introduced specific to instabilities in the voice (Jiang, Zhang, & McGilligan, 2006; Titze et al., 1993). Chaos has often been described in the framework of nonlinear dynamics, which has been used to discuss newborn cries (Mende et al., 1990), pathological voices (Herzel et al., 1994), and contemporary singing (Neubauer et al., 2004). However, the literature indicates that either the voices studied have known physiological irregularities, or the nonlinear dynamic measures are to be used to indicate pathophysiological dysfunction.

Current Study

While register transitions and vocal instabilities have been topics in the literature, there is insufficient direct information on how an instability may manifest. Specifically, the current study focuses on how vocal aerodynamic, acoustic, and glottographic signals are affected, both in smooth and unsteady transitions between notes and registers, for scales sung by untrained young adult women with normal phonation. Untrained singers have been subjects of other studies but have not been studied in the context of the smoothness of scales and how they navigate these challenging areas of the voice. When a person trains to become a professional (or avocational) singer, the passaggio may not be produced as smoothly as it is expected to be. This study provides information about laryngeal instabilities across scales that cause the perception of unsteadiness.

This study is a multi-signal description and analysis of a range of vocal unsteadiness in untrained female singers. The primary aim of this study was to identify voice instabilities during singing through the singer’s passaggi and describe them relative to changes in fo, airflow, intensity, inferred adduction, and acoustic spectra. Additionally, unsteady transitions are compared to smooth transitions within the same singer and between different singers. It was hypothesized that “smooth transitions” would result in little variation in the signals and associated measures besides the expected pitch changes associated with singing scales, with little indication of a transition between registers, and that “unsteady transitions” would be revealed by variation in each of the objective signals and measures. An additional hypothesis was that more subtle characteristics of unsteadiness would be more obvious from objective signals and analyses than auditorily perceived.

CHAPTER II: METHODS

Subjects

Five untrained young adult female singers completed the tasks in the study. Participants were chosen based on four criteria: no history of private voice lessons (choral experience was accepted), the ability to produce an octave scale in a lower and an upper range, relatively normal speech and voice, and nonsmoking status. The participants had a variety of experience in choirs, bands, and other groups. The participants were all females, aged 22-28 years. See Table 1 for participant characteristics. All participants signed a consent form that explained the purpose, procedures, risks, and benefits of the study, and also filled out a health and voice history form. This information ensured that no participant had any known pathologies of the larynx and/or vocal folds. All subjects were healthy at the time of recording and had no history of voice problems or other related health issues within the last month.

Table 1: Participant experience specifications.

Participant        FP1              FP2                FP3         FP4          FP5
Choir              none             5 years            none        2 years      4 years
Band or Orchestra  4 years          4 years            none        none         11 years
Musical Theater    none             none               none        2 years      1 year
Other              Occasional       Exposure to        Singing     Occasional   Currently singing
                   singing at home  karaoke, singing   at home     karaoke      in church band
                                    at home

Instrumentation

This study used three signal types: airflow, microphone, and electroglottographic (EGG). The microphone [Model 33-3013, RadioShack Corporation, Fort Worth, TX with preamplifier] was used to record the audio signal for acoustic analysis. The aerodynamic system with a circumferentially vented flow mask (Glottal Enterprises, Syracuse, NY; model MSIF-2 S/N 2049S) was used to record broadband flow (Rothenberg et al., 1972). Two mask sizes were used during the experiment to ensure that the mask fit each participant’s face. The EGG signal (Kay Elemetrics, Lincoln Park, NJ, USA) was analyzed using custom software (Sigplot, a Matlab-based program) to infer vocal fold contact measures and disruptions in phonation. The remaining equipment was situated outside the booth and included the signal acquisition system with a digital oscilloscope (DATAQ Instruments, Inc., Akron, OH, model DI-720, with WINDAQ software) and a Dell (OptiPlex 780, Round Rock, TX) computer.

The pneumotachograph flow masks were calibrated based on standard procedures of constant flow (with the use of rotameter flowmeters), with uncertainty within approximately ±3% over a range of ±4000 cm³/s.

The electroglottograph was used to obtain waveforms of the assumed changes in vocal fold contact area. The device included two small plates that were placed on the skin over the right and left thyroid laminae. The signal obtained is a demodulated variation of the impedance (high frequency, low amperage) through the neck as the vocal folds vibrate. In terms of flow, the study determined the range of flow changes across scales, discussed as either expected and/or consistent changes or instabilities in the signal. The microphone signal was used to measure acoustic characteristics across each scale, including frequency, intensity, spectral changes, and the frequency derivative (fo rate of change).
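The frequency derivative mentioned above can be illustrated with a short sketch. This is not the thesis's own analysis code; it assumes that an fo contour (fo values in Hz at known times in seconds) has already been extracted from the microphone signal, and it estimates the rate of change with finite differences. The function name `fo_rate_of_change` and the example contour are hypothetical.

```python
import numpy as np


def fo_rate_of_change(times_s, fo_hz):
    """Estimate d(fo)/dt in Hz/s from a sampled fo contour.

    Uses finite differences (central in the interior, one-sided at the
    ends), so the output has the same length as the input contour.
    """
    times_s = np.asarray(times_s, dtype=float)
    fo_hz = np.asarray(fo_hz, dtype=float)
    return np.gradient(fo_hz, times_s)


# Hypothetical contour: a steady glide starting at 220 Hz and rising 110 Hz per second.
t = np.linspace(0.0, 2.0, 201)
fo = 220.0 + 110.0 * t
rate = fo_rate_of_change(t, fo)
```

Plotting `rate` against `t` gives a derivative-over-time view, and plotting `rate` against `fo` gives a derivative-over-frequency view. For a perfectly steady glide such as this one, the rate is constant; abrupt fo jumps or overshoots would appear as sharp excursions.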

The microphone, airflow, and EGG signals were recorded into separate channels of the DATAQ A/D converter using a sampling rate of 20,000 Hz per channel and DATAQ’s WinDAQ software. The EGG and flow signals were later also displayed as smoothed signals, where smoothing was a weighted moving average of seven consecutive samples, resulting in no time shift of the signals. Smoothing was repeated until the signals showed only major variations (1000 smoothing cycles for the EGG and 2000 smoothing cycles for the airflow). The smoothing was performed after the WinDAQ files were read in by the custom Sigplot software.
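The repeated smoothing described above can be sketched as follows. This is an illustration under stated assumptions, not the Sigplot implementation: the thesis does not list the seven weights, so a symmetric triangular set is assumed here, and holding the edge values during padding is also an assumption. A symmetric window centered on each sample is what keeps the smoothed signal free of time shift. At 20,000 Hz, seven samples span 0.35 ms, so many passes are needed before only the major variations remain.

```python
import numpy as np

# Assumption: the thesis does not give its weights; a symmetric triangular
# window is used here purely for illustration.
ASSUMED_WEIGHTS = (1, 2, 3, 4, 3, 2, 1)


def smooth_once(signal, weights=ASSUMED_WEIGHTS):
    """Apply one centered weighted moving average (same length as input)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so a constant signal is left unchanged
    half = len(w) // 2
    # Assumption: hold the first and last values so the output keeps its length.
    padded = np.pad(np.asarray(signal, dtype=float), half, mode="edge")
    return np.convolve(padded, w, mode="valid")


def smooth_repeated(signal, passes, weights=ASSUMED_WEIGHTS):
    """Repeat the seven-sample smoothing a given number of times."""
    out = np.asarray(signal, dtype=float)
    for _ in range(passes):
        out = smooth_once(out, weights)
    return out


# A single-sample pulse spreads out with repeated smoothing,
# but its peak stays at the same sample index (no time shift).
pulse = np.zeros(201)
pulse[100] = 1.0
smoothed = smooth_repeated(pulse, passes=50)
```

For recorded signals, `passes` would be set to the cycle counts reported above (1000 for the EGG signal and 2000 for the airflow signal).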

Tasks

The untrained singers in this study first produced their “natural” or “comfortable” singing of diatonic octave scales that had notes traditionally known to move between registers, established prior to recording. The singer began with a lower octave diatonic scale that crossed the lower passaggio (primo) register transition. Then the singer moved to a higher octave diatonic scale that crossed the upper passaggio (secondo) register transition. Across each passaggio, the participant sang on a continuous /a/ vowel on one breath and did the same on a continuous /i/ vowel. Additionally, the participants sang on each of these vowels at two different dynamic levels, which are referred to as mezzo piano (medium soft) and mezzo forte (medium loud). At least five trials were completed for each register transition, vowel, and loudness level combination.

Following the production of “natural” or “comfortable” scales, the participants were asked to produce the same scales as smoothly as they could and then as unsteadily or “clunky” as they could. The same procedure was followed for this second part, including the two vowels and two loudness levels for each of the two octaves.
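The task design above can be summarized by enumerating its conditions. The sketch below is only an illustration of that arithmetic; the labels and variable names are hypothetical, and the minimum trial count is computed only for the “natural” scales, for which the text states at least five trials per combination.

```python
from itertools import product

# Labels are illustrative; they mirror the factors described in the text.
styles = ("natural", "smooth", "unsteady")
passaggi = ("primo", "secondo")  # lower and upper octave scales
vowels = ("a", "i")
dynamics = ("mezzo piano", "mezzo forte")

# Every style x passaggio x vowel x loudness combination.
conditions = list(product(styles, passaggi, vowels, dynamics))

# The text specifies at least five trials per combination for the natural scales.
MIN_TRIALS_PER_NATURAL_CONDITION = 5
natural_conditions = [c for c in conditions if c[0] == "natural"]
min_natural_trials = len(natural_conditions) * MIN_TRIALS_PER_NATURAL_CONDITION

print(len(conditions))          # 24 conditions in total
print(len(natural_conditions))  # 8 natural-scale conditions
print(min_natural_trials)       # at least 40 natural-scale trials per participant
```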

Recording Procedures

The experiment took place in the Voice Physiology Laboratory, Bowling Green State

University. Each of the five participants went through a short training period prior to recording.

Diatonic scales were established based on prior literature and the investigators’ perceptions of where the primo and secondo passaggi fell for each participant; the first octave scale was in the lower end of the participant’s range and the second octave scale was in the higher end of the range. The participant practiced the scales three to five times each at both soft and loud intensity levels. Once the participant was able to independently produce each scale at each loudness level, the equipment was added to the training. The EGG collar was placed around the participant’s neck at a location based on the two electrodes contacting the thyroid laminae just below the thyroid notch and the EGG signal appearing to be relatively strong. The participant was then instructed on use of the vented pneumotachograph mask. The participant held the mask against her own face to ensure that she could easily remove the mask at any time. Each participant was made aware of the importance of keeping the mask held tightly against her face to avoid the possibility of an airflow leak around the rim of the mask. The primary investigator remained in the booth with the participant during recording to watch for potential mask leaks, and the other investigator watched the signals on the computer monitor outside the booth to try to identify any possible mask leaks or problems with the other signals. The microphone was placed approximately 3 inches from the singer’s mouth using a headband mount.

When all signals appeared strong and appropriate for each participant, recording procedures began. The primary investigator remained in the sound booth with the participant and cued the participant for each task. The investigator also played the first note of the required scales using a portable keyboard and gave instructions for any changes that needed to be made during the recording procedure. The participant was given a 5- to 10-minute break between the “natural” scales and the smooth and purposefully unsteady productions.

Measures

From the three signals, several measures were derived. First, the acoustic microphone signal provided measures of fundamental frequency and intensity. The acoustic signal was analyzed in Praat, and both the pitch and intensity listings were extracted. The pitch listing was sampled every five milliseconds, while the intensity listing was sampled at the default interval of 13 milliseconds. The shorter sampling interval for the pitch listing was chosen because the first derivative of the fundamental frequency was computed from it. This derivative was determined by taking frequency two minus frequency one (fo2 - fo1) divided by time two minus time one (t2 - t1). This calculation was repeated for all time and frequency points extracted from Praat. Once the derivative points were determined, the derivative was plotted against frequency on the x-axis in one plot and against time on the x-axis in another plot. Additionally, spectra were extracted from the acoustic signal in Praat. Narrowband spectra were used to compare spectral slope from the first to the third formant over the scales.
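The derivative calculation described above amounts to a finite difference over successive points of the pitch listing. The following is a minimal sketch, assuming the listing has been read into parallel lists of times (s) and fo values (Hz), with unvoiced frames stored as None. The function name, the None convention, and the choice to attach each derivative to the first point of the pair are illustrative assumptions, not details taken from the study.

```python
def fo_derivative(times, fo_values):
    """Return (time, fo, rate) tuples, with rate = (fo2 - fo1) / (t2 - t1) in Hz/s.

    Pairs that include an unvoiced frame (None) are skipped, since no
    derivative is defined there.
    """
    points = []
    for i in range(len(times) - 1):
        f1, f2 = fo_values[i], fo_values[i + 1]
        t1, t2 = times[i], times[i + 1]
        if f1 is None or f2 is None:
            continue
        rate = (f2 - f1) / (t2 - t1)
        # Assumption: plot each derivative at the first point of the pair.
        points.append((t1, f1, rate))
    return points


# Pitch listing sampled every 5 ms: steady, then a 3 Hz drop in one step.
listing_times = [0.000, 0.005, 0.010]
listing_fo = [523.0, 523.0, 520.0]
for t, fo, rate in fo_derivative(listing_times, listing_fo):
    print(f"t={t:.3f} s  fo={fo:.1f} Hz  rate={rate:.0f} Hz/s")
```

Plotting the rate against t gives the derivative-over-time view, and plotting it against fo gives the derivative-over-frequency view.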

The second signal, the airflow signal, provided information about transglottal airflow across the scales. The raw signal provided broadband airflow, illustrating clearer changes in amplitude of the signal. The airflow signal was then smoothed using 2000 smoothing cycles within the Sigplot software to better visualize gross changes in the airflow across the scale. Finally, the third signal was the EGG signal. This signal was analyzed in Sigplot and was examined according to the waveform and changes in the waveform shape as well as the smoothed EGG signal. The smoothed EGG signal was produced using 1000 smoothing cycles within the Sigplot software and indicated vertical shifts in the signal.

Identifying instabilities.

A total of 13 different instabilities were identified throughout all of the scales that were analyzed. Aphonic segments in the frequency line were common. This instability, often called a “voice break,” was marked as present when the fo signal dropped out. A large, abrupt intensity change was another instability that occurred within scales and was counted only if the intensity changed by more than 10 dB at once. This abrupt intensity change always accompanied aphonic segments, but also appeared independently within scales. Another perceptual instability often heard across the scales was abrupt fo shifts or wobbles that were unrelated to vibrato. These wobbles are often most easily perceived on sustained pitches, when the fo is expected to remain very consistent. Finally, an instability or unsteadiness could be perceived as a clear voice quality change. The presence of a quality change across the scale was determined by consensus between the principal investigator and the advisor.
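Two of the criteria above lend themselves to a simple check on the Praat listings. The sketch below is only illustrative: it assumes unvoiced pitch frames are stored as None, and it interprets a change of more than 10 dB "at once" as a difference between consecutive intensity frames. Both function names and both conventions are assumptions.

```python
def aphonic_segments(fo_values):
    """Return (start, end) index pairs of unvoiced runs that lie between voiced frames."""
    segments = []
    start = None
    seen_voiced = False
    for i, f in enumerate(fo_values):
        if f is None:
            # Only count dropouts that begin after phonation has started.
            if seen_voiced and start is None:
                start = i
        else:
            if start is not None:
                segments.append((start, i - 1))
                start = None
            seen_voiced = True
    # A trailing unvoiced run (end of the scale) is not counted as a break.
    return segments


def abrupt_intensity_changes(intensity_db, limit_db=10.0):
    """Return indices where intensity differs from the previous frame by more than limit_db."""
    return [
        i
        for i in range(1, len(intensity_db))
        if abs(intensity_db[i] - intensity_db[i - 1]) > limit_db
    ]


breaks = aphonic_segments([None, 440.0, 440.0, None, None, 494.0, None])
jumps = abrupt_intensity_changes([70.0, 71.0, 58.0, 59.0, 70.0])
```

The perceptual criteria (wobbles unrelated to vibrato and voice quality changes) were judged by listening and are not captured by checks of this kind.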

One instability recognized in the airflow signal was a large abrupt change of greater than 150 cm3/s. These changes commonly occurred across pitch changes and were not as noticeable perceptually as they were objectively. Additionally, specific to airflow, inconsistency in the airflow patterns across pitch changes was considered to be an unsteadiness within the scales. If the participant began the scale using an adductory pattern on pitch changes (according to the airflow signal) and changed to a more abductory pattern within the scale, that was considered an inconsistency and unsteadiness.

The fo derivative provided further information about unsteadiness, especially when changes were subtler. The derivative is also referred to as the rate of change of the fo. One instability noted in the rate of change plot was a pitch overshoot of greater than 375 Hz/s. A pitch overshoot appears most clearly in the plot of the derivative over time, where it creates a peak in the opposite direction of the other peaks directly following a pitch change. Secondly, the derivative plot may show large variation in the height of the peaks. Two instabilities result from this variation. The first is a measure of the maximum positive peak to the maximum negative peak, and an instability was noted when the two peaks were greater than 1600 Hz/s apart. The second is a measure of the maximum positive peak to the minimum positive peak for ascending scales or the maximum negative peak to the minimum negative peak for descending scales. To be considered an instability, the peaks had to be greater than 900 Hz/s apart.
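The three derivative-based criteria can be summarized as threshold checks on the peak values of the rate-of-change trace. The sketch below assumes the peaks (in Hz/s) have already been picked from the trace, and it simplifies the overshoot criterion by treating any peak opposite to the scale direction as an overshoot candidate. The function name, the dictionary keys, and that simplification are assumptions for illustration only.

```python
OVERSHOOT_LIMIT = 375.0       # Hz/s, peak opposite to the scale direction
POS_TO_NEG_LIMIT = 1600.0     # Hz/s, maximum positive peak to maximum negative peak
SAME_DIRECTION_LIMIT = 900.0  # Hz/s, largest to smallest peak in the scale direction


def derivative_flags(peaks, ascending):
    """Flag the three fo-derivative instabilities from a list of signed peak values."""
    positives = [p for p in peaks if p > 0]
    negatives = [p for p in peaks if p < 0]

    # Overshoot: a peak against the direction of the scale.
    against = negatives if ascending else positives
    overshoot = any(abs(p) > OVERSHOOT_LIMIT for p in against)

    # Distance from the largest positive peak to the largest negative peak.
    if positives and negatives:
        pos_to_neg = max(positives) - min(negatives)
    else:
        pos_to_neg = 0.0

    # Spread of peak heights in the direction of the scale.
    along = [abs(p) for p in (positives if ascending else negatives)]
    spread = max(along) - min(along) if along else 0.0

    return {
        "overshoot": overshoot,
        "pos_to_neg": pos_to_neg > POS_TO_NEG_LIMIT,
        "same_direction": spread > SAME_DIRECTION_LIMIT,
    }


# Descending scale whose pitch-change peaks range from -200 to -700 Hz/s.
flags = derivative_flags([-200.0, -450.0, -700.0], ascending=False)
```

In this example the spread of the negative peaks is 500 Hz/s and there are no positive peaks, so none of the three flags is raised.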

The EGG signal is the last remaining signal that provided information about unsteadiness within scales. One instability was a change in the EGG waveform height by greater than 30%. This height change indicates change in tissue contact during phonation. In addition to height change, a change in the width of the EGG waveform (EGGW50) by more than 0.1 was considered an instability. The final two instabilities look at the EGG signal as a whole. One instability is an inconsistent direction of the pattern of the DC shifts on each pitch change, which would be seen most easily on the smoothed EGG signal. Finally, the last instability was the presence of a dropout of the raw EGG signal. Many scales did not have any EGG information throughout the entire scale; these were not counted under this instability. Instead, this instability was noted when the EGG signal was present for more than one sustained pitch within the scale and then dropped out for more than one sustained pitch at any point in the scale.
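The height and width criteria can be sketched for a single EGG cycle as follows. This assumes that EGGW50 is the fraction of the cycle during which the signal lies above 50% of the cycle's height, that the cycle contains a single contact pulse, and that the 30% height change is taken relative to the earlier of two measurements. These definitions and the function names are assumptions for illustration and may differ from how the measures were obtained in Sigplot.

```python
def egg_height_and_width(cycle, fraction=0.5):
    """Return (height, normalized width) for one EGG cycle given as a list of samples."""
    low, high = min(cycle), max(cycle)
    height = high - low
    if height == 0:
        return 0.0, 0.0
    level = low + fraction * height
    # Width: share of samples above the criterion level (assumes one pulse per cycle).
    above = sum(1 for s in cycle if s > level)
    return height, above / len(cycle)


def egg_flags(height_a, width_a, height_b, width_b):
    """Compare two measurement points against the 30% height and 0.1 width criteria."""
    height_change = abs(height_b - height_a) / height_a
    return {
        "height": height_change > 0.30,
        "width": abs(width_b - width_a) > 0.1,
    }


# A symmetric triangular pulse: 3 of 8 samples lie above half height.
height, width = egg_height_and_width([0, 1, 2, 3, 4, 3, 2, 1])

# Heights of 0.06 and 0.065 with widths of 0.28 and 0.27 raise neither flag.
stable = egg_flags(0.06, 0.28, 0.065, 0.27)
```

Passing fraction=0.25 to the first function gives the corresponding width at 25% of the height under the same assumptions.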

Statistical Design

This study is exploratory and descriptive in nature. Traces of the fundamental frequency, intensity, EGG waveform and smoothed EGG, and smoothed airflow contours were examined for each scale. Differences in the cycle-to-cycle characteristics were examined for irregular

regions within each contour. Additionally, the time derivative of the fundamental frequency

signal was examined to reveal more subtle changes in the scales.

Pilot Study

The pilot phase of the study was conducted in order to determine a few of the possible

variations of the signals due to relatively controlled unsteadiness gestures while singing the

scales. The subject was the primary investigator because of her background in singing. The

subject recorded several attempts at unsteady register transitions through the primo passaggio

while phonating on the vowel /a/, representing possible techniques or strategies the participants

would use in the actual study to represent unstable transitions, including a register “break” (i.e.,

abrupt frequency shift), hyperadduction, and roughness across the transitions.

The primary investigator completed these recordings in the Voice Physiology Laboratory.

A microphone was placed approximately 3 inches from the mouth using a headband mount in

order to record the audio signal for acoustic analysis. The subject also held the vented

pneumotachograph mask tightly against her own face to record broadband flow within the

productions. The small, sterilized plastic tube included in the mask was placed in the corner of

the mouth in order to measure oral pressure to estimate subglottal pressure values. The

electroglottograph (EGG) device was placed on the skin over the right and left thyroid laminae.

The EGG signal was analyzed to infer a variety of vocal fold contact measures and disruptions in

phonation. The microphone, airflow, oral air pressure, and EGG signals were recorded into

separate channels of the DATAQ A/D converter using a sampling rate of 20,000 Hz per channel

and DATAQ’s WinDAQ software. The recordings were then imported into Sigplot. Spectral

characteristics were analyzed using the acoustic analysis program Praat.

Across each of the unsteady transitions, changes were noticed in all signals. In the

unsteady transition classified by cessation of voicing, the microphone signal and the wideband

flow signal nearly mirrored one another. However, upon smoothing the wideband airflow signal,

negative airflow measures were noticed during this time of hyperadduction and no voicing. The

airflow signal in this instance demonstrated that there was an added physiological component

during vocal fold closure. One hypothesis for this negative airflow measure was that the larynx

was lowering (with hyperadducted vocal folds not allowing airflow from the ), resulting in

an increasing volume in the supraglottal vocal tract. This increasing volume would then have

been measured as negative airflow through the mask. Both the wideband and smoothed airflow

demonstrated this negative flow during this example of cessation of voicing. In this same

unsteady transition, the EGG signal revealed a sustained, nearly flat signal during cessation of

the voice, then a short vertical jump during creaky voicing followed by a flat signal again. This

trend continued until the vocal folds became slightly less adducted and normal voicing resumed.

In an unsteady transition classified by wobbling or unstable pitch, the wideband flow

signal had more noticeable fluctuations along the envelope of the signal compared to the

microphone signal. The smoothed flow signal revealed these fluctuations as well, although the

range of variation was only about 40 cm3/s. The EGG signal was the measure that revealed the greatest fluctuation as a result of the instability. The EGG signal showed a decrease in amplitude as it reached the wobbling pitch in the transition, followed by vertical shifts in the signal, presumably representing vertical shifts of the larynx.

Another unstable transition through the primo passaggio was recognized by an abrupt frequency shift (perceived as a pitch drop in this example). The smoothed airflow signal showed a decrease of about 80-100 cm3/s at the point in the recording where the frequency shifted rapidly. The EGG signal showed a small vertical shift as well when the pitch shifted downward.

The final example of an unstable transition was an abrupt shift from chest voice to head voice across the primo passaggio transition. At the abrupt shift from chest to head voice, the microphone signal showed a small reduction in amplitude, but otherwise remained stable. The wideband airflow signal showed a more pronounced reduction in amplitude and a shift upward.

The smoothed airflow signal revealed a similar upward shift of about 175 cm3/s during the abrupt change from chest to head voice, indicating a higher positive flow of air when the singer used head voice rather than chest voice. At this same point in the recording, the EGG signal showed a small decrease in amplitude of the signal, but did not show any vertical shift, indicating that the vertical position of the larynx was most likely relatively stable when shifting between chest voice and head voice.

CHAPTER III: RESULTS

To begin analysis, the recorded scales were categorized based on perceptual smoothness of the auditory signal. Both investigators agreed upon an appropriate level for each scale based on perceptual quality and perceptual dynamics of the scale. Six levels were initially established to represent the continuum of smoothness to unsteadiness; levels were labeled L1-L6, with category L1 being perceived as the smoothest and category L6 perceived as the most unsteady.

However, the six levels were then classified into three larger groups, marked as Smooth (L1, L2), Middle (L3, L4), and Unsteady (L5, L6). When the three-group categorization was complete, the groups were nearly equal: 36.4% (32 of 88 scales) of the analyzed scales by these untrained singers were perceived in the Smooth category, 27.3% (24 of 88 scales) were perceived to be in the Middle category, and 36.4% (32 of 88 scales) in the Unsteady category.

These three overarching groups allowed for determination of salient features of unsteadiness based on observations and analyses of the acoustic, airflow, intensity, and EGG patterns.
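The collapse from six levels to three groups, and the share of scales in each group, can be expressed as a short sketch. The group counts are the ones reported above; the function and variable names are illustrative assumptions.

```python
LEVEL_TO_GROUP = {
    "L1": "Smooth", "L2": "Smooth",
    "L3": "Middle", "L4": "Middle",
    "L5": "Unsteady", "L6": "Unsteady",
}


def group_of(level):
    """Map a perceptual level label (L1-L6) to its three-way group."""
    return LEVEL_TO_GROUP[level]


def group_shares(counts):
    """Return each group's percentage of all scales, rounded to one decimal place."""
    total = sum(counts.values())
    return {group: round(100.0 * n / total, 1) for group, n in counts.items()}


# Group counts as reported for the 88 analyzed scales.
shares = group_shares({"Smooth": 32, "Middle": 24, "Unsteady": 32})
```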

Smooth Group

In the smooth group, the salient features tended to be the following: consistent voice quality throughout the scale, minimal pitch and intensity variations (with the exception of vibrato), and consistent airflow patterns and strategies. However, in the smooth group, there were often some instabilities that went undetected auditorily, including pitch overshoots and large airflow variations. Examples from the smooth group will be presented in this section.

Participant FP1 produced a perceptually smooth descending scale from C5 to C4 at the louder intensity level. This was the participant’s ninth token produced on an /i/ vowel. The contours for intensity, fundamental frequency fo, smoothed EGG, and smoothed airflow are

shown in Figure 1. In general, the signals indicate smooth changes from pitch to pitch across the

scale. The intensity contour shows an initial intensity near 80 dB at the top of the scale. The

intensity ends near 65 dB at the bottom of the scale, but this intensity descent of about 15 dB is

gradual and consistent. The fo contour shows the step-down pattern of a descending scale with

continuous phonation (i.e., no pitch breaks). Each pitch has relatively minimal fo variability, and

there are no major overshoots of pitch on each pitch change. This group of features indicates

perceptual smoothness in the corpus of the perceptually smooth scales by the participants.

Figure 1: FP1 Smooth Scale Contours Intensity (dB), Fundamental Frequency (Hz), Smoothed EGG (nondim), and Smoothed Airflow (cm3/s). Sung by FP1 on /i/ vowel, loud intensity level, from C5 to C4 pitches.

Of note is that the EGG signal shows larger DC shifts (i.e., shifting of the mean value of

the signal up or down) on the last four pitch changes compared to the first three pitch changes,

suggesting a vertical shift of the larynx in a specific direction on each of the pitch changes. It is

important to note that, because the EGG signal was not calibrated for the relation among

electrode placement, vocal fold level, and smoothed EGG value, there was little confidence in stating which direction the larynx moved when there were upward or downward shifts in the smoothed EGG signal. The prominent DC shifts in the example of Figure 1 may indicate that the voice is in a different register, but there was no perception of a registration change across the scale. According to Figure 2, the EGG signals before and after the first large laryngeal shift (at

504.9 seconds) appear to be very similar in shape and size. The EGG signal before the laryngeal shift has an EGGW50 value of 0.28 and an average height around 0.06. The EGG signal after the shift has an EGGW50 value of 0.27 and an average height around 0.065. The difference between these measures is not significant; however, the height of the signals is relatively small, which may indicate only a small change in tissue contact throughout the scale. The cycle period is slightly increased for the signal representing “after” the shift, which is attributed to being on a lower fo. Based on this evidence, registration likely does not change over this scale.

Figure 2: Trend-removed EGG signals in FP1 Smooth scale.

The contour of the airflow signal in Figure 1 mirrors the fairly smooth intensity contour. The airflow shows an increase of nearly 50 cm3/s over the first two notes and

then has a gradual decrease over the rest of the scale, running parallel to the decreasing intensity

contour. Major inconsistencies in the airflow signal are not seen except for a decrease of 30 cm3/s

at 504.9 seconds (where the large laryngeal shift is also seen). Most of the other fo changes

result in an apparent small adductory strategy at the fo change boundary, where the flow

decreases on the fo change and then increases back to a stable value shortly after, when the fo is

also stabilized. This strategy for fo change is not uncommon and will be seen in other examples.

The gradual decrease in airflow suggests greater adduction as the pitch is lowered throughout the

scale.

More subtle measures were also used to reveal possible instabilities; these include the

rate of change of fo (fo time derivative) and fo rate of change compared to fo (fo derivative versus

fo). Figure 3 shows the fo derivative over time and Figure 4 shows the fo derivative against fo.

Seven negative peaks in Figure 3 and seven arches in Figure 4 can be seen clearly representing

the seven pitch changes. That is, as the pitch changes to its next lower note, the fo decreases, shown by the negative peak, which for this scale varies from about -200 Hz/s to -700 Hz/s.

However, the circled areas in Figures 3 and 4 represent locations in the scale where the fo is

more variable and more unsteady during the “constant” pitch of the scale (not during the move

from one pitch to another); the first location occurs during the first note of the scale and the

second location occurs during the penultimate note. The variability of these two notes is greater

than the variability seen for the other notes (between the large negative dips). In Figure 4, the

ovals represent the two sustained pitches just discussed, where the first and penultimate fo values

are not as tight and filled as the remaining notes. A tight dispersion indicates little variation of fo

during the sustaining of a pitch of the scale (suggesting a constant, “straight” tone), whereas a

wide dispersion suggests less control of the fo during the note (if there is vibrato, the dispersion

will be wider, see below). These subtle measures reveal more elusive instability that may not

always be perceived aurally but can still reside in the voice.

Figure 3: fo derivative over time for the FP1 smooth scale.

Figure 4: fo derivative over frequency fo for the FP1 smooth scale. The x-axis shows the fo descending left to right because the scale was descending.

Despite the noted fo variations in Figures 3 and 4, the scale was still perceived as

smoothly produced. The subtle measures apparently did not create any large acoustic changes

that could have affected the perception of the steadiness of the scale. As a result, this scale was

placed in L1 and classified as “Smooth.”

Participant FP2 created a perceptually smooth descending scale from Eb5 to Eb4 at a soft

intensity level. This was the 19th token produced by FP2 and was sung as a natural octave scale

for her. The contours for intensity, fo, smoothed EGG, and smoothed airflow are shown in Figure

5. The intensity contour shows an initial intensity near 63 dB at the top of the scale. The intensity

ends near 50 dB at the bottom of the scale, but the intensity descent is not as consistent nor as

smooth as the first L1 example discussed above. This may be due in part to the differing

loudness level. It apparently becomes more difficult to control intensity and subglottal pressure

the quieter the voice. The fo contour shows the step-down pattern of a descending scale with continuous phonation (i.e., no pitch breaks). Each pitch has minimal fo variability, with slight

pitch increases prior to each pitch change.

Figure 5: FP2 Smooth Scale 1 Contours Intensity (dB), Fundamental Frequency (Hz), Smoothed EGG (nondim), and Smoothed Airflow (cm3/s). Sung by FP2 on /i/ vowel, soft intensity level, from Eb5 to Eb4 pitches.

The smoothed EGG signal shows consistent DC shifts (i.e., shifting of the mean value of

the signal up or down) on all pitch changes, with slight variations and increased shifting near the

end of the scale. The similarities of the EGG shifts may signify a smooth scale. These DC shifts

are unique, however, because there is a minor negative EGG shift at the downward change of the

fo, followed by a positive shift after the new pitch has been established, suggesting that there is

another minor shift in laryngeal position during the relatively constant pitch.

The smoothed airflow shows a very specific pattern throughout the entire scale. At each

pitch change, the airflow decreases, possibly indicating an increased adductory strategy of the

vocal folds or a decrease in subglottal pressure. As displayed by the vertical line in Figure 5, the negative peak of the airflow occurs along the pitch change, near the fastest rate of change of the pitch. Additionally, the intensity is on an upward slope at the point of the negative peak airflow.

When the airflow reaches a positive peak, the intensity also reaches a positive peak and the fo is

relatively stable. The signals during the change of pitch from one note to the next suggest a

coordination of greater adduction (to decrease the flow momentarily) with reduced CT

contraction (to reduce the length of the vocal folds to lower the fo) with increased subglottal

pressure (to create a rise in intensity). While some airflow changes are greater than others

throughout the scale, the strategy remains the same and indicates a smooth and consistent

approach to producing the octave scale.

While this scale is classified as smooth, there are some minor instabilities revealed in the subtler measures of rate of change of the fundamental frequency. Figure 6 shows the fo

derivative over time and Figure 7 shows the fo derivative against fo. In Figure 7, the fourth, sixth,

and seventh fo values have obvious variation from the center of the note. However, based on

Figure 7, it is not possible to tell when (during the sustained pitch) these variations occur. Figure

6 provides the specific locations where the variations occur, all of which show a peak in the

opposite direction of the pitch change prior to the change (marked with circles on Figure 6).

Additionally, the fastest fo rate of change occurs on the second to third note, where the fo rate of

change reaches nearly -1350 Hz/s. In contrast, the slowest fo rate of change occurs on the pitch change from the first to second note, and only reaches about -550 Hz/s. It is important to note that the first and fifth pitch changes are half steps rather than whole steps, which would likely lead to a slower fo rate of change. While these differences cannot always be specifically

heard and identified as the scale is being sung, the subtle measures catch these specific

instabilities and explain them based on fo changes.

Figure 6: fo derivative over time for FP2 smooth scale 1.

Despite the noted fo variations in Figures 6 and 7, the scale was still perceived as being

produced smoothly. The subtle measures did not reveal any large changes that could have

affected the perception of the steadiness of the scale. As a result, the scale was appropriately

classified as “Smooth.”

Figure 7: fo derivative over frequency for FP2 Smooth scale 1. The x-axis shows the fo descending left to right because the scale was descending.

FP2 created another scale perceived as smooth. This scale had some perceivable instabilities, but the instabilities did not cause the scale to be considered as unsteady for an untrained singer. The scale is an ascending scale from D4 to D5 at a loud intensity level. This was the 16th token produced by FP2 and was sung as a natural octave scale for her. The contours for intensity, fo, smoothed EGG, and smoothed airflow are shown in Figure 8. The intensity contour shows an initial intensity near 65 dB at the bottom of the scale and about 71 dB at the top of the scale, an increase of 6 dB. The intensity contour shows intensity rise-fall across the entire scale, where intensity increases and decreases again across each sustained pitch. The intensity reaches a minimum just before the maximum peaks of each frequency change, and along the descent of the flow contour for each pitch. The fo contour shows the step-up pattern of an ascending scale with continuous phonation (i.e., no pitch breaks), but there are visible pitch overshoots near 229.6 seconds and 230.3 seconds and wavering sustained pitches on the final three pitches. The wavering or variation of the last two pitches is also visible in the intensity

contour, which shows a “bumpier” contour rather than smooth arches (as seen across the prior

pitches of the scale).

Figure 8: FP2 Smooth Scale 2 Contours Intensity (dB), Fundamental Frequency (Hz), Smoothed EGG (nondim), and Smoothed Airflow (cm3/s). Sung by FP2 on /a/ vowel, loud intensity level, from D4 to D5 pitches.

The airflow contour shows a very interesting, yet understandable, pattern. The general

pattern includes a large increase of about 75 cm3/s across a single pitch where the minimum value occurs just after the pitch change ends and the maximum value occurs just as the pitch change begins. This pattern mirrors that of the intensity pumping, where the airflow increases across the sustained pitch and then quickly decreases and reaches a minimum during the pitch change. The minimum value of airflow occurs directly after the minimum value of intensity; the same sequence occurs for the maximum values between the airflow and intensity. This motion of the airflow in conjunction with the pattern observed in the intensity contour would support the notion that this participant was using increased subglottal pressure on each sustained pitch, then an increased adductory strategy on each pitch change.

One deviation from the airflow pattern was noticed just after 228 seconds in Figure 8, where the airflow does not increase across the second sustained pitch and then decreases well beyond the starting airflow of the pitch (by about 125 cm3/s compared to <50 cm3/s). At this same time point in the audio recording, there is little to no noticeable quality change in the voice, though it is possible to hear that the participant is breathier at the start of the scale than at the end of the scale. This is an unusual airflow contour for an ascending scale because the voice more often becomes breathier (i.e., has greater airflow) at the top of the scale, while this scale shows the opposite phenomenon.

The smoothed EGG signal seen in Figure 8 does not appear to have any marked changes in DC shift. The most noticeable shift in the smoothed signal occurs near 229.5 seconds, where the signal shifts downward and then back upward in a rounded motion, which contrasts with the sharper peaks usually seen with DC shifts. The non-smoothed raw EGG signal is present during the scale until the final two notes, where it appears the signal is not strong enough or the larynx shifted out of the detectable region. Figure 9 shows the EGG waveform shape at several different locations along the scale.

Figure 9: Trend-removed EGG signals across FP2 Smooth scale 2.

According to the figure, the amplitude of the signal decreased over the three points, beginning at

0.15 amplitude at the start of the scale and ending near 0.1 amplitude when the EGG signal drops out. This amplitude change is most likely due to the increase of pitch along the scale; the lengthening of vocal folds with higher pitches results in less tissue contact.

With the large drop in airflow between the second and third pitch, one may have expected a more drastic change in the shape of the raw EGG signal. However, the shape remains consistent across all three locations, specifically the presence of the extra peak on the right side of each maximum peak, called a “right fender.” As a result, the EGGW50 value remained nearly the same through all three locations, ranging from 0.185 to 0.21. The EGGW25 value shows the largest change among the three signals, measuring 0.585, 0.285, and 0.24 (from the first to the last grouping). This larger change is due to the inclusion of that right fender in the measurement of the EGG pulse width.

Based on the contours found in Figure 8, this scale contains more indications of instability, and some level of instability was perceived audibly. Figures 10 and 11 below include the subtler measures of rate of change: fo rate of change and fo rate of change compared to fo, respectively. In Figures 10 and 11, the first four pitches and first three pitch changes display relatively steady sustained sound and reasonable rates of change, respectively. However, beginning with the fourth pitch change, unsteadiness is seen in the fo derivative plot. Directly after the fourth and fifth pitch changes, there are large pitch overshoots into the negative direction (noted with circles in Figure 10). The smoothest scales generally do not display drastic pitch overshoots. After these overshoots, the final circle in Figure 10 highlights the large pitch variations of the final two pitches in the scale. These variations are not considered to be elements of unsteadiness. The audio recording reveals that the participant was apparently using, or attempting to use, vibrato on the final two notes, which helps to perceivably smooth out the final notes of the scale.

Figure 10: fo derivative over time for FP2 smooth scale 2.

Figure 11 shows an important contrast seen between the last four notes. On the fifth and sixth note, the overshoot is recognized by a single line surrounding the established pitch. The final two notes show the visual representation of pitch variability, where the tight circle is never quite established, and instead there is a larger circular pattern around the intended pitch, reflecting the “vibrato” production.

Figure 11: fo derivative over frequency for FP2 Smooth scale 2.

This scale produced by FP2 was perceived to be smooth with minor instabilities. However, based on the measurements and contours displayed, the scale contained more instabilities and inconsistencies than were perceived while listening to the scale. This classification may not be appropriate when combining perceptual opinions with signal analysis/visual observations but does suggest that some overshoot of pitch at the pitch change locations appears to be permissible from a smooth production orientation.

Middle Group

In the middle group, the salient features tend to be the following: perceptible but fairly smooth registration shifts, minor pitch bobbles, non-vibrato pitch variations, and quality changes across the scale. Scales perceived to be part of the middle group generally have overwhelmingly smooth features with one or two instances of perceptible changes or instabilities.

FP3 created a scale that was perceived as part of the middle group. This scale had perceivable instabilities and inconsistencies, but the instabilities did not cause the scale to be considered fully unsteady. The scale is an ascending scale from C4 to C5 at a loud intensity level. This was the 20th token produced by FP3 and was sung on an /i/ vowel. The contours for intensity, fo, smoothed EGG, and smoothed airflow are shown in Figure 12. The intensity contour shows an initial intensity around 73 dB at the bottom of the scale and 70 dB at the top of the scale, indicating a relatively consistent intensity production across the scale. The intensity contour shows minimal variation (within about 3 dB) throughout the scale until the final pitch, where there is wavering that occurs in the opposite direction of (out of phase with) the fo variation. The fo contour shows the step-up pattern of an ascending scale with continuous phonation (i.e., no pitch breaks), but there are visible pitch overshoots on the first three pitch changes and wavering on the final pitch. In the penultimate note, indicated by the circle, one of the more prominent areas of unsteadiness occurred. The participant completed the pitch change at 769 seconds, but then had an abrupt pitch variation (rise and fall) once again before settling into a relatively constant fo. In the audio, this is perceived as a voice quality change and a potential registration shift from a more modal to a more head register quality.

Figure 12: FP3 Middle Scale Contours. Intensity (dB), Fundamental Frequency (Hz), Smoothed EGG (nondim), and Smoothed Airflow (cm³/s). Sung by FP3 on /i/ vowel, loud intensity level, from C4 to C5 pitches.

Figure 12 also shows that the smoothed airflow remains relatively stable near 80 cm³/s for the first 4 notes. Then, beginning on the fifth note, the airflow rises and approximately doubles by the end of the scale. This rise in airflow suggests less glottal adduction for the second half of the scale, with the rise beginning prior to the point of instability circled in Figure 12. The airflow dips during the first pitch and at the change to the second pitch, and subsequently has an airflow rise with its peak during the pitch rise for the last 5 pitches. The raw EGG waveforms are lost (i.e., the amplitude is nearly zero) right at the spot of the instability, so it was not possible to measure changes in adduction specific to EGGW or amplitude. However, laryngeal changes must have occurred at that point of instability because of the loss of the EGG waveform. The largest smoothed EGG signal DC shift occurs between the fourth and fifth note, when the airflow begins to rise.

Because of the perceived change in voice quality across this region, a comparison of spectral slices of the audio signal before and after the instability was completed. Figure 13 shows two spectral slices, and two straight lines showing the slope of each spectral slice (based on the envelope over the first and third formants). The blue contour represents a spectral slice taken near 766.7 seconds (-1.10 × 10⁻² dB/Hz), which is well before the moment of instability. The red contour represents a spectral slice taken near 769.5 seconds (-1.59 × 10⁻² dB/Hz), occurring after the moment of instability. The spectral slope is steeper for the slice taken after the point of instability (45% steeper), indicating a possible shift to head voice.

Figure 13: Spectral slope comparisons for FP3 Middle Scale. The blue line represents a spectral slice taken at 766.7 seconds. The red line represents a spectral slice taken at 769.5 seconds. This is a spectral slice from Praat with window length of 0.005 s.
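The "45% steeper" figure can be checked with a short calculation. The sketch below assumes the percentage is computed relative to the shallower slope; that convention is not stated explicitly, but it reproduces the value reported for the two slices in Figure 13. The function name is hypothetical.

```python
def percent_steeper(slope_a, slope_b):
    """Percent by which the steeper slope exceeds the shallower one in magnitude.

    Assumption: the difference is expressed relative to the shallower slope.
    """
    steep, shallow = sorted((abs(slope_a), abs(slope_b)), reverse=True)
    return 100.0 * (steep - shallow) / shallow

before = -1.10e-2  # dB/Hz, slice near 766.7 s (before the instability)
after = -1.59e-2   # dB/Hz, slice near 769.5 s (after the instability)

print(round(percent_steeper(before, after)))  # 45
```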

The EGG waveform can also provide information across the scale. Although the EGG waveform was unavailable after the quality change, the waveform changes could be measured prior to the perceptual quality change, specifically across the largest DC shift where the airflow values began to increase. Figure 14 shows the change in the EGG waveform (height and width changing) when comparing trend-removed raw EGG signals from before the DC shift and airflow increase (at 763.9 seconds, during the first pitch) and after the shift and airflow increase (at 768.5 seconds, during the sixth pitch). The contours show that the height of the EGG signal at 763.9 seconds is nearly seven times the height of the signal at 768.5 seconds. Additionally, the widths of the signals are vastly different; the signal before the shift has an EGGW50 value of 0.63 and the signal after the shift has an EGGW50 value of 0.33. This large difference between the two signals indicates a potential registration shift occurring prior to the perceptual quality change, with the EGG analysis values suggesting less adduction and less vocal fold contact after the shift.

Figure 14: Trend-removed EGG signals for FP3 Middle Scale. The taller EGG waveform was taken at time 763.9 s, and the shorter at 768.5 s.
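The height and width comparison can be sketched in code. The definition used here is an assumption: EGGW50 is taken to be the fraction of one EGG cycle during which the signal lies above 50% of its peak-to-peak height, which presumes a single contact pulse per cycle. The function names and the synthetic rectangular cycles are illustrative only; they are built to mimic the reported values (0.63 versus 0.33, and a height ratio near seven) and are not study data.

```python
import numpy as np

def eggw50(cycle):
    """Fraction of one EGG cycle spent above 50% of its peak-to-peak height.

    Assumed definition; presumes one contiguous contact pulse per cycle.
    """
    cycle = np.asarray(cycle, dtype=float)
    lo, hi = cycle.min(), cycle.max()
    if hi == lo:
        raise ValueError("flat cycle: no measurable EGG height")
    threshold = lo + 0.5 * (hi - lo)
    return np.count_nonzero(cycle > threshold) / cycle.size

def egg_height(cycle):
    """Peak-to-peak height of one trend-removed EGG cycle."""
    cycle = np.asarray(cycle, dtype=float)
    return float(cycle.max() - cycle.min())

# Synthetic rectangular cycles of 100 samples each (illustration only).
before_shift = np.zeros(100)
before_shift[10:73] = 1.0    # wide, tall pulse
after_shift = np.zeros(100)
after_shift[10:43] = 0.15    # narrow, short pulse

width_before, width_after = eggw50(before_shift), eggw50(after_shift)
height_ratio = egg_height(before_shift) / egg_height(after_shift)
```

A smaller EGGW50 together with a smaller height is read in the text as less vocal fold contact, which is why both quantities are computed per cycle here.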

In the case of this scale, the subtler measures support the more significant instabilities heard, and possibly reveal additional instabilities. Figure 15, giving the fo rate of change, shows the steady and consistent first six notes and five pitch changes. However, with the sixth pitch change, there is a double peak, where the initial pitch change attempt was made, followed by another abrupt pitch shift. This is uncommon and not seen in the smooth scales. Finally, the last pitch change is seen to be the fastest and had a large overshoot in the negative direction, followed by excessive pitch variation (not characterized by vibrato but rather vocal instability).

Figure 15: fo derivative over time for FP3 Middle scale.

Figure 16 shows the fo rate of change compared to fo. The point of instability (normal pitch change followed by abrupt frequency shift near 500 Hz) is demonstrated in this plot with a disruption in the rounded arch representing the pitch change. Instead, the plot shows two smaller arches between 460 and 510 Hz, where there is a large notch in the normal arch. The final pitch is shown to have significant variations based on the shape of the circular pattern surrounding the intended pitch.

Figure 16: fo derivative over frequency for FP3 Middle scale.

FP2 also created a scale that was perceived as part of the middle group. This scale had perceivable instabilities and inconsistencies and was more closely comparable to the unsteady scales than the smooth scales. Despite this, the scale is still perceived as having enough smooth elements to maintain its classification in the middle group. The scale is a descending scale from G4 to G3 at a loud intensity level. This was the 23rd token produced by FP2 and was sung on an /a/ vowel. The contours for intensity, fo, smoothed EGG, and smoothed airflow are shown in Figure 17. The intensity contour shows an initial intensity around 70 dB at the top of the scale and about 60 dB at the bottom of the scale, a change of 10 dB. The drop in the intensity, however, is smooth and gradual, with minimal interruption. One change of direction occurs at 248.5 seconds, where the intensity increases by about 2 dB. Then the intensity returns to a gradual descending pattern. The fo contour shows the step-down pattern of a descending scale with continuous phonation (i.e., no pitch breaks), but there are visible pitch overshoots on the fifth and sixth pitch changes and wavering sustained fo throughout the scale. On the fifth pitch change, there is an abrupt quality change in the voice.

Figure 17: FP2 Middle Scale Contours. Intensity (dB), Fundamental Frequency (Hz), Smoothed EGG (nondim), and Smoothed Airflow (cm³/s). Sung by FP2 on /a/ vowel, loud intensity level, from G4 to G3 pitches.

The airflow contour shows a unique and idiosyncratic pattern. Initially, the airflow values are relatively high with a pattern that shows a gradual increase over the first four notes, with a positive extra pulse of air at the pitch change locations, as if there were brief and slightly greater abduction at those locations. The airflow pattern then becomes irregular across the fourth and fifth pitches. On the fifth pitch change (at 247.8 seconds), the airflow decreases by more than 75 cm³/s, suggesting an increase in glottal adduction but not in subglottal pressure (because the intensity stays relatively constant). Across the sixth sustained pitch, the airflow decreases by about 50 cm³/s more. Then, on the sixth (penultimate) pitch change, the airflow dips another 75 cm³/s, again suggesting further glottal adduction, but again without subglottal pressure rise since the intensity actually decreases. The negative peak of the airflow at the pitch change between the sixth and seventh pitches lines up with the negative peak of the fo contour, and then the airflow increases by about 50 cm³/s with the fo again. Then during the seventh and eighth notes the airflow reduces further, to about 140 cm³/s. This is the largest overall variation of airflow noted in a scale up to this point; it is surprising that this scale was classified as part of the middle category rather than the unsteady category due to the airflow variability. However, because the airflow cannot directly be heard, and the intensity and fo remain relatively consistent and stable, the middle category classification may be reasonable.

Because of the noted quality change, spectral slice comparisons and EGG waveform comparisons were completed. Figure 18 displays the spectral slice comparison between a point prior to the quality change and airflow shift (at 245 seconds during the second pitch) and a point after the quality change and airflow shift (at 249 seconds during the seventh pitch). Two straight lines were drawn as envelopes to the first and third formants. The slope of the spectrum prior to the quality change is steeper (-1.25 × 10⁻² dB/Hz for the blue line) than the slope of the spectrum after the quality change (-1.03 × 10⁻² dB/Hz for the red line; the earlier slope is about 21% steeper), indicating that the participant likely shifted from a head voice register for the higher pitches to a more modal register for the lower pitches.

Figure 18: Spectral slope comparisons for FP2 Middle Scale. The blue contour represents a spectral slice taken at 245 seconds. The red contour represents a spectral slice taken at 249 seconds. These slices are from Praat with window length of 0.005 s.

While the spectral slope change may indicate a registration shift, the EGG waveform change across the quality change may refute or support this claim. Figure 19 displays the EGG waveforms at 246.5 seconds (prior to the quality change) and 249 seconds (after the quality change). The amplitude of the signal at both time points remains nearly equivalent. However, the EGGW50 value does change across the two time points. Prior to the quality change, the EGGW50 value was 0.26. After the quality change, the EGGW50 value was 0.37. The smaller value prior to the quality change indicates less tissue contact during phonation, which can be a characteristic of the head voice register. When the EGGW50 value increases after the quality change, there is more tissue contact with phonation, indicative of modal registration. Based on these findings, the conclusion can be drawn that the participant did shift registration during the scale, resulting in a perceptual quality change and consequent decrease in smoothed airflow (greater adduction), EGGW50 increase (suggesting greater adduction), and decreased spectral slope (also suggesting greater glottal adduction). The participant attempted to navigate through this scale smoothly, but a shift was nonetheless obvious, resulting in the classification in the middle category.

Figure 19: Trend-removed EGG signal for FP2 Middle Scale.

Unsteady Group

In the unsteady group, the salient features tended to be the following: abrupt voice quality changes often attributed to registration, large pitch overshoots and variations (unrelated to vibrato), aphonic segments, loss of EGG signal waveform due to laryngeal height or tissue contact change, and unexpected intensity changes.

FP5 produced a scale that was classified in the unsteady group. This was the fifth token produced by FP5 and it was sung on an /a/ vowel. This scale had perceivable instabilities and inconsistencies. The scale is a descending scale from F5 to F4 at a loud intensity level. The contours for intensity, fo, smoothed EGG, and smoothed airflow are shown in Figure 20. The intensity contour shows an initial intensity around 77 dB at the top of the scale and about 70 dB at the bottom of the scale, a change of about 7 dB. The drop in the intensity occurs over the first four notes with considerable variation during each note (there is a 5 dB variation during the fourth note, for example). However, across the last four notes, the intensity contour stabilizes and hovers between 70 and 72 dB. The fo contour shows the step-down pattern of a descending scale with continuous phonation (i.e., no pitch breaks), but, like the intensity contour, the first four pitches show variations across the sustained pitches and inconsistent pitch changes. The last three pitches are relatively stable and consistent, with appropriate rates of change across the pitch changes. The fourth pitch change and the fifth pitch (beginning at 69 seconds) serve as a transition between the unsteady and inconsistent first half of the scale and the more stable and consistent second half. This distinct division between the two parts of the scale corresponds with a perceptual quality change, resulting in the unsteady classification.

Figure 20: FP5 Unsteady Scale 1 Contours. Intensity (dB), Fundamental Frequency (Hz), Smoothed EGG (nondim), and Smoothed Airflow (cm³/s). Sung by FP5 on /a/ vowel, loud intensity level, from pitches F5 to F4.

The smoothed airflow pattern follows the distinct change halfway through the scale. At a pitch change the smoothed airflow dips to a minimum value, considered to be an adductory gesture, and then immediately increases to a value slightly higher (abductory) than the subsequent airflow value for the rest of the note, until the next pitch change. The airflow pattern emphasizes the adductory-related dip for the second and third note changes, whereas the abductory-related increase after the dip is emphasized for the latter part of the scale, at the fifth through seventh pitch changes. For example, at 67.6 seconds, the airflow starts around 120 cm³/s, drops to about 50 cm³/s, “rebounds” to about 140 cm³/s, and then settles back to about 100-110 cm³/s. At 69 seconds, the airflow pattern shows small variations, different from the previous adductory pattern. Beginning at 69.5 seconds, the airflow pattern becomes more abductory, where there is a minor decrease in airflow, followed by the airflow increase of about 50 cm³/s, and then decreases back to the initial flow value. The difference in the two strategies is subtle, but the change in the airflow signal is aligned with the change in the perceptual signals of intensity and frequency at 69 seconds.

The smoothed EGG signal shows a dip at each pitch change after the first pitch change, but the complexity of the DC shifting of the signal increases during the second half of the scale, where there is a DC shift in the positive direction during each sustained pitch. The EGG waveform is not picked up well in the first half of the scale (the EGG height is minimized). When the scale shifts to the second half at 69 seconds (where the patterns of the signals change), the EGG waveform is picked up so that the raw EGG can be seen. This could be an indication of laryngeal shift into the region of the EGG electrodes or increased vocal fold tissue contact. Figure 21 displays the raw EGG signal across the scale, showing the waveform appearance at the halfway point (corresponding with 69 seconds in Figure 20).

Figure 21: Raw EGG signal across FP5 Unsteady Scale.

Because of the noted quality change, a spectral slice comparison was completed. Figure 22 displays the spectral slice comparison between a point prior to the quality and contour changes (at 67.3 seconds during the second note of the scale, the blue spectrum) and a point after the quality and contour changes (at 70.5 seconds during the seventh note of the scale, the red spectrum). Two straight lines were then drawn to illustrate the slope of each spectrum between the first and third formants. The slope of the spectrum prior to the quality change is steeper (-1.90 × 10⁻² dB/Hz compared to -1.09 × 10⁻² dB/Hz, a difference of 74%), indicating that the participant likely shifted from a head voice register to a modal register. The difference between the slopes of the two spectra is quite prominent, highlighting the reasonable cause for the unsteady categorization.

Figure 22: Spectral slope comparisons for FP5 Unsteady Scale. The blue contour represents a spectral slice taken at 67.3 seconds. The red contour represents a spectral slice taken at 70.5 seconds. These slices are from Praat with window length of 0.005 s.

Finally, the subtler measures support the significant instabilities perceived. Figure 23 shows the unsteady and inconsistent first four notes, with the fifth note serving as a transition note between the two parts of the scale. The overshoot following the second pitch change is the most prominent, producing a positive peak of nearly 300 Hz/s. Each of the first four sustained pitches has quite a bit of variation and inconsistency, especially when compared to the last three notes. The fifth note shows the greatest amount of variation and instability, varying from 250 Hz/s to -350 Hz/s. The final three notes contain pitch overshoots from 150-250 Hz/s, which may still be considered unsteady, but the last three pitches only vary from about 100 Hz/s to -100 Hz/s. The second half of the scale is much smoother than the first half, but this distinct difference in the scale results in the perception of unsteadiness and inconsistency.

Figure 23: fo derivative over time for the FP5 Unsteady scale.

Figure 24 shows the fo rate of change versus fo. While the peaks of the rates of change remain fairly consistent throughout the scale, the pitch variation becomes clearer in this phase plot. The third, fourth, and fifth sustained pitches in this figure display clear variation by straying from the desired tightness of the point. The third note shows a large pitch variation prior to the establishment of the true pitch; the fourth pitch circles around the intended heart of the pitch but never creates a tight circle at that point; the fifth pitch shows more accurate pitch establishment, but greater fluctuation of the rate of change.

Figure 24: fo derivative over frequency for the FP5 Unsteady scale.

The many indications of instability and inconsistency resulted in this scale being categorized in the unsteady group. This may be attributed to the fact that the second half of the scale did smooth out with no pitch breaks, which was reflected in both the fo and the intensity. Additionally, the fo rates of change were consistent throughout the scale, which may have resulted in the perception of some level of stability.

FP1 produced perceptually inconsistent ascending scales from C4 to C5. The contours for intensity, fo, smoothed EGG, and smoothed airflow are shown in Figure 25 below.

Figure 25: FP1 Unsteady Scale Contours. Intensity (dB), Fundamental Frequency (Hz), Smoothed EGG (nondim), and Smoothed Airflow (cm³/s). Sung by FP1 on /i/ vowel, soft intensity level, from C4 to C5 pitches.

The intensity contour shows an initial intensity near 70 dB at the bottom of the scale. The intensity ends near 75 dB at the top of the scale, only increasing by a net of 5 dB across the whole scale. The shape of the intensity trace is inconsistent, with small increases, or “pumping,” over most of the sustained pitches. There are also two significant abrupt drops of the intensity, occurring near 81.25 seconds and 84.8 seconds, where the intensity drops by more than 30 dB and more than 20 dB, respectively. Compared to the intensity contour seen in Figure 1, this contour is inconsistent and unsteady across the whole scale.

The fo contour shows the step-up pattern of an ascending scale. The first four pitches appear to have minimal variability, but then the last four pitches vary more. Specifically, the circled location on Figure 25 shows a large pitch overshoot followed by pitch variability, or “wobbling,” which may be interpreted as an attempt to establish a stable pitch. Figure 26 shows a zoomed-in image of the four contours during the first large intensity drop. It can be seen that the voice actually ceases for about 100 ms, resulting in a pitch break (pitch cessation) and intensity drop.

Figure 26: FP1 Unsteady Scale Contours – first intensity drop. Zoomed in from 80.5 seconds to 82 seconds.
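A pitch break of this kind can be located automatically in an fo contour. The following is a minimal sketch under stated assumptions: unvoiced frames are taken to be marked as NaN or 0 Hz, the frame step is 10 ms, and a gap must last at least 50 ms to count. None of these choices comes from the study, and the contour is hypothetical.

```python
import numpy as np

def find_voice_breaks(fo_hz, frame_step_s, min_gap_s=0.05):
    """Return (start_time_s, duration_s) for each unvoiced gap inside phonation.

    Assumptions: unvoiced frames are NaN or <= 0 Hz; gaps at the very start
    or end of the contour are ignored because they are not breaks in voicing.
    """
    fo = np.asarray(fo_hz, dtype=float)
    voiced = np.nan_to_num(fo, nan=0.0) > 0
    breaks = []
    start = None
    for i, is_voiced in enumerate(voiced):
        if not is_voiced and start is None:
            start = i                      # gap begins
        elif is_voiced and start is not None:
            duration = (i - start) * frame_step_s
            if start > 0 and duration >= min_gap_s:
                breaks.append((start * frame_step_s, duration))
            start = None                   # gap ends
    return breaks

# Hypothetical contour: 200 ms voiced, a 100 ms cessation, then voicing resumes.
fo = np.concatenate([np.full(20, 330.0), np.full(10, np.nan), np.full(20, 349.0)])
gaps = find_voice_breaks(fo, frame_step_s=0.01)
```

The minimum-gap threshold keeps isolated dropped frames from being counted as voice breaks; a different threshold may suit other pitch trackers.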

In Figure 27, the second intensity drop is about 10 dB less than the previous drop, and is accompanied by continuous phonation, rather than a pitch break.

Figure 27: FP1 Unsteady Scale Contours – second intensity drop. Zoomed in from 84.25 seconds to 85.5 seconds.

The smoothed EGG signal suggests large laryngeal shifts that occur on the first four pitch changes followed by smaller laryngeal movements on the last three pitch changes. There is a fast and large laryngeal shift at the end of the last note in the scale, possibly a result of extrinsic muscle contractions toward the top of the singer’s range in this modal register. While the raw EGG signal was not clear enough to analyze for width changes, Figure 28 shows a trace of the EGG signal before the pitch break (at 80.9 seconds), and a trace after the pitch break (at 81.75 seconds).

Figure 28: FP1 Unsteady Scale raw EGG signal. Signal taken before and after pitch break at 81.25 seconds.

These traces reveal that the signal shape is similar before and after the voice break, as well as the amplitude and the EGG width (EGGW50). Because of the similarities of the traces, the vocal folds were likely vibrating similarly before and after the pitch break, with similar change in tissue contact on phonation.

Specific to airflow, the participant appears to have used an increased subglottal pressure strategy on the pitch change, where airflow increases by about 15 cm³/s and then returns to (or close to) the original airflow level. At the two pitch changes with the large intensity drop, the airflow patterns vary, indicating a perceptual change influenced by an aerodynamic change. Similarly, each sustained pitch results in a small increase of airflow (<10 cm³/s); however, the two notes following the intensity drops experience a slightly different pattern. See explanations following Figures 26 and 27 for specific detail.

Figure 26 shows the initial intensity drop and the pitch break. The airflow is stable to start, then in preparation for the pitch change, the subglottal pressure is most likely increased, and airflow begins to increase as a result (similar to the pattern on all other pitch changes). The airflow reaches its peak (where the first black line is drawn) and starts to decrease, but there is a voice cessation that then occurs. The airflow is decreasing first during the voice break, then hits a minimum and begins to slightly increase. This cessation is hypothesized to be a result of hyperadduction with increased glottal flow through the posterior gap to allow continuation of airflow. When the airflow hits the second peak (marked by the second black line), the voice has just come back in, meaning the vocal folds have changed contact configuration to allow for vibration of the membranous portion (rather than hyperadduction) and reduction of the posterior gap. As a result, the airflow decreases by about 20 cm³/s, indicating more even and consistent vocal fold contact. After the flow hits a minimum, the value increases by almost 50 cm³/s over the following sustained pitch. This is novel, as the other sustained pitches remain around the same flow value, +/-10 cm³/s.

Figure 27 shows the second intensity drop that occurs without a pitch break. The airflow is stable to start, then in preparation for the pitch change, the subglottal pressure is most likely increased, and airflow increases by about 20 cm³/s as a result. The airflow reaches its peak (marked by the first black line) when intensity is on the downslope and frequency is just beginning to increase at a faster rate. The airflow then begins to decrease; at the halfway point of the airflow decrease, the intensity hits a minimum and begins to increase and the frequency begins to level out. The airflow is, again, most likely decreasing due to adduction of the vocal folds, which is also reducing the intensity of the sound. The minimum point for airflow (marked by the second black line) occurs at nearly the same point as intensity reaches the original level prior to the drop. The airflow, frequency, and intensity signals appear to stabilize, and the airflow value increases by almost 50 cm³/s over the following sustained pitch. This is nearly identical to the flow pattern on the pitch following the first intensity drop, indicating a residual pattern following an instability. The other sustained pitches (not including the third and last pitches) remain around the same flow value, +/-10 cm³/s.

Scales perceived as inconsistent or unsteady are hypothesized to have instabilities shown in the subtler measures as well. Figure 29 shows the fo derivative over time and Figure 30 shows fo derivative against fo. In Figure 29, the second peak shows the voice break with a short flat line at zero. Additionally, there is a smaller peak in the negative direction at this point, indicating a movement of pitch in the negative direction prior to the voice break. The fifth pitch change (marked by the larger circle) is not only where the faster positive change occurs, but also where the largest pitch overshoot in the negative direction occurs. After the overshoot in the negative direction, the pitch then continues in the positive direction, causing another smaller peak to be present. These larger pitch movements outside of the designated pitch change confirm the unsteadiness perceived during this scale.

Figure 29: fo derivative over time for the FP1 Unsteady scale.

Figure 30: fo derivative over frequency for the FP1 Unsteady scale.

Figure 30 confirms the instabilities observed in Figure 29. The second pitch change is not a continuous arc, but rather it is broken up due to the voice break. The pitch overshoot on the sixth note is very clear on this plot, represented by single lines surrounding the heart of the pitch (near 450 Hz). The pitch change from the fifth to the sixth note is not only the fastest, but it is more than 600 Hz/s faster than the first pitch change, causing a large contrast within the scale. Lastly, the final pitch (near 525 Hz) shows large variation, spanning from about -300 Hz/s to 200 Hz/s. This variability may have impacted the large laryngeal DC shift in the smoothed EGG signal noted above.

The final scale was produced by FP2; however, this scale was voluntarily unsteady. FP2 was asked to produce a scale that was unstable to determine the possible instabilities that might happen, but are uncommon, even with untrained singers. The ascending scale was from G3 to G4. This scale was sung at a loud intensity level and sung on an /a/ vowel. The contours for intensity, fo, smoothed EGG, and smoothed airflow are shown in Figure 31. The intensity contour shows an initial intensity around 62 dB at the bottom of the scale and about 72 dB at the top of the scale, an increase of 10 dB, suggesting that she got louder as she sang the scale. The intensity contour shows the intensity pumping pattern seen previously. However, across the third pitch change and fourth note, the intensity drops by about 3 dB and does not increase along the sustained note, but rather remains steady. The intensity pumping pattern begins again with the fifth note and continues until the end. The fo contour shows the step-up pattern of an ascending scale with continuous phonation (i.e., no pitch breaks), but the third pitch change and fourth note (at 205.5 seconds) show a very large pitch overshoot with difficulty establishing the desired pitch. While there is variability present with sustained pitches, these are attributed to vibrato in the voice and would not contribute to an unstable or inconsistent perception. The main instability in this scale is not seen easily through the frequency and intensity contours; however, the participant’s voice has a very distinct quality change that occurs at 205.5 seconds but is not displayed in the auditorily perceptible contours.

Figure 31: FP2 Unsteady Scale Contours. Intensity (dB), Fundamental Frequency (Hz), Smoothed EGG (nondim), and Smoothed Airflow (cm³/s). Sung by FP2 on /a/ vowel, loud intensity level, from G3 to G4 pitches.

In this example it is the airflow and EGG signals that reveal the unsteadiness in the scale. The airflow starts near 150 cm³/s at the beginning of the scale. On the third pitch transition (from the third to the fourth note at 205.4 seconds), the airflow rises from 175 cm³/s to about 375 cm³/s, considered to be an extremely large change. The pattern of the airflow is highly variable and does not begin to stabilize until the fifth note, when it rises across the sustained pitch and decreases on the pitch change. This established pattern in the second half of the scale is one that has appeared before, indicating that this may be a common pattern in untrained singers: the use of a subglottal pressure increase across sustained notes, with an adductory strategy on pitch changes. The fifth pitch change results in a large airflow increase (>150 cm³/s) due to presumed abduction at the start of the pitch change. Then, the participant nearly immediately adducts on the second part of the pitch change, resulting in a return to the initial airflow value.

The EGG contours also reveal the adductory change that occurs across the large airflow change. Figure 32 displays the EGG contours before and after the large airflow change, taken at 205.1 seconds and 206 seconds, respectively. Prior to the change that occurs near 205.5 seconds, the EGGW50 value is 0.48 and the amplitude is near 0.25. Directly after the largest airflow change, the EGGW50 value reduces to 0.27 and the amplitude also reduces, to near 0.18. This large change in the EGGW50 value indicates large tissue contact reduction and a significant decrease in glottal adduction. This is logically consistent with the increased airflow after this change.

Figure 32: Trend-removed EGG signals for FP2 Unsteady Scale.

Comparison of spectra can also suggest a registration change. Figure 33 provides the two spectral slopes across the large airflow change. The slope near 205 seconds, the blue line, is -1.04 × 10⁻² dB/Hz, and the slope near 205.7 seconds, the red line, is -1.42 × 10⁻² dB/Hz, 37% steeper. The slope steepens after the airflow change, likely indicating a registration change from modal to head register. Given the difference between the slopes of the two spectra, combined with the prominent airflow change, large EGGW50 value change, and perceptual quality change, it is reasonable to hypothesize that the participant experienced an abrupt register change.

[Figure 33 legend: 1: -1.04*10^-2 dB/Hz; 2: -1.42*10^-2 dB/Hz (difference of 74%)]

Figure 33: Spectral slope comparison for FP2 Unsteady Scale. The blue contour represents a spectral slice taken at 205 seconds. The red contour represents a spectral slice taken at 205.7 seconds. These slices are from Praat with window length of 0.005s.

Finally, this scale provided an example of an obvious difference in the fo derivative plots. Figure 34 shows the fo derivative over time and Figure 35 shows fo derivative against fo. As seen in Figure 34, the third pitch change has the fastest rate of change by over 800 Hz/s. This stark inconsistency in the rate of change is a subtle but contributing factor to the classification of this utterance as unsteady. The rest of the rates of change remain near the same value, isolating the third pitch change even further.

Figure 34: fo derivative over time for the FP2 Unsteady scale.

Figure 35: fo derivative over frequency for the FP2 Unsteady scale.

Figure 35 also reveals the fast rate of change of the third pitch change but more readily displays the participant’s difficulty establishing an appropriate pitch for the fourth pitch. The contour swings up near 290 Hz momentarily, but then bobbles back down to 260 Hz, which is the more appropriate pitch in this G major scale. However, because of the time it took the participant to reach the appropriate note, she did not have time to fully establish a clear sustained pitch prior to moving to the next pitch in the scale. As a result, the unsteadiness continues up the scale, with visible pitch overshoots until the final note.
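A minimal Python sketch of an fo derivative calculation is given below, assuming the fo contour is available as sampled time and frequency arrays. The function names, the frame spacing, and the synthetic contour are hypothetical. The 375 Hz/s value is borrowed from the overshoot criterion in Table 2 purely as an example threshold, and the study's own overshoot definition may differ from this simple rate check.

```python
import numpy as np

def fo_derivative(time_s, fo_hz):
    """Rate of change of fo (Hz/s) estimated with finite differences."""
    return np.gradient(fo_hz, time_s)

def times_exceeding(time_s, fo_hz, threshold_hz_per_s=375.0):
    """Times at which the absolute fo rate of change exceeds the threshold."""
    rate = fo_derivative(time_s, fo_hz)
    return time_s[np.abs(rate) > threshold_hz_per_s]

# Synthetic contour (hypothetical): a held note at 220 Hz, a 50 ms glide
# up to 247 Hz, then a second held note.
t = np.linspace(0.0, 1.0, 101)
fo = np.interp(t, [0.0, 0.5, 0.55, 1.0], [220.0, 220.0, 247.0, 247.0])

rate = fo_derivative(t, fo)
flagged = times_exceeding(t, fo)
print(f"peak rate: {rate.max():.0f} Hz/s at t = {t[np.argmax(rate)]:.2f} s")
print("frames above threshold:", flagged.size)
```

Plotting `rate` against `t` would give a display analogous to Figure 34, and plotting `rate` against `fo` would give one analogous to Figure 35.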

Correspondence among the various measures (amount and direction of changes, timing of changes, as well as the relation among acoustics, the EGG signal, and flow) is explored through Table 2 and Appendix A. The table is sorted first by the level categorization each scale received (L1 to L6), and secondarily by the total number of instabilities found in the scale. The instabilities listed in the table include all the different instabilities found in the signals from all participants’ tokens. The number of instabilities within a single scale ranges from zero to fourteen. For the scales classified in the smooth category (L1 and L2), the total number of instabilities ranged from zero to six, but 23 of the 32 smooth scales had zero to three instabilities within the scale. For the scales classified in the middle category (L3 and L4), the total number of instabilities ranged from three to seven, but 18 of 24 middle scales had four to six instabilities within the scale. For the scales classified in the unsteady category (L5 and L6), the total number of instabilities ranged from three to fourteen, but 26 of the 32 unsteady scales had six or more instabilities within the scale. The middle category had five scales that had a total of six instabilities in the scale, while the unsteady category had twelve scales with a total of six instabilities in the scale. Although there is overlap in the number of total instabilities across the categorization, the scales perceived to be smooth had fewer measurable instabilities than the scales perceived to be unsteady.

Table 2: Instabilities across all analyzed scales.

Group Categorization (Smooth [S], Middle [M], Unsteady [U]) | S | M | U
Total scales | 32 | 24 | 32
Frequency: Aphonic segment | 0 | 0 | 14
Intensity: change by >10dB at once | 0 | 0 | 29
Airflow: change >150 cm^3/s across pitch change | 5 | 9 | 12
Frequency: abrupt shift or wobble (unrelated to vibrato) | 3 | 8 | 8
Perceptual voice quality change | 1 | 11 | 18
Airflow: Inconsistent airflow changes on pitch changes | 8 | 8 | 16
Rate of Change: Pitch overshoot >375 Hz/s on pitch change | 6 | 18 | 27
Rate of Change: Maximum positive peak to maximum negative peak difference >1600 Hz/s | 7 | 11 | 17
Rate of Change: Max positive to min positive peak OR max negative to min negative peak difference >900 Hz/s | 4 | 6 | 12
EGG: amplitude change by >30% | 19 | 14 | 12
EGG: EGGW50 value change by >0.1 | 13 | 26 | 15
EGG: Inconsistent direction or pattern of DC shifts on each pitch change | 14 | 17 | 23
EGG: waveform signal dropout | 3 | 9 | 17
Total Instabilities | 83 | 137 | 220
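The tallies in Table 2 can be checked and normalized by group size with a short Python sketch. The counts below are transcribed from the table; the variable names and the per-scale average are illustrative additions and are not reported in the study.

```python
# Counts transcribed from Table 2 as (Smooth, Middle, Unsteady).
table2 = {
    "Frequency: aphonic segment": (0, 0, 14),
    "Intensity: change by >10 dB at once": (0, 0, 29),
    "Airflow: change >150 cm^3/s across pitch change": (5, 9, 12),
    "Frequency: abrupt shift or wobble": (3, 8, 8),
    "Perceptual voice quality change": (1, 11, 18),
    "Airflow: inconsistent changes on pitch changes": (8, 8, 16),
    "Rate of change: pitch overshoot >375 Hz/s": (6, 18, 27),
    "Rate of change: max pos. to max neg. peak >1600 Hz/s": (7, 11, 17),
    "Rate of change: max to min same-sign peak >900 Hz/s": (4, 6, 12),
    "EGG: amplitude change by >30%": (19, 14, 12),
    "EGG: EGGW50 value change by >0.1": (13, 26, 15),
    "EGG: inconsistent DC shifts on pitch changes": (14, 17, 23),
    "EGG: waveform signal dropout": (3, 9, 17),
}
n_scales = {"S": 32, "M": 24, "U": 32}
groups = ("S", "M", "U")

# Column sums should reproduce the "Total Instabilities" row.
column_totals = {
    g: sum(row[i] for row in table2.values()) for i, g in enumerate(groups)
}
# Average number of tallied instabilities per scale in each group.
mean_per_scale = {g: column_totals[g] / n_scales[g] for g in groups}

for g in groups:
    print(f"{g}: total = {column_totals[g]}, per scale = {mean_per_scale[g]:.2f}")
```

The column sums reproduce the totals of 83, 137, and 220, which correspond to roughly 2.6, 5.7, and 6.9 tallied instabilities per scale for the smooth, middle, and unsteady groups.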

CHAPTER IV: DISCUSSION

The results allowed prioritization of measures relative to importance or salience. Some measurable instabilities included: abrupt frequency shifts or fo variations (other than vibrato), with continuous sound; aphonic segments, with an absence of sound; glottal adductory shifts relative to EGGW measures of the EGG signal; unusual DC shifts of the EGG signal; abrupt intensity changes; and inconsistencies of fo rate of change. Airflow instabilities included abrupt flow changes due to abrupt adductory changes and/or hypothesized transglottal pressure changes.

One key finding discovered through initial categorization of the scales is that untrained singers can produce smooth octave scales when asked to sing “naturally” or “comfortably.” One initial hypothesis was that the untrained singers would have more difficulty producing smooth scales, and instead would produce more obvious and “clunky” instabilities (i.e., scales classified as L6), such as the final example in the level 6 category. Instead, the untrained singers produced many scales that were categorized in the smooth and middle categories (L1-L4), so we included one voluntarily unsteady scale to observe the measurements associated with the most unsteady scales, which were not often produced.

One major instability that occurred in several scales was the presence of an aphonic segment, or fo break. This instability was present in ten different scales, and seven of the ten scales were classified as L6, while the remaining three were classified as L5. The aphonic segments that were classified in L5 were very short, and thus perceived as slightly smoother than longer aphonic segments. Aphonic segments only occurred in scales that were classified as being unsteady; this indicates that abrupt breaks or aphonia (regardless of the length of time of aphonia) represent unsteadiness in singing. In addition to aphonic segments, abrupt changes in intensity greater than 10 dB occurred often in the scales. These large changes in intensity occurred in 19 different scales, often occurring more than once in a single scale. The intensity change was present in all ten of the scales with aphonic segments, but there were nine additional scales that had large intensity changes with no pitch dropouts. Eleven of the 19 total scales were classified as L5, while the remaining eight were classified as L6. Once again, these large intensity changes were only present in scales that were classified as unsteady. Therefore, in addition to aphonic segments, large sudden intensity changes appear to indicate a certain level of unsteadiness.
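The intensity criterion can be sketched in Python as a simple frame-to-frame check. The interpretation of a change "at once" as a difference between consecutive analysis frames is an assumption, as are the function name and the example contour; only the 10 dB threshold comes from the text.

```python
import numpy as np

def abrupt_intensity_changes(intensity_db, threshold_db=10.0):
    """Indices of frames whose level differs from the previous frame
    by more than `threshold_db`."""
    steps = np.abs(np.diff(np.asarray(intensity_db, dtype=float)))
    return np.flatnonzero(steps > threshold_db) + 1

# Hypothetical intensity contour (dB) with one sudden drop and one recovery.
contour = [70.0, 71.0, 70.0, 55.0, 56.0, 69.0, 70.0]
jumps = abrupt_intensity_changes(contour)
print("frames with abrupt changes:", jumps.tolist())
```

In this example both the 15 dB drop and the 13 dB recovery are flagged, which mirrors the observation that large intensity changes often occurred more than once within a single scale.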

Airflow changes were explored across all scales. Sudden smoothed airflow changes of greater than 150 cm3/s were rarely observed in smooth scales; only four scales of the 32 smooth scales contained these large airflow changes. The occurrence of these sudden airflow changes was more common in scales categorized in the middle or unsteady groups but was inconsistent. This may be due in part to the fact that smoothed or average airflow is not a perceptual signal; it is very difficult to hear airflow changes unless they alter a perception of turbulent airflow. It is not clear based on the collected samples if there is a perceptual correlate with airflow changes. However, ten of the twenty scales with abrupt large airflow changes had concurrent perceived voice quality changes across the scale, as well as changes in EGGW50 values, suggesting a change of adduction (greater adduction with reduced flow). It is more likely that these airflow changes are idiosyncratic depending on the participant’s strategies for adduction, pitch change, and subglottal pressure control, and may not have a consistent perceptual correlate.
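One way to quantify an airflow change across a pitch change is to compare the mean smoothed airflow in short windows before and after the transition, as in the Python sketch below. The window length, the function name, and the synthetic signal are assumptions for illustration; the 175 and 375 cm3/s values and the 205.4 second transition time are taken from the FP2 unsteady scale described in the results.

```python
import numpy as np

def airflow_change(time_s, flow_cm3_s, t_change, window_s=0.2):
    """Mean airflow after a pitch change minus mean airflow before it."""
    time_s = np.asarray(time_s, dtype=float)
    flow = np.asarray(flow_cm3_s, dtype=float)
    before = flow[(time_s >= t_change - window_s) & (time_s < t_change)]
    after = flow[(time_s > t_change) & (time_s <= t_change + window_s)]
    return after.mean() - before.mean()

# Synthetic smoothed airflow (hypothetical): a step from 175 to 375 cm3/s
# at the pitch change near 205.4 s.
time = np.linspace(205.0, 206.0, 101)
flow = np.where(time < 205.4, 175.0, 375.0)

delta = airflow_change(time, flow, t_change=205.4)
print(f"airflow change: {delta:.0f} cm3/s, exceeds 150: {delta > 150.0}")
```

A measure of this kind does not distinguish between adductory and subglottal pressure causes; it only reports the size and direction of the flow change.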

Large differences in the fo rates of change were more common in the unsteady scales than in the smooth scales. Pitch overshoots of greater than 375 Hz/s occurred in 20 of the 32 unsteady scales, while this level of change occurred in only four of the 32 smooth scales. The rate of change measurement between the maximum positive peak and maximum negative peak was greater than 1600 Hz/s in 17 of the 32 unsteady scales, while present in only seven of the 32 smooth scales. Finally, the rate of change measurement between the maximum positive to minimum positive peak for an ascending scale, or the maximum negative peak to the minimum negative peak for a descending scale, was greater than 900 Hz/s in 12 of the 32 unsteady scales, while it was only present in four of the 32 smooth scales. This indicates that the increased rates of change measurements are not specifically indicative of unsteady scales, but they are more likely in unsteady scales than in smooth scales.
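The two peak-difference measures can be sketched in Python as follows. The sketch assumes that the first measure is the difference between the largest positive and largest negative values of the fo derivative, and that the second is the spread among the same-sign derivative peaks of the individual pitch changes. Both readings are interpretations of the criteria in Table 2, and all numeric values below are hypothetical.

```python
def pos_to_neg_peak_difference(rate_hz_per_s):
    """Maximum positive peak minus maximum negative peak of the fo derivative."""
    return max(rate_hz_per_s) - min(rate_hz_per_s)

def same_sign_peak_spread(peaks_hz_per_s):
    """Largest minus smallest peak magnitude across the pitch changes."""
    magnitudes = [abs(p) for p in peaks_hz_per_s]
    return max(magnitudes) - min(magnitudes)

# Hypothetical ascending scale: one derivative peak per pitch change (Hz/s),
# with the third pitch change much faster than the others.
positive_peaks = [310.0, 290.0, 1250.0, 330.0, 300.0, 320.0, 305.0]
# Hypothetical full derivative contour including a negative rebound.
rate_contour = [0.0, 310.0, -40.0, 290.0, 1250.0, -500.0, 330.0, 0.0]

spread = same_sign_peak_spread(positive_peaks)
span = pos_to_neg_peak_difference(rate_contour)
print(f"same-sign spread: {spread:.0f} Hz/s (>900: {spread > 900.0})")
print(f"pos-to-neg difference: {span:.0f} Hz/s (>1600: {span > 1600.0})")
```

Both measures respond to a single unusually fast pitch change, which is why they can flag a scale even when the remaining transitions are consistent.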

The changes in the amplitude and width measurements of the EGG signals between smooth and unsteady scales were varied compared to the other measures. The EGG amplitude changed by more than 30% in 12 of the 32 unsteady scales, while it changed in 19 of the 32 smooth scales. The EGGW50 changed by more than 0.1 in 15 of the 32 unsteady scales, while it changed in 13 of the 32 smooth scales. There does not appear to be a clear indication of these instabilities or changes in the perceptual classification of the scale. Similar to the airflow measures, the EGG measures may be specific to the adductory strategies of the individual participant and the change in this measurement alone may not have a direct implication on how smooth a scale is perceived.

CHAPTER V: CLINICAL IMPLICATIONS AND FUTURE DIRECTIONS

Instabilities that pertain to those seen in speech production include: aphonic segments, frequency instability, abrupt intensity changes, and obvious registration shifts. These instabilities would most often be seen with clients who have dysphonia, such as clients with or muscle tension dysphonia. Additionally, clients with vocal fold lesions may experience similar instabilities. Puberphonia may be a condition where a client may experience primarily exacerbation of obvious registration shifts, among the other instabilities. The complementary aspect of the set of signals used in this project can be applied to clinical practice.

The concept of the fundamental frequency derivative plots can be applied to the clinical setting; these plots can provide information about how the voice changes in time, especially in terms of subtle and fast changes. The data from the signals could be used to focus clinical intervention specific to the individual client based on the salient instabilities. The instability measures used in this project could also be used to document objective clinical findings of clients with voice disorders. Importantly, this study suggests that the objective measures (e.g., related to fo derivative and smoothed airflow) may be more indicative of smoothness of production than perceptual measures. These objective measures become increasingly important when the voicing production contains continuous phonation with consistent intensity. Breaks in phonation and abrupt intensity changes are the most salient perceptual characteristics, but when they are absent, objective measures can provide additional information needed to determine smoothness in the voice.

This research can also be applied to singing pedagogy for understanding the untrained singer. This project helps to inform a singing teacher about what is occurring acoustically, aerodynamically, and physiologically when there is instability in singing by the untrained. This should give insight into what to change to improve the singing.

Further research should be performed to determine concrete descriptions of the changes in physiology of the voice mechanism across smooth and unsteady scales. First, it would be beneficial to visualize the larynx via high speed video during the singing of these scales. With this additional information, one could align changes in the EGG signal specific to adduction and tissue contact to changes seen in the video recordings specific to lengthening of the vocal folds, size of the posterior gap, vocal fold excursion, and adduction. Additionally, adding the visual imaging would allow for defining specific instabilities further. This could include defining what occurs at the level of the vocal folds when an aphonic segment is present in a scale, or better understanding the glottal configuration when large and abrupt airflow changes occur, especially relative to adduction. Visualization of the larynx would add an additional layer to the research that would allow for more specific conclusions about the changes in physiology that correspond with the changes in the given signals.

Another opportunity for further research could include the use of oral air pressure to approximate subglottal pressure. The use of this measure would assist in differentiating changes in adduction versus changes in subglottal pressure, such as airflow increases across a sustained pitch (seen clearly in Figure 8). Understanding the difference between the variation of use of subglottal pressure and adductory strategies would better inform voice professionals and singing teachers about the strategies that untrained singers use when producing scales.

The next step for this project would be to apply discriminatory measures in order to predict the smoothness of sung scales. Because of the exploratory nature of this project and the results yielded, it would be appropriate to build theoretical and computational models based on the results for unsteadiness measures.

CHAPTER VI: CONCLUSIONS

The study describes perceptual and objective measures of unsteadiness in singing scales through a methodological approach. This study provides a multi-signal description and analysis of vocal instabilities in untrained female singers. The primary aim of this study was to identify voice instabilities during singing through the singer’s passaggi and describe them relative to changes in fo, airflow, intensity, inferred adduction, and acoustic spectra.

The primary findings of this project are the following:

1. Untrained singers can sing smoothly across octave scales: 36.4% of the total number of scales analyzed were perceived as smoothly sung.

2. The rate of change of fo reveals unsteadiness that appears to be related to subtle aspects of register change and vocal control.

3. The smoothed airflow and EGG signals also appear to be sensitive measures for register change and vocal control, especially relative to adduction.

4. The primary unsteadiness variable was an aphonic segment and corresponding abrupt and large intensity reduction.

5. Register shifts from modal to head register appeared to correspond to steeper spectral slopes, increased smoothed airflow, and decreased EGG waveform height and width.

6. Some of the objective measures appear to be more visually and numerically salient than auditory perceptual measures relative to unsteadiness in the production of scales, such as the fo derivative and the abrupt flow changes, especially when more obvious perceptual unsteady characteristics are not present (e.g., aphonic segment, abrupt intensity change).

APPENDIX A: EXPANDED INSTABILITIES TABLE

Table 3: Instabilities across all analyzed scales, expanded for each scale. “UN” means unavailable. An “X” represents a single occurrence of the instability category.

Participant | FP2 FP4 FP5 FP1 FP4 FP2 FP3 FP4 FP3 FP3 FP4 FP5
Token Scale Number | 19 5 3 9 12 22 1 1 3 18 4 14
Group Categorization (Smooth [S], Middle [M], Unsteady [U]) | S S S S S S S S S S S S
Frequency: Aphonic segment |
Intensity: change by >10dB at once |
Airflow: change >150 cm^3/s across pitch change |
Frequency: abrupt shift or wobble (unrelated to vibrato) |
Perceptual voice quality change |
Airflow: Inconsistent airflow changes on pitch changes | X
Rate of Change: Pitch overshoot >375 Hz/s on pitch change |
Rate of Change: Max positive peak to max negative peak difference >1600 Hz/s |
Rate of Change: Max positive to min pos. peak OR max negative to min neg. peak difference >900 Hz/s | X
EGG: amplitude change by >30% | UN X UN X X X X X
EGG: EGGW50 value change by >0.1 | UN UN X X X X X X
EGG: Inconsistent direction/pattern of shifts on pitch change | X X X X X X
EGG: waveform signal dropout | UN UN
Total Instabilities | 0 0 0 1 1 2 2 2 3 3 3 3

Participant | FP4 FP2 FP3 FP3 FP4 FP4 FP1 FP2 FP2 FP3 FP3 FP4
Token Scale Number | 2 20 11 12 10 15 10 4 17 2 21 7
Group Categorization (Smooth [S], Middle [M], Unsteady [U]) | S S S S S S S S S S S S
Frequency: Aphonic segment |
Intensity: change by >10dB at once |
Airflow: change >150 cm^3/s across pitch change |
Frequency: abrupt shift or wobble (unrelated to vibrato) | X X
Perceptual voice quality change |
Airflow: Inconsistent airflow changes on pitch changes | X X
Rate of Change: Pitch overshoot >375 Hz/s on pitch change | X X
Rate of Change: Maximum positive peak to maximum negative peak difference >1600 Hz/s | X X X
Rate of Change: Max positive to min positive peak OR max negative to min negative peak difference >900 Hz/s | X X
EGG: amplitude change by >30% | UN X X X X X X X X
EGG: EGGW50 value change by >0.1 | UN X X X X
EGG: Inconsistent direction or pattern of DC shifts on each pitch change | X X X X X X
EGG: waveform signal dropout | UN
Total Instabilities | 1 2 2 2 2 2 3 3 3 3 3 3

Participant | FP5 FP2 FP2 FP3 FP5 FP5 FP2 FP2 FP5 FP1 FP2 FP3
Token Scale Number | 2 10 24 9 7 13 16 18 1 1 21 22
Group Categorization (Smooth [S], Middle [M], Unsteady [U]) | S S S S S S S S M M M M
Frequency: Aphonic segment |
Intensity: change by >10dB at once |
Airflow: change >150 cm^3/s across pitch change | XX X X X X X XX
Frequency: abrupt shift or wobble (unrelated to vibrato) | X X
Perceptual voice quality change | X X
Airflow: Inconsistent airflow changes on pitch changes | X X X X X X
Rate of Change: Pitch overshoot >375 Hz/s on pitch change | XX XX X X
Rate of Change: Maximum positive peak to maximum negative peak difference >1600 Hz/s | X X X X X X
Rate of Change: Max positive to min positive peak OR max negative to min negative peak difference >900 Hz/s | X X
EGG: amplitude change by >30% | X X X X X X UN
EGG: EGGW50 value change by >0.1 | X X X UN UN
EGG: Inconsistent direction or pattern of DC shifts on each pitch change | X X X
EGG: waveform signal dropout | X X X X UN
Total Instabilities | 3 4 4 4 4 4 5 6 3 4 4 4

Participant | FP4 FP3 FP3 FP4 FP4 FP5 FP2 FP5 FP3 FP3 FP3
Token Scale Number | 13 13 17 3 6 17 15 12 4 19 20
Group Categorization (Smooth [S], Middle [M], Unsteady [U]) | M M M M M M M M M M M
Frequency: Aphonic segment |
Intensity: change by >10dB at once |
Airflow: change >150 cm^3/s across pitch change | X X X
Frequency: abrupt shift or wobble (unrelated to vibrato) | X X X
Perceptual voice quality change | X X X X X
Airflow: Inconsistent airflow changes on pitch changes | X X X X X
Rate of Change: Pitch overshoot >375 Hz/s on pitch change | XX X X XX X X X
Rate of Change: Maximum positive peak to maximum negative peak difference >1600 Hz/s | X X X X X
Rate of Change: Max positive to min positive peak OR max negative to min negative peak difference >900 Hz/s | X X
EGG: amplitude change by >30% | X X X X X X X X X
EGG: EGGW50 value change by >0.1 | X X X X X X X X
EGG: Inconsistent direction or pattern of DC shifts on each pitch change | X X X X X X X X X
EGG: waveform signal dropout | X X X X
Total Instabilities | 4 5 5 5 5 5 6 6 7 7 7

Participant | FP2 FP1 FP2 FP3 FP3 FP1 FP3 FP4 FP5 FP2 FP2 FP2
Token Scale Number | 2 6 23 5 16 2 10 14 10 7 3 5
Group Categorization (Smooth [S], Middle [M], Unsteady [U]) | M M M M M M M M M U U U
Frequency: Aphonic segment | X
Intensity: change by >10dB at once | X XX X
Airflow: change >150 cm^3/s across pitch change | X X
Frequency: abrupt shift or wobble (unrelated to vibrato) | X X X X X
Perceptual voice quality change | X X X X X
Airflow: Inconsistent airflow changes on pitch changes | X X
Rate of Change: Pitch overshoot >375 Hz/s on pitch change | X X XX X XX X X
Rate of Change: Maximum positive peak to maximum negative peak difference >1600 Hz/s | X X X X X X X
Rate of Change: Max positive to min positive peak OR max negative to min negative peak difference >900 Hz/s | X X X
EGG: amplitude change by >30% | UN X X X X UN UN UN
EGG: EGGW50 value change by >0.1 | UN UN X X X X X UN UN UN
EGG: Inconsistent direction or pattern of DC shifts on each pitch change | X X X X X X X
EGG: waveform signal dropout | UN X X X X UN UN UN
Total Instabilities | 3 4 5 5 5 6 6 6 7 3 4 4

Participant | FP2 FP2 FP1 FP3 FP3 FP3 FP3 FP5 FP5 FP1 FP5 FP1
Token Scale Number | 1 9 8 6 7 8 14 5 8 3 4 7
Group Categorization (Smooth [S], Middle [M], Unsteady [U]) | U U U U U U U U U U U U
Frequency: Aphonic segment | X X X X
Intensity: change by >10dB at once | XX XX XX X X XX
Airflow: change >150 cm^3/s across pitch change | X XX
Frequency: abrupt shift or wobble (unrelated to vibrato) | X X X
Perceptual voice quality change | X X X X X X X X
Airflow: Inconsistent airflow changes on pitch changes | X X X X X
Rate of Change: Pitch overshoot >375 Hz/s on pitch change | XX X XX XX X X
Rate of Change: Maximum positive peak to maximum negative peak difference >1600 Hz/s | X X X X
Rate of Change: Max positive to min positive peak OR max negative to min negative peak difference >900 Hz/s | X X
EGG: amplitude change by >30% | UN UN X X X X X UN
EGG: EGGW50 value change by >0.1 | UN UN X X X UN X UN
EGG: Inconsistent direction or pattern of DC shifts on each pitch change | X X X X X X X X
EGG: waveform signal dropout | UN UN X X X X X X X UN X X
Total Instabilities | 5 5 6 6 6 6 6 6 6 7 7 8

Participant | FP3 FP5 FP5 FP2 FP2 FP2 FP2 FP2 FP4 FP1 FP4 FP4
Token Scale Number | 15 6 15 6 8 11 12 14 11 5 8 9
Group Categorization (Smooth [S], Middle [M], Unsteady [U]) | U U U U U U U U U U U U
Frequency: Aphonic segment | X XX X X X X
Intensity: change by >10dB at once | X X XX X X X X X
Airflow: change >150 cm^3/s across pitch change | X X XX X
Frequency: abrupt shift or wobble (unrelated to vibrato) | X
Perceptual voice quality change | X X X X X X
Airflow: Inconsistent airflow changes on pitch changes | X X X X X X
Rate of Change: Pitch overshoot >375 Hz/s on pitch change | XX XX X X X X X X X
Rate of Change: Maximum positive peak to maximum negative peak difference >1600 Hz/s | X X X X X X
Rate of Change: Max positive to min positive peak OR max negative to min negative peak difference >900 Hz/s | X X X X X X
EGG: amplitude change by >30% | X UN UN X X UN X X
EGG: EGGW50 value change by >0.1 | X X UN UN X X X UN X X
EGG: Inconsistent direction or pattern of DC shifts on each pitch change | X X X X X X X X X X
EGG: waveform signal dropout | X X X UN UN X
Total Instabilities | 8 9 10 5 6 6 6 6 6 7 7 7

Participant | FP1 FP2 FP5 FP5 FP5
Token Scale Number | 4 13 9 16 11
Level categorization (L1-L6) | 6 6 6 6 6
Group Categorization (Smooth [S], Middle [M], Unsteady [U]) | U U U U U
Frequency: Aphonic segment | X X
Intensity: change by >10dB at once | XX XXXX
Airflow: change >150 cm^3/s across pitch change | XX XX
Frequency: abrupt shift or wobble (unrelated to vibrato) | X X X
Perceptual voice quality change | X X X X
Airflow: Inconsistent airflow changes on pitch changes | X X X X X
Rate of Change: Pitch overshoot >375 Hz/s on pitch change | X X X XX
Rate of Change: Maximum positive peak to maximum negative peak difference >1600 Hz/s | X X X X
Rate of Change: Max positive to min positive peak OR max negative to min negative peak difference >900 Hz/s | X X X X
EGG: amplitude change by >30% | UN X X
EGG: EGGW50 value change by >0.1 | UN X X X X
EGG: Inconsistent direction or pattern of DC shifts on each pitch change | X X X X X
EGG: waveform signal dropout | X X X X
Total Instabilities | 8 8 10 12 14

APPENDIX B: IRB APPROVAL