<<

Florida State University Libraries

Electronic Theses, Treatises and Dissertations The Graduate School

2008

A Choral Conductor's Reference Guide to Acoustic Choral Music Measurement: 1885 to 2007

Brenda Kaye Scoggins Fauls

Follow this and additional works at the FSU Digital Library. For more information, please contact [email protected]

FLORIDA STATE UNIVERSITY

COLLEGE OF MUSIC

A CHORAL CONDUCTOR'S REFERENCE GUIDE

TO ACOUSTIC CHORAL MUSIC MEASUREMENT:

1885 TO 2007

By

BRENDA KAYE SCOGGINS FAULS

A Dissertation submitted to the College of Music in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Degree Awarded: Summer Semester, 2008

Copyright 2008 Brenda K. S. Fauls All Rights Reserved

The members of the committee approve the dissertation of Brenda K. S. Fauls defended on

June 23, 2008.

______André Thomas Professor Directing Dissertation

______Richard Morris Outside Committee Member

______Judy Bowers Committee Member

______Kevin Fenton Committee Member

The Office of Graduate Studies has verified and approved the above named committee members.


To the Tom's in my life, I dedicate this document. One brought back the music, The other unlocked my heart. Life has begun anew.

ACKNOWLEDGEMENTS

The path of a doctoral degree is not walked alone. My journey has been blessed with the guidance and support of a loving family, wise mentors, and wonderful friends - indeed too many to name. I extend my sincere gratitude to the members of my committee. First, my deepest gratitude to my major professor, Dr. André J. Thomas, whose generous spirit and navigational fortitude provided for a future of dream fulfillment. To Dr. Judy Bowers, I extend my thanks for her continued modeling of trailblazing artistry and fierce dedication to excellence in choral music education. I would like to thank Dr. Kevin Fenton for his openness and guidance throughout my degree program. To Dr. Richard Morris, I extend my appreciation for his continued willingness to bridge our worlds with the generous sharing of knowledge, resources, and opportunities. In closing, I thank those who dedicated countless hours, resources, and motivation to the successful and joyous completion of this personal goal.

TABLE OF CONTENTS

List of Tables
List of Figures
Abstract

1. INTRODUCTION
   Purpose of the Study
   Need For Study
   Delimitations
   Organization of Study
   Introduction of Topic

2. SELECTIVE REVIEW OF LITERATURE
   Choral Blend
   Amplitude
   Formants ~ Resonances
   Frequency
   Quality of Tone
   Registration

3. HISTORY OF ACOUSTIC CHORAL MUSIC MEASUREMENT
   1878-1969
   1970-1979
   1980-1989
   1990-1999
   2000-Present

4. SUMMARY

5. DISCUSSION AND CONCLUSIONS

6. APPENDICES
   A. Glossary
   B. Equipment
   C. Respiratory System
   D. Laryngeal System
   E. Articulatory System
   F. Comparison of Different Interval Naming Systems
   G. Piano Pitch ~ Hertz Chart
   H. IPA English Chart

7. REFERENCES

8. BIOGRAPHICAL SKETCH

LIST OF TABLES

Table 1. Terminology Correlate Chart

Table 2. Register Definition by Physiological Activity

Table 3. Comparison of the Average Formant Frequencies of Timbre Types to the Average Formant Frequencies of Voice Classifications

LIST OF FIGURES

Figure 1. Amplitude Chart

Figure 2. Spectrogram of the Vowel /o/ at 131 Hz (C3)

Figure 3. Singing Tasks

Figure 4. Warm-up Cadence

Figure 5. Choral Formations

Figure 6. Chamber Choir Spacings

Figure 7. Organization of Choral Formation by Vocal Parts

Figure 8. A Choral Exercise

ABSTRACT

The study of choral sound is accomplished through acoustic choral music measurement. Physical acoustics concerns the aspects of sound that can be quantifiably measured, and psycho-acoustics concerns how we perceive what we hear. This study of choral sound will focus on the measurable physical acoustic facets of amplitude, frequency, and the quality of sound. These facets of acoustic choral sound have the psycho-acoustical correlates of loudness, pitch, and timbre. The success of individual singers within a choral setting is largely dependent upon the conductor's capacity to identify unconscious vocal habits and provide guidance for their ameliorated vocal function. A clear understanding of the acoustics of choral sound and the appropriate application of this knowledge can enable choral conductors to better facilitate the creation of a superior choral sound. To assist the conductor, appropriate solo and speech research literature has been included to provide an historical foundation and additional clarification of apropos subject matter. An extensive glossary has been provided in this document that codifies terminology from music acoustics, voice science, choral studies, voice studies, equipment guides and usage, mathematics, and statistics. The goal of this glossary is to facilitate the intermingling of the many divergent disciplines present in this document and to provide a resource for reference when reading documents not included in this writing. The acoustics of choral sound are introduced to provide a unified document in a concise format that can serve as a springboard for informed practice, rehearsal, and study.

CHAPTER ONE

PURPOSE OF THE STUDY

The purposes of this study were to provide a concise overview of the history of acoustic choral music measurement; to provide selective, applicable solo voice measurement studies for a foundational understanding of subject matter; to provide a detailed glossary of definitions, abbreviations, and equipment to aid understanding of acoustic choral music literature; and to provide suggested applications of the findings of acoustic choral music measurement.

NEED FOR STUDY

Acoustic choral music research is widespread and can be difficult to access both logistically and physically. The wealth of diverse subject matter, each area having its own specific language, equipment, and procedures, makes the research difficult to understand and to apply to outside settings. The measurement process is continually changing, diverse, and confusing. A concise, thorough reference source is needed to inform conductors, singers, and students alike.

DELIMITATIONS

The present study excludes two areas of acoustic choral music research: bone conducted sound and its effect on the singer, and children's choir research.

ORGANIZATION OF STUDY

A review of selected solo voice research articles and empirical choral sound articles is presented first by subject matter and then historically. The study begins with selected early investigations (1879-1969) into singing research which have had direct impact on choral research. The subsequent chapters present the research sequentially within each decade up to and including 2007. Following a summary of research to date, a discussion chapter is devoted to suggested choral applications of research findings for the choral conductor. The closing section is an equipment reference guide and a detailed, multi-subject glossary of terms, abbreviations, and procedures.


INTRODUCTION OF TOPIC

Choral conductors have the enviable goal of bringing aural life to a composer's work through a group of individual singers who, once their voices are lifted together, create the choral sound. Consider Mayer's (1964) words:

On occasion, when listening to a fine choir, one hears tone of such infinite beauty that it is evident that the sum is far greater than the parts; that is, the sound produced is of greater beauty than would normally be expected from the individual voices involved.¹

Is this an occasion made possible by fate or by luck? No! This greater beauty is the result of an educated, well-informed conductor, who, while working with and developing exceptional, individual voices, continually strives toward an amalgamated excellence of such intertwined talents that only one sound is heard – a superior choir.

What is required for a choir to be recognized as superior? How does a choral conductor aid the individual choir singers toward the ultimate choral sound – a perfectly blended choir? Most would agree that vowel unification, diction timing, loudness variance, pitch precision, vibrato amalgamation, timbre mergence, and choice of registration would be primary considerations. Each of these components is indigenous to choral sound and, equally, has measurable physical properties which can be examined within the acoustic study of choral sound.

Acoustics is the study of sound. Acousticians and choral conductors alike are interested in sound – how sound is made, how a produced sound travels, and then how the sound is heard. How sound is made is the production of the sound. How sound travels is known as the propagation of the sound. How we interpret what we hear is the perception of sound.² What then is choral sound? Is there a way to measure a characteristic of a choral sound?
The answer is yes, and the field is known as acoustic choral measurement: the process of determining the dimensions and/or specifics of the sound of voices singing together. The study of choral sound employs both physical acoustics and psycho-acoustics. Physical acoustics is the reality of sound – the aspects of sound that can be quantifiably measured. Psycho-acoustics is our reaction to sound – how we perceive what we hear. Each time a choir sings forte, is it the same degree of loudness as the previous time they sang forte? Choral conductors would agree that a choir will produce forte at a level that is in response to the prior level of sound. Amplitude

1 Mayer, F. (1964). The relations of blend and intonation in the choral art. Music Educator's Journal, 51 (1), 109- 110.

2 Hall, D. (1980). Musical Acoustics: An Introduction. Belmont, CA: Wadsworth Publishing Co. pp. 4-6.

is the physical measurement of the choir singing forte. If an acoustician were to measure each occurrence of forte singing in a song selection, the amplitude would most definitely vary – yet the choir, and conductor, may feel that the fortes were all equal. This is but one difference between what we perceive (psycho-acoustics) as compared to what we can measure (physical acoustics). This is an example of the crux of this document. As you can see, for choral conductors and acousticians to understand one another, an agreement in terminology is crucial. As choral conductors, we describe music with terms that express our perceptions of music: loudness, pitch, timbre, and duration. The correlation of perception terminology to physical terminology is represented in the chart below. The perceptual component of loudness corresponds to amplitude; pitch to frequency; timbre to the quality of the sound; and duration to the length of the sound in time.

Table 1: Terminology Correlate Chart

Psycho-acoustics   Physical Acoustics   Measurement                      Abbreviation
(Perceptual)
Loudness           Amplitude            Decibels                         dB
Pitch              Frequency            Hertz                            Hz
                                        Cycles-per-second                cps
                                        Kilo-hertz                       kHz
Timbre             Quality of Sound     Formants, Formant Frequencies,   FN or RN
                                        and Resonances
Duration           Length of Sound      Milliseconds                     ms
                                        Seconds                          Sec
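The decibel entry in Table 1 can be made concrete with a short calculation. The following is a minimal sketch, assuming the conventional reference pressure of 20 micropascals for sound pressure level in air; the function names and the pressure values are hypothetical illustrations and are not measurements from any study cited in this document. It shows why a change in sound pressure does not translate into an equal change in the decibel figure: the decibel scale is logarithmic.

```python
import math

# Assumed reference pressure for sound pressure level in air (20 micropascals).
P_REF_PA = 20e-6


def spl_db(pressure_pa: float) -> float:
    """Sound pressure level in dB for an RMS sound pressure given in pascals."""
    return 20.0 * math.log10(pressure_pa / P_REF_PA)


def pressure_ratio(db_difference: float) -> float:
    """Ratio of sound pressures that corresponds to a difference in dB."""
    return 10.0 ** (db_difference / 20.0)


if __name__ == "__main__":
    # Hypothetical pressures chosen only to illustrate the logarithmic scale.
    for pressure in (0.02, 0.04, 0.2):
        print(f"{pressure:.2f} Pa -> {spl_db(pressure):.1f} dB SPL")
    print(f"A 10 dB difference is a pressure ratio of {pressure_ratio(10.0):.2f}")
```

Under these assumptions, doubling the sound pressure adds about 6 dB regardless of the starting level, and a tenfold increase adds 20 dB.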

Our discussion of acoustic choral measurement is now properly framed for both the choral conductor and the acoustician. Frequency is diagrammed as the number of sound wave cycles in a given duration of time and is usually measured in hertz (Hz); for instance, A4 is 440 Hz, or 440 sound wave cycles per second (cps). Notice in Figure 1 that each wave is periodic, but each has a different amplitude from the baseline.
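The relationship between a named pitch and its frequency in hertz can be sketched in a few lines. The example below assumes twelve-tone equal temperament with A4 fixed at 440 Hz; the table of semitone positions and the function name are illustrative and are not taken from this document's appendices.

```python
# Semitone positions within an octave, counted upward from C.
SEMITONES_FROM_C = {
    "C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4,
    "F": 5, "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9,
    "A#": 10, "Bb": 10, "B": 11,
}


def note_frequency(name: str, octave: int, a4_hz: float = 440.0) -> float:
    """Frequency in Hz of a pitch such as ("C", 3), assuming equal temperament."""
    semitones_from_a4 = (
        SEMITONES_FROM_C[name] - SEMITONES_FROM_C["A"] + 12 * (octave - 4)
    )
    # Each equal-tempered semitone multiplies the frequency by 2 ** (1 / 12).
    return a4_hz * 2.0 ** (semitones_from_a4 / 12.0)


if __name__ == "__main__":
    for name, octave in (("A", 4), ("C", 3), ("D", 4)):
        print(f"{name}{octave}: {note_frequency(name, octave):.2f} Hz")
```

With these assumptions, C3 comes out at approximately 130.81 Hz, which rounds to the 131 Hz given for the vowel sample in Figure 2.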

[Figure 1: Amplitude Chart (axes: decibels by milliseconds)]³

All three lines represent a sound that is the same frequency. Listeners would perceive all three sounds as being the same pitch. Pitch is that which we can discern as being within a continuum of low to high or high to low. Figure 2 is a spectrogram of a singing sample which shows us both the amplitude and the frequency of a recorded song sample. Here you will note that the x-axis is in decibels (dB), representing the amplitude of the sample. The y-axis is in kilo-hertz (kHz), representing the frequency of the singing sample.

Figure 2: Spectrogram of the Vowel /o/ at 131 Hz (C3)

3 (http://www.acs.appstate.edu/~kms/classes/psy3203/SoundPhysics/amplitude_waves.jpg)

Ladefoged explains the quality of sound quite simply: this is the difference between two notes that are equal in pitch and loudness but have been produced by different instruments, such as a piano and a violin.⁴ When choral conductors talk about the differences between voices they will often use descriptors such as warm, thin, or full. In other words, choral conductors often use the psycho-acoustical term timbre when talking about the physical acoustic component – quality of sound.

The production of human sound requires the interaction of the respiratory system (air), the laryngeal system (vibrator), and the articulatory system (shaper). The respiratory system provides the energy – air – for sound production. The air moves from the lungs into the trachea until reaching the closed vocal folds (the vibrator of the laryngeal system). Air pressure increases until the vocal folds are forced apart and caused to vibrate. As the air moves between the vibrating vocal folds, sound is emitted. This is called the voice source,⁵ which is a rich spectrum of harmonics: whole-number multiples of the fundamental frequency. The sound now moves through the vocal tract (the mouth, throat, and nose)⁶ and is molded into speech sounds by the articulatory system (the tongue, lips, teeth, and soft palate). Depending upon the length, shape, and degree of mouth opening, these cavities resonate at different frequencies and shape the sound source into the vowels, consonants, and vocal colors that make up the sound you and I recognize as the human voice. These resonances, also known as formants, are distinct characteristics of the singer's morphology, training, and habitual use of the voice. The success of individual singers within a choral setting is largely dependent upon the conductor's capacity to identify unconscious vocal habits and provide guidance for their ameliorated vocal function.
A clear understanding of the acoustics of choral sound and the appropriate application of this knowledge can enable choral conductors to better facilitate the creation of a superior choral sound. To assist the conductor, appropriate solo and speech research literature has been included to provide an historical foundation and additional clarification of apropos subject matter. A conductor's conscious understanding of the individual's vocal production and its contribution to the synergized acoustical delivery of the ensemble creates that phenomenon known not only to audiences, but most especially to the creators of that unique experience – that which we know as the choral experience.

4 Ladefoged, P. (1996), 14.

5 Sundberg, J. (1987). The Science of the Singing Voice. Northern Illinois University Press: DeKalb, Illinois, p. 49.

6 Ladefoged, (1996), 92.

CHAPTER TWO

REVIEW OF LITERATURE

Choral Blend

The joining of individual voices to create a combined sound, a choir, requires choral blend. When one voice is heard above all others, choral blend is adversely affected. Cashmore (1964) points out that an individual's attempt to lead his or her vocal section is both vocally taxing and a detriment to the growth of independent singers.⁷ Often, though, an individual may have a larger voice than fellow choir members. Voice instructors often take issue with conductors who ask such singers to sing with a minimized production in order for the choir to achieve an overall choral blend.

To achieve this perfect choral blend, Mayer⁸ believed the focus needed to be on timbre, dynamics, and pitch. Vibrato and the tuning accuracy of singers have great impact on the choir's overall intonation. His method for improving choir intonation involved both just and non-tempered tuning and began by tuning perfect octaves on the pitches of D4 and/or E4. Mayer would start with the bass section and then add each vocal section, one at a time at a mf level, until all of the singers were participating. Once the octaves were in tune, Mayer would move into perfect fourths and fifths, again centered on D4, and would remain there until the intervals were mastered. Moving gradually through this process, his choir was able to master intonation and thereby achieve a more perfect choral blend.⁹ This empirical approach is used by many fine conductors.

F. Melius Christiansen and Weston Noble are recognized as two important American conductors of the twentieth century. Giardiniere, in his 1991 dissertation, explained Weston Noble’s re-definition of F. Melius Christiansen’s concept of voice matching to achieve choral blend. Christiansen directed singers to alter their sound to match the person(s) next to them, whereas Noble positioned singers next to other singers whose vocal character was similar. Noble believed an acoustic phenomenon would occur when voices were placed correctly.
Recordings of Noble’s voice matching procedures with two to seven singers were compiled into a cassette perception survey, which was then mailed to active choral musicians (N = 218). Auditors showed marked preference for Noble’s final arrangement of voices in more than half of the listening survey. Auditors were not consistent in their responses to samples which had only two voices (duets). This

7 Cashmore, D. (1964). A good performance. The Musical Times, 105 (1451), 56-57.

8 Mayer, F. (1964), 109-110.

9 Ibid.

study had many responses concerning the quality of the recordings, the process of mailing the tapes, the varying quality of the listening equipment, and its effect on listener preference.¹⁰

Amplitude

Amplitude is the measurable physical attribute of what is perceived as loudness. Specifically, amplitude is the extent of the variation in air pressure from normal air pressure. When air pressure reduces, the sound is perceived as less loud and conversely, when the air pressure increases, the sound is perceived as louder. However, how much air pressure increases or decreases is not equal to how much louder or softer the sound is perceived.¹¹ When measuring the sound pressure level (SPL) of a vowel sound, the amplitude of the voice source is the sound produced by the vocal fold vibrations. The main controlling element for this amplitude is subglottal pressure. Other elements involved are the relationship between the resonance frequencies of the vocal tract and the partials present in the spectrum. When air pressure increases, the amplitude increases and conversely, when air pressure decreases, the amplitude decreases.

Gramming (1991) designed the following experiments to study the effect of loud and soft phonation on the spectral envelope. In the first experiment, a female participant was recorded speaking the vowel /a/ at approximately 400 Hz, first in soft phonation and then in loud phonation. The loud phonation revealed all 12 partials below 5 kHz. However, in soft phonation, only the first two partials were present and the F0

(fundamental frequency) was stronger. The F0 is the number of repeating cycles of the vocal folds in one second and is measured in hertz (Hz). The first partial of a sound is also called the F0. A partial is a component of a complex sound, which can be the F0, a harmonic of the F0, or an overtone of the F0. This single-participant pilot study was utilized as a basis for the next study. Participants (N = 20; n = 10 women and n = 10 men, all with normal, untrained voices) were recorded speaking the vowel /a/ in soft and loud phonation. Overall, the F0 remained louder than the other partials in all participants' soft phonation. However, in loud phonation, a partial which represented an overtone was the loudest. A consistent observation was that when the resonance frequencies remained the same although the F0 increased, the strongest partial in the spectrum in loud phonation correlated with the first resonance frequency. Again, as in the pilot study,

10 Giardiniere, D. (1991). Voice matching: an investigation of vocal matches, their effect on choral sound and procedures of inquiry conducted by Weston Noble (Doctoral dissertation, New York University, 1991). UMI ProQuest Digital Dissertation Abstracts, 241, AAT 9213181.

11 Ladefoged, P. (1996), 14-16.

the louder phonations had many more partials than the softer phonations. The increases were evident when the F0 (fundamental frequency) was at a lower pitch, as in the male participants. Participants (N = 22 speech therapy students) in the second experiment were recorded speaking the vowels /a/, /i/, and /u/. The averaged phonetogram results showed the vowel /a/ was ~10 dB higher in sound pressure level (SPL) than /i/ or /u/ when the participant sounded a low F0.
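The count of 12 partials below 5 kHz reported for the pilot recording at approximately 400 Hz is consistent with simple arithmetic on the harmonic series. The sketch below assumes that the partials are exact harmonics, that is, whole-number multiples of the F0, and that the 5 kHz limit is exclusive; the function name is hypothetical. It also illustrates, purely as arithmetic and not as an account of Gramming's data, that a lower F0 places more harmonics beneath the same frequency limit.

```python
def partials_below(f0_hz: float, limit_hz: float) -> list[float]:
    """Harmonic partials (whole-number multiples of f0) strictly below a limit."""
    partials = []
    number = 1
    while number * f0_hz < limit_hz:
        partials.append(number * f0_hz)
        number += 1
    return partials


if __name__ == "__main__":
    # 400 Hz is the approximate F0 reported for the pilot recording;
    # 131 Hz (about C3) is included only as a lower-pitched comparison.
    for f0 in (400.0, 131.0):
        found = partials_below(f0, 5000.0)
        print(f"F0 = {f0:.0f} Hz: {len(found)} partials, highest {found[-1]:.0f} Hz")
```

For an F0 of 400 Hz the twelfth harmonic falls at 4800 Hz and the thirteenth at 5200 Hz, so twelve harmonics lie below the limit.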

As the F0 (fundamental frequency) rose, the sound pressure level (SPL) differences between the vowels reduced. In loud phonation, the SPL increased as the frequency of the first formant increased. In soft phonation, there was no difference between the vowels because the F0 was the strongest partial. Gramming's third study utilized both healthy (N = 20 men and women) and non-healthy (N = 10 female patients diagnosed with non-organic dysphonia) participants. Again, phonetograms were made of the vowel /a/ on a pitch chosen by the participant. The pitch chosen by the participant was evaluated and described in relation to the participant's full range. The focus of this study was the short-term variance in SPL in loud and soft phonation. The patient participants who used soft phonation chose, more than 60% of the time, a frequency in the higher part of their range, which showed significantly more SPL variation. The resulting mean SPL variation for loud phonation was 2 dB, whereas in soft phonation the mean SPL variation was 5 dB, which led Gramming to conclude that voice control was more difficult when the patient participants used soft phonation.

Weber (1992) was interested in the difference between vibrato and straight tone singing on sound pressure level (SPL). College choir sopranos (N = 20) were recorded singing /a/ on representative low, middle, and high pitches in loud and soft dynamics with both vibrato and straight tone. This resulted in 24 trials per soprano (each condition was repeated). Analysis of the recordings found no significant difference in SPL for any condition except for a slight difference in the loud vibrato condition.
Weber concluded that conductors should base the choice of straight tone or vibrato on the acoustic characteristics of the performance location, since the sound pressure level (SPL) showed very little variance.¹²

Sundberg et al. (1999) chose an unexplored musician population to investigate voice source characteristics, one of which was intensity. Singing participants (N = 6 premier male country singers)

12 Weber, S. T. (1992). An investigation of intensity differences between vibrato and straight tone singing (Doctoral Dissertation, Arizona State University, 1992). ProQuest Dissertation Abstracts International, AAT 9223155.

wore a Rothenberg mask and were recorded speaking and singing the CV (consonant-vowel) syllable /pae/. The speech condition was twofold. In the first speech condition, the participants started at basal pitch (lowest comfortable pitch) and repeated /pae/ in soft, medium, and loud voice. This pattern was repeated at four successive thirds, imitating in speech the pitch pattern of an arpeggio. In the second speech condition, the participants spoke the syllable /pae/ to the pattern of a limerick in soft, medium, and loud voice. The singing conditions were also twofold. In the first, the participants chose and sang a song from their country repertoire on a starting pitch of their choice without accompaniment. Participants were encouraged to sing with all the same inflections, dynamics, and intensity as in a performance. The second singing condition had the participants sing The National Anthem at a starting pitch of their choice. Extensive detail was given to the recording and analysis process, including the equipment used. Listening participants (N = 19 singing experts) listened to a perception test designed to answer the question “How much pressedness do you hear in this voice?” Answers were given on a 100-mm visual analog scale which ranged from “None” to “Extreme”. One third of the samples were replayed to test for reliability. Listening participants’ perceptions included an awareness of different voice quality between the chosen country song and The National Anthem. The participants reported that the amount of pressedness heard in the samples increased with higher pitches that were coupled with louder volume. The correlation between the pressedness of the voice on higher pitches and increases in sound pressure level (SPL), which would be expected to also double the subglottal pressure (Ps), was not evident in the results. The results suggested that the smaller the SPL gain, the greater the perceived pressedness.
But, as expected by the authors, the closed quotient (CQ) and the glottal compliance were greater in loud speech than in soft speech, whereas in singing, the participants used similar or slightly higher closed quotient (CQ) values. The authors concluded that a voice source characteristic of country singing was very high closed quotient (CQ) values in loud singing. This characteristic, often considered a cause of vocal damage (pressedness), had not manifested itself in the vocal fold pathology of these participants.¹³

Miller, Schutte and Doing (2001) explored soft phonation in professional tenors. Participants (N = 2) were fitted with an electroglottograph collar and an esophageal balloon while singing four vocal tasks into a microphone: 1) a sustained Ab4, 2) an Ab4 arpeggio, 3) a sustained note in falsetto, and 4) a sustained

13 Sundberg, J., Cleveland, T., Stone, R., & Iwarsson, J. (1999). Voice source characteristics in six premier country singers. Journal of Voice, 13 (2), 168-183.

note in modal production. Each vocal task was performed at a soft level and then at a medium level, moving gradually down to a very soft level while maintaining the same vocal production. One participant’s vocal timbre was described as lyrical while the other voice was described as robust. The lyric tenor had no difficulty with the requested tasks. The robust tenor experienced a moment of silence as the voice would equalize from a louder production to a softer production. This was attributed to a longer closed quotient (CQ) phase that was incomplete and a steeper slope on the electroglottogram (EGG) that became significantly shallower at the very soft level. The lyric tenor maintained a steady subglottal pressure (Ps) throughout the entire task. This data prompted the authors to suggest that messa di voce is a voice register, not a vocal task.¹⁴

Formants ~ Resonances

Pulsating air flow through the glottis (the space between open vocal folds) is known as the voice source. When sound is measured at the voice source, the fundamental frequency (F0) will have the greatest amplitude. Each cavity of the vocal tract will have a resonance that will be represented in the source spectrum envelope as peaks of amplitude at various frequencies. These peaks of amplitude are formants. Beginning with the first spectral peak occurring at the lowest frequency, the formants are labeled in order F1, F2, F3, and so on.¹⁵ Each successive formant is higher in frequency. The resonance frequencies change as the vocal tract molds articulation. Specific frequencies increase with individual vowels that are articulated in a specific region of the articulatory system. The first frequency peak (F1) is usually associated with the pharyngeal space (back cavity of the mouth), particularly with the vowels /e/, /i/, and /ɨ/. The second frequency peak (F2) is generated in the front cavity of the mouth for the back vowels /u/, /o/, and /ɑ/. The third frequency peak (F3) is dependent upon the front of the tongue, especially in the vowels /u/, /o/, /i/, and /ɑ/. The fourth and fifth frequency peaks again have front-of-the-tongue influence on /ɑ/, /ɨ/, and /e/, whereas the back of the tongue influences /u/, /o/, and /i/. The fifth peak is strongly impacted by the larynx tube.¹⁶ It is the unique morphology of each singer that requires individually specific training to achieve maximum resonances from the vocal tract. Knowledge of the production and

14 Miller, D. G., Schutte, H. K., Doing, J. (2001). Soft phonation in the male singing voice: preliminary study. Journal of Voice, 15 (4), 483-491.

15 Fant, G. (1970). Acoustic Theory of Speech Production: With Calculations Based on X-ray Studies of Russian Articulations. Mouton: The Hague, pp. 17-20.

16 Fant, (1970). 121-122.

propagation of these resonances will aid in developing voices that are capable of singing healthily over orchestras and in producing full, rich choral ensembles.

One of the most cited articles in voice research is Fant et al.'s (1972) article on the measurement of subglottal formants. Measurements were taken of the first through third formants (F1, F2, and F3) in recordings of participants speaking the CV (consonant-vowel) syllable /pa/. Results of this study suggested that the glottal strength of participants had a direct impact on the measurement of subglottal formants. Weak and/or breathy voices showed more subglottal formant traces than normal voices. Formant measurement data garnered in this study was used to develop computer models of synthesized voices.¹⁷

Miller and Schutte (1990) defined formant tuning as using vowel modification to approximate one or both of the two lowest resonances of the vocal tract to harmonics of the glottal source.¹⁸ A leading Netherlands opera baritone was recorded singing melodic patterns on a variety of vowels and CV nonsense syllables with a catheter (fitted with a miniature wide-band pressure transducer) inserted through the neck and into the glottis area, as well as an electroglottographic (EGG) neck band. Vocal production began once the topical anesthesia had faded. Supra- and sub-glottal pressures were measured, as well as the formant frequencies and harmonics. Phonations were made at the participant’s choice of pitch and ranged from 230 Hz to 380 Hz (Bb3 to F4) – an area where vocal tract realignment is usually needed to move baritones into full head voice.
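The definition of formant tuning given above, bringing a vocal tract resonance close to a harmonic of the glottal source, can be illustrated by finding the harmonic that lies nearest to a given formant frequency. The following is a minimal sketch only: the function name is hypothetical, the F0 of 233 Hz is chosen to sit near the lower end of the 230 Hz to 380 Hz range reported above, and the formant values are invented for illustration rather than taken from Miller and Schutte's measurements.

```python
def nearest_harmonic(f0_hz: float, formant_hz: float) -> tuple[int, float, float]:
    """Return (harmonic number, harmonic frequency, formant minus harmonic)."""
    # Harmonics are whole-number multiples of f0; the first harmonic is f0 itself.
    number = max(1, round(formant_hz / f0_hz))
    harmonic_hz = number * f0_hz
    return number, harmonic_hz, formant_hz - harmonic_hz


if __name__ == "__main__":
    f0 = 233.0  # assumed F0, roughly Bb3
    # Hypothetical first and second formant frequencies, for illustration only.
    for formant in (600.0, 1000.0):
        number, harmonic, offset = nearest_harmonic(f0, formant)
        print(
            f"Formant {formant:.0f} Hz: nearest harmonic is number {number} "
            f"at {harmonic:.0f} Hz (offset {offset:+.0f} Hz)"
        )
```

In this simplified picture, a smaller offset means the formant sits closer to a harmonic; a singer modifying the vowel would be shifting the formant value, while the harmonics stay fixed by the sung pitch.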
In this range, the participant reduced the subglottal pressure (Ps) and modified the vowel to make a smooth transition into head voice.¹⁹

Miller and Schutte (1992) continued their research into subglottal pressure and formant measurement by recording professional male singers (n = 2), equipped with two glottis transducers, an electroglottograph (EGG), and a microphone at a distance of 30 centimeters. The recorded singing tasks were four scales on the vowel /a/ and sustained /a/ vowels on four range-representative pitches. Conclusions included confirmation that the measurement tools could show the center frequencies of pitches when vibrato was present in the singers' vocal production. The same equipment was able to accurately measure

17 Fant, G., Ishizaka, K., Lindqvist-Gauffin, J., Sundberg, J. (1972). Subglottal formants. STL-QPSR, 13 (1), 001-012.

18 Miller, G., Schutte, H. K. (1990). Formant tuning in a professional baritone. Journal of Voice, 4 (3), 231.

19 Ibid.

the frequency distance between harmonics and a dominant formant. The vocal tract configuration was confirmed, in this study, as a variable in determining formant frequency modulation.20 Ternström (2007) chose to investigate formant frequencies by using a professional barbershop quartet. Three four-track recordings of Paper Moon were sung by the participants in an absorbent room. The recordings included the participants singing together but with each singer placed in one of the four corners of the room, each participant singing alone, each participant speaking alone, and then all participants speaking together. Each singer wore a small microphone taped on the end of his nose. The recordings were analyzed through inverse filtering utilizing Decap software to determine the identity of formant frequencies, the measurements of the spread of formant frequencies, and the relationship of partials in both individual and ensemble measurements. The vowels chosen for analysis were /u/ (to), /i/ (be), and /a/ (divine). Results suggested singers separated their formants from each other, as evidenced in widespread formant frequencies. Formant frequencies were often on or close to a partial of the individual singer as well as to the common partials of another singer. The spread formant frequencies may have been an effort to hear oneself better so that the combined sound might have seemed larger and more expanded, in other words, more resonant. In the barbershop world this is referred to as locked and rung!21 Success for this quartet was achieved through varied vowel production versus attempting to sing exactly the same vowel – the opposite of choral singing. Barbershop quartets may be able to increase their resonance by adjusting their vowel quality.22

Frequency

Frequency is the rate of vibration of a periodic event. In phonated sound this means the number of sound wave cycles per second (cps). When we measure frequency it is expressed in hertz (Hz).
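The relationship between a frequency in hertz and a named pitch can be illustrated with a short Python sketch. It is only an illustration and rests on assumptions that are not part of the studies reviewed here: twelve-tone equal temperament, a reference of A4 = 440 Hz, and MIDI-style note numbering (69 for A4) used purely as a convenient index. The function names are hypothetical.

```python
import math

# Assumptions (not from the source studies): equal temperament, A4 = 440 Hz,
# and MIDI-style note numbers (A4 = 69) used only as an internal index.
A4_HZ = 440.0
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def period_seconds(frequency_hz):
    """Duration of one cycle: a 440 Hz tone completes 440 cycles per second."""
    return 1.0 / frequency_hz


def nearest_note(frequency_hz):
    """Return (note name, deviation in cents) for the closest equal-tempered pitch."""
    semitones_from_a4 = 12 * math.log2(frequency_hz / A4_HZ)
    nearest = round(semitones_from_a4)
    cents_off = 100 * (semitones_from_a4 - nearest)
    note_number = 69 + nearest
    name = NOTE_NAMES[note_number % 12] + str(note_number // 12 - 1)
    return name, cents_off


if __name__ == "__main__":
    for hz in (261.63, 440.0, 587.33):
        name, cents = nearest_note(hz)
        print(f"{hz:8.2f} Hz -> {name} ({cents:+.1f} cents)")
```

Under these assumptions the same frequency always maps to the same note name, which mirrors the distinction drawn in this section between frequency as a measured quantity and pitch as a named, perceptual one.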
We assign a specific name to a pitch because we do not hear frequencies. Our available hearing range of frequency is approximately 20 Hz to 20,000 Hz.23 The lowest note that we can hear is what would be the lowest C (C0) on the piano if it were extended two whole tones. Each successive C going from left to

20 Schutte, H., Miller, D., Svec, J. G. (1995). Measurement of formant frequencies and bandwidths in singing. Journal of Voice, 9 (3), 290-296.

21 Ternström, S., & Kalin, G. (2007). Formant frequency adjustment in barbershop quartet singing. International Congress on Acoustics, Madrid, September 2007, 1-6.

22 Ibid.

23 Ladefoged, (1996), 21.

right on the piano is ordered numerically – C1, C2 and so on. These notes are said to be an octave apart. C4 is commonly referred to as middle C. A4 is the fourth A on the piano from left to right and is commonly known as A440 because the vibration of the air stream as it passes through the glottis is 440 cycles per second (cps) or 440 Hz. When we speak of pitch, we are using a perceptual term of relativity that functions on a scale from low to high. When we speak of frequency, we are speaking in absolutes using a term of measurement of the number of sound waves occurring within a second. (See Appendix D). In 1979, Shipp et al. recorded participants (total N not provided; n = 10 professional operatic singers; number of spastic dysphonic patients not provided) singing a variety of sustained vocal lines utilizing targeted frequencies throughout their ranges. Acoustic analysis revealed many differences between the sub-groups. The singer participants' variance of vibrato pitch was within ± 0.5 semitones whereas the patient participants had very little vibrato as reflected in their signal amplitude. The patient participants had very large cycle-to-cycle variations whereas the singer participants' variations were very small. However, the mean rate of variation was similar for both the singers and the patients. The results suggested the physiological manifestations of vocal tremor and vibrato are similar, yet singers may have mastered a stabilizing technique in which the nerve pulses of muscles are inhibited except for the superior laryngeal nerve, which stimulates the cricothyroid muscle. Perhaps patients and less experienced singers allow, or do not suppress, stimulation of muscle nerves in areas of the vocal tract (including the respiratory system) that cause muscles to engage that are not needed for phonation.24 The next three landmark studies investigated the understanding of singers' vowel production in a variety of singer modes of phonation.
Bloothooft and Plomp (1984) first recorded each singer (N = 14 professional singers, n = 7 male and n = 7 female) in an anechoic room singing the nine Dutch vowels for one to two seconds in each of the following tone qualities: neutral, light, dark, free, pressed, soft, loud, straight, and extra vibrato. These terms were taken from accepted vocal pedagogy and the participants confirmed knowledge and understanding of each of the terms. Comparison of the nine modes' average sound pressure level (SPL) revealed that the neutral mode and the free mode appeared to be interchangeable descriptors of the same mode of singing. Comparison of the nine modes' spectral compositions showed that the presence of, or increased use of, vibrato did not vary the spectral

24 Shipp, T. & Izdebski, K. (1979). Elements of frequency and amplitude modulation in the trained and pathologic voice. Acoustical Society of America Supplement, 1 (66), Fall 1979, 56.

compositions. From these conclusions, Bloothooft and Plomp reduced the number of modes to six: soft, light, dark, neutral, pressed, and loud.

Each singer’s classification was used to determine the fundamental frequencies (F0) used for each participant (five for men and four for women). The sopranos and tenors showed twice the spectral variance in the fundamental frequency (F0) across the vowels and modes of singing as that of the bass and alto participants. Although no perceptual data were taken, the authors suggested sopranos and tenors needed better intelligibility of vowels. The greatest vowel variance for all the participants was on the vowel /u/. The vowels /a/, /α/, and /ε/ showed half the variance of the vowel /u/. Information regarding the measurement tools used for the vowel variances was not provided; however, great detail was given to the measurement process and results.25 Bloothooft and Plomp's (1985) second article used the same subjects and data to discuss the vowel spectrum for each participant with respect to the main effect of the four vowels. Each vowel was measured in dB at increments of ten milliseconds with a 1/3-octave band filter spectrum that was “normalized” for sound pressure level (SPL). A comparison was made between the perception-oriented spectrum space (formant frequencies) and the production-oriented spectrum space (from 1/3-octave spectra). The vowels were represented as the most important single source of spectral variance for low fundamental frequencies (F0). Male and female variants were consistent with one another. The relationship between the average sound level of the singer’s formant (Fs) and the fundamental frequency (F0) was found to be vowel dependent. When the fundamental frequency (F0) was higher than 392 Hz, the results showed a lower singer’s formant for women. The modal register had less variability in the first formant (F1) than the falsetto register and it was hypothesized that in singing higher frequencies, the first formant (F1) is very close to the fundamental frequency (F0). Bloothooft references Sundberg's (1981) results which showed strong acoustic coupling between glottis and vocal tract26 and suggested this was a possible cause for these results.27

25 Bloothooft, G., Plomp, R. (1984). Spectral analysis of sung vowels: I. variation due to differences between vowels, singers, and modes of singing. Journal of the Acoustical Society of America, 75 (4), 1259-1264.

26 Sundberg, J. (1981). Formants and fundamental frequency control in singing. An experimental study of coupling between vocal tract and voice source. Acustica, 49, 47-54.

27 Bloothooft, G., Plomp, R. (1985). Spectral analysis of sung vowels. II. The effect of Fundamental frequency on vowel spectra. Journal of the Acoustical Society of America, 77 (4), 1580-1588.

Again, the same data are used in Bloothooft and Plomp's third study, which compared the individual participants' spectra of the different modes of phonation. The overall conclusions confirmed that primary differences in the fundamental frequency (F0) were associated with the differing lengths of the male vocal tract whereas in the women, the main difference was associated with the glottal opening. The pressed-dark mode of singing in the participants clearly showed increased pharyngeal volume which was directly influenced by the height of the larynx.28 Maxwell (1985) investigated the effect of masking on a singer's ability to sing in tune. Masking is the obscuring of one sound by another. In singing, the inability to hear oneself sing is often the result of a masking noise – which sometimes is the loudness of the surrounding singers. The greatest masking effect within a choir occurs within one's own vocal section, for those singers are singing the same frequencies (what we think of as pitches). In the first of three experiments, participants (N = 24 college voice majors) were recorded singing vocalises and song excerpts with and without masking noise. The second experiment recorded participants (N = 15) as they sang “The Star Spangled Banner” in a key of their choosing in which masking noise was added at an unknown, random point. The third experiment was a 10-week longitudinal study with four treatment conditions: 1) normal lessons and normal practice (CG – control group); 2) white noise lessons and normal practice; 3) normal lessons and white noise practice; and 4) white noise lessons with white noise practice. In each experiment, recordings were made of each participant before and after the experiment. From these recordings of the first two experiments, a listening tape was made for judge participants (N = 9, n = 3 voice teachers, n = 3 professional non-voice musicians, and n = 3 lay musicians).
The listening tape contained excerpts from the pre- and post-recordings of participants. The judges ranked the voice quality of the first excerpt as compared to the voice quality of the second excerpt as better, same, or worse (studies one and two). This same procedure was executed for an intonation comparison of the paired excerpts. For the third experiment, the judge participants were asked to rank the singer participants' vocal progress between the first of the paired excerpts as compared to the second of the paired excerpts. Five options were provided for the ranking: great progress, considerable progress, some progress, same, and worse. The judges’ perceptions of the first experiment participants’ samples found white noise adversely affected participant intonation and voice quality. It was not surprising that the judges were able to detect

28 Bloothooft, G., Plomp, R. (1986). Spectral analysis of sung vowels III. Characteristics of singers and modes of singing. Journal of the Acoustical Society of America, 79 (3), 852-864.

the point when masking noise had been introduced in the second study. The sample group of the third study which received the highest mean score ranking had masking noise during their lessons and practice time. However, comparison of variance within groups found much greater variance within all groups other than the control group. Teacher guidance with white noise indeed produced greater results. Participants without teacher guidance of white noise regressed. In all studies, participants tended to flat ascending passages, sharp descending passages, sharp sustained notes, and modify /ɑ/ to /ɔ/ or /a/ when masking was introduced. The recording, editing, and playback equipment are unknown for the listening participants' perception listening tape. Also, the production of white noise is unknown. These specifics would aid in understanding the conclusions drawn and the perceptions of the auditors, and would provide a roadmap from which to apply the information garnered. However, great detail is given to the statistical analyses of the listening participants’ responses.29 Gramming et al. (1988) examined the relationship between changes in voice pitch and changes in loudness. Male and female singers and non-singers (N = 20) were recorded singing triads (singers) and pitch glides (non-singers) to provide data for phonetograms. The same participants were asked to read a lengthy (non-related) passage, first in a quiet environment, followed by three additional readings in steadily increasing noisy environments. Singers were found to use a stronger fundamental frequency (F0) and an elevated frequency with increased noise in the environment. Non-singers showed no difference in pitch. The authors proposed singers’ wider pitch range accessibility and familiarity with their full pitch range as an explanation for these results.
Additionally, this may be a reason for reduced pathology in similar life settings.30 Nordmark and Ternström (1996) looked at intonation from a very different angle. The most defining intervals of Western tuning systems (Pythagorean, pure, and equal temperament) are the major and the minor third. Helmholtz believed that intervals which were not "purely" tuned caused a "beating" which would be heard as a dissonance.31 Nordmark and Ternström created synthesized non-beating ensemble sounds to add to the existing knowledge of beat ensemble sounds and their relationship to intonation. To create these sounds, synthesized violas were used because they most closely resembled human sounds - once a flutter component was added. The average ensemble flutter level was found to be between 10 and 15

29 Maxwell, D. (1986). The effect of white noise masking on singers. Journal of Research in Singing, 8 (2), 9-19.

30 Gramming, P., Sundberg, J., Ternström, S. Leanderson, R., Perkins, W. H. (1988). The relationship between changes in voice and pitch loudness. Journal of Voice, 2 (2), 118-126.

31 Helmholtz, (1885), 24.

cents (Ternström, 1993).32 For this experiment, nine cents of flutter was added to the synthesized viola sounds. Two groups of three ensemble sounds were used to create versions of major thirds: the first group had the fundamental frequency (F0) set at 220 Hz; the second group was set at 390 cents above the fundamental frequency (F0) for a slightly larger major third interval than a pure major third, which would have been at 386 cents above the fundamental frequency (F0). Once created, the dyad was replicated 9 more times at different fundamental (F0) pitches. Each dyad was repeated twice in random order on a 20 dyad perception test. The headphoned listening participants (N = 16, n = 11 undergraduate choral music education students, and n = 5 orchestra musicians) were given the opportunity to tune each dyad to their preference for a major third. The range of cents above the fundamental was 350 to 450 cents. If a participant expressed preference for a deviation above or below this range, the computer would not allow the participant to move on to the next dyad. The results showed listener preference for interval size of a major third was 395.4 cents - which is closer to equal temperament (400 cents) than to pure intonation (386 cents). Participant results suggested that non-beating intervals (pure intonation) are not preferred. However, participant preference reliability was inconsistent in this study.33

Quality of Tone

Helmholtz (1885) described the quality of a tone as being sometimes called its color, timbre, or register.34 When one is able to discern one pitch of the same frequency, duration, and loudness from another – it is because its quality of sound is different from the others.
Helmholtz determined that the difference must be in the manner in which the motion is performed within the period of each single vibration.35 This manner can be perceived as brighter or more acute; it could be the way the tone begins (onset) or ends (offset); the amount of resonance (or the lack of resonance) in the sound; or the effect of one's pronunciation on the tone.36 Fillebrown believed the quality of a tone was the result of the singer's mood or emotion; an expression of the individual which was completely unique to the singer.37

32 Ternström, (1993), 7.

33 Nordmark, J. & Ternström, S. (1996). Intonation preferences for major thirds with non-beating ensemble sounds. TMH- QPSR, 37 (1), 57-62.

34 Helmholtz, (1885), 24.

35 Ibid., 19.

36 Ibid., 65, 66, 113.

37 Fillebrown, (1911), 7-8.

Fillebrown did not have a scientific, anatomical, physiological explanation for the quality of a singer's tone but believed the answer would be found through continued research. Schoen (1921), a student of Carl Seashore, studied the presence of vibrato in professional sopranos (N = 5). Professional recordings of Nellie Melba, Alma Gluck, Frances Alda, Emma Eames, and Emma Destinn singing Bach-Gounod's Ave Maria were analyzed by tonoscope (early stroboscopy). The selected pitch was the third note of the composition, D5 (~ 587.33 Hz). Each participant's sample was analyzed with respect to the attack of the note, the accuracy of intonation, the fluctuation of the frequency, the release of the note, and the tonal movements leading to the note and away from the note. Individual characteristics were provided for each participant. The overall conclusions showed this tone was approached from a lower note, which resulted in a low attack frequency. Schoen surmised a time interval might have elapsed before the intensity of breath was engaged fully. Equally interesting was that the release of the note was high in frequency even though the next note was lower. Schoen conjectured that this might be due to an attempt to maintain a steady pitch to the end of the tone and that breath support might wane, causing the participant to press more breath support which raised the pitch. Each time the same pitch from the same musical phrase was repeated, the participant sang it differently. The vowel quality seemed to have no effect on the pitch accuracy. Movement from tone to tone seemed to be glide-like, almost a portamento. The participants seemed to sing sharp with respect to both pure and tempered intonation. Schoen concluded [erroneously] that although vibrato was present in every voice, it was only present when there was strain in the accompanying muscles.
Schoen suggested the muscle strain was in response to the singer's emotional excitement while singing and that vibrato was the result of a neuromuscular condition characteristic of the singing mechanism and therefore a periodic-pitch phenomenon.38 Bartholomew (1934) hypothesized that oscillator recordings of singers, both professional and amateur, would reveal the physiological structure(s) responsible for various qualities. With this information, singers, as much as was possible, would be able to consciously control the voice mechanism. Bartholomew recorded 46 films and from them defined four characteristics of good male voice quality: vibrato, tonal intensity, the presence of a strengthened low partial at 500 cycles per second (cps) or lower, and the presence of a high formant lying between 2400 and 3200 cycles per second (cps). Sometimes another peak occurred around 5700 cycles per second (cps), which the author surmised occurred when the larynx pipe was energized strongly enough that its natural octave began to appear. There were similar

38 Schoen, M. (1921). An Experimental Study of the Pitch Factor in Artistic Singing. Ph.D. Dissertation: University of Iowa, August, 1921.

indications for female voices but with the following exceptions: the high formant centered higher, around 3200 cycles per second (cps); and the coloratura had almost no high formant yet the tone quality was deemed "good" because of its "purity".39 Twenty-two years later, at the fifty-first meeting of the Acoustical Society of America, Bartholomew suggested a classification of singer tones was necessary. Spectrographic and X-ray studies of singing would allow for voice classification according to the singer’s voice quality, the singer’s expressed mood, and the vowel sung by the singer. Bartholomew proposed twenty-seven classifications of physiological differences visually noted in X-rays coupled with acoustic differences found in spectrograms.40 Fry (1956) immediately responded with 27 voice classifications, but based the system on three specific voice types – light, lyric, and dramatic. Although a definition of these three voice types is not provided, the general thought was that light described a voice that did not have a professional quality to the sound – perhaps lack of the singer's formant (Fs). Lyric and dramatic voices represented opposites of the professional voice spectrum. The three types were determined by the position of the larynx and the configuration of the epiglottis, pharynx, and root of the tongue. Additional factors taken into consideration were the mood of the singer and the vowel being articulated.41 Rshevkin (1956) recorded male voices singing vowels /u/, /a/, /i/, and /o/ on pitches ranging from 94 cycles per second (cps) to 490 cycles per second (cps) for a duration of approximately 0.1 seconds. Harmonic analysis revealed two distinct increases within two narrow bands of spectrum: 400-600 cycles per second (cps) and 2200-2800 cycles per second (cps), which were not present in untrained baritones. The higher formant frequency in the 2200-2800 cycles per second (cps) region was labeled the singer's formant (Fs).
Listeners described voices with the singer's formant (Fs) as metallic. Rshevkin suggested that these peaks occurred only at the beginning of the vowel which the trained singer learned to modify to

39 Bartholomew, W.T. (1934). A physical definition of “good voice-quality” in the male voice. Journal of the Acoustical Society of America, 5 (3), 25-33.

40 Bartholomew, W. (1956). A basis for the acoustical study of singing. The Journal of the Acoustical Society of America, 28 (4), 757.

41 Fry, D. B. (1956). A basis for the acoustical study of singing. Program of the Fifty-First Meeting of the Acoustical Society of America’s Joint Meeting with the Second ICA Congress. Cambridge, Massachusetts, 34.

the “vowel singing position”. These results agreed with the findings of his earlier research (1927) and those of Bartholomew who found a high singer's formant around 2800-3200 cycles per second (cps).42 Delattre (1958) felt the work of correlating voice formants with types and classes of voices had not yet been successfully accomplished. The design of this study was not provided, but through an acoustic articulatory comparison of vowel color and its effect on voice quality, Delattre reached the conclusion that the quality of a singer's voice seemed to be characterized by the two or three formants whose frequencies are just above the vowel formants.43 Arment's (1960) dissertation sought to compare the spectra of vowel tones with the perceptual designation of the same tones on a bright to dark hierarchy. For the initial pilot study, participants (N = 2 sopranos with perceptually different tone qualities) were asked to sing four different pitches (D4, A4, D5, F#5) on three different vowels (/i/, /a/, /u/) for a duration of four seconds per tone. The recordings were made in an 8' x 10' acoustically dead room. To make the perception tape, the tones had both their onset and offset trimmed, leaving a two-second tone. Auditors (N = 6 voice teachers and singers) were asked to rate the vowel on a bright to dark ranking scale of: 1) very bright, 2) moderately bright, 3) neither predominantly bright nor dark [neutral], 4) moderately dark, or 5) very dark. Analysis of the auditors' preferences included: 1) brightness to darkness rating for each vowel, 2) brightness to darkness rating for each pitch, 3) brightness to darkness ratings for each vowel on each pitch, 4) brightness to darkness ratings for each singer, and 5) brightness to darkness ratings for each vowel as sung by each singer. Those tones which received the highest agreement on the dark to bright hierarchy were chosen for spectral analysis, including identification of formants and intensities of harmonics.
Spectral analysis of the tones which the auditors found to be very bright revealed narrow formants, a second formant (F2) high in intensity, and strong high harmonics overall. The dark vowel spectra had broad formants, a third formant (F3) low in intensity, and a broad formant between 3000 and 5000 Hz. Another variable, two different singers with perceptually different tone qualities, was apparent in the overall spectra (no definitive information is given regarding this statement). The primary study recorded participants (N = 5 sopranos with a minimum of five years of vocal training) singing D4, F#4, B4, D5, C#5, A4, G4, and E4. Each tone was sung on each of six vowels: /i/, /e/, /a/, /o/, /u/, and /ə/. The participants were asked to sing specific tones on a specific vowel in a

42 Rshevkin, S. N. (1956). Some results of the analysis of singing voice. Program of the Fifty-First Meeting of the Acoustical Society of America’s Joint Meeting with the Second ICA Congress. Cambridge, Massachusetts. 34-36.

43 Delattre, P. (1951). The physiological interpretation of sound spectrograms. Publication of the Modern Language Association (PMLA), 66 (5), 864-875.

particular voice quality – bright, dark, or neutral. Each singer was given time to study the required order of tasks and then given time for a practice run prior to the official recording. To aid the participant in maintaining the intensity level between singing tasks, a decibel meter was positioned in the participant's sight line. The target intensity level was 75-80 dB. The participants were recorded in the same 8' x 10' acoustically dead room with a microphone thirty-two inches from the singer and forty-eight inches from the floor. The listening participants (N = 16 singing teachers and graduate level singers, n = 8 men and n = 8 women) were asked to evaluate a series of tones on a 10-point Likert scale from extremely bright to neutral to extremely dark. Each tone in the series was to receive its own evaluation although the auditor was going to hear six tones at a time. Spectral analysis of each tone was completed for all harmonic, formant, and vibrato data. These spectral data were cross-referenced with the listening participants' answers. Arment concluded the brightness or darkness of a tone may be regarded as a continuum of tonal characteristics.44 The brightness to darkness continuum might be influenced by the vowel, the intensity, and/or the pitch of the tone, but ultimately it stands alone as a significant descriptor of the tone. Varying loudness of tones showed no effect on the brightness or darkness of tones. However, vowel did seem to have a direct effect on the brightness or darkness of tone. Just as in the pilot study, bright tones had strong high harmonics whereas dark tones had strong low harmonics. Bright tones had narrow formant bands in comparison to wide banded dark tones. Tones which ranked the brightest showed greater intensity of the second formant (F2) and an increase in the amount of harmonics in the tone.45
Coleman (1973) investigated exactly what physiological components define the quality of a speaker's voice such that the speaker's sex is known. Two experiments were devised in which participants (N = 40 university students, n = 20 males and n = 20 females) were recorded performing a variety of speech tasks and repeating some of the tasks using a laryngeal vibrator. The recordings were analyzed and the vocal tract resonances (VTR) and laryngeal fundamental frequencies (LFF) were computed for each participant. The first perception test utilized five-second samples from each participant played backwards. In this test, auditors' (N = 17 university students) responses were significant (p < .01) with 94% accuracy in identifying the sex of the sample with respect to the laryngeal fundamental frequency

44 Arment, H. (1960). A Study By Means of Spectrographic Analysis of the Brightness and Darkness Qualities of Vowel Tones in Women’s Voices. (University Microfilms No. AAG6002989).

45 Ibid.

(LFF). Accuracy dropped to 56% when the sex of the sample was compared to the mean of the vocal tract resonances (VTR). The second perception test utilized the laryngeal vibrator samples which had the highest vocal tract resonances (VTR) for the females (n = 5) and the lowest vocal tract resonances (VTR) for the males (n = 5). Only two pitches were used for the samples – 240 Hz and 120 Hz. The samples had equal representations of the following descriptors: low vocal tract resonances (VTR) and low laryngeal fundamental frequencies (LFF), high vocal tract resonances (VTR) and high laryngeal fundamental frequencies (LFF), high vocal tract resonances (VTR) with low laryngeal fundamental frequencies (LFF), low vocal tract resonances (VTR) with high laryngeal fundamental frequencies (LFF). The auditors (N = 25 university students) were asked to determine the sex of the speaker and the results showed correct sex identification 245 out of 250 times in the first two descriptors above (those in which the VTR and the LFF are indicative of the same sex). When the descriptors were jumbled, male characteristics (low VTR and low LFF) were perceptually more prominent. The results of these experiments led Coleman to the conclusion that laryngeal fundamental frequency plays a heavier role in our ability to discern between male and female speakers.46 Teie (1976) used a variety of singers (N = 31, n = 5 male first year voice students, n = 5 female first year voice students, n = 5 male fourth year voice students, n = 5 female fourth year voice students, n = 3 male untrained singers, n = 3 female untrained singers, and n = 5 voice faculty members) to look at the effect of vocal training on presence of the singer's formant (Fs), an increase in energy in the 2800-3200 Hz range. Each participant was recorded singing the vowels /a/, /i/, and /u/ on two pitches.
The male participants sang at 160 Hz (E3) and 288 Hz (D4) and the female participants sang at 288 Hz (D4) and 512 Hz (C5). These pitches were chosen to represent the upper and lower voice registers of both the male and female participants. It was also deemed important for one of the pitches to be sung by all participants (288 Hz, D4). The participants were instructed to sing at full volume and to vary the distance of their mouth to the microphone by watching an oscilloscope so that 125 dB was maintained. The recordings were conducted in a soundproof speech laboratory room. Each of the participant's six samples was analyzed through spectrography for the fundamental frequency (F0) and the presence of partials in the tone. Teie concluded that the amount of training affects the frequencies higher than the second formant (F2), most specifically the range of 2 kHz to 4 kHz. The

46 Coleman, R. (1973). A comparison of the contributions of two vocal characteristics to the perception of maleness and femaleness in the voice. STL-QPSR, 14 (2-3), 13-22.

22 training effect was present in the intensity levels of the tones for both trained and untrained participants had similar configuration and breadth within the singer's formant (Fs) range, 2800-3200 Hz. Little difference between all of the subsets of participants was apparent in the F1 and the F2 when the singers were singing the same vowel at the same pitch. In the trained singers' samples, there was spectral energy peaks in the 6 to 8 kHz region. Most interesting was that the untrained singer's produced tones with almost as prominent singer's formant (Fs) as did the trained singers on the /i/ vowel. This suggests singers should strive to have the /i/ vowel quality in all vowel sounds to enhance the singer's formant (Fs) region. Teie went so far as to conclude the essence of consistent tone quality is the ability to color all vowel sounds with an /i/ resonance. Teie felt his results were circumspect due to the low participant number for each subset category. Additionally, the dynamic level chosen may have had an impact on the results and therefore a greater variety of dynamic levels would have provided keener insight as to this effect. The Fs was inconsistent in the female singers' spectra. In closing, Teie conjectured as to the effect consonants would have on the 47 presence of the singer's formant (Fs). To examine vocal registration, Cleveland (1977) recorded male participants (N = 8 professional

Swedish singers) singing the vowels /i/, /e/, /α/, /o/, /u/ on the pitches C3, F3, A3, E4. A listening test was designed with three hearings of each vowel vocalization presented in two twenty-five minute sessions each separated by a thirty-minute break (five vowels x four pitches x eight subjects). Some vowel sounds were synthesized by a source-filter network. Auditors (number unknown) were asked to determine the voice classification of the singers as bass, baritone, or tenor. Source spectra, formant frequency, and sonogram measurement were employed on the vowel vocalizations. The information showed the voice classification was dependent on vocal tract size and dimension; for example, the vocal tract length of basses singing /i/ was nineteen centimeters as compared to tenors at 15.5 centimeters. Cleveland also found timbre type classification to be strongly influenced by formant frequency and suggested that its importance outweighed pitch. The correlation between formant frequency of spoken vowels and sung vowels was quite high and could be useful in future voice

47 Teie, E. (1976). A comparative study of the development of the third formant in trained and untrained voices. (Doctoral Dissertation, University of Minnesota, 1976). Dissertation Abstracts International, 37, (10A), 6135.

classification. Since vocal timbre exists at an earlier age than full range capability, Cleveland suggested it is a better indicator of voice classification.48 Magill and Jacobson asked professional singers (n=15) and college music students (n=15) to identify their voice classification and then recorded them singing sustained vowels and major arpeggios appropriately pitched for their self-proclaimed voice categories. Analysis of the recordings showed the presence of increased spectral energy in the range of the singer's formant (Fs) in both males and females and in all voice categories. There was more singer's formant (Fs) presence in the male voices which

Magill and Jacobson hypothesized may have been due to a lower first formant (F1) that allowed for a greater number of harmonics to fall within the area of the singer's formant (Fs) frequency envelope. The strength of the energy in the singer's formant (Fs) region showed a direct correlation to the participant's amount of training and experience.49 Colton and Estill (1979) recorded participants (N is not provided) singing in four separate voice qualities on selected pitches throughout their vocal range. Auditors (N is unknown) had a high degree of accuracy in identifying the participants' modes of phonation, even at the ends of the vocal ranges. The acoustical results of the recordings showed definite frequency bandwidths, specific resonant peak locations with representative spectral envelopes to dynamic ranges. Physiological results were equally definitive of each vocal mode. The results suggested the unique features of each voice mode could provide singers with a roadmap toward a variety of healthy vocal modes of phonation that in turn would offer singers multiple voice qualities.50 Murray (1979) explored the presence of jitter in female spoken phonation as compared to sung phonation. Jitter is the presence of irregular periodicity in the action of the vocal folds and is often perceived as hoarseness. Female singers (N = 4) were recorded speaking the vowel /a/ and then singing the vowel /a/ in four different conditions (conditions are unknown). The recorded samples were measured for frequency perturbation (jitter). A panel of participants (N is not provided) was asked to listen to the recorded samples and determine if the samples were sung or spoken. Perception participants were unable

48 Cleveland, T.F. (1977). Acoustic properties of voice timbre types and their influence on voice classification. Journal of the Acoustical Society of America, 61, 1622-1629.

49 Magill, P., Jacobson, L. (1978). A comparison of the singing formant in the voices of professional and student singers. Journal of Research in Music Education, 26 (4), 456-469.

50 Colton, R., & Estill, J. (1979). Elements of quality variation voice modes and singing. Acoustical Society of America Supplement, 1(66), Fall 1979, 55-56.

to discern differences between spoken and sung vowels. The analysis results showed less jitter in spoken vowels than in the sung vowels.51 Hertegard et al. (1990) used sung vowels to investigate "open" versus "covered" vowels. Participants (N = 11 professionally trained male singers, n = 5 tenors, n = 3 baritones, and n = 3 basses) were recorded singing in both open and covered technique with a variety of acoustical equipment in many conditions. Participants received no training as to the difference between covered and open singing technique as all participants confirmed they had received such training from singing experts during their years of vocal study. The first study utilized a flexible fiberoptic endoscope to allow video of the working mechanism during both the open and covered singing of a one octave scale on the vowel /ae/. The participants were instructed to choose a scale that would cross the passaggio near the top of the scale. At the end of the scale the participants were asked to sing an octave interval to return to the starting pitch. The participants then sang a sustained note on the vowel /ae/ near the passaggio. No directions were given for dynamics in either task. The resulting recordings (both audio and visual) were observed and listened to by participants (N = 3, n = 2 phoniatricians and n = 1 logopedist) to evaluate whether or not the flexible fiberoptic endoscope recordings presented any differences between open and covered techniques. The designated form for the participants' conclusions had the categories “no difference” and “obvious difference” for the visual recordings and “obvious, slight, or nil” for the audio recordings. Obvious differences were noted by the panel participants in the recordings between open and covered vocal production. Visual analysis revealed the soft palate was consistently higher in seven of the subjects in covered singing. Ten of the subjects widened their pharynx for covered singing.
Of the five participants in whom the larynx was clearly visible, all five widened the laryngeal ventricles and tilted the larynx forward in covered singing. In the second study, participants (N = 7 male singers) wore a Rothenberg mask (pneumotachograph mask) and were recorded singing /pae/ at a pitch of their choosing near the passaggio, alternating between open and covered singing. The recorded samples were inverse filtered and produced a transglottal air-flow wave form (FGG) for analysis of the first and second formant (F1, F2). A flow glottogram graph (FGG) shows specific activity of the vocal fold cycle peak-to-peak flow amplitude in

51 Murray, T. (1979). Vocal jitter in singers voice. The 98th Meeting of Acoustical Society of America, November 1979, Salt Lake City, Utah, 55.

milliliters per second, glottal leakage in milliliters per second, period time in milliseconds, and duration of the quasi-closed phase in milliseconds. The results obtained from the inverse filtering were varied.

Subglottal pressure (Ps) and sound pressure level (SPL) showed little or no variation between covered and open singing. The first formant (F1) was generally lower during covered singing whereas the second formant (F2) was generally higher in covered singing. Also, the voice source appeared different between open and covered singing – although no definitive information was detailed. The participants from the second study also participated in the third study. Participants were recorded singing a sustained vowel /ae/ near the passaggio at a pitch of their choosing in both open and covered technique. Spectral analysis of these recordings gave information regarding fundamental frequency (F0), the level of harmonics, and the frequencies of the first and sometimes second formant (F1,

F2). The spectrogram of each participant's open singing was superimposed on that participant's covered singing spectrogram for six of the seven participants. The energy of the singer’s formant region was unchanged between open and covered production. The highest energy level was located at the harmonic closest to the first formant (F1), but it was unclear whether this was in open or covered singing. When the participant used covered technique to equalize the passaggio, the frequency of the fourth harmonic (F4) would often agree with the frequency of the second formant (F2). Perhaps a relationship existed between the passaggio and this match. Was this result due to the increased loudness (averaging eight dB) in covered singing versus open singing? Another factor in the sound spectrum was that the amplitude of the fundamental frequency (F0) was higher in covered singing, just as the first formant (F1) was lower. These changes were speculated to be due to changes in the voice source. Most importantly, these combined results suggested that covered singing reduced strain on the vocal mechanism and could prevent hyper-functional strain of the larynx.52
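The harmonic-to-formant agreement reported above can be illustrated with a short numerical sketch. The function below is hypothetical, as are the example values (a fundamental of 330 Hz and a second formant of 1320 Hz); none of these figures come from Hertegard et al. The sketch simply finds which harmonic of the fundamental lies closest, in hertz, to a given formant frequency and reports the remaining distance in cents.

```python
import math

def nearest_harmonic(f0_hz, formant_hz):
    """Return (harmonic number, offset in cents) for the harmonic of F0
    that lies closest in hertz to the given formant frequency."""
    if f0_hz <= 0 or formant_hz <= 0:
        raise ValueError("frequencies must be positive")
    # Harmonics sit at integer multiples of F0, so rounding the ratio
    # picks the nearest one (never below the fundamental itself).
    n = max(1, round(formant_hz / f0_hz))
    # Positive offset: the formant lies above the harmonic.
    offset_cents = 1200.0 * math.log2(formant_hz / (n * f0_hz))
    return n, offset_cents

# Hypothetical values, chosen only for illustration.
harmonic, offset = nearest_harmonic(330.0, 1320.0)
print(harmonic, offset)  # prints "4 0.0"
```

An offset near zero cents corresponds to the kind of agreement between the fourth harmonic and F2 described in the study; larger offsets would indicate that the harmonic and the formant are not aligned.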

Detweiler (1994) designed a study to confirm Sundberg’s concept of the singer’s formant (Fs).

The singer's formant (Fs) and the source of the singer's formant (Fs) resonance has a direct relationship between the ventricular spaces in pulse phonation, and the laryngopharyngeal outlet cross section area which would result in a 6:1 ratio. One tenor and two baritones (N = 3) were recorded singing during laryngeal stroboscopy and an MRI procedure. Although the participants produced consistent energy increases in the singer's formant region (Fs) in all procedures, and in both modal and pulse modes, these participants did not meet the 6:1 ratio requirement. However, the MRI images were obtained while the

52 Hertegard, S., Gauffin, J., Sundberg, J. (1990). Open and covered singing as studied by means of fiber optics, inverse filtering, and spectral analysis. Journal of Voice, 4, 220-230.

participant was lying down, which produced a vertical orientation. The result was an overestimation of the area to be measured (Sundberg, 2003). However, this study confirmed the existence of singer's formant resonances (Fs1 and Fs2) in the pulse registers of these participants.53 Female barbershop tenors have a very specific voice quality which is perceived as light and having very little vibrato. Abbott (2001) recorded female barbershop tenors’ (N = 27) speaking and singing voices. Acoustic analysis of the recordings revealed consistencies throughout the participant group. The female barbershop tenors' voices were characterized with an increased fundamental frequency variation in their speaking voice when compared to existing data for similar aged women. When singing, the participants had great variability in the upper passaggios, higher spectral energy in the fundamental and lower harmonics, and vibrato presented in 25% of the time recorded (extremely low percentage).54

Registration

Helmholtz (1885) believed the tension of vocal folds not only determined the pitch of the tone, but also the register in which the tone originated. He also asserted the thickness of the vocal folds played a part in the sound of the tone; for example, the head voice was thought to be the product of the drawing aside of the mucous coat below the chords (sic) thus rendering the edge of the chords sharper, and the weight of the vibrating part less, while the elasticity is unaltered.55 The breast voice (chest or modal voice) was a result of the tissue below the vocal folds pulling at the bottom of the vocal folds, thereby making them in effect heavily weighted.56 The articles that we are about to examine are built on the foundation Helmholtz provided for us, yet many surprises are in store. Fillebrown (1911) acknowledged that head tones, chest tones, closed tones, and open tones were accepted vernacular of the day, but strongly advocated that registers were not a natural feature of the voice.
He supported his claim through a series of statements by surgeons and professional singing teachers. These included Manuel García, the creator of the laryngoscope, who was reported to have confirmed Fillebrown's belief in the "one voice" system.57 Although Fillebrown purported no vocal

53 Detweiler, R. (1994). An investigation of the laryngeal system as the resonance source of the singer’s formant. Journal of Voice, 8 (4), 303-313.

54 Abbott, S. E. (2001). Acoustic evaluation and analysis of the female barbershop tenor voice. Unpublished doctoral dissertation, The Florida State University.

55 Helmholtz, (1885), 101.

56 Ibid.

57 Fillebrown, (1911), 2.

registers, he provided this definition of registers: a series of tones of a characteristic clang or quality, produced by the same mechanism.58 Janwillem van den Berg (1963) agreed that vocal registers were produced by the same mechanism, but asserted the larynx had three different audible responses. These three different responses van den Berg labeled as registers: chest, middle, and falsetto. Each of these registers had different physiological activities such as resonances that were present in the actual chest when singing in chest voice. Van den Berg found this method of register definition incomplete. In response, he classified tones by recording singers and analyzing the recordings by means of a sonagraph to explore the height of each tone as well as the number of significant partials present in the tone. From this analysis, van den Berg created a chart which defined five registers (Strohbass, Chest, Mid, Falsetto, and Whistle) by the amount (weak, medium, strong, increased, and decreased) of involvement of 18 different physiological activities necessary for phonation. From this chart, overlapping regions of vocal instability were evidenced, and supported the idea of register equalization. Register equalization was the process a singer initiated to maintain the sound of one vocal register, which brought us back to the opinion of Fillebrown.59

Table #2: Register Definition by Physiological Activity60

Physiological Activity                    Strohbass  Chest   Mid     Falsetto  Whistle
Contraction of Interarytenoids            Weak       Weak    Medium  Strong    --
Contraction of Laterals                   --         Weak    Medium  Strong    Medium
Longitudinal Tension in Vocal Muscles     Small      Large   Medium  --        --
Longitudinal Tension in Vocal Ligaments   --         --      Medium  Large     --
Amplitude of Vocal Folds                  Large      Large   Medium  Small     --
Closure Time                              Small      Large   Medium  --        --
Number of Partials                        Small      Large   Medium  Small     --

58 Ibid. 38.

59 Fillebrown (1911), 38.

60 Van den Berg, J. (1963). Vocal ligaments versus registers. The NATS Bulletin, Dec, 16-31.

Table 2 continued:

Key: EICA = effect of increasing constriction of interarytenoids; ECL = effect of constriction of laterals; EITLV = effect of increasing longitudinal tension in vocal muscles; ELTVL = effect of increasing longitudinal tension of vocal ligaments.

Physiological Activity                             Strohbass   Chest       Mid         Falsetto    Whistle
EICA on pitch                                      Increase    Increase    Increase    --          Decrease
EICA on register, eventually                       → chest     --          --          --          → chest
ECL on pitch                                       Increase    Increase    Increase    Increase    Decrease
ECL on register, eventually                        → chest     --          → chest     → mid       --
EITLV on pitch                                     Increase    Increase    Increase    Decrease    Increase
EITLV on register, eventually                      Stop        --          → chest     → mid       Stop
ELTVL on pitch                                     Increase    Increase    Increase    Increase    Increase
ELTVL on register, eventually                      Stop        → mid       → falsetto  --          Stop
Effect of increasing flow on pitch                 Increase    Increase    Increase    Increase    Decrease
Effect of increasing flow on register, eventually  → chest     --          → chest     → mid       → chest

Van den Berg suggested that register breaks, the stops listed in the chart above, occurred at about 300 cps (cycles per second) or between D4 and Eb4. Van den Berg shared in this article that confirmation

of the results was gained from William Vennard, a respected vocal scientist, although no reference was provided to a written source.61 However, Vennard (1970) published his own research on the chest, head, and falsetto registers seven years later. Vennard began his review of existing literature with Duey's account of Tosi (18th century voice instructor), who firmly stated there were only two voice registers; the first register was the upper, lighter register referred to as voce di testa, voce finta, or falsetto which was in contrast to the lower, heavier register referred to as voce di petto or voce piena.62 However, later in his career, Tosi left notes from lessons suggesting there was a difference between head register (previously called the upper register) and the falsetto register (a new register).63 Particularly interesting was that Vennard et al. referenced García's earlier conclusions of three registers – not García's later conclusion of one register referenced by Helmholtz.64 Also, Vennard et al. made an indirect reference to Fillebrown and others who believed in differences in resonances – not different registers.65 Vennard et al. used an electromyography (EMG) collar to record the activity of the voice source in register overlapping regions of the participant's vocal range. Results showed identifiable chest and falsetto ranges for both males and females. Head voice was defined as that area of the vocal range that existed between chest and falsetto. The increased activity of the voice source muscles was in direct correlation with the increase in sung pitches (frequencies). Differences in loudness of the overlapping regions of vocal range were perceived and reflected the electromyography (EMG) analysis. In conclusion, Vennard et al. found heavy phonation involved the myoelastic theory of air flow – but not in light registration. Women were found to utilize a mixed voice when singing in the overlapping register regions, whereas the men did not.
Air flow in both sexes was the greatest in falsetto and the least in head voice.66 Large (1973) followed Vennard et al. with an historical study which involved young adult female participants (n = 10) who were chosen from voice teacher recommendations at San Francisco State

61 Van den Berg, (1963), 21.

62 Duey, P. (1950). Bel Canto in its Golden Age: A Study of Its Teaching Concepts. New York: King's Crown Press. p. 113.

63 Ibid., 117.

64 Vennard, W., Hirano, M. & Ohala, J. (1970). Chest, head, and falsetto. The NATS Bulletin, December, 30.

65 Ibid.

66 Vennard, (1970), 37.

College. Each participant was auditioned to ensure the ability to sing comfortably in modal and chest registers and to sing the following two tasks: A3 to E4 on the vowel /a/ in chest register at a mezzo forte (medium loud, mf) dynamic; and A4 down to E4 on the vowel /a/ in modal register at a mezzo forte (medium loud, mf) dynamic. Auditors (n = 12) listened to a paired sample perception test of tone pairs to identify the register, the magnitude of register difference, and the vowel being sung. Results of the perception test yielded a 92% accuracy rating. Spectral analysis revealed that chest voice possessed more energy in the third, fourth, and fifth partials whereas modal register had more energy in the fundamental frequency (F0). Large hypothesized that chest and modal registers used different laryngeal movements.67 Large (1979) continued research into registers by recording participants (N = 10 college voice students, male and female) on a high-fidelity tape recorder singing /a/ in an overlapping region of male and female vocal ranges with the same fundamental frequency (F0 - pitch) and sound level (dynamic). A pneumotachographic system was used to measure airflow rates throughout the singing tasks. The laryngeal movements were analyzed by photography. The results suggested that changes in airflow and in laryngeal movement created sensations in singers that supported the feeling of voice registers, in agreement with the studies of Vennard et al. (1970).68, 69 Schutte and Miller (1984) identified a correlation between the singer's continual focus on even resonance production throughout the entire vocal range (resonance balancing) and scientists' studies of the relationship between the fundamental frequency (F0) and partials found in sung tones.
Participants (N = 2, n = 1 professional tenor [author 2] and n = 1 untrained male singer [author 1]) were recorded with sonagrams and spectrograms singing single vowels and vowel transitions on five different register representative pitches. Each vocalization was sung at a dynamic level comfortable for the participant. The participants maintained a distance of 30 centimeters from the microphone. The results showed clear differences between the trained and untrained singers. The trained singer maintained energy in the 2600-3200 Hz range whereas the untrained singer had stronger energy present in the fundamental frequency

(F0). The vibrato rate of the trained singer was consistent as compared to the untrained singer's vibrato

67 Large, J. (1973). Acoustic study of register equalization in singing. Folia Phoniatrica, 25, 39-61.

68 Large, J. (1979). Studies of the Garcían model for vocal registration. Acoustical Society of America Supplement, 1 (66), Fall 1979, 56.

69 Vennard, (1970), 37.

rate. Also, the partials in the trained singer's tones stopped around 3200 Hz in contrast to the untrained singer's numerous partials above 3200 Hz and up to 5000 Hz. Schutte and Miller suggested the trained singer is able to adjust his vocal tract to enhance resonances in all voice registers. Schutte and Miller's conclusions raised the question of the singer's management of the area of passaggio, that region of overlapping voice registers in which singers must coordinate resonance balancing. They chose F4 (349 Hz) to compare the trained and untrained singer's passaggio regions. The trained singer phonated this frequency in multiple timbres in an effort to minimize any perceptible difference in the overall sound. No information was provided regarding the untrained singer's production in the passaggio region. Analysis of the vocal folds, muscles, and ligaments showed that the amount of vocal fold tissue that is engaged in vibration, the amount of tension in the vocal folds, the length or longitudinal tension of the vocal folds as they respond to the cricothyroid muscles, the reaction of the vocal folds to the lateral cricoarytenoid muscles, and the reaction of the larynx to the involved muscles will determine the success of a singer's resonance balance throughout the vocal range. All of this musculature must work in tandem with the amount of subglottal air pressure and overall air flow.70 Titze (1988) brought this discussion of voice registers full circle by proposing two areas of vocal production as types of transitions. He defined the first transition as a periodicity transition and the second as a timbre transition. A periodicity transition occurs when the greatest concentration of energy is actually below the fundamental frequency (F0). A lower area of energy is the crossover frequency (Fc) and was found in the pulse register, which was earlier referred to as the Strohbass register or the vocal fry region.
A timbre transition encompassed all other sudden changes in the vocal range wherein there was a loss or gain of high frequency spectral energy. This timbre transition was visually apparent in the spectral slope of the tone. A spectral slope is a visual representation of the measurement of the decrease in amplitude of successive partials of the voice source.71 This measurement is in decibels (dB) per octave. To begin the periodicity transition investigation, Titze created a synthesized listener perception test. The perception test stimuli (samples) were of sung vowels with different glottal flow pulses at F0 of 20 Hz, 40 Hz, 60 Hz, 80 Hz, and 100 Hz. The stimuli were randomized and repeated. The listeners were asked to identify the stimuli as pulsed (fry voice) or non-pulsed. The single determiner for the listeners was F0. This seems to have been because as the frequency rose, the amount of time the vocal

70 Schutte, H., Miller, D. (1984). Resonance balance in register categories of the singing voice: a spectral analysis study. Folia Phoniatrica, 36, 289-295.

71 Titze, I. (1988). A framework for the study of vocal registers. Journal of Voice, 2 (3), 183-4.

folds were closed (CQ – closed quotient) was decreased. This was most evident in the 100 Hz stimuli, which had a recognition factor of 3%, and a closed quotient (CQ) of 10 milliseconds. The listeners also perceived the stimuli in which the crossover frequencies (Fc) were above the fundamental frequency (F0) as pulsed and the reverse as non-pulsed. As for the timbre transition, Titze found that all of the major areas of resonance balancing (register equalization) that singers and voice teachers reference could be attributed to the first subglottal resonance. The difference between bass and soprano tracheal length is approximately 10-20% or two to three semitones which would allow for the passaggio range which overlaps between men and women; the last or top passaggio for men from head into falsetto and the first or bottom passaggio for women from chest into head register. Additionally, there are four predictable register breaks (timbre transitions) for each voice type. These four timbre transitions are further delineated into two that are inhibitory (the lower two) and two that are facilitatory (the upper two). Understanding each of these transitions and the role that subglottal air pressure and air flow play in equalizing the transitions would provide singers with a vocal range perceived to have no transitions (breaks).72 Vilkman and Alku (1994) were concerned with the register transition in the lower range from falsetto to chest voice. Participants (N = 2 experienced choir singers, n = 1 male and n = 1 female) were recorded in an anechoic room wearing an electroglottograph (EGG) collar and singing into a microphone that was 40 centimeters from the lips. An electroglottograph (EGG) is a device that measures the changes in electrical resistance at the glottis. The singing tasks were a very soft, breathy vowel glide in which the second vowel was produced in head voice and the first vowel was in chest voice.
The pitch was chosen by the participant and was to remain constant as the vowel glided from /a/ to /æ/. The researchers modified and duplicated this task with an excised male larynx. The results showed an increase in closed quotient (CQ) and in sound pressure level (SPL) when the participants moved from chest to head (female) and from head to falsetto (male). Vilkman and Alku warned that the vocal fold closure is certainly necessary for the register transition, but is not enough of an event by itself to signal the register transition. The authors concluded that register balance is a biomechanical adjustment.73 Miller and Schutte (2005) continued their foray into register equalization by recording professional singers (N = 4) singing an Ab3 scale passage. Electroglottography (EGG) and spectral

72 Titze, (1988), 183-194.

73 Vilkman, E. & Alku, P. (1994). Register shift in the lower pitch range. Proceedings of the Stockholm Music Acoustics Conference, 79, 271-275.

analysis of the recorded passages showed the closed quotients (CQ) increased between C4 and D4 for both of the mezzo soprano participants, but the second mezzo’s closed quotient (CQ) was half that of the first, leading to the conclusion that mezzo soprano two better blended the chest and modal registers. Yet, in perception listening tests, voice teacher auditors indicated preference for mezzo soprano one’s scale passages because they heard a more “seamless” scale. The participants were also recorded singing a sustained F4 /a/ (first space on the treble clef) in which each participant changed from chest to modal register. This resulted in a sudden drop in SPL (sound pressure level) at the moment of the register shift. High frequency components (prominent harmonics of 8 and 10) were present in the chest register examples just as in the falsetto range of men.

Furthermore, there was an upward leap in the fundamental frequency (F0) at the moment of register shift that was ‘corrected’ (equalized) within 200 milliseconds. Miller and Schutte suggested that the mezzo soprano register shifts were comparable to a professional yodeler’s rapid alternation between registers, but without the severe pitch changes. This study's results suggested successful register equalization was a product of resonance management versus muscular adjustment of the glottal voice source.74
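The closed quotient (CQ) compared in these studies is the fraction of each vibratory cycle during which the vocal folds are closed. A minimal sketch of that ratio follows. The function name and the example numbers are assumptions chosen for illustration and are not measurements from Miller and Schutte.

```python
def closed_quotient(closed_time_s, f0_hz):
    """Fraction of one glottal cycle spent closed:
    closed time divided by the period (1 / F0)."""
    if f0_hz <= 0:
        raise ValueError("F0 must be positive")
    period_s = 1.0 / f0_hz
    cq = closed_time_s / period_s
    if not 0.0 <= cq <= 1.0:
        raise ValueError("closed time must lie between 0 and one period")
    return cq

# Hypothetical example: folds closed for 1.5 ms per cycle at F0 = 294 Hz (near D4).
print(round(closed_quotient(0.0015, 294.0), 3))
```

Because the result is a ratio of two durations, it carries no unit, which makes values comparable across pitches with different period lengths.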

Björkner et al. (2006) explored the voice source differences of subglottal pressure (Ps) and its relationship to voice source parameters such as closed quotient (Qclosed), peak-to-peak pulse amplitude

(Up-t-p), amplitude of the negative peak of the differentiated flow glottogram (MFDR – maximum flow declination rate), and the normalized amplitude quotient (NAQ) with respect to the female chest (modal register) and the head register. Participants (N = 7 female musical theater singers between the ages of 17 and 43, all classically trained) were recorded wearing a Rothenberg mask singing the CV (consonant-vowel orientation) syllable /pae/ at a pitch of the participant’s choosing in which the participant could sing in both chest and head register. The pitch was sung initially as loud as possible and then gradually decreased to the softest possible sound maintaining the same vocal register. The participant sang three times in each register. The syllable /pae/ was chosen because the high first and second formants of the vowel add to the reliability of inverse filtering and the oral pressure during the p-occlusion allows estimation of sub-glottal pressure (Ps). Listening participants (N = 3) performed a perceptual evaluation of the samples with a computerized listening test (Judge by Svante Granqvist). Upon hearing one of the 280 samples, the listening participants moved a marker on a 1000 millimeter visual analogue rating scale between “Head”

74 Miller, D. G., Schutte, H. K. (2005). “Mixing” the registers: glottal source or vocal tract? Folia Phoniatrica et Logopaedica, 57, 278-291.

and “Chest” to designate the degree of register they perceived. From the results of this perception test, seventeen samples were chosen as representative of chest voice and head voice each. The thirty-four samples were analyzed by inverse filtering and suggested that the subglottal pressure (Ps) and glottal adduction were modified when the participant changed from chest to head register and from head to chest register. To successfully master this, a professional theatre singer would require mastery of respiratory and phonation muscles. Consistent with earlier findings (Sundberg et al., 2001), the closed quotient

(Qclosed) is longer in untrained singers in chest voice. Specifically, sub-glottal pressure (Ps), maximum flow declination rate (MFDR), and closed quotient (Qclosed) were higher in chest register whereas the normalized amplitude quotient (NAQ) was higher in head voice. Perceptually, this translated into greater register equalization at soft dynamic levels as compared to register equalization in loud dynamic levels.75 Roers et al. (2008) detailed a measurement method to predict the length of a singer's vocal folds which in turn would predict the singer's voice classification. In this method, measurements of the cartilages and muscles of the voice instrument, from x-rays of singers, were compared to the voice instructors' classifications of the singers. The results showed a significant agreement between the voice classification (subjective) and the vocal fold length measurement method (objective).76
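Of the voice source parameters examined by Björkner et al., the normalized amplitude quotient (NAQ) is commonly defined as the peak-to-peak flow pulse amplitude divided by the product of the maximum flow declination rate (MFDR) and the period length. The sketch below assumes that definition. The function name and the sample values are hypothetical and are not taken from the study.

```python
def normalized_amplitude_quotient(u_ptp, mfdr, f0_hz):
    """NAQ = U_p-t-p / (MFDR * T0), with T0 = 1 / F0.

    u_ptp : peak-to-peak flow pulse amplitude (e.g. liters per second)
    mfdr  : maximum flow declination rate, given as a positive magnitude
            (same flow unit per second)
    f0_hz : fundamental frequency in hertz
    """
    if u_ptp <= 0 or mfdr <= 0 or f0_hz <= 0:
        raise ValueError("all inputs must be positive")
    period_s = 1.0 / f0_hz
    return u_ptp / (mfdr * period_s)

# Hypothetical values: 0.3 L/s pulse amplitude, 600 L/s^2 MFDR, F0 = 220 Hz.
naq = normalized_amplitude_quotient(0.3, 600.0, 220.0)
print(round(naq, 3))
```

Under this definition, and with pulse amplitude and pitch held equal, a larger MFDR yields a smaller NAQ, which is consistent with the reported pattern of higher MFDR in chest register and higher NAQ in head voice.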

75 Björkner, E., Sundberg, J., Cleveland, T., & Stone, Ed. (2006). Voice source differences between registers in female musical theater singers. Journal of Voice, 20 (2), 187-197.

76 Roers, F., Mürbe, D., & Sundberg, J. (2008). Predicted singers' vocal fold lengths and voice classification – a study of x-ray morphological measures. Journal of Voice, Article in Press, accepted for publication December 6, 2007, 1-9.


CHAPTER THREE

HISTORY OF ACOUSTIC CHORAL MUSIC MEASUREMENT

"Throughout the centuries, until recently, music and acoustics have been closely allied. To be a musician, it was necessary to know thoroughly the science of sound, and the acoustician pursued his theories and experiments almost wholly for the benefit of music. Today, musicians as a group know far too little about acoustics, and acousticians know less about music.” (Knudsen, 1937) 77

The following collection of annotations of acoustic choral music research articles is arranged chronologically so the reader can see how the science has developed and gain a strong foundation of knowledge as we move into the 21st century. Glossaries are provided to aid understanding of the material presented. Every effort is made to provide the necessary details of each research effort as they apply to choral sound.

1878 – 1969

Helmholtz (1878, German physician and physicist) designated all sound as either music or noise - determined solely by the hearer of said sound. Each tone has three specific characteristics that are unique to it: its force, its pitch, and its quality.78 Force was recognized as a perceptual idea. The amplitude of the oscillating particle could be measured, but not the specifics of the differences in what one hears. The pitch of the tone is measured by the time taken for one oscillation. Helmholtz hypothesized that the quality of a sound is how the motion is performed within the time of one oscillation.79 From this simple hypothesis, the stage is set for acoustic choral measurement. Fillebrown (1911, oral surgeon) references Helmholtz's writings, which stated that resonance is enhanced by the pharynx and head cavities.80 Edmund Meyer (1886, German ophthalmologist) suggested a study to determine exactly which parts of the pharynx and head cavities played an active role in resonance. From this suggestion, Fillebrown, motivated by his oral surgery patients, came to the

77 Knudson, B. (1987). Interviews with selected choral conductors concerning rationale and practices regarding choral blend. (Doctoral Dissertation, The Florida State University, 1987). UMI ProQuest Digital Dissertation Abstracts, 135, AAT 8802564.

78 Helmholtz, (1885), 10-11.

79 Ibid., 19.

80 Fillebrown, (1911), 43.

informed conclusion that resonance arose from the vibrations of the air in the resonance chambers of the human instrument, together with the induced vibrations of the instrument itself, which [then] give tone its sonority, its reach, its color, and emotional power.81 Fillebrown's informed conclusions came from the lifelong practice of teaching voice lessons. His experiences and knowledge of the human anatomy led him to the conclusion that the human voice had but one voice register. This was in contrast to the practice of the day, in which most singers were taught that there were three voice registers: the chest, head, and falsetto registers. Fillebrown explained his one register belief through a comparison of the human voice instrument to the pipe organ. The pipe organ has a separate pipe for each timbre and register. Not so for the human instrument, which is but one amalgamated instrument with multiple timbres and pitches at its disposal, all representative of one register. Teaching a voice student to focus the voice in the nose and the head, and how to feel the vibrations of the head through a light touch of one's finger, will allow development of the entire voice without concern for a break such as the lowest passaggio for women, often found between D and D#.82 Good vocal quality possesses each of these descriptors, but what physiological structure provides for this in male baritones? Bartholomew (1934) implemented a study wherein male participants (N is not provided) were recorded singing into a condenser microphone with resistance coupled amplification that led to an oscillator. New technology included film speed capable of separating frequencies (pitches) in excess of 8000 cycles per second.
From both amateur and professional singers, Bartholomew secured forty-six films to evaluate for a scientific definition of good voice quality: possessing vibrato; consistent tonal intensity; a strengthened low partial at 500 cycles per second (cps) or lower; the presence of a high formant between 2400 and 3200 cps; and sometimes, another formant peak around 5700 cps. This study is considered the first documented, scientific study of voice acoustics and certainly the first mention of what would soon be labeled “the singer’s formant” (2800-3200 cps). Bartholomew hypothesized that the formant peak around 5700 cps occurred when the larynx pipe was energized strongly enough that its natural octave would appear.83 Bartholomew also recorded female singers, including a coloratura soprano. The results were comparable to the men's results except that the highest formant centered around 3200 cps -

81 Fillebrown, (1911), 45.

82 Ibid., 38-42.

83 Bartholomew, W. (1934). A physical definition of “good voice-quality” in the male voice. Journal of the Acoustical Society of America, 5 (3), 25-33.

perhaps due to a smaller larynx. Another interesting perceptual point was that the auditors (N is not provided) were willing to accept poorer quality tones from the women yet defined the tones as good quality. A perfect example is the coloratura soprano whose films did not show a high formant peak, yet the auditors deemed the tone representative of good voice quality because of its extreme “purity”.84 From 1929 to 1949, many leaps were made in technology which created an atmosphere for the study of acoustics, and in particular, arts acoustics. It was during this time that we began to understand the vibrations of the vocal folds and how the resulting sensation is received by the brain. Music directors learned of decibels (dB) as well as the physical and psychological aspects of hearing. Measuring equipment was developed and included sound recordings with pictures of frequency and intensity. Improvements in microphones, loudspeakers, and vacuum tube amplifiers changed recording procedures, and stereophonic transmission, the reproduction of sound through two speakers, changed forever how music could be heard.85 Manen and Fry (1956) of the University College of London had access to the latest technologies including x-rays and spectrographs. With this technology, the authors proposed that a voice classification system could be created for three primary voice qualities: light, lyric, and dramatic. For each of these voice classifications, each singer was said to have a battery of moods and vowels from which to articulate any given music. The singers were x-rayed (for physiological definition) and recorded on spectrographs (for acoustic determination) singing a declared mood and vowel.
From this process, twenty-seven possible combinations of voice classification were defined.86 Rshevkin (1956, Moscow State University) took a different approach in that the male participants (N is unknown) were recorded with an oscillograph singing five vowels (/a/, /e/, /i/, /u/, and /o/) for one second on a variety of pitches from 94 cycles per second (cps) to 490 cycles per second (cps). From these recordings, harmonic analysis revealed energy peaks in the narrow band regions of 400-600 cycles per second (cps) and 2200-2800 cycles per second (cps) in the professional singers. Singers' voices with the “singer’s formant” (Fs) were described as having a metallic sounding timbre. Amateur singers’ voice timbre was described as sounding sharp. [Rshevkin's choices of adjectives to describe voice timbre are confusing, but are his words.] Amateur singers’ records also showed a decrease in amplitude in pitches

84 Bartholomew, (1934), 25-33.

85 Bartholomew, W. (1949). The contributions of acoustics to the arts. Journal of the Acoustical Society of America, 21 (4), 311-314.

86 Manen, L. & Fry, D. (1956). A basis for the acoustical study of singing. Journal of the Acoustical Society of America, 28 (4), 757.

higher than 325 cycles per second (cps). An interesting conclusion presented by the author was that singers begin a vowel sound by aligning formants for identification of the vowel and then realign the vocal apparatus to “the singing position”. The speed with which a singer switches to the “singing position” is a determinant of a professional singer.87 One year later, Sacerdote (1957, Acoustical Laboratory, Istituto Elettrotecnico Nazionale, Torino, Italy) published an update on the measurement possibilities of the singer’s voice. Sustained notes could now be examined as to their quality of sound and their amplitude and frequency vibrato. So could the movement from one note to another, as in portamento (a sung glide of notes in between the first notated pitch and the second notated pitch) and glissando (sung notes of a scale between the first notated pitch and the second notated pitch), and the relationship between what one hears and what one phonates (auto-regulation). Sacerdote suggested successful study of the singing voice could be accomplished from these perspectives: movement of the vocal folds; sets of resonant cavities used for different registers; measurement of air in vocal production; external measurement of sound pressure produced through phonation; and the total mechanics of vocal emission. Unlike many of his contemporaries, Sacerdote supported research into all manner of voices – those trained and untrained, educated and uneducated, and beginners to singing as well as lifelong singers – not just professionals. From Sacerdote's guidelines, the outline of voice research and the framework for choral sound measurement were defined. The first choir recordings made for analysis were of a soprano monodic choir (N is not provided) singing a fragment of a song. Then Sacerdote recorded a male monodic choir (N = 15) singing a plainsong chant. The soprano choir vibrato rate far exceeded that of the male choir.
As a possible reason for the difference in vibrato rate, Sacerdote refers to a synchronization phenomenon that arises from the fusion of timbres, similar discrimination of pitch, and the blend of individual intensities. Perceptually, the ear will hear the mean of the individual extremes.88 In contrast to Sacerdote, Lottermoser and Meyer (1960, German physicists) studied commercial recordings of polyphonic choirs. They were interested in the intonation of intervals – specifically major and minor thirds, fifths and octaves. The measurement reporting method used to compare intervals of different sizes was cents, which is a logarithmic scale in which the octave is divided into 1200 equal parts or 1200 cents. Lottermoser and Meyer used the equal temperament scale and therefore each semitone was

87 Rshevkin, S. (1956). Some results of the analysis of singing voice. Program of the Fifty-First Meeting of the Acoustical Society of America’s Joint Meeting with the Second ICA Congress. Cambridge, MA., 34-36.

88 Sacerdote, G. (1957). Researches on the singing voice. Acustica, 7, 61-68.

100 cents. The major thirds (400 cents in equal temperament) averaged large at 416 cents (wide by about 1/6th of a semitone) whereas the minor thirds (300 cents in equal temperament) were small at an average of 276 cents (narrow by almost 1/4th of a semitone). All three choirs used octaves and fifths in just intonation. The fundamental frequency (F0) was measured by the bandwidth of partial tones and found to have a large range of variation – 6 to 30 cents.89 The next choral study came just one year later and was the first choral dissertation (University of Utah, 1958), which Lambson condensed and published in 1961. Lambson’s participants (N is not provided) were members of a college chorale, probably 18-25 years old. These participants were recorded singing two contrasting songs in each of the following four formations: sectional block (each voice type clustered together), quartets (one of each voice type standing together), scatter/scramble (a formation created by voice matching small groups of 2-3 singers at a time), and random distribution (singers stood wherever they wanted). During the recording, auditors (N = 10) used the MENC (Music Educators National Conference) standard adjudication form to evaluate each performance. The auditors were unable to see the choirs – only hear them – so the choral formation was not a factor in their ratings. The recordings were randomized and played again for the same auditors to evaluate. Choral formation was not detectable in either the live or recorded samples and therefore preference was not expressed for one formation over another. The scatter/scramble choral formation received the lowest ratings and was described as acoustically inferior. Lambson concluded this study with an acknowledgment that choral tone could not be acoustically measured at this time.90 The decade of the sixties was not a strong period of recorded choral music measurement.
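The cents scale used by Lottermoser and Meyer can be illustrated with a short sketch. It assumes the standard definition of the cent (1200 times the base-2 logarithm of a frequency ratio) and the usual just-intonation ratios (5:4, 6:5, 3:2, 2:1). The function and variable names are illustrative only and do not come from the original study. The only measured values are the two mean interval sizes quoted in the text.

```python
import math


def cents(f_low: float, f_high: float) -> float:
    """Size of the interval from f_low to f_high in cents (1200 cents = 1 octave)."""
    return 1200.0 * math.log2(f_high / f_low)


# Equal-tempered reference sizes: every semitone is exactly 100 cents.
EQUAL_TEMPERED = {"minor third": 300.0, "major third": 400.0}

# Just-intonation sizes, computed from small whole-number frequency ratios.
JUST = {
    "minor third": cents(5, 6),  # ratio 6:5
    "major third": cents(4, 5),  # ratio 5:4
    "fifth": cents(2, 3),        # ratio 3:2
    "octave": cents(1, 2),       # ratio 2:1
}

# Mean sung interval sizes as quoted in the text for Lottermoser and Meyer (1960).
REPORTED = {"minor third": 276.0, "major third": 416.0}


def deviation_from_equal_temperament(interval: str) -> float:
    """Positive values mean the sung interval was wider than equal temperament."""
    return REPORTED[interval] - EQUAL_TEMPERED[interval]


if __name__ == "__main__":
    for name in ("major third", "minor third"):
        print(
            f"{name}: sung {REPORTED[name]:.0f} cents, "
            f"equal-tempered {EQUAL_TEMPERED[name]:.0f}, "
            f"just {JUST[name]:.2f}, "
            f"deviation {deviation_from_equal_temperament(name):+.0f} cents"
        )
```

Read this way, the reported major thirds were 16 cents wider and the minor thirds 24 cents narrower than their equal-tempered sizes.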
Instead, measurement continued largely through individual voice studies focused on understanding the anatomy and physiology of the voice source. Janwillem van den Berg (1963, Dutch speech scientist and medical physicist) conducted extensive research on excised human larynges which led to his myoelastic-aerodynamic theory of voice production. This theory combines two theories (myoelastic and aerodynamic) that are not in contention with one another but work simultaneously to produce phonation. The myoelastic theory holds that air pressure builds up below the closed vocal folds (sub-glottic pressure) until the folds open, allowing air to move freely through the glottis until the folds come together again and the cycle is repeated. The rate at which this cycle repeats is expressed in cycles per second (cps), which in turn determines the frequency of the pitch phonated – for instance

89 Lottermoser, W. & Meyer, F. (1960). Frequenzmessungen an gesungenen Akkorden. Acustica, 10, 181-184.

90 Lambson, A. (1961). An evaluation of various seating plans used in choral singing. Journal of Research in Music Education, 9 (1), 47-54.

A4 is 440 cycles per second (440 Hz). The aerodynamic theory speaks to the muscles and cartilages affected by the free flowing air through the glottis. The myoelastic-aerodynamic theory joined with the Bernoulli principle, when applied to phonation, explains the increase in the flow of air through the glottis occurring simultaneously with a decrease in vocal fold tension. The decrease in vocal fold tension allows for rippling action as the air moves through – much like a shower curtain's movement when the shower space's air pressure is moved by the force of the shower water. This aerodynamic activity causes the vocal folds to be moved into vibration before the arytenoids are together. When the arytenoids fully close, the air flow stops and pressure must build up again, beginning another cycle. Of particular interest to this effort are van den Berg’s acoustical analyses of the vibrational patterns of the vocal ligaments when singing in the area of register overlapping. Initial classification of registers was established by taking into account the height of the pitch and the number of significant partials as recorded by sonagraph. Van den Berg recorded vibrational patterns with a stroboscope and delta f generator. The information gathered through this process is reflected in a generalized Scheme of Registers table which lists the events of the vocal ligaments in each of the five primary voice registers: Strohbass, chest, mid, falsetto, and whistle, supporting the author’s conclusion that the longitudinal tension in the vocal ligaments and muscles is an important factor in register equalization.91 Along with register equalization, singers must also choose the mode of singing for a given situation. Harper's dissertation (1967, Indiana University) addressed his concern about the vocal health of singers who must fluctuate between solo and choral singing.
What information regarding the differences and the similarities between solo and choral singing would aid conductors in preserving singers' vocal health? Participants (N = 12 male and female university music students who had had private voice lessons) were recorded singing in both the choral and solo setting. Each music selection, whether for solo or choral conditions, was chosen for two reasons: the author believed more realistic results would be gained from actual serious literature that the participants had prepared in the two modes, and each selection contained the chosen vowels on specific pitches and for a duration usable for analysis. The vowels chosen were /i/, /a/, and /u/. The choral portion utilized Mendelssohn’s He Watching Over Israel from Elijah, Mendelssohn’s first chorus from Opus 42 Like As the Hart, Handel’s Glory to God and Surely He Hath Borne Our Griefs from Messiah, and Schein’s Who With Grieving Soweth. The choral selections were practiced for one hour a week for four weeks. The study was performed in a recording studio that had been acoustically

91 Van den Berg, (1963), 16-31.

treated to reduce outside noise and reverberance. The choir was seated in two rows facing the conductor. The individual choral singer being recorded stood approximately four feet behind the back row of the ensemble facing the conductor. The participant sang into a standing microphone in a blended fashion to the conductor’s approval. The ensemble was accompanied by a piano. The solo portion used appropriate movements, from the same mass works mentioned above, chosen by voice classification. The participant stood in front of a standing microphone positioned three inches from the participant’s mouth, configured to create a 90° angle to the breath stream to guard against distortion from plosive consonants. A plywood box with fiberglass insulation was hung six inches in front of the microphone to shield the ensemble sounds from the microphone. The soloist was accompanied by a piano. The intensity levels reached in the choral sections were identified from the VU meter on the tape recorder. These measurements were logged and used as a reference for the solo sections. Researchers aided the participants in producing the same or nearly the same intensity level in solo singing that was accomplished in the choral singing. Using the recorded solo mode intensity levels for each participant, the researchers provided visual cues to direct the participant to sing with more or less intensity as the recording proceeded to approximate matching levels for both modes. This was done to rule out intensity level as an effect on mode of phonation. The perception samples were taken from both the solo and choral singing modes of the same vowel at the same pitch. All manner of variables were randomized and represented in a 54-item ABX test (sample A, sample B, and then either sample A or B again – the participant determines whether the last sample (X) is A or B). Each trio of sounds had a choral vowel, a solo vowel, and then a repetition of one of them.
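The ABX procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reconstruction of Harper's actual test tape. The stimulus labels, the random seed, and the helper names `build_abx_trials` and `percent_correct` are hypothetical. The only detail taken from the text is that each of the 54 trials contains a choral vowel, a solo vowel, and a repetition of one of the two.

```python
import random


def build_abx_trials(stimulus_pairs, n_trials=54, seed=0):
    """Build randomized ABX trials from (choral, solo) stimulus pairs.

    Each trial presents the two samples in random order as A and B,
    then repeats one of them as X. The listener must say whether X was A or B.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    trials = []
    for i in range(n_trials):
        choral, solo = stimulus_pairs[i % len(stimulus_pairs)]
        # Randomize which mode is heard first.
        a, b = (choral, solo) if rng.random() < 0.5 else (solo, choral)
        # Randomize which sample is repeated as X.
        x_is_a = rng.random() < 0.5
        trials.append(
            {"A": a, "B": b, "X": a if x_is_a else b, "answer": "A" if x_is_a else "B"}
        )
    rng.shuffle(trials)  # randomize presentation order
    return trials


def percent_correct(trials, responses):
    """Percentage of trials in which the listener's response matches the answer."""
    hits = sum(1 for t, r in zip(trials, responses) if t["answer"] == r)
    return 100.0 * hits / len(trials)


if __name__ == "__main__":
    # Hypothetical labels: vowel and mode only, not Harper's actual stimuli.
    pairs = [(f"choral_{v}", f"solo_{v}") for v in ("i", "a", "u")]
    test = build_abx_trials(pairs)
    guesses = ["A"] * len(test)  # a listener who always answers "A"
    print(len(test), "trials;", f"{percent_correct(test, guesses):.1f}% correct")
```

A listener who cannot hear any difference is expected to score near 50% on such a test, which is the baseline against which a reported identification rate should be read.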
Spoken numbers were inserted before each increment of five (number five, number 10, number 15, and so on …). The listening participants (N = 31, n = 14 linguistics majors and n = 17 music majors) were asked to identify which of the first two vowel sounds the third vowel sound matched. Listening participants were unable to discern between solo and choral vowels. This was supported in the analysis of the spectrograms. In particular, there were no differences between the first and second formants. Variances occurred between subjects. The listening participants were able to discern accurately between choral quality and solo quality more than 75% of the time. This also was supported in the spectrogram analyses in which the partials between the formants had more energy in the choral quality than in the solo quality for most singing participants but not at all pitch levels. There was no notable difference in the soprano voices when they sang chorally – but in the solo mode the sopranos had

increased energy in the upper formant regions for the higher pitched /i/ and /u/. The alto and tenor had more partial energy in the choral mode when singing the /i/ vowel. No enunciation differences were identified by the listening participants. Harper made direct comparisons of his data with that of his predecessors and found that, unlike Rshevkin (1956), there was no change of the formants after the initial enunciation of the vowels. None of the vowel spectrograms Harper analyzed showed evidence of the singer's formant (Fs). Harper hypothesized the increased energy in the first and second formants (F1, F2) in choral singing might have been caused by increased nasality or by relaxed musculature.92

1970 – 1979

The seventies began with another hallmark study regarding voice registers. Vennard (1970, voice professor), Hirano (otolaryngologist), and Ohala (linguistics professor) recorded participants (N = 4, n = 2 women and n = 2 men) singing two octave scales in light registration and again in heavy registration in a variety of keys, resulting in over 150 recorded scales. Participants were fitted with new electromyography (EMG) electrodes that had been developed the previous year, which were lighter and more flexible, providing the necessary equipment finesse for the studies of Vennard et al. After the scales, the participants performed messa di voce (swell tones) in light and heavy registration on a variety of tones throughout the vocal ranges established by the scales. The scales, both ascending and descending, were successful in light registration. In heavy registration, the female participants struggled descending and one of the male participants struggled ascending. Chest and falsetto were present and easily identified in both male and female participants. The area between the two was significant and yet different; the male area seemed to be a clear head voice but the female area was more of a mix. Vennard et al.
named the pitch region between chest and falsetto mid-voice or mixed registration, a designation supported by the electromyography (EMG) graphs. Muscular activity was detailed in the graphs, leading to the conclusions that in light registration the musculature works minimally with heavy air flow, in contrast to heavy registration, where the musculature was heavily involved and the aerodynamics provided merely what was needed for the musculature. The exception to this physiology occurred in falsetto, where the air flow was the greatest, followed by chest, and the least in mid registration. One of the participants, who had a three-plus-octave range in both light and heavy registration without a decrease or increase in volume,

92 Harper, A. H., Jr. (1967). Spectrographic Comparison of Certain Vowels to Ascertain Differences Between Solo and Choral Singing, Reinforced by Aural Comparison. (Doctoral dissertation, Indiana University, 1967).

repeatedly readjusted the vocalis muscle involvement, never allowing it to become excessive.93 This was the first mention of how to manage register equalization – although Vennard does not make this statement. Large (1973), however, looks at register equalization from high-fidelity tape recordings of participants (N = 10 female college voice students) singing the vowel /a/. The singing tasks were two-fold: in the first task, the participants sang /a/ in chest voice on A3 (220 Hz) up to E4 (330 Hz), maintaining the same intensity level and vibrato rate for four seconds; in the second task, they sang /a/ on A4 (440 Hz) down to E4 (330 Hz) in mid register, maintaining the same intensity level and vibrato rate for four seconds. If a participant was successful in the above two tasks, the participant was asked to repeat the tone pairs in a variety of timbres for both tones and then with a timbre change from the first tone to the second tone. Finally, the participants sang the tone pairs with subtle timbre differences between the two tones and then with obvious timbre differences between the two tones. In each task, the participant was reminded to maintain vowel consistency throughout the experiments.
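The pitch names and frequencies quoted for these tasks (A3 at 220 Hz, E4 at 330 Hz, A4 at 440 Hz) can be checked with a small sketch. It assumes twelve-tone equal temperament with A4 tuned to 440 Hz, which the text does not state explicitly. The note-name parser and the function name `note_frequency` are illustrative conveniences, not part of Large's method.

```python
# Semitone position of each pitch class above C within one octave.
NOTE_OFFSETS = {
    "C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
    "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11,
}


def note_frequency(name: str, a4_hz: float = 440.0) -> float:
    """Equal-tempered frequency in Hz of a note such as 'A3' or 'E4'.

    Assumes single-digit octave numbers and A4 = a4_hz.
    """
    pitch_class, octave = name[:-1], int(name[-1])
    # Signed distance in semitones from A4 (A is 9 semitones above C).
    semitones_from_a4 = NOTE_OFFSETS[pitch_class] + 12 * (octave - 4) - 9
    return a4_hz * 2.0 ** (semitones_from_a4 / 12.0)


if __name__ == "__main__":
    for note in ("A3", "E4", "A4"):
        print(f"{note}: {note_frequency(note):.2f} Hz")
```

Under these assumptions E4 comes out at about 329.63 Hz, so the 330 Hz given in the text is a rounded value.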

Figure 3: Singing Tasks

For each task, the participant needed to remain equidistant from the two microphones and the sound level meter so that the angle was 90° to the breath stream. A string was attached to the chin of the participant and run to the sound level meter microphone, which was in the center of the two tape recorder microphones, to ensure consistent distance.

93 Vennard, W., Hirano, M., & Ohala, J. (1970). Chest, head and falsetto. The NATS Bulletin, December, 30-37.

From the tone pair collection, thirty-four tone pairs, representing each of the tasks described above, were chosen for a perception test. Some of the pairs were spliced and rearranged to include created clips which were half chest voice with half mid-voice and half mid-voice with half chest voice. Then the thirty-four samples were randomized and presented in the perception test with five seconds between the two tones of each sample (tone 1 + 5 sec + tone 2), 3.5 seconds for auditor response, and then the number of the next sample verbalized prior to the sounding of the next sample. In the first playing of the test, the auditors (N = 12 experts in vocal performance and pedagogy) were asked to identify the register of each tone: C for chest and M for mid register. The second playing followed immediately after the first. In it, the auditors were asked to rank the timbre difference between the two tones of each pair on a scale from I to V, where I represented no difference and V represented a great difference. For both playings of the perception test, the auditors were asked to ignore any differences in vowel, nasality, pitch, loudness or vibrato. The auditors were able to replay any tone as often as they wished. To check for consistency, the tone pairs were presented in reverse order when they occurred in the second playing of the perception test. Large employed a phonetics specialist to listen to the perception test and identify the IPA symbol of each tone presented. The concern was that variances in vowel presentation could affect the perception of register by the auditors. Darker vowels could mean chest register to some auditors whereas lighter vowels would signal head register. Additionally, different vowels have different spectral characteristics and those very differences could be incorrectly attributed to register differences or equalization. The acoustic evaluation of the tone pairs was accomplished through sonagraphs of the recordings.
To measure the partials, a 2.4 second segment of each tone was low-band filtered. The amplitude was superimposed on the sonagram for a visual record of the overall level. For vibrato measurements, twenty-four millisecond samples at five locations of the 2.4 second segment were selected and analyzed for peak one, negative slope, valley, positive slope, and peak two. The results were discussed and supported with specific tables representing the perceptual, linguistic, reliability, and acoustic analyses. The reliability of the perception participants was quite high at 92%. Of interest is that in this study, the perception participant with the greatest amount of error had the most years of teaching experience (41 years) and the participants with the greatest amount of agreement had but one to four years of teaching experience. The presentation of the tone pairs did not affect the reliability of the perception test scores. Register judgments did not appear vowel dependent in this perception test. There was a direct correlation between the changes in the spectral envelope and the auditor perceived register changes. The chest voice registration for a fundamental frequency (F0) of 330

Hz had greater energy in the harmonics above the third partial, particularly four and five, in contrast to the mid register’s energy that was found in the fundamental frequency (F0) and the third partial. Yet, when the auditors identified the paired tones as sounding from the same register [equalized], all partial descriptors used above were minimized in the spectral envelopes. In conclusion, the chest and mid registers appeared to be the result of different vocal fold waveforms.94 Hunt's dissertation (1970, University of North Texas) stated the purpose of his study was to evaluate the use of spectrograms in the analysis of choral sounds. Three choirs (one junior high, one high school, one college, all 40-50 members) were recorded singing three unison vowels (/i/, /ε/, /a/) on the pitches of a C . The initial pitch was given by a piano. Recordings were made of men alone, women alone, and men and women together. Auditors (N = 7 choral directors from the faculty of a major university) evaluated vocal blend of randomly ordered recordings as good, acceptable or poor. Auditors were also asked to identify the vowel being sung. The complete perception test was played twice to check for auditor reliability. The auditors were consistent in their responses 66% of the time. Spectrograms of recordings were compared to the auditors’ ratings. Good ratings correlated with spectrograms in which the spectral distributions of concentrated sound showed perfect alignment of the frequency bands in the natural harmonic series. Female voices in all choirs received the highest ratings, and particularly the high school females. Hunt concluded good choral blend is achieved when all of the acoustical factors in the choral sound are aligned with the natural harmonic series: those naturally occurring overtones produced from the fundamental frequencies of each tone sung. Therefore, unity of vowel sound is essential to good vocal blend.
No author evaluation was provided regarding the suitability or success of spectrograms for choral sound analysis.95 Haack (1975, University of Kansas) utilizes much of the format explained above to present his study on the effect of loudness on participants’ perceptions of pitch, loudness, rhythm, timbre, and tonal memory. Although equipment information and usage are not detailed, the creation of the perception tests and the statistical analysis of the responses were quite clear. The thirty perception test questions were taken from the Seashore Measures of Musical Talent.96

94 Large, J. (1973). Acoustic study of register equalization in singing. Folia Phoniatrica, 25, 39-61.

95 Hunt, W. (1970). Spectrographic Analysis of the Acoustical Properties of Selected Vowels in Choral Sound. (Education Degree Dissertation, University of North Texas, 1970).

96 Seashore, C. (1938). Psychology of Music. New York: McGraw-Hill Book Co., Inc., p. 86.

The thirty questions were ranked according to difficulty level and split into three equal groups, with ten questions presented soft (45-50 dB), ten questions presented moderate (75-80 dB) and the remaining ten presented loud (105-110 dB). The loudness levels of the recordings were confirmed with the use of a sound level meter that had been placed in the perception testing area, although the participants were not present. The thirty samples were randomized for the presentation of the perception test. The listening participants (N = 101, n = 46 college undergraduate music majors and n = 55 college undergraduate non-music majors) took the perception test in four to eight member groups in an acoustically treated room. The numbers of participants were managed to ensure the accuracy of the loudness levels and to control the effect of their bodies on the overall acoustics of the room. Each testing period was designed at a duration of thirty minutes in an effort to reduce the effect of listening fatigue. The participants’ results were analyzed for error rate and then compared within the participant subsets. Music majors had a higher percentage of correct answers in loud presentations than the non-music majors, who had an increase in errors – particularly in pitch discrimination. Overall, the author concluded that music presented in the loud condition allowed for an increase in harmonic partials, thereby making sound discrimination more difficult. Participants’ timbral discrimination was most accurate when the samples were presented at the moderate level. In conclusion, Haack suggests music factor discrimination tests [perception tests] must have rigorous controls enforced on maintaining a constant level of loudness in the presentation of the samples. Additionally, only clear examples of the test questions should be used.
If indeed a teacher wishes a student to learn to distinguish between different timbres, the initial presentation of the sample material should be at a medium to soft loudness level. If the goal is for the student to be able to discern dynamic levels, then the presentation should be at louder levels to facilitate discrimination.97 Cleveland (1977, voice scientist) explored the use of timbre as a definitive characteristic from which to determine a male singer’s voice classification. Participants (N = 8 professional male singers) were recorded singing /i/, /e/, /α/, /o/, /u/ on the pitches C3, F3, A3, E4. Listening participants (N = not provided, all vocal pedagogues) assigned a voice classification (bass, baritone, tenor) to each vowel vocalization. For reliability, each vocalization occurred three times throughout the perception test. The perception test was divided into two sessions of 25 minutes with a break of 30 minutes between the two sessions. No discussion regarding the perception test took place during the break.

97 Haack, P. A. (1975). The influence of loudness on the discrimination of musical sound factors. Journal of Research in Music Education, (1), 67-77.

Spectral analysis of each vocalization allowed for visual identification of the four lowest formant frequencies. These four formant frequencies were averaged to obtain a mean formant frequency (MFF) for each singing participant. The MFF values were then compared to the voice classifications that the listening participants assigned to each singer. The comparison between the voice classifications and the MFF values showed a statistically significant correlation (0.001 level of confidence). Therefore, the conclusion was that the mean of the first four formant frequencies was a determining factor for voice classification. To rule out pitch as a confounding element, the fundamental frequencies of each pitch were correlated with the MFF for each singer participant and the voice classifications of the listener participants. This yielded a correlation coefficient of 0.81; therefore, pitch was also an influential acoustical property in voice classification. The next consideration was the voice source spectrum, in which Cleveland investigated the presence of the expected -12 dB per octave slope. At the low extremes of the tenor range and the high extremes of the bass range, the slope was diminished. One possible reason for this diminution is singer vocal inefficiency, as evidenced by greater vocal effort in the voice source spectrum.
However, the voice source information supports Sundberg’s (1973) findings: “source spectrum differs only slightly between dark and light voices…and the development of voice timbre in voice training would be a matter of learning a special articulation rather than having the vocal chords [sic] to vibrate in a very special way.”98 Three more experiments were part of this study: the same procedure as experiment one recreated with synthesized sounds of the information garnered from experiment one; experiment two was a comparison of the mean of the center range pitch for each voice classification with the results of experiment showing a correlation coefficient of 0.97. As Cleveland explored the range extent of the singing participants, observations were made that have implications for voice pedagogues with regard to non-professional voices of any age. This study linked high formant frequencies with high voice classifications and low formant frequencies with low voice classifications. However, sometimes a voice's timbre will seem to be a mismatch with its voice classification. In these circumstances, Cleveland suggests the voice timbre should initially determine the voice classification until the range can be developed; the center range frequency can then be identified and used to aid voice classification. Consider

98 Sundberg, J. (1973). The source spectrum in professional singing. Folia Phoniatrica, 25, 87.

the two following tables in which the first table is the average formant frequencies of timbre types and the second table is the average formant frequencies of voice classes.

Table 3: Comparison of the Average Formant Frequencies of Timbre Types to the Average Formant Frequencies of Voice Classifications

All formant values (F1-F4) are in Hz. Δ (%) is the percentage difference between the voice above and the voice below.

             Timbre types                    Voice classifications
/i/          F1     F2     F3     F4         F1     F2     F3     F4
Tenor        312    1996   2602   3116       304    1969   2567   3105
Δ (%)        9      7      4      5          9      13     3      7
Baritone     287    1864   2500   2973       278    1744   2482   2897
Δ (%)        3      20     8      6          -7     10     12     5
Bass         278    1557   2312   2800       300    1587   2214   2752

/e/          F1     F2     F3     F4         F1     F2     F3     F4
Tenor        352    1942   2424   3041       350    1942   2414   3061
Δ (%)        1      11     4      3          0      17     7      7
Baritone     348    1744   2339   2950       350    1662   2247   2873
Δ (%)        -1     14     13     5          -2     8      10     4
Bass         350    1533   2075   2822       356    1539   2041   2754

/a/          F1     F2     F3     F4         F1     F2     F3     F4
Tenor        623    1005   2620   2919       609    994    2576   2909
Δ (%)        19     7      6      3          15     5      7      2
Baritone     522    942    2478   2823       530    944    2400   2849
Δ (%)        -1.5   1      8      3          5      5      1      13
Bass         530    934    2298   2741       503    900    2386   2527

/o/          F1     F2     F3     F4         F1     F2     F3     F4
Tenor        389    698    2757   3010       401    724    2706   2989
Δ (%)        -6     -7     7      3          3      2      6      3
Baritone     414    750    2569   2938       391    711    2554   2906
Δ (%)        13     6      0      1          7      -2     -2     -2
Bass         366    708    2557   2899       365    729    2605   2969

/u/          F1     F2     F3     F4         F1     F2     F3     F4
Tenor        339    683    2538   2944       330    682    2548   2957
Δ (%)        4      -2     1      4          -1     -5     5      9
Baritone     326    700    2522   2843       333    719    2420   2716
Δ (%)        -3     -6     5      5          -4     -4     -5     -2
Bass         336    742    2404   2700       348    749    2536   2784
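The arithmetic behind Table 3 can be illustrated with a short Python sketch. It assumes that each Δ (%) entry is the percentage by which the higher voice's formant exceeds that of the voice below it; this formula is inferred from the tabulated values and is not stated in the source. It also shows the averaging of F1 through F4 that underlies a mean formant frequency (MFF), here applied to a single vowel only. The function names are illustrative.

```python
def percent_difference(higher_voice_hz, lower_voice_hz):
    """Percentage by which the higher voice's formant exceeds the lower voice's.

    Assumed formula, inferred from Table 3 rather than stated in the source.
    """
    return 100.0 * (higher_voice_hz - lower_voice_hz) / lower_voice_hz


def mean_formant_frequency(formants_hz):
    """Arithmetic mean of the supplied formant frequencies (here F1-F4)."""
    return sum(formants_hz) / len(formants_hz)


# Timbre-type averages for /i/ from Table 3 (Hz): F1, F2, F3, F4
tenor_i = [312, 1996, 2602, 3116]
baritone_i = [287, 1864, 2500, 2973]
bass_i = [278, 1557, 2312, 2800]

tenor_vs_baritone = [round(percent_difference(t, b)) for t, b in zip(tenor_i, baritone_i)]
baritone_vs_bass = [round(percent_difference(b, s)) for b, s in zip(baritone_i, bass_i)]

print(tenor_vs_baritone)  # compare with the first /i/ delta row
print(baritone_vs_bass)   # compare with the second /i/ delta row
print(mean_formant_frequency(tenor_i), mean_formant_frequency(bass_i))
```

Under this assumption the rounded results reproduce the two Δ (%) rows listed for /i/ in the timbre-type columns, and the tenor average falls well above the bass average, in line with the link the study draws between higher formant frequencies and higher voice classifications.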

The final experiment compared the singing participants' laryngeal morphology with the voice classification results, which suggested that the overall vocal tract length is 19 centimeters for basses and 15.5 centimeters for tenors. These results were approximated by utilizing Fant's formant measurements of the vowel /i/, in which the second formant (F2) is a half-wavelength resonance of the back cavity and the third formant (F3) is dependent on the front resonance cavity.99 The percentage of differences between these two formant measurements allowed for hypothetical approximations of voice classification by vocal tract length. The four experiments were linked together by perception test results, and were successful in supporting the hypothesis that voice timbre types are significantly dependent on formant frequencies and pitch. Specifically, the average of the first four formants of a singer is appropriate for voice classification. Also, the differentiating pharyngeal morphology between tenors and basses seems to be a miniature version of the already established difference between male and female pharyngeal length.100 Imagine having the ability to immediately see the actual anatomy of an individual as they are auditioning for a choir, as well as biofeedback equipment to aid the auditionee in producing more or less of the singer’s formant per the conductor’s directions. Magill (Wright State University) & Jacobsen (1978, United States Air Force) concluded from their study regarding the singer’s formant that indeed this would be a valuable tool, but suggested that this option was still out of grasp. Participants (N = 35, n = 22 college voice students representing all voice classifications equally and with less than 3 years of voice training and n = 13 professional singers equally representing voice classifications) were recorded singing pitch-specific sustained vowels (/i/, /a/, and /u/) and a major arpeggio in each of the three vowels.
Spectral analysis was accomplished through the newest technology, the fast Fourier analyzer, which showed 222% more harmonic energy present in the 2 kHz to 4 kHz area of the professional participants versus the student participants. The most marked difference between participant groups was found between the professional and student female participants. Harmonic energy was present in the singer's formant region of all participants; the difference was the amount of energy. In other words, the amount of training showed a significant effect on the strength of the singer's formant (Fs) region (2-4 kHz). All of the male participants showed more energy in the singer's formant (Fs) compared to the women, which

99 Fant, (1970).

100 Cleveland, T. (1977). Acoustic properties of voice timbre types and their influence on voice classification. Journal of the Acoustical Society of America, 61 (6), 1622-1629.

Magill & Jacobsen suggest could have been because the lower fundamental frequency (F0) allowed for a greater number of harmonics to fall within the singer's formant (Fs) frequency envelope.101 Colton (voice scientist) and Estill (1979) reported definitive spectral envelopes for all dynamic levels produced by participants (N is unknown), as well as frequency bandwidths and resonant peak locations with regard to specific modes of vocal production. The participants sang pitches throughout their vocal range in four different voice qualities. Perception participants were successful in identifying each vocal mode, which also matched the physiological results. Colton proposed singers should explore the unique features (perhaps through spectral analysis and biofeedback) of their voices to provide a variety of healthy voice qualities.102 Large (1979) continued his investigations into register equalization by recording participants (N = 10 college voice students, male and female) on a high-fidelity tape recorder singing /a/ in an overlapping region of male and female vocal ranges with the same fundamental frequency (pitch) and sound level (dynamic). [This is probably in the pitch range of A4 to D5.] A pneumotachographic system was used to measure airflow rates, and laryngeal movements were analyzed by photography. The results suggest changes in airflow and laryngeal movement create sensations in singers that support the feeling of vocal registers.103 At the same convention, Titze (1979) described vocal registers as either a stable or transitional region of the vocal range located in a four-dimensional space consisting of cricothyroid, thyroarytenoid, inter-arytenoid, and pulmonary stress.104 Participants (N is unknown) were recorded with glottography and electromyography (EMG). Computer simulation was utilized to verify conclusions. The chest register was found to be the most stable voice register with the greatest oscillation [vocal fold vibration] possibility. The falsetto voice register was still stable, but conditions for oscillation were not favorable. The vocal fry region, sometimes referred to as creaky phonation, was defined as a transitional state that occurred between each of the following vocal events: glottal stop, chest register, and falsetto register.
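Magill & Jacobsen's suggestion that a lower fundamental frequency places more harmonics inside the singer's formant region can be checked with simple arithmetic, since the harmonics of a sung tone lie at integer multiples of F0. The Python sketch below counts the harmonics that fall between 2 kHz and 4 kHz; the example fundamentals are illustrative assumptions and are not values taken from the study.

```python
import math


def harmonics_in_band(f0_hz, low_hz=2000.0, high_hz=4000.0):
    """Count the harmonics n * f0_hz that fall within [low_hz, high_hz]."""
    first = math.ceil(low_hz / f0_hz)    # lowest harmonic number inside the band
    last = math.floor(high_hz / f0_hz)   # highest harmonic number inside the band
    return max(0, last - first + 1)


# Illustrative fundamentals (assumed, not taken from the study)
for label, f0 in [("A2", 110.0), ("A3", 220.0), ("A4", 440.0)]:
    print(label, harmonics_in_band(f0))
```

Each octave rise in F0 roughly halves the number of harmonics available in the band, which is consistent with the explanation the authors offer for the difference between male and female participants.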

101 Magill, P. & Jacobsen, L. (1978). A comparison of the singing formant in the voices of professional and student singers. Journal of Research in Music Education, 26 (4), 456-469.

102 Colton, R. & Estill, J. (1979). Elements of quality variation voice modes and singing. Acoustical Society of America Supplement, 1 (66), 55-56.

103 Large, J. (1979). Studies of the Garcían model for vocal registration. Acoustical Society of America Supplement, 1 (66), Fall 1979, 56.

104 Titze, I. (1979). A physiological interpretation of vocal registers. Acoustical Society of America Supplement, 1 (66), Fall 1979, 55-56.

1980 – 1989

Throughout the seventies, research improved understanding of the singer’s formant, voice registers, modes of phonation, the effect of loudness, and the effect of other singers’ sounds on singers. Goodwin (1980, Virginia Intermont College) began the eighties by exploring the changes that occurred in his participants’ (N = 30 college sopranos) solo and choral mode of phonation. In each mode, the participants were recorded singing six sustained vowels (/a/, /o/, /u/, /e/, and /i/) on the pitches C4 (261 Hz), A4 (440 Hz), and F5 (698 Hz). When performing in the choral mode, the participant sang in a blended fashion with a soprano choir played through the participant’s headphones. Spectral analysis of the recordings showed blended tones to have fewer and weaker partials on frequencies above the first formant, whereas the partials in the first formant region were strong. Solo singing mode had higher levels of intensity in the second and third formant region, perhaps suggesting presence of the singer’s formant. The choral mode had lower overall intensity as compared to the solo mode. Goodwin suggested vowel modification occurred in the strength of the formants, not as a change in the formant frequencies. This correlated with a reduction in overall partials, which served as an aid in camouflaging an individual’s voice through reduction of aural cues (usually in the 200-3500 Hz frequency range). Additionally, Goodwin’s participants had a reduction in overall intensity in the choral blend mode. These same participants did not show a reduction in vibrato rate when comparison was made between solo and choral mode. Goodwin hypothesized this may be due to unconscious vibrato rate synchronization as reflected in the spectral envelope.105 In barbershop singing, vibrato rate is greatly reduced and therefore pitch accuracy is paramount to accurate chord singing, a hallmark of barbershop style.
Reduction in vibrato rate allows for purer intonation because of less beat influence. Sundberg (music acoustician) and Hagerman (1980, technical audiologist) found the greatest challenge of successful barbershop singing is fundamental frequency (F0) adjustment from chord to chord. Two professional barbershop quartets (N = 8) were recorded singing homophonic chordal exercises on the consonant-vowel (CV) syllables /mα/ and /mo/ with accelerometers (contact microphones) glued to their necks a few centimeters below the thyroid cartilage. F0 analysis was performed on each of the chords and found to be quite accurate. The inherent intonation issues between the two vowels chosen were not present in these results. The greatest interval accuracies occurred for the simplest intervals (octave, fourth, and fifth) which are also the intervals with the greatest number of

105 Goodwin, A. (1980). An acoustical study of individual voices in choral blend. Journal of Research in Music Education, 28 (2), 119-128.

common partials. The standard deviation of the absolute magnitudes of all the intervals ranged from 4.3 cents to 16.9 cents. The researchers compared the intervallic ratios of just and Pythagorean intonations and found no clear preference from these participants. Instead, the participants seemed to “stretch” intervals for expression and energy momentum between chords. Within all the chords analyzed, the most successfully tuned intervals were those in which the reference tone (usually the F0 of the lead singer) had a high degree of accuracy as well as the intervals that shared the greatest number of partials.106 In choral singing, then, what acoustical factors influence pitch precision of individual singers? Sundberg and Ternström (1982, voice scientist and music acoustician) explored this idea in the next four studies. First, participants (N = unknown males) were recorded singing a major third or a over a synthesized human stimulus tone played over loudspeakers at 75 dB for 10 seconds. The participants were equipped with a contact microphone glued to their throat just below the larynx. The recordings were measured and displayed as histograms with the F0 on the y-axis and duration (time) on the x-axis. The standard deviations of the F0 disagreement from the target pitch were determined to use as a measure of the difficulty of intonation. (If the interval was easy to sing, the accuracy would be high and vice versa). The first experiment had the choir sing a normal eight chord warm-up cadence in their normal rehearsal room until the conductor believed the choir had achieved good intonation (five times). The F0 of each bass (N = 6) was measured and compared to the target pitches. The standard deviation of all 192 pitches was 13 cents (range 3 to 24 cents).
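The cent values quoted in these studies follow from the standard definition of the cent as one twelve-hundredth of an octave on a logarithmic frequency scale. The Python sketch below converts a frequency ratio to cents, compares the just (5:4) and Pythagorean (81:64) major thirds, and computes a standard deviation of F0 deviations from a target pitch. The F0 readings are hypothetical, and the use of a population standard deviation is an assumption, since the source does not say which form the researchers used.

```python
import math
import statistics


def cents(freq_hz, reference_hz):
    """Interval from reference_hz up to freq_hz in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(freq_hz / reference_hz)


def intonation_scatter(sung_hz, target_hz):
    """Population standard deviation of the sung F0 deviations from the target, in cents."""
    deviations = [cents(f, target_hz) for f in sung_hz]
    return statistics.pstdev(deviations)


just_major_third = cents(5, 4)            # ratio 5:4
pythagorean_major_third = cents(81, 64)   # ratio 81:64

# Hypothetical F0 readings (Hz) for a singer aiming at A3 = 220 Hz
readings = [219.0, 220.5, 221.2, 218.7, 220.0]

print(round(just_major_third, 2), round(pythagorean_major_third, 2))
print(round(intonation_scatter(readings, 220.0), 1))
```

The two major thirds differ by roughly 21.5 cents, which is larger than much of the scatter reported for the singers, so a consistent preference for one tuning system should have been detectable in data of this precision.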

Figure 4: Warm-up Cadence

Participants (N = 4 skilled male singers) in the second experiment replicated the initial study but in a sound-treated booth, which allowed for sound reflections in the booth but controlled for outside noise. The premise was to see if different room acoustic characteristics impacted the participants’ intonation. The standard deviation of all participants’ 23 tones ranged from 3 to 30 cents. The greatest deviations

106 Hagerman, B. & Sundberg, J. (1980). Fundamental frequency adjustment in barbershop singing. Journal of Research in Singing, 4 (1), 3-17.

occurred when the reference tones were different in vowel, not in pitch, which suggests vowel properties influence the scatter of F0. The third experiment took place in an anechoic chamber to focus on the effect of the stimulus tones. The same set of intervals was manipulated with three different acoustic characteristics: vibrato with a frequency swing of ± 1.5%; common partials; and all partials higher than the lowest common partial. In this experiment, the pitch range was confined to E3 – A3. Four of the stimulus tones were repeated to check for participant reliability. The standard deviation of all participants’ (N = 18 sex unknown) fundamental frequency (F0) ranged from 19 to 45 cents. Reliability was significantly high for three of the four repeated tokens; two were identical, one was a single digit off, and the third had a difference of five. Intonation accuracy was greatest when the stimulus tones included common partials, high partials, and a lack of vibrato, much like the successes recognized in the barbershop study. However, this effort also emphasized that a choir singer’s intonation precision is dependent on the acoustic properties of the reference sound (room acoustics, standing formation, self to other ratios) reaching his/her ears.107

Singing without audible fluctuations of F0 is referred to as straight tone singing, the goal of barbershop singing and of many early music ensembles. Weber (1992 dissertation) sought to find out what the differences of intensity were between singing with vibrato and straight tone singing. College choir sopranos (N = 20) were recorded singing /a/ for representative low, middle, and high pitches in loud and soft dynamics with both vibrato and straight tone, resulting in 24 trials per soprano (each condition was repeated). Analysis of the recording found no significant difference in sound pressure level (SPL) for any condition except for loud vibrato. The final conclusion of this study was a suggestion that conductors determine the vibrato rate for the choir by the acoustics of the performance space.108 Yet, even if the vibrato rate were varied to best suit the performance hall, a decision must be made regarding how loudly each choir member should sing to facilitate an ensemble sound. Ternström and Sundberg (1983) sought to define the ratio between one’s own vocal production and what one hears from others’ vocal production that would aid intonation between singers in a choir. An initial study was

107 Ternström, S., Sundberg, J. (1982). Acoustical factors related to pitch precision in choir singing. Speech Transmission Lab. Qt. Prog. Status Rep. 2-3, 76-90 (Dept. of Speech Communication and Music Acoustics, Royal Institute of Technology, Stockholm).

108 Weber, S. T. (1992). An investigation of intensity differences between vibrato and straight tone singing (Doctoral dissertation, Arizona State University, 1992). ProQuest Dissertation Abstracts International, AAT 9223155.

performed to establish sound pressure levels (SPLs) within different singer locations in a choir. The researcher wore binaural headphones and sat in different locations while recording a 30 minute rehearsal for two separate choirs. The first choir was an amateur choir practicing in a reverberant hall and the second was a professional choir rehearsing on a concert hall stage. Sound pressure level (SPL) was evaluated using histograms and showed the basses singing mezzo forte had an average sound pressure level (SPL) that was 90-95 dB. The average sound pressure level (SPL) of the rest of the choir to the bass was averaged at 80 dB. The standard deviation of the sound pressure level (SPL) was about 10 dB. A stimulus test was synthesized utilizing /u/ (poor in spectral components at high frequencies) and /a/ (rich in spectral components at high frequencies) at approximately 40 dB for all tokens. Each token had a duration of 9 seconds, a vibrato with an extent of ± 1%, and a fundamental frequency around G4 or 196 Hz. Participants (N = 9 highly experienced choral singers) wore headphones and were recorded in an anechoic room, seated in front of a microphone with a sound pressure level (SPL) meter and a small lamp to signal the playing and duration of stimulus tones. The participants were asked to maintain 90 dB on the sound pressure level (SPL) meter. Participants were directed to sing in unison with the stimulus tone, matching the vowel and pitch while maintaining the sound pressure level (SPL) at 90 dB and keeping a distance of 15 centimeters from the microphone until the lamp went off. All recorded samples were measured for fundamental frequency. Vibrato was removed from the samples with an appropriate filter. The linear average and the standard deviation of the samples were obtained from fundamental frequency (F0) histograms. Results were fairly similar for both vowels.
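The levels reported from the rehearsal recordings, about 90-95 dB for a singer's own voice and about 80 dB for the rest of the choir, can be expressed as a level difference and as the corresponding sound pressure ratio. The Python sketch below uses the standard 20 log10 relationship between sound pressure and decibels; the function names are illustrative, and the calculation is a reading of the reported averages rather than a procedure described in the source.

```python
def self_to_other_db(self_spl_db, others_spl_db):
    """Level difference in dB between one's own voice and the rest of the choir."""
    return self_spl_db - others_spl_db


def pressure_ratio(level_difference_db):
    """Sound pressure ratio corresponding to a level difference in dB."""
    return 10.0 ** (level_difference_db / 20.0)


# Reported averages: own voice 90-95 dB, rest of the choir about 80 dB
for own_spl in (90.0, 95.0):
    difference = self_to_other_db(own_spl, 80.0)
    print(own_spl, difference, round(pressure_ratio(difference), 2))
```

A difference of 10 to 15 dB means the singer's own voice arrives at the ear with roughly three to six times the sound pressure of the surrounding choir, which helps explain why the amplitude of the reference sound matters for intonation.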
Some of the participants’ responses were more dependent than others on the amplitude of the reference stimulus. Of note was that the higher pitched vowel /u/ required a higher amplitude of reference for a more accurate fundamental frequency (F0), which the authors suggested was because the volume required to produce the /u/ at higher pitches was louder than the volume needed to produce the vowel /a/. Should this occur within a song, the choir could feasibly lose intonation. The authors suggested conductors could experiment with louder reference tones, different spacing between singers (closer or further apart), or changes to the acoustic properties of the room.109 Killian (1985, music education, Texas Tech University) investigated whether listeners had a preference for a particular voice category (soprano, alto, tenor, or bass) in four voice chorales. Not provided was

109 Ternström, S., Sundberg, J. (1983). How loudly should you hear your colleagues and yourself? A study of SPL within choirs. Speech Transmission Lab. Qt. Prog. Status Rep. 4, 16-26 (Dept. Of Speech Communication and Music Acoustics, Royal Institute of Technology, Stockholm).

information regarding singer spacing, reference tones or the room in which the recordings were made. Phrases from the sixth, fourteenth, and twentieth chorales of Loewe’s oratorio Das Sühnopfer des neuen Bundes were sung (singers unknown) and recorded with each vocal part on a separate track. The phrases had all rhythmic variation removed so that no individual vocal part could be detected. All phrases were sung on an undisclosed neutral vowel to maintain timbre uniformity. Five conditions were created from the individual tracks: first all voices in perfect balance and then each individual vocal part approximately 11 dB louder, which created four unbalanced choral samples. A listening test was created with a random mix of 10 of the five conditions from the possible 60 samples. Participants (N = 85, n = 52 high school choral students and n = 35 practiced choral conductors) listened to the first playing of a sample and then on the second playing altered the voice part that was perceived as out of balance to the point of “perfect balance”. This task was repeated ten times. Participants were successful in identifying unbalanced choral samples. Yet, when choosing the amount of the unbalanced part to be utilized for a perfect balance, the unbalanced part still remained louder (unbalanced). Preference was shown for less bass with respect to the other voice parts (soprano, alto, tenor) in all conditions. No significant differences were found for any of the demographic subsets (i.e. conductors versus students) except that men preferred louder levels than women. Killian summarized that fewer basses are needed for a balanced choir.110 Of course, the amount of bass would be dependent on the vocal output of the individual basses. Marshall (acoustician) & Meyer (1985) hypothesized the room reverberation would have an effect on singer output. A male quartet was recorded in a hemi-anechoic room (floor reflections only).
Room reverberation of the recorded sound was taken out in varying increments and played so that the singers could sing with the recording. The singers expressed preference for 15-35 milliseconds of reverberance (which correlates to reflecting wall distances of 2.6-6.0 meters) and strong dislike for 40 milliseconds (correlates to reflecting wall distances of 6.5 meters).111 Room reflections are but one effect on a singer’s ability to hear themselves and other choir members. Another effect is known as masking, the phenomenon that one cannot hear a normally audible sound because of the presence of a competing sound. Maxwell (1986, music educator) hypothesized

110 Killian, J. N. (1985). Operant preference for vocal balance in four-voice chorales. Journal of Research in Music Education, 33 (1), 55-67.

111 Marshall, A., & Meyer, J. (1985). The directivity and auditory impressions of singers. Acustica, 58, 130-140.

singer vocalization would be different with and without masking noise, showing an effect on both voice quality and voice production. Because masking can occur without warning or preparation, Maxwell wondered if a protocol for voice lessons should be designed to aid singers in creating an auditory tonal imagery of pharyngeal/laryngeal control such that overall vocal quality and production would be enhanced in the eventuality of the occurrence of masking. In the first of three studies, participants (N = 24 college voice majors) were recorded singing vocalises and song excerpts with and without masking noise. Participants (N = 15) in the second study sang “The Star Spangled Banner” in a key of their choosing. Masking noise was added at an unknown, random point. The creation of the masking noise along with its properties is not provided. The third study was a ten-week longitudinal study with four treatment conditions: normal lessons and normal practice (CG – control group); white noise lessons and normal practice; normal lessons and white noise practice; and white noise lessons and white noise practice. In each study, recordings were made of each participant prior to and after the study. From these recordings, a listening tape was made for judge participants (N = 9, n = 3 voice teachers, n = 3 professional non-voice musicians, and n = 3 lay musicians). For each listening pair (pre and post recording of participant) the judges ranked voice quality and intonation as better, same, or worse (studies one and two). For the third study, the ranking had five options: great progress, considerable progress, some progress, same, and worse. The judges’ perceptions of the first study participants’ samples found white noise adversely affected participant intonation and voice quality. It is not surprising, then, that the judges were able to detect the point when masking noise had been introduced in the second study.
The sample group of the third study which received the highest mean score ranking was those who had masking noise during their lessons. However, comparison of variance within groups found much greater variance within all other groups outside the control group. Teacher guidance with white noise indeed produced greater results. Participants with no teacher guidance of white noise regressed. In all studies, participants tended to flat ascending passages, sharp descending passages, sharp sustained notes, and modify /α/ to /ɔ/ or /a/.112 How does modifying vowels affect their spectra? Bloothooft (1984, physicist and phonetician), in three landmark studies, considered the effect of fundamental frequency, mode of singing, and voice classification on the vowel spectra. Participants (N = 14 professional singers, n = 7 men and n = 7 women) were recorded on a microphone (0.3 centimeters from the singers’ mouths) singing nine Dutch

112 Maxwell, D. (1986). The effect of white noise masking on singers. Journal of Research in Singing, 8 (2), 9-19.

vowels on five voice classification specific pitches (including 392 Hz for a falsetto produced tone in the males) in nine different modes of production (neutral, light, dark, free, pressed, soft (pianissimo), loud (fortissimo), straight (without vibrato), and extra vibrato) for a duration of 1-2 seconds. Measurements were taken from recorded samples at the point where the pitch had stabilized, including at least one vibrato cycle, and every 300 milliseconds to obtain ten 10-millisecond tokens. The fundamental frequency (F0) of all 10 tokens was averaged for a mean fundamental frequency (MF0). In the first experiment, Bloothooft looked specifically for spectral variance. When the target pitch was above 98 Hz, the spectral variance greatly dropped, which the authors attributed to vowel variance. Above 660 Hz, the variance became voice classification specific, influenced by the mode of singing. For instance, tenors and sopranos had the greatest variance, particularly with the /u/ vowel in a pressed mode of phonation. Overall, mode of singing had very little effect on the spectral variance in all voice classifications. Spectral variation was affected the most by vowel production.113 The same participant data was used for the second evaluation, which examined the effect of the fundamental frequency (F0) on vowel spectra with respect to the factor vowels. Factor vowels are the single most important source of spectra variance for low fundamental frequency (F0). Male and female variants were consistent with one another. The relationship between the average sound level of the singer’s formant (Fs) and the fundamental frequency (F0) was found to be vowel dependent. Higher fundamental frequency (F0 = 392 Hz) resulted in a lower singer’s formant (Fs) in women. Modal register had less variability in the first formant (F1) than the falsetto register, and it was hypothesized that this is because in higher singing the first formant (F1) is very close to the fundamental frequency (F0).
Bloothooft references Sundberg’s (1981) “strong acoustic coupling between glottis and vocal tract” as a possible cause.114, 115 Characteristics of the participants and the mode of phonation were the focus of the third study. Bloothooft and Plomp (1985, psychophysicist) found the main difference reflected in the male participants’ spectral characteristics was the vocal tract dimensions whereas in female participants, the

113 Bloothooft, G., Plomp, R. (1984). Spectral analysis of sung vowels: I. variation due to differences between vowels, singers, and modes of singing. Journal of the Acoustical Society of America, 75 (4), 1259-1264.

114 Sundberg, J. (1981). Formants and fundamental frequency control in singing: an experimental study of coupling between vocal tract and voice source. Acustica, 49, 47-54.

115 Bloothooft, G., Plomp, R. (1985). Spectral analysis of sung vowels II. The effect of fundamental frequency on vowel spectra. Journal of the Acoustical Society of America, 77 (4), 1580-1588.

primary difference was glottal opening. Vocal effort seemed to be a noticeable factor in the spectral characteristics of the different modes of phonation. Increased pharyngeal volume, influenced by the height of the larynx, seemed to also be a factor in the pressed-dark mode of phonation. The data collected and the results presented concur with Cleveland’s116 study of male timbre.117 Rossing et al. (1986, physicist) returned to the comparison of the solo mode of singing to the choral mode of singing. Participants (N = 6, n = 5 sopranos, n = 1 mezzo soprano) were recorded singing both solo and choir passages from Mendelssohn’s Hear My Prayer. A binaural microphone was head mounted with a stethoscope holder 0.05 meters from the mouth, as was a sound pressure level meter 0.05 meters from the mouth. When the soprano sang in solo mode, the soprano was positioned in front of the choir, and when the soprano sang in choir mode the soprano was seated in the soprano section. The participant being recorded wore earphones which provided the sounds of the piano accompaniment, the other singers, and a small amount of the individual. Four identical passages from the solo and the choir sections of music were recorded and analyzed using long term average spectra (LTAS). In choir mode, the sopranos showed small changes in the sound level of the first formant and increases in the higher formants, which is consistent with Gauffin & Sundberg's (1980)118 findings for the voice source. The average slope of the sound pressure level (SPL) up to 90 dB in choir mode was about 1.5, as was the case with Bloothooft (1985).119 In choir mode, both the onset and offset had low sound levels and a reduction in vibrato consistent with authors’ findings for bass/baritones (1984).120 In solo singing, the vibrato extent was increased, which is consistent with Goodwin (1980)121 who found that sopranos reduced their vibrato to achieve greater choral blend.
LTAS results also suggest that higher frequencies display less variation between solo and choir modes. The authors’ conclusions include that the increasing of the amplitude of the partials around 3 kHz can be achieved by any singer who raises their loudness. However, when this occurs with trained singers, the increase is caused by a clustering of the higher formants – what is referred to as the singer's formant (Fs). In female voices, this translates as a smaller than normal frequency

116 Cleveland, (1977), 1627-1628.

117 Bloothooft, (1986), 852-864.

118 Gauffin, J. & Sundberg, J. (1980). Data on the glottal voice source behavior in vowel production. STL-QPSR 61-70.

119 Bloothooft, (1985), 1580-1588.

120 Bloothooft, (1984), 1259-1264.

121 Goodwin, (1980), 119-128.

separation of the third and fourth formants. The ability to maintain this clustering of frequencies is, in sum, one of the characteristics of a trained singer.122
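The long term average spectrum (LTAS) used in the study above can be illustrated with a minimal sketch. This is not the analysis procedure of Rossing et al.; it assumes a Hann window, a naive discrete Fourier transform, and illustrative frame settings (`frame_len`, `hop`), and the function names are hypothetical. It only shows the core idea: power spectra of successive frames are averaged over time and expressed in dB.

```python
import cmath
import math


def frame_power_spectrum(frame):
    """Power spectrum of one Hann-windowed frame (naive DFT, bins 0..n/2)."""
    n = len(frame)
    windowed = [
        x * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            windowed[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)
        )
        spectrum.append(abs(acc) ** 2)
    return spectrum


def ltas_db(signal, frame_len=64, hop=32):
    """Average the frame power spectra over time and return levels in dB.

    frame_len and hop are illustrative assumptions, not values from the study.
    """
    starts = range(0, len(signal) - frame_len + 1, hop)
    frames = [signal[s:s + frame_len] for s in starts]
    if not frames:
        raise ValueError("signal is shorter than one frame")
    n_bins = frame_len // 2 + 1
    average = [0.0] * n_bins
    for frame in frames:
        power = frame_power_spectrum(frame)
        for k in range(n_bins):
            average[k] += power[k] / len(frames)
    # A small constant avoids log10(0) in empty bins.
    return [10.0 * math.log10(p + 1e-12) for p in average]
```

With a sampling rate fs, bin k corresponds to the frequency k * fs / frame_len, so a cluster of raised levels near 3 kHz would appear as a local maximum in the returned list around that bin.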

In an effort to define the most valued spectral characteristics of singers in the choral and solo modes of singing, Letowski et al. (1988, audiologist) designed a research study to record and evaluate vocal production. In the initial design, participants (N = 5 chamber choir sopranos) were recorded singing the soprano part of A Lullaby (by Maklakiewicz) in four conditions: recording of all sopranos; recording of each soprano individually; recording of all sopranos minus one; repeat recording of all sopranos; and repeat of recording sopranos minus one. Author stated the sound pressure level (SPL) was maintained at 75-80 dB for all recordings, but no indication was given of the measurement procedure. The two groups of recordings of all sopranos minus one were blended and compared to the two groups of recordings of all sopranos. The singer participants were unable to distinguish between the finished products. A formal listening test was designed using a 15 question randomized design of all combined recordings. Each question presented three recordings and asked the participant which of the triad sounded most different from the other two. The perception participants’ (N = 10, n = 5 conductors and n = 5 sound engineers) results indicated that the recording procedure was perceptually equivalent. The main study recording was with the entire Warsaw Madrigal Singers (N = 17, n = 5 sopranos, n = 4 altos, n = 5 tenors, and n = 3 basses) singing Befiehl du deine Wege (a four-bar excerpt from Bach’s St. Matthew Passion) in exactly the same order as described above. Only twelve individual singers (three from each section) of the original 17 were used for the solo conditions. Spectral envelope comparisons were made of the recordings and indicated that two of the basses indeed did use increased energy in the singer’s formant region when in solo mode and not in choral mode. Other more amateur participants showed decreased energy in the spectral envelope when in choral mode. 
The listening test created from these recordings was again triadic but the task was to rank order the three samples on a scale of least pleasant to most pleasant. Participants' (N = 10 – no information is given regarding these participants) test results were consistent with the spectral analysis, even as to the specific singer. Overall results of this study showed a definite effect for vocal training as evidenced in the presence of singer’s formant in soloistic modes of singing. Untrained voices in the choral mode were described as brighter, more stable, and more colorful whereas trained voices seemed to reduce vocal energy with a darker timbre. In the solo mode, untrained singers struggled with maintaining tempo and tone duration

122 Rossing, T., Sundberg, J., Ternström, S. (1987). Acoustic comparison of soprano solo and choir singing. Journal of the Acoustical Society of America, 82 (3), 830-836.

whereas the trained singers were very consistent in all recording conditions. This study suggests that untrained singers exhibited soloistic tendencies (richer, more energetic vocal quality) when in choral mode and that trained singers reduced their singer's formant (Fs) when singing in choral mode.123

But when singing in choral mode – whether trained or untrained – what is the intonation level of the individual singers? This is the question Ternström and Sundberg (1988) explored through the next four studies. The first study involved a researcher sitting in two different intra-choir locations during two different choir rehearsals in two different rooms (a rehearsal room, a performance stage) wearing binaural head microphones recording 30 minutes of each rehearsal. Recordings were measured for SPL (sound pressure level) and found to be between 50 and 100 dB, averaging around 80 dB. To estimate the acoustic (airborne) feedback, individual singers’ mouths were positioned 0.15 meters (the average distance between the mouth and ears) from a SLM (sound level meter), which confirmed the average sound pressure level (SPL) to be about 80 dB for mf singing. The second study involved professional bass choir singers (N = 9) who were fitted with headphones and seated with the sound level meter (SLM) 0.15 centimeters from their mouth and able to see the meter's display screen. The vowels /u/ and /α/ were simulated by a synthesizer and played through the singer’s headphones at nine different sound pressure levels (SPLs: 0, -5, -10, -20, -25, -30, -35, and -38). The participant was instructed to match the vowel heard and maintain it at a sound pressure level (SPL) of 90 dB. The fundamental frequency (F0), the mean fundamental frequency (MF0), and the standard deviation were calculated for each of the recordings. The mean fundamental frequency (MF0) results for the participants ranged from 26 to 112 cents. 
One applicable conclusion was that a reference level can be varied over a range of 25 dB relative to one's own sound pressure level (SPL) without causing pitch errors. However, singers with strong pitch-amplitude dependence might falter earlier. This study bears out that darker, more closed vowels (/u/) are more difficult to sing with accurate pitch than vowels which have more jaw opening and more forward focus (/α/). The third study involved synthesized tones which possessed all, some, or none of the following acoustic characteristics: vibrato, common partials, and partials higher than the lowest common partial. The tones ranged in pitch from E3 to A3. Participants (N = 18 bass/baritones) were recorded singing thirds or fifths (as directed by an instruction sheet) above the synthesized stimulus tones played over a loudspeaker in an

123 Letowski, T., Zimak, L., & Ciolkosz-Lupinowa, H. (1988). Timbre differences of an individual voice in solo and in choral singing. Archives of Acoustics, 13 (1-2), 55-65.

anechoic room. Participants wore microphones fastened to the throat just below the larynx. All conditions of the stimuli resulted in similar dispersion. Results ranged from 15 to 40 cents – much larger than the average unison choral scatter of 13 cents. The authors concluded the reduction in intonation accuracy was due to the total absence of common partials in the synthesized reference tones. The fourth study attempted to recreate the singer’s actual experience within a choir; reference tones were provided by a choral ensemble wherein the conductor first gave pitches, then had the ensemble sing the pitch for 9 seconds. Sixteen reference tones were derived from the choir’s efforts and then were randomly manipulated to add or subtract common partials. The final stimulus tape had the two forms of manipulated tones, the choir tones, and vowel reinforced tones. Individual male participants (N = 17) were then recorded by larynx mounted microphones and air microphones singing prescribed thirds and fifths over a reference tone sounded through a loudspeaker, for a duration of 8 seconds, maintaining a sound pressure level (SPL) of 90 dB. Greatest intonation accuracy was achieved when the lowest common partial had the greatest amplitude. There was no measurable effect for the vowel /a/ even when reinforced, but the vowel /u/ changed as much as 2.6 cents/dB. The sound pressure level (SPL) can vary over a 25 dB range without affecting the overall fundamental frequency (F0) agreement within the choir. In conclusion, again, individuals within a choir – just as a soloist – have the greatest approximation of fundamental frequency (F0) when the reference tone is louder than singer feedback, the lowest common partials are audible, high partials are present, and the vibrato rate is small.124

Building on the studies above, Ternström et al. 
wondered if intonation would suffer when singers had changes in articulation, for instance when the diction required movement from one vowel to the next while sustaining a tone. The task was to investigate singer success in maintaining pitch accuracy (fundamental frequency stability – F0) while moving through vowels. Participants (N = 6 semi-professional singers with formal training, n = 2 women, n = 4 men) wore headphones and were recorded through a miniature microphone attached to the bridge of the nose singing vowel changes at a comfortable pitch chosen by the singer. Participants were encouraged to imagine the pitch before sounding the pitch and to make all vowel changes rapidly. The participants repeated the exercise after a one hour break, and in the second session the vowel pairs were reversed. In each run the vowel pairs were sung twice. The first time the vowel pair was sung with masking noise played through the headphones; the second time was without masking noise.
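The intonation results in this group of studies are reported in cents, where 100 cents equal one equal-tempered semitone and 1200 cents equal one octave. As a minimal sketch (the function names are hypothetical and the procedure is not taken from the cited authors), the deviation of a sung fundamental frequency from a reference, and the scatter of several such values, could be computed as follows:

```python
import math
from statistics import mean, pstdev


def cents(f_hz, ref_hz):
    """Interval from ref_hz up to f_hz in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f_hz / ref_hz)


def f0_scatter(f0_values_hz, ref_hz):
    """Mean deviation and standard deviation, in cents, of a set of F0 values.

    The population standard deviation is an assumption made for illustration.
    """
    deviations = [cents(f, ref_hz) for f in f0_values_hz]
    return mean(deviations), pstdev(deviations)
```

For example, `cents(445.0, 440.0)` is a little under 20 cents, so a singer sounding 445 Hz against a 440 Hz reference would fall outside the 13 cent average unison scatter mentioned above.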

124 Ternström, S., Sundberg, J. (1988). Intonation precision of choir singers. Journal of the Acoustical Society of America, 84 (1), 59-70.

In the analysis procedure, the visual analyzer looked for the presence of an obvious fundamental frequency (F0) transient and a bimodal fundamental frequency (F0) histogram as criteria of a vowel transition. The masking noise was evidenced by a large initial transient in the participants’ fundamental frequency (F0) contours. The peaks of the fundamental frequency (F0) contours, as shown through histogram and spectrum, were taller and narrower in the non-masked vowel pairs, which was interpreted as higher fundamental frequency (F0) stability. Immediately, the analyzer performed a t-test on the mean differences between the two runs and found the fundamental frequency (F0) stability to be significantly better on the second run, indicating a positive effect of training. Results in this study agreed with speech results in that the vowel pairs /i/ to /y/ and /o/ to /y/ (high vowels – with tongue movement)125 had a consistent fundamental frequency (F0) effect. Similar results were also seen in /ε/ to /i/ and /e/ to /i/, attributed to jaw opening, as well as /i/ to /y/, which is probably attributable to larynx height. The resultant average fundamental frequency (F0) effect was 10 cents. However, some singers in the masked condition had fundamental frequency (F0) effects in the range of 50 cents, which would be detrimental in a choral ensemble. Of great importance is the conclusion from this study that singers need auditory feedback for accuracy and precision of fundamental frequency (F0). When subjects could not hear their own voices, fundamental frequency (F0) perturbations increased. When articulation between vowels caused fundamental frequency (F0) perturbations and the singers could hear their own voice, singers attempted compensation but not with a high degree of success.126

In addition to singing in tune, choir singers must blend with other singers such that individual voices are not distinguishable. 
One way choir singers achieve choral blend is to match their articulation, specifically on vowels, which are determined by the formant frequencies of the vocal tract. Cleveland127 found a gradual increase in the formant frequencies of male singers going from the lowest singers (basses) to the highest singers (tenors). Rossing et al.128 found solo singers reduced the area of the singer’s

125 Honda, K. (1983). Relationship between pitch control and vowel articulation. In D. M. Bless & J. H. Abbs (Eds.), Vocal Fold Physiology (pp.286-297). San Diego: College-Hill Press.

126 Ternström, S., Sundberg, J., Colldén, A. (1988). Articulatory F0 perturbations and auditory feedback. Journal of Speech and Hearing Research, 31 (June), 187-192.

127 Cleveland, (1977), 1622-1629.

128 Rossing, et al., (1986), 1975-1981.

formant but increased the amplitude of the fundamental when performing in choral mode. Goodwin’s129 participants used less amplitude in choral mode and the higher partials of their spectrum were reduced. No one to date had looked at vowel agreement, specifically the vocal tract's production of the first and second formants (F1, F2) in choral singing. Ternström et al. recorded participants (N = 8 bass choral singers) singing and speaking in an anechoic room. Participants were recorded with an air microphone and a larynx wall contact microphone. The song text had nine locations of the following seven appropriate vowels to measure: /o/, /ǽ/, /a/, /e/, /i/, /ǿ/, and /ε/. The participants sang the excerpt four times and spoke the excerpt four times. The mean fundamental frequency (MF0) and the standard deviation of the fundamental frequency (SF0) analyses were performed on the third rendering of each condition, in addition to the difference between the fourth formant (F4) and the third formant (F3) to give information regarding the singer’s formant. The lower formants (F1 and F2) define the vowel quality, whereas the third formant (F3) and the fourth formant (F4) give significant information regarding singer voice timbre. Finally, LTAS was performed on the entire four measure phrase to compare against the upper formant measures for additional singer’s formant data. The results showed a tendency of the participants to neutralize the vowel when singing, suggesting that choral singers do modify vowels toward a unified sound. This idea was represented in a clustering of scatter in the mean fundamental frequency (MF0) and the standard deviation of the fundamental frequency

(SF0) analysis and further substantiated in the agreement of the first three formants in singing as compared to speaking. However, in this data group, there was but a very slight increase in the fourth formant minus the third formant (F4 – F3) area which suggests that singer’s formant is a solo characteristic – not choral.130 Ternström and Sundberg created the first choir synthesization which handled six voices independently, the fundamental frequency contours of each voice (F), and put each voice on its own channel. To adequately reproduce a choral sound, the same vowels, reverberation, and vibrato were utilized for each voice. A comparison was made against a recording of a unison male choir singing the vowel /o/ with a synthesized “choir” singing the same vowel. The synthesized choir was found to have fundamental frequency contour variations less than 0.8%. The greatest success of the synthesized choral

129 Goodwin, (1980), 119-128.

130 Ternström, S., & Sundberg, J. (1989). Formant frequencies of choir singers. Journal of the Acoustical Society of America, 86 (2), 517-522.

sound was achieved when a chord was built utilizing one unsteady voice superimposed five times with a healthy dose of reverberation.131

1990-1999

If a choir were to have one person who sang with strong energy in the singer’s formant region, and the rest of the singers sang with strong energy in the fundamental frequency, how would the conductor arrange the singers to aid in producing a blended choral sound? Tocheff (1990, Ohio State University dissertation) explored the benefits of many different choral arrangements as recommended by prominent conductors – Weston Noble, Rodney Eichenberger, and John Williams. Two college choirs performed excerpts from Verdi’s Requiem and Bach’s Lobet dem Herrn in four different formations, including acoustical placement of voices within section, mixed, and unorganized formations, while singing in polyphonic or homophonic texture, resulting in 32 recorded performances. The choir, under the direction of the same conductor, sang the two choral excerpts in each of the formations. Auditors (N = 5 high school choral conductors) assessed the performances for overall blend on a six point Likert scale and then attempted to identify the choral formation in use. In homophonic texture, the acoustic placement of voices in a sectional formation scored the highest choral blend. Sectional formations also had the least amount of individual singer protrusion. Intonation was better in the polyphonic selection, yet more voices stood out of the texture, especially in choral formations that did not use acoustic voice placement. Acoustic voice placement describes a subjective, aural method of arranging singers whose tones blend (cause very little beating). No equipment or software was used to measure acoustic compatibility of singers.132

Anderson (1993, University of Missouri-Kansas City dissertation) was also interested in vowel timbre and the ability of choir singers from three distinct groups to identify aurally vowel timbre and intelligibility. 
Anderson wondered what the ability of auditors would be to identify a vowel if one-fourth of the singers sang a predetermined rogue vowel. Three choirs participated in this study: a high school choir (N = 220); a community choir (N = 111); and a university choir (N = 89). Each choir listened to a perception test played over loudspeakers in a normal rehearsal room. The perception task had three sections: the first section asked the participant to determine where the soloist’s timbre ranked on a dark-to-bright continuum; the second section had recordings of a four-voice ensemble and again the participant rated the

131 Ternström, S., Sundberg, J., Anders, F. (1988). Synthesizing choir singing. Journal of Voice, 1 (4), 332-335.

132 Tocheff, R. D. (1990). Acoustical Placement of Voices in Choral Formations. (Doctoral Dissertation, The Ohio State University, 1990).

token on a dark-to-bright continuum; and the third section asked the participant to identify the single voice part which had different timbral characteristics than the other three voices singing the chorale. All examples were from the same Bach chorale, number 102, Ermuntre dich, mein schwacher Geist. In the recorded samples, the selected voices sang the chorale each time on one of the following three vowels: /u/, /a/, or /i/, representing from left to right the dark-to-bright continuum. Each of the 25 recorded samples had a specific design of which voice parts were performing as well as which vowel(s) were being sung; for example, the soprano, tenor, and bass would sing /a/ while the alto sang /i/. Some samples had only one section singing one vowel in unison (all sopranos singing /i/), and other samples had all four parts singing their individual parts on a unison /u/ vowel. The perception test results showed no effect for subject demographics (age, training, education, experience, setting). All three groups were consistent in their perceptions identifying the four part unison vowel recordings on the vowel hierarchy. However, the participants’ perceptions of the vowel hierarchy were different than the hierarchy chosen by the author. Participants expressed preference for /u/ (darkest) 81% of the time. The vowel /i/ was identified correctly as the rogue vowel 61% of the time, whereas the vowel /u/ was identified correctly 52% of the time. Anderson suggests that a better explanation of terms and perhaps a training period for timbre aural identification would have improved participants’ success in timbre identification.133 Anderson believed a more or less reverberant room (the exact room information is not provided) for the recording of the samples would have impacted his auditors’ perceptions. 
Ternström (1991) provided a synopsis for the Journal of Voice of five independent studies, all relating to choir acoustics for his dissertation (1989).134 The results provide an understanding of the dynamic range of choirs; the feedback and reference sounds necessary for choir members; of the effect of wow, flutter and jitter on fundamental frequency; overall choral intonation accuracy; sources of scatter; choral timbre; self-to-other ratios; room acoustics; and closes with a discussion of the chorus effect. The applicable individual studies are reported separately here in order to provide the specifics not included in the

133 Anderson, S. E. (1993). Choral singers’ timbral descriptions and evaluations of recorded choral excerpts using a dark-to-bright vowel hierarchy. (DMA Dissertation, University of Missouri-Kansas City, 1993).

134 Ternström, S. (1989). Acoustical Aspects of Choir Singing. (Ph.D. Dissertation, Department of Speech Communication and Music Acoustics, Royal Institute of Technology, Stockholm). ISSN 0280-9850.

overview of this article. But this article is masterful in relating all of the studies within a given specific parameter, such as loudness.135

One of the articles used for Ternström's dissertation explored what the effect of different rooms would be on choirs of different demographics. A boys’ choir (n = 16, average age 12), a mixed youth choir (n = 30, average age 18) and a mixed adult choir (n = 27, average age 30) were recorded in the same basement assembly room, a custom designed rehearsal hall, and a large stone church. A diffuse field was unattainable in the basement location. Each choir was recorded ten times: 2 different songs; 4 dynamic levels – pianissimo (pp – very soft), mezzo-forte (mf – medium loud), fortissimo (ff – very loud), mf; and a repeated mf which was obstructed by the singers’ folders being held in front of their faces as they sang. The boys’ choir sang the soprano part in unison for their recordings and the same songs were sung by the other choirs in four part harmony. To determine the distance from the microphone to each singer, the room reverberation was measured and doubled. This distance was to ensure that the recorded sound would be from the diffuse sound field and not the nearest singers. Long term average spectra (LTAS) analysis was chosen for the measurement method because of its ability to illustrate overall voice timbre across time. To determine the effect of the room acoustics, a reference sound – such as a fan – can provide a known, accurate power spectrum against which the vocal sound measurements can be compared. The reference sound source used was a Bruel and Kjaer sound generator model 4202 – basically a powerful fan – designed to generate a stable broadband noise from 0.1 to 10 kHz at an output power of about 91 dB. The effect of the three different rooms showed a significant variance in the vocal effort of the singers. 
In the basement and the church, the youth and adult choirs changed their effort in response to the room absorption. All of the choirs had an increase of 5-10% in the higher formant frequencies in the basement as compared to the performance hall. Ternström hypothesized this could be from modifying the vocal tract instead of increased phonation effort. Particularly interesting is that the boys’ choir did not use more vocal effort in the basement but did have a pressed phonation. The music binder component of the study revealed a greater effect on clarity of sound in the church than in the hall or the basement for all

135 Ternström, S. (1991). Physical and acoustic factors that interact with the singer to produce the choral sound. Journal of Voice, 5 (2), 128-143.

choirs. The dynamic variations from pp to ff reflected a maximum amplitude difference of 12-20 dB. In conclusion, Ternström found LTAS to be an incomplete measure of choir timbre.136

In his continuing effort to find a measurement process for choral timbre, Ternström created a synthesized choir in which all components of the sound would be known and controlled. Choral educated participants (N = 9) listened to twelve stereophonic samples of a synthesized SATB choir singing unison vowels (/u/, /a/, and /æ/). Samples were randomly presented to the participants by the perception trial software program. Participants were given six options for the pitch scatter (fundamental frequency dispersion – how in tune is the sound) and for the spectral smear (vocal tract length differences – vowel clarity and timbre) and were asked to choose the “maximum tolerable” modification to the sound and then the “preferred” modification of the sound. Participants expressed preference for 0 – 5 cents pitch scatter but would tolerate pitch scatter between 10 and 15 cents. Smear was more difficult and less consistent for the participants, yet the results show participant preference for 2 to 6% and tolerance up to 12%. The synthesized singer representation of /u/ had the greatest amount of participant preference and tolerance.137

Coleman (1994, otolaryngologist and voice scientist) decided to break down the measurement of a choir to the smallest denominator to evaluate the acoustic and physiologic differences between measuring solo singing and ensemble singing. Guide Us Now, O Great Jehovah and Hiding in Thee were sung by a vocally trained tenor and baritone who had performed together for ten years. The microphone was placed on the supra-sternal notch. The location for the recording was an empty 1200 seat sanctuary. The premise was to evaluate the difference in measuring two voices versus one voice. The average difference in onset time was 29.6 ms. 
Vibrato was reduced by 50% in unison sections of music as compared to harmony sections (where two pitches were sung simultaneously). The sound pressure level (SPL) range was 28 dB in the duet sections, 21 dB for the baritone solo sections, and 24 dB for the tenor sections. Singer's formant (Fs) was exhibited in the duet sections with 2.5- to 3.3-kHz energy enhanced regions. All of these results lead to the conclusion that duet singing employs components of both solo and ensemble techniques.138
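As a point of reference for comparing one voice with two, sound pressure levels of uncorrelated sources do not add arithmetically; they combine on a power basis. The sketch below illustrates that standard relationship only. It is not Coleman's measurement procedure, the function name is hypothetical, and the assumption that the voices are uncorrelated is an idealization.

```python
import math


def combine_levels_db(levels_db):
    """Combined level, in dB, of uncorrelated sources given their levels in dB.

    Each level is converted to relative power, the powers are summed,
    and the sum is converted back to dB.
    """
    total_power = sum(10.0 ** (level / 10.0) for level in levels_db)
    return 10.0 * math.log10(total_power)
```

Under this idealization, two voices at the same level produce a combined level about 3 dB above either voice alone, and a much softer voice adds very little to a louder one.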

136 Ternström, S. (1993). Long-time average spectrum characteristics of different choirs in different rooms. The Journal of the British Voice Association, 2, 55-77.

137 Ternström, S. (1993). Perceptual evaluations of voice scatter in unison choir sounds. Journal of Voice, 7 (2), 129-135.

138 Coleman, R. (1994). Acoustic and physiologic factors in duet singing: a pilot study. Journal of Voice, 8 (3), 202-206.

The next step in Coleman’s (1994) line of research was to look at one component of choral sound. Individual church choir adult singers (N = 20, n = 10 males and n = 10 females) were recorded singing the vowel /a/ for four seconds and the first verse of Amazing Grace three times each at mezzo forte (medium loud, mf), fortissimo (very loud, ff), and pianissimo (very soft, pp) at a pitch chosen by the participant. The microphone was positioned 15 centimeters from each participant’s mouth. The six recordings of each participant were measured for SPL (sound pressure level) variations and showed great intra-group dynamic variance. The total variation of all twenty singers was 11 to 33 dB from pianissimo (pp) to fortissimo (ff). Trained singers were more successful in softer phonation than untrained singers, but the difference in maximum sound pressure level (SPL) showed no training effect. The sound pressure level (SPL) was an average of 3 dB greater in the vowel recordings as compared to the song recordings. All participants sang mezzo forte (mf) closer to fortissimo (ff) than they did to pianissimo (pp). The participants expressed preference for their usual performance space (a very large church choir loft which is surrounded by a 32-rank pipe organ) as compared to the small recording space (room information not provided) and felt the room had a direct effect on the volume with which they sang. A hypothetical choir was created by blending all of the individual choir members’ recordings. The dynamics produced by this choir were fortissimo (ff) at 114-117 dB, mezzo-forte (mf) at 110 dB, and piano (p) level of ~90 dB. Sustained vowels had a total range of 28.8 dB and the song 22 dB. 
These sound pressure level (SPL) measurements are consistent with Ternström’s choir sound pressure level (SPL) measurements.139 Coleman suggested choir directors reduce the sound pressure level (SPL) output of strong, technically advanced singers so that the ensemble could reach an overall improved choral blend. From this improved choral blend, the dynamic range possibilities of the ensemble could be determined.140 Coleman’s suggestion to conductors leads a singer to perhaps wonder just how loud a choir singer should sing. Is there a way for a singer to know that he/she is singing too loud or too soft? From what vantage point could a researcher explore the answers to these questions? Ternström determined the best location would be from within the choir. Descriptive language was designed to facilitate quick understanding of research design and implementation: research study of the SOR relationship (self to other ratio) of an individual’s own voice (feedback) with the referent sounds (other singers, the room,

139 Ternström, (1993), 128-143.

140 Coleman, R. F. (1994). Dynamic intensity variations of individual choral singers. Journal of Voice, 8 (3), 196-201.

instruments) in an ensemble. The first step was to record individual chamber choir singers (N = 12) with two miniature binaural microphones singing the first two phrases of a homophonic chorale with the ensemble, followed by a solo of the next phrase. Alto, tenor and bass subjects sang the solo sections an average of 1.7 dB softer, whereas sopranos sang the solo sections at the same level or louder than in the ensemble condition. In the solo condition the average self to other ratio (SOR) was +15.2 dB. The self to other ratio (SOR) results in this initial study reflected an average of +3.85 dB with a mean magnitude of 3.46 dB. Background noise was measured twice with each member of the choir standing silently in the room and was found to be consistent at approximately 50 Hz in each location. In the discussion of the results, Ternström noted the chamber choir utilized in this study rehearsed and performed in a u-shaped single row configuration. It was hypothesized that a different standing arrangement and, of course, different rooms would have a significant impact on the results. A chart of feedback and reference components and their possible impact on physical properties (singers, walls, rooms, bone conduction, etc.) was included in addition to all mathematical algorithms proposed for future measurement of self-to-other ratio. The introduction provided a detailed explanation of sound pressure level components: reference, feedback, and measurement principles. The analysis section provided explanations and reasoning for channel gain correction and de-emphasis of a DAT machine. All of this information was designed to aid in the development of acoustic choral music measurement tools and procedures.141

Tonkinson (1990, music educator) continued the journey into understanding an individual choir singer’s contribution to the choral experience. One influence was the masking of one’s own voice by the surrounding singers. 
Often when this occurs, the individual singer will sing as loud as necessary to hear one's own voice without respect to the ensemble. This is known as the Lombard effect. Church choir singers and college singers (N = 27) were recorded singing The Star Spangled Banner twice while listening to a choir and themselves singing through headphones. The singers were outfitted with harmonica holders for their microphones to ensure microphone placement and consistency throughout the recording.142 After the pre-test, prior to the second recording, the singers were informed of the Lombard effect and asked to resist succumbing to the Lombard effect by maintaining a consistent energy output (intensity). Results found that years of choral experience and voice lessons had little or no significance.

141 Ternström, S. (1994). Hearing myself with others: Sound levels in choral performance measured with separation of one’s own voice from the rest of the choir. Journal of Voice, 8 (4), 293-302.

142 Sataloff, R. (1988).

There was an overall decrease of approximately 5 dB in the post-test, suggesting that education and specific direction to correct for the Lombard effect were successful.143 Although training can improve singer performance, Kitch et al. (1996, speech pathologist) found also that training was not reflected in perception test scores. Choir tenors (N = 10) were recorded with both an electroglottograph (EGG) collar and a condenser microphone approximately one hour before and within 30 minutes after the performance of Mahler’s Symphony No. 2. The participants were men with a mean age of 38.7 and 7.5 years of formal voice training. The vocal tasks recorded were a prolonged /a/ on low, medium, and high pitches at soft, medium, and loud levels; and ascending and descending mid-range scales on the vowel /a/. Demographic and voice self-analyzing questionnaires were completed by the participants before the concert, immediately after the concert and then two weeks after the concert. The last questionnaire included a perceptual listening test of the participant’s voice production recorded before and after the performance. The acoustic analysis of the pre- and post-concert participant recordings began with a comparison of the pitch and amplitude ranges, which were perceptibly decreased. Next, comfortable sounding notes – both in pitch and in dynamics – were evaluated for an increase or decrease in harmonic-to-noise ratio and were found to have decreased. Jitter was present in the bottom notes of scale singing and its rate increased in comfortable pitches, high soft notes, and again the bottom notes of scales. The participants’ vocal production post-concert showed clear characteristics of vocal fatigue – reduced ranges and reduction in successful soft vocal production. However, the participants reported no reduction in vocal abilities in either the post-concert or the “two weeks later” perception questionnaires.144
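The summary above reports changes in jitter without stating how jitter is computed. As a rough illustration only, the following sketch uses one common definition of local jitter: the mean absolute difference between consecutive vocal fold cycle periods, divided by the mean period. The function name and the period values are hypothetical and are not taken from Kitch et al.; the study's own analysis software and settings are not known from this review.

```python
def local_jitter_percent(periods_ms):
    """Local jitter: mean absolute difference between consecutive
    cycle periods, divided by the mean period, as a percentage."""
    if len(periods_ms) < 2:
        raise ValueError("need at least two cycle periods")
    diffs = [abs(b - a) for a, b in zip(periods_ms, periods_ms[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods_ms) / len(periods_ms)
    return 100.0 * mean_diff / mean_period


# Hypothetical cycle lengths in milliseconds (illustrative values only)
steady = [4.545, 4.545, 4.545, 4.545]   # perfectly regular cycles
irregular = [4.50, 4.60, 4.48, 4.62]    # cycle-to-cycle variation

print(local_jitter_percent(steady))     # 0.0
print(local_jitter_percent(irregular))  # larger value: less regular vibration
```

A higher percentage indicates less regular cycle-to-cycle vibration, which is the direction of change the study associates with post-concert fatigue.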

Did the tenors perform the Mahler with a strong singer’s formant (Fs) throughout the performance or did they reduce energy in the 2800 – 3200 Hz range (Fs range) and instead use increased energy in the fundamental frequency (F0)? Ford (1999 dissertation) wondered if auditors would express preference for singer's formant (Fs) or increased F0 in ensemble performance. A small SATB (soprano, alto, tenor, bass) ensemble (N = 8) was recorded in an anechoic room singing first with a soloistic, fully resonant, and strong singer's formant (Fs) and secondly in a weaker, less resonant, greatly reduced singer’s formant style. Singers were chosen based on their ability to sing in both of these styles as needed. In the

143 Tonkinson, S. (1990). The Lombard effect in choral singing. Journal of Voice, 8 (1), 24-29.

144 Kitch, J. A., Oates, J., Greenwood, K. (1996). Performance effects on the voices of 10 choral tenors: Acoustic and perceptual findings. Journal of Voice, 10 (3), 217-227.

recording session, the singers stood in a circle 0.75 m from a suspended microphone. It was determined that the tenors were louder than the rest of the ensemble, so they were moved to 0.92 m from the microphone. The singer’s formant was measured by the total root mean square (RMS) power output. Differences between the two techniques ranged from -4.08 dB to -4.84 dB. A panel (N = 6 of music faculty and doctoral students) was unable to discern recordings by the presence or absence of the singer’s formant, so the recording was altered by adjusting the gain to make the strong singer’s formant recordings more sonorous. Auditor participants (3 groups – n = 49 college music majors, n = 47 instrumental music majors, and n = 43 college students with no music training) showed overall preference for the less resonant, greatly reduced singer’s formant style where the individual singers used a "blended" technique and reinforced the fundamental frequency. Results suggested musical training had no effect on preferences.145 Daugherty's (1999, music educator) auditor participants (N = 160, n = 80 experienced musicians and n = 80 inexperienced musicians) also had significant agreement in their expressed preferences. The auditor listening task was to listen to 10 pairs of recorded choir excerpts and to answer the following questions: comparing the overall sound of the choir in these two performances, I heard a) no difference, b) a little difference, c) much difference, d) very much difference, and e) not sure; I preferred the overall choral sound of the a) first performance, b) second performance, c) both sounded the same. The choral excerpts were of participants (N = 46 high school choir singers) singing a homophonic choral excerpt (Ubi Caritas by Maurice Duruflé) in three spacings (close, lateral, and circumambient) and two formations (block sectional and mixed).

145 Ford, J. K. (1999). The Preference for Strong or Weak Singer’s Formant Resonance in Choral Tone Quality. Unpublished Doctoral Dissertation, Florida State University, Tallahassee.


Figure 5: Choral Formations

All recordings of the different spacings and formations were made utilizing a videotape of the conductor to ensure consistency of direction. Auditors overwhelmingly expressed preference for spread spacing yet showed no preference for block sectional or mixed formation. The singer participants filled out a questionnaire from which the conclusions were that 95.6% preferred spread spacing and believed that it facilitated vocal ease and improved vocal production. Sopranos, altos, and tenors preferred mixed formation; however, the basses preferred block sectional formation. Singer-reported self-to-other ratio (SOR) was improved in the spread spacing in the block sectional formation (90%).146 Ternström (1999) gave participants (N = 23, n = 6 basses, n = 6 tenors, n = 6 altos, and n = 5 sopranos, all members of Swedish choirs ranging in age from 19 to 62) the opportunity to determine their individual preference for self-to-other ratio (SOR). Each participant was outfitted with a pair of binaural head-mounted microphones and instructed to move closer to or farther from a stationary microphone while singing either in unison (18 samples) or in a chord (18 samples) with a synthesized choir. The synthesized choir was projected from four loudspeakers placed equidistant from the participant’s average location. The participant would hear the synthesized choir begin the sung vowel and then would join in as if to blend with the choir. While singing, the participant would move forwards and/or backwards until their perfect SOR had been achieved. At that point, the participant would push a hand

146 Daugherty, J. (1999). Spacing, formation, and choral sound: Preferences and perceptions of auditors and choristers. Journal of Research in Music Education, 47 (3), 224-238.

held signal button for five seconds all the while singing. As the participant would move, the synthesized choir would become louder or softer in correlation with the participant’s movements. The average preferred self-to-other ratio (SOR) was +6.1 dB with the lowest SOR preference expressed by the basses and the highest by the sopranos. There was no noticeable difference in SOR preference expressed by the participants when singing unison as compared to harmony (part of a chord). Ternström hypothesized the SOR preference expressed by the individual participants may very well be due to their usual location within a choir, which would have direct impact on the amount of choir one usually hears, and therefore a habitual preference may have been cultivated. However, the standard deviation of the SOR preferences expressed in this study was only 2.2 dB. Ternström believes choral conductors could benefit from allowing singers to determine their SOR within the choir formation and arrangement.147

2000 – Present

The new millennium began with Eckholm's (2000) dissertation on singing mode, seating arrangement, and choral blend. An ad hoc balanced chamber choir (N = 22, n = 5 sopranos, n = 6 altos, n = 6 tenors, and n = 5 basses) was recorded singing four a cappella polyphonic choral excerpts in four conditions: solo mode in random seating; solo mode in voice matched seating; choral mode in random seating; and choral mode in voice matched seating. The choral excerpts were taken from Victoria's O Magnum Mysterium (measures 1 – 39), Mozart's Ave Verum Corpus, Bruckner's Locus iste, and Messiaen's O Sacrum convivium (measures 17 - end). Choristers completed a perception survey regarding ease of vocal production for each recording. Eight choristers were individually recorded on headset microphones during the entire recording process.
These same eight choristers were recorded individually singing the excerpts as a solo with piano accompaniment. All conditions were sung to a videotaped conductor to ensure consistency between recordings. The recordings took place in an intimate 100 seat auditorium. Reverb was added to the mixing of the perception survey recordings to make the excerpts sound realistic. Reverb is added to a sound to add the feeling of more or less space to the sound.148 In other words, Eckholm wanted the recording to sound as though it were recorded in a larger space with more reflected sounds. Eckholm asked the experienced conductor to "conduct in the same manner over

147 Ternström, S. (1999). Preferred self-to-other ratios in choir singing. Journal of the Acoustical Society of America, 105 (6), 3563-3574.

148 Howard, D. (2007). Voice Science, Acoustics, and Recording. San Diego: Plural Publishing Co., p. 129.

all four experimental conditions."149 Evaluators (N = 4, n = 3 experienced choral conductors and n = 1 experienced chorister) observed the videotaping of the recording session and completed a Conductor Consistency Observation Form in which no significant differences were found between each of the recordings. Auditors (N = 65, n = 33 voice teachers, n = 32 professional non-vocal musicians) evaluated 160 excerpts and showed preference for soloistic singing in voice matched seating. Voice teachers (N = 12) listened to all recorded conditions of the individually microphoned singers and expressed overwhelming preference for the solo mode of singing in all conditions. Eckholm surmised acoustic seating would benefit solo singers in a choral setting for it would allow for a soloistic vocal production without detriment to overall success of choral blend or harmful vocal technique.150 A concern expressed about this study was that the perception surveys were mailed to the listening participants with no directions as to the quality of equipment that should be used when listening, so consistency could not be ensured.151 Woodruff (2001, University of Oklahoma dissertation) was interested in the interaction between voice-matched singers in an ensemble that would facilitate natural vocal production in choral and solo mode of phonation. Two groups of three male singers each (N = 6) were recorded in every possible solo, duet, and trio formation. Three cardioid microphones were arranged in a "v" pattern one foot in front of each singer. The microphones were rolled off at 75 Hz to remove low frequency noise from the recordings. Perceptual data were taken from singers (n is unknown), voice teachers (n is unknown), and choir directors.
Woodruff concluded from the acoustical and perceptual results that lateral spacing, which is 24" wide and windowed between rows of singers, may provide a choral setting requiring less voice modification to blend; but the best result would be achieved if the singers were first voice matched and then put in a lateral, windowed formation. If a conductor needed to choose either voice matching or lateral spacing, voice matching received the highest ratings.152 These conclusions were

149 Eckholm, E. (2000). The effect of singing mode and seating arrangement on choral blend and overall choral sound. Journal of Research in Music Education, 48 (2), 123-135.

150 Ibid.

151 Ternström, (2003), 9.

152 Woodruff, N. (2001). The acoustic interaction of voices in ensemble: An inquiry into the phenomenon of voice matching and the perception of unaltered vocal process (Doctoral dissertation, The University of Oklahoma, 2001). UMI ProQuest Digital Dissertations, AAT 3075332.

reached from recordings of three singers; therefore one must evaluate the validity of a trio representing a choir. Folger (2002, University of North Carolina-Greensboro dissertation) used this same grouping of singers (n = 3) to differentiate between solo and choral modes of singing. Participants (N = 41 undergraduate bass/baritone and soprano choir singers) were recorded, three at a time, singing 8 bars of Bach/Gounod’s Ave Maria and five representative vowels (/a/, /i/, /e/, /u/, and /ɔ/) for two seconds in a voice lab (not anechoic or semi-anechoic). Then each participant sang the song sample individually three times. The final task was to sing the vowel series with as straight (vibrato-free) a tone as possible. The participants were directed to sing all samples at a mezzo forte level. Each voice type was given a specific beginning pitch from a Yamaha upright piano (soprano A-440 Hz for vowels, key of G for song; basses A-220 Hz for vowels, key of E for song). Each participant wore a head-set microphone with the windscreen positioned ¼ inch from the mouth. Results suggest participants’ vibrato rate was different in solo mode than in choral mode. Folger hypothesized that choral singers were more likely to experience the "chorus effect" when the singers maintained a constantly varied vibrato rate. Singers who have a wide and inconsistent vibrato rate must be placed carefully within the ensemble so as not to have a negative location effect on the overall ensemble sound.153 Vibrato was one of the five different types of voice production that Smith (2002, music educator) included in spectrograph comparison of solo singing and choral singing. The types of voice production were "straight tone", falsetto, areas of passaggio, forte production and piano production. The spectrographs were analyzed for formant peaks in the different modes of phonation (choral or solo) and in each of the five types of voice production.154 A year later, Thurman (specialist voice educator) and Daugherty (2003) proposed a different perspective of both the data and the interpretation of data in the Smith (2002) article. Thurman and Daugherty hypothesized that the data provided by Smith were from two 21-year-old singers. The authors reiterate

153 Folger, W. M. (2002). Unifying the choral sound through voice matching: An empirical study of the adjustments in vibrato frequency modulation and amplitude modulation (Doctoral dissertation, The University of North Carolina at Greensboro, 2002).

154 Smith, P. (2002). Balance or blend? Two approaches to choral singing. Choral Journal, 43 (5), 31-43.

that the transfer of information from two singers to a choral context is inappropriate. Questions are posed and answered with regard to Smith's verbiage, interpretation of the data, and conclusions.155 Daugherty (2003) continued research into choral spacing and formation. A university choral ensemble (N = 20, n = 10 female and n = 10 male) was recorded in two formations (random block sectional and synergistic) and in three different spacings (close, lateral, and circumambient). See Figure 6. In close spacing, the arm of one singer was one inch from the next singer, whereas in lateral the singer's arm was 24" from the next singer's arm. These were almost the same conditions as the 1999 study; the one variable was that a mixed formation was used in 1999 as compared to the synergistic used in 2003.

Figure 6: Chamber Choir Spacings

The synergistic formation consisted of ten pairs of singers who were of the same relative height and the same vocal loudness. The specific location of the singer pairs was determined first by loudness so that the loudest pairs would be in the center of the formation. Next, the most rhythmically accurate were placed on the outsides of the formation. The singer pairs were then arranged on four tiered risers in a window formation. The mixed choral excerpts (SATB – soprano, alto, tenor, and bass),

155 Thurman, L., & Daugherty, J. (2003). Balance or Blend? Are these the only vocal approaches to choral singing? (A rebuttal). The Choral Journal, 43 (7), 35 - 43.

Palestrina's Adoramus Te and Jacob Handl's O Admirabile Commercium, were chosen because they were a cappella, in Latin, and primarily homophonic. The recordings were conducted in a 600 seat recital auditorium. Higher frequencies had a measured reverberation time of 2.7 seconds and mid-range frequencies 2.9 seconds, confirming the singers' impressions of a "live" and "reverberant" room. The recordings were completed by the SATB (soprano, alto, tenor and bass) choir, the women only (soprano and alto), and then the men only (tenor and bass). From these recordings, a 12 paired sample perception test was created in which the auditors (N = 60 with choral experience) were asked for each pair: comparing the overall sound of the choir in these two performances, I heard a) no difference, b) a little difference, c) much difference, d) very much difference, e) not sure; and I preferred the overall choral sound of the a) first performance, b) second performance, c) both sounded the same. Auditors showed clear preference for circumambient spacing in all formations and within all gender groupings. Preference was shown for female groups in circumambient formation whereas male groupings were preferred in lateral formation. Auditors preferred random formation for mixed gender groups. Singers expressed greater vocal ease and less vocal tension with circumambient spacing.156 The preferences expressed here agree with acoustical findings of Jers (1998).157 Aspaas et al. (2004, choral music educator) continued the investigation into choral formation with a return to LTAS analysis (long term average spectra). A university graduate choir (N = 30) was recorded singing a homophonic and a polyphonic choral excerpt in three formations (block sectional, mixed, and column sectional).
The recordings were analyzed with long term averaged spectra (LTAS) and no differences were found between formations with one exception; the homophonic choral excerpt had higher energy levels than the polyphonic choral excerpt. The choristers completed a survey following the recording session in which sopranos expressed preference for the column sectional formation, citing that this formation allowed greater ease of vocal production and a greater ability to hear one's own voice. Altos preferred the column sectional formation for it allowed the greatest hearing of other vocal parts.158 Howard (2004, voice scientist and acoustician) obtained from radio stations recordings of football league fans (N = 20 different recordings, each a different team’s fans) singing during football games.

156 Daugherty, J. F. (2003). Choir spacing and formation: choral sound preferences in random, synergistic, and gender-specific chamber choir placements. International Journal of Research in Choral Singing, 1 (1), 48-59.

157 Jers, (2007), 1-5.

158 Aspaas, C., McCrea, C. R., Morris, R. J., Fowler, L. (2004). Select acoustic and perceptual measures of choral formation. International Journal of Research in Choral Singing, 2 (1), 11-27.

Each recording was analyzed for pitch accuracy and the degree of sharpness or flatness for each note measured. Measurement methods (spectrograph and cepstra) used for solo singing did not work with these recordings because the resulting graphs were noise-like, with no identifiable closings of the vocal folds and no periodicity. Pitch identification was made by aural match of recorded tone to a matching synthesizer pitch. Reliability of this pitch measuring method was tested and resulted in less than 1% difference. The flatness or sharpness of the pitch was shown through the plotting of the standard deviation in cents of each pitch and then representing those figures on a graph. The participants' relative tuning accuracy (represented in the standard deviation in cents) ranged from 3.3 to 38.2 cents.159 In stark contrast to the football stadium recordings, Freiheit (2005, acoustician) recorded the St. Olaf Choir (N = 80) singing Almighty and Everlasting God by Orlando Gibbons in an anechoic research chamber, a room with no sound reflections. Four of the participants became claustrophobic and left the concrete "building within a building". The participants reported difficulty in hearing themselves and others; therefore making the recording was difficult. However, the resulting anechoic recording was a representation of pure sound and was utilized for auralization, a form of acoustic research such as the next study. Participants (N is not provided) chose a seat in the anechoic chamber and listened to 33 seconds of the St. Olaf pure sound recording. Then participants listened to the recording again, but this time the recording was treated with acoustic parameters (sound reflections) as if the recording had been made in a normal high school auditorium complemented with a sound shell, acoustic ceiling panels, and audience acoustic "clouds". Each listening participant heard each example twice in an ABAB design. In each playing, the dynamics and volume were strictly controlled.
Participants were encouraged to listen to the entire perception test in a variety of seat locations so as to experience different locations within the "perceptual room". In the acoustically created condition, listeners reported volume increases in the sound clips, greater clarity of diction, and an overall brightness to the tone which suggests audiences too prefer sound with acoustic reverberation.160 Jers (2005, conductor and music acoustician) was also interested in acoustic reverberation and the impact that it would have on amateur versus professional choirs. First, members of a community choir (N

159 Howard, D. (2004). Measuring the tuning accuracy of thousands singing in unison: An English premier football league table of fans’ singing tunefulness. Logopedics Phoniatrics Vocology, 29 (2), 77-83.

160 Freiheit, R. 2005. Historic recording gives choir “alien” feeling: In anechoic space, no one can hear you sing. Lay Language Paper presented at the ASA/NOISE-CON 2005 Meeting, Minneapolis, MN, 1-3.

is not provided) were recorded in a church singing an eight-bar unison phrase from a Praetorius canon. Then professional choir singers (N = 8) were recorded singing the same selection in the church. The next set of recordings of the same sequence was performed in a radio studio. Because the recording studio does not allow as much referent sound or echo, the differences between the choirs were magnified.

Analysis of the mean fundamental frequency (F0) of the singers revealed the standard deviation of singer accuracy. Both choirs pitched fifths and octaves high, and low pitches had a higher degree of inaccuracy. The amateur choir had greater diversity in the beginning of the pitch onset and took longer to synchronize the pitch. The larger the interval, the greater the inaccuracy in both choirs; overall, ascending scale notes were sharp and descending notes were flat. The professional choir had clearer individual step movement; their same note repetitions were more accurate and the vibrato synchronization of notes with longer duration was quicker.161 Jers's next investigation explored defining parameters of the "chorus effect" – the combined sound of many sources that are similar but uncorrelated at the level of the waveform of the sound.162 Amateur choir singers (N = 16) were fitted with individual miniature electret microphones over their noses and recorded singing a Praetorius 8 bar canon in both slow (half note = 80 beats per minute [bpm]) and fast (half note = 125 bpm) tempos. The singers were arranged in a semi-circle facing the conductor.

Recordings were analyzed for MFON (averaged fundamental frequency over the duration of the tone), MFOV (averaged fundamental frequency of all the participants), MFON,V (averaged fundamental frequency and scatter of fundamental frequency of all participants for each note), SFON (the standard deviation of the stability of fundamental frequency for one singer per one note), SFOV (the standard deviation of the stability of the fundamental frequency for all participants), and SFON,V (the standard deviation of the fundamental frequency and the scatter of the fundamental frequency for all participants) for the entire 8 bars and for 4 selected notes of longer duration. The results showed there was some attempt to synchronize vibrato; the accuracy of fundamental pitch was more stable in the slower tempo; there was greater pitch accuracy achieved when coming down a P4 than when going up a P5; and downward stepwise movement had greater pitch accuracy than upward stepwise movement which showed a tendency to pitch high. Sound pressure level (SPL) was not measured at this time but future studies are

161 Jers, H. (2005a). What are the differences between amateur and professional choirs? ASA/NOISE-CON 2005 Meeting Lay Language Papers Minneapolis, MN, 1-4.

162 Jers, H. & Ternström, S. (2005b). Intonation analysis of a multi-channel choir recording. TMH-QPSR, 47 (1), 1-6.

planned. Jers suggested that in understanding the experience of individual singers within a choral experience, an improved understanding of the chorus effect might be achieved.163 Morris et al. (2006, voice scientist) examined the effect of three different choral arrangements on the output of LTAS (long term average spectrum) as well as the effect of polyphonic versus homophonic musical selections in each of the arrangements on the long term average spectra (LTAS) analysis. A college graduate chamber choir (N = 30, n = 8 sopranos, n = 11 altos, n = 9 basses, n = 13 tenors) was recorded in a 480 seat concert hall singing the homophonic section of Mocnik's Christus est natus and the polyphonic section of Lajos' Cantemus! To evaluate microphone placement, microphones were placed at three staggered locations: 0.5 meters in front of participants for near field recording; 3.0 meters from choir and close to conductor for a mixed field recording; and 10 meters from the choir in the audience for a diffuse field recording. The participants sang each music excerpt three times, once in each of the following formations: block sectional, sectional in columns, and mixed.
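The long term average spectrum (LTAS) named in this study is, in general terms, the average of many short-time power spectra taken across a recording. The sketch below illustrates that idea under stated assumptions: the frame length, Hann window, sample rate, and synthetic 500 Hz test tone are all hypothetical choices made for illustration, and the function `ltas_db` is not the analysis software used by Morris et al.

```python
import cmath
import math


def ltas_db(signal, frame_len=64):
    """Average the power spectra of consecutive Hann-windowed frames and
    return one dB value per frequency bin (bins 0 .. frame_len // 2)."""
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    power_sum = [0.0] * n_bins
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        for k in range(n_bins):
            # Naive discrete Fourier transform of the windowed frame at bin k
            acc = sum(window[n] * frame[n]
                      * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            power_sum[k] += abs(acc) ** 2
    # The small floor avoids log10(0) in bins that contain no energy
    return [10 * math.log10(power_sum[k] / n_frames + 1e-12)
            for k in range(n_bins)]


SAMPLE_RATE = 8000  # Hz (assumed for this illustration)
tone = [math.sin(2 * math.pi * 500 * n / SAMPLE_RATE) for n in range(1280)]
spectrum = ltas_db(tone)
peak_bin = max(range(len(spectrum)), key=lambda k: spectrum[k])
print(peak_bin * SAMPLE_RATE / 64)  # 500.0
```

With real choral recordings the same averaging smooths out note-to-note variation, so that differences between formations or microphone positions appear as level differences in particular frequency regions.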

Figure 7: Organization of Choral Formation by Vocal Parts

LTAS analysis was performed on each of the excerpts from each of the microphone locations. The highest peak in the averaged spectral analysis for all conditions was around 500 Hz. The LTAS of the homophonic selection in the near field had 5-8 dB more signal amplitude than the column formation. The mixed arrangement had more signal amplitude above 2000 Hz in the homophonic and polyphonic

163 Jers, 1-6.

selections. No difference was present in the long term average spectra (LTAS) of the conditions recorded in the diffuse (far) field.164 Howard (2007a, conductor and music acoustician) used a "mixed" arrangement of singers, an SATB quartet (1 soprano, 1 alto, 1 tenor, and 1 bass), to investigate intonation. Each participant was fitted with an electrolaryngograph and a miniature microphone placed 30 centimeters from the lips. The participants were recorded singing an a cappella thirteen chord exercise which began on a C major root close voicing chord (bass C3) and ended on a C major root close voicing chord (bass C4).

Figure 8: Choral Exercise

After the first recording, singers were asked how each of them sang in tune. Howard suggested the singers listen to the voice that had their next note and to tune each chord from that point. The result was an overall reduction in cents out of tune. A comparison was made between the overall intonation success of the participants in both equal temperament and just intonation. In this quartet, the comparison of average pitch variation showed the singers approximately 23-26 cents below pitch for just intonation versus 49 cents below pitch for equal temperament. Howard hypothesizes that these results support that conductors who ask singers to sing a cappella are in actuality requiring the ensemble to stray in pitch in order to stay in tune.165
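The deviations in this study are reported in cents. For readers unfamiliar with the unit, the following sketch shows the standard conversion from a frequency ratio to cents (1200 cents per octave, 100 cents per equal-tempered semitone) and applies it to a just versus an equal-tempered major third. The reference pitch and variable names are assumptions chosen for illustration; none of the numbers are taken from Howard's data.

```python
import math


def cents(freq_hz, reference_hz):
    """Interval from reference_hz to freq_hz in cents (1200 per octave)."""
    return 1200.0 * math.log2(freq_hz / reference_hz)


C4 = 261.63  # Hz, equal-tempered C4 with A4 = 440 Hz (assumed reference)

equal_tempered_third = C4 * 2 ** (4 / 12)  # E4, four semitones above C4
just_third = C4 * 5 / 4                    # E4 tuned to the pure 5:4 ratio

# A pure major third sits below its equal-tempered counterpart
print(round(cents(just_third, equal_tempered_third), 1))  # -13.7
```

Because pure intervals differ from equal-tempered ones by amounts of this size, a quartet that tunes each chord purely to the previous one can accumulate a drift away from the starting pitch, which is the difficulty Howard describes.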

164 Morris, R., Mustafa, A., McCrea, C., Fowler, L., Aspaas, C. (2006). Acoustic analysis of the interaction of choral arrangements, musical selection, and microphone location. Journal of Voice, 21 (5), 568-575.

165 Howard, D. (2007a). Equal or non-equal temperament in a cappella SATB singing. Logopedics Phoniatrics Vocology, 32 (2), 87-94.

Again, Howard (2007b) used an SATB quartet fitted with an electrolaryngograph (EGG) collar and a miniature microphone placed 30 centimeters from each participant's lips to form a 45° angle. Building upon the advice given to the quartet in the study above, Howard asked the participant who had the common chord tone to hold for two beats while the singers with moving tones would hold the first chord for one beat and then rest to clearly hear the common chord tone and establish a new tuning from that note. Singers were directed to not tune their line individually but to re-tune to the common chord tone each time. All individual notes were measured for accuracy of F0 (fundamental frequency pitch) using a time domain cycle-by-cycle (cpc) analysis. Measurements were averaged and compared against just intonation pitch frequency and equal tempered pitch frequency and reported in cents deviated. The results showed the singers most closely followed just intonation – although consistently sharp except for one musical excerpt in which the singers were sharp only as the chords went upward (chords 1-6) and then flat as the chords went lower in pitch (chords 7-13). Howard again suggested conductors would do well to be aware of the difficulties in singing “in tune” throughout an entire piece of a cappella music, especially if modulations occur.166 Howard's (2007c) third study used two SATB quartets. Each quartet was recorded singing six songs. For each of the songs, the first recording was with each participant singing their voice classification part. In the second recording, the sopranos sang the alto part and the altos sang the soprano part; the basses sang the tenor part and the tenors sang the bass part. Each participant was equipped with an electrolaryngograph (EGG) collar to measure the larynx closed quotient (CQ).
The larynx closed quotient is the percentage of each vocal fold cycle for which the folds remain in contact.167 This measurement is obtained by the electrolaryngograph (EGG) through a close fitting neck collar with two electrodes positioned on either side of the larynx. Plotted EGG results suggested there was a difference in CQ values and F0 (fundamental frequency) for each vocal part. The amount of variation between vocal parts was also different for each quartet. Howard hypothesized one of the factors of good choral blend could be the result of a smooth distribution of closed quotient (CQ) values.168 Ternström (2007) also looked at a quartet, a barbershop quartet, to investigate formant frequency adjustment. Three four-track recordings of Paper Moon were sung by a professional barbershop quartet

166 Howard, D. (2007b). Intonation drift in a capella soprano, alto, tenor, bass quartet singing with key modulation. Journal of Voice, 21 (3), 300-315.

167 Howard, D. (2007c). Larynx closed quotient variation in quartet singing. 19th International Congress on Acoustics, Madrid, September 2007, (PACS: 43.55.Cs), 1-6.

168 Ibid.

three times in an absorbent room. Recordings were of ensemble performance, individual performance, as well as spoken and sung portions. Each singer wore a small microphone taped to the nose. The vowels chosen for analysis were /u/ (to), /i/ (be), and /a/ (divine). The results suggested singers arranged their vocal tracts such that the formant frequencies were more spread. A singer's formant frequencies were often on or close to a partial as well as the common partials of another singer. The spread formant frequencies may be an effort to hear oneself better so that the combined sound may seem larger and more expanded, more resonant – locked and rung! Barbershop quartets may be able to increase their resonance by adjusting their vowel quality. Success for this quartet was achieved through varied vowel production rather than attempting to sing exactly the same vowel – the opposite of choral singing.169

To look at choral blend, Jers designed an artificial singer (AS) and placed it in an anechoic room. Other artificial singers were placed in three different positions: 75 centimeters lateral to the first singer; 75 centimeters in front of the first singer; and four surrounding singers, with adjacent singers at 50 centimeters, front singers 75 centimeters in front of the singer, and 50 centimeters between the front singers. The first artificial singer (AS) was recorded simultaneously by 14 microphones placed 265 centimeters from it in an almost complete half circle. The results were presented in 3-D spherical plots with color differentiation for dB level. The lateral artificial singer (AS) measurements showed very little change in singer directivity as compared to solo level. The added frontal singer met expectations in that measurements showed increases in the higher frequencies. Sound pressure levels (SPL) varied with a 20-25 dB reduction in the frontal region and an increase of 10 dB in the rear region.
Singer directivity was directly influenced by surrounding singers. The overall sound pressure level (SPL) increased 5-10 dB – even for low frequencies and for the gaps between the singers. Results included confirmation of different singer directivity as measured by sound pressure level (SPL) when in a choir surrounded by singers using choral mode versus solo mode of singing. Jers suggested this supports the use of risers and looks to do future studies in this area.170 Libeaux (2007) created an artificial choir by recording live singers (N = 8) singing 3 dynamically varied song excerpts in an anechoic environment. These recordings were then synchronized, normalized,

169 Ternström, S., & Kalin, G. (2007). Formant frequency adjustment in barbershop quartet singing. International Congress on Acoustics, Madrid, September 2007, 1-6.

170 Jers, H. (2007). Directivity measurements of adjacent singers in a choir. 19th International Congress on Acoustics, Madrid, September 2007, (PACS: 43.75.Rs), 1-5.

and stored as wave files and then utilized to create the virtual choir. Participants (N = 35) individually sang with the newly created virtual choir. Each participant’s vocal part was removed from the virtual choir during their recording session. The virtual singers (VS = 8) were arranged in a half moon from the lowest voices (basses) on the singer’s left to the highest voices (sopranos) on the singer’s right. In other words, the virtual choir had seven virtual singers and one human singer participant, always a double quartet comprised of Bass 2, Bass 1, Tenor 2, Tenor 1, Alto 2, Alto 1, Soprano 2, and Soprano 1. The virtual choir was played sometimes from loudspeakers in the corners of the room and sometimes directly into the live singer’s headphones. Live singer participants (N = 35) concluded that the experience was more realistic (75% amateur, 67% semi-professional) when singing with the loudspeakers. When the live singers were asked how the musical integration was with regard to rhythm, intonation, and sound pressure level (SPL), again the loudspeakers were preferred. This study shows the validity and possibility of virtual choir studies and sets a framework for future design of virtual choir studies.171

Reid et al. (2007, voice scientist) utilized live participants (N = 26 professional opera chorus singers, n = 12 designated singers and n = 14 surrounding singers) to revisit the differences between choral singing mode and solo singing mode. The participants were recorded singing two excerpts in four conditions. The first excerpt was the last twenty-one bars of the Easter Hymn from Mascagni’s Cavalleria Rusticana. The second excerpt was the last sixteen bars of Torna a Surriento by De Curtis. In each condition, singers of the same voice classification surrounded a singer of the same voice classification. There were four such groups. Each singer took a turn in each condition as the designated singer in the center of the group.
The microphones were mounted seven centimeters from the left corner of the singers’ lips. In the first condition, the surrounding singers were one meter from the center singer. All sang the first excerpt as part of the ensemble. In the second condition, the center singer remained silent while the other singers sang as part of the ensemble. Both of these conditions were repeated with all singers at a distance of two meters. The final task was an individual recording of each participant singing the second excerpt as a soloist. The LTAS (long term averaged spectra) of the singing power ratio (SPR) and the energy ratio (ER) of the excerpts showed more relative energy in the higher frequency range where the singer’s formant (2800 Hz to 3200 Hz) occurs in chorus mode when compared with solo singing mode. In other

171 Libeaux, A., Lentz, T., Houben, D., & Kob, M. (2007). Voice assessment in choir singers using a virtual choir environment. 19th International Congress on Acoustics, Madrid, September 2007, (PACS: 43.57.Rs), 1-6.

words, these singers did not reduce the energy level in the singer's formant (Fs) range when singing in either mode – ensemble or solo. Both singing excerpts were in Italian to provide same-vowel excerpts for long term average spectra (LTAS) calculations of vibrato rate and extent. The vowels /o/ and /a/ were chosen. There was no significant difference in vibrato rate or extent for either vowel. However, there was a significant difference in the energy ratio (ER) for the choral mode of vowel /a/. One proposed reason was that if the singers sang louder in choral mode than they did in solo mode, that would account for the difference in the energy ratio. This differs from a previous study (Letowski, 1988) wherein a liturgical choir’s singers used a “dampened technique” for choral mode, resulting in a reduced energy ratio. Participants in this study stated they used only one vocal technique for both the solo and choral mode. The lack of difference in the vibrato rate and extent between the solo and ensemble modes also conflicts with prior research (Rossing et al., 1986, 1987 and Goodwin, 1980), which found that soprano singers reduce their vibrato extent when singing in ensemble. Reid et al. acknowledged singers in an opera chorus may be required to remain in solo mode even when singing in ensemble, which would explain the high energy in the singer’s formant region.172
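The energy ratio (ER) discussed above can be illustrated with a small calculation. The source does not define ER, so the sketch below assumes a common formulation: the spectral energy in the 2-4 kHz band relative to the energy in the 0-2 kHz band, expressed in dB. The function names, the band edges, and the synthetic two-tone signal are all assumptions for illustration; Reid et al. worked from long term average spectra of real recordings, whereas this sketch analyzes a single short frame.

```python
import cmath
import math

def band_energy(samples, sample_rate, f_lo, f_hi):
    """Sum of squared DFT magnitudes for bins with f_lo <= frequency < f_hi (DC bin skipped)."""
    n = len(samples)
    energy = 0.0
    for k in range(1, n // 2):
        freq = k * sample_rate / n
        if f_lo <= freq < f_hi:
            # Direct DFT of one bin; slow but dependency-free.
            x_k = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                      for i, s in enumerate(samples))
            energy += abs(x_k) ** 2
    return energy

def energy_ratio_db(samples, sample_rate):
    """Assumed definition: energy in 2-4 kHz relative to energy in 0-2 kHz, in dB."""
    high = band_energy(samples, sample_rate, 2000.0, 4000.0)
    low = band_energy(samples, sample_rate, 0.0, 2000.0)
    return 10.0 * math.log10(high / low)

# Synthetic test tone (not real voice data): a 200 Hz fundamental plus a
# weaker 3000 Hz component standing in for energy near the singer's formant.
SAMPLE_RATE = 16000
N = 800  # 50 ms frame; bin spacing is 20 Hz, so both tones fall on exact bins
tone = [math.sin(2 * math.pi * 200 * i / SAMPLE_RATE)
        + 0.1 * math.sin(2 * math.pi * 3000 * i / SAMPLE_RATE)
        for i in range(N)]

er_db = energy_ratio_db(tone, SAMPLE_RATE)
print(f"energy ratio: {er_db:.1f} dB")
```

Because the 3000 Hz component has one tenth the amplitude of the fundamental, the ratio for this synthetic tone comes out at about -20 dB; a voice with more energy in the singer's formant region would yield a higher (less negative) value under this definition.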

172 Reid, K., Davis, P., Oates, J., Cabrera, D., Ternström, S., Black, M., & Chapman, J. (2007). The acoustic characteristics of professional opera singers performing in chorus versus solo mode. The Journal of Voice, 21 (1), 35-45.

CHAPTER FOUR SUMMARY

Mayer (1964) believed the sum, the choral sound, could be greater than the individual voices. In order for the sum to be greater, a number of specific components of choral sound must be understood and applied. The choral conductor is the determiner, the expert of the desired sound. Moving from the imagined into the world of absolutes, a choral conductor who understands the acoustics of choral sound is well prepared for the challenge of perfect choral blend. Perfect choral blend implies that no one voice is heard above the others. We have learned that the self-to-other ratio (SOR) within a choir affects a singer's ability to hear oneself. When the singer is unable to hear oneself (the Lombard effect),173 the ability to self-regulate pitch, loudness, and voice quality is greatly compromised.174 Singers expressed preference for a 6.1 dB self-to-other ratio (SOR).175 To provide singers the greatest success in managing their self-to-other ratio (SOR), spacing between singers is critical. The preferred spacing seems to be 18-24 inches between singers both horizontally and vertically.176 Another factor of ensemble singing is the mode in which the singers produce their sound. In the choral mode, singers increase energy in the F0, whereas in the solo mode of production more energy is focused in the 2800-3200 Hz range, the area known as the singer's formant region (Fs).177 The Fs is the clustering of the third, fourth, and fifth resonance peaks that enables the singer to project over orchestras without amplification. Few conductors are given the opportunity to staff their choir completely with solo singers and often supplement with choir singers. This is the scenario in which choral conductors may ask the solo singers to revert to the choral mode of singing in an attempt to improve overall choral blend.
The payoff for this direction can be that choir singers often will be more vocally productive when solo singers revert to choral mode.178 The cost for this direction can be that solo singers may choose not to participate in choral singing so as to focus entirely on the development of solo technique.

173 Tonkinson, S. (1990).

174 Ternström, S. & Sundberg, J. (1988).

175 Ternström, S. (1999).

176 Daugherty, J. (1999, 2003).

177 Rossing et al. (1986).

178 Letowski et al. (1988).

From these beginning decisions, the choral conductor can move forward to decisions about the ensemble's acoustic possibilities regarding loudness (amplitude), timbre (quality of sound), and pitch (frequency). If indeed each of the choir singers is utilizing the solo mode of phonation, focusing energy in the region of the singer's formant (Fs), the ensemble will have greater amplitude. This amplitude will be perceived as louder than if the singers were producing in the choral mode of phonation, which reduces the singer's formant (Fs) and increases energy in the fundamental frequency (F0).179 Another factor impacting the overall amplitude of the ensemble is the amount and timing of sound reflections in the room. The impact of the room's acoustics can reflect an amplitude variance of 12 to 20 dB in a choir's output.180 Singers have expressed preference for rooms with early reflections, 15-35 milliseconds of reverberation, which in a semi-anechoic room would require reflecting wall distances of 2.6-6.0 meters. The effect of training did not have an impact on the preference for room reverberation.181 The amplitude of a trained singer's phonation is measurably different from that of untrained singers. Trained singers are able to sing softer phonation with greater resonance and pitch accuracy and are able to produce a variety of dynamic levels.182 Both trained and untrained singers tend to sing mezzo forte (medium loud, mf) closer to fortissimo (very loud, ff) than to pianissimo (very soft, pp).183 These factors lead us directly into the psycho-acoustical realm of perception. Singers, and choral conductors, have difficulty reproducing exactly the same forte (loud, f) as produced earlier in the composition, yet when asked will report that the same forte (loud, f) was produced. This type of error is consistent in perception research literature, and no effect is present for the amount of training.
Remember the tenors who were recorded before and after a performance of Mahler's Symphony No. 2 who reported no effect of vocal fatigue after the performance, but the recordings indicated marked reduction in vocal and dynamic range abilities.184
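The reflecting wall distances quoted in this summary for 15-35 millisecond reflections can be checked with simple arithmetic. The sketch below assumes that the reflection travels from the singer to the wall and back, and that the speed of sound is about 343 m/s (dry air near 20 °C); neither assumption is stated in the source, and the function name is hypothetical.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: dry air at about 20 °C

def wall_distance_m(delay_ms, speed=SPEED_OF_SOUND_M_PER_S):
    """Distance to a reflecting wall whose echo returns to the singer after delay_ms."""
    round_trip_m = speed * (delay_ms / 1000.0)
    return round_trip_m / 2.0  # sound travels out to the wall and back

for delay_ms in (15, 35):
    print(f"{delay_ms} ms -> {wall_distance_m(delay_ms):.1f} m")
```

Under these assumptions the two delays correspond to roughly 2.6 and 6.0 meters, in line with the wall distances reported above.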

179 Goodwin, 1980.

180 Ternström, 1993.

181 Marshall & Meyer, (1985).

182 Coleman, R. (1994).

183 Ibid.

184 Kitch et al., (1996).

The training effect is much evidenced in singers' quality of sound. Untrained voices are perceived as brighter, in contrast to trained voices which are often perceived as darker.185 This directly correlates with the continual preference expressed for covered vocal production, in other words, a vocal production in which all areas of the voice sound as one "color". Physiologically, when the quality of the sound is perceived darker, the singer raises the soft palate, widens the pharynx and laryngeal ventricles, and tilts the larynx forward.186 Anderson's auditor participants expressed preference for a dark vowel (/u/) in 81% of his recorded choral samples.187 In order for a choral ensemble to sound perfectly blended, the choral conductor must determine the exact vowel and the "color" of the vowel expected from the ensemble. Ternström found that increased resonance could be accomplished by complete singer unity of the vowel.188 The quality of the vowel (bright to dark) has been found to influence pitch accuracy. Vowels produced more forward in the vocal tract (/a/, /i/) have fewer intonation errors than those produced further back in the vocal tract, the darker vowels (/u/, /ɔ/).189 More importantly, pitch accuracy decreases when the singer cannot hear a clear, harmonically rich reference tone.190 When singers were provided with a clear reference tone prior to producing their tone, the amount of error was reduced by 50%. Additionally, when singers were given the opportunity to tune without the pre-determination of a prescribed tuning system, and instead tuned from the fundamental frequency of the previous chord, most especially if the fundamental frequency was a common tone with the next chord, greater pitch accuracy was achieved.191 Both the quality of the sound and the frequency are influenced by vibrato rate and extent. Interestingly, studies have found no effect for vowel on vibrato, yet a direct correlation to the energy ratio

185 Letowski et al., (1988).

186 Hertegard et al., (1990).

187 Anderson, (1993).

188 Ternström, (2007).

189 Ibid.

190 Ternström & Sundberg, (1988).

191 Howard, (2007a).

of the sound was significant in the vowel /a/ as compared to /i/.192 The energy a singer produces is influenced by many factors: physiology, self-to-other ratio (SOR), mode of phonation, and choral formation. Choral formation is a conductor's best opportunity to empower the success of the individual singers and thereby significantly impact the overall ensemble performance quality. Choral singers expressed an overwhelming preference for circumambient formation – that which spaces singers 18 to 24 inches from each other to the side, front, and back. Singers of the second row stand in the 18-24 inch space such that no singer is directly behind another, but to the right or left of the front singer. In addition to circumambience, the singers expressed preference for block sections of each vocal part. Singers found this block/sectional circumambient formation encouraged vocal ease of production and excellent self-to-other ratio (SOR), which in turn allowed for the greatest access to vocal abilities.193, 194, 195 An understanding of vocal abilities with respect to how they function within a choral setting was a primary goal of this writing. Choral sound is foremost composed of individual singers whose morphology, habitual voice use, and training have a direct impact on the ensemble's choral blend. The ensemble's sound is then molded by the choral conductor. The underlying power of the conductor is the foundation of knowledge and the ability to apply that knowledge. The acoustics of choral sound have been introduced here to provide a unified document in a concise format that can serve as a springboard for informed practice, rehearsal and study. Armed with the basics of acoustic choral music measurement, the educated, well informed choral conductor can create an atmosphere wherein the ultimate goal – perfected choral blend – is not only attainable, but well within artistic reach.
Once choral blend is firmly in the grasp of a superior ensemble, a synergized aural experience can come to both the composer's music and the joyous work of the ensemble, creating the phenomenon known as the choral experience.

192 Reid et al., (2007).

193 Daugherty, (1999, 2003).

194 Aspaas et al., (2004).

195 Jers, (2005).


CHAPTER FIVE DISCUSSION AND CONCLUSIONS

The investigations of vocal science researchers have illuminated aspects of individual sound that can be developed and monitored by educated choral conductors within the choral rehearsal and that will have direct, positive impact on the ensemble sound. Certainly the process of acoustic choral sound measurement is slow and often tedious – yet the knowledge gained by each experiment brings the collective understanding to newer plateaus which are discussed, improved upon, and result in new equipment, software, and processes for choral sound measurement. Each new investigation provides new insights into the phenomenon of perfect choral blend that can aid a choral conductor in perfecting their craft. From the outset, a choral conductor has a myriad of decisions to make prior to hearing a single sound. Some of these decisions can be aided by knowledge of the current physical understanding of choral sound. One such decision is influenced by the rehearsal and performance spaces. How large are these spaces? Prevailing knowledge garnered from auditor and singer preference is that vocal ease of production and the ability to hear not only one's self but also other singers is best accomplished with 18-24" between and around singers.196,197,198,199,200 If a conductor's rehearsal space is not able to accommodate this spacing when singers are horizontally arranged in rows, a conductor could investigate arranging the singers in a circle, or series of circles, around the centrally located conductor. In each of these arrangements, the standard singer windowing would apply – that is, singers in a row should leave 18-24" of space between their row members such that each singer stands in the 18-24" space between the singers in front of them. The effect of spacing between singers has been described in the literature in terms of the self-to-other ratio (SOR). Conductors could allow the singers to determine their own best spacing.
Ternström suggests the conductor first voice match the singers. Voice matching is a process whereby a conductor determines aurally where each singer should stand within a given vocal part (soprano, alto, tenor, bass) such that no

196 Daugherty, (1999).

197 Daugherty, (2003).

198 Woodruff, (2001).

199 Aspaas, (2004).

200 Ternström, (1999).

one individual voice is heard above the others. After voice matching is completed, have the vocal section (soprano, for example) minus one singer sing a unison vowel. The individual singer joins the section on the unison vowel and then moves within the assigned location until the singer can hear equally their own voice and that of the vocal section. Ternström refers to the singer's individual sound as the feedback and the other singers' sound as the reference. Singers have expressed preference for 6.1 dB of reference sound, with the lowest preference expressed by basses and the highest preference expressed by sopranos.201 Knowing this, a conductor could allow for more space between sopranos and less between basses. Among singers overall, 90% of the participants found the best self-to-other ratio (SOR) balance was achieved in a block sectional arrangement.202 In this arrangement, each voice part was equally spaced both horizontally and vertically such that a block was created for each voice part. Sopranos described this arrangement as one that afforded greater ease of vocal production as well as a greater ability to hear themselves. Altos expressed a greater ability to hear other vocal parts when arranged in a block (also known as column) sectional formation.203 Auditors expressed overwhelming preference for random formation and circumambient spacing – windowed, 18-24" spacing, and separation of voice parts.204 Perhaps conductors would do well to experiment with this formation of singers. The formation and spacing of singers can also aid in combating the Lombard effect. The Lombard effect occurs when singers are unable to hear themselves singing. To hear, the singers often use a brighter, edgier vocal timbre from which intonation errors can result.
Tonkinson found that informing the singers about the Lombard effect produced a 5% reduction in intonation errors in the subsequent music rehearsal.205 This is another example that supports conductors sharing their interpretation of the rehearsal challenges so that singers can be an active part of improving the choral experience. In addition to the Lombard effect, a conductor's ears must be sensitive to the amount of reverberation in the rehearsal and performance space. In 1985, Marshall and Meyer's singer participants expressed preference for 15-35 milliseconds of reverberation and extreme dislike for 40 milliseconds of

201 Ternström, (1999).

202 Daugherty, (1999).

203 Aspaas, (2004).

204 Daugherty, (2003).

205 Tonkinson, (1990).

reverberation.206 Yet, a live room as described by Daugherty had 2.7 seconds of reverberation at higher frequencies and 2.9 seconds at mid-range frequencies.207 A conductor may consider changing the formation and spacing of choir members to help singers adjust to the timing of the receipt of sound references. Sound references are a key component for good intonation. In fact, reference tones need to be approximately 25 dB for accurate pitch matching.208 Different vowels require different amplitudes for in-tune singing; for instance, /α/ requires the most amplitude.209 This is more important than one would expect, for the vowel of a reference tone was found to have more impact on intonation than did the amplitude of the reference tone.210 Accurate intonation is most successful when the reference tone is louder than the singer feedback; when the lowest common partials are audible to the singers; when high partials are present in the reference tone(s); and when the vibrato rate is small.211 Reference tones are what choir singers use to move from one note to the next. Choirs that have a high degree of intonation accuracy often sing the fundamental frequency with little vibrato and a full, resonant voice quality which is measured by the number of partials and harmonics present in the tone. When other notes that sound simultaneously have matching partials and/or harmonics, accuracy of intonation is said to be high. However, another effect that causes intonation errors is masking. Masking occurs when the singer is surrounded by other singers who are singing the same note, often louder, and therefore the singer cannot hear themselves. Just as in the Lombard effect, masking will also lead to intonation errors, for the singer cannot auto-correct what they cannot hear.
Masking has been found to cause intonation errors in the neighborhood of 50 cents in professional singers.212 When masking was added into a study design, singers sang ascending scales flat, descending scales sharp, sustained notes progressively sharp, and

206 Marshall and Meyer, (1985).

207 Daugherty, (2003).

208 Ternström and Sundberg, (1988).

209 Ternström and Sundberg, (1983).

210 Ternström and Sundberg, (1982).

211 Ternström and Sundberg, (1982, 1988).

212 Ternström et al. (1988).

vowels were modified from /α/ to /ɔ/ or /a/.213 Vowel glides and darker vowels (/u/) are particularly vulnerable spots for intonation errors when masking is present.214 The spectral variation in a tone, the accuracy of the frequency target, and the presence of partials and harmonics are most affected by vowel production. Additionally, the vowel production has direct impact on the singer's formant (Fs) and the fundamental frequency (F0). One aspect of the spectral variation has been attributed to the vocal tract in men and the glottis in women.215 Many conductors list vowel unification as a primary element of good choral blend. To that end, Anderson asked participants to identify a rogue vowel (sung by one-fourth of the choir while three-fourths of the choir sang the target vowel) in choral excerpts. The participants were successful (81%) in identifying the rogue vowel /u/, somewhat successful (61%) in identifying the rogue vowel /i/, and completely unsuccessful in identifying /α/.216 Ternström and Sundberg noted the tendency of choir singers to neutralize vowels, quite often toward /ɔ/.217 Conductors are often guilty of asking singers to modify the vowel because of a higher or lower frequency, because more space is desired in the sound, or because more of the singers will then be able to sound as if they are in vowel agreement. Yet, presence of the singer's formant (Fs) and vowel intelligibility are often sacrificed for these goals.218 One area of the singer's voice that will require a conductor's understanding is the area of the passaggio. The passaggio, an area of frequency production that requires vocal fold/tract realignment, is present in all voices as the singer moves from one vocal range area to another. The passaggio is an active area of research, for a clear explanation of its existence has yet to be developed. Yet, modifying the vowel with a more open pharynx and reduction of sub-glottal pressure has been found to allow the singer progression through the passaggio without the listener being aware of a change in vocal sound or production.219 The conductor must be aware of the passaggio areas of the singer's vocal range

213 Ibid.

214 Ternström et al., (1988).

215 Bloothooft, (1982).

216 Anderson, (1993).

217 Ternström and Sundberg, (1988).

218 Ternström and Sundberg, (1989).

219 Miller and Schutte, (1990).

and abilities and consider its impact on the music being performed. Score study will allow identification of areas of the music where the passaggio will influence the intonation, dynamic, and intensity successes of the choir singers. In contrast to existing pedagogy, Ternström found that a quartet of male singers was able to sing a choral passage in a highly resonant vocal technique by spreading their formant frequencies so that more common partials were accessed. In other words, the singers did not all sing exactly the same vowel but instead vowel variances that, once sung together, sounded as if one vowel were being produced. Additionally, each singer seemed to stretch (add or subtract cents) each target frequency to create the most excitement, or partials, in the sound. None of the standard tuning methods fit each chord, yet intonation was perceived to be excellent.220 Formant tuning refers to the process by which a singer adjusts the vocal tract to create the most resonance possible for a given sound. Conductors can aid their singers in understanding what areas need adjustment, or tuning, to create more resonance. In doing so, conductors should be aware that they will be assisting in the acquisition of the singer's formant (Fs). Specific frequencies are enhanced by molding the vocal tract for target vowels. For example, the first resonance reflected as a frequency peak (F1, or what some now call R1)221 is associated with the pharyngeal space and particularly /e/, /i/, and /ɨ/.222 When doing IPA (International Phonetic Alphabet) score study, a conductor could identify vowel occurrences and work with increasing or decreasing the pharyngeal space to achieve a unity of vowel pronunciation and an increase in resonant sound. Similarly, all vowels have an area of the vocal tract wherein the informed conductor could guide choir singers into accessing better enunciation and improved resonances. The accuracy of the target frequencies of the choir singers will enhance the amount of partials and harmonics in the sound, which can then be resonated throughout the vocal tract. The following exercise could be used for formant tuning, but was designed first to help with intonation. Conductors could consider tuning from chord to chord within musical phrases instead of from the beginning of a phrase to the end of the phrase. Singing in tune could be thought of as moving within tuning systems from chord to chord, which in the end would result in overall excellent intonation. Isolate two chords within a phrase that have a common tone from one chord to the next. Have the choir sing

220 Ternström, (2007).

221 Joliveau, (2004).

222 Fant, (1970).

the first chord. Cut off the three vocal sections that do not have the common tone for the next chord. The common tone should be held and then the vocal section that has the common tone in the next chord should join in unison. Once a unison has been achieved, the first vocal section should drop out to prepare for the next chord. Have the three vocal parts tune and enter with the held common tone.223 This procedure will help the choir singer to identify the reference tones from chord to chord – creating a horizontal singer in harmony with the standard vertical singer. A vertical singer will understand the function of his/her note within a chord and will constantly strive to tune vertically, which may or may not aid in creating the line. Other suggestions from research literature include that conductors should carefully select compositions with vocal ranges that match the pitch range of the choir singers.224 If the choir has great variance in vibrato, consider slower tempos with notes of longer duration to encourage vibrato synchronization, which in turn will improve the fundamental frequency (F0) accuracy. If intonation is a constant struggle, conductors could choose repertoire with more downward step motion versus upward step motion, which tends to go sharp. Additionally, choirs tend to more accurately sing downward perfect fourths versus upward perfect fifths.225 Being aware of these consistent intonation errors can provide conductors with composition areas of rehearsal focus. Comparison of amateur choirs versus professional choirs showed many of the same differences as did a comparison between solo and duet singing. Professional choirs were identified as having cleaner individual step movement, same-note repetitions with less deviation in frequency, and quicker synchronization of vibrato.
Amateur choirs had a greater diversity of beginning frequency onsets and struggled with vibrato synchronization.226 In duet singing, the onset of sound had a 29.6 millisecond difference and a vibrato reduction of 50% as compared to solo singing.227 This information emphasizes the need for rehearsal of each onset (and offset) to establish ensemble timing through symbiotic reaction to the conductor's gesture.

223 Howard, (2007a) and (2007b).

224 Howard, (2004).

225 Jers, (2005b).

226 Jers, (2005a).

227 Coleman, (1994).

97 Many would say that duet singing was not a valid comparison to choral singing. And yet, choral singing is deemed successful when every person's sound is in perfect unison with every other singer's sound joined together to create an ideal choral experience. If indeed researchers can determine the difficulties and/or the differences two professional singers have when singing together – then the choral experience will improve. For instance, musically educated auditors express preference for the solo mode of vocal production within a choral setting and especially when the singers were voice matched.228 Untrained singers exhibited richer, more vocally energetic voice quality when in choral mode versus solo mode.229 Yet, research has found that singers, both professional and amateur, sing with less amplitude in choral mode of singing which suggested the use of flow phonation, a glottal voice source difference.230 However, no reduction in vibrato rate or extent was found between choral and solo modes of singing except in very loud singing and the vowel /a/ at any loudness level. 231,232 So often professional, or more advanced singers, will reduce their singer's formant (Fs) when singing in choral mode and increase the 233 fundamental frequency (F0). Choir singers who are able to increase energy in the singer's formant (Fs) region will be able to produce greater dynamic diversity thereby providing the conductor with a broader range of sounds from which to convey the music. The sound pressure level (SPL) required for pianissimo (pp) to fortissimo (ff) has been found to not significantly vary in professional singers.234 Now the circle has returned to its starting point. Conductors have many decisions to consider before hearing the very first sound. 
Will the singers use the choral mode of phonation in which they enhance the fundamental frequency (F0) and subdue the singer's formant (Fs), or will the singers be asked to use the solo mode of phonation where energy will be focused in the 2800-3200 Hz region, the singer's formant (Fs), instead of the fundamental frequency (F0)? Will all of the singers phonate the same vowel in the same way, or will some of the singers vary the vowel to spread the formant frequencies in hopes of matching partials with another singer(s)? Will the rehearsal and performance space allow for

228 Eckholm, (2000).

229 Letowski, (1988).

230 Rossing, (1986).

231 Goodwin, (1980).

232 Reid, (2007).

233 Goodwin, (1980).

234 Weber, (1992).

18-24" of space between each singer? Will the singers determine their own self-to-other ratio (SOR)? Will the conductor voice match the singers? How will intonation errors be addressed? How will the repertoire be selected? Indeed, conductors are the fortunate ones who are empowered to create a single sound from the mouths of many. The knowledge that a conductor brings to this music experience affects creation of an experience that is not only unique to the moment, but has the possibility of achieving choral blend through a process that respects and develops each individual singer to create a musical experience of lasting significance – the choral experience.


APPENDIX A

GLOSSARY


An extensive glossary has been provided in this document that codifies terminology from music acoustics, voice science, choral studies, voice studies, equipment guides and usage, mathematics, and statistics. The goal of this glossary is to facilitate the intermingling of the many divergent disciplines that are utilized throughout this document and to provide a resource when reading documents not included in this writing. Many definitions have evolved over time, and in such cases the most recent definition is used in the glossary. Older definitions are explained in the review of literature annotations.

2800 factor: a strong set of overtones in the frequency range between the 3rd and 4th speech formants.

a cappella: without accompaniment.

accelerometer: a device used to measure vibration; its electrical output indicates its acceleration.

acoustic: relating to sound.

acoustic feedback: sound from a loudspeaker picked up by a microphone (either in the direct field or the reverberant field) and re-amplified.

acoustic impedance: 1. a measure of the difficulty of generating flow (as in the vocal tract); the ratio of the sound pressure to the volume velocity due to a sound wave. 2. the ratio of sound pressure to volume velocity. A graph of acoustic impedance of a musical instrument as a function of frequency shows peaks that correspond to the resonances of the air column.

acoustical signature: the unique sound of a space as determined by the room size, room shape, and the acoustical properties of all surfaces.

acoustics: the science of sound, including its production, transmission and effects; the sum of the qualities of an enclosure that determines the nature of the sound generated within it.

affricate: a speech sound that involves the two phases of a stop (vocal tract obstruction) and a prolonged frication.

aftersound: the second portion of a sound decay having a longer decay time.

algorithm: step-by-step directions for solving a problem (as in multiplying two numbers).

alto: short for contralto, the lowest of the female singing voices.

ambience: spaciousness; the degree to which sound appears to come from many directions.

amplifier: 1. a device in which a small amount of input power controls a larger amount of output power. 2. an electronic circuit that increases the level of a signal. 3. a device that increases the amplitude of a signal. Amplifiers are used to increase signal gain for purposes of recording, playback, or analysis.

amplitude: 1. the height of a wave; the maximum displacement of a vibrating system from equilibrium. 2. the magnitude of displacement for a sound wave. The waveform of a sound is represented on a two-dimensional graph in which amplitude is plotted as a function of the sound. 3. in phonation, amplitude is controlled by subglottal pressure.

analog: 1. refers to a signal (for example, in electrical or magnetic signals) that directly represents an acoustical signal. Analog media (such as vinyl or cassettes) make a permanent record of the sound using a continuously varying signal. 2. a signal that has continuous variations in amplitude. The radiated sound-pressure waveform of speech is an analog signal because its amplitude varies continuously in time.

analysis of variance (ANOVA): 1. a statistical test to assess the null hypothesis for the observed difference between two sample means. 2. a collection of statistical models and their associated procedures in which the observed variance is partitioned into components due to different explanatory variables.

analytical listening: listening to a complex tone in a way that individual components or partial tones are heard as separate entities.

anechoic: 1. echo free. 2. a reflection-free and therefore reverberation-free room or environment.

antiformant: a transfer function property in which energy is not passed effectively; opposite in effect to a formant. Antiformants, or zeros, arise because of divided passages or constrictions in a vocal tract.
articulation: moving those parts of the vocal tract (for example, tongue, lips, jaw) that change its shape and volume.

articulator: tongue, lips, teeth, hard and soft palates, which modify the acoustic properties of the vocal tract.

aural harmonic: a harmonic that is generated in the auditory system.

auralization software: utilizes pure sound recordings from anechoic chambers to predict how a room will acoustically sound by adding in the room size, room shape, and the acoustical properties of all surfaces.

autocorrelation: 1. the comparison of a signal with a previous signal in order to pick out repetitive features. 2. an analytical procedure in which a signal is correlated with a time-shifted version of itself (auto=self). If the signal is periodic, the autocorrelation function will have a peak at the time-shift value corresponding to a fundamental period. If the signal is aperiodic, the autocorrelation will not have conspicuous peaks at any time-shift value. Autocorrelation is sometimes used to determine the fundamental frequency of a speech signal. 3. a mathematical tool for describing how similar one segment of a sequence, in this instance a sound, is to a delayed segment of the same sequence. A sequence that repeats identically receives the maximum autocorrelation value when the delay equals the length of the repeated segment.

azimuth: the angular distance along the horizon between a point of reference, usually the observer's bearing, and another object.

bandwidth: 1. a measure of the frequency band of a sound, especially the resonance. Conventionally, bandwidth is determined at the half-power (3 dB down) points of the frequency response curve. Both the higher and the lower frequencies that define the bandwidth are 3 dB less intense than the peak energy in the band. 2. the frequency range of a signal. The sound on normal telephone links, for example, is restricted to frequencies from 300 Hz to 3400 Hz, and thus has a bandwidth of 3100 Hz.

barbershop: a style of popular music for unaccompanied single-sex voices in close harmony, originally four male voices. There are many female barbershop groups and larger barbershop choirs.

baritone: the male singing range between bass and tenor.
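The second sense of autocorrelation above can be illustrated with a minimal Python sketch. It assumes a synthetic 200 Hz sine wave sampled at 8000 Hz; the function names, the test tone, and the 50-500 Hz search range are illustrative assumptions and are not taken from any study cited in this document.

```python
import math

def autocorrelation(signal, lag):
    """Correlate the signal with a copy of itself delayed by `lag` samples."""
    return sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))

def estimate_f0(signal, sample_rate, f_min=50.0, f_max=500.0):
    """Return the frequency whose period gives the largest autocorrelation value."""
    shortest_lag = int(sample_rate / f_max)   # highest frequency searched
    longest_lag = int(sample_rate / f_min)    # lowest frequency searched
    best_lag = max(range(shortest_lag, longest_lag + 1),
                   key=lambda lag: autocorrelation(signal, lag))
    return sample_rate / best_lag

# Hypothetical test tone: 400 samples of a 200 Hz sine at 8000 Hz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 200.0 * n / SAMPLE_RATE) for n in range(400)]
f0 = estimate_f0(tone, SAMPLE_RATE)
print(f"Estimated fundamental frequency: {f0:.1f} Hz")
```

A real sung tone carries vibrato, flutter, and noise, so practical pitch trackers add windowing and peak interpolation that this sketch omits.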
basal pitch: the frequency at which the periodic fundamental frequency is no longer able to be analyzed.

bass: the lowest male singing voice.

beats: periodic variations in amplitude that result from the superposition or addition of two tones with nearly the same frequency. Beats occur when mistuning an interval slightly so that partials with nearly equal frequencies will sound together causing beats.
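As a hedged illustration of the beats entry, the sketch below uses the standard acoustical result that two nearly equal frequencies beat at a rate equal to their difference. The function names and the 220 Hz example are assumptions chosen for illustration. It compares a pure fifth (3:2) with an equal-tempered fifth by looking at the third partial of the lower tone and the second partial of the upper tone.

```python
def beat_rate(f1, f2):
    """Beats per second between two tones of nearly equal frequency."""
    return abs(f1 - f2)

def fifth_beat_rate(lower, upper):
    """Beat rate between the 3rd partial of the lower tone and the 2nd partial of the upper tone."""
    return beat_rate(3 * lower, 2 * upper)

# Pure fifth: the two partials coincide, so no beating is expected.
pure_fifth = fifth_beat_rate(220.0, 330.0)

# Equal-tempered fifth: seven semitones above 220 Hz, slightly narrower than 3:2.
tempered_upper = 220.0 * 2 ** (7 / 12)
tempered_fifth = fifth_beat_rate(220.0, tempered_upper)

print(f"Pure fifth: {pure_fifth:.2f} beats per second")
print(f"Equal-tempered fifth: {tempered_fifth:.2f} beats per second")
```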

Bernoulli effect: the effect in which the pressure in a fluid is decreased when the flow velocity is increased. bias: 1. the ratio between the parts of the distribution lying left and right of the mode.

2. that which is added to the desired signal to produce a composite. In the case of magnetic tape recording, for example, the bias may be either a constant magnetic field or a field that varies at a high frequency.

bimodality: the simultaneous use of two pitch collections.

binaural: sound reproduction using two microphones (usually a "dummy" head) feeding two headphones in order for the listener to hear the sound he or she would have heard at the recording location.

cardinal vowels: eight vowel sounds that serve as a standard of comparison for the vowels of various languages.

cardioid: The most common unidirectional microphone is a cardioid microphone, so named because the sensitivity pattern is heart-shaped. A hyper-cardioid is similar but with a tighter area of front sensitivity and a tiny lobe of rear sensitivity. A supercardioid microphone is similar to a hyper-cardioid, except there is more front pick up and less rear pick up. These three patterns are commonly used as vocal or speech microphones, since they are good at rejecting sounds from other directions.

cardioid microphone: a microphone with a heart-shaped directivity pattern designed to pick up sound in one direction preferentially.

cent: one-hundredth of a semitone or half-step; the frequency ratio for one cent is the 1200th root of 2, that is, 2^(1/1200) ≈ 1.000578. The equal tempered semitone is subdivided into cents with one cent being one hundredth of an equal tempered semitone; the number that when multiplied by itself 1200 times equals the frequency ratio of one octave.

cepstrum: a Fourier transform of the power spectrum of a signal. The transform is described in terms of quefrency (note the transliteration from frequency), which has time-like properties. The cepstrum is used to determine the fundamental frequency of a speech signal. Voiced speech tends to have a strong cepstral peak at the first rahmonic (note the transliteration from harmonic).
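The cent entry above lends itself to a short worked example. The Python sketch below is a minimal illustration; the names `CENT_RATIO` and `interval_in_cents` are hypothetical. It computes the frequency ratio of one cent and converts any pair of frequencies to an interval size in cents with the usual formula 1200 × log2(f_high / f_low).

```python
import math

# Frequency ratio of one cent: the 1200th root of 2.
CENT_RATIO = 2 ** (1 / 1200)

def interval_in_cents(f_low, f_high):
    """Size of the interval between two frequencies, in cents."""
    return 1200 * math.log2(f_high / f_low)

octave = interval_in_cents(440.0, 880.0)                    # one octave
semitone = interval_in_cents(440.0, 440.0 * 2 ** (1 / 12))  # one equal-tempered semitone

print(f"One cent is a frequency ratio of {CENT_RATIO:.6f}")
print(f"Octave: {octave:.1f} cents; semitone: {semitone:.1f} cents")
```

Working in cents rather than Hertz makes intonation deviations comparable across voice parts, because a given number of cents is the same musical distance at any pitch.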
cepstral fundamental frequency analysis: the cepstrum operation to determine the fundamental frequency of a tone.

channel: 1. a path for electrical current. 2. a path for computer signals; a path for electronic signals within a computer and a peripheral device.

channel gain correction: small asymmetries in microphone position and physical differences between subjects that might cause small variation in gain between the left and the right microphones. The equivalent level (Leq) (average power) in the two channels must be measured and compared to reveal a channel gain correction factor to the nearest 0.1 dB for each participant. This factor is then used to adjust the channel balance for

maximum cancellation of the participant signal.

chest voice (register): mode of singing associated with a heavy mechanism or active vocalis muscles.

choir: an organized group of singers who perform together, typically combining smaller groups of singers who sing different parts at different pitches.

choral: having to do with choirs.

choral blend: 1. aspects that contribute to choral blend, including dynamics, intonation, timbre, and temporal aspects relating to note onsets and offsets and consonants in the text. 2. each individual should strive to make the sound of his or her own voice similar in character to that which is prevalent in the group.

chorus: when one hears many singers doing the same thing, although we cannot distinguish any one of them, also referred to as ensemble.

chorus effect: the effect that arises when many voices, all with flutter and pitch scatter, combine and create a quasirandom sound of such complexity that the normal mechanisms of auditory localization and fusion are disrupted. In a cognitive sense, the chorus effect can magically dissociate the sound from its sources and endow it with an independent, almost ethereal existence of its own. The sensation of this extraordinary phenomenon, strongly perceived inside the choir, is one of the attractions of choir singing.

chromatic scale: an ascending or descending sequence of twelve tones, each separated by a semitone.

clip: (see distortion) 1. in phonetics, to shorten a speech sound.

clipping: the unpleasant effect that is achieved when a signal exceeds the dynamic range of the medium being used. In analog systems this is sometimes desirable in terms of the perceptual effect it has on the audio material. With a digital system, clipping occurs when the signal exceeds 0 dB FS and results in very harsh distortion, also known as distortion or clipping distortion.

closed phase: the portion of the vocal fold vibration cycle for which the folds are in contact, often referred to as CP.
closed quotient (CQ, Qclosed): 1. the amount of time in the vibratory vocal fold pattern in which the vocal folds are completely closed. 2. the percentage of a vocal fold vibration cycle for which the folds are in contact, often referred to as CQ.

closed phase (CO): encompasses the closing, closed and opening phases of the vocal fold

vibratory cycle.

closed quotient (CQ): the amount of time in the vibratory vocal fold pattern in which the vocal folds are completely closed.

coarticulation: 1. modification of speech sounds when they are connected to other sounds in a spoken sequence. 2. the phenomenon in speech in which the attributes of successive speech units overlap in articulatory or acoustic patterns. One feature of a speech unit may be anticipated during production of an earlier unit in the string (anticipatory or forward coarticulation) or retained during a production of a unit that comes later (retentive or backward coarticulation).

combination tone: a secondary tone heard when two primary tones are received. Combination tones are usually difference tones, although summation tones are possible.

common partials: those partial tones of two sounds that coincide in frequency when the two sounds are playing a harmonic dyad. For example, in a pure fifth, every second partial of the upper tone will coincide with every third partial of the lower tone.

compact disc (CD): a popular digital recording format, first appearing in 1982, surpassing vinyl record sales in 1988. A CD is two-track and uses a sample rate of 44.1 kHz and a resolution of 16 bits.

compact disc-recordable (CD-R): a digital data storage format, based on the same technology as the audio CD. It is used to store and distribute up to 700 MB of data files or audio material.

complex nonperiodic waveform: a waveform that exhibits no repeating pattern or cycle.

complex periodic waveform: a nonsinusoidal waveform that exhibits a repeating pattern or cycle.

condenser microphone: a microphone in which the diaphragm serves as one plate of a small capacitor or condenser. As the diaphragm moves, the electrical charge on the condenser varies.

consonance: two tones presented together with minimum roughness.

contact microphone: accelerometer microphone.

contralto: (see alto) lowest female singing part.
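The pure-fifth example in the common partials entry can be checked with a few lines of Python. This is a minimal sketch under simplifying assumptions: the tones are idealized with exactly harmonic partials, whole-number frequencies in Hertz are used so that equality comparisons are exact, and the function names are hypothetical.

```python
def partials(fundamental, count):
    """Frequencies of the first `count` harmonic partials of a tone."""
    return [fundamental * k for k in range(1, count + 1)]

def common_partials(f_lower, f_upper, count=10):
    """Partials of the lower tone that coincide with a partial of the upper tone."""
    upper = set(partials(f_upper, count))
    return [p for p in partials(f_lower, count) if p in upper]

# Pure fifth (3:2): 200 Hz and 300 Hz, first ten partials of each tone.
shared = common_partials(200, 300)
print(shared)
```

With these values the shared frequencies are the 3rd, 6th, and 9th partials of the lower tone, which are the 2nd, 4th, and 6th partials of the upper tone, as the entry describes.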
correlation: in statistics, the degree to which two or more variables are related when compared to each other.

countertenor: a male singing voice in the alto range that is primarily based on falsetto.

covered singing: vocal technique introduced by the French tenor Duprez in 1830, used for singing on open vowels, which smooths the register transitions near the passaggio. This technique allows for the passage between registers without perceptual differences in timbre. It is often described as a slight darkening of the voice quality, and acoustically as a higher level of the fundamental and a higher air flow. Physiologically, covering implies an elevation of the soft palate, a lowering and forward tilting of the larynx in addition to a widening of the supraglottal tract as well as the hypopharynx and the laryngeal ventricles. Some voice teachers will direct the singer to sing while yawning to achieve the effect of covered singing.

creaky voice: a voice quality that is low and rather broken up in pitch, in which the vocal folds are relaxed and vibrating in a low frequency, often with more than one closure per cycle.

crescendo: a gradual increase in loudness.

crico: lower cartilage of the larynx.

cricoarytenoids: muscles that rotate the arytenoid cartilages on the cricoid cartilage.

cricothyroids: muscles attached to the front of the cricoid cartilage that can change the relationships of the thyroid and cricoid cartilages.

critical band: 1. the frequency bandwidth at which subjective response (to loudness, pitch, etc.) changes rather abruptly. 2. the range of frequencies over which tones simply add in loudness; the critical bandwidth appears to determine consonance or dissonance.

critical bands of hearing: the perceptual relevance of a critical band is that tones of similar amplitudes falling into the same critical band merge into a buzzing sound unit so that the tones cannot be heard individually as two autonomous tones. As soon as the frequency separation exceeds a critical band, it is possible to hear each of the tones as autonomous tones.

critical distance: the sound source to listener distance at which the levels of the direct sound and the reverberant field are equal.

critical frequency: the frequency of bending (flexural) waves in a panel that can be excited by sound waves traveling at the same speed.

Cronbach's alpha: a measure of the reliability of a psychometric instrument. cross correlation: the comparison of two signals in order to pick out common features.

cues: the characteristics of speech sounds that help to identify the speech sound.

current: the flow of electrical charge measured in amperes, often abbreviated "amps."

current gain: the ratio of output current to input current.

cycle: a pattern that is repeated in a periodic waveform.

damping: 1. loss of energy of a vibrator, usually through friction. 2. energy loss in a system that slows it down or leads to a decrease in amplitude. 3. the rate of absorption of sound energy, related to bandwidth.

digital audio tape (DAT): a tape-based digital recording medium, introduced in 1987, mostly used in studios.

dB SIL: decibel level based on a sound intensity level measurement.

dB SPL: decibel level based on a sound pressure level measurement.

deci-: prefix indicating a tenth, which is 10^-1.

decibel (dB): 1. a dimensionless unit used to compare the ratio of two quantities (such as sound pressure, power, or intensity), or to express the ratio of one such quantity to an appropriate reference. 2. a dimensionless unit used for measuring sound intensity or sound pressure level. The decibel measurement is a ratio measurement that measures how loud a sound level is with respect to a reference (usually the softest sound that can, on average, be heard).

decrescendo: a gradual decrease in loudness.

de-emphasis: a process used in portable DAT machines, which apply a standard high-frequency emphasis, with a 50/15 μs characteristic, before recording the signal, and remove it on playback. When the signal is taken from a tape in digital form, its frequency response must be corrected, if fidelity is important. The DAT stereo signal should be transferred in digital form to disc files with a computer interface so that the frequency characteristic is de-emphasized digitally for digital signal processing (DSP).
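The dB SPL, dB SIL, and decibel entries above describe ratio measurements against a reference. The Python sketch below is a minimal illustration that assumes the conventional reference values of 20 micropascals for sound pressure and 10^-12 watts per square meter for sound intensity. Those reference values and the function names are assumptions added here and are not stated in the glossary.

```python
import math

P_REF = 20e-6   # assumed reference sound pressure, in pascals
I_REF = 1e-12   # assumed reference sound intensity, in watts per square meter

def db_spl(pressure_pa):
    """Sound pressure level in dB relative to P_REF."""
    return 20 * math.log10(pressure_pa / P_REF)

def db_sil(intensity_w_m2):
    """Sound intensity level in dB relative to I_REF."""
    return 10 * math.log10(intensity_w_m2 / I_REF)

one_pascal = db_spl(1.0)
doubling = db_spl(0.2) - db_spl(0.1)   # effect of doubling the sound pressure

print(f"1 Pa corresponds to {one_pascal:.1f} dB SPL")
print(f"Doubling the sound pressure adds {doubling:.2f} dB")
```

The factor of 20 for pressure and 10 for intensity reflects the fact that intensity is proportional to the square of pressure.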

∆: the Greek letter Delta, denoting change in quantity.

diaphragm: organ composed of muscles and sinews, separating the respiratory and digestive systems; partition between the chest and abdominal cavities.

diatonic scale: a scale of seven whole tones and semitones appropriate to a particular key.

difference tone: when two tones having frequencies f1 and f2 are sounded together, a difference tone with frequency f1 - f2 may be heard. If properly referenced, this should be called the quadratic difference tone to distinguish it from the cubic and other difference tones.

diffraction: the spreading of waves when they encounter a barrier or pass through a narrow opening.

digital: refers to a signal which is coded as a stream of binary numbers. When sound is digitized, the original analog signal is sampled many times per second and each sample is represented by a binary number. The result can therefore be easily stored and manipulated as part of a computer system.

digital-to-analog converter: a circuit that converts numbers from a digital to an analog representation.

digital oscillator: a circuit that assembles a sequence of numbers to represent the desired waveform.

DSP: digital signal processing.

duet: an instrumental or vocal composition written for two performers of equal importance.

digital versatile disc (DVD): a high-density recording medium, introduced in 1996 and used for film and computer data storage. Derivatives also include DVD-audio and HD-DVD (high-definition DVD).

diphthong: sound involving a gradual change in articulatory configuration from an onglide to offglide position. The usual phonetic symbol is a digraph, or combination of two symbols to represent the onglide and offglide portions.

direct sound: sound that reaches the listener without being reflected.

dissonance: in acoustical sound measurement, dissonance is described as roughness and results when tones with appropriate frequency difference are presented simultaneously.

distortion: 1. an undesired change in waveform, as in harmonic distortion, inter-modulation distortion. 2. a measure of the difference between the output and input signals in an amplifier. 3. signals that appear in the output of a sound reproduction system that were not present in the original program material.

Doppler effect: the shift in apparent frequency when the source or observer is in motion. dynamic microphone: 1. a microphone that generates an electrical voltage by the movement of a coil of wire in a magnetic field.

2. a microphone that generates an electrical signal when acoustic pressure waves cause a conductive coil to vibrate in a stationary magnetic field.

dynamic range: the difference in dB SPL between the maximum acceptable level and the noise floor of a system or microphone, being the useful variation between the quietest and loudest parts, which is very evident when viewed as a waveform.

dynamics: refers to the loudness or softness of an audio signal and its related variation over time. A dynamic audio signal is said to show significant variation between the loudest and softest parts, which is evident when viewed as a waveform.

early sound: sound that reaches the listener within a short time (about 50 ms) after the direct sound.

electret-condenser microphone: 1. a condenser microphone that has an electrified foil as a dielectric, thus eliminating the need for a polarizing voltage as required in an air-dielectric condenser microphone. 2. a type of condenser microphone in which the electrostatic charge on the plates of the capacitor is generated by an electret - a material that permanently stores an electrostatic charge.

electroaerometer: an airflow transducer that converts airflow into an appropriate electrical signal. It measures vital capacity, tidal volume, expiratory flow volume, and inspiratory capacity, which are compared with predicted values.

electrodes: as in electrolaryngograph electrodes – an electrical conductor used to make contact with a non-metallic part of a circuit.

electroglottograph: an electroglottograph offers a signal mirroring the opening and closing of the vocal folds. This is obtained by measuring the variations in a high-frequency current between surface electrodes placed on each side of the neck at the level of the glottis. The variation in the current is caused by the difference in the electrical impedance of the tissue when the glottis is open or closed and thus corresponds to the fundamental frequency of phonation.
The resulting waveform generally meets the demands of fundamental frequency detectors, and electroglottography is known as an excellent method for measurements of the fundamental frequency in speech (Fant et al., 1966; Fourcin, 1974).

electroglottography (EGG): a device for measuring changes in electrical impedance (resistance) at the glottis. (Two electrodes are placed on opposite sides of the thyroid cartilage of the larynx, and register a waveform for visual display.)

electromyography (EMG): an objective method available to study laryngeal muscle activity by providing information about the electrical activity resulting from the contraction of muscles or motor units. In laryngeal muscle study, EMG helps determine which laryngeal muscles are being used during different respiratory

and phonation conditions. An EMG can tell the investigator whether a muscle is operating, when a muscle starts and stops contracting, whether paired muscles fire in synchrony, and to what extent a muscle is contracting. The process involves invasive needle electrodes placed into the cricothyroid and thyroarytenoid muscles. Christy Ludlow used EMG non-invasively with surface electrodes for studies not requiring the same precision level and high-frequency responses.

energy ratio (ER): measures the balance in total energy between the low (0-2 kHz) and the high (2-4 kHz) ranges of the spectrum. It is calculated by taking the difference between the average energy in the low and high ranges of the LTAS. A low ER means that there is more energy in the high range and the singer's formant region relative to the energy in the low range of the spectrum.

enharmonic notes: two different notes that sound the same on keyboard instruments, for example, Gb and F#.

envelope: 1. time variation of the amplitude (or energy) of a vibration. 2. the amplitude of a tone as a function of time. 3. the manner in which amplitude varies with time; the envelope determines the attack and decay of a tone, among other things.

epiglottis: a thin piece of skin that protects the glottis during swallowing.

epilaryngeal tube: the cavity limited by the vocal folds, the epiglottis, the arytenoids, and the aryepiglottic folds.

equal temperament: 1. often the tuning used for pianos, organs, and electronic keyboards, which is based on each semitone having an ideal frequency ratio - the twelfth root of two. This results in each octave being perfectly "in tune." 2. a system of tuning in which all semitones are the same, namely a frequency ratio of 2^(1/12) ≈ 1.059.

equal tempered scale: the state in which no one interval is accurately in-tune apart from the octave, and every interval apart from the octave therefore has a degree of dissonance associated with it.

equalization: 1.
changing the gain of a sound system at certain frequencies to compensate for the room resonances and other peaks in the response curve. 2. frequency selective gain, defined by center frequency control, cut or boost applied, Q factor, and bandwidth. equalizer: electronic sound adjuster; an electronic device used to reduce distortion in a sound system by internally adjusting the system's response to different audio frequencies. f-test: statistical test to show the significance of variance changes.
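The equal temperament and equal tempered scale entries above can be made concrete with a small calculation. The Python sketch below is illustrative only; the A4 = 440 Hz reference and the function names are assumptions. It builds equal-tempered frequencies from the twelfth root of two and measures, in cents, how far the tempered fifth and major third fall from the pure ratios 3:2 and 5:4.

```python
import math

SEMITONE = 2 ** (1 / 12)   # equal-tempered semitone ratio, about 1.059

def equal_tempered_frequency(semitones_from_a4, a4=440.0):
    """Frequency of the pitch a given number of semitones above (or below) A4."""
    return a4 * SEMITONE ** semitones_from_a4

def deviation_from_pure_cents(semitones, pure_ratio):
    """Cents by which an equal-tempered interval differs from a pure (just) ratio."""
    return 1200 * math.log2(SEMITONE ** semitones / pure_ratio)

fifth_dev = deviation_from_pure_cents(7, 3 / 2)   # tempered fifth vs. pure fifth
third_dev = deviation_from_pure_cents(4, 5 / 4)   # tempered major third vs. pure third

print(f"Semitone ratio: {SEMITONE:.3f}")
print(f"Tempered fifth is {fifth_dev:+.2f} cents from pure")
print(f"Tempered major third is {third_dev:+.2f} cents from pure")
```

The octave is the only interval that comes out exactly pure, since twelve semitones multiply to a ratio of 2.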

falsetto: a voice quality that has a high fundamental frequency that is achieved by restricting the vibrating portions of the vocal folds.

fast Fourier transform (FFT): an algorithm commonly used in microcomputer programs to calculate a Fourier spectrum. The FFT is a special type of DFT in which the number of points transformed is a power of two. The number of points expresses the bandwidth of analysis; the higher the value, the narrower the bandwidth.

feedback: 1. in choral acoustics this refers to the sound of one's own voice. Singers will sing only as quietly as they can hear themselves. 2. use of an output signal to control or influence the input. Positive feedback, if great enough, can cause a system to oscillate. 3. an arrangement by which a portion of the output of an amplifier is applied to the input. Negative feedback reduces amplifier gain but also decreases distortion. Positive feedback increases the gain and may lead to self-oscillation. 4. acoustic feedback occurs when, for instance, a microphone signal is amplified and played back through the amplifier and speaker system again. The loop continues and results in a high pitched and unpleasant squeal if left unchecked, corresponding to a particular resonant frequency characteristic of the audio system in question. 5. the lower loudness limit dictated by the need for hearing the sound of one's own voice.

filter: 1. an electrical circuit that passes alternating currents of some frequencies and attenuates others. Basic filter types are high-pass, low-pass, band-pass, and band-reject. 2. a hardware device or software program that provides a frequency-dependent transmission of energy. Commonly, a filter is used to exclude energy at certain frequencies while passing the energy at other frequencies. A low-pass filter allows the frequencies below a certain cut-off frequency to be transmitted. A band-pass filter allows frequencies within a certain band to pass.
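The relationship described in the fast Fourier transform entry, where more points give a narrower analysis bandwidth, follows from the standard result that the spacing between FFT frequency bins equals the sample rate divided by the number of points. The Python sketch below is a minimal illustration; the function names and the 44.1 kHz sample rate are assumptions chosen to match the compact disc entry.

```python
def is_power_of_two(n):
    """True when n is a positive integer power of two, as an FFT length usually is."""
    return n > 0 and (n & (n - 1)) == 0

def bin_spacing_hz(sample_rate, n_points):
    """Frequency spacing between adjacent FFT bins, in Hertz."""
    if not is_power_of_two(n_points):
        raise ValueError("this sketch assumes the number of points is a power of two")
    return sample_rate / n_points

# Doubling the number of points halves the spacing between analysis bins.
for n_points in (512, 1024, 2048, 4096):
    print(f"{n_points:5d} points -> {bin_spacing_hz(44100, n_points):7.2f} Hz per bin")
```

The trade-off is temporal: a longer frame resolves closely spaced partials better but averages over a longer stretch of the sung tone.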
filters (high-pass and band-pass): acoustic elements that allow certain frequencies to be transmitted while attenuating others. A high-pass filter allows all components above a cut-off frequency to be transmitted; a band-pass filter allows frequencies within a certain band to pass.

flow glottogram graph (FGG): graph that can show the following: peak-to-peak flow amplitude, in milliliters per second; glottal leakage in milliliters per second, defined as the mean flow during the quasi-closed phase; period time in milliseconds; and duration of the quasi-closed phase in milliseconds.

flow phonation: a higher peak amplitude of the trans-glottal air flow waveform, which can be achieved by reducing the degree of glottal abduction activity and by lowering the subglottal pressure. In analysis, flow phonation usually presents itself as a higher amplitude in the LTAS in the low-frequency range in choral singing, which is compatible with a lower first formant frequency and with a higher amplitude of the voice source fundamental.

flow rate: the volume of air that flows past a point, measured per second.

flutter: 1. rapid changes in the speed of a phono turntable or tape transport that can cause a wavering of the musical pitch. Vibrato is a special type of flutter. 2. personal in character, with regard to amplitude and speed. 3. small variations in F0 that are too rapid to be perceived as pitch variations, yet too slow to affect timbre, in the approximate range of 5-15 Hz for example.

flutter level: a standard deviation in F0. Typical flutter levels sustained by choir singers are 10-15 cents.

flutter signal: a white noise added to the synthesized sound that has passed through a second-order resonant filter and usually has its own generator. The amplitude and frequency are important in decreasing the occurrence of beating.

formant(s): a resonance of the vocal tract specified by its center frequency (commonly called formant frequency) and bandwidth. Formants are noted by integers that increase with the relative frequency location of the formants (F1, F2, etc.).

formant frequency: the center frequency of a formant.

formant tracking: the resonant frequencies of the speaker's vocal tract are estimated by linear prediction at regular intervals through the signal, and the formants are identified from peaks.

formant transition: a change in formant pattern, typically associated with a phonetic boundary; for example, the CV formant transition refers to formant pattern changes associated with the consonant-vowel transition.

formant tuning: an enhancement in the energy of a song output achieved by moving a formant so that it lies over or close to the frequency of a harmonic.

forte (ƒ): loud.

fortissimo (ƒƒ): very loud.
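The flutter level entry defines flutter as a standard deviation in F0 and reports it in cents. One way this could be computed from a measured F0 track is sketched below in Python. The function name, the choice of the first sample as the cents reference, and the synthetic track are all assumptions made for illustration; the cited studies may have used a different procedure.

```python
import math
import statistics

def flutter_level_cents(f0_track):
    """Standard deviation of an F0 track, expressed in cents.

    Each F0 value is converted to cents relative to the first sample; the
    standard deviation does not depend on which reference frequency is chosen.
    """
    reference = f0_track[0]
    cents = [1200 * math.log2(f / reference) for f in f0_track]
    return statistics.pstdev(cents)

# Hypothetical F0 track alternating 10 cents above and below 440 Hz.
track = [440.0 * 2 ** (offset / 1200) for offset in (10, -10, 10, -10)]
level = flutter_level_cents(track)
print(f"Flutter level: {level:.1f} cents")
```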

Fourier analysis: also known as spectral analysis, the determination of the component tones that make up a complex tone or waveform.

Fourier synthesis: creation of a complex tone or waveform by combining its spectral components.

Fourier transform: a mathematical procedure that converts a series of values in the time domain (waveform) to a set of values in the frequency domain (spectrum). The spectrum is the Fourier transform of the waveform.
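As an illustration of the Fourier transform entry above, the following sketch computes a discrete Fourier transform directly from its definition and recovers the two component tones of a synthetic waveform. The sampling rate, tone frequencies, and the 0.1 detection threshold are arbitrary values chosen for the example; practical analysis software would normally use an FFT routine instead of this slow direct sum.

```python
import cmath
import math

def dft(samples):
    """Discrete Fourier transform computed directly from the definition."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

sample_rate = 800   # samples per second (assumed for the example)
n = 80              # number of samples, so each bin is 10 Hz wide

# Time domain: a 100 Hz tone plus a weaker 300 Hz tone.
wave = [
    math.sin(2 * math.pi * 100 * t / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 300 * t / sample_rate)
    for t in range(n)
]

# Frequency domain: amplitude of each component up to half the sampling rate.
spectrum = dft(wave)
magnitudes = [2 * abs(c) / n for c in spectrum[: n // 2]]
bin_hz = sample_rate / n

peaks = [k * bin_hz for k, m in enumerate(magnitudes) if m > 0.1]
print(peaks)
```

Because both tones complete a whole number of cycles within the 80 samples, their energy falls into single bins; with real recordings the energy spreads over neighbouring bins and a window function is usually applied first.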


Fourier transformation analysis: the Fourier construct of any signal consists of the addition of regularly varying functions of the sine wave type, each with higher and higher frequency, the amplitudes or strengths of which will differ according to the nature of the original signal.

frame: a set of points taken as a single unit of analysis. Software that performs multiple operations over an extended set of data often performs the operations on successive frames or blocks of data. In speech analysis systems, the frame is the temporal interval in which operations were performed.

free field: 1. a reflection-free environment, such as exists outdoors or in an anechoic room, in which sound pressure varies inversely with distance. 2. that part of the sound field where the sound level decreases by 6 dB for each doubling of distance.

frequency: 1. the number of vibrations per second, expressed in Hertz (Hz). 2. the rate of vibration of a periodic event; for example, a periodic sound has a frequency measured as the number of cycles of vibration per second (expressed in Hertz, Hz).

frequency domain operation: an operation that is performed in the frequency domain, for example, with an FFT or LPC spectrum.

frequency modulation (FM): the method of radio broadcasting in which the frequency of a carrier wave is determined by the audio signal. The FM band, which extends from 88 to 108 MHz, allows stations sufficient bandwidth to transmit high-fidelity stereophonic, and even quadraphonic, sound.

frequency perturbations: jitter.

frequency response: how the gain in a system varies with the frequency.

fricative: 1. a speech sound, such as the consonants in Sue, zoo, show, Joe, fee and vee, that involves acoustic noise as its voice source (voiceless fricatives) or acoustic noise as well as vocal fold vibration (voiced fricatives), resulting from the airstream flowing through a narrow gap in the vocal tract. 2. a speech sound characterized by a long interval of turbulence noise. Fricatives are often classified as stridents or non-stridents, depending on the degree of noise energy.

fricatives: consonants that are formed by constricting air flow in the vocal tract, such as f, v, s, z, th, sh, etc.

function generator: an audio generator that provides several different waveforms or functions at the desired frequency.
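The free field entry above states that sound pressure varies inversely with distance, which corresponds to a drop of about 6 dB per doubling of distance. A minimal sketch of that arithmetic follows; the function names and the 80 dB reference level are assumptions made for the example.

```python
import math

def level_change_db(d_from, d_to):
    """Change in sound level (dB) when moving from d_from to d_to in a free field.

    Pressure varies as 1/distance, so the level changes by -20*log10(d_to/d_from).
    """
    return -20.0 * math.log10(d_to / d_from)

def level_at_distance(level_ref_db, d_ref, d):
    """Level at distance d, given a reference level measured at d_ref."""
    return level_ref_db + level_change_db(d_ref, d)

# Hypothetical reading of 80 dB at 1 m, projected to 2 m and 4 m.
for distance in (1.0, 2.0, 4.0):
    print(distance, round(level_at_distance(80.0, 1.0, distance), 1))
```

The relation applies only to the direct sound; in a room, the reverberant field eventually dominates and the level stops falling with distance.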

fundamental(s): 1. the lowest common factor in a series of harmonic partials. The frequency of a periodic waveform is the reciprocal of its period. 2. first harmonic of the pitch being sung. 3. the mode of vibration (or component of sound) with the lowest frequency.

fundamental frequency (f0, F0): 1. the number of repeating cycles of a periodic waveform occurring in one second. The unit is the Hertz or Hz. 2. the lowest frequency (first harmonic) of a periodic signal. In speech, the fundamental frequency refers to the first harmonic of the voice. Fundamental frequency is the reciprocal of the fundamental period. Ideally, fundamental frequency is used to refer to a physical measure of the lowest periodic component of the vocal fold vibration. Pitch should be used to indicate the perceptual phenomenon in which stimuli can be rated along a continuum of low to high (see pitch determination algorithm). 3. the frequency with which the vocal folds repeat their oscillatory motion in phonation. The "fundamental" is another name for the first (lowest) partial tone, whose frequency corresponds to the periodicity of the sound (cycles per second).

fundamental maximum (F0 max): the mode of the distribution.

fundamental mode: the mode of lowest frequency.

glide: a consonant sound that has a gradual (gliding) change in articulation reflected by a relatively long interval of formant-frequency shift.

glottal stop: the sudden release of a glottal closure.

glottis: 1. the v-shaped opening between the vocal folds. 2. the sudden release of a glottal closure.

harmonic: a series of partials with frequencies that are simple multiples of a fundamental frequency. In a harmonic series, the first harmonic would be the fundamental, the second harmonic the first overtone.

harmonic distortion: 1. the generation of harmonics by altering the waveform, for example, by clipping the peaks. 2. the creation of harmonics (frequency multiples) of the original signal by some type of nonlinearity in the system (the most common cause is over-driving some component).

harmonic-to-noise ratio: the degree of periodicity in the sound.

head voice (register): 1. upper range of the singing voice. 2. found to differ from the chest/modal register in that it involved higher pitch, smaller amplitude of vocal fold vibrations, relatively longer closed segments, and shorter closing time (reflected in an increased speed quotient). It differed from the falsetto/loft register in that the vibratory cycle and reduced closing time.

heavy mechanism: a term sometimes used to describe the predominant role of the vocalis muscle; "chest voice."

Helmholtz resonator: a vibrator consisting of a volume of enclosed air with an open neck or port.

hemi-anechoic: used to describe a room with structural reflections from the floor only.

hertz (Hz): cycles per second - the international system of units (SI) base unit of frequency.

histogram: (fundamental frequency histogram) a graph of a frequency distribution in which rectangles with bases on the horizontal axis are given widths equal to the class intervals and heights equal to the corresponding frequencies.

Hixon's kinematic method: a method of measuring usable air volume as it applies to vocal production. Hixon proposes that the chest wall is made up of a two-part system of the rib cage and the abdomen. Together, the rib cage and abdomen may displace a volume equal to that displaced by the lungs. To estimate these volumes, one looks at the changes in diameter(s) using magnetometers. Through this line of research, a well-defined set of kinematic patterns associated with speech in normal subjects has been documented.

hyper-functional phonation: pressed tone quality.

hypo-functional phonation: breathy tone quality usually caused by poor glottal adduction.

impedance: 1. the ratio of the pressure to the velocity in a sound wave. 2. in electricity - a measure of the opposition to the flow of electric current by a circuit element such as a resistor, capacitor, or inductor. Impedance is measured in ohms. 3. in microphones - the ratio of voltage to current. In the case of source impedance, or output impedance, it is the current that the device can deliver. In the case of input impedance, it is the current that the device draws from the source.

inharmonic overtones: overtones whose frequencies, unlike those of harmonics, are not multiples of the fundamental.

inharmonic partial: a partial or overtone that is not a harmonic of the fundamental.

inharmonicity: the departure of the frequencies of partials from those of a harmonic series.

intensity: power per unit area; rate of energy flow that indicates changes that are measured by reference to a sound level meter.

interference: the interaction of two or more identical waves, which may support (constructive interference) or cancel (destructive interference) each other.

inter-modulation distortion: 1. generation of sum and difference tones. 2. (IM) distortion - the creation of sum and difference frequencies from signals of two different frequencies.

interval: 1. a frequency ratio expressed in a logarithmic unit. 2. the distance between two frequencies (pitches).

inverse filtering: a technique that compensates for the effect of the vocal tract resonances (formants). Ideally, this provides a graph showing the transglottal air flow waveform during phonation. This is done by tuning so-called inverse filters to formant frequencies so that a smooth spectrum envelope, unaffected by formant peaks, and a smooth flow glottogram are obtained. The FGG mirrors the transglottal airflow variations.

jitter: irregularity in the period of time of vocal fold vibrations; cycle-to-cycle variation in fundamental frequency. Jitter is often perceived as hoarseness.

just interval: an interval formed by two tones whose frequency ratio can be expressed by small integers. If a just interval is formed by two tones with harmonic spectra, then some of the partials from the two tones will coincide in frequency. For example: fifth, 3:2; major third, 5:4.

just intonation: intervallic tuning based on integer ratios taken from the natural harmonic series that attempts to make thirds, fourths, and fifths as consonant as possible. It is based on major triads with frequency ratios 4:5:6. Chords tuned in just intonation will have the maximum degree of consonance associated with them because the maximum number of neighboring harmonics of each note are in unison and the tuning is maximally beat-free.

kilohertz (kHz): 1,000 Hertz.

laminar flow: a type of airflow in which the air moves in smooth layers; contrasts with turbulence.

larynx: 1. the source of sound for speaking and singing. 2. the organ in the neck that is used for voiced sounds; sometimes referred to as the voice, it is the source of sound for speaking and singing. The larynx is the human sound generator, producing quasiperiodic pulses of air through vibration of the vocal folds, and is often referred to as the gatekeeper of the lungs. 3. the section of the vocal tract, composed mostly of cartilage, that contains the vocal folds.

larynx closed quotient measurement with electrolaryngograph: larynx closed quotient (CQ) is measured from the electrolaryngograph output waveform. This allows the nature of vibration of the vocal folds to be monitored non-invasively via two electrodes that are placed superficially on either side of the neck at larynx level and held in place with an elastic neckband. A constant-amplitude, high-frequency voltage is applied between the electrodes, and the electrolaryngograph monitors the current flow, which varies as the vocal folds vibrate. The electrical output waveform, usually denoted Lx, is indicative of the inter-electrode current flow.

light mechanism: term sometimes used to describe dominant vocal-ligament activity; may refer to head voice.

limen: just noticeable difference (jnd).

linear predictive coding (LPC): 1. describes a speech waveform in terms of a set of time-varying parameters derived from speech samples. 2. class of methods used to obtain a spectrum. Linear predictive coding uses a weighted linear sum of samples to predict an upcoming value.

logarithm: the power to which 10 or some other base must be raised to give the desired number.

logarithm scale: a scale on which moving a given distance right or left multiplies or divides by a given factor.

Lombard effect: 1. the tendency to increase one's vocal intensity in noise. 2. first described by Etienne Lombard in 1911, the Lombard effect is the spontaneous tendency of speakers to increase their vocal intensity when talking in the presence of noise.

long-term average spectrum (LTAS): an average of a number of short-term spectra.

loudness: 1. the subjective assessment of the strength of a sound, which depends on its pressure, frequency, and timbre; loudness may be expressed in sones. 2. the perceived level of a sound, which changes as acoustic intensity changes.

loudness level: sound pressure level of a 1000 Hz tone that sounds equally loud when compared to the tone in question. Loudness level is expressed in phons.

major diatonic scale: a scale of seven notes with the following sequence of intervals: two whole tones, one semitone, three whole tones, one semitone.

major triad: a chord of three notes having intervals of a major third and a minor third, respectively.

manner: a description used with voice and place to describe the way in which speech sounds are articulated.

masking: 1. the obscuring of one sound by another. 2. the phenomenon that one cannot hear a normally audible sound because of the presence of a competing sound.

maximum flow declination rate (MFDR): amplitude of the negative peak of the differentiated flow glottogram.

mean: in mathematics, a value that is intermediate between other values, such as an average or expected value.

meantone: see Pythagorean tuning.

messa di voce: singing the same pitch, while the loudness is varied from soft to loud and then from loud back to soft with no change in timbre.

measurement: an act of or the process of ascertaining the extent, dimensions, quantity, etc.

mezzo-forte (mf): medium loud.

mezzo-piano (mp): medium soft.

mezzo-soprano: the female singing voice that is between an alto and a soprano.

MF0: the arithmetic mean of the distribution, the time averaged such that the voice pitch is perceived to match a target pitch.

MFON: mean fundamental average of all participants.

MFOV: mean fundamental average of all participants.

MFOV, N: one value of the mean fundamental average of all participants and the scatter over the singers.

mHz: an abbreviation for millihertz (megahertz is abbreviated MHz).

microphone: a device that converts sound pressure variation into an electrical voltage signal.

microtone: any interval smaller than a semitone.

middle register: a combination of light and heavy mechanism that lies between the head and chest registers.

minor scale: a scale with one to three notes lowered a semitone from the corresponding major scale. In the key of C minor, the three minor scales are:
natural: C D Eb F G Ab Bb C
harmonic: C D Eb F G Ab B C
melodic (ascending): C D Eb F G A B C
melodic (descending): C D Eb F G Ab Bb C

minor triad: a chord of three notes having intervals of a minor third and a major third, respectively (as C : Eb : G).

modulate: to change some parameter (usually amplitude or frequency) of one signal in proportion to a second signal.

modulation transfer index: a measure used in room acoustics that predicts how well the variations in the loudness of a source are transmitted to a listener. In rooms with little reverberation, the loudness at the listener's position will faithfully track that of the source (MTI close to 1), while in rooms with much reverberation the loudness at the listener will be smeared to an almost constant level (MTI close to 0). The MTI value correlates rather well with the intelligibility of speech in a room.

monaural: sound reproduction using one microphone to feed a single headphone, such as is used in telephone communication.

monitor: generally refers to listening to the output of an audio system. Hence, loudspeakers are sometimes called monitors. Also, that aspect of a PA system that relates to the onstage performers. The monitor mixing desk in a large PA system is located on or near the stage. The monitor speakers (also known as wedges) are those used to project sound to the performers rather than the audience.

monophonic: sound reproduction using one microphone to feed one or more loudspeakers with one signal.

musical staff (pl: staves): a five-line graph on which musical notes are written.

narrow-band analysis: an analysis in which the analyzing bandwidth is relatively narrow (such as 45 Hz in speech analysis). A narrow-band analysis is preferred when the interest is to increase frequency resolution, as in the analysis of harmonics for a man's voice.

nasal: a speech sound that involves nasal radiation of sound energy, either with or without an accompanying oral radiation.

nasal cavity: nose.

nasal formant: the low-frequency resonance associated with the nasal tract. For men's speech, the nasal formant has a frequency of less than 500 Hz.

nasals: consonants that make use of resonance of the nasal cavity, such as m, n, ng.

near field: that part of the sound field where the sound level varies from point to point because of the radiation pattern of the source.

node (or nodal line): 1. a point or line where minimal motion takes place. 2. points or lines that do not move when a body vibrates in one of its modes.

non-voiced sounds: see voiceless sounds.

nomogram: a graph usually containing three parallel scales, graduated for different variables so that when a straight line connects values of any two, the related value may be read directly from the third at the point intersected by the line.

normal modes: independent ways in which a system can vibrate.

normalization: a correction for variance. Speaker normalization refers to the correction or scaling that reduces variability in acoustic measures such as formant frequencies. Time normalization refers to the correction or scaling that reduces variability in the duration of sound sequences.

normalize: to conform to a standard, model, or pattern.

Nyquist sampling theorem: This theorem states that a digital representation requires at least two sampling points for every periodic cycle in the signal of interest. Therefore, the sampling rate of digitization should be at least twice the highest frequency of interest in the signal to be analyzed. Unfortunately, the term Nyquist frequency is inconsistently used. Some use it to indicate the highest frequency of interest in an analysis; others use it to refer to twice the highest frequency of interest, that is, to the sampling rate needed to prevent aliasing.

octave: the basic unit in most musical scales. Notes judged an octave apart have frequencies nearly in the ratio 2:1.

ohms: measures impedance in electricity.

omni-directional: a microphone directivity pattern that is perfectly uniform; that is, the microphone is equally responsive to sounds incident from any angle.

open phase (OP): 1. the part of the vocal fold vibratory cycle in which the vocal folds are completely open. An open phase does not include the process of opening or closing the vocal folds, only the complete open stage. 2. the portion of the vibration cycle for which the folds are apart.

open quotient (OQ): the percentage of a vocal fold vibration cycle for which the folds are apart.

oral cavity: mouth.

oscilloscope: a device that depicts on a screen periodic changes in an electric quantity, such as voltage or current, using a cathode-ray tube or similar instrument.
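The Nyquist sampling theorem entry above can be made concrete with a small calculation. The sketch below gives the minimum sampling rate for a chosen highest frequency of interest, and the apparent (aliased) frequency of a tone when the sampling rate is too low. The function names and the example frequencies are assumptions for illustration.

```python
def minimum_sampling_rate(highest_frequency_hz):
    """Lowest sampling rate that gives two samples per cycle of the highest frequency."""
    return 2.0 * highest_frequency_hz

def alias_frequency(tone_hz, sampling_rate_hz):
    """Apparent frequency of a sampled tone, folded into 0..sampling_rate/2."""
    folded = tone_hz % sampling_rate_hz
    return min(folded, sampling_rate_hz - folded)

# A hypothetical analysis interested in components up to 5 kHz.
print(minimum_sampling_rate(5000.0))       # sampling rate needed, in Hz

# A 7 kHz component sampled at 10 kHz is not represented correctly:
print(alias_frequency(7000.0, 10000.0))    # it appears at this frequency instead
print(alias_frequency(3000.0, 10000.0))    # a 3 kHz component is unaffected
```

This folding is why recordings are low-pass filtered before digitization: components above half the sampling rate would otherwise reappear as spurious lower frequencies in the spectrum.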

other-signal: the sum of all external sounds perceived by the singer.

oversampling: a method for increasing the rate of digital samples to the DAC in order to avoid the need for an analog filter with a sharp cutoff.

overtone(s): 1. a mode of vibration (or component of sound) with frequency greater than the fundamental frequency. 2. upper partials or all components of a tone except the fundamental. 3. harmonics that are over the fundamental, where the first overtone is the first tone over the fundamental (the second harmonic), the second overtone is the third harmonic, etc.

paired t-test: for matched paired samples where two groups are matched on a particular variable (see t-test).

palate: the roof of the mouth.

partial tone: (partial) 1. one of the components in a complex tone. It may or may not be a harmonic of the fundamental. 2. a mode of vibration (or component of a sound); includes the fundamental plus the overtones.

passaggio: the point of transition between the chest/modal register and the adjacent higher register.

peak clipping: limiting the amplitude of a waveform so that peaks in the waveform are eliminated; this distorts the waveform.

peak-to-peak pulse amplitude (Up-t-p): the measurement between the peak and the trough of a waveform.

Pearson product-moment correlation (Pearson correlation): a statistical measurement obtained by dividing the covariance of the two variables by the product of their standard deviations.

pentatonic scale: a scale of five notes used in several music cultures, such as the Chinese, Native American, and Celtic cultures.

period: 1. the smallest increment of time over which a waveform repeats itself. 2. the time duration of one vibration; the minimum time necessary for the motion to repeat. 3. the time that a repeating cycle lasts.

periodic quantity: a quantity that repeats itself at regular time intervals.
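The Pearson correlation entry above defines the statistic as the covariance of two variables divided by the product of their standard deviations. The sketch below follows that definition step by step; the function name and the paired values are hypothetical.

```python
import math

def pearson_correlation(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return covariance / (std_x * std_y)

# Hypothetical paired measurements, e.g. singer spacing (m) and a measured level (dB).
spacing = [0.5, 1.0, 1.5, 2.0, 2.5]
level = [1.0, 2.9, 4.2, 5.1, 6.3]
print(round(pearson_correlation(spacing, level), 3))
```

The result always lies between -1 and 1. The same divisor (n here) must be used for the covariance and both standard deviations, otherwise the ratio is no longer bounded in that way.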

periodicity pitch: pitch determination on the basis of the period of the waveform of a tone.

perturbation: term for describing a complicated system (phonation) in terms of a simpler one.

perturbation measures: indices of irregularity or instability, especially in the laryngeal waveform. The common measures of perturbation include jitter, shimmer, and signal-to-noise ratio.

phantom power: 12 to 48 V DC applied to pins 2 and 3 of the microphone connector, required to make non-electret condenser microphones work. It is usually supplied from the microphone input of the mixing desk, but can also come from an internal battery. Although only used for condenser-type microphones, phantom power will not damage the internal workings of a dynamic microphone if used in error.

pharynx: 1. lower part of the vocal tract which connects the mouth to the trachea. 2. the lower part of the vocal tract connecting the larynx and the oral cavity. 3. the vocal tract region between the larynx and the velum.

phase: the fractional part of a period through which a waveform has passed, measured from a reference. Phase is often expressed as an angle that is an appropriate fraction of 360°.

phase cancellation: occurs when two acoustic (or electric) signals are added together that are not exactly matched, and hence are not in phase. Peaks and troughs in the signal waveform do not line up, and this can result in a signal of lower amplitude than the original, or additional peaks or troughs in the overall frequency response. Phase cancellation typically occurs when a signal plus a very slightly delayed version of the same signal are added together, and thus may happen due to reflections from a surface being added to the direct sound at a microphone or due to timing differences when using multiple spaced microphones.

phase difference: 1. a measure of the relative positions of two vibrating objects at a given time; also the relative positions, in a vibration cycle, of a vibrating object and a driving force. 2. the difference in phase angle between two simple harmonic motions or waves. (If the phase difference is zero, they are in phase; if it is 180°, they are in opposite phase.)

piano (p): soft.

pianissimo (pp): very soft.

pitch: particular frequency of a single note.

pitch determination algorithm (PDA): (also known as pitch extraction) a procedure used to extract the fundamental frequency of a speech signal. Although the term pitch strictly should be used to refer to a perceptual phenomenon, it is often used in speech analysis to refer to fundamental frequency.

phon: a dimensionless unit used to measure loudness level; for a tone of 1000 Hz, the loudness level in phons equals the sound pressure level in decibels.

phonation: 1. a sound with a sound source that involves vocal fold vibration. 2. requires proper functioning of three interrelated systems: respiratory, laryngeal, and articulatory. Subglottal aerodynamic power is converted into laryngeal and acoustic dynamics with fine tuning by the kinesthetic and auditory senses.

phonation frequency: fundamental frequency (F0).

phonemes: individual units of sound that make up speech.

phonetics: the study of speech sounds.

phonetogram: (also called a voice profile) recordings of minimum and maximum phonatory sound level as functions of fundamental frequency, illustrated as a graph showing the sound pressure level (SPL) of softest and loudest phonation over the entire fundamental frequency (f0) range of a voice. A phonetogram is a display of intensity range versus fundamental frequency (f0).

phono: a one-signal-wire-plus-screen hi-fi connector carrying a line level signal and used extensively on semiprofessional recording equipment.

pitch: 1. that attribute of auditory sensation in terms of which sounds may be ordered on a scale extending from low to high. 2. the perception of tones on a scale from low to high, essentially, but not entirely, due to changes in fundamental frequency.

pitch scatter: arises when voices in an ensemble exhibit small differences in mean fundamental.

pitch shift: sound recording technique in which the normal pitch or tone of the sound is altered.

plosive: a speech sound that involves a complete vocal tract closure behind which lung air pressure builds up, and the closure is released to create the characteristic sound.

plosives: consonants that are produced by suddenly removing a constriction in the vocal tract (p, b, t, d, k, g).

plethysmograph: 1. a process in which a patient is placed in a sealed box with the face freed for full phonation. The changes of volume within the box, which occur as the patient changes thoracic air volume, are recorded by connecting the box to a spirometer face. Its basic function is to measure airflow. First described by Mead in 1960 and then modified by Hixon and Warren. 2. a type of airflow transducer that converts airflow into an appropriate electrical signal.

pneumotachometer: a type of airflow transducer that converts airflow into an appropriate electrical signal. When added to a ventilograph, moment-to-moment airflow can be measured.

portamento: sliding from one note to another rather than changing the pitch abruptly.

power gain: the ratio of output power to input power.

power source: in the context of voice production, the energy source for sound creation, which is provided from the lungs as an outflow of air by the breathing mechanism.

preamp: an electronic circuit and the first stage of amplification used to boost a signal level. The preamp is found in the input stages of a mixing desk for boosting microphone levels, also as external stand-alone units and as part of the internal workings of a condenser microphone.

preemphasis: in speech analysis, a filtering that boosts high-frequency energy relative to low-frequency energy. Because speech normally contains its strongest energy in the low frequencies, these frequencies would dominate analysis results if preemphasis were not performed.

pressed phonation: characterized by an elevated degree of glottal adduction which yields less SPL for a given Ps than a more neutral mode of phonation.

prevoicing: the onset of voicing before the appearance of a supra-glottal articulatory event; for example, for stops, prevoicing means that voicing precedes the stop release (also called voicing lead).

proprioception: awareness of stimuli produced by one's own tension, relaxation, movement, or function, resulting in muscle, vibratory, or auditory sensation.

prosodic feature: a characteristic of speech, such as pitch, rhythm, and accent, that is used to convey meaning, emphasis, and emotion.
proximity effect: the bass boost that occurs when using a cardioid-type microphone placed close to a sound source. The closer the microphone, the greater the low-frequency boost.

psychophysics: the study of the relationship between stimuli and the sensations they produce.

pure sound: sound without sound reflections or echoes, a resultant of recordings from anechoic chambers. Pure sound is utilized in acoustical design research and with auralization software.

pure tuning: pure intonation (also known as just intonation) is any musical tuning in which the frequencies of notes are related by ratios of whole numbers.

Pythagorean comma: a formula supporting equal temperament tuning of pianos, for it shows that if one were to tune the piano by securing ascending P5ths as one moved around the circle of 5ths, the result would be sharp octaves. If one were to tune descending P5ths (around the circle of 5ths), the result would be flat octaves. The Pythagorean comma illustrates the small difference between two kinds of semitones (chromatic and diatonic) in the Pythagorean tuning; it is a frequency ratio of 1.0136, corresponding to 23.5¢.

$$\text{Pythagorean comma} = \frac{(3/2)^{12}}{2^{7}} = \frac{129.746}{128.000} = 1.01364$$
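The figures in the Pythagorean comma entry above can be checked numerically. The sketch below computes the ratio of twelve perfect fifths to seven octaves as an exact fraction and converts it to cents; the `cents` helper and the variable names are assumptions introduced for the example.

```python
import math
from fractions import Fraction

def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(ratio)

# Twelve pure fifths (3:2) compared with seven octaves (2:1).
twelve_fifths = Fraction(3, 2) ** 12
seven_octaves = Fraction(2) ** 7
comma = twelve_fifths / seven_octaves

print(comma)                          # exact ratio as a fraction
print(round(float(comma), 5))         # decimal frequency ratio
print(round(cents(float(comma)), 2))  # size of the comma in cents

# For comparison, an equal-tempered semitone is the twelfth root of 2.
print(round(cents(2 ** (1 / 12)), 2))
```

Using `Fraction` keeps the ratio exact until the final conversion, so the small difference between the two large numbers is not lost to rounding.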

Pythagorean tuning: 1. a system of tuning based on perfect fifths and fourths. 2. the only intervals constant in Pythagorean tuning are the octaves, fourths, and fifths, which could all be derived by the simple operation of numbers one to four. The divisions were based on a philosophical numerological principle rather than on the acceptability of the sounds of the intervals themselves. Therefore, the frequency ratio of the interval of the major third was the rather complex one of 81:64.

Qclosed: closed quotient of the vocal fold wave.

quartet: four persons singing and/or playing an instrument.

quadraphonic: sound reproduction using four microphones to feed four loudspeakers; usually two are in front of the listener and two are behind or to the sides.

quantization: the assignment of discrete values to the amplitude dimension of an analog signal. Quantization is the process by which a continuous variation in amplitude is represented as a sequence of discrete values. This process is necessary to represent the signal in a digital computer.

quantization noise: a signal distortion that results from an inadequate number of quantization levels in digitizing a signal.

radiation characteristic: the term in source-filter theory associated with the radiation of sound from the lips to the atmosphere. It is typically expressed as a 6 dB per octave increase in sound energy (hence, a high-pass filter).

real-time: an operation which takes no more time than the incoming signal itself.

reference: (in choral acoustics) the sound of the rest of the choir as the singer hears them. The level difference between the reference and the feedback is one of the more important acoustic factors in choir singing. Weak feedback or reference generally leads to intonation problems.

register: 1. a group of related notes on a musical instrument; one register, for example, may include all notes whose pitch corresponds to the lowest resonance of an air column. 2. a phonation frequency range in which all tones are perceived as being produced in a similar way and possessing a similar quality.

resonance: 1. occurs when a vibrator is driven by a force that varies at a frequency at or near the natural frequency of the vibrator. 2. in electricity, the natural frequency of a system at which its response to a mechanical or electrical force reaches a maximum. 3. a preferred frequency of a system, which in the case of the vocal tract is the set of formants associated with different articulations that create various vocal tract shapes during speech and singing.

reverberant field: that part of the sound field in which sound level is independent of distance from the source.

reverberant sound: 1. sound that builds up and decays gradually and can be "stored" in a room for an appreciable time. 2. sound that reaches the listener after a large number of reflections; as one moves away from a sound source, the sound level reaches a steady value called the reverberant level.

reverberation radius: the sound source to listener distance at which the levels of the direct sound and the reverberation field are equal.

reverberation time: the time required for the stored or reverberant sound to decrease by 60 dB.

rf: radio frequency.

rhinomanometer: a device used to measure nasal inspiratory flow and pressure.

roll off frequencies: frequencies selected to be filtered out of the sound so that other frequencies predominate.

Rothenberg flow mask system: see pneumotachometer.

roughness: a difference between male and female voice character. Male voices tend to have roughness whereas female voices will be more smooth. In extreme forms, roughness implies that one can hear the individual air pulses during phonation. Roughness is dependent on the critical bands of hearing.

rounding: an articulatory description referring to the rounding or protrusion of the lips. As applied to vowels, rounding is associated with a lowering of the frequencies of all formants.

second-order beats: beats between two tones with frequencies that are nearly but not quite in a simple ratio; also beats between mistuned consonances.

self-signal: the sum of the airborne sound (not counting room reflections) and the bone-conducted sound that the singer perceives of his or her own voice.

self-to-other ratio (SOR): the level difference in decibels between self and others, with a stronger self signal represented by a positive SOR value. The SOR will increase with increased spacing between singers, because the direct sound from immediate neighbors becomes weaker with distance. The amount of reverberation in the room governs the intensity of the diffuse field, which is dominated by the other sound; hence, the SOR decreases with increasing reverberation.

sensitivity (microphone): 1. voltage or power generated in a microphone at a given sound pressure level (SPL). 2. the unloaded output voltage of a microphone determined by placing it in front of a reference sound source with a measured sound pressure level of 94 dB at 1 kHz. This value for SPL is the same as a pressure value of 1 Pa.

semitone: 1. one step on a chromatic scale, normally 1/12 of an octave. 2. a half-step. In equal temperament, a semitone corresponds to 100¢ or to a frequency ratio of 1.059.

SFON: standard deviation of fundamental stability of one singer.

SFOV: standard deviation measurement of scatter in FO across singers.

SFOV, N: standard deviation measurement of the fundamental average of all participants and the scatter over all the singers.

shimmer: an index of instability in the laryngeal waveform, usually measured as variation in the amplitude of successive glottal cycles.

short term spectrum: a single spectrum.

sidebands: sum and difference tones generated during modulation.

signal-to-noise ratio: 1. the ratio (usually expressed in dB) of the average recorded signal to the background noise. 2. the ratio between the measured noise floor or noise level of a medium and a reference signal transmitted through this medium; typically for microphones, a 1 kHz test tone at 94 dB SPL. 3. a measure of the ratio between signal energy and noise energy. In speech analysis S/N usually refers to the periodic energy relative to noise energy.

SIL: sound intensity level.

simple harmonic motion: smooth, regular vibrational motion at a single frequency such as that of a mass supported by a spring.

sine wave: a waveform that is characteristic of a pure tone (that is, a tone without harmonics or overtones) and also simple harmonic motion.

singer's formant: 1. a resonance around 2500-3000 Hz in male (and low female) voices that adds brilliance to the tone. 2. a resonance in the 2.5 kHz to 4 kHz region that gives a voice projection over an orchestra, often described as its ring. 3. widely used by classical singers and teachers. 4. singers project their voices over an orchestra by adding an additional region of energy peaks within a particular frequency region of the voice spectrum corresponding to the shape of the epilaryngeal tube. 5. occurs when singers cluster formants 3, 4, 5 which in turn generates the singer's formant (Fs). 6. located at an optimal frequency, high enough to be in the region of declining orchestral sound energy but not so high as to be beyond the range in which the singer can exercise good control. Because it is generated by resonance effects alone, it calls for no extra air pressure.

singing formant: quality singing that has a characteristic "ring" identified as a harmonic strength centering around 2.8 kHz. This timbral quality is relatively independent of other factors.

singing power ratio (SPR): a measure of the singer's formant energy. The SPR measures the difference between the peak levels in the low range (0-2 kHz) and the high range (2-4 kHz) of the spectrum.

sinusoidal: pertaining to a sine wave; a pure tone or frequency of vibration.

sinusoidal force: a smoothly varying force with a single frequency; the waveform is described as a sine wave.

smear: the standard deviation over voices of the scaling factor up or down of formant clusters.

solo: one person singing and/or playing an instrument.

sone: a unit used to express subjective loudness; doubling the number of sones should describe a sound twice as loud.

soprano: the female singing voice above mezzo-soprano.

source-filter theory: a theory of the acoustic production of speech that states that the energy from a sound source is modified by a filter or set of filters. For example, for vowels, the vibrating vocal folds usually are the source of sound energy and the vocal tract resonances (formants) are the filters.

sound modifiers: the cavities of the vocal tract which lie between the sound source and the lips and/or nostrils.

sound power level: $L_W = 10 \log_{10}(W/W_0)$, where $W$ is sound power and $W_0 = 10^{-12}$ W (abbreviated PWL or $L_W$).

sound pressure level: $L_p = 20 \log_{10}(p/p_0)$, where $p$ is sound pressure and $p_0 = 2 \times 10^{-5}$ N/m² (or 20 micropascals) (abbreviated SPL or $L_p$).

sound source: the mechanism that converts power from the power source into sound, which in speech is either the vibrating vocal folds for voiced sounds, or air being forced past a constriction in the vocal tract for a voiceless sound.

sound spectrograph: an instrument that displays sound level as a function of frequency and time for a brief sample of speech or song.

sound spectrographic analysis: primary tool used in acoustic analysis of the voice. This analysis is based on the Fourier theorem which states that any periodic waveform can be analyzed into a series of sine waves with different frequencies, amplitudes, and phase relations. The fundamental (repetition) frequency and harmonics (integral multiples of the repetition) can be determined. The fundamental frequency value gives a clue to abnormalities, but does not establish a cause for the problem.

source spectrum: a plot of amplitude against frequency.

spectral analysis: (also known as Fourier analysis) determination of the component tones that make up a complex tone or waveform.

spectral dominance: a view that certain partials dominate in the determination of the pitch of a complex tone.

spectral smear: defined as such dispersion of formants 3 to 5 as arises from differences in vocal tract length.

spectrogram: 1. a graph of sound level vs. frequency and time as recorded on a sound spectrograph or similar instrument. 2. the output plot from a spectrograph, which is plotted either in color or grey scale with frequency on the vertical axis, time on the horizontal axis, and the color or degree of grey indicating the energy level at that frequency at that point in time. 3. a pattern for sound analysis containing information on intensity, frequency, and time. The typical spectrogram provides a three-dimensional display of time on the horizontal axis, frequency on the vertical axis, and the intensity on the grey scale. A spectrogram can be printed as hard copy or displayed on a video monitor.

spectrograph: a machine or computer program that carries out an analysis of the energy in an acoustic signal across frequency and time. The output, known as the spectrogram, is plotted either in color or grey scale with frequency on the vertical axis, time on the horizontal axis, and the color or degree of grey indicating the energy level at that frequency at that point in time.

spectrum: 1. the "recipe" for a complex tone that gives the amplitude and frequency of the various partials. 2. a plot of energy against frequency. A single spectrum is known as a short-term spectrum, and an average of a number of short-term spectra is known as a long-term average spectrum, or LTAS. 3. a graph showing the distribution of signal energy as a function of frequency; a plot of intensity by frequency.

spectrum analysis: used to track formant frequencies in singing; to modify the definition of vibrato.

spirometer: consists of a cylinder, resting in water, in and out of which a patient breathes while movements are recorded. A spirometer records vital capacity.

SPL: sound pressure level.

spoiler: an obstacle in the path of airflow. In the production of fricative sounds, the upper and lower teeth may serve as spoilers.

standard deviation (SD): a measure of the dispersion of a set of values, defined as the root-mean-square (RMS) deviation of the values from their mean, or as the square root of the variance.

stereophonic: sound reproduction using two microphones to feed two loudspeakers.

stop: a speech sound characterized by a complete obstruction of the vocal tract; usually followed by an abrupt release of air that produces a burst of noise.

stop gap: the acoustic interval corresponding to articulatory closure for a stop or affricate.

strident: a fricative with an intense noise energy; also called a sibilant; /s/ is an example. The nonstrident fricatives have less energy.

stroboscope: a light that flashes at a regular rate, making possible a photographic record of motion.

stroboscope tuner: a tuning device that makes use of a rotating pattern illuminated by flashing lights.

stroboscopy: comes in two basic forms; the flexible fiberoptic laryngoscope and the rigid laryngoscope. Both provide visual analysis of respiration, phonation, glottal effort closure, and swallowing.

subglottal pressure: 1. main physiological control parameter for vocal loudness. 2. expressed in terms of the normalized excess pressure, defined as the ratio between the excess pressure above the threshold pressure and the threshold pressure.

support: a term employed by voice teachers to indicate control of the power source from the region of the diaphragm.

supraglottal studies: studies of the spatial, aerodynamic and acoustic characteristics of the area superior to the larynx. Equipment utilized includes: radar x-ray devices; nasal airflow devices including the rhinomanometer; palatal sensor devices; craniofacial research which includes intra-oral airflow and pressure, lingual and palatal pressure, oral sensory, perception and visualization techniques.

synthesizer: an instrument that creates complex sounds by generating, altering, and combining various electrical waveforms, generally by means of voltage-controlled modules.

synthetic listening: (holistic listening) listening to a complex tone in a way that focuses on the whole sound rather than the individual partials.

syntonic comma: the small difference between a major or minor third in the Pythagorean and just tunings.

tenor: the male singing voice above the baritone and below the countertenor.

tension: the force applied to the two ends of a string, or around the periphery of a membrane, that provides a restoring force during vibration.

tessitura: the pitch range that predominates in a piece of music.

thyroid cartilage: the largest single cartilage of the larynx.

thyroid-arytenoids: muscle arising below the thyroidal notch and inserted into each arytenoid.

timbre: 1. an attribute of auditory sensation by which two sounds with the same loudness and pitch can be judged dissimilar as determined by the spectrum. Formants 3, 4, 5 are used to ascertain voice timbre. 2. a perceived difference between sounds that is not related to a change in pitch, loudness, or duration.

TIME: the total sample time defined as the sum of the period times considered in the calculations.

time-domain operation: an operation that is performed in the time domain, for example, calculations performed with respect to the waveform of a sound.

token: each measured unit of speech, song, syllable, word, and utterance.

tongue advancement: an articulatory description referring to the relative position of the tongue in the anterior-posterior (front-back) dimension of the vocal tract. As applied to vowels, tongue advancement relates primarily to the relative frequency of F2, or to the frequency difference between F1 and F2. Front vowels tend to have relatively high F2 values and a relatively large value of the F2-F1 difference.

tongue height: an articulatory description referring to the relative position of the tongue in the inferior-superior (low-high) dimension of the vocal tract. As applied to vowels, tongue height relates primarily to the relative frequency of F1; the higher the vowel, the lower F1 tends to be. Tongue height also varies with jaw position, such that high vowels tend to have a closed jaw position.

transducer: a device that converts one form of energy into another; for example, acoustic energy to electrical energy.

tremolo: rapid reiteration of a note; a trill.

tremolo: (tremulant) a device on an organ that produces a vibrato, usually by varying the air pressure.

triad: a chord of three notes; in the just tuning, a major triad has frequency ratios 4:5:6 while a minor triad has ratios 10:12:15.

truncated: shortened by having a part cut off or removed.

t-test: any statistical hypothesis test in which the test statistic follows a Student's t distribution if the null hypothesis is true. It is applied when samples are small enough that using an assumption of normality and the associated z-test leads to incorrect inference.

turbulence: a condition of airflow in which eddies (rotating volume elements of air) are generated. This condition is associated with noise energy (therefore we speak of turbulence noise). Turbulence contrasts with laminar flow.

turbulent flow: fluid flow characterized by eddies and vortices; the flow velocity tends to vary randomly.

turning point: 1. the point at which reflection of a wave occurs at the open end of the bell, tubing, or vocal tract. 2. the point in a musical instrument at which most of the sound wave is reflected back toward the mouthpiece.


Tx: the measurement of fundamental period.

ultrasonography: first used by Hertz (1970) to visualize vocal cord movement and then by Kelsy to study pharyngeal wall motion, involves the use of a transducer to produce a frequency in the ultrasonic range. Sound waves are reflected off target structures and picked up by a sensor. In other words, high frequency sound is reflected from the air/tissue interface of the vocal folds as they move during phonation.

value in cents: 1. a formula for converting fine tuning frequency values to cents wherein one cent is 1/100th of a semitone, and there are twelve semitones in one octave. 2. value in cents = 3986.3137 x log10 (fine tuning value/440).

variable-pitch recording: the spacing of the groove on a disc according to the amplitude of the recorded material.

velocity microphone: a microphone that responds to particle velocity rather than to sound pressure.

velum: the soft palate, which can be moved up to close off the nasal cavity from the airstream and down to open it.

ventilograph: equipment used to provide information on basal minute ventilations and maximum breathing capacity. Often a pneumotachometer is added to this equipment, wherein moment to moment airflow speed can be determined.

vibrato: 1. tonal effect in music resulting from periodic variation of amplitude, frequency, and/or phase. 2. frequency modulation (FM) that may or may not have amplitude modulation (AM) associated with it. Some musicians speak of "intensity vibrato," "pitch vibrato" and "timbre vibrato" as separate entities; others understand vibrato to include all three. Sometimes the term tremolo is used to describe other things, such as rapid reiteration of a note or even a trill. 3. periodic modulation of the fundamental frequency often associated with operatic singing. 4. corresponds to a regular undulation of fundamental frequency. Vibrato is characterized by the rate (the number of undulation cycles per second) and the extent (the magnitude of the greatest departures from the average typically varying from ± 6 to ± 12%). 5. laryngeally based, a phenomenon with an aesthetically pleasing modulation in pitch and intensity. The modulation in pitch varies approximately ± 1-2 semitones, and the rate is typically between 5.5 and 7.0 undulations per second.

virtual pitch: subjective pitch created by two or more partials in a complex tone (for example, the "missing fundamental" of a filtered tone and the strike note of a bell).

viscosity: the property of a fluid that resists the force tending to cause the fluid to flow; the measure of the extent to which a fluid possesses this property.

vocal chords: a popular but misunderstood term for the vocal cords or vocal folds.

vocal cords: 1. a term replaced by vocal folds that refers to the folds of ligaments extending across the larynx that interrupt the flow of air to produce sound. 2. another term for vocal folds.

vocal fold vibration: 1. the source of sound energy for singing and speaking. 2. vocal folds moving open and closed emitting puffs of air. 3. the manner of vocal cord motion in generating such a source spectrum is suggested by the analogue to the sound-producing action of oboe reeds. Several different scraping techniques have been developed, each of which produces a characteristic timbre. The American scrape leaves a small spine extending into the thinnest portion of the reed. This extra material makes for a slightly stiff reed that opens and closes more slowly, and results in a mellow tone. The French scrape, on the other hand, does not leave any excess material in the thin portion of the reed. The reed ends are therefore more flexible and their ability to open and close more rapidly accounts for its bright tone. The specific reed-making procedures employed by the professional oboist are the result of extensive practice and personal preference, but to the experienced listener, the resultant tone is almost as distinctive as the tone of a particular singer.

vocal folds: the vibrating muscles in the larynx that provide the sound source during voiced sounds.

vocal fry: prolonged vocal-fold rattle; a "frying" sound produced through nonperiodic vocal-fold vibration; a glottal scrape, rattle or "click," considered by some researchers a separate low voiced register.

vocal personality: due largely to the voice timbre determined by the higher formants (F3, F4, F5) which are directly related to the vocal tract length.

vocal tract: 1. the tube connecting the larynx to the mouth consisting of the pharynx and the oral cavity. 2. the oral cavity (mouth) and the nasal cavity (nose). 3. comprised of the larynx, the pharynx, and the mouth. 4. a resonant chamber something like the tube of a horn or the body of a violin. 5. shape is determined by the positions of the articulators: the lips, the jaw, the tongue, and the larynx. 6. vocal tract can be elongated by protruding the lips.

vocalis muscle: the thyroarytenoid muscle.

vocoder: a combined speech analyzer and synthesizer ("voice coder").

voice: 1. description used (with place and manner) to describe whether a speech sound is voiced or voiceless. 2. the sound made by the human vocal instrument. 3. sounds generated by the voice organ, including the vibrating vocal folds, or to be more precise, by means of an air stream from the lungs, modified first by the vibrating vocal folds, and then by the rest of the larynx, and the pharynx, the mouth and sometimes the nasal cavities. 4. a synonym of "voiced sound."

voice box: common popular term for the larynx.

voice onset time (VOT): a measure of the time between a supraglottal event and the onset of voicing. For stops, VOT is the interval between release of the stop (usually determined acoustically as the stop burst) and the appearance of periodic modulation (voicing) for a following sound.

voice organ: that which facilitates phonation. Voice organs include the lungs (power supply), the vocal folds (an oscillator), and the larynx, pharynx, and mouth (a resonator).

voice prints: speech spectrograms from which a speaker's identity may be determined.

voice range profile: interchangeably used in the literature with phonetogram and Stimmfeld, is a display of intensity range versus fundamental frequency.

voice quality: 1. defined by the two lowest formants F1 and F2. 2. the quality of the produced vibrato, the relation between loudness and pitch over the entire vocal range, and the spectral character of the vocal sounds.

voice registers: 1. perceptually distinct regions of voice quality that can be maintained over some ranges of pitch and loudness (modal or chest, head, falsetto for example). 2. a series or range of consecutively phonated frequencies which can be produced with nearly identical vocal quality.

voiced sounds: sounds in speech or singing that involve the vibrating vocal folds.

voice source: 1. the sound generated by the air stream chopped by the vibrating vocal folds. 2. the raw material for speech or song. 3. a complex tone composed of a fundamental frequency (determined by the vibratory frequency of the vocal folds) and a large number of higher harmonic partials or overtones.

voicing: adjusting organ pipes to have the desired sound.

voiceless sounds: sounds in speech or singing that do not involve the vibrating vocal folds.

volume velocity: the rate of air flow in a tube (vocal tract), expressed in units of volume per unit of time (such as m³/s).

vowel formation: formed by changing the frequency relationships between the first two formants through adjustments of the vocal tract.

vowel identity: 1. relies on the identification of the spectrum envelope peaks that correspond to the two lowest formants. 2. determined mainly by the frequencies F1 and F2 of the two lowest formants.

vowel quality: determined by the frequencies of the two lowest formants.

waveform: 1. a plot of a measured quantity against time. 2. a graph showing the amplitude versus time function for a continuous signal such as the acoustic signal of speech.

wavelength: 1. the distance between corresponding points on two successive waves. 2. the length of one cycle of a periodic disturbance in space, usually given the Greek letter lambda (λ).

whistle register: the highest female singing voice which can extend over two and a half octaves above middle C.

white noise: noise whose amplitude is constant throughout the audible frequency range.

wide-band analysis: an analysis in which a relatively large analyzing bandwidth is used (such as 300 Hz in speech analysis). A wideband analysis is preferred when the primary concern is to reveal formant pattern or to increase time resolution.

wideband spectrogram: a spectrogram which uses a wide band filter.

Wilcoxon signed rank test: a non-parametric alternative to the paired Student's t-test for the case of two related samples or repeated measurements on a single sample.

window: a weighting function applied to a waveform so that its amplitude gradually increases and decreases; the window acts like an acoustic "lens" to focus the analysis on a representative part of the signal.

wow: slow periodic variation in the speed of a turntable or tape transport.

wowmeter: makes a graphical record of small speed variations in any sound recorder capable of producing a millivolt output at about 1000 cps. It consists of the following sequence of components: amplifier → clipper → band pass filter → frequency discriminator → demodulator → oscillograph.
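The sound power level and sound pressure level entries in this glossary each reduce to a single logarithmic formula. The following is a minimal Python sketch of those two definitions, assuming the reference values stated in the entries (10^-12 W and 2 x 10^-5 N/m²). The function and constant names are illustrative only and are not taken from any of the cited sources.

```python
import math

P0 = 2e-5    # reference sound pressure in N/m^2 (20 micropascals), per the glossary
W0 = 1e-12   # reference sound power in watts, per the glossary


def sound_pressure_level(p):
    """Return Lp = 20 log10(p / p0) in dB for a sound pressure p in N/m^2."""
    return 20 * math.log10(p / P0)


def sound_power_level(w):
    """Return Lw = 10 log10(W / W0) in dB for a sound power w in watts."""
    return 10 * math.log10(w / W0)


if __name__ == "__main__":
    # A pressure of 1 Pa is the calibration point named in the
    # microphone sensitivity entry.
    print(round(sound_pressure_level(1.0), 1))
    # An illustrative (hypothetical) source radiating one milliwatt.
    print(round(sound_power_level(1e-3), 1))
```

With these reference values a pressure of 1 Pa evaluates to roughly 94 dB, which agrees with the statement in the microphone sensitivity entry that 94 dB SPL is the same as a pressure of 1 Pa.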


APPENDIX B

EQUIPMENT


An understanding of acoustic voice measurement is best accomplished with at least a cursory knowledge of the voice measurement equipment utilized. An overview of the function of the main components utilized in voice research is provided here, followed by a spreadsheet which lists the specific equipment authors have provided in their articles. The spreadsheet also serves as a page reference to the pictures and data sheets provided in this section. The earliest tools of acoustic voice measurement were the eyes and ears of the curious, the learned and the affirmed. The first tool, which remains one of the best to this day, was a mirror. Mirrors provide good visualization of the larynx – especially mirrors which magnify. Manuel Patricio Rodríguez García (1805-1906) first published pictures of his own vocal cords (sic) and larynx using a small dental mirror in the throat with another mirror, illuminated by sunlight, showing the reflection of the first mirror's image. García is also credited with inventing the laryngoscope in 1854.

A Miller laryngoscope on an infant handle. (http://en.wikipedia.org/wiki/Laryngoscope)

A Macintosh size 3-blade on an adult laryngoscope handle. (http://en.wikipedia.org/wiki/Laryngoscope)

Moving ahead to the mid twentieth century, Sawashima and Hirose created the first flexible fiberoptic laryngoscope in 1968. The patient is given the choice of a local anesthetic spray but most find this procedure not painful, just uncomfortable.

Fiberoptic Nasal Endoscopy (used to visualize internal nasal and sinus anatomy)


Fiberoptic Nasopharyngoscopy (used to visualize the back of the nose for velopharyngeal function as well as discerning any masses leading to eustachian tube dysfunction and subsequent ear problems). Expected image in this position shown to the right.

Fiberoptic Laryngoscopy or Nasolaryngoscopy (used to visualize the voice box and surrounding anatomic structures). Expected image in this position shown to the right. (http://homepage.mac.com/changcy/endo.htm#fne)

Another tool used for visual analysis is stroboscopy, which can be performed with the flexible fiberoptic laryngoscope pictured above or with the rigid laryngoscope pictured below. Stroboscopy has a powerful light source attached to the end of the scope that flashes (the model above flashes every 5 microseconds) to take clear, concise pictures of the vocal folds in motion. Many researchers have found the frame-by-frame analysis of the glottal cycle to be of extreme value in the study of the voice.

(http://www.kaypentax.com/Product%20Info/Strobe%20Systems/9295.htm)

However, the impact of air pressure both above the voice source (supra-glottal) and below the voice source (sub-glottal) is neither measured nor adequately visualized with laryngoscopy or stroboscopy. An early measurement tool (still in use today) is the spirometer. A spirometer measures the volume of air (both the rate of air flow and the amount of air flow) that is taken into the lungs (inspired) and expelled by the lungs (expired).

SPIROMETER

(Air Flow) X (Time) (http://www.vernier.com/probes/spr-bta.html)

Another measuring device for air flow is a ventilograph, which gives additional information on the maximum breathing capacity of the participant. Combining the spirometer, the ventilograph, and a facial mask (as in the Rothenberg Mask pictured below) yields a pneumotachometer, which measures the speed of moment-to-moment air flow.

Participant using a pneumotachometer. (http://www.ipds.uni-kiel.de/img/physio1.jpg)

A pneumotachometer is primarily "head gear," and the mask makes it difficult for singers to produce normal phonation while wearing it. Another approach is to use a plethysmograph, which is a sealed box that allows free phonation without the cumbersome mask of a pneumotachometer. However, this approach can be a problem for participants who suffer from claustrophobia.


(http://en.wikipedia.org/wiki/Image:BodyBox_Empty.jpg#file)

The plethysmograph and the pneumotachometer are transducers (devices that convert one form of energy into another) that convert airflow to an electrical signal. Information regarding air flow can help explain why the vocal folds are moving as depicted in the pictures taken with laryngoscopy or stroboscopy.

Acoustic analysis begins with listening to the participant(s) sing or talk. After this initial evaluation, the primary measurement tool used is sound spectrography. This technology is based on the Fourier theorem, which when applied to sound means that any waveform which repeats (is periodic) can be analyzed as a sum of sine waves, each defined by its frequency, amplitude, and phase. Spectrographic analysis has been successfully applied to the study of fundamental frequency, formants, resonances, harmonics, overtones, vibrato, jitter, and shimmer. To analyze a tone with spectrography, one must obtain acoustic signals from one or more of the following: microphone, electroglottograph (EGG), neck accelerometer, photoglottograph, and/or an oral flow mask (such as the Rothenberg Mask). Neck accelerometers, photoglottographs, and EGGs record the acoustic signal directly from the voice source without input of the vocal tract. Microphones record the acoustic signal as it leaves the body; therefore, both the voice source and the vocal tract have provided input.
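The Fourier theorem described above can be illustrated numerically. The sketch below, written in Python using only the standard library, builds a periodic tone from three sine components and then recovers the amplitude of each one by correlating the signal with a complex exponential (a single bin of a discrete Fourier transform). The tone, its partials, the sample rate, and the function names are all hypothetical values chosen for illustration; this is not the analysis procedure of any particular spectrograph.

```python
import cmath
import math


def synthesize(partials, sample_rate, n_samples):
    """Sum sine waves given as (frequency_hz, amplitude, phase_rad) tuples."""
    return [
        sum(a * math.sin(2 * math.pi * f * n / sample_rate + ph)
            for f, a, ph in partials)
        for n in range(n_samples)
    ]


def amplitude_at(signal, freq_hz, sample_rate):
    """Estimate the amplitude of the sine component at freq_hz by
    correlating the signal with a complex exponential (one DFT bin)."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * n / sample_rate)
              for n, x in enumerate(signal))
    return 2 * abs(acc) / len(signal)


if __name__ == "__main__":
    sample_rate = 8000
    # Hypothetical tone: a 200 Hz fundamental plus two weaker harmonics.
    partials = [(200, 1.0, 0.0), (400, 0.5, 0.3), (600, 0.25, 1.0)]
    # 800 samples = 0.1 s, a whole number of cycles of every partial.
    signal = synthesize(partials, sample_rate, 800)
    for f in (200, 400, 600, 800):
        print(f, round(amplitude_at(signal, f, sample_rate), 3))
```

The recovered amplitudes at 200, 400, and 600 Hz come out very close to the values used to build the tone, while the estimate at 800 Hz, where no partial was placed, is essentially zero. The analysis window is deliberately chosen to hold a whole number of cycles; real recordings rarely allow this, which is one reason a window function is applied before analysis.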

Participant is wearing an EGG collar and speaking into a microphone.

(UCLA voice lab)

EGG – Electroglottography

Measurements of the signal-to-noise ratio made using the LS-1 Larynx Simulator and a comparison to the EGG unit marketed by other manufacturers, can be seen here.

The dual-channel electrodes for the EG2 and EG2-PCX measure the translaryngeal electrical resistance at two adjacent locations on the neck. These two signals are combined at the main EGG output.

An EGG waveform in VFCA polarity (movement up indicates an increase in vocal fold contact area) from a boy aged 4 years, 3 months producing a sustained vowel /a/. The horizontal time scale is 2 milliseconds/div.

Ultrasonography has been used briefly for voice research to look at the movement of the vocal folds. Electromyography (EMG) has had wider use and its proponents are increasing. EMG is used to study all of the laryngeal muscle activity during phonation. However, this measurement process requires the insertion of needle electrodes placed directly in the muscles to be studied. There have been efforts to use an exterior electrode approach, but results are limited to medium to low frequency responses and less precise muscle feedback.


Patient is wearing external and internal electrodes.

System transducers include (from left) tongue array, EMG electrodes, stethoscopic microphone, nasal cannula, and solid state manometer. Two auxiliary channels are provided for other signals of interest.

The external hardware module provides proper signal conditioning specific to each transducer.

(http://www.kaypentax.com/Product%20Info/7120B/7120B.htm)

The last major category of acoustic sound measurement equipment is x-ray devices. X-rays can provide information regarding the participant's morphology which can be considered when analyzing other data. New equipment is being developed daily and old equipment is being applied in new ways. Quite often the greatest tool is a complete personal history which would include medical, emotional, and psychological histories. This is merely an introduction to some of the basic vocal acoustic measurement tools used by researchers and health specialists alike when a participant's history does not suffice.

APPENDIX C RESPIRATORY SYSTEM

(http://commons.wikimedia.org/wiki/Image:Respiratory_system_complete_en.svg)

APPENDIX D LARYNGEAL SYSTEM

(http://commons.wikimedia.org/wiki/Image:Larynx_external_en.svg)


APPENDIX E ARTICULATORY SYSTEM

Sagittal View of the Nose, Mouth, Pharynx, and Larynx.

(http://commons.wikimedia.org/wiki/Image:Sagittalmouth.png) The articulators are the tongue, lips, teeth, and soft palate.


APPENDIX F COMPARISON CHART OF TUNING SYSTEMS

Cents

The standard system for comparing intervals of different sizes is with cents. This is a logarithmic scale in which the octave is divided into 1200 equal parts. In equal temperament, each semitone is exactly 100 cents. The value in cents for the interval from f1 to f2 is $1200 \times \log_2(f_2/f_1)$.

Table X. Comparison of Different Interval Naming Systems

The last three columns give the interval width in cents.

Number of  Interval  Generic   Common diatonic    Comparable     Equal        Just        Quarter-comma
semitones  class     interval  name               just interval  temperament  intonation  meantone
---------  --------  --------  -----------------  -------------  -----------  ----------  ---------------------
0          0         0         Perfect unison     1:1            0            0           0
1          1         1         Minor second       16:15          100          112         117
2          2         1         Major second       9:8            200          204         193
3          3         2         Minor third        6:5            300          316         310
4          4         2         Major third        5:4            400          386         386
5          5         3         Perfect fourth     4:3            500          498         503
6          6         3         Augmented fourth   45:32          600          590         579
6          6         4         Diminished fifth   64:45          600          610         621
7          5         4         Perfect fifth      3:2            700          702         697 (wolf fifth 737)
8          4         5         Minor sixth        8:5            800          814         814
9          3         5         Major sixth        5:3            900          884         889
10         2         6         Minor seventh      16:9           1000         996         1007
11         1         6         Major seventh      15:8           1100         1088        1083
12         0         0         Perfect octave     2:1            1200         1200        1200

It is possible to construct just intervals that are closer to the equal-tempered equivalents, but most of the ones listed above have been used historically in equivalent contexts. In particular, the tritone (augmented fourth or diminished fifth) could have other ratios; 17:12 (603 cents) is fairly common. The 7:4 interval (the harmonic seventh) has been a contentious issue throughout the history of music theory; it is 31 cents flatter than an equal-tempered minor seventh. Some assert that the 7:4 interval is one of the blue notes used in jazz.
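The cent values quoted in this appendix can be checked with a few lines of Python. The following is a minimal sketch, not part of the original chart: the function name `cents` and the dictionary `just_ratios` are illustrative choices, and the ratios are copied from the just-interval column above together with the 17:12 and 7:4 ratios mentioned in the text.

```python
import math


def cents(f1: float, f2: float) -> float:
    """Size in cents of the interval from frequency f1 up to frequency f2."""
    return 1200 * math.log2(f2 / f1)


# Just-intonation ratios taken from the comparison chart (illustrative subset).
just_ratios = {
    "Minor second": (16, 15),
    "Major third": (5, 4),
    "Perfect fifth": (3, 2),
    "Tritone (17:12)": (17, 12),
    "Harmonic seventh (7:4)": (7, 4),
}

for name, (upper, lower) in just_ratios.items():
    # The lower number of the ratio is the starting frequency f1.
    print(f"{name}: {cents(lower, upper):.1f} cents")

# How far the harmonic seventh falls below the equal-tempered minor seventh.
print(f"7:4 is {1000 - cents(4, 7):.0f} cents flat of 1000 cents")
```

Because only the ratio of the two frequencies matters, the same function works for ratios such as 3:2 and for actual frequencies in hertz, for example `cents(440.0, 660.0)` for a just perfect fifth above A4.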

In the diatonic system, every interval has one or more enharmonic equivalents, such as augmented second for minor third.


APPENDIX G

PIANO PITCH ~ HERTZ CHART


APPENDIX H

ENGLISH IPA CHART

THE ENGLISH PHONEMIC CHART

VOWELS

i:        tea        /ti:/
i         happy      /hæpi/
ɪ         bit        /bɪt/
e         leg        /leg/
æ         cat        /kæt/
ɑ:        father     /fɑ:ðə(r)/
ɒ         dog        /dɒg/
ɔ:        daughter   /dɔ:tə/
ʊ         sugar      /ʃʊgə/
u:        too        /tu:/
ʌ         cup        /kʌp/
ɜ:        bird       /bɜ:(r)d/
ə         about      /əbaʊt/

DIPHTHONGS

eɪ        say        /seɪ/
əʊ (BR)   boat       /bəʊt/
oʊ (US)   boat       /boʊt/
aɪ        my         /maɪ/
ɔɪ        boy        /bɔɪ/
aʊ        wow        /waʊ/
ɪə        near       /nɪə(r)/
eə        hair       /heə(r)/
ʊə        poor       /pʊə(r)/

CONSONANTS

p         pen        /pen/
b         baby       /beɪbi/
t         toy        /tɔɪ/
d         diary      /daɪəri/
k         key        /ki:/
g         game       /geɪm/
tʃ        cheese     /tʃi:z/
dʒ        jump       /dʒʌmp/
f         fire       /faɪə(r)/
v         video      /vɪdəʊ/
θ         thumb      /θʌm/
ð         they       /ðeɪ/
s         sing       /sɪŋ/
z         zero       /zi:rəʊ/
ʃ         shop       /ʃɒp/
ʒ         vision     /vɪʒən/
h         hot        /hɒt/
m         amaze      /əmeɪz/
n         news       /nju:z/
ŋ         building   /bɪldɪŋ/
l         laugh      /lɑ:f/
r         rain       /reɪn/
j         yes        /jes/
w         wood       /wʊd/

Photocopiable: © 2007 English Skool, http://www.englishskool.com

163 REFERENCES

Abbott, S. E. (2001). Acoustic evaluation and analysis of the female barbershop tenor voice. Unpublished doctoral dissertation, The Florida State University.

Anderson, S. E. (1993). Choral singers’ timbral descriptions and evaluations of recorded choral excerpts using a dark-to-bright vowel hierarchy. (DMA Dissertation, University of Missouri-Kansas City, 1993), ProQuest Digital Dissertation Abstracts, AAT 9418459.

Arment, H.E.(1960). A Study By Means of Spectrographic Analysis of the Brightness and Darkness Qualities of Vowel Tones in Women's Voices. (University Microfilms No. AAG6002989).

Askenfelt, A., Gauffin, J., Kitzing, P., Sundberg, J. (1977). Electroglottograph and contact microphone for measuring vocal pitch. STL-QPSR, 18 (4), 013-021.

Aspaas, C., McCrea, C. R., Morris, R. J., Fowler, L. (2004). Select acoustic and perceptual measures of choral formation. International Journal of Research in Choral Singing, 2 (1), 11-27.

Bartholomew, W. T. (1934). A physical definition of “good voice-quality” in the male voice. Journal of the Acoustical Society of America, 5 (3), 25-33.

Bartholomew, W.T. (1949). The contributions of acoustics to the arts. Journal of the Acoustical Society of America, 21 (4), 311-314.

Bartholomew, W. (1956). A basis for the acoustical study of singing. Journal of the Acoustical Society of America, 28 (4), 757.

Björkner, E., Sundberg, J., Cleveland, T., & Stone, E. (2006). Voice source differences between registers in female musical theater singers. Journal of Voice, 20 (2), 187- 197.

Bloothooft, G., & Plomp, R. (1984). Spectral analysis of sung vowels: I. Variation due to differences between vowels, singers, and modes of singing. Journal of the Acoustical Society of America, 75 (4), 1259-1264.

Bloothooft, G., & Plomp, R. (1985). Spectral analysis of sung vowels. II. The effect of fundamental frequency on vowel spectra. Journal of the Acoustical Society of America, 77 (4), 1580-1588.

Bloothooft, G., & Plomp, R. (1986a). The sound level of the singer's formant in professional singing. Journal of the Acoustical Society of America, 79 (6), 2028- 2033.

Bloothooft, G., & Plomp, R. (1986b). Spectral analysis of sung vowels III. Characteristics of singers and modes of singing. Journal of the Acoustical Society of America, 79 (3), 852-864.

164 Burnau, J. (1967). Building and balancing choral blend. Music Journal Annual, 3, 68, 80-81, and 122.

Cashmore, D. (1964). A good performance. The Musical Times, 105 (1451), 56-57.

Christiansen, E. (1988). Spectral analysis of choral singing involving oral manipulation and maintenance of vowel intelligibility (DMA Dissertation, Arizona State University, 1988). ProQuest Digital Dissertations, AAT 8907690.

Cleveland, T. (1977). A clearer view of singing voice production: 25 years of progress. Journal of Voice, 8 (1), 18-23.

Cleveland, T. (1977). Acoustic properties of voice timbre types and their influence on voice classification. Journal of the Acoustical Society of America, 61, 1622-1629.

Coleman, R. (1973). A comparison of the contributions of two vocal characteristics to the perception of maleness and femaleness in the voice. STL-QPSR, 14 (2-3), 13- 22.

Coleman, R. (1979). Objective measures of the singer’s voice as a “damage risk” indication. Acoustical Society of America Supplement, 1 (66), Fall 1979, 56.

Coleman, R. (1994a). Acoustic and physiologic factors in duet singing: A pilot study. Journal of Voice, 8 (3), 202-206.

Coleman, R. (1994b). Dynamic intensity variations of individual choral singers. Journal of Voice, 8 (3), 196-201.

Colton, R., & Estill, J. (1979). Elements of quality variation voice modes and singing. Acoustical Society of America Supplement, 1 (66), Fall 1979, 55-56.

Daugherty, J. (1996). Spacing, formation and choral sound: Preferences and perceptions of auditors and choristers. Unpublished Ph.D. dissertation, The Florida State University.

Daugherty, J. (1999). Spacing, formation, and choral sound: Preferences and perceptions of auditors and choristers. Journal of Research in Music Education, 47 (3), 224-238.

Daugherty, J. (2001). Rethinking how voices work in a choral ensemble. The Choral Journal, Dec., 69-75.

Daugherty, J. (2003). Choir spacing and formation: Choral sound preferences in random, synergistic, and gender-specific chamber choir placements. International Journal of Research in Choral Singing, 1 (1), 48-59.

165 Delattre, P. (1951). The physiological interpretation of sound spectrograms. Publication of the Modern Language Association of America, 66 (5), 864-875.

Detweiler, R. (1994). An investigation of the laryngeal system as the resonance source of the singer’s formant. Journal of Voice, 8 (4), 303-313.

Dolson, M. (1982). A tracking phase vocoder and its use in the analysis of ensemble sounds. (Doctoral Dissertation, California Institute of Technology, 1982). ProQuest Digital Dissertation Abstracts, AAT 8312042.

Duey, P. (1950). Bel Canto in Its Golden Age: A Study of Its Teaching Concepts. New York: King's Crown Press.

Ekholm, E. (2000). The effect of singing mode and seating arrangement on choral blend and overall choral sound. Journal of Research in Music Education, 48 (2), 123-135.

Fant, G. (1970). Acoustic Theory of Speech Production with Calculations Based on X-ray Studies of Russian Articulations. The Hague: Mouton Publishing.

Fant, G., Ishizaka, K., Lindqvist-Gauffin, J., & Sundberg, J. (1972). Subglottal formants. STL-QPSR, 13 (1), 001-012..

Fillebrown, T. (1911). Resonance in singing and speaking. Boston: Oliver Ditson Company.

Fletcher, H. (1946). Pitch, loudness, and quality of musical tones. American Journal of Physics, 14 (4), 215.

Folger, W. M. (2002). Unifying the choral sound through voice matching: an empirical study of the adjustments in vibrato frequency modulation and amplitude modulation. (Doctoral dissertation, The University of North Carolina at Greensboro, 2002).

Ford, J. (1999). The preference for strong or weak singer’s formant resonance in choral tone quality. Unpublished Doctoral Dissertation, Florida State University, Tallahassee.

Ford, J. (2003). Preferences for strong or weak singer’s formant resonance in choral tone quality. International Journal of Research in Choral Singing, 1 (1), 29-47.

Freiheit, R.(2005). Historic recording gives choir “alien” feeling: In anechoic space, no one can hear you sing. Lay Language Paper presented at the ASA/NOISE-CON 2005 Meeting, Minneapolis, MN, 1-3.

Fry, D. B. (1956). A basis for the acoustical study of singing. Program of the fifty-first Meeting of the Acoustical Society of America's Joint Meeting with the Second ICA Congress. Cambridge, MA, 34.

166 Gauffin, J. & Hammarberg, B. (1991). Vocal Fold Physiology: Acoustic, Perceptual, and Physiological Aspects of Voice Mechanisms. Stockholm: Singular Publishing Group, Inc.

Gauffin, J. & Sundberg, J. (1980). Data on the glottal voice behavior in vowel production. STL-QPSR , 21.

Giardiniere, D. C. (1991). Voice matching: An investigation of vocal matches, their effect on choral sound and procedures of inquiry conducted by Weston Noble. (Doctoral dissertation, New York University, 1991). UMI ProQuest Digital Dissertation Abstracts, 241, AAT 9213181.

Goodwin, A. W. (1980). An acoustical study of individual voices in choral blend. Journal of Research in Music Education, 28 (2), 119-128.

Gould, W., & Korovin, G. (1994). Laboratory advances for voice measurements. Journal of Voice, 8 (1), 8-17.

Gramming, P. (1991). Vocal loudness and frequency capabilities of the voice. Journal of Voice, 5 (2), 144-157.

Gramming, P., Sundberg, J., Ternström, S., Leanderson, R., & Perkins, W. (1988). The relationship between changes in voice, pitch, and loudness. Journal of Voice, 2 (2), 118-126.

Granqvist, S. (2003). Computer Methods for Voice Analysis. (Doctoral dissertation, KTH, 2003. ISSN 1104-5787.

Gunn, G.H. (1960). An acoustical analysis of quality of variation in sung vowels. (Unpublished doctoral dissertation, University of Michigan, 1960).

Hack, P. A. (1975). The influence of loudness on the discrimination of musical sound factors. Journal of Research in Music Education, (1), 67-77.

Hagerman, B. & Sundberg, J. (1980). Fundamental frequency adjustment in barbershop singing. Journal of Research in Singing, 4 (1), 3-17.

Hall, D. (1980). Musical Acoustics: An Introduction. Belmont, CA: Wadsworth Publishing Co.

Harper, A. H., Jr. (1967). Spectrographic Comparison of Certain Vowels to Ascertain Differences Between Solo and Choral Singing, Reinforced by Aural Comparison. (Doctoral dissertation, Indiana University, 1967).

Hertegard, S., Gauffin, J., Sundberg, J. (1990). Open and covered singing as studied by means of fiber optics, inverse filtering, and spectral analysis. Journal of Voice, 4, 220-230.

167 Helmholtz, H. (1885). On the Sensations of Tone as a Physiological Basis for the Theory of Music. (Alexander J. Ellis, Trans.). New York: Dover.

Honda, K. (1983). Relationship between pitch control and vowel articulation. In D. M. Bless & J. H. Abbs (Eds.), Vocal Fold Physiology (286-297). San Diego: College-Hill Press.

Howard, D. (2004). Measuring the tuning accuracy of thousands singing in unison: An English premier football league table of fans’ singing tunefulness. Logopedics Phoniatrics Vocology, (29) 2, 77-83.

Howard, D. (2007a). Equal or non-equal temperament in a cappella SATB singing. Logopedics Phoniatrics Vocology, (2) 32, 87-94.

Howard, D. (2007b). Intonation drift in a cappella soprano, alto, tenor, bass quartet singing with key modulation. Journal of Voice, (2) 3, 300-315.

Howard, D. (2007c). Larynx closed quotient variation in quartet singing. 19th International Congress on Acoustics, Madrid, September 2007, (PACS: 43.55.Cs).

Howard, D. (2007d). Voice Science, Acoustics, and Recordings. San Diego: Plural Publishing.

Hunt, W. (1970). Spectrographic Analysis of the Acoustical Properties of Selected Vowels in Choral Sound. (Education Degree Dissertation, University of North Texas, 1970).

Jers, H. (2005). What are the differences between amateur and professional choirs? ASA/NOISE-CON 2005 Meeting Lay Language Papers, Minneapolis, MN, 1-4.

Jers, H. (2007). Directivity measurements of adjacent singers in a choir. 19th International Congress on Acoustics, Madrid, September 2007, (PACS: 43.75.Rs), 1-5.

Jers, H. & Ternström, S. (2005). Intonation analysis of a multi-channel recording. TMH- QPSR, 47 (1), 001-006.

Joliveau, E., Smith, J., & Wolfe, J. (2004). Vocal tract resonance in singing: The soprano voice. Journal of the Acoustical Society of America, 116 (4), 2434-2439.

Kahlin, D. & Ternstrom, S. (1999). The chorus effect revisited – experiments in frequency-domain analysis and stimulation of ensemble sounds. Proceedings of the Euromicro Conference, September 8-10, 75-80.

Kent, R. & Read, C. (1992). The Acoustic Analysis of Speech. San Diego: Singular Publishing Group, Inc.

Kiesgen, P. (1997). Warning! Soft singing may be harmful to your health! Choral Journal, August, 29-33.

168

Killian, J. N. (1985). Operant preference for vocal balance in four-voice chorales. Journal of Research in Music Education, 33 (1), 55-67.

Kitch, J., Oates, J., & Greenwood, K. (1996). Performance effects on the voices of 10 choral tenors: Acoustic and perceptual findings. Journal of Voice, 10 (3), 217-227.

Knutson, B. J. (1987). Interviews with selected choral conductors concerning rationale and practices regarding choral blend. (Doctoral Dissertation, The Florida State University, 1987). UNI ProQuest Digital Dissertation Abstracts, 135, AAT 8802564.

Lagefoged, P. (1996). Elements of Acoustical Phonetics, 2nd Ed. Chicago: University of Chicago Press.

Lambson, A. (1961). An evaluation of various seating plans used in choral singing. Journal of Research in Music Education, 9 (1), 47-54.

Large, J. (1973). Acoustic study of register equalization in singing. Folia Phoniatrica, 25, 39-61.

Large, J. (1979). Studies of the Garcían model for vocal registration. Acoustical Society of America Supplement, 1 (66), Fall, 56.

Letowksi, T., Zimak, L., & Ciolkosz-Lupinowa, H. (1988). Timbre differences of an individual voice in solo and in choral singing. Archives of Acoustics, 13 (1-2), 55-65.

Libeaux, A., Lentz, T., Houben, D., & Kob, M. (2007). Voice assessment in choir singers using a virtual choir environment. 19th International Congress on Acoustics, Madrid, 2007, (PACS: 43.57.Rs), 1-6.

Liemohn, E. (1958). Intonation and blend in the a cappella choir. Music Educators Journal, 44 (6), 50-51.

Lofqvist, A. (1986). The long time average spectrum as a tool in voice research. Journal of Phonetics, 14, 471-475.

Lottermoser, W., Meyer, Fr-J. (1960). Frequenzmessunger an gesungenen akkorden. Akustica, 10, 181-184.

Magill, P.C., Jacobson, L. (1978). A comparison of the singing formant in the voices of professional and student singers. Journal of Research in Music Education, 26 (4), 456-469.

Manen, L., Fry, D. B. (1956). A basis for the acoustical study of singing. Program of the Fifty-First Meeting of the Acoustical Society of America’s Joint Meeting with the Second ICA Congress. Cambridge, Massachusetts, 34.

169

Marshall, A. H., Gottlob, D., Alrutz, H. (1978). Acoustical conditions preferred for ensemble. The Journal of the Acoustical Society of America, 64 (5), 1437-1442.

Marshall, A., & Meyer, J. (1985). The directivity and auditory impressions of singers. Akustica, 58, 130-140.

Maxwell, Donald E. (1986). The effect of white noise masking on singers. Journal of Research in Singing, 8 (2), 9-19.

Mayer, F. C. (1964). The relationship of blend and intonation in the choral art. Music Educators Journal, 51 (1), 109-110.

McCoy, S. (2004). Your Voice: An Inside View. Princeton: Inside View Press.

Miller, G., & Schutte, H. (1990a). Feedback from spectrum analysis applied to the singing voice. Journal of Voice, 4 (4), 329-334.

Miller, G., & Schutte, H. (1990b). Formant tuning in a professional baritone. Journal of Voice, 4 (3), 231-237.

Miller, D., Schutte, H., Doing, J. (2001). Soft phonation in the male singing voice: A preliminary study. Journal of Voice, 15 (4), 483-491.

Miller, D., Schutte, H. (2005). “Mixing” the registers: glottal source or vocal tract? Folia Phoniatrica et Logopaedica, 57, 278-291.

Miller, R. (2004). Solutions for Singers: Tools for Performers and Teachers. New York: Oxford University Press.

Molnar, J. (1950). The selection and placement of choir voices. Music Educator’s Journal, (1), 48-49.

Morris, R. (2007). Planning and recording effective acoustic research of choral singing. International Journal of Research in Choral Singing, article in press, 1-12.

Morris, R., Mustafa, A., McCrea, C., Fowler, L., Aspaas, C. (2006). Acoustic analysis of the interaction of choral arrangements, musical selection, and microphone location. Journal of Voice, article in press, 1-8.

Murray, T. (1979). Vocal jitter in singers voice. The 98th Meeting of Acoustical Society of America, November, Salt Lake City, Utah.

Noll, M. (1967). Cepstrum pitch determination. The Journal of the Acoustical Society of America, 41 (2), 293-308.

170 Nordmark, J. & Ternström, S. (1996). Intonation preferences for major thirds with non-beating ensemble sounds. TMH-QPSR, 37 (1), 57-62.

Pike, D. (1988). More ideas for choral conductors. Music Educators Journal, 74 (8), 5.

Powell, S. (1991). Choral intonation: More than meets the ear. Music Educators Journal, 77 (9), pp. 40-43.

Quinn, S. (1996). Choral intonation: A practical guide to the process and the development of skills necessary for acquiring and maintaining accurate tuning. Canadian Music Educator, 37 (3), 3-15.

Reid, K., Davis, P., Oates, J., Cabrera, D., Ternström, S., Black, M., & Chapman, J. (2007). The acoustic characteristics of professional opera singers performing in chorus versus solo mode. The Journal of Voice, 21 (1), 35-45.

Rodda, R. S. (1960). How does your choir sound? Music Educators Journal, 46 (3), 60- 62.

Roers, F., Mürbe, D., & Sundberg, J. (2007). Predicted singers' vocal fold lengths and voice classification – a study of x-ray morphological measures. Journal of Voice, article in press, 1-8.

Rossing, T., Moore, F., & Wheeler, P. (2002). The Science of Sound, 3rd Ed. San Francisco: Addison Wesley.

Rossing, T., Sundberg, J., & Ternström, S. (1986). Acoustic comparison of voice use in solo and choir singing. Journal of the Acoustical Society of America, 79, 1975 – 1981.

Rossing, T., Sundberg, J., & Ternström, S. (1986). Voice timbre in solo and choir singing: Is there a difference? Journal of Research in Singing, 8 (2), 1-7.

Rossing, T., Sundberg, J., & Ternström, S. (1987). Acoustic comparison of soprano solo and choir singing. Journal of the Acoustical Society of America, 82 (3), 830-836.

Rshevkin, S. N. (1956). Some results of the analysis of singing voice. Program of the Fifty-First Meeting of the Acoustical Society of America’s Joint Meeting with the Second ICA Congress. Cambridge, Massachusetts, 34-36.

Sacerdote, G. (1957). Researches on the singing voice. Acustica, (7) 2, 61-68.

Sataloff, R. (1988). Vocal Health and Pedagogy. San Diego: Singular Publishing.

Sataloff, R. (2005). Voice Science. San Diego: Plural Publishing, Inc.

171 Schoen, M. (1921). An Experimental Study of the Pitch Factor in Artistic Singing. Doctoral Dissertation, University of Iowa, 1921.

Schutte, H., & Miller, D. (1983). Resonance balance in register categories of the singing voice: A spectral analysis study. Folia Phoniatrica, 36, 289-295.

Schutte, H., Miller, D. (1991). Acoustic details of vibrato cycle in tenor high notes. Journal of Voice, 5 (3), 217-223.

Schutte, H., Miller, D., & Svec, J. (1995). Measurement of formant frequencies and bandwidths in singing. Journal of Voice, 9 (3), 290-296.

Seashore, C. (1938). Psychology of Music. New York: McGraw-Hill Book Co., Inc.

Seashore, C. (1960). Seashore Measures of Musical Talents. New York: The Psychological Corporation.

Shipp, T., & Izdebski, K. (1979). Elements of frequency and amplitude modulation in the trained and pathologic voice. Acoustical Society of America Supplement, 1 (66), Fall, 56.

Smith, B., Sataloff, R. T. (2000). Choral Pedagogy. San Diego: Singular Publishing Group.

Smith, P. (2002). Balance or blend? Two approaches to choral singing. Choral Journal, 43 (5), 31-43.

Spurgeon, D. (2002). The balancing act: Nurture individual voices and get a great group sound. Teaching Music, 10 (2), 36-40.

Sundberg, J. (1973). The source spectrum in professional singing. Folia Phoniatrica, 25 87.

Sundberg, J. (1977b). The acoustics of the singing voice. Scientific American, 236 (3), 88-91. Sundberg, J. (1981). Formants and fundamental frequency control in singing. An experimental study of coupling between vocal tract and voice source. Acustica, 49, 47-54.

Sundberg, J. (1987). The Science of the Singing Voice. Northern Illinois University Press: Dekalb, IL.

Sundberg, J. (1988). Vocal tract resonance in singing. National Association of Teachers of Singing Journal, 44 (4), 11-31.

Sundberg, J. (1994). Perceptual aspects of singing. Journal of Voice, 8 (2), 106-122.

172 Sundberg, J. (2003). Research on the singing voice in retrospect. TMH_QPSR, 45, 11-22.

Sundberg, J., Lindqvist-Gauffin, J. (1974). Masking effects of one’s own voice. STL- QPSR, (1), 35-41.

Sundberg, J., Titze, I. (1992). Vocal intensity in speakers and singers. Journal of the Acoustical Society of America, 95 (2), 1133-1142.

Sundberg, J., Cleveland, T., Stone, R., & Iwarsson, J. (1999). Voice source characteristics in six premier country singers. Journal of Voice, 13 (2), 168- 183.

Teie, E. W. (1976). A comparative study of the development of the third formant in trained and untrained voices. (Doctoral dissertation, University of Minnesota, 976). Dissertation Abstracts International, 37, (10A), 6135.

Ternström, S. (1989). Acoustical aspects of choir singing. Dissertation Abstracts International, 51, (04C), 609. (University Microfilm No. AAGC150108).

Ternström, S. (1993). Long-time average spectrum characteristics of different choirs in different rooms. The Journal of the British Voice Association, 2, 55-77.

Ternström, S. (1993). Perceptual evaluations of voice scatter in unison choir sounds. Journal of Voice, 7 (2), 129-135.

Ternström, S. (1994). Hearing myself with others: Sound levels in choral performance measured with separation of one’s own voice from the rest of the choir. Journal of Voice, 8 (4), 293-302.

Ternström, S. (1999). Preferred self-to-other ratios in choir singing. Journal of the Acoustical Society of America, 105, 3563-3574.

Ternström, S. (2003). Choir acoustics: an overview of scientific research published to date. International Journal of Research in Choral Singing, 1 (1), 3-13.

Ternström, S. (1991). Physical and acoustic factors that interact with the singer to produce the choral sound. Journal of Voice, 5 (2), 128-143.

Ternström, S., Jers, H. (2005). Intonation analysis of a multi-channel choir recording. TMH-QPSR, (1) 47, 1-6.

Ternström, S., & Kalin, G. (2007). Formant frequency adjustment in barbershop quartet singing. International Congress on Acoustics, Madrid, September 2007, pp. 1-6.

Ternström, S., & Sundberg, J. (1982). “Acoustical factors related to pitch precision in

173 choir singing,” Speech Transmission Lab. Qt. Prog. Status Rep. 2-3, 76-90 (Dept. of Speech Communication and Music Acoustics, Royal Institute of Technology, Stockholm).

Ternström, S., & Sundberg, J. (1983). “How loudly should you hear your colleagues and yourself? A study of SPL within choirs,” Speech Transmission Lab. Qt. Prog. Status Rep. 4, 16-26 (Dept. Of Speech Communication and Music Acoustics, Royal Institute of Technology, Stockholm).

Ternström, S., & Sundberg, J. (1985). Voice timbre in solo and choir singing: Is there a difference? Journal of Research in Singing, 8, 1-8.

Ternström, S., Cabrera, D., & Davis, P. (2005). Self-to-other ratios measured in an opera chorus in performance. Journal of the Acoustical Society of America, 116 (6), 3903-3911.

Ternström, S., Sundberg, J., & Colldén, A. (1988). Articulatory F0 Perturbations and auditory feedback. Journal of Speech and Hearing Research, 31 (June), 187-192.

Ternström, S., Sundberg, J., Colldén, A. (1988). Articulatory F0 perfurbations and auditory feedback. Journal of Speech and Hearing Research, 31 (June), 187-192.

Ternström, S., & Sundberg, J. (1988). Intonation precision of choir singers. Journal of the Acoustic Society of America, 84 (1), 59-70.

Ternström, S., & Sundberg, J. (1989). Formant frequencies of choir singers. Journal of the Acoustical Society of America, 86 (2), 517-522.

Thurman, L., & Daugherty, J. (2003). Balance or blend? Are these the only vocal approaches to choral singing? (A rebuttal). Choral Journal, 43, 35-43.

Titze, I. (1979). A physiological interpretation of vocal registers. Acoustical Society of American Supplement, I (66), 55-56.

Titze, I. (1988). A framework for the study of vocal registers. Journal of Voice, 2 (3), 183-194.

Titze, I. (1994). Towards standards in acoustic analysis of voice. Journal of Voice, 8 (4), 1-7.

Tocheff, R. (1990). Acoustical placement of voices in choral formations (Doctoral Dissertation, The Ohio State University, 1990). ProQuest Digital Dissertation Abstracts, AAT 9111807.

Tonkinson, S. (1990). The Lombard effect in choral singing. Journal of Voice, 8 (1), 24- 29.

174

Van den Berg, J. (1963). Vocal ligaments versus registers. The NATS Bulletin, December, 16-31.

Venard, W., Hirano, M. & Ohala, J. (1970). Chest, head, and falsetto. The NATS Bulletin, December, 33-37.

Vilkman, E. & Alku, P. (1994). Register shift in the lower pitch range. Proceedings of the Stockholm Music Acoustics Conference, July 1993, No. 79, 1-6.

Votaw, L. (1931). Choral intonation. Music Supervisors’ Journal, 18 (1), 50-53.

Weber, S. (1992). An investigation of intensity differences between vibrato and straight tone singing (Doctoral dissertation, Arizona State University, 1992). ProQuest Dissertation Abstracts International, AAT 9223155.

Woodruff, N. W. (2001). The acoustic interaction of voices in ensemble: An inquiry into the phenomenon of voice matching and the perception of unaltered vocal process (Doctoral dissertation, The University of Oklahoma, 2001). UMI ProQuest Digital Dissertations, 240, AAT 3075332.

BIOGRAPHICAL SKETCH

Brenda was born in Lake Charles, Louisiana, where she lived for six weeks. Her father served in the United States Air Force, and the family therefore enjoyed a full life of travel, which culminated at Langley Air Force Base in Hampton, Virginia. Brenda graduated from Poquoson High School and promptly entered James Madison University in Harrisonburg, Virginia. JMU awarded Brenda a Bachelor of Music degree with a concentration in Vocal Performance. Ten years later Brenda re-entered JMU and earned a Master of Music degree in Choral Conducting. Throughout this time Brenda enjoyed a successful career as a soprano soloist in opera, oratorio, and church music, as well as a voice instructor and vocal coach. In 1997, Brenda began teaching choral music at Martin Middle School in Tarboro, North Carolina, while simultaneously earning a North Carolina K-12 Music Teaching Certificate. During this time, Brenda was also the North Carolina South East Region Solo and Small Ensemble Coordinator for MENC (Music Educators National Conference). Brenda began teaching high school chorus in 2000 at East Wake High School in Wendell, North Carolina, during which time she served on the MENC Repertoire Standards and Selection Committee. Interestingly enough, ten years had passed once again. Brenda was admitted to the doctoral program at The Florida State University College of Music and began her studies in August 2005. This effort completes Brenda's program, and she has been awarded her Doctor of Philosophy in Music Education with a concentration in Choral Conducting. Her new home will be in Macomb, Illinois, as she begins her new career in Choral Music Education as a member of the faculty at Western Illinois University.
