The Perception of Singing
Total Page:16
File Type:pdf, Size:1020Kb
6 THE PERCEPTION OF SINGING JOHAN SUNDBERG Department of Speech, Music, and Hearing Kungl Tekniska HOgskolan Stockholm, Sweden I. INTRODUCTION An understanding of the perception of singing may be developed from two different types of investigation. One type considers an acoustic property of sing- ing, varies it systematically, and explores the perception of these variations. Such investigations are rare in singing research. Another type of investigation concerns the acoustic correlates of certain types of voices or phonations. The underlying typology of voices and phonations must be based mainly on aural perception. Consequently, even such studies have perceptual relevance. Most studies of sing- ing have this type of perceptual relevance. However, research on the perception of singing is not as well developed as the closely related field of speech research. Therefore, readers will not find an exhaustive presentation in this chapter. Rather they will find a presentation of different studies only partly related to each other. A confession on the part of the author of this chapter is in order here. Writing this chapter has been a bit embarrassing, because so many references to my own work seemed necessary. As readers may notice, it is difficult to be fully objective in the presentation of one's own investigations. However, as was just mentioned, relatively few studies on the perceptual aspects of singing have been published. When we listen to a singer, a number of remarkable perceptual facts are appar- ent. For instance: How is it that we can hear the voice even when the orchestra is loud? How is it that we generally identify the singer's vowels correctly even though vowel quality in singing differs considerably from the vowel quality that we are used to in speech? How is it that we can identify the individual singer's sex, register, and voice timbre when the pitch of the vowel lies within a range common to all singers and several registers? How is it that we perceive singing as a se- quence of discrete pitches even though the fundamental frequency events do not form a pattern of discrete fundamental frequencies? How is it that we hear a Copyright 1999 by Academic Press. The Psychology of Music, Second Edition 171 All rights of reproduction in any form reserved. 1 7 2 JOHAN SUNDBERG phrase rather than a sequence of isolated tones? These are some of the main ques- tions that are discussed in this chapter. In order to understand the questions as well as the answers, it is necessary to have a basic knowledge of the acoustics of sing- ing. We will therefore first briefly present what is known about this. !!. FUNCTION OF THE VOICE The vocal organ consists of three basic components: (a) the respiratory system, which provides an excess pressure of air in the lungs; (b) the vocal folds, which chop the airstream from the lungs into a sequence of quasi-periodic air pulses; and (c) the vocal tract, which gives each sound its final characteristic spectral shape and thus its timbral identity. These three components are referred to as respiration, phonation, and resonance or articulation, respectively. The chopped airstream (i.e., the voice source) is the raw material of all voiced sounds. It can be described as a complex tone composed of a number of harmonic partials. This implies that the frequency of the nth partial equals n times the fre- quency of the first partial, which is called the fundamental. The frequency of the fundamental (i.e., the fundamental frequency) is identical to the number of air pulses occurring in 1 sec or, in other words, to the frequency of vibration of the vocal folds. The fundamental frequency determines the pitch we perceive in the sense that the pitch would remain essentially the same even if the fundamental sounded alone. The amplitudes of the voice-source partials decrease monotoni- cally with rising frequency. For medium vocal loudness, a given partial is 12 dB stronger than a partial located one octave higher; for softer phonation, this differ- ence is greater. On the other hand, the slope of the voice-source spectrum is gener- ally not dependent on which voiced sound is produced. Spectral differences between various voiced sounds arise when the sound from the voice source is transferred through the vocal tract (i.e., from the vocal folds to the lip opening). The reason for this is that the ability of the vocal tract to transfer sound is highly dependent on the frequency of the sound being transferred. This ability culminates at certain frequencies, called the formant frequencies. In conse- quence, those voice-source partials that lie closest to the formant frequencies are radiated from the lip opening at greater amplitudes than are the other neighboring partials. Hence, the formant frequencies are manifest as peaks in the spectrum of the radiated sound. The formant frequencies vary within rather wide limits in response to changing the position of the articulators (i.e., lips, tongue body, tongue tip, lower jaw, ve- lum, pharyngeal sidewalls, and larynx). We can change the two lowest formant frequencies by two octaves or more by changing the position of the articulators. The frequencies of these two formants determine the identity of most vowels, that is, the vowel quality. The higher formant frequencies cannot be varied as much and do not contribute much to vowel quality. Rather, they are significant to the per- sonal voice timbre, which henceforth will be referred to as voice quality. 6. THE PERCEPTION OF SINGING 173 Properties of vowel sounds that are of great importance to vowel quality can be described in a chart showing the frequencies of the two lowest formants, as in Figure 1. Note that each vowel is represented by a small area rather than by a point in the chart. In other words, these formant frequencies may vary within certain limits without changing the identity of the vowel. This reflects the fact that a given vowel normally possesses higher formant frequencies in a child or in a woman than in a male adult. The reason for these differences lies in differing vocal tract dimensions, as will be shown later. In singing, more or less substantial deviations are observed from the vowel ranges shown in Figure 1. Indeed, a male opera singer may change the formant frequencies so much that they enter the area of a different vowel. For instance, in the vowel [i:] 1 as sung by a male opera singer, the two lowest formant frequencies may be those of the vowel [y:] according to Figure 1. In female high-pitched opera singing, the formant frequencies may be totally different from those found in nor- mal speech. Yet we tend to identify such vowels correctly. This shows that the frequencies of the two lowest formants do not determine vowel quality entirely. Next we will see how and why these deviations from normal speech are made in singing. 2.5 2.o I1 1. o 0 .2 .4 .6 .8 1.0 t2 F I (kHz) F! G U R E 1 Rangesof the two lowest formant frequencies for different vowels represented by their symbols in the International Phonetic Alphabet (IPA). Above, the frequency scale of the first formant is given in musical notation. IAll letters appearing within [ ] are symbols in the International Phonetic Alphabet. 174 JOHAN SUNDBERG !!!. RESONATORY ASPECTS A. SINGING AT HIGH PITCHES 1. Formant Frequencies Most singers are required to sing at fundamental frequencies higher than those used in normal speech, where the voice fundamental frequencies of male and fe- male adults center on approximately 110 Hz and 200 Hz, respectively, and gener- ally do not exceed about 200 Hz and 350 Hz, respectively. The highest pitches for soprano, alto, tenor, baritone, and bass correspond to fundamental frequencies of about 1400 Hz (pitch C6), 700 Hz (Fs), 523 Hz (C5), 390 Hz (G4), and 350 Hz (F4), respectively. In speech the first formant is normally higher than the fundamental frequency, but in singing the normal value of the first formant frequency of many vowels is often much lower than these top fundamental frequencies, as shown in Figure 1. If the singer were to use the same articulation in singing a high-pitched tone as in normal speech, the situation illustrated in the upper part of Figure 2 would occur. The fundamental, that is, the lowest partial in the spectrum, would appear at a frequency far above that of the first formant. In other words, the capa- FORMANTS 2 1 PART I A LS FREQUENCY FORMANTS r~ I-- .1 _1 n :E < PARTIALS FREQUENCY F i G U R E 2 Schematic illustration of the formant strategy in high-pitched singing. In the upper case, the singer has a small jaw opening. The first formant appears at a frequency lower than that of the fundamental (i.e., the lowest partial of the spectrum). The result is a low amplitude of that partial. In the lower case, the jaw opening is widened so that the first formant is raised to a frequency near that of the fundamental. The result is a considerable gain in amplitude of that partial. (Reprinted from Sundberg, 1977b.) 6. THE PERCEPTION OF SINGING 175 bility of the vocal tract to transfer sound would be optimal at a frequency where there is no sound to transfer. It seems that singers learn to avoid this situation. The strategy is to abandon the formant frequencies of normal speech and move the frequency of the first formant close to that of the fundamental. The main articulatory gesture used to achieve this tuning of the first formant is a widening of the jaw opening, which is particularly effective for increasing the first formant frequency (cf.