A

Acoustic

Acoustic phonetics is the study of the acoustic charac- amplitude relations result in perceived differences in teristics of speech. Speech consists of variations in air timbre or quality. pressure that result from physical disturbances of air Fourier’s theorem enables us to describe speech molecules caused by the flow of air out of the lungs. sounds in terms of the and amplitude of This airflow makes the air molecules alternately crowd each of its constituent simple waves. Such a descrip- together and move apart (oscillate), creating increases tion is known as the spectrum of a sound. A spectrum and decreases, respectively, in air pressure. The result- is visually displayed as a plot of frequency vs. ampli- ing sound wave transmits these changes in pressure tude, with frequency represented from low to high from speaker to hearer. Sound waves can be described along the horizontal axis and amplitude from low to in terms of physical properties such as cycle, period, high along the vertical axis. frequency, and amplitude. These concepts are most The usual energy source for speech is the airstream easily illustrated when considering a simple wave cor- generated by the lungs. This steady flow of air is con- responding to a pure . A cycle is a sequence of one verted into brief puffs of air by the vibrating vocal increase and one decrease in air pressure. A period is folds, two muscular folds housed in the larynx. The the amount of time (expressed in seconds or millisec- dominant way of conceptualizing the process of onds) that one cycle takes. Frequency is the number of is in terms of the source-filter theo- cycles in one second, expressed in hertz (Hz). An ry, according to which the acoustic characteristics of increase in frequency usually results in an increase in speech can be understood as a result of a source com- perceived pitch. Amplitude refers to the magnitude of ponent and a filter component. The source component vibrations, with larger vibrations resulting in greater is determined by the rate of vocal fold vibration, which peaks of pressure (greater amplitude), which usually in turn is affected by a number of factors, including the result in an increase in perceived loudness. rate of airflow and the mass and stiffness of the vocal Unlike pure tones, which rarely occur in the envi- folds. The rate of vocal fold vibration directly deter- ronment, speech sounds are complex waves with com- mines the F0 of the waveform. The mean F0 for adult binations of different and amplitudes. women is approximately 220 Hz, and approximately However, as first stated by the French mathematician 130 Hz for adult men. “In addition to their role as Fourier (1768–1830), any complex wave can be properties of individual speech sounds, F0 and ampli- described as a combination of simple waves. A com- tude also signal emphasis, , and .” plex wave has a regular rate of repetition, known as the For speech, the source component itself has a com- (F0). Changes in F0 give rise plex waveform, and its spectrum will typically show to differences in perceived pitch, whereas changes in the highest energy at the lowest frequencies and a the number of constituent simple waves and their number of higher frequency components that

1

3000 heed

hayed 2500 hid had

head 2000

heard hod 1500 hud hood

Frequency (Hz) hawed who'd hoed 1000 hod Figure 1. Frequencies of the first two for- hud hawed head had mants of 12 of American English, 500 hayed heard hoed hood averaged across 48 adult female speakers. hid who'd heed F1 is in black, F2 is in gray. (Source: Hillenbrand et al., J. Acoust. 0 Soc. Am., 1995) systematically decrease in amplitude. This source have major energy around 2,500 to 3,500 Hz, whereas component is subsequently modified by the vocal tract the more anterior ones [s, z] have major energy well above the larynx, which acts as the filter. This filter above 4,000 to 5,000 Hz. However, in the case of enhances energy in certain frequency regions and sup- with a constriction toward the very front of presses energy in others, resulting in a spectrum with the vocal tract, the extremely short section in front of peaks and valleys, respectively. The peaks in the spec- the constriction does not result in clearly defined spec- trum (local energy maxima) are known as fre- tra. As a result, bilabial [b, p] and labiodental [f, v] con- quencies. The lowest-frequency peak is known as the sonants are described as having diffuse spectra, without first formant, or F1, the next lowest is F2, and so on. any clear concentration of energy. The vocal tract filter is determined by the size and From a linguistic point of view, a detailed description shape of the vocal tract and is therefore directly affect- of speech sounds in terms of their frequency, in addition ed by the position and movement of the articulators to amplitude and duration, can elucidate the factors that such as the tongue, jaw, and lips. shape sound categories and determine phonological Vowels are typically characterized in terms of the processes both within and across languages. In addition, location of the first two , as illustrated in acoustic phonetic analysis may serve to quantify atypi- Figure 1 for the vowels of American English. For a cal speech patterns produced by nonnative speakers or given speaker, each typically has a unique for- speakers with specific speech disorders. mant pattern. However, variation in vocal tract size among speakers often leads to a degree of formant References overlap for different vowels. Consonants can also be described in terms of their Asher, R., and Eugenie Henderson (eds.) 1981. Towards a his- spectral properties. These sounds are produced with a tory of phonetics. Edinburgh: Edinburgh University Press. Fant, Gunnar. 1960. Acoustic theory of speech production. The complete or narrow constriction in the vocal tract, Hague: Mouton. essentially creating a vocal tract with two sections: one Flanagan, James L. 1965. Speech analysis synthesis and per- behind and the other in front of the constriction. The ception. Berlin: Springer-Verlag. length of the section in front of the constriction is one of Hardcastle, William, and John Laver (eds.) 1997. The handbook the primary determinants of the spectra of these sounds. of phonetic sciences. Oxford and Cambridge, MA: Blackwell. The longer this section (i.e. the farther back the con- Helmholtz, Hermann von. 1877. Die Lehre von den striction), the lower the frequency at which a concentra- Tonempfindungen als physiologische Grundlage fur die tion of energy occurs. For example, consonants like k Theorie der Musik. Braunschweig: F. Vieweg; as On the and g, which are produced at the back of the mouth, are sensations of tone as a physiological basis for the theory of typically characterized by a concentration of energy music, translated by Alexander J. Ellis, London: Longmans, Green, 1885. between approximately 1,500 and 2,500 Hz, whereas Hillenbrand, James, Laura A. Getty, Michael J. Clark, and more anterior consonants like t and d typically have a Kimberlee Wheeler. 1995. Acoustic characteristics of concentration of energy above 3,000 Hz. Similarly, the American English vowels. Journal of the Acoustical Society sibilants [,] produced in the middle of the mouth of America 97(5). 3099–111. Jakobson, Roman, Gunnar Fant, and Morris Halle (eds.) 1952.

2 ACOUSTIC PHONETICS

Preliminaries to speech analysis. Cambridge, MA: MIT edition, 1996. Press. Lehiste, Ilse. 1970. Suprasegmentals. Cambridge, MA: MIT Johnson, Keith. 1997. Acoustic and . Oxford Press. and Cambridge, MA: Blackwell. Lieberman, Philip, and Sheila E. Blumstein. 1988. Speech Kent, Raymond D., Bishnu S. Atal, and Joanne L. Miller (eds.) physiology, , and acoustic phonetics. 1991. Papers in Speech Communication: speech production. Cambridge: Cambridge University Press. Woodbury, New York: Acoustical Society of America. Stevens, Kenneth N. 1998. Acoustic phonetics. Cambridge, Ladefoged, Peter. 1962. Elements of acoustic phonetics. MA: MIT Press. Chicago and London: The University of Chicago Press, 2nd ALLARD JONGMAN

3