An Introduction to Sciences (Acoustic Analysis of Speech)

Rezaei Nasser , BSc, P.G.D., MA, PhD

Assistant Professor; University of Social Welfare and Rehabilitation Sciences, Salehi Abolfazl , BSc , MSc Lecturer , University of Social Welfare and Rehabilitation Sciences, Tehran.Iran

Abstract fconsonants and vowels such as, voice onset time (VOT), peech sciences deal with the and spectrograms will be presented. S acoustical characteristics of speech by means of sophisticated soft wares as Key words: well as hard wares. Although, a speech Acoustic analysis, VOT, Spectrogram, science is a well known science in the Formant , Disordered speech developed countries, especially the west- ern societies, however, it has been Introduction remained almost unknown in Iran, though, Speech science is the study of all the fac- in recent years a group of scholars have tors involved in producing, transmitting, been involved in this branch of science. perceiving and comprehending speech, The application of the speech sciences in including all relevant aspects of , the field of 'Speech and Language physiology, neurology and , as Therapy' has been widely accepted well as phonetics. Speech analysis began throughout the world. The main focus of in 1940s (i.e., during the Second World these applications is the therapeutic War) in the United States of America. The aspects of the program, which is used for study of from an both adults and children. However, the acoustical point of view provides the non-therapeutic applications are mainly means for looking at a very complex used for research purposes only. process in a simpler way. Speech science In this article, a brief introduction about the is the study of all the factors involved in nature as well as the application of speech producing, transmitting, perceiving and analysis for both normal and abnormal comprehending speech, including all is given. In addition, examples of the acoustical characteristics

vol.4 - No .4 5 relevant aspects of anatomy, physiology, neurology and acoustics, as well as pho- an engine, but rather we hear the vibrations netics. that are broadcast, or transmitted, in a medi- Acoustics is the branch of physics that um like air. Figure B-1 shows a simple physi- deals with , and Psychoacoustics cal arrangement to demonstrate the nature is the study of the psychological of sound. The sound source is a stretched response to sound; it is a division of psy- rubber band that can be pulled to set it into chophysics, or the general study of psy- vibration. The band undergoes a series of to- chological responses to physical stimuli. and-fro vibration after it is pulled. The initial The study of speech acoustics has both a physical and psychological side. The vibrations are of large amplitude, meaning physical side pertains to the physical that the to-and-fro swings have a relatively structure of the of speech. The great motion. The amplitude diminishes until psychophysical side is concerned with motion eventually stops altogether. The the of these sounds. A proper reduction in amplitude reflects damping, or understanding of speech requires knowl- the loss of energy. edge of both of these aspects of speech acoustics. A well known riddle asks, "If a Oscillograms: tree falls in the forest, but there is no The first major advance in the acoustic one to hear it, does it make a sound?" analysis of speech began with Oscillograms Of course, the answer depends on the (, or graphs of amplitude over definition of terms. If sound is defined with respect to human perception, then time) which was an important advance. In no sound could be verified. But if sound this method, the sounds selected for analy- is defined as a physical disturbance in sis were often vowels, because they are rel- the air, then sound must have occurred. atively easier to analyse than most conso- A riddle more regarding speech might nants. be: "If speech is made visible as pat- terns on paper (or a video monitor), but The Henrici Analyser: no one hears it, is it really speech?" Is one of the earliest tools for spectral Sound is vibration. Vibration is a repeti- analysis. It is a mechanical device consisting tive to-and-fro motion of a body. Usually of five rolling integrating Units (glass we do not directly hear the vibration in a spheres). They could obtain Oscillograms, sound source such as tracing on a surface,

Iranian Rehabilitation Journal 6 playback of original or edited signals; F0; calculating the values of amplitude, and Jitter or Shimmer analysis; spectrogram; pressure. FFT' LPC; and speech synthesis.

Filtering: The Speech Chain: A filter is a frequency-selective trans- Understanding of subject of spoken com- mission system. That is, a filter will pass munication requires knowledge of many dis- energy at certain frequencies but not at ciplines. The Speech Chain provides an others. A filter allows a selective look at easy-to-understand introduction to these the energy in various frequency regions. topics and explains how events within these domains interact to bring about communica- The Spectrograph: tion by speech. A convenient way of exami- Provided major advantages to the nation what happens during speech is to study of speech and was developed in take the simple situation of two people talk- 1940s. It was fast, provided a better ing to each other. For example, you as a description of the energy concentration in speaker want to transmit information to speech, and finally, it produced a running another person, the listener. The first think short-term spectrum, enabling the scien- you have to do is to arrange your thoughts, tists to visualise how energy concentra- decide what you want to say and then put tions changed in time. The display of the what you want to say into linguistic form. running short-term spectrum is called The message is put into linguistic form by spectrogram. selecting the right words and phrases to express its message, and by placing these Digital Signal Processing of Speech: words in the order required by the grammat- The dominance of the spectrograph ical rules of the language. This process is was seriously challenged only with the associated with activity in the speaker's introduction of digital computers. The , and it is from the brain that appropri- challenge has intensified with the contin- ate instructions, in the form of impulses ued refinement of computers (hardware) along the motor nerves, are sent to the mus- and analysis programs (software). Some cles that activate the vocal organs (the developments in the use of digital meth- , the , the , and the ods for speech analysis are: ). The nerve impulses display; waveform editing;

vol.4 - No .4 7 set the vocal muscles into movement, travel along the acoustic nerve to the listen- which in turn, produce precise pressure er's brain. In the listener's brain, a consider- change in the surrounding air. We call able amount of activity is already taking this pressure change a sound wave. place, and this activity is modified by the Sound waves are often called acoustic nerve impulses arriving from the ear. This waves, because acoustics is the branch modification of brain activity, in ways that are of physics concerned with sound. yet fully understood, brings about recognition The movement of the vocal organs gen- of the speaker's message. We see, there- erate a speech sound wave that travels fore, that speech communication consists of the air between speaker and listener. a chain of events linking the speaker's brain Pressure changes at the ear activate with the listener's brain. We shall call this the listener's hearing mechanism and chain of events the speech chain. produce nerve impulses that

Acoustic Analysis of Speech These include; waveform analysis, FFT or Acoustic analysis of speech is the LPC analysis, voice onset time (VOT) meas- study of the acoustical characteristics urements, formant frequency measurements, of speech, both normal and abnormal and so on. speech. It involves the physical aspects of spoken language.

Iranian Rehabilitation Journal 8 Some of these properties will be reviewed VOT which are shown in the schematic briefly. diagram below: -Zero VOT: where the onset of vocal fold Voice Onset Time (VOT): vibration coincides (approximately) with Voice Onset Time (VOT) for instance, is the the plosive release. duration of the period of time between the -Positive VOT: where there is a delay in release of a plosive and the beginning of the onset of vocal fold vibration after the vocal fold vibration. This period is usually plosive release. measured in milliseconds (ms), which is a -Negative VOT: where the onset of vocal microscopic analysis. It is useful to distin- fold vibration precedes the plosive guish at least three types of release.

Formant frequency with the lowest frequency ). Each Formant is the vocal tract resonance. formant can be described by two charac- Theoretically, there is an infinite number of teristics: centre frequency (commonly formants, but for practical purposes only the called formant frequency) and bandwidth lowest three or four are of interest. (formant bandwidth, which is a measure Formants are identified by formant numbers of the width of energy in the frequency (e.g., F1, F2, F3, and F4, numbered in suc- domain. cession beginning

vol 4 - No .4 9 Acoustical Characteristics of Vowels deed, the air behind the tongue will vibrate Producing different vowels is like altering at about 250 Hz, and the air in front of it at the size and the shape of the vocal tract, about 2100 Hz. which is like a tube. The air in this tube is set in vibration by the pulse of air from The relationship between the tongue posi- the vocal folds. Every time they open and tion and formant frequencies close, the air in the vocal tract above The frequency of F1 is inversely related to them will be set in vibration. Because, tongue height (e.g., high vowels have a low the vocal tract has a complex shape, the F1 frequency), and the frequency of F2 is air within it will vibrate in more than one related to tongue advancement (e.g., F2 way. Usually we can consider the body of frequency increases as the tongue position air behind the raised tongue (i.e., in the moves forward in the ). throat) to vibrate in one way, and the F1: High vowels such as /i/ and /u/ have a body of air in front of the tongue (i.e., in relatively low frequency of the first formant the mouth) to vibrate in another way. In (F1), whereas the low vowels /a/ and /ae/ the vowel /i:/ as in have a relatively high frequency of this for- mant, that is, the frequency of

Iranian Rehabilitation Journal 10 F1 varies inversely with tongue height of anterior dimension of vowel articulation. the vowel. This result points to an articulatory-acoustic F2: On the other hand, back vowels /u/ correspondence: the frequencies of the first and /a/ share a low frequency of F2, two formants, F1 and F2 can be related to whereas the front vowels /i/ and /ae/ have dimensions of vowel articulation. a high F2, that is, the frequency of the second formant varies with the posterior-

Figure 4: Illustration of the tongue position for the production of the vowels /i:/ and /u/.

Figure 5: The relationship between the tongue position and formant frequencies for different vowels.

vol. 4 - No .4 11 The Acoustic Characteristics of a)Vowels & Diphthongs Disordered Speech b)Sonorants: nasals, glides, liquids The Characteristics of disordered speech c)Obstruents: e.g., stops, fricatives are somewhat different from those of the normal speech. These include: Dysfluent vs. Fluent Vowels and voiceless plosives Howell and Vause (1986), in an attempt to The acoustic properties and formant fre- investigate the acoustic properties of stut- quencies of stutterers for fluently and tered speech, analysed the spectral prop- dysfluently produced vowels, as well as erties of 8 English vowels in adult stutter- the duration and amplitude of vowels ers in comparison with the subsequent were examined. The results revealed fluent vowels. They intended to examine that dysfluent vowels were shorter than some of the previous researchers findings the fluently produced vowels. Other such as Van Riper (1971) who had stated acoustic data showed that the brief vow- that, the stutterer produces the schwa els in repetitions had formant locations vowel at a point where some other vowel at the same frequencies as in the fluent is required. In this experiment, words con- speech of the same speaker. Stuttered sisting of a voiceless consonants in a vowels were low in amplitude and short CVC syllable were investigated, where the in duration, which caused stuttered vow- final consonant in the fluent word includes els to be perceived as schwa. nasals, fricatives (voiced and voiceless),

Iranian Rehabilitation Journal 12 Figure 6: Idealised spectrogram of a young male who stutters producing a sound/syllable repetition (SSR) on the word "but" with one iteration of "bu-" in the stuttered or repeated portion of the SSR ("bu-but"). The stuttered portion is immediately followed by the fluent portion used for stuttered versus fluent com- parisons (Yaruss and Conture, 1993, p. 888).

Figure 7: Spectrograms for the sentence "mama made apple jam" produced by a a speaker with velopharyngeal incompetence (hepernality) at the top (A) and by a normal speaker at the bottom (B).

vol.4 - No .4 13 References: Kent, R. D., and Read, C. (1992). The Acoustic Analysis of Speech. USA, Ball, M. J. and Code, C. (1997). Singular Publishing Group, Inc. Instrumental Clinical Phonetics. England, Rezaei-Aghbash, N. (2003). Acoustic Whurr Publishers. Characteristics of Dysfluent Speech: A Denes, P. B., and Pinson, E. N. (1993) Study of Persian and English stutterers. The Speech Chain: The Physics and An unpublished PhD dissertation. UK. Biology of Spoken Language. New York, Van Riper, C. (1971). The Nature of W. H. Freeman and Company. Fry, D. B. (1982). The Physics of Stuttering. 2nd edition. NJ (USA), Speech. Cambridge: Cambridge Prentice-Hall, Englewood Cliffs. University Press. Yaruss, J. S., and Conture, E.G. (1993). Howell, P., and Vause, L. (1986). F2 Transition During Sound/Syllable Acoustic Analysis and Perception of Repetitions of Children Who Stutter. Vowels in Stuttered Speech. Journal of the Acoustic Society of America, 79 (5), 1571-1579.

Iranian Rehabilitation Journal 14