<<

THE PHARYNGEAL

Ashraf Alkhairy MIT Room 36-547, 50 VasserStreet, Cambridge, MA 02139, USA

ABSTRACT is stochastic, comprising of both deterministic and pseudo- random components. The acousticproperties of the Arabic pharyngeal approximant in In addition to the study of the nature of the sound, a intervocallic sequences are studied. This sound has traditionally comprehensive acoustic analysis is conducted. The analysis is been classified as a voiced , but our researchindicates directed towards investigation of acoustic variables that aid in that it is an approximant, with no IPA symbol. The attenuationin articulator- and source modeling of the pharyngeal the intensity of the from the neighboring is approximant, as well as in determining acoustic cuesfor their around 11 dB, and the fundamental of the consonantis perception. less than that of the vowels by 25 Hz. Study of the This paper is divided into six sections.Section 2 discusses trajectories indicates that Fl increasesand F3 decreasesinto the the samples that were used for analysis. Section 3 consonant for all contexts. The bandwidths remain unchanged investigates the nature of the sound. Section 4 studies the from that of vowels, except in the context of /a/ for which both . Section 5 presents results related to the glottal B2 and B3 increase.Examination of the estimatedglottal volume characteristics. Section 6 concludes the paper and provides velocity indicates an open quotient of lOO%,and a spectraltilt suggestionsfor future research. that is highly dependenton the context with an averageof 36 dB, and a value of 2%dB for the context of /a/. 2. EXPERIMENTS The Modem Standard Arabic (MSA) was used to conduct our 1. INTRODUCTION study. MSA is employed by the entire Arab population for The Arabic consists of a relatively large number of purposes of official communication, teaching, and literature. The produced with constriction in the uvular and speakers consisted of three educated adult male Saudi natives, pharyngeal regions, in addition to locations in the oral cavity with no speech or hearing impairments, and without phonetic between the velum and the lips. The articulation and training. Speech sampleswere recorded in a sound-treatedroom characteristics of the pharyngeal consonants is still not clearly into a digital computer with sampling rate of 1OkHz. understood because of lack of research, and the difficulty of The linguistic material for the experimentswere of the form measurements for these regions. These consonants involve IZNCVIZI, where represents the approximant, and denotes articulation of the tongue root, back wall of the pharynx, and the vowels. The approximant was in geminate form to allow possibly the epiglottis and other structures in the lower vocal more accuratemeasurements, and the vowels consistedof /a/, //, tract [1,2]. and Ii/. The VCV sequence were embedded within // to Traditionally, Arabic has been considered to have two provide the most natural context. In a number of cases the pharyngeal consonants, a voiced and a voiceless fricative properties of the consonant are comparedto that of the vowels. [ 1,2,3,4]. The “voiced fricative” has also been classified as a To accomplish this, reference utterances consisting of /z/V/z/ voiceless stop, voiced glide, and an approximant [ 1,2,3,4,5,6,71. were also recorded,where V representsthe vowels in long form. While these discrepancies in categorization of the voiced The long versions provide better circumstancesfor the study of fricative may attributed to differences in dialect or context, we the steady state characteristics. believe that this not to be the case.Existing studieshave relied on -ray images, and spectrographic plots to reach their 3. THE NATURE OF THE SOUND conclusions.In all cases,the images are not clear enough to make A number of analysis methods were used to determine the distinctions as far as mannersof articulation are concerned,and classification of the consonant under consideration.Both time- the plots are not of adequate resolution to make appropriate domain and frequency-domain analysis results indicate that the judgements. In this paper we make a detailed study of the nature sound is produced without the generation of any turbulence noise of the consonant and conclude that it is indeed an approximant, and without ; and hence should be classified as an and a new IPA symbol is required. approximant - with a new IPA symbol. The analysismethods Classification of a sound, as far as is used are pressure waveform examination, intensity analysis, concerned is important in its acoustic analysis, as well as ,and spectrum. modeling, synthesis, and the determination of perceptualcues. An approximant is produced with two articulators close to one 3.1. The Pressure Waveform another, but without a narrow enough constriction to generate A first indicator of the nature of the sound, although turbulence. In contrast,the areaand shapeof the cavity between inconclusive, is obtained from visual inspection of the the two constricting articulators in a fricative are such as to waveforms in the time-domain. An example is shown in Figure 1 produce turbulent airflow. In the first case,the acoustic signal is (Top). The plot is free from random noise and is representativeof deterministic, whereas in the secondcase the pressurewaveform all the waveforms.

page 1029 ICPhS99 Francisco

hcsgmi fcsgmu I I I

4

3.5 53 1 I 3 100 150 Time (msec) u” 2.5 z- E 2 IL 1.5

1

0.5

-2 ’ I I I I I 0 50 100 150 2al 250 * 250 300 350 400 450 500 Time (msec) Time (msec) Figure 1: Top: The pressurewaveform of the consonantin the Figure2: Spectrogramof the consonantin the contextof/u/. contextof/i/. Bottom: The bandpassfiltered waveform.

Absence of turbulence noise in the consonantcan be fcsgmu comirmed by inspection of the bandpassfiltered waveforms centeredon the third formant, with a bandwidthof 600 Hz [S]. This methodologywas appliedto the waveformsfor all speakers and contexts. The resulting filtered signals consistof damped sinusoidssynchronized with the original waveforms;a typical example of which is shownin Figure 1 (Bottom). We thus infer rz- 0 ----- ;-- --; ---- ;- that the soundis an approximantrather than a voicedfricative. I I I 4i , I I I I -----(---I *----- I g -10 - v I I I 3.2. Intensity Analysis 4 I I I I I I I I I An approximantis likely to havehigher energyin comparisonto I I I -20 - ----4------c----*- 0 I 0 I I I voiced becausethe constrictionin the vocal tract for an I I I I I I approximantis less obstructive. Although voicedfricatives are 30 - ----g-----c----*-, : 1 I I I I I I i I generally attenuatedby 20 dB, no exactranges are known for I I I I I 0 I I Y 1 I I I I I I I I I I I I I I I I I I and voiced fricatives. To examine intensity -40 I , I I , I 1 , 1 0 500 loo0 1500 2000 2500 3000 3500 4000 45llo 5000 characteristicsof our approximant, differences in intensity Frequency (Hz) betweenthe steady state values of the approximantand that of Figure 3: Widebandspectrum of the consonantwithin . the following vowels were computed. The meanvalues are as follows : /a/ context(12 dB), /u/ context(14 dB), /i/ context(11 dB), and overall average(12 dB). These results support the . observationthat the consonantis an approximant. hcsgma 30 I II 1 I I I I I I I I I I I I I I I I I I I I I , I I I , I 3.3. The Spectrogram 20 --+- ----q------I I I I I I To study the global characteristicsof the approximant,digital I I 10 -,I ----- I ------I I spectrogramsof the utteranceswere computedusing a Hanning I I I I , I window of length 7.97 ms, an overlap of 97%, lengthof 5 12 -- - - - - + - - - -- I I points, bandwidth of 180 Hz, and no pre-emphasis.A I I I I representativeexample is displayedin Figure2. The spectrogram 0 I has regular striations and is not diffused, indicatingcontinuous voicing andlack of turbulencenoise.

-30 3.4. The Spectrum Widebandspectra at the steadystate positions of the approximant -40 ----- -----; ---- 4 ----;----: ---- 4 -----; ----: -----; ----- I I I I I I I I I I I I I I I I were computedusing a Hanningwindow of size 6.4 ms with no I I I I I I I I I I I I I I I I I I I I , I 1 , 1 I I -50 pre-emphasisto determinethe generalspectral characteristics of 0 500 1000 1500 Moo 2500 3000 3500 4000 4500 5000 the approximant.As the representativeexample plotted in Figure Frequency (Hz) 3 indicates, the energy is clusteredin the mid-frequencyrange, Figure4: Narrowbandspectrum of the consonantwithin /a/. andno signsof anti-resonanceare present.

page 1030 ICPhS99 San Francisco

To examine aspects related to the duty-cycle of the glottal The data in Table 1 clearly indicatesthat the first formant of source from the spectrum, narrowband spectrawere computed the approximantis larger than that of all the vowels, and that the with a Hanning window of size 25.6 ms. As the illustrative third formant is smaller. This is in agreement with modeling example in Figure 4 indicates, the first harmonic has a large results [6]. In the context of/i/, however, the secondformant is amplitude, implying a high open quotient. This is examinedin smaller. Further research in acoustic is required to greaterdetail in a latter section. explain the result. Bandwidth values for the approximantin the context of /a/ 4. THE FORMANTS indicate a widening of the second and third formants when The primary parametric identifier of a are formant compared to that of the adjacentvowels. This featuremay aid in ; the first three of which can be predictedbased on distinguishing the approximant from the closely soundingback articulatory models in many cases [9, IO]. In this section we vowel, /a/; perceptual studies can validate this hypothesis.The present results related to the first five formant frequenciesand bandwidth data is in disagreementwith predictedmodel-based bandwidths. Although issues related to bandwidth and higher results, and further researchis neededto developmore accurate formants have not been investigatedin-depth in the literature - as models. far as modeling, synthesis, and recognition are concerned,we have included them in this study to provide more comprehensive 5. THE GLOTTAL CHARACTERISTICS resultsthat can be used in the future. Conventional acoustic analysis of a sound does not usually The formant frequenciesand bandwidthswere computedfor include investigation of its glottal attributes.This is becausethey all word sequences using the synchronous covariance LPC are considered secondary in importance to formant method, with the pre-emphasisof 0.95 and filter order of 12 characteristics, and until lately, no reliable and accurate (sometimes 10 or 14). Although the LPC method has been algorithms have existed for determining the glottal features[ 121. criticized for underestimating the bandwidth, it is the most A number of recent publications, however, have practical accurate method to date [ 111. To ensureaccuracy in demonstrated the importance of glottal characteristics in bandwidth estimation, we validated the resulting LPC values producing high-quality synthesized sounds [ 81. In addition, with the measurementsof the spectra. aspects related to articulatory-acoustic models for the glottal The statistical analysis results can be presented as either waveform have receivedconsiderable attention, and are likely to absolute values or as deviations from the references. We can encourage future investigations into analysis of the glottal either compute the statistics of the formants at the steadystate characteristics. times of the approximant within particular vowel contexts,or In this paper, we examine the glottal attributes of the compute the statistics of the difference betweenthe formants of pharyngealapproximant in considerabledetail. First, we examine the approximant within a particular vowel context and the the of the approximant. Next, we formants of the corresponding long vowel in a reference investigate temporal and spectral characteristicsof the glottal utterance. waveform. Tables 1 and 2 present the averagedeviations in formant frequencies and bandwidths respectively. Statistics of 5.1. The Fundamental Frequency differences, rather than absolute values, are tabulatedbecause To study fundamental frequency trajectories statistically, time they are more suitable for modeling purposes, and because duration between impulse markers of the words were computed significant differences can be distinguished from insignificant and reciprocated to computethe fundamentalfrequency in Hz at ones more readily. Cells with “*” indicate statistically each pitch period. Values at the steadystates of the consonant, insignificant changes. preceding and following vowels were selected, while making sure that jitters are avoided. Means of the differencesbetween these values and fundamental frequenciesof the corresponding long vowels spokenby the samespeakers were computedand are listed in Table 3.

I I /a/ context I /u/ context I /i/ context I Average I Pre-vowel -7 0 -1 -3 - Table 1: Deviations in formant frequencies(Hz). Consonant -27 -24 -37 -29 Post-vowel -12 -13 -12 -12 * I /a/ context I /u/ context I /i/ context I Average I Table 3: Deviations in fundamentalfrequencies (Hz). Examination of the results indicate that the fundamental frequencies of the approximant are noticeably lower than that of the vowels by around 30 Hz. The valuesat the consonant,when compared to that of the adjacentvowels, indicatesthat this is a phoneme characteristicrather than a suprasegmentalartifact. The Table 2: Deviations in bandwidths(Hz). deviations for the context of front vowels is larger than that of

page 1031 ICPhS99 San Francisco

back vowels because the fundamental frequencies for front in light of further research.Improved articulatory modelsneed to vowels are higher [ 131.The fundamentalfrequencies for /a/, /u/ be developed to explain the formant trajectory for the consonant and /i/ are 168, 173, and 180 respectively. Such changesin in the context of Ii/. Also, loses in articulatory modelsneed to be fundamental frequency can be expectedbecause of the proximity better understood to explain the bandwidth changesthat occur of the constriction for a pharyngeal approximant to the vocal when the approximantis spokenin the context of /a/. folds. Significant modeling research can be conductedin view of the results provided concerning the glottal characteristics. 5.2. The Glottal Waveform Variations in the open quotient and spectraltilt can be explained Temporal and spectral characteristics of the glottal waveform basedon future modeling approaches. were directly computed from the calculatedglottal waveform at A number of perceptual studies can be conducted to the steady-statepositions of the approximant.A decomposition examine the acoustic featuresthat effect either the intelligibility method was usedto estimatethe glottal signal [ 141.An example, or the quality of the sound.The contribution of bandwidth, open of the obtainedwaveform is shown in Figure 5. quotient, speed quotient and spectraltilt form major factors that may significantly effect the quality of the sound.

ACKNOWLEDGMENTS The author was sponsoredby KACST during his sabbatical leave at MIT, at which time this work was conducted. wishes to thank Kenneth Stevens for numerous discussions.

REFERENCES [l] Al-Ani, S. (1970). Arabic . Mouton, The Hague. [2] Ghazeli, S. (1977). Back consonants and backing coarticulation in Arabic, unpublished Ph.. thesis, Univ. of Texas at,Austin. [3] Delattre, P. (1971). Pharyngeal Features in the Consonants of Arabic, German, Spanish, and American English. Phonetics 23: 129-155. [4] Butcher, A. and Ahmad, . (1987). Some Acoustic and Aerodynamic Characteristics of Pharyngeal Consonants in Iraqi Arabic. Phonetica 44: 156-172. [5] Al-Ani, S. (1967). An Acoustical and Physiological Investigation of the Arabic /*/. Acts of the Xth International Congress on . Bucharest. 0 20 40 60 80 loo 120 140 160 180 200 Time (percentage of pitch period) [6] Klatt D.. and Stevens K.N. (1969). “ Pharyngeal Consonants”, QPR no. 93, RLE, MIT, 207-215 Figure 5: The glottal waveform for the approximantin the (two [7] Alwan, A. (1986). Acoustic and Perceptual Correlates of Pharyngeal pitch periods shown). and Uvular Consonants, unpublished Master’s thesis, MIT. [8] Klatt, D. and Klatt, L. (1990). Analysis, Synthesis and Perception of The graph shows a low-bandwidth signal with an open Voice Quality Variations Among Female and Male Talkers. Journal of quotient (OQ) of 100% and a large speedquotient (SQ). These the Acoustical Society of America, 87:820-857. results are representative of all occurrencesof the approximant, [9] Fant, G. (1970). Acoustic Theory of . Mounton. as indicatedby the statistical analysisshown in Table 4. The last [lo] Stevens, K. (1998). . To be published. [ 1 l] Wakita, H. (1980). New Methods of Analysis in Speech Acousics. row in the table lists the spectraltilt (TL), which measuresthe Phonetics 37:87-108. attenuation (in dB) betweenthe amplitudesat 300 and 3000 Hz. [ 121 Fant, G. (1993). Some Problems in Voice Source Analysis. Speech The average spectral tilt is around40 Hz, which is the sameas Communication, 13 :7-22. that of vowels, and is in agreement with the attribute of ideal [13] Lehiste, I. and Peterson, G. (1961). Transitions, Glides and models. The spectraltilt for the approximantin the context of /a/, . Journal of the Acoustical Society of America, 33:268-277 is however low at around 20 dB. This may be a featurethat is [14] Alkhairy, A. (1999). An Algorithm for Glottal Volume Velocity used to enhancethe distinction of the approximant from the Estimation. Proceedings of ICASSP, March 1999. closely soundingvowel /a/.

Table 4: Glottal characteristics.

6. CONCLUSION This paper has investigatedthe acousticproperties of the Arabic pharyngeal approximant. While some of our findings are in agreementwith existing predictions, othersneed to be examined

page 1032 ICPhS99 San Francisco