Modeling and Synthesis of the Lateral /l/

Adrienne M. Prahler

Department of Electrical Engineering and Computer Science, Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

Abstract: The lateral consonant in English is generally produced with a backed tongue body, a midline closure of the tongue blade at the alveolar ridge, and a path around one or both of the lateral edges of the tongue blade. In pre-vocalic laterals, the release of the closure causes a discontinuity in the spectral characteristics of the sound. Past attempts to synthesize syllable-initial lateral consonants using formant changes alone to represent the discontinuity have not been entirely satisfactory. Data from prior research have shown rapid changes not only in the formant frequencies but also in the glottal source amplitude and spectrum and in the amplitudes of the formant peaks at the consonant release. Further measurements have been made on additional utterances, guided by models of lateral production. Synthesized lateral-vowel syllables that included appropriate rapid changes in source amplitudes and in formant frequencies are judged by listeners to sound as natural as spoken syllables. The effect of including additional parameters, such as formant bandwidths and the location of a pole-zero pair, on the naturalness of the synthesized lateral-vowel syllables is not clear in initial perception tests, and more sensitive testing is required.

The lateral consonant /l/ is produced with a backed and lowered tongue body and occlusion at the alveolar ridge. A complete closure is not made with the tongue, and airflow continues around the edges of the tongue. The first formant is low, although it is higher than that typically found for a high vowel, and the second formant frequency, F2, is barely separated from F1. The third formant generally has a relatively strong amplitude and is higher in frequency than F3 for most vowels. The lateral is prone to considerable variation depending on the individual and phonetic context, and this variability makes it more difficult to characterize than other consonants (1, 2).

MODELING

Source-filter modeling of laterals can help to identify the various acoustic characteristics important for these sounds. The vocal tract during the lateral consonant can be modeled as a tube with constrictions and side branches. The production of the lateral /l/ with an alveolar point of articulation creates an interior cavity formed by the tongue blade. An additional cavity is created under the tongue which couples with the back cavity, creating poles and zeros during the lateral (3). The side branches around the tongue affect the high frequency components of the lateral and account for the pole-zero pairs observed (2). Recent work by Narayanan et al. (4) using Magnetic Resonance Imaging (MRI) and electropalatographic (EPG) techniques confirmed the presence of side lateral channels with great variation across subjects and phonetic contexts. Individual differences in the exact locations of the pole-zero pairs are expected since the lengths of the side branches are so variable among speakers. In addition to the pole-zero pair, the extreme constriction during the lateral production causes a reaction on the glottal source, and leads to a decreased amplitude of the volume velocity waveform at the glottis (5). The model also suggests that acoustic losses manifested in bandwidth increases are significant during the lateral.
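As a rough illustration of why the zero location depends on side-branch length, the sketch below treats a side branch as an idealized uniform tube closed at its far end, whose lowest zero falls at the quarter-wavelength frequency c/(4L). The closed-tube idealization, the speed of sound (35400 cm/s), and the example lengths are assumptions made for illustration; they are not taken from the model or the measurements reported here.

```python
# Assumed speed of sound in warm, moist air (cm/s); not a value from this paper.
SPEED_OF_SOUND_CM_S = 35400.0


def side_branch_zero_hz(length_cm, c_cm_s=SPEED_OF_SOUND_CM_S):
    """Lowest zero (Hz) of an idealized uniform side branch closed at its far end.

    The branch shorts the main tube where its input impedance vanishes,
    which first happens at the quarter-wavelength frequency c / (4 * L).
    """
    if length_cm <= 0:
        raise ValueError("length_cm must be positive")
    return c_cm_s / (4.0 * length_cm)


if __name__ == "__main__":
    # Hypothetical branch lengths, chosen only to show the sensitivity to length.
    for length_cm in (2.0, 2.5, 3.0):
        print(f"L = {length_cm:.1f} cm -> zero near {side_branch_zero_hz(length_cm):.0f} Hz")
```

Under these assumptions, a change of a few millimeters in branch length moves the zero by several hundred hertz, which is consistent with the expectation that pole-zero locations differ among speakers.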

MEASUREMENTS

A database of utterances containing several repetitions of prevocalic /l/ followed by six different vowels was recorded by four speakers. Acoustic analysis of these utterances examined attributes of the sound that provided information about back reactions on the glottal source during the lateral, pole-zero pairs, and increased bandwidths. Measurements of the formant frequencies and amplitudes were taken at two points in each utterance using a 6.4 ms Hamming window and averaging over 12 ms. The first measurement was taken 20 ms before the release of the lateral and the second 20 ms after the release. Shown in Figure 1 is the average change in amplitudes between the two measurement points for the four speakers. The amplitudes of the first three formant peaks increase by at least 7 dB during the 40 ms surrounding the release of the lateral. The increase in A1 and A2 cannot be accounted for simply by the changes in frequencies of the formants during the release, which are shown in Figure 2. The additional increase in amplitude can be accounted for by changes in the source, the bandwidths, and pole-zero pairs.
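The analysis window and the amplitude comparison can be sketched as follows. The code builds a Hamming window of the stated 6.4 ms duration and expresses the ratio of two peak amplitudes in dB. The 10 kHz sampling rate and the example amplitude ratio are assumptions, since neither is given above, and the function names are hypothetical.

```python
import math


def hamming_window(duration_s, fs_hz):
    """Return Hamming window coefficients spanning duration_s at sampling rate fs_hz."""
    n = int(round(duration_s * fs_hz))
    if n < 2:
        raise ValueError("window must contain at least two samples")
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def amplitude_change_db(amp_before, amp_after):
    """Change in level (dB) between two linear spectral peak amplitudes."""
    if amp_before <= 0 or amp_after <= 0:
        raise ValueError("amplitudes must be positive")
    return 20.0 * math.log10(amp_after / amp_before)


if __name__ == "__main__":
    window = hamming_window(0.0064, 10000.0)  # assumed 10 kHz sampling rate
    print(len(window), "samples in the 6.4 ms window")
    # Hypothetical peak amplitudes measured before and after the release.
    print(f"{amplitude_change_db(1.0, 2.24):.2f} dB change")
```

A linear amplitude ratio of about 2.24 corresponds to roughly 7 dB, the minimum increase reported for the first three formant peaks.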

FIGURE 1. Change in amplitude of formant peaks (Delta A1, Delta A2, Delta A3), plotted by word.

FIGURE 2. Change in first two formant peak frequencies, plotted by word.

SYNTHESIS

The theoretical model of the lateral, together with data from the measurements, was used to guide the synthesis of two words using the Klatt synthesizer (6). The Klatt synthesizer parameters that were manipulated were formant changes, TL (tilt), BW (bandwidths of formants), AV (amplitude of voicing), and additional poles and zeros. In planning the experiments, the changes of AV were included in all synthesized versions because informal listening suggested they were necessary to obtain a reasonable synthesized lateral. For each word, a basic synthesized version was created by altering only the amplitude of voicing and the formant frequencies at the lateral release. A second version was created by including alterations of the formant bandwidths. The final synthesized utterance was produced by adding a pole-zero pair around the third formant and abruptly changing the pole-zero spacing at the time of release. These three synthesized versions were presented five times each, in random order, to four listeners, together with the natural utterances and 6 foils. The listeners were asked to rate each utterance on a continuous scale of 0 to 1 in terms of its naturalness. The results showed that all synthesized stimuli were judged to be just as good as the original utterance, although there were individual differences. Since there were no significant differences in the naturalness of the various synthesized utterances, a more sensitive test, possibly involving comparisons of pairs of stimuli, is needed to determine the actual impact of additional parameters on the quality of the synthesis.
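To make the effect of the pole-zero spacing concrete, the sketch below evaluates the gain of a pole-zero pair built from a standard second-order digital resonator of the kind used in Klatt-style formant synthesizers, with the antiresonator taken as its inverse. The sampling rate, bandwidth, and pole and zero frequencies are hypothetical values chosen for illustration; they are not the settings used in the experiment, and the function names are not part of any synthesizer's interface.

```python
import cmath
import math


def resonator_coeffs(freq_hz, bw_hz, fs_hz):
    """Coefficients of y[n] = a*x[n] + b*y[n-1] + c*y[n-2] with unity gain at DC."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c


def resonator_gain(f_hz, freq_hz, bw_hz, fs_hz):
    """Complex frequency response of the resonator evaluated at f_hz."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs_hz)
    z1 = cmath.exp(-2j * math.pi * f_hz / fs_hz)  # z^-1 on the unit circle
    return a / (1.0 - b * z1 - c * z1 * z1)


def pole_zero_gain_db(f_hz, pole_hz, zero_hz, bw_hz, fs_hz):
    """Gain (dB) of a resonator at pole_hz cascaded with an antiresonator at zero_hz."""
    h = resonator_gain(f_hz, pole_hz, bw_hz, fs_hz) / resonator_gain(f_hz, zero_hz, bw_hz, fs_hz)
    return 20.0 * math.log10(abs(h))


if __name__ == "__main__":
    fs, bw = 10000.0, 100.0  # assumed sampling rate and bandwidth (Hz)
    for zero_hz in (2600.0, 2900.0):  # coincident pair, then a 300 Hz spacing
        gain = pole_zero_gain_db(2600.0, 2600.0, zero_hz, bw, fs)
        print(f"zero at {zero_hz:.0f} Hz: gain at the pole = {gain:.1f} dB")
```

When the pole and zero coincide they cancel and the pair is spectrally invisible; separating them introduces a peak near the pole and a dip near the zero, so an abrupt change in spacing at the release produces an abrupt change in the spectrum near the third formant.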

CONCLUSIONS

The naturalness of the synthesized utterances approaches that of the original utterances when rapid transitions of the formants and changes in the amplitude of voicing are included. The perceptual effects of including changes in the bandwidths of the formants and the addition of a pole-zero pair are still not clear. The method of comparison of the various utterances was not sensitive enough to determine the effect of additional parameters on the quality of the synthesized lateral. However, these results support the conclusion that the key acoustic characteristic of the prevocalic lateral /l/ is an abruptness in the source amplitude and in the frequencies F1 and F2 at the release of the lateral into the vowel. A good quality, natural sounding synthesized lateral must include this abruptness. [Work supported in part by a LeBel Fellowship and by NIH Grant DC00075.]

REFERENCES

1. Espy-Wilson, C., Journal of Acoustical Society of America 92, 736-757 (1992).
2. Stevens, K.N., Acoustic Phonetics, Cambridge, MA: MIT Press, in press.
3. Fant, G., Acoustic Theory of Speech Production, The Hague: Mouton, 1960.
4. Narayanan, S., Alwan, A., Haker, K., Journal of Acoustical Society of America 101, 1064-1078 (1997).
5. Bickley, C.A., and Stevens, K.N., "Effects of vocal tract constriction on the glottal source: Data from voiced consonants," in T. Baer, C. Sasaki, and K. Harris (eds.), Laryngeal Function in Phonation and Respiration, San Diego: College Hill Press, 1987, pp. 239-253.
6. Klatt, D.H., and Klatt, L.C., Journal of Acoustical Society of America 87, 820-857 (1990).
