The Palatal Stop: Results from Acoustic-Articulatory Recovery of Articulatory Movements

Christian Geng∗, Ralf Winkler†∗ and Bernd Pompino-Marschall‡∗
∗ Research Centre for General Linguistics
† Institute for Communications Research, Technical University Berlin
‡ Humboldt-University Berlin

ABSTRACT

The articulatory data situation with respect to the palatal stop in Czech is unsatisfactory: static X-rays, linguograms and palatograms still seem to be the state of the art. This study assesses the potential benefit of acoustic-articulatory recovery strategies for determining place-of-articulation features. Results indicate recovery problems when area functions are used as input data; these problems vanish when linear articulatory models are used. The reconstructions suggest a primary laminal and a secondary dorsal component in the articulation of the Czech palatal stop.

1 INTRODUCTION

To our knowledge, nobody has so far made recordings of Czech palatal stops using contemporary methods such as electropalatography (EPG) or electromagnetic articulography (EMA). To give a short overview of the data situation and the fragmentary knowledge available (unless stated otherwise, cited after Keating & Lahiri [1]), we contrast the Czech palatal stop with the velar stop and the palatal glide. Comparing the velar and palatal stops, it was observed that the occlusions of the velars are slightly longer than those of the palatals. Stevens [2] uses occlusion lengths of 2 cm in simulating the velar stop. The data further suggest that the occlusion is made with the blade, not the tip or body of the tongue. Contrasting the palatal stops with the palatal glide [j], both involve a laminal and a dorsal component; the difference lies in which of the two dominates. For the stop, the laminal articulation is dominant and the dorsal one secondary; the reverse holds for the palatal glide.

Figure 1: Left: Tracing of palatogram; the contacted area is shaded. Right: Tracing of linguogram; the contacted area is shaded (from Daneš, F., Hála, B., Jedlička, A. and Romportl, M. [6], after Keating & Lahiri [1]).

Figure 2: Sagittal X-ray of a Czech palatal stop (from Daneš et al. [6], after Keating & Lahiri [1]).

Various techniques have been proposed to derive useful vocal tract information from the speech signal. The LPC approach uses the fact that the filtering process of the lossless uniform tube model of the vocal tract is the same as that of the optimal inverse filtering of the speech signal with proper boundary conditions at the glottis and the lips [3]. Sorting methods sample the articulatory parameters of an articulatory model and establish tables of vocal tract shapes and related acoustic representations, usually formant frequencies. These tables are then used for matching vocal tract geometry and acoustic representations. One approach within a computer-sorting framework was described by Atal, Chang, Matthews and Tukey [4].

Another approach is described in a more recent paper by Story and Titze [5]. They establish a mapping between the first two formant frequencies and vocal tract area functions of vowels measured by Magnetic Resonance Imaging (MRI). In the first step, each vocal tract area function in the data set is interpolated to a constant number of sections. From these empirical area functions, a neutral, schwa-like configuration is calculated as the mean of the whole data set, which is in turn subtracted from each individual (empirical) area function. The resulting matrix of displacements from the neutral configuration is subjected to a Principal Component Analysis (PCA) (or, in the authors' preferred term, an empirical orthogonal mode decomposition), and the first two factors are retained. The resulting solution can be used as a generator of an infinite number of area functions. Story and Titze's model sweeps the first two coefficients of the solution through 50 steps each to generate 2500 distinct area functions, which, by an area-to-spectrum transformation and formant peak picking, yield a large table of factor coefficients and associated F1/F2 pairs. In the next step, a data gridding and cubic-spline interpolation technique is applied, which can in turn be used to associate a given F1/F2 pair with the corresponding coefficients to produce the desired area functions.

Finally, nonlinear optimization approaches remain to be mentioned. One of these approaches, called Simulated Annealing, is an extension of the classical Metropolis algorithm [7]. Simulated Annealing is a stochastic optimization technique that can handle quite arbitrary degrees of nonlinearity, discontinuity and stochasticity in high-dimensional parameter spaces. The variant used here is Adaptive Simulated Annealing [8], a more flexible variant of classical Simulated Annealing as described in [9].

We can now formulate the aim of this study: from formant information, we will try to recover meaningful articulatory information with respect to the palatal stop in Czech.

2 DATA

2.1 Articulatory Data

For one speaker of German, 3D MRI data on 26 held articulations (the tense vowels [iː, yː, eː, ɛː, øː, ɑː, oː, uː], the lax vowels [ɪ, ʏ, ɛ, œ, a, ɔ, ʊ], the neutral vowels [ə] and [ɐ], the nasals [m, n, ŋ], the fricatives [f, s, ʃ, ç, x] and the lateral [l]) were recorded at the radiology department of Virchow Hospital Berlin with the procedure Fast Brain SAG TSE 979 8.0 90 on a Gyroscan NT, Philips Medical Systems. The recordings were made as 18 sagittal slices of 3 mm thickness at steps of 3.5 mm with a pixel dimension of 0.586 x 0.586 mm. One recording lasted about 12 s. Phantoms of the teeth were later inserted by means of interactive software. The 3D area functions were calculated in three steps: (i) from a preliminary semi-polar grid on the midsagittal slice, a geometrical midline of the tract was constructed; (ii) a second grid, perpendicular to this midline, was constructed and smoothed with respect to sudden changes in the direction of airflow; and (iii) the 3D data were determined by calculating the area of the air column at the planes defined by these gridlines through all 18 slices.

2.2 Acoustic Data

The acoustic raw material consists of the palatal stops of Czech in syllable-initial position as described in the Handbook of the International Phonetic Association [10] (available at http://web.uvic.ca/ling/resources/ipa/handbook.htm). These recordings were lowpass filtered and resampled at 11 kHz. Formant tracks were created using PRAAT [11]. As the acoustic representation of the recovery model consists of formant information, the first recovered area functions reported are not associated with the complete occlusion, but with the first valid frame of the formant track. The formant tracking procedure did not find realistic formant values for some frames due to the presence of friction. Since the focus here is on articulatory recovery, these values were imputed by means of linear interpolation. The formant track for the voiced cognate is given in Fig. 3.

Figure 3: Formant track of the CV transition for [cɛlo] and [ɟɛlo]. (Plot: formant frequencies [Hz], 0 to 4000, against time [ms], 0 to 60; legend: voiced, voiceless.)

3 RESULTS

For our own data, Story & Titze's method worked best if tense vowels were used as input data, i.e. those that are close to the border of the maximal vowel space (MVS, [12]). In other words, only a subset of our area functions was used. As this method is, at the same time, based on the first two principal components only, we resorted to another strategy: a lookup table was generated by sweeping the values of the first three factors, which explained about 83% of the variance, through ±3 standard deviations around the mean of each factor, using all of our empirical area functions as input. This resulted in a table of about 64000 area functions.

This table could have been pruned according to suggestions made by e.g. Boë et al. [13] in order to retain only the articulatorily meaningful configurations. They removed configurations with constrictions smaller than 20 mm² (the lower limit for laminar airflow), configurations with constrictions centered less than 4.5 cm from the glottis (these are articulatorily impossible), and configurations whose formants do not lie in the range of the MVS (according to the MVS, the minimum of the first formant is at 250 Hz, and the second formant ranges between 510 and 2295 Hz). Since the focus here is on the generation and interpretation of vocal tract target configurations rather than on vowel spaces or trajectory generation, an alternative procedure was used: Fur- [...]

Figure 4: Reconstructed area functions for the first target frame of the voiceless stimulus [cɛlo]. Solid line: reconstruction scheme used by Story & Titze. Dotted line: table search. (Plot: area [cm²], 0 to 4.5, against distance from glottis [cm], 0 to 20; legend: ST98, table search.)

[...] high partitioning of the input space might explain why all our attempts to obtain reasonable configurations by means of Simulated Annealing failed. On the other hand, control analyses with Simulated Annealing using the articulator-based model of Maeda [14] quickly resulted in reasonable reconstructions (see Fig. 5). The only constraint imposed on the articulators was that the control parameter tongue body was not allowed to move more than 1.5 standard deviations around its mean, in order to prevent meaningless configurations. The cost function was specified as the percentage deviation of the original formants from the target formants.
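The cost function mentioned above is not spelled out further in the text. A minimal sketch of one plausible reading is given below, assuming that the formants produced by a candidate configuration are compared with the target formants and that the per-formant percentage deviations are averaged. The function name formant_cost, the averaging, and the example values are assumptions made for illustration, not the authors' implementation.

```python
def formant_cost(model_formants, target_formants):
    """Mean absolute deviation of model formants from target formants, in percent.

    Assumption: deviations are taken relative to the target value and
    averaged over all formants; the source only says "percentage of deviation".
    """
    if len(model_formants) != len(target_formants):
        raise ValueError("formant lists must have the same length")
    if not target_formants:
        raise ValueError("at least one formant is required")
    deviations = [
        100.0 * abs(m - t) / t
        for m, t in zip(model_formants, target_formants)
    ]
    return sum(deviations) / len(deviations)


# Hypothetical example values in Hz: F1 is 10% off, F2 matches exactly.
cost = formant_cost([550.0, 1500.0], [500.0, 1500.0])
```

An optimizer such as Simulated Annealing would then search the articulatory parameter space for the configuration that minimizes this value.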

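The pruning criteria attributed to Boë et al. [13] in the Results section can be sketched as a simple filter over table entries. This is an illustration only, under stated assumptions: an entry is represented by a list of cross-sectional areas in cm² ordered from glottis to lips, sections have equal length, and the constriction is taken to be the narrowest section. The function name is_plausible and this data layout are hypothetical; the thresholds are the ones quoted in the text.

```python
def is_plausible(areas_cm2, section_length_cm, f1_hz, f2_hz):
    """Return True if a table entry passes the three pruning criteria.

    Assumptions (not from the source): areas run from glottis to lips in
    equal-length sections, and the constriction is the narrowest section.
    """
    min_area = min(areas_cm2)
    # Centre of the narrowest section, measured from the glottis.
    idx = areas_cm2.index(min_area)
    constriction_pos_cm = (idx + 0.5) * section_length_cm

    if min_area < 0.20:            # 20 mm^2 = 0.20 cm^2
        return False
    if constriction_pos_cm < 4.5:  # constriction too close to the glottis
        return False
    if f1_hz < 250.0:              # below the F1 minimum of the MVS
        return False
    if not (510.0 <= f2_hz <= 2295.0):  # outside the F2 range of the MVS
        return False
    return True


# Hypothetical 20-section area function with a constriction in section 11.
example_areas = [3.0] * 10 + [0.5] + [3.0] * 9
keep = is_plausible(example_areas, 0.85, 300.0, 2000.0)
```

Applied to a full lookup table, such a filter would simply drop every entry for which the function returns False.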