Lingual Articulation of the Sūzhōu Chinese Labial Fricative Vowels
Total Page:16
File Type:pdf, Size:1020Kb
LINGUAL ARTICULATION OF THE SŪZHŌU CHINESE LABIAL FRICATIVE VOWELS Matthew Faytak1, Jennifer Kuo1, and Shunjie Wang1 1University of California, Los Angeles [email protected], [email protected] ABSTRACT Figure 1: Frontal lip positions at vowel midpoint for the three vowels at issue for S44. In this paper, we describe lingual activity for three back vowels with different labial constrictions in [u] [ɯβ] [ɯv] Suzhou Chinese: a rounded high back vowel [u] and two more unusual vowels, vertically compressed [ɯβ] and labiodental [ɯv]. The latter two vowels are so-called fricative vowels, consistently produced with slight fricative noise; [ɯβ] also sporadically ex- hibits bilabial trilling. Smoothing-spline ANOVA sion in a vowel with a similar but distinct acous- models of tongue surface contours extracted from ul- tic quality that we transcribe as [ɯβ]; and fricative- trasound recordings suggest that [ɯβ] and [ɯv] have like labiodental constriction in a vowel we transcribe a tongue position lower and fronter than [u]. Linear as [ɯv]. Typical lip positions for each vowel are mixed effects modeling of contour shape parameters shown in Figure 1. The latter two vowels can be also suggests that [u] has a less flat and more com- said to continue the constriction type of the conso- plex, back-raised tongue shape than [ɯβ] and [ɯv]. nants that obligatorily precede them: [ɯβ] occurs We relate the lowered tongue body of [ɯβ] and [ɯv] only after bilabial stops /p, ph, b/, and [ɯv] occurs to the trading relations between lingual and labial ac- only after labiodental fricatives /f, v/ [25, 16]. On tivity that characterize rounded vowels, as well as to distributional grounds, [u] and [ɯβ] are in fact con- aeroacoustic properties of labial fricatives and bil- trastive (though [u] and [ɯv] are allophonic): al- abial trills. though [ɯβ] is restricted in its occurrence, [u] may Keywords: Ultrasound, Wu Chinese, trading rela- occur in the same environment, and minimal pairs tion, fricative vowels are easily found (see Table 1). The two vowels [ɯβ] and [ɯv] have been called 1. INTRODUCTION syllabic labial fricatives or labial fricative vowels, in part due to the acoustic consequences of their The labial component of vowel articulation is known labial constrictions [25, 27, 16]. For [ɯv], labioden- to involve several different types of constriction, in- tal frication from the preceding [f] or [v] onset con- cluding in-rounding, out-rounding, and vertical bil- tinues through the entire vowel. Any fricative noise abial compression [15, 17]. While usage of differ- present in [ɯβ] is more spectrally diffuse and sub- ent labial constriction types most often systemati- tle, but it is also sporadically produced with bilabial cally varies with the backness of the vowel—front trilling. The aerodynamic conditions for trilling do rounded vowels tending to be out-rounded and back not seem to be met in all tokens of [ɯβ]; when this rounded vowels tending to be in-rounded—different is the case, the resulting vowel quality is not easily behaviors can be found cross-linguistically [17, 26]. distinguished from [u], but the position of the lips is A distinction between out-rounding and compres- still visually distinct from [u]. Vowels with similar sion also minimally contrasts vowel categories in a constrictions to these have been attested in a handful handful of languages, notably in a pair of Swedish of languages of southwestern China [7, 3, 4, 9] and high vowels ranging from central to front [13, 22] Cameroon [10, 20]. and in a pair of back vowels in Shanghainese [2]. While the labial articulations of [ɯβ] and [ɯv] Like Shanghainese, Sūzhōu Chinese exhibits a have been remarked upon in various languages, to three-way contrast in labial activity in a set of back our knowledge, only Ling [16] has collected data vowels: rounding in [u] that varies between in- on their lingual articulation; this data is somewhat rounding and out-rounding; vertical lip compres- limited, however, both in speaker population (n=3) 1440 and spatial resolution (three fleshpoints tracked us- ing EMA). We thus aim to characterize the lingual Table 1: Stimuli with the simplified Chinese char- β v acters used for display. Notation for each vowel as activity of [ɯ ] and [ɯ ] relative to [u] in Sūzhōu used in [16] is provided for reference. Chinese with a larger data set and a richer spatial rep- resentation using ultrasound tongue imaging. Sev- Vowel As in [16] Stimulus Gloss eral factors lead us to hypothesize that the tongue 44 will exhibit a greater back-raising excursion for [u] [u] o 疤 [pu] ‘scar’ than the other two vowels, which have more substan- [ɯβ] u 播 [pɯβ]44 ‘spread, sow’ tial lip constrictions with fewer articulatory degrees [ɯv] u 夫 [fɯv]44 ‘husband’ of freedom. In particular, rounded vowels such as [i] ɪ 边 [pi]44 ‘side’ [u] are known to exhibit trading relations in realizing [æ] æ 包 [pæ]44 ‘package’ their low F2 targets [21], which all three Sūzhōu Chi- nese vowels share [16]. A prediction of this model is that [ɯβ] and [ɯv], having more consistent and crophone attached to the stabilization headset’s right more constricted lip settings, should have less of a cheekpad arm. Audio was digitized using a Focus- need for lingual activity in service of a back vowel rite Scarlett 2i2 USB audio interface, which was also quality (i.e., low F2), and may be produced with a configured to accept the synchronization pulse train lower or centralized tongue position relative to [u]. generated by the ultrasound device for time align- This study can also be seen as a test of whether ment of the articulatory and acoustic signals. some lingual articulatory properties of similar con- sonants extend to the segments at issue. Known 2.2. Materials aerodynamic requirements for producing labioden- tal fricatives and bilabial trills, which may be shared Stimuli, shown in Table 1, were presented as sim- by [ɯv] and [ɯβ], respectively, make different pre- plified Chinese characters in randomized order. dictions. Active lowering of the tongue dorsum has The stimuli were interspersed with other characters been observed during the production of labioden- which have readings containing different consonan- tal fricative consonants [24], which should extend tal onsets and vowels not relevant to the present straightforwardly to [ɯv]. However, known lingual study. Stimuli were displayed on a computer screen, articulatory strategies for bilabial trills, which pref- and participants were instructed to produce them in a erentially occur before high back vowels [8], predict frame sentence with a reading appropriate to Sūzhōu tongue dorsum height similar to [u]. Chinese rather than Standard Chinese. Participants produced 9–13 tokens of each stimulus. 2. METHODS 2.3. Analysis 2.1. Data collection The series of ultrasound frames recorded during the Participants were 15 native speakers of Sūzhōu Chi- production of each vowel token was extracted from nese (13 F, 2 M) who took part in the larger study the ultrasound video. For each vowel token, tongue described in [11]. Participants have similar residen- surface contours consisting of 100 sampling points tial and linguistic histories: all are long-term resi- were extracted from the frames in this series using dents of Sūzhōu, and all report native proficiency in EdgeTrak [14]. The frame closest to the acoustic Sūzhōu Chinese and high or native-like proficiency midpoint of each target vowel was selected. Each in Standard Chinese. participant’s set of midpoint tongue surface contours Ultrasound video was recorded using an was submitted to a smoothing-spline analysis of vari- Echo Blaster ultrasound device equipped with ance (SSANOVA) [5]; splines are generated in polar a PV6.5/10/128 Z-3 microconvex probe, with a coordinates and then converted to Cartesian coordi- typical frame rate of 54 Hz. The ultrasound probe nates to avoid distortion in the tongue tip and root was held in place under the chin using an Articulate regions [19]. The resulting models are not directly Instruments, Ltd. stabilization headset [23]. The comparable across participants due to variation in probe angle relative to the occlusal plane varies vocal tract morphology and probe orientation. from participant to participant given the need to A discrete Fourier transform (DFT) was also per- accommodate differences in participant jaw and formed on the contour for each midpoint frame for chin morphology. [u], [ɯβ], and [ɯv], following the method imple- Audio was recorded at a sampling rate of 44.1 mented in [6]. DFT converts contour data into a kHz using a Sony ECM-77B electret condenser mi- smaller set of numerical coefficients representing the 1441 frequency and magnitude of sine and cosine func- Figure 2: SSANOVA splines for four representa- tions that can be combined to model the basis data. tive speakers; anterior is right. The DFT used here is computed from the tangent an- S03 S07 gle of each point in the contour and makes no refer- ence to a fixed coordinate system. As such, the co- efficients analyzed here do not reflect contour posi- tion or rotation, but rather solely contour shape. This property is desirable for the present study, given that the direction of speaker-specific articulatory lines u u ɯβ ɯβ may differ in physical space owing to variation in ɯv ɯv vocal tract morphology. S37 S44 DFT coefficients have real and imaginary por- tions, which correspond to the phase and magni- tude of the sinusoidal basis functions, respectively [6]. Coefficients’ real parts model the anterior- posterior position of the bulk of the tongue, and co- u u efficients’ imaginary parts the extent of bunching or ɯβ ɯβ ɯv ɯv raising. The order n of coefficients corresponds to a wavelength of 1/n contour arc lengths for the si- but we hypothesize that both will point to a larger nusoidal functions represented, such that higher co- back-raising excursion for rounded [u] than for the efficients represent greater spatial frequencies.