The Phonetic Specification of Contour Tones: the Rising Tone in Mandarin
ICPhS XVII Special Session Hong Kong, 17-21 August 2011

Hyesun Cho (Seoul National University, Korea) & Edward Flemming (Massachusetts Institute of Technology, USA)

ABSTRACT

This paper investigates the phonetic specification of contour tones through a case study of the rising tone in Mandarin. The patterns of variation in the realization of the rising tone as a function of speech rate indicate that the specification of this tone involves targets for the slope of the F0 rise, the magnitude of the rise, and the alignment of the onset and offset of the rise. These targets conflict so the realization of the tone is a compromise between them. This analysis is formalized as a quantitative model of tone realization formulated in terms of weighted constraints enforcing tone targets.

Keywords: contour tones, phonetic constraints

1. INTRODUCTION

In autosegmental analyses contour tones are represented as a sequence of level tones [4], and this conception has been extended to the level of phonetic implementation in many models of intonation. That is, the F0 contour is derived through interpolation between point targets associated with individual High and Low tones, even if the tones form part of a bi-tonal accent such as a rising pitch accent [7]. On this view the transitions between tones are the result of general principles of interpolation rather than being specified by tonal targets. This is made explicit in the Segmental Anchoring Hypothesis (SAH), according to which ‘the beginning and end of a pitch movement are anchored to specific locations relative to segmental structure, while the slope and duration of the pitch movement vary according to the segmental material with which it is associated’ [6]. This approach has been successfully applied to rising pitch accents in languages such as Greek [1].

Intonational and lexical tones are represented in the same way in phonology so we might expect their phonetic realizations to be similar, but F0 slope is an important cue to the distinction between level and rising lexical tones in a language like Mandarin, so we might expect the slope of the transition to have a specified target in such tones, contrary to the SAH [5, 10].

In this study we investigate the phonetic specification of the Mandarin rising tone, testing for targets for three properties of the tone (fig. 1): (1) the alignment of the onset (L) and offset (H) of the rise to the segmental string, (2) the magnitude of the rise (M), and (3) the slope of the rise (S), by examining this tone under variation in speech rate.

Figure 1: A schematic illustration of a rising F0 movement.

If the onset and offset of the rise are consistently aligned to segmental anchors then as speech rate increases, moving the anchors closer together, the duration of the rise should decrease. If speakers try to maintain a target magnitude of F0 rise then a decrease in rise duration will result in increased slope, whereas if they try to maintain a constant slope it will result in a rise of smaller magnitude. If speakers try to keep both slope and magnitude constant, then variation in the duration of the rise must be limited and consistent segmental anchoring will not be possible. Thus the patterns of variation with speech rate can reveal the nature of the targets of the rising tone.

2. EXPERIMENT

2.1. Speech materials and methods

Four speakers of Beijing Mandarin Chinese, two male and two female, were recorded reading 15 disyllabic words containing almost all sonorant consonants, where each syllable carries the rising tone. The analysis focuses on the first rising tone. The words were produced in a carrier phrase that placed the target word after a low tone. The subjects read the materials, together with filler sentences, at normal, fast and slow rates.
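The three predictions set out at the end of section 1 all follow from one identity: the magnitude of the rise equals its average slope times its duration. As a sketch, write D = H - L for the rise duration and T_M, T_S for hypothesized magnitude and slope targets (the notation the paper uses later in table 1). The primed symbols, denoting values at a faster rate with a shorter rise, are notation assumed here for illustration only, and all quantities are taken to be positive.

```latex
\begin{align*}
M &= S\,D, & D &= H - L \\
M = T_M:\quad S' &= \frac{T_M}{D'} > \frac{T_M}{D} = S
  & &\text{(shorter rise, } D' < D\text{: slope increases)} \\
S = T_S:\quad M' &= T_S\,D' < T_S\,D = M
  & &\text{(shorter rise: magnitude decreases)} \\
M = T_M,\ S = T_S:\quad D &= \frac{T_M}{T_S}
  & &\text{(duration is fixed, so L and H cannot both track the anchors)}
\end{align*}
```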
2.2. Measurements

The F0 trajectory of the rising tones generally shows a relatively level interval followed by a rise (fig. 2). H is taken to correspond to the F0 maximum, but L does not correspond to an F0 minimum, particularly where the low plateau shows a slight rise, as in fig. 2 (cf. [9]). Intuitively, L is at the ‘elbow’ between the plateau and the rise. This point was located algorithmically, adapting a procedure described in [8]: the F0 trajectory from F0 minimum to F0 maximum was approximated by three straight lines, fitted to minimize squared error. The onset of the rise is the beginning of the steepest line segment (‘e2’ in fig. 2). The timing of L and H was measured relative to word onset. The magnitude M is then F0 at H minus F0 at L, and the average slope is M divided by the duration between L and H. The peak velocity of the rise (‘m’ in fig. 2) was measured by fitting cubic smoothing splines to the F0 trajectory between the F0 minimum and maximum, and finding the peak of the derivative of the smoothed curve. However, peak velocity is highly correlated with average slope (r² = 0.92), so we will only report on the latter here. Segment boundaries in the target word were labeled manually.

Figure 2: Pitch track of the word mingnian ‘next year’.

3. RESULTS

The speech rate manipulation successfully elicited a wide and continuous range of syllable durations (fig. 3). Even the fastest speech rate was judged by a native speaker to be clearly comprehensible.

3.1. Segmental anchoring

Anchor points for L and H were identified by looking for segmentally defined points that had the smallest squared deviation from the F0 events across all syllable durations. Segment boundaries and proportions of syllable and rime duration were tested. The best anchor for L, $A_L$, was about half way through the rime (56% of rime duration) and the anchor for H, $A_H$, was at 107% of rime duration, i.e. just after the end of the syllable (cf. [2, 9]).

Figure 3: Timing of H (circles) and L (+) as proportions of rime duration plotted against rime duration (ms).

The tones remain close to these anchor points, as predicted by the SAH, but there are systematic deviations from these alignment targets as a function of speech rate (fig. 3). The figure shows the timing of L and H as proportions of the syllable rime plotted against rime duration. The dashed lines mark $A_L$ and $A_H$. H occurs progressively later than $A_H$ (above the dashed line) as speech rate increases (i.e. as rime duration decreases), whereas L occurs progressively earlier than $A_L$ as speech rate increases. That is, speakers are deviating from the anchors to avoid the rise becoming too short. This is expected if speakers have targets for slope and magnitude, since maintaining relatively constant slope and magnitude implies limiting variation in rise duration.

3.2. Magnitude and slope of the rise

Neither slope nor rise magnitude is actually constant. Magnitude increases with increasing rise duration (fig. 4), but not sufficiently to maintain a constant slope, so slope decreases with increasing rise duration (fig. 5).

The effect of duration on magnitude is significant: a linear mixed effects model predicting rise magnitude as a function of duration, with random intercepts by speaker, fits significantly better than a model according to which rise magnitude is constant (χ²(1) = 111, p < 0.0001). A similar test shows that the effect of duration on slope is also significant (χ²(1) = 55, p < 0.0001).

To sum up: none of the properties of the rise are invariant across rates, and we observe all of the patterns of variation outlined in section 1, indicating that there are targets for all three properties of the rise. It is not possible to realize all of these targets and this conflict is resolved by a compromise between them: as $A_H$ and $A_L$ move closer to each other at faster speech rates, so do L and H, but the tones lag behind their anchors, so slope increases and magnitude decreases.

Figure 4: Rise magnitude plotted against rise duration. The line shows the model of this relationship discussed in section 4.

syllable duration, is selected to minimize violation of the constraints. The cost of violation of a constraint is equal to the square of the deviation from the target (table 1), and the total cost of a candidate set of values for the timing of L and H and the average slope, S, is the weighted sum of its constraint violations (1). Minimizing this cost yields a compromise between the various targets.

Table 1: Constraints and costs of violations.

| Target      | Constraint  | Cost of violation        |
| Magnitude   | $M = T_M$   | $w_E (S(H-L) - T_M)^2$   |
| Slope       | $S = T_S$   | $w_S (S - T_S)^2$        |
| L alignment | $L = A_L$   | $w_L (L - A_L)^2$        |
| H alignment | $H = A_H$   | $w_H (H - A_H)^2$        |

(1) $\mathrm{Cost} = w_E (S(H-L) - T_M)^2 + w_S (S - T_S)^2 + w_L (L - A_L)^2 + w_H (H - A_H)^2$

The constraints specify targets for M, S, and the timing of L and H, but we select values of just L, H and S because M can be calculated from these quantities as $S(H-L)$ (slope times rise duration). We will refer to the duration $H-L$ as D.

The minimum cost lies at the point where the partial derivatives of the cost function (1) are equal to zero. Accordingly we can derive the

Figure 5: Average slope of the rise plotted against rise duration.
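As a rough illustration of how minimizing the cost in (1) yields a compromise between targets, the sketch below evaluates the cost and its partial derivatives and looks for the point where those derivatives vanish. Everything numerical here is an assumption: the target values, the weights and the units (seconds, Hz, Hz/s) are invented for illustration and are not estimates from the paper. The alternating update scheme is simply one convenient way to reach a stationary point and is not claimed to be the authors' procedure.

```python
# Hypothetical targets and weights, chosen only for illustration
# (times in seconds, F0 in Hz, slope in Hz/s). None are from the paper.
targets = {"AL": 0.10, "AH": 0.20, "TM": 30.0, "TS": 200.0}
weights = {"E": 1.0, "S": 0.01, "L": 1.0e4, "H": 1.0e4}


def cost(L, H, S, t, w):
    """Cost (1): weighted sum of squared deviations from the four targets."""
    D = H - L  # rise duration
    return (w["E"] * (S * D - t["TM"]) ** 2
            + w["S"] * (S - t["TS"]) ** 2
            + w["L"] * (L - t["AL"]) ** 2
            + w["H"] * (H - t["AH"]) ** 2)


def gradient(L, H, S, t, w):
    """Partial derivatives of cost (1) with respect to L, H and S."""
    D = H - L
    e = 2.0 * w["E"] * (S * D - t["TM"])  # factor shared by the magnitude term
    dL = -e * S + 2.0 * w["L"] * (L - t["AL"])
    dH = e * S + 2.0 * w["H"] * (H - t["AH"])
    dS = e * D + 2.0 * w["S"] * (S - t["TS"])
    return dL, dH, dS


def minimize_cost(t, w, iterations=200):
    """Run a fixed number of alternating exact updates of (L, H) and of S.

    The alternating scheme is an assumption made for this sketch.
    """
    L, H, S = t["AL"], t["AH"], t["TS"]  # start with alignment and slope on target
    k = 1.0 / w["L"] + 1.0 / w["H"]
    for _ in range(iterations):
        # With S fixed, setting dCost/dL = dCost/dH = 0 is linear in L and H.
        D = (t["AH"] - t["AL"] + k * w["E"] * S * t["TM"]) / (1.0 + k * w["E"] * S ** 2)
        e = w["E"] * S * (S * D - t["TM"])
        L = t["AL"] + e / w["L"]
        H = t["AH"] - e / w["H"]
        # With D fixed, setting dCost/dS = 0 is linear in S.
        S = (w["E"] * t["TM"] * D + w["S"] * t["TS"]) / (w["E"] * D ** 2 + w["S"])
    return L, H, S


L, H, S = minimize_cost(targets, weights)
print(f"L = {L:.3f} s, H = {H:.3f} s, S = {S:.1f} Hz/s, M = {S * (H - L):.1f} Hz")
```

With these made-up values the anchors are closer together than the duration T_M/T_S that would satisfy both the magnitude and the slope target. The stationary point therefore places L before its anchor and H after its anchor, with a slope above T_S and a magnitude below T_M, which is the qualitative pattern reported in section 3 for fast speech.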