Analysis of F0 Contours of Cantonese Utterances Based on the Command-Response Model ISCA Archive
Total Page:16
File Type:pdf, Size:1020Kb
INTERSPEECH 2004 -- ICSLP 8th International Conference on Spoken ISCA Archive Language Processing http://www.isca-speech.org/archive ICC Jeju, Jeju Island, Korea October 4-8, 2004 Analysis of F0 Contours of Cantonese Utterances Based on the Command-Response Model Wentao Gu1, 2, Keikichi Hirose1 and Hiroya Fujisaki1 1The University of Tokyo, Japan 2Shanghai Jiaotong University, China {wtgu; hirose}@gavo.t.u-tokyo.ac.jp [email protected] somewhat from one reference to another. Such a notation Abstract system provides a simplified canonical form for tones in As a major Chinese dialect, Cantonese is well known for its isolated syllables, but the scales are subjective and relative, complex tone system. This paper applies the command- and the actual F0 values change with the tonal context, word response model to represent the F0 contours of Cantonese accentuation and phrase intonation. In most references, the speech. Analysis-by-Synthesis is conducted on both utterances five scales are interpreted as indicating the beginning and end of carrier sentences and utterances with less constrained F0 values in a syllable, and the F0 contour is approximated by structures, from which a set of appropriate tone command piecewise straight lines. However, this is too simple to patterns is derived. By intrinsically incorporating the effects of describe the actual F0 contours accurately. tone coarticulation, word accentuation and phrase intonation, To overcome the intrinsic limitation of the traditional tone the model provides a high accuracy of approximation to the F0 scale notation, we introduce the command-response model for contours of Cantonese, and hence serves as a much better the generation process of F0 contours of Cantonese. method to quantitatively describe the continuous F0 contours than the traditional tone letter scale notation system. The Table 1: Some traditional descriptions of Cantonese tones. constraints in timing and amplitude of tone commands are also investigated, which can be used for synthesis of F0 contours. Tone name in Tone * Pitch 5-scale Middle Chinese system number feature notation 1. Introduction Upper-level T1 High level55 Upper-elevating T2 Mid rising 35 Non- An accurate and quantitative representation of the essential Upper-departing T3 Mid level 33 entering characteristics of the F0 contours of speech is necessary for Lower-level T4 Low falling21 both text-to-speech synthesis and automatic speech tones Lower-elevating T5 Low rising 13 recognition, especially for tone languages. For this purpose, Lower-departing T6 Low level 22 the command-response model for the process of F contour 0 Upper-entering T7 High level 5 generation, proposed by Fujisaki and his coworkers, is useful Entering Middle-entering T8 Mid level 3 since it can generate very close approximations to observed F tones 0 Lower-entering T9 Low level 2 contours from a relatively small number of linguistically meaningful parameters. The model has been successfully * Note: T1~T4 here are different from those of Mandarin. applied to tone languages including Mandarin and Thai [1]. In this paper we investigate the possibility of its application to 3. The command-response model for 10.21437/Interspeech.2004-295 Cantonese, a Chinese dialect with a complex tone system. generating F0 contours of tone languages 2. Cantonese tone system Figure 1 shows the diagram of the model for tone languages. It describes F0 contours in the logarithmic scale as the sum of Cantonese is one of the major dialects of Chinese spoken by phrase components, tone components and a baseline level. over 60 million people in Hong Kong, Guangdong and The phrase commands (impulses) produce phrase components Guangxi provinces of China, and in many overseas Chinese through the phrase control mechanism, giving the global communities. Although Cantonese largely shares the same shape of the F0 contour, while the tone commands (pedestals) writing system and the same monosyllabic nature as Mandarin, of both positive and negative polarities generate tone it is much richer in the number of tone types. It is usually components through the tone control mechanism, accepted that Cantonese has nine citation tones, which characterizing the local F0 changes. Both mechanisms are preserve the tonal categories of Middle Chinese (7th~10th assumed to be critically-damped second-order linear systems. century A.D.). Table 1 gives some traditional descriptions of all the nine citation tones. PHRASE COMMANDS Ap Gp (t) The syllables of entering tones end with an unreleased PHRASE TONE COMPONENT T PHRASE 03 COMPONENTS PHRASE COMPONENT CONTROL lnF (t) T T t 0 stop coda /p/, /t/ or /k/, and are comparatively shorter in 01 02 MECHANISM duration than those of non-entering tones. Each entering tone lnFb has its counterpart of non-entering tone, showing a similar F0 lnFb t A TONE COMMANDS FUNDAMENTAL pattern. T7, T8 and T9 correspond to T1, T3 and T6 t Gt(t) FREQUENCY TONE CONTOUR respectively. Therefore in some transcription schemes only six T12 TT14 24 T 26 CONTROL TT11 21 T23 T15 t MECHANISM TONE tones are distinguished. T 22 (T13 ) T 25 (T16 ) COMPONENTS Traditionally a 5-scale tone letter notation system is Figure 1: The command-response model for F0 contour adopted for Cantonese after Chao [2], though it varies generation with both positive and negative tone commands. The model can be formulated by the following equations: 4. Analysis of Cantonese F0 contours I ln F 0 (t ) = ln F b + ∑ A pi G p (t − T 0 i ) i =1 4.1. Speech data J + ∑ A tj {G t (t − T1 j ) − G t (t − T 2 j )}, (1) Two sets of speech material were used: Speech Material A j =1 was designed with exactly the same carrier sentences, while α 2 t exp( −α t ), t ≥ 0, Speech Material B included various meaningful sentences. ( ) (2) G p t = Speech Material A consists of carrier sentences “hon3 0, t < 0, gin3 ‘maa’ faai3 gong2 ceot7 lai4” (Speak it out quickly min[ 1 − (1 + β t ) exp( − β t ), γ ], t ≥ 0, when you see ‘maa’), in each of which the target syllable maa, G t (t ) = (3) 0, t < 0, carrying each of the nine tones, is embedded. Each sentence was uttered seven times by a native male speaker of where Gp (t) represents the impulse response function of the Cantonese (from Guangzhou) at his normal speech rate. phrase control mechanism, and Gt (t) represents the step Although in the lexicon there are no characters corresponding response function of the tone control mechanism. Fb is a to maa2 and maa3, the speaker was trained to utter pseudo baseline value. The parameters Api and T0i denote the magnitude and time of the ith phrase command respectively, words for them. Speech Material B consists of 20 declarative sentences while Atj, T1j and T2j denote the amplitude, onset time and offset time of the jth tone command respectively. The each with 5~14 syllables. Each sentence was recorded three constants α, β and γ are set at their respective default values times at normal speech rate. 3.0 (1/s), 20.0 (1/s) and 0.9 in the current study, though the The speech signal was digitized at 10 kHz with 16bit values of β and γ can be set for positive and negative tone precision. The fundamental frequency was extracted by the commands separately. modified autocorrelation analysis of the LPC residual. The A set of tone command patterns needs to be specified for syllable boundaries and rhyme boundaries were labeled by the model to fit a specific tone language. Different sets of visual inspection of the waveform and spectrogram. tone command patterns have been assigned to Mandarin and Thai respectively [1]. 4.2. Tone command patterns This model incorporates the effects of tone coarticulation, First, the F0 contours of Speech Material A were word accentuation and phrase intonation simultaneously in an approximated by the procedure of Analysis-by-Synthesis. A explicit way. Tone coarticulation is automatically taken care sample utterance for each target tone is shown in Fig. 2. The of by the transfer characteristics of the tone control crossed symbols indicate the observed F0 values, while the mechanism. Word accentuation can be implemented either by solid lines, dotted lines and dashed lines indicate the magnifying the amplitudes or by lengthening the duration of approximated F0 contours, baseline frequency and the tone commands. Phrase intonation is explicitly represented by contribution of phrase components, respectively. the phrase components. WAVEFORM WAVEFORM WAVEFORM FUNDAMENTAL FREQUENCY FUNDAMENTAL FREQUENCY FUNDAMENTAL FREQUENCY Fo(t) hon3 gin3 maa1 faai3 gong2ceot7lai4 Fo(t) hon3 gin3 maa2 faai3 gong2ceot7lai4 Fo(t) hon3 gin3 maa3 faai3 gong2ceot7lai4 180 [Hz] 180 [Hz] 180 [Hz] 140 140 140 100 100 100 60 60 60 -0.5 0.0 0.5 1.0 1.5 -0.5 0.0 0.5 1.0 1.5 -0.5 0.0 0.5 1.0 1.5 PHRASE COMMANDS PHRASE COMMANDS PHRASE COMMANDS 1.0 Ap 1.0 Ap 1.0 Ap 0.5 0.5 0.5 0.0-0.5 0.0 0.5 1.0 1.5 0.0-0.5 0.0 0.5 1.0 1.5 0.0-0.5 0.0 0.5 1.0 1.5 TONE COMMANDS TONE COMMANDS TONE COMMANDS 1.0 At 1.0 At 1.0 At 0.5 0.5 0.5 0.0 0.0 0.0 -0.5 0.0 0.5 1.0 1.5 -0.5 0.0 0.5 1.0 1.5 -0.5 0.0 0.5 1.0 1.5 TIME [s] TIME [s] TIME [s] WAVEFORM WAVEFORM WAVEFORM FUNDAMENTAL FREQUENCY gin3 faai3 gong2ceot7 lai4 FUNDAMENTAL FREQUENCY gong2 ceot7lai4 FUNDAMENTAL FREQUENCY Fo(t) hon3 maa4 Fo(t) hon3 gin3 maa5 faai3 Fo(t) hon3 gin3 maa6 faai3 gong2ceot7lai4 180 [Hz] 180 [Hz] 180 [Hz] 140 140 140 100 100 100 60 60 60 -0.5 0.0 0.5 1.0 1.5 -0.5 0.0 0.5 1.0 1.5 -0.5 0.0 0.5 1.0 1.5 PHRASE COMMANDS PHRASE COMMANDS PHRASE COMMANDS 1.0 Ap 1.0 Ap 1.0 Ap 0.5 0.5 0.5 0.0 -0.5 0.0 0.5 1.0 1.5 0.0-0.5 0.0 0.5