Emotional Intonation in a Tone Language: Experimental Evidence from Chinese
ICPhS XVII, Regular Session, Hong Kong, 17-21 August 2011

Aijun Li (a), Qiang Fang (a) & Jianwu Dang (b, c)
(a) Institute of Linguistics, Chinese Academy of Social Sciences, Beijing, China; (b) Tianjin University, Tianjin, China; (c) Japan Advanced Institute of Science and Technology, Japan

ABSTRACT

Chinese is a tonal language. How lexical tones and intonation interact is an open question. In this study, we investigated emotional intonation by analyzing monosyllabic utterances from two speakers. We found that the tonal space, the edge tone and the duration differ greatly across 7 emotions, and that the two speakers showed consistent production patterns. The speakers expressed 'Disgust' or 'Angry' with a kind of 'Falling' successive addition tone, and 'Happy' or 'Surprise' with a kind of 'Rising' successive addition tone, as pointed out by Chao [1]. The function of the successive addition boundary tone is to express the speaker's emotion rather than linguistic information. We show that it is more reasonable to use both traditional boundary tone features (H% or L%) and successive addition tone features (r, f or le) to describe the boundary tones of emotional intonation. For instance, 'H-f%' stands for a high boundary tone with a falling successive addition tone.

Keywords: emotion, Chinese, intonation, successive addition tone, boundary tone

1. INTRODUCTION

The intonation of tonal languages like Chinese is an autosegmental element independent of the lexical tone element, although the two elements are expressed by the same F0 curve. Moreover, the F0 curve conveys both linguistic and paralinguistic functions [4, 5, 13, 14, 16]. Hence, to understand the encoding mechanism of expressive speech in speech communication, we examined the interplay between tone and intonation in Chinese emotional speech.

Xu et al. [16, 17] suggested that emotional meanings are encoded along a set of benefit-oriented bio-informational dimensions which involve both segmental and prosodic aspects of the vocal signal. Xu et al. further argued that the bio-informational dimensions allow emotional meanings to be encoded in parallel with non-emotional meanings; thus there is unlikely to be an autonomous affective prosody.

Chao was one of the pioneers in the study of Chinese emotional/expressive intonation. He proposed that the actual melody or pitch movement of a tonal language differs from the mere succession of the fixed tones of that language. It is in fact a resultant of three elements: the tone or etymological tone, the neutral intonation, and the expressive intonation, the latter two together forming sentence intonation. He emphasized that the expressive intonation depends on the quality of the voice, unusual degrees of stress (or weakness), the general pitch of the whole phrase and the tempo of speech [1].

Chao [1] distinguished at least two types of tone and intonation addition patterns: simultaneous addition and successive addition. Simultaneous addition refers to tones that are the algebraic sums or the resultants of two factors: the original lexical tone and the sentence intonation proper. Successive addition refers to a clause that has a rising or falling intonation which is not added simultaneously to the last syllables but added on successively after the lexical tones are completed. He described the successive addition rising ending (↗) and falling ending (↘) with the following formulae:

T1: ↗55:= 56:      ↘55:= 551:
T2: ↗35:= 36:      ↘35:= 351:
T3: ↗214:= 216:    ↘214:= 2141:
T4: ↗51:= 513:     ↘51:= 5121:

Here T1 to T4 are the four lexical tones, 1 to 5 are tonal values in the five-level tone-letter system [3], and 6 represents an extra-high pitch. Numbers on the left of ':=' are the lexical tone values and the numbers on the right are the addition tone values. The most interesting effect is shown with the Qusheng (T4), which results in a circumflex tone.

Chao even enumerated 40 intonation patterns to demonstrate the forms and functions of intonation, grouping them according to pitch/duration elements, voice quality and intensity elements [1, 2].

The successive addition tone has been found in expressive speech, where it signals emotion-attitudinal messages (Mueller-Liu [11]), and in intonational questions, where it signals the interrogative mood [10].

In our study, we divide expressive speech into emotional speech and attitudinal speech. In previous research based on Wu Zongji's intonation theory [13, 14], we found that speakers use few H% boundary tones to express happy and friendly speech [8, 12], and that sentence stress shifts considerably across the observed emotions [6].

Our previous studies support the 'bio-informational dimensions' theory proposed by Xu [17] and the expressive intonation elements suggested by Chao [1]. In the present paper, we focus on emotional intonation by analyzing the F0 and duration patterns of monosyllabic utterances in seven emotions. The boundary or edge tones were examined to demonstrate the tone and intonation addition pattern in emotional speech.

2. MATERIALS AND RECORDING

The data used here were taken from our emotional speech corpus. A set of 111 sentences with varying lengths (from 1 to 14 syllables), sentence types (narrative or interrogative) and structures was recorded. The contents of all these sentences were emotionally 'neutral'. The monosyllabic sentences covered the combinations of the 4 tones and all the vowels. Two professional voice actors, one male and one female, were recruited to produce the sentences in 7 kinds of emotions: Disgust (D), Sad (S), Angry (A), Happy (H), Surprise (SU), Fear (F) and Neutral (N). The sampling rate and resolution were 16 kHz and 16 bits.

To clearly illustrate the tone and intonation addition patterns in the present study, we chose 36 monosyllabic sentences covering 9 Mandarin vowels and 4 lexical tones for each speaker, altogether 9 syllables × 7 emotions × 4 tones × 2 speakers = 504 monosyllabic emotional utterances. All the F0 and duration data of these utterances were extracted with Praat and checked manually.

The F0 values of the tone-bearing part of each syllable were time-normalized to 10 points. They were then transformed to the semitone scale with a reference frequency of 75 Hz. Finally, all the F0 values were mapped into a 5-tone value space [3, 13, 14].

3. FUNDAMENTAL FREQUENCY

Figure 1 shows the F0 patterns of the 7 emotional utterances in the 5-letter tone space. Compared to the neutral emotion, the tonal pattern varies greatly among the 7 emotions in terms of tonal range, tonal register and tonal contours. The results show that the tone patterns are consistent across the two speakers for all emotions except 'Angry', where the male speaker has a much more shrunken pattern in amplitude than the female speaker.

[...] register, 'Neutral' and 'Fear' have medial band intonation in medial register, while 'Angry' and 'Happy' have broader band intonation in higher register. For the female speaker, the F0 values are almost the same as those of the male speaker except for 'Angry', which is expressed with a reduced pitch range in medial register.

Figure 1: F0 of 7 emotional utterances with 4 tones in the 5-tone scale (female: left column; male: right column).
[Figure 1 panels, top to bottom: 'Neutral', 'Surprise', 'Angry', 'Happy', 'Sad', 'Fear', 'Disgust'. Each panel plots tones T1 to T4, with F0 (5-letter tone scale, 0 to 5) on the vertical axis and a horizontal axis running from 0 to 10.]
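
The F0 normalization described in Section 2 (10-point time normalization, conversion to semitones relative to 75 Hz, and mapping into a 5-tone value space) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The function names are hypothetical, linear interpolation is assumed for the time normalization, and the paper does not state how the 5-tone mapping was computed, so linear min-max scaling over a given semitone range is assumed here.

```python
import math

REF_HZ = 75.0  # reference frequency stated in the paper


def resample(values, n_points=10):
    """Linearly interpolate a contour to n_points equally spaced samples.

    Assumption: the paper only says F0 was "normalized into 10 points";
    linear interpolation is one plausible way to do that.
    """
    if len(values) < 2:
        raise ValueError("need at least two F0 samples")
    out = []
    for i in range(n_points):
        pos = i * (len(values) - 1) / (n_points - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append(values[lo] * (1.0 - frac) + values[hi] * frac)
    return out


def hz_to_semitones(f0_hz, ref_hz=REF_HZ):
    """Convert a frequency in Hz to semitones relative to ref_hz."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def to_five_scale(st, st_min, st_max):
    """Map a semitone value onto a 0..5 scale.

    Assumption: linear min-max scaling over [st_min, st_max], e.g. a
    speaker's observed semitone range. The paper does not give its formula.
    """
    if st_max <= st_min:
        raise ValueError("st_max must be greater than st_min")
    return 5.0 * (st - st_min) / (st_max - st_min)


def normalize_contour(f0_hz, st_min, st_max):
    """Apply the three steps in order: resample, semitones, 5-point scale."""
    points = resample(f0_hz, 10)
    return [to_five_scale(hz_to_semitones(f), st_min, st_max) for f in points]
```

For example, `normalize_contour([75.0, 150.0], 0.0, 12.0)` takes a contour that rises by one octave above the reference and returns 10 values running from the bottom to the top of the 5-point scale.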