
INTERSPEECH 2012, ISCA's 13th Annual Conference, Portland, OR, USA, September 9–13, 2012 (ISCA Archive, http://www.isca-speech.org/archive)

Smile with a smile

Hugo Quené, Will Schuerman

Utrecht institute of Linguistics OTS, Utrecht University, Utrecht, the Netherlands [email protected], [email protected]

Abstract

Smiling during talking yields speech with higher formants, and hence larger formant dispersion. Previous studies have shown that motor resonance during perception of words related to smiling can activate muscles responsible for the smiling action. If word perception causes smiling activation for such smile-related words, then this motor resonance may also occur during production, resulting in larger formant dispersion in these smile-related words. This paper reports on formant measurements from tokens of the Corpus of Spoken Dutch. Formant values of smile-related word tokens were compared to those of semantically different but phonetically similar word tokens. Results suggest that formant dispersion is indeed larger in smile-related words than in control words, although the predicted difference was observed only for female speakers. These findings suggest that motor resonance originating from a word's meaning may affect the articulatory and acoustic realization of affective spoken words. Female speakers tend to produce the word smile with a smile.

Index Terms: smiling; affect; motor resonance; formants; formant dispersion

10.21437/Interspeech.2012-183

1. Introduction

Smiling during talking effectively reduces the length of the vocal tract [1]. Relative to speech produced with a neutral facial expression, speech produced with a smile has a higher pitch, higher formants and larger dispersion between formants [2, 3]. Listeners can hear whether speech is spoken with a smile [2, 3], and they base their judgements on F0 and on formant dispersion [4].

This study is grounded in the hypothesis that linguistic and affective processing are not separate components of spoken language processing but interacting ones. With regard to speech perception, it has been argued that perceiving speech is perceiving articulatory gestures [5, 6]. In addition, language comprehension involves motor resonance of the actions being described in the (linguistic) utterance, such as turning, pushing or pulling [7, 8, 9]. With regard to affect perception, it has similarly been argued that perceiving affective gestures requires motor simulation of these gestures [10, 11, 12, 13, 14, 15]. Reading printed affective words (e.g. honest, dangerous) results in motor activation of the muscles involved in the corresponding facial affective expressions, viz. smiles and frowns [8]. Moreover, spoken words are perceived more slowly if the word's phonetic form (formant dispersion, indicating smile or frown) is inconsistent with the word's affective valence [4]. This again indicates that affective resonance contributes to comprehension of spoken language.

The evidence discussed above suggests that these affective and motor resonance processes do contribute to comprehension. If so, the same resonance processes are also expected to contribute to speech production, where motor activation is essential. One might therefore expect affective words to be produced with a consistent facial expression, because the articulatory and affective properties of a word are integrated during speech production. Specifically, the word smile may tend to be produced with a congruent smile, the word frown with a frown, and so on.

Of course, the affective resonance hypothesis does not predict that speakers are forced to mimic these affective facial gestures whenever they produce an affective word. The facial gesture may be blocked by voluntary control (e.g. if the speaker wants to keep a neutral facial expression). It may also be blocked by semantic distance between the speaker and the agent, e.g. through third-person perspective and/or negation, as in the utterance he did not smile. Nevertheless, given a sufficiently large speech corpus, we predict that smile-related words will tend to have larger formant dispersion than phonetically similar control words. This difference would arise from (micro-)activation of the muscles involved in smiling gestures [8] and the consequent acoustic effects of lip spreading [1, 2, 3]. This activation presumably results from motor resonance between the smile-related word and the affective smiling gesture.

2. Methods

2.1. Materials

Tokens of spoken words were selected from the Corpus of Spoken Dutch [16], which has an orthographic transcription tier for the entire corpus. Tokens of smile-related words were found by searching the orthography tier for the string glimlach* (meaning “smile”). Only the Netherlands Dutch part of the corpus was used, because vowel quality may differ between the variants of Dutch spoken in the Netherlands and in Belgium (Flanders).

The search yielded 167 word tokens with a meaning related to smiling. These included both nouns and verbs, in various inflected forms: 58 tokens of glimlach /xlImlAx/ “smile”, 58 of glimlachte, 16 of glimlachen, 15 of glimlacht, and 20 tokens of 6 other forms. Lexical stress falls on the initial syllable in all these word forms.

For comparison, words were also selected that have a similar phoneme sequence /C lIm/ in their stressed syllable but whose meaning is unrelated to smiling. The relevant control words were glim /xlIm/ “glimmer, gleam”, klim /klIm/ “climb”, glimp /xlImp/ “glimpse”, and their derived forms. In total, 62 word tokens in this category were selected from the same sub-corpus: 41 tokens of 5 forms of glim, 3 tokens of glimp, 14 tokens of 4 forms of klim, 2 tokens of the proper name Glimmerveen, and 2 hapax forms.

The 229 selected word tokens had been realized by 136 different speakers (73 female, 63 male). Only 13 speakers (9 female, 4 male) contributed tokens of both smile-related and control words, and these speakers accounted for only 36 tokens. Table 1 shows how the numbers of tokens are distributed over speakers. For example, 2 female and 2 male speakers each contributed exactly 1 control token and 1 smile-related token to the sample. There were also 2 other male speakers who each contributed 3 control tokens and no smile-related tokens.

Of the 167 smile-related tokens, 100 were from female speakers and only 67 were from male speakers. This difference was not significant [… -weighted goodness-of-fit, χ²(1) = 2.604, p = .107].

Figure 1: Formants F2 and F1 in Bark units, in /I/ vowel tokens, broken down by speaker sex and by semantic category (filled symbols: smile-related, open symbols: control). [Two scatterplot panels, female speakers and male speakers; horizontal axis F2 (Bark), 15 to 11; vertical axis F1 (Bark), 3 to 7.]

2.2. Formant analysis

Formants were measured in the nuclear vowel /I/ in each word token, using Praat 4.6.36 [17] (Burg method). The corpus varies considerably (e.g. some tokens overlap with speech from other speakers), so it was not possible to use the same time point for all measurements. The preferred time points of measurement within the vowel were, in decreasing order of preference, (a) the temporal midpoint of the vowel, (b) the point where F1 is highest, and (c) the point of maximal intensity, after [18]. For each vowel, the frequencies of formants F1, F2 and F3 were measured and converted to Bark units ([19], formula 6).
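The per-token measurements in Section 2.2 can be sketched in a few lines. This is a minimal illustration, not the authors' Praat scripts. It rests on three assumptions. First, “formula 6” of [19] is taken to be Traunmüller's basic Hz-to-Bark expression, z = 26.81 f / (1960 + f) − 0.53, without his optional end corrections; these matter only below about 2 Bark or above 20.1 Bark, well outside the F1–F3 range of /I/. Second, formant dispersion is taken to be F2 − F1 in Bark. Third, the helper choose_time_point and its “usable” flag are hypothetical, because the paper does not state how a candidate time point was judged usable.

```python
"""Sketch of the per-token measures of Section 2.2 (illustrative, not the authors' code)."""

from typing import Optional, Sequence, Tuple


def hz_to_bark(f_hz: float) -> float:
    """Convert Hz to Bark with Traunmüller's (1990) analytical expression.

    Assumption: "formula 6" of [19] is z = 26.81 f / (1960 + f) - 0.53;
    the optional low/high end corrections are not applied here.
    """
    if f_hz <= 0:
        raise ValueError("frequency must be positive")
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def formant_dispersion(f1_hz: float, f2_hz: float) -> float:
    """Formant dispersion as the dependent measure of this study: F2 - F1, in Bark."""
    return hz_to_bark(f2_hz) - hz_to_bark(f1_hz)


# Hypothetical candidate record: (label, time in seconds, usable?).
# How usability was judged (e.g. no overlapping speech) is NOT specified in the paper.
Candidate = Tuple[str, float, bool]


def choose_time_point(candidates: Sequence[Candidate]) -> Optional[Tuple[str, float]]:
    """Return the first usable measurement point, in decreasing order of preference.

    The caller passes candidates in the order given in Section 2.2:
    vowel midpoint, F1 maximum, intensity maximum. Returns None if none is usable.
    """
    for label, time_s, usable in candidates:
        if usable:
            return label, time_s
    return None


if __name__ == "__main__":
    # Made-up values for a single /I/ token; these are not corpus data.
    point = choose_time_point([
        ("midpoint", 0.081, False),  # e.g. overlapped by another speaker
        ("F1 maximum", 0.074, True),
        ("intensity maximum", 0.069, True),
    ])
    print(point)  # ('F1 maximum', 0.074)
    print(round(formant_dispersion(400.0, 2000.0), 2))  # 9.0
```

Because the Bark scale is compressive, the same Hz difference between F1 and F2 yields a smaller Bark dispersion at higher frequencies. This is one reason dispersion is compared in Bark rather than Hz.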

Each selected word token was excerpted from the corpus and stored as a separate audio file. The speaker identifier code and sex were also retrieved from the corpus metadata for each selected word token.

Table 1: Numbers of speakers, broken down by sex and by numbers of contributed control tokens (rows) and smile-related tokens (columns). Entries marked with an asterisk (*) indicate speakers contributing tokens in both categories.

sex   nr of control   nr of smile-related tokens
      tokens            0     1     2     3     4     8
F     0                 -    31     7     4     5     1
      1                13    2*    4*    1*     -     -
      2                 3    2*     -     -     -     -
      3                 -     -     -     -     -     -
M     0                 -    22    10     4     2     -
      1                17    2*    1*     -     -     -
      2                 2    1*     -     -     -     -
      3                 2     -     -     -     -     -

3. Results

The observed values for F1 and F2 are displayed in Figure 1, broken down by speaker's sex and by the semantic category of each word token. Formant frequencies of the /I/ vowel show considerable overlap between the smile-related words (filled symbols) and control words (open symbols), in particular for male speakers.

The primary dependent measure, however, was the amount of formant dispersion, i.e. the difference F2 − F1 in Bark units, measured in the nuclear /I/ vowel of each word token. The data were analyzed by means of mixed-effects regression, with speakers and tokens-within-speakers as nested random effects [20]. The fixed effects were the semantic category of each word (smile-related vs. control) and the sex of the speaker. The resulting optimal model is summarized in Table 2. According to likelihood ratio tests, this optimal model did not perform worse than “random slopes” models in which the fixed effects were also allowed to vary between speakers and/or between tokens.

The model confirms that, in control word tokens, formant dispersion is smaller for males than for females (p = .057). This reflects the expected effects of differences in vocal tract length between females and males. The model also confirms that formant dispersion is indeed larger for smile-related words than for control words, as predicted, but only for female speakers. For male speakers, by contrast, formant dispersion is approximately equal in smile-related and in control words, which yields a significant interaction effect (see Figure 2).

If the effects of the semantic category of the word token are to be compared within speakers only, then the analysis is limited to the 13 speakers who contributed tokens to both semantic categories (N = 36 tokens, see Table 1). This sample is so small that none of the effects are remotely significant (all t < 1, n.s.), not even the difference in formant dispersion between the sexes.

A priori, formant dispersion is expected to be larger for (adult) female speakers than for (adult) male speakers, due to sex differences in vocal tract length. This difference is indeed observed, for both the control words and the smile-related words. The sex difference is, however, only marginally significant. This is probably due to the large variability between speakers and the very low number of tokens per speaker (see Table 1). The Netherlands part of the Corpus of Spoken Dutch may be insufficiently large for the present study. We therefore recommend replicating this study with larger corpora, preferably yielding not only more speakers but also more tokens per speaker.

Table 2: Estimated parameters of the mixed-effects model of formant dispersion F2 − F1 in /I/ vowel tokens. Estimates of fixed parameters are given in Bark units, with standard errors and with significance levels based on MCMC simulations. The intercept refers to control words by female speakers. Estimates of random parameters are given as standard deviations in Bark units, with the 95% confidence interval of the estimate. ICC = intra-speaker correlation; N = 229 tokens.

fixed part
effect          estimate   s.e.   p
(intercept)       8.99     0.17   .0001
Sex.Male         −0.52     0.25   .0566
Categ.Smile       0.36     0.19   .0032
(interaction)    −0.50     0.28   .0264

random part
effect     estimate   95% C.I.        N
speaker    0.651      (0.00, 0.277)   136
token      0.680      (0.84, 1.030)   229
ICC = 0.48

Figure 2: Boxplots of formant dispersion F2 − F1 in /I/ vowel tokens, broken down by speaker sex (F: darker boxes, M: lighter boxes) and by semantic category (control vs. smile-related words). Box width corresponds to the number of tokens; notches indicate approximate 95% confidence intervals of the box's median. [Two panels, female speakers and male speakers, each with boxes for ctrl and smile; vertical axis F2−F1 dispersion (Bark), 6 to 10; the female ctrl vs. smile contrast is marked *.]

4. Discussion

For female speakers, the predicted main effect of semantic category is indeed observed. The magnitude of this effect is somewhat smaller than that of the sex effect, if individual differences between speakers are taken into account (as is done in the analysis summarized in Table 2). Female speakers show a larger dispersion than male speakers in control word tokens. Female speakers also have a larger dispersion in smile-related word tokens than in their own control word tokens. If formant dispersion is indeed monotonically related to the amount of smiling, then female speakers smile more while saying smile-related words than while saying control words. This confirms the main hypothesis of this study, viz. that linguistic and affective processes interact during spoken language processing. Because the processes resonate in a non-directional fashion, affective resonance occurs both in speech production, as shown in the present study, and in speech perception, as shown in [4].

The observed phonetic effect of smiling is remarkable, given that formant dispersion was measured in the /I/ vowel only. In Dutch this vowel is typically produced without lip rounding [23], i.e. with (moderate) lip spreading. In the control word tokens, the female speakers seem to have used only a moderate amount of lip spreading. In the smile-related word tokens, they increased the lip spreading further, yielding somewhat larger formant dispersion.

The origin of smiling as an affective gesture may be understood from the so-called Frequency Code [21, 22]. Smiling (i.e. lip spreading) results in higher formants and larger dispersion between formants [1, 2, 3]. For the listener, these acoustic properties suggest that the speaker has a relatively short vocal tract. This seemingly shorter vocal tract in turn suggests a smaller body size, which corresponds to a less threatening and more friendly attitude. Speakers seem to be aware that lip spreading, and the consequent larger formant dispersion, conveys a smaller body size. When a sample of speakers was instructed to sound “feminine”, about half of the female and half of the male speakers reported that they spread their lips to do so [24].

In human social interaction, however, smiles may carry multiple meanings. A smile can express a friendly attitude, but it can also express joy, confidence and/or dominance [15]. Interestingly, males and females seem to differ in the communicative function of the produced smile. According to [25], female smiles tend to convey social warmth and friendliness, whereas male smiles tend to convey self-confidence and lack of distress. This sex difference might explain the observed interaction pattern: female speakers tend to produce (the Dutch equivalent of) the word smile with a smile, whereas there is no such effect for male speakers. In other words, female speakers tend to also execute the affective gesture that they verbally describe, whereas male speakers do not exhibit such affective motor resonance. Thus the resonance patterns seem to differ between male and female speakers. Obviously, further research is necessary to confirm these possible sex differences in affective resonance between verbal meanings and affective gestures.

In the future we plan to extend this study to other affective verbal expressions for which phonetic effects might be expected, such as lip rounding co-occurring with saying frown. We also plan to extend it to other languages for which large speech corpora are available.

5. Conclusion

In conclusion, the present results indicate that for female speakers, the verbal expression of an affective action, i.e. saying the word smile, results in larger formant dispersion. This indicates a corresponding smiling action during speech production. Female speakers, but not male speakers, tend to say the word smile with a smile. This is presumably due to affective motor resonance between the smiling gesture being described and the gesture being executed.

6. Acknowledgements

The authors wish to express their sincere thanks to Bob Port, Gün Semin, Francesco Foroni and Karin Wagenaar for ideas and discussions, to Sander van der Harst for technical assistance, and to Dagmar van der Voordt for corpus metadata analyses.

7. References

[1] J.J. Ohala, “The acoustic origin of the smile,” J. Acoust. Soc. America, 1980. Available: http://linguistics.berkeley.edu/~ohala/papers/smile.pdf
[2] V.C. Tartter and D. Braun, “Hearing smiles and frowns in normal and whisper registers,” J. Acoust. Soc. America, 96:2101–2107, 1994.
[3] A. Drahota, A. Costall and V. Reddy, “The vocal communication of different kinds of smile,” Speech Commun., 50(4):278–287.
[4] H. Quené, G.R. Semin and F. Foroni, “Audible smiles and frowns affect speech comprehension,” Speech Commun., 54:917–922, 2012.
[5] A.M. Liberman and I.G. Mattingly, “The motor theory of speech perception revised,” Cognition, 21:1–36, 1985.
[6] B. Galantucci, C.A. Fowler and M.T. Turvey, “The motor theory of speech perception reviewed,” Psychonomic Bull. and Rev., 13:361–377, 2006.
[7] R.A. Zwaan and L.J. Taylor, “Seeing, acting, understanding: Motor resonance in language comprehension,” J. Exper. Psychology: General, 135:1–11, 2006.
[8] F. Foroni and G.R. Semin, “Language that puts you in touch with your bodily feelings: The multimodal responsiveness of affective expressions,” Psychol. Science, 20:974–980, 2009.
[9] B. Tomasino, P.H. Weiss and G.R. Fink, “To move or not to move: imperatives modulate action-related verb processing in the motor system,” Neuroscience, 169:246–258, 2010.
[10] J.K. Hietanen, V. Surakka and I. Linnankoski, “Facial electromyographic responses to vocal affect expressions,” Psychophysiology, 35:530–536, 1998.
[11] U. Dimberg, M. Thunberg and K. Elmehed, “Unconscious facial reactions to emotional facial expressions,” Psychol. Science, 11(1):86–89, 2000.
[12] V. Gallese, “The roots of empathy: The shared manifold hypothesis and the neural basis of intersubjectivity,” Psychopathology, 36:171–180, 2003.
[13] P.M. Niedenthal, “Embodying emotion,” Science, 316(5827):1002–1005, 2007.
[14] V. Gallese, “Motor abstraction: a neuroscientific account of how action goals and intentions are mapped and understood,” Psychol. Research, 73:486–498, 2009.
[15] P.M. Niedenthal, M. Mermillod, M. Maringer and U. Hess, “The Simulation of Smiles (SIMS) model: Embodied simulation and the meaning of facial expression,” Behav. and Brain Sc., 33:417–433, 2010.
[16] N. Oostdijk, “The Spoken Dutch Corpus: Overview and first evaluation,” in Proc. LREC-2000, II:887–894. Available: http://www.lrec-conf.org/proceedings/lrec2000/pdf/110.pdf
[17] P. Boersma and D. Weenink, Praat: Doing phonetics by computer, 2007. Online: http://www.praat.org
[18] R.J.J.H. Van Son and L.C.W. Pols, “Formant frequencies of Dutch vowels in a text, read at normal and fast rate,” J. Acoust. Soc. America, 88:1683–1693, 1990.
[19] H. Traunmüller, “Analytical expressions for the tonotopic sensory scale,” J. Acoust. Soc. America, 88:97–100, 1990.
[20] H. Quené and H. van den Bergh, “On multi-level modeling of data from repeated measures designs: A tutorial,” Speech Commun., 43:103–121, 2004.
[21] J.J. Ohala, “Cross-language use of pitch: an ethological view,” Phonetica, 40:1–18, 1983.
[22] Y. Xu and A. Kelly, “Perception of anger and happiness from resynthesized speech with size-related manipulations,” presented at Speech Prosody 2010, Chicago, 2010. Available: http://www.isca-speech.org/archive/sp2010/papers/sp10_027.pdf
[23] G. Booij, The Phonology of Dutch. Oxford: Oxford Univ. Press, 1995.
[24] V. Cartei, H.W. Cowles and D. Reby, “Spontaneous voice gender imitation abilities in adult speakers,” PLoS ONE, 7:e31353, 2012.
[25] S. Vazire, L.P. Naumann, P.J. Rentfrow and S.D. Gosling, “Smiling reflects different emotions in men and women,” Behav. and Brain Sc., 32:403–405, 2009.
