Lecture 22: Affect Recognition from the Voice
CSCI 534(Affective Computing) – Lecture by Jonathan Gratch

Affective computing in the news

Takeaway from today
▪ Recognizing emotion from voice is hard
  – “Artifacts” can undermine recognition accuracy
  – As with the face, context can be crucial
▪ Many tools confound perceived emotion with felt emotion
  – “A few seconds of speech are enough to determine the emotional state of the caller”
▪ But voice has a stronger association with physiology than the face does

Review
▪ The challenge of variance
  – Within-person: The same person can show considerable variability
  – Across people: The same expression manifests in very different ways across people
  – Across contexts: Lighting, motion, social context

Faces communicate far more than affect
▪ Age
▪ Race
▪ Gender
▪ Nationality

What about voice?
▪ Age
▪ Race
▪ Gender
▪ Nationality
▪ Language
▪ Dialect
  – African-American vernacular
▪ Accent
  – Texan v. Georgian
▪ Intelligence?

Voices communicate far more than affect
▪ If a statement is difficult to process, it is less likely to be judged true and compelling
▪ This holds even if the difficulty arises from incidental features irrelevant to the content of the speech
  – Accent of the speaker (Lev-Ari & Keysar, 2010)
  – Ease with which the name of the source can be pronounced (Newman et al., 2014)

Voices communicate far more than affect
▪ Study: gave participants science presentations
  – Conference talks; radio interviews from NPR Science Friday
▪ Manipulated audio quality
  – Good vs. low audio quality (like what you might notice on Zoom or Skype)
Newman, E.J., & Schwarz, N. (2018). Good sound, good research: How audio quality influences perceptions of the researcher and research. Science Communication, 40(2), 246–257.

Review: Cultural influences on judgment
Ellsworth & Peng 1997

Example: Cultural influences on judgment
▪ Participants received instructions from a racially ambiguous character speaking with an American or Chinese accent (identical appearance and gestures)
  – 2 (Accent) x 2 (Native- vs. Chinese-American) study
▪ Participants were given the “Fish Task” (a measure of collectivist tendencies)
▪ Chinese accent: bi-culturals more Chinese, mono-culturals more American
Dehghani, M., Khooshabeh, P., Huang, L., Nazarian, A. & Gratch, J. (2012). Using Accent to Induce Cultural Frame-Switching. In the Proceedings of the 34th Annual Conference of the Cognitive Science Society.

Example: Customer service
C. M. Lee and S. S. Narayanan, “Toward detecting emotions in spoken dialogs,” IEEE Transactions on Speech and Audio Processing, 2005.

Example: Depression detection

Voice has advantages
▪ Not impacted by lighting
▪ Harder to regulate / mask?
▪ Less influence of head orientation

Problems recognizing “in the wild”
▪ Ambient noise
▪ Wind, breath, movement
▪ Reverberation

Solutions: audio-visual source separation
Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation

Just as with the face, we need to control for this individual variability when recognizing affect or emotion

Recognizing affect in speech
▪ Is person (or group) depressed?
▪ Is this a nice person?
▪ Is this an angry sentence?
▪ Personality prediction
▪ e.g., does person like this product?
(Scherer 2005)
Adapted from Dan Jurafsky

Expression of Emotion
▪ Categorical labels
  – Anger, happiness, sadness, neutral
▪ Dimensional or attribute-based labels
  – Valence (negative vs. positive)
  – Arousal (calm vs. active)
▪ More accurate emotion descriptors (intensity)
✦ Sample 1: [fru; ()] [ang; ()] [neu; ()]
✦ Sample 2: [fru; ()] [oth; (exasperated)] [neu; ()]
✦ Sample 3: [ang; ()] [ang; ()] [ang; ()]
Adapted from Carlos Busso

Anatomy of speech production
▪ To understand how emotion shapes speech, it is useful to consider how speech is produced
▪ Fundamental frequency (f0) is the number of glottal cycles that occur per second
▪ This frequency, over some interval, is perceived as pitch
Slides adapted from Danny Bone (SAIL); video from SAIL lab

Speech and Physiology
▪ Speech production engages a wide range of physiological systems
▪ These systems are impacted by emotion
  – Sympathetic activation changes respiration rate, muscle tension, and saliva production
▪ Many of these systems are under involuntary control
▪ Thus, aspects of speech could serve as an “honest signal” of physiological processes associated with emotion

[Figure: speech production pipeline: air in, larynx vibrates, resonance, sound out. Annotations: “Determines the individual sounds of speech”; “Determines overall quality of speech”]

How is this impacted by emotion?
Shape changes
▪ Arousal
▪ Short-term stress
▪ Congestion
▪ Inflammation
Why would emotion impact congestion / inflammation?

Hormones
▪ Hormones create structural changes
  – Testosterone changes the vocal tract through development
▪ Holds across species
  – Vocal pitch is correlated with testosterone level in giant pandas
  – What did we call this type of signal? Honest signal

Review
▪ Parasympathetic (conserves energy; undertakes ‘housekeeping’)
▪ Sympathetic (mobilizes & expends energy; prepares for fight or flight)
[Figure labels: Boredom, Engagement (arousal), Regulation, Challenge, Threat. Researchers named: Stephen Porges, Julian Thayer, Jim Blascovich, Joe Tomaka]

Vocal markers of arousal?
▪ Clinical interviews
  – Taped clinical interactions between physicians and patients in follow-up consultations about cancer diagnosis
  – Measured skin conductance (an indicator of sympathetic activation)
  – Vocal jitter (an aspect of voice quality perceived as hoarseness) increased during and immediately after skin conductance increases
  – Voice unsteadiness (slope and standard deviation of fundamental frequency) was associated with changes in skin conductance
Postma-Nilsenová, Holt, Heyn, Groeneveld, Finset, A case study of vocal features associated with galvanic skin response to stressors in a clinical interaction, Patient Educ. Couns. 99 (2016)

[Figure: cardiac output (CO)]
▪ Send a low-voltage electrical current through outer bands
▪ Measure impedance to this current through inner bands (ICG)
▪ Measure heart electrical activity (ECG)
▪ As blood volume increases, impedance decreases
Slide courtesy of Jessica Cornick

Vocal markers of physiological threat?
▪ Neubauer et al., 2017
  – Examined whether there were vocal indicators of challenge/threat in a “bomb disposal” task
  – Couldn’t find a marker of the threat response
  – But cardiac output was found to significantly predict f0 and peakSlope in several trials
  – Conclusion: vocal and physiological features are strongly related, and one modality could be used to estimate the other in certain contexts
▪ But this area is underexplored
Neubauer, et al. The relationship between task-induced stress, vocal changes, and physiological state during a dyadic team task. ICMI 2017

Vocal markers of physiology?
▪ Such results are promising but the area is underexplored
  – Meta-analysis suggests the results are highly variable and could depend on subtle aspects of social context
▪ More typical to take a “computer science” approach
  – Ignore theory
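
The vocal features named in this lecture (f0 as glottal cycles per second, jitter, and the slope and standard deviation of f0) can be made concrete with a small sketch. This is only an illustration under stated assumptions, not the method used in the cited studies: f0 is estimated with a simple autocorrelation peak picker, jitter is taken as the mean absolute difference between consecutive glottal periods divided by the mean period (one common "local jitter" definition), and all function names and default parameters (`estimate_f0`, `f0_track`, `f0_slope_and_sd`, `local_jitter`, the 75 to 400 Hz search range) are hypothetical choices made for this example. The peakSlope feature is not covered.

```python
import numpy as np


def estimate_f0(frame, sample_rate, fmin=75.0, fmax=400.0):
    """Estimate f0 (Hz) of one voiced frame from its autocorrelation peak.

    Assumes the frame is voiced and longer than one period at fmin.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - np.mean(frame)
    # Keep non-negative lags only: ac[k] is the correlation at lag k samples.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / fmax)  # shortest period searched
    lag_max = int(sample_rate / fmin)  # longest period searched
    best_lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sample_rate / best_lag


def f0_track(signal, sample_rate, frame_len=0.04, hop=0.01):
    """Return (frame start times in seconds, f0 per frame in Hz)."""
    n = int(frame_len * sample_rate)
    h = int(hop * sample_rate)
    starts = range(0, len(signal) - n + 1, h)
    times = np.array([s / sample_rate for s in starts])
    f0 = np.array([estimate_f0(signal[s:s + n], sample_rate) for s in starts])
    return times, f0


def f0_slope_and_sd(times, f0):
    """'Unsteadiness' summary: linear slope (Hz per second) and SD (Hz) of f0."""
    slope = float(np.polyfit(times, f0, 1)[0])
    return slope, float(np.std(f0))


def local_jitter(periods):
    """Mean absolute difference of consecutive periods over the mean period."""
    periods = np.asarray(periods, dtype=float)
    return float(np.mean(np.abs(np.diff(periods))) / np.mean(periods))


if __name__ == "__main__":
    sr = 16000
    t = np.arange(int(0.5 * sr)) / sr
    tone = np.sin(2 * np.pi * 200.0 * t)  # synthetic steady "voice" at 200 Hz
    times, f0 = f0_track(tone, sr)
    print("mean f0 (Hz):", float(np.mean(f0)))
    print("slope, SD:", f0_slope_and_sd(times, f0))
    # Hypothetical glottal period durations in milliseconds.
    print("jitter:", local_jitter([5.0, 5.2, 5.0, 5.2]))
```

On real recordings one would first need voicing detection and some noise handling, which ties back to the "in the wild" problems above; dedicated tools compute these measures more carefully than this sketch does.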