Emotion Algebra reveals that sequences of facial expressions have meaning beyond their discrete constituents

Carmel Sofer1, Dan Vilenchik2, Ron Dotsch3, Galia Avidan1
1. Ben Gurion University of the Negev, Department of Psychology
2. Ben Gurion University of the Negev, Department of Communication Systems Engineering
3. Utrecht University, Department of Psychology

Abstract

Studies of emotional facial expressions reveal agreement among observers about the meaning of six to fifteen basic static expressions. Other studies have focused on the temporal evolvement within single facial expressions. Here, we argue that people infer a larger set of emotion states than previously assumed, by taking into account sequences of different facial expressions rather than single facial expressions. Perceivers interpreted sequences of two images derived from eight facial expressions. We employed vector representations of emotion states, adopted from the field of Natural Language Processing, and performed algebraic vector computations. We found that the interpretation participants ascribed to the sequences of facial expressions could be expressed as a weighted average of the single expressions comprising them, resulting in 32 new emotion states. Additionally, observers' agreement about the interpretations was expressed as a multiplication of the respective expressions. Our study sheds light on the significance of facial expression sequences to perception. It offers an alternative account of how a variety of emotion states are perceived, and a computational method to predict these states.

Keywords: word embedding, emotions, facial expressions, vector representation

Studies of emotional facial expressions reveal agreement among observers about the meaning of basic expressions, ranging between six and fifteen emotional states (Ekman, 1994, 2016; Ekman & Cordaro, 2011; Keltner & Cordaro, 2015; Russell, 1994), creating a vocabulary conveyed during face-to-face interactions. Other studies, using a different approach, found 21 expression categories that combine muscle movements of two expressions (e.g., a single image of a sadly fearful expression combines muscle movements of the sad and fear expressions; Du, Tao, & Martinez, 2014). However, if the vocabulary of emotion states were indeed limited to such a small number of discrete facial expressions, it would constrain people's ability to express and perceive finer emotions and emotional states. Thus, it is conceivable that people can perceive a much wider collection of emotion states by using combinations of basic, discrete facial expressions. Mechanisms that expand a very small number of basic components into a much wider number of meanings exist in other human communication modalities, such as spoken language and music. For example, in English, a combination of 26 letters forms a rich language of more than 200,000 words (Merriam-Webster, 2002) and an unlimited number of sentences; similarly, in Western music, a combination of seven notes creates endless musical compositions.

Here, we argue that people infer a larger set of emotion states from facial expressions than previously assumed, by taking into account sequences of facial expressions rather than single facial expressions. Some studies have already revealed that an expression presented to participants affects their interpretation of subsequent facial expressions (Russell, 1991; Russell & Fehr, 1987; Thayer, 1980). For example, Russell (1991) found that a contempt facial expression followed by a sadness expression was interpreted as disgust. However, when a disgust expression was presented first, the contempt facial expression was perceived as a sign of sadness. Other studies, which focused on the temporal evolvement within single facial expressions (lasting one to two seconds), found that early in their evolvement, facial expressions allow for crude differentiation of expression cues, while later transmissions consist of more complex cues that support finer categorization (Jack, Garrod, & Schyns, 2014).

Critically, these previous studies were constrained by focusing on the effects of single facial expressions, or on the effect of specific facial expressions on the interpretation of subsequent expressions, whereas a meaningful social interaction usually includes various facial expressions. As such, they might have overlooked a more general system of rules governing the interpretation of sequences of facial expressions as a whole, whereby sequences of basic facial expressions are mapped onto specific integrated interpretations of emotion states. For example, would a person displaying a surprise expression, followed by a smile, followed by a sadness expression, be perceived as surprised? Happy? Sad? Or would the integrated interpretation of the sequence be helplessness, a breakdown, or perhaps disappointment?

Several processes might affect one's interpretation of a sequence of expressions. One possibility is that during a face-to-face interaction, observers accumulate the emotional information conveyed by facial expressions. Emotional information is integrated (summed), taking into account the different facial expressions and their durations. Another alternative is that weights are assigned to discrete events ("snapshots") based on their location in the sequence, irrespective of their duration (Fredrickson & Kahneman, 1993). The resulting impression would then be the weighted average of the judgments assigned to these snapshots. Studies that focused on scene perception support the latter alternative (Fredrickson & Kahneman, 1993; McDuff, El Kaliouby, Cohn, & Picard, 2015; Noë et al., 2017; Varey & Kahneman, 1992).
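The contrast between the two accounts can be made concrete. Below is a minimal sketch in Python; the two-dimensional judgment vectors and the position weights are hypothetical illustrations, not values from this study, and the paper's actual embeddings are much higher-dimensional.

```python
import numpy as np

# Hypothetical judgment vectors for two snapshots in a sequence
# (illustrative 2D values only).
surprise = np.array([0.9, 0.1])
sadness = np.array([0.2, -0.8])

# Integration (summation) account: emotional information accumulates,
# so longer sequences yield vectors of larger magnitude.
summed = surprise + sadness

# Snapshot account (Fredrickson & Kahneman, 1993): each snapshot gets a
# weight determined by its position in the sequence, and the impression
# is the weighted average, irrespective of duration.
weights = np.array([0.4, 0.6])  # hypothetical position weights
weighted_avg = weights @ np.vstack([surprise, sadness]) / weights.sum()

print("summed:", summed)
print("weighted average:", weighted_avg)
```

Under the weighted-average account, the resulting vector stays on the same scale as the single-expression judgments regardless of sequence length, which is what allows it to be compared directly against single-word interpretation vectors.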
The goal of this study is to model, using a data-driven approach, how the order of serially presented facial expressions maps onto a perceiver's interpretation of the sequence. We adopted language modeling and feature learning methods from the field of Natural Language Processing (NLP), which rely on computational methodologies and machine learning techniques. Various approaches are used in this domain, for example, the word2vec family of methods (Mikolov, Chen, Corrado, & Dean, 2013) and GloVe (Global Vectors for Word Representation; Pennington, Socher, & Manning, 2014). Generally, the application of these methods, termed word embedding, maps words from a text corpus to numerical vectors. Each word in the corpus is represented as a vector in a multi-dimensional word space, enabling the use of vector arithmetic. For example, one can perform a simple and useful "emotion algebra" by averaging observers' interpretations of a "happy" facial expression (see Figure 1). The expression may be perceived as a sign of happiness by one participant or of satisfaction by another, and hence be represented by the vector of the word happy or satisfaction, respectively. However, averaging the judgment vectors of all participants may reveal that, on average, the "happy" expression is closest to the vector of the word excitement.
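As a concrete illustration of this averaging, here is a minimal sketch assuming a tiny hypothetical embedding table; in practice, pretrained word2vec or GloVe vectors (hundreds of dimensions, full vocabulary) would be used instead. The toy values are chosen so that the example reproduces the happy/satisfaction case described above.

```python
import numpy as np

# Hypothetical low-dimensional embeddings; real word2vec/GloVe vectors
# have hundreds of dimensions and a far larger vocabulary.
embeddings = {
    "happy":        np.array([0.8, 0.3]),
    "satisfaction": np.array([0.6, 0.7]),
    "excitement":   np.array([0.7, 0.5]),
    "sadness":      np.array([-0.8, -0.2]),
}

def cosine(u, v):
    # Cosine similarity, the standard proximity measure in word-embedding spaces.
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Each participant's one-word judgment of the "happy" expression is mapped
# to its word vector, and the judgment vectors are averaged.
judgments = ["happy", "satisfaction"]
mean_vec = np.mean([embeddings[w] for w in judgments], axis=0)

# The averaged interpretation is read out as the vocabulary word whose
# vector lies closest to the mean judgment vector.
nearest = max(embeddings, key=lambda w: cosine(embeddings[w], mean_vec))
print(nearest)  # -> "excitement" with these toy values
```

Cosine similarity is used rather than Euclidean distance because word-embedding similarity is conventionally measured by vector direction, not magnitude.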
By using word embedding methods, we integrated a mathematical framework with participants' empirical data, arriving at conclusions that would have been difficult to reach with traditional statistical approaches.

We examined three hypotheses: (i) the meaning of a sequence of emotional facial expressions is manifested by the weighted average of its discrete constituents, rather than by their integration (summation); (ii) the size of the extended vocabulary of emotion expression, based on sequences of emotions, is larger than the number of their discrete constituents and smaller than the upper bound of all possible theoretical combinations of facial expressions; (iii) frequent sequences of facial expressions that are more likely to appear in real life (Seger, 1994) elicit greater agreement among observers than other, less frequent sequences. We predict that this agreement, reflected in participants' judgment variability, can be modeled by vector arithmetic.

Figure 1: A 2D (D1, D2) illustration of the averaging process of participants' interpretations of the "happy" expression, perceived by participants as signs of happiness or satisfaction. By applying vector arithmetic, we find that, on average, participants judge the "happy" expression as a sign of excitement (dashed black line).

To examine these three hypotheses, participants were presented with 64 sequences of two facial expressions. They were required to describe, using a single word in free-text format, the "state of mind" of the person in the picture, based on all expressions in the sequence.

Methods

Participants

Forty-seven participants fluent in English (19 female) aged 20 to 70 (M = 36.65) took part in an online study, conducted from their homes at their own pace using Amazon's Mechanical Turk platform. Before the study, participants were asked to restate in writing the instructions given to them, using their own words. Their descriptions, reviewed independently by two researchers, served as an indication of their language fluency and comprehension of the task. Seven participants were excluded from the study because their descriptions indicated insufficient language fluency. Participants received a payment of 1 USD for the study, which lasted 27 minutes on average.

Stimuli

Stimuli included eight facial expressions, consisting of the seven most consensual expressions: anger, contempt, disgust, happiness, fear, sadness, and surprise (Ekman & Friesen, 1971), complemented by a neutral expression. Images were created by digitally averaging,