Facial expressions dynamically decouple the transmission of emotion categories and intensity over time Chaona Chen1, Daniel S. Messinger2, Cheng Chen3, Hongmei Yan4, Yaocong Duan1, Robin A. A. Ince5, Oliver G. B. Garrod5, Philippe G. Schyns1,5, & Rachael E. Jack1,5
1School of Psychology, University of Glasgow, Scotland, UK 2Department of Psychology, University of Miami, Florida, USA 3Foreign Language Department, Teaching Center for General Courses, Chengdu Medical College, Chengdu, China 4The MOE Key Lab for Neuroinformation, University of Electronic Science and Technology of China, Chengdu, China 5Institute of Neuroscience and Psychology, University of Glasgow, Scotland, UK
Abstract. Facial expressions dynamically transmit information-rich social messages. How they achieve this complex signalling task remains unknown. Here we identified, in two cultures – East Asian and Western European – the specific face movements that transmit two key signalling elements – emotion categories (e.g., ‘happy’) and intensity (e.g., ‘very intense’) – in basic and complex emotions. Using a data-driven approach and information-theoretic analyses, we identified, in the six basic emotions (e.g., happy, fear, sad), the specific face movements that transmit the emotion category (classifiers), intensity (intensifiers), or both (classifier+intensifier) to each of 60 participants in each culture. We validated these results in a broader set of complex emotions (e.g., excited, shame). Cross-cultural comparisons revealed cultural similarities (e.g., eye whites as intensifiers) and differences (e.g., mouth gaping). Further, in both cultures, classifier and intensifier face movements are temporally distinct. Our results reveal that facial expressions transmit complex emotion messages by cascading information over time.
One Sentence Summary. Facial expressions of emotion universally transmit multiplexed emotion information using specific face movements that signal emotion categories and intensity in a temporally structured manner.
Social communication is essential for the survival of most species because it provides important information about the internal states1 and behavioural intentions2 of others. Across the animal kingdom, social communication is often achieved using non-verbal signalling such as facial expressions3-6. For example, when smiling retracts the corners of the lips, this facial movement is often readily perceived as a sign of happiness or appeasement in humans, apes, and dogs7-9. Facial expressions can also convey additional important information such as emotional intensity – for example, from contented to cheerful to delighted and ecstatic – each of which can also signal affiliation and social bonding or reward and joy10-12. Across human cultures, the intensity of expressed emotion can also lead to different social inferences – for example, in Western European cultures broad smiling is often associated with positive traits such as competence and leadership. In contrast, in Eastern cultures such as Russia and China, where milder expressions are favoured, broad smiles are often associated with negative traits such as low intelligence13 or high dominance14. Therefore, facial expressions are a powerful tool for social communication because they can transmit information-rich social messages, such as emotion categories and their intensities, that inform and shape subsequent social perceptions and interactions15-20. However, how facial expressions achieve this complex signalling task remains unknown – that is, which specific components of facial expression signals transmit the key elements of a social message: its category and intensity.
Here, we address this question by studying the communicative functions and adaptive significance of human facial expressions of emotion from the perspective of theories of communication (see Fig. 1). These theories posit that signals are designed to serve several main purposes, two of which are particularly important for social communication. The first main purpose is ‘classifying,’ which enables the receiver to recognize a particular emotion category. For example, smiles are typically associated with states of happiness and positive
affect. The second main purpose is ‘intensification,’ where specific modulations of a signal – such as variations in amplitude, size, duration, or repetition rate – enhance the signal’s salience, quickly draw the receiver’s attention, and communicate the magnitude of the social message. For example, larger, higher amplitude signals are detectable from longer distances21, and signals with long durations or high repetition rates can easily draw the attention of otherwise distracted receivers22,23, thereby enabling them to focus on analysing the signal in more detail24-26, which may be particularly important in cases of threat. Although certain signals might serve to communicate either the emotion category or its intensity, some might play a dual role, particularly for emotions that require efficient signalling to elicit rapid responses from others, such as surprise, fear, disgust, or anger. We study these communicative functions in two distinct cultures – East Asian and Western European – each with known differences in perceiving facial expressions27,28, to derive a culturally informed understanding of facial expression communication29,30. Fig. 1 illustrates the logic of our hypothesis as a Venn diagram, where each colour represents a different communicative function.
Fig. 1 | Sending and receiving signals for social communication. To communicate a message to others, the sender encodes a message (e.g., “I am very happy” coloured in blue) in a signal. Here, the signal is a facial expression composed of different face movements, called Action Units (AUs)31. The sender transmits this signal to the receiver across a communication channel. On receiving the signal, the receiver decodes a message from it (“he is very happy”) according to existing associations. A complex signal such as a facial expression could contain certain components – e.g., smiling, crinkled eyes, or a wide-open mouth – that transmit specific elements of the message such as the emotion category ‘happy’ or its intensity (‘very’). We represent these different communicative functions using the Venn diagram. Green represents the set of AUs that communicate the emotion category (‘Classify,’ e.g., ‘happy’), red represents those that communicate intensity (‘Intensify,’ e.g., ‘very’), and orange represents those that serve a dual role of classification and intensification (‘Classify & Intensify’).
The green set represents the facial signals that receivers use to classify the emotion category
(e.g., ‘happy’), red represents those that receivers use to perceive emotional intensity (e.g.,
‘very’), and the orange intersection represents the facial signals that serve both functions of classification and intensification (e.g., ‘very happy’). The empirical question we address is to identify, in each culture, the facial signals – here, individual face movements called Action
Units31 (AUs) and their dynamic characteristics such as amplitude and temporal signalling order – that serve each type of communicative function (see Fig. 2 for the methodological
approach). We find that, in each culture, individual face movements such as smiling, eye widening, or scowling each serve a specific communicative function of transmitting emotion category and/or intensity information. Cross-cultural comparisons showed that certain face movements serve a similar communicative function across cultures – for example, Upper Lid
Raiser (AU5) serves primarily as an emotion classifier with occasional use as an intensifier
– while others serve different functions across cultures – for example, Mouth Stretch (AU27) primarily serves as an emotion classifier for East Asian participants and an intensifier for
Western participants (see Fig. 3). An analysis of the temporal ordering of classifier and intensifier face movements shows that, in each culture, they are temporally distinct, with intensifier face movements peaking earlier or later than classifiers. Together, our results reveal for the first time how facial expressions, as a complex dynamical signalling system, transmit multi-layered emotion messages. Our results therefore provide new insights into the longstanding goal of deciphering the language of human facial expressions3,4,32-35.
Results
Identifying face movements that communicate emotion categories and intensity. To identify the specific face movements that serve each communicative function – emotion classifier, intensifier, or dual classifier and intensifier – we used a data-driven approach that agnostically generates face movements and tests them against subjective human cultural perception36. We then measured the statistical relationship between the dynamic face movements – i.e., Action Units (AUs) – presented on each trial and the participants’ responses using an information-theoretic analysis37. Fig. 2 operationalizes our hypothesis and illustrates our methodological approach with the six classic emotions – happy, surprise, fear, disgust, anger and sad.
Fig. 2 | Data-driven modelling of the dynamic face movements that transmit emotion category and intensity information in the six classic emotions. a, Transmitting. On each experimental trial, a dynamic facial movement generator36 randomly selected a sub-set of individual face movements called Action Units31 (AUs) from a core set of 42 AUs (here, Cheek Raiser – AU6, Lip Corner Puller – AU12, Lips Part – AU25, see labels to the left). A random movement is then assigned to each AU individually using random values for each of six temporal parameters (onset latency, acceleration, peak amplitude, peak latency, deceleration, and offset latency; see labels illustrating the solid black curve). The randomly activated AUs are then combined to produce a photo-realistic facial animation, shown here with four snapshots across time. The face movement vector at the bottom shows the three AUs randomly selected on this example trial. b, Decoding. The receiver viewed the facial animation and classified it according to one of six classic emotions – happy, surprise, fear, disgust, anger or sad – and rated its intensity on a 5-point scale from very weak to very strong (response here is ‘happy,’ ‘strong,’ shown in blue). Otherwise, the receiver selected ‘other’. Sixty Western and 60 East Asian participants each completed 2,400 such trials with all facial animations displayed on same-ethnicity male and female face identities.
On each experimental trial, we generated a random facial animation using a dynamic face movement generator36 that randomly selected a sub-set of individual face movements, called
Action Units31 (AUs; minimum of 1 AU, maximum of 5 AUs), and assigned a random movement to each AU individually using six temporal parameters (onset, acceleration, peak amplitude, peak latency, deceleration, and offset; see the labels illustrating the black solid curve in Fig. 2a; full details are provided in the Methods). For example, in the trial shown in
Fig. 2a, three AUs are randomly selected – Cheek Raiser (AU6), Lip Corner Puller (AU12), and Lips Part (AU25) – and each is activated by a random movement (Fig. 2a, see solid, dotted or dashed curves representing each AU). The dynamic AUs are then combined to produce a photo-realistic facial animation, shown as four snapshots across time (see in Fig.
2a). The receiver viewed the random facial animation and classified it according to one of the six classic emotions – happy, surprise, fear’, disgust, anger, or sad – and rated its intensity on a 5-point scale from very weak to very strong. For example, in Fig. 2b, on this trial the receiver perceived this combination of AUs – Cheek Raiser (AU6), Lip Corner Puller
(AU12), and Lips Part (AU25) each with a dynamic pattern – as ‘happy’ at ‘strong’ intensity.
If the receiver did not perceive any of the six emotions from the facial animation, they selected ‘other’. Sixty Western receivers (white European, 31 females, mean age = 22 years,
SD = 1.71 years) and 60 East Asian receivers (Chinese, 24 females, mean age = 23 years, SD =
1.55 years; see full details in Methods, under ‘Participants’) each completed 2,400 such trials with all facial animations displayed on same-ethnicity male and female faces (full details are provided in Methods, under ‘Stimuli and procedure’).
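To make this generative procedure concrete, the following minimal sketch samples one such random trial, assuming a simplified piecewise rise-and-fall activation curve; the frame count, parameter ranges, and names (sample_trial, au_time_course) are illustrative assumptions rather than the generator's actual implementation36.

```python
import numpy as np

rng = np.random.default_rng(0)

N_CORE_AUS = 42   # core set of Action Units (AUs)
N_FRAMES = 30     # frames per animation (illustrative value)

def au_time_course(onset, accel, peak_amp, peak_lat, decel, offset,
                   n_frames=N_FRAMES):
    """Illustrative activation curve built from six temporal parameters:
    zero until `onset`, rises to `peak_amp` at `peak_lat`, then falls
    back to zero by `offset`; `accel` and `decel` shape the rise and fall."""
    t = np.arange(n_frames, dtype=float)
    act = np.zeros(n_frames)
    rise = (t >= onset) & (t <= peak_lat)
    fall = (t > peak_lat) & (t <= offset)
    act[rise] = peak_amp * ((t[rise] - onset) / (peak_lat - onset)) ** accel
    act[fall] = peak_amp * (1 - (t[fall] - peak_lat) / (offset - peak_lat)) ** decel
    return act

def sample_trial():
    """Randomly select 1-5 AUs and assign each a random movement."""
    n_aus = rng.integers(1, 6)  # minimum of 1 AU, maximum of 5 AUs
    aus = rng.choice(N_CORE_AUS, size=n_aus, replace=False)
    movements = {}
    for au in aus:
        onset = rng.uniform(0, N_FRAMES / 3)
        peak_lat = rng.uniform(onset + 1, 2 * N_FRAMES / 3)
        offset = rng.uniform(peak_lat + 1, N_FRAMES - 1)
        movements[int(au)] = au_time_course(
            onset=onset, accel=rng.uniform(0.5, 2.0),
            peak_amp=rng.uniform(0.2, 1.0), peak_lat=peak_lat,
            decel=rng.uniform(0.5, 2.0), offset=offset)
    return movements  # dict mapping AU index to per-frame activation

trial_aus = sample_trial()
```

In such a scheme, each AU's per-frame activation vector would drive the corresponding deformation in the photo-realistic face renderer, and the binary record of which AUs were selected on each trial forms the face movement vector analysed below.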
Using this procedure, we therefore captured on each experimental trial the dynamic face movement patterns that elicited a given emotion category and intensity perception in the receiver – e.g., ‘happy’, ‘strong’ intensity; ‘sad’, ‘low’ intensity, and so forth (see Fig. 2b,
highlighted in blue). After many such trials, we were then able to build the statistical relationship between the face movements presented on each trial and the receiver’s perceptions, to produce a statistically robust model of the face movement patterns that transmit emotion category and emotion intensity information to each receiver. As illustrated above, the strength of this data-driven approach is that it can objectively and precisely characterize the face movements that receivers use to classify emotions and to judge their intensity. This agnostic approach to generating face movements and testing them against subjective perception is therefore less constrained than theory-focused methods, and can instead extract the communicative functions of face movements directly from the receiver’s implicit knowledge29,38,39.
Following the experimental trials, we identified – for each receiver – the individual face movements (i.e., AUs) that corresponded to each of the three communicative functions depicted in the Venn diagram in Fig. 1: emotion classifiers (colour-coded in green), intensifiers (red), and classifier and intensifiers (orange). To do so, we measured the strength of the statistical relationship between each individual face movement and the receiver’s emotion classification and intensity responses using Mutual Information37,40 (MI). MI is the most general measure of the statistical relationship between two variables that makes no assumption about the distribution of the variables or the nature of their relationship37 (e.g. linear, nonlinear). For each culture separately, we computed MI as follows (an illustrative code sketch follows the list below).
1. Classifier face movements (green set). To identify classifier AUs, we computed the
MI between each AU (present vs. absent on each trial) and each receiver’s emotion
classification responses (e.g., individual trials categorized as ‘happy’ vs. those that
were not). To establish statistical significance, we used a non-parametric permutation
test, which derived a chance level distribution for each receiver by randomly shuffling
their responses. We then identified the AUs with statistically significant MI values,
controlling the family-wise error rate (FWER) over AUs with the method of
maximum statistics (FWER P < 0.05 within receiver test). Significant MI values
indicate a strong relationship between a specific facial movement (e.g., Lip Corner
Puller, AU12) and the classification of an emotion (e.g., ‘happy’; full details in
Methods, under ‘Characterizing the communicative function of face movements’).
2. Intensifier face movements (red set). To identify intensifier AUs, we used all trials
associated with a given emotion classification response as described in (1) – for
example, all ‘anger’ trials – and identified within those trials the specific AUs that
intensify that emotion. Specifically, we computed across trials the MI between each
AU (present vs. absent on each trial) and the receiver’s corresponding intensity
ratings (i.e., low vs. high). We established statistical significance using the same
permutation test and family-wise error rate method as described in (1) above and
identified the AUs with statistically significant MI values (FWER P < 0.05).
Significant MI values indicate a strong relationship between the face movement (e.g.,
Upper Lid Raiser, AU5) and the perceived intensity of the emotion (e.g., ‘high
intensity’ anger; full details in Methods, under ‘Characterizing the communicative
function of face movements’).
3. Classifier & Intensifier face movements (orange intersection). These AUs, which
serve a dual role, have significant MI values for both emotion classification (green
set) and intensification (red set).
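The per-receiver computation in steps (1)-(3) can be illustrated with a short sketch. For two discrete variables, MI is I(X;Y) = Σ_{x,y} p(x,y) log2 [p(x,y) / (p(x) p(y))]; below it is computed between binary AU presence and a binary response across trials, with the permutation test and maximum-statistics FWER correction. The function names, binarized intensity coding, and 1,000 permutations are assumptions for the example, not the authors' code.

```python
import numpy as np

def mutual_information(x, y):
    """MI in bits between two discrete (integer-coded) variables."""
    joint = np.zeros((x.max() + 1, y.max() + 1))
    np.add.at(joint, (x, y), 1)          # joint counts over trials
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x
    py = pxy.sum(axis=0, keepdims=True)  # marginal of y
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

def significant_aus(au_present, response, n_perm=1000, alpha=0.05, seed=0):
    """Flag AUs with significant MI, FWER-corrected via maximum statistics.

    au_present : (n_trials, n_aus) 0/1 array, AU absent/present per trial
    response   : (n_trials,) 0/1 array, e.g., 'happy' vs. not (classifier
                 analysis) or low vs. high intensity (intensifier analysis)
    """
    rng = np.random.default_rng(seed)
    n_aus = au_present.shape[1]
    observed = np.array([mutual_information(au_present[:, j], response)
                         for j in range(n_aus)])
    # Chance-level distribution of the *maximum* MI over AUs, obtained by
    # shuffling the receiver's responses: controls the FWER over all AUs
    max_null = np.empty(n_perm)
    for p in range(n_perm):
        shuffled = rng.permutation(response)
        max_null[p] = max(mutual_information(au_present[:, j], shuffled)
                          for j in range(n_aus))
    threshold = np.quantile(max_null, 1 - alpha)  # FWER P < alpha over AUs
    return observed > threshold
```

In this sketch, classifier AUs would come from one call with the emotion classification as the response; intensifier AUs from a second call restricted to that emotion's trials, with intensity ratings as the response; and classifier and intensifier AUs (step 3) from the intersection of the two resulting masks.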
We applied the above analysis to the data of each individual receiver. Finally, we computed the population prevalence of each of the above statistical effects. This indicates the proportion of the population from which the sample of experimental participants was drawn
that would be expected to show the same effect, if subjected to the same experimental procedure. Inferring a non-zero population prevalence41,42 at P = 0.05 (Bonferroni corrected over emotions) corresponds to a significance threshold of N > 10 receivers showing a significant result in each culture (see full details in Methods, under ‘Characterizing the communicative function of face movements, Population prevalence’). That is, 10 out of our
60 receivers showing an effect provides enough evidence to reject a null hypothesis that the population prevalence proportion is 0 (i.e., no receiver in the population would show the effect). Fig. 3a and b show these results.
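Before turning to those results, the stated threshold can be checked with a simple binomial argument: under the zero-prevalence null, each receiver's within receiver test is significant with probability at most 0.05, so the number of significant receivers among 60 is binomially distributed. A minimal sketch (assuming scipy; the Bonferroni factor of six emotions follows the text, and the full prevalence method of refs. 41,42 may differ):

```python
from scipy.stats import binom

n_receivers, alpha_within, n_emotions = 60, 0.05, 6

# P(k or more of 60 receivers significant) under the zero-prevalence null,
# i.e., when every within-receiver result is a false positive at rate 0.05
k = 10
p_value = binom.sf(k - 1, n_receivers, alpha_within)

# Reject zero prevalence if below the Bonferroni-corrected level
print(p_value, p_value < 0.05 / n_emotions)  # ~7e-04, True: 10 of 60 suffices
```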
Fig. 3 | Face movements that transmit emotion category and/or intensity information in the six classic emotions in each culture. a, Each row of colour-coded faces shows, for each culture – Western and East Asian – and for each of the six classic emotions, the face movements (i.e., AUs) receivers used to classify the emotion (‘Classify,’ in green), perceive its intensity (‘Intensify,’ in red) or both (‘Classify & Intensify,’ in orange; see Venn diagram). Colour saturation shows the number of statistically significant receivers for each AU (FWER P < 0.05 within receiver test; see colour bar to right, normalized per emotion). b, Results above presented in tabular format. Only AUs with >10 significant receivers are shown, as this corresponds to rejecting the population null hypothesis of zero prevalence (P < 0.05, Bonferroni corrected). For example, Sharp Lip Puller (AU13) is a classifier (green) in happy in 42/60 Western receivers and 51/60 East Asian receivers. We repeated this analysis with a broad set of complex facial expressions of emotion in each culture (Supplemental Fig. S1 shows full results). The colour-coded matrix on the right (‘Classic & complex’) shows the communicative function of each AU across these two data sets. Colour saturation shows the proportion of emotion categories in which the AU serves a given communicative function, ranked from highest to lowest (left to right). For example, Inner Brow Raiser (AU2) is exclusively a classifier (green) in each culture and is thus represented by a fully saturated green cell in each culture. Cross-cultural comparison of the communicative functions of the AUs showed cross-cultural differences (denoted by black dots) and similarities (no marking). c, Colour-coded face maps show the AUs that serve the same or different communicative function across cultures. The list of AUs is shown next to each face.
In Fig. 3a, each row of colour-coded faces shows the face movements that serve each type of communicative function: emotion classification (‘Classify,’ in green), intensification
(‘Intensify,’ in red) and classification and intensification (‘Classify and Intensify,’ in orange; see Methods, under ‘Characterizing the communicative function of face movements’). Colour saturation represents the number of receivers with a significant effect (FWER P < 0.05 within receiver test; see colour bar to the right of Fig. 3a) above the population prevalence41,42 threshold
(see Methods, under ‘Characterizing the communicative function of face movements’). Fig.
3b shows each of these face movements separately, for each emotion, colour-coded by communicative function, and with the number of receivers showing this effect. For example,
Lip Corner Puller-Cheek Raiser (AU12-6) in happy serves as a classifier (green) in 34/60 receivers and as a classifier and intensifier (orange) in 24/60 receivers43. Mouth Stretch
(AU27) serves as a classifier and intensifier (orange) in surprise and fear in 41/60 and 12/60 receivers, respectively, and as an intensifier (red) in anger in 15/60 receivers.
To validate the findings reported for the six classic emotions, we applied the same analyses to a second set of facial expressions of 50 more complex emotions in each culture, including excited, embarrassed, anxious, and hate in Western culture and amazed, shame,