
EmoGraph: Capturing Emotion Correlations using Graph Networks

Peng Xu∗, Zihan Liu∗, Genta Indra Winata, Zhaojiang Lin, Pascale Fung
Center for Artificial Intelligence Research (CAiRE)
Department of Electronic and Computer Engineering
The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
{pxuab, zliucr, giwinata, zlinao}@connect.ust.hk

∗ Equal contributions.

Abstract

Most methods tackle the emotion understanding task by considering individual emotions independently while ignoring their fuzziness nature and the interconnections among them. In this paper, we explore how emotion correlations can be captured and how they can help different classification tasks. We propose EmoGraph, which captures the dependencies among different emotions through graph networks. These graphs are constructed by leveraging the co-occurrence statistics among different emotion categories. Empirical results on two multi-label classification datasets demonstrate that EmoGraph outperforms strong baselines, especially for macro-F1. An additional experiment illustrates that the captured emotion correlations can also benefit a single-label classification task.

1 Introduction

Understanding human emotions is considered as the key to building engaging dialogue systems (Zhou et al., 2018). However, most works on emotion understanding tasks treat individual emotions independently while ignoring their fuzziness nature and the interconnections among them. A psychoevolutionary theory proposed by Plutchik (1984) shows that different emotions are actually correlated, and all emotions follow a circular structure. For example, "optimism" is close to "joy" and "anticipation" instead of "disgust" and "sadness". Without considering the fundamental inter-correlation between them, the understanding of emotions can be unilateral, leading to sub-optimal performance. Such understanding can be particularly important for low-resource emotions, such as "surprise" and "trust", whose training samples are hard to get. Therefore, the research question we ask is, how can we obtain and incorporate the emotion correlations to improve emotion understanding tasks, such as classification?

To obtain emotion correlations, a possible way is to take advantage of a multi-label emotion dataset. Intuitively, emotions with high correlations will be labeled together, and therefore, emotion correlations can be extracted from the label co-occurrences. Recently, a multi-label emotion classification competition (Mohammad et al., 2018) with 11 emotions has been introduced to promote research into emotional understanding. To tackle this challenge, the best team (Baziotis et al., 2018) first pre-trains on a large amount of external emotion-related datasets and then performs transfer learning on this multi-label task. However, they still ignored the correlations between different emotions.

In this paper, we propose EmoGraph, which leverages graph neural networks to model the dependencies between different emotions. We take each emotion as a node and first construct an emotion graph based on the co-occurrence statistics between every two emotion classes. Graph neural networks are then applied to extract the features from the neighbours of each emotion node. We conduct experiments on two multi-label emotion classification datasets. Empirical results show that our model outperforms strong baselines, especially for the macro-F1 score. The analysis shows that low-resource emotions, such as "surprise", can particularly benefit from the emotion correlations. An additional experiment illustrates that the captured emotion correlations can also help the single-label emotion classification task.

2 Related Work

For emotion classifications, Tang et al. (2016) proposed sentiment embeddings that incorporate sentiment into word vectors.
Felbo et al. (2017) trained a huge LSTM-based emotion representation by predicting emojis. Various methods have also been developed for the automatic construction of sentiment lexicons using both supervised and unsupervised methods (Wang and Xia, 2017). Duppada et al. (2018) combined both pre-trained representations and emotion lexicon features, which significantly improved emotion understanding systems. Park et al. (2018); Fung et al.; Xu et al. (2018) encoded emotion information into word representations and demonstrated improvements over emotion classification tasks. Shin et al. (2019); Lin et al. (2019); Xu et al. (2019) further introduced emotion supervision into generation tasks.

Multi-label classification is an important yet challenging task in natural language processing. Binary relevance (Boutell et al., 2004) transformed the multi-label problem into several independent classifiers. Other methods to model the dependencies have since been proposed by creating new labels (Tsoumakas and Katakis, 2007), using classifier chains (Read et al., 2011), graphs (Li et al., 2015), and RNNs (Chen et al., 2017; Yang et al., 2018). These models are either non-scalable or model the labels as a sequence.

Graph networks have been applied to model relations across different tasks such as image recognition (Chen et al., 2019; Satorras and Estrach, 2018) and text classification (Ghosal et al., 2019; Yao et al., 2019) with different graph networks (Kipf and Welling, 2017; Veličković et al., 2018).

Despite the growing interest in low-resource studies in machine translation (Artetxe et al., 2017; Lample et al., 2017), dialogue systems (Bapna et al., 2017; Liu et al., 2019, 2020), speech recognition (Miao et al., 2013; Thomas et al., 2013; Winata et al., 2020), emotion recognition (Haider et al., 2020), etc., emotion detection for low-resource emotions has been less studied.

3 Methodology

In this section, we first introduce the emotion graphs and then our emotion classification models. We denote the input sentence as x and the emotion classes as e = {e_1, e_2, ..., e_n}. The label for x is y, where y ∈ {0, 1}^n and y_j denotes the label for e_j. The embedding matrix is E. The co-occurrence matrix of these emotions is M.

3.1 Emotion Graphs

We take each emotion class as a node in the emotion graph. To create the connections between emotion nodes, we use the co-occurrence statistics between emotions. The intuition is that if two emotions co-occur frequently, they will have a high correlation. Directly using this co-occurrence matrix as our graph may be problematic because the co-occurrence matrix is symmetric while the emotion relation is not symmetric. For example, in our corpus, "anticipation" co-occurs with "optimism" 197 times, while "anticipation" appears 425 times and "optimism" appears 1143 times. Thus, knowing that "anticipation" and "optimism" co-occur is notably more important for "anticipation" than for "optimism". Therefore, we calculate the co-occurrence matrix M from a given emotion corpus and then normalize M_{i,j} with M_{i,i} so that the graph encodes the asymmetric relation between different emotions:

    G^1_{i,j} = M_{i,j} / M_{i,i}.    (1)

Due to the fuzziness nature of emotions, the graph matrix G^1 may contain some noise. Thus, we adopt the approach in Chen et al. (2019) to binarize G^1 with a threshold µ to reduce noise and tune another hyper-parameter w to mitigate the over-smoothing problem (Li et al., 2018):

    G^2_{i,j} = 1 if G^1_{i,j} ≥ µ, and 0 otherwise.    (2)

    G_{i,j} = G^2_{i,j} / (Σ_{k=1}^{n} G^2_{i,k}) if i ≠ j, and 1 − w otherwise.    (3)
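To make the construction concrete, the following is a minimal NumPy sketch of Eqs. (1)–(3). It assumes a multi-hot label matrix Y with one row per example and one column per emotion class; the function name and the defensive clamping of empty rows are our own additions and are not part of the paper's released code.

```python
import numpy as np

def build_emotion_graph(Y: np.ndarray, mu: float = 0.4, w: float = 0.35) -> np.ndarray:
    """Sketch of Eqs. (1)-(3). Y is a (num_samples, n) multi-hot label matrix."""
    # Co-occurrence counts: M[i, j] = #samples labeled with both emotions i and j,
    # so M[i, i] = #samples labeled with emotion i.
    M = Y.T @ Y                                            # (n, n)
    # Eq. (1): asymmetric statistics G1[i, j] = M[i, j] / M[i, i].
    G1 = M / np.maximum(M.diagonal(), 1)[:, None]
    # Eq. (2): binarize with threshold mu to drop noisy edges.
    G2 = (G1 >= mu).astype(float)
    # Eq. (3): normalize each row's off-diagonal entries by the row sum and
    # set the diagonal to 1 - w to mitigate over-smoothing.
    row_sum = np.maximum(G2.sum(axis=1, keepdims=True), 1)
    G = G2 / row_sum
    np.fill_diagonal(G, 1.0 - w)
    return G
```

The default values above mirror the SemEval-2018 GCN setting reported in Appendix A.2 (µ = 0.4, w = 0.35).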
3.2 Emotion Classification Models

Our emotion classification model consists of an encoder and graph-based emotion classifiers, following the framework of Chen et al. (2019). For the encoder, we choose the Transformer (TRS) (Vaswani et al., 2017) and the pre-trained model BERT (Devlin et al., 2019) for their strong representation capacities on many natural language tasks. We denote the encoded representation of x as s. For the graph-based emotion classifiers, we experiment with two types of graph networks: GCN (Kipf and Welling, 2017) and GAT (Veličković et al., 2018). The GCN takes the features of each emotion node as inputs and applies a convolution over neighboring nodes to generate the classifier for e_i:

    C = ReLU(G̃ E_e W_1),    (4)

where G̃ is the normalized matrix of G following Kipf and Welling (2017), E_e is the embeddings for all emotions e, and W_1 is the trainable parameters. Our emotion classifiers are C = C_1, C_2, ..., C_n, where C_i is the classifier for e_i. Alternatively, GAT takes the features of each node as input and learns a multi-head self-attention over the emotion nodes to generate the classifiers C.

We then simply take the inner product between s and C_i to compute ŷ_i, the logits of the emotion class e_i for classification:

    ŷ_i = s · C_i.    (5)

As our task is a multi-label prediction problem, we add a sigmoid activation to ŷ_i and use a cross-entropy loss function.
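A minimal PyTorch sketch of Eqs. (4)–(5) and the multi-label loss is given below. The class name, the hidden-size argument, and the symmetric normalization used for G̃ are illustrative assumptions on our part; the paper only states that G̃ is the normalized matrix of G following Kipf and Welling (2017).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphEmotionClassifier(nn.Module):
    """Sketch of Eqs. (4)-(5): a GCN over emotion nodes yields one classifier per emotion."""

    def __init__(self, emotion_emb: torch.Tensor, G: torch.Tensor, hidden: int):
        super().__init__()
        # E_e: (n, d_emb) emotion label embeddings (e.g., initialized from GloVe).
        self.emotion_emb = nn.Parameter(emotion_emb.clone())
        # One common reading of the normalization: G~ = D^{-1/2} G D^{-1/2}.
        d_inv_sqrt = G.sum(dim=1).clamp(min=1e-6).pow(-0.5)
        self.register_buffer("G_norm", d_inv_sqrt[:, None] * G * d_inv_sqrt[None, :])
        # W_1: trainable projection from the label-embedding space to the encoder space.
        self.W1 = nn.Linear(emotion_emb.size(1), hidden, bias=False)

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        # Eq. (4): C = ReLU(G~ E_e W_1); row C_i is the classifier for emotion e_i.
        C = F.relu(self.G_norm @ self.W1(self.emotion_emb))   # (n, hidden)
        # Eq. (5): logits are inner products between the sentence encoding s and each C_i.
        return s @ C.t()                                       # (batch, n)

# Multi-label objective: sigmoid + binary cross-entropy on the logits, e.g.
# loss = F.binary_cross_entropy_with_logits(head(s), y.float())
```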

4 Experimental Setup

4.1 Dataset and Evaluation Metrics

We choose two datasets for our multi-label classification training and evaluation. SemEval-2018 (Mohammad et al., 2018) contains 10,983 tweets with 11 different emotion categories. It is divided into three splits: a training set (6838 samples), a validation set (886 samples), and a testing set (3259 samples). Due to its small number of labels and small dataset size, we crawled another Twitter dataset where we use emojis as emotion labels. We follow the same 64 emoji types as in Felbo et al. (2017) and collect 4 million tweets, where each tweet has at least two emojis. We split them into 2.8 million (70%) for training, 0.4 million (10%) for validation, and 0.8 million (20%) for testing.

Following the metrics in Mohammad et al. (2018), we use Jaccard accuracy, micro-averaged F1-score (micro-F1), and macro-averaged F1-score (macro-F1) as our evaluation metrics.

4.2 EmoGraph and Baselines

Our EmoGraph has four variants, TRS-GCN, TRS-GAT, BERT-GCN, and BERT-GAT, depending on the choice of sentence encoder (TRS/BERT) and graph network (GCN/GAT). We compare our model to several strong baselines. NTUA-SLP (Baziotis et al., 2018) is the top-1 system of the SemEval-2018 competition, which pre-trains the model on large amounts of external emotion-related datasets. DATN (Yu et al., 2018) is the system that transfers sentiment information with a dual attention transfer network. SGM (Yang et al., 2018) models labels as a sequence and generates the labels using beam search. TRS is the system that adds a linear layer on top of the Transformer (Vaswani et al., 2017). BERT is the system that adds a linear layer on top of BERT (Devlin et al., 2019).

5 Results and Analysis

Results on two emotion classification datasets. The results on the SemEval-2018 dataset are shown in Table 1, which illustrates several points. Firstly, BERT-GCN/TRS-GCN consistently improves over the baseline of BERT/TRS in terms of accuracy (+0.5%/+0.8%), micro-F1 (+0.6%/+0.4%), and macro-F1 (+2.5%/+2.5%). It shows the effectiveness of the emotion graph by giving a performance boost on the strong baseline BERT, especially on the macro-F1 score. Secondly, our BERT-GCN achieves a 58.9% accuracy score, 70.7% micro-F1 score, and 56.3% macro-F1 score, which beats the best system NTUA-SLP in terms of all metrics, without using any extra features or pre-training on emotional datasets. Thirdly, EmoGraph is particularly effective in macro-F1. For example, BERT-GAT is 2.5% better than DATN and 3.1% better than BERT.

           Accuracy   Micro-F1   Macro-F1
NTUA-SLP   58.8       70.1       52.8
DATN       58.3       -          54.4
SGM        48.2       57.5       41.1
TRS        51.1       63.4       46.7
TRS-GAT    51.7       64.6       48.3
TRS-GCN    51.9       63.8       49.2
BERT       58.4       70.1       53.8
BERT-GAT   58.3       69.9       56.9
BERT-GCN   58.9       70.7       56.3

Table 1: Comparisons among different systems on the SemEval-2018 dataset.

We notice that for both accuracy and micro-F1, the improvements of the graph networks are very marginal on the SemEval-2018 dataset. This is because the small emotion label space (only 11 emotions) limits the effectiveness of emotion graphs. Thus, we train our models on another Twitter dataset with 64 emoji labels.

           # Parameters   Anger   Fear   Happiness   Surprise   Sadness   Average F1   Accuracy
LSTM       1.11M          72.7    12.2   53.7        31.3       69.3      46.2         66.7
LSTM-GCN   1.16M          75.6    39.1   60.8        38.0       70.5      54.7         69.5
BERT       110M           83.4    39.1   69.9        47.1       80.8      62.0         78.5
BERT-GCN   110M           83.8    51.6   72.9        48.7       82.0      65.4         79.8

Table 3: Comparisons of F1 scores and accuracy between EmoGraph with and without the graph on the IEMOCAP dataset.

Table 2 shows that both TRS-GCN and TRS-GAT consistently improve over TRS by a large margin for all metrics, which shows that graph networks are better at dealing with rich emotion information.¹

¹ Data statistics, training details, and ablation studies for the threshold µ (Eq. 2) and the weight w (Eq. 3) are in the appendix.

           Accuracy   Micro-F1   Macro-F1
TRS        30.4       46.5       35.0
SGM        33.5       42.9       25.2
TRS-GAT    33.1       49.1       38.6
TRS-GCN    34.4       50.5       40.8

Table 2: Comparisons among different systems on the Twitter dataset.

Figure 1: The graph visualization of G. The arrow denotes the conditional dependency from the source node to the target node.

Connection to the wheel of emotions. Following the positioning of the wheel of emotions (Plutchik, 1984), we visualize the graph structure G in Figure 1. It shows that our emotion graph can be divided into three sub-graphs: 1) the nodes connected with "disgust", 2) "fear", and 3) the nodes connected with "joy". Most nodes are locally connected, which means the positioning in the wheel of emotions does contain the correlation information. One exception is that "anticipation" is also close to "anger" in the wheel of emotions, which is not true in our case. Another exception we found is that "surprise" is close to "joy" in our emotion graph, while they are quite far away from each other in the wheel of emotions. We believe it reflects the bias when people post tweets online, and the distances between neighboring emotions are not quantitatively the same as in the wheel of emotions.

           Surprise   Trust
BERT       18.7       6.9
BERT-GAT   31.9       14.8
BERT-GCN   27.8       24.3

Table 4: Comparison of F1 scores between BERT-GAT, BERT-GCN and BERT for "surprise" and "trust".

Analysis on low-resource emotions. Table 4 shows that the significant improvements in terms of the macro-F1 on the SemEval-2018 dataset mainly come from the low-resource emotions, such as "surprise" and "trust", where only 5% of the labels are positive. From Figure 1, we observe that both "surprise" and "trust" are connected to "joy", which has 39.3% of samples labeled as positive. We conjecture that low-resource emotion classifiers learn an effective inductive bias through connections with high-resource emotion classes, such as "joy" and "optimism", and achieve better performances.

Generalization to the single emotion classification task. To verify the generalization ability of EmoGraph, we conduct an extra experiment on the IEMOCAP dataset (Busso et al., 2008), which is a single-label multi-class classification dataset. We only use the six emotions that overlap with the SemEval-2018 task, which are "anger", "sadness", "happiness", "disgust", "fear" and "surprise", with 2931 samples (the graph matrix G is obtained from the SemEval-2018 dataset). We then train our EmoGraph using either a one-layer LSTM with attention or BERT as the encoder and GCN as the graph structure. To adapt the change from 11 emotions to 6 emotions, we only train the classifiers C_i corresponding to those six emotions (as sketched below). The results with 10-fold cross-validation are reported in Table 3. We did not report "disgust" as there are fewer than five positive examples. The results confirm that with the captured emotion correlations, classification performances can be improved by 2.8%/8.5% accuracy/average F1 score using the LSTM encoder and by 1.3%/3.4% accuracy/average F1 score using the BERT encoder. Surprisingly, for both encoders, GCN improves by more than 10% on "fear" (a low-resource emotion in the IEMOCAP dataset), although it doesn't have connections with other emotions in the graph. We conjecture that the improvements come from the shared representation of W_1 in Eq. 4 among all emotion categories, which helps our model optimize to a better local minimum.
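A minimal sketch of this adaptation, assuming a hypothetical index list overlap_idx over the 11 SemEval-2018 classes and reusing the classifier matrix C from Eq. (4); the softmax cross-entropy shown for the single-label setting is also our assumption.

```python
import torch

# Hypothetical index map: positions of the six IEMOCAP emotions within the
# 11 SemEval-2018 classes (the ordering below is illustrative only).
overlap_idx = torch.tensor([0, 2, 3, 4, 8, 9])

def iemocap_logits(s: torch.Tensor, C: torch.Tensor) -> torch.Tensor:
    """Restrict the emotion classifiers to the six overlapping classes (Eq. 5)."""
    # C is the (11, hidden) classifier matrix from Eq. (4); only six rows are kept,
    # so only those classifiers receive gradients during training.
    return s @ C[overlap_idx].t()   # (batch, 6)

# Single-label objective (our assumption for the multi-class IEMOCAP setting):
# loss = torch.nn.functional.cross_entropy(iemocap_logits(s, C), target_class)
```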
6 Conclusion

In this paper, we proposed EmoGraph, which leverages graph neural networks to model the dependencies among different emotions. We consider each emotion as a node and construct an emotion graph based on the co-occurrence statistics. Our model with EmoGraph outperforms the existing strong multi-label classification baselines. Our analysis shows that EmoGraph is especially helpful for low-resource emotions and large emotion spaces. An additional experiment shows that it can also help a single-label emotion classification task.

Acknowledgments

This work is partially funded by ITF/319/16FP, HKUST 16248016 of the Hong Kong Research Grants Council and MRP/055/18 of the Innovation Technology Commission, the Hong Kong SAR Government.

References

Mikel Artetxe, Gorka Labaka, Eneko Agirre, and Kyunghyun Cho. 2017. Unsupervised neural machine translation. arXiv preprint arXiv:1710.11041.

Ankur Bapna, Gokhan Tür, Dilek Hakkani-Tür, and Larry Heck. 2017. Towards zero-shot frame semantic parsing for domain scaling. Proc. Interspeech 2017, pages 2476–2480.

Christos Baziotis, Athanasiou Nikolaos, Alexandra Chronopoulou, Athanasia Kolovou, Georgios Paraskevopoulos, Nikolaos Ellinas, Shrikanth Narayanan, and Alexandros Potamianos. 2018. NTUA-SLP at SemEval-2018 task 1: Predicting affective content in tweets with deep attentive RNNs and transfer learning. In Proceedings of The 12th International Workshop on Semantic Evaluation, pages 245–255.

Matthew R Boutell, Jiebo Luo, Xipeng Shen, and Christopher M Brown. 2004. Learning multi-label scene classification. Pattern Recognition, 37(9):1757–1771.

Carlos Busso, Murtaza Bulut, Chi-Chun Lee, Abe Kazemzadeh, Emily Mower, Samuel Kim, Jeannette N Chang, Sungbok Lee, and Shrikanth S Narayanan. 2008. IEMOCAP: Interactive emotional dyadic motion capture database. Language Resources and Evaluation, 42(4):335.

Guibin Chen, Deheng Ye, Zhenchang Xing, Jieshan Chen, and Erik Cambria. 2017. Ensemble application of convolutional and recurrent neural networks for multi-label text categorization. In 2017 International Joint Conference on Neural Networks (IJCNN), pages 2377–2383. IEEE.

Zhao-Min Chen, Xiu-Shen Wei, Peng Wang, and Yanwen Guo. 2019. Multi-label image recognition with graph convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5177–5186.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186.

Venkatesh Duppada, Royal Jain, and Sushant Hiray. 2018. SeerNet at SemEval-2018 task 1: Domain adaptation for affect in tweets. In Proceedings of The 12th International Workshop on Semantic Evaluation, pages 18–23.

Bjarke Felbo, Alan Mislove, Anders Søgaard, Iyad Rahwan, and Sune Lehmann. 2017. Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1615–1625.

Pascale Fung, Dario Bertero, Peng Xu, Ji Ho Park, Chien-Sheng Wu, and Andrea Madotto. Empathetic dialog systems.
Sentiment embed- with label and context dependence. In Proceedings dings with applications to sentiment analysis. IEEE of the 53rd Annual Meeting of the Association for Transactions on Knowledge and Data Engineering, Computational Linguistics and the 7th International 28(2):496–509. Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1045–1053. Samuel Thomas, Michael L Seltzer, Kenneth Church, and Hynek Hermansky. 2013. Deep neural net- Zhaojiang Lin, Andrea Madotto, Jamin Shin, Peng Xu, work features and semi-supervised training for low and Pascale Fung. 2019. Moel: Mixture of empa- resource speech recognition. In 2013 IEEE inter- thetic listeners. In Proceedings of the 2019 Con- national conference on acoustics, speech and signal ference on Empirical Methods in Natural Language processing, pages 6704–6708. IEEE. Processing and the 9th International Joint Confer- ence on Natural Language Processing (EMNLP- Grigorios Tsoumakas and Ioannis Katakis. 2007. IJCNLP), pages 121–132. Multi-label classification: An overview. Interna- tional Journal of Data Warehousing and Mining Zihan Liu, Jamin Shin, Yan Xu, Genta Indra Winata, (IJDWM), 3(3):1–13. Peng Xu, Andrea Madotto, and Pascale Fung. 2019. Zero-shot cross-lingual dialogue systems with trans- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob ferable latent variables. In Proceedings of the Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz 2019 Conference on Empirical Methods in Natu- Kaiser, and Illia Polosukhin. 2017. Attention is all ral Language Processing and the 9th International you need. In Advances in neural information pro- Joint Conference on Natural Language Processing cessing systems, pages 5998–6008. (EMNLP-IJCNLP), pages 1297–1303. Petar Velikovi, Guillem Cucurull, Arantxa Casanova, Zihan Liu, Genta Indra Winata, Peng Xu, and Pas- Adriana Romero, Pietro Li, and Yoshua Bengio. cale Fung. 2020. Coach: A coarse-to-fine approach 2018. Graph attention networks. In International for cross-domain slot filling. In Proceedings of the Conference on Learning Representations. 58th Annual Meeting of the Association for Compu- tational Linguistics, pages 19–25, Online. Associa- Leyi Wang and Rui Xia. 2017. Sentiment lexicon con- tion for Computational Linguistics. struction with representation learning based on hier- archical sentiment supervision. In Proceedings of Yajie Miao, Florian Metze, and Shourabh Rawat. 2013. the 2017 Conference on Empirical Methods in Natu- Deep maxout networks for low-resource speech ral Language Processing, pages 502–510. recognition. In 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, pages 398– Genta Indra Winata, Samuel Cahyawijaya, Zihan Liu, 403. IEEE. Zhaojiang Lin, Andrea Madotto, Peng Xu, and Pascale Fung. 2020. Learning fast adaptation on Saif Mohammad, Felipe Bravo-Marquez, Mohammad cross-accented speech recognition. arXiv preprint Salameh, and Svetlana Kiritchenko. 2018. Semeval- arXiv:2003.01901. 2018 task 1: Affect in tweets. In Proceedings of The 12th International Workshop on Semantic Eval- Peng Xu, Andrea Madotto, Chien-Sheng Wu, Ji Ho uation, pages 1–17. Park, and Pascale Fung. 2018. Emo2vec: Learn- Ji Ho Park, Peng Xu, and Pascale Fung. 2018. ing generalized emotion representation by multi- Plusemo2vec at semeval-2018 task 1: Exploiting task training. In Proceedings of the 9th Workshop emotion knowledge from emoji and# hashtags. 
Duyu Tang, Furu Wei, Bing Qin, Nan Yang, Ting Liu, and Ming Zhou. 2016. Sentiment embeddings with applications to sentiment analysis. IEEE Transactions on Knowledge and Data Engineering, 28(2):496–509.

Samuel Thomas, Michael L Seltzer, Kenneth Church, and Hynek Hermansky. 2013. Deep neural network features and semi-supervised training for low resource speech recognition. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 6704–6708. IEEE.

Grigorios Tsoumakas and Ioannis Katakis. 2007. Multi-label classification: An overview. International Journal of Data Warehousing and Mining (IJDWM), 3(3):1–13.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008.

Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. Graph attention networks. In International Conference on Learning Representations.

Leyi Wang and Rui Xia. 2017. Sentiment lexicon construction with representation learning based on hierarchical sentiment supervision. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 502–510.

Genta Indra Winata, Samuel Cahyawijaya, Zihan Liu, Zhaojiang Lin, Andrea Madotto, Peng Xu, and Pascale Fung. 2020. Learning fast adaptation on cross-accented speech recognition. arXiv preprint arXiv:2003.01901.

Peng Xu, Andrea Madotto, Chien-Sheng Wu, Ji Ho Park, and Pascale Fung. 2018. Emo2Vec: Learning generalized emotion representation by multi-task training. In Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 292–298.

Peng Xu, Chien-Sheng Wu, Andrea Madotto, and Pascale Fung. 2019. Clickbait? Sensational headline generation with auto-tuned reinforcement learning. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3056–3066.

Pengcheng Yang, Xu Sun, Wei Li, Shuming Ma, Wei Wu, and Houfeng Wang. 2018. SGM: Sequence generation model for multi-label classification. In Proceedings of the 27th International Conference on Computational Linguistics, pages 3915–3926.

Liang Yao, Chengsheng Mao, and Yuan Luo. 2019. Graph convolutional networks for text classification. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 7370–7377.

Jianfei Yu, Luís Marujo, Jing Jiang, Pradeep Karuturi, and William Brendel. 2018. Improving multi-label emotion classification via sentiment classification with dual attention transfer network. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1097–1102, Brussels, Belgium. Association for Computational Linguistics.

Li Zhou, Jianfeng Gao, Di Li, and Heung-Yeung Shum. 2018. The design and implementation of XiaoIce, an empathetic social chatbot. arXiv preprint arXiv:1812.08989.

A Appendices

Anger   Anticipation   Disgust   Fear   Joy    Love
36.1    13.9           36.6      16.8   39.3   12.3

Optimism   Pessimism   Sadness   Surprise   Trust
31.3       11.6        29.4      5.2        5.0

Table 5: Percentage of tweets that are labeled with a given emotion in the SemEval-2018 Task 1 dataset.

A.1 Preprocessing

To cope with the noise in the Twitter data, we lowercase the tweets, remove the "#" punctuation, and replace URL links, user mentions, and numbers with special placeholder tokens. For example, the original tweet "@JessicaZ00 @ZRlondon ditto!! Such an amazing atmosphere! We have 10 people here. #LondonEvents #cheer" will be cleaned as "ditto! such an amazing atmosphere! we have people here. londonevents cheer", with the two user mentions and the number replaced by their placeholder tokens. By doing so, we can reduce the vocabulary size and make the model easier to learn.
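A minimal sketch of this cleaning step; the placeholder strings <url>, <user>, and <number> and the exact regular expressions are our own assumptions, since the paper does not specify the token strings it uses.

```python
import re

def clean_tweet(text: str) -> str:
    """Lowercase, strip '#', and replace URLs, user mentions, and numbers with placeholders."""
    text = text.lower()
    text = re.sub(r"https?://\S+", "<url>", text)   # URL links -> <url> (assumed token string)
    text = re.sub(r"@\w+", "<user>", text)          # user mentions -> <user>
    text = re.sub(r"\b\d+\b", "<number>", text)     # numbers -> <number>
    text = text.replace("#", "")                    # drop '#' but keep the hashtag word
    return re.sub(r"\s+", " ", text).strip()

# clean_tweet('@JessicaZ00 @ZRlondon ditto!! Such an amazing atmosphere! '
#             'We have 10 people here. #LondonEvents #cheer')
# -> '<user> <user> ditto!! such an amazing atmosphere! we have <number> people here. londonevents cheer'
```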

A.2 Training Details

To train on SemEval-2018, we calculate the matrix G^1 based on the training and development sets. We set the threshold µ as 0.4 and the weight w as 0.35 for GCN. We set the threshold µ as 0.5 for GAT. For both BERT-GCN and BERT-GAT, the emotion label embeddings are initialized with 300-dim GloVe vectors. For the BERT structure, we choose the 12-layer BERT-base model, and the hidden size of the graph is set to 768. An Adam optimizer with a learning rate of 1e-4 is used to train the graph networks. Another Adam optimizer is used to train the BERT model with the learning rate set as 2e-5 and the dropout rate as 0.3. For TRS-GCN and TRS-GAT, the hidden size of the graph network is set to 200. The heads of the Transformer are 4, and the depth is 44. An Adam optimizer is used to train the full model with a 0.001 learning rate. For the Twitter dataset, we calculate the matrix G^1 based on the training set. We set the threshold µ as 0.1, and the weight w is set as 0.35 for both. We use the TRS as the sentence encoder. The hidden sizes of both the graph networks and TRS are 200. The heads of TRS are 6, and the depth is 120. An Adam optimizer is used to train our model with a 0.001 learning rate.

To train on the IEMOCAP dataset, the parameters of both GCN and BERT follow the same setting as in the SemEval-2018 task. For the LSTM-based method, we set the hidden size of the LSTM encoder as 200 and initialize the word embeddings with GloVe. An Adam optimizer with a learning rate of 0.001 is used to optimize all parameters.
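For reference, the SemEval-2018 hyper-parameters above can be gathered into a single configuration; the dictionary layout and key names are our own illustrative choice and cover only the two GCN variants.

```python
# Hyper-parameters reported above for the SemEval-2018 runs (key names are illustrative).
SEMEVAL_CONFIG = {
    "bert_gcn": {
        "mu": 0.4, "w": 0.35,               # graph threshold and re-weighting (Eqs. 2-3)
        "label_emb": "glove-300d",          # emotion label embedding initialization
        "bert": "bert-base (12 layers)",
        "graph_hidden": 768,
        "graph_lr": 1e-4, "bert_lr": 2e-5,  # two separate Adam optimizers
        "dropout": 0.3,
    },
    "trs_gcn": {
        "mu": 0.4, "w": 0.35,
        "graph_hidden": 200,
        "trs_heads": 4,
        "lr": 1e-3,                          # single Adam optimizer
    },
}
```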
33.37   21.86   38.13   16.63   0.41    15.40   3.71    4.35
32.95   0.95    3.78    1.43    5.02    0.21    3.13    10.13
2.59    0.26    9.67    4.40    2.18    2.83    4.86    2.23
1.39    1.58    3.50    3.03    1.98    3.35    1.14    3.73
2.92    3.21    9.91    0.51    2.16    4.00    3.30    2.97
5.14    3.66    1.69    1.89    2.43    6.95    2.10    2.56
0.19    3.83    1.35    3.49    1.55    2.12    2.83    0.78
1.48    2.03    0.94    0.25    1.28    0.53    0.56    0.28

Table 6: Percentage of tweets that are labeled with a given emoji in the Twitter data (one value per emoji, 64 emojis in total). Note that as our dataset is multi-labeled, the sum over all classes is not one.

                   Anger   Fear   Happiness   Surprise   Sadness
Distribution (%)   37.7    1.4    20.1        3.5        37.2

Table 7: Percentage of data samples that are labeled with a given emotion in the IEMOCAP dataset.

A.3 Data Statistics

The data statistics for SemEval-2018, Twitter, and IEMOCAP are shown in Table 5, Table 6, and Table 7, respectively.

From Table 5, we can see that in the SemEval-2018 dataset, only 5.2% and 5.0% of samples are labeled as positive for "surprise" and "trust", respectively. And from Table 7, we can see that in the IEMOCAP dataset, only 1.4% of samples are labeled as positive for "fear".

A.4 Effects of the threshold µ and the weight w

As illustrated in Figure 2 and Figure 3, we plot the accuracy, micro-F1, and macro-F1 on the test set by varying µ and w, respectively, for BERT-GCN. We also include another baseline BERT-GCN model which is trained by setting G = G^1 (w/o binarization), to see the effects of binarization (Eq. 2) and weighting (Eq. 3).
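A compact sketch of this sweep, assuming hypothetical callables build_graph(mu, w) (e.g., the construction sketched in Section 3.1) and train_and_eval(G) that trains BERT-GCN with graph G and returns the three test metrics; the grid values mirror the x-axes of Figures 2 and 3.

```python
from typing import Callable, Dict, Tuple
import numpy as np

Metrics = Tuple[float, float, float]  # (Jaccard accuracy, micro-F1, macro-F1)

def sweep_mu_w(build_graph: Callable[..., np.ndarray],
               train_and_eval: Callable[[np.ndarray], Metrics]) -> Dict[str, Metrics]:
    """Grid over the threshold mu (Figure 2) and the weight w (Figure 3) for BERT-GCN."""
    results: Dict[str, Metrics] = {}
    for mu in np.arange(0.1, 1.0, 0.1):           # mu in {0.1, ..., 0.9}
        results[f"mu={mu:.1f}"] = train_and_eval(build_graph(mu=mu, w=0.35))
    for w in np.arange(0.05, 1.0, 0.1):           # w in {0.05, ..., 0.95}
        results[f"w={w:.2f}"] = train_and_eval(build_graph(mu=0.4, w=w))
    return results
```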

Figure 2: Evaluation scores for different threshold µ on the SemEval-2018 dataset.

Figure 3: Evaluation Scores for different weight w on the SemEval-2018 dataset.

Figure 2 shows that removing a moderate amount of noisy connections by changing µ can improve the performance. For macro-F1, it clearly shows that a small µ achieves much better results than a large µ. If µ is large, useful emotion connections might be cut off, therefore leading to worse results. Figure 3 shows that for accuracy and micro-F1, changing w can achieve better performance than the baseline without binarization and weighting.