Latent Emotion Memory for Multi-Label Emotion Classification
Total Page:16
File Type:pdf, Size:1020Kb
The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20) Latent Emotion Memory for Multi-Label Emotion Classification Hao Fei,1 Yue Zhang,2 Yafeng Ren,3∗ Donghong Ji1∗ 1Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University, Wuhan, China 2School of Engineering, Westlake University, Hangzhou, China 3Guangdong Collaborative Innovation Center for Language Research & Services, Guangdong University of Foreign Studies, Guangzhou, China {hao.fei, renyafeng, dhji}@whu.edu.cn [email protected] Abstract Identifying multiple emotions in a sentence is an important research topic. Existing methods usually model the problem as multi-label classification task. However, previous methods have two issues, limiting the performance of the task. First, these models do not consider prior emotion distribution in a sentence. Second, they fail to effectively capture the context information closely related to the corresponding emotion. In this paper, we propose a Latent Emotion Memory network (LEM) for multi-label emotion classification. The proposed model can learn the latent emotion distribution without exter- nal knowledge, and can effectively leverage it into the clas- sification network. Experimental results on two benchmark datasets show that the proposed model outperforms strong baselines, achieving the state-of-the-art performance. Figure 1: Multiple emotions with different intensities. Introduction Emotion classification is an important task in natural lan- and fail to consider prior emotion distribution in a sentence. guage processing (NLP). Automatically inferencing the Intuitively, different emotion in a sentence has different in- emotions is the initial step for downstream applications such tensity. Figure 1 illustrates the emotion distribution of sen- as emotional chatbots (Zhou et al. 2018), stock market pre- tence S1, where there are four different emotions with differ- diction (Nguyen, Shirai, and Velcin 2015), policy studies ent intensities. In particular, the emotions anticipation and (Bermingham and Smeaton 2011), etc. However, it is com- pessimism receive higher intensities than disgust and love. mon that there exist more than one emotion in a piece of text. Emotion labels with higher intensity should deserve higher Intuitively, people tend to express multiple emotions in one probabilities at the final prediction for the model. 2) Previ- piece of text. Taking the following sentences as example: ous work does not effectively capture the context informa- (S1) How’s the new Batman Telltale Series? Looks good tion closely related to the corresponding emotion, which is but I’m growing weary of this gaming style. crucial for the prediction. In sentence S2, the clues indicat- (S2) It really is amazing in the worst ways. It was very ing the sadness emotion, ‘worst’, are scattered broadly, and hard to stifle my laughter after I overheard this comment. surrounded by the words ‘laughter’ and ‘amazing’ that sup- In sentence S1, multiple emotions are conveyed, includ- port the joy emotion. Correct predictions can be made only ing anticipation, disgust, love and pessimism. Sentence S2 when these features can be sufficiently mined and properly contains two emotions: joy and sadness. How to identify the selected. If we can sufficiently capture effective features for multiple co-existing emotions in a sentence remains a chal- each emotion, the final prediction will be relatively easy. lenging task. This requires a strong ability of the model on features ex- There has been work considering multi-label emotion traction. classification (He and Xia 2018; Almeida et al. 2018; Yu et To address these issues, we propose a Latent Emotion al. 2018). However, there still are two limitations. 1) They Memory network (LEM) for multi-label emotion classifica- assume each emotion occurs with equal prior probability, tion. LEM consists of two main components: a latent emo- ∗Corresponding author tion module and a memory module, which are shown in Copyright c 2020, Association for the Advancement of Artificial Figure 2. First, the latent emotion module learns emotions Intelligence (www.aaai.org). All rights reserved. distribution by reconstructing the input via variational au- 7692 improve the performance of multi-label emotion classifica- tion. However, these methods do not consider the prior emo- tion distribution information in a sentence. Our work is related to the work proposed by Zhou et al. (2016). They proposed an emotion distribution learning (EDL) method, which first learned the relations between emotions based on the theory of Plutchik’s wheel of emo- tions (Plutchik 1980), and then conducted multi-label emo- tion classification by incorporating these label relations into the cost function (Zhou et al. 2016a). Nevertheless, our method differs from theirs in three aspects: 1) our model ef- fectively learns the emotion distribution, which is free from the restraint of any theory. 2) the emotion intensities distri- bution is automatically captured during the reconstruction of inputs in VAE model. 3) multi-hop memory module ensures that each emotion makes full use of context information for the corresponding emotion. Figure 2: The basic unit of LEM, including the latent emo- Variational Models Our proposed method is also related tion module and the memory module with one hop. Wled is to work on variational models in NLP applications. Bow- the latent emotion distribution embedding. Wef is the emo- tion feature embedding, also used as memory representation man et al. (2015) introduced a RNN-based VAE model for memory module. for generating diverse and coherent sentences. Miao et al. (2016) proposed a neural variational framework incorporat- ing multilayer perceptrons (MLP), CNN and RNN for gen- toencoder. Second, the memory module captures emotion- erative models of text. Bahuleyan et al. (2017) proposed an related features for the corresponding emotion. Finally, the attention-based variational seq2seq model for alleviating the feature representation from the memory module concate- attention bypassing effect. Different from the above meth- nated with emotion distribution representation from the la- ods, we first employ the VAE model to make reconstruction tent emotion module is fed into a bi-directional Gated Re- for original input, and then make use of the intermediate la- current Unit (BiGRU) to make prediction. tent representation as a prior emotion distribution informa- All the components are trained jointly in a supervised end- tion for facilitating downstream prediction. to-end learning: the latent variable representation from the latent emotion module guides the prediction of the mem- Method ory module, and the emotion memory module in return en- The proposed model consists of two main components: a la- courages the latent emotion module to better learn the emo- tent emotion module and a memory module. The mechanism tion distribution through back-propagation. Our model can of the basic unit of LEM is shown in Figure 2. learn latent emotion distribution information without exter- nal knowledge, effectively leveraging it into the classifica- Latent Emotion Module tion network. Since we cannot measure the emotion distribution explic- We conduct experiments on the SemEval 2018 task 1C itly, we model it as a set of latent variables. We employ English dataset and the Ren-CECps Chinese dataset. Experi- variational autoencoder (VAE) to learn the latent multino- mental results show that our model outperforms strong base- mial distribution representation Z during the reconstruction lines, achieving the state-of-the-art performance. of the input. Related Work Encoding The input of latent emotion module is emotion- Multi-label Emotion Classification Emotion detection BoW (eBoW) features of the sentence. Before being fed has been extensively researched in recent years (Ren et al. into VAE, the BoW features are preprocessed so that stop- 2017; Tang et al. 2019). Existing work mainly includes words or meaningless words are excluded from the vocabu- lexicon-based methods (Wang and Pal 2015), graphical lary. The reasons are two fold: 1) our target is to capture the model-based methods (Li et al. 2015) and linear classifier- emotion distribution, and the latent representation should be based methods (Quan et al. 2015). More recently, various emotion-rich rather than general semantic meaning. 2) re- neural networks models have been proposed for this task, ducing the size of BoW vocabulary is beneficial to the train- achieving highly competitive results on several benchmark ing of VAE. The encoder fe(·) consists of multiple non-linear hidden datasets (Ren et al. 2016; Felbo et al. 2017; Baziotis et al. L layers, transforming the input XeBoW ∈ R (L is the max 2018; He and Xia 2018). For example, Wang et al. (2016) μ σ employed the TDNN framework by constructing a convo- length of feature sequence) into prior parameters and : lutional neural network (CNN) for multiclass classification. μ = fe,μ(XeBoW ), (1) Yu et al. (2018) proposed a transfer learning architecture to logσ = fe,σ(XeBoW ). 7693 We define (Bowman et al. 2015) an emotion latent variable Z = μ + σ · , where is Gaussian noise variable sampled from N (0, 1), Z ∈ RK (K denotes the number of emotion labels). Then, the variable is normalized: Z = softmax(Z ). (2) Correspondingly, the latent emotion distribution p(ek|,k = 1, ··· ,K) is reflected in Z. Decoding Variational inference is used to approximate a posterior distribution over Z. We first use a linear hidden layer fled(·) to transform Z into embeddings: led R = fled(Z; Wled), (3) K×E where Wled ∈ R led (Eled is the corresponding embed- ding dimension) is the learned embedding of the latent emo- Figure 3: The overall framework of the LEM model. The tion distribution. This embedding will later be used to guide memory module are private for each k-th emotion. The la- the feature learning of memory module, and to control the tent emotion module is shared globally. overall prediction of emotions. A good decoder can learn a bunch of high level represen- tation of rich emotion features during the process of recon- Emotion Memory Module structing data.