
Emotional Embeddings: Refining Word Embeddings to Capture Emotional Content of Words

Armin Seyeditabari (UNC Charlotte), Narges Tabari (University of Virginia), Shefie Gholizade (UNC Charlotte), Wlodek Zadrozny (UNC Charlotte)

Abstract

Word embeddings are one of the most useful tools in any modern natural language processing expert's toolkit. They contain various types of information about each word, which makes them the best way to represent terms in any NLP task. But some types of information cannot be learned by these models, and the emotional content of words is one of them. In this paper, we present an approach to incorporating emotional information of words into these models. We accomplish this by adding a secondary training stage that uses an emotional lexicon and a psychological model of basic emotions. We show that fitting an emotional model into pre-trained word vectors can increase the performance of these models on emotional similarity metrics. Retrained models perform better than their original counterparts, from a 13% improvement for the Word2Vec model to 29% for GloVe vectors. This is the first such model presented in the literature, and although preliminary, these emotion-sensitive models can open the way to increased performance in a variety of emotion detection techniques.

1 Introduction

There is an abundant volume of textual data available online about a variety of subjects through social media. This availability of large amounts of data has led to fast growth in information extraction using natural language processing. One of the most important types of information that can be captured is the affective reaction of the population to a specific event, product, etc. We have seen vast improvement in extracting sentiment from text, to the point that sentiment analysis has become one of the standard tools in any NLP expert's toolkit and has been used in various applications (Ravi and Ravi, 2015).

On the other hand, emotion detection, as a more fine-grained affective information extraction technique, has only recently made a larger appearance in the literature. The information gained by moving past negative and positive sentiment and toward identifying discrete emotions can help improve many applications. For example, the two emotions fear and anger both express a negative opinion of a person toward something; however, it has been shown that fearful people tend to have a pessimistic view of the future, while angry people tend to have a more optimistic view (Lerner and Keltner, 2000). Moreover, fear is generally a passive emotion, while anger is more likely to lead to action (Miller et al., 2009). The usefulness of understanding emotions in political science (Druckman and McDermott, 2008), psychology, marketing (Bagozzi et al., 1999), human-computer interaction (Brave and Nass, 2003), and many other fields has given emotion detection in natural language processing a life of its own, resulting in a surge of research papers in recent years.

Word embeddings, as one of the best methods to create a representation for each word in a corpus, are widely used as features in neural-network-based classifiers. These word vectors are created in such a manner that the angular distance between them represents various types of information. For example, the distance between the two words cat and feline should be less than the distance between cat and canine, as a cat is a feline but not a canine. A variety of similarity and categorical information can be found in the shape of these vector spaces, which makes them one of the best tools we have in natural language processing. But these embeddings, due to the nature of their training methods, do not contain emotional similarity information.
In this paper, we present and analyze a methodology to incorporate emotional information into these models after the fact. We accomplish this by utilizing an emotion model and an emotion lexicon: in this case, Plutchik's wheel of emotions (Plutchik, 1991) and the NRC emotion lexicon (Mohammad and Turney, 2013). We have also used a secondary emotion model to create an emotional similarity test to compare the performance of the models before and after training.

This preliminary result is an important step in showing the potential of these models to improve emotion detection systems in different ways. Emotion-sensitive embeddings can be used in various emotion detection methodologies, such as recurrent neural network classifiers, to possibly improve model performance in learning and classifying emotions. They can also be used in attention networks (Yang et al., 2016) to calculate feature weights for each term in the corpus, potentially improving classification accuracy by giving more weight to emotionally charged terms.

2 Related Work

In the past decade, especially with the increasing use of neural networks, word embeddings have become one of the most useful tools in natural language processing. Word2Vec, created by Mikolov et al. and presented in two papers (2013b; 2013a), showed that these vectors could perform reliably in a variety of tasks. GloVe (Pennington et al., 2014) took a different approach to creating word embeddings and performed on par with Word2Vec.

After the success of these models, many studies have been done to identify their shortcomings and improve on them. Speer et al. used an ensemble method to integrate Word2Vec and GloVe with the ConceptNet knowledge base (Speer et al., 2016), creating the ConceptNet Numberbatch model (Speer and Chin, 2016), and showed that their model outperforms either of those models in a variety of tasks. Faruqui et al. (2014) also presented a method to refine these vectors based on an external semantic lexicon by encouraging vectors for similar words to move closer to each other. Observing that these embedding models do not perform well for semantically opposite words, Mrkšić et al. (2016) created a methodology that not only brings the vectors for similar words closer to each other, but also moves vectors for opposite words farther apart. Mikolov et al. (2017) created an improved model, fastText, in which they used a combination of known tricks to make the vectors perform better in different tasks. But as shown in Seyeditabari and Zadrozny (2017), these models do not perform well in emotional similarity tasks.

There have also been various attempts to create sentiment embeddings that perform better in sentiment analysis tasks than standard vector spaces (Tang et al., 2014, 2016; Yu et al., 2017). Moving past sentiments, in this paper we present a method to incorporate emotional information of words into some of the models mentioned above.

Figure 1: Plutchik's Wheel of Emotions. Opposite emotions are placed on opposite petals.

3 Fitting Emotional Constraints in Word Vectors

For fitting emotional information into pre-trained word vectors, we use a methodology similar to the one Mrkšić et al. (2016) used to incorporate additional linguistic constraints into word vector spaces. Our goal is to change the vector space V = {v_1, v_2, ..., v_n} to V' = {v'_1, v'_2, ..., v'_n} in a careful manner, adding emotional constraints to the vector space without losing too much of the information already present from the original learning step.
To perform this task we create two sets of constraints based on the NRC emotion lexicon: one for words that have a positive relation to an emotion, such as (abduction, sadness), and one to keep track of each word's relation to the opposite of that emotion, such as (abduction, joy), joy being the opposite of sadness. In the NRC lexicon, Mohammad et al. annotated over 14k English words for eight emotions from Plutchik's model of basic emotions (see Figure 1).

To create our two constraint sets, we extract all word/emotion relations indicated in the lexicon, so that in our first set S = {(w_1, e_1), (w_1, e_3), (w_2, e_2), ...} we have ordered pairs, each indicating a word and an emotion it is associated with. For each emotion e_i, we add its opposite e'_i to our second set O = {(w_1, e'_1), (w_1, e'_3), (w_2, e'_2), ...}, in which e'_i is the opposite of e_i according to Plutchik's model. We have extracted over 8k such (word, emotion) constraint pairs from the NRC lexicon for each of the positive and negative relation sets.

We define our objective functions so that we decrease the angular distance between each word and its associated emotions in the set S and, at the same time, increase its distance from the opposite emotions in the set O. We want the pairs of words in the positive relation set to move closer together, so the objective function for positive relations is

PR(V') = \sum_{(u,w) \in S} \max(0, d(v'_u, v'_w)),    (1)

where d(v'_u, v'_w) is the cosine distance between the two vectors. We want to increase the distance between pairs of words in our negative relation set, so the objective function for negative relations is

NR(V') = \sum_{(u,w) \in O} \max(0, 1 - d(v'_u, v'_w)).    (2)

We also need to lose as little information as possible by preserving the shape of the original vector space. To do this, we add a third term that keeps the overall shape of the space from changing too much:

VSP(V, V') = \sum_{i=1}^{N} \sum_{j \in N(i)} \max(0, |d(v'_i, v'_j) - d(v_i, v_j)|).    (3)

For efficiency, we only calculate this distance over a neighborhood N(i) of each word i, which includes all words within radius r = 0.2 of the word. Our final objective function is the sum of all three parts:

Obj(V') = PR(V') + NR(V') + VSP(V, V').    (4)

Stochastic gradient descent was run for 20 epochs to train the vector space V and generate the new space V'.
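As an illustration of the constraint extraction described above, the sketch below builds the sets S and O from the NRC lexicon. It assumes the lexicon's common tab-separated word/emotion/flag distribution format; the file name in the example and the exact encoding of Plutchik's opposites are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: build the positive (S) and negative (O) constraint sets from
# the NRC emotion lexicon, assumed to be in the tab-separated format
# "word<TAB>emotion<TAB>0/1". File name and paths are illustrative.

# Opposite emotions according to Plutchik's wheel (opposite petals).
PLUTCHIK_OPPOSITES = {
    "joy": "sadness", "sadness": "joy",
    "anger": "fear", "fear": "anger",
    "trust": "disgust", "disgust": "trust",
    "anticipation": "surprise", "surprise": "anticipation",
}

def build_constraint_sets(lexicon_path):
    """Return (S, O): (word, emotion) pairs and (word, opposite-emotion) pairs."""
    S, O = set(), set()
    with open(lexicon_path, encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split("\t")
            if len(parts) != 3:
                continue
            word, emotion, flag = parts
            # Skip the positive/negative polarity rows and unassociated words.
            if emotion not in PLUTCHIK_OPPOSITES or flag != "1":
                continue
            S.add((word, emotion))                      # e.g. ("abduction", "sadness")
            O.add((word, PLUTCHIK_OPPOSITES[emotion]))  # e.g. ("abduction", "joy")
    return S, O

# Example (hypothetical file name):
# S, O = build_constraint_sets("NRC-Emotion-Lexicon-Wordlevel-v0.92.txt")
```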

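A minimal sketch of the objective in Equations 1-4 and the 20-epoch training loop follows. The use of PyTorch autograd, full-batch updates, and the learning rate are our own illustrative choices rather than settings reported above; the neighborhood pairs are assumed to be precomputed from the original space V.

```python
# Minimal sketch of the retrofitting objective (Eqs. 1-4), trained with SGD.
import torch

def cos_dist(a, b):
    # Cosine distance d(x, y) = 1 - cos(x, y).
    return 1.0 - torch.nn.functional.cosine_similarity(a, b, dim=-1)

def retrofit(V, S_idx, O_idx, nbr_idx, epochs=20, lr=0.1):
    """V: (n, dim) float tensor of original vectors.
    S_idx, O_idx: (k, 2) long tensors of (word, emotion) index pairs for S and O.
    nbr_idx: (m, 2) long tensor of (i, j) pairs with j in the radius-0.2 neighborhood N(i)."""
    V = V.detach()
    Vp = V.clone().requires_grad_(True)                       # V', the vectors being retrained
    orig_nbr = cos_dist(V[nbr_idx[:, 0]], V[nbr_idx[:, 1]])   # d(v_i, v_j) in the original space
    opt = torch.optim.SGD([Vp], lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        # Eq. (1): pull each word toward its associated emotion.
        pr = torch.clamp(cos_dist(Vp[S_idx[:, 0]], Vp[S_idx[:, 1]]), min=0.0).sum()
        # Eq. (2): push each word away from the opposite emotion.
        nr = torch.clamp(1.0 - cos_dist(Vp[O_idx[:, 0]], Vp[O_idx[:, 1]]), min=0.0).sum()
        # Eq. (3): preserve pairwise distances within each word's neighborhood.
        vsp = (cos_dist(Vp[nbr_idx[:, 0]], Vp[nbr_idx[:, 1]]) - orig_nbr).abs().sum()
        loss = pr + nr + vsp                                   # Eq. (4)
        loss.backward()
        opt.step()
    return Vp.detach()
```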
4 Experiments

In our experiments we compared a variety of word embeddings with their emotionally fitted counterparts on several metrics based on emotional models. As we trained the model on Plutchik's model, we decided to use another emotion model for testing. In the first experiment we assess the average in-category mutual similarity of secondary and tertiary emotions in the three-level categorization of emotions described by Shaver et al. (1987). In this model, Shaver et al. defined six basic emotions (Liking, Joy, Surprise, Anger, Sadness, and Fear) and categorized around 140 sub-emotions under these six emotions in two layers (see Table 1). The reported numbers are the average cosine similarity over all mutual in-category emotion words and can be seen in Table 2. The vector spaces used here are:

• Word2Vec, trained on the full English Wikipedia dump
• GloVe, obtained from its authors' website
• fastText, trained with subword information on Common Crawl
• ConceptNet Numberbatch

Each emotionally fitted vector space performs much better than its original counterpart, from a 13% improvement for the Word2Vec model (in-category similarity rising from 0.45 to 0.51) to 29% for GloVe vectors (from 0.38 to 0.49). The overall best performance belongs to the emotionally fitted ConceptNet Numberbatch, with an average similarity score of 0.57, up from 0.47 (see Table 2).

In the second experiment we assessed the similarity between opposite emotions. Again we used Shaver et al.'s categorization as our testing emotion model and calculated the mutual similarity between opposite emotion groups. In this test we chose two pairs of opposite emotions, Joy vs. Sadness and Anger vs. Fear. The reported numbers are the average cosine similarity between members of the opposite emotion categories.
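The two metrics reported in Table 2 are averages of pairwise cosine similarities, either within one emotion category or across two opposite categories. A minimal sketch follows; emb (a word-to-vector mapping) and groups (mapping each of Shaver et al.'s categories to its member words) are assumed, illustrative data structures.

```python
# Minimal sketch of the in-category and cross-category similarity metrics.
import itertools
import numpy as np

def cos_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def in_category_similarity(emb, group):
    """Average cosine similarity over all mutual pairs within one emotion category."""
    sims = [cos_sim(emb[u], emb[w])
            for u, w in itertools.combinations(group, 2)
            if u in emb and w in emb]
    return float(np.mean(sims))

def cross_category_similarity(emb, group_a, group_b):
    """Average cosine similarity between members of two opposite emotion categories."""
    sims = [cos_sim(emb[u], emb[w])
            for u, w in itertools.product(group_a, group_b)
            if u in emb and w in emb]
    return float(np.mean(sims))

# Example (hypothetical data): cross_category_similarity(emb, groups["joy"], groups["sadness"])
```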

Primary Emotion | Secondary Emotion | Tertiary Emotions
Liking | Liking | Fondness, Attractiveness, Caring, Tenderness
Liking | Lust/Sexual Desire |
Liking | Longing | Longing
Joy | Cheerfulness | Bliss, Gaiety, Glee, Jolliness, Joviality, Joy, Delight, Enjoyment, Gladness, Jubilation, Elation, Satisfaction
Joy | Zest | Zeal, Excitement, Thrill, Exhilaration
Joy | Pride | Triumph
Joy | Optimism | Eagerness
Joy | Enthrallment | Enthrallment, Rapture
Joy | Relief | Relief
Surprise | Surprise | Amazement, Astonishment
Anger | Irritation | Aggravation, Agitation, Grouchy, Grumpy, Crosspatch
Anger | Exasperation |
Anger | Rage | Anger, Fury, Wrath, Ferocity, Bitter, Scorn, Vengefulness, Dislike
Anger | Disgust | Revulsion, Loathing
Anger | Torment | Torment
Sadness | Suffering | Agony, Hurt
Sadness | Sadness | Despair, Gloom, Glumness, Unhappy, Sorrow, Woe, Misery, Melancholy
Sadness | Disappointment | Dismay, Displeasure
Sadness | Neglect | Alienation, Dejection, Insecurity, Isolation, Rejection
Sadness | Sympathy |
Fear | Horror | Alarm, Shock, Fear, Fright, Horror, Terror, Mortification
Fear | Nervousness | Suspense, Uneasiness, Apprehension (fear), Distress, Dread

Table 1: Three-layered emotion classification.

Model | Sadness vs. Joy (Before / After) | Anger vs. Fear (Before / After) | In-category Similarity (Before / After)
Word2Vec | 0.32 / 0.16 | 0.31 / 0.09 | 0.45 / 0.51
GloVe | 0.23 / 0.11 | 0.19 / 0.04 | 0.38 / 0.49
fastText | 0.38 / 0.17 | 0.33 / 0.12 | 0.44 / 0.50
Numberbatch | 0.23 / 0.10 | 0.19 / 0.05 | 0.47 / 0.57

Table 2: Left: average similarity between opposite emotion groups; we want the similarity of opposite emotions to be as close to zero as possible, and after training the average similarities decrease for all models. Right: average in-category mutual similarity in the three-layered categorization of emotions before and after emotional fitting; we want the similarity of close emotions to be as close to one as possible, and after training the average similarity of in-category emotions increases for all models.

As shown in Table 2, the models perform significantly better after training, with the best performance for the retrained Numberbatch in distinguishing Anger vs. Fear and the retrained GloVe for Joy vs. Sadness (with Numberbatch following closely). All emotionally fitted vector spaces can be accessed at https://goo.gl/R43CEQ.

5 Conclusion and Future Work

Embedding models play an important role in word representation for various natural language processing tasks. They are able to preserve many types of information about terms, but not emotional information, which, due to its complexity, is hard to grasp from the statistical information of the corpus alone. In this paper, we have proposed an approach to incorporating emotional information of words into these models in a second stage of training, and showed that this methodology is able to increase the performance of the embedding models on the defined emotional similarity metrics, from a 13% increase for Word2Vec to a 29% increase for GloVe, with the best performance belonging to the retrained ConceptNet Numberbatch. While these are the first steps toward creating and analyzing emotional embeddings, further study is required, and is being done, to test various emotional models and lexicons and to improve the emotional information that can be incorporated into these models.

This methodology could be improved by incorporating more complex emotional information, such as the intensity of emotions and the combinations of emotions defined in psychological models. Training the original embeddings on corpora that are more emotionally rich might also increase the emotion sensitivity of these models. In the absence of established emotional similarity metrics, we chose a secondary emotion model to create a basic similarity test. Lacking standard metrics and corpora, further testing of the model is required to see how it increases performance in related NLP tasks, such as detecting emotions in text using recurrent neural networks.

References
Richard P. Bagozzi, Mahesh Gopinath, and Prashanth U. Nyer. 1999. The role of emotions in marketing. Journal of the Academy of Marketing Science, 27(2):184–206.

Scott Brave and Clifford Nass. 2003. Emotion in human–computer interaction. Human-Computer Interaction, page 53.

James N. Druckman and Rose McDermott. 2008. Emotion and the framing of risky choice. Political Behavior, 30(3):297–321.

Manaal Faruqui, Jesse Dodge, Sujay K. Jauhar, Chris Dyer, Eduard Hovy, and Noah A. Smith. 2014. Retrofitting word vectors to semantic lexicons. arXiv preprint arXiv:1411.4166.

Jennifer S. Lerner and Dacher Keltner. 2000. Beyond valence: Toward a model of emotion-specific influences on judgement and choice. Cognition & Emotion, 14(4):473–493.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013b. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119.

Tomas Mikolov, Edouard Grave, Piotr Bojanowski, Christian Puhrsch, and Armand Joulin. 2017. Advances in pre-training distributed word representations. arXiv preprint arXiv:1712.09405.

Daniel A. Miller, Tracey Cronin, Amber L. Garcia, and Nyla R. Branscombe. 2009. The relative impact of anger and efficacy on collective action is affected by feelings of fear. Group Processes & Intergroup Relations, 12(4):445–462.

Saif M. Mohammad and Peter D. Turney. 2013. Crowdsourcing a word–emotion association lexicon. Computational Intelligence, 29(3):436–465.

Nikola Mrkšić, Diarmuid Ó Séaghdha, Blaise Thomson, Milica Gašić, Lina Rojas-Barahona, Pei-Hao Su, David Vandyke, Tsung-Hsien Wen, and Steve Young. 2016. Counter-fitting word vectors to linguistic constraints. arXiv preprint arXiv:1603.00892.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.

Robert Plutchik. 1991. The emotions. University Press of America.

Kumar Ravi and Vadlamani Ravi. 2015. A survey on opinion mining and sentiment analysis: Tasks, approaches and applications. Knowledge-Based Systems, 89:14–46.

Armin Seyeditabari and Wlodek Zadrozny. 2017. Can word embeddings help find latent emotions in text? Preliminary results. In The Thirtieth International FLAIRS Conference.

Phillip Shaver, Judith Schwartz, Donald Kirson, and Cary O'Connor. 1987. Emotion knowledge: Further exploration of a prototype approach. Journal of Personality and Social Psychology, 52(6):1061.

Robert Speer and Joshua Chin. 2016. An ensemble method to produce high-quality word embeddings. arXiv preprint arXiv:1604.01692.

Robert Speer, Joshua Chin, and Catherine Havasi. 2016. ConceptNet 5.5: An open multilingual graph of general knowledge. arXiv preprint arXiv:1612.03975.

Duyu Tang, Furu Wei, Nan Yang, Ming Zhou, Ting Liu, and Bing Qin. 2014. Learning sentiment-specific word embedding for Twitter sentiment classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1555–1565.

Duyu Tang, Furu Wei, Bing Qin, Nan Yang, Ting Liu, and Ming Zhou. 2016. Sentiment embeddings with applications to sentiment analysis. IEEE Transactions on Knowledge and Data Engineering, 28(2):496–509.

Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1480–1489.

Liang-Chih Yu, Jin Wang, K. Robert Lai, and Xuejie Zhang. 2017. Refining word embeddings for sentiment analysis. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 534–539.