
Emotional Embeddings: Refining Word Embeddings to Capture Emotional Content of Words

Armin Seyeditabari (UNC Charlotte), Narges Tabari (University of Virginia), Shefie Gholizade (UNC Charlotte), Wlodek Zadrozny (UNC Charlotte)

Abstract

Word embeddings are one of the most useful tools in any modern natural language processing expert's toolkit. They contain various types of information about each word, which makes them the best way to represent terms in any NLP task. But some types of information cannot be learned by these models, and the emotional content of words is one of them. In this paper, we present an approach to incorporating emotional information of words into these models. We accomplish this by adding a secondary training stage that uses an emotional lexicon and a psychological model of basic emotions. We show that fitting an emotional model into pre-trained word vectors can increase the performance of these models on emotional similarity metrics. Retrained models perform better than their original counterparts, from a 13% improvement for the Word2Vec model to 29% for GloVe vectors. This is the first such model presented in the literature, and although preliminary, these emotion-sensitive models can open the way to increased performance in a variety of emotion detection techniques.

1 Introduction

There is an abundant volume of textual data available online about a variety of subjects through social media. This availability of large amounts of data has led to fast growth in information extraction using natural language processing. One of the most important types of information that can be captured is the affective reaction of the population to a specific event, product, etc. We have seen vast improvement in extracting sentiment from text, to the point that sentiment analysis has become one of the standard tools in any NLP expert's toolkit and has been used in various applications (Ravi and Ravi, 2015).

On the other hand, emotion detection, as a more fine-grained affective information extraction technique, has only recently made a larger appearance in the literature. The information gained by moving past negative and positive sentiment and toward identifying discrete emotions can help improve many applications. For example, the two emotions fear and anger both express a negative opinion of a person toward something; however, it has been shown that fearful people tend to have a pessimistic view of the future, while angry people tend to have a more optimistic view (Lerner and Keltner, 2000). Moreover, fear is generally a passive emotion, while anger is more likely to lead to action (Miller et al., 2009). The usefulness of understanding emotions in political science (Druckman and McDermott, 2008), psychology, marketing (Bagozzi et al., 1999), human-computer interaction (Brave and Nass, 2003), and many other fields has given emotion detection in natural language processing a life of its own, resulting in a surge of research papers in recent years.

Word embeddings, as one of the best methods to create a representation for each word in a corpus, are widely used as features in neural-network-based classifiers. These word vectors are created in such a manner that the angular distance between them represents various types of information. For example, the distance between the two words cat and feline should be less than the distance between cat and canine, as a cat is a feline but not a canine. A variety of similarity and categorical information can be found in the shape of these vector spaces, which makes them one of the best tools we have in natural language processing. But these embeddings, due to the nature of their training methods, do not contain emotional similarity information.
In this paper, we present and analyze a methodology to incorporate emotional information into these models after the fact. We accomplish this by utilizing an emotion model and an emotion lexicon: in this case, Plutchik's wheel of emotions (Plutchik, 1991) and the NRC emotion lexicon (Mohammad and Turney, 2013). We have also used a secondary emotion model to create an emotional similarity test to compare the performance of the models before and after training.

This preliminary result is an important step in showing the potential of these models to improve emotion detection systems in different ways. Emotion-sensitive embeddings can be used in various emotion detection methodologies, such as recurrent neural network classifiers, to possibly improve model performance in learning and classifying emotions. They can also be used in attention networks (Yang et al., 2016) to calculate feature weights for each term in the corpus, potentially improving classification accuracy by giving more weight to emotionally charged terms.

2 Related Work

In the past decade, especially with the increasing use of neural networks, word embeddings have become one of the most useful tools in natural language processing. Word2Vec, created by Mikolov et al. and presented in two papers (2013b; 2013a), showed that these vectors could perform reliably in a variety of tasks. GloVe (Pennington et al., 2014) took a different approach to creating word embeddings and performed on par with Word2Vec.

After the success of these models, many studies have been done to identify their shortcomings and improve on them. Speer et al. used an ensemble method to integrate Word2Vec and GloVe with the ConceptNet knowledge base (Speer et al., 2016), creating the ConceptNet Numberbatch model (Speer and Chin, 2016), and showed that their model outperforms either of those models in a variety of tasks. Faruqui et al. (2014) also presented a method to refine these vectors based on an external semantic lexicon by encouraging vectors for similar words to move closer to each other. Observing that these embedding models do not perform well for semantically opposite words, Mrkšić et al. (2016) created a methodology that not only brings the vectors for similar words closer to each other, but also moves vectors for opposite words farther apart. Mikolov et al. (2017) created an improved model, fastText, in which they used a combination of known tricks to make the vectors perform better in different tasks. But as shown in Seyeditabari and Zadrozny (2017), these models do not perform well in emotional similarity tasks.

There have also been various attempts to create sentiment embeddings that perform better in sentiment analysis tasks than standard vector spaces (Tang et al., 2014, 2016; Yu et al., 2017). Moving past sentiments, in this paper we present a method to incorporate emotional information of words into some of the models mentioned above.

Figure 1: Plutchik's Wheel of Emotions. Opposite emotions are placed on opposite petals.

3 Fitting Emotional Constraints in Word Vectors

For fitting emotional information into pre-trained word vectors, we use a methodology similar to the one Mrkšić et al. (2016) used to incorporate additional linguistic constraints into word vector spaces. Our goal is to change the vector space V = {v_1, v_2, ..., v_n} to V' = {v'_1, v'_2, ..., v'_n} in a careful manner, adding emotional constraints to the vector space without losing too much of the information already present from the original learning step.
To perform this task we create two sets of constraints based on the NRC emotion lexicon: one for words that have a positive relation to an emotion, such as (abduction, sadness), and one to keep track of each word's relation to the opposite of that emotion, such as (abduction, joy), joy being the opposite of sadness. In the NRC lexicon, Mohammad et al. annotated over 14k English words for eight emotions from Plutchik's model of basic emotions (see Figure 1).

To create our two constraint sets, we extract all word/emotion relations indicated in the lexicon, so that in our first set S = {(w_1, e_1), (w_1, e_3), (w_2, e_2), ...} we have ordered pairs, each indicating a word and an emotion it is associated with. For each emotion e_i, we add its opposite e'_i to our second set O = {(w_1, e'_1), (w_1, e'_3), (w_2, e'_2), ...}, in which e'_i is the opposite of e_i according to Plutchik's model. We have extracted over 8k such (word, emotion) constraint pairs from the NRC lexicon for each of the positive and negative relation sets.

We define our objective functions so that we decrease the angular distance between each word and its associated emotions in the set S and, at the same time, increase its distance from the opposite emotions in the set O. We want the pairs of words in the positive relation set to move closer together, so the objective function for positive relations is

PR(V') = \sum_{(u,w) \in S} \max(0, d(v'_u, v'_w)),    (1)

where d(v'_u, v'_w) is the cosine distance between the two vectors. We want to increase the distance between pairs of words in our negative relation set, so the objective function for negative relations is

NR(V') = \sum_{(u,w) \in O} \max(0, 1 - d(v'_u, v'_w)).    (2)

We also need to lose as little information as possible by preserving the shape of the original vector space. To do this, we add a third term that keeps the overall shape of the space from changing too much:

VSP(V, V') = \sum_{i=1}^{N} \sum_{j \in N(i)} \max(0, |d(v'_i, v'_j) - d(v_i, v_j)|).    (3)

For efficiency, we only calculate this distance over a neighborhood N(i) of each word i, which includes all words within radius r = 0.2 of the word. Our final objective function is the sum of all three parts:

Obj(V') = PR(V') + NR(V') + VSP(V, V').    (4)

Stochastic gradient descent was run for 20 epochs to train the vector space V and generate the new space V'.
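As an illustration of the constraint extraction described above, the sketch below builds the sets S and O from the NRC lexicon. It assumes the lexicon's common tab-separated word/emotion/flag distribution format; the file name in the example and the exact encoding of Plutchik's opposites are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: build the positive (S) and negative (O) constraint sets from
# the NRC emotion lexicon, assumed to be in the tab-separated format
# "word<TAB>emotion<TAB>0/1". File name and paths are illustrative.

# Opposite emotions according to Plutchik's wheel (opposite petals).
PLUTCHIK_OPPOSITES = {
    "joy": "sadness", "sadness": "joy",
    "anger": "fear", "fear": "anger",
    "trust": "disgust", "disgust": "trust",
    "anticipation": "surprise", "surprise": "anticipation",
}

def build_constraint_sets(lexicon_path):
    """Return (S, O): (word, emotion) pairs and (word, opposite-emotion) pairs."""
    S, O = set(), set()
    with open(lexicon_path, encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split("\t")
            if len(parts) != 3:
                continue
            word, emotion, flag = parts
            # Skip the positive/negative polarity rows and unassociated words.
            if emotion not in PLUTCHIK_OPPOSITES or flag != "1":
                continue
            S.add((word, emotion))                      # e.g. ("abduction", "sadness")
            O.add((word, PLUTCHIK_OPPOSITES[emotion]))  # e.g. ("abduction", "joy")
    return S, O

# Example (hypothetical file name):
# S, O = build_constraint_sets("NRC-Emotion-Lexicon-Wordlevel-v0.92.txt")
```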

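A minimal sketch of the objective in Equations 1-4 and the 20-epoch training loop follows. The use of PyTorch autograd, full-batch updates, and the learning rate are our own illustrative choices rather than settings reported above; the neighborhood pairs are assumed to be precomputed from the original space V.

```python
# Minimal sketch of the retrofitting objective (Eqs. 1-4), trained with SGD.
import torch

def cos_dist(a, b):
    # Cosine distance d(x, y) = 1 - cos(x, y).
    return 1.0 - torch.nn.functional.cosine_similarity(a, b, dim=-1)

def retrofit(V, S_idx, O_idx, nbr_idx, epochs=20, lr=0.1):
    """V: (n, dim) float tensor of original vectors.
    S_idx, O_idx: (k, 2) long tensors of (word, emotion) index pairs for S and O.
    nbr_idx: (m, 2) long tensor of (i, j) pairs with j in the radius-0.2 neighborhood N(i)."""
    V = V.detach()
    Vp = V.clone().requires_grad_(True)                       # V', the vectors being retrained
    orig_nbr = cos_dist(V[nbr_idx[:, 0]], V[nbr_idx[:, 1]])   # d(v_i, v_j) in the original space
    opt = torch.optim.SGD([Vp], lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        # Eq. (1): pull each word toward its associated emotion.
        pr = torch.clamp(cos_dist(Vp[S_idx[:, 0]], Vp[S_idx[:, 1]]), min=0.0).sum()
        # Eq. (2): push each word away from the opposite emotion.
        nr = torch.clamp(1.0 - cos_dist(Vp[O_idx[:, 0]], Vp[O_idx[:, 1]]), min=0.0).sum()
        # Eq. (3): preserve pairwise distances within each word's neighborhood.
        vsp = (cos_dist(Vp[nbr_idx[:, 0]], Vp[nbr_idx[:, 1]]) - orig_nbr).abs().sum()
        loss = pr + nr + vsp                                   # Eq. (4)
        loss.backward()
        opt.step()
    return Vp.detach()
```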
4 Experiments

In our experiments we compared a variety of word embeddings with their emotionally fitted counterparts on several metrics based on emotional models. As we trained the model on Plutchik's model, we decided to use another emotion model for testing. In the first experiment we assess the average in-category mutual similarity of secondary and tertiary emotions in the three-level categorization of emotions described by Shaver et al. (1987). In this model, Shaver et al. defined six basic emotions (Liking, Joy, Surprise, Anger, Sadness, and Fear) and categorized around 140 sub-emotions under these six emotions in two layers (see Table 1). The reported numbers are the average cosine similarity over all mutual in-category emotion words and can be seen in Table 2. The vector spaces used here are:

• Word2Vec, trained on the full English Wikipedia dump
• GloVe, obtained from its authors' website
• fastText, trained with subword information on Common Crawl
• ConceptNet Numberbatch

Each emotionally fitted vector space performs much better than its original counterpart, from a 13% improvement for the Word2Vec model (in-category similarity rising from 0.45 to 0.51) to 29% for GloVe vectors (from 0.38 to 0.49). The overall best performance belongs to the emotionally fitted ConceptNet Numberbatch, with an average similarity score of 0.57, up from 0.47 (see Table 2).

In the second experiment we assessed the similarity between opposite emotions. Again we used Shaver et al.'s categorization as our testing emotion model and calculated the mutual similarity between opposite emotion groups. In this test we chose two pairs of opposite emotions, Joy vs. Sadness and Anger vs. Fear. The reported numbers are the average cosine similarity between members of the opposite emotion categories.
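The two metrics reported in Table 2 are averages of pairwise cosine similarities, either within one emotion category or across two opposite categories. A minimal sketch follows; emb (a word-to-vector mapping) and groups (mapping each of Shaver et al.'s categories to its member words) are assumed, illustrative data structures.

```python
# Minimal sketch of the in-category and cross-category similarity metrics.
import itertools
import numpy as np

def cos_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def in_category_similarity(emb, group):
    """Average cosine similarity over all mutual pairs within one emotion category."""
    sims = [cos_sim(emb[u], emb[w])
            for u, w in itertools.combinations(group, 2)
            if u in emb and w in emb]
    return float(np.mean(sims))

def cross_category_similarity(emb, group_a, group_b):
    """Average cosine similarity between members of two opposite emotion categories."""
    sims = [cos_sim(emb[u], emb[w])
            for u, w in itertools.product(group_a, group_b)
            if u in emb and w in emb]
    return float(np.mean(sims))

# Example (hypothetical data): cross_category_similarity(emb, groups["joy"], groups["sadness"])
```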

Primary Emotion | Secondary Emotion | Tertiary Emotions
Liking | Liking | Fondness, Attractiveness, Caring, Tenderness
Liking | Lust/Sexual Desire |
Liking | Longing | Longing
Joy | Cheerfulness | Bliss, Gaiety, Glee, Jolliness, Joviality, Joy, Delight, Enjoyment, Gladness, Jubilation, Elation, Satisfaction
Joy | Zest | Zeal, Excitement, Thrill, Exhilaration
Joy | Pride | Triumph
Joy | Optimism | Eagerness
Joy | Enthrallment | Enthrallment, Rapture
Joy | Relief | Relief
Surprise | Surprise | Amazement, Astonishment
Anger | Irritation | Aggravation, Agitation, Grouchy, Grumpy, Crosspatch
Anger | Exasperation |
Anger | Rage | Anger, Fury, Wrath, Ferocity, Bitter, Scorn, Vengefulness, Dislike
Anger | Disgust | Revulsion, Loathing
Anger | Torment | Torment
Sadness | Suffering | Agony, Hurt
Sadness | Sadness | Despair, Gloom, Glumness, Unhappy, Sorrow, Woe, Misery, Melancholy
Sadness | Disappointment | Dismay, Displeasure
Sadness | Neglect | Alienation, Dejection, Insecurity, Isolation, Rejection
Sadness | Sympathy |
Fear | Horror | Alarm, Shock, Fear, Fright, Horror, Terror, Mortification
Fear | Nervousness | Suspense, Uneasiness, Apprehension (fear), Distress, Dread

Table 1: Three-layered emotion classification.

Model | Sadness vs. Joy (Before / After) | Anger vs. Fear (Before / After) | In-category Similarity (Before / After)
Word2Vec | 0.32 / 0.16 | 0.31 / 0.09 | 0.45 / 0.51
GloVe | 0.23 / 0.11 | 0.19 / 0.04 | 0.38 / 0.49
fastText | 0.38 / 0.17 | 0.33 / 0.12 | 0.44 / 0.50
Numberbatch | 0.23 / 0.10 | 0.19 / 0.05 | 0.47 / 0.57

Table 2: Left: average similarity between opposite emotion groups; we want the similarity of opposite emotions to be as close to zero as possible, and after training the average similarities decrease for all models. Right: average in-category mutual similarity in the three-layered categorization of emotions before and after emotional fitting; we want the similarity of close emotions to be as close to one as possible, and after training the average similarity of in-category emotions increases for all models.

As shown in Table 2, the models perform significantly better after training, with the best performance for the retrained Numberbatch in distinguishing Anger vs. Fear and the retrained GloVe for Joy vs. Sadness (with Numberbatch following closely). All emotionally fitted vector spaces can be accessed at https://goo.gl/R43CEQ.

5 Conclusion and Future Work

Embedding models play an important role in word representation for various natural language processing tasks. They are able to preserve many types of information about terms, but not emotional information, which, due to its complexity, is hard to grasp from the statistical information of the corpus alone. In this paper, we have proposed an approach to incorporating emotional information of words into these models in a second stage of training, and showed that this methodology is able to increase the performance of the embedding models on the defined emotional similarity metrics, from a 13% increase for Word2Vec to a 29% increase for GloVe, with the best performance belonging to the retrained ConceptNet Numberbatch. While these are the first steps toward creating and analyzing emotional embeddings, further study is required, and is being done, to test various emotional models and lexicons and to improve the emotional information that can be incorporated into these models.

This methodology could be improved by incorporating more complex emotional information, such as the intensity of emotions and the combinations of emotions defined in psychological models. Training the original embeddings on corpora that are more emotionally rich might also increase the emotion sensitivity of these models. In the absence of established emotional similarity metrics, we chose a secondary emotion model to create a basic similarity test. Lacking standard metrics and corpora, further testing of the model is required to see how it increases performance in related NLP tasks, such as detecting emotions in text using recurrent neural networks.

References
Richard P. Bagozzi, Mahesh Gopinath, and Prashanth U. Nyer. 1999. The role of emotions in marketing. Journal of the Academy of Marketing Science, 27(2):184–206.

Scott Brave and Clifford Nass. 2003. Emotion in human–computer interaction. Human-Computer Interaction, page 53.

James N. Druckman and Rose McDermott. 2008. Emotion and the framing of risky choice. Political Behavior, 30(3):297–321.

Manaal Faruqui, Jesse Dodge, Sujay K. Jauhar, Chris Dyer, Eduard Hovy, and Noah A. Smith. 2014. Retrofitting word vectors to semantic lexicons. arXiv preprint arXiv:1411.4166.

Jennifer S. Lerner and Dacher Keltner. 2000. Beyond valence: Toward a model of emotion-specific influences on judgement and choice. Cognition & Emotion, 14(4):473–493.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013b. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119.

Tomas Mikolov, Edouard Grave, Piotr Bojanowski, Christian Puhrsch, and Armand Joulin. 2017. Advances in pre-training distributed word representations. arXiv preprint arXiv:1712.09405.

Daniel A. Miller, Tracey Cronin, Amber L. Garcia, and Nyla R. Branscombe. 2009. The relative impact of anger and efficacy on collective action is affected by feelings of fear. Group Processes & Intergroup Relations, 12(4):445–462.

Saif M. Mohammad and Peter D. Turney. 2013. Crowdsourcing a word–emotion association lexicon. Computational Intelligence, 29(3):436–465.

Nikola Mrkšić, Diarmuid Ó Séaghdha, Blaise Thomson, Milica Gašić, Lina Rojas-Barahona, Pei-Hao Su, David Vandyke, Tsung-Hsien Wen, and Steve Young. 2016. Counter-fitting word vectors to linguistic constraints. arXiv preprint arXiv:1603.00892.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.

Robert Plutchik. 1991. The emotions. University Press of America.

Kumar Ravi and Vadlamani Ravi. 2015. A survey on opinion mining and sentiment analysis: Tasks, approaches and applications. Knowledge-Based Systems, 89:14–46.

Armin Seyeditabari and Wlodek Zadrozny. 2017. Can word embeddings help find latent emotions in text? Preliminary results. In The Thirtieth International FLAIRS Conference.

Phillip Shaver, Judith Schwartz, Donald Kirson, and Cary O'Connor. 1987. Emotion knowledge: Further exploration of a prototype approach. Journal of Personality and Social Psychology, 52(6):1061.

Robert Speer and Joshua Chin. 2016. An ensemble method to produce high-quality word embeddings. arXiv preprint arXiv:1604.01692.

Robert Speer, Joshua Chin, and Catherine Havasi. 2016. ConceptNet 5.5: An open multilingual graph of general knowledge. arXiv preprint arXiv:1612.03975.

Duyu Tang, Furu Wei, Nan Yang, Ming Zhou, Ting Liu, and Bing Qin. 2014. Learning sentiment-specific word embedding for Twitter sentiment classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1555–1565.

Duyu Tang, Furu Wei, Bing Qin, Nan Yang, Ting Liu, and Ming Zhou. 2016. Sentiment embeddings with applications to sentiment analysis. IEEE Transactions on Knowledge and Data Engineering, 28(2):496–509.

Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1480–1489.

Liang-Chih Yu, Jin Wang, K. Robert Lai, and Xuejie Zhang. 2017. Refining word embeddings for sentiment analysis. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 534–539.