
EmoNet: Fine-Grained Emotion Detection with Gated Recurrent Neural Networks

Muhammad Abdul-Mageed
School of Library, Archival & Information Studies
University of British Columbia
[email protected]

Lyle Ungar
Computer and Information Science
University of Pennsylvania
[email protected]

Abstract

Accurate detection of emotion from natural language has applications ranging from building emotional chatbots to better understanding individuals and their lives. However, progress on emotion detection has been hampered by the absence of large labeled datasets. In this work, we build a very large dataset for fine-grained emotions and develop deep learning models on it. We achieve a new state-of-the-art on 24 fine-grained types of emotions (with an average accuracy of 87.58%). We also extend the task beyond emotion types to model Robert Plutchik's 8 primary emotion dimensions, acquiring a superior accuracy of 95.68%.

1 Introduction

According to the Oxford English Dictionary, emotion is defined as "[a] strong feeling deriving from one's circumstances, mood, or relationships with others."[1] This "standard" definition identifies emotions as constructs involving something innate that is often invoked in social interactions and that aids in communicating with others (Hwang and Matsumoto, 2016). It is no exaggeration that humans are emotional beings: Emotions are an integral part of human life, and affect our decision making as well as our mental and physical health. As such, developing emotion detection models is important; they have a wide array of applications, ranging from building nuanced virtual assistants that cater for the emotions of their users to detecting the emotions of social media users in order to understand their mental and/or physical health. However, emotion detection has remained a challenging task, partly due to the limited availability of labeled data and partly due to the controversial nature of what emotions themselves are (Weidman and Tracy, 2017).

[1] https://en.oxforddictionaries.com/definition/emotion

Recent advances in machine learning for natural language processing (NLP) suggest that, given enough labeled data, there should be an opportunity to build better emotion detection models. Manual labeling of data, however, is costly, and so it is desirable to develop labeled emotion data without annotators. While the proliferation of social media has made it possible for us to acquire large datasets with implicit labels in the form of hashtags (Mohammad and Kiritchenko, 2015), such labels are noisy and not always reliable.

In this work, we seek to enable deep learning by creating a large dataset of fine-grained emotions using Twitter data. More specifically, we harness cues in Twitter data in the form of emotion hashtags as a way to build a labeled emotion dataset that we then exploit using distant supervision (Mintz et al., 2009) (the use of hashtags as a surrogate for annotator-generated emotion labels) to build emotion models grounded in psychology. We construct such a dataset and exploit it using powerful deep learning methods to build accurate, high-coverage models for emotion prediction. Overall, we make the following contributions: 1) grounded in psychological theory of emotions, we build a large-scale, high-quality dataset of tweets labeled with emotions, where key to this are methods to ensure data quality (a sketch of the labeling idea follows below); 2) we validate the data collection method using human annotations; 3) we develop powerful deep learning models using a gated recurrent network to exploit the data, yielding a new state-of-the-art on 24 fine-grained types of emotions; and 4) we extend the task beyond these emotion types to model Plutchik's 8 primary emotion dimensions.
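To make the hashtag-as-label idea concrete, the following is a minimal sketch of distant labeling with simple quality heuristics. The seed hashtags and the specific filters shown here are illustrative assumptions, not the paper's actual pipeline; the real collection procedure, its quality-assurance steps, and the full psychology-grounded set of 24 emotions are described in Sections 3.1 and 4.

```python
# Sketch of hashtag-based distant labeling with simple quality heuristics.
# The seed hashtags and filters below are illustrative assumptions only.

HASHTAG_TO_EMOTION = {
    "#joy": "joy",
    "#sadness": "sadness",
    "#anger": "anger",
    "#fear": "fear",
}

def distant_label(tweet: str):
    """Return (text, label) if the tweet passes the heuristics, else None."""
    text = tweet.strip()
    lowered = text.lower()
    if lowered.startswith("rt ") or "http" in lowered:
        return None  # drop retweets and tweets containing URLs (noisier data)
    tokens = text.split()
    if not tokens:
        return None
    label = HASHTAG_TO_EMOTION.get(tokens[-1].lower())
    if label is None:
        return None  # require an emotion hashtag in final position
    clean = " ".join(tokens[:-1])  # strip the hashtag so a model cannot
                                   # simply read the label off the input
    return (clean, label) if clean else None

print(distant_label("Just got the job of my dreams! #joy"))
# -> ('Just got the job of my dreams!', 'joy')
```

Removing the label-bearing hashtag from the text is the important design point: otherwise a classifier would learn to match the hashtag itself rather than the emotional content of the tweet.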
Our emotion modeling relies on distant supervision (Read, 2005; Mintz et al., 2009), the approach of using cues in data (e.g., hashtags or emoticons) as a proxy for "ground truth" labels, as explained above. Distant supervision has been investigated by a number of researchers for emotion detection (Tanaka et al., 2005; Mohammad, 2012; Purver and Battersby, 2012; Wang et al., 2012; Pak and Paroubek, 2010; Yang et al., 2007) and for other semantic tasks such as sentiment analysis (Read, 2005; Go et al., 2009) and sarcasm detection (González-Ibáñez et al., 2011). In these works, authors successfully use emoticons and/or hashtags as marks to label data after performing varying degrees of data quality assurance. We take a similar approach, using a larger collection of tweets, richer emotion definitions, and stronger filtering for tweet quality.

The remainder of the paper is organized as follows: we first overview related literature in Section 2, describe our data collection in Section 3.1, and present the annotation study we performed to validate our distant supervision method in Section 4. We then describe our methods in Section 5, provide results in Section 6, and conclude in Section 8.
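Before turning to related work, and purely as an illustration of the kind of gated recurrent classifier contribution 3 refers to, a minimal Keras-style sketch follows. The full model is specified in Section 5 (outside this excerpt); every choice below, including layer sizes, dropout, and optimizer, is an assumed placeholder rather than the paper's reported configuration.

```python
# Minimal sketch of a GRU-based tweet classifier over 24 emotion types.
# All hyperparameters are illustrative placeholders, not the paper's setup.
from tensorflow.keras import layers, models

VOCAB_SIZE = 50000   # assumed vocabulary size
MAX_LEN = 30         # assumed maximum number of tokens per tweet
NUM_EMOTIONS = 24    # the 24 fine-grained emotion types targeted in the paper

model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, 128),   # learned word embeddings
    layers.GRU(256),                     # gated recurrent layer reads the tweet
    layers.Dropout(0.5),                 # regularization
    layers.Dense(NUM_EMOTIONS, activation="softmax"),  # emotion distribution
])
model.build(input_shape=(None, MAX_LEN))
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

A gated recurrent unit is a lighter-weight alternative to an LSTM that still mitigates vanishing gradients through its update and reset gates, which is one reason such models suit short, noisy texts like tweets.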
2 Related Work

2.1 Computational Treatment of Emotion

The SemEval-2007 Affective Text task (Strapparava and Mihalcea, 2007) [SEM07] focused on classification of emotion and valence (i.e., positive and negative texts) in news headlines. A total of 1,250 headlines were manually labeled with the 6 basic emotions of Ekman (Ekman, 1972) and made available to participants. Similarly, Aman and Szpakowicz (2007) describe an emotion annotation task of identifying emotion category, emotion intensity, and the words/phrases that indicate emotion in blog post data of 4,090 sentences, along with a system exploiting the data. Our work differs from both SEM07 (Strapparava and Mihalcea, 2007) and (Aman and Szpakowicz, 2007) in that we focus on a different genre (i.e., Twitter) and investigate distant supervision as a way to acquire a significantly larger labeled dataset.

Our work is similar to (Mohammad, 2012; Mohammad and Kiritchenko, 2015), (Wang et al., 2012), and (Volkova and Bachrach, 2016), who use distant supervision to acquire Twitter data with emotion hashtags and report analyses and experiments to validate the utility of this approach. For example, Mohammad (2012) shows that by using a simple domain adaptation method to train a classifier on their data, they are able to improve both precision and recall on the SemEval-2007 (Strapparava and Mihalcea, 2007) dataset. As the author points out, this is further evidence that the self-labeled hashtags acquired from Twitter are consistent, to some degree, with the emotion labels given by the trained human judges who labeled the SemEval-2007 data. As pointed out earlier, Wang et al. (2012) randomly sample a set of 400 tweets from their data and human-label them as relevant/irrelevant, as a way to verify the distant supervision approach together with the quality assurance heuristics they employ. The authors found a precision of 93.16% on a test set, confirming the utility of the heuristics. Wang et al. (2012) also provide a number of important observations as conclusions based on their work. These include that, since they are provided by the tweets' writers, emotion hashtags are more natural and reliable than emotion labels traditionally assigned to data by a few annotators; in the lab-condition method, annotators need to infer the writers' emotions from text, which may not be accurate. Additionally, Volkova and Bachrach (2016) follow the same distant supervision approach and, exploiting the emotion hashtag-labeled data, find correlations between users' emotional tone and the perceived demographics of these users' social networks. Our dataset is more than an order of magnitude larger than those of (Mohammad, 2012) and (Volkova and Bachrach, 2016), and the range of emotions we target is much more fine-grained than (Mohammad, 2012; Wang et al., 2012; Volkova and Bachrach, 2016), since we model 24 emotion types rather than ≤7 basic emotions.

(Yan et al., 2016; Yan and Turtle, 2016a,b) develop a dataset of 15,553 tweets labeled with 28 emotion types, and so target a fine-grained range as we do. The authors instruct human annotators under lab conditions to assign any emotion they feel is expressed in the data, allowing them to assign more than one emotion to a given tweet. A set of 28 chosen emotions was then decided upon, and further annotations were performed using Amazon Mechanical Turk (AMT). The authors cite an agreement of 0.50 Krippendorff's alpha (α) between the lab/expert annotators, and an α of 0.28 between experts and AMT workers. EmoTweet-28 is a useful resource; however, the agreement between annotators is not high, and the set of assigned labels does not adhere to a specific theory of emotion. We use a much larger dataset and report an accuracy of the hashtag approach of 90% based on human judgement, as reported in Section 4.

2.2 Mood

A number of studies have also been performed to analyze and/or model mood in social media data. [...]

(...son, 2013; Maas et al., 2011; Tang et al., 2014b,a) aim to learn sentiment-specific word embeddings (Bengio et al., 2003; Mikolov et al., 2013) from neighboring text. Another thread of research focuses on learning semantic composition (Mitchell and Lapata, 2010), including extensions to phrases and sentences with recursive neural networks (a class of syntax-tree models) (Socher et al., 2013; Irsoy and Cardie, 2014; Li et al., 2015) and to documents with distributed representations of sen-