Happy Dance, Slow Clap: Using Reaction GIFs to Predict Induced Affect on Twitter
Boaz Shmueli 1,2,3, Soumya Ray 2, and Lun-Wei Ku 3
1 Social Networks and Human-Centered Computing, TIGP, Academia Sinica
2 Institute of Service Science, National Tsing Hua University
3 Institute of Information Science, Academia Sinica

Abstract

Datasets with induced emotion labels are scarce but of utmost importance for many NLP tasks. We present a new, automated method for collecting texts along with their induced reaction labels. The method exploits the online use of reaction GIFs, which capture complex affective states. We show how to augment the data with induced emotion and induced sentiment labels. We use our method to create and publish ReactionGIF, a first-of-its-kind affective dataset of 30K tweets. We provide baselines for three new tasks, including induced sentiment prediction and multilabel classification of induced emotions. Our method and dataset open new research opportunities in emotion detection and affective computing.

1 Introduction

Affective states such as emotions are an elemental part of the human condition. The automatic detection of these states is thus an important task in affective computing, with applications in diverse fields including psychology, political science, and marketing (Seyeditabari et al., 2018). Training machine learning algorithms for such applications requires large yet task-specific emotion-labeled datasets (Bostan and Klinger, 2018).

Borrowing from music (Gabrielsson, 2001) and film (Tian et al., 2017), one can distinguish between two reader perspectives when labeling emotions in text: perceived emotions, which are the emotions that the reader recognizes in the text, and induced emotions, which are the emotions aroused in the reader. However, with the exception of Buechel and Hahn (2017), this distinction is mostly missing from the NLP literature, which focuses on the distinction between author and reader perspectives (Calvo and Mac Kim, 2013).

The collection of perceived emotions data is considerably simpler than that of induced emotions data, and presently most human-annotated emotion datasets are labeled with perceived emotions (e.g., Strapparava and Mihalcea, 2008; Preoţiuc-Pietro et al., 2016; Hsu and Ku, 2018; Demszky et al., 2020). Induced emotions data can be collected using physiological measurements or self-reporting, but both methods are complex, expensive, unreliable, and cannot scale easily. Still, having well-classified induced emotions data is of utmost importance to dialogue systems and other applications that aim to detect, predict, or elicit a particular emotional response in users. Pool and Nissim (2016) used distant supervision to detect induced emotions from Facebook posts by looking at the six available emoji reactions. Although this method is automatic, it is limited both in emotional range, since the set of reactions is small and rigid, and in accuracy, because emojis are often misunderstood due to their visual ambiguity (Tigwell and Flatla, 2016).

To overcome these drawbacks, we propose a new method that innovatively exploits the use of reaction GIFs in online conversations. Reaction GIFs are effective because they "display emotional responses to prior talk in text-mediated conversations" (Tolins and Samermit, 2016). We propose a fully-automated method that captures in-the-wild texts, naturally supervised using fine-grained, induced reaction labels. We also augment our dataset with sentiment and emotion labels. We use our method to collect and publish the ReactionGIF dataset.¹

¹ github.com/bshmueli/ReactionGIF

2 Automatic Supervision using GIFs

Figure 1a shows a typical Twitter thread. User A writes "I can't take this any more!". User B replies with a reaction GIF depicting an embrace. Our method automatically infers a hug reaction, signaling that A's text induced a feeling of love and caring. In the following, we formalize our method.

Figure 1: How reaction GIFs are used (left) and inserted (middle, right) on Twitter. (a) A root tweet ("I can't take...") with a hug reaction GIF reply. (b) The reaction categories menu offers 43 categories (hug, kiss, ...). (c) The top reaction GIFs offered to the user from the hug category.

2.1 The Method

Let $(t, g)$ represent a 2-turn online interaction with a root post comprised solely of text $t$, and a reply containing only reaction GIF $g$. Let $\mathcal{R} = \{R_1, R_2, \ldots, R_M\}$ be a set of $M$ different reaction categories representing various affective states (e.g., hug, facepalm). The function $R$ maps a GIF $g$ to a reaction category, $g \mapsto R(g)$, with $R(g) \in \mathcal{R}$. We use $r = R(g)$ as the label of $t$. In the Twitter thread shown in Figure 1a, the label of the tweet "I can't take this any more!" is $r = R(g) = hug$.

Inferring $R(g)$ would usually require humans to manually view and annotate each GIF. Our method automatically determines the reaction category conveyed in the GIF. In the following, we explain how we automate this step.

GIF Dictionary. We first build a dictionary of GIFs and their reaction categories by taking advantage of the 2-step process by which users post reaction GIFs. We describe this process on Twitter; other platforms follow a similar approach:

Step 1: The user clicks on the GIF button. A menu of reaction categories pops up (Figure 1b). Twitter has 43 pre-defined categories (e.g., high five, hug). The user clicks their preferred category.

Step 2: A grid of reaction GIFs from the selected category is displayed (Figure 1c). The user selects one reaction GIF to insert into the tweet.

To compile the GIF dictionary, we collect the first 100 GIFs in each of the $M = 43$ reaction categories on Twitter. We save the 4300 GIFs, along with their categories, to the dictionary. While in general GIFs do not necessarily contain affective information, our method collects reaction GIFs, which depict corresponding affective states.

Computing $R(g)$. Given a $(t, g)$ sample, we label text $t$ with reaction category $r$ by mapping reaction GIF $g$ back to its category $r = R(g)$. We search for $g$ in the GIF dictionary and identify the category (or categories) in which it is offered to the user. If the GIF is not found, the sample is discarded. For the small minority of GIFs that appear in two or more categories, we look at the positions of the GIF in each of its categories and select the category with the higher position.
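This lookup reduces to a few lines of code. The sketch below is ours, not from the paper: the names `gif_dict` and `reaction_category` are hypothetical, and we read "higher position" as an earlier slot in the category's GIF grid.

```python
# Sketch of the R(g) lookup (Section 2.1). Assumes gif_dict maps each
# reaction category to the ordered list of GIFs offered in its grid.
from typing import Dict, List, Optional

def reaction_category(g: str, gif_dict: Dict[str, List[str]]) -> Optional[str]:
    """Map reaction GIF g to its category r = R(g); None if g is unknown."""
    # Every category offering g, paired with g's position in that grid.
    hits = [(gifs.index(g), cat) for cat, gifs in gif_dict.items() if g in gifs]
    if not hits:
        return None  # GIF not in the dictionary: the sample is discarded
    # For GIFs in two or more categories, prefer the category where the
    # GIF sits higher in the grid (assumed here to mean an earlier slot).
    _, best = min(hits)
    return best

gif_dict = {"hug": ["g17", "g42"], "kiss": ["g99", "g17"]}
print(reaction_category("g17", gif_dict))  # hug: slot 0 beats slot 1 in kiss
```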
2.2 Category Clustering

Because reaction categories represent overlapping affective states, a GIF may appear in multiple categories. For example, a GIF that appears in the thumbs up category may also appear in the ok category, since both express approval. Out of the 4300 GIFs, 408 appear in two or more categories. Exploiting this artefact, we propose a new metric: the pairwise reaction similarity, which is the number of reaction GIFs that appear in a pair of categories.

To automatically discover affinities between reaction categories, we use our similarity metric and perform hierarchical clustering with average linkage. The resulting dendrogram, shown in Figure 2, uncovers surprisingly well the relationships between common human gesticulations. For example, shrug and idk (I don't know) share common emotions related to uncertainty and defensiveness. In particular, we can see two major clusters capturing negative sentiment (left cluster: mic drop to smh [shake my head]) and positive sentiment (right cluster: hug to slow clap), which are useful for downstream sentiment analysis tasks. The two rightmost singletons, popcorn and thank you, lack sufficient similarity data.

Figure 2: Hierarchical clustering (average linkage) of reaction categories shows relationships between reactions.
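Both the metric and the clustering map directly onto standard tooling. Here is a minimal SciPy sketch under our own assumptions: the toy `gif_dict` stands in for the scraped dictionary of 43 categories x 100 GIFs, and the conversion of co-occurrence counts to distances is our choice, since the paper specifies only the similarity metric and average linkage.

```python
# Sketch of pairwise reaction similarity and average-linkage clustering
# (Section 2.2), using a toy stand-in for the scraped GIF dictionary.
from itertools import combinations

import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

gif_dict = {
    "thumbs_up": ["g1", "g2", "g3"],
    "ok":        ["g2", "g3", "g4"],
    "hug":       ["g5", "g6"],
}
cats = sorted(gif_dict)
sets = {c: set(gifs) for c, gifs in gif_dict.items()}

# Pairwise reaction similarity: number of GIFs shared by a pair of categories.
n = len(cats)
sim = np.zeros((n, n))
for (i, a), (j, b) in combinations(enumerate(cats), 2):
    sim[i, j] = sim[j, i] = len(sets[a] & sets[b])

# Turn counts into distances (our assumption), then cluster and build the
# dendrogram of Figure 2.
dist = sim.max() - sim
np.fill_diagonal(dist, 0)
Z = linkage(squareform(dist), method="average")
dendrogram(Z, labels=cats, no_plot=True)  # set no_plot=False to draw it
```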
3 ReactionGIF Dataset

We applied our proposed method to 30K English-language $(t, g)$ 2-turn pairs collected from Twitter in April 2020. The $t$ are text-only root tweets (containing no links or media) and the $g$ are pure GIF reactions. We label each tweet $t$ with its reaction category $r = R(g)$. See Appendix A for samples. The resulting dataset, ReactionGIF, is publicly available.

Figure 3 shows the category distribution's long tail. The top seven categories (applause to eye roll) label more than half of the samples (50.9%). Each of the remaining 36 categories accounts for between 0.2% and 2.8% of the samples.

Label Augmentation. Reaction categories convey a rich affective signal. We can thus augment the dataset with other affective labels. We add sentiment labels by using the positive and negative reaction category clusters, labeling each sample according to its cluster's sentiment (§2.2). Furthermore, we add emotion labels using a novel reactions-to-emotions mapping: we asked 3 annotators to map each reaction category onto a subset of the 27 emotions in Table 1.

Admiration   Curiosity       Fear         Pride
Amusement    Desire          Gratitude    Realization
Anger        Disappointment  Grief        Relief
Annoyance    Disapproval     Joy          Remorse
Approval     Disgust         Love         Sadness
Caring       Embarrassment   Nervousness  Surprise
Confusion    Excitement      Optimism

Table 1: The 27 emotions in Demszky et al. (2020).

4 Baselines

As this is the first dataset of its kind, we aim to promote future research by offering baselines for predicting the reaction, sentiment, and emotion induced by tweets. We use the following four models in our experiments:

• Majority: A simple majority-class classifier.
• LR: A logistic regression classifier (L-BFGS solver with C = 3, maximum 1000 iterations, stratified K-fold cross-validation with K = 5) using TF-IDF vectors (unigrams and bigrams, cutoff 2, maximum 1000 features, English stop words removed); see the sketch after this list.
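To make the LR configuration concrete, here is a scikit-learn sketch of the stated hyperparameters. The pipeline construction is our rendering, and we read "cutoff 2" as a minimum document frequency of 2.

```python
# Sketch of the LR baseline (Section 4) with the listed hyperparameters.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# TF-IDF vectors: unigrams and bigrams, cutoff 2 (assumed to mean a minimum
# document frequency of 2), at most 1000 features, English stop words removed.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=2,
                             max_features=1000, stop_words="english")

# Logistic regression: L-BFGS solver, C = 3, maximum 1000 iterations.
clf = LogisticRegression(solver="lbfgs", C=3, max_iter=1000)
pipeline = make_pipeline(vectorizer, clf)

# With the ReactionGIF tweets and reaction labels loaded as parallel lists:
#   scores = cross_val_score(pipeline, tweets, labels,
#                            cv=StratifiedKFold(n_splits=5))
```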