
Happy Dance, Slow Clap: Using Reaction GIFs to Predict Induced Affect on Twitter

Boaz Shmueli1,2,3, Soumya Ray2, and Lun-Wei Ku3
1 Social Networks and Human-Centered Computing, TIGP, Academia Sinica
2 Institute of Service Science, National Tsing Hua University
3 Institute of Information Science, Academia Sinica
[email protected] [email protected] [email protected]

Abstract

Datasets with induced emotion labels are scarce but of utmost importance for many NLP tasks. We present a new, automated method for collecting texts along with their induced reaction labels. The method exploits the online use of reaction GIFs, which capture complex affective states. We show how to augment the data with induced emotion and induced sentiment labels. We use our method to create and publish ReactionGIF, a first-of-its-kind affective dataset of 30K tweets. We provide baselines for three new tasks, including induced sentiment prediction and multilabel classification of induced emotions. Our method and dataset open new research opportunities in emotion detection and affective computing.

1 Introduction

Affective states such as emotions are an elemental part of the human condition. The automatic detection of these states is thus an important task in affective computing, with applications in diverse fields including psychology, political science, and marketing (Seyeditabari et al., 2018). Training machine learning algorithms for such applications requires large yet task-specific emotion-labeled datasets (Bostan and Klinger, 2018).

Borrowing from music (Gabrielsson, 2001) and film (Tian et al., 2017), one can distinguish between two reader perspectives when labeling emotions in text: perceived emotions, which are the emotions that the reader recognizes in the text, and induced emotions, which are the emotions aroused in the reader. However, with the exception of Buechel and Hahn (2017), this distinction is mostly missing from the NLP literature, which focuses on the distinction between author and reader perspectives (Calvo and Mac Kim, 2013).

The collection of perceived emotions data is considerably simpler than that of induced emotions data, and presently most human-annotated emotion datasets are labeled with perceived emotions (e.g., Strapparava and Mihalcea, 2008; Preoţiuc-Pietro et al., 2016; Hsu and Ku, 2018; Demszky et al., 2020). Induced emotions data can be collected using physiological measurements or self-reporting, but both methods are complex, expensive, unreliable, and cannot scale easily. Still, having well-classified induced emotions data is of utmost importance to dialogue systems and other applications that aim to detect, predict, or elicit a particular emotional response in users. Pool and Nissim (2016) used distant supervision to detect induced emotions from Facebook posts by looking at the six available reactions. Although this method is automatic, it is limited both in emotional range, since the set of reactions is small and rigid, and in accuracy, because the reactions are often misunderstood due to their visual ambiguity (Tigwell and Flatla, 2016).

To overcome these drawbacks, we propose a new method that innovatively exploits the use of reaction GIFs in online conversations. Reaction GIFs are effective because they "display emotional responses to prior talk in text-mediated conversations" (Tolins and Samermit, 2016). We propose a fully-automated method that captures in-the-wild texts, naturally supervised using fine-grained, induced reaction labels. We also augment our dataset with sentiment and emotion labels. We use our method to collect and publish the ReactionGIF dataset.¹

2 Automatic Supervision using GIFs

Figure 1a shows a typical Twitter thread. User A writes "I can't take this any more!". User B replies with a reaction GIF depicting an embrace. Our method automatically infers a hug reaction, signaling that A's text induced a feeling of love and caring. In the following, we formalize our method.

¹github.com/bshmueli/ReactionGIF

Figure 1: How reaction GIFs are used (left) and inserted (middle, right) on Twitter. (a) A root tweet ("I can't take...") with a hug reaction GIF reply. (b) The reaction categories menu offers 43 categories (hug, ...). (c) The top reaction GIFs offered to the user from the hug category.

2.1 The Method

Let (t, g) represent a 2-turn online interaction with a root post comprised solely of text t, and a reply containing only reaction GIF g. Let R = {R1, R2, ..., RM} be a set of M different reaction categories representing various affective states (e.g., hug). The function R maps a GIF g to a reaction category, g ↦ R(g), R(g) ∈ R. We use r = R(g) as the label of t. In the Twitter thread shown in Figure 1a, the label of the tweet "I can't take this any more!" is r = R(g) = hug.

Inferring R(g) would usually require humans to manually view and annotate each GIF. Our method automatically determines the reaction category conveyed in the GIF. In the following, we explain how we automate this step.

GIF Dictionary We first build a dictionary of GIFs and their reaction categories by taking advantage of the 2-step process by which users post reaction GIFs. We describe this process on Twitter; other platforms follow a similar approach:

Step 1: The user clicks on the GIF button. A menu of reaction categories pops up (Figure 1b). Twitter has 43 pre-defined categories (e.g., high five, hug). The user clicks their preferred category.

Step 2: A grid of reaction GIFs from the selected category is displayed (Figure 1c). The user selects one reaction GIF to insert into the tweet.

To compile the GIF dictionary, we collect the first 100 GIFs in each of the M = 43 reaction categories on Twitter. We save the 4300 GIFs, along with their categories, to the dictionary. While in general GIFs do not necessarily contain affective information, our method collects reaction GIFs that depict corresponding affective states.

Computing R(g) Given a (t, g) sample, we label text t with reaction category r by mapping reaction GIF g back to its category r = R(g). We search for g in the GIF dictionary and identify the category(ies) in which it is offered to the user. If the GIF is not found, the sample is discarded. For the small minority of GIFs that appear in two or more categories, we look at the positions of the GIF in each of its categories and select the category with the higher position.
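This lookup and tie-breaking logic is straightforward to implement. The following is a minimal sketch, not the authors' released code; the gif_dict structure and the function name are illustrative, assuming each category maps to the ordered list of GIF identifiers collected from its grid:

```python
# Minimal sketch (not the authors' released code) of the R(g) lookup.
# `gif_dict` is assumed to map each reaction category to the ordered
# list of GIF identifiers scraped from that category's grid.
from typing import Optional


def label_with_reaction(gif_id: str,
                        gif_dict: dict[str, list[str]]) -> Optional[str]:
    """Return the reaction category r = R(g) for a GIF, or None."""
    # Collect (grid position, category) for every category offering the GIF.
    matches = [(gifs.index(gif_id), category)
               for category, gifs in gif_dict.items()
               if gif_id in gifs]
    if not matches:
        return None  # GIF not in the dictionary: discard the (t, g) sample
    # For GIFs offered in several categories, prefer the category where
    # the GIF occupies the higher (earlier) position in its grid.
    _, category = min(matches)
    return category
```

A returned None corresponds to the discarded samples described above.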
2.2 Category Clustering

Because reaction categories represent overlapping affective states, a GIF may appear in multiple categories. For example, a GIF that appears in the thumbs up category may also appear in the ok category, since both express approval. Out of the 4300 GIFs, 408 appear in two or more categories. Exploiting this artefact, we propose a new metric: the pairwise reaction similarity, which is the number of reaction GIFs that appear in a pair of categories.

To automatically discover affinities between reaction categories, we use our similarity metric and perform hierarchical clustering with average linkage. The resulting dendrogram, shown in Figure 2, uncovers surprisingly well the relationships between common human gesticulations. For example, shrug and idk (I don't know) share common emotions related to uncertainty and defensiveness. In particular, we can see two major clusters capturing negative sentiment (left cluster: mic drop to smh [shake my head]) and positive sentiment (right cluster: hug to slow clap), which are useful for downstream sentiment analysis tasks. The two rightmost singletons, popcorn and thank you, lack sufficient similarity data.

Figure 2: Hierarchical clustering (average linkage) of reaction categories shows relationships between reactions.
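A sketch of this clustering step follows. The conversion from similarity to distance is an assumption, since the paper does not specify one; gif_dict is the category-to-GIFs dictionary from Section 2.1:

```python
# Sketch of the category-clustering step. The similarity-to-distance
# conversion below is one simple choice among several (an assumption).
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform


def cluster_categories(gif_dict: dict[str, list[str]]):
    names = sorted(gif_dict)
    sets = [set(gif_dict[name]) for name in names]
    n = len(names)
    # Pairwise reaction similarity: GIFs shared by each pair of categories.
    sim = np.array([[len(sets[i] & sets[j]) for j in range(n)]
                    for i in range(n)], dtype=float)
    # Turn similarity into a dissimilarity for clustering.
    dist = sim.max() - sim
    np.fill_diagonal(dist, 0.0)
    # Average-linkage hierarchical clustering over the condensed matrix.
    return linkage(squareform(dist, checks=False), method="average"), names
```

Passing the returned linkage matrix and labels to scipy.cluster.hierarchy.dendrogram would produce a Figure 2-style plot.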

3 ReactionGIF Dataset

We applied our proposed method to 30K English-language 2-turn (t, g) pairs collected from Twitter in April 2020, where t are text-only root tweets (not containing links or media) and g are pure GIF reactions. We label each tweet t with its reaction category r = R(g). Reaction categories convey a rich affective signal. We can thus augment the dataset with other affective labels.

Label Augmentation We add sentiment labels by using the positive and negative reaction category clusters, labeling each sample according to its cluster's sentiment (§2.2). Furthermore, we add emotion labels using a novel reactions-to-emotions mapping: we asked 3 annotators to map each reaction category onto a subset of the 27 emotions in Demszky et al. (2020); see Table 1. Instructions were to view the GIFs in each category and select the emotions expressed. Pairwise Cohen's kappa interrater agreements indicate moderate agreement, with κ12 = 0.512, κ13 = 0.483, and κ23 = 0.494; Fleiss' kappa is κF = 0.449.
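These agreement statistics can be reproduced with standard libraries. A minimal sketch, assuming the annotators' selections are arranged as a binary matrix with one row per (category, emotion) pair and one column per annotator (this layout and the names are illustrative):

```python
# Sketch of the interrater-agreement computation. `decisions` is an
# assumed (n_pairs, 3) array of 0/1 values: one row per
# (reaction category, emotion) pair, one column per annotator.
import numpy as np
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa


def agreement(decisions: np.ndarray) -> dict[str, float]:
    # Pairwise Cohen's kappa for each pair of annotators.
    kappas = {f"kappa_{i + 1}{j + 1}":
              cohen_kappa_score(decisions[:, i], decisions[:, j])
              for i in range(3) for j in range(i + 1, 3)}
    # Fleiss' kappa over all three annotators.
    table, _ = aggregate_raters(decisions)  # rows: items, cols: categories
    kappas["kappa_F"] = fleiss_kappa(table)
    return kappas
```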

We use the annotators' majority decisions as the final many-to-many mapping and label each sample according to its category's mapped emotions subset.
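A minimal sketch of this final labeling step, with illustrative names (cluster_sentiment, votes) for structures the paper describes but does not name:

```python
# Sketch of the label-augmentation step. `cluster_sentiment` maps a
# category to its cluster's sentiment (Section 2.2); `votes` holds the
# three annotators' emotion subsets for each reaction category.
from collections import Counter


def augment(reaction: str,
            cluster_sentiment: dict[str, str],
            votes: dict[str, list[set[str]]]) -> tuple[str, set[str]]:
    sentiment = cluster_sentiment[reaction]  # "positive" or "negative"
    # Majority decision: keep an emotion selected by >= 2 of 3 annotators.
    counts = Counter(e for annotator in votes[reaction] for e in annotator)
    emotions = {e for e, c in counts.items() if c >= 2}
    return sentiment, emotions
```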

GIFs in Context As far as we know, our dataset is the first to offer reaction GIFs together with their eliciting texts. Moreover, the reaction GIFs are labeled with a reaction category. Other available GIF datasets (e.g., TGIF by Li et al., 2016, and GIFGIF/GIFGIF+, Jou et al., 2014) lack both the eliciting texts and the reaction categories.

Figure 3 shows the category distribution's long tail. The top seven categories label more than half of the samples (50.9%). Each of the remaining 36 categories accounts for between 0.2% and 2.8% of the samples. The resulting dataset, ReactionGIF, is publicly available. See Appendix A for samples.

Figure 3: Distribution of the 43 reaction categories in ReactionGIF.

Table 1: The 27 emotions in Demszky et al. (2020).
Admiration, Amusement, Anger, Annoyance, Approval, Caring, Confusion, Curiosity, Desire, Disappointment, Disapproval, Disgust, Embarrassment, Excitement, Fear, Gratitude, Grief, Joy, Love, Nervousness, Optimism, Pride, Realization, Relief, Remorse, Sadness, Surprise.

4 Baselines

As this is the first dataset of its kind, we aim to promote future research by offering baselines for predicting the reaction, sentiment, and emotion induced by tweets. We use the following models in our four experiments:

• Majority: A simple majority class classifier.

• LR: Logistic regression classifier (L-BFGS solver, maximum iterations 1000, C = 3, stratified K-fold cross validation with K = 5) using TF-IDF vectors (unigrams and bigrams, cutoff 2, maximum 1000 features, removing English-language stop words); see the sketch after this list.

• CNN: Convolutional neural network (100 filters, kernel size 3, global max pooling; 2 hidden layers with 0.2 dropout; Adam solver, 100 epochs, batch size 128, learning rate 0.0005) with GloVe embeddings (Twitter, 27B tokens, 1.2M vocabulary, uncased, 100d) (Pennington et al., 2014).

• RoBERTa: Pre-trained transformer model (base, batch size 32, maximum sequence length 96, 3 training epochs) (Liu et al., 2019).
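As referenced above, a sketch of the LR baseline with the stated hyperparameters; any preprocessing detail beyond those listed is an assumption:

```python
# Sketch of the LR baseline (hyperparameters as stated in the text).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline


def make_lr_baseline():
    return make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2),  # unigrams and bigrams
                        min_df=2,            # frequency cutoff 2
                        max_features=1000,
                        stop_words="english"),
        LogisticRegression(solver="lbfgs", max_iter=1000, C=3),
    )


def evaluate_lr(texts, labels):
    # Stratified 5-fold cross validation, weight-averaged F1.
    cv = StratifiedKFold(n_splits=5)
    return cross_val_score(make_lr_baseline(), texts, labels,
                           cv=cv, scoring="f1_weighted")
```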

We hold out 10% of the samples for evaluation. The code is publicly available along with the dataset for reproducibility. The experiment results are summarized in Table 2.

Affective Reaction Prediction is a multiclass classification task where we predict the reaction category r for each tweet t. RoBERTa achieves an F1-score of 23.9%.

Induced Sentiment Prediction is a binary classification task to predict the sentiment induced by tweet t by using the augmented labels. RoBERTa has the best performance, with accuracy of 70.0% and F1-score of 69.8%.

Finally, Induced Emotion Prediction uses our reaction-to-emotion transformation for predicting induced emotions. This is a 27-emotion multilabel classification task, reflecting our dataset's unique ability to capture complex emotional states. RoBERTa again is the best model, with Label Ranking Average Precision (LRAP) of 0.596.

Table 2: Baselines for the reaction, sentiment, and emotion classification tasks. All metrics are weight-averaged. The highest value in each column is emboldened.

Task     |        Reaction        |       Sentiment        | Emotion
Model    | Acc   P     R     F1   | Acc   P     R     F1   | LRAP
Majority | 10.4  1.1   10.4  2.0  | 58.0  33.7  58.0  42.6 | 0.445
LR       | 22.7  19.5  22.7  17.3 | 64.7  64.4  64.7  62.4 | 0.529
CNN      | 25.5  23.6  25.5  18.0 | 67.1  66.8  67.1  66.3 | 0.557
RoBERTa  | 28.4  19.1  28.4  23.9 | 70.0  69.7  70.0  69.8 | 0.596
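The metrics in Table 2 are standard; a sketch of how they could be computed with scikit-learn (argument names are illustrative):

```python
# Sketch of the Table 2 metrics. For the single-label tasks, y_true and
# y_pred are 1-D label arrays; for the multilabel emotion task, Y_true
# is a binary indicator matrix and Y_score holds per-emotion scores.
from sklearn.metrics import (accuracy_score,
                             label_ranking_average_precision_score,
                             precision_recall_fscore_support)


def single_label_metrics(y_true, y_pred):
    acc = accuracy_score(y_true, y_pred)
    # Weight-averaged precision, recall, and F1, as reported in Table 2.
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted", zero_division=0)
    return acc, p, r, f1


def emotion_lrap(Y_true, Y_score):
    # Label Ranking Average Precision for the 27-emotion task.
    return label_ranking_average_precision_score(Y_true, Y_score)
```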

5 Discussion

Reaction GIFs are ubiquitous in online conversations due to their uniqueness as silent moving pictures. They are also lightweight, and more effective and precise than text and emojis when conveying affective states (Bakhshi et al., 2016). Consequently, the reaction category² is a new type of label, not yet available in existing emotion datasets: existing datasets label emotions using either the discrete model (Ekman, 1992) or the dimensional model of emotion (Mehrabian, 1996).

Advantages The new reaction labels provide a rich, complex signal that can be mapped to other types of affective labels, including sentiment and emotions, and possibly feelings and moods. In addition, because reaction GIFs are ubiquitous in online conversations, we can automatically collect large amounts of inexpensive, naturally-occurring, high-quality affective labels. Significantly, and in contrast with most other emotion datasets, the labels measure induced (as opposed to perceived) affective states; these labels are of prime importance yet the most difficult to obtain, with applications that include GIF recommender systems, dialogue systems, and any other application that requires predicting or inducing users' emotional responses.

Challenges The new labels possess important advantages, but also present interesting challenges. The large number of reaction categories (reflecting the richness of communication) makes their prediction a challenging task. In addition, the category distribution has a long tail, and there is an affective overlap between the categories. One way to address these issues is by accurately mapping the reactions to emotions. Precise mapping will require a larger GIF dictionary (our current one has 4300 GIFs), a larger dataset, and new evaluation metrics. A larger GIF dictionary will also improve the reaction similarity metric, offering new approaches for studying relationships between reactions (§2.2).

²For example, the facepalm reaction is "a gesture in which the palm of one's hand is brought to one's face, as an expression of disbelief, shame, or exasperation." (lexico.com/en/definition/facepalm, Oxford University Press).

6 Conclusion

Our new method is the first to exploit the use of reaction GIFs for capturing in-the-wild induced affective data. We augment the data with induced sentiment and emotion labels using two novel mapping techniques: reaction category clustering and reactions-to-emotions transformation. We used our method to publish ReactionGIF, a first-of-its-kind dataset with multiple affective labels. The new method and dataset offer opportunities for advances in emotion detection.

Moreover, our method can be generalized to capture data from other social media and instant messaging platforms that use reaction GIFs, as well as applied to other downstream tasks such as multimodal emotion detection and emotion recognition in dialogues, thus enabling new research directions in affective computing.

Acknowledgements

We thank the anonymous reviewers for their comments and suggestions. Special thanks to Thilina Rajapakse, creator of the elegant Simple Transformers package, for his help.

This research was partially supported by the Ministry of Science and Technology in Taiwan under grants MOST 108-2221-E-001-012-MY3 and MOST 109-2221-E-001-015-.

Ethical Considerations and Implications

Data Collection

The ReactionGIF data was collected from Twitter using the official API in full accordance with their Developer Agreement and Policy (Twitter, 2020). Similar to other Twitter datasets, we include the tweet IDs but not the texts. This guarantees that researchers who want to use the data will also need to agree with Twitter's Terms of Service. It also ensures compliance with section III (Updates and Removals) of the Developer Agreement and Policy's requirement that when users delete tweets (or make them private), these changes are reflected in the dataset (Belli et al., 2020).

Annotation

Annotation work was performed by three adult students, two males and one female, who use social media regularly. The labeling involved viewing 43 sets of standard reaction GIFs, one for each reaction category. These reaction GIFs are the standard offering by the Twitter platform to all its users. As a result, this content is highly familiar to users of social media platforms such as Facebook or Twitter, and thus presents a very low risk of psychological harm. Annotators gave informed consent after being presented with details about the purpose of the study, the procedure, risks, benefits, statement of confidentiality, and other standard consent items. Each annotator was paid US$18. The average completion time was 45 minutes.

Applications

The dataset and resulting models can be used to infer readers' induced emotions. Such capability can be used to help online platforms detect and filter out content that can be emotionally harmful, or emphasize and highlight texts that induce positive emotions, with the potential to improve users' well-being. For example, when a person is in grief or distress, platforms can give preference to responses which will induce a feeling of caring, gratitude, love, or optimism. Moreover, such technology can be of beneficial use in assistive computing applications. For example, people with emotional disabilities can find it difficult to understand the emotional affect in stories or other narratives, or to decipher emotional responses by third parties. By computing the emotional properties of texts, such applications can provide hints or instructions and allow for smoother and richer communication.

However, this technology also has substantial risks and perils. Inducing users' affective responses can also be used by digital platforms to steer users toward specific actions or thoughts, from product purchases and ad clicking to propaganda and opinion forming. Deployers must ensure that users understand and agree to the use of such systems, and consider whether the benefits created by such systems outweigh the potential harm that users may incur.
References

Saeideh Bakhshi, David A. Shamma, Lyndon Kennedy, Yale Song, Paloma de Juan, and Joseph 'Jofish' Kaye. 2016. Fast, cheap, and good: Why animated GIFs engage us. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, CHI '16, pages 575–586, New York, NY, USA. Association for Computing Machinery.

Luca Belli, Sofia Ira Ktena, Alykhan Tejani, Alexandre Lung-Yut-Fong, Frank Portman, Xiao Zhu, Yuanpu Xie, Akshay Gupta, Michael M. Bronstein, Amra Delić, Gabriele Sottocornola, Vito Walter Anelli, Nazareno Andrade, Jessie Smith, and Wenzhe Shi. 2020. Privacy-preserving recommender systems challenge on Twitter's home timeline. CoRR, abs/2004.13715.

Laura-Ana-Maria Bostan and Roman Klinger. 2018. An analysis of annotated corpora for emotion classification in text. In Proceedings of the 27th International Conference on Computational Linguistics, pages 2104–2119, Santa Fe, New Mexico, USA. Association for Computational Linguistics.

Sven Buechel and Udo Hahn. 2017. Readers vs. writers vs. texts: Coping with different perspectives of text understanding in emotion annotation. In Proceedings of the 11th Linguistic Annotation Workshop, pages 1–12, Valencia, Spain. Association for Computational Linguistics.

Rafael A. Calvo and Sunghwan Mac Kim. 2013. Emotions in text: Dimensional and categorical models. Computational Intelligence, 29(3):527–543.

Dorottya Demszky, Dana Movshovitz-Attias, Jeongwoo Ko, Alan Cowen, Gaurav Nemade, and Sujith Ravi. 2020. GoEmotions: A dataset of fine-grained emotions. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4040–4054, Online. Association for Computational Linguistics.

Paul Ekman. 1992. An argument for basic emotions. Cognition and Emotion, 6(3-4):169–200.

Alf Gabrielsson. 2001. Emotion perceived and emotion felt: Same or different? Musicae Scientiae, 5(Special Issue: Current Trends in the Study of Music and Emotion):123–147.

Chao-Chun Hsu and Lun-Wei Ku. 2018. SocialNLP 2018 EmotionX challenge overview: Recognizing emotions in dialogues. In Proceedings of the Sixth International Workshop on Natural Language Processing for Social Media, pages 27–31, Melbourne, Australia. Association for Computational Linguistics.

Brendan Jou, Subhabrata Bhattacharya, and Shih-Fu Chang. 2014. Predicting viewer perceived emotions in animated GIFs. In Proceedings of the 22nd ACM International Conference on Multimedia, MM '14, pages 213–216, New York, NY, USA. Association for Computing Machinery.

Y. Li, Y. Song, L. Cao, J. Tetreault, L. Goldberg, A. Jaimes, and J. Luo. 2016. TGIF: A new dataset and benchmark on animated GIF description. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4641–4650, Los Alamitos, CA, USA. IEEE Computer Society.

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. CoRR, abs/1907.11692.

Albert Mehrabian. 1996. Pleasure-arousal-dominance: A general framework for describing and measuring individual differences in temperament. Current Psychology, 14(4):261–292.

Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, Doha, Qatar. Association for Computational Linguistics.

Chris Pool and Malvina Nissim. 2016. Distant supervision for emotion detection using Facebook reactions. In Proceedings of the Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media (PEOPLES), pages 30–39, Osaka, Japan. The COLING 2016 Organizing Committee.

Daniel Preoţiuc-Pietro, H. Andrew Schwartz, Gregory Park, Johannes Eichstaedt, Margaret Kern, Lyle Ungar, and Elisabeth Shulman. 2016. Modelling valence and arousal in Facebook posts. In Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 9–15, San Diego, California. Association for Computational Linguistics.

Armin Seyeditabari, Narges Tabari, and Wlodek Zadrozny. 2018. Emotion detection in text: a review. arXiv preprint arXiv:1806.00674.

Carlo Strapparava and Rada Mihalcea. 2008. Learning to identify emotions in text. In Proceedings of the 2008 ACM Symposium on Applied Computing, SAC '08, pages 1556–1560, New York, NY, USA. Association for Computing Machinery.

Leimin Tian, Michal Muszynski, Catherine Lai, Johanna D. Moore, Theodoros Kostoulas, Patrizia Lombardo, Thierry Pun, and Guillaume Chanel. 2017. Recognizing induced emotions of movie audiences: Are induced and perceived emotions the same? In 2017 Seventh International Conference on Affective Computing and Intelligent Interaction (ACII), pages 28–35.

Garreth W. Tigwell and David R. Flatla. 2016. Oh that's what you meant! Reducing emoji misunderstanding. In Proceedings of the 18th International Conference on Human-Computer Interaction with Mobile Devices and Services Adjunct, MobileHCI '16, pages 859–866, New York, NY, USA. Association for Computing Machinery.

Jackson Tolins and Patrawat Samermit. 2016. GIFs as embodied enactments in text-mediated conversation. Research on Language and Social Interaction, 49(2):75–91.

Twitter. 2020. Developer Agreement and Policy. https://developer.twitter.com/developer-terms/agreement-and-policy. (Accessed on 02/01/2021).

Record ID | Tweet | Reaction Category | Sentiment | Emotions
13241 | "so...I have a job now" | dance | positive | Amusement, Excitement, Joy
1320 | "dyed my hair...... Pics soon" | applause | positive | Admiration, Approval, Excitement, Gratitude, Surprise
17 | "Don't forget to Hydrate!" | yawn | negative | Disappointment, Disapproval
808 | "Folks, I have a BIG BIG announcement coming tomorrow night at 9 PM EST" | scared | negative | Confusion, Fear, Nervousness, Surprise

(The reaction GIF thumbnails shown in the original figure are not reproduced here.)

Figure 4: ReactionGIF samples.

A Dataset Samples

Figure 4 includes four samples from the dataset. For each sample, we show the record ID within the dataset, the text of the tweet, a thumbnail of the reaction GIF, the reaction category of the GIF, and the two augmented labels: the sentiment and the emotions.