Fine-Grained Emotion Detection in Microblog Text
Total Page:16
File Type:pdf, Size:1020Kb
Syracuse University SURFACE Dissertations - ALL SURFACE May 2016 FINE-GRAINED EMOTION DETECTION IN MICROBLOG TEXT Jasy Suet Yan Liew Syracuse University Follow this and additional works at: https://surface.syr.edu/etd Part of the Social and Behavioral Sciences Commons Recommended Citation Liew, Jasy Suet Yan, "FINE-GRAINED EMOTION DETECTION IN MICROBLOG TEXT" (2016). Dissertations - ALL. 440. https://surface.syr.edu/etd/440 This Dissertation is brought to you for free and open access by the SURFACE at SURFACE. It has been accepted for inclusion in Dissertations - ALL by an authorized administrator of SURFACE. For more information, please contact [email protected]. Abstract Automatic emotion detection in text is concerned with using natural language processing techniques to recognize emotions expressed in written discourse. Endowing computers with the ability to recognize emotions in a particular kind of text, microblogs, has important applications in sentiment analysis and affective computing. In order to build computational models that can recognize the emotions represented in tweets we need to identify a set of suitable emotion categories. Prior work has mainly focused on building computational models for only a small set of six basic emotions (happiness, sadness, fear, anger, disgust, and surprise). This thesis describes a taxonomy of 28 emotion categories, an expansion of these six basic emotions, developed inductively from data. This set of 28 emotion categories represents a set of fine- grained emotion categories that are representative of the range of emotions expressed in tweets, microblog posts on Twitter. The ability of humans to recognize these fine-grained emotion categories is characterized using inter-annotator reliability measures based on annotations provided by expert and novice annotators. A set of 15,553 human-annotated tweets form a gold standard corpus, EmoTweet-28. For each emotion category, we have extracted a set of linguistic cues (i.e., punctuation marks, emoticons, emojis, abbreviated forms, interjections, lemmas, hashtags and collocations) that can serve as salient indicators for that emotion category. We evaluated the performance of automatic classification techniques on the set of 28 emotion categories through a series of experiments using several classifier and feature combinations. Our results shows that it is feasible to extend machine learning classification to fine-grained emotion detection in tweets (i.e., as many as 28 emotion categories) with results that are comparable to state-of-the-art classifiers that detect six to eight basic emotions in text. Classifiers using features extracted from the linguistic cues associated with each category equal or better the performance of conventional corpus-based and lexicon-based features for fine- grained emotion classification. This thesis makes an important theoretical contribution in the development of a taxonomy of emotion in text. In addition, this research also makes several practical contributions, particularly in the creation of language resources (i.e., corpus and lexicon) and machine learning models for fine-grained emotion detection in text. FINE-GRAINED EMOTION DETECTION IN MICROBLOG TEXT by Jasy Liew Suet Yan B.S. Computer Science, Universiti Sains Malaysia, 2008 M.S. Information Management, Syracuse University, 2011 Dissertation Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Information Science and Technology Syracuse University May 2016 Copyright © Jasy Liew Suet Yan 2016 All Rights Reserved Acknowledgments I would like to express my deepest appreciation to my advisor, Howard Turtle, for his thoughtful insights, his guidance, his patience and his relentless faith in me throughout this dissertation. To Elizabeth Liddy, my co-advisor, I am grateful for her encouragement, inspiration and mentorship. They are the closest confidants in my academic family and I could not imagine completing this dissertation without their immense knowledge and tireless support. I am very fortunate to have Ping Zhang, Nancy McCracken, Jennifer Stromer-Galley, Bei Yu, Diana Inkpen and Michael Kalish serve on my committee. I thank them for carefully reviewing my dissertation, providing insightful comments and challenging me with hard questions and criticism to help widen my perspective. I would also like to acknowledge Phyllis Turtle for her editorial comments on the dissertation. Special thanks to the students who volunteered their time to perform the annotation task and worked on building the tools to support my research: Rudy Rusli, Olivia Rhinehart, Tuo Wu, Jiaqi Li, Mohammed Hassan Aldrees, Tim Fu, Ke Ding, Shujin Cao, Yang Liu, Dane Dell, Feifei Zhang, Rucha Somani, Venkata Chadalawada, Anusha Ambati, Yiwei Jin, Yueming Sun, Ruby Cuate Ayala, Sharon Lee, Brian Dobreski and Aravind Gopalakrishnan. Their voluntary spirit, dedication and contribution to knowledge are commendable. I am also indebted to all faculty members who have helped me grown into a confident scholar and teacher as well as to the administrative staff who have helped made my journey in the program a smooth and successful one. Many thanks to my mentors, colleagues, peers and friends for the constant boost in morale and social support. Particularly, I wish to extend my heartfelt appreciation to my cohort mates, Katie DeVries Hassman and Douglas Crescenzi. I am blessed to be given the opportunity to grow, laugh and cry with them throughout my journey in iv the PhD program. Special thanks to Joy Ying Tang and Stephanie Santoso for making my life as a PhD student especially colorful and memorable. This dissertation is made possible by the joint financial support from Universiti Sains Malaysia and the Ministry of Higher Education Malaysia. I thank them for investing in the future of the university and the country. I would also like to express my gratitude to Christine Larsen for funding part of the data collection in this research under the Liddy Fellowship. Last but not least, I would like to thank my family especially my parents and sisters who stood by me and cheered me on. I also want to specially dedicate this dissertation to my niece and nephew, who provided me with generous doses of laughter all throughout the dissertation writing process. Most of my determination to complete the dissertation came from the hope that their generation and many others to come will also benefit from this research. v Table of Content Abstract ..................................................................................................................................... i Chapter 1: Introduction ........................................................................................................... 1 1.1 Motivation .................................................................................................................... 1 1.2 Problem Definition........................................................................................................ 7 1.3 Research Goals ..........................................................................................................12 1.4 Research Questions ...................................................................................................12 1.5 Thesis Overview .........................................................................................................13 Chapter 2: Literature Review ..................................................................................................15 2.1 Introduction .................................................................................................................15 2.2 Emotion Theories: The Psychology of Emotion ...........................................................16 2.2.1 The Darwinian Perspective: Emotion as Expression ............................................16 2.2.2 The Jamesian Perspective: Emotion as Embodiment ..........................................20 2.2.3 The Cognitive Perspective: Emotion as Appraisal ................................................21 2.2.4 The Social Constructivist Perspective: Emotion as Social Constructs ..................25 2.3 Models of Emotion ......................................................................................................27 2.3.1 Categorical Model ................................................................................................27 2.3.2 Dimensional Model ..............................................................................................28 2.4 Application of Emotion Theories in Automatic Emotion Detection in Text: A Summary29 2.5 Distinguishing Emotion from Related Terms ...............................................................32 2.5.1 Emotion and Subjectivity ......................................................................................33 2.5.2 Emotion and Sentiment ........................................................................................33 vi 2.5.3 Emotion and Affect ..............................................................................................35 2.5.4 Hierarchical Organization of Related Terms .........................................................36 2.6 Conceptualizing Emotion in Text .................................................................................37 2.7 Approaches to Identify Emotions in Text .....................................................................40 2.8 Linguistic Representations of Emotion ........................................................................42