Emotion Detection in Twitter Posts: a Rule-Based Algorithm for Annotated Data Acquisition
2020 International Conference on Computational Science and Computational Intelligence (CSCI)

Maria Krommyda, Anastatios Rigos, Kostas Bouklas, Angelos Amditis
Institute of Communication and Computer Systems, Athens, Greece

Abstract—Social media analysis plays a key role in understanding the public's opinion regarding recent events and decisions, in the design and management of advertising campaigns, as well as in the planning of next steps and mitigation actions for public relations initiatives. Significant effort has recently been dedicated to the development of data analysis algorithms that perform, in an automated way, sentiment analysis over publicly available text. Most of the available research work focuses on the binary categorization of text as positive or negative, without further investigating the emotions leading to that categorization. The current need, however, for in-depth analysis of the available content, combined with the complexity and multi-dimensional aspects of human emotions and opinions, has rendered such solutions obsolete. Due to these needs, research is currently focusing on specifying the emotions, and not only the sentiment, expressed in a given text. This is, however, a very challenging effort, due not only to the lack of annotated datasets that can be used for emotion detection in text but also to the subjectivity infused in datasets that have been created based on manual annotations. A hybrid rule-based algorithm is presented in this paper that supports the creation of a fully annotated dataset over Plutchik's eight basic emotions. The presented algorithm takes into consideration the available emoji in the text and utilizes them as objective indicators of the expressed emotion, thus efficiently tackling both identified challenges. This is a full regular paper submitted to the CSCI-ISNA Symposium.

Index Terms—social media analysis, data analytics, emotion detection, sentiment analysis, data acquisition, data annotation, Plutchik's eight basic emotions, social media posts

This work is part of the RESIST project. RESIST has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement no 769066. Content reflects only the authors' view. The Innovation and Networks Executive Agency (INEA) is not responsible for any use that may be made of the information it contains.

978-1-7281-7624-6/20/$31.00 ©2020 IEEE. DOI 10.1109/CSCI51800.2020.00050

I. INTRODUCTION

Social media monitoring can refer either to measuring opinions about current events, also called sentiment analysis, or to emotion detection in the produced content [1]. The term sentiment analysis [2], [3] refers to the process of identifying and categorizing text, in an automated way, based on the opinions expressed in it. The process focuses exclusively on the analysis of the attitude of the person that provided the text, which is categorized as positive, negative, or neutral. Such analysis is mainly used for movies, products and persons, when measuring their appeal to the public is of interest [4], and requires large quantities of text collected over a long period of time from different sources to ensure an unbiased and objective output.

While this technique is very popular and well established, with many important use cases and applications, there are other equally important use cases where such analysis fails to provide the needed information. As an example, such analysis will not be of value in the case of users in the proximity of a stressful or extreme event, such as a natural disaster due to extreme weather phenomena. To begin with, users in the vicinity of an event are expected to be negatively affected by the situation; they may be scared, worried or angry, so such an analysis would have little or no added value for the risk assessment and the end users. Also, the available text, regardless of the sources examined, is expected to be limited, coming from the few users in the area of the event and produced within a short amount of time.

Extending the idea of sentiment analysis, emotion detection [5] does not examine whether the expressed sentiments are positive or negative but focuses on identifying the exact human emotion that is present in an image, video, voice recording or text. The task of identifying in an automated way the emotions expressed by an individual, especially when the size and the features of the input are limited, is neither trivial nor easy to model. Humans have the ability to understand the emotions of the people around them using a series of signs in addition to the actual words exchanged, including body language, tone of voice and facial expressions. Even then, there are cases where opinions differ about what the real expressed emotion is in a given context, as each individual is expected to understand and interpret emotional expressions differently due to personal social experiences.

The most popular theory regarding emotion classification, called the discrete emotion theory [6], is that there are some core human emotions that form the basis upon which all the other emotions can be interpreted and categorized. The emotions that form this basis, however, have been under discussion among psychologists for many years.

It was as early as 1872, in Charles Darwin's book 'The Expression of the Emotions in Man and Animals' [7], that the idea of the discrete emotion theory was first formed. The facial, physiological as well as behavioural characteristics of an individual were associated with the emotional state of the individual. In this book, however, there is no discussion about which may be the basic emotions that humans express.

The idea around the discrete emotions evolved over time [8] and in 1957 Paul Ekman presented [9] his initial view about basic emotions. His initial work can be summarized in two main assumptions. To begin with, Ekman claimed that a pleasant-unpleasant and active-passive scale is sufficient to capture the differences among emotions. Next, he argued that associating body language and facial expressions with the emotion they correspond to is a skill that people develop through social interaction and that is heavily based on their cultural background.

A few years later, as he proceeded with his research regarding the expression of emotions, he became the first to challenge his own assumptions and proposed categorization, and tried to establish a systematic and unbiased methodology for conducting emotion classification. While Ekman's work has received a lot of criticism regarding its reliability, the data collection and validation process, and the trustworthiness of the results, it has provided a significant contribution: the six basic emotion classes, which are happiness, anger, sadness, fear, disgust, and surprise.

Further evolving Ekman's work, Robert Plutchik [10] increased the number of primitive emotions to eight. He proposed a psycho-evolutionary classification approach for the emotions based on psychological observations of general emotional responses [11]. He justified the selection of these emotions, as well as their need to belong to the list of primitive ones, by placing them as the triggers of behaviors important for survival in emergency situations, such as the fight-or-flight response triggered by the emotion of fear. Plutchik's eight average-intensity emotional categories are joy, trust, fear,

utilized them as objective indicators of the expressed emotion thus efficiently tackling the challenge of the subjectivity of the emotion detection.
• A manually created list that provides the categorization of specific emoji over Plutchik's eight basic emotions. The list has been designed to include only emoji that can be exclusively mapped to one of the eight emotions examined, excluding all the others.

These functionalities have been developed so that they can be fully parameterized and used in a modular way, providing training datasets fully compliant with the characteristics needed for the modeling task they will be used for.

II. NATURAL LANGUAGE PROCESSING FOR SOCIAL MEDIA POSTS

Natural Language Processing (NLP) [12] is a field of artificial intelligence that focuses on the interpretation of human language by computers as well as on human-computer interaction using natural language¹. The ultimate objective of NLP is to recognize text, identify the meaning of the words used, interpret that meaning in the context used and, in the end, understand the text the same way that a human would. The purpose of NLP is thus to extract knowledge and meaning from the text that can have added value for applications and systems. Given the complexity of the task, the plethora of meanings a word or phrase can have based on its usage and content, and the uncertainty around the way humans are able to understand text, NLP uses machine learning techniques to derive meaning from text.

NLP is a very challenging field of computer science due to the characteristics of human language and the multiple indicators that contribute to the understanding of the meaning of a phrase [13]. Grammar and syntax rules used for the formation of sentences vary in their level of detail, may have many exceptions, and their applicability can depend on the content of the phrase. One of the most indicative rules that can be used here as an example is the plurality of items. The general rule, which dictates that the character "s" at the end of a noun signifies plurality, has three word groups as exceptions.
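
To make the limitation of the general rule concrete, the following minimal Python sketch applies only the final-"s" check to a few labeled nouns and reports the ones it gets wrong. The function names and the sample words are assumptions introduced here for illustration; they are not taken from the paper and are not meant to reproduce its three word groups.

```python
def naive_is_plural(noun: str) -> bool:
    """Apply only the general rule: a noun ending in "s" is treated as plural."""
    return noun.lower().endswith("s")


# Hypothetical sample nouns with their true plurality (illustrative only).
LABELED_NOUNS = [
    ("emotions", True),   # regular plural, the rule holds
    ("tweet", False),     # regular singular, the rule holds
    ("bus", False),       # singular noun that happens to end in "s"
    ("children", True),   # plural noun without a final "s"
]


def misclassified(nouns):
    """Return the nouns for which the general rule gives the wrong answer."""
    return [word for word, is_plural in nouns if naive_is_plural(word) != is_plural]


if __name__ == "__main__":
    print(misclassified(LABELED_NOUNS))  # prints ['bus', 'children']
```

Even on this small sample the single rule fails in both directions, which is why rule sets for natural language have to carry explicit exception lists.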