A Machine Learning Approach to Sentiment Analysis And
A MACHINE LEARNING APPROACH TO SENTIMENT ANALYSIS AND STANCE DETECTION FOR POLITICAL TWEETS
EXPLORING THE INFLUENCE OF IRONY ON THE PREDICTABILITY OF SENTIMENT AND STANCE

Word count: 14658
Lot De Kimpe
Student number: 01404202
Supervisor: Prof. Dr. Els Lefever
Master's thesis submitted to obtain the degree of Master in Multilingual Communication
Academic year: 2017-2018

Copyright statement

The author and the supervisor(s) give permission to make this study as a whole available for consultation for personal use. Any other use is subject to the restrictions of copyright, in particular the obligation to cite the source explicitly when quoting data from this study. The copyright on the data mentioned in this study rests with the supervisor(s). The copyright is limited to the way in which the author has approached and written up the subject matter. In doing so, the author respects the original copyright of the individually cited studies and any accompanying documentation, such as tables and figures.

Abstract

With the emergence of Web 2.0, easily accessible microblogging platforms such as Facebook and Twitter have allowed users to share their opinions online. Sentiment analysis and stance detection allow a business, organization or political party to gather all these viewpoints and to find out which sentiment (positive, negative or neutral) a piece of text contains, so that they can optimize their products and services. Despite the fast developments in this field of study, challenges for the automatic prediction of sentiment and stance labels are still present (Pang et al., 2008; Kumar & Sebastian, 2012; Mandya et al., 2016). This research explores how well a machine learning system performs for sentiment analysis and stance detection on an English Twitter corpus of 482 political tweets with #Brexit.
The manually annotated labels were compared to the predictions of a machine learning system, taking into account the possible impact of irony on the performance of the system. The results show that the system performs fairly well on sentiment analysis (accuracy of 0.55) and stance detection (accuracy of 0.61). It remains unclear, however, to what extent irony affects the quality of the automatic predictions. Further research could focus specifically on the comparison between irony detection and sentiment analysis or stance detection.

Acknowledgements

First of all, I would like to express my gratitude towards my sister, Lies De Kimpe. From the very first letter of my bachelor's paper, she provided me with professional feedback and encouraging pep talks. And even now, up until the very last letter of my master's thesis, she has always been by my side for support. Without her eye for detail and willingness to answer every little question, this paper would not have reached the quality it has today.

Secondly, many thanks go to my supervisor, Els Lefever, for her help, patience and support during the last two years. She has always given me useful advice and motivating compliments, which encouraged me to complete my thesis successfully. Furthermore, I also wish to thank her for being such an approachable and kind mentor.

Thirdly, I would like to thank my friends Julie Carton, Lien De Wulf, Bo Van Eetvelde and the entire group of KLJ people, who were my towers of strength in stressful moments. They were wonderful in offering me my daily dose of distraction in solitary times behind my desk. Special thanks go to Hanne Christiaens, who was willing to share her own, very recognizable experiences as a master's student at VTC with me.

And last but definitely not least, I wish to thank my parents from the bottom of my heart for allowing me to pursue any possible dream and keeping their endless faith in me.
Table of contents

List of tables and figures
1. INTRODUCTION
2. LITERATURE STUDY
  2.1 Sentiment analysis
    2.1.1 Terminology
  2.2 Approaches to SA
    2.2.1 Lexicon-based approach to SA
    2.2.2 Supervised machine learning approach to SA
  2.3 ABSA: Aspect Based Sentiment Analysis
  2.4 Sentiment analysis for political tweets
    2.4.1 Twitter
  2.5 Stance detection
  2.6 Irony detection
    2.6.1 What is irony?
    2.6.2 Difficulties and challenges
3. RESEARCH DESIGN
  3.1 Research questions and hypotheses
  3.2 Methodology
    3.2.1 Data collection
    3.1.1 Annotation
    3.1.2 Experimental approach
4. RESULTS
  4.1 Results manual annotation
    4.1.1 Sentiment and topics
    4.1.2 Stance and irony
  4.2 Results machine learning system
    4.2.1 Sentiment and topics
    4.2.2 Stance and irony
  4.3 Analysis
    4.3.1 Sentiment analysis: tenfold cross-validation scheme
    4.3.2 Sentiment analysis: general overview
      4.3.2.1 Impact of irony on the prediction of sentiment
      4.3.2.2 Error Analysis
    4.3.3 Stance detection: tenfold cross-validation scheme
    4.3.4 Stance detection: general overview
      4.3.2.1 Impact of irony on the prediction of stance
      4.3.2.2 Error analysis
    4.3.5 Comparison sentiment analysis and stance detection
5. CONCLUSION
6. LIMITATIONS AND FURTHER RESEARCH
APPENDIX 1
APPENDIX 2

List of tables and figures

Table 1 - Sentiment per topic (manual annotation)