Machine Learning and Lexicon-Based Sentiment Analysis of Twitter Responses to Video Assistant Referees in the Premier League During the 2019-2020 Season
Total Page:16
File Type:pdf, Size:1020Kb
Technische Universität München Machine Learning and Lexicon-Based Sentiment Analysis of Twitter Responses to Video Assistant Referees in the Premier League during the 2019-2020 Season Master’s Thesis For Completion of Degree: M. Sc. Sport and Exercise Science TUM Department of Sport and Health Sciences Technical University of Munich Supervisor: Dr. Otto Kolbinger Chair of Performance Analysis and Sports Informatics Department of Sport and Health Sciences Submitted by: Melanie Knopp [email protected] Arcisstraße 21, 80333 München Submitted: Munich, July 6, 2020 Knopp - Twitter Responses to Video Assisted Referee Calls Acknowledgment First of all, I would like to thank my thesis advisor Dr. Otto Kolbinger for giving me the opportunity to write this thesis in a manner most fitting to me and for his patience and support. Whenever I ran into trouble with my writing or needed coding or other assistance, he pointed me in the right direction. Additionally, his understanding of my situation was very much appreciated. An additional thanks goes to my instructors both at Pepperdine University and at the Technical University of Munich for providing me with the tools and knowledge necessary not only to be successful in my career but also to develop a passion for this field. Special acknowledgment also goes to Marcel Knopp (TUM Biomedical Computing) for assisting with the processing of the data files as he has a much more powerful computational environment than my laptop. I would also like to acknowledge Dr. Michael Knopp for his valuable comments on this thesis. Furthermore, I would like to express my gratitude to Dr. Daniel Ruiz for believing in me every step of the way and for being so understanding. Finally, I would like to thank both the Menter’s, particularly Chris, and my family for their continuous support and encouragement and for giving me a balance to the situation. This accomplishment would not have been possible without all of them. Thank you! Munich, July 6, 2020 MELANIE KNOPP Technical University Munich i Knopp - Twitter Responses to Video Assisted Referee Calls I. Abstract Introduction: In soccer, the video assistant referee (VAR), which refers to a match official who reviews video footage and communicates with the head referee, has relatively recently been introduced to the game. While previous research has examined the descriptive statistics of the impact of VAR on soccer, so far, no research has examined fans’ reaction to its implementation. One way to study how fans feel about its introduction to soccer is to use Twitter analysis collecting input from many different fans all around the world. Therefore, this report examines the use of Twitter as a technology to obtain data from fans in real time coordinated through various time zones in response to live events as its happening during a broadcasted sports event. In more detail, the aim of this study is to analyze how sports fans on Twitter react during soccer games in response to VAR used in the English Premier League via sentiment analysis to examine the emotions demonstrated in the published tweets. Methods: For this study, the Twitter data, made up of 643,251 tweets posted during the examined games of the 2019-2020 English Premier League season, was collected and analyzed. Two different text mining techniques were applied for both event detection and sentiment analysis. The first method consisted of a frequency analysis of the content of tweets to determine if and when the VAR was involved and a lexicon-based system to calculate the corresponding sentiment. The second method attempted to do both event detection of VAR incidents and sentiment analysis with the help of a supervised machine learning algorithm. Since the machine learning method performed better, this was used to categorize or label each tweet accordingly. Results: The results showed VAR tweets have a significantly lower sentiment value than the collected non-VAR tweets. Furthermore, when games were split into quantiles and examined, it was found that VAR tweets were overrepresented in quantiles with low average sentiment values. When examining triggers to VAR peaks, the sentiment of the period before the trigger was found to be significantly higher than the period directly after the trigger as well as higher still than the later period. Conclusions: Distinct results emerged suggesting fans on Twitter express a significant negative opinion towards VAR usage during the examined games. Indicating that, in this case, the main objective of VAR to correct clear and obvious errors without significantly interrupting the game is not being achieved. While the findings in this research are based specifically on the Premier League’s usage of VAR, implications can be highlighted for other leagues and tournaments about the potential reactions from fans if VAR is administered in the same way. Keywords: Video Assistant Referee, Soccer, Football, Twitter, Social Media, Text Mining, Event Detection, Sentiment Analysis, Sentiment Lexicon, Machine Learning Technical University Munich ii Knopp - Twitter Responses to Video Assisted Referee Calls II. Table of Contents Acknowledgment ......................................................................................................... i I. Abstract ................................................................................................................... ii 1. Introduction ........................................................................................................... 1 1.1 Video Assistant Referee ................................................................................................ 1 1.2 Social Media and Twitter in Sports ............................................................................... 4 1.3 Twitter as a Backchannel for Television and Current Events ........................................ 8 1.4 Who is on Twitter, Why are they using it, and to Whom are they talking? ................... 9 1.5 Event Detection with Twitter ...................................................................................... 12 1.6 Sentiment Analysis ..................................................................................................... 13 1.7 Lexicon-Based Approach For Sentiment Analysis ...................................................... 16 1.8 Machine Learning Approach For Sentiment Analysis ................................................. 18 1.9 Aim ............................................................................................................................. 20 2. Methods ................................................................................................................ 21 2.1 Sample ........................................................................................................................ 21 2.2 Data Access: Twitter API ............................................................................................ 22 2.3 Lexicon-Based Approach ............................................................................................ 24 2.3.1 General Data Pre-Processing for Lexicon-Based Approach ................................................ 24 2.3.2 Event Detection of VAR Incidents ...................................................................................... 25 2.3.3 Sentiment Analysis ............................................................................................................ 26 2.3.4 Finalized Data for Export ................................................................................................... 27 2.3.5 Limitations of the Lexicon-Based Method ......................................................................... 28 2.4 Machine Learning Method .......................................................................................... 29 2.4.1 Preparing Training Data for Supervised Learning .............................................................. 29 2.4.2 Three Different Text Classification Learning Algorithms .................................................. 32 2.4.3 Naïve Bayes Classifier ........................................................................................................ 33 2.4.4 Support Vector Machine ..................................................................................................... 34 2.4.5 Random Forest ................................................................................................................... 34 2.4.6 Generalized Tuning ............................................................................................................ 36 2.4.7 Accuracy Results of Machine Learning Method ................................................................. 37 2.5 Method Selection: Lexicon-Based vs Machine Learning ............................................. 39 2.5.1 Lexicon-Based Results of Arsenal vs Brighton Match ........................................................ 40 2.5.2 Machine Learning Results of Arsenal vs Brighton Match ................................................... 41 2.5.3 Accuracy Results of Both Methods ..................................................................................... 42 2.6 Data Processing .......................................................................................................... 44 2.6.1 General Pre-Processing of Game Tweets ............................................................................ 45 2.6.2 Prediction Algorithms For Game Tweets ...........................................................................