Predicting Teamfight Tactics Results with Machine Learning Techniques

Independent thesis (degree project), 15 credits, for the degree of Bachelor of Science (180 credits) with a major in Computer Science Spring Semester 2020 Faculty of Natural Sciences Predicting Teamfight Tactics Results with Machine Learning Techniques Andrei Casian Martin Zannato Author Andrei Casian Martin Zannato Predicting Teamfight Tactics Results with Machine Learning Techniques Supervisor Kamilla Klonowska Examiner Dawit Mengistu Abstract The scope of this thesis is to determine whether Machine Learning techniques used in predicting traditional sports results are relevant in predicting electronic sports results. Moreover, it explores whether there is a hidden pattern in a team that leads to a win rather than just plain team composition. The selected video game from electronic sports was Teamfight Tactics. The literature study was conducted on journals, articles and projects that predicted traditional sports outcomes, from which the Machine Learning models were collected. Literature concerning Machine Learning algorithms, their advantages and caveats was studied as well for the scientific background and implementation. Finally the experiment consisted in implementing models identified in literature study and analyzing the results. The experiment showed that predicting electronic sports with Machine Learning techniques used in traditional sports is not only relevant but also more accurate, due to the complex nature of the video game and data availability which yields a dataset with high dimensionality and data points.The second part of the experiment showed that indeed there is a pattern in a team that leads to higher winning chances. It was demonstrated by encoding the dataset with word2vec (encoder used in natural language processing) and visualizing the data through the Embedding Projector (a tool that shows patterns in data through clusters). Keywords Machine Learning, Data Preprocessing, Feature Selection, Overfitting, Curse of Dimensionality Contents Acknowledgements 5 Abbreviations 6 1. Introduction 7 1.1 Related Work 8 1.2 Research Questions 9 1.3 Aim and Purpose 9 1.4 Methods 11 1.5 Thesis Structure 12 2. Challenges & Limitations 13 2.1 Challenges 14 2.2 Limitations 15 2.2.1 Missing Features 15 2.2.2 Missing Items 15 2.2.3 Hardware 15 3. Background 17 3.1 TFT Rules 17 3.2 ML Algorithms 18 3.2.1 Multiple Linear Regression 18 3.2.2 Logistic Regression 18 3.2.3 Support Vector Machines 19 3.2.4 K-Nearest Neighbour 20 3.2.5 Gaussian Naive Bayes 21 3.2.6 Decision Trees 21 3.2.7 Random Forest 22 3.2.8 Feedforward Neural Networks 23 4. Dataset 24 4.1 Data Generation 24 4.2 Feature Selection 25 4.2.1 Data distribution 26 4.3 Feature Engineering 27 4.3.1 Encoding 27 4.3.2 Standardization 29 5. Experimentation & Optimisation 30 5.1 Implementation 31 5.1.1 Models Using common Implementation Steps 31 5.1.2 Deep Feedforward Neural Network 34 5.2 Evaluation 36 5.2.1 Underfitting and Overfitting 36 5.2.2 Model Performance 38 5.2.3 Parameter Tuning 39 6. Results 40 6.1 Regression 40 6.2 Classification 41 6.3 BinaryEncoder vs Word2Vec 42 7. Discussion and Analysis 43 7.1 Result analysis 43 7.1.1 Comparison of ML in Sports and Esports 43 7.1.2 Identifying a Hidden Pattern 44 7.2 Ethical aspects 45 8. Conclusion 46 8.1 Research questions 46 8.2 Future work 47 9. References 48 Appendix A 54 Appendix B 55 Appendix C 58 Appendix D 59 Appendix E 60 Appendix F 61 Acknowledgements We would like to thank our supervisor Kamilla Klonowska who believed in our project idea. For her help and support throughout our work. Abbreviations DFNN Deep feedforward neural network CSV Comma Separated File CV Cross Validation DOM Document Object Model ESPORTS Electronic Sports IP Intellectual property LIR Linear Regression LOR Logistic Regression ML Machine Learning MLR Multiple Linear Regression RMSE Root Mean Square Error TFT Teamfight Tactics PUUID Player User Unique Identification 1. Introduction In spite of Machine Learning being considered a technology of the future, it actually persisted in people’s daily lives for years now. Notorious early usages of Machine Learning (ML) techniques were applied for email spam filtering in 2002, using Bayesian Algorithms [1], for detecting payment card fraud in 2000, using Deep Feedforward Neural Network (DFNN) [2] as well as in many other domains without people even realizing it. In his book, Aurélien Géron defines ML as: “... the science (and art) of programming computers so they can learn from data.“ [3]. It couldn’t be described better since science (in the aforementioned quote) encompasses the ensemble of mathematical formulas implemented in a computer language to represent an algorithm and the art refers to the fact that the programming can be achieved in many different ways. ML techniques are great for solving complex problems with vast datasets where traditional algorithms would fail. And because of this nature, ML transcends essential sectors of infrastructure (medecine [4], economics [5], agriculture [6]) to entertainment (sports, video games) as well. A well known usage of ML in sports is to predict wins and placements whether it is for betting purposes or analysis. Electronic Sports (esports) [7] share common traits with traditional sports, such as: competitive play, fixed set of rules, win prizes, large audiences [8]. With all the similarities, esports are more complex than traditional sports and therefore form a perfect base for ML algorithms. Predicting esports match-results might be useful not only for the players, coaches and spectators but for the game developers as well as it can bring light on how well a game is balanced, how likely a player will play the game. This thesis will capitalize on the complex datasets that the online video game Teamfight Tactics (TFT) [9] generates, to implement ML algorithms used in traditional sports. The goal is to predict a discrete variable (win/loss) as well as a continuous one (placement) and find the model with the most accurate prediction. 7 1.1 Related Work This chapter introduces various ML terminologies which are explained later in this thesis in Chapter 3 and Chapter 4. Herbinet [10] in his research used regression and classification models to predict the score and the outcome of football matches. The focus of his study was to predict accurate results based on in-game match events rather than just the number of goals scored by a team. For the outcome of a match he used various classification models such as Logistic Regression (LOR), K-Nearest-Neighbours (KNN), Gaussian Naive Bayes (GNB). He finally obtained an Accuracy of 0.511. To predict a match score he used regression models such as Linear Regression (LIR), Support Vector Machine (SVM), Random Forest (RF). The best accuracy he obtained from these models was 0.446. Bunker et al. [11] consolidated the usage of ML in various sports thanks to its ability of predicting a target variable in previously unseen data. Although they mentioned predicting a continuous variable such as the match score in Herbinet’s work, Bunker et al. treated sport prediction as a classification problem. The result of their work was a pipeline based on the CRISP-DM framework, using the Deep Feedforward Neural Network (DFNN) as the selected prediction model. Yang [12] has built a model to predict the winning team in a video game called Overwatch by focusing on synergies of the heroes in that team. Unlike the models described in Herbinet and Bunker et al. work, his model is designed to predict the outcome on the fly rather than pre match. He was interested in increasing the accuracy prediction in early game progress, a task that other models underperformed. He explained the importance of encoding categorical features accordingly. For that he compared champions1 in a team to words in a sentence. Randomly arranged they might not make sense but in a sentence they carry a meaning. Similarly, champions create synergies when combined accordingly. He then presented how the most popular categorical encoder - OneHotEncoding (OHE) fails to preserve the synergy because of 1 A single unit that can be placed in a team. 8 its nature of creating a new dimension for every different word, not capturing in this way the semantic. His approach was to use distributed representations: “Technique which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences.” [13]. He managed to obtain an early accuracy increase of 7% in his LOR model by using Word2vec - a two layer neural network that processes data by vectorizing words, compared to OneHotEncoding implementation. Eggels et al. [14] used the results of football matches after their conclusion to predict which team was supposed to win, instead of which team actually won. The researchers used LOR, Decision Tree (DT), RF and DTs boosted with Ada-boost. The best accuracy they obtained was 0.785 with Random Forest implementation. In the researchers’ work 11 features were used collected from 5017 matches. Gu et al. [15] created a ML prediction and data gathering framework for hockey games. They gathered data by mining it from the web and retrieved such information as results of previous matches, player performance indicators, opposition information. They collected 18 features to evaluate players on. Their scope was to predict winning chances and therefore they used classification models. The researchers used DTs, KNN, SVM, Naïve Bayes (NB) with a training set of 2000 and a test set of 638. The worst prediction the researchers obtained was from KNN (0.84 accuracy) and the other tested ML algorithms (SVM, NB, DT) all scored similarly within the 0.9-0.91 range. 1.2 Research Questions Q1: Are ML techniques used to predict traditional sports results applicable for esports? Q2: Can ML techniques detect a pattern in a team’s attribute set that will provide higher winning chances? 1.3 Aim and Purpose Teamfight Tactics (TFT) is an online auto-battler [16] from Riot Games.

Load more