Master Thesis Project 15P, Spring 2019 Winner Prediction of Blood Bowl 2 Matches with Binary Classification
Faculty of Technology and Society
Department of Computer Science and Media Technology
Master Thesis Project 15p, Spring 2019

Winner Prediction of Blood Bowl 2 Matches with Binary Classification

By Andreas Gustafsson

Supervisors: Jose Maria Font Fernandez, Alberto Enrique Alvarez Uribe
Examiner: Johan Holmgren

Contact information

Author: Andreas Gustafsson
Supervisors:
Jose Maria Font Fernandez, Malmö University, Department of Computer Science
Alberto Enrique Alvarez Uribe, Malmö University, Department of Computer Science
Examiner: Johan Holmgren, Malmö University, Department of Computer Science

Contents

Abstract
Popular Science Summary
Acknowledgement
1 Introduction
1.1 Motivation
1.2 Aim and Objectives
1.3 Research Questions
1.4 Expected Outcome
1.5 Summary
2 Related Work
2.1 Outcome Prediction
2.2 Player modelling
2.3 Machine Learning
2.3.1 Supervised Learning
2.3.1.1 Binary Classification
2.3.1.2 Decision Trees
2.3.1.3 Ensemble Methods
2.3.1.4 Support Vector Machines
2.3.1.5 Naive Bayes Methods
2.3.1.6 k-Nearest Neighbors
2.3.1.7 Logistic Regression
2.3.1.8 Multilayer Perceptron
2.4 Summary
3 Preliminaries: Blood Bowl 2
3.1 Terminology
3.2 Description
3.2.1 Statistics
3.2.2 Races
3.3 Examples of Playing
3.4 Community Aspects
4 Proposed Approach
4.1 Considerations
4.2 Data Generation
4.3 Features
4.4 Datasets
5 Method
5.1 Motivation
5.2 The Experiment
5.3 Measurements
5.4 Classifiers
5.5 Hyper-Parameter Search
6 Result
6.1 Classification Performance
6.1.1 Base Dataset (D1)
6.1.2 Dataset with Races (D2)
6.1.3 Dataset with Play-styles (D3)
7 Analysis and Discussion
7.1 Classification Performance and Datasets
7.2 Validity Threats and Limitations
8 Conclusions and Future Work
8.1 Conclusions
8.2 Future Work
References
9 Appendix A
9.1 Replication Data

Abstract

Being able to predict the outcome of a game is useful in many respects. Two of them are helping designers understand how players actually play the game and helping them balance the elements within it. If the outcome of games could be predicted with certainty, the design process could possibly evolve into a more experiment-based approach in which cause and effect can be observed to some degree. Previous work has shown that game outcomes can be predicted with varying degrees of success. However, there is a lack of research that compares and evaluates several different models on the same domain with common aims. To narrow this gap, an experiment is conducted to compare and analyze seven different classifiers within the same domain. The classifiers are then ranked against each other on accuracy with the help of appropriate statistical methods. The classifiers compete on the task of predicting which team will win or lose a match of the game Blood Bowl 2. To add nuance, three different datasets are constructed for the models to be trained on. While the results vary between models and datasets, the statistical comparisons show an identifiable general pattern of rejected null hypotheses. The results also indicate strong accuracy for Support Vector Machine and Logistic Regression across all the datasets.
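The abstract says the classifiers are ranked on accuracy "with the help of appropriate statistical methods" but does not name them at this point. As an illustration only, the sketch below assumes the widely used Friedman approach: classifiers are ranked within each evaluation split (for example, a cross-validation fold), the ranks are averaged, and the Friedman chi-square statistic measures whether the average ranks differ more than chance would suggest. The function names and accuracy values are hypothetical and are not the thesis's code or results.

```python
# Hedged sketch: Friedman-style ranking of classifiers on accuracy.
# ASSUMPTION: the thesis's actual statistical procedure is not shown here;
# function names and the accuracy values below are hypothetical.
from typing import Dict, List


def average_ranks(scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Rank classifiers within each split (1 = best accuracy; tied
    classifiers share the mean of their ranks) and average over splits."""
    names = list(scores)
    n_splits = len(scores[names[0]])
    totals = {name: 0.0 for name in names}
    for s in range(n_splits):
        # Best-to-worst order for this split; tied classifiers end up adjacent.
        ordered = sorted(names, key=lambda n: scores[n][s], reverse=True)
        i = 0
        while i < len(ordered):
            j = i
            while j + 1 < len(ordered) and scores[ordered[j + 1]][s] == scores[ordered[i]][s]:
                j += 1
            shared_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
            for name in ordered[i:j + 1]:
                totals[name] += shared_rank
            i = j + 1
    return {name: totals[name] / n_splits for name in names}


def friedman_chi_square(avg_ranks: Dict[str, float], n_splits: int) -> float:
    """Friedman statistic without tie correction:
    12N / (k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4)."""
    k = len(avg_ranks)
    sum_sq = sum(r * r for r in avg_ranks.values())
    return 12 * n_splits / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4)


if __name__ == "__main__":
    # Hypothetical per-split accuracies, NOT results from this thesis.
    accuracy = {
        "SVM": [0.88, 0.87, 0.89],
        "LR": [0.87, 0.87, 0.88],
        "kNN": [0.80, 0.82, 0.81],
    }
    ranks = average_ranks(accuracy)
    print(ranks)
    print(friedman_chi_square(ranks, n_splits=3))
```

With k classifiers, the statistic would be compared with a chi-square distribution with k - 1 degrees of freedom, and a post-hoc test (for example Nemenyi's) would still be needed to decide which specific pairs of classifiers differ.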
Keywords: Machine learning; Blood Bowl 2; Predict winner; Outcome prediction; Supervised learning; Binary classification; Match prediction.

Popular Science Summary

Can the computer predict who will win a match of Blood Bowl 2? Yes! 88% of the time it correctly guesses which team will win. This is important because it shows that, given enough time to test many different settings and algorithms, accurate guesses can be made even for complicated games like Blood Bowl 2. If we document what has been tried, and how well it worked, for many different problems, it becomes easier to know which algorithms to try first on a new problem that resembles problems already studied.

This discovery is good for the curious player community of Blood Bowl 2, as many players try to understand the game even better. It also gives others a starting point: ideas that might work for their own, similar problems. If we can accurately guess who will win, it will help a lot during the development of new games. Game creators will be able to test new ideas quickly and see how they change the game, instead of having to let players act as test subjects.

How can the computer guess so well? It looks at matches that have been played before and finds those most similar to the match that is about to be played. If it knows enough about how those similar matches went, it can make an educated guess about how this match will go.

There is still a lot of room for improvement in the findings of this study, but it is believed to be a step in the right direction toward truly being able to predict what will happen in a match of Blood Bowl 2, rather than having to wait and see how the match plays out.
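The "find similar past matches" intuition in the popular science summary corresponds most directly to k-nearest neighbours, one of the classifiers compared in the thesis; it is not meant to describe how the best-performing models (Support Vector Machine, Logistic Regression) work. The minimal sketch below assumes two hypothetical, already-scaled features per match and a hypothetical label convention (1 if the first team won, 0 if it lost); neither is taken from the thesis's datasets.

```python
# Hedged sketch of the "find similar past matches" intuition as k-nearest
# neighbours. ASSUMPTIONS: the two features and the label convention
# (1 = first team won, 0 = first team lost) are hypothetical.
import math
from collections import Counter
from typing import List, Sequence, Tuple

Match = Tuple[Sequence[float], int]  # (feature vector, outcome label)


def predict_winner(history: List[Match], upcoming: Sequence[float], k: int = 3) -> int:
    """Return the majority outcome among the k past matches whose feature
    vectors are closest (Euclidean distance) to the upcoming match."""
    nearest = sorted(history, key=lambda match: math.dist(match[0], upcoming))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical, already-scaled features:
    # (team-value difference, win-rate difference) between the two teams.
    past_matches: List[Match] = [
        ((0.5, 0.2), 1),
        ((0.4, 0.1), 1),
        ((0.3, 0.3), 1),
        ((-0.3, -0.2), 0),
        ((-0.5, -0.1), 0),
    ]
    print(predict_winner(past_matches, (0.35, 0.15)))   # the three nearest are all wins
    print(predict_winner(past_matches, (-0.4, -0.15)))  # two of the three nearest are losses
```

An odd k avoids tied votes in a two-class problem, and features on different scales should be standardised first, since Euclidean distance is otherwise dominated by the largest-valued feature.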
Acknowledgement

Special thanks to Jose Font and Alberto Alvarez for all the supervision, feedback, and excellent help during times of confusion. Also, many thanks to Carl Magnus Olsson for introducing me to, and coaching me through, the world of Blood Bowl. Finally, thanks to everyone in the Blood Bowl community who helped with various questions and offered unyielding support.

List of Figures

2.1 Illustration of a decision tree (DT)
2.2 Illustration of the main idea behind a Support Vector Machine (SVM)
2.3 Illustration of k-Nearest Neighbors (kNN)
2.4 Illustration of a multilayer perceptron (MLP)
3.1 Illustrations of all races in Blood Bowl 2
3.2 Bird's-eye view of the Blood Bowl 2 playfield
3.3 Start of a match in Blood Bowl 2
3.4 Active turn in Blood Bowl 2
3.5 Showing a player carrying the ball in Blood Bowl 2
6.1 Plot of accuracy of classifiers from D1
6.2 Plot of accuracy of classifiers from D2
6.3 Plot of accuracy of classifiers from D3

List of Tables

4.1 Example of general rows in the dataset
5.1 Section index of Classifiers
6.1 Accuracy of classifiers from D1
6.2 Statistical relevance of comparisons for classifiers from D1
6.3 Confusion matrix of Dummy model for D1
6.4 Confusion matrix of the Gaussian Naive Bayes model for D1
6.5 Confusion matrix of the Decision Tree model for D1
6.6 Confusion matrix of the k-Nearest Neighbors model for D1
6.7 Confusion matrix of the Support Vector Machine model for D1
6.8 Confusion matrix of the Logistic Regression model for D1
6.9 Confusion matrix of the Random Forest for D1
6.10 Confusion matrix of the Multilayer Perceptron for D1
6.11 Accuracy of classifiers from D2
6.12 Statistical relevance of comparisons for classifiers from D2
6.13 Confusion matrix of Dummy model for D2
6.14 Confusion matrix of the Gaussian Naive Bayes model for D2
6.15 Confusion matrix of the Decision Tree model for D2
6.16 Confusion matrix of the k-Nearest Neighbors model for D2
6.17 Confusion matrix of the Support Vector Machine model for D2
6.18 Confusion matrix of the Logistic Regression model for D2
6.19 Confusion matrix of the Random Forest for D2
6.20 Confusion matrix of the Multilayer Perceptron for D2
6.21 Accuracy of classifiers from D3
6.22 Statistical relevance of comparisons for classifiers from D3
6.23 Confusion matrix of Dummy model for D3
6.24 Confusion matrix of the Gaussian Naive Bayes model for D3
6.25 Confusion matrix of the Decision Tree model for D3
6.26 Confusion matrix of the k-Nearest Neighbors model for D3
6.27 Confusion matrix of the Support Vector Machine model for D3
6.28 Confusion matrix of the Logistic Regression model for D3
6.29 Confusion matrix of the Random Forest for D3