Bloodbowl 2 Race Clustering by Different Playstyles Master Thesis
Blood Bowl 2 race clustering by different playstyles
Master Thesis Project 15p, Spring 2020
Author: Tadas Ivanauskas
Supervisors: Jose Maria Font Fernandez
Examiner: Johan Holmgren

Contact information
Author: Tadas Ivanauskas
E-mail: [email protected]
Supervisors: Jose Maria Font Fernandez
E-mail: [email protected]
Malmö University, Department of Computer Science and Media Technology.
Examiner: Johan Holmgren
E-mail: [email protected]
Malmö University, Department of Computer Science and Media Technology.

Abstract
The number of features and the number of instances have a significant impact on the computation time and memory footprint of machine learning algorithms. Reducing the number of features reduces the memory footprint and computation time while allowing the number of instances to remain constant. This thesis investigates feature reduction by clustering. Nine clustering algorithms and three classification algorithms were used to investigate whether categories obtained by clustering algorithms can replace the original attributes in the data set with minimal impact on classification accuracy. The video game Blood Bowl 2 was chosen as the study subject, and Blood Bowl 2 match data was obtained from a public database. The results show that the cluster labels cannot be used as a substitute for the original features, as the substitution had no effect on the classifications. Furthermore, the cluster labels had relatively low weight values and would be excluded by activation functions in most algorithms.

Popular science summary
You are not playing a hybrid; you are just bad at the game. The Blood Bowl 2 community has long argued about which race falls into which category. Finally, the answer is here.
Quite a few community members have tried to group the races objectively by playstyle; the closest we got was arguing on online message boards. This time we used 200 000 matches and asked the machines to do it for us. By looking at meters run, stuns, blocks, passes and other numbers, machine learning algorithms found three playstyles, namely bash, dash and hybrid. The interesting difference between these playstyles is that hybrid performs worse than the other two at nearly everything. The races do not fall neatly into categories; rather, they lean towards them. This means that it is possible to play bash with elves and dash with dwarves, but the chances of winning the match drop significantly. The findings of this paper can provide some insight for the developers to balance the races and help the community formulate better strategies for their preferred races.

Acknowledgment
Special thanks to Kamilla Klonowska for finding time to provide feedback and support. Also, many thanks to Jose Maria Font Fernandez and Alberto Enrique Alvarez Uribe for the topic idea and for setting me on the correct path. Finally, thanks to Andreas Harrison, who provided help and support in gathering the data used in this research.

Contents
1 Introduction
1.1 Background
1.2 Motivation
1.3 Aim and Objectives
1.4 Research Questions
1.5 Chapter Summary
2 Related work
2.1 Sports Games Modeling
2.2 Machine Learning
2.3 Data Preprocessing
2.4 Chapter Summary
3 Preliminaries: Blood Bowl 2
3.1 Terminology
3.2 Game Description
3.3 Player Statistics
3.4 Races
3.5 Match Statistics
3.6 Chapter Summary
4 Proposed Approach
4.1 Data Acquisition
4.2 Data Preparation
4.3 Number of Clusters
4.4 Chapter Summary
5 Method
5.1 Motivation
5.2 The Experiment
5.3 Measurements
5.4 Algorithms
5.5 Chapter Summary
6 Results
6.1 Clustering Results
6.2 Classification Results
6.3 Chapter Summary
7 Analysis and Discussion
7.1 Clustering Analysis
7.2 Generalization
7.3 Classification Analysis
7.4 Validity Threats and Limitations
7.5 Chapter Summary
8 Conclusion and Future Work
8.1 Conclusion
8.2 Future Work
8.3 Chapter Summary
9 Appendices
9.1 Source Code
9.2 Heat-map
9.3 PCA Figures
9.4 Clustering Figures

List of Figures
2.1 Visualization of K-means clustering.
2.2 Scree plot showing the relationship between the number of clusters and the sum of squared errors.
2.3 Example dendrogram (right) clustering six data points.
2.4 Mean shift centroid movement.
2.5 Fitting and transforming the data using PCA.
3.1 Races in Blood Bowl 2 Legendary edition.
4.1 Explained variance using 34 attributes containing 24 components.
4.2 Elbow method showing 3 clusters.
6.1 Race dendrograms displaying three clusters obtained through hierarchical clustering.
6.2 K-means clustering results.
6.3 K-means silhouette index.
6.4 Affinity propagation clustering results.
6.5 Silhouette index for affinity propagation clustering.
6.6 BIRCH clustering results.
6.7 Silhouette index for BIRCH clustering.
6.8 Spectral clustering results.
6.9 Silhouette index for spectral clustering.
6.10 DBSCAN clustering results.
6.11 Silhouette index for DBSCAN clustering.
6.12 OPTICS clustering results.
6.13 Silhouette index for OPTICS clustering.
6.14 Gaussian clustering results.
6.15 Silhouette index for Gaussian clustering.
6.16 Mean shift clustering results.
6.17 Silhouette index for mean shift clustering.
7.1 Box plot showing passes for each cluster.
7.2 Experience statistics for each cluster.
7.3 Breaks for each race.
9.1 Heat-map depicting the similarity between features.
9.2 Explained variance using 34 attributes containing 24 components. Full size figure.
9.3 Elbow method showing 3 clusters. Full size figure.
9.4 Race dendrograms displaying three clusters obtained through hierarchical clustering. Full size figure.
9.5 K-means clustering results. Full size figure.
9.6 Affinity propagation clustering results. Full size figure.
9.7 BIRCH clustering results. Full size figure.
9.8 Spectral clustering results. Full size figure.
9.9 DBSCAN clustering results. Full size figure.
9.10 OPTICS clustering results. Full size figure.
9.11 Gaussian clustering results. Full size figure.
9.12 Mean shift clustering results. Full size figure.

List of Tables
2.1 Example data that is to be grouped by animal.
2.2 Data from the previous table grouped by animal.
5.1 Clustering algorithms that are tested in this study.
6.1 Overview of clustering results.
6.2 Classification results.
6.3 Gaussian Naive Bayes confusion matrix.
6.4 Logistic Regression confusion matrix.
6.5 SVC confusion matrix.
6.6 Feature weights for classification algorithms.
6.7 Classification results.

List of Acronyms
BIRCH Balanced Iterative Reducing and Clustering using Hierarchies
CSV Comma Separated Values
DBSCAN Density-Based Spatial Clustering of Applications with Noise
JSON JavaScript Object Notation
OPTICS Ordering Points To Identify the Clustering Structure
PCA Principal Component Analysis
SVC Support Vector Classification
XML Extensible Markup Language

Chapter 1 Introduction
This chapter introduces the thesis work and frames it within the context of existing work.
The following sections describe the topic and how it relates to computer science, and state the motivations for conducting the research. The aims and objectives are stated explicitly, and the research questions are presented and described.

1.1 Background
This thesis is based on, and continues, previous research on a similar topic by Andreas Gustafsson [1]. That research attempts to predict the winner of Blood Bowl 2 matches using machine learning classification algorithms. Gustafsson used a data set constructed from a publicly available database that contains log data of game matches. The research included grouping the game races by playstyle, using categories that a community member named Schlice constructed through statistical analysis [2]. However, Schlice's analysis is an informal blog post that lacks validation and critical analysis, and the races appear to shift categories when several leagues of the game are analyzed. This thesis employs machine learning algorithms to cluster the races by playstyle, using game log files acquired over a span of five months. A. Gustafsson emphasized the importance of the coaches (human actors), whereas this study ignores the human factor altogether. Human actors are strictly limited by the rules of the game, of which there are plenty [3, 4]. Grouping races and playstyles by coach proved challenging for A. Gustafsson because most coaches had played very few games. A. Gustafsson also mentioned that using a deep neural network was not possible due to long training times. This research therefore attempts to reduce the number of features in the original data set. Feature selection has long been an important problem in machine learning, and several techniques have been proposed [5, 6].
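The idea of reducing features by clustering can be illustrated with a small sketch: several per-match statistics are standardized, grouped with k-means, and each match is then represented by its cluster label alone. This is a minimal pure-Python sketch under stated assumptions, not the thesis implementation: the statistic names, the toy values, k = 2 and the deterministic farthest-point initialization are hypothetical choices made for illustration, and k-means is only one of the clustering algorithms the thesis compares.

```python
from math import dist
from statistics import fmean, pstdev

# Hypothetical per-match statistics (toy values, not thesis data):
# [meters run, passes, blocks, stuns]
MATCHES = [
    [80, 1, 30, 6],   # bash-like matches
    [90, 0, 28, 5],
    [85, 1, 32, 7],
    [210, 6, 10, 1],  # dash-like matches
    [200, 5, 12, 2],
    [220, 7, 9, 1],
]


def standardize(rows):
    """Z-score each column so large-valued stats (meters run) do not dominate."""
    cols = list(zip(*rows))
    means = [fmean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption): start at the first point, then
    repeatedly add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [fmean(col) for col in zip(*members)]
    return labels, centroids


def reduce_to_cluster_label(rows, k):
    """Replace every multi-column row with a single cluster-label column."""
    labels, _ = kmeans(standardize(rows), k)
    return [[label] for label in labels]


reduced = reduce_to_cluster_label(MATCHES, k=2)
print(len(MATCHES[0]), "columns ->", len(reduced[0]), "column")
print([row[0] for row in reduced])  # -> [0, 0, 0, 1, 1, 1]
```

Running it prints `4 columns -> 1 column` followed by the label of each match. Standardizing first matters because raw meters run would otherwise dominate the Euclidean distance. Whether such a label can stand in for the original columns without hurting classification accuracy is the question the thesis investigates; the sketch covers only the reduction step.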