Extracting Control Features to Predict a Player’s League in StarCraft II

Chanmin Lee
School of EECS, Gwangju Institute of Science and Technology
Gwangju, Republic of Korea

Chang Wook Ahn∗
AI Graduate School, Gwangju Institute of Science and Technology
Gwangju, Republic of Korea

ABSTRACT
In StarCraft II, players are divided into seven leagues according to their skill. Because games work best when opponents have similar skill, it is important that each player is placed in the right league. This paper proposes a method to predict a player’s league. The method extracts control features from game replays and introduces a useful feature that is not well known. We use a Random Forest to verify the importance of the features.

CCS CONCEPTS
• Computing methodologies → Supervised learning by classification; • Information systems → Extraction, transformation and loading.

KEYWORDS
StarCraft II, League, Control

ACM Reference Format:
Chanmin Lee and Chang Wook Ahn. 2020. Extracting Control Features to Predict a Player’s League in StarCraft II. In . ACM, New York, NY, USA, 3 pages.

∗Corresponding author

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
SMA 2020, September 17-19, 2020, Jeju, Republic of Korea
© 2020 Copyright held by the owner/author(s).

1 INTRODUCTION
For a game to become popular, it must attract a large number of players. In single-player games, where players do not interact with one another, developers must keep players interested through continuous updates of game content. In competitive multiplayer games such as StarCraft II, however, player skill must be considered as well as game content [1]. A player who is matched against opponents of very different skill will keep losing or keep winning, and will soon lose interest. Determining each player’s skill accurately is therefore essential for matching players of similar skill [2]. StarCraft II distributes players into seven leagues according to their skill. The current system places a player in a league through the first five placement games. The weakness of this system is that even if a professional player wins all of the placement games, or a new player loses all of them, they will not be placed in the highest or lowest league. In addition, playing five games takes a long time.

This study suggests a method of predicting a player’s league by extracting numerical features associated with control from replays. A replay records the game log and allows past games to be reviewed, so human play data can be used instead of data from simulation. We use the average value of each feature over a certain interval. There are many options for choosing this interval, such as the first five minutes, combat, or the entire game. A player’s characteristics may be reflected strongly in a short, important moment, or they may accumulate over a long period. Our purpose is to compare datasets extracted from six different sections and determine which sections yield good performance.

StarCraft II is a real-time strategy game, so build order, which makes up most of a strategy, is the most important element: the order in which buildings are constructed changes resource collection and consumption as well as unit composition. However, this study does not use build-order data and focuses on control instead. Even when only control factors are analyzed, without the build order, the prediction accuracy is high.

2 METHODOLOGY
The replay dataset was collected from Spawning Tool, a StarCraft II community site. It consists of 46,398 replays of games played between April 2015 and August 2019. Six datasets were constructed, each extracting data from a different section of the game. D1 contains data recorded during combat; the average combat lasted 18 s. D2 covers the first 18 s after the game starts, D3 the last 18 s before the game ends, D4 the first 5 min after the game starts, D5 the last 5 min before the game ends, and D6 the entire game.

Because the goal of this paper is to use the player’s control information, we extracted features related to control. The first feature is the number of camera switches. Players see only part of the game map through the camera rather than the whole map, and Oriol Vinyals et al. [3] found that the camera affects the performance of their agent. We therefore expected the number of camera movements to reflect human skill. Another feature is APM (actions per minute), which has been shown to perform well [4]. Train and Build are features related to resource consumption. The control group features are setting, using, adding, and number (4 features). The command features are basic, targeted to unit, targeted to point, update unit, and update point (5 features). The last feature is the player’s race, because the characteristics of the units being controlled vary by race. The 13 features other than race are averaged per second over the extraction section (for D1, over each combat) and scaled with a min-max scaler to a minimum value of 1 and a maximum of 10.
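The extraction step above (per-second averaging over a section, then min-max scaling to [1, 10]) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors’ code: the replay is assumed to be already parsed into hypothetical (time, feature label) event pairs, the feature labels and helper names are invented for illustration, every feature is treated as an event count, and D1 is omitted because the text does not describe how combats were detected.

```python
# Hypothetical sketch: event format, feature labels, and helper names are
# illustrative assumptions, not the authors' replay parser.

# The 13 numeric control features named in the text (race is handled separately).
FEATURES = [
    "camera", "action", "train", "build",
    "cg_set", "cg_use", "cg_add", "cg_number",
    "cmd_basic", "cmd_target_unit", "cmd_target_point",
    "cmd_update_unit", "cmd_update_point",
]


def section_window(name, game_length):
    """(start, end) in seconds for datasets D2-D6, clipped to the game length.

    D1 (combat) is omitted because combat detection is not specified.
    """
    windows = {
        "D2": (0.0, 18.0),                         # first 18 s
        "D3": (game_length - 18.0, game_length),   # last 18 s
        "D4": (0.0, 300.0),                        # first 5 min
        "D5": (game_length - 300.0, game_length),  # last 5 min
        "D6": (0.0, game_length),                  # entire game
    }
    start, end = windows[name]
    return max(0.0, start), min(game_length, end)


def per_second_rates(events, start, end, features=FEATURES):
    """Events per second for each feature inside [start, end].

    `events` is a list of (time_in_seconds, feature_label) pairs.
    """
    duration = end - start
    if duration <= 0:
        return [0.0] * len(features)
    counts = dict.fromkeys(features, 0)
    for t, label in events:
        if start <= t <= end and label in counts:
            counts[label] += 1
    return [counts[f] / duration for f in features]


def min_max_scale(rows, low=1.0, high=10.0):
    """Column-wise min-max scaling of feature rows into [low, high].

    A constant column maps to `low`, as scikit-learn's
    MinMaxScaler(feature_range=(1, 10)) does.
    """
    columns = list(zip(*rows))
    mins = [min(c) for c in columns]
    maxs = [max(c) for c in columns]
    return [
        [low + (x - lo) * (high - low) / (hi - lo) if hi > lo else low
         for x, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]
```

For example, with events at 1.0 s and 2.5 s labelled "camera" and one "action" at 5.0 s, the D2 window (0 to 18 s) gives 2/18 camera switches and 1/18 actions per second. In practice the scaling would be fitted on the whole set of replays (one row per player and section), which is why it is applied column-wise.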

3 RESULTS
A Random Forest was used to classify each player’s league; Random Forest has previously performed well at identifying players [5]. Accuracy was computed with 10-fold cross-validation. The datasets can be divided into two groups by length. The first group, D1, D2, and D3, is short in length but differs in the point of extraction. The second group, D4, D5, and D6, is long in length. In the first group, D1 showed the highest overall accuracy, 66% (Table 1). However, the paired t-tests gave t = 2.10, p = 0.08 between D1 and D2, and t = 1.52, p = 0.18 between D1 and D3, so a slight difference exists but it is not significant. Comparing the two groups, the paired t-test gave t = -9.18, p ≪ 0.01. The length of the extraction section therefore had a much greater effect than its position, with a difference of about 9 percentage points in accuracy. The overall accuracy of D6 was 75%. T. Avontuur et al. predicted players’ leagues with 45% accuracy [4]; our result is 30 percentage points higher. Table 2 shows the heatmap of the confusion matrix for D6. Even when players were misclassified, most predictions did not deviate far from the correct league; the misclassification error distance was 1.43.

Fig. 1 shows the feature importance of the Random Forest. The top two features were action (APM) and camera switching, with importances of 0.136 and 0.115, respectively. While action has been shown to be an important feature in other studies, camera switching has received little attention [4]. Since this result shows that camera switching is related to player skill, camera switching can be considered a useful feature when analyzing StarCraft. The least important feature was race, with an importance of 0.016, so race has little effect on the results. The importances of the other features lay between 0.05 and 0.1.

Figure 1: Feature importance of Random Forest

4 CONCLUSIONS
We proposed a method for extracting control features and predicting a player’s league from them.
Extracting data from combat or other short sections gave lower performance than longer sections. This suggests that steady control has a greater impact on performance than instantaneous control. Therefore, when extracting control features, averaging them over the entire game yields high performance for predicting a player’s league.

The top two features for classifying a player’s league were APM and camera switching. Because research using camera switching as a main feature is hard to find, we suggest using camera switching as a main feature. The overall accuracy was 75%, reaching 90% for bronze. If the ranking system used these results instead of the placement games, it could place players quickly and accurately after a single game. Since bronze has the highest accuracy, placing new users correctly in bronze would also help prevent new players from leaving the game.

Table 1: Accuracy of each dataset

League        D1    D2    D3    D4    D5    D6
bronze        0.83  0.64  0.72  0.92  0.92  0.90
silver        0.67  0.58  0.64  0.73  0.70  0.73
gold          0.61  0.60  0.61  0.72  0.71  0.74
platinum      0.70  0.64  0.70  0.74  0.75  0.76
diamond       0.59  0.60  0.59  0.72  0.68  0.72
master        0.68  0.66  0.67  0.75  0.74  0.75
grandmaster   0.76  0.74  0.75  0.83  0.81  0.83
overall       0.66  0.63  0.65  0.75  0.73  0.75

Table 2: Heatmap of confusion matrix in D6
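The paired t statistics reported in the Results can be recomputed from the per-league accuracies in Table 1. The sketch below rests on an assumption the text does not state: each test pairs the seven per-league accuracies of two datasets, and the group comparison concatenates D1 to D3 and D4 to D6, so that D1 is paired with D4, D2 with D5, and D3 with D6. Under that assumption the statistics match the reported values. It uses only the standard library; p-values additionally require the t distribution (for example via `scipy.stats.ttest_rel`).

```python
import math
from statistics import mean, stdev

# Per-league accuracies transcribed from Table 1, in the order
# bronze, silver, gold, platinum, diamond, master, grandmaster
# (the "overall" row is excluded).
TABLE1 = {
    "D1": [0.83, 0.67, 0.61, 0.70, 0.59, 0.68, 0.76],
    "D2": [0.64, 0.58, 0.60, 0.64, 0.60, 0.66, 0.74],
    "D3": [0.72, 0.64, 0.61, 0.70, 0.59, 0.67, 0.75],
    "D4": [0.92, 0.73, 0.72, 0.74, 0.72, 0.75, 0.83],
    "D5": [0.92, 0.70, 0.71, 0.75, 0.68, 0.74, 0.81],
    "D6": [0.90, 0.73, 0.74, 0.76, 0.72, 0.75, 0.83],
}


def paired_t(a, b):
    """Paired t statistic for a - b and its degrees of freedom."""
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))  # sample std. dev. (n - 1)
    return t, n - 1


# Assumed pairing for the group test: D1<->D4, D2<->D5, D3<->D6.
short = TABLE1["D1"] + TABLE1["D2"] + TABLE1["D3"]
long_ = TABLE1["D4"] + TABLE1["D5"] + TABLE1["D6"]

for label, a, b in [
    ("D1 vs D2", TABLE1["D1"], TABLE1["D2"]),  # prints t = 2.10, df = 6
    ("D1 vs D3", TABLE1["D1"], TABLE1["D3"]),  # prints t = 1.52, df = 6
    ("short vs long", short, long_),           # prints t = -9.18, df = 20
]:
    t, df = paired_t(a, b)
    print(f"{label}: t = {t:.2f}, df = {df}")
```

With six degrees of freedom, t = 2.10 and t = 1.52 correspond to two-sided p-values between 0.05 and 0.2, in line with the reported p = 0.08 and p = 0.18.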

ACKNOWLEDGMENTS
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2019R1I1A2A01057603).

REFERENCES
[1] Fernando Palero, Cristian Ramirez-Atencia, and David Camacho. Online gamers classification using k-means. In David Camacho, Lars Braubach, Salvatore Venticinque, and Costin Badica, editors, Intelligent Distributed Computing VIII, pages 201–208, Cham, 2015. Springer International Publishing.
[2] Rodrigo Vicencio-Moreira, Regan Mandryk, and Carl Gutwin. Now you can compete with anyone: Balancing players of different skill levels in a first-person shooter game. In CHI’15: Proceedings of the 2015 CHI International Conference on Human Factors in Computing Systems, pages 2255–2264, Korea, 2015.
[3] Oriol Vinyals, I. Babuschkin, W. Czarnecki, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, pages 1–5, 2019.
[4] T. Avontuur, P. Spronck, and M. Zaanen. Player skill modeling in . In AIIDE, 2013 (Honourable Mention Award given to top 5).
[5] S. Liu, C. Ballinger, and S. J. Louis. Player identification from RTS game replays. In 28th International Conference on Computers and Their Applications 2013 (CATA 2013), pages 313–318, January 2013.