Interpretable Real-Time Win Prediction for Honor of Kings – a Popular Mobile MOBA Esport

Zelong Yang Zhufeng Pan Yan Wang Tsinghua-Berkeley Shenzhen Institute University of California, Los Angeles AI Lab Tsinghua University Email: [email protected] Inc. Email: [email protected] Email: [email protected]

Deng Cai Xiaojiang Liu Shuming Shi The Chinese University of Hong Kong AI Lab AI Lab Email: [email protected] Tencent Inc. Tencent Inc. Email: [email protected] Email: [email protected]

Shao-Lun Huang Tsinghua-Berkeley Shenzhen Institute Tsinghua University Email: [email protected]

Abstract—With the rapid prevalence and explosive devel- opment of MOBA esports (Multiplayer Online Battle Arena electronic sports), much research effort has been devoted to automatically predicting game results (win predictions). While this task has great potential in various applications, such as esports live streaming and game commentator AI systems, previous studies fail to investigate the methods to interpret these win predictions. To mitigate this issue, we collected a large-scale dataset that contains real-time game records with rich input features of the popular MOBA game Honor of Kings. For inter- pretable predictions, we proposed a Two-Stage Spatial-Temporal Fig. 1: Application of a non-interpretable prediction model in Network (TSSTN) that can not only provide accurate real-time win predictions but also attribute the ultimate prediction results the LoL 2019 World Championship live streaming. Due to the to the contributions of different features for interpretability. lack of interpretability, the commentators can merely make Experiment results and applications in real-world live streaming simple comments like “Now the win probability of the Blue scenarios showed that the proposed TSSTN model is effective Team is 80%”. both in prediction accuracy and interpretability.

I.INTRODUCTION these works, [1]–[4], [11] primarily focus on pre-game win MOBA esports have become increasingly popular. For ex- predictions using pre-game features such as hero selections ample, one of the world’s three most popular and highest- and roles of heroes. Other works [5]–[10] introduce real- grossing MOBA games, Honor of Kings (HoK, whose inter- arXiv:2008.06313v3 [cs.AI] 16 Apr 2021 time (during match) features such as team-kills and heroes’ national version is named ), is attracting more experience to predict the outcomes of MOBA games. For than 80 million daily active players and 200 million monthly example, Yang et al. [5] use three real-time features (gold, active players1. Other popular MOBA games such as DotA II experience, and death) to predict the winner of DotA II. and LoL also have tens of millions of players worldwide. In the last King Pro League Fall 2019 tournament for HoK, the Although some progress has been made, existing studies on prize pool reached 1.13 million dollars2. Due to the enormous MOBA esports win predictions suffer from a major limitation popularity of MOBA esports, related industries such as es- that they fail to investigate the possibility to interpret these ports live streaming and game commentator AI systems have win predictions. The lack of interpretability dramatically limits emerged and increasingly generate huge amounts of profit. their applications in industry scenarios such as esports live Under these circumstances, much research [1]–[11] has been streaming. As one can see in Figure 1, the non-interpretable LoL done on automatic MOBA esports win predictions. Among prediction model used in the 2019 World Championship can only give the win probabilities of the two teams (80% 1https://en.wikipedia.org/wiki/Honor of Kings and 20%). Since the game commentators are not aware of 2https://liquipedia.net/arenaofvalor/King Pro League/2019/Fall the underlying reasons for the prediction results, they can merely make some simple comments such as “Now the win probability of the Blue Team is 80.0%” based on the win probabilities, which is rather dull and non-informative. In contrast, once we can provide more interpretable predictions such as: {WinProbBlue: 80.0%, Factor 1: towers, Factor 2: team compositions}, the commentators can be aware that it is the difference in tower-counts and team compositions that results in these win predictions. With this information, more insightful comments can be made such as: “The Blue Team has destroyed three towers in the past five minutes and thus accumulated a huge advantage in terms of vision. Besides, Fig. 2: Intermediate prediction results in an HoK game. This with a more powerful team composition, they now have a very figure shows the importance weights and contributions of six high win probability of 80.0%.” feature groups in the example game at 5 minutes and 15 To address the above problems, we collected a large-scale minutes. dataset that contains abundant real-time game records of the popular MOBA game HoK. A critical characteristic of these game records is that they are played by highly-skilled and its time-variant weight denote the contribution of the human players (top 1% according to the game leaderboard). single feature group to the ultimate win prediction. Since matches with higher skill levels result in fewer random It is worth noting that high importance does not necessarily factors [2], [5], the collected dataset is more predictable and lead to a large contribution. For example, team gold is a high- thus can result in more reliable models. The dataset covers importance feature at most time-points, but its contribution 184,362 real human games, and the real-time game records will still be insignificant if the two teams have approximately are collected every 30 seconds, generating 5,253,661 data- similar amounts of gold at the current time-point. On the other frames in total. For each data-frame, the most discriminative hand, features with relatively low importance could still make and human-perceptible features are included, such as teams’ a non-negligible contribution if the Spatial-model’s distinct accumulated gold, kill-counts, and tower-counts. prediction output is high enough. For instance, in the game With the above collected dataset, we propose a Two-Stage shown in Figure 2, the importance weights at 15.0 minutes Spatial-Temporal Network (TSSTN), which gives not only is the fifth out of six (10.1%), while its contribution to the the accurate win predictions but also human-interpretable current game ranks the third (16.6%). intermediate results. In the first Spatial-stage, features are Figure 2 illustrates the interpretable intermediate results grouped into six distinct feature groups (Gold, Kill, Tower, given by the TSSTN model. As shown, our system gives Wild Resource, Soldier, and Heroes) and projected onto six information about the importance weights and contributions separate representation spaces. These are the six most impor- of the six feature groups to explain the underlying reasons for tant features suggested by senior professional commentators. the ultimate predictions. At 5.0 minutes, the most contributed Upon these spaces, we build six Spatial-models that take feature groups are Gold, Kill, and Tower, as the pure coral individual feature groups as input only and make the respective bars show; at 15.0 minutes, the most contributed features win predictions. Then, after a deep investigation on HoK and change to Gold, Soldier, and Wild Resource, as the striped other MOBA games, we find that the same features are not coral bars show. In this example, the Gold feature’s importance invariantly important throughout the games. For example, in and contribution decrease as the match progresses, while the beginning of the games, the team compositions are the other features’ (such as Tower and Soldier) contributions and most vital features to the win predictions; however, in the importance increase as the match proceeds. middle of the games, gold amount may become the most Experiments show that the TSSTN model gives accurate important feature in terms of win predictions. To better model real-time win predictions as well as human-interpretable inter- this “temporal” characteristic, in the second stage (Temporal- mediate results for the popular MOBA game HoK. Besides, stage) we assign six time-variant weights to the six Spatial- the TSSTN model has already been implemented in a real- models, and the ultimate win prediction is made through the world AI commentator system to provide human-interpretable weighted combination of Spatial-models’ outputs. prediction results. The proposed TSSTN model is interpretable in that it The contributions of our work are summarized as follows: separates the effects of the values and importance of features • A large-scale HoK dataset for real-time MOBA esports to decouple their contributions: 1) The Spatial-models’ out- win prediction is collected and utilized to train reliable puts are independent win predictions based on single feature prediction models. groups, which indicate the two teams’ win-or-lose likelihood in • A Two-Stage Spatial-Temporal Network (TSSTN) is pro- the domains of different feature groups (Spatial-domains); 2) posed to make real-time interpretable win predictions by The six time-variant weights of the Temporal-stage represent attributing the predictions to the contributions of different the relative importance of different feature groups at specific features. time-points; and 3) The product of the Spatial-model’s output • Experiment results and applications in a real-world AI commentator system demonstrate the effectiveness and TABLE I: An example data-frame at time 9.5 minutes. interpretability of the proposed TSSTN model. time: 9.5 win side: ··· middleTowerCnt: 3 II.RELATED WORKS lose side: crystalTowerCnt: 3 Due to the high demand for game result forecasts, much hero id: AD: 52, 96, 56, 23, 31 alive: true research has been conducted on predicting the results of hero: id: 31 games, both for traditional sports and MOBA esports. In this 52: Raider: ··· section, we introduce some related work on the topic of game profession: mage AP: ··· result predictions, including traditional sports win predictions, deadCnt: 3 overlord: 0 MOBA esports pre-game win predictions, and MOBA esports assistCnt: 1 darkTyrant: 0 real-time win predictions. killCnt: 0 numOverlord: 0 gold: 3338 numDarkTyrant: 0 96: ··· numTyrant: 0 A. Traditional Sports Win Prediction 56: ··· numRedBuf: 0 ··· Due to the high demand from related industries, many stud- 23: numBlueBuf: 0 31: ··· soldierDist: ies have aimed to predict the winners of traditional sports, such teamGold: 23884 lower: 8544 as basketball [12]–[14], football [15], [16], and others [17], kill: 6 upper: 8932 [18]. In these works, the authors first collect their datasets con- participation: 0.67 middle: 7273 taining pre-game or real-time information, then build models towerCnt: 7 to predict the results based on their selected features. These models have proven to be useful to some extent, but two issues need to be stressed. Firstly, to make sure that the games are of C. MOBA Esports Real-Time Win Prediction the same skill level, the datasets should contain games in the same professional competitions, such as the NBA (National Yang et al. [5] first used real-time data to predict the Basketball Association) for basketball and the NFL (National results of MOBA esports. However, their training data and Football League) for football. However, the amount of such selected features were insufficient (just less than one-third of sports competitions’ data is far too insufficient to train reliable the training data contained real-time information, and only models, considering the limited number of matches every year. three real-time features were collected). Besides, the results Secondly, there are too many off-game variables in traditional of this work were non-interpretable, further limiting their use. sports, such as weather conditions.3 Therefore, it is hard to Following this work, many works [6]–[10] used richer real- comprehensively and objectively choose the suitable features. time features to give win predictions for DotA II and LoL. Hodge et al. [7] used the game data of different skill levels to mitigate the inefficiency of training data. [20] aimed to predict B. MOBA Esports Pre-Game Win Prediction events inside the games such as hero deaths. Demediuk et MOBA esports games, on the other hand, are naturally al. [21] used unsupervised learning technologies to explore suitable for the task of result prediction. For one thing, huge the roles of heroes in MOBA esports. amounts of structured game data can be easily collected from Another type of research on esports win predictions is to the game database to train reliable models. Also, the variables train game AI using reinforcement learning technologies, such in MOBA esports are relatively few and easy to obtain. With as [22] for StarCraft II, [23], [24] for Honor of Kings, and [25] the rapid development and prevalence of MOBA esports, for DotA II. Trained appropriately, value-based reinforcement much research has been conducted on predicting the results of learning models are capable of giving real-valued win predic- MOBA esports games, such as [8], [9]. Among these works, tion results. Although these game AI models can give real- [2] first started research on MOBA esports result predictions, time win predictions, they are black-box models lacking in then [1], [3], [11] followed this work to make pre-game win interpretability. predictions. Semenov et al. [4] and Wang et al. [10] suggested Based on these previous works, our work utilizes sufficient several machine learning methods to predict MOBA esports MOBA esports data with diverse real-time features and designs using pre-game information such as team compositions. Chen a TSSTN model to give reliable and interpretable prediction et al. [19] studied the synergy and restrain relations among results. Our work has already been implemented in a real AI heroes. These works, however, lack real-time data, a factor commentator system for HoK. To the best of our knowledge, limiting their usages. our work is a pioneering attempt to realize interpretable prediction models for MOBA esports win predictions. 3Esports also have many off-game variables that can affect the results of the games, such as the player being tired, ill, under stress, and mood. However, we III.DATASET claim that these factors can be well reflected by the in-game features such as the gold-amount and the number of towers. In this sense, the win predictions HoK (along with its international version Arena of Valor) is for esports are more generalizable than traditional sports predictions. a mobile MOBA game that enjoys great popularity globally. Fig. 3: The flowchart of the proposed TSSTN model. The left part displays the data collection and feature selection procedures. The middle part is the Spatial-stage. Each Spatial-model is designed for one feature group to give individual prediction (win- score). The right part shows the Temporal-stage. The time-dependent importance weights combine with Spatial-models’ outputs to give the ultimate win prediction. The middle part (Spatial-stage) consists of two kinds of Spatial-models: feed-forward networks and logistic regression models. For feed-forward Spatial-models, the features (including time) are converted into embedding vectors as input; for logistic regression Spatial-models, the input features are scalars. In the figure, the different numbers of boxes for “Time” and “Heroes” represent the different length of these vectors.

In HoK, two teams of five heroes combat against each other to of different layers are separated using indents, and repeated destroy the “Crystal” (home base) of the rival team for victory. fields are omitted using ellipses. A detailed game introduction can be found in [26]. Cooperating with the publisher of the Honor of Kings game, IV. MODEL 4 we collect the anonymized information of all the 184,362 In this section, we introduce the Two-Stage Spatial- games that happened on a randomly chosen day in June Temporal Network (TSSTN) model for interpretable real-time 2019 from the game-core data of the official game database. MOBA esports win predictions. In the first Spatial-stage, Game-core data is the original data form with which we we categorized the features into six feature groups: Gold, can reconstruct the replays of the games and extract the Kill, Tower, Wild Resource, Soldier, and Heroes. Then we necessary game features. Prior experiments for data scale projected them onto different representation spaces. Upon indicate that one day’s data is sufficient for our task. Please each space, a Spatial-model was built to compute a win- note that to ensure the quality of games, only “Conquerors”- score si ∈ [−1.0, 1.0] which represents the win-or-lose level games - whose participants are top 1% players - are likelihood prediction based on this feature group. Since each collected (according to [2], [5]). To reduce the redundancy of Spatial-model was unaware of the features in other feature the dataset, we extracted the features every half minute. This groups, its output (win-score) could be regarded as the inde- means that a 15-minute game will transform to 30 data-frames pendent win prediction of the team in each specific feature in the collection process, and we obtained 5,253,661 data- domain. Moreover, these spaces (feature groups) were neither frames in total. In each data-frame, 96 features from the game equally important with each other (in spatial-dimensions) process were extracted, including sufficient information that 5 nor invariantly important throughout the games (in temporal- we think is helpful for the win prediction. Aside from some dimensions). For example, feature group Soldier (soldiers’ obviously useful information such as gold, kill-count, and distances to the rival Crystal) became increasingly important tower-count of each team, we also calculated some potentially because soldiers became increasingly potent as the games helpful features, such as soldiers’ distances to the rival’s proceeded. To better model the temporal variance of feature “Crystal”, buff information, and wild resource occupancy. An groups’ relative importance, in the second Temporal-stage we illustration of one data-frame is shown in Table I, where fields assigned a time-variant and learnable weight to each Spatial- 4Data without the privacy information such as players’ names, account IDs, model. Finally, the inner product of the win-score vector and game history. s = [s1, s2, ..., s6] and the time-dependent importance weight 5As Table I shows, the 96 features include 48 features for the win and the t t t t T vector w = [w1, w2, ..., w6] was the ultimate prediction lose side: heroes’ IDs (1); profession, dead count, assist count, kill count, t and gold of the five heroes (5 × 5); team gold (1); team’s kill count (1); result F . Figure 3 demonstrates the flowchart of the proposed participation (1); tower count (1); Middle Tower count (1); Crystal Tower TSSTN model. count (1); alive status and IDs for AD, Raider, and AP (2 × 3); Overlord In this two-stage architecture, some intermediate results can buff status (1); Dark Tyrant buff status (1); number of killed Overlord (1); number of killed Dark Tyrant (1); number of killed Tyrant (1); number of help interpret the ultimate win predictions. 1) The weight killed Red and Blue Buff (1 × 2); soldiers’ distances to the crystal (1 × 3). vector wt at time t reveals the relative importance of the feature groups at that time. It varied over time because feature max feature values, and the remaining three categorical feature groups were not invariantly important in different game pro- groups were converted into embedding vectors. t cesses. 2) The product wi ·si of feature group xi’s importance t B. Spatial-Stage weight wi and win-score si directly contributed to the ultimate t t P t win prediction F as F = i wi · si. Therefore, we refer As noted, the Spatial-stage consisted of three logistic regres- t to the value of wi · si as the contribution of feature group sion models and three feed-forward neural networks called xi. This contribution revealed the effects of the two teams’ Spatial-models. We used logistic regression models for nu- feature differences and features’ importance on the ultimate merical feature groups (Gold, Kill, and Tower), and used two- prediction result. Based on this scheme, we can easily explain hidden-layer feed-forward neural networks for categorical fea- how the ultimate win prediction F t is made according to the ture groups (Wild Resource, Soldier, and Heroes). For Spatial- above interpretable information. In many previous studies, this model i, the input consisted of its corresponding feature group information was hidden in the entangled parameters of the xi introduced in Section IV-A, and the current game time t. By “black-box” models. adding the game time as part of input feature in the Spatial- models, the accuracy of the Spatial-models is improved, so A. Feature Selection is the overall accuracy of the TSSTN model. The output To choose the most suitable features from the dataset, we of Spatial-model i is a win-score si(xi, t) ∈ [−1.0, 1.0], invite senior HoK commentators to select the most relevant representing the win-or-lose likelihood prediction based on the features for win predictions based on their professional ex- single feature group’s value difference. perience. Based on their suggestions, we select 11 types of Note that although adopting more sophisticated Spatial- features in every data-frame and classify these features into six model architectures for different feature groups will possibly feature groups. One may select other features when predicting improve the overall prediction accuracy further, it was not other MOBA games. the primary purpose of this study. We wanted to focus on • Gold: The difference in the accumulated amounts of gold the interpretability of MOBA esports win predictions, so we between the two teams. Gold can be used to purchase chose simple-architecture networks (logistic regression models more powerful equipment to enhance the attributes (such and feed-forward networks) for the feature groups. In real- as health-point, attack, and defense) of the heroes. world scenarios, one can deliberately design different model- • Kill: The difference in the kill-counts of the two teams. architectures for different feature groups. By killing the rival heroes, extra gold and experience can be obtained to purchase equipment and level up, C. Temporal-Stage respectively. The Temporal-stage consisted of six time-dependent • Tower: The difference in the numbers of existing defense weights, reflecting the relative importance of the six feature towers, including three kinds of towers (front, middle, and groups at different game processes. Given the win-score vector Crystal towers). Defense towers can hold back the rival s(x, t) at time-point t provided by the Spatial-stage (where heroes and soldiers as well as provide vision. vector x represents the six feature groups), the Temporal- • Wild Resource: The difference of the wild resources, stage calculated the ultimate win prediction F t using the linear including two types of wild monsters: Dark Tyrants and combination of importance weights wt and win-scores s(x, t): Overlords. By killing these monsters, the whole team d obtain buffs that can provide extra gold, experience, and t t X t attributes. F = w · s(x, t) = wi ·si(xi, t) (1) i • Soldier: The soldiers’ distances to the opponent “Crys- tal”, including the distances in the three lanes (top, where d denotes the number of feature groups. Since the middle, and bottom lanes). Soldiers are useful in terms of weights wt represent the relative importance of feature groups destroying towers, which are unavoidable obstacles to get at time-point t, we applied the softmax function to normalize t to the rival “Crystal”. As the game proceeds, the soldiers the importance weight wi for each feature group: will become increasingly potent according to the MOBA exp θt game settings. wt = i , (2) i P exp θt • Heroes: The heroes’ IDs selected by the players. As j j shown in Table I, these IDs are in the form of numbers in t so that the weight wi satisfy: the dataset. Since there are certain restrain and synergy d relations among heroes, this information can be utilized X wt = 1, wt >= 0. (3) to improve the prediction accuracy before and during the i i games. i As the input of the TSSTN model, different feature groups D. Interpretability: Feature Groups’ Contributions were pre-processed in different ways according to their Interpretable results can then be generated by element-wise Spatial-models. As the middle part of Figure 3 shows, the first multiplying the results of the Spatial-Stage and the Temporal- three numerical feature groups were normalized using min and stage. As the right part of Figure 3 shows, by multiplying the win-scores of the spatial-models and their corresponding TABLE II: The accuracy (%) of the three prediction models importance weights, we get the contributions of the feature at five equidistant time-points. groups. After sorting these contributions, we can conclude Models 0 min (std) 5 min (std) 10 min (std) 15 min (std) 20 min+ (std) which feature groups are more important to the current win Heuristic 50.0 69.2 78.4 73.0 63.0 Fully-Connected 54.7 (0.043) 69.3 (0.023) 79.3 (0.012) 75.0 (0.023) 70.0 (0.057) prediction compared with others. What’s more, the summation TSSTN 54.6 (0.036) 69.2 (0.005) 78.5 (0.005) 73.4 (0.009) 67.8 (0.032) Logistic Regression 53.9 (0.007) 63.8 (0.012) 78.2 (0.010) 70.5 (0.008) 56.6 (0.003) of all the six contributions is the current win prediction, as LSTM 54.3 (0.021) 69.4 (0.017) 78.4 (0.010) 76.3 (0.005) 66.5 (0.005) shown in Equation 1. For example, as shown in the case of Figure 2, at time 15.0 minutes, the win-scores of the six feature groups are [0.590, 0.292, 0.612, 0.901, 0.611, 0.231] added a dropout layer with a rate of 0.2 to mitigate overfitting.6 (which is not shown in Figure 2), and the importance After the output layers of the feed-forward network Spatial- weights are [0.348, 0.184, 0.129, 0.101, 0.172, 0.065]. models, we used tanh functions to rectify the Spatial-models’ After element-wise multiplying the win-scores and the impor- outputs (win-scores) to be among [−1.0, 1.0] to represent win- tance weights, the contributions of the six feature groups are or-lose likelihood. [0.205, 0.054, 0.079, 0.091, 0.105, 0.015]. By sorting these B. Comparing Methods contributions, we can conclude that feature group 1 (Gold) and In order to thoroughly evaluate the aforementioned TSSTN feature group 5 (Soldier) are the most important. With this model, we compare it with two baselines: information, human commentators or rule-based commenting systems can generate comments such as “The win probability • Heuristic: Intuitively, “gold” is one of the most decisive of the Blue Team is 77.5%, which is due to its advantages in features for the ultimate win predictions, so the Heuristic gold amount and soldier positions.” model predicts the game based on the difference of the accumulated gold between the two teams. The team with E. Training more gold will be predicted as the winner. To optimize the parameters in the Spatial-stage and the • Fully-Connected: A fully-connected neural network that Temporal-stage, we trained the Spatial-models and importance takes all the six feature groups as the input. Before the weight vectors using cross-entropy loss and back-propagation input layer, the features were pre-processed in the same technologies. In experiments, we found little difference be- manner as the TSSTN model. We designed four hidden tween training the two stages simultaneously or separately. In layers for this model with dimensions 1024, 4096, 512 the subsequent experiments, we choose to train the two stages and 64, respectively. After each hidden layer, we added a simultaneously. leaky ReLU function as the non-linear layer and added a dropout layer in order to mitigate overfitting. After V. EXPERIMENTS the output layer, a tanh function was used to give the A. Experimental Settings ultimate win predictions. The settings of the leaky ReLU layers and dropout rates were identical to those of the We conducted experiments on the dataset mentioned above. TSSTN model introduced in Subsection V-A. 10,000 games from the dataset were randomly selected as the • Logistic Regression: The Logistic Regression takes all test set. 90% of the remaining 174,362 games were set as the the features as its input. The input is in the same format training set and the other 10% as the validation set. of the Fully-Connected network. The parameters of the TSSTN model were as follows. In • LSTM: We use bidirectional LSTM with two recurrent the Spatial-stage, the Spatial-models for numerical feature layers. Sequential data with sequence 5 is used as input. groups Gold, Kill, and Tower were logistic regression models For time 0.0, 2.5, 5.0, and 7.5, the first frame of input with tanh as the non-linear function; the Spatial-models for data is duplicated to satisfy the sequence-length. The other categorical groups Wild Resource, Soldier, and Heroes probability of dropout is 0.2. The size of the hidden state were two-hidden-layer feed-forward neural networks with di- is 32. After the LSTM, we use a 64-dimension fully- mensions (256, 16), (256, 16) and (128, 16), respectively.6 connected layer and a tanh function to compute the win Before the input layer, the feature values (including time) were predictions. normalized using min and max values (for logistic regressions) The prediction accuracy of the single-feature-group based or preprocessed into embedding vectors (for feed-forward prediction models (Spatial-models, such as the prediction networks), as the middle part of Figure 3 shows. For feed- model depending on the “Heroes” features only) are elabo- forward network Spatial-models, after each hidden-layer, we rated in Subsection V-E. These Spatial-models are separately used a leaky ReLU function7 as the non-linear function and discussed since they utilize less input information than the 6These parameters (number of layers, dimensions of the layers, and dropout three aforementioned models. parameters) are chosen through a brief parameter sweep, in which we found that parameters in a reasonable range resulted in similar accuracy, and we C. Evaluation Metrics chose the parameters with the best accuracy. The prediction accuracy at different game time-points was 7 The formula of the leaky ReLU is: adopted as the automatic evaluation metrics. Although real- f(x) = x, x ≥ 0, (4) f(x) = 0.01x, x < 0. perts at time-points 0.0, 17.5, and 20.0. Intuitively, predictions at these three time-points were difficult for human experts because many features that were decisive most of the time were not significant enough at these time-points. For example, “gold” is a decisive feature at most time-points. However, at the beginnings of the games (0.0 minute), the two teams had equal gold, and gold was no longer essential after 17.5 minutes because the players had already accumulated enough gold at that time. At time-point 0.0 minute, the HoK experts appeared to have Fig. 4: The average prediction accuracy of the three HoK low prediction accuracy because before the games started, experts and the two automatic prediction models at nine the human experts could only “guess” the results based on equidistant time-points. prior-game information like team compositions, which was quite inaccurate compared with the prediction results of the automatic methods. valued winning probabilities can be provided by the TSSTN As shown in Table II and Figure 4, the prediction accuracy model and the Fully-Connected Network, we only considered of both the human experts and the automatic methods fell after the prediction accuracy of the binary win-or-lose results for 12.5 minutes because: 1) In the late-game stages, both the the test games. To better evaluate how well the proposed levels and equipment of the two teams reached the maximum; TSSTN model predicts the game-winner, we also recruited therefore, the results of the games were increasingly affected three human experts to manually predict the winner of each by random factors like players’ accidental mistakes. This fact game given the test game data. All these experts ranked in the made the late-stage games harder to predict; and 2) since HoK top 0.1% in the ladder tournament of according to the the games were not equal-duration, games that were easier to game leaderboard. Experts with higher ranks are more likely predict often ended before late-game stages; therefore the test to predict the games more accurately. The game experts were data-frames were harder to predict at late-game stages time- given access to the real games so they could explore the whole points, resulting in the noted drop in prediction accuracy. game states and extract their own features and conclusions, and were asked to predict the winners of the games every 2.5 E. Further Analysis minutes based on the battle situations and their experience. More information about the proposed TSSTN model can Note that since it is an impossible task to annotate all the be revealed by the six Spatial-models’ individual prediction 10,000 games in the test set manually due to the limitation of accuracy and the corresponding importance weights, as shown resources, we randomly chose 200 test games for each time- in Figure 6. At time-point 0.0 minute of the games, the only point for human annotation. Spatial-model with a non-trivial (larger than 50%) prediction accuracy was Spatial-model Heroes; accordingly, the only non- D. Results zero importance weight was also Heroes, as shown in the right The average prediction accuracy at five equidistant time- part of Figure 6. As the games proceeded, the most accurate points is shown in Table II. The Fully-Connected Network Spatial-model was Gold, which was also reflected in its high and the LSTM achieved the best performance in terms of the importance weights. After 12.0 minutes in the games, both prediction accuracy at these five time-points. The prediction the prediction accuracy and importance weight of Gold began accuracy of the TSSTN model was higher than the Heuristic to drop. This phenomenon may have occurred because extra model but a little lower than the Fully-Connected Network and gold above some certain thresholds became useless due to the the LSTM, since we sacrificed the coupling information among limited package space for equipment in the late-game stages. the six feature groups to some extent for interpretability. The After 20.0 minutes in the games, the prediction accuracy of accuracy of the Heuristic model at 0.0 minute was 50.0% since the Spatial-model Soldier surpassed that of the Spatial-model when the two teams have equal gold, the Heuristic model Gold, as was its corresponding importance weight. predicted the winner stochastically. Please note that human The six Spatial-models’ importance weights and prediction experts’ prediction results are not included in Table II due to accuracy changes were consistent with our intuition. For the difference of the test sets (200 games versus 10,000). The example, since the team composition information (Heroes) comparison between HoK experts and the proposed TSSTN was fixed throughout each game, the prediction accuracy of model is separately shown in Figure 4. the Spatial-model Heroes was almost impervious to time, Figure 4 shows the comparisons of the prediction accuracy as revealed in the left part of Figure 6. Moreover, since among human experts and two prediction models, one inter- the difference in team compositions had a relatively limited pretable model TSSTN and one non-interpretable model Fully- influence on the game results, the corresponding prediction Connected Network. Although the accuracy of the TSSTN accuracy of Spatial-model Heroes was only slightly above model was slightly lower than that of the Fully-Connected 50.0%. The Spatial-model Wild Resource only became valid Network, it was obviously higher than that of the human ex- after 8.0 minutes in the games, since its features “Overlord” Fig. 5: An example HoK game in which an AI commentator system gives real-time comments as the game progresses. Among these comments, the win prediction related ones are supported by the proposed TSSTN model. The upper left part of this figure shows the win probability curves of the Blue Team and the Red Team throughout the game. The lower left, upper right and lower right parts show three screenshots in which the AI commentator system utilizes the interpretable intermediate results of the proposed TSSTN model to generate its comments.

F. Interpretability The proposed TSSTN model now serves as an essential module of an AI commentator system for HoK.8 Having been deployed for twelve months (since April 2020), this AI commentator can automatically generate real-time com- ments for HoK games, and a large part of its comments rely on the interpretable prediction results of the proposed TSSTN model. Through the feedback of the audience and professional commentators, we believe that the TSSTN is able Fig. 6: Six Spatial-models’ average prediction accuracy and to provide human-interpretable information with satisfactory importance weights in the first 20 minutes of the games. performance. To illustrate the interpretability of the proposed TSSTN model, we exhibited a live-streaming game as an example, as Figure 5 shows. Specifically, the upper left part of this figure shows two real-time win probability curves for the two teams provided by the TSSTN model. The screenshots of three time-points in the live game are exhibited in the lower left, upper right and lower right parts, respectively. In the screenshots, the Chinese subtitles in white are real comments generated by the aforementioned HoK AI commentator system, and the corresponding English translations are provided below. At and “Dark Tyrant” only changed after 8.0 and 10.0 minutes, time-point 1 (game time 0:03), since it was the beginning of respectively. The Spatial-model Soldier (soldiers’ distances to the rival Crystal) became more significant after 19.0 minutes 8A live-streaming video record of this system can be found compared to other Spatial-models, since soldiers became in- at https://www.bilibili.com/video/BV1sK411W7JH, and its English-subtitled record version can be found at the link https://drive.google.com/drive/folders/ creasingly potent and important in the late-game stages in most 1oYT8DDlHOcA7 OXdyr4X3ErNDJXFN734?usp=sharing. Our module MOBA esports, especially HoK. works at game time 3:57, 5:04, and 8:25 in this record. the game, the TSSTN model predicted the Blue Team’s win esports win predictions. Further research can be done on im- probability as 54.4% based on the team composition (Heroes) proving the prediction accuracy and exploring the applications information only, so the AI commentator generated the com- of interpretable prediction models in different industries. ment: “The game starts! The team composition of the Blue Team is in advantage; therefore, they get a win probability REFERENCES 9 of 54.4%”. Then, at time-point 2 (game time 2:11), due [1] N. Kinkade, L. Jolla, and K. Lim, “Dota 2 win prediction,” Univ Calif, to the disadvantages in Gold and Kill, the Red Team’s win vol. 1, pp. 1–13, 2015. probability dropped to 36.2%, and the contributions of feature [2] K. Conley and D. Perry, “How does he saw me? a recommendation engine for picking heroes in dota 2,” Np, nd Web, vol. 7, 2013. groups “Gold” and “Kill” to the win prediction were -0.187 [3] K. Kalyanaraman, “To win or not to win? a prediction model to and -0.073, respectively. So the AI commentator generated the determine the outcome of a dota2 match,” Technical report, University comment: “The Red Team is at a disadvantage in terms of gold of California San Diego, Tech. Rep., 2014. [4] A. M. Semenov, P. Romov, S. Korolev, D. Yashkov, and K. Neklyudov, and kill-count. AI predicts that the Red Team’s win probability “Performance of machine learning algorithms in predicting game is only 36.2%”.9 Finally, at time-point 3 (game time 9:44), outcome from drafts in dota 2,” in Analysis of Images, Social Networks the Red Team reversed the win probability to 56.2%. This and Texts - 5th International Conference, AIST 2016, Yekaterinburg, Russia, April 7-9, 2016, Revised Selected Papers, ser. Communications prediction was made based mainly on the difference of “Gold”, in Computer and Information Science, D. I. Ignatov, M. Y. Khachay, “Wild Resource”, and “Tower” between the two teams, so the V. G. Labunets, N. V. Loukachevitch, S. I. Nikolenko, A. Panchenko, AI generated the comment: “With a real mess and bad start, A. V. Savchenko, and K. Vorontsov, Eds., vol. 661, 2016, pp. 26–37. [Online]. Available: https://doi.org/10.1007/978-3-319-52920-2 3 the Red Team catches up gradually to kill more Tyrants and [5] Y. Yang, T. Qin, and Y. Lei, “Real-time esports match result destroy more towers, thereby they accumulate an advantage prediction,” CoRR, vol. abs/1701.03162, 2017. [Online]. Available: in terms of Gold. Currently, their win probability has grown http://arxiv.org/abs/1701.03162 9 [6] T. Wang, “Predictive analysis on esports games: A case study on league to 56.2%”. of legends (lol) esports tournaments,” 2018. Obviously, without the interpretable results, commentators [7] V. Hodge, S. Devlin, N. Sephton, F. Block, A. Drachen, and P. Cowling, were only able to make some trivial comments devoid of “Win prediction in esports: Mixed-rank match prediction in multi-player online battle arena games,” arXiv, WorkingPaper, Nov. 2017. interest to the audience. For example, at time-point 3, one [8] I. Makarov, D. Savostyanov, B. Litvyakov, and D. I. Ignatov, “Predicting could only comment: “Now the Red team reverses its situation winning team and probabilistic ratings in ”dota 2” and ”counter-strike: and has a win probability of 56.2%”, which is less informative Global offensive” video games,” in Analysis of Images, Social Networks and Texts - 6th International Conference, AIST 2017, Moscow, Russia, than the interpretable one made by the AI commentator with July 27-29, 2017, Revised Selected Papers, ser. Lecture Notes in the help of the TSSTN model. Currently, the AI commentator Computer Science, W. M. P. van der Aalst, D. I. Ignatov, M. Khachay, generates 157.2 comments per game on average, and a large S. O. Kuznetsov, V. S. Lempitsky, I. A. Lomazova, N. V. Loukachevitch, A. Napoli, A. Panchenko, P. M. Pardalos, A. V. Savchenko, and part of them depend on the proposed TSSTN model with S. Wasserman, Eds., vol. 10716. Springer, 2017, pp. 183–196. a relatively even distribution over the six feature groups. [Online]. Available: https://doi.org/10.1007/978-3-319-73013-4 17 This is a very significant characteristic that makes our AI [9] V. Hodge, S. Devlin, N. Sephton, F. Block, P. Cowling, and A. Drachen, “Win prediction in multi-player esports: Live professional match predic- commentator system different from other non-interpretable- tion,” IEEE Transactions on Games, Nov. 2019. prediction based systems. [10] N. Wang, L. Li, L. Xiao, G. Yang, and Y. Zhou, “Outcome prediction of dota2 using machine learning methods,” in Proceedings of 2018 International Conference on Mathematics and Artificial ONCLUSION VI.C Intelligence, ICMAI 2018, Chengdu, , April 20-22, 2018, J. Zhu and E. Lin, Eds. ACM, 2018, pp. 61–67. [Online]. Available: In this paper, we proposed to predict the results of MOBA https://doi.org/10.1145/3208788.3208800 esports games in an interpretable way. A large-scale HoK [11] K. Song, T. Zhang, and C. Ma, “Predicting the winning side of dota2,” dataset containing abundant real-time MOBA game records Sl: sn, 2015. [12] V. O. Kayhan and A. Watkins, “A data snapshot approach for making was collected to facilitate this study. Based on this dataset, real-time predictions in basketball,” Big Data, vol. 6, no. 2, pp. 96–112, we further proposed a Two-Stage Spatial-Temporal Network 2018. [Online]. Available: https://doi.org/10.1089/big.2017.0054 (TSSTN) to give interpretable intermediate results as well as [13] V. O. Kayhan and A. Watkins, “Predicting the point spread in professional basketball in real time: a data snapshot approach,” Journal reliable ultimate win predictions. The core idea of this model of Business Analytics, vol. 2, no. 1, pp. 63–73, 2019. [Online]. structure was to separate the effects of features’ value differ- Available: https://doi.org/10.1080/2573234X.2019.1625730 ences and features’ relative importance in order to decouple the [14] K. Song, Y. Gao, and J. Shi, “Making real-time predictions contributions of different features. Our interpretable results can for nba basketball games by combining the historical data and bookmaker’s betting line,” Physica A: Statistical Mechanics and be utilized in various scenarios and promote the development its Applications, vol. 547, p. 124411, 2020. [Online]. Available: of related industries. For example, the proposed TSSTN model https://www.sciencedirect.com/science/article/pii/S0378437120301618 has already been implemented in an AI commentator system [15] D. Delen, D. Cogdell, and N. Kasap, “A comparative analysis of data mining methods in predicting ncaa bowl outcomes,” International for HoK and achieved a sound performance. Our next goal Journal of Forecasting, vol. 28, no. 2, pp. 543–552, 2012. is to construct automatic metrics to quantitatively evaluate [Online]. Available: https://www.sciencedirect.com/science/article/pii/ the competence of different interpretable methods for MOBA S0169207011000914 [16] C. K. Leung and K. W. Joseph, “Sports data mining: Predicting results for the college football games,” vol. 35, pp. 710–719, 2014. [Online]. 9These comments are driven by rules written in the commentary system in Available: https://doi.org/10.1016/j.procs.2014.08.153 advance and are triggered by the TSSTN’s win predictions and interpretable [17] A. Bhawkar, B. Patel, and P. Mahajan, “Predicting the winner of indian results. premier league match: A comparative study,” 06 2019. [18] C. Soto-Valero, “Predicting win-loss outcomes in MLB regular season games - A comparative study using data mining methods,” Int. J. Comput. Sci. Sport, vol. 15, no. 2, 2016. [Online]. Available: https://doi.org/10.1515/ijcss-2016-0007 [19] Z. Chen, Y. Xu, T. D. Nguyen, Y. Sun, and M. S. El-Nasr, “Modeling game avatar synergy and opposition through embedding in multiplayer online battle arena games,” CoRR, vol. abs/1803.10402, 2018. [Online]. Available: http://arxiv.org/abs/1803.10402 [20] A. Katona, R. J. Spick, V. J. Hodge, S. Demediuk, F. Block, A. Drachen, and J. A. Walker, “Time to die: Death prediction in dota 2 using deep learning,” in IEEE Conference on Games, CoG 2019, London, United Kingdom, August 20-23, 2019. IEEE, 2019, pp. 1–8. [Online]. Available: https://doi.org/10.1109/CIG.2019.8847997 [21] S. Demediuk, P. York, A. Drachen, J. A. Walker, and F. Block, “Role identification for accurate analysis in dota 2,” in Proceedings of the Fifteenth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, AIIDE 2019, October 8-12, 2019, Atlanta, Georgia, USA, G. Smith and L. Lelis, Eds. AAAI Press, 2019, pp. 130–138. [Online]. Available: https://www.aaai.org/ojs/index.php/ AIIDE/article/view/5235 [22] P. Sun, X. Sun, L. Han, J. Xiong, Q. Wang, B. Li, Y. Zheng, J. Liu, Y. Liu, H. Liu, and T. Zhang, “Tstarbots: Defeating the cheating level builtin AI in starcraft II in the full game,” CoRR, vol. abs/1809.07193, 2018. [Online]. Available: http://arxiv.org/abs/1809.07193 [23] D. Ye, Z. Liu, M. Sun, B. Shi, P. Zhao, H. Wu, H. Yu, S. Yang, X. Wu, Q. Guo, Q. Chen, Y. Yin, H. Zhang, T. Shi, L. Wang, Q. Fu, W. Yang, and L. Huang, “Mastering complex control in MOBA games with deep reinforcement learning,” pp. 6672–6679, 2020. [Online]. Available: https://aaai.org/ojs/index.php/AAAI/article/view/6144 [24] Z. Zhang, H. Li, L. Zhang, T. Zheng, T. Zhang, X. Hao, X. Chen, M. Chen, F. Xiao, and W. Zhou, “Hierarchical reinforcement learning for multi-agent MOBA game,” CoRR, vol. abs/1901.08004, 2019. [Online]. Available: http://arxiv.org/abs/1901.08004 [25] OpenAI, “Openai five,” https://blog.openai.com/openai-five/, 2018. [26] Wikipedia, “Honor of kings — Wikipedia, the free encyclopedia,” https://en.wikipedia.org/wiki/Honor of Kings, 2019, [Online; accessed 15-November-2019]. [Online]. Available: https://en.wikipedia.org/wiki/ Honor of Kings