Genetic Optimizing Method for Real-Time Monte Carlo Tree Search

Man-Je Kim
School of EECS, Gwangju Institute of Science and Technology
Gwangju, Republic of Korea

Jong-Hyun Lee
Research Center for Convergence, Sungkyunkwan University
Suwon, Republic of Korea

Chang Wook Ahn
AI Graduate School, Gwangju Institute of Science and Technology
Gwangju, Republic of Korea

ABSTRACT
Monte Carlo Tree Search is one of the best algorithms for solving board game problems. However, it is not well suited to real-time game problems, because such problems involve uncertainty about the opponent's actions and require many simulations to determine each action. We propose a Genetic Optimizing Method to solve the problems encountered when applying Monte Carlo Tree Search to real-time games. Our method uses a genetic algorithm to resolve the dilemma that real-time Monte Carlo Tree Search faces between the amount of simulation and the branching factor. Finally, we applied our method to a real-time fighting game to verify its performance.

CCS CONCEPTS
• Theory of computation → Evolutionary algorithms; Bio-inspired optimization;

KEYWORDS
Artificial Intelligence, Evolutionary Computing, Genetic Algorithms, Global Optimization

ACM Reference Format:
Man-Je Kim, Jong-Hyun Lee, and Chang Wook Ahn. 2020. Genetic Optimizing Method for Real-time Monte Carlo Tree Search Problem. In SIG Proceedings Paper in LaTeX Format. ACM, New York, NY, USA, 2 pages. https://doi.org/

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
SMA 2020, September 17-19, 2020, Jeju, Republic of Korea
© 2020 Copyright held by the owner/author(s).

1 INTRODUCTION
Monte Carlo Tree Search (MCTS) is an artificial intelligence algorithm that finds good behaviors by simulation. It has been regarded as the most powerful board-game algorithm since AlphaGo, which combines MCTS with deep neural networks, won its Go match against Lee Sedol.[4] Considering Go a conquered subject, game-AI researchers now target real-time video games, expecting that MCTS can be applied successfully to them as well. However, the performance of MCTS improves with the amount of simulation it can carry out, so applying it effectively to real-time games is highly difficult: such games require actions to be computed almost instantly. Effective simulation plans and excellent simulators are therefore essential, all the more so because AI players in real-time games act concurrently, which produces many unexpected situations.

To mitigate these problems, we propose a Genetic Optimizing Method that optimizes the simulation so that MCTS can be applied to real-time video games. The method improves the simulation of MCTS by using a Genetic Algorithm, which performs well in global optimization. We use a real-time fighting game as the experimental environment. It is a two-player match-up game in which each AI player must continually choose one of 56 possible actions within a 16.67 ms time limit on a single thread, while the development tool provides a simulator that can simulate one action within 28 microseconds. Taken at face value, these figures allow roughly 16.67 ms / 28 µs ≈ 595 simulated actions per decision, before any other overhead. An AI player competition for this game is held at the IEEE Conference on Games (CoG). These conditions make the game an excellent platform for applying MCTS and evaluating the performance of our method.

[Figure 1: Real-time fighting game (FightingICE)]

2 GENETIC OPTIMIZING METHOD
Monte Carlo tree search (MCTS) is a method for finding good paths through a search tree in a given domain. It samples the search space randomly and gradually constructs a tree structure from the samples. MCTS has already proven capable of modeling a domain as a series of actions, with outstanding accomplishments in board-game AI.[2] The tree policy follows the Upper Confidence Bound (UCB1), which is defined by Equation 1. Here $\bar{X}_t$ is the average value of the reward, $C$ is a balance parameter, $N_i$ is the number of times the $i$th node is visited, and $N_i^p$ is the number of times the parent node of the $i$th node is visited.[3]

UCB1 = \bar{X}_t + C \sqrt{\frac{2 \ln N_i^p}{N_i}}, \qquad \bar{X}_t = \frac{1}{N_i} \sum_{j=1}^{N_i} eval_j \qquad (1)

The main algorithm consists of four stages: selection, expansion, simulation (playout), and backpropagation [1]. They are repeated until the allowed time period ends. The most visited child node is then chosen as the action to execute.
• Selection: starts from the root node and searches for the most promising leaf node. UCB1 has become a popular criterion for this: the UCB1 value of every candidate node (available action) is calculated, and the node with the highest value is selected.
• Expansion: expands the leaf node chosen by the selection process in order to determine the next action. When the number of visits to the selected leaf node has exceeded the simulation threshold, the node is expanded by randomly choosing among the possible actions.
• Playout: randomly simulates from the expanded node. In a board game the simulation runs until the end of the game, but in a real-time environment a maximum depth is set. When the simulation ends, the UCB value of every node on the path is updated via backpropagation.
• Backpropagation: propagates the simulation result in reverse order along the path in order to update the tree policy at every node on it.

Three major elements determine the number of simulations in MCTS: Available Actions, Maximum Depth, and Simulation Threshold. If the number of Available Actions is large, the diversity of MCTS increases; if Maximum Depth is high, the accuracy increases; and if Simulation Threshold is large, the stability increases. In board games, all three elements can be set generously. In a real-time environment, however, the limited response time makes it essential to allocate the three elements properly, since raising any one of them consumes more of the simulation budget. To optimize MCTS for the real-time environment, our method optimizes the three elements using a Genetic Algorithm.

A Genetic Algorithm (GA) is a population-based optimization search algorithm inspired by the process of natural selection. GAs encode possible solutions to a problem as chromosomes and improve them through the interaction of genetic operations. The overall procedure consists of bio-inspired factors such as genes, chromosomes, fitness, and population size. We encoded the three elements above into a chromosome with 4 binary genes each. Table 1 shows the settings we used to apply the GA to the real-time fighting game.

Table 1: Default Genetic Algorithm Setting

Element                 Contents
Chromosome              Binary-encoded 3 elements
Fitness                 Gap of HP (My HP - Opponent HP)
Selection Method        Tournament Selection (K=0.90)
Crossover Method        Two-point Crossover
Population Size         16
Chromosome Size         12
Mutation Probability    0.1

3 RESULTS
To verify the performance of our method, we evaluated each chromosome against a plain MCTS AI. Figure 2 shows the population's average fitness for each generation. In generations 1 and 2 the population was close to random, so it performed similarly to plain MCTS, but performance improved as the generations increased and gradually converged around generation 18. The agent with the best fitness in the 20th generation obtained a score of 900 and won 77 of 100 matches against plain MCTS. These results indicate that our method can successfully optimize MCTS for the real-time fighting game problem.

[Figure 2: Average Fitness of Genetic Optimizing AI against Plain MCTS AI]

4 CONCLUSIONS
We developed the Genetic Optimizing Method for a real-time fighting game by combining GA and MCTS. By verifying the effectiveness of this method through the experiment described above, we aim to show that MCTS can be superior not only in board games but also in real-time games. Although our experiment was conducted only in a real-time fighting game, the Genetic Optimizing Method did not use domain knowledge, which means it can be applied to a variety of real-time environments. In future work, we will apply our method to optimizing these elements in various games.

ACKNOWLEDGMENTS
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2019R1I1A2A01057603, No. NRF-2020R1C1C1009720, and No. NRF-2020R1A6A3A13055636).

REFERENCES
[1] Cameron Browne et al. 2012. A Survey of Monte Carlo Tree Search Methods. IEEE Transactions on Computational Intelligence and AI in Games 4, 1 (2012), 1–43.
[2] Man-Je Kim, Kim Jun Suk, Kim Sungjin, James, Kim Min-jung, and Ahn Chang Wook. 2020. Genetic State-Grouping Algorithm for Deep Reinforcement Learning. Expert Systems with Applications 161, 113695 (2020).
[3] Yoshida Shubu, Ishihara Makoto, Miyazaki Taichi, Nakagawa Yuto, Harada Tomohiro, and Thawonmas Ruck. 2016. Application of Monte-Carlo tree search in a fighting game AI. In 2016 IEEE 5th Global Conference on Consumer Electronics. IEEE, 1–2.
[4] David Silver et al. 2016. Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature 529, 7587 (2016), 484–489.

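The genetic settings listed in Table 1 can be sketched in Python as follows. This is only an illustration under several assumptions that the paper does not state: the three 4-bit fields are read in the order Available Actions, Maximum Depth, Simulation Threshold; K=0.90 is interpreted as the probability that the fitter of two randomly drawn individuals wins a binary tournament; the mutation probability is applied per gene; and `placeholder_fitness` is a hypothetical stand-in for the real fitness, which would be the HP gap measured in a match. The mapping from each 4-bit value to an actual MCTS setting is not given in the paper and is left out.

```python
import random

GENES_PER_ELEMENT = 4
NUM_ELEMENTS = 3
CHROMOSOME_SIZE = GENES_PER_ELEMENT * NUM_ELEMENTS  # 12, as in Table 1
POPULATION_SIZE = 16
MUTATION_PROBABILITY = 0.1
TOURNAMENT_K = 0.90  # assumed: chance that the fitter contestant wins


def decode(chromosome):
    """Split the 12 bits into three 4-bit unsigned integers (0..15 each)."""
    values = []
    for start in range(0, CHROMOSOME_SIZE, GENES_PER_ELEMENT):
        bits = chromosome[start:start + GENES_PER_ELEMENT]
        values.append(int("".join(map(str, bits)), 2))
    return tuple(values)


def tournament_select(population, fitnesses, rng):
    """Binary tournament: the fitter individual wins with probability K."""
    a, b = rng.sample(range(len(population)), 2)
    better, worse = (a, b) if fitnesses[a] >= fitnesses[b] else (b, a)
    return population[better] if rng.random() < TOURNAMENT_K else population[worse]


def two_point_crossover(p1, p2, rng):
    """Swap the segment between two distinct cut points."""
    lo, hi = sorted(rng.sample(range(1, CHROMOSOME_SIZE), 2))
    return p1[:lo] + p2[lo:hi] + p1[hi:], p2[:lo] + p1[lo:hi] + p2[hi:]


def mutate(chromosome, rng):
    """Flip each gene independently with the mutation probability."""
    return [bit ^ 1 if rng.random() < MUTATION_PROBABILITY else bit
            for bit in chromosome]


def next_generation(population, fitness_fn, rng):
    """Produce one new population of the same size."""
    fitnesses = [fitness_fn(c) for c in population]
    children = []
    while len(children) < len(population):
        p1 = tournament_select(population, fitnesses, rng)
        p2 = tournament_select(population, fitnesses, rng)
        for child in two_point_crossover(p1, p2, rng):
            children.append(mutate(child, rng))
    return children[:len(population)]


def placeholder_fitness(chromosome):
    # Hypothetical stand-in; the paper uses My HP - Opponent HP from a match.
    return sum(decode(chromosome))


rng = random.Random(0)
population = [[rng.randint(0, 1) for _ in range(CHROMOSOME_SIZE)]
              for _ in range(POPULATION_SIZE)]
population = next_generation(population, placeholder_fitness, rng)
```

In an actual experiment, `placeholder_fitness` would be replaced by a function that configures an MCTS agent from the decoded values, plays it against the baseline, and returns the resulting HP gap.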