SmartK: Efficient, Scalable and Winning Parallel MCTS

Michael S. Davinroy, Shawn Pan, Bryce Wiedenbeck, and Tia Newhall Swarthmore College, Swarthmore, PA 19081

ACM Reference Format:
Michael S. Davinroy, Shawn Pan, Bryce Wiedenbeck, and Tia Newhall. 2019. SmartK: Efficient, Scalable and Winning Parallel MCTS. In SC '19, November 17–22, 2019, Denver, CO. ACM, New York, NY, USA, 3 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

1 INTRODUCTION

Monte Carlo Tree Search (MCTS) is a popular algorithm for game playing that has also been applied to problems as diverse as auction bidding [5], program analysis [9], neural network architecture search [16], and radiation therapy design [6]. All of these domains exhibit combinatorial search spaces where even moderate-sized problems can easily exhaust computational resources, motivating the need for parallel approaches. SmartK is our novel parallel MCTS algorithm that achieves high performance on large clusters by adaptively partitioning the search space to efficiently utilize resources while limiting communication overhead.

Recent successes in Go [13, 8] and other domains have employed Monte Carlo tree search for exploring widely-branching search spaces. MCTS explores a game or other state space by incrementally growing a search tree, using feedback from random simulations to identify the most promising paths. The algorithm consists of four phases, repeated for a number of iterations (rollouts): the Selection phase starts at the root of the search tree and descends to child nodes based on an explore/exploit tradeoff until reaching a leaf node; the Expansion phase selects an unexplored action from the leaf and adds a child node for that action to the tree; the Simulation phase simulates random actions from the expanded node to an end state and computes a reward value for the simulated path; the Backpropagation phase computes a running average of reward values for all nodes on the path from the root to the expanded node. These updated rewards inform the weights computed in the explore/exploit tradeoff of future selection steps. MCTS terminates after a number of rollouts of these four phases, and then returns the best available action at the root of the tree.

Among past efforts to parallelize MCTS, the standard distributed-memory approach is root parallelization, which distributes independent copies of the MCTS algorithm across processes. Each process constructs a distinct search tree in parallel, communicating only its best solutions at the end of a number of rollouts to pick a next action. This approach has low communication overhead, but significant duplication of effort across processes.

Like root parallelization, our SmartK algorithm achieves low communication overhead by assigning independent MCTS tasks to processes. However, SmartK dramatically reduces search duplication by starting the parallel searches from many different nodes of the search tree. Its key idea is to use nodes' weights in the selection phase to determine the number of processes (the amount of computation) to allocate to exploring a node, directing more computational effort to more promising searches.

Our experiments with an MPI implementation of SmartK for playing Hex show that SmartK has a better win rate and scales better than other parallelization approaches.

2 RELATED WORK

The original MCTS algorithm is inherently sequential because the selection step of each rollout depends on the backpropagation results of the previous rollout. In practice, however, there is a benefit to increasing the number of rollouts even if they perform sub-optimal selections, which opens the door for parallel MCTS solutions. All parallel approaches make tradeoffs between faithful implementation of MCTS and scalability and speed.

The first approaches to parallel MCTS were root parallelization [3], where multiple threads run MCTS independently and combine their results at the end, and leaf parallelization [3], where a single thread traverses the tree but many threads run simulations in parallel for each expanded node. These methods each waste significant computational effort, and were largely supplanted by tree parallelization [4], where multiple threads simultaneously traverse a shared search tree.

Tree parallelization scales well in shared-memory systems [12] when the problem size fits in RAM. For larger problems, distributed-memory solutions are necessary for good performance. However, distributed tree parallelization suffers from large communication overheads related to sharing the single search tree. Transposition-table-driven search methods achieve better distributed tree parallelization by using Zobrist hashing to assign search tree nodes to processes [17], but still with significant communication overhead [7, 11].

Our approach achieves distributed-memory parallelization with low communication overhead by assigning responsibility for entire subtrees to individual processes. The most similar approach to ours in the literature is due to Świechowski and Mańdziuk [14], who assign to each compute node a subset of the actions available at the root of the search tree.
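The four phases described above can be sketched as a minimal sequential MCTS in Python. This is an illustrative sketch, not code from the paper's implementation: the `Node` class, the toy state interface (`actions`, `apply`, `reward`), and the helper names are our assumptions.

```python
import math
import random

class Node:
    """One search-tree node; value holds the running average of rewards."""
    def __init__(self, state, parent=None):
        self.state = state                    # opaque game state
        self.parent = parent
        self.children = []
        self.untried = list(state.actions())  # unexplored actions
        self.visits = 0
        self.value = 0.0

def ucb(child, c=math.sqrt(2)):
    # Explore/exploit weight (UCB1 [2]) used by the selection phase.
    return child.value + c * math.sqrt(math.log(child.parent.visits) / child.visits)

def rollout(root):
    # Selection: descend by UCB until a node with untried actions or a leaf.
    node = root
    while not node.untried and node.children:
        node = max(node.children, key=ucb)
    # Expansion: add a child for one unexplored action.
    if node.untried:
        node.children.append(Node(node.state.apply(node.untried.pop()), parent=node))
        node = node.children[-1]
    # Simulation: play random actions to an end state and compute its reward.
    state = node.state
    while state.actions():
        state = state.apply(random.choice(state.actions()))
    reward = state.reward()
    # Backpropagation: update the running reward average along the path.
    while node is not None:
        node.visits += 1
        node.value += (reward - node.value) / node.visits
        node = node.parent

def best_action_index(root):
    # After all rollouts, return the most-visited child of the root.
    return max(range(len(root.children)), key=lambda i: root.children[i].visits)
```

Any game exposing `actions`, `apply`, and `reward` can be plugged into this loop; the parallel schemes discussed below differ only in how rollouts of this loop are distributed across threads or processes.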

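Root parallelization's combine step can be sketched as follows. Pooling visit-weighted value averages and picking the most-visited action is one common merge rule; it is assumed here for illustration rather than taken from [3], and the function names are ours.

```python
def merge_root_stats(per_process_stats):
    """Combine per-process statistics for each action at the root.

    per_process_stats: list of dicts, one per independent MCTS process,
    mapping action -> (visits, average_reward).
    Returns a dict mapping action -> (total_visits, pooled_average).
    """
    merged = {}
    for stats in per_process_stats:
        for action, (visits, value) in stats.items():
            tot_visits, tot_weighted = merged.get(action, (0, 0.0))
            # Visit-weighted pooling: sum visits and visit-scaled values.
            merged[action] = (tot_visits + visits, tot_weighted + visits * value)
    return {a: (v, w / v) for a, (v, w) in merged.items()}

def best_action(merged):
    # Pick the action with the most combined visits (a robust-child rule).
    return max(merged, key=lambda a: merged[a][0])
```

Because each process communicates only this small per-action summary at the end of its rollouts, communication cost stays low, which is exactly why root parallelization remains attractive despite its duplicated search effort.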
Our method improves on this in two key ways: first, we consider much greater parallelism by assuming that multiple processes will be assigned to each subtree. Second, we communicate interim search state, allowing us to re-assign compute nodes to more promising subtrees. In this preliminary work, we follow the lead of Świechowski and Mańdziuk [14] in comparing performance primarily against root parallelization, the gold standard for truly distributed MCTS.

3 THE SMARTK ALGORITHM

Our novel parallel MCTS algorithm SmartK works by allocating to each child process a single node in the search tree that the child treats as the root of its own MCTS. Nodes are re-allocated to processes over a few rounds, based on node selection weights. A single promising search tree node may be allocated to multiple processes as the search proceeds. This approach achieves the joint goals of low communication overhead, because each process runs an independent instance of MCTS with minimal communication to the parent, and low duplication of search effort, because processes search their subtrees independently.

The SmartK algorithm consists of two phases. In the first, n child processes run root-parallel MCTS for m rollouts. Each process sends to the parent process value estimates and visit counts for each of the root node's children in the tree. The parent process merges these values from its children to produce node weights for Phase 2. In the second phase, each child process is assigned a tree node to explore and performs m MCTS rollouts rooted at that node. It then sends the value estimate and visit count for its node to the parent process, which recomputes node weights and re-assigns nodes to child processes for the next round. Phase 2 is repeated for k rounds. After Phase 2 completes, the parent process selects an action based on the final value estimates. SmartK re-assigns nodes once per round for k rounds of m rollouts, allowing it to direct the search towards more promising solutions.

To assign nodes in Phase 2, the parent process computes a selection weight for each child of the root. In our experiments, this is implemented by computing a normalized UCB weight [2] for each node so that the weights sum to one. Subsequently, nodes are assigned to the n child processes such that the fraction of processes exploring each node approximates the node's selection weight. In rounds after the first, nodes can be re-assigned to the same process when possible, to allow re-use of existing search trees. In principle, other weight functions could be used, and nodes deeper than children of the root could be assigned.

4 RESULTS

We evaluate SmartK in terms of its playing strength and scalability through experiments comparing it to root-parallel and sequential MCTS for playing the classic game of Hex [1]. Our experiments were run on the Comet HPC system at the San Diego Supercomputer Center [10] through XSEDE [15]. All MCTS variants are implemented in Python and MPI (mpi4py).

We compared win percentages for different combinations of players, degrees of parallelism, and board sizes, and evaluated scalability in terms of memory usage, game nodes expanded, and time per game move. In our experiments we varied the number of processes from 8 up to 512 for three Hex board sizes (8x8, 11x11, 14x14), noting that the search space grows exponentially with board size. The SmartK player completes its total rollouts in multiple rounds (2^4 rounds with 2^12 rollouts per round), whereas the root player completes its total rollouts in a single round (2^16 rollouts). Due to space considerations, we do not include tables of our main results, but instead summarize them, providing key specific results to support our summary.

Our results show that SmartK outperforms root parallelization. As the first player, SmartK beats root for all board sizes and all degrees of parallelism: it wins 90% of the time with 128 processes on an 8x8 board, 74% of the time with 256 processes on an 11x11 board, and 68% of the time with 512 processes on a 14x14 board. SmartK is also a better second player than root (Hex has a first-mover advantage). In fact, it beats or ties root on the larger boards (winning 56% on 11x11 and 50% on 14x14). SmartK also handily beats a sequential player for all combinations, and does so with higher winning percentages than root.

We evaluated SmartK's scalability by comparing the algorithms' time to make one move, memory usage, and number of search nodes expanded. Root and SmartK perform better than sequential MCTS because they expand many more nodes in their game search: both expand numProcesses * numRollouts nodes versus sequential's numRollouts. Our results show that while SmartK and root expand the same number of nodes, SmartK does so more efficiently because it breaks its rollouts into separate rounds (e.g., 24.3 vs. 35.4 seconds per move on an 8x8 board, and 21.6s vs. 27.7s per move on 11x11). We also found that memory usage across board sizes and algorithms depends on the number of nodes expanded; it grows linearly with the number of processes (the degree of parallelism) and does not increase significantly for larger board sizes: SmartK uses 1.21GB per process for 8x8 boards and 1.22GB per process for 11x11 boards; root uses 1.37GB per process for 8x8 boards and 1.49GB per process for 11x11 boards.

Our experiments show that SmartK achieves a better winning percentage than root parallelization across problem sizes and degrees of parallelism. They also show that both root and SmartK scale well to large problem sizes, making use of large numbers of processes distributed over many compute nodes. However, SmartK makes better use of those resources, resulting in closer-to-optimal decisions.

Future directions of our work involve investigating a hybrid implementation that uses SmartK across compute nodes with intra-node shared-memory parallelism, and continuing to evaluate SmartK's scalability to larger problems and its performance on other MCTS applications.
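For concreteness, the Phase 2 weighting and allocation step described in Section 3 can be sketched as follows. This is our illustrative reading of the algorithm, not the paper's code: the dict-based child statistics, the largest-remainder rounding rule, and the assumption that rewards lie in [0, 1] (so raw UCB weights are nonnegative) are ours.

```python
import math

def selection_weights(children, c=math.sqrt(2)):
    """Normalized UCB [2] weights over the root's children (sum to one).

    children: list of dicts with "visits" and "value" (average reward)
    as merged by the parent from the child processes' reports.
    """
    total_visits = sum(ch["visits"] for ch in children)
    raw = [ch["value"] + c * math.sqrt(math.log(total_visits) / ch["visits"])
           for ch in children]
    s = sum(raw)
    return [w / s for w in raw]

def assign_processes(weights, n):
    """Allocate n child processes so each node's count approximates n * weight.

    Largest-remainder rounding keeps the total exactly n, so the fraction
    of processes exploring each node tracks its selection weight.
    """
    shares = [n * w for w in weights]
    counts = [int(s) for s in shares]
    remaining = n - sum(counts)
    by_remainder = sorted(range(len(shares)),
                          key=lambda i: shares[i] - counts[i], reverse=True)
    for i in by_remainder[:remaining]:
        counts[i] += 1
    return counts
```

The parent would recompute `selection_weights` after each round's reports and re-run `assign_processes`, preferring to leave a node on a process that already explored it so the existing subtree can be reused.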

REFERENCES
[1] Broderick Arneson, Ryan B. Hayward, and Philip Henderson. 2010. Monte Carlo tree search in Hex. IEEE Transactions on Computational Intelligence and AI in Games 2, 4 (2010), 251–258.
[2] Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. 2002. Finite-time analysis of the multiarmed bandit problem. Machine Learning 47, 2-3 (2002), 235–256.
[3] Tristan Cazenave and Nicolas Jouandeau. 2007. On the parallelization of UCT. In Proceedings of the Computer Games Workshop. Amsterdam, Netherlands, 93–101.
[4] Guillaume M. J. B. Chaslot, Mark H. M. Winands, and H. Jaap van den Herik. 2008. Parallel Monte-Carlo Tree Search. In Computers and Games, H. Jaap van den Herik, Xinhe Xu, Zongmin Ma, and Mark H. M. Winands (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 60–71.
[5] M M P Chowdhury, Tran Cao Son, Christopher Kiekintveld, and William Yeoh. 2018. Bidding strategy for periodic double auctions using Monte Carlo tree search. In Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, Vol. 3. IFAAMAS, Stockholm, Sweden, 1897–1899.
[6] Peng Dong, Hongcheng Liu, and Lei Xing. 2018. Monte Carlo tree search-based non-coplanar trajectory design for station parameter optimized radiation therapy (SPORT). Physics in Medicine and Biology 63, 13 (2018). https://doi.org/10.1088/1361-6560/aaca17
[7] Alonso Gragera and Vorapong Suppakitpaisarn. 2016. A Mapping Heuristic for Minimizing Message Latency in Massively Distributed MCTS. In Fourth International Symposium on Computing and Networking. IEEE, Hiroshima, Japan, 574–553.
[8] Dharshan Kumaran, Thore Graepel, Matthew Lai, David Silver, Marc Lanctot, Laurent Sifre, Thomas Hubert, Karen Simonyan, Ioannis Antonoglou, Timothy Lillicrap, Julian Schrittwieser, and Arthur Guez. 2018. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362, 6419 (2018), 1140–1144. https://doi.org/10.1126/science.aar6404
[9] Kasper Luckow, Corina S. Păsăreanu, and Willem Visser. 2018. Monte Carlo Tree Search for Finding Costly Paths in Programs. In International Conference on Software Engineering and Formal Methods, Vol. 10886 LNCS. Springer International Publishing, Toulouse, France, 123–138.
[10] San Diego Supercomputer Center. [n. d.]. The Comet Supercomputer. https://www.sdsc.edu.
[11] Lars Schaefers and Marco Platzner. 2014. Distributed Monte Carlo Tree Search: A novel technique and its application to computer Go. IEEE Transactions on Computational Intelligence and AI in Games 7, 4 (2014), 361–374.
[12] Richard B. Segal. 2011. On the Scalability of Parallel UCT. In Computers and Games, H. Jaap van den Herik, Hiroyuki Iida, and Aske Plaat (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 36–47.
[13] David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529, 7587 (2016), 484–489. https://doi.org/10.1038/nature16961
[14] Maciej Świechowski and Jacek Mańdziuk. 2016. A Hybrid Approach to Parallelization of Monte Carlo Tree Search in General Game Playing. In Challenging Problems and Solutions in Intelligent Systems. Springer, 199–215.
[15] J. Towns, T. Cockerill, M. Dahan, I. Foster, K. Gaither, A. Grimshaw, V. Hazlewood, S. Lathrop, D. Lifka, G. D. Peterson, R. Roskies, J. R. Scott, and N. Wilkins-Diehr. 2014. XSEDE: Accelerating Scientific Discovery. Computing in Science & Engineering 16, 5 (Sept.-Oct. 2014), 62–74. https://doi.org/10.1109/MCSE.2014.80
[16] Linnan Wang, Yiyang Zhao, and Yuu Jinnai. 2018. AlphaX: eXploring Neural Architectures with Deep Neural Networks and Monte Carlo Tree Search. CoRR abs/1805.07440 (2018). arXiv:1805.07440
[17] Kazuki Yoshizoe, Akihiro Kishimoto, Tomoyuki Kaneko, Haruhiro Yoshimoto, and Yutaka Ishikawa. 2011. Scalable distributed Monte-Carlo tree search. In Fourth Annual Symposium on Combinatorial Search. AAAI Press, Barcelona, Catalonia, Spain.