SmartK: Efficient, Scalable and Winning Parallel MCTS

Michael S. Davinroy, Shawn Pan, Bryce Wiedenbeck, and Tia Newhall
Swarthmore College, Swarthmore, PA 19081

ACM Reference Format:
Michael S. Davinroy, Shawn Pan, Bryce Wiedenbeck, and Tia Newhall. 2019. SmartK: Efficient, Scalable and Winning Parallel MCTS. In SC '19, November 17–22, 2019, Denver, CO. ACM, New York, NY, USA, 3 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

1 INTRODUCTION

Monte Carlo Tree Search (MCTS) is a popular algorithm for game playing that has also been applied to problems as diverse as auction bidding [5], program analysis [9], neural network architecture search [16], and radiation therapy design [6]. All of these domains exhibit combinatorial search spaces where even moderate-sized problems can easily exhaust computational resources, motivating the need for parallel approaches. SmartK is our novel parallel MCTS algorithm that achieves high performance on large clusters by adaptively partitioning the search space, using resources efficiently while limiting communication overhead.

Recent successes in go [13], chess [8], and other domains have employed Monte Carlo tree search to explore widely-branching search spaces. MCTS explores a game or other state space by incrementally growing a search tree, using feedback from random simulations to identify the most promising paths. The algorithm consists of four phases, repeated for a number of iterations (rollouts): the Selection phase starts at the root of the search tree and descends to child nodes based on an explore/exploit tradeoff until reaching a leaf node; the Expansion phase selects an unexplored action from the leaf and adds a child node for that action to the tree; the Simulation phase simulates random actions from the expanded node to an end state and computes a reward value for the simulated path; the Backpropagation phase updates a running average of reward values for all nodes on the path from the root to the expanded node. These updated rewards inform the weights computed in the explore/exploit tradeoff of future selection steps. MCTS terminates after a number of rollouts of repeating these four phases, and then returns the best available action at the root of the tree.
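To make the four phases concrete, the following is a minimal sequential sketch of one rollout with UCT-style selection weights. The Node class and the game-state interface it assumes (legal_actions, apply, is_terminal, reward) are illustrative, not the paper's implementation.

    import math
    import random

    class Node:
        # Illustrative search-tree node; the game state is assumed to
        # expose legal_actions(), apply(action), is_terminal(), reward().
        def __init__(self, state, parent=None):
            self.state = state
            self.parent = parent
            self.children = []
            self.untried = list(state.legal_actions())
            self.visits = 0
            self.total_reward = 0.0

        def ucb(self, c=math.sqrt(2)):
            # Explore/exploit weight used in the Selection phase.
            if self.visits == 0:
                return float("inf")
            return (self.total_reward / self.visits
                    + c * math.sqrt(math.log(self.parent.visits) / self.visits))

    def one_rollout(root):
        # Selection: descend by explore/exploit weights to a leaf.
        node = root
        while not node.untried and node.children:
            node = max(node.children, key=Node.ucb)
        # Expansion: add a child for one unexplored action.
        if node.untried:
            action = node.untried.pop(random.randrange(len(node.untried)))
            child = Node(node.state.apply(action), parent=node)
            node.children.append(child)
            node = child
        # Simulation: random playout from the expanded node to an end state.
        state = node.state
        while not state.is_terminal():
            state = state.apply(random.choice(state.legal_actions()))
        reward = state.reward()
        # Backpropagation: update running reward averages up to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent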
Among past efforts to parallelize MCTS, the standard distributed-memory approach is root parallelization, which distributes independent copies of the MCTS algorithm across processes. Each process constructs a distinct search tree in parallel, communicating only its best solutions at the end of a number of rollouts to pick a next action. This approach has low communication overhead, but significant duplication of effort across processes.

Like root parallelization, our SmartK algorithm achieves low communication overhead by assigning independent MCTS tasks to processes. However, SmartK dramatically reduces search duplication by starting the parallel searches from many different nodes of the search tree. Its key idea is to use nodes' weights in the selection phase to determine the number of processes (the amount of computation) to allocate to exploring a node, directing more computational effort to more promising searches.

Our experiments with an MPI implementation of SmartK for playing Hex show that SmartK has a better win rate and scales better than other parallelization approaches.

2 RELATED WORK

The original MCTS algorithm is inherently sequential, because the selection step of each rollout depends on the backpropagation results of the previous rollout. In practice, however, there is a benefit to increasing the number of rollouts even if they perform sub-optimal selections, which opens the door to parallel MCTS solutions. All parallel approaches trade off between faithful implementation of MCTS and scalability and speed.

The first approaches to parallel MCTS were root parallelization [3], where multiple threads run MCTS independently and combine their results at the end, and leaf parallelization [3], where one thread traverses the tree but many threads run simulations in parallel for each expanded node. These methods each waste significant computational effort, and were largely supplanted by tree parallelization [4], where multiple threads simultaneously traverse a shared search tree. Tree parallelization scales well in shared-memory systems [12] when the problem fits in RAM. For larger problems, distributed-memory solutions are necessary for good performance. However, distributed tree parallelization suffers from large communication overheads related to sharing the single search tree. Transposition-table-driven search methods achieve better distributed tree parallelization by using Zobrist hashing to assign search tree nodes to processes [17], but still incur significant communication overhead [7, 11].

Our approach achieves distributed-memory parallelization with low communication overhead by assigning responsibility for entire subtrees to individual processes. The most similar approach to ours in the literature is due to Świechowski and Mańdziuk [14], who assign to each compute node a subset of the actions available at the root of the search tree. Our method improves on this in two key ways: first, we consider much greater parallelism by assuming that multiple processes will be assigned to each subtree; second, we communicate interim search state, allowing us to re-assign compute nodes to more-promising subtrees. In this preliminary work, we follow the lead of Świechowski and Mańdziuk [14] in comparing performance primarily against root parallelization, the gold standard for truly distributed MCTS.

3 THE SMARTK ALGORITHM

Our novel parallel MCTS algorithm, SmartK, works by allocating to each child process a single node in the search tree that the child treats as the root of its own MCTS. Nodes are re-allocated to processes over a few rounds, based on node selection weights, and a single promising search tree node may be allocated to multiple processes as the search proceeds. This approach achieves the joint goals of low communication overhead, because each process runs an independent instance of MCTS with minimal communication to the parent, and low duplication of search effort, because processes search their subtrees independently.

The SmartK algorithm consists of two phases. In the first, n child processes run root-parallel MCTS for m rollouts. Each process sends the parent process value estimates and visit counts for each of the root node's children in the tree. The parent process merges these values to produce node weights for Phase 2. In the second phase, each child process is assigned a tree node to explore and performs m MCTS rollouts rooted at that node. It then sends the value estimate and visit count for its node to the parent process, which recomputes node weights and re-assigns nodes to child processes for the next round. Phase 2 is repeated for k rounds, so SmartK re-assigns nodes once per round for k rounds of m rollouts. After Phase 2 completes, the parent process selects an action based on the final value estimates.
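The following is a minimal mpi4py sketch of this round structure, with the parent merging reports and re-assigning nodes each round. The helpers run_mcts, assign_nodes, merge, and best_action are hypothetical names for the pieces described above; the paper does not specify its message formats or weighting function, and for simplicity every rank (including the parent) also searches here.

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    PARENT = 0

    def smartk_move(root, m, k, run_mcts, assign_nodes, merge, best_action):
        # run_mcts(node, m): m rollouts from node -> {child: (value, visits)}
        # assign_nodes(weights, p): p node assignments, with more processes
        #     going to higher-weighted (more promising) nodes
        # merge(weights, reports): fold workers' reports into node weights
        rank = comm.Get_rank()

        # Phase 1: every process runs root-parallel MCTS for m rollouts and
        # reports per-child value estimates and visit counts to the parent.
        reports = comm.gather(run_mcts(root, m), root=PARENT)

        weights = None
        for _ in range(k):  # Phase 2, repeated for k rounds
            if rank == PARENT:
                # Merge reports into node weights and re-assign nodes; a
                # promising node may be assigned to several processes.
                weights = merge(weights, reports)
                assignments = assign_nodes(weights, comm.Get_size())
            else:
                assignments = None
            my_node = comm.scatter(assignments, root=PARENT)
            reports = comm.gather(run_mcts(my_node, m), root=PARENT)

        if rank == PARENT:  # pick the move with the best final estimate
            return best_action(merge(weights, reports))
        return None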
4 EVALUATION

All MCTS variants are implemented in Python and MPI (mpi4py). We compared win percentages for different combinations of players, degrees of parallelism, and board sizes, and evaluated scalability in terms of memory usage, game nodes expanded, and time per game move. In our experiments we varied the number of processes from 8 up to 512 for three Hex board sizes (8x8, 11x11, and 14x14), noting that the search space grows exponentially with board size. The SmartK player completes its total rollouts in multiple rounds (2^4 rounds with 2^12 rollouts per round), whereas the root player completes the same 2^16 total rollouts in a single round. Due to space considerations, we do not include tables of our main results, but instead summarize them, providing key specific results to support our summary.

Our results show that SmartK outperforms root parallelization. As the first player, SmartK beats root for all board sizes and all degrees of parallelism: it wins 90% of the time with 128 processes on an 8x8 board, 74% of the time with 256 processes on an 11x11 board, and 68% of the time with 512 processes on a 14x14 board. SmartK is also a better second player than root (Hex has a first-mover advantage); in fact, it beats or ties root on the larger boards, winning 56% on 11x11 and 50% on 14x14. SmartK also handily beats a sequential player for all combinations, and does so with higher winning percentages than root.

We evaluated SmartK's scalability by comparing the algorithms' time to make one move, memory usage, and number of search nodes expanded. Root and SmartK perform better than sequential because they expand many more nodes in their game search: each expands numProcesses * numRollouts nodes vs. sequential's numRollouts. Our results show that while SmartK and root expand the same number of nodes, SmartK does so in a more efficient manner due to its breaking up rollouts into separate rounds (ex.
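As a quick sanity check on these rollout and node counts (our own arithmetic, using only the parameters reported above):

    # Both players perform the same total rollouts per move:
    smartk_rounds, rollouts_per_round = 2**4, 2**12
    root_rollouts = 2**16
    assert smartk_rounds * rollouts_per_round == root_rollouts  # 65,536 each

    # Nodes expanded per move: numProcesses * numRollouts for the parallel
    # players vs. numRollouts for sequential, e.g. at 512 processes:
    print(512 * root_rollouts, "vs.", root_rollouts)  # 33554432 vs. 65536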