Applying Monte-Carlo Tree Search in HTN Planning

Total Page:16

File Type:pdf, Size:1020Kb

Applying Monte-Carlo Tree Search in HTN Planning The Thirteenth International Symposium on Combinatorial Search (SoCS 2020) Applying Monte-Carlo Tree Search in HTN Planning Julia Wichlacz, Daniel Holler,¨ Alvaro´ Torralba, Jorg¨ Hoffmann Saarland University, Saarland Informatics Campus, Saarbrucken,¨ Germany [email protected], [email protected], [email protected], [email protected] Abstract cost plans (sometimes with optimality guarantees depend- ing on the technique used, see e.g. Behnke, Holler,¨ and Bi- Search methods are useful in hierarchical task network undo, 2019b). Here we investigate Monte-Carlo Tree Search (HTN) planning to make performance less dependent on the (MCTS) as a new algorithmic alternative in HTN planning. domain knowledge provided, and to minimize plan costs. Here we investigate Monte-Carlo tree search (MCTS) as a MCTS is a well-known search paradigm, originating new algorithmic alternative in HTN planning. We implement in probabilistic planning problems (Kocsis and Szepesvari´ combinations of MCTS with heuristic search in Panda. We 2006; Keller and Helmert 2013), but successful also in furthermore investigate MCTS in JSHOP, to address lifted games (Silver et al. 2016) and in deterministic planning set- (non-grounded) planning, leveraging the fact that, in contrast tings (Trunda and Bartak´ 2013; Schulte and Keller 2014). to other search methods, MCTS does not require a grounded MCTS is essentially based on sampling the state space, feed- task representation. Our new methods yield coverage perfor- ing back rewards obtained at terminal states to inform the mance on par with the state of the art, but in addition can action decisions on earlier states. This converges to optimal effectively minimize plan cost over time. decisions in the limit, but its main prowess in practice is to take good action decisions in limited time, with an any- 1 Introduction time behavior where decisions improve over time. MCTS can also be combined with heuristic search, as systematized Hierarchical Task Network (HTN) planning complements by Keller and Helmert (2013) in their trial-based heuristic actions causing state transition like in classical planning tree search (THTS) framework. Here, the focus is to balance with a hierarchy on the things to do, the tasks. Abstract tasks exploration and exploitation during search. are not applicable directly and are decomposed in a pro- Our contribution here is the implementation and evalu- cess similar to the derivation of words from a formal gram- ation of MCTS in HTN planning. We do so in two HTN mar. Solutions need to satisfy constraints induced by de- planning systems, namely Panda (Bercher, Keen, and Bi- composition structure and state transition system. The mo- undo 2014) and SHOP (Nau et al. 1999). tivation behind the hierarchy is manifold: it allows the de- Panda is the state-of-the-art system incorporating heuris- scription of complex behavior (Erol, Nau, and Hendler 1994; tic search, thus facilitating a direct empirical comparison, Holler¨ et al. 2014), enables the communication with users on and allowing to leverage prior work on heuristic functions different levels of abstraction (see e.g. Behnke et al., 2019) (Holler¨ et al. 2018). We implement and evaluate THTS. or to add advice, i. e., to guide the search using human do- main knowledge (as e.g. exploited by Nau et al., 2003). SHOP (Nau et al. 1999; 2003) is the state-of-the-art sys- Traditionally, HTN planning systems relied mostly on the tem for lifted HTN planning, not requiring a grounded task informativeness of the domain knowledge provided, mak- representation. This serves to leverage the milder repre- ing do with simple search algorithms like depth-first search sentation requirements of MCTS: in difference to heuristic (Nau et al. 2003). Recent work lines have established more search and SAT, MCTS does not require a grounded task informed search mechanisms, namely sophisticated reacha- representation. This is a key advantage because grounded bility analysis (Bit-Monnot, Smith, and Do 2016; Behnke et representations quickly become prohibitively large when al. 2020), heuristic search (Bercher, Keen, and Biundo 2014; the object universe is large, and/or when predicate/action arity is high. This happens, for example, in natural lan- Bercher et al. 2017; Holler¨ et al. 2018), compilations to clas- guage generation (Koller and Stone 2007; Koller and Hoff- sical planning (Alford, Kuter, and Nau 2009; Alford et al. mann 2010) and in Minecraft planning (Roberts et al. 2017; 2016), and SAT (Behnke, Holler,¨ and Biundo 2018; 2019a; Wichlacz, Torralba, and Hoffmann 2019). Implementing Schreiber et al. 2019). These methods improve HTN plan- MCTS in JSHOP, a version of SHOP, with a number of adap- ning performance, make it less dependent on the domain tations for increased efficiency, we obtain the first non-blind knowledge, and provide it with the ability to search for low- HTN planner not requiring a grounded task representation. 1 Copyright c 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. 1Attempts have been made to compute heuristic functions with- 82 Evaluating our implementations on standard HTN bench- The problem definition contains an initial task network marks, we find that our new methods are generally on par tnI and an initial state s0. Some task network is a solution if with previous methods in terms of coverage, thus adding and only if it can be derived from tnI via the decomposition new technology options into the HTN arsenal. More specif- process as described before, it contains only primitive tasks, ically, in Panda, it turns out that the exploration term can and there is a (totally ordered) sequence of the tasks in line yield benefits beyond purely relying on heuristics to guide with its ordering definition that is executable in s0. There are search. In JSHOP, a na¨ıve implementation of MCTS lags be- two points that we want to emphasize: hind in coverage, but our adaptations – in particular, using • A planner is not allowed to insert additional actions apart backtracking on MCTS samples to escape shallow dead-end from those included due to decomposition. paths – remove that disadvantage. To evaluate plan-cost minimization over time without • Usually, there is no state-based goal definition given, the guidance of a heuristic function, we create a version of objective of the problem is defined in terms of the tasks JSHOP that continues search after the first plan is found. contained in tnI . Comparing MCTS against this system, we find that MCTS There are several restrictions that define subclasses of the is more effective at improving plan cost over time. This is es- given HTN definition (Bercher, Alford, and Holler¨ 2019), pecially the case on benchmarks aimed at featuring variance mostly on the ordering definitions in methods or on recur- in action costs, including a number of Minecraft benchmark sion allowed in decomposition. The two systems we build studies that we design for lifted planning. on differ in terms of the HTN variants they deal with. Panda The paper is organized as follows. Section 2 gives the re- allows for recursive, partially ordered models. The version quired background in HTN planning, MCTS, and THTS. of SHOP we build on allows for recursion but is restricted to Section 3 describes our THTS implementation in Panda, totally ordered models, i. e. models where all task networks which we empirically evaluate in Section 4. Section 5 de- in method definitions and the initial task network are totally scribes our adaptation of MCTS in JSHOP, Section 6 gives ordered. its evaluation. Section 7 concludes the paper. Monte-Carlo Tree Search (MCTS) 2 Background Monte-Carlo tree search (MCTS) is a family of algo- In this section we give a brief introduction to HTN planning rithms (Browne et al. 2012) that have been very success- and Monte-Carlo Tree Search search. ful in Games and probabilistic planning on MDPs settings. Many variants of MCTS, or algorithms containing elements HTN Planning thereof, have been proposed. Generally, MCTS considers a search tree. This is initialized with a root node, then itera- In Hierarchical Task Network (HTN) planning (see e.g. tively expanded, performing four operations at each step: Ghallab, Nau, and Traverso, 2004, Chapter 11, or Bercher, Alford, and Holler,¨ 2019), there are two types of tasks, ab- • Selection: Traverse the tree from the root to a leaf, select- stract tasks and primitive tasks (also called actions). Like in ing actions according to a strategy that balances explo- classical planning, actions are directly applicable to the en- ration (visit nodes that have been explored less often) and vironment and cause state transition. In its most basic defi- exploitation (visit more promising nodes more often). The nition, the environment is described using a finite set of state UCT algorithm defines a popular selection strategy tak- features, but there are also systems supporting numbers (see ing inspiration from multi-armed bandit problems (Kocsis e.g. Nau et al., 2003) and time (e.g. Bit-Monnot, Smith, and and Szepesvari´ 2006). Do, 2016). Tasks are maintained in task networks, which are • Expansion: Generate the successors of the selected leaf partially ordered multisets of tasks. node. Abstract tasks are not directly executable, but are decom- posed into other tasks (that may be primitive or abstract) • Evaluation: Estimate the solution cost (reward/quality) of using decomposition methods. A method definition includes a newly generated node. This is done by roll-outs, sim- the task it is applicable to and a task network that defines ulations following a random action policy, until a termi- its subtasks. Usually, there is more than one method for a nal state is reached. (But a link to heuristic search can be given abstract task, resulting in the combinatorial problem made here, see below.) of choosing the right method to apply to a given task.
Recommended publications
  • Monte-Carlo Tree Search Solver
    Monte-Carlo Tree Search Solver Mark H.M. Winands1, Yngvi BjÄornsson2, and Jahn-Takeshi Saito1 1 Games and AI Group, MICC, Universiteit Maastricht, Maastricht, The Netherlands fm.winands,[email protected] 2 School of Computer Science, Reykjav¶³kUniversity, Reykjav¶³k,Iceland [email protected] Abstract. Recently, Monte-Carlo Tree Search (MCTS) has advanced the ¯eld of computer Go substantially. In this article we investigate the application of MCTS for the game Lines of Action (LOA). A new MCTS variant, called MCTS-Solver, has been designed to play narrow tacti- cal lines better in sudden-death games such as LOA. The variant di®ers from the traditional MCTS in respect to backpropagation and selection strategy. It is able to prove the game-theoretical value of a position given su±cient time. Experiments show that a Monte-Carlo LOA program us- ing MCTS-Solver defeats a program using MCTS by a winning score of 65%. Moreover, MCTS-Solver performs much better than a program using MCTS against several di®erent versions of the world-class ®¯ pro- gram MIA. Thus, MCTS-Solver constitutes genuine progress in using simulation-based search approaches in sudden-death games, signi¯cantly improving upon MCTS-based programs. 1 Introduction For decades ®¯ search has been the standard approach used by programs for playing two-person zero-sum games such as chess and checkers (and many oth- ers). Over the years many search enhancements have been proposed for this framework. However, in some games where it is di±cult to construct an accurate positional evaluation function (e.g., Go) the ®¯ approach was hardly success- ful.
    [Show full text]
  • Αβ-Based Play-Outs in Monte-Carlo Tree Search
    αβ-based Play-outs in Monte-Carlo Tree Search Mark H.M. Winands Yngvi Bjornsson¨ Abstract— Monte-Carlo Tree Search (MCTS) is a recent [9]. In the context of game playing, Monte-Carlo simulations paradigm for game-tree search, which gradually builds a game- were first used as a mechanism for dynamically evaluating tree in a best-first fashion based on the results of randomized the merits of leaf nodes of a traditional αβ-based search [10], simulation play-outs. The performance of such an approach is highly dependent on both the total number of simulation play- [11], [12], but under the new paradigm MCTS has evolved outs and their quality. The two metrics are, however, typically into a full-fledged best-first search procedure that replaces inversely correlated — improving the quality of the play- traditional αβ-based search altogether. MCTS has in the past outs generally involves adding knowledge that requires extra couple of years substantially advanced the state-of-the-art computation, thus allowing fewer play-outs to be performed in several game domains where αβ-based search has had per time unit. The general practice in MCTS seems to be more towards using relatively knowledge-light play-out strategies for difficulties, in particular computer Go, but other domains the benefit of getting additional simulations done. include General Game Playing [13], Amazons [14] and Hex In this paper we show, for the game Lines of Action (LOA), [15]. that this is not necessarily the best strategy. The newest version The right tradeoff between search and knowledge equally of our simulation-based LOA program, MC-LOAαβ , uses a applies to MCTS.
    [Show full text]
  • Smartk: Efficient, Scalable and Winning Parallel MCTS
    SmartK: Efficient, Scalable and Winning Parallel MCTS Michael S. Davinroy, Shawn Pan, Bryce Wiedenbeck, and Tia Newhall Swarthmore College, Swarthmore, PA 19081 ACM Reference Format: processes. Each process constructs a distinct search tree in Michael S. Davinroy, Shawn Pan, Bryce Wiedenbeck, and Tia parallel, communicating only its best solutions at the end of Newhall. 2019. SmartK: Efficient, Scalable and Winning Parallel a number of rollouts to pick a next action. This approach has MCTS. In SC ’19, November 17–22, 2019, Denver, CO. ACM, low communication overhead, but significant duplication of New York, NY, USA, 3 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnneffort across processes. 1 INTRODUCTION Like root parallelization, our SmartK algorithm achieves low communication overhead by assigning independent MCTS Monte Carlo Tree Search (MCTS) is a popular algorithm tasks to processes. However, SmartK dramatically reduces for game playing that has also been applied to problems as search duplication by starting the parallel searches from many diverse as auction bidding [5], program analysis [9], neural different nodes of the search tree. Its key idea is it uses nodes’ network architecture search [16], and radiation therapy de- weights in the selection phase to determine the number of sign [6]. All of these domains exhibit combinatorial search processes (the amount of computation) to allocate to ex- spaces where even moderate-sized problems can easily ex- ploring a node, directing more computational effort to more haust computational resources, motivating the need for par- promising searches. allel approaches. SmartK is our novel parallel MCTS algo- Our experiments with an MPI implementation of SmartK rithm that achieves high performance on large clusters, by for playing Hex show that SmartK has a better win rate and adaptively partitioning the search space to efficiently utilize scales better than other parallelization approaches.
    [Show full text]
  • Monte-Carlo Tree Search
    Monte-Carlo Tree Search PROEFSCHRIFT ter verkrijging van de graad van doctor aan de Universiteit Maastricht, op gezag van de Rector Magnificus, Prof. mr. G.P.M.F. Mols, volgens het besluit van het College van Decanen, in het openbaar te verdedigen op donderdag 30 september 2010 om 14:00 uur door Guillaume Maurice Jean-Bernard Chaslot Promotor: Prof. dr. G. Weiss Copromotor: Dr. M.H.M. Winands Dr. B. Bouzy (Universit´e Paris Descartes) Dr. ir. J.W.H.M. Uiterwijk Leden van de beoordelingscommissie: Prof. dr. ir. R.L.M. Peeters (voorzitter) Prof. dr. M. Muller¨ (University of Alberta) Prof. dr. H.J.M. Peters Prof. dr. ir. J.A. La Poutr´e (Universiteit Utrecht) Prof. dr. ir. J.C. Scholtes The research has been funded by the Netherlands Organisation for Scientific Research (NWO), in the framework of the project Go for Go, grant number 612.066.409. Dissertation Series No. 2010-41 The research reported in this thesis has been carried out under the auspices of SIKS, the Dutch Research School for Information and Knowledge Systems. ISBN 978-90-8559-099-6 Printed by Optima Grafische Communicatie, Rotterdam. Cover design and layout finalization by Steven de Jong, Maastricht. c 2010 G.M.J-B. Chaslot. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronically, mechanically, photocopying, recording or otherwise, without prior permission of the author. Preface When I learnt the game of Go, I was intrigued by the contrast between the simplicity of its rules and the difficulty to build a strong Go program.
    [Show full text]
  • CS234 Notes - Lecture 14 Model Based RL, Monte-Carlo Tree Search
    CS234 Notes - Lecture 14 Model Based RL, Monte-Carlo Tree Search Anchit Gupta, Emma Brunskill June 14, 2018 1 Introduction In this lecture we will learn about model based RL and simulation based tree search methods. So far we have seen methods which attempt to learn either a value function or a policy from experience. In contrast model based approaches first learn a model of the world from experience and then use this for planning and acting. Model-based approaches have been shown to have better sample efficiency and faster convergence in certain settings. We will also have a look at MCTS and its variants which can be used for planning given a model. MCTS was one of the main ideas behind the success of AlphaGo. Figure 1: Relationships among learning, planning, and acting 2 Model Learning By model we mean a representation of an MDP < S; A; R; T; γ> parametrized by η. In the model learning regime we assume that the state and action space S; A are known and typically we also assume conditional independence between state transitions and rewards i.e P [st+1; rt+1jst; at] = P [st+1jst; at]P [rt+1jst; at] Hence learning a model consists of two main parts the reward function R(:js; a) and the transition distribution P (:js; a). k k k k K Given a set of real trajectories fSt ;At Rt ; :::; ST gk=1 model learning can be posed as a supervised learning problem. Learning the reward function R(s; a) is a regression problem whereas learning the transition function P (s0js; a) is a density estimation problem.
    [Show full text]
  • An Analysis of Monte Carlo Tree Search
    Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17) An Analysis of Monte Carlo Tree Search Steven James,∗ George Konidaris,† Benjamin Rosman∗‡ ∗University of the Witwatersrand, Johannesburg, South Africa †Brown University, Providence RI 02912, USA ‡Council for Scientific and Industrial Research, Pretoria, South Africa [email protected], [email protected], [email protected] Abstract UCT calls for rollouts to be performed by randomly se- lecting actions until a terminal state is reached. The outcome Monte Carlo Tree Search (MCTS) is a family of directed of the simulation is then propagated to the root of the tree. search algorithms that has gained widespread attention in re- cent years. Despite the vast amount of research into MCTS, Averaging these results over many iterations performs re- the effect of modifications on the algorithm, as well as the markably well, despite the fact that actions during the sim- manner in which it performs in various domains, is still not ulation are executed completely at random. As the outcome yet fully known. In particular, the effect of using knowledge- of the simulations directly affects the entire algorithm, one heavy rollouts in MCTS still remains poorly understood, with might expect that the manner in which they are performed surprising results demonstrating that better-informed rollouts has a major effect on the overall strength of the algorithm. often result in worse-performing agents. We present exper- A natural assumption to make is that completely random imental evidence suggesting that, under certain smoothness simulations are not ideal, since they do not map to realistic conditions, uniformly random simulation policies preserve actions.
    [Show full text]
  • CS 188: Artificial Intelligence Game Playing State-Of-The-Art
    CS 188: Artificial Intelligence Game Playing State-of-the-Art § Checkers: 1950: First computer player. 1994: First Adversarial Search computer champion: Chinook ended 40-year-reign of human champion Marion Tinsley using complete 8-piece endgame. 2007: Checkers solved! § Chess: 1997: Deep Blue defeats human champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used very sophisticated evaluation and undisclosed methods for extending some lines of search up to 40 ply. Current programs are even better, if less historic. § Go: Human champions are now starting to be challenged by machines. In go, b > 300! Classic programs use pattern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion methods. Instructors: Pieter Abbeel & Dan Klein University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley (ai.berkeley.edu).] Game Playing State-of-the-Art Behavior from Computation § Checkers: 1950: First computer player. 1994: First computer champion: Chinook ended 40-year-reign of human champion Marion Tinsley using complete 8-piece endgame. 2007: Checkers solved! § Chess: 1997: Deep Blue defeats human champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used very sophisticated evaluation and undisclosed methods for extending some lines of search up to 40 ply. Current programs are even better, if less historic. § Go: 2016: Alpha GO defeats human champion. Uses Monte Carlo Tree Search, learned
    [Show full text]
  • A Survey of Monte Carlo Tree Search Methods
    IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES, VOL. 4, NO. 1, MARCH 2012 1 A Survey of Monte Carlo Tree Search Methods Cameron Browne, Member, IEEE, Edward Powley, Member, IEEE, Daniel Whitehouse, Member, IEEE, Simon Lucas, Senior Member, IEEE, Peter I. Cowling, Member, IEEE, Philipp Rohlfshagen, Stephen Tavener, Diego Perez, Spyridon Samothrakis and Simon Colton Abstract—Monte Carlo Tree Search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm’s derivation, impart some structure on the many variations and enhancements that have been proposed, and summarise the results from the key game and non-game domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work. Index Terms—Monte Carlo Tree Search (MCTS), Upper Confidence Bounds (UCB), Upper Confidence Bounds for Trees (UCT), Bandit-based methods, Artificial Intelligence (AI), Game search, Computer Go. F 1 INTRODUCTION ONTE Carlo Tree Search (MCTS) is a method for M finding optimal decisions in a given domain by taking random samples in the decision space and build- ing a search tree according to the results. It has already had a profound impact on Artificial Intelligence (AI) approaches for domains that can be represented as trees of sequential decisions, particularly games and planning problems.
    [Show full text]
  • Monte Carlo *-Minimax Search
    Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence Monte Carlo *-Minimax Search Marc Lanctot Abdallah Saffidine Department of Knowledge Engineering LAMSADE, Maastricht University, Netherlands Universite´ Paris-Dauphine, France [email protected] abdallah.saffi[email protected] Joel Veness and Christopher Archibald, Mark H. M. Winands Department of Computing Science Department of Knowledge Engineering University of Alberta, Canada Maastricht University, Netherlands fveness@cs., [email protected] [email protected] Abstract the search enhancements from the classic αβ literature can- not be easily adapted to MCTS. The classic algorithms for This paper introduces Monte Carlo *-Minimax stochastic games, EXPECTIMAX and *-Minimax (Star1 and Search (MCMS), a Monte Carlo search algorithm Star2), perform look-ahead searches to a limited depth. How- for turned-based, stochastic, two-player, zero-sum ever, the running time of these algorithms scales exponen- games of perfect information. The algorithm is de- tially in the branching factor at chance nodes as the search signed for the class of densely stochastic games; horizon is increased. Hence, their performance in large games that is, games where one would rarely expect to often depends heavily on the quality of the heuristic evalua- sample the same successor state multiple times at tion function, as only shallow searches are possible. any particular chance node. Our approach com- bines sparse sampling techniques from MDP plan- One way to handle the uncertainty at chance nodes would ning with classic pruning techniques developed for be forward pruning [Smith and Nau, 1993], but the perfor- adversarial expectimax planning. We compare and mance gain until now has been small [Schadd et al., 2009].
    [Show full text]
  • Using Monte Carlo Tree Search As a Demonstrator Within Asynchronous Deep RL
    Using Monte Carlo Tree Search as a Demonstrator within Asynchronous Deep RL Bilal Kartal∗, Pablo Hernandez-Leal∗ and Matthew E. Taylor fbilal.kartal,pablo.hernandez,[email protected] Borealis AI, Edmonton, Canada Abstract There are several ways to get the best of both DRL and search methods. For example, AlphaGo Zero (Silver et al. Deep reinforcement learning (DRL) has achieved great suc- 2017) and Expert Iteration (Anthony, Tian, and Barber 2017) cesses in recent years with the help of novel methods and concurrently proposed the idea of combining DRL and higher compute power. However, there are still several chal- MCTS in an imitation learning framework where both com- lenges to be addressed such as convergence to locally opti- mal policies and long training times. In this paper, firstly, ponents improve each other. These works combine search we augment Asynchronous Advantage Actor-Critic (A3C) and neural networks sequentially. First, search is used to method with a novel self-supervised auxiliary task, i.e. Ter- generate an expert move dataset which is used to train a pol- minal Prediction, measuring temporal closeness to terminal icy network (Guo et al. 2014). Second, this network is used states, namely A3C-TP. Secondly, we propose a new frame- to improve expert search quality (Anthony, Tian, and Barber work where planning algorithms such as Monte Carlo tree 2017), and this is repeated in a loop. However, expert move search or other sources of (simulated) demonstrators can be data collection by vanilla search algorithms can be slow in a integrated to asynchronous distributed DRL methods. Com- sequential framework (Guo et al.
    [Show full text]
  • Comparission Study of MCTS Versus Alpha-Beta Methods
    Charles University in Prague Faculty of Mathematics and Physics BACHELOR THESIS Tom´aˇsJakl Arimaa challenge – comparission study of MCTS versus alpha-beta methods Department of Theoretical Computer Science and Mathematical Logic Supervisor of the bachelor thesis: Mgr. Vladan Majerech, Dr. Study programme: Computer Science Specialization: General Computer Science Prague 2011 I would like to thank to my family for their support, and to Jakub Slav´ık and Jan Vanˇeˇcek for letting me stay in their flat at the end of the work. I would also like to thank to my supervisor for very useful and relevant comments he gave me. I declare that I carried out this bachelor thesis independently, and only with the cited sources, literature and other professional sources. I understand that my work relates to the rights and obligations under the Act No. 121/2000 Coll., the Copyright Act, as amended, in particular the fact that the Charles University in Prague has the right to conclude a license agreement on the use of this work as a school work pursuant to Section 60 paragraph 1 of the Copyright Act. In Prague date August 5, 2011 Tom´aˇsJakl Title: Arimaa challenge – comparission study of MCTS versus alpha-beta methods Author: Tom´aˇsJakl Department: Department of Theoretical Computer Science and Mathematical Logic Supervisor: Mgr. Vladan Majerech, Dr., Department of Theoretical Computer Science and Mathematical Logic Abstract: In the world of chess programming the most successful algorithm for game tree search is considered AlphaBeta search, however in game of Go it is Monte Carlo Tree Search. The game of Arimaa has similarities with both Go and Chess, but there has been no successful program using Monte Carlo Tree Search so far.
    [Show full text]
  • The Grand Challenge of Computer Go: Monte Carlo Tree Search and Extensions
    The Grand Challenge of Computer Go: Monte Carlo Tree Search and Extensions Sylvain Gelly*, EPC TAO, INRIA Saclay & LRI Marc Schoenauer, Bât. 490, Université Paris-Sud Michèle Sebag, 91405 Orsay Cedex, France Olivier Teytaud *Now in Google Zurich Levente Kocsis David Silver Csaba Szepesvári MTA SZTAKI University College London University of Alberta Kende u. 13–17. Dept. of Computer Science Dept. of Computing Science Budapest 1111, Hungary London, WC1E 6BT, UK Edmonton T6G 2E8, Canada ABSTRACT Go. The size of the search space in Go is so large that it The ancient oriental game of Go has long been considered a defies brute force search. Furthermore, it is hard to charac- grand challenge for artificial intelligence. For decades, com- terize the strength of a position or move. For these reasons, puter Go has defied the classical methods in game tree search Go remains a grand challenge of artificial intelligence; the that worked so successfully for chess and checkers. How- field awaits a triumph analogous to the 1997 chess match in ever, recent play in computer Go has been transformed by a which Deep Blue defeated the world champion Gary Kas- new paradigm for tree search based on Monte-Carlo meth- parov [22]. ods. Programs based on Monte-Carlo tree search now play The last five years have witnessed considerable progress in at human-master levels and are beginning to challenge top computer Go. With the development of new Monte-Carlo professional players. In this paper we describe the leading algorithms for tree search [12, 18], computer Go programs algorithms for Monte-Carlo tree search and explain how they have achieved some remarkable successes [15, 12], including have advanced the state of the art in computer Go.
    [Show full text]