A Survey of Monte Carlo Tree Search Methods
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Monte-Carlo Tree Search Solver
Monte-Carlo Tree Search Solver Mark H.M. Winands1, Yngvi BjÄornsson2, and Jahn-Takeshi Saito1 1 Games and AI Group, MICC, Universiteit Maastricht, Maastricht, The Netherlands fm.winands,[email protected] 2 School of Computer Science, Reykjav¶³kUniversity, Reykjav¶³k,Iceland [email protected] Abstract. Recently, Monte-Carlo Tree Search (MCTS) has advanced the ¯eld of computer Go substantially. In this article we investigate the application of MCTS for the game Lines of Action (LOA). A new MCTS variant, called MCTS-Solver, has been designed to play narrow tacti- cal lines better in sudden-death games such as LOA. The variant di®ers from the traditional MCTS in respect to backpropagation and selection strategy. It is able to prove the game-theoretical value of a position given su±cient time. Experiments show that a Monte-Carlo LOA program us- ing MCTS-Solver defeats a program using MCTS by a winning score of 65%. Moreover, MCTS-Solver performs much better than a program using MCTS against several di®erent versions of the world-class ®¯ pro- gram MIA. Thus, MCTS-Solver constitutes genuine progress in using simulation-based search approaches in sudden-death games, signi¯cantly improving upon MCTS-based programs. 1 Introduction For decades ®¯ search has been the standard approach used by programs for playing two-person zero-sum games such as chess and checkers (and many oth- ers). Over the years many search enhancements have been proposed for this framework. However, in some games where it is di±cult to construct an accurate positional evaluation function (e.g., Go) the ®¯ approach was hardly success- ful. -
Αβ-Based Play-Outs in Monte-Carlo Tree Search
αβ-based Play-outs in Monte-Carlo Tree Search Mark H.M. Winands Yngvi Bjornsson¨ Abstract— Monte-Carlo Tree Search (MCTS) is a recent [9]. In the context of game playing, Monte-Carlo simulations paradigm for game-tree search, which gradually builds a game- were first used as a mechanism for dynamically evaluating tree in a best-first fashion based on the results of randomized the merits of leaf nodes of a traditional αβ-based search [10], simulation play-outs. The performance of such an approach is highly dependent on both the total number of simulation play- [11], [12], but under the new paradigm MCTS has evolved outs and their quality. The two metrics are, however, typically into a full-fledged best-first search procedure that replaces inversely correlated — improving the quality of the play- traditional αβ-based search altogether. MCTS has in the past outs generally involves adding knowledge that requires extra couple of years substantially advanced the state-of-the-art computation, thus allowing fewer play-outs to be performed in several game domains where αβ-based search has had per time unit. The general practice in MCTS seems to be more towards using relatively knowledge-light play-out strategies for difficulties, in particular computer Go, but other domains the benefit of getting additional simulations done. include General Game Playing [13], Amazons [14] and Hex In this paper we show, for the game Lines of Action (LOA), [15]. that this is not necessarily the best strategy. The newest version The right tradeoff between search and knowledge equally of our simulation-based LOA program, MC-LOAαβ , uses a applies to MCTS. -
A New Look at the Tayler by David Kane
A New Look at the Tayler by David kane I: Introduction The Tayler Variation (aka the Tayler Opening) is a line that has been unjustly neglected, in my view. The line is of surprisingly recent vintage though it is often confused with the Inverted Hungarian (or Inverted Hanham Defense), a line which shares the same opening moves: 1. e4 e5 2. Nf3 Nc6 3. Be2: The Inverted Hungarian is an old opening, dating back to the 1860ʼs, at least. Tartakower played it a few times in the 1920ʼs with mixed results, using the continuation, 3...Nf6 4. d3: a rather unenterprising setup for White. In 1981 British player, John Tayler (see biographical note), published an article in the British publication Chess (vol. 46) on a line he had developed stemming from the sharp 4. d4!?. This is a move which apparently no one had thought to play before, and one that transforms the sedate Inverted Hungarian into something else altogether. Technically, it is really the Tayler Variation to the Inverted Hungarian Defense rather than the Tayler Opening, though through usage, the terms are interchangeable for all practical purposes. As has been so often the case when it comes to unorthodox lines, I first heard of this opening via Mike Basman when he published a cassette on it back in the early 80ʼs (still available through audiochess.com). Tayler 2 The line stirred some interest at the time but gradually seems to have been forgotten. The final nail in the coffin was probably some light analysis published by Eric Schiller in Gambit Chess Openings (and elsewhere) where he dismisses the line primarily due to his loss in the game Schiller-Martinovsky, Chicago 1986. -
Engineering the Substrate Specificity of Galactose Oxidase
Engineering the Substrate Specificity of Galactose Oxidase Lucy Ann Chappell BSc. (Hons) Submitted in accordance with the requirements for the degree of Doctor of Philosophy The University of Leeds School of Molecular and Cellular Biology September 2013 The candidate confirms that the work submitted is her own and that appropriate credit has been given where reference has been made to the work of others. This copy has been supplied on the understanding that it is copyright material and that no quotation from the thesis may be published without proper acknowledgement © 2013 The University of Leeds and Lucy Ann Chappell The right of Lucy Ann Chappell to be identified as Author of this work has been asserted by her in accordance with the Copyright, Designs and Patents Act 1988. ii Acknowledgements My greatest thanks go to Prof. Michael McPherson for the support, patience, expertise and time he has given me throughout my PhD. Many thanks to Dr Bruce Turnbull for always finding time to help me with the Chemistry side of the project, particularly the NMR work. Also thanks to Dr Arwen Pearson for her support, especially reading and correcting my final thesis. Completion of the laboratory work would not have been possible without the help of numerous lab members and friends who have taught me so much, helped develop my scientific skills and provided much needed emotional support – thank you for your patience and friendship: Dr Thembi Gaule, Dr Sarah Deacon, Dr Christian Tiede, Dr Saskia Bakker, Ms Briony Yorke and Mrs Janice Robottom. Thank you to three students I supervised during my PhD for their friendship and contributions to the project: Mr Ciaran Doherty, Ms Lito Paraskevopoulou and Ms Upasana Mandal. -
Smartk: Efficient, Scalable and Winning Parallel MCTS
SmartK: Efficient, Scalable and Winning Parallel MCTS Michael S. Davinroy, Shawn Pan, Bryce Wiedenbeck, and Tia Newhall Swarthmore College, Swarthmore, PA 19081 ACM Reference Format: processes. Each process constructs a distinct search tree in Michael S. Davinroy, Shawn Pan, Bryce Wiedenbeck, and Tia parallel, communicating only its best solutions at the end of Newhall. 2019. SmartK: Efficient, Scalable and Winning Parallel a number of rollouts to pick a next action. This approach has MCTS. In SC ’19, November 17–22, 2019, Denver, CO. ACM, low communication overhead, but significant duplication of New York, NY, USA, 3 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnneffort across processes. 1 INTRODUCTION Like root parallelization, our SmartK algorithm achieves low communication overhead by assigning independent MCTS Monte Carlo Tree Search (MCTS) is a popular algorithm tasks to processes. However, SmartK dramatically reduces for game playing that has also been applied to problems as search duplication by starting the parallel searches from many diverse as auction bidding [5], program analysis [9], neural different nodes of the search tree. Its key idea is it uses nodes’ network architecture search [16], and radiation therapy de- weights in the selection phase to determine the number of sign [6]. All of these domains exhibit combinatorial search processes (the amount of computation) to allocate to ex- spaces where even moderate-sized problems can easily ex- ploring a node, directing more computational effort to more haust computational resources, motivating the need for par- promising searches. allel approaches. SmartK is our novel parallel MCTS algo- Our experiments with an MPI implementation of SmartK rithm that achieves high performance on large clusters, by for playing Hex show that SmartK has a better win rate and adaptively partitioning the search space to efficiently utilize scales better than other parallelization approaches. -
Draft – Not for Circulation
A Gross Miscarriage of Justice in Computer Chess by Dr. Søren Riis Introduction In June 2011 it was widely reported in the global media that the International Computer Games Association (ICGA) had found chess programmer International Master Vasik Rajlich in breach of the ICGA‟s annual World Computer Chess Championship (WCCC) tournament rule related to program originality. In the ICGA‟s accompanying report it was asserted that Rajlich‟s chess program Rybka contained “plagiarized” code from Fruit, a program authored by Fabien Letouzey of France. Some of the headlines reporting the charges and ruling in the media were “Computer Chess Champion Caught Injecting Performance-Enhancing Code”, “Computer Chess Reels from Biggest Sporting Scandal Since Ben Johnson” and “Czech Mate, Mr. Cheat”, accompanied by a photo of Rajlich and his wife at their wedding. In response, Rajlich claimed complete innocence and made it clear that he found the ICGA‟s investigatory process and conclusions to be biased and unprofessional, and the charges baseless and unworthy. He refused to be drawn into a protracted dispute with his accusers or mount a comprehensive defense. This article re-examines the case. With the support of an extensive technical report by Ed Schröder, author of chess program Rebel (World Computer Chess champion in 1991 and 1992) as well as support in the form of unpublished notes from chess programmer Sven Schüle, I argue that the ICGA‟s findings were misleading and its ruling lacked any sense of proportion. The purpose of this paper is to defend the reputation of Vasik Rajlich, whose innovative and influential program Rybka was in the vanguard of a mid-decade paradigm change within the computer chess community. -
Monte-Carlo Tree Search
Monte-Carlo Tree Search PROEFSCHRIFT ter verkrijging van de graad van doctor aan de Universiteit Maastricht, op gezag van de Rector Magnificus, Prof. mr. G.P.M.F. Mols, volgens het besluit van het College van Decanen, in het openbaar te verdedigen op donderdag 30 september 2010 om 14:00 uur door Guillaume Maurice Jean-Bernard Chaslot Promotor: Prof. dr. G. Weiss Copromotor: Dr. M.H.M. Winands Dr. B. Bouzy (Universit´e Paris Descartes) Dr. ir. J.W.H.M. Uiterwijk Leden van de beoordelingscommissie: Prof. dr. ir. R.L.M. Peeters (voorzitter) Prof. dr. M. Muller¨ (University of Alberta) Prof. dr. H.J.M. Peters Prof. dr. ir. J.A. La Poutr´e (Universiteit Utrecht) Prof. dr. ir. J.C. Scholtes The research has been funded by the Netherlands Organisation for Scientific Research (NWO), in the framework of the project Go for Go, grant number 612.066.409. Dissertation Series No. 2010-41 The research reported in this thesis has been carried out under the auspices of SIKS, the Dutch Research School for Information and Knowledge Systems. ISBN 978-90-8559-099-6 Printed by Optima Grafische Communicatie, Rotterdam. Cover design and layout finalization by Steven de Jong, Maastricht. c 2010 G.M.J-B. Chaslot. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronically, mechanically, photocopying, recording or otherwise, without prior permission of the author. Preface When I learnt the game of Go, I was intrigued by the contrast between the simplicity of its rules and the difficulty to build a strong Go program. -
CS234 Notes - Lecture 14 Model Based RL, Monte-Carlo Tree Search
CS234 Notes - Lecture 14 Model Based RL, Monte-Carlo Tree Search Anchit Gupta, Emma Brunskill June 14, 2018 1 Introduction In this lecture we will learn about model based RL and simulation based tree search methods. So far we have seen methods which attempt to learn either a value function or a policy from experience. In contrast model based approaches first learn a model of the world from experience and then use this for planning and acting. Model-based approaches have been shown to have better sample efficiency and faster convergence in certain settings. We will also have a look at MCTS and its variants which can be used for planning given a model. MCTS was one of the main ideas behind the success of AlphaGo. Figure 1: Relationships among learning, planning, and acting 2 Model Learning By model we mean a representation of an MDP < S; A; R; T; γ> parametrized by η. In the model learning regime we assume that the state and action space S; A are known and typically we also assume conditional independence between state transitions and rewards i.e P [st+1; rt+1jst; at] = P [st+1jst; at]P [rt+1jst; at] Hence learning a model consists of two main parts the reward function R(:js; a) and the transition distribution P (:js; a). k k k k K Given a set of real trajectories fSt ;At Rt ; :::; ST gk=1 model learning can be posed as a supervised learning problem. Learning the reward function R(s; a) is a regression problem whereas learning the transition function P (s0js; a) is a density estimation problem. -
Download Booklet
preMieRe Recording jonathan dove SiReNSONG CHAN 10472 siren ensemble henk guittart 81 CHAN 10472 Booklet.indd 80-81 7/4/08 09:12:19 CHAN 10472 Booklet.indd 2-3 7/4/08 09:11:49 Jonathan Dove (b. 199) Dylan Collard premiere recording SiReNSong An Opera in One Act Libretto by Nick Dear Based on the book by Gordon Honeycombe Commissioned by Almeida Opera with assistance from the London Arts Board First performed on 14 July 1994 at the Almeida Theatre Recorded live at the Grachtenfestival on 14 and 1 August 007 Davey Palmer .......................................... Brad Cooper tenor Jonathan Reed ....................................... Mattijs van de Woerd baritone Diana Reed ............................................. Amaryllis Dieltiens soprano Regulator ................................................. Mark Omvlee tenor Captain .................................................... Marijn Zwitserlood bass-baritone with Wireless Operator .................................... John Edward Serrano speaker Siren Ensemble Henk Guittart Jonathan Dove CHAN 10472 Booklet.indd 4-5 7/4/08 09:11:49 Siren Ensemble piccolo/flute Time Page Romana Goumare Scene 1 oboe 1 Davey: ‘Dear Diana, dear Diana, my name is Davey Palmer’ – 4:32 48 Christopher Bouwman Davey 2 Diana: ‘Davey… Davey…’ – :1 48 clarinet/bass clarinet Diana, Davey Michael Hesselink 3 Diana: ‘You mention you’re a sailor’ – 1:1 49 horn Diana, Davey Okke Westdorp Scene 2 violin 4 Diana: ‘i like chocolate, i like shopping’ – :52 49 Sanne Hunfeld Diana, Davey cello Scene 3 Pepijn Meeuws 5 -
An Analysis of Monte Carlo Tree Search
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17) An Analysis of Monte Carlo Tree Search Steven James,∗ George Konidaris,† Benjamin Rosman∗‡ ∗University of the Witwatersrand, Johannesburg, South Africa †Brown University, Providence RI 02912, USA ‡Council for Scientific and Industrial Research, Pretoria, South Africa [email protected], [email protected], [email protected] Abstract UCT calls for rollouts to be performed by randomly se- lecting actions until a terminal state is reached. The outcome Monte Carlo Tree Search (MCTS) is a family of directed of the simulation is then propagated to the root of the tree. search algorithms that has gained widespread attention in re- cent years. Despite the vast amount of research into MCTS, Averaging these results over many iterations performs re- the effect of modifications on the algorithm, as well as the markably well, despite the fact that actions during the sim- manner in which it performs in various domains, is still not ulation are executed completely at random. As the outcome yet fully known. In particular, the effect of using knowledge- of the simulations directly affects the entire algorithm, one heavy rollouts in MCTS still remains poorly understood, with might expect that the manner in which they are performed surprising results demonstrating that better-informed rollouts has a major effect on the overall strength of the algorithm. often result in worse-performing agents. We present exper- A natural assumption to make is that completely random imental evidence suggesting that, under certain smoothness simulations are not ideal, since they do not map to realistic conditions, uniformly random simulation policies preserve actions. -
March 2020 Uschess.Org the United States’ Largest Chess Specialty Retailer
March 2020 USChess.org The United States’ Largest Chess Specialty Retailer 888.51.CHESS (512.4377) www.USCFSales.com ƩĂĐŬŝŶŐǁŝƚŚŐϮͲŐϰ The 100 Endgames You Must Know Workbook The Modern Way to Get the Upper Hand in Chess WƌĂĐƟĐĂůŶĚŐĂŵĞƐdžĞƌĐŝƐĞƐĨŽƌǀĞƌLJŚĞƐƐWůĂLJĞƌ Dmitry Kryavkin 288 pages - $24.95 Jesus de la Villa 288 pages - $24.95 dŚĞƉĂǁŶƚŚƌƵƐƚŐϮͲŐϰŝƐĂƉĞƌĨĞĐƚǁĂLJƚŽĐŽŶĨƵƐĞLJŽƵƌ ͞/ůŽǀĞƚŚŝƐŬ͊/ŶŽƌĚĞƌƚŽŵĂƐƚĞƌĞŶĚŐĂŵĞƉƌŝŶĐŝƉůĞƐLJŽƵ ŽƉƉŽŶĞŶƚƐĂŶĚĚŝƐƌƵƉƚƚŚĞŝƌƉŽƐŝƟŽŶ͘tŝƚŚůŽƚƐŽĨŝŶƐƚƌƵĐƟǀĞ ǁŝůůŶĞĞĚƚŽƉƌĂĐƟĐĞƚŚĞŵ͘͟ʹNM Han Schut, Chess.com ĞdžĂŵƉůĞƐ'D<ƌLJĂŬǀŝŶƐŚŽǁƐŚŽǁŝƚĐĂŶďĞƵƐĞĚƚŽĚĞĨĞĂƚ ͞dŚĞƉĞƌĨĞĐƚƐƵƉƉůĞŵĞŶƚƚŽĞůĂsŝůůĂ͛ƐŵĂŶƵĂů͘dŽŐĂŝŶ ůĂĐŬŝŶƚŚĞƵƚĐŚ͕ƚŚĞYƵĞĞŶ͛Ɛ'Ăŵďŝƚ͕ƚŚĞEŝŵnjŽͲ/ŶĚŝĂŶ͕ ƐƵĸĐŝĞŶƚŬŶŽǁůĞĚŐĞŽĨƚŚĞŽƌĞƟĐĂůĞŶĚŐĂŵĞƐLJŽƵƌĞĂůůLJŽŶůLJ ƚŚĞ<ŝŶŐ͛Ɛ/ŶĚŝĂŶ͕ƚŚĞ^ůĂǀĂŶĚƚŚĞŶŐůŝƐŚKƉĞŶŝŶŐ͘>ĞĂƌŶ ŶĞĞĚƚǁŽďŽŽŬƐ͘͟ʹIM Herman Grooten, Schaaksite NEW! ƚŚĞƚLJƉŝĐĂůǁĂLJƐƚŽŐĂŝŶƚĞŵƉŝ͕ŬĞĞƉƚŚĞŵŽŵĞŶƚƵŵĂŶĚ ŵĂdžŝŵŝnjĞLJŽƵƌŽƉƉŽŶĞŶƚ͛ƐƉƌŽďůĞŵƐ͘ Strategic Chess Exercises Keep It Simple 1.d4 Find the Right Way to Outplay Your Opponent ^ŽůŝĚĂŶĚ^ƚƌĂŝŐŚƞŽƌǁĂƌĚŚĞƐƐKƉĞŶŝŶŐZĞƉĞƌƚŽŝƌĞĨŽƌtŚŝƚĞ Emmanuel Bricard 224 pages - $24.95 Christof Sielecki 432 pages - $29.95 &ŝŶĂůůLJĂŶĞdžĞƌĐŝƐĞƐŬƚŚĂƚŝƐŶŽƚĂďŽƵƚƚĂĐƟĐƐ͊ ^ŝĞůĞĐŬŝ͛ƐƌĞƉĞƌƚŽŝƌĞǁŝƚŚϭ͘ĚϰŵĂLJďĞĞǀĞŶĞĂƐŝĞƌƚŽ ͞ƌŝĐĂƌĚŝƐĐůĞĂƌůLJĂǀĞƌLJŐŝŌĞĚƚƌĂŝŶĞƌ͘,ĞƐĞůĞĐƚĞĚĂƐƵƉĞƌď ŵĂƐƚĞƌƚŚĂŶŚŝƐϭ͘ĞϰƌĞĐŽŵŵĞŶĚĂƟŽŶƐ͕ďĞĐĂƵƐĞŝƚŝƐƐƵĐŚĂ ƌĂŶŐĞŽĨƉŽƐŝƟŽŶƐĂŶĚĞdžƉůĂŝŶƐƚŚĞƐŽůƵƟŽŶƐĞdžƚƌĞŵĞůLJ ĐŽŚĞƌĞŶƚƐLJƐƚĞŵ͗ƚŚĞŵĂŝŶĐŽŶĐĞƉƚŝƐĨŽƌtŚŝƚĞƚŽƉůĂLJϭ͘Ěϰ͕ ǁĞůů͘͟ʹGM Daniel King Ϯ͘EĨϯ͕ϯ͘Őϯ͕ϰ͘ŐϮ͕ϱ͘ϬͲϬĂŶĚŝŶŵŽƐƚĐĂƐĞƐϲ͘Đϰ͘ ͞&ŽƌĐŚĞƐƐĐŽĂĐŚĞƐƚŚŝƐŬŝƐŶŽƚŚŝŶŐƐŚŽƌƚŽĨƉŚĞŶŽŵĞŶĂů͘͟ ͞Ɛ/ƚŚŝŶŬƚŚĂƚ/ƐŚŽƵůĚŬĞĞƉŵLJĂĚǀŝĐĞ͚ƐŝŵƉůĞ͕͛/ǁŽƵůĚƐĂLJ -
Distributional Differences Between Human and Computer Play at Chess
Multidisciplinary Workshop on Advances in Preference Handling: Papers from the AAAI-14 Workshop Human and Computer Preferences at Chess Kenneth W. Regan Tamal Biswas Jason Zhou Department of CSE Department of CSE The Nichols School University at Buffalo University at Buffalo Buffalo, NY 14216 USA Amherst, NY 14260 USA Amherst, NY 14260 USA [email protected] [email protected] Abstract In our case the third parties are computer chess programs Distributional analysis of large data-sets of chess games analyzing the position and the played move, and the error played by humans and those played by computers shows the is the difference in analyzed value from its preferred move following differences in preferences and performance: when the two differ. We have run the computer analysis (1) The average error per move scales uniformly higher the to sufficient depth estimated to have strength at least equal more advantage is enjoyed by either side, with the effect to the top human players in our samples, depth significantly much sharper for humans than computers; greater than used in previous studies. We have replicated our (2) For almost any degree of advantage or disadvantage, a main human data set of 726,120 positions from tournaments human player has a significant 2–3% lower scoring expecta- played in 2010–2012 on each of four different programs: tion if it is his/her turn to move, than when the opponent is to Komodo 6, Stockfish DD (or 5), Houdini 4, and Rybka 3. move; the effect is nearly absent for computers. The first three finished 1-2-3 in the most recent Thoresen (3) Humans prefer to drive games into positions with fewer Chess Engine Competition, while Rybka 3 (to version 4.1) reasonable options and earlier resolutions, even when playing was the top program from 2008 to 2011.