Improving MCTS and Neural Network Communication in Computer Go


Joshua Keller
Oscar Perez
Worcester Polytechnic Institute

A Major Qualifying Project Report submitted to the faculty of Worcester Polytechnic Institute in partial fulfillment of the requirements for the Degree of Bachelor of Science

April 24, 2016

Abstract

In March 2016, AlphaGo, a computer Go program developed by Google DeepMind, won a 5-game match against Lee Sedol, one of the best Go players in the world. Its victory marks a major advance in the field of computer Go. However, much remains to be done. There is a gap between the computational power AlphaGo used in the match, and the computational power available to the majority of computer users today. Further, the communication between two of the techniques used by AlphaGo, neural networks and Monte Carlo Tree Search, can be improved. We investigate four different approaches towards accomplishing this end, with a focus on methods that require minimal computational power. Each method shows promise and can be developed further.

Acknowledgements

We would like to acknowledge:
• Levente Kocsis, for his advice and guidance throughout the project
• Sarun Paisarnsrisomsuk and Pitchaya Wiratchotisatian, for their implementation of the neural networks we used
• MTA-SZTAKI, for providing excellent facilities for the duration of our project
• Gabor N. Sarkozy, our WPI advisor
• Worcester Polytechnic Institute, for providing us with this opportunity

Contents

List of Figures
List of Tables
1 Introduction
  1.1 A New Era in Go Knowledge
  1.2 AlphaGo vs. Lee Sedol
  1.3 Two Powerful Techniques
  1.4 Next Steps for Computer Go
2 Background
  2.1 The Game of Go
    2.1.1 Rules
    2.1.2 Ranking System
    2.1.3 The Role of Go in Artificial Intelligence
  2.2 Computer Go Techniques
    2.2.1 Monte Carlo Tree Search
    2.2.2 Upper Confidence Bounds on Trees
    2.2.3 Deep Convolutional Neural Networks
    2.2.4 How AlphaGo Combines MCTS with Neural Networks
3 Methods
  3.1 Move Selection in Pachi
  3.2 Our Approaches
    3.2.1 Adding the Neural Network to Pachi's Prior Knowledge
    3.2.2 Optimizing for Current Depth
    3.2.3 Training the Neural Network to Inform the Search
      3.2.3.1 Why SPSA is Necessary
      3.2.3.2 How SPSA Works
    3.2.4 Search-Based Features
  3.3 Testing
4 Results & Evaluation
5 Conclusion & Future Work
  5.1 Summary
  5.2 Future Work
References

List of Figures

1.1 A Two-Headed Dragon
1.2 The Hand of God
1.3 AlphaGo's "computer-style" move
2.1 Rules of Go
2.2 Go Ranks
2.3 The Problem with a Territory Heuristic
2.4 Minimax Search
2.5 MCTS Phases
2.6 Simple Neural Network
2.7 Fully Connected Neural Network
2.8 Convolutional Neural Network (CNN)
3.1 Neural Network Visualization
4.1 Frequency That MCTS Expanded a Node at Each Depth

List of Tables

4.1 Pachi's Win Rate at Varying Neural Network Influence Levels
4.2 Win Rate of Pachi with Different Neural Networks at Different Layers
4.3 Accuracy of SPSA-trained Neural Network
4.4 Accuracy of Neural Network with Search-Based Feature

1 Introduction

1.1 A New Era in Go Knowledge

The game of Go has existed for centuries. In fact, it is probably the oldest known strategy game in the world. As a result, Go theory has had an exceptionally long time to grow and develop. Over time, people have noticed patterns and techniques and given them colorful descriptions, for example: "two headed dragon", "tiger's mouth", "throwing star shape", etc.

Figure 1.1: A Two-Headed Dragon - taken from [1]

Entire sequences of moves have become customary in certain situations as an agreed-upon "fair trade" (these are termed "joseki").
For instance, from a particular joseki, one player might gain a more secure territory in the corner, while the other obtains better central influence. The idea is that these advantages balance each other out. A new Go player can study these techniques, learn through practice when to apply them in games, and very quickly become a much better player.

Until recently, new Go knowledge came only from the top players and theoreticians. Computer Go programs did not have much to teach us, consistently playing at a level far below that of the best humans. All of this changed in March 2016. AlphaGo, a program developed by Google DeepMind, challenged Lee Sedol, one of the strongest Go players in the world, if not the strongest, to a 5-game match. The outcome of this match marked the beginning of a new era for Go, one in which we can learn from computers as well as humans.

1.2 AlphaGo vs. Lee Sedol

The match itself was widely publicized. It was televised throughout South Korea and had 60 million viewers in China. An international array of commentators analyzed each game live [2]. Most of the viewers were rooting for Lee to win. He himself was quite confident he would win, at first.

Lee apparently underestimated AlphaGo in the first game. In their paper [3], the AlphaGo team had provided the games of AlphaGo's recent 5-game match with European champion Fan Hui. AlphaGo had defeated Fan Hui in a landslide 5-0 victory, but Fan Hui was ranked much lower than Lee Sedol. Lee looked at the games and suspected that AlphaGo's playing style was too defensive, and that he shouldn't have too much trouble winning. However, AlphaGo had been training itself in the 5 months since that match. It exhibited a dramatic improvement in playing strength in its first game against Lee.

In the end, Lee Sedol lost the match 4 games to 1. This was an incredible victory for AlphaGo. It had conquered what is often termed the "holy grail of artificial intelligence," a feat that was thought to be more than a decade away.

However, AlphaGo did not come away unscathed. It did lose the fourth game of the match. Interestingly, it was playing as Black in that game. The only other game that Lee Sedol came close to winning was the second game, in which AlphaGo was also playing as Black. In Go, Black moves first, which gives that player an advantage. To compensate for this, White is given extra points at the start of the game, called komi. Some speculate that AlphaGo was more comfortable (whatever that can mean for a computer program) when playing White, because then equality on the board would be enough to secure a win [4]. As Black, AlphaGo would need an advantage of 8 points or more on the board to win (the komi was 7.5 points, a fractional value chosen to avoid ties). Apparently it preferred the komi to the first-move advantage.

The game that Lee Sedol did win was an exciting one. He played a very tactical style that turned the game into an all-or-nothing fight, rather than a slow, incremental buildup of advantages for both sides, which would have played into AlphaGo's superior calculation abilities [5]. On move 78, he played a brilliant close-range tactical move that put him back in the game just as it seemed he might be losing. Gu Li, one of the commentators for game 4 (and a top professional player himself), referred to this move as the "hand of God." The "hand of God," or "divine move," is something many professional Go players aspire to achieve at least once in their lives. Essentially, it is a move so startlingly original and powerful that it is as if it were divinely inspired. Certainly Lee's move 78 was not foreseen by commentators, and apparently not even by AlphaGo. It is a move he can be proud of for years to come, and in a way, it makes up for his losses in the other games of the match. The reader is strongly encouraged to watch the game at [8].

Figure 1.2: The Hand of God - Lee Sedol's "hand of God" move is marked with a triangle.

1.3 Two Powerful Techniques

Go is a very hard game for computers to play. The traditional approach in similar games, such as chess, is to construct a tree and look at all the possible move sequences of a certain length. Even in chess the full tree of all complete games is much too big, so the tree is cut off at a certain point, and the positions at the cutoff are evaluated using some evaluation function. In chess, the material count (i.e. 9 points for a Queen, 5 points for a Rook, etc.) serves as a useful and practical evaluation function. It can be made more subtle by introducing positional attributes, such as -0.2 for each pair of doubled pawns. One problem for Go is that the search tree has to be much bigger in both width and depth: Go games last about 5 times longer than chess games, and each turn, there are roughly 5 times as many possible moves in Go compared to chess.