Improving MCTS and Neural Network Communication in Computer Go
Joshua Keller
Oscar Perez

Worcester Polytechnic Institute

A Major Qualifying Project Report submitted to the faculty of Worcester Polytechnic Institute in partial fulfillment of the requirements for the Degree of Bachelor of Science

April 24, 2016

Abstract

In March 2016, AlphaGo, a computer Go program developed by Google DeepMind, won a 5-game match against Lee Sedol, one of the best Go players in the world. Its victory marks a major advance in the field of computer Go. However, much remains to be done. There is a gap between the computational power AlphaGo used in the match and the computational power available to the majority of computer users today. Further, the communication between two of the techniques used by AlphaGo, neural networks and Monte Carlo Tree Search, can be improved. We investigate four different approaches toward this end, with a focus on methods that require minimal computational power. Each method shows promise and can be developed further.

Acknowledgements

We would like to acknowledge:

• Levente Kocsis, for his advice and guidance throughout the project
• Sarun Paisarnsrisomsuk and Pitchaya Wiratchotisatian, for their implementation of the neural networks we used
• MTA-SZTAKI, for providing excellent facilities for the duration of our project
• Gabor N. Sarkozy, our WPI advisor
• Worcester Polytechnic Institute, for providing us with this opportunity

Contents

1 Introduction
  1.1 A New Era in Go Knowledge
  1.2 AlphaGo vs. Lee Sedol
  1.3 Two Powerful Techniques
  1.4 Next Steps for Computer Go
2 Background
  2.1 The Game of Go
    2.1.1 Rules
    2.1.2 Ranking System
    2.1.3 The Role of Go in Artificial Intelligence
  2.2 Computer Go Techniques
    2.2.1 Monte Carlo Tree Search
    2.2.2 Upper Confidence Bounds on Trees
    2.2.3 Deep Convolutional Neural Networks
    2.2.4 How AlphaGo Combines MCTS with Neural Networks
3 Methods
  3.1 Move Selection in Pachi
  3.2 Our Approaches
    3.2.1 Adding the Neural Network to Pachi's Prior Knowledge
    3.2.2 Optimizing for Current Depth
    3.2.3 Training the Neural Network to Inform the Search
      3.2.3.1 Why SPSA is Necessary
      3.2.3.2 How SPSA Works
    3.2.4 Search-Based Features
  3.3 Testing
4 Results & Evaluation
5 Conclusion & Future Work
  5.1 Summary
  5.2 Future Work
References

List of Figures

1.1 A Two-Headed Dragon
1.2 The Hand of God
1.3 AlphaGo's "computer-style" move
2.1 Rules of Go
2.2 Go Ranks
2.3 The Problem with a Territory Heuristic
2.4 Minimax Search
2.5 MCTS Phases
2.6 Simple Neural Network
2.7 Fully Connected Neural Network
2.8 Convolutional Neural Network (CNN)
3.1 Neural Network Visualization
4.1 Frequency That MCTS Expanded a Node at Each Depth

List of Tables

4.1 Pachi's Win Rate at Varying Neural Network Influence Levels
4.2 Win Rate of Pachi with Different Neural Networks at Different Layers
4.3 Accuracy of SPSA-trained Neural Network
4.4 Accuracy of Neural Network with Search-Based Feature

1 Introduction

1.1 A New Era in Go Knowledge

The game of Go has existed for centuries. In fact, it is probably the oldest known strategy game in the world. As a result, Go theory has had an exceptionally long time to grow and develop.
Over time, people have noticed patterns and techniques and given them colorful names, for example "two-headed dragon", "tiger's mouth", and "throwing star shape".

Figure 1.1: A Two-Headed Dragon - taken from [1]

Entire sequences of moves have become customary in certain situations as an agreed-upon "fair trade" (these are termed "joseki"). For instance, from a particular joseki, one player might gain a more secure territory in the corner, while the other obtains better central influence. The idea is that these advantages balance each other out. A new Go player can study these techniques, learn through practice when to apply them in games, and very quickly become a much better player.

Until recently, additions to Go knowledge always came from the top players and theoreticians. Computer Go programs did not have much to teach us, consistently playing at a level far below that of the best humans. All of this changed in March 2016. A program developed by Google DeepMind, called AlphaGo, challenged Lee Sedol, one of the strongest Go players in the world, if not the strongest, to a 5-game match. The outcome of this match marked the beginning of a new era for Go, one in which we can learn from computers as well as humans.

1.2 AlphaGo vs. Lee Sedol

The match itself was widely publicized. It was televised throughout South Korea, it had 60 million viewers in China, and an international array of commentators analyzed each game live [2]. Most of the viewers were rooting for Lee to win, and at first he himself was quite confident that he would.

Lee apparently underestimated AlphaGo in the first game. In their paper [3], the AlphaGo team had provided the games of AlphaGo's recent 5-game match with European champion Fan Hui. AlphaGo had defeated Fan Hui in a landslide 5-0 victory, but Fan Hui was ranked much lower than Lee Sedol. Lee looked at the games, suspected that AlphaGo's playing style was too defensive, and concluded that he should not have too much trouble winning. However, AlphaGo had been training itself in the 5 months since that match, and it exhibited a dramatic improvement in playing strength in their first game.

In the end, Lee Sedol lost the match 4 games to 1. This was an incredible victory for AlphaGo. It had conquered what is often termed the "holy grail of artificial intelligence," a feat that was thought to be more than a decade away.

However, AlphaGo did not come away unscathed: it lost the fourth game of the match. Interestingly, it was playing as Black in that game. The only other game that Lee Sedol came close to winning was the second, in which AlphaGo was also playing as Black. In Go, Black moves first, which gives that player an advantage. To compensate for this, White is given extra points at the start of the game, called komi. Some speculate that AlphaGo was more comfortable (whatever that can mean for a computer program) when playing White, because then equality on the board would be enough to secure a win [4]. As Black, AlphaGo would need an 8-point advantage or more on the board for a win (the komi was 7.5 points, the half point ensuring no ties). Apparently it preferred the komi to the first-move advantage.
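To make the komi arithmetic concrete, here is a minimal sketch in Python. The function name and the example scores are hypothetical, but the 7.5-point komi and the resulting 8-point threshold follow the figures above:

```python
def winner(black_board_points, white_board_points, komi=7.5):
    """Decide the winner once the board has been counted.

    The half point in the 7.5-point komi guarantees there are no ties,
    and it means Black needs a lead of at least 8 whole board points:
    a 7-point lead still loses by half a point after komi is added.
    """
    return "Black" if black_board_points > white_board_points + komi else "White"

# Hypothetical final counts: a 7-point board lead is not enough for Black...
assert winner(80, 73) == "White"   # 80 vs. 73 + 7.5 = 80.5
# ...but an 8-point lead is.
assert winner(81, 73) == "Black"   # 81 vs. 80.5
```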
The game that Lee Sedol did win was an exciting one. He played a very tactical style that turned the game into an all-or-nothing fight, rather than the slow-moving, incremental buildup of advantages for both sides that would have played into AlphaGo's superior calculation abilities [5]. On move 78, he played a brilliant close-range tactical move that put him back in the game just as it seemed he might be losing. Gu Li, one of the commentators for game 4 (and a top professional player himself), referred to this move as the "hand of God."

Figure 1.2: The Hand of God - Lee Sedol's "hand of God" move is marked with a triangle.

The "hand of God," or "divine move," is something many professional Go players aspire to achieve at least once in their lives. Essentially, it is a move so startlingly original and powerful that it is as if it were divinely inspired. Certainly Lee's move 78 was not foreseen by the commentators, and apparently not even by AlphaGo. It is a move he can be proud of for years to come, and in a way, it makes up for the losses he suffered in the other games of the match. The reader is strongly encouraged to watch the game at [8].

1.3 Two Powerful Techniques

Go is a very hard game for computers to play. The traditional approach in similar games, such as chess, is to construct a tree and examine all the possible move sequences of a certain length. Even in chess the full tree of all complete games is much too big, so the tree is cut off at a certain depth, and the resulting positions are judged with an evaluation function. In chess, the material count (e.g. 9 points for a Queen, 5 points for a Rook, etc.) serves as a useful and practical evaluation function. It can be made more subtle by introducing positional attributes, such as -0.2 for each pair of doubled pawns (see the sketch below). One problem for Go is that the search tree has to be much bigger in both width and depth: Go games last about 5 times longer than chess games, and on each turn there are roughly 5 times as many possible moves in Go as in chess. Because tree size grows exponentially with depth, these factors make exhaustive search in Go far more hopeless than it already is in chess.
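As a concrete illustration of the material-plus-positional evaluation just described, here is a minimal sketch in Python. It is not any real engine's evaluation: the position encoding and function name are invented for the example, and the 3/3/1 values for minor pieces and pawns are the conventional figures, while the Queen and Rook values and the doubled-pawn penalty come from the text above.

```python
# Piece values: Q and R are from the text; B, N, P are conventional.
PIECE_VALUES = {"Q": 9, "R": 5, "B": 3, "N": 3, "P": 1}

def evaluate(pieces, pawn_files):
    """Score a position from White's point of view.

    pieces:     list of (piece_letter, color) pairs, e.g. ("Q", "white")
    pawn_files: dict mapping color -> list of file indices of its pawns
    """
    score = 0.0
    # Material count: add White's pieces, subtract Black's.
    for piece, color in pieces:
        score += PIECE_VALUES[piece] if color == "white" else -PIECE_VALUES[piece]

    # Positional refinement: -0.2 for each pair of doubled pawns.
    for color, sign in (("white", 1.0), ("black", -1.0)):
        files = pawn_files[color]
        doubled_pairs = sum(max(files.count(f) - 1, 0) for f in set(files))
        score -= sign * 0.2 * doubled_pairs

    return score

# Hypothetical position: White is up a rook and a pawn, but White's
# two pawns are doubled on the c-file (file index 2).
example = evaluate(
    pieces=[("R", "white"), ("P", "white"), ("P", "white"), ("P", "black")],
    pawn_files={"white": [2, 2], "black": [4]},
)
# Material: +5 +1 +1 -1 = +6; doubled-pawn penalty: -0.2  ->  5.8
assert abs(example - 5.8) < 1e-9
```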
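To see how severe the "bigger in both width and depth" problem is, a back-of-the-envelope calculation helps. The constants below are the commonly cited approximations (about 35 legal moves per chess position over games of about 80 plies, versus about 250 moves over about 150 plies for Go); they are rough estimates, not measurements from this project.

```python
# Rough game-tree sizes: (moves per turn) ** (game length in plies).
# The constants are commonly cited approximations, not exact values.
chess_positions = 35 ** 80     # roughly 10^123
go_positions = 250 ** 150      # roughly 10^359

# Python integers have arbitrary precision, so we can count the digits.
print(f"chess tree: ~10^{len(str(chess_positions)) - 1}")
print(f"go tree:    ~10^{len(str(go_positions)) - 1}")
```

The Go tree is not merely 25 times larger; it is larger by a factor of roughly 10^236, which is why the cut-off-and-evaluate approach that works for chess cannot simply be scaled up to Go.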