UCT-Enhanced Deep Convolutional Neural Network for Move Recommendation in Go

Total Page:16

File Type:pdf, Size:1020Kb

UCT-Enhanced Deep Convolutional Neural Network for Move Recommendation in Go MQP CDR#GXS 1501 UCT-Enhanced Deep Convolutional Neural Network for Move Recommendation in Go a Major Qualifying Project Report submitted to the faculty of the WORCESTER POLYTECHNIC INSTITUTE in partial fulfillment of the requirements for th e Degree of Bachelor of Science by Sarun Paisarnsrisomsuk Pitchaya Wiratchotisatian April 30, 2015 Professor Gábor N. Sárközy, Major Advisor Abstract Deep convolutional neural networks (CNN) have been proved to be useful in predicting moves of human Go experts [10], [12]. Combining Upper Confidence bounds applied to Trees (UCT) with a large deep CNN creates an even more powerful artificial intelligence method in playing Go [12]. Our project introduced a new feature, board patterns at the end of a game, used as inputs to the network. We implemented a small, single-thread, 1-hidden-layer deep CNN without using GPU, and trained it by supervised learning from a database of human professional games. By adding the new feature, our best model was able to correctly predict the experts’ move in 18% of the positions, compared to 6% without this feature. In practice, even though the board pattern at the end of the game is invisible to Go players until the very end of the game, collecting these statistics during each simulation in UCT can approximate the likelihood of which parts of the board belong to the players. We created another dataset by using UCT to simulate a number of playouts and calculating the average influence value of Black and White positions in those boards at the end of the simulations. We used this dataset as the input to the deep CNN and found that the network was able to correctly predict the experts’ move in 9% of positions. i Acknowledgements We would like to express our gratitude to the people whose assistance, hospitality, and advice helped the completion of this project: Levente Kocsis, Project Advisor and SZTAKI liaison Gabor Sarkozy, MQP Advisor Worcester Polytechnic Institute MTA-SZTAKI SZTAKI Colleagues Pachi and Gnu Go Development Team Ryan Harris on his YouTube videos about the back-propagation algorithm i Table of Contents Abstract ........................................................................................................................................... i Acknowledgements ........................................................................................................................ i Table of Contents .......................................................................................................................... ii Table of Figures............................................................................................................................ iv 1 Introduction ................................................................................................................................ 1 2 Background ................................................................................................................................ 4 2.1 Game of Go .......................................................................................................................... 4 2.1.1 Rules of Go .................................................................................................................... 4 2.1.2 Go Rank System............................................................................................................ 6 2.1.3 Go: An Ultimate Challenge for Artificial intelligence ............................................... 6 2.2 Computer Go Techniques ................................................................................................... 8 2.2.1 Monte Carlo Tree Search (MCTS) ............................................................................. 8 2.2.2 Upper Confidence Bounds Applied to Trees (UCB applied to trees / UCT) ......... 10 2.2.3 Deep Convolutional Neural Networks (Deep CNN) ................................................ 13 3 Deep Convolutional Neural Networks ................................................................................... 16 3.1 Artificial Neural Network (ANN) .................................................................................... 16 3.2 Deep Convolutional Neural Network (Deep CNN) ........................................................ 19 3.3 Back-propagation Algorithm ........................................................................................... 22 ii 4 Methodology ............................................................................................................................. 26 4.1 Existing Codebases ............................................................................................................ 26 4.1.1 Gnu Go......................................................................................................................... 26 4.1.2 Pachi ............................................................................................................................. 27 4.1.3 GTP (Go Text Protocol) ............................................................................................. 27 4.2.1 Go Data ........................................................................................................................ 28 4.2.2 Deep Convolutional Neural Network ........................................................................ 29 4.2.2.1 Weight Symmetries .................................................................................................. 31 4.2.2.2 Activation Functions ................................................................................................ 31 4.2.2.3 Mini-Batches ............................................................................................................ 32 4.2.3 UCT-simulated Final Board Pattern ........................................................................ 33 5 Experiments and Results ......................................................................................................... 34 6 Conclusion and Future Work ................................................................................................. 39 References .................................................................................................................................... 42 Appendix ...................................................................................................................................... 45 A. Smart Game Format (SGF) ............................................................................................... 45 B. Parameters used in our best DNN ..................................................................................... 48 C. Translational invariance .................................................................................................... 48 iii Table of Figures Figure 1. The standard 19x19 board after several moves ............................................................... 5 Figure 2. Performance ranks in Go, in increasing order of strength from left to right ................... 6 Figure 3. One iteration of a general MCTS approach ..................................................................... 9 Figure 4. Structure of a neural network in the human brain ......................................................... 16 Figure 5. Architecture of an artificial neural network .................................................................. 17 Figure 6. Convolutional Neural Network in Image Recognition .................................................. 20 Figure 7. Features used as inputs to the CNN in the work of Maddison et al. ............................. 21 Figure 8. A CNN with one hidden layer, and three convolutional kernels ................................... 21 Figure 9. The features we used in our deep network .................................................................... 28 Figure 10. 8-fold weight symmetry group .................................................................................... 31 Figure 11. The accuracy comparison over different numbers of trainings (move positions) for different types of experiments ...................................................................................................... 36 Figure 12. The accuracy of ranking top k move comparison over different types of experiments ....................................................................................................................................................... 37 Figure 13. Coordinate system for points and moves in SGF files ................................................ 46 iv 1 Introduction There have been countless developments of artificial intelligence that improved the ability to play games with computers. Computers are now expert players in many popular, two-player games, including Backgammon, Checkers, Chess, and Othello. Still the game of Go, an ancient board game, continues to elude the efforts of computer science researchers. Go is remarkable because of its simple rules but complex strategies. Because the size of the search space in the game is enormous and the game has the long-term influence of moves, creating a move prediction function in Go is a challenging task for computer science researchers. In the past, researchers had failed to combine brute force tree-search with human knowledge or reinforcement learning [5]. The efforts to equalize the performance of computer Go to the performance of other two-player games were in vain, until 2006 when the Monte-Carlo tree search (MCTS) was introduced to the game of Go. The performance
Recommended publications
  • Lesson Plans for Go in Schools
    Go Lesson Plans 1 Lesson Plans for Go in Schools By Gordon E. Castanza, Ed. D. October 19, 2011 Published By Rittenberg Consulting Group 7806 108th St. NW Gig Harbor, WA 98332 253-853-4831 ©Gordon E. Castanza, Ed. D. 10/19/11 DRAFT Go Lesson Plans 2 Table of Contents Acknowledgements ......................................................................................................................... 4 Purpose/Rationale ........................................................................................................................... 5 Lesson Plan One ............................................................................................................................. 7 Basic Ideas .................................................................................................................................. 7 Introduction ............................................................................................................................... 11 The Puzzle ................................................................................................................................. 13 Surround to Capture .................................................................................................................. 14 First Capture Go ........................................................................................................................ 16 Lesson Plan Two ........................................................................................................................... 19 Units &
    [Show full text]
  • CSC321 Lecture 23: Go
    CSC321 Lecture 23: Go Roger Grosse Roger Grosse CSC321 Lecture 23: Go 1 / 22 Final Exam Monday, April 24, 7-10pm A-O: NR 25 P-Z: ZZ VLAD Covers all lectures, tutorials, homeworks, and programming assignments 1/3 from the first half, 2/3 from the second half If there's a question on this lecture, it will be easy Emphasis on concepts covered in multiple of the above Similar in format and difficulty to the midterm, but about 3x longer Practice exams will be posted Roger Grosse CSC321 Lecture 23: Go 2 / 22 Overview Most of the problem domains we've discussed so far were natural application areas for deep learning (e.g. vision, language) We know they can be done on a neural architecture (i.e. the human brain) The predictions are inherently ambiguous, so we need to find statistical structure Board games are a classic AI domain which relied heavily on sophisticated search techniques with a little bit of machine learning Full observations, deterministic environment | why would we need uncertainty? This lecture is about AlphaGo, DeepMind's Go playing system which took the world by storm in 2016 by defeating the human Go champion Lee Sedol Roger Grosse CSC321 Lecture 23: Go 3 / 22 Overview Some milestones in computer game playing: 1949 | Claude Shannon proposes the idea of game tree search, explaining how games could be solved algorithmically in principle 1951 | Alan Turing writes a chess program that he executes by hand 1956 | Arthur Samuel writes a program that plays checkers better than he does 1968 | An algorithm defeats human novices at Go 1992
    [Show full text]
  • E-Book? Začala Hra
    Nejlepší strategie všech dob 1 www.yohama.cz Nejlepší strategie všech dob Obsah OBSAH DESKOVÁ HRA GO ................................................................................................................................ 3 Úvod ............................................................................................................................................................ 3 Co je to go ................................................................................................................................................... 4 Proč hrát go ................................................................................................................................................. 7 Kde získat go ................................................................................................................................................ 9 Jak začít hrát go ......................................................................................................................................... 10 Kde a s kým hrát go ................................................................................................................................... 12 Jak se zlepšit v go ...................................................................................................................................... 14 Handicapová hra ....................................................................................................................................... 15 Výkonnostní třídy v go a rating hráčů .......................................................................................................
    [Show full text]
  • FUEGO—An Open-Source Framework for Board Games and Go Engine Based on Monte Carlo Tree Search Markus Enzenberger, Martin Müller, Broderick Arneson, and Richard Segal
    IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES, VOL. 2, NO. 4, DECEMBER 2010 259 FUEGO—An Open-Source Framework for Board Games and Go Engine Based on Monte Carlo Tree Search Markus Enzenberger, Martin Müller, Broderick Arneson, and Richard Segal Abstract—FUEGO is both an open-source software framework with new algorithms that would not be possible otherwise be- and a state-of-the-art program that plays the game of Go. The cause of the cost of implementing a complete state-of-the-art framework supports developing game engines for full-information system. Successful previous examples show the importance of two-player board games, and is used successfully in a substantial open-source software: GNU GO [19] provided the first open- number of projects. The FUEGO Go program became the first program to win a game against a top professional player in 9 9 source Go program with a strength approaching that of the best Go. It has won a number of strong tournaments against other classical programs. It has had a huge impact, attracting dozens programs, and is competitive for 19 19 as well. This paper gives of researchers and hundreds of hobbyists. In Chess, programs an overview of the development and current state of the UEGO F such as GNU CHESS,CRAFTY, and FRUIT have popularized in- project. It describes the reusable components of the software framework and specific algorithms used in the Go engine. novative ideas and provided reference implementations. To give one more example, in the field of domain-independent planning, Index Terms—Computer game playing, Computer Go, FUEGO, systems with publicly available source code such as Hoffmann’s man-machine matches, Monte Carlo tree search, open source soft- ware, software frameworks.
    [Show full text]
  • Move Prediction in Gomoku Using Deep Learning
    VW<RXWK$FDGHPLF$QQXDO&RQIHUHQFHRI&KLQHVH$VVRFLDWLRQRI$XWRPDWLRQ :XKDQ&KLQD1RYHPEHU Move Prediction in Gomoku Using Deep Learning Kun Shao, Dongbin Zhao, Zhentao Tang, and Yuanheng Zhu The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences Beijing 100190, China Emails: [email protected]; [email protected]; [email protected]; [email protected] Abstract—The Gomoku board game is a longstanding chal- lenge for artificial intelligence research. With the development of deep learning, move prediction can help to promote the intelli- gence of board game agents as proven in AlphaGo. Following this idea, we train deep convolutional neural networks by supervised learning to predict the moves made by expert Gomoku players from RenjuNet dataset. We put forward a number of deep neural networks with different architectures and different hyper- parameters to solve this problem. With only the board state as the input, the proposed deep convolutional neural networks are able to recognize some special features of Gomoku and select the most likely next move. The final neural network achieves the accuracy of move prediction of about 42% on the RenjuNet dataset, which reaches the level of expert Gomoku players. In addition, it is promising to generate strong Gomoku agents of human-level with the move prediction as a guide. Keywords—Gomoku; move prediction; deep learning; deep convo- lutional network Fig. 1. Example of positions from a game of Gomoku after 58 moves have passed. It can be seen that the white win the game at last. I.
    [Show full text]
  • GO WINDS Play Over 1000 Professional Games to Reach Recent Sets Have Focused on "How the Pros 1-Dan, It Is Said
    NEW FROM YUTOPIAN ENTERPRISES GO GAMES ON DISK (GOGoD) SOFTWARE GO WINDS Play over 1000 professional games to reach Recent sets have focused on "How the pros 1-dan, it is said. How about 6-dan? Games of play the ...". So far there are sets covering the Go on Disk now offers over 6000 professional "Chinese Fuseki" Volume I (a second volume Volume 2 Number 4 Winter 1999 $3.00 games on disk, games that span the gamut of is in preparation), and "Nirensei", Volumes I go history - featuring players that helped and II. A "Sanrensei" volume is also in define the history. preparation. All these disks typically contain All game collections come with DOS or 300 games. Windows 95 viewing software, and most The latest addition to this series is a collections include the celebrated Go Scorer in "specialty" item - so special GoGoD invented which you can guess the pros' moves as you a new term for it. It is the "Sideways Chinese" play (with hints if necessary) and check your fuseki, which incorporates the Mini-Chinese score. pattern. Very rarely seen in western The star of the collection may well be "Go publications yet played by most of the top Seigen" - the lifetime games (over 800) of pros, this opening is illustrated by over 130 perhaps the century's greatest player, with games from Japan, China and Korea. Over more than 10% commented. "Kitani" 1000 half have brief comments. The next specialty makes an ideal matching set - most of the item in preparation is a set of games featuring lifetime games of his legendary rival, Kitani unusual fusekis - this will include rare New Minoru.
    [Show full text]
  • Curriculum Guide for Go in Schools
    Curriculum Guide 1 Curriculum Guide for Go In Schools by Gordon E. Castanza, Ed. D. October 19, 2011 Published By: Rittenberg Consulting Group 7806 108th St. NW Gig Harbor, WA 98332 253-853-4831 © 2005 by Gordon E. Castanza, Ed. D. Curriculum Guide 2 Table of Contents Acknowledgements ......................................................................................................................... 4 Purpose and Rationale..................................................................................................................... 5 About this curriculum guide ................................................................................................... 7 Introduction ..................................................................................................................................... 8 Overview ................................................................................................................................. 9 Building Go Instructor Capacity ........................................................................................... 10 Developing Relationships and Communicating with the Community ................................. 10 Using Resources Effectively ................................................................................................. 11 Conclusion ............................................................................................................................ 11 Major Trends and Issues ..........................................................................................................
    [Show full text]
  • Strategy Game Programming Projects
    STRATEGY GAME PROGRAMMING PROJECTS Timothy Huang Department of Mathematics and Computer Science Middlebury College Middlebury, VT 05753 (802) 443-2431 [email protected] ABSTRACT In this paper, we show how programming projects centered around the design and construction of computer players for strategy games can play a meaningful role in the educational process, both in and out of the classroom. We describe several game-related projects undertaken by the author in a variety of pedagogical situations, including introductory and advanced courses as well as independent and collaborative research projects. These projects help students to understand and develop advanced data structures and algorithms, to manage and contribute to a large body of existing code, to learn how to work within a team, and to explore areas at the frontier of computer science research. 1. INTRODUCTION In this paper, we show how programming projects centered around the design and construction of computer players for strategy games can play a meaningful role in the educational process, both in and out of the classroom. We describe several game-related projects undertaken by the author in a variety of pedagogical situations, including introductory and advanced courses as well as independent and collaborative research projects. These projects help students to understand and develop advanced data structures and algorithms, to manage and contribute to a large body of existing code, to learn how to work within a team, and to explore areas at the frontier of computer science research. We primarily consider traditional board games such as chess, checkers, backgammon, go, Connect-4, Othello, and Scrabble. Other types of games, including video games and real-time strategy games, might also lead to worthwhile programming projects.
    [Show full text]
  • Collaborative Go
    Distributed Computing Master Thesis COLLABORATIVE GO Alex Hugger December 13, 2011 Supervisor: S. Welten Prof. Dr. R. Wattenhofer Distributed Computing Group Computer Engineering and Networks Laboratory ETH Zürich Abstract The game of Go is getting more and more popular in Europe. Its simple rules and the huge complexity for computer programs make it very interesting for a study about collaborative gaming. The idea of collaborative Go is to play a simple two player game in a setup of teams behaving as one single player instead of one versus one. During this thesis, we implemented a framework allowing to play collaborative Go. The framework is heavily based on the existing Go Text Protocol allowing an inte- gration into already existing Go servers or a competition with existing Go programs. Multiple players are merged through the use of different decision engines into one single player. Two decision engines were implemented based on different interac- tion schemes. One is based on a voting process whereas the other one decides on the final move of a team by applying a Monte Carlo simulation. The final experiments show that collaborative gaming is very interesting and atten- tion attracting for the players. Especially the voting mechanism used in the collab- orative player achieved a high user satisfaction. For an automated decision engine, we found out that it is very important to make the decision process as transparent as possible to all users, as this increases the level of the acceptance even when a bad decision was made. Acknowledgments During the accomplishment of this master thesis, a lot of people supported and guided me to a successful completion of the work.
    [Show full text]
  • Combining Tactical Search and Deep Learning in the Game of Go
    Combining tactical search and deep learning in the game of Go Tristan Cazenave PSL-Universite´ Paris-Dauphine, LAMSADE CNRS UMR 7243, Paris, France [email protected] Abstract Elaborated search algorithms have been developed to solve tactical problems in the game of Go such as capture problems In this paper we experiment with a Deep Convo- [Cazenave, 2003] or life and death problems [Kishimoto and lutional Neural Network for the game of Go. We Muller,¨ 2005]. In this paper we propose to combine tactical show that even if it leads to strong play, it has search algorithms with deep learning. weaknesses at tactical search. We propose to com- Other recent works combine symbolic and deep learn- bine tactical search with Deep Learning to improve ing approaches. For example in image surveillance systems Golois, the resulting Go program. A related work [Maynord et al., 2016] or in systems that combine reasoning is AlphaGo, it combines tactical search with Deep with visual processing [Aditya et al., 2015]. Learning giving as input to the network the results The next section presents our deep learning architecture. of ladders. We propose to extend this further to The third section presents tactical search in the game of Go. other kind of tactical search such as life and death The fourth section details experimental results. search. 2 Deep Learning 1 Introduction In the design of our network we follow previous work [Mad- Deep Learning has been recently used with a lot of success dison et al., 2014; Tian and Zhu, 2015]. Our network is fully in multiple different artificial intelligence tasks.
    [Show full text]
  • (CMPUT) 455 Search, Knowledge, and Simulations
    Computing Science (CMPUT) 455 Search, Knowledge, and Simulations James Wright Department of Computing Science University of Alberta [email protected] Winter 2021 1 455 Today - Lecture 22 • AlphaGo - overview and early versions • Coursework • Work on Assignment 4 • Reading: AlphaGo Zero paper 2 AlphaGo Introduction • High-level overview • History of DeepMind and AlphaGo • AlphaGo components and versions • Performance measurements • Games against humans • Impact, limitations, other applications, future 3 About DeepMind • Founded 2010 as a startup company • Bought by Google in 2014 • Based in London, UK, Edmonton (from 2017), Montreal, Paris • Expertise in Reinforcement Learning, deep learning and search 4 DeepMind and AlphaGo • A DeepMind team developed AlphaGo 2014-17 • Result: Massive advance in playing strength of Go programs • Before AlphaGo: programs about 3 levels below best humans • AlphaGo/Alpha Zero: far surpassed human skill in Go • Now: AlphaGo is retired • Now: Many other super-strong programs, including open source Image source: • https://www.nature.com All are based on AlphaGo, Alpha Zero ideas 5 DeepMind and UAlberta • UAlberta has deep connections • Faculty who work part-time or on leave at DeepMind • Rich Sutton, Michael Bowling, Patrick Pilarski, Csaba Szepesvari (all part time) • Many of our former students and postdocs work at DeepMind • David Silver - UofA PhD, designer of AlphaGo, lead of the DeepMind RL and AlphaGo teams • Aja Huang - UofA postdoc, main AlphaGo programmer • Many from the computer Poker group
    [Show full text]
  • Learning to Play the Game of Go
    Learning to Play the Game of Go James Foulds October 17, 2006 Abstract The problem of creating a successful artificial intelligence game playing program for the game of Go represents an important milestone in the history of computer science, and provides an interesting domain for the development of both new and existing problem-solving methods. In particular, the problem of Go can be used as a benchmark for machine learning techniques. Most commercial Go playing programs use rule-based expert systems, re- lying heavily on manually entered domain knowledge. Due to the complexity of strategy possible in the game, these programs can only play at an amateur level of skill. A more recent approach is to apply machine learning to the prob- lem. Machine learning-based Go playing systems are currently weaker than the rule-based programs, but this is still an active area of research. This project compares the performance of an extensive set of supervised machine learning algorithms in the context of learning from a set of features generated from the common fate graph – a graph representation of a Go playing board. The method is applied to a collection of life-and-death problems and to 9 × 9 games, using a variety of learning algorithms. A comparative study is performed to determine the effectiveness of each learning algorithm in this context. Contents 1 Introduction 4 2 Background 4 2.1 Go................................... 4 2.1.1 DescriptionoftheGame. 5 2.1.2 TheHistoryofGo ...................... 6 2.1.3 Elementary Strategy . 7 2.1.4 Player Rankings and Handicaps . 7 2.1.5 Tsumego ..........................
    [Show full text]