Visualizing and Understanding Policy Networks of Computer Go

Total Pages: 16

File Type: PDF, Size: 1020 KB

Journal of Information Processing Vol.29 347–359 (Apr. 2021) [DOI: 10.2197/ipsjjip.29.347] (Regular Paper)

Visualizing and Understanding Policy Networks of Computer Go

Yuanfeng Pang and Takeshi Ito (The University of Electro-Communications, Chofu, Tokyo 182–8585, Japan; [email protected], [email protected])

Received: July 26, 2020; Accepted: January 12, 2021

Abstract: Deep learning for the game of Go achieved considerable success with the victory of AlphaGo against Ke Jie in May 2017. Thus far, there is no clear understanding of why deep learning performs so well in the game of Go. In this paper, we introduce visualization techniques used in image recognition that provide insights into the function of intermediate layers and the operation of the Go policy network. When used as a diagnostic tool, these visualizations enable us to understand what occurs during the training process of policy networks. Further, we introduce a visualization technique that performs a sensitivity analysis of the classifier output by occluding portions of the input Go board, revealing the parts that are important for predicting the next move. Finally, we attempt to identify important areas through Grad-CAM and combine them with the Go board to provide explanations for next-move decisions.

Keywords: computer Go, deep learning, visualization, policy network, Grad-CAM

1. Introduction

For a long time, computer Go has been a significant challenge in artificial intelligence. Go is difficult because of its high branching factor and subtle board situations that are sensitive to small changes. Owing to the combination of these two causes, massive search with a prohibitive amount of resources, such as Monte Carlo rollouts and Monte Carlo tree search (MCTS), was used. Monte Carlo rollouts sample long sequences of actions at high speed for both players using a simple policy; averaging over such rollouts provides an effective position evaluation, achieving weak amateur-level play in Go. MCTS uses Monte Carlo rollouts [1] to estimate the value of each state in a search tree. As more simulations are executed, the search tree grows larger and the relevant values become more accurate. However, even with cutting-edge hardware, such simulation still cannot achieve the strength required to beat leading human professional players.
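The rollout-averaging evaluation described in the paragraph above can be written in a few lines. The following is only an illustrative sketch: `position`, `policy`, and their methods (`copy`, `is_terminal`, `play`, `sample_move`, `score`) are hypothetical interfaces, not code from the paper.

```python
import statistics

def rollout_value(position, policy, max_moves=400):
    """Play one fast game to the end with a simple policy and return its score."""
    pos = position.copy()
    for _ in range(max_moves):
        if pos.is_terminal():
            break
        pos.play(policy.sample_move(pos))   # cheap move choice, e.g. random or pattern-based
    return pos.score()                      # e.g. 1 if the side to move in `position` wins, else 0

def monte_carlo_evaluate(position, policy, n_rollouts=1000):
    """Monte Carlo position evaluation: the average outcome over many rollouts."""
    return statistics.mean(rollout_value(position, policy) for _ in range(n_rollouts))
```

More rollouts tighten the estimate, which is why strength under this scheme quickly becomes a question of search budget.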
Fortunately, since their introduction to computer Go by Clark and Storkey [2], convolutional networks have demonstrated excellent performance at the game. Several studies have reported that convolutional networks can deliver performance comparable to regular MCTS-based approaches. This idea was extended in the bot named Darkforest. Darkforest [3] relies on a deep convolutional neural network (DCNN) designed for long-term predictions, and it substantially improves the win rate of pattern-matching approaches against MCTS-based approaches, even with looser search budgets. Next, AlphaGo [4] was introduced, which combines MCTS with a policy and a value network; since 2016, AlphaGo has defeated three human Go champions. This was the first time that a computer program defeated a human professional player in the full-sized game of Go.

Despite the encouraging fact that deep neural networks are better at recognizing shapes in the game of Go than Monte Carlo rollouts, there is little insight into the internal operation and behavior of these complex networks, or how they achieve such good performance. Without a clear understanding of how and why they work, we cannot effectively build on DCNN-based computer Go and the encouraging progress it has made.

Though Go policy networks and ImageNet classification networks are both convolutional networks, their inputs and outputs are entirely different. In this paper, we introduce visualization techniques used in image recognition that provide insight into the function of intermediate layers, and we apply them to visualize the operation of the Go policy network. When used as a diagnostic tool, these visualizations allow us to understand what occurs during the training process. Further, we introduce a visualization technique that performs a sensitivity analysis of the classifier output by occluding portions of the input Go board, revealing the parts that are important for predicting the next move. We also attempt to identify important areas through Grad-CAM and combine them with the Go board to provide explanations for next-move decisions. Zeiler and Fergus [5] reported that, when used in a diagnostic role, these visualization techniques enabled them to identify model architectures that outperformed former models on the ImageNet classification benchmark. However, in the case of training a policy network, the usefulness of this method remains unclear. Based on the problems indicated by the visualization results, we attempt to change the learning rate of the policy network to improve its performance. Then, using the visualization techniques, we explore the changed policy network to understand what changed during the training process.

Our work is divided into three parts. In Section 3, we train a policy network from scratch and save the networks at different epochs to explain the training process. In Section 4, we apply the visualization techniques used in image recognition to the Go policy network to aid its interpretation; further, we analyze what occurs during the training process. As these interpretations may prompt ideas for improved networks, in Section 5 we present approaches to improve the performance of Go policy networks based on the visualized experimental results.

This paper makes the following two contributions:
• Most papers on visualization techniques focus only on the visualization results of an already trained convolutional neural network (classic networks such as AlexNet or VGG). Since we trained a new Go policy network and saved the network information from different epochs, we can explain the training process in this work.
• Most of the visualization techniques that we introduce were previously used only in image captioning or visual question answering (VQA). However, our visualization results show that similar results can be achieved with a Go policy network. Under a limited network architecture, visualization techniques can help researchers focus their efforts on creating better policy networks for computer Go.
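The occlusion-based sensitivity analysis described above can be sketched as follows. This is a minimal illustration under assumed interfaces: the `predict_move_probs` callable, the (C, 19, 19) plane layout, and the zero-filled occluder are placeholders rather than the authors' implementation.

```python
import numpy as np

def occlusion_sensitivity(predict_move_probs, planes, target_move, patch=3):
    """Heatmap of how much masking each board region lowers the probability
    that the policy network assigns to `target_move`.

    predict_move_probs : callable mapping input planes of shape (C, 19, 19)
                         to a length-361 vector of move probabilities (assumed)
    planes             : np.ndarray of shape (C, 19, 19), the encoded position
    target_move        : int index of the move being explained
    patch              : side length of the square occluder
    """
    base = predict_move_probs(planes)[target_move]
    heat = np.zeros((19, 19))
    for y in range(19):
        for x in range(19):
            occluded = planes.copy()
            y0, y1 = max(0, y - patch // 2), min(19, y + patch // 2 + 1)
            x0, x1 = max(0, x - patch // 2), min(19, x + patch // 2 + 1)
            occluded[:, y0:y1, x0:x1] = 0.0   # blank out the features in this region
            heat[y, x] = base - predict_move_probs(occluded)[target_move]
    return heat  # large values mark regions important for the prediction
```

Regions whose occlusion produces the largest drop in probability are the parts of the board the network relies on for that move.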
2. Related Work

2.1 Visualization

Recent work in image recognition has demonstrated the considerable advantages of deep convolutional networks over alternative architectures. Meanwhile, there has been progress in understanding how these models work. One relatively simple and common approach is to visualize the filters. However, this approach is limited to the first layer, where projections can be made onto pixel space. Because the first layer computes a direct inner product between the weights of the convolution layer and the pixels of the image, we can gauge what these filters are looking for by visualizing their learned weights. Clark and Storkey [2] visualize the weights of some randomly selected channels from randomly selected convolution filters of a five-layer convolutional neural network trained on the GoGoD dataset. They reported that some filters learn to acquire a symmetric property. However, because visualized filters of intermediate layers are not connected directly to the input image, they are less interpretable, and in higher layers alternative methods must be used. In occlusion-based approaches, part of the input image is masked before feeding it to the CNN, and a heatmap of the resulting output probability is then drawn. Zeiler and Fergus [5] reported that there is a distinct drop in the activity within a feature map when the parts of the original input image corresponding to its learned pattern are occluded. When used in a diagnostic role, these visualizations allowed them to identify model architectures that outperform older ones. Chu, Yang, and Tadinada [6] explored residual networks using visualization and empirical analysis, and they examined the purpose of residual skip connections. Selvaraju and Cogswell [7] introduced a new method (Grad-CAM) for combining feature maps by using a gradient signal that does not require any modification of the network architecture.

Visualization techniques for convolutional neural network behavior are applied not only to image recognition tasks but also to other artificial intelligence tasks. For example, Laurens, Elise, and Zeynep analyzed game AI behavior using visualization techniques [8]. They visualized the evidence on which the agent bases its decisions and explained the importance of producing a justification for a black-box decision agent.

Recent studies on CNN visualization have achieved considerable results. The generality of these visualization techniques suggests that they may perform well in other "visual" domains such as computer Go.

2.2 Policy Network for Computer Go

The most successful current programs in Go are based on MCTS with policy and value networks. The strongest programs, such as AlphaGo and Darkforest, apply convolutional networks to construct a move selection policy, which is then used to bias the exploration when training the value network. AlphaGo achieved a 57% move prediction accuracy using supervised learning based on a database of human professional games [9]. The probability that the expert's move is within the network's top-5 predictions is over 87%. Lately, it has become possible to avoid overfitting to the values by using a combined policy and value network architecture and a low weight on the value component. After 72 h, the move prediction accuracy exceeded that of the state-of-the-art technique reported in previous work, reaching 60.4% on the KGS test set [10]. Meanwhile, some open-source DNN implementations such as Leela Zero's networks, which are publicly available and have proven performance, achieved an accuracy of more than 54%. Unlike the input to the image recognition neural network (im…
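To make Grad-CAM's gradient-weighted combination of feature maps concrete for a policy network, the following is a hypothetical PyTorch sketch; the network, the chosen convolutional layer, and the input shape are assumptions, not the paper's code. Conveniently, Go policy networks typically keep the 19x19 resolution in every convolutional layer, so the resulting heatmap can be overlaid on the board without upsampling.

```python
import torch
import torch.nn.functional as F

def grad_cam(policy_net, board_planes, target_move, conv_layer):
    """Grad-CAM heatmap explaining one move prediction.

    policy_net   : CNN mapping (1, C, 19, 19) input planes to 361 move logits (assumed)
    board_planes : torch.Tensor of shape (1, C, 19, 19)
    target_move  : int index of the move to explain
    conv_layer   : the convolutional module whose feature maps are weighted
    """
    feats, grads = {}, {}
    h1 = conv_layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
    h2 = conv_layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

    logits = policy_net(board_planes)        # (1, 361)
    policy_net.zero_grad()
    logits[0, target_move].backward()        # gradient of the chosen move's score
    h1.remove(); h2.remove()

    A, dA = feats["a"], grads["a"]                     # both (1, K, 19, 19)
    weights = dA.mean(dim=(2, 3), keepdim=True)        # global-average-pooled gradients
    cam = F.relu((weights * A).sum(dim=1))             # weighted sum of feature maps
    return (cam / (cam.max() + 1e-8)).squeeze(0).detach()   # normalized 19x19 heatmap
```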
Recommended publications
  • Residual Networks for Computer Go
    IEEE TCIAIG. Residual Networks for Computer Go. Tristan Cazenave, Université Paris-Dauphine, PSL Research University, CNRS, LAMSADE, 75016 Paris, France. Deep learning for the game of Go recently had a tremendous success with the victory of AlphaGo against Lee Sedol in March 2016. We propose to use residual networks so as to improve the training of a policy network for computer Go. Training is faster than with usual convolutional networks, and residual networks achieve high accuracy on our test set and a 4 dan level. Index Terms—Deep Learning, Computer Go, Residual Networks. I. INTRODUCTION. Deep learning for the game of Go with convolutional neural networks has been addressed by Clark and Storkey [1]. It has been further improved by using larger networks [2]. Learning multiple moves in a row instead of only one move has also been shown to improve the playing strength of Go playing programs that choose moves according to a deep neural network [3]. Then came AlphaGo [4], which combines Monte Carlo Tree Search (MCTS) with a policy and a value network. Deep neural networks are good at recognizing shapes in the game of Go. However, they have weaknesses at tactical search such as ladders and life and death. The way this is handled in AlphaGo is to give the network the results of ladders as input. Reading ladders is not enough to understand more complex problems that require search, so AlphaGo combines deep networks with MCTS [5]. It learns a value network … the last section concludes. (Fig. 1 shows the usual layer for computer Go: Input, Convolution, ReLU, Output.)
  • arXiv:1701.07274v3 [cs.LG] 15 Jul 2017
    DEEP REINFORCEMENT LEARNING: AN OVERVIEW. Yuxi Li ([email protected]). ABSTRACT: We give an overview of recent exciting achievements of deep reinforcement learning (RL). We discuss six core elements, six important mechanisms, and twelve applications. We start with background of machine learning, deep learning and reinforcement learning. Next we discuss core RL elements, including value function, in particular, Deep Q-Network (DQN), policy, reward, model, planning, and exploration. After that, we discuss important mechanisms for RL, including attention and memory, in particular, differentiable neural computer (DNC), unsupervised learning, transfer learning, semi-supervised learning, hierarchical RL, and learning to learn. Then we discuss various applications of RL, including games, in particular, AlphaGo, robotics, natural language processing, including dialogue systems (a.k.a. chatbots), machine translation, and text generation, computer vision, neural architecture design, business management, finance, healthcare, Industry 4.0, smart grid, intelligent transportation systems, and computer systems. We mention topics not reviewed yet. After listing a collection of RL resources, we present a brief summary, and close with discussions. CONTENTS: 1 Introduction; 2 Background; 2.1 Machine Learning; 2.2 Deep Learning; 2.3 Reinforcement Learning; 2.3.1 Problem Setup; 2.3.2 Value Function; 2.3.3 Temporal Difference Learning; 2.3.4 Multi-step Bootstrapping; 2.3.5 Function Approximation; 2.3.6 Policy Optimization; 2.3.7 Deep Reinforcement Learning; 2.3.8 RL Parlance; 2.3.9 Brief Summary …
  • ELF: an Extensive, Lightweight and Flexible Research Platform for Real-Time Strategy Games
    ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games. Yuandong Tian (1), Qucheng Gong (1), Wenling Shang (2), Yuxin Wu (1), C. Lawrence Zitnick (1). (1) Facebook AI Research, (2) Oculus. {yuandong, qucheng, yuxinwu, zitnick}@fb.com, [email protected]. Abstract: In this paper, we propose ELF, an Extensive, Lightweight and Flexible platform for fundamental reinforcement learning research. Using ELF, we implement a highly customizable real-time strategy (RTS) engine with three game environments (Mini-RTS, Capture the Flag and Tower Defense). Mini-RTS, as a miniature version of StarCraft, captures key game dynamics and runs at 40K frames per second (FPS) per core on a laptop. When coupled with modern reinforcement learning methods, the system can train a full-game bot against built-in AIs end-to-end in one day with 6 CPUs and 1 GPU. In addition, our platform is flexible in terms of environment-agent communication topologies, choices of RL methods, and changes in game parameters, and it can host existing C/C++-based game environments like ALE [4]. Using ELF, we thoroughly explore training parameters and show that a network with Leaky ReLU [17] and Batch Normalization [11] coupled with long-horizon training and progressive curriculum beats the rule-based built-in AI more than 70% of the time in the full game of Mini-RTS. Strong performance is also achieved on the other two games. In game replays, we show our agents learn interesting strategies. ELF, along with its RL platform, is open sourced at https://github.com/facebookresearch/ELF. 1 Introduction. Game environments are commonly used for research in Reinforcement Learning (RL), i.e. …
  • Improved Policy Networks for Computer Go
    Improved Policy Networks for Computer Go. Tristan Cazenave, Université Paris-Dauphine, PSL Research University, CNRS, LAMSADE, Paris, France. Abstract. Golois uses residual policy networks to play Go. Two improvements to these residual policy networks are proposed and tested. The first one is to use three output planes. The second one is to add Spatial Batch Normalization. 1 Introduction. Deep learning for the game of Go with convolutional neural networks has been addressed by [2]. It has been further improved using larger networks [7, 10]. AlphaGo [9] combines Monte Carlo Tree Search with a policy and a value network. Residual networks improve the training of very deep networks [4]. These networks can gain accuracy from considerably increased depth. On the ImageNet dataset, a 152-layer network achieves 3.57% error; it won 1st place in the ILSVRC 2015 classification task. The principle of residual nets is to add the input of the layer to the output of each layer. With this simple modification, training is faster and deeper networks become possible. Residual networks were recently successfully adapted to computer Go [1]. As a follow-up to this paper, we propose improvements to residual networks for computer Go. The second section details the different proposed improvements to policy networks for computer Go, the third section gives experimental results, and the last section concludes. 2 Proposed Improvements. We propose two improvements for policy networks. The first improvement is to use multiple output planes, as in DarkForest. The second improvement is to use Spatial Batch Normalization. 2.1 Multiple Output Planes. In DarkForest [10], training with multiple output planes containing the next three moves to play has been shown to improve the level of play of a usual policy network with 13 layers.
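The residual principle mentioned in the excerpt (adding the input of a block to its output) and spatial batch normalization can be illustrated with a generic block like the one below. This is a sketch of the general technique, not the Golois architecture; the channel count is arbitrary.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A generic residual block for a 19x19 policy network (illustrative only)."""
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)          # "spatial" batch norm over the board plane
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        y = torch.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        return torch.relu(y + x)                     # add the block input to its output
```

Because the skip connection is an identity, stacking many such blocks keeps gradients flowing, which is why deeper policy networks train faster than plain convolutional stacks.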
  • FML-Based Dynamic Assessment Agent for Human-Machine Cooperative System on Game of Go
    Accepted for publication in International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems in July 2017. FML-BASED DYNAMIC ASSESSMENT AGENT FOR HUMAN-MACHINE COOPERATIVE SYSTEM ON GAME OF GO. CHANG-SHING LEE*, MEI-HUI WANG, SHENG-CHI YANG, Department of Computer Science and Information Engineering, National University of Tainan, Tainan, Taiwan, *[email protected], [email protected], [email protected]; PI-HSIA HUNG, SU-WEI LIN, Department of Education, National University of Tainan, Tainan, Taiwan, [email protected], [email protected]; NAN SHUO, NAOYUKI KUBOTA, Dept. of System Design, Tokyo Metropolitan University, Japan, [email protected], [email protected]; CHUN-HSUN CHOU, PING-CHIANG CHOU, Haifong Weiqi Academy, Taiwan, [email protected], [email protected]; CHIA-HSIU KAO, Department of Computer Science and Information Engineering, National University of Tainan, Tainan, Taiwan, [email protected]. Received (received date), Revised (revised date), Accepted (accepted date). In this paper, we demonstrate the application of Fuzzy Markup Language (FML) to construct an FML-based Dynamic Assessment Agent (FDAA), and we present an FML-based Human–Machine Cooperative System (FHMCS) for the game of Go. The proposed FDAA comprises an intelligent decision-making and learning mechanism, an intelligent game bot, a proximal development agent, and an intelligent agent. The intelligent game bot is based on the open-source code of Facebook's Darkforest, and it features a representational state transfer application programming interface mechanism. The proximal development agent contains a dynamic assessment mechanism, a GoSocket mechanism, and an FML engine with a fuzzy knowledge base and rule base.
  • Residual Networks for Computer Go (Tristan Cazenave)
    Residual Networks for Computer Go. Tristan Cazenave. To cite this version: Tristan Cazenave. Residual Networks for Computer Go. IEEE Transactions on Games, Institute of Electrical and Electronics Engineers, 2018, 10 (1), 10.1109/TCIAIG.2017.2681042. HAL Id: hal-02098330, https://hal.archives-ouvertes.fr/hal-02098330, submitted on 12 Apr 2019. HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. IEEE TCIAIG. Residual Networks for Computer Go. Tristan Cazenave, Université Paris-Dauphine, PSL Research University, CNRS, LAMSADE, 75016 Paris, France. Deep learning for the game of Go recently had a tremendous success with the victory of AlphaGo against Lee Sedol in March 2016. We propose to use residual networks so as to improve the training of a policy network for computer Go. Training is faster than with usual convolutional networks, and residual networks achieve high accuracy on our test set and a 4 dan level. Index Terms—Deep Learning, Computer Go, Residual Networks. I. INTRODUCTION. Deep learning for the game of Go with convolutional neural networks has been addressed by Clark and Storkey [1]. It has been further improved by using larger networks [2].
  • Move Prediction in Gomoku Using Deep Learning
    Youth Academic Annual Conference of Chinese Association of Automation, Wuhan, China, November. Move Prediction in Gomoku Using Deep Learning. Kun Shao, Dongbin Zhao, Zhentao Tang, and Yuanheng Zhu. The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China. Emails: [email protected]; [email protected]; [email protected]; [email protected]. Abstract—The Gomoku board game is a longstanding challenge for artificial intelligence research. With the development of deep learning, move prediction can help to promote the intelligence of board game agents, as proven in AlphaGo. Following this idea, we train deep convolutional neural networks by supervised learning to predict the moves made by expert Gomoku players from the RenjuNet dataset. We put forward a number of deep neural networks with different architectures and different hyperparameters to solve this problem. With only the board state as the input, the proposed deep convolutional neural networks are able to recognize some special features of Gomoku and select the most likely next move. The final neural network achieves a move prediction accuracy of about 42% on the RenjuNet dataset, which reaches the level of expert Gomoku players. In addition, it is promising to generate strong human-level Gomoku agents with the move prediction as a guide. Keywords—Gomoku; move prediction; deep learning; deep convolutional network. (Fig. 1 shows an example position from a game of Gomoku after 58 moves; White wins the game.)
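A generic supervised move-prediction setup of the kind the excerpt describes (board in, distribution over points out, cross-entropy against the expert's move) might look like the sketch below; the architecture and sizes are illustrative, not those of the paper.

```python
import torch
import torch.nn as nn

class MovePredictor(nn.Module):
    """Tiny illustrative CNN: board planes in, one logit per 15x15 Gomoku point out."""
    def __init__(self, in_planes=1, width=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_planes, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, 1, 1),                  # one logit per board point
        )

    def forward(self, boards):                       # boards: (N, in_planes, 15, 15)
        return self.body(boards).flatten(1)          # (N, 225) move logits

def training_step(model, optimizer, boards, expert_moves):
    """One supervised step: expert_moves holds the index of the move actually played."""
    logits = model(boards)
    loss = nn.functional.cross_entropy(logits, expert_moves)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```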
  • ELF OpenGo: An Analysis and Open Reimplementation of AlphaZero
    ELF OpenGo: An Analysis and Open Reimplementation of AlphaZero. Yuandong Tian, Jerry Ma*, Qucheng Gong*, Shubho Sengupta*, Zhuoyuan Chen, James Pinkerton, C. Lawrence Zitnick. Abstract: The AlphaGo, AlphaGo Zero, and AlphaZero series of algorithms are remarkable demonstrations of deep reinforcement learning's capabilities, achieving superhuman performance in the complex game of Go with progressively increasing autonomy. However, many obstacles remain in the understanding of and usability of these promising approaches by the research community. Toward elucidating unresolved mysteries and facilitating future research, we propose ELF OpenGo, an open-source reimplementation of the AlphaZero algorithm. ELF OpenGo is the first open-source Go AI to convincingly demonstrate superhuman performance with a perfect (20:0) record against global top professionals. We apply ELF OpenGo to conduct extensive ablation studies, and to identify and analyze numerous interesting phenomena in both the model training … However, these advances in playing ability come at significant computational expense. A single training run requires millions of selfplay games and days of training on thousands of TPUs, which is an unattainable level of compute for the majority of the research community. When combined with the unavailability of code and models, the result is that the approach is very difficult, if not impossible, to reproduce, study, improve upon, and extend. In this paper, we propose ELF OpenGo, an open-source reimplementation of the AlphaZero (Silver et al., 2018) algorithm for the game of Go. We then apply ELF OpenGo toward the following three additional contributions. First, we train a superhuman model for ELF OpenGo. After running our AlphaZero-style training software on 2,000 GPUs for 9 days, our 20-block model has achieved superhuman performance that is arguably comparable to the 20-block models described in Silver et al. (2017) and Silver et al. (2018).
  • AI-FML Agent for Robotic Game of Go and AIoT Real-World Co-Learning Applications
    AI-FML Agent for Robotic Game of Go and AIoT Real-World Co-Learning Applications. Chang-Shing Lee, Yi-Lin Tsai, Mei-Hui Wang, Wen-Kai Kuan, Zong-Han Ciou (Dept. of Computer Science and Information Engineering, National University of Tainan, Tainan, Taiwan, [email protected]); Naoyuki Kubota (Dept. of Mechanical Systems Engineering, Tokyo Metropolitan University, Tokyo, Japan, [email protected]). Abstract—In this paper, we propose an AI-FML agent for the robotic game of Go and AIoT real-world co-learning applications. Fuzzy machine learning mechanisms are adopted in the proposed model, including fuzzy markup language (FML)-based genetic learning (GFML), eXtreme Gradient Boost (XGBoost), and a seven-layered deep fuzzy neural network (DFNN) with backpropagation learning, to predict the win rate of the game of Go as Black or White. This paper uses Google AlphaGo Master's sixty games as the dataset to evaluate the performance of the fuzzy machine learning, and the desired output dataset was predicted by the Facebook AI Research (FAIR) ELF Open Go AI bot. In addition, we use the IEEE 1855 standard for FML to describe the … IEEE CEC 2019 and FUZZ-IEEE 2019. The goal of the competition is to understand the basic concepts of an FML-based fuzzy inference system, to use the FML intelligent decision tool to establish the knowledge base and rule base of the fuzzy inference system, and to optimize the FML knowledge base and rule base through the methodologies of evolutionary computation and machine learning [2]. Machine learning has been a hot topic in research and industry, and new methodologies keep developing all the time. Regression, ensemble methods, and deep learning are important machine learning methods for data scientists [10].
  • Combining Tactical Search and Deep Learning in the Game of Go
    Combining tactical search and deep learning in the game of Go. Tristan Cazenave, PSL-Université Paris-Dauphine, LAMSADE CNRS UMR 7243, Paris, France. [email protected]. Abstract: In this paper we experiment with a Deep Convolutional Neural Network for the game of Go. We show that even if it leads to strong play, it has weaknesses at tactical search. We propose to combine tactical search with Deep Learning to improve Golois, the resulting Go program. A related work is AlphaGo, which combines tactical search with Deep Learning, giving the network the results of ladders as input. We propose to extend this further to other kinds of tactical search such as life and death search. 1 Introduction. Deep Learning has recently been used with a lot of success in multiple different artificial intelligence tasks. Elaborated search algorithms have been developed to solve tactical problems in the game of Go, such as capture problems [Cazenave, 2003] or life and death problems [Kishimoto and Müller, 2005]. In this paper we propose to combine tactical search algorithms with deep learning. Other recent works combine symbolic and deep learning approaches, for example in image surveillance systems [Maynord et al., 2016] or in systems that combine reasoning with visual processing [Aditya et al., 2015]. The next section presents our deep learning architecture. The third section presents tactical search in the game of Go. The fourth section details experimental results. 2 Deep Learning. In the design of our network we follow previous work [Maddison et al., 2014; Tian and Zhu, 2015]. Our network is fully …
  • Deep Learning for Go
    Research Collection, Bachelor Thesis. Deep Learning for Go. Author: Rosenthal, Jonathan. Publication Date: 2016. Permanent Link: https://doi.org/10.3929/ethz-a-010891076. Rights / License: In Copyright - Non-Commercial Use Permitted. Deep Learning for Go, Bachelor Thesis, Jonathan Rosenthal, June 18, 2016. Advisors: Prof. Dr. Thomas Hofmann, Dr. Martin Jaggi, Yannic Kilcher. Department of Computer Science, ETH Zürich. Contents: 1 Introduction (1.1 Overview of Thesis; 1.2 Jorogo Code Base and Contributions); 2 The Game of Go (2.1 Etymology and History; 2.2 Rules of Go: 2.2.1 Game Terms, 2.2.2 Legal Moves, 2.2.3 Scoring); 3 Computer Chess vs Computer Go (3.1 Available Resources; 3.2 Open Source Projects; 3.3 Branching Factor; 3.4 Evaluation Heuristics; 3.5 Machine Learning); 4 Search Algorithms (4.1 Minimax: 4.1.1 Iterative Deepening; 4.2 Alpha-Beta: 4.2.1 Move Ordering Analysis, 4.2.2 History Heuristic; 4.3 Probability Based Search: 4.3.1 Performance; 4.4 Monte-Carlo Tree Search); 5 Evaluation Algorithms (5.1 Heuristic Based: 5.1.1 Bouzy's Algorithm; 5.2 Final Positions; 5.3 Monte-Carlo; 5.4 Neural Network); 6 Neural Network Architectures (6.1 Networks in Giraffe …)
  • GoGoGo: Improving Deep Neural Network Based Go Playing AI with Residual Networks
    GoGoGo: Improving Deep Neural Network Based Go Playing AI with Residual Networks. Xingyu Liu. Introduction: Go-playing AIs using traditional search include GNU Go, Pachi, Fuego, Zen, etc.; AIs powered by deep learning include Deep Zen Go (from Zen), darkforest, and AlphaGo. Goal: move from a vanilla CNN to ResNets. Training methodology and data: supervised learning (SL) on the policy network, then reinforcement learning (RL) on the policy and value networks. The Ing Chang-ki rules are used, so the board state plus ko constitutes the full game state and there is no need to remember the number of captured stones. Input feature maps are built from kifu (game records) with five channels: 1) empty ("space") positions; 2) black positions; 3) white positions; 4) current player; 5) ko positions. Hyperparameters: base learning rate 2e-4, exponential decay policy, decay rate 0.95, decay step 200 (kifu), softmax loss function. Network architecture: a stack of 3x3, 64-channel convolutions arranged into residual modules (conv, batch norm, ReLU, conv, batch norm, element-wise add, ReLU). Experiment result: training accuracy ~32%, testing accuracy ~26%. (The original poster also shows the input of shape 19x19x5, the residual-module diagram, and a supervised-learning training-loss curve plotted per 100 batches.)
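The five input feature planes listed in the excerpt can be encoded roughly as follows; the board representation (0 = empty, 1 = black, 2 = white) and the function name are assumptions for illustration, not the poster's code.

```python
import numpy as np

def encode_position(board, to_play, ko=None):
    """Build the five 19x19 feature planes described above.
    board   : np.ndarray (19, 19) with 0 = empty, 1 = black, 2 = white (assumed encoding)
    to_play : 1 if black to move, 2 if white to move
    ko      : optional (row, col) of the current ko point
    """
    planes = np.zeros((5, 19, 19), dtype=np.float32)
    planes[0] = (board == 0)                  # empty ("space") positions
    planes[1] = (board == 1)                  # black stones
    planes[2] = (board == 2)                  # white stones
    planes[3] = 1.0 if to_play == 1 else 0.0  # current-player plane (all ones when black to move)
    if ko is not None:
        planes[4][ko] = 1.0                   # ko position, if any
    return planes
```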