Visualizing and Understanding Policy Networks of Computer Go

Journal of Information Processing Vol.29 347–359 (Apr. 2021) [DOI: 10.2197/ipsjjip.29.347] Regular Paper Visualizing and Understanding Policy Networks of Computer Go Yuanfeng Pang1,a) Takeshi Ito1,b) Received: July 26, 2020, Accepted: January 12, 2021 Abstract: Deep learning for the game of Go achieved considerable success with the victory of AlphaGo against Ke Jie in May 2017. Thus far, there is no clear understanding of why deep learning performs so well in the game of Go. In this paper, we introduce visualization techniques used in image recognition that provide insights into the function of intermediate layers and the operation of the Go policy network. When used as a diagnostic tool, these visualizations enable us to understand what occurs during the training process of policy networks. Further, we introduce a visualization technique that performs a sensitivity analysis of the classifier output by occluding portions of the input Go board, and revealing parts that important for predicting the next move. Further, we attempt to identify important areas through Grad-CAM and combine it with the Go board to provide explanations for next move decisions. Keywords: computer Go, deep learning, visualization, policy network, grad-CAM feated three human Go champions. This is the first time that a 1. Introduction computer program has defeated a human professional player in For the longest time, computer Go has been a significant chal- the full-sized game of Go. lenge in artificial intelligence. Go is difficult because of its high Despite the encouraging fact that deep neural networks are bet- branching factors and subtle board situations that are sensitive to ter at recognizing shapes in the game of Go than Monte Carlo small changes. Owing to a combination of these two causes, a rollouts, there is little insight regarding the internal operation and massive search with a prohibitive amount of resources such as behavior of these complex networks, or how they achieve such Monte Carlo rollouts and Monte Carlo tree search (MCTS) was good performances. Without a clear understanding of how and used. Monte Carlo rollouts sample long sequences of actions why they work, we cannot effectively employ the DCNN-based at high speed for both players using a simple policy. Averag- computer Go and the encouraging progress it made. ing over such rollouts provide an effective position evaluation, Though the Go policy and ImageNet classification networks thus achieving weak amateur level play in Go. The MCTS uses are both convolutional networks, their inputs and outputs are en- Monte Carlo rollouts [1] to estimate the value of each state in tirely different. In this paper, we introduce visualization tech- a search tree. As more simulations are executed, the search tree niques used in image recognition that provide insight into the grows larger and the relevant values become more accurate. How- function of intermediate layers, and we apply them to visualize ever, even with cutting-edge hardware, the simulation still cannot the operation of the Go policy network. When used as a diagnos- achieve the strength required to beat the leading human profes- tic tool, these visualizations allow us to understand what occurs sional player. during the training process. Further, we introduce a visualization Fortunately, since its introduction by Clark and Storkey [2], technique that performs a sensitivity analysis of the classifier out- convolutional networks have demonstrated excellent performance put by occluding portions of the input Go board, revealing only at computer Go. Several studies have reported that convolutional parts that are important for predicting the next move. Further, networks can deliver good performance similar to regular MCTS- we attempt to identify areas through the Grad-CAM and com- based approaches. This idea was extended in the bot named Dark- bine it with Go board to provide explanations for next move deci- forest. Darkforest [3] lies on a deep convolutional neural network sions. Zeiler and Fergus [5] reported that, when used in a diagnos- (DCNN) designed for long-term predictions. Darkforest sub- tic role, these visualization techniques enabled them to identify stantially improves the win rate for pattern matching approaches model architectures that outperformed former models in terms of against MCTS-based approaches, even with looser search bud- the ImageNet classification benchmark. However, in the case of gets. Next, AlphaGo [4] was introduced, which combines MCTS training a policy network, the usefulness of this method remains with a policy and a value network; since 2016, AlphaGo has de- unclear. Based on the problems indicated by the visualization results, we attempt to change the learning rate of the policy network 1 The University of Electro-Communication, Chofu, Tokyo 182–8585, to improve the performance of the policy network. Then, by using Japan the visualization techniques, we explore the changed policy net- a) [email protected] b) [email protected] work to understand what changed during the training processes. c 2021 Information Processing Society of Japan 347 Journal of Information Processing Vol.29 347–359 (Apr. 2021) Our work is divided into three parts: In Section 3, we train image is masked before feeding it to the CNN and a heatmap of a policy network from zero and save the networks at different the probability is then drawn. Matthew and Rob [5] reported that epochs to explain the training process. In Section 4, we apply the there is a distinct drop in the activity within the feature map when visualization techniques used in image recognition to visualize the parts of the original input image corresponding to the pattern the Go policy network for aiding the interpretation of the Go pol- are occluded. When used in a diagnostic role, these visualiza- icy network; further, we analyze what occurs during the training tions allowed them to identify model architectures that outper- process. As these interpretations may prompt ideas for improved form older ones. Chu, Yang, and Tadinada [6] explored residual networks, in Section 5, we present approaches to improve the networks by using visualization and empirical analysis. Further, performance of Go policy networks through the visualized exper- they presented the purpose of residual skip connections. Sel- iment results. varaju and Cogswell [7] introduced a new method (Grad-CAM) This paper makes the following two contributions: for combining feature maps by using a gradient signal that does • Most papers on visualization techniques only focus on the not require any modification of the network architecture. visualization result of a trained convolutional neural net- Visualization techniques for convolutional neural network be- works (classic networks such as AlexNet or VGG). Since havior are applied not only to image recognition tasks but also we trained a new Go policy network with the network in- to artificial intelligence tasks. For example, Laurens, Eliseand, formation from different epochs, we can explain the training and Zeynep analyzed game AI behaviors using visualization tech- process in this work. niques [8]. They visualized the evidence on which the agent bases • Most visualization techniques that we introduced were pre- its decision; further, they explained the importance of producing viously used in only image captioning or visual question an- a justification for a black-box decision agent. swering (VQA). However, based on our visualization re- Recent studies on CNN network visualization have achieved sults, similar results can be achieved even if we use a Go considerable results. The generality of this visualization tech- policy network. Under a limited network architecture, visu- nique suggests that it may perform well in other “visual” domains alization techniques can help researchers focus their efforts such as computer Go. on creating better policy networks for computer Go. 2.2 Policy Network for Computer Go 2. Related Work The most successful current programs in Go are based on 2.1 Visualization MCTS with policy and value networks. The strongest programs Recent work in image recognition has demonstrated the con- such as AlphaGo and Darkforest apply convolutional networks to siderable advantages of using deep convolutional networks over construct a move selection policy, and this is then used to bias the alternative architectures. Meanwhile, there has been progress in exploration when training the value network. AlphaGo achieved understanding how these models work. One relatively simple and a 57% move prediction accuracy using supervised learning based common approach is to apply visual filters. However, this ap- on a database of human professional games [9]. The probabil- proach is limited to the first layer where projections can be made ity that the move of the expert is within the top-5 predictions onto a pixel space. Because we use a direct inner product between of the network is over 87%. Lately, it has become possible to the weights of the convolution layer and the pixel of the image, avoid overfitting to the values by using a combined policy and we can gauge what these filters are looking for by visualizing the value network architecture and a low weight on the value compo- learned weighting from these filters. Clark and Storkey [2] visu- nent. After 72 h, the move prediction accuracy exceeded that of alize the weights of some randomly selected channels from ran- the state-of-the-art technique reported in previous work, reaching domly selected convolution filters of a five-layer convolutional 60.4% on the KGS test set [10]. Meanwhile, some open source neural network trained on the GoGoD dataset. They reported that DNN implementations such as Leela Zero’s networks, which are some filters learn to acquire a symmetric property. However, be- publicly available and have proven performance, achieved an ac- cause visualized filters of intermediate layers are not connected curacy of more than 54%. directly to the input image, they are less interpretable, and in Unlike the input to the image recognition neural network (im- higher layers, alternate methods must be used.

Visualizing and Understanding Policy Networks of Computer Go

Residual Networks for Computer Go

Arxiv:1701.07274V3 [Cs.LG] 15 Jul 2017

ELF: an Extensive, Lightweight and Flexible Research Platform for Real-Time Strategy Games

Improved Policy Networks for Computer Go

Fml-Based Dynamic Assessment Agent for Human-Machine Cooperative System on Game of Go

Residual Networks for Computer Go Tristan Cazenave

Move Prediction in Gomoku Using Deep Learning

ELF Opengo: an Analysis and Open Reimplementation of Alphazero

AI-FML Agent for Robotic Game of Go and Aiot Real-World Co-Learning Applications

Combining Tactical Search and Deep Learning in the Game of Go

Deep Learning for Go

Gogogo: Improving Deep Neural Network Based Go Playing AI with Residual Networks Xingyu Liu Introduction Network Architecture Experiment Result