Journal of Information Processing Vol.29 347–359 (Apr. 2021)
[DOI: 10.2197/ipsjjip.29.347] Regular Paper
Visualizing and Understanding Policy Networks of Computer Go
Yuanfeng Pang1,a) Takeshi Ito1,b)
Received: July 26, 2020, Accepted: January 12, 2021
Abstract: Deep learning for the game of Go achieved considerable success with the victory of AlphaGo against Ke Jie in May 2017. Thus far, there is no clear understanding of why deep learning performs so well in the game of Go. In this paper, we introduce visualization techniques used in image recognition that provide insights into the function of intermediate layers and the operation of the Go policy network. When used as a diagnostic tool, these visualizations enable us to understand what occurs during the training process of policy networks. Further, we introduce a visualization technique that performs a sensitivity analysis of the classifier output by occluding portions of the input Go board, revealing the parts that are important for predicting the next move. We also attempt to identify important areas through Grad-CAM and combine them with the Go board to provide explanations for next-move decisions.
Keywords: computer Go, deep learning, visualization, policy network, Grad-CAM
1. Introduction

For the longest time, computer Go has been a significant challenge in artificial intelligence. Go is difficult because of its high branching factor and subtle board situations that are sensitive to small changes. Owing to a combination of these two causes, massive searches with a prohibitive amount of resources, such as Monte Carlo rollouts and Monte Carlo tree search (MCTS), were used. Monte Carlo rollouts sample long sequences of actions at high speed for both players using a simple policy. Averaging over such rollouts provides an effective position evaluation, thus achieving weak amateur-level play in Go. MCTS uses Monte Carlo rollouts [1] to estimate the value of each state in a search tree. As more simulations are executed, the search tree grows larger and the relevant values become more accurate. However, even with cutting-edge hardware, such simulation still cannot achieve the strength required to beat the leading human professional players.

Fortunately, since their introduction by Clark and Storkey [2], convolutional networks have demonstrated excellent performance at computer Go. Several studies have reported that convolutional networks can deliver performance similar to regular MCTS-based approaches. This idea was extended in the bot named Darkforest. Darkforest [3] relies on a deep convolutional neural network (DCNN) designed for long-term predictions. Darkforest substantially improves the win rate of pattern-matching approaches against MCTS-based approaches, even with looser search budgets. Next, AlphaGo [4] was introduced, which combines MCTS with a policy and a value network; since 2016, AlphaGo has defeated three human Go champions. This was the first time that a computer program defeated a human professional player in the full-sized game of Go.

Despite the encouraging fact that deep neural networks are better at recognizing shapes in the game of Go than Monte Carlo rollouts, there is little insight into the internal operation and behavior of these complex networks, or into how they achieve such good performance. Without a clear understanding of how and why they work, we cannot effectively build on DCNN-based computer Go and the encouraging progress it has made.

Though the Go policy and ImageNet classification networks are both convolutional networks, their inputs and outputs are entirely different. In this paper, we introduce visualization techniques used in image recognition that provide insight into the function of intermediate layers, and we apply them to visualize the operation of the Go policy network. When used as a diagnostic tool, these visualizations allow us to understand what occurs during the training process. Further, we introduce a visualization technique that performs a sensitivity analysis of the classifier output by occluding portions of the input Go board, revealing the parts that are important for predicting the next move. We also attempt to identify important areas through Grad-CAM and combine them with the Go board to provide explanations for next-move decisions. Zeiler and Fergus [5] reported that, when used in a diagnostic role, these visualization techniques enabled them to identify model architectures that outperformed former models on the ImageNet classification benchmark. However, in the case of training a policy network, the usefulness of this method remains unclear. Based on the problems indicated by the visualization results, we attempt to change the learning rate of the policy network to improve its performance. Then, by using the visualization techniques, we explore the changed policy network to understand what changed during the training process.

1 The University of Electro-Communications, Chofu, Tokyo 182–8585, Japan
a) [email protected]
b) [email protected]
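The rollout-averaging evaluation described above can be sketched in a few lines. This is a minimal illustration, not an implementation from the paper: `position`, `legal_moves`, `apply_move`, and `result` are hypothetical game-engine hooks, and any Go engine exposing equivalent primitives would fit this skeleton.

```python
import random

def rollout_value(position, legal_moves, apply_move, result, n_rollouts=100):
    """Estimate a position's value by averaging fast random playouts.

    `position`, `legal_moves`, `apply_move`, and `result` are
    hypothetical game-engine hooks (not from the paper).
    """
    total = 0.0
    for _ in range(n_rollouts):
        state = position.copy()
        # Play the game out with a simple (here: uniformly random) policy.
        while not state.is_terminal():
            state = apply_move(state, random.choice(legal_moves(state)))
        total += result(state)  # e.g., 1.0 for a win, 0.0 for a loss
    # The average over many rollouts approximates the position's value.
    return total / n_rollouts
```

In MCTS, such rollout estimates back up through the search tree, which is why more simulations make the stored values more accurate.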
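The occlusion-based sensitivity analysis can be sketched as follows. This is a simplified, framework-free illustration, not the paper's code: `predict` is a hypothetical callable mapping a 2-D board array to a flat vector of move probabilities, whereas a real policy network would take 19x19 multi-plane input.

```python
import numpy as np

def occlusion_map(predict, board, patch=3, fill=0.0):
    """Occlusion sensitivity: blank out each local region of the input
    board and record how much the network's probability for its
    original top move drops.

    `predict` is a hypothetical board -> move-probability callable.
    """
    base = predict(board)
    top = int(np.argmax(base))              # the network's original choice
    h, w = board.shape[:2]
    heat = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            occluded = board.copy()
            y0, y1 = max(0, y - patch // 2), min(h, y + patch // 2 + 1)
            x0, x1 = max(0, x - patch // 2), min(w, x + patch // 2 + 1)
            occluded[y0:y1, x0:x1] = fill   # occlude one patch
            # A large drop means the occluded region mattered.
            heat[y, x] = base[top] - predict(occluded)[top]
    return heat
```

Regions whose occlusion causes a large probability drop are the parts of the board the network relies on for its next-move prediction.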
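The Grad-CAM step can likewise be sketched without a specific deep-learning framework. In this hedged illustration, `activations` (C, H, W) are the feature maps of a chosen convolutional layer and `gradients` (C, H, W) are the gradients of the predicted move's score with respect to those maps; both would come from a forward and backward pass of the policy network and are treated here as plain arrays.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heat map for one predicted move (framework-free sketch).

    activations: (C, H, W) feature maps of the chosen conv layer.
    gradients:   (C, H, W) gradients of the move's score w.r.t. them.
    """
    # Global-average-pool the gradients: one importance weight per channel.
    weights = gradients.mean(axis=(1, 2))                  # shape (C,)
    # Weighted sum of feature maps, then ReLU to keep positive evidence.
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)
    if cam.max() > 0:
        cam /= cam.max()   # normalize to [0, 1] for overlaying
    return cam             # (H, W); upsample to 19x19 to overlay on the board
```

Overlaying the upsampled map on the Go board highlights the areas that contributed most to the network's next-move decision.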