Model Checking Go

ZHU Weijun
(School of Information Engineering, Zhengzhou University)
E-mail: [email protected]
Foundations: NSFC (No. U1204608)
Brief author introduction: Weijun Zhu (1976-), male, Ph.D., Associate Professor; research interests: formal methods and AI.

Abstract: The Go algorithm based on symbolic model checking and the one based on deep learning are complementary. To make them complement each other, we propose a novel algorithm for the game of Go. First, a conservative version of the algorithm is obtained by strongly restricting the conditions under which model checking intervenes. Second, a bolder version is obtained by relaxing those conditions. A case study illustrates that, under some circumstances, the new method can do more than the state-of-the-art one.
Key words: AI; Symbolic Model Checking; Go Problem

0 Introduction

In March 2016, a Go match between a human and a computer attracted worldwide attention. One player was the professional 9-dan Lee SeDol, champion of the Fujitsu Cup International Go Championship; the other was the famous Go program "Alpha-Go". Surprisingly, Alpha-Go won four of the five games, which indicates that Artificial Intelligence has made great progress in the game of Go.

According to computational complexity theory, the Go problem has been proved to be EXPTIME-complete [1]. Thus, it is impossible to search all of its states exhaustively; a back-of-the-envelope calculation at the end of this section illustrates the scale. In addition, Go requires a player to judge the overall situation, which goes far beyond computing the life or death of a single block of stones.

In January 2016, Dr. David Silver and Dr. Shijie Huang of Google DeepMind published a paper in Nature proposing the "Alpha-Go" program [2], which combines deep learning, reinforcement learning, and Monte Carlo tree search. According to the paper, the program does not guarantee that every move follows the best strategy. Instead, a great number of games played by human experts and in self-play provide the basis for probability computation and for pruning branches of the game tree, and serve as large training samples for the deep neural networks. Therefore, the Alpha-Go program chooses the strategies most likely to win instead of searching the entire state space.

On the one hand, pruning branches of the game tree that are almost certain not to win brings much efficiency; however, some information may be lost during pruning, and in some cases the loss of information caused by this nondeterministic mechanism is unacceptable. Black 79 of the fourth game between Lee SeDol and Alpha-Go is an example: it directly caused a failure of Alpha-Go. On the other hand, it is possible to search the state space exhaustively within a certain range, where probabilistic and randomized computation is unreliable. To address this issue, a Go algorithm based on Symbolic Model Checking (SMC) was proposed, which can reduce the state space produced by the Go algorithm [3]. However, the model-checking-based algorithm cannot learn a playing strategy by itself; as a result, the SMC-based Go algorithm is very weak if used alone. Motivated by this, we combine the SMC-based method and an Alpha-Go-like one into a novel algorithm, and a case study demonstrates the comparative advantage of the new method. This is the contribution of this paper.

Due to space limitations, we do not give further details on the Alpha-Go algorithm [2], model checking algorithms [4], or the SMC-based Go algorithm [3].
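As the promised back-of-the-envelope illustration (our sketch, not part of the paper's argument), even the naive count of board colourings, which ignores legality rules and move order, already dwarfs any conceivable search budget:

    # Naive upper bound on 19x19 Go board configurations:
    # each of the 361 points is empty, black, or white.
    configs = 3 ** 361
    print(f"3^361 has {len(str(configs))} decimal digits")  # 173 digits, ~1.7e172

This crude count motivates both Alpha-Go's probabilistic pruning and the range-restricted model checking used below.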
1 The new algorithm

Fig. 1 Algorithm 1: the conservative version of the new algorithm (flowchart: Alpha-Go gives a suggested strategy; model checking decides whether that strategy leads to an adversarial winning situation, and whether a winning strategy exists from the current state)

Fig. 2 Algorithm 2: the intervention version of the new algorithm (flowchart: the policy network of Alpha-Go gives a suggested strategy; model checking decides whether there exists a strategy leading to a significant promotion of our interest from the current state (Case 1) and whether the suggested strategy leads to a significant promotion of the adversary's interest (Case 2); otherwise the value network of Alpha-Go evaluates the strategy)

1.1 Conservative version

An algorithm for 19×19 Go based on symbolic model checking cannot search all the states within an acceptable CPU time. However, it is feasible to search exhaustively within a given range. We therefore improve the Alpha-Go algorithm as follows. Only if a winning or a losing situation can be reached after one step does symbolic model checking play instead of Alpha-Go. Only if a winning or a losing situation can be reached after several steps does symbolic model checking advise the value network of Alpha-Go to change the playing strategy, with the final move determined by the value network. Otherwise, symbolic model checking does not interfere with Alpha-Go. In this way, the new algorithm retains Alpha-Go's capacity to play and to evaluate board situations, while avoiding the risk of Alpha-Go playing purely according to probability. The principle of the new algorithm, i.e., the conservative version, is shown in Fig. 1.

1.2 Intervention version

On the basis of the conservative version described above, we can design a bolder algorithm, i.e., the intervention version, in which the condition for symbolic model checking to intervene is relaxed. In this version, symbolic model checking directly replaces the Alpha-Go algorithm, or advises the latter to change its playing strategy, not only when a winning or losing situation for us arises, but also when our significant interests change. The principle of the new algorithm, i.e., the intervention version, is shown in Fig. 2; a sketch of the combined decision procedure is given at the end of this section.
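The following minimal sketch shows one way the two versions could be wired together. It is our illustration, not the paper's implementation: the model-checker methods (winning_move, adversary_wins, adversary_interest_gain, interest_gain_move) and the Alpha-Go methods (policy_move, value_reconsider) are hypothetical names standing in for the components in Figs. 1 and 2.

    # A minimal sketch of the combined decision procedure; `mc` stands for the
    # symbolic model checker and `ag` for the Alpha-Go engine. All method names
    # are hypothetical stand-ins for the components in Figs. 1 and 2.
    def choose_move(state, ag, mc, intervention=False, horizon=8):
        suggested = ag.policy_move(state)  # Alpha-Go's suggested strategy

        # Conservative rule 1: a win reachable in one step is played by the
        # model checker instead of Alpha-Go.
        forced = mc.winning_move(state, steps=1)
        if forced is not None:
            return forced

        # Conservative rule 2: if the suggestion lets the adversary force a win
        # within `horizon` steps, advise the value network to re-evaluate.
        if mc.adversary_wins(state, suggested, steps=horizon):
            suggested = ag.value_reconsider(state, exclude=suggested)

        # Intervention version only (Section 1.2): also act on significant
        # interest changes, not just forced wins and losses.
        if intervention:
            if mc.adversary_interest_gain(state, suggested, steps=horizon):
                gain = mc.interest_gain_move(state, steps=horizon)
                if gain is not None:
                    return gain

        # Otherwise symbolic model checking does not interfere with Alpha-Go.
        return suggested

With intervention=False this corresponds to Fig. 1; with intervention=True it adds the relaxed triggers of Fig. 2.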
2 A Case Study

Fig. 3 The search of the new algorithm: actual game and variation. (1) The situation at Black 79; (2) the situation at Black 87; (3) the correct move at Black 79 given by ninth-dan Mi YuTing.

Here is a case study. We analyze the new algorithm by taking the fourth game between Alpha-Go and Lee SeDol as an example.

When the game reached White 78, the so-called "divine move", shown as the white stone with a blue dot in Fig. 3(1), Black 79 gave the white player a chance that could lead to a turnaround of the situation [5], as shown by the black stone with a pink dot in Fig. 3(1). As reported in [6], ninth-dan Mi YuTing gave the correct response strategy for Black at this point, saying that "the divine move does not hold" [6], as illustrated in Fig. 3(3). Furthermore, this conclusion has also been confirmed by ninth-dan Ke Jie, ninth-dan Shi Yue, ninth-dan Luo Xihe, and others.

In the view of Silver, the first author and one of the corresponding authors of the original article [2] that presented the Alpha-Go algorithm, White 78 exposed an unknown bug in the Alpha-Go algorithm [7]. Furthermore, Hassabis, the leader of the Alpha-Go team and another corresponding author of the same article [2], also stated that "Alpha-Go believed that its winning rate was still 70% until Black 87" [7]. Thus, the value network of Alpha-Go judged its situation to be dominant at Black 79, but no longer dominant at Black 87. This coincides with Case 2 of Algorithm 2: Black's strategy from move 79 to move 87 led to a significant change in White's interests.

Algorithm 2 needs to search the state space at least eight steps ahead if it is to detect, at Black 79, the turnaround that materializes at Black 87. If the algorithm limits itself to 10^18 search operations over eight steps within a scope of w1 board positions, it can take w1 = 177, since 177^8 ≈ 9.6 × 10^17 ≤ 10^18. Therefore, the red frame in Fig. 3(2) shows the exhaustive search range of w1 = 177 positions centered at the current stone, where the stones with red dots show the actual responses from move 79 to move 86, and the black stone with a pink dot is Black 87. The figure shows that both sides' moves in the actual game from 79 to 87 all lie within the red frame, that is, within the search scope of Algorithm 2.

3 Conclusion

The main achievements of this paper are Algorithm 1 and Algorithm 2. As a supplement to MCTS algorithms, deep learning, big data, and super-computing, the model checking technique can, to a certain extent and under some circumstances, remedy the lack of deterministic search in state-of-the-art Go algorithms. By combining the above techniques, consideration can be given to both deterministic search and the winning rate. This is the benefit of the new approach.

Acknowledgements

This work has been supported by the National Natural Science Foundation of China under Grant No. U1204608, as well as by the China Postdoctoral Science Foundation under Grants No. 2012M511588 and No. 2015M572120.