中国科技论文在线 http://www.paper.edu.cn
Model Checking Go#
ZHU Weijun*
(School of Information Engineering, Zhengzhou University)
Abstract: The Go algorithm based on symbolic model checking and the one based on deep learning are complementary. To let them complement one another, we propose a novel algorithm for the game of Go. First, a conservative version of the algorithm is obtained by strongly restricting the intervention conditions. Second, a bolder version of the algorithm is obtained by relaxing these conditions. A case study illustrates that, under some circumstances, the new method can do more than the state-of-the-art one.
Key words: AI; Symbolic Model Checking; Go Problem
0 Introduction
In March 2016, a Go match between a human and a computer attracted worldwide attention. One player was the professional 9-dan Lee SeDol, a champion of the Fujitsu Cup International Go Championship; the other was the famous Go program named "Alpha-Go". Surprisingly, Alpha-Go won four of the five games, which indicates that Artificial Intelligence has made great progress in the game of Go.
The time complexity of the Go problem has been proved to be EXPTIME[1] according to computational complexity theory. Thus, it is impossible to search all the states exhaustively. In addition, Go requires the players to keep a perspective on the overall situation, which goes far beyond the life/death computation of a block of stones.
In January 2016, Dr. David Silver of the computer science department of Toronto University and Dr. Shijie Huang of Google published a paper in
Foundations: NSFC (No.U1204608)
Brief author introduction: Weijun Zhu (1976-), Male, Ph.D., Associate Professor, formal methods and AI. E-mail: [email protected]
1 The new algorithm
[Figure 1: flowchart. Nodes: "Alpha-Go gives a suggested strategy"; "Model checking decides whether our winning situations can be reached from the current state" (Yes: "Play according to the winning strategy"); "Model checking decides whether the suggested strategy leads to the adversary's winning situations" (No: "Play according to the suggested strategy").]
Fig. 1 Algorithm 1: the conservative version of the new algorithm
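As a rough illustration of the decision flow in Fig. 1, the following Python sketch replaces Go with a toy take-away game (players alternately remove 1 to 3 stones, and whoever takes the last stone wins) and replaces symbolic model checking with a depth-bounded exhaustive search. The ranked list `suggestions` stands in for the strategies suggested by Alpha-Go. The names `winning_move` and `conservative_move`, the `depth` bound, and the fallback used when every suggestion is vetoed are assumptions made for this illustration; they are not taken from the original algorithm.

```python
from typing import List, Optional, Tuple


def legal_moves(pile: int) -> List[int]:
    """Toy game: remove 1 to 3 stones; taking the last stone wins."""
    return [t for t in (1, 2, 3) if t <= pile]


def winning_move(pile: int, depth: int) -> Optional[int]:
    """Bounded exhaustive search (a stand-in for model checking).

    Returns a move with which the player to move wins within `depth`
    plies against every reply, or None if no such move is found.
    """
    if depth <= 0:
        return None
    for take in legal_moves(pile):
        rest = pile - take
        if rest == 0:
            return take  # immediate win
        # Every opponent reply must still leave us a forced win
        # within the remaining depth - 2 plies.
        if all(winning_move(rest - r, depth - 2) is not None
               for r in legal_moves(rest)):
            return take
    return None


def conservative_move(pile: int, suggestions: List[int],
                      depth: int) -> Tuple[int, str]:
    """Pick a move following one reading of the Fig. 1 flow."""
    # Step 1: if the bounded search finds a forced win, play it.
    own = winning_move(pile, depth)
    if own is not None:
        return own, "model checking"
    # Step 2: otherwise accept the first suggestion that does not hand
    # the opponent a forced win within the same bound.
    for move in suggestions:
        if winning_move(pile - move, depth) is None:
            return move, "suggestion"
    # Step 3 (assumption): every suggestion is vetoed, so fall back to
    # the top-ranked one; Fig. 1 does not specify this case.
    return suggestions[0], "fallback"
```

For example, with a pile of 9 stones and a bound of 3 plies, no forced win is visible, the suggestion 2 is vetoed because it lets the opponent leave a pile of 4, and the suggestion 1 is played instead.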
[Figure 2: flowchart. Nodes: "The policy network of Alpha-Go gives a suggested strategy"; "The value network of Alpha-Go evaluates this strategy" (Pass / Not pass); "Case 1: model checking decides whether there exists a strategy which leads to our significant interest promotion from the current state" (Yes: "Play according to the strategy which leads to our significant interest promotion"); "Case 2: model checking decides whether the suggested strategy leads to the adversary's significant interest promotion" (No: "Play according to the suggested strategy").]
Fig. 2 Algorithm 2: the intervention version of the new algorithm
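The two cases in Fig. 2 can be illustrated with a small Python sketch. It assumes an abstract game tree whose nodes carry a static evaluation of our interest, and it uses depth-bounded minimax as a stand-in for symbolic model checking. The `Node` class, the `threshold` that defines a "significant" change, and the handling of vetoed suggestions are all hypothetical choices made for this illustration; the value-network step of Fig. 2 is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    value: float  # hypothetical static evaluation of our interest
    children: Dict[str, "Node"] = field(default_factory=dict)


def guaranteed(node: Node, depth: int, our_turn: bool) -> float:
    """Depth-bounded minimax: the interest we can guarantee from `node`."""
    if depth == 0 or not node.children:
        return node.value
    values = [guaranteed(child, depth - 1, not our_turn)
              for child in node.children.values()]
    return max(values) if our_turn else min(values)


def intervention_move(root: Node, suggestions: List[str],
                      depth: int, threshold: float) -> Tuple[str, str]:
    """Choose a move at `root` (our turn), searching `depth` >= 1 plies."""
    # Guaranteed change of interest for each of our moves.
    gains = {move: guaranteed(child, depth - 1, False) - root.value
             for move, child in root.children.items()}
    best = max(gains, key=gains.get)
    # Case 1: some move guarantees a significant gain for us.
    if gains[best] >= threshold:
        return best, "case 1"
    # Case 2: skip suggestions that hand the adversary a significant gain.
    for rank, move in enumerate(suggestions):
        if gains[move] > -threshold:
            return move, "suggestion" if rank == 0 else "case 2"
    # Assumption: if every suggestion is vetoed, play the least bad move.
    return best, "case 2"
```

In this sketch the policy network is represented only by the ranked list `suggestions`; a fuller version would ask the network for a new suggestion after each veto.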
1.1 Conservative version
An algorithm for 19×19 Go based on symbolic model checking cannot search all the states within an acceptable CPU time. However, it is feasible to search exhaustively within a given range. We improve the Alpha-Go algorithm as follows. If a winning or losing situation is reached after one step, symbolic model checking plays instead of Alpha-Go. If a winning or losing situation is reached after several steps, symbolic model checking advises the value network of Alpha-Go to change the playing strategy, and the final strategy is determined by the value network. Otherwise, symbolic model checking does not interfere with Alpha-Go. In this way, the new algorithm not only maintains the capacity to play and to evaluate board situations, but also avoids the risk of the Alpha-Go algorithm, which plays according to probability. The principle of the new algorithm, i.e., the conservative version, is shown in Fig.1.
1.2 Intervention version
On the basis of the conservative version above, we can design a bolder algorithm, i.e., the intervention version, in which the intervention condition of symbolic model checking is relaxed. In this version, symbolic model checking directly replaces the Alpha-Go algorithm, or advises it to change the playing strategy, not only when a winning or losing situation for our side occurs, but also when our interest changes significantly. The principle of the new algorithm, i.e., the intervention version, is shown in Fig.2.
2 A Case Study
[Figure 3: three board diagrams. (1) The situation at Black 79. (2) The situation at Black 87. (3) The correct move for Black 79 given by ninth-dan Mi YuTing.]
Fig.3 The search of the new algorithm: the actual game and a variation
Here is a case study. We analyze the new algorithm by taking the fourth game between Alpha-Go and Lee SeDol as an example.
When the game reached White 78, the so-called "divine move", shown as the white stone with a blue dot in Fig.3(1), Black 79 gave the white player a chance that might turn the situation around[5], shown as the black stone with a pink dot in Fig.3(1). As reported in [6], ninth-dan Mi YuTing gave a correct response for Black at this point, and he said that "the divine move does not hold"[6], as illustrated in Fig.3(3). Furthermore, this conclusion has also been confirmed by ninth-dan Ke Jie, ninth-dan Shi Yue, ninth-dan Luo Xihe and others.
According to Silver, the first author and one of the corresponding authors of the original article[2] that presented the Alpha-Go algorithm, White 78 exposed an unknown bug in the Alpha-Go algorithm[7]. Furthermore, Hassabis, the leader of the Alpha-Go team and another corresponding author of [2], stated that "Alpha-Go believed that its rate of winning achieved 70% until Black 87"[7]. Thus, the value network of Alpha-Go judged its situation to be dominant at Black 79, whereas it judged its situation to be no longer dominant at Black 87. This coincides with Case 2 of Algorithm 2, that is, Black's strategy from 79 to 87 led to a significant change in White's interest.
To find, at Black 79, the turnaround of the situation that appears at Black 87, Algorithm 2 needs to search the state space at least eight steps ahead. If the algorithm performs about $10^{18}$ searches within eight steps from the current situation, it can search a scope of $w_1 = 177$ positions of the board, which is consistent with $177^8 \approx 9.6 \times 10^{17}$. Therefore, the red frame in Fig.3(2) shows the exhaustive search range centered on the current stone with $w_1 = 177$, where the stones with red dots show the actual responses from 79 to 86, and the black stone with a pink dot is Black 87. The figure shows that the two sides' actual moves from 79 to 87 all fall within the red frame, which is the search scope of Algorithm 2.
3 Conclusion
The main achievements of this paper are Algorithm 1 and Algorithm 2. As a supplement to MCTS algorithms, deep learning, big data and super-computing, the model checking technique can, to a certain extent and under some circumstances, compensate for the lack of deterministic search in state-of-the-art Go algorithms. By combining the above techniques, both the deterministic search and the rate of winning can be taken into account. This is the benefit of using the new approach.
Acknowledgements
This work has been supported by the National Natural Science Foundation of China under Grant No.U1204608, as well as the China Postdoctoral Science Foundation under Grants No.2012M511588 and No.2015M572120.
References
[1] J Robson. The complexity of Go[A]. Information Processing[C]. Proceedings of IFIP Congress, 1983. 413-417.
[2] David Silver, Aja Huang, Chris J Maddison, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.
[3] Weijun Zhu, Qinglei Zhou, Linfeng Jiao. An Algorithm for Searching States of Game of Go based on Symbolic Model Checking[C]. Proceedings of the 2nd IEEE International Conference on Computer and Communications, Chengdu, IEEE Press, 2016. 1201-1205.
[4] Clarke E, et al. Model checking[M]. MIT Press, 1999.
[5] Illustrations in details of historic human versus machine battle[OL]. [2016]. http://sports.sina.com.cn/go/2016-03-21/doc-ifxqnskh1061784.shtml
[6] The divine move does not hold[OL]. [2016]. http://new.eweiqi.com/portal.php?mod=view&aid=26645
[7] Father of Alpha-Go commented that the divine move triggers an unknown bug[OL]. [2016]. http://sports.163.com/16/0313/22/BI2RL5RV00051CAQ.html#p=BI21R23N01NA0005
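Model Checking Go
ZHU Weijun
(School of Information Engineering, Zhengzhou University)
Abstract: Go methods based on symbolic model checking and those based on learning are complementary. We therefore propose a new algorithm based on the former technique. First, a limited algorithm is obtained by imposing strong restrictions; second, a stronger algorithm is obtained by relaxing the conditions. A case study confirms the comparative advantage of the new method in specific situations.
Key words: artificial intelligence; symbolic model checking; Go problem
CLC number: TP301; TP389.1; G891.3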