4N4-IS-1c-05

The 35th Annual Conference of the Japanese Society for Artificial Intelligence, 2021

Connect6 Opening Leveraging AlphaZero Algorithm and Job-Level Computing

Shao-Xiong Zheng*1,2 Wei-Yuan Hsu*1,2 Kuo-Chan Huang*3 I-Chen Wu*1,2,4

*1 Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu, Taiwan *2 Research Center for IT Innovation, Academia Sinica, Taiwan *3 Department of Computer Science, National Taichung University of Education, Taichung, Taiwan *4 Pervasive Artificial Intelligence Research (PAIR) Labs, Taiwan

For most board games, players commonly learn to increase their strength by following the opening moves played by experts, usually in the first stage of a game. In the past, many efforts have been made to use game-specific knowledge to construct opening books. Recently, DeepMind developed AlphaZero (2017), which can master game playing without domain knowledge. In this paper, we present an approach based on AlphaZero to constructing an opening book. To demonstrate the approach, we use a program trained based on AlphaZero to evaluate positions, and then expand the opening game tree with a job-level computing algorithm called JL-UCT (job-level Upper Confidence Tree), developed by Wu et al. (2013) and Wei et al. (2015). In our experiments, the strength of the Connect6 program using this opening book is significantly improved; namely, the one with the opening book has a win rate of 65% against the one without the book. In addition, the one without the opening book lost to Polygames in the Connect6 tournament of the TCGA 2020 competitions, while the one with the opening book won against Polygames in the TAAI and Computer Olympiad competitions later in 2020.

1. Introduction

Opening book construction is an important research topic for increasing the strength of game-playing programs [Wei 2015]. Opening books of a game are databases containing good actions for the opening stage of the game. An opening book can bring a significant advantage, especially in time-limited game competitions, because a lot of search and computation time can be saved.

In the past, many efforts have been made to use game-specific knowledge to construct opening books, including analyzing opening moves made by top players and using programs that implement domain-specific algorithms to suggest opening moves. These methods may encounter two problems: the quality of the opening book depends on human knowledge of the game, and a method successful in one game might not be applicable to another game. To solve these two issues, this paper proposes an approach based on AlphaZero [Silver 2018] to constructing a high-quality opening book without domain knowledge.

The AlphaZero algorithm [Silver 2018], developed by DeepMind, demonstrates the capability of reinforcement learning to master game playing without domain knowledge. In our opening book construction approach, a program trained based on AlphaZero is used to evaluate positions while expanding the opening game tree. For game tree expansion, we use the Job-Level Upper Confidence Tree (JL-UCT) distributed algorithm [Wei 2015] to explore the game tree and select opening positions to evaluate. The evaluation data on the opening game tree are then collected and converted into an opening book.

To demonstrate the feasibility of our approach, we used the proposed approach to construct a Connect6 opening book, and assessed its quality by comparing the strength of our two Connect6 programs with and without the opening book, respectively. It turns out that the opening book significantly improves the strength of our Connect6 program. In our experiments, the one with the opening book has a win rate of 65% against the one without the book. The opening book also helped our Connect6 program achieve higher rankings in real tournaments. Our original program without the opening book lost to Polygames [Cazenave 2020] in the Connect6 tournament of the TCGA 2020 competitions, while the new one with the opening book defeated Polygames and won the gold medal in both the TAAI and Computer Olympiad competitions later in 2020.

The remainder of this paper is organized as follows. Section 2 presents the necessary background knowledge on MCTS, AlphaZero, and Job-Level Computing, and discusses related work on opening book construction. In Section 3, we describe our approach to constructing a Connect6 opening book and evaluate the performance of the opening book. Section 4 concludes this paper.

Contact: I-Chen Wu, Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu, Taiwan, +886-3-5731855, +886-3-5733777, [email protected].
2. Background and Related Work

2.1 MCTS

Monte-Carlo tree search (MCTS) is a decision-making algorithm based on Monte Carlo evaluation and best-first search, typically used in turn-based games [Chaslot 2008]. The algorithm repeatedly simulates the possible consequences of each action in such a way that promising actions, selected based on the current simulation results, are given additional simulations. Each iteration of MCTS consists of the following four stages.

Selection: Starting at the root node, a selection policy is recursively applied to choose a child for each visited node until a leaf node is reached. A key issue in this stage is the balance between exploration and exploitation. A commonly used selection policy is first to evaluate the UCT value of each child i, based on the following formula (a code sketch of this selection rule is given after the four stages):

$UCT_i = x_i + C \sqrt{\dfrac{\log_{10} N}{N_i}}$    (1)

where $x_i$ is the win rate of child i, N and $N_i$ are the visit counts of the node and of its child i respectively, and C is a coefficient. MCTS is inclined to explore with a larger C and tends to exploit with a smaller C. The policy then selects the child with the maximum $UCT_i$. This allows MCTS to converge to the optimal decision after a sufficiently large number of simulations [Browne 2012].

Expansion: The tree is expanded by adding one or more child nodes to the selected leaf node, according to the available actions at the state represented by the selected node.

Simulation: Simulations are run from the new node(s) by taking a series of random actions, or actions according to a default policy, until outcomes are obtained at terminal states.

Backpropagation: The simulation results are used to update the UCT values of all ancestors.

The four MCTS stages are repeatedly applied until a predefined time or iteration constraint is reached.
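To make the selection stage concrete, the following is a minimal, illustrative Python sketch of UCT-based child selection according to formula (1). The node structure is our own assumption for illustration, not the paper's implementation; the logarithm base follows formula (1) as printed, and changing the base only rescales the constant C.

```python
import math

class Node:
    """Minimal MCTS tree node (illustrative only; not the paper's data structure)."""
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []   # child Nodes
        self.visits = 0      # N_i: visit count of this node
        self.wins = 0.0      # accumulated result, so wins / visits is the win rate x_i

    def win_rate(self):
        return self.wins / self.visits if self.visits > 0 else 0.0


def uct_value(parent_visits, child, c=1.0):
    """UCT value of a child as in formula (1)."""
    if child.visits == 0:
        return float("inf")   # unvisited children are explored first
    exploration = c * math.sqrt(math.log10(parent_visits) / child.visits)
    return child.win_rate() + exploration


def select_child(node, c=1.0):
    """Selection stage: choose the child with the maximum UCT value."""
    return max(node.children, key=lambda ch: uct_value(node.visits, ch, c))
```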


2.2 AlphaZero

AlphaZero is an algorithm that allows programs to learn to master a game without human knowledge. The algorithm combines MCTS and deep neural networks in a reinforcement learning framework, described as follows. During self-play, an MCTS-based program plays against itself to generate game records, and a deep neural network is trained with the game outcomes as well as the probability distributions of the actions chosen by MCTS. Note that the selection in the MCTS of AlphaZero slightly modifies formula (1) by considering the probability provided by the network policy in the second term, as described in greater detail in [Silver 2018].

AlphaZero trains programs without using game-specific knowledge, so it can be generally applied to training for many other games or applications, such as chess, Go, and Shogi, and reaches state-of-the-art strength [Silver 2018].
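For reference, this modified selection rule is commonly written in the PUCT form below. We reproduce it here from the general description in [Silver 2018]; the exact constants and notation in that paper differ slightly, so this is an illustration rather than the paper's formula:

$a^{*} = \arg\max_{a}\left( Q(s,a) + c_{\mathrm{puct}} \, P(s,a) \, \dfrac{\sqrt{N(s)}}{1 + N(s,a)} \right)$

where $Q(s,a)$ is the mean action value, $P(s,a)$ is the prior probability given by the policy network, $N(s)$ is the visit count of the parent node, and $N(s,a)$ is the visit count of the child reached by action a.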
2.3 Connect6

Connect6 is a two-player k-in-a-row game introduced by Wu et al. [Wu 2005], usually played on a 19x19 Go board. Traditionally, the first player uses black stones and the second uses white stones. The first player places one black stone at an intersection to start a game. Subsequently, the second and the first players alternately place two stones of their own color on two unoccupied intersections each turn. The player who first gets six or more consecutive stones of his own color (horizontally, vertically, or diagonally) wins the game. If neither player reaches six consecutive stones, the game ends in a draw. Note that there is no limitation on the location of the first black stone. However, in some Connect6 tournaments, such as the game site Little Golem (www.littlegolem.net/jsp/main), the first stone must be placed at the center of the board. For simplicity of opening book construction, we also assume in this paper that the first stone is placed at the center.

Compared with the similar game Gomoku, Connect6 has better fairness. In Gomoku, a larger board size tends to give the first player a greater advantage [Hsu 2020]. It has even been proved that the first player wins 15x15 Gomoku [Allis 1994]. In contrast, so far there is no evidence that the same unfairness exists in Connect6. In addition, Connect6 is more complex than many other games, since placing two stones in one move makes the number of actions much higher. These properties make Connect6 regarded as one of the most ideal games for studying computer games [Tao 2009].
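As a concrete illustration of the winning condition (not code from the paper), the following sketch checks whether a newly placed stone completes a line of six or more. The board encoding as a 19x19 list of lists is our own assumption.

```python
BOARD_SIZE = 19
EMPTY, BLACK, WHITE = 0, 1, 2

def wins_after_move(board, row, col):
    """Return True if the stone just placed at (row, col) makes six or more in a row.

    board: 19x19 list of lists holding EMPTY, BLACK, or WHITE.
    """
    color = board[row][col]
    if color == EMPTY:
        return False
    # The four line directions: horizontal, vertical, and the two diagonals.
    for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
        count = 1  # the stone at (row, col) itself
        for sign in (1, -1):  # walk both ways along the direction
            r, c = row + sign * dr, col + sign * dc
            while 0 <= r < BOARD_SIZE and 0 <= c < BOARD_SIZE and board[r][c] == color:
                count += 1
                r += sign * dr
                c += sign * dc
        if count >= 6:
            return True
    return False
```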

2.4 Job-Level Computing

Since solving game problems often requires a large amount of computation, parallelization is usually necessary in practice. To help solve game problems, Wu et al. proposed a general distributed computing model named job-level computing [Wu 2013]. Job-level (JL) computing consists of JL clients and the JL system. A JL client dynamically divides game problem solving into tasks that can be completed by specific executions of game programs. The requests to execute game programs are encapsulated as jobs and sent to the JL system. The JL system, comprised of a broker and a collection of (remote) workers, performs the jobs simultaneously by dispatching them to available workers. The job results are then returned to the JL client.

Under the JL computing model, many general problem-solving algorithms that are not limited to specific games have been proposed. A useful one, called JL-UCT [Wei 2015], will be used in our opening book construction in Section 3. JL-UCT is a game tree expansion algorithm adopting ideas similar to MCTS, and works as follows. In the JL system, a game-playing program serves as an agent that suggests actions and evaluates the expected outcomes of given positions. The JL client starts to build a JL game tree rooted at a given position. Then, the JL client repeatedly requests executions of the game-playing program and expands the game tree with nodes representing the succeeding positions corresponding to the suggested actions. For JL-UCT, the JL client recursively applies the UCT formula to select child nodes to be visited until it reaches a leaf node. JL-UCT can perform the above game tree expansion in different games when implemented on different games. For example, when applied to the game rules of Hex, JL-UCT helped to accelerate the proof of four Hex positions [Xi 2015]. This generality makes it possible to apply our new opening book construction approach to different games.
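To make the JL-UCT workflow concrete, here is a highly simplified, sequential Python sketch of the client loop. The helper names (run_remote_program, add_child, position) are our own placeholders and not the actual JL system API; in the real system each request is packaged as a job and dispatched asynchronously to workers, and for brevity this sketch also ignores switching the value's perspective between the two players.

```python
def jl_uct_expand(root, run_remote_program, select_child, max_nodes=200_000):
    """Simplified, sequential sketch of JL-UCT game-tree expansion.

    root: root node of the JL game tree (e.g., the initial position).
    run_remote_program: callable(position) -> (suggested_action, value); stands in
        for a job dispatched by the broker to a worker running the game program.
    select_child: UCT-based child selection, e.g., the function sketched in Section 2.1.
    max_nodes: stop when the tree reaches this size.
    """
    node_count = 1
    while node_count < max_nodes:
        # Selection: descend by the UCT formula until a leaf node is reached.
        node = root
        while node.children:
            node = select_child(node)

        # Job: ask the game-playing program for a move and an evaluation value.
        action, value = run_remote_program(node.position)

        # Expansion: add the succeeding position suggested by the program.
        child = node.add_child(action)
        node_count += 1

        # Backpropagation: update visit counts and values along the path.
        while child is not None:
            child.visits += 1
            child.wins += value
            child = child.parent
    return root
```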

2.5 Related Work

In recent years, many opening book construction methods have been proposed. These methods can be roughly divided into two categories. The first is to construct an opening book from game records. For example, Nagashima [Nagashima 2007] constructed a shogi opening book by finding the most frequently occurring actions in thousands of human game records, and Audouard et al. [Audouard 2009] constructed a Go opening book in a similar way. Yang et al. [Yang 2016] also constructed a Connect6 opening book based on game records. These methods heavily rely on external game information. The second category utilizes game-playing programs that can use game heuristics to evaluate moves. In this way, for example, Chen et al. [Chen 2014] constructed a Chinese Chess opening book, and Wei et al. constructed a Connect6 opening book [Wei 2015]. Since those programs rely on game-specific heuristics, the opening book construction methods above are not completely free from domain knowledge.

In this paper, we propose an approach to constructing opening books that does not require any human understanding of games other than the game rules.

3. Connect6 Opening Book Construction

3.1 Opening Book Construction

To construct opening books without human knowledge, we first train a game-playing program based on AlphaZero. The detailed settings we used to train our Connect6 program with AlphaZero are as follows. The Connect6 program uses a ResNet [He 2016] with 15 blocks and 192 channels. The program played two million self-play games, and the learning rate was set to 0.01 and 0.003 in the first million and the second million games, respectively.

The game-playing program then serves as the remote program in JL-UCT as mentioned above, which evaluates the values of and suggests actions for given positions. The program performs MCTS to suggest the best of the available actions for a given position, and we use the empirical mean of the MCTS simulation results as the position value. The suggested action and evaluation value are sent back to the JL client as the job result. In this way, JL-UCT is used to expand the opening game tree.

To explore the Connect6 opening positions, we feed the initial Connect6 position, namely, the empty 19x19 board, to the JL client. JL-UCT expands the Connect6 game tree based on the actions suggested by the remote program and scores each opening position based on the evaluation value obtained from the remote program. The expansion stops when the game tree reaches a size limit, set to 200,000 nodes in our experiment.

Finally, the opening game tree is stored into an opening book as follows. Based on the property of the UCT formula, the actions with the highest JL-UCT visit counts tend to be good moves for each position in the game tree. Since the ranking of visit counts changes as the game tree is expanded, we only store actions with sufficient visit counts into the book, for reliability of the opening book; namely, only actions with visit counts greater than 1000 are saved into the book and can be used when playing.
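The following Python sketch illustrates the book-extraction step just described (keeping only actions whose JL-UCT visit counts exceed 1000) together with a simple lookup at play time. The position-key encoding and the node attributes (children, visits, action, position) are our own assumptions, not the paper's implementation.

```python
VISIT_THRESHOLD = 1000  # only well-explored actions are stored in the book

def position_key(position):
    """Placeholder: a hashable encoding of a position (our own assumption)."""
    return tuple(tuple(row) for row in position)

def build_opening_book(root):
    """Traverse the expanded JL-UCT tree and keep reliable actions.

    Returns a dict mapping a position key to the action whose child received
    the most visits, considering only children above the visit threshold.
    """
    book = {}
    stack = [root]
    while stack:
        node = stack.pop()
        reliable = [ch for ch in node.children if ch.visits > VISIT_THRESHOLD]
        if reliable:
            best = max(reliable, key=lambda ch: ch.visits)
            book[position_key(node.position)] = best.action
        stack.extend(node.children)
    return book

def choose_move(book, position, fallback_search):
    """At play time, use the book if the position is covered, otherwise search."""
    key = position_key(position)
    if key in book:
        return book[key]
    return fallback_search(position)
```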
3.2 Performance Evaluation

The Connect6 opening book constructed by the proposed approach was evaluated to see whether it can effectively improve the strength of the Connect6 program. Our Connect6 program trained with AlphaZero as above is named C6-vanilla, or C6-V, and the same program with opening book lookup is named C6-opening, or C6-O. In our experiment, 100 games were played. The program C6-O plays as black in 50 games and as white in the other 50 games. For a draw, both sides were considered to have won 0.5 games. Playing against C6-V, C6-O reached win rates of 60% and 70% as black and white, respectively. The overall win rate is 65%, and the confidence interval is 9.40%.

We also evaluated the performance of the opening book in games starting from given positions. We collected Connect6 game records of human players from Little Golem and chose the three most common opening positions played by high-ranked players whose Elo ratings are higher than 1900. The three opening positions are shown in Figure 1. They appeared 7748, 5893, and 4589 times in 32492 games, respectively. We then let both programs play 100 games from each of the three initial positions. C6-O reached win rates of 62.5%, 60%, and 68% against C6-V for the three positions, respectively. These experiments show that our opening book works well in common opening positions.

[Figure 1 (a)(b)(c): Common opening positions made by human experts]
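As a small worked check of the numbers above (our own calculation, assuming the reported interval is a 95% normal-approximation binomial interval), the overall win rate and its half-width can be computed as follows:

```python
import math

def win_rate_and_ci(wins, draws, losses, z=1.96):
    """Win rate with draws counted as half a win, plus a 95% normal-approx CI half-width."""
    games = wins + draws + losses
    rate = (wins + 0.5 * draws) / games
    half_width = z * math.sqrt(rate * (1.0 - rate) / games)
    return rate, half_width

# Treating the 65% overall score as 65 wins in 100 games (the exact win/draw
# split is not reported) gives a half-width of about 9.3%, close to the 9.40%
# reported in the text.
rate, ci = win_rate_and_ci(wins=65, draws=0, losses=35)
print(f"win rate = {rate:.1%}, +/- {ci:.1%}")
```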

3.3 Analysis

To provide more insight into the Connect6 opening book, this subsection analyzes the moves in the opening book as follows. Figure 2 shows the moves suggested by the opening book in response to each position in Figure 1. Besides, if white is to play after black's first stone, our opening book tends to play the move shown in Figure 1(c), since that move has the highest visit count based on JL-UCT. Interestingly, the visit count of this move is ranked only third among human players. It is worth investigating with human players whether the above moves in the opening book are indeed the best, or at least high-quality, in the view of Connect6 human experts.

[Figure 2 (a)(b)(c): Moves suggested by the opening book]

3.4 Tournament

Our Connect6 program entered several major tournaments before and after we developed our opening book construction approach. C6-V (without the opening book) was defeated by Polygames, which won the championship in TCGA 2020. Later, in both TAAI 2020 and the Computer Olympiad 2020, our program C6-O (with the opening book) defeated Polygames and won the championship of both tournaments, demonstrating the effectiveness of our opening book construction approach.
4. Conclusion

In this paper, we proposed an approach to constructing opening books without using human knowledge, obtained directly by combining AlphaZero and JL-UCT. The constructed opening book was evaluated in experiments and tournaments. In our experiments, our Connect6 program trained based on AlphaZero played against itself, and the one with the opening book reached a win rate of 65% against the one without. It was also shown that the book performs better when the games start from the three most common opening positions played by human experts. The book also helps our program perform better in real tournaments, namely winning the championship in both TAAI 2020 and the Computer Olympiad 2020 by beating Polygames, which had won against our program without the opening book in the earlier TCGA 2020 tournament.

Acknowledgements

This research is partially supported by the Ministry of Science and Technology (MOST) of Taiwan under Grant Numbers MOST 109-2634-F-009-019 and MOST 110-2634-F-009-022 through Pervasive Artificial Intelligence Research (PAIR) Labs. The computing resource is partially supported by the National Center for High-performance Computing (NCHC) of Taiwan.

References

[Allis 1994] L. V. Allis, "Searching for Solutions in Games and Artificial Intelligence", PhD thesis, University of Limburg, 1994.

[Audouard 2009] P. Audouard, G. M. J.-B. Chaslot, J.-B. Hoock, J. Perez, A. Rimmel, and O. Teytaud, "Grid coevolution for adaptive simulations; application to the building of opening books in the game of Go", Workshops on Applications of Evolutionary Computation, Springer, 2009.

[Browne 2012] C. B. Browne, E. Powley, D. Whitehouse, S. M. Lucas, et al., "A Survey of Monte Carlo Tree Search Methods", IEEE Transactions on Computational Intelligence and AI in Games, IEEE, 2012.

[Cazenave 2020] T. Cazenave, Y.-C. Chen, G.-W. Chen, S.-Y. Chen, X.-D. Chiu, J. Dehos, M. Elsa, Q. Gong, H. Hu, V. Khalidov, et al., "Polygames: Improved zero learning", ICGA Journal, 2020.

[Chaslot 2008] G. M. J.-B. Chaslot, S. Bakkes, I. Szita, and P. Spronck, "Monte-Carlo tree search: A new framework for game AI", the Fourth Artificial Intelligence and Interactive Digital Entertainment Conference, 2008.

[Chen 2014] J. Chen, I-C. Wu, W. Tseng, B. Lin, and C. Chang, "Job-level alpha-beta search", IEEE Transactions on Computational Intelligence and AI in Games, IEEE, 2014.

[He 2016] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition", CVPR, 2016.

[Hsu 2020] W.-Y. Hsu, C.-L. Ko, J.-C. Chen, T.-H. Wei, C.-H. Hsueh, and I.-C. Wu, "On solving the 7,7,5-game and the 8,8,5-game", Theoretical Computer Science, 2020.

[Nagashima 2007] H. Nagashima, "Towards master-level play of Shogi", PhD thesis, Advanced Institute of Science and Technology, 2007.

[Silver 2018] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, et al., "A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play", Science, 2018.

[Tao 2009] J.-J. Tao, C.-M. Xu, and K. Han, "Construction of Opening Book in Connect6 with Its Application", 21st Chinese Control and Decision Conference, IEEE, 2009.

[Wei 2015] T.-H. Wei, I-C. Wu, C.-C. Liang, B.-T. Chiang, W.-J. Tseng, S.-J. Yen, and C.-S. Lee, "Job-level algorithms for Connect6 opening book construction", ICGA Journal, 2015.

[Wu 2005] I-C. Wu, D.-Y. Huang, and H.-C. Chang, "Connect6", ICGA Journal, 2005.

[Wu 2013] I.-C. Wu, H.-H. Lin, D.-J. Sun, K.-Y. Kao, P.-H. Lin, Y.-C. Chan, and P.-T. Chen, "Job-Level Proof Number Search", IEEE Transactions on Computational Intelligence and AI in Games, IEEE, 2013.

[Xi 2015] L. Xi and I-C. Wu, "Solving Hex openings using job-level UCT search", ICGA Journal, 2015.

[Yang 2016] J.-K. Yang and P.-J. Tseng, "Building Connect6 openings by using the Monte Carlo tree search", 2016 Eighth International Conference on Advanced Computational Intelligence (ICACI), IEEE, 2016.
