<<

Search and Learning Algorithms for Two-Player Games with Application to the Game of Hex

by

Chao Gao

A thesis submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

Department of Computing Science

University of Alberta

c Chao Gao, 2020

Abstract

Two-Player alternate-turn perfect-information zero-sum games have been suggested as a testbed for Artificial Intelligence research since Shannon in 1950s. In this thesis, we summarize and develop algorithms for this line of research. We focus on the game of Hex — a game created by Piet Hein in 1942 and popularized by Jonh Nash in 1949. We continue on previous research, further bringing the strength of machine learning techniques — specifically deep neural networks — to the game of Hex. We develop new methods and apply them to solving and playing Hex. We present state-of-the-art results for Hex and discuss relevant research on other domains. In particular, we develop solving and playing algorithm by combining search and methods. As a result of our algorithmic innovations, we produce stronger solver and player, winning gold medal at the Computer Olympiad Hex tournaments in 2017 and 2018.

ii What I believe is that if it takes 200 years to achieve artificial intelligence, and finally there is a textbook that explains how it is done; the hardest part of that textbook to write will be the part that explains why people didn’t think of it 200 years ago, because we are really talking about how to make machines do things that are on the surface of our minds. It is just that our ability to observe our mental processes is not very good and has not been very good.

— John McCarthy

iii Acknowledgements

Many people deserve to be acknowledged for assisting me directly or indirectly in creating this thesis. First of all, I thank my supervisors Ryan Hayward and Martin M¨ullerfor supervising me. My initial interest in games were motivated by my keen interest in playing Chinese chess as a hobby. Even though I roughly know how computers play chess games, I was surprised to learn that super-human strength program does not exist for the game of Hex, especially when considering its simplest rules and straightforward goals. The other aspect of Hex that lures me is its close tie to combinatorics and graphs; these properties have enabled many aspects of Hex be studied in a rigorous mathematical manner rather than mostly empirical as in many other games; I was intrigued to learn more about them, even though I finally did not pursue these directions for my thesis. My study of Hex with deep neural networks was inspired by the consecutive suc- cesses of using deep learning techniques as a general tool for bringing advancement in a number of areas, in particular the early breakthrough in playing the game of Go. By 2016, it becomes clear to me that the next step of research in Hex should be including deep learning to the game. Rather than straightforwardly adapting deep networks into existing Hex algorithms, I tried to understand the essence of each technique thus to invent new paradigms that should be more applicable to Hex and probably other challenging games as well, and this thesis is a reflection of my endeavour towards such an aspiration. In the process, I greatly appreciate the support of Martin M¨uller for his expertise in multiple domains; his critical questions on my work often pushed me to explore important things otherwise I would have neglected. I am greatly grateful to Ryan Hayward for buying two GPU machines to support my research; most experiments in this thesis were carried out on these machines. His attention on writing also influenced me; the little book he gave to me, The Elements of Style, which I originally thought of little use, turned out to be a treasure for providing valuable guidance to me. iv I am also thankful to a number of people who have aided me. Previous re- searchers in Hex including Jakub Pawlewicz, Broderick Arneson, Philip Henderson and have provided me useful comments on codebase benzene. In particu- lar, in 2018 summer, Jakub Pawlewicz helped me greatly for preparing the Computer Hex tournament. Thanks Siqi Yan, Xutong Zhao, Jiahao Li, Paul Banse, Joseph Meleshko, Wai Yi Low and Mengliao Wang for either doing tournament test for me or temporarily contributing their GPU computation for the testing. Nicolas Fabi- ano further discovered more pruning patterns in 2019 summer though these are not discussed in this thesis. Thanks Kenny Young for passing me the working code of MoHex 2.0. I am grateful to Noah Weninger for operating MoHex-CNN in 2017 and helping me produce some training games. Thanks Cybera.ca for providing cloud instances. Thanks Jing Yang for providing his pattern set and useful instructions for 9 9 Hex solution. × I thank the useful discussion on a variety of general topics with Farhad Haqiqat, Chenjun Xiao, Jincheng Mei, Gaojian Fan, Xuebin Qin, Chenyang Huang and Yun- peng Tang; researchers that I known of while I was interning at Borealis AI: Bilal Kartal, Pablo Hernandez Leal, and manager Matt Taylor; my manager at Huawei Edmonton research office Henshuai Yao. 
Finally, I am deeply grateful to my family for being supportive of my pursuit — I am indebted to my parents, my brother, my sister, and in particular my wife Sitong Li for her great companion, unconditional support and priceless patience.

v Contents

1 Introduction 1 1.1 The Game of Hex as a Benchmark for AI ...... 1 1.2 Searching for Solutions with the Help of Knowledge ...... 4 1.2.1 Learning Heuristic Knowledge in Games ...... 6 1.2.2 Principles for Heuristic Search in Games ...... 9 1.3 Contributions and Organization of This Thesis ...... 9

2 General and Specific Techniques for Solving and Playing Games 13 2.1 Strategy Representation ...... 13 2.1.1 Strategy and Solution Graphs ...... 13 2.1.2 Strategy Decomposition ...... 15 2.2 Search and Learning Formulations ...... 18 2.2.1 State and Problem Space Graphs ...... 19 2.2.2 Markov Decision Processes and Alternating Markov Games . 21 2.3 Techniques for Strategy Discovery ...... 23 2.3.1 Informed Best-first Search ...... 23 2.3.2 Reinforcement Learning ...... 27 2.3.3 Deep Neural Networks ...... 30 2.3.4 Combining Learning and Search: . 32 2.4 Hex Specific Research ...... 33 2.4.1 Complexity of Hex ...... 33 2.4.2 Graph Properties and Inferior Cell Analysis ...... 34 2.4.3 Bottom Up Connection Strategy Computation ...... 37 2.4.4 Iterative Knowledge Computation ...... 39 2.4.5 Automated Player and Solver ...... 40 3 Supervised Learning and Policy Gradient Reinforcement Learning in Hex 44 3.1 Supervised Learning with Deep CNNs for Move Prediction ...... 44 3.1.1 Background ...... 44 3.1.2 Input Features ...... 45 3.1.3 Architecture ...... 45 3.1.4 Data for Learning ...... 47 3.1.5 Configuration ...... 48 3.1.6 Results ...... 48 3.1.7 Discussion ...... 55 3.2 Policy Gradient Reinforcement Learning ...... 56 3.2.1 Background ...... 56 3.2.2 The Policy Gradient in MDPs ...... 57 3.2.3 An Adversarial Policy Gradient Method for AMGs ...... 58 3.2.4 Experiment Results in Hex ...... 61 3.2.5 Setup ...... 61 3.2.6 Data and Supervised Learning for Initialization ...... 61 3.2.7 Results of Various Policy Gradient Algorithms ...... 62 3.2.8 Discussion ...... 63

vi 4 Three-Head Neural Network Architecture for MCTS and Its Ap- plication to Hex 66 4.1 Background: AlphaGo and Its Successors ...... 66 4.2 Sample Efficiency of AlphaGo Zero and AlphaZero ...... 71 4.3 Three-Head Neural Network Architecture for More Efficient MCTS . 73 4.3.1 PV-MCTS with Delayed Node Expansion ...... 74 4.3.2 Training 3HNN ...... 76 4.4 Results on 13 13 Hex with a Fixed Dataset ...... 78 4.4.1 ResNet× for Hex ...... 78 4.4.2 Setup ...... 78 4.4.3 Prediction Accuracy of 3HNN ...... 79 4.4.4 Evaluation in the Integration of PV-MCTS ...... 81 4.5 Transferring Knowledge Using 3HNN ...... 83 4.5.1 Setup ...... 84 4.5.2 Prediction Accuracy In Different Board Sizes ...... 84 4.5.3 Usefulness When Combined with Search ...... 85 4.5.4 Effect of Fine-tuning ...... 87 4.6 Closed Loop Training with 3HNN ...... 88 4.6.1 Training For 2018 Computer Olympiad ...... 88 4.6.2 Zero-style Learning ...... 90 4.7 Discussion ...... 96 5 Solving Hex with Deep Neural Networks 98 5.1 Focused Proof Number Search for Solving Hex ...... 98 5.2 Focused Proof Number Search with Deep Neural Networks . . . . . 100 5.3 Results on 8 8Hex ...... 101 5.3.1 Preparation× of Data ...... 101 5.3.2 Policy and Value Neural Networks ...... 101 5.3.3 Empirical Comparison of DFPN, FDFPN and FDFPN-CNN 103 5.3.4 Using Three-Head ResNet ...... 106 5.4 Solving by Strategy Decomposition ...... 107

6 Conclusions and Future Work 113 References 116 Appendix A Additional Documentation 129 A.1 Neurobenzene ...... 129 A.2 Visualization for a Human Identified 9 9 Solution ...... 130 A.3 Published Dataset ...... × ...... 131 A.4 From GBFS to AO* and PNS ...... 132 A.5 Computing True Proof/Disproof Number is NP-hard ...... 141

vii List of Tables

2.1 Approximate number of estimated states for N N Hex, N = 9, 10, 11, 13...... × 34 2.2 Evolution of Computer Programs for Playing Hex...... 42 3.1 Input features; form bridge are empty cells that playing there forms a bridge pattern. Similarly, an empty cell is save bridge if it can be played as a response to the opponent’s move in the bridge carrier. . 46 3.2 Prediction accuracy on test set from CNN models with varying d and w 49 3.3 Results of CNNd=8 ,w=128 against 4ply-Wolve and 1000 simulations MoHex 2.0...... 51 3.4 Results of MoHex-CNN and MoHex-CNNpuct with same number of simulations against Mohex 2.0. As Black/White means MoHex 2.0 is as the Black/White...... 54 3.5 Results Against MoHex 2.0 of MoHex-CNNpuct and MoHex-CNN with same time limit per move, 95% confidence...... 54 3.6 Input feature planes...... 61

4.1 pσ, pπ, pρ, pσ and vθ architectures and their computation consumption. 67 4.2 Computation used for producing each player. For all Zero variants, computation used to optimize the neural network was ignored. . . . 72 4.3 Winrates of MoHex-2HNN and MoHex-3HNN against Mohex-CNN with the same time per move. For best performance, MoHex-3HNN and MoHex-2HNN respectively use the neural net models at epochs 70 and 60...... 83 4.4 MoHex3H using 13 13-trained nets: win rate (%) versus MoHex 2.0 and MoHex-CNN. Columns× 2-11 show strength by epoch...... 86 4.5 MoHex3H using 9 9-trained nets: win rate (%) versus MoHex 2.0. . 86 4.6 MoHex3H using 9× 9-trained nets, with p-head only: win rate (%) versus MoHex 2.0.× ...... 87 4.7 Detailed match results against MoHex 2.0. Each set of games were played by iterating all opening moves; each opening is tried twice with the competitor starts first and second, therefore each match consists of 162 games. We use the final iteration-80 models for 2HNN and 3HNN from Figure 4.21. The overall results are calculated with 95% confidence. MCTS-2HNN and MCTS-3HNN used 800 simulations per move with expand threshold of 0. MoHex 2.0 used default setting with 10,000 simulations per move...... 95 5.1 DFPN, FDFPN and FDFPN-CNN results for 8 8 Hex. The best results are marked by boldface. Results were obtained× on the same machine. Computation time was rounded to seconds...... 104 6.1 Status of solved Hex board sizes. For 10 10, only 2 openings are solved. For other smaller board sizes, all openings× have been solved. 114

viii List of Figures

1.1 Two Hex games on 11 11 and 13 13 board sizes. Each move is labeled by a number, representing× the× playing order. White won both games by successfully forming a chain to connect his two sides. In the left subfigure, the second move swaps the color — S labeled move g2 indicates that the player using black stone was originally assigned as the second-player, effectively becoming the first-player after swap...2 1.2 A 5-level solution for 3 3 Hex. A dotted line indicates that the connected two nodes can× be merged into one as they are equivalent. Symmetric moves are ignored. This solution contains 34 nodes. . . .7 1.3 The bridge connection pattern in Hex. The two black stones can be connected via one of the two empty cells no matter if Black or White plays first — they are virtually connected...... 8 2.1 The directed acyclic graph (edges are directed downwards) of a game. Each node corresponds to a state, and is labeled with that state’s minimax value score for player MAX ...... 14 2.2 Example solution graphs for the first and second players. Each node represents a game state; each edge stands for a legal action from its upward game state. Value at each node is labeled with respect to the player at root, i.e., the first-player...... 15 2.3 Grouping moves when proving s is losing ...... 16 2.4 In the right figure, by symmetry, the winning strategy for Black can be summarized by two identical and independent subgame strategies, i.e., cells encircled with numbers 1 to 8 plus the black stone. This subgame is called the 4-3-2 pattern in Hex; it can be composed from bridge patterns. The starred nodes are dummy whose purpose is to make the exposition of the decomposition clear...... 17 2.5 Solution for 5 5 Hex by strategy decomposition. Each f edge is a mapping whose× argument is a set of moves and output is a single response move. The graph can be further simplified by discarding all squared nodes and merging all subgame nodes sharing the same substrategy into one. Starred nodes are called branch nodes in [214]. 18 2.6 The 8-puzzle sliding tile problem [153]: Given an initial configuration and rules for moving tiles, the objective is to find a sequence of moves to reach the goal state. Such a puzzle can be directly translated into a state-space graph, which is an OR graph where every vertex is a decision node for the problem-solver...... 19 2.7 The state-space graph for two-player alternate-turn games is an AND/OR graph, where AND nodes are decision points controlled by an adver- sary opponent...... 20

ix 2.8 Tower of Hanoi problem space graph. P (D1→n; 1, 2, 3) represents the problem to move n disks from tower 1 to 3 using 2. Leftmost branch is actually a recursive solution to the problem. Identifying such a solu- tion relies on the availability of well-ordered subgoals and purposeful identification on the relations between parent goal and subgoals; such a process is known as means-ends analysis [141]. It is also possible to formulate this problem as a state-space OR graph; however, finding a solution becomes more difficult. Solution graph from the leftmost branch is actually a tree that grows exponentially; this is simply be- cause for n disks, no solution required less than O(2n) steps exists. . 21 2.9 An illustration of searching for solution graph from a problem-space graph. For G0. F1, F2 and F3 are alternative options; each of them represents a collection of functions with iargument(fi) = U, i.e., covers all legal opponent moves. Thus,∪ each F node represents a solution to a set covering problem, and it is unnecessary to explicitly list all F nodes during search. Note that G0 could also represent a first-player winning strategy by letting the argument set of each function f be empty...... 22 2.10 MDPs and AMGs are all special cases of Stochastic Games [179]. Repeated Games [188] contain one state, whereas MDPs contain one player. The game of Hex belongs to Deterministic Alternating Markov Games as for each action, the next resulting state is determin- istic. Bandit games [117] are the intersection of MDPs and Repeated Games as they contain single player with one state but with repeated trials...... 23 2.11 PNS example graph. Each node has a pair of evaluations (φ, δ), com- puted bottom up. A bold edge indicates a link where the minimum selection is made at each node. In PNS, all edge costs are 0, and a recursive sum-cost is used. PNS often uses 1, 1 for h and h¯, there- fore φ(n) and δ(n) can be interpreted as the{ minimum} number of leaf nodes needed to prove and disprove node n (with respective to the player at n), respectively...... 25 2.12 Relation of several search algorithms. AO* is a variant of GBFS in AND/OR graphs, while A* is for OR graph...... 27 2.13 Illustration of convolution operation in 2D image...... 31 2.14 An example Hex position and the corresponding graphs for Black and White players. Image from [206] ...... 35 2.15 Dead cell patterns in Hex from [87]. Each unoccupied cell is dead. . 35 2.16 An example of Black captured-region-by-chain-decomposition from [87]; cells inside of such a region can be filled-in...... 36 2.17 The braid example that H-search fails to discover a SC connecting to x and y (left). Either a or b could be the key to a SC. The right subfigure shows a decomposition represented solution: after playing at cell a, move 1 can be responded with b while any move in 2, 3, b can be answered by move 1...... { . . . .} . 39 2.18 Searching for SC and VC by strategy decomposition. Finding there is a SC for player P at position A equals to finding a P move that will lead to a child position where there is a VC for P ...... 40 2.19 Knowledge computation visualized via HexGUI. Black played stones: b8, f9 and e6. White played stones: c5, e4 and f2. Black to play. a9, b9, c9, d9 and e9 are Black fillin. 
Gray shaded cells are pruned due to mustplay; black filled cells with pink shading are captured; gray filled with shading are permanently-inferior; green: vulnerable; ma- genta: captured-reversible satisfying independence condition; yellow: dominated by various domination patterns. Only five cells remain to be considered after knowledge computation...... 41

x 3.1 Hexagonal board mapped as a square grid with moves played at in- tersections (left). Two extra rows of padding at each side used as input to neural net (right)...... 46 3.2 center ...... 47 3.3 Distribution of (s, a) training data as a function of move number. . . 48 3.4 Top k prediction accuracy of the d = 8, w = 128 neural network model. 50 3.5 A game played by 1-ply Wolve (Black) against policy net (White) 1287 CNN 8 . The neural network model won as White. Note that we tried to use solver to solve the game state before move 11, but solver failed to yield a solution...... 50 3.6 MoHex-CNN against Ezo-CNN: a sample game from 2017 computer Olympiad. MoHex-CNN won as Black...... 55 3.7 Neural network architecture: It accepts different board size inputs, padded with an extra border using black or white stones; the reason for this transferability is that the network is fully convolutional. . . . 62 3.8 Comparison of playing strength against Wolve on 9 9 and 11 11 Hex with different k. The curves represent the average× win percentage× among 10 trials with Wolve as black and white...... 64 3.9 On 11 11 Hex, comparisons between AMCPG-B and REINFORCE- V with× error bands, 68% confidence. Each match iterates all opening moves; each opening was tried 10 times with each player as Black or White...... 65 4.1 Computation costs for AlphaGo Zero and AlphaZero are far ahead of other major achievements in AI [157]...... 71 4.2 A closed scheme for iterative learning. Gating was removed in Alp- haZero, but some implementation found gating is important for stable progress [148]...... 72 4.3 A problem with two-head architecture in PV-MCTS. The leaf node is expanded (a) with threshold 0, otherwise (b) if N(s) is below the threshold, no expansion and evaluation indicates that no value to back up. fˆθ is the two-head neural net that each evaluation of state s yields a vector of move probabilities p and state-value v. N (s) is the visit count of s...... 74 4.4 PV-MCTS with a three-head neural net fθ: The leaf node s can be expanded with any threshold. If the visit count of s reaches the expansion threshold, s is expanded, the value estimate is backed up, action values and move probabilities are saved to new child nodes. If the visit count of s is below the threshold, the previously saved action-value estimate can be backed up...... 75 4.5 A game implies a tree, rather than a single path. For each state, the value (+1 or -1) is given with respect to the player to play there. . . 77 4.6 A ResNet architecture for Hex with three heads. Each residual block repeats twice batch normalization, ReLU, convolution using 32 3 3 filters, then adds up original input before leaving the block. . .× . . 79 4.7 Mean Square Errors of two- and three-head residual nets...... 80 4.8 MSE (left) and top one move prediction accuracies (right) of two- and three-head residual nets...... 80 4.9 Results of MoHex-2HNN and MoHex-3HNN against MoHex-CNN. All programs use the same 1000 simulations per move. MoHex- 2HNN uses playout result when there is no node expansion. After epochs 70 and 60, MoHex-3HNN and MoHex-2HH’s performance de- creased, possibly due to over-fitting of the neural nets: Figures 4.7 and 4.8 show that around epoch 70, the value heads of 3HNN gener- ally achieve smaller value errors than epochs around 80 and 90. The error bar represents the standard deviation of each evaluation. . . . 82

xi 4.10 A given Hex state of board size 8 N 19 is padded with black or white stones along each border, then≤ fed≤ into a feedforward neural net with convolution filters. A fully-connected bottom layer compresses results to a single scalar v...... 84 4.11 13 13 training: prediction accuracy across boardsizes...... 85 4.12 9 ×9 training: prediction accuracy across board sizes...... 85 4.13 13× 13 training errors, with and without warm initialization...... 87 4.14 13×13 test errors, with and without warm initialization...... 88 4.15 Closed-loop× learning schemes ...... 88 4.16 On 13 13 Hex, around 10 training iterations was finished before par- ticipating× 2018 Computer Olympiad Hex tournament, using 2 4-core GPU computers with GTX 1080 and GTX 1080Ti. 3HNN was ini- tialized using MoHex generated data. The first three iteration used 32 filters per layer, 4–5 used 64 filters per layer, 6–7 used 32 filters per layer, 8–10 used 128 filters per layer. The curve shows MSE or accu- racy on all games played by Maciej Celuch from little golem. Celuch is presumably the strongest human Hex player [124]; we dumped 620 games played by 2018 December. Noticeably, he did not lose any of these games...... 90 4.17 Averaged win value for each opening cell on 13 13 and 11 11 Hex. The numbers are consistent to some common belief× in Hex;× for ex- ample, every cell more than two-row away from the border is likely to be a Black win...... 91 4.18 On 9 9 Hex zero-style training with MoHex3HNN and MoHex2HNN. Test× the neural network model on MoHex2.0 selfplay games or ran- domly generated perfectly labeled game states. Red curve is result obtained under the same configuration except that a 2HNN is used. The above two plots were tested on a test set of 149362 examples, while the last one was from a smaller set of 8485 examples...... 93 4.19 AlphaZero-2HNN versus AlphaZero-3HNN. 2HNN uses expansion thresh- old 0, nmcts = 160; 3HNN uses expansion threshold 10, nmcts = 800; all other parameters are the same...... 94 4.20 MCTS-3HNN against MCTS-2HNN. Each match consists of 162 games by letting each player starting from each opening cell once as Black and White. MCTS-3HNN-160 means it uses 3HNN for 160 iteration MCTS with expansion threshold of 0. MCTS-2HNN-160 means it uses 2HNN for 160 iteration MCTS and expansion threshold of 0. MCTS-3HNN-1s-e10 means the player uses 1s per move with expan- sion threshold of 0. MCTS-2HNN-1s-e0 means it uses 1s per move with expansion threshold of 0...... 95 4.21 AlphaZero-2HNN versus AlphaZero-3HNN. 2HNN used expansion threshold 0, nmcts = 160; 3HNN used expansion threshold 10, nmcts = 800; all other parameters same...... 96 4.22 MCTS-3HNN against MCTS-2HNN. Each match consists of 162 games by letting each player starting from each opening cell once as Black and White...... 97 5.1 Focused Best-first Search uses two external parameters: a function fR for ranking moves, and a way for deciding window size. Here, assume fR(D) > fR(E) > fR(F ) > fR(G) > fR(C) > fR(B), and the window size is simply set to a fixed number of 4. E is found to be winning, then E is removed, and the next best node is added to the search window...... 99 5.2 Top-k prediction accuracy of the policy network on 8 8 Hex. . . . 103 5.3 A comparison between the performance of FDPFN using× resistance and policy network pσ with varying factor...... 106

xii 5.4 Results of value training from single versus three-head architectures. At each epoch, neural net model is saved and evaluated on each dataset. In the right figure, the cyan line is produced by doubling the neural network size to 12 layers. One epoch is a full sweep of the training data...... 107 5.5 A Hex position; White is to play. The optimal value of this position is a Black win. Given an oracle machine, which can always predict the winning move for Black, searching for the solution represented by a solution-tree in the state-space graph requires to examine 44 1 42 1 40 1 38 1 u 2.8 million nodes ...... · .· . .· . 108 5.6 Decomposition-based· · · · solution to the Hex position in Figure 5.5. So- lution to the original problem can be represented by a conjunction of 9 subproblems; each of them can be decomposed further using a similar scheme. Since it is known that Black can win using 4 black stones, the solution-graph found by such decomposition contains at most 94 = 6561 nodes...... 109 A.1 An example shows the evaluation of actions provided by neural net model. For each cell, the upper number is the prior probability pro- vided by policy head, the lower number is the value provided by action-value head. Note that here the value is with respect to the player to play after taking that action, thus a smaller value indicat- ing higher preference for the current board state. Here, both policy and action-value are in favor of f6...... 130 A.2 An example play by White. If it plays at D6, Black will respond at E6 resulting to pattern 370. If White then plays at D8, then Black will respond at B8, splitting pattern 370 into patterns 285 and 86. In each cell, the lower number is pattern id, upper number is local move name inside of that pattern...... 131 A.3 Leftmost is the full graph G; all the tip nodes are solvable terminal with value 0; each edge has a positive cost. Four different solution 3 graphs exist for G, among which G0 is with the minimum total cost of 5. However, if the cost scheme Ψ is defined as a recursive sum-cost, 2 3 both G0 and G0 are with minimum cost of 6...... 133 0 A.4 f1 selects solution-base from explicit graph G . Every edge is with cost 0; each tip node has an estimation cost provided by function h. By definition of the recursive sum-cost scheme, G0 is the minimum solution-base with cost 17 even though we see the true summed edge cost is 15...... 135 A.5 An example shows the drawback of AO* in selecting frontier node for expansion. If D is expanded first, E will never be expanded; however, if E is chosen first, both D and E will be expanded before the search switches to correct branch C. We assume all edges in the graph have a cost of 4, therefore the estimation provided by h in G0 is admissible. 139 A.6 The same as Figure A.5, edge costs are all 4. The difference is that now each tip node has a pair of heuristic estimations, respectively representing the estimated cost for being solvable and unsolvable. All tip nodes are with admissible estimations from both h1 and h2. Here h2 successfully discriminates that D is superior to E because h2(D) < h2(E). Indeed, as long as h2(D) [0, 4] h2(E) [0, ], h2 will be admissible, hinting that the chance∈ that∧ an arbitrary∈ ∞ admissible h2 can successfully choose D is high. In respect to PNS, we call algorithm AO* employing a pair of admissible heuristics PNS*...... 140 A.7 Deciding whether an SAT instance is satisfiable can be reduced to finding the true proof number in an AND/OR graph. 
All tip nodes are terminal with value 0. All edges are with cost 0, except those linking to terminal (which have cost 1)...... 142

xiii Chapter 1

Introduction

In this chapter, we first introduce the game of Hex, and discuss how it became a domain for scientific research. As the major interest of this thesis lies on algorithms for tackling the game from an artificial intelligence (AI) perspective, we overview general techniques and principles for problem-solving in AI research, with a partic- ular focus on the role of knowledge in search for two-player games. We discuss the pioneering studies that use learning to harness knowledge in the game of checkers, and highlight two fundamental principles used in heuristic knowledge guided search for finding solutions in two-player games. Finally, we summarize our contributions and outline the content of this thesis.

1.1 The Game of Hex as a Benchmark for AI

Hex is a two-player game that can be played on any N N hexagonal board. Fig- × ure 1.1b shows a game on 13 13 board. Players Black and White are each given × a distinct pair of opposing borders. The game is played in an alternating fashion. Black starts first. On their turn, a player places a stone of their color on an empty hexagonal cell. The winner is the one who successfully forms a chain that connects the player’s two sides of the board. In real play, to mitigate the first player’s ad- vantage, a swap rule is often used — in the first turn, the second player can either steal the first player’s opening move or play a move in normal fashion. Figure 1.1a shows an example game where the swap move was played — the second player stole the opening move g2, then the game continued with the first player using white stones therefore actually playing second. The 11 11 board is commonly regarded × as the regular board size. The 13 13, 14 14 and 19 19 board sizes are also used × × × by human players.

1 a b c d e f g h i j k l m a b c d e f g h i j k 1 1 1 1 2 1 2 2 1S 2 3 77 62 64 66 68 69 3 3 46 41 43 54 23 24 3 4 78 63 65 67 70 9 4 4 56 49 47 42 39 21 53 58 44 4 5 61 60 71 72 3 2 5 5 5 57 48 40 38 31 22 55 59 51 45 5 6 59 58 80 74 73 20 4 18 7 6 6 34 19 26 3 33 6 52 50 32 6 7 49 48 79 21 8 19 17 6 16 11 7 37 36 23 22 15 10 14 75 7 28 37 4 20 36 27 25 7 8 8 9 35 34 25 24 13 12 76 9 8 5 35 8 7 30 8 10 33 32 27 26 57 10 9 29 10 12 14 9 11 31 30 28 11 10 18 9 11 13 16 10 12 29 39 41 43 45 47 51 53 56 55 12 11 17 15 11 13 38 40 42 44 46 50 52 54 13 a b c d e f g h i j k a b c d e f g h i j k l m (a) 11 11 Hex (b) 13 13 Hex × × Figure 1.1: Two Hex games on 11 11 and 13 13 board sizes. Each move is labeled × × by a number, representing the playing order. White won both games by successfully forming a chain to connect his two sides. In the left subfigure, the second move swaps the color — S labeled move g2 indicates that the player using black stone was originally assigned as the second-player, effectively becoming the first-player after swap.

The game of Hex (originally called Polygon or Con-tac-tix) was created by Piet Hein [86], possibly motivated by his dissatisfaction with properties of many board games (e.g., chess), such as the existence of repetitions and the prevalence of draws. Hein contemplated several principles in the design of Hex: fair — both players shall have equal chance of winning; progressive — the same game position shall not reappear during play; final — each game shall end in a limited number of moves; comprehensible — beginners shall learn the rules quickly; tactical — the game shall be non-trivial for playing strategically; and decisive — the game shall have no draws. The game was later discovered and popularized by John Nash [137]; see [80] for a full account for the history of Hex. In 1957, Martin Gardner [71] introduced the game to the general public in an article to Scientific American, where he stated that

“Hex may well become one of the most widely played and thoughtfully analyzed new mathematical games of the century.”

Hex can be thought of as a game on a planar graph, thus it is not surprising that the game’s no-draw property (any filled board will have a winning path for exactly one player) is related to a property that plays a key role in the proof of the four-color theorem, namely that any planar graph can have a clique of size at most four [80]. Gale [62] showed that the no-draw property of Hex is equivalent to Brouwer’s fixed- point theorem in two dimensions. Nash [137] proved by strategy stealing [24] that a winning strategy exists for the first player, but the proof is non-constructive, i.e.,

2 an explicit winning strategy is unknown except on very small board sizes. In a pioneering work for AI in the 1950s, Shannon [178] discussed the difficulty of creating an AI agent for Hex and proposed an electronic model for playing the game. As our goal is to apply state-of-the-art AI methods to the game of Hex, we note that, for two-player games, relevant AI research can be divided into two categories:

1. Solving algorithms aim to find the theoretic win/loss value of a game position under the assumption of perfect play on both sides. This is further defined on three hierarchical levels [4]; each higher level subsumes the lower one.

Ultra-weakly solving determines what is the best outcome that the first • player can achieve from the initial position of the game (win, loss or draw), given that the opponent is perfect. However, the strategy for achieving such a goal is not required to be specified. Hex on any N N × board is ultra-weakly solved by the strategy stealing argument.

Weakly solving means to identify the winning strategy from the initial • position of the game if there is one, to find all second-player’s winning strategies after each possible opening move of the first player if the game is second-player win, or to specify the best strategies of both players that would force the game to a draw.

Strongly solving goes one step further by requiring to weakly-solve any • position that could occur by the rules of the game.

2. Playing algorithms aim to provide a good move for any arbitrary position. How close the returned move is to being optimal can be difficult to specify with mathematical precision; evaluation of these algorithms thus relies on empirically competing against existing opponent players, such as strong human professionals. Most research in classic games is of this sort. It is clear that weakly solving a game implies perfect play.

Despite its simple rules, Hex presents significant challenges to AI research for both playing and solving, primarily because of the large and near-uniform branching factor, as well as the lack of reliable hand-crafted evaluation functions [204, 205, 206]. Devising an automated program for strongly solving arbitrary N N Hex positions × is PSPACE-complete [30, 162]. In practice, their versatile reasoning ability for 3 decomposing strategies and pruning irrelevant regions has allowed human players to find more efficient solutions than the best computer programs so far for board sizes up to 9 9 [214, 215]. × 1.2 Searching for Solutions with the Help of Knowledge

Early research in AI, such as the calculus solver by Slagle [186], the General Problem Solver (GPS) [141] and the Graph Traverser [53], were based on the plausible argu- ment that intelligence is knowledge plus deduction, where knowledge refers to rules, facts and intuition that can be encoded by humans, and deduction is achieved by various search methods that operate on a structure exposed by the representation of the problem. The description of knowledge can be heuristic or exact. In certain cases, exact knowledge is necessary for problem-solving. For example, to swiftly compute the exact length of the hypotenuse of an right-angled triangle — the side opposite to the right angle, exact knowledge of the Pythagorean theorem is needed. Indeed, even the knowledge about the Pythagorean theorem itself is a consequence of precisely applying logical deduction from a list of Euclidean postulates. However, apart from this, in real-world environments, a large body of human knowledge is imprecise, but it is ever-surprising to observe how much people can accomplish with that imprecise, unreliable information known as intuition. People drive cars without knowing how they function in detail and usually only have a vague picture of the road conditions ahead. A five-year old child can easily distinguish a cat from a dog, but it is hard to explain they do it. In computer science, imprecise yet helpful information content is called heuristic knowledge [153]. Heuristics represent a trade-off on the demand between theoretical and empirical problem-solving. In order to be practical, heuristics must meet the need of simplicity and intelligibility, and at the same time, they must be informative enough in discriminating bad and good choices. The merits of heuristic and exact knowledge are often complementary in practical problem-solving. Heuristic knowledge by its nature tends to be vague in describing the quantity of interest; therefore, it may work well across a variety of domains, but it can be outperformed by accurate, precise domain knowledge. For example, in the task of finding the shortest path between a pair of source and goal nodes, the A* algorithm [77] guides its exploration by summing the cost so far and the estimated cost to goal. If for some nodes, the estimated cost to goal can be per- 4 fectly acquired, the search to goal will be accelerated. However, perfect information represents an exact description about the domain to solve, whose benefit disappears when transferring to other domains. Notable achievements such as solving Rubik’s Cube [112] combine both heuristic and precise knowledge in their search, where the perfect knowledge is generated by a procedure [49] and stored in a database. Simi- lar techniques were used in solving checkers [172] with checkers endgame databases. Other games such as Gomoku [3] and small board size Hex [82] were solved using domain-independent systematic search with the help of domain-specific algorithms for knowledge discovery, which were used for move pruning and early endgame de- tection. Search with the aid of problem-independent or problem-specific knowledge has been extensively adopted in the field of operations research (OR) [160]. Similar to the role of action description languages [130] in heuristic search planning [29], com- binatorial optimization problems are typically expressed via formal integer linear programming (ILP) models and solved by the help of relaxed models of linear pro- gramming (LP). 
The solving procedure branch-and-bound [116] works by repeatedly branching and bounding until the optimal solution is found or an external stopping criterion is satisfied. Clearly, selecting the variable on which to branch, how to branch and how to bound is crucial to practical problem-solving. Although some strategies work for all ILPs, when it comes to specific problem, e.g., the infamous Traveling Salesman Problem (TSP), problem-specific pruning (e.g., using domain- dependent cut planes [145]) and splitting rules can lead to drastic performance increase [10]. Essentially, those heuristic or exact information for guiding the search are domain-knowledge summarized by ingeniously identifying special properties of the problem model to solve. The difference between AI and OR algorithms is that in OR, branch-and-bound search is conducted by continuously splitting and pruning in the space of solutions while ensuring optimality. In AI, many algorithms [143] work by repeating exploring the space of states until a desired goal is achieved. For instance, in a single-agent game, A* variants model the problem as a gener- ative model: starting from an initial condition, each action leads a new encoding called state. The goal is to find a series of actions to achieve the goal condition. A solution is generated incrementally during the process of search. Although the procedure is generate-and-test, optimality analysis has to be done by referring to split-and-prune [153].

5 Generalizing A* search for two-player games results in AO* [143]. Similarly, a game position to solve can be encoded as a state, such as a two-dimensional array to represent a Hex position. Solving requires repeated look-head by finding a move such that the opponent is unable to refute no matter which move is used to respond. Therefore, a solution for a game position is not a sequence of actions, but a graph- represented strategy which can be thought as a prescription for choosing actions in response to any adversarial moves (the formal definitions on how strategy could be represented in two-player games will be discussed in Chapter 2.1). In practice, due to the exponential growth rate of space requirement of look-ahead search, finding an exact solution is not always be feasible. Search with heuristics might only be able to find an approximate strategy for playing well. In either case, knowledge can help solving or playing by providing more efficient encoding of state, better state transitions (e.g., by heuristic or exact pruning of unpromising moves), and better ways to conduct the systematic search. To illustrate the effect of knowledge in the context of solving the game of Hex, Figure 1.2 shows an exact solution to 3 3 Hex, where the only knowledge given to × computer is this: a state is winning if there exists a move that joins the player’s two sides. More than 30 nodes are used to in the solution. However, if one more piece of knowledge — bridge pattern ensures a safe connection (Figure 1.3) — is given, all nodes beyond the second level of Figure 1.2 can be pruned, and a computer program needs only two nodes to show the wining strategy.

1.2.1 Learning Heuristic Knowledge in Games

A fundamental question for an AI agent is where does the knowledge come from? Humans gain knowledge in two ways: (1) by summarizing past experience, (2) by deriving new knowledge through reasoning about already acquired knowledge. Ear- lier attempts to expert systems [78] tried to automate the process of (2) by encoding expert knowledge into rules and facts that computers can use to infer new knowl- edge about a domain. However, the acquisition, maintenance and representation of expert domain knowledge is difficult itself [19]. Another approach is to acquire knowledge automatically from experiences by mimicking (1) using machine learn- ing [131], whose aim is to learn a general concept [134] from experiences (represented as data), which will lead to enhanced performance in solving subsequent problems. The study of using machine learning to acquire knowledge in games was pio-

6 c 1 b 2 a 3 1 c 2 b 3 a

c 1 b 2 a 3 1 1 c 2 b 3 a

c 1 c 1 c 1 c 1 b 2 b 2 b 2 2 b 2 a 3 a 2 3 a 3 a 3 2 1 1 1 1 1 c 1 c 1 c 1 2 c 2 b 2 b 2 b 2 b 3 a 3 a 3 a 3 a

c 1 c 1 c 1 c 1 b 2 b 3 2 b 2 2 b 2 a 3 3 a 2 3 a 3 3 a 3 3 2 1 1 1 1 1 c 1 c 1 c 1 2 c 2 b 2 b 2 b 2 b 3 a 3 a 3 a 3 a

c 1 c 1 c 1 c 1 b 2 b 3 2 b 2 2 b 2 a 3 3 a 2 3 a 3 3 a 3 3 2 1 4 1 4 1 4 1 1 4 c 1 c 1 c 1 2 c 2 b 2 b 2 b 2 b 3 a 3 a 3 a 3 a

c 1 c 1 c 1 c 1 b 2 b 3 2 b 2 2 b 2 a 3 3 a 2 3 a 3 3 a 3 3 2 1 1 1 1 1 c 1 4 c 1 4 c 1 2 c 2 4 b 2 b 2 b 2 4 b 3 a 3 a 3 a 3 a

c 1 c 1 c 1 c 1 b 2 b 3 2 b 2 2 b 2 a 3 3 a 2 3 a 3 3 a 3 3 2 1 1 1 1 1 4 c 1 c 1 c 1 2 4 c 2 b 2 4 b 2 4 b 2 b 3 a 3 a 3 a 3 a c 1 c 1 c 1 c 1 b 2 b 3 2 b 2 2 b 2 a 3 3 a 2 3 a 3 3 a 3 3 2 1 4 1 1 1 4 1 c 1 4 c 1 4 c 1 2 c 2 b 2 b 2 b 2 b 3 a 3 a 3 a 3 a

c 1 c 1 c 1 c 1 b 2 b 3 2 b 2 2 b 2 a 3 4 3 a 2 3 a 3 3 a 3 4 3 2 1 1 4 1 4 1 1 c 1 c 1 c 1 2 c 2 b 2 b 2 b 2 b 3 a 3 a 3 a 3 a

c 1 c 1 c 1 c 1 b 4 2 b 3 2 b 2 2 b 4 2 a 3 3 a 2 4 3 a 3 4 3 a 3 3 2 1 1 1 1 1 c 1 c 1 c 1 2 c 2 b 2 b 2 b 2 b 3 a 3 a 3 a 3 a

Figure 1.2: A 5-level solution for 3 3 Hex. A dotted line indicates that the con- × nected two nodes can be merged into one as they are equivalent. Symmetric moves are ignored. This solution contains 34 nodes.

7 Figure 1.3: The bridge connection pattern in Hex. The two black stones can be connected via one of the two empty cells no matter if Black or White plays first — they are virtually connected. neered by Samuel [168, 169] for checkers. Samuel argued that two sharply distin- guished approaches exist to machine learning: (1) the general neural net approach that tries to simulate human behavior by a connected network, and (2) a more efficient problem-specific approach that is to produce a highly organized decision network for specific tasks. The neural net approach was infeasible in that time because the size of biological neural nets is far beyond those can be simulated by computers, so he chose the (2) to study machine learning. Samuel chose checkers for these reasons: large space — there was no existing algorithm guaranteeing win/loss; the goal and rules for playing are well-defined; knowledge is the major reason for well-playing performance; and the game is familiar to many people so that behavior of the computer program can be assessed by human professionals. Samuel considered two learning paradigms: rote learning and generalization learning. The goal of rote learning is to memorize important search-produced eval- uations for specific game positions and in the subsequent search, evaluations can be reused without further look-ahead. For example, in a situation where a node is reached after 3-ply look-ahead, if the evaluation of this node can be retrieved from storage that was saved as a result of 5-ply search, then the evaluation of the current position effectively simulates a 8-ply search. Samuel designed a number of schemes to for efficient information storage. A polynomial scoring function is used as the learning function. In generalization learning, feature weights are ad- justed by referring to the game results of self-play or play against humans. In both learning methods, look-ahead search is restricted to at most 20-ply due to hardware limitation. In later work [169], rote learning was improved by using an enhanced hash ta- ble for more efficient information storage. More advanced search techniques, such as alpha-beta pruning [109], were incorporated. Generalization learning was fur- ther improved by training on higher quality games played by players. The

8 program failed to reach top-human playing strength. Samuel blamed the polyno- mial scoring function for not being expressive enough to approximate human expert evaluations on checkers. In this thesis, we pay particular attention to the learning model of artificial neural networks [175] and show their usefulness in the game of Hex.

1.2.2 Principles for Heuristic Search in Games

To find the objective of interest with small effort, knowledge aided search procedures are often driven by optimizing a certain cost measurement, even though the quest is an arbitrary feasible solution, rather than the one with a minimum cost. For example, for Rubik’s cube, the aim is to find any solution path — rather than a shortest path — to achieve the goal. However, in practice, to find the goal fast, the A* variant [112] searched the space by chasing a path that is likely to yield the minimum cost. The same is true for two-player games. Given a Hex position, in order to answer whether the it is a first-player win or not, search can be conducted by chasing a surrogate objective of finding the smallest proof. Such an idea is called small-is-quick principle. Yet, an exact measurement on which direction will lead to the real smallest proof is usually unknown, external knowledge has to be used for providing estimated proof sizes — such a selective exploration by treating guessed metric as a true measurement is called the face-value principle. All best-first search paradigms for single or two-player games adopt these two principles to guide the problem-solving [153]. In this thesis, we develop new search algorithms in the presence of heuristic knowledge from learned neural net models, and apply them for solving the game of Hex.

1.3 Contributions and Organization of This Thesis

The contributions of this thesis are as follows:

We provide a summary on various general/specific learning and search tech- • niques — these include informed best-first search [153], reinforcement learn- ing [192] and deep learning [75] — related to solving and playing the game of Hex. We discuss the limitation and usefulness of each, and highlight the necessity for combining them together for better practical algorithms.

9 We present experimental studies on supervised learning and model-free rein- • forcement learning with deep neural networks for Hex.

We achieve state-of-the-art playing and solving results for the game of Hex • by devising novel algorithms that combine deep neural nets and search — these include the developments of a three-head neural network architecture for Monte Carlo tree search for better learning and search, and incorporating deep networks to proof number search for more efficient solving of Hex states.

The rest of this thesis is organized as follows:

Chapter 2 reviews and discusses general as well as specific techniques for solv- • ing and playing games, including an overview of important algorithms and techniques that have been developed upon more general problem-settings, and discussions on how they can be harnessed together for tackling the game of Hex.

Chapter 3 presents studies of using deep neural nets in supervised and model- • free reinforcement learning for Hex. We show that high move prediction ac- curacy can be achieved using deep networks as the learning model, and the trained model can be used to achieve better playing strength of the search. We then develop a new reinforcement learning procedure and show that it outperforms the vanilla methods in the game of Hex.

Chapter 4 presents a three-head network architecture, and its practical merit • in Monte Carlo tree search and search-based reinforcement learning. Using such an architecture, we develop a new Hex player called MoHex-3HNN, which became the champion player in 2018 Computer Hex competition.

Chapter 5 presents a study of incorporating deep network for more efficiently • solving Hex. Specifically, we show that by incorporating neural networks to proof number search, faster solving can be achieved for Hex. We then discuss the limitation of current search paradigms, and outline a new search method for solving large board size Hex positions.

Chapter 6 concludes the thesis by pointing a few promising future directions • for the game of Hex.

10 Publications

First-authored publications created during my PhD studies are as follows:

Chao Gao, Ryan Hayward, and Martin M¨uller.“Move prediction using deep • convolutional neural networks in Hex.” IEEE Transactions on Games 10.4 (2017): 336-343. See [63].

Chao Gao, Martin M¨uller,and Ryan Hayward. Adversarial policy gradient • for Alternating Markov Games. 2018 International Conference on Learning and Representation workshop track. See [64].

Chao Gao, Martin M¨uller,and Ryan Hayward. “Focused Depth-first Proof • Number Search using Convolutional Neural Networks for the Game of Hex.” IJCAI 2017. See [65].

Chao Gao, Martin M¨uller,and Ryan Hayward. “Three-Head Neural Network • Architecture for Monte Carlo Tree Search.” IJCAI 2018. See [66].

Chao Gao, Kei Takada, and Ryan Hayward. “Hex 2018: MoHex3HNN over • DeepEzo.” ICGA Journal 41.1 (2019): 39-42. See [67].

Chao Gao, Siqi Yan, Ryan Hayward and Martin M¨uller.“A transferable neural • network for Hex”. ICGA Journal 40.3 (2019): 224-233. See [69].

First-authored publications created during my PhD time but outside of the scope of this thesis are

Chao Gao, Guanzhou Lu, Xin Yao and Jinlong Li. “An iterative pseudo- • gap enumeration approach for the Multidimensional Multiple-choice Knapsack Problem.” European Journal of Operational Research 260.1 (2017): 1-11. See [68].

Chao Gao, Bilal Kartal, Pablo Hernandez-Leal, and Matthew E. Taylor. “On • Hard Exploration for Reinforcement Learning: A Case Study in Pommerman.” AIIDE 2019. See [70]. Although content of this paper is not in this thesis, its discussion on the limitation of model-free reinforcement learning is related to those in Chapter 3.2.

11 Chao Gao, Pablo Hernandez-Leal, Bilal Kartal, and Matthew E. Taylor. “Skynet: • A Top Deep RL Agent in the Inaugural Pommerman Team Competition.” RLDM 2019.

Further collaborated papers outside of the scope of this thesis are

Xuebin Gao, Zichen Zhang, Chenyang Huang, Chao Gao, Masood Dehghan, • and Martin Jagersand. “BASNet: Boundary-Aware Salient Object Detec- tion.” In Proceedings of the IEEE Conference on Computer Vision and Pat- tern Recognition, pp. 7479-7489. 2019.

Bilal Kartal, Pablo Hernandez-Leal, Chao Gao, and Matthew E. Taylor. “Safer • Deep RL with Shallow MCTS: A Case Study in Pommerman.” Adaptive and Learning Agents Workshop at AAMAS 2019.

12 Chapter 2

General and Specific Techniques for Solving and Playing Games

Before presenting our new algorithms for the game of Hex, we first give neces- sary background. In this chapter, Section 2.1 reviews how strategy is represented. Section 2.2 discusses two formulations used in search and learning algorithms. Sec- tion 2.3 reviews and summarizes general search and learning techniques in the lit- erature, how they are connected, how they can help to find the desired strategies in two-player alternate-turn games. Finally, in Section 2.4, we survey the large body of previous work on Hex and explain how some Hex-specific algorithms can be improved by combining with general artificial intelligence methods. Our improved Hex algorithms — presented in latter chapters 3–5 — rely on the content in this chapter. Understanding these general methods, as well as their commonalities and differences, helps understand how we came to our improved algorithms. We hope that our comments here on the interconnections of various methodologies will also be of use to researchers working on other AI problems.

2.1 Strategy Representation

We begin by reviewing strategy representations, examining their assumptions, and discussing their advantages and shortcomings.

2.1.1 Minimax Strategy and Solution Graphs

Figure 2.1 shows the directed acyclic graph of all possible continuations of a game. Two players, MAX and MIN, appear alternately in the layered graph; each node represents a game state; each edge stands for a move. The first layer contains only one node belonging to MAX. It represents the start of the game. Each leaf node 13 8

2 8

2 7 8 9

1 2 7 6 8 9 5 4

Figure 2.1: The directed acyclic graph (edges are directed downwards) of a game. Each node corresponds to a state, and is labeled with that state’s minimax value score for player MAX has a utility score with respect to the MAX player; for any other node s, its score is decided by maximizing (for MAX nodes) or minimizing (for MIN nodes) among the scores from child nodes, as expressed in Eq. (2.1).

 eval(s) if s is leaf  0 v(s) = maxs0 v(s ) if s is MAX node (2.1)  0 mins0 v(s ) if s is MIN node In Figure 2.1, bold edges (representing best actions) indicate an optimal move se- quence to achieve the best score for both players. An optimal minimax strategy sequence is referred to as a principal variation, where the optimality is built on the assumption that evaluation of leaf nodes is exact. Rather than optimizing on seemingly correct score function, a player might only want to find a strategy to win if it is possible or to prove that there is no winning strategy at all. In such case, a solution-graph can be used to represent the sought target. Similar to a principal variation, a solution-graph is identified from a directed acyclic graph of a game, where each node has a value of either true (T ) or false (F ). Figure 2.2 shows two examples of solution-graphs: subfigure 2.2a contains a winning strategy for the first player, marked by bold edges; subfigure 2.2b has a winning strategy for the second player — no matter which move the root player chooses to play, the opponent can always refute it. Similar to Eq. (2.1), the value at each node can be computed recursively using and ( ) and or ( ) logical operations, ∧ ∨ as in Eq. (2.2).

14 T F

F T F F

F T T T F T F T

F F T F F T F F F F T T F T F F

(b) Bold edge marks a second-player win- (a) Bold edge shows a first-player winning ning strategy. strategy.

Figure 2.2: Example solution graphs for the first and second players. Each node represents a game state; each edge stands for a legal action from its upward game state. Value at each node is labeled with respect to the player at root, i.e., the first-player.

 eval(s) if s is terminal  0 v(s) = s0 v(s ) if s belongs to first-player (2.2) ∧ 0  s0 v(s ) if s belongs to second-player ∨

2.1.2 Strategy Decomposition

The solution graph defined in the previous section is identified on a graph with reference to the optimal value of each game state. The drawback of this is that even representing an arbitrary solution graph could be expensive: assume that on average the branching factor is b and depth is d, the space required to represent a winning strategy is Ω(bd/2). A natural question arises: is there any more space efficient encoding for direct strategy representation in two-player alternate-turn games? Recall that in single-agent path-finding problems, a solution (strategy or plan) can be represented as a series of action choices. In two-player games, however, due to the existence of the adversarial opponent, each action has to be assessed by taking into account all possible opponent responses. In other words, the fundamental difficulty comes from the fact that to prove a game state s is losing, all legal moves from s have to be evaluated individually. Thus, a more clever algorithm would identify the relationships between substrategies of different moves and reuse them across moves. This observation can help to reduce the size of a proof that a state s is losing: instead of treating each move independently, we can divide the legal moves 15 (a) s is losing because every child of s has a winning action.

s

a , a , a : a a , a : a { 1 2 3} 6 { 4 5} 10

(b) All actions of s and the corresponding response are grouped

Figure 2.3: Grouping moves when proving s is losing of s into different groups if we observe that among all branches only a few distinct counter moves exist, i.e., moves leading to terminal nodes that fulfill the goal that player at s is losing. Figure 2.3a shows an example game: In order to prove that s is losing, all its

five actions a1 to a5 have to be evaluated. For each child node of s, there exists an action leading to a terminal state (filled with black) where the player at s is losing. However, the successful responses from child nodes of s have only two distinct forms: a6 and a10. Thus, we can abstract actions at s and their responses into two groups, resulting in Figure 2.3b. The idea in Figure 2.3b can be further developed: if treating state s a second player winning strategy, we see that s has been represented by subgame second player winning strategies from two branches. Figure 2.3b is a decomposition-oriented strategy representation. The number of filled black nodes in 2.3b may be further reduced by merging game states sharing the same subgame strategy. We illustrate these ideas using 5 5 Hex. To start the game, Black plays at the × center cell c3. It remains to prove that Black has a second-player winning strategy

from then on. Black tentatively plays a move at the center cell c3, as shown in

16 a b c d e a b c d e 1 5678 1 1 1 2 234 2 2 2 3 1 1 3 3 3 4 4 4 2 3 4 4 5 5 5 5 6 7 8 5 a b c d e a b c d e

1

c3 2 3 4 5 6 7 8

a b c d e {2,5,6}:4 {1,3,4,7,8}:2 1 1 2 2 3 3 4 4 5 5 a b c d e

(a) Initially, the board is empty; Black (b) Subgame strategy for 4-3-2 pattern constructed using strategies for the bridge wants to prove that c3 is the winning move. pattern.

Figure 2.4: In the right figure, by symmetry, the winning strategy for Black can be summarized by two identical and independent subgame strategies, i.e., cells encircled with numbers 1 to 8 plus the black stone. This subgame is called the 4-3-2 pattern in Hex; it can be composed from bridge patterns. The starred nodes are dummy whose purpose is to make the exposition of the decomposition clear.

Figure 2.4a. Since the board is symmetric, to achieve the goal of connecting Black’s two sides, it is enough to find a strategy to connect c3 to one side of the board using only the lower half of the empty cells, e.g., d3, e3, a4, b4, . . . , e4, a5, b5, . . . , e5.

We can represent the strategy using mappings f2 a4, b4, a5, b5 d4 and f3 , { } → , d3, e3, c4, d4, e4, c5, d5, e5 b4, where for each mapping, the argument is a set of { } → possible White moves and the output is a Black response. For each possible play from these two mappings, the induced subgame can always be solved using the bridge

pattern strategy. Here, White moves e3, e4, e5, a4 are ignored, because they can be

put in either f2 and f3 or even responded by random Black move; see Figure 2.4b for the illustration. From this example we see that, given only knowledge about the

bridge pattern, winning strategy for proving that the center move c3 in 5 5 Hex × can be represented less than 10 nodes, while using a solution-graph as in Figure 1.2, a larger number of nodes would be required. Figure 2.5 further demonstrates that a solution from such decomposition can also be viewed as a solution-graph; each circular node represents a game strategy; each square F node stands for a collection of mappings whose argument sets in union is the set of all White moves. In G

17 of Figure 2.5, it is Black’s turn, so opponent moves can be regarded as . Using ∅ this representation, an explicit winning strategy for 9 9 Hex identified by human × player and Go expert Jing Yang [214, 215] contains only around 700 nodes, while with game state representation, even with extensive knowledge computation, the state-of-the-art solver [149] finds a solution with around 1 million nodes 1.

G 5 5 Hex , × F1 , f1 G { } f1 , :c3 F ∅ f , f 2 , { 2 3} f a , b , b : d 2 , { 5 4 5} 4 F1 f c , c , d , d , d : b 3 , { 4 5 3 4 5} 4 F3 , f4, f5 f1 { } f d , d , e : b 4 , { 1 2 1} 2 f a , a , a , c , c : d 5 , { 1 2 3 1 2} 2 432 , 4-3-2 pattern B , bridge pattern

432 432

F2 F3

f2 f3 f4 f5

B B B B B B

Figure 2.5: Solution for 5 5 Hex by strategy decomposition. Each f edge is a × mapping whose argument is a set of moves and output is a single response move. The graph can be further simplified by discarding all squared nodes and merging all subgame nodes sharing the same substrategy into one. Starred nodes are called branch nodes in [214].

2.2 Search and Learning Formulations

In this section, we review and discuss two types of formulations related to solving and playing games. We start with state-space and problem-space graphs, then go on to Markov Decision Processes (MDPs) and Alternating Markov Games (AMGs). The former ones are used in search methods [153]. The latter are adopted in learning

1Obtained using Benzene: https://github.com/cgao3/benzene-vanilla-cmake

18 start state goal state 2 8 3 1 2 3

1 4 8 4 up 7 6 5 7 6 5 down left right

up down left right ⇒

2 8 3 2 3 2 8 3 2 8 3

1 6 4 1 8 4 1 4 1 4

7 5 7 6 5 7 6 5 7 6 5

Figure 2.6: The 8-puzzle sliding tile problem [153]: Given an initial configuration and rules for moving tiles, the objective is to find a sequence of moves to reach the goal state. Such a puzzle can be directly translated into a state-space graph, which is an OR graph where every vertex is a decision node for the problem-solver. algorithms [123, 192].

2.2.1 State and Problem Space Graphs

A game of interest, for example, a single-player puzzle problem as in Figure 2.6 or the game of Hex, is described by (1) starting state, (2) rules of change, i.e., permitted actions; and (3) desired goals. This description can be translated, in a straightforward manner, into a graph formally known as a state-space graph [153]. Each state refers to a complete description of a configuration of the game. The whole state-space graph can be too large to be explicitly represented. An efficient problem-solver therefore aspires to generate only a small portion of the whole state- space graph to find the desired solution. In Figure 2.6, one important characteristic is that every state is a decision point for the problem-solver, and each edge from a state results in another state that is also a decision point for the problem-solver; state-space graphs possessing only one decision maker are called OR Graphs [153]. For two-player alternate-turn games such as the game of Hex, not all nodes belong to the same decision-maker, as shown in Figure 2.7. In this case, the effect of an action for a problem-solver must consider all opponent responses. Such graphs are referred to as AND/OR Graphs [153]. For OR graphs, the pursuit solution- object is a solution path which represents a sequence of actions. For example, a solution to the puzzle in Figure 2.6 is a sequence of moves to achieve the goal state. For AND/OR graphs, a solution for the player at root node is an AND/OR solution graph satisfying the following properties: (1) must contain the root node; (2) S S all terminal nodes of (those without any successor) are known and the same; (3) S 19 us

a1 a2

opponent

us

a3 a4 a5 a6 a7 a8 a9 a10

Figure 2.7: The state-space graph for two-player alternate-turn games is an AND/OR graph, where AND nodes are decision points controlled by an adversary opponent. for any AND node s in , all successors of s are in ; (4) for any OR node s in , S S S one successor of s must be in . S We have seen in Section 2.1 that it could be more space efficient if a node is used to represent a solution to a subgame, not just a single state. AND/OR graphs can express not just state-space, but also problem-space graphs. The underlying problem-solving methodology is viewing solving as a series of problem-reduction steps. In this setting, each node of an AND/OR graph represents a problem or subproblem. An AND node is a subproblem containing multiple subproblems that must all be solved, whereas an OR node can be solved by solving any successor. For example, solution to the Tower of Hanoi problem can be found by suc- cessively reducing the problem into a conjunction of three subproblems. Such a reduction oriented solution is a pure AND graph; it is identified from a problem space graph from an AND/OR graph shown in Figure 2.8. Problem-solver that excels at such a reasoning (i.e., by means-ends-analysis [141]) will eventually choose the leftmost branch as this is the only feasible solution graph exhibited in Figure 2.8. On the other hand, being a single-player puzzle game, Tower of Hanoi may also be directly modeled into a state-space OR graph — the root node will contain two edges representing that two actions choices: moving disk D1 to D2 or D3. This for- mulation, however, fail to leverage the well-ordered dependencies between subgoals, and finding a solution in this OR graph thus becomes more difficult [153]. For two-player games, recall that in state-space graph, each enumerating option is an edge-represented legal move from its corresponding game state, in problem-space graph, each node G represents a strategy, and each edge represents one step reduction for G. For example, suppose for a Hex state s there is a Black winning strategy 20 P (D1→n ;1,2,3)

······

P (D;1,2,3) P (D n-1→ n ;2,1,3) P (D2→n;1, 2,3) P (D1→n-1;1,3,2) P (D 1→n-1;2, 1,3) P (D1→n-2;1,3,2) P (D1→n-2;2, 1,3) P (D1;1, 3,2) P (D1;2, 1,3)

Figure 2.8: Tower of Hanoi problem space graph. P (D1→n; 1, 2, 3) represents the problem to move n disks from tower 1 to 3 using 2. Leftmost branch is actually a recursive solution to the problem. Identifying such a solution relies on the availabil- ity of well-ordered subgoals and purposeful identification on the relations between parent goal and subgoals; such a process is known as means-ends analysis [141]. It is also possible to formulate this problem as a state-space OR graph; however, finding a solution becomes more difficult. Solution graph from the leftmost branch is actually a tree that grows exponentially; this is simply because for n disks, no solution required less than O(2n) steps exists.

G, each edge from G can be a pair (D, c), where D is subset legal White moves at G, and c is a Black counter move for all moves in D. Figure 2.9 schematically illustrates one step strategy decomposition. G0 is the original game. There are n U mappings f1, f2, . . . , fn, where f : 2 C. That is, U is the set of legal moves → in G0 and C is the set of counter moves. The total number of such mappings, i.e., n, can be 2|U| C . However, most of these edges may be ignored by the problem- · | | solver, and from the state-space graph we know, it is certainly true that there exists a solution by exploring at most U C edges. The practical merit of searching in | |×| | the problem-space as defined in Figure 2.9, similar to that of Figure 2.7, depends critically on how to ensure promising edge are preferentially explored.

2.2.2 Markov Decision Processes and Alternating Markov Games

A finite Markov Decision Process(MDP) [20, 158] can be expressed by a tuple ( , , , , γ) where is a finite set of states, a set of actions, : R S A R P S A R S × A → a reward function, and denotes the probabilistic transitions among states. The P Markov property states that the agent at time t is independent from all other pre- vious states and actions, i.e., Pr(st+1 s1, a1, . . . , st, at) = Pr(st+1 st, at). The envi- | |

21 G0

F1 F2 F3

f1 f2 f3 f4 f5

G1 G2 G3 G4 G5 G6 G7 G8 G5 G6

Figure 2.9: An illustration of searching for solution graph from a problem-space graph. For G0. F1, F2 and F3 are alternative options; each of them represents a collection of functions with iargument(fi) = U, i.e., covers all legal opponent ∪ moves. Thus, each F node represents a solution to a set covering problem, and it is unnecessary to explicitly list all F nodes during search. Note that G0 could also represent a first-player winning strategy by letting the argument set of each function f be empty. ronment is fully described by transition probabilities and a reward function:

0 0 (s s, a) = Pr(st+1 = s st = s, at = a) P | | 0 0 (s s, a) = E [rt+1 st = s, st+1 = s , at = a] R | | The objective is to learn a policy to maximize the expected discounted cumulative reward: ∞ X t−k E γ Rt−k+1, t=k where γ is a discounting factor 0 < γ 1 that controls the contribution of long- ≤ term and short-term rewards. A policy π(a s) is a function that maps any state s | to a probability distribution over all available actions at s, π(a s) = Pr(a s) s | | ∀ ∈ , a (s). Many single agent games, such as the 8-puzzle in Figure 2.6, can be S ∀ ∈ A viewed as an MDP where transitions are deterministic. For this reason, analogous to shortest path, finding an optimal policy for an arbitrary MDP is sometimes called stochastic shortest-path problem [25]. MDPs are single-agent games, while we are primarily interested in two-player games. Extending ideas from MDPs to two-player zero-sum alternate-turn games results in Alternating Markov Games (AMGs) [123]. An alternating Markov game is a tuple ( 1, 2, 1, 2, , , γ) where 1 and 2 are respectively this and other S S A A R P S S agent states, and 1 and 2 are the actions at each player’s states. MDPs can be A A viewed as a special case of AMGs where 2 = 0. |S | 22 bBanditandit

Alternating MDP Markov Games

Repeated Games Deterministic Alternating Markov Games

Stochastic Games

Figure 2.10: MDPs and AMGs are all special cases of Stochastic Games [179]. Repeated Games [188] contain one state, whereas MDPs contain one player. The game of Hex belongs to Deterministic Alternating Markov Games as for each action, the next resulting state is deterministic. Bandit games [117] are the intersection of MDPs and Repeated Games as they contain single player with one state but with repeated trials.

Figure 2.10 shows the relations between various learning formulations. Although some reinforcement learning algorithms developed for MDPs can be applied to two- player games [123], strictly speaking, the game of Hex has two players but no stochastic state transitions, thus it should be classified as a Deterministic Alter- nating Markov Game (DAMG).

2.3 Techniques for Strategy Discovery

In this section we review best first search algorithms for AND/OR graphs and discuss the connection between algorithmic variants. We then review reinforcement learning and deep neural networks. Finally, we discuss algorithms that combine learning and search, with a particular focus on Monte Carlo Tree Search. In these discussions, we comment on how these methods could be applied to the game of Hex and how they can be helpful to our goal of devising better playing and solving Hex algorithms.

2.3.1 Informed Best-first Search

The state-space graph can be too large to be explicitly represented. For example, 11 11 Hex contains about 2.38 1056 states [35], which is far beyond the capacity of × × computer storage. Instead of creating the entire graph, rules of the game are given 23 for creating the graph incrementally. The hidden, complete graph is conventionally refereed to as the implicit graph; the transparent, partial subgraph that an algorithm works on is called the explicit graph; they are noted as G and G0 respectively. Leaf nodes with no successors in G0 are called frontier or tip nodes. A tip node whose value is immediately known is called terminal and can be solvable or unsolvable — in the context of cost minimization, they are assigned values 0 and , respectively. ∞ The process of generating successors for a non-terminal tip node is called expansion. Starting from a single node, a search paradigm enlarges G0 gradually by a se- quence of node expansions. Suppose the merit of each node in G0 can be assessed by a value function. To find a solution faster, it is natural to select the node with the most promising evaluation for expansion, and after each node expansion, relevant assessments must be updated. Following such an idea, a general best-first search (GBFS) has been depicted in [153] for searching solution-graph in general AND/OR graphs. Yet, one regularity for the AND/OR graph of a two-player alternate-turn game is that AND and OR nodes appear alternately in layers. Therefore, for a given recursive cost-scheme Ψ, it is possible to define a pair of intermingled value functions φ(n), δ(n) recursively, as in Eq. (2.3). { }  h(n) n is non-terminal tip node  0 n is terminal winning state φ(n) =  n is terminal losing state ∞  min (c(n, nj) + δ(nj)) n is AND node nj ∈successor(n) (2.3)  h¯(n) n is non-terminal tip node   n is terminal winning state δ(n) = ∞ 0 n is terminal losing state   Ψnj ∈successor(n)(c(n, nj) + φ(nj)) n is OR node If we interpret h(n) and h¯(n) as the difficulty of proving n is winning and losing respectively (with respect to the player to play at n), using P for Ψ, letting h = h¯ = 1 and all edge cost c(n, nj) = 0, GBFS with Eq. (2.3) becomes proof number search (PNS) [2], except that PNS was originally defined on trees. Figure 2.11 shows an example for PNS, where node j is to be selected for expansion. After that, the ancestor nodes of j will be updated due to the change in j. If replacing Ψ as max in Equation (2.3), the goal becomes a minimax strategy. To indicate that a game is zero-sum, it is natural to let h(n) and h¯(n) to be additive

24 1,2 a

2,1 0,∞ b c

0,∞ 1,2 ∞ ,0 1,1 d e f g

1,1 ,0∞ 1,1 ∞0, 1,1 h i j k l

Figure 2.11: PNS example graph. Each node has a pair of evaluations (φ, δ), com- puted bottom up. A bold edge indicates a link where the minimum selection is made at each node. In PNS, all edge costs are 0, and a recursive sum-cost is used. PNS often uses 1, 1 for h and h¯, therefore φ(n) and δ(n) can be interpreted as the min- { } imum number of leaf nodes needed to prove and disprove node n (with respective to the player at n), respectively.

inverses by restricting h(n) + h¯(n) = 0. In games literature, such a realization of GBFS is called best-first minimax, and its practical performance in game-playing has been studied in Korf et al. [113]. Given limited partial exploration of the implicit graph G0, goodness of root eval- uation depends on the quality of evaluations by h. If further assuming that after depth d, evaluations given by h must agree with true value of the game state (i.e., winning is always associated with a value larger than 0) and all edge costs are 0, the remaining task is how to discover “optimal” root value by ignoring parts of the graph without bypassing the optimal. Algorithms like αβ pruning [109] conduct a depth-first search while maintaining two bounds α and β to prune further branches if it is guaranteed that better values will never be discovered from these lines of search. SSS* [156, 164, 190] achieves a similar goal but using a best-first mechanism that runs multiple traversals to obtain sharper bounds for pruning. However, the principal drawback of depth-first search is that decisions made by problem solver are irrevocable, i.e., a second alternative will never be checked until work on the first move is done. Such a feature makes the practical merits of αβ pruning depend on the quality of move ordering [1, 170]. Iterative deepening modifies the depth-first behavior by doing multiple depth-limited search. Other enhancements of αβ prun- ing reexamine the assumption that evaluation below depth d is exact, proposing a number of techniques, e.g., forward pruning [52], quiescence search [17], to im- prove the accuracy of leaf evaluations. Algorithms following similar lines of research have produced superhuman playing programs in various games like Othello [135], 25 chess [40] and checkers [171], but less successful in some other games (e.g., Go and Hex) where simple and reliable evaluation function is difficult to construct [32]. In- deed, because of the heuristic nature of evaluation function h, in some domains, αβ based minimax search may even produce pathological behavior, i.e., deeper explo- ration produces worse evaluations [138, 139, 140]. Pathology can occur because optimizing upon heuristic evaluation could only compound estimation error [152]; in this sense, the high branching factors in Go and Hex also make them less suited to moderately accurate evaluation based minimax search. The development of PNS [4] was originally inspired by conspiracy number search [129], with the motivation to design algorithms specialized to solve games, especially in the presence of deep and narrow plays before reaching terminal nodes — where techniques based on αβ pruning fixed-depth search become inadequate. PNS in its naive form initializes h(n), h¯(n) as (1, 1) and assumes all edge costs are 0. Other { } enhancements try to establish more informative initialization of proof and disproof numbers at each newly created node [33, 34, 212]. Depth-first proof number search (DFPN) [136] is a PNS variant that adopts two thresholds to avoid unnecessary traversal of the search tree — it conducts Multiple Iterative Deepening (MID) until these thresholds are violated. DFPN has the same behavior as PNS in AND/OR trees, but exhibits lower memory footprint at the expense of re-expansion; it is also complete in directed acyclic graphs [107]. DFPN is often more applicable than PNS, and can often be improved by incorporating various general or game-dependent tech- niques. Yoshizoe et al. 
[216] introduced λ search to DFPN to solve the capturing problems in Go; threshold controlling and source node detection [105] were intro- duced to DFPN to deal with a variety of issues from Tsume-Shogi. We show the relations between several representative algorithms in a hierarchical diagram of Figure 2.12. How they are connected is explained in Appendix A.4. These algorithms assume the AND/OR graph is a DAG, i.e., no directed loops exist. For general single agent finite MDPs, because of the existence of uncertainty nodes, the underlying state-space can also be modeled as an AND/OR graph, though these graphs may contain directed loops. A variant of AO*, namely LAO* [76], has been proposed for such cases: it sidesteps the problem caused by directed loop using a value iteration [158] procedure that will be discussed in the next section.

26 GBFS*

recursive cost scheme OR graph, sum-cost

AO* A* symmetric f1 and f2, symmetric f1 and max-cost f2, sum-cost

PNS* Best-first Minimax edge cost 0 except those to terminal with 1, αβ pruning is with AND/OR tree fixed-depth, exact leaf PNS evaluation, edge cost 0; it is dfs with backtrack- depth-first implementa- ing, does not belong to tion with thresholds best-first search family. SSS* is best-first, but DFPN assumes fixed-depth, ex- act evaluation the same as αβ

Figure 2.12: Relation of several search algorithms. AO* is a variant of GBFS in AND/OR graphs, while A* is for OR graph.

2.3.2 Reinforcement Learning

Unlike search algorithms that aspire to explore a small portion of a space graph, reinforcement learning (RL) methods aim to learn the true value of each node from experience. In MDPs, the basis for RL is a set of Bellman Equations. For a given policy π, let the value for state s under π be vπ(s). The Bellman equation is defined as follows. X X 0 0 0 vπ(s) = π(s, a) Pr(s s, a)(r(s, a, s ) + γvπ(s )), (2.4) | a s0 where Pr(s0 s, a) is the transition function and r(s, a, s0) is the reward of taking a at | s, leading to s0. It is also popular to use an action-value function, defined likewise:

X 0 0 0 qπ(s, a) = Pr(s s, a)(r(s, a, s ) + γvπ(s )) (2.5) | s0

These Bellman equations are the basis for policy evaluation [158], which com- putes the true value function for a given π. The Optimal Bellman Equation is a recursive relation for the optimal policy:

X 0  0 0  v∗(s) = max Pr(s s, a) r(s, a, s ) + γv∗(s ) , (2.6) a | s0 X 0 0 0 0 q∗(s, a) = Pr(s s, a)(r(s, a, s ) + γ max q∗(s , a )). (2.7) | a0 s0

27 Given v explicitly stores the value of each state, a value iteration procedure derived from the optimal Bellman equation converges to optimal [20]. Alternatively, it is possible to define a policy iteration [27, 94] by combining policy evaluation by Equation (2.4) and policy improvement due to (2.6). In Alternating Markov games, for RL algorithms, because there are two players, the policy evaluation must involve two policies, i.e., π1 and π2. Bellman equations for policy evaluation are as follows [123]:

( P P 0 0 0 0 vπ1 (s) = a π1(a s) s0 Pr(s s, a)(r(s, a, s ) + γvπ2 (s )), s 1 and s 2 P | P 0| 0 0 ∈ S 0 ∈ S vπ (s) = π2(a s) 0 Pr(s s, a)(r(s, a, s ) + γvπ (s )), s 2 and s 1 2 a | s | 1 ∈ S ∈ S (2.8) Action-value functions can also be defined in the same fashion [123]:

( P 0 0 P 0 0 0 0 0 qπ1 (s, a) = s0 Pr(s s, a)(r(s, a, s ) + γ a0 π2(a s )qπ2 (s , a )), s 1 and s 2 P 0| 0 P 0| 0 0 0 ∈ S 0 ∈ S qπ (s, a) = 0 Pr(s s, a)(r(s, a, s ) + γ 0 π1(a s )qπ (s , a )), s 2 and s 1 2 s | a | 1 ∈ S ∈ S (2.9)

Let π1 be the max player and π2 be the min player. Assuming π2 is an optimal counter policy with respect to π1, we may rewrite the above equation as follows:

 X 0 0 X 00 0 0 qπ1 (s, a) = Pr(s s, a) r(s, a, s ) + γ min Pr(s s , a ) | a0 | s0 s00 " #  0 0 00 X 00 00 00 00 (2.10) r(s , a , s ) + π1(a s )qπ (s , a ) , | 1 a00 0 00 where s 1, s 2 and s 1. ∈ S ∈ S ∈ S π2 is replaced with a min operator, because π1 is fixed and the problem reduces to a single agent MDP where an agent tries to minimize the received rewards.

Assuming that states in 1 belong to the max player, optimal Bellman equations S can be expressed as:

( ∗ P 0 0 ∗ 0 0 v (s) = maxa s0 p(s s, a)(r(s, a, s ) + γv (s )), s 1 and s 2, ∗ P 0 | 0 ∗ 0 ∈ S 0 ∈ S (2.11) v (s) = mina 0 p(s s, a)(r(s, a, s ) + γv (s )), s 2 and s 1. s | ∈ S ∈ S A value iteration algorithm according to this minimax recursion converges to opti- mal [46]. However, because the policy evaluation in AMGs consists of two policies

π1, π2, the policy iteration, which alternates between policy evaluation and policy improvement, can have four formats [45, 123]. The difference between AMGs and MDPs is reflected in finding solutions using learning programming (LP): a finite MDP can be solved by LP formulation and then computing optimal solution in 28 polynomial time w.r.t the number of states [123], whereas no LP solution is known for AMGs [46]. In the earlier literature, the policy and value iterations above, either for MDPs or AMGs, are called Dynamic Programming (DP) [94] algorithms. Implementation of these methods requires an explicit knowledge of Pr(s0 s, a). By contrast, most | RL algorithms learn by assuming that the precise environment model is unknown, thus learning has to be done by repeatedly exploring the environment. Such algo- rithms interleave policy evaluation and policy improvement, and can be viewed as generalized policy iteration [192]. To make RL algorithms practically useful, instead of explicitly representing every state, a parameterized value function vθ or qθ can be used to approximate the tabular function and generalize across states. In this way, RL algorithm such as Q-learning and SARSA can be extended to work with parameterized approximated functions (called function approximators), though they may diverge [192]. A second approach is to directly use function approximator to represent policies and then optimize parameter weights by following the observed reward signals. For example, policy gradient methods [193] learn by iteratively adjusting a parameterized policy with respect to an estimated gradient. These methods have stable convergence in practice [155]. We provide a brief taxonomy of RL algorithms as follows. Depending on whether the agent is aware of the transition model or not, RL algorithms can be divided into model-free and model-based; the former can be further classified into three cate- gories: policy-based, value-based, and actor-critic [192]. A simple policy gradient algorithm is REINFORCE [211]. Model-based RL can be divided into three types. DP algorithms require an explicit model with exact representation for every possible state. Search-based RL only requires rules for generating the next state after each action; they can rely on a function approximator for generalization. The third type tries learn an approximately correct model from experience; one example of such an approach is Dyna [191]. Both value and policy-based RL are well-studied in MDPs. Two player alternate turn zero-sum games are AMGs not MDPs [123]. In the literature, with a notion of self-play policy, standard reinforcement learning methods developed in MDPs have been applied to AMGs [180, 198] by simply negating the reward signal of the opponent’s turn. However, since they do have fundamental differences, a careful

29 examination of these lead to better RL methods for two-player games, as we will see in the next chapter.

2.3.3 Deep Neural Networks

We have seen from the previous section that, when the state-space is large, prac- tically useful RL algorithms have to rely on effective function approximator for generalizing across all states. Exploring only a subset of examples for grasping a generally useful concept is exactly the promise of machine learning, whose challenge is to find, by a process of training on some data, an approximately correct target function that is able to generalize across unseen inputs. A fundamental problem is how to represent inputs in a convenient format that is easy to learn. Using neural networks for representation learning or feature learn- ing [22] allows a system to automatically discover appropriate features from raw data. In particular, deep learning with many layered neural networks [21, 118] has been shown to be effective in practice; they exploit the composition hierarchies of natural signals. The study of neural networks dates back to 1940s–1960s [175]. Convolutional neural networks (CNNs) have been shown to be particularly effective for learning. The core idea of CNNs is to use filters (or kernels) for transforming information from one layer to next. Such an idea was proposed in 1970s [61], and became well-known after LeCun [119]. Figure 2.13 illustrates a convolution opera- tor for a 2D input. Each filter (kernel) contains a vector of weights, applying which step by step horizontally and vertically using the same filter transforms the original picture into a new image. Here the filter is with size 3 3; the input image has × an extra border padding with 0, therefore the resultant image remains the same size. More generally, for a W W image, suppose the padding is P , convolution × size is F F , stride is S, then the resulting image has dimension W 0 W 0 where × × 0 W +2P −F W = S + 1. A fundamental problem for deep neural networks is how to efficiently update the connection weights, given the observed difference between the predicted output and the labelled true result. Backpropagation [37, 54, 55, 103, 122, 166, 210] has become a standard algorithm for training feedforward networks. Due to computational and data acquisition limits, training many-layered net- works was infeasible before the use of GPUs [144]. In 2012, the record-breaking winner [114] (AlexNet) of the ImageNet classification contest [51] popularized the

30 (4 x 0) (0 x 0) Center element of the kernel is placed over the (0 x 0) source pixel. The source pixel is then replaced (0 x 0) with a weighted sum of itself and nearby pixels. (0 x 1) (0 x 1) Source pixel 0 0 (0 x 0) 0 0 0 (0 x 1) 0 0 1 0 0 1 1 1 0 + (-4 x 2) 0 2 1 1 -8 0 1 2 1 0 2 2 2 0 1 2 1 0 1 2 0 0 0 0 0 1 1 4 1 1 0 0 0 0 0 1 1 0 0 1 1 0 -44 0 0 -8

Convolution kernel (emboss)

New pixel value (destination pixel)

Figure 2.13: Illustration of convolution operation in 2D image. use of deep neural nets implemented with GPUs in the computer vision community. The empirical success suggests that neural network depth is critical for achieving high performance. The accompanied problem is that network training becomes more difficult with increasing depth due to the phenomena of exploding and van- ishing gradients. He et al. [84] proposed Residual Net (ResNet) that uses short-cut connections to tackle the gradient vanishing/exploding problem, making it possible to train neural nets with over 100 layers. In summary, research in deep learning are mostly concentrated on the following aspects:

Architecture. The choice of a neural net architecture can be critical to deep • learning applications. For a given application, a suitable CNN architecture may be found by many trials and novel design.

Optimization. In practice, backpropagation is usually implemented with Stochas- • tic Gradient Descent (SGD) [31] because the dataset is often too large to be loaded as a whole.

In RL, there is a long history of using neural nets as function approximators [27, 198], because practical RL relies on expressive function approximation to generalize learned knowledge to unseen states and many layered neural nets can approximate n arbitrary continuous functions on compact subsets of R [93]. Revived interest in RL methods with neural nets as function approximators is partly due to faster training methods on GPUs. Deep Reinforcement Learning (DRL) combines the progress in 31 deep learning and reinforcement learning, leading to improvements in playing Atari games [132, 133, 173] and many continuous control applications [120, 176, 177, 183, 208]. In the game of Hex, another goal for neural architecture design is to let a model obtained on some board size N N transfer to other board sizes. Transfer Learn- × ing [146, 161, 217] studies how to transfer learned knowledge from a source domain

S to a target domain T . Given a neural net model trained source domain data, D D common transfer learning scenarios [217] are as follows:

Frozen. Trained neurons from a source task are copied to the target network • and kept fixed when training on the new dataset for the target task.

Fine-tuning. The target network copies neurons from the source task, then • the whole network is optimized as usual.

We investigate transfer learning for Hex in Chapter 4.5.

2.3.4 Combining Learning and Search: Monte Carlo Tree Search

Monte Carlo Tree Search (MCTS) [47] has led impressive progress in Go [58, 180], Hex [11, 96], and many sequential decision making problems [36]. Like other best- search algorithms, MCTS grows a selective tree by concentrating on nodes with better value estimates. Each search iteration consists of four distinct phases. (1) The in-tree phase traverses the current tree from the root until a leaf is reached. Child nodes are selected by a function such as UCT [111], which balances exploration and exploitation. (2) The expansion phase expands a leaf node, typically after its visit count has reached an expansion threshold. (3) Leaf nodes are evaluated, for example by randomized rollouts. (4) Results are backpropagated in the tree. While MCTS works without any game specific knowledge, its performance can often be drastically improved by incorporating domain knowledge [73], better child node selection and leaf evaluation. Many improvements of MCTS has been proposed. UCT search [111] uses upper confidence bound in the selection phase, striking a balance between exploration and exploitation. Gelly et al. [74] propose Rapid Action Value Estimation (RAVE) heuristic, which leads to a faster accumulation of statistics and better performance in games where move values are only slightly influenced by their playing order.

32 MCTS based program Fuego [58] became the first computer player winning a game against a 9-dan professional in 9 9 Go. × In the game of Go, several research groups showed that deep learning can in- deed provide high quality expert move predictions [44, 125, 200]. This strength was then used in conjunction with MCTS, leading to the Policy Value MCTS (PV- MCTS) [180, 184] that uses a deep policy net to set prior move probabilities and a value network for estimating the value of leaf nodes. The resulting AlphaGo pro- gram [180] convincingly beat top human professional players [180] in 19 19 Go. × AlphaGo Zero [184] adopts the same search framework, but totally abandoned roll- out; it yielded stronger playing strength as because better quality neural nets were obtained by an iterated training that optimizes the neural net based on data gener- ated by PV-MCTS with previous neural net models. These successes were achieved on advanced hardware TPU [100] for fast neural net inference and large scale parallel computation for data generation: AlphaZero used 5000 TPUs during training [181]. It is a research question how to further improve the efficiency of PV-MCTS on regular hardware and obtain large number of high quality expert games with limited computation resources. We present our result in improving AlphaZero style learning in Chapter 4.

2.4 Hex Specific Research

In this section we review past research that are specific to the game of Hex.

2.4.1 Complexity of Hex

Determining the winner for arbitrary Hex position has been shown to be PSPACE- complete [60, 162]; therefore, developing an automatic algorithm for solving arbi- trary Hex position with polynomial time complexity is unlikely unless P=PSPACE, consequently P=NP. In practice, what we are mostly interested in is the number of existing states for a given board size. For N N Hex, the total number of reachable states is upper bounded by × P N 2  x+y+z=N 2∧(x=y∨x=y+1) x,y,z , i.e., enumerating all possibilities by fillin the board with x black stones, y white stones, and z empty cells (by the rule of Hex x = y + 1 or x = y always holds assuming Black plays first). Consider also that each board configuration has a symmetry by 180-degree rotation, the upper bound should be 56 halved. Therefore, when N = 11, such an estimation gives u 2.3835 10 [35]. × 33 However, the estimation method used above would count some states where a win/loss is already decided many times, whereas to form a winning chain at least 2N 1 stones have to played. This gives a lower bound on DAG − X  N 2  x, y, z x+y+z=N 2∧(x=y∨x=y+1)∧x+y≤2N−1

. For N = 11 we obtain about 3.06 1029. The estimation from [38] shows that for × N N Hex, the minimum number of steps required to win is at least N + N/4 . × d e This provides us a way to estimate the lower bound for minimum size of state-space where there is a solution graph: 1.39 1021 for N = 11. Table 2.1 summarizes the × results for N = 9, 10, 11 and 13. These estimates imply that techniques for pruning the state-space must be employed, otherwise solving Hex by exhaustive search even on small board sizes such as 9 9 can be unachievable. As in Chapter 2.1, if regarding × the state-space is a tree of size bd, then a proof tree is of size bd/2. The numbers in Table 2.1 further imply that the representing a solution can also be difficult as the board size increases if no state pruning or state-aggregation techniques are employed.

Table 2.1: Approximate number of estimated states for N N Hex, N = 9, 10, 11, 13. × 9 9 10 10 11 11 13 13 × × × × Upper Bound 1037 1046 1056 1079 Lower Bound 1022 1025 1029 1037 Lower Bound (Sol) 1014 1017 1021 1025

2.4.2 Graph Properties and Inferior Cell Analysis

A graph can be constructed from either Black or White’s perspective. For instance, a Black Hex graph is created by treating two black borders as two distinct vertices, every uncolored cell is a vertex with edges connecting neighboring cells; a black stone on a location makes all pairs of neighboring cells become adjacent after the vertex on which Black just played; such a procedure is in short called vertex implosion — a white stone to a location deletes the corresponding vertex and all its edges. Black wins if the two black border vertices become adjacent; conversely, White wins if the two black border nodes are in two independent connected components. A White graph can be built in the same fashion. Figure 2.14 demonstrates an example from [206]. In Black’s Hex graph, Black is called the Short player and White the

34 Figure 2.14: An example Hex position and the corresponding graphs for Black and White players. Image from [206]

Figure 2.15: Dead cell patterns in Hex from [87]. Each unoccupied cell is dead.

Cut player; in White’s graph, their roles are exchanged. If generalizing the concept to arbitrary graph, the resulting game is called Shannon vertex-switching game or Generalized Hex [15, 60, 99]. As suggested by Berge [23, 81], it is possible to prune some moves from a given game state by analyzing the graphical properties of Hex. Techniques for proving that a move can be discarded without changing the optimal value of a game state are collectively called Inferior Cell Analysis (ICA). Depending on how they prune cells, certain formal terminologies on a cell or a set of cells can defined, e.g., dead,vulnerable captured vulnerable-by-capture and capture-domination cells. See [87]. Figure 2.15 shows several examples of dead cells. While deciding an unoccupied cell is dead or not in arbitrary graph is NP- complete [28], identifying and proving local dead-cell patterns can be useful in au- tomated playing and solving Hex. See Figure 2.15 for some examples. A dead cell can be filled-in with any color without altering game theoretic value of a Hex posi- tion; similarly, a Black (or White) captured set C can also be filled-in with Black (or White) stones. The process of artificially filling the board with either black or white stones while ensuring game-theoretic value unchanged is called fillin in Hex. Besides dead and captured cells, another type of fillin by Henderson et al. [87] is called permanently-inferior. Pruning by fillin is strong in the sense that filled cells are removed from con- sideration by any continuation of a game position. Indeed, following a set of dead, 35 Figure 2.16: An example of Black captured-region-by-chain-decomposition from [87]; cells inside of such a region can be filled-in. captured and permanently-inferior patterns, Henderson et al [91] described an ex- plicit winning strategy using N+1 handicap cells on any N N board. The strategy d 6 e × works by placing black stones in the row one cell away to border and apply fillin to fill the row next to boarder permanently, then the resulting irregular Hex board can be solved via a pairing strategy from Shannon [72]. A weaker type of pruning is to remove cells from given game state if a provable better one exists. A reversible move can be bypassed while a dominated move can be pruned given at least one dominating move retains [24]. Using these concepts, captured-reversible and captured-domination have been defined [87, 90]. Other dom- ination classes explored in [87] include induced-path and neighborhood domination. Additionally, board decomposition [87] are useful for fillin or pruning, given that a connection strategy in the relevant region is known. Figure 2.16 shows an exam- ple of region fillin as a result of chain composition. For any given board position, it seems that many cells can be either filled-in or pruned by chain or star decompo- sition. However, the practical feasibility of these pruning relies on known explicit strategy for constructing interboundary connection. For instance, for N N Hex × where N 3 we have known that black acute corner opening is a loss [12, 18], ≥ it follows that the remaining board cells together with four borders form a chain decomposition with R containing all unoccupied cells, thus White captured-region- by-chain-decomposition. However, white fillin the whole board is meaningless as explicit strategy for preventing all black interboundary connection is missing. 
A bottom-up connection strategy computation can be used for carrying out the prun- ing or fillin of such kind in certain cases.

36 2.4.3 Bottom Up Connection Strategy Computation

We have seen in Chapter 2.1 that strategy for a game can be decomposed and represented by subgame strategies. Anshelevich [6, 7] discovered that some connec- tion strategies for local subgames of Hex can be computed in a bottom-up manner. The algorithm H-search was proposed for building these strategies, which became a critical component in the 2000 ICGA champion player Hexy [5]. Two types of connection strategies are defined:

Virtual Connection (VC) — given two endpoints x, y and a set of empty cells • A,(x, A, y) is a Black VC if there is a way to connect x and y by only playing inside of A even if let White play first. That is, VCs represent second-player connection strategies. A is called carrier of the VC. Semi-virtual Connection (SC) — given two endpoints, x, y, a set of empty • cells A,(x, A, y) is Black SC if there exists a k A Black plays at k first, then ∈ (x, A k , y) is a Black VC. That is, SCs stand for first-player connection \{ } strategies; the first move k fulfilling a SC is called the key for that SC.

In a Hex graph, because a connected group of same-colored stones can be ab- stracted into one point, for any N N Hex from the empty board, there must be a × Black SC since it has been proved that there exists a first-player winning strategy. The PSPACE-complete complexity result, however, implies that finding such SCs for arbitrary Hex board size is difficult. Nevertheless, using the particular property of Hex we can compute some connection strategies in a bottom-up manner. To achieve this, H-Search uses some base cases and two combination rules.

1. Base Cases: any adjacent pair of same-colored endpoints form a base VC with carrier , i.e., ; any pair of same-colored endpoints with one empty cell ∅ between them forms a base SC, i.e., .

2. AND-rule: Two VCs (x1,A1, x2) and (x2,A2, x3), with ( x1 A1) ( x3 { } ∪ ∩ { } ∪ A2) = , can be combined to form a new SC if x2 is an empty cell or a VC if ∅ x2 has the same color as x1 and x3.

3. OR-rule: Two SCs (x1,A1, x2) and (x1,A2, x2), with A1 A2 = ,A1 = ∩ ∅ 6 ,A2 = , can be combined to form a new VC. The bridge pattern shown in ∅ 6 ∅ Figure 1.3 can be obtained by ORing the two base SCs sharing the same pair of endpoints. This rule also applies to more than two SCs sharing the same

37 pair of endpoints, provided that their carrier intersection is the empty set and none of them is the empty set.

Given a set of manually created base cases, H-Search automatically and repeat- edly applies the above OR- and AND- rules until no more connections can be found, resembling to automatic theorem proving [163]. The Apply OR Rule of H-Search is recursively defined [7] and can be expensive in practice since it tries to find new VCs by enumerating all combinations of semi-virtual connections sharing the same two endpoints. As noted in [87], a number of ways could be used to make H-search either find more connections or become faster in practice:

1. Enlarge the set of base connections. 2. Abandon superset carriers. 3. Allowing border to be midpoints. 4. ORing all SCs check: if ORing all SCs does not produce a VC indicate that ORing any subset of them cannot produce a new VC. 5. When appplying the OR-Rule, backtrack immediately if the intersection does not shrink (since it implies this SC is redundant). 6. Only check at most k combinations when applying the OR rule. 7. Limit the number of SCs and VCs stored for each pair of endpoints.

The last two schemes can reduce the number of connections that H-search can find, but, arguably, they bring more benefit than harm due to the improved speed [87]. Indeed, even without these, H-search does not guarantee to find all connection strategies, i.e., H-search is incomplete [7]. One simple example that H-search is incapable to solve is called braid, illustrated in Figure 2.17 (left). A solution by strategy decomposition is depicted in Figure 2.17 (right). The reason that H-search cannot find a SC between x and y via AND-rule is that all the VCs and SCs are intervened. The attainable VCs are (x, a, ), (x, 3, ), ∅ ∅ (y, 1, ), and (y, b, ); SCs are (a, b, key = 2, ), (x, b, key = 3, ), (y, a, key = 1, ), ∅ ∅ ∅ ∅ ∅ (a, b, key = 1, ), (a, b, key = 3, ). Applying AND-rule does not produce a SC ∅ ∅ between x and y simply because there is no two VCs to concatenate. By defining Partition Chains that can be computed in parallel with AND/OR rules, a Crossing Rule is proposed in [87] to solve braid-type cases. Although the improvement may lead H-search to find more connections, they do not make H-search complete [87]. Pawlewicz et al. [151] observed that computational 38 x A a : a ∅ 2 3 B 1 b 1 : b 2, 3, b : 1 { } { } y C D

(a) The braid example (b) Solve braid by decomposition

Figure 2.17: The braid example that H-search fails to discover a SC connecting to x and y (left). Either a or b could be the key to a SC. The right subfigure shows a decomposition represented solution: after playing at cell a, move 1 can be responded with b while any move in 2, 3, b can be answered by move 1. { }

efficiency is a major bottleneck for H-search, then described a more efficient data structure for storing and updating VCs and SCs. In particular, the OR rule is exponentially expensive with respect to the number of SCs between two endpoints, while hard-limiting it to small (e.g., 3 or 4) prevents H-search from finding many useful connections, Pawlewicz et al. [151] proposed semi-combiner and fast-semi- combiner for producing critical subsets of VCs. The source of incompleteness of H-search comes from the fact that its bottom-up combination rules are just a special treatment to achieve the goal of solving subgames by strategy decomposition. For the braid example, as in shown Figure 2.17b, the SC between x and y can be formed by two VCs (x a and a y), but the AND rule → → of H-Search failed to compute the SC. In general, as demonstrated in Figure 2.18, if we want to prove that there is a SC in position A for player P , by decomposition, it is equivalent to finding a move that will lead to a position where there is an VC. To prove position B has a VC for player P is to decompose the VC into several smaller VCs. Rather than using H-Search, in Chapter 5.4, we discuss the possibility of searching for decomposition-based solutions with the help of deep neural networks.

2.4.4 Iterative Knowledge Computation

Not only inferior cell analysis can help H-search [87, 151], connection strategies from H-search can also help to identify more fillin or inferior cells [87]. Therefore, for a given Hex game state, combining these two types of knowledge computation results an iterative procedure, sketched in Algorithm 1 [87]. Upon the termination of Algorithm 1, two possibilities exist: (1) the state-value 39 B VC?

SC? F1 F2

A f1 f2 f3 f4

B C D E F G H I J K L VC? VC? VC? VC? VC? VC? VC? VC? VC? VC? VC?

Figure 2.18: Searching for SC and VC by strategy decomposition. Finding there is a SC for player P at position A equals to finding a P move that will lead to a child position where there is a VC for P .

Algorithm 1: Iterative Knowledge Computation [87] Input: A board position s Result: Processed board position s0 1 For both players, compute dead, captured, permanently-inferior cells for fillin; do this step repeatedly until no new fillin can be found; 2 For the player to play, compute dead-reversible, captured-reversible and various domination cells for pruning; 3 For both players, run H-search on the fillin-reduced position; 4 Apply deduced connections, compute captured-region-by-chain-decomposition, dominated-region-by-chain-decomposition and dominated-set-by-star-decomposition; if new fillin is produced, go to step 1, otherwise exit; is decided, (2) the state-value is unknown but certain moves can be pruned. Such an iterative knowledge computation at each game state has been implemented in Benzene, available from 2, with visualization using HexGUI 3. See Figure 2.19. We will use this platform to conduct our experiments in later chapters.

2.4.5 Automated Player and Solver

In Hex, a seminal work for designing a computer player is due to Shannon [178], who proposed an electric-resistance based evaluation for playing the game. Research for other games (e.g., DeepBlue for chess [40]) have produced human-level playing machines. However, it has been shown that designing effective heuristic evaluations for Hex is difficult: Concepts used in many other games for feature extraction, such as material, balance, and mobility, are meaningless for Hex [205]. Evaluations

2https://github.com/cgao3/benzene-vanilla-cmake 3https://github.com/ryanbhayward/hexgui 40 Figure 2.19: Knowledge computation visualized via HexGUI. Black played stones: b8, f9 and e6. White played stones: c5, e4 and f2. Black to play. a9, b9, c9, d9 and e9 are Black fillin. Gray shaded cells are pruned due to mustplay; black filled cells with pink shading are captured; gray filled with shading are permanently-inferior; green: vulnerable; magenta: captured-reversible satisfying independence condition; yellow: dominated by various domination patterns. Only five cells remain to be considered after knowledge computation. were based on either measuring properties of the game graph, such as network resistance [178] and graph distance [204]. Table 2.2 summarizes some major computer programs developed for playing Hex. Equipped with graph distance evaluation, together with an advanced implementa- tion of αβ search, Queenbee [204] became the first program achieving novice level strength. Inspired by Shannon’s model, using electronic circuit resistance as an evaluation function in minimax search was used by Anshelevich [6, 7]; the resulting program Hexy [5] defeated Queenbee. Improved programs based on a similar idea include Six [79] and Wolve [13]. In all these programs, virtual connection computa- tion is vital to the quality of evaluation, and they all use a form of H-search [7] for computing VCs. The αβ player Wolve combines both ICA and VC to improve its heuristic evaluation function. Although the resistance-based evaluation can be improved by adding VC and ICA information, it is still frequently pathological, as it tends to favor fillin and dominated cells [87]. This problem was sidestepped by the development of Monte Carlo Tree Search for Hex [11], which evaluates a game position by simulating random playouts. The success of MCTS is due to its superior ability to focus the 41 Table 2.2: Evolution of Computer Programs for Playing Hex.

Year Program Brief Description 2000 Queenbee αβ with two-distance 2000 Hexy αβ with circuit resistance plus VC computation 2003 Six αβ with circuit resistance plus VC computation 2008 Wolve αβ with circuit resistance plus VC and ICA 2010 MoHex parallel MCTS with VC and ICA 2013 MoHex 2.0 parallel MCTS with stronger VC and ICA plus pattern- based playout search on promising nodes in the tree, by dynamically backing up the evaluations from random sequences of self-play. Enhancements of MCTS increase the quality of such biases by incorporating learned off-line prior knowledge [73]. MoHex 2.0 by Huang et al. [96] uses preferences learned from expert game records. The program is 300 Elo stronger than MoHex [96] on the 13 13 board size. Besides MoHex 2.0 × and Wolve, recent strong computer players, such as DeepHex [150] and Ezo [196], all use the VC and ICA computation available from the Benzene codebase. As in Table 2.2, the discovery of H-search marked the beginning of successful application of αβ search. The evaluation deficiency is only partially addressed by VC computation. There are also work trying to improve the evaluation function using more complicated network models [196], the resulting player failed to defeat MoHex 2.0 [83]. Despite the playing strength of the recent programs in Table 2.2, a mathematical guarantee on measuring the quality of their returned moves is lacking. While statistical models, such Elo rating [56], have been developed for ranking players, these models do not provide explanation how close a player’s move is to optimal play, and in practice using a finite number of self-players do suffer from problems such as Elo inflation [159]. Instead of heuristically playing well, the other research direction is developing programs for solving Hex. Combining some ICA and VC computation, Hayward et al. [82] showed that a DFS on state-space graph can solve arbitrary 7 7 Hex × openings in reasonable time. After extending inferior cell analysis and enhanc- ing VC computation, all 8x8 openings were solved by Henderson et al. [89] via DFS. Switching the search to PNS made the solving all 8 8 openings at least twice × faster [87]. To date, all 9 9 openings have been solved using parallel PNS with × stronger VC [149, 151].

42 Given the success of deep neural networks for playing, a natural research direc- tion is to further improve MoHex 2.0 by using deep neural networks for improving prior knowledge. Similarly, a PNS-based solving algorithm could incorporate deep neural networks for better node expansion. We shall present the results of improving PNS using deep neural networks in Chapter 5.

43 Chapter 3

Supervised Learning and Policy Gradient Reinforcement Learning in Hex

This chapter contains content from “Move prediction using Deep CNNs in Hex” [63] and “Adversarial Policy Gradient for Alternating Markov Games” [64].

3.1 Supervised Learning with Deep CNNs for Move Prediction

3.1.1 Background

The breakthrough [44, 125, 180, 200] in has shown that using deep convolutional neural networks (CNN) [114] for representation and learning game knowledge works better than earlier approaches — they showed deep CNNs can produce very high prediction accuracy of expert moves after training on professional game records. Motivated by the success in Go, we investigate how to use deep convolutional neural nets to represent and learn knowledge in Hex. The goal is use this knowledge for better for computer Hex programs— a task that poses the following challenges:

Representation of input. Compact representation is essential to any learning • model. Typically, feeding more input features to a neural net yields better prediction accuracy. However, since trained neural nets will eventually be used in search, evaluation needs to be sufficiently fast. Deep networks usually have better learning capacity [84], but the development of a successful architecture often requires many trials.

44 Sophistication. Training data obtained from either human or computer game • records is usually imperfect. Appropriately dealing with imperfect training data is essential.

Search. Combining neural net with search efficiently is non-trivial. A challenge • is that the speed of neural nets on regular hardware is too slow to be intensively used in the search.

The above challenges need to be solved one by one. In this section, focusing on 13 13 Hex, we conduct an empirical study of move prediction using deep neural × networks in Hex, and investigate the possibility of improving the search in MoHex 2.0 given that an accurate move predictor is obtained.

3.1.2 Input Features

Let s denote an arbitrary Hex position to be preprocessed into feature planes as input to a neural net. Since the goal of Hex is to connect two sides of the board, to represent this goal, we use extra two borders to pad the Hex board. See Figure 3.1 for the padding on a 13 13 Hex empty board. Note that we use two, rather than × one, border padding to highlight the importance of border stones for Hex. It is usually desirable to encode some useful features to the feature planes. We encode the common bridge pattern (see Figure 1.3) to the input. Table 3.1 lists the feature representation in each plane. Bridge endpoints are those occupied cells where there is a bridge pattern (as in Figure 1.3, the two black stones); a save bridge point is an empty cell by playing where saves the bridge (so in Figure 1.3, if White plays on one empty cell, the other one becomes a save bridge point for Black); similarly, a make-connection point is an empty cell by playing where it enables a connection of its neighbor cells of the same color. The input features consist of 9 planes with only binary values. We tried to add history information as [200], but experimental results show that it does not seem to help in Hex, perhaps because Hex is fully described by the current board position plus the player to play and the playing order of each stone is irrelevant to state evaluation.

3.1.3 Architecture

Following on previous work in Go [44, 125, 200], the neural network we designed for Hex is a straightforward stack of multiple convolution and non-linear activation

45 Figure 3.1: Hexagonal board mapped as a square grid with moves played at inter- sections (left). Two extra rows of padding at each side used as input to neural net (right).

Table 3.1: Input features; form bridge are empty cells that playing there forms a bridge pattern. Similarly, an empty cell is save bridge if it can be played as a response to the opponent’s move in the bridge carrier.

Plane Description Plane Description 0 Black played stones 5 White bridge endpoints 1 White played stones 6 To play save bridge points 2 Unoccupied points 7 To play make-connection points 3 Black or White to play 8 To play form bridge points 4 Black bridge endpoints layers. Let d (at least one) be the number of convolutional layers. Let w be the number of convolution filters in each layer. Input to the first convolutional layer is of dimension 17 17 9 (13 plus 4 padding borders, 9 feature planes in total). × × The first convolution layer convolves using filter size 5 5 with stride of 1, and then × applies ReLU to its output. Convolution layers from 2 to d 1 zero pad the input − to an image of 15 15 (see Figure 2.13 on convolution operator), then convolve with × filter size 3 3 and stride of 1. Likewise, each of them is followed by ReLU. × Following previous work in Go [44, 125, 200], to avoid losing information, no downsampling layers such as max-pooling are applied. The final convolutional layer convolves using one filter of kernel size 1 1 with stride 1. A bias is added before × applying the final softmax function. The output of this neural net is a probability distribution over each move on the board. Figure 3.2 shows the architecture.

46 input, shape 17 17 9 × ×

(5 5, w) Convolution with bias,× followed by ReLU

d 1 repetition of . (3− 3, w) Convolution . with× bias, followed by ReLU

(1 1, 1) Convolu- tion× with position bias

softmax

Figure 3.2: Architecture of d convolutional layers, each with w filters.

3.1.4 Data for Learning

A large amount of training data is essential for any deep learning model. According to [96], pattern weights in MoHex 2.0 were trained on a dataset consisting of 19760 13x13 games from Little Golem 1 and 15116 games by MoHex played against Wolve. We could not access the same dataset. Therefore, we generated a new dataset from MoHex 2.0 selfplay. To enrich the generated examples, we randomly set the time limit per move from 28 to 40 seconds, and played the tournament by iterating over all opening moves with the parallel solver turned off. In this way, we collected 15520 games on board size 13 13. These selfplay games were produced on several Cybera2 × cloud instances with Intel(R) Xeon(R) CPU E5-2650 v3 2.30GHz and 4GB RAM. As a further preprocessing step, we extracted training examples from saved games, and removed duplicated state-action pairs; as a result, 1098952 distinct state-action pairs (s, a) are produced. We subsequently randomly selected 90% of them as the training set and used the remaining 10% as a test set. See Figure 3.3 for a visualization of the distribution of state-action pairs with move numbers. Not surprisingly, most positions are from the middle stage of the game. No human games from Little Golem were used in our training: we found that most human players at this site are weaker than MoHex 2.0, although the top player could be significantly stronger.

1http://www.littlegolem.net 2http://www.cybera.ca

47 16000

14000

12000

10000

8000

6000 number of states 4000

2000

0 0 20 40 60 80 100 120 140 160 move number

Figure 3.3: Distribution of (s, a) training data as a function of move number.

3.1.5 Configuration

The Hex board is symmetric under 180-degree rotation. Therefore, for each training example, with probability 0.5, a rotated symmetric position is randomly selected to train. The training target is to maximize the likelihood of move prediction in the training examples. Let be the training set. We used this loss function: D X (θ; ) = log pθ(a s) (3.1) L D − | (s,a)∈D

For training, we used adaptive stochastic gradient descent optimizer Adam [104] with default parameters. Compared to vanilla stochastic gradient descent, Adam converged faster. We trained the neural net with a batch size of 128 per step. Every 500 steps, the accuracy is evaluated on test data. We stopped the training after 150,000 steps. The model that achieves the best test accuracy is saved. The neural net was implemented with Tensorflow [128], and trained on an Intel i7-6700 CPU machine with 32GB RAM and a NVIDIA GTX 1080 GPU.

3.1.6 Results

We present the prediction accuracy of several architectures with varying d and w. Evaluation of the playing strength of the best neural network model is presented subsequently.

Prediction Accuracy

Finding appropriate d and w requires experimentation: a too large network may require more training data while too small ones may have insufficient capacity. We

48 therefore experimentally vary d and w. We start by letting w = 64, d = 5 as in [125], and tried to double w to 128 (note that we stopped at w = 128 as another dou- bling would make training overwhelmingly slow on the GTX 1080 GPU). Table 3.2 presents the top 1 move prediction accuracy for 5 different configurations.

Table 3.2: Prediction accuracy on test set from CNN models with varying d and w

Choices of d, w Best accuracy on test data 5 layers, 64 filters per layer 49.5% 5 layers, 128 filters per layer 53.4% 7 layers, 128 filters per layer 54.7% 8 layers, 128 filters per layer 54.8% 9 layers, 128 filters per layer 54.5%

Table 3.2 shows that neural nets with depths 7 9 produced better results − than the shallower ones. The best accuracy of 54.8% is with d = 8, w = 128, whose accuracy on training data is 57.6%. For each architecture, the result was obtained with early stopping, i.e., we stopped the training when the test accuracy stopped improving for 10 consecutive epochs. As the previous work in Go [125], no regularization is used. It is of interest to know the top k accuracy for k > 1. If the accuracy is high for relatively small k, this characteristic could be harnessed to effectively reduce search width. Figure 3.4 shows the top k accuracy for the best d = 8, w = 128 neural network model with 9 input planes. For k = 8, the prediction accuracy is above 90%. For k = 12, the predication accuracy exceeds 95%. We now proceed to investigate playing strength of the best d = 8, w = 128 policy neural network model 128 both without and with search. For brevity, we call this model CNN 8 .

128 Playing Strength of CNN 8 without Search

A policy network model can be used as a player whose move is sampled from the 128 probabilistic outputs of the model. To see the strength of CNN 8 , we test its performance by playing against the resistance evaluation used in Wolve [11, 13]. This is achieved by limiting Wolve to use 1-ply search. Note that 4-ply search was used by Wolve in previous tournament [11, 13]. The match consists of 6000 games from opening board position where 3000 neural net plays as Black, and 3000 as White. No swap rule was used. The result is that 128 CNN 8 won 48.9% of all games, which implies that the policy neural net model

49 100

95

90

85

80 (%) 75

Accuracy 70

65

60

55

50 0 2 4 6 8 10 12 14 16 k

Figure 3.4: Top k prediction accuracy of the d = 8, w = 128 neural network model. has similar playing strength as the optimized resistance evaluation in Wolve. As for 128 speed, CNN 8 is more than 10 times faster than 1-ply Wolve. Figure 3.5 shows a typical game played by the neural net model against Wolve. The first move of Wolve is l2, presumably because there is a easily computable virtual connection to the top which greatly influenced the resistance evaluation. Move 11 of Wolve is problematic, as it became useless by White’s response move k8 — there is no way that Black can connect to the bottom by m7 unless White misses l9. The result is remarkable in the sense that, without any search, the neural network model can still answer most black moves accurately, implying that the neural net has already grasped some sophisticated aspects of the game.

a b c d e f g h i j k l m 1 1

2 1 2

3 77 62 64 66 68 69 3 4 78 63 65 67 70 9 4 5 61 60 71 72 3 2 5 5 6 59 58 80 74 73 20 4 18 7 6 7 49 48 79 21 8 19 17 6 16 11 7 8 37 36 23 22 15 10 14 75 8 9 35 34 25 24 13 12 76 9 10 33 32 27 26 57 10 11 31 30 28 11 12 29 39 41 43 45 47 51 53 56 55 12 13 38 40 42 44 46 50 52 54 13 a b c d e f g h i j k l m

Figure 3.5: A game played by 1-ply Wolve (Black) against policy net (White) 1287 CNN 8 . The neural network model won as White. Note that we tried to use solver to solve the game state before move 11, but solver failed to yield a solution.

50 It is also interesting to see how well the neural net plays against 4-ply Wolve. Since this version of Wolve is slow, we only played 800 games in total. Table 3.3 128 shows the result. Table 3.3 also includes the result of CNN 8 against 1000 sim- ulation limited MoHex 2.0. Even against these strong Hex bots, the neural net could still achieve about 10% and 20% win as Black and White, respectively. As a comparison, using deep-Q learning [132], after two weeks of training, Young et al. [218] obtained a deep neural net model which achieved 20.4% win against 1 second limited MoHex 2.0 as Black and 2.1% as White. We note that in our Intel i7-6700 CPU machine — which is faster than the machine used in [218] — MoHex 2.0 with 1000 simulations almost always takes a time around 1 second, which im- 128 plies that CNN 8 could have achieved a better result. We suspect this is due to the following reasons: (1) The training data are different: in [218], the neural net is first trained on a set of games generated from a randomized version of Wolve, and then trained on games of neural net self-play. We believe those games are inferior to the games accumulated from MoHex 2.0 self-play. (2) The training method is different: Although Q-learning has been successful in Atari games [132], two-player strategic games are more challenging in the sense that the opponent could be re- garded as a non-stationary environment. (3) The input planes in [218] contain only Black/White/empty points plus Black/White group information: our bridge en- riched representation is better in the sense that it represents a key tactic for playing Hex.

Table 3.3: Results of CNNd=8 ,w=128 against 4ply-Wolve and 1000 simulations Mo- Hex 2.0.

128 128 Opponent CNN 8 as Black CNN 8 as White 4-ply Wolve 17% 3.25% MoHex2.0-1000 33% 8.5%

Finally, although the comparison with Wolve shows that the playing strength of 128 CNN 8 is similar to resistance, it is advantageous over resistance in the sense that it does not have the pathological behavior (favoring fillin and dominated cells) as resistance nor does it require the computation of inferior cells and virtual connections which are typically slow [87].

51 128 Integrating CNN 8 to MoHex 2.0

Our ultimate goal is to develop a stronger Hex player by combining the neural net with search. The strength of look-ahead search is that it tries to build a state- specific graph model, thus the possibility of sampling an erroneous action directly from the policy network is reduced. The major challenge for combining a neural net with search is that move eval- uation by deep networks can be slower than traditional methods [125, 180, 200]. In computer Go, previous work either employs non-blocking asynchronous batch evaluation [180] or simple synchronous evaluation [200]. In each approach, neural network evaluation is used as prior knowledge in search tree, giving preferable child moves higher prior probability. We combine our neural network with MoHex 2.0, the reigning world champion player since 2013. MoHex 2.0 is an enhanced version of MoHex [11]. It is built upon Benzene, using the smartgame and gtpengine libraries from Fuego [58]. The major improvement of MoHex 2.0 over the 2011 version of MoHex lies on the knowledge- based probabilistic playout. Specifically, the MCTS in MoHex 2.0 work as follows:

In-tree phase: In this phase, starting from the root node, a child node is • selected from its parent until a leaf is reached. At tree node s, a move is selected according with the maximum score defined as: s ln N(s) (1 w) (Q(s, a) + cb )+ − × × N(s, a) ρ(s, a) w R(s, a) + cpb p × × N(s, a) + 1

Here, N(s) is the visit count of s, N(s, a) is the visit count of move a at s, Q(s, a) is the Q-value of (s, a), R(s, a) is the RAVE value [74], and ρ(s, a) is the prior probability calculated from move pattern weights. The RAVE

weight w is dynamically adjusted during the search [74]; cb and cpb are tuning parameters.

Expansion: A leaf node is expanded only when its visit count exceeds an • expansion threshold, which is set to 10 in MoHex 2.0.

Pattern-based playout: Pattern weights have been trained offline. In each • playout, a softmax policy selects moves according to move pattern weights.

52 Backpropagation. After the playout, game result is recorded, then MCTS • updates values in visited nodes according to the playout result.

The best performance of MoHex 2.0 is achieved after tuning its parameters by CLOP [48, 96]. Other optimizations implemented in MoHex 2.0 include: pre- search analysis (Inferior cells and connection strategies are computed from the root node before MCTS. If a winning connection is discovered, a winning move will be selected without search); knowledge computation (Whenever the visit count of a node exceeds a knowledge threshold, H-search [151] to compute virtual connection and inferior cell analysis are applied; this often leads to effective move pruning); and time management (A search is said to be unstable if by the end, the move with the largest visit count disagrees with the move with highest Q-value. MoHex 2.0 extends the search by half of its original time in this case [96]). After the search terminates, MoHex 2.0 finally selects a move with the largest visit count.

From MoHex 2.0 to MoHex-CNN

In MoHex 2.0, the prior knowledge ρ(s, a) is computed by a rough estimate from relative move pattern weights. To see the effectiveness of our policy neural network, the straightforward modification is to replace ρ(s, a) by pθ(s, a) — the move proba- 128 bility computed by our policy neural network CNN 8 . All other tuning parameters are left unchanged. 128 The new program after adding neural net CNN 8 is named as MoHex-CNN. It is implemented directly from the MoHex 2.0 code in Benzene, and compiled with the Tensorflow C++ libraries. Similar to AlphaGo [180], we also prepare another

program MoHex-CNNpuct that uses a variant of PUCT [165]. It selects moves that √N(s) maximize Q(s, a)+cpb pθ(s, a) . On the same i7-6700 CPU, 32GB RAM, × × N(s,a)+1 GTX-1080 GPU machine, we run several tournaments to compare MoHex 2.0 and MoHex-CNN. In practice, the evaluation speed of the neural network is a concern. Whenever MoHex expands a node, prior pruning (using patterns to remove proved inferior cells) is applied. Since those computations are costly and are independent from neural net evaluation, we implement a evaluation method that runs in parallel with prior pruning. As a result of this implementation, the computation overhead be- comes small: in our experiments on the i7-6700 CPU machine with a GTX 1080 GPU, MoHex with CNNs took about 0.19 ms per simulation, while MoHex 2.0 took 53 about 0.17 ms. The first tournament uses the same number of simulations for each program. From the empty board, without the swap rule, 400 games were played by MoHex-

CNN and MoHex-CNNpuct against MoHex 2.0, in which MoHex 2.0 plays 200 games as Black and 200 as White. Table 3.4 shows the results. Under those settings, the new programs MoHex-

CNN and MoHex-CNNpuct are stronger than MoHex 2.0, since they consistently won more than 62% over all played games. Since the difference between MoHex-CNN and MoHex 2.0 lies only in the prior probability initialization, the results suggest that the more accurate prior knowledge due to CNN can improve MoHex, given the same number of simulations.

Table 3.4: Results of MoHex-CNN and MoHex-CNNpuct with same number of simu- lations against Mohex 2.0. As Black/White means MoHex 2.0 is as the Black/White.

Opponent #Simulations As White As Black Overall 103 94.5% 44.5% 69.5% MoHex-CNN 104 90% 56% 73% 103 86% 39% 62.5% MoHex-CNN puct 104 83.5% 53% 68.3%

The next experiment compares MoHex-CNN and MoHex-CNNpuct to MoHex 2.0 with equal time budget: 10 seconds per move. When the swap rule is used, the second player has the option to steal the opening move, so we run the tournament by iterating over all opening cells in the board. For 13 13 Hex, there are 85 distinct × openings after removing symmetries. For each opening, we run 5 games for each

color between MoHex-CNNpuct or MoHex-CNN against MoHex 2.0. Table 3.5 summarizes the result of those 850 games played by MoHex with CNNs

and MoHex 2.0. Both MoHex-CNNpuct and MoHex-CNN have better performance against MoHex 2.0 even with same computation time. Consistent with the results

in Table 3.4, MoHex-CNN plays better than MoHex-CNNpuct.

Table 3.5: Results Against MoHex 2.0 of MoHex-CNNpuct and MoHex-CNN with same time limit per move, 95% confidence.

Opponent MoHex2.0 as White (%) MoHex2.0 as Black (%)

MoHex-CNNpuct 71.7 ± 0.1 53.2 ± 0.1 MoHex-CNN 78.6 ± 0.4 61.2 ± 0.4

MoHex-CNN won the 2017 computer Hex Olympiad against Ezo-CNN [197], an updated version of Ezo [196] using neural networks as its move ordering function. 54 a b c d e f g h i j k l m 1 66 67 60 1 2 69 68 50 48 30 31 2 3 1 61 62 49 28 26 18 32 40 33 3 4 51 64 46 47 36 27 25 29 34 37 4 5 65 63 35 43 16 12 13 5 6 42 41 15 17 6 7 8 9 14 7 8 5 52 8 9 2 3 24 9 10 55 54 7 19 11 22 10 11 53 57 6 10 23 21 44 11 12 59 56 4 58 45 20 39 12 13 38 13 a b c d e f g h i j k l m

Figure 3.6: MoHex-CNN against Ezo-CNN: a sample game from 2017 computer Olympiad. MoHex-CNN won as Black.

Figure 3.6 shows a game played in the Olympiad tournament where MoHex-CNN won as Black.

3.1.7 Discussion

We achieved a prediction accuracy near 55% in predicting MoHex 2.0 moves, and used the high prediction accuracy to produce a stronger Hex player. Even without search, the neural network model plays Hex reasonably well; using a neural net model as a learned prior in MCTS, our player MoHex-CNN surpasses MoHex 2.0 — the reigning champion player until 2016. Meanwhile, a number of new general techniques have been explored by other researchers. Most of them were carried out or became popular after our work on Hex. These techniques could be used to further strengthen our Hex player:

Instead of using a stack of convolutional layers, better neural nets have been • explored in computer vision. These architectures include Residual neural nets [84, 85, 219]. A number of regularization techniques have been shown use- ful in image recognition, such as dropout [189, 202], batch normalization [97], stochastic depth [95], swapout [185].

Instead of maximizing the likelihood of a single next move, studies in com- • puter vision and natural language processing showed that predicting a more smoothed distribution can help to reduce over-fitting [154, 194]. Since the Hex data we used to train our neural net is inherently imperfect, incorporat- ing similar techniques may also be beneficial. 55 The advancement of AI accelerators such as TPU [100] have greatly improved • the computational efficiency of deep neural networks. Appropriately using them may improve the playing strength of Hex as well.

3.2 Policy Gradient Reinforcement Learning

We have seen that supervised learning can produce a moderately strong policy network for playing Hex. Policy gradient reinforcement learning (PGRL) aims to improve a policy function by trial-and-error. In this section, we investigate the possibility of using PGRL for producing stronger policy network model in the game of Hex. There is a large body of research in RL algorithms, so we begin with a brief review of these algorithms and then discuss their applicability to Hex. We propose a new policy gradient variant specifically for two-player games and empirically show its advantageous performance for Hex.

3.2.1 Background

Model-free reinforcement learning methods have been successfully applied to many domains, including robotics [110], Atari games [132, 133, 173], and two-player board games [180, 198]. RL algorithms can be split into two categories: value-fitting and policy gradient. The value-fitting algorithms try to learn the optimal value function for every possible states in the state-space. The primary disadvantage of value- fitting methods such as Q-learning [132, 133, 207, 209] is that they are often unstable when interacting with function approximation [27, 193]. To gain stable behavior, extra heuristic techniques [121, 173] and extensive hyper-parameter tuning are often required. By contrast, policy gradient methods explicitly represent a policy as a function of parameters: they learn through iteratively adjusting the parameterized policy by following estimated gradients of policy function, thus converging to at least a local maximum [155]. Policy gradients are applicable to continuous control [176, 177, 183] or domains with large action space [180], where action-value learning methods often become infeasible [120, 208]. Both value and policy based reinforcement learning are well-defined in MDPs, where a single agent learns by exploring a stationary environment. As discussed in Chapter 2.2.2, two-player alternate-turn perfect-information zero-sum games are alternating Markov games, which are a generalization of MDPs by allowing exactly 56 two players pursuing opposing goals. Many popular two-player games — including the game of Hex — are of such kind. The restriction of zero-sum in AMGs makes it possible to define a shared reward function which one agent tries to maximize while the other agent tries to minimize it. Due to such a property, using a selfplay policy, RL methods in MDPs have been applied to AMGs [180, 198] by simply negating the reward signal of the opponent. Tesauro [198] trained multi-layer neural networks to play backgammon. Tesauro’s program learns the optimal value function by greedy self-play, relying on the stochastic environment in the game of backgammon for exploration; Tesauro [198] conjectured that the smoothness of the value function is one major reason for the particular success of TD-Gammon. A recent study that applies Deep Q-learning to Hex [218] failed to produce strong player. As an alternative to value-fitting, a policy gradient method can be used to refine a neural net policy model by learning from pure neural net selfplay. In the first version of AlphaGo [180], policy gradient RL is used to improve the neural network model obtained by supervised learning. The improved model was less useful for exploration in search because of its strong bias towards a single move. However, due to its better playing strength and fast speed, it was used as a standalone player to generate training data for a value net [180], which was integrated into Monte Carlo tree search. The policy gradient employed by AlphaGo is a variant of REINFORCE [211]. In the rest of this chapter, we examine the justifications of adapting standard policy gradient RL to AMGs. Based on the difference between MDPs and AMGs, we formulate an adversarial policy gradient objective. We then develop new policy gradient methods for AMGs and apply our approach to the game of Hex. We show that by modifying REINFORCE to estimate the minimum rather than the mean return of a self-play policy, stronger pure neural net players can be obtained.

3.2.2 The Policy Gradient in MDPs

Let πθ be a θ parameterized policy. Note that for notation convenience, in the following text we sometimes omit θ. Let dπ(s) be the state distribution under π. In MDPs, the strength of π can be measured by

X π X J(π) = d (s) π(a s)qπ(s, a), (3.2) | s∈S a

57 Consequently,

X π X J(π) = d (s) π(a s) log π(a s)qπ(s, a) (3.3) ∇ | ∇ | s∈S a

The above equation is a direct result of the Policy Gradient Theorem for MDPs [193], which implies that the gradient of the strength of a policy can be estimated by sampling according to π. The requirements are that π is differentiable and that

qπ(s, a) can be estimated. A PGRL that follows this scheme can be interpreted as a kind of generalized policy iteration [192], where the gradient ascent corresponds to

policy improvement [101], and qπ(s, a) is obtained by policy evaluation. Depending

on how qπ(s, a) is estimated, policy gradient algorithms can be categorized into two

families: Monte Carlo policy gradients that use Monte carlo to estimate qπ, e.g., REINFORCE [211] and actor-critic methods [193, 208] that use another parameter to approximate the action-value under π. The Monte Carlo policy gradient methods have the advantage that the value estimate is unbiased, though in practice, they could have higher variance than the action-critic ones.

3.2.3 An Adversarial Policy Gradient Method for AMGs

Unlike MDPs where the policy iteration [27, 192] is unique, with AMGs, four differ- ent variants exist:

t t t Algo.1 Fix π1, compute the optimal counter policy π2, then fix π2, compute the opti- t+1 mal counter policy π1 . Continue this procedure repeatedly until convergence.

t t t t Algo.2 Policy evaluation with π1, π2, switch both π1 and π2 to greedy policies with respect to current state-value function. Continue this procedure repeatedly until convergence.

t t t Algo.3 Policy evaluation with π1, π2, switch π1 to greedy policy with respect to the current state-value function and then compute the optimal counter policy for t π2. Continue this procedure repeatedly until convergence.

t t t Algo.4 Policy evaluation with π1, π2, switch π2 to greedy policy with respect to the current state-value function and then compute the optimal counter policy for t π1. Continue this procedure repeatedly until convergence.

58 Denote the joint strength for a pair of parameterized policies π1 and π2 as:

X π1,π2 X J(π1, π2) = d (s) π1(a s)qπ1 (s, a) (3.4) s a |

Here, dπ1,π2 (s) is the state-distribution (i.e., representing the frequency that s is visited under policies π1 and π2) given π1 and π2. A natural question is what is the gradient of J(π1, π2) with respect to π1 and π2 respectively? One tempting

derivation is to calculate the gradient for both π1 and π2 simultaneously by treating the other policy as the “environment”. Similar to the mutual greedy improvement in Algo.2, such a method tries to adapt each policy by referring to the value function

under current π1, π2, ignoring the fact that the other player is an adversary who will

also adapt its strategy as well. Another possible algorithm is to fix π1 and do a

fixed number of iterations to optimize π2 by normal policy gradient as in MDP,

then fix π2 for optimizing π1, repeating this alternatively. However, this algorithm is analogous to Algo.1, which was shown to not converge in certain cases [45].

Following Algo.3 and Algo.4, given the action-value function under π1, π2, a more

reasonable approach for policy improvement is to switch π2 to greedy and optimize

π1 by policy gradient. Therefore, assuming π1 is the max player, we advocate the following objectives

( π1 P π1,π2 P P J (π , π ) = d (s) π (a s) 0 Pr(s0 s, a)[r(s, a, s0) + γ min 0 q (s0, a0)] 1 2 s a 1 | s | a π2 π2 P π1,π2 P P J (π , π ) = d (s) π (a s) 0 Pr(s0 s, a)[r(s, a, s0) + γ max 0 q (s0, a0)] . 1 2 s a 2 | s | a π1 (3.5) Therefore, the gradients can be expressed by ( π1 P 0 J (π1, π2) = Eπ1,π2 [ log π1(a s) 0 Pr(s0 s, a)(r(s, a, s0) + γ mina qπ2 (s0, a0))] ∇ ∇ | s | π2 P 0 J (π1, π2) = Eπ1,π2 [ log π2(a s) 0 Pr(s0 s, a)(r(s, a, s0) + γ maxa qπ1 (s0, a0))] . ∇ ∇ | s | (3.6) The above formulation implies that, when computing the gradient for one policy, the other policy is simultaneously switched to greedy. This joint change forces the current player to adjust its action preferences according to the current worst-case response of the opponent, which is desirable due to the adversarial nature of the game. A straightforward implementation of Equation (3.5) is to obtain separate Monte Carlo estimates for each next action, and then apply the min or max operator. However, this may not be practically feasible when the action space is large. We

59 introduce a parameter k, approximating the true min/max by only considering a subset of actions. As a minimal modification on self-play REINFORCE, this algorithm works as follows: (i) generate a batch of n games by a self-play policy; (ii) for each game, sam- ple a single state-action pair (s, a) uniformly as a training example for this game; (ii) instead of directly using the observed return z in the game as the estimated action-value for (s, a), perform extra Monte Carlo simulations to estimate the min- imum return for (s, a). We consider two Adversarial Monte Carlo Policy Gradient (AMCPG) methods:

AMCPG-A: Run k self-play games from (s, a) using self-play policy π, then take the minimum of these k returns and z.

AMCGP-B: Sample a state s0 from (s, a) according to the state-action tran- sition p( s, a), and select the top k actions suggested by π for s0. For each ·| selected action, obtain an Monte Carlo estimate using self-play policy π, then take the minimum of these k returns and z.

Algorithm 2: Adversarial Monte-Carlo Policy Gradient (AMCPG-A and AMCPG-B)

Input: A policy network πθ Result: Improved policyπ ˆθ 1 ite ← 0 ; 2 while ite < maxIterations do 3 Self-play a mini-batch of n games E using π ; 4 N ← ∅; 5 for ei ∈ E do 6 Select a state-action pair (sj , aj ) uniform randomly; 7 Let z(sj , aj ) be the outcome with respect to action aj at sj in ei; 0 0 8 Let sj be the next state after taking aj at sj ; at sj , let

 A : Self-play k games using π, record the minimum outcome with respect to sj ; 0  z (sj , aj ) = B : Select top k moves of π, from each move self-play a game using π,   record the minimum outcome w.r.t sj ;

i 0 R ← min{z(sj , aj ), z (sj , aj )} ; i 9 Append (sj , aj ,R ) to the mini-batch N ; 10 end α |N| 11 P i i i θ ← θ + |N| i=1 ∇ log π(sj , aj ; θ)R ; 12 ite ← ite + 1; 13 end 14 return πθ;

Here, only minimum is used, because we assume the state-value are always given with respect to the player to play at that state. It is easy to see that in AMCPG- A, if mean operator is used rather than minimum, a variant of REINFORCE is 60 Table 3.6: Input feature planes.

Plane Description Plane Description 0 Black played stones 6 To play save bridge points 1 White played stones 7 To play make-connection points 2 Unoccupied points 8 To play form bridge points 3 Black or White to play 9 Opponent’s save bridge points 4 Black bridge endpoints 10 Opponent’s form bridge points 5 White bridge endpoints 11 Opponent’s make-connection points recovered. When k = (s0) , AMCPG-B becomes a genuine implementation of |A | Equation (3.5), but in this situation, even though each action-value’s estimation is unbiased, bias could still be incurred when applying the min or max; this is known as the winner’s curse problem in economics [41, 187].

3.2.4 Experiment Results in Hex

We apply policy gradient reinforcement learning to improve a policy net which was trained by supervised learning. As a neural net model obtained by deep Q- learning [218], although the policy gradient refined neural net model can be stronger at playing, it is not helpful in guiding best-first search [180]. Therefore, in this exper- iment we focus on studying the behavior of policy gradient reinforcement learning.

3.2.5 Setup

Since this experiment is intended to verify a new method rather than produce a competitive player, we choose 9 9 Hex for the study due to its fast speed of training, × while using fully-convolutional neural network architecture that can output policies for multiple board sizes. Figure 3.7 shows the detailed architecture design. For W W input, after applying F F convolution with stride S and padding P , the × × W +2P −F output is S + 1, which means that given fixed F , S and P , the output size is solely decided by input width W ; therefore, for a given set of convolution filters, fixed stride and padding, by feeding W W input feature planes, a vector of move × probabilities of size W 2 can be obtained. See Figure 2.13 for an example of 2D convolution. We implement the network graph using Tensorflow [128].

3.2.6 Data and Supervised Learning for Initialization

The input has 12 binary planes, shown in Table 3.6. To initialize the neural net weights, we first generate a dataset containing 106 state-action pairs by MoHex 61 raw input, boardsize boardsize ×

(boardsize + 2 ) (boardsize + 2 ) 12 × ×

5 repetitions of (3 . 3, 128) Convolution× . with bias, followed by ReLU

(1 1, 1) Convolution ×

softmax

Figure 3.7: Neural network architecture: It accepts different board size inputs, padded with an extra border using black or white stones; the reason for this trans- ferability is that the network is fully convolutional.

2.0 selfplay on 9 9 board. We train the policy neural net to maximize the log- × likelihood on this dataset. The training took 100, 000 steps, where each step is with a mini-batch size of 64 and optimized using Adam [104] with learning rate 0.001. After supervised learning, the obtained neural net model can be used for playing in multiple board sizes. Its win-percentages against 1-play Wolve on 9 9 and 11 11 × × are respectively 13.2% and 4.6%; each test is done by iterating all opening moves and with 10 games for each opening with Wolve as Black and White.

3.2.7 Results of Various Policy Gradient Algorithms

For comparison purposes, we implement three REINFORCE variants:

REINFORCE-V: Vanilla REINFORCE using a parameterized self-play policy. • After a batch of n self-played games, each game is replayed to determine the batch policy gradient update α Pn PTi log π(si, ai; θ)zi, where zi is either n i t ∇ t t t t +1 or 1. − REINFORCE-A: An “AlphaGo-like” REINFORCE. It differs from REINFORCE- • V by randomly selecting an opponent from former iterations for self-play.

REINFORCE-B: For each self-played game, only one state-action pair is uni- • formly selected for policy gradient update. This algorithm differs from AMCPG- 62 A by using the mean of all k + 1 observed returns.

All methods are implemented using Tensorflow, sharing the same code base. They only differ in a few lines of code. A self-play policy is employed for all algorithms, which is equivalent to forcing π1 and π2 to share the same set of parameters. The game batch size n is set to 128. For each self-play game, the opening move is selected uniformly random. The learning rate is set to 0.001; vanilla stochastic gradient ascent is used as the optimizer. The reward signal z +1, 1 is observed only ∈ { − } after a complete self-play game. For all algorithms, the same neural net architecture is used. As mentioned, the initial parameter weights were obtained from supervised learning. Because the architecture is fully convolutional, the parameter weights can be used on multiple board sizes. We run policy gradient reinforcement learning for 400 iterations for each method, on two different board sizes, 9 9 and 11 11, varying k 1, 3, 6, 9 . We use × × ∈ { } Wolve [87, 151] as a benchmark to measure the relative performance of our learned models. After every 10 iterations, model weights are saved and then evaluated by playing against 1-ply Wolve. The tournaments with Wolve are played by iterating all opening moves, each is repeated 5 times with Wolve as Black or White. Figure 3.8 compares the strengths of these five algorithms. REINFORCE-B is able to achieve similar learning speed with REINFORCE-V as a function of iteration number, though in one game, REINFORCE-B only extracted one example to train the neural network. This is perhaps because the reward signal in the same game is too much correlated. Consistent with the finding in Go [180], REINFORCE-A performed better on 9 9 and 11 11 Hex. However, the newly developed algorithms × × AMCPG-A and AMCPG-B achieved better results even when k = 1. While error bands in Figure 3.8 were not plotted for clear visualization of the average performance, to further see the relative strength, we then show the compar- ison of REINFORCE-V and AMCPG-B with error bands plotted in Figure 3.9. It clearly shows that AMCPG-B performed significantly better than REINFORCE-V.

3.2.8 Discussion

We have presented a new policy gradient objective for AMGs. Experiment results on the game of Hex show that optimizing towards the new objective led to a better neural net player. However, as a model-free method, the merit of a policy gradient 63 0.4 0.4

0.3 0.3

0.2 0.2 REINFORCE-V REINFORCE-V REINFORCE-A REINFORCE-A 0.1 AMCPG-B 0.1 AMCPG-B Winrate against Wolve REINFORCE-B Winrate against Wolve REINFORCE-B AMCPG-A AMCPG-A 0 0 0 100 200 300 400 0 100 200 300 400 Iteration Iteration (a) 9 9 Hex, k = 1 (b) 9 9 Hex, k = 3 × × 0.4 0.4

0.3 0.3

0.2 0.2 REINFORCE-V REINFORCE-V REINFORCE-A REINFORCE-A 0.1 AMCPG-B 0.1 AMCPG-B Winrate against Wolve REINFORCE-B Winrate against Wolve REINFORCE-B AMCPG-A AMCPG-A 0 0 0 100 200 300 400 0 100 200 300 400 Iteration Iteration (c) 11 11 Hex, k = 1 (d) 11 11 Hex, k = 3 × × 0.4 0.4

0.3 0.3

0.2 0.2 REINFORCE-V REINFORCE-V REINFORCE-A REINFORCE-A 0.1 AMCPG-B 0.1 AMCPG-B Winrate against Wolve REINFORCE-B Winrate against Wolve REINFORCE-B AMCPG-A AMCPG-A 0 0 0 100 200 300 400 0 100 200 300 400 Iteration Iteration (e) 11 11 Hex, k = 6 (f) 11 11 Hex, k = 9 × × Figure 3.8: Comparison of playing strength against Wolve on 9 9 and 11 11 Hex × × with different k. The curves represent the average win percentage among 10 trials with Wolve as black and white.

64 0.4 0.4

0.3 0.3 win-rate win-rate 0.2 0.2

0.1 REINFOCE-V 0.1 REINFOCE-V AMCPG-B AMCPG-B

0 100 200 300 400 0 100 200 300 400 Iteration Iteration

(a) k=6 (b) k=9

Figure 3.9: On 11 11 Hex, comparisons between AMCPG-B and REINFORCE-V × with error bands, 68% confidence. Each match iterates all opening moves; each opening was tried 10 times with each player as Black or White. refined neural net is rather limited. In the first version of AlphaGo [180], a policy gradient refined network was used to produce data for training a value net. However, later studies [184] show that training value net directly on human professional data or search generated data with a novel architecture produce significantly better results for search-based player. We report the use of value estimation network for stronger Hex player in the next chapter.

65 Chapter 4

Three-Head Neural Network Architecture for MCTS and Its Application to Hex

This chapter contains content from the papers “Three-Head Neural Network Ar- chitecture for Monte Carlo Tree Search” [66], “A transferable neural network for Hex” [69], and a discussion with content from “Hex 2018: MoHex3HNN over Deep- Ezo” [67]. We review the grand success of AlphaGo Zero and AlphaZero, discuss some of their shortcomings, describe our three-head neural network and show em- pirical results in Hex.

4.1 Background: AlphaGo and Its Successors

By defeating European champion , AlphaGo [180] became the first computer Go program to reach professional level. AlphaGo was created as follows [180]:

1. Training a policy network pσ on human expert games from the KGS Go server. These 160,000 games contain roughly 30 million state-action pairs.

2. Refining the policy network by policy gradient reinforcement learning, result-

ing in a stronger neural net player pρ.

3. Training a value network vθ on a dataset consisting of roughly 30 million

state-value pairs generated by pρ selfplay.

4. Training a fast rollout policy pπ on 8 million Go positions from the server.

5. Integrating pσ, pπ and vθ into a policy value MCTS (PV-MCTS) algorithm. 66 Table 4.1: pσ, pπ, pρ, pσ and vθ architectures and their computation consumption.

Function Architecture Computation Hardware and Cost

pσ 13 layers, 192 filters per layer 50 GPUs for three weeks pρ 13 layers, 192 filters per layer 50 GPUs for one day vθ 13 layers, 192 filters per layer 50 GPUs for one week pπ linear combination of features not provided

The architecture and computation for each neural network are listed in Table 4.1. These trained linear or non-linear functions do not give superhuman playing strength. They were used along with MCTS, where each search node s represents a game state. The stored statistics for each action a of s are

P (s, a),Nv(s, a),Nr(s, a),Wv(s, a),Wr(s, a),Q(s, a) . { }

P (s, a) is the prior probability obtained by calling pσ(s), Wv(s, a) and Wr(s, a) are

accumulated total action value over Nv(s, a) and Nr(s, a) leaf evaluations by vθ and

rollout results by pπ, respectively, and Q(s, a) is the mean action value. PV-MCTS repeatedly performs the following steps until a predefined search time is reached:

Select: Starting from the root, at each node, select an action leading to the • child node that maximizes a PUCT [165] score: pP b Nr(s, b) Q(s, a) + cpuctP (s, a) . 1 + Nr(s, a) This process is repeated until a leaf node s is reached.

Evaluate: The leaf node s is evaluated by vθ and pπ. •

Expand: When a leaf’s visit count exceeds a threshold, i.e., N(s) > nth, the • node is expanded, and statistics over its actions are initialized to N(s, a) = { 0,W (s, a) = 0,Q(s, a) = 0,P (s, a) = p , where p is the move probability of } action a from the output of policy network pσ.

Backup: The evaluation results for each leaf node are backed up until the root • is reached.

After search terminates, in the root node, if the move with largest visit count does not also have the best action-value, an extended search is conducted. Finally, the most visited move is selected. The program played in two configurations, both with parallel MCTS [43, 57]: 67 1. Single machine with 40 search threads, 48 CPUs and 8 GPUs.

2. Distributed with 40 search threads. 1,202 CPUs, and 176 GPUs.

The number of CPUs used is larger than the number of search threads because the algorithm was implemented in asynchronous style, where many CPU workers are used to perform the rollout after a leaf node is selected. The distributed version is significantly stronger than the single-machine one, winning 77% games in head- to-head match against the single machine version; it defeated Fan Hui, becoming the first computer program achieving professional level strength [180]. A later ver- sion called AlphaGo Lee defeated human world champion player by 4–1 in a five-game public match; the key improvements over AlphaGo Fan are briefly highlighted in [184]:

Policy and value networks were enlarged to contain 256 filters per layer. • Value network was trained on games played by AlphaGo selfplay. This process • is iteratively repeated for a few times.

More advanced hardware TPUs were used when playing against Lee Sedol. • Continual improvements were made, resulting in AlphaGo Master which defeats Ke Jie by 3–0. AlphaGo Master improves over AlphaGo Lee as follows [184]:

1. Plain CNNs were replaced with 20-block ResNet; each block (except the first one) consists of 2 convolution layers, where each layer is with 256 3 3 filters, × and instead of using two separate policy and value networks, a single two-head network with policy and value outputs was used.

2. The two-head network was first trained on human professional game records, then on data generated from AlphaGo selfplay. Again, the later phase was repeated for a few times.

Later, the idea behind step 2 was further explored [184]: without using any hu- man data to initialize the neural network, the program called AlphaGo Zero [184] (with 40-block ResNet) achieved a strength stronger than AlphaGo Master. We recapitulate the AlphaGo Zero algorithm in Algorithm 3. Algorithm 3 is an iterative procedure alternating among three subroutines: (1) using player PV-MCTS(fθ) for data generation; (2) using played games to retrain 68 Algorithm 3: AlphaGo Zero Algorithm Input: Neural network hyperparameters, e.g., number of residual blocks; allocated training resource, e.g., total number of training games Result: fθ and PV-MCTS(fθ) 1 Function Main(): 2 Let fθ be a θ parameterized two-head ResNet 3 Let PV-MCTS(fθ) a PV-MCTS with network fθ 4 D ← ∅ 5 while computation resource not exhausted do 6 1. Run m parallel workers, each gi Search-Selfplay(fθ, 25000), ← put gi to D 7 2. Sample the most recent 500,000 game data from and reoptimize D the neural net parameter θ, obtaining θ0 8 3. Run a match between PV-MCTS(fθ, 1600) and PV-MCTS(f 0 , 1600), if θ0 won more than 55%, θ θ0 θ ← 9 end 10 End Function 11 Function Search-Selfplay(fθ, n games): 12 Let s be the initial state of Go 13 g ← {} 14 while s is not the end of a game do 15 a, n(s) PV-MCTS(fθ, s, 1600) ← 16 s Play(s, a) ← 17 Resign the game if estimated root value is below a threshold vresign 18 Append a and n(s) to g 19 end 20 Add the game result to g 21 return g 22 End Function 23 Function PV-MCTS(fθ, s, n): 24 i 0 ← 25 while i < n do 26 Starting from s, select using PUCT until a leaf l is obtained 27 Evaluation by v, p fθ(l) and Expand ← 28 Backup v along the selection path back to s 29 end 30 Let n(s) be a vector of visit counts for each move in s 31 Select an action a of s proportionally to its visit count 32 return a, n(s) 33 End Function

69 the neural net; and (3) using a gating subroutine to examine the strength of newly produce neural net model, ensuring that the best network parameter will always be used for data generation. The step (1) is embarrassingly parallel. The search procedure PV-MCTS is crucial to the whole algorithm because it controls how data are produced for improving the neural network weights; thus, some important details omitted in Algorithm 3 deserve to be noted:

In the selection formula, a noise is added to P (s, a): P (s, a) = (1 )P (s, a) + • − η(α) where η(α) Dir(0.03) (Dir represents a Dirichlet distribution),  = ∼ 0.25.

After PV-MCTS iteration finishes, select a move to play by sampling • N(s, a)1/τ a P 1/τ . ∝ b N(s, b) Here, τ is a temperature; τ = 1 for the first 30 moves and τ 0 in the ← remaining moves. For gating, τ 0 for all the moves. ← By removing the Go specific designs in Algorithm 3, an AlphaZero algorithm was first proposed in a preprint article [181], whose formal version was published in [182]. AlphaZero is described as a general algorithm and its effectiveness was verified in three games: chess, Shogi and Go. AlphaZero differs from AlphaGo Zero by the following modifications:

For input feature planes to the neural network, the augmentation using sym- • metry is removed because this is only applicable to Go, not Shogi and chess.

It uses a new PUCT selection formula: • pP N(s, b) Q(s, a) + c(s)P 0(s, a) b , 1 + N(s, a) where P 0(s, a) = (1 )P (s, a) + η(α) where η(α) Dir(α),  = 0.25, α is a − ∼ Dirichlet noise parameter. That is, the coefficient is now game-state dependent

c(s) = log((1+N(s)+cbase)/cbase)+cinit, where cinit = 1.25 and cbase = 19625; however, this dynamic c(s) was not described in [181].

Gating is removed. • Number of MCTS simulations (i.e., iterations) per move is reduced to 800. • AlphaZero mastered Go, Shogi and chess after separate training on each. 70 Figure 4.1: Computation costs for AlphaGo Zero and AlphaZero are far ahead of other major achievements in AI [157].

4.2 Sample Efficiency of AlphaGo Zero and AlphaZero

Although AlphaGo Zero and AlphaZero are seemingly simple, an important draw- back is that they are sample inefficient. As in Figure 4.1, The computational cost of training 40-block AlphaGo Zero is far larger than for other prominent AI achieve- ments [157]. To see the strong correlation between computation and playing strength in Go, in Table 4.2 we summarize the Elo strength and the computation hardware for each AlphaGo variant. Except for AlphaGo Fan, all other programs used an it- erative scheme illustrated in Figure 4.2. Training AlphaGo Zero 20-block, AlphaGo Zero 40-block and AlphaZero used 4.9 and 29 and 140 million games, respectively. The number of parallel workers used in training AlphaGo Zero was not given, but can be estimated from provided data — on current commodity hardware the training process could 1700 machine years [147]. To see the data inefficiency of these iterative algorithms, consider AlphaZero 20-block: 140 million games were produced while the total number of neural net parameter updates is 700, 000, each with mini-batch 4096; multiplying these num- bers shows a total of 286 million forward-backward passes. Considering that 140 million games were generated, this implies that on average two game positions were selected from each game for training the neural net only once. In Go, a typical game usually consists of several hundred game states, where each single game state was

71 Table 4.2: Computation used for producing each player. For all Zero variants, computation used to optimize the neural network was ignored.

Version Computation Elo AlphaGo Fan 50 GPUs for a few weeks 3144 AlphaGo Lee not given 3739 AlphaGo Master 20-block not given 4858 AlphaGo Zero 20-block a few thousand GPUs for 3 days 4500 AlphaGo Zero 40-block a few thousand GPUs for 40 days≈ 5185 AlphaZero (Science) 5000 TPUs for 13 days 4500 ≈

GatingGating

ModelModel TPUsTPUs/GGPUsPUs

Self-playSelf-play gamesgames TrainingTraining GameGame BuBuferer

Figure 4.2: A closed scheme for iterative learning. Gating was removed in Alp- haZero, but some implementation found gating is important for stable progress [148]. processed by 800 iteration MCTS — calling the neural net 800 times while building the search tree. This is in sharp contrast to other deep learning applications [75] where no examples are discarded, and moreover, every example will be learned by the neural network many times. This learning is also inefficient when compared to how human professional players master the game of Go — apparently, they do not play many millions of games to master the game. The enormous demand on computation makes reproducing AlphaGo Zero and AlphaZero, or applying them to other games difficult. But reproducing these results is relevant, because how and why AlphaGo/Alpha Zero converge and what is their most crucial component is largely unknown. Research projects studying such ques- tions include Leela Zero [148] and ELF OpenGo [201]. These approaches either use crowd-sourcing for computation or have access to thousands of GPUs for parallel selfplay game generation. Here are some discoveries reported in [201]:

72 Performance is almost always bounded by the neural network capacity, that • is, improvement can always be achieved by increasing neural network size as more selfplay games are produced.

The PUCT parameter in PV-MCTS is important. • Learning (as measured by strength of the program) has high variance even in • the later stage.

Elo measurement by selfplay has inflation; the monotonic increase of Elo score • did not capture the fluctuation in strength against a variety of opponents.

The ladder tactic in Go was never fully mastered even with increased MCTS • simulations.

4.3 Three-Head Neural Network Architecture for More Efficient MCTS

In terms of reinforcement learning, the algorithmic framework of AlphaGo Zero and AlphaZero has been summarized as approximate policy iteration [26], where the policy improvement is due to running PV-MCTS at each state, and policy evaluation is achieved by regression with respect to the search-based selfplay game result [184]. However, such a high-level interpretation does not provide a detailed explanation on convergence properties of the algorithm, nor does it give useful practical guidance on how to select the hyperparameters for the search, the neural network or the training. Another interpretation views the whole framework as an expert iteration algorithm [8] by drawing connections to the Systems I and II psychological theory of the human mind (fast intuition is represented by deep neural network and the slow thinking is implemented by tree search) [59]. They applied the idea to 9 9 × Hex and showed that they can produce a player stronger than MoHex 2011 but did not surpass MoHex 2.0; see [64]. Due to our limited computational resources, instead of showing results of ap- plying AlphaGo Zero, AlphaZero algorithms to the game of Hex, we developed a three-head neural network architecture to improve the efficiency of PV-MCTS. In rest of this chapter, we show the merits of this architecture in supervised learning, transfer learning, and search-based Zero-style iterative learning.

73 4.3.1 PV-MCTS with Delayed Node Expansion

In AlphaGo Zero and AlphaZero, the policy value Monte Carlo tree search (PV- MCTS) merges the expansion and evaluation phases. In the implementation, when- ever expanding and evaluating a non-terminal leaf node, after new child nodes are created, the move probabilities are saved to these child nodes; the state-value esti- mate is backed up. However, a common practice in MCTS is expanding a leaf node only when it has been visited for a certain number of times. This scheme is used in the first version of AlphaGo [180]. The rationale is that, since the neural net has two heads, when evaluating the leaf node s, two outputs are given: the value estimate can be backed up immediately, but if the node is not expanded, the policy output would be wasted. See Figure 4.3.

v v?

s p, v = fˆθ(s) s N(s) < threshold

(a) (b)

Figure 4.3: A problem with two-head architecture in PV-MCTS. The leaf node is expanded (a) with threshold 0, otherwise (b) if N(s) is below the threshold, no expansion and evaluation indicates that no value to back up. fˆθ is the two-head neural net that each evaluation of state s yields a vector of move probabilities p and state-value v. N (s) is the visit count of s.

To allow PV-MCTS to have delayed node expansion, we propose a three-head neural net that outputs also a vector of action-values, illustrated in Figure 4.4. Each evaluation of a leaf node s yields three outputs: a policy p, a vector of next action- values q, and a state-value v. The policy and action-value information can be stored by newly created child nodes. Therefore, in future MCTS iterations, when the visit count of a leaf is below the expansion threshold, the stored action-value can be 74 backed up immediately. See Figure 4.4. On the other hand, if we tailor 2HNN to delayed node expansion by forcing a leaf to back up its parent’s evaluation, a 3HNN would be advantageous by enabling a deeper MCTS.

s+

v q(s+, a)

s p, q, v = fθ(s) s N(s) < threshold

(c) (d)

Figure 4.4: PV-MCTS with a three-head neural net fθ: The leaf node s can be expanded with any threshold. If the visit count of s reaches the expansion threshold, s is expanded, the value estimate is backed up, action values and move probabilities are saved to new child nodes. If the visit count of s is below the threshold, the previously saved action-value estimate can be backed up.

Let PV-MCTS-2HNN be the PV-MCTS with two-head network, and PV-MCTS- 3HNN be the one with three-head net. If an expansion threshold parameter must be used, PV-MCTS-2HNN can either 1) force the expansion threshold to be 0, or 2) use fast rollout as leaf evaluation when the leaf node is below the expansion threshold ζ 1. ≥ For case 1), suppose PV-MCTS-2HNN and PV-MCTS-3HNN are allocated with the same amount of computation time T˜ on the same hardware. Let t and t0 re- spectively be time cost per simulation for PV-MCTS-2HNN and PV-MCTS-3HNN. Assuming the time overhead for the extra action-head in 3HNN is negligible, then t0 t since PV-MCTS-2HNN calls the neural net every simulation while PV-MCTS- ≤ 3HNN does not. Thus, the number of neural value leaf estimates received by PV- T˜ T˜ MCTS-3HNN is 0 more than that of PV-MCTS-2HNN. t − t For case 2), a reasonable assumption is that PV-MCTS-3HNN and PV-MCTS- 2HNN will use equal computation time for the same number of simulations, we then have the following observation: 75 Observation 1. Suppose that the total number of simulations for MCTS is T , and the expansion threshold is ζ 1. Then, after the search terminates, PV-MCTS- ≥ 3HNN has received at least T T more neural leaf value estimates than PV-MCTS- − ζ 2HNN.

Proof. The number of neural leaf estimates received by PV-MCTS-3HNN is T , since every simulation will back up a leaf estimate of either state-value or action value. For PV-MCTS-2HNN, letn ˆ be the number of internal nodes after T simulations, and L be the set of leaf nodes. The number of neural leaf estimates used by PV- MCTS-2HNN is thusn ˆ, since the neural net is called whenever a leaf is expanded. Another observation is thatnζ ˆ T , because T =nζ ˆ + P N(ˇs), son ˆ T . ≤ sˇ∈L ≤ ζ Therefore, PV-MCTS-3HNN uses at least T T more neural leaf estimates than − ζ PV-MCTS-2HNN.

4.3.2 Training 3HNN

Given a fixed set of games generated by strong players, the loss function described D in [184] is this: ! X 2 2 Lˆ(fˆθ; ) = w(zs v(s)) log p(a s) + c θ (4.1) D − − | || || (s,a,zs)∈D

Here, 0 < w 1 is a weighting factor that is used to control the relative weight of ≤ the value loss. c is a constant for the level of L2 regularization. (s, a) is a state-action pair from dataset , and zs is the game result with respect to the player to play at D s. v(s) is the state-value head output, and p(a s) is the policy head output. The | loss function simply combines policy, value losses and L2 regularization. Assuming infinite expressiveness of the neural network, training by such a loss function forces the neural network to predict the move probabilities and state-value of the players that have been used for producing the dataset. At first glance, it seems more data intensive to train a three-head neural net, since all action-values must be available. Instead of just using the actions appeared in , we show that by using the optimal consistency between parent and child node D values, we can augment extra action-values to train 3HNN on the same raw training data as for 2HNN. Assume a game state has binary outcomes, either win or lose with respect to the player at s. Then, for any given game state s, the following relations hold:

76 OR constraint: if s is winning, then at least one action leads to a losing state • for the opponent.

AND constraint: if s is losing, then all actions lead to a winning state for the • opponent.

If the games in were played by perfect players, for each game g , g implies D ∈ D a tree rather than a single trajectory. See Figure 4.5. +1 +1

+1 1 +1 1 − − s1 a1 s2 a2 s3 a3 s4 a4 s5 +1

+1 +1

Figure 4.5: A game implies a tree, rather than a single path. For each state, the value (+1 or -1) is given with respect to the player to play there.

Therefore, we introduce the following loss term:

2 X 1 X  0  LQ(fθ; ) = q(s, a ) 1 (4.2) D (s) − (s,a,−1)∈D |A | a0∈A(s) where (s) is the action set at s (i.e., to average up the augmented error). A The other observation is that for an input state s, we now have both action- and state-value predictions. In two-player alternate-turn zero-sum perfect-information games, suppose the value function v(s), q(s, a) are with respect to the player to play at s or (s, a). The optimal Bellman equation is as follows:

v∗(s) = min q∗(s, a), s , (4.3) − a ∈ S Here, v∗ and q∗ are the optimal state-value and action-value functions. To force the state and action values to satisfy the optimal consistency, we further augment the loss function by adding the optimal Bellman error as a penalty.

X 0 2 LP (fθ; ) = (min q(s, a ) + v(s)) (4.4) D a0 (s,a,zs)∈D

77 Combining the usual state-, action-value and policy losses, we propose the fol- lowing loss function:

  X 1 2 1 2 L(fθ; ) = w (zs v(s)) + (zs + q(s, a)) D 2 − 2 (s,a,zs)∈D (4.5) ! 2 log p(a s) + c θ + wLQ + wLP − | || ||

Here, q(s, a) is equivalent to v(s0) if taking a at s leads to s0. w and c are weighting parameters as in (4.1), v(s), q(s, a) and p(a s) are predictions from a | 1 3HNN fθ. The coefficient 2 is used to equally combine the weight of action-value and state-value prediction errors. To ease parameter tuning, we also set the weight of LQ and LP the same.

4.4 Results on 13 13 Hex with a Fixed Dataset × We demonstrate the effectiveness of 3HNN on 13 13 Hex. We use the same search- × generated dataset as for obtaining MoHex-CNN; see Chapter 3.1.

4.4.1 ResNet for Hex

We adopt a residual neural net [84] with 10 blocks; each block has two convolutional layers; each layer has 32 3 3 filters. We use pre-activation [85] in each residual block, × which applies batch normalization [97] and ReLU before convolution. The input features consists of 4 binary planes that contain only basic board state information: black stones, white stones, empty points, and to-play. In Hex, each player owns two of the board’s four sides: to indicate this, we pad each board’s side with a row of stones of the appropriate color. See Figure 4.6 for the detailed neural network design, where a batch normalization layer normalizes the signals for each batch signal (see [97] for details). Refer to Chapter 3 Section 3.1 for the usefulness of enriched input feature planes. Here, we follow the AlphaGo Zero and AlphaZero trend, and rely on the advancement of residual network to automatically learn more meaningful features.

4.4.2 Setup

The neural nets are implemented with Tensorflow 1.0, and trained with the Adam optimizer [104] using the default learning rate with a mini-batch size of 128 for 100 78 x Input 15 15 4 × ×

Batch Normalization

(3 3, 32) × Convolution ReLU

32 3x3 convolution filters . 10 residual blocks Batch Normalization (1 1, 1) (1 1, 1) Convolution× Convolution× ReLU

169 fully 1 fully con- softmax, p connected units nected unit 32 3x3 convolution filters

Add tanh tanh q v (a) ResNet for Hex (b) A residual block

Figure 4.6: A ResNet architecture for Hex with three heads. Each residual block repeats twice batch normalization, ReLU, convolution using 32 3 3 filters, then × adds up original input before leaving the block. epochs, where one epoch is defined as a full sweep of the training data. As the −5 previous in Go [184], we set the L2 regularization constant c to 10 , and the value loss weight w to 0.01.

4.4.3 Prediction Accuracy of 3HNN

As in Chapter 3.1, the dataset is partitioned into disjoint training and testing sets. To show the learning progress of the neural nets, at the end of each epoch, the model parameters are saved and evaluated on the test data. We then plot the move prediction accuracy and mean square errors (MSEs) in Figures 4.7 and 4.8. The possible range of MSE is [0, 4.0]. We also train a 2HNN with identical residual architecture as our 3HNN except that it has only policy and state-value heads. This can be emulated by ignoring the action-value head in Figure 4.6, and optimizing the neural network with the loss function (4.1). Figures 4.7 (left) shows that the MSE from the newly introduced action-value head in 3HNN achieves similar accuracy with the state-value head of 2HNN. Al- though 3HNN did not produce significantly better predictions than 2HNN, it is 79 q-head test error of 3HNN v-head test error of 3HNN 1.2 v-head test error of 2HNN 1.2 v-head test error of 2HNN

1 1 MSE MSE

0.8 0.8

0.6 0.6 0 20 40 60 80 100 0 20 40 60 80 100 Epoch number Epoch number

Figure 4.7: Mean Square Errors of two- and three-head residual nets.

1.2 0.6 q-head test error of 3HNN p-head test accuracies of 3HNN v-head test error of 3HNN p-head test accuracies of 2HNN

1 0.55 MSE

0.8 Accuracy 0.5

0.6 0.45 0 20 40 60 80 100 0 20 40 60 80 100 Epoch number Epoch number

Figure 4.8: MSE (left) and top one move prediction accuracies (right) of two- and three-head residual nets.

80 advantageous in the sense that it gives an additional vector of all action-values with a single neural net forward pass. Figure 4.8 compares the move prediction accuracy of 2HNN and 3HNN. The networks have almost indistinguishable learning curves, because they are trained on the same dataset with identical policy loss.

4.4.4 Evaluation in the Integration of PV-MCTS

We now present experimental comparisons between two and three heads nets in the same search framework. As before, we build our programs upon the codebase of MoHex 2.0 [58, 96, 151], and use MoHex-CNN from previous chapter as a benchmark to measure the relative strength of the new programs. Denote MoHex-3HNN as the implementation of PV-MCTS-3HNN, and MoHex2-2HNN for PV-MCTS-2HNN. The key differences between MoHex-3HNN, MoHex-2HNN and MoHex-CNN are reflected in their leaf evaluation:

MoHex-3HNN uses the default expansion threshold 10 of MoHex 2.0. • To simply experiment, we use either default expansion threshold or an expan- • sion threshold of 0. In the former case, when the leaf node’s visit count is below the expansion threshold, MoHex-2HNN uses the pattern-based rollout as MoHex 2.0.

MoHex-CNN uses the default expansion threshold; it always use pattern-based • rollout from MoHex 2.0 as it does not use neural network for value estimation.

The same RAVE in-tree selection formula (as in Eq. (3.1.6)) is used for all programs. We evaluate neural net models at epochs 20, 30, ..., 100 with an interval of 10. Each neural net model is combined the same MCTS implementation and played against MoHex-CNN. To facilitate evaluation, we set the same 1000 simulations per move for these players, which generally consumes about 1 second per move for each player. Following a practice in the literature [96], the matches are played by iterating all openings with symmetric moves removed. Each round of match consists of 170 games, where each player plays an opening twice (as Black and White). Due to the fast speed of evaluation, we repeat 5 rounds for each match. No swap rule was applied.

81 80 MoHex-2HNN MoHex-3HNN

70

60

50 Winrate against MoHex-CNN (%) 0 20 40 60 80 100 Epoch number

Figure 4.9: Results of MoHex-2HNN and MoHex-3HNN against MoHex-CNN. All programs use the same 1000 simulations per move. MoHex-2HNN uses playout result when there is no node expansion. After epochs 70 and 60, MoHex-3HNN and MoHex-2HH’s performance decreased, possibly due to over-fitting of the neural nets: Figures 4.7 and 4.8 show that around epoch 70, the value heads of 3HNN generally achieve smaller value errors than epochs around 80 and 90. The error bar represents the standard deviation of each evaluation.

Figure 4.9 shows that PV-MCTS-3HNN is better than PV-MCTS-2HNN in this setting. Both PV-MCTS-3H and PV-MCTS-2H achieve winrates larger than 50% against MoHex-CNN, presumably because MoHex-CNN only uses neural net as prior knowledge during the in-tree phase. As in observation (1), given that more neural net estimates are more accurate than random rollouts, PV-MCTS-3HNN is better than PV-MCTS-2HNN because it always uses neural net value estimation while PV-MCTS-2HNN has to use playout result for each leaf below the expansion threshold. We then compare PV-MCTS-2HNN to PV-MCTS-3HNN by letting the former have an expansion threshold of 0. To have a fair comparison, we give all programs the same search time of 10s per move. The results are summarized in Table 4.3. As a reference, we also include the result from PV-MCTS-2HNN with the default expansion threshold. Consistent to Figure 4.9, the results in Table 4.3 show that MoHex-3HNN still achieves the best performance, significantly outplaying MoHex-CNN. With the de- fault expansion threshold, MoHex-3HNN takes 0.3ms to execute one simulation on

82 Table 4.3: Winrates of MoHex-2HNN and MoHex-3HNN against Mohex-CNN with the same time per move. For best performance, MoHex-3HNN and MoHex-2HNN respectively use the neural net models at epochs 70 and 60.

Player As Black As White Overall Win MoHex-3HNN 76.5% 70.6% 73.5% MoHex-2HNN threshold 0 65.9% 57.6% 61.8% MoHex-2HNN default threshold 69.4% 56.5% 62.9% average. MoHex-2HNN with expansion threshold 0 takes 1.8ms per simulation, about 6 times slower, which explains MoHex-3HNN’s better performance given the same time per move. In head-to-head match, under the same setting using 10s per move, MoHex- 3HNN’s won 64.1% against MoHex-2HNN-0 and 70.6% against MoHex-2HNN-10, where MoHex-2HNN-x means MoHex-2HNN with an expansion threshold of x. MoHex-3HNN won 82.4% against MoHex 2.0 when both programs are giving the same 10s per move.

4.5 Transferring Knowledge Using 3HNN

As we can see in Figure 1.1, the similarities between Hex on different board sizes raise the following questions:

To what extent can learned neural net knowledge be reused on a smaller board? • To what extent can such knowledge be generalized to a larger board? • It is known that convolution filters are translation invariant. We have seen in Chap- ter 3.2 that this property can be harnessed by designing a policy network that con- tains only such filters, thereby creating a policy network that is transferable. Similar to action probabilities, the q head in 3HNN could also be fully transferable if the final fully connected layer is removed. This architecture is shown in Figure 4.10. The major difference between Figures 4.6 and 4.10 is that the fully-connected layers are removed from the q-head: by doing so, we expect that the action-value vector can generalize to board sizes not used in training. This neural net can be trained as before with the loss function of Eq. (4.5). To study the how transferable the architecture is, we use three datasets from different board size: 83 (N + 2) (N + 2) 4 × ×

(3 3, 32) Convolution×

. 10 residual blocks

(1 1, 1) (1 1, 1) Convolution× Convolution×

1 fully con- softmax tanh nected unit p q

tanh v

Figure 4.10: A given Hex state of board size 8 N 19 is padded with black ≤ ≤ or white stones along each border, then fed into a feedforward neural net with convolution filters. A fully-connected bottom layer compresses results to a single scalar v.

On 8 8 Hex, 62853 games among MoHex 2011, MoHex 2.0 and Wolve. • × On 9 9 Hex, as in Chapter 3.2. • × On 13 13 Hex, as in Chapter 3.1. • × 4.5.1 Setup

We train the network with learning rate 0.001 and a mini-batch size of 128 for 100 epochs. Model parameters are saved after each epoch. L2 regularization constant c is set to 10−5; value loss weight w = 0.01. We conduct two sets of experiments:

Train the neural net on 13 13 games and investigate transferability to smaller • × boards.

Train the neural net on 9 9 games and investigate transferability to larger • × boards.

In either board size, the dataset is split into 90% for training and the rest for test.

4.5.2 Prediction Accuracy In Different Board Sizes

We measure the prediction accuracy of the saved neural net models. Figure 4.11 shows the test prediction accuracy of p and q heads of the neural net models ob- tained from learning on 13 13 games; for p head, the label is the move played in × the game dataset; for q head, the label for qa is the game result after playing the 84 Accuracy on 13 13 × MSE of q on 13 13 0.65 Accuracy on 9 9 1 × × MSE of q on 9 9 Accuracy on 8 8 × × MSE of q on 8 8 × 0.6 0.8

0.55 MSE Accuracy 0.6

0.5 0.4 0 20 40 60 80 100 0 20 40 60 80 100 Epoch number Epoch number

Figure 4.11: 13 13 training: prediction accuracy across boardsizes. ×

Accuracy on 13 13 × 1.2 MSE of q on 13 13 0.7 Accuracy on 9 9 × × MSE of q on 9 9 Accuracy on 8 8 × × MSE of q on 8 8 1 × 0.6 0.8 MSE

Accuracy 0.5 0.6

0.4 0.4

0 20 40 60 80 100 0 20 40 60 80 100 Epoch number Epoch number

Figure 4.12: 9 9 training: prediction accuracy across board sizes. × corresponding action a. As expected, neural net models optimized on 13 13 games × achieved high accuracy on smaller boards. Prediction accuracy on 8 8 and 9 9 × × are better than on 13 13, reflecting that smaller board sizes are easier to master. × Compared to Figures 4.8 and 4.7, this network’s q- and p-heads achieved slightly larger errors. The benefit is that the new architecture to generalize its action-value prediction to other board sizes. It is more interesting to measure transferability from smaller boards to larger, as on smaller boards it is easier produce high quality games. Figure 4.12 shows prediction accuracy after training the net on 9 9 games. Surprisingly, both q and × p yield reasonable prediction accuracy even on 13 13 Hex. However, on 13 13 × × Hex, by comparison, prediction accuracy of Figure 4.12 is about 10% lower than in Figure 4.11.

4.5.3 Usefulness When Combined with Search

How useful are our neural nets when used in search? We answer this question by combining our nets with the MCTS of MoHex 2.0 as before. The v-head is board size 85 dependent, so we use it only when the running board size is the same as that used in training; otherwise, only the q- and p-heads are used. In our implementation, move prior and action-value are both stored at each node creation. We call this new transferable neural net program MoHex3H. We evaluate 10 neural net models at an epoch interval of 10. Each model is combined with MoHex3H and played against MoHex 2.0. Table 4.4 shows the results on boardsizes 9 9 to 13 13. We did not include 8 8 data, because positions on × × × this board are easily solved [89, 151]. For all programs, we allow 104 simulations per move. Each tournament is played by iterating over all N N opening moves. × As before, each opening is used twice, with each program playing as first-player and second-player; and no swap rule.

Table 4.4: MoHex3H using 13 13-trained nets: win rate (%) versus MoHex 2.0 and × MoHex-CNN. Columns 2-11 show strength by epoch.

Epoch 10 20 30 40 50 60 70 80 90 100 Board 9×9 71.0 64.8 69.1 75.9 73.5 77.2 69.1 74.7 77.8 67.9 10×10 71.0 78.0 66.5 71.0 82.5 83.5 80.5 78.0 78.5 72.5 11×11 67.4 67.8 71.5 74.0 71.9 76.4 74.8 78.5 78.1 76.4 12×12 67.0 71.9 70.5 73.6 76.4 78.5 78.1 77.4 79.9 75.7 13×13 63.0 63.3 68.3 69.8 73.7 76.3 74.9 75.4 74.3 74.9 13×13 44.1 42.9 46.5 55.3 58.8 61.2 55.3 52.4 61.2 55.3

Table 4.4 shows results against MoHex 2.0 using net models learned from the 13 13 dataset. The high p- and q-head prediction accuracy shown in Figure 4.11 × yields strong play on smaller board sizes. MoHex-CNN only plays on 13 13 Hex, × so we include the result against MoHex-CNN in the last row: MoHex3H defeats MoHex-CNN with this transferable neural net, although the win rate is lower than that of a three-head net with fully connected bottom layer (MoHex-3HNN achieved win-rate over 70%; see Table 4.3). Table 4.5 shows the results against MoHex 2.0 using the models learned from the 9 9 dataset. The resulting program defeats MoHex 2.0 even on boardsize 13 13, × ×

Table 4.5: MoHex3H using 9 9-trained nets: win rate (%) versus MoHex 2.0. × Epoch 10 20 30 40 50 60 70 80 90 100 Board 9×9 63.6 69.1 63.0 65.4 65.4 72.2 74.7 72.8 71.6 72.8 10×10 63.5 70.0 71.5 68.0 67.0 75.5 76.0 76.0 76.5 77.5 11×11 62.8 58.3 64.0 66.1 69.0 66.1 67.8 64.0 66.1 65.7 12×12 51.0 45.8 56.2 53.5 45.8 60.1 54.9 49.3 58.7 58.0 13×13 35.3 39.4 47.1 47.6 46.5 42.9 42.4 42.9 52.9 54.1

86 Table 4.6: MoHex3H using 9 9-trained nets, with p-head only: win rate (%) versus × MoHex 2.0.

Board 9×9 10×10 11×11 12×12 13×13 Epoch 90 68.5 75.5 68.2 65.3 60.6 100 68.5 70.5 70.7 67.4 63.5

0.6 1.1 Train accuracy with random init. Train value error with random init. Train accuracy with warm init. Train value error with warm init. 1 0.55

0.9 MSE

Accuracy 0.5 0.8

0.45 0.7 0 20 40 60 80 100 0 20 40 60 80 100 Epoch number Epoch number

Figure 4.13: 13 13 training errors, with and without warm initialization. × although its margin of victory there is less than on other boardsizes. MoHex 2.0’s playout policy was trained mostly on 13 13 games [96]. A possible explanation × is that the low q-head prediction accuracy is insufficient to match the strength of MoHex 2.0 playouts. To verify this, we ran extra tournaments with the q-head in MoHex3H’s search disabled, forcing MoHex3H to use the same pattern-based playouts as MoHex 2.0. By comparing the results in Tables 4.5 and 4.6, we can see that, at epochs 90 and 100, the winrate against MoHex 2.0 dropped on boardsizes 9 9 and 10 10 but rose on boardsizes 11 11 and larger, suggesting that the p-head × × × consistently aided the search, while the q-head was insufficiently accurate to replace pattern-based playouts on larger boardsizes. In summary, we conclude that (1) without fine-tuning, transferring neural knowl- edge to larger board sizes is harder than to smaller and (2) when transferring to larger boar sizes, the learned p-head seems more robust than the q-head.

4.5.4 Effect of Fine-tuning

We further investigate fine-tuning 9 9-trained neural models on 13 13 data. Specif- × × ically, we use the neural net model at epoch 100 in Figure 4.12 to initialize training on the 13 13 dataset. × Figures 4.13 and 4.14 compare the learning curves on the same 13 13 dataset ×

87 0.65 1 Test Accuracy with random init. Test error with random init. Test Accuracy with warm init. Test error with warm init. 0.6 0.9

0.55 MSE

Accuracy 0.8 0.5

0.45 0.7 0 20 40 60 80 100 0 20 40 60 80 100 Epoch number Epoch number

Figure 4.14: 13 13 test errors, with and without warm initialization. ×

Existing Games Neural Network Model Initialization

Neural Network Model

Gating Parallel PV-MCTS Workers Training

Parallel PV-MCTS Workers Training Selfplay Selfplay Game Buffer Game Buffer

(a) Closed-loop training for Olympiad (b) Zero-style closed-loop learning

Figure 4.15: Closed-loop learning schemes with warm initialization and starting from random weights. Notice that learning benefits from warm initialization: the neural net learns faster and achieves better accuracy. This strength gain is amplified with search: when used with the MCTS of the previous section, the resulting player quickly passes 70% wins against MoHex 2.0: at epoch 10, the win-rate is already 72.4%. By comparison, Table 4.4 shows that with random weights initialization this took more than 40 epochs.

4.6 Closed Loop Training with 3HNN

We now present AlphaGo Master, AlphaGo Zero and AlphaZero style closed loop training with 3HNN. These can be summarized as the flowcharts in Figure 4.15.

4.6.1 Training For 2018 Computer Olympiad

To prepare for the 2018 Hex tournament, we conducted a closed loop training on 13 13 and 11 11 Hex, following the scheme in Figure 4.15a. × ×

88 Figure 4.16 shows the testing results of the saved neural net models tested on games of Maciej Celuch, the best human player on the website little golem [124]. Due to limited computation power and our uncertainty about key parameter choices, these results are noisy as the experiment was not conducted in a fully automatic manner. Some intentional or accidental interruptions happened during the process around one months of training. For example, we thought a 10 block ResNet with 32 3 3 filters per layer would be large enough for training on 13 13, but we did not × × see obvious playing strength increases from iteration 2 to 3; we then increased the network to 64 filters and observed some improvement. Then, two iterations later, because of the slower speed, we changed back to 32 filters for another 2 iterations. These unsatisfying results eventually pushed us to use 128 filters per layer, and then we saw significant improvement in MoHex-3HNN. Finally, we used a 128 filters per layer 10-block three-head ResNet for participating in the 2018 Computer Hex tournament. The other parameters we were uncertain about and had switched between a few choices include:

Whether to use RAVE. •

PUCT constant cpb. • How many MCTS iterations to use for selfplay. • How many epochs to train the neural network given recently generated selfplay • games.

AlphaGo Fan [180] used a PUCT constant of 5.0 and no RAVE. We initially considered such a choice but did not observe better performance. We tried to tune the PUCT constant, and eventually found that the default 2.47 with RAVE worked best. This is consistent with later empirical analysis reported for ELF OpenGo [201]. We tried different numbers of MCTS iterations for selfplay game generation varying from 1000 to 10000, and different epochs for training the neural networks on each set of newly generated sample games (one epoch means a full sweep of the set of training dataset). We had to trade-off between game quality and speed of learning because the goal was to obtain strong player in time for the competition. On the 11 11 × board, a similar approach was used on another contributed 1 GPU computer; the starting neural network was the same as for 13 13. Therefore, in the first iteration, × 1Thanks Mengliao Wang 89 0.6 2 0.58 On final positions Q-head on final positions On all positions 1.8 V-head on final positions 0.56 Q-head on all positions 0.54 1.6 V-head on all positions 0.52 1.4 0.5 0.48 1.2 0.46 1 0.44 0.42 0.8

Accuracy (MSE) 0.4 0.6 0.38 0.36 Mean Square Error0 (MSE) .4 0.34 0.2 0.32 0.3 0 0 1 2 3 4 5 6 7 8 9 10 11 12 0 1 2 3 4 5 6 7 8 9 10 11 12 Iteration Iteration

Figure 4.16: On 13 13 Hex, around 10 training iterations was finished before partic- × ipating 2018 Computer Olympiad Hex tournament, using 2 4-core GPU computers with GTX 1080 and GTX 1080Ti. 3HNN was initialized using MoHex generated data. The first three iteration used 32 filters per layer, 4–5 used 64 filters per layer, 6–7 used 32 filters per layer, 8–10 used 128 filters per layer. The curve shows MSE or accuracy on all games played by Maciej Celuch from little golem. Celuch is pre- sumably the strongest human Hex player [124]; we dumped 620 games played by 2018 December. Noticeably, he did not lose any of these games. random playout rather than a value net was used for evaluating leaf nodes. In total, for 13 13 Hex, a total number of 0.48 million selfplay games were produced before × participating the Computer Olympiad. Our competitor DeepEzo [195] used a variant of minimax that adopts a policy and value network, trained iteratively for 2.6 million of games [102, 195]. DeepEzo achieved over 80% win-rate against MoHex 2.0 on 13 13 Hex. In the competition, × MoHex-3HNN defeated DeepEzo 7–0 on 13 13 and 5–0 on 11 11. See [67]. × × Finally, we present statistics in our selfplay games for the winning probability for each opening on 13 13 and 11 11 Hex. × × 4.6.2 Zero-style Learning

To further verify the performance of 3HNN in zero-style training, considering that automatically running the iterative procedure could be computationally intensive, we conducted the Hex experiment on relatively small board size 9 9. Following the × flowchart of Figure 4.15b, we used these configurations:

For MCTS, we turn off playout and knowledge computation. We leave the in- • tree selection formula with RAVE unchanged. To be consistent with AlphaGo Zero and AlphaZero, we add Dirichlet noise to prior probability provided by neural net: p(s, a) = (1 )p(s, a) + Dirichlet(α) where  = 0.25, α = 0.03. − 90 Figure 4.17: Averaged win value for each opening cell on 13 13 and 11 11 Hex. × × The numbers are consistent to some common belief in Hex; for example, every cell more than two-row away from the border is likely to be a Black win.

91 The default expansion threshold of 10 is used.

For each selfplay game, at each state, we conduct a nmcts = 800 iteration • MCTS. For the first ndither moves in a game, after search terminates, randomly N(s,a)1/τ sample move a P N(s,b)1/τ . We set ndither to 10 due to short move sequence ∝ b in 9 9 Hex. × For neural net, we use the 10-block three-head ResNet as in Figure 4.6. • We use 60 selfplay workers; each produces 200 games per iteration. • We use synchronous training. At each iteration, the training worker collects • all games from all selfplay workers, and then re-optimizes the neural net with stochastic gradient descent with momentum 0.9. The initial learning rate is 0.005, decayed by factor 0.95 each iteration. Training is with a mini-batch size of 128, using the following loss function:

 X 2 2 L(fθ; ) = (zs v(s)) + (zs + q(s, a)) D − | {z } (s,a,zs)∈D R1 max(−zs, 0) X 0 2 + (zs + q(s, a )) |A(s)| a0∈A(s) | {z } R2 !  T 2 + (min q(s, a0) + v(s))2 π (s) log p(s) + c θ a0 − || || | {z } R3

Here, contains the set of 200 games generated at each iteration; for each , D D we optimize the neural net for 5 epochs, i.e., sweeping for 5 times. R1, R2 D and R3 are the three newly added loss terms for the loss used in AlphaZero.

On 9 9 Hex for MoHex 2.0, around 10 seconds per move generates strong play- × ing, but there is no guarantee that these data used for testing is error-free. To further see the result, we randomly produce a set of game states and using solver to perfectly label these positions; we obtained 8485 examples. See Figure 4.18 for the comparison in detail. From Figure 4.18 we see that AlphaZero-3HNN learned faster than AlphaZero-2HNN. The high fluctuations of the curve from AlphaZero-2HNN is due to its slow learning. We do not use selfplay Elo to measure learning progress primarily because of the inflation phenomenon [201]. The major reason for the slow learning of AlphaZero-2HNN in Figure 4.18 is that setting nmcts = 800 makes the generation of a training game time-consuming. 92 Zero-style training on 9 9 Hex, test on 10s-MoHex2.0 generated game dataset Zero-style training on 9 9 Hex, test on 10s-MoHex2.0 generated game dataset × × 1.4 0.6 3HNN: action-value head 3HNN: policy head 1.3 2HNN: state-value head 0.55 2HNN: policy head 3HNN: state-value head 1.2 0.5 1.1 0.45 1

0.9 0.4

0.8 0.35

0.7 0.3 0.6

Prediction0 Accuracy .25 0.5 Mean Square Error (MSE) 0.2 0.4

0.3 0.15

0.2 0.1 0 10 20 30 40 50 60 70 80 90 100 0 10 20 30 40 50 60 70 80 90 100 Time (h) Time(h) Zero-style training on 9 9 Hex, test on perfect examples × 1.4 3HNN: action-value head 1.3 3HNN: state-value head 2HNN: state-value head 1.2

1.1

1

0.9

0.8

0.7

0.6

0.5 Mean Square Error (MSE) 0.4

0.3

0.2 0 10 20 30 40 50 60 70 80 90 100 Time (h)

Figure 4.18: On 9 9 Hex zero-style training with MoHex3HNN and MoHex2HNN. × Test the neural network model on MoHex2.0 selfplay games or randomly generated perfectly labeled game states. Red curve is result obtained under the same config- uration except that a 2HNN is used. The above two plots were tested on a test set of 149362 examples, while the last one was from a smaller set of 8485 examples.

We therefore conducted another experiment by reducing nmcts = 160 and run AlphaZero-2HNN and AlphaZero-3HNN. To further encourage exploration, we changed the Dirichlet noise parameter α to 0.15, ndither = 30, and in-tree formula to PUCT with cpuct = 1.5. Figure 4.19 contains the comparison: AlphaZero3HNN did not achieve better prediction accuracy on the test data. To see if the learned 3HNN is really weaker than 2HNN, we conduct a head-to-head match between the networks generated at each iteration. Figure 4.20 contains the match results, which indi- cate that MCTS-3HNN is mostly stronger than MCTS-2HNN. Same as for training, in our test, we turn off the knowledge computation for MCTS-2HNN and MCTS- 3HNN. We suspect that one reason that 3HNN failed to achieve smaller state-value prediction error is due to the R2 term in the loss function. By injecting R2, we

93 Zero-style training on 9 9 Hex, test on near-perfect play generated game dataset Zero-style training on 9 9 Hex, test on near-perfect play generated game dataset × × 1.4 0.6 3HNN: action-value head 1.3 3HNN: state-value head 0.55 2HNN: state-value head 1.2

1.1 0.5

1 0.45 0.9

0.8 0.4

0.7 0.35 0.6 Prediction Accuracy 0.5 0.3 Mean Square Error (MSE) 0.4 0.25 3HNN: policy head 0.3 2HNN: policy head 0.2 0.2 0 10 20 30 40 50 60 70 0 10 20 30 40 50 60 70 Iteration Iteration Zero-style training on 9 9 Hex, test on perfect examples × 1.4 3HNN: action-value head 1.3 3HNN: state-value head 2HNN: state-value head 1.2

1.1

1

0.9

0.8

0.7

0.6

0.5 Mean Square Error (MSE) 0.4

0.3

0.2 0 10 20 30 40 50 60 70 Iteration

Figure 4.19: AlphaZero-2HNN versus AlphaZero-3HNN. 2HNN uses expansion threshold 0, nmcts = 160; 3HNN uses expansion threshold 10, nmcts = 800; all other parameters are the same. assume the action made by MCTS player is the best action. However, this is not true because of the dithered action selection mechanism (i.e., ndither = 30). To see the effect of R2, we run another AlphaZero-3HNN by removing R2. Figure 4.21 shows the result along with comparison to AlphaZero-2HNN: without R2, AlphaZero- 3HNN obtained neural network models with smaller state-value prediction errors on both test sets. Figure 4.22 shows the head-to-head match results between MCTS-3HNN and MCTS-2HNN, where the 3HNN models were obtained without using R2 in training. MCTS-3HNN defeated MCTS-2HNN as learning goes on. This is consistent to the learning curves in Figure 4.21. In addition, we test our final iteration models of 3HNN and 2HNN by playing against MoHex 2.0. The same as in [9], we let MCTS-2HNN and MCTS-3NN use

nmcts = 800, and MoHex 2.0 use nmcts = 10000. MCTS-3HNN and MCTS-2HNN respectively achieved mean win-rates 91.57 1.8 and 89.7 3.317 (both with 95% ± ± 94 70 68 MCTS-3HNN-1s-e10 vs MCTS-2HNN-1s-e0 66 MCTS-3HNN-160 vs MCTS-2HNN-160 64 50% 62 60 58 56 54

Win (%) 52 50 48 46 44 42 40 10 20 30 40 50 60 70 80 Iteration Figure 4.20: MCTS-3HNN against MCTS-2HNN. Each match consists of 162 games by letting each player starting from each opening cell once as Black and White. MCTS-3HNN-160 means it uses 3HNN for 160 iteration MCTS with expansion threshold of 0. MCTS-2HNN-160 means it uses 2HNN for 160 iteration MCTS and expansion threshold of 0. MCTS-3HNN-1s-e10 means the player uses 1s per move with expansion threshold of 0. MCTS-2HNN-1s-e0 means it uses 1s per move with expansion threshold of 0. confidence); see Table 4.7. In comparison, the Zero implementation PGS-EXIT in [9] achieved a win-rate of 58% against MoHex2.0-10000 for 800-simulation search. This indicates the high quality of our AlphaZero implementation for both 3HNN and 2HNN. Table 4.7: Detailed match results against MoHex 2.0. Each set of games were played by iterating all opening moves; each opening is tried twice with the competi- tor starts first and second, therefore each match consists of 162 games. We use the final iteration-80 models for 2HNN and 3HNN from Figure 4.21. The overall results are calculated with 95% confidence. MCTS-2HNN and MCTS-3HNN used 800 sim- ulations per move with expand threshold of 0. MoHex 2.0 used default setting with 10,000 simulations per move.

Set 1 2 3 Overall Player MCTS-2HNN-800 86.4% 92.0% 90.7% 89.7% 3.32 ± MCTS-3HNN-800 93.2% 91.4% 90.1% 91.57 1.77 ±

The results in this section were obtained on a server with 56 Intel(R) Xeon(R) CPUs (E5-2690 v4 2.60GHz), 500GB RAM, with 6 Tesla P100 GPU each with 16 GB RAM 2. 2Thanks Huawei Canada for providing the computation resource

95 Zero-style training on 9 9 Hex, test on near-perfect play generated game dataset Zero-style training on 9 9 Hex, test on near-perfect play generated game dataset × × 1.4 0.6

1.3 3HNN: action-value head 0.58 3HNN: state-value head 0.56 1.2 2HNN: state-value head 0.54 1.1 0.52 1 0.5

0.9 0.48 0.46 0.8 0.44 0.7 0.42 0.6 0.4 0.38 0.5 Prediction Accuracy 0.36

Mean Square0 Error.4 (MSE) 0.34 3HNN: policy head 0.3 0.32 2HNN: policy head 0.2 0.3 0 10 20 30 40 50 60 70 80 0 10 20 30 40 50 60 70 80 Iteration Iteration Zero-style training on 9 9 Hex, test on perfect examples × 1.4

1.3 3HNN: action-value head 3HNN: state-value head 1.2 2HNN: state-value head 1.1

1

0.9

0.8

0.7

0.6

0.5

Mean Square0 Error.4 (MSE)

0.3

0.2 0 10 20 30 40 50 60 70 80 Iteration Figure 4.21: AlphaZero-2HNN versus AlphaZero-3HNN. 2HNN used expansion threshold 0, nmcts = 160; 3HNN used expansion threshold 10, nmcts = 800; all other parameters same.

4.7 Discussion

In this chapter, we provided a thorough review of recent advances in the game of Go. We developed a new three-head network and applied it to the game of Hex, and obtained significant playing strength improvement over the previous state-of- the-art. In the 2018 Computer Hex tournament, our player convincingly defeated a strong competitor which uses a two-head deep neural networks for move selection and game state evaluation. We further investigated the merit of the three-head network in zero-style learning, and showed that it can lead to stronger 9 9 Hex × player. There have been other research explorations on improving the efficiency of AlphaZero learning, e.g., using multiple small and large networks in the tree search [115]. Combining these improvements with three-head network might lead to further improvement.

96 70 MCTS-3HNN-160 vs MCTS-2HNN-160 65 50%

60

55

50

Win (%) 45

40

35

30 5 10 15 20 25 30 35 40 45 50 55 60 65 70 Iteration Figure 4.22: MCTS-3HNN against MCTS-2HNN. Each match consists of 162 games by letting each player starting from each opening cell once as Black and White.

97 Chapter 5

Solving Hex with Deep Neural Networks

In this chapter we focus on solving Hex with the help of deep neural networks.

5.1 Focused Proof Number Search for Solving Hex

As discussed in Chapter 2.4.5, all 8 8 Hex openings were first solved by Henderson × et al. [89] after extending the inferior cell analysis of Hex positions [82], adding a better implementation of H-search, and using DFS for exhaustive search. Focused Depth First Proof Number Search (FDFPN) [87] performed better than DFS with backtracking and straightforward DFPN. We can view FDFPN as a combination of best-first (BF) and backtracking (BT) search strategies (see BF-BT combinations in [153]). At each node, by setting a search window that is smaller than the branch- ing factor, FDFPN behaves locally like normal BF, while globally it looks like DFS with BT. DFS with BT works well when there is an accurate move ordering, but fails disastrously if the move ranking is poor, while BF ranks the potential of each frontier node based all nodes explored so far, so the combination of BF and BT enables the search to harness the strength of a good move ordering function while preserving its basic best-first principle. With proof number search (PNS), reducing the width to a small fixed value may not be helpful, because PNS exploits the non-uniform branching factor for selecting a most promising node to expand. When expanding a node, FDFPN uses the following formula to select a subset of child nodes to the search:

child limit = base + factor live children (5.1) d × | |e Here, base 1, and the widening factor 0 < factor < 1; live children is a set that ≥ 98 A

B C D E F G

A

D E F G C B

A

D F G C B

Figure 5.1: Focused Best-first Search uses two external parameters: a function fR for ranking moves, and a way for deciding window size. Here, assume fR(D) > fR(E) > fR(F ) > fR(G) > fR(C) > fR(B), and the window size is simply set to a fixed number of 4. E is found to be winning, then E is removed, and the next best node is added to the search window. contains all child nodes that have not been solved. Once a node x in live children is found to be winning (corresponding to a losing move from its parent), it is removed. Then, FDFPN either maintains the same child limit or introduces a new child to the search. If x is losing, it is clear that other child nodes can be ignored since its parent node becomes solved as a win. Figure 5.1 shows the scheme behind FDFPN. The strength of Equation (5.1) comes from the observation that assuming node n has N child nodes, and k nodes are proved winning before a losing child is found, then at most max(0,N k base factor (N k) ) child nodes are ignored by − − − d × − e the BF search. However, the potential drawback is that when the move ordering

function is poor, i.e., when the losing child node x is given a rank kx and kx > lmax,

where lmax is the maximum index of the selected child nodes when expanding their parent, then at least kx−b−factor×N + 1 nodes must be solved as winning before b 1−factor c including x to the search. To see why, assume that at least n nodes are solved

before including x to search: then n + b + factor (N n) kx, thus n is at least × − ≥ kx−b−factor×N + 1. b 1−factor c So, the effectiveness of FDFPN depends on two features: (1) the computation of widening size and (2) the use of an external move ordering function for accurately

99 selecting promising moves. In [87], (1) was manually set to a fixed factor, and (2) was achieved with the resistance-based heuristic evaluation [87] function in Hex. For (1), it is uninformative to adopt a static widening factor for all expanding nodes regardless their likelihood of being a win or a loss; for (2), the resistance-based evaluation function, even with ICA and H-search [151], is often inaccurate, as noted in [87]. Deep convolutional neural networks (CNNs) [114, 119] have been successfully used in a variety of domains for providing reliable heuristic knowledge. Motivated by their success in game-playing [180, 184], as we have seen in Chapter 4, we show how to improve FDFPN using policy and value neural networks.

5.2 Focused Proof Number Search with Deep Neural Networks

The move ordering function in FDFPN is only used to select a set of promising moves in the search tree: The strict order of the selected moves does not matter much since proof number search uses proof or disproof numbers to choose a most proving node (MPN). Trained policy networks can provide reliable move selection by mimicking expert play [125]; therefore, our first modification of FDFPN uses such a policy network to replace resistance as the move ordering function. The next question is how many promising moves should be selected. A widening factor that is too large leads to decreased performance, while a factor that is too small increases the likelihood of initially missing an existing winning move. Ideally, the widening size should also be a function of value of the expanding state, which can be estimated by a value network. Hence our second modification is this: whenever expanding a state s, pick the widening size of s by the information provided by a value net vθ(s). Specifically, for state s, suppose vθ(s) = 1 indicates a loss and +1 − a win, the revised formula is as follows:

l(s) = base + f (s) live children (5.2) d × | |e where f (s) is defined as ( min factor, 1 + vθ(s) if vθ(s) < 0 f (s) = { } . (5.3) factor otherwise As in Eq. (5.1), factor is a parameter. This minor revision causes only one difference: when vθ(s) is close to 1, the smaller noisy estimation value will be − 100 used as the widening factor. So the smaller the estimation, the fewer child nodes are selected. This may seem counter-intuitive, since if a state s has an optimal value of 1, then eventually all its children must be solved to prove that s is losing. − However, a losing node corresponds to a winning move for its parent state. Proof number search prefers nodes with small branching factor, therefore giving a losing node a small widening size makes its parent favor this node, which is desirable. On the other side, if a state is losing, all child nodes must be solved, and it is better to solve them consecutively rather than letting the search jump around. The algorithm for FDFPN with CNNs is sketched in Algorithm 4. In practice, the TTStore function may have collision (resulting in failure for future TTLookup), a collision resolution as in [149] is adopted. The revision lies on line 12 where policy and value networks are called. Results are then used to set search window size and select promising moves.

5.3 Results on 8 8 Hex × 5.3.1 Preparation of Data

This 8 8 board size dataset was used in Chapter 4.5. Here, we describe how it is pro- × duced. We generated the expert games by playing tournaments between Olympiad computer players in the Benzene project: Mohex 2.0 [96], MoHex 2011 [11] and Wolve [13]. To enrich this data, (1) we conducted tournaments with randomly se- lected time settings each move (5s to 10s); (2) we iterated all one or two stone openings; and (3) neither player used solver during play. We collected 62853 games in total. Each game includes win/loss game result. No swap rule and no resign are used. Extracting these games gives 6.5 105 distinct state-action (s, a) and state- × value (s, z) pairs. A position s can appear in two or more games but be associated with different win/loss results: we thus use the average to label these states. Let A be the set of games that s have been played, and r(s, g) is the playing result (either P r(s,g) +1 or 1) for game g: we label s by z = g∈A . The dataset is split into 90%, − |A| 10% training and test sets.

5.3.2 Policy and Value Neural Networks

We use separate policy and value networks in our experimentation. We use a policy network with 5 hidden layers. After padding borders, the input has a size of 10 × 10 5. The first layer uses 48 3 3 filters with stride of 1, and then applies ReLU. × × 101 Algorithm 4: Focused depth-first proof number search with external pol- icy function pσ and value function vθ. Input: A board position Result: +1 or -1 1 Procedure FDFPN-CNN(root) 2 root.φ, root.δ ← ∞, ∞ 3 MID(root) 4 if root.φ = 0 then return +1 5 else return −1 6 Function MID(s) 7 TTLookup(s) 8 if node is terminal then 9 Evaluate(s) 10 TTStore(s) 11 return 12 Compute child limit l(s) by vθ, pσ 13 while s.φ > δMin(s, l) and s.δ > φSum(s, l) do 14 child, δ2 ← SelectChild(s, l) 15 child.φ ← s.δ − φSum(s, l) + child.φ 16 child.δ ← min{s.φ, δ2 + 1} 17 MID(child) 18 if child.φ = 0 then 19 Prune child 20 Select a new child if necessary 21 end 22 end 23 s.φ ← δMin(s, l) 24 s.δ ← φSum(s, l) 25 TTStore(s) 26 Function SelectChild(s, l) 27 δbest ← δ2 ← ∞ 28 cbest ← nil 29 foreach c ∈ l(s) do 30 TTLookup(c) 31 if c.δ < δbest then 32 cbest ← c 33 δ2 ← δbest 34 δbest ← c.δ 35 end 36 else if c.δ < δ2 then 37 δ2 ← c.δ 38 end 39 end 40 return cbest, δ2 41 Function δMin(s, l) 42 minδ ← ∞ 43 foreach c ∈ l(s) do 44 TTLookup(c) 45 minδ = min{minδ, c.δ} 46 end 47 return minδ 48 Function φSum(s, l) 49 sumφ ← 0 50 foreach c ∈ l(s) do 51 TTLookup(c) 52 sumφ = sumφ + c.φ 53 end 54 return sumφ

102 100

95

90

85

80 (%) 75

Accuracy 70

65

60

55

50 0 1 2 3 4 5 6 7 8 9 10 11 k

Figure 5.2: Top-k prediction accuracy of the policy network on 8 8 Hex. ×

The second layer zero pads an image into 10 10 and convolves using 3 3 filters × × with stride of 1, and then applies ReLU. Hidden layers 3 and 4 repeat the process of layer 2 . Hidden layer 5 convolves with 1 1 kernel size filters with 64 biases for × each cell. The final layer is a softmax function. We train the policy network to maximize the log-likelihood of move a at state s with ∆σ ∂ log pσ(a|s) . Using Adam optimizer [104] with a mini-batch of size 64, ∝ ∂σ we train the network for 2 105 steps. The top one prediction accuracy on the test × and training data are respectively 60.86% and 67%. Figure 5.2 contains the top-k prediction results. It shows that the top-10 accuracy exceeds 95%. The first 6 layers of our value network are exactly the same as the policy network. After layer 6, we use one kernel size 1 1 filters with stride 1. We then use a layer × with 48 fully-connected units. The final layer uses a tanh function to compress the output to [ 1, +1]. − We train the value network to minimize the mean square error (MSE) between

∂vθ(s) the predicted value vθ(s) and label value z, i.e., ∆θ (z vθ(s)). Using Adam ∝ ∂θ − optimizer [104], with a mini-batch of 64, we train the value network for 1.6 105 × steps. The obtained model achieves a MSE of 0.067 on the training set, and 0.083 on the test set.

5.3.3 Empirical Comparison of DFPN, FDFPN and FDFPN-CNN

As before, we built FDFPN-CNN upon the FDFPN [151] solver inside of Benzene. We use the serial version, accessible from https://github.com/jakubpawlewicz/ benzene/tree/jvc2. After training, the network models are exported as static graph models for solely forward-inference in C++. Two parallel calls were used for

103 neural network inference while doing knowledge computation. As suggested in [87], base is set to 1 and widening factor is set to 0.2 for FDFPN. We also prepared a DFPN that includes all child nodes to search at each node ex- pansion. The DFPN is implemented on the same codebase by removing the focused behavior of FDFPN in Benzene. To see the effect of adding the neural network models in search, we use the same factor = 0.2, base = 1 for FDFPN-CNN. We now compare the performance of DFPN, FDFPN and FDFPN-CNN on solving 8 8 Hex openings. × Table 5.1: DFPN, FDFPN and FDFPN-CNN results for 8 8 Hex. The best results × are marked by boldface. Results were obtained on the same machine. Computation time was rounded to seconds.

Opening DFPN FDFPN FDFPN-CNN #node time #node time #node time a1 47383 240 30462 71 12063 46 a2 264718 489 104581 161 61900 116 a3 370973 1350 212140 486 103940 275 a4 1418942 4482 570207 1167 217130 477 a5 3929824 11128 1797393 3377 1226009 2856 a6 525308 1177 272614 473 295163 474 a7 3230008 10067 2874465 5496 1058361 2615 a8 1408664 4024 844403 1799 572892 1300 b1 49607 265 30317 73 12490 43 b2 204342 421 123728 140 48074 82 b3 89920 405 39683 115 19077 82 b4 541376 2003 270571 565 186693 455 b5 463360 1974 193799 526 100489 368 b6 563084 1808 497961 931 223486 576 b7 19182 53 12146 25 6601 16 b8 54590 201 15106 47 12150 43 c1 46926 226 28775 58 10645 38 c2 128112 287 43899 74 118384 133 c3 125365 521 56597 154 25819 82 c4 56951 281 18687 54 11618 54 c5 54388 271 28816 85 17600 63 c6 41018 205 30637 88 10536 45 c7 93673 278 60134 114 42389 90 c8 40562 183 14157 40 20327 63 d1 44295 192 14858 35 10649 34 d2 51451 147 89900 140 16309 37 d3 62776 269 24926 75 12177 48 d4 20368 94 13265 43 8469 35 d5 13821 85 6914 22 4068 20 d6 63014 345 93313 256 13876 59 d7 63574 192 79214 137 40337 87 d8 44634 198 15472 37 14363 40 SUM 14132209 43861 8509140 16864 4534084 10752

104 Table 5.1 shows the results of DFPN, FDFPN and FDFPN-CNN for solving all 32 8 8 Hex openings. Both FDFPN and FDFPN-CNN solved every opening faster × than DFPN — the cumulative time of DFPN is respectively 2.6 and 4.1 times larger than those of FDFPN and FDFPN-CNN. This indicates the focused search has been beneficial. FDFPN-CNN solved most opening positions faster than FDFPN. The largest improvement was on opening a7 where FDFPN-CNN used computation time 2614s, instead of 5496s for FDFPN. FDFPN-CNN solved all 32 openings with cumula- tive time 64% that of FDFPN. The total number of node expansions of DFPN, FDFPN and FDFPN-CNN are respectively 14132209, 8509140 and 4534084. There- fore, FDFPN-CNN expanded 53% nodes of FDFPN for all 32 openings. FDFPN- CNN did not perform better than FDFPN on every case: on openings a6, c2, c8, FDFPN-CNN’s results are worse than those of FDFPN.

Effectiveness of the policy neural network

To further show the advantage of the policy net over the resistance evaluation, we conducted an ablation comparison between FDFPN and FDFPN-pσ, a program that replaces the move ordering function with the policy net model pσ. Using base = 1, we varied the widening factors 0.1 to 1.0 at an interval of 0.1. Figure 5.3 shows the results with varying factor values, where FDFPN-resistance is the original FDFPN, FDFPN-pσ uses the policy net. Both programs achieved best performance with widening factor of 0.2; however, FDFPN-pσ used less time than

FDFPN. This is true for all widening factors below 0.5. FDFPN-pσ becomes slight worse than FDFPN when the widening factor is larger than 0.7. Perhaps that the policy net was not trained to rank all moves but to predict the strong moves played by expert players. Overall, FDFPN-pσ is 17.3% faster than FDFPN with the best factor of 0.2. In terms of node expansion, it is 14.3% less than FDFPN. The results in Figure 5.3 confirm that with small widening factors, which are desirable for focused search, the policy net is more reliable at selecting promising moves than the resistance evaluation function. Although FDFPN-CNN enabled faster solving, an exponential reduction in search space was not observed. This is because the neural network knowledge, unlike these from inferior cells, are essentially heuristic. That is, they can provide more accurate guidance to the search, but do not lead any exponential pruning in the state-space.

105 45000 FDFPN-resistance FDFPN-pσ 40000

35000

30000 time 25000

20000

15000

10000 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 widening factor

Figure 5.3: A comparison between the performance of FDPFN using resistance and policy network pσ with varying factor.

5.3.4 Using Three-Head ResNet

Although the value network helped solving 8 8 Hex openings faster, research progress × in Go [180, 184] showed that a single-head value net can exhibit overfitting while by forcing policy and value share one ResNet achieved better results. We conjecture that similar overfitting could happen in Hex when using a single-head value network, and using a three-head architecture may alleviate the problem. To verify this, we prepared a set of randomly generated 8 8 positions, and then use our solver to label × the true value of these examples. We generated 58590 such positions in total. See Figure 5.4 for the training results. With the single-head value network, the training value error was quickly reduced to almost 0, so did the test error on examples split from the same game dataset for training, but test result on the randomly generated perfect examples implies that such value net has poor generalization. With three- head network of the same size, the generalization error was reduced. In the latter case, to improve generalization, a simple and effective approach is enlarging the net- work size — as shown in Figure 5.4 (right) after doubling the layers in three-head net, test error on randomly generated examples decreased. So far there are no satisfying theoretical results on how deep neural networks generalize. It has been shown that deep network can fit randomly generated data just as well as natural images [220]. These deep models also have great memoriza- tion capacity [14]. Empirical evidence [98] suggests that using one deep network with auxiliary learning tasks often achieve good results. The three-head ResNet adds two auxiliary tasks (policy and action-value) for the state-value learning. Fig- ures 5.4 shows that the auxiliary tasks indeed helped the neural net to achieve better generalization.

[Figure 5.4 plots: mean square error (MSE) versus epoch number for the single-head value network (6 layers, 48 filters per layer, left) and the three-head network (6 or 12 layers, 48 filters per layer, right); curves show training error, test error on examples split from the game dataset, and test error on randomly generated perfect examples.]

Figure 5.4: Results of value training with single-head versus three-head architectures. At each epoch, the neural net model is saved and evaluated on each dataset. In the right figure, the cyan line is produced by doubling the neural network size to 12 layers. One epoch is a full sweep of the training data.

5.4 Solving by Strategy Decomposition

We saw in Section 2.1 that a fundamental difficulty in solving Hex is that a strategy representation in a state-space graph can be overwhelmingly large, while a decomposition-based representation can reduce the solution size. To see this further in Hex, consider the example in Figure 5.5. Assume that we do not use H-search, and that our goal is to find the solution for this Hex position. Further assume that we can access an oracle, which always provides a winning move for the Black player if there is one. To solve this position by state-space search, we can expand the tree as follows:

• 1-ply: Enumerate all White moves.
• 2-ply: For each White move, use the oracle to obtain a Black winning move.
• 3-ply: For each frontier node, enumerate all White moves.
• 4-ply: For each frontier node, use the oracle to obtain a Black winning move.
• 5-ply: For each frontier node, enumerate all White moves.
• 6-ply: For each frontier node, use the oracle to obtain a Black winning move.
• 7-ply: For each frontier node, enumerate all White moves.
• 8-ply: For each frontier node, use the oracle to find a Black winning chain.

Figure 5.5: A Hex position; White is to play. The optimal value of this position is a Black win. Given an oracle machine that can always predict the winning move for Black, searching for the solution represented by a solution-tree in the state-space graph requires examining 44 · 1 · 42 · 1 · 40 · 1 · 38 · 1 ≈ 2.8 million nodes.
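A quick arithmetic check of the node count in the caption (a trivial computation, not taken from the thesis):

```python
# Product of the alternating branching factors: 44·1·42·1·40·1·38·1.
print(44 * 1 * 42 * 1 * 40 * 1 * 38 * 1)   # 2808960, roughly 2.8 million
```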

Since Black can surely make a winning chain using 4 Black stones, the above process takes about 2.8 million steps (44 · 1 · 42 · 1 · 40 · 1 · 38 · 1). So, even with error-free guessing, solving Hex positions such as the one in Figure 5.5 quickly becomes intractable as the number of empty cells increases. We saw in Chapter 4 that algorithms similar to AlphaZero can attain formidable playing strength. However, these players remain heuristic: there is no theoretical guarantee that their move prediction will converge to provably optimal play. Conversely, even if we believe that an AlphaZero player has converged to optimal play, verifying in the state space that its play indeed yields a solution-graph could still take an unbearably long time. This observation calls for state compression techniques. Instead of representing the solution object in the state-space graph, Figure 5.6 shows how to decompose the solution to Figure 5.5 into a conjunction of 9 subproblems. The idea is to aggregate White moves and Black responses into 9 different categories. This means that what we are now interested in is not a move predictor but a family of set-function predictors. The argument of each function f_i is a set S_i of White moves, and the output of f_i is a single Black move. The requirement is that the arguments of these mappings, in union, form the set of all legal White moves. We call any family of mappings satisfying this property a decomposition.

Figure 5.6: Decomposition-based solution to the Hex position in Figure 5.5. The solution to the original problem can be represented by a conjunction of 9 subproblems; each of them can be decomposed further using a similar scheme. Since it is known that Black can win using 4 black stones, the solution-graph found by such a decomposition contains at most 9^4 = 6561 nodes.

A decomposition prediction can be emulated by a move predictor. This idea is described in Algorithm 5, where α is a given move predictor (for example, one obtained by running AlphaZero on the game of Hex). Its utility in Algorithm 5 is that, for any given position s and player to play P, it returns a best move for P (or a vector of scores for each move, from which the best move can be selected). The idea is that for each P move, we use α to obtain the best P̄ response. Then we aggregate those P moves that share the same P̄ response into a single group whose key is the corresponding P̄ move. The problem then boils down to finding a set of P̄ keys whose groups in union cover all P moves in s.

We now describe a solving algorithm that searches for a feasible decomposition-based solution-graph. The procedure is depicted in Algorithm 6. After predicting a decomposition, the do function carries out the decomposition: it tries to play a set T of P moves and the P̄ response move k, where the requirement is that playing the set T of P moves must not overturn the game into a P win; again, this can be checked by referring to the move predictor α. In the worst case, each T would contain a single P move, so that the solution-graph found by decomposition would be exactly the same size as in conventional state-space search.

Algorithm 5: Decomposition prediction by move predictor α
 1  Function Predict-Decomposition(s, P, α):
        // s: game position; P: player to play; α: move predictor
        /* For each P move, record the best response. */
 2      Let A be the set of moves available at s for P
 3      d ← {}                           /* a dictionary */
 4      for a ∈ A do
 5          s′ ← play(s, a, P)           /* play a at s */
 6          x ← α(s′, P̄)                 /* P̄ denotes the opponent of P */
 7          d[x] ← d[x] ∪ {a}
 8      end
        /* Each value in d is a subset of P moves. */
 9      U ← A
10      F ← {}                           /* a dictionary */
11      A′ ← sorted(d.keys())
12      for a ∈ A′ do
13          U ← U \ d[a]
14          F[a] ← d[a]
15          if U = ∅ then break
16      end
17      return F
18  End of Function

However, as Figure 5.6 indicates, in Hex the size of the space reduction can be huge, since in many cases letting the losing player play multiple moves simultaneously does not alter the game result. Formally, suppose that in the state space there is a solution-graph of size Ω(b^{d/2}); with the decomposition-based representation, it is guaranteed that a solution-graph of size Ω(c^{d/2}) with c ≤ b exists. In Hex, c can be much smaller than b, as in Figure 5.6. In the literature, there has been similar work trying to establish proofs for different games using strategy-space search, e.g., [199, 213, 216]. The distinctive feature of the algorithm outlined here is that it does not rely on human expert knowledge, but instead uses a move predictor as a guide. The unspecified utility functions in Algorithm 6 are explained as follows: winner(s, player) checks whether s is a terminal position and returns its winner given the player to play; play(s, x, player) plays cell x (or a set of cells if x is a set) with the color of player at position s; value(s″, P, α) uses α to query the estimated value of position s″ with P to play. We do not empirically evaluate the merit of Algorithm 6, but we give a proof of its correctness. We assume that the Hex position to solve is denoted s, the player to play is P, and there is a winning strategy for P̄, the opponent of P.

Algorithm 6: Automatic search for a decomposition-based solution
Input: s: Hex position to solve; P: player to play at s; α: a move predictor
Result: if true, s is a P̄ win; otherwise unknown
 1  Function SolveByDecompose(s, P, α):
 2      if winner(s, P) ≠ Unknown then
            // For a successful decomposition, the winner must be P̄
 3          return winner(s, P) = P̄
 4      end
 5      F ← Predict-Decomposition(s, P, α)
 6      r ← true
 7      for f ∈ F do
 8          k, V ← key(f), values(f)
 9          T ← do(s, k, P̄, V, P)        /* carry out one step of the decomposition */
10          for s′ ∈ T do
11              r ← r ∧ SolveByDecompose(s′, P, α)
12          end
13          if r = false then return false
14      end
15      return true
16  End of Function
17  Function do(s, k, P̄, V, P, α):
18      T ← ∅
19      s′ ← play(s, k, P̄)
20      X ← ∅
21      for v ∈ V do
22          s″ ← play(s′, v, P)
23          if value(s″, P, α) = P̄ then
24              X ← X ∪ {v}
25          end
26          else
27              t ← play(s′, X, P)
28              T ← T ∪ {t}
29              X ← ∅
30          end
31      end
32      t ← play(s′, X, P)
33      T ← T ∪ {t}
34      return T
35  End of Function
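As a concrete rendering of Algorithm 5, the decomposition predictor used by Algorithm 6, here is a minimal Python sketch; play, alpha and opponent are assumed helper callables supplied by the surrounding program, not functions of the benzene codebase.

```python
# Minimal Python sketch of Algorithm 5 (Predict-Decomposition).
# play(s, a, p): position after p plays a at s (assumed helper).
# alpha(s, p):  predicted best move for p at s (assumed move predictor).
# opponent(p):  the other player (assumed helper).
from collections import defaultdict

def predict_decomposition(s, P, moves, play, alpha, opponent):
    d = defaultdict(set)                 # best P-bar response -> set of P moves
    for a in moves:                      # for each P move, record the best response
        x = alpha(play(s, a, P), opponent(P))
        d[x].add(a)
    uncovered, F = set(moves), {}        # F is the decomposition: response -> group
    for x in sorted(d):                  # greedily cover all P moves
        F[x] = d[x]
        uncovered -= d[x]
        if not uncovered:
            break
    return F
```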


Definition 1. Given a Hex position s with set of empty cells denoted M(s), a decomposition branch f is defined as a set function f: 2^{M(s)} → M(s).

Definition 2. For s, a decomposition is a collection of mappings f_1, f_2, ..., f_n with ∪_{i=1}^{n} argument(f_i) = M(s). That is, a decomposition is a feasible solution to a set-covering problem whose universe is M(s) and in which the argument of each f_i represents a subset. A decomposition splits the second-player winning strategy into a set of Hex positions, where each position is reached by referring to some f_i: fill the cells of argument(f_i) with P stones and the cell output(f_i) with a P̄ stone.

Observation 2. If SolveByDecompose in Algorithm 6 returns true, the solution-graph that it identifies suffices to represent a second-player winning strategy for Hex position s.

Proof. The only condition under which SolveByDecompose in Algorithm 6 can return true is that every node reached by a decomposition branch is true, i.e., the solution-graph is a pure AND graph. Since we assume the terminal-checking function winner(s, player) is correct, it follows that all nodes in the solution-graph are correctly labelled true, and thus so is the root node. Since Algorithm 5 guarantees that a decomposition covers all P moves in the given Hex position, the identified solution-graph must be a feasible solution for the root node.

Algorithm 5 is critical to the success of Algorithm 6, and the quality of α is critical to Algorithm 5. If α is an oracle, Algorithm 5 always identifies a correct decomposition. In practice, α can be approximated by training a player, for example with AlphaZero, before solving by decomposition. It is worth noting that the for loop in line 4 of Algorithm 5 is embarrassingly parallel; in practice, this property can be used to compute a decomposition faster, as sketched below. As the development of super-strong players with deep neural networks scales well with board size (given sufficient computational resources), searching for a decomposition-based solution using a strong player as guidance might be a promising direction for solving Hex positions on large boards.
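A rough illustration of that parallelism: the per-move best-response queries of Algorithm 5 (the loop over a ∈ A) are independent and can be issued concurrently. The sketch below uses a thread pool; batching all successor positions into one neural-net forward pass would typically be even more effective. As before, play, alpha and opponent are assumed helpers rather than thesis code.

```python
# Hypothetical sketch of parallelizing the response queries in Algorithm 5.
from concurrent.futures import ThreadPoolExecutor

def best_responses(s, P, moves, play, alpha, opponent, workers=4):
    """Map each P move to alpha's predicted best reply for the opponent."""
    def reply(a):
        return a, alpha(play(s, a, P), opponent(P))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(reply, moves))
```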

Chapter 6

Conclusions and Future Work

This thesis makes a number of contributions to computational and algorithmic approaches to Hex. New search and learning algorithms that use advances in deep neural networks have been developed and applied to the game, and state-of-the-art solving and playing programs have been created as a result of these algorithmic innovations. Many challenges remain, and this research sheds light on some important future research directions for the game of Hex. The transformation brought by AlphaGo and its successors has led some AI researchers to opine that future research in two-player alternate-turn zero-sum perfect-information games is inconsequential [39]. However, AlphaGo [180], AlphaGo Zero [184] and AlphaZero [182] are essentially heuristic algorithms without theoretical guarantees of success. They contain a number of important hyperparameters, and how these affect overall performance is only meagerly understood. Public re-implementations such as Leela Zero [148] and ELF OpenGo [201] are steps toward investigating this phenomenon. One fundamental difference that distinguishes Hex from other classic board games is its conspicuous mathematical structure [80], which has enabled much research on Hex to be presented in an exact rather than heuristic manner. Some examples are the proof that there is no draw [137], the identification of dead, dominated and inferior cells [81, 87], and the computation of connection strategies [7]. The accumulation of this mathematical knowledge, combined with sophisticated search, has allowed computer programs to solve Hex openings on board sizes up to 10×10, whose state spaces are already far beyond the reach of any simple brute-force search. For example, the numbers of states for 9×9 and 10×10 Hex are roughly 10^37 and 10^46, respectively. See Table 2.1.

Table 6.1: Status of solved Hex board sizes. For 10×10, only 2 openings are solved. For all smaller board sizes, all openings have been solved.

Board size   Status        Year   Method                  Computation time
6×6          solved, all   2000   DFS [204]               seconds
7×7          solved, all   2003   DFS [82]                minutes
8×8          solved, all   2008   DFS [89]                hours
9×9          solved, all   2013   parallel FDFPN [149]    months
10×10        solved, 2     2013   parallel FDFPN [149]    months

However, accumulating mathematical knowledge becomes harder as it requires invoking more complicated reasoning. On the other hand, algorithms that learn with deep neural networks have shown a great capacity for acquiring heuristic knowledge from data. These two kinds of knowledge are different and arguably complementary: facing a seemingly intractable problem, the mathematical knowledge states what we can identify with certainty, while the latter represents what we can at best guess after seeing a number of noisy observations. Both have their merits and limitations. For example, the continual identification of inferior cells [28, 82, 87, 89] and the development of H-search [7, 88, 151] since the 2000s have led to feasible computer solutions for board sizes from 6×6 to 10×10; see Table 6.1 for a summary. However, it is unlikely that 11×11 Hex can be solved unless an overwhelmingly larger amount of pruning from inferior-cell analysis or H-search is introduced; that is, although the pruning they bring is often exponential, the state-space complexity grows at a faster rate.¹ Although they can be quite accurate in practice, the predictions of well-trained deep neural networks do not enable any instant pruning beyond guiding a preferential search. Yet guess-guided look-ahead search in a state-space graph faces another challenge: the solution graph itself can be intractably large, which implies that even if an error-free guessing technique is employed, verification could still be infeasible. This observation highlights the value of state-abstraction methods in informed, guess-based forward search. In Hex, strategy decomposition is such a method. Given that advances in machine learning have enabled more accurate heuristic guidance, together with existing exact knowledge-computation techniques, we conjecture that a promising future direction for solving Hex is to search for decomposition-based solutions. This is arguably how humans solve Hex positions [214].

¹For a state space of size b^d, the effect of this pruning is a constant reduction of b and d; but as board size increases, both b and d increase.

In summary, Hex is a game that has interested mathematicians and computer scientists since its invention; its graph-theoretical, combinatorial, game-theoretic, and artificial-intelligence aspects are perpetual incentives for future research. Despite grand successes, deep learning techniques have been questioned for their lack of reasoning [50]; as a domain where reasoning is ubiquitous and of utmost importance, Hex could be valuable for pushing machine learning research to incorporate reasoning techniques.

References

[1] Selim G Akl and Monroe M Newborn. “The principal continuation and the killer heuristic.” In: Proceedings of the 1977 annual conference. ACM. 1977, pp. 466–473. 25 [2] L Victor Allis, Maarten van der Meulen, and H Jaap Van Den Herik. “Proof- number search.” In: Artificial Intelligence 66.1 (1994), pp. 91–124. 24, 138, 141 [3] L. Victor Allis, H. Jaap van den Herik, and M. P. H. Huntjens. “GoMoku Solved by New Search Techniques.” In: Computational Intelligence 12 (1996), pp. 7–23. 5 [4] LV Allis. “Searching for solutions in games and artificial intelligence.” PhD thesis. Universiteit Maastricht, 1994. 3, 26, 139 [5] Vadim V Anshelevich. “Hexy wins Hex tournament.” In: ICGA Journal 23.3 (2000), pp. 181–184. 37, 41 [6] Vadim V Anshelevich. “The game of Hex: An automatic theorem proving approach to game programming.” In: AAAI/IAAI. 2000, pp. 189–194. 37, 41 [7] Vadim V Anshelevich. “A hierarchical approach to computer Hex.” In: Arti- ficial Intelligence 134.1 (2002), pp. 101–120. 37, 38, 41, 113, 114 [8] Thomas Anthony, Zheng Tian, and David Barber. “Thinking Fast and Slow with Deep Learning and Tree Search.” In: arXiv preprint arXiv:1705.08439 (2017). 73 [9] Thomas Anthony et al. “Policy gradient search: Online planning and expert iteration without search trees.” In: arXiv preprint arXiv:1904.03646 (2019). 94, 95 [10] David L Applegate et al. The traveling salesman problem: a computational study. Princeton university press, 2006. 5 [11] Broderick Arneson, Ryan B Hayward, and Philip Henderson. “Monte Carlo tree search in Hex.” In: IEEE Transactions on Computational Intelligence and AI in Games 2.4 (2010), pp. 251–258. 32, 41, 49, 52, 101 [12] Broderick Arneson, Ryan B Hayward, and Philip Henderson. “Solving Hex: beyond humans.” In: International Conference on Computers and Games. Springer. 2010, pp. 1–10. 36 [13] Broderick Arneson, Ryan Hayward, and Philip Henderson. “Wolve wins Hex tournament.” In: ICGA Journal 32.1 (2008), pp. 49–53. 41, 49, 101 [14] Devansh Arpit et al. “A closer look at memorization in deep networks.” In: Proceedings of the 34th International Conference on Machine Learning- Volume 70. JMLR. org. 2017, pp. 233–242. 106 116 [15] Argimiro A Arratia-Quesada and Iain A Stewart. “Generalized Hex and logi- cal characterizations of polynomial space.” In: Information processing letters 63.3 (1997), pp. 147–152. 35 [16] A Bagchi and A Mahanti. “Admissible heuristic search in AND/OR graphs.” In: Theoretical Computer Science 24.2 (1983), pp. 207–219. 136 [17] Don F Beal. “A generalised quiescence .” In: Artificial In- telligence 43.1 (1990), pp. 85–98. 25 [18] Anatole Beck, Michael N Bleicher, and Donald Warren Crowe. Excursions Into Mathematics: Millennium Edn. Universities Press, 2000. 36 [19] Michael Z Bell. “Why expert systems fail.” In: Journal of the Operational Research Society 36.7 (1985), pp. 613–619. 6 [20] Richard Bellman. “A Markovian decision process.” In: Journal of Mathemat- ics and Mechanics (1957), pp. 679–684. 21, 28 [21] Yoshua Bengio. “Learning deep architectures for AI.” In: Foundations and trends in Machine Learning 2.1 (2009), pp. 1–127. 30 [22] Yoshua Bengio, Aaron Courville, and Pascal Vincent. “Representation learn- ing: A review and new perspectives.” In: IEEE transactions on pattern anal- ysis and machine intelligence 35.8 (2013), pp. 1798–1828. 30 [23] Claude Berge. “Some remarks about a Hex problem.” In: The Mathematical Gardner. Springer, 1981, pp. 25–27. 35 [24] Elwyn R Berlekamp, John Horton Conway, and Richard K Guy. 
Winning ways for your mathematical plays. Vol. 3. AK Peters Natick, 2003. 2, 36 [25] Dimitri P Bertsekas. Dynamic programming and optimal control. Vol. 1. 2. Athena scientific Belmont, MA, 1995. 22 [26] Dimitri P Bertsekas. “Approximate policy iteration: A survey and some new methods.” In: Journal of Control Theory and Applications 9.3 (2011), pp. 310–335. 73 [27] Dimitri P. Bertsekas and John N. Tsitsiklis. Neuro-Dynamic Programming. 1st. Athena Scientific, 1996. isbn: 1886529108. 28, 31, 56, 58 [28] Yngvi Bj¨ornssonet al. “Dead cell analysis in Hex and the Shannon game.” In: Graph Theory in Paris. Springer, 2006, pp. 45–59. 35, 114 [29] Blai Bonet and H´ectorGeffner. “Planning as heuristic search.” In: Artificial Intelligence 129.1-2 (2001), pp. 5–33. 5 [30] Edouard´ Bonnet, Florian Jamain, and Abdallah Saffidine. “On the complex- ity of connection games.” In: Theoretical Computer Science 644 (2016). Re- cent Advances in Computer Games, pp. 2–28. 3 [31] L´eonBottou. “Online Algorithms and Stochastic Approximations.” In: On- line Learning and Neural Networks. Ed. by David Saad. revised, oct 2012. Cambridge, UK: Cambridge University Press, 1998. url: http : / / leon . bottou.org/papers/bottou-98x. 31 [32] Bruno Bouzy and Tristan Cazenave. “Computer Go: an AI oriented survey.” In: Artificial Intelligence 132.1 (2001), pp. 39–103. 26

117 [33] Dennis M Breuker. “Memory versus search in games.” PhD thesis. 1998. 26 [34] Dennis Michel Breuker, Joseph Willem Hubertus Marie Uiterwijk, and Hen- drik Jacob Herik. ”The PN2-search algorithm”. Universiteit Maastricht, De- partment of Computer Science, 1999. 26 [35] Cameron Browne. Hex strategy - making the right connections. Jan. 2000. isbn: 978-1-56881-117-8. 23, 33 [36] Cameron B Browne et al. “A survey of Monte Carlo tree search methods.” In: IEEE Transactions on Computational Intelligence and AI in games 4.1 (2012), pp. 1–43. 32 [37] Arthur E Bryson. “A gradient method for optimizing multi-stage allocation processes.” In: Proc. Harvard Univ. Symposium on digital computers and their applications. Vol. 72. 1961. 30 [38] Garikai Campbell. “On optimal play in the game of Hex.” In: INTEGERS: Electronic Journal of Combinatorial Number Theory 4.2 (2004), pp. 1–23. 34 [39] Murray Campbell. “Mastering board games.” In: Science 362.6419 (2018), pp. 1118–1118. 113 [40] Murray Campbell, A Joseph Hoane, and Feng-hsiung Hsu. “Deep Blue.” In: Artificial intelligence 134.1-2 (2002), pp. 57–83. 26, 40 [41] Edward C Capen, Robert V Clapp, William M Campbell, et al. “Competitive bidding in high-risk situations.” In: Journal of petroleum technology 23.06 (1971), pp. 641–653. 61 [42] Chin-Liang Chang and James R. Slagle. “An admissible and optimal algo- rithm for searching AND/OR graphs.” In: Artificial Intelligence 2.2 (1971), pp. 117–128. 135, 138 [43] Guillaume MJ-B Chaslot, Mark HM Winands, and H Jaap van Den Herik. “Parallel Monte-Carlo tree search.” In: International Conference on Comput- ers and Games. Springer. 2008, pp. 60–71. 67 [44] Christopher Clark and Amos Storkey. “Training deep convolutional neural networks to play Go.” In: International Conference on Machine Learning. 2015, pp. 1766–1774. 33, 44–46 [45] Anne Condon. “On Algorithms for Simple Stochastic Games.” In: Advances in computational complexity theory. 1990, pp. 51–72. 28, 59 [46] Anne Condon. “The complexity of stochastic games.” In: Information and Computation 96.2 (1992), pp. 203–224. 28, 29 [47] R´emiCoulom. “Efficient selectivity and backup operators in Monte-Carlo tree search.” In: International Conference on Computers and Games. Springer. 2006, pp. 72–83. 32 [48] R´emiCoulom. “Computing Elo ratings of move patterns in the game of Go.” In: Computer Games Workshop. 2007. 53 [49] Joseph C Culberson and Jonathan Schaeffer. “Searching with pattern databases.” In: Conference of the Canadian Society for Computational Studies of Intelli- gence. Springer. 1996, pp. 402–416. 5

118 [50] Adnan Darwiche. “Human-level Intelligence or Animal-like Abilities?” In: Commun. ACM 61.10 (2018), pp. 56–67. issn: 0001-0782. 115 [51] Jia Deng et al. “Imagenet: A large-scale hierarchical image database.” In: 2009 IEEE conference on computer vision and pattern recognition. Ieee. 2009, pp. 248–255. 30 [52] Christian Donninger. “Null move and deep search.” In: ICGA Journal 16.3 (1993), pp. 137–143. 25 [53] James E Doran and Donald Michie. “Experiments with the graph traverser program.” In: Proceedings of the Royal Society of London. Series A. Mathe- matical and Physical Sciences 294.1437 (1966), pp. 235–259. 4 [54] Stuart Dreyfus. “The numerical solution of variational problems.” In: Journal of Mathematical Analysis and Applications 5.1 (1962), pp. 30–45. 30 [55] Stuart Dreyfus. “The computational solution of optimal control problems with time lag.” In: IEEE Transactions on Automatic Control 18.4 (1973), pp. 383–385. 30 [56] Arpad E Elo. The rating of chessplayers, past and present. Arco Pub., 1978. 42 [57] Markus Enzenberger and Martin M¨uller.“A lock-free multithreaded Monte- Carlo tree search algorithm.” In: Advances in Computer Games. Springer. 2009, pp. 14–20. 67 [58] Markus Enzenberger et al. “Fuego—an open-source framework for board games and Go engine based on Monte Carlo tree search.” In: IEEE Transac- tions on Computational Intelligence and AI in Games 2.4 (2010), pp. 259– 270. 32, 33, 52, 81 [59] Jonathan St BT Evans and Keith E Stanovich. “Dual-process theories of higher cognition: Advancing the debate.” In: Perspectives on psychological science 8.3 (2013), pp. 223–241. 73 [60] Shimon Even and Robert Endre Tarjan. “A combinatorial problem which is complete in polynomial space.” In: Journal of the ACM (JACM) 23.4 (1976), pp. 710–719. 33, 35 [61] Kunihiko Fukushima. “Neural network model for a mechanism of pattern recognition unaffected by shift in position-Neocognitron.” In: IEICE Techni- cal Report, A 62.10 (1979), pp. 658–665. 30 [62] David Gale. “The game of Hex and the Brouwer fixed-point theorem.” In: The American Mathematical Monthly 86.10 (1979), pp. 818–827. 2 [63] C. Gao, R. Hayward, and M. M¨uller.“Move Prediction using Deep Convo- lutional Neural Networks in Hex.” In: IEEE Transactions on Games PP.99 (2017), pp. 1–1. 11, 44 [64] Chao Gao, Martin Mueller, and Ryan Hayward. “Adversarial policy gradient for alternating Markov games.” In: International Conference on Learning Representations. 2018. 11, 44, 73 [65] Chao Gao, Martin M¨uller,and Ryan Hayward. “Focused Depth-first Proof Number Search using Convolutional Neural Networks for the Game of Hex.” In: Proceedings of the Twenty-Sixth International Joint Conference on Arti- ficial Intelligence, IJCAI-17. 2017, pp. 3668–3674. 11 119 [66] Chao Gao, Martin M¨uller,and Ryan Hayward. “Three-Head Neural Network Architecture for Monte Carlo Tree Search.” In: IJCAI. 2018, pp. 3762–3768. 11, 66 [67] Chao Gao, Kei Takada, and Ryan Hayward. “Hex 2018: MoHex3HNN over DeepEzo.” In: ICGA JOURNAL 41.1 (2019), pp. 39–42. 11, 66, 90 [68] Chao Gao et al. “An iterative pseudo-gap enumeration approach for the Multidimensional Multiple-choice Knapsack Problem.” In: European Journal of Operational Research 260.1 (2017), pp. 1–11. 11 [69] Chao Gao et al. “A transferable neural network for Hex.” In: ICGA Journal 40.3 (2018), pp. 224–233. 11, 66 [70] Chao Gao et al. 
“On Hard Exploration for Reinforcement Learning: a Case Study in Pommerman.” In: The 15th AAAI Conference on Artificial Intelli- gence and Interactive Digital EntertainmentT. 2019. 11 [71] Martin Gardner. “Mathematical Games: Concerning the game of Hex, which may be played on the tiles of the bathroom floor.” In: Scientific American 197.1 (1957), pp. 145–150. 2 [72] Martin Gardner. “The Scientific American book of mathematical puzzles and diversions.” In: (1959). 36 [73] Sylvain Gelly and David Silver. “Combining online and offline knowledge in UCT.” In: Proceedings of the 24th international conference on Machine learning. ACM. 2007, pp. 273–280. 32, 42 [74] Sylvain Gelly and David Silver. “Monte-Carlo tree search and rapid action value estimation in computer Go.” In: Artificial Intelligence 175.11 (2011), pp. 1856–1875. 32, 52 [75] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. MIT press, 2016. 9, 72 [76] Eric A Hansen and Shlomo Zilberstein. “LAO*: A heuristic search algorithm that finds solutions with loops.” In: Artificial Intelligence 129.1-2 (2001), pp. 35–62. 26 [77] Peter E Hart, Nils J Nilsson, and Bertram Raphael. “A formal basis for the heuristic determination of minimum cost paths.” In: IEEE transactions on Systems Science and Cybernetics 4.2 (1968), pp. 100–107. 4, 136 [78] Frederick Hayes-Roth, Donald A. Waterman, and Douglas B. Lenat. Building Expert Systems. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1983. isbn: 0-201-10686-8. 6 [79] Ryan Hayward. “Six wins Hex tournament.” In: ICGA Journal 29.3 (2006), pp. 163–165. 41 [80] Ryan B Hayward and Bjarne Toft. Hex: The Full Story. CRC Press, 2019. 2, 113 [81] Ryan B Hayward and Jack Van Rijswijck. “Hex and combinatorics.” In: Discrete Mathematics 306.19-20 (2006), pp. 2515–2528. 35, 113 [82] Ryan Hayward et al. “Solving 7x7 Hex: Virtual connections and game-state reduction.” In: Advances in Computer Games. Springer, 2004, pp. 261–278. 5, 42, 98, 114 [83] Ryan Hayward et al. “Mohex wins 2015 Hex 11 11 and Hex 13 13 tour- × × naments.” In: ICGA Journal 39.1 (2017), pp. 60–64. 42 120 [84] Kaiming He et al. “Deep residual learning for image recognition.” In: Pro- ceedings of the IEEE conference on computer vision and pattern recognition. 2016, pp. 770–778. 31, 44, 55, 78 [85] Kaiming He et al. “Identity mappings in deep residual networks.” In: Euro- pean Conference on Computer Vision. Springer. 2016, pp. 630–645. 55, 78 [86] Piet Hein. “Vil de laere Polygon.” In: Article in Politiken newspaper 26 (1942). 2 [87] Philip Thomas Henderson. “Playing and solving the game of Hex.” PhD thesis. University of Alberta, 2010. 35, 36, 38–42, 51, 63, 98, 100, 104, 113, 114 [88] Philip Henderson, Broderick Arneson, and Ryan B Hayward. “Hex, braids, the crossing rule, and XH-search.” In: Advances in Computer Games. Springer. 2009, pp. 88–98. 114 [89] Philip Henderson, Broderick Arneson, and Ryan B Hayward. “Solving 8x8 Hex.” In: Proc. IJCAI. Vol. 9. Citeseer. 2009, pp. 505–510. 42, 86, 98, 114 [90] Philip Henderson and Ryan B Hayward. “Captured-reversible moves and star decomposition domination in Hex.” In: Integers: Annual 2013 (2014), p. 75. 36 [91] Philip Henderson and Ryan B Hayward. “A handicap strategy for Hex.” In: Games of No Chance 4 63 (2015), p. 129. 36 [92] H Jaap van den Herik and Mark HM Winands. “Proof-number search and its variants.” In: Oppositional Concepts in Computational Intelligence. Springer, 2008, pp. 91–118. 139 [93] Kurt Hornik. 
“Approximation capabilities of multilayer feedforward networks.” In: Neural networks 4.2 (1991), pp. 251–257. 31 [94] Ronald A Howard. “DYNAMIC PROGRAMMING AND MARKOV PRO- CESSES.” In: (1960). 28, 29 [95] Gao Huang et al. “Deep networks with stochastic depth.” In: European Con- ference on Computer Vision. Springer. 2016, pp. 646–661. 55 [96] Shih-Chieh Huang et al. “MoHex 2.0: a pattern-based MCTS Hex player.” In: International Conference on Computers and Games. Springer. 2013, pp. 60– 71. 32, 42, 47, 53, 81, 87, 101 [97] Sergey Ioffe and Christian Szegedy. “Batch normalization: Accelerating deep network training by reducing internal covariate shift.” In: International Con- ference on Machine Learning. 2015, pp. 448–456. 55, 78 [98] Max Jaderberg et al. “Reinforcement learning with unsupervised auxiliary tasks.” In: arXiv preprint (2016). 106 [99] Tommy R Jensen and Bjarne Toft. Graph coloring problems. Vol. 39. John Wiley & Sons, 2011. 35 [100] Norman P. Jouppi and Yang et al. “In-Datacenter Performance Analysis of a Tensor Processing Unit.” In: Proceedings of the 44th Annual International Symposium on Computer Architecture. ISCA ’17. ACM, 2017, pp. 1–12. isbn: 978-1-4503-4892-8. 33, 56 [101] Sham M Kakade. “A natural policy gradient.” In: Advances in neural infor- mation processing systems. 2002, pp. 1531–1538. 58 121 [102] Takada Kei. “A Study on Learning Algorithms of Value and Policy Functions in Hex.” PhD thesis. Hokkaido University, 2019. 90 [103] Henry J Kelley. “Gradient theory of optimal flight paths.” In: Ars Journal 30.10 (1960), pp. 947–954. 30 [104] Diederik Kingma and Jimmy Ba. “Adam: A method for stochastic optimiza- tion.” In: International Conference on Learning Representations. 2014. 48, 62, 78, 103 [105] Akihiro Kishimoto. “Dealing with Infinite Loops, Underestimation, and Over- estimation of Depth-first Proof-number Search.” In: Proceedings of the Twenty- Fourth AAAI Conference on Artificial Intelligence. AAAI Press, 2010, pp. 108– 113. 26, 141 [106] Akihiro Kishimoto and Radu Marinescu. “Recursive Best-First AND/OR Search for Optimization in Graphical Models.” In: UAI. 2014, pp. 400–409. 141 [107] Akihiro Kishimoto and Martin M¨uller.“About the completeness of depth- first proof-number search.” In: International Conference on Computers and Games. Springer. 2008, pp. 146–156. 26 [108] Akihiro Kishimoto et al. “Game-tree search using proof numbers: The first twenty years.” In: ICGA Journal 35.3 (2012), pp. 131–156. 139, 141 [109] Donald E Knuth and Ronald W Moore. “An analysis of alpha-beta pruning.” In: Artificial intelligence 6.4 (1975), pp. 293–326. 8, 25 [110] Jens Kober, J Andrew Bagnell, and Jan Peters. “Reinforcement learning in robotics: A survey.” In: The International Journal of Robotics Research 32.11 (2013), pp. 1238–1274. 56 [111] Levente Kocsis and Csaba Szepesv´ari. “Bandit based Monte-Carlo planning.” In: European conference on machine learning. Springer. 2006, pp. 282–293. 32 [112] Richard E Korf. “Finding optimal solutions to Rubik’s cube using pattern databases.” In: AAAI/IAAI. 1997, pp. 700–705. 5, 9 [113] Richard E Korf and David Maxwell Chickering. “Best-first minimax search.” In: Artificial intelligence 84.1-2 (1996), pp. 299–337. 25 [114] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. “Imagenet classi- fication with deep convolutional neural networks.” In: Advances in neural information processing systems. 2012, pp. 1097–1105. 30, 44, 100 [115] Li-Cheng Lan et al. “Multiple Policy Value Monte Carlo Tree Search.” In: IJCAI. 2019. 
96 [116] Ailsa H. Land and Alison G. Doig. “An Automatic Method for Solving Dis- crete Programming Problems.” In: 50 Years of Integer Programming. 2010. 5 [117] Tor Lattimore and Csaba Szepesv´ari.“Bandit algorithms.” In: preprint (2018). 23 [118] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. “Deep learning.” In: Na- ture 521.7553 (2015), pp. 436–444. 30 [119] Yann LeCun et al. “Gradient-based learning applied to document recogni- tion.” In: Proceedings of the IEEE 86.11 (1998), pp. 2278–2324. 30, 100

122 [120] Timothy P Lillicrap et al. “Continuous control with deep reinforcement learn- ing.” In: ICLR. 2016. 32, 56 [121] Long-H Lin. “Self-improving reactive agents based on reinforcement learning, planning and teaching.” In: Machine learning 8.3/4 (1992), pp. 69–97. 56 [122] Seppo Linnainmaa. “The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors.” In: Master’s Thesis (in Finnish), Univ. Helsinki (1970), pp. 6–7. 30 [123] Michael Lederman Littman. “Algorithms for sequential decision making.” PhD thesis. Brown University Providence, RI, 1996. 19, 22, 23, 28, 29 [124] Maciej Celuch. https://www.hexwiki.net/index.php/Maciej_Celuch. Accessed: 2019-08-24. 89, 90 [125] Chris J Maddison et al. “Move evaluation in Go using deep convolutional neural networks.” In: International Conference on Learning Representations. 2015. 33, 44–46, 49, 52, 100 [126] Alberto Martelli and Ugo Montanari. “Additive AND/OR Graphs.” In: IJ- CAI. Vol. 73. 1973, pp. 1–11. 136 [127] Alberto Martelli and Ugo Montanari. “Optimizing decision trees through heuristically guided search.” In: Communications of the ACM 21.12 (1978), pp. 1025–1039. 136 [128] Abadi et al. Martin. TensorFlow: Large-Scale Machine Learning on Hetero- geneous Systems. Software available from tensorflow.org. 2015. url: http: //tensorflow.org/. 48, 61 [129] David Allen McAllester. “Conspiracy numbers for min-max search.” In: Ar- tificial Intelligence 35.3 (1988), pp. 287–310. 26 [130] Drew McDermott et al. “PDDL-the planning domain definition language.” In: (1998). 5 [131] Thomas M. Mitchell. Machine Learning. 1st ed. New York, NY, USA: McGraw- Hill, Inc., 1997. 6 [132] Volodymyr Mnih et al. “Human-level control through deep reinforcement learning.” In: Nature 518.7540 (2015), pp. 529–533. 32, 51, 56 [133] Volodymyr Mnih et al. “Asynchronous methods for deep reinforcement learn- ing.” In: International Conference on Machine Learning. 2016, pp. 1928– 1937. 32, 56 [134] Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of machine learning. MIT press, 2018. 6 [135] Martin M¨uller.“Computer Go.” In: Artificial Intelligence 134.1-2 (2002), pp. 145–179. 25 [136] Ayumu Nagai. “Df-pn algorithm for searching AND/OR trees and its appli- cations.” PhD thesis. PhD thesis, Department of Information Science, Uni- versity of Tokyo, 2002. 26, 141 [137] John F Nash. Some games and machines for playing them. Tech. rep. Rand Corporation, 1952. 2, 113

123 [138] Dana S Nau. “Pathology on Game Trees: A Summary of Results.” In: AAAI. 1980, pp. 102–104. 26 [139] Dana S Nau. “An investigation of the causes of pathology in games.” In: Artificial Intelligence 19.3 (1982), pp. 257–278. 26 [140] Dana S Nau. “Pathology on game trees revisited, and an alternative to min- imaxing.” In: Artificial intelligence 21.1-2 (1983), pp. 221–244. 26 [141] Allen Newell, John C Shaw, and Herbert A Simon. “Report on a general problem solving program.” In: IFIP congress. Vol. 256. Pittsburgh, PA. 1959, p. 64. 4, 20, 21 [142] Nils J Nilsson. Problem-Solving in Artificial Intelligence. McGraw-Hill, 1971. 135 [143] Nils J Nilsson. Principles of artificial intelligence. Morgan Kaufmann, 1980. 5, 6, 136 [144] Kyoung-Su Oh and Keechul Jung. “GPU implementation of neural net- works.” In: Pattern Recognition 37.6 (2004), pp. 1311–1314. 30 [145] Manfred Padberg and Giovanni Rinaldi. “A branch-and-cut algorithm for the resolution of large-scale symmetric traveling salesman problems.” In: SIAM review 33.1 (1991), pp. 60–100. 5 [146] Sinno Jialin Pan and Qiang Yang. “A survey on transfer learning.” In: IEEE Transactions on knowledge and data engineering 22.10 (2010), pp. 1345–1359. 32 [147] Gian-Carlo Pascutto. [Computer-go] Zero performance. http://computer- go.org/pipermail/computer-go/2017-October/010307.html. Accessed: 2019-08-18. 71 [148] Gian-Carlo Pascutto. Leela Zero. https : / / github . com / leela - zero / leela-zero. Accessed: 2019-08-18. 72, 113 [149] Jakub Pawlewicz and Ryan B Hayward. “Scalable parallel DFPN search.” In: International Conference on Computers and Games. Springer. 2013, pp. 138– 150. 18, 42, 101, 114 [150] Jakub Pawlewicz and Ryan B Hayward. “Sibling conspiracy number search.” In: Eighth Annual Symposium on Combinatorial Search. 2015. 42 [151] Jakub Pawlewicz et al. “Stronger Virtual Connections in Hex.” In: IEEE Transactions on Computational Intelligence and AI in Games 7.2 (2015), pp. 156–166. 38, 39, 42, 53, 63, 81, 86, 100, 103, 114 [152] Judea Pearl. “On the nature of pathology in game searching.” In: Artificial Intelligence 20.4 (1983), pp. 427–453. 26 [153] Judea Pearl. “Heuristics: intelligent search strategies for computer problem solving.” In: (1984). 4, 5, 9, 18–20, 24, 98, 132, 136, 141 [154] Gabriel Pereyra et al. “Regularizing neural networks by penalizing confident output distributions.” In: ICLR 2017 workshop (2017). 55 [155] Jan Peters and J Andrew Bagnell. “Policy gradient methods.” In: Encyclo- pedia of Machine Learning. Springer, 2011, pp. 774–776. 29, 56 [156] Aske Plaat et al. “Best-first fixed-depth minimax algorithms.” In: Artificial Intelligence 87.1-2 (1996), pp. 255–293. 25

124 [157] Progress towards the OpenAI mission. https : / / www . slideshare . net / AIFrontiers/ilya- sutskever- at- ai- frontiers- progress- towards- the-openai-mission. Accessed: 2019-08-19. 71 [158] Martin L Puterman. Markov Decision Processes.: Discrete Stochastic Dy- namic Programming. John Wiley & Sons, 2014. 21, 26, 27 [159] Rating Inflation - Its Causes and Possible Cures. https://en.wikipedia. org/wiki/ChessBase. Accessed: 2019-08-11. 42 [160] A Ravindran, Don T Phillips, and James J Solberg. “Operations research: principles and practice.” In: (1987). 5 [161] Ali Sharif Razavian et al. “CNN features off-the-shelf: an astounding baseline for recognition.” In: Computer Vision and Pattern Recognition Workshops (CVPRW), 2014 IEEE Conference on. IEEE. 2014, pp. 512–519. 32 [162] Stefan Reisch. “Hex ist PSPACE-vollst¨andig.”In: Acta Informatica 15.2 (1981), pp. 167–191. 3, 33 [163] JA Robinson. “An overview of mechanical theorem proving.” In: Theoretical Approaches to Non-Numerical Problem Solving. Springer, 1970, pp. 2–20. 38 [164] Igor Roizen and Judea Pearl. “A minimax algorithm better than alpha-beta? Yes and no.” In: Artificial Intelligence 21.1-2 (1983), pp. 199–220. 25 [165] Christopher D Rosin. “Multi-armed bandits with episode context.” In: An- nals of Mathematics and Artificial Intelligence 61.3 (2011), pp. 203–230. 53, 67 [166] David E Rumelhart, Geoffrey E Hinton, Ronald J Williams, et al. “Learn- ing representations by back-propagating errors.” In: Cognitive modeling 5.3 (1988), p. 1. 30 [167] Sartaj Sahni. “Computationally related problems.” In: SIAM Journal on Computing 3.4 (1974), pp. 262–279. 133, 136 [168] A. L. Samuel. “Some Studies in Machine Learning Using the Game of Check- ers.” In: IBM J. Res. Dev. 3.3 (July 1959), pp. 210–229. issn: 0018-8646. doi: 10.1147/rd.33.0210. url: http://dx.doi.org/10.1147/rd.33.0210. 8 [169] Arthur L Samuel. “Some studies in machine learning using the game of check- ers. II—Recent progress.” In: IBM Journal of research and development 11.6 (1967), pp. 601–617. 8 [170] Jonathan Schaeffer. “The history heuristic.” In: ICGA Journal 6.3 (1983), pp. 16–19. 25 [171] Jonathan Schaeffer et al. “A world championship caliber checkers program.” In: Artificial Intelligence 53.2-3 (1992), pp. 273–289. 26 [172] Jonathan Schaeffer et al. “Checkers is solved.” In: Science 317.5844 (2007), pp. 1518–1522. 5 [173] Tom Schaul et al. “Prioritized experience replay.” In: ICLR. 2016. 32, 56 [174] Martin Schijf, L Victor Allis, and Jos WHM Uiterwijk. “Proof-number search and transpositions.” In: ICGA Journal 17.2 (1994), pp. 63–74. 141 [175] J¨urgenSchmidhuber. “Deep learning in neural networks: An overview.” In: Neural networks 61 (2015), pp. 85–117. 9, 30 125 [176] John Schulman et al. “Trust region policy optimization.” In: Proceedings of the 32nd International Conference on Machine Learning (ICML-15). 2015, pp. 1889–1897. 32, 56 [177] John Schulman et al. “Proximal policy optimization algorithms.” In: arXiv preprint arXiv:1707.06347 (2017). 32, 56 [178] Claude E Shannon. “Computers and automata.” In: Proceedings of the IRE 41.10 (1953), pp. 1234–1241. 3, 40, 41 [179] Lloyd S Shapley. “Stochastic games.” In: Proceedings of the national academy of sciences 39.10 (1953), pp. 1095–1100. 23 [180] David Silver et al. “Mastering the game of Go with deep neural networks and tree search.” In: Nature 529.7587 (2016), pp. 484–489. 29, 32, 33, 44, 52, 53, 56, 57, 61, 63, 65, 66, 68, 74, 89, 100, 106, 113 [181] David Silver et al. 
“Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm.” In: arXiv preprint arXiv:1712.01815 (2017). 33, 70 [182] David Silver et al. “A general reinforcement learning algorithm that mas- ters chess, shogi, and Go through self-play.” In: Science 362.6419 (2018), pp. 1140–1144. 70, 113 [183] David Silver et al. “Deterministic policy gradient algorithms.” In: Proceedings of the 31st International Conference on Machine Learning (ICML-14). 2014, pp. 387–395. 32, 56 [184] David Silver et al. “Mastering the game of Go without human knowledge.” In: Nature 550.7676 (2017), pp. 354–359. 33, 65, 68, 73, 76, 79, 100, 106, 113 [185] Saurabh Singh, Derek Hoiem, and David Forsyth. “Swapout: Learning an en- semble of deep architectures.” In: Advances in Neural Information Processing Systems. 2016, pp. 28–36. 55 [186] James R Slagle. “A heuristic program that solves symbolic integration prob- lems in freshman calculus.” In: Journal of the ACM (JACM) 10.4 (1963), pp. 507–520. 4 [187] James E Smith and Robert L Winkler. “The optimizer’s curse: Skepticism and postdecision surprise in decision analysis.” In: Management Science 52.3 (2006), pp. 311–322. 61 [188] Sylvain Sorin. A first course on zero-sum repeated games. Vol. 37. Springer Science & Business Media, 2002. 23 [189] Nitish Srivastava et al. “Dropout: a simple way to prevent neural networks from overfitting.” In: Journal of Machine Learning Research 15.1 (2014), pp. 1929–1958. 55 [190] George C. Stockman. “A minimax algorithm better than alpha-beta?” In: Artificial Intelligence 12.2 (1979), pp. 179–196. 25 [191] Richard S Sutton. “Integrated architectures for learning, planning, and react- ing based on approximating dynamic programming.” In: Machine Learning Proceedings 1990. Elsevier, 1990, pp. 216–224. 29 [192] Richard S Sutton and Andrew G Barto. Reinforcement learning: An intro- duction. Second. Preliminary Draft, MIT press Cambridge, 2017. 9, 19, 29, 58 126 [193] Richard S Sutton et al. “Policy gradient methods for reinforcement learning with function approximation.” In: Advances in neural information processing systems. 2000, pp. 1057–1063. 29, 56, 58 [194] Christian Szegedy et al. “Rethinking the inception architecture for computer vision.” In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016, pp. 2818–2826. 55 [195] Kei Takada, Hiroyuki Iizuka, and Masahito Yamamoto. “Reinforcement Learn- ing to Create Value and Policy Functions using Minimax Tree Search in Hex.” In: IEEE Transactions on Games (2019). 90 [196] Kei Takada et al. “Developing computer hex using global and local evaluation based on board network characteristics.” In: Advances in Computer Games. Springer. 2015, pp. 235–246. 42, 54 [197] Kei Takata, Hiroyuki Iizuka, and Masahito Yamaoto. “Computer Hex us- ing Move Evaluation Method based on Convolutional Neural Network.” In: IJCAI 2017 Computer Games Workshop. 2017. 54 [198] Gerald Tesauro. “Temporal difference learning and TD-Gammon.” In: Com- munications of the ACM 38.3 (1995), pp. 58–68. 29, 31, 56, 57 [199] Thomas Thomsen. “Lambda-search in game trees—With application to Go.” In: International Conference on Computers and Games. Springer. 2000, pp. 19– 38. 110 [200] Yuandong Tian and Yan Zhu. “Better computer Go player with neural net- work and long-term prediction.” In: International Conference on Learning Representations. 2015. 33, 44–46, 52 [201] Yuandong Tian et al. “ELF OpenGo: an analysis and open reimplementation of AlphaZero.” In: ICML. 2019, pp. 
6244–6253. 72, 89, 92, 113 [202] Jonathan Tompson et al. “Efficient object localization using convolutional networks.” In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015, pp. 648–656. 55 [203] Toru Ueda et al. “Weak proof-number search.” In: International Conference on Computers and Games. Springer. 2008, pp. 157–168. 141 [204] Jack Van Rijswijck. “Computer Hex: Are Bees Better Than Fruitflies?” MA thesis. University of Alberta, 2002. 3, 41, 114 [205] Jack Van Rijswijck. “Search and evaluation in Hex.” In: Master of science, University of Alberta (2002). 3, 40 [206] Jack Van Rijswijck. “Set colouring games.” PhD thesis. University of Alberta, 2006. 3, 34, 35 [207] Ziyu Wang et al. “Dueling network architectures for deep reinforcement learn- ing.” In: ICML. 2016. 56 [208] Ziyu Wang et al. “Sample efficient actor-critic with experience replay.” In: ICLR. 2017. 32, 56, 58 [209] Christopher JCH Watkins and Peter Dayan. “Q-learning.” In: Machine learn- ing 8.3-4 (1992), pp. 279–292. 56

127 [210] Paul J Werbos. “Applications of advances in nonlinear sensitivity analysis.” In: System modeling and optimization. Springer, 1982, pp. 762–770. 30 [211] Ronald J Williams. “Simple statistical gradient-following algorithms for con- nectionist reinforcement learning.” In: Machine learning 8.3-4 (1992), pp. 229– 256. 29, 57, 58 [212] Mark HM Winands, Jos WHM Uiterwijk, and H Jaap van den Herik. “An effective two-level proof-number search algorithm.” In: Theoretical Computer Science 313.3 (2004), pp. 511–525. 26 [213] I-Chen Wu and Ping-Hung Lin. “Relevance-zone-oriented proof search for connect6.” In: IEEE Transactions on computational intelligence and AI in games 2.3 (2010), pp. 191–207. 110 [214] Jing Yang, Simon Liao, and Mirek Pawlak. “On a decomposition method for finding winning strategy in Hex game.” In: Proceedings ADCOG: Internat. Conf. Application and Development of Computer Games (ALW Sing, WH Man and W. Wai, eds.), City University of Honkong. 2001, pp. 96–111. 4, 18, 115 [215] Jing Yang, Simon Liao, and Mirek Pawlak. “A New Solution for 7x7 Hex Game.” In: to appear 106 (2002). 4, 18 [216] Kazuki Yoshizoe, Akihiro Kishimoto, and Martin M¨uller.“Lambda Depth- First Proof Number Search and Its Application to Go.” In: IJCAI. 2007, pp. 2404–2409. 26, 110 [217] Jason Yosinski et al. “How transferable are features in deep neural networks?” In: Advances in neural information processing systems. 2014, pp. 3320–3328. 32 [218] Kenny Young, Gautham Vasan, and Ryan Hayward. “NeuroHex: A Deep Q-learning Hex Agent.” In: Computer Games. Springer, 2016, pp. 3–18. 51, 57, 61 [219] Sergey Zagoruyko and Nikos Komodakis. “Wide residual networks.” In: arXiv preprint arXiv:1605.07146 (2016). 55 [220] Chiyuan Zhang et al. “Understanding deep learning requires rethinking gen- eralization.” In: International Conference on Learning Representations. 2017. 106

Appendix A

Additional Documentation

Extending benzene with neural networks yields neurobenzene; we document the key changes and how to use them below. We have also developed a software tool for visualizing the decomposition-based solution to the e5 opening on 9×9 Hex. The pattern files were provided by Jing Yang.

A.1 Neurobenzene

Newly added commands:

• nn_ls: lists all available neural net models in share/nn/.
• nn_load: followed by a model name to load.
• param_nn: displays the parameter settings for the neural net.
• param_mohex use_playout_constant: default 0, indicating no playout is used (setting it to 1.0 will use the playout result alone for leaf evaluation).
• param_mohex moveselect_ditherthreshold: default 0, the threshold for soft move selection when the number of stones on the board is less than the threshold.
• mohex-self-play: followed by two arguments, n (number of games) and filetosave (location to save the played games).
• param_mohex root_dirichlet_prior: followed by a float α, the parameter for Dirichlet(α).
• param_dfpn use_nn: followed by 0 or 1, indicating whether to use the neural net in the DFPN solver.

Figure A.1: An example showing the evaluation of actions provided by the neural net model. For each cell, the upper number is the prior probability provided by the policy head and the lower number is the value provided by the action-value head. Note that the value is with respect to the player to play after taking that action, so a smaller value indicates higher preference for the current board state. Here, both policy and action-value favor f6.

Newly added modules:

• simplefeature3/: a sub-project for training neural networks in Python.
• src/neuralnet/: TensorFlow inference code written with the TensorFlow C API.
• closedloop/: all scripts for closed-loop training inside neurobenzene.

A.2 Visualization for a Human-Identified 9×9 Solution

Starting from the initial state, Jing Yang's handcrafted solution expresses how a board state should be decomposed after every opponent move. For 9×9 Hex, his solution contains around 700 patterns. We implemented a visualization tool for exploring this perfect solution, allowing the user to play any move and see the response move and how a larger pattern is decomposed into smaller ones.

Figure A.2: An example play by White. If White plays at D6, Black responds at E6, resulting in pattern 370. If White then plays at D8, Black responds at B8, splitting pattern 370 into patterns 285 and 86. In each cell, the lower number is the pattern id and the upper number is the local move name inside that pattern.

A.3 Published Dataset

Datasets for 8×8, 9×9 and 13×13 are available at https://drive.google.com/drive/u/0/folders/18MdnvMItU7O2sEJDlbmk_ZzUhZG7yDK9. These data were produced by the players MoHex 2.0 and Wolve in a moderately strong setting.

A.4 From GBFS to AO* and PNS

At each iteration, a clever search selects the seemingly best node for expansion; such behaviour is called best-first. Assuming that the problem to be solved is represented by a single start node s and that the task is to find a solution graph with minimum cost under a cost scheme Ψ, a General Best-First Search (GBFS) procedure for loop-free AND/OR graphs is depicted in Algorithm 7. The expansion is driven by the small-is-quick and face-value principles, i.e., it always selects the best (smallest) solution-base to explore, based on the estimated values of candidate nodes (their so-called face values). Given that the heuristic estimate h is always optimistic, upon termination the returned solution graph must be optimal [153], even though in some cases we are merely interested in finding any solution rather than the optimum.

Algorithm 7: General Best-First Search for AND/OR graphs
Input: start node s; selection functions f1, f2; rules for generating successor nodes; evaluation function h; cost scheme Ψ
Output: solution graph G_0^*
 1  Function GBFS(s, f1, f2):
 2      Let the explicit graph be G′ = {s}
 3      while true do
 4          G_0 ← f1(G′)                    // G_0 is the current solution-base
 5          if G_0 is a solution graph then
 6              G_0^* ← G_0, then exit
 7          end
 8          t ← f2(G_0)                     // t is the selected frontier node
 9          Generate all successors of t and append them to G′
10          for t′ ∈ successors(t) do
11              Evaluate t′ by h
12              UpdateAncestors(t′, Ψ, G′)
13          end
14      end
15      return G_0^*
16  Function UpdateAncestors(t′, Ψ, G′):
17      Update the ancestor nodes of t′ in G′ according to the definition of Ψ

Algorithm 7 contains abstract functions f1, f2 and h, which are respectively used for selecting the solution-base G_0 within G′, selecting a frontier node n, and evaluating n. A solution-base is a subgraph of G′ that may eventually be developed into a solution for the implicit graph G; f1 selects the best solution-base from G′. The price paid for generality is that Algorithm 7 leaves these three important functions, f1, f2 and h, unspecified. While the content of h relies on domain knowledge, the implementations of f1 and f2 depend on the definition of the cost scheme Ψ, as discussed below.

[Figure A.3 diagram: the full AND/OR graph G and its four solution graphs G_0^1 through G_0^4, with edge costs shown.]

Figure A.3: Leftmost is the full graph G; all tip nodes are solvable terminals with value 0, and each edge has a positive cost. Four different solution graphs exist for G, among which G_0^3 has the minimum total cost of 5. However, if the cost scheme Ψ is defined as a recursive sum-cost, both G_0^2 and G_0^3 have minimum cost 6.

of f1 and f2 depend on definition of the cost scheme Ψ. See Appendix A.4 for a full account on the impact of cost scheme for various instances of GBFS. To see the impact of cost scheme in GBFS (as in Algorithm 7), consider a simple toy AND/OR graph shown in Figure A.3 where all tip nodes are solvable terminal 1 assigned with value 0. In this problem, there are four solution graphs, noted as G0, 2 3 4 G0, G0 and G0. Each edge has a cost either 1, 2 or 3. If the objective is to find 3 a solution graph with the minimum summed edge cost, G0 will be is the optimal. We then consider how the cost for each candidate solution is computed. In this example, the cost for each solution graph is calculated by summing all edges within the graph. A naive algorithm for finding the best solution subgraph is enumerating all possible candidates and select the smallest one. If the AND/OR graph has d OR nodes, and on average the branching degree for these OR nodes is b, then the d total number of candidate subgraphs will be O(b ), suggesting that such a f1 could be computationally prohibitive. It has been shown that computing the graph with minimum summed edge cost is NP-hard [167]. In essence, the computational difficulty comes from the fact that each candidate subgraph needs to be evaluated separately, which is unavoidable in general since each tip node could have multiple parents and whether they are going to or how they will meet somewhere upward in the graph is unpredictable. In other words, for arbitrary node, the cost rooted at this node depends not only upon its immediate successors but also on the structure of all its descendants — neglecting which could cause one edge cost being counted multiple times. However, the computation could be largely simplified if we define the solution cost for arbitrary node n from only on

133 its immediate successors. A cost scheme Ψ is recursive if it satisfies the following property:

 0 ∗ 0 Ψn0 (c(n, n ) + h (n )) if n is non-terminal AND node  0 ∗ 0 minn0 (c(n, n ) + h (n )) if n is non-terminal OR node h∗(n) = (A.1) 0 if n is solvable terminal   if n is unsolvable terminal ∞ where c(n, n0) 0 is the edge cost between n and its successor n0, cost(n) represents ≥ the cost rooted at n. By such, the implementation of f1 becomes easy: starting from s, at each OR node, it only needs to select the minimum child node; at each AND node, all successors will be selected. There are, of course, many potential choices for which specific function to use in this formula. One natural choice is the recursive sum-cost shown in below:

\[
h^*(n) =
\begin{cases}
\sum_{n'}\bigl(c(n, n') + h^*(n')\bigr) & \text{if } n \text{ is a non-terminal AND node} \\
\min_{n'}\bigl(c(n, n') + h^*(n')\bigr) & \text{if } n \text{ is a non-terminal OR node} \\
0 & \text{if } n \text{ is a solvable terminal} \\
\infty & \text{if } n \text{ is an unsolvable terminal}
\end{cases}
\tag{A.2}
\]

However, even though the optimal cost at each node is defined recursively, these values are unknown to the problem-solver, simply because the implicit graph G is hidden and fully expanding it is unachievable. Following the face-value principle, GBFS* always selects from G′ a best solution-base defined by the same recursive cost scheme as above, except that for each non-terminal tip node an estimate of the cost rooted there is given by the heuristic function h. Formally, denote by f(n) the cost rooted at n for an arbitrary node n in G′:

\[
f(n) =
\begin{cases}
\Psi_{n'}\bigl(c(n, n') + f(n')\bigr) & \text{if } n \text{ is a non-tip AND node} \\
\min_{n'}\bigl(c(n, n') + f(n')\bigr) & \text{if } n \text{ is a non-tip OR node} \\
0 & \text{if } n \text{ is a solvable terminal} \\
\infty & \text{if } n \text{ is an unsolvable terminal} \\
h(n) & \text{if } n \text{ is a non-terminal tip node}
\end{cases}
\tag{A.3}
\]
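As a concrete illustration of Equation (A.3) and of how f1 can then be implemented, below is a minimal Python sketch, not the thesis code: the explicit graph G′ is a small hand-made dictionary, the cost scheme Ψ is the recursive sum, and f1 traces a minimum-cost child at every OR node while taking all children at every AND node. All node names, costs and heuristic values are invented for the example.

    import math

    # A toy explicit graph G': node -> (kind, [(child, edge_cost), ...]).
    # kind is "OR", "AND", "TIP" (non-terminal tip), "SOLVED" or "UNSOLVED" (terminal).
    GRAPH = {
        "s": ("OR",  [("a", 1), ("b", 1)]),
        "a": ("AND", [("c", 1), ("d", 1)]),
        "b": ("TIP", []),
        "c": ("TIP", []),
        "d": ("SOLVED", []),
    }
    H = {"b": 7.0, "c": 2.0}        # heuristic estimates for the non-terminal tip nodes


    def f_cost(node):
        """Cost rooted at `node`, Equation (A.3), with the recursive sum-cost as Psi."""
        kind, succ = GRAPH[node]
        if kind == "SOLVED":
            return 0.0
        if kind == "UNSOLVED":
            return math.inf
        if kind == "TIP":
            return H[node]
        costs = [c + f_cost(child) for child, c in succ]
        return min(costs) if kind == "OR" else sum(costs)   # OR: min, AND: Psi = sum


    def f1_solution_base(root):
        """f1: min-cost child at OR nodes, all children at AND nodes."""
        kind, succ = GRAPH[root]
        base = {root}
        if kind == "OR" and succ:
            best, _ = min(succ, key=lambda cc: cc[1] + f_cost(cc[0]))
            base |= f1_solution_base(best)
        elif kind == "AND":
            for child, _ in succ:
                base |= f1_solution_base(child)
        return base


    print("f(s) =", f_cost("s"))                    # min(1 + 4, 1 + 7) = 5
    print("solution-base:", f1_solution_base("s"))  # {'s', 'a', 'c', 'd'} (set order may vary)

Note that on a directed acyclic graph this naive recursion would revisit shared nodes, which is precisely the over-counting issue discussed next.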

To see how f1 selects the most promising subgraph G0 when the recursive sum-cost is adopted, consider the example shown in Figure A.4. Each tip node is given an estimated cost; the value of every non-leaf node is computed from its successors' values. Starting from s, a solution-base is identified by simply choosing a minimum-cost successor at each OR node. We note that the recursive sum-cost is not the only cost scheme that facilitates the computation of f1.

[Figure A.4 shows two panels: the explicit graph G′ and the solution-base G0^2 selected by f1, with the f value of each node annotated. A margin note draws an analogy to A*: the f value at node A can be interpreted as f = g + h, where g is the total edge cost of G0 and h is the total of the estimated tip-node costs; using the recursive sum-cost here, however, may count some edge or tip-node cost more than once.]

Figure A.4: f1 selects a solution-base from the explicit graph G′. Every edge has cost 0; each tip node has an estimated cost provided by the function h. By the definition of the recursive sum-cost scheme, G0^2 is the minimum solution-base with cost 17, even though its true cost, counting each shared edge or tip only once, is 15.

Other recursive cost schemes include the max-cost, the expected-sum-cost, and so on. In particular, the max-cost does not suffer from the over-counting problem because it does not accumulate cost from all of a node's descendants. The sum-cost, by contrast, is free from over-counting only when the underlying graph is a tree. A recursively defined cost scheme also makes the UpdateAncestors procedure in Algorithm 7 straightforward: all affected ancestors simply need to be updated in a bottom-up manner according to Equation (A.1).
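To make the over-counting concrete, the small Python example below (an invented DAG, not taken from the thesis) compares the recursive sum-cost, the recursive max-cost, and the true cost in which every edge and tip is counted exactly once; the shared tip t is charged twice by the recursive sum.

    # node -> (kind, [(child, edge_cost), ...]); "TIP" nodes carry a heuristic estimate in H.
    DAG = {
        "A": ("AND", [("B", 1), ("C", 1)]),
        "B": ("OR",  [("t", 1)]),
        "C": ("OR",  [("t", 1)]),
        "t": ("TIP", []),
    }
    H = {"t": 3.0}


    def recursive_cost(node, combine):
        """Recursive cost with Psi = `combine` at AND nodes; min is always used at OR nodes."""
        kind, succ = DAG[node]
        if kind == "TIP":
            return H[node]
        vals = [c + recursive_cost(child, combine) for child, c in succ]
        return min(vals) if kind == "OR" else combine(vals)


    def true_cost(node):
        """Sum of edge costs and tip estimates with every edge and tip counted once."""
        edges, tips, stack, seen = set(), set(), [node], set()
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            kind, succ = DAG[u]
            if kind == "TIP":
                tips.add(u)
            for child, c in succ:
                edges.add((u, child, c))
                stack.append(child)
        return sum(c for _, _, c in edges) + sum(H[t] for t in tips)


    print("recursive sum-cost:", recursive_cost("A", sum))   # (1+1+3) + (1+1+3) = 10
    print("recursive max-cost:", recursive_cost("A", max))   # max(1+1+3, 1+1+3) = 5
    print("true cost         :", true_cost("A"))             # 4 edges + one tip = 4 + 3 = 7

Here the max-cost happens to coincide with the cost of the most expensive root-to-tip path, while computing the true cost requires reasoning about the structure of the whole subgraph, which is what makes it expensive in general.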

The next question to answer is how f2 selects frontier nodes for expansion (except in some special cases, e.g., pure OR graphs, more than one frontier node exists in the solution-base). Since the object being pursued is a graph-shaped solution, it may seem that the selection of frontier nodes is inconsequential, i.e., that an arbitrary non-terminal tip node can be selected, or alternatively that all expandable tip nodes should be expanded simultaneously.

Indeed, the specific choices of f1 and f2, and of how ancestor nodes are updated, mark the key differences between the various realizations of GBFS*. Nilsson [142] studied the use of sum- and max-cost when the underlying graph is an AND/OR tree. Chang and Slagle [42] discussed the use of AND/OR graphs to represent problem-solving by reduction; the graph is directed and acyclic, since cyclic reasoning is meaningless; they drew connections between Boolean functions and

AND/OR graphs, and defined the minimum-cost solution as the one with the smallest summed edge cost in the solution graph; however, the function (i.e., f1 in GBFS*) for computing such a measure was not specified. Following a fashion similar to A* for OR-graph search [77], they showed that if the heuristic estimate is admissible and consistent, the algorithm is optimal. It was later proved that computing a solution graph with the minimum summed edge cost in AND/OR graphs is NP-hard [167]. Martelli and Montanari [126, 127] defined additive AND/OR graphs, which are essentially AND/OR graphs equipped with the recursive sum-cost; they described detailed edge-marking and revising procedures for selecting the solution-base and for modifying it when updating ancestors after each expansion. The resulting algorithm was named HS. Nilsson [143] defined AO* in a similar fashion and discussed the differences between the recursive sum- and max-cost schemes. HS can be viewed as AO* with a minor optimization: it updates a node's value only when the new value is lower, while AO* updates whenever there is a change. If h is both admissible and consistent, or the graph is a tree, HS behaves exactly the same as AO*, though the two may differ in other scenarios. Bagchi and Mahanti [16] proved that for AO* to return optimal solutions, the requirement that the heuristic function be admissible and consistent can be relaxed to admissibility alone. Our exposition of GBFS* in Algorithm 7 is adapted from Pearl [153]; GBFS* includes both AO* and A* as special cases. As we are mainly interested in AND/OR graphs rather than pure OR graphs, we restate the details of AO* in Algorithm 8.

In Algorithm 8, the selection of a solution-base is achieved simply by following all marked edges from the start node s. Then a non-terminal tip node is arbitrarily chosen for expansion. After that, all affected ancestors of the just-expanded node are updated. The cost rooted at an arbitrary node t in G′ is denoted f(t). Each newly created node is assessed by the heuristic function h: it provides an estimated cost if the node is non-terminal, and otherwise yields 0 or ∞ according to whether the node is a solvable or an unsolvable terminal. The function f stores the accumulated estimates as G′ grows. For tip nodes, the values of f are always identical to those given by h. Using h* to denote the optimal value of each node in the implicit graph G, the information provided by h can be thought of as a rough first estimate of h*; as G′ grows larger, f becomes a finer and finer estimate of h*, until the optimal value is reached. AO* terminates with an optimal solution graph G0* if the heuristic function h is admissible, i.e., h(t) ≤ h*(t) for every node t in the search graph.

Algorithm 8: AO* algorithm

Input: start node s, rules for generating successor nodes, evaluation function h, cost scheme Ψ
Output: solution graph G0*

Function AO*(s, h, Ψ):
    Let the explicit graph be G′ = {s}
    while true do
        Starting from s, compute a solution-base G0 by tracing down the marked edges
        if G0 is a solution graph then G0* ← G0, exit
        Select any non-terminal tip node t of G0
        Expand t, generating all successors of t
        for tj ∈ successor(t) do
            if tj ∈ G′ then continue
            Evaluate tj by h and let f(tj) ← h(tj)
            if h(tj) = 0 then label tj as a solvable terminal
            if h(tj) = ∞ then label tj as an unsolvable terminal
        end
        UpdateAncestors(t, Ψ, G′)
    end
    return G0*

Function UpdateAncestors(t, Ψ, G′):
    Initialize S ← {t}
    while S is not empty do
        Remove from S any node u such that no descendant of u is in S
        if u is an OR node then
            e ← min_{uj ∈ successor(u)} (c(u, uj) + f(uj)); mark the edge for which the minimum occurs
        end
        if u is an AND node then
            e ← Ψ_{uj ∈ successor(u)} (c(u, uj) + f(uj)); mark all edges between u and its successors
        end
        if e = 0 then label u as a solvable terminal
        if e = ∞ then label u as an unsolvable terminal
        if f(u) ≠ e then
            f(u) ← e
            Add all predecessors of u to S
        end
    end
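The bottom-up revision performed by UpdateAncestors can be sketched in Python as follows. This is a schematic rendering under simplifying assumptions (each node knows its parents, Ψ is the sum, terminal labelling is omitted, and ties are broken by successor order); marked records the edges that the next call to f1 would trace. The data layout and names are invented for illustration.

    def update_ancestors(t, graph, parents, f, cost, marked, psi=sum):
        """Propagate new f values from the just-expanded node t up through its ancestors.

        graph[u]    -> ("OR" | "AND", [successors of u])
        parents[u]  -> list of predecessors of u
        f[u]        -> current cost estimate rooted at u
        cost[u, v]  -> edge cost c(u, v)
        marked[u]   -> children of u reached by marked edges
        """
        pending = {t}
        while pending:
            # pick a node none of whose descendants is still pending (always exists in a DAG)
            u = next(v for v in pending if not _has_descendant_in(v, graph, pending - {v}))
            pending.remove(u)
            kind, succ = graph[u]
            vals = {v: cost[u, v] + f[v] for v in succ}
            if kind == "OR":
                best = min(vals, key=vals.get)
                e, marked[u] = vals[best], [best]              # mark the minimising edge
            else:                                              # AND node: mark all edges
                e, marked[u] = psi(vals.values()), list(succ)
            if f.get(u) != e:
                f[u] = e                                       # value changed: revisit parents
                pending.update(parents.get(u, []))


    def _has_descendant_in(u, graph, targets):
        """True if any node in `targets` is reachable from u via successor edges."""
        stack, seen = list(graph[u][1]), set()
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            if v in targets:
                return True
            stack.extend(graph[v][1])
        return False


    # Toy usage: B was just expanded; its new children D and E carry h estimates in f.
    graph = {
        "A": ("OR", ["B", "C"]),
        "B": ("OR", ["D", "E"]),
        "C": ("OR", []),
        "D": ("AND", []), "E": ("AND", []),
    }
    parents = {"B": ["A"], "C": ["A"], "D": ["B"], "E": ["B"]}
    cost = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 1, ("B", "E"): 1}
    f = {"A": 3.0, "B": 2.0, "C": 5.5, "D": 4.0, "E": 6.0}
    marked = {}
    update_ancestors("B", graph, parents, f, cost, marked)
    print(f["B"], f["A"], marked)   # 5.0 6.0 {'B': ['D'], 'A': ['B']}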

However, one potential drawback of AO* as given in Algorithm 8 is that the selection of the tip node of G0 to expand is ad hoc. Intuitively, at each step AO* tries to identify the most promising solution-base by using the information provided by h and its accumulation f to compute G0: at each OR node it selects the successor with the smallest f value, while at each AND node all successors are selected. The latter selection mechanism is the reason why more than one frontier node can exist in G0. Although the earliest research suggested that all non-terminal tip nodes in G0 be expanded simultaneously [42], this does not seem practically feasible, as such an expansion strategy can quickly exhaust memory due to the exponential growth of the size of G0. On the other hand, the ad-hoc strategy used in Algorithm 8 is naive, as it does not consider the relative merits of the tip nodes in G0 at all.

Take the example in Figure A.5, where all edges have a uniform cost of 4. Following AO*, given the current explicit graph G′, the left branch B is selected, and either successor of B could be chosen for expansion. If D is expanded first, it instantly reveals that the selection of B at the start node was mistaken. However, if E is expanded first, in the next iteration AO* will continue to expand E before switching to the true solution branch C. This example shows that a better decision at AND nodes can indeed save computation in finding the solution object, given the heuristic nature of the decisions made at OR nodes.

The next question is by what measure we should make the decision at AND nodes. The example in Figure A.5 clearly shows that choosing the tip node with the minimum f value is incorrect as well. Intuitively, the metric we wish to have is a measure of how likely a selected successor is to refute our decision at the parent OR node. In Figure A.5, we would want such a measure to lead us to select D rather than E, because proving that D is an infeasible successor immediately refutes our decision of selecting B at A; consequently the search can switch to C instantly without wasting further effort. Such a mechanism for making decisions at AND nodes is essentially symmetric to how AO* implements the function f1 at OR nodes. So, if we equip AO* with a pair of heuristics, one for selecting at OR nodes and the other at AND nodes, a top-down selection scheme that eventually selects a single tip node for expansion exists. The resulting algorithm was never elaborated in the literature. However, a special variant called Proof Number Search (PNS) [2, 4, 92, 108] has been extensively used for solving games, though its connection to AO* has seldom been discussed.

[Figure A.5 shows two panels: the full, implicit graph G and the current explicit graph G′; the tip nodes of G′ carry heuristic cost estimates.]

Figure A.5: An example showing the drawback of AO* in selecting a frontier node for expansion. If D is expanded first, E will never be expanded; however, if E is chosen first, both D and E will be expanded before the search switches to the correct branch C. We assume all edges in the graph have cost 4, so the estimates provided by h in G′ are admissible.

In this regard, we shall call the general version of PNS PNS*: PNS merely counts leaf nodes through proof and disproof numbers, while PNS* works under exactly the same assumptions as AO* except that it employs a pair of heuristic functions, h and h̄, for making decisions at OR and AND nodes respectively. More specifically, letting ⟨p(n), d(n)⟩ denote the estimated costs rooted at node n in G′, we have the following recursive relations:

\[
p(n) =
\begin{cases}
h(n) & \text{if } n \text{ is a non-terminal tip node} \\
\min_{n_j \in \mathrm{successor}(n)} \bigl(c(n, n_j) + p(n_j)\bigr) & \text{if } n \text{ is an OR node} \\
\Psi_{n_j \in \mathrm{successor}(n)} \bigl(c(n, n_j) + p(n_j)\bigr) & \text{if } n \text{ is an AND node}
\end{cases}
\]
\[
d(n) =
\begin{cases}
\bar{h}(n) & \text{if } n \text{ is a non-terminal tip node} \\
\min_{n_j \in \mathrm{successor}(n)} \bigl(c(n, n_j) + d(n_j)\bigr) & \text{if } n \text{ is an AND node} \\
\Psi_{n_j \in \mathrm{successor}(n)} \bigl(c(n, n_j) + d(n_j)\bigr) & \text{if } n \text{ is an OR node}
\end{cases}
\tag{A.4}
\]

When n is terminal:
\[
p(n) =
\begin{cases}
0 & \text{if } n \text{ is solvable} \\
\infty & \text{if } n \text{ is unsolvable}
\end{cases}
\qquad
d(n) =
\begin{cases}
0 & \text{if } n \text{ is unsolvable} \\
\infty & \text{if } n \text{ is solvable}
\end{cases}
\tag{A.5}
\]

Naturally, the optimal values h*(n) and h̄*(n) are defined on the implicit graph in the same fashion, where all tip nodes are terminal. By these definitions, PNS* selects a frontier node to expand in a top-down manner: at each OR node, it selects the

successor nj with the minimum p(nj) value; at each AND node, it selects the successor ni

with the minimum d(ni) value. The celebrated PNS algorithm is a special variant

of PNS* where c(n, nj) = 0 everywhere in Equation (A.4). Figure A.6 is an example showing the merit of PNS* in comparison to AO*.

[Figure A.6 shows three panels: the full, implicit graph G; the solution-base G0 that f1 selects within G′; and, inside the selected G0, the tip node that f2 selects. In G′ the tips D, E, F, G carry the estimate pairs (2,1), (1,2), (3,3) and (4,1) respectively.]

Figure A.6: The same setting as Figure A.5; all edge costs are 4. The difference is that now each tip node has a pair of heuristic estimates, representing the estimated costs of being solvable and unsolvable respectively. All tip nodes have admissible estimates from both h1 and h2. Here h2 successfully discriminates D as superior to E because h2(D) < h2(E). Indeed, as long as h2(D) ∈ [0, 4] and h2(E) ∈ [0, ∞], h2 is admissible, hinting that an arbitrary admissible h2 has a high chance of correctly choosing D. With respect to PNS, we call the algorithm obtained by equipping AO* with a pair of admissible heuristics PNS*.
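The top-down frontier selection defined by Equation (A.4) can be sketched as follows. This is an illustrative Python fragment with an invented graph encoding; p and d are assumed to have already been backed up bottom-up as in Equation (A.4), and the descent simply follows the child minimising c + p at OR nodes and c + d at AND nodes until a tip node is reached. The toy instance mirrors the shape of Figures A.5 and A.6 and ends up selecting D.

    def select_frontier(graph, p, d, cost, root):
        """PNS*-style descent: at OR nodes follow min (c + p), at AND nodes min (c + d).

        graph[u]    -> ("OR" | "AND" | "TIP", [successors of u])
        p[u], d[u]  -> current cost estimates as in Equation (A.4)
        cost[u, v]  -> edge cost c(u, v); classic PNS is the special case where all costs are 0
        """
        node = root
        while graph[node][0] != "TIP":
            kind, succ = graph[node]
            key = p if kind == "OR" else d
            node = min(succ, key=lambda v: cost[node, v] + key[v])
        return node


    # The shape of Figures A.5/A.6: A is an OR node, B and C are AND nodes.
    graph = {
        "A": ("OR",  ["B", "C"]),
        "B": ("AND", ["D", "E"]),
        "C": ("AND", ["F", "G"]),
        "D": ("TIP", []), "E": ("TIP", []), "F": ("TIP", []), "G": ("TIP", []),
    }
    cost = {(u, v): 4 for u, (_, succ) in graph.items() for v in succ}
    p = {"D": 2, "E": 1, "F": 3, "G": 4}
    d = {"D": 1, "E": 2, "F": 3, "G": 1}
    # back up Equation (A.4) for the interior nodes by hand
    p["B"] = 4 + p["D"] + 4 + p["E"]; d["B"] = min(4 + d["D"], 4 + d["E"])
    p["C"] = 4 + p["F"] + 4 + p["G"]; d["C"] = min(4 + d["F"], 4 + d["G"])
    p["A"] = min(4 + p["B"], 4 + p["C"]); d["A"] = 4 + d["B"] + 4 + d["C"]
    print(select_frontier(graph, p, d, cost, "A"))   # -> 'D'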

To further show the relation between the two algorithms, we define the concept of a dual graph below.

Definition 3. Suppose an arbitrary AND/OR graph is denoted G = ⟨Va, Vo, E⟩,

where Va and Vo are respectively the sets of AND and OR nodes, and E is the set of edges.

The dual of G, denoted Ḡ, is defined as ⟨V̄a, V̄o, E⟩, where V̄a = Vo and V̄o = Va. That is, Ḡ is obtained by turning all AND nodes of G into OR nodes and all OR nodes of G into AND nodes, while all edges remain unchanged.
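Definition 3 is mechanical to realise. The sketch below uses an invented lightweight encoding (node mapped to its type and successor list) and simply flips the AND/OR labels while leaving every edge, and any terminal or tip label, unchanged.

    def dual(graph):
        """Return the dual graph: AND nodes become OR nodes and vice versa; edges unchanged."""
        flip = {"AND": "OR", "OR": "AND"}
        return {u: (flip.get(kind, kind), list(succ)) for u, (kind, succ) in graph.items()}


    g = {"A": ("OR", ["B", "C"]), "B": ("AND", []), "C": ("AND", [])}
    print(dual(g))   # {'A': ('AND', ['B', 'C']), 'B': ('OR', []), 'C': ('OR', [])}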

We then have the following observations:

Observation 3. For an arbitrary node n in an AND/OR graph G, p(n) and d(n) are recursive cost schemes defined on G and Ḡ respectively. Let G0 ← f1(G′) and Ḡ0 ← f1(Ḡ′); then, at each iteration, the tip node selected by PNS* for expansion is the unique tip node in the intersection of G0 and Ḡ0.

Observation 4. Given the same tie-breaking, and that the h used by AO* is identical to the h used by PNS*, then for the same explicit graph G′, the frontier node selected by PNS* must also be a tip node of the solution-base G0 selected by AO*.

Observation 5. PNS* can be viewed as a version of AO* in which f2(G0) is implemented by selecting the unique tip node of the intersection between f1(G′) and f1(Ḡ′). Conversely, AO* can be viewed as a less informed variant of PNS* obtained by treating all edge costs in Ḡ′ as 0 and setting h̄ = 0 when applying f1(Ḡ′).

Having seen that PNS* is a more informed version of AO*, and following the corresponding results for A* [153], we conjecture that PNS* is more efficient than AO*:

Conjecture 1. PNS* dominates AO*, given that AO* uses the heuristic function h, PNS* uses the heuristics h and h̄, both algorithms use the same recursive cost scheme Ψ, and both h and h̄ are admissible and consistent.

A.5 Computing True Proof/Disproof Number is NP-hard

PNS is well-defined on AND/OR trees [2], but as described earlier, the search graph of a two-player game is in general an AND/OR graph rather than a tree. Therefore, using the recursive sum-cost may over-count some leaf nodes. This can be a problem in some domains; for example, in Tsume-Shogi, an easy-to-solve transposition node can be assigned a huge proof number, which prevents PNS from selecting it without a long search [105, 136]. Algorithms with exponential complexity are known for finding the true proof and disproof numbers of each node in directed acyclic AND/OR graphs [174], but they are impractical even for toy problems. Heuristic techniques address the issue by replacing the sum-cost with a variant of the max-cost at each node [203], or by identifying specific cases and curing them individually [105, 136]. Such over-counting has been called overestimation in PNS [106, 108], but the theoretical complexity of dealing with this problem has never been formally discussed. We now show that the task is NP-hard.

Theorem 1. Deciding whether a SAT instance is satisfiable can be reduced to finding the true proof (or disproof) number of a node in an AND/OR graph.

Proof. Consider a SAT instance in conjunctive normal form,

\[
P = \bigwedge_{i=1}^{k} C_i,
\]
where each clause is a disjunction of literals, Ci = ∨j lj. Let x1, x2, . . . , xn be all the variables; each literal lj is either xj or ¬xj. Construct an AND/OR graph as follows.

1. Let the start node be an AND node, denoted by P .

2. P has n + k successors: Ci for i = 1, . . . , k and Xj for j = 1, . . . , n. They are all OR nodes.

[Figure A.7 depicts the AND/OR graph constructed for the example formula (x1 ∨ x2 ∨ x3) ∧ (¬x1 ∨ ¬x2 ∨ ¬x3) ∧ (¬x1 ∨ x2): the root P, the clause nodes C1, C2, C3, the variable nodes X1, X2, X3, and the literal nodes Tx1, Fx1, . . . , Tx3, Fx3 above the terminals.]

Figure A.7: Deciding whether a SAT instance is satisfiable can be reduced to finding the true proof number in an AND/OR graph. All tip nodes are terminal with value 0. All edges have cost 0, except those linking to the terminals, which have cost 1.

3. Each Xj has two successors, Txj and Fxj, representing xj and ¬xj respectively. The successors of each Ci are the literal nodes that appear in that clause.

4. Each Txj or Fxj is connected to a terminal node whose value is true or false.

5. All edges have cost 0 except those between Txj or Fxj and the terminal nodes; these edges have cost 1.

6. All leaf nodes are solvable terminals.

For such a graph, to prove the start node P, every clause node must be proved and every variable node Xj must be assigned a value. The true proof number of P equals n if and only if P is satisfiable: each Xj requires at least one of Txj and Fxj (each contributing cost 1), and all clause nodes Ci can then be proved without additional cost exactly when the chosen literals form a satisfying assignment. Therefore, deciding whether a SAT instance is satisfiable can be transformed into finding the true proof number of P in this AND/OR graph. For the disproof number, simply make all leaf nodes unsolvable in the construction; then the true and minimum disproof number of P must be n, which is again equivalent to finding a satisfying assignment for the SAT instance.

The graph construction from the SAT instance takes polynomial time; it follows that computing the proof or disproof number exactly in a general AND/OR graph is NP-hard.

An example formula, P = (x1 ∨ x2 ∨ x3) ∧ (¬x1 ∨ ¬x2 ∨ ¬x3) ∧ (¬x1 ∨ x2), and its constructed AND/OR graph are shown in Figure A.7.
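The construction in the proof of Theorem 1 is easy to mechanise. The Python sketch below (invented helper names, not part of the thesis code) takes a CNF formula as a list of clauses of signed integers and computes, by brute force over which literal nodes a candidate proof subgraph uses, the true proof number of the start node P for this toy instance; by the argument above, the result equals n exactly when the formula is satisfiable.

    from itertools import product

    def true_proof_number(clauses, n):
        """Exact proof number of P for the Theorem 1 construction, by brute force.

        clauses: list of clauses, each a list of non-zero ints (+j means x_j, -j means not x_j).
        A proof subgraph must pick, for every variable node X_j, at least one of the two
        literal nodes {+j, -j}, and for every clause node C_i at least one literal of C_i;
        its cost is the number of distinct literal nodes picked (each costs 1).
        """
        literals = [s * j for j in range(1, n + 1) for s in (+1, -1)]
        best = None
        for mask in product((0, 1), repeat=len(literals)):
            chosen = {l for l, bit in zip(literals, mask) if bit}
            covers_vars = all(+j in chosen or -j in chosen for j in range(1, n + 1))
            covers_clauses = all(any(l in chosen for l in c) for c in clauses)
            if covers_vars and covers_clauses:
                best = len(chosen) if best is None else min(best, len(chosen))
        return best


    # P = (x1 or x2 or x3) and (not x1 or not x2 or not x3) and (not x1 or x2)
    clauses = [[1, 2, 3], [-1, -2, -3], [-1, 2]]
    pn = true_proof_number(clauses, n=3)
    print(pn, "-> satisfiable" if pn == 3 else "-> unsatisfiable")   # 3 -> satisfiable

The enumeration is exponential in the number of literal nodes, which is consistent with the NP-hardness claim; it is only intended to check the construction on this three-clause example.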
