Monte Carlo Tree Search for Games with Hidden Information and Uncertainty
Daniel Whitehouse
PhD
University of York
Computer Science
August 2014

Abstract

Monte Carlo Tree Search (MCTS) is an AI technique that has been successfully applied to many deterministic games of perfect information, leading to large advances in a number of domains, such as Go and General Game Playing. Imperfect information games are less well studied in the field of AI despite being popular and of significant commercial interest, for example in the case of computer and mobile adaptations of turn-based board and card games. This is largely because hidden information and uncertainty lead to a large increase in complexity compared to perfect information games.

In this thesis MCTS is extended to games with hidden information and uncertainty through the introduction of the Information Set MCTS (ISMCTS) family of algorithms. It is demonstrated that ISMCTS can handle hidden information and uncertainty in a variety of complex board and card games. This is achieved whilst preserving the general applicability of MCTS and using computational budgets appropriate for use in a commercial game. The ISMCTS algorithm is shown to outperform the existing approach of Perfect Information Monte Carlo (PIMC) search. Additionally it is shown that ISMCTS can be used to solve two known issues with PIMC search, namely strategy fusion and non-locality. ISMCTS has been integrated into a commercial game, Spades by AI Factory, with over 2.5 million downloads.

The Information Capture And ReUse (ICARUS) framework is also introduced in this thesis. The ICARUS framework generalises MCTS enhancements in terms of information capture (from MCTS simulations) and reuse (to improve MCTS tree and simulation policies). The ICARUS framework is used to express existing enhancements, to provide a tool to design new ones, and to rigorously define how MCTS enhancements can be combined.
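The core idea of the ISMCTS family introduced above can be illustrated with a minimal sketch: a single search tree is built over the searching player's information sets, a fresh determinization (a sampled fully-observable state consistent with what the player knows) is drawn each iteration, and bandit selection is restricted to the actions legal in that determinization, crediting "availability" as in a subset-armed bandit. All function and parameter names below are hypothetical placeholders supplied by the caller; this is a simplified single-observer sketch, not the thesis's full algorithm.

```python
import math
import random

class Node:
    """Node in an information-set tree; children are keyed by action."""
    def __init__(self, action=None, parent=None):
        self.action = action
        self.parent = parent
        self.children = {}   # action -> Node
        self.visits = 0      # times this node was selected
        self.avails = 0      # times this node was available for selection
        self.reward = 0.0    # total reward backed up through this node

def ismcts(info_set, iterations, sample_determinization, legal_actions,
           apply_action, is_terminal, result, c=0.7):
    """Single-observer ISMCTS sketch: one tree over the searcher's
    information sets; a new determinization is sampled every iteration."""
    root = Node()
    for _ in range(iterations):
        state = sample_determinization(info_set)   # Determinize
        node = root
        while not is_terminal(state):              # Select / expand
            actions = legal_actions(state)
            untried = [a for a in actions if a not in node.children]
            if untried:
                a = random.choice(untried)         # Expansion
                node.children[a] = Node(a, node)
                node = node.children[a]
                state = apply_action(state, a)
                break
            for a in actions:                      # subset-armed bandit:
                node.children[a].avails += 1       # credit availability
            node = max((node.children[a] for a in actions),
                       key=lambda n: n.reward / n.visits
                           + c * math.sqrt(math.log(n.avails) / n.visits))
            state = apply_action(state, node.action)
        while not is_terminal(state):              # Random rollout
            state = apply_action(state, random.choice(legal_actions(state)))
        r = result(state)
        while node is not None:                    # Backpropagate
            node.visits += 1
            node.reward += r
            node = node.parent
    # Recommend the most-visited root action.
    return max(root.children.values(), key=lambda n: n.visits).action
```

Unlike PIMC search, which runs an independent perfect-information search per determinization, here the statistics from every determinization accumulate in one tree of information sets, which is how ISMCTS avoids the strategy fusion error described in the abstract.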
The ICARUS framework is tested across a wide variety of games.

Contents

Abstract
Contents
List of Figures
List of Tables
List of Algorithms
Preface
Acknowledgements
Author's Declaration

1 Introduction
  1.1 Background
    1.1.1 Monte Carlo Tree Search
  1.2 UCT for Games and Beyond
  1.3 Research Hypothesis and Project Outcomes
    1.3.1 Additional Project Outcomes
  1.4 Thesis Overview

2 Notation and Definitions
  2.1 Games with Hidden Information and Uncertainty
  2.2 Game Notation
  2.3 Solution concepts for games
  2.4 Statistical Analysis of Experiments

3 Literature Review
  3.1 Multi-Armed Bandits
    3.1.1 k-Armed Multi-Armed Bandit Problem
    3.1.2 Bandit Algorithms
  3.2 Monte Carlo Tree Search
    3.2.1 Monte Carlo Tree Search Algorithms
    3.2.2 Properties of Monte Carlo Tree Search
    3.2.3 Enhancements and Applications
    3.2.4 Simultaneous Moves
    3.2.5 Multiplayer Games
  3.3 Search in Imperfect Information Games
    3.3.1 Determinization
    3.3.2 Minimax
    3.3.3 Monte Carlo Tree Search
    3.3.4 Counterfactual Regret Minimization
    3.3.5 Inference

4 Software and Domains
  4.1 MctsFramework
    4.1.1 Components of MctsFramework
    4.1.2 Handling Hidden Information
    4.1.3 Card Games Framework
    4.1.4 Integration with BOINC
  4.2 Domains
    4.2.1 Dou Di Zhu
    4.2.2 Mini Dou Di Zhu
    4.2.3 Hearts
    4.2.4 Spades
    4.2.5 Lord of the Rings: The Confrontation
    4.2.6 Rules
    4.2.7 Phantom m,n,k-games
    4.2.8 Checkers
    4.2.9 Othello
    4.2.10 Backgammon
    4.2.11 The Resistance
    4.2.12 Scotland Yard
    4.2.13 Saboteur
    4.2.14 Other Implemented games

5 Hidden Information and Strategy Fusion
  5.1 Perfect Information Monte Carlo Search
    5.1.1 Balancing Simulations and Determinizations
    5.1.2 Quantifying the effect of strategy fusion and non-locality
  5.2 Information Set Monte Carlo Tree Search (ISMCTS)
    5.2.1 Subset-armed Bandits
    5.2.2 Handling Chance Events
    5.2.3 The ISMCTS Algorithm
    5.2.4 Evaluating ISMCTS
  5.3 Summary

6 Partial Observability
  6.1 ISMCTS and Partial Observability
  6.2 Bandit algorithms for Simultaneous Actions
  6.3 ISMCTS with Partially Observable Moves
    6.3.1 Algorithms
    6.3.2 Experiments
  6.4 Summary

7 Bluffing and Inference
  7.1 Background
    7.1.1 Existing Approaches
  7.2 ISMCTS with multiple opponent trees
  7.3 Inference using MCTS statistics
  7.4 Self-determinization and bluffing
  7.5 Experiments
    7.5.1 Inference and self-determinization methods for The Resistance
    7.5.2 Balancing true and self-determinizations
    7.5.3 Emergence of bluffing
    7.5.4 Effect of inference on the belief distribution
    7.5.5 Application to other games
  7.6 Summary

8 Information Capture and Reuse (ICARUS)
  8.1 Motivation
  8.2 Information Capture And ReUse Strategies (ICARUSes)
    8.2.1 Defining ICARUSes
    8.2.2 Baseline ICARUS definition
    8.2.3 Enhancements in the ICARUS framework
    8.2.4 Combining ICARUSes
    8.2.5 Convergence properties of ICARUSes
  8.3 Episodic Information Capture and Reuse
    8.3.1 Definition of EPIC
    8.3.2 Episode functions for experimental domains
  8.4 Comparing ICARUSes
  8.5 Experiments
    8.5.1 Strength of ICARUS combinations
    8.5.2 EPIC compared to NAST
    8.5.3 Simulation policies for MAST
    8.5.4 N-gram lengths
    8.5.5 Computation time
  8.6 Summary

9 Conclusion
  9.1 Remarks on the Effectiveness of ISMCTS
  9.2 Future Work

Bibliography

List of Figures

5.1 Histogram of win rates for the landlord player in 100 sets of 1000 Dou Di Zhu games.
5.2 Plot of number of landlord (player 1) wins against number of determinizations, for fixed numbers of UCT iterations per determinization.
5.3 Plot of number of landlord (player 1) wins against number of UCT iterations per determinization, for fixed numbers of determinizations.
5.4 Plot of number of landlord (player 1) wins against number of determinizations, for a fixed number of UCT simulations divided equally among all determinizations.
5.5 Results of the determinization balancing experiment for LOTR:C.
5.6 Illustration of search trees for (a) determinization and (b) information set expectimax.
5.7 Performance of several agents in Mini Dou Di Zhu.
5.8 An example game tree for a simple 1-player game.
5.9 An information set search tree for the game shown in Figure 5.8.
5.10 An example game tree for a simple 2-player game.
5.11 An information set search tree for the game shown in Figure 5.10.
5.12 Playing strength of ISMCTS and Determinized UCT for Dou Di Zhu.
5.13 Playing strengths of ISMCTS and PIMC (Determinized UCT) against AI Factory AI for Dou Di Zhu.
5.14 Playing strength of ISMCTS against AI Factory AI for Spades at the normal and maximum levels.
6.1 Information set search trees for the game shown in Figure 5.10 with partially observable moves.
6.2 Heat map showing the results of the LOTR:C playing strength experiment.
6.3 Results of the playing strength experiment for LOTR:C.
6.4 Heat map showing the results of the phantom 4,4,4-game playing strength experiment.
6.5 Results of the playing strength experiment for the phantom 4,4,4-game.
6.6 Average number of simulations performed in 1 second by determinized UCT, SO-ISMCTS and MO-ISMCTS.
6.7 Playing strength of determinized UCT and SO-ISMCTS for different amounts of decision time playing Dou Di Zhu.
6.8 Playing strength of determinized UCT and MO-ISMCTS for.