On the Construction, Maintenance and Analysis of Case-Based Strategies in Computer Poker
Total Page:16
File Type:pdf, Size:1020Kb
On the Construction, Maintenance and Analysis of Case-Based Strategies in Computer Poker Jonathan Rubin A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science, The University of Auckland, 2013. Abstract The state-of-the-art within Artificial Intelligence has directly benefited from research con- ducted within the computer poker domain. One such success has been the the advancement of bottom up equilibrium finding algorithms via computational game theory. On the other hand, alternative top down approaches, that attempt to generalise decisions observed within a col- lection of data, have not received as much attention. In this thesis we examine top down ap- proaches that use Case-Based Reasoning in order to construct strategies within the domain of computer poker. Our analysis begins with the development of frameworks to produce static strategies that do not change during game play. We trace the evolution of our case-based ar- chitecture and evaluate the effect that modifications have on strategy performance. The end result of our experimentation is a coherent framework for producing strong case-based strate- gies based on the observation and generalisation of expert decisions. Next, we introduce three augmentation procedures that extend the initial frameworks in order to produce case-based strategies that are able to adapt to changing game conditions and exploit weaknesses of their opponents. Two of the augmentation procedures introduce different forms of opponent mod- elling into the case-based strategies produced. A further extension investigates the use of trans- fer learning in order to leverage information between separate poker sub-domains. For each poker domain investigated, we present results obtained from the Annual Computer Poker Com- petition, where the best poker agents in the world are challenged against each other. We also present results against a range of human opponents. The presented results indicate that the top down case-based strategies produced are competitive against both human opposition, as well as state-of-the-art, bottom up equilibrium finding algorithms. Furthermore, comparative eval- uations between augmented and non-augmented frameworks show that strategies which have been augmented with either transfer learning or opponent modelling capabilities are typically able to outperform their non-augmented counterparts. 2 JONATHAN RUBIN Acknowledgments First, I would like to thank my supervisor, Ian Watson, without whom this thesis would not have been written. Thank you for the freedom that allowed me to devote approximately five years of my life to a true passion of mine. Thank you also to my co-supervisor, Jim Warren, for providing a much needed alternative perspective along the way. Thank you to all my fellow graduate students at the University of Auckland. I would also like to acknowledge both the academic and hobbyist members of the computer poker community. Thank you to the members and moderators of the pokerai forums - an invaluable resource for anyone interested in computer poker. Thank you also to the organisers of the computer poker competition. Without such an active and vibrant community much of this work would not have been possible. Finally, thank you so much to my friends and family who have encouraged and supported me along the way. 3 4 JONATHAN RUBIN Contents 1 Introduction 13 1.1. Texas Hold’em....................................... 15 1.2. Performance Metrics & Evaluation........................... 16 1.2.1. Types of Strategies................................ 17 1.2.2. Performance Evaluators............................. 17 1.3. Variance Reduction.................................... 20 1.4. Thesis Objectives..................................... 21 1.5. Thesis Contributions................................... 22 1.6. Thesis Outline....................................... 23 2 Computer Poker: Agents and Approaches 25 2.1. Knowledge-Based Systems................................ 25 2.1.1. Rule-Based Expert Systems........................... 26 2.1.2. Formula-Based Strategies............................ 26 2.1.3. Knowledge-Based Poker Agents......................... 28 2.2. Monte-Carlo and Enhanced Simulation........................ 30 2.2.1. Introduction.................................... 30 2.2.2. Simulation Example............................... 30 2.2.3. Simulation-Based Poker Agents......................... 33 2.3. Game Theoretic Equilibrium Solutions......................... 36 2.3.1. Game Theory Preliminaries........................... 36 2.3.2. Rock-Paper-Scissors Example.......................... 36 2.3.3. Equilibrium Strategies for Texas Hold’em................... 38 2.3.4. Abstractions.................................... 40 2.3.5. Near-Equilibrium Poker Agents......................... 45 2.3.6. Iterative Algorithms for finding -Nash Equilibria............... 49 2.4. Exploitive Counter-Strategies.............................. 58 5 6 JONATHAN RUBIN 2.4.1. Imperfect Information Game Tree Search................... 58 2.4.2. Game-Theoretic Counter-Strategies...................... 64 2.5. Alternative Approaches.................................. 68 2.5.1. Evolutionary Algorithms and Neural Networks................ 68 2.5.2. Bayesian Poker.................................. 70 2.6. Summary.......................................... 71 3 Case-Based Strategies in Computer Poker 73 3.1. Introduction........................................ 73 3.2. Case-Based Reasoning Motivation........................... 75 3.3. Two-Player, Limit Texas Hold’em............................ 77 3.3.1. Overview...................................... 77 3.3.2. Architecture Evolution.............................. 78 3.3.3. Two-Player, Limit Texas Hold’em Framework................. 88 3.3.4. Two-Player, Limit Texas Hold’em Experimental Results........... 92 3.4. Two-Player, No-Limit Texas Hold’em.......................... 99 3.4.1. Abstraction..................................... 99 3.4.2. Translation..................................... 100 3.4.3. No-Limit, Heads-Up Framework........................ 101 3.4.4. No-Limit, Heads-Up Experimental Results................... 108 3.5. Multi-Player, Limit Texas Hold’em............................ 112 3.5.1. Multi-Player Limit Framework.......................... 112 3.5.2. Multi-Player Limit Experimental Results.................... 114 3.6. Concluding Remarks................................... 115 4 Implicit Opponent Modelling via Dynamic Case-Base Selection 117 4.1. Multiple Case-Bases.................................... 118 4.2. Expert Imitators...................................... 120 4.2.1. Action Vectors................................... 120 4.3. Combining Decisions................................... 121 4.3.1. Ensemble-based Decision Combination................... 122 4.3.2. Dynamic Case-Base Selection.......................... 122 4.4. Methodology........................................ 124 4.5. Experimental Results................................... 125 4.5.1. Limit Results.................................... 125 4.5.2. Limit Discussion................................. 126 4.5.3. No-Limit Results................................. 128 CASE-BASED STRATEGIES IN COMPUTER POKER 7 4.5.4. No-Limit Discussion............................... 128 4.6. Concluding Remarks................................... 130 5 Explicit Opponent Modelling via Opponent Type Solution Adaptation 131 5.1. Opponent Type Based Adaptation............................ 132 5.2. Alternative Approaches.................................. 133 5.3. Adaptation......................................... 134 5.3.1. Offline Opponent Type Model Construction.................. 135 5.3.2. Aggression Response Trends........................... 135 5.3.3. Opponent Type.................................. 136 5.4. Online Adaptation..................................... 137 5.4.1. Adapted Strategies................................ 137 5.4.2. Adaptation during Game Play.......................... 138 5.5. Methodology........................................ 138 5.6. Experimental Results................................... 140 5.7. Discussion......................................... 140 5.8. Concluding Remarks................................... 142 6 Strategy Switching via Case-Based Transfer Learning 143 6.1. Strategy Switching..................................... 144 6.1.1. Inter-Domain Mapping............................. 144 6.2. Inter-Domain Mapping Example............................ 146 6.3. Methodology........................................ 147 6.4. Experimental Results................................... 148 6.5. Discussion......................................... 149 6.6. Concluding Remarks................................... 150 7 Conclusions and Future Work 151 7.1. Conclusions........................................ 151 7.2. Future Work........................................ 154 Appendices 159 A Annual Computer Poker Competition Cross-Tables 159 8 JONATHAN RUBIN List of Figures 2.1 A hypothetical rule within a knowledge-based system................ 26 2.2 Three hypothetical trials within a simulation..................... 31 2.3 The game of Rock-Paper-Scissors represented as a tree................ 37 2.4 The game of Rock-Paper-Scissors represented as a matrix............... 38 2.5 A high level view of the game tree for 2-player Texas Hold’em............ 39 2.6 An extensive form game tree with corresponding information set tree....... 54 2.7 Part of Player A’s opponent model