An Approach of Solving Early Phase Multiplayer No-Limit Hold'em Poker

An approach of solving early phase multiplayer No-Limit Hold’em poker using empirical Bayesian statistics and vector spaces

A thesis presented by Damiaan Reijnaers (10804137) for the degree of Bachelor of Science in Artificial Intelligence
Credits: 18 EC

University of Amsterdam
Faculty of Science
Science Park 904
1098 XH Amsterdam

Supervisor
dr. M.A.F. Lewis
Institute for Logic, Language and Computation
Faculty of Science
University of Amsterdam
Science Park 107
1098 XG Amsterdam

March 20th, 2020

Abstract

This thesis proposes an efficient methodology for representing early-game states in No-Limit Texas Hold’em poker and for making decisions by comparing them. It presents a predictive model in the form of an augmented decision tree based on the distance between geometrically represented game situations in a Euclidean vector space, in which opponent models contribute as situational characteristics. The research identifies a shortcoming in existing work on opponent modelling and solves it by introducing a mixed technique based on Bayesian statistics and beta-binomial regression. An implementation is suggested and tested. It can be concluded that an artificial agent using the proposed method is able to distinguish between different game situations and to make (a variety of) decisions based on situational factors.

Contents

Abstract
1 Introduction
2 Putting the idea into perspective
3 Method
  3.1 Proposed methodology
    3.1.1 General definitions
    3.1.2 Opponent modelling
    3.1.3 Situational space
    3.1.4 Decision trees
  3.2 Suggested implementation
    3.2.1 Dataset
    3.2.2 Population
    3.2.3 Decisions based on situations
  3.3 Experiments and results
    3.3.1 Weights
    3.3.2 Estimated hole card ranges
    3.3.3 Predictions for actions and raise sizing
4 Conclusion and discussion
Bibliography
Appendix A - Relevant rules of Texas Hold’em Poker
Appendix B - Glossary of used poker terms
Appendix C - Questionnaire concerning quality of experiment results

List of Figures

1 Dependency graph for variables in a poker game
2 Visualisation of VPIP- and PFR/VPIP-values for players in dataset
3 Clusters of player characteristics
4 Relation between number of observations and player metrics
5 Beta distributions resulting from beta-binomial regression
6 Generated situational space for a raise from first position
7 Relation between folds and raising size
8 Sketch of situation with raise, call and fold to agent on BU
9 Sketch of situation with raise from first position, folds to agent on SB
10 Part of decision tree after agent re-raises

List of Tables

1 Estimations for player metrics using various methods
2 Analysis of parameter d in equation 14
3 Observations of situations in used dataset
4 Estimated weights for raise from first position
5 Estimated hole card range for a player who raised from first position
List of Algorithms

1 Estimating a raising size for the agent

1 Introduction

The rise of Deep Blue, solving chess in 1996, and the recent victory of AlphaGo, ‘solving’ the game of Go in 2017, have made developments towards solving Texas Hold’em poker more relevant. (The human player against whom AlphaGo competed actually won one out of three rounds, which can be called a victory for AlphaGo, but can hardly be called a definitive ‘solution’.) Occasionally, programs such as PokerSnowie gain a considerable amount of attention by making progress towards creating an ‘unbeatable poker algorithm’ [1]. However, poker is a very different game which, unlike chess and Go, involves a great deal of chance. It is a game of imperfect and unreliable information, involving opponent modelling, risk management and deception, which illustrates its relevance to significant areas of AI research (Billings et al., 1998).

Despite the relatively narrow scope of this thesis, multiple disciplines of AI are involved in it. Moreover, the methods presented here are relevant for broader use than just poker. The formulas for beta-binomial regression introduced in section 3.1.2 are applicable to any problem in which the estimation of long-term probabilities plays a role. An example is the ‘batting average’ of baseball or cricket players; in baseball, this is the number of a player’s ‘hits’ divided by their ‘at bats’ [2] (Robinson, 2017). Vector spaces (further explained in section 3.1.3) can be used as an approach to a wide variety of problems containing a large or infinite number of (game) states that can be represented by numbers, such as drug repositioning and stock markets (Manchanda and Anand 2017; Bai et al. 2018, p. 217). For this reason, concepts in this paper are represented both symbolically, using first-order logic, and mathematically, in an attempt to make them more accessible for purposes other than poker.

The game’s complexity does not necessarily imply that a computer can never be profitable playing poker. Since there is a clear distinction between profitable and non-profitable players, it can be assumed that at least a part of the game is not based on chance. However small this part in the distribution between skill and chance might be, it is big enough to turn hundreds of online players into winners of tens of thousands of US dollars [3]. Two studies involving groups of instructed and non-instructed poker players reinforced the claim that skill plays a greater role than chance (DeDonno and Detterman, 2008). Moreover, although broadly sceptical on the matter, Meyer et al. (2013) also state that “experts seem to be better able to minimize losses when confronted with disadvantageous conditions.” Nevertheless, it is worth noting that, besides concluding “that the outcomes of poker games are predominantly determined by chance,” Meyer et al. qualify this conclusion by stating that it applies “at least to short game sequences,” thus leaving long game sequences open for discussion. DeDonno and Detterman filled this gap by showing that skill is the determining factor in long-term outcome. This matches the advice professional poker coaches give their students: “it is about focusing on small advantages to win in the long run” [4, 5].

In contrast to the work discussed above, this thesis will not yield an answer to the debate on whether poker is a game of chance or skill, but will instead focus on how a computer could flawlessly take over certain aspects of the game, using the theory of Bayesian statistics and vector spaces. The aim of this thesis is to answer the following question: how can an agent efficiently make use of the theory of Bayesian statistics and vector spaces to represent and compare game situations in order to make decisions in the early stages of a multi-player No-Limit Texas Hold’em game? To answer it adequately, this research question is subdivided into two subquestions: (1) How can opponents be modelled by considering a population of players? and (2) How can these opponent models be used to make decisions based on publicly available information?

[1] PokerSnowie, Challenge PokerSnowie, https://www.pokersnowie.com/blog/taxonomy/term/13, accessed on February 27th, 2020
[2] Major League Baseball, What is a Batting Average (AVG)?, http://m.mlb.com/glossary/standard-stats/batting-average, accessed on February 27th, 2020
[3] HighstakesDB, Biggest Poker Winners - Top Money Winners in Online Poker, https://www.highstakesdb.com/poker-players.aspx?sortby=winners, accessed on February 25, 2020

The approach presented in this paper is based on a couple of personal observations. First, if taking part in a relatively small sample of poker hands (a couple of hundred thousand or a million) suffices to turn a willing and competent human into a winning player, a computer can certainly do so with the same number of hands, and probably fewer. Secondly, when playing against an unknown opponent, one uses the knowledge gained previously by playing against other opponents, which indicates the use of a ‘population’, a statement supported

2 Putting the idea into perspective

The significance of poker as a testbed for Artificial Intelligence has led to extensive research in the field, which has yielded a wide variety of attempted approaches towards ‘solving the game.’ These include reinforcement learning (Dahl, 2001) and neural networks (Davidson 1999; Billings et al. 2002, p. 226-227). The approach presented in this thesis will solely try to maximize estimated expected value for a
