An approach of solving early phase multiplayer No-Limit Hold’em using empirical Bayesian statistics and vector spaces

A thesis presented by Damiaan Reijnaers (10804137) for the degree of Bachelor of Science in Artificial Intelligence

Credits: 18 EC

University of Amsterdam Faculty of Science Science Park 904 1098 XH Amsterdam

Supervisor dr. M.A.F. Lewis Institute for Logic, Language and Computation Faculty of Science University of Amsterdam Science Park 107 1098 XG Amsterdam

March 20th, 2020

Abstract

This thesis proposes an efficient methodology for representing early-game states in No-Limit Texas Hold’em poker and for making decisions by comparing them. The paper presents a predictive model in the form of an augmented decision tree based on the distance between geometrically represented game situations in a Euclidean vector space, wherein opponent models contribute as situational characteristics. This research identifies a shortcoming in existing work on opponent modelling, and solves it by introducing a mixed technique based on Bayesian statistics and beta-binomial regression. An implementation is suggested and tested. The results indicate that an artificial agent using the proposed method is able to distinguish between different game situations and to make (a variety of) decisions based on situational factors.

Contents

Abstract
1 Introduction
2 Putting the idea into perspective
3 Method
3.1 Proposed methodology
3.1.1 General definitions
3.1.2 Opponent modelling
3.1.3 Situational space
3.1.4 Decision trees
3.2 Suggested implementation
3.2.1 Dataset
3.2.2 Population
3.2.3 Decisions based on situations
3.3 Experiments and results
3.3.1 Weights
3.3.2 Estimated hole card ranges
3.3.3 Predictions for actions and raise sizing
4 Conclusion and discussion
Bibliography
Appendix A - Relevant rules of Texas Hold’em Poker
Appendix B - Glossary of used poker terms
Appendix C - Questionnaire concerning quality of experiment results

List of Figures

1 Dependency graph for variables in a poker game
2 Visualisation of VPIP- and PFR/VPIP-values for players in dataset
3 Clusters of player characteristics
4 Relation between number of observations and player metrics
5 Beta distributions resulting from beta-binomial regression
6 Generated situational space for a raise from first
7 Relation between folds and raising size
8 Sketch of situation with raise, call and fold to agent on BU
9 Sketch of situation with raise from first position, folds to agent on SB
10 Part of decision tree after agent re-raises

List of Tables

1 Estimations for player metrics using various methods
2 Analysis of parameter d in equation 14
3 Observations of situations in used dataset
4 Estimated weights for raise from first position
5 Estimated hole card range for a player who raised from first position

List of Algorithms

1 Estimating a raising size for the agent

1 Introduction

The rise of Deep Blue, solving chess in 1996, and the recent victory of AlphaGo, ‘solving’ the game of Go in 2017, have made developments towards solving Texas Hold’em poker more relevant. (The human player against whom AlphaGo competed actually won one out of three rounds, so the result can be called a victory for AlphaGo, but hardly a definitive ‘solution.’) Occasionally, programs such as PokerSnowie gain a considerable amount of attention by making progress towards creating an ‘unbeatable poker algorithm’ [1]. However, poker is a totally different game, which, unlike Chess and Go, involves a great deal of chance. It is a game with imperfect and unreliable information, involving opponent modelling, risk management and deception, which illustrates the relevance of the game to significant areas in AI research (Billings et al., 1998).

Despite the relatively narrow scope of this thesis, multiple different disciplines of AI are involved in this paper. Moreover, the methods presented in this paper are relevant for broader use than just poker. The to-be-introduced formulas for beta-binomial regression in section 3.1.2 are applicable to any problem in which the estimation of long-term probabilities plays a role. An example is the ‘batting average’ of baseball or cricket players: the number of a player’s ‘hits’ divided by their ‘at bats’ [2] (Robinson, 2017). Vector spaces (further explained in section 3.1.3) can be used as an approach for a wide variety of problems containing a large or an infinite number of (game) states that can be represented by numbers, such as drug repositioning and stock markets (Manchanda and Anand 2017; Bai et al. 2018, p. 217). For this reason, concepts in this paper are represented both symbolically, using first-order logic, and mathematically, in an attempt to make the concepts more accessible for other purposes than poker.

The game’s complexity does not necessarily imply that a computer can never be profitable playing games of poker. Since there is a clear distinction between profitable and non-profitable players, it can be assumed that at least a part of the game is not based on chance. And, however small this part in the distribution between skill and chance might be, it is big enough to turn hundreds of online players into winners of tens of thousands of US dollars [3]. Two studies involving groups of instructed and non-instructed poker players reinforced the statement of a greater role of skill than chance (DeDonno and Detterman, 2008). Moreover, although broadly sceptical on the matter, Meyer et al. (2013) also state that “experts seem to be better able to minimize losses when confronted with disadvantageous conditions.” Nevertheless, it is worth noting that besides concluding “that the outcomes of poker games are predominantly determined by chance,” Meyer et al. nuance this conclusion by stating that it applies “at least to short game sequences,” thus leaving long game sequences open for discussion. DeDonno and Detterman filled this gap by showing that skill is the determining factor in long-term outcome. This matches the advice professional poker coaches give their students: “it is about focusing on small advantages to win in the long run” [4,5].

In contrast to the work discussed above, this thesis will not yield an answer to the debate whether poker is a game of chance or skill, but will instead focus on how a computer could flawlessly take over certain aspects of the game, using the theory of Bayesian statistics and vector spaces. The aim of this thesis is to answer the following question: how can an agent efficiently make use of the theory of Bayesian statistics and vector spaces to represent and compare game situations in order to make decisions in the early stages of a multi-player No-Limit Texas Hold’em game? To answer this question adequately, the research question is subdivided into two subquestions: How can opponents be modelled by considering a population of players? and How can these opponent models be used to make decisions based on publicly available information?

[1] PokerSnowie, Challenge PokerSnowie, https://www.pokersnowie.com/blog/taxonomy/term/13, Accessed on February 27th, 2020
[2] Major League Baseball, What is a Batting Average (AVG)?, http://m.mlb.com/glossary/standard-stats/batting-average, Accessed on February 27th, 2020
[3] HighstakesDB, Biggest Poker Winners - Top Money Winners in Online Poker, https://www.highstakesdb.com/poker-players.aspx?sortby=winners, accessed on February 25, 2020

The approach presented in this paper is based on a couple of personal observations. First, if taking part in a relatively small sample of poker hands (a few hundred thousand or a few million) suffices to turn a willing and competent human into a winning player, a computer can certainly do the same with that number of hands, and probably fewer. Secondly, when playing against an unknown opponent, one uses the knowledge gained previously by playing against other opponents, which indicates the use of a ‘population,’ a statement supported by Chen and Ankenman (2006, p. 38, 67).

It should be noted that the reader is assumed to be aware of the rules of Texas Hold’em, which is necessary to understand the essence of the method presented in this paper. A brief introduction to the rules of the game relevant for this thesis is outlined in Appendix A. Throughout the document, conventional poker terms will be used. A list of these terms and their explanations can be found in the glossary in Appendix B. This document will often use terms such as ‘the agent’ or ‘the program,’ referring to a hypothetical program implementing the method described in this thesis.

2 Putting the idea into perspective

The significance of poker as a testbed for Artificial Intelligence has led to extensive research in the field, which yielded a wide variety of attempted approaches towards ‘solving the game.’ These include reinforcement learning (Dahl, 2001) and neural networks (Davidson 1999; Billings et al. 2002, p. 226-227). The approach presented in this thesis will solely try to maximize the estimated expected value for a decision tree, using statistics directly derived from past observations (further explained in section 3.1.4). This highly benefits the explainability of the choices the agent makes.

According to popular opinion within the community of poker players, players are divided into two groups: mathematical players (players who base their decisions on calculations) and intuitive players (players who base their decisions on a certain ‘gut feeling’). I challenge the distinction between these two ‘types’ of players and treat them as equivalent: mathematical players actively make use of existing theorems and formulas, while intuitive players apply these same methods subconsciously. Because of the uncertain and complex nature of the game, one has to infer playing characteristics of opponents solely by observing an opponent play. The sample sizes of these observations are often small or non-existent, which makes a Bayesian approach to poker a straightforward choice compared to a frequentist interpretation [6] of the game. This reasoning is followed by Chen and Ankenman (2006, p. 38).

A known approach to the problem of predicting opponents’ hole cards is to use Bayes’ theorem in a like manner (Chen and Ankenman 2006, p. 60-64; Van der Kleij 2010, p. 39; Korb et al. 1999). A corresponding

[4] My Poker Coaching, How to become a professional poker player, https://www.mypokercoaching.com/become-professional-poker-player/, accessed on February 4, 2020
[5] Best Poker Coaching, Create Good Habits with a Winning Pregame Routine, https://www.bestpokercoaching.com/create-good-habits-winning-pregame-routine/, accessed on February 4, 2020
[6] In probability theory, a distinction is often made between frequentist and Bayesian approaches; the former defining probabilities as representing the occurrence of events in long run frequencies, the latter basing a probability on indications of the plausibility of an event. “A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule.” - Karl Pearson

approach is taken for the prediction of opponents’ actions (Ponsen et al. 2008; Southey et al. 2012; Korb et al. 1999). This thesis introduces an empirical Bayesian approach towards the modelling of opponents. The proposition made in this paper is uncommon, and probably unprecedented, in the field of research on poker. The method allows for an accurate estimation of statistics on opponents (e.g. a player’s degree of aggression) based on the player population. It demonstrates a bias in existing work, such as in the mentioned Ponsen et al. (2008), and solves it using beta-binomial regression, which is explained in section 3.1.2. Although Bayesian statistical methods will be extensively used in the most fundamental aspects of the opponent models, this paper will differ from the earlier mentioned work by using vector spaces for the representation of different game states and the prediction of an opponent’s actions, hole cards and raise sizes. This choice is inspired by the idea of conceptual spaces.

Conceptual spaces allow for the representation of concepts in a geometric space, in which the dimensions represent characteristics (or features) of the concept. A similarity function is used to measure the similarity of concepts within the space (Gärdenfors, 2004). An implementation based on conceptual spaces is pre-computable and handles bet sizing and stack sizing in a continuous way. No-Limit Hold’em is often deemed ‘too complex’ for AIs to solve because of the abundance of possible game states due to the allowance of bet sizes without limit [7,8] (Johanson, 2013), and other implementations therefore opt for pre-specified static bet sizes through abstractions (Moravčík et al. 2017, p. 3; Brown and Sandholm 2017; Brown et al. 2018), including already cited work (Van der Kleij, 2010, p. 44). The implementation proposed in this thesis does not pre-specify any kind of bet sizing.

In addition, when combining multiple variables into a Bayesian model, which is generally required when modelling a multi-player No-Limit Hold’em game, another problem arises: the huge complexity of the game. The large number of possible game states often causes data sets to be too sparse to reliably represent the needed prior beliefs [9,29]. The choice of vector spaces also offers a solution to this problem, as any previously observed poker game state can be considered in a weighted comparison with a newly encountered game state in which the same action sequence occurred. This model will still behave similarly to a complete Bayesian model, while avoiding the explained complications related to continuous bet and stack sizes and being less sensitive to small samples. This will be further discussed at the end of section 3.1.3.

Essential for calculating the expected value for branches of the to-be-constructed decision trees, as explained in section 3.1.4, is estimating the relative strength or equity of a starting hand. After discounting the two hole cards initially dealt to the agent, $\binom{50}{5} = 2{,}118{,}760$ different board combinations exist. The strength of one single starting hand versus another hand can be precisely calculated by considering all possible run-outs of the board. The results of these calculations are readily available [10]. In more involved problems, such as when determining the strength of a starting hand versus a range of hands, a hand’s equity can be approximated with Monte Carlo simulations [11], which are observed to converge quickly [12] (Metropolis and Ulam, 1949, p. 335-341). Many different implementations of hand evaluators are available on development platforms [13], many of which are based on mechanisms invented for efficient computing [14]. Large lookup tables for pre-calculated evaluations, such as for the Two Plus Two evaluator (containing 32,487,834 entries), are freely available [15].
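The contrast between exact enumeration and Monte Carlo approximation can be illustrated on a much simpler quantity than equity. The sketch below is not a hand evaluator and is not taken from the thesis: it assumes the agent holds a pocket pair and asks how often at least one of the two remaining cards of that rank appears among the five board cards. The function names and the sample count are illustrative assumptions.

```python
import math
import random

DECK_AFTER_HOLE_CARDS = 50  # 52 cards minus the agent's two hole cards
BOARD_SIZE = 5
OUTS = 2  # the two remaining cards of the pocket pair's rank


def exact_hit_probability() -> float:
    """Exact probability by counting boards that contain none of the outs."""
    boards_without_out = math.comb(DECK_AFTER_HOLE_CARDS - OUTS, BOARD_SIZE)
    all_boards = math.comb(DECK_AFTER_HOLE_CARDS, BOARD_SIZE)
    return 1.0 - boards_without_out / all_boards


def monte_carlo_hit_probability(samples: int, seed: int = 0) -> float:
    """Approximate the same probability by sampling random boards."""
    rng = random.Random(seed)
    deck = range(DECK_AFTER_HOLE_CARDS)  # cards 0 and 1 stand for the outs
    hits = 0
    for _ in range(samples):
        board = rng.sample(deck, BOARD_SIZE)
        if 0 in board or 1 in board:
            hits += 1
    return hits / samples


if __name__ == "__main__":
    print("boards:", math.comb(DECK_AFTER_HOLE_CARDS, BOARD_SIZE))
    print("exact:", exact_hit_probability())
    print("sampled:", monte_carlo_hit_probability(100_000))
```

A real equity estimate would replace the membership test with a hand evaluator that compares the agent's best five-card hand with those of its opponents; the sampling loop itself would keep the same shape.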

[7] Pokernews, Artificial Intelligence and Holdem, Part 3: No-Limit Holdem, The Next Frontier, https://www.pokernews.com/strategy/artificial-intelligence-hold-em-3-23218.htm, Accessed on November 15, 2019
[8] Cardschat, Poker Bots Arent Powerful Enough to Solve No Limit Holdem (Yet), https://www.cardschat.com/news/poker-bots-arent-powerful-enough-solve-no-limit-holdem-yet-52909, Accessed on November 15, 2019
[9] More formally called the prior probability of an event occurring. It expresses one’s ‘belief’ of a random event happening when applying Bayesian statistical inference.
[10] PokerStove, preflop-matchups.txt.gz, http://web.archive.org/web/20110612052656/http://www.pokerstove.com/analysis/preflop-matchups.txt.gz, accessed on February 28th, 2020
[11] Monte Carlo simulations are a class of random sampling algorithms to approximate probabilities by considering subsets of a set of events.

Considering that a hand’s perceived equity depends on various factors, such as ‘post-flop playability,’ other approaches have been studied as well. Dalpasso and Lancia (2015) observe the concept of equity as a combination of features of a (hand on the) flop, such as ‘the flop contains one overcard.’ Although based on the limit variant of the game, groupings of starting hands have been proposed by, among others, Johanson et al. (2013) and Sklansky and Malmuth (1999, p. 14-15). The hand groupings stemming from the latter work will be used in section 3.1.3 to ‘learn’ weights for the previously introduced vector spaces. As the scope of this thesis pertains only to the ‘pre-flop’ phase of the game, a precise interpretation of equity is preferred over an interpretation depending on ‘post-flop play.’ In this thesis, Monte Carlo simulations will be used to estimate the equity of a hand versus (possibly multiple) opponents’ ranges of hands.

3 Method

This section is composed of three subsections. The first subsection is concerned with the proposed methodology and effectively answers the main research question stated in the introduction: how can an agent efficiently make use of the theory of Bayesian statistics and vector spaces to represent and compare game situations in order to make decisions in the early stages of a multi-player No-Limit Texas Hold’em game? In section 3.2, an implementation of the proposed methodology is suggested. Section 3.3 proceeds to present the findings of this implementation and serves to strengthen the case for further investigating the methods proposed in section 3.1. In appendix C, as an additional feature, a questionnaire regarding the quality of the suggested implementation is attached together with its results.

In this thesis, a dataset of 6,552,060 hands and 63,657 different players is consistently used. Each hand is played by six players. The choice of this particular dataset is motivated in section 3.2.1.

Only the first stage of the game is considered: the pre-flop betting round. This is the first betting round of the game, when no ‘community cards’ have been revealed. For convenience, this part of the game will be referred to by using the following terminology: poker game, game, poker hand or hand. All games are assumed to take place in a ‘rake-free environment:’ no commission is withheld, unlike what is usually the case in poker games. Although briefly explained in appendix B, it is necessary to clarify what is meant by a ‘hole card range.’ At the start of every hand, players are dealt two hidden cards, which are used to form combinations with five community cards visible to all players. These cards are a player’s ‘hole cards.’ A ‘hole card range’ is a group of hole card combinations which a player is assumed to hold in a certain situation: instead of predicting an opponent’s exact hole cards, a weighted spectrum of possible holdings is proposed.
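As a small illustration of what a ‘weighted spectrum of possible holdings’ could look like in code, the sketch below stores a range as a mapping from hand classes to weights and normalises it into a probability distribution. The representation, the function names and the example weights are assumptions made for this illustration; they are not the thesis's implementation. The combination counts are standard: a pocket pair can be dealt in 6 ways, a suited hand in 4 and an offsuit hand in 12.

```python
from typing import Dict


def combos(hand_class: str) -> int:
    """Number of concrete two-card combinations in a hand class such as 'QQ', 'AKs', 'AKo'."""
    if len(hand_class) == 2:
        # 'QQ' is a pair (6 combos); 'AK' without suffix covers suited and offsuit (16).
        return 6 if hand_class[0] == hand_class[1] else 16
    if len(hand_class) == 3 and hand_class[2] == "s":
        return 4
    if len(hand_class) == 3 and hand_class[2] == "o":
        return 12
    raise ValueError(f"unrecognised hand class: {hand_class!r}")


def normalise_range(weights: Dict[str, float]) -> Dict[str, float]:
    """Turn per-combination weights into a probability for each hand class."""
    mass = {hand: weight * combos(hand) for hand, weight in weights.items()}
    total = sum(mass.values())
    if total <= 0:
        raise ValueError("a range needs at least one positive weight")
    return {hand: value / total for hand, value in mass.items()}


if __name__ == "__main__":
    # Hypothetical weights: how likely the player is to hold each combination.
    example = {"AA": 1.0, "KK": 1.0, "AKs": 0.8, "AKo": 0.5}
    for hand, probability in normalise_range(example).items():
        print(f"{hand}: {probability:.3f}")
```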

[12] Miscellaneous Remarks, Ideas, Trials, Poker 4: Monte Carlo Analysis, http://oscar6echo.blogspot.com/2012/09/poker-4-monte-carlo-analysis.html, accessed on February 28th, 2020
[13] GitHub, XPokerEval - A collection of poker hand evaluation source code compiled by James Devlin., https://github.com/tangentforks/XPokerEval, accessed on February 28th, 2020
[14] Suffecool, Cactus Kev’s Poker Hand Evaluator, http://suffe.cool/poker/evaluator.html, accessed on February 28th, 2020
[15] GitHub, HandRanks.dat, https://github.com/christophschmalhofer/poker/blob/master/XPokerEval/XPokerEval.TwoPlusTwo/HandRanks.dat, accessed on February 28th, 2020

3.1 Proposed methodology

This subsection has been further subdivided into four parts. The first part introduces a formal explanation of the game and deals with general definitions which are used in the three subsequent subsections to answer the research question. Section 3.1.2 addresses the first subquestion: How can opponents be modelled by considering a population of players? The two remaining sections focus on the second subquestion: How can these opponent models be used to make decisions based on publicly available information?

[Figure 1: dependency graph over the variables $\Omega$, $\Omega_\beta$, $s_\beta$, $O$, $\tau$, $B$, $A$, $\gamma$, $\zeta$, $\eta$ and $\alpha_{i+1}$.]
Figure 1: Dependency graph for variables in a poker game $\omega$. The dashed line illustrates the dynamics of players influencing each other. Dotted lines point to logical expressions for relations.

3.1.1 General definitions

A set of definitions and variables is introduced below. The dependence of these variables is shown as a graphical model in figure 1.

A poker game will be denoted by $\omega_h$ and is defined as a set consisting of the following elements:

• An integer $\nu_\omega$ denoting the value of the big blind.

• A set $B_\omega = \{\beta_1 \ldots \beta_N\} \subseteq O$ of $N$ players and a nested set $B'_\omega = \{p_1, \ldots, p_n\} = \{\{\gamma, \zeta, \eta\}\} \times B$, where $\gamma_\beta$ denotes the player’s current stack size divided by $\nu_\omega$; $\zeta_\beta$ denotes the relative position on the table; and $\eta^1_\beta, \eta^2_\beta \in \{2,3,4,5,6,7,8,9,10,J,Q,K,A\} \times \{♠,♡,♣,♢\}$, $\eta^1_\beta \neq \eta^2_\beta$, denote the player’s hole cards.
  – Players on a later position at the table have access to more information (the actions of earlier positioned opponents), which influences the action taken on such a position (seat).
  – The player’s stack is divided by the number of big blinds in order to preserve the ability to shift between, and take into account, different stakes.
  – When referring to a player’s hole cards, a two-letter notation (such as AK and T9 for variants of A♣K♠ and 10♣9♣) is often used, which may include an additional o or s (see appendix B under ‘suited’). A ‘T’ refers to a ten (10♣, 10♠, 10♡ or 10♢).

• An ordered action sequence $A_\omega$ of $I$ actions $\langle \alpha_1, \ldots, \alpha_I \rangle$. During a game, this action sequence can be expanded by an action $[\alpha_i \in \phi(\omega_h)]$ performed by a player $[p_n = \{\{\gamma, \zeta, \eta\}, \beta\} \land p_n \in B'_\omega]$.
  – A poker situation or situation is referred to by a game’s action sequence. If referring to a similar situation, an identical action sequence is assumed, regardless of the size of $x$ when $\forall p_n \in \Psi(\omega_h)\; \alpha_i = \text{raise } x$.
  – $x$ denotes the value of the total bet including the amount before the bet’s increment. In this paper, raise sizes are also denoted as a percentage for the total bet, with respect to the pot. For example: if the pot contains $0.15 and a player raises to $0.25, this raise is denoted as a 166.67%-raise.

And in general defined are:

• A function $\phi(\omega_h)$ returning a set of currently legal actions $a \subset \{\text{fold}, \text{call}, \text{check}, \text{raise } x\}$ where $x$ is a discrete amount between $\nu_\omega$ and $\gamma_\beta$.

• A function $\psi(\omega_h)$ returning a one-element set (or the empty set) containing the player $p_n = \{\{\gamma, \zeta, \eta\}, \beta\}$ who is to generate the next action $\alpha_{i+1}$. The ordered sequence of all acting players is denoted by $\Psi(\omega_h)$ and we denote the $i$-th acting player $p_i$ by $\Psi_i$.
  – Since players can get raised, $\Psi(\omega_h)$ might contain the same $\beta$-element multiple times.

• A set $\Omega = \{\omega_1 \ldots \omega_H\}$ of $H$ previously completed games. Each completed poker game $\omega_h = \{A_\omega, B_\omega, \nu_\omega\}$ is inferred by parsing hand histories.

• A variable $\tau$ representing the factor of time.
  – A population changes its playing style over time since opponents are influenced by each other and by the increasing availability of outlines on strategy and the mathematics of the game [16,17].

• A set $O$ of all possible opponents, which is referred to as the population. Every possible opponent is considered as having strategy $\Upsilon(o_m) = s_m$. Further, defined is a set $O' = \{o_1 \ldots o_M \mid o_m \in B_{\omega_h} \land \omega_h \in \Omega\}$ of all previously encountered opponents.

3.1.2 Opponent modelling

One of the unknown variables defined in figure 1 is the strategy $s_\beta$ or playing style of an opponent. In accordance with section 3.1.1, it is assumed that a player’s strategy depends on the previous games in which the player participated. This relation is illustrated in figure 1. The agent will emulate and enhance this behaviour by additionally taking into account games in which the agent did not participate.

A form of representing poker situations is needed in order to compare similar game situations. In the next section, a multi-dimensional space will be introduced whose vectors have components describing the characteristics of these situations. The described opponent models of each player participating in the hand serve as some of the components of these vectors. Since only the pre-flop part of the game is considered, Equation 1 and Equation 2 are introduced for describing $s_\beta$. These two metrics will be used solely to characterize each player and to identify similar players, by comparing these metrics in a geometric space.

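\[
\mathrm{VPIP}(\beta) = \frac{\operatorname{count}\left(\{\omega \mid \omega \in \Omega \land \beta \in B_\omega \land \exists \alpha_i\, (\alpha_i \in A_\omega \land \alpha_i \in \{\text{call}, \text{raise } x\} \land \beta \in \Psi_i(\omega))\}\right)}{\operatorname{count}\left(\{\omega \mid \omega \in \Omega \land \beta \in B_\omega\}\right)} \cdot 100 \tag{1}
\]

\[
\mathrm{PFR}(\beta) = \frac{\operatorname{count}\left(\{\omega \mid \omega \in \Omega \land \beta \in B_\omega \land \exists \alpha_i\, (\alpha_i \in A_\omega \land \alpha_i = \text{raise } x \land \beta \in \Psi_i(\omega))\}\right)}{\operatorname{count}\left(\{\omega \mid \omega \in \Omega \land \beta \in B_\omega\}\right)} \cdot 100 \tag{2}
\]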
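Equations 1 and 2 amount to counting, among all games in which a player was dealt in, those in which the player called or raised (VPIP) or raised (PFR). A minimal sketch of that counting is given below. The `Hand` structure, its field names and the example hands are hypothetical stand-ins for parsed hand histories, and forced blind posts are assumed to be excluded from the action list.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Hand:
    """One pre-flop game: the players dealt in and the ordered (player, action) sequence."""
    players: Tuple[str, ...]
    actions: List[Tuple[str, str]]  # action is 'fold', 'call', 'check' or 'raise'


def _percentage(hands: List[Hand], player: str, qualifying: Set[str]) -> float:
    """Share (in %) of the player's hands containing at least one qualifying action."""
    dealt_in = [hand for hand in hands if player in hand.players]
    if not dealt_in:
        raise ValueError(f"no observations on player {player!r}")
    hits = sum(
        1
        for hand in dealt_in
        if any(actor == player and action in qualifying for actor, action in hand.actions)
    )
    return 100.0 * hits / len(dealt_in)


def vpip(hands: List[Hand], player: str) -> float:
    """Equation 1: hands in which the player voluntarily put money into the pot."""
    return _percentage(hands, player, {"call", "raise"})


def pfr(hands: List[Hand], player: str) -> float:
    """Equation 2: hands in which the player raised pre-flop."""
    return _percentage(hands, player, {"raise"})


if __name__ == "__main__":
    history = [
        Hand(("a", "b", "c"), [("a", "raise"), ("b", "call"), ("c", "fold")]),
        Hand(("a", "b", "c"), [("a", "fold"), ("b", "call"), ("c", "check")]),
        Hand(("a", "b"), [("a", "call"), ("b", "raise"), ("a", "fold")]),
    ]
    print(vpip(history, "a"), pfr(history, "a"))
```

Because a hand is counted at most once regardless of how many times the player acts in it, a call followed by a re-raise in the same hand contributes a single observation to both metrics.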

[16] Reddit, How has NLHE strategy changed over the last 5 - 10 years?, https://www.reddit.com/r/poker/comments/2ojlig/how_has_nlhe_strategy_changed_over_the_last_5_10/, Accessed on November 20th, 2019
[17] TwoPlusTwo, Will Poker games continue to get tougher?, https://forumserver.twoplustwo.com/32/beginners-questions/will-poker-games-continue-get-tougher-1435373/, Accessed on November 20th, 2019

Equation 1 formalizes the concept of a VPIP-value: the percentage of hands in which a player voluntarily puts money into the pot when presented with the opportunity of doing so. This value is an indicator of the tightness or looseness of a player, i.e. how many hands a player likes to play. Equation 2 is similar to the preceding equation, except that in this equation only raises count towards the percentage. The resulting value is the player’s pre-flop raise- or PFR-value and indicates the aggressiveness or passiveness of a player pre-flop.

In agreement with commonly held beliefs in the poker community, different groups of poker players exist, distinguished by their playing characteristics. The plots in figure 2 hint at the existence of at least three of the ‘classical’ groups of players which are believed to significantly populate the overall player population: tight-aggressive, tight-passive and loose-passive players. For a minimum of 1,000 hand observations per player, the Mean Shift procedure (Fukunaga and Hostetler, 1975) seems to affirm this intuition, as shown in figure 3a on page 13. Thorndike’s famous Elbow method (Thorndike, 1953) also hints at the existence of three clusters when a clustering range of 1 to 10 is specified. This is visualized together with the respective k-means clustering (with k = 3) in figure 3b and 3c (Lloyd, 1982). These findings confirm the usefulness of characterizing players by the two introduced metrics, as they convey sufficient information to classify a playing style.

[Figure 2: scatter plots of PFR/VPIP (vertical axis) against VPIP (horizontal axis). (a) Sample of 3,821 players (≥ 1000 hands). (b) Sample of 20,989 players (≥ 100 hands).]
Figure 2: Visualisation of VPIP- and PFR/VPIP-values for players in dataset

As there are often too few observations (or no observations at all) on opponents to be encountered, a frequentist approach to calculating an individual opponent’s VPIP and PFR values is often inaccurate. Nevertheless, many profitable players use software based on frequentist analyses to keep track of these statistics [18,19,20]. Although some players opt to

[18] BeatingBetting, How Valuable Are Poker HUDs?, https://www.beatingbetting.co.uk/poker/best-poker-hud/#How_Valuable_Are_Poker_HUDs, accessed February 29th, 2020
[19] CardsChat, How necessary is a HUD?, https://www.cardschat.com/f61/how-necessary-a-hud-234552/, accessed on February 29th, 2020
[20] TwoPlusTwo, What % of players use a HUD?, https://forumserver.twoplustwo.com/32/beginners-questions/what-players-use-hud-962957/, accessed on February 29th, 2020

tweak their software [21,22], the only known software in the field to have directly implemented an alternative acknowledges the shortcomings of a frequentist approach, but does not implement a full solution such as described below [23]. To solve the problem of inaccurate statistics, players using this kind of software generally wait until their statistics on players start to converge [24,25,26], missing valuable information in the meantime.

Instead, in this paper these metrics are proposed to be estimated using the population as prior knowledge. By replacing the introduced PFR-metric with a derived metric abbreviated as PFR/VPIP and defined as $\frac{\mathrm{PFR}}{\mathrm{VPIP}} \cdot 100$, both metrics can be interpreted as probabilities [27]. Since the population can be modelled as a distribution of these player characteristics, the population essentially becomes a distribution of probabilities. As the beta distribution [28] is a continuous probability distribution defined on $[0,1]$, this distribution is a probability distribution of probabilities. This makes it perfectly suitable as the Bayesian prior [29]. Since the player characteristics are probabilities themselves, the Bayesian likelihood is Bernoulli distributed [30]. The theory of conjugate priors can be used to construct a player’s own Beta distribution of probabilities of a probability (in this case a player’s VPIP or PFR/VPIP value) as shown in equation 3 to 6 (Raïffa and Schlaifer 1961; Diaconis and Ylvisaker 1979, p. 274). Here, $s$ denotes the number of successes (e.g. a player voluntarily putting money into a pot), while $f = n - s$ denotes the number of failures, where $n$ is the total number of observations on a player. As $P(\theta)$ is beta distributed, and $P(X \mid \theta)$ is binomially distributed, $P(\theta \mid X)$ is also beta distributed. By integrating the product of their probability mass and density functions, the probability density function for another beta distribution, with parameters $\alpha + s$ and $\beta + f$, is derived. This is the player-specific distribution based on observations on the analysed player. Note that $\alpha$, $\beta$ and $\sigma$ in this section refer to parameters of distributions rather than the definitions in section 3.1.1.

\[
P(\theta \mid X) = \frac{\overbrace{P(X \mid \theta)}^{\text{player specific}} \cdot \overbrace{P(\theta)}^{\text{population}}}{P(X)} = \frac{P(X \mid \theta) \cdot P(\theta)}{\int P(X \mid \theta) \cdot P(\theta)\, d\theta} \tag{3}
\]

\[
= \frac{\binom{n}{s} \theta^{s} (1-\theta)^{f} \cdot \dfrac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)}}{\int \binom{n}{s} \theta^{s} (1-\theta)^{f} \cdot \dfrac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)}\, d\theta} \tag{4}
\]

\[
= \frac{\binom{n}{s} \dfrac{\theta^{s+\alpha-1}(1-\theta)^{f+\beta-1}}{B(\alpha,\beta)}}{\int \binom{n}{s} \dfrac{\theta^{s+\alpha-1}(1-\theta)^{f+\beta-1}}{B(\alpha,\beta)}\, d\theta} \tag{5}
\]

\[
= \frac{\theta^{s+\alpha-1}(1-\theta)^{f+\beta-1}}{B(s+\alpha, f+\beta)} = \mathrm{Beta}(\alpha + s, \beta + f) \tag{6}
\]

[21] TwoPlusTwo, Help - Using PT4 to analyse population tendencies, https://forumserver.twoplustwo.com/185/heads-up-sng-spin-gos/help-using-pt4-analyse-population-tendencies-1208561/, accessed on March 1st, 2020
[22] TwoPlusTwo, , https://forumserver.twoplustwo.com/15/poker-theory/wilson-score-interval-1023431/, accessed on March 1st, 2020
[23] Poker Copilot, User Guide – Statistics or probabilities?, https://pokercopilot.com/userguide/6/en/topic/statistics-or-probabilities, accessed on February 29th, 2020
[24] Smart Poker Study, HUD Reliability: Number of Hands and Sample Sizes, https://www.smartpokerstudy.com/hud-reliability-number-of-hands-and-sample-sizes-226, accessed on March 1st, 2020
[25] PokerStars School, HUD Stats Youre Doing it Wrong!, https://www.pokerstarsschool.com/strategies/hud-stats-doing-it-wrong/668/, accessed on March 1st, 2020
[26] TwoPlusTwo, How Do I Use HUD Stats With Small Samples?, https://forumserver.twoplustwo.com/32/beginners-questions/how-do-i-use-hud-stats-small-samples-1551067/, accessed on March 1st, 2020
[27] If the value for a player’s PFR had not been divided by their VPIP-value, their PFR-value would always have been capped at their VPIP-value.
[28] The probability density function of a Beta distribution is $f(x; \alpha, \beta) = \frac{1}{B(\alpha,\beta)} x^{\alpha-1} (1-x)^{\beta-1}$, where $0 \le x \le 1$ and $\alpha, \beta > 0$.
[29] Bayes’ theorem defines a probability by taking prior knowledge into account. A probability of an event A given evidence B is given by $P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B)}$, where $P(B \mid A)$ conveys the likelihood of evidence B given that A indeed occurred, and $P(A)$ conveys the prior probability of A occurring at all.
[30] The probability mass function of a Bernoulli distribution is $f(k; p) = p^{k} (1-p)^{1-k}$ where $k \in \{0,1\}$ and $0 \le p \le 1$. Here, $p$ is the parameter of the Bernoulli distribution.
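The conjugate update of equations 3 to 6 reduces to adding the observed successes and failures to the prior parameters. The sketch below shows that arithmetic and contrasts the posterior mean with the frequentist estimate $s/n$. The prior Beta(2, 8) and the observation counts are made-up values chosen for illustration only; in the thesis the prior is fitted to the player population, which is not reproduced here.

```python
from typing import Tuple


def posterior_params(alpha: float, beta: float, successes: int, observations: int) -> Tuple[float, float]:
    """Beta prior (alpha, beta) plus binomial data gives Beta(alpha + s, beta + f)."""
    if not 0 <= successes <= observations:
        raise ValueError("successes must lie between 0 and the number of observations")
    failures = observations - successes
    return alpha + successes, beta + failures


def beta_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


if __name__ == "__main__":
    prior = (2, 8)  # hypothetical population prior with mean 0.2
    for s, n in [(3, 5), (300, 500)]:
        a, b = posterior_params(*prior, s, n)
        print(f"n={n}: frequentist={s / n:.3f}, posterior mean={beta_mean(a, b):.3f}")
```

With only five observations the posterior mean stays close to the prior, while with five hundred observations it approaches the observed frequency, which is the behaviour the text argues a frequentist estimate lacks.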

A subsequent problem arises: the majority of players in a population are often players on which there are very few observations. In the dataset used in this paper, 67.02% of the players have fewer than 100 observations. Attempting to improve the level of certainty of a dataset by leaving out all players with a number of observations below a certain threshold results in a bias. As pointed out in section 2 of this thesis, some well-received papers on this subject still incorrectly implement this process. The reason for the biased Bayesian prior is straightforward: professional players can be assumed to play differently than ‘recreational players,’ but professional players also play more often and thus have a higher chance of ending up (more frequently) in the dataset of observations on players.

By letting n denote the total number of played hands for a player in the dataset, I proceed to verify the statement above by calculating the mean VPIP and PFR/VPIP for all players for every possible n. So, the means for n = 55 are the averaged metrics of every player who participated in a total of 55 hands. To indicate convergence of a metric, an uncertainty value based on the Wilson Score Interval[31] is introduced and shown in equation 7 (Wilson, 1927). This uncertainty value is plotted against a logarithmic scale of n in figure 4 on page 14. By performing both weighted and unweighted linear regression, the figure illustrates a correlation between the VPIP and PFR/VPIP metrics and the number of observations on players. These graphs comply with a generally known idea among the poker player community that recreational players (or fish in poker jargon) often play more hands and show less aggression[32][33].

$$\text{Uncertainty} = 2 \times \left( \hat{p} - \frac{\hat{p} + \frac{z^{2}}{2 \cdot n} - z \cdot \sqrt{\frac{\hat{p} \cdot (1-\hat{p}) + \frac{z^{2}}{4 \cdot n}}{n}}}{1 + \frac{z^{2}}{n}} \right) \tag{7}$$

Figure 3: Clusters of player characteristics. (a) Mean Shift clustering (3 clusters detected, labelled TAg, L/TPs and LPs), plotting PFR/VPIP against VPIP. (b) Distortion score elbow for K-means clustering (k = 3, score = 439298.53). (c) K-means clustering of Figure 2a (k = 3, clusters labelled TAg, T/LAg and T/LPs), plotting PFR/VPIP against VPIP.

[31] The Wilson Score Interval is a method for showing binomial proportion confidence intervals.
[32] BlackRain79, What Does VPIP Mean? An Extremely Simple Explanation, https://www.blackrain79.com/2016/07/what-is-vpip.html, accessed on December 7th, 2019
[33] PokerVIP, Tips for Identifying the Recreational Players, https://www.pokervip.com/strategy-articles/texas-hold-em-no-limit-beginner/tips-for-identifying-the-recreational-players, accessed on December 7th, 2019
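The uncertainty value of equation 7 can be sketched in a few lines of Python. The function names are hypothetical; the values z = 0.95 and p̂ = 0.257 are the ones quoted in the legend and caption of figure 4, and the hand counts are illustrative.

```python
import math


def wilson_lower_bound(p_hat: float, n: int, z: float) -> float:
    """Lower bound of the Wilson score interval for a binomial proportion."""
    z2 = z * z
    centre = p_hat + z2 / (2 * n)
    margin = z * math.sqrt((p_hat * (1 - p_hat) + z2 / (4 * n)) / n)
    return (centre - margin) / (1 + z2 / n)


def uncertainty(p_hat: float, n: int, z: float = 0.95) -> float:
    """Equation 7: twice the gap between p_hat and the Wilson lower bound."""
    return 2 * (p_hat - wilson_lower_bound(p_hat, n, z))


# The uncertainty shrinks as the number of observed hands grows.
for hands in (10, 100, 1000):
    print(hands, uncertainty(0.257, hands))
```

Plotting this value against ln(n) would give a curve of the kind labelled ‘Uncertainty’ in figure 4.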

In equations 3 to 6, a new beta distribution is introduced whose parameters depend not only on observations on the population as a whole, but also on player-specific observations. As information on specific players gradually accumulates, information on the population becomes less relevant for the estimation of player-specific statistics. The more we observe a player’s play, the more we look at that player’s playing history instead of that of the population. The less information we have on a player, the more we use the population to fill the gap. If we totally lack information on an opponent (if we have never observed that particular opponent), the population is the only data we look at. In order to deal with the problem related to uncertainty, as explained above and illustrated in figure 4, we let the original parameters (α,β) depend on ln(n). This way, a different pair of (α,β) can be obtained for every possible n. This pair of parameters (α,β), which now only depends on the number of hands played, can then be updated in accordance with equation 6 as player-specific observations come through.

In equation 8, µ is defined as the inverse logit function (or expit function)[34] of a linear function of ln(n). The parameters α and β can now be defined with regard to a mean and dispersion value (Pananos, 2020). By defining σ as the dispersion value, α and β are written in terms of µ and σ in equation 9. By considering the probability density function of the beta distribution, a, b and σ can be estimated using maximum likelihood estimation[35] (Robinson, 2017, p. 56-64). The effect is illustrated in figure 5 on page 15, which plots the beta distributions for four successive powers of ten of n. As expected, the distribution shifts towards a lower value for a player’s VPIP as the number of played hands increases. Finally, the expected value for a metric estimated using this method is given in equation 10.

$$\mu = \frac{1}{1 + e^{-(a + b \cdot \ln(n))}} \tag{8}$$

$$\Rightarrow \alpha = \mu \cdot \sigma, \quad \beta = (1 - \mu) \cdot \sigma \tag{9}$$

$$E[\mathrm{Beta}(\alpha,\beta)] = \frac{\alpha}{\alpha + \beta} \tag{10}$$

Figure 4: Relation between number of observations and player metrics. Both panels plot a metric as a probability against ln(n) for 0 < n ≤ 5000 (3,821 players) and show the uncertainty value, the mean metric for players with n hands, an unweighted linear trend and a trend weighted by number of players. (a) Average VPIP values among players seem to decrease as certainty increases. Uncertainty = 2(p̂ − Wilson(z = 0.95, n, p̂ = 0.257)); the value p̂ = 0.257 is chosen as it is the weighted average VPIP value over the whole sample. (b) Average PFR/VPIP values among players seem to increase as certainty increases. This is partly due to decreased VPIP values as shown in subfigure a. Uncertainty = 2(p̂ − Wilson(z = 0.95, n, p̂ = 0.720)); the value p̂ = 0.720 is chosen as it is the weighted average PFR/VPIP value over the whole sample.

[34] The logit function maps probabilities (thus, values ranging from 0 to 1) to values on the whole real line. The inverse function, which we use in this thesis, does the opposite: it maps any value to a probability.
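Equations 8 to 10 and the objective of the maximum likelihood estimation can be sketched as follows. This is a rough illustration under stated assumptions: it assumes a beta-binomial likelihood per player (suggested by the term ‘beta-binomial regression’, but not spelled out in the text), and the function names, sample data and parameter values are hypothetical. The optimiser itself is left out; the function `negative_log_likelihood` is only what such an optimiser would minimise over (a, b, σ).

```python
import math
from typing import Iterable, Tuple


def expit(x: float) -> float:
    """Inverse logit: maps any real value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def prior_parameters(n: int, a: float, b: float, sigma: float) -> Tuple[float, float]:
    """Return (alpha, beta) of the prior for a player with n observed hands."""
    mu = expit(a + b * math.log(n))        # equation 8
    return mu * sigma, (1.0 - mu) * sigma  # equation 9


def log_beta(x: float, y: float) -> float:
    """Logarithm of the Beta function B(x, y)."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def negative_log_likelihood(params: Tuple[float, float, float],
                            observations: Iterable[Tuple[int, int]]) -> float:
    """Assumed beta-binomial objective over (a, b, sigma).

    Each observation is (successes, hands) for one player. The binomial
    coefficient is omitted because it does not depend on the parameters.
    """
    a, b, sigma = params
    total = 0.0
    for successes, hands in observations:
        alpha, beta = prior_parameters(hands, a, b, sigma)
        total += (log_beta(alpha + successes, beta + hands - successes)
                  - log_beta(alpha, beta))
    return -total


# Hypothetical (successes, hands) pairs and two hypothetical parameter guesses.
players = [(3, 10), (25, 100), (240, 1000)]
print(negative_log_likelihood((-1.0, 0.0, 20.0), players))
print(negative_log_likelihood((3.0, 0.0, 20.0), players))

alpha, beta = prior_parameters(100, a=-0.5, b=-0.1, sigma=50.0)
print(alpha / (alpha + beta))  # expected value, equation 10
```

The first guess puts the prior mean near the observed frequencies and therefore yields the lower objective value. A negative b reproduces the downward shift of the expected VPIP for growing n.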

Figure 5: Non-player-specific Beta distributions resulting from performing beta-binomial linear regression on the variable of the number of hands played by a player. The plot shows p(VPIP ∣ α, β) against VPIP for 10^0, 10^1, 10^2 and 10^3 hands.

3.1.3 Situational space

The VPIP and PFR/VPIP values introduced in the previous section can be used directly to set up a strategy against opponents by assuming that a player’s hole card range consists of the top portion of hands sampled from the distribution of all starting hands, where the player’s metrics determine the size of this portion[36]. As it is not impossible for a player to hold a hand outside of this implied range (for example, players can have a liking for a certain type of hand), I opt to diverge from this idea. More importantly, players can play different cards in different ways: a player could, for example, decide on a more conservative play while holding a weaker hand (e.g. η = 9♢8♢), and bet a lower amount than the same player would have bet with a stronger hand (e.g. K♠K♣). This kind of behaviour among opponents can be profitably exploited by taking a more flexible approach, such as the method I propose in this thesis.

[35] In maximum likelihood estimation (or MLE), the most likely parameters for a probability distribution are estimated either by differentiating or by using iterative methods of trying different parameters.
[36] The Pokerbank, How to use VPIP in poker, https://www.thepokerbank.com/articles/software/vpip/, accessed on March 8th, 2020

The idea of a ‘situational space’ is introduced, in which a player’s expected values for the VPIP and PFR/VPIP metrics are merely used as ‘input values’ for constructing this geometric space. For every poker situation, a multi-dimensional space is generated in which vectors live whose components represent ‘characteristics’ of the situation. Each vector, pointing to a point in this space, refers to an identical historical situation drawn from the dataset (as explained in section 3.1.1, poker situations are considered identical if the same action sequence occurred). Every player taking part in the situation (in other words: every player who did not fold at the point of evaluation) produces a part of the attributes of the situation, captured by the components of the vectors living in the corresponding situational space. The following values are described for each active player:

1. the player’s expected VPIP-value, estimated by the method presented in section 3.1.2 and equation 10;
2. the player’s expected PFR/VPIP-value, estimated by the method presented in section 3.1.2 and equation 10;
3. the player’s stack size at the beginning of the hand;
4. whether the player was all-in (either 0 or 1);
5. the size of the raise made by the player (if applicable) as a percentage with respect to the pot size;
6. the size of a second raise (if applicable) as a percentage with respect to the pot size.

Each vector has an additional corresponding ‘label.’ This label captures the hole cards of the analysed player (if known); the action of the analysed player; and the size of the raise made by the analysed player (if applicable). An example could be the situation in which the player on the first position raised. When focussing on the player directly due to act after the raising player, the constructed situational space would span R^25, since six players participate of which one specifies a raise size (in other words, a 25-dimensional space would be generated). The similarity of these situations can be measured by considering the distance between the points defined by these vectors. Building on the previous example, if facing a raise from the first position and analysing the player in the second position, all identical historical situations in the dataset are used to generate vectors with labels that

convey information about the successive action taken by the player in the second position. If we already know that the player in the second position took a certain action (e.g. called the bet raised by the player in the first position), we can consider all vectors with a label corresponding to that action (in this example we would only consider vectors reflecting historical situations in which the player in the second position called after the player in the first position raised) and suggest a hole card range for the currently encountered opponent. Since the vectors geometrically encompass the characteristics of historical situations with the same action sequence, the closer two points are in space, the more similar the situations they represent. Weight is given to the actually held hole cards with respect to the distance between the point representing the historical situation in which these hole cards are known to be held, and the point representing the current situation in which the opponent’s hole cards are still unknown. This example is illustrated in figure 6 below.

Figure 6: Situational space for a faced raise from first position. The hole card range for the faced raiser is approximated by using the weighted Minkowski distance for comparing with past events. The 1st, 2nd and 5th components of the normalized vectors (VPIP of the raiser, PFR/VPIP of the raiser and raise size as a percentage of the pot, each being part of a vector in R^25) are shown. For illustrative purposes, only 4 manually selected vectors out of 536,951 (of which 79,578 with known hole cards) are shown; the historical vectors shown are labelled A♠K♡, 8♡7♣ and K♣6♢. The coloured vector represents the ‘live situation’ for which hole cards should be predicted.

If a player is due to act after the agent, the labels can be used to predict a future action by the opponent. Elaborating on the previous example, if the agent is seated in the first position and has yet to decide which action to take, these situational spaces can be constructed as if the agent had raised, in order to determine (1) the likelihood of the player in the second position calling the raise, and (2) the estimated hole card range with which this player would call. As will be explained in section 3.1.4, in order to make a decision based on predictions for every participating opponent’s actions, raise sizes and hole card ranges, the agent will generate situational spaces for all actions for every likely action sequence.

Since not all of a situation’s characteristics are of equal importance, the weights for each individual vector component (corresponding to a dimension in the space) should be estimated. For example, the player facing a raise from the first position is probably more interested in the VPIP-value of the raiser than in the VPIP-value of the person to act next. These weights cannot be obtained by solving linear systems of equations, as we lack knowledge of any distance (or similarity) between two given situations. In order to solve this problem, two ‘substitutes’ for distance values are introduced. These serve as ‘outcome values,’ with which the similarity (or distance) between two situations can be roughly approximated:

• Historical situations where the hole cards of the player for which the situational space is analysed are known are grouped together, according to Sklansky’s hand groups. These hand groups, which have been briefly mentioned in section 1, embody nine ranges of hands in decreasing order with respect to hand strength. These groups are used when estimating weights for the purpose of determining hole cards and raise sizes. These groups are divided as follows (Sklansky and Malmuth, 1999, p. 14-15):
  – Group 1: AA, AKs, KK, QQ, JJ
  – Group 2: AKo, AQs, AJs, KQs, TT
  – Group 3: AQo, ATs, KJs, QJs, JTs, 99
  – Group 4: AJo, KQo, KTs, QTs, J9s, T9s, 98s, 88
  – Group 5: A9s-A2s, KJo, QJo, JTo, Q9s, T8s, 97s, 87s, 77, 76s, 66
  – Group 6: ATo, KTo, QTo, J8s, 86s, 75s, 65s, 55, 54s

  – Group 7: K9s-K2s, J9o, T9o, 98o, 64s, 53s, 44, 43s, 33, 22
  – Group 8: A9o, K9o, Q9o, J8o, J7s, T8o, 96s, 87o, 85s, 76o, 74s, 65o, 54o, 42s, 32s
  – Group 9: All other hands
• Categories grouped by the actual action taken are used when determining weights for the purpose of predicting a future opponent’s action: (1) check/fold, (2) call, (3) raise x.

Each group now contains vectors representing historical situations (note that a vector can now occur both in one of the groups categorized by the Sklansky hand groups listed above, and in one of the groups divided by the action taken). Now, we generate a matrix of which every row represents a vector in the group. This matrix is then ‘divided’ by a vector holding the maximum encountered values for the considered component (where division is defined as the element-wise division of every row in the matrix by this denominator vector). In other words: the vector components are normalized by dividing their individual values by the outcome of a max()-function executed column-wise[37]. If for a component the maximum encountered value equals 0, the maximum value is set to 1 to avoid division by zero. For every group, the means are taken for each component over the normalized values. The variance of these mean values corresponding to the same component among groups is then taken as the weight for that component in that situation. This idea rests on the intuition that the more a value varies among different groups, the more that value helps to determine to which group it belongs. This is summarized in equations 11 and 12. In the last step, all weights are normalized so that $\sum_{i=1}^{i} w_i = 1$, where i in $c^{k}_{ji}$ corresponds to the number of dimensions in the situational space, k to the number of groups and j to the number of vectors within group k.

$$w_i = \frac{1}{k} \cdot \sum_{k=1}^{k} \Bigg| \underbrace{\Big( \frac{1}{j} \cdot \sum_{j=1}^{j} c^{k}_{ji} \Big)}_{\text{mean within group}} - \underbrace{\Big( \frac{1}{k} \cdot \sum_{k=1}^{k} \frac{1}{j} \cdot \sum_{j=1}^{j} c^{k}_{ji} \Big)}_{\text{mean of means within groups}} \Bigg|^{2} \tag{11}$$

$$\Rightarrow w_i = \frac{w_i - \min w}{\sum_{i=1}^{i} (w_i - \min w)} \tag{12}$$

The similarity of two vectors (representing situations) using these weights is then calculated as the reciprocal of one plus the weighted Minkowski distance[38], as shown in equation 13. The sum of all the similarities for a specific element (e.g. A♡K♡ (hand); call (action); or 133.33 (raise size)) is divided by the sum of all similarities for elements of the same type, to obtain the weight of that element being relevant to the analysed opponent (e.g. holding A♡K♡; calling; or when raising, raising with 133.33).

$$S(a,b) = \frac{1}{1 + \sum_{i=1}^{i} \left| w_i \cdot (a_i - b_i) \right|} \tag{13}$$

[37] NumPy v.1.17 manual, numpy.ndarray.max, Return the maximum along a given axis, https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.max.html, accessed on March 9th, 2020
[38] The Minkowski distance is a generalization of the Euclidean distance (a straight line between two points) and the ‘taxicab distance’ (or ‘Manhattan distance’: a stepwise distance calculated by summing the absolute differences of the coordinates) used in normed vector spaces.

The proposed method in this paper is a combination of two main ideas: empirical Bayesian statistics and vector spaces. As explained in section 2, this approach is chosen to diminish the negative effects of having too little data available. This effect is further reduced by not restricting the comparison to situations at the same point of evaluation. In other words: based on the example used on the previous page, if we are constructing the vector space for the raiser, we would also include situations in which the player in the second position folded or re-raised, instead of only considering situations in which that player called the faced raise. If a player participated in many hands, the player will often have contributed to many different points in a vector space generated for an often-occurring situation. But as players are characterised by their VPIP and PFR/VPIP-values, which serve as components for the

vectors used to compare situations, points reflecting situations in which the same player played on the same position will automatically be positioned close to each other, as their VPIP and PFR/VPIP-values are identical. This is compatible with ideas such as that a player’s range often widens when being positioned later[39], and that holdings are usually stronger when a player raises from an early position compared to when that player would be seated in a later position[40]. The more observations obtained on a specific player, the more significant the historical actions of that particular player, and the heavier these are weighted in predicting future actions for that player. This can be seen as a geometric adaptation of prior information in a Bayesian model. Furthermore, this method allows for patterns in an opponent’s playing style to be recognized and exploited. Also, since all players are taken into account in the situational vectors, this model supports the idea that opponents adapt to each other at the table, instead of only the agent adapting to the opponents. Another advantage of this technique is the possibility of caching the matrices in which the vectors are contained, together with their corresponding weights, in order to speed up future computation. In addition, the values of the matrix are explainable, which makes it easier to understand why the algorithm would decide on a particular move.

3.1.4 Decision trees

In order to explain the final concept which connects all previously mentioned techniques, a distinction is made between two types of player action sequences:

• Action that happened before the agent is due to act. For the opponents involved in this type of action, only the hole card ranges will be estimated using the vector spaces discussed in section 3.1.3.
• Action that happens after the agent is due to act. For these opponents, the action and the hole card range corresponding to that action are predicted using vector spaces. If the action is a raise, the raising size is also predicted. If a raise is made by a player not seated in the first position, players who did not fold before that raise are due to act more than once. This means that the implemented algorithm should be recursive, and it causes the previously listed type of action sequence to be nested within this type as well.

When presented with a situation in which the agent has to make a decision, a tree of all possible action sequences and corresponding vector spaces will be generated. As the raise-option could theoretically cause the tree to be infinite, the number of allowed raises is ideally limited. Having observed too few instances of a situation is an unavoidable obstacle which results in inadequate estimations, although to a lesser degree than when exact situations (including raise sizes and such) would have been compared with each other. This problem can be worked around by leaving these situations out of the equation. As this only applies to situations with a low number of observations, these situations are inherently statistically improbable. Table 3 on page 23 shows the degree of relevance of this phenomenon to the dataset used in this thesis. As cutting out situations could cause the probabilities of actions occurring to not add up to one, the probabilities should be normalized after a few more steps are introduced.

An example is considered wherein the agent is seated in the fourth position, ‘BU’ or the ‘button,’ and A = ⟨raise .20, fold, call⟩. If the maximum number of raises is set to 1, all possible action sequences are generated as shown in the list below. Bold text denotes the agent’s action (here the fourth element of each sequence); this style is used throughout the paper.

1. A = ⟨raise .20, fold, call, call, call, call⟩
2. A = ⟨raise .20, fold, call, call, call, fold⟩
3. A = ⟨raise .20, fold, call, call, fold, call⟩
4. A = ⟨raise .20, fold, call, call, fold, fold⟩
5. A = ⟨raise .20, fold, call, fold, call, call⟩
6. A = ⟨raise .20, fold, call, fold, call, fold⟩
7. A = ⟨raise .20, fold, call, fold, fold, call⟩
8. A = ⟨raise .20, fold, call, fold, fold, fold⟩
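The eight sequences above can be enumerated mechanically. The sketch below is a simplification under stated assumptions: the raise limit of 1 has already been used, so every remaining player can only call or fold, and each remaining player acts exactly once. The function name `continuations` is hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple


def continuations(observed: Sequence[str], players_to_act: int,
                  options: Sequence[str] = ("call", "fold")) -> List[Tuple[str, ...]]:
    """Extend the observed actions with every combination of remaining actions."""
    return [tuple(observed) + tail
            for tail in product(options, repeat=players_to_act)]


# Actions of the first three positions, as in the example above.
observed = ("raise .20", "fold", "call")

# Three players are still to act: the agent on the button and both blinds.
sequences = continuations(observed, players_to_act=3)

for number, sequence in enumerate(sequences, start=1):
    print(number, sequence)
```

A full implementation would have to branch recursively whenever a raise is still allowed, since a raise reopens the action for players who have not folded.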

[39] CardsChat, Your Guide To Pre-Flop Calling Ranges, https://www.cardschat.com/preflop-calling-hand-ranges.php, accessed on March 9th, 2020
[40] PartyPoker, How To Play Poker In ‘Early’ Position, https://www.partypoker.com/en/how-to-play/school/advanced/early-position, accessed on March 10th, 2020

The likelihood for every listed action sequence is calculated by consulting the similarity measures for the actions derived from the situational spaces. If an opponent raises after the agent called or raised, the agent is due to act multiple times. This needs to be handled in a

recursive manner. Let us consider the action sequence A = ⟨call, fold, fold, call_1, raise .20, fold, fold, call_2⟩. At the second point at which the agent is due to act (when the agent calls for the second time), the agent will assume the action sequence before call_2, as if it already happened, and evaluate it as the first type of action sequence. At the first point, when the agent decides to call for the first time, the agent already needs to evaluate all future actions that are likely to happen, including the agent’s own call after an opponent in a later position raised. As mentioned in the previous paragraph, this means that the first element in the list of types of action sequences is nested within the second type of action sequence.

Besides needing to estimate the probabilities for every likely action sequence, we also need to estimate the equity of the hand held by the agent versus the estimated hole card ranges for the opponents still in the pot. Since an opponent’s range is weighted, the equity of the agent’s hand for all combinations of all possible simultaneous hand holdings for every involved player is approximated using a Monte Carlo Simulation, as explained in the introduction (Metropolis and Ulam, 1949, p. 335-341). All different card combinations for a hand need to be taken into account: for example, 12 combinations exist of AKo, while only 4 combinations of AKs are possible. Every pocket pair, such as AA, has 6 possible combinations of suits. Since the agent holds cards itself, these cards act as blockers and allow for the removal of some of these possible combinations in the opponents’ ranges. This can skew an opponent’s range significantly when the agent, for example, holds an ace (see section 3.3.2). A hand’s equity is equal to the probability of that hand winning the pot. When only two players are actively involved in a situation, ties have to be divided equally among all players. When multiple players are involved, more complicated situations occur. For example: when the player with the lowest stack wins the pot, all ‘losing players’ can tie between themselves and divide up the remaining pot. In order to overcome this complexity, in all simulations, all players with the exact same ‘made hand combination’ (as shown in appendix A) should be regarded as having ‘won’ the pot. The significance of the simulation should then be discounted by a factor depending on the number of players who did win.

The last ingredient is the expected final size of the pre-flop pot for a generated action sequence. The implementation should ensure that raise and call sizes are compatible with the acting player’s stack size: players cannot bet ‘more than they have.’ As explained in section 3.1.3, raise sizes for opponents are estimated using vector spaces. However, due to the findings presented in section 3.3.1, the size with which the agent raises is determined by linear regression. The first step is to create ‘buckets’ of a fixed number of historical situations. These buckets correspond to sizes of raises by players in historical situations on a specific seat, which are defined as a percentage with respect to the pot. For every raise size encountered in the dataset, the percentage of folds by players on another specific seat is calculated. These buckets serve as datapoints with which a linear function (of the form y = ax + b) describing the relation between a raising size and the number of expected folds can be obtained using ordinary least squares linear regression[41]. An example is illustrated in figure 7 below. The likelihood of an opponent calling or raising can be predicted in a similar manner. If a specific raising size occurs more often than this fixed limit, the bucket corresponding to this raising size will contain more situations than the set limit.

Figure 7: Folds by a player in the fourth position, reflected against the size of a raise by a player in the first position. The plot shows the percentage of folds against the size of the raise as a percentage of the total pot size, with each point being a bucket of ≥ 1000 situations, together with the fitted regressor (a ≈ 0.026, b ≈ 73.096).

[41] By minimizing the squared distance of the sampled points (in this case the percentage of folds in a bucket) to a line defined by two parameters a and b (as in y = ax + b), the most ‘optimal’ parameters (a,b) can be obtained. That is to say, the values for a and b that best fit the presented data.
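The combination counts and the effect of blockers mentioned above can be checked with a short Python sketch. The card notation (rank followed by a suit letter, such as 'As' for the ace of spades) and the function name `combos` are assumptions made for this illustration.

```python
from itertools import combinations, product
from typing import Iterable, List, Tuple

SUITS = "shdc"  # spades, hearts, diamonds, clubs


def combos(hand: str, blockers: Iterable[str] = ()) -> List[Tuple[str, str]]:
    """List the suit combinations of a shorthand hand such as 'AKo', 'AKs' or 'AA'.

    Combinations that contain a blocker (a card already held by the agent)
    are removed.
    """
    blocked = set(blockers)
    high, low = hand[0], hand[1]
    if high == low:  # pocket pair: two different suits, order irrelevant
        candidates = [(high + s1, low + s2) for s1, s2 in combinations(SUITS, 2)]
    elif hand.endswith("s"):  # suited: both cards share one suit
        candidates = [(high + s, low + s) for s in SUITS]
    elif hand.endswith("o"):  # offsuit: the two suits differ
        candidates = [(high + s1, low + s2)
                      for s1, s2 in product(SUITS, repeat=2) if s1 != s2]
    else:
        raise ValueError(f"expected a pair or a hand ending in 's' or 'o': {hand!r}")
    return [c for c in candidates if blocked.isdisjoint(c)]


print(len(combos("AKo")), len(combos("AKs")), len(combos("AA")))  # 12 4 6
print(len(combos("AKo", blockers=["As"])))  # 9
```

Holding a single ace removes a quarter of an opponent's AKo combinations and half of the AA combinations, which is the kind of skew referred to in the text.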

The expected value for all possible actions for the agent is obtained by distinguishing between all possible actions for the agent when the agent is due to act for the first time, and by calculating the final pre-flop pot size for every generated likely action sequence, multiplied by the likelihood of that action sequence and by the equity of the agent’s hole cards versus the estimated weighted ranges of all active opponents. The action yielding the highest estimated expected value is the action the agent chooses in the presented game situation. The action sequences in which the agent folds are neglected and do not need to be analysed, since their expected value is always zero. Since the hole card ranges for opponents remain practically fixed, as shown in section 3.3.2, the size with which the agent raises can be optimised by maximizing the expected value of a particular action sequence involving a raise, using the agent’s equity in combination with the estimated fold, raise and call parameters for all active opponents. The likelihoods of future actions by opponents are not adjusted by the likelihoods returned by the linear function.

Elaborating on the example suggested at the beginning of this section, which is visualized in figure 8, the steps below can be read as a summary of all previously explained methods in this thesis. These steps are a rough sketch for an algorithm based on all explained material:

• The (cached) vector space is consulted for all situations wherein EP raised (including situations wherein MP did not fold and CO did not call). Using the (cached) weights (estimated by grouping all historical situations wherein the hole cards for EP are known, according to Sklansky’s hand groups), a range for EP’s possible holdings is estimated by consulting the Minkowski distance.
• Likewise, the (cached) vector space is loaded for the determination of the hand range of CO.
• All possible action sequences for the game are generated; these are listed with numbers 1-8 on page 18.
• For each action in each action sequence (except for the agent’s action), a (cached) vector space and its weights (different from those used to determine hand ranges) are examined to estimate the likelihood of the opponents performing the supposed action and the hole card ranges with which the opponents would perform that action. If the analysed action concerns a raise, the raise size is estimated in the same manner. Since, in this example, a limit of one raise is set and the sequences in which the agent folds are ignored, three additional vector spaces are generated: a vector space describing the event in which the agent calls on BU, with the space’s labels focussing on SB; and two spaces focussing on BB for a call or fold on SB given a call on BU.
• For all sequences, Monte Carlo simulations are used to approximate the equity of J♢8♢ against all active opponents’ hand ranges with the J♢ and the 8♢ removed.
• For action sequences in which the agent acts by raising (which do not exist in this example due to the limit), a function defining the number of folds, calls and raises given a raising size is constructed for all active opponents. These functions are used, in combination with the agent’s hand equity, to find the optimal raising size for the agent.
• All action sequences are grouped by the agent’s first action and evaluated by calculating the expected value for every possible action of the agent, by multiplying the agent’s hand equity by the size of the pot and by the likelihood of the action sequence occurring.

Figure 8: Situation in which the agent is seated on the button holding J8s and faces a 200% raise and a call from the first and third positions respectively. A white node indicates that a player folded. In the diagram, EP and CO have each put in $0.30, SB $0.05 and BB $0.10, for a total pot of $0.75; the agent on BU holds J♢8♢.
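The last step of the list above can be sketched as follows. The sketch mirrors the description literally (likelihood times pot size times equity, summed per first action of the agent, with folding worth zero). The type `Outcome`, the function names and every number are hypothetical and only serve as an illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Outcome(NamedTuple):
    """One generated action sequence, reduced to what the last step needs."""

    agent_action: str  # the agent's first action in the sequence
    likelihood: float  # probability of the whole sequence occurring
    pot: float         # expected final pre-flop pot size in dollars
    equity: float      # agent's equity against the remaining ranges


def expected_values(outcomes: Iterable[Outcome]) -> Dict[str, float]:
    """Sum likelihood * pot * equity per first action; folding is worth zero."""
    totals: Dict[str, float] = defaultdict(float)
    totals["fold"] = 0.0
    for outcome in outcomes:
        if outcome.agent_action != "fold":
            totals[outcome.agent_action] += (
                outcome.likelihood * outcome.pot * outcome.equity
            )
    return dict(totals)


# Hypothetical figures for the four sequences in which the agent calls.
outcomes = [
    Outcome("call", likelihood=0.05, pot=1.50, equity=0.22),  # SB and BB call
    Outcome("call", likelihood=0.10, pot=1.30, equity=0.27),  # SB calls, BB folds
    Outcome("call", likelihood=0.15, pot=1.25, equity=0.27),  # SB folds, BB calls
    Outcome("call", likelihood=0.70, pot=1.05, equity=0.33),  # SB and BB fold
]

values = expected_values(outcomes)
best_action = max(values, key=values.get)
print(values, best_action)
```

Sequences in which the agent raises would add further entries whose pot size depends on the chosen raise size, so the same function could be evaluated for several candidate sizes.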

3.2 Suggested implementation

This section includes several personal decisions (e.g. for parameters in section 3.2.3) and serves to demonstrate a possible implementation of the method presented in section 3.1. Although all of the personal choices made in this section are motivated, the methods presented in this thesis can be implemented in various ways.

3.2.1 Dataset

The need for modelling the influence of time is outlined in section 3.1.1: a population changes its playing style over time, since opponents are influenced by each other and by the increasing availability of outlines on strategy and the mathematics of the game [16, 17]. The population can be further divided into multiple subpopulations, as different groups of players play at different ‘stakes’ (i.e. different values for ν, or the big blind). As shown in figure 3, even additional subdivisions make sense. As this thesis only demonstrates a general approach towards designing a poker-playing agent, with the aim that it is expanded and developed further, the influence of these variables should ideally be diminished or even eliminated. In section 1, from a somewhat philosophical standpoint, I expressed that an agent should not necessarily be based on a dataset of unrealistic measures compared to a real player’s playing history.

For all reasons stated above, a dataset consisting of 6,552,060 6-max Zoom hands played within a relatively short timeframe (21st of July 2017 - 4th of September 2017) at a fixed stakes size ($0.05/$0.10) is chosen. ‘Zoom’ is a subtype of poker wherein opponents are changed after every hand played, offering players a faster game pace [42]. As opponents change after every hand, players can get less exploitative (as it becomes harder for players to observe each other individually) and are thus expected to influence each other’s play to a lesser degree. By selecting a dataset of hands played at fixed stakes during a short timeframe, the significance of the variables of time and stakes decreases, allowing for a more convenient analysis of the method presented. Furthermore, in the chosen game variant, the number of players seated at the table is always six. This way, the model does not need to account for a variable number of players. The dataset is purchased from a commercial company that offers data mining services [43].

[42] PokerStars, ZOOM! PokerStars launches new fast-paced poker game in beta, https://www.pokerstars.com/en/news/zoom-pokerstars-launches-new-fast-paced-092053/15308/, accessed on March 7th, 2020
[43] The hands are purchased from https://hhdealer.com

3.2.2 Population

These hand histories can be imported using available commercial software [44], which maintains these hands in a database [45]. An exported player report [46], including the number of hands for a player and the frequentist calculations for a player’s VPIP and PFR/VPIP values, is used for all methods described in section 3.1.2 and the corresponding figures 2, 3 and 4. The file is imported in Python 3.8.2 using the csv library [47]. The calculations are done using NumPy [48] and SciPy [49]. For all further described methods involving data from past observations, the database is queried directly using SQL [50] with the psycopg2 package in Python [51]. The database has the following features relevant for the implementation of the proposed algorithm:

• A table in which hand summaries are stored, including an ordered actors sequence (players who did call or raise) and an ordered aggressors sequence (the big blind and all players who raised). Players are denoted by the numbers 3-2-1-0-9-8, where 3 is the first player to act and 8 is the last player to act (the player seated on the big blind). The actors sequence for the situation presented in figure 8, evaluated before the agent acts, is 31 while the aggressors sequence equals 83. Historical situations are fetched from the database by performing queries that restrict on these columns.
• A table in which player statistics are stored. This table contains the sizes with which the player raised (if applicable), the player’s stack size at the beginning of the hand and whether the player was all-in pre-flop. Additionally, the table records the hole cards the player held (if known), encoded as a number between 1 and 169 for all distinguishable hands (e.g. AKo, AKs, AA).
• A table which contains the players’ names. These names are then matched against the exported report (described above), which contains all frequentist calculations for all players’ VPIP and PFR/VPIP values.
• All previously listed tables are related to each other by specified hand IDs and player IDs.

The maximum likelihood estimation, as explained in section 3.1.2, is executed using the limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm with bound constraints (Byrd et al., 1995, p. 1190-1208). This produces the results shown in table 1.

[44] The program used for this thesis is PokerTracker 4: https://www.pokertracker.com
[45] In this thesis, a PostgreSQL database is used which strongly resembles the database structure illustrated on https://www.pokertracker.com/guides/PT3/databases/ -3-database-schema-documentation, accessed on March 9th, 2020
[46] PokerTracker, Configuring reports, https://www.pokertracker.com/guides/PT4/tutorials/configuring-reports, accessed on March 9th, 2020
[47] Python 3.8.2 documentation, csv – CSV File Reading and Writing, https://docs.python.org/3/library/csv.html, accessed on March 9th, 2020
[48] NumPy, https://numpy.org, accessed on March 9th, 2020
[49] SciPy, https://scipy.org, accessed on March 9th, 2020
[50] SQL is a simplified language with which search queries can be efficiently formulated.
[51] PyPI, psycopg2 - Python-PostgreSQL Database Adapter, https://pypi.org/project/psycopg2/, accessed on March 9th, 2020

3.2.3 Decisions based on situations

The records retrieved from the database and the export file, augmented with optimised player data based on beta-binomial regression, are converted into rows of a NumPy matrix. These matrices are cached in comma-separated (CSV) files together with their labels. The weights are stored, together with the maximum value for every column in the matrix (the ‘factors’ which are used to normalize the columns before calculating the weights), in separate CSV files. The distances are calculated using the weighted Minkowski algorithm with an order of one for the norm of the difference, with the help of SciPy’s spatial distance submodule [52]. The degree of coverage by the used dataset of all possible poker situations is reported in table 3.
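The distance calculation described above can be sketched as follows. This is a minimal illustration rather than the code used for the thesis: the matrix values, the three-component vectors and the weights are made up, and the helper name `situational_distances` is hypothetical. It only assumes that columns are divided by their maximum (the ‘factors’) before SciPy’s `minkowski` is called with order one and a weight vector.

```python
import numpy as np
from scipy.spatial.distance import minkowski

# Hypothetical cached situations: one row per historical situation,
# one column per situational characteristic (all values are made up).
history = np.array([
    [43.0, 10.0, 100.0],
    [17.0, 74.0, 250.0],
    [40.0, 12.0, 80.0],
])
current = np.array([42.0, 11.0, 120.0])

# 'Factors': the maximum of every column, used to scale columns to [0, 1].
factors = history.max(axis=0)
weights = np.array([0.6, 0.3, 0.1])  # assumed weights, summing to one


def situational_distances(current, history, factors, weights):
    """Weighted Minkowski distance of order one to every historical row."""
    scaled_current = current / factors
    return np.array([
        minkowski(scaled_current, row / factors, p=1, w=weights)
        for row in history
    ])


distances = situational_distances(current, history, factors, weights)
nearest = int(np.argmin(distances))  # index of the most similar situation
```

With an order of one, the distance reduces to the weighted sum of absolute differences per component, so a component with a very small weight barely influences which historical situations count as similar.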

Table 1: Different methods of estimation for a player’s VPIP and PFR/VPIP value given a variable number of hands played. From left to right for each metric: frequentist interpretation, interpretation using empirical Bayes based on equations 6 and 10, interpretation using empirical Bayes and beta-binomial regression based on equations 6, 9 and 10.

                VPIP                                              PFR/VPIP
hands played    frequentist   empirical Bayes   +beta-binomial    frequentist   empirical Bayes   +beta-binomial
2               100.00        35.86             71.33             50.00         34.35             35.61
3               66.67         33.76             63.90             100           44.57             47.08
14              50.00         38.34             52.48             0.00          4.96              5.5
17              81.25         57.17             73.55             23.08         23.81             24.85
26              34.62         31.85             39.35             33.33         33.52             34.55
57              33.93         32.03             35.56             36.84         36.18             36.84
89              21.84         22.08             23.71             94.74         91.82             92.6
131             21.43         21.89             22.93             48.15         47.51             47.91
208             34.98         34.57             35.49             63.38         62.84             63.16
314             42.33         41.69             42.38             64.57         64.22             64.45
448             85.39         83.70             84.69             8.02          8.17              8.27
845             23.42         23.49             23.59             70.47         70.23             70.33
2,108           41.26         41.18             41.25             56.22         56.16             56.21
8,596           23.15         23.16             23.16             79.80         79.78             79.80
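The shrinkage visible in the ‘empirical Bayes’ columns can be illustrated with a generic sketch. It is an assumption-laden example, not the thesis code: the formulas are the textbook beta-binomial likelihood and posterior mean (equations 6, 9 and 10 are not reproduced here and may differ in detail), the population is synthetic, the helper names are hypothetical, and the beta-binomial regression step is left out. It does use the same optimiser family, L-BFGS-B with bound constraints, via `scipy.optimize.minimize`.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import betaln


def fit_beta_prior(successes, trials):
    """Maximum-likelihood Beta(alpha, beta) prior under a beta-binomial model."""
    k = np.asarray(successes, dtype=float)
    n = np.asarray(trials, dtype=float)

    def negative_log_likelihood(params):
        alpha, beta = params
        # The binomial coefficient does not depend on alpha or beta: omitted.
        return -np.sum(betaln(k + alpha, n - k + beta) - betaln(alpha, beta))

    result = minimize(
        negative_log_likelihood,
        x0=np.array([1.0, 1.0]),
        method="L-BFGS-B",
        bounds=[(1e-3, None), (1e-3, None)],  # keep both parameters positive
    )
    return result.x


def shrunken_rate(k, n, alpha, beta):
    """Posterior mean of a rate after observing k successes in n hands."""
    return (alpha + k) / (alpha + beta + n)


# Synthetic population (made-up data, not the dataset of the thesis).
rng = np.random.default_rng(0)
true_rates = rng.beta(4.0, 12.0, size=2000)
hands = rng.integers(2, 500, size=2000)
vpip_counts = rng.binomial(hands, true_rates)

alpha, beta = fit_beta_prior(vpip_counts, hands)
prior_mean = alpha / (alpha + beta)
estimate_small_sample = shrunken_rate(2, 2, alpha, beta)        # 2 of 2 hands
estimate_large_sample = shrunken_rate(2000, 8000, alpha, beta)  # 25% of 8,000
```

A player observed for two hands is pulled strongly towards the population mean, while a player with thousands of hands keeps an estimate close to the frequentist value, which is the pattern shown in table 1.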

Table 3: Analysis of the used dataset with regard to the number of observations on all possible action sequences. The ‘< 100’ row includes the situations without observations.

                 Possible situations given number of raises
Observations     1 raise (696)    2 raises (6,100)    3 raises (44,264)
none             95 (13.6%)       3,936 (64.5%)       40,200 (90.8%)
< 100            435 (62.5%)      5,556 (91.1%)       43,617 (98.5%)
≥ 100; < 1000    121 (17.4%)      328 (5.4%)          411 (0.9%)
≥ 1000           140 (20.1%)      216 (3.5%)          236 (0.5%)

As explained in section 3.1.4, a hand (such as AKs) comes in different combinations: six if considering a pocket pair (22-AA), twelve if considering offsuited cards such as AKo, and four if considering a suited hand such as ATs. If the agent is opposed by only one opponent and does not hold a card which is in a combination of a hand in the range of that opponent (for example, J♢ is in one of the combinations of JJ), only one combination of cards which does not include any of the suits the agent holds is evaluated, with a higher probability of occurring. For example: if the agent holds K♣10♢, all possible remaining combinations for KK (that is, K♢K♡, K♠K♡ and K♠K♢) will be evaluated separately, each with a probability of 1/6 multiplied by the original probability, since the agent removes 3/6 of the combinations for the opponent. All possibilities for AJs will be evaluated only once with different suits than the agent is holding (that is, A♡J♡ or A♠J♠), while keeping the original probability of that opponent holding AJs intact.

A matchup is the event in which Monte Carlo simulations are performed for a combination of cards consisting of the agent’s cards and one combination from a hand in an opponent’s range for every opponent. If the agent holds Q♢Q♡ and competes with two opponents respectively holding A♣K♣ and A♠K♠, the matchup contains three hands in which the agent has 57.36% equity. This example matchup highlights the necessity of dividing up ties correctly among all players, since the players holding AKs only have a 7.02% chance of winning while tie situations are calculated to occur 28.72% of the time. It would be inaccurate to share this ‘tie equity’ equally among all players, as the biggest chunk of this percentage of ties ‘belongs’ to the players holding AKs.

For the approximation of hand equities, a library called Holdem Calculator [53] is used. I have slightly modified the code in this library to correctly allocate ‘tie situations,’ as explained in section 3.1.4 on page 19. To reduce computation, all hands in the weighted ranges whose probability conforms to equation 14 are removed, i.e. hands that are too unlikely to matter. In this equation, p is the probability given to the hand in the opponent’s range, n is defined as the number of opponents faced and d is a factor set to 0.001 in view of the results shown in table 2. The agent’s hand loses equity as the opponents’ ranges are narrowed down, which indicates that the opponents’ ranges are skewed towards having mostly stronger hands in these spots.

\[ p \le d \cdot 2^{n} \tag{14} \]

Table 2: Analysis of d in equation 14 for two different situations where the agent is seated on SB and holds J8s of diamonds. Players not listed fold.

d        Matchups     Resulting equity (5 trials)
EP raises, SB calls, BB calls
none     1,351,344    .2338, .2323, .2332, .2328, .2328
.0005    325,296      .2266, .2272, .2275, .2289, .2288
.001     162,336      .2234, .2226, .2233, .2236, .2219
.005     6,546        .1911, .1870, .1903, .1956, .1839
EP raises, SB calls
none     400          .3401, .3403, .3425, .3411, .3408
.0005    202          .3330, .3336, .3330, .3331, .3329
.001     146          .3291, .3287, .3285, .3282, .3291
.005     67           .2997, .2999, .2989, .2980, .2988
.01      36           .2821, .2784, .2801, .2809, .2787

[52] SciPy, scipy.spatial.distance.minkowski, https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.minkowski.html#scipy.spatial.distance.minkowski, accessed on March 11th, 2020
[53] ktseng, holdem_calc, https://github.com/ktseng/holdem_calc, accessed on March 11th, 2020
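The counting of combinations and the pruning threshold can be made concrete with a small sketch. The card notation (`"Kc"` for the king of clubs, `T` for a ten) and all function names are hypothetical. The pruning rule assumes that a hand is kept only when its probability exceeds d ⋅ 2^n, which is the reading suggested by table 2, where larger values of d leave fewer matchups.

```python
from itertools import combinations, product

SUITS = "cdhs"  # clubs, diamonds, hearts, spades


def hand_combinations(hand):
    """Enumerate card combinations for a hand such as 'KK', 'AKs' or 'AKo'."""
    first, second = hand[0], hand[1]
    if first == second:  # pocket pair: 6 combinations
        return [(first + a, second + b) for a, b in combinations(SUITS, 2)]
    if hand.endswith("s"):  # suited: 4 combinations
        return [(first + s, second + s) for s in SUITS]
    # offsuit: 12 combinations
    return [(first + a, second + b)
            for a, b in product(SUITS, repeat=2) if a != b]


def unblocked_combinations(hand, agent_cards):
    """Drop combinations that contain a card the agent already holds."""
    held = set(agent_cards)
    return [c for c in hand_combinations(hand) if held.isdisjoint(c)]


def keep_hand(p, n_opponents, d=0.001):
    """Assumed pruning rule: keep a hand only if p exceeds d * 2**n."""
    return p > d * 2 ** n_opponents


# The agent holds the king of clubs and the ten of diamonds.
agent = ("Kc", "Td")
kings = unblocked_combinations("KK", agent)
```

In this example three of the six KK combinations survive, matching the 3/6 removal described in the text; each surviving combination would then carry 1/6 of the probability originally assigned to KK.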

When the agent is up against multiple opponents, most of these opponents likely share some of the same hand combinations in their ranges. In these sequences, all card combinations are separately evaluated for every opponent. Only valid matchups are considered (i.e. opponents cannot hold the same cards simultaneously). Because this removes additional combinations, all ‘hand versus ranges probabilities’ need to be normalized after performing the simulations. There is, for example, no possibility for three opponents to hold K♠K♣ simultaneously, while it is very likely that three opponents all have this hand in their range with a probability > 0.001 ⋅ 2^3.

For each matchup, 10,000 simulations of board run-outs are executed if only one opponent is considered, while 10 simulations are executed if two opponents are considered, and only 1 simulation when more than two opponents are faced. If the product of the number of matchups and the number of simulations per matchup is below 1,000,000, the number of simulations per matchup is adjusted to obey this constraint. Since invalid combinations of hands are rejected, the total number of simulations is in most cases lower than 1,000,000 (see the test scenarios below). Although the number of simulations can get as low as 1, the results are similar, as overall more matchups (and thus more simulations) are considered. A hand versus a range of a single opponent results in very few matchups to be simulated, partly due to the mechanism of grouping separate combinations into one, as explained above. For example, when considering the action sequence A = ⟨raise .20, fold, fold, fold, call, call⟩, 162,336 matchups have to be evaluated, which results in 1,623,360 simulations of the agent’s hand drawing against two opponents’ ranges; while if A = ⟨raise .20, fold, fold, fold, call, fold⟩ is considered, 146 matchups have to be evaluated, which results in 1,460,000 simulations of the agent’s hand against a single opponent’s range. Considering that there are three opponents in the action sequence A = ⟨raise .20, call, fold, fold, call, call⟩, 14,531,201 matchups have to be evaluated, which results in the same number of simulations. To verify this claim, I have considered three test scenarios executed using the described implementation. These are listed below.

• The agent holds Q♡Q♢. The agent faces two opponents:
  – One opponent with an estimated range of: ⌈5/27 ⋅ AKs; 12/27 ⋅ AKo; 3/27 ⋅ AQs; 4/27 ⋅ KK; 3/27 ⋅ AA⌋
  – Another opponent with an estimated range of: ⌈16/61 ⋅ AA; 13/61 ⋅ AKs; 9/61 ⋅ KK; 7/61 ⋅ QQ⌋
  The calculated equity ≈ 0.2309 (292 matchups) while the approximated equity for 5 trials (572,612 simulations per trial) is: 0.2309, 0.2314, 0.2311, 0.2301, 0.2316.
• The agent holds J♢8♢. The agent faces two opponents:
  – One opponent with an estimated range of: ⌈2/18 ⋅ A3o; 7/18 ⋅ KJs; 5/18 ⋅ J7s; 4/18 ⋅ TT⌋
  – Another opponent with an estimated range of: ⌈11/26 ⋅ JJ; 6/26 ⋅ K2s; 9/26 ⋅ 97o⌋
  The calculated equity ≈ 0.2528 (432 matchups) while the approximated equity for 5 trials (947,376 simulations per trial) is: 0.2585, 0.2537, 0.2525, 0.2520, 0.2521.
• The agent holds K♠J♠. The agent faces two opponents:
  – One opponent with an estimated range of: ⌈28/40 ⋅ 77; 7/40 ⋅ 88; 5/40 ⋅ JTs⌋
  – Another opponent with an estimated range of: ⌈11/30 ⋅ QJo; 6/30 ⋅ QJs; 9/30 ⋅ KTs; 4/30 ⋅ KJo⌋
  The calculated equity ≈ 0.3532 (294 matchups) while the approximated equity for 5 trials (933,450 simulations per trial) is: 0.3533, 0.3531, 0.3541, 0.3527, 0.3539.
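The rejection of invalid matchups and the subsequent normalization can be sketched as below. This is an illustrative simplification with made-up ranges: it works on explicit two-card combinations rather than on grouped hands, the function name `valid_matchups` is hypothetical, and it performs no equity simulation.

```python
from itertools import product


def valid_matchups(agent_cards, opponent_ranges):
    """Enumerate card-disjoint matchups with normalized likelihoods.

    opponent_ranges holds one dict per opponent, mapping a two-card
    combination (a tuple of card strings) to its probability.
    """
    held = set(agent_cards)
    matchups = []
    for choice in product(*(r.items() for r in opponent_ranges)):
        cards = [card for combo, _ in choice for card in combo]
        # Reject matchups in which a card would be held twice.
        if len(set(cards)) < len(cards) or held.intersection(cards):
            continue
        likelihood = 1.0
        for _, p in choice:
            likelihood *= p
        matchups.append((tuple(combo for combo, _ in choice), likelihood))
    total = sum(likelihood for _, likelihood in matchups)
    # Renormalize, since rejected matchups removed probability mass.
    return [(combos, likelihood / total) for combos, likelihood in matchups]


# Toy ranges (made-up probabilities) against an agent holding Qd Qh.
agent = ("Qd", "Qh")
first = {("Ks", "Kc"): 0.5, ("As", "Ks"): 0.5}
second = {("Ks", "Kc"): 0.4, ("Ac", "Kh"): 0.6}
result = valid_matchups(agent, [first, second])
```

Two of the four candidate matchups are rejected here because both opponents would hold the king of spades, so the remaining two each end up with a likelihood of one half after normalization.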

A tree of all possible action sequences, given all action faced by the agent, is generated using a queue data structure [54]. Each action sequence is expanded with all possible future actions. The actions in the tree are augmented ‘on the go:’ each returned follow-up action is merged with the former (also augmented) action sequence to create a new action sequence, which includes the probabilities for each future action, corresponding hand ranges (if the future action is not a fold) and raise sizes (if the future action is a raise). If no valid future actions exist, the action sequence is marked as ‘done’ and is not re-added to the queue. In the suggested implementation, a limit of two raises is set to reduce computation. As shown in table 3, not all possible situations are observed in the dataset. Situations with fewer than 50 observations are ignored. To ‘fix the gap,’ the probabilities for all branches (corresponding to frequency predictions for the opponents’ future actions) are normalized after the tree is generated. All possible actions for the agent are always augmented with a probability value of 1. For each branch, the agent’s hand equity is calculated using the hand ranges of all active opponents involved in the branch. All branches are grouped by the agent’s action at the first point where the agent is due to act.

The expected value for each branch is then added to a value storing the total expected value for a certain action for the agent, as shown in equation 15. The formerly calculated probabilities for the opponents’ future actions are not adjusted by the fold-frequencies returned by the linear formula found using regression. These frequencies are only assumed temporarily, to find the best raise size. Moreover, as indicated in algorithm 1, a binary situation is considered: only ‘non-folds’ and ‘folds’ are counted.

\[
\begin{aligned}
\mathrm{ev}_{\text{branch}} &= \text{pot size after all action} \cdot \text{agent's equity} - \text{voluntary cost} \\
\mathrm{ev}_{\text{call}} &= \mathrm{ev}_{\text{call}} + \text{normalized likelihood of branch} \cdot \mathrm{ev}_{\text{branch}}
\end{aligned}
\tag{15}
\]

If the action yielding the maximum expected value for the agent is a raise, i.e. if \( \mathrm{ev}_{\text{raise}} > \mathrm{ev}_{\text{call}} \land \mathrm{ev}_{\text{raise}} > \mathrm{ev}_{\text{fold}} \) is true, the raising size for the agent is estimated using an implementation based on algorithm 1. For all branches in the tree of action sequences that expand on a raise from the agent, the expected value is calculated for all possible raising sizes. In the implementation, these are set as a percentage with respect to the pot and range from 100, with increments of 10, to the maximum amount the agent can put in (i.e. the agent’s stack size divided by the total pot size, multiplied by 100).

Algorithm 1 Estimating a raising size for the agent
  raise sizes ← [100...stack size]
  for all raise sizes as size do
    ev ← 0
    for all raise branches in tree do
      p ← 1
      for all actions in raise branch do
        a, b ← Regressor(Situation)
        folds ← a ⋅ size + b
        if action is fold then
          p ← p ⋅ folds
        else
          p ← p ⋅ (1 − folds)
        end if
        pot, bet ← Evaluate(Situation, size)
        cost ← bet − involuntary_cost
        ev ← ev + p ⋅ (equity ⋅ pot − cost)
      end for
    end for
    results ← (size, ev)
  end for
  best size ← Max(results)
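The raise-size search of algorithm 1 could look as follows in Python. This is a sketch under several assumptions that are not taken from the thesis: `regressor`, `evaluate` and `equity` are hypothetical callables with invented signatures, the fold frequency is clamped to the interval [0, 1], and the expected-value contribution is added once per branch, after the branch probability has been accumulated over all of its actions. The toy numbers are made up.

```python
def best_raise_size(raise_branches, regressor, evaluate, equity,
                    involuntary_cost, max_size, step=10):
    """Return the (size, ev) pair with the highest expected value.

    raise_branches: list of branches, each a list of (situation, action)
    regressor(situation) -> (a, b), with fold frequency = a * size + b
    evaluate(branch, size) -> (pot, bet) after all action in the branch
    equity(branch) -> the agent's equity in that branch
    """
    results = []
    for size in range(100, max_size + 1, step):  # percentages of the pot
        ev = 0.0
        for branch in raise_branches:
            p = 1.0
            for situation, action in branch:
                a, b = regressor(situation)
                folds = min(max(a * size + b, 0.0), 1.0)  # clamp to [0, 1]
                p *= folds if action == "fold" else 1.0 - folds
            pot, bet = evaluate(branch, size)
            cost = bet - involuntary_cost
            ev += p * (equity(branch) * pot - cost)
        results.append((size, ev))
    return max(results, key=lambda item: item[1])


# Toy usage with made-up numbers: one opponent who either folds or calls.
POT = 0.40  # pot in dollars before the agent raises


def toy_regressor(situation):
    return 0.002, 0.2  # fold frequency grows linearly with the raise size


def toy_evaluate(branch, size):
    bet = POT * size / 100
    called = branch[-1][1] != "fold"
    return (POT + 2 * bet if called else POT + bet), bet


def toy_equity(branch):
    return 0.4 if branch[-1][1] != "fold" else 1.0


branches = [[("heads-up", "fold")], [("heads-up", "call")]]
best_size, best_ev = best_raise_size(
    branches, toy_regressor, toy_evaluate, toy_equity,
    involuntary_cost=0.0, max_size=300,
)
```

In this toy setting the gain from extra folds outweighs the loss when called, so the search settles on the largest allowed size; with a flatter fold-frequency line the optimum would move towards smaller raises.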

[54] A queue is a structured form for managing data in which ‘new elements’ (in this case: action sequences) are added to a ‘waiting queue.’ When processing the queue, elements are ‘popped’ from the queue following a first-in-first-out regime.
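The queue-driven expansion of action sequences can be sketched with Python’s `collections.deque`. This is a deliberately simplified illustration with a hypothetical function name: every remaining player acts exactly once, so a raise does not reopen the action, and no probabilities, hand ranges or raise sizes are attached to the sequences.

```python
from collections import deque

ACTIONS = ("fold", "call", "raise")


def expand_sequences(faced, players_to_act, max_raises=2):
    """Breadth-first expansion of action sequences using a FIFO queue."""
    queue = deque([tuple(faced)])
    target_length = len(faced) + players_to_act
    done = []
    while queue:
        sequence = queue.popleft()  # first in, first out
        if len(sequence) == target_length:
            done.append(sequence)  # no valid future actions: mark as done
            continue
        raises = sequence.count("raise")
        for action in ACTIONS:
            if action == "raise" and raises >= max_raises:
                continue  # respect the limit on the number of raises
            queue.append(sequence + (action,))
    return done


# EP raised and two players folded; three players still have to act.
sequences = expand_sequences(["raise", "fold", "fold"], players_to_act=3)
```

With one raise already made and a limit of two, at most one of the three remaining players can raise, which yields 20 finished sequences for this example.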

3.3 Experiments and results

The purpose of this section is to demonstrate the potential of the method presented in section 3.1, and to attract interest in the techniques involved for approaching similar problems. As the method pertains only to a subphase of the game, it is hard to evaluate, as there are no ‘direct results.’ Nevertheless, in this section, the outcomes for a few situations are discussed. In appendix C, some of these outcomes are commented on by ‘real’ players. The purpose of this paper is to answer the research question, stated in section 1, by proposing the methodology outlined in section 3.1. This section, and the findings presented in appendix C, solely serve as an illustration.

3.3.1 Weights

For the situation shown in figure 8, a vector space, illustrated in figure 6, is created. Every vector in this space consists of 25 components. The weights for these components, calculated according to equations 11 and 12, are presented in table 4 below. Noteworthy is the low weight for EP’s raising size. For similar spaces, used to predict opponents’ future actions, weights are even lower: the normalized weights given to the component representing EP’s raising size for all players to act afterwards, in respective order, are .000005, .0005, .0002, .00005 and .0001.

Table 4: Estimated importance weights for dimensions in a space used for the estimation of a hole card range for a player who raised from the first position.

w        function                 w        function
.5391    VPIP of EP               .0044    VPIP of BU
.2767    PFR/VPIP of EP           .0098    PFR/VPIP of BU
.0024    EP’s stack size          .0002    BU’s stack size
.0785    EP’s all-in status       .0066    BU’s all-in status
.0001    raise size by EP         .0050    VPIP of SB
.0047    VPIP of MP               .0122    PFR/VPIP of SB
.0092    PFR/VPIP of MP           .0003    SB’s stack size
.0003    MP’s stack size          .0056    SB’s all-in status
.0045    MP’s all-in status       .0052    VPIP of BB
.0039    VPIP of CO               .0100    PFR/VPIP of BB
.0071    PFR/VPIP of CO           .0001    BB’s stack size
.0001    CO’s stack size          .0068    BB’s all-in status
.0071    CO’s all-in status

3.3.2 Estimated hole card ranges

Elaborating on the situation sketched in figure 8, the estimated hole card range for a player in EP is shown in table 5 on page 27. The concerned player is modelled by VPIP and PFR/VPIP values of 43 and 10 respectively. This hole card range is based solely on the action of the raising player. The agent’s holdings do not influence the ‘initial’ estimation of an opponent’s range. Only when calculating equity (the point at which opponents’ ranges are actually used) are these blockers taken into account. To illustrate the effect of the agent holding an ace: if the agent held one ace, the estimated probability of the opponent holding AKo and AA drops to 4.77% and 2.07% respectively, while the probability of holding KK jumps to 4.33%. Surprisingly, if the player in EP was modelled by VPIP and PFR/VPIP values of 16.89 and 73.98 respectively, EP’s range is estimated to be identical to the reported range in table 5. This unexpected finding suggests that the similarity function might not be optimal.

If the player to act directly after the raise from the first position re-raises, the top five hands of this player’s range consist of AKo, KK, AA, QQ and JJ with respective probabilities of 11.33%, 10.35%, 9.75%, 7.91% and 6.07%. If a third player would again re-raise after the first and second player raised, this player’s top five hands are identical to those of the second player’s top five, with the exception that JJ and QQ are ‘switched around:’ their probabilities with respect to the second player’s top five are 17.33%, 14.79%, 13.27%, 7.15% (QQ) and 9.19% (JJ). If ‘it folds to the small blind,’ and the player in the small blind, being the first player to voluntarily put money into the pot, raises, the top of this player’s range consists of hands with an estimated probability of at most 2.88%. This indicates a much wider range.

Historical situations for which the analysed player’s hole cards were unknown are disregarded. These situations occur when a player did not proceed to a complete showdown. The proportion of these ‘unknown’ holdings is often larger than the proportion of ‘known’ holdings. In the aforementioned example of three players in the first three positions raising the pot, respectively 85.24%, 76.75% and 50.36% of the holdings of their ‘historical counterparts’ were unknown.
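The blocker effect described in section 3.3.2 can be illustrated with a small sketch: each hand’s probability is scaled by the fraction of its combinations that the agent’s cards leave available, after which the range is renormalized. This mechanism is an assumption; it appears roughly consistent with the reported shift for AKo, AA and KK, but the thesis does not spell it out here. The toy range and the function names are made up.

```python
from itertools import combinations, product

SUITS = "cdhs"  # clubs, diamonds, hearts, spades


def hand_combinations(hand):
    """Card combinations of a hand such as 'KK', 'AKs' or 'AKo'."""
    first, second = hand[0], hand[1]
    if first == second:
        return [(first + a, second + b) for a, b in combinations(SUITS, 2)]
    if hand.endswith("s"):
        return [(first + s, second + s) for s in SUITS]
    return [(first + a, second + b)
            for a, b in product(SUITS, repeat=2) if a != b]


def reweight_for_blockers(hand_range, agent_cards):
    """Scale each hand by its unblocked share, then renormalize the range."""
    held = set(agent_cards)
    adjusted = {}
    for hand, p in hand_range.items():
        combos = hand_combinations(hand)
        free = [c for c in combos if held.isdisjoint(c)]
        adjusted[hand] = p * len(free) / len(combos)
    total = sum(adjusted.values())
    return {hand: p / total for hand, p in adjusted.items()}


# Toy range (made-up probabilities); the agent holds the ace of spades.
toy_range = {"AKo": 0.3, "AA": 0.2, "KK": 0.5}
adjusted = reweight_for_blockers(toy_range, ("As", "7d"))
```

Holding one ace removes a quarter of the AKo combinations and half of the AA combinations, so both hands become less likely while KK, which is not blocked, gains probability after renormalization.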

Table 5: Estimated hole card range for a player who raised from first position.

p ⋅ 100    hole cards    p ⋅ 100    hole cards    p ⋅ 100    hole cards
5.69%      AKo           4.69%      AQo           4.14%      AJo
3.99%      QQ            3.88%      KK            3.70%      AA
3.61%      KQo           3.56%      JJ            3.28%      TT
2.83%      99            2.60%      88            2.53%      ATo
2.24%      77            2.22%      KJo           2.06%      AKs
2.02%      66            1.76%      55            1.72%      AQs
1.71%      AJs           1.55%      KQs           1.51%      ATs
1.42%      QJs           1.40%      QJo           1.38%      44
1.34%      KJs           1.28%      JTs           1.17%      QTs
1.15%      A9s           1.13%      33            1.11%      KTs
1.08%      T9s           0.98%      A5s           0.94%      A8s

3.3.3 Predictions for actions and raise sizing

In the situation shown in figure 8 on page 20, the algorithm suggests a fold for the agent when considering the opponent models as shown in the enumeration below. The agent expects to win $0.00 by folding, while the expected value for calling and raising is respectively -$0.04 and -$0.38 (when considering a ‘standard raise’ to $1.20).

• EP has an estimated VPIP-value of 42.9 and a PFR/VPIP-value of 10.4.
• CO has an estimated VPIP-value of 29.7 and a PFR/VPIP-value of 71.2.
• SB has an estimated VPIP-value of 25.9 and a PFR/VPIP-value of 62.0.
• BB has an estimated VPIP-value of 17.2 and a PFR/VPIP-value of 82.3.

For a new situation, shown in figure 9, wherein the agent is seated in a later position (in the ‘small blind’ or ‘SB’) and faces three folds after a raise to $0.25 from the first position, the final part of the generated tree of action sequences is drawn in figure 10. The player on BU is modelled by a 17.0 VPIP value and a 38.5 PFR/VPIP value. In this situation, the agent suggests raising A♣A♡ to $1.04. More than half of the time, the agent expects EP, the ‘original raiser,’ to call the agent’s re-raise, while the estimated probability is slightly higher when the ‘big blind’ (BB) decides to call the raise as well. In this spot, the agent would also call J♢J♡, A♡9♡, A♡2♡, 5♡5♣, Q♢Q♡, K♣K♠, K♠10♠, 7♠6♠, 2♣2♡, A♣7♢ and 10♣10♠. Of all listed hands, the agent only thinks raising Q♢Q♡ and K♣K♠ is profitable. The agent would fold 2♡3♡, Q♢5♢ and J♢8♣. The agent expects roughly the same value ($0.00) when folding and calling Q♢10♠.

Now, consider the agent to be seated on BU while MP and CO still fold: the agent now folds A♣7♢, A♡2♡, 2♡2♢, 6♡7♡ and 10♢J♢; but calls A♠K♢, A♠K♠, 5♡5♣, 10♠10♣, J♡J♢, Q♢Q♡ and A♠J♢; and now decides to raise K♣K♠, while still raising A♣A♡.

[Figure 9 (table diagram): EP has raised to $0.25; MP, CO and BU have folded; the agent is seated on SB with $0.05 posted and holds A♣A♡; BB has posted $0.10; total pot: $0.40.]

Figure 9: Situation in which the agent is seated on the small blind and faces a 166.67% raise from the first position. A white node indicates that a player folded.

[Figure 10 (tree diagram): the agent (β = agent) raises to $1.04 (α = raise 1.04). BB then folds with probability .84 or calls with probability .16. After a fold by BB, EP folds with probability .42 or calls with probability .58; after a call by BB, EP calls with probability .62 or folds with probability .38.]

Figure 10: Final part of the generated decision tree for the agent’s re-raise after EP raised and MP to BU folded. A limit of two raises is set.
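As a small worked example, the likelihood of each complete branch in figure 10 is the product of the conditional probabilities along its path. The assignment of the probabilities to the edges below is read from the figure and should be treated as an assumption; the variable names are hypothetical.

```python
# Conditional probabilities as read from figure 10 (agent raises to $1.04).
bb_action = {"fold": 0.84, "call": 0.16}
ep_action_given_bb = {
    "fold": {"fold": 0.42, "call": 0.58},
    "call": {"fold": 0.38, "call": 0.62},
}

# Likelihood of every complete branch (BB's action, EP's action).
branches = {
    (bb, ep): bb_action[bb] * p
    for bb in bb_action
    for ep, p in ep_action_given_bb[bb].items()
}

# Overall probability that EP calls, regardless of what BB does.
ep_calls = branches[("fold", "call")] + branches[("call", "call")]
```

The four branch likelihoods sum to one, and the overall probability that EP calls the re-raise comes out above one half, in line with the description in the text.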

4 Conclusion and discussion

For the purpose of representing and comparing game situations in order to make decisions in the early stages of a multi-player No-Limit Texas Hold’em game, an agent can generate multi-dimensional vector spaces in which vectors live whose components consist of characteristics describing the to-be-compared situation. Part of these characteristics consist of opponent models, estimated using a combination of empirical Bayesian statistical methods and beta-binomial regression. A weighted geometric distance between vectors in this space can be used as a similarity measure between a currently presented situation and several historical situations reflecting games in which the involved players performed the same sequence of actions. An augmented probability tree can be generated for all possible action sequences and their probabilities. The expected value for a particular action for the agent can be calculated by consulting the likelihoods of every branch in the tree, in combination with approximating the agent’s equity versus the opponents’ ranges involved in that branch.

This methodology is efficient, as much of the involved computation concerns matrix operations or simple arithmetic due to the benefits of caching and conjugate prior theory. This technique causes an agent to take into account most situational factors involved in a poker game, and to make decisions based on the opponents’ dynamics. The proposed design is expandable to multiple different variants of the game, and can be extended in a similar manner by taking into account more variables.

Although the results are satisfactory, a broad range of remarks needs to be made on the approach in this thesis. The most important shortcoming of the presented implementation is the lack of ‘insight’ into further gameplay. Decisions are based entirely on hand equity, while in reality action generates (possibly negative) implied equity, i.e. some hands you can get more easily bluffed out of (Duke and Vorhaus, 2011, p. 78-79). A good poker player does take possible post-flop situations into account when making a play pre-flop, while the implementation in this paper solely considers absolute hand equity. A reasonable choice for further research would be the expansion of the used methods to the post-flop phase of the game.

The estimation of player-specific metrics is of high relevance in the world of poker, and possibly other fields facing similar problems. The method shown in this paper can be used for a variety of applications related to poker, many of which are ‘Non-Artificial Intelligence applications.’ An interesting idea for future research might be to analyse whether a more specific model would yield even better results. A suggestion might be to expand the Bayesian model with another prior based on playing styles (which is possible, as shown in figure 2). Another suggestion is to replace the used technique of beta-binomial regression by a more extensive Bayesian model, for example by applying a Poisson-Gamma model (jkm, 2019). A necessary remark needs to be made on the proposed implementation for beta-binomial regression. The VPIP values for players with a very low number of hands played (say 2 or 3) are possibly unrealistically high, as shown in table 1. If these values are indeed undesirable, a possible solution would be to cut a very small portion of players from the dataset, for example players with fewer than 10 hands played, while still managing to overcome most of the negative effects from the commonly encountered bias pointed out in figure 4.

The second point that needs to be addressed is the way in which weights are estimated for situational characteristics in section 3.1.3. The described method is not optimal. In order to get better results, a possibly larger selection of ‘outcome groups’ should be considered. Moreover, as mentioned, the hand groups borrowed from Sklansky’s work were originally designed for a different variant of the game: Limit Texas Hold’em instead of No-Limit Texas Hold’em. The method in this paper has been invented purely for bridging the challenges faced in this problem, and can likely be improved significantly. Therefore, it might also be necessary to change the weight-estimation algorithm altogether. In section two, I advocated the idea of human players subconsciously applying techniques similar to those implemented in this thesis. Since players can be assumed to weigh certain characteristics of poker situations individually, a more ideal model would also involve a more dynamic approach, allowing for the estimation of player-specific weights. The problem of ‘unguidedly finding weights’ for dimensions in a dataset is, however, very interesting for the field of AI and can turn up in a broad area of different applications. Research following up on this particular matter is significant for any problem in which concepts are to be compared with each other. Moreover, there are many interfaces with the earlier mentioned idea of Gärdenfors’ Conceptual Spaces, since our problem reduces to finding ‘salience weights’ for concepts in the light of his work.

Elaborating on the use of weights for the dimensions of the situational spaces in this thesis, a note should be made that the dimensions are treated as if they were independent. For the same reasons as stated above, it would be interesting to examine the options for enlarging the proposed model to be compatible with correlations between dimensions. A starting point could be to calculate weights for a matrix storing two-dimensional correlations, instead of a vector (or, if going even further, tensors of multi-dimensional correlations).

Furthermore, the application of these weights, which are used in calculating weighted Minkowski distances between vectors, is unsatisfactory. In table 4, relatively large weights are calculated for EP’s VPIP and PFR/VPIP values for the situation sketched in figure 8. However, the estimated hole card range for this player, as shown in table 5, is identical to that of a player modelled by very different values for these seemingly salient metrics, as pointed out on page 26. This also results in ranges that are estimated too wide for players who generally do not raise very often. Although, in essence, the proposed methodology with which the research question is answered is solid, it is not utilized perfectly in the shown example implementation.

A third significant limitation of the proposed method is the use of linear regression for the purpose of finding the size with which the agent should raise, as described in section 3.1.4. By deviating from vector spaces, the agent loses some of the advantages these vector spaces offer in this part of the algorithm. Hole card ranges for opponents do not change as the agent’s raising size changes, while in reality an opponent can generally be assumed to hold a stronger range when calling higher raises. The ‘size-finding’ algorithm, shown on page 25, can also be optimised by implementing a more efficient method of ‘trying out’ raise size values. In theory, it should be possible to move the point of the situational vector along the axis representing the size of the raise. However, the weights for these raise sizes are often too low (which also points out the shortcomings of the weight-estimation method used in this thesis), which results in this mechanism barely making a difference. This is not in line with real world situations. This fact even renders the inclusion of the raising size in the used vectors almost useless. Nonetheless, the value is deliberately included in the vectors to illustrate the philosophy behind the proposed idea. An important remark on both types of methods is that optimised raise sizes can lead to the agent being exploited, as high raise sizes resulting from the agent maximizing its expected value point to the agent holding cards with strong absolute equity. The ‘vice versa’ situation also applies. This disadvantage could be eliminated by implementing the concept of range balancing [55].

Furthermore, some factors that occur in real life are disregarded when comparing poker situations with each other using the approach presented in this paper. This makes the proposed method, as it currently is, most suitable for ‘bot versus bot situations.’ A good example of an ignored element that does in fact influence gameplay is the concept of emotion: real players might for example temporarily lose interest in the game after getting ‘unlucky.’ A way of incorporating this aspect in the model might be to include a metric measuring ‘unluckiness,’ which ranges from zero to one. For example: if in an all-in situation a player holds 97.62% equity (a ‘one-outer’) and loses the hand, the metric should jump towards 1, while if this situation occurred vice-versa, the metric should be close to 0. If we consider another to-be-invented metric capturing the variance of a player’s stack size (i.e. this number would be high if the number of a player’s chips fluctuates wildly), then the correlation of these two metrics would possibly be very significant (for example, if a player has both been very unlucky and lost a big chunk of his/her stack, the player’s game play might very well be influenced by emotional factors).

[55] The Pokerbank, Range balancing, http://www.thepokerbank.com/strategy/concepts/range-balancing/, accessed on March 10th, 2020

Another significant factor not accounted for in the proposed implementation is the mentioned factor of ‘time.’ Although the proposed implementation involves a design that allows for extension towards more dimensions, and is therefore able to take into account more variables, the way in which ‘the influence of time’ should be measured is not investigated in more detail. Specifically for the field of poker, it would be interesting to further investigate the consequences of ‘time’ in poker situations. A starting point could be the seemingly obvious observation that more recent actions by a particular player should be weighted more heavily, to allow flexible anticipation of a (possibly abrupt) change of an opponent’s strategy.

When considering historical situations for determining an opponent’s range of hole cards, the situations in which the hole cards of a historical player were unknown (this happens when a player either won or lost the pot without a showdown) are ignored. This probably results in a bias, as some hands can be expected to (not) go to showdown ‘easier.’ This is another significant shortcoming of the presented method. For example, some hands might be more often played with a high level of aggression, which results in more folds (and thus fewer showdowns) by opponents. On the other side, weaker hands are folded more often after the flop, which results in fewer showdowns. A third example could be hands that ‘do not hit the board often,’ such as 2♢ 2♡, which often fold after any flop not containing a 2♣ or 2♠, but are still valuable to play because of the huge implied value when a 2♣ or 2♠ does hit the board. A possible solution would be to account for ‘non-showdowns’ in the model, or if available, to use a dataset in which all hole cards for every player are known.

A few remarks also need to be made on the technique with which a hand’s ‘equity’ is calculated in this thesis. The most obvious assertion is that the method in this paper relies on simulations, instead of exact calculations. The negative effects of this problem can be partly reduced if, when enough computing resources are available, some exact equity calculations would be stored in cache. If building on the approach in which equity is approximated instead of calculated, more efficient implementations of the proposed algorithm, explained on pages 23 and 24, are certainly possible. The most obvious improvement would be to not consider all different combinations of a poker hand in situations with more than two players. Other improvements possibly include different caching mechanisms, for example using the cited resources on page 8 of this document. Besides this, when calculating the expected value of an action using a hand’s equity, no ‘rake’ values are taken into account. In the ‘real world,’ poker rooms often deduct a small commission on every hand played, called ‘rake.’

The method in this paper suggests the construction of a hole card range for an opponent by measuring the similarity between the situation in which the unknown hole cards of the faced opponent are presented, and historical situations in which the same sequence of player action occurred and the hole cards of a particular player were known. This technique could very well be improved by implementing ‘bucket mechanisms,’ which could for example be based on inferred rules such as: “if a player calls 7♢ 7♡ and 9♠ 9♢, the player can probably be assumed to do the same with 8♣ 8♡.” Buckets could possibly also be dynamically generated, by only considering groups of hands that have similar equity against the hand the agent holds (e.g. a bucket of 2♠ 2♣ - 5♣ 5♠ and a bucket of 7♢ 7♣ - A♡ A♢ make sense when the agent holds 6♠ 6♡). Other approaches involving fixed or dynamic ‘buckets’ might be relevant as well. As explained at the end of section 3.1.3, this might go at the expense of player exploitability. As such, these mechanisms should be implemented with caution.

Although the most likely action sequences are all accounted for, the dataset lacks observations for a significant part of all possible action sequences. This is inevitable, but could possibly be overcome in a more flexible way if abstractions were applied. A possible improvement to the suggested method could be to use observations from other action sequences (possibly with slight modifications, carried out in a way that is learnt from the data itself) for action sequences lacking observations.
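The suggestion made earlier in this discussion, that more recent actions by a player should be weighted more heavily, could for instance be realised with exponentially decaying weights. The sketch below is not part of the suggested implementation: the function name and the `half_life` parameter are hypothetical, and exponential decay is only one of several possible choices.

```python
from typing import Sequence


def recency_weighted_frequency(events: Sequence[bool], half_life: float = 50.0) -> float:
    """Frequency of an action (e.g. 'raised pre-flop'), favouring recent hands.

    `events` is ordered from oldest to newest; an observation that lies
    `half_life` hands in the past counts half as much as the newest one.
    """
    if not events:
        raise ValueError("at least one observation is required")
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    weighted_hits = 0.0
    total_weight = 0.0
    for age, happened in enumerate(reversed(events)):
        weight = 0.5 ** (age / half_life)
        weighted_hits += weight * (1.0 if happened else 0.0)
        total_weight += weight
    return weighted_hits / total_weight


# A player who never raised in 100 hands and then raised 10 hands in a row:
history = [False] * 100 + [True] * 10
plain = sum(history) / len(history)                    # treats all hands equally
recent = recency_weighted_frequency(history, half_life=5.0)  # reacts to the change
```

A short half-life lets the estimate follow an abrupt change of strategy quickly, at the cost of a noisier estimate when few recent hands are available.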

30 References Davidson, A. (1999). Using artificial neural networks to model opponents in texas holdem. Unpublished Bai, X., Hancock, E., Ho, T., Wilson, R., Biggio, B., manuscript. and Robles-Kelly, A. (2018). Structural, Syntactic, and Statistical Pattern Recognition: Joint IAPR Interna- DeDonno, M. A. and Detterman, D. K. (2008). Poker is a tional Workshop, S+SSPR 2018, Beijing, China, August skill. Gaming Law Review, 12(1):31--36. 17--19, 2018, Proceedings. Lecture Notes in Computer Science. Springer International Publishing. Diaconis, P. and Ylvisaker, D. (1979). Conjugate priors for exponential families. Ann. Statist., 7(2):269--281. Billings, D., Davidson, A., Schaeffer, J., and Szafron, D. (2002). The challenge of poker. Artificial Intelligence, Duke, A. and Vorhaus, J. (2011). Decide to Play Great 134(1):201 – 240. Poker: A Strategy Guide to No-Limit Texas Hold ’Em. Huntington Press. Billings, D., Papp, D., Schaeffer, J., and Szafron, D. (1998). Poker as a testbed for ai research. In Mercer, Fukunaga, K. and Hostetler, L. (1975). The estimation of R. E. and Neufeld, E., editors, Advances in Artificial In- the gradient of a density function, with applications in telligence, pages 228--238, Berlin, Heidelberg. Springer pattern recognition. IEEE Transactions on Information Berlin Heidelberg. Theory, 21(1):32--40.

Brown, N. and Sandholm, T. (2017). Reduced space and Gärdenfors, P. (2004). Conceptual spaces: The geometry faster convergence in imperfect-information games via of thought. MIT press. pruning. In Precup, D. and Teh, Y. W., editors, Proceed- ings of the 34th International Conference on Machine jkm (2019). Accounting for uncertain information (few Learning, volume 70 of Proceedings of Machine Learn- observations) in a prior (empirial bayes). Cross Val- ing Research, pages 596--604, International Convention idated. URL:https://stats.stackexchange.com/q/439760 Centre, Sydney, Australia. PMLR. (version: 2019-12-07).

Brown, N., Sandholm, T., and Amos, B. (2018). Depth- Johanson, M. (2013). Measuring the size of large no-limit limited solving for imperfect-information games. poker games. arXiv preprint arXiv:1302.7008.

Byrd, R. H., Lu, P., Nocedal, J., and Zhu, C. (1995). A Johanson, M., Burch, N., Valenzano, R., and Bowl- limited memory algorithm for bound constrained op- ing, M. (2013). Evaluating state-space abstractions in timization. SIAM Journal on Scientific Computing, extensive-form games. In Proceedings of the 2013 inter- 16(5):1190--1208. national conference on Autonomous agents and multi- agent systems, pages 271--278. Chen, B. and Ankenman, J. (2006). The Mathematics of Poker. ConJelCo LLC. Korb, K. B., Nicholson, A. E., and Jitnah, N. (1999). Bayesian poker. In Proceedings of the Fifteenth Confer- Dahl, F. A. (2001). A reinforcement learning algorithm ence on Uncertainty in Artificial Intelligence, UAI’99, applied to simplified two-player texas hold’em poker. In pages 343--350, San Francisco, CA, USA. Morgan De Raedt, L. and Flach, P., editors, Machine Learning: Kaufmann Publishers Inc. ECML 2001, pages 85--96, Berlin, Heidelberg. Springer Berlin Heidelberg. Lloyd, S. (1982). Least squares quantization in pcm. IEEE Transactions on Information Theory, 28(2):129--137. Dalpasso, M. and Lancia, G. (2015). Estimating the strength of poker hands by integer linear programming Manchanda, S. and Anand, A. (2017). Representation techniques. Central European Journal of Operations learning of drug and disease terms for drug reposition- Research, 23(3):625,640. ing.

31 Metropolis, N. and Ulam, S. (1949). The monte carlo Wilson, E. B. (1927). Probable inference, the law of suc- method. Journal of the American statistical association, cession, and statistical inference. Journal of the Ameri- 44(247):335--341. can Statistical Association, 22(158):209--212.

Meyer, G., von Meduna, M., Brosowski, T., and Hayer, T. (2013). Is poker a game of skill or chance? a quasi-experimental study. Journal of Gambling Studies, 29(3):535--550.

Moravčík, M., Schmid, M., Burch, N., Lisý, V., Morrill, D., Bard, N., Davis, T., Waugh, K., Johanson, M., and Bowling, M. (2017). Deepstack: Expert-level artifi- cial intelligence in heads-up no-limit poker. Science, 356(6337):508--513.

Pananos, D. (2020). Beta-binomial regression or poisson- gamma model to account for uncertainty in (empricial bayesian) prior? explained in simple terms? Cross Val- idated. URL:https://stats.stackexchange.com/q/444807 (version: 2020-01-22).

Ponsen, M. J., Ramon, J., Croonenborghs, T., Driessens, K., and Tuyls, K. (2008). Bayes-relational learning of opponent models from incomplete information in no- limit poker. In AAAI, pages 1485--1486.

Raïffa, H. and Schlaifer, R. (1961). Applied statistical decision theory. Studies in managerial economics. Di- vision of Research, Graduate School of Business Ad- minitration, Harvard University.

Robinson, D. (2017). An introduction to empirical bayes: Examples from baseball statistics.

Sklansky, D. and Malmuth, M. (1999). Hold’em poker for advanced players. Two Plus Two Publishing LLC.

Southey, F., Bowling, M., Larson, B., Piccione, C., Burch, N., Billings, D., and Rayner, D. C. (2012). Bayes’ bluff: Opponent modelling in poker. CoRR, abs/1207.1411.

Thorndike, R. L. (1953). Who belongs in the family? Psychometrika, 18(4):267--276.

Van der Kleij, A. (2010). Monte carlo tree search and opponent modeling through player clustering in no- limit texas holdem poker. University of Groningen, The Netherlands.

Appendix A - Rules of Texas Hold’em Poker

In Texas Hold’em every player is dealt two hidden cards – the player’s ‘hole cards.’ Together with five additional ‘community cards’ visible to every player, these seven cards determine a player’s ‘hand’: the best possible combination of five cards out of seven. Hands are ranked in descending order according to the combinations shown in the figures below. In the event of a tie, the hand with the highest rank for the largest group of cards wins. For example: 10♡ 10♣ 10♠ 2♢ 2♣ beats 3♠ 3♢ 3♡ K♣ K♡, and A♠ Q♠ 10♠ 8♠ 2♠ beats Q♠ J♠ 10♠ 8♠ 2♠. Cards not counting towards the type of a ‘made hand’ are called ‘side cards,’ or ‘kickers.’ The hand Q♠ Q♣ J♠ J♣ A♢ is ranked as ‘two pair (queens and jacks) with an ace kicker.’ In case of a subsequent tie, the hand with the highest kicker wins. For example: 3♢ 3♣ 3♡ A♡ 2♡ beats 3♢ 3♣ 3♡ K♣ J♢; and K♡ 10♡ 8♣ 7♡ 6♢ beats J♡ 10♡ 9♣ 8♢ 2♠. If no kickers are available (as with a five-card hand such as a straight or flush), or if a tie occurs again, the pot is split between the players. In no-limit Texas Hold’em bets and raises can go up to any value, whereas in limit Texas Hold’em these sizes are capped.

Figures: an example hand for each ranking, listed here in descending order of strength:

• ‘Royal flush’ – an ace, king, queen, jack and ten of the same suit
• ‘Straight flush’ – five cards of subsequent ranks of the same suit
• ‘Four of a kind’ – four cards of the same rank
• ‘Full house’ – three cards of the same rank and two cards of another, shared rank
• ‘Flush’ – five cards of the same suit
• ‘Straight’ – five cards of subsequent ranks
• ‘Three of a kind’ – three cards of the same rank
• ‘Two pair’ – two pairs of two cards of the same rank
• ‘Pair’ – two cards of the same rank
• ‘High card’ – an arbitrary other combination not mentioned above
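The ranking of five-card combinations described in this appendix can be sketched in a few lines of code. The representation of a card as a `(rank, suit)` tuple and all names below are assumptions made for illustration; the sketch only determines the category of a hand and deliberately leaves out tie-breaking by kickers. It also treats A-2-3-4-5 as a straight, a standard rule that is not spelled out above.

```python
from collections import Counter
from typing import List, Tuple

Card = Tuple[int, str]  # (rank, suit): rank 2..14 with 11=J, 12=Q, 13=K, 14=A

# Categories from weakest to strongest, so a higher index beats a lower one.
CATEGORIES = [
    "high card", "pair", "two pair", "three of a kind", "straight",
    "flush", "full house", "four of a kind", "straight flush", "royal flush",
]


def classify(cards: List[Card]) -> str:
    """Return the ranking category of exactly five distinct cards."""
    if len(cards) != 5 or len(set(cards)) != 5:
        raise ValueError("expected five distinct cards")
    ranks = sorted(rank for rank, _ in cards)
    counts = sorted(Counter(ranks).values(), reverse=True)
    is_flush = len({suit for _, suit in cards}) == 1
    is_wheel = ranks == [2, 3, 4, 5, 14]  # the ace may also count as low
    is_straight = is_wheel or (len(set(ranks)) == 5 and ranks[-1] - ranks[0] == 4)
    if is_straight and is_flush:
        return "royal flush" if ranks[0] == 10 else "straight flush"
    if counts[0] == 4:
        return "four of a kind"
    if counts == [3, 2]:
        return "full house"
    if is_flush:
        return "flush"
    if is_straight:
        return "straight"
    if counts[0] == 3:
        return "three of a kind"
    if counts == [2, 2, 1]:
        return "two pair"
    if counts[0] == 2:
        return "pair"
    return "high card"


def beats_by_category(a: List[Card], b: List[Card]) -> bool:
    """True if hand `a` belongs to a strictly stronger category than hand `b`."""
    return CATEGORIES.index(classify(a)) > CATEGORIES.index(classify(b))
```

Choosing the best five cards out of seven would amount to classifying all 21 five-card subsets and keeping the strongest, with kickers deciding between hands of the same category.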

The game is divided into four different betting rounds: ‘pre-flop’ (before any community cards are dealt), the ‘flop’ (after three community cards are dealt), the ‘turn’ (after a fourth community card is dealt) and the ‘river’ (when all five community cards are dealt). In each betting round, all players need to match their bets to proceed to the next round. When no bets have been placed, a player can choose to bet or check (a player checks when deciding not to bet). When a bet has been placed by another player, a player can opt for a call (matching the previously made bet), raise (incrementing the previously made bet) or fold (leaving the hand with no chances of winning). Two rotating positions involuntarily have to place set bets at the beginning of the first round: the ‘big blind’ and the ‘small blind,’ thus leaving the possible actions ‘pre-flop’ to be either call, raise or fold (the only player with a pre-flop possibility for a check is the player on the big blind, when no player has raised before the big blind’s turn). If on the river (the last betting round) all remaining players matched their bets, the players go to ‘showdown’ and the player with the best hand wins the pot.

Appendix B - Glossary

• Action - The combination of moves made by players in the current hand.
• All-in - When a player commits all their chips to the pot. The player will not be able to participate in further action. When one or multiple players call a player’s all-in bet, a showdown will certainly occur.
• Bet - Amount of money voluntarily put into the pot by a player who takes initiative, i.e. a ‘call’ is not a ‘bet.’
• Bet size - The amount of money or chips with which a player bets.
• Blind - Money involuntarily put into the pot by two (rotating) players: the ‘small blind’ and the ‘big blind.’ Often denoted as two numbers – in a game with blinds at $100/$50, players on the positions of the big and small blind place bets of $100 and $50 respectively, at the beginning of the hand.
• Blocker - A card ‘blocks’ the possibility of an opponent having the same card, reducing the possibility of the opponent having a hand involving that card. For example, being dealt A♠ 2♢ reduces the possibility of opponents having any hand with an ace (such as AA or AKs).
• Board - Synonym for community cards.
• Call - Matching a previous bet made by another player.
• Check - Voluntarily choosing not to (increase the) bet. If all players check, all players proceed to the next betting round or showdown without having to put (extra) money into the pot.
• Community cards - Five cards visible to all players with which a player can make a 5-card combination together with their two hole cards.
• Equity - The perceived strength of a hand or a range of hands, most often measured against another hand or range of hands.
• Flop - Either refers to the phase of the game concerning the second betting round in which three community cards have been dealt, or the three revealed cards themselves.
• Fold - Voluntarily choosing to halt participation in the hand and forfeit all chances of winning the pot. A player folds when deciding not to match the bet made by another player.
• Hand - Can either refer to the player’s hole cards, a hand ranking listed in appendix A, or a game event itself (the moment from dealing hole cards to players, all the way to a possible showdown).
• Hole cards - Two hidden cards a player receives at the start of a hand.
• Implied odds/value - Amount of chips a player expects to win if a certain event (such as ‘making a straight’) occurs.
• Overcard - An overcard is a card of a higher rank than the cards that the player is holding. For example: if a player holds J♣ 10♣ on an A♠ 8♠ 2♢ flop, then the flopped A♠ is an overcard to the player’s hand.
• Position - The place in which the player is seated. All positions rotate clockwise after every hand. The first player to act (in the first position, sometimes referred to as ‘UTG,’ but most often as ‘EP’) is seated left of the blinds. The last player to act has the big blind. In this thesis the first (few) players to act are referred to as being in ‘early position,’ whereas the last player(s) are in ‘late position.’
• Pot - The accumulation of all money put into the pot by all players.
• Pre-flop - The first betting round of the game during which no community cards have been dealt yet. This is the phase of the game with which we are concerned in this thesis.
• Raise - Instead of matching a bet made previously by another player or folding, the player increments the bet. All players who did not fold yet have to either match the new size, fold or re-raise.
• Raise size - The amount of money or chips a player increments the previously made bet with. Often measured as a percentage relative to the total pot size or the previously made bet.
• Range - A group of hands a player can have in a specific situation, e.g. ‘the re-raising range’ of a player could consist of AA, KK and AKs.
• Showdown - Event occurring when two or more players proceed through all betting rounds and show their cards to find out who wins the pot.
• Stack size - The amount of chips a player has. Often measured at the beginning of a hand. The maximum amount the player can put into the pot.
• Suited - Two hole cards of the same suit. Often denoted with a small s, for example: ‘ATs’ for A♡ 10♡. Offsuit hands (such as A♡ 10♣) are often denoted with a small o (for example ATo).

Appendix C - Questionnaire

To provide an additional feature, six poker players have been asked to comment on the agent’s decisions using their expert knowledge. As this questionnaire is prone to limitations due to multiple reasons (the most obvious two being the low number of participants and the possibly biased selection of questions in the survey), it is only used as an appendix in an unprocessed manner. The results of the questionnaire are attached to this and the next page, while the last pages of this document consist of the contents of the questionnaire itself. Extra comments made by players are referred to by footnotes. Most of the remarks have already been addressed throughout this thesis. Most surveys seem to confirm common assumptions and concerns made in this paper. The results continue on the next page.

These answers concern a profitable player who studied a lot of strategy and played 10,000 - 100,000 hands over the past five years.

1. 3/5¹
2. Raise to $1.00²
3. 4/5³
4. 2/5⁴
5. 5/5⁵
6. 2.5/5⁶
7. 0/5⁷
8. It’s good⁸
9. See comment⁹
10. Raise to $0.75¹⁰

These answers concern a profitable player who studied a lot of strategy and played more than 100,000 hands over the past five years.

1. 3/5¹¹
2. Raise to $1.05
3. 5/5¹²
4. 1/5¹³
5. 5/5¹⁴
6. 3/5¹⁵
7. 1/5¹⁶
8. It’s good
9. Yes¹⁷
10. Raise to $0.87

These answers concern a profitable live poker player who studied a fair amount of strategy and played about 30,000 hands since November 2019, when online poker became legal again in the United States.

1. 1/5¹⁸
2. Raise $1.00
3. 1/5¹⁹
4. 1/5²⁰
5. 5/5
6. 1/5²¹
7. 2/5²²
8. It’s good
9. Yes
10. Call from BU, raise from SB

¹ “The range is quite ‘top heavy,’ including almost no suited connectors, bit it’s not unreasonable. It seems strange that hands which are objectively always opening have different weights (i.e. JJ/TT). The range would be unreasonable if the VPIP and PFR are assumed to be correct. The given information implies they only raise preflop 4.3% of the time. However a 4.3% opening range seems unreasonable.”
² “We have the best hand in the game, this is a pretty clear raise.”
³ “The big blind will likely be calling slightly less often than this given that the early position raiser is assumed to be strong. The BB also has a fairly tight VPIP so I wouldn’t assume such a wide overcall range. The raising frequency looks about right. Seems reasonable overall though.”
⁴ “The assumptions are backwards; the standard play here for the BB is to reraise or fold. Cold-calling the 3bet with 15% of range is simply not reasonable, especially considering they aren’t closing action. If we assume BB cannot reraise then this is still somewhat unreasonable, as 15% is just way too wide for that situation.”
⁵ “Perfect raising size. This is considered a standard 3bet size when out of position.”
⁶ “Expected value can be extremely difficult to gauge. I imagine these numbers don’t include rake. J8s is a fold in either position against the EP raiser according to all preflop charts. J8o is a definite fold. I have a hard time believing that these hands are not -EV in that spot, but it’s not completely unreasonable in a zero-rake environment.”
⁷ “I very much disagree with the EV assumptions here. Raising will definitely be a better play in this situation. KK is a standard 3bet from any position. I’m confident that any game theory solver or 2+2 regular will you to raise KK 100% of the time in that spot. Slow-playing Kings is an especially terrible idea against an ace-heavy opponent. There’s no way that flatting is higher EV than raising.”
⁸ “Flatting there is an atrocious play. They should definitely be raising.”
⁹ “I am somewhat indifferent. AJo is a slightly stronger hand, however ATs plays better multiway and has better implied odds due to flush possibilities. If we assume one of the blinds will call quite often then I prefer ATs, otherwise I prefer AJo.”
¹⁰ “TT is a definite raise; it does not play well as a flat. More often than not the flop will bring overcards (J-A), so it’s preferable to take the betting initiative and have an uncapped range in that spot.”

These answers concern a recreational player who picked up some strategy and played 10,000 - 100,000 hands over the past five years.

1. 2/5²³
2. Call $0.25
3. 4/5
4. 3/5
5. 3/5
6. 4/5
7. 2/5
8. It’s good
9. Yes
10. Raise to $1.00

These answers concern a profitable player who studied a lot of strategy and played more than 100,000 hands over the past five years.

1. 4/5
2. Raise $0.85
3. 4/5
4. 4/5
5. 2/5
6. 4/5
7. 2/5
8. It’s good
9. No
10. Call $0.25

These answers concern a recreational player who picked up some strategy and played more than 100,000 hands over the past five years.

1. 2/5
2. Call $0.25
3. 4/5
4. 4/5
5. 5/5
6. 3/5
7. 1/5
8. It’s good
9. Yes
10. Raise $0.75

¹¹ “It should be said that the stats provided are insufficient to determine the openings range from the Lojack (EP). Even tho his average is 4.3 PFR it could be even lower from the position he is raising from. That said your estimation of 4.3% of hands is way off. Your estimated range consists of about 16% of hands.”
¹² “These numbers seem accurate. You could make a case his calling % should be a bit higher and folding % a bit lower, since he should defend a lot of his range, even with a caller in the small blind.”
¹³ “although with this specific combo (AA) it is indeed harder for the Big Blind to have a hand that can profitably call a 3-bet with the original raiser still behind. It seems 15,9% call is way too much, though. Not accounting 4-bets is also strange, strategy from this position is usually 4-bet or fold.”
¹⁴ “Raising size is good. In general, we want to 3-bet from Out of Position with a larger size, so 4x the original raise seems good, some would even say standard. With this size we generate fold equity with our bluffs and we get value from the premium hands in our range.”
¹⁵ “Although a call is probably the same ev as folding in theory, or at least looking at those numbers, what a solver would say. It should be also accounted for that we don’t realize this equity/value when we are out of position. I think folding j8s is the only option here in a cash game without antes.”
¹⁶ “We should always raise this. First for value, second to deny equity from the big blind.”
¹⁷ “Yes I like A10s more as a call. Ajo is a raise or fold in my opinion.”
¹⁸ “This range seems rather arbitrary. What are the other considerations that are conflicting with the VPIP and PFR stats so extremely? How does the agent adjust these ranges based on sample size? This player seems very passive. I expect a lot of limping and calling. If his PFR/VPIP is 10% and his VPIP is 43% it seems he’s only raising very strong hands (top 4%). This should be even more narrow in EP and wider in late position if EP has a clue. Even if they raise a static 4% across all positions regardless of other factors it will be much more narrow than the range you’ve provided. Also, the percentages don’t seem to add to 100% of his raising range. Lastly why are there so much difference in hands with the same amount of combos? How does this translate to real life? If we are equally likely to get AKs and AQs I would expect them to make the same p*100 slice of our overall range for example. Are we limping or folding AQs every 100th time?”
¹⁹ “BB stats are considerably tighter and much more aggressive. We should not expect much calling at all, otherwise there would be a greater gap between is VPIP and PFR. The 82% ratio of PFR/VPIP puts BB at like 17/14. VERY rarely does he flat a raise pre. He is raising of folding almost always. The 3% gap could be from getting a walk in the BB on occasion or flatting a raise once in a blue moon. There’s no way we should expect a call 26%. We don’t have 3 bet stats but I’d expect mostly folds from this guy and 5% calls max. And the rest raises since SB should be capped after just calling OOP.”
²⁰ “I can’t think of any hand that wants to cold call a 3 bet here except AA. Since we have AA in the SB that’s 1 combo left. I also rate this expectation of BB action as 1 out of 5. Again this player profile does not cold call much. More raise or fold. So I’d have him 4 betting AK, KK+ and folding everything else given the passive player EP raise and a SB 3 bet anyways.”
²¹ “I have no idea how you have come to the conclusion that J8dd is a +EV call from the SB against a passive EP raise and with an aggressive player behind that can squeeze. This does not seem like an even spot at all to me. We have the worst position with the worst starting hand in a 2 or 3 way pot if we dont get squeezed. We will certainly loose more often than we win here unless our opponents are doing something very wrong. Position matters in poker. The J8dd will be more +EV from BU than from SB but in both cases we are behind most of EPs raising range and will have to win with post flop aggression or by out flopping EP.”
²² “KK is a raise here OOP with the BB left to act behind. I’m not sure how you are coming up with the EV numbers but if we aren’t raising KK here then what are we raising? Only AA? We need to be able to build large pots with large hands especially OOP. If they all fold there is no rake and we move to the next hand quickly. I rate this estimation as 2 out of 5.”
²³ “As the agent already has AA, I think the chance of EP also having AA should be lower than 3,7%”

Questionnaire concerning quality of artificial poker play

All your data will be handled for the sole purpose of gaining insight in the quality of play of an artificial poker playing agent. Your findings will be processed and included in a bachelor’s thesis for artificial intelligence by Damiaan Reijnaers at the University of Amsterdam (https://uva.nl): ‘An approach of solving early phase multiplayer No-Limit Hold’em poker using empirical Bayesian statistics and vector spaces.’

Please answer some questions about yourself first.

1. Are you aware of the rules of No-Limit Texas Hold’em poker?
   O Yes
   O No
2. In how many online hands have you participated over the past 5 years?
   O None
   O Less than 10,000
   O Between 10,000 and 100,000
   O More than 100,000
3. To what degree do you consider yourself aware of poker strategy?
   O I have studied a lot of strategy
   O I have picked up some strategy
   O I am not informed about any strategical theory of the game
4. Please share your online poker username so I can verify your winnings.
   O I do not want to share my username but I am a profitable player
   O I do not want to share my username and I am purely a recreational player
   O My username is: ......

In the regarded work, methods are described for creating an artificial program concerned with making decisions given pre-flop poker situations. Please note:

• The algorithm has not been ‘fed’ any kind of strategy nor bet sizing.
• The algorithm takes into account situational factors such as the aggressiveness of opponents at the table, stack sizes and position.
• All introduced situations involve Zoom¹ hands at $0.05/$0.10 stakes.

¹ ‘Zoom poker,’ released in 2012 by PokerStars, is a variant of cash game poker in which opponents change after every hand. Please consult the following webpage for more information: https://www.pokerstars.com/nl/poker/zoom/.

Let us consider the situation illustrated in figure 1 below. You are on the small blind. The player on the first position raised to $0.25 and it folds to you. All players have a stack size of $10.00. You have the following information on your opponents at your disposal:

• EP has an estimated VPIP-value² of 43 and a PFR/VPIP-value³ of 10.

• BB has an estimated VPIP-value of 17, and a PFR/VPIP-value of 82.
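As a small worked example of how these two statistics combine: multiplying a player’s VPIP by the PFR/VPIP ratio gives the share of all dealt hands that the player raises pre-flop. The function name below is hypothetical and the calculation is only a sketch based on the definitions of the two metrics.

```python
def preflop_raise_percentage(vpip: float, pfr_over_vpip: float) -> float:
    """Percentage of all dealt hands a player raises pre-flop.

    Both inputs are on a 0-100 scale: `vpip` is the percentage of hands
    played voluntarily and `pfr_over_vpip` the percentage of those played
    hands that were entered with a raise.
    """
    if not (0 <= vpip <= 100 and 0 <= pfr_over_vpip <= 100):
        raise ValueError("both statistics must lie between 0 and 100")
    return vpip * pfr_over_vpip / 100.0


ep_pfr = preflop_raise_percentage(43, 10)  # about 4.3% of hands
bb_pfr = preflop_raise_percentage(17, 82)  # about 13.9% of hands
```

Under these numbers EP plays many hands but rarely raises, whereas BB plays few hands and raises most of them.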

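[Figure: six-handed table with the positions EP, MP, CO, BU, SB and BB. EP has raised to $0.25, SB has posted $0.05 and BB has posted $0.10; total pot: $0.40. The agent, seated in SB, holds A♣ A♡.]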

Figure 1: Situation in which the agent is seated in the small blind and faces a raise from the first position with a size of 166.67% of the pot (a raise to $0.25 into the $0.15 of posted blinds).

For this particular opponent in this particular situation, the agent has estimated part of the hole card range⁴ of the player raising in the first position to be as shown in table 1 below (note that concepts like combos⁵ are automatically ‘accounted for’ in these percentages). The agent bases the estimation on all situational factors (such as the VPIP-values or stack sizes of the opponents the agent is seated with) but does not directly calculate a hole card range for an opponent by using e.g. VPIP values or stack sizes.

² A VPIP-value for a player is a value ranging from 0 to 100, measuring the degree of looseness of a player. The value is the frequency of a player voluntarily putting money into a pot. If a player has a VPIP-value of 40, the player generally plays 40% of the hands it is dealt.
³ A PFR/VPIP-value for a player is a value ranging from 0 to 100, measuring the degree of aggression for a specific player. It is the frequency with which a player voluntarily puts money into the pot by raising, divided by the player’s VPIP-value.
⁴ A weighted group of cards a certain player has in a particular situation.
⁵ Every type of starting hand (such as ‘AKs’) comes in different ‘combinations’: suited cards (such as ‘AQs’) have four combinations, while pocket pairs (such as ‘QQ’) come in six different combinations and offsuit cards (such as ‘98o’) have twelve different combinations of suits.

Table 1: Estimated hole card range for a player who raised from first position.

p ⋅ 100    hole cards    p ⋅ 100    hole cards    p ⋅ 100    hole cards
5.69(%)    AKo           4.69(%)    AQo           4.14(%)    AJo
3.99(%)    QQ            3.88(%)    KK            3.70(%)    AA
3.61(%)    KQo           3.56(%)    JJ            3.28(%)    TT
2.83(%)    99            2.60(%)    88            2.53(%)    ATo
2.24(%)    77            2.22(%)    KJo           2.06(%)    AKs
2.02(%)    66            1.76(%)    55            1.72(%)    AQs
1.71(%)    AJs           1.55(%)    KQs           1.51(%)    ATs
1.42(%)    QJs           1.40(%)    QJo           1.38(%)    44
1.34(%)    KJs           1.28(%)    JTs           1.17(%)    QTs
1.15(%)    A9s           1.13(%)    33            1.11(%)    KTs
1.08(%)    T9s           0.98(%)    A5s           0.94(%)    A8s
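The combination counts that the percentages above account for, and the way known cards reduce them, can be illustrated as follows. This is a minimal sketch and not the agent’s actual range estimation: the card notation (rank followed by a lowercase suit letter) and the function name are assumptions made for illustration.

```python
from itertools import combinations, product
from typing import Iterable, List, Tuple

SUITS = "cdhs"  # clubs, diamonds, hearts, spades


def combos(hand_class: str, dead_cards: Iterable[str] = ()) -> List[Tuple[str, str]]:
    """Enumerate the suit combinations of a starting hand such as 'QQ', 'AKs' or '98o'.

    `dead_cards` are cards known to be elsewhere (e.g. the agent's own hole
    cards, written like 'Ac'); combinations containing them are removed.
    """
    dead = set(dead_cards)
    first, second = hand_class[0], hand_class[1]
    if len(hand_class) == 2 and first == second:      # pocket pair
        cards = [first + suit for suit in SUITS]
        result = list(combinations(cards, 2))
    elif len(hand_class) == 3 and first != second and hand_class[2] == "s":
        result = [(first + suit, second + suit) for suit in SUITS]
    elif len(hand_class) == 3 and first != second and hand_class[2] == "o":
        result = [(first + s1, second + s2)
                  for s1, s2 in product(SUITS, repeat=2) if s1 != s2]
    else:
        raise ValueError(f"unrecognised hand class: {hand_class!r}")
    return [combo for combo in result if dead.isdisjoint(combo)]


# Without any known cards: 6 pairs, 4 suited and 12 offsuit combinations.
unblocked = (len(combos("QQ")), len(combos("AQs")), len(combos("98o")))

# When the agent itself holds two aces, only one AA combination remains for an opponent.
agent_cards = ("Ac", "Ah")
remaining_aa = len(combos("AA", agent_cards))
```

Weighting each hand class by its remaining number of combinations is one simple way in which the agent’s own hole cards could be reflected in an opponent’s range.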

Please answer the following questions below.

5. Rate the estimation of the hole card range for the opponent in the first position from 1 to 5 (1 = poor, 5 = excellent) ......

6. If you were the agent, with what action would you proceed in this situation? O Fold O Call $0.25 O Raise to ......

Let us assume that the agent has called the bet. The agent now approximates the probability of the big blind calling behind, folding and raising as 26.3%, 68.9% and 4.8% respectively.

7. Rate the agent’s approximation of the big blind’s action assuming the agent called EP’s bet, from 1 to 5......

Now let us assume that the agent raises the bet to $1.04. The agent now ‘thinks’ that the big blind will call 15.9% of the time, while folding 84.1% of the time. In this situation, the agent leaves a re-raise from the big blind out of the equation.

8. Rate the agent’s approximation of the big blind’s action assuming the agent raised EP’s bet, from 1 to 5......

The agent chooses the raising size from which it expects the most value. Raising sizes have not been pre-defined. Please note that the agent is designed to allow for balanced⁶ raising sizes in order to avoid exploitation by opponents.

9. Rate the raising size of the agent from 1 to 5: ......

Instead of A♣ A♡, the agent holds J♢ 8♢. The agent estimates that folding and calling yield approximately the same expected value⁷. Folding has an expected value of $0.00 while the agent expects to win $0.007 by calling. If the agent held J♢ 8♣, it would fold. Additionally, if the agent were seated in BU instead of SB, the agent would fold J♢ 8♢ as well.

10. Rate the agent’s estimation for its expected value in these situations from 1 to 5: ......

Questions continue on the next page.

⁶ Please consider: The Pokerbank, Range balancing, http://www.thepokerbank.com/strategy/concepts/range-balancing/

⁷ The expected value of an action is the amount of money a player, on average, is expected to win when making this play.

A similar situation occurs when the agent would have been dealt K♠ K♣. The agent’s estimated expected value for calling is $0.25 while raising to $0.84 yields $0.21.

11. Rate the agent’s estimation for its expected value in these situations from 1 to 5: ......
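The comparison of expected values in the two situations above can be illustrated with a minimal sketch. It is not the agent's implementation: the function names `expected_value` and `best_action` are hypothetical, and the payoffs in the last example are invented purely for illustration. Only the per-action expected values ($0.00 and $0.007; $0.25 and $0.21) and the call/fold probabilities (15.9% and 84.1%) are taken from the text.

```python
def expected_value(outcomes):
    """Probability-weighted average payoff over (probability, payoff) pairs."""
    total = sum(p for p, _ in outcomes)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * payoff for p, payoff in outcomes)


def best_action(action_values):
    """Return the action with the highest expected value."""
    return max(action_values, key=action_values.get)


# Expected values as quoted in the questionnaire (in dollars).
j8_suited_in_sb = {"fold": 0.00, "call": 0.007}
kings_in_sb = {"call": 0.25, "raise to $0.84": 0.21}

# Hypothetical payoffs (NOT from the thesis): win $0.60 when the big blind
# folds, lose $0.50 on average when the big blind calls.
toy_raise_ev = expected_value([(0.841, 0.60), (0.159, -0.50)])

if __name__ == "__main__":
    print(best_action(j8_suited_in_sb))
    print(best_action(kings_in_sb))
    print(round(toy_raise_ev, 4))
```

Picking the maximum is the simplest possible decision rule; the balanced raising sizes mentioned earlier suggest that the agent does not always follow such a rule strictly.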

Let us again consider the agent to be seated in BU, facing the same raise from EP and a fold from MP. The agent now prefers raising K♠ K♣.

12. What do you think about this change of decisions? O It’s good. O It’s bad. O Both options are fine.

Considering the same situation, with the agent in BU, the agent calls both A♠ 10♠ and A♠ J♢ but slightly prefers the former hand.

13. Do you agree (not minding whether you would call or not yourself)? O No, I would rather prefer A♠ J♢ in this spot. O Yes. I like A♠ 10♠ more.

14. If you were the agent, with what action would you proceed in this situation if you would hold 10♣ 10♠? O Fold O Call $0.25 O Raise to ......

Thank you for filling in these questions. The answers can be sent by email to [email protected].