A Data-Driven Approach for Gin Rummy Hand Evaluation
PRELIMINARY PREPRINT VERSION: DO NOT CITE
The AAAI Digital Library will contain the published version some time after the conference.

Sang T. Truong (DePauw University), Todd W. Neller (Gettysburg College)
sangtruong [email protected], [email protected]

Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

We develop a data-driven approach for hand strength evaluation in the game of Gin Rummy. Employing Convolutional Neural Networks, Monte Carlo simulation, and Bayesian reasoning, we compute both offensive and defensive scores of a game state. After only one training cycle, the model was able to make sophisticated and human-like decisions with a 55.4% ± 0.8% win rate (90% confidence level) against a Simple player.

Introduction

Although Gin Rummy was one of the most popular card games of the 1930s and 1940s (Parlett 2008, 2020) and remains one of the most popular standard-deck card games (Ranker 2020; BoardGameGeek.com 2020), it has received relatively little Artificial Intelligence (AI) research attention. Here, we develop initial steps towards hand strength evaluation in the game of Gin Rummy.

Gin Rummy is a 2-player imperfect information card game played with a standard (a.k.a. French) 52-card deck. Ranks run from aces low to kings high. The game's object is to be the first player to score 100 or more points accumulated through the scoring of individual rounds. We follow standard Gin Rummy rules (Pagat.com 2020) with North American 25-point bonuses for both gin and undercut.

A meld is either at least 3 cards of a rank or at least 3 cards of a suit with consecutive ranks. Cards not in melds are deadwood. Cards have associated point values, with aces being 1 point, face cards being 10 points, and other number cards having points according to their number. Deadwood points are the sum of card points from all deadwood cards.

At the beginning of each round, each player is dealt a hand of 10 cards, and one card is dealt face-up to start the discard pile. The rest of the cards form a face-down draw pile. On each player's turn, the player takes two actions: (1) pick up a card (from either the face-up discard pile or the face-down draw pile) and (2) discard a card to the face-up discard pile. A player with total deadwood points less than or equal to 10 may also knock to end the round. Deferring rule details to (Pagat.com 2020), the player with less deadwood generally scores according to the deadwood difference.

Players need to make a series of non-trivial choices in Gin Rummy. This paper focuses on two play strategies that we call "offensive" and "defensive". Discarding offensively, the player focuses on developing their hand while potentially helping their opponent complete a meld. Discarding defensively, the player focuses on discarding cards that are unlikely to be used by their opponent at the cost of not improving their hand. An expert player often alternates between these strategies (Shankar 2015). Determining when to play which strategy requires metrics to assess the goodness of a hand (offensive score) and the safeness of each discardable card (defensive score).

Other than some typical suggestions for building a good hand (e.g. low deadwood points, high melds), developing patterns is an empirical rule for constructing an offensive hand. For example, the player should try to form a triangle of three cards (e.g. 4♥-5♥-5♠), as this pattern becomes a meld if the player can draw a 3♥, 6♥, 5♦, or 5♣. Other patterns, such as a square of four cards, i.e. two pairs in adjacent ranks, also have advantages (Shankar 2015; Rubl.com 2020; RummyTalk.com 2020). Looking for all possible patterns is a daunting task, as there might be many good patterns to follow and bad ones to avoid. Playing offensively while balancing between reducing deadwood, increasing melds, and assembling patterns under time pressure is even more challenging. Since we do not know any system for quantitatively computing offensive scores, building a model to accomplish the above task is our first goal.

The existing defensive score system relies on the basic counting principle. For example, without any other information, since a 10 (denoted "T") can be used in six different ways by the opponent to form a meld of three cards (e.g. 89T, 9TJ, TJQ, and 3×TTT), it has a discard risk score of 6. Similarly, a K has a discard risk score of 4. Thus, without other information, it is safer to discard a K than to discard a T. The discard risk score can evolve as the game progresses. For example, if the player knows that the opponent does not have any 9 or J, their T's discard risk score is only 3. Hence, it is now relatively safer to discard the T than the K. This existing defensive scoring system gives an upper bound on the discard risk score of a card, since it only considers the worst cases given the available information. Refining this scoring system is our second goal. Beyond an upper bound, we aim to improve the estimation of each card's discard risk score to help defensive players make a more informed decision.
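The basic counting principle above can be sketched in code. The helper below is our own illustration, not part of the paper's system: it counts the three-card melds an opponent could complete with a card of a given rank, optionally excluding ranks known to be unavailable.

```python
# Our own illustration of the basic counting principle (hypothetical helper,
# not the paper's implementation). It counts the three-card melds an
# opponent could form using a card of the given rank.

RANKS = "A23456789TJQK"

def discard_risk(rank, dead_ranks=frozenset()):
    """Upper-bound discard risk: the number of 3-card melds containing
    `rank`. `dead_ranks` holds neighboring ranks known to be unavailable
    to the opponent (e.g. every 9 and J is accounted for)."""
    i = RANKS.index(rank)
    risk = 0
    # Same-suit runs of three consecutive ranks that include `rank`.
    for start in (i - 2, i - 1, i):
        window = [start, start + 1, start + 2]
        if window[0] < 0 or window[-1] >= len(RANKS):
            continue  # the run would fall off the A..K range
        if any(RANKS[j] in dead_ranks for j in window if j != i):
            continue  # a needed neighboring rank is unavailable
        risk += 1
    # Three distinct sets: choose 2 of the other 3 cards of the same rank.
    risk += 3
    return risk
```

With this helper, `discard_risk("T")` returns 6 and `discard_risk("K")` returns 4, matching the scores above; ruling out all 9s and Js drops the T's score to 3.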
With the two aforementioned goals, we present a data-driven approach for hand evaluation. We show that a Convolutional Neural Network (CNN) and a Monte Carlo (MC) simulation can be used to estimate the offensive score of a hand, and that Bayesian reasoning can be used to estimate the discard risk score of each discardable card. With only a single training cycle, our player achieves a 55.4% ± 0.8% win rate (90% confidence level (CL)) against a Simple player (SP), the baseline of our research provided by (Neller 2020). The SP implements a simple strategy:

• Ignore opponent actions and cards that are not in play.
• Only draw the face-up card to complete or improve a meld. Otherwise, draw face-down.
• Discard a highest-deadwood unmelded card, breaking ties randomly and without regard to breaking up potential meld patterns (e.g. pairs).
• Knock as soon as possible.

Our data-driven approach helps the player make sophisticated and human-like decisions to improve play performance against a SP for baseline comparison.
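The SP's discard and knock rules can be sketched as follows. This is a simplified illustration of ours, not the provided implementation: a card is a (rank index, suit index) pair, and a card counts as melded if it sits in any 3+ set or run, rather than solving the optimal meld partition.

```python
import random

# Simplified sketch (ours, not the provided SP implementation) of the
# Simple player's discard and knock rules. A card is a pair
# (rank_index 0..12 for A..K, suit_index 0..3).

RANK_POINTS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]  # A..K

def melded_cards(hand):
    """Approximate meld membership: a card counts as melded if it is in a
    set of 3+ equal ranks or a same-suit run of 3+ consecutive ranks."""
    melded = set()
    for r in range(13):  # sets
        same = [c for c in hand if c[0] == r]
        if len(same) >= 3:
            melded.update(same)
    for s in range(4):   # runs
        ranks = sorted(c[0] for c in hand if c[1] == s)
        run = []
        for r in ranks + [99]:  # 99 is a sentinel that flushes the last run
            if run and r == run[-1] + 1:
                run.append(r)
            else:
                if len(run) >= 3:
                    melded.update((x, s) for x in run)
                run = [r]
    return melded

def sp_discard(hand):
    """Discard a highest-deadwood unmelded card, breaking ties randomly."""
    melded = melded_cards(hand)
    unmelded = [c for c in hand if c not in melded] or list(hand)
    top = max(RANK_POINTS[c[0]] for c in unmelded)
    return random.choice([c for c in unmelded if RANK_POINTS[c[0]] == top])

def sp_knock(hand):
    """Knock as soon as possible, i.e. when deadwood points <= 10."""
    melded = melded_cards(hand)
    return sum(RANK_POINTS[c[0]] for c in hand if c not in melded) <= 10
```

For example, holding A♠-2♠-3♠ plus a T♥, the sketch discards the T (the only unmelded card, 10 deadwood points) and knocks immediately.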
Related Work

As previously mentioned, there is little prior AI research on the game of Gin Rummy in particular. TD-Rummy and EVO-Rummy (Kotnik and Kalita 2003) perform reinforcement learning using an artificial neural network (ANN). That study seeks to compare traditional temporal-difference learning to an evolutionary learning of ANN weights, and thus primarily serves as a comparison of two learning methods.
It does not fully implement the rules of Gin Rummy, e.g. there is no laying off of cards. The only baseline for comparison is a player with a randomly initialized ANN, so it is unclear how the players would perform against a baseline SP with a reasonable strategy.

More relevant is the literature on Poker AI. Like Gin Rummy, Poker is a stochastic, imperfect information card game, albeit with a significantly smaller number of information sets. DeepStack (Moravčík et al. 2017) and Libratus (Brown and Sandholm 2018) effectively solved Heads-up, no-limit Texas Hold 'Em Poker, but each require consid-

"player"). Variable o ∈ R^(1×4×13) is the opponent hand estimation, which will be explained in more detail in the Defensive Score section. Variable y ∈ R represents the goodness of a hand. At the end of each round (through which both players' hands change), the winner scores s. For a game of r rounds with index counting down from r − 1 to 0, for a hand at round q with a discount factor γ:

    y =  s · γ^q   if the player wins
    y = −s · γ^q   if the player loses        (2)

Having γ = 1 implies no discount. The score of the last hand is never discounted, as its index q always equals 0.

      A 2 3 4 5 6 7 8 9 T J Q K
    ♠ 0 0 0 0 1 0 0 0 0 0 0 1 0
    ♥ 0 0 1 0 0 1 0 0 0 0 0 1 0
    ♦ 0 0 0 0 0 1 1 0 0 0 0 0 0
    ♣ 1 0 1 0 0 0 0 0 0 0 1 0 0

Figure 1: [5♠, Q♠, 3♥, 6♥, Q♥, 6♦, 7♦, J♣, A♣, 3♣] 2-dimensional binary hand representation, in which each row is a suit and each column is a rank. 1 = having a card.

We employ a neural network ŷ = f̂ to approximate y = f:

    y = ŷ + ε,  ε ∼ N(0, σ²)        (3)

where ε is Gaussian error. Since f̂ is a parametric function, finding f̂ is equivalent to finding a set of weights that minimizes a given loss (e.g. mean square error). From this perspective, fitting a model to a dataset is summarizing information from that data into a set of parameters. Hence, a model trained on data collected from a player can only be as good as that player. For a model to yield superior play performance against a SP, we would want to train that model on a dataset that features play superior to that of the SP. One way to approach this is to explore possible lines of play with MC simulation.

We explore the state space to find superior lines of play to that of the SP by using MC simulation on a game in which a Random player (RP) plays against a SP.
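The binary hand encoding of Figure 1 and the discounted label y of Equation (2) can be sketched as follows. The function names are our own; the suit row order ♠, ♥, ♦, ♣ and the rank column order are taken from the figure.

```python
# Our own sketch (hypothetical names) of the 4x13 binary hand encoding of
# Figure 1 and the discounted training label y of Equation (2).

SUITS = "SHDC"            # row order: spade, heart, diamond, club (Figure 1)
RANKS = "A23456789TJQK"   # column order: ace low to king high

def encode_hand(cards):
    """cards: strings like '5S' or 'QH' -> 4x13 nested list of 0/1."""
    x = [[0] * 13 for _ in range(4)]
    for c in cards:
        x[SUITS.index(c[-1])][RANKS.index(c[0])] = 1
    return x

def label(score, q, won, gamma=1.0):
    """Equation (2): y = s * gamma**q if the player wins, else -s * gamma**q.
    q counts down from r - 1 to 0, so the last hand is never discounted."""
    y = score * gamma ** q
    return y if won else -y
```

For the hand in Figure 1, `encode_hand(["5S", "QS", "3H", "6H", "QH", "6D", "7D", "JC", "AC", "3C"])` reproduces the four rows of the figure, and `label(s, 0, True)` leaves the last hand's score undiscounted for any γ.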