An Expert-Level Card Playing Agent Based on a Variant of Perfect Information Monte Carlo Sampling
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015)

Florian Wisser
Vienna University of Technology, Vienna, Austria

Abstract

Despite some success of Perfect Information Monte Carlo Sampling (PIMC) in imperfect information games in the past, it has been eclipsed by other approaches in recent years. Standard PIMC has well-known shortcomings in the accuracy of its decisions, but has the advantage of being simple, fast, robust and scalable, making it well-suited for imperfect information games with large state-spaces. We propose Presumed Value PIMC, resolving the problem of overestimation of the opponent's knowledge of hidden information in future game states. The resulting AI agent was tested against human experts in Schnapsen, a Central European 2-player trick-taking card game, and performs above human expert level.

1 Introduction

Perfect Information Monte Carlo Sampling (PIMC) in tree search of games of imperfect information has been around for many years. The approach is appealing for a number of reasons: it allows the usage of well-known methods from perfect information games, its complexity is magnitudes lower than the problem of weakly solving a game in the sense of game theory, it can be used in a just-in-time manner even for games with a large state-space, and it has proven to produce competitive AI agents in some games. Since we will mainly deal with trick-taking card games, let us mention Bridge [Ginsberg, 1999], [Ginsberg, 2001], Skat [Buro et al., 2009] and Schnapsen [Wisser, 2010].
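To fix ideas, the basic PIMC loop can be sketched in a few lines. The following is a minimal illustration and not the implementation used in this paper; sample_world, legal_moves and perfect_info_value are hypothetical game-specific callbacks a concrete game engine would have to provide.

```python
import random
from collections import defaultdict

def pimc_choose_move(info_state, num_samples=100, rng=random):
    """Straight PIMC: repeatedly deal the hidden information at random
    (consistent with everything MAX knows), solve each resulting
    perfect-information game, and pick the move with the best average payoff."""
    totals = defaultdict(float)
    moves = legal_moves(info_state)              # hypothetical: moves available to MAX
    for _ in range(num_samples):
        world = sample_world(info_state, rng)    # hypothetical: one consistent deal of the hidden cards
        for move in moves:
            # hypothetical: value of the move under full information in this world,
            # e.g. obtained by minimax / alpha-beta on the perfect-information tree
            totals[move] += perfect_info_value(world, move)
    return max(moves, key=lambda m: totals[m] / num_samples)
```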
In recent years, research in AI in games of imperfect information has been heavily centered around equilibrium approximation algorithms (EAA). One of the reasons might be the concentration on simple poker variants, as in the renowned annual computer poker competition. Wherever they are applicable, agents like Cepheus [Bowling et al., 2015] for heads-up limit hold'em (HULHE) will not be beaten by any other agent, since they are nearly perfect. However, while HULHE has a small state-space (∼ 10^14), solving this problem still required very substantial amounts of computing power to calculate a strategy stored in 12 terabytes. Our prime example Schnapsen has a state-space around 10^20, so even if a near equilibrium strategy were computable within reasonable time, it would take at least 10 exabytes to store it. Using state-space abstraction [Johanson et al., 2013] EAAs may still be able to find good strategies for larger problems, but they depend on finding an appropriate simplification of manageable size. So, we think it is still a worthwhile task to search for in-time heuristics like PIMC that are able to tackle larger problems.

On the other hand, in the 2nd edition (and only there) of their textbook, Russell and Norvig [Russell and Norvig, 2003, p179] quite accurately use the term "averaging over clairvoyancy" for PIMC. A more formal critique of PIMC was given in a series of publications by Frank, Basin, et al. [Frank and Basin, 1998b], [Frank et al., 1998], [Frank and Basin, 1998a], [Frank and Basin, 2001], where the authors show that the heuristic of PIMC suffers from strategy-fusion and non-locality, producing erroneous move selection due to an overestimation of MAX's knowledge of hidden information in future game states. A further investigation of PIMC and why it still works well for many games is given by Long et al. [Long et al., 2010], who propose three easily measurable properties of a game tree meant to predict the success of PIMC in a game. More recently, overestimation of MAX's knowledge has also been dealt with in the field of general game play [Schofield et al., 2013]. To the best of our knowledge, all literature on the deficiencies of PIMC concentrates on the overestimation of MAX's knowledge. Frank et al. [Frank and Basin, 1998a] explicitly formalize the "best defense model", which basically assumes a clairvoyant opponent, and state that this would be the typical assumption in game analysis in expert texts. This may be true for some games, but clearly not for all.

Think, for example, of a game of heads-up no-limit hold'em poker played against an opponent with perfect information, who knows your hand as well as all community cards before they even appear on the table. The only reasonable strategy left against such an opponent would be to immediately concede the game, since one will not achieve much more than stealing a few blinds. And in fact, expert texts in poker never assume play against a clairvoyant opponent when analyzing the correctness of a player's actions.

In the following, and in contrast to the references above, we start off with an investigation of the problem of overestimation of MIN's knowledge, from which PIMC and its known variants suffer. We set this in context to the best defense model and show why its very assumption is doomed to produce sub-optimal play in many situations. For the purpose of demonstration we use the simplest possible synthetic games we could think of. We go on to define two new heuristic algorithms, first Presumed Payoff PIMC, targeting imperfect information games decided within a single hand (or leg), followed by Presumed Value PIMC for games of imperfect information played in a sequence of hands (or legs), both dealing with the problem of MIN overestimation. Finally, we present an experimental analysis on a simple synthetic game, as well as on the Central-European trick-taking card game Schnapsen.

[Figure 1: PIMC Tree for XX]
[Figure 2: PIMC Tree for XI]

2 Background Considerations

In a 2-player game of imperfect information there are generally 4 types of information: information publicly available (ιP), information private to MAX (ιX), information private to MIN (ιI) and information hidden to both (ιH). To exemplify the effect of "averaging over clairvoyancy" we introduce two very simple 2-player games: XX, with only two successive decisions by MAX, and XI, with two decisions, the first one by MAX followed by one of MIN. The reader is free to omit the rules we give and view the game trees as abstract ones. Both games are played with a deck of four aces, ♠A, ♥A, ♦A and ♣A. The deck is shuffled and each player is dealt 1 card, with the remaining 2 cards lying face down on the table. ιP consists of the actions taken by the players, ιH are the 2 cards face down on the table, ιX the card held by MAX and ιI the card held by MIN.

In XX, MAX has to decide whether to fold or call first. In case MAX calls, the second decision to make is to bet whether the card MIN holds matches its own card in color (match, both red or both black) or differs in color (diff). Fig. 1 shows the game tree of XX with payoffs, assuming MAX holds ♠A.
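As a minimal sketch (assuming MAX holds ♠A, with the three worlds corresponding to MIN holding ♥A, ♦A or ♣A, and the assignment of the two vectors to diff and match inferred from the card colors rather than stated explicitly in the text), the payoff structure of XX can be written down as follows:

```python
# Worlds are indexed by the card MIN may hold; MAX is assumed to hold ♠A.
WORLDS = ["♥A", "♦A", "♣A"]

# Payoff vectors for MAX over these worlds (our reading of Fig. 1):
FOLD      = [1, 1, 1]     # node B: folding yields an ensured payoff of 1 in every world
BET_DIFF  = [1, 1, -2]    # call, then bet on differing colors (wins against the red aces)
BET_MATCH = [-1, -1, 2]   # call, then bet on matching colors (wins against ♣A only)

def expectation(vec):
    """Expected payoff under a uniform distribution over the possible worlds."""
    return sum(vec) / len(vec)

# Folding is worth 1 with certainty, while either bet is worth 0 in expectation,
# so a non-clairvoyant MAX should fold at node A.
print({"fold": expectation(FOLD), "diff": expectation(BET_DIFF), "match": expectation(BET_MATCH)})
# -> {'fold': 1.0, 'diff': 0.0, 'match': 0.0}
```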
In MIN (ιI ) and information hidden to both (ιH ). To exemplify MAX node C, vector minimax picks the vector with the high- the effect of “averaging over clairvoyancy” we introduce two est mean, instead of building a vector of point-wise max- very simple 2-player games: XX with only two successive ima for each world, i.e. the vector attached to C would be decisions by MAX, and XI with two decisions, first one by either of (1; 1; −2) or (−1; −1; 2), not (1; 1; 2), leading to MAX followed by one of MIN. The reader is free to omit the the correct decision to fold. We list the average payoffs for rules we give and view the game trees as abstract ones. Both various agents playing XX on the left-hand side of Table games are played with a deck of four aces, ♠A, ~A, }A and 1. Looking at the results we see that a uniformly random |A. The deck is shuffled and each player is dealt 1 card, with agent (RAND) plays worse than a Nash equilibrium strategy the remaining 2 cards lying face down on the table. ιP con- (NASH), and SP plays even worse than RAND. VM stands sists of the actions taken by the players, ιH are the 2 cards for vector minimax, but includes all variants proposed by face down on the table, ιX the card held by MAX and ιI the Frank et al., most notably payoff-reduction-minimax, vector- card held by MIN. αβ and payoff-reduction-αβ. Any of these algorithms solves In XX, MAX has to decide whether to fold or call first. the deficiency of MAX-strategy-fusion in this example and In case MAX calls, the second decision to make is to bet, plays optimally. whether the card MIN holds matches color with its own card Let us now turn to the game XI, its game tree given in (match, both red or both black) or differs in color (diff). Fig. 2. The two differences to XX are that MAX’s payoff at Fig. 1 shows the game tree of XX with payoffs, assuming node B is −1, not 1, and C is a MIN (not a MAX) node.