arXiv:1804.04789v1 [cs.GT] 13 Apr 2018 taeiscnb ucsfli utpae ae fe all after games 3- multiplayer in equili a realisti successful Nash of be on in despi based can variety agents that, strategies guarantees, a shows equilibrium theoretical defeat of This We Nash lack game. to players. exact imperfect-information able two an player than is using more that with opponents agent no games an in in gua guarantees describe no or theoretical but strong games games, have zero-sum concep zero-sum which two-player solution in equilibrium, game-theoretic antees Nash approaches approximating as Common AI. on such in problem based open major are a is players two lal nti aetesceso lyn ahequilibriu Nash a playing of success the case this in Clearly hr ohslc hi rfre pinwt probability with an option preferred Football), their (Football, select Football both strategy select where equilibrium players Oper (Opera, Nash both (i.e., three Opera when select has players game both when This depict profiles: game 1. Sexes hold, the Figure properties of in Battle additional classic these the of from highlighted none exis result, to guaranteed Nash’s still to is equilibrium Nash while [5]. players, time polynomial in tha computed fact the be to can due it games additiona no-limit zero-sum is two-player player equilibrium for Nash two compelling [4]. of [3], the game in ’em humans large-scale hold Texas strongest popular the the defeat on in successful, to based world able very been agents been even fact have have and quit in the equilibrium it and in Nash made concept, approximating 2 has solution equilibrium and This compelling Nash 1 property. a games player “unbeatability” zero-sum of this two-player enjoys role suc for follows the he So alternates if game. case and worst the strategy in a tying) least gua can at player (or ou a guarantee winning that expecta- worst-case means (in this best case Essentially the a strategies. worst is all the value guarantees in the that player strategy and the a tion), for game such the von work of playing value that that In [2]. proved Neumann Neumann von John strateg by /maximin of earlier compel concept developed the particularly with is coincides the it concept of as solution losses the the finite equal player), competitiv player in one other (i.e., of exists games winnings always the PhD zero-sum where one games pioneering two-player that the For proving to [1]. due Nash games part John large of in thesis theory, game in cept ucsflNs qiiru gn o 3-Player a for Agent Equilibrium Nash Successful Abstract o o-eosmgmsadgmswt oeta two than more with games and games non-zero-sum For con- solution central the as emerged has equilibrium Nash mi:[email protected] Email: im ec,Foia 33139 Florida, Beach, Miami Cetn togaet o ae ihmr than more with games for agents strong —Creating azre Research Ganzfried a Ganzfried Sam .I I. NTRODUCTION mefc-nomto Game Imperfect-Information lrd nentoa University International Florida mi:[email protected] Email: im,Foia 33199 Florida, Miami, rantee brium . due t ling, utnNowak Austin ea te of t a)), ies lly ed as n- 5 3 m r- ts d h e e c t . nvr ag ae hr h ubro aeieain and iterations game of number the where games large partic very difficult, in very success is However, ex- exploitation setting.) opponent successful that performing as for much even exploit as agents fully zero- ploitative not opponents two-player may suboptimal equilibrium for that of Nash helpful mistakes as (Note well, very as [16]. games potentially [15], sum be opponents and would learn the to this simply of attempt just also weaknesses would not exploit and would concept solution approach theory a successful follow game most and the intelligence at course, artificial one of important intersection most the the the problem—perhaps open particular succes an are in remains strategies equilibrium and Nash whether games, of question multiplayer and non-zero-sum ro na3pae mefc-nomto ae[3,[14]. [13], game imperfect-information approximatio 3-player of a degree Nash in small approximate very error provably that a [12]. developed to [11], techniques equilibrium [10], been [9], also [8], settings have degree varying different with in Sexes) success the of of Battle as such heuri (i been games games several matrix strategic-form has though for that developed [7], it been conjectured have [6], approaches equilibrium, widely exists is algorithm Nash efficient and a PPAD-complete no even compute be Furthermore, to payoff. to proven players’ wanted two we other negative if the equals payoff of by whose sum game player the zero-sum third three-player “dummy” a a into adding converted be general-s three-player two-player can any Even game as special, games). not are zero-sum can- games it two-player zero-sum above in described as occur games (though not in players for two occur despite just than can more phenomenon 0 is with same example of the this payoff game, two-player While possible a equilibrium. worst Nash the a receive following Footbal will first plays and wife the equilibrium the Nash from hi second follows strategy the husband the from player. her but strategy other follows Opera, plays the wife and equilibrium by the Nash chosen if strategy example, the For on heavily depends hs h rbe fhwt raesrn gnsfor agents strong create to how of problem the Thus, Football i.1 ateo h ee game. sexes the of Battle 1. Fig. Opera lrd nentoa University International Florida mi:jpina011@fiu.edu Email: im,Foia 33199 Florida, Miami, Opera (0,0) (3,2) onirPinales Joannier Football (2,3) (0,0) There ularly fully sful, Of . stic um .e., l, n s s observations of the opponents’ play is small compared to the have questionable skill level, so it is not clear whether the number of game states. And furthermore, such approaches CFR-based approaches actually produce high-quality strategies are susceptible to being deceived and counterexploited by or whether they just produced strategies that happened to sophisticated opponents. It is clear that pure exploitation outperform mediocre opponents and would have done very approaches are insufficient to perform well against a mix of poorly against strong ones. While these CFR-based approaches opponents of unknown skill level, and that a strong strategy are clearly the best so far and seem to be promising, they do rooted in game-theoretic foundations is required. not conclusively address the question of whether Nash equi- The strongest existing agents for large multiplayer games librium strategies can be successful in practice in interesting have been based on approaches that attempt to approximate multiplayer games against realistic opponents. strategies [17], [18]. In particular, they apply In this paper we create an agent based on an exact Nash the counterfactual regret minimization algorithm [19], [20], equilibrium strategy for the game of 3-player Kuhn poker. which has also been used for two-player zero-sum games While this game is relatively small, and in particular quite and has resulted in super-human level play for both limit small compared to 3-player limit Texas hold ’em, it is far from Texas hold ’em [21] and no-limit Texas hold ’em [3], [4]. trivial to analyze, and has been used as a challenge problem at These agents have performed well in the 3-player limit Texas the Annual Computer Poker Competition for the past several hold ’em division of the Annual Computer Poker Competi- years [22]. A benefit of experimenting on a small problem tion which is held annually at the AI conferences AAAI or is that exact Nash equilibrium strategies can be computed IJCAI [22]. Counterfactual regret minimization is an iterative analytically [23]. That paper computed an infinite family self-play algorithm that is proven to converge to Nash equi- of Nash equilibrium strategies, though it did not perform librium in the limit for two-player zero-sum games. It can be experiments to see how they performed in practice against integrated with various forms of Monte Carlo sampling in or- realistic opponents. The poker competition also did not publish der to improve performance both theoretically and in practice. any details of the agents who participated, so it is unclear what For multiplayer and non-zero-sum games the algorithm can approaches were used by the successful agents. We ran exper- also be run, though the strategies computed are not guaranteed iments with our equilibrium agent against 10 agents that were to form a Nash equilibrium. It was demonstrated that it does created recently as part of a class project. These agents were in fact converge to an ǫ-Nash equilibrium (strategy profile in computed using a wide range of approaches, which included which no agent can gain more than ǫ by deviating) in the deep learning, opponent modeling, rule-based approaches, as small game of 3-player Kuhn poker, while it does not converge well as game-theoretic approaches. We show that an approach to equilibrium in Leduc hold ’em [17]. It was subsequently based on using a natural Nash equilibrium strategy is able proven that it guarantees converging to a strategy that is not to outperform all of the agents from the class. This suggests dominated and does not put any weight on iteratively weakly- that agents based on using Nash equilibrium strategies can in dominated actions [18]. While for some small games this fact be successful in multiplayer games, despite the fact that guarantee can be very useful (e.g., for two-player Kuhn pokera they do not have a worst-case theoretical guarantee. Of course high fraction of the actions are iteratively-weakly-dominated), since we just experimented on one specific game there is no in many large games (such as full Texas hold ’em) only a very guarantee that this conclusion would apply beyond this to other small fraction of actions are dominated and the guarantee is games, and more extensive experiments would be needed to not useful. Other approaches based on integrating the fictitious determine whether this conclusion would generalize. play algorithm with MDP-solving algorithms such as policy iteration have been demonstrated experimentally to converge II. THREE-PLAYER KUHN POKER to ǫ-equilibrium for very small ǫ in a no-limit Texas hold Three-player Kuhn poker is a simplified form of limit poker ’em poker tournament endgame [13], [14]. It has been proven that has been used as a testbed game in the AAAI Annual that if these algorithms converge, then the resulting strategy Computer Poker Competition for several years [22]. There is profile constitutes a Nash equilibrium (while CFR does not a single round of betting. Each player first antes a single chip have such a guarantee); however, the algorithms are not proven and is dealt a card from a four-card deck that contains one to converge in general, despite the fact that they did for the Jack (J), one Queen (Q), one King (K), and one Ace (A). game that was experimented on. The first player has the option to bet a fixed amount of one The empirical success of the 3-player limit Texas hold ’em additional chip (by contrast in no-limit games players can bet agents in the Annual Computer Poker Competition suggests arbitrary amounts of chips) or to check (remain in the hand that CFR-based approaches which are attempting to approxi- but not bet an additional chip). When facing a bet, a player mate Nash equilibrium are promising for multiplayer games. can call (i.e., match the bet) or fold (forfeit the hand). No However, the takeaway is not very clear. First, the algorithms additional bets or raises beyond the additional bet are allowed are not guaranteed to converge to equilibrium for this game, (while they are allowed in other common poker variants such and there is no guarantee on whether the strategies used by as Texas hold ’em, both for the limit and no-limit variants). If the agents constitute a Nash equilibrium or are even remotely all players but one have folded, then the player who has not close to one. Furthermore, there were only a small number folded wins the pot, which consists of all chips in the middle. of opposing agents submitted to the competition who may If more than one player have not folded by the end there is a showdown, at which the players reveal their private card and III. NASH EQUILIBRIUM-BASEDAGENT the player with the highest card wins the entire pot (which One way wonder why it is worthwhile to create agents and consists of the initial antes plus all additional bets and calls). experiment on three-player Kuhn poker, given that the game The ace is the highest card, followed by king, queen, and jack. has been “solved,” as described in the preceding section. First, As one example of a play of the game, suppose the players as described there are infinitely many Nash equilibria in this are dealt Queen, King, Ace respectively, and player 1 checks, game (and furthermorethere may be others beyond those in the then player 2 checks, then player 3 bets, then player 1 folds, family computed in the prior work). So even if we wanted to then player 2 calls; then player 3 would win a pot of 5, for a create an agent that employed a Nash equilibrium “solution,” profit of 3 from the amount he started the hand with. it would not be clear which one to pick, and the performance would depend heavily on the strategies selected by the other Note that despite the fact that 3-player Kuhn poker is only agents (who may not even be playing a Nash equilibrium at a synthetic simplified form of poker and is not actually played all). This is similar to the phenomenon described for the Battle competitively, it is still far from trivial to analyze, and contains of the Sexes Game in the introduction, where even though many of the interesting complexities of popular forms of poker the wife may be aware of all the equilibria, if she attends the such as Texas hold ’em. First, it is a game of imperfect Opera as part of the (O,O) equilibrium while the husband does information, as players are dealt a private card that the other football as part of the (F,F) equilibrium, both players obtain agents do not have access to, which makes the game more very low payoff despite both following equilibrium. A second complex than a game with that has the reason is that, as also described in the introduction, Nash same number of nodes. Despite the size, it is not trivial to equilibrium has no theoretical benefits in three-player games, compute Nash equilibrium analytically, though recently an and it is possible that a non-equilibrium strategy (particularly infinite family of Nash equilibria has been computed [23]. The one that integrates opponent modeling and exploitation) would equilibrium strategies exhibit the phenomena of bluffing (i.e., perform better, even if we expected the opponents may be sometimes betting with weak hands such as a Jack or Queen), following a Nash equilibrium strategy, but particularly if we and slow-playing (aka trapping) (i.e., sometimes checking with expect them to be playing predictably and/or making mistakes. strong hands such as a King or Ace in order to induce a bet from a weaker hand). To see why, suppose an agent X played So despite that the fact that exact Nash equilibrium strate- a simple strategy that only bet with an Ace or sometimes a gies have been computed for this game, it is still very unclear King. Then the other agents would only call the bet if they what a good approach is for creating a strong agent against a had an Ace, since otherwise they would know they are beat pool of unknown opponents. (since there is only one King in the deck, if they held a King For our agent we have decided to use a Nash equilibrium they would know that player X held an Ace). But now if the strategy that has been singled out as being more robust than other agents are only calling with an Ace, it is unprofitable for the others in prior work and that obtains the best worst- player X to bet with a King, since he will lose an additional case payoff assuming that the other agents are following one chip whenever another player holds an Ace, and will not get a of the strategies given by the computed infinite equilibrium call from a worse hand; it would be better to check and then family [23]. We depict this strategy in Table I. This table potentially call with hopes that the other player is bluffing (or assigns values for the 21 free parameters in the infinite family to fold if you think the player is bluffing too infrequently). A of Nash equilibrium strategies. To define these parameters, better strategy may be to bet with an Ace and to sometimes bet ajk, bjk, and cjk denote the action probabilities for players with a Jack as a bluff, to put the other players in a challenging P1, P2, and P3 respectively when holding card j and taking situation when holding a Queen or King. However, player X an aggressive action (Bet (B) or Call (C)) in situation k, where may also want to sometimes check with an Ace as well so the betting situations are defined in Table III. Prior work has that he can still have some strong hands after he checks and actually singled out a range of strategies that receive the best the players are more wary of betting into him after a check. worst-case payoff; above we have described the lower bound of this space, and we also experiment using the strategy that A full infinite family of Nash equilibria for this game has falls at the upper bound (Table II). been computed and can be seen in the tables from a recent P1 P2 P3 article by Szafron et al. [23]. The family of equilibria is based a11 = 0 b11 = 0 c11 = 0 = 0 = 0 = 1 on several parameter values, which once selected determine a21 b21 c21 2 a22 = 0 b22 = 0 c22 = 0 the probabilities for the other portions of the strategies. One a23 = 0 b23 = 0 c23 = 0 can see from the table that randomization and including some a31 = 0 b31 = 0 c31 = 0 a32 = 0 b32 = 0 c32 = 0 probability on trapping and bluffing are essential in order to 1 1 1 a33 = 2 b33 = 2 c33 = 2 have a strong and unpredictable strategy. Thus, while this game a34 = 0 b34 = 0 c34 = 0 a41 = 0 b41 = 0 c41 = 1 may appear quite simple at first glance, analysis is still very far TABLE I from simple, and the game exhibits many of the complexities PARAMETER VALUES USED FOR OUR NASH EQUILIBRIUM AGENT of far larger games that are played competitively by humans for large amounts of money. P1 P2 P3 1 REFERENCES a11 = 0 b11 = 4 c11 = 0 1 1 a21 = 0 b21 = 4 c21 = 2 = 0 = 0 = 0 [1] J. Nash, “Non-cooperative games,” Ph.D. dissertation, Princeton Univer- a22 b22 c22 sity, 1950. a23 = 0 b23 = 0 c23 = 0 [2] J. von Neumann, “Zur Theorie der Gesellschaftsspiele,” Mathematische a31 = 0 b31 = 0 c31 = 0 Annalen, vol. 100, pp. 295–320, 1928. a32 = 0 b32 = 1 c32 = 0 1 7 [3] N. Brown and T. Sandholm, “Safe and nested solving for a33 = 2 b33 = 8 c33 = 0 a34 = 0 b34 = 0 c34 = 1 imperfect-information games,” in Proceedings of the Annual Conference a41 = 0 b41 = 1 c41 = 1 on Neural Information Processing Systems (NIPS), 2017. TABLE II [4] M. Moravˇc´ık, M. Schmid, N. Burch, V. Lis´y, D. Morrill, PARAMETER VALUES USED FOR OUR SECOND NASH EQUILIBRIUM AGENT N. Bard, T. Davis, K. Waugh, M. Johanson, and M. Bowling, “Deepstack: Expert-level artificial intelligence in heads-up no-limit poker,” Science, 2017. [Online]. Available: Situation P1 P2 P3 http://science.sciencemag.org/content/early/2017/03/01/science.aam6960 1 – K KK [5] D. Koller and N. Megiddo, “The complexity of two-person zero-sum 2 KKB B KB games in extensive form,” Games and Economic Behavior, vol. 4, no. 4, 3 KBF KKBF BF pp. 528–552, Oct 1992. 4 KBC KKBC BC [6] X. Chen and X. Deng, “Settling the complexity of 2-player Nash TABLE III equilibrium,” in Proceedings of the Annual Symposium on Foundations BETTING SITUATIONS IN THREE-PLAYER KUHN POKER of Computer Science (FOCS), 2006. [7] ——, “3-Nash is PPAD-complete,” Electronic Colloquium on Compu- tational Complexity, vol. Report No. 134, 2005. [8] K. Berg and T. Sandholm, “Exclusion method for finding Nash equilib- IV. EXPERIMENTS rium in multiplayer games.” in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2017. We experimented against 10 of 11 agents submitted recently [9] R. Porter, E. Nudelman, and Y. Shoham, “Simple search methods for for a class project (we ignored one agent that ran very finding a Nash equilibrium,” Games and Economic Behavior, vol. 63, no. 2, pp. 642–662, 2008. slowly, which performed poorly). These agents utilized a [10] S. Govindan and R. Wilson, “A global Newton method to compute Nash wide variety of approaches, ranging from neural networks equilibria,” Journal of Economic Theory, vol. 110, pp. 65–86, 2003. to counterfactual regret minimization to opponent modeling [11] T. Sandholm, A. Gilpin, and V. Conitzer, “Mixed-integer programming methods for finding Nash equilibria,” in Proceedings of the National to rule-based approaches. For each grouping of 3 agents we Conference on Artificial Intelligence (AAAI), 2005, pp. 495–501. ran matches consisting of 3000 hands between each of the [12] C. Lemke and J. Howson, “Equilibrium points of bimatrix games,” 6 permutations of the agents (with the same cards being Journal of the Society of Industrial and Applied Mathematics, vol. 12, pp. 413–423, 1964. dealt for the respective positions of the agents in each of [13] S. Ganzfried and T. Sandholm, “Computing an approximate jam/fold the duplicated matches). The number of hands per match equilibrium for 3-player no-limit Texas hold ’em tournaments,” in (3000) is the same value used in the Annual Computer Poker Proceedings of the International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), 2008. Competition, and the process of duplicating the matches with [14] ——, “Computing equilibria in multiplayer stochastic games of im- the same cards between the different agent permutations is a perfect information,” in Proceedings of the 21st International Joint common approach that significantly reduces the variance. We Conference on Artificial Intelligence (IJCAI), 2009. [15] M. Johanson, M. Zinkevich, and M. Bowling, “Computing robust ran 10 matches for each permutation of 3 agents. Table IV counter-strategies,” in Proceedings of the Annual Conference on Neural shows the overall payoff (divided by 100,000) for each agent. Information Processing Systems (NIPS), 2007, pp. 1128–1135. The Nash agent received highest payoff. The results are very [16] S. Ganzfried and T. Sandholm, “-based opponent modeling in large imperfect-information games,” in Proceedings of the Inter- similar when using the upper and lower bound equilibrium national Conference on Autonomous Agents and Multi-Agent Systems strategies with the upper bound performing slightly better. (AAMAS), 2011. [17] N. Abou Risk and D. Szafron, “Using counterfactual regret minimization Nash A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 to create competitive multiplayer poker agents,” in Proceedings of 2.81 (UB) 2.25 1.18 2.54 -1.65 2.32 1.74 -1.34 -9.56 -3.48 1.42 the International Conference on Autonomous Agents and Multi-Agent 2.81 (LB) 2.24 1.17 2.54 -1.66 2.32 1.74 -1.34 -9.54 -3.47 1.42 Systems (AAMAS), 2010. TABLE IV [18] R. Gibson, “Regret minimization in games and the development of EXPERIMENTS USING NASHAGENTSAGAINSTCLASSPROJECTAGENTS champion multiplayer computer poker-playing agents,” Ph.D. disserta- tion, University of Alberta, 2014. [19] M. Zinkevich, M. Bowling, M. Johanson, and C. Piccione, “Regret minimization in games with incomplete information,” in Proceedings V. CONCLUSION of the Annual Conference on Neural Information Processing Systems Creating strong agents for games with more than two play- (NIPS), 2007. [20] M. Lanctot, K. Waugh, M. Zinkevich, and M. Bowling, “Monte Carlo ers, and in particular the question of whether Nash equilibrium sampling for regret minimization in extensive games,” in Proceedings strategies are successful, is an important open problem— of the Annual Conference on Neural Information Processing Systems perhaps the most important one at the intersection of game (NIPS), 2009. [21] M. Bowling, N. Burch, M. Johanson, and O. Tammelin, “Heads-up limit theory and AI. We demonstrated that an agent based on hold’em poker is solved,” Science, vol. 347, no. 6218, pp. 145–149, following an exact Nash equilibrium is able to outperform January 2015. agents submitted for a recent class project that utilize a wide [22] Http://www.computerpokercompetition.org/. [23] D. Szafron, R. Gibson, and N. Sturtevant, “A parameterized family of variety of approaches. This suggests that agents based on equilibrium profiles for three-player kuhn poker,” in Proceedings of using Nash equilibrium strategies can in fact be successful the International Conference on Autonomous Agents and Multi-Agent in multiplayer games, despite the fact that they do not have a Systems (AAMAS), 2013. worst-case theoretical guarantee.