Lecture 10: Imperfect Information
AI For Traditional Games
Prof. Nathan Sturtevant
Winter 2011

Imperfect Information

• So far, all games we’ve developed solutions for have perfect information
• No hidden information such as individual cards
• Hidden information often represented as chance nodes
• Could be a play by one player that is hidden until the end of the game

Example Tree

• Simple betting game (Kuhn Poker)
• Ante 1 chip
• 2-player game, 3-card deck, 1 card each
• First player can check/bet
• Second player can bet/check or call/fold
• If 2nd player bets, 1st player can call/fold

What is the size of a game with imperfect information (ii)?

[Figure: game tree with leaf payoffs of 1 and -1]

• 3 hands each / 6 total combinations
• [Exercise: Draw top portion of tree in class]

Simple Approach: Perfect-Info Monte-Carlo

• We have good perfect-information solvers
• How can we use them for imperfect information games?
• Sample all unknown information (e.g. a world)
• For each world:
  • Solve perfectly with alpha-beta
• Take the average best move
• If too many worlds, sample a reasonable subset

Drawbacks of Monte-Carlo

• May be too many worlds to sample
• May get probabilities on worlds incorrect
• World prob. based on previous actions in the game
• May reveal information in actions
• Good probabilities needed for information hiding
• Program has no sense of information seeking/hiding moves
• Analysis may be incorrect (see work by Frank and Basin)
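The sampling procedure above can be sketched in a few lines of Python. This is a minimal illustration and not the lecture's implementation: the node encoding, the function names (`alphabeta`, `sample_worlds`, `pimc`, `fixed_action_value`) and the two-world toy game are all assumptions made for this example, and "take the average best move" is read here as "average each root move's perfect-information value over the sampled worlds and pick the largest". The toy game is chosen so that the averaged values overestimate one move, which is one way the analysis can be incorrect.

```python
import math
import random
from statistics import mean

# A node is either an int (leaf payoff for the max player) or a pair
# (player, {move: child}) with player in {"max", "min"}.

def alphabeta(node, alpha=-math.inf, beta=math.inf):
    """Perfect-information value of a node, with alpha-beta pruning."""
    if isinstance(node, int):
        return node
    player, children = node
    if player == "max":
        value = -math.inf
        for child in children.values():
            value = max(value, alphabeta(child, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:
                break
        return value
    value = math.inf
    for child in children.values():
        value = min(value, alphabeta(child, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:
            break
    return value

def sample_worlds(worlds, limit, rng):
    """Use every world if there are few enough, otherwise a random subset."""
    worlds = list(worlds)
    return worlds if len(worlds) <= limit else rng.sample(worlds, limit)

def pimc(games, limit=100, rng=None):
    """Average each root move's perfect-information value over sampled worlds."""
    rng = rng or random.Random(0)
    worlds = sample_worlds(games, limit, rng)
    root_moves = games[worlds[0]][1]
    estimates = {
        move: mean(alphabeta(games[w][1][move]) for w in worlds)
        for move in root_moves
    }
    return max(estimates, key=estimates.get), estimates

def fixed_action_value(games, worlds, move):
    """Best average payoff after `move` if one follow-up must be used in every world."""
    actions = games[worlds[0]][1][move][1]
    return max(mean(games[w][1][move][1][a] for w in worlds) for a in actions)

# Hypothetical toy game: "safe" wins in both worlds; after "guess" the same
# player must pick a or b, and which one wins depends on the hidden world.
GAMES = {
    "world1": ("max", {"safe": 1, "guess": ("max", {"a": 1, "b": -1})}),
    "world2": ("max", {"safe": 1, "guess": ("max", {"a": -1, "b": 1})}),
}

best_move, estimates = pimc(GAMES)
guess_value = fixed_action_value(GAMES, list(GAMES), "guess")
print(estimates, guess_value)
```

In this toy game the sampler rates "safe" and "guess" equally, because inside each sampled world it knows which follow-up wins. A player who cannot tell the worlds apart has to commit to one follow-up, and every such choice averages to 0.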

Strategy Fusion

Non-locality

[Figures: example game trees over two worlds (World 1, World 2, and World 1 & 2), with moves a, b, c, a', b', c' and leaf payoffs of 1 and -1]

Strengths of Monte-Carlo

• Simple to implement
• Relatively fast
• Can play some games very well

Analysis of PIMC

• Understanding the Success of Perfect Information Monte Carlo Sampling in Game Tree Search
• Jeffrey Long and Nathan R. Sturtevant and Michael Buro and Timothy Furtak
• Approximates some games better than others
• How can we measure this?
• Abstract model of a game

Leaf Correlation

• (lc) With probability lc, each sibling pair of terminal nodes will have the same payoff value (whether it be 1 or -1). With probability (1 − lc), each sibling pair will be anti-correlated, with one randomly determined leaf having value 1 and its sibling being assigned value -1.

Bias

• b: At each correlated pair of leaf nodes, the nodes’ values will be set to 1 with probability b and -1 otherwise. Thus, with bias of 1, all correlated pairs will have a value of 1, and with bias of 0.5, all correlated pairs will be either 1 or -1 at uniform random (and thus biased towards neither player). Note that anti-correlated leaf node pairs are unaffected by bias.

Disambiguation factor

• (df): Each time p is to move, we recursively break each of his information sets in half with probability df (thus, each set is broken in two with probability df; and if a break occurs, each resulting set is also broken with probability df and so on). If df is 0, then p never gains any direct knowledge of his opponent’s private information. If df is 1, the game collapses to a perfect information game, because all information sets are broken into sets of size one immediately.

Measurements in practice

• Trick-based card games
• Leaf-correlation: tends to be correlated
• Bias: tend to have bias based on cards
• Disambiguation: lots of disambiguation (each action provides some information)
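The three parameters of the abstract model (lc, b and df) translate fairly directly into sampling code. The sketch below is only an illustration of the definitions above: the function names `leaf_pair` and `split_info_set` are made up for this example, information sets are represented as plain Python lists, and splitting "in half" by integer division for odd sizes is an assumption.

```python
import random

def leaf_pair(lc, b, rng):
    """Payoffs for one sibling pair of terminal nodes.

    With probability lc the pair is correlated and both leaves get the same
    value: 1 with probability b, otherwise -1. With probability 1 - lc the
    pair is anti-correlated and bias plays no role.
    """
    if rng.random() < lc:
        value = 1 if rng.random() < b else -1
        return (value, value)
    return rng.choice([(1, -1), (-1, 1)])

def split_info_set(info_set, df, rng):
    """Recursively break an information set in half, each break with probability df."""
    if len(info_set) <= 1 or rng.random() >= df:
        return [info_set]
    mid = len(info_set) // 2
    return (split_info_set(info_set[:mid], df, rng)
            + split_info_set(info_set[mid:], df, rng))

rng = random.Random(0)
print(leaf_pair(0.5, 0.75, rng))
print(split_info_set(list(range(8)), 0.3, rng))
```

With df = 0 the set is never broken, and with df = 1 it is broken all the way down to singletons, matching the two extreme cases described above.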

Abstract model results

Figure 3: Performance of PIMC search against a Nash equilibrium. Darker regions indicate a greater average loss for PIMC. Disambiguation is fixed at 0.3, bias at 0.75 and correlation at 0.5 in figures a, b and c respectively.

Figure 4: Performance of random play against a Nash equilibrium. Darker regions indicate a greater average loss for random play. Disambiguation is fixed at 0.3, bias at 0.75 and correlation at 0.5 in figures a, b and c respectively.

[...] against each other on the x and y axes, while the third parameter is held constant. Figures 3 and 4 show the playing performance of the challenging player (either PIMC search or uniform random) against the equilibrium player. White shaded regions are areas of the parameter space where the challenger breaks even with equilibrium. The darker the shading, the greater the challenger’s loss against the equilibrium. Figure 5 is similar, except that the shading represents the gain of PIMC search over the random player when playing against equilibrium. Dark regions of these plots represent areas where PIMC search is performing almost no better than the random player, whereas lighter regions indicate a substantial performance gain for PIMC search over random.

These plots show that PIMC search is at its worst when leaf node correlation is low. This is true both in absolute performance, and in PIMC’s relative improvement over random play. The most likely explanation for this behavior is that when anti-correlation occurs deep in the game tree, particularly at the leaves, then PIMC search always believes that the critical decisions are going to come ‘later’ and that what it does higher up the tree does not actually matter. Of course, when an information set structure (which PIMC ignores at every node except the root of its own search) is imposed on the tree, early moves frequently do matter, and thus the superior play of the equilibrium player. When correlation is medium to low, bias also seems to play a role here, with more extreme bias resulting in better performance for PIMC, although the effect of bias is generally small. The performance gain due to bias for PIMC is likely because a more extreme bias reduces the probability of interior nodes that are effectively anti-correlated occurring perhaps one or two levels of depth up from the leaves of the tree. Note that, of course, in the case of maximum bias and correlation, even the random player will play perfectly, since the same player is guaranteed to win no matter what the line of play (we can only suppose these would be very boring games in real life).

The situation with the disambiguation factor initially appears counter-intuitive; it appears that a low disambiguation factor is good for the absolute performance of PIMC search, while the worst case is a mid-range disambiguation factor. However, in the relative case of PIMC’s gain over random, the trend is very clearly reversed. The explanation for this lies in the fact that the random player performs relatively well in games with a low disambiguation factor. In some sense, because there is so much uncertainty in these games, there is a lot of ‘luck,’ and there is only so much an optimal player can do to improve his position. As we increase the disambiguation factor, the performance of the random player deteriorates rapidly, while PIMC search is much more successful at holding its own against the optimal player. As disambiguation approaches 1, the performance of PIMC improves drastically, since the game is approaching a perfect information game. Finally, we note that with a high disambiguation in the 0.7-0.9 range, low correlation is actually good for PIMC’s performance. This is a result of the fact that these games become perfect information games very quickly, and low correlation increases the probability that a win is still available by the time the PIMC player begins playing optimally in the perfect information section of the tree.

Measurements in practice

• Kuhn Poker
• Leaf-correlation: mixed (0.5)
• You can sometimes fold and give the payoff to the other player (anti-correlated)
• Bias: tend to have bias based on cards, but averages out over all cards (0.5)
• Disambiguation: no disambiguation (actions give no direct information about the cards you hold)
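Kuhn poker is small enough that the expected payoff of a pair of strategies can be computed exactly by enumerating the six deals. The sketch below does this with exact fractions. The betting rules follow the description of the game earlier in these slides; the strategy tables are not given in the slides and are an assumption here, taken from the commonly cited one-parameter family of equilibrium strategies (parameter `alpha` between 0 and 1/3). The function names and card encoding are invented for this example.

```python
from fractions import Fraction
from itertools import permutations

J, Q, K = 0, 1, 2  # card ranks, higher wins

def showdown(c1, c2, stake):
    """Payoff to player 1 when the cards are compared for `stake` chips."""
    return stake if c1 > c2 else -stake

def kuhn_value(alpha):
    """Expected payoff to player 1 under the assumed equilibrium strategies."""
    alpha = Fraction(alpha)
    third = Fraction(1, 3)
    # Player 1: probability of betting first, and of calling after check-bet.
    p1_bet = {J: alpha, Q: Fraction(0), K: 3 * alpha}
    p1_call = {J: Fraction(0), Q: alpha + third, K: Fraction(1)}
    # Player 2: probability of betting after a check, and of calling a bet.
    p2_bet = {J: third, Q: Fraction(0), K: Fraction(1)}
    p2_call = {J: Fraction(0), Q: third, K: Fraction(1)}

    deals = list(permutations((J, Q, K), 2))  # 6 equally likely deals
    total = Fraction(0)
    for c1, c2 in deals:
        # Player 1 bets: player 2 calls (showdown for 2) or folds (+1).
        bet_branch = p2_call[c2] * showdown(c1, c2, 2) + (1 - p2_call[c2]) * 1
        # Player 1 checks, player 2 bets: player 1 calls or folds (-1).
        facing_bet = p1_call[c1] * showdown(c1, c2, 2) + (1 - p1_call[c1]) * (-1)
        # Player 1 checks: player 2 bets or checks (showdown for 1).
        check_branch = (p2_bet[c2] * facing_bet
                        + (1 - p2_bet[c2]) * showdown(c1, c2, 1))
        total += p1_bet[c1] * bet_branch + (1 - p1_bet[c1]) * check_branch
    return total / len(deals)

print(kuhn_value(0), kuhn_value(Fraction(1, 3)))
```

Under these assumed strategies the value works out to -1/18 for player 1 at both ends of the `alpha` range, so player 2 has the small edge in this game.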

Kuhn Poker

Player         Opponent: Nash   Opponent: Best-Response
Random (p1)    -0.161           -0.417
Random (p2)    -0.130           -0.500
PIMC (p1)      -0.056           -0.083
PIMC (p2)       0.056           -0.166

Table 1: Average payoff achieved by random and PIMC against Nash and best-response players in Kuhn poker.

[...] player 1 may fold or call. The player with the high card then wins the pot. With Nash-optimal strategies player 1 is expected to lose 1/18 ≈ 0.056 bets per game and player 2 to win 1/18 bets per game.

This game has a disambiguation factor of 0, since no cards are revealed (if at all) until the end, and the size of an information set is never decreased. By inspecting the game tree and using our notion of pre-terminal nodes the correlation and bias can be seen to be 0.5 and 0.5 respectively. These parameters are very different from skat and hearts and lie in the portion of parameter space where we would predict that PIMC search performs more poorly and offers little improvement over random play. This is then, perhaps, at least one explanation of why research in the full game of poker has taken the direction of finding game-theoretic solutions to abstract versions of the game rather than tackling the game directly with PIMC search.

We present results comparing play between a random player, PIMC player and Nash player in Table 1. Kuhn poker is not symmetric, so we distinguish the payoffs both as player 1 and player 2. Because the PIMC player does not take dominated actions, when playing against a Nash equilibrium this player achieves the equilibrium payoff, while a random player loses significantly against even an equilibrium player. If an opponent is able to build a best-response against a PIMC player, then the PIMC player is vulnerable to significant exploitation as player 2, while the random player loses 0.5 as the second player, where 0.056 could have been won. Thus, these results present the experimental evidence showing that PIMC is not a good approach in practice for a game like poker. Although it plays better and is less exploitable than a random player, PIMC may lose significantly to an opponent that can model its play.

Conclusion and Future Work

In this paper, we performed experiments on simple, synthetic game trees in order to gain some insight into the mystery of why Perfect Information Monte Carlo search has been so successful in a variety of practical domains in spite of its theoretical deficiencies. We defined three properties of these synthetic trees that seem to be good predictors of PIMC search’s performance, and demonstrate how these properties can be measured in real games.

There are still several open issues related to this problem. One major issue that we have not addressed is the potential exploitability of PIMC search. While we compared PIMC’s performance against an optimal Nash-equilibrium in the synthetic tree domain, the performance of PIMC search could be substantially worse against a player that attempts to exploit its mistakes. Another issue is that the real games we consider in this paper represent the ‘extremes’ of the parameter space established by our synthetic trees. It would be informative if we could examine a game that is in between the extremes in terms of these parameters. Such a game could provide further evidence of whether PIMC’s performance scales well according to our properties, or whether there are yet more elements of the problem to consider.

Finally, we have seen that in games like skat there isn’t a single measurement point for a game, but a cloud of parameters depending on the strength of each hand. If we can quickly analyze a particular hand when we first see it, we may be able to use this analysis to determine what the best techniques for playing are on a hand-by-hand basis and improve performance further.

Acknowledgements

The authors would like to acknowledge NSERC, Alberta Ingenuity, and iCORE for their financial support.

References

Billings, D.; Burch, N.; Davidson, A.; Holte, R. C.; Schaeffer, J.; Schauenberg, T.; and Szafron, D. 2003. Approximating game-theoretic optimal strategies for full-scale poker. In IJCAI, 661–668.
Buro, M.; Long, J. R.; Furtak, T.; and Sturtevant, N. R. 2009. Improving state evaluation, inference, and search in trick-based card games. In IJCAI, 1407–1413.
Frank, I., and Basin, D. 1998. Search in games with incomplete information: A case study using bridge card play. Artificial Intelligence 87–123.
Ginsberg, M. 2001. GIB: Imperfect Information in a Computationally Challenging Game. Journal of Artificial Intelligence Research 303–358.
Russell, S., and Norvig, P. 2002. Artificial Intelligence: A Modern Approach. Englewood Cliffs, NJ: Prentice Hall, 2nd edition.
Sturtevant, N. R. 2008. An analysis of UCT in multi-player games. In Computers and Games, 37–49.
Zinkevich, M.; Johanson, M.; Bowling, M.; and Piccione, C. 2008. Regret Minimization in Games with Incomplete Information. In Advances in Neural Information Processing Systems 20, 1729–1736.
