Games Lectures.Key

Lecture 10: Imperfect Information
AI For Traditional Games
Prof. Nathan Sturtevant
Winter 2011

Imperfect Information
• So far, all games we’ve developed solutions for have perfect information
• No hidden information such as individual cards
• Hidden information often represented as chance nodes
• Could be a play by one player that is hidden until the end of the game

Example Tree
[Figure: example game tree with leaf payoffs of 1 and -1]

What is the size of a game with ii?
• Simple betting game (Kuhn Poker)
• Ante 1 chip
• 2-player game, 3-card deck, 1 card each
• First player can check/bet
• Second player can bet/check or call/fold
• If 2nd player bets, 1st player can call/fold
• 3 hands each / 6 total combinations
• [Exercise: Draw top portion of tree in class]

Simple Approach: Perfect-Info Monte-Carlo
• We have good perfect information-solvers
• How can we use them for imperfect information games?
• Sample all unknown information (e.g. a world)
• For each world:
  • Solve perfectly with alpha-beta
• Take the average best move
• If too many worlds, sample a reasonable subset

Drawbacks of Monte-Carlo
• May be too many worlds to sample
• May get probabilities on worlds incorrect
  • World prob. based on previous actions in the game
• May reveal information in actions
  • Good probabilities needed for information hiding
• Program has no sense of information seeking/hiding moves
• Analysis may be incorrect (see work by Frank and Basin)

Strategy Fusion
[Figure: example game tree over World 1 and World 2 with leaf payoffs 1 and -1]

Non-locality
[Figure: example game tree over World 1 and World 2 with leaf payoffs 1 and -1]
[The two figures above use the move labels a, b, c and a', b', c'.]

Strengths of Monte-Carlo
• Simple to implement
• Relatively fast
• Can play some games very well

Analysis of PIMC
• Understanding the Success of Perfect Information Monte Carlo Sampling in Game Tree Search
• Jeffrey Long and Nathan R. Sturtevant and Michael Buro and Timothy Furtak
• Approximates some games better than others
• How can we measure this?
• Abstract model of a game

Leaf Correlation
• (lc) With probability lc, each sibling pair of terminal nodes will have the same payoff value (whether it be 1 or -1). With probability (1 − lc), each sibling pair will be anti-correlated, with one randomly determined leaf having value 1 and its sibling being assigned value -1.

Bias
• b: At each correlated pair of leaf nodes, the nodes’ values will be set to 1 with probability b and -1 otherwise. Thus, with bias of 1, all correlated pairs will have a value of 1, and with bias of 0.5, all correlated pairs will be either 1 or -1 at uniform random (and thus biased towards neither player). Note that anti-correlated leaf node pairs are unaffected by bias.

Disambiguation factor
• (df): Each time p is to move, we recursively break each of his information sets in half with probability df (thus, each set is broken in two with probability df; and if a break occurs, each resulting set is also broken with probability df and so on). If df is 0, then p never gains any direct knowledge of his opponent’s private information. If df is 1, the game collapses to a perfect information game, because all information sets are broken into sets of size one immediately.

Measurements in practice
• Trick-based card games
• Leaf-correlation: tends to be correlated
• Bias: tend to have bias based on cards
• Disambiguation: lots of disambiguation (each action provides some information)

Abstract model results
Figure 3: Performance of PIMC search against a Nash equilibrium. Darker regions indicate a greater average loss for PIMC. Disambiguation is fixed at 0.3, bias at 0.75 and correlation at 0.5 in figures a, b and c respectively.
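
The three parameters of the abstract model (lc, b, df) can be made concrete with a small sampling sketch. This is only an illustration under stated assumptions, not the authors' generator: the function names `sample_leaf_pair` and `split_information_set` are hypothetical, and representing an information set as an ordered list that is cut at its midpoint is an assumption, since the slides do not say which worlds end up in which half.

```python
import random


def sample_leaf_pair(lc, b, rng):
    """Sample payoffs for one sibling pair of terminal nodes.

    lc: leaf correlation, b: bias (both as defined on the slides).
    """
    if rng.random() < lc:
        # Correlated pair: both leaves share one value, 1 with probability b.
        value = 1 if rng.random() < b else -1
        return (value, value)
    # Anti-correlated pair: a randomly chosen leaf gets 1, its sibling -1.
    # Bias plays no role here.
    return (1, -1) if rng.random() < 0.5 else (-1, 1)


def split_information_set(worlds, df, rng):
    """Recursively break an information set in half with probability df.

    Assumption: the set is an ordered list and is cut at its midpoint.
    """
    if len(worlds) <= 1 or rng.random() >= df:
        return [worlds]
    mid = len(worlds) // 2
    return (split_information_set(worlds[:mid], df, rng)
            + split_information_set(worlds[mid:], df, rng))


if __name__ == "__main__":
    rng = random.Random(0)
    # Parameter values taken from the fixed settings in the Figure 3 caption.
    print([sample_leaf_pair(0.5, 0.75, rng) for _ in range(4)])
    print(split_information_set(list(range(8)), 0.3, rng))
```

With df = 0 the set is never broken, and with df = 1 it is broken all the way down to singletons, matching the two extremes described on the Disambiguation factor slide.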
Figure 4: Performance of random play against a Nash equilibrium. Darker regions indicate a greater average loss for random play. Disambiguation is fixed at 0.3, bias at 0.75 and correlation at 0.5 in figures a, b and c respectively.

[...] against each other on the x and y axes, while the third parameter is held constant. Figures 3 and 4 show the playing performance of the challenging player (either PIMC search or uniform random) against the equilibrium player. White shaded regions are areas of the parameter space where the challenger breaks even with equilibrium. The darker the shading, the greater the challenger’s loss against the equilibrium. Figure 5 is similar, except that the shading represents the gain of PIMC search over the random player when playing against equilibrium. Dark regions of these plots represent areas where PIMC search is performing almost no better than the random player, whereas lighter regions indicate a substantial performance gain for PIMC search over random.

These plots show that PIMC search is at its worst when leaf node correlation is low. This is true both in absolute performance, and in PIMC’s relative improvement over random play. The most likely explanation for this behavior is that when anti-correlation occurs deep in the game tree, particularly at the leaves, then PIMC search always believes that the critical decisions are going to come ‘later’ and that what it does higher up the tree does not actually matter. Of course, when an information set structure (which PIMC ignores at every node except the root of its own search) is imposed on the tree, early moves frequently do matter, and thus the superior play of the equilibrium player.

[...] that are effectively anti-correlated occurring perhaps one or two levels of depth up from the leaves of the tree. Note that, of course, in the case of maximum bias and correlation, even the random player will play perfectly, since the same player is guaranteed to win no matter what the line of play (we can only suppose these would be very boring games in real life).

The situation with the disambiguation factor initially appears counter-intuitive; it appears that a low disambiguation factor is good for the absolute performance of PIMC search, while the worst case is a mid-range disambiguation factor. However, in the relative case of PIMC’s gain over random, the trend is very clearly reversed. The explanation for this lies in the fact that the random player performs relatively well in games with a low disambiguation factor. In some sense, because there is so much uncertainty in these games, there is a lot of ‘luck,’ and there is only so much an optimal player can do to improve his position. As we increase the disambiguation factor, the performance of the random player deteriorates rapidly, while PIMC search is much more successful at holding its own against the optimal player. As disambiguation approaches 1, the performance of PIMC improves drastically, since the game is approaching a perfect information game. Finally, we note that with a high disambiguation in the 0.7-0.9 range, low correlation is actually [...]
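
The procedure on the "Simple Approach: Perfect-Info Monte-Carlo" slide can be sketched on the Kuhn Poker game from the earlier slide. This is a minimal sketch under several assumptions, not the lecture's implementation: cards are encoded as 0 < 1 < 2, histories are strings of hypothetical action codes ("c" for check or call, "b" for bet, "f" for fold), payoffs are chips won by the first player, plain minimax stands in for alpha-beta because the tree is tiny, and every world is enumerated instead of sampled because only two opponent cards are possible. "Take the average best move" is read here as choosing the move with the best average value over worlds.

```python
DECK = (0, 1, 2)  # assumed encoding: a higher number is a stronger card
TERMINAL = {"cc", "bc", "bf", "cbc", "cbf"}


def actions(history):
    # Facing a bet: call or fold. Otherwise: check or bet.
    return ["c", "f"] if history.endswith("b") else ["c", "b"]


def payoff(history, cards):
    """Chips won by the first player at a terminal history."""
    if history == "bf":   # second player folds
        return 1
    if history == "cbf":  # first player folds
        return -1
    stake = 1 if history == "cc" else 2  # ante only, or ante plus bet
    return stake if cards[0] > cards[1] else -stake


def minimax(history, cards):
    """Perfect-information value for the first player with both cards known."""
    if history in TERMINAL:
        return payoff(history, cards)
    values = [minimax(history + a, cards) for a in actions(history)]
    # The first player moves at even history lengths and maximizes.
    return max(values) if len(history) % 2 == 0 else min(values)


def pimc_move(history, my_card, player):
    """Average each move's perfect-information value over all worlds."""
    worlds = [c for c in DECK if c != my_card]  # possible opponent cards
    averages = {}
    for action in actions(history):
        total = 0
        for opp in worlds:
            cards = (my_card, opp) if player == 0 else (opp, my_card)
            value = minimax(history + action, cards)
            total += value if player == 0 else -value
        averages[action] = total / len(worlds)
    return max(averages, key=averages.get), averages


if __name__ == "__main__":
    for card in DECK:
        print(card, pimc_move("", card, 0))
```

In this sketch the first player's average value for betting is never higher than for checking, whichever card is held, because every world is solved as if the opponent could see both cards. That is one small illustration of the "no sense of information seeking/hiding" drawback listed on the slides.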
