Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015)

Limited Lookahead in Imperfect-Information Games

Christian Kroer and Tuomas Sandholm Computer Science Department Carnegie Mellon University [email protected], [email protected]

Abstract

Limited lookahead has been studied for decades in perfect-information games. This paper initiates a new direction via two simultaneous deviation points: generalization to imperfect-information games and a game-theoretic approach. The question of how one should act when facing an opponent whose lookahead is limited is studied along multiple axes: lookahead depth, whether the opponent(s), too, have imperfect information, and how they break ties. We characterize the hardness of finding a Nash equilibrium or an optimal strategy to commit to for either player, showing that in some of these variations the problem can be solved in polynomial time while in others it is PPAD-hard or NP-hard. We proceed to design algorithms for computing optimal commitment strategies for when the opponent breaks ties 1) favorably, 2) according to a fixed rule, or 3) adversarially. The impact of limited lookahead is then investigated experimentally. The limited-lookahead player often obtains the value of the game if she knows the expected values of nodes in the game tree for some equilibrium, but we prove this is not sufficient in general. Finally, we study the impact of noise in those estimates and different lookahead depths. This uncovers a lookahead pathology.

1 Introduction

Limited lookahead has been a central topic in AI game playing for decades. To date, it has been studied in single-agent settings and perfect-information games—specifically in well-known games such as chess, checkers, Go, etc., as well as in random game tree models [Korf, 1990; Pearl, 1981; 1983; Nau, 1983; Nau et al., 2010; Bouzy and Cazenave, 2001; Ramanujan et al., 2010; Ramanujan and Selman, 2011]. In this paper, we initiate the game-theoretic study of limited lookahead in imperfect-information games. Such games are more broadly applicable to practical settings—for example auctions, negotiations, security, cybersecurity, and medical settings—than perfect-information games. Mirrokni et al. [2012] conducted a game-theoretic analysis of lookahead, but they consider only perfect-information games, and the results are for four specific games rather than broad classes of games. Instead, we analyze the questions for imperfect-information and general-sum extensive-form games.

As is typical in the literature on limited lookahead in perfect-information games, we derive our results for a two-agent setting. One agent is a rational player (Player r) trying to optimally exploit a limited-lookahead player (Player l). Our results extend immediately to one rational player and more than one limited-lookahead player, as long as the latter all break ties according to the same scheme (statically, favorably, or adversarially—as described later in the paper). This is because such a group of limited-lookahead players can be treated as one from the perspective of our results.

The type of limited-lookahead player we introduce is analogous to that in the literature on perfect-information games. Specifically, we let the limited-lookahead player l have a node evaluation function h that places numerical values on all nodes in the game tree. Given a strategy for the rational player, at each information set at some depth i, Player l picks an action that maximizes the expected value of the evaluation function at depth i + k, assuming optimal play between those levels.

Our study is the game-theoretic, imperfect-information generalization of lookahead questions studied in the literature and is therefore interesting in its own right. The model also has applications such as biological games, where the goal is to steer an evolution or adaptation process (which typically acts myopically with lookahead 1) [Sandholm, 2015], and security games, where opponents are often assumed to be myopic (as makes sense when the number of adversaries is large [Yin et al., 2012]). Furthermore, investigating how well a rational player can exploit a limited-lookahead player lends insight into the limitations of using limited-lookahead algorithms in multiagent decision making.

We then design algorithms for finding an optimal strategy to commit to for the rational player. We focus on this rather than equilibrium computation because the latter seems nonsensical in this setting: the limited-lookahead player determining an equilibrium strategy would require her to reason about the whole game for the rational player's strategy, which rings contrary to the limited-lookahead assumption. Computing optimal strategies to commit to in standard rational settings has previously been studied in normal-form games [Conitzer and Sandholm, 2006] and extensive-form games [Letchford and Conitzer, 2010], the latter implying some complexity results

for our setting, as we will discuss.

As in the literature on lookahead in perfect-information games, a potential weakness of our approach is that we require knowing the evaluation function h (but we make no other assumptions about what information h encodes). In practice, this function may not be known. As in the perfect-information setting, this can lead to the rational exploiter being exploited.

2 Extensive-form games

We start by defining the class of games that the players will play, without reference to limited lookahead. An extensive-form game Γ is a tuple ⟨N, A, S, Z, H, σ₀, u, I⟩. N is the set of players. A is the set of all actions in the game. S is a set of nodes corresponding to sequences of actions. They describe a tree with root node s_r ∈ S. At each node s, it is the turn of some Player i to move. Player i chooses among actions A_s, and each branch at s denotes a different choice in A_s. Let t_a^s be the node transitioned to by taking action a ∈ A_s at node s. The set of all nodes where Player i is active is called S_i. Z ⊂ S is the set of leaf nodes. The utility function of Player i is u_i : Z → R, where u_i(z) is the utility to Player i when reaching node z. We assume, without loss of generality, that all utilities are non-negative. Z_s is the subset of leaf nodes reachable from a node s. H_i ⊆ H is the set of heights in the game tree where Player i acts. Certain nodes correspond to stochastic outcomes with a fixed probability distribution. Rather than treat those specially, we let Nature be a static player acting at those nodes. H₀ is the set of heights where Nature acts. σ₀ specifies the probability distribution for Nature, with σ₀(s, a) denoting the probability of Nature choosing outcome a at node s. Imperfect information is represented in the game model using information sets. I_i ⊆ I is the set of information sets where Player i acts. I_i partitions S_i. For nodes s₁, s₂ ∈ I, I ∈ I_i, Player i cannot distinguish among them, and so A_{s₁} = A_{s₂} = A_I.

We denote by σ_i : S_i → [0, 1] a behavioral strategy for Player i. For each information set I ∈ I_i, it assigns a probability distribution over A_I, the actions at the information set. σ_i(I, a) is the probability of playing action a at information set I. A strategy profile σ = (σ₀, …, σ_n) consists of a behavioral strategy for each player. We will often use σ(I, a) to mean σ_i(I, a), since the information set specifies which Player i is active. As described above, randomness external to the players is captured by the Nature outcomes σ₀.

Let the probability of going from node s to node ŝ under strategy profile σ be π^σ(s, ŝ) = Π_{⟨s̄,ā⟩∈X_{s,ŝ}} σ(s̄, ā), where X_{s,ŝ} is the set of node-action pairs on the path from s to ŝ. We let the probability of reaching node s be π^σ(s) = π^σ(s_r, s), the probability of going from the root node to s. Let π^σ(I) = Σ_{s∈I} π^σ(s) be the probability of reaching any node in I. Due to perfect recall, we have π_i^σ(I) = π_i^σ(s) for all s ∈ I. For probabilities over Nature, π₀^σ = π₀^σ̄ for all σ, σ̄, so we can ignore the strategy profile superscript and write π₀. Finally, for all behavioral strategies, the subscript −i refers to the same definition, excluding Player i. For example, π_{−i}^σ(s) denotes the probability of reaching s over the actions of the players other than i, that is, if i played to reach s with probability 1.

3 Model of limited lookahead

We now describe our model of limited lookahead. We use the term optimal hypothetical play to refer to the way the limited-lookahead agent thinks she will play when looking ahead from a given information set. In actual play part way down that plan, she may change her mind because she will then be able to see to a deeper level of the game tree.

Let k be the lookahead of Player l, and S_{I,a}^k the nodes at lookahead depth k below information set I that are reachable (through some path) by action a. As in prior work in the perfect-information game setting, Player l has a node-evaluation function h : S → R that assigns a heuristic numerical value to each node in the game tree.

Given a strategy σ_r for the other player and fixed action probabilities for Nature, Player l chooses, at any given information set I ∈ I_l at depth i, a (possibly mixed) strategy whose support is contained in the set of actions that maximize the expected value of the heuristic function at depth i + k, assuming optimal hypothetical play by her (max_{σ_l} in the formula below). We will denote this set by

    A_I^* = { a : a ∈ argmax_{a∈A_I} max_{σ_l} Σ_{s∈I} (π^{σ_{−l}}(s) / π^{σ_{−l}}(I)) Σ_{s′∈S_{I,a}^k} π^σ(t_a^s, s′) h(s′) },

where σ = {σ_l, σ_r} is the strategy profile for the two players. Here moves by Nature are also counted toward the depth of the lookahead. The model is flexible as to how the rational player chooses σ_r and how the limited-lookahead player chooses a (possibly mixed) strategy with supports within the sets A_I^*. For one, we can have these choices be made for both players simultaneously according to the Nash equilibrium solution concept. As another example, we can ask how the players should make those choices if one of the players gets to make, and commit to, all her choices before the other.

4 Complexity

In this section we analyze the complexity of finding strategies according to these solution concepts.

Nash equilibrium. Finding a Nash equilibrium when Player l either has information sets containing more than one node, or has lookahead at least 2, is PPAD-hard [Papadimitriou, 1994]. This is because finding a Nash equilibrium in a 2-player general-sum normal-form game is PPAD-hard [Chen et al., 2009], and any such game can be converted to a depth-2 extensive-form game, where the general-sum payoffs are the evaluation function values.

If the limited-lookahead player only has singleton information sets and lookahead 1, an optimal strategy can be trivially computed in polynomial time in the size of the game tree for the limited-lookahead player (without even knowing the other player's strategy σ_r): for each of her information sets, we simply pick an action that has highest immediate heuristic value. To get a Nash equilibrium, what remains to be done is to compute a best response for the rational player, which can also be easily done in polynomial time [Johanson et al., 2011].

Commitment strategies. Next we study the complexity of finding commitment strategies (that is, finding a strategy for the rational player to commit to, where the limited-lookahead player then responds to that strategy).
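To make the action sets A_I^* from Section 3 concrete, the following is a minimal Python sketch (ours, not the paper's formulation; `Node`, `lookahead_value`, and `lookahead_actions` are illustrative names). It simplifies to singleton information sets and lets Player l control every move in the hypothetical plan, so the π^{σ_{−l}} weighting in the definition drops out:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A game-tree node; `children` maps action names to child nodes."""
    name: str
    children: dict = field(default_factory=dict)

def lookahead_value(node, h, depth):
    """Value Player l expects `depth` levels below `node` under optimal
    hypothetical play (she controls every move in the hypothetical plan)."""
    if depth == 0 or not node.children:
        return h[node.name]
    return max(lookahead_value(c, h, depth - 1) for c in node.children.values())

def lookahead_actions(node, h, k):
    """Simplified A*_I at a singleton information set: the actions whose
    depth-k evaluation under h is maximal."""
    vals = {a: lookahead_value(c, h, k - 1) for a, c in node.children.items()}
    best = max(vals.values())
    return {a for a, v in vals.items() if v == best}

# Toy tree where the immediate heuristic misleads: "a" looks good at depth 1
# (h = 5) but leads to 0, while "b" looks bad (h = 1) but leads to 10.
root = Node("root", {"a": Node("after_a", {"x": Node("a_end")}),
                     "b": Node("after_b", {"y": Node("b_end")})})
h = {"after_a": 5.0, "after_b": 1.0, "a_end": 0.0, "b_end": 10.0}
```

With lookahead 1 the sketch follows the immediate heuristic and picks "a"; with lookahead 2 it sees past it and picks "b". That gap between lookahead depths is exactly what the rational player exploits in the rest of the paper.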

The complexity depends on whether the game has imperfect information (information sets that include more than one node) for the limited-lookahead player, how far that player can look ahead, and how she breaks ties in her action selection. When ties are broken adversarially, the choice of response depends on the choice of strategy for the rational player. If Player l has lookahead one and no information sets, it is easy to find the optimal commitment strategy: the set of optimal actions A_s^* for any node s ∈ S_l can be precomputed, since Player r does not affect which actions are optimal. Player l will then choose actions from these sets to minimize the utility of Player r. We can view the restriction to a subset of actions as a new game, where Player l is a rational player in a zero-sum game. An optimal strategy for Player r to commit to is then a Nash equilibrium in this smaller game. This is solvable in polynomial time by an LP that is linear in the size of the game. The problem is hard without either of these assumptions. This is shown in an extended online version.

5 Algorithms

In this section we will develop an algorithm for solving the hard commitment-strategy case. Naturally its worst-case runtime is exponential. As mentioned in the introduction, we focus on commitment strategies rather than Nash equilibria because Player l playing a Nash equilibrium strategy would require that player to reason about the whole game for the opponent's strategy. Further, optimal strategies to commit to are desirable for applications such as biological games [Sandholm, 2015] (because evolution is responding to what we are doing) and security games [Yin et al., 2012] (where the defender typically commits to a strategy).

Since the limited-lookahead player breaks ties adversarially, we wish to compute a strategy that maximizes the worst-case best response by the limited-lookahead player. For argument's sake, say that we were given A, which is a fixed set of pairs, one for each information set I of the limited-lookahead player, consisting of a set of optimal actions A_I^* and one strategy for hypothetical play σ_l^I at I. Formally, A = ∪_{I∈I_l} ⟨A_I^*, σ_l^I⟩. To make these actions optimal for Player l, Player r must choose a strategy such that all actions in A are best responses according to the evaluation function of Player l. Formally, for all action triples a, a^* ∈ A, a′ ∉ A (letting π(s) denote probabilities induced by σ_l^I for the hypothetical play between I, a and s):

    Σ_{s∈S_{I,a}^k} π(s)·h(s) > Σ_{s∈S_{I,a′}^k} π(s)·h(s)    (1)

    Σ_{s∈S_{I,a}^k} π(s)·h(s) = Σ_{s∈S_{I,a*}^k} π(s)·h(s)    (2)

Player r chooses a worst-case utility-maximizing strategy that satisfies (1) and (2), and Player l has to compute a (possibly mixed) strategy from A such that the utility of Player r is minimized. This can be solved by a linear program:

Theorem 1. For some fixed choice of actions A, Nash equilibria of the induced game can be computed in polynomial time by a linear program that has size O(|S|) + O(Σ_{I∈I_l} |A_I| · max_{s∈S} |A_s|^k).

To prove this theorem, we first design a series of linear programs for computing best responses for the two players. We will then use duality to prove the theorem statement.

In the following, it will be convenient to change to matrix-vector notation, analogous to that of von Stengel [1996], with some extensions. Let A = −B be matrices describing the utility function for Player r and the adversarial tie-breaking of Player l over A, respectively. Rows are indexed by Player r sequences, and columns by Player l sequences. For sequence-form vectors x, y, the objectives to be maximized for the players are then x^T A y, x^T B y, respectively. Matrices E, F are used to describe the sequence-form constraints for Player r and l, respectively. Rows correspond to information sets, and columns correspond to sequences. Letting e, f be standard unit vectors of length |I_r|, |I_l|, respectively, the constraints Ex = e, Fy = f describe the sequence-form constraints for the respective players. Given a strategy x for Player r satisfying (1) and (2) for some A, the optimization problem for Player l becomes choosing a vector y′ representing probabilities for all sequences in A that minimize the utility of Player r. Letting a prime superscript denote the restriction of each matrix and vector to sequences in A, this gives the following primal (3) and dual (4) LPs:

    max_{y′} (x^T B′)y′          min_{q′} q′^T f′
    s.t. F′y′ = f′        (3)    s.t. q′^T F′ ≥ x^T B′    (4)
         y′ ≥ 0

where q′ is a vector with |A| + 1 dual variables. Given some strategy y′ for Player l, Player r maximizes utility among strategies that induce A. This gives the following best-response LP for Player r:

    max_x x^T (Ay′)
    s.t. x^T E^T = e^T
         x ≥ 0                                  (5)
         x^T H_{¬A} − x^T H_A ≤ −ε
         x^T G_{A*} = x^T G_A

where the last two constraints encode (1) and (2), respectively. The dual problem uses the unconstrained vectors p, v and constrained vector u and looks as follows:

    min_{p,u,v} e^T p − ε·u
    s.t. E^T p + (H_{¬A} − H_A)u + (G_{A*} − G_A)v ≥ A′y′    (6)
         u ≥ 0

We can now merge the dual (4) with the constraints from the primal (5) to compute a strategy: Player r chooses x,

which she will choose to minimize the objective of (4):

    min_{x,q′} q′^T f′
    s.t. q′^T F′ − x^T B′ ≥ 0
         −x^T E^T = −e^T
         x ≥ 0                                  (7)
         x^T H_A − x^T H_{¬A} ≥ ε
         x^T G_A − x^T G_{A*} = 0

Taking the dual of this gives:

    max_{y′,p,u,v} −e^T p + ε·u
    s.t. −E^T p + (H_A − H_{¬A})u + (G_A − G_{A*})v ≤ B′y′    (8)
         F′y′ = f′
         y′, u ≥ 0

We are now ready to prove Theorem 1.

Proof. The LPs in Theorem 1 are (7) and (8). We will use duality to show that they provide optimal solutions to each of the best-response LPs. Since A = −B, the first constraint in (8) can be multiplied by −1 to obtain the first constraint in (6), and the objective function can be transformed to that of (6) by making it a minimization. By the weak duality theorem, we get the following inequalities:

    q′^T f′ ≥ x^T B′y′          by LPs (3) and (4)
    e^T p − ε·u ≥ x^T A′y′      by LPs (5) and (6)

We can multiply the last inequality by −1 to get:

    q′^T f′ ≥ x^T B′y′ = −x^T A′y′ ≥ −e^T p + ε·u    (9)

By the strong duality theorem, for optimal solutions to LPs (7) and (8) we have equality in the objective functions, q′^T f′ = −e^T p + ε·u, which yields equality in (9), and thereby equality for the objective functions in LPs (3), (4) and for (5), (6). By strong duality, this implies that any primal solution x, q′ and dual solution y′, p to LPs (7) and (8) yields optimal solutions to LPs (3) and (5). Both players are thus best responding to the strategy of the other agent, yielding a Nash equilibrium. Conversely, any Nash equilibrium gives optimal solutions x, y′ for LPs (3) and (5). With corresponding dual solutions p, q′, equality is achieved in (9), meaning that LPs (7) and (8) are solved optimally.

It remains to show the size bound for LP (7). Using sparse representation, the number of non-zero entries in the matrices A, B, E, F is linear in the size of the game tree. The constraint set x^T H_A ≥ x^T H_{¬A}, when naively implemented, is not: the value of a sequence a ∉ A_I^* depends on the choice among the cartesian product of choices at each information set I′ encountered in hypothetical play below it. In practice we can avoid this by having a real-valued variable v_I^d(I′) representing the value of I′ in lookahead from I, and constraints v_I^d(I′) ≥ v_I^d(I′, a) for each a ∈ I′, where v_I^d(I′, a) is a variable representing the value of taking a at I′. If there are more information sets below I′ where Player l plays before the lookahead depth is reached, we recursively constrain v_I^d(I′, a) to be

    v_I^d(I′, a) ≥ Σ_{Ǐ∈D} v_I^d(Ǐ),    (10)

where D is the set of information sets at the next level where Player l plays. If there are no more information sets where Player l acts, then we constrain v_I^d(I′, a) by

    v_I^d(I′, a) ≥ Σ_{s∈S_{I′,a}^k} π_{−l}^σ(s) h(s),    (11)

setting it to the probability-weighted heuristic value of the nodes reached below it. Using this, we can now write the constraint that a dominates all a′ ∈ I, a′ ∉ A as

    Σ_{s∈S_{I,a}^k} π^σ(s) h(s) ≥ v_I^d(I).

There can at most be O(Σ_{I∈I_l} |A_I|) actions to be made dominant. For each action at some information set I, there can be at most O(max_{s∈S} |A_s|^{min{k,k′}}) entries over all the constraints, where k′ is the maximum depth of the subtrees rooted at I, since each node at the depth the player looks ahead to has its heuristic value added to at most one expression. For the constraint set x^T G_A − x^T G_{A*} = 0, the choice of hypothetical plays has already been made for both expressions, and so we have the constraint

    Σ_{s∈S_{I,a}^k} π^σ(s) h(s) = Σ_{s∈S_{I,a′}^k} π^{σ′}(s) h(s)

for all I ∈ I_l and a, a′ with ⟨a, σ^l⟩, ⟨a′, σ^{l′}⟩ ∈ A, where σ = {σ_{−l}, σ^l} and σ′ = {σ_{−l}, σ^{l′}}. There can at most be Σ_{I∈I_l} |A_I|² such constraints, which is dominated by the size of the previous constraint set. Summing up gives the desired bound.

In reality we are not given A. To find a commitment strategy for Player r, we could loop through all possible structures A, solve LP (7) for each one, and select the one that gives the highest value. We now introduce a mixed-integer program (MIP) that picks the optimal induced game A while avoiding enumeration. The MIP is given in (12). We introduce Boolean sequence-form variables that denote making sequences suboptimal choices. These variables are then used to deactivate subsets of constraints, so that the MIP branches on formulations of LP (7), i.e., what goes into the structure A. The size of the MIP is of the same order as that of LP (7).

    min_{x,q,z} q^T f
    s.t. q^T F − x^T B ≥ −zM
         Ex = e
         x^T H_A ≥ x^T H_{¬A} + ε − (1−z)M      (12)
         x^T G_A = x^T G_{A*} ± (1−z)M
         Σ_{a∈A_I} z_a ≥ z_{a′}
         x ≥ 0, z ∈ {0, 1}

The variable vector x contains the sequence-form variables for Player r. The vector q is the set of dual variables for Player l. z is a vector of Boolean variables, one for each Player l sequence. Setting z_a = 1 denotes making the sequence a a suboptimal choice. The matrix M is a diagonal matrix with sufficiently large constants (e.g., the smallest value in B) such that setting z_a = 1 deactivates the corresponding constraint. Similar to the favorable-lookahead case, we introduce sequence-form constraints Σ_{a∈A_I} z_a ≥ z_{a′}, where a′ is the parent sequence, to ensure that at least one action is picked when the parent sequence is active. We must also ensure that the incentivization constraints are only active for actions in A:

    x^T H_A − x^T H_{¬A} ≥ ε − (1−z)M       (13)
    x^T G_A − x^T G_{A*} = 0 ± (1−z)M

for diagonal matrices M with sufficiently large entries. Equality is implemented with a pair of inequality constraints {≤, ≥}, where ± denotes adding or subtracting, respectively. The value of each column constraint in (13) is implemented by a series of constraints. We add Boolean variables σ_l^I(I′, a′) for each information set–action pair I′, a′ that is potentially chosen in hypothetical play at I. Using our regular notation, for each a, a′ where a is the action to be made dominant, the constraint is implemented by

    Σ_{s∈S_{I,a}^k} v^i(s) ≥ v_I^d(I),   v^i(s) ≤ σ_l^I(I′, a′)·M,    (14)

where the latter ensures that v^i(s) is only non-zero if chosen in hypothetical play. We further need the constraint v^i(s) ≤ π_{−l}^σ(s)h(s) to ensure that v^i(s), for a node s at the lookahead depth, is at most the heuristic value weighted by the probability of reaching s.

6 Experiments

In this section we experimentally investigate how much utility can be gained by optimally exploiting a limited-lookahead player. We conduct experiments on Kuhn poker [Kuhn, 1950], a canonical testbed for game-theoretic algorithms, and a larger simplified poker game that we call KJ. Kuhn poker consists of a three-card deck: king, queen, and jack. Each player antes 1. Each player is then dealt one of the three cards, and the third is put aside unseen. A single round of betting (p = 1) then occurs. In KJ, the deck consists of two kings and two jacks. Each player antes 1. A private card is dealt to each, followed by a betting round (p = 2); then a public card is dealt, followed by another betting round (p = 4). If no player has folded, a showdown occurs. For both games, each round of betting looks as follows:

• Player 1 can check or bet p.
  – If Player 1 checks, Player 2 can check or raise p.
    ∗ If Player 2 checks, the betting round ends.
    ∗ If Player 2 raises, Player 1 can fold or call.
      · If Player 1 folds, Player 2 takes the pot.
      · If Player 1 calls, the betting round ends.
  – If Player 1 bets, Player 2 can fold or call.
    ∗ If Player 2 folds, Player 1 takes the pot.
    ∗ If Player 2 calls, the betting round ends.

In Kuhn poker, the player with the higher card wins in a showdown. In KJ, showdowns have two possible outcomes: one player has a pair, or both players have the same private card. For the former, the player with the pair wins the pot. For the latter, the pot is split. Kuhn poker has 55 nodes in the game tree and 13 sequences per player. The KJ game tree has 199 nodes and 57 sequences per player.

To investigate the value that can be derived from exploiting a limited-lookahead opponent, a node-evaluation heuristic is needed. In this work we consider heuristics derived from a Nash equilibrium. For a given node, the heuristic value of the node is simply the expected value of the node in (some chosen) equilibrium. This is arguably a conservative class of heuristics, as a limited-lookahead opponent would not be expected to know the value of the nodes in equilibrium. Even with this form of evaluation heuristic it is possible to exploit the limited-lookahead player, as we will show. We will also consider Gaussian noise being added to the node-evaluation heuristic, more realistically modeling opponents who have vague ideas of the values of nodes in the game. Formally, let σ be an equilibrium, and i the limited-lookahead player. The heuristic value h(s) of a node s is

    h(s) = u_i(s) if s ∈ Z;   Σ_{a∈A_s} σ(s, a) h(t_a^s) otherwise.    (15)

We consider two different noise models. The first adds Gaussian noise with mean 0 and standard deviation γ independently to each node evaluation, including leaf nodes. Letting µ_s be a noise term drawn from N(0, γ): ĥ(s) = h(s) + µ_s. The second, more realistic, model adds error cumulatively, with no error on leaf nodes:

    h̄(s) = u_i(s) if s ∈ Z;   Σ_{a∈A_s} σ(s, a) h̄(t_a^s) + µ_s otherwise.    (16)

Using MIP (12), we computed optimal strategies for the rational player in Kuhn poker and KJ. The results are given in Figure 1. The x-axis is the noise parameter γ for ĥ and h̄. The y-axis is the corresponding utility for the rational player, averaged over at least 1000 runs per tuple ⟨game, choice of rational player, lookahead, standard deviation⟩. Each figure contains plots for the limited-lookahead player having lookahead 1 or 2, and a baseline for the value of the game in equilibrium without limited lookahead.

Figures 1a and b show the results for using evaluation function ĥ in Kuhn poker, with the rational player in plots a and b being Player 1 and 2, respectively. For rational Player 1, we see that, even with no noise in the heuristic (i.e., the limited-lookahead player knows the value of each node in equilibrium), it is possible to exploit the limited-lookahead player if she has lookahead 1. (With lookahead 2 she achieves the value of the game.) For both amounts of lookahead, the exploitation potential steadily increases as noise is added.

Figures 1c and d show the same variants for KJ. Here, lookahead 2 is worse for the limited-lookahead player than lookahead 1. To our knowledge, this is the first known imperfect-information lookahead pathology. Such pathologies are well known in perfect-information games [Beal, 1980; Pearl, 1981; Nau, 1983], and understanding them remains an active area of research [Luštrek et al., 2006; Nau et al., 2010; Wilson et al., 2012].
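The equilibrium-derived heuristic (15) and the cumulative-noise model (16) used above can be sketched as follows (our illustration, not the paper's code; the toy tree encoding and the `random.Random` seeding are assumptions for the example):

```python
import random

def h(node):
    """Eq. (15): a node is a terminal payoff (number) or a list of
    (sigma_probability, child) pairs; returns its expected value under
    the equilibrium sigma encoded by those probabilities."""
    if isinstance(node, (int, float)):
        return float(node)
    return sum(p * h(child) for p, child in node)

def h_bar(node, gamma, rng):
    """Eq. (16): like h, but Gaussian noise N(0, gamma) is added
    cumulatively at every interior node; leaf evaluations stay exact."""
    if isinstance(node, (int, float)):
        return float(node)
    return sum(p * h_bar(child, gamma, rng) for p, child in node) \
        + rng.gauss(0.0, gamma)

# Toy tree: 50/50 between a payoff of 2 and a subtree that is itself
# 50/50 between payoffs 0 and 4; its exact equilibrium value is 2.
tree = [(0.5, 2.0), (0.5, [(0.5, 0.0), (0.5, 4.0)])]
```

With γ = 0 the cumulative model coincides with h; as γ grows, errors at interior nodes compound toward the root, which is what makes h̄ the more realistic model of a vague opponent.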

[Figure 3 appears here: a decision diagram over the axes "Information sets" (yes/no), "Lookahead depth > 1" (yes/no), "Equilibrium" vs. "Commitment", and "Tie-breaking rule" (adversarial, static, favorable), with leaves {PPAD,NP}-hard, P, and NP-hard.]

Figure 3: Our complexity results. {PPAD,NP}-hard indicates that finding a Nash equilibrium (optimal strategy to commit to) is PPAD-hard (NP-hard). P indicates polytime.

This version of the node heuristic does not have increasing visibility: node evaluations do not get more accurate toward the end of the game. Our experiments on KJ with h̄ in Figures 1g and h do not have this pathology, and h̄ does have increasing visibility.

Figure 2 shows a simple subtree (that could be attached to any game tree) where deeper lookahead can make the agent's decision arbitrarily bad, even when the node evaluation function is the exact expected value of a node in equilibrium.

[Figure 2 appears here: a small game tree with players P1 and P2, a node P2* with actions up and down, and a node P1* whose leaves have payoffs involving 0, 1, and α.]

Figure 2: A subtree that exhibits lookahead pathology.

We now go over the example of Figure 2. Assume without loss of generality that all payoffs are positive in some game. We can then insert the subtree in Figure 2 as a subgame at any node belonging to P1, and it will be played with probability 0 in equilibrium, since it has expected value 0. Due to this, all strategies where Player 2 chooses up can be part of an equilibrium. Assuming that P2 is the limited-lookahead player and minimizing, for large enough α, the node labeled P1* will be more desirable than any other node in the game, since it has expected value −α according to the evaluation function. A rational player P1 can use this to get P2 to go down at P2*, and then switch to the action that leads to α. This example is for lookahead 1, but we can generalize the example to work with any finite lookahead depth: the node P1* can be replaced by a subtree where every other leaf has payoff 2α, in which case P2 would be forced to go to the leaf with payoff α once down has been chosen at P2*.

Figures 1e and f show the results for Kuhn poker with h̄. These are very similar to the results for ĥ, with almost identical expected utility for all scenarios. Figures 1g and h, as previously mentioned, show the results with h̄ on KJ. Here we see no lookahead pathologies, and for the setting where Player 2 is the rational player we see the most pronounced difference in exploitability based on lookahead.

[Figure 1 appears here: eight plots, a–h.]

Figure 1: Winnings in Kuhn poker and KJ for the rational player as Player 1 and 2, respectively, for varying evaluation function noise. Error bars show standard deviation.

7 Conclusions and future work

This paper initiated the study of limited lookahead in imperfect-information games. We characterized the complexity of finding a Nash equilibrium and an optimal strategy to commit to for either player. Figure 3 summarizes those results, including the cases of favorable and static tie-breaking, the discussion of which we deferred to the extended online paper. We then designed a MIP for computing optimal strategies to commit to for the rational player. The problem was shown to reduce to choosing the best among a set of two-player zero-sum games (the tie-breaking being the opponent), where the optimal strategy for any such game can be computed with an LP. We then introduced a MIP that finds the optimal solution by branching on these games.

We experimentally studied the impact of limited lookahead in two poker games. We demonstrated that it is possible to achieve large utility gains by exploiting a limited-lookahead opponent. As one would expect, the limited-lookahead player often obtains the value of the game if her heuristic node evaluation is exact (i.e., it gives the expected values of nodes in the game tree for some equilibrium)—but we provided a counterexample that shows that this is not sufficient in general.

Finally, we studied the impact of noise in those estimates, and different lookahead depths. While lookahead 2 usually outperformed lookahead 1, we uncovered an imperfect-information game lookahead pathology: deeper lookahead can hurt the limited-lookahead player. We demonstrated how this can occur with any finite depth of lookahead, even if the limited-lookahead player's node evaluation heuristic returns exact values from an equilibrium.

Our algorithms in the NP-hard adversarial tie-breaking setting scaled to games with hundreds of nodes. For some practical settings more scalability will be needed. There are at least two exciting future directions toward achieving this. One is to design faster algorithms. The other is designing abstraction techniques for the limited-lookahead setting. In extensive-form game solving with rational players, abstraction plays an important role in large-scale game solving [Sandholm, 2010]. Theoretical solution quality guarantees have recently been achieved [Lanctot et al., 2012; Kroer and Sandholm, 2014a; 2014b]. Limited-lookahead games have much stronger structure, especially locally around an information set, and it may be possible to utilize that to develop abstraction techniques with significantly stronger solution quality bounds. Also, leading practical game abstraction algorithms (e.g., [Ganzfried and Sandholm, 2014]), while theoretically unbounded, could immediately be used to investigate exploitation potential in larger games. Finally, uncertainty over h is an important future research direction. This would lead to more robust solution concepts, thereby alleviating the pitfalls involved with using an imperfect estimate.

Acknowledgements. This work is supported by the National Science Foundation under grant IIS-1320620.

References

[Beal, 1980] Donald F. Beal. An analysis of minimax. Advances in Computer Chess, 2:103–109, 1980.

[Bouzy and Cazenave, 2001] Bruno Bouzy and Tristan Cazenave. Computer Go: an AI oriented survey. Artificial Intelligence, 132(1):39–103, 2001.

[Chen et al., 2009] Xi Chen, Xiaotie Deng, and Shang-Hua Teng. Settling the complexity of computing two-player Nash equilibria. Journal of the ACM, 2009.

[Conitzer and Sandholm, 2006] Vincent Conitzer and Tuomas Sandholm. Computing the optimal strategy to commit to. In Proceedings of the ACM Conference on Electronic Commerce (ACM-EC), Ann Arbor, MI, 2006.

[Ganzfried and Sandholm, 2014] Sam Ganzfried and Tuomas Sandholm. Potential-aware imperfect-recall abstraction with earth mover's distance in imperfect-information games. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2014.

[Johanson et al., 2011] Michael Johanson, Kevin Waugh, Michael Bowling, and Martin Zinkevich. Accelerating best response calculation in large extensive games. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2011.

[Korf, 1990] Richard E. Korf. Real-time heuristic search. Artificial Intelligence, 42(2):189–211, 1990.

[Kroer and Sandholm, 2014a] Christian Kroer and Tuomas Sandholm. Extensive-form game abstraction with bounds. In Proceedings of the ACM Conference on Economics and Computation (EC), 2014.

[Kroer and Sandholm, 2014b] Christian Kroer and Tuomas Sandholm. Extensive-form game imperfect-recall abstractions with bounds. arXiv preprint: http://arxiv.org/abs/1409.3302, 2014.

[Kuhn, 1950] Harold W. Kuhn. A simplified two-person poker. In H. W. Kuhn and A. W. Tucker, editors, Contributions to the Theory of Games, volume 1 of Annals of Mathematics Studies, 24, pages 97–103. Princeton University Press, Princeton, New Jersey, 1950.

[Lanctot et al., 2012] Marc Lanctot, Richard Gibson, Neil Burch, Martin Zinkevich, and Michael Bowling. No-regret learning in extensive-form games with imperfect recall. In International Conference on Machine Learning (ICML), 2012.

[Letchford and Conitzer, 2010] Joshua Letchford and Vincent Conitzer. Computing optimal strategies to commit to in extensive-form games. In Proceedings of the ACM Conference on Electronic Commerce (EC), 2010.

[Luštrek et al., 2006] Mitja Luštrek, Matjaž Gams, and Ivan Bratko. Is real-valued minimax pathological? Artificial Intelligence, 170(6):620–642, 2006.

[Mirrokni et al., 2012] Vahab Mirrokni, Nithum Thain, and Adrian Vetta. A theoretical examination of practical game playing: lookahead search. In Algorithmic Game Theory, pages 251–262. Springer, 2012.

[Nau et al., 2010] Dana S. Nau, Mitja Luštrek, Austin Parker, Ivan Bratko, and Matjaž Gams. When is it better not to look ahead? Artificial Intelligence, 174(16):1323–1338, 2010.

[Nau, 1983] Dana S. Nau. Pathology on game trees revisited, and an alternative to minimaxing. Artificial Intelligence, 21(1):221–244, 1983.

[Papadimitriou, 1994] Christos H. Papadimitriou. On the complexity of the parity argument and other inefficient proofs of existence. Journal of Computer and System Sciences, 48(3):498–532, 1994.

[Pearl, 1981] Judea Pearl. Heuristic search theory: Survey of recent results. In IJCAI, volume 1, pages 554–562, 1981.

[Pearl, 1983] Judea Pearl. On the nature of pathology in game searching. Artificial Intelligence, 20(4):427–453, 1983.

[Ramanujan and Selman, 2011] Raghuram Ramanujan and Bart Selman. Trade-offs in sampling-based adversarial planning. In ICAPS, pages 202–209, 2011.

[Ramanujan et al., 2010] Raghuram Ramanujan, Ashish Sabharwal, and Bart Selman. On adversarial search spaces and sampling-based planning. In ICAPS, volume 10, pages 242–245, 2010.

[Sandholm, 2010] Tuomas Sandholm. The state of solving large incomplete-information games, and application to poker. AI Magazine, pages 13–32, Winter 2010. Special issue on Algorithmic Game Theory.

[Sandholm, 2015] Tuomas Sandholm. Steering evolution strategically: Computational game theory and opponent exploitation for treatment planning, drug design, and synthetic biology. In AAAI Conference on Artificial Intelligence, Senior Member Track, 2015.

[von Stengel, 1996] Bernhard von Stengel. Efficient computation of behavior strategies. Games and Economic Behavior, 14(2):220–246, 1996.

[Wilson et al., 2012] Brandon Wilson, Inon Zuckerman, Austin Parker, and Dana S. Nau. Improving local decisions in adversarial search. In ECAI, pages 840–845, 2012.

[Yin et al., 2012] Z. Yin, A. Jiang, M. Tambe, C. Kiekintveld, K. Leyton-Brown, T. Sandholm, and J. Sullivan. TRUSTS: Scheduling randomized patrols for fare inspection in transit systems using game theory. AI Magazine, 2012.
