
The State of Solving Large Incomplete-Information Games, and Application to Poker

Tuomas Sandholm

Game-theoretic solution concepts prescribe how rational parties should act, but to become operational the concepts need to be accompanied by algorithms. I will review the state of solving incomplete-information games. They encompass many practical problems such as auctions, negotiations, and security applications. I will discuss them in the context of how they have transformed computer poker. In short, game-theoretic reasoning now scales to many large problems, outperforms the alternatives on those problems, and in some games beats the best humans.

Game-theoretic solution concepts prescribe how rational parties should act in multiagent settings. This is nontrivial because an agent's utility-maximizing strategy generally depends on the other agents' strategies. The most famous solution concept for this is a Nash equilibrium: a strategy profile (one strategy for each agent) where no agent has incentive to deviate from her strategy given that others do not deviate from theirs.

In this article I will focus on incomplete-information games, that is, games where the agents do not entirely know the state of the game at all times. The usual way to model them is a game tree where the nodes (that is, states) are further grouped into information sets. In an information set, the player whose turn it is to move cannot distinguish between the states in the information set, but knows that the actual state is one of them. Incomplete-information games encompass most games of practical importance, including most negotiations, auctions, and many applications in information security and physical battle. Such games are strategically challenging. A player has to reason about what others' actions signal about their knowledge. Conversely, the player has to be careful about not signaling too much about her own knowledge to others through her actions. Such games cannot be solved using methods for complete-information games like checkers, chess, or Go. Instead, I will review new game-independent algorithms for solving them.

Poker has emerged as a standard benchmark in this space (Shi and Littman 2002; Billings et al. 2002) for a number of reasons: (1) it exhibits the richness of reasoning about a probabilistic future, how to interpret others' actions as signals, and information hiding through careful action selection; (2) the game is unambiguously specified; (3) the game can be scaled to the desired complexity; (4) humans of a broad range of skill exist for comparison; (5) the game is fun; and (6) computers find interesting strategies automatically. For example, time-tested behaviors such as bluffing and slow play arise from the game-theoretic algorithms automatically rather than having to be explicitly programmed.

Copyright © 2010, Association for the Advancement of Artificial Intelligence. All rights reserved. ISSN 0738-4602

[Figure 1: original game → abstraction algorithm → abstracted game → custom algorithm for finding a Nash equilibrium → Nash equilibrium of the abstracted game → reverse model → Nash equilibrium of the original game.]

Figure 1. Current Paradigm for Solving Large Incomplete-Information Games.

Kuhn poker — a game with three cards — was among the first applications discussed in game theory, and it was solved analytically by hand (Kuhn 1950). On large-scale poker games, the best computerized strategies for a long time were rule based. Nowadays, the best poker-playing programs are generated automatically using algorithms that are based on game-theoretic principles.

There has been tremendous progress on equilibrium-finding algorithms since 2005. Two-player zero-sum game trees with 10^12 leaves can now be solved near optimally. However, many real games are even larger. For example, two-player Limit Texas Hold'em poker has 10^18 leaves. For such large games, abstraction algorithms have emerged as practical preprocessors.

Most competitive poker-playing programs are nowadays generated using an abstraction algorithm followed by using a custom equilibrium-finding algorithm to solve the abstracted game. See figure 1. This paradigm was first used in Gilpin, Sandholm, and Sørensen (2007). Predecessors of the paradigm included handcrafting small abstractions (Billings et al. 2003), as well as solving automatically generated abstractions with general-purpose linear programming algorithms (Gilpin and Sandholm 2006; 2007a; 2007b).

In this article I will discuss abstraction algorithms first and equilibrium-finding algorithms second. After that I will address opponent exploitation and other topics.

Abstraction Algorithms

Abstraction algorithms take as input a description of the game and output a smaller but strategically similar — or even equivalent — game. The abstraction algorithms discussed here work with any finite number of players and do not assume a zero-sum game.

Information Abstraction

The most popular kind of abstraction is information abstraction. The game is abstracted so that the agents cannot distinguish some of the states that they can distinguish in the actual game. For example, in an abstracted poker hand, an agent is not able to observe all the nuances of the cards that she would normally observe.

Lossless Information Abstraction. It turns out that it is possible to do lossless information abstraction, which may seem like an oxymoron at first. The method I will describe (Gilpin and Sandholm 2007b) is for a class of games that we call games with ordered signals. It is structured, but still general enough to capture a wide range of strategic situations. A game with ordered signals consists of a finite number of rounds. Within a round, the players play a game on a directed tree (the tree can be different in different rounds). The only uncertainty players face stems from private signals the other players have received and from the unknown future signals.


Theorem 1 (Gilpin and Sandholm 2007b). Any Nash equilibrium of the shrunken game corresponds to a Nash equilibrium of the original game.

In other words, players observe each others' actions, but potentially not nature's actions. In each round, there can be public signals (announced to all players) and private signals (confidentially communicated to individual players). We assume that the legal actions that a player has are independent of the signals received. For example, in poker, the legal betting actions are independent of the cards received. Finally, the strongest assumption is that there is a total ordering among complete sets of signals, and the payoffs are increasing (not necessarily strictly) in this ordering. In poker, this ordering corresponds to the ranking of card hands.

The abstraction algorithm operates on the signal tree, which is the game tree with all the agents' action edges removed. We say that two sibling nodes in the signal tree are ordered game isomorphic if (1) if the nodes are leaves, the payoff vectors of the players (which payoff in the vector materializes depends on how the agents play) are the same at both nodes, and (2) if the nodes are interior nodes, there is a bipartite matching of the nodes' children so that only ordered game isomorphic children get matched.

The GameShrink algorithm is a bottom-up dynamic program that merges all ordered game isomorphic nodes. It runs in Õ(n²) time, where n is the number of nodes in the signal tree. GameShrink tends to run in sublinear time and space in the size of the game tree because the signal tree is significantly smaller than the game tree in most nontrivial games. The beautiful aspect of this method is that it is lossless (theorem 1). A small example run of GameShrink is shown in figure 2.

We applied GameShrink to Rhode Island Hold'em poker (Gilpin and Sandholm 2007b). That two-player game was invented as a testbed for computational game playing (Shi and Littman 2002). Applying the sequence form to Rhode Island Hold'em directly without abstraction yields a linear program (LP) with 91,224,226 rows, and the same number of columns. This is much too large for (current) linear programming algorithms to handle. We used GameShrink to shrink this, yielding an LP with 1,237,238 rows and columns — with 50,428,638 nonzero coefficients. We then applied iterated elimination of dominated strategies, which further reduced this to 1,190,443 rows and 1,181,084 columns. GameShrink required less than one second to run. Then, using a 1.65 GHz IBM eServer p5 570 with 64 gigabytes of RAM (the LP solver actually needed 25 gigabytes), we solved the resulting LP in 8 days using the interior-point barrier method of CPLEX version 9.1.2. In summary, we found an exact solution to a game with 3.1 billion nodes in the game tree (the largest incomplete-information game that had been solved previously had 140,000 nodes (Koller and Pfeffer 1997)). To my knowledge, this is still the largest incomplete-information game that has been solved exactly.¹

Lossy Information Abstraction. Some games are so large that even after applying the kind of lossless abstraction described above, the resulting LP would be too large to solve. To address this problem, such games can be abstracted more aggressively, but this incurs loss in solution quality.

One approach is to use a lossy version of GameShrink where siblings are considered ordered game isomorphic if their children can be approximately matched in the bipartite matching part of the algorithm (Gilpin and Sandholm 2007b; 2006). However, lossy GameShrink suffers from three drawbacks.

First, the resulting abstraction can be highly inaccurate because the grouping of states into buckets is, in a sense, greedy. For example, if lossy GameShrink determines that hand A is similar to hand B, and then determines that hand B is similar to hand C, it will group A and C together, despite the fact that A and C may not be very similar. The quality of the abstraction can be even worse when a longer sequence of such comparisons leads to grouping together extremely different hands. Stated differently, the greedy aspect of the algorithm leads to lopsided buckets where large buckets are likely to attract even more states into the bucket.

Second, one cannot directly specify how many buckets lossy GameShrink should yield (overall or at any specific betting round). Rather, there is a parameter (for each round) that specifies a threshold of how different states can be and still be considered the same.


Figure 2. GameShrink Applied to a Tiny Two-Player Four-Card Poker Game. (The game consists of two jacks and two kings.) (Gilpin and Sandholm 2007b). Next to each game tree is the range of the information filter, which shows the abstraction. Dotted lines denote information sets, which are labeled by the controlling player. Open circles are chance nodes with the indicated transition probabilities. The root node is the chance node for player 1's card, and the next level is for player 2's card. The payment from player 2 to player 1 is given below each leaf. In this example, the algorithm reduces the game tree from 113 nodes to 39 nodes.
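The ordered-game-isomorphism test that drives GameShrink can be sketched as follows. This is only a minimal illustration, not the authors' implementation: signal-tree nodes are assumed to be simple objects with a children list and, at leaves, a payoffs tuple, and the bipartite matching is done by naive backtracking rather than the Õ(n²) machinery of the actual algorithm.

```python
class Node:
    """A signal-tree node: leaves carry payoff vectors, interior nodes carry children."""
    def __init__(self, payoffs=None, children=()):
        self.payoffs = payoffs          # tuple of payoffs at a leaf, else None
        self.children = list(children)  # empty for leaves

def ordered_game_isomorphic(u, v):
    """Return True if sibling nodes u and v are ordered game isomorphic."""
    if not u.children and not v.children:          # both leaves
        return u.payoffs == v.payoffs
    if len(u.children) != len(v.children):
        return False
    # Interior nodes: look for a perfect matching of the children such that
    # matched children are themselves ordered game isomorphic.
    def match(i, used):
        if i == len(u.children):
            return True
        for j, w in enumerate(v.children):
            if j not in used and ordered_game_isomorphic(u.children[i], w):
                if match(i + 1, used | {j}):
                    return True
        return False
    return match(0, frozenset())

# GameShrink (bottom-up) repeatedly merges sibling nodes for which this test
# holds, collapsing the information filter.
```

In the tiny game of figure 2, this is the test that allows the two jacks (and then the two kings) to be merged.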

If one knows how large an LP can be solved, one cannot create an LP of that size by specifying the number of buckets directly; rather one must use trial-and-error (or some variant of binary search applied to the setting of multiple parameters) to pick the similarity thresholds (one for each round) in a way that yields an LP of roughly the desired size.

The third drawback is scalability. The time needed to compute an abstraction for a three-round truncated version of two-player Limit Texas Hold'em was more than a month. Furthermore, it would have to be executed in the inner loop of the parameter guessing algorithm of the previous paragraph.

Expectation-Based Abstraction Using Clustering and Integer Programming. In this subsection I describe a new abstraction algorithm that eliminates the above problems (Gilpin and Sandholm 2007a). It is not specific to poker, but for concreteness I will describe it in the context of two-player Texas Hold'em.

The algorithm operates on an abstraction tree, which, unlike the signal tree, looks at the signals from one player's perspective only: the abstraction is done separately (and possibly with different numbers of buckets) for different players. For Texas Hold'em poker, it is initialized as follows. The root node has (52 choose 2) = 1326 children, one for each possible pair of hole cards that a player may be dealt in round 1. Each of these children has (50 choose 3) children, each corresponding to the possible 3-card flops that can appear in round 2. Similarly, the branching factor at the next two levels is 47 and 46, corresponding to the 1-card draws in rounds 3 and 4, respectively.

The algorithm takes as input the number of buckets, K_r, allowed in the abstraction in each round, r. For example, we can limit the number of buckets of hands in round 1 to K_1 = 20. Thus, we would need to group (that is, abstract) each of the (52 choose 2) = 1326 hands into 20 buckets.


We treat this as a clustering problem. To perform the clustering, we must first define a metric to determine the similarity of two hands. Letting w, l, and d be the number of possible wins, losses, and draws (based on the rollout of the remaining cards), we compute the hand's value as w + d/2, and we take the distance between two hands to be the absolute difference between their values. This gives us the necessary ingredients to apply clustering to form k buckets, for example, k-means clustering (algorithm 1).

1. Create k centroid points in the interval between the minimum and maximum hand values.
2. Assign each hand to the nearest centroid.
3. Adjust each centroid to be the mean of their assigned hand values.
4. Repeat steps 2 and 3 until convergence.

Algorithm 1. k-means Clustering for Poker Hands.

Algorithm 1 is guaranteed to converge, but it may find a local optimum. Therefore, in our implementation we run it several times with different starting points to try to find a global optimum. For a given clustering, we can compute the error (according to the value measure) that we would expect to have when using the abstraction.
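A minimal sketch of algorithm 1 in Python. The hand values (the w + d/2 rollouts) are assumed to have been computed already and are passed in as a flat list; the function names are illustrative, not from the cited implementation.

```python
import random

def kmeans_1d(values, k, iters=100):
    """Cluster scalar hand values (w + d/2) into k buckets (algorithm 1)."""
    lo, hi = min(values), max(values)
    centroids = [random.uniform(lo, hi) for _ in range(k)]   # step 1
    assignment = [0] * len(values)
    for _ in range(iters):
        # Step 2: assign each hand to the nearest centroid.
        assignment = [min(range(k), key=lambda c: abs(v - centroids[c]))
                      for v in values]
        # Step 3: move each centroid to the mean of its assigned hand values.
        new_centroids = []
        for c in range(k):
            members = [v for v, a in zip(values, assignment) if a == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:   # step 4: stop at convergence
            break
        centroids = new_centroids
    # Expected abstraction error under the value measure: mean |value - centroid|.
    error = sum(abs(v - centroids[a]) for v, a in zip(values, assignment)) / len(values)
    return assignment, centroids, error
```

Running it from several random starting points and keeping the lowest-error result mirrors the restarts described in the text.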
For the later rounds we again want to determine the buckets. Here we face the additional problem of determining how many children each parent in the abstraction tree can have. For example, we can put a limit of K_2 = 800 on the number of buckets in round 2. How should the right to have 800 children (buckets that have not yet been generated at this stage) be divided among the 20 parents? We model and solve this problem as a 0–1 integer program (Nemhauser and Wolsey 1999) as follows. Our objective is to minimize the expected error in the abstraction. Thus, for each of the 20 parent nodes, we run the k-means algorithm presented above for values of k between 1 and the largest number of children we might want any parent to have, MAX. We denote the expected error when node i has k children by c_{i,k}. We denote by p_i the probability of getting dealt a hand that is in abstraction class i (that is, in parent i); this is simply the number of hands in i divided by (52 choose 2). Based on these computations, the following 0–1 integer program finds the abstraction that minimizes the overall expected error for the second level:

$$\min \sum_{i=1}^{K_1} p_i \sum_{k=1}^{MAX} c_{i,k}\, x_{i,k}$$
$$\text{s.t.} \quad \sum_{i=1}^{K_1} \sum_{k=1}^{MAX} k\, x_{i,k} \le K_2$$
$$\sum_{k=1}^{MAX} x_{i,k} = 1 \quad \forall i$$
$$x_{i,k} \in \{0, 1\}$$

The decision variable x_{i,k} is set to 1 if and only if node i has k children. The first constraint ensures that the limit on the overall number of children is not exceeded. The second constraint ensures that a decision is made for each node. This problem is a generalized knapsack problem, and although NP-complete, can be solved efficiently using off-the-shelf integer programming solvers (for example, CPLEX solves this problem in less than one second at the root node of the branch-and-bound search tree).

We repeat this procedure for round 3 (with the round-2 buckets as the parents and a different (larger) limit (say, K_3 = 4,800) on the maximum number of buckets). Then we compute the bucketing for round 4 analogously, for example with K_4 = 28,800.

Overall, our technique optimizes the abstraction round by round in the abstraction tree. A better abstraction (even for the same similarity metric) could conceivably be obtained by optimizing all rounds in one holistic optimization. However, that seems infeasible. First, the optimization problem would be nonlinear because the probabilities at a given level depend on the abstraction at previous levels of the tree. Second, the number of decision variables in the problem would be exponential in the size of the initial abstraction tree (which itself is large), even if the number of abstraction classes for each level is fixed.
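Because the 0–1 integer program above has a knapsack structure, it can also be solved exactly with a small dynamic program over parents and remaining child budget; the sketch below does that instead of calling an off-the-shelf IP solver. The cost table c[i][k] (expected error when parent i gets k children) and the probabilities p[i] are assumed to come from the k-means runs described above.

```python
def allocate_buckets(p, c, budget):
    """Choose k_i children for each parent i to minimize sum_i p[i]*c[i][k_i]
    subject to sum_i k_i <= budget.  c[i][k] is defined for k = 1..MAX (c[i][0] unused)."""
    n = len(p)
    INF = float("inf")
    # best[i][b] = minimal error achievable with parents i.. and b children still available
    best = [[INF] * (budget + 1) for _ in range(n + 1)]
    choice = [[0] * (budget + 1) for _ in range(n)]
    best[n] = [0.0] * (budget + 1)
    for i in range(n - 1, -1, -1):
        max_k = len(c[i]) - 1
        for b in range(budget + 1):
            for k in range(1, min(max_k, b) + 1):
                val = p[i] * c[i][k] + best[i + 1][b - k]
                if val < best[i][b]:
                    best[i][b], choice[i][b] = val, k
    # Recover the allocation (how many children each parent gets).
    alloc, b = [], budget
    for i in range(n):
        k = choice[i][b]
        alloc.append(k)
        b -= k
    return alloc, best[0][budget]
```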


[Figure 3: a bucket at round r−1 with a transition histogram (for example, .3, .2, 0, .5) over the buckets at round r.]

Figure 3. States and Transitions Used in Potential-Aware Abstraction. In our potential-aware abstraction algorithm, the similarity of states is measured based on the states' transition histograms to buckets at the next round. This definition is then operationalized by conducting bottom-up passes in the abstraction tree so those later-round buckets get defined first.

Potential-Aware Abstraction. The expectation-based abstraction approach previously described does not take into account the potential of hands. For instance, certain poker hands are considered drawing hands in which the hand is currently weak, but has a chance of becoming very strong. Since the strength of such a hand could potentially turn out to be much different later in the game, it is generally accepted among poker experts that such a hand should be played differently than another hand with a similar chance of winning, but without as much potential (Sklansky 1999). However, if using the difference between probabilities of winning as the clustering metric, the abstraction algorithm would consider these two very different situations similar.

One possible approach to handling the problem that certain hands with the same probability of winning may have different potential would be to consider not only the expected strength of a hand, but also its variance. Although this would likely be an improvement over basic expectation-based abstraction, it fails to capture two important issues that prevail in many sequential imperfect information games, including poker.

First, mean and variance are a lossy representation of a probability distribution, and the lost aspects of the probability distribution over hand strength can be significant for deciding how one should play in any given situation.

Second, the approach based on mean and variance does not take into account the different paths of information revelation that hands take in increasing or decreasing in strength. For example, two hands could have similar means and variances, but one hand may get the bulk of its uncertainty resolved in the next round, while the other hand needs two more rounds before the bulk of its final strength is determined. The former hand is better because the player has to pay less to find out the essential strength of his hand.

To address these issues, we introduced potential-aware abstraction, where we associate with each state of the game a histogram over future possible states (Gilpin, Sandholm, and Sørensen 2007); see figure 3. This representation can encode all the pertinent information from the rest of the game (such as paths of information revelation), unlike the approach based on mean and variance.

As in expectation-based abstraction, we use a clustering algorithm and an integer program for allocating children. They again require a distance metric to measure the dissimilarity between different states. The metric we now use is the L2-distance over the histograms of future states. Specifically, let S be a finite set of future states, and let each hand i be associated with a histogram, h_i, over the future states S. Then, the distance between hands i and j is

$$dist(i, j) = \sqrt{\sum_{s \in S} (h_i(s) - h_j(s))^2}.$$

Under this approach, another design dimension of the algorithm is in the construction of the possible future states. There are at least two prohibitive problems with this vanilla approach as stated. First, there are a huge number of possible reachable future states, so the dimensionality of the histograms is too large to do meaningful clustering with a reasonable number of clusters (that is, small enough to lead to an abstracted game that can be solved for equilibrium). Second, for any two states at the same level of the game, the descendant states are disjoint. Thus the histograms would have nonoverlapping supports, so any two states would have maximum dissimilarity and thus no basis for clustering.

For both of these reasons (and for reducing memory usage and enhancing speed), we coarsen the domains of the histograms. First, instead of having histograms over individual states, we use histograms over abstracted states (buckets, that is, clusters), which contain a number of states each (and we use those buckets' centroids to conduct the similarity calculations). We will have, for each bucket, a histogram over buckets later in the game. Second, we restrict the histogram of each bucket to be over buckets at the next round only (rather than over buckets at all future rounds). However, we introduce a technique (a bottom-up pass of constructing abstractions up the tree) that allows the buckets at the next round to capture information from all later rounds.

One way of constructing the histograms would be to perform a bottom-up pass of a tree representing the possible card deals: abstracting round 4 first, creating histograms for round 3 nodes based on the round 4 clusters, then abstracting round 3, creating histograms for round 2 nodes based on the round 3 clusters, and so on. This is indeed what we do to find the abstraction for round 1.
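The potential-aware distance is simply the Euclidean (L2) distance between transition histograms, as in this small sketch; the histograms over next-round buckets are assumed to be given as equal-length lists that sum to one.

```python
import math

def potential_aware_distance(h_i, h_j):
    """L2 distance between two hands' transition histograms over next-round buckets."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h_i, h_j)))

# Example: the hand behind h1 resolves most of its uncertainty next round (polarized),
# while h2 stays middling. Expectation-based abstraction could still see them as
# similar if their expected strengths coincide, but their L2 distance is large.
h1 = [0.5, 0.0, 0.0, 0.5]   # ends up very weak or very strong
h2 = [0.0, 0.5, 0.5, 0.0]   # stays middling
print(potential_aware_distance(h1, h2))   # 1.0
```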


[Figure 4: bar chart of winnings to the potential-aware player (in small bets per hand) against the expectation-based player as the abstraction is made finer grained; bar values include −16.6, 0.088, 1.06, 5.57, and 6.99. Annotation: potential-aware becomes lossless, win-probability-based is as good as it gets, never lossless.]

Figure 4. Potential-Aware Versus Expectation-Based Abstraction. (Gilpin and Sandholm 2008a).

However, for later betting rounds, we improve on this by leveraging our knowledge of the fact that abstracted children of any bucket at the level above should only include states that can actually be children of the states in that bucket. We do this by multiple bottom-up passes, one for each bucket at the round above. For example, if a round-1 bucket contains only those states where the hand consists of two aces, then when we conduct abstraction for round 2, the bottom-up pass for that level-1 bucket should only consider future states where the hand contains two aces as the hole cards. This enables the abstraction algorithm to narrow the scope of analysis to information that is relevant given the abstraction that it made for earlier rounds.

In the last round there is no need to use the potential-aware techniques discussed above since the players will not receive any more information, that is, there is no potential. Instead, we simply compute the fourth-round abstraction based on each hand's probability of winning (based on different possible rollouts of the cards), using clustering and integer programming as in expectation-based abstraction.

Comparison of Expectation-Based Versus Potential-Aware Abstraction. Now, which is better, expectation-based or potential-aware abstraction? Both types of abstraction algorithms run very quickly compared to the time to even approximately solve the abstracted game, so the comparison comes down to how good the approximate equilibria are when generated from abstractions of each of the two types. It turns out that this depends on the granularity that the abstraction is allowed to have! (This in turn is dictated in practice by the speed of the equilibrium-finding algorithm that is used to approximately solve the abstracted game.) We conducted experiments on this question in the context of Rhode Island Hold'em so that we could solve the abstracted game exactly (Gilpin and Sandholm 2008a). For both types of abstraction algorithms we allowed the same number of abstract buckets at each of the three rounds of the game. We denote an abstraction granularity by the string K1-K2-K3. For example, 13-25-125 means 13 first-round buckets, 25 second-round buckets, and 125 third-round buckets. The abstraction granularities we considered range from coarse (13-25-125) to fine (13-205-1774). At this latter granularity an equilibrium-preserving abstraction exists (Gilpin and Sandholm 2007b). In the experiments we fix the first-round granularity to 13 (which allows for a lossless solution in principle due to suit isomorphisms), and vary the granularity allowed in rounds two and three. Figure 4 shows the results when the programs generated with these two abstraction methods were played against each other.


For very coarse abstractions, the expectation-based approach does better. For medium granularities, the potential-aware approach does better. For fine granularities, the potential-aware approach does better, but the difference is small. Interestingly, the expectation-based approach never yielded a lossless abstraction no matter how fine a granularity was allowed.

These conclusions hold also when (1) comparing the players against an optimal player (that is, an equilibrium strategy), or (2) comparing each player against its nemesis (that is, a strategy that exploits the player as much as possible in expectation), or (3) evaluating the abstraction based on its ability to estimate the true value of the game.

These conclusions also hold for a variant of the expectation-based algorithm that considers the square of the probability of winning (v_i = (w_i + d/2)²), rather than simply the probability (Zinkevich et al. 2007). A motivation for this is that the hands with the higher probabilities of winning should be more finely abstracted than lower-value hands. This is because low-value hands are likely folded before the last round, and because it is important to know very finely how good a high-value hand one is holding if there is a betting escalation in the final round. Another suggested motivation is that this captures some of the variance and "higher variance is preferred as it means the player eventually will be more certain about their ultimate chances of winning prior to a showdown." The version of this we experimented with is somewhat different than the original since we are using an integer program to allocate the buckets, which enables nonuniform bucketing.

One possible explanation for the crossover between expectation-based and potential-aware abstraction is that the dimensionality of the temporary states used in the bottom-up pass in the third round (which must be smaller than the number of available second-round buckets in order for the clustering to discover meaningful centroids) is insufficient for capturing the strategically relevant aspects of the game. Another hypothesis is that since the potential-aware approach is trying to learn a more complex model (in a sense, clusters of paths of states) and the expectation-based model is trying to learn a less complex model (clusters of states), the former requires a larger dimension to capture this richness.

The existence of a cross-over suggests that for a given game — such as Texas Hold'em — as computers become faster and equilibrium-finding algorithms more scalable so games with finer-grained abstractions become solvable, the potential-aware approach will become the better method of choice.

Problems with Lossy Information Abstraction. In single-agent settings, lossy abstraction has the desirable property of monotonicity: As one refines the abstraction, the solution improves. There is only a weak analog of this in (even two-player zero-sum) games: If the opponent's strategy space is not abstracted lossily at all, refining our player's abstraction causes our player to become less exploitable. (Exploitability is measured by how much our player loses to its nemesis in expectation. This can be computed in reasonably sizeable games by carrying out a best-response calculation to our agent's strategy.) There are no proven guarantees on the amount of exploitability as a function of the coarseness of the lossy information abstraction used. Furthermore, sometimes the exploitability of a player increases as its abstraction or its opponent's abstraction is refined (Waugh et al. 2009a). This nonmonotonicity holds for information abstraction even if one carefully selects the least exploitable equilibrium strategy for the player in the abstracted games. The nonmonotonicity has been shown in small artificial poker variants. For Texas Hold'em (even with just two players), experience from years of the AAAI Computer Poker Competition suggests that in practice finer abstractions tend to yield better strategies. However, further research on this question is warranted.

Another problem is that current lossy abstraction algorithms do not yield lossless abstractions even if enough granularity is allowed for a lossless abstraction to exist. For the expectation-based abstraction algorithm this can already be seen in figure 4, and this problem arises in some games also with the potential-aware abstraction algorithm. One could trivially try to circumvent this problem by running lossless GameShrink first, and only running a lossy abstraction algorithm if the abstraction produced by GameShrink has a larger number of buckets than desired.

Strategy-Based Abstraction. It may turn out that abstraction is as hard a problem as equilibrium finding itself. After all, two states should fall in the same abstract bucket if the optimal action probabilities in them are (almost) the same, and determining the action probabilities is done by finding an equilibrium.

This led us to develop the strategy-based abstraction approach. It iterates between abstraction and equilibrium finding. The equilibrium finding operates on the current abstraction. Once a (near) equilibrium is found for that abstraction, we redo the abstraction using the equilibrium strategies to inform the bucketing. Then we find a near equilibrium for the new abstraction, and so on.

We have applied this approach for the AAAI Computer Poker Competition. However, definitive results have not yet been obtained on the approach because for the fine-grained abstractions used in the competition (about 10^12 leaves in the abstracted game tree), approximate equilibrium finding takes months on a shared-memory supercomputer using 96 cores. Therefore, we have so far had time to go over the abstraction/equilibrium finding cycle only twice. Future research should explore this approach more systematically both experimentally and theoretically.

Action Abstraction

So far in this article I have discussed information abstraction. Another way of making games easier to solve is action abstraction, where in the abstracted game the players have fewer actions available than in the original game. This is especially important in games with large or infinite action spaces, such as No-Limit poker. So far action abstraction has been done by selecting some of the actions from the original game into the abstracted game, although in principle one could generate some abstract actions that are not part of the original game. Also, so far action abstractions have been generated manually.² Future research should also address automated action abstraction.

Action abstraction begets a fundamental problem because real opponent(s) may select actions outside the abstract model. To address this, work has begun on studying what are good reverse mappings (figure 1), that is, how should opponents' actions that do not abide to the abstraction be interpreted in the abstracted game? One objective is to design a reverse mapping that tries to minimize the player's exploitability. Conversely, one would like to have actions in one's own abstraction that end up exploiting other players' action abstractions. These remain largely open research areas, but some experimental results already exist on the former. In No-Limit poker it tends to be better to use logarithmic rather than linear distance when measuring how close an opponent's real bet is to the bet sizes in the abstraction (Gilpin, Sandholm, and Sørensen 2008). Furthermore, a randomized reverse mapping that weights the abstract betting actions based on their distance to the opponent's real bet tends to help (Schnizlein, Bowling, and Szafron 2009). As with information abstraction, in some games refining the action abstraction can actually increase the player's exploitability (Waugh et al. 2009a).
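A minimal sketch of such a reverse mapping, under stated assumptions: abstract bet sizes are given as a sorted list, distance is measured between logarithms of bet sizes as suggested above, and the randomized weighting scheme here (inverse log-distance, normalized) is only illustrative — it is not the specific formula of the cited papers.

```python
import math
import random

def map_bet_to_abstraction(real_bet, abstract_bets, randomized=True):
    """Interpret an opponent's real bet as one of the bet sizes in the abstraction."""
    # Logarithmic distance between bet sizes.
    dists = [abs(math.log(real_bet) - math.log(b)) for b in abstract_bets]
    if not randomized:
        return abstract_bets[dists.index(min(dists))]      # deterministic nearest bet
    # Randomized mapping: weight abstract bets by closeness (illustrative scheme).
    weights = [1.0 / (d + 1e-9) for d in dists]
    total = sum(weights)
    r, acc = random.uniform(0.0, total), 0.0
    for bet, w in zip(abstract_bets, weights):
        acc += w
        if r <= acc:
            return bet
    return abstract_bets[-1]

# Example: the abstraction only contains bets of 50, 100, 200, and 400 chips.
print(map_bet_to_abstraction(130, [50, 100, 200, 400], randomized=False))  # -> 100
```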
In No-Limit poker it tends to be bet - The second phase can be solved in real time dur - ter to use logarithmic rather than linear distance ing the game so that a finer abstraction can be used when measuring how close an opponent’s real bet than if all possible second-phase games would is to the bet sizes in the abstraction (Gilpin, Sand - have to be solved (that is, all possible sequences of holm, and Sørensen 2008). Furthermore, a ran - cards and actions from the rounds before the start domized reverse mapping that weights the abstract of the second phase) (Gilpin and Sandholm betting actions based on their distance to the 2007a). Whether or not the second phase is solved opponent’s real bet tends to help (Schnizlein, offline or in real time, at the beginning of the sec - Bowling, and Szafron 2009). As with information ond phase, before the equilibrium finding for the abstraction, in some games refining the action second phase takes place, the players’ beliefs are abstraction can actually increase the player’s updated using Bayes’ rule based on the cards and exploitability (Waugh et al. 2009a). actions each player has observed in the first phase. The idea of bootstrapping with base strategies Phase-Based Abstraction, has been extended to base strategies that cover the Real-Time Equilibrium Finding, entire game, not just the end (Waugh, Bard, and and Strategy Grafting Bowling 2009). The base strategies can be comput - Beyond information and action abstraction, a ed using some abstraction followed by equilibrium third form of abstraction that has been used for finding. Then, one can isolate a part of the incomplete-information games is phase-based abstracted game at a time, construct a finer-grained abstraction. The idea is to solve earlier parts (which abstraction for that part, require that Player 1 fol - we call phases) of the game separately from later low his base strategy in all his information sets phases of the game. This has the advantage that except those in that part (no such restriction is each part can use a finer abstraction than would be placed on the opponent), and solve the game tractable if the game were solved holistically. The anew. Then one can pick a different part and do downside is that gluing the phases together sound - this again, using the same base strategy. Once all


[Figure 5 charts the number of nodes in the game tree that could be solved, from 1994 to 2007, rising from about 10^5 nodes (Koller and Pfeffer; Billings et al., using sequence form LP solved with simplex and CPLEX's interior-point method), through larger games solved with CPLEX's interior-point LP method (Gilpin and Sandholm), to about 10^12 nodes with scalable EGT (Gilpin, Hoda, Peña, and Sandholm; Gilpin, Sandholm, and Sørensen) and counterfactual regret (Zinkevich et al.); the AAAI Computer Poker Competition was announced partway through this period.]

Figure 5. Progress on Algorithms for Solving Two-Player Zero-Sum Games.

This grafting approach allows one to focus on one part of the game at a time with fine abstraction while having a holistic view of the game through the base strategies. In principle this can increase the player's exploitability, but in practice it improves performance. Similar approaches can be used for more than two players, but nothing has been published on that yet.

Equilibrium-Finding Algorithms for Two-Player Zero-Sum Games

So far I have discussed abstraction. I will now move to algorithms for solving the (abstracted) game. This section focuses on two-player zero-sum games. The next section covers more general games.

The most common solution concept (that is, definition of what it means to be a solution) is Nash equilibrium. A strategy for an agent defines for each information set where it is the agent's turn to move a probability distribution over the agent's actions. The two agents' strategies form a Nash equilibrium if neither agent can benefit in expectation by deviating from her strategy given that the other agent does not deviate from his.

Formally, the Nash equilibria of two-player zero-sum sequential games are the solutions to

$$\min_{x \in X} \max_{y \in Y} y^T A x = \max_{y \in Y} \min_{x \in X} y^T A x \qquad (1)$$

where X and Y are polytopes defining the players' strategies and A is the payoff matrix (Romanovskii 1962; Koller, Megiddo, and von Stengel 1996; von Stengel 1996). When the minimizer plays a strategy x ∈ X and the maximizer plays y ∈ Y, the expected utility to the maximizer is y^T A x and, since the game is zero-sum, the minimizer's expected utility is −y^T A x.

Problem 1 can be expressed as a linear program (LP) whose size is linear in the size of the game tree. Thus the problem is solvable in polynomial time. Today's best general-purpose LP solvers can solve games with up to 10^7 or 10^8 leaves in the game tree (corresponding to nonzero entries in A) (Gilpin and Sandholm 2006). For example, losslessly abstracted Rhode Island Hold'em poker has a 10^6 × 10^6 payoff matrix containing 50 million nonzeros, and solving it (to near machine precision) with CPLEX's barrier method (an interior-point LP method) took a week and used 25 gigabytes of RAM (Gilpin and Sandholm 2007b). Interestingly, on these kinds of problems the barrier method does better than the simplex method.

The LP approach does not scale to most interesting games. For instance, the payoff matrix A in (1) for two-player Limit Texas Hold'em poker has dimension 10^14 × 10^14 and contains more than 10^18 nonzero entries. There has been tremendous progress in developing equilibrium-finding algorithms in the last few years, spurred in part by the AAAI Computer Poker Competition (see figure 5). These new algorithms find an ε-equilibrium, that is, strategies x ∈ X and y ∈ Y such that neither player can benefit more than ε in expectation by deviating from her strategy.

Algorithms Based on Smoothing and Gradient Descent

In this section I will describe recent custom equilibrium-finding algorithms based on smoothing and gradient descent.

Extending the Excessive Gap Technique to Sequential Games and Making it Scalable. In this section I will describe the first custom equilibrium-finding algorithm for sequential incomplete-information games (Gilpin et al. 2007; Hoda et al. 2010). It took equilibrium finding to a new level by solving poker games with 10^12 leaves in the game tree (Gilpin, Sandholm, and Sørensen 2007), and it found an ε-equilibrium with a small ε (0.027 small bets per hand in abstracted Limit Texas Hold'em). It remains one of the fastest equilibrium-finding algorithms in practice.

The algorithm is an adaptation of Nesterov's excessive gap technique (Nesterov 2005a; 2005b) to sequential games, and it also includes techniques for significantly improving scalability both in terms of time and memory usage.

Problem 1 can be stated as

$$\min_{x \in X} f(x) = \max_{y \in Y} \phi(y) \qquad (2)$$

where

$$f(x) = \max_{y \in Y} y^T A x \quad \text{and} \quad \phi(y) = \min_{x \in X} y^T A x.$$

The functions f and φ are respectively convex and concave nonsmooth functions. The left-hand side of equation 2 is a standard convex minimization problem of the form

$$\min \{h(x) : x \in X\}. \qquad (3)$$

First-order methods for solving equation 3 are algorithms for which a search direction at each iteration is obtained using only the first-order information of h, such as its gradient or subgradient. When h is nonsmooth, subgradient algorithms can be applied, but they have a worst-case complexity of O(1/ε²) iterations (Goffin 1977). However, that pessimistic result is based on treating h as a black box where the value and subgradient are accessed through an oracle. For nonsmooth functions with a suitable max structure, Nesterov devised first-order algorithms requiring only O(1/ε) iterations.

The key component of Nesterov's smoothing technique is a pair of prox-functions for the sets X and Y. These prox-functions are used to construct smooth approximations f_µ ≈ f and φ_µ ≈ φ. To obtain approximate solutions to equation 2, gradient-based algorithms can then be applied to f_µ and φ_µ.

We say that a function is a prox-function if it is strongly convex and its minimum is zero. Assume d_X and d_Y are prox-functions for the sets X and Y respectively. Then for any given µ > 0, the smooth approximations f_µ ≈ f and φ_µ ≈ φ are

$$f_\mu(x) := \max \{y^T A x - \mu d_Y(y) : y \in Y\},$$
$$\phi_\mu(y) := \min \{y^T A x + \mu d_X(x) : x \in X\}.$$

Let D_X := max{d_X(x) : x ∈ X}, and let σ_X denote the strong convexity modulus of d_X. Let D_Y and σ_Y be defined likewise for Y and d_Y.³

Theorem 2 (Nesterov 2005a, 2005b). There is a procedure (algorithm 4 presented later) based on the above smoothing technique that after N iterations generates a pair of points (x^N, y^N) ∈ X × Y such that

$$0 \le f(x^N) - \phi(y^N) \le \frac{4\|A\|}{N+1} \sqrt{\frac{D_X D_Y}{\sigma_X \sigma_Y}}.$$

Furthermore, each iteration of the procedure performs some elementary operations, three matrix-vector multiplications by A, and requires the exact solution of three subproblems of the form

$$\max_{x \in X} \{g^T x - d_X(x)\} \quad \text{or} \quad \max_{y \in Y} \{g^T y - d_Y(y)\}. \qquad (4)$$

For an algorithm based on theorem 2 to be practical, the subproblems (equation 4) must be solvable quickly. They can be phrased in terms of the conjugate of the functions d_X and d_Y. The conjugate of d : Q → ℝ is the function d* : ℝⁿ → ℝ defined by

$$d^*(s) := \max \{s^T x - d(x) : x \in Q\}.$$

If d is strongly convex and Q is compact, then the conjugate d* is Lipschitz continuous, differentiable everywhere, and

$$\nabla d^*(s) = \operatorname{argmax} \{s^T x - d(x) : x \in Q\}.$$

If the prox-function's conjugate and the conjugate's gradient are computable quickly (and the prox-function is continuous, strongly convex, and differentiable), we say that the prox-function is nice (Hoda et al. 2010). With nice prox-functions the overall algorithm is fast.
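For intuition, here is the standard simplex case in code (my illustration, not code from the cited papers): with the entropy prox-function discussed below, the conjugate and its gradient have closed forms, and ∇d*(s) is just a softmax — which is what makes this prox-function "nice."

```python
import math

def entropy_prox(x):
    """Entropy prox-function on the simplex: d(x) = ln k + sum_i x_i ln x_i (with 0 ln 0 = 0)."""
    k = len(x)
    return math.log(k) + sum(xi * math.log(xi) for xi in x if xi > 0.0)

def grad_conjugate(s):
    """Gradient of the conjugate, argmax over the simplex of s^T x - d(x): a softmax."""
    m = max(s)                                   # subtract max for numerical stability
    exps = [math.exp(si - m) for si in s]
    z = sum(exps)
    return [e / z for e in exps]

# Sanity check: the gradient lies on the simplex and favors large entries of s.
print(grad_conjugate([1.0, 2.0, 0.5]))
```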


In normal form (also known as bimatrix) games, the strategy of each player lives in a simplex, that is, the probabilities on her actions sum to one. For the k-dimensional simplex Δ_k, the entropy function

$$d(x) = \ln k + \sum_{i=1}^{k} x_i \ln x_i$$

and the Euclidean distance function

$$d(x) = \frac{1}{2} \sum_{i=1}^{k} (x_i - 1/k)^2$$

are nice prox-functions. In multistep games, the strategies live in a more complex space. The common way to represent multistep games of perfect recall is the sequence form (Romanovskii 1962; Koller, Megiddo, and von Stengel 1996; von Stengel 1996). Our theorem 3 enables the use of Nesterov's excessive gap technique for these games.

Theorem 3 (Hoda et al. 2010). There is a way to construct nice prox-functions for sequence form games.

Nesterov's Excessive Gap Technique (EGT). For µ_X, µ_Y > 0, consider the pair of problems:

$$f_{\mu_Y}(x) := \max \{y^T A x - \mu_Y d_Y(y) : y \in Y\},$$
$$\phi_{\mu_X}(y) := \min \{y^T A x + \mu_X d_X(x) : x \in X\}.$$

Algorithm 4 generates iterates (x^k, y^k, µ_X^k, µ_Y^k) with µ_X^k, µ_Y^k decreasing to zero and such that the following excessive gap condition is satisfied at each iteration:

$$f_{\mu_Y}(x) \le \phi_{\mu_X}(y). \qquad (5)$$

From the excessive gap condition (equation 5) and the fact f(x) ≥ φ(y), we see that

$$0 \le f(x) - \phi(y) \le \mu_X D_X + \mu_Y D_Y. \qquad (6)$$

Consequently, f(x^k) ≈ φ(y^k) when µ_X^k and µ_Y^k are small.

Algorithm 2 finds a starting point that satisfies the excessive gap condition (equation 5). Algorithm 3 decreases µ_X and µ_Y while maintaining equation 5.

initial(A, d_X, d_Y)
1. µ_X^0 := µ_Y^0 := ‖A‖ / √(σ_X σ_Y)
2. x̂ := ∇d_X*(0)
3. y^0 := ∇d_Y*((1/µ_Y^0) A x̂)
4. x^0 := ∇d_X*(∇d_X(x̂) + (1/µ_X^0) A^T y^0)
5. Return (µ_X^0, µ_Y^0, x^0, y^0)

Algorithm 2.

shrink(A, µ_X, µ_Y, τ, x, y, d_X, d_Y)
1. x̆ := ∇d_X*(−(1/µ_X) A^T y)
2. x̂ := (1 − τ)x + τx̆
3. ŷ := ∇d_Y*((1/µ_Y) A x̂)
4. x̃ := ∇d_X*(∇d_X(x̆) − (τ/((1 − τ)µ_X)) A^T ŷ)
5. y⁺ := (1 − τ)y + τŷ
6. x⁺ := (1 − τ)x + τx̃
7. µ_X⁺ := (1 − τ)µ_X
8. Return (µ_X⁺, x⁺, y⁺)

Algorithm 3.

If the input (µ_X, µ_Y, x, y) to algorithm 3 satisfies equation 5, then so does (µ_X⁺, µ_Y, x⁺, y⁺) as long as τ satisfies τ²/(1 − τ) ≤ µ_X µ_Y σ_X σ_Y /‖A‖² (Nesterov 2005a).

We are now ready to describe Nesterov's excessive gap technique specialized to equation 1 (see algorithm 4). By theorem 2, algorithm 4 — which we will refer to as EGT — finds an ε-equilibrium in O(1/ε) iterations. Furthermore, for games with ordered signals, each iteration runs in linear time in the size of the game tree (Hoda et al. 2010).

EGT(A, d_X, d_Y)
1. (µ_X^0, µ_Y^0, x^0, y^0) = initial(A, d_X, d_Y)
2. For k = 0, 1, ...:
   (a) τ := 2/(k + 3)
   (b) If k is even: // shrink µ_X
       i. (µ_X^{k+1}, x^{k+1}, y^{k+1}) := shrink(A, µ_X^k, µ_Y^k, τ, x^k, y^k, d_X, d_Y)
       ii. µ_Y^{k+1} := µ_Y^k
   (c) If k is odd: // shrink µ_Y
       i. (µ_Y^{k+1}, y^{k+1}, x^{k+1}) := shrink(−A^T, µ_Y^k, µ_X^k, τ, y^k, x^k, d_Y, d_X)
       ii. µ_X^{k+1} := µ_X^k

Algorithm 4.

Heuristics. EGT can be sped up by applying heuristics that decrease µ_X and µ_Y faster, while maintaining the excessive gap condition (equation 5) (Gilpin et al. 2007). This leads to faster convergence in practice without compromising any of the theoretical guarantees.

The first heuristic is based on the following observation: although the value τ = 2/(k + 3) computed in step 2(a) of EGT guarantees the excessive gap condition (equation 5), this is potentially an overly conservative value. Instead we can use an adaptive procedure to choose a larger value of τ. Since we then can no longer guarantee equation 5 a priori, we do a posterior verification, which occasionally necessitates an adjustment in the parameter τ.

The second heuristic is motivated by the observation that after several iterations, one of µ_X and µ_Y may be much smaller than the other. This imbalance is undesirable because the larger one contributes the most to the worst-case bound (equation 6). Hence, every so many iterations we perform a balancing to bring these values closer together. The balancing consists of repeatedly shrinking the larger one of µ_X and µ_Y.

We also observed that after such balancing, the values of µ_X and µ_Y can sometimes be further reduced without violating the excessive gap condition (equation 5). We thus include a final reduction step in the balancing.

Experiments on automatically abstracted poker games show that each of the two heuristics tends to reduce ε by about an order of magnitude. Those experiments also show that using the entropy prox-function at the leaves performs better than using the Euclidean prox-function.

Decomposed Game Representation to Save Memory. One attractive feature of first-order methods like EGT is that the only operation performed on the matrix A is a matrix-vector product. We can thus exploit the problem structure to store only an implicit representation of A (Gilpin et al. 2007). This representation relies on a certain type of decomposition that is present in games with ordered signals. For example, the betting sequences that can occur in most poker games are independent of the cards that are dealt. We can decompose the payoff matrix based on these two aspects.

For ease of exposition, we explain the concise representation in the context of Rhode Island Hold'em. The payoff matrix A can be written as

$$A = \begin{bmatrix} A_1 & & \\ & A_2 & \\ & & A_3 \end{bmatrix}$$

where

$$A_1 = F_1 \otimes B_1, \quad A_2 = F_2 \otimes B_2, \quad \text{and} \quad A_3 = F_3 \otimes B_3 + S \otimes W$$

for much smaller matrices F_i, B_i, S, and W. The matrices F_i correspond to sequences of moves in round i that end with a fold, and S corresponds to the sequences in round 3 that end in a showdown. The matrices B_i encode the betting structures in round i, while W encodes the win/lose/draw information determined by poker hand ranks. The symbol ⊗ denotes the Kronecker product. The Kronecker product of two matrices B ∈ ℝ^{m×n} and C ∈ ℝ^{p×q} is

$$B \otimes C = \begin{bmatrix} b_{11}C & \cdots & b_{1n}C \\ \vdots & \ddots & \vdots \\ b_{m1}C & \cdots & b_{mn}C \end{bmatrix} \in \mathbb{R}^{mp \times nq}.$$

Given the above decomposed representation of A, the space required is sublinear in the size of the game tree. For example, in Rhode Island Hold'em, the dimensions of the F_1 and F_2 matrices are 10 × 10 and 70 × 70 respectively. The dimension of the F_3 and S matrices is 490 × 490. The dimensions of B_1, B_2, and B_3 are 13 × 13, 205 × 205, and 1,774 × 1,774, respectively. By contrast, the matrix A is 883,741 × 883,741. Furthermore, the matrices F_i, B_i, S, and W are themselves sparse, so we capitalize on the Compressed Row Storage (CRS) data structure that only stores nonzero entries. Table 1 demonstrates that the decomposed game representation enables equilibrium finding in the large.

Speeding Up the Matrix-Vector Products Using Parallelization and Sampling. The matrix-vector operations dominate the run time of first-order algorithms like EGT. They can be parallelized on multiple cores with near-perfect efficiency (Gilpin et al. 2007). For further speedups, one can sample the payoff matrix A to construct a sparser matrix, and then run the algorithm on the latter. One can also redo the sampling dynamically if overfitting to the sparser matrix occurs (Gilpin and Sandholm 2010).
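The memory savings come from never materializing the Kronecker products: a product (F ⊗ B)x can be computed from the small factors alone. A self-contained illustration (mine, not the article's code, and using dense NumPy arrays rather than the compressed sparse storage described above):

```python
import numpy as np

def kron_matvec(F, B, x):
    """Compute (F kron B) @ x without forming the mp-by-nq Kronecker product.

    Uses the identity (F kron B) vec(X) = vec(B X F^T), with column-major vec.
    """
    m, n = F.shape
    p, q = B.shape
    X = x.reshape((q, n), order="F")          # unvec: x has length n*q
    return (B @ X @ F.T).reshape(-1, order="F")

# Check against the explicit Kronecker product on a small random instance.
rng = np.random.default_rng(0)
F, B = rng.standard_normal((4, 3)), rng.standard_normal((5, 2))
x = rng.standard_normal(3 * 2)
assert np.allclose(kron_matvec(F, B, x), np.kron(F, B) @ x)
print("matches explicit Kronecker product")
```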

Poker Players Created. In 2008, the Association for the Advancement of Artificial Intelligence (AAAI) held the third annual AAAI Computer Poker Competition, where computer programs submitted by teams worldwide competed against each other. We generated our players using lossy abstraction algorithms followed by equilibrium finding in the abstracted game using the EGT algorithm described above. GS4-Beta placed first (out of nine) in the Limit Bankroll Competition and Tartanian placed third (out of four) in the No-Limit Competition. Tartanian actually had the highest winning rate in the competition, but due to the winner determination rule for the competition, it got third place.


sequential games. This entailed developing a cus - tom dynamic program. Nam e CP LE X I PM CP LE X Si mpl ex EG T We added an outer loop that decreases e by a fac - 10 k 0.082 GB > 0.051 GB 0.012 GB tor e in between calls to smoothing. The key was 160 k 2.25 GB > 0.664 GB 0.035 GB then to prove that each such call to smoothing RI 25 .2 GB > 3.45 GB 0.15 GB uses only a constant number of first-order itera - tions (gradient steps). It follows immediately that Texa s > 458 GB > 458 GB 2.49 GB the overall algorithm runs in O(log 1/ e) first-order GS4 > 80 ,000 GB > 80 ,000 GB 43 .96 GB iterations (theorem 4). Experiments verified that this algorithm con - verges faster than EGT, and the speed difference increases systematically the smaller e is. Current Table 1. work includes developing robust software imple - mentations of this algorithm that can scale to the Memory footprint of CPLEX interior-point method (IPM), CPLEX simplex, large. and our memory-efficient version of EGT. 10k and 160k are lossy abstractions Our O(log 1/ e) convergence rate matches the of Rhode Island Hold’em, and RI is lossless. Texas and GS4 are lossy abstrac - best known convergence rate: that of interior- tions of Texas Hold’em. point methods. At the same time, our algorithm — being a first-order method — is scalable while cur - rent interior-point methods require a prohibitive amount of memory. Algorithm Based on Counterfactual Th e algorith m find s an ²-equilibriu m in √ √ Regret Minimization (CFR) 2 2 · e · κ(A) · ln(2 kAk/² ) · D Soon after we developed an EGT-based algorithm first-o rde r iterations , whe re k·k i s th eEuclidea n matrix for sequential games, another algorithm, counter - norm, D is th e maximu mEuclidea n distanc e between factual regret (CFR) was introduced that can also strategies , and κ(A) i saconditio n measu re of A. find an e-equilibrium with small e in games with 10 12 leaves in the game tree (Zinkevich et al. 2007). It is based on totally different principles than the gradient-based equilibrium-finding algorithms described above. Specifically, it is based on regret Theorem 4. minimization principles, and it is crafted cleverly (Gilpin and Sandholm 2008a). so they can be applied at individual information sets (usually they are employed in the space of strategies; that would not scale to large sequential games). of nine) in the Limit Bankroll Competition and The average overall regret for agent i, RN is how Tartanian placed third (out of four) in the No-Lim - i much better off i would have been on average in it Competition. Tartanian actually had the highest repetitions 1.. N of the game by using his or her winning rate in the competition, but due to the best fixed strategy than by having played the way winner determination rule for the competition, it he or she did. If both players have average overall got third place. regret less than e, then Agent 1’s time-averaged A Gradient-Based Algorithm with O(log 1/ e) strategy and Agent 2’s time-averaged strategy con - Convergence. Recently we developed a gradient- stitute a 2 e-equilibrium. based equilibrium-finding algorithm that finds an The design by Zinkevich et al. starts by studying e-equilibrium exponentially faster: in O(log 1/ e) one information set I and player i’s choices made

iterations (Gilpin, Peña, and Sandholm 2008). It in I. Define counterfactual utility ui(s, I) to be the uses as a subroutine a procedure we call smoothing, expected utility given that information set I is which is a recent smoothing technique for non - reached and all players play using strategy s except smooth convex optimization (Nesterov 2005b). that player i plays to reach I. Formally, letting ps(h, The algorithm is unlike EGT in that there is only hЈ) be the probability of going from history h to hЈ, one function that is being optimized and and letting Z be the set of terminal histories:

smoothed instead of two. Also unlike in EGT, the σ σ 0 0 Ph∈I,h 0∈Z π−i (h)π (h, h )ui (h ) target e needs to be specified in advance. This is ui (σ,I)= πσ (I ) used to set the constant µ that is used as the mul - −i

Algorithm Based on Counterfactual Regret Minimization (CFR)

Soon after we developed an EGT-based algorithm for sequential games, another algorithm, counterfactual regret minimization (CFR), was introduced that can also find an ε-equilibrium with small ε in games with 10^12 leaves in the game tree (Zinkevich et al. 2007). It is based on totally different principles than the gradient-based equilibrium-finding algorithms described above. Specifically, it is based on regret minimization principles, crafted cleverly so that they can be applied at individual information sets (usually regret minimization is employed in the space of full strategies; that would not scale to large sequential games).

The average overall regret for agent i, R^N_i, is how much better off i would have been on average in repetitions 1..N of the game by using his or her best fixed strategy than by having played the way he or she did. If both players have average overall regret less than ε, then Agent 1's time-averaged strategy and Agent 2's time-averaged strategy constitute a 2ε-equilibrium.

The design by Zinkevich et al. starts by studying one information set I and player i's choices made in I. Define counterfactual utility u_i(σ, I) to be the expected utility given that information set I is reached and all players play using strategy σ except that player i plays to reach I. Formally, letting π^σ(h, h') be the probability of going from history h to h', and letting Z be the set of terminal histories,

u_i(\sigma, I) = \frac{\sum_{h \in I,\, h' \in Z} \pi^{\sigma}_{-i}(h)\, \pi^{\sigma}(h, h')\, u_i(h')}{\pi^{\sigma}_{-i}(I)}.


For all actions a ∈ A(I), define σ|_{I→a} to be a strategy profile like σ except that player i always chooses action a in I. The immediate counterfactual regret is

R^N_{i,\mathrm{imm}}(I) = \max_{a \in A(I)} \frac{1}{N} \sum_{t=1}^{N} \pi^{\sigma^t}_{-i}(I)\, \big( u_i(\sigma^t|_{I \to a}, I) - u_i(\sigma^t, I) \big).

This is the player's regret in her decisions at I in terms of counterfactual utility, with an additional weighting term for the counterfactual probability that I would be reached if the player had tried to do so. Denoting

R^{N,+}_{i,\mathrm{imm}}(I) = \max\{ R^N_{i,\mathrm{imm}}(I),\, 0 \},

it turns out that the average overall regret can be bounded by the local ones:

R^N_i \le \sum_{I \in \mathcal{I}_i} R^{N,+}_{i,\mathrm{imm}}(I).

Theorem 5 (Zinkevich et al. 2007). Let Δ_i be the difference between i's highest and lowest payoff in the game. With the CFR algorithm,

R^N_{i,\mathrm{imm}}(I) \le \Delta_i \sqrt{|A_i|} / \sqrt{N}, \quad \text{and thus} \quad R^N_i \le \Delta_i\, |\mathcal{I}_i|\, \sqrt{|A_i|} / \sqrt{N},

where |A_i| = \max_{h : \mathrm{Player}(h) = i} |A(h)|.

The key is that immediate counterfactual regret can be minimized by controlling only σ_i(I). The CFR algorithm maintains, for all I ∈ \mathcal{I}_i and all a ∈ A(I),

R^N_i(I, a) = \frac{1}{N} \sum_{t=1}^{N} \pi^{\sigma^t}_{-i}(I)\, \big( u_i(\sigma^t|_{I \to a}, I) - u_i(\sigma^t, I) \big).

Let the positive counterfactual regret be

R^{N,+}_i(I, a) = \max\{ R^N_i(I, a),\, 0 \}.

The CFR algorithm simulates the players playing the game repeatedly. Actions for each agent are selected in proportion to their positive counterfactual regret. (If no action has positive counterfactual regret, then the action is selected randomly.) The output is a strategy for each agent; it is the time-averaged strategy computed from the agent's strategies in repetitions 1..N.
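This action-selection rule (play each action in proportion to its positive cumulative regret, or uniformly at random if no action has positive regret) is known as regret matching. The following self-contained Python sketch is only an illustration at a single decision point, not the full CFR algorithm over information sets: it runs regret matching in self-play on rock-paper-scissors, and the time-averaged strategies approach the uniform equilibrium, illustrating that small average regret yields an approximate equilibrium.

import numpy as np

A = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])   # row player's payoffs; the column player gets -A

def strategy_from_regret(regret):
    positive = np.maximum(regret, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total                      # proportional to positive regret
    return np.full(len(regret), 1.0 / len(regret))   # otherwise play uniformly at random

# Start from slightly asymmetric regrets so the dynamics are not trivially uniform.
regret_row = np.array([1.0, 0.0, 0.0])
regret_col = np.array([0.0, 1.0, 0.0])
avg_row = np.zeros(3)
avg_col = np.zeros(3)
N = 100_000
for _ in range(N):
    x = strategy_from_regret(regret_row)
    y = strategy_from_regret(regret_col)
    avg_row += x
    avg_col += y
    # Accumulate each pure action's regret against the opponent's current strategy.
    regret_row += A @ y - x @ A @ y
    regret_col += -(x @ A) + x @ A @ y

print(avg_row / N, avg_col / N)   # both approach the uniform equilibrium [1/3, 1/3, 1/3]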

CFR can be sped up by sampling the chance moves, that is, sampling bucket sequences in the abstraction, rather than considering all the histories that lead to the information set that is being updated. This enables approximately 750 iterations per second on a Texas Hold'em abstraction, and yields a smaller ε faster (Zinkevich et al. 2007). Other forms of sampling can also be used (Lanctot et al. 2009). By theorem 5, CFR runs in O(1/ε²) iterations (and the chance sampling incurs only a linear increase in the number of iterations), which is significantly worse than the O(1/ε) and O(log 1/ε) guarantees of the smoothed gradient-based techniques described above. In practice, on Texas Hold'em abstractions with 10^12 leaves, CFR is run for about a billion sampled iterations while EGT needs to be run only for about a hundred iterations. On the other hand, each sampled CFR iteration runs much faster than an iteration of those other algorithms: EGT takes hours per iteration (even when the matrix-vector products are parallelized across 96 cores). Our initial direct comparisons between EGT and CFR on small and medium-sized games show that either can have a significant overall speed advantage over the other, depending on the game.

A Practical Use of Imperfect Recall

All of the results above are for games with perfect recall, which includes most games played by computers. A recent idea has been to model a game of perfect recall, like poker, as a game of imperfect recall: the player uses a fine-grained abstraction in the poker round that she is in and then forgets some of the details of that round, so as to be able to have a larger branching factor in the information abstraction for the next round while keeping the overall abstracted game size manageable (Waugh et al. 2009b). Such imperfect recall abstracted games have been approached using CFR, but there are no convergence guarantees.

Equilibrium-Finding Algorithms for Multiplayer and Nonzero-Sum Games

While the abstraction methods discussed earlier are for n-player general-sum games, the equilibrium-finding algorithms above are for two-player zero-sum games. The problem of finding a Nash equilibrium in two-player general-sum games is PPAD-complete even in normal form games (Daskalakis, Goldberg, and Papadimitriou 2008; Chen and Deng 2006), suggesting that there likely is no polynomial-time algorithm for all instances. The PPAD-completeness holds even when payoffs are restricted to be binary (Abbott, Kane, and Valiant 2005). With three or more players, the problem becomes FIXP-complete even in the zero-sum case (Etessami and Yannakakis 2007).

To my knowledge the best equilibrium-finding algorithms for general multiplayer incomplete-information games are continuation methods (Govindan and Wilson 2003). They perturb the game by giving agents fixed bonuses, scaled by λ, for each of their actions. If the bonuses are large enough (and unique), they dominate the original game, so the agents need not consider their opponents' actions. There is thus a unique pure-strategy equilibrium that is easily determined at λ = 1. The continuation method can then be used to follow a path in the space of λ and equilibrium profiles for the resulting perturbed game, decreasing λ until it is zero, at which point the original game has been solved. The algorithm scales better for games that can be represented in a structured way using multiagent influence diagrams (Blum, Shelton, and Koller 2006). However, even then it has only been applied to relatively small games.


Leveraging Qualitative Models

A recent idea that scales to significantly larger games takes advantage of the fact that in many settings it is easier to infer qualitative models about the structure of equilibrium than it is to actually compute an equilibrium. For example, in (sequences of) take-it-or-leave-it offers, equilibria involve accepting offers above a certain threshold and rejecting offers below it. Threshold strategies are also common in auctions and in deciding when to make and break partnerships and contracts. In poker, the cards in the hand are private signals, and in equilibrium, often the same action is taken in continuous regions of the signal space (for example, Ankenman and Chen [2006]). The idea of using qualitative models as an extra input for equilibrium finding has been applied to continuous (and finite) multiplayer Bayesian games, a broad class of imperfect-information games that includes many variants of poker (Ganzfried and Sandholm 2010). Figure 6 shows an example of a qualitative model for a simplified poker game.

[Figure 6. A Qualitative Model for a Simplified Poker Game (Ganzfried and Sandholm 2010). Hand value (from 0 to 1) runs along the vertical axis; Player 1's action regions are on the left, Player 2's on the right.]

Given a qualitative model that is correct, there is a mixed integer linear feasibility program (MILFP) that finds an equilibrium (a mixed strategy equilibrium in games with a finite number of types and a pure strategy equilibrium in games with a continuum of types) (Ganzfried and Sandholm 2010). The paper also presents extensions of the algorithm to games with dependent private signal distributions, many players, and multiple candidate qualitative models of which only some are correct. Experiments show that the algorithm can still compute an equilibrium even when it is not clear whether any of the models are correct, and an efficient procedure is given for checking the output in the event that they are not correct. The MILFP finds an exact equilibrium in two-player games, and an ε-equilibrium in multiplayer games. It also yields a MILFP for solving general multiplayer imperfect-information games given in extensive form without any qualitative models. For most of these game classes, no prior algorithm was known.

Experiments suggest that modeling a finite game with an infinite one with a continuum of types can significantly outperform abstraction-based approaches on some games. Thus, if one is able to construct a correct qualitative model, solving the MILFP formulation of the infinite approximation of a game could potentially be the most efficient approach to solving certain classes of large finite games (and the only approach to solving the infinite version). The main algorithm was used to improve play in two-player Limit Texas Hold'em by solving endgames. In addition, experiments demonstrated that the algorithm was able to efficiently compute equilibria in several infinite three-player games.

Solving Multiplayer Stochastic Games of Imperfect Information

Significant progress has also been made on equilibrium finding in multiplayer stochastic games of imperfect information (Ganzfried and Sandholm 2008; 2009). For example, consider a No-Limit Texas Hold'em tournament. The best way to play differs from how one should play an individual hand because there are considerations of bankroll management (one gets eliminated once one runs out of chips) and the payoffs are based on ranks in the tournament rather than chips. This becomes especially important near the end of a tournament (where the antes are large). One simple strategy restriction is to always go all-in or fold in Round 1 (that is, once the private cards have been dealt but no public cards have). In the two-player case, the best strategy in that restricted space is almost optimal against an unrestricted opponent (Miltersen and Sørensen 2007). It turns out that if all players are restricted in this way, one can find an ε-equilibrium for the multiplayer game (Ganzfried and Sandholm 2008; 2009).

The algorithms have an inner loop that determines ε-equilibrium strategies for playing a hand at a given state (stack vector, that is, one stack of chips per player) given the values of possible future states. This is done for all states. The iteration of the outer loop adjusts the values of the different states in light of the new payoffs obtained from the inner loop. Then the inner loop is executed again until convergence, then the outer loop, and so on. For instance, fictitious play can be used for the inner loop and policy iteration for solving Markov decision processes for the outer loop. Several other variants were also studied. None of the variants are guaranteed to converge, but some of them have the property that if they converge, they converge to an equilibrium. In practice, both the inner and outer loop converge quickly in all of the tested variants. This suggests that fictitious play is another promising algorithm for multiplayer imperfect-information games.
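Fictitious play itself is simple to state: each player repeatedly best-responds to the empirical average of the opponents' past play. The small Python sketch below is only an illustration of that inner-loop idea on a single zero-sum matrix game, not on the jam/fold hand games used in the tournament work; in the two-player zero-sum case the empirical averages converge to an equilibrium.

import numpy as np

A = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])   # row player maximizes x^T A y

row_counts = np.ones(3)            # start from one fictitious play of each action
col_counts = np.ones(3)
for _ in range(100_000):
    y_emp = col_counts / col_counts.sum()
    x_emp = row_counts / row_counts.sum()
    row_counts[np.argmax(A @ y_emp)] += 1    # best response to the opponent's average
    col_counts[np.argmin(x_emp @ A)] += 1
print(row_counts / row_counts.sum(), col_counts / col_counts.sum())
# The empirical mixtures approach the uniform equilibrium [1/3, 1/3, 1/3].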

Opponent Exploitation

So far I have discussed approaches based on game theory. A totally different approach is to try to learn to exploit opponents.

Two-player zero-sum games have the nice property that our player can only benefit if the opponent does not play an equilibrium strategy. Furthermore, it does not matter which equilibrium strategies are selected: if (x, y) and (x', y') are equilibria, then so are (x, y') and (x', y). However, even in two-player zero-sum games, the equilibrium strategy might not maximally exploit an opponent that does not play equilibrium. In multiplayer games, there is the further complication of equilibrium selection.

There is a long history of opponent-exploitation research in AI (for example, by building models of opponents). That has also been studied for poker (for example, Billings et al. [1998], Southey et al. [2005], and Bard and Bowling [2007]). In practice in large games like Texas Hold'em (even in the two-player case), even with relatively large numbers of hands to learn from, those approaches are far inferior to the game-theory-based approaches. For example, our poker player that was constructed using potential-aware abstraction and EGT, and used no learning, won the Bankroll Competition in the AAAI 2008 Computer Poker Competition. This was noteworthy because the Bankroll Competition is designed to favor learning programs that can take advantage of weak opponents.

One weakness in the learning approach is the get-taught-and-exploited problem (Sandholm 2007): an opponent might play in a way to teach the learner a model, and then exploit the learner that attempts to use that model. Furthermore, the opponent might lose significantly less from the teaching than he gains from the exploitation.

One recent approach that has been pursued both at the University of Alberta's poker research group (Johanson, Zinkevich, and Bowling 2007; Johanson and Bowling 2009) and in mine is to start with a game-theory-based strategy and then adjust it in limited ways to exploit the opponent as we learn more about the opponent. This already yielded a win in the Bankroll Competition in the AAAI 2009 Computer Poker Competition, and is a promising direction for the future.

There are some fundamental limits, however. Can this be done in a safe way? That is, can one exploit to some extent beyond the game-theoretic equilibrium strategy while still maintaining at least the same expected payoff as the equilibrium strategy? Recently Sam Ganzfried and I proved that this is impossible. So, in order to increase exploitation, one needs to sacrifice some of the game-theoretic safety guarantee.
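To see the tension concretely, consider a zero-sum matrix game: against a fixed opponent model, the maximally exploitative strategy is simply a best response to that model, but it typically gives up the equilibrium strategy's worst-case (safety) guarantee. The following Python sketch is an illustration only, not any of the algorithms cited above.

import numpy as np

A = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])               # our payoffs as the row player

equilibrium = np.ones(3) / 3
opponent_model = np.array([0.6, 0.2, 0.2])     # estimate: opponent throws rock too often

best_response = np.zeros(3)
best_response[np.argmax(A @ opponent_model)] = 1.0   # pure best response (paper)

def value_vs(x, y):        # our expected payoff when we play x and the opponent plays y
    return x @ A @ y

def worst_case(x):         # our guaranteed payoff against a worst-case opponent
    return (x @ A).min()

print(value_vs(equilibrium, opponent_model), worst_case(equilibrium))      # 0.0  0.0
print(value_vs(best_response, opponent_model), worst_case(best_response))  # 0.4 -1.0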

Additional Topics

Beyond what I discussed so far, there are other interesting developments in the computation of solutions to incomplete-information games. Let me briefly discuss some of them here.

One question is whether Nash equilibrium is the right solution concept. In two-player zero-sum games it provides a safety guarantee as discussed above, but in more general games it does not, because equilibrium selection can be an issue. Even in two-player zero-sum games, an equilibrium strategy may not play the rest of the game optimally if the opponent makes a mistake. Various equilibrium refinements can be used to prune such equilibrium strategies from consideration, and there has been some work on computing equilibrium strategies that honor such refinements (for example, Miltersen and Sørensen [2010; 2006; 2008]). In multiplayer games there is also the possibility of collusion (coordination of strategies and/or information), and there are coalitional equilibrium refinements (for example, Milgrom and Roberts [1996], Moreno and Wooders [1996], and Ray [1996]).


There is also work on other general classes of games. An optimal polynomial algorithm was recently developed for repeated incomplete-information games (Gilpin and Sandholm 2008b). Work has also been done on Kriegspiel (chess where the players do not observe each other's moves), but the best-performing techniques are still based on sampling of the game tree rather than game-theoretic approaches (Ciancarini and Favini 2009). There has been work on computing commitment (Stackelberg) strategies, where Player 1 has to commit to a mixed strategy first and then Player 2 picks a strategy, in normal form games, with significant security applications (for example, Conitzer and Sandholm [2006]; Jain et al. [2010]). Recently that has been studied also in sequential incomplete-information games (Letchford and Conitzer 2010; Kiekintveld, Tambe, and Marecki 2010).
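To illustrate what commitment buys, the brute-force Python sketch below (an illustration only; practical solvers instead solve one linear program per follower response, along the lines of Conitzer and Sandholm [2006]) grid-searches over the leader's mixed strategies in a small normal form game, lets the follower best-respond with ties broken in the leader's favor, and reports the best commitment found. The payoff matrices are a made-up example in which committing to a mixed strategy helps the leader.

import numpy as np

# Entry [i][j] is the payoff when the leader plays row i and the follower column j.
leader_payoff   = np.array([[1.0, 3.0],
                            [0.0, 2.0]])
follower_payoff = np.array([[1.0, 0.0],
                            [0.0, 1.0]])

best_value, best_p = -np.inf, None
for p_top in np.linspace(0.0, 1.0, 1001):   # leader's probability of the top row
    x = np.array([p_top, 1.0 - p_top])
    follower_values = x @ follower_payoff   # follower's expected payoff per column
    ties = np.flatnonzero(np.isclose(follower_values, follower_values.max()))
    value = max((x @ leader_payoff)[j] for j in ties)   # ties broken in leader's favor
    if value > best_value:
        best_value, best_p = value, p_top
print(best_p, best_value)
# Committing to roughly (0.5, 0.5) yields about 2.5 for the leader, whereas
# without commitment the leader's dominant top row earns only 1 (the follower
# would answer with the left column).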
Conclusions

There has been tremendous progress on solving incomplete-information games in the last five years. For some rather large games, like two-player Rhode Island Hold'em, an optimal strategy has been computed. An optimal strategy is not yet known for any variant of Texas Hold'em, but in two-player Limit Texas Hold'em, a game that is frequently used in competitions among professional poker players, computers have surpassed humans. In the No-Limit and multiplayer variants humans remain ahead.

This is a very active and fertile research area. I hope this article helps newcomers enter the field, spurs interest in further pushing the boundary of what is computationally possible, and facilitates adoption of these approaches in additional games of importance, for example, negotiation, auctions, and various security applications.

Acknowledgments

I want to thank Andrew Gilpin, Javier Peña, Sam Ganzfried, Troels Bjerre Sørensen, and Sam Hoda; their contributions to this research were essential. I also thank Kevin Waugh for conducting experiments between EGT and CFR while he was in my laboratory. This material is based upon work supported by the National Science Foundation under grants ITR-0427858, IIS-0905390, and IIS-0964579. I also thank IBM and Intel for their machine gifts, and the Pittsburgh Supercomputing Center for compute time and help.

Notes

1. The reader is invited to play against this strategy at www.cs.cmu.edu/~gilpin/gsi.html.

2. An extreme form of action abstraction is to restrict analysis to a small number of strategies and conduct equilibrium analysis among them (Wellman 2006).

3. The operator norm of A is defined as \|A\| := \max\{ y^T A x : \|x\|, \|y\| \le 1 \}, where the norms \|x\| and \|y\| are those associated with S_X and S_Y.

References

Abbott, T.; Kane, D.; and Valiant, P. 2005. On the Complexity of Two-Player Win-Lose Games. In Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS-05). Los Alamitos, CA: IEEE Computer Society.

Ankenman, J., and Chen, B. 2006. The Mathematics of Poker. Pittsburgh, PA: ConJelCo LLC.

Bard, N., and Bowling, M. 2007. Particle Filtering for Dynamic Agent Modelling in Simplified Poker. In Proceedings of the 22nd AAAI Conference on Artificial Intelligence (AAAI-07), 515–521. Menlo Park, CA: AAAI Press.

Billings, D.; Burch, N.; Davidson, A.; Holte, R.; Schaeffer, J.; Schauenberg, T.; and Szafron, D. 2003. Approximating Game-Theoretic Optimal Strategies for Full-Scale Poker. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI-03). San Francisco: Morgan Kaufmann Publishers.

Billings, D.; Davidson, A.; Schaeffer, J.; and Szafron, D. 2002. The Challenge of Poker. Artificial Intelligence 134(1–2): 201–240.

Billings, D.; Papp, D.; Schaeffer, J.; and Szafron, D. 1998. Opponent Modeling in Poker. In Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98), 493–499. Menlo Park, CA: AAAI Press.

Blum, B.; Shelton, C. R.; and Koller, D. 2006. A Continuation Method for Nash Equilibria in Structured Games. Journal of Artificial Intelligence Research 25: 457–502.

Chen, X., and Deng, X. 2006. Settling the Complexity of 2-Player Nash Equilibrium. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS). Los Alamitos, CA: IEEE Computer Society.

Ciancarini, P., and Favini, G. P. 2009. Monte Carlo Tree Search Techniques in the Game of Kriegspiel. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI-09). Menlo Park, CA: AAAI Press.

Conitzer, V., and Sandholm, T. 2006. Computing the Optimal Strategy to Commit to. In Proceedings of the 11th ACM Conference on Electronic Commerce, 83–92. New York: Association for Computing Machinery.

Daskalakis, C.; Goldberg, P.; and Papadimitriou, C. 2008. The Complexity of Computing a Nash Equilibrium. SIAM Journal on Computing 39(1): 195–259.

Etessami, K., and Yannakakis, M. 2007. On the Complexity of Nash Equilibria and Other Fixed Points (Extended Abstract). In Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS-07), 113–123. Los Alamitos, CA: IEEE Computer Society.

Ganzfried, S., and Sandholm, T. 2010. Computing Equilibria by Incorporating Qualitative Models. In Proceedings of the Ninth International Conference on Autonomous Agents and MultiAgent Systems (AAMAS 2010). Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems.

Ganzfried, S., and Sandholm, T. 2009. Computing Equilibria in Multiplayer Stochastic Games of Imperfect Information. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI-09). Menlo Park, CA: AAAI Press.

Ganzfried, S., and Sandholm, T. 2008. Computing an Approximate Jam/Fold Equilibrium for 3-Agent No-Limit Texas Hold'em Tournaments. In Proceedings of the Seventh International Conference on Autonomous Agents and MultiAgent Systems (AAMAS 2008). Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems.

Gilpin, A., and Sandholm, T. 2010. Speeding Up Gradient-Based Algorithms for Sequential Games. In Proceedings of the Ninth International Conference on Autonomous Agents and MultiAgent Systems (AAMAS 2010). Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems.

Gilpin, A., and Sandholm, T. 2008a. Expectation-Based Versus Potential-Aware Automated Abstraction in Imperfect Information Games: An Experimental Comparison Using Poker. Short paper. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI-08). Menlo Park, CA: AAAI Press.

Gilpin, A., and Sandholm, T. 2008b. Solving Two-Person Zero-Sum Repeated Games of Incomplete Information. In Proceedings of the Seventh International Conference on Autonomous Agents and MultiAgent Systems (AAMAS 2008). Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems.

Gilpin, A., and Sandholm, T. 2007a. Better Automated Abstraction Techniques for Imperfect Information Games, with Application to Texas Hold'em Poker. In Proceedings of the 6th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS 2007), 1168–1175. Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems.

Gilpin, A., and Sandholm, T. 2007b. Lossless Abstraction of Imperfect Information Games. Journal of the ACM 54(5): 1–32.

Gilpin, A., and Sandholm, T. 2006. A Competitive Texas Hold'em Poker Player via Automated Abstraction and Real-Time Equilibrium Computation. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI-06), 1007–1013. Menlo Park, CA: AAAI Press.

Gilpin, A.; Hoda, S.; Peña, J.; and Sandholm, T. 2007. Gradient-Based Algorithms for Finding Nash Equilibria in Extensive Form Games. In Proceedings of the 3rd International Workshop on Internet and Network Economics (WINE '07). Berlin: Springer-Verlag.

Gilpin, A.; Peña, J.; and Sandholm, T. 2008. First-Order Algorithm with O(log(1/ε)) Convergence for ε-Equilibrium in Games. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI-08). Menlo Park, CA: AAAI Press.

Gilpin, A.; Sandholm, T.; and Sørensen, T. B. 2008. A Heads-Up No-Limit Texas Hold'em Poker Player: Discretized Betting Models and Automatically Generated Equilibrium-Finding Programs. In Proceedings of the Seventh International Conference on Autonomous Agents and MultiAgent Systems (AAMAS 2008). Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems.

Gilpin, A.; Sandholm, T.; and Sørensen, T. B. 2007. Potential-Aware Automated Abstraction of Sequential Games, and Holistic Equilibrium Analysis of Texas Hold'em Poker. In Proceedings of the 22nd AAAI Conference on Artificial Intelligence (AAAI-07). Menlo Park, CA: AAAI Press.

Goffin, J.-L. 1977. On the Convergence Rate of Subgradient Optimization Methods. Mathematical Programming 13(1): 329–347.

Govindan, S., and Wilson, R. 2003. A Global Newton Method to Compute Nash Equilibria. Journal of Economic Theory 110(1): 65–86.

Hoda, S.; Gilpin, A.; Peña, J.; and Sandholm, T. 2010. Smoothing Techniques for Computing Nash Equilibria of Sequential Games. Mathematics of Operations Research 35(2): 494–512.

Jain, M.; Tsai, J.; Pita, J.; Kiekintveld, C.; Rathi, S.; Ordóñez, F.; and Tambe, M. 2010. Software Assistants for Randomized Patrol Planning for the LAX Airport Police and the Federal Air Marshals Service. Interfaces 40(4): 267–290.

Johanson, M., and Bowling, M. 2009. Data Biased Robust Counter Strategies. Paper presented at the 12th International Conference on Artificial Intelligence and Statistics (AISTATS), Clearwater Beach, FL, 16–18 April.

Johanson, M.; Zinkevich, M.; and Bowling, M. 2007. Computing Robust Counter-Strategies. In Proceedings of the 21st Annual Conference on Neural Information Processing Systems (NIPS 2007). Cambridge, MA: The MIT Press.

Kiekintveld, C.; Tambe, M.; and Marecki, J. 2010. Robust Bayesian Methods for Stackelberg Security Games (Extended Abstract). In Proceedings of the Ninth International Conference on Autonomous Agents and MultiAgent Systems (AAMAS 2010). Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems.

Koller, D., and Pfeffer, A. 1997. Representations and Solutions for Game-Theoretic Problems. Artificial Intelligence 94(1): 167–215.

Koller, D.; Megiddo, N.; and von Stengel, B. 1996. Efficient Computation of Equilibria for Extensive Two-Person Games. Games and Economic Behavior 14(2): 247–259.

Kuhn, H. W. 1950. A Simplified Two-Person Poker. In Contributions to the Theory of Games, Annals of Mathematics Studies 24, ed. H. W. Kuhn and A. W. Tucker, volume 1, 97–103. Princeton, NJ: Princeton University Press.

Lanctot, M.; Waugh, K.; Zinkevich, M.; and Bowling, M. 2009. Monte Carlo Sampling for Regret Minimization in Extensive Games. In Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS 2009), 1078–1086. Cambridge, MA: The MIT Press.


Letchford, J., and Conitzer, V. 2010. Computing Optimal Strategies to Commit to in Extensive-Form Games. In Proceedings of the 11th ACM Conference on Electronic Commerce, 83–92. New York: Association for Computing Machinery.

Milgrom, P., and Roberts, J. 1996. Coalition-Proofness and Correlation with Arbitrary Communication Possibilities. Games and Economic Behavior 17(1): 113–128.

Miltersen, P. B., and Sørensen, T. B. 2010. Computing a Quasi-Perfect Equilibrium of a Two-Player Game. Economic Theory 42(1): 175–192.

Miltersen, P. B., and Sørensen, T. B. 2008. Fast Algorithms for Finding Proper Strategies in Game Trees. In Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 874–883. Philadelphia, PA: Society for Industrial and Applied Mathematics.

Miltersen, P. B., and Sørensen, T. B. 2007. A Near-Optimal Strategy for a Heads-Up No-Limit Texas Hold'em Poker Tournament. In Proceedings of the Sixth International Conference on Autonomous Agents and MultiAgent Systems (AAMAS 2007). Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems.

Miltersen, P. B., and Sørensen, T. B. 2006. Computing Proper Equilibria of Zero-Sum Games. In Proceedings of the 5th International Conference on Computers and Games, Lecture Notes in Computer Science. Berlin: Springer-Verlag.

Moreno, D., and Wooders, J. 1996. Coalition-Proof Equilibrium. Games and Economic Behavior 17(1): 80–112.

Nemhauser, G., and Wolsey, L. 1999. Integer and Combinatorial Optimization. John Wiley & Sons.

Nesterov, Y. 2005a. Excessive Gap Technique in Nonsmooth Convex Minimization. SIAM Journal on Optimization 16(1): 235–249.

Nesterov, Y. 2005b. Smooth Minimization of Nonsmooth Functions. Mathematical Programming 103: 127–152.

Ray, I. 1996. Coalition-Proof Correlated Equilibrium: A Definition. Games and Economic Behavior 17(1): 56–79.

Romanovskii, I. 1962. Reduction of a Game with Complete Memory to a Matrix Game. Soviet Mathematics 3: 678–681.

Sandholm, T. 2007. Perspectives on Multiagent Learning. Artificial Intelligence 171(7): 382–391.

Schnizlein, D.; Bowling, M.; and Szafron, D. 2009. Probabilistic State Translation in Extensive Games with Large Action Sets. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI-2009). Menlo Park, CA: AAAI Press.

Shi, J., and Littman, M. 2002. Abstraction Methods for Game Theoretic Poker. In Revised Papers from the Second International Conference on Computers and Games, Lecture Notes in Computer Science, 333–345. Berlin: Springer-Verlag.

Sklansky, D. 1999. The Theory of Poker, fourth edition. Las Vegas, NV: Two Plus Two Publishing.

Southey, F.; Bowling, M.; Larson, B.; Piccione, C.; Burch, N.; Billings, D.; and Rayner, C. 2005. Bayes' Bluff: Opponent Modelling in Poker. In Proceedings of the 21st Annual Conference on Uncertainty in Artificial Intelligence (UAI 2005), 550–558. Redmond, WA: AUAI Press.

von Stengel, B. 1996. Efficient Computation of Behavior Strategies. Games and Economic Behavior 14(2): 220–246.

Waugh, K.; Bard, N.; and Bowling, M. 2009. Strategy Grafting in Extensive Games. In Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS 2009). Cambridge, MA: The MIT Press.

Waugh, K.; Schnizlein, D.; Bowling, M.; and Szafron, D. 2009a. Abstraction Pathologies in Extensive Games. In Proceedings of the 8th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS 2009). Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems.

Waugh, K.; Zinkevich, M.; Johanson, M.; Kan, M.; Schnizlein, D.; and Bowling, M. 2009b. A Practical Use of Imperfect Recall. In Proceedings of the 8th Symposium on Abstraction, Reformulation and Approximation (SARA 2009). Menlo Park, CA: AAAI Press.

Wellman, M. 2006. Methods for Empirical Game-Theoretic Analysis (Extended Abstract). In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI-06), 1552–1555. Menlo Park, CA: AAAI Press.

Zinkevich, M.; Bowling, M.; Johanson, M.; and Piccione, C. 2007. Regret Minimization in Games with Incomplete Information. In Proceedings of the Twenty-First Annual Conference on Neural Information Processing Systems (NIPS 2007). Cambridge, MA: The MIT Press.

Tuomas Sandholm is a professor in the Computer Science Department at Carnegie Mellon University. He has published more than 400 papers on artificial intelligence, game theory, electronic commerce, multiagent systems, auctions and exchanges, automated negotiation and contracting, coalition formation, voting, search and integer programming, safe exchange, normative models of bounded rationality, resource-bounded reasoning, machine learning, and networks. He has 20 years of experience building optimization-based electronic marketplaces and has fielded several of his systems. He was founder, chairman, and CTO/chief scientist of CombineNet, Inc. from 1997 until its acquisition in 2010. During this period the company commercialized over 800 large-scale generalized combinatorial auctions, with over $50 billion in total spending and over $6 billion in generated savings. His technology also runs the nationwide kidney exchange. He received his Ph.D. and M.S. degrees in computer science from the University of Massachusetts at Amherst in 1996 and 1994. He earned an M.S. (B.S. included) with distinction in industrial engineering and management science from the Helsinki University of Technology, Finland, in 1991. He is recipient of the NSF Career Award, the inaugural ACM Autonomous Agents Research Award, the Alfred P. Sloan Foundation Fellowship, the Carnegie Science Center Award for Excellence, and the Computers and Thought Award. He is Fellow of the ACM and AAAI.
