Solving Games.Aimag11.Pdf

The State of Solving Large Incomplete-Information Games, and Application to Poker

Tuomas Sandholm

Game-theoretic solution concepts prescribe how rational parties should act, but to become operational the concepts need to be accompanied by algorithms. I will review the state of solving incomplete-information games. They encompass many practical problems such as auctions, negotiations, and security applications. I will discuss them in the context of how they have transformed computer poker. In short, game-theoretic reasoning now scales to many large problems, outperforms the alternatives on those problems, and in some games beats the best humans.

Copyright © 2010, Association for the Advancement of Artificial Intelligence. All rights reserved. ISSN 0738-4602. AI Magazine, Winter 2010.

Game-theoretic solution concepts prescribe how rational parties should act in multiagent settings. This is nontrivial because an agent's utility-maximizing strategy generally depends on the other agents' strategies. The most famous solution concept for this is a Nash equilibrium: a strategy profile (one strategy for each agent) where no agent has incentive to deviate from her strategy given that others do not deviate from theirs.

In this article I will focus on incomplete-information games, that is, games where the agents do not entirely know the state of the game at all times. The usual way to model them is a game tree where the nodes (that is, states) are further grouped into information sets. In an information set, the player whose turn it is to move cannot distinguish between the states in the information set, but knows that the actual state is one of them. Incomplete-information games encompass most games of practical importance, including most negotiations, auctions, and many applications in information security and physical battle.

Such games are strategically challenging. A player has to reason about what others' actions signal about their knowledge. Conversely, the player has to be careful about not signaling too much about her own knowledge to others through her actions. Such games cannot be solved using methods for complete-information games like checkers, chess, or Go. Instead, I will review new game-independent algorithms for solving them.

Poker has emerged as a standard benchmark in this space (Shi and Littman 2002; Billings et al. 2002) for a number of reasons: (1) it exhibits the richness of reasoning about a probabilistic future, how to interpret others' actions as signals, and information hiding through careful action selection, (2) the game is unambiguously specified, (3) the game can be scaled to the desired complexity, (4) humans of a broad range of skill exist for comparison, (5) the game is fun, and (6) computers find interesting strategies automatically. For example, time-tested behaviors such as bluffing and slow play arise from the game-theoretic algorithms automatically rather than having to be explicitly programmed.

Kuhn poker, a game with three cards, was among the first applications discussed in game theory, and it was solved analytically by hand (Kuhn 1950). On large-scale poker games, the best computerized strategies for a long time were rule based. Nowadays, the best poker-playing programs are generated automatically using algorithms that are based on game-theoretic principles.

There has been tremendous progress on equilibrium-finding algorithms since 2005. Two-player zero-sum game trees with 10^12 leaves can now be solved near optimally. However, many real games are even larger. For example, two-player Limit Texas Hold'em poker has 10^18 leaves. For such large games, abstraction algorithms have emerged as practical preprocessors.

Most competitive poker-playing programs are nowadays generated using an abstraction algorithm followed by using a custom equilibrium-finding algorithm to solve the abstracted game. See figure 1. This paradigm was first used in Gilpin, Sandholm, and Sørensen (2007). Predecessors of the paradigm included handcrafting small abstractions (Billings et al. 2003), as well as solving automatically generated abstractions with general-purpose linear programming algorithms (Gilpin and Sandholm 2006; 2007a; 2007b).

Figure 1. Current Paradigm for Solving Large Incomplete-Information Games. (Diagram labels: original game; abstraction algorithm; abstracted game; custom algorithm for finding a Nash equilibrium; Nash equilibrium; reverse model; Nash equilibrium.)

In this article I will discuss abstraction algorithms first and equilibrium-finding algorithms second. After that I will address opponent exploitation and other topics.

Abstraction Algorithms

Abstraction algorithms take as input a description of the game and output a smaller but strategically similar (or even equivalent) game. The abstraction algorithms discussed here work with any finite number of players and do not assume a zero-sum game.

Information Abstraction

The most popular kind of abstraction is information abstraction. The game is abstracted so that the agents cannot distinguish some of the states that they can distinguish in the actual game. For example, in an abstracted poker hand, an agent is not able to observe all the nuances of the cards that she would normally observe.

Lossless Information Abstraction. It turns out that it is possible to do lossless information abstraction, which may seem like an oxymoron at first. The method I will describe (Gilpin and Sandholm 2007b) is for a class of games that we call games with ordered signals. It is structured, but still general enough to capture a wide range of strategic situations. A game with ordered signals consists of a finite number of rounds. Within a round, the players play a game on a directed tree (the tree can be different in different rounds). The only uncertainty players face stems from private signals the other players have received and from the unknown future signals. In other words, players observe each others' actions, but potentially not nature's actions. In each round, there can be public signals (announced to all players) and private signals (confidentially communicated to individual players). We assume that the legal actions that a player has are independent of the signals received. For example, in poker, the legal betting actions are independent of the cards received. Finally, the strongest assumption is that there is a total ordering among complete sets of signals, and the payoffs are increasing (not necessarily strictly) in this ordering. In poker, this ordering corresponds to the ranking of card hands.

The abstraction algorithm operates on the signal tree, which is the game tree with all the agents' action edges removed. We say that two sibling nodes in the signal tree are ordered game isomorphic if (1) when the nodes are leaves, the payoff vectors of the players (which payoff in the vector materializes depends on how the agents play) are the same at both nodes, and (2) when the nodes are interior nodes, there is a bipartite matching of the nodes' children so that only ordered game isomorphic children get matched.

The GameShrink algorithm is a bottom-up dynamic program that merges all ordered game isomorphic nodes. It runs in Õ(n^2) time, where n is the number of nodes in the signal tree. GameShrink tends to run in sublinear time and space in the size of the game tree because the signal tree is significantly smaller than the game tree in most nontrivial games. The beautiful aspect of this method is that it is lossless (theorem 1). A small example run of GameShrink is shown in figure 2.

Theorem 1 (Gilpin and Sandholm 2007b). Any Nash equilibrium of the shrunken game corresponds to a Nash equilibrium of the original game.

We applied GameShrink to Rhode Island Hold'em poker (Gilpin and Sandholm 2007b). That two-player game was invented as a testbed for computational game playing (Shi and Littman 2002). Applying the sequence form to Rhode [...] applied iterated elimination of dominated strategies, which further reduced this to 1,190,443 rows and 1,181,084 columns. GameShrink required less than one second to run. Then, using a 1.65 GHz IBM eServer p5 570 with 64 gigabytes of RAM (the LP solver actually needed 25 gigabytes), we solved the resulting LP in 8 days using the interior-point barrier method of CPLEX version 9.1.2. In summary, we found an exact solution to a game with 3.1 billion nodes in the game tree (the largest incomplete-information game that had been solved previously had 140,000 (Koller and Pfeffer 1997)). To my knowledge, this is still the largest incomplete-information game that has been solved exactly.¹

Lossy Information Abstraction. Some games are so large that even after applying the kind of lossless abstraction described above, the resulting LP would be too large to solve. To address this problem, such games can be abstracted more aggressively, but this incurs loss in solution quality.

One approach is to use a lossy version of GameShrink where siblings are considered ordered game isomorphic if their children can be approximately matched in the bipartite matching part of the algorithm (Gilpin and Sandholm 2007b; 2006). However, lossy GameShrink suffers from three drawbacks.

First, the resulting abstraction can be highly inaccurate because the grouping of states into buckets is, in a sense, greedy. For example, if lossy GameShrink determines that hand A is similar to hand B, and then determines that hand B is similar to hand C, it will group A and C together, despite the fact that A and C may not be very similar. The quality of the abstraction can be even worse when a longer sequence of such comparisons leads to grouping together extremely different hands.
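The chaining effect in the A, B, C example can be seen in a few lines of Python. This is only a toy sketch and not the lossy GameShrink algorithm: it assumes each hand is summarized by a single made-up strength number, treats two hands as "similar" when their strengths differ by at most a hypothetical `tolerance`, and merges similar pairs transitively with a union-find structure. The names `greedy_merge`, `widest_gap`, and the hand values are all illustrative assumptions.

```python
from itertools import combinations


def greedy_merge(strengths, tolerance):
    """Transitively merge any pair of items whose values differ by <= tolerance.

    Toy stand-in for greedy bucketing; NOT the lossy GameShrink algorithm.
    """
    parent = {name: name for name in strengths}

    def find(name):
        # Follow parent links to the representative of the item's group.
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for a, b in combinations(strengths, 2):
        if abs(strengths[a] - strengths[b]) <= tolerance:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for name in strengths:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


def widest_gap(group, strengths):
    """Largest difference in value between any two members of a group."""
    values = [strengths[name] for name in group]
    return max(values) - min(values)


# Hypothetical hand strengths on a 0..1 scale (made-up numbers).
hands = {"A": 0.50, "B": 0.58, "C": 0.66, "D": 0.95}
buckets = greedy_merge(hands, tolerance=0.10)

for bucket in buckets:
    print(sorted(bucket), round(widest_gap(bucket, hands), 2))
```

With these assumed numbers, A and B are within the tolerance, and so are B and C, so all three land in one bucket even though A and C differ by more than the tolerance. Adding further hands spaced just under the tolerance would stretch the bucket further, which is the longer-sequence effect described above.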
