
Characterizing Solution Concepts in Games Using Knowledge-Based Programs

Joseph Y. Halpern∗
Computer Science Department
Cornell University, U.S.A.
e-mail: [email protected]

Yoram Moses
Department of Electrical Engineering
Technion—Israel Institute of Technology
32000 Haifa, Israel
e-mail: [email protected]

Abstract

We show how solution concepts in games such as Nash equilibrium, correlated equilibrium, rationalizability, and sequential equilibrium can be given a uniform definition in terms of knowledge-based programs. Intuitively, all solution concepts are implementations of two knowledge-based programs, one appropriate for games represented in normal form, the other for games represented in extensive form. These knowledge-based programs can be viewed as embodying rationality. The representation works even if (a) information sets do not capture an agent's knowledge, (b) uncertainty is not represented by probability, or (c) the underlying game is not common knowledge.

∗Supported in part by NSF under grants CTC-0208535 and ITR-0325453, by ONR under grant N00014-02-1-0455, by the DoD Multidisciplinary University Research Initiative (MURI) program administered by the ONR under grants N00014-01-1-0795 and N00014-04-1-0725, and by AFOSR under grant F49620-02-1-0101.

1 Introduction

Game theorists represent games in two standard ways: in normal form, where each agent simply chooses a strategy, and in extensive form, using game trees, where the agents make choices over time. An extensive-form representation has the advantage that it describes the dynamic structure of the game: it explicitly represents the sequence of decision problems encountered by agents. However, the extensive-form representation purports to do more than just describe the structure of the game; it also attempts to represent the information that players have in the game, by the use of information sets. Intuitively, an information set consists of a set of nodes in the game tree where a player has the same information. However, as Halpern [1997] has pointed out, information sets may not adequately represent a player's information.

Halpern makes this point by considering the following single-agent game of imperfect recall, originally presented by Piccione and Rubinstein [1997]: The game starts with nature moving either left or right, each with probability 1/2. The agent can then either stop the game (playing move S) and get a payoff of 2, or continue, by playing move B. If he continues, he gets a high payoff if he matches nature's move, and a low payoff otherwise. Although he originally knows nature's move, the information set that includes the nodes labeled x3 and x4 is intended to indicate that the player forgets whether nature moved left or right after moving B. Intuitively, when he is at the information set X, the agent is not supposed to know whether he is at x3 or at x4.

Figure 1: A game of imperfect recall. (Nature moves left or right with probability 1/2 each; at x1 and x2 the player chooses between S and B; the B-moves lead to the information set X = {x3, x4}, where he chooses L or R.)

It is not hard to show that the strategy that maximizes expected utility chooses action S at node x1, action B at node x2, and action R at the information set X consisting of x3 and x4. Call this strategy f. Let f′ be the strategy of choosing action B at x1, action S at x2, and L at X. Piccione and Rubinstein argue that if node x1 is reached, the player should reconsider, and decide to switch from f to f′. As Halpern points out, this is indeed true, provided that the player knows at each stage of the game what strategy he is currently using. However, in that case, if the player is using f at the information set, then he knows that he is at node x4; if he has switched and is using f′, then he knows that he is at x3. So, in this setting, it is no longer the case that the player does not know whether he is at x3 or x4 in the information set; he can infer which state he is at from the strategy he is using.

In game theory, a strategy is taken to be a function from information sets to actions. The intuition behind this is that, since an agent cannot tell the nodes in an information set apart, he must do the same thing at all these nodes.
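To make the comparison between f and f′ concrete, the ex-ante expected utilities can be computed directly. The leaf payoffs below are assumed for illustration only (this copy of the figure does not preserve them); they are chosen so that, as the text claims, f is ex-ante optimal while switching once x1 is reached looks attractive. A minimal sketch:

```python
# Expected-utility computation for the imperfect-recall game of Figure 1.
# The leaf payoffs below are ASSUMED for illustration (the figure's exact
# values are not recoverable here); they are chosen so that f = (S at x1,
# B at x2, R at X) maximizes ex-ante expected utility.
PAYOFF_STOP = 2                          # playing S at x1 or x2
PAYOFF_MATCH = {"left": 4, "right": 6}   # playing B, then matching nature at X
PAYOFF_MISMATCH = 0                      # playing B, then guessing wrong at X

def expected_utility(strategy):
    """strategy = (move at x1, move at x2, move at information set X)."""
    at_x1, at_x2, at_X = strategy
    total = 0.0
    for nature, first in [("left", at_x1), ("right", at_x2)]:
        if first == "S":
            total += 0.5 * PAYOFF_STOP
        else:  # first == "B": payoff depends on whether at_X matches nature
            matches = (nature == "left") == (at_X == "L")
            total += 0.5 * (PAYOFF_MATCH[nature] if matches else PAYOFF_MISMATCH)
    return total

f = ("S", "B", "R")        # the ex-ante optimal strategy
f_prime = ("B", "S", "L")  # the strategy considered after reconsidering at x1
print(expected_utility(f))        # 0.5*2 + 0.5*6 = 4.0
print(expected_utility(f_prime))  # 0.5*4 + 0.5*2 = 3.0
```

Under these assumed payoffs, f is the unique ex-ante optimum; yet conditional on being at x1 (nature moved left), playing B and then L yields 4 rather than the 2 that S guarantees, which is exactly the reconsideration Piccione and Rubinstein describe.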

IJCAI-07 1300

But this example shows that if the agent has imperfect recall but can switch strategies, then he can arrange to do different things at different nodes in the same information set. As Halpern [1997] observes, '"situations that [an agent] cannot distinguish" and "nodes in the same information set" may be two quite different notions.' He suggests using the game tree to describe the structure of the game, and using the runs and systems framework [Fagin et al., 1995] to describe the agent's information. The idea is that an agent has an internal local state that describes all the information that he has. A strategy (or protocol in the language of [Fagin et al., 1995]) is a function from local states to actions. Protocols capture the intuition that what an agent does can depend only on what he knows. But now an agent's knowledge is represented by his local state, not by an information set. Different assumptions about what agents know (for example, whether they know their current strategies) are captured by running the same protocol in different contexts. If the information sets appropriately represent an agent's knowledge in a game, then we can identify local states with information sets. But, as the example above shows, we cannot do this in general.

A number of solution concepts have been considered in the game-theory literature, ranging from Nash equilibrium and correlated equilibrium to refinements of Nash equilibrium such as sequential equilibrium and weaker notions such as rationalizability (see [Osborne and Rubinstein, 1994] for an overview). The fact that game trees represent both the game and the players' information has proved critical in defining solution concepts in extensive-form games. Can we still represent solution concepts in a useful way using runs and systems to represent a player's information? As we show here, not only can we do this, but we can do it in a way that gives deeper insight into solution concepts. Indeed, all the standard solution concepts in the literature can be understood as instances of a single knowledge-based (kb) program [Fagin et al., 1995; 1997], which captures the underlying intuition that a player should make a best response, given her beliefs. The differences between solution concepts arise from running the kb program in different contexts.

In a kb program, a player's actions depend explicitly on the player's knowledge. For example, a kb program could have a test that says "If you don't know that Ann received the information, then send her a message", which can be written as

if ¬Bi(Ann received info) then send Ann a message.

This kb program has the form of a standard if ... then statement, except that the test in the if clause is a test on i's knowledge (expressed using the modal operator Bi for belief; see Section 2 for a discussion of the use of knowledge vs. belief). Using such tests for knowledge allows us to abstract away from low-level details of how the knowledge is obtained. Kb programs have been applied to a number of problems in the computer science literature (see [Fagin et al., 1995] and the references therein).

To see how they can be applied to understand equilibrium, given a game Γ in normal form, let Si(Γ) consist of all the pure strategies for player i in Γ. Roughly speaking, we want a kb program that says that if player i believes that she is about to perform strategy S (which we express with the formula doi(S)), and she believes that she would not do any better with another strategy, then she should indeed go ahead and run S. This test can be viewed as embodying rationality. There is a subtlety in expressing the statement "she would not do any better with another strategy". We express this by saying "if her expected utility, given that she will use strategy S, is x, then her expected utility if she were to use strategy S′ is at most x." The "if she were to use S′" is a counterfactual statement. She is planning to use strategy S, but is contemplating what would happen if she were to do something counter to fact, namely, to use S′. Counterfactuals have been the subject of intense study in the philosophy literature (see, for example, [Lewis, 1973; Stalnaker, 1968]) and, more recently, in the game theory literature (see, for example, [Aumann, 1995; Halpern, 2001; Samet, 1996]). We write the counterfactual "If A were the case then B would be true" as "A □→ B". Although this statement involves an "if ... then", the semantics of the counterfactual implication A □→ B is quite different from that of the material implication A ⇒ B. In particular, while A ⇒ B is true if A is false, A □→ B might not be.

With this background, consider the following kb program for player i:

for each strategy S ∈ Si(Γ) do
  if Bi(doi(S) ∧ ∀x(EUi = x ⇒ ∧_{S′ ∈ Si(Γ)} (doi(S′) □→ (EUi ≤ x)))) then S.

This kb program is meant to capture the intuition above. Intuitively, it says that if player i believes that she is about to perform strategy S and, if her expected utility is x, then if she were to perform another strategy S′, then her expected utility would be no greater than x, then she should perform strategy S. Call this kb program EQNF^Γ (with the individual instance for player i denoted by EQNF_i^Γ). As we show, if all players follow EQNF^Γ, then they end up playing some type of equilibrium. Which type of equilibrium they play depends on the context. Due to space considerations, we focus on three examples in this abstract. If the players have a common prior on the joint strategies being used, and this common prior is such that players' beliefs are independent of the strategies they use, then they play a Nash equilibrium. Without this independence assumption, we get a correlated equilibrium. On the other hand, if players have possibly different priors on the space of strategies, then this kb program defines rationalizable strategies [Bernheim, 1984; Pearce, 1984].

To deal with extensive-form games, we need a slightly different kb program, since agents choose moves, not strategies. Let EQEF_i^Γ be the following program, where a ∈ PM denotes that a is a move that is currently possible:

for each move a ∈ PM do
  if Bi(doi(a) ∧ ∀x((EUi = x) ⇒ ∧_{a′ ∈ PM} (doi(a′) □→ (EUi ≤ x)))) then a.

Just as EQNF^Γ characterizes equilibria of a game Γ represented in normal form, EQEF^Γ characterizes equilibria of a game represented in extensive form. We give one example here: sequential equilibrium. To capture sequential equilibrium, we need to assume that information sets do correctly describe an agent's knowledge.
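The test in EQNF_i^Γ can be rendered concretely when player i's beliefs are a probability distribution over the opponents' pure joint strategies: i checks that no alternative strategy would yield higher expected utility. This is only a finite, probabilistic sketch (the function names and encoding are ours), not the counterfactual semantics developed in Section 2:

```python
def expected_utility(u_i, S, belief):
    """E[u_i] if i plays S, where belief maps the opponents' pure joint
    strategies (tuples) to probabilities."""
    return sum(p * u_i[(S,) + others] for others, p in belief.items())

def eqnf_test(u_i, S, strategies_i, belief):
    """The rationality test of EQNF: if i's expected utility from S is x,
    does every alternative S2 give expected utility at most x?"""
    x = expected_utility(u_i, S, belief)
    return all(expected_utility(u_i, S2, belief) <= x for S2 in strategies_i)

# Alice in the 2-player game of Figure 2, believing Bob plays L and R with
# probability 1/2 each: both T and B pass the test (each has EU 2).
u_alice = {("T", "L"): 3, ("T", "R"): 1, ("B", "L"): 4, ("B", "R"): 0}
belief = {("L",): 0.5, ("R",): 0.5}
print(eqnf_test(u_alice, "T", ["T", "B"], belief))  # True
```

Against a belief that puts all weight on L, the test fails for T (B would do better), matching the intuition that only best responses survive.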

If we drop this assumption, however, we can distinguish between the two equilibria for the game described in Figure 1.

All these solution concepts are based on expected utility. But we can also consider solution concepts based on other decision rules. For example, Boutilier and Hyafil [2004] consider minimax-regret equilibria, where each player uses a strategy that is a best response in a minimax-regret sense to the choices of the other players. Similarly, we can use maximin equilibria [Aghassi and Bertsimas, 2006]. As pointed out by Chu and Halpern [2003], all these decision rules can be viewed as instances of a generalized notion of expected utility, where uncertainty is represented by a plausibility measure, a generalization of a probability measure, utilities are elements of an arbitrary partially ordered space, and plausibilities and utilities are combined using ⊕ and ⊗, generalizations of + and ×. We show in the full paper that, just by interpreting "EUi = x" appropriately, we can capture these more exotic solution concepts as well. Moreover, we can capture solution concepts in games where the game itself is not common knowledge, or where agents are not aware of all moves available, as discussed by Halpern and Rêgo [2006].

Our approach thus provides a powerful tool for representing solution concepts, which works even if (a) information sets do not capture an agent's knowledge, (b) uncertainty is not represented by probability, or (c) the underlying game is not common knowledge.

The rest of this paper is organized as follows. In Section 2, we review the relevant background on game theory and knowledge-based programs. In Section 3, we show that EQNF^Γ and EQEF^Γ characterize Nash equilibrium, correlated equilibrium, rationalizability, and sequential equilibrium in a game Γ in the appropriate contexts. We conclude in Section 4 with a discussion of how our results compare to other characterizations of solution concepts.

2 Background

In this section, we review the relevant background on games and knowledge-based programs. We describe only what we need for proving our results. The reader is encouraged to consult [Osborne and Rubinstein, 1994] for more on game theory, [Fagin et al., 1995; 1997] for more on knowledge-based programs without counterfactuals, and [Halpern and Moses, 2004] for more on adding counterfactuals to knowledge-based programs.

2.1 Games and Strategies

A game in extensive form is described by a game tree. Associated with each non-leaf node or history is either a player—the player whose move it is at that node—or nature (which can make a randomized move). The nodes where a player i moves are further partitioned into information sets. With each run or maximal history h in the game tree and player i we can associate i's utility, denoted ui(h), if that run is played. A strategy for player i is a (possibly randomized) function from i's information sets to actions. Thus a strategy for player i tells player i what to do at each node in the game tree where i is supposed to move. Intuitively, at all the nodes that player i cannot tell apart, player i must do the same thing. A joint strategy S = (S1, ..., Sn) for the players determines a distribution over paths in the game tree. A normal-form game can be viewed as a special case of an extensive-form game where each player makes only one move, and all players move simultaneously.

2.2 Protocols, Systems, and Contexts

To explain kb programs, we must first describe standard protocols. We assume that, at any given point in time, a player in a game is in some local state. The local state could include the history of the game up to this point, the strategy being used by the player, and perhaps some other features of the player's type, such as beliefs about the strategies being used by other players. A global state is a tuple consisting of a local state for each player.

A protocol for player i is a function from player i's local states to actions. For ease of exposition, we consider only deterministic protocols, although it is relatively straightforward to model randomized protocols—corresponding to mixed strategies—as functions from local states to distributions over actions. Although we restrict to deterministic protocols, we deal with mixed strategies by considering distributions over pure strategies.

A run is a sequence of global states; formally, a run is a function from times to global states. Thus, r(m) is the global state in run r at time m. A point is a pair (r, m) consisting of a run r and time m. Let ri(m) be i's local state at the point (r, m); that is, if r(m) = (s1, ..., sn), then ri(m) = si. A joint protocol is an assignment of a protocol for each player; essentially, a joint protocol is a joint strategy. At each point, a joint protocol P performs a joint action (P1(r1(m)), ..., Pn(rn(m))), which changes the global state. Thus, given an initial global state, a joint protocol P generates a (unique) run, which can be thought of as an execution of P. The runs in a normal-form game involve only one round and two time steps: time 0 (the initial state) and time 1, after the joint strategy has been executed. (We assume that the payoff is then represented in the player's local state at time 1.) In an extensive-form game, a run is again characterized by the strategies used, but now the length of the run depends on the path of play.

A probabilistic system is a tuple PS = (R, μ⃗), where R is a set of runs and μ⃗ = (μ1, ..., μn) associates a probability μi on the runs of R with each player i. Intuitively, μi represents player i's prior beliefs. In the special case where μ1 = ··· = μn = μ, the players have a common prior μ on R. In this case, we write just (R, μ).

We are interested in the system corresponding to a joint protocol P. To determine this system, we need to describe the setting in which P is being executed. For our purposes, this setting can be modeled by a set G of global states, a subset G0 of G that describes the possible initial global states, a set As of possible joint actions at each global state s, and n probability measures on G0, one for each player. Thus, a probabilistic context is a tuple γ = (G, G0, {As : s ∈ G}, μ⃗).¹ A joint protocol P is appropriate for such a context γ if, for every global state s, the joint actions that P can generate are in As.

¹We are implicitly assuming that the global state that results from performing a joint action in As at the global state s is unique and obvious; otherwise, such information would also appear in the context, as in the general framework of [Fagin et al., 1995].
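The run generated by a joint protocol can be sketched in a few lines; the encoding here (a global state as a tuple of local states, with an explicit transition function and horizon) is ours, chosen to mirror the two-time-step normal-form case just described:

```python
def generate_run(joint_protocol, initial_global_state, transition, horizon):
    """Generate the unique run of a joint protocol from an initial global
    state: run[m] is the global state at time m.  `transition` maps a
    global state and a joint action to the next global state."""
    run = [initial_global_state]
    for _ in range(horizon):
        s = run[-1]
        joint_action = tuple(P_i(l_i) for P_i, l_i in zip(joint_protocol, s))
        run.append(transition(s, joint_action))
    return run

# A normal-form game: each initial local state s_S encodes the strategy the
# player will choose, and time 1 records the chosen joint strategy (payoff
# bookkeeping in the local states is elided).
P_nf = (lambda l: l, lambda l: l)   # P_i^nf: in state s_S, choose S
step = lambda s, a: a               # the joint action becomes the new global state
run = generate_run(P_nf, ("T", "L"), step, horizon=1)
print(run)  # [('T', 'L'), ('T', 'L')]
```

For an extensive-form game the same loop applies, with a longer horizon and a transition function derived from the game tree.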

When P is appropriate for γ, we abuse notation slightly and refer to γ by specifying only the pair (G0, μ⃗). A protocol P and a context γ for which P is appropriate generate a system; the system depends on the initial states and probability measures in γ. Since these are all that matter, we typically simplify the description of a context by omitting the set G of global states and the sets As of global actions. Let R(P, γ) denote the system generated by joint protocol P in context γ. If γ = (G0, μ⃗), then R(P, γ) = (R, μ⃗′), where R consists of a run rs for each initial state s ∈ G0, where rs is the run generated by P when started in state s, and μ′i(rs) = μi(s), for i = 1, ..., n.

A probabilistic system (R, μ⃗′) is compatible with a context γ = (G0, μ⃗) if (a) every initial state in G0 is the initial state of some run in R, (b) every run is the run of some protocol appropriate for γ, and (c) if R(s) is the set of runs in R with initial global state s, then μ′j(R(s)) = μj(s), for j = 1, ..., n. Clearly R(P, γ) is compatible with γ.

We can think of the context as describing background information. In distributed-systems applications, the context also typically includes information about message delivery. For example, it may determine whether all messages sent are received in one round, or whether they may take up to, say, five rounds. Moreover, when this is not obvious, the context specifies how actions transform the global state; for example, it describes what happens if in the same joint action two players attempt to modify the same memory cell. Since such issues do not arise in the games we consider, we ignore these facets of contexts here. For simplicity, we consider only contexts where each initial state corresponds to a particular joint strategy of Γ. That is, Σi^Γ is a set of local states for player i indexed by (pure) strategies. The set Σi^Γ can be viewed as describing i's types; the state sS can be thought of as the initial state where player i's type is such that he plays S (although we stress that this is only intuition; player i does not have to play S at the state sS). Let G0^Γ = Σ1^Γ × ... × Σn^Γ. We will be interested in contexts where the set of initial global states is a subset G0 of G0^Γ. In a normal-form game, the only actions possible for player i at an initial global state amount to choosing a pure strategy, so the joint actions are joint strategies; no actions are possible at later times. For an extensive-form game, the possible moves are described by the game tree. We say that a context for an extensive-form game is standard if the local states have the form (s, I), where s is the initial state and I is the current information set. In a standard context, an agent's knowledge is indeed described by the information set. However, we do not require a context to be standard. For example, if an agent is allowed to switch strategies, then the local state could include the history of strategies used. In such a context, the agent in the game of Figure 1 would know more than just what is in the information set, and would want to switch strategies.

2.3 Knowledge-Based Programs

A knowledge-based program is a syntactic object. For our purposes, we can take a knowledge-based program for player i to have the form

if κ1 then a1
if κ2 then a2
...,

where each κj is a Boolean combination of formulas of the form Biϕ, in which the ϕ's can have nested occurrences of Bi operators and counterfactual implications. We assume that the tests κ1, κ2, ... are mutually exclusive and exhaustive, so that exactly one will evaluate to true in any given instance. The program EQNF_i^Γ can be written in this form by simply replacing the for ... do statement by one line for each pure strategy in Si(Γ); similarly for EQEF_i^Γ.

We want to associate a protocol with a kb program. Unfortunately, we cannot "execute" a kb program as we can a protocol. How the kb program executes depends on the outcomes of the tests κj. Since the tests involve beliefs and counterfactuals, we need to interpret them with respect to a system. The idea is that a kb program Pgi for player i and a probabilistic system PS together determine a protocol P for player i. Rather than giving the general definitions (which can be found in [Halpern and Moses, 2004]), we just show how they work in the two kb programs we consider in this paper: EQNF and EQEF.

Given a system PS = (R, μ⃗), we associate with each formula ϕ a set [[ϕ]]_PS of points in PS. Intuitively, [[ϕ]]_PS is the set of points of PS where the formula ϕ is true. We need a little notation:

• If E is a set of points in PS, let R(E) denote the set of runs going through points in E; that is, R(E) = {r : ∃m((r, m) ∈ E)}.

• Let Ki(r, m) denote the set of points that i cannot distinguish from (r, m): Ki(r, m) = {(r′, m′) : ri(m′) = ri(m)}. Roughly speaking, Ki(r, m) corresponds to i's information set at the point (r, m).

• Given a point (r, m) and a player i, let μ_{i,r,m} be the probability measure that results from conditioning μi on Ki(r, m), i's information at (r, m). We cannot condition on Ki(r, m) directly: μi is a probability measure on runs, and Ki(r, m) is a set of points. So we actually condition, not on Ki(r, m), but on R(Ki(r, m)), the set of runs going through the points in Ki(r, m). Thus, μ_{i,r,m} = μi | R(Ki(r, m)). (For the purposes of this abstract, we do not specify μ_{i,r,m} if μi(R(Ki(r, m))) = 0. It turns out not to be relevant to our discussion.)

The kb programs we consider in this paper use a limited collection of formulas. We can now define [[ϕ]]_PS for the formulas we consider that do not involve counterfactuals.

• In a system PS corresponding to a normal-form game Γ, if S ∈ Si(Γ), then [[doi(S)]]_PS is the set of initial points (r, 0) such that player i uses strategy S in run r. Similarly, if PS corresponds to an extensive-form game, then [[doi(a)]]_PS is the set of points (r, m) of PS at which i performs action a.

• Player i believes a formula ϕ at a point (r, m) if the event corresponding to ϕ has probability 1 according to μ_{i,r,m}. That is, (r, m) ∈ [[Biϕ]]_PS if μi(R(Ki(r, m))) ≠ 0 (so that conditioning on Ki(r, m) is defined) and μ_{i,r,m}([[ϕ]]_PS ∩ Ki(r, m)) = 1.

• With every run r in the systems we consider, we can associate the joint (pure) strategy S used in r.² This pure strategy determines the history in the game, and thus determines player i's utility. Thus, we can associate with every point (r, m) player i's expected utility at (r, m), where the expectation is taken with respect to the probability μ_{i,r,m}. If u is a real number, then [[EUi = u]]_PS is the set of points where player i's expected utility is u; [[EUi ≤ u]]_PS is defined similarly.

• Assume that ϕ(x) has no occurrences of ∀. Then [[∀xϕ(x)]]_PS = ∩_{a∈R}[[ϕ[x/a]]]_PS, where ϕ[x/a] is the result of replacing all occurrences of x in ϕ by a. That is, ∀x is just universal quantification over x, where x ranges over the reals. This quantification arises for us when x represents a utility, so that ∀xϕ(x) says that ϕ holds for all choices of utility.

We now give the semantics of formulas involving counterfactuals. Here we consider only a restricted class of such formulas, those where the counterfactual occurs only in the form doi(S) □→ ϕ, which should be read as "if i were to use strategy S, then ϕ would be true". Intuitively, doi(S) □→ ϕ is true at a point (r, m) if ϕ holds in a world that differs from (r, m) only in that i uses the strategy S. That is, doi(S) □→ ϕ is true at (r, m) if ϕ is true at the point (r′, m) where, in run r′, player i uses strategy S and all the other players use the same strategy that they do at (r, m). (This can be viewed as an instance of the general semantics for counterfactuals used in the philosophy literature [Lewis, 1973; Stalnaker, 1968], where ψ □→ ϕ is taken to be true at a world w if ϕ is true at all the worlds w′ closest to w where ψ is true.) Of course, if i actually uses strategy S in run r, then r′ = r. Similarly, in an extensive-form game Γ, the closest point to (r, m) where doi(a′) is true (assuming that a′ is an action that i can perform in the local state ri(m)) is the point (r′, m) where all players other than player i use the same protocol in r′ and r, and i's protocol in r′ agrees with i's protocol in r except at the local state ri(m), where i performs action a′. Thus, r′ is the run that results from player i making a single deviation (to a′ at time m) from the protocol she uses in r, while all other players use the same protocol as in r.

There is a problem with this approach: there is no guarantee that, in general, such a closest point (r′, m) exists in the system PS. To deal with this problem, we restrict attention to a class of systems where this point is guaranteed to exist. A system (R, μ⃗) is complete with respect to a context γ if R includes every run generated by a protocol appropriate for context γ. In complete systems, the closest point (r′, m) is guaranteed to exist. For the remainder of the paper, we evaluate formulas only with respect to complete systems. In a complete system PS, we define [[doi(S) □→ ϕ]]_PS to consist of all the points (r, m) such that the closest point (r′, m) to (r, m) where i uses strategy S is in [[ϕ]]_PS. The definition of [[doi(a) □→ ϕ]]_PS is similar. We say that a complete system (R′, μ⃗′) extends (R, μ⃗) if μ⃗′ and μ⃗ agree on R (so that μ′j(A) = μj(A) for all A ⊆ R) for j = 1, ..., n.

Since each formula κ that appears as a test in a kb program Pgi for player i is a Boolean combination of formulas of the form Biϕ, it is easy to check that if (r, m) ∈ [[κ]]_PS, then Ki(r, m) ⊆ [[κ]]_PS. In other words, the truth of κ depends only on i's local state. Moreover, since the tests are mutually exclusive and exhaustive, exactly one of them holds in each local state. Given a system PS, we take the protocol Pgi^PS to be such that Pgi^PS(ℓ) = aj if, for some point (r, m) in PS with ri(m) = ℓ, we have (r, m) ∈ [[κj]]_PS. Since κ1, κ2, ... are mutually exclusive and exhaustive, there is exactly one action aj with this property.

We are mainly interested in protocols that implement a kb program. Intuitively, a joint protocol P implements a kb program Pg in context γ if P performs the same actions as Pg in all runs of P that have positive probability, assuming that the knowledge tests in Pg are interpreted with respect to the complete system PS extending R(P, γ). Formally, a joint protocol P (de facto) implements a joint kb program Pg [Halpern and Moses, 2004] in a context γ = (G0, μ⃗) if Pi(ℓ) = Pgi^PS(ℓ) for every local state ℓ = ri(m) such that r ∈ R(P, γ) and μi(r) ≠ 0, where PS is the complete system extending R(P, γ). We remark that, in general, there may not be any joint protocol that implements a kb program in a given context, there may be exactly one, or there may be more than one (see [Fagin et al., 1995] for examples). This is somewhat analogous to the fact that there may not be any equilibrium of a game for some notions of equilibrium, there may be one, or there may be more than one.

²If we allow players to change strategies during a run, then we will in general have different joint strategies at each point in a run. For our theorems in the next section, we restrict to contexts where players do not change strategies.

3 The Main Results

Fix a game Γ in normal form. Let Pi^nf be the protocol that, in initial state sS ∈ Σi^Γ, chooses strategy S; let P^nf = (P1^nf, ..., Pn^nf). Let STRATi be the random variable on initial global states that associates with an initial global state s player i's strategy in s. As we said, Nash equilibrium arises in contexts with a common prior. Suppose that γ = (G0, μ) is a context with a common prior. We say that μ is compatible with the mixed joint strategy S if μ is the probability on pure joint strategies induced by S (under the obvious identification of initial global states with joint strategies).

Theorem 3.1: The joint strategy S is a Nash equilibrium of the game Γ iff there is a common prior probability measure μ on G0^Γ such that STRAT1, ..., STRATn are independent with respect to μ, μ is compatible with S, and P^nf implements EQNF^Γ in the context (G0^Γ, μ).

Proof: Suppose that S is a (possibly mixed) Nash equilibrium of the game Γ. Let μ_S be the unique probability measure on G0^Γ compatible with S.
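The condition of Theorem 3.1 can be exercised on the game of Figure 2 (whose payoffs are given there): form each player's expected utilities against the opponent's marginal of the product prior, and check that every pure strategy in the support is a best response. A small sketch, with the encoding ours:

```python
# Payoffs of the game in Figure 2, as (Alice, Bob) pairs.
U = {("T", "L"): (3, 3), ("T", "R"): (1, 4), ("B", "L"): (4, 1), ("B", "R"): (0, 0)}
STRATS = (("T", "B"), ("L", "R"))

def is_nash(mixed):
    """Check Theorem 3.1's condition for a 2-player mixed profile: under the
    product prior it induces, each pure strategy in player i's support must
    be a best response to the opponent's marginal."""
    for i in range(2):
        other = 1 - i
        def eu(s_i):
            return sum(q * U[(s_i, t) if i == 0 else (t, s_i)][i]
                       for t, q in mixed[other].items())
        best = max(eu(s) for s in STRATS[i])
        if any(q > 0 and eu(s) < best - 1e-12 for s, q in mixed[i].items()):
            return False
    return True

half = ({"T": 0.5, "B": 0.5}, {"L": 0.5, "R": 0.5})
print(is_nash(half))                                          # True: EU = 2 for both
print(is_nash(({"T": 1.0, "B": 0.0}, {"L": 1.0, "R": 0.0})))  # False: B beats T against L
```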

If S is played, then the probability of a run where the pure joint strategy (T1, ..., Tn) is played is just the product of the probabilities assigned to Ti by Si, so STRAT1, ..., STRATn are independent with respect to μ_S. To see that P^nf implements EQNF^Γ in the context γ = (G0^Γ, μ_S), let ℓ = ri(0) be a local state such that r ∈ R(P^nf, γ) and μ(r) ≠ 0. If ℓ = sT, then Pi^nf(ℓ) = T, so T must be in the support of Si. Thus, T must be a best response to S−i, the joint strategy where each player j ≠ i plays its component of S. Since i uses strategy T in r, the formula Bi(doi(T′)) holds at (r, 0) iff T′ = T. Moreover, since T is a best response, if u is i's expected utility with the joint strategy S, then for all T′, the formula doi(T′) □→ (EUi ≤ u) holds at (r, 0). Thus, (EQNF_i^Γ)^PS(ℓ) = T, where PS is the complete system extending R(P^nf, γ). It follows that P^nf implements EQNF^Γ.

For the converse, suppose that μ is a common prior probability measure on G0^Γ, STRAT1, ..., STRATn are independent with respect to μ, μ is compatible with S, and P^nf implements EQNF^Γ in the context γ = (G0^Γ, μ). We want to show that S is a Nash equilibrium. It suffices to show that each pure strategy T in the support of Si is a best response to S−i. Since μ is compatible with S, there must be a run r such that μ(r) > 0 and ri(0) = sT (i.e., player i chooses T in run r). Since P^nf implements EQNF^Γ in the context γ, and EQNF ensures that no deviation from T can improve i's expected utility with respect to S−i, it follows that T is indeed a best response.

As is well known, players can sometimes achieve better outcomes than a Nash equilibrium if they have access to a helpful mediator. Consider the simple 2-player game described in Figure 2, where Alice, the row player, must choose between top and bottom (T and B), while Bob, the column player, must choose between left and right (L and R):

        L       R
  T   (3, 3)  (1, 4)
  B   (4, 1)  (0, 0)

Figure 2: A simple 2-player game.

It is not hard to check that the best Nash equilibrium for this game has Alice randomizing between T and B, and Bob randomizing between L and R; this gives each of them expected utility 2. They can do better with a trusted mediator, who makes a recommendation by choosing at random between (T, L), (T, R), and (B, L). This gives each of them expected utility 8/3. This is a correlated equilibrium since, for example, if the mediator chooses (T, L), and thus sends recommendation T to Alice and L to Bob, then Alice considers it equally likely that Bob was told L and R, and thus has no incentive to deviate; similarly, Bob has no incentive to deviate. In general, a distribution μ over pure joint strategies is a correlated equilibrium if players cannot do better than following a mediator's recommendations if the mediator makes recommendations according to μ. (Note that, as in our example, if a mediator chooses a joint strategy (S1, ..., Sn) according to μ, the mediator recommends Si to player i; player i is not told the joint strategy.) We omit the formal definition of correlated equilibrium (due to Aumann [1974]) here; however, we stress that a correlated equilibrium is a distribution over (pure) joint strategies. We can easily capture correlated equilibrium using EQNF.

Theorem 3.2: The distribution μ on joint strategies is a correlated equilibrium of the game Γ iff P^nf implements EQNF^Γ in the context (G0^Γ, μ).

Both Nash equilibrium and correlated equilibrium require a common prior on runs. By dropping this assumption, we get another standard solution concept: rationalizability [Bernheim, 1984; Pearce, 1984]. Intuitively, a strategy for player i is rationalizable if it is a best response to some beliefs that player i may have about the strategies that other players are following, assuming that these strategies are themselves best responses to beliefs that the other players have about the strategies that other players are following, and so on. To make this precise, we need a little notation. Let S−i(Γ) = Π_{j≠i} Sj(Γ). Let ui(S) denote player i's utility if the strategy tuple S is played. We describe player i's beliefs about what strategies the other players are using by a probability μi on S−i(Γ). A strategy S for player i is a best response to the beliefs described by a probability μi on S−i(Γ) if Σ_{T ∈ S−i(Γ)} ui(S, T)μi(T) ≥ Σ_{T ∈ S−i(Γ)} ui(S′, T)μi(T) for all S′ ∈ Si(Γ). Following Osborne and Rubinstein [1994], we say that a strategy S for player i in game Γ is rationalizable if, for each player j, there is a set Zj ⊆ Sj(Γ) and, for each strategy T ∈ Zj, a probability measure μ_{j,T} on S−j(Γ) whose support is Z−j, such that

• S ∈ Zi; and
• for each player j and strategy T ∈ Zj, T is a best response to the beliefs μ_{j,T}.

For ease of exposition, we consider only pure rationalizable strategies. This is essentially without loss of generality: it is easy to see that a mixed strategy S for player i is a best response to some beliefs μi of player i iff each pure strategy in the support of S is a best response to μi. Moreover, we can assume without loss of generality that the support of μi consists of only pure joint strategies.

Theorem 3.3: A pure strategy S for player i is rationalizable iff there exist probability measures μ1, ..., μn, a set G0 ⊆ G0^Γ, and a state s ∈ G0 such that Pi^nf(si) = S and P^nf implements EQNF^Γ in the context (G0, μ⃗).

Proof: First, suppose that P^nf implements EQNF^Γ in context (G0, μ⃗). We show that for each state s ∈ G0 and each player i, the strategy S_{s,i} = Pi^nf(si) is rationalizable. Let Zi = {S_{s,i} : s ∈ G0}. For S ∈ Zi, let E(S) = {s ∈ G0 : si = sS}; that is, E(S) consists of all initial global states where player i's local state is sS. Let μ_{i,S} = μi(· | E(S)) (under the obvious identification of global states in G0 with joint strategies). Since P^nf implements EQNF^Γ, it easily follows that S is a best response to μ_{i,S}. Hence, all the strategies in Zi are rationalizable, as desired.

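To make the mediator example concrete, here is a small sketch in Python (the helper names `payoff`, `mu`, `expected_utility`, and `is_correlated_equilibrium` are ours, not from the paper). It verifies that the uniform distribution over (T, L), (T, R), and (B, L) gives each player expected utility 8/3, and that neither player can gain by deviating from a recommendation:

```python
from fractions import Fraction

# Payoffs for the game of Figure 2: (Alice, Bob) for each pure joint strategy.
payoff = {('T', 'L'): (3, 3), ('T', 'R'): (1, 4),
          ('B', 'L'): (4, 1), ('B', 'R'): (0, 0)}

# The mediator randomizes uniformly over (T,L), (T,R), (B,L).
mu = {('T', 'L'): Fraction(1, 3), ('T', 'R'): Fraction(1, 3),
      ('B', 'L'): Fraction(1, 3)}

def expected_utility(mu, player):
    """Expected utility of player (0 = Alice, 1 = Bob) under mu."""
    return sum(p * payoff[s][player] for s, p in mu.items())

def is_correlated_equilibrium(mu):
    """Check that neither player gains by deviating from any recommendation."""
    for player, actions in [(0, 'TB'), (1, 'LR')]:
        for rec in actions:                      # recommended action
            # Mass on the joint strategies in which `rec` is recommended;
            # the sign of the deviation gain is unchanged by normalization.
            cond = {s: p for s, p in mu.items() if s[player] == rec and p > 0}
            if not cond:
                continue
            for dev in actions:                  # candidate deviation
                swap = lambda s: (dev, s[1]) if player == 0 else (s[0], dev)
                gain = sum(p * (payoff[swap(s)][player] - payoff[s][player])
                           for s, p in cond.items())
                if gain > 0:
                    return False
    return True

print(expected_utility(mu, 0), expected_utility(mu, 1))  # 8/3 8/3
print(is_correlated_equilibrium(mu))                     # True
```

Using `Fraction` keeps the arithmetic exact, so the 8/3 value comes out precisely rather than as a floating-point approximation.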
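The rationalizability definition can also be explored computationally. The sketch below (our illustration, not the paper's construction) iteratively deletes strategies that are not a best response to any surviving opponent strategy. Note the simplification: it restricts attention to point beliefs, whereas the definition above allows arbitrary measures μ_{j,T}; this happens to suffice for the Figure 2 game, where every pure strategy is rationalizable:

```python
# Payoffs for the Figure 2 game: (row player, column player).
payoff = {('T', 'L'): (3, 3), ('T', 'R'): (1, 4),
          ('B', 'L'): (4, 1), ('B', 'R'): (0, 0)}

def u(player, mine, theirs):
    """Utility to `player` when playing `mine` against `theirs`."""
    joint = (mine, theirs) if player == 0 else (theirs, mine)
    return payoff[joint][player]

def rationalizable_point_beliefs(strategies):
    """Iteratively delete strategies that are not a best response to ANY
    surviving opponent strategy.  Simplification: only point beliefs are
    considered, not the mixed beliefs mu_{j,T} allowed by the definition."""
    Z = [set(s) for s in strategies]
    changed = True
    while changed:
        changed = False
        for i in (0, 1):
            keep = set()
            for s in Z[i]:
                # s survives if it is a best response to some belief
                # concentrated on a single surviving opponent strategy.
                if any(all(u(i, s, t) >= u(i, s2, t) for s2 in Z[i])
                       for t in Z[1 - i]):
                    keep.add(s)
            if keep != Z[i]:
                Z[i], changed = keep, True
    return Z

print(rationalizable_point_beliefs(['TB', 'LR']))  # both players keep everything
```

In this game T is a best response to R, B to L, L to B, and R to T, so the sets Z_1 = {T, B} and Z_2 = {L, R} support each other and nothing is eliminated.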
For the converse, let Z_i consist of all the pure rationalizable strategies for player i. It follows from the definition of rationalizability that, for each strategy S ∈ Z_i, there exists a probability measure μ_{i,S} on Z_{-i} such that S is a best response to μ_{i,S}. For a set Z of strategies, we denote by Z̃ the set {s_T : T ∈ Z}. Set G_0 = Z̃_1 × ... × Z̃_n, and choose some measure μ_i on G_0 such that μ_i(· | E(S)) = μ_{i,S} for all S ∈ Z_i. (We can take μ_i = Σ_{S∈Z_i} α_S μ_{i,S}, where α_S ∈ (0, 1) and Σ_{S∈Z_i} α_S = 1.) Recall that P_i^nf(s_S) = S for all states s_S. It immediately follows that, for every rationalizable joint strategy S = (S_1, ..., S_n), both s = (s_{S_1}, ..., s_{S_n}) ∈ G_0 and S = P^nf(s). Since the states in G_0 all correspond to rationalizable strategies, and by definition of rationalizability each (individual) strategy S_i is a best response to μ_{i,S_i}, it is easy to check that P^nf implements EQNF^Γ in the context (G_0, μ), as desired.

We remark that Osborne and Rubinstein's definition of rationalizability allows μ_{j,T} to be such that j believes that other players' strategy choices are correlated. In most of the literature, players are assumed to believe that other players' choices are made independently. If we add that requirement, then we must impose the same requirement on the probability measures μ_1, ..., μ_n in Theorem 3.3.

Up to now we have considered solution concepts for games in normal form. Perhaps the best-known solution concept for games in extensive form is sequential equilibrium [Kreps and Wilson, 1982]. Roughly speaking, a joint strategy S is a sequential equilibrium if S_i is a best response to S_{-i} at all information sets, not just the information sets that are reached with positive probability when playing S. To understand how sequential equilibrium differs from Nash equilibrium, consider the game shown in Figure 3.

Figure 3: A game with an unreasonable Nash equilibrium.

One Nash equilibrium of this game has A playing down_A and B playing across_B. However, this is not a sequential equilibrium, since playing across_B is not a best response for B if B is called on to play. This is not a problem in a Nash equilibrium because the node where B plays is not reached in the equilibrium. Sequential equilibrium refines Nash equilibrium (in the sense that every sequential equilibrium is a Nash equilibrium) and does not allow solutions such as (down_A, across_B). Intuitively, in a sequential equilibrium, every player must make a best response at every information set (even if it is reached with probability 0). In the game shown in Figure 3, the unique joint strategy in a sequential equilibrium has A choosing across_A and B choosing down_B.

The main difficulty in defining sequential equilibrium lies in capturing the intuition of best response in information sets that are reached with probability 0. To deal with this, a sequential equilibrium is defined to be a pair (S, β), consisting of a joint strategy S and a belief system β, which associates with every information set I a probability β(I) on the histories in I. There are a number of somewhat subtle consistency conditions on these pairs; we omit them here due to lack of space (see [Kreps and Wilson, 1982; Osborne and Rubinstein, 1994] for details). Our result depends on a recent characterization of sequential equilibrium [Halpern, 2006] that uses nonstandard probabilities, which can assign infinitesimal probabilities to histories. By assuming that every history gets positive (although possibly infinitesimal) probability, we can avoid the problem of dealing with information sets that are reached with probability 0.

To every nonstandard real number r, there is a closest standard real number, denoted st(r) and read "the standard part of r": |r − st(r)| is an infinitesimal. Given a nonstandard probability measure ν, we can define the standard probability measure st(ν) by taking st(ν)(w) = st(ν(w)). A nonstandard probability ν on G_0 is compatible with joint strategy S if st(ν) is the probability on pure strategies induced by S. When dealing with nonstandard probabilities, we generalize the definition of implementation by requiring only that P performs the same actions as P_g in runs r of P such that st(ν)(r) > 0. Moreover, the expression "EU_i = x" in EQEF^Γ is interpreted as "the standard part of i's expected utility is x" (since x ranges over the standard real numbers).

Theorem 3.4: If Γ is a game with perfect recall,³ there is a belief system β such that (S, β) is a sequential equilibrium of Γ iff there is a common prior nonstandard probability measure ν on G_0^Γ that gives positive measure to all states such that STRAT_1, ..., STRAT_n are independent with respect to ν, ν is compatible with S, and P^ef implements EQEF^Γ in the standard context (G_0^Γ, ν).

This is very similar in spirit to Theorem 3.1. The key difference is the use of a nonstandard probability measure. Intuitively, this forces S to be a best response even at information sets that are reached with (standard) probability 0. The effect of interpreting "EU_i = x" as "the standard part of i's expected utility is x" is that we ignore infinitesimal differences. Thus, for example, the strategy P_i^ef(s_0) might not be a best response to S_{-i}; it might just be an ε-best response for some infinitesimal ε. As we show in the full paper, it follows from Halpern's [2006] results that we can also obtain a characterization of (trembling hand) perfect equilibrium [Selten, 1975], another standard refinement of Nash equilibrium, if we interpret "EU_i = x" as "the expected utility for agent i is x" and allow x to range over the nonstandard reals instead of just the standard reals.

³These are games where players remember all actions made and the states they have gone through; we give a formal definition in the full paper. See also [Osborne and Rubinstein, 1994].
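The standard-part machinery can be illustrated with a minimal sketch. The `Hyper` class below is our toy model, not the paper's construction: it represents a nonstandard real as a + b·ε for one fixed positive infinitesimal ε, which is enough to show how a node can receive positive but infinitesimal probability while st(·) erases infinitesimal utility differences:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hyper:
    """A toy nonstandard real a + b*eps, for a fixed positive infinitesimal
    eps.  Enough to illustrate st(.); not a full nonstandard-analysis model."""
    a: float        # standard part
    b: float = 0.0  # coefficient of eps

    def __add__(self, other):
        return Hyper(self.a + other.a, self.b + other.b)

    def scale(self, c):
        return Hyper(c * self.a, c * self.b)

def st(x: Hyper) -> float:
    """Standard part: the closest standard real; |x - st(x)| is infinitesimal."""
    return x.a

# A "trembling" strategy reaches the off-path node with probability eps:
# positive (so best response is meaningful there), yet standardly zero.
p_reach = Hyper(0.0, 1.0)
u_on_path, u_off_path = 3.0, 1.0
eu = Hyper(u_on_path) + p_reach.scale(u_off_path - u_on_path)

print(st(p_reach))  # 0.0: the node is reached with standard probability 0
print(st(eu))       # 3.0: infinitesimal differences in expected utility vanish
```

This mirrors the reading of "EU_i = x" in EQEF^Γ: two strategies whose expected utilities differ only by an infinitesimal are treated as equally good, which is exactly why a strategy may be only an ε-best response for infinitesimal ε.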
4 Conclusions

We have shown how a number of different solution concepts from game theory can be captured by essentially one knowledge-based program, which comes in two variants: one appropriate for normal-form games and one for extensive-form games. The differences between these solution concepts are captured by changes in the context in which the games are played: whether players have a common prior (for Nash equilibrium, correlated equilibrium, and sequential equilibrium) or not (for rationalizability); whether strategies are chosen independently (for Nash equilibrium, sequential equilibrium, and rationalizability) or not (for correlated equilibrium); and whether uncertainty is represented using a standard or nonstandard probability measure.

Our results can be viewed as showing that each of these solution concepts sc can be characterized in terms of common knowledge of rationality (since the kb programs EQNF^Γ and EQEF^Γ embody rationality, and we are interested in systems "generated" by these programs, so that rationality holds at all states), and common knowledge of some other features X_sc captured by the context appropriate for sc (e.g., that strategies are chosen independently, or that there is a common prior). Roughly speaking, our results say that if X_sc is common knowledge in a system, then common knowledge of rationality implies that the strategies used must satisfy solution concept sc; conversely, if a joint strategy S satisfies sc, then there is a system where X_sc is common knowledge, rationality is common knowledge, and S is being played at some state. Results similar in spirit have been proved for rationalizability [Brandenburger and Dekel, 1987] and correlated equilibrium [Aumann, 1987]. Our approach allows us to unify and extend these results and, as suggested in the introduction, applies even to settings where the game is not common knowledge and in settings where uncertainty is not represented by probability. We believe that the approach captures the essence of the intuition that a solution concept should embody common knowledge of rationality.

References

[Aghassi and Bertsimas, 2006] M. Aghassi and D. Bertsimas. Robust game theory. Mathematical Programming, Series B, 107(1–2):231–273, 2006.

[Aumann, 1974] R. J. Aumann. Subjectivity and correlation in randomized strategies. Journal of Mathematical Economics, 1:67–96, 1974.

[Aumann, 1987] R. J. Aumann. Correlated equilibrium as an expression of Bayesian rationality. Econometrica, 55:1–18, 1987.

[Aumann, 1995] R. J. Aumann. Backward induction and common knowledge of rationality. Games and Economic Behavior, 8:6–19, 1995.

[Bernheim, 1984] B. D. Bernheim. Rationalizable strategic behavior. Econometrica, 52(4):1007–1028, 1984.

[Boutilier and Hyafil, 2004] C. Boutilier and N. Hyafil. Regret minimizing equilibria and mechanisms for games with strict type uncertainty. In Proc. Twentieth Conf. on Uncertainty in AI (UAI 2004), pages 268–277, 2004.

[Brandenburger and Dekel, 1987] A. Brandenburger and E. Dekel. Rationalizability and correlated equilibria. Econometrica, 55:1391–1402, 1987.

[Chu and Halpern, 2003] F. Chu and J. Y. Halpern. Great expectations. Part I: On the customizability of generalized expected utility. In Proc. Eighteenth International Joint Conf. on AI (IJCAI '03), pages 291–296, 2003.

[Fagin et al., 1995] R. Fagin, J. Y. Halpern, Y. Moses, and M. Y. Vardi. Reasoning About Knowledge. MIT Press, 1995. (Slightly revised paperback version, 2003.)

[Fagin et al., 1997] R. Fagin, J. Y. Halpern, Y. Moses, and M. Y. Vardi. Knowledge-based programs. Distributed Computing, 10(4):199–225, 1997.

[Halpern and Moses, 2004] J. Y. Halpern and Y. Moses. Using counterfactuals in knowledge-based programming. Distributed Computing, 17(2):91–106, 2004.

[Halpern and Rêgo, 2006] J. Y. Halpern and L. C. Rêgo. Extensive games with possibly unaware players. In Proc. Fifth International Joint Conf. on Autonomous Agents and Multiagent Systems, pages 744–751, 2006.

[Halpern, 1997] J. Y. Halpern. On ambiguities in the interpretation of game trees. Games and Economic Behavior, 20:66–96, 1997.

[Halpern, 2001] J. Y. Halpern. Substantive rationality and backward induction. Games and Economic Behavior, 37:425–435, 2001.

[Halpern, 2006] J. Y. Halpern. A nonstandard characterization of sequential equilibrium, perfect equilibrium, and proper equilibrium. Unpublished manuscript, 2006.

[Kreps and Wilson, 1982] D. M. Kreps and R. B. Wilson. Sequential equilibria. Econometrica, 50:863–894, 1982.

[Lewis, 1973] D. K. Lewis. Counterfactuals. Harvard University Press, 1973.

[Osborne and Rubinstein, 1994] M. J. Osborne and A. Rubinstein. A Course in Game Theory. MIT Press, 1994.

[Pearce, 1984] D. G. Pearce. Rationalizable strategic behavior and the problem of perfection. Econometrica, 52(4):1029–1050, 1984.

[Piccione and Rubinstein, 1997] M. Piccione and A. Rubinstein. On the interpretation of decision problems with imperfect recall. Games and Economic Behavior, 20(1):3–24, 1997.

[Samet, 1996] D. Samet. Hypothetical knowledge and games with perfect information. Games and Economic Behavior, 17:230–251, 1996.

[Selten, 1975] R. Selten. Reexamination of the perfectness concept for equilibrium points in extensive games. International Journal of Game Theory, 4:25–55, 1975.

[Stalnaker, 1968] R. C. Stalnaker. A semantic analysis of conditional logic. In N. Rescher, editor, Studies in Logical Theory, pages 98–112. Oxford University Press, 1968.