Maximum Entropy Correlated Equilibria

Luis E. Ortiz
ECE Department
Univ. of Puerto Rico at Mayagüez
Mayagüez, PR 00681
[email protected]

Robert E. Schapire
Dept. of Computer Science
Princeton University
Princeton, NJ 08540
[email protected]

Sham M. Kakade
Toyota Technological Institute
Chicago, IL 60637
[email protected]

Abstract

We study maximum entropy correlated equilibria (Maxent CE) in multi-player games. After motivating and deriving some interesting and important properties of Maxent CE, we provide two gradient-based algorithms that are guaranteed to converge to it. The proposed algorithms have strong connections to algorithms for statistical estimation (e.g., iterative scaling), and permit a distributed learning-dynamics interpretation. We also briefly discuss possible connections of this work, and more generally of the Maximum Entropy Principle in statistics, to the work on learning in games and the problem of equilibrium selection.

1 INTRODUCTION

The Internet has been a significant source of new problems of great scientific interest. One example is understanding the largely decentralized mechanisms leading to the Internet's own formation. Another is studying the outcomes of the interaction of many individual human and artificial agents resulting from the modern mechanisms that the Internet has facilitated during the last ten years (e.g., online auctions). Such problems have played a major role in the increased interest within artificial intelligence in the study of multi-agent systems, as evidenced by the large body of recent work.

Game theory and mathematical economics have become the most popular frameworks in which to formulate and formally study multi-agent systems in artificial intelligence. This has led the computer science community to become very active in pursuing fundamental computational questions in game theory and economics.
Computational game theory and economics has thus emerged as a very important area of study.

An equilibrium is perhaps the most important solution concept in game theory and in mathematical economics, where game theory is applied to model competitive economies [Arrow and Debreu, 1954]. An equilibrium is a point of strategic stance between the players in a game that characterizes, and serves as a prediction of, a possible outcome of the game.

The general problem of equilibrium computation is fundamental in computer science [Papadimitriou, 2001]. In the last decade, there have been many great advances in our understanding of the general prospects and limitations of computing equilibria in games. Just late last year, there was a breakthrough on what was considered a major open problem in the field: computing Nash equilibria was shown to be hard, even for two-player games [Chen and Deng, 2005a,b, Daskalakis and Papadimitriou, 2005, Daskalakis et al., 2005, Goldberg and Papadimitriou, 2005]. However, there are still important open questions. The work presented here contributes in part to the problem of computing equilibria with specific properties.

One important notion of equilibrium in non-cooperative games of recent interest in computational game theory is that of correlated equilibria (CE), a concept due to Aumann [1974]. A CE generalizes the more widely known equilibrium notion due to Nash [1951]. It also offers some interesting advantages over Nash equilibria (NE); among them, (a) it allows a weak form of "cooperation" that can lead to "better" and "fairer" outcomes while maintaining individual player rationality (i.e., each individual still plays a best-response strategy), (b) it is consistent with Bayesian rationality [Aumann, 1987], and (c) unlike NE, reasonably natural learning dynamics in repeated games converge to CE (see, for example, Foster and Vohra [1997], Foster and Vohra [1999], Hart and Mas-Colell [2000]). The textbook example of a CE is the traffic light at a road intersection: a correlating device for which the best response of the individual drivers approaching the intersection is to obey the signal, therefore leading to a better overall outcome not generally achievable by any NE (e.g., it avoids accidents without forcing any unnecessary, discriminatory delays).

In this paper, we study maximum entropy correlated equilibria (Maxent CE). We argue that the Maximum Entropy Principle (Maxent), due to Jaynes [1957], can serve as a guiding principle for selecting among the many possible CE of a game. This is in part because, as we show, Maxent CE affords individual players additional guarantees over arbitrary CE and exhibits other representationally and computationally attractive properties. While in many cases CE have been shown to be more computationally amenable than NE, computing CE with specific properties has been found to be hard in general.

We provide simple gradient-based algorithms for computing Maxent CE that closely resemble statistical inference algorithms used successfully in practice (see, for example, Berger et al. [1996] and Della Pietra et al. [1997]). The algorithms have a "learning" interpretation. In particular, they can be seen as natural update rules which individual players use during some form of pre-play negotiation. The update rules can also be implemented efficiently. We relate this work to that on learning dynamics and discuss other possible connections to the general literature on learning in games. More specifically, we comment on how Maxent may help us characterize, just as it does for many natural systems (e.g., in thermodynamics), the equilibrium behavior that is likely to arise from the dynamics of rational individuals, each following his or her own individually controlled learning mechanism.

2 PRELIMINARIES

In this section, we introduce some basic game-theoretic concepts, terminology, and notation.
(For a thorough introduction to game theory, see Fudenberg and Tirole [1991], for instance.)

Let n be the number of players in the game. For each player i = 1, . . . , n, let A_i be its finite set of actions (also called pure strategies). Let A \equiv \times_{i=1}^n A_i be the joint-action space; each element a \equiv (a_1, . . . , a_n) \in A is called a joint action, i.e., if a_i is the ith component of a, player i plays a_i in a. Let A_{-i} \equiv \times_{j=1, j \ne i}^n A_j denote the joint-action space of all the players except i. Similarly, given a \in A, let a_{-i} \in A_{-i} denote the joint action, in a, of all the players except i. It is often convenient to denote the joint action a by (a_i, a_{-i}) to highlight the action of player i in a.

Let M_i : A \to R be the payoff function of player i, i.e., if the players play joint action a \in A, then each player i individually receives a payoff value of M_i(a). Let M \equiv \{M_1, . . . , M_n\} be the set of payoff functions, one for each player. An n-player game in normal form is defined by the tuple G \equiv (A, M). (Such games are also often referred to as strategic-form or matrix games because, in the finite-action case, one can view each payoff function as an n-dimensional matrix indexed by the joint actions.)

Each player's objective is to maximize their own (expected) payoff. In the context of a game, a probability distribution P over A is called a joint mixed strategy, i.e., a randomized strategy where the players play a \in A with probability P(a). Given a joint mixed strategy P, let P(a_i) denote the individual mixed strategy of player i, i.e., the marginal probability that player i plays a_i in A_i, and P(a_{-i} | a_i) the conditional joint mixed strategy of all the players except i given the action of player i, i.e., the conditional probability that, given that player i plays a_i, the other players play a_{-i}.

An equilibrium is generally considered the solution of a game. An equilibrium can be viewed as a point of strategic stance where every player is "happy," i.e., no player has any incentive to unilaterally deviate from the way they play.

This paper concerns correlated equilibria. Let

G_i(a'_i, a_i, a_{-i}) \equiv M_i(a'_i, a_{-i}) - M_i(a_i, a_{-i})

denote player i's payoff gain from playing a'_i instead of a_i when the other players play a_{-i}. A correlated equilibrium (CE) for a game G is a joint mixed strategy P such that, for every joint action a drawn according to P, each player i, knowing P and its own action a_i in a only, has no payoff gain, in expectation, from unilaterally changing its play to another action a'_i \in A_i; formally, for every player i and every action pair (a_i, a'_i) \in A_i^2 such that a_i \ne a'_i and P(a_i) > 0,

\sum_{a_{-i} \in A_{-i}} P(a_{-i} | a_i) \, G_i(a'_i, a_i, a_{-i}) \le 0.

One way to establish the existence of CE for any game is through their connection to Nash equilibria (NE). An NE is a CE P that is a product distribution, i.e., P(a) = \prod_{i=1}^n P(a_i) for all a, so that the players play independently. The classical result of Nash [1951] is that every game has a (Nash) equilibrium; thus CE always exist.

Given a game G, it is computationally convenient to express its CE conditions as the following equivalent system of linear constraints on the joint-mixed-strategy values P(a): for each player i, and for all (a_i, a'_i) \in A_i^2 such that a_i \ne a'_i,

\sum_{a_{-i} \in A_{-i}} P(a_i, a_{-i}) \, G_i(a'_i, a_i, a_{-i}) \le 0;   (1)

for all a \in A, P(a) \ge 0; and \sum_{a \in A} P(a) = 1. We refer to the set of all P that satisfy this linear system as the correlated equilibria of G and denote it simply by CE, i.e., P \in CE if and only if P is a CE. (In contrast, because an NE is a product distribution, the NE conditions are not linear in the individual players' mixed strategies P(a_i).)
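To make the linear system (1) concrete, here is a minimal sketch (our own illustration, not code from the paper) that enumerates the CE constraints of a small two-player game (Chicken, a standard example) and asks an off-the-shelf LP solver for one feasible point. All names (A, joint, M, gain, rows) are assumptions introduced here.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

# Enumerate the CE constraints of Eq. (1) for a small normal-form game
# and find one feasible point with an LP solver. The game (Chicken) is a
# standard illustration, not an example from the paper.
A = [(0, 1), (0, 1)]                        # action sets A_1, A_2
joint = list(itertools.product(*A))         # joint-action space A
M = {0: {(0, 0): 6, (0, 1): 2, (1, 0): 7, (1, 1): 0},
     1: {(0, 0): 6, (0, 1): 7, (1, 0): 2, (1, 1): 0}}

def gain(i, ai_alt, a):
    """G_i(a'_i, a_i, a_{-i}): player i's gain from switching to ai_alt."""
    a_dev = tuple(ai_alt if j == i else a[j] for j in range(len(a)))
    return M[i][a_dev] - M[i][a]

# One row of "A_ub x <= 0" per player i and ordered action pair (a_i, a'_i).
rows = []
for i in range(len(A)):
    for ai in A[i]:
        for ai_alt in A[i]:
            if ai_alt != ai:
                rows.append([gain(i, ai_alt, a) if a[i] == ai else 0.0
                             for a in joint])

res = linprog(c=np.zeros(len(joint)),       # zero objective: any CE will do
              A_ub=np.array(rows), b_ub=np.zeros(len(rows)),
              A_eq=np.ones((1, len(joint))), b_eq=[1.0],
              bounds=[(0, 1)] * len(joint))
print(dict(zip(joint, np.round(res.x, 3)))) # one correlated equilibrium
```

Any distribution the solver returns satisfies (1); the Maxent CE singled out in the next section is one particular point of this polytope.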
3 MAXENT CE AND ITS PROPERTIES

Given a joint mixed strategy P, let

H(P) \equiv -\sum_{a \in A} P(a) \ln P(a)

be its (Shannon) entropy. Formally, a Maxent CE is the joint mixed strategy P^* = \arg\max_{P \in CE} H(P). We state the results presented in the remainder of this section without proofs; the proofs will become clear from the derivations in the next section.

3.1 CONCEPTUAL PROPERTIES

Consider the following scenario. A rational player is willing to negotiate and agree to some form of "joint" strategy with the other players. At the same time, the player wants to try to conceal its own behavior, relative to the agreed-upon joint strategy, by making it difficult to predict. If every player has the same objective, how should they (agree to) play the game? Alternatively, can we suggest a joint strategy that satisfies all the players but complicates their prediction of each others' individual strategies? We propose Maxent CE as a formal and natural answer to these questions.

The concept of conditional entropy in information theory provides a measure of the predictability of one random process from another [Cover and Thomas, 2006]. As such, conditional entropy helps quantify the uncertainty inherent in the players' strategies. Let P(a_i | a_{-i}) denote the conditional mixed strategy of player i given the actions of the other players, i.e., the probability, with respect to P, that player i plays a_i given that the other players play a_{-i}. Let us view A_i and A_{-i} as random variables corresponding to the play of player i and of the others, respectively, under P. We can then denote by

H_{A_i|A_{-i}}(P) = -\sum_{a_{-i} \in A_{-i}} P(a_{-i}) \sum_{a_i \in A_i} P(a_i | a_{-i}) \log P(a_i | a_{-i})

the entropy of the play of player i conditioned on knowledge about the play of the other players.

Consider any group of players that wishes to predict the strategy of another player based on their own play. Conceptually, we can think of the conditional entropy as a measure of how hard this is: the larger the conditional entropy, the harder the prediction.
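As a concrete illustration, the following minimal sketch computes H_{A_i|A_{-i}}(P) exactly as defined above, under the assumption that P is a dict over the joint-action tuples of the hypothetical `joint` list from the earlier sketch.

```python
import numpy as np

# Conditional entropy H_{A_i|A_-i}(P) of player i's play given the others'.
def conditional_entropy(P, joint, i):
    h = 0.0
    drop_i = lambda a: tuple(x for j, x in enumerate(a) if j != i)
    for a_minus in {drop_i(a) for a in joint}:
        block = [a for a in joint if drop_i(a) == a_minus]
        p_minus = sum(P[a] for a in block)            # marginal P(a_{-i})
        for a in block:
            if p_minus > 0 and P[a] > 0:
                p_cond = P[a] / p_minus               # conditional P(a_i | a_{-i})
                h -= p_minus * p_cond * np.log(p_cond)
    return h

# e.g., conditional_entropy(dict(zip(joint, res.x)), joint, 0) on the LP output
```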
Let us allow each individual player i the opportunity to change from a suggested CE P to another CE P', but the player can only change its conditional mixed strategy P'(a_i | a_{-i}), not the joint mixed strategy of the other players P(a_{-i}); thus, P'(a) must equal P'(a_i | a_{-i}) P(a_{-i}). We show that an important property of the Maxent CE P^* of a game is that no player can unilaterally change to or suggest another allowed CE and increase its conditional entropy of play. More formally, given a joint mixed strategy P' and a player i, let P'_{-i} denote the marginal of P' over A_{-i}, and let

CE_i(P'_{-i}) = \{ P \in CE \mid P(a_{-i}) = P'_{-i}(a_{-i}) \text{ for all } a_{-i} \in A_{-i} \}.

Proposition 1 (Player Optimality of Maxent CE) If P^* is the Maxent CE of a game G, then for each player i, we have

P^* = \arg\max_{P \in CE_i(P^*_{-i})} H_{A_i|A_{-i}}(P).

The proof follows easily from the definition of a Maxent CE and the chain rule for (joint) entropies.

3.2 SIMPLE AND COMPACT REPRESENTATION

Another interesting property of the Maxent CE is that it is, in a sense, the simplest CE, and permits a very compact representation. Let m = max_i |A_i| be the maximum number of actions of any player. Then the number of parameters needed is O(nm^2).

Theorem 1 (Maxent CE Representation) Given an n-player game G, its Maxent CE P^* has the following parametric form: for all a,

P^*(a) \propto \exp\left( -\sum_{i=1}^n \sum_{a'_i \in A_i \setminus \{a_i\}} \lambda^*_{i,a_i,a'_i} \, G_i(a'_i, a_i, a_{-i}) \right),

where the parameters λ^*_{i,a_i,a'_i} \ge 0, for all players i = 1, . . . , n and all of their action pairs (a_i, a'_i) \in A_i^2 such that a_i \ne a'_i.

This form is not surprising to those familiar with maxent optimization, as the parameters correspond to Lagrange multipliers (i.e., the dual variables). In the game context, the parameters λ^* have a natural interpretation. For example, the parameter λ^*_{i,a_i,a'_i} roughly measures the tendency of player i to prefer a'_i over a_i. For instance, λ^*_{i,a_i,a'_i} > 0 if and only if player i is indifferent between a_i and a'_i, i.e., the corresponding CE (inequality) condition holds with equality. Thus, if the player has a strict preference for a_i over a'_i, i.e., the corresponding CE condition holds with strict inequality, then λ^*_{i,a_i,a'_i} must be 0.

We note that, due to a technical condition, the results presented in this paper should be appropriately qualified. The expression for the Maxent CE given above hints at this complication: although the Maxent CE always exists, because CE always exist, one can see the need to appropriately modify the representation when the Maxent CE does not have full support. The results do hold, for instance, if the game has some CE with full support. They hold more generally for the case of Maxent approximate CE, where we allow the possibility that each player can gain by deviating, but by no more than some small amount ε > 0; i.e., for all i and for all a_i \ne a'_i, we allow \sum_{a_{-i}} P(a_i, a_{-i}) G_i(a'_i, a_i, a_{-i}) \le ε. The modifications for the case of approximate CE are easy, and the details are presented in a companion technical report [Ortiz et al., 2006]. We ignore this issue throughout this paper in the interest of presentation.
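The parametric form of Theorem 1 is easy to evaluate directly. The sketch below (an assumption-laden illustration reusing the hypothetical `joint` and `gain` helpers from the earlier sketch) maps a set of nonnegative dual parameters to the corresponding joint mixed strategy; it is also the building block we reuse when sketching the algorithms of Section 4.

```python
import numpy as np

# Theorem 1's parametric form: given nonnegative dual parameters
# lam[i][(ai, ai_alt)], return the joint mixed strategy
# P(a) ∝ exp(-sum_i sum_{a'_i != a_i} lam[i][(a_i, a'_i)] G_i(a'_i, a_i, a_-i)).
def joint_strategy(lam, joint, gain, n_players):
    logp = np.zeros(len(joint))
    for k, a in enumerate(joint):
        for i in range(n_players):
            for (ai, ai_alt), l in lam[i].items():
                if a[i] == ai:
                    logp[k] -= l * gain(i, ai_alt, a)
    p = np.exp(logp - logp.max())    # shift by the max for numerical stability
    return p / p.sum()               # normalization plays the role of Z(lam)
```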
3.3 PROBABILISTIC STRUCTURE

The Maxent CE can exploit available strategic structure in the game. As an example, we now discuss the probabilistic structure of Maxent CE in the particular context of graphical games [Kearns et al., 2001], a graphical model for game theory. The representation size of a game in normal form is O(nm^n), exponential in the number of players. Such sizes render the normal-form representation inadequate for modeling problems as games with many players, as would be the case in many real-world problems, especially on the Internet. Just as probabilistic graphical models allow us to model large but structured probabilistic systems, graphical games allow us to deal with large-population games by providing compact representations for them.

Let G = (V, E) be an undirected graph whose vertices or nodes V = {1, 2, . . . , n} correspond to the players in the game. The neighbors of a player i in G are those players connected to i by a single edge in E. Given a player i, we refer to player i and its neighbors as the neighborhood N_i = \{j \mid (i, j) \in E\} \cup \{i\} of player i (note that N_i includes i). The graph has a simple meaning: a player's payoff is only a function of the actions of the players in its neighborhood. For every player i, let k_i = |N_i| be the size of its neighborhood, with N_i = \{j_1, j_2, . . . , j_{k_i}\} \subset V, and denote by a_{N_i} \equiv (a_{j_1}, a_{j_2}, . . . , a_{j_{k_i}}) \in \times_{j \in N_i} A_j \equiv A_{N_i} the joint action of only the players in the neighborhood of player i.

Given a graph G, for each player i, its local payoff function M'_i : A_{N_i} \to R maps the joint actions a_{N_i} of the players in its neighborhood in G to a real number M'_i(a_{N_i}). For each player i, the payoff function M_i is such that M_i(a) = M'_i(a_{N_i}). Let M' \equiv \{M'_1, . . . , M'_n\} be the set of local payoff functions. A graphical game is defined by the 3-tuple GG \equiv (A, G, M').

Recall that m = max_i |A_i|. Let k = max_i |N_i| be the largest neighborhood size in G. The representation size of a graphical game is O(nm^k), exponential in the size of the largest neighborhood, not in the number of players. In fact, if k is much smaller than n, we obtain exponential savings in representation size. Note that the generality of the normal-form game is not lost, because we can represent any game by using a fully connected graph; we simply gain representationally for games with richer strategic structure.

Let G^N denote the extended-neighborhood graph of G, i.e., the graph that results from adding edges between every pair of neighbors of a player; formally, if G = (V, E), then G^N = (V, E^N), where E^N = E \cup \{(i, j) \mid i, j \in N_k \text{ for some } k\}. The following result follows easily from Theorem 1 above and the strategic structure of the graphical game.

Corollary 1 (Probabilistic Structure of Maxent CE) Given a graphical game GG with graph G, the Maxent CE of GG is a Markov random field with respect to G^N.

There is exactly one potential in the MRF for each player, and each potential is only over the neighborhood of that player.

The representation result of Kakade et al. [2003] states that, given any CE, one can represent it by another CE whose size is the same as the representation size of the graphical game, modulo expected-payoff equivalence. The proof uses a maximum entropy argument based on matching the neighborhood marginals of the original CE. Here we concentrate explicitly on the Maxent CE and give an explicit expression for that equilibrium in terms of the players' payoff functions which uses only O(nm^2) parameters, as opposed to O(nm^k).

This corollary, which can also be derived from the Representation Theorem of Kakade et al. [2003], is significant because it elucidates the probabilistic structure of the CE. It also lets us exploit what is known from the literature on probabilistic graphical models. For instance, we can make qualitative statements about the structure of the CE (e.g., which players play independently conditioned on fixing the actions of a separate set of players) that depend only on the graph structure, and not on the actual payoff values. This connection also establishes the efficient implementation of the Maxent CE (i.e., drawing samples from it efficiently) for bounded-tree-width graphical games, and allows approximate implementations for more general graphs through well-known and broadly used techniques such as Gibbs sampling.
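For intuition, here is a minimal sketch of the Gibbs sampling approach just mentioned. The callback `cond_logits` is a hypothetical stand-in for the player-conditional computation, which in a graphical game touches only the neighborhoods containing player i; nothing here is code from the paper.

```python
import numpy as np

# Gibbs sampling from the MRF form of Corollary 1: resample one player's
# action at a time from its conditional given the others' current actions.
# cond_logits(i, ai, a) must return the unnormalized log-probability of
# player i playing ai given a_{-i}.
def gibbs_sample(action_counts, cond_logits, n_sweeps, seed=0):
    rng = np.random.default_rng(seed)
    a = [int(rng.integers(c)) for c in action_counts]   # arbitrary start
    for _ in range(n_sweeps):
        for i, c in enumerate(action_counts):           # resample each player
            logits = np.array([cond_logits(i, ai, a) for ai in range(c)])
            p = np.exp(logits - logits.max())
            a[i] = int(rng.choice(c, p=p / p.sum()))
    return tuple(a)    # one (approximate) draw from the Maxent CE
```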
3.4 COMPUTATION

The following result is the main computational contribution of this paper. The next section is devoted to its derivation.

Theorem 2 (Maxent CE Computation) Given a game G, there exist gradient-based algorithms with guaranteed convergence to the Maxent CE of G.

We also note that each iteration of the algorithms constitutes a natural step and can be computed in time polynomial in the representation size for normal-form games and, more generally, for graphical games whose neighborhood graphs have bounded tree-width; as noted earlier, approximations are also possible for more general graphs.

Kakade et al. [2003] give an algorithm to compute a single exact CE in graphical games with bounded tree-width in time polynomial in the representation size of the game, based on a simple linear program, thus extending a well-known previous result for normal-form games [Gilboa and Zemel, 1989]. (The result given there is for tree graphical games, but a simple application of standard arguments and ideas from the literature on probabilistic graphical models extends it to the case of bounded-tree-width graphs.) Papadimitriou and Roughgarden [2005] present an alternative algorithm for computing CE in bounded-tree-width graphical games and strengthen this line of results by showing that it is hard to compute the socially optimum CE, i.e., the CE that maximizes the sum of the players' expected payoffs. For bounded-tree-width graphical games, both algorithms can guarantee a representation size for the resulting CE that is at most exponential in the tree-width of the graph of the graphical game. It is also important to note that the kind of equilibrium that the previous algorithms compute corresponds, roughly speaking, to those that maximize some linear function of the neighborhood marginals of the graphical game.

In a surprising result, Papadimitriou [2005] gives a polynomial-time algorithm for computing a single exact CE for a large class of compactly representable games, including graphical games. The goal there is to find any CE, not one with a specific property. The algorithm also uses duality, but in a different way than ours. It is also interesting that the CE found by that algorithm can be represented as a mixture of a polynomial number of product distributions, which is also different from the Maxent representation given above.

4 COMPUTING MAXENT CE

Computing the Maxent CE is a convex optimization problem, allowing the application of efficient algorithms already developed for such problems (see Boyd and Vandenberghe [2004]). Here, we solve this problem using duality, and provide specific gradient-based algorithms that monotonically converge to the unique dual optimum. The algorithms also have a natural distributed learning-dynamics interpretation, which we discuss in the next section.

Following a standard argument, we obtain that the Lagrange dual function reduces to g(λ) = -\ln Z(\lambda), where

Z(\lambda) = \sum_a \exp\left( -\sum_i \sum_{a'_i \ne a_i} \lambda_{i,a_i,a'_i} \, G_i(a'_i, a_i, a_{-i}) \right).

The dual problem reduces to finding \sup_{\lambda \ge 0} g(\lambda). The relationship between the dual variables λ and the primal variables P \equiv P^\lambda is as usual: for all a,

\log P(a) = -\sum_i \sum_{a'_i \ne a_i} \lambda_{i,a_i,a'_i} \, G_i(a'_i, a_i, a_{-i}) - \log Z(\lambda).

4.1 ALGORITHMS

We now present two gradient-based algorithms for computing \sup_{\lambda \ge 0} g(\lambda). For convenience, we use the notation [x]_+ \equiv \max(0, x). We also denote P^t \equiv P^{\lambda^t},

R^+_{i,a_i,a'_i}(t) = \sum_{a_{-i}} P^t(a_i, a_{-i}) \, [G_i(a'_i, a_i, a_{-i})]_+,

R^-_{i,a_i,a'_i}(t) = \sum_{a_{-i}} P^t(a_i, a_{-i}) \, [-G_i(a'_i, a_i, a_{-i})]_+,

and

R_{i,a_i,a'_i}(t) \equiv \nabla_{i,a_i,a'_i} \, g(\lambda^t) = R^+_{i,a_i,a'_i}(t) - R^-_{i,a_i,a'_i}(t).

We call R_{i,a_i,a'_i}(t) the regret that player i has for playing a_i instead of a'_i, with respect to P^t.

The logarithmic-gradient algorithm is as follows: initialize λ^0 \ge 0 arbitrarily and at every round t \ge 0, set

\lambda^{t+1}_{i,a_i,a'_i} \leftarrow \left[ \lambda^t_{i,a_i,a'_i} + \delta^t_{i,a_i,a'_i} \right]_+,   (2)

where

\delta^t_{i,a_i,a'_i} = \frac{1}{2c} \log\left( R^+_{i,a_i,a'_i}(t) / R^-_{i,a_i,a'_i}(t) \right)   (3)

and the constant c is such that c \ge \sum_i \sum_{a'_i \ne a_i} |G_i(a'_i, a_i, a_{-i})| for all a_i, a_{-i}.

The dynamic-step-size gradient-ascent algorithm is the same, except that it uses

\delta^t_{i,a_i,a'_i} = \frac{1}{2c} \cdot \frac{\nabla_{i,a_i,a'_i} \, g(\lambda^t)}{R^+_{i,a_i,a'_i}(t) + R^-_{i,a_i,a'_i}(t)} = \frac{1}{2c} \cdot \frac{R^+_{i,a_i,a'_i}(t) - R^-_{i,a_i,a'_i}(t)}{R^+_{i,a_i,a'_i}(t) + R^-_{i,a_i,a'_i}(t)} = \frac{1}{c} \left( \frac{R^+_{i,a_i,a'_i}(t)}{R^+_{i,a_i,a'_i}(t) + R^-_{i,a_i,a'_i}(t)} - \frac{1}{2} \right).   (4)
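To tie the pieces together, the following minimal sketch implements the logarithmic-gradient updates (2)-(3) on a small normal-form game, reusing the hypothetical `joint`, `gain`, and `joint_strategy` helpers from the earlier sketches. The `eps` smoothing of the log is our own guard for vanishing regret terms; the paper instead handles such support issues through approximate CE.

```python
import numpy as np

# Logarithmic-gradient updates (2)-(3) toward the Maxent CE.
def maxent_ce(A, joint, gain, n_rounds=2000, eps=1e-12):
    n = len(A)
    # c >= sum_i sum_{a'_i != a_i} |G_i(a'_i, a_i, a_-i)| for every joint a
    c = max(sum(abs(gain(i, b, a)) for i in range(n) for b in A[i] if b != a[i])
            for a in joint)
    lam = [{(ai, b): 0.0 for ai in A[i] for b in A[i] if b != ai}
           for i in range(n)]
    for _ in range(n_rounds):
        p = joint_strategy(lam, joint, gain, n)          # P^t of Theorem 1
        for i in range(n):
            for (ai, b) in lam[i]:
                r_plus = sum(pk * max(gain(i, b, a), 0.0)
                             for pk, a in zip(p, joint) if a[i] == ai)
                r_minus = sum(pk * max(-gain(i, b, a), 0.0)
                              for pk, a in zip(p, joint) if a[i] == ai)
                delta = np.log((r_plus + eps) / (r_minus + eps)) / (2 * c)
                # Eq. (4) would instead use (r_plus/(r_plus+r_minus) - 0.5)/c.
                lam[i][(ai, b)] = max(0.0, lam[i][(ai, b)] + delta)  # Eq. (2)
    return joint_strategy(lam, joint, gain, n)
```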
4.2 CONVERGENCE

We show that if we use either algorithm at each round, then the sequence P^t converges to the Maxent CE. Due to space constraints, we omit the details of the proof and refer the reader to the companion technical report [Ortiz et al., 2006].

The key element in the proof is an auxiliary-function, variational approach similar to that of Dudík et al. [2004] and Collins et al. [2002]. The idea is to lower bound the change in the dual value and then maximize that bound. The following is a sketch. Let s_i(a'_i, a_i, a_{-i}) = 1 if G_i(a'_i, a_i, a_{-i}) \ge 0, and -1 otherwise. First note that, by Jensen's inequality,

\exp\left( \sum_i \sum_{a'_i \ne a_i} \frac{|G_i(a'_i, a_i, a_{-i})|}{c} \left( -c \, \delta^t_{i,a_i,a'_i} \, s_i(a'_i, a_i, a_{-i}) \right) \right) \le \sum_i \sum_{a'_i \ne a_i} \frac{|G_i(a'_i, a_i, a_{-i})|}{c} \, e^{-c \, \delta^t_{i,a_i,a'_i} \, s_i(a'_i, a_i, a_{-i})} + \left( 1 - \sum_i \sum_{a'_i \ne a_i} \frac{|G_i(a'_i, a_i, a_{-i})|}{c} \right).

Applying this inequality and linearity of expectation, we get

Z^{t+1}/Z^t \le \sum_{i, a_i, a'_i : \, a_i \ne a'_i} \left[ \left( e^{-c \, \delta^t_{i,a_i,a'_i}} - 1 \right) \frac{R^+_{i,a_i,a'_i}(t)}{c} + \left( e^{c \, \delta^t_{i,a_i,a'_i}} - 1 \right) \frac{R^-_{i,a_i,a'_i}(t)}{c} \right] + 1.

Minimizing with respect to δ leads to the logarithmic-gradient algorithm. The gradient-ascent algorithm follows similarly after applying the well-known inequality 1 + x \le e^x \le 1 + x + x^2 for |x| < 1 to the bound, before minimizing. We then show that, in fact, δ^t = 0 if and only if g(λ^t) = \max_{\lambda \ge 0} g(\lambda) = \max_{P \in CE} H(P).

5 A LEARNING-DYNAMICS VIEW

In this section, we connect the algorithms to simple learning dynamics [Singh et al., 2000]. Convergence of best-response-gradient dynamics to NE is not always possible [Hart and Mas-Colell, 2003]. In contrast, the analogous process presented here is guaranteed to converge to the Maxent CE.

Consider the following simple learning dynamics based on distributed gradient ascent. In this case, each player is known to play independently, and thus holds its own mixed strategy P_i over just A_i. At every round t, each player i sends all the other players its current strategy P^t_i. Each player then computes the expected payoff from playing a particular action when the others play according to their broadcast strategies. Assuming the other players maintain their strategies, each player then improves its strategy by changing it in the direction of the gradient of its expected payoff \sum_a P_i(a_i) \left( \prod_{j \ne i} P^t_j(a_j) \right) M_i(a_i, a_{-i}) with respect to its own strategy P_i. Formally, each player i's gradient-ascent update rule is, for each a_i,

P^{t+1}_i(a_i) \leftarrow P^t_i(a_i) + \alpha^t \sum_{a_{-i}} \left( \prod_{j \ne i} P^t_j(a_j) \right) M_i(a_i, a_{-i}),

where α^t is the step size, while ensuring that the updates lead to a proper probability distribution. There has been quite a bit of work that tries to understand the convergence properties of such update rules. However, it is known that, in general, update rules of this kind are not guaranteed to converge to NE.

We can view the gradient-based algorithms presented in the previous section as a type of distributed learning via a message-passing process analogous to the gradient-ascent update rule just described. Now, however, because the players are negotiating a joint mixed strategy, each player suggests its own joint mixed strategy P_i(a). For conceptual presentation only, let us introduce an additional "external player" or "arbiter"; just as in gradient dynamics, this is not really needed, as each individual can broadcast its suggested joint mixed strategy and perform the functions of the arbiter. At each round t, the arbiter takes suggestions from each player i about its preferred joint mixed strategy P^t_i. The arbiter then processes those suggestions by forming a single joint mixed strategy P^t as follows,

P^t(a) \propto \prod_{i=1}^n P^t_i(a),

which it then sends back to each player for consideration.

We can interpret P^t as follows. We can think of each player as using their own joint mixed strategy P^t_i to draw joint-action suggestions. Under this mechanism, the players actually perform a joint play only when everybody draws the same joint action. When play actually occurs, the probability that they all agree to play a is in fact P^t(a)! The arbiter, in some sense, serves as a surrogate that accelerates joint-action agreements; thus, only one draw is needed.

Throughout the process, each player i maintains a parametric joint probability distribution which at any time t takes the form

P^t_i(a) \propto \exp\left( -\sum_{a'_i \ne a_i} \lambda^t_{i,a_i,a'_i} \, G_i(a'_i, a_i, a_{-i}) \right),

where λ^t_{i,a_i,a'_i} corresponds to the current values of its own parameters. Each player also updates the parameters using either of the update rules described in the previous section. We can see from the update rules that each player only needs to know its own payoffs, its current parameter values, and no other external knowledge beyond the current global joint mixed strategy. When an arbiter is present, no knowledge of the other players' suggestions is needed, and without an arbiter, only the individual joint mixed strategies at the current time step are required, not the other players' parameters or the payoff functions that define those strategies. Each player can thus keep its own parameter and payoff values private.

The update rules have a very natural interpretation for each player. If R_{i,a_i,a'_i} (i.e., the gradient of the dual) is positive, then player i has regret for playing a_i instead of a'_i with respect to the currently agreed (global) joint distribution of play. Therefore, assuming the other players maintain their strategies, the player should increase the value of λ_{i,a_i,a'_i} so as to reduce the probability of playing a_i (and indirectly increase that of playing a'_i); similar reasoning applies to the case R_{i,a_i,a'_i} < 0. It is not hard to see that, by using the learning (or pre-play negotiation) mechanism just described, the players are effectively computing the Maxent CE in a distributed way. Thus, the convergence result guarantees that this process in fact converges to the Maxent CE.
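As a small illustration of the arbiter step, the sketch below (again with hypothetical helpers) forms the normalized product of the players' suggested joint mixed strategies; since each suggestion P^t_i carries only player i's λ-block, the product has exactly the joint parametric form of Theorem 1.

```python
import numpy as np

# Arbiter step of Section 5: combine the players' suggested joint mixed
# strategies into a single one, P^t(a) ∝ prod_i P_i^t(a).
def arbiter_combine(suggestions):
    """suggestions: list of arrays, one P_i^t per player, over `joint`."""
    p = np.prod(np.stack(suggestions), axis=0)
    return p / p.sum()
```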
6 DISCUSSION

On the question of equilibrium selection. From a representational standpoint, "simpler" CE are better. Motivated by the representational results of Kakade et al. [2003] on CE in graphical games, we initially considered the following question: Are there natural learning rules directly leading to "compactly representable" and "reasonably structured" CE that exploit the strategic structure of the graphical game? It turns out that this question is harder than it first appears.

From a learning standpoint, it has been found that when agents in an environment simultaneously learn to act using natural learning rules, as in so-called no-regret learning, convergence of their empirical play is guaranteed, not to NE, but to the set of CE (see, for instance, Foster and Vohra [1999] and Hart and Mas-Colell [2000]). Currently, there is little understanding of the specific equilibria those types of learning rules might converge to. Thus, there has been considerable interest in trying to characterize in any way the behavior of natural learning rules leading to CE. While we believe our work is a step towards addressing this question, in our own preliminary experiments, the empirical play produced by no-regret-type learning rules does not seem to converge to a specific CE, let alone the Maxent CE. The question is, are there other kinds of natural learning rules that do converge to the Maxent CE?

Because a game can have many equilibria, another related question is, which equilibria are more natural or more likely to arise from the players' dynamics? This "selection" question is not particular to game theory. In statistical estimation, for instance, one asks which probability distribution should be chosen out of all distributions consistent with the data. The Maximum Entropy Principle [Jaynes, 1957] has proved to be a natural and useful guiding principle with wide applicability beyond statistics. Maxent has often helped to characterize equilibria in many natural systems (particularly in thermodynamics). We believe Maxent may be useful for studying the equilibria that are actually reached under simple learning dynamics and empirical play. From an information-theoretic point of view, the maximum entropy distribution is the "simplest" as a consequence of being the most "uninformative." Hence, it is not unreasonable to think that Maxent can also serve as a useful guiding principle for the question of equilibrium selection in game theory.

Maxent vs. Social Welfare. In game theory, an equilibrium is a descriptive concept of any stable outcome of the interaction of rational non-cooperative players in a game. As such, game theory by itself imposes no classification of equilibria as good or bad (as in the physical sciences, equilibria are only stable states). Understanding how stable outcomes can be achieved which are somehow "socially optimal" (a notion which in and of itself is thorny to define), particularly when players are not cooperative, is an important open problem.

It is nevertheless instructive to explore the connections between Maxent CE and other popular equilibrium-selection notions, like the socially optimum CE with respect to total welfare (i.e., the total sum of expected payoffs). We can compute the socially optimum CE using a linear program, as sketched below. In many simple, classical toy games (e.g., Battle of the Sexes, Shapley, etc.), the Maxent CE is an NE, and, as with NE, the Maxent CE can achieve lower payoff than the socially optimal one. As is well known, a game can have CE that are both better and worse than its NE with respect to sum total welfare. Similarly, in general, we expect there to be both better and worse CE than the Maxent CE (with regard to sum total welfare). We should note that it is unclear whether the equilibrium which maximizes the sum total welfare is the most appropriate concept for characterizing complex systems.
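For comparison with the Maxent CE computed earlier, this minimal sketch selects the socially optimum CE by linear programming, reusing the hypothetical `joint`, `M`, and CE-constraint `rows` from the sketch in Section 2; only the objective changes.

```python
import numpy as np
from scipy.optimize import linprog

# Socially optimum CE: maximize total expected payoff over the CE polytope.
# linprog minimizes, hence the negated objective.
welfare = np.array([sum(M[i][a] for i in M) for a in joint])
res = linprog(c=-welfare,
              A_ub=np.array(rows), b_ub=np.zeros(len(rows)),
              A_eq=np.ones((1, len(joint))), b_eq=[1.0],
              bounds=[(0, 1)] * len(joint))
print(dict(zip(joint, np.round(res.x, 3))), "total welfare:", welfare @ res.x)
```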
Conclusions. We propose maximum entropy as a useful guiding principle for the question of equilibrium selection in game theory. We motivated and studied maximum entropy correlated equilibria and showed that they have some very useful representational and computational properties. In particular, we presented a logarithmic-gradient and a (dynamic-step-size) gradient-ascent algorithm and showed that they are guaranteed to converge to the Maxent CE for any game. Each round of the algorithms can be performed efficiently for a wide class of games, including normal-form games and some graphical games (e.g., those with bounded tree-width); widely popular and effective approximation techniques such as Gibbs sampling can be used for general graphical games. The algorithms are similar to others used for statistical estimation which have been found effective in many practical applications. Finally, we provided a view of the algorithms in terms of learning or pre-play negotiation dynamics. We believe Maxent CE may help us understand the properties of the equilibria that the empirical play generated by some natural learning mechanisms might converge to.

Acknowledgements

We would like to thank Michael Collins for pointers to the literature on maximum entropy models as well as many useful suggestions. We also want to thank the anonymous reviewers for their useful comments and suggestions, especially those regarding the comparison between Maxent and socially optimum CE. Part of this work was done while the first and third authors were at the CIS Dept. at Penn, and also while the first author was at MIT CSAIL. The current address for the first author is given.

References

Kenneth J. Arrow and Gerard Debreu. Existence of an equilibrium for a competitive economy. Econometrica, 22(3):265–290, July 1954.

R. J. Aumann. Subjectivity and correlation in randomized strategies. Journal of Mathematical Economics, 1, 1974.

R. J. Aumann. Correlated equilibrium as an expression of Bayesian rationality. Econometrica, 55, 1987.

A. Berger, S. Della Pietra, and V. Della Pietra. A maximum entropy approach to natural language processing. Computational Linguistics, 22(1), March 1996.

Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

Xi Chen and Xiaotie Deng. 3-NASH is PPAD-complete. Electronic Colloquium on Computational Complexity (ECCC), (134), 2005a.

Xi Chen and Xiaotie Deng. Settling the complexity of 2-player Nash-equilibrium. Electronic Colloquium on Computational Complexity (ECCC), (140), 2005b.

Michael Collins, Robert E. Schapire, and Yoram Singer. Logistic regression, AdaBoost and Bregman distances. Machine Learning, 48(1/2/3), 2002.

Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. Wiley & Sons, New York, second edition, 2006.

Konstantinos Daskalakis and Christos H. Papadimitriou. Three-player games are hard. Electronic Colloquium on Computational Complexity (ECCC), (139), 2005.

Konstantinos Daskalakis, Paul W. Goldberg, and Christos H. Papadimitriou. The complexity of computing a Nash equilibrium. Electronic Colloquium on Computational Complexity (ECCC), (115), 2005.

Stephen Della Pietra, Vincent Della Pietra, and John Lafferty. Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4):1–13, April 1997.

Miroslav Dudík, Steven J. Phillips, and Robert E. Schapire. Performance guarantees for regularized maximum entropy density estimation. In Proceedings of COLT, 2004.

D. Foster and R. Vohra. Calibrated learning and correlated equilibrium. Games and Economic Behavior, 1997.

D. Foster and R. Vohra. Regret in the on-line decision problem. Games and Economic Behavior, pages 7–36, 1999.
Drew Fudenberg and Jean Tirole. Game Theory. The MIT Press, 1991.

I. Gilboa and E. Zemel. Nash and correlated equilibria: some complexity considerations. Games and Economic Behavior, 1:80–93, 1989.

Paul W. Goldberg and Christos H. Papadimitriou. Reducibility among equilibrium problems. Electronic Colloquium on Computational Complexity (ECCC), (090), 2005.

Sergiu Hart and Andreu Mas-Colell. A simple adaptive procedure leading to correlated equilibrium. Econometrica, 68(5):1127–1150, 2000.

Sergiu Hart and Andreu Mas-Colell. Uncoupled dynamics do not lead to Nash equilibrium. American Economic Review, 93(5):1830–1836, 2003.

E. T. Jaynes. Information theory and statistical mechanics I & II. Physical Review, 1957.

Sham Kakade, Michael Kearns, John Langford, and Luis Ortiz. Correlated equilibria in graphical games. In Proceedings of the ACM Conference on Electronic Commerce, 2003.

M. Kearns, M. Littman, and S. Singh. Graphical models for game theory. In Proceedings of the Conference on Uncertainty in Artificial Intelligence, pages 253–260, 2001.

J. F. Nash. Non-cooperative games. Annals of Mathematics, 54:286–295, 1951.

Luis E. Ortiz, Robert E. Schapire, and Sham M. Kakade. Maximum entropy correlated equilibrium. Technical Report TR-2006-21, MIT CSAIL, Cambridge, MA, USA, March 2006.

Christos Papadimitriou. Algorithms, games, and the Internet. In STOC '01: Proceedings of the Thirty-Third Annual ACM Symposium on Theory of Computing, pages 749–753, 2001.

Christos H. Papadimitriou. Computing correlated equilibria in multi-player games. In STOC '05: Proceedings of the Thirty-Seventh Annual ACM Symposium on Theory of Computing, pages 49–56, 2005.

Christos H. Papadimitriou and Tim Roughgarden. Computing equilibria in multi-player games. In SODA '05: Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 82–91, 2005.

Satinder P. Singh, Michael J. Kearns, and Yishay Mansour. Nash convergence of gradient dynamics in general-sum games. In UAI, pages 541–548, 2000.