Maximum Entropy Correlated Equilibria Luis E. Ortiz Robert E. Schapire Sham M. Kakade ECE Deptartment Dept. of Computer Science Toyota Technological Institute Univ. of Puerto Rico at Mayaguez¨ Princeton University Chicago, IL 60637 Mayaguez,¨ PR 00681 Princeton, NJ 08540 [email protected] [email protected] [email protected] Abstract cept in game theory and in mathematical economics, where game theory is applied to model competitive economies We study maximum entropy correlated equilib- [Arrow and Debreu, 1954]. An equilibrium is a point of ria (Maxent CE) in multi-player games. After strategic stance between the players in a game that charac- motivating and deriving some interesting impor- terizes and serves as a prediction to a possible outcome of tant properties of Maxent CE, we provide two the game. gradient-based algorithms that are guaranteed to The general problem of equilibrium computation is funda- converge to it. The proposed algorithms have mental in computer science [Papadimitriou, 2001]. In the strong connections to algorithms for statistical last decade, there have been many great advances in our estimation (e.g., iterative scaling), and permit a understanding of the general prospects and limitations of distributed learning-dynamics interpretation. We computing equilibria in games. Just late last year, there also briefly discuss possible connections of this was a breakthrough on what was considered a major open work, and more generally of the Maximum En- problem in the field: computing equilibria was shown tropy Principle in statistics, to the work on learn- to be hard, even for two-player games [Chen and Deng, ing in games and the problem of equilibrium se- 2005b,a, Daskalakis and Papadimitriou, 2005, Daskalakis lection. et al., 2005, Goldberg and Papadimitriou, 2005]. How- ever, there are still important open questions. The work presented here contributes in part to the problem of com- 1 INTRODUCTION puting equilibria with specific properties. One important notion of equilibrium in non-cooperative The Internet has been a significant source of new problems games of recent interest in computational game theory is of great scientific interest. One example is understanding that of correlated equilibria (CE), a concept due to Aumann the largely decentralized mechanisms leading to the Inter- [1974]. A CE generalizes the more widely-known equilib- net’s own formation. Another example is to study the out- rium notion due to Nash [1951]. It also offers some inter- comes of the interaction of many individual human and ar- esting advantages over Nash equilibria (NE); among them, tificial agents resulting from the modern mechanisms that (a) it allows a weak form of “cooperation” that can lead the Internet has facilitated during the last ten years (e.g., to “better” and “fairer” outcomes while maintaining indi- online auction). Such problems have played a major role in vidual player rationality (i.e., each individual still plays a the increased interest in artificial intelligence to the study best-response strategy), (b) it is consistent with Bayesian of multi-agent systems, as evidenced by the large body of rationality [Aumann, 1987], and (c) unlike NE, reasonably recent work. natural learning dynamics in repeated games converge to Game theory and mathematical economics have become CE (see, for example, Foster and Vohra [1997], Foster and the most popular framework in which to formulate and for- Vohra [1999], Hart and Mas-Colell [2000]). The textbook mally study multi-agent systems in artificial intelligence. example of a CE is the traffic light at a road intersection: This has led the computer science community to become a correlating device for which the best response of the very active in pursuing fundamental computational ques- individual drivers approaching the intersection is to obey tions in game theory and economics. Computational game the signal, therefore leading to a better overall outcome theory and economics has thus emerged as a very important not generally achievable by any NE (e.g., avoids accidents area of study. without forcing any unnecessary, discriminatory delays). An equilibrium is perhaps the most important solution con- In this paper, we study maximum entropy correlated equi- libria (Maxent CE). We argue that the Maximum Entropy Each player’s objective is to maximize their own (expected) Principle (Maxent), due to Jaynes [1957], can serve as a payoff. In the context of a game, a probability distribution guiding principle for selecting among the many possible P over An is called a joint mixed strategy, i.e., a random- CE of a game. This is in part because, as we show, Maxent ized strategy where players play a ∈ An with probability CE affords individual players additional guarantees over ar- P(a). Given a joint mixed strategy P, let P(ai) denote bitrary CE and exhibits other representational and compu- the individual mixed strategy of player i, i.e., the marginal tationally attractive properties. While in many cases CE probability that player i plays ai in Ai, and P(a−i|ai) the has been shown more computationally amenable than NE, conditional joint mixed strategy of all the players except i computing CE with specific properties has been found to given the action of player i, i.e., the conditional probability be hard in general. that, given that player i plays ai, the other players play a−i. We provide simple gradient-based algorithms for comput- An equilibrium is generally considered the solution of a ing Maxent CE that closely resemble statistical inference game. An equilibrium can be viewed as a point of strategic algorithms used successfully in practice (see, for example, stance, where every player is “happy,” i.e., no player has Berger et al. [1996] and Della Pietra et al. [1997]). The any incentive to unilaterally deviate from the way they play. algorithms have a “learning” interpretation. In particular, This paper concerns correlated equilibria. Let they can be seen as natural update rules which individual G (a0 , a , a ) ≡ M (a0 , a ) − M (a , a ) denote players use during some form of pre-play negotiation. The i i i −i i i −i i i −i player i’s payoff gain from playing a0 instead of a when update rules can also be implemented efficiently. We relate i i the other players play a .A correlated equilibrium (CE) this work to that on learning dynamics and discuss other −i for a game G is a joint mixed strategy P such that for possible connections to the general literature on learning every joint action a drawn according to P, each player i, in games. More specifically, we comment on how Maxent knowing P and its own action a in a only, has no payoff may help us characterize, just as it does for many natural i gain, in expectation, from unilateral changing its play systems (e.g., in thermodynamics), the equilibrium behav- to another action a0 ∈ A instead; formally, if for every ior that is likely to arise from the dynamics of rational in- i i player i, and every action pair (a , a0 ) ∈ A2, such that dividuals, each following his or her own individually con- i i i a 6= a0 and P(a ) > 0, trolled learning mechanism. i i i P P(a |a )G (a0 , a , a ) ≤ 0. a−i∈A−i −i i i i i −i 2 PRELIMINARIES One way to establish the existence of CE for any game In this section, we introduce some basic game-theoretic is through its connection to Nash equilibria (NE). An NE concepts, terminology and notation. (For a thorough intro- is a CE P that is a product distribution, i.e.,P(a) = duction to game theory, see Fudenberg and Tirole [1991], Qn i=1 P(ai) for all a, and therefore players play indepen- for instance.) dently. The classical result of Nash [1951] is that for any Let n be the number of players in the game. For each player game there exists a (Nash) equilibrium, thus CE always ex- i = 1, . , n, let Ai be its finite set of actions (also called ist. n pure strategies). Let A ≡ ×i=1Ai be the joint-action Given a game G, it is computationally convenient to ex- space; each element a ≡ (a1, . , an) ∈ A is a called a press its CE conditions as the following equivalent sys- joint action, i.e., if ai is the ith component of a, player i n tem of linear constraints on the joint-mixed-strategy values plays ai in a. Let A−i ≡ ×j=1,j6=iAj denote the joint 2 0 2 P(a): for each player i, and for all (ai, ai) ∈ Ai such that action space of all the players except that of i. Similarly, 0 ai 6= ai, given a ∈ A, let a−i ∈ A−i denote the joint action, in a, of all the players except that of i. It is often convenient to P 0 a ∈A P(ai, a−i)Gi(ai, ai, a−i) ≤ 0; (1) denote the joint action a by (ai, a−i) to highlight the action −i −i of player i in a. for all a ∈ A, P(a) ≥ 0, and P P(a) = 1. We re- n a∈A Let Mi : A → R be the payoff function of player i, i.e., fer to the set of all P that satisfy this linear system as the if players play joint action a ∈ A, then each player i in- correlated equilibria of G and denote it simply by CE, i.e., dividually receives a payoff value of Mi(a). Let M ≡ P ∈ CE if and only if P is a CE. {M1,...,Mn} be the set of payoff functions, one for each player.
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages8 Page
-
File Size-