Maximum Entropy Correlated Equilibria

Maximum Entropy Correlated Equilibria Luis E. Ortiz Robert E. Schapire Sham M. Kakade ECE Deptartment Dept. of Computer Science Toyota Technological Institute Univ. of Puerto Rico at Mayaguez¨ Princeton University Chicago, IL 60637 Mayaguez,¨ PR 00681 Princeton, NJ 08540 [email protected] [email protected] [email protected] Abstract cept in game theory and in mathematical economics, where game theory is applied to model competitive economies We study maximum entropy correlated equilib- [Arrow and Debreu, 1954]. An equilibrium is a point of ria (Maxent CE) in multi-player games. After strategic stance between the players in a game that charac- motivating and deriving some interesting impor- terizes and serves as a prediction to a possible outcome of tant properties of Maxent CE, we provide two the game. gradient-based algorithms that are guaranteed to The general problem of equilibrium computation is funda- converge to it. The proposed algorithms have mental in computer science [Papadimitriou, 2001]. In the strong connections to algorithms for statistical last decade, there have been many great advances in our estimation (e.g., iterative scaling), and permit a understanding of the general prospects and limitations of distributed learning-dynamics interpretation. We computing equilibria in games. Just late last year, there also briefly discuss possible connections of this was a breakthrough on what was considered a major open work, and more generally of the Maximum En- problem in the field: computing equilibria was shown tropy Principle in statistics, to the work on learn- to be hard, even for two-player games [Chen and Deng, ing in games and the problem of equilibrium se- 2005b,a, Daskalakis and Papadimitriou, 2005, Daskalakis lection. et al., 2005, Goldberg and Papadimitriou, 2005]. How- ever, there are still important open questions. The work presented here contributes in part to the problem of com- 1 INTRODUCTION puting equilibria with specific properties. One important notion of equilibrium in non-cooperative The Internet has been a significant source of new problems games of recent interest in computational game theory is of great scientific interest. One example is understanding that of correlated equilibria (CE), a concept due to Aumann the largely decentralized mechanisms leading to the Inter- [1974]. A CE generalizes the more widely-known equilib- net’s own formation. Another example is to study the out- rium notion due to Nash [1951]. It also offers some inter- comes of the interaction of many individual human and ar- esting advantages over Nash equilibria (NE); among them, tificial agents resulting from the modern mechanisms that (a) it allows a weak form of “cooperation” that can lead the Internet has facilitated during the last ten years (e.g., to “better” and “fairer” outcomes while maintaining indi- online auction). Such problems have played a major role in vidual player rationality (i.e., each individual still plays a the increased interest in artificial intelligence to the study best-response strategy), (b) it is consistent with Bayesian of multi-agent systems, as evidenced by the large body of rationality [Aumann, 1987], and (c) unlike NE, reasonably recent work. natural learning dynamics in repeated games converge to Game theory and mathematical economics have become CE (see, for example, Foster and Vohra [1997], Foster and the most popular framework in which to formulate and for- Vohra [1999], Hart and Mas-Colell [2000]). The textbook mally study multi-agent systems in artificial intelligence. example of a CE is the traffic light at a road intersection: This has led the computer science community to become a correlating device for which the best response of the very active in pursuing fundamental computational ques- individual drivers approaching the intersection is to obey tions in game theory and economics. Computational game the signal, therefore leading to a better overall outcome theory and economics has thus emerged as a very important not generally achievable by any NE (e.g., avoids accidents area of study. without forcing any unnecessary, discriminatory delays). An equilibrium is perhaps the most important solution con- In this paper, we study maximum entropy correlated equilibria (Maxent CE). We argue that the Maximum Entropy Each player’s objective is to maximize their own (expected) Principle (Maxent), due to Jaynes [1957], can serve as a payoff. In the context of a game, a probability distribution guiding principle for selecting among the many possible P over An is called a joint mixed strategy, i.e., a random- CE of a game. This is in part because, as we show, Maxent ized strategy where players play a ∈ An with probability CE affords individual players additional guarantees over ar- P(a). Given a joint mixed strategy P, let P(ai) denote bitrary CE and exhibits other representational and compu- the individual mixed strategy of player i, i.e., the marginal tationally attractive properties. While in many cases CE probability that player i plays ai in Ai, and P(a−i|ai) the has been shown more computationally amenable than NE, conditional joint mixed strategy of all the players except i computing CE with specific properties has been found to given the action of player i, i.e., the conditional probability be hard in general. that, given that player i plays ai, the other players play a−i. We provide simple gradient-based algorithms for comput- An equilibrium is generally considered the solution of a ing Maxent CE that closely resemble statistical inference game. An equilibrium can be viewed as a point of strategic algorithms used successfully in practice (see, for example, stance, where every player is “happy,” i.e., no player has Berger et al. [1996] and Della Pietra et al. [1997]). The any incentive to unilaterally deviate from the way they play. algorithms have a “learning” interpretation. In particular, This paper concerns correlated equilibria. Let they can be seen as natural update rules which individual G (a0 , a , a ) ≡ M (a0 , a ) − M (a , a ) denote players use during some form of pre-play negotiation. The i i i −i i i −i i i −i player i’s payoff gain from playing a0 instead of a when update rules can also be implemented efficiently. We relate i i the other players play a .A correlated equilibrium (CE) this work to that on learning dynamics and discuss other −i for a game G is a joint mixed strategy P such that for possible connections to the general literature on learning every joint action a drawn according to P, each player i, in games. More specifically, we comment on how Maxent knowing P and its own action a in a only, has no payoff may help us characterize, just as it does for many natural i gain, in expectation, from unilateral changing its play systems (e.g., in thermodynamics), the equilibrium behav- to another action a0 ∈ A instead; formally, if for every ior that is likely to arise from the dynamics of rational in- i i player i, and every action pair (a , a0 ) ∈ A2, such that dividuals, each following his or her own individually con- i i i a 6= a0 and P(a ) > 0, trolled learning mechanism. i i i P P(a |a )G (a0 , a , a ) ≤ 0. a−i∈A−i −i i i i i −i 2 PRELIMINARIES One way to establish the existence of CE for any game In this section, we introduce some basic game-theoretic is through its connection to Nash equilibria (NE). An NE concepts, terminology and notation. (For a thorough intro- is a CE P that is a product distribution, i.e.,P(a) = duction to game theory, see Fudenberg and Tirole [1991], Qn i=1 P(ai) for all a, and therefore players play indepen- for instance.) dently. The classical result of Nash [1951] is that for any Let n be the number of players in the game. For each player game there exists a (Nash) equilibrium, thus CE always ex- i = 1, . , n, let Ai be its finite set of actions (also called ist. n pure strategies). Let A ≡ ×i=1Ai be the joint-action Given a game G, it is computationally convenient to ex- space; each element a ≡ (a1, . , an) ∈ A is a called a press its CE conditions as the following equivalent sys- joint action, i.e., if ai is the ith component of a, player i n tem of linear constraints on the joint-mixed-strategy values plays ai in a. Let A−i ≡ ×j=1,j6=iAj denote the joint 2 0 2 P(a): for each player i, and for all (ai, ai) ∈ Ai such that action space of all the players except that of i. Similarly, 0 ai 6= ai, given a ∈ A, let a−i ∈ A−i denote the joint action, in a, of all the players except that of i. It is often convenient to P 0 a ∈A P(ai, a−i)Gi(ai, ai, a−i) ≤ 0; (1) denote the joint action a by (ai, a−i) to highlight the action −i −i of player i in a. for all a ∈ A, P(a) ≥ 0, and P P(a) = 1. We re- n a∈A Let Mi : A → R be the payoff function of player i, i.e., fer to the set of all P that satisfy this linear system as the if players play joint action a ∈ A, then each player i in- correlated equilibria of G and denote it simply by CE, i.e., dividually receives a payoff value of Mi(a). Let M ≡ P ∈ CE if and only if P is a CE. {M1,...,Mn} be the set of payoff functions, one for each player.

Maximum Entropy Correlated Equilibria

Details

Download

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

Support