Announcements
Ø Minbiao’s office hour will change to Thursday 1-2 pm, starting next week, at Rice Hall 442
CS6501: Topics in Learning and Game Theory (Fall 2019)
Introduction to Game Theory (II)
Instructor: Haifeng Xu

Outline
Ø Correlated and Coarse Correlated Equilibrium
Ø Zero-Sum Games
Ø GANs and Equilibrium Analysis
Recap: Normal-Form Games
Ø n players, denoted by set [n] = {1, ⋯ , n}
Ø Player i takes action a_i ∈ A_i
Ø An outcome is the action profile a = (a_1, ⋯ , a_n)
• As a convention, a_{-i} = (a_1, ⋯ , a_{i-1}, a_{i+1}, ⋯ , a_n) denotes all actions excluding a_i
Ø Player i receives payoff u_i(a) for any outcome a ∈ Π_i A_i
• u_i(a) = u_i(a_i, a_{-i}) depends on other players’ actions
Ø A_i, u_i for i ∈ [n] are public knowledge

A mixed strategy profile x* = (x_1*, ⋯ , x_n*) is a Nash equilibrium (NE) if for any i, x_i* is a best response to x_{-i}*.
NE Is Not the Only Solution Concept
Ø NE rests on two key assumptions:
1. Players move simultaneously (so they cannot see others’ strategies before moving)
2. Players take actions independently
Ø Last lecture: sequential moves lead to different player behavior
• The corresponding game is called a Stackelberg game, and its equilibrium is called a strong Stackelberg equilibrium
Today: we study what happens if players do not take actions independently but instead are “coordinated” by a central mediator
Ø This results in the study of correlated equilibrium
An Illustrative Example

The Traffic Light Game:

                B: STOP      B: GO
A: STOP        (-3, -2)     (-3, 0)
A: GO           (0, -2)     (-100, -100)

Well, we do not see many crashes in reality… Why?

Ø There is a mediator – the traffic light – that coordinates cars’ moves
Ø For example, recommend (GO, STOP) for (A, B) with probability 3/5 and (STOP, GO) for (A, B) with probability 2/5
• GO = green light, STOP = red light
• Following the recommendation is a best response for each player
• It turns out that this recommendation policy gives each player the same expected utility −6/5 and is thus “fair”

This is exactly how traffic lights are designed!

Correlated Equilibrium (CE)
Ø A (randomized) recommendation policy π assigns probability π(a) to each action profile a ∈ A = Π_i A_i
• A mediator first samples a ∼ π, then recommends a_i to player i privately
Ø Upon receiving recommendation a_i, player i’s expected utility is
(1/P_i) Σ_{a_{-i}} π(a_i, a_{-i}) ⋅ u_i(a_i, a_{-i})
• P_i is a normalization term that equals the probability that a_i is recommended

A recommendation policy π is a correlated equilibrium if
Σ_{a_{-i}} π(a_i, a_{-i}) ⋅ u_i(a_i, a_{-i}) ≥ Σ_{a_{-i}} π(a_i, a_{-i}) ⋅ u_i(a_i′, a_{-i}), ∀ a_i, a_i′ ∈ A_i, ∀ i ∈ [n].

Ø That is, any recommended action to any player is a best response
• CE makes incentive-compatible action recommendations
Ø π is assumed to be public knowledge, so every player can calculate her utility
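As a sanity check, the traffic-light policy from the example above can be verified to satisfy the CE condition numerically. A minimal sketch (the helper names are my own, not from the slides); actions are encoded as 0 = STOP, 1 = GO:

```python
# Payoff table for the traffic light game: (a_A, a_B) -> (u_A, u_B).
U = {
    (0, 0): (-3, -2), (0, 1): (-3, 0),
    (1, 0): (0, -2), (1, 1): (-100, -100),
}
# Recommendation policy: (GO, STOP) w.p. 3/5, (STOP, GO) w.p. 2/5.
pi = {(1, 0): 3 / 5, (0, 1): 2 / 5}

def is_correlated_eq(U, pi, tol=1e-9):
    """Check the CE condition: conditioned on any recommended action,
    following it beats every deviation."""
    for player in (0, 1):
        for rec in (0, 1):          # recommended action
            for dev in (0, 1):      # candidate deviation
                follow, deviate = 0.0, 0.0
                for prof, p in pi.items():
                    if prof[player] != rec:
                        continue
                    follow += p * U[prof][player]
                    swapped = (dev, prof[1]) if player == 0 else (prof[0], dev)
                    deviate += p * U[swapped][player]
                if deviate > follow + tol:
                    return False
    return True

# Unconditional expected utility of each player under pi.
exp_util = [sum(p * U[prof][i] for prof, p in pi.items()) for i in (0, 1)]
```

Running the check confirms both that the policy is a CE and that both players get expected utility −6/5.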
Basic Facts about Correlated Equilibrium

Fact. Any Nash equilibrium is also a correlated equilibrium.

Ø True by definition: a Nash equilibrium can be viewed as independent action recommendations
Ø As a corollary, a correlated equilibrium always exists

Fact. The set of correlated equilibria forms a convex set.

Ø In fact, the distributions π satisfy a set of linear constraints:
Σ_{a_{-i}} π(a_i, a_{-i}) ⋅ u_i(a_i, a_{-i}) ≥ Σ_{a_{-i}} π(a_i, a_{-i}) ⋅ u_i(a_i′, a_{-i}), ∀ a_i, a_i′ ∈ A_i, ∀ i ∈ [n].
Ø This is nice because it allows us to optimize over all CEs
Ø Not true for Nash equilibria
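Because the CE constraints are linear in π, optimizing a linear objective (e.g., social welfare) over all CEs is a linear program. A sketch using `scipy.optimize.linprog` on the traffic light game (SciPy is assumed available; all variable names are my own):

```python
import numpy as np
from scipy.optimize import linprog

# Traffic light game; row player A, column player B; action 0 = STOP, 1 = GO.
u1 = np.array([[-3.0, -3.0], [0.0, -100.0]])
u2 = np.array([[-2.0, 0.0], [-2.0, -100.0]])
n, m = u1.shape

def idx(i, j):
    return i * m + j          # flatten profile (i, j) into one LP variable

A_ub, b_ub = [], []
for i in range(n):            # A's incentive constraints: deviating never helps
    for i2 in range(n):
        if i2 == i:
            continue
        row = np.zeros(n * m)
        for j in range(m):
            row[idx(i, j)] = u1[i2, j] - u1[i, j]
        A_ub.append(row); b_ub.append(0.0)
for j in range(m):            # B's incentive constraints
    for j2 in range(m):
        if j2 == j:
            continue
        row = np.zeros(n * m)
        for i in range(n):
            row[idx(i, j)] = u2[i, j2] - u2[i, j]
        A_ub.append(row); b_ub.append(0.0)

c = -(u1 + u2).flatten()      # maximize social welfare = minimize its negation
res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub,
              A_eq=np.ones((1, n * m)), b_eq=[1.0],
              bounds=[(0, 1)] * (n * m))
welfare = -res.fun
pi_star = res.x.reshape(n, m)
```

For this game the welfare-maximizing CE achieves total utility −2 (putting all mass on (GO, STOP)).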
Coarse Correlated Equilibrium (CCE)
Ø A weaker notion than correlated equilibrium
Ø Also a recommendation policy π, but it only requires that no player has an incentive to opt out of the recommendations

A recommendation policy π is a coarse correlated equilibrium if
Σ_{a ∈ A} π(a) ⋅ u_i(a) ≥ Σ_{a ∈ A} π(a) ⋅ u_i(a_i′, a_{-i}), ∀ a_i′ ∈ A_i, ∀ i ∈ [n].

That is, for any player i, following π’s recommendations is better than opting out of the recommendation and “acting on his own”.

Compare to the correlated equilibrium condition:
Σ_{a_{-i}} π(a_i, a_{-i}) ⋅ u_i(a_i, a_{-i}) ≥ Σ_{a_{-i}} π(a_i, a_{-i}) ⋅ u_i(a_i′, a_{-i}), ∀ a_i, a_i′ ∈ A_i, ∀ i ∈ [n].
Fact. Any correlated equilibrium is a coarse correlated equilibrium.
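The CCE condition compares following the entire policy against committing in advance to one fixed action. Since every CE is a CCE, the traffic-light policy from earlier should also pass this weaker check; a minimal sketch (helper names are my own):

```python
# Traffic light game again: (a_A, a_B) -> (u_A, u_B); 0 = STOP, 1 = GO.
U = {
    (0, 0): (-3, -2), (0, 1): (-3, 0),
    (1, 0): (0, -2), (1, 1): (-100, -100),
}
pi = {(1, 0): 3 / 5, (0, 1): 2 / 5}

def is_coarse_correlated_eq(U, pi, tol=1e-9):
    """Check the CCE condition: following pi beats opting out and always
    playing any single fixed action."""
    for player in (0, 1):
        follow = sum(p * U[prof][player] for prof, p in pi.items())
        for dev in (0, 1):  # opt out and always play `dev`
            deviate = 0.0
            for prof, p in pi.items():
                swapped = (dev, prof[1]) if player == 0 else (prof[0], dev)
                deviate += p * U[swapped][player]
            if deviate > follow + tol:
                return False
    return True
```

Note the deviation here cannot depend on the recommendation, which is exactly why CCE is weaker than CE.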
The Equilibrium Hierarchy

Coarse Correlated Equilibrium (CCE)
  ⊇ Correlated Equilibrium (CE)
    ⊇ Nash Equilibrium (NE)

There are other equilibrium concepts, but NE and CE are the most often used; CCE is used less often.
Outline
Ø Correlated and Coarse Correlated Equilibrium
Ø Zero-Sum Games
Ø GANs and Equilibrium Analysis
Zero-Sum Games
Ø Two players: player 1 takes action i ∈ [n] = {1, ⋯ , n}, player 2 takes action j ∈ [m]
Ø The game is zero-sum if u_1(i, j) + u_2(i, j) = 0, ∀ i ∈ [n], j ∈ [m]
• Models strictly competitive scenarios
• “Zero-sum” almost always means “2-player zero-sum” games
• k-player games can also be zero-sum, but these are not particularly interesting
Ø Let u_k(x, y) = Σ_{i ∈ [n], j ∈ [m]} u_k(i, j) x_i y_j for any mixed strategies x ∈ Δ_n, y ∈ Δ_m
Ø (x*, y*) is a NE of the zero-sum game if: (1) u_1(x*, y*) ≥ u_1(i, y*) for any i ∈ [n]; (2) u_1(x*, y*) ≤ u_1(x*, j) for any j ∈ [m]
Ø Condition u_1(x*, y*) ≤ u_1(x*, j) ⟺ u_2(x*, y*) ≥ u_2(x*, j)
Ø We can thus “forget” u_2; instead, think of player 2 as minimizing player 1’s utility
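The mixed-strategy utility above is just the bilinear form x^T A y with A[i, j] = u_1(i, j). A quick numeric sketch using matching pennies (an assumed example, not from the slides):

```python
import numpy as np

# u1 for matching pennies: player 1 wins (+1) on a match, loses (-1) otherwise.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
x = np.array([0.5, 0.5])   # player 1's mixed strategy
y = np.array([0.5, 0.5])   # player 2's mixed strategy

# u_1(x, y) = sum_{i,j} u_1(i, j) * x_i * y_j, i.e., x^T A y.
u1_mixed = x @ A @ y
```

Here the uniform strategies give player 1 expected utility 0, which is also the value of this game.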
Maximin and Minimax Strategies
Ø The previous observations motivate the following definitions

Definition. x* ∈ Δ_n is a maximin strategy of player 1 if it solves
max_{x ∈ Δ_n} min_{j ∈ [m]} u_1(x, j).
The corresponding utility value is called the maximin value of the game.

Remark: x* is player 1’s best strategy if he were to move first
Definition. y* ∈ Δ_m is a minimax strategy of player 2 if it solves
min_{y ∈ Δ_m} max_{i ∈ [n]} u_1(i, y).
The corresponding utility value is called the minimax value of the game.

Remark: y* is player 2’s best strategy if he were to move first
Duality of Maximin and Minimax

Fact. max_{x ∈ Δ_n} min_{j ∈ [m]} u_1(x, j) ≤ min_{y ∈ Δ_m} max_{i ∈ [n]} u_1(i, y).
That is, moving first is no better.

Proof:
Ø Let y* = argmin_{y ∈ Δ_m} max_{i ∈ [n]} u_1(i, y), so
min_{y ∈ Δ_m} max_{i ∈ [n]} u_1(i, y) = max_{i ∈ [n]} u_1(i, y*)
Ø Then, since min_{j ∈ [m]} u_1(x, j) ≤ u_1(x, y*) for every x, we have
max_{x ∈ Δ_n} min_{j ∈ [m]} u_1(x, j) ≤ max_{x ∈ Δ_n} u_1(x, y*) = max_{i ∈ [n]} u_1(i, y*)
Theorem. max_{x ∈ Δ_n} min_{j ∈ [m]} u_1(x, j) = min_{y ∈ Δ_m} max_{i ∈ [n]} u_1(i, y).
Ø Maximin and minimax can both be formulated as linear programs:

Maximin:
max v
s.t. v ≤ Σ_{i ∈ [n]} u_1(i, j) x_i, ∀ j ∈ [m]
Σ_{i ∈ [n]} x_i = 1
x_i ≥ 0, ∀ i ∈ [n]

Minimax:
min v
s.t. v ≥ Σ_{j ∈ [m]} u_1(i, j) y_j, ∀ i ∈ [n]
Σ_{j ∈ [m]} y_j = 1
y_j ≥ 0, ∀ j ∈ [m]
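A sketch of solving the maximin LP with `scipy.optimize.linprog` (SciPy is assumed available; rock-paper-scissors is used as an illustrative game, not taken from the slides):

```python
import numpy as np
from scipy.optimize import linprog

# u1 for rock-paper-scissors (rows: player 1's action, columns: player 2's).
U = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])
n, m = U.shape

# LP variables: (x_1, ..., x_n, v). linprog minimizes, so use objective -v.
c = np.append(np.zeros(n), -1.0)
# Constraint v <= sum_i u1(i, j) x_i for each column j, i.e., -U^T x + v <= 0.
A_ub = np.hstack([-U.T, np.ones((m, 1))])
b_ub = np.zeros(m)
A_eq = np.append(np.ones(n), 0.0).reshape(1, -1)   # sum_i x_i = 1
b_eq = np.array([1.0])
bounds = [(0, None)] * n + [(None, None)]          # x >= 0, v free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
x_star, game_value = res.x[:n], -res.fun
```

For rock-paper-scissors the maximin strategy is uniform and the game value is 0.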
Ø These two LPs turn out to be a primal-dual pair; strong LP duality yields the equality

“Uniqueness” of Nash Equilibrium (NE)
Theorem. In 2-player zero-sum games, (x*, y*) is a NE if and only if x* and y* are maximin and minimax strategies, respectively.

⇐: if x* [y*] is a maximin [minimax] strategy, then (x*, y*) is a NE
Ø Want to prove u_1(x*, y*) ≥ u_1(i, y*), ∀ i ∈ [n]:
u_1(x*, y*) ≥ min_{j ∈ [m]} u_1(x*, j)
= max_{x ∈ Δ_n} min_{j ∈ [m]} u_1(x, j)   (x* is a maximin strategy)
= min_{y ∈ Δ_m} max_{i ∈ [n]} u_1(i, y)   (minimax theorem)
= max_{i ∈ [n]} u_1(i, y*)   (y* is a minimax strategy)
≥ u_1(i, y*), ∀ i ∈ [n]
Ø A similar argument shows u_1(x*, y*) ≤ u_1(x*, j), ∀ j ∈ [m]
Ø So (x*, y*) is a NE
⇒: if (x*, y*) is a NE, then x* [y*] is the maximin [minimax] strategy
Ø Observe the following chain:
u_1(x*, y*) = max_{i ∈ [n]} u_1(i, y*)   (x* is a best response to y*)
≥ min_{y ∈ Δ_m} max_{i ∈ [n]} u_1(i, y)
= max_{x ∈ Δ_n} min_{j ∈ [m]} u_1(x, j)   (minimax theorem)
≥ min_{j ∈ [m]} u_1(x*, j)
= u_1(x*, y*)   (y* is a best response to x*)
Ø So the two “≥” must both hold with equality
• The first equality implies y* is a minimax strategy
• The second equality implies x* is a maximin strategy
Corollary.
Ø The NE of any 2-player zero-sum game can be computed by LPs
Ø Players achieve the same utilities in any Nash equilibrium
• Player 1’s NE utility always equals the maximin (or minimax) value
• This utility is also called the game value
The Collapse of Equilibrium Concepts in Zero-Sum Games

Theorem. In a 2-player zero-sum game, a player achieves the same utility in any Nash equilibrium, any correlated equilibrium, any coarse correlated equilibrium, and any strong Stackelberg equilibrium.

Ø Can be proved using techniques similar to those for the previous theorem
Ø The problem of optimizing a player’s utility over equilibria is also easy to solve, since the equilibrium utility is always the same
Outline
Ø Correlated and Coarse Correlated Equilibrium
Ø Zero-Sum Games
Ø GANs and Equilibrium Analysis
Generative Modeling

Input: data points drawn from an (unknown) distribution D
Output: data points drawn from a distribution P

Goal: use data points from D to generate a P that is close to D
Applications

Celeb training data [Karras et al. 2017]: input images from the true distribution; generated new images, i.e., samples from P

A few other demos:
https://miro.medium.com/max/928/1*tUhgr3m54Qc80GU2BkaOiQ.gif
https://www.youtube.com/watch?v=PCBTZh41Ris&feature=youtu.be
http://ganpaint.io/demo/?project=church

GANs: Generative Adversarial Networks
Ø GAN is one particular generative model – a zero-sum game between a Generator and a Discriminator

Ø Generator G_θ: takes noise z ∼ N(0, 1) and outputs a sample x = G_θ(z)
Ø Discriminator D_w: takes a sample x and outputs a score D_w(x)

Ø Generator’s objective: select model parameters θ such that the distribution of G_θ(z), denoted P_θ, is close to D
Ø Discriminator’s objective: select model parameters w such that D_w(x) is large if x ∼ D and small if x ∼ P_θ
Ø The loss function as originally formulated in [Goodfellow et al.’14]:
• D_w(x) = probability of classifying x as “Real”
• The loss is the log-likelihood of the discriminator being correct:

U(θ, w) = E_{x∼D} log[D_w(x)] + E_{z∼N(0,1)} log[1 − D_w(G_θ(z))]

Ø The game: the Discriminator maximizes this loss function whereas the Generator minimizes it
• This results in the zero-sum game min_θ max_w U(θ, w)
• The Discriminator exists to improve the training of the Generator
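To make the objective concrete, here is a hedged toy instantiation: a 1-D Monte Carlo estimate of U(θ, w), with an assumed logistic discriminator D_w(x) = sigmoid(w·x), shift generator G_θ(z) = z + θ, and real data D = N(0, 1). These specific choices are illustrative, not from the slides:

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def gan_objective(theta, w, n_samples=10000, seed=0):
    """Monte Carlo estimate of
    E_{x~N(0,1)} log D_w(x) + E_{z~N(0,1)} log(1 - D_w(G_theta(z)))."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)   # real sample from D = N(0, 1)
        z = rng.gauss(0.0, 1.0)   # noise
        g = z + theta             # fake sample from G_theta
        total += math.log(sigmoid(w * x)) + math.log(1.0 - sigmoid(w * g))
    return total / n_samples
```

With w = 0 the discriminator outputs 1/2 everywhere, so U = 2·log(1/2) exactly; when the generator is far off (e.g., θ = 3), the discriminator can raise U above that by choosing a suitable w.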
Ø More generally, GAN is a large zero-sum game with intricate player payoffs
Ø The Generator strategy G_θ and Discriminator strategy D_w are typically deep neural networks, with parameters θ, w
Ø The Generator’s loss function has the following general form, where φ is an increasing concave function (e.g., φ(t) = log t, φ(t) = t, etc.):

E_{x∼D} φ(D_w(x)) + E_{z∼N(0,1)} φ(1 − D_w(G_θ(z)))

GAN research is mainly about modeling and solving this extremely large zero-sum game for various applications.
WGAN – A Popular Variant of GAN
Ø Drawbacks of the log-likelihood loss: unbounded at the boundary, unstable
Ø Wasserstein GAN is a popular variant using a different loss function
• I.e., substitute the log-likelihood with the likelihood itself:

E_{x∼D} D_w(x) − E_{z∼N(0,1)} D_w(G_θ(z))

• Training is typically more stable
Research Challenges in GANs

min_θ max_w E_{x∼D} φ(D_w(x)) + E_{z∼N(0,1)} φ(1 − D_w(G_θ(z)))

Ø What is the correct choice of loss function φ?
Ø What neural network structures for G_θ and D_w?
Ø Only pure strategies are allowed – an equilibrium may not exist, or may not be unique, due to the non-convexity of the strategy spaces and the loss function
Ø We do not know D exactly; we only have samples from it
Ø How to optimize the parameters θ, w?
Ø . . .

A Basic Question: even if we computed an equilibrium w.r.t. some loss function, does that really mean we generated a distribution close to D?
Ø Intuitively, if the discriminator network D_w is strong enough, we should be able to get close to D
Ø Next, we analyze the equilibrium of a stylized example
(Stylized) WGANs for Learning a Mean
Ø True data drawn from D = N(μ, 1)
Ø Generator: G_θ(z) = z + θ, where z ∼ N(0, 1)
Ø Discriminator: D_w(x) = wx

Remarks:
a) Both the Generator and Discriminator can be deep neural networks in general
b) We picked these particular forms for illustrative purposes and for convenience of theoretical analysis
Ø The WGAN objective then has the following closed form:

min_θ max_w E_{x∼D} [D_w(x)] − E_{z∼N(0,1)} [D_w(G_θ(z))]
⇒ min_θ max_w E_{x∼N(μ,1)} [wx] − E_{z∼N(0,1)} [w(z + θ)]
⇒ min_θ max_w wμ − wθ

Ø This minimax problem solves to θ* = μ: for any θ ≠ μ the adversary can drive the inner max to +∞, while at θ = μ it equals 0
Ø I.e., WGAN does precisely learn μ at equilibrium in this case
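A tiny numeric check of this conclusion (μ = 2 is an assumed value for illustration): grid-searching the closed-form objective w(μ − θ) recovers θ* = μ when w is restricted to a bounded grid, mirroring the fact that unbounded w makes the inner max infinite for any θ ≠ μ:

```python
mu = 2.0  # assumed true mean

def f(theta, w):
    # Closed-form WGAN objective from the derivation above.
    return w * (mu - theta)

def inner_max(theta, w_grid):
    # Discriminator's best response over a bounded grid of w values.
    return max(f(theta, w) for w in w_grid)

w_grid = [i / 10 for i in range(-50, 51)]   # w in [-5, 5]
thetas = [i / 10 for i in range(-10, 51)]   # candidate generator shifts in [-1, 5]
best_theta = min(thetas, key=lambda t: inner_max(t, w_grid))
```

The generator's best choice on the grid is exactly θ = μ = 2, where the adversary's best payoff drops to 0.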
See the paper “Generalization and Equilibrium in GANs” by Arora et al. (2017) for more analysis of the equilibria of GANs and whether they learn a good distribution at equilibrium.
Thank You
Haifeng Xu University of Virginia [email protected]