
Stat155 Game Theory

Lecture 9: von Neumann's Minimax Theorem

Peter Bartlett

September 22, 2016

Outline for today

- Nash equilibrium.
- Games as linear programs.
- von Neumann's minimax theorem.
- Low regret learning algorithms for playing games.

Recall: Saddle Point

Definition
A pair (i*, j*) ∈ {1, ..., m} × {1, ..., n} is a saddle point (or pure Nash
equilibrium) for a payoff matrix A ∈ R^{m×n} if

    \max_i a_{i j^*} = a_{i^* j^*} = \min_j a_{i^* j}.

- If Player I plays i* and Player II plays j*, neither player has an
  incentive to change.
- Think of these as locally optimal strategies for the players.
- We've seen that they are also globally optimal strategies.

Nash equilibrium

Definition
A pair (x*, y*) ∈ Δ_m × Δ_n is a Nash equilibrium for a payoff matrix
A ∈ R^{m×n} if

    \max_{x \in \Delta_m} x^\top A y^* = (x^*)^\top A y^* = \min_{y \in \Delta_n} (x^*)^\top A y.

- Just like a pure Nash equilibrium (saddle point), but with mixed strategies.
- If Player I plays x* and Player II plays y*, neither player has an
  incentive to change.
- x* is a best response to y*, and y* is a best response to x*.
- Think of these as locally optimal strategies for the players.
- We'll see that they are also globally optimal strategies.
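To make the saddle point condition concrete, here is a minimal sketch in
Python; the matrix A and the helper is_saddle_point are made up for
illustration and are not from the course materials.

```python
import numpy as np

# Hypothetical example matrix, chosen so that (i*, j*) = (1, 1) is a saddle
# point: a[1, 1] = 2 is the smallest entry in its row and the largest in its column.
A = np.array([[3.0, 1.0, 4.0],
              [5.0, 2.0, 6.0],
              [0.0, 1.0, 2.0]])

def is_saddle_point(A, i, j):
    """Check max_i' a[i', j] = a[i, j] = min_j' a[i, j']."""
    return A[:, j].max() == A[i, j] == A[i, :].min()

print(is_saddle_point(A, 1, 1))  # True
print(is_saddle_point(A, 0, 0))  # False: a[0, 0] = 3 is not column 0's maximum
```

The mixed (Nash equilibrium) condition is the same kind of check, with the
expected payoffs x^T A y* and (x*)^T A y in place of matrix entries.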

Nash equilibrium

Theorem
For x* ∈ Δ_m and y* ∈ Δ_n, the following are equivalent.

1. (x*, y*) is a Nash equilibrium.
2. x* and y* are optimal:

    \min_{y \in \Delta_n} (x^*)^\top A y = \max_{x \in \Delta_m} \min_{y \in \Delta_n} x^\top A y,
    \qquad
    \max_{x \in \Delta_m} x^\top A y^* = \min_{y \in \Delta_n} \max_{x \in \Delta_m} x^\top A y.

Proof
(1) ⇒ (2): The same as the proof for the optimality of a saddle point.
(2) ⇒ (1): The von Neumann minimax theorem implies that

    (x^*)^\top A y^* \ge \min_{y \in \Delta_n} (x^*)^\top A y
                      = \max_{x \in \Delta_m} \min_{y \in \Delta_n} x^\top A y
                      = \min_{y \in \Delta_n} \max_{x \in \Delta_m} x^\top A y
                      = \max_{x \in \Delta_m} x^\top A y^*
                      \ge (x^*)^\top A y^*,

so (x*, y*) is a Nash equilibrium.

Outline

- Nash equilibrium.
- Games as linear programs.
- von Neumann's minimax theorem.
- Low regret learning algorithms for playing games.

Linear Programs

A linear program is an optimization problem involving the choice of a real
vector to maximize a linear objective subject to linear constraints:

    \max_{x \in \mathbb{R}^n} b^\top x
    \quad \text{s.t.} \quad
    d_1^\top x \le c_1, \; d_2^\top x \le c_2, \; \ldots, \; d_k^\top x \le c_k.

Here, b ∈ R^n specifies the linear objective x ↦ b^\top x, and d_i ∈ R^n,
c_i ∈ R specify the ith constraint. The set of xs that satisfy the
constraints is a polytope (an intersection of half-spaces).

Linear Programs

From the perspective of the row player, a two-player zero-sum game is an
optimization problem of the form

    \max_{x \in \mathbb{R}^m} \min_{i \in \{1, \ldots, n\}} x^\top A e_i
    \quad \text{s.t.} \quad
    x_1 \ge 0, \; \ldots, \; x_m \ge 0, \; \mathbf{1}^\top x = 1.

This is not a linear program: the constraints are linear, but the objective
is not. But we can convert it to a linear program by introducing a slack
variable.

Linear Programs

Introducing the slack variable v turns the row player's problem into a
linear program:

    \max_{x \in \mathbb{R}^m,\, v \in \mathbb{R}} v
    \quad \text{s.t.} \quad
    v \le x^\top A e_1, \; \ldots, \; v \le x^\top A e_n, \;
    x_1 \ge 0, \; \ldots, \; x_m \ge 0, \; \mathbf{1}^\top x = 1.

By maximizing the lower bound v on all of the x^\top A e_i, we maximize the
minimum of the x^\top A e_i.

Linear Programs

For the column player, we can obtain a similar linear program:

    \min_{y \in \mathbb{R}^n,\, v \in \mathbb{R}} v
    \quad \text{s.t.} \quad
    v \ge e_1^\top A y, \; \ldots, \; v \ge e_m^\top A y, \;
    y_1 \ge 0, \; \ldots, \; y_n \ge 0, \; \mathbf{1}^\top y = 1.
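As a sanity check, the row player's linear program can be handed to an
off-the-shelf LP solver. Below is a sketch using scipy.optimize.linprog
(assuming SciPy is available; linprog minimizes, so the objective is
negated), with matching pennies as a made-up test case.

```python
import numpy as np
from scipy.optimize import linprog

def game_value_row(A):
    """Solve the row player's LP: max v s.t. v <= x^T A e_i for every
    column i, x >= 0, 1^T x = 1. Variables are z = (x_1, ..., x_m, v)."""
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                      # linprog minimizes, so minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])         # v - (A^T x)_i <= 0, one row per column i
    b_ub = np.zeros(n)
    A_eq = np.append(np.ones(m), 0.0).reshape(1, -1)  # 1^T x = 1 (v excluded)
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]         # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:m]

A = np.array([[1.0, -1.0], [-1.0, 1.0]])  # matching pennies
v_row, x_star = game_value_row(A)
print(v_row, x_star)             # approximately 0.0 and [0.5, 0.5]

# The column player's LP is the row player's LP for the game -A^T,
# up to a sign flip of the value:
v_col, y_star = game_value_row(-A.T)
print(-v_col, y_star)            # the same value, and the column strategy [0.5, 0.5]
```

That the two printed values coincide is exactly the strong duality discussed
on the next slide.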

Linear Programs: An Aside

- There are efficient algorithms for solving linear programs.
  ("Efficient" means run-time polynomial in the size of the problem.)
- The column player's linear program is the dual of the row player's linear
  program: for any concave maximization problem, like the row player's
  linear program (we'll call it the primal problem), it is possible to
  define a dual convex minimization problem, like the column player's
  linear program.
- This dual problem has a value that is at least as large as the value of
  the primal problem. In many important cases (such as our linear program),
  these values are the same. In optimization, this is called strong
  duality. Here, strong duality is von Neumann's minimax theorem.
- The principle of indifference is a general property of dual optimization
  problems (called complementary slackness).

Outline

- Nash equilibrium.
- Games as linear programs.
- von Neumann's minimax theorem.
- Low regret learning algorithms for playing games.

von Neumann's Minimax Theorem

Theorem
For any two-person zero-sum game with payoff matrix A ∈ R^{m×n},

    \max_{x \in \Delta_m} \min_{y \in \Delta_n} x^\top A y
    = \min_{y \in \Delta_n} \max_{x \in \Delta_m} x^\top A y.

We call the optimal expected payoff the value of the game, V.

- LHS: Player I plays x ∈ Δ_m first, then Player II responds with y ∈ Δ_n.
- RHS: Player II plays y ∈ Δ_n first, then Player I responds with x ∈ Δ_m.
- Notice that we should always prefer to play last:

      \max_{x \in \Delta_m} \min_{y \in \Delta_n} x^\top A y
      \le \min_{y \in \Delta_n} \max_{x \in \Delta_m} x^\top A y.

- The astonishing part is that it does not help (see the numerical
  illustration after the proof outline below).

Proof of von Neumann's Minimax Theorem

- There are many proofs; see the text for a proof based on the separating
  hyperplane theorem from convex analysis.
- We will consider playing the game repeatedly, with one player learning a
  good strategy (playing small improvements on its previous actions) and
  the other player giving a best response.
- This gives an explicit sequence of strategies for the two players that
  converges to their optimal strategies, and that proves the minimax
  theorem.
- The key property of the learner is that it has small regret.
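Before the regret-based proof, here is a small numerical illustration of the
gap the theorem closes, using matching pennies (a standard example of a game
with no saddle point); the numbers below are from this made-up example only.

```python
import numpy as np

# Matching pennies: a game with no saddle point.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])

# Pure strategies: committing first is strictly worse.
maxmin_pure = A.min(axis=1).max()  # max_i min_j a_ij: row player commits first
minmax_pure = A.max(axis=0).min()  # min_j max_i a_ij: column player commits first
print(maxmin_pure, minmax_pure)    # -1.0 1.0: playing last helps

# Mixed strategies: with x = y = (1/2, 1/2), both sides of the minimax
# identity equal the value of the game, V = 0.
x = y = np.array([0.5, 0.5])
print(x @ A @ y)                   # 0.0
```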

Proof of von Neumann's Minimax Theorem

Definition (Regret in a repeated game)
Consider a two-player zero-sum game that is repeated for T rounds. At each
round, the row player chooses an x_t ∈ Δ_m, then the column player chooses a
y_t ∈ Δ_n, and the row player receives a payoff of x_t^\top A y_t.
The row player's regret after T rounds is how much its total payoff falls
short of the best in retrospect that it could have achieved against the
column player's choices with a fixed mixed strategy:

    R_T = \underbrace{\max_{x \in \Delta_m} \sum_{t=1}^T x^\top A y_t}_{\text{best total payoff}}
        - \underbrace{\sum_{t=1}^T x_t^\top A y_t}_{\text{total payoff}}.

If the row player's regret R_T grows sublinearly (that is, R_T/T → 0), we
say that it has low regret.

Proof of von Neumann's Minimax Theorem

- We'll see that there are learning algorithms that have low regret against
  any sequence played by the column player.
- These learning algorithms don't need to know anything about the game in
  advance; they just need to see, after each round, the column vector of
  payoffs corresponding to the column player's choice.

Proof of von Neumann's Minimax Theorem

Theorem
The existence of a row player with low regret implies the minimax theorem.

Proof
Suppose the row player has low regret, and suppose that the column player
plays a best response y_t against the row player's choice x_t:

    x_t^\top A y_t = \min_{y \in \Delta_n} x_t^\top A y.

Define \bar{x} = \frac{1}{T} \sum_{t=1}^T x_t and
\bar{y} = \frac{1}{T} \sum_{t=1}^T y_t. To show that
max min x^\top A y ≥ min max x^\top A y:

    \max_{x \in \Delta_m} \min_{y \in \Delta_n} x^\top A y
      \ge \min_{y \in \Delta_n} \bar{x}^\top A y
      = \min_{y \in \Delta_n} \frac{1}{T} \sum_{t=1}^T x_t^\top A y
      \ge \frac{1}{T} \sum_{t=1}^T \min_{y \in \Delta_n} x_t^\top A y
      = \frac{1}{T} \sum_{t=1}^T x_t^\top A y_t
      \ge \max_{x \in \Delta_m} \frac{1}{T} \sum_{t=1}^T x^\top A y_t - \frac{R_T}{T}
      = \max_{x \in \Delta_m} x^\top A \bar{y} - \frac{R_T}{T}
      \ge \min_{y \in \Delta_n} \max_{x \in \Delta_m} x^\top A y - \frac{R_T}{T}.

As T → ∞, R_T/T → 0. Combined with the reverse inequality max min ≤ min max,
which always holds, this gives equality.
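The proof can be watched in action. The sketch below uses multiplicative
weights (Hedge), one standard low-regret learner, as a stand-in for the row
player; the lecture's own choice of learner, gradient ascent, comes later.
The column player best-responds each round, and the averaged strategies
x̄, ȳ approach the optimal strategies for matching pennies.

```python
import numpy as np

def minimax_via_hedge(A, T=5000):
    """Row player runs multiplicative weights (Hedge); column player
    best-responds. Returns the averaged strategies (x_bar, y_bar)."""
    m, n = A.shape
    eta = np.sqrt(np.log(m) / T)      # step size; gives O(sqrt(T log m)) regret
    w = np.ones(m)                    # weights over the row player's actions
    x_sum, y_sum = np.zeros(m), np.zeros(n)
    for _ in range(T):
        x = w / w.sum()               # x_t: normalized weights
        j = int(np.argmin(A.T @ x))   # best response: the column minimizing x^T A e_j
        x_sum += x
        y_sum[j] += 1.0
        w *= np.exp(eta * A[:, j])    # reward each row action by its payoff against y_t
        w /= w.sum()                  # renormalize to avoid overflow
    return x_sum / T, y_sum / T

A = np.array([[1.0, -1.0], [-1.0, 1.0]])  # matching pennies: value 0
x_bar, y_bar = minimax_via_hedge(A)
print(x_bar, y_bar)       # both approach [0.5, 0.5]
print(x_bar @ A @ y_bar)  # approaches the value of the game, 0
```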

Proof of von Neumann's Minimax Theorem

Theorem
The existence of a row player with low regret implies the minimax theorem.

- In the proof, the row player is a low regret learning algorithm, playing
  x_t, the column player plays a best response y_t, and we define
  \bar{x} = \frac{1}{T} \sum_{t=1}^T x_t and \bar{y} = \frac{1}{T} \sum_{t=1}^T y_t.
- The proof shows that \bar{x} and \bar{y} are asymptotically optimal, in
  the sense that the gain of \bar{x} and the loss of \bar{y} approach the
  value of the game.
- Next, we'll consider a specific low regret learning algorithm: gradient
  ascent.

Outline

- Nash equilibrium.
- Games as linear programs.
- von Neumann's minimax theorem.
- Low regret learning algorithms for playing games.
