ANNA JAŚKIEWICZ

GAME THEORY

STATIC GAMES (Simultaneous Decision Games) LECTURE 1

What is game theory?

• A mathematical formalism for understanding, designing and predicting the outcomes of games.

• A game is characterized by a number of players (two or more), assumed to be intelligent and rational, that interact with each other by selecting various actions based on their assigned preferences.

• A player (decision maker) is characterized by a set of actions available to him and a preference relationship defined over each possible action tuple.

History of game theory

• First studies of games in the economic literature deal with models of oligopoly pricing and production: Cournot (1838), Bertrand (1883), Edgeworth (1925)

• The beginning of game theory: John von Neumann and Oskar Morgenstern, “Theory of Games and Economic Behaviour” (1944)

• Nash (1950) proposed what came to be known as the “Nash Equilibrium”

• Nobel Prize in Economics (1994): J. Harsanyi, J. Nash, R. Selten

Selten (1965) introduced his concept of subgame perfection. Harsanyi (1967-68) introduced the Bayesian Nash Equilibrium to model situations of incomplete information.

Example: Prisoners’ Dilemma

• Two crooks are being questioned by the police in connection with a serious crime. They are held in separate cells and cannot communicate. If both of them confess, they will each get a 4-year sentence; if neither confesses, they each spend 2 years in jail. If only one confesses, then the confessor will be set free and the other spends 5 years in jail.

• Game model:

                         PLAYER 2
                    Keep Quiet   Confess
PLAYER 1  Keep Quiet   -2,-2      -5,0
          Confess       0,-5      -4,-4

ANALYSIS LEADS TO A SOLUTION:

                         PLAYER 2
                    Keep Quiet   Confess
PLAYER 1  Keep Quiet   -2,-2      -5,0
          Confess       0,-5      -4,-4

For each player, Confess strictly dominates Keep Quiet, so the solution is (Confess, Confess) with payoffs (-4,-4).

This is a paradoxical result that shows the difference between non-interactive and interactive decision models. For example: paying taxes (each individual is better off if he does not pay taxes, but then there is no money to provide community services and everyone is worse off).

PARETO OPTIMALITY

A solution is said to be Pareto optimal (after the Italian economist Vilfredo Pareto) if no player’s payoff can be increased without decreasing the payoff to another player. Such solutions are also called Pareto/socially efficient or just efficient.

                         PLAYER 2
                    Keep Quiet   Confess
PLAYER 1  Keep Quiet   -2,-2      -5,0
          Confess       0,-5      -4,-4

Here, the NE (-4,-4) is not Pareto optimal: both players would be better off at (-2,-2).

MODEL OF A STATIC GAME

1. The set of players indexed by i ∈{1,...,n}.

2. Si - a pure strategy (or action) set for each player.

3. ri : S1 × ... × Sn → ℝ - a payoff function for Player i.

A tabular description of a game using pure strategies is called the normal form or strategic form of a game.

DEFINITION: A mixed strategy σi for Player i gives the probabilities with which each action/strategy s ∈ Si will be played.

Clearly, σi is an element of P(Si), the set of probability measures on the set Si.

The payoff for Player i when Player 1 uses a strategy σ1 and Player 2 uses a strategy σ2 is:

ri(σ1, σ2) = ∑_{s1∈S1} ∑_{s2∈S2} p(s1) q(s2) ri(s1, s2),

where p(s1) is the probability of using a pure strategy s1 and q(s2) is the probability of using a pure strategy s2.

EXAMPLE:

Let σ1 = (1/3, 1/3, 1/3) and σ2 = (0, 1/2, 1/2). Calculate r1(σ1, σ2) for the game

     L     M     R
U   4,3   5,1   6,2
C   2,1   8,4   3,6
D   3,0   9,6   2,8

r1(σ1, σ2) = (1/3)(1/2)·5 + (1/3)(1/2)·6 + (1/3)(1/2)·8 + (1/3)(1/2)·3 + (1/3)(1/2)·9 + (1/3)(1/2)·2 = (1/6)(5+6+8+3+9+2) = 33/6 = 11/2
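The same computation can be done mechanically with matrix algebra: r1(σ1, σ2) = σ1ᵀ R1 σ2, where R1 is Player 1's payoff matrix. Below is a minimal sketch in Python (our construction, not from the lecture; numpy assumed) illustrating this for the example above:

```python
import numpy as np

# Player 1's payoffs from the 3x3 example (rows U, C, D; columns L, M, R).
R1 = np.array([[4.0, 5.0, 6.0],
               [2.0, 8.0, 3.0],
               [3.0, 9.0, 2.0]])

sigma1 = np.array([1/3, 1/3, 1/3])   # mixed strategy of Player 1
sigma2 = np.array([0.0, 1/2, 1/2])   # mixed strategy of Player 2

# r1(sigma1, sigma2) = sum_i sum_j p(s1_i) q(s2_j) r1(s1_i, s2_j)
payoff = sigma1 @ R1 @ sigma2
print(payoff)  # 5.5 = 11/2, matching the hand computation
```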

WEAK AND STRICT DOMINANCE

The strategy σ1 is weakly dominated by σ1' if for every σ2 ∈ P(S2) we have r1(σ1', σ2) ≥ r1(σ1, σ2), and there exists a strategy σ̂2 ∈ P(S2) such that r1(σ1', σ̂2) > r1(σ1, σ̂2).

The strategy σ1 is strictly dominated by σ1' if for every σ2 ∈ P(S2) we have r1(σ1', σ2) > r1(σ1, σ2).

Assumption: Common Knowledge of Rationality:

• Players are rational and all know that the others are rational.
• Players all know that the other players know that they are rational, etc.

ITERATED DOMINANCE ALGORITHM

     L    M    R
U   1,0  1,2  0,1
D   0,3  0,1  2,0

SOLUTION - Analysis:
• Player 1 has no dominated strategies.
• For Player 2, R is dominated by M; eliminate R.
• Now D is dominated by U; eliminate D.
• Clearly, Player 2 will play column M.

The iterated dominance solution is therefore (U, M); a sketch of the elimination algorithm follows below.
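A minimal sketch of iterated elimination of strictly dominated pure strategies (comparisons against pure strategies only, which suffices for the examples here; all names are our own):

```python
import numpy as np

def iterated_dominance(R1, R2):
    """Iteratively remove pure strategies strictly dominated by
    another pure strategy. R1, R2: payoff matrices of Players 1 and 2."""
    rows = list(range(R1.shape[0]))
    cols = list(range(R1.shape[1]))
    changed = True
    while changed:
        changed = False
        # Player 1: row i is dominated by row k if r1(k,j) > r1(i,j) for all remaining j.
        for i in rows[:]:
            if any(all(R1[k, j] > R1[i, j] for j in cols) for k in rows if k != i):
                rows.remove(i); changed = True
        # Player 2: column j is dominated by column k if r2(i,k) > r2(i,j) for all remaining i.
        for j in cols[:]:
            if any(all(R2[i, k] > R2[i, j] for i in rows) for k in cols if k != j):
                cols.remove(j); changed = True
    return rows, cols

R1 = np.array([[1, 1, 0], [0, 0, 2]])  # Player 1's payoffs (rows U, D)
R2 = np.array([[0, 2, 1], [3, 1, 0]])  # Player 2's payoffs (columns L, M, R)
print(iterated_dominance(R1, R2))      # ([0], [1]): the pair (U, M) survives
```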

Remarks:

• There is a problem with the iterated dominance algorithm: the solution may depend on the order in which (weakly) dominated strategies are eliminated, e.g.,

     L    M    R
U   1,3  4,2  2,2
C   4,0  0,3  4,1
D   2,5  3,4  5,6

Order 1: eliminate D first, then … is the solution. Order 2: eliminate R first, then four strategy pairs are left.

• Very few games are solvable by the iterated dominance algorithm. NASH EQUILIBRIUM

A Nash Equilibrium for a two-player game is a pair of strategies (σ1*, σ2*) ∈ P(S1) × P(S2) such that

r1(σ1*, σ2*) ≥ r1(σ1, σ2*) for every σ1 ∈ P(S1), and

r2(σ1*, σ2*) ≥ r2(σ1*, σ2) for every σ2 ∈ P(S2).

In other words, neither player has a unilateral incentive to change its strategy.

Note: Strictly dominated strategies cannot form a NE. NASH EQUILIBRIUM PROPERTIES

• Existence

- Pure strategy Nash equilibrium may not exist

• Uniqueness

- Nash equilibrium need not be unique

• Efficiency

- Pareto Optimality? Not always, see Prisoners’ Dilemma.

How to get to the Nash equilibrium? EXAMPLES:

• A pure strategy Nash Equilibrium can be found by the arrow diagram:

     L    M    R
U   1,3  4,2  2,2
C   4,0  0,3  4,1
D   2,5  3,4  5,6

The arrows lead to the unique pure strategy NE (D,R) with payoffs (5,6).
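A brute-force check of all pure strategy pairs (a sketch, our construction) confirms this:

```python
import numpy as np

def pure_nash(R1, R2):
    """Return all pure strategy Nash Equilibria (i, j) of a bimatrix game."""
    eqs = []
    for i in range(R1.shape[0]):
        for j in range(R1.shape[1]):
            # (i, j) is a NE if neither player gains by a unilateral deviation.
            if R1[i, j] >= R1[:, j].max() and R2[i, j] >= R2[i, :].max():
                eqs.append((i, j))
    return eqs

R1 = np.array([[1, 4, 2], [4, 0, 4], [2, 3, 5]])
R2 = np.array([[3, 2, 2], [0, 3, 1], [5, 4, 6]])
print(pure_nash(R1, R2))  # [(2, 2)], i.e. (D, R)
```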

Is this pair of strategies Pareto efficient? YES!

• Matching Pennies:
- two players simultaneously announce heads or tails;
- if their choices match, Player 1 wins; otherwise Player 2 wins.

     H      T
H   1,-1   -1,1
T   -1,1    1,-1

Nash Equilibrium: each player has to randomise his choice with probability 1/2.

• Battle of the Sexes:
- two players, wife and husband, want to spend the evening together;
- they have their own preferences: the wife (Player 2) wants to watch a classical ballet and her husband (Player 1) wants to watch a football match.

     B    F
B   1,2  0,0
F   0,0  2,1

Nash Equilibria: (B,B), (F,F), and a mixed equilibrium (σ1*, σ2*), where σ1* = (1/3, 2/3) and σ2* = (2/3, 1/3).

• Examples

Game 1:               Game 2:
      L      R              L      R
U    2,3   -2,7       U    2,3   -2,7
D   6,-5   0,-1       D   6,-5    3,5

- Nash equilibrium?
- Pareto optimality?

STATIC GAMES (Simultaneous Decision Games) LECTURE 2

How to obtain Nash Equilibria?

Mathematics

Let us consider a two-player game with payoff matrix A for Player 1 and payoff matrix B for Player 2. Recall that a strategy pair (x*, y*) is called a Nash Equilibrium if

x* A (y*)ᵗ ≥ x A (y*)ᵗ for all mixed strategies x, and
x* B (y*)ᵗ ≥ x* B yᵗ for all mixed strategies y.

THEOREM (NASH, 1950):

Every finite two-player game has at least one Nash equilibrium point.

This result is equivalent to the following theorem.

Brouwer’s Fixed Point Theorem: Suppose that S is a non-empty, compact, convex set in ℝⁿ. If the function f : S → S is continuous, then a fixed point s exists satisfying

s = f(s).

Proof of the Nash Theorem:

• For a pair of mixed strategies (x, y) define:

ci = max{Ai• yᵗ − x A yᵗ, 0},   dj = max{x B•j − x B yᵗ, 0},

and

xi’ = (xi + ci) / (1 + ∑k ck),   yj’ = (yj + dj) / (1 + ∑k dk).

• Clearly, T(x,y)=(x’,y’) is continuous.

• We prove that (x, y) = (x’, y’) iff (x, y) is a NE.

• ⇐ Indeed, if (x, y) is a NE, then for all i we have x A yᵗ ≥ Ai• yᵗ. Hence, ci = 0. Similarly, dj = 0, and so x = x’, y = y’.

• ⇒ Assume that (x, y) is not a NE. Then there exists x̄ such that x̄ A yᵗ > x A yᵗ (or ȳ such that x B ȳᵗ > x B yᵗ). We shall only consider the first case, since the second one is analogous. Note that there is an index i such that Ai• yᵗ > x A yᵗ. Consequently, ci > 0 and ∑k ck > 0. The expression x A yᵗ is also a weighted average of the Ai• yᵗ (with the xi as the weights). Thus, Ai• yᵗ ≤ x A yᵗ for some i such that xi > 0. For this i, ci = 0 and therefore

xi’ = xi / (1 + ∑k ck) < xi.

Hence, x ≠ x’. In the second case, y ≠ y’. Hence, (x, y) = (x’, y’) if and only if the pair (x, y) is a NE.

• The set of all pairs of strategies is bounded, closed and convex, and therefore satisfies the assumptions of Brouwer’s Fixed Point Theorem. Since T(x, y) = (x’, y’) is continuous, it must possess a fixed point. This fixed point is a Nash Equilibrium pair. ∎

Remark: The proof of the theorem can be easily generalized to n-player games.
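The map T from the proof is easy to experiment with numerically. Below is a small sketch (our own construction) that applies T to a bimatrix game and checks that a known equilibrium, such as ((1/2,1/2), (1/2,1/2)) in Matching Pennies, is a fixed point:

```python
import numpy as np

def nash_map(x, y, A, B):
    """One application of the map T from the proof of Nash's theorem."""
    c = np.maximum(A @ y - x @ A @ y, 0.0)   # gains from pure deviations of Player 1
    d = np.maximum(x @ B - x @ B @ y, 0.0)   # gains from pure deviations of Player 2
    return (x + c) / (1 + c.sum()), (y + d) / (1 + d.sum())

A = np.array([[1.0, -1.0], [-1.0, 1.0]])     # Matching Pennies, Player 1
B = -A                                        # zero-sum: Player 2's payoffs

x = y = np.array([0.5, 0.5])
print(nash_map(x, y, A, B))   # unchanged: the mixed NE is a fixed point of T

x0, y0 = np.array([1.0, 0.0]), np.array([1.0, 0.0])
print(nash_map(x0, y0, A, B)) # a non-equilibrium pair is moved by T
```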

BEST RESPONSE STRATEGY:

A strategy σ̂1 is a best response to some fixed strategy σ2 of Player 2 if

σ̂1 ∈ argmax_{σ1∈P(S1)} r1(σ1, σ2).

Similarly, σ̂2 is a best response to some σ1 if

σ̂2 ∈ argmax_{σ2∈P(S2)} r2(σ1, σ2).

An equivalent definition of NE is the following:

A pair (σ1*, σ2*) is a Nash Equilibrium if

σ1* ∈ argmax_{σ1∈P(S1)} r1(σ1, σ2*), and

σ2* ∈ argmax_{σ2∈P(S2)} r2(σ1*, σ2).

We use this definition for drawing the so-called best response diagram. Example: Matching Pennies

Let σ1 = (a, 1-a) for Player 1 (rows) and σ2 = (b, 1-b) for Player 2 (columns):

          H (b)   T (1-b)
H (a)      1,-1    -1,1
T (1-a)   -1,1      1,-1

r1 (σ1,σ 2 ) = ab − a(1− b) − b(1− a) + (1− a)(1− b) = (1 − 2b)(1 − 2a)

- If b < 1/2, then the best response is to choose a = 0.
- If b > 1/2, then the best response is to choose a = 1.
- If b = 1/2, then any strategy a is a best response.

Similarly, r2(σ1, σ2) = −ab + a(1−b) + b(1−a) − (1−a)(1−b) = (2b−1)(1−2a).

- If a < 1/2, then the best response is to choose b = 1.
- If a > 1/2, then the best response is to choose b = 0.
- If a = 1/2, then any strategy b is a best response.

These best responses are illustrated numerically below.
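A minimal sketch (our construction) that evaluates these closed-form best responses and confirms that they intersect only at a = b = 1/2:

```python
def best_response_a(b):
    """Player 1's best response(s) in Matching Pennies: r1 = (1-2a)(1-2b)."""
    if b < 0.5:
        return [0.0]
    if b > 0.5:
        return [1.0]
    return ["any a in [0,1]"]

def best_response_b(a):
    """Player 2's best response(s): r2 = (2b-1)(1-2a)."""
    if a < 0.5:
        return [1.0]
    if a > 0.5:
        return [0.0]
    return ["any b in [0,1]"]

for a in (0.25, 0.5, 0.75):
    print(a, best_response_b(a))
# Only at a = b = 1/2 is each strategy a best response to the other:
print(best_response_a(0.5), best_response_b(0.5))
```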

[Figure: the best response diagram; the two best response curves intersect at the unique Nash Equilibrium (a, b) = (1/2, 1/2).]

THE EQUALITY OF PAYOFFS:

The support of a strategy σ is the set Sσ ⊆ S of pure strategies s for which σ specifies p(s) > 0.

Theorem: Let the pair (σ1*, σ2*) be a Nash Equilibrium and let S1* be the support of σ1*. Then

r1(σ1*, σ2*) = r1(s1, σ2*) for every s1 ∈ S1*.

Proof:
• If S1* contains only one strategy, then the theorem is trivially true.
• Suppose that S1* contains more than one strategy. Then either at least one strategy in S1* gives a higher payoff for Player 1 than r1(σ1*, σ2*): case (i); or every s ∈ S1* gives a payoff that is not greater than r1(σ1*, σ2*): case (ii).

Case (i): Let s' ∈ S1* be a strategy that gives the greatest such payoff. Then

r1(σ1*, σ2*) = ∑_{s∈S1*} p(s) r1(s, σ2*) ≤ ∑_{s∈S1*} p(s) r1(s', σ2*) = r1(s', σ2*),

and by the choice of s' this inequality is in fact strict: r1(σ1*, σ2*) < r1(s', σ2*).

But this contradicts the fact that (σ1*, σ2*) is a Nash Equilibrium!

Case (ii): Assume that there exists s' ∈ S1* such that r1(s', σ2*) < r1(σ1*, σ2*), while every other s ∈ S1* satisfies r1(s, σ2*) ≤ r1(σ1*, σ2*). Multiplying these inequalities by p(s') and p(s) respectively, and summing over the support, we obtain

r1(σ1*, σ2*) < r1(σ1*, σ2*),

a contradiction. ∎

Clearly, the corresponding result also holds for Player 2. EXAMPLES: • Matching Pennies:

          H (b)   T (1-b)
H (a)      1,-1    -1,1
T (1-a)   -1,1      1,-1

with σ1 = (a, 1-a) and σ2 = (b, 1-b). By the equality of payoffs,

r1(H, σ2*) = r1(T, σ2*) ⇔ b·r1(H,H) + (1−b)·r1(H,T) = b·r1(T,H) + (1−b)·r1(T,T),

i.e. b + (1−b)(−1) = −b + (1−b) ⇒ b = 1/2. The expected payoff equals 0.
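The indifference condition is a linear equation, so the mixed equilibrium of any 2×2 game can be solved mechanically. A sketch (our construction), reused for Battle of the Sexes below:

```python
import numpy as np

def mixed_nash_2x2(R1, R2):
    """Solve the equality-of-payoffs conditions of a 2x2 bimatrix game.
    Returns (a, b): probabilities of the first row and first column."""
    # Player 2 mixes (b, 1-b) so that Player 1 is indifferent between rows:
    #   b R1[0,0] + (1-b) R1[0,1] = b R1[1,0] + (1-b) R1[1,1]
    b = (R1[1, 1] - R1[0, 1]) / (R1[0, 0] - R1[0, 1] - R1[1, 0] + R1[1, 1])
    # Player 1 mixes (a, 1-a) so that Player 2 is indifferent between columns.
    a = (R2[1, 1] - R2[1, 0]) / (R2[0, 0] - R2[0, 1] - R2[1, 0] + R2[1, 1])
    return a, b

MP1 = np.array([[1., -1.], [-1., 1.]])
print(mixed_nash_2x2(MP1, -MP1))       # (0.5, 0.5): Matching Pennies

BoS1 = np.array([[1., 0.], [0., 2.]])
BoS2 = np.array([[2., 0.], [0., 1.]])
print(mixed_nash_2x2(BoS1, BoS2))      # (1/3, 2/3): Battle of the Sexes
```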

• Battle of the Sexes:

          B (b)   F (1-b)
B (a)      1,2     0,0
F (1-a)    0,0     2,1

with σ1 = (a, 1-a) and σ2 = (b, 1-b).

- The equality of payoffs:

2a + 0(1−a) = 0a + 1(1−a) ⇒ a = 1/3 ⇒ σ1 = (1/3, 2/3),
1b + 0(1−b) = 0b + 2(1−b) ⇒ b = 2/3 ⇒ σ2 = (2/3, 1/3).

- The best response diagram: for fixed b, r1(σ1, σ2) = ab + 2(1−a)(1−b) = a(3b − 2) − 2b + 2.
If b < 2/3, then the best response is a = 0.
If b > 2/3, then the best response is a = 1.
If b = 2/3, then any a ∈ [0,1] is a best response.

Similarly, for fixed a, r2(σ1, σ2) = 2ab + (1−a)(1−b) = b(3a − 1) − a + 1.
If a < 1/3, then the best response is b = 0.
If a > 1/3, then the best response is b = 1.
If a = 1/3, then any b ∈ [0,1] is a best response.

[Figure: the best response diagram for Battle of the Sexes; the Nash Equilibria are (B,B), (F,F) and the mixed pair ⟨(1/3 B, 2/3 F), (2/3 B, 1/3 F)⟩.]

The expected payoffs for the mixed equilibrium equal 2/3 for both players.

NASH EQUILIBRIA FOR 2 × n AND m × 2 GAMES

PLAYER 1 (rows) faces PLAYER 2 playing (q, 1-q); the right-hand column lists Player 1's expected payoffs:

              q      1-q
Row 1:     -1,3     5,-2      -q + 5(1-q) = 5 - 6q
Row 2:      2,1     4,5        2q + 4(1-q) = 4 - 2q
Row 3:     4,-2    -3,6        4q - 3(1-q) = -3 + 7q
Row 4:     5,10    -4,-4       5q - 4(1-q) = -4 + 9q

Graph of these four lines:

[Figure: graph of the four payoff lines against q ∈ [0,1] (vertical axis from -5 to 5); the corners of the upper envelope, where the payoffs are equalised, are marked (I) and (II).]

Point (I) ↔ two-player game (rows 1 and 2):

-1,3   5,-2
 2,1   4,5

σ2* = (1/4, 3/4) and σ1* = (4/9, 5/9, 0, 0).

Point (II) ↔ two-player game (rows 2 and 4):

 2,1    4,5
 5,10  -4,-4

σ2* = (8/11, 3/11) and σ1* = (0, 7/9, 0, 2/9).
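A sketch (our construction) that locates the corners of Player 1's upper envelope and recovers the two candidate points:

```python
import numpy as np
from itertools import combinations

# Player 1's expected payoff against q for each row: (intercept, slope).
lines = [(5, -6), (4, -2), (-3, 7), (-4, 9)]

def envelope(q):
    return max(c + s * q for c, s in lines)

corners = []
for (i, (c1, s1)), (j, (c2, s2)) in combinations(enumerate(lines), 2):
    if s1 == s2:
        continue
    q = (c2 - c1) / (s1 - s2)              # intersection of the two lines
    v = c1 + s1 * q
    # keep intersections that lie on the upper envelope with q in (0,1)
    if 0 < q < 1 and np.isclose(v, envelope(q)):
        corners.append((i + 1, j + 1, q))
print(corners)   # [(1, 2, 0.25), (2, 4, 0.7272...)]: points (I) and (II)
```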

NASH SOLUTION THEOREM

In an m × 2 [2 × n] game, at least one of the intersection points of the edges of the upper envelope of the row [column] player’s expected payoff graph yields a Nash Equilibrium.

EQUILIBRIA FOR n-PLAYER GAMES

Strategy profile: σ = (σ1, …, σn). Notation: (σ̂i, σ−i) = (σ1, …, σi−1, σ̂i, σi+1, …, σn).

Definition:

A strategy profile σ* = (σ1*, …, σn*) is a Nash Equilibrium if

ri(σi*, σ−i*) ≥ ri(σi, σ−i*)

for all σi ∈ P(Si) and i = 1, ..., n.

STATIC GAMES (Simultaneous Decision Games) LECTURE 3

CLASSIFICATION OF GAMES

GENERIC NON-GENERIC

A generic game is one in which a small change to any one of the payoffs does not introduce new Nash equilibria or remove existing ones. Most of the games we have considered so far are generic:

• I Prisoners’ Dilemma • II Matching Pennies • III Battle of the Sexes

Oddness Theorem A generic game has an odd number of Nash Equilibria.

I - one pure strategy Nash Equilibrium II - one mixed strategy Nash Equilibrium III - one mixed strategy and two pure strategy Nash Equilibria Non-generic game:

          L (b)   M (1-b)   R
U (a)     10,0     5,1     4,-2
D (1-a)   10,1     5,0     1,-1

R is strictly dominated by M. Against L, Player 1 gets the same payoff from U and D; similarly, he gets the same payoff from U and D against M.

Non-generic games may have infinitely many Nash equilibria.

Indeed, let σ1 = (a, 1-a) and σ2 = (b, 1-b, 0). We shall draw the best response diagram.

r1(σ1, σ2) = 10ab + 5a(1−b) + 10b(1−a) + 5(1−a)(1−b) = 5 + 5b

The best response strategy for Player 1 to a fixed strategy σ 2 of Player 2 is to play (a,1-a) with any a∈[0,1].

r2 (σ1,σ 2 ) = a(1− b) + b(1− a) = a + b(1− 2a)

If a<1/2, then Player 2 should play b=1. If a>1/2, then Player 2 should play b=0. If a=1/2, then Player 2 can play any b∈[0,1]. The best response diagram for non-generic game:

[Figure: the best response diagram; the Nash Equilibria form a continuum (the blue line).]

CLASSIFICATION OF GAMES

ZERO-SUM NON-ZERO SUM

A zero-sum game is a game, in which the payoffs add up to zero, for example Matching Pennies game. The interests of the players are exactly opposed: one only wins what the other loses.

Zero-sum games were the first games to be studied formally (the concept of Nash Equilibrium did not exist yet). Such games were solved by John von Neumann (the Minimax Theorem).

Player 1 - the MAXIMISER; Player 2 - the MINIMISER.

r = r1 = −r2

v_* is the lower value of the game: it is the amount that Player 1 can ensure for himself,

v_* = sup_{σ1} inf_{σ2} r(σ1, σ2).

v^* is the upper value of the game: it is the amount that Player 2 can guarantee for himself,

v^* = inf_{σ2} sup_{σ1} r(σ1, σ2).

Lemma: v^* ≥ v_*.

Proof:

• inf_{σ2} r(σ1, σ2) ≤ r(σ1, σ2) for all σ1 and σ2,

• inf_{σ2} r(σ1, σ2) ≤ sup_{σ1} r(σ1, σ2) for all σ1 and σ2,

• sup_{σ1} inf_{σ2} r(σ1, σ2) ≤ sup_{σ1} r(σ1, σ2) for all σ2,

• v_* = sup_{σ1} inf_{σ2} r(σ1, σ2) ≤ inf_{σ2} sup_{σ1} r(σ1, σ2) = v^*. ∎

If v_* = v^*, then we say that the game has a value v = v_* = v^*. The following result says that every finite zero-sum game has a value.

THE MINIMAX THEOREM (John von Neumann, 1928)

In every finite zero-sum game there exists a pair of strategies (σ1*, σ2*) ∈ P(S1) × P(S2) such that

r(σ1, σ2*) ≤ r(σ1*, σ2*) ≤ r(σ1*, σ2) for all σ1 and σ2.

Note that v = r(σ1*, σ2*) is the value of the game. Indeed, the inequality in the theorem means that

sup_{σ1} r(σ1, σ2*) ≤ inf_{σ2} r(σ1*, σ2),

and therefore

v^* = inf_{σ2} sup_{σ1} r(σ1, σ2) ≤ sup_{σ1} inf_{σ2} r(σ1, σ2) = v_*.

On the other hand, from the lemma: v^* ≥ v_*. Hence, it must hold that v^* = v_*. Clearly,

v = r(σ1*, σ2*) = v^* = v_*.

σ1* is an optimal (maxmin) strategy for Player 1, which ensures him an expected gain of at least v.

σ2* is an optimal (minmax) strategy for Player 2, which ensures him an expected loss of at most v.

The pair (σ1*, σ2*) is called A SADDLE POINT.

Question: Let v be the value of the game and let σ̂1 and σ̂2 be strategies such that

v = r(σ̂1, σ̂2).

Do the strategies σ̂1 and σ̂2 have to be optimal? Answer: No! For example, consider the Matching Pennies game:

     H      T
H   1,-1   -1,1        σ̂1 = (1/2, 1/2)
T   -1,1    1,-1       σ̂2 = (1, 0)

The value of the game is v = 0 and r(σ̂1, σ̂2) = 0.

But σ̂2 is not an optimal strategy, because it does not ensure Player 2 an expected loss of 0.

Remarks:
• The problem of the existence of a value of the game and a saddle point is a problem of separation of convex sets, and its proof is based upon the Separating Hyperplane Theorem. It is not a problem of a fixed point of a certain correspondence, as in the case of non-zero-sum games.

• In zero-sum games, we call a pair of strategies (σ1*, σ2*) optimal, because the interests of the players are exactly opposite and there is only one value. In non-zero-sum games this terminology does not make sense! We may have many Nash Equilibria that lead to different expected payoffs. Which equilibrium is the best?

Theorem: A generic zero-sum game has a unique solution, i.e., there exists only one pair of optimal strategies.

SOLVING ZERO-SUM GAMES BY LINEAR PROGRAMMING

A standard form of a Linear Program:

max  b1 y1 + ... + bn yn

subject to
a11 y1 + ... + a1n yn ≤ c1
⋮
am1 y1 + ... + amn yn ≤ cm

and yj ≥ 0 for j = 1, ..., n.

The objective function is linear and the constraints are also linear.

Let A = [aij] (an m × n matrix) be the payoff matrix for Player 1. He wants to choose (p1, …, pm) that maximizes

min_{1≤j≤n} ∑_{i=1}^m pi aij

subject to the constraints p1 + ... + pm = 1, pi ≥ 0 for i = 1, ..., m. Although the constraints are linear, the objective function

min_{1≤j≤n} ∑_{i=1}^m pi aij

is not linear! We add a new variable v to Player 1’s list of variables, restrict it to be less than the objective function, v ≤ min_{1≤j≤n} ∑_{i=1}^m pi aij, and make v as large as possible. Hence, we have:

(I)  max v

subject to  v ≤ ∑_{i=1}^m pi ai1, …, v ≤ ∑_{i=1}^m pi ain,

p1 + ... + pm = 1,  pi ≥ 0

for i = 1, ..., m. In a similar way, one may look at the problem from the minimiser’s point of view and arrive at the following Linear Program: choose w and a vector (q1, ..., qn) that satisfy

(II)  min w

subject to  w ≥ ∑_{j=1}^n a1j qj, …, w ≥ ∑_{j=1}^n amj qj,

q1 + ... + qn = 1,  qj ≥ 0

for j = 1, ..., n. Problems (I) and (II) can be solved by the SIMPLEX ALGORITHM (see MATLAB, MATHEMATICA). The simplex algorithm was invented by G. Dantzig, although some Linear Programs were solved earlier by economists, e.g., T. Koopmans.

In Linear Programming there is a theory of duality which says that programs (I) and (II) are dual. The so-called Duality Theorem implies that dual programs have the same value, i.e., v = w. This is exactly the Minimax Theorem. A worked numeric example follows below.
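A minimal sketch (our construction) of program (I) solved with scipy's linprog. scipy minimizes, so we maximize v by minimizing −v; the variables are (p1, ..., pm, v):

```python
import numpy as np
from scipy.optimize import linprog

def solve_maximiser(A):
    """Solve program (I) for the row player of the zero-sum game A."""
    m, n = A.shape
    # Variables: p_1..p_m, v.  Objective: minimize -v (i.e. maximize v).
    c = np.r_[np.zeros(m), -1.0]
    # Constraints v - sum_i p_i a_ij <= 0 for every column j.
    A_ub = np.c_[-A.T, np.ones(n)]
    b_ub = np.zeros(n)
    # sum_i p_i = 1.
    A_eq = np.r_[np.ones(m), 0.0].reshape(1, -1)
    b_eq = [1.0]
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[m]

A = np.array([[1.0, -1.0], [-1.0, 1.0]])     # Matching Pennies
p, v = solve_maximiser(A)
print(p, v)   # [0.5 0.5], value 0.0
```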

A GAME WITHOUT VALUE

Let S1 = S2 = {1, 2, ...} be the sets of pure strategies for both players and let A = [aij] be the payoff matrix for Player 1, where

aij = 1 if i > j,   aij = 0 if i = j,   aij = −1 if i < j.

Let σ2 = (q1, q2, …) be a strategy used by Player 2, i.e.,

(⋆)  ∑_j qj = 1.

Note that if Player 1 uses a pure strategy, then for every σ2,

sup_{1≤i<∞} ∑_j aij qj = 1.

Indeed, since (⋆) holds, for any ε > 0 there exists N such that ∑_{j=1}^N qj ≥ 1 − ε; any pure strategy i > N then gives Player 1 at least (1 − ε) − ε = 1 − 2ε. Obviously,

v^* = inf_{σ2} sup_{σ1} r(σ1, σ2) ≥ 1,

and since max aij = 1, v^* = 1. Since the situation is ‘symmetric’ for the minimiser, it follows that v_* = −1.

Conclusion: The Minimax Theorem cannot be easily extended to games with infinite pure strategy sets. Additional assumptions are needed (e.g., the compactness of the pure strategy sets).

FINITE DYNAMIC GAMES LECTURE 4

A graph consists of a finite set V and a set A of unordered pairs of elements of V.

v ∈V - vertex or node

{v1,v2 } ∈A - arc, branch or edge

A tree is a graph in which any two nodes are connected by exactly one path.

EXAMPLES

[Figure: a tree with its root and leaves (terminal nodes, denoted L(T)), and a graph that is NOT a tree.]

DEFINITION: A game in extensive form consists of the following objects:

• N = {1, …, n} - a set of players
• T - a rooted game tree
• P^0, P^1, …, P^n - a partition of the set of non-terminal nodes of T into (n+1) subsets; the members of P^0 are called nature/chance nodes, while the members of P^i are called nodes of Player i
• P^i is divided into k(i) information sets: U_1^i, …, U_{k(i)}^i
• g(t) = (g^1(t), …, g^n(t)) - an n-dimensional payoff vector for each terminal node t ∈ L(T).

This complete description is common knowledge among the players.

Definition:

An information set for Player i is a set of decision nodes in the game tree such that:

(i) the player concerned is making a decision (no other player is);

(ii) the player does not know which node has been reached (only that it is one of the nodes in the set). EXAMPLES • (I) Matching Pennies:

P^0 = ∅, P^1 = U_1^1 = {a}, P^2 = U_1^2 = {b, c}, root = {a}, payoff vector (g^1(t), g^2(t)).

[Figure: the game tree; at node a (information set U_1^1) Player 1 chooses H or T; nodes b and c form Player 2's information set U_1^2, so Player 2 does not know the choice of Player 1 when making a decision. The terminal payoffs are (1,-1), (-1,1), (1,-1), (-1,1).]

• (II) N = {1,2,3}, root = {a}, P^0 = {d}, P^1 = {a, e, f}, U_1^1 = {a}, U_2^1 = {e, f}, P^2 = U_1^2 = {b, c}, P^3 = U_1^3 = {g}

[Figure: the game tree of example (II); the chance node d selects its branches with probabilities 1/2, 1/3 and 1/6; the terminal payoff vectors are (2,0,0), (0,2,0), (0,2,3), (1,1,1), (0,0,0), (1,2,3), (2,0,0), (0,1,-1), (1,2,0), (1,-1,1).]

Note that at his second information set U_2^1 Player 1 does not recall what his choice was at U_1^1.

• I i = {U i ,U i , ,U i } - the set of information sets for Player i 1 2 … k(i ) • U i ∈I i - a generic element of I i

• v ≡ v(U i ) - the number of branches going out of each node in U i

i i • C(U )  {1,…,v(U )} - the set of choices available to Player i at any node in U i i i i A pure strategy s is a function s : I → {1,2,…} such that si (U i ) ∈C(U i ) for all U i ∈I i .

In other words, si specifies for every information set U i ∈I i of Player i, a choicesi (U i ) there.

Let S i be the set of pure strategies of Player i and let S = S1 ×× S n be the set of pure strategy profiles.

1 n For a strategy profile s = (s ,…, s ) ∈S the expected payoff for Player i is: i i , h (s) = ∑ ps (t)g (t) t∈L(T ) where ps (t) is the probability that the game ends at t when the strategy profile s is used. Example (II) -cont.

• Player 1 has four pure strategies: <1,1>, <1,2>, <2,1>, <2,2>, where <i,j> means that i is chosen at U_1^1 and j is chosen at U_2^1.

• Player 2 has two pure strategies: <1>, <2>.
• Player 3 has three pure strategies: <1>, <2>, <3>.

If s = (<2,1>, <2>, <3>), then h^1(s) = 1, h^2(s) = −1, h^3(s) = 1. If s’ = (<1,1>, <1>, <3>), then

h(s’) = (1/2)(2,0,0) + (1/6)(0,2,0) + (1/3)(0,2,3) = (1,1,1).

EXTENSIVE FORM VS NORMAL FORM

• Example (III). Normal form:

     A    B
A   2,0  2,0
B   0,2  1,1

Then N = {1,2}, S^1 = S^2 = {A, B}, and two different extensive forms induce this normal form:

[Figure: two game trees; in the first, Player 2 observes Player 1's move before choosing, while in the second Player 2 moves at a single information set; the terminal payoffs are (2,0), (0,2), (1,1) and (2,0), (0,2), (2,0), (1,1).]

The extensive form contains more information than the normal form!!!

MIXED STRATEGIES

A mixed strategy is

σ^i = (σ^i(s^i))_{s^i∈S^i} ∈ P(S^i).

It means that Player i chooses each pure strategy s^i ∈ S^i with probability σ^i(s^i). The expected payoff of Player i when a strategy profile σ = (σ^1, ..., σ^n) is used equals:

H^i(σ) = ∑_{s∈S} σ(s) h^i(s),

where σ(s) = ∏_{j=1}^n σ^j(s^j).

THE KUHN THEOREM (1953)

An n-person game in extensive form is a game of perfect information if all information sets are singletons, i.e., |U^i| = 1.

THM: Every finite n-person game of perfect information has an equilibrium point in pure strategies.

Proof by induction.

[Figure: a three-player game of perfect information solved by backward induction; the terminal payoffs are (1,-1,-2), (2,3,0), (1,0,2), (0,1,3), (3,1,0), (4,1,2), (1,2,-1), and the value (4,1,2) is propagated up the tree.]

The resulting equilibrium point is

s = (<2,2>, <1,2>, <2,2>)

with payoffs h(s) = (4,1,2).

Remark: Note that equilibrium points constructed in this manner, when restricted to any subgame of the original game, yield equilibria in the subgame as well. Such equilibria are called subgame perfect (see the next lecture). A sketch of the backward induction procedure follows below.
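A minimal sketch (our construction) of backward induction for games of perfect information, on a small hypothetical tree:

```python
def backward_induction(node):
    """node: a payoff tuple (leaf) or a list [player, children].
    Returns the payoff vector selected by backward induction."""
    if isinstance(node, tuple):                 # leaf: payoff vector
        return node
    player, children = node
    values = [backward_induction(c) for c in children]
    # the mover picks the continuation maximising his own payoff
    return max(values, key=lambda v: v[player])

# A toy two-player tree (players indexed 0 and 1):
tree = [0, [[1, [(3, 1), (0, 2)]],   # left:  Player 1 picks (0, 2)
            [1, [(2, 2), (1, 0)]]]]  # right: Player 1 picks (2, 2)
print(backward_induction(tree))       # (2, 2): Player 0 goes right
```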

BEHAVIOUR STRATEGIES

• A pure strategy is a complete plan for the player’s choices at all his information sets.
• A mixed strategy means that the player chooses one such plan at random before the game begins (according to some probability distribution).
• An alternative approach for the player: make an independent random choice at each one of his information sets; the choices at different information sets are stochastically independent. These randomisation procedures are called behaviour strategies. Every behaviour strategy is a special kind of mixed strategy. When a player has “perfect recall”, every one of his mixed strategies is equivalent to a behaviour strategy.

• b^i - a behaviour strategy of Player i,
• B^i - the set of all behaviour strategies of Player i, i.e.,

B^i = ∏_{U^i∈I^i} P(C(U^i)).

EXAMPLE (IV)

Two cards marked H and L are dealt at random to Ann and John. The person with the high card obtains 1$ from the person with the low card and decides whether to stop or continue the game. If the game continues, then Paul (not knowing the outcome of the deal) instructs Ann and John either to exchange or to keep their cards. Again the holder of the high card receives 1$ from the holder of the low card, and the game ends. Draw a tree for this game and then give its normal form!

THE GAME TREE

[Figure: the game tree. A chance node deals H to Ann or to John, each with probability 1/2; the high-card holder (information sets U_1^A, U_1^J) chooses S (stop, giving payoff (1,-1) or (-1,1)) or C (continue); after C, Paul (information set U_1^P) chooses K (keep) or E (exchange), giving payoffs (2,-2), (0,0), (-2,2), (0,0).]

Note that this is a zero-sum game.

• Ann and Paul together → Player 1, the maximiser; pure strategies: <S,K>, <S,E>, <C,K>, <C,E>, where the first entry is Ann’s move and the second is Paul’s instruction.

• John → Player 2, the minimiser; pure strategies: <S>, <C>.

The expected payoffs are:

            <S>           <C>
<S,K>   (0, 0)        (-1/2, 1/2)
<S,E>   (0, 0)        (1/2, -1/2)
<C,K>   (1/2, -1/2)   (0, 0)
<C,E>   (-1/2, 1/2)   (0, 0)

Strategy <S,K> is dominated by <S,E>, and strategy <C,E> is dominated by <C,K>.

Unique optimal strategies are: (0, 1/2, 1/2, 0) and (1/2, 1/2); the value of the game is 1/4.

• Consider the behaviour strategy for Player 1: (α, 1−α) - Ann chooses S with probability α and C with probability 1−α.

• Paul chooses K with probability β and E with probability 1−β.

Player 1’s expected payoff:

• If Player 2 plays S:

(1/2)[α + (1−α)(2β + 0(1−β))] + (1/2)(−1) = (1−α)(β − 1/2)

• If Player 2 plays C:

(1/2)[α + (1−α)(2β + 0(1−β))] + (1/2)[−2β + 0] = α(1/2 − β)

The maximum payoff that Player 1 can ensure herself is

max_{0≤α,β≤1} min{(1−α)(β − 1/2), α(1/2 − β)} = 0,

since either 1/2 − β ≤ 0 or β − 1/2 ≤ 0.

Conclusion: The behaviour strategies do a poorer job! The behaviour strategy (α, β) corresponds to the mixed strategy

(αβ, α(1−β), (1−α)β, (1−α)(1−β)).

Note: σ1 = (0, 1/2, 1/2, 0) - the optimal mixed strategy for Player 1 - requires randomisation at the two information sets that is correlated rather than independent.

SUBGAME PERFECTION, COURNOT MODEL LECTURE 5

SUBGAME PERFECTION

A subgame (subtree) is a part of a game that:
• begins at a decision node (for any player);
• the information set containing the initial decision node contains no other nodes, i.e., the player knows all decisions that have been made up until that time;
• the subtree contains all the decision nodes that follow the initial node.

A Subgame Perfect Nash Equilibrium is a Nash Equilibrium in which the behaviour specified in every subgame is a Nash Equilibrium for the subgame.

Note: This also applies to subgames that are NOT reached during a play of the game.

Theorem

Every finite dynamic game has a Subgame Perfect Nash Equilibrium. EXAMPLE

Two people (husband and wife) are buying items for a dinner party. The husband buys either fish F or meat M for the main course and the wife buys wine: red R or white W. Both are conventional and prefer red wine with meat and white wine with fish. However, the husband prefers meat over fish and wife prefers fish over meat.

The set of pure strategies for the players:

S h = {M,F}, S w = {RR,RW,WR,WW } Extensive form of the game

[Figure: the game tree. The husband moves first at U_1^h (M or F); the wife observes his choice and moves at U_1^w (after M) or U_2^w (after F), choosing R or W; the payoffs are (2,1) after (M,R), (0,0) after (M,W), (0,0) after (F,R), (1,2) after (F,W). Backward induction selects R after M and W after F.]

The Nash Equilibrium in pure strategies obtained by backward induction: (M, RW).

Normal form of the game

      RR     RW     WR     WW
M   (2,1)  (2,1)  (0,0)  (0,0)
F   (0,0)  (1,2)  (0,0)  (1,2)

We obtain three pure strategy Nash Equilibria: (M,RR), (M,RW), (F,WW). But only (M,RW) is the Subgame Perfect Nash Equilibrium.

• A duopoly is a market in which two firms compete to supply the same set of customers with the same product. The product is infinitely divisible, e.g., petroleum.
• Cournot’s model: each firm i chooses a production quantity qi; since the product is infinitely divisible, the action sets are continuous.
• The market price of the product is assumed to depend on the total supply Q = q1 + q2. Let P0 and Q0 be some positive numbers, and

P(Q) = P0 (1 − Q/Q0)  if Q < Q0,
P(Q) = 0              if Q ≥ Q0.

• The production cost for Firm i is C(qi) = c·qi, where c is the cost of producing one unit and P0 > c.
• The payoff for Firm i:

ri(q1, q2) = qi P(Q) − C(qi)

SOLUTION:

• Note that it does not pay either firm to produce more than Q0. Hence, we take [0, Q0] as the action set.
• Best response strategy for Firm 1:

r1(q1, q2) = q1 P(Q) − c q1, and ∂r1/∂q1 (q̂1, q2) = 0,

thus

P0 (1 − (q̂1 + q2)/Q0) + q̂1 P0 (−1/Q0) − c = 0,

and, finally,

q̂1 = (Q0/2)(1 − q2/Q0 − c/P0).

• Check that it is really the best response:

(I) ∂²r1/∂q1² (q̂1, q2) = −2P0/Q0 < 0.

(II) We also need q̂1 + q2 ≤ Q0. Indeed,

q̂1 + q2 = (Q0 + q2)/2 − cQ0/(2P0) ≤ Q0 − cQ0/(2P0) < Q0.

• Similarly, we find the best response to a choice of q1 for Firm 2:

q̂2 = (Q0/2)(1 − q1/Q0 − c/P0)

• A pure strategy Nash Equilibrium is a pair (q1*, q2*), each of which is the best response to the other. Such a pair can be found by solving simultaneously the equations:

q1* = (Q0/2)(1 − q2*/Q0 − c/P0),   q2* = (Q0/2)(1 − q1*/Q0 − c/P0).

The solution is:

qC* ≡ q1* = q2* = (Q0/3)(1 − c/P0)

The payoff when the Nash Equilibrium is played is:

ri(qC*, qC*) = (P0 Q0 / 9)(1 − c/P0)²
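The whole derivation can be checked symbolically; a sketch with sympy (our construction):

```python
import sympy as sp

q1, q2, c, P0, Q0 = sp.symbols('q1 q2 c P0 Q0', positive=True)

P = P0 * (1 - (q1 + q2) / Q0)                  # inverse demand for Q < Q0
r1 = q1 * P - c * q1                           # Firm 1's profit
r2 = q2 * P - c * q2                           # Firm 2's profit

# Best responses from the first order conditions.
br1 = sp.solve(sp.diff(r1, q1), q1)[0]
br2 = sp.solve(sp.diff(r2, q2), q2)[0]

# Nash Equilibrium: solve both best-response equations simultaneously.
eq = sp.solve([q1 - br1, q2 - br2], [q1, q2], dict=True)[0]
print(sp.simplify(eq[q1]))                     # Q0*(P0 - c)/(3*P0), i.e. (Q0/3)(1 - c/P0)
print(sp.simplify(r1.subs(eq)))                # Q0*(P0 - c)**2/(9*P0), the Cournot payoff
```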

MONOPOLY VS DUOPOLY

A monopolist maximises his profit:

rm(q) = q P(q) − c q,

and therefore his optimal strategy is:

qm* = (Q0/2)(1 − c/P0).

Because qm* < 2 qC*, the price at which goods are sold is higher under the monopoly. What about his profit?

rm(qm*) = qm* (P0 (1 − qm*/Q0) − c) = (Q0 P0 / 4)(1 − c/P0)².

Hence, competition operates to benefit the consumer! Suppose that the two firms agree to form a cartel and use the strategies q1 = q2 = (1/2) qm*. The profit for each firm is now:

r1((1/2)qm*, (1/2)qm*) = r2((1/2)qm*, (1/2)qm*) = (Q0 P0 / 8)(1 − c/P0)².

• The cartel profit is greater than the Cournot payoff.

• The price paid by consumers in the cartel model is the same as they pay under a monopoly.

• Cartel agreements give an unstable solution, because the best response to a firm producing the cartel quantity is to produce:

q̂ = (Q0/2)(1 − qm*/(2Q0) − c/P0) = (3/4) qm* > (1/2) qm*.

Check it!

STACKELBERG SOLUTION

Suppose now that the decisions are made sequentially:

• Firm 1 (the LEADER) decides on a quantity to produce.
• This decision is observed by Firm 2 (the FOLLOWER), which then decides on the quantity that it will produce.

We solve this game by backward induction to find a Subgame Perfect Nash Equilibrium. In other words, we have to:

(I) Find the best response of Firm 2, q̂2(q1), for every possible choice of quantity q1;
(II) Given that Firm 1 knows Firm 2’s best response to every choice of q1, find a Nash Equilibrium for this game by determining the maximum payoff that Firm 1 can achieve.

(I): Firm 2’s profit is r2(q1, q2) = q2 [P(Q) − c], and the best response to a choice of q1 is found by solving ∂r2/∂q2 (q1, q̂2) = 0, which gives

q̂2(q1) = (Q0/2)(1 − q1/Q0 − c/P0).

(II): If Firm 1 chooses q1 and Firm 2 chooses the best response q̂2(q1), then Firm 1’s profit is

r1(q1, q̂2(q1)) = q1 (P0/2)(1 − q1/Q0 − c/P0).

Hence, Firm 1 maximises its profit at

q̂1 = (Q0/2)(1 − c/P0).

The Nash Equilibrium is:

q1* = (Q0/2)(1 − c/P0)  and  q2* = q̂2(q1*) = (Q0/4)(1 − c/P0).
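The backward induction can again be verified symbolically (a sketch, reusing the symbols from the Cournot snippet above):

```python
import sympy as sp

q1, q2, c, P0, Q0 = sp.symbols('q1 q2 c P0 Q0', positive=True)
P = P0 * (1 - (q1 + q2) / Q0)

# Step (I): the follower's best response to any q1.
r2 = q2 * (P - c)
br2 = sp.solve(sp.diff(r2, q2), q2)[0]

# Step (II): the leader maximises its profit along the follower's best response.
r1_leader = (q1 * (P - c)).subs(q2, br2)
q1_star = sp.solve(sp.diff(r1_leader, q1), q1)[0]
q2_star = br2.subs(q1, q1_star)

print(sp.simplify(q1_star))   # Q0*(P0 - c)/(2*P0): the leader produces more
print(sp.simplify(q2_star))   # Q0*(P0 - c)/(4*P0)
```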

The Subgame Perfect Nash Equilibrium derived above is sometimes called the STACKELBERG EQUILIBRIUM. It is interesting to note that Firm 1 makes the greater profit because q2* < q1*.

Remarks: This Stackelberg Equilibrium is not the only Nash Equilibrium in the Stackelberg model. Another Nash Equilibrium is for Firm 1 to produce the Cournot quantity and for Firm 2 to produce the Cournot quantity regardless of the production of Firm 1.

Indeed, if q1 = qC*, then q̂2 = (Q0/2)(1 − qC*/Q0 − c/P0) = qC*. Hence, Firm 2’s best response to q1 = qC* is q̂2 = qC*.

If Firm 2 always chooses q2 = qC*, then Firm 1’s profit is

r1(q1, qC*) = q1 [P0 (1 − (q1 + qC*)/Q0) − c].

The best response for Firm 1 to q2 = qC* is found from ∂r1/∂q1 (q̂1, qC*) = 0, which gives

q̂1 = (Q0/2)(1 − qC*/Q0 − c/P0) = qC*.

So Firm 1’s best response to q2 = qC* is q̂1 = qC*. Because, for both firms, the best response to the other firm producing the quantity qC* is to produce qC*, the pair (qC*, qC*) is a Nash Equilibrium.

Although q2 = qC* is a best response to q1 = qC*, it is not a best response to q1 ≠ qC*. Hence, it is not a Nash Equilibrium in every subgame. Consequently,

the pair (qC*, qC*) is not a Subgame Perfect Nash Equilibrium.

WAR OF ATTRITION LECTURE 6

In a War of Attrition, two players compete for a resource of value v; for example, two supermarkets engaged in a price war.

The strategy for each player is a choice of persistence time ti.

ASSUMPTIONS OF THE MODEL:
• The cost of the contest is related only to its duration; there are no other costs!
• The player that persists the longest gets all of the resource. If both players quit at the same time, then neither gets the resource.
• The cost paid by each player is proportional to the shortest persistence time chosen, i.e., no costs are incurred after one player quits and the contest ends.
• The payoffs for the players are:

r1(t1, t2) = v − c t2 if t1 > t2,  and  −c t1 if t1 ≤ t2;

r2(t1, t2) = v − c t1 if t2 > t1,  and  −c t2 if t2 ≤ t1,

where c is some positive constant. There are two pure strategy Nash Equilibria:

• A pair (t1*, t2*) is a NE, where t1* = v/c and t2* = 0, giving r1(v/c, 0) = v and r2(v/c, 0) = 0.

Indeed, note that for every t1 > 0 we have r1(t1, 0) = v, and r1(0, 0) = 0. This yields that

r1(t1, t2*) ≤ r1(t1*, t2*) for every t1.

For Player 2, we have r2(v/c, t2) = −c t2 < 0 for every 0 < t2 ≤ v/c, and r2(v/c, t2) = 0 for every t2 > v/c. Hence,

r2(t1*, t2) ≤ r2(t1*, t2*) for every t2.

• Clearly, the pair (t1**, t2**) with t1** = 0 and t2** = v/c is a second pure strategy NE, giving r1(0, v/c) = 0 and r2(0, v/c) = v.

Mixed strategy Nash Equilibrium

It is convenient to consider strategies based on the costs of the contest, x = c t1 and y = c t2. The payoffs are

r1(x, y) = v − y if x > y,  and  −x if x ≤ y,

and

r2(x, y) = v − x if y > x,  and  −y if y ≤ x.

The mixed strategy σ1 specifies a choice of cost in the range x to x+dx with probability p(x)dx; and σ2 specifies a choice of cost in the range y to y+dy with probability q(y)dy. The expected payoff to Player 1 if he chooses a fixed cost x against a mixed strategy σ2* is

r1(x, σ2*) = ∫₀ˣ (v − y) q(y) dy + ∫ₓ^∞ (−x) q(y) dy.

The first term arises from the probability that Player 2 chooses cost y ≤ x, and the second term from the probability that Player 2 chooses y > x.

By (an extension of) the Equality of Payoffs Theorem we must have r1(x, σ2*) = const. Hence, for fixed σ2*, r1(x, σ2*) is independent of x, so

∂r1(x, σ2*)/∂x = 0.

Now,

∂r1(x, σ2*)/∂x = d/dx ∫₀ˣ (v − y) q(y) dy − ∫ₓ^∞ q(y) dy − x · d/dx ∫ₓ^∞ q(y) dy.

From the fundamental theorem of calculus we obtain

d/dx ∫₀ˣ (v − y) q(y) dy = (v − x) q(x),

and

d/dx ∫ₓ^∞ q(y) dy = d/dx [1 − ∫₀ˣ q(y) dy] = −q(x).

Summing up, we get

∂r1(x, σ2*)/∂x = (v − x) q(x) − ∫ₓ^∞ q(y) dy + x q(x) = v q(x) − ∫ₓ^∞ q(y) dy ≡ 0.

We identify q(y) as an exponential probability density, so we put q(y) = k e^{−ky}, where k is a normalization constant. Since

∫ₓ^∞ q(y) dy = k ∫ₓ^∞ e^{−ky} dy = e^{−kx},

we have

v k e^{−kx} = e^{−kx}  ⇒  k = 1/v.

Hence, q(y) = (1/v) e^{−y/v}. The same argument for Player 1 yields the same distribution in terms of costs, p(x) = (1/v) e^{−x/v}, i.e., the equilibrium is symmetric.

The Nash Equilibrium in terms of the distribution of persistence times t: substituting x = ct, dx = c dt,

∫₀^∞ p(x) dx = ∫₀^∞ (1/v) e^{−x/v} dx = ∫₀^∞ (c/v) e^{−ct/v} dt = ∫₀^∞ p(t) dt,

hence the equilibrium distributions for both players are exponential with mean v/c, i.e.,

p(t) = q(t) = (c/v) e^{−ct/v}.

The expected payoffs for the players under this equilibrium can be calculated

• either explicitly from the formula

r1(σ1*, σ2*) = ∫₀^∞ p(x) [∫₀ˣ (v − y) q(y) dy − x ∫ₓ^∞ q(y) dy] dx,

• or by observing that r1(0, σ2*) = 0 and ∂r1(x, σ2*)/∂x = 0 imply that r1(x, σ2*) = 0 for every x; therefore r1(σ1, σ2*) = 0 for all σ1, and in particular for σ1*.
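A quick Monte Carlo check of this conclusion (our construction): drawing both costs from the exponential equilibrium density with v = 1 should give an expected payoff of approximately 0.

```python
import numpy as np

rng = np.random.default_rng(0)
v, n = 1.0, 10**6

# Equilibrium costs x = c*t1, y = c*t2 are exponential with mean v.
x = rng.exponential(v, n)
y = rng.exponential(v, n)

payoff1 = np.where(x > y, v - y, -x)   # r1(x, y) from the model
print(payoff1.mean())                   # close to 0, as derived
```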

Note: Although p (t) is the distribution of persistence time, it is not the distribution of contest duration (D) P(D ≤ t) = 1 − P(D > t) = 1− P(neither player has quit before t). +∞ ct − Now, P(Player i does not quit before t)= ∫ p(τ )dτ = e v , and t 2ct − P(D ≤ t) = 1 − e v . When do NASH EQUILIBRIA exist?

Kakutani Fixed Point Theorem

Let K be a convex and compact subset of a Euclidean space ℝⁿ. Let F : K → K be a correspondence such that
• for every x ∈ K, F(x) is non-empty, convex and closed in K,
• the graph of F is closed.

Then there exists a point x0 such that x0 ∈ F(x0).

NOTE: Glicksberg (1952) extended this result to infinite-dimensional spaces.

Definition 1: Consider a non-empty subset K ⊂ ℝⁿ and a correspondence F : K → K. The graph of F is said to be closed if the set

gph(F) ≡ {(x, y) ∈ K × K : y ∈ F(x)}

is closed in K × K.

The graph of F is closed if and only if for all sequences {xn}n∈ℕ and {yn}n∈ℕ in K satisfying

• lim_{n→∞} xn = x,
• lim_{n→∞} yn = y,
• yn ∈ F(xn) for each n ∈ ℕ,

we get y ∈ F(x).

[Figure: illustration of a closed graph: the sets F(x0) vary with x0 ∈ K without sudden drops.]

Definition 2: A subset K ⊂ ℝⁿ is compact if it is bounded, i.e.,

∃M > 0 ∀x ∈ K: ‖x‖ ≤ M,

and closed, i.e., for every sequence {xn}n∈ℕ in K converging to some x ∈ ℝⁿ, the limit vector x still belongs to K.

Lemma 1: A subset K ⊂ ℝⁿ is compact if and only if every sequence {xn}n∈ℕ in K has a subsequence {x_{nk}}k∈ℕ such that lim_{k→∞} x_{nk} = x for some x ∈ K.

Nash theorem for compact pure strategy sets

Consider a strategic form of an n-person game whose strategy spaces Si, i = 1, ..., n, are non-empty compact subsets of ℝⁿ. If the payoff functions are continuous, then there exists a Nash Equilibrium in mixed strategies.

Economists are often interested in pure strategy Nash Equilibria. In order to obtain such an equilibrium, additional assumptions are needed.

Definition 3: A function f : ℝⁿ → ℝ is quasi-concave if the sets {x : f(x) ≥ c} are convex for all c.

Lemma 2: Consider a function u : C → ℝ, where C is a convex and non-empty subset of ℝⁿ. The function u is quasi-concave if and only if

u(αx + (1−α)y) ≥ min{u(x), u(y)}

for any x, y ∈ C and α ∈ [0,1].

Lemma 3: Consider a function u : C → ℝ, where C is a convex and non-empty subset of ℝⁿ. If u is concave, then it is quasi-concave.

Example of a quasi-concave function:

[Figure: the graph of a single-peaked function u; the level set {x : u(x) ≥ c} is a convex set (an interval).]

General existence result (Debreu 1952; Glicksberg 1952; Fan 1952)

Consider a strategic form of an n-person game for which

• the strategy spaces Si, i = 1, ..., n, are non-empty compact convex subsets of ℝⁿ,
• the payoff function ri is continuous on S = S1 × ⋯ × Sn,
• the payoff function ri is quasi-concave in si.

Then there exists at least one pure strategy Nash Equilibrium.

The proof is based on verifying that continuous payoffs imply a non-empty-valued best response correspondence with a closed graph, and that quasi-concavity in a player’s own action implies a convex-valued best response correspondence.

Remark: If the payoff functions are not continuous, then the best response correspondence can fail to have a closed graph and/or fail to be non-empty. The latter problem arises because a discontinuous function need not attain a maximum; for example,

f(x) = −|x| for x ≠ 0, and f(0) = −1.

Example: Consider the following two-player game with strategy sets S1 = S2 = [0,1] and payoff functions:

r1(s1, s2) = −(s1 − s2)²,

r2(s1, s2) = −(s1 − s2 − 1/3)²  if s1 ≥ 1/3,
r2(s1, s2) = −(s1 − s2 + 1/3)²  if s1 < 1/3.

Here each player’s payoff is strictly concave in his own strategy, and a best response exists (and is unique) for each strategy of the opponent. However, the game does not have a pure strategy Nash Equilibrium:

• Player 1’s best response is always to choose s1 := s2;
• Player 2’s best response is to choose s2 := s1 − 1/3 for s1 ≥ 1/3, and s2 := s1 + 1/3 for s1 < 1/3.

[Figure: the best response functions in the unit square; Player 1's is the diagonal s1 = s2, Player 2's is that diagonal shifted by ±1/3. The best response functions do not intersect, so no pure strategy NE exists.]

BAYESIAN EQUILIBRIUM LECTURE 7

Robert Aumann in the article [J. Math. Econ. 1 (1974)]:

“Subjectivity and correlation in randomized strategies”, proposed a mechanism that permits the players to enter into preplay arrangements so that their strategy choices could be correlated instead of independently chosen. Example proposed by Aumann:

                PLAYER 2
              L       R
PLAYER 1  U  5,1     0,0
          D  4,4     1,5

The game has two pure strategy Nash Equilibria and one mixed strategy Nash Equilibrium:

(U,L), (D,R), and σ1* = σ2* = (1/2, 1/2).

The corresponding payoffs are: (5,1), (1,5), (2.5, 2.5). Now, if the players agree to play by jointly observing a coin flip and playing (U,L) if the result is heads and (D,R) if the result is tails, then the expected outcome is (3,3), which is a convex combination of the two pure strategy Nash Equilibrium outcomes.

Observe that this way of playing the game defines an equilibrium for the following extensive game:

[Figure: the extensive game; a chance move P⁰ selects HEAD or TAIL with probability 1/2 each, both players observe the outcome, and then the original game is played; the terminal payoffs in each half are (5,1), (0,0), (4,4), (1,5).]

We now push the example one step further by assuming that the players agree to play according to the following mechanism: a random device selects one cell in the game matrix with the probabilities

     L     R
U   1/3    0
D   1/3   1/3

When a cell is selected, then each player is told to play the corresponding pure strategy. The trick is that a player is told what to play but is not told what the recommendation to the other player is. The information is not public any more! When Player 1 receives the signal “play D” he only knows that with probability 1/2 the other player has been told to play R. When Player 1 receives the signal “play U” he knows that the other will play L with probability 1.

Consider now what Player 1 can do if he assumes that Player 2 plays according to the recommendation.

• If Player 1 has been told “play D” and he plays so, he expects (1/2)·4 + (1/2)·1 = 5/2; but if he plays U instead, he expects (1/2)·5 + (1/2)·0 = 5/2. Hence, he cannot improve his expected payoff.
• If Player 1 has been told “play U” and he plays so, he gets 5, whereas if he plays D he expects 4.

Conclusion: For Player 1, obeying the recommendation is a best reply to Player 2’s behaviour when Player 2 plays according to the suggestion of the signalling scheme. Now we can repeat the verification for Player 2.

• If he has been told “play L” he expects (1/2)·1 + (1/2)·4 = 5/2; but if he plays R instead, he expects (1/2)·0 + (1/2)·5 = 5/2. Hence, he cannot improve his expected payoff.
• If he has been told “play R” he expects 5, otherwise 4.

Thus, we have checked that an equilibrium property holds for this scheme of playing the game.
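These incentive checks are mechanical, so they can be automated. A sketch (our construction) verifying the correlated equilibrium conditions for this distribution:

```python
import numpy as np

R1 = np.array([[5., 0.], [4., 1.]])     # Player 1's payoffs (U, D) x (L, R)
R2 = np.array([[1., 0.], [4., 5.]])     # Player 2's payoffs
P = np.array([[1/3, 0.], [1/3, 1/3]])   # the signalling distribution

def is_correlated_eq(P, R1, R2):
    # Player 1: for each recommended row i, no deviation row k may pay more.
    for i in range(2):
        for k in range(2):
            if P[i] @ (R1[k] - R1[i]) > 1e-12:
                return False
    # Player 2: for each recommended column j, no deviation column k may pay more.
    for j in range(2):
        for k in range(2):
            if P[:, j] @ (R2[:, k] - R2[:, j]) > 1e-12:
                return False
    return True

print(is_correlated_eq(P, R1, R2))      # True
print((P * (R1 + R2)).sum() / 2)        # 10/3: the expected payoff of each player
```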

Each player expects: (1/3)·1 + (1/3)·5 + (1/3)·4 = 10/3 = 3.3333...

Extensive form of the game leading to the correlated equilibrium:

[Figure: the extensive game; a chance move P⁰ selects (U,L), (D,L) or (D,R) with probability 1/3 each, each player learns only his own recommendation, and then the original game is played; the terminal payoffs in each part are (5,1), (0,0), (4,4), (1,5).]

Remarks: By expanding the game via the adjunct of a first stage where Nature plays and gives information to the players, a new class of equilibria can be reached, which dominate some of the original Nash Equilibria.

(1) If the random device gives information which is common to all players, then it permits a mixing of different pure strategy Nash Equilibria, and the outcome is in the convex hull of the Nash Equilibrium outcomes. Such an equilibrium is sometimes called a SUNSPOT EQUILIBRIUM.

[Figure: the payoff region with the NE outcomes (5,1), (1,5), (2.5,2.5); the sunspot outcome (3,3) lies in their convex hull, while the correlated outcome (3.33, 3.33) lies outside it.]

(2) If the random device gives information which can be different from one player to the other, then the correlated equilibrium can have an outcome which lies outside of the convex hull. Such an equilibrium is called a CORRELATED EQUILIBRIUM.

Definition: A correlated equilibrium in an n-player game is a probability distribution p(·) over the set of pure strategy profiles S = S1 × ⋯ × Sn such that for every player j and any mapping dj : Sj → Sj it holds that

∑_{s∈S} p(s) rj(sj, s−j) ≥ ∑_{s∈S} p(s) rj(dj(sj), s−j),

where (sj, s−j) = (s1, ..., sj−1, sj, sj+1, ..., sn).

BAYESIAN EQUILIBRIUM
J. Harsanyi, Inter. J. Game Theory (1973)

In a game of incomplete information some players do not know exactly what the characteristics of other players are.

Example: Player 1 can be of two types, called θ1 and θ2. The payoff matrices are the following:

type θ1:                type θ2:
      L        R              L          R
U   (0,-1)   (2,0)      U   (1.5,-1)   (3.5,0)
D   (2,1)    (3,0)      D   (2,1)      (3,0)

Extensive form of this game:

[Figure: the extensive form; a chance move P⁰ selects type θ1 (probability p1) or θ2 (probability p2); Player 1 observes the type and chooses U or D; Player 2 chooses L or R without observing the type; the terminal payoffs are (0,-1), (2,0), (2,1), (3,0) under θ1 and (3/2,-1), (7/2,0), (2,1), (3,0) under θ2.]

J. Harsanyi proposed a transformation of a game with incomplete information into a game with imperfect information. This transformation introduces a preliminary “chance move” played by Nature, which decides randomly the type θi of

Player 1. The probability of each type is p1 and p2 = 1 − p1, and represents the beliefs of Player 2. One assumes that Player 1 also knows these beliefs; thus the prior probabilities are common knowledge.

• Let xi be the probability of choosing U and 1 − xi the probability of choosing D when Player 1 is of type θi.
• Let y be the probability of choosing L and 1 − y the probability of choosing R. In other words, (y, 1−y) is a mixed strategy for Player 2.

We can define the best response of Player 1 to a mixed strategy (y, 1−y) of Player 2.

• If Player 1 is of type θ1, then max_{i=1,2} [a^θ1_{i1} y + a^θ1_{i2} (1−y)] gives his best response.
• If Player 1 is of type θ2, then max_{i=1,2} [a^θ2_{i1} y + a^θ2_{i2} (1−y)] gives his best response.

Here, a^θl_{ij}, b^θl_{ij} are the payoffs for Player 1 and Player 2 when Player 1 is of type θl. From the above we get:

• θ1 ⇒ max{0·y + 2(1−y), 2y + 3(1−y)}, and
• θ2 ⇒ max{(3/2)y + (7/2)(1−y), 2y + 3(1−y)}.

[Figure: the payoff lines for each type as functions of y ∈ [0,1]. For type θ1, D is better for every y; for type θ2, U is better for y < 1/2 and D is better for y > 1/2.]

We can now define the optimal response of Player 2 to the pair of mixed strategies (xi, 1−xi) of Player 1 by solving

max_{j=1,2} [p1 (x1 b^θ1_{1j} + (1−x1) b^θ1_{2j}) + p2 (x2 b^θ2_{1j} + (1−x2) b^θ2_{2j})].

Note that if Player 1 is of type θ1, he always chooses strategy D, i.e., x1 = 0 (see the figure). Therefore, the best reply condition for Player 2 can be written as follows:

max {1·p1 + p2 (x2·(−1) + 1·(1−x2)), p2 (0·x2 + 0·(1−x2)) + 0·p1},

which is max {1 − 2(1−p1) x2, 0}. Hence, Player 2 chooses
• strategy L, if x2 < 1/(2(1−p1)),
• strategy R, if x2 > 1/(2(1−p1)),
• any mixed strategy y ∈ [0,1], if x2 = 1/(2(1−p1)).

Conclusion: The equilibria of this game are characterized as follows:
• x1 ≡ 0;
• if p1 ≤ 1/2, then x2 = 0, y = 1, or x2 = 1, y = 0, or x2 = 1/(2(1−p1)), y = 1/2;
• if p1 > 1/2, then x2 = 0, y = 1.

These candidates are verified numerically below.
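A sketch (our construction) that checks these equilibrium candidates numerically for a sample prior, e.g. p1 = 1/4:

```python
import numpy as np

# Payoff matrices (rows U, D; columns L, R); A = Player 1, B = Player 2.
A = {1: np.array([[0., 2.], [2., 3.]]), 2: np.array([[1.5, 3.5], [2., 3.]])}
B = {1: np.array([[-1., 0.], [1., 0.]]), 2: np.array([[-1., 0.], [1., 0.]])}

def is_bayes_eq(x1, x2, y, p1, tol=1e-9):
    strat1 = {1: np.array([x1, 1 - x1]), 2: np.array([x2, 1 - x2])}
    s2 = np.array([y, 1 - y])
    # Each type of Player 1 must best-respond to (y, 1-y).
    for t in (1, 2):
        if strat1[t] @ A[t] @ s2 < max(A[t] @ s2) - tol:
            return False
    # Player 2 best-responds to the prior-weighted strategy of Player 1.
    payoffs2 = p1 * (strat1[1] @ B[1]) + (1 - p1) * (strat1[2] @ B[2])
    return s2 @ payoffs2 >= max(payoffs2) - tol

p1 = 0.25
for x2, y in [(0.0, 1.0), (1.0, 0.0), (1 / (2 * (1 - p1)), 0.5)]:
    print(is_bayes_eq(0.0, x2, y, p1))   # True for all three candidates
```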

Definition: Each player j (in an n-player game) may be of different possible types. Let Θj be the finite set of types for Player j. Whatever his type is, Player j has the same set of pure strategies Sj. If θ = (θ1, …, θn) ∈ Θ = Θ1 × ⋯ × Θn is a type specification for every player, then the normal form of the game is specified by the payoff functions:

(*)  rj(θ; ·) : S1 × ⋯ × Sn → ℝ,  j = 1, ..., n.

A prior probability distribution p(θ1, …, θn) on Θ is given as common knowledge. Furthermore, we shall assume that all the marginal probabilities are nonzero, i.e., pj(θj) > 0 for all j = 1, ..., n. When Player j observes that he is of type θj ∈ Θj, he can construct his revised conditional probability distribution on the types θ−j of the other players through the Bayes formula:

p(θ−j | θj) = p([θj, θ−j]) / pj(θj).

We can now introduce an expanded game where Nature draws randomly, according to the prior probability distribution p(·), a type vector θ ∈ Θ for all players. Player j observes only his type and picks a strategy from his strategy set Sj. The outcome is then defined by (*). A BAYESIAN EQUILIBRIUM is a Nash Equilibrium in the expanded game, in which each player’s pure strategy gj is a map from Θj to Sj. The payoff to Player j associated with a strategy profile g = (g1, ..., gn) is given by

Rj(g) = ∑_{θj∈Θj} pj(θj) ∑_{θ−j∈Θ−j} p(θ−j | θj) rj([θj, θ−j]; g1(θ1), ..., gn(θn)).

As usual, a Nash Equilibrium is a strategy profile g* such that for every gj and j = 1, ..., n it holds that

Rj(g*) ≥ Rj(gj, g−j*).

BAYESIAN EQUILIBRIUM LECTURE 8

BARGAINING UNDER INCOMPLETE INFORMATION K. Chatterjee, W. Samuelson, Oper. Res. (1983)

THE MODEL

A single seller of an indivisible good faces a single potential buyer. A bargain is concluded if and only if the good is transferred at a mutually acceptable price. Let

• vs ∈ [0,1] denote the seller’s reservation price (the smallest monetary sum he will accept in exchange for the good),
• vb ∈ [0,1] denote the buyer’s reservation price (the greatest sum he is willing to pay for the good).

The bargain is struck when the sale price P satisfies

vs ≤ P ≤ vb.

Incomplete information is modeled by the following situation: Each party knows his own reservation price, but is uncertain about his adversary’s, assessing a subjective probability distribution over the range of possible values that his opponent may hold. Specifically, the buyer regards vs as a random variable possessing a c.d.f. Fb(vs) satisfying Fb(0) = 0 and Fb(1) = 1. Moreover, Fb(·) is strictly increasing on [0,1] and has a density function.

Similarly, the seller’s knowledge of vb is summarized by Fs(vb), which satisfies Fs(0) = 0 and Fs(1) = 1. Fs(·) is also strictly increasing on [0,1] and has a density function. These distribution functions are common knowledge, i.e., each side knows these distributions, knows that the latter knowledge is known, etc.

Bargaining Rule

Seller and buyer submit sealed offers, s and b, respectively.
• If s ≤ b, a bargain is enacted and the good is sold at the price P = kb + (1−k)s, where 0 ≤ k ≤ 1, and
• if s > b, there is no sale and no money transfer.

If k = 1/2, then the rule determines the final sale price by splitting the difference between the players’ offers. Both offers carry equal weight in determining the sale price. We shall consider this case!

Objective: Characterize Bayesian equilibrium solutions in this one-stage bargaining game.

• In the event of agreement, the seller’s profit is P − vs and the buyer’s is vb − P.
• In the event of no agreement, each side earns 0.

Each bargainer makes the offer that maximizes his expected profit. Notation: b = B(vb) and s = S(vs) indicate that the players’ offers depend on their reservation prices. The functions B(·) and S(·) are referred to as the players’ offer strategies.

• The expected profit for the buyer is given by

rb(b, vb) = ∫₀ᵇ (vb − (b+s)/2) gb(s) ds for b > 0, and rb(b, vb) = 0 for b ≤ 0.

Here, gb(·) is the density function of the seller’s offer induced by S(·) and Fb(·), i.e.,

(1)  Gb(ς) = Pr(S(vs) ≤ ς) = Pr(vs ≤ S⁻¹(ς)) = Fb(S⁻¹(ς)).

• The expected profit for the seller is given by

rs(s, vs) = ∫ₛ¹ ((b+s)/2 − vs) gs(b) db for s < 1, and rs(s, vs) = 0 for s ≥ 1.

Here, gs(·) is the density function of the buyer’s offer induced by B(·) and Fs(·), i.e.,

(2)  Gs(β) = Pr(B(vb) ≤ β) = Pr(vb ≤ B⁻¹(β)) = Fs(B⁻¹(β)).

Proposition Under the sealed offer bargaining rule, the equilibrium bargaining strategies of the buyer and seller are increasing in the respective reservation prices. Prove this proposition. Hint: Note that, for example, for the buyer we have the following inequalities: rb (b,vb ) ≥ rb (b',vb ) and rb (b',vb ' ) ≥ rb (b,vb ' ).

Let us now consider the maximization problem for the seller of type vs, i.e.,

max_{s∈[0,1]} rs(s, vs) = max_{s∈[0,1]} ∫ₛ¹ ((s+b)/2 − vs) Gs(db).

Integrating by parts,

∫ₛ¹ ((s+b)/2 − vs) Gs(db) = [((s+b)/2 − vs) Gs(b)]_{b=s}^{b=1} − (1/2)∫ₛ¹ Gs(b) db
                          = ((s+1)/2 − vs) − (s − vs) Gs(s) − (1/2)∫ₛ¹ Gs(b) db.

The first order condition ∂rs(s, vs)/∂s = 0 implies that

1/2 − Gs(s) − (s − vs) gs(s) + (1/2) Gs(s) = 0.

Finally, we obtain that

(∗)  (1/2)(1 − Gs(s)) = (s − vs) gs(s).

Let us now consider the maximization problem for the buyer of type vb, i.e.,

max_{b∈[0,1]} rb(b, vb) = max_{b∈[0,1]} ∫₀ᵇ (vb − (s+b)/2) Gb(ds).

Integrating by parts,

∫₀ᵇ (vb − (s+b)/2) Gb(ds) = [(vb − (s+b)/2) Gb(s)]_{s=0}^{s=b} + (1/2)∫₀ᵇ Gb(s) ds
                          = (vb − b) Gb(b) + (1/2)∫₀ᵇ Gb(s) ds.

The first order condition ∂rb(b, vb)/∂b = 0 implies that

(vb − b) gb(b) − Gb(b) + (1/2) Gb(b) = 0.

Finally, we obtain the following condition:

(∗∗)  (vb − b) gb(b) = (1/2) Gb(b).

Let us further assume that Fb(·) and Fs(·) are uniform distributions on [0,1]. We look for linear strategies:

S(vs) = α1 + β1 vs = s,
B(vb) = α2 + β2 vb = b,

where βi ≠ 0 for i = 1, 2. Then, by (1) we get that

Gb(s) = Fb(S⁻¹(s)) = Fb((s − α1)/β1) = (s − α1)/β1,

which implies that gb(s) = 1/β1 for s ∈ [α1, α1 + β1]. Analogously, by (2) we have that

Gs(b) = Fs(B⁻¹(b)) = Fs((b − α2)/β2) = (b − α2)/β2,

which implies that gs(b) = 1/β2 for b ∈ [α2, α2 + β2]. We plug gs(·) into (∗) and gb(·) into (∗∗) and obtain:

(1/2)(1 − (s − α2)/β2) = (s − vs)/β2,

(1/β1)(vb − (α2 + β2 vb)) = (1/2)(b − α1)/β1.

After some rearrangements we obtain

vb (2 − 3β2) = 3α2 − α1,
vs (−2 + 3β1) = −3α1 + α2 + β2.

Since vb and vs vary from 0 to 1, it must hold that

• 2 − 3β2 = 0 ⇒ β2 = 2/3,  3β1 − 2 = 0 ⇒ β1 = 2/3,
• 3α2 = α1,  3α1 − α2 = β2 = 2/3,

which yields α1 = 1/4 and α2 = 1/12. Hence, we have obtained that

S(vs) = 1/4 + (2/3) vs,   B(vb) = 1/12 + (2/3) vb.

In equilibrium the parties trade if and only if

S(vs) ≤ B(vb)  ⇒  vs + 1/4 ≤ vb.

Moreover, due to the rationality assumptions made at the beginning, S(vs) ≥ vs and B(vb) ≤ vb. These inequalities imply that vs ≤ 3/4 and vb ≥ 1/4. Trade requires vb ≥ vs — and this inequality holds (see above).

The above conditions mean that the seller never sells his good below his own reservation price, and the buyer never buys the product for a higher price than his own reservation price. Clearly, trades involve vb ∈ [1/4, 1] and vs ∈ [0, 3/4].

Figure: Equilibrium strategies for the seller and the buyer

[Figure: the equilibrium strategies S(vs) = 1/4 + (2/3)vs and B(vb) = 1/12 + (2/3)vb plotted against vs, vb ∈ [0,1]; S runs from 1/4 to 11/12 and B from 1/12 to 3/4.]

What is the probability that the bargain will be reached? This probability is:

Pr(vb ≥ 1/4 + vs, vs ≤ 3/4, vb ≥ 1/4) = (*).

Clearly, vb , vs are independent random variables. Hence,

[Figure: the trade region {vb ≥ vs + 1/4} in the unit square of (vs, vb); it is a triangle with legs of length 3/4.]

Pr(vb ≥ 1/4 + vs, vs ≤ 3/4, vb ≥ 1/4) = (1/2)(3/4)² = 9/32.
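A Monte Carlo sanity check (our construction) of this probability and of the buyer's expected profit 9/128 computed below:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10**6
vs = rng.uniform(0, 1, n)
vb = rng.uniform(0, 1, n)

S = 1/4 + 2/3 * vs                 # seller's equilibrium offer
B = 1/12 + 2/3 * vb                # buyer's equilibrium offer
trade = S <= B

print(trade.mean())                # about 9/32 = 0.28125
price = (S + B) / 2                # split-the-difference rule
print(np.where(trade, vb - price, 0.0).mean())   # about 9/128 = 0.0703
```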

The expected profit for the buyer of type vb is

∫₀^{b*} (vb − (s + b*)/2) Gb(ds),

where Gb(s) = (s − 1/4)/(2/3) for s ∈ [1/4, 11/12]. Hence, with b* = (2/3)vb + 1/12,

∫_{1/4}^{(2/3)vb + 1/12} (vb − (s + (2/3)vb + 1/12)/2)·(3/2) ds = (1/2)(vb − 1/4)²,

and the ex-ante expected profit is

∫_{1/4}^1 (1/2)(vb − 1/4)² dvb = 9/128.

Calculate the profit for the seller.

INFINITE DYNAMIC GAMES LECTURE 9

ITERATED PRISONERS’ DILEMMA

• Recall the payoff matrix for the one-stage Prisoners’ Dilemma:

     Q      C
Q   (3,3)  (0,5)        Q - Keep Quiet, C - Confess
C   (5,0)  (1,1)

• In the two-stage game the pure strategy sets are {QQ, CQ, QC, CC}. We solve this game by backward induction: in the final stage there is no further interaction, so the only consequence is the payoff to be gained: (1,1). Now consider the first stage; the payoffs in this stage can be calculated by adding the payoffs for the Nash Equilibrium in the second stage (we are only interested in a Subgame Perfect Nash Equilibrium):

      QC      CC
QC   (4,4)   (1,6)        NE ⇒ (CC, CC)
CC   (6,1)   (2,2)

• Consider now an infinite number of repeats, indexed by t = 0, 1, .... If there is no end of the game, then there is no last stage to work backwards from. Since all subgames look the same, this suggests that we should look for stationary strategies.

Definition: A stationary strategy is a strategy in which the rule for choosing an action is the same in every stage.

Note that this does not necessarily mean that the action chosen in each stage must be the same!

Examples: (a) play Q in every stage; or (b) play Q if the other player has never played C, and play C otherwise. Let $s$ be a profile of strategies used by the players and $r_i^s(t)$ the payoff to Player i received at stage t of the game. Then the total payoff is $\sum_{t=0}^{\infty} r_i^s(t)$. But observe that
$$\sum_{t=0}^{\infty} r_i^{s_Q}(t) = \sum_{t=0}^{\infty} 3 = +\infty, \quad \text{where } s_Q \text{ - always play Q},$$
$$\sum_{t=0}^{\infty} r_i^{s_C}(t) = \sum_{t=0}^{\infty} 1 = +\infty, \quad \text{where } s_C \text{ - always play C}.$$
IDEA: DISCOUNT FUTURE PAYOFFS BY A DISCOUNT FACTOR $\delta \in (0,1)$. Then the discounted payoff to Player i, when the profile $s = (s_1, s_2)$ is used, equals
$$R_i(s_1, s_2) = \sum_{t=0}^{\infty} \delta^t r_i^s(t).$$
Depending on the situation, the discount factor represents, for example, the uncertainty about whether the game will continue (with probability $\delta$), or $\delta = \frac{1}{1+r}$, where r is an interest rate. In the latter case, when V is an initial amount, in consecutive periods we get $V(1+r)$, $V(1+r)^2 = (V(1+r))(1+r)$, and so on. Hence, $PV = \frac{FV}{(1+r)^T}$.
Examples:
$$R_i(s_Q, s_Q) = \sum_{t=0}^{\infty} 3\delta^t = \frac{3}{1-\delta}, \qquad R_i(s_C, s_C) = \sum_{t=0}^{\infty} \delta^t = \frac{1}{1-\delta}.$$
Let us now consider the trigger strategy (sometimes it is called GRIM):

$s_G$ - start by playing Q and continue until the other player confesses, then use C forever after.

If both players adopt this strategy, then permanent cooperation gives $R_i(s_G, s_G) = \frac{3}{1-\delta}$.

Is the strategy pair $(s_G, s_G)$ a Nash Equilibrium? Let us restrict the pure strategy sets for both players to the set $S = \{s_G, s_Q, s_C\}$.

- Suppose that Player 1 uses strategy $s_Q$ instead. Then $R_1(s_Q, s_G) = \frac{3}{1-\delta}$. Hence, Player 1 cannot do better by switching to $s_Q$.

- Suppose that Player 1 uses strategy $s_C$ instead. Then
$$R_1(s_C, s_G) = 5 + \delta + \delta^2 + \dots = 5 + \frac{\delta}{1-\delta},$$
because the sequence of actions is as follows:

  t =       0  1  2  3  4
  Player 1: C  C  C  C  C ...
  Player 2: Q  C  C  C  C ...

Hence, Player 1 cannot do better by switching to $s_C$ if
$$5 + \frac{\delta}{1-\delta} \le \frac{3}{1-\delta} \;\Rightarrow\; \delta \ge \frac{1}{2}.$$
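This threshold is easy to check numerically. The following short Python sketch (not in the original notes) compares the two discounted payoffs just computed:

```python
# Grim trigger vs. permanent defection in the iterated Prisoners' Dilemma:
# cooperation forever yields 3/(1-delta); defecting against s_G yields
# 5 + delta/(1-delta). Deviation stops paying exactly at delta = 1/2.
def R_grim_vs_grim(delta):     # R_1(s_G, s_G)
    return 3 / (1 - delta)

def R_defect_vs_grim(delta):   # R_1(s_C, s_G): 5 once, then 1 forever
    return 5 + delta / (1 - delta)

for delta in (0.3, 0.5, 0.7):
    coop, dev = R_grim_vs_grim(delta), R_defect_vs_grim(delta)
    print(f"delta={delta}: cooperate={coop:.3f}, deviate={dev:.3f}, "
          f"no profitable deviation: {dev <= coop}")
```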

Thus, the pair of strategies $(s_G, s_G)$ is a Nash Equilibrium for a sufficiently large discount factor. However, $(s_G, s_G)$ is not a Subgame Perfect Nash Equilibrium. To show this claim consider the possible cases: (i) neither player has played C, (ii) both players have played C, (iii) Player 1 used C in the last stage but Player 2 did not, (iv) Player 2 used C in the last stage but Player 1 did not.

Note that in cases (i) and (ii) there is no conflict with the concept of SPNE. Indeed, for example, in (ii) both players have played C, so the NE pair $(s_G, s_G)$ specifies that each player should play C forever. Hence, the strategy pair $(s_C, s_C)$ adopted in this case is a NE of the subgame, because it is a NE of the entire game. In case (iii), $s_G$ specifies that Player 2 should switch to using C forever. However, Player 2 has not yet played C, so Player 1 should continue to use $s_G$. Thus, a NE for the whole game specifies that the strategy pair $(s_G, s_C)$ should be adopted in the subgame. However, this pair is not a NE for the subgame, because Player 1 could obtain a greater payoff by using $s_C$. Indeed, the corresponding payoffs in this subgame are
$$R_1(s_G, s_C) = 0 + \delta + \delta^2 + \dots = \frac{\delta}{1-\delta} \quad\text{and}\quad R_1(s_C, s_C) = 1 + \delta + \delta^2 + \dots = \frac{1}{1-\delta}.$$
Although $(s_G, s_G)$ is not a SPNE, a very similar strategy leads to a SPNE when it is adopted by both players:

$s^*$ - start by being quiet and continue to play Q until either player confesses, then play C forever after.

The pair $(s^*, s^*)$ is a Subgame Perfect Nash Equilibrium, because it specifies that the players should play $(s_C, s_C)$ in the subgames (iii) and (iv). So far, we have allowed repeated games to have only a limited set of strategies. Will the Subgame Perfect Nash Equilibrium change if we allow more general strategies?

Definition: A pair of strategies (σ1,σ 2 ) satisfies the ONE-STAGE DEVIATION CONDITION if neither player can increase his payoff by deviating (unilaterally) from his strategy in any single stage and returning to the specified strategy thereafter.

Example (cont.): Does the strategy pair $(s^*, s^*)$ satisfy the one-stage deviation condition? The game can be in one of two classes of subgame: (I) both players have always used Q, or (II) at least one player has used C in a previous round.

SUBGAME (I): suppose Player 1 deviates to C at stage t and then returns to $s^*$:

  t =       t  t+1  t+2  t+3
  Player 1: C   C    C    C ...
  Player 2: Q   C    C    C ...

The deviation payoff (from stage t onwards) satisfies
$$5 + \frac{\delta}{1-\delta} \le R_1(s_Q, s_Q) = \frac{3}{1-\delta} \quad\text{for } \delta \ge \frac{1}{2}.$$

SUBGAME (II): suppose Player 1 deviates to Q at stage t and then returns to $s^*$:

  t =       t  t+1  t+2  t+3
  Player 1: Q   C    C    C ...
  Player 2: C   C    C    C ...

The deviation payoff satisfies
$$0 + \frac{\delta}{1-\delta} \le \frac{1}{1-\delta} = 1 + \frac{\delta}{1-\delta} \quad\text{for } \delta < 1.$$

Hence, $(s^*, s^*)$ satisfies the one-stage deviation condition provided that $\frac{1}{2} \le \delta < 1$.

THEOREM

A pair of strategies is a Subgame Perfect Nash Equilibrium for a discounted repeated game if and only if it satisfies the one-stage deviation condition.

This theorem is called the one-stage deviation principle. Proof: For finitely repeated games, the equivalence of subgame perfection and the one-stage deviation condition is obtained via backward induction (it shows that a SPNE payoff cannot be improved by deviating in any finite number of stages).

Because the definition of subgame perfection implies the one-stage deviation condition for finitely and infinitely repeated games, it only remains to prove that the one-stage deviation condition implies SPNE in infinitely repeated games.

The proof of this fact is by contradiction. Suppose that a strategy pair $(\sigma_1, \sigma_2)$ satisfies the one-stage deviation condition but is not a SPNE. It follows that there is some stage t at which it would be better for one of the players, say Player 1, to adopt a different strategy $\hat\sigma_1$.

This means that there exists ε > 0 such that

R1 (σˆ 1,σ 2 ) − R1(σ1,σ 2 ) > 2ε.

Now consider a strategy $\sigma_1'$ defined as follows:
• $\sigma_1' = \hat\sigma_1$ from stage t to stage T, and
• $\sigma_1' = \sigma_1$ from stage T onwards.
Because the future payoffs are discounted,
$$R_1(\hat\sigma_1, \sigma_2) - R_1(\sigma_1', \sigma_2) \propto \delta^{T-t},$$
so we can choose T sufficiently large such that

R1 (σˆ 1,σ 2 ) − R1(σ '1,σ 2 ) < ε and

R1 (σ '1,σ 2 ) − R1(σˆ 1,σ 2 ) > −ε . Combining these inequalities we get that

R1 (σ '1,σ 2 ) − R1(σ1,σ 2 ) > ε .

But $\sigma_1$ and $\sigma_1'$ differ at only a finite number of stages. Therefore, this inequality contradicts the one-stage deviation principle for finitely repeated games.

Hence, $(\sigma_1, \sigma_2)$ cannot satisfy the one-stage deviation condition without being a Subgame Perfect Nash Equilibrium. ∎

STOCHASTIC GAMES

LECTURE 10

PRELIMINARIES
• $X$ - a Borel space, i.e., a Borel subset of a complete separable metric space;
• $\{X_n\}$ - a sequence of Borel spaces, $X_n \neq \emptyset$ for every $n \in \mathbb{N}$; put $X^\infty = X_1 \times X_2 \times \dots = \prod_{n=1}^{\infty} X_n$; obviously, $x = \{x_n\} \in X^\infty$ with $x_n \in X_n$ for every $n \in \mathbb{N}$;
• $\mathcal{B}(X_n)$ - the Borel sets in $X_n$;
• cylinder sets in $X^\infty$:
$$C_k = D_1 \times \dots \times D_k \times X_{k+1} \times X_{k+2} \times \dots \qquad (*)$$
where $D_i \in \mathcal{B}(X_i)$ for $i = 1, \dots, k$;
• $\mathcal{B}(X^\infty)$ - the smallest $\sigma$-algebra in $X^\infty$ containing the cylinder sets.

DATA:

$\{p_n\}$ - a sequence of conditional probabilities, where $p_1$ is a probability measure on $X_1$, and $p_n(\cdot \mid x_1, \dots, x_{n-1})$ is a probability measure on $X_n$ given $(x_1, \dots, x_{n-1})$ for $n \ge 2$.

For every $D \in \mathcal{B}(X_n)$, $p_n(D \mid x_1, \dots, x_{n-1})$ is a measurable function on $X_1 \times \dots \times X_{n-1}$.

THE IONESCU-TULCEA THEOREM

For a sequence of conditional probability measures $p = \{p_n\}$ there exists a unique probability measure $P^p$ on $\mathcal{B}(X^\infty)$ such that for any cylinder set $C_k \subset X^\infty$ defined in $(*)$ it holds that
$$P^p(C_k) = \int_{D_1} \dots \int_{D_{k-1}} p_k(D_k \mid x_1, \dots, x_{k-1})\; p_{k-1}(dx_{k-1} \mid x_1, \dots, x_{k-2}) \cdots p_1(dx_1).$$
For example, if $k = 2$, then
$$P^p(C_2) = \int_{X_1} \int_{X_2} \mathbb{1}_{D_2}(x_2)\, \mathbb{1}_{D_1}(x_1)\; p_2(dx_2 \mid x_1)\, p_1(dx_1).$$

THE MODEL OF A STOCHASTIC GAME

• $S$ - a countable state space;
• $A, B$ - finite sets of actions for Players 1 and 2, respectively;
• $q(z \mid s, a, b)$ - a transition probability from state s to z, when Player 1 picks an action a and Player 2 chooses b;
• $r_i : S \times A \times B \to \mathbb{R}$ - a one-step bounded payoff function for Player i.

Evolution of the game: The players observe an initial state $s_1 \in S$ and select independently actions $a_1 \in A$ and $b_1 \in B$. The actions can be chosen by a random device with assigned probabilities.

Then Player i receives the payoff $r_i(s_1, a_1, b_1)$. If $r_i(s_1, a_1, b_1) < 0$, it means that it is a cost he has to pay. The game moves to the next state $s_2$ according to $q(s_2 \mid s_1, a_1, b_1)$. Remembering $s_1, a_1, b_1$ and observing $s_2$, the players pick $a_2 \in A$ and $b_2 \in B$, Player i receives $r_i(s_2, a_2, b_2)$, and the game moves to the next state $s_3$ according to $q(s_3 \mid s_2, a_2, b_2)$. In this way we obtain a trajectory of a stochastic process $h = (s_1, a_1, b_1, s_2, a_2, b_2, \dots)$.

Let hn = (s1,a1,b1,...,sn−1 ,an−1 ,bn−1 ,sn ) be a history up to the nth state for n ≥ 2 and put h1 = (s1 ). Figure: Evolution of the game

[Figure: at each stage n the players choose $a_n$ and $b_n$, Player i receives $r_i(s_n, a_n, b_n)$, and the next state is drawn from $q(\cdot \mid s_n, a_n, b_n)$.]

A STRATEGY FOR PLAYER 1 is a sequence $\pi = \{\pi_n\}$, where $\pi_n(\cdot \mid h_n) \in \Pr(A)$ for every $n \in \mathbb{N}$. A STRATEGY FOR PLAYER 2 is a sequence $\gamma = \{\gamma_n\}$, where $\gamma_n(\cdot \mid h_n) \in \Pr(B)$ for every $n \in \mathbb{N}$.

Denote by Π the set of all strategies for Player 1 and Γ the set of all strategies for Player 2.

The existence of a stochastic process is guaranteed by the Ionescu-Tulcea Theorem. For any initial state $s := s_1 \in S$ and strategies $\pi = \{\pi_n\}$ and $\gamma = \{\gamma_n\}$, there exists a unique probability measure $P_s^{\pi\gamma}$ on $(\Omega, \mathcal{B}(\Omega))$, where $\Omega = (S \times A \times B)^\infty$ and $\mathcal{B}(\Omega)$ is the $\sigma$-algebra generated by the cylinder ("history") sets in $\Omega$.

$C_k$ - the cylinder set in $\Omega$, the event that the game has a history $h_k$, i.e.,
$$C_k = (\underbrace{s_1, a_1, b_1, \dots, s_{k-1}, a_{k-1}, b_{k-1}, s_k}_{h_k}, A, B, S, A, B, S, \dots).$$
Then,
$$P_s^{\pi\gamma}(C_k) = q(s_k \mid s_{k-1}, a_{k-1}, b_{k-1})\,\pi_{k-1}(a_{k-1} \mid h_{k-1})\,\gamma_{k-1}(b_{k-1} \mid h_{k-1}) \cdots q(s_2 \mid s_1, a_1, b_1)\,\pi_1(a_1 \mid s_1)\,\gamma_1(b_1 \mid s_1).$$

$r_i^n(s, \pi, \gamma)$ - the expected payoff for Player i at the nth stage of the game, when the players use strategies $\pi$ and $\gamma$. For example,
$$r_i^3(s, \pi, \gamma) = \sum_{a_1, b_1} \sum_{s_2} \sum_{a_2, b_2} \sum_{s_3} \sum_{a_3, b_3} r_i(s_3, a_3, b_3)\,\pi_3(a_3 \mid h_3)\,\gamma_3(b_3 \mid h_3)\; q(s_3 \mid s_2, a_2, b_2)\,\pi_2(a_2 \mid h_2)\,\gamma_2(b_2 \mid h_2)\; q(s_2 \mid s_1, a_1, b_1)\,\pi_1(a_1 \mid s_1)\,\gamma_1(b_1 \mid s_1).$$
Here, $s = s_1$.

(I) FINITE HORIZON MODEL
$$J_i^N(s, \pi, \gamma) = \sum_{n=1}^{N} \delta^{n-1} r_i^n(s, \pi, \gamma),$$
where $\delta \in (0, 1]$ is a discount coefficient.

(II) INFINITE HORIZON MODEL
$$J_i(s, \pi, \gamma) = \sum_{n=1}^{\infty} \delta^{n-1} r_i^n(s, \pi, \gamma).$$

Let us now get rid of one player, say Player 2, and examine the model with one decision maker. It is known as a MARKOV DECISION PROCESS, abbr. MDP. MDP: $(S, A, q, r)$

• $S$ - a denumerable state space;
• $A$ - a finite set of actions for the decision maker;
• $q(z \mid s, a)$ - a transition probability from state s to z;
• $r : S \times A \to \mathbb{R}$ - a one-step bounded payoff function.
The evolution of the process develops analogously to that of the game.

Let $f : S \to A$ be a decision function. A strategy $\pi = \{f, f, \dots\}$ is called stationary. We shall further identify $\pi = \{f, f, \dots\}$ with its member f. In other words, a decision maker using a stationary strategy looks only at the current state of the system and based on this knowledge chooses an action. We shall consider non-randomised strategies, which will turn out to be sufficient for studying MDP models.

Definition: Let u be a bounded function on S. For any stationary strategy f define the operator $L_f$ as follows:
$$(L_f u)(s) = r(s, f(s)) + \delta \sum_{z \in S} u(z)\, q(z \mid s, f(s)).$$

Let us now examine consecutive compositions of $L_f$ with itself:
$$(L_f^2 u)(s) = r(s, f(s)) + \delta \sum_{z \in S} (L_f u)(z)\, q(z \mid s, f(s)) = r(s, f(s)) + \delta \sum_{z \in S} \Big( r(z, f(z)) + \delta \sum_{z' \in S} u(z')\, q(z' \mid z, f(z)) \Big)\, q(z \mid s, f(s))$$
$$= J^2(s, f) + \delta^2 \sum_{z \in S} \sum_{z' \in S} u(z')\, q(z' \mid z, f(z))\, q(z \mid s, f(s)) = J^2(s, f) + \delta^2 \sum_{z \in S} u(z)\, q^2(z \mid s, f(s)).$$

Hence, for an arbitrary $n \in \mathbb{N}$ we have
$$(L_f^n u)(s) = J^n(s, f) + \delta^n \sum_{z \in S} u(z)\, q^n(z \mid s, f(s)).$$
Here, $q^n$ denotes the transition after n periods when the stationary strategy $\pi = \{f, f, \dots\}$ is used, and $J^n(s, f)$ represents the corresponding expected discounted payoff in the n-step model. Since $|u(z)| < K$ for all $z \in S$ and some positive constant K, we have
$$\Big| \delta^n \sum_{z \in S} u(z)\, q^n(z \mid s, f(s)) \Big| < \delta^n K \to 0 \quad \text{as } n \to \infty.$$
Hence,
$$\lim_{n \to \infty} (L_f^n u)(s) = \lim_{n \to \infty} J^n(s, f) = J(s, f) = \sum_{n=1}^{\infty} \delta^{n-1} r^n(s, f).$$
$J(s, f)$ is the expected discounted payoff in the infinite horizon model.

CLAIM 1: Let $w_i : S \times A \to \mathbb{R}$, $i = 1, 2$, be bounded functions. Then,
$$\Big| \max_{a \in A} w_1(s, a) - \max_{a \in A} w_2(s, a) \Big| \le \max_{a \in A} \big| w_1(s, a) - w_2(s, a) \big|.$$

Proof: Note that
$$\max_{a \in A} w_1(s, a) = \max_{a \in A} \big( w_1(s, a) - w_2(s, a) + w_2(s, a) \big) \le \max_{a \in A} \big( w_1(s, a) - w_2(s, a) \big) + \max_{a \in A} w_2(s, a).$$
Hence, $\max_{a} w_1(s, a) - \max_{a} w_2(s, a) \le \max_{a} \big| w_1(s, a) - w_2(s, a) \big|$. The rest follows from symmetry. ∎

THE BANACH FIXED POINT THEOREM

Let $X$ be a complete metric space and $T : X \to X$ a contractive operator, i.e., there exists a constant $\delta \in (0,1)$ such that $d(T(x), T(y)) \le \delta\, d(x, y)$ for all $x, y \in X$ (here $d$ is the metric on $X$). Then:
(I) there exists a unique point $x^* \in X$ such that $x^* = T(x^*)$, and
(II) $x^* = \lim_{n \to \infty} T^n x_0$ for any $x_0 \in X$.

APPLICATION TO MDP

Let $B(S)$ be the set of all bounded functions on S (bounded sequences) equipped with the supremum metric
$$d(u_1, u_2) = \sup_{s \in S} |u_1(s) - u_2(s)|.$$
Then $B(S)$ is a Banach space. Let $T : B(S) \to B(S)$ be the operator defined as follows:
$$(Tu)(s) = \max_{a \in A} \Big[ r(s, a) + \delta \sum_{z \in S} u(z)\, q(z \mid s, a) \Big].$$
CLAIM 2: T is a contraction operator.

Proof: Note that by virtue of Claim 1 we have
$$\big| (Tu_1)(s) - (Tu_2)(s) \big| \le \max_{a \in A} \delta \sum_{z \in S} \big| u_1(z) - u_2(z) \big|\, q(z \mid s, a).$$
Further, we get that
$$\big| (Tu_1)(s) - (Tu_2)(s) \big| \le \max_{a \in A} \delta \sup_{z \in S} |u_1(z) - u_2(z)| \sum_{z \in S} q(z \mid s, a) \le \delta\, d(u_1, u_2).$$
Hence, $d(Tu_1, Tu_2) \le \delta\, d(u_1, u_2)$. ∎

By the Banach Fixed Point Theorem there exists a unique function $u^* \in B(S)$ such that
$$u^*(s) = \max_{a \in A} \Big[ r(s, a) + \delta \sum_{z \in S} u^*(z)\, q(z \mid s, a) \Big]$$
for all $s \in S$. This equation is called THE BELLMAN EQUATION, and the operator T THE DYNAMIC PROGRAMMING OPERATOR.

STOCHASTIC GAMES

LECTURE 11

Recall the Bellman equation:

$$u^*(s) = \max_{a \in A} \Big[ r(s, a) + \delta \sum_{z \in S} u^*(z)\, q(z \mid s, a) \Big].$$
Since A is finite, there exists a decision rule $f : S \to A$ such that
$$u^*(s) = r(s, f(s)) + \delta \sum_{z \in S} u^*(z)\, q(z \mid s, f(s)) = (L_f u^*)(s).$$
Thus, $u^*(s) = (L_f^n u^*)(s)$ for every $n \ge 1$ and all $s \in S$. From the previous lecture we know that
$$u^*(s) = \lim_{n \to \infty} (L_f^n u^*)(s) = J(s, f),$$
where $J(s, f)$ is the discounted payoff over the infinite horizon when the stationary strategy $\{f, f, \dots\}$ is used.

On the other hand, for any $a \in A$, we may write
$$u^*(s) \ge r(s, a) + \delta \sum_{z \in S} u^*(z)\, q(z \mid s, a).$$
Iterating this inequality we get that
$$u^*(s) \ge r(s, a_1) + \delta \sum_{s_2 \in S} \Big[ r(s_2, a_2) + \delta \sum_{s_3 \in S} u^*(s_3)\, q(s_3 \mid s_2, a_2) \Big] q(s_2 \mid s, a_1).$$
Hence,
$$u^*(s) \ge J^2(s, \pi) + \delta^2 \sum_{s_2, s_3} u^*(s_3)\, q(s_3 \mid s_2, a_2)\, q(s_2 \mid s, a_1),$$
and for any $n \ge 1$,
$$u^*(s) \ge J^n(s, \pi) + \delta^n \sum_{s_2, \dots, s_{n+1}} u^*(s_{n+1})\, q(s_{n+1} \mid s_n, a_n) \cdots q(s_2 \mid s, a_1).$$
Clearly, the second term tends to 0 (because $u^* \in B(S)$). Letting $n \to \infty$, we conclude that $u^*(s) \ge J(s, \pi)$ for each $s \in S$ and any strategy $\pi \in \Pi$.

Summing up: $J(s, f) = u^*(s) \ge J(s, \pi)$ for any $\pi \in \Pi$ and all $s \in S$;
$$u^*(s) = \max_{\pi \in \Pi} J(s, \pi) = J(s, f) \quad \text{- the optimal payoff function;}$$
$$\{f, f, f, \dots\} \quad \text{- an optimal stationary strategy.}$$
From the Banach Fixed Point Theorem (see point (II)) we know that $u^*(s) = \lim_{n \to \infty} (T^n u_0)(s)$, $s \in S$, for any $u_0 \in B(S)$.

Let us put $u_0 \equiv 0$ and examine $T^n u_0$.

• $(Tu_0)(s) = \max_{a \in A} r(s, a)$ - the optimal payoff in the 1-step model; there exists $f_1 : S \to A$ such that $(Tu_0)(s) = (L_{f_1} u_0)(s) = r(s, f_1(s))$.

• $(T^2 u_0)(s) = \max_{a \in A} \big[ r(s, a) + \delta \sum_{z \in S} (Tu_0)(z)\, q(z \mid s, a) \big]$ - the optimal payoff in the 2-step model; there exists $f_2 : S \to A$ such that
$$(T^2 u_0)(s) = r(s, f_2(s)) + \delta \sum_{z \in S} (Tu_0)(z)\, q(z \mid s, f_2(s)),$$
and $(T^2 u_0)(s) = (L_{f_2}(L_{f_1} u_0))(s) = J^2(s, \pi^*)$, where $\pi^* = \{f_2, f_1\}$ is an optimal strategy for the two-step game.

• $(T^n u_0)(s) = (L_{f_n}(\cdots L_{f_2}(L_{f_1} u_0)))(s) = J^n(s, \pi^*)$ - the optimal payoff in the n-step model, where $\pi^* = \{f_n, \dots, f_1\}$ is an optimal strategy in this model.

CLAIM 3: $\sup_{\pi \in \Pi} J^n(s, \pi) \to \sup_{\pi \in \Pi} J(s, \pi)$ as $n \to \infty$.

Proof: By Claim 1, it follows that
$$\Big| \sup_{\pi \in \Pi} J^n(s, \pi) - \sup_{\pi \in \Pi} J(s, \pi) \Big| \le \sup_{\pi \in \Pi} \big| J^n(s, \pi) - J(s, \pi) \big|.$$
The right-hand side equals
$$\sup_{\pi \in \Pi} \Big| \sum_{k=n+1}^{\infty} \delta^{k-1} r^k(s, \pi) \Big| \le \sup_{\pi \in \Pi} \sum_{k=n+1}^{\infty} \delta^{k-1} C = \frac{C \delta^n}{1 - \delta} \to 0.$$
Here, C is a constant such that $|r(s, a)| \le C$ for any $s \in S$, $a \in A$, and consequently $|r^k(s, \pi)| \le C$ for any $k \in \mathbb{N}$. ∎

CONCLUDING REMARKS FOR MDP MODELS

For a discounted MDP with an infinite time horizon the following facts hold:

(I) There exists a solution $u^*$ to the Bellman Equation
$$u^*(s) = \max_{a \in A} \Big[ r(s, a) + \delta \sum_{z \in S} u^*(z)\, q(z \mid s, a) \Big],$$
i.e., $u^*(s) = (Tu^*)(s)$; moreover, $u^*$ is the optimal discounted payoff, i.e., $u^*(s) = \sup_{\pi \in \Pi} J(s, \pi)$;

(II) There exists an optimal stationary strategy $\pi = \{f, f, \dots\}$, where the decision function $f : S \to A$ attains the maximum on the right-hand side of the Bellman equation;

(III) The infinite horizon model can be approximated by the finite models:
$$u^*(s) = \lim_{n \to \infty} (T^n u_0)(s), \qquad u_0 \equiv 0,$$
where $(T^n u_0)(s)$ is the optimal discounted payoff in the n-step model.

This algorithm is known as value iteration, since the function $u^*$ is also named the value function. For every n-step horizon model there exists an optimal strategy $\pi^*$ obtained by backward induction. Each decision rule in $\pi^*$ depends on the current state and the moment/epoch at which the action is selected. Such a strategy is called a Markov strategy.
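The value iteration scheme $u_{n+1} = Tu_n$, $u_0 \equiv 0$, is straightforward to implement for a finite MDP. Below is a minimal Python sketch; the two-state, two-action data are invented for illustration and are not from the lecture.

```python
# Value iteration for a finite MDP (S, A, q, r): iterate u <- Tu until the
# sup-norm change is negligible, then read off an optimal stationary rule f.
import numpy as np

delta = 0.9
r = np.array([[1.0, 0.0],       # r[s, a]: one-step payoffs (illustrative)
              [2.0, 0.5]])
q = np.array([[[0.8, 0.2],      # q[s, a, z]: next-state distributions
               [0.1, 0.9]],
              [[0.5, 0.5],
               [0.9, 0.1]]])

u = np.zeros(2)                 # u_0 = 0
for n in range(1000):
    Q = r + delta * q @ u       # Q[s, a] = r(s,a) + delta * sum_z u(z) q(z|s,a)
    u_new = Q.max(axis=1)       # (T u)(s)
    if np.abs(u_new - u).max() < 1e-10:
        break
    u = u_new

f = Q.argmax(axis=1)            # an optimal stationary decision rule
print("u* =", u, " optimal f =", f)
```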

The beginnings of the theory of stochastic games, as well as of Markov decision processes, date back to L.S. Shapley's paper "Stochastic games", Proc. Nat. Acad. Sci. USA, vol. 39 (1953), pp. 1095-1100. Shapley defined an infinite game in which the probability of continuation is $\delta$ (our discount coefficient).

ZERO-SUM DISCOUNTED STOCHASTIC GAMES

$(S, A, B, q, r)$, where S is the state space, A and B are the action spaces, q is the transition probability, and
$r : S \times A \times B \to \mathbb{R}$ is a one-step bounded reward function for Player 1 and cost function for Player 2.

Since we deal with a game, we allow the players to randomize their choices! Recall that $h_n = (s_1, a_1, b_1, \dots, s_{n-1}, a_{n-1}, b_{n-1}, s_n)$.

Player 1 → Maximizer: strategy $\pi = (\pi_1, \pi_2, \dots) \in \Pi$, where $\pi_n(\cdot \mid h_n) \in \Pr(A)$; stationary strategy $\pi = (f, f, \dots) \in F$, where $f(\cdot \mid s) \in \Pr(A)$.
Player 2 → Minimizer: strategy $\gamma = (\gamma_1, \gamma_2, \dots) \in \Gamma$, where $\gamma_n(\cdot \mid h_n) \in \Pr(B)$; stationary strategy $\gamma = (g, g, \dots) \in G$, where $g(\cdot \mid s) \in \Pr(B)$.

The discounted payoff function for Player 1 (cost function for Player 2) is
$$J(s, \pi, \gamma) = \sum_{n=1}^{\infty} \delta^{n-1} r^n(s, \pi, \gamma),$$
where $r^n(s, \pi, \gamma)$ is the expected reward (for Player 1) at the nth step.

THEOREM

(I) A discounted zero-sum stochastic game has a value V, i.e.,
$$V(s) := \sup_{\pi \in \Pi} \inf_{\gamma \in \Gamma} J(s, \pi, \gamma) = \inf_{\gamma \in \Gamma} \sup_{\pi \in \Pi} J(s, \pi, \gamma).$$

(II) Moreover, each player possesses an optimal stationary strategy, that is, there exists $f^* \in F$ such that $V(s) = \inf_{\gamma \in \Gamma} J(s, f^*, \gamma)$, and there exists $g^* \in G$ such that $V(s) = \sup_{\pi \in \Pi} J(s, \pi, g^*)$ for all $s \in S$.

Proof: • Define the operator $T : B(S) \to B(S)$ as follows:
$$(Tu)(s) := \max_{\mu \in \Pr(A)} \min_{\rho \in \Pr(B)} \Big[ r(s, \mu, \rho) + \delta \sum_{z \in S} u(z)\, q(z \mid s, \mu, \rho) \Big].$$
By the von Neumann Theorem we may exchange the order of min and max and write this definition as
$$(Tu)(s) = \operatorname{val} \Big[ r(s, \cdot, \cdot) + \delta \sum_{z \in S} u(z)\, q(z \mid s, \cdot, \cdot) \Big].$$
Here, $r(s, \mu, \rho) = \sum_{a \in A} \sum_{b \in B} r(s, a, b)\, \rho(b)\, \mu(a)$, where $\mu(a)$ is the probability of choosing an action a and $\rho(b)$ is the probability of selecting b. A similar formula applies to $q(\cdot \mid s, \mu, \rho)$.

• Next we have to show that T is contractive, that is, for any $u_1, u_2 \in B(S)$ it holds that
$$\sup_{s \in S} \big| (Tu_1)(s) - (Tu_2)(s) \big| \le \delta \sup_{s \in S} \big| u_1(s) - u_2(s) \big|.$$
The proof of this fact is similar to the MDP case and follows from the fact that $\min \varphi = -\max(-\varphi)$. The Banach Fixed Point Theorem implies that there exists $v \in B(S)$ such that $v(s) = (Tv)(s)$ for all $s \in S$. We choose $f(\cdot \mid s) \in \Pr(A)$ such that
$$v(s) = \min_{\rho \in \Pr(B)} \Big[ r(s, f(\cdot \mid s), \rho) + \delta \sum_{z \in S} v(z)\, q(z \mid s, f(\cdot \mid s), \rho) \Big]. \qquad (*)$$

Hence, $f(\cdot \mid s)$ realizes the maximum of the minimum. Let us now consider the MDP for Player 2 defined by the objects $(S, \Pr(B), q, r)$, where
$$q(\cdot \mid s, \rho) = q(\cdot \mid s, f(\cdot \mid s), \rho) = \sum_{a,b} q(\cdot \mid s, a, b)\, f(a \mid s)\, \rho(b),$$
and
$$r(s, \rho) = r(s, f(\cdot \mid s), \rho) = \sum_{a,b} r(s, a, b)\, f(a \mid s)\, \rho(b).$$
Hence, $(*)$ takes the form
$$v(s) = \min_{\rho \in \Pr(B)} \Big[ r(s, \rho) + \delta \sum_{z \in S} v(z)\, q(z \mid s, \rho) \Big] \le r(s, \rho) + \delta \sum_{z \in S} v(z)\, q(z \mid s, \rho) \quad \forall \rho \in \Pr(B).$$
Iterating the last inequality, we infer that $v(s) \le J(s, \gamma) = J(s, f, \gamma)$ for every $\gamma \in \Gamma$ and $s \in S$. This means that
$$v(s) \le \inf_{\gamma \in \Gamma} J(s, f, \gamma) \le \sup_{\pi \in \Pi} \inf_{\gamma \in \Gamma} J(s, \pi, \gamma).$$
Now we choose $g(\cdot \mid s) \in \Pr(B)$ such that
$$v(s) = \max_{\mu \in \Pr(A)} \Big[ r(s, \mu, g(\cdot \mid s)) + \delta \sum_{z \in S} v(z)\, q(z \mid s, \mu, g(\cdot \mid s)) \Big]. \qquad (**)$$

Hence, $g(\cdot \mid s)$ realizes the minimum of the maximum. Let us now consider the MDP for Player 1 defined by the objects $(S, \Pr(A), \hat q, \hat r)$, where
$$\hat q(\cdot \mid s, \mu) = q(\cdot \mid s, \mu, g(\cdot \mid s)) = \sum_{a,b} q(\cdot \mid s, a, b)\, \mu(a)\, g(b \mid s),$$
and
$$\hat r(s, \mu) = r(s, \mu, g(\cdot \mid s)) = \sum_{a,b} r(s, a, b)\, \mu(a)\, g(b \mid s).$$
Hence, $(**)$ takes the form
$$v(s) = \max_{\mu \in \Pr(A)} \Big[ \hat r(s, \mu) + \delta \sum_{z \in S} v(z)\, \hat q(z \mid s, \mu) \Big] \ge \hat r(s, \mu) + \delta \sum_{z \in S} v(z)\, \hat q(z \mid s, \mu) \quad \forall \mu \in \Pr(A).$$
Iterating the last inequality, we deduce that $v(s) \ge \hat J(s, \pi) = J(s, \pi, g)$ for every $\pi \in \Pi$ and $s \in S$. This means that
$$v(s) \ge \sup_{\pi \in \Pi} J(s, \pi, g) \ge \inf_{\gamma \in \Gamma} \sup_{\pi \in \Pi} J(s, \pi, \gamma).$$
• Summing up, we obtain that
$$\inf_{\gamma \in \Gamma} \sup_{\pi \in \Pi} J(s, \pi, \gamma) \le \sup_{\pi \in \Pi} J(s, \pi, g) \le v(s) \le \inf_{\gamma \in \Gamma} J(s, f, \gamma) \le \sup_{\pi \in \Pi} \inf_{\gamma \in \Gamma} J(s, \pi, \gamma). \qquad (\star)$$
But it is always true that
$$\inf_{\gamma \in \Gamma} \sup_{\pi \in \Pi} J(s, \pi, \gamma) \ge \sup_{\pi \in \Pi} \inf_{\gamma \in \Gamma} J(s, \pi, \gamma),$$
thus we must have equalities in all places in $(\star)$. Hence, v is the value of the game, and optimal stationary strategies for Players 1 and 2 are $f^* := f$ and $g^* := g$, which are obtained from the corresponding Bellman equations. ∎
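The proof above is constructive: iterating the operator T converges to the value V. The Python sketch below (not part of the original notes) implements this iteration, computing the auxiliary matrix-game value at each state by linear programming; the two-state game data are invented for illustration.

```python
# Shapley's iteration v <- Tv for a zero-sum discounted stochastic game:
# at each state solve the matrix game r(s,.,.) + delta * sum_z v(z) q(z|s,.,.)
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(M):
    """Value of the zero-sum matrix game M (row player maximizes)."""
    shift = M.min() - 1.0                  # make all entries >= 1, value > 0
    A = M - shift
    m, n = A.shape
    # min 1'x  s.t.  A' x >= 1, x >= 0; then value = 1/sum(x)
    res = linprog(np.ones(m), A_ub=-A.T, b_ub=-np.ones(n),
                  bounds=[(0, None)] * m, method="highs")
    return 1.0 / res.x.sum() + shift

def shapley_value_iteration(r, q, delta, tol=1e-9):
    """r[s]: payoff matrix at state s; q[s,a,b,z]: transition probabilities."""
    v = np.zeros(len(r))
    while True:
        v_new = np.array([matrix_game_value(r[s] + delta * q[s] @ v)
                          for s in range(len(r))])
        if np.abs(v_new - v).max() < tol:
            return v_new
        v = v_new

# Illustrative two-state example (numbers made up):
r = np.array([[[1.0, -1.0], [0.0, 2.0]],
              [[0.0,  1.0], [1.0, 0.0]]])
q = np.zeros((2, 2, 2, 2))
q[..., 0], q[..., 1] = 0.5, 0.5            # always a 50/50 transition
print("V =", shapley_value_iteration(r, q, delta=0.8))
```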

Further extensions

Instead of zero-sum discounted stochastic games one may consider games with the ergodic payoff
$$w(s, \pi, \gamma) := \limsup_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} r^k(s, \pi, \gamma)$$
with a finite state space S. Mertens and Neyman (1981) proved that $\lim_{\delta \to 1^-} (1 - \delta) v_\delta(s)$ is the value of the game with ergodic payoff, where $v_\delta$ is the value of the $\delta$-discounted game. Earlier, Bewley and Kohlberg (1976) showed that this limit always exists.

FISH WAR GAME

LECTURE 12

A NOTE ON AN EQUILIBRIUM IN THE GREAT FISH WAR GAME, A.S. Nowak, Econ. Bull. 17 (2006).

We consider the following non-zero-sum stochastic game:
• There are two players, who extract a renewable resource, e.g., fish;
• The resource stock develops over time according to the growth rule $x_{t+1} = x_t^\alpha$, $0 < \alpha < 1$, $t = 1, 2, \dots$, with $x_t \in X := [0,1]$ - the state space;
• The utility (payoff) function for Player i (i = 1, 2) is $u^i(c) = \ln c$, $c \in X$.

Let $c_t^i$ be the resource consumption of Player i in period t. The aim of Player i is
$$\max U^i(c_1^i, c_2^i, \dots) \quad \text{subject to} \quad x_{t+1} = (x_t - c_t^1 - c_t^2)^\alpha.$$
The strategies for the players are defined in the usual manner. Let $\pi = (\pi^1, \pi^2)$ be any strategy profile.

We shall consider discounted utilities:
(a) in the finite horizon game, $U_N^i(x, \pi) = \sum_{t=1}^{N} \delta^{t-1} \ln c_t^i$;
(b) in the infinite horizon game, $U^i(x, \pi) = \sum_{t=1}^{\infty} \delta^{t-1} \ln c_t^i$.
AIM: Find a Nash Equilibrium in games (a) and (b).

Theorem 1:

The finite N-step game has a symmetric Nash Equilibrium $\pi^* = \{c_N, c_{N-1}, \dots, c_1\}$:
$$c_1(x) = \frac{x}{2}, \qquad c_t(x) = \frac{x}{2 + \alpha\delta + \dots + (\alpha\delta)^{t-1}}, \quad t = 2, \dots, N.$$
The equilibrium payoff function in the k-period game, $k \in \{1, \dots, N\}$, is $V_k(x) = U_k^i(x, \pi^*)$, where
$$V_1(x) = \ln\frac{x}{2}, \qquad V_{k+1}(x) = \big(1 + \alpha\delta + \dots + (\alpha\delta)^k\big)\ln x + \ln B_{k+1},$$
and $B_1 = \frac{1}{2}$,
$$B_{k+1} = \frac{B_k^\delta\, \big(\alpha\delta + \dots + (\alpha\delta)^k\big)^{\alpha\delta + \dots + (\alpha\delta)^k}}{\big(2 + \alpha\delta + \dots + (\alpha\delta)^k\big)^{1 + \alpha\delta + \dots + (\alpha\delta)^k}}.$$
Moreover, for $n \in \{1, \dots, N-1\}$,
$$V_{n+1}(x) = \ln c_{n+1}(x) + \delta V_n\big((x - 2c_{n+1}(x))^\alpha\big) = \max_{0 \le c \le x} \big[ \ln c + \delta V_n\big((x - c_{n+1}(x) - c)^\alpha\big) \big].$$
Proof: The proof is by backward induction.
• $V_1(x) = \max_{0 \le c \le x/2} [\ln c] = \ln\frac{x}{2}$ - there is nothing afterwards and the players divide the amount x equally between them. Thus, $c_1(x) = \frac{x}{2}$.

• $V_2(x) = \max_{0 \le c \le x} [\ln c + \delta V_1(y)] = \max_{0 \le c \le x} \Big[ \ln c + \delta \ln \frac{(x - c - c^*)^\alpha}{2} \Big]$.
The first order condition gives $\frac{1}{c} - \frac{\alpha\delta}{x - c - c^*} = 0$. Similarly, by symmetry, for Player 2 we have $\frac{1}{c^*} - \frac{\alpha\delta}{x - c - c^*} = 0$. Since we look for a symmetric equilibrium, we get that
$$c_2(x) := c = c^* = \frac{x}{2 + \alpha\delta} < \frac{x}{2}.$$
Thus,
$$V_2(x) = \ln\frac{x}{2 + \alpha\delta} + \alpha\delta \ln\Big(\frac{\alpha\delta\, x}{2 + \alpha\delta}\Big) + \delta\ln\frac{1}{2} = (1 + \alpha\delta)\ln x + \ln \underbrace{\frac{(\alpha\delta)^{\alpha\delta}\,(1/2)^\delta}{(2 + \alpha\delta)^{1 + \alpha\delta}}}_{B_2}.$$

• $V_3(x) = \max_{0 \le c \le x} [\ln c + \delta V_2(y)]$.
Applying again the FOC and the fact that $c = c^*$, we get that $\frac{1}{c} = \frac{\alpha\delta(1 + \alpha\delta)}{x - 2c}$ and, consequently,
$$c_3(x) := c = \frac{x}{2 + \alpha\delta(1 + \alpha\delta)}.$$
Putting this formula into the above equation we obtain
$$V_3(x) = \ln\frac{x}{2 + \alpha\delta + (\alpha\delta)^2} + \alpha\delta(1 + \alpha\delta)\ln\Big(\frac{\big(\alpha\delta + (\alpha\delta)^2\big)\, x}{2 + \alpha\delta + (\alpha\delta)^2}\Big) + \delta \ln B_2$$
$$= \big(1 + \alpha\delta + (\alpha\delta)^2\big)\ln x + \ln\frac{B_2^\delta\, \big(\alpha\delta + (\alpha\delta)^2\big)^{\alpha\delta + (\alpha\delta)^2}}{\big(2 + \alpha\delta + (\alpha\delta)^2\big)^{1 + \alpha\delta + (\alpha\delta)^2}} = \big(1 + \alpha\delta + (\alpha\delta)^2\big)\ln x + \ln B_3.$$

Continuing this procedure, we obtain the expression for each $V_k$, $1 \le k \le N$. ∎

Remark: In order to obtain the result we have used backward induction: the equilibrium function used in the last period N is $V_1$ (one period remains), that used in period N−1 is $V_2$, etc.

PERIOD:              1          2           ...   N−1       N
Backward induction:  Step N     Step N−1    ...   Step 2    Step 1
                     V_N, c_N   V_{N−1}, c_{N−1}  ...   V_2, c_2   V_1, c_1

CLAIM 1: $\{B_k\}$ is decreasing.

Proof: Recall that $B_1 = \frac{1}{2}$ and, for $k \ge 1$,
$$B_{k+1} = \frac{B_k^\delta\, \big(\alpha\delta + \dots + (\alpha\delta)^k\big)^{\alpha\delta + \dots + (\alpha\delta)^k}}{\big(2 + \alpha\delta + \dots + (\alpha\delta)^k\big)^{1 + \alpha\delta + \dots + (\alpha\delta)^k}}.$$
Define $\varphi : (0,1) \to (0,\infty)$ by $\varphi(y) = \big(\frac{y}{2+y}\big)^y$. Then
$$\frac{\varphi'(y)}{\varphi(y)} = (\ln\varphi(y))' = \ln\frac{y}{2+y} - \frac{y}{2+y} + 1 \le 0,$$
because $\ln s - s + 1 \le 0$ for $s \in (0,1)$. Hence, $\varphi$ is decreasing and, consequently, $\psi(y) = \frac{\varphi(y)}{2+y}$ is decreasing. Put
$$y_k := \frac{\big(\alpha\delta + \dots + (\alpha\delta)^k\big)^{\alpha\delta + \dots + (\alpha\delta)^k}}{\big(2 + \alpha\delta + \dots + (\alpha\delta)^k\big)^{1 + \alpha\delta + \dots + (\alpha\delta)^k}} = \psi\big(\alpha\delta + \dots + (\alpha\delta)^k\big)$$
and observe that $y_n > y_{n+1}$ for every $n \in \mathbb{N}$ ($\psi$ is decreasing). Now by induction we show that $\{B_n\}$ is decreasing.

1) $B_2 = \frac{(\alpha\delta)^{\alpha\delta}}{2^\delta (2 + \alpha\delta)^{1+\alpha\delta}} = \frac{1}{2^\delta (2 + \alpha\delta)}\Big(\frac{\alpha\delta}{2 + \alpha\delta}\Big)^{\alpha\delta} < \frac{1}{2^\delta (2 + \alpha\delta)} < \frac{1}{2^\delta \cdot 2} < \frac{1}{2} = B_1$.

2) Suppose that $B_{n+1} < B_n$ for some $n \in \mathbb{N}$. Since $y_{n+1} < y_n$, we obtain $B_{n+2} = B_{n+1}^\delta\, y_{n+1} < B_n^\delta\, y_n = B_{n+1}$, which completes the induction step. ∎

CLAIM 2: $\{B_k\}$ is bounded from below by some positive constant.

Proof: Observe that
$$B_{n+1}^{1-\delta} = \frac{B_{n+1}}{B_{n+1}^\delta} > \frac{B_{n+1}}{B_n^\delta} = y_n > \lim_{k \to \infty} y_k > 0.$$
Therefore, $l := \lim_{n \to \infty} B_{n+1}$ exists and $l = \inf_{n \in \mathbb{N}} B_{n+1} > 0$. ∎

Now, since $B_{n+1} = B_n^\delta\, y_n$, it follows that $l = \frac{l_1}{l_2}$, where
$$l_1 = \Big(\frac{\alpha\delta}{1 - \alpha\delta}\Big)^{\frac{\alpha\delta}{(1-\alpha\delta)(1-\delta)}}, \qquad l_2 = \Big(\frac{2 - \alpha\delta}{1 - \alpha\delta}\Big)^{\frac{1}{(1-\alpha\delta)(1-\delta)}}.$$

Next, we observe that
$$\lim_{N \to \infty} c_N(x) = \lim_{N \to \infty} \frac{x}{2 + \alpha\delta + \dots + (\alpha\delta)^{N-1}} = \frac{x(1 - \alpha\delta)}{2 - \alpha\delta} =: f_*(x).$$

Set $v_\delta := \lim_{k \to \infty} V_{k+1}$. By the above discussion $v_\delta$ is well-defined, i.e., $v_\delta(x) < \infty$ for every $x \in X$. Letting $k \to \infty$ (and at the same time $N \to \infty$) in the formula for $V_{k+1}$ from Theorem 1,
$$v_\delta(x) = \lim_{k \to \infty} \Big( \big(1 + \dots + (\alpha\delta)^k\big)\ln x + \ln B_{k+1} \Big) = \frac{\ln x}{1 - \alpha\delta} + \frac{\alpha\delta \ln(\alpha\delta) + (1 - \alpha\delta)\ln(1 - \alpha\delta) - \ln(2 - \alpha\delta)}{(1 - \alpha\delta)(1 - \delta)} \qquad (*)$$
for every $x \in X$.

Theorem 2:

The infinite horizon game has a symmetric stationary Nash Equilibrium
$$f_*(x) = \frac{x(1 - \alpha\delta)}{2 - \alpha\delta}.$$
The equilibrium payoff function is $v_\delta$ defined in $(*)$. For every $x \in X$ this function satisfies the following equations:
$$v_\delta(x) = \ln f_*(x) + \delta\, v_\delta\big((x - 2f_*(x))^\alpha\big) \qquad (**)$$
$$= \max_{0 \le c \le x} \big[ \ln c + \delta\, v_\delta\big((x - f_*(x) - c)^\alpha\big) \big].$$

Remark: The above equation is called a Bellman equation. Since we deal with a non-zero-sum game, we get a Bellman equation for each player. However, we do not write the upper index indicating the player, because we look for a symmetric equilibrium and, consequently, the equilibrium function is the same for both players.

Proof: Recall that from Theorem 1 we have that

$$V_{k+1}(x) = \ln c_{k+1}(x) + \delta\, V_k\big((x - 2c_{k+1}(x))^\alpha\big)$$
for $1 \le k \le N-1$ and $x \in X$. Hence, letting $k, N \to \infty$, we obtain the first equality in $(**)$.

Now from Theorem 1 we also obtain that
$$V_{k+1}(x) = \max_{0 \le c \le x} \big[ \ln c + \delta\, V_k\big((x - c_{k+1}(x) - c)^\alpha\big) \big].$$
Clearly, $V_{k+1} \to v_\delta$ as $k \to \infty$. Next we show that the second equality in $(**)$ holds. Indeed, when we insert the formulae for $v_\delta$ and $f_*$ into $(**)$ and make use of the first order condition, we get that
$$\frac{1}{c} - \frac{\alpha\delta}{1 - \alpha\delta} \cdot \frac{1}{x - f_*(x) - c} = 0,$$
which implies that $\frac{x}{2 - \alpha\delta} - c = \frac{\alpha\delta}{1 - \alpha\delta}\, c$ and, consequently, $c = \frac{(1 - \alpha\delta)\, x}{2 - \alpha\delta}$. Putting this expression into the second equality of $(**)$, we get exactly the right-hand side of the first equality of $(**)$. This completes the proof. ∎
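The closed-form equilibrium of Theorem 2 can be checked numerically. The sketch below (parameter values are illustrative assumptions, not from the lecture) simulates the equilibrium consumption path and compares the realized discounted utility with $v_\delta(x_0)$ from $(*)$.

```python
# Fish war: both players consume f*(x) = x(1 - alpha*delta)/(2 - alpha*delta)
# and the stock evolves as x_{t+1} = (x_t - 2 f*(x_t))^alpha. The realized
# discounted utility should match the closed-form value v_delta(x0).
import math

alpha, delta, x0 = 0.5, 0.9, 0.8         # assumed parameters
ad = alpha * delta

def f_star(x):
    return x * (1 - ad) / (2 - ad)

def v_delta(x):
    return (math.log(x) / (1 - ad)
            + (ad * math.log(ad) + (1 - ad) * math.log(1 - ad)
               - math.log(2 - ad)) / ((1 - ad) * (1 - delta)))

x, total = x0, 0.0
for t in range(2000):                    # truncate the infinite sum
    c = f_star(x)
    total += delta**t * math.log(c)
    x = (x - 2 * c) ** alpha

print("simulated utility:", total)
print("v_delta(x0):      ", v_delta(x0))  # the two should agree closely
```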

SUPERMODULAR GAMES

LECTURE 13

Supermodular games are games in which each player's marginal utility of increasing his strategy rises with increases in his rivals' strategies. In such games the best response correspondences are increasing. Supermodular games are well behaved and possess pure strategy Nash equilibria. The simplicity of supermodular games makes convexity and differentiability assumptions unnecessary, although they are satisfied in most applications.
• $S_i$ - Player i's strategy set; it is a subset (not necessarily compact or convex) of a finite-dimensional Euclidean space $\mathbb{R}^{m_i}$; $S = S_1 \times \dots \times S_n$ is a subset of $\mathbb{R}^m$, where $m = m_1 + \dots + m_n$.

Let us introduce the partial order '≥' on $\mathbb{R}^K$:
• $x \ge y$ if $x_k \ge y_k$ for all $k = 1, \dots, K$;
• $x > y$ if $x \ge y$ and there exists $k \in \{1, \dots, K\}$ such that $x_k > y_k$.
Remark: If a vector dominates another in one component but is dominated in another component, the vectors cannot be compared.

• meet: $x \wedge y := (\min\{x_1, y_1\}, \dots, \min\{x_K, y_K\})$;
• join: $x \vee y := (\max\{x_1, y_1\}, \dots, \max\{x_K, y_K\})$.

S is a sublattice of $\mathbb{R}^m$ if $s \in S$ and $s' \in S$ imply that $s \wedge s' \in S$ and $s \vee s' \in S$. A set S has a greatest element $\bar s$ (a least element $\underline s$) if $s \le \bar s$ ($s \ge \underline s$) for all $s \in S$. A topological result (see Birkhoff 1967) guarantees that S has a greatest element and a least element if S is a non-empty and compact sublattice of $\mathbb{R}^m$.

Definition: $r_i(s_i, s_{-i})$ has increasing differences in $(s_i, s_{-i})$ if for all $(s_i, s_i') \in S_i^2$ and $(s_{-i}, s_{-i}') \in S_{-i}^2$ such that $s_i \ge s_i'$ and $s_{-i} \ge s_{-i}'$,
$$r_i(s_i, s_{-i}) - r_i(s_i', s_{-i}) \ge r_i(s_i, s_{-i}') - r_i(s_i', s_{-i}');$$
$r_i(s_i, s_{-i})$ has strictly increasing differences in $(s_i, s_{-i})$ if for all $(s_i, s_i') \in S_i^2$ and $(s_{-i}, s_{-i}') \in S_{-i}^2$ such that $s_i > s_i'$ and $s_{-i} > s_{-i}'$,
$$r_i(s_i, s_{-i}) - r_i(s_i', s_{-i}) > r_i(s_i, s_{-i}') - r_i(s_i', s_{-i}').$$

Remark: Increasing differences says that an increase in strategies of Player i’s rivals raises the desirability of playing a high strategy for Player i.

Definition: $r_i(s_i, s_{-i})$ is supermodular in $s_i$ if for each $s_{-i}$,
$$r_i(s_i, s_{-i}) + r_i(s_i', s_{-i}) \le r_i(s_i \wedge s_i', s_{-i}) + r_i(s_i \vee s_i', s_{-i})$$
for all $(s_i, s_i') \in S_i^2$.

$r_i(s_i, s_{-i})$ is strictly supermodular in $s_i$ if this inequality is strict whenever $s_i$ and $s_i'$ cannot be compared with respect to '≥'.

Note that supermodularity is automatically satisfied if $S_i$ is single-dimensional. Supermodularity is needed in the case of multi-dimensional strategy spaces to prove that each player's best responses are increasing in his rivals' strategies. Topkis (1979) showed that if $S_i = \mathbb{R}^{m_i}$ and $r_i$ is twice continuously differentiable in $s_i$, then $r_i$ is supermodular in $s_i$ iff for any two components $s_{i\kappa}$ and $s_{i\ell}$ (with $\kappa \ne \ell$)
$$\frac{\partial^2 r_i}{\partial s_{i\kappa}\, \partial s_{i\ell}} \ge 0.$$

D. Topkis, Equilibrium points in nonzero-sum n-person submodular games, SIAM Journal on Control and Optimization 17 (1979), 773-787.

A Supermodular Game is one in which, for each i, $S_i$ is a sublattice of $\mathbb{R}^{m_i}$, $r_i$ has increasing differences in $(s_i, s_{-i})$, and $r_i$ is supermodular in $s_i$.

Example (Cournot game). Consider a duopoly. Firm $i \in \{1,2\}$ chooses a quantity $q_i \in [0, \bar q_i]$. Suppose that the inverse demand functions $P_i(q_i, q_j)$ are twice continuously differentiable, and that $P_i$ and firm i's marginal revenue, i.e., $P_i + q_i \frac{\partial P_i}{\partial q_i}$, are decreasing in $q_j$. Firm i's cost $C_i(q_i)$ is assumed differentiable. The payoffs are
$$r_i(q_i, q_j) = q_i P_i(q_i, q_j) - C_i(q_i).$$
If $s_1 \equiv q_1$ and $s_2 \equiv -q_2$, the transformed payoffs satisfy $\frac{\partial^2 r_i}{\partial s_i \partial s_j} \ge 0$ for all $i \ne j$. Thus the game is supermodular.

Note that the above-mentioned transformation works only for a two-player game.
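The increasing best responses at the heart of supermodular games suggest a simple computational procedure: iterate the best-response map from the extreme strategy profiles. The sketch below uses an assumed linear Cournot specification (the demand intercept and cost numbers are illustrative, not from the lecture).

```python
# Best-response iteration in a Cournot duopoly with inverse demand
# P(q_i, q_j) = max(0, 1 - q_i - q_j) and cost c*q_i. Starting from the
# extreme profiles, the iterates converge to the Nash equilibrium
# q_i = (1 - c)/3, in the spirit of Tarski's theorem and Topkis's results.
c = 0.1

def best_response(q_other):
    # maximizer of q * (1 - q - q_other) - c*q over q >= 0
    return max(0.0, (1.0 - c - q_other) / 2.0)

for start in (0.0, 1.0):                 # least and greatest profiles
    q1 = q2 = start
    for _ in range(100):
        q1, q2 = best_response(q2), best_response(q1)
    print(f"from {start}: q = ({q1:.6f}, {q2:.6f})")
print("Nash equilibrium:", (1 - c) / 3)  # = 0.3
```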

THEOREM (Tarski 1955)

If S is a non-empty compact sublattice of $\mathbb{R}^m$ and $f : S \to S$ is increasing (i.e., $f(x) \le f(y)$ for $x \le y$), then f has a fixed point in S.

Intuition: Consider the single-dimensional case with $S = [0,1]$. In order not to have a fixed point, the function f (see the figure below) would need to "escape" the area above the diagonal and "jump" into the area below the diagonal; but increasing functions do not jump down.

[Figure: an increasing step function on [0,1] crossing the diagonal.]

In the multi-dimensional case the intuition is the same, as no component of f(s) jumps down when an arbitrary component of s increases.

Definition: A function $r(\cdot)$ on S is upper semicontinuous at s if for any sequence $s^k \to s$,
$$\limsup_{k \to \infty} r(s^k) \le r(s).$$

[Figure: two examples of upper semicontinuous functions r(s).]

THEOREM 1 (Topkis 1979)

Let Si be compact, ri be upper semicontinuous in si for each s−i and all i ∈{1,...,n}. Suppose that the game is supermodular. Then, the set of pure strategy Nash equilibria is a non-empty and possesses greatest and least equilibrium points s and s .

Sketch of the proof: Let $b_i(s_{-i})$ be the set of best responses of Player i to a strategy profile $s_{-i}$, i.e., the set of $s_i \in S_i$ that maximize $r_i(\cdot, s_{-i})$.

Claim: If $S_i$ is compact and $r_i$ is upper semicontinuous in $s_i$, then $b_i$ is non-empty, since $r_i(s_i, s_{-i})$ attains its maximum in $s_i$ on $S_i$. Indeed, consider a sequence $s_i^k$ such that $\sup_{s_i' \in S_i} r_i(s_i', s_{-i}) = \lim_{k \to \infty} r_i(s_i^k, s_{-i})$. Compactness implies that there exists a convergent subsequence $s_i^k \to s_i$ (w.l.o.g. we may assume that $s_i^k$ is this subsequence). Upper semicontinuity implies that $r_i(s_i, s_{-i}) \ge \limsup_{k \to \infty} r_i(s_i^k, s_{-i})$, so that $s_i$ is indeed a best response to $s_{-i}$. ∎

To show that $b_i(s_{-i})$ is a sublattice for each $s_{-i}$, suppose that $s_i$ and $s_i'$ belong to $b_i(s_{-i})$ and
$$r_i(s_i \wedge s_i', s_{-i}) < r_i(s_i, s_{-i}) = r_i(s_i', s_{-i}).$$
Supermodularity of $r_i(\cdot, s_{-i})$ implies that
$$r_i(s_i \vee s_i', s_{-i}) \ge r_i(s_i, s_{-i}) + r_i(s_i', s_{-i}) - r_i(s_i \wedge s_i', s_{-i}) > r_i(s_i, s_{-i}) + r_i(s_i', s_{-i}) - r_i(s_i', s_{-i}) = r_i(s_i, s_{-i}),$$
which contradicts the assumption that $s_i$ and $s_i'$ are best responses to $s_{-i}$. The same reasoning applies to the join.

Since $b_i(s_{-i})$ is a non-empty compact sublattice of $\mathbb{R}^{m_i}$, it has a greatest element $\bar s_i(s_{-i})$.

Claim: $\bar s_i(\cdot)$ is non-decreasing, i.e., $s_{-i} \ge s_{-i}'$ implies $\bar s_i(s_{-i}) \ge \bar s_i(s_{-i}')$. Indeed, let $s_i := \bar s_i(s_{-i})$ and $s_i' := \bar s_i(s_{-i}')$. Then,
$$0 \le r_i(s_i, s_{-i}) - r_i(s_i \vee s_i', s_{-i}) \le r_i(s_i, s_{-i}') - r_i(s_i \vee s_i', s_{-i}') \le r_i(s_i \wedge s_i', s_{-i}') - r_i(s_i', s_{-i}') \le 0.$$
The first and fourth inequalities result from the optimality of $s_i$ against $s_{-i}$ and of $s_i'$ against $s_{-i}'$, the second from increasing differences and $s_i \le s_i \vee s_i'$, and the third from supermodularity in Player i's strategy. Hence all these inequalities hold with equality, so $s_i \vee s_i' \in b_i(s_{-i})$. Since $\bar s_i(s_{-i})$ is the greatest element of $b_i(s_{-i})$, we get $s_i \ge s_i \vee s_i'$. Therefore, $s_i = s_i \vee s_i'$ and, consequently, $s_i \ge s_i'$. ∎

We can now apply Tarski's theorem to $f(s) = (\bar s_1(s_{-1}), \dots, \bar s_n(s_{-n}))$. By construction, a fixed point $\bar s$ of $f(\cdot)$ (which exists) is a pure strategy Nash Equilibrium. It can be shown that $\bar s$ is the greatest element in the set of Nash equilibria. The intuition is again that a higher strategy triggers a higher best response. Finally, by symmetry, the analysis applies to lower bounds as well. ∎

For supermodular games with strictly increasing differences, one can prove monotonicity of the entire reaction correspondence. THEOREM 2 (Topkis 1979)

Consider a supermodular game with strictly increasing differences. If $s_i \in b_i(s_{-i})$, $s_i' \in b_i(s_{-i}')$ and $s_{-i} \ge s_{-i}'$, then $s_i \ge s_i'$.

Proof: The assertion follows from the following chain of inequalities:
$$0 \le r_i(s_i, s_{-i}) - r_i(s_i \vee s_i', s_{-i}) \le r_i(s_i, s_{-i}') - r_i(s_i \vee s_i', s_{-i}') \le r_i(s_i \wedge s_i', s_{-i}') - r_i(s_i', s_{-i}') \le 0.$$
The first and fourth inequalities result from the optimality of $s_i$ against $s_{-i}$ and of $s_i'$ against $s_{-i}'$, the second from increasing differences and $s_i \le s_i \vee s_i'$, and the third from supermodularity in Player i's strategy. Finally, note that if $s_i$ is not greater than or equal to $s_i'$, then $s_i < s_i \vee s_i'$, and strictly increasing differences imply that the second inequality is strict, which is a contradiction. ∎

COOPERATIVE GAMES

LECTURE 14

A COOPERATIVE GAME is a game in which the players have complete freedom of preplay communication to make joint binding agreements. Such a game is described by a triplet

(N,P(N ),v) where • N = {1,…,n} - set of n players.

• $P(N) = 2^N$ - the set of all coalitions; any subset $S \subseteq N$ is called a coalition.

• v :P(N ) →  - the characteristic function; in other words, it is a function such that v(S) represents the largest joint payoff which the coalition S⊆ N is guaranteed.

We define v(∅) = 0. AIM OF THE GAME

To find a “just” distribution of v(N ) among players, who can form coalitions S ⊆ N . A value of a coalition S is a number v(S).

The following property, possessed by any characteristic function, is called superadditivity.

Proposition 1: For any finite cooperative game,
$$v(S \cup T) \ge v(S) + v(T), \qquad S, T \subseteq N,\; S \cap T = \emptyset.$$
An extreme case of superadditivity is additivity:
$$v(S \cup T) = v(S) + v(T), \qquad S, T \subseteq N,\; S \cap T = \emptyset.$$
In such a game there is plainly no positive incentive for coalitions of more than one player to form.

Definition: A cooperative game with additive characteristic function is called inessential. Other cooperative games are called essential. Imputation is a possible distribution of available payoff:

$x = (x_1, \dots, x_n)$, satisfying the following conditions:
• individual rationality: $x_i \ge v(\{i\})$ for all $i \in N$ (no member of a coalition will consent to receive less than he or she can obtain individually);
• collective rationality: $\sum_{i \in N} x_i = v(N)$.

Proposition 2: A finite cooperative game is essential if and only if $\sum_{i \in N} v(\{i\}) < v(N)$.

The set of imputations is never empty, since from the superadditivity of v we have $\sum_{i \in N} v(\{i\}) \le v(N)$. For example, one imputation is given by $x = (x_1, \dots, x_n)$, where $x_i = v(\{i\})$ for $i = 1, \dots, n-1$ and $x_n = v(N) - \sum_{i=1}^{n-1} v(\{i\})$. This is the imputation most preferred by Player n. In fact, the set of imputations is exactly the simplex consisting of the convex hull of the n points obtained by letting $x_i = v(\{i\})$ for all $x_i$ except one, which is then chosen to satisfy $\sum_{i \in N} x_i = v(N)$.

For example, let $v(\{1\}) = \frac{1}{2}$, $v(\{2\}) = 0$, $v(\{3\}) = 1$, and $v(N) = 9$. The set of imputations is
$$\Big\{ (x_1, x_2, x_3) : x_1 + x_2 + x_3 = 9,\; x_1 \ge \tfrac{1}{2},\; x_2 \ge 0,\; x_3 \ge 1 \Big\}.$$
This is a triangle, each of whose vertices satisfies two of the three inequalities with equality, namely:
$(8, 0, 1)$ - the imputation most preferred by Player 1;
$(\tfrac{1}{2}, \tfrac{15}{2}, 1)$ - the imputation most preferred by Player 2;
$(\tfrac{1}{2}, 0, \tfrac{17}{2})$ - the imputation most preferred by Player 3.

THE CORE

Suppose some imputation x is being proposed as a division of v(N) among the players. If there exists a coalition S whose total return from x is less than what the coalition can achieve acting by itself, that is, $\sum_{i \in S} x_i < v(S)$, then there will be a tendency for the coalition S to form and upset the proposed x, because such a coalition could guarantee each of its members more than they would receive from x. Such an imputation has an inherent instability.

Definition: An imputation x is said to be unstable through a coalition S if $\sum_{i \in S} x_i < v(S)$. We say x is unstable if there is a coalition S such that x is unstable through S, and we say x is stable otherwise.

The set C of stable imputations is called the core:
$$C = \Big\{ x = (x_1, \dots, x_n) : \sum_{i \in N} x_i = v(N),\; \sum_{i \in S} x_i \ge v(S) \text{ for all } S \subset N \Big\}.$$
The core can consist of many points, but the core can also be empty. One may take the size of the core as a measure of stability, or of how likely it is that a negotiated agreement is prone to be upset.

Definition: A game in coalitional form is said to be constant-sum if v(S) + v(S c ) = v(N) for all coalitions S ∈2 N.

Proposition 3: The core of an essential constant-sum game is empty.

Proof: Let x be an imputation. Since the game is essential, we have $\sum_{i \in N} v(\{i\}) < v(N)$. Then there must be a player k such that $x_k > v(\{k\})$, for otherwise (using that x is an imputation and assuming $x_i \le v(\{i\})$ for all i)
$$v(N) = \sum_{i \in N} x_i \le \sum_{i \in N} v(\{i\}) < v(N),$$
a contradiction. Moreover, since the game is constant-sum, we have $v(N \setminus \{k\}) + v(\{k\}) = v(N)$. But then x must be unstable through the coalition $N \setminus \{k\}$, because
$$\sum_{i \ne k} x_i = \sum_{i \in N} x_i - x_k < v(N) - v(\{k\}) = v(N \setminus \{k\}). \qquad \blacksquare$$

Example 1. Consider the game with characteristic function v given by:
$$v(\{1\}) = 1, \quad v(\{2\}) = 0, \quad v(\{3\}) = 1,$$
$$v(\{1,2\}) = 4, \quad v(\{1,3\}) = 3, \quad v(\{2,3\}) = 5,$$
$$v(\{1,2,3\}) = 8 \quad \text{(the grand coalition)}.$$
The imputations are the points $(x_1, x_2, x_3)$ such that $x_1 + x_2 + x_3 = 8$ and $x_1 \ge 1$, $x_2 \ge 0$, $x_3 \ge 1$. This set is the triangle with vertices (7,0,1), (1,6,1), (1,0,7).

We plot this triangle in barycentric coordinates. This is done by pretending that the plane of the plot is $x_1 + x_2 + x_3 = 8$ and giving each point on the plane three coordinates which add to 8. Then it is easy to draw the line $x_1 = 1$ or the line $x_1 + x_3 = 3$. Note that the latter line is the same as the line $x_2 = 5$. It is apparent that the set of imputations is an equilateral triangle.

[Figure: the imputation triangle with vertices (7,0,1), (1,6,1), (1,0,7); the lines $x_1 + x_2 = 4$, $x_1 + x_3 = 3$ and $x_2 + x_3 = 5$ cut off the regions unstable through {1,2}, {1,3} and {2,3}; the remaining five-sided region is THE CORE.]

Let us explain which imputations are unstable. The coalition {2,3} can guarantee itself $v(\{2,3\}) = 5$, so all points

(x1 ,x2 , x3 ) with x2 + x3 < 5 are unstable through {2,3}.

These are the points below the line $x_2 + x_3 = 5$ in the diagram. Since {1,2} can ensure itself $v(\{1,2\}) = 4$, all points below and to the right of the line $x_1 + x_2 = 4$ are unstable. Finally, since {1,3} can guarantee itself $v(\{1,3\}) = 3$, all points below the line $x_1 + x_3 = 3$ are unstable. The core is the remaining set of points in the set of imputations, given by the 5-sided figure, including the boundary.
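Core membership is easy to verify computationally by enumerating all coalitions. Below is a short Python sketch (not part of the original notes) applied to the game of Example 1.

```python
# Check core membership by testing collective rationality and stability
# through every proper coalition.
from itertools import combinations

def in_core(x, v, n):
    """x: payoff vector; v: dict frozenset(coalition) -> value; n: #players."""
    players = range(1, n + 1)
    if abs(sum(x) - v[frozenset(players)]) > 1e-9:
        return False                        # collective rationality fails
    for k in range(1, n):
        for S in combinations(players, k):
            if sum(x[i - 1] for i in S) < v[frozenset(S)] - 1e-9:
                return False                # unstable through S
    return True

v = {frozenset(S): val for S, val in [
    ((1,), 1), ((2,), 0), ((3,), 1),
    ((1, 2), 4), ((1, 3), 3), ((2, 3), 5),
    ((1, 2, 3), 8)]}

print(in_core((3, 4, 1), v, 3))   # True:  inside the five-sided core
print(in_core((7, 0, 1), v, 3))   # False: unstable through {2,3}
```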

Example 2. A certain object is worth $w_i$ to Player i, for i = 1, 2, 3. We assume that $w_1 < w_2 < w_3$, which means that Player 3 values the object most. But Player 1 owns the object, so $v(\{1\}) = w_1$. Players 2 and 3 by themselves can do nothing, so $v(\{2\}) = v(\{3\}) = 0$ and $v(\{2,3\}) = 0$. If Players 1 and 2 come together, the joint worth is $w_2$, and therefore $v(\{1,2\}) = w_2$. Similarly, $v(\{1,3\}) = w_3$. If all three get together, the object is still only worth $w_3$, so $v(\{1,2,3\}) = w_3$. Find the core of this game. The core consists of all vectors $(x_1, x_2, x_3)$ satisfying:

$$x_1 \ge w_1, \quad x_2 \ge 0, \quad x_3 \ge 0;$$
$$x_1 + x_2 \ge w_2, \quad x_1 + x_3 \ge w_3, \quad x_2 + x_3 \ge 0;$$
$$x_1 + x_2 + x_3 = w_3.$$
It follows from $x_2 = w_3 - x_1 - x_3 \le 0$ and $x_2 \ge 0$ that $x_2 = 0$ for all points of the core. Then we obtain that $x_1 \ge w_2$ and $x_3 = w_3 - x_1$. Hence, the core is
$$C = \{ (x, 0, w_3 - x) : w_2 \le x \le w_3 \}.$$
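The same checker can be used for Example 2; the numbers $w_1, w_2, w_3 = 1, 2, 4$ below are illustrative assumptions satisfying $w_1 < w_2 < w_3$.

```python
# Example 2 with illustrative numbers w1, w2, w3 = 1, 2, 4: the core is
# {(x, 0, 4 - x) : 2 <= x <= 4}. Reuses in_core() from the sketch above.
w1, w2, w3 = 1, 2, 4
v2 = {frozenset(S): val for S, val in [
    ((1,), w1), ((2,), 0), ((3,), 0),
    ((1, 2), w2), ((1, 3), w3), ((2, 3), 0),
    ((1, 2, 3), w3)]}

print(in_core((3, 0, 1), v2, 3))      # True:  w2 <= 3 <= w3
print(in_core((1.5, 0, 2.5), v2, 3))  # False: price below w2
```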

This fact indicates that the object will be purchased by Player 3 at some purchase price x between $w_2$ and $w_3$. Hence, Player 1 ends up with the amount x, and Player 3 ends up with the object, having paid the amount x. Player 2 plays no active role in this bargain, but without Player 2 around, Player 3 might hope to get the object for less than $w_2$ (but not for less than $w_1$).

COOPERATIVE GAMES

LECTURE 15

Lloyd Shapley in „A value for n-person games” (In: Contributions to the Theory of Games II, Eds. Kuhn, Tucker; Princeton University Press, 1953) proposed the following axioms of distribution of v(N ):

(S1) Axiom of effectiveness: $\sum_{i \in N} \varphi_i[v] = v(N)$.

(S2) Axiom of the "dummy player": if $v(S) = v(S \setminus \{i\})$ for all $S \subseteq N$, then $\varphi_i[v] = 0$.

(S3) Axiom of symmetry: for any $S \subseteq N \setminus \{i, j\}$, if $v(S \cup \{i\}) = v(S \cup \{j\})$, then $\varphi_i[v] = \varphi_j[v]$.

(S4) Axiom of aggregation: if v and w are characteristic functions of two games with the same set of players, then for every $i \in N$, $\varphi_i[v + w] = \varphi_i[v] + \varphi_i[w]$, where $(v + w)(S) = v(S) + w(S)$.

Axiom (S4) was criticized, since it refers to the sum of characteristic functions of two games. Theorem (Shapley (1953))

The unique vector $\varphi[v]$ which satisfies the axioms (S1)-(S4) is given by:
$$\varphi_i[v] = \sum_{S \ni i} \frac{(n - |S|)!\,(|S| - 1)!}{n!} \big( v(S) - v(S \setminus \{i\}) \big) = \sum_{T \subseteq N \setminus \{i\}} \underbrace{\frac{(n - 1 - |T|)!\,|T|!}{n!}}_{p_T} \big( v(T \cup \{i\}) - v(T) \big).$$

Interpretation: Since $p_T$ is the probability of forming coalition T ($\sum_{T \subseteq N \setminus \{i\}} p_T = 1$), $\varphi_i[v]$ is the expected value of the marginal contributions of Player i to each coalition.

Thus the theorem demonstrates that Shapley's axioms are equivalent to the proposal that Player i should receive the average of all his or her contributions to the coalitions S that contain i. $\varphi[v]$ is called the Shapley value.

Remark: H.P. Young (1985) replaced the controversial aggregation axiom with the following one:

(S4') if $v(S \cup \{i\}) - v(S) = w(S \cup \{i\}) - w(S)$ for all $S \subseteq N \setminus \{i\}$, then $\varphi_i[v] = \varphi_i[w]$. If Player i has the same marginal contributions in two games, then his/her payoffs in these games are equal.

Theorem (Young (1985)): If a value satisfies the axioms (S1)-(S3) and (S4'), then there exists a unique imputation $\varphi[v]$, which is the Shapley value.

VOTING GAMES

Let us consider games in which the characteristic function equals 0 for losing coalitions and 1 for winning coalitions.

Definition: Let W be the set of winning coalitions. A game is called simple if:
• $\emptyset \notin W$;
• $N \in W$;
• if $S \in W$ and $S \subseteq T$, then $T \in W$ (each coalition containing a winning coalition is also winning).

Weighted voting games: $[q;\, w_1, \dots, w_n]$, where q is the number of votes needed to pass a bill and $w_i$ is the number of votes of Player i.
$$S \in W \iff \sum_{i \in S} w_i \ge q.$$
Example: United Nations Security Council. In 1978 the U.N. Security Council had 15 members, 5 of whom had vetoes. For a resolution to pass it is necessary to have 9 affirmative votes and no vetoes. To treat this as a 15-player simple game, take $N = \{1, \dots, 15\}$ and suppose that the first five have the veto. We say that a coalition S is winning, i.e., $v(S) = 1$, if it can defeat a resolution; otherwise $v(S) = 0$. Thus the characteristic function is: if $i \in S$ for some $1 \le i \le 5$, then $v(S) = 1$; otherwise
$$v(S) = \begin{cases} 1, & |S| \ge 7, \\ 0, & |S| < 7. \end{cases}$$

Fortunately, it is only necessary to calculate $\varphi_1$, since the symmetries of the game will then permit easy computation of $\varphi_i$ for $i > 5$. Then,
$$\varphi_1 = \sum_{S \ni 1} \frac{(n - |S|)!\,(|S| - 1)!}{n!} \big( v(S) - v(S \setminus \{1\}) \big)$$
$$= \frac{14!\,0!}{15!} + \binom{10}{1}\frac{13!\,1!}{15!} + \binom{10}{2}\frac{12!\,2!}{15!} + \binom{10}{3}\frac{11!\,3!}{15!} + \binom{10}{4}\frac{10!\,4!}{15!} + \binom{10}{5}\frac{9!\,5!}{15!} + \binom{10}{6}\frac{8!\,6!}{15!} = 0.19627\ldots$$
(the kth term corresponds to coalitions consisting of Player 1 and k non-veto members, $k = 0, 1, \dots, 6$). Hence,
$$\varphi_i = \begin{cases} 0.19627\ldots, & 1 \le i \le 5, \\ 0.00186\ldots, & 6 \le i \le 15. \end{cases}$$
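These numbers can be verified by brute force, since the Shapley formula is a finite sum over all coalitions not containing the given player. A Python sketch (not part of the original notes):

```python
# Shapley values for the 1978 U.N. Security Council blocking game:
# a coalition wins if it contains a veto member or has at least 7 members.
from itertools import combinations
from math import factorial

n, veto = 15, set(range(1, 6))

def v(S):
    return 1 if (set(S) & veto or len(S) >= 7) else 0

def shapley(i):
    others = [j for j in range(1, n + 1) if j != i]
    total = 0.0
    for k in range(n):
        for T in combinations(others, k):
            total += (factorial(n - 1 - k) * factorial(k) / factorial(n)
                      * (v(T + (i,)) - v(T)))
    return total

print(shapley(1))    # ~ 0.19627 (a permanent member)
print(shapley(6))    # ~ 0.00186 (an ordinary member)
```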

From this it is apparent that 98.1% of the power is in the hands of five permanent members and individually a permanent member is more than 105 times as powerful as an ordinary member. THE BANZHAF INDEX

Assumption: Player i possesses a Banzhaf power index, which is proportional to the number of coalitions in which he/she is critical (i.e., if he decides to leave the winning coalition, then this coalition becomes losing):
$$\beta_i := \frac{\text{the number of winning coalitions in which Player i is critical}}{\text{the number of all coalitions containing Player i}} = \frac{1}{2^{n-1}} \sum_{S \subseteq N \setminus \{i\}} \big[ v(S \cup \{i\}) - v(S) \big].$$
J. Banzhaf, "Weighted voting doesn't work: a mathematical analysis", Rutgers Law Rev. 19 (1965), 317-343.

Shapley Value ↔ Banzhaf Index

Let us consider the following weighted voting game: [6; 4, 3, 2, 1].

Then the Shapley value is
$$\varphi_i = \frac{1}{n!} \sum_{S:\; i \text{ is critical in } S} (|S| - 1)!\,(n - |S|)! \;\longrightarrow\; \Big( \frac{5}{12}, \frac{3}{12}, \frac{3}{12}, \frac{1}{12} \Big).$$
Write all permutations of players and mark the critical (pivotal) players. For instance, in the ordering A B C D, Player A enters the grand coalition first and Player B second; Player B is critical, since A alone has 4 votes and with B the total reaches $4 + 3 = 7 \ge 6$. The remaining orderings are A B D C, A C B D, A C D B, A D B C, A D C B, ... etc.

Sometimes the Shapley value in voting games is called a Shapley-Shubik power index; L. Shapley, M. Shubik, "A method for evaluating the distribution of power in a committee system", Amer. Polit. Sci. Rev. 48 (1954), 787-792.

The Banzhaf power index is
$$\beta = \Big( \frac{5}{8}, \frac{3}{8}, \frac{3}{8}, \frac{1}{8} \Big).$$
Write all winning coalitions and mark the critical players (the order does not play a role): A B, A C, A B C, A C D, A B D, B C D, A B C D.
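Both indices for $[6;\, 4, 3, 2, 1]$ can be reproduced by direct enumeration. A short Python sketch (not part of the original notes):

```python
# Shapley and Banzhaf indices for the weighted voting game [6; 4, 3, 2, 1],
# both computed from swings v(T + {i}) - v(T) over coalitions T not
# containing i, with the appropriate weights.
from itertools import combinations
from math import factorial
from fractions import Fraction

q, w = 6, {1: 4, 2: 3, 3: 2, 4: 1}
n = len(w)

def win(S):
    return 1 if sum(w[i] for i in S) >= q else 0

def indices(i):
    others = [j for j in w if j != i]
    shap, banz = Fraction(0), Fraction(0)
    for k in range(n):
        for T in combinations(others, k):
            swing = win(T + (i,)) - win(T)
            shap += Fraction(factorial(n - 1 - k) * factorial(k),
                             factorial(n)) * swing
            banz += Fraction(swing, 2 ** (n - 1))
    return shap, banz

for i in w:
    print(i, *indices(i))
# Shapley: (5/12, 3/12, 3/12, 1/12); Banzhaf: (5/8, 3/8, 3/8, 1/8)
```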

Remark: Generally, $\sum_{i \in N} \beta_i \ne v(N) = 1$. Hence, the Banzhaf index cannot be used for the division of v(N) among the players, but it can be used to study the importance of each player.

AXIOMATIZATIONS OF THE BANZHAF POWER INDEX
• P. Dubey, L.S. Shapley (1979): axioms that characterize some semi-values;
• G. Owen (1982): axioms that do not determine the Banzhaf index uniquely;
• E. Lehrer, "An axiomatization of the Banzhaf value", Inter. J. Game Theory 17 (1988), 89-99;
• A.S. Nowak, "On an axiomatization of the Banzhaf value without the additivity axiom", Inter. J. Game Theory 26 (1997), 137-141.

THE AXIOMS OF EHUD LEHRER:

(L1) If $v(S \cup \{i\}) = v(S) + v(\{i\})$ for all $S \subseteq N \setminus \{i\}$, then $\beta_i = v(\{i\})$;

(L2) If $v(S \cup \{i\}) = v(S \cup \{j\})$ for all $S \subseteq N \setminus \{i, j\}$, then $\beta_i = \beta_j$;

(L3) Let $(v \wedge w)(S) = \min\{v(S), w(S)\}$ and $(v \vee w)(S) = \max\{v(S), w(S)\}$. Then
$$\beta[v \wedge w] + \beta[v \vee w] = \beta[v] + \beta[w].$$

The last axiom (L4) follows from this observation. Consider the weighted voting game $v = [1;\, \tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}]$. Then
$$\beta[v] = \Big( \tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4} \Big) \quad\text{and}\quad \varphi[v] = \Big( \tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3} \Big).$$
Now let us consider a new game $w = [1;\, \tfrac{2}{3}, \tfrac{1}{3}]$, in which Players 1 and 2 act as one player. Then
$$\beta[w] = \Big( \tfrac{1}{2}, \tfrac{1}{2} \Big) = \varphi[w].$$
Note that from the Shapley value's point of view the unification does not pay off, because $\tfrac{1}{3} + \tfrac{1}{3} > \tfrac{1}{2}$, but from the Banzhaf power index's point of view the unification is never harmful, because $\tfrac{1}{4} + \tfrac{1}{4} \le \tfrac{1}{2}$.

(L4) Let $v_p$ be the (n−1)-player game in which the players {i, j} are merged into one player {p}. Then
$$\beta_p[v_p] \ge \beta_i[v] + \beta_j[v].$$

Theorem (Lehrer (1988)): If a power index satisfies the axioms (L1)-(L4), then the vector $\beta$ is the Banzhaf power index.

Remark: A.S. Nowak (1997) replaced the axiom (L3) with an axiom similar to the one given by Young in the axiomatization of the Shapley value:

(L3') if $v(S \cup \{i\}) - v(S) = w(S \cup \{i\}) - w(S)$ for every $S \subseteq N \setminus \{i\}$, then $\beta_i[v] = \beta_i[w]$.

Theorem (Nowak (1997)): If a power index satisfies the axioms (L1), (L2), (L4) and (L3'), then the vector $\beta$ is the Banzhaf power index.

BIBLIOGRAPHY TO THE COURSE ON "GAME THEORY"

• LECTURE 1 - P.D. Straffin, Game Theory and Strategy, Mathematical Association of America, 1993.

• LECTURE 2 - P.D. Straffin, Game Theory and Strategy, Mathematical Association of America, 1993. - G. Owen, Game Theory, Academic Press, 1982.

• LECTURE 3 - J.N. Webb, Game Theory: Decisions, Interaction and Evolution, Springer-Verlag, 2007. - T.S. Ferguson, Game Theory, 2005, http://www.math.ucla.edu/~tom/Game_Theory/Contents.html

• LECTURE 4 - S. Hart, Games in Extensive and Strategic Forms, in: Handbook of Game Theory (eds. R. Aumann, S. Hart), Elsevier Science Publishers, 1992.

• LECTURE 5 - J.N. Webb, Game Theory: Decisions, Interaction and Evolution, Springer-Verlag, 2007.

• LECTURE 6 - J.N. Webb, Game Theory: Decisions, Interaction and Evolution, Springer-Verlag, 2007. - K. Binmore, Playing for Real: A Text on Game Theory, Oxford University Press, 2007.

• LECTURE 7 - D. Fudenberg, J. Tirole, Game Theory, MIT Press, 1994. - A. Haurie, J.B. Krawczyk, An Introduction to Dynamic Games, 2000, http://www.ceaconline.org/pdf/microecomoniadiplomado/Introduction_To_Dynamic_Games_Decision_Theory.pdf

• LECTURE 8 - K. Chatterjee, W. Samuelson, Bargaining under incomplete information, Oper. Res. (1983).

• LECTURE 9 - J.N. Webb, Game Theory: Decisions, Interaction and Evolution, Springer-Verlag, 2007.

• LECTURE 10 - M.L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley & Sons, 1994.

• LECTURE 11 - A. Haurie, J.B. Krawczyk, An Introduction to Dynamic Games, 2000, http://www.ceaconline.org/pdf/microecomoniadiplomado/Introduction_To_Dynamic_Games_Decision_Theory.pdf

• LECTURE 12 - A.S. Nowak, A note on an equilibrium in the great fish war game, Econ. Bull. (2006).

• LECTURE 13 - D. Fudenberg, J. Tirole, Game Theory, MIT Press, 1994.

• LECTURE 14 - T.S. Ferguson, Game Theory, 2005, http://www.math.ucla.edu/~tom/Game_Theory/Contents.html - A.J. Jones, Game Theory: Mathematical Models of Conflict, John Wiley & Sons, 1980.

• LECTURE 15 - A.J. Jones, Game Theory: Mathematical Models of Conflict, John Wiley & Sons, 1980. - T.S. Ferguson, Game Theory, 2005, http://www.math.ucla.edu/~tom/Game_Theory/Contents.html - E. Lehrer, An axiomatization of the Banzhaf value, Inter. J. Game Theory (1988). - A.S. Nowak, On an axiomatization of the Banzhaf value without the additivity axiom, Inter. J. Game Theory (1997).