Learning 1* Fictitious Play

14.126 — Sergei Izmalkov

*Special thanks to Muhamet Yildiz for permitting the use of his slides from 14.126 (Fall 2001).

• “After all, insects can hardly be said to think at all, and so rationality cannot be so crucial if game theory somehow manages to predict their behavior under appropriate conditions. Simultaneously the advent of experimental economics brought home the fact that human subjects are no great shakes at thinking either. When they find their way to an equilibrium of a game, they typically do so using trial-and-error methods.”

— Ken Binmore

Road Map

1. Fictitious play
2. Evolution and reinforcement mechanisms – replicator dynamics
3. Adjustment models with persistent randomness
4. Learning in extensive-form games, rationality, etc.

Road Map – fictitious play

1. Cournot adjustment
2. Fictitious play for 2 players
3. Examples
4. Asymptotic behavior
5. Issues with multiplayer fictitious play, etc.
6. Stochastic fictitious play

Cournot Adjustment

• θt+1 = fC(θt), where fC(θ) = (BR1(θ2), BR2(θ1)).
• If θ is a steady state of fC (i.e., θ = fC(θ)), then θ is a Nash equilibrium.
• The Nash equilibrium on the left is globally stable.

[Figure: best-response curves BR1 and BR2 in the (θ1, θ2) plane.]
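The adjustment map fC is easy to iterate numerically. Below is a minimal sketch in a linear Cournot duopoly; the demand and cost parameters (a, b, c) are illustrative assumptions, not from the slides.

```python
# Cournot best-response (adjustment) dynamics in a linear duopoly.
# Assumed model: inverse demand P = a - b*(q1 + q2), constant marginal
# cost c, so BR_i(q_j) = max(0, (a - c - b*q_j) / (2*b)).

def best_response(q_other, a=10.0, b=1.0, c=1.0):
    return max(0.0, (a - c - b * q_other) / (2 * b))

def cournot_dynamics(q1, q2, steps=100):
    for _ in range(steps):
        q1, q2 = best_response(q2), best_response(q1)  # simultaneous adjustment
    return q1, q2

q1, q2 = cournot_dynamics(0.0, 8.0)
q_star = (10.0 - 1.0) / (3 * 1.0)   # Cournot-Nash quantity (a - c)/(3b) = 3
print(q1, q2)                       # both converge to 3.0
```

Because each best-response map is a contraction here, the steady state (the Cournot-Nash equilibrium) is globally stable, matching the slide's left-hand picture.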

• A steady state θ of F is stable iff, for every neighborhood U of θ, there is a neighborhood U1 such that, if θ0 ∈ U1, then Ft(θ0) ∈ U for all t > 0.

• A steady state θ of F is asymptotically stable iff it is stable and, whenever θ0 ∈ U, limt→∞ Ft(θ0) = θ.

• A steady state θ of F is globally stable iff it is asymptotically stable and U = Θ.

[Figure: best-response diagram indicating the stable Nash equilibria.]

Fictitious play – 2 players

• A two-player game (S1, S2, u1, u2).
• Each player i keeps weights over the opponent's strategies, k0i : S−i → R++, updated by
  kti(s−i) = kt−1i(s−i) + 1 if the opponent played s−i at date t−1; otherwise kti(s−i) = kt−1i(s−i).
• The induced belief is γti(s−i) = kti(s−i) / Σs̃−i kti(s̃−i).

• Fictitious play is any rule ρti that best-responds to the current belief: ρti(γti) ∈ BRi(γti).
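The weight/belief/best-response loop above can be sketched directly. This is a minimal two-player implementation; the tie-breaking rule (lowest index) is an arbitrary choice, and the Matching Pennies run uses the initial weights from the slides' first example.

```python
import numpy as np

# Two-player fictitious play: keep weights over the opponent's actions,
# form the empirical belief gamma_t, and play a pure best response to it.
def fictitious_play(A, B, k1, k2, T):
    """A, B: payoff matrices for players 1 (row) and 2 (column).
    k1, k2: each player's initial weights over the OPPONENT's actions."""
    k1, k2 = np.array(k1, float), np.array(k2, float)
    history = []
    for _ in range(T):
        g1, g2 = k1 / k1.sum(), k2 / k2.sum()   # beliefs gamma_t^1, gamma_t^2
        s1 = int(np.argmax(A @ g1))             # best reply to belief about 2
        s2 = int(np.argmax(g2 @ B))             # best reply to belief about 1
        k1[s2] += 1                             # player 1 observed s2
        k2[s1] += 1                             # player 2 observed s1
        history.append((s1, s2))
    return k1 / k1.sum(), k2 / k2.sum(), history

# Matching Pennies (H=0, T=1) with initial weights k0^1=(2,1), k0^2=(1,2):
A = np.array([[1, -1], [-1, 1]])   # row player wants to match
B = -A                             # zero-sum
g1, g2, _ = fictitious_play(A, B, [2, 1], [1, 2], 10000)
print(g1, g2)   # empirical frequencies approach (1/2, 1/2)
```

Realized play keeps cycling between H and T, but the empirical frequencies converge to the mixed equilibrium, which is what the figures below display.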

Example – Matching Pennies

        H         T
H    (1,-1)    (-1,1)
T    (-1,1)    (1,-1)

Fictitious play in Matching Pennies

[Figure: empirical frequencies g1(H) and g2(H) over 100 periods; initial weights k01 = (2,1), k02 = (1,2).]

Fictitious play in Matching Pennies 2

[Figure: the same simulation over 1000 periods.]

Fictitious play in Matching Pennies 3

[Figure: the same simulation over 10000 periods.]

Chicken

        L           C
L    (-1,-1)     (1,0)
C    (0,1)       (1/2,1/2)

Fictitious play in Chicken

[Figure: empirical frequencies g1(L) and g2(L) over 100 periods; initial weights k01 = (1,1) = k02.]

Fictitious play in Chicken – 2

[Figure: the same simulation with initial weights k01 = (1,1), k02 = (10,10).]

Miscoordination game

        A        B
A    (0,0)    (1,1)
B    (1,1)    (0,0)
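This game makes fictitious play fail in a striking way. A minimal sketch (the symmetric initial weights and lowest-index tie-breaking are illustrative choices): both players want to MISmatch, yet with symmetric beliefs they mirror each other forever.

```python
import numpy as np

# Miscoordination under fictitious play: u_i = 1 iff the actions differ.
# With symmetric initial weights and deterministic tie-breaking, play
# cycles (A,A), (B,B), (A,A), ... so every realized payoff is 0, even
# though the empirical frequencies converge to the mixed equilibrium
# (1/2, 1/2), whose expected payoff is 1/2.
U = np.array([[0, 1], [1, 0]])   # same payoff matrix for both players
k1 = np.array([1.0, 1.0])        # player 1's weights over player 2's actions
k2 = np.array([1.0, 1.0])
payoffs = []
for _ in range(100):
    s1 = int(np.argmax(U @ (k1 / k1.sum())))   # ties break toward A (index 0)
    s2 = int(np.argmax((k2 / k2.sum()) @ U))
    payoffs.append(int(U[s1, s2]))
    k1[s2] += 1
    k2[s1] += 1
print(set(payoffs))      # {0}: perpetual miscoordination
print(k1 / k1.sum())     # beliefs nonetheless converge to (1/2, 1/2)
```

So convergence of the empirical distributions to a Nash equilibrium does not imply that realized play resembles equilibrium play.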

Miscoordination

[Figure: empirical frequencies over 100 periods; play alternates between (A,A) and (B,B), so the players never coordinate.]

Asymptotic behavior

Proposition:
(1) If s is a strict Nash equilibrium, and s is played at some date t in the process of fictitious play, then s is played thereafter.
(2) Any pure-strategy steady state of fictitious play is a Nash equilibrium.

Proof of (1):
1. si ∈ BRi(γti).
2. γt+1i = (1−at)γti + at δ(s−i), where δ(s−i) is the point mass (Dirac delta) on s−i.
3. ui(si, γt+1i) = (1−at) ui(si, γti) + at ui(si, s−i).
4. Since s is strict, si uniquely maximizes both terms, so {si} = BRi(γt+1i).
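Part (1) can be checked numerically. A sketch in a 2x2 coordination game (the payoffs and initial weights are illustrative choices): once the strict equilibrium (A,A) is played, fictitious play locks onto it forever.

```python
import numpy as np

# Coordination game with common payoffs; (A,A) and (B,B) are strict NE.
U = np.array([[2, 0], [0, 1]])
k1 = np.array([1.0, 1.0])   # player 1's weights over player 2's actions
k2 = np.array([1.0, 3.0])   # player 2's weights over player 1's actions
profiles = []
for _ in range(200):
    s1 = int(np.argmax(U @ (k1 / k1.sum())))
    s2 = int(np.argmax((k2 / k2.sum()) @ U))
    profiles.append((s1, s2))
    k1[s2] += 1
    k2[s1] += 1
first = profiles.index((0, 0))                      # first date (A,A) is played...
print(all(p == (0, 0) for p in profiles[first:]))   # ...it is played ever after: True
```

The mechanism is exactly step 2 of the proof: playing (A,A) shifts each belief further toward A, which only reinforces the strict best response.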

Asymptotic behavior, continued

• Empirical distribution: dtj(sj) = [ktj(sj) − k0j(sj)]/t.

Proposition: If the empirical distributions over each player's choices converge, then the product of these distributions is a Nash equilibrium.

Proposition: The empirical distributions dtj over each player j's choices converge if the stage game has generic payoffs and is 2x2, or is zero-sum, or is solvable by iterated elimination of strictly dominated strategies, or …

Shapley's example

        L      M      R
T     0,0    1,0    0,1
M     0,1    0,0    1,0
D     1,0    0,1    0,0

Under fictitious play, the best responses cycle:
(T,M) → (T,R) → (M,R) → (M,L) → (D,L) → (D,M) → (T,M) → …
The empirical distributions do not converge; instead they follow a limit cycle, with ever-longer runs on each profile.
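The cycle is easy to observe in simulation. A sketch (initial weights are an illustrative choice that starts play inside the cycle); actions are coded T,M,D = 0,1,2 and L,M,R = 0,1,2.

```python
import numpy as np

# Fictitious play on Shapley's 3x3 game.
A = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]])   # player 1's payoffs
B = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]])   # player 2's payoffs
k1 = np.array([1.0, 2.0, 1.0])   # player 1's weights over L, M, R
k2 = np.array([2.0, 1.0, 1.0])   # player 2's weights over T, M, D
history = []
for _ in range(10000):
    s1 = int(np.argmax(A @ (k1 / k1.sum())))
    s2 = int(np.argmax((k2 / k2.sum()) @ B))
    history.append((s1, s2))
    k1[s2] += 1
    k2[s1] += 1
# The six profiles of Shapley's cycle, in the coding above:
cycle = {(0, 1), (0, 2), (1, 2), (1, 0), (2, 0), (2, 1)}
print(set(history[100:]) <= cycle)   # play stays on the six-profile cycle
```

Because the runs on each profile keep lengthening, the time averages keep swinging and the empirical distributions never settle down.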

Issues with multiplayer fictitious play

• Should a player assume that the other players' strategies are uncorrelated?
• Or should he allow correlation, arising only because he does not know which pair of independent strategies is being played?

Stochastic fictitious play & mixed-strategy equilibria

Randomly perturbed payoffs

• η1, η2 are i.i.d. with a smooth distribution fx.

        H               T
H    (2+η1, 2+η2)    (η1, 0)
T    (0, η2)         (1, 1)

• As x approaches 0, fx becomes a unit mass at 0.

• BRi(σ−i)(si) = Pr{ηi : si ∈ BRi(σ−i; ηi)} — the probability, over the payoff perturbation, that si is a best response.
• A profile σ is a Nash distribution iff BRi(σ−i) = σi for all i.
• Stochastic fictitious play is a rule that plays the mixed strategy BRi(γti).
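A sketch of stochastic fictitious play, using the logit (softmax) smoothed best response — the special case that arises when the perturbations ηi follow an extreme-value distribution (an illustrative choice of f, not specified on the slides). Unlike standard fictitious play, the players here actually randomize each period.

```python
import numpy as np

# Smoothed best response: the logit rule that results from extreme-value
# distributed payoff perturbations; beta controls how sharp it is.
def logit_br(payoffs, beta=10.0):
    z = np.exp(beta * (payoffs - payoffs.max()))   # numerically stable softmax
    return z / z.sum()

rng = np.random.default_rng(0)
A = np.array([[1, -1], [-1, 1]])   # Matching Pennies, row player's payoffs
k1 = np.array([2.0, 1.0])
k2 = np.array([1.0, 2.0])
for _ in range(20000):
    p1 = logit_br(A @ (k1 / k1.sum()))          # mixed strategy, not a pure BR
    p2 = logit_br((k2 / k2.sum()) @ (-A))
    s1 = rng.choice(2, p=p1)                    # actually randomize each period
    s2 = rng.choice(2, p=p2)
    k1[s2] += 1
    k2[s1] += 1
print(k1 / k1.sum(), k2 / k2.sum())   # beliefs near the Nash distribution (1/2, 1/2)
```

Here the intended (mixed) strategies themselves converge toward the Nash distribution, which is how stochastic fictitious play repairs the miscoordination problem of deterministic fictitious play.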