Dynamics in Games, by Lyndsey Clark (2011-2012)

Contents

1 Game Theory 3 1.1 Gamesinnormalform ...... 3 1.2 Bimatrix games ...... 5

2 Replicator Dynamics 6 2.1 Long term behaviour and convergence ...... 8 Dynamics in Games 2.2 Correspondence with Lotka-Volterra dynamics ...... 11 3 Fictitious Play 14 3.1 Deﬁnitions ...... 14 0820964 3.2 Existence and uniqueness of solutions ...... 15 3.3 Games with convergent ﬁctitious play ...... 17 3.4 The Shapley system ...... 21 March 2012 3.5 Open questions ...... 22 3.5.1 Transition diagrams ...... 22

4 The link between replicator dynamics and ﬁctitious play 29 4.1 Time averages ...... 33

5 Conclusion 34

1 Introduction Chapter 1

Game theory is a subject with a wide range of applications in economics and biology. In Game Theory this dissertation, we discuss evolutionary dynamics in games, where players’ strategies evolve dynamically, and we study the long-term behaviour of some of these systems. The replicator system is a well known model of an evolving population in which successful We begin by briefly going over the game theoretic ideas and definitions that we hope the subtypes of the population proliferate. Another system known as fictitious play (also reader is already familiar with, along with some of the biological motivation behind the called best response dynamics) models the evolution of a population in which at any given technical details. The material presented here can be found in any number of books on time some individuals review their strategy and change to the one which will maximise game theory: see for example Hofbauer and Sigmund [11, 6]. their payo↵.Wewillinvestigatethesetwosystemsandtheirbehaviourandfinallyexplore § the link between the two. 1.1 Games in normal form

Generally we assume an individual can behave in n di↵erent ways. These are called pure strategies, denoted by P1 to Pn, and could represent things like “ﬁght”, “run away”, “play rock in a game of Rock Paper Scissors”, or even “sell shares” in a more economic context. For biology, we take it that one individual plays one pure strategy. Then the make up of the population can be described by the proportions of individuals playing each pure strategy. These proportions form what is said to be simply the strategy of the population as a whole. As the proportions should clearly sum to one, the set of possible strategies is a simplex:

n n ⌃n = p =(p1,...,pn) R : pi 0and pi =1 . 2 ( i=1 ) X Here pi can be thought of as the chance of randomly picking an individual who uses strategy i. By deﬁnition a population in which everybody plays strategy Pi corresponds to the i-th unit vector in the simplex. We then consider games to be played by a population that evolves continuously as time progresses, describing a path in the simplex. To model some form of evolution, we need to somehow quantify what happens when one individual encounters another: which strategy will perform better? Given that we know the composition of the population, how well will one strategy perform on average over many encounters? In this context, the phrases “perform better” or “succeed” refer to an increase in evolutionary ﬁtness: more food, more territory, more females - anything that increases the chance of reproduction. The way to model this is to assign a “payo↵”: when an individual using strategy i meets a second individual following strategy j,thei-strategist will receive a payo↵ of aij. This creates an n n payo↵ matrix A =(a ). Then a population with strategy p playing ⇥ ij

2 3 against a population with strategy q will receive a payo↵ 1.2 Bimatrix games

n n An n m bimatrix game covers the situation where there are two players or populations, p Aq = a p q . ⇥ · ij i j one with n strategies and one with m strategies. In normal form this can be represented i=1 j=1 X X by a pair of n m matrices [A, B], so that if Player A plays strategy i and Player B ⇥ A game described this way is called a game in normal form. This is as opposed to plays strategy j, then the payo↵ to Player A will be aij and the payo↵ to Player B will extensive form, where a game is written as a tree. Extensive form can describe games be bij. This allows for the representation of games such as the Prisoner’s Dilemma or with imperfect or incomplete information, such as where players do not know the payo↵s Rock, Paper, Scissors. The concepts of Nash equilibria and best responses may then be or when the payo↵saresubjectto(random)noise. generalised to bimatrix games. For simplicity of notation we write the strategy p ⌃A of Player A as a row vector and the strategy q ⌃ of Player B as a column vector.2 The next concept needed is that of a best response. 2 B Definition 1.1 (Best responses). A strategy p is a best response to a strategy q if for Definition 1.4 (Best responses). A strategy pˆ ⌃A is a best response of Player A to the strategy q ⌃ of Player B if for every p 2⌃ we have every p0 in ⌃n,wehave 2 B 2 A p Aq p0 Aq. · · pÂq pAq. (1.2) We denote the set of best responses to q by BR(q). Essentially, a best response to q is the strategy (or strategies - best responses are not necessarily unique) that will maximise Similarly a strategy qˆ ⌃B is a best response of Player B to the strategy p ⌃A of Player A if for every q 2 ⌃ we have 2 the payo↵.Thisisthusausefulconceptasmanyofthesystemswewillconsiderwill 2 B involve a population attempting to maximise its payo↵. pBqˆ pBq (1.3) If a strategy is a best response to itself, this is called a Nash equilibrium, named for John Forbes Nash. More formally: We denote by BR (q) ⌃ and BR (p) ⌃ the sets of best responses of Players A ⇢ A B ⇢ B Definition 1.2 (Nash equilibria). A strategy p is a Nash equilibrium if A and B to strategies q and p respectively. Definition 1.5 (Nash equilibria). A strategy pair (pˆ, qˆ)isaNash equilibrium for the p Ap q Ap p. · · 8 bimatrix game [A, B]ifpˆ is a best response to qˆ and qˆ is a best response to pˆ: that is, (pˆ, qˆ) BR (pˆ) BR (qˆ). If a strategy is the unique best response to itself, then it is said to be a strict Nash 2 A ⇥ B equilibrium.

Notice that p is a Nash equilibrium if and only if its components satisfy

(Ap) = c i such that p > 0, i 8 i for some constant c>0, where

p + + p =1. 1 ··· n It is well known that every ﬁnite game has a (not necessarily unique) Nash equilibrium. Multiple proofs, including Nash’s original proof from his thesis [13], may be found in a paper by Hofbauer [10, Theorem 2.1]. It seems sensible that if a strategy is a Nash equilibrium, then the population should stop evolving at this state, as it is already following the best strategy. This brings us to questions of stability of equilibria.

Deﬁnition 1.3 (Evolutionarily stable states). A strategy pˆ is said to be an evolutionarily stable state (ESS) if pˆ Ap > p Ap (1.1) · · for every p = pˆ in a neighbourhood of pˆ. 6

4 5 Note that it follows from Lemma 2.1 that we may rewrite A in a simpler form, for example having only 0 in the last row. The following useful lemma (given as an exercise in Hofbauer and Sigmund [11, p. 68]) allows us to transform the system so that we may move a point of interest such as a Nash equilibrium to the barycentre. Chapter 2 Lemma 2.2. The projective transformation p q given by ! p c q = i i i n p c Replicator Dynamics l=1 l l P ˜ 1 (with cj > 0) changes (2.2) into the replicator equation with matrix A =(aijcj ). Notice 1 1 1 that this enables us to move a specific point p =(pi) to the barycentre ( n , n ,..., n ) by 1 taking c = p . The replicator equation is a long-standing simple population model for the evolutionary i i success of di↵erent subtypes within a population. It can be found in many books on Proof. This calculation is somewhat tedious but necessary as the result will be useful evolutionary game dynamics, including those by Hofbauer and Sigmund [11, 7] and by later. First, let us calculate Fudenberg and Levine [7, 3]. § § n If a population is very large, then one individual is evolutionarily speaking a very 1 (A˜q) = a c q small insignificant part of the population. Consequently we tend to ignore the fact that i ij j j j=1 populations come in discrete sizes, and assume the population changes continuously over Xn time. Let us denote by p the proportion of type i individuals within the population. Then 1 1 i = a c p c ij j j j n p c p =(p1,...,pn) ⌃n can be considered as a continuous function of time t. The relative j=1 l=1 l l success of each2 type will be dependent on the overall composition of the population. X 1 P Denote this fitness by fi(p). Then the average fitness for the population will be f¯(p):= = n (Ap)i. n l=1 plcl i=1 pifi(p). Also we have P Definition 2.1 (Replicator Dynamics). Given p(t) ⌃n, fi : ⌃n R,thereplicator P 2 ! equation is n ¯ 1 p˙i = pi(fi(p) f(p)) i =1, . . . , n. (2.1) q A˜q = a c q q · ij j i j j=1 Notice that we can add a function (p)toeachfi without changing the dynamics. X n n In the case where fitness is given by a payo↵ matrix A, this simplifies to 1 = n 2 aijcipipj. p˙ = p ((Ap) p Ap) i =1, . . . , n. (2.2) ( l=1 plcl) i=1 j=1 i i i · X X Lemma 2.1. The addition of a constant c to the j-th column of A does not change the P dynamics. Proof. Let B be the matrix A with the addition of a constant c to the j-th column. Then, n

(Bp)i = aikpk + cpj k=1 Xn n n p Bp = a p p + cp p . · kl k l j k Xk=1 Xl=1 Xk=1 Noting that k cpjpk = cpj k pk = cpj, we see that p ((Bp) p Bp)=p ((Ap) + cp (p Ap cp )) P i i P· i i j · j = p ((Ap) p Ap). i i · Thus the dynamics do not change.

6 7 Then by definition, Lemma 2.4. Rest points p int(⌃n) of the replicator equation are precisely points p such that 2 dqi d pici n = n dt dt l=1 plcl (Ap)1 =(Ap)2 = =(Ap)n and pi =1. ✓ ◆ n ··· i=1 p˙ici pici X = P p˙lcl n n 2 Equivalently, a point in the interior of the simplex ⌃n is a rest point if and only if it is plcl ( p c ) l=1 l=1 l l l=1 a Nash equilibrium. n nXn Pp c = i i Pa p a p p Proof. Points in the interior have p > 0foralli, so this follows directly from the previous n p c ij j ij i j i l=1 l l j=1 i=1 j=1 ! lemma. X X X P p c n n n n i i p c a p a p p Lemma 2.5. If the omega limit set !(p) of a point p ⌃n under replicator dynamics n 2 i i ij j kj k j consists of one point q, then q is a Nash equilibrium of2 the underlying matrix. ( l=1 plcl) i=1 " j=1 j=1 !# X X Xk=1 X n n n Proof. Suppose q = !(p)forsomep ⌃ . Then by definition and slight abuse of picPi 1 n 2 { } 2 = n aijpj n aijcipipj notation , there exists a sequence tn with limn tn = such that p(tn) q. This plcl plcl !1 l=1 j=1 l=1 i=1 j=1 ! implies that p˙ (t ) 0. Suppose q is not a Nash equilibrium.1 Then there! exists a X X X n ! = pPici (A˜q)i q A˜q P standard basis vector ei such that ei Aq > q Aq. Equivalently, there exists ✏ > 0such · that e Aq q Aq > ✏. By definition,· (Aq) ·= e Aq,thuswehave ⇣qi ˜ ⌘ ˜ i · · i i · = n 1 (Aq)i q Aq . l=1 cl ql · q˙i = qi(ei Aq q Aq) ⇣ ⌘ n · · This is almost the replicator equation: the only di↵erence is the factor of plcl. This q˙ P l=1 = i > ✏ t 0. can be fixed by a rescaling of time, and we obtain the replicator equation with matrix ) qi 8 ˜ 1 P A =(aijcj )asrequired. But then q˙ 0foranyt : a contradiction. 6! n !1 Lemma 2.6. If q is a Lyapunov stable point of the replicator equation with matrix A, 2.1 Long term behaviour and convergence then q is a Nash equilibrium for the game with payo↵ matrix A.

We now endeavour to say something about the long term behaviour of replicator dy- Proof. Suppose q is not a Nash equilibrium. Then by continuity there exists an i and some ✏ > 0suchthat(Ap) p Ap > ✏ for all p in some neighbourhood of q. Thenp ˙ > ✏p , namics. As one would expect, this is closely tied to the concepts of Nash equilibria, i · i i evolutionarily stable strategies and Lyapunov stability1. so pi increases exponentially for all p in some neighbourhood of q. This contradicts the Lyapunov stability of q. Lemma 2.3. If p ⌃n is a Nash equilibrium for the game with payo↵ matrix A, then p is a fixed point of the2 corresponding replicator equation. Notice that not every rest point of the replicator equation is a Nash equilibrium: any pure strategy will always be a rest point (representing the idea that extinct subtypes Proof. Suppose p is a Nash equilibrium. Then, as discussed in the previous chapter, its cannot come back to life), but there are certainly games where a given pure strategy is components satisfy not a Nash equilibrium. Similarly, not every Nash equilibrium is Lyapunov stable under n fictitious play: for example in a game where every strategy is a Nash equilibrium (take (Ap)i = c i such that pi > 0, for some constant c>0, where pi =1. 8 aij =1foralli, j). i=1 X Thus, Theorem 2.1. If q ⌃n is an evolutionarily stable state for the game with payo↵ matrix n A, then q is an asymptotically2 stable rest point of the corresponding replicator equation. p Ap = p c = c. · i i=1 Proof. Consider the function X Considering the replicator equation, we see that n qi F : ⌃n [0, ], p p . p˙ =0 i p ((Ap) p Ap)=0 ! 1 7! i i 8 () i i · i=1 p ((Ap) c)=0. Y () i i We first show that this has a unique maximum at q. To see this, we use Jensen’s This clearly holds when p is a Nash equilibrium. Inequality:

1 2 It is assumed the reader is familiar with Lyapunov stability and omega limit sets: for deﬁnitions and Here we use p to denote both the point p ⌃n and the solution p(t) to the replicator equation discussion of these concepts see for example Hofbauer and Sigmund [11, 2.6] satisfying p = p(0). 2 § 8 9 Theorem 2.2 (Jensen’s Inequality). If a function f : I I is strictly convex on an some neighbourhood of q, p = q. Notice that F (p) > 0forallp ⌃n. Now interval I, then ! 6 2 n n d n f q p q f(p ), (log F )˙ = qi log pi i i  i i dt i=1 ! i=1 i=1 ! X X n X d where q =(qi) int(⌃n) and pi I, with equality if and only if p1 = p2 = = pn. = q log p 2 2 ··· dt i i We apply Jensen’s Inequality to the function log(F ), setting I =[0, ]and0log0= i=1 1 Xn 0log = 0. Notice that log is strictly increasing, so F has a maximum at q if and only p˙ 1 = q i if log F has a maximum at q. Now, i q i=1 i X n n q i = qi ((Ap)i p Ap) log(F (p)) = log pi · ! i=1 i=1 X n Y = q Ap p Ap. qi · · = log pi i=1 By assumption, q is an ESS, and so by deﬁnition we have that q Ap p Ap > 0for Xn F˙ · · p in a neighbourhood of q. Hence, F > 0inaneighbourhoodofq.WeknowthatF is = qi log pi. strictly positive on ⌃n,thuswehavethatF>˙ 0andF is a strict Lyapunov function. So, i=1 X q is asymptotically stable and by previous lemmas (2.3 and 2.6), q is an asymptotically By Jensen’s Inequality, stable rest point of the replicator equation.

n p n p log q i q log i i q  i q 2.2 Correspondence with Lotka-Volterra dynamics i=1 i ! i=1 i X X ✓ ◆ n p n Lotka-Volterra dynamics have been well studied and are generally one of the first examples q log i log p . () i q  i of population modelling that one encounters. Consequently a correspondence between i=1 i i=1 ! X ✓ ◆ X replicator dynamics and Lotka-Volterra systems gives us a rapid gain in understanding n for comparatively little e↵ort, and is thus very helpful. Since i=1 pi =1thisgivesus n Throughout this section, p =(pi)i=1 ⌃n will be used to denote variables for replica- n n n 1 n 1 2 P tor dynamics and q =(qi) R+ will be used to denote variables for Lotka-Volterra i=1 2 qi log pi qi log qi 0, dynamics.  i=1 i=1 X X Definition 2.2 (Lotka-Volterra dynamics). The generalised Lotka-Volterra system for with equality if and only if pi = cqi for some constant c and for all i. As we must have n 1 q R+ is given by p ⌃n, the only viable possibility here is that c =1.Thus, 2 n 1 2 n n q˙i = qi ri + bijqj , j=1 ! qi log pi qi log qi, X  i=1 i=1 for i =1,...,n 1, where the r and b are constants. X X i ij for all p ⌃n, with equality if and only if p = q. So the function F has a unique Written thus, the generalised Lotka-Volterra equations and replicator dynamics are 2 maximum at q. both first order non-linear ordinary di↵erential equations. Replicator dynamics involves Next we show that F is a strict Lyapunov function at q, i.e. that F>˙ 0forallp in n equations with cubic terms on an n 1 dimensional space, and the Lotka-Volterra system involves n 1equationswithquadratictermsonan n 1dimensionalspace.A correspondence does not seem that unreasonable.

n 1 Theorem 2.3. There exists a smooth invertible map from p ⌃n : pn > 0 to R+ that maps the orbits of replicator dynamics to the orbits of the{ Lotka-Volterra2 system} 2.2 with r = a a and b = a a . i in nn ij ij nj

10 11 Proof. First notice that by Lemma 2.1 we may assume without loss of generality that the Then last row of A consists of zeros, i.e. (Ap)n =0.Defineqn =1.Considerthetransformation d qi n 1 p˙i = n T : p ⌃n : pn > 0 R+ dt qj { 2 } ! j=1 ! p q, n q˙iP qi j=1 q˙j 7! = n q n q n q given by j=1 j j=1 j ⇥ Pj=1 j pi n 1 n n 1 qi = P qi P P i=1 qi pn = r + b q r + b q n q i ij j n q i ij j j=1 j j=1 ! j=1 j j=1 ! for i =1,...n 1. This has inverse given by X P X P n n n Pn q p = i = pi ql aijpj pi aijpj i n 1 ! j=1 qj +1 l=1 j=1 i=1 j=1 Xn X X X for i =1,...n 1, and P = p q ((Ap) p Ap) . i l i · 1 l=1 pn = n 1 . X q +1 j=1 j By a change in velocity we thus have the replicator equation. Clearly T is smooth and invertible. We nowP show that T maps the orbits of the replicator 1 There are many other interesting results involving the replicator equation which would equation to the orbits of the Lotka-Volterra system (and that T performs the reverse). Assume the replicator equation holds. Then exceed the scope of this work. A full classification of dynamics for n =3wasgivenby Zeeman [20], and specific examples may be found in Hofbauer and Sigmund [11, 7.4]. d p § q˙ = i i dt p ✓ n ◆ pi = ((Ap)i (Ap)n) pn n

= qi aijpj j=1 ! X n 1 = q a + a q p . i in ij j · n j=1 ! X By a change in velocity we can remove the extra p . Noting that we assumed a 0for n nj ⌘ all j, set ri := ain and bij := aij. Then we have the Lotka-Volterra system as required. Assume the Lotka-Volterra equations hold. Then define aij := bij for i, j =1,...,n 1. Define a := r for i =1,...,n 1anda := 0 for j =1,...,n. For convenience of in i nj notation define qn =1.

12 13 A B P2 P2 ⌃A ⌃B A Z23 B Z13

B Z23 Chapter 3 A Z12

A B A B A B Fictitious Play P1 Z12 P3 P1 Z13 P3

Figure 3.1: Simplices and indi↵erence planes for the Shapley system ( 3.4), 1 § ⇡ 2 The fictitious play process was originally defined by Brown in 1951 [2] as a method of computing Nash equilibria in zero-sum games. It is now used primarily as a model of Then the fictitious play process is defined as follows: learning (see Berger’s paper [4] for more of the history). At a given moment, each player computes a best response to the average of his opponent’s past strategies using a payo↵ d pA =BR (pB) pA (3.1) matrix and instantly plays this best response. This causes each player’s average to move dt A continuously towards the best response. In terms of populations this can be interpreted d pB =BR (pA) pB. (3.2) as follows: in each moment of time, a proportion of the current population changes from dt B their current strategy to the best response. As mentioned previously, we are here considering pA(t)andpB(t)tobetheaverages of the players’ past strategies, or alternatively as populations many up of individuals playing 3.1 Definitions pure strategies: it is not that the player is playing a mixed strategy.

Let [A, B] be a bimatrix game. We will also use A and B to refer to the two players, but the context will make clear what is meant. Following the conventions of Sparrow et 3.2 Existence and uniqueness of solutions al. [18], we set the averages of the pure strategies of the players to be pA(t)andpB(t), It is important to note that as the best response is not necessarily unique, fictitious play defining these as row and column vectors respectively. We define the best response sets is a di↵erential inclusion. Consequently we must be very careful regarding existence and BR and BR to be the sets of best responses for each player (see Section 1.2). We also A B uniqueness of solutions. Firstly we notice that there is only a problem on the indi↵erence define: planes defined previously. These form a codimension-one subset of ⌃ ⌃ . Outside vA(t):=ApB(t) A B of this subset, the best responses of each player are unique and so we have⇥ existence of vB(t):=pA(t)B uniqueness of solutions. Generically the best response sets will consist of a pure strategy. If there is more than B A A B Lemma 3.1 (Solutions of fictitious play). If BRA(p )=Pi and BRB(p )=Pj , then one pure best response, then the best response set will consist of all convex combinations fictitious play is a di↵erential equation with solutions of these pure strategies. The strategies for which there are multiple best responses form A A A t A planes in the simplex, called indi↵erence planes as this is where a player is indi↵erent p (t)=(p (0) P )e + P i i between two or more strategies. B B B t B p (t)=(p (0) P )e + P , j j Definition 3.1 (Indi↵erence planes). The indi↵erence planes for players A and B re- for t [0, ✏) for some ✏ > 0. spectively are 2 It is worth noting that the system can be reparametrised in such a way that - assuming B A B A B A B A Zij := p ⌃A : vi (p )=vj (p )=max vk (p ) ⌃A existence and uniqueness of solutions - the players would reach their best response at time 2 k { } ✓ n o 1. This is particularly useful when modelling the system numerically. A B A B A B A B Zkl := p ⌃B : vk (p )=vl (p ) = max vk (p ) ⌃B. Lemma 3.2 (Reparametrisation). The fictitious play system (3.1) can be reparametrised 2 k { } ✓ n o as described above.

14 15 t Proof. Set s =1 e . Then P A P B 2 2 ⌃A ⌃B d dpA dt pA = ds dt ds = et(P A pA(t)) ZB ZA i B 13 A 23 1 Z23 Z12 = (P A pA( ln(1 s))). 1 s i From the solution above, we can calculate P A ZB P A P B ZA P B pA( ln(1 s)) = (pA(0) P A)(1 s)+P A. 1 12 3 1 13 3 i i Then we have Figure 3.2: Demonstration of the failure of the transversality condition in the Shapley d 1 pA = (P A P A (1 s)(pA(0) P A)) system with =0(see 3.4) ds 1 s i i i § A A = Pi p (0). B A that as BRA(p (0)) = Pk , there exists ✏ > 0suchthatforallt with t < ✏,wehave B A | | Following the same procedure for pB, this gives us solutions BRA(p (t)) = Pk . A A B Transversality means that ((1 )p (0) + Pk ) / Zij for any non-zero value of . A A A 2 A A p (s)=(1 s)p (0) + sPi (3.3) Thus, we may choose with 0 < < ✏ such that BRB((1 )p (0) + Pk )isuniqueand A A pB(s)=(1 s)pB(0) + sP B, (3.4) BRB((1 + )p (0) Pk )isunique.Thenwemaydefinethesolutionasfollows: j pA(t)=tP A +(1 t)pA(0) if t < ✏ for s [0, ✏)forsome✏ > 0. k | | 2 pB(0) if t =0 This is all very well when the best response is unique. However, most solutions B A B p (t)= tBRB(p ()) + (1 t)p (0) if 0 0wehaveBRB(p (t)) = Pi and for t<0wehave A B finding Nash equilibria. This was later shown by Shapley to be false, but considerable BRB(p (t)) = Pj (of course possibly with i, j interchanged depending on the system). e↵ort has gone into showing convergence for various classes of games. It is conjectured The system we hope only has uniqueness issues “momentarily” at t =0,andassuch that non-degenerate quasi-supermodular games are one such class, but attempts at a we hope to define a continuous extension to the unique solutions that exist before and proof have so far been unsuccessful. However, steps have been made in this direction, after t=0. To do this properly, we would need to define concepts like continuity for including the following types of games: set-valued maps. This can be done quite naturally, but a full discussion of di↵erential inclusions is somewhat beyond the scope of this project. As such, this is something of a Generic 2 n non-zero-sum games ([5] Berger, 2005). sketch proof: the details of di↵erential inclusions may be found in Aubin and Cellina [1] • ⇥ 3 3 non-degenerate quasi-supermodular games ([9] Hahn, 1999). and the application thereof in Sparrow et al. [18, Proposition 3.1]. Continuing, we see • ⇥

16 17 3 m and 4 4 non-degenerate quasi-supermodular games ([6] Berger, 2007). Now that we have arrived at this set up, we simply show that all switches must be • ⇥ ⇥ improvement steps. This result is the combination of the Improvement Principle due to This section will go through (some of) the ideas, results and proofs from these papers. Un- Sela [16] and its analog called the Second Improvement Principle due to Berger [6]. fortunately the proofs have essentially employed a brute force method which does not ex- tend to higher dimensions. The idea is to show that when a player changes strategy, they Lemma 3.3 (Improvement Principle). Suppose a fictitious play path switches from (i, j) 1 must change to a “better” strategy. Then we use the properties of quasi-supermodularity to (i0,j0) at time t . Then either there are improvement steps (i, j) (i, j0) (i0,j0) or 1 ! ! to show that there does not exist a cycle of strategies to follow with each better than the there are improvement steps (i, j) (i0,j) (i0,j0). last, and thus fictitious play must converge. ! ! We begin as always with definitions, which may be found in any of the papers above. Proof. At time t1,wehavethat(i, j), (i0,j0) BRA(q(t1)) BRB(p(t1)). The players are indi↵erent between both strategies. By definition2 of a switch,⇥ there exists ✏ > 0such Definition 3.2 (Quasi-supermodular games). An n m bimatrix game [A, B] is said to ⇥ that (it,jt)=(i, j)forallt [t0,t1)wheret0 = t1 ✏. Then we may write be quasi-supermodular if given i, i0 N = 1,...,n and j, j0 M = 1,...,m with 2 2 { } 2 { } A B i

ai0j >aij = ai0j0 >aij0 where 1. Rearranging, )  b >b = b >b . ij0 ij ) i0j0 i0j 1 1 (P A,PB)= (p(t ), q(t )) + (p(t ), q(t )) The idea is that as one player increases his strategy, it becomes increasingly advan- i j 1 1 1 1 tageous for the other to also increase their strategy to a higher numbered one. = c(p(t ), q(t )) + (1 c)(p(t ), q(t )), 1 1 0 0 Deﬁnition 3.3 (Non-degeneracy). An n m bimatrix game [A, B] is said to be degenerate 1 ⇥ where c = 1. if either for some i, i0 N with i = i0, there exists j such that a = a or for some 2 6 i0j ij Considering only the part in ⌃B and multiplying by A, j, j0 M with j = j0, there exists i such that b = b . A game is said to be non- 2 6 ij0 ij degenerate if it is not degenerate. AP B = cAq(t )+(1 c)Aq(t ) j 1 0 The crux of the proofs of convergence of ﬁctitious play hinged on the concept of an Now subtract the i-th component from the i0-th component to get improvement step.

ai j aij = c [Aq(t1) Aq(t1) ] +(1 c)[Aq(t0) Aq(t0) ] Definition 3.4 (Improvement steps). We say that (i, j) (i0,j0)(calledanimprovement 0 i0 i i0 i ! 0 step)ifeitheri = i0 and bij0 >bij or j = j0 and ai0j >aij. A sequence of improvement steps forms an improvement path and a cyclical improvement path is called an improvement =(1 c) [Aq(t0) Aq(t0) ] | {z i0 } i cycle. 0asc 1  0. This simple idea is intrinsically linked to the solutions of fictitious play in that the | {z } sequence of best responses must follow an improvement path. Definition 3.5 (Fictitious play paths). A strategy path (i ,j) N M for t [0, ) t t Now that we have the Improvement Principle, we are able to prove the following is a (continuous) fictitious play path if for almost every t 1 we2 have⇥ 2 1 theorem: (i ,j) BR (q(t)) BR (p(t)), (3.5) t t 2 A ⇥ B Theorem 3.2. The fictitious play process converges for every 3 m non-degenerate ⇥ where the path (p(t), q(t)) solves the fictitious play equations (3.1). quasi-supermodular game (NDQSMG) With this definition, we may consider the indi↵erence planes as being the places where Proof. Without loss of generality we may assume that there are no dominated strategies. the fictitious play path switches. Then in a NDQSMG, we begin with the following set up of improvement steps. Then we look for a step that goes “up”: to be precise, a step of the form (a, j) (b, j) Definition 3.6 (Switching). A fictitious play path (it,jt)issaidtoswitch from (i, j)to where a>b. Clearly a cycle must contain such a step. There are only three possibilities! (i0,j0)attimet0 if (i, j) =(i0,j0)andthereexists✏ > 0suchthat for going “up”: a cycle must contain either a step (3,j) (1,j), a step (2,j) (1,j)or 6 ! ! (i ,j)=(i, j)fort [t ✏,t )(3.6) astep(3,j) (2,j). We consider each case. t t 2 0 0 ! (i ,j)=(i0,j0)fort (t ,t + ✏]. (3.7) 1Notice that here we are assuming solutions exist t t 2 0 0

18 19 (1, 1) (1,m) (1,j)

(3, 1) (3,m) (3,j)

Case 1 Suppose the cycle contains a step (3,j) (1,j). Then by quasi- Thus, it is not possible to go “up”. Thus any improvement path must be finite, and ! supermodularity, we have steps (3,k) (1,k)forallk =1,...,j. Then from (1,j) so fictitious play must converge. the cycle must continue by going left or! down. We do not want to reach the equilibrium (1, 1), therefore at some point in the cycle there must be a step downwards from This method of proof is somewhat brute force. It can be applied to many situations, but is inadequate to prove convergence for general n m NDQSMGs: indeed, in [6] (1,j0)wherej0 j. This step cannot be to (3,j0)becauseweknowthereisastep  Berger found an example of a 4 5 NDQSMG which does⇥ have an improvement cycle (3,j0) (1,j0). So, there must be a step (1,j0) (2,j0). By quasi-supermodularity ! ! (but conjecturally its fictitious play⇥ converges nonetheless). there are steps (1,k) (2,k)forallk = j0,...,mand specifically for k = j. This implies there is a step (3,j) ! (2,j), and similar steps for k =1,...,j. Having thus extracted ! as much information as we can from our assumption and quasi-supermodularity, we have 3.4 The Shapley system the following picture. Originally it was thought that fictitious play of all games would converge. Then in (1,j ) (1,j) 0 1964, Lloyd Shapley presented an example where fictitious play does not converge, but rather cycles between strategies delineating a triangle in the simplex [17]. This example was extended by Sparrow et al. [18] to a one-parameter family of games, and numerical observations suggest that periodic orbits can be found in other games as well. The (3,j) Shapley system has payo↵ matrices 10 10 Now consider how we arrived at the point (3,j). It is clear from the picture that the A = 10 ,B= 0 1 , for some ( 1, 1). improvement path must have come from the bottom left corner (3, 1). This point cannot 0 1 0 1 0 1 10 2 be part of an improvement cycle. Thus there can be no such step (3,j) (1,j)inan ! @ A @ A improvement cycle. Shapley’s original example was this system with =0.Generallyweheretake between Case 2 Suppose the cycle contains a step (2,j) (1,j). Quasi-supermodularity gives -1 and 1. This has Nash equilibrium at the barycentre of ⌃A ⌃B: us steps (2,k) (1,k)fork =1,...,j.From(1!,j)wemustgoleftordown,andto ⇥ ! A B 1 1 1 1 1 1 T avoid the equilibrium and follow a cycle we must hence have a step (1,j0) (3,j0), where [E ,E ]= ( , , ), ( , , ) ⌃ ⌃ . ! 3 3 3 3 3 3 2 A ⇥ B 1

20 21 A B P2 P2 ⌃B ⌃A ⌃B A 1 A 2 A 3 ! ! ! ZB ZA B 13 A 23 Z23 Z12 B 2 !

⌃ B 3 A ! A B A B A B P1 Z12 P3 P1 Z13 P3 B 1 ! Figure 3.3: Periodic orbit for the Shapley system, =0. Figure 3.5: Transition diagram for > 0 3.5 Open questions

There are many open questions pertaining to ﬁctitious play. As discussed in Section 3.3, In Figures 3.4 and 3.5 above, the region where A 1andB 1isindicatedin ! ! there are many classes of games for which the dynamics are unknown. We now consider blue. Looking at the simplices, one can see that from a position in ⌃A that is in the B some of the questions related to the Shapley system as in the previous section. blue region but above the red dashed line, the next indi↵erence plane crossed will be Z13 B and not Z12. This is represented by the arrows and the red dashed line shown in the transition diagram. The transition diagrams are only a representation: certainly orbits 3.5.1 Transition diagrams in the simplex are not in bijective with paths drawn in the transition diagrams. These

The state space of the Shapley system is ⌃A ⌃B. This creates an issue with visualisation: representations are, however, useful, as they provide an easy way of depicting various it is very dicult to mentally picture things⇥ in a four dimensional space. To help with questions: for example, do there exist orbits of the following type? this problem, Sparrow et al. [18] thought of a way of representing ⌃ ⌃ that enables A ⇥ B one to see quickly and easily what indi↵erence planes are crossed in what order. It is a ⌃B similar idea to the diagrams used by Hahn [9] and Berger [6] but also incorporates the A 1 A 2 A 3 concept of indi↵erence planes. ! ! !

P A P B 2 2 B 2 ⌃A ⌃B ! A Z23 B Z13 ⌃ B 3 A ! B Z23 A Z12 B 1 !

A B A B A B P1 Z12 P3 P1 Z13 P3

1 Figure 3.4: Simplices and indi↵erence planes for the Shapley system, 2 ⇡ The proper phrasing of this question is as follows: for the Shapley system with > 0, B Given the simplices as above, we consider the (four-dimensional) regions where A i do there exist orbits beginning on Z13 that cross the following indi↵erence planes in this and B j. There are three possibilities for the best responses of each of A and!B, exact order: thus giving! us nine such preference regions. We represent these by the grid as follows, B A B A B A B B identifying the top edge with the bottom and the left edge with the right. Z13,Z23,Z13,Z13,Z12,Z12,Z12,Z13.

22 23 A B Equivalently, do there exist orbits such that the sequences of players’ best responses P2 P2 is as follows: ⌃A ⌃B A B B Z23 p1 = p2 A 2331122 B A A B B . 1 Z13 p2 = p3 p2 = p3 B 3311221 2 0 ✓ ◆ 1 B 0 6 Drawn in the simplices, an orbit of this type would look approximately like this, Z23 A beginning at the blue dots: 6 Z12 5 2 5 4 3 3 4 P A P B 2 2 P A ZB P A P B ZA P B ⌃A ⌃B 1 12 3 1 13 3 A Z23 B pA = pA Z13 1 3

B Figure 3.6: Orbit with numbers depicting the points at which an indi↵erence plane is Z23 A traversed. Z12

B B A A A B A B A B At 2, require p2 < p3 to next cross Z13 rather than Z23. P1 Z12 P3 P1 Z13 P3 • At 3, require pA < pA to next cross ZB rather than ZB . • 2 3 12 13 At 4, require pB > pB to next cross ZA rather than ZA . The transition diagrams make it possible to instantly see which regions an orbit may • 1 3 12 13 move into. However, these arrows only describe possible movements from one region to At 5, require pA < pA to next cross ZB rather than ZB . another: sequences of arrows are another matter entirely. The diagram does not mean • 1 3 12 23 that for any path drawn following the arrows there actually exists an orbit (or orbits) that With point 2, for example, this could be more explicitly written as follows: let t2 > 0 A B B B follows this sequence. This is the question I have been considering, focusing specifically be the time such that p (t2) Z13. Then we require that p2 (t2) < p3 (t2). For ease of on the paths shown above before attempting to generalise. Numerical experimentation calculation we use the reparametrised2 system (3.2). Let us now commence a numerical suggests that such an orbit is impossible, and indeed this appears to be the case. investigation of the orbit. The first thing to notice is that it is sucient to consider an orbit starting on the As an example, we calculate the first few inequalities explicitly. The first is clear by 2 B B boundary @(⌃A ⌃B). This is because the system may be projected onto the boundary inspection: we require p1 (0) < p2 (0). Then, let the time at which the orbit reaches ⇥ A B A (see [19, 3]). Z23 be t1 > 0. On this first leg of the orbit (0

24 25 The last line comes from substituting b3 =1 b1 b2 and is performed because it is 3 then easier to get Maple to plot the correct diagrams. Hence t1 may be considered as a 4 function of b1 and b2. Clearly we require t1 > 0. Plotting this in Maple ,wehave

(a) t2 > 0 (orange) (b) s2 > 0 (cyan) (c) t2 >s2 (purple)

Figure 3.8: Plots for t2 and s2

exists a region such that all of the conditions are satisﬁed at once, then the orbit is possible. If upon overlapping the plots there is no such region, then the orbit is impossible. We thus get the following plots: (a) t = 0 and t = (b) t > 0 1 1 1 1

Figure 3.7: Plots for t1

With some thought, these plots of ⌃B are sensible. The blue lines are (extended) A B A indi↵erence planes and the red area is the region where BR (p (0)) = P2 . The green A B line denotes t1 =0andcoincideswiththeindi↵erence plane Z23:iftheinitialpointp (0) A is on Z23, then it will take no time to get there. The red line denotes t1 = :iftheinitial B B B 1 (a) s3 t5 (yellow) point p (0) is in fact P3 , then p (t)willbeconstantfort [0,t1] and will thus never A 2 reach Z23. The orange region on the right hand figure is where t1 > 0. Notice that in Figure 3.9: Plots for t3, t4 and t5. plotting this we are currently ignoring the previous conditions on b1 and b2 (for example the condition for BR (pB(0)) = P A) and are actually extending the function t (b ,b ) A 2 1 1 2 It is clear that for this particular value = 2 , the yellow region where s >t does not from its original domain determined by these conditions to the entirety of 2. 5 5 5 R intersect the red initial region. Thus for = 2 , no orbit may follow the desired sequence Performing similar calculations to find t (0, 1) such that pA(t + t ) ZB ,wefind 5 2 2 1 2 2 13 of indi↵erence planes. 4 1 b + b +2b + b For other values of beta, it is slightly less clear. For example for = we instead get t = 2 1 2 1 . (3.12) 5 2 2 2 the plot Figure 3.10 (a). However, considering also the plot for s2

(b1 b2) s2 = . (3.13) b1 +2b2

Now we require t2 >s2. This forms the purple region in Figure 3.8: Once again, the blue lines denote extended indi↵erence planes, the green lines denote where the relevant quantity is 0, and the red lines where the relevant quantity is . The 1 additional edges of the purple region visible in Figure 3.8 (c) are where t2 = s2. This plot tells us that there do exist regions such that both t2 >s2 and the initial conditions are satisﬁed. Omitting calculations, we continue plotting the further conditions: if there (a) s5 >t5 (yellow) (b) Plots for t5 and t2 overlaid

3 4 Plotting in the simplex can be achieved in Maple by creating a standard plot with b1 against b2 Figure 3.10: Plots for = 5 . and then using the transform function in the plottools library to send (x, y)to(1 x 1 y, p3 y). This 2 2 method proved easier than correctly orienting a 3d plot. Here, there is clearly nowhere where the initial conditions are satisﬁed (red), t2 >s2 4Plots shown are for = 2 ; other values look similar. 5 (purple), and s5 >t5 (yellow). Thus the orbit is still impossible. This appears to happen

26 27 for all values (0, 1). Of course, the2 above reasoning is only numerical and does not constitute a fully rigorous proof. However, it provides a good intuition as to why this particular sequence cannot be realised by a concrete orbit of ﬁctitious play. Chapter 4

The link between replicator dynamics and ﬁctitious play

As we have seen, there are quite some similarities between fictitious play and replicator dynamics, and a bit of numerical experimentation lends credence to the idea of there being some sort of link between the two. As noted by Gaunersdorfer and Hofbauer [8] and in the book by Hofbauer and Sigmund [11], the general rule of thumb is that in the long term, the time averages of solutions to the replicator equation behave as solutions for fictitious play. This was made more precise in Bena¨ım et al. [3], and this chapter will carefully go through the ideas and proofs presented in that paper. The main statement is that the time averages of solutions to replicator dynamics approximate the fictitious play process, and that this approximation gets better as t . We begin by generalising. Instead of considering replicator!1 dynamics and fictitious play as involving matrices at all, we instead just focus on one player (or population). This player receives a stream of outcomes: if he plays strategy i at time t, he will receive the payo↵ Ui(t). The key point here is that the player has no idea of what is behind these outcomes; no idea even of how many other players he is facing, let alone what they are doing, what matrices are possibly involved, whether there is some sensible mechanism behind the calculations of the payo↵s or whether it is just random. For simplicity, we assume the payo↵stobeboundedandmeasurable.Formally,withn the number of available strategies and c>0 a bound on the payo↵s, we have an outcome process: = U(t) [ c, c]n,t 0 . U { 2 } As noted above, Ui(t) is the payo↵ the player receives from playing strategy i at time t. ¯ 1 t Then the time average of the payo↵sisU(t)= t 0 U(s)ds. We next define best responses: for U [ c, c]n, 2 R BR(U)= p ⌃n : p,U =maxq,U . q ⌃n { 2 h i 2 h i} Now we define generalised fictitious play for by U 1 p˙ (t) [BR(U¯(t)) p(t)]. (4.1) 2 t If we set U(t)=Aq(t)whereq(t)istheopponent’sstrategyattimet, then we recover the original fictitious play equation (3.1) for player one.

28 29 We define generalised replicator dynamics by Now it happens that D⌘(U)isasubsetof[BR]⌘(✏)(U)anditisconvenientlyeasyto U ⌘ show that L ✏ D (U). This is the idea of the proof. p˙i(t)=pi(t)[Ui(t) p(t),U(t) ] i 1, . . . , n. (4.2) 2 h i 2 Proposition 1. D⌘(U) [BR]⌘(U). ✓ Here we may set U(t)=Ap(t)torecovertheoriginalversion. ⌘ Proof. Suppose p D (U). Then, given ` such that U` =maxj=1,...,n Uj, clearly we have To show the time averages of solutions to replicator dynamics approximate the ficti- that the `-th unit2 vector is a best response, i.e. P ` =(0,...,0, 1, 0,...,0) BR(U). tious play process, we first need a sensible description of an approximation to the fictitious ⌘ 2 Then for p D (U), we know that p` 1 ⌘. Thus: play process. We do this by approximating the best responses: 2 ` (P )` p` = 1 p` n | | | | Definition 4.1 (Approximation to best responses). Given U [ c, c] , ⌘ > 0, define 1 (1 ⌘) = ⌘. the ⌘-neighbourhood of BR(U)asfollows: 2  | | For i = `, ⌘ 6 [BR] (U)= p ⌃n q BR(U): p q < ⌘ , (4.3) { 2 | 9 2 k k1 } (P`)i pi = 0 pi = pi ⌘. | | | | | |  where is the supremum norm. Hence, for every p D⌘(U), there exists q [BR]⌘(U)suchthat k · k1 2 2 We will now show that the logit map can be used to find approximate best responses. q p ⌘. k k Then, we will use the logit map to find solutions of the replicator equation for a given . Thus D⌘(U) [BR]⌘(U). U ✓ n Definition 4.2 (Logit map). Define L : R ⌃n by ! Continuing the proof of Lemma 4.1, let ✏(⌘)satisfy ⌘ exp(Ui) exp( )=⌘. (L(U))i = n . (4.4) ✏ j=1 exp(Uj) Then for all i, k 1,...,n , 2 { } Lemma 4.1. [12, Proposition 3.1] For everyP U [ c, c]n and every ✏ > 0, there exists a 2 function ⌘(✏) with ⌘(✏) 0 as ✏ 0 such that U exp( Ui ) ! ! Li = ✏ ✏ Uj U j exp( ) L [BR]⌘(✏)(U). ✓ ◆ ✏ ✏ 2 exp( Ui ) exp( Uk ) ✓ ◆ = P ✏ ✏ exp( Uj ) · exp( Uk ) Proof. Given ⌘ > 0, define j ✏ ✏ (U U ) i k ⌘ Pexp( ✏ ) D (U)= p ⌃n Ui + ⌘ < max Uj = pi ⌘ for i =1,...,n . (4.5) = U U . j=1,...,n j k { 2 | )  } j=k exp( ✏ ) 6 This rather awkward-looking definition is actually quite simple once fully understood. This holds for every i, k 1,...,n .P Specifically, this holds for k = ` such that U` = 2 { } To illustrate the idea, consider the following example. maxj Uj. Then if Ui + ⌘ 0, we may take ⌘(✏) to satisfy the inverse of the equation for ✏(⌘), that is let ⌘(✏)satisfy ⌘ 0 ✏ = . D⌘ 0 = (1 ) ⌃ < ⌘, [0, 1] . log ⌘ 0 1 80 1 3 9 1 1 2 | 2 Then for ✏ < ✏(⌘), we have < = @ A @ A U ⌘ ⌘(✏) Here, maxj Uj = U3 =1,andwehave: U1 +⌘ = ⌘ < 1andU2 +⌘ = ⌘;< 1. Thus we require L D (U) [BR] (U), ✏ 2 ⇢ that p1,p2 < ⌘, and furthermore we want the set of all such p ⌃3 with p1,p2 < ⌘. This ✓ ◆ gives the set above. 2 where ⌘(✏) 0as✏ 0. ! ! 30 31 We now use the logit map to describe solutions of replicator dynamics, called “con- 4.1 Time averages tinuous exponential weight” (CEW) by Hofbauer, Sorin, and Viossat [12]. Definition 4.4. Define the time average of p(t)by Definition 4.3 (Continuous exponential weight). Given , define U 1 t t P(t)= p(s)ds. (4.8) t p(t)=L U(s)ds . (4.6) Z0 ✓Zo ◆ For p(t) given by CEW, we have 1 1 1 n t L maps R to ⌃n, so p(t) ⌃. Notice also that p(0) = ( n , n ,..., n ). Later we will d 1 2 P˙ (t)= p(s)ds dt t Theorem 4.1. [12, Proposition 4.1] Continuous exponential weight satisfies the replicator ✓ Z0 ◆ equation. 1 1 t = p(t) p(s)ds t t2 Proof. By definition of L,wehave Z0 1 = (p(t) P(t)). t t exp 0 Ui(s)ds pi(t)= . Thus, exp⇣R t U (s)d⌘s 1 j 0 j P˙ (t) [BR](t)(U¯(t)) P(t) , (4.9) 2 t P ⇣R ⌘ Di↵erentiating log(pi(t)), with (t) 0ast . This clearly looks like an approximation to fictitious play. However, under! continuous!1 exponential weight as defined previously, p(0) = ( 1 , 1 ,..., 1 ), p˙ (t) d t t n n n k = log exp U (s)ds log exp U (s)ds whereas we would like to discuss solutions with any given initial condition. p (t) dt i j k " 0 j 0 !# ✓ ✓Z ◆◆ X ✓Z ◆ Theorem 4.3. [12, 4.4] The solution of the replicator process (4.2) with initial condition t t § d d p(0) int(⌃n) is given by = Ui(s)ds log exp Uj(s)ds 2 dt dt t 0 j 0 ! Z X ✓Z ◆ p(t)=L U(0) + U(s)ds , (4.10) d t 0 dt j exp 0 Uj(s)ds ✓ Z ◆ = U (t) i ⇣ t ⌘ where Ui(0) = log(pi(0)). Then the time average process satisfies P ` exp 0R U`(s)ds t ˙ 1 ↵(t) ¯ R exp Uj(s)ds P(t) [BR] (U(t)) P(t) , (4.11) P 0 2 t = Ui(t) Uj(t) 0 ⇣ t ⌘ 1 j expR U`(s)ds with ↵(t) 0 as t . X ` 0 ! !1 @ A = Ui(t) p(t),U(t) .P ⇣R ⌘ Proof. We may check that (4.10) does indeed satisfy the replicator process (4.2) with the h i correct initial condition. Then it can be seen that t 1 ⌘( 1 ) 1 p(t)=L U(0) + U(s)ds // = L(tU¯(t)+t U(0))// [BR] t (U¯(t)+ U(0)), Theorem 4.2. [12, Proposition 4.2] CEW satisfies · t 2 t ✓ Z0 ◆ p(t) [BR](t)(U¯(t)), (4.7) where ⌘ 1 0ast . This can be rewritten as 2 t ! !1 1 for some (t) 0 as t . P˙ (t) [BR]↵(t)(U¯(t)) P(t) , ! !1 2 t Proof. By Lemma 4.1, with ↵(t) 0ast as required. t ! !1 p(t)=L U(s)ds This intuitively looks like an approximation to the fictitious play process, and indeed ✓Z0 ◆ this idea is broadly correct. The full details involve stochastic approximations of di↵eren- ⌘( 1 ) = L(tU¯(t)) [BR] t (U¯(t)). tial inclusions (see [3]) and even stating these results becomes rather complicated and as 2 such is beyond the scope of this discussion. In essence the limit set of the time averages 1 1 Then we simply set ✏ = t and (t)=⌘( t ). of the replicator process is a subset of the limit set of the fictitious play process.

32 33 Chapter 5 Bibliography

Conclusion [1] Aubin, J. P., Cellina, A., 1984. Di↵erential Inclusions. [2] Brown, G. W., Iterative solution of games by fictitious play, in: T.C. Koopmans (Ed.), Activity Analysis of Production and Allocation, Wiley, New York, 1951, pp. Replicator dynamics and fictitious play are two interesting examples of evolutionary 374-376. games. We have seen in Chapter 2 that the behaviour of the replicator system in the long term ties in with the concept of Nash equilibria and that orbits of the replicator system [3] Bena¨ım, M., Hofbauer, J., Sorin, S., 2005. Stochastic approximations and di↵erential are equivalent to those of the Lotka-Volterra system. In Chapter 3, we have seen that fic- inclusions. SIAM J. Control and Optimization 44 (1), 328-348. titious play is known to converge for zero-sum games and various sizes of non-degenerate [4] Berger, U., 2007. Brown’s original fictitious play. J. Econ. Theory 135, 572–578 quasi-supermodular games ( 3.3), but for the Shapley system ( 3.4) the behaviour of § § fictitious play orbits is highly complicated and not yet completely understood. Finally [5] Berger, U., 2005. Fictitious play in 2 n games. J. Econ. Theory 120 (2), 139-154. in Chapter 4, using the logit map and continuous exponential weight, we have seen that ⇥ as proved in Bena¨ım et al. [3], the time averages of solutions to the replicator system [6] Berger, U., 2007. Two more classes of games with the continuous-time fictitious play approximate the fictitious play process, with this approximation improving as t . property. Games Econ. Behav. 60 (2), 247-261. These are only some examples of dynamics in games. There are many other systems!1 [7] Fudenberg, D., Levine, D., 1998. The theory of learning in games. MIT Press series modelling di↵erent evolutionary behaviour, such as the more general setting of adaptive on economic learning and social evolution. dynamics [11, 9]. It is a wide area with many questions for the future. § [8] Gaunersdorfer, A., Hofbauer, J., 1995. Fictitious play, Shapley polygons, and the Acknowledgements replicator equation. Games Econ. Behav. 11 (2), 279–303. [9] Hahn, S., 1999. The convergence of fictitious play in 3 3 games with strategic Many thanks to Sebastian van Strien for his calm and stress-free supervision without complementarities. Economics Letters 64 (1), 57–60. ⇥ which I would have lost all sanity many months ago; to Samir Siksek for being encouraging and nice; and to Georg Ostrovski for his invaluable comments and help with proofreading. [10] Hofbauer, J., 2000. From Nash and Brown to Maynard Smith: Equilibria, Dynamics, and ESS. Selection 1, 81–88. [11] Hofbauer, J., Sigmund, K., 2003. Evolutionary Game Dynamics. Bull. Am. Math. Soc. 40 (4), 579–519. [12] Hofbauer, J., Sorin, S., Viossat, Y., 2009. Time average replicator and best-reply dynamics. Mathematics of Operations Research 34 (2), 263–269. [13] Nash, J., 1950. Non-cooperative games. Thesis, Princeton University. [14] Ostrovski, G., van Strien, S., 2011. Piecewise linear Hamiltonian flows associated to zero-sum games: transition combinatorics and questions on ergodicity. Regular and Chaotic Dynamics 16, 128–153. [15] Robinson, J., 1951. An iterative method of solving a game. Annals of Mathematics 54 (2), 296–301.

34 35 [16] Sela, A., 2000. Fictitious play in 2 3 games. Games Econ. Behav. 31, 152–162. ⇥ [17] Shapley, L.S., 1964. Some topics in two-person games. In: Advances in Game Theory. Princeton Univ. Press, Princeton, NJ, pp. 1-28.

[18] Sparrow, C. et al., 2008. Fictitious play in 3 3 games: The transition between periodic and chaotic behaviour. Games Econ. Behav.⇥ 63 (1), 259–291.

[19] van Strien, S., Sparrow, C., 2011. Fictitious play in 3 3 games: chaos and dithering behaviour. Games Econ. Behav. 73 (1), 262–286. ⇥

[20] Zeeman, E.C., 1980. Population dynamics from game theory. In: Global Theory of Dynamical Systems. Lecture Notes in Mathematics 819. Springer-Verlag.