Dynamics in Games

0820964

March 2012

Contents

1 Game Theory
  1.1 Games in normal form
  1.2 Bimatrix games
2 Replicator Dynamics
  2.1 Long term behaviour and convergence
  2.2 Correspondence with Lotka-Volterra dynamics
3 Fictitious Play
  3.1 Definitions
  3.2 Existence and uniqueness of solutions
  3.3 Games with convergent fictitious play
  3.4 The Shapley system
  3.5 Open questions
    3.5.1 Transition diagrams
4 The link between replicator dynamics and fictitious play
  4.1 Time averages
5 Conclusion
Introduction

Game theory is a subject with a wide range of applications in economics and biology. In this dissertation, we discuss evolutionary dynamics in games, where players' strategies evolve dynamically, and we study the long-term behaviour of some of these systems. The replicator system is a well-known model of an evolving population in which successful subtypes of the population proliferate. Another system known as fictitious play (also called best response dynamics) models the evolution of a population in which at any given time some individuals review their strategy and change to the one which will maximise their payoff. We will investigate these two systems and their behaviour, and finally explore the link between the two.

Chapter 1

Game Theory

We begin by briefly going over the game-theoretic ideas and definitions that we hope the reader is already familiar with, along with some of the biological motivation behind the technical details. The material presented here can be found in any number of books on game theory: see for example Hofbauer and Sigmund [11, §6].

1.1 Games in normal form
Generally we assume an individual can behave in n different ways. These are called pure strategies, denoted by P_1 to P_n, and could represent things like "fight", "run away", "play rock in a game of Rock Paper Scissors", or even "sell shares" in a more economic context. For biology, we take it that one individual plays one pure strategy. Then the make-up of the population can be described by the proportions of individuals playing each pure strategy. These proportions are simply said to be the strategy of the population as a whole. As the proportions must sum to one, the set of possible strategies is a simplex:
    Σ_n = { p = (p_1, …, p_n) ∈ R^n : p_i ≥ 0 and Σ_{i=1}^n p_i = 1 }.

Here p_i can be thought of as the chance of randomly picking an individual who uses strategy i. By definition a population in which everybody plays strategy P_i corresponds to the i-th unit vector in the simplex. We then consider games to be played by a population that evolves continuously as time progresses, describing a path in the simplex.

To model some form of evolution, we need to somehow quantify what happens when one individual encounters another: which strategy will perform better? Given that we know the composition of the population, how well will one strategy perform on average over many encounters? In this context, the phrases "perform better" or "succeed" refer to an increase in evolutionary fitness: more food, more territory, more females - anything that increases the chance of reproduction.

The way to model this is to assign a "payoff": when an individual using strategy i meets a second individual following strategy j, the i-strategist will receive a payoff of a_ij. This creates an n × n payoff matrix A = (a_ij). Then a population with strategy p playing against a population with strategy q will receive a payoff

    p · Aq = Σ_{i=1}^n Σ_{j=1}^n a_ij p_i q_j.

A game described this way is called a game in normal form. This is as opposed to extensive form, where a game is written as a tree. Extensive form can describe games with imperfect or incomplete information, such as where players do not know the payoffs or when the payoffs are subject to (random) noise.

The next concept needed is that of a best response.

Definition 1.1 (Best responses). A strategy p is a best response to a strategy q if for every p′ in Σ_n, we have

    p · Aq ≥ p′ · Aq.

We denote the set of best responses to q by BR(q). Essentially, a best response to q is the strategy (or strategies - best responses are not necessarily unique) that will maximise the payoff. This is thus a useful concept as many of the systems we will consider will involve a population attempting to maximise its payoff.

If a strategy is a best response to itself, this is called a Nash equilibrium, named for John Forbes Nash. More formally:

Definition 1.2 (Nash equilibria). A strategy p is a Nash equilibrium if

    p · Ap ≥ q · Ap    for all q ∈ Σ_n.

If a strategy is the unique best response to itself, then it is said to be a strict Nash equilibrium.

Notice that p is a Nash equilibrium if and only if its components satisfy

    (Ap)_i = c    for all i such that p_i > 0,

for some constant c > 0, where

    p_1 + ⋯ + p_n = 1.

It is well known that every finite game has a (not necessarily unique) Nash equilibrium. Multiple proofs, including Nash's original proof from his thesis [13], may be found in a paper by Hofbauer [10, Theorem 2.1].

It seems sensible that if a strategy is a Nash equilibrium, then the population should stop evolving at this state, as it is already following the best strategy. This brings us to questions of stability of equilibria.

Definition 1.3 (Evolutionarily stable states). A strategy p̂ is said to be an evolutionarily stable state (ESS) if

    p̂ · Ap > p · Ap    (1.1)

for every p ≠ p̂ in a neighbourhood of p̂.

1.2 Bimatrix games

An n × m bimatrix game covers the situation where there are two players or populations, one with n strategies and one with m strategies. In normal form this can be represented by a pair of n × m matrices [A, B], so that if Player A plays strategy i and Player B plays strategy j, then the payoff to Player A will be a_ij and the payoff to Player B will be b_ij. This allows for the representation of games such as the Prisoner's Dilemma or Rock, Paper, Scissors. The concepts of Nash equilibria and best responses may then be generalised to bimatrix games. For simplicity of notation we write the strategy p ∈ Σ_A of Player A as a row vector and the strategy q ∈ Σ_B of Player B as a column vector.

Definition 1.4 (Best responses). A strategy p̂ ∈ Σ_A is a best response of Player A to the strategy q ∈ Σ_B of Player B if for every p ∈ Σ_A we have

    p̂Aq ≥ pAq.    (1.2)

Similarly a strategy q̂ ∈ Σ_B is a best response of Player B to the strategy p ∈ Σ_A of Player A if for every q ∈ Σ_B we have

    pBq̂ ≥ pBq.    (1.3)

We denote by BR_A(q) ⊂ Σ_A and BR_B(p) ⊂ Σ_B the sets of best responses of Players A and B to strategies q and p respectively.

Definition 1.5 (Nash equilibria). A strategy pair (p̂, q̂) is a Nash equilibrium for the bimatrix game [A, B] if p̂ is a best response to q̂ and q̂ is a best response to p̂: that is, (p̂, q̂) ∈ BR_A(q̂) × BR_B(p̂).
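These definitions can be checked mechanically for small games. The following sketch (Python with NumPy; the Prisoner's Dilemma payoffs are an illustrative textbook choice, not numbers taken from this text) computes pure best responses and verifies that mutual defection is a Nash equilibrium while mutual cooperation is not:

```python
import numpy as np

# Illustrative Prisoner's Dilemma: strategies 0 = Cooperate, 1 = Defect.
# (These payoffs are a standard textbook choice, not from the text.)
A = np.array([[3.0, 0.0],
              [5.0, 1.0]])   # payoffs to Player A
B = A.T                      # symmetric game: payoffs to Player B

def best_responses(M, q):
    """Indices of pure best responses given the payoff vector M @ q."""
    v = M @ q
    return set(np.flatnonzero(np.isclose(v, v.max())))

def is_pure_nash(A, B, i, j):
    """Check Definition 1.5 for the pure strategy pair (i, j)."""
    return (i in best_responses(A, np.eye(2)[j])         # i best response to j
            and j in best_responses(B.T, np.eye(2)[i]))  # j best response to i

print(is_pure_nash(A, B, 1, 1))  # (Defect, Defect) is a Nash equilibrium: True
print(is_pure_nash(A, B, 0, 0))  # (Cooperate, Cooperate) is not: False
```

Player B's best responses are read from B.T here, since Player B's payoff from pure strategy j against p is the j-th entry of the row vector pB.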
Chapter 2

Replicator Dynamics

The replicator equation is a long-standing simple population model for the evolutionary success of different subtypes within a population. It can be found in many books on evolutionary game dynamics, including those by Hofbauer and Sigmund [11, §7] and by Fudenberg and Levine [7, §3].

If a population is very large, then one individual is, evolutionarily speaking, a very small insignificant part of the population. Consequently we tend to ignore the fact that populations come in discrete sizes, and assume the population changes continuously over time. Let us denote by p_i the proportion of type i individuals within the population. Then p = (p_1, …, p_n) ∈ Σ_n can be considered as a continuous function of time t. The relative success of each type will be dependent on the overall composition of the population. Denote this fitness by f_i(p). Then the average fitness for the population will be f̄(p) := Σ_{i=1}^n p_i f_i(p).

Definition 2.1 (Replicator dynamics). Given p(t) ∈ Σ_n, f_i : Σ_n → R, the replicator equation is

    ṗ_i = p_i (f_i(p) − f̄(p)),    i = 1, …, n.    (2.1)

Notice that we can add a function ψ(p) to each f_i without changing the dynamics. In the case where fitness is given by a payoff matrix A, this simplifies to

    ṗ_i = p_i ((Ap)_i − p · Ap),    i = 1, …, n.    (2.2)

Lemma 2.1. The addition of a constant c to the j-th column of A does not change the dynamics.

Proof. Let B be the matrix A with the addition of a constant c to the j-th column. Then,

    (Bp)_i = Σ_{k=1}^n a_ik p_k + c p_j,
    p · Bp = Σ_{k=1}^n Σ_{l=1}^n a_kl p_k p_l + c Σ_{k=1}^n p_j p_k.

Noting that Σ_k c p_j p_k = c p_j Σ_k p_k = c p_j, we see that

    p_i ((Bp)_i − p · Bp) = p_i ((Ap)_i + c p_j − (p · Ap + c p_j))
                          = p_i ((Ap)_i − p · Ap).

Thus the dynamics do not change. ∎

Note that it follows from Lemma 2.1 that we may rewrite A in a simpler form, for example having only 0 in the last row. The following useful lemma (given as an exercise in Hofbauer and Sigmund [11, p. 68]) allows us to transform the system so that we may move a point of interest such as a Nash equilibrium to the barycentre.

Lemma 2.2. The projective transformation p → q given by

    q_i = p_i c_i / Σ_{l=1}^n p_l c_l

(with c_j > 0) changes (2.2) into the replicator equation with matrix Ã = (a_ij c_j^{−1}). Notice that this enables us to move a specific point p = (p_i) to the barycentre (1/n, 1/n, …, 1/n) by taking c_i = p_i^{−1}.

Proof. This calculation is somewhat tedious but necessary as the result will be useful later. First, let us calculate

    (Ãq)_i = Σ_{j=1}^n a_ij c_j^{−1} q_j
           = Σ_{j=1}^n a_ij c_j^{−1} p_j c_j / Σ_{l=1}^n p_l c_l
           = (1 / Σ_{l=1}^n p_l c_l) (Ap)_i.

Also we have

    q · Ãq = Σ_{i=1}^n Σ_{j=1}^n a_ij c_j^{−1} q_i q_j
           = (1 / (Σ_{l=1}^n p_l c_l)²) Σ_{i=1}^n Σ_{j=1}^n a_ij c_i p_i p_j.
Then by definition,

    dq_i/dt = d/dt ( p_i c_i / Σ_{l=1}^n p_l c_l )
            = ṗ_i c_i / Σ_{l=1}^n p_l c_l − ( p_i c_i / (Σ_{l=1}^n p_l c_l)² ) Σ_{l=1}^n ṗ_l c_l
            = ( p_i c_i / Σ_{l=1}^n p_l c_l ) ( Σ_{j=1}^n a_ij p_j − Σ_{i=1}^n Σ_{j=1}^n a_ij p_i p_j )
              − ( p_i c_i / (Σ_{l=1}^n p_l c_l)² ) Σ_{k=1}^n p_k c_k ( Σ_{j=1}^n a_kj p_j − Σ_{i=1}^n Σ_{j=1}^n a_ij p_i p_j )
            = ( p_i c_i / Σ_{l=1}^n p_l c_l ) ( Σ_{j=1}^n a_ij p_j − (1 / Σ_{l=1}^n p_l c_l) Σ_{k=1}^n Σ_{j=1}^n a_kj c_k p_k p_j )
            = p_i c_i ( (Ãq)_i − q · Ãq )
            = ( q_i / Σ_{l=1}^n c_l^{−1} q_l ) ( (Ãq)_i − q · Ãq ).

This is almost the replicator equation: the only difference is the factor of Σ_{l=1}^n p_l c_l. This can be fixed by a rescaling of time, and we obtain the replicator equation with matrix Ã = (a_ij c_j^{−1}) as required. ∎

2.1 Long term behaviour and convergence

We now endeavour to say something about the long term behaviour of replicator dynamics. As one would expect, this is closely tied to the concepts of Nash equilibria, evolutionarily stable strategies and Lyapunov stability.¹

¹ It is assumed the reader is familiar with Lyapunov stability and omega limit sets: for definitions and discussion of these concepts see for example Hofbauer and Sigmund [11, §2.6].

Lemma 2.3. If p ∈ Σ_n is a Nash equilibrium for the game with payoff matrix A, then p is a fixed point of the corresponding replicator equation.

Proof. Suppose p is a Nash equilibrium. Then, as discussed in the previous chapter, its components satisfy

    (Ap)_i = c    for all i such that p_i > 0,

for some constant c > 0, where Σ_{i=1}^n p_i = 1. Thus,

    p · Ap = Σ_{i=1}^n p_i c = c.

Considering the replicator equation, we see that

    ṗ_i = 0 for all i  ⇔  p_i ((Ap)_i − p · Ap) = 0  ⇔  p_i ((Ap)_i − c) = 0.

This clearly holds when p is a Nash equilibrium. ∎

Lemma 2.4. Rest points p ∈ int(Σ_n) of the replicator equation are precisely points p such that

    (Ap)_1 = (Ap)_2 = ⋯ = (Ap)_n  and  Σ_{i=1}^n p_i = 1.

Equivalently, a point in the interior of the simplex Σ_n is a rest point if and only if it is a Nash equilibrium.

Proof. Points in the interior have p_i > 0 for all i, so this follows directly from the previous lemma. ∎

Lemma 2.5. If the omega limit set ω(p) of a point p ∈ Σ_n under replicator dynamics consists of one point q, then q is a Nash equilibrium of the underlying matrix.

Proof. Suppose {q} = ω(p) for some p ∈ Σ_n. Then by definition and slight abuse of notation², there exists a sequence t_n with lim_{n→∞} t_n = ∞ such that p(t_n) → q. This implies that ṗ(t_n) → 0. Suppose q is not a Nash equilibrium. Then there exists a standard basis vector e_i such that e_i · Aq > q · Aq. Equivalently, there exists ε > 0 such that e_i · Aq − q · Aq > ε. By definition, (Aq)_i = e_i · Aq, thus we have

    q̇_i = q_i (e_i · Aq − q · Aq)  ⟹  q̇_i / q_i > ε  for all t ≥ 0.

But then q̇ does not tend to 0 as t → ∞: a contradiction. ∎

² Here we use p to denote both the point p ∈ Σ_n and the solution p(t) to the replicator equation satisfying p = p(0).

Lemma 2.6. If q is a Lyapunov stable point of the replicator equation with matrix A, then q is a Nash equilibrium for the game with payoff matrix A.

Proof. Suppose q is not a Nash equilibrium. Then by continuity there exists an i and some ε > 0 such that (Ap)_i − p · Ap > ε for all p in some neighbourhood of q. Then ṗ_i > ε p_i, so p_i increases exponentially for all p in some neighbourhood of q. This contradicts the Lyapunov stability of q. ∎

Notice that not every rest point of the replicator equation is a Nash equilibrium: any pure strategy will always be a rest point (representing the idea that extinct subtypes cannot come back to life), but there are certainly games where a given pure strategy is not a Nash equilibrium. Similarly, not every Nash equilibrium is Lyapunov stable under the replicator dynamics: for example in a game where every strategy is a Nash equilibrium (take a_ij = 1 for all i, j).

Theorem 2.1. If q ∈ Σ_n is an evolutionarily stable state for the game with payoff matrix A, then q is an asymptotically stable rest point of the corresponding replicator equation.

Proof. Consider the function

    F : Σ_n → [0, ∞),    p ↦ Π_{i=1}^n p_i^{q_i}.

We first show that this has a unique maximum at q. To see this, we use Jensen's Inequality:

Theorem 2.2 (Jensen's Inequality). If a function f : I → R is strictly convex on an interval I, then

    f( Σ_{i=1}^n q_i p_i ) ≤ Σ_{i=1}^n q_i f(p_i),

where q = (q_i) ∈ int(Σ_n) and p_i ∈ I, with equality if and only if p_1 = p_2 = ⋯ = p_n.

We apply Jensen's Inequality to the function log (which is strictly concave, so the inequality is reversed), setting I = [0, ∞] and 0 · log 0 = 0 · log ∞ = 0. Notice that log is strictly increasing, so F has a maximum at q if and only if log F has a maximum at q. Now,

    log(F(p)) = log Π_{i=1}^n p_i^{q_i} = Σ_{i=1}^n q_i log p_i.

By Jensen's Inequality,

    log( Σ_{i=1}^n q_i (p_i / q_i) ) ≥ Σ_{i=1}^n q_i log(p_i / q_i)
  ⇔  Σ_{i=1}^n q_i log(p_i / q_i) ≤ log( Σ_{i=1}^n p_i ).

Since Σ_{i=1}^n p_i = 1 this gives us

    Σ_{i=1}^n q_i log p_i − Σ_{i=1}^n q_i log q_i ≤ 0,

with equality if and only if p_i = c q_i for some constant c and for all i. As we must have p ∈ Σ_n, the only viable possibility here is that c = 1. Thus,

    Σ_{i=1}^n q_i log p_i ≤ Σ_{i=1}^n q_i log q_i,

for all p ∈ Σ_n, with equality if and only if p = q. So the function F has a unique maximum at q.

Next we show that F is a strict Lyapunov function at q, i.e. that Ḟ > 0 for all p ≠ q in some neighbourhood of q. Notice that F(p) > 0 for all p in a neighbourhood of q. Now

    d/dt (log F) = d/dt Σ_{i=1}^n q_i log p_i
                 = Σ_{i=1}^n q_i ṗ_i / p_i
                 = Σ_{i=1}^n q_i ((Ap)_i − p · Ap)
                 = q · Ap − p · Ap.

By assumption, q is an ESS, and so by definition we have that q · Ap − p · Ap > 0 for p in a neighbourhood of q. Hence, Ḟ/F > 0 in a neighbourhood of q. We know that F is strictly positive there, thus we have that Ḟ > 0 and F is a strict Lyapunov function. So, q is asymptotically stable and by previous lemmas (2.3 and 2.6), q is an asymptotically stable rest point of the replicator equation. ∎

2.2 Correspondence with Lotka-Volterra dynamics

Lotka-Volterra dynamics have been well studied and are generally one of the first examples of population modelling that one encounters. Consequently a correspondence between replicator dynamics and Lotka-Volterra systems gives us a rapid gain in understanding for comparatively little effort, and is thus very helpful.

Throughout this section, p = (p_i)_{i=1}^n ∈ Σ_n will be used to denote variables for replicator dynamics and q = (q_i)_{i=1}^{n−1} ∈ R_+^{n−1} will be used to denote variables for Lotka-Volterra dynamics.

Definition 2.2 (Lotka-Volterra dynamics). The generalised Lotka-Volterra system for q ∈ R_+^{n−1} is given by

    q̇_i = q_i ( r_i + Σ_{j=1}^{n−1} b_ij q_j ),

for i = 1, …, n−1, where the r_i and b_ij are constants.

Written thus, the generalised Lotka-Volterra equations and replicator dynamics are both first order non-linear ordinary differential equations. Replicator dynamics involves n equations with cubic terms on an (n−1)-dimensional space, and the Lotka-Volterra system involves n−1 equations with quadratic terms on an (n−1)-dimensional space. A correspondence does not seem that unreasonable.
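As a concrete illustration of equation (2.2), the replicator equation is straightforward to integrate numerically. The sketch below (Python/NumPy) uses the standard zero-sum Rock-Paper-Scissors matrix, an illustrative choice rather than an example taken from the text. Note that an explicit Euler step preserves the simplex constraint exactly, since Σ_i ṗ_i = (1 − Σ_i p_i) p · Ap = 0 whenever Σ_i p_i = 1.

```python
import numpy as np

# Standard zero-sum Rock-Paper-Scissors payoffs (illustrative choice).
A = np.array([[ 0.0, -1.0,  1.0],
              [ 1.0,  0.0, -1.0],
              [-1.0,  1.0,  0.0]])

def replicator_step(p, A, dt):
    """One explicit Euler step of the replicator equation (2.2)."""
    Ap = A @ p
    return p + dt * p * (Ap - p @ Ap)

p = np.array([0.5, 0.3, 0.2])
for _ in range(20000):                 # integrate up to time t = 20
    p = replicator_step(p, A, 1e-3)

# p remains a point of the simplex; for zero-sum RPS the orbit cycles
# around the interior equilibrium (1/3, 1/3, 1/3) rather than converging.
```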
Theorem 2.3. There exists a smooth invertible map from {p ∈ Σ_n : p_n > 0} to R_+^{n−1} that maps the orbits of replicator dynamics to the orbits of the Lotka-Volterra system of Definition 2.2 with r_i = a_in − a_nn and b_ij = a_ij − a_nj.
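The correspondence of Theorem 2.3 can be sanity-checked numerically at a single point: pushing the replicator vector field through the map q_i = p_i / p_n should reproduce the Lotka-Volterra field up to the factor p_n that the change in velocity absorbs. The matrix below (with its last row already zero, as arranged via Lemma 2.1) is an arbitrary illustration, not an example from the text.

```python
import numpy as np

n = 3
A = np.array([[1.0, -2.0,  3.0],
              [2.0,  0.5, -1.0],
              [0.0,  0.0,  0.0]])   # last row zero, so a_nj = 0
r = A[:n-1, n-1]                    # r_i = a_in - a_nn = a_in
b = A[:n-1, :n-1]                   # b_ij = a_ij - a_nj = a_ij

p = np.array([0.2, 0.3, 0.5])       # a simplex point with p_n > 0
q = p[:n-1] / p[n-1]                # the map T of Theorem 2.3

# Replicator field at p, and its pushforward under T (quotient rule).
Ap = A @ p
pdot = p * (Ap - p @ Ap)
qdot = (pdot[:n-1] * p[n-1] - p[:n-1] * pdot[n-1]) / p[n-1] ** 2

# Lotka-Volterra field at q; the two agree up to the factor p_n.
lv = q * (r + b @ q)
print(np.allclose(qdot, p[n-1] * lv))  # True
```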
Proof. First notice that by Lemma 2.1 we may assume without loss of generality that the last row of A consists of zeros, i.e. (Ap)_n = 0. Define q_n = 1. Consider the transformation

    T : {p ∈ Σ_n : p_n > 0} → R_+^{n−1},    p ↦ q,

given by

    q_i = p_i / p_n

for i = 1, …, n−1. This has inverse given by

    p_i = q_i / ( Σ_{j=1}^{n−1} q_j + 1 )

for i = 1, …, n−1, and

    p_n = 1 / ( Σ_{j=1}^{n−1} q_j + 1 ).

Clearly T is smooth and invertible. We now show that T maps the orbits of the replicator equation to the orbits of the Lotka-Volterra system (and that T^{−1} performs the reverse).

Assume the replicator equation holds. Then

    q̇_i = d/dt ( p_i / p_n )
         = (ṗ_i p_n − p_i ṗ_n) / p_n²
         = (p_i / p_n) ((Ap)_i − (Ap)_n)
         = q_i Σ_{j=1}^n a_ij p_j
         = q_i ( a_in + Σ_{j=1}^{n−1} a_ij q_j ) · p_n.

By a change in velocity we can remove the extra p_n. Noting that we assumed a_nj ≡ 0 for all j, set r_i := a_in and b_ij := a_ij. Then we have the Lotka-Volterra system as required.

Assume the Lotka-Volterra equations hold. Then define a_ij := b_ij for i, j = 1, …, n−1. Define a_in := r_i for i = 1, …, n−1 and a_nj := 0 for j = 1, …, n. For convenience of notation define q_n = 1. Then

    ṗ_i = d/dt ( q_i / Σ_{j=1}^n q_j )
        = q̇_i / Σ_{j=1}^n q_j − ( q_i / (Σ_{j=1}^n q_j)² ) Σ_{j=1}^n q̇_j
        = ( q_i / Σ_{j=1}^n q_j ) ( r_i + Σ_{j=1}^{n−1} b_ij q_j )
          − ( q_i / Σ_{j=1}^n q_j ) Σ_{k=1}^{n−1} ( q_k / Σ_{j=1}^n q_j ) ( r_k + Σ_{j=1}^{n−1} b_kj q_j )
        = p_i ( Σ_{l=1}^n q_l ) ( Σ_{j=1}^n a_ij p_j − Σ_{k=1}^n Σ_{j=1}^n a_kj p_k p_j )
        = ( Σ_{l=1}^n q_l ) p_i ( (Ap)_i − p · Ap ).

By a change in velocity we thus have the replicator equation. ∎

There are many other interesting results involving the replicator equation which would exceed the scope of this work. A full classification of the dynamics for n = 3 was given by Zeeman [20], and specific examples may be found in Hofbauer and Sigmund [11, §7.4].
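Looking back at §2.1, the Lyapunov function of Theorem 2.1 can also be watched in action numerically. The matrix below is a "good" Rock-Paper-Scissors variant (winning pays 2, losing costs 1; an illustrative choice, not taken from the text) for which the barycentre q = (1/3, 1/3, 1/3) is an ESS, so F(p) = Π p_i^{q_i} should climb towards its maximum F(q) = 1/3 along interior orbits.

```python
import numpy as np

A = np.array([[ 0.0, -1.0,  2.0],
              [ 2.0,  0.0, -1.0],
              [-1.0,  2.0,  0.0]])   # "good RPS": the barycentre is an ESS
q = np.full(3, 1/3)

def F(p):
    """The Lyapunov function of Theorem 2.1 for the barycentre."""
    return float(np.prod(p ** q))

p = np.array([0.6, 0.3, 0.1])
values = [F(p)]
for _ in range(5000):                  # Euler steps of (2.2) up to t = 5
    Ap = A @ p
    p = p + 1e-3 * p * (Ap - p @ Ap)
    values.append(F(p))

# F increases along the orbit and is bounded above by F(q) = 1/3.
```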
Chapter 3

Fictitious Play

The fictitious play process was originally defined by Brown in 1951 [2] as a method of computing Nash equilibria in zero-sum games. It is now used primarily as a model of learning (see Berger's paper [4] for more of the history). At a given moment, each player computes a best response to the average of his opponent's past strategies using a payoff matrix and instantly plays this best response. This causes each player's average to move continuously towards the best response. In terms of populations this can be interpreted as follows: in each moment of time, a proportion of the current population changes from their current strategy to the best response.

3.1 Definitions

Let [A, B] be a bimatrix game. We will also use A and B to refer to the two players, but the context will make clear what is meant. Following the conventions of Sparrow et al. [18], we set the averages of the pure strategies of the players to be p^A(t) and p^B(t), defining these as row and column vectors respectively. We define the best response sets BR_A and BR_B to be the sets of best responses for each player (see Section 1.2). We also define:

    v^A(t) := A p^B(t)
    v^B(t) := p^A(t) B

Generically the best response sets will consist of a pure strategy. If there is more than one pure best response, then the best response set will consist of all convex combinations of these pure strategies. The strategies for which there are multiple best responses form planes in the simplex, called indifference planes, as this is where a player is indifferent between two or more strategies.

Definition 3.1 (Indifference planes). The indifference planes for players A and B respectively are

    Z_kl^A := { p^B ∈ Σ_B : v_k^A(p^B) = v_l^A(p^B) = max_m { v_m^A(p^B) } } ⊆ Σ_B,
    Z_ij^B := { p^A ∈ Σ_A : v_i^B(p^A) = v_j^B(p^A) = max_k { v_k^B(p^A) } } ⊆ Σ_A.

[Figure 3.1: Simplices and indifference planes for the Shapley system (§3.4).]

Then the fictitious play process is defined as follows:

    d/dt p^A = BR_A(p^B) − p^A    (3.1)
    d/dt p^B = BR_B(p^A) − p^B.    (3.2)

As mentioned previously, we are here considering p^A(t) and p^B(t) to be the averages of the players' past strategies, or alternatively as populations made up of individuals playing pure strategies: it is not that the player is playing a mixed strategy.

3.2 Existence and uniqueness of solutions

It is important to note that as the best response is not necessarily unique, fictitious play is a differential inclusion. Consequently we must be very careful regarding existence and uniqueness of solutions. Firstly we notice that there is only a problem on the indifference planes defined previously. These form a codimension-one subset of Σ_A × Σ_B. Outside of this subset, the best responses of each player are unique and so we have existence and uniqueness of solutions.

Lemma 3.1 (Solutions of fictitious play). If BR_A(p^B) = P_i^A and BR_B(p^A) = P_j^B, then fictitious play is a differential equation with solutions

    p^A(t) = (p^A(0) − P_i^A) e^{−t} + P_i^A
    p^B(t) = (p^B(0) − P_j^B) e^{−t} + P_j^B,

for t ∈ [0, ε) for some ε > 0.

It is worth noting that the system can be reparametrised in such a way that - assuming existence and uniqueness of solutions - the players would reach their best response at time 1. This is particularly useful when modelling the system numerically.

Lemma 3.2 (Reparametrisation). The fictitious play system (3.1)-(3.2) can be reparametrised as described above.
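A discrete-time analogue of (3.1)-(3.2), in which each player best responds to the opponent's running empirical average, is easy to simulate and gives a feel for the process. The sketch below uses Matching Pennies (an illustrative zero-sum choice, not a game analysed at this point in the text); by Robinson's classical result on fictitious play in zero-sum games, the empirical averages converge to the mixed equilibrium (1/2, 1/2).

```python
import numpy as np

# Matching Pennies (illustrative): A tries to match, B tries to mismatch.
A = np.array([[ 1.0, -1.0],
              [-1.0,  1.0]])   # payoffs to Player A
B = -A                         # payoffs to Player B (zero-sum)

pA = np.array([1.0, 0.0])      # empirical averages, seeded with one play each
pB = np.array([1.0, 0.0])

for k in range(1, 20001):
    i = int(np.argmax(A @ pB))                # A's best response to B's average
    j = int(np.argmax(pA @ B))                # B's best response to A's average
    pA = pA + (np.eye(2)[i] - pA) / (k + 1)   # update the running averages:
    pB = pB + (np.eye(2)[j] - pB) / (k + 1)   # the discrete analogue of (3.1)-(3.2)

# Both empirical averages approach the mixed equilibrium (1/2, 1/2).
```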
Proof. Set s = 1 − e^{−t}. Then

    d p^A/ds = (d p^A/dt)(dt/ds)
             = e^{t} (P_i^A − p^A(t))
             = (1/(1−s)) (P_i^A − p^A(−ln(1−s))).

From the solution above, we can calculate

    p^A(−ln(1−s)) = (p^A(0) − P_i^A)(1−s) + P_i^A.

Then we have

    d p^A/ds = (1/(1−s)) (P_i^A − P_i^A − (1−s)(p^A(0) − P_i^A))
             = P_i^A − p^A(0).

Following the same procedure for p^B, this gives us solutions

    p^A(s) = (1−s) p^A(0) + s P_i^A    (3.3)
    p^B(s) = (1−s) p^B(0) + s P_j^B,   (3.4)

for s ∈ [0, ε) for some ε > 0. ∎

This is all very well when the best response is unique. However, most solutions [...]

[Figure 3.2: Demonstration of the failure of the transversality condition in the Shapley system (see §3.4).]

[...] that as BR_A(p^B(0)) = P_k^A, there exists ε > 0 such that for all t with |t| < ε, we have BR_A(p^B(t)) = P_k^A.

Transversality means that ((1 − τ) p^A(0) + τ P_k^A) ∉ Z_ij^B for any non-zero value of τ. Thus, we may choose τ with 0 < τ < ε such that BR_B((1 − τ) p^A(0) + τ P_k^A) is unique and BR_B((1 + τ) p^A(0) − τ P_k^A) is unique. Then we may define the solution as follows:

    p^A(t) = t P_k^A + (1 − t) p^A(0)    if |t| < ε

    p^B(t) = { p^B(0)                                  if t = 0
             { t BR_B(p^A(τ)) + (1 − t) p^B(0)         if 0 < t