The Pennsylvania State University Schreyer Honors College

Department of Mathematics

An Optimal Control Problem Arising from an Evolutionary Game

Wente Brian Cai Spring 2014

A thesis submitted in partial fulfillment of the requirements for a baccalaureate degree in Mathematics with honors in Mathematics

To be reviewed and approved* by the following:

Christopher Griffin, Research Associate & Asst. Professor, Thesis Supervisor

Nate Brown, Professor of Mathematics, Honors Adviser

*Signatures will be on file if this thesis is approved.

Abstract

This paper is an integrative study of evolutionary game theory and optimal control. We first study the basics of evolutionary game theory and introduce the model that we would like to study, based on the game of Rock-Paper-Scissors. We then move on to an introduction of optimal control and identify the requirements that must be fulfilled in order for a solution to be optimal. Finally, we explore different methods of modeling the Rock-Paper-Scissors game as an optimal control problem and try to find an optimal control that minimizes the cost of the game. Various linearization schemes are attempted and the results are discussed.

Contents

List of Figures

1 Introduction

2 Preliminaries and Literature Review
2.1 Evolutionary Game Theory
2.2 Optimal Control Theory
2.3 Linear Quadratic Control Example
2.4 Sufficient Conditions for Optimal Control
2.5 Problem Statement: Relating Optimal Control to Game Theory

3 Optimal Control of Generalized Rock-Paper-Scissors
3.1 Constructing a General Form of the Problem
3.2 Linearizing the Problem
3.3 A Linear Problem

4 Future Work and Conclusions
4.1 Future Work
4.2 Conclusion

Bibliography

List of Figures

1 (a) An illustration of an unstable Nash equilibrium fixed point when a = −5; (b) an illustration of a non-linear center Nash equilibrium fixed point when a = 0; (c) an illustration of a stable Nash equilibrium fixed point when a = 1.
2 Computed optimal control of rock-paper-scissors evolutionary game: x(0) = 0.5, y(0) = 0.2, z(0) = 0.3, T = 40, σ = 1/100.
3 Computed optimal control of rock-paper-scissors evolutionary game variation: x(0) = 0.5, y(0) = 0.2, z(0) = 0.3, T = 40, σ = 1/100.
4 Computed optimal control of partially linearized dynamics: x(0) = 0.3 − 1/3, y(0) = 0.3 − 1/3, z(0) = 0.4 − 1/3.
5 Optimal control of fully linearized dynamics: x(0) = 0.3 − 1/3, y(0) = 0.3 − 1/3, z(0) = 0.4 − 1/3, σ = 1.

1 Introduction

Evolutionary game theory studies the decision making of games over time, such that dominant strategies survive and are passed on to the next generation while failing strategies are phased out. We start with a large population of players playing a two-player game. Each player is assigned a strategy (by genetics), and the resulting payoff from play determines which strategy was more successful and thus which strategy reproduces. We can then track the growth or decline of strategies in our population and predict the trajectory of each strategy. Our goal is to study the dynamics of the game while we dynamically alter the payoffs of the game. In the case of Rock-Paper-Scissors, it is understood that the optimal strategy is to pick rock, paper, or scissors in equal amounts (1/3, 1/3, 1/3). Consequently, this forms a fixed point of the evolutionary game resulting from rock-paper-scissors play. Under certain payoff assumptions, in a population with many scissors and many rocks and only some paper, it is clear that the population of scissors will decrease (from lack of reproduction) while the number of paper players increases. For certain classes of payoff, this process will lead to the fixed point (1/3, 1/3, 1/3), while in other cases we may observe cyclic behavior. In the case of convergent behavior, how quickly that fixed point is reached depends on the game. The larger the difference between the winning and losing payoff (i.e., the bigger the gap between winning and losing), the quicker the population approaches the fixed point strategy. In this thesis, we consider the rock-paper-scissors payoff matrix:

\[
A = \begin{pmatrix} 0 & 1+a & -1 \\ -1 & 0 & 1+a \\ 1+a & -1 & 0 \end{pmatrix}, \quad a \in \mathbb{R}
\]

Our objective is to dynamically alter the value of a (rather than leaving it a static value) in order to drive the population toward the fixed point. However, we assume that altering the value of a may be costly, and therefore we attempt to minimize the total cost associated with driving the population to the prescribed fixed point. We study this problem theoretically and numerically, showing two solutions for the resulting non-linear optimal control problem with interesting properties. We then attempt to linearize the control problem to obtain a closed form solution. Unfortunately, in so doing we lose some of the characteristics of the control problem and obtain an entirely new control problem with new solution structure. We then discuss in future work how to recover the original problem while maintaining a linear dynamical system, but introducing non-linearity and non-convexity into the objective function.

2 Preliminaries and Literature Review

2.1 Evolutionary Game Theory

Game theory is the study of decision making when an agent's payoff is affected by the decisions of (many) other agents and is based on the payoff one gets for using certain strategies [2, 3, 5, 9, 12, 13]. For us, a game consists of two players using a set K of n pure strategies denoted by i = 1, 2, . . . , n. When Player I uses strategy m ∈ K and Player II uses strategy n ∈ K, there is a payoff a_{mn}. With the values a_{ij}, i, j = 1, 2, . . . , n, we can create an n × n payoff matrix A for Player I (when we assume the game is symmetric, Player II's payoff matrix is A^T) [12]. Game theory is an extensive subject; the interested reader should consult [2, 3, 5, 9, 12, 13] for details. All standard game theoretic definitions given in this thesis can be found in these references.

A player's mixed strategy is a column vector x = (x_1, x_2, . . . , x_n)^T where each x_i is the probability of using strategy i ∈ K. Let S_n be the set of all mixed strategies in the game. So if Players I and II use mixed strategies x, y ∈ S_n, the expected payoff for Player I is x^T A y and the expected payoff for Player II is x^T A^T y. The strategy x is said to be a best response to y if

\[
x^T A y \geq z^T A y \quad \forall\, z \in S_n \tag{1}
\]

Definition 2.1. A strategy x* is a (Nash) equilibrium for the symmetric two player game with payoff matrix A if:

\[
(x^*)^T A x^* \geq y^T A x^* \quad \forall\, y \in S_n \tag{2}
\]

In evolutionary game theory, we observe how a population of players changes its strategies over time. Let's consider a large population playing the same game we defined earlier. Let p_i(t) ≥ 0 be the number of individuals at time t using strategy i and let p(t) = \sum_{i \in K} p_i(t) > 0 be the total population. The population state is the vector x(t) = (x_1(t), x_2(t), . . . , x_n(t))^T such that each x_i(t) is the proportion of the population using strategy i at time t. The expected payoff for strategy i ∈ K is then e_i^T A x and the population average payoff is x^T A x. Here e_i is the i-th standard basis vector in R^n, when we have n strategies.

Our goal is to derive a differential equation that determines the growth or decline of strategy proportions in the population. In our scenario, the growth and decline of strategies depends on the fitness of the strategy. In other words, the more successful a strategy is, the quicker the population adopts that strategy over time, while poorer strategies die off. The following derivation is due to [15]. We can define the change in the number of individuals using strategy i as

\[
\dot{p}_i = \left[\beta + e_i^T A x - \delta\right] p_i \tag{3}
\]

where β ≥ 0 is the initial fitness of individuals in the population and δ ≥ 0 is the death rate for all individuals. By definition, p(t)x_i(t) = p_i(t). Differentiating this identity with respect to t and using Equation 3, we get:

\[
p\dot{x}_i = \dot{p}_i - \dot{p}x_i = \left[\beta + e_i^T A x - \delta\right] p_i - \left[\beta + x^T A x - \delta\right] p x_i \tag{4}
\]

If we divide both sides by p, we obtain the replicator dynamics:

\[
\dot{x}_i = \left[e_i^T A x - x^T A x\right] x_i, \quad i = 1, \ldots, n \tag{5}
\]

The replicator dynamics tell us that strategies that earn a greater than average payoff grow in the population and strategies that earn less than the average payoff decline in the population.

The rest points of the replicator dynamics are the zeros of the right hand side of the replicator equation (i.e., all points x ∈ S_n such that e_i^T A x = x^T A x). A rest point x is stable if every neighborhood B of x contains a neighborhood B° such that if we start at x_0 ∈ B°, then the solution flow satisfies φ(t, x_0) ∈ B for all t ≥ 0. In other words, x is stable if whenever a point starts in a neighborhood of x, it stays contained in that neighborhood over time. It is asymptotically stable if there exists a B* such that lim_{t→∞} φ(t, x_0) = x* for all x_0 ∈ B*. The Folk Theorem of Evolutionary Game Theory explains the relationship between rest points and Nash equilibria [6, 7].

Theorem 2.2 (Folk Theorem of Evolutionary Games). Consider the replicator dynamics given in Equation 5.

1. If x* is a Nash equilibrium, then it is a rest point.
2. If x* is a strict Nash equilibrium, then it is asymptotically stable.
3. If the rest point x* is the limit of an interior orbit, then x* is a Nash equilibrium.
4. If the rest point x* is stable, then it is a Nash equilibrium.

For the remainder of this thesis, we will be exploring a two-player, symmetric game: Rock, Paper, Scissors. In this game, each pure strategy dominates (i.e., beats) exactly one other. Every child in the world knows: rock beats scissors, scissors beat paper, and paper beats rock. So Strategy 1 dominates Strategy 2, Strategy 2 dominates Strategy 3, and Strategy 3 dominates Strategy 1. If two players pick the same strategy, the payoff is 0. Our payoff matrix is the following and is adapted from [7]:

\[
A = \begin{pmatrix} 0 & 1+a & -1 \\ -1 & 0 & 1+a \\ 1+a & -1 & 0 \end{pmatrix}, \quad a \in \mathbb{R} \tag{6}
\]

Zeeman [16] proved the following regarding Rock, Paper, Scissors:

Theorem 2.3 (Zeeman's Theorem [16]). The following conditions are equivalent for the Rock-Paper-Scissors game:

1. x∗ is asymptotically stable

2. x∗ is globally stable

3. det A > 0

4. (x*)^T A x* > 0

What this tells us is that if we choose a > 0, then det A > 0, which results in all orbits in the interior of S_n converging to x*. If a < 0, then det A < 0 and all orbits in the interior of S_n converge to the boundary ∂S_n. Finally, if a is zero, then det A = 0, and all orbits in the interior of S_n are closed orbits around x*. This is illustrated in Figure 1.
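To make Zeeman's trichotomy concrete, the following minimal numerical sketch (ours, not from the original thesis) integrates the replicator dynamics for the payoff matrix above at the three values of a used in Figure 1; the trajectories reproduce the three qualitative behaviors.

\begin{verbatim}
# Sketch: integrate the RPS replicator dynamics (Equation 5) for the
# payoff matrix of Equation (6) at several fixed values of a.
import numpy as np
from scipy.integrate import solve_ivp

def payoff(a):
    return np.array([[0.0, 1 + a, -1.0],
                     [-1.0, 0.0, 1 + a],
                     [1 + a, -1.0, 0.0]])

def replicator(t, x, A):
    fitness = A @ x          # e_i^T A x for each strategy i
    avg = x @ fitness        # population average payoff x^T A x
    return x * (fitness - avg)

x0 = np.array([0.5, 0.2, 0.3])
for a in (-5.0, 0.0, 1.0):
    sol = solve_ivp(replicator, (0, 40), x0, args=(payoff(a),), rtol=1e-9)
    print(f"a = {a:+.0f}: x(40) = {np.round(sol.y[:, -1], 3)}")
# a = -5: the orbit approaches the boundary of the simplex (unstable),
# a =  0: the orbit cycles around (1/3, 1/3, 1/3) (center),
# a = +1: the orbit converges to (1/3, 1/3, 1/3) (stable).
\end{verbatim}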

(a) (b) (c)

Figure 1: (a) An illustration of an unstable Nash equilibrium fixed point when a = −5; (b) an illustration of a non-linear center Nash equilibrium fixed point when a = 0; (c) an illustration of a stable Nash equilibrium fixed point when a = 1.

2.2 Optimal Control Theory

An optimal control problem [1, 4, 8] deals with finding the time varying optimal controls (or inputs) needed to maximize or minimize some cost expressed as a functional of the time varying state and the control. For the purposes of this thesis, we will investigate optimal control problems that depend only implicitly on time. Let x_1(t), . . . , x_n(t) be the list of state variables and u_1(t), . . . , u_m(t) be the control inputs. The system dynamics are the set of n first order differential equations that dictate the path of the state variables:

\[
\dot{x}_1 = g_1(x_1(t), \ldots, x_n(t), u_1(t), \ldots, u_m(t), t) \tag{7}
\]
\[
\vdots
\]
\[
\dot{x}_n = g_n(x_1(t), \ldots, x_n(t), u_1(t), \ldots, u_m(t), t) \tag{8}
\]

The homogeneous state equation can then be defined as:

\[
\dot{\mathbf{x}}(t) = \mathbf{g}(\mathbf{x}(t), \mathbf{u}(t)) \tag{9}
\]

such that

\[
\mathbf{x}(t) = \begin{pmatrix} x_1(t) \\ \vdots \\ x_n(t) \end{pmatrix} \tag{10}
\]

is the vector of our state variables and

\[
\mathbf{u}(t) = \begin{pmatrix} u_1(t) \\ \vdots \\ u_m(t) \end{pmatrix} \tag{11}
\]

is the vector of controls. We will focus on a specific form of the optimal control problem. Our goal is to find an optimal control u* that minimizes (or maximizes) our cost functional:

\[
J = \Psi(\mathbf{x}(T)) + \int_0^T f(\mathbf{x}(t), \mathbf{u}(t))\, dt \tag{12}
\]

subject to our state equation constraints

\[
\dot{\mathbf{x}} = \mathbf{g}(\mathbf{x}(t), \mathbf{u}(t)) \tag{13}
\]

where T is a maximum time we will consider. If T = ∞, then we require infinite time-horizon controls, which we will not consider, for simplicity. The interested reader may consult [4, 8, 14] for details on alternate forms of optimal control problems. For the remainder of this thesis, we will consider a minimization problem of Bolza type:

\[
\begin{aligned}
\min \quad & \Psi(\mathbf{x}(T)) + \int_0^T f(\mathbf{x}(t), \mathbf{u}(t))\, dt \\
\text{s.t.} \quad & \dot{\mathbf{x}} = \mathbf{g}(\mathbf{x}(t), \mathbf{u}(t))
\end{aligned} \tag{14}
\]

We note that when Ψ(x(T)) ≡ 0, the problem is of Lagrange type, which we will also consider. Optimal control problems can be difficult to solve. Fortunately, necessary conditions that an optimal control u* must satisfy can be derived. These conditions are generally derived using the Hamiltonian of the control problem, defined as

\[
H(\mathbf{x}(t), \mathbf{u}(t), \boldsymbol{\lambda}) = f(\mathbf{x}(t), \mathbf{u}(t)) + \boldsymbol{\lambda}^T \mathbf{g}(\mathbf{x}(t), \mathbf{u}(t)) \tag{15}
\]

where f is the cost function in Expression 12, g is the right hand side of the state equations, and λ is a vector of co-state multipliers¹. The following theorem is proven in most standard references on optimal control; see e.g., [1, 4, 8].

Theorem 2.4 (Necessary Conditions for Optimal Control). Consider the Bolza problem given in Expression 14. If u* is an optimal control, then

\[
H(\mathbf{x}^*(t), \mathbf{u}^*(t), \boldsymbol{\lambda}^*(t)) \leq H(\mathbf{x}^*(t), \mathbf{u}(t), \boldsymbol{\lambda}^*(t)) \tag{16}
\]

for all t ∈ [0, T] and for all admissible inputs u ∈ U, and the following conditions must be satisfied:

1. Pontryagin's Minimum Principle: \(\frac{\partial H}{\partial \mathbf{u}} = 0\) and \(\frac{\partial^2 H}{\partial \mathbf{u}^2}\) is positive definite,

2. Co-State Dynamics: \(\dot{\boldsymbol{\lambda}}(t) = -\frac{\partial H}{\partial \mathbf{x}} = -\boldsymbol{\lambda}^T(t)\frac{\partial \mathbf{g}(\mathbf{x},\mathbf{u})}{\partial \mathbf{x}} - \frac{\partial f(\mathbf{x},\mathbf{u})}{\partial \mathbf{x}}\),

3. State Dynamics: \(\dot{\mathbf{x}}(t) = \frac{\partial H}{\partial \boldsymbol{\lambda}} = \mathbf{g}(\mathbf{x}, \mathbf{u})\),

4. Initial Condition: \(\mathbf{x}(0) = \mathbf{x}_0\), and

5. Transversality Condition: \(\boldsymbol{\lambda}(T) = \frac{\partial \Psi}{\partial \mathbf{x}}(\mathbf{x}(T))\),

where partial derivatives with respect to vectors, e.g., ∂H/∂u, denote a gradient restricted to only those variables with respect to which we differentiate².

¹These co-states can be thought of like Lagrange multipliers [11] in ordinary optimization theory.
²Note that ∂g(x, u)/∂x is a Jacobian matrix.

2.3 Linear Quadratic Control Example

Certain classes of optimal control problems can be solved in closed form. In particular, linear quadratic control (LQC) problems have such special structure. We give an example of such a problem and its solution.

\[
\begin{aligned}
\min \quad & \int_0^T x^2 + u^2\, dt \\
\text{s.t.} \quad & \dot{x} = ax + bu \\
& x(0) = x_0
\end{aligned} \tag{17}
\]

The Hamiltonian for this control problem is:

\[
H = x^2 + u^2 + \lambda(ax + bu) \tag{18}
\]

By the minimum principle we require \(\partial^2 H/\partial u^2 > 0\) and \(\partial H/\partial u = 0\). Solving for u, we have:

\[
\frac{\partial H}{\partial u} = 2u + b\lambda = 0 \implies u = -\frac{b}{2}\lambda
\]

Applying the co-state dynamics to solve for λ̇ yields:

\[
\dot{\lambda} = -\frac{\partial H}{\partial x} = -(2x + a\lambda) \tag{19}
\]

The state dynamics to solve for ẋ yield:

\[
\frac{\partial H}{\partial \lambda} = g(x, u) = ax + bu = ax - \frac{b^2}{2}\lambda \tag{20}
\]

Note, ax + bu is the original right-hand-side of the state dynamics in the problem constraints. Finally, we have the initial condition x(0) = x_0 and the transversality condition λ(T) = 0, yielding a two-point boundary value problem:

\[
\dot{x} = ax - \frac{b^2}{2}\lambda \tag{21}
\]
\[
\dot{\lambda} = -2x - a\lambda \tag{22}
\]

\[
x(0) = x_0 \tag{23}
\]
\[
\lambda(T) = 0 \tag{24}
\]

Re-writing in matrix form yields:

\[
\begin{pmatrix} \dot{x} \\ \dot{\lambda} \end{pmatrix} = \begin{pmatrix} a & -\frac{b^2}{2} \\ -2 & -a \end{pmatrix} \begin{pmatrix} x \\ \lambda \end{pmatrix} \tag{25}
\]

To solve the system of differential equations, we plugged the equations into Mathematica (DSolve), which gave us the following solutions:

\[
x = -\frac{b^2 e^{-t\sqrt{a^2+b^2}}\left(-1 + e^{2t\sqrt{a^2+b^2}}\right) C_1}{4\sqrt{a^2+b^2}} + \frac{e^{-t\sqrt{a^2+b^2}}\left(-a + \sqrt{a^2+b^2} + \left(a + \sqrt{a^2+b^2}\right)e^{2t\sqrt{a^2+b^2}}\right) C_2}{2\sqrt{a^2+b^2}} \tag{26}
\]

\[
\lambda = \frac{e^{-t\sqrt{a^2+b^2}}\left(a + \sqrt{a^2+b^2} + \left(\sqrt{a^2+b^2} - a\right)e^{2t\sqrt{a^2+b^2}}\right) C_1}{2\sqrt{a^2+b^2}} - \frac{e^{-t\sqrt{a^2+b^2}}\left(-1 + e^{2t\sqrt{a^2+b^2}}\right) C_2}{\sqrt{a^2+b^2}} \tag{27}
\]

Next, we solve for our two unknown constants C_1 and C_2 by setting x(0) = r (r being some constant) and λ(T) = 0. Again, using Mathematica, we get the following:

\[
C_1 = \frac{2\left(-1 + e^{2T\sqrt{a^2+b^2}}\right) r}{a + \sqrt{a^2+b^2} + \left(\sqrt{a^2+b^2} - a\right)e^{2T\sqrt{a^2+b^2}}} \tag{28}
\]

\[
C_2 = r \tag{29}
\]

Finally, we can define our x(t) and λ(t) functions:

\[
x(t) = -\frac{b^2 e^{-t\sqrt{a^2+b^2}}\left(-1 + e^{2t\sqrt{a^2+b^2}}\right) C_1}{4\sqrt{a^2+b^2}} + \frac{e^{-t\sqrt{a^2+b^2}}\left(-a + \sqrt{a^2+b^2} + \left(a + \sqrt{a^2+b^2}\right)e^{2t\sqrt{a^2+b^2}}\right) C_2}{2\sqrt{a^2+b^2}} \tag{30}
\]

\[
\lambda(t) = \frac{e^{-t\sqrt{a^2+b^2}}\left(a + \sqrt{a^2+b^2} + \left(\sqrt{a^2+b^2} - a\right)e^{2t\sqrt{a^2+b^2}}\right) C_1}{2\sqrt{a^2+b^2}} - \frac{e^{-t\sqrt{a^2+b^2}}\left(-1 + e^{2t\sqrt{a^2+b^2}}\right) C_2}{\sqrt{a^2+b^2}} \tag{31}
\]

\[
C_1 = \frac{2\left(-1 + e^{2T\sqrt{a^2+b^2}}\right) r}{a + \sqrt{a^2+b^2} + \left(\sqrt{a^2+b^2} - a\right)e^{2T\sqrt{a^2+b^2}}} \tag{32}
\]

\[
C_2 = r \tag{33}
\]
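As a sanity check on the closed form (our addition, not part of the original derivation), the two-point boundary value problem (21)–(24) can also be solved numerically and compared against Equations (30)–(33); the values of a, b, r, and T below are illustrative assumptions.

\begin{verbatim}
# Sketch: numerically solve the LQC boundary value problem (21)-(24)
# and compare x(t) with the closed form (30)-(33). Values assumed.
import numpy as np
from scipy.integrate import solve_bvp

a, b, r, T = 1.0, 1.0, 1.0, 1.0

def rhs(t, y):
    x, lam = y
    return np.vstack([a * x - (b**2 / 2) * lam,  # state equation (21)
                      -2 * x - a * lam])         # co-state equation (22)

def bc(y0, yT):
    return np.array([y0[0] - r,  # x(0) = r
                     yT[1]])     # lambda(T) = 0

t = np.linspace(0, T, 50)
sol = solve_bvp(rhs, bc, t, np.zeros((2, t.size)))

w = np.sqrt(a**2 + b**2)
C1 = 2 * (np.exp(2*T*w) - 1) * r / (a + w + (w - a) * np.exp(2*T*w))
x_exact = (-b**2 * np.exp(-t*w) * (np.exp(2*t*w) - 1) * C1 / (4 * w)
           + np.exp(-t*w) * (-a + w + (a + w) * np.exp(2*t*w)) * r / (2 * w))
print("max deviation from closed form:", np.abs(sol.sol(t)[0] - x_exact).max())
\end{verbatim}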

2.4 Sufficient Conditions for Optimal Control

We note that any solution u* to Problem 14 must satisfy the necessary conditions set forth in Theorem 2.4. However, it is possible that a function u satisfies these necessary conditions but is not an optimal control. Mangasarian [10] and Arrow [4] have proved conditions under which the necessary conditions in Theorem 2.4 are sufficient for u* to be an optimal control.

Theorem 2.5 (Mangasarian's Theorem). Suppose the admissible pair (x*, u*) satisfies all of the relevant continuous-time optimal control problem necessary conditions for OCP, the Hamiltonian H is jointly convex in x and u for all admissible solutions, t_0 and t_f are fixed, x_0 is fixed, and there are no terminal time conditions Ψ[x(T), T] = 0. Then any solution of the continuous-time optimal control necessary conditions is a global minimum.

Theorem 2.6 (Arrow's Theorem). Let (x*, u*) be an admissible pair for OCP when the Hamiltonian H is jointly convex in x and u for all admissible solutions, t_0 and t_f are fixed, x_0 is fixed, and there are no terminal time conditions Ψ[x(T), T] = 0. If there exists a continuous and piecewise continuously differentiable function λ = (λ_1, . . . , λ_n)^T such that the following conditions are satisfied:

\[
\dot{\lambda}_i = -\frac{\partial H}{\partial x_i} \text{ almost everywhere, } i = 1, \ldots, n \tag{34}
\]
\[
H(\mathbf{x}^*, \mathbf{u}, \boldsymbol{\lambda}(t), t) \geq H(\mathbf{x}^*, \mathbf{u}^*, \boldsymbol{\lambda}, t) \text{ for all } \mathbf{u} \in U \text{ and all } t \in [0, T] \tag{35}
\]
\[
\hat{H}(\mathbf{x}, \boldsymbol{\lambda}, t) = \min_{\mathbf{u} \in U} H(\mathbf{x}, \mathbf{u}, \boldsymbol{\lambda}, t) \text{ exists and is convex in } \mathbf{x} \text{ for all } t \in [0, T] \tag{36}
\]

then (x*, u*) solves the OCP for the given initial condition. If Ĥ(x, λ, t) is strictly convex in x for all t, then x* is unique (but u* is not necessarily unique).

We will use Mangasarian's theorem in the sequel when we discuss linearization of the optimal control problem we consider.

2.5 Problem Statement: Relating Optimal Control to Game Theory

So far, we have covered the basics of evolutionary game theory and observed how the values of parameters can affect the dynamics of the population (see, e.g., Figure 1). We also studied optimal control and identified the conditions an optimal control must satisfy. Our next goal is to study the effects of parameters on the population when these parameters are considered to be control functions in an optimal control problem. Specifically, we will study an optimal control problem whose differential state evolution is governed by the replicator dynamics. However, such a problem can be difficult because the replicator dynamics are non-linear. Therefore, we focus on rock-paper-scissors as an example and study methods to overcome the non-linearity.

3 Optimal Control of Generalized Rock-Paper-Scissors

Recall the Rock-Paper-Scissors dynamics with payoff matrix given in Expression 6. We consider the case when the parameter a is allowed to vary in time as a control function. For rock-paper-scissors, our goal is to find a function governing a(t) that drives the flow toward the Nash equilibrium (1/3, 1/3, 1/3) over a finite time horizon T, subject to a cost incurred from a large a(t). We formulate our problem as:

\[
\begin{aligned}
\min \quad & \int_0^T \left[\left(x(t) - \tfrac{1}{3}\right)^2 + \left(y(t) - \tfrac{1}{3}\right)^2 + \left(z(t) - \tfrac{1}{3}\right)^2 + \sigma a(t)^2\right] dt \\
\text{s.t.} \quad & \dot{x} = x\left(\mathbf{e}_1^T A \mathbf{w} - \mathbf{w}^T A \mathbf{w}\right) \\
& \dot{y} = y\left(\mathbf{e}_2^T A \mathbf{w} - \mathbf{w}^T A \mathbf{w}\right) \\
& \dot{z} = z\left(\mathbf{e}_3^T A \mathbf{w} - \mathbf{w}^T A \mathbf{w}\right)
\end{aligned} \tag{37}
\]

where w = (x, y, z)^T. Expanding the dynamics, we have:

\[
\begin{aligned}
\min \quad & \int_0^T \left[\left(x(t) - \tfrac{1}{3}\right)^2 + \left(y(t) - \tfrac{1}{3}\right)^2 + \left(z(t) - \tfrac{1}{3}\right)^2 + \sigma a(t)^2\right] dt \\
\text{s.t.} \quad & \dot{x} = -a x^2 y - a x^2 z - a xyz + a xz - xy + xz \\
& \dot{y} = -a x y^2 - a xyz + a xy - a y^2 z + xy - yz \\
& \dot{z} = -a xyz - a x z^2 - a y z^2 + a yz - xz + yz
\end{aligned} \tag{38}
\]

We note that the objective functional is convex in the unknown functions, while the dynamics are (highly) non-linear. We also note that the value of a(t) is not constrained, but it will be controlled in absolute value by the objective functional itself; i.e., a(t) will not get too big or too small.
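As a quick sanity check on the expanded dynamics (a sketch of our own, not from the thesis), one can verify numerically that (1/3, 1/3, 1/3) is a rest point for any fixed a and that the simplex is invariant (the right hand sides sum to zero whenever x + y + z = 1):

\begin{verbatim}
# Sketch: sanity checks on the expanded replicator dynamics in (38).
import numpy as np

def rhs(s, a):
    x, y, z = s
    dx = -a*x*x*y - a*x*x*z - a*x*y*z + a*x*z - x*y + x*z
    dy = -a*x*y*y - a*x*y*z + a*x*y - a*y*y*z + x*y - y*z
    dz = -a*x*y*z - a*x*z*z - a*y*z*z + a*y*z - x*z + y*z
    return np.array([dx, dy, dz])

a = 0.7                                   # arbitrary fixed control value
print(rhs(np.array([1/3, 1/3, 1/3]), a))  # ~[0,0,0]: Nash point is a rest point
print(rhs(np.array([0.5, 0.2, 0.3]), a).sum())  # ~0 on the simplex: x+y+z stays 1
\end{verbatim}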

With this system, our Hamiltonian is:

\[
H = \left(x(t) - \tfrac{1}{3}\right)^2 + \left(y(t) - \tfrac{1}{3}\right)^2 + \left(z(t) - \tfrac{1}{3}\right)^2 + \sigma a^2 + \lambda_1(t)\dot{x} + \lambda_2(t)\dot{y} + \lambda_3(t)\dot{z} \tag{39}
\]

The optimal control was solved using Mathematica (the full equation is listed in the Appendix). When we set T to 40, we get the plot in Figure 2: panel (a) plots our x(t), y(t), and z(t), while panel (b) plots the value of a from time 0 to 40.

(a) (b)

Figure 2: Computed optimal control of rock-paper-scissors evolutionary game: x(0) = 0.5, y(0) = 0.2, z(0) = 0.3, T = 40, σ = 1/100
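The Mathematica computation itself is not reproduced here, but a direct-transcription sketch of the same instance (our own construction: a piecewise-constant control optimized by a generic derivative-free method) can be used to approximate the solution shown in Figure 2:

\begin{verbatim}
# Sketch: direct transcription of Problem (38) with piecewise-constant a(t).
# Our own illustrative construction, not the thesis's Mathematica code.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

T, N, sigma = 40.0, 20, 1 / 100
s0 = [0.5, 0.2, 0.3, 0.0]        # state (x, y, z) plus accumulated running cost

def rhs(t, s, avals):
    a = avals[min(int(t * N / T), N - 1)]     # piecewise-constant control
    x, y, z = s[:3]
    dx = -a*x*x*y - a*x*x*z - a*x*y*z + a*x*z - x*y + x*z
    dy = -a*x*y*y - a*x*y*z + a*x*y - a*y*y*z + x*y - y*z
    dz = -a*x*y*z - a*x*z*z - a*y*z*z + a*y*z - x*z + y*z
    dc = (x - 1/3)**2 + (y - 1/3)**2 + (z - 1/3)**2 + sigma * a**2
    return [dx, dy, dz, dc]

def cost(avals):
    sol = solve_ivp(rhs, (0, T), s0, args=(avals,), rtol=1e-8, max_step=T / N)
    return sol.y[3, -1]                       # value of the objective functional

res = minimize(cost, np.zeros(N), method="Powell")
print("approximate optimal cost:", res.fun)
\end{verbatim}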

We also explored a modified cost functional such that (x(T) − 1/3)² + (y(T) − 1/3)² + (z(T) − 1/3)² sits outside of the integral as a terminal cost. Thus our problem becomes:

\[
\begin{aligned}
\min \quad & \left(x(T) - \tfrac{1}{3}\right)^2 + \left(y(T) - \tfrac{1}{3}\right)^2 + \left(z(T) - \tfrac{1}{3}\right)^2 + \int_0^T \sigma a(t)^2\, dt \\
\text{s.t.} \quad & \dot{x} = -a x^2 y - a x^2 z - a xyz + a xz - xy + xz \\
& \dot{y} = -a x y^2 - a xyz + a xy - a y^2 z + xy - yz \\
& \dot{z} = -a xyz - a x z^2 - a y z^2 + a yz - xz + yz
\end{aligned} \tag{40}
\]

with the Hamiltonian:

\[
H = \sigma a^2 + \lambda_1(t)\dot{x} + \lambda_2(t)\dot{y} + \lambda_3(t)\dot{z} \tag{41}
\]

When we solve an instance of the optimal control problem with a computer algebra system, we get the results shown in Figure 3:

3.1 Constructing a General Form of the Problem

We also explored the method of linearizing the dynamics about the fixed point and then using the results from the LQC example to find an optimal solution for a. We are still analyzing the control problem from the Rock-Paper-Scissors

12 (a) (b)

Figure 3: Computed optimal control of rock-paper-scissors evolutionary game variation: x(0) = 0.5, y(0) = 0.2, z(0) = 0.3, T = 40, σ = 1/100

game with the payoff matrix:

\[
A = \begin{pmatrix} 0 & -1 & a \\ a & 0 & -1 \\ -1 & a & 0 \end{pmatrix} \tag{42}
\]

The resulting replicator dynamics are:

\[
\dot{x}_i = x_i\left(\mathbf{e}_i^T A \mathbf{x} - \mathbf{x}^T A \mathbf{x}\right) \tag{43}
\]

Since the control a is intertwined with the state in these dynamics, this creates a complicated non-linear control problem:

\[
\begin{aligned}
\min \quad & \Phi(\mathbf{x}(T)) + \int_0^T f(\mathbf{x}, a)\, dt \\
\text{s.t.} \quad & \dot{x}_i = x_i\left(\mathbf{e}_i^T A \mathbf{x} - \mathbf{x}^T A \mathbf{x}\right)
\end{aligned} \tag{44}
\]

Suppose we express A such that:

\[
A = \begin{pmatrix} 0 & -1 & a \\ a & 0 & -1 \\ -1 & a & 0 \end{pmatrix} = \begin{pmatrix} 0 & -1 & 0 \\ 0 & 0 & -1 \\ -1 & 0 & 0 \end{pmatrix} + a\begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} = B + aC \tag{45}
\]

Denote:

\[
F_i(\mathbf{x}) = x_i\left(\mathbf{e}_i^T B \mathbf{x} - \mathbf{x}^T B \mathbf{x}\right) \tag{46}
\]
\[
G_i(\mathbf{x}) = x_i\left(\mathbf{e}_i^T C \mathbf{x} - \mathbf{x}^T C \mathbf{x}\right) \tag{47}
\]

Then we can write Problem 44 as:

\[
\begin{aligned}
\min \quad & \Phi(\mathbf{x}(T)) + \int_0^T f(\mathbf{x}, a)\, dt \\
\text{s.t.} \quad & \dot{\mathbf{x}} = \mathbf{F}(\mathbf{x}) + a\mathbf{G}(\mathbf{x})
\end{aligned} \tag{48}
\]

Note, when a = 0 we recover the uncontrolled replicator dynamics generated by B as the state dynamics. Let

\[
\mathbf{F}(\mathbf{x}) = [F_1(\mathbf{x}), \ldots, F_n(\mathbf{x})]^T \tag{49}
\]
and
\[
\mathbf{G}(\mathbf{x}) = [G_1(\mathbf{x}), \ldots, G_n(\mathbf{x})]^T \tag{50}
\]

(n = 3 for Rock, Paper, Scissors) be vector valued functions. The Hamiltonian for this problem is:

\[
H(\mathbf{x}, \boldsymbol{\lambda}, a) \equiv f(\mathbf{x}, a) + \boldsymbol{\lambda}^T \mathbf{F}(\mathbf{x}) + a\boldsymbol{\lambda}^T \mathbf{G}(\mathbf{x}) \tag{51}
\]

Suppose that:

\[
f(\mathbf{x}, a) = \frac{1}{2}\|\mathbf{x} - \mathbf{x}^*\|^2 + \frac{\sigma}{2}a^2 \tag{52}
\]

such that:

\[
\mathbf{x}^* = \begin{pmatrix} \tfrac{1}{3} \\ \tfrac{1}{3} \\ \tfrac{1}{3} \end{pmatrix} \tag{53}
\]

is the equilibrium point of the replicator dynamics. Then:

\[
\frac{\partial H}{\partial a} = \sigma a + \boldsymbol{\lambda}^T \mathbf{G}(\mathbf{x}) = 0 \implies a^* = -\frac{1}{\sigma}\boldsymbol{\lambda}^T \mathbf{G}(\mathbf{x}) \tag{54}
\]

If a* is an optimal control, then the transversality condition ensures that:

\[
\boldsymbol{\lambda}(T) = \frac{\partial \Phi}{\partial \mathbf{x}}(\mathbf{x}(T)) \implies a^*(T) = -\frac{1}{\sigma}\left(\frac{\partial \Phi}{\partial \mathbf{x}}\right)^T \mathbf{G}(\mathbf{x}(T)) \tag{55}
\]

In particular, if Φ(x(T)) ≡ 0, then a(T) = 0. The resulting necessary conditions are the non-linear differential equations:

\[
\dot{\mathbf{x}} = \mathbf{F}(\mathbf{x}) - \frac{1}{\sigma}\left(\boldsymbol{\lambda}^T \mathbf{G}(\mathbf{x})\right)\mathbf{G}(\mathbf{x}) \tag{56}
\]
\[
\dot{\boldsymbol{\lambda}}^T = -(\mathbf{x} - \mathbf{x}^*)^T - \boldsymbol{\lambda}^T \frac{\partial \mathbf{F}(\mathbf{x})}{\partial \mathbf{x}} + \frac{1}{\sigma}\left(\boldsymbol{\lambda}^T \mathbf{G}(\mathbf{x})\right)\boldsymbol{\lambda}^T \frac{\partial \mathbf{G}(\mathbf{x})}{\partial \mathbf{x}} \tag{57}
\]
\[
\mathbf{x}(0) = \mathbf{x}_0 \tag{58}
\]
\[
\boldsymbol{\lambda}(T) = \frac{\partial \Phi}{\partial \mathbf{x}}(\mathbf{x}(T)) \tag{59}
\]

Here:

\[
\frac{\partial \mathbf{F}(\mathbf{x})}{\partial \mathbf{x}}, \quad \frac{\partial \mathbf{G}(\mathbf{x})}{\partial \mathbf{x}}
\]

are the Jacobians of the vector valued functions F and G. Note, this is true of any evolutionary game problem with a parameter a where the payoff matrix can be written as A = B + aC.
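A short symbolic sketch (ours, using sympy) of this decomposition, which also computes the Jacobians at x* that are needed for the linearization in the next subsection:

\begin{verbatim}
# Sketch: build F and G from B and C (Equations 46-47) and evaluate
# their Jacobians at the interior fixed point x* = (1/3, 1/3, 1/3).
import sympy as sp

x1, x2, x3 = sp.symbols("x1 x2 x3", positive=True)
x = sp.Matrix([x1, x2, x3])
B = sp.Matrix([[0, -1, 0], [0, 0, -1], [-1, 0, 0]])
C = sp.Matrix([[0, 0, 1], [1, 0, 0], [0, 1, 0]])

def repl(M):
    avg = (x.T * M * x)[0]    # x^T M x
    return sp.Matrix([x[i] * ((M * x)[i] - avg) for i in range(3)])

F, G = repl(B), repl(C)
star = {x1: sp.Rational(1, 3), x2: sp.Rational(1, 3), x3: sp.Rational(1, 3)}
J = F.jacobian(x).subs(star)  # equals (1/9) [[2,-1,2],[2,2,-1],[-1,2,2]]
H = G.jacobian(x).subs(star)  # equals (1/9) [[-2,-2,1],[1,-2,-2],[-2,1,-2]]
sp.pprint(J)
sp.pprint(H)
\end{verbatim}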

3.2 Linearizing the Problem

Suppose we linearize the problem about the equilibrium point and suppose that Φ(x(T)) ≡ 0. Let:

\[
\mathbf{J} = \left.\frac{\partial \mathbf{F}(\mathbf{x})}{\partial \mathbf{x}}\right|_{\mathbf{x}=\mathbf{x}^*} \tag{60}
\]
\[
\mathbf{H} = \left.\frac{\partial \mathbf{G}(\mathbf{x})}{\partial \mathbf{x}}\right|_{\mathbf{x}=\mathbf{x}^*} \tag{61}
\]

Without loss of generality, suppose we translate the problem so that x* = 0 (i.e., by replacing x by x − x*). Then the quasi-linearized control problem becomes:

\[
\begin{aligned}
\min \quad & \int_0^T \frac{1}{2}\|\mathbf{x}\|^2 + \frac{\sigma}{2}a^2\, dt \\
\text{s.t.} \quad & \dot{\mathbf{x}} = \mathbf{J}\mathbf{x} + a\mathbf{H}\mathbf{x}
\end{aligned} \tag{63}
\]

From this, we compute:

\[
a^* = -\frac{1}{\sigma}\boldsymbol{\lambda}^T \mathbf{H}\mathbf{x} \tag{64}
\]

a quadratic form in the state and co-state. The necessary conditions for a* to be an optimal control are then:

\[
\dot{\mathbf{x}} = \mathbf{J}\mathbf{x} - \frac{1}{\sigma}\left(\boldsymbol{\lambda}^T \mathbf{H}\mathbf{x}\right)\mathbf{H}\mathbf{x} \tag{65}
\]
\[
\dot{\boldsymbol{\lambda}}^T = -\mathbf{x}^T - \boldsymbol{\lambda}^T \mathbf{J} + \frac{1}{\sigma}\left(\boldsymbol{\lambda}^T \mathbf{H}\mathbf{x}\right)\boldsymbol{\lambda}^T \mathbf{H} \tag{66}
\]

\[
\mathbf{x}(0) = \mathbf{x}_0 \tag{67}
\]
\[
\boldsymbol{\lambda}(T) = 0 \tag{68}
\]

which we can re-write as an almost linear two-point boundary value problem:

\[
\begin{pmatrix} \dot{\mathbf{x}} \\ \dot{\boldsymbol{\lambda}} \end{pmatrix} = \begin{pmatrix} \mathbf{J} & 0 \\ -\mathbf{I}_n & -\mathbf{J}^T \end{pmatrix}\begin{pmatrix} \mathbf{x} \\ \boldsymbol{\lambda} \end{pmatrix} - \frac{1}{\sigma}\left(\boldsymbol{\lambda}^T \mathbf{H}\mathbf{x}\right)\begin{pmatrix} \mathbf{H} & 0 \\ 0 & -\mathbf{H}^T \end{pmatrix}\begin{pmatrix} \mathbf{x} \\ \boldsymbol{\lambda} \end{pmatrix} \tag{69}
\]
\[
\mathbf{x}(0) = \mathbf{x}_0 \tag{70}
\]
\[
\boldsymbol{\lambda}(T) = 0 \tag{71}
\]
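A numerical sketch (ours) of this boundary value problem using scipy's collocation solver; the matrices J and H are the Jacobians at x* from the sympy sketch above, and σ, T, and x(0) are assumed for illustration:

\begin{verbatim}
# Sketch: solve the almost-linear TPBVP (69)-(71) numerically.
import numpy as np
from scipy.integrate import solve_bvp

J = np.array([[2, -1, 2], [2, 2, -1], [-1, 2, 2]]) / 9.0
H = np.array([[-2, -2, 1], [1, -2, -2], [-2, 1, -2]]) / 9.0
sigma, T = 1.0, 40.0
x0 = np.array([0.3, 0.3, 0.4]) - 1 / 3     # translated so that x* = 0

def rhs(t, y):
    x, lam = y[:3], y[3:]
    a = -np.einsum("it,ij,jt->t", lam, H, x) / sigma  # a* = -(1/sigma) lam^T H x
    dx = J @ x + a * (H @ x)                          # Equation (65)
    dlam = -x - J.T @ lam - a * (H.T @ lam)           # Equation (66), transposed
    return np.vstack([dx, dlam])

def bc(y0, yT):
    return np.concatenate([y0[:3] - x0, yT[3:]])      # x(0) = x0, lam(T) = 0

t = np.linspace(0, T, 200)
y_init = np.zeros((6, t.size))
y_init[:3] = x0[:, None] * (1 - t / T)                # crude initial guess
sol = solve_bvp(rhs, bc, t, y_init, max_nodes=20000)
lam, x = sol.sol(t)[3:], sol.sol(t)[:3]
a_star = -np.einsum("it,ij,jt->t", lam, H, x) / sigma
print("a*(0) =", a_star[0], " a*(T) =", a_star[-1])   # a*(T) should be ~0
\end{verbatim}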

This problem, while non-linear, preserves some of the structure of the original problem, as we can see through numerical solution. Note, the optimal control,

(a) (b)

Figure 4: Computed optimal control of partially linearized dynamics: x(0) = 0.3 − 1/3, y(0) = 0.3 − 1/3, z(0) = 0.4 − 1/3

shown in Figure 4(b), has the decaying characteristic of the control function shown in Figure 2(b) but does not have the oscillation. This is the result of substituting the Jacobians in for the non-linear replicator dynamics.

3.3 A Linear Problem

The fundamental problem with this "linearization" is that the resulting problem is still non-linear. To address this, notice that the "linearized dynamics" can be written as:

\[
\dot{\mathbf{x}} = \mathbf{J}\mathbf{x} + a\mathbf{H}\mathbf{x} = \mathbf{J}\mathbf{x} + \mathbf{H}(a\mathbf{x}) \tag{72}
\]

For the sake of argument, let u = ax be the control vector. Then a similar (but not equivalent) LQC problem to be studied is:

\[
\begin{aligned}
\min \quad & \int_0^T \frac{1}{2}\|\mathbf{x}\|^2 + \frac{\sigma}{2}\|\mathbf{u}\|^2\, dt \\
\text{s.t.} \quad & \dot{\mathbf{x}} = \mathbf{J}\mathbf{x} + \mathbf{H}\mathbf{u}
\end{aligned} \tag{73}
\]

Our Hamiltonian for this problem is:

\[
H = \frac{1}{2}\|\mathbf{x}\|^2 + \frac{\sigma}{2}\|\mathbf{u}\|^2 + \boldsymbol{\lambda}^T(\mathbf{J}\mathbf{x} + \mathbf{H}\mathbf{u}) \tag{74}
\]

From this, in the linearized problem we have:

\[
\frac{\partial H}{\partial \mathbf{u}} = \sigma \mathbf{u}^T + \boldsymbol{\lambda}^T \mathbf{H} = 0 \implies \mathbf{u} = -\frac{\mathbf{H}^T \boldsymbol{\lambda}}{\sigma} \tag{75}
\]

The necessary conditions for u* to be an optimal control are then:

\[
\dot{\mathbf{x}} = \mathbf{J}\mathbf{x} - \frac{1}{\sigma}\mathbf{H}\mathbf{H}^T\boldsymbol{\lambda} \tag{76}
\]
\[
\dot{\boldsymbol{\lambda}} = -\mathbf{x} - \mathbf{J}^T\boldsymbol{\lambda} \tag{77}
\]
\[
\mathbf{x}(0) = \mathbf{x}_0 \tag{78}
\]
\[
\boldsymbol{\lambda}(T) = 0 \tag{79}
\]

Notice that this time we have a linear two-point boundary value problem (solvable, e.g., via an associated Riccati equation [8]) that can be written as follows:

\[
\begin{pmatrix} \dot{\mathbf{x}} \\ \dot{\boldsymbol{\lambda}} \end{pmatrix} = \begin{pmatrix} \mathbf{J} & -\frac{1}{\sigma}\mathbf{H}\mathbf{H}^T \\ -\mathbf{I}_n & -\mathbf{J}^T \end{pmatrix}\begin{pmatrix} \mathbf{x} \\ \boldsymbol{\lambda} \end{pmatrix} \tag{80}
\]

Unfortunately, the solution cannot be written in extended form due to length, but will have the following form:

\[
\begin{pmatrix} \mathbf{x} \\ \boldsymbol{\lambda} \end{pmatrix} = \exp\left[\begin{pmatrix} \mathbf{J} & -\frac{1}{\sigma}\mathbf{H}\mathbf{H}^T \\ -\mathbf{I}_n & -\mathbf{J}^T \end{pmatrix} t\right] \vec{C} \tag{81}
\]

where C⃗ is a constant vector that allows the expression to satisfy the two boundary conditions: x(0) = x_0 and λ(T) = 0. We illustrate an example solution in Figure 5. Here, we denote the control vector u = [u, v, w]^T. Notice that the characteristics of u bear no resemblance to the computed characteristics of a from (e.g.) Figure 4. This suggests that while we have constructed a linear control problem, it does not provide us with useful information about our original non-linear control problem. We do note, however, that Arrow's Theorem ensures that this is the true optimal control, as linear quadratic control problems of this type satisfy the appropriate convexity assumptions.

To see why this is, note that by letting u = ax, we are implicitly allowing the control to operate independently in all three dimensions simultaneously; that is, we assume we can influence the payoff of winning the game independently among the three species (rock, paper, and scissors). Clearly these extra degrees of freedom allow the control system to achieve a state value near the equilibrium point much more quickly than in the constrained dynamics of Problem 44 or 48. In our future work, we discuss a way of recovering these

(a) (b)

Figure 5: Optimal control of fully linearized dynamics: x(0) = 0.3 − 1/3, y(0) = 0.3 − 1/3, z(0) = 0.4 − 1/3, σ = 1

constraints within the objective function, thus moving all the non-convexity (and non-linearity) to the objective, rather than the differential equations.
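For completeness, here is a sketch (ours) of the construction in Equation (81): form the 6 × 6 block matrix, then pick the constant vector C so that x(0) = x0 and λ(T) = 0. The horizon T and weight σ are assumed values, and J and H come from the earlier sympy sketch.

\begin{verbatim}
# Sketch: solve the linear TPBVP (80)-(81) via the matrix exponential.
import numpy as np
from scipy.linalg import expm

J = np.array([[2, -1, 2], [2, 2, -1], [-1, 2, 2]]) / 9.0
H = np.array([[-2, -2, 1], [1, -2, -2], [-2, 1, -2]]) / 9.0
sigma, T = 1.0, 10.0
x0 = np.array([0.3, 0.3, 0.4]) - 1 / 3

M = np.block([[J, -H @ H.T / sigma],
              [-np.eye(3), -J.T]])

# lambda(T) = 0 fixes lambda(0) in terms of x(0) through exp(MT):
E = expm(M * T)
lam0 = np.linalg.solve(E[3:, 3:], -E[3:, :3] @ x0)
C = np.concatenate([x0, lam0])            # C = (x(0), lambda(0))

for tt in (0.0, T / 2, T):
    y = expm(M * tt) @ C
    u = -H.T @ y[3:] / sigma              # control vector u*(t), Equation (75)
    print(f"t = {tt:5.2f}  x = {np.round(y[:3], 4)}  u = {np.round(u, 4)}")
\end{verbatim}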

4 Future Work and Conclusions

4.1 Future Work

As noted, when we compare the two methods of solving our linearization problem, we notice that the dynamics of the second problem approach the Nash equilibrium more quickly than the first, but the resulting control functions bear no resemblance to the control computed in the general non-linear system. We can recover the original problem by enforcing constraints of the form:

\[
\frac{u_i}{x_i} = \frac{u_j}{x_j} \ \forall\, i \neq j \implies u_i x_j = u_j x_i \ \forall\, i \neq j \tag{82}
\]

and replacing the objective functional with

\[
\begin{aligned}
\min \quad & \int_0^T \frac{1}{2}\|\mathbf{x}\|^2 + \frac{\sigma}{2n}\sum_i \left(\frac{u_i}{x_i}\right)^2 dt \\
\text{s.t.} \quad & \dot{\mathbf{x}} = \mathbf{J}\mathbf{x} + \mathbf{H}\mathbf{u} \\
& u_i x_j = u_j x_i \ \forall\, i \neq j
\end{aligned} \tag{83}
\]

The non-linear constraints can be priced out to produce the following (again) non-linear control problem but now with linear dynamics:

\[
\begin{aligned}
\min \quad & \int_0^T \frac{1}{2}\|\mathbf{x}\|^2 + \frac{\sigma}{2n}\sum_i \left(\frac{u_i}{x_i}\right)^2 + \sum_i \sum_j \rho_{ij}\left(u_i x_j - u_j x_i\right)^2 dt \\
\text{s.t.} \quad & \dot{\mathbf{x}} = \mathbf{J}\mathbf{x} + \mathbf{H}\mathbf{u}
\end{aligned} \tag{84}
\]
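A small sketch (ours) of this priced-out running cost; the uniform penalty weight ρ is a free parameter we introduce for illustration. Note that the penalty vanishes exactly when u is proportional to x, i.e., when u = ax for some scalar a, which is how the penalty recovers the original single-parameter control:

\begin{verbatim}
# Sketch: the penalized running cost of Problem (84) for n = 3.
import numpy as np

def running_cost(x, u, sigma=1.0, rho=10.0):
    n = len(x)
    track = 0.5 * np.dot(x, x)                       # (1/2)||x||^2
    effort = (sigma / (2 * n)) * np.sum((u / x)**2)  # control effort term
    penalty = rho * sum((u[i] * x[j] - u[j] * x[i])**2
                        for i in range(n) for j in range(n) if i != j)
    return track + effort + penalty

x = np.array([0.05, -0.02, -0.03])                  # deviation from x*
print(running_cost(x, 0.7 * x))                     # u = a x: penalty is zero
print(running_cost(x, np.array([0.03, 0.01, -0.04])))  # penalty is active
\end{verbatim}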

As a future research project, it would be interesting to explore the solutions of this problem and compare them to those of the original non-linear optimal control problem. As additional future work, we note that the structure of the almost linearized problem (Problem 48) is deceptively simple. We have not given up hope that it is possible to understand the structure of the solutions of Problem 48 analytically, and therefore to shed some light on the behavior of the original non-linear problem near the fixed point.

4.2 Conclusion

In this thesis, we studied evolutionary game theory and introduced the payoff matrix and replicator dynamics for the rock-paper-scissors game. Next, we surveyed results in optimal control theory and identified the characteristics of an optimal control. We then provided an example of a linear quadratic control problem, a form used later in the thesis. With this knowledge, we formulated an optimal control problem based on the rock-paper-scissors game and solved it numerically. We tried solving a variation of the game by modifying the cost functional and encountered the same problem: global optimality could not be proved. We then attempted to linearize the problem by substituting the Jacobian matrices into the non-linear replicator dynamics, but found that the problem was still not linear because of the way the control interacted with the state. We tried a different method of linearization and were able to derive a closed form solution and found some of its interesting properties. However, this method altered the characteristics of our original control problem, which led to an entirely new control problem with new solution structure. Finally, we discussed as future work how to recover the original problem after linearization by enforcing certain constraints and pricing them out in the objective functional.

References

[1] S. Aniţa, V. Arnăutu, and V. Capasso, An introduction to optimal control problems in life sciences and economics, Springer, 2011.
[2] S. J. Brams, Game theory and politics, Dover Press, 2004.
[3] Y. Freund and R. E. Schapire, Game theory, on-line prediction and boosting, Ninth Annual Conference on Computational Learning Theory, 1996, pp. 325–332.
[4] T. L. Friesz, Dynamic Optimization and Differential Games, Springer, 2010.
[5] C. Griffin, Game Theory: Penn State Math 486 Lecture Notes (v 1.1), http://www.personal.psu.edu/cxg286/Math486.pdf, 2010–2012.
[6] J. Hofbauer and K. Sigmund, Evolutionary Games and Population Dynamics, Cambridge University Press, 1998.
[7] J. Hofbauer and K. Sigmund, Evolutionary Game Dynamics, Bulletin of the American Mathematical Society 40 (2003), no. 4, 479–519.
[8] D. E. Kirk, Optimal control theory: An introduction, Dover Press, 2004.
[9] R. D. Luce and H. Raiffa, Games and decisions: Introduction and critical survey, Dover Press, 1989.
[10] O. L. Mangasarian, Sufficient conditions for the optimal control of nonlinear systems, SIAM J. Control 4 (1966), 139–152.
[11] J. E. Marsden and A. Tromba, Vector calculus, 5 ed., W. H. Freeman, 2003.
[12] P. Morris, Introduction to Game Theory, Springer, 1994.
[13] R. B. Myerson, Game theory: Analysis of conflict, Harvard University Press, 2001.
[14] A. Takayama, Mathematical economics, 2 ed., Cambridge University Press, 1985.
[15] J. W. Weibull, Evolutionary game theory, MIT Press, 1997.
[16] E. C. Zeeman, Population dynamics from game theory, Global Theory of Dynamical Systems, Springer Lecture Notes in Mathematics, no. 819, Springer, 1980.

Academic Vita

Wente Brian Cai
[email protected]

Education

The Pennsylvania State University, Spring 2014
Bachelor of Science in Mathematics
Honors in Mathematics

Activities

Pfizer, Epidemiology Intern, May 2012 – Aug 2012
JPMorgan Chase, Summer Analyst, June 2013 – Aug 2013
Penn State Dance Marathon, PR Photography Captain, Sep 2013 – Mar 2014