The Pennsylvania State University
Schreyer Honors College
Department of Mathematics

An Optimal Control Problem Arising from an Evolutionary Game

Wente Brian Cai
Spring 2014

A thesis submitted in partial fulfillment of the requirements for a baccalaureate degree in Mathematics with honors in Mathematics.

To be reviewed and approved* by the following:

Christopher Griffin, Research Associate & Asst. Professor, Thesis Supervisor
Nate Brown, Professor of Mathematics, Honors Adviser

*Signatures will be on file if this thesis is approved.

Abstract

This paper is an integrative study of evolutionary game theory and optimal control. We first study the basics of evolutionary game theory and introduce the model we wish to study, based on the game of Rock-Paper-Scissors. We then move on to an introduction of optimal control and identify the requirements that must be fulfilled for a solution to be optimal. Finally, we explore different methods of modeling the Rock-Paper-Scissors game as an optimal control problem and try to find an optimal control that minimizes the cost of the game. Various linearization schemes are attempted and the results are discussed.

Contents

1 Introduction
2 Preliminaries and Literature Review
   2.1 Evolutionary Game Theory
   2.2 Optimal Control Theory
   2.3 Linear Quadratic Control Example
   2.4 Sufficient Conditions for Optimal Control
   2.5 Problem Statement: Relating Optimal Control to Game Theory
3 Optimal Control of Generalized Rock-Paper-Scissors
   3.1 Constructing a General Form of the Problem
   3.2 Linearizing the Problem
   3.3 A Linear Problem
4 Future Work and Conclusions
   4.1 Future Work
   4.2 Conclusion
Bibliography

List of Figures

1  (a) An illustration of an unstable Nash equilibrium fixed point when $a = -5$; (b) an illustration of a non-linear center Nash equilibrium fixed point when $a = 0$; (c) an illustration of a stable Nash equilibrium fixed point when $a = 1$.
2  Computed optimal control of the rock-paper-scissors evolutionary game: $x(0) = 0.5$, $y(0) = 0.2$, $z(0) = 0.3$, $T = 40$, $\sigma = 1/100$.
3  Computed optimal control of a rock-paper-scissors evolutionary game variation: $x(0) = 0.5$, $y(0) = 0.2$, $z(0) = 0.3$, $T = 40$, $\sigma = 1/100$.
4  Computed optimal control of the partially linearized dynamics: $x(0) = 0.3 - \tfrac{1}{3}$, $y(0) = 0.3 - \tfrac{1}{3}$, $z(0) = 0.4 - \tfrac{1}{3}$.
5  Optimal control of the fully linearized dynamics: $x(0) = 0.3 - \tfrac{1}{3}$, $y(0) = 0.3 - \tfrac{1}{3}$, $z(0) = 0.4 - \tfrac{1}{3}$, $\sigma = 1$.

1 Introduction

Evolutionary game theory studies the decision making of games over time: dominant strategies survive and are passed on to the next generation, while failing strategies are phased out. We start with a large population of players playing a two-player game. Each player is assigned a strategy (by genetics), and the resulting payoff from play determines which strategy was more successful and thus which strategy reproduces. We can then track the growth or decline of strategies in our population and predict the trajectory of each strategy. Our goal is to study the dynamics of the game while we dynamically alter the payoffs of the game. In the case of Rock-Paper-Scissors, it is understood that the optimal strategy is to pick rock, paper, or scissors in equal proportions, $(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3})$. Consequently, this forms a fixed point of the evolutionary game resulting from rock-paper-scissors play.
Under certain payoff assumptions, in a population with many scissors players, many rock players, and only some paper players, it is clear that the number of scissors players will decrease (from lack of reproduction) while the number of paper players increases. For certain classes of payoff, this process leads to the fixed point $(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3})$, while in other cases we may observe cyclic behavior. In the case of convergent behavior, how quickly that fixed point is reached depends on the game: the larger the difference between the winning and losing payoffs (i.e., the bigger the gap between winning and losing), the quicker the population approaches the fixed-point strategy. In this thesis, we consider the rock-paper-scissors payoff matrix:
\[
A = \begin{bmatrix} 0 & 1+a & -1 \\ -1 & 0 & 1+a \\ 1+a & -1 & 0 \end{bmatrix}, \qquad a \in \mathbb{R}
\]
Our objective is to dynamically alter the value of $a$ (rather than leaving it a static value) in order to drive the population toward the fixed point. However, we assume that altering the value of $a$ may be hard, and we therefore attempt to minimize the total cost associated with driving the population to the prescribed fixed point.

We study this problem theoretically and numerically, showing two solutions for the resulting non-linear optimal control problem with interesting properties. We then attempt to linearize the control problem to obtain a closed-form solution. Unfortunately, in so doing we lose some of the characteristics of the control problem and obtain an entirely new control problem with a new solution structure. In future work, we discuss how to recover the original problem while maintaining a linear dynamical system, at the cost of introducing non-linearity and non-convexity into the objective function.

2 Preliminaries and Literature Review

2.1 Evolutionary Game Theory

Game theory is the study of decision making when an agent's payoff is affected by the decisions of (many) other agents and is based on the payoff one gets for using certain strategies [2, 3, 5, 9, 12, 13]. For us, a game consists of two players using a set $K$ of $n$ pure strategies denoted by $i = 1, 2, \ldots, n$. When Player I uses strategy $m \in K$ and Player II uses strategy $n \in K$, there is a payoff $a_{mn}$. With the values $a_{ij}$, $i, j = 1, 2, \ldots, n$, we can create an $n \times n$ payoff matrix $A$ for Player I (when we assume the game is symmetric, Player II's payoff matrix is $A^T$) [12]. Game theory is an extensive subject; the interested reader should consult [2, 3, 5, 9, 12, 13] for details. All standard game-theoretic definitions given in this thesis can be found in these references.

A player's mixed strategy is a column vector $x = (x_1, x_2, \ldots, x_n)^T$ where each $x_i$ is the probability of using strategy $i \in K$. Let $S_n$ be the set of all mixed strategies in the game. If Players I and II use mixed strategies $x, y \in S_n$, the expected payoff for Player I is $x^T A y$ and the expected payoff for Player II is $x^T A^T y$. The strategy $x$ is said to be the best response to $y$ if
\[
x^T A y \ge z^T A y \quad \forall\, z \in S_n \tag{1}
\]

Definition 2.1. A strategy $x^*$ is a (Nash) equilibrium for the symmetric two-player game with payoff matrix $A$ if:
\[
(x^*)^T A x^* \ge y^T A x^* \quad \forall\, y \in S_n \tag{2}
\]
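As a quick numerical illustration of Definition 2.1 (our own sketch, not part of the thesis), the Python snippet below builds the generalized rock-paper-scissors matrix from the Introduction and checks condition (2) at the uniform strategy $x^* = (\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3})^T$ against a sample of random mixed strategies; the function names are ours and chosen for illustration only.

```python
import numpy as np

def rps_matrix(a):
    """Generalized rock-paper-scissors payoff matrix A with parameter a."""
    return np.array([[0.0,    1 + a, -1.0  ],
                     [-1.0,   0.0,    1 + a],
                     [1 + a, -1.0,    0.0  ]])

def is_symmetric_equilibrium(A, x_star, n_samples=10_000, tol=1e-9):
    """Check condition (2): (x*)^T A x* >= y^T A x* for many sampled y in S_n."""
    base = x_star @ A @ x_star
    rng = np.random.default_rng(0)
    # Sample mixed strategies y uniformly from the simplex S_n.
    ys = rng.dirichlet(np.ones(len(x_star)), size=n_samples)
    return bool(np.all(ys @ (A @ x_star) <= base + tol))

x_star = np.full(3, 1.0 / 3.0)      # the uniform strategy (1/3, 1/3, 1/3)
for a in (-5.0, 0.0, 1.0):          # the three values used in Figure 1
    A = rps_matrix(a)
    print(f"a = {a:+.1f}: x*^T A x* = {x_star @ A @ x_star:.4f}, "
          f"equilibrium check passes: {is_symmetric_equilibrium(A, x_star)}")
```

For this matrix $A x^* = (a/3, a/3, a/3)^T$, so every mixed strategy earns the same expected payoff $a/3$ against $x^*$; the inequality in (2) holds with equality and the uniform strategy is a Nash equilibrium for every value of $a$.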
In evolutionary game theory, we observe how a population of players changes its strategies over time. Let us consider a large population playing the same symmetric game we defined earlier. Let $p_i(t) \ge 0$ be the number of individuals at time $t$ using strategy $i$ and let $p(t) = \sum_{i \in K} p_i(t) > 0$ be the total population. The population state is the vector $x(t) = (x_1(t), x_2(t), \ldots, x_n(t))^T$ such that each $x_i(t)$ is the proportion of the population using strategy $i$ at time $t$. The expected payoff for strategy $i \in K$ is then $e_i^T A x$ and the population average payoff is $x^T A x$. Here $e_i$ is the $i$th standard basis vector in $\mathbb{R}^n$, where we have $n$ strategies.

Our goal is to derive a differential equation that determines the growth or decline of strategy proportions in the population. In our scenario, the growth and decline of strategies depend on the fitness of the strategy. In other words, the more successful a strategy is, the quicker the population adopts that strategy over time, while poorer strategies die off. The following derivation is due to [15]. We can define the change in the number of individuals using strategy $i$ as
\[
\dot{p}_i = [\beta + e_i^T A x - \delta]\, p_i \tag{3}
\]
where $\beta \ge 0$ is the initial fitness of individuals in the population and $\delta \ge 0$ is the death rate for all individuals. By definition, $p(t)\, x_i(t) = p_i(t)$. Differentiating this identity with respect to $t$ and applying Equation (3), we get:
\[
p\, \dot{x}_i = \dot{p}_i - \dot{p}\, x_i = [\beta + e_i^T A x - \delta]\, p_i - [\beta + x^T A x - \delta]\, p\, x_i \tag{4}
\]
where we have used $\dot{p} = \sum_{i \in K} \dot{p}_i = [\beta + x^T A x - \delta]\, p$. If we divide both sides by $p$, we obtain the replicator dynamics:
\[
\dot{x}_i = [e_i^T A x - x^T A x]\, x_i, \qquad i = 1, \ldots, n \tag{5}
\]
The replicator dynamics tell us that strategies that earn a greater-than-average payoff grow in the population and strategies that earn a less-than-average payoff decline.

The rest points of the replicator dynamics are the zeros of the right-hand side of the replicator equation; i.e., all points $x \in S_n$ such that $e_i^T A x = x^T A x$. A rest point $x$ is stable if every neighborhood $B$ of $x$ contains a neighborhood $B^\circ$ such that if we start at $x_0 \in B^\circ$, then the solution flow satisfies $\varphi(t, x_0) \in B$ for all $t \ge 0$. In other words, $x$ is stable if whenever a point starts in a suitable neighborhood of $x$, it stays contained in that neighborhood over time.
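To connect the replicator dynamics back to the three regimes shown in Figure 1, the following simulation sketch (our own illustration, assuming NumPy and SciPy are available; the helper names are hypothetical) integrates Equation (5) for the generalized rock-paper-scissors matrix and reports how far the trajectory sits from the interior rest point $(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3})$ at time $T = 40$.

```python
import numpy as np
from scipy.integrate import solve_ivp

def rps_matrix(a):
    """Generalized rock-paper-scissors payoff matrix A with parameter a."""
    return np.array([[0.0,    1 + a, -1.0  ],
                     [-1.0,   0.0,    1 + a],
                     [1 + a, -1.0,    0.0  ]])

def replicator(t, x, A):
    """Right-hand side of Equation (5): x_i' = [e_i^T A x - x^T A x] x_i."""
    fitness = A @ x        # e_i^T A x for each strategy i
    avg = x @ fitness      # population average payoff x^T A x
    return (fitness - avg) * x

x0 = np.array([0.5, 0.2, 0.3])      # initial population state, as in Figure 2
rest = np.full(3, 1.0 / 3.0)        # the interior rest point (1/3, 1/3, 1/3)
for a in (-5.0, 0.0, 1.0):          # the three regimes illustrated in Figure 1
    sol = solve_ivp(replicator, (0.0, 40.0), x0, args=(rps_matrix(a),),
                    rtol=1e-9, atol=1e-12)
    dist = np.linalg.norm(sol.y[:, -1] - rest)
    print(f"a = {a:+.1f}: distance from (1/3, 1/3, 1/3) at T = 40 is {dist:.4f}")
```

Under these payoffs, $a = 1$ should drive the trajectory toward the rest point (the stable case of Figure 1(c)), $a = 0$ should produce cycling around the rest point without convergence (the non-linear center of Figure 1(b)), and $a = -5$ should push the trajectory toward the boundary of the simplex (the unstable case of Figure 1(a)).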