The Pennsylvania State University

TOPICS IN LEARNING AND INFORMATION DYNAMICS

IN

A Dissertation in Mathematics by Matthew Young

© 2020 Matthew Young

Submitted in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

August 2020

The dissertation of Matthew Young was reviewed and approved by the following:

Andrew Belmonte Professor of Mathematics Dissertation Advisor, Chair of Committee

Christopher Griffin Professor of Operations Research

Jan Reimann Professor of Mathematics

Sergei Tabachnikov Professor of Mathematics

Syed Nageeb Ali Professor of Economics

Alexei Novikov Professor of Mathematics Chair of Graduate Program

Abstract

We discuss the role of learning and information in game theory, and design and investigate four models using different learning mechanisms. The first consists of rational players, one of which has the option of purchasing information about the other player's strategy choice before choosing their own. The second is an agent-based public goods model where players incrementally adjust their contribution levels after each game they play. The third is an agent-based rock, paper, scissors model where players choose their strategies by flipping cards with a win-stay lose-shift strategy. The fourth is a machine learning model where we train adversarial neural networks to play arbitrary 2 × 2 games together. We study various aspects of each of these models and how the different learning dynamics in them influence their behavior.

Table of Contents

List of Figures vii

List of Tables xi

Acknowledgments xii

Chapter 1 The Role of Learning in Game Theory 1
1.1 Introduction ...... 1
1.2 Rational learning after purchasing information ...... 3
1.3 Agent-based models with simple algorithm learning rules ...... 4
1.4 Machine learning by adversarial neural networks ...... 5

Chapter 2 Oracle Games 8
2.1 Introduction ...... 8
2.2 Preliminary Considerations ...... 14
2.2.1 Motivating Examples ...... 15
2.2.2 Definitions ...... 22
2.3 Fundamental Properties of Oracle Games ...... 23
2.4 Main Results ...... 26
2.5 Harmful Information ...... 34
2.6 Helpful Information ...... 37
2.7 Multiple Equilibria ...... 38
2.8 Discussion ...... 39

Chapter 3 Fair Contribution in a Nonlinear Stochastic Public Goods Model 42
3.1 Introduction ...... 42
3.1.1 Public Goods Games ...... 42
3.1.2 Cooperative Behavior in Biology ...... 44
3.1.3 Modifications of Public Goods Models ...... 45
3.1.4 Fairness ...... 47
3.1.5 Our model ...... 49
3.2 Definition of the ...... 49
3.3 Population Dynamics ...... 51
3.4 Numerical Simulations ...... 56
3.5 Dynamics in the Presence of a Permanent Freeloader ...... 68
3.5.1 Multiple Permanent Freeloaders ...... 70
3.6 Discussion ...... 72

Chapter 4 Population dynamics in a model with restricted strategy transitions 74
4.1 Introduction ...... 74
4.1.1 Win-Stay Lose-Shift ...... 74
4.1.2 A Biological Basis for a Restriction to Two Strategies ...... 76
4.1.3 A Biological Basis for Win-Stay Lose-Shift ...... 77
4.1.4 Our Model ...... 79
4.2 Discrete Model ...... 80
4.3 Extinction in a Restricted Transition Population ...... 83
4.4 Continuous Models ...... 85
4.5 Discussion ...... 92

Chapter 5 Neural Networks Playing Games 95
5.1 Introduction ...... 95
5.1.1 Machine Learning ...... 95
5.1.2 Game theory and adversarial networks ...... 97
5.1.3 Our model ...... 99
5.2 The Model ...... 99
5.2.1 Construction ...... 99
5.2.2 Errors ...... 102
5.2.3 Temptation Games ...... 108
5.3 Pruning ...... 111
5.4 Network Comparisons ...... 114
5.4.1 Euclidean Distance in Weight Space ...... 114
5.4.2 Polar Projection ...... 118
5.4.3 Paths Between Networks ...... 121
5.4.4 Correlation metrics ...... 124
5.5 Level-k hierarchy in networks ...... 128
5.6 Discussion ...... 133
5.6.1 Summary ...... 133
5.6.2 Extensions of this model ...... 134
5.6.3 Applications ...... 135

Chapter 6 Conclusion 137
6.1 Discussion ...... 137
6.1.1 Types of Information ...... 137
6.1.2 Benefit to players ...... 139
6.1.3 Convergence to a ...... 141
6.1.4 Intelligence ...... 142
6.2 Future Research ...... 143

Appendix Master Equation to Langevin Derivation from Chapter 4 146

Bibliography 150

List of Figures

2.1 Extensive form game ...... 15

2.2 Game tree for the motivating construction of Oracle Games ...... 16

2.3 Game tree for the standard construction of Oracle Games ...... 17

2.4 Payment in equilibrium shown for oracle functions I(x) = (a) √(x + 1) − 1; (b) √x; (c) 2√x ...... 19

2.5 Amount paid by Player A at the equilibrium (above), and the resulting response rate I (below), as functions of k when I(x) = √(kx) ...... 20

2.6 Illustration of the construction to prove Proposition 2 (see text): (top) original given oracle function I(x); (middle) nondecreasing equivalent oracle function J1(x); (bottom) final nondecreasing, concave oracle function J(x) ...... 25

2.7 Harmful Information Extensive form game ...... 35

2.8 Ea as a function of k when I(x) = √(kx) ...... 37

3.1 Graph of the benefit function b(C) = √(400C), with equilibrium and socially optimal values marked in the case where m = 2 ...... 51

3.2 Examples of the Nash equilibrium value Ce plotted as a function of α, for three different return values R = 0.4 (bottom), 0.7 (middle), 0.95 (top). . 52

3.3 Representation of a single round of play...... 53

3.4 Simulation with m = n = 4 ...... 55

3.5 Numerical simulation of the model with n = 100 players and group size m = 10, showing the distribution of contributions C of each player around the fairpoint f: t = 0 (top), t = 800 (middle), t = 1600 (bottom) ...... 56

3.6 Average contribution cavg for population of n = 100 players over time, shown for: m = 10 (top), m = 50 (middle), m = 100 (bottom)...... 57

3.7 Examples of five individual players’ ci over time, in a population of n = 50 players and group size m = 10...... 58

3.8 E1 behavior ...... 58

3.9 E1 over time shown for three different initial conditions with n = 100, each with different subgroup sizes: m = 10, 50, 80...... 59

3.10 The initial states at t= 0 (left) and at t = 1600 for each group size (right) from the simulation used in Fig. 3.9...... 59

3.11 Average decrease rate for E1 as a function of m...... 60

3.12 The average value of ci for populations of n = 100 players with one permanent freeloader, as a function of group size m. Each population was numerically simulated and measured ten times at regular intervals between t = 200,000 and 300,000. We fixed d = 10,000, and for each m let ui = √(40000mC) − ci, which gives f = 10,000 even when m changes ...... 68

3.13 The average value of ci for populations of n = 100 players with two permanent freeloaders, as a function of m. In this case, two transitions are observed (see text)...... 71

4.1 Illustration of the population dynamics for one time step ...... 81

4.2 Trajectory of one simulation for N = 8 ...... 84

4.3 The average time for strategies played to reach a monoculture (average extinction time) as a function of the total number of cards N, with A = 1/2,B = 1/2,C = 0...... 85

4.4 Heat map of the discrete system on a grid for (a) N = 40; (b) N = 200. . 86

4.5 Trajectory for the deterministic system, with equally spaced initial conditions around the plane, for A = 1/2 ...... 87

4.6 Parameterized curve of the x and y coordinates of the equilibrium point as functions of A, with A = 0 at (0, 0), and A = 1 at (1, 0) ...... 87

4.7 Individual system trajectories for total population size N = 800 cards, with the same initial conditions comparing (a) the discrete stochastic system and (b) the Fokker-Planck system...... 90

4.8 Heat maps for the discrete and continuous systems with N = 40 after 1,000,000 steps ...... 91

4.9 Heat map of the difference in the discrete system and continuous system 91

4.10 Extinction Times of discrete system and Fokker-Planck system on a log scale plot, as well as lines of best fit...... 92

4.11 Trajectory for Fokker-Planck system with N = 8 ...... 93

5.1 Network Playing Prisoner Dilemma. Blue lines are positive weights . . . 102

5.2 Error of a network over time ...... 103

5.3 Coordination behavior over time ...... 105

5.4 Error (y) as a function of number of hidden layers (x) ...... 107

5.5 Error (y) as a function of neurons per hidden layer (x) ...... 108

5.6 Probability A of choosing strategy sA as a function of x in temptation game 109

5.7 Probability A of choosing strategy sA as a function of x in temptation game after special training ...... 110

5.8 Log-Log plot of error (y) as a function of neurons pruned (x) ...... 112

5.9 Histogram of pairwise Euclidean distances for 10 networks ...... 116

5.10 Error for network pr as a function of r ...... 117

5.11 Examples of paths in R2 (left) and their corresponding polar projections (right)...... 119

5.12 Projection of a network path created via backpropagation (left), and a random walk (right) ...... 120

5.13 Polar projection of three network paths, with θ0 = 0, and paths colored based on the error of the adjacent networks ...... 123

5.14 Errors of network paths from Figure 5.13 ...... 124

5.15 Correlation score ci (y) vs Brute Force Importance bi (x) for neurons in the first layer of 10 networks...... 128

5.16 Example of a network emulating a level 1 player ...... 130

5.17 Neural Network Transistor ...... 130

5.18 Correlation score (y) vs BFI (x) for neurons in the first layer of 10 networks, with correlations to closest ideal neurons...... 131

List of Tables

2.1 Payoff matrix for Battle of the Sexes ...... 10

3.1 Payoffs for a 2 player public goods game...... 43

3.2 Important Variables ...... 52

5.1 Payoffs for an arbitrary 2 × 2 game...... 100

5.2 Errors for networks playing against their original opponent (col. 1), against a different opponent (col. 2), as well as errors for networks trained against a frozen opponent against that opponent (col. 3) and against a different opponent (col. 4) ...... 104

5.3 Payoffs and Errors for networks playing against each other (col. 1), against a Nash player (col. 2), and for Nash players playing against networks (col. 3), and other Nash players (col. 4) ...... 107

5.4 Payoffs for the temptation game...... 108

Acknowledgments

I would like to acknowledge my thesis advisor Andrew Belmonte whose guidance and support have helped make my time at Penn State a wonderful experience. Mathematics is filled with wonderful and exciting things and he has an unrivaled expertise in opening lines of inquiry that are both fun and useful at the same time. I would like to acknowledge Chris Griffin for helpful advice and discussions throughout my time here. His vast knowledge of various topics was often vital in identifying techniques to use and literature to consider. I would like to acknowledge the members of my committee for giving their time and attention, and for being available to meet remotely during the quarantine.

Chapter 1 | The Role of Learning in Game Theory

1.1 Introduction

What is the role of learning and information in game theory? We broadly define “learning” to be the process by which an agent changes their strategy in response to information obtained from previous events. Classical game theory assumes that all players are perfectly rational, that they have complete knowledge of all rules of the games and all possible payoffs, that they are capable of performing all mathematical calculations, and that they know every other player is likewise rational. There isn't much players could learn, since they start knowing everything that there is to know. And in a one-shot game, players do not make multiple decisions that would allow them to express learning. Learning requires the passage of time, and thus has no role here. On the other hand, evolutionary game theory includes the passage of time, but goes so far as to strip away all decision-making from the individual, instead hard-wiring their decisions into their biology. The connection between evolutionary games and biology was first made by Smith and Price [1], using games to make a simplified model of animal behavior and create plausible explanations for observed behavior. They observe that animals in conflict often fight physically while deliberately avoiding use of their more dangerous weapons such as claws or horns. They construct a model bearing some resemblance to the iterated prisoner's dilemma, and run simulations with several plausible strategies they come up with. They then compare the interactions of the strategies in their simulations with the behavior of various species that seem to engage in similar behavior in conflicts. Much other research has since been done studying biological systems using methods

and models from game theory. Often, population dynamics are defined by having each species correspond to a fixed pure strategy, and the proportions of species in the population evolve according to the replicator equation, which increases the proportion of species that score higher in the games they play and decreases the proportion of species that score lower. The most general form of the replicator equation is given by

\[
\dot{x}_i = x_i\,[\,f_i(x) - \phi(x)\,]
\]

where x_i is the frequency of species i in the population, f_i(x) is the average fitness of species i, and φ(x) is the average fitness of all species in the population [2]. This has nice properties, such as the rate of change of a population, corresponding to birth and death processes, being proportional to the amount of that population that already exists. Additionally, subtracting the average fitness in the equation guarantees that the total sum of all species frequencies remains constant, and so each x_i is meaningfully interpreted as a frequency if all values are initialized such that they sum to 1 [3]. We discuss various biological models throughout this dissertation in the context of the chapter they're most relevant to, many of which use replicator dynamics.

However, the replicator equation, and other similar evolutionary models, implicitly represent individuals who have fixed strategies, and their frequencies change via reproduction. Individual players are simplistic and instinctive, and incapable of changing their strategy, so they are also incapable of learning. One could consider the population as a whole to have some form of learning, as it changes in composition over time based on the game's structure, but this is typically not the perspective taken in such models. Somewhere between the extremes of perfectly rational agents and simple unchanging agents are models using bounded rationality, where players reason using simplified internal models of their opponents and the environment [4]. It is this middle region that we are primarily interested in, as this is where learning can occur. Agents do not begin by knowing everything that can be known, but have some capacity for understanding by which they might gain this knowledge, either implicitly or explicitly, and then adapt their behavior in response.

In game theory, information is defined by the ability of an agent to condition their strategy on different states. If at the time they take an action, a player knows whether they are in state A or state B, their strategy might specify taking one action when in state A and a different action in state B. If a player knows they are in one of states A or B but not which, then they must choose a single action to be played in either case.

A player with access to a finer partition of states therefore has more information than a player with a coarser partition. Although learning may occur by a player acquiring new information and rationally calculating the best response to each possible state, this is not the only form of learning we consider. We allow any algorithm that updates its strategies based on new information to be classified as learning, provided there is some reasonable justification suggesting such updates will tend to increase the player's utility (as opposed to a player which changes strategies completely at random). In this dissertation, we consider four models falling under three main categories of learning: rational learning after purchasing information, agent-based population models with simple algorithm update rules, and machine learning by competing neural networks. All numerical simulations were written in Java using Eclipse IDE by the author.
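For instance, replicator dynamics of the kind described above can be integrated numerically in a few lines. The sketch below is purely illustrative: the payoff matrix, class name, and step size are our own choices, not taken from the dissertation's simulation code.

```java
import java.util.Arrays;

/** Minimal Euler-step integration of the replicator equation
 *  dx_i/dt = x_i [ f_i(x) - phi(x) ] for a matrix game (illustrative sketch only). */
public class ReplicatorDemo {
    // Example payoff matrix (a rock-paper-scissors style game); not from the dissertation.
    static final double[][] A = {
        { 0, -1,  1},
        { 1,  0, -1},
        {-1,  1,  0}
    };

    public static void main(String[] args) {
        double[] x = {0.5, 0.3, 0.2};    // initial frequencies, summing to 1
        double dt = 0.01;
        for (int step = 0; step < 1000; step++) {
            double[] f = new double[x.length];
            double phi = 0.0;
            for (int i = 0; i < x.length; i++) {
                for (int j = 0; j < x.length; j++) f[i] += A[i][j] * x[j]; // fitness f_i(x)
                phi += x[i] * f[i];                                        // average fitness phi(x)
            }
            for (int i = 0; i < x.length; i++) x[i] += dt * x[i] * (f[i] - phi); // replicator update
        }
        System.out.println(Arrays.toString(x));
    }
}
```

Because the average fitness is subtracted in each update, the frequencies continue to sum to 1 (up to the small error of the Euler step).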

1.2 Rational learning after purchasing information

Rational players are defined as always choosing the strategy that maximizes their expected utility, given the information that they possess. This requires that they know everything they need to know in order to perfectly compute this strategy. However, learning can still occur if players do not have all of the information that exists about the game. In general, missing information in games falls into one of two main categories: incomplete information, where players lack information about some of the rules of the game or the payoffs that will result from outcomes, and imperfect information, where players lack information about the current state of the game, such as decisions made by other players or random events that have occurred in secret. Under certain assumptions a game with incomplete information will be equivalent to a Bayesian game with complete but imperfect information [5]. However, generally the two types of information and their motivations from real world scenarios remain distinct. Classic game theory considering one-shot simultaneous games has complete but imperfect information, since neither player knows the strategy of the other player until they have both played out their choices. The role of information and the willingness of players to purchase it has been studied in various forms in games with imperfect information and in games with incomplete information. In games with imperfect information, where one player has the ability to acquire information about the second player's strategy, it is necessary to consider both the actions of the first player in acquiring this information and responding to it, and the second player's actions taking into account the possibility of their strategy being revealed. Much research has been done studying this dynamic. Typically, players are either given

a binary choice to purchase information for a fixed cost or not, or are given a noisy signal that correlates with useful information, and are given the ability to pay a variable amount in order to increase the signal's accuracy according to some cost function. In Chapter 2 we introduce a formalism into standard two player games for the purchase of information about strategy choices, and study the effect of its cost on mixed strategy equilibria. In particular, we replace the standard practice of implementing partial information, in which players pay for the increased accuracy of noisy signals, with a partial response approach in which completely accurate information is only sometimes received, and players pay for a higher probability of receiving it. Rather than focus on the role of learning on the player's strategy choice after receiving the information, which ends up being trivial, we consider the willingness of players to pay for this opportunity, as well as the response of the other player to the ability of their opponent to learn about their action.

1.3 Agent-based models with simple algorithm learning rules

Agent-based modelling is a category of models focused on populations of agents which make decisions individually according to a simple set of rules. Often, the system as a whole will display complex emergent phenomena as a result of the combination of many simple interactions. Typically in a game theory context, subsets of players are sampled from a larger population to play a game, and their actions consist of what strategy to play in each game. These agents are usually subrational: they have some simple decision rule which allows them to adapt to local conditions in a way that hardwired creatures in standard replicator dynamics cannot, but much less optimally than a purely rational player with complete information would be able to [6]. We coded a general-purpose agent-based modeller, which can be initialized with payoffs for a game and a population of players, each of which has some initial strategy and a simple update rule telling them how to change their strategy as a result of the game's outcome. Each time step, a set of players is sampled from the population to play one instance of the game together. They all play their currently stored strategy, receive the results of the game, and then update their strategy according to the rule before being returned to the larger population to wait until the next time they are selected. Players are not given the full set of information about the game, nor do they possess

the reasoning faculties required to process this information in order to maximize their payoffs. Instead, they update instinctively in ways that would tend to increase their payoff in the game they just played, but may be shortsighted and naive. Nevertheless, they still can learn over time as the individual adjustments of each player drive the distribution of the strategies in the entire population. Players learn and adapt by accumulating small insights from each game that is played, and the update rule that they use plays a large role in the population dynamics that emerge from this. Selecting different games and different update rules results in completely different models with completely different behaviors. We focus on two different agent-based models derived from this general modeller in this dissertation. In Chapter 3 we consider a nonlinear public goods game with continuous contribution amounts. Players' strategies are a single number corresponding to the amount they contribute to public goods games, and their update rule increments their contribution up or down by a small amount based on the direction of best response for the group in which they participate. We analyze the dynamics that occur for various parameter values, and find that the players cluster near the “fair” Nash equilibrium, in which the players share equally in contributing to the public good. In Chapter 4 we consider the game Rock, Paper, Scissors, and give players an unusual Win-Stay Lose-Shift update rule. Each player is given a "card" with two of the three possible strategies on it, one of which begins face up. Players are selected in sets of two, play the face-up strategy on their card, and each player's update rule is to flip their card over if they lose, switching the face-up and face-down strategies. This means we have a heterogeneous population, where different players are restricted to only being able to play certain strategies. We discuss biological parallels for this system, and analyze the emergent behavior of the population based on the distribution of cards and the number of players. In both of these models, players obey a simplistic and short-sighted learning rule that seeks to maximize payoffs in the game most recently played, ignoring the larger population as a whole. Nevertheless, the population as a whole will adapt to larger trends in aggregate.
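The round structure of the general modeller described above can be summarized in a short simulation loop. The sketch below is a hypothetical reconstruction under stated assumptions: the interface and method names (Agent, currentStrategy, update) and the placeholder payoff rule are ours, while the actual payoff and update rules are the chapter-specific ones described in Chapters 3 and 4.

```java
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Skeleton of the round-based agent simulation described above (illustrative only). */
public class AgentBasedLoop {

    /** A hypothetical agent: stores a strategy and a simple update rule. */
    interface Agent {
        double currentStrategy();                                // e.g. a contribution level
        void update(double[] groupStrategies, double ownPayoff); // simple, possibly shortsighted rule
    }

    /** Placeholder payoff rule: every group member receives the group average (illustration only). */
    static double[] payoffs(double[] strategies) {
        double avg = 0;
        for (double s : strategies) avg += s / strategies.length;
        double[] p = new double[strategies.length];
        java.util.Arrays.fill(p, avg);
        return p;
    }

    /** Each time step: sample a group, play one game, let each member update its stored strategy. */
    static void run(List<Agent> population, int groupSize, int steps, Random rng) {
        for (int t = 0; t < steps; t++) {
            Collections.shuffle(population, rng);
            List<Agent> group = population.subList(0, groupSize);
            double[] s = new double[groupSize];
            for (int i = 0; i < groupSize; i++) s[i] = group.get(i).currentStrategy();
            double[] p = payoffs(s);
            for (int i = 0; i < groupSize; i++) group.get(i).update(s, p[i]);
        }
    }
}
```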

1.4 Machine learning by adversarial neural networks

Machine learning has seen increasing amounts of attention in recent years, in both academic and industrial applications. Although the concept originates in the 1950s,

the increasing speed and performance of computers have made this field increasingly important, as many tasks can now be performed by deep neural networks that simply would not have been feasible to train on the less powerful computers of previous years, such as image recognition [7]. Neural networks store a collection of real numbers corresponding to the weights of the connections between the various artificial neurons in the network. The network is given some sort of training data: it receives inputs, computes an output, and this output is compared to a desired output. The network is then updated, typically via a process called backpropagation, where the stored weights are adjusted based on their role in causing the network's output to deviate from the desired output. This is similar to the incremental updates we use in our public goods based model in Chapter 3, in that the agent participates in a series of activities and adjusts stored values slightly after each round in an attempt to get closer to the optimal value. However, instead of having a large number of simple agents that each store a single value, the neural network is a single agent that has a large number of stored values, with each value performing a different role within the network. Neural networks thus learn via algorithms that are designed to update them towards whatever the training data they are given represents. Their weights are adjusted to match patterns in the input data to the desired output labelled in the training data. A network learns by example, and thus the quality of its learning, and its ability to transfer that learning outside of the training data, is constrained by the quality and variety of the training data, as well as by the ability of the network to actually enact the correct function. In Chapter 5 we construct pairs of neural networks and train them to play arbitrary 2 × 2 games against each other. Both players are trained simultaneously against each other, and thus learn not only general insights about game theory, but also about the specific opponent they're being trained against. Unlike much research in machine learning, we focus not on trying to design the best network to solve a problem, but instead on attempting to overcome transparency issues. We develop and adopt various tools to understand the internal structures of our networks and how they go about making decisions.
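The idea of nudging stored weights in whatever direction reduces the output error can be illustrated with a single weight. The toy delta-rule example below is our own illustration and is far simpler than the networks of Chapter 5.

```java
/** One-weight illustration of error-driven weight updates (a toy delta rule, illustrative only). */
public class DeltaRuleDemo {
    public static void main(String[] args) {
        double w = 0.0;                          // the single stored weight
        double learningRate = 0.1;
        double[] inputs  = {1.0, 2.0, 3.0};
        double[] targets = {2.0, 4.0, 6.0};      // the "training data": output should be 2 * input
        for (int epoch = 0; epoch < 100; epoch++) {
            for (int k = 0; k < inputs.length; k++) {
                double output = w * inputs[k];
                double error  = output - targets[k];
                w -= learningRate * error * inputs[k];   // gradient of 0.5 * error^2 with respect to w
            }
        }
        System.out.println("learned weight ~ " + w);     // converges to 2
    }
}
```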

Funding

This material is based upon work supported by the National Science Foundation under Award No. CMMI-1463482 and Award No. CMMI-1932991. Any opinions, findings, and

conclusions or recommendations expressed in this publication are those of the author and do not necessarily reflect the views of the National Science Foundation.

Chapter 2 | Oracle Games

Most of the material in this chapter appears in [8], available on the arXiv.

2.1 Introduction

What is the value of information? In classical game theory, players have complete knowledge of the rules, the strategy choices available, and the payoffs in the game, but they have imperfect information about what actual strategy choice is made by the opposing player. In general, more information allows players to discriminate between different situations, and thus have more control over which decisions they make in each case. In fact, there is experimental evidence that human subjects will pay to know what strategy is being played against them, even when that information has no impact on their strategy choice [9]. Correspondingly, people will also sometimes attempt to deceive an opponent about their own strategy [10]. In various forms of competition, whether in business or biology, individuals are often willing to expend time, capital, or valuable energy resources in order to obtain information which allows them to make better decisions [11–13]. How should the value of information be included in game theory? Additionally, how does the value attributed to information as a commodity play a role among other choices made in a game? If players recognize the importance of learning insofar as it increases their ability to receive higher payoffs, then they will be willing to pay for the privilege of doing so. Additionally, if players are aware of the ability of others to learn certain information, then they will behave differently than if they expect the other players to remain ignorant of that information. Missing information in game theory can be broadly categorized into two types: imperfect information, where the missing information is about the players' strategies themselves, and incomplete information, where the missing information concerns the

game rules and payoffs. Classic game theory with simultaneous games, such as rock paper scissors, contains complete but imperfect information, since players know all of the payoffs but do not know what strategy the other will play (except what they can deduce via rational thought). Extensive form games, such as Tic-Tac-Toe, have complete and perfect information. Games with incomplete information typically require that players have some prior beliefs about the unknown values (some knowledge of the set and probability distribution the values are being chosen from) in order to make decisions which maximize their expected payoff. Under certain assumptions a game with incomplete information will be equivalent to a Bayesian game with complete but imperfect information (Harsanyi, 1967). This is done by considering nature to be an additional player that plays a mixed strategy determining which game players end up playing. However, generally the two types of information and their motivations from real world scenarios remain distinct. The role of information and the willingness of players to purchase it in games with incomplete information, where the missing information is regarding the game rules and payoffs, has been studied in various forms. Hellwig and Veldkamp [14], Myatt and Wallace [15], and Rigos [16] all study different versions of Beauty Contest games where players choose a real number, and receive a payoff which increases based on how close their number is to the average of other players' choices, and also increases based on how close it is to some randomly determined state variable θ. Rigos defines the payoff as

\[
u_i(a) = -(1 - \gamma)\,(a_i - \theta)^2 - \gamma\,(a_i - \bar{a})^2,
\]
where $\bar{a}$ is the average of the other players' choices, and the other works make nearly identical definitions. The two goals of proximity to the target and proximity to other players are combined in a weighted average, with γ influencing which of them is a higher priority. Before choosing their strategy, players are given the option to sacrifice some of their payoff to purchase access to a noisy signal about θ, which allows them to get closer to it, as well as to other players who purchase the same (public) signal or separate (private) signals. A public signal is more valuable, since all players who purchase it will have identical information and be able to coordinate with each other. However, a private signal still allows players to end up near θ, and since different players' signals are all noisy versions of the same value θ, they will be correlated with each other as well. In another paper, Myatt and Wallace [17] study Cournot games where multiple firms produce similar products and earn profits based on supply and demand functions, and are given the opportunity to purchase noisy information about the demand before they decide how much to produce. Yang [18], as well as Szkup and Trevino [19], study investment

coordination games where players can choose to invest in a risky project, receiving a random value, or opt out, and have the opportunity to pay for a noisy signal about the payoff of investing before making the decision. Li and Yang [20] study the effects of pre-communication in a variant of Battle of the Sexes. In the standard Battle of the Sexes, two players are given a choice between two options. Each player i has a preference for option i and receives some payoff θ for choosing that option. However, they also want to coordinate on the same option, and will each lose utility C (typically greater than θ) if they select different options. Thus, the players would prefer to coordinate on the same option even if it is not their preferred option, but each still wants the other player to compromise rather than having to compromise themselves.

Player 1 \ Player 2      B1              B2
A1                       θ, 0            θ − C, θ − C
A2                       −C, −C          0, θ

Table 2.1. Payoff matrix for Battle of the Sexes

This game has two pure strategy equilibria, where the players coordinate together on one of the options, and a mixed strategy equilibrium where the two players often miscoordinate. Li and Yang’s model assigns the players different preference intensities, so rather than having the same θ in each outcome, each player has their own θi assigned from some probability distribution. Each player’s preference intensity is known to them, but is hidden from the other player, who only knows the general probability distribution. The players are then given the option to send a signal about their preference intensity to the other player, having a choice between declaring it to be "low", "medium", or "high". However this is done via "" meaning there is no cost to the signal and nothing prevents the players from lying or exaggerating. In the primary equilibrium the authors focus on, players will defer to the player who sent the highest signal, and will exaggerate to some degree to take advantage of that, but for higher miscoordination costs they will refrain from doing so too severely because when both players send the same signal they each respond by choosing their own preferred strategy, failing to coordinate. Thus they find that useful information can be conveyed and improve the payoffs of players compared to the payoffs in a game without such signals, even when the signals are sent by players with incentives to deceive the other player. Hu et al. [21] adapt this model to experiments with human participants and find similar dynamics: players exaggerate to

some degree, but do so less when the cost of miscoordination is higher. Martinelli [22] studies an election model with the ability to purchase information about candidates. A large population of players are voters in an election with two candidates, one of which is a better candidate and will result in a higher utility for all of the players if elected, but players don't know which one it is. Each player is given the option to purchase information for some cost, which differs between players. Players then receive a signal which they use to determine which candidate to vote for. Players who chose not to purchase information favor each candidate with probability 1/2, while players who purchase information favor the better candidate with probability 1/2 + q for some fixed value q ∈ (0, 1/2]. Although this means for q < 1/2 there's some chance that voters elect the wrong candidate even if all are informed, for large populations this becomes increasingly improbable. In fact, because variance only increases as the square root of population size, as the population size increases a partially informed electorate can maintain a reasonable success rate even as the proportion of informed voters decreases. Martinelli finds this is precisely what happens in equilibrium, with only players who have a cheap cost of information choosing to purchase it, while the rest choose to remain ignorant in order to avoid paying the cost. In some sense, this acts much like a public goods game, where purchasing information acts like contributing to the public good: it costs the individual something and increases the expected payoff of the entire population. And the equilibrium in this model is much like one in a public goods game where players have varying costs of contribution: those with cheap costs will choose to contribute, while those with more expensive costs will not. However, because individuals are rational, they will only pay the cost if the expected increase in their own payoff will be at least as large as the cost, regardless of the higher increase to the population as a whole. Thus the equilibrium behavior involves some purchase of information, and increases the utility of all players to some degree, but it is less than the amount that would be socially optimal to maximize the total utility of the population. This also matches the equilibrium behavior of the public goods game we study in Chapter 3.

accuracy, since more complex machines will be able to process the data more accurately and achieve higher expected payoffs. An interesting thing to note is that information purchase in almost all of these examples is socially beneficial. It doesn't just benefit the player who purchases information, but also benefits other players by enabling coordination and predictability. This happens primarily because these models use mostly cooperative games, where the payoffs of players are highly correlated with each other. In more competitive games, information a player gains that enables them to increase their payoff tends to decrease the payoffs of other players. In games with imperfect information, where one player has the ability to acquire information about the second player's strategy, it is necessary to consider both the actions of the first player in acquiring this information and responding to it, and the second player's actions taking into account the possibility of their strategy being revealed. Many studies have investigated this for various games. For instance, Ben-Porath and Kahneman [24] study a general model in which players play iterated games where they only observe their own actions and payoffs, but can pay some cost in order to observe the actions of other players in the same round. Players are able to punish each other in response to actions they observe. They find that players can achieve an equilibrium close to the Pareto frontier if they monitor sparsely to keep costs down and inflict harsh enough punishments for selfish actions to keep players in line. Players also have a method of sending public signals to prove that they are actually monitoring rather than trying to increase their utility by avoiding the monitoring cost, and other players will punish them for failing to monitor properly. In some sense, monitoring acts like a meta-level public goods game on top of the regular game, where monitoring provides a benefit to all players by enforcing equilibria, but at some cost to themselves. (We discuss public good games and punishment in more detail in Chapter 3.) Flesch and Perea [25] construct a very similar model, with a few differences such as players paying to receive information about the actions of other players in the past, rather than their actions in the present when the cost is paid. They find largely similar results for equilibria of their model. Miklós-Thal and Schumacher [26] create a variation on this concept by having information sold by a rational third party that observes noisy signals about some players' actions, which might be costly but helpful to others, or selfish and harmful to others. The monitor can sell positive or negative recommendations to other players about whether they should opt in to interactions with particular players. They find that there are

multiple equilibria. If the monitor provides accurate information cheaply, players no longer behave selfishly, which makes them predictable and lessens the value of the monitor's information. But if the monitor deliberately obfuscates some of its information so that it's useful but not maximally so, then players will still sometimes behave selfishly, creating an unreliable environment where the information has a higher value despite occasional inaccuracies, and thus can be sold for a higher price. There are many other examples of research involving signalling and sharing of information in games with imperfect information. Sakai [27] studies Cournot games where firms with differentiated products receive private information about demand for their own product, and choose whether or not to freely offer this information to their competitor in order to influence their decision. Ruiz-Hernández et al. [28] study variations of a model that compare Cournot and Bertrand games, where firms make decisions simultaneously, to Stackelberg games, where one firm makes a decision and the other can observe this decision before making their own. Halpern and Pass [29] construct a game theoretic framework for "translucent players". Rather than being able to change strategies unilaterally, as in a Nash equilibrium, each player in an equilibrium believes there is a small probability that any attempt to deviate will be leaked to the other players, who will have an opportunity to respond by also changing their strategies. This can enable equilibria such as cooperation in a Prisoner's Dilemma, as players can credibly threaten to defect in response to a defection from the other player despite play being simultaneous most of the time. Antonioni et al. [30] study costly information in network games. Experimental subjects are connected to each other in a graph network and each round each player plays the Prisoner's Dilemma with all adjacent players in the network. They then have some ability to alter their connections in an attempt to avoid defecting players and connect to new players. Players have the ability to pay a cost in order to learn the most recent decision of potential new neighbors before approving the connection. They find that higher costs for this decrease the overall rate of cooperation in the population, as consistent cooperators are less likely to be rewarded with more connections than when costs are low. The most similar work to ours is by Solan and Yariv [31], who study a modification to two-player games that gives one player the opportunity to purchase information about the other player's action before responding. In their paper, if the player pays, he always receives a noisy signal which is correlated with the opponent's action, but has some probability of signalling a different action and misleading the player. Higher payments will increase the reliability of the signal, depending on the cost function associated with

the signalling device. This leads to a very broad and complex set of possible cost-to-signal functions that can be included, making it difficult to prove strong results except when restricted to specific types of games. They find that if sufficiently reliable information can be purchased cheaply enough, the player will purchase it and act on the information as if it were completely true, and also that the information cost affects the game's equilibrium only insofar as it determines whether information is worth purchasing or not. The actual amount of information purchased, if any, depends only on the payoffs for the player without any information. This corresponds to the amount that causes the player's strategies to become dominated, and is related to our notion of nodes. Our work differs from previous work primarily by replacing the standard practice of implementing partial information, in which players pay for the increased accuracy of noisy signals, with a partial response approach in which accurate information is only sometimes received, and players pay for a higher probability of receiving it. This chapter is organized as follows. We first introduce an extrinsic third player into a classic two player game: an all-knowing “oracle” who can be paid for a chance to reveal information about one of the players to the other. After exploring the consequences of such an oracle in representative examples, we define the properties of these games, and prove results on how any mixed Nash equilibria will be modified by the cost function of the oracle.

2.2 Preliminary Considerations

We begin by considering a standard normal form game G, focusing on cases where there is exactly one mixed strategy Nash equilibrium; as we will show later, the modifications we propose here do not affect the pure strategy Nash equilibria, so games with only pure strategy equilibria will be unchanged. We briefly discuss games with multiple mixed equilibria in Section 2.7. We define an oracle to be an external agent to G who knows and can potentially reveal information about each player’s actual choice of strategy, before these choices are played and payoffs resolve. The oracle is defined to have an associated oracle function I(x) which determines its probability of response as a function of the amount it is paid. When paid x, the oracle either reveals completely accurate information about a player’s realized strategy choice with probability I(x), or remains silent and gives no information with probability 1 − I(x). In this way, the oracle allows for partial purchase of information about a player’s choice without introducing anything other than factual information (i.e. the oracle either tells the truth or says nothing). To

our knowledge, we are the first to implement partial information in this manner. In principle, $I : [0, +\infty) \to [0, 1]$; however, the domain of I may be effectively bounded, since a rational player will not pay beyond some fixed amount $x_m$, determined for instance by the largest variation in payoffs in the game, $x_m < P_{\max} - P_{\min}$. Note also that x = 0 is included, which represents the option of not paying the oracle at all.
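One concrete family, of the type used in the examples later in this chapter (and capped at 1, as assumed there), is
\[
I(x) = \min\!\left(\sqrt{kx},\ 1\right), \qquad k > 0,
\]
which is continuous, nondecreasing, and concave, with $I(0) = 0$ and $I(x) = 1$ for every $x \ge 1/k$; larger k corresponds to cheaper information.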

2.2.1 Motivating Examples

We first consider the following two-player game, in order to illustrate our approach.

A \ B      B1        B2
A1         1, −1     0, 0
A2         0, 0      2, −2

Note that this is equivalent to a matching pennies (anticoordination) game with scaled payoffs. Here the only Nash equilibrium is when A and B each play the mixed strategy (2/3, 1/3).


Figure 2.1. Matching Pennies Extensive form game

Since every simultaneous game is equivalent to a sequential game in which neither player observes the actions taken by the other [32], we consider this example as a sequential game in which player B selects a strategy first, as illustrated in Fig 2.1. We next introduce an oracle, and modify the game by inserting additional stages into the standard sequence. In our first construction, this goes as follows:

1. Player A chooses a nonnegative amount x to pay to the oracle.

2. Player B chooses a strategy.

3. With probability I(x) the oracle informs player A of player B's realized strategy, and with probability 1 − I(x) remains silent.

4. Player A then chooses a strategy and the game resolves, with player A’s final payoff being decreased by the payment x chosen earlier.

Fig 2.2 shows the extensive form of this game. Note that since no information is given to player B at any point, the order of stages 1-3 may be rearranged in several ways, which allows for easier analysis without affecting the game.


Figure 2.2. Game tree for the motivating construction of Oracle Games

First, note that for each of player B’s strategies, player A’s best response to that strategy is unique (and we restrict ourselves to games with this property for the remainder of the chapter). Therefore, in the case that the oracle responds and provides B’s strategy, the rational response of player A will already be determined. Thus player A only makes two choices: the amount x of payment to the oracle, and the strategy choice when the oracle does not respond. It is equivalent to consider the circumstance where A makes his decision of what to play at the beginning of the game, but changes his mind if the oracle responds. The following structure leads to equivalent behavior and payoffs for every payoff matrix:

1. Player A chooses any nonnegative amount x to pay to the oracle.

2. With probability I(x) the oracle commits to informing player A of B's strategy at a later time, and with probability 1 − I(x) commits to remaining silent.

3. Player A tentatively chooses a strategy to play if not given a response.

4. Player B chooses a strategy to play.

5. If the oracle committed to respond, it does so now, and Player A ignores his previous choice and chooses the best response to player B's realized strategy. If the oracle committed to remaining silent, then player A uses the tentative choice. In either case, the game resolves and player A's payoff is reduced by x.
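A single round of this protocol is easy to simulate. The Java sketch below is illustrative only: the class and method names are hypothetical, B's strategy is drawn uniformly at random purely for demonstration (rather than from an equilibrium), and the oracle function is a capped square root (k = 1 in the family √(kx) used later in this chapter).

```java
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

/** One round of the oracle-game protocol described above (illustrative sketch, not the paper's code). */
public class OracleRound {
    static final Random RNG = new Random();

    /** payoffA[i][j] = payoff to A when A plays row i and B plays column j. */
    static double playOnce(double[][] payoffA, double x, DoubleUnaryOperator oracle,
                           int tentativeA, int choiceB) {
        boolean responds = RNG.nextDouble() < oracle.applyAsDouble(x); // oracle responds with probability I(x)
        int a = tentativeA;
        if (responds) {
            // A learns B's realized choice and switches to the best response in that column.
            a = 0;
            for (int i = 1; i < payoffA.length; i++)
                if (payoffA[i][choiceB] > payoffA[a][choiceB]) a = i;
        }
        return payoffA[a][choiceB] - x;                                // A's payoff is reduced by the payment
    }

    public static void main(String[] args) {
        double[][] m = {{1, 0}, {0, 2}};   // Player A's payoffs in the scaled matching pennies example
        DoubleUnaryOperator oracle = p -> Math.min(Math.sqrt(p), 1.0);
        double payment = 0.1, avg = 0;
        int trials = 100_000;
        for (int t = 0; t < trials; t++)
            avg += playOnce(m, payment, oracle, 0, RNG.nextInt(2)) / trials;
        System.out.println("average payoff to A ~ " + avg);
    }
}
```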


Figure 2.3. Game tree for the standard construction of Oracle Games

Fig 2.3 shows the extensive form game for this version. Given this structure, we can model the subgame at stage 2 as a Bayesian game with two possible states [32]. When the oracle does not respond, the payoff matrix for the Oracle Game is the same as the game without an oracle. When the oracle does respond, the payoffs for each player are given by player A's best response in the column determined by player B's choice (since x is constant in this subgame, we leave it out of the payoff matrices since it will not affect equilibria). We represent this in normal form as the matrix R below, where A1 and A2 are the tentative decisions for player A:

A \ B      B1        B2
A1         1, −1     2, −2
A2         1, −1     2, −2

We refer to R as the maximal matrix of the game, since the payoffs for player A are equal to the maximum in each column of the original payoff matrix. Since this matrix shows the payoffs when the oracle does respond, it is natural that the payoffs in each column are identical, since A changes his mind and ignores his previous decision. If M is the original payoff matrix, then the matrix of the expected values that the players perceive in the subgame is given by M · (1 − I(x)) + R · I(x). In this example, it becomes:

A \ B      B1                 B2
A1         1, −1              2I(x), −2I(x)
A2         I(x), −I(x)        2, −2
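Written out explicitly, this is just the entrywise convex combination described above (a restatement, for concreteness):
\[
M_{I(x)} \;=\; (1 - I)\,M + I\,R
\;=\; (1-I)\begin{pmatrix} (1,-1) & (0,0) \\ (0,0) & (2,-2) \end{pmatrix}
+ I \begin{pmatrix} (1,-1) & (2,-2) \\ (1,-1) & (2,-2) \end{pmatrix},
\]
so that, for example, the (A1, B2) entry is $(1-I)\,(0,0) + I\,(2,-2) = (2I, -2I)$.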

The equilibria of this game matrix will depend on the value of I, which will depend on both the oracle function I(x) and the value x that player A has chosen to pay. The strategy space may be described as S = {sa, sb, x} where sb is B's strategy, sa is A's tentative strategy, and x is A's payment to the oracle. Unless otherwise specified, we use I to denote I(x) evaluated at the value of x being played by A, and likewise I′ denotes dI/dx. For any c ∈ R we define x_c to be the smallest payment x such that I(x) = c, and y_c to be any x such that I′(x) = c. Since I(x) is concave, I′(x) may be constant and equal to c on some interval. If so, then y_c can refer to any one of the values on that interval, and any statement we make about y_c is true for all such values. When this occurs in a Nash Equilibrium, the Oracle Game will have multiple equilibria, one for each choice of y_c. For this particular game matrix, we can classify the equilibrium into one of the following cases depending on which oracle function I(x) is attached to the game. Assuming

I(x) is differentiable at 0 and at $x_{1/2}$, these take the form:

Case 1: If $I'(0) \le \frac{3}{2}$, the equilibrium is $\left\{ \left(\frac{2}{3}, \frac{1}{3}\right), \left(\frac{2}{3}, \frac{1}{3}\right), 0 \right\}$; since player A pays x = 0, the players behave as they would if there were no oracle.

Case 2: If $I'(0) \ge \frac{3}{2} \ge I'(x_{1/2})$, the equilibrium is $\left\{ \left(\frac{2-I}{3(1-I)}, \frac{1-2I}{3(1-I)}\right), \left(\frac{2}{3}, \frac{1}{3}\right), y_{3/2} \right\}$.

Case 3: If $I'(x_{1/2}) \ge \frac{3}{2}$, the equilibrium is $\left\{ (1, 0), \left(\frac{2I'-1}{2I'}, \frac{1}{2I'}\right), x_{1/2} \right\}$.


Figure 2.4. Payment in equilibrium shown for oracle functions I(x) = (a) √(x + 1) − 1; (b) √x; (c) 2√x.

The derivation of these equilibria follows from Theorem 1, which is presented at the beginning of Section 2.4. The graphs in Fig 2.4 show examples of different oracle functions satisfying each of these cases for the game matrix given. We assume I(x) = 1 for any values where these functions would be greater than 1. In the case that I(x) is not differentiable everywhere, Proposition 2 will ensure this occurs at countably many points, $I'(x)$ will be nonincreasing, and we can replace $y_{3/2}$ with $\inf\{x : I'(x) < \frac{3}{2}\}$. Similarly, in other examples we consider, the description will assume differentiability of I(x) at key points, but can be modified to account for arbitrary oracle functions. We next consider a 3 × 3 game, defined by the matrix:

A \ B      B1        B2        B3
A1         1, −1     0, 0      0, 0
A2         0, 0      2, −2     0, 0
A3         0, 0      0, 0      4, −4

Note that this matrix contains the previous example as a submatrix. With no oracle, the only equilibrium is when A and B both play the mixed strategy (4/7, 2/7, 1/7). If Player A is given access to an oracle, then using the same process as before, the matrix becomes:

A \ B      B1          B2          B3
A1         1, −1       2I, −2I     4I, −4I
A2         I, −I       2, −2       4I, −4I
A3         I, −I       2I, −2I     4, −4


Figure 2.5. Amount paid by Player A at the equilibrium (above), and the resulting response rate I (below), as functions of k when I(x) = √(kx).

For this game matrix, the equilibrium will fall into one of the following cases, depending on I(x):

Case 1: If $I'(0) \le \frac{7}{8}$, the equilibrium is $\left\{ \left(\frac{4}{7}, \frac{2}{7}, \frac{1}{7}\right), \left(\frac{4}{7}, \frac{2}{7}, \frac{1}{7}\right), 0 \right\}$.

Case 2: If $I'(0) \ge \frac{7}{8} \ge I'(x_{1/5})$, the equilibrium is
\[
\left\{ \left( \frac{4+I}{7(1-I)},\ \frac{2-3I}{7(1-I)},\ \frac{1-5I}{7(1-I)} \right),\ \left( \frac{4}{7}, \frac{2}{7}, \frac{1}{7} \right),\ y_{7/8} \right\}.
\]

At $I = \frac{1}{5}$, the probability of A3 reaches 0, and A can no longer maintain B's indifference, since B's strategy 3 is weakly dominated by a mixed strategy of 1 and 2.

Case 3: If $\frac{7}{8} \le I'(x_{1/5}) \le \frac{3}{2}$, the equilibrium is
\[
\left\{ \left( \frac{3}{4}, \frac{1}{4}, 0 \right),\ \left( \frac{8I'-2}{10I'},\ \frac{4I'-1}{10I'},\ \frac{3-2I'}{10I'} \right),\ x_{1/5} \right\}.
\]

At $I' = \frac{3}{2}$, the probability of B3 reaches 0, and B can no longer prevent A from increasing x. At this point, A and B have both effectively eliminated strategy 3 (B will never play it as it is now a dominated strategy, and A will never play it since after eliminating

column 3 it is also dominated). Thus the game reduces to the matrix:

A \ B      B1          B2
A1         1, −1       2I, −2I
A2         I, −I       2, −2

Note that this is identical to the matrix from the first game; thus, all equilibria that come from this matrix will be identical.

Case 4: If $I'(x_{1/5}) \ge \frac{3}{2} \ge I'(x_{1/2})$, the equilibrium is
\[
\left\{ \left( \frac{2-I}{3(1-I)},\ \frac{1-2I}{3(1-I)},\ 0 \right),\ \left( \frac{2}{3}, \frac{1}{3}, 0 \right),\ y_{3/2} \right\}.
\]

Case 5: If $I'(x_{1/2}) \ge \frac{3}{2}$, the equilibrium is
\[
\left\{ (1, 0, 0),\ \left( \frac{2I'-1}{2I'},\ \frac{1}{2I'},\ 0 \right),\ x_{1/2} \right\}.
\]

In general, if G is a game, and H is a game which contains G as a subgame, then if I(x) is an oracle function that causes all strategies in H which are not in G to become dominated, the equilibria of the oracle games for G and H will be the same when given I(x).

Using the family of oracle functions $I(x) = \sqrt{kx}$ and varying k, we illustrate how the amount $x_e$ paid by Player A at equilibrium, as well as the purchased probability of response $I(x_e)$, varies as the cost of information decreases for Player A (i.e. as k increases). The dependence of these two quantities on k is shown in Fig 2.5. The numbers between the dotted lines indicate which case the equilibrium falls under in that region (case 1 does not occur for any k since $\sqrt{kx}$ has infinite slope at x = 0). In Cases 2 and 4, Player A gradually increases x in response to the cheaper information, while maintaining B's indifference by adjusting sa to compensate. In Cases 3 and 5, A maintains I at a constant value (which costs less to maintain as information becomes cheaper), and B maintains

A’s indifference by adjusting sb away from exploitable strategies. In the following sections we prove results about Oracle Games indicating that most well-behaved games have equilibria similar to these example cases (for certain notions of “well-behaved" and “similar"), and we show how these equilibria are determined.
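For instance, in Case 3 of the first example the equilibrium payment is $x_{1/2}$, the smallest payment at which the response probability reaches $\frac{1}{2}$; for the family $I(x) = \sqrt{kx}$ this gives
\[
\sqrt{k\,x_{1/2}} = \tfrac{1}{2} \quad\Longrightarrow\quad x_{1/2} = \frac{1}{4k},
\]
so the amount paid falls off like $1/k$ while the purchased response rate stays fixed at $\frac{1}{2}$, consistent with the behavior described for Cases 3 and 5 in Fig 2.5.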

2.2.2 Definitions

Let G be a simultaneous, two-player game with the m × n payoff matrix M and players A and B. Let G|I be the game where A and B play game G but A is given access to an oracle with function I(x). If A’s maximal payoff in each column of M is unique, then A’s best response to an oracle response is predetermined, which means that A does not have to specify a strategy choice when the oracle responds. The set of strategy profiles is then expressed as S = {sa, sb, x} where sb is B’s strategy, sa is A’s strategy when the oracle does not respond, and x is A’s payment to the oracle. We make no meaningful distinction between pure and mixed strategies, except to note that an assumption we make later allows all oracle payments x to be considered as pure strategies. For each j, let αj be the index of the row corresponding to the highest payoff to A in column j of M

(we assume this is unique for each j). We define the maximal matrix R by $R_{i,j} = M_{\alpha_j, j}$, such that every outcome in R is a copy of the outcome in M corresponding to A's best response to strategy j. Let C be the m × n matrix where the payoff to A is 1 and the payoff to B is 0 in every cell. Then for every x ∈ [0, ∞), A paying the oracle x induces a Bayesian game with expected payoffs

M · (1 − I(x)) + R · I(x) − C · x.

Let MI(x) := M · (1 − I(x)) + R · I(x). Since the equilibria of a payoff matrix do not change with a constant reduction in all of the payoffs for either player, for each fixed x this will have the same equilibria as the actual induced payoff matrix. Then s ∈ S is a Nash equilibrium if and only if A and B are both indifferent on changing each of their strategies. Thus a necessary condition for an equilibrium must be that for whichever x player A is paying, sa, sb must be an equilibrium for the matrix

MI(x), since otherwise A or B could profit by changing their strategies.
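To make this construction concrete, here is a minimal sketch (with a hypothetical 2 × 2 bimatrix and an assumed oracle function, neither taken from the text) of how the induced expected payoffs M · (1 − I(x)) + R · I(x) − C · x can be computed.

```python
# Sketch: build the induced expected payoffs M*(1 - I(x)) + R*I(x) - C*x.
# The bimatrix and oracle function below are hypothetical placeholders.
import numpy as np

A = np.array([[3.0, 0.0],    # A's payoffs (the matrix M, rows are A's strategies)
              [1.0, 2.0]])
B = np.array([[1.0, 4.0],    # B's payoffs in M
              [3.0, 0.0]])

def I(x, k=0.5):
    """An assumed concave oracle function, I(x) = sqrt(k*x), capped at 1."""
    return min(1.0, np.sqrt(k * x))

def induced_payoffs(x):
    """Return (A's payoffs, B's payoffs) of the induced game for oracle payment x."""
    best = A.argmax(axis=0)                           # alpha_j: A's best response to column j
    RA = A[best, np.arange(A.shape[1])]               # A's payoffs in the maximal matrix R
    RB = B[best, np.arange(B.shape[1])]               # B's payoffs in R
    p = I(x)
    # Each column of R is constant, a copy of the best-response outcome of that column.
    MA = (1 - p) * A + p * RA - x                     # C gives payoff 1 to A in every cell, scaled by x
    MB = (1 - p) * B + p * RB                         # B neither pays nor receives the oracle fee
    return MA, MB

print(induced_payoffs(0.5))
```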

We can also write the expected payoff Ea of A playing G|I in terms of A’s expected payoff given a response Er, and the expected payoff given no response En, as

Ea(sa, sb, x) = En(sa, sb) · (1 − I) + Er(sb) · I − x.

Definition 1 We define the Value of Information V to be the marginal increase in expected value A gains from increasing I, that is

V := ∂Ea/∂I = Er − En.

This means the value of information is given by the change in expected benefit for A from receiving a response. If we assume that player A always chooses the optimal sa for the particular cross section MI(x), then V can be expressed solely as a function of sb. It immediately follows that V ≥ 0 for all sa, sb since A’s payoff when the oracle responds is always at least as good as his payoff when it is silent. And whenever sb is a pure strategy then A will play the best response to sb regardless of whether the oracle responds, so

Er = En, which means that V = 0.

Note also that Ea is linear with respect to sb: if s1 and s2 are strategies for B, and p ∈ [0, 1], then Ea(sa, ps1 + (1 − p)s2, x) = pEa(sa, s1, x) + (1 − p)Ea(sa, s2, x). This also implies V is linear with respect to sb.

Definition 2 We say that G|I has a node at c if one of B’s strategies changes from dominated to undominated (or vice versa) in MI(x) at x = c.

We observe that case 3 in the first example and cases 3 and 5 in the second example above correspond to equilibria with oracle payment x at a node.

2.3 Fundamental Properties of Oracle Games

We first derive some fundamental results that elucidate the basic properties of these Oracle Games.

Proposition 1 If {sa, sb} is a pure strategy Nash equilibrium of G, then {sa, sb, 0} is a Nash equilibrium of G|I.

Proof: If {sa, sb} is a pure strategy Nash equilibrium in G, then sa is a best response to sb, and sb is a best response to sa in M, and if x = 0 then the oracle never responds, so MI(0) = M. And since B is playing a pure strategy, sa will be a best response to sb regardless of whether the oracle responds or not, so A cannot benefit by increasing x.

Thus, no player has an incentive to change their strategies in any way, and {sa, sb, 0} is a Nash equilibrium of G|I. □ In other words, the presence of the oracle does not affect pure strategy equilibria. This is natural, since in a pure strategy equilibrium both players are playing pure strategies, so information confirming what is already known adds no value.

Definition 3 We define two oracle functions I(x) and J(x) to be equivalent (I ≅ J) if for every game G, the set of equilibrium strategies (excluding the oracle payment) and resulting expected payoffs (including the payment) are identical for G|I and G|J.

This definition is useful because of the following results, which are based on the fact that a rational player will never pay more for less information (or in our case, for a less probable response).

Proposition 2 Every oracle function is equivalent to one which is continuous, nondecreasing, and concave.

Proof: Given any oracle function I(x), we will construct another oracle function J(x) based on I such that J is continuous, nondecreasing and (weakly) concave, and then show that J is equivalent to I. We first construct a nondecreasing version of I. Suppose that there exists some c2 > c1 such that I(c2) < I(c1); player A will never pay c2 since it’s dominated by c1

(choosing c2 over c1 means paying more for less information). The value of information is always nonnegative, therefore A’s expected value must be nondecreasing with respect to

I, and strictly decreasing with respect to x. If we let J1 be an oracle function with

J1(x) = sup(I(a): a ≤ x)

then J1 is nondecreasing since it’s taking the supremum over a growing set. And J1 is equivalent to I because any values of x that differ between I and J1 are ones for which I has dropped below sup(I), which are also x that A would never pay. Similarly in G|J1, A will also never pay them because that would be paying more for the same amount of information. The fact that Player A can play a mixed strategy between two oracle payments leads to the second result, which we show by constructing a non-concave-up version of J1. Let c1 and c2 be any numbers in [0, ∞), and A’s mixed strategy be to pay c1 with probability p and c2 with probability (1 − p). The expected amount A will pay is then pc1 + (1 − p)c2 = x̄, and the expected probability that the oracle will respond will be pI(c1) + (1 − p)I(c2) = Ī. The combination of these two yields the same results as another oracle function which took on the value Ī at the point x̄. Thus the oracle function J1 is equivalent to the supremum of the convex hull of its graph:

J(x) = sup(pJ1(c1) + (1 − p)J1(c2))


Figure 2.6. Illustration of the construction to prove Proposition 2 (see text): (top) original given oracle function I(x); (middle) nondecreasing equivalent oracle function J1(x); (bottom) final nondecreasing, concave oracle function J(x).

where the supremum is over all c1 and c2 in [0, ∞) and all p in [0, 1]. The supremum of the convex hull of any function is automatically continuous and concave. Note also that

J is nondecreasing because J1 is. □

In Fig. 2.6 we show an example of this construction process for a particular I(x) (Fig. 2.6a), with the equivalent nondecreasing oracle function J1(x) (Fig. 2.6b), and the full simplification of the proposition, the continuous, nondecreasing, concave J(x) (Fig. 2.6c).
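The two-step construction in this proof can also be carried out numerically. The sketch below samples an arbitrary (assumed) oracle function on a grid of payments, takes the running supremum to obtain J1, and then takes the upper concave envelope to obtain J; it is an illustration of the proof idea, not code from the dissertation.

```python
# Sketch: numerical version of the Proposition 2 construction on a grid of payments.
# I_vals below is an arbitrary, non-monotone oracle function sampled on xs.
import numpy as np

xs = np.linspace(0.0, 4.0, 401)
I_vals = np.clip(0.6 * np.abs(np.sin(2.0 * xs)) + 0.1 * xs, 0.0, 1.0)

# Step 1: J1(x) = sup{ I(a) : a <= x }, the nondecreasing running supremum.
J1 = np.maximum.accumulate(I_vals)

# Step 2: J = upper concave envelope of J1 (mixing two payments c1, c2 with weight p).
def concave_envelope(x, y):
    """Upper hull of the points (x, y), evaluated back on x by linear interpolation."""
    hull = [0]
    for i in range(1, len(x)):
        # Pop hull points lying on or below the chord to the new point (keeps slopes decreasing).
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            if (y[b] - y[a]) * (x[i] - x[b]) <= (y[i] - y[b]) * (x[b] - x[a]):
                hull.pop()
            else:
                break
        hull.append(i)
    return np.interp(x, x[hull], y[hull])

J = concave_envelope(xs, J1)
assert np.all(J + 1e-9 >= J1) and np.all(np.diff(J) >= -1e-9)  # J dominates J1 and is nondecreasing
```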

We next show that any G|I whose oracle already responds with nonzero probability at zero payment (I(0) > 0) can be shifted to an equivalent game for which I(0) = 0.

Proposition 3 Suppose G|I is a game with payoff matrix M and oracle function I(x) with I(0) > 0. Then there exist a game H and oracle function J(x) with J(0) = 0 such that G|I ∼= H|J.

For G|I, let I(0) = c > 0, and note that c ≤ 1. Define the following

N = MI(0) = (1 − c)M + cR,     J(x) = (I(x) − c)/(1 − c).

First note that the maximal matrix R is the same for M and N, since the highest payoff

to A in each column of MI is the same for all values of I. Also, J(x) will be continuous, nondecreasing, and concave if I(x) is, and J(0) = 0 since I(0) = c. Additionally, J(x) will reach 1 at the same x value that I(x) does.

Then for any x,

NJ(x) = (1 − J(x))N + J(x)R
      = (1 − (I(x) − c)/(1 − c)) ((1 − c)M + cR) + ((I(x) − c)/(1 − c)) R
      = (1 − c)M − (I(x) − c)M + cR − cR(I(x) − c)/(1 − c) + R(I(x) − c)/(1 − c)
      = (1 − I(x))M + cR + (I(x) − c)R
      = (1 − I(x))M + I(x)R,

which is MI(x) by definition. □ Therefore it suffices to consider only oracle functions with I(0) = 0. For the remainder of this chapter, we assume without loss of generality that all oracle functions are continuous, nondecreasing, concave, and satisfy I(0) = 0.
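As a quick sanity check of Proposition 3, the following sketch (using a randomly generated payoff matrix for A and an assumed oracle function with I(0) = c > 0) verifies the identity NJ(x) = MI(x) numerically.

```python
# Sketch: check Proposition 3 numerically, N_J(x) equals M_I(x) entrywise.
import numpy as np

rng = np.random.default_rng(0)
A = rng.uniform(-2, 2, size=(3, 4))                   # A's payoffs in M (B's part behaves analogously)
R = A[A.argmax(axis=0), np.arange(A.shape[1])]        # column maxima: A's payoffs in the maximal matrix
c = 0.2
I = lambda x: min(1.0, c + 0.5 * np.sqrt(x))          # assumed oracle with I(0) = c > 0
J = lambda x: (I(x) - c) / (1 - c)                    # shifted oracle, J(0) = 0
N = (1 - c) * A + c * R                               # shifted game matrix

for x in [0.0, 0.3, 1.0]:
    MI = (1 - I(x)) * A + I(x) * R
    NJ = (1 - J(x)) * N + J(x) * R
    assert np.allclose(MI, NJ)
print("N_J(x) = M_I(x) on the sampled payments")
```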

2.4 Main Results

Theorem 1 If I(x) is differentiable at c in the interior of its domain, then {sa, sb, c} is an equilibrium of G|I if and only if

1. {sa, sb} is an equilibrium of MI(c), and
2. V(sb) · I′(c) = 1.

Proof: Condition 1 holds if and only if neither player A nor player B has an incentive to change sa or sb, respectively. If we express player A’s payoff as Ea(sa, sb, I(x)) − x, then it suffices to find a global maximum of this function on its domain. Taking the derivative with respect to x and setting it equal to zero yields

(∂Ea/∂I) · (dI/dx) − 1 = 0,

assuming that sb is constant. Since V = ∂Ea/∂I by definition, this is equivalent to condition 2, and shows that it yields a local maximum. V ≥ 0 and I concave imply that Ea(sa, sb, I(x)) − x is also concave with respect to x, so any local maximum must be a global maximum. □
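Condition 2 is easy to solve numerically. The sketch below assumes the oracle family I(x) = √(kx) used in the examples and a given constant value of information V, and locates the payment where V · I′(x) = 1 by bisection; the family and the parameter values are illustrative assumptions.

```python
# Sketch: solve V * I'(x) = 1 for the equilibrium oracle payment, assuming I(x) = sqrt(k*x).
import math

def equilibrium_payment(V, k, lo=1e-9, hi=1e6):
    """Bisection on g(x) = V * I'(x) - 1, which is decreasing because I is concave."""
    Iprime = lambda x: 0.5 * math.sqrt(k / x)
    g = lambda x: V * Iprime(x) - 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid          # marginal value still exceeds marginal cost: pay more
        else:
            hi = mid
    return 0.5 * (lo + hi)

V, k = 2.0, 0.5
x_star = equilibrium_payment(V, k)
print(x_star, k * V**2 / 4)   # bisection agrees with the closed form x* = k*V^2/4
```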

Lemma 1
1. If {sa, sb} is an equilibrium of MI(0) and lim_{x→0+} I′(x) ≤ 1/V(sb), then {sa, sb, 0} will be an equilibrium of G|I.
2. If {sa, sb} is an equilibrium of MI(x1) and lim_{x→x1−} I′(x) ≥ 1/V(sb), then {sa, sb, x1} will be an equilibrium of G|I.

Proof: Although I(x) will not be differentiable at the endpoints (i.e., at 0 and x1, since player A cannot choose values of x < 0 and gains no benefit beyond I(x) = 1), we only need to look at the one-sided limit in these cases. If lim_{x→0+} I′(x) ≤ 1/V(sb), then player A will gain less benefit from increasing the oracle payment than the increase in cost, and has no incentive to do so. Similarly, if lim_{x→x1−} I′(x) ≥ 1/V(sb), then player A will lose more benefit from decreasing the oracle payment than the reduction in cost (and can gain no more benefit from increasing the cost, since I is capped at 1), so has no incentive to change it. □

We also note that, even if I′(x) has discontinuities, condition 2 of Theorem 1 can be modified to say that c must equal the supremum over all points with V(sb) · I′(x) ≥ 1.

For the remainder of the chapter, we assume M is a payoff matrix such that MI(x) has a unique Nash equilibrium for each x, except possibly at nodes. This is a slightly stronger condition than requiring that the game has a unique Nash equilibrium, as it is possible to construct a game matrix M with a unique Nash equilibrium which disappears in MI(x) for some values of x, and is then replaced by multiple equilibria. However, such examples are unusual: most games we naturally considered with a single Nash equilibrium remained at one equilibrium for all x, and only when deliberately attempting to construct a counterexample was one discovered. Additionally, this restriction is mostly a matter of convenience and simplicity, as most of our results will apply in a slightly modified form to games with multiple equilibria, which we discuss in Section 2.7.

Then we can define sa(x) and sb(x) as the strategies sa and sb in the unique equilibrium of MI(x) for all x except at nodes. If x is a node and all equilibria at that node have the same sa or sb, then sa(x) or sb(x) are defined as the appropriate strategy, while if sa or sb vary across equilibria, then the corresponding function is undefined at that node (in most games we consider, sa(x) will be defined at nodes and sb(x) will not). We also add the assumption that I is strictly increasing and strictly concave.

Proposition 4 If strategy s for player B is not dominated in M (weakly or strongly), but is dominated in MI(y) for some y, then it is strictly dominated for all MI(x) with x > y. That is, a strategy which becomes dominated after increasing x will stay dominated as x increases further.

Proof: Suppose strategy s becomes dominated by strategy t. Let rj be the payoffs to B for strategy j in the matrix R (A’s best strategies when the oracle responds). Let

bi,j be the entry in M in the ith row and jth column for B’s payoff, then the entry in the ith row and jth column of MI(x) will be

ci,j,x = (1 − I(x))bi,j + I(x)rj.

Let mi,j = rj − bi,j; then ci,j,x = bi,j + mi,jI(x). That is, the entries in the matrix scale linearly with I, going from bi,j when I = 0 and reaching rj when I = 1. If t is a pure strategy, then rt, bi,t, and mi,t are already defined. If t is a mixed strategy which selects strategy j with probability wj, define rt = Σj wjrj, bi,t = Σj wjbi,j, and mi,t = rt − bi,t. If strategy s is not dominated by t when x = 0 (and I(0) = 0), this means there is a row k such that bk,s ≥ bk,t. But if s is dominated by t for some y, then bk,s + mk,sI(y) ≤ bk,t + mk,tI(y). Together these imply that mk,t > mk,s and thus rt > rs.

Then for any row i, s dominated by t at y implies bi,s + mi,sI(y) ≤ bi,t + mi,tI(y).

Case 1: bi,s > bi,t. Using the same argument as above, we get that mi,t > mi,s. Then each of bi,s + mi,sI(y) and bi,t + mi,t · I(y) can be viewed as linear functions of I, with slopes mi,s and mi,t. Then line t has a greater slope and is above line s at I(y), so for any x > y, it will also be greater at I(x) since I is an increasing function.

Case 2: bi,s ≤ bi,t. Going back to ci,j,x = (1 − I(x))bi,j + I(x)rj, for every x the elements ci,j,x are weighted averages of bi,j and rj. Then since bi,s ≤ bi,t and rs < rt, we have

(1 − I(x))bi,s + I(x)rs < (1 − I(x))bi,t + I(x)rt for all values of x.  Note that if a strategy starts out dominated at x = 0, Proposition 4 does not apply, and it may become undominated at one x value, but be redominated by another strategy at a greater x value. However, once it becomes dominated after being undominated, Proposition 4 will apply, and it remains dominated. Thus each strategy corresponds to at most two nodes. Thus G|I having finitely many strategies implies that it has finitely many nodes.

Proposition 5 In each interval between two nodes, and for each strategy i in the support of sa(x), there exist ai, bi, ci ∈ R such that the probability of playing strategy i in sa(x) can be expressed as (ai + biI)/(ci(1 − I)) for all x in the interval.

Proof: Let M be the payoff matrix. Recall that the cross section matrix MI(x) has

payoffs to B, (1 − I)bi,j + Irj, where bi,j are the payoffs to B in M, and rj are the payoffs to

B corresponding to A’s best response in column j. Suppose sa(x) = (A1, A2, ..., Am). Then for each j, player B’s expected value when playing strategy j is Ej = Σ_{i=1}^{n} Ai[(1 − I)bi,j + Irj]. If we fix a particular interval between two nodes, then the set of B’s pure strategies which are undominated is constant on that interval. Let n be the number of undominated pure strategies for B in that interval. Note that the condition that MI(x) has unique equilibria between nodes implies that A also has n undominated pure strategies. Fix k as the index of one of B’s undominated pure strategies. Then sa(x) can be expressed as the solution to the simultaneous equations Ek = Ej for all j ≠ k, together with Σ_{i=1}^{n} Ai = 1. Using the last condition, we obtain

An = 1 − Σ_{i=1}^{n−1} Ai,

Ej = (1 − Σ_{i=1}^{n−1} Ai)[(1 − I)bn,j + Irj] + Σ_{i=1}^{n−1} Ai[(1 − I)bi,j + Irj].

Distributing the sum on the left and combining terms with like indices gives

Ej = (1 − I)bn,j + Irj + Σ_{i=1}^{n−1} Ai(1 − I)(bi,j − bn,j).

Since every instance of Ai is multiplied by 1 − I, we can make substitutions by defining ui = (1 − I)Ai to get

Ej = (1 − I)bn,j + Irj + Σ_{i=1}^{n−1} ui(bi,j − bn,j).

Then we have n − 1 simultaneous equations in the n − 1 variables ui, with the coefficients on all ui in R, and constant terms in which I appears with degree at most 1. It follows that the solutions must be of the form ui = ai + biI for some ai, bi ∈ R. Thus all Ai are of the form (ai + biI)/(1 − I) for some ai, bi in R. This suffices to prove the proposition. Additionally, if all bi,j ∈ Q, then ai, bi ∈ Q, and by using the least common denominator of ai and bi we can express this as Ai = (ai + biI)/(ci(1 − I)) with ai, bi, ci ∈ Z. □
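The form guaranteed by Proposition 5 can be checked symbolically on a small example. The sketch below uses a hypothetical 2 × 2 bimatrix (not one from the text) and solves B's indifference condition in MI for A's mixing probability, which indeed comes out as (a + bI)/(c(1 − I)).

```python
# Sketch: verify the (a + b*I)/(c*(1 - I)) form of Proposition 5 on a hypothetical 2x2 game.
import sympy as sp

I, p = sp.symbols('I p', real=True)      # I = oracle response probability, p = prob A plays row 1

# Hypothetical payoffs (not from the dissertation).
A_pay = [[3, 0],
         [1, 2]]
B_pay = [[1, 4],
         [3, 0]]

# r_j: B's payoff in column j when A best-responds to that column (the maximal matrix R).
r = [B_pay[max(range(2), key=lambda i: A_pay[i][j])][j] for j in range(2)]

def EB(j):
    """B's expected payoff for column j in M_I when A plays (p, 1-p) as the tentative strategy."""
    return sum(prob * ((1 - I) * B_pay[i][j] + I * r[j])
               for i, prob in enumerate([p, 1 - p]))

# B's indifference between the two columns pins down A's equilibrium mix.
p_star = sp.together(sp.solve(sp.Eq(EB(0), EB(1)), p)[0])
num, den = sp.fraction(p_star)
print(p_star)                              # here: (3 - 2I)/(6(1 - I)), i.e. (a + bI)/(c(1 - I))
print(sp.factor(num), sp.factor(den))
```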

Note that this implies that sa(x) is continuous between nodes. Additionally, the

support of sa(x) must be constant between nodes because the support of sb(x) is.

Proposition 6 sb(x) is piecewise constant with respect to x, with discontinuities only at the nodes.

Proof: Suppose {sa(c), sb(c)} is an equilibrium of MI(c) for some particular c ∈ R. Thus sb(c) causes player A to be indifferent among all strategies included in sa(c). But since sa(c) is player A’s strategy when the oracle does not respond, his indifference does not depend on I. So sb(c) causes A to be indifferent on the support of sa(c) in MI(x) for all x.

Additionally, sa(c) causes player B to be indifferent on the support of sb(c) in MI(c). Let d be any value such that there are no nodes between c and d. This means a strategy is dominated for B in MI(c) if and only if it is dominated in MI(d). sb(c) is part of an equilibrium in MI(c), so none of the strategies in its support are dominated. Thus they are also undominated in MI(d), so there must be some sa′ which causes player B to be indifferent on the support of sb(c) in MI(d). Proposition 5 implies that the support of sa′ is the same as the support of sa(c), since the formulas that define each strategy’s probability are nonzero between nodes. Then B must play a strategy that causes A to be indifferent on all strategies in this support. sb(c) accomplishes this, thus (sa′, sb(c)) is an equilibrium in MI(d). And by assumption the equilibrium is unique at each point, so sb(c) = sb(d). □

Since we have expressed the equilibrium strategies sa(x) and sb(x) as functions of x, we can also express the expected payoff of player A as

Ea(x) = Er(x) · I(x) + En(x) · (1 − I(x)) − x.

where Er(x) is player A’s expected payoff when sa(x) and sb(x) are played with the payoff matrix R (the oracle responds) and En(x) is A’s expected payoff when sa(x) and sb(x) are played with the payoff matrix M (the oracle does not respond). Then the value

of information, which we defined as V = ∂Ea/∂I and which we showed earlier depends only on sb, can also be expressed as a function of x:

V (x) = Er(sb(x)) − En(sb(x)).

It then follows immediately that V(x) is piecewise constant with discontinuities only at the nodes, which comes from its direct dependence on sb(x).

If we consider a simplified construction where a player has the binary option to purchase information or not for a fixed cost c, this corresponds to a stepwise oracle function I(x). But by Proposition 2, this is equivalent to an Oracle Game with a linear oracle function with slope 1/c. This, together with V(x) piecewise constant and Theorem 1, means equilibria will only occur at nodes, except when c = V, in which case there are infinitely many equilibria: A will be indifferent between all values of x, and thus {sa(x), sb(x), x} will be an equilibrium for every x in the region where V(x) = c. In the construction where A has a binary option to purchase information or not, this corresponds to A playing a mixed strategy where he chooses to purchase information with probability x/c, and this case would correspond to infinitely many such mixed strategies being equilibria. The following Lemma is a stronger version of Proposition 4 for strictly competitive games, as it eliminates the possibility of strategies dominated at x = 0 which become undominated for x > 0. Thus, the only nodes that can occur are ones corresponding to strategies becoming dominated.

Lemma 2 If G is strictly competitive, then any strategy for player B which is dominated in M will be dominated in MI(x) for all x.

Proof: Suppose that strategy t dominates strategy k for player B in M. Let rj be B’s payoff in column j of the maximal matrix R. Since G is strictly competitive, this will also correspond to the lowest payoff for B in column j of M. That is, rj ≤ bi,j for all i, j. Then rt = bi,t for some i. Then rk ≤ bi,k and k dominated by t implies bi,k ≤ bi,t.

And thus rk ≤ rt. Now for any x, let ci,j,x be the i, j th entry for B in MI(x). From the definition of MI(x), we get ci,j,x = (1 − I(x))bi,j + I(x)(rj). Then for any i, both bi,k ≤ bi,t and rk ≤ rt implies that ci,k,x ≤ ci,t,x. Thus k is dominated by t in MI(x). Note that if k is dominated by a mixed strategy, this argument extends in the same way as in Proposition 4. 

Note also that strict dominance in M will imply strict dominance in MI(x).

Proposition 7 If G is a strictly competitive game, then V(x) is nonincreasing with respect to x.

Proof: Since V(x) = Er(x) − En(x), it is sufficient to show that Er(x) is nonincreasing and En(x) is nondecreasing. Let Er′ be player B’s payoff when the oracle responds, and

En′ be her payoff when the oracle does not respond. G strictly competitive implies that En is nondecreasing if and only if En′ is nonincreasing. Since R is made from entries in G, it is also a strictly competitive game matrix, so Er is nonincreasing if and only if Er′ is nondecreasing. So it suffices to show these properties for Er′ and En′. Note that these are both locally constant with discontinuities only at nodes since they are based on sb. Let x′ be any node. Lemma 2 implies that this node occurs when a strategy becomes dominated, so let j be one such strategy, and let s be the (possibly mixed) strategy that dominates it at x′. Then from the proof of Proposition 4 we have rj < rs. This means that j is dominated in the matrix R. Then, when player B shifts some of her mixed strategy probability from strategy j to strategy s as x passes x′, Er′ will increase. Thus at every node, Er′ must increase, and since it is constant on intervals between nodes, we conclude that Er′ is nondecreasing.

Since En′ is also constant except at the nodes, the only place where it could possibly increase would be at a node. Suppose, for the sake of contradiction, that En′ increases at the node x′. Let s1 be player B’s strategy before the node, and s2 be player B’s strategy after the node. For any value of I, we have

Eb(sb) = I · Er′(sb) + (1 − I) · En′(sb).

Then En′ increasing at x′ implies En′(s2) > En′(s1). We also showed above that Er′(s2) > Er′(s1). Both of these mean that for any value of I, Eb(s2) > Eb(s1), which means s2 will always yield a higher payoff to player B than s1, assuming player A chooses sa optimally, regardless of how often the oracle responds. This contradicts the assumption that s1 was part of an equilibrium before the node, since player B could achieve a higher payoff by switching to s2 immediately. Therefore En′ must be nonincreasing. □

Decreases in V occur for two reasons. The first is that as I increases, B abandons riskier strategies in favor of safer strategies. In a strictly competitive game, the only strategy that never becomes dominated for B is her security strategy, which is the strategy that maximizes the minimum payoff, and is thus the best response to an oracle which frequently responds. This simultaneously minimizes the payoff for A given a response to the oracle; thus the more frequently B chooses safe strategies, the lower Er will be. The second is that as B abandons any strategies, her mixed strategy becomes closer to a pure strategy, and is thus easier to predict without requiring a response to the oracle.

This tends to increase En.

Theorem 2 If MI(x) has a unique equilibrium for each x except at nodes, I is strictly concave, and V(x) is nonincreasing, then G|I will have a unique equilibrium.

Proof:

Since we assume that each MI(x) has a unique equilibrium {sa, sb}, this covers condition 1 of Theorem 1, except at the nodes. It suffices to show that there is exactly one value of x that satisfies condition 2 of Theorem 1, and if it is a node then there is only one {sa, sb} that still meets condition 1. We can express the change in the expected value for Player A

∂Ea/∂x = (∂Ea/∂I) · (dI/dx) − 1 = V·I′ − 1.

Since I′ is everywhere continuous and V is continuous except at nodes, the expression will be continuous except at nodes. I nondecreasing and strictly concave imply I is strictly increasing, and thus I′ ≥ 0 and is strictly decreasing. We’ve previously shown

V ≥ 0, and is nonincreasing. These together imply ∂Ea/∂x is strictly decreasing. For any x, {sa(x), sb(x)} satisfy the first condition in Theorem 1, by definition. We now demonstrate that there is exactly one value of x that satisfies either Lemma 1 or the second condition of Theorem 1:

Case 1: ∂Ea/∂x(0) < 0. This satisfies Lemma 1, and ∂Ea/∂x strictly decreasing means it is negative for all x, so there are no values of x that satisfy V·I′ = 1, which means that {sa(0), sb(0), 0} will be the unique equilibrium of G|I.

Case 2: There exists a c such that ∂Ea/∂x(c) = 0. Then {sa(c), sb(c), c} satisfies Theorem 1, and is an equilibrium for G|I. Since ∂Ea/∂x is strictly decreasing, for any x < c we get ∂Ea/∂x > 0, and for any x > c, ∂Ea/∂x < 0, so this equilibrium is unique.

Case 3: ∂Ea/∂x changes from positive to negative values discontinuously at a node z. Let s1 = sb(c1), where c1 is any value in the region immediately before the node z, and let s2 = sb(c2), where c2 is any value in the region immediately after z. Then since

∂Ea/∂x = V·I′ − 1, we have 1/V(s1) < I′ and 1/V(s2) > I′, so there exists p ∈ (0, 1) such that pV(s1) + (1 − p)V(s2) = 1/I′. Let β = ps1 + (1 − p)s2. Since V is linear, this implies V(β) = 1/I′, and thus V(β)·I′ = 1. Since some strategy gets dominated at z, we have supp(s2) ⊂ supp(s1), and thus supp(β) = supp(s1).

Let α = lim_{x→z−} sa(x), the strategy approached by Player A as x approaches node z from below. Since the strategies sa(x) in this region make B indifferent on all strategies in supp(s1) in MI(x), α will also make B indifferent on all strategies in supp(β) in MI(z), since this is preserved by the limit. And B’s indifference to a strategy despite it being weakly dominated can only occur when one of A’s strategies goes to probability 0 at z. In particular, supp(α) = supp(sa(c2)) ⊂ supp(sa(c1)). Then, since s1 makes A indifferent on all strategies in supp(sa(c1)), and s2 makes A indifferent on all strategies in supp(sa(c2)), β will make A indifferent on all strategies in supp(α). Thus, {α, β} is an equilibrium of MI(z). Further, only linear combinations of s1 and s2 will make A indifferent on supp(α), and of those, only β sets V·I′ = 1, so this equilibrium is unique.

Case 4: ∂Ea/∂x > 0 for all x up until x1 such that I(x1) = 1. This satisfies Lemma 1. Additionally, for any x < x1 we have ∂Ea/∂x > 0, in which case A could profit from increasing x. Thus {sa(x1), sb(x1), x1} will be the unique equilibrium of G|I.

Finally, we note that V being continuous everywhere except at the nodes, positive,

and weakly decreasing, combined with I′(x) strictly decreasing, implies that ∂Ea/∂x is continuous everywhere except at nodes, and strictly decreasing. Thus exactly one of

these cases must occur, depending on if and where ∂Ea/∂x changes from positive to negative. □

Although each combination of G and I will result in a unique Nash equilibrium, any particular G could have a different equilibrium with a different oracle payment x depending on which function I it is attached to, which is why we break each example into cases.

2.5 Harmful Information

In most games we considered, cheaper oracle functions (ones with greater I(x)) cause A’s payoffs to increase when compared to more expensive ones. However, this is not always the case. We now consider a particular game matrix where cheaper information can be harmful to player A (in terms of decreasing his payoff in the equilibrium). Let G be the game given by the payoff matrix M below. This is essentially a weighted matching pennies game where player B has the option to avoid playing altogether by choosing strategy B3.

        B1        B2        B3
A1     4, −1     0, 2      0, 0
A2     0, 2      4, −1     0, 0

Played simultaneously, the only Nash equilibrium is where A plays (1/2, 1/2) and B plays (1/2, 1/2, 0). Then their expected values are EA = 2 and EB = 1/2. B will not choose strategy 3 because she can gain a nonzero amount of points by playing the mixed strategy. If we consider the same strategies and payoffs but in a sequential game where B has to choose first, this is equivalent to giving A complete information about what B chooses (such as an oracle with the constant function I(x) = 1). The resulting game tree is shown in Fig. 2.7.


Figure 2.7. The harmful information game in extensive form.

If B ever chooses B1 or B2, A will choose the best response and the payoff will be

(4,-1). Knowing this, player B will only ever choose B3, and both players will get a payoff of 0. Note that this is worse for both players than the mixed strategy was. That is, player A knowing what strategy player B has played is detrimental to both players. If it were possible, player A would prefer not to have the information, or to be able to commit to ignoring the information and play a mixed strategy in order to incentivize player B to play B1 or B2.

So how is this reflected in the Oracle Game G|I? If A is given access to oracle I(x), the payoff matrix MI(x) becomes:

        B1              B2              B3
A1     4, −1          4I, 2 − 3I       0, 0
A2     4I, 2 − 3I      4, −1           0, 0

1 1 1 1 1 is {( 2 , 2 ), ( 2 , 2 , 0), 0} with expected values Ea = 2 and Eb = 2

Case 2 (interval): If I′(0) ≥ 1/2 ≥ I′(x_{1/3}), the equilibrium is {(1/2, 1/2), (1/2, 1/2, 0), y_{1/2}} (where y_{1/2} is the payment at which I′(y_{1/2}) = 1/2), and neither player adjusts strategies due to the symmetry between strategies 1 and 2. Then

A’s expected value is Ea = 2 + 2I(y_{1/2}) − y_{1/2}. Note that since A is deliberately trying to maximize this, he is only paying the oracle where the function has steepness at least 1/2, so this payoff is greater* than his payoff in Case 1 (*if the oracle is a straight line of slope 1/2 it will be equal). Player B’s payoff is EB = 1/2 − (3/2)I(y_{1/2}), which is worse than her payoff in Case 1, but still more than 0.

Case 3 (node): If 1/2 ≤ I′(x_{1/3}), the equilibrium is {(1/2, 1/2), (1/(4I′), 1/(4I′), (4I′ − 2)/(4I′)), x_{1/3}}, where I′ = I′(x_{1/3}). When I reaches 1/3, B becomes indifferent between all three strategies, since her expected value from any of them is 0, so she would be willing to play any mixed strategy involving them. But the only equilibrium is one where A is indifferent between A1 and A2 and also indifferent on increasing x any further. B’s strategy in the equilibrium is the only one that satisfies both conditions. In this equilibrium, B’s expected value is Eb = 0, and A’s is Ea = 4/(3I′(x_{1/3})) − x_{1/3}. Note that this will be positive because I is concave and must have slope at least 1/2 in order to reach Case 3, so x_{1/3} will be small. However, Ea is decreasing with respect to I′(x_{1/3}). The reason for this is that A gets a higher payoff the more often B plays strategies 1 and 2, but as I′(x_{1/3}) increases, player B will opt out (strategy 3) more often in order to satisfy the condition V·I′ = 1. As I′(x_{1/3}) approaches ∞, B’s strategy will approach (0, 0, 1), causing Ea to approach 0.
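The case formulas above can be evaluated directly. The sketch below assumes the oracle family I(x) = √(kx) used for Fig. 2.8 and computes A's equilibrium payoff as a function of k; the specific grid of k values is arbitrary.

```python
# Sketch: A's equilibrium payoff versus k for the harmful-information game,
# assuming the oracle family I(x) = sqrt(k*x) used in Fig. 2.8.
# (Case 1 never occurs for this family, since sqrt(k*x) has infinite slope at 0.)
import numpy as np

def payoff_A(k):
    """Evaluate the case formulas above for I(x) = sqrt(k*x)."""
    # Case 2 (interval): I'(y) = 1/2 gives y = k and I(y) = k, valid while I <= 1/3.
    if k <= 1/3:
        y = k
        return 2 + 2*np.sqrt(k*y) - y          # = 2 + k
    # Case 3 (node at I = 1/3): x_{1/3} = 1/(9k) and I'(x_{1/3}) = 3k/2.
    x_node = 1/(9*k)
    return 4/(3*(1.5*k)) - x_node              # = 7/(9k)

for k in np.linspace(0.05, 2.0, 40):
    print(f"k = {k:4.2f}   E_A = {payoff_A(k):5.3f}")
# The payoff rises with k up to k = 1/3 (cheaper information helps A),
# then falls as B is pushed toward opting out: the information becomes harmful.
```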

Thus, A will benefit the most at the boundary between Case 2 and Case 3. Fig. 2.8 shows how his expected payoff changes as information becomes cheaper (as k increases). If the oracle is very expensive he will have to lose most of his benefit from the information to the oracle’s cost. But if information is too cheap then player B will be dissuaded from playing B1 and B2, and A will receive a lower payoff than if the oracle did not exist in the first place. It is not extremely surprising that this example demonstrates harmful information, given that we started with a known game in which A’s payoff in the simultaneous game (with no information) is lower than his payoff in the sequential game (with perfect information).

Figure 2.8. Ea as a function of k when I(x) = √(kx).

However, because Oracle Games allow for the continuous purchasing of information, our model demonstrates that having access to small amounts of information is beneficial to player A, and only when a certain threshold is reached does it become harmful by incentivizing player B to change strategies. This is related to the game-theoretic notion of second-mover advantage, where a player achieves a higher payoff by going second and responding to the other player’s strategy compared to choosing first or simultaneously. The oracle effectively allows A to pay to become the second mover, and is thus beneficial when there is an advantage to being in this position, and more likely to be harmful when there is a disadvantage to being in this position, although the continuous nature of payments and response rates complicates this somewhat.

2.6 Helpful Information

In most games we’ve considered, the existence of the oracle causes B’s payoffs to decrease when compared to the same game without an oracle. Cheaper oracle functions typically cause more of a decrease compared to more expensive ones, up to the point where the final node is reached and B plays the safest available strategy. This will always be the case for strictly competitive games; however, for certain other games B’s payoffs may increase due to the presence of the oracle. Consider the coordination game:

        B1       B2
A1     1, 1     0, 0
A2     0, 0     1, 1

This game has two pure strategy equilibria, and one mixed, where both players play (1/2, 1/2). When an oracle is introduced, both pure strategy equilibria still exist, and in them A will not pay the oracle. There will also be one mixed strategy equilibrium, which will be as follows:

Case 1: If I′(0) ≤ 2, the equilibrium is {(1/2, 1/2), (1/2, 1/2), 0}.

Case 2 (interval): If I′(0) ≥ 2 ≥ I′(x1), the equilibrium is {(1/2, 1/2), (1/2, 1/2), y_2}.

Case 3 (node): If 2 ≤ I′(x1), A pays x1, which causes the oracle to always respond. Thus, A needs no tentative strategy. For very cheap I(x), there will be infinitely many strategies B could play that keep A indifferent, but (1/2, 1/2) will always be one of them.

We note that in the mixed strategy equilibrium for the coordination game with no oracle, or in Case 1, the expected payoff for both players is 1/2. In general the payoff for player A is EA = (1/2)(1 + I(x)) − x and the payoff for player B is EB = (1/2)(1 + I(x)). As the oracle function becomes cheaper, the players successfully coordinate more often, causing both players’ payoffs to increase. However, for any nonzero x, B’s payoff is greater than A’s, since she benefits from the increased coordination without paying any of the cost. However, for any oracle function, these payoffs are still less than in the pure strategy equilibria, where both players receive 1 (aside from Case 3, where B’s payoff is also 1 in the mixed equilibrium). Examples where B will benefit from the oracle function can only occur in games where A’s best response to B is also good for B, but which also contain a mixed strategy equilibrium. In the absence of some way to coordinate on these strategies as a pure strategy equilibrium, players may default to the mixed strategy equilibrium. In this case, A’s incentive to pay the oracle for his own benefit may allow B to also benefit without having to pay any cost.
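The payoff expressions above are easy to tabulate. The sketch below assumes an illustrative oracle function I(x) = √(kx) and simply evaluates EA and EB for a few payments x, showing that B gains from the oracle without paying for it.

```python
# Sketch: payoffs in the coordination game's mixed equilibrium as A's payment varies,
# assuming an illustrative oracle function I(x) = sqrt(k*x).
import numpy as np

k = 1.0
I = lambda x: np.minimum(1.0, np.sqrt(k * x))
xs = np.linspace(0.0, 0.25, 6)
EA = 0.5 * (1 + I(xs)) - xs      # player A pays the oracle
EB = 0.5 * (1 + I(xs))           # player B benefits from the extra coordination for free
for x, ea, eb in zip(xs, EA, EB):
    print(f"x = {x:5.3f}   E_A = {ea:5.3f}   E_B = {eb:5.3f}")
```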

2.7 Multiple Equilibria

Although so far we have restricted our focus to games with one mixed equilibrium, games with multiple equilibria will tend to behave in a similar way: each equilibrium can be analyzed separately using the same techniques. Consider the game defined by the matrix:

        B1           B2           B3           B4
A1     1, −1        0, 0       −10, −10     −10, −10
A2     0, 0         2, −2      −10, −10     −10, −10
A3    −10, −10     −10, −10      2, −2        0, 0
A4    −10, −10     −10, −10      0, 0         3, −3

With no oracle, this game has three mixed strategy equilibria: one where they play their first two strategies, one where they play their last two strategies, and one where both players play all four strategies. If an oracle is introduced with oracle function I(x), there will still be three mixed strategy equilibria: two corresponding to the equilibria induced by the submatrix with either the first two strategies or the last two strategies of both players, and one involving both sets. In this last case, the number of strategies played ranges from 2 to 4 depending on the oracle function. If the oracle function is shallow, so payments to it are low, all four strategies will be played for both players. However if the oracle is cheap enough that B2 or B4 become dominated, they will be dropped (as will A2 and A4 respectively). If both are dropped (which happens at I = 2/3), this third equilibrium will be the same as the equilibrium of the 2 × 2 game where those dropped strategies never existed in the first place, which is a game with one equilibrium and falls under the purview of the rest of this chapter. Thus, most of our results can be adapted and applied separately to each individual equilibrium in larger games. More complicated situations can occur in game matrices where many equilibria are dropped and added as x changes, but we leave the details of this for future research.

2.8 Discussion

The Oracle Games defined here provide a method for investigating how players pay to acquire information, as well as how players respond to information about them being acquired. In particular, we have shown that the nodes, which occur when one of player B’s strategies becomes dominated or undominated, play an important role in considering which strategies will be played and how much information will be purchased. Oracle Games could potentially have applications involving industrial espionage, or any situation where competing decision-makers are not immediately aware of each other’s strategies, but can invest time or resources to obtain them at some cost. In general, we anticipate that our model will be useful whenever information is difficult to acquire, but is always reliable once acquired. For example, a firm hiring spies to

steal files from their competitors will have to pay regardless of whether they succeed in their operations or not, but if they succeed the files would be unlikely to contain false information. The model by Solan and Yariv [31] is similar to ours, but has different results about what sorts of equilibria will occur. They find that if sufficiently reliable information can be purchased cheaply enough, the player will purchase it and act on the information as if it were completely true, and also that the information cost affects the game’s equilibrium only insofar as it determines whether information is worth purchasing or not; the actual amount of information purchased, if any, depends only on the payoffs for the player without any information. This corresponds to the amount that causes the player’s strategies to become dominated, and is related to our notion of nodes. In our model, player A will purchase more information as it becomes cheaper in a continuous way, until the point where they reach the final node. Thus our mechanism of purchasing a random chance of perfect information is distinct from purchasing noisy information, and leads to different results even when attached to the same games. In cases where information can always be acquired for a fixed cost and is always accurate (a stepwise oracle function), both models should yield identical results. Oracle Games are fundamentally asymmetric because only one player has access to the oracle and its information. There is no straightforward way to directly extend this to a symmetric system where both players have an oracle, since one player must commit to a decision before the oracle can know his action and provide it to the other player. More complicated constructions could potentially resolve this. One such alternative would be to have both players bid payments to the oracle, and only the player with the higher bid gets access to information about the other player. Such a construction would have to specify what happens in the event of a tie, which may lead to discontinuities in payoff functions. We speculate that for some games, equilibria would be identical to ours, but for others, one player might be willing to pay for the oracle simply to deny it to the other player, not because they value the information themselves highly. An asymmetric but potentially interesting alternative this suggests is to take our model and give player B the ability to pay the other player’s oracle in order to decrease its probability of response. This could be done as a subtraction of inputs, i.e., the probability of response is I(xa − xb), where xa and xb are the payments of players A and B respectively. We speculate that in cases where A’s value of information is higher than B’s value of secrecy (the amount she loses from A getting a response), the equilibrium would be identical to those in this chapter. When B’s value of secrecy is higher than A’s value of information, the equilibrium would be no oracle payments, since anything A paid

would be immediately countered by B. Alternatively, this could be implemented as a subtraction of outputs, i.e., the probability of response is max(0, Ia(xa) − Ib(xb)), where Ia and Ib are functions given separately to players A and B. We speculate that when B’s oracle function is expensive, the equilibrium would be identical to those in this chapter. When B’s oracle function is somewhere in the middle and concave, both A and B would pay some amount based on the marginal rate of return. When B’s oracle function is cheap, the equilibrium would be no oracle payments, since anything A paid would be immediately countered by B. Another method of creating a symmetric version would be to have an extensive form game with both players having multiple actions, and each player having the ability to pay the oracle to learn information about the other player’s earlier actions. We speculate that this would be resolvable using the same methods used on normal extensive form games, but each subgame would be an oracle game as in this chapter. We also note that, while we briefly address the issue of games with multiple equilibria, we do not go into it in much detail; a fuller treatment may yield interesting results that do not occur in games with a single equilibrium. Future research might explore these in more detail and develop better techniques for describing and comparing multiple equilibria in the same game. Finally, the practice of using randomly supplied accurate information as an alternative to noisy signals could be investigated in other games with information acquisition. This could reduce the complexity of some models by eliminating the need for players to condition their actions on uncertain beliefs, while still retaining the incentive to increase the amount of information available through the response rate. We expect such modifications may yield results similar to the approach with noisy signals, but possibly with a simpler analysis which might lead to novel or more useful results.

Chapter 3 | Fair Contribution in a Nonlinear Stochastic Public Goods Model

3.1 Introduction

3.1.1 Public Goods Games

In this chapter, we study an agent-based model where players with a simple learning rule play a public goods game together. In nature, interactions between individuals often occur in groups of more than two. Such groups might accomplish more than the individuals comprising them could alone, enhancing the welfare of the individuals who contribute to the group’s success. However, individuals who seek to exploit the group by benefiting from it without contributing to its success may emerge. Much research in game theory has focused on the presence of cooperation in games that model this dynamic [33]. When there is no method to enforce cooperation, rational players will seek to maximize their own payoffs at the expense of others. But when all players make this decision, all of them end up with lower payoffs than if they had coordinated to cooperate together. The most well-known example of this is the Prisoner’s Dilemma, where each player is given a choice between Cooperating or Defecting, the latter of which increases their payoff a small amount compared to Cooperating, at the cost of decreasing the other player’s payoff by a large amount. In the absence of any other modifications to this game, the only equilibrium is one in which both players Defect, leading to both players having a lower payoff than if both Cooperate. While the Prisoner’s Dilemma may be the simplest way to capture this conflict between group and individual benefit, the absence of more than two players restricts its applications to larger group settings. The related public goods game, introduced by

Hamburger in 1978 [34], plays a similar role, but is better suited to describing interactions involving more than two players at a time. In the standard construction, each player is given a binary choice either to contribute a fixed amount c to the public good (cooperate), or to freeload off of the contribution of others while contributing nothing (defect). Players who contribute have the cost deducted from their final payout, which is then put into the public good, multiplied by a factor r (the rate of return), and then divided evenly among all players regardless of whether they contributed or not. Payoffs are given by

u_c = (r c n_c)/N − c     (3.1)
u_d = (r c n_c)/N         (3.2)

where u_c is the payoff for contributors, u_d is the payoff for defectors, n_c is the number of players who contribute, and N is the total population size. If r > 1, then it is socially desirable (Pareto superior) for players to contribute: all players contributing leads to higher payoffs for each player compared to all players defecting. But if r < N, then each individual player maximizes their own utility by unilaterally choosing to defect, assuming their actions do not change the actions of the other players. When both of these hold we have a dilemma where the only Nash equilibrium is one where no player contributes, which is socially suboptimal (Pareto inferior) since every player’s payoff is lower than it would be had everyone contributed, just as in the Prisoner’s Dilemma. In fact, when N = 2 and 1 < r < 2 this is a Prisoner’s Dilemma.

        C                     D
C    rc − c, rc − c       rc/2 − c, rc/2
D    rc/2, rc/2 − c       0, 0

Table 3.1. Payoffs for a 2 player public goods game.
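A minimal sketch of the payoffs in Eqs. (3.1)-(3.2), with arbitrary illustrative values of c and r, is given below; the two-player case reproduces Table 3.1.

```python
# Sketch: payoffs of the standard binary public goods game, Eqs. (3.1)-(3.2).
def pgg_payoffs(contributes, c=1.0, r=1.6):
    """contributes: list of booleans, one per player. Returns a list of payoffs."""
    N = len(contributes)
    n_c = sum(contributes)
    share = r * c * n_c / N                    # public good divided evenly among all players
    return [share - c if x else share for x in contributes]

# Two-player case reproduces Table 3.1 (here c = 1 and r = 1.6, so 1 < r < 2: a Prisoner's Dilemma).
print(pgg_payoffs([True, True]))    # [rc - c, rc - c]   -> [0.6, 0.6]
print(pgg_payoffs([True, False]))   # [rc/2 - c, rc/2]   -> [-0.2, 0.8]
print(pgg_payoffs([False, False]))  # [0, 0]
```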

Similarly, a Prisoner’s Dilemma can be scaled up into an N player variant that is equivalent to an N player public goods game. One method is to have N players in a population, where each pair plays one round of the Prisoner’s Dilemma together, with the restriction that each player must choose a fixed strategy to play versus every partner. They then receive the sum of the payoffs for all of their games. If the Prisoner’s Dilemma has payoffs T > R > P > S with P = 0 and T − R = −S, then this is equivalent to a

public goods game with c = (R − NS)/(N − 1) and r = NT/(R − NS).

3.1.2 Cooperative Behavior in Biology

There are many interactions in nature which involve something similar to the public goods game, but which appear to have avoided the dilemma, resulting in a population of contributors. Payoffs are often interpreted in evolutionary game theory as fitness: the ability for an individual to reproduce and spread its genes. A naive interpretation of evolution would suggest that any behavior that reduces one’s fitness will not evolve, and thus individuals will not sacrifice their own fitness for the sake of others, and yet we observe that they do. West and Griffin [35] discuss a number of evolutionary reasons why this might occur. A common cause is kin-selection, where cooperation increases the frequency of a strategy performed by individual even if it decreases their own personal fitness, so long as they are cooperating with individuals who are closely related to them. In particular, Hamilton’s rule states that a sacrificial behavior that helps one individual at the cost of the one performing it will be selected for if rb − c > 0, where here r is the relatedness of the individuals, b is the benefit to the fitness of the recipient, and c is the cost to the fitness of the one performing the action [36]. Even if an individual decreases their own direct fitness by such actions, they increase the total frequency of their shared genes in the population. An obvious example is parents taking care of their offspring, but it also applies to individuals caring for their siblings or cousins. Grafen [37] describe this process in detail and discuss a number of examples of this occurring in nature. However, in order for kin-selection to enable cooperation, individuals need to have some means of cooperating more with their relatives than they do with the general population as a whole, or else they sacrifice their own fitness to benefit unrelated individuals who do not share many genes. One method is kin-discrimination, where individuals are able to recognize individuals related to them, even if imperfectly. Brown and Brown [38] discuss the mechanisms and benefits of kin-discrimination in fish, particularly Salmonids, which recognize their kin via scents in the water and behave less aggressively in territory disputes with related individuals. Note that this kin-discrimination does not need to be deliberate in order to take place. Strassman et al. [39] discuss various types of kin-selection in microbes. One form of kin-selection they discuss is poison-antidote systems, where a microbe simultaneously releases a poison into its environment, and produces an internal antidote so that it doesn’t suffer from its own poison. Genetically related microbes will produce the same antidote, while unrelated microbes will not and will suffer from it. The microbe discriminates in

effect without having to discriminate in behavior. Strassman et al. also discuss assortment, a form of kin-selection that does not require discrimination, where individuals are more likely to interact with individuals related to them, often due to proximity. They find that most microbes in solid substrates lack mobility and will tend to be near their relatives since reproduction causes them to emerge near each other. In other cases, microbes have bonding mechanisms that cause them to attach to others only if they are genetically related, leading to drifting clonal colonies made up of related individuals who then interact primarily with each other.

3.1.3 Modifications of Public Goods Models

Many game theory models have been constructed that also allow for the presence of cooperation. For example, assortment can be observed in models with spatial or network dynamics. Miller and Knowles [40] study a model where agents in a changing graph network play public goods games with their neighbors, and tend to change their strategies to copy neighbors with high fitness. They find that although in any pairwise interaction defectors will score higher than cooperators, the frequency of cooperators tends to increase over time and they eventually dominate the population. The method of changing strategies, where individuals copy their neighbors, causes clusters of individuals that have the same type. When defectors have high fitness, they convert their neighbors into defectors, which lowers their own fitness in future rounds. When cooperators have high fitness, they convert their neighbors into cooperators, which increases their fitness in future rounds. This difference in positive versus negative feedback is what enables cooperators to dominate in the long term, even if they initially perform worse. Additionally, many studies have used more sophisticated forms of public goods games to explain the presence of cooperation even when players are only self-interested and have no genetic ties to other players. Hauert et al. [41] construct a volunteering model where players play a public goods game with a third option to opt out of the group and act as a loner, which is preferable to playing with all defectors but less desirable than playing with all cooperators. Under some parameters, this creates a cyclic dynamic, where defectors are replaced by loners, who are eventually replaced by cooperators, who are then replaced by defectors. Under other parameters, one population will completely take over the population, although with mutation dynamics the dominant population will occasionally switch. Fehr and Gachter [42] explore the effect of punishment in a human experiment. Subjects play multiple versions of a prisoner’s dilemma, some of which allow for the

ability to pay a cost to punish the other player, and find that players are willing to do this even if they will never play against the other player again, so the punisher cannot benefit from changing that player’s behavior. They find that games with sufficiently harsh punishment have more cooperative behavior compared to games without punishment. Hauert et al. [43] combine the notions of volunteering and punishment and create a public goods model with both, which includes four types of players: defectors, cooperators, loners and punishers, the latter behaving like cooperators but after each round also sacrificing a small amount of their payoff to inflict a cost on defectors as retribution for selfish behavior. They find that this leads to a higher frequency of cooperation compared to similar models with only volunteering or only punishment. Hauert [44] studies a complementary approach to punishment, where players are given the ability to sacrifice some of their payoff to provide an additional reward to contributors. They find that if the reward is large enough, this can lead to the emergence of cooperation, since individuals will be willing to pay for this reward in order to incentivize others to cooperate, but this is unstable since once everyone is cooperating individuals stop rewarding the behavior. In some sense, there are multiple public goods in the same model: the contribution to the primary public good, and the contribution to the extra rewards, and both are potentially vulnerable to free-riders. Another method of incentivizing cooperation is to have a payoff function which is nonlinear in the number of cooperators [45]. A classic example is the n-player snowdrift game, where the return from the public good is b if at least one player contributes, and 0 otherwise, and the cost to contribute is c/n where n is the number of contributors. In this game, all of the social benefit is gained from the first cooperator, and all that subsequent contributors accomplish is sharing the burden of the cost. If b > c then an equilibrium will exist where some players cooperate. This can be in the form of an "unfair" pure strategy equilibrium, in which one player contributes and the remaining players free-ride, safe in the knowledge that the benefit will be gained, or it can be in the form of a "fair" mixed strategy equilibrium in which every player contributes with a nonzero probability that balances the desire to maximize the chances of gaining the benefit for themselves against the cost of having to contribute. Many examples of nonlinear public goods exist in nature, such as enzyme production [46] and cooperative hunting [47]. If the payoff function is sufficiently steep, then players will be incentivized to contribute whenever they can increase their own payoff by more than the cost of their contribution, leading to a nontrivial equilibrium with some cooperation.
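As a concrete illustration of such a nonlinear payoff, the following sketch implements the n-player snowdrift payoffs described above, with arbitrary illustrative values of b and c.

```python
# Sketch: n-player snowdrift game payoffs as described above.
# The public benefit b is produced if at least one player contributes;
# contributors split the cost c among themselves.
def snowdrift_payoffs(contributes, b=4.0, c=2.0):
    """contributes: list of booleans. Returns one payoff per player."""
    n = sum(contributes)
    if n == 0:
        return [0.0 for _ in contributes]
    return [b - c / n if x else b for x in contributes]

# With b > c, a lone contributor still prefers contributing (b - c > 0) to losing the
# benefit entirely, so "unfair" equilibria with a single contributor can exist.
print(snowdrift_payoffs([True, False, False]))   # [2.0, 4.0, 4.0]
print(snowdrift_payoffs([False, False, False]))  # [0.0, 0.0, 0.0]
```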

Bshary [48] provides a classification of these and other methods enabling the evolutionary stability of altruistic behaviors into several categories. Reciprocity, where agents take costly actions that benefit other players in exchange for other players doing the same for them, encompasses cooperation in games with reputation or punishment. This generally requires some level of repeated interaction over time in a stable community, as well as more social agents such as humans [49]. By-product mutualism, where agents simply take the action that selfishly maximizes their own fitness and other players happen to benefit as a side effect, encompasses cooperation in games with kin selection, volunteering, or nonlinear payoffs. This can be seen in a wide range of interactions, including cleaner fish [50], herding animals [51], and meerkats [52].

3.1.4 Fairness

The idea that everyone contributes something to a publicly available and beneficial commodity, or that all organisms share in the cost of a common benefit, seems intrinsically related to the notion of fairness. Experiments with human subjects have shown that rather than making decisions purely to maximize their own payoff, players also care about the payoff of other players, or more specifically about a certain appropriate equality of treatment (see e.g. [53]). Subjects in experiments are often willing to cooperate in the prisoner’s dilemma if they believe that the other players will cooperate with them [54]. This is more accurately described as fairness rather than altruism, as subjects will punish other players who defect even in one-shot games [55]. In the Ultimatum Game, two players are placed in different roles: a dictator and a responder. A fixed quantity p of utility (or money, in experiments) is available. The dictator chooses a distribution of p among the two players, and the responder has the choice of accepting the offer, in which case both players receive their share according to the offer, or rejecting it, in which case neither player receives anything. While there are infinitely many Nash equilibria, each corresponding to a particular offer from the dictator and a responder who refuses to accept any offer except that particular one, the only subgame perfect equilibrium is one in which the dictator offers the minimum nonzero amount and the responder accepts it. In the absence of the ability to communicate or make threats, a perfectly rational responder who only cares about their own payoff has no incentive to reject any nonzero offer once they receive it, since a rejection gives them nothing and it’s too late to influence the dictator into changing their offer. However, in experiments nontrivial offers often occur, and nonzero offers are often rejected, which may be partially explained by notions of fairness. Responders are often willing to forgo any payoff if it will punish the maker of an offer that they deem to be unfair [56]. Rabin [57]

47 construct an alternate version of game theoretic equilibrium where players explicitly make decisions based on fairness in addition to their own payoffs, which allows for such behavior. A different approach to this was taken by Zhu et al. [58], who find that even when fairness is not explicitly valued by players, it can emerge from simple algorithmic rules. They construct a dynamical system in an agent-based approach to the Ultimatum Game, in which each player possesses a pair of values: the amount they are willing to offer when chosen as the dictator, and the minimum amount they are willing to accept when chosen as the responder. Players are randomly chosen from the population to play one of these roles, and non-rationally play the ultimatum game with their pre-selected strategy. Players then update their values according to two algorithms based on their most recent interaction: one focused on success of outcome, and one on greed. This eventually leads to the unique fair equilibrium in which each player offers and accepts approximately half of the total. This model was also used by Rajtmajer et al. [59] to model privacy concerns in social media. These models also use learning in some sense, as players do not start in such an equilibrium, but incrementally update in response to their own experiences. A similar model was made by Santos et al. [60] who construct an N-player version of the Ultimatum game in which a single dictator offers an amount to a group of potential recipients, and the deal is made if a sufficient number of them vote to accept the offer. Like in Zhu et al. [58], each player possesses a pair of values that determine their behavior, but here these values are updated via imitation dynamics. Players are randomly sampled in groups of some fixed size N from a larger population, players’ payoffs are averaged over several games, and then players randomly select other players to imitate, with higher probability of imitating players who received higher payoffs. They find that typically offers and acceptance rates will end up low, but due to noise added to the imitation dynamics, offers and acceptances will fluctuate around some nonzero average values. The higher the number of votes required for an offer to complete, the higher these values will be. Though offers will typically remain partially unfair in favor of the dictator, certain values of group size and noise will lead to an equilibrium which is close to fair. These mechanisms allow for players to learn and change their behavior over time in an incremental way based on feedback from the games that they play. Players have imperfect information about what the other players are going to do, and are not rational agents which could calculate the optimal strategy even if they had such information. Instead, they learn and adapt instinctively, adjusting their strategy after each game that makes them more likely to perform better if they encounter similar situations in the

48 future.

3.1.5 Our model

In this chapter, we construct a nonlinear public goods game where players are given the option to contribute any nonnegative amount to the public good rather than the typical binary choice between nothing and a fixed amount. We then construct a stochastic dynamical system consisting of a population of players who are randomly sampled to play this public goods game with each other. Similar to the model in Zhu et al. [58], players possess a value which they use to decide their contribution amount, and which they incrementally update in response to the result of games that they participate in. Players possess no knowledge of the global shape of the payoff function or the strategies of other players, only being informed of their own payoff and the local gradient of best response each time they play a game. Yet we find that not only does this lead players to cluster near a Nash equilibrium, but when the population is larger than the number of players in each game, they approach the unique fair equilibrium in which every player contributes the same amount. This suggests that repeated pairings with different players can cause players to contribute fairly purely out of self interest without the need for punishment or assortment. We also test the robustness of these players to exploitation. When permanent free riders who do not update their strategy are introduced into a population using the incremental update rule, we find that when the model’s parameters cause interactions with the free riders to be rare, regular players will contribute in order to reach the fair equilibrium regardless of the free rider’s presence. This causes the free rider to suffer from low payoffs since every group they participate in undercontributes. But when the model’s parameters cause interactions with the free riders to be frequent, regular players will increase their contributions to compensate for the free rider. This causes the free rider to benefit by being part of groups with equilibrium contributions without having to pay the cost.

3.2 Definition of the Public Goods Game

In the following m player public goods game, each player i possesses a real number ci ≥ 0 which defines their contribution to the public good. We define the total of these

contributions to be

C = Σ_{i=1}^{m} ci.

The game is defined by the general return function b(x), and each player i receives a payoff

ui(c1, c2...cm) = b(C) − ci, (3.3)

For simplicity, we will assume a power law return function

b(C) = RC^α (3.4)

with constants R, α ≥ 0. If α = 1, this is like the classic linear public goods game, but with no upper limit to contribution, with rate of return r = Rm: each player’s contribution is multiplied by r and then distributed among the m players, so that each player receives R times the total contribution. It follows that if R > 1, every player is incentivized to contribute as much as possible, since each receives more in return than she contributes, thus there will be no equilibrium unless we impose a maximum contribution limit. If R < 1 then the normal public goods dilemma applies, and self-interested players will contribute nothing even if it would be socially beneficial. However if α < 1, the return function b(C) is nonlinear and concave. This has two important effects: there are diminishing returns at high contribution rates, and there are also very high rates of return for low contributions. The ith player will seek to maximize ui by choice of contribution ci, defined by taking the derivative of Eq. 3.3 with respect to ci, setting that equal to 0 and solving. This is equivalent to b′(C) − 1 = 0, since dC/dci = 1 for each i. For the power law return function Eq. 3.4, we find this yields a Nash Equilibrium at

C = (1/(Rα))^{1/(α−1)} = (Rα)^{1/(1−α)} ≡ Ce
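As a quick numerical sanity check of this formula (a short Python sketch; the parameter values R, α, and m below are illustrative assumptions, not values used in the text), one can verify that when every player contributes an equal share of Ce, no unilateral deviation is profitable:

```python
# Sketch: a quick numerical check of the equilibrium formula above.  The parameter
# values R, alpha, m are illustrative assumptions, not taken from the text.

R, alpha, m = 20.0, 0.5, 4

C_e = (R * alpha) ** (1.0 / (1.0 - alpha))    # equilibrium total contribution (= 100 here)
f = C_e / m                                   # fairpoint: the equal share of C_e

def payoff(c_i, others_total):
    """u_i = b(C) - c_i with b(C) = R * C**alpha (Eqs. 3.3 and 3.4)."""
    return R * (c_i + others_total) ** alpha - c_i

# With every other player at f, no unilateral deviation from f should pay.
others = (m - 1) * f
u_star = payoff(f, others)
for eps in (0.001, 0.01, 0.1, 1.0):
    assert payoff(f + eps, others) < u_star
    assert payoff(f - eps, others) < u_star

print(f"C_e = {C_e:.2f}, fairpoint f = {f:.2f}, equilibrium payoff = {u_star:.2f}")
```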

The value of Ce represents the total contribution such that each player’s marginal rate of return to himself equals 1; changing this contribution would change the payoff of every player, including himself, by the same amount that his cost increased by, so he is indifferent to increasing or decreasing it. If the players were contributing ci such that

C > Ce, then each player has an incentive to decrease her contribution, while if C < Ce, each player has an incentive to increase her contribution, so this is the only form of Nash equilibrium in this game. However, there are infinitely many such equilibria, each corresponding to a combination

Figure 3.1. Graph of the benefit function b(C) = √(400C), with equilibrium and socially optimal values marked in the case where m = 2.

of ci summing up to Ce. We will be particularly interested in one such equilibrium, in which every player contributes equally to the public good; we define this as the fairpoint

f = Ce/m.

At this special Nash equilibrium, ci = f for each player. If we fix some R < 1, as it would be in a standard public goods game, then we can consider Ce = (Rα)^{1/(1−α)} as a function of α. When α = 0, Ce = 0, corresponding to the case where the public goods game returns a constant payout of R to each player regardless of how much is contributed, so nobody has incentive to contribute. As α gradually increases,

Ce will increase, as shown in Fig. 3.2. This reflects the fact that functions of the type xα with α < 1 have arbitrarily large slope near 0. Since the players’ marginal rate of return is effectively multiplied by the derivative of Cα, they are incentivized to increase their payments in this region up to the point where the derivative becomes equal to 1/R. This point is near 0 for very small α, but gradually moves further away as α increases.

However, at some point Ce reaches a peak, and then decreases as α increases. As α approaches 1, Ce once again approaches 0, reflecting the fact that C^α is flattening out into a line in this limit, which corresponds to an ordinary, linear public goods game.

3.3 Population Dynamics

We now define a model for the population dynamics. Fix a population size n ∈ N, a group size m ≤ n, a step size s ∈ R, and an R and α for the public goods game. In this population of n players, each has some initial strategy ci ∈ R corresponding to how much they plan to contribute to any public goods game they participate in. The state z of the population is defined by the set {ci} of these contributions. At each time step, a group σ


Figure 3.2. Examples of the Nash equilibrium value Ce plotted as a function of α, for three different return values R = 0.4 (bottom), 0.7 (middle), 0.95 (top).

Table 3.2. Important Variables
n      Population size
m      Group size
ci     Contribution of player i to each game
C      Sum of contributions of all players in a particular group
Ce     Contribution value at the equilibrium point
f      Individual contribution value in the fair equilibrium; f = Ce/m
L(z)   Modulus of the population; L(z) = Σ_{i=1}^{n} (ci − f)/s (mod m)
E1     E1 = Σ|ci − f| / n
E2     E2 = Σ(ci − f)^2 / n

of m players is chosen uniformly at random from the population. This group plays one round of the public goods game together, and each player receives a payoff of ui defined in Equation 3.3. Each player i also considers what their payoff would have been had they contributed ci + s or ci − s instead of ci, assuming the other players contributed the same amounts they did in the actual game. Each player then compares these three possible utilities and updates their strategy by copying the one that would have given them the highest payoff (remaining at ci if there are any ties, though this will not occur for almost all b and s). This will in general also change the population state z. If s is sufficiently small relative to the smoothness of b, this dynamic corresponds to

each player incrementally adjusting their strategy along the gradient of best response in an attempt to (locally) maximize their own utility. If C < Ce, then dui/dci > 0, which means that each player will increase ci by s. If C > Ce then dui/dci < 0 for each player, and each will decrease ci by s (if C = Ce then dui/dci = 0, and the ci will not change). Thus the entire group will move in the same direction, by the same amount.
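A minimal sketch of one time step of this dynamic is given below (in Python; the helper names, parameter values, and the treatment of the ci ≥ 0 boundary are my own illustrative choices, not specifications from the text):

```python
import random

# Sketch of one round of the dynamic described above: a random group of m players
# plays the game, and each member keeps whichever of {c_i - s, c_i, c_i + s} would
# have earned the most against the others' actual contributions.  The return
# function b(C) = R*C**alpha and all parameter values are illustrative assumptions.

R, ALPHA = 20.0, 0.5                       # C_e = (R*ALPHA)**(1/(1-ALPHA)) = 100

def payoff(c_i, others_total):
    return R * (c_i + others_total) ** ALPHA - c_i

def play_one_round(c, m, s):
    """One time step: mutates the list of contributions c in place."""
    group = random.sample(range(len(c)), m)
    played = {i: c[i] for i in group}      # what was actually contributed this round
    total = sum(played.values())
    for i in group:
        others = total - played[i]
        options = [x for x in (played[i] - s, played[i], played[i] + s) if x >= 0]
        # ties (rare) are broken in favor of staying put, as in the text
        c[i] = max(options, key=lambda x: (payoff(x, others), x == played[i]))

# Example: n = 20 players, groups of m = 5, so the fairpoint is f = 100/5 = 20.
random.seed(0)
n, m, s = 20, 5, 0.1
c = [random.uniform(0.0, 50.0) for _ in range(n)]
for _ in range(5000):
    play_one_round(c, m, s)
print("average contribution after 5000 rounds:", round(sum(c) / n, 2))  # near 20
```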

Figure 3.3. Representation of a single round of play.

Note that as long as u is concave and s is sufficiently small, the actual values and shape of u do not matter except insofar as they determine Ce. Each player will increase or decrease their contribution by the same amount, regardless of how steep the gradient is.

Thus, any two utility functions with equal Ce will result in identical population dynamics. Players might not receive the same payoffs from each game, and the gradient of utility might have different steepness, but it will have the same sign, so players will respond the same in every possible scenario.

Additionally, the actual value of Ce only amounts to a translation on the dynamics, and s amounts to a scaling. If we fix n and m, then any two models are isomorphic as dynamical systems. If M1 is a model with Ce = C1, s = s1, and f1 = C1/m, and M2 is a model with Ce = C2, s = s2, then if we express every player’s contribution as ci = fi + kis for some ki ∈ R, then φ(f1 + kis1) = f2 + kis2 induces an isomorphism between M1 and

M2 when applied to the contribution value of each player. We define the modulus L of a population state to be

L(z) = Σ_{i=1}^{n} (ci − f)/s   (mod m).

For a fixed system, this will give us a real number 0 ≤ L < m for each population which will be invariant under the update function. Whenever a group is chosen to play, every player in the group will move in the same direction, either all increasing their payouts by s, decreasing them by s, or not changing them. Thus the modulus of the population will remain fixed at every time step. It follows that the modulus of a population is entirely

determined by its initial conditions, so a population that starts with modulus L can only reach states that also have modulus L.
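A quick empirical check of this invariance, using the same kind of one-round update as in the sketch above (again with illustrative parameter values of my own):

```python
import random

# Sketch: check empirically that the modulus L(z) = sum_i (c_i - f)/s (mod m) is
# preserved by the update rule.  The one-round update is the same
# best-of-{c - s, c, c + s} rule as above; parameter values are illustrative.

R, alpha, n, m, s = 20.0, 0.5, 12, 3, 0.5
C_e = (R * alpha) ** (1.0 / (1.0 - alpha))        # = 100
f = C_e / m

def modulus(c):
    return sum((ci - f) / s for ci in c) % m

def step(c):
    group = random.sample(range(n), m)
    played = {i: c[i] for i in group}
    total = sum(played.values())
    for i in group:
        oth = total - played[i]
        u = lambda x: R * max(x + oth, 0.0) ** alpha - x
        c[i] = max((played[i] - s, played[i], played[i] + s), key=u)

random.seed(1)
c = [f + random.randint(-30, 30) * s for _ in range(n)]
L0 = modulus(c)
for _ in range(5000):
    step(c)
drift = abs(modulus(c) - L0)
print("modulus before:", round(L0, 6), "after:", round(modulus(c), 6))
assert min(drift, abs(drift - m)) < 1e-6          # unchanged up to rounding
```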

Proposition 8 1) The population state with ci = f for all i is an absorbing state. 2) If m < n, this is the only absorbing state.

Proof: 1) Suppose ci = f for all i; then at each time step, any chosen group will have

C = Ce, and thus no player will change ci.

2) Suppose m < n, and let z be a population state with ci ≠ f for some i. There are (n choose m) possible groups that could be chosen at each time step. For each group σ, let Cσ = Σci, where the sum is over players in σ. Then if we can show that Cσ ≠ Ce for some group σ, that corresponds to a group that will cause players to update their contribution if chosen, so there is a nonzero probability that the population will move away from state z. Let group 1 correspond to a group of minimal contribution (pick the m players with the least individual contributions). If C1 ≠ Ce then we are done. If C1 = Ce, then that means the average ci in this group is f, so either ci = f for all players in this group, or some ci < f and some ci > f. Let group 2 correspond to a group with the same players as group 1 except the player with the least ci is replaced by the player with the greatest ci in the population, who cannot already be part of group 1 because it chose the m least contributions. The contributions of these two players cannot be equal unless every player had the same contribution, which would have to be f, and we assumed this wasn’t the case. Thus C2 > C1 = Ce. □ Note that the absorbing state has modulus 0, so populations with nonzero modulus and m < n cannot reach any absorbing state. For simplicity in our simulations, we always select b(C) such that f is an integer multiple of s, and set all initial ci to be integer multiples of s. In this case, L(z) will yield integer values. However, this simplification is only required in Propositions 9, 11, and 12, and all of our other results work in general.

Proposition 9 If s divides (ci − f) for all i, and m = 1, then each player will monotonically shift their ci towards f until they reach it. The population will end up in the absorbing state with probability 1.

Proof: Note that when m = 1, f = Ce. At each time step, one player is randomly selected to play the public goods game alone; all of the return from the player’s contribution is returned directly back to the player. Ce is defined as precisely the value at which a

player’s marginal increase in return to itself is equal to the marginal increase in cost, so the optimum value to maximize utility is Ce, and players will shift their ci towards it each time they are chosen to play the game, until eventually all players reach it. □

Proposition 10 Let M be a model with m = n. 1) If L(z) = 0, the population will deterministically approach a specific fixed state determined by the initial conditions. 2) If L(z) ≠ 0, the population will deterministically approach a set of two states, and will oscillate between them.

Proof: If m = n, then there is only one possible group of m players that can be chosen from the population, so there is only one group with one value C1 depending on the population state. At each time step, the same population will be chosen to play together, so the entire population evolution is deterministic. On each time step, if C1 < Ce, every player will increase their contribution by s, so C1 will increase by ms. If C1 > Ce, every player will decrease their contribution by s, so C1 will decrease by ms. If L(z) = 0, then

C1 will eventually reach Ce, at which point the population is in an absorbing state (not necessarily of the form described by Proposition 9). If L(z) = ℓ ≠ 0, C1 will eventually reach Ce + ℓs. On the following timestep, every player will decrease ci, and C1 will change to Ce + (ℓ − m)s, oscillating between these two states every other step. Note that in either case, since every player changes their contribution at the same time and by the same amount as every other player, the difference between each pair of players’ contributions, ci − cj, will remain constant throughout time, depending only on the initial conditions. □

Figure 3.4. Simulation with m = n = 4

3.4 Numerical Simulations

In simulation, players in a population tend to move towards the fairpoint until they end up close to it, often spreading out in a small Gaussian-like distribution around it. An example is shown in Fig. 3.5 for n = 100 players and group size m = 10.

Figure 3.5. Numerical simulation of the model with n = 100 players and group size m = 10, showing the distribution of contributions C of each player around the fairpoint f: t = 0 (top), t = 800 (middle), t = 1600 (bottom).

In Fig. 3.6 we show how the average contribution of the entire population evolves over time for n = 100 and three different values of m.

In Fig. 3.7, we show the time evolution of several different ci over time, for five particular players in a population of n = 50, with m = 10. We see in these simulations that on a short timeframe, the average of players’ contributions quickly goes to f, and then on a longer timeframe the variance in contributions decreases until players are clustered near the fairpoint. However in both cases the values fluctuate around this point without settling on it. We would like to formally define this trend and prove why it must happen. To do this, we note that there are several possible ways to denote how far a population is from the absorbing state. Our first type of distance is simply the average distance of each player to the fairpoint:

E1(z) = Σ|ci − f| / n

Figure 3.6. Average contribution cavg for population of n = 100 players over time, shown for: m = 10 (top), m = 50 (middle), m = 100 (bottom).

For instance, if every player has ci = f ± 5, then E1 = 5. Note that E1 = 0 iff z is the absorbing state. In Fig. 3.9, we show nine examples of E1 vs time, starting from three different initial conditions or population states z, each with three different group sizes m.

While the initial values of E1 clearly start with common initial conditions together, we

Figure 3.7. Examples of five individual players’ ci over time, in a population of n = 50 players and group size m = 10.

Figure 3.8. E1 behavior
observe that as the simulation progresses, the final values approached by this distance function appear to be associated more with their group size values, with the smallest m value approaching the smallest E1 value. Fig. 3.10 shows a diagram of the initial conditions and approximate distribution after enough time has passed for them to cluster near f, using the same color for each population as in Fig. 3.9.

We observe that E1 seems to decrease at a roughly constant rate, creating a linear graph in the time before it reaches the value it ends up fluctuating around. This makes sense, as the step size s does not depend on the distance the group is from equilibrium, so the only factor influencing changes in E1 is how many players are moving towards or away from the fairpoint.


Figure 3.9. E1 over time shown for three different initial conditions with n = 100, each with different subgroup sizes: m = 10, 50, 80.

Figure 3.10. The initial states at t = 0 (left) and at t = 1600 for each group size (right) from the simulation used in Fig. 3.9.
However, the speed seems to differ based on m, since more players are moving in each round. To measure this speed, for each m we initialize a population with n = 100, allow the simulation to run for 50 time steps to allow the population to adjust their average, then measure E1. We then run the simulation for 500 games and measure E1 again. This allows the system to run for long enough to get a good average, but not long enough for

Figure 3.11. Average decrease rate for E1 as a function of m.

it to reach the stable E1, which tends to happen around t = 1000. We then compute the average decrease in E1 per time step during this interval. We average this speed over 100 simulations with each m to reduce variation due to noise. The results are shown in Figure 3.11. We observe that the speed is low when m is near 0 or n, and greatest in the middle, though the distribution is asymmetric and peaks around m = 40 rather than m = 50. There appear to be two competing forces driving this behavior. When a group is chosen to play, m players will update their contribution by s. Since E1 is defined as a population average, each player who moves towards the fairpoint will decrease E1 by s/n. When m is small, fewer players play each game, and thus the E1 changes at a slow rate.

However, sometimes players move away from the fairpoint, each causing E1 to increase by s/n. Sometimes, a few players can push a larger number of players away from the fairpoint, as shown in Fig. 3.8. This is more likely to happen for larger m, which means that even though more players are moving, they tend to move back and forth rather than directly towards the fairpoint. Eventually we reach the case where m = n as in

Proposition 10 and E1 does not decrease at all.
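The following sketch carries out the speed measurement just described (a burn-in of 50 steps, a further 500 steps, averaged over repeated runs), though with fewer repeats than the 100 used here and with an illustrative return function, so the exact numbers will differ from Fig. 3.11:

```python
import math, random

# Sketch of the speed measurement described above: estimate the average per-step
# decrease of E1 for a few group sizes m, with n = 100.  Each run uses a burn-in of
# 50 steps and then 500 further steps, as in the text, but is averaged over fewer
# repeat runs than the 100 used there.  The return function b(C) = sqrt(40000*m*C)
# (borrowed from the caption of Fig. 3.12) keeps the fairpoint at f = 10000 for
# every m; all other choices here are illustrative assumptions.

n, s, f = 100, 1.0, 10000.0

def step(c, m, R):
    group = random.sample(range(n), m)
    played = {i: c[i] for i in group}
    total = sum(played.values())
    for i in group:
        oth = total - played[i]
        u = lambda x: R * math.sqrt(x + oth) - x
        c[i] = max((played[i] - s, played[i], played[i] + s), key=u)

def E1(c):
    return sum(abs(ci - f) for ci in c) / n

def decrease_rate(m, repeats=20):
    R = math.sqrt(40000.0 * m)            # C_e = (R/2)**2 = 10000*m, so f = 10000
    rate = 0.0
    for _ in range(repeats):
        c = [f + random.randint(-2000, 2000) * s for _ in range(n)]
        for _ in range(50):               # burn-in, as in the text
            step(c, m, R)
        e0 = E1(c)
        for _ in range(500):
            step(c, m, R)
        rate += (e0 - E1(c)) / 500.0
    return rate / repeats

random.seed(3)
for m in (10, 40, 80):
    print(f"m = {m:3d}: average decrease in E1 per step ~ {decrease_rate(m):.4f}")
```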

Proposition 11 Consider a model M with n > 2, m = 2, and s divides (ci − f) for all i. Then: 1) E1 is nonincreasing with respect to time; 2) If the modulus of the population is 0 and E1 > 0 then there is a nonzero chance of E1 decreasing within the next three time steps; 3) If the modulus of the population is 1 and E1 > s/n then there is a nonzero chance of E1 decreasing within the next three time steps.

Proof: Since changes in E1 at each time step depend only on the arrangement of the two players chosen relative to f, there are only five cases for which we need to show that changes in E1 are nonpositive:

Case 1: If both players chosen have ci < f, then C < Ce and they will increase their contributions by s, and get closer to the fairpoint. Thus E1 will decrease by 2s/n.

Case 2: If both players chosen have ci > f then C > Ce, so they will decrease their contributions by s and get closer to the fairpoint. Thus E1 will decrease by 2s/n.

Case 3: If one player chosen has ci < f and the other has cj > f, then if C = Ce neither will move, so E1 will not change. If C 6= Ce, then regardless of the direction of their motion, one will get closer to f while the other gets further from f by the same amount, so E1 will not change.

Case 4: If one player chosen has ci < f or ci > f and the other has cj = f, then the first player will move closer to f while the second player moves off it by the same amount, so E1 will not change.

Case 5: If both players have ci = f then they will not move, so E1 will not change. This proves statement 1.

Note that E1 will strictly decrease when either case 1 or 2 are chosen, so if we can show a nonzero probability of these states occurring within three time steps of any state, then this proves statements 2 and 3.

Case A: If there are at least two players with ci < f, or at least two players with ci > f, then there is a nonzero probability of those players being chosen together, which would cause case 1 or case 2 to occur and decrease E1. Note that since n > 2, not being in case A implies that at least one player must have ci = f by the pigeonhole principle. Choose one, call it player k.

Case B: If there are players i, j with ci < f and cj > f, then there is a nonzero probability that in the first time step, player i will be chosen with player k (putting us in case 4), increasing both their contributions so that player k switches to ck = f + s. In the second time step, player k is chosen with player j, putting us in case 2, so E1 decreases by 2s/n.

Case C: There is one player with ci < f − s or ci > f + s, and all other players have cj = f. Suppose first that ci < f − s. Note that player i is at least two steps away from f, so there is a nonzero probability that in the first time step, player i is chosen with player k, increasing both contributions so that ck = f + s and ci < f still. Then in the second time step player i is chosen with a different player j ≠ k, so that the new cj = f + s. Then in the third time step players j and k are chosen together, so we are in Case 2 and E1 will decrease by 2s/n. If we started with ci > f + s then it follows by symmetry that we have Case 1. Note that if the modulus of the population is 0, then Cases A, B and C are exhaustive, so we have proved statement 2.

Case D: There is one player with ci = f − s or ci = f + s and all other players have cj = f. Then E1 = s/n. Thus if the modulus is 1 and E1 ≠ s/n we must also be in case A, B or C. □ Thus any population with m = 2 and n > 2 will gradually converge towards the absorbing state, at least until it reaches within a certain distance of it. Note that the assumption that s divides (ci − f) for all i is required to prevent overshooting in Cases 1 and 2. An example of when this assumption does not hold is if s = 1, c1 = c2 = f − 0.1.

Then if these players are chosen, in the next time step they will reach c1 = c2 = f + 0.9, and E1 will increase. It is useful to consider other distance or energy functions for the population states, such as

E2(z) = Σ(ci − f)^2 / n

This measures the average of the squared distance of each player from the fairpoint, so will behave similarly to E1 in many respects, but has the advantage that when a player updates their strategy, their contribution to E2 will change proportionately to their current distance from f. Since groups always move in the direction influenced by the sum of all ci, this means that players further from f, with more influence over the group’s direction, will also have more influence over E2, causing it to decrease more.

E2 is almost the same as the variance of the population’s contributions, but will differ slightly if the average contribution is not exactly equal to f.

Proposition 12 Let M be a model with m = 2, n > 2, and s divides (ci − f) for all i. Then for any population state z,

1) E2 is nonincreasing with respect to time

2) If L(z) = 0 and E2 > 0 then there is a nonzero chance of E2 decreasing within the

next three time steps; 3) If L(z) = 1 and E2 > s²/n then there is a nonzero chance of E2 decreasing within the next three time steps.

Proof: The proof is almost entirely identical to the proof of Proposition 11, except that in Cases 3 and 4, E2 sometimes decreases instead of remaining constant. We omit the details. □ When m > 2, there are possible configurations of players that cause E2 to increase. For example, if m = 3, and the group chosen has one player at f − s, and two players at f, then the players will increase their contributions and we will end up with one player at f and two at f + s, which means that E2 will increase by s²/n. However events like this seem to occur rarely in the space of all possible configurations of players and produce only small increases in E2, while most configurations cause E2 to decrease.

For any group σ, let dσ = |Ce − C|. By comparing the contribution of each player to

E2 before and after that group plays, the change in E2 when group σ is played is found to be

Δσ E2 = (s/n)(ms − 2dσ)   if dσ ≠ 0,

while Δσ E2 = 0 if dσ = 0, corresponding to the case where C = Ce, so the players don’t change strategies. Since there are (n choose m) possible groups of players that could be chosen, each with probability 1/(n choose m), we find that for a fixed population state, the expected change in total E2 on the next timestep is

⟨ΔE2⟩ = (1/(n choose m)) Σσ Δσ E2 = (s/n) [ mps − 2 Σσ dσ / (n choose m) ],   (3.5)

where p is the probability that a chosen group will have nonzero dσ. (For most population states, p will be close to or equal to 1). This means that if the average dσ is greater than mps/2, we expect E2 to decrease over time, and if the average dσ is less than mps/2 we expect E2 to increase over time. The further players tend to be from the fairpoint in a population, the larger the dσ will tend to be, so this suggests that populations that are spread out will tend to have decreasing E2 and draw closer to the fairpoint, while populations close to the fair point will spread out, behaving stochastically in a drift towards some sort of equilibrium when the players are spread out just enough that ⟨ΔE2⟩ = 0.
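As a sanity check on Equation 3.5, the following sketch enumerates every possible group in a small population and compares the expected change in E2 computed directly (under the same small-s description, in which the whole group moves by s toward Ce whenever dσ ≠ 0) with the value given by Eq. 3.5; all helper names and parameter values are illustrative assumptions:

```python
import itertools, random

# Sketch: check the algebra behind Eq. 3.5 on a small population by brute force.
# Every possible group is enumerated; the expected one-step change in E2 is
# computed directly and compared with the closed form of Eq. 3.5.

R, alpha, n, m, s = 20.0, 0.5, 6, 2, 0.01
C_e = (R * alpha) ** (1.0 / (1.0 - alpha))        # = 100
f = C_e / m

random.seed(4)
c = [f + random.uniform(-3.0, 3.0) for _ in range(n)]
groups = list(itertools.combinations(range(n), m))

# Direct expectation over a uniformly chosen group.
direct = 0.0
for g in groups:
    C = sum(c[i] for i in g)
    if abs(C - C_e) < 1e-12:
        continue
    move = s if C < C_e else -s                    # whole group steps toward C_e
    dE2 = sum((c[i] - f + move) ** 2 - (c[i] - f) ** 2 for i in g) / n
    direct += dE2 / len(groups)

# Eq. 3.5:  <dE2> = (s/n) * (m*p*s - 2 * (sum of d_sigma) / (n choose m))
d = [abs(C_e - sum(c[i] for i in g)) for g in groups]
p = sum(1 for x in d if x > 1e-12) / len(groups)
eq35 = (s / n) * (m * p * s - 2.0 * sum(d) / len(groups))

print(direct, eq35)
assert abs(direct - eq35) < 1e-9                   # the two computations agree
```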

Lemma 3 For any population state, (E1)² ≤ E2 ≤ n(E1)²

Proof: Fix n, and for the population state z define

F(z) = E2(z) / (E1(z))²

Because E1 is the average of some set of numbers, and E2 is the average of the squares of those same numbers, F will achieve a maximum when all of those numbers are zero except one of them, and will achieve a minimum when all of those numbers have the same value. In particular, let z1 be the state where c1 = f + d for some constant d, and ci = f for all i ≠ 1. Then E2(z1) = d²/n, and E1(z1) = d/n, and therefore F(z1) = n is a maximum for F. Let z2 be the state where ci = f + d for all i. Then E2(z2) = d² and

E1(z2) = d, and therefore F(z2) = 1 is a minimum for F. Thus we get 1 ≤ F(z) ≤ n for all z, and multiplying by (E1)² yields (E1)² ≤ E2 ≤ n(E1)². □ Rearranging the inequalities yields the immediate corollary that √(E2/n) ≤ E1 ≤ √E2.

Lemma 4 For any fixed constant Q, and population variables m, n with m < n, there exists a constant dn,m,Q > 0 such that E1 ≥ Q implies the average dσ over all groups, davg ≥ dn,m,Q. Further, this dn,m,Q is linear with respect to Q, i.e. dn,m,Q = Qdn,m for some dn,m > 0.

Proof: Consider the function g from Z to R, which maps population states to their davg. It should be clear that this is a continuous function (changing population states by a small amount will change davg by a small amount). For each fixed Q, let gQ be g restricted to populations with E1 = Q. We wish to show that each gQ attains a minimum, and that these minima will be linear with respect to Q.

First, we will show that if gQ attains a minimum, it must be in a "stacked" population, where all players contribute one of two values. Next, we will show that for each number of players in each stack, there is one particular contribution value for each stack that locally minimizes gQ. Then since there are finitely many ways to arrange n players into two stacks, one of these local minima must be the global minimum for gQ. To do this, first consider the following transformations from the space of populations states to itself, all of which preserve E1 and preserve or decrease davg. Let i and j denote two players on the same side of the fairpoint (ci and cj both ≥ f or ci and cj both ≤ f), assume w.l.o.g. that ci ≤ cj. Let x be any value with 0 ≤ x ≤ cj − ci. Then let G(i, j; x) be the transformation that increases ci by x and decreases cj by x. This will preserve

E1 since both players are on the same side of the fairpoint and moved towards each other, so one got closer to the fairpoint by x, and the other got further from it by x.

To compute the effect on davg, partition the set of player groups into four sets: groups containing neither player i nor j, groups containing i but not j, groups containing j but not i, and groups containing both i and j. It should be clear that for any group

σ containing neither player, dσ will not change since none of the players in it moved.

Additionally, for groups containing both i and j, dσ will not change, because one player increased by x and the other decreased by x, so the net change is zero. There is a natural bijection between groups containing i but not j, and groups containing j but not i, made by substituting player i for player j. Consider a single pair, let σ{i} be the group containing player i, with total contributions C{i} and distance d{i}, while σ{j} is the group containing player j with sum C{j} and distance d{j}. Let Co be the sum of the contributions all of the players other than i and j (which are common to both groups). Then we can write

d{i} = |Co + ci − Ce| and d{j} = |Co + cj − Ce|.

After the transformation, we obtain G(d{i}) = |Co + ci + x − Ce| and G(d{j}) = |Co + cj − x − Ce|. It follows that d{i} + d{j} ≥ G(d{i}) + G(d{j}) since ci ≤ cj (if C{i} and C{j} are on the same side of Ce, one of di and dj will increase and the other will decrease, if they’re on opposite sides both will decrease, or one will increase by less than the other one increases). Thus the net change in contribution to davg from this pair will either remain the same or decrease. Since this is true of every pair, davg will remain constant or decrease after G is applied. Any population state can, using a finite composition of such G(i, j; x), be transformed into a “stacked" population, a population state where all players on the left of the fairpoint are playing the same value cL and all players on the right are playing the value cR, where these values are simply the average values that the players originally had on the left and right, respectively. Note that if any player was originally playing f, he can be moved to either side as part of this averaging process, so this stacking process is not necessarily unique for some initial populations; but there are finitely many choices for each initial population state. Since each G(i, j; x) preserves E1, and also preserves or decreases davg, we have shown that for every population state with E1 = Q, there is at least one stacked population with davg less than or equal to the original, so we can restrict our search for a minimum to stacked populations. Fix n, m. Then the set of all stacked populations can be characterized by three variables, cL, cR, and the number of players on the left of the fairpoint, which we will

65 call γ (the number of players on the right is n − γ). Since there are only finitely many choices for γ, let’s further restrict our search to populations with a fixed γ. Let gγ(cL, cR) be the function which takes cL and cR as inputs and gives the output davg, corresponding to the stacked population with γ, cR, and cL. We can compute this by summing over all possible groups chosen from the stacks:

gγ(cL, cR) = (1/(n choose m)) Σ_{i=1}^{m} (γ choose i) ((n − γ) choose (m − i)) |(m − i)cR − i·cL|

Since we’ve fixed E1 = Q then cR is automatically determined after choosing γ and cL, so we can consider g to be a function of one variable:

gγ(cL) = (1/(n choose m)) Σ_{i=1}^{m} (γ choose i) ((n − γ) choose (m − i)) |(m − i)(Q − γcL)/(n − γ) − i·cL|.

If we define x = cL/Q, then we can factor a common term of Q out of gγ(cL), yielding:

gQ,γ(x) = (Q/(n choose m)) Σ_{i=1}^{m} (γ choose i) ((n − γ) choose (m − i)) |(m − i)(1 − γx)/(n − γ) − i·x|.

Since this is a sum of finitely many absolute value terms, each of which is linear in x, then gQ,γ(x) must attain a minimum value for some (not necessarily unique) value xγ.

Note that since we’ve completely factored out Q, the value xγ which minimizes g will be independent of Q. That is, each possible stack configuration has its own arrangement that minimizes davg, and the only thing Q does is linearly scale this arrangement. Then xγ will correspond to a local minimum of g. Since there are finitely many possible γ, there must be one which is a global minimum. Since we fixed n, m earlier, let xn,m be the xγ which yields this global minimum. This xn,m will give the relative position of the stacks with minimum davg out of all possible population states with that n and m.

Then Ln,m,Q := Q · xn,m and rn,m,Q := Q(1 − xn,m) give the actual positions of the stacks with minimum davg for population states with E1 = Q. Let dn,m,Q be the minimum davg obtained. Note that when m < n, the only population state with davg = 0 is the one with all ci = f, and thus has E1 = 0, so we must have dn,m,Q > 0 for nonzero Q. Note also that this dn,m,Q is linear with respect to Q, since it’s a minimum of g, which is linear with respect to Q, so there exists dn,m such that dn,m,Q = Q · dn,m. Therefore, if z is any population state with E1 = Q′ ≥ Q, we must have davg ≥ Q′ · dn,m ≥ Q · dn,m = dn,m,Q. □
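The following sketch illustrates Lemma 4 numerically for a small case, estimating dn,m by a grid search over stacked states built directly from the definitions of E1 and dσ (the grid resolution and parameter choices are my own, and this is only a numerical illustration, not part of the proof):

```python
import itertools, random

# Sketch: a numerical illustration of Lemma 4 for a small case, working with
# offsets c_i - f (so the fairpoint is 0 and d_sigma is just |sum of a group's
# offsets|).  We estimate d_{n,m} by scanning stacked states with E1 = 1 over a
# grid, then compare d_avg / E1 for random states against it.

n, m = 5, 2
groups = list(itertools.combinations(range(n), m))

def d_avg(offsets):
    return sum(abs(sum(offsets[i] for i in g)) for g in groups) / len(groups)

def E1(offsets):
    return sum(abs(x) for x in offsets) / n

# Stacked states with E1 = 1: gamma players at -cL and n - gamma players at +cR,
# with gamma*cL + (n - gamma)*cR = n.
d_nm = float("inf")
for gamma in range(1, n):
    for k in range(0, 1001):
        cL = (k / 1000.0) * n / gamma
        cR = (n - gamma * cL) / (n - gamma)
        d_nm = min(d_nm, d_avg([-cL] * gamma + [cR] * (n - gamma)))

print("estimated d_{n,m}:", round(d_nm, 4))

random.seed(5)
worst = min(d_avg(state) / E1(state)
            for state in ([random.uniform(-10, 10) for _ in range(n)]
                          for _ in range(2000)))
print("smallest d_avg / E1 over random states:", round(worst, 4))
# Lemma 4 says d_avg >= E1 * d_{n,m}; the second number should not fall below the first.
```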

Theorem 3 For a given n, m, s, with m < n, there exists a constant Kn,m,s > 0 such that E2 > Kn,m,s implies ⟨ΔE2⟩ < 0.

Proof: Given n, m, s, let dn,m be the quantity defined in Lemma 4, and then define

Q = ms/(2dn,m)   and   Kn,m,s = nQ².

Then whenever E2 > Kn,m,s, Lemma 3 forces n(E1)² ≥ E2 > Kn,m,s = nQ², and thus E1 > Q = ms/(2dn,m). By Lemma 4, this implies that davg > Q·dn,m, and thus davg > ms/2 ≥ mps/2 for any 0 ≤ p ≤ 1. By Equation 3.5, this implies that ⟨ΔE2⟩ < 0. □ Note that E2 does not depend on s, so if we consider two models with the same population state and the same parameters other than s, they will have the same E2.

However since Kn,m,s is quadratic with respect to s, the threshold beyond which E2 is decreasing will depend on s. Thus, the smaller s is, the closer the population states have to get to the fairpoint before they stop getting closer. Thus, if we consider any fixed n, m, and population state z, then there exists some s0 such that E2(z) > Kn,m,s for all s < s0. In other words, we can force the population to cluster arbitrarily close to the fairpoint by decreasing the stepsize. Taking this to the extreme, we can construct a continuous version as follows. For a given m, n, and initial population state z, let xs(t) ∈ R^n be the expected value of each player after ⌊t/s⌋ steps. Although for each nonzero s this function will be discontinuous, these functions will converge as s approaches zero, thus we define x(t) = lim_{s→0} xs(t). This will be a deterministic continuous system, with each player’s payoff moving at a velocity equal to the average of the direction of best response over all groups that player belongs to. The change of E2 will become

dE2/dt = −2dσ/n,

and since dσ takes on discrete values this means E2 will continue to decrease until dσ reaches zero, which only occurs in an absorption state. It immediately follows that any population with m < n will reach the equilibrium where all ci = f after a finite amount of time, while any population with n = m will reach an equilibrium where the average ci is f after a finite amount of time.

As an example, consider a population with n = 3, m = 2, and initial values c1 =

−1, c2 = c3 = 1. For the discrete model with small s, then player 1 pairing with either player 2 or 3 will not cause E1 to decrease, but 1/3 of the time players 2 and 3 will play

and shift to the left. This will cause c2 and c3 to be less than c1 in magnitude, which will allow it to decrease as well, but increasing the other player’s contribution in the process. In the long run, this will cause each of the three players to move towards the fairpoint at an average rate of s/6 per timestep. This means that in the continuous version, each player will move towards the fairpoint at exactly a rate of 1/6 per unit of time, until they reach it and stop.

3.5 Dynamics in the Presence of a Permanent Freeloader

We now consider the robustness of the population dynamics described above to invasion from other types of players. Consider a population with n − 1 regular players following the standard update rules discussed above, and one permanent freeloader who never updates his strategy, and always contributes a fixed and low amount c0 to the public good, such that 0 ≤ c0 < f. While we might have c0 = 0 as for a perfect freeloader, to be general we will consider c0 = f − δ for some δ > 0. If the regular players have ci near f, then whenever a group is chosen that includes the freeloader and m − 1 regular players, the group will have an unusually small total contribution C, and the regular players will increase their values. Thus the players will tend to have contributions larger than f and as a result, when a group is chosen that does not include the freeloader it will tend to have higher total contribution, which will lead to a decrease in ci values.

Figure 3.12. The average value of ci for populations of n = 100 players with one permanent freeloader, as a function of group size m. Each population was numerically simulated and measured ten times at regular intervals between t = 200,000 and 300,000. We fixed δ = 10,000, and for each m let ui = √(40000mC) − ci, which gives f = 10,000 even when m changes.

In order to determine which tendency will dominate, we need to consider the expected changes of each occurrence. In each round, the freeloader has an m/n chance of being chosen to participate. If we consider the case when all regular players have

f < ci < f + δ/(m − 1),

then with probability m/n the freeloader will be chosen and m − 1 players will increase their value, and with probability (n − m)/n the freeloader will not be chosen and m players will decrease their value. This means the expected change in time of the average value in the population will be

(m − 1)m/n − m(n − m)/n.

If we make an inequality setting this less than 0 and solve for m, we find that this expected change is negative when m < (n + 1)/2. This means whenever the population average increases past the fairpoint, it will tend to decrease back towards it. Thus the population will cluster near the fairpoint, with a slight shift off center due to the freeloader. The regular players play with the freeloader rarely, so any shifts caused by the freeloader are undone in his absence. The expected change in this region is positive when m > (n + 1)/2, in which case the regular players will continue to increase their contributions over time.

However, once the regular players have sufficiently high ci that they contribute an average of more than f + µ, where µ := δ/(m − 1), this will compensate for the freeloader, causing C > Ce, so the players no longer increase their ci even when the freerider participates. This will cause the players to cluster near f + µ. Regular players still all decrease their values whenever they play without the freeloader, but he participates often enough that this can’t compensate for his effect unless players contribute enough to sometimes reach Ce even when paired with the freeloader. When m = (n + 1)/2, the expected motion of players with f < ci < f + µ is zero, which means players will drift back and forth in a type of random walk with soft bounds that increase the probability of moving towards the center if they drift outside of the interval [f, f + µ].
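A short simulation sketch of this experiment is given below; it uses the return function from the caption of Fig. 3.12 but much shorter runs, and the remaining choices (step size, run length, a perfect freeloader with δ = f) are illustrative assumptions:

```python
import math, random

# Sketch of the single-freeloader experiment: n - 1 adaptive players plus one
# permanent freeloader who always contributes c0 = f - delta (here delta = f, a
# perfect freeloader).  The return function b(C) = sqrt(40000*m*C) follows the
# caption of Fig. 3.12, so f = 10000 for every m; the run length is much shorter
# than the runs used for that figure.

n, s, f = 100, 1.0, 10000.0
delta = 10000.0                              # freeloader contributes f - delta = 0

def run(m, steps=30_000):
    R = math.sqrt(40000.0 * m)
    c = [f] * (n - 1) + [f - delta]          # last index is the freeloader
    for _ in range(steps):
        group = random.sample(range(n), m)
        played = {i: c[i] for i in group}
        total = sum(played.values())
        for i in group:
            if i == n - 1:                   # the freeloader never updates
                continue
            oth = total - played[i]
            u = lambda x: R * math.sqrt(x + oth) - x
            c[i] = max((played[i] - s, played[i], played[i] + s), key=u)
    return sum(c[:-1]) / (n - 1)

random.seed(7)
for m in (10, 80):                           # one side of (n + 1)/2 each
    avg = run(m)
    target = f if m < (n + 1) / 2 else f + delta / (m - 1)
    print(f"m = {m}: regular-player average = {avg:.1f}, expected near {target:.1f}")
```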

Figure 3.12 shows the average ci for the non-freeriders in simulations as a function of m. It remains around f up until around m = 50 at which point it transitions. The graph displays a downward curve in this region, since larger group sizes means each player doesn’t need to contribute as much to compensate for the freerider. Because the update rules are symmetric, if we introduce a permanent overcontributor who always contributes a fixed amount c0 = f + d for some d > 0, then all of the above dynamics will occur but in the opposite direction. Regular players will end up near f but

centered slightly to the left when m < (n + 1)/2, will end up near f − d/(m − 1) when m > (n + 1)/2, and will randomly drift between these two values when m = (n + 1)/2.

3.5.1 Multiple Permanent Freeloaders

We also studied the effect of including a second permanent freeloader, which we found required significantly longer to settle into an equilibrium state. As expected, we now observe two transitions, corresponding to the probabilities of having one or both of the unmovable players more than half of the time. The transitions in the average contribution value cavg as a function of the group size m are shown in Fig. 3.13, for n = 100. Using the same approach as before, we can also calculate the specific values of m where these transitions should occur, though the process is more algebraically involved. The probability that a group contains no freeloaders is (n − m)(n − m − 1)/(n(n − 1)). The probability that a group contains one freeloader is 2(n − m)m/(n(n − 1)). The probability that a group contains two freeloaders is m(m − 1)/(n(n − 1)). When regular players have contributions between f and f + µ, they will not be able to offset either freeloader, so the expected change of their contributions will be:

−m · (n − m)(n − m − 1)/(n(n − 1)) + (m − 1) · 2(n − m)m/(n(n − 1)) + (m − 2) · m(m − 1)/(n(n − 1)),

due to the number of regular players who move in each of the three situations. This simplifies to

−(2m/(n(n − 1))) (m² + (1 − 2n)m + (n² + n − 2)/2),

which is negative when

m < m1c := ((2n − 1) − √(2n² − 6n + 5))/2.

Let µ1 := δ/(m − 1), which is the amount players need to overcontribute on average to compensate for a single freeloader, and µ2 := 2δ/(m − 2), which is the amount players need to overcontribute on average to compensate for two freeloaders. Then when players’ contributions are between f + µ1 and f + µ2, they are capable of reaching Ce in groups with one freeloader, but not two. Then the expected change of their contributions will be:

−m · (n − m)(n − m − 1)/(n(n − 1)) − (m − 1) · 2(n − m)m/(n(n − 1)) + (m − 2) · m(m − 1)/(n(n − 1)),

Figure 3.13. The average value of ci for populations of n = 100 players with two permanent freeloaders, as a function of m. In this case, two transitions are observed (see text).

which simplifies to

(2m/(n(n − 1))) (m² − 3m + (−n² + 3n + 2)/2),

which is negative when

m < m2c := (3 + √(2n² − 6n + 5))/2.
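Evaluating these two expressions numerically (a small sketch; both quadratics above share the discriminant 2n² − 6n + 5, and n = 101 is the value used for Fig. 3.13):

```python
import math

# Sketch: evaluate the two transition group sizes for n = 101, the population used
# for Fig. 3.13.  Both quadratics above share the discriminant 2n^2 - 6n + 5.

n = 101
disc = math.sqrt(2 * n**2 - 6 * n + 5)
m_1c = ((2 * n - 1) - disc) / 2
m_2c = (3 + disc) / 2
print(f"m_1c ~ {m_1c:.1f}, m_2c ~ {m_2c:.1f}")    # roughly 30 and 72
```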

For Fig. 3.13 (n = 101) these yield m1c ≈ 30 and m2c ≈ 72. Note that at each transition, the group average increases by µ1 or µ2 respectively; however, since these are not constant with respect to m, the group does not need to contribute as much to offset a freeloader as m increases. Gore et al. [61] find that the yeast Saccharomyces cerevisiae displays behaviors that match some of the behaviors we see in our model. Cells in the presence of complex sugars will produce the enzyme invertase and release it into their surroundings in order to metabolize the sugars into glucose before absorbing them. This invertase is costly to produce, and diffuses in the region around each cell, such that cells near each other share the benefits of others. This acts like a nonlinear public goods game with variable contributions and subgrouping according to spatial proximity, although this grouping is nonrandom. Each cell can produce a variable amount of invertase to contribute to the cells in its immediate vicinity, and the fitness gain by cells is nonlinear in the amount produced, as there is a finite amount of sugars that can be metabolized and absorbed so excessive amounts of invertase lead to diminished benefits.

71 Additionally, they observe that some mutant cells do not produce invertase, and act as permanent freeloaders. They find that when the contributing cells encounter mutant cells, they increase their production of invertase in response to the resulting low glucose levels, which compensates for the lack of production by others and helps maintain the desired invertase levels in their neighborhood. This is very similar to what happens in our model.

3.6 Discussion

In this chapter we have presented an agent-based approach to the public goods dilemma in which randomly chosen subgroups of players follow an algorithmic learning mechanism to find the optimal point of a nonlinear return function. In this random subgroup approach, each player has a variable contribution level (unlike the All or None contribution in the standard public goods game), and shifts this contribution in a predefined way in the direction of best response. The contribution levels of the overall population thus adjust over time, as the players participate in various subgroups. We fix the subgroup size at m, and observe that for any value less than the size of the population itself, the total group contribution evolves towards the socially optimal value Ce, while all individuals move towards sharing equally in contributing to this. The approach of each player to this fairpoint f = Ce/m is independent of their original random starting values, and is statistical in nature; the players cluster around the fairpoint, but in most cases never remain on it. Our approach demonstrates how random association into subgroups and incremental learning can lead to fairness in shared contributions, despite the players not having any explicit preference for fairness. Players only care about their own utility, and make local decisions to improve their utlity based on their interaction with the group σ in which they find themselves at any given moment. Undercontributors will tend to be part of subgroups that undercontribute as a group, while overcontributors will tend to be part of subgroups that overcontribute, an effect which leads all players closer to the fairpoint. We could summarize this as saying that when the environment is fluctuating and unpredictable, the most consistent factor in any player’s group is the player itself. Just as nonlinearity is sufficient to enable cooperation in public goods games without the need for punishment or other forces aside from a player’s own direct payout from the game, we demonstrate how playing the public goods game in random subgroups is sufficient for fairness to emerge, despite the lack of an explicit incentive towards fair

behavior. This concept could potentially be useful in explaining the emergence of fair behavior in nature and in human behavior without models that require players to prefer or even be aware of the concept of fairness [58]. Future research could be done investigating how robust the behavior in our model is to alterations in the model’s systems, such as changing the players’ learning rule to move depending on the steepness of the gradient of best response. We have also chosen a single fixed value m in each population, such that all players participate in groups of the same size at all times. However this should be relaxed to a distribution of this important variable, allowing the subgroup association to occur on multiple scales. Additionally, the role of random subgroup association and fairness could be investigated in other games, seeing if there was a similar distinction between cases when m = n and m ≠ n. Finally, the stability of this fairpoint to multiple permanent freeloaders should be tested, with a distribution of differing (low) contributions.

Chapter 4 | Population dynamics in a rock paper scissors model with restricted strategy transitions

4.1 Introduction

4.1.1 Win-Stay Lose-Shift

What happens in a game with more than two strategies, in which individual players are each restricted to changing between two of those strategies? The idea for this chapter originated in a simple card game created for a lecture demonstration of evolutionary games by A. Belmonte. In the game, all members of the audience are given a single card which has been printed with two nonidentical strategies of Rock-Paper-Scissors (RPS) on either side. Following what is essentially a "Win-Stay / Lose-Shift" strategy, each person holds their card with one side face up. Participants then group into pairs and play one round of the game together, each playing the strategy represented by the face up side of their card. Players who lose then flip their card over so that their other strategy is face up, while players who win or tie maintain the same strategy. After playing one round together, a pair splits up and each person goes to find someone else to pair with for the following round. Within the mathematical field of strategic game theory, much research has focused on learning dynamics in repeated games involving simple rules for changing strategies in response to feedback. In the repeated Prisoner’s Dilemma, the strategy Tit for Tat, which repeats the action done by its opponent in the previous round, tends to perform well in a diverse population, even out-competing more complex strategies [62,63]. It performs

well in large part because it mutually cooperates with copies of itself, creating stability in populations with a high frequency of this strategy. Additionally, it will retaliate against defectors, which discourages this behavior, or at the very least provides a higher payoff than cooperating with them. However, the strategy has some weaknesses that lead to it being unstable in certain circumstances. For example, if mutations occur in the population dynamics it can be slowly replaced by more naive cooperative strategies, which it will also cooperate with. This in turn allows pure defectors to thrive and gain a foothold in the population. Additionally, Tit for Tat performs poorly in game variants with some form of "trembling hand". Suppose during each round of play, each player chooses a strategy, and plays it with probability 1 − ε, but plays the other strategy by mistake with probability ε. When two Tit for Tat players play together, they begin by mutually cooperating, however as soon as a mistake occurs the other player retaliates by defecting in the next round, which causes a new paradigm of alternating defections and cooperations. The next mistake might cause both to cooperate and re-enter the cooperating paradigm, or it might cause both to defect and enter a mutual defection paradigm, which will continue until the next mistake. In the long-run, each player will receive each of the four possible payoffs in the prisoner’s dilemma equally often, yielding a lower average score for both than mutual cooperation.

75 they were not retaliated against, making populations of Win-Stay Lose-Shift players less vulnerable to invasion by outsiders. [66,67].

4.1.2 A Biological Basis for a Restriction to Two Strategies

In this chapter, we consider the idea of individuals who have some capacity to learn and change their strategy, but have a partial restriction to a subset of all possible strategies. Biology provides a basis for why such a restriction might apply to individuals. For instance, if one species of animal is capable of eating nuts or berries, and another species is capable of eating berries or meat, then if we construct a game where strategies correspond to choosing which type of food to pursue then individuals from the two species can be seen as participating in the same game, but each with a restriction on which strategies they are able to pursue based on unchangeable biological constraints. Individuals could change strategies in response to previous events, while still being the same individual of the same species, rather than dying and being replaced by a new generation as replicator dynamics typically assume. For a more concrete example, Sinervo and Lively [68], as well as Bleay et al. [69] study males of the side-blotched lizard Uta stansburiana, which mature into one of three primary phenotypes depending on the amount of testosterone they produce. These phenotypes develop different colors on their throats, as well as different reproductive strategies. Orange-throated males aggressively defend a large territory with many females, blue-throated males defend a smaller territory more carefully, and yellow-throated males sneak through the territories of other males to mate with the inhabiting females and avoid conflict when confronted by mimicking female behavior. The authors represent these phenotypes as strategies in a game with cyclic dominance, as in Rock-Paper-Scissors. Orange-throats outcompete blue-throats by aggressively conquering their territory and access to females, blue-throats outcompete yellow-throats by carefully guarding their territory from infiltrators, and yellow-throats outcompete orange-throats by sneaking through many territories and mating with females that the orange-throats cannot guard as carefully due to the large area each controls. This leads to frequency dependent selection creating a cycle in the numbers of each phenotype over generations. Mills et al. [70] find that these lizards can also develop mixed phenotypes seeming to correspond to alleles mixing two of the primary phenotypes, with throats striped with the two corresponding colors. The lizards appear to display behavior according to Orange-Yellow-Blue dominance in the genes, but with some noticeable differences. Sinervo et al. [71] find that individuals with blue and yellow alleles can change in behavior

from yellow to blue. As the breeding season goes on, some lizards die out, freeing up the territory that they held. This tends to be disproportionately the aggressive orange-throats, which reduces their frequency in the population and increases the fitness of blue-throats. Late in the breeding season, yellow-throats in unclaimed territory that have the blue allele are able to undergo a transformation that increases testosterone production, and alters their behavior and appearance to become more like blue-throats. They then claim territory and act as a blue-throat, taking advantage of the lower frequency of orange-throats later in the season. However, the authors find this ability to change phenotypes is limited, as it cannot be reversed, and none of the other genotypes can transform. This example shows a biological basis for why individuals might be restricted to two strategies in particular, as genes carry two alleles that could be important for determining strategic behavior. Although Win-Stay Lose-Shift is not the best description of the transformation here, it can be seen as a learning mechanism in other animals.

4.1.3 A Biological Basis for Win-Stay Lose-Shift

In some sense, Win-stay Lose-shift can be viewed as a discrete and short-memory version of reinforcement learning or instrumental conditioning (hence its alternate name "Pavlov"), so its presence in instinctive behavior is expected. Many studies have observed this kind of behavior in animals. For example, Chalfoun and Martin [72] study the Brewer's sparrow (Spizella breweri) and their nesting habits. They find that nesting locations vary along many dimensions, such as shrub height and density, and that sparrow couples seem to enact a win-stay, lose-shift strategy in choosing these characteristics: being more likely to change them from their previous nest if the nest was predated upon compared to if it was successful. McCoy and Platt [73] study risk-seeking behavior in rhesus macaques, giving them a choice between a safe option with a medium reward of juice, or a risky option with a chance of a small reward or a large reward. The authors primarily focus on neural activity in the monkeys based on the outcomes, but they also find that the monkeys are more likely to choose the risky option again after receiving a large reward compared to a small reward, which corresponds to win-stay lose-shift behavior. The same behavior can also be observed in humans. Hayden and Platt [74] perform an experiment on humans similar to that of McCoy and Platt, using Gatorade instead of juice, and find that humans also have a tendency to use a Win-Stay Lose-Shift strategy: they are more likely to continue choosing the riskier strategy after receiving a good outcome from

it. Worthy et al. [75] examine human behavior in the Iowa Gambling Task, a game in which participants repeatedly choose one of four decks to draw cards from, and receive negative or positive scores based on each card. Two of the decks have many cards with high positive payoffs, but enough low payoffs that drawing from them yields negative expected value. The other two decks have cards with low positive payoffs, but smaller or rarer negatives such that drawing from them yields positive expected value. They analyze previous literature on the Iowa Gambling Task, and compare several decision-making models used in that literature to Win-Stay Lose-Shift. They then perform their own experiment with human subjects, and find that, of the models they consider, Win-Stay Lose-Shift provides the best fit to the behavior of about half of the subjects in their study, and another called Prospect Valence Learning, where players keep track of their expectancy for each deck with some distortions such as scope-insensitivity, best fits the behavior of the other half. However, Win-Stay Lose-Shift behavior is not universal. Olton and Schlosberg [76] find that rats seem to show the opposite behavior: a Win-Shift strategy. They placed rats in a maze with several paths, put food in some of the paths, and had the rats run through the maze multiple times, replacing food according to different protocols to reward different strategies. They find that the rats more easily adapt to protocols that reward Win-Shift behavior, where food is placed in different paths each round, compared to Win-Stay protocols, where food is placed in the same paths. They also find that when food is replaced such that every choice results in an equal reward, rats still prefer a Win-Shift strategy. They speculate that this behavior is due to natural foraging behavior in rats, where scavenged food will be exhausted and exploratory behavior is more likely to yield success. Means [77] find that this tendency in rats to prefer win-shift strategies does not apply in all situations. They conduct experiments with a water-escape scenario, where a maze is partially flooded and rats must find an elevated platform in order to escape. Rats were either trained in win-stay trials, where escape platforms were placed in the same locations in succession, or win-shift trials, where escape platforms were placed in opposite locations. They find the rats were more likely to learn the win-stay behavior than the win-shift behavior. They also find that when rats did not learn the correct behavior, they instead "perseverated", retracing their steps to check the same locations in the same order as they did in their previous trial, regardless of the final location of the platform. This is sort of a Win-Stay strategy if the entire maze is counted as a single strategy,

given that this exact sequence of turns did eventually lead to an escape last time. But it is distinct from the study's defined Win-Stay response, which would be to remember the actual location of the platform in the previous round and head directly there.

4.1.4 Our Model

We wish to create a model implementing Win-Stay Lose-Shift dynamics on a game with more than two strategies, in particular the game Rock Paper Scissors. This immediately creates an issue of ambiguity. When a player decides to stay, they should make the same choice they did previously, which is unambiguous. When a player decides to shift, they have multiple strategies which they did not play previously and to which they could shift. All of the game theory models of Win-Stay Lose-Shift that we've discussed involve games with two strategies. The experimental models occasionally have more than two possible choices, but their analysis groups alternative choices into a "shift" category and does not specify which of these alternative choices is chosen. This is appropriate when these choices are not meaningfully distinct, such as in a maze where one path leads to a reward and all other paths do not. However, it does not work in a game theoretic model where the best strategy will depend on what strategy the opponent is playing. Here, such a decision must be made explicitly. There are multiple different ways this can be decided, and therefore multiple different versions of Win-Stay Lose-Shift that might be implemented. For example, players could shift between strategies in a periodic orbit, moving to the next strategy each time they lose; players could shift to copy the opponent that beat them; players could shift to the best response to their most recent opponent; or players could choose randomly between strategies with some prespecified probabilities. Players could also switch to copy the strategies of opponents in a larger population that have higher average payoffs (which, if implemented with the right parameters in a continuous population, would yield dynamics identical to the replicator equation). Each one of these, and other possible implementations, would lead to different dynamics in many circumstances, despite all being reasonably described as "Win-Stay Lose-Shift". Our model makes this decision by using the idea of restriction to subsets of strategies in a manner analogous to the previously described demonstration using cards. We initialize a heterogeneous population where each player has a fixed card including only two of the three possible strategies in the game we study, Rock Paper Scissors, only one of which is active at a time. Each player then implements Win-Stay Lose-Shift by

switching between the two strategies on their card each time they lose. This learning rule is simplistic: players have barely any memory and change their decision based entirely on the results of their most recently played game. However, it still leads to interesting results when the population as a whole is considered.

4.2 Discrete Model

We consider a stochastic dynamical system consisting of a game, a population of players, and a "card" assigned to each player containing two distinct pure strategies from the game. Let G be the game "Rock, Paper, Scissors" with the symmetric payoff matrix

                Player 2
                 R         P         S
Player 1   R    0, 0     −1, 1     1, −1
           P    1, −1     0, 0    −1, 1
           S   −1, 1      1, −1    0, 0

Let P be a finite population of players of size N. Each player begins with a card containing two of the three possible strategies in {R,P,S}, one of which is "face up", representing the player's currently planned active strategy, and the other "face down", representing a strategy in reserve that the player knows how to play but is not currently using. At each time step, two players are chosen uniformly at random from the population to play against each other. Each player plays the strategy that is currently face up on their own card and receives the corresponding payoffs from the game. Finally, players update their strategy according to a "Win-stay, Lose-shift" rule: if the player receives a 1 or 0, indicating a win or tie, he does not change his active strategy; if he receives a −1, indicating a loss, he flips his card, switching the places of the face up and face down strategies. In this way, players adjust their strategies in an attempt to avoid exploitation, but each player only ever plays two of the three strategies available in the game. Figure 4.1 shows an illustration of this rule occurring for one time step.

Figure 4.1. Illustration of the population dynamics for one time step

Note that there are three possible cards a player can have, corresponding to each of the three subsets of {R,P,S} of cardinality two: {R,P}, {R,S}, and {P,S}. In our model, the type of card a player has does not change during play, so one could consider the population to consist of three species of player which interact but do not increase or decrease in size. Alternatively, if we consider both the card of a player and its current facing, we can consider the population to consist of six phenotypes of player corresponding to each combination of card and facing. We denote each of these six groups by the letter corresponding to the face up strategy followed by the face down strategy (e.g., players with S face up and R face down belong to the group SR). The six groups are RP, PR, SP, PS, SR, and RS. It then follows that each of these groups can change in size when a card is flipped (e.g., players in RP can change to PR and vice versa). Let #PR(t) be the number of players in state PR at time t, and define #SR(t), #PS(t), etc. analogously. Define

A = (#PR(t) + #RP(t))/N,   B = (#SR(t) + #RS(t))/N,   C = (#PS(t) + #SP(t))/N,

to represent the fraction of players with each card {R,P}, {R,S}, and {P,S} respectively. Note these are determined by the initial conditions and are constant with respect to time. We immediately get A + B + C = 1, since the sets they count form a partition of all players. We can visualize the set of possible states S as a bounded lattice in R^3 with each coordinate of the state space corresponding to the fraction of the population currently in a particular orientation. At each time t, let

x(t) = #PR(t)/N,   y(t) = #SR(t)/N,   z(t) = #PS(t)/N,

represent the fraction of players currently in states PR, SR, and PS respectively. The remaining group sizes can then be written in terms of these variables, e.g. #RP(t)/N = A − x(t). We then have the constraints 0 ≤ x ≤ A, 0 ≤ y ≤ B, and 0 ≤ z ≤ C. In what follows we often choose A = B = 1/2, C = 0, which means the dynamics occur in the plane 0 ≤ x, y ≤ 1/2. Using this x, y, z we can define the natural isomorphism Ψ: S → R^3 which sends each state to the point with coordinates (x(t), y(t), z(t)). This then induces an isomorphism

between the dynamical system and a weighted random walk on a lattice in R^3. At each time step, at most one player will flip her card, which will correspond to exactly one coordinate either increasing or decreasing by 1/N. Thus the system will enact a random walk with the probability of the system stepping in any direction determined by the probability of the appropriate players being chosen to play.
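As a concrete illustration, the following is a minimal agent-level sketch of these dynamics. It is not the code used for the simulations reported in this chapter; it assumes Python with numpy, and all function and variable names are illustrative. Each player holds a two-sided card, two players meet at random each time step, and only the loser flips.

import numpy as np

BEATS = {("R", "S"), ("S", "P"), ("P", "R")}   # (winner, loser) pairs in Rock-Paper-Scissors

def simulate(cards, steps, seed=0):
    """cards: list of [face_up, face_down] pairs, one per player; the loser of each game flips."""
    rng = np.random.default_rng(seed)
    n = len(cards)
    for _ in range(steps):
        i, j = rng.choice(n, size=2, replace=False)   # two distinct players are chosen
        a, b = cards[i][0], cards[j][0]               # each plays its face-up strategy
        if (b, a) in BEATS:                           # player i lost: flip player i's card
            cards[i].reverse()
        elif (a, b) in BEATS:                         # player j lost: flip player j's card
            cards[j].reverse()
        # on a tie (a == b) neither card changes
    return cards

# Example: N = 8 with A = B = 1/2 and C = 0, started at x = y = 1/4 as in Figure 4.2.
population = [["P", "R"], ["P", "R"], ["R", "P"], ["R", "P"],
              ["S", "R"], ["S", "R"], ["R", "S"], ["R", "S"]]
simulate(population, steps=1000)
print([card[0] for card in population])               # face-up strategies after 1,000 games

Tracking the face-up strategies over time in such a run recovers the x and y trajectories analyzed below.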

Let φx+(t) be the probability that at time t, x will increase by 1/N in the next time step. This is simply the probability that an RP player is chosen and loses, switching to PR. Making analogous definitions for other variables, we get

φx+(t) = 2γ(A − x)(x + z)

φx−(t) = 2γx(y + C − z)

φy+(t) = 2γ(B − y)(x + z)

φy−(t) = 2γy(A − x + B − y)

φz+(t) = 2γ(C − z)(A − x + B − y)

φz−(t) = 2γz(y + C − z)

where γ = N/(N − 1). Each equation corresponds to a specific phenotype of player flipping their card, and is thus derived by calculating the probability of choosing one player of that phenotype, and one player from either of the two phenotypes that cause the first player to lose. The factor of 2 appears because order does not matter in the selection, and the γ appears due to selection without replacement. We use φ∗(t) to refer to an arbitrary one of these six functions. Note that these six probabilities sum to less than 1, since there is also some probability that the players will tie, resulting in no change to the current state.

Proposition 13 If the population contains at least one of each card type, the system has no absorbing sets.

Proof: We first consider x. If φx+ > 0 then there is a nonzero probability that x will increase in the next time step. Additionally, φx+ = 0 iff x = A or x = z = 0. If x = A then x is at its maximum value. If x = z = 0 then A > 0 and C > 0 imply φz+ > 0, so there is a nonzero probability that in one time step z will increase, and in the second step x will increase since z increasing makes φx+ positive. Thus, from any state where x is not at its maximum value of A, there is a nonzero probability of x increasing within two time steps.

Likewise, if φx− > 0 then there is a nonzero probability that x will decrease in the next time step. φx− = 0 iff x = 0 or both y = 0 and z = C. If x = 0 then x is at its

minimum value. If y = 0 and z = C then B > 0 and C > 0 imply φy+ > 0, so there is a nonzero probability that in one time step y will increase, and in the second step x will decrease since y increasing makes φx− positive. Thus, from any state where x is not at its minimum value of 0, there is a nonzero probability of x decreasing within two time steps. Since the game matrix, and thus the dynamical system, is symmetric via a cyclic permutation, this suffices to show that all 3 variables have a nonzero probability of increasing or decreasing within at most two time steps from any state, as long as doing so does not exceed the boundaries of the lattice. Thus, the system is transitive and has no absorbing sets. □

Proposition 14 The dynamical system will eventually reach a fixed point iff the population does not contain at least one of each card type.

Proof: (⇒) If the population contains at least one of each card type, Proposition 13 applies, so the system has no absorbing sets and never settles at a fixed point. (⇐) Suppose WLOG C = 0. This automatically forces z = 0; there are no PS or

SP players. Then the state s0 with Ψ(s0) = (0, 0, 0) is a fixed point. All players in this state have RP or RS cards, and thus every game results in a tie of Rock against Rock. Further, for any state s with x = 0 (there are no PR players), RS and RP players cannot lose, so all φ∗ are zero except possibly φy−, which will be positive iff y > 0. Thus, the set {s | x = 0} is an absorbing set which will eventually end up in s0 with probability 1. There are no other absorbing sets, so the state will randomly change until eventually it enters the absorbing set {s | x = 0} by chance, and shortly afterwards it will reach s0. □ It should be noted that we chose to set the threshold for staying to be greater than or equal to 0 points, which results in players staying on the same strategy in a tie, and causes a monoculture population to be a fixed point. If instead players shifted strategies when tying against a player playing the same strategy, then this extinction behavior would not occur.

4.3 Extinction in a Restricted Transition Population

We now restrict ourselves to considering the case when C = 0, which means z = 0 and

A + B = 1. By substituting 1 − A for B, the φ∗ equations simplify to

φx+(s) = 2γx(A − x)

φx−(s) = 2γxy

φy+(s) = 2γx(1 − A − y)

φy−(s) = 2γy(1 − x − y)
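These simplified probabilities translate directly into a state-level random walk on the (x, y) lattice. The sketch below is illustrative only (it assumes Python with numpy and is not the simulation code used for the figures); it advances the state one game at a time using exactly these four expressions, with the leftover probability corresponding to a tie.

import numpy as np

def step(i, j, N, A, rng):
    """One time step of the lattice walk; the state is (x, y) = (i/N, j/N), with C = 0."""
    gamma = N / (N - 1)
    x, y = i / N, j / N
    p = [2 * gamma * x * (A - x),        # phi_x+ : x increases by 1/N
         2 * gamma * x * y,              # phi_x- : x decreases by 1/N
         2 * gamma * x * (1 - A - y),    # phi_y+ : y increases by 1/N
         2 * gamma * y * (1 - x - y)]    # phi_y- : y decreases by 1/N
    p.append(1 - sum(p))                 # remaining probability: the game is a tie, no move
    di, dj = [(1, 0), (-1, 0), (0, 1), (0, -1), (0, 0)][rng.choice(5, p=p)]
    return i + di, j + dj

rng = np.random.default_rng(1)
N, A = 40, 0.5
i, j = N // 4, N // 4                    # start at x = y = 1/4
for _ in range(100_000):
    i, j = step(i, j, N, A, rng)
print(i / N, j / N)                      # the final (x, y) after 100,000 games

Accumulating the lattice points visited by such a walk over a long run is what produces the residence-time heat maps discussed in Section 4.4.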

Simulations of the system with these parameters demonstrate that while extinction, in which everyone ends up playing Rock, does eventually occur, it takes longer to happen the larger N is. Figure 4.2 shows a trajectory of the x and y coordinates of one simulation of the system with N = 8, A = 1/2, B = 1/2, initialized with x = 1/4, y = 1/4. Although some simulations wander around near the center for some time, all of them eventually hit a state where x = 0, at which point φx+ = 0 and x stays at 0 forever, with y = 0 occurring shortly afterwards. Figure 4.3 shows the average time to extinction T on a semi-log plot, measured for various population sizes with A = 1/2, B = 1/2, and averaged over many trials (ranging from 10,000 trials for the smallest populations down to 20 for the largest, as N increases and simulation time costs increase).

Figure 4.2. Trajectory of one simulation for N = 8

We see that T appears to grow exponentially. A regression done on this data yields T(N) ≈ 5.2(1.4)^N with correlation coefficient r = 0.9939. Thus, for large N we have plenty of time to study the dynamics of the system in detail prior to extinction. We speculate that the base of the exponent may in fact be √2. Since we set A = B = 1/2, each time N increases by 2 the number of lattice points available to x and to y each increases by 1, making the lattice finer, and thus random drifts less significant in absolute terms. Given that x reaching zero is the precondition for extinction, this would suggest that adding one lattice point in the x direction

causes the extinction time to double. However we leave the proof of this claim for future research.

Figure 4.3. The average time for strategies played to reach a monoculture (average extinction time) as a function of the total number of cards N, with A = 1/2, B = 1/2, C = 0.
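The exponential trend can be checked directly. The sketch below is again illustrative rather than the original code (it assumes numpy); it measures average extinction times for a range of small N and fits log T against N by least squares, so that the exponential of the fitted slope estimates the base of the growth, which the regression above puts near 1.4.

import numpy as np

def time_to_extinction(N, A=0.5, seed=None):
    """Run the lattice walk of Section 4.2 until x first hits 0; return the number of games played."""
    rng = np.random.default_rng(seed)
    gamma = N / (N - 1)
    i, j = round(A * N) // 2, round((1 - A) * N) // 2     # start at x = A/2, y = B/2
    t = 0
    while i > 0:
        x, y = i / N, j / N
        p = [2 * gamma * x * (A - x), 2 * gamma * x * y,
             2 * gamma * x * (1 - A - y), 2 * gamma * y * (1 - x - y)]
        p.append(1 - sum(p))                              # probability of a tie (no move)
        di, dj = [(1, 0), (-1, 0), (0, 1), (0, -1), (0, 0)][rng.choice(5, p=p)]
        i, j, t = i + di, j + dj, t + 1
    return t

Ns = np.arange(8, 21, 2)
T = [np.mean([time_to_extinction(N, seed=s) for s in range(100)]) for N in Ns]
slope, intercept = np.polyfit(Ns, np.log(T), 1)           # fit log T = intercept + slope * N
print("estimated base of the exponential growth:", np.exp(slope))

Whether the fitted base settles at √2 as N grows is exactly the open question raised above.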

4.4 Continuous Models

Simulations of actual trajectories of this random process are difficult to display visually, since it moves on a lattice and ends up overlapping its own path many times. To capture the essential aspects of this system, we plot the residence times at each lattice point as a heat map, shown in Figure 4.4(a) for N = 40, with A = 1/2, B = 1/2, C = 0 after t = 10,000,000 iterations. To generate this, we simulate the system for t steps, track the number of times the system has visited each (x, y) coordinate, divide by t to get the frequency, and then plot it using a fixed color gradient. The resulting heat map shows the system is most often in a particular spot somewhere around x = 3/5, y = 2/5, and spends less time the further away from this point it is. Simulations run using different N show a similar heat map, centered at the same x and y values despite those corresponding to different absolute values for #RP and #RS. As N gets larger, the plots have a higher resolution in the (x, y) plane, spending less time in each particular state, and also showing a tighter spread around this center point. Figure 4.4(b) shows a heat map using the same process for N = 200, with an adjusted heat gradient to account for the finer lattice, and thus less time spent on each individual point.

Figure 4.4. Heat map of the discrete system on a grid for (a) N = 40; (b) N = 200.

Simulations also show that the system seems to move around this center point in a roughly counterclockwise orbit more often than other possible directions. We can quantify the existence of these behaviors by considering the following system:

∆x = φx+ − φx− = 2γ(Ax − x² − xy)
∆y = φy+ − φy− = 2γ(x − Ax − y + y²)

This shows the expected changes in x and y from any particular state. Our system will follow the direction of these equations in expectation, but will diverge from them at random due to the stochastic nature of the system. However, the larger N is the smaller the relative change each single step makes, so the less noisy the system becomes. We can make a deterministic, continuous version of this system, given by the ODE system:

ẋ = 2(Ax − x² − xy)

ẏ = 2(x − Ax − y + y²)

This has two fixed points: one at x = 0, y = 0, and one at

x0 = (3A − 2 + √(5A² − 8A + 4))/2,   y0 = A − x0,

which we compute by setting ẋ and ẏ equal to 0 and solving them as simultaneous equations. When A = 1/2 this yields x0 = (√5 − 1)/4 ≈ 0.31 and y0 = (3 − √5)/4 ≈ 0.19. This corresponds to an equilibrium in which 31% of players play Paper, 19% play Scissors, and 50% play Rock. This differs from the Nash Equilibrium of the game, in which every player plays each strategy with probability 1/3, because every player in the population has Rock on one side of their card, while only half possess each of Paper and Scissors. As

a result, Rock cards show up more often relative to their win-rate, which in turn causes Paper to have a higher win-rate and Scissors to have a lower win-rate. The Jacobian of this system at the central fixed point has characteristic polynomial λ² + (1 − 2A + 3x0)λ + (2x0² + 2x0 − 3Ax0), whose eigenvalues have negative real part for all 0 < A < 1. Thus, this fixed point is stable in the continuous model. The fixed point (0, 0) is unstable. Figure 4.5 shows trajectories for several simulations of this deterministic system with various initial conditions, all converging towards the central fixed point.

Figure 4.5. Trajectory for the deterministic system, with equally spaced initial conditions around the plane, for A = 1/2.

Figure 4.6. Parameterized curve of the x and y coordinates of the equilibrium point as functions of A, with A = 0 at (0, 0), and A = 1 at (1, 0)
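The location and stability of this equilibrium are easy to check numerically. The following sketch is illustrative only (it assumes numpy; the partial derivatives are taken directly from the ODE system above) and evaluates the fixed point and the eigenvalues of its Jacobian.

import numpy as np

def fixed_point(A):
    x0 = 0.5 * (3 * A - 2 + np.sqrt(5 * A**2 - 8 * A + 4))
    return x0, A - x0

def jacobian(A):
    """Jacobian of (xdot, ydot) = (2(Ax - x^2 - xy), 2(x - Ax - y + y^2)) at the interior fixed point."""
    x0, y0 = fixed_point(A)
    return np.array([[2 * (A - 2 * x0 - y0), -2 * x0],
                     [2 * (1 - A), 2 * (2 * y0 - 1)]])

A = 0.5
print(fixed_point(A))                      # approximately (0.309, 0.191)
print(np.linalg.eigvals(jacobian(A)))      # a complex pair with negative real part: a stable spiral

The complex conjugate pair is consistent with the roughly counterclockwise circulation visible in the stochastic simulations.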

In the limiting case when N → ∞, the stochastic system will approach the continuous system in behavior. More formally, we relax the assumption that C = 0, and fix

A, B, C ∈ [0, 1], and some initial state s ∈ [0,A] × [0,B] × [0,C]. Let XN be the dynamical system with population N and card ratios AN, BN, CN equal to the rational numbers with denominator N that most closely approximate A, B, C. Choose initial

state sN such that Ψ(sN) most closely approximates s. Let fN(t) = Ψ(st,N), where st,N is the state with the highest probability of being reached in XN starting from state s and iterating for ⌊Nt⌋ steps. Then limN→∞ fN(t) exists, and corresponds to a continuous dynamical system governed by

ẋ = φx+ − φx− = 2(A − x)(x + z) − 2x(y + C − z)
ẏ = φy+ − φy− = 2(B − y)(x + z) − 2y(B − y + A − x)
ż = φz+ − φz− = 2(C − z)(A − x + B − y) − 2z(C − z + y)

(Note that γ goes to 1 in the limit, so it no longer needs to be included.) However this system is difficult to work with, aside from noting that it has one fixed point somewhere in the interior, which in the symmetric case when A = B = C = 1/3 will occur at (1/6, 1/6, 1/6). So we once again restrict ourselves to the case where C = 0, noting that the continuous dynamical system this converges to is the same one we defined earlier. Since the flow rates in the continuous model are proportional to the difference in probability of motion in the discrete model, this means the discrete model will be a weighted random walk with a tendency of moving towards this same fixed point. On average the system will travel in the same direction at the same rate as the continuous model, but will wander somewhat randomly along the way. For large N this noise will be small and the system will stay very close to the equilibrium point, but for small N it will wander far. If at any point the stochastic system reaches a state with x = 0, it will have entered an absorbing set in which PR players are extinct, and from there will shortly reach the extinction point at (0, 0). Meanwhile the continuous model will never go extinct unless it begins in a state with x = 0. This also matches intuitively with the observation that extinction times increase exponentially with N. The continuous system acts like a population with infinite N, so also has infinite extinction time. In general it is possible to convert a stochastic discrete system on a lattice into a continuous deterministic system by having each variable change according to the expected values of its probabilities, as we have done above. It is also possible to instead convert it into a stochastic continuous system that maintains its noisy wandering by using Fokker-Planck equations. For the discrete model with C = 0, let px,y(t) be the probability that the population state is in the position (x, y) at time t. Then px,y(t) will evolve approximately according to the master equation

d/dt p_{x,y}(t) = 2γ [ −(x(A − x) + xy) p_{x,y}
        − (x(1 − A − y) + y(1 − x − y)) p_{x,y}
        + (x − 1/N)(A − x + 1/N) p_{x−1/N, y} + (x + 1/N) y p_{x+1/N, y}
        + x(1 − A − y + 1/N) p_{x, y−1/N} + (y + 1/N)(1 − x − y − 1/N) p_{x, y+1/N} ]

Effectively, this acts as a conservation law tracking the probability mass of what state the system is in. The two terms with px,y as a factor have negative sign, and correspond to the probability that the system is already in state (x, y), and then a player loses, causing the state to shift away. The remaining four terms have positive sign, and correspond to the probability that the system is in a state immediately adjacent to (x, y), and then the appropriate phenotype of player loses, causing the state to shift into (x, y). Note however that while this model retains the discrete states for x and y, the time t is a continuous variable, corresponding to games being played randomly at an average rate of once per unit of time. It is possible to treat this as its own deterministic dynamical system with (NA +

1) · (NB + 1) equations and variables corresponding to each possible px,y and simulate it numerically. In the short to medium timescale the resulting values will converge to something much like the heat maps in Figure 4.4, since the value of each px,y corresponds to the probability of being in one of those states. However d/dt p0,0 will be small but positive at all times, and therefore it will slowly absorb all of the probability mass. Thus in the long term all of the other px,y terms will slowly exponentially decay while p0,0 approaches 1. For large but finite N, we can use the master equation to construct a parallel model that approximates the original system but with continuous t, x, and y, given by the Langevin equations

dx = 2γ(x(A − x) − xy) dt + (1/√N) √(2γ(x(A − x) + xy)) dW^x_t

dy = 2γ(x(1 − A − y) − y(1 − x − y)) dt + (1/√N) √(2γ(x(1 − A − y) + y(1 − x − y))) dW^y_t

where dW^x_t and dW^y_t are independent Wiener processes. The details of the derivation can be found in Appendix A. In short, we rewrite the p_{x±1/N, y} and p_{x, y±1/N} terms using operators, and take Taylor expansions of those. We then group the first order terms to make a "drift" term, and the second order terms to make a "diffusion" term, which allows the system to wander with magnitude proportional to the fluctuations of the discrete

system. We call this the Fokker-Planck system due to the Fokker-Planck equations used in its derivation. We then simulate the resulting system numerically and explore its behavior. Figure 4.7 shows a heatmap trajectory of the original system towards the central equilibrium point for a large value of N, and compares it to a trajectory of the Fokker-Planck system with the same initial conditions and parameters.

Figure 4.7. Individual system trajectories for total population size N = 800 cards, with the same initial conditions comparing (a) the discrete stochastic system and (b) the Fokker-Planck system.
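A direct way to simulate the Langevin system is an Euler-Maruyama discretization. The sketch below is illustrative only and assumes numpy; the step size dt and the clipping to the domain are choices made here for the sketch rather than taken from the original simulations.

import numpy as np

def simulate_langevin(N, A=0.5, x=0.25, y=0.25, steps=200_000, dt=0.1, seed=0):
    """Euler-Maruyama integration of the dx, dy equations above, with C = 0."""
    rng = np.random.default_rng(seed)
    gamma = N / (N - 1)
    path = np.empty((steps, 2))
    for t in range(steps):
        drift_x = 2 * gamma * (x * (A - x) - x * y)
        drift_y = 2 * gamma * (x * (1 - A - y) - y * (1 - x - y))
        diff_x = np.sqrt(2 * gamma * (x * (A - x) + x * y) / N)
        diff_y = np.sqrt(2 * gamma * (x * (1 - A - y) + y * (1 - x - y)) / N)
        dWx, dWy = rng.normal(0.0, np.sqrt(dt), size=2)        # independent Wiener increments
        x = min(max(x + drift_x * dt + diff_x * dWx, 0.0), A)  # keep the state inside [0, A] x [0, B]
        y = min(max(y + drift_y * dt + diff_y * dWy, 0.0), 1 - A)
        path[t] = x, y
    return path

path = simulate_langevin(N=800)
print(path[-1])      # for large N the trajectory settles near the interior fixed point

For small N the diffusion term dominates, the path eventually hits x = 0, and both terms vanish there, mirroring the absorbing extinction behavior of the discrete system discussed below.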

Figure 4.8 shows a comparison between a heat map of the original discrete system and one of the noisy continuous system after t = 1,000,000 steps. For the Fokker-Planck system, we measure the location of the system at discrete time intervals and count which lattice point it is closest to, so that the two heat maps have the same size and sections that can be compared. These maps look very similar, confirming that the Fokker-Planck system is a good approximation of the original system. Figure 4.9 shows the difference of these heat maps: the frequency of the original stochastic system minus the frequency of the noisy continuous system in each location. The maximum size of these differences is on the order of 5% as large as the measured frequencies, meaning the original system spent about 5% more time in the red region and the Fokker-Planck system spent about 5% more time in the blue region. However, due to the seemingly random spread, this appears to be mostly random noise due to the finite amount of simulation time allowed before extinction occurs. We also observed that for larger values of N and t these differences shrink in relative size compared to the measured frequencies, suggesting that this is random noise or that the Fokker-Planck system becomes a more accurate approximation as N increases.

Figure 4.8. Heat maps for the discrete and continuous systems with N = 40 after 1,000,000 steps

Figure 4.9. Heat map of the difference in the discrete system and continuous system

We also observe that the Fokker-Planck system displays similar extinction behavior to the discrete system. It also wanders noisily around the central point, but for small values of N the diffusion term is large relative to the drift term, making it wander further. Once it reaches a point with x = 0, both the drift and diffusion terms for dx become 0, guaranteeing that x will remain 0, and the system will shortly reach (0, 0) just as the discrete system would. Figure 4.11 shows the trajectory of one simulation of the Fokker-Planck system for N = 8. It displays similar behavior to the system in Figure 4.2, and reaches (0, 0) in approximately the same amount of time. We measured

extinction times for many N and found that extinction times for the Fokker-Planck system also grew exponentially, and were very close to those of the discrete system, shown in Figure 4.10. Extinction times were nearly identical for the Fokker-Planck system for small N, but seemed to grow more slowly as N increased, being around half as long by N = 30. This is somewhat surprising, as we expect the Fokker-Planck model to become a better mimic of the discrete model as N increases. We speculate that this has to do with the proximal cause of extinction, when the system happens to hit x = 0 in its random wandering, being sensitive to even small changes in parameters or wandering behavior. However, the actual explanation for this difference in trends is not immediately clear, and is worth looking into in future research.

Figure 4.10. Extinction Times of discrete system and Fokker-Planck system on a log scale plot, as well as lines of best fit.

4.5 Discussion

The model defined here demonstrates one method of implementing Win-Stay Lose-Shift on games with more than two strategies. We find that the system wanders stochastically around a point corresponding to a central fixed point in the continuous version, with less noise in systems with larger populations. We find the fixed point is influenced by how many players there are with each type of card, and does not correspond to the Nash Equilibrium of the game (except when there are equal amounts of all cards). Additionally, we find that the system will eventually reach a fixed point where all but one strategy is extinct if and only if at least one type of card is completely absent from the population.

Figure 4.11. Trajectory for Fokker-Planck system with N = 8

Future research might analytically derive some of the trends we observe numerically, such as the extinction time of a population in the discrete or Fokker-Planck models, or the spread and shape of the heat maps that center around the fixed point. Additionally, this method of learning dynamics with cards could be introduced to other games in a similar way, which we anticipate would lead to similar dynamics, depending on the specific game chosen. The system would likely wander stochastically around some fixed point, but if the game and cards given to players allowed any states in which certain players could no longer lose, the system would eventually reach that state or set of states by chance and would no longer leave it. The method of restricting players to strategies on cards emulates biological systems with differing phenotypes of organisms with limited but overlapping access to the total strategy space. Organisms which can only perform one strategy are modeled using replicator or other evolutionary dynamics, where strategies only update via death and birth. Organisms which compete only with other organisms of the same phenotype are typically modeled using various learning rules that treat all organisms equally. Here we demonstrate modeling techniques that could be used for shared environments between different species, or phenotypes of the same species, that have overlapping but nonidentical strategy spaces. This could be expanded by considering other learning dynamics for selecting behavior on a restricted subset of strategies, such as keeping track of payoffs and switching to strategies within their subset that score higher on average. This could also be expanded by combining our techniques with long-term birth/death dynamics, where instead of having the number of players of each card type be fixed, we treat each

type of card as a species and allow the population to evolve over time based on the average fitness of each species of card. Players could react to short-term trends in the environment by switching strategies, but long-term trends which made certain subsets of strategies more valuable would lead to changes in the frequency of each card in the overall population.

Chapter 5 | Neural Networks Playing Games

5.1 Introduction

5.1.1 Machine Learning

In this chapter we explore machine learning in neural networks in the context of game theory. Machine learning is a technique where programs develop the ability to perform tasks or solve problems by learning from examples rather than being explicitly programmed by a human designer. One of the first such concepts was put forth by Rosenblatt [78] in 1958 in the form of a perceptron, an artificial neuron that can convert inputs to outputs using a linear function, and can adjust the function in response to feedback from training sets. Later, more complicated networks were made of several such neurons connected to each other. A standard feed-forward network consists of several ordered layers of neurons: an input layer, some number of hidden layers, and an output layer. Each neuron has weighted connections to each of the neurons in the adjacent layers. When data is given to the network, the neurons in the input layer have values representing the data. The data is then fed forward by having each neuron in the next layer receive the weighted sum of all of the values in the neurons it is connected to, based on the strength of those connections. This value is then used by an activation function (typically a step function or sigmoid curve) to create the output of that neuron. This continues forward until it reaches the output layer, which represents the output of the network. During training, the network is given data which has already been labelled with the correct output, or some other mechanism for measuring success. The network takes the input, computes its output, and then compares that to the desired response. If its output was not correct, the weights are then adjusted in response by some backpropagation algorithm, which changes the weights of each connection in proportion to how influential they were in

causing the incorrect output [79–81]. This might be accomplished by gradient descent, which computes the gradient of the error or some related loss function E with respect to the weights of the network, and then adjusts the network weights by −k∇E, where k is a constant that adjusts the learning rate of the network. Higher values of k will cause the network to learn faster and thus reduce training times, but can also cause it to overshoot minima in the error function and hinder its ability to converge. In a feed-forward network, the relevant derivatives corresponding to weights connected directly to the output are simple to calculate directly, while weights in previous layers can be computed using the chain rule. Many terms in these computations are shared between several weights, so rather than compute each term separately for each weight, backpropagation algorithms compute and keep track of values and use them multiple times to avoid redundant calculations. This training process is then repeated multiple times, each repetition changing the network slightly in an attempt to decrease its error. For a finite training set, the loss function can be averaged over all of the data with updates done all at once, which is known as batch training. It can also be computed separately over each individual set of inputs, and then backpropagated in sequence. Because the gradients add linearly, for low k these will yield approximately the same results, although there will be a slight difference because each time the network's weights change its performance on future games will be different. This process is then repeated multiple times. As the network's error decreases, the training process slows. The gradient flattens out if it reaches a local minimum in the space of possible weights, typically causing the network to converge to such a minimum. However, sometimes the network gets trapped in a local minimum which is not a global minimum. If the error function is shaped in a certain way, then networks can end up not training to their fullest potential and getting stuck with a high error because the training algorithm found an area with a basin of attraction in the gradients around it. Stochastic gradient descent is a similar training method which randomly samples the training data and thus causes the network's path through weightspace to be more erratic. Networks might end up trapped in local minima temporarily, but the stochastic training is more likely to wander outside of small basins of attraction, so if a better global minimum is nearby then it is likely that the network will eventually find it. Neural networks, especially large ones, often suffer from a transparency issue, where they are able to perform the task they are intended to perform, but it is difficult for humans to understand how they are performing it. For example, when viewing a photo and attempting to identify if it is a picture of a bird or not, a human might look for

components such as wings, a beak, and a general body shape that she associates with birds. If she finds enough of these subcomponents and they fit together nicely then she concludes there is a bird. An artificial neural network might likewise have neurons that correspond to components of the image that it associates with birds, but these components are not clearly labelled as "wings" or "beak", and likely do not individually correspond to such human-legible concepts. Rather than have a neuron that fires when the picture has wings in it, the network might have a hundred different neurons that play a role in wing identification, each of which represents some collection of pixels and only fires 20% of the time that wings are present. None of these neurons would represent something a human can easily and intuitively identify, and only when included in the whole network do such neurons accomplish their purpose of increasing the probability that the network correctly identifies a bird. The training algorithm, like evolution, is much better at ensuring that the network eventually accomplishes its task than ensuring that it does so in the simplest or most human-legible manner [82,83].

5.1.2 Game theory and adversarial networks

Some research has been done studying neural networks in the context of game theory. Bhatia and Golman [84] study a Bidirectional Associative Memory network, which contains neurons arranged in a bipartite directed graph, with one neuron corresponding to each possible strategy from one of the two players, and each neuron is connected to another with a directed connection if and only if it is a best response to that strategy from the other player. It then iterates through time, like a finite Markov process or cellular automata, with neurons firing or not based on the neuron values in the previous state. The authors show that if the network eventually stabilizes it must stabilize on a state corresponding to a Nash Equilibrium, and they demonstrate its behavior in a number of examples. However their model must construct a new network with fixed connections specifically designed for each game. In some sense, each of these networks does some form of learning as it changes its strategy over time based on the best responses it has recorded until it finds a Nash equilibrium, but a more accurate description would be that it computes a Nash equilibrium from a set of best responses hard-coded into it. Each network is fixed and does not update its connections via backpropagation or any other method, and can only solve a single game that it’s designed to solve. Schuster and Yamaguchi [85] discuss the application of game theory to the behavior of neurons themselves. They model two neurons with a connection between them as players in a coordination game, where each chooses to fire or not fire and receives a

higher payoff if they do the same action as the other. They discuss various aspects of game theory such as equilibria and incomplete information, and how they relate to the neuron behavior, as well as some learning rules for the neurons. Choudhury and Swapan [86] design neural networks that can approximate a game theoretic solution to the allocation of power transmission losses more quickly than conventional methods. Companies generating electric power in an area share the same transmission lines to distribute their power to consumers. During transmission, some power is lost, the cost of which must be borne by the generating companies. However, the power loss over each line is quadratic in the amount of power transmitted, not linear, and therefore the assignment of loss to each company is a nontrivial problem. Several game theoretic approaches produce good solutions to this allocation, but require exhaustive information and lengthy simulations that are not feasible to compute in real time as power usage fluctuates. Choudhury and Swapan take one such approach, and design a neural network that can be trained on solutions generated by the game theory simulations. The network can then generate solutions that approximate this approach, but quickly enough to be usable in real time. However, none of these studies contain multiple networks competing against each other. Research has been done involving competing networks outside of game theory, such as in generative adversarial networks. Two neural networks are constructed in distinct roles: a generator and a discriminator. The discriminator is randomly given data either from some training data, or from the generator, and is rewarded for correctly identifying which source the data came from. The generator is given a source of random noise, and uses that to construct data of the same type as the training data, and is rewarded for successfully tricking the discriminator. Thus, the networks work in an adversarial manner, each trying to outcompete the other since their goals are in direct opposition. But in the long run they strengthen each other, since any easily detected flaws in one will be exploited by the other and cause the flawed network to remove the flaws. This technique not only exposes the discriminator to a wider variety of data to learn from, but allows the unsupervised training of generative networks that can create new content to mimic existing content, since the discriminator obviates the need for a human to manually evaluate the generator's outputs during training [87]. We note that generative adversarial networks are fundamentally asymmetric, each having different types of inputs and outputs that accomplish different tasks. While this means this method is not directly applicable to a game theoretic approach, it still provides a foundation we can use in our approach.

5.1.3 Our model

In this chapter we seek to apply the notion of adversarial networks to game theory by constructing neural networks which play arbitrary 2x2 games against each other. Each network receives the payoffs of the game as inputs, computes a mixed strategy, and plays it against the other network. They then receive the expected value of their strategy given the mixed strategies both played as payoffs. The networks are then updated via backpropagation based on how their received score compares with how well it was possible to do given their opponent's strategy. Both players are trained simultaneously against each other, and thus learn not only general techniques in game theory, but also about the specific opponent they're being trained against. It is not sufficient simply to figure out what the Nash equilibrium of a game is, because if the opponent is an imperfect reasoner in a predictable way, then it is rational to exploit its weaknesses. This is not the same as generative adversarial networks, since neither network is attempting to replicate the input data; instead both networks are placed in symmetric roles as players. However, it still maintains a somewhat adversarial relationship between them, where in some games each network's goals will be in direct opposition to the other's, while in other games their payoffs will be aligned. Thus the two players strengthen each other by exploiting weaknesses and forcing the other player to adapt. The learning process of these neural networks may be thought of as a complex dynamical system which is sensitive to the initial conditions of the networks as well as the games randomly chosen during training. This makes it difficult to prove general theorems that apply to all such networks. We attempt to simplify things by using networks which are smaller than networks typically used in the literature. We then employ techniques from experimental mathematics to detect patterns in the data we present and make conjectures about how these patterns emerge and what they represent in the context of game theory.

5.2 The Model

5.2.1 Construction

We construct neural networks and train them to play arbitrary 2x2 games against each other. Each network is a feed forward fully connected network, with 8 neurons in the input layer, nl hidden layers containing nn neurons in each layer, and 2 output neurons. In particular, a feed forward network consists of a collection of neurons which act as

vertices in a graph, each containing some integer corresponding to their layer. Each neuron in a layer is connected by a weighted edge to every neuron in the previous and subsequent layers, but not those in its own. We typically set nl = 2 and nn = 8, as experimentation with these values suggests this yields a good balance between accuracy and simplicity. We initialize each connection with a weight chosen at random from a Gaussian distribution with mean 0 and variance 1, but the specific mechanism for choosing initial values does not have much importance in the eventual structure of the network. Some networks contain bias weights, which connect to an extra neuron that always outputs one. This allows the network to add constants to the functions and thus adjust the threshold value necessary to fire. Due to the symmetry of the two strategies in the games and our error function, we do not anticipate the need for bias terms in playing games, as typically neurons will be comparing payoffs to other payoffs and not to constant values. Thus we choose not to include bias terms in our networks for the sake of simplicity, though future research might investigate whether their inclusion would lead to interesting results in some way. Two such networks are paired together, one in the role of Player 1, and the other in the role of Player 2. In each round, a random 2 × 2 game is generated, with the 8 values in both players' payoff matrices all chosen uniformly at random from the interval [−1, 1].

                Player 2
                 sC         sD
Player 1   sA    a1, b1     a2, b2
           sB    a3, b3     a4, b4

Table 5.1. Payoffs for an arbitrary 2 × 2 game.

These 8 values are placed in the input neurons of the network, and are then propagated forward through the network, with each neuron using the activation function S(x) = 1/(1 + e^−x). In particular, neuron j receives xj = Σ wi,j vi, summed over all neurons in the previous layer, where vi is the value of neuron i in the previous layer, and wi,j is the weight of the connection between neurons i and j. Neuron j then computes vj = S(xj), and, along with all of the other neurons in its layer, sends this to the next layer. The process is repeated until all neurons compute their values. We then interpret the outputs by having the network choose to play the mixed strategy s = (A/(A+B), B/(A+B)), where A is the value of its first output neuron, and B is the value of its second output neuron. (Due to the symmetry of the backpropagation algorithm we use, A + B ≈ 1 except in unusual circumstances, though this property is

not required for the network to function.) Note that we could get a similar network by using one output neuron and interpreting its output as the probability of strategy 1 and its complement as the probability of strategy 2, but the construction with multiple output neurons can be generalized to larger game matrices more easily, and the behavior on 2x2 games is identical. Both players then play their mixed strategies against each other and receive payoffs according to the expected value of their strategies (rather than instantiating a particular instance stochastically). Each network is then updated via backpropagation according to the error function E(s1, s2) = Pbest(s2) − Pactual(s1, s2), where Pbest is the expected payoff the player could have received by playing the best response to their opponent's strategy, and

Pactual is the expected payoff they receive given the strategy they actually chose against their opponent. Note this error is 0 if and only if the player chooses a best response to their opponent's strategy. This error value is backpropagated through the output neuron corresponding to the best response, and the negative of this value is backpropagated through the other output neuron. For example, if A is the best response, then each weight is adjusted by EδA − EδB. This is possible because the best response is always a pure strategy, except in the case where the player is indifferent between all strategies, which would make the error function 0 for all choices. Note that this error function does not necessarily incentivize players to play Nash equilibria, but instead rewards exploitation of the opponent's mistakes. Further, it is only possible for a network to receive 0 error if it responds perfectly to its opponent in each particular game, which even a perfectly rational player could not do unless they also had the ability to predict their opponent's actions. This means the networks don't just learn about game theory in general, but also about predicting the particular opponent they are playing against, which is also learning and thus changing behaviors at the same time. Figure 5.1 shows a network that has been trained for 1 million games in the process of playing a round of the Prisoner's Dilemma. Circles represent individual neurons, while blue lines represent connections with positive weights and red lines represent connections with negative weights. The bolder the color and thickness of a connection, the greater the absolute value of its weight. The small upper number in each neuron is the sum of the values passed to it from the previous layer for this particular game, x, and the large number in the center is the activation value S(x). In this particular version of Prisoner's Dilemma, strategy 2 corresponds to defection, and the network's output is approximately the pure strategy that chooses this.

Figure 5.1. Network playing the Prisoner's Dilemma. Blue lines are positive weights
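For concreteness, the following is a minimal sketch of the forward pass described in this section. It is illustrative rather than the training code used here, and assumes Python with numpy, nl = 2 hidden layers of nn = 8 sigmoid neurons, no bias terms, and N(0, 1) initial weights, as above; all names are illustrative.

import numpy as np

rng = np.random.default_rng(0)

def make_network(sizes=(8, 8, 8, 2)):
    """One weight matrix per pair of adjacent layers, weights drawn from a standard Gaussian."""
    return [rng.normal(0.0, 1.0, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def play(network, payoffs):
    """payoffs: the eight entries a1..a4, b1..b4 of a 2 x 2 game; returns the player's mixed strategy."""
    v = np.asarray(payoffs, dtype=float)
    for W in network:
        v = sigmoid(v @ W)              # each neuron applies S(x) = 1/(1 + e^-x) to its weighted input sum
    A, B = v                            # the two output neurons
    return np.array([A, B]) / (A + B)   # normalize to the mixed strategy (A/(A+B), B/(A+B))

game = rng.uniform(-1, 1, size=8)       # a random 2 x 2 game with payoffs in [-1, 1]
network = make_network()
print(play(network, game))              # probabilities of playing strategies 1 and 2

An untrained network like this one plays essentially arbitrary mixed strategies; the error function defined above is what pushes it toward best responses during training.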

5.2.2 Errors

We define the accuracy of a network by summing the score it receives over many games, and dividing it by the total score it could have received over the same games against the same opponent had it always chosen the best response. We then define the error score of a network as one minus this number. This is technically different from the error function we use in training, being a normalized version of it that divides by the maximum possible payoff averaged over many games. Under this measuring method, an untrained network or a player who chooses strategies completely at random will have an accuracy of approximately 0%, or an error of 100%. A network which seeks to minimize its payoff could receive a negative accuracy, and one which always chooses the worst response would have an accuracy of approximately −100%, or an error of 200%. Note that we could adjust this by randomly generating games with payoffs in [0, 1], which would lead to the maximum error being 100% and untrained networks averaging 50%. Networks trained under our usual parameters gradually improve from 0% to about 95% accuracy, or a 5% error rate, at which point they no longer consistently improve, and

gradually fluctuate around this rate as they adjust weights to improve at recently played games but lose accuracy in other games at about the same rate. Note that this does not mean that networks choose the best response 95% of the time, but that they receive 95% of the total possible points averaged across all games. Networks will be incentivized to make changes that increase performance on games with large differences between payoffs, even if such changes reduce their performance on games with smaller payoff differences. Figure 5.2 shows the error of a new network being trained over time.

Figure 5.2. Error of a network over time
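The error score can be written down compactly. The sketch below is illustrative only (it assumes numpy, with payoff matrices laid out as in Table 5.1); it computes the per-game training error E = Pbest − Pactual and the normalized error score over a batch of games, and checks that a uniformly random player scores an error near 100%.

import numpy as np

def expected_payoff(P, s1, s2):
    """Expected payoff to the row player with payoff matrix P when mixed strategies s1, s2 are played."""
    return s1 @ P @ s2

def best_response_payoff(P, s2):
    """Payoff of the best pure response to the opponent's mixed strategy s2."""
    return max(expected_payoff(P, np.array([1.0, 0.0]), s2),
               expected_payoff(P, np.array([0.0, 1.0]), s2))

def game_error(P, s1, s2):
    """Per-game training error E(s1, s2) = P_best(s2) - P_actual(s1, s2)."""
    return best_response_payoff(P, s2) - expected_payoff(P, s1, s2)

def error_score(payoff_matrices, strategy_pairs):
    """One minus accuracy: total received payoff divided by total best-response payoff."""
    received = sum(expected_payoff(P, s1, s2) for P, (s1, s2) in zip(payoff_matrices, strategy_pairs))
    possible = sum(best_response_payoff(P, s2) for P, (s1, s2) in zip(payoff_matrices, strategy_pairs))
    return 1.0 - received / possible

rng = np.random.default_rng(0)
matrices, pairs = [], []
for _ in range(10_000):
    matrices.append(rng.uniform(-1, 1, size=(2, 2)))            # player 1's payoff matrix
    p, q = rng.uniform(size=2)
    pairs.append((np.array([p, 1 - p]), np.array([q, 1 - q])))  # two random mixed strategies
print(error_score(matrices, pairs))                             # close to 1.0, i.e. an error near 100%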

Networks trained in this manner have almost all of their error in games without any pure strategy equilibria. We trained networks together initially, then tested them on games against the same opponent without any further learning and measured their error rates. We trained three pairs of networks together, measured their errors, and averaged the errors from the pairs. The networks received about 4.5% error averaged over all games. We classified 2x2 games into three categories based on their Nash equilibria and measured errors on each category: games with one pure strategy equilibrium, games with two pure and one mixed equilibria, and games with only mixed strategy equilibria. The first column of Table 5.2 shows the average error of a network in games of each subcategory. Note that the three categories occur with different frequencies when payoffs are chosen uniformly at random, such that the weighted average of the errors is 4.52% despite such a high error in games with only mixed equilibria, which occur infrequently. The networks' weights never converge, fluctuating and causing the networks to change in structure over time. This could be due to the training method being inadequate

to properly train mixed strategies in the way we represent them, keeping errors above 0 and causing the network to constantly attempt to adapt. This results in a sort of Red Queen dynamic, with each network constantly adapting in an attempt to exploit the behavior of the other and to avoid exploitation in the games that are chosen to be played, but such changes make them vulnerable to different strategies in other games that have not been played recently. The weights in our networks tend to average about 5 in magnitude, which yields neurons that sometimes activate near 1 or 0, but sometimes in between, allowing for proper mixed strategies to be played.

Errors of networks          Same opponent   Different opponent   Frozen, same opponent   Frozen, different opponent
Average on all games            4.52%            5.76%                 3.34%                    6.25%
One pure equilibrium            1.04%            1.04%                 0.91%                    1.14%
Two pure equilibria             7.39%           16.14%                 7.86%                   16.93%
Only mixed equilibria          72.26%           69.25%                42.33%                   82.86%

Table 5.2. Errors for networks playing against their original opponent (col. 1), against a different opponent (col. 2), as well as errors for networks trained against a frozen opponent, against that opponent (col. 3) and against a different opponent (col. 4).

To test the robustness of the training, we also had the networks compete against opponents who had been trained in different pairs, and thus were not optimized for playing against each other, shown in column 2 of Table 5.2. This leads to a higher average error, coming exclusively from games with two pure strategy equilibria, which are coordination games. This is indicative of the fact that networks that have not been trained against each other do not have the ability to learn from the past behavior of their opponent, while networks being tested against the same opponent they trained with do, and can thus achieve consensus on which equilibrium to select over time. To investigate this notion further, we trained two new networks together for 1,000,000 games to initialize them, then periodically measured their strategy choice in a standard coordination game over time as they continued to train for another 100,000 random games. Figure 5.3 shows the strategy chosen by each network at times sampled periodically during this interval. This shows that behaviors drift back and forth, but that a strong drift away from equilibrium by one player tends to be quickly followed either by that player shifting back, or by the other player shifting to match the new equilibrium.

Figure 5.3. Coordination behavior over time

We also investigate the possibility of error arising from constantly moving target functions. Because the error function depends on the best response to the other player's actual behavior, the behavior that will yield 0 error when played against one network on a certain game may have significant error against another network that chooses a different action.

Figure 5.3. Coordination behavior over time

Even when playing against the same network over time, if that network is learning then its behavior will change over time. So even if a network encounters the same set of inputs twice at different times during training, the best response may not be the same both times if the other network has changed its behavior in the meantime. To measure the impact this might have on the error, we allow a network to train against an unchanging opponent. In particular, we first train two networks together for 1,000,000 games. We then freeze one of the networks and no longer backpropagate it in response to games. We then play another 1,000,000 games while only allowing the other network to learn from the results, allowing it to fine-tune its responses to the behavior of the frozen network. We then measure the error of the unfrozen network without further training. Column 3 of Table 5.2 shows the error received by the unfrozen network, averaged over several networks trained using this technique. We find that the average error is lower, but broken down by subcategory this improvement comes almost entirely from games with only mixed strategy equilibria. This makes sense, as these are the games where behavior fluctuates most strongly during training, and where the players' interactions are most adversarial. If a network is trained against a frozen network, it has more time to fine-tune itself to take advantage of its opponent's mistakes and find the best response in each scenario. However, the error is still much greater than 0, so the changing opponent does not account for the entirety of the error.

Is it possible that this increase in performance against a frozen opponent behaves as some form of over-fitting, which would reduce the network's performance against other networks? It is entirely possible that the network would abandon many well-rounded safe strategies in favor of aggressive exploitation if those worked against an unchanging opponent that cannot take advantage of any exposed weaknesses. To test this, we paired networks that were trained against a frozen network against other networks, the results of which are in column 4 of Table 5.2. The mean errors show a small increase in all categories compared to column 2. A 2-tailed t-test on the overall average between these categories yielded p = 0.539, which is not statistically significant. However, this data was measured from 6 pairs of networks, so there remains a possibility of a small difference that this test was not high-powered enough to detect. We also investigate the performance of our networks when paired with players who always play Nash equilibria. When a game has a unique Nash equilibrium, the Nash player always chooses it, whether it is pure or mixed. However, when a game has more than one Nash equilibrium (two pure and a mixed), there is some ambiguity in what a Nash player ought to do. Rather than creating a mechanism to select between the two pure equilibria, we have the Nash player always choose the unique mixed equilibrium. We find that when paired against such players, networks obtain about 1% error. This is much lower than networks playing against each other typically score, and occurs primarily because a mixed strategy Nash equilibrium requires that the opponent be indifferent between all of their strategies, receiving the same payoff regardless of their choice. This means that the network automatically receives 0 error when it plays a game against an equilibrium mixed strategy, regardless of its own decision. Given that most of the error came from mixed strategy games, most of it goes away immediately. Note, however, that this seems to be an artifact of our error measuring function, and does not correspond to an increase in the actual payoffs of the network. Table 5.3 shows the actual and maximum possible payoffs received by networks and Nash players playing against each other. Because Nash players always play mixed strategy equilibria when available, there are fewer opportunities to score points with them, either by exploiting them or by successfully coordinating with them, and therefore the maximum possible payoff when playing against them is lower than when playing against networks. So even though networks receive a lower total payoff when playing against a Nash player, the cause of this is not their own actions, so they do not receive error for it. In fact, the only possible source of error when playing against a Nash player is in failing to play a pure strategy equilibrium when a unique one exists, which is why the error for networks against Nash players is about the same as their error when playing such games against each other.
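For reference, the mixed equilibrium used by such a Nash player can be computed directly from the indifference conditions. The sketch below is illustrative only (it is not the code used for our simulations), and it assumes a payoff convention in which A[i][j] and B[i][j] are the payoffs of Players 1 and 2 when Player 1 plays row i and Player 2 plays column j.

# Illustrative sketch: the fully mixed equilibrium of a 2 x 2 game. Each player mixes
# so as to make the other player indifferent between their two strategies.
def mixed_equilibrium(A, B):
    denom_p = B[0][0] - B[0][1] - B[1][0] + B[1][1]   # determines Player 1's mixture
    denom_q = A[0][0] - A[0][1] - A[1][0] + A[1][1]   # determines Player 2's mixture
    if denom_p == 0 or denom_q == 0:
        return None  # degenerate game: no fully mixed equilibrium of this form
    p = (B[1][1] - B[1][0]) / denom_p   # probability Player 1 plays row 0
    q = (A[1][1] - A[0][1]) / denom_q   # probability Player 2 plays column 0
    if 0 <= p <= 1 and 0 <= q <= 1:
        return p, q
    return None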

Errors of networks and Nash players

                   Network vs Network   Network vs Nash   Nash vs Network   Nash vs Nash
Actual Payoff      0.290                0.248             0.238             0.249
Maximum Possible   0.304                0.251             0.305             0.249
Error              4.52%                0.99%             22.04%            0%

Table 5.3. Payoffs and errors for networks playing against each other (col. 1), against a Nash player (col. 2), and for Nash players playing against networks (col. 3) and against other Nash players (col. 4).

Similarly, although Nash players receive approximately the same average payoffs when playing against networks as they do against other Nash players, the fact that they don't exploit the networks' mistakes means they forgo a large number of points, and the error function penalizes them accordingly. Additionally, we note that Nash players score a lower total payoff when paired with each other compared to networks paired with each other, despite having 0% error. This comes down primarily to them playing mixed strategy equilibria in coordination games rather than positive-sum pure strategy equilibria. More sophisticated Nash players with some mechanism for coordinating on pure strategy equilibria in such games would likely receive higher total payoffs, though we leave such investigations for future research. We also investigate the effect of the network's size on its performance. In general, a larger network has a larger space of possible configurations, so one would expect it to perform better once fully trained, but this comes at the cost of increasing the training time, as well as increasing the difficulty of analyzing its internal structure.

Figure 5.4. Error (y) as a function of number of hidden layers (x)

In Figure 5.4, we fix the number of neurons per hidden layer at 8, generate a new neural network with x hidden layers alongside a new opponent with standard parameters (2 hidden layers, 8 neurons per layer), train them initially for 5 million games, then measure and average the variable network's error while they train for another million games. We observe that the error drops rapidly up to 2 hidden layers, then stays about the same, with slight increases.

This apparent increase could be an artifact of noise, or could be caused by insufficient training time for the larger networks, or by some inability of the larger networks to handle the training's lack of convergence.

Figure 5.5. Error (y) as a function of neurons per hidden layer (x)

In Figure 5.5, we fix the number of hidden layers at 2 and repeat the above procedure while varying the number of neurons in each hidden layer. We observe that the error drops quickly until around 8 neurons per layer, and decreases very slowly afterwards. Although this does not cover the full space of possible combinations of these two values, it covers a broad enough range to suggest that our choice of 2 hidden layers and 8 neurons per layer is reasonable. Although the error could be decreased by some margin by using networks with more neurons per layer, our purpose is not simply to create the best network for solving games, but also to study their internal structures, so the increase in accuracy from a larger network is unlikely to be worth the cost in complexity.

5.2.3 Temptation Games

We investigate the networks' behavior when differences between payoffs become small in some comparisons but not others. Consider the temptation game shown in Table 5.4, where x ∈ [0, 1].

                      Player 2
                      sC          sD
Player 1     sA       0.5, 0.5    0, 0
             sB       x, 1        1, 0.5

Table 5.4. Payoffs for the temptation game.

Strategy sC is dominant for Player 2, and thus when x < 1/2 the unique Nash equilibrium is {sA, sC}, and when x > 1/2 it is {sB, sC}. When x = 1/2, strategy sB weakly dominates sA, and any mixed strategy between the two paired with sC is a Nash equilibrium. We focus on the case where x = 1/2 − ε for some small ε. The Nash equilibrium is still {sA, sC}, but this is a risky option for Player 1 if their opponent is unreliable or irrational. If Player 2 plays sC then sA beats sB by ε, but if Player 2 plays sD then sB beats sA by 1. Therefore, sA is only a best response if Player 1 expects Player 2 to select sC with probability at least 1/(1 + ε). For a neural network, it might be difficult to distinguish between the cases x = 1/2 − ε and x > 1/2. Any neurons that detect such differences may not be perfectly calibrated, or may have inputs close to 0 and thus send out a signal close to 1/2, and might be overruled by other neurons. As a result, the network may choose sB even when playing against an opponent who consistently chooses sC.
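Spelled out, the threshold above follows from a short indifference calculation using only the payoffs in Table 5.4. If Player 1 believes Player 2 will play sC with probability p, then

E[sA] = p · (1/2),        E[sB] = p · (1/2 − ε) + (1 − p) · 1,

so sA is a best response exactly when p/2 ≥ p(1/2 − ε) + 1 − p, which simplifies to εp ≥ 1 − p, that is, p ≥ 1/(1 + ε).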

We measure this by training two networks together, then freezing them both and taking measurements of Player 1's strategy in this temptation game as x varies. Figure 5.6 shows a plot of Player 1's strategy choice (the probability of choosing sA in the mixed strategy it outputs) as a function of x.

Figure 5.6. Probability A of choosing strategy sA as a function of x in temptation game

This appears to be a sigmoid curve centered around x = 0.4. It is worth noting that the opponent network always played sC during these games, which means that for 0.4 < x < 0.5 Player 1 is making a suboptimal choice. When we averaged over all temptation games with 0 ≤ x ≤ 1, the network received an error of 0.3%, which is less than its overall average, and less even than its average restricted to games with a single pure strategy equilibrium, where it averages 1.04%. However, this range includes many games with no temptation conflict at all, where choosing sB simply yields a high payoff.

When we restricted to games with 0.4 ≤ x ≤ 0.5, the network received 2.92% error, which is higher than its average for pure strategy games, though still less than its average among all games. During regular training, games like this occasionally show up, and such mistakes will change the network in a way that causes it to perform better on these games in the future, but such changes will be small given the small amount of utility that is missed in making such a decision. Additionally, such changes may not persist as the network makes errors in other games that change the network in other ways, and such temptation games do not occur frequently enough for the network to optimize strongly towards them in particular. To demonstrate this, we trained two networks with adjusted probabilities for their training games. With probability 3/4 a game is randomly generated as usual, with all inputs chosen uniformly at random from [−1, 1]. With probability 1/4, a temptation game is generated with x chosen uniformly at random from [0, 1], and all other values fixed as in Table 5.4. This creates networks which are still incentivized to perform well in all games, but which put special emphasis on solving this particular type of game. Figure 5.7 shows a plot of a network's strategy choice as a function of x after receiving this special training.
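The adjusted training distribution just described can be sampled in a few lines. The sketch below is illustrative only; in particular, the flattened ordering of the eight payoffs fed to the networks is an assumption made for this example, not a specification of our implementation.

import random

def sample_training_game():
    # With probability 3/4, a fully random game with payoffs uniform on [-1, 1]
    if random.random() < 0.75:
        return [random.uniform(-1.0, 1.0) for _ in range(8)]
    # With probability 1/4, a temptation game with x uniform on [0, 1];
    # the payoffs are those of Table 5.4 (Player 1's four payoffs, then Player 2's)
    x = random.uniform(0.0, 1.0)
    return [0.5, 0.0, x, 1.0, 0.5, 0.0, 1.0, 0.5]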

Figure 5.7. Probability A of choosing strategy sA as a function of x in temptation game after special training

This is likewise a sigmoid curve, but centered at x = 1/2. This network received an average error of 0.17% on temptation games where x ranged from 0 to 1, and 0.39% error where x ranged from 0.4 to 0.5. We considered the possibility that this improved performance on temptation games would come at the cost of worse performance on other games, but when we tested this network on random games it averaged 4.45% error, which is close

enough to that of normally trained networks as to be statistically indistinguishable. If there is a trade-off, it is small enough to be hidden by random noise.

5.3 Pruning

The visual and computational complexity of a network lies not simply in the total number of neurons it contains, but also in the connections between them. Although we have been creating fully connected networks, where every neuron is connected to every neuron in the adjacent layers, many of these connections are redundant or unnecessary, and during training many of these weights end up near 0. Thus, if we can detect edges which are not serving a useful purpose, it is possible to "prune" a network to remove them and simplify the network without increasing its error by any meaningful amount. The naive way to remove weights would be to select those with the smallest magnitude; however, weights with small magnitude can still have a large impact on the network's performance if they play some important but subtle role in the network. We could brute force measurements of each weight's importance by copying the network, deleting one weight, and measuring the error of the pruned copy. We could then find which pruned copy has the smallest error, keep it, and repeat the procedure until the increases in error become too large, meaning all of the unimportant connections have been removed. However, this procedure would take an extraordinary amount of time, especially if the average error needed to be computed to high accuracy. Hassibi and Stork [88] construct a method they call the Optimal Brain Surgeon, which estimates the second order partial derivatives of the network's error with respect to the weights to compute the Hessian matrix H, then uses the inverse Hessian to compute each weight's saliency Lq = Wq^2 / (2[H^{-1}]qq), where Wq is the magnitude of weight q. This estimates the increase in error that will occur after weight q is removed and the other weights are adjusted to compensate for it. Thus, by removing weights with the lowest saliency, we can remove not only connections that serve no purpose, but also redundant connections that serve the same purpose as some other connection, which should also have low saliency. This process removes one weight each time it is run, removing what it considers to be the least important connection it finds.
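To make the procedure concrete, the following is a minimal sketch of a single Optimal Brain Surgeon step, written as if the network's weights were flattened into a vector w. It assumes an approximate inverse Hessian H_inv has already been computed (obtaining that approximation is the expensive part of the method and is omitted here); the function name and array layout are illustrative, not part of our implementation.

import numpy as np

def obs_prune_step(w, H_inv):
    # saliency L_q = w_q^2 / (2 [H^{-1}]_qq): the estimated error increase from removing weight q
    saliency = w ** 2 / (2.0 * np.diag(H_inv))
    q = int(np.argmin(saliency))                     # least salient remaining weight
    # adjust the surviving weights to compensate, then delete weight q entirely
    delta = -(w[q] / H_inv[q, q]) * H_inv[:, q]
    w_new = w + delta
    w_new[q] = 0.0
    return w_new, q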

Thus, assuming we wish to prune more than one weight but not every single one, we need some form of stopping criterion to determine how many times to repeat the process. Each subsequent weight removed is more important than the last, so we could select a maximum marginal increase in error and only remove weights that increase the error by less than that amount, or we could select a total maximum error for the network and stop removing weights once the network reaches that amount, although it is not immediately obvious what values these should be set at. However, if we look at all of the connections, there appears to be a partition into two regions of "important" connections and "unimportant" connections. We do not rigorously define such labels; however, we do offer some support for this intuitive notion. We construct a new network, train it, then begin pruning it using the Optimal Brain Surgeon method, giving it an additional 5000 games of training after each removal so it can adjust. We measure its error after each prune, and repeat this until all weights are removed. The data suggests a nonlinear increase in error as weights are removed, so we display the results on a log-log plot, shown in Figure 5.8.

Figure 5.8. Log-log plot of error (y) as a function of weights pruned (x)

There appears to be a noticeable increase near x = 50, which suggests that some threshold is being crossed here. In some sense, this means that around 94 of the original 144 weights are performing important functions for the network, and around 50 of them are unimportant and can be removed with minimal impact. Repeating this process with different networks yielded data with the same pattern and a visible threshold near x = 50, though we did not perform rigorous analysis to precisely quantify

this value. Pruning of networks is often done in order to avoid overfitting. When large neural networks are given training data from a relatively small set, they often develop specialized techniques that exploit unintended patterns from noise in the training data. For example, a network attempting to detect birds in images might learn a rule such as "images with more than 20% of the background being green never have birds in them." This rule is not true in general, but if the training data happens to lack any counterexamples, then this rule would be genuinely useful at correctly classifying images in the training data. As a result, an overfit network becomes better at classifying the training data, but worse at generalizing to examples outside of the training data that don't share those patterns. Pruning the network reduces the number of degrees of freedom the network has to work with, and forces it to solve problems using simpler processes that are more likely to generalize successfully [89,90]. Another mechanism used to combat overfitting is random dropout. During each training game, some neurons are chosen at random from inside the network and temporarily removed, along with any connections going to and from those neurons. This forces neurons to be more robust, since they cannot rely on specific sequences of events to reliably trigger. Effectively it trains multiple different versions of the network to accomplish the same task, and then puts them all back together at the end [91]. We do not anticipate having issues with overfitting, and so do not employ specific techniques in an attempt to avoid it. Our networks are small, so they are less vulnerable to overfitting on tasks in general. More importantly, our training data is the set of all 2 × 2 games. Because the data is the set of all combinations of values within a range, and the error function can be computed directly from those values without needing to be manually labelled by a human, the training process can randomly generate training data from the full set of possibilities. This cannot feasibly be done on tasks such as image recognition, where the set of all possible values consists mostly of random visual noise, and networks are being trained to accomplish tasks that are not already easy to compute from a simple algorithm. Our networks encounter orders of magnitude more unique examples, chosen at random from the entire set of games, so there will not be biases in the training data. We note that it is because we are running networks on an already solved problem that we can avoid this potential issue, since the ability to both generate data and assign error to the networks automatically enables such a large number of unique inputs to be used in training. If we wish to prune our networks, removing 50 weights seems like a reasonable amount

to remove given the results of this analysis. However, although this pruning can help reduce the visual and structural complexity of networks, it also dramatically increases the degrees of freedom for what shapes networks can have, and makes comparisons between networks more difficult unless they have been pruned in exactly the same way as each other. Therefore, networks in the remainder of this chapter are not pruned unless otherwise stated.

5.4 Network Comparisons

5.4.1 Euclidean Distance in Weight Space

We consider the question of how our neural networks are solving games, and how many different ways there are to do so. Are all of the networks essentially the same inside? Or are there completely different types of networks that solve the same problem? Such notions are not well-defined, as the continuous nature of the space of all neural networks makes distinguishing networks a somewhat murky prospect. If we take a neural network and then construct a copy with some number of its weights perturbed by a small amount ε, then unless the network's internal structure is extremely sensitive, the copy will behave nearly identically to the original, giving approximately the same outputs from the same inputs. It is technically a different network, but intuitively we would consider its method of solving games to be basically the same as the original. We might consider some sort of metric space of networks, in which the copy would be within ε distance of the original. This would then translate our notion of networks being the same or not into questions such as: "Do all networks with minimum error live within a small ball near each other?", "Are there multiple such clusters?", "Are there continuous paths of minimal error networks all throughout such space?" In fact, if we fix the number of hidden layers and neurons per layer at 2 and 8 respectively, as we typically do, and have not pruned any weights, then each network has 144 weights, each a real number. So by fixing some ordering of the weights, each network can be represented by a single point in R^144. This is a metric space; however, the Euclidean metric it comes with is not ideal for the purpose of classifying networks. Even when we fix a numbering of the weights and use it for all networks, two networks might arise which have the same weights connected to different neurons in the right pattern such that the two networks perform identically, but have weights in different slots that map to different points in R^144. If we take a network and permute

some neurons in the same layer, that is, we take neurons i and j and for every neuron k in an adjacent layer exchange the values of wi,k and wj,k, then such a copy will behave identically to the original, giving exactly the same output for each set of inputs. Intuitively it is the same network; however, since the ordering of the weights has changed, it will correspond to a different point in R^144. Whatever metric or other method of comparing networks we use, we would like it to be invariant under permutations of neurons. Our first way to resolve this is to compare two networks at a time, and permute the neurons in one network to most closely match the other. We take two networks of the same size that have already been trained separately, and randomly generate a set of games to test them on. Rather than train them on these games, we record the values of their neurons on each game in a set of lists, Xi for each neuron i in the first network, and

Yj for each neuron j in the second network. We then take a pair of neurons in the same layer from each network, neurons i and j, and compute Pearson’s correlation coefficient:

ρi,j = E[(Xi − µi)(Yj − µj)] / (σi σj)

where E denotes the expected value, µi, µj are the mean values of the respective lists, and σi and σj are their standard deviations. For each hidden layer, we pair together the neurons with the highest pairwise correlation, remove them from their respective lists, and recursively repeat this until all neurons in the layer have been paired. Note that we do not need to pair neurons in the input layer, whose values always match the inputs of the game and automatically have correlation 1, or neurons in the output layer, whose meaning is fixed by their location and which are automatically aligned in normal training. We then take the second network and permute each of its neurons to be in the same location as the neuron of the first network it was paired with.
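A minimal sketch of this matching step is given below (illustrative only, not our simulation code). It assumes the recorded activations are stored as arrays with one row per neuron and one column per test game; the greedy pairing mirrors the recursive procedure described above.

import numpy as np

def match_neurons(X, Y):
    # X, Y: arrays of shape (num_neurons, num_test_games) of recorded activations
    n = len(X)
    corr = np.corrcoef(X, Y)[:n, n:]     # Pearson correlation of each neuron in X with each in Y
    pairs = []
    available_i = set(range(n))
    available_j = set(range(len(Y)))
    while available_i and available_j:
        # pair the remaining neurons with the highest correlation, then remove them
        i, j = max(((i, j) for i in available_i for j in available_j),
                   key=lambda ij: corr[ij[0], ij[1]])
        pairs.append((i, j))
        available_i.remove(i)
        available_j.remove(j)
    return pairs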

For neural networks N1, N2, let d1(N1, N2) be the Euclidean distance after permuting N2 to match N1 using the above method and projecting both to R^144. This is reflexive, as identical networks map to the same point, and the correlation-based permuting makes it invariant under permutations: if N1' is a permutation of N1 and N2' is a permutation of N2, then d1(N1', N2') = d1(N1, N2). d1 is also symmetric: although the permutation process is asymmetric, since it permutes the second network to match the first, the points the two networks project to are reorderings of the same coordinate pairs, and thus have the same Euclidean distance.

However, d1 is not a metric, since it violates the triangle inequality. We can construct

networks N1, N2, N3, where each network has one large weight wi = 1/ε that has little impact on the actual behavior of the network or its neurons. Then we arrange things such that when N1 and N3 are permuted, the large weights end up in different spots and the Euclidean distance ends up large (∼ 2/ε), but when N1 and N2 are compared, or

N2 and N3 are compared, the large weights end up in the same spot and the Euclidean distance is small. Such occurrences seem to be the exception, however, as discrepancies only occur when neurons switch permutations, and this should behave nicely locally. So we will still interpret this mostly like a metric. We construct and train 10 networks independently for 1 million games, then measure their pairwise distances under d1. Figure 5.9 shows a histogram of the results.

Figure 5.9. Histogram of pairwise Euclidean distances for 10 networks

For comparison, we measured random points in R^144 with coordinates of the same magnitude (obtained by taking the network weights and permuting them randomly) and found they had an average Euclidean distance of 110.6. The fact that all of the networks had distance less than this demonstrates some similarity in their weights after permuting them based on correlations. However, we do not see clusters of networks with distance close to 0, which is what we would expect if all of the networks were doing the same thing. Perturbing a network's weights by a small amount will perturb its behavior, and thus its error, by a small amount, creating similar networks with similar error. The fact that we do not observe such similar networks suggests there is a vast range of different networks, at least from the perspective of d1, all with approximately equal performance when solving games. If there are many different neural networks in different locations in R^144, are they dense in the space? Technically, every point in R^144 corresponds to some network, but we wish to consider networks that actually perform well in games. If we fix a neural network B as player two, we can consider the error to be a function EB : R^144 → R which takes a point, constructs a neural network with the corresponding weights, and then computes its average error over all games when playing against network B. We then wish to know the

structure of the set of points with low error. Such questions could be answered analytically if one were to write down a functional description of a neural network containing 144 variables and then integrate over all possible payoffs, but such an attempt is infeasible even for relatively small neural networks such as ours, so we make numerical attempts at answering them. We consider whether a linear combination of two low-error networks is also a low-error network. If we train networks independently they will have different opponents, and hence different standards for error measurement; to avoid this, we first train a network B for 1,000,000 games and freeze it. We then train other networks against network B without allowing it to update, so that all of them are trained and measured against the same standard. This yields networks with error around 3.34%, as shown in Table 5.2. We take two such networks, permute one network's neurons to line up with the other's, and map them to R^144; call the resulting points x and y. We then consider the linear combinations pr = r · x + (1 − r) · y for r ∈ [0, 1], construct the networks corresponding to these points, and measure their error. Figure 5.10 shows the error values for the resulting networks.

Figure 5.10. Error for network pr as a function of r

The graph looks somewhat Gaussian in shape, being small when close to either of

the trained networks, and highest in the middle. This suggests the two networks are in distinct local minima of the error function, rather than in one wide basin of minima. The same pattern occurred when we repeated this using many different pairs of networks, suggesting there are many separate local minima scattered throughout R^144. However, there do seem to be some sort of connected paths in this space, since training a network under backpropagation will cause its weights to wander around until it ends up bearing little resemblance to the original, and yet it gets there without its error ever increasing drastically.

5.4.2 Polar Projection

Visually displaying a path in any high-dimensional space is difficult. To assist with this, we develop a polar projection technique that reduces the variables to a "radial distance" and an "orthogonal distance." First we define the continuous version of our polar projection. We start with a given point p ∈ R^n, which we call the "anchor point", a path f(t), f : [0, T] → R^n such that

f(t) ≠ p for all t, and an initial angle θ0 (typically 0). We then compute the radial vector r⃗(t) = f(t) − p and the radial distance r(t) = |r⃗(t)|. We then define the angle θ(t) by θ(0) = θ0 and θ'(t) = |f'(t) − r'(t) r⃗(t)/r(t)| / r(t), the speed of f orthogonal to the radial direction divided by the radial distance. This is finite since f(t) ≠ p implies r(t) > 0. We then interpret (r(t), θ(t)) as polar coordinates for a path g(t) in R^2 with p at (0, 0). Essentially this process preserves radial distances from p and arc length, and discards all other information. θ tracks the angular distance the path has traveled in all directions orthogonal to r⃗, which, when displayed in polar coordinates, preserves arc lengths. For every t the distance from f(t) to p will be the same as the distance from g(t) to p, and for any t1, t2, the arc length between f(t1) and f(t2) will be the same as the arc length between g(t1) and g(t2). We might visualize this process as unravelling the path f and rewinding it in a spiral around p. We can implement this polar projection for a discrete time path f : N → R^n by defining r⃗(t), r(t), and θ'(t) as before and recursively defining θ(t + 1) = θ(t) + θ'(t), although this no longer perfectly preserves arc length, as changes in r cause θ' to over- or under-represent the underlying changes in f(t). However, such discrepancies will be small for paths with short lengths between each time step. Figure 5.11 shows several examples of paths in R^2 and the paths they project to using this technique. We note that this process is a projection, since repeating it will yield the same spiral path.


Figure 5.11. Examples of paths in R^2 (left) and their corresponding polar projections (right).
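The discrete projection described above can be implemented directly; the following is a minimal sketch (illustrative only, not our simulation code), assuming the path is a sequence of numpy arrays that never passes through the anchor point p.

import numpy as np

def polar_projection(path, p, theta0=0.0):
    rs, thetas = [], [theta0]
    for t in range(len(path)):
        radial = path[t] - p
        r = np.linalg.norm(radial)                   # radial distance is preserved exactly
        rs.append(r)
        if t + 1 < len(path):
            step = path[t + 1] - path[t]
            towards = np.dot(step, radial) / r       # component of the step towards/away from p
            orthogonal = np.sqrt(max(np.dot(step, step) - towards ** 2, 0.0))
            thetas.append(thetas[-1] + orthogonal / r)   # angular advance approximately preserves arc length
    return np.array(rs), np.array(thetas)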

Additionally, any path which can be represented as a parametric curve in polar coordinates centered around p, with r(t) > 0 and monotonic θ(t), will project to itself, or to some simple transformation of itself such as a rotation. This includes straight lines headed directly towards or away from p. Any travel in directions orthogonal to the vector between p and the path's current location simply increases θ, regardless of which direction it goes in. Thus, an intersection of the projected path with itself does not indicate an intersection of the original path with itself. Moreover, this is a projection of paths, rather than of points. A path which intersects itself need not

intersect itself in the projection, as the same point can be given a different θ depending on the arc length of the path preceding it. There is no particular relationship between two points projected to co-terminal angles other than there being enough arc length between them for this to happen. A path in R^n restricted to an (n − 1)-sphere of radius r centered at p would project onto an arc of a circle of radius r with equal length, regardless of the shape of the path.

If such a path is expressed as a parametric curve xi(t) with constant speed c along its arc, then its projection will have θ(t) = ct/r. In high dimensions, most directions are orthogonal to the radius, so most paths will project to a spiral around p. However, changes in the radial distance from p will be preserved, and show the spiral going out or in at the same rate. Thus the projection can be used to detect a path's behavior towards p relative to other directions. A path that wanders around without getting closer to or further from p will be noticeably different from a path with a general trend towards or away from p.

Figure 5.12. Projection of a network path created via backpropagation (a, left), and a random walk (b, right)

We train a neural network for 1,000,000 games against a frozen opponent B and let p be the point in R^144 it corresponds to. We initialize the path f with f(0) = p. We then continue to train the network against the same opponent B, recording the new point it corresponds to every 10,000 games and setting that as the next point in the path. We repeat this until we have 100 points. Figure 5.12a shows the polar projection of this path. For comparison, we also make a random walk in R^144 starting at p, where at each time step the point changes by a random vector with each coordinate chosen independently from a Gaussian distribution with mean 0 and standard deviation 0.3, shown in Figure 5.12b. Decreasing the standard deviation will create a twistier path, which will project to a tighter spiral, but also travel less total distance in the same number of steps, and

remain closer to p. Setting it to 0.3 yields a path with approximately the same radial distance from p as the backpropagation path after 100 steps. These spirals look somewhat similar, but we observe a much tighter spiral from the backpropagation path, and see the radius decrease several times near the beginning, which the random walk does not do. A random walk in any number of dimensions will tend to get further from its starting point. The backpropagation path, however, is driven by the error function, which draws it towards points with low error. Any time the path wanders away from areas of low error, it will be driven back towards them in future games. We can measure this trend more rigorously by taking the dot products of adjacent vectors along the path. For each pair of adjacent points in the path, define the vector vi = pi+1 − pi. Then for each pair of adjacent vectors, the normalized dot product is ri = (vi+1 · vi) / (|vi+1||vi|). Averaging the ri together yields a value r for the path which measures the straightness of the path. A path where each step is more likely to be followed by a step in the same general direction will have a positive value of r. A random walk in which each step is chosen independently, without any overarching bias, will have an r of approximately 0. A path which often backtracks on itself and creates zigzag patterns will have a negative value of r. We measured r for these paths and found that the random walk had r = 0.0032, consistent with random noise, and the backpropagation path had r = −0.34. This demonstrates that the backpropagation path is indeed twisting back on itself more often than would be expected by random chance, while still retaining enough degrees of freedom to eventually wander away from its starting point.
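The straightness statistic r can be computed from a recorded path as follows (a minimal sketch, not our simulation code), where the path is a sequence of points in R^n stored as numpy arrays.

import numpy as np

def straightness(path):
    steps = [path[t + 1] - path[t] for t in range(len(path) - 1)]
    dots = [np.dot(w, v) / (np.linalg.norm(w) * np.linalg.norm(v))
            for v, w in zip(steps, steps[1:])]       # normalized dot products of adjacent steps
    return float(np.mean(dots))                      # ~0 for a random walk, negative if the path backtracks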

5.4.3 Paths Between Networks

We now reconsider the notion of a path between two different networks. Even if a network directly on the line segment between two trained networks has high error, there may be some continuous path of low-error networks that bends or twists and connects the two networks in a roundabout way. To detect such paths, we construct a sequence of networks via a directed genetic algorithm [92] to balance error minimization with a bias towards a target.

We fix a target network T and a starting network N0 that has been trained against the same opponent as T and has been permuted to align its neurons with those of T. Define pi as the point network Ni maps to in R^144. As model parameters we fix a number of steps n, a number of test games g, and a number of candidate networks c, all of which improve the performance of the algorithm as they increase, but also increase

its computation time (which is O(ngc)). We also fix a diffusion coefficient σ which will influence how direct the resulting path is. We then recursively define network Ni+1 as follows:

• Define the drift vector d = (pT − pi)/(n − i).

• Create c copies of Ni as "candidates".

• For each candidate k, create a diffusion vector Dk, where each coordinate of Dk is chosen randomly from a Gaussian distribution with mean 0 and variance σ.

• For each weight wk,j in each candidate, add dj + Dk,j.

• Measure the error of each candidate using g random test games.

• Let Ni+1 be the candidate with the lowest measured error.

We then repeat for n steps. Note that d becomes larger in magnitude as i approaches n, but becomes smaller as pi approaches pT. This is a soft requirement for pi to approach pT at a constant rate and reach it in n steps: d stays constant in magnitude when this requirement is fulfilled, increases if pi has previously approached at a slower rate, and decreases if pi has approached at a faster rate. If this algorithm is performed without a drift term (or in the limit as n → ∞), it is a regular genetic algorithm which acts as a form of neural network training: create a batch of offspring with slightly perturbed values, measure their error, and keep the best one to repeat the process with. In fact, we ran such an algorithm and found it yielded a series of neural networks with approximately 3% error, in a manner similar to (though less computationally efficient than) our standard backpropagation. If instead the algorithm were performed without a diffusion term (or by setting σ = 0), every candidate in each round would be identical and would correspond to the same pr on the straight line we used earlier to make Figure 5.10. When the two terms are put together as in the general case, the result is a directed selection process, sketched below. Many candidates are created with scattered random values, but these values are biased in the direction of the target network, thus influencing the selection process to head in that direction. Low values of σ will force the networks to take a more direct route towards pT, while higher values will allow the networks to take more roundabout paths. If there are low-error networks along a continuous path connecting the starting network to the target network, then candidates which fall near that path will be selected.
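A minimal sketch of one step of this directed search is given below (illustrative only, not our simulation code). It assumes networks are represented by flattened weight vectors and that measure_error is a hypothetical helper returning the average error of a weight vector on a list of test games.

import numpy as np

def directed_step(w_current, w_target, i, n, c, sigma, test_games, measure_error):
    drift = (w_target - w_current) / (n - i)              # bias towards the target network
    candidates = [w_current + drift
                  + np.random.normal(0.0, np.sqrt(sigma), size=w_current.shape)  # variance sigma
                  for _ in range(c)]
    errors = [measure_error(w, test_games) for w in candidates]
    return candidates[int(np.argmin(errors))]             # keep the lowest-error candidate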

Since the error function is continuous, candidates heading in the general direction of such a path will outcompete ones in other directions, even if they do not land perfectly on it. If the diffusion coefficient σ is calibrated properly, it should allow candidates to scatter widely enough to find such low-error paths, while still heading in the direction of pT. We ran this algorithm multiple times with n = 40, g = 10000, c = 200, and σ ∈ {1/16, 1/4, 1}. Figure 5.13 shows a polar projection of the resulting paths. Note that each path is projected independently, so intersections between paths in the figure do not correspond to intersections between the original paths.

Figure 5.13. Polar projection of three network paths, with θ0 = 0, and paths colored based on the error of the adjacent networks

Figure 5.14 shows the errors of the networks in these paths in more detail. We observe that the path with the lowest σ had the highest error overall. We note that the shape of its errors looks similar to that in Figure 5.10, but with a lower peak: it was able to wander to some degree, but still followed a path close to the straight line towards the target network. However, the lowest error occurred in the path with σ = 1/4, the middle of the three diffusion coefficients we sampled. We speculate that this is due to the parameter controlling the number of candidate networks, c. The high-dimensional space offers many possible directions in which networks could travel, and if paths of low-error networks are sparse and thin, it is entirely possible for the algorithm to generate c different networks none of which lie in the optimal direction. Paths generated with higher noise are more vulnerable to such occurrences, while paths generated with

lower noise will sample over a smaller volume, and will therefore be more likely to find local gradients within that volume. We expect that increasing c would improve the performance of paths with high σ more than paths with low σ. These results demonstrate that there is some connected path of low-error networks, with errors at least as low as those found by the path with σ = 1/4, and that while it is not a straight line, it is at least direct enough to be found by our method for moderate σ. Since these networks were trained independently, and we found similar results when comparing other pairs of trained networks, this suggests that all or most low-error networks lie in some sort of connected space of low-error networks, though the underlying connections are not necessarily as low-error as the properly trained networks. The subset of low-error networks contains curves and branching paths, since each pair of networks is able to find some path connecting them, but direct linear paths tend to have much higher error.

Figure 5.14. Errors of network paths from Figure 5.13

5.4.4 Correlation metrics

We now consider methods of comparing networks other than the Euclidean metric. One of the primary flaws of using this metric to compare neural networks is that locations in Euclidean space are only indirectly related to the actual behavior of the network. Because the activation function saturates at high values, there is little practical difference

between a connection with a weight of 100 and a connection with a weight of 1000, but in the Euclidean metric this difference would completely dominate all of the other weights. Additionally, the high dimensionality of considering all 144 weights makes analysis difficult. We step away from this space and instead construct a simple metric based on the correlations we used earlier for permutations. We take two networks, compute the correlation coefficient ρX,Y for each pair of corresponding output neurons between the two networks, and average them (though this is approximately equal to simply computing it for one output neuron on each network) to get ρ.

We then define the distance between the two networks to be d2 := 1 − ρ. In practice we compute the correlation using finitely many games, usually 100,000, which gives a good approximation, but d2 is well-defined as the converged value in the limit as the number of test games goes to infinity. This is an actual metric, provided two networks with identical behavior are considered to be the same network, since a network always has correlation ρ = 1 with itself, correlations are symmetric, and they satisfy the triangle inequality. Additionally, d2 is invariant under internal permutations, since it only measures the firing of the output layer; it is therefore unnecessary to permute networks to match each other before comparing them. And it is continuous as a function of the weights, meaning small perturbations in the weights will only perturb the metric by a small amount. We constructed 10 networks, trained them separately, then tested their output correlations together. We found the average correlation was 0.951 with little variance: all pairs of networks had correlations between 0.93 and 0.97. This corresponds to a metric distance of 0.049, which is rather similar to the error rate of such networks. Given that this correlation only measures the output neurons, this metric is only capable of detecting differences in the final decision of a network, and all of the networks are attempting to accomplish the same task. Two "perfect" networks which always chose the best response to each game would have a metric distance of 0 regardless of the internal structures they used to compute that response. However, this metric still tells us something. Because the error rate and correlation distance are both ∼ 0.05, this suggests that players are making the correct decision 95% of the time and an incorrect decision 5% of the time, but that different networks make their mistakes at different times. If the networks were making the same mistakes, we would expect the correlation distance to be lower than the error rate. Although this reasoning is on somewhat shaky ground, because the error is weighted by how many points each game is worth and the correlation coefficient is not, it is still suggestive that all 10 independently trained networks seem to be "different"

from each other in some detectable aspect of their behavior. To better account for the internal structure of the networks, we can apply the correlation technique to the neurons in the hidden layers as well. However, we still wish for any metric to be invariant under permutations of internal neurons. If we naively compare the pairwise correlations of the nth neuron in one network with the nth neuron in another network, there may be a high distance even if the networks are permutations of each other, since a copy of that neuron may occur in a different location. Therefore, we perform a slightly altered version of the permutation process used earlier. For each hidden layer, we make lists containing all neurons in each network, and compute the pairwise correlation of each neuron in one with each neuron in the other. In this case we take the absolute value of each correlation, reasoning that two neurons with a strong negative correlation can play essentially the same role within a network simply by negating their output weights. We then pair the neurons with the highest pairwise correlation together, and repeat for each layer, excluding the input layer, whose values always match the inputs of the game and automatically have correlation 1. We then let ρ2 be the average correlation among the pairs of neurons we made, and define d3 := 1 − ρ2. Technically this is not a metric, as the absolute value allows the construction of two networks which make opposite decisions on every game but have d3 = 0; however, the errors of such networks would be x and 200% − x respectively, and thus both would not arise naturally via our training process, so this is not much cause for concern. We took the same 10 networks from before and tested them using this measurement, and found the average correlation was 0.747, lower than the correlation for the output neurons alone. We found that correlations for pairs of neurons varied wildly, ranging from 0.038 to 0.996. Interestingly, every single pair of networks had at least four neurons in their first hidden layer with pairwise correlations above 0.9, often much higher. If we consider only the top four correlations in the first hidden layer, we get an average correlation of 0.978. In some sense, two neurons with a correlation of 1 are the "same neuron" placed into different networks. This suggests that all of the networks are doing almost exactly the same thing with some of their neurons: there are some important neurons that every network creates copies of in order to play games with low error. But it also suggests that the remaining neurons are optional, or are performing some task that is less significant to the performance of the network, since some networks have certain of these neurons and others have different ones. This seems related to the notion of "important" and "unimportant" weights we discussed during pruning, as these weakly correlated

neurons may be connected to mostly unimportant weights which can be removed without hindering the primary functioning of the network. To investigate this, we define the Brute Force Importance (BFI) of a neuron to be the amount by which the network's error increases when that neuron is removed. To measure this, for each neuron we create a copy of the network, remove that neuron and all connections attached to it, and then measure the error of the two networks and take their difference. This is not the most sophisticated method of measuring importance, as it cannot detect things like redundancy: two neurons which both play the same important role, a role the network needs but which can be performed by a single one of them, will each be measured as having a low BFI, because only the removal of both would significantly hurt the network. Additionally, it is technically possible for a neuron to have a negative BFI if its removal would decrease the network's error, though such a neuron is unlikely to arise from training. However, this measurement is simple to implement and interpret, and should provide some rigorous backing for vague intuitive judgements of "importance". We take 10 networks and consider only the neurons in the first hidden layer, adjacent to the input neurons, giving us 80 different neurons. For each neuron i, we find the best matching neuron in each of the 9 other networks according to the highest absolute value of the correlation, then average those values together to make a correlation score ci for the neuron. If every network has a copy of the same neuron, then these copies will correlate highly with each other and all score high ci. If only some networks contain copies of a neuron, then those will correlate highly when paired together, but have lower correlations with the other networks, so the average will be lower. Thus, ci should be some measure of how common a neuron, or the role it plays, is across all of the networks. We then also compute the Brute Force Importance bi of each neuron in each network. Figure 5.15 shows a scatter plot comparing these two scores for each neuron. The distribution of scores does not cluster unambiguously into "important" and "unimportant" neurons, but there does seem to be a general trend of higher correlation accompanying higher BFI. Most neurons with ci < 0.9 have bi < 0.1, and vice versa. This does not appear to be a linear trend, as there are many more neurons with low bi but high ci, which may be due to our concerns about BFI in the presence of redundancy. Nevertheless, this data provides further evidence that some neurons are playing an important role, such that multiple independent networks all contain such neurons, and removing them tends to increase the network's error more than removing other neurons.
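The BFI measurement amounts to a copy, a deletion, and two error estimates. A minimal sketch is given below (illustrative only), for a network stored as a list of numpy weight matrices with weights[l][i, j] connecting neuron j of layer l to neuron i of layer l + 1; error_fn is a hypothetical helper returning the average error against a fixed opponent on random test games.

import copy

def brute_force_importance(weights, layer, neuron, error_fn):
    baseline = error_fn(weights)
    pruned = copy.deepcopy(weights)
    pruned[layer][:, neuron] = 0.0          # sever all connections leaving the neuron
    pruned[layer - 1][neuron, :] = 0.0      # and all connections entering it (layer >= 1 for hidden neurons)
    return error_fn(pruned) - baseline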

We then inspected the neurons with the highest ci and bi, which ought to all have similar structures, to determine what that structure was, and found that they appear to

be doing something best described as level 2 play under level-k theory.

Figure 5.15. Correlation score ci (y) vs Brute Force Importance bi (x) for neurons in the first layer of 10 networks.

5.5 Level-k hierarchy in networks

Level-k theory is a concept in game theory where, rather than players being perfectly rational and computing the result of infinite regressions of knowledge, they have a finite hierarchy of steps in how they behave and expect their opponents to behave. A level 0 player behaves naively, choosing strategies completely at random, or in some other way uncorrelated with the utility of their decisions. A level k + 1 player is recursively defined as choosing the optimal strategy under the assumption that all other players are level k [93,94]. For example, consider the p-beauty contest game, where a large set of players each choose a real number between 0 and 100. The winner is the player whose value is closest to p times the average of all players' numbers, for some 0 < p < 1. Consider p = 1/2. A level 0 player would ignore all incentives and choose a random number. A level 1 player would expect all of the other players' numbers to be random, making the expected average 50, and so would choose 25. A level 2 player would expect every other player to go through that exact reasoning and all choose 25, and thus this player would choose 12.5. In the p-beauty contest game, as k approaches infinity, the behavior of a level k player converges to the Nash equilibrium of 0. Convergence to the equilibrium happens in

many games, but in some games behavior will cycle periodically as k increases, such as in an asymmetric matching pennies game. Cognitive Hierarchy Theory is almost the same as level-k theory, except that a level k + 1 player is defined as choosing the optimal strategy under the assumption that the population consists of some mixture of levels 0 through k. This leads to similar but more nuanced behavior in many games, and is more likely to converge as k increases. Experiments with human subjects show that this does a good job of describing most people's actual behavior in such games, with the majority of people behaving as level 1 players [94,95]. Our neural networks seem to resemble this behavior to some degree in their internal structure, with higher levels of play requiring larger networks due to the increasing complexity of the computations involved. In 2 × 2 games, the simplest level 0 player is one that chooses either strategy with equal probability. A network with only output neurons and no inputs, or one with 0 weight on all its inputs, would send 0 to its outputs, which under the activation function would cause each of them to output 1/2. A brand new network with no training and random initial weights will also act like a sort of level 0 player: it will choose different strategies for different games, but in a way completely uncorrelated with its utility. A level 1 player would then maximize its utility, given its beliefs, by picking whichever of its own strategies has the highest average payoff; it has no need to consider the opponent's payoffs, since they do not influence its decision-making. A network with 8 inputs connected directly to its outputs, with no hidden layers, will train to become effectively a level 1 player. There is not enough space to process its opponent's strategy in a useful way, so it simply sends positive weights from each of its own payoffs to the output that corresponds to that strategy, and negative weights to the other output, and whichever strategy has the higher average gets played. This gives it the maximum payoff (and thus 0 error) under the assumption that the opponent is a level 0 player. It also plays its dominant strategy whenever it has one, in which case it receives 0 error in such games regardless of what opponent it faces. Figure 5.16 shows an example of such a network. A level 2 player would consider its opponent's payoffs, find which of the opponent's strategies has the highest average payoff, and then pick the best response to that strategy. A network with at least two hidden layers appears to do something similar to this, though different networks do it in different ways, alongside other computations.

Using the variable names from Table 5.1, let P be the proposition that a1 > a2, let

Q be the proposition that a3 > a4, and let R be the proposition that b1 + b3 > b2 + b4.

Figure 5.16. Example of a network emulating a level 1 player

Then a level 2 player can be precisely defined by: if (R ∧ P) ∨ (¬R ∧ Q), choose strategy 1; otherwise choose strategy 2. Further, each of these propositions can be represented by a single neuron in the first hidden layer which fires at ∼1 when the proposition is true and ∼0 when it is false, simply by having positive weights from the inputs on one side of the inequality and negative weights of approximately equal magnitude from the inputs on the other side. The logical connectives required can also be represented in a neural network by using an arrangement of neurons that acts as a transistor, as shown in Figure 5.17. If neuron R fires, it causes both neurons in the next layer to fire, regardless of what P does. Because the activation function is nonlinear, R saturates the signal and causes the top neuron to act as P ∨ R. If both connections leading to the output have equal weight, then they will cancel, causing the output to receive no net signal from P or R. If R does not fire, then the output will receive signal from P. If this is embedded as a substructure in a larger network, then the output could receive input from two or more similar transistors, which let P through if R is true and let Q through if R is false. Thus, it is possible to manually construct a network which behaves as a true level 2 player.

Figure 5.17. Neural Network Transistor
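For comparison with the trained networks, the level 1 and level 2 decision rules themselves are only a few lines. The sketch below is illustrative; it assumes the reading of Table 5.1 in which a1, a2 (respectively a3, a4) are Player 1's payoffs for its two strategies when the opponent plays their first (respectively second) strategy, and b1 + b3 versus b2 + b4 compare the totals for the opponent's two strategies.

def level1_choice(a1, a2, a3, a4):
    # choose whichever of our strategies has the higher average payoff (opponent treated as level 0)
    return 1 if a1 + a3 > a2 + a4 else 2

def level2_choice(a1, a2, a3, a4, b1, b2, b3, b4):
    P = a1 > a2              # strategy 1 is better if the opponent plays their strategy 1
    Q = a3 > a4              # strategy 1 is better if the opponent plays their strategy 2
    R = b1 + b3 > b2 + b4    # a level 1 opponent would play their strategy 1
    return 1 if (R and P) or ((not R) and Q) else 2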

In trained neural networks, we observe neurons corresponding to at least one of P or Q or their negations in every network trained under standard parameters with two or more hidden layers, and often several. They can be found by eye in a diagram by looking for a neuron with a single positive and a single negative weight of significant magnitude, with all other weights close to zero. These neurons tend to be the ones with the highest correlation scores ci, since every network develops some form of them and they correlate highly with each other.

We define ideal neurons I1 through I6 as neurons that fire in perfect alignment with the propositions P, ¬P, Q, ¬Q, R, ¬R respectively. For each neuron in an actual network, we compute its correlation with each of these ideal neurons, and label it according to the ideal neuron with the highest correlation. Figure 5.18 shows the same data as Figure 5.15, but with each point labelled according to the ideal neuron it correlates with most strongly, and color coded to indicate the strength of that correlation.

Figure 5.18. Correlation score (y) vs BFI (x) for neurons in the first layer of 10 networks, with correlations to closest ideal neurons.
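The labelling procedure can be summarized by the following minimal Python sketch (ours, not the code used to generate Figure 5.18). It assumes neuron activations have been recorded over a batch of games, uses the same assumed payoff indexing as the earlier sketches, and uses a plain Pearson correlation as a stand-in for the correlation score defined earlier in this chapter.

import numpy as np

def ideal_activations(games):
    """Ideal neurons I1..I6 fire exactly when P, not-P, Q, not-Q, R, not-R hold.
    `games` is an (n_games, 8) array of payoffs [a1..a4, b1..b4]."""
    a1, a2, a3, a4, b1, b2, b3, b4 = games.T
    P = (a1 > a2).astype(float)
    Q = (a3 > a4).astype(float)
    R = (b1 + b3 > b2 + b4).astype(float)
    return np.stack([P, 1 - P, Q, 1 - Q, R, 1 - R])   # shape (6, n_games)

def label_neuron(activations, games):
    """Label one real neuron (its activations over the same batch of games) by
    the ideal neuron it correlates with most strongly. A constant ideal column
    (e.g. a batch where P is always true) would need a guard in practice."""
    ideals = ideal_activations(games)
    corrs = [np.corrcoef(activations, ideal)[0, 1] for ideal in ideals]
    best = int(np.argmax(corrs))
    return "I{}".format(best + 1), corrs[best]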

This shows an approximately equal prevalence and importance of neurons corresponding to ideal neurons 1 through 4, which makes sense given the symmetry of their roles. It also shows that the neurons of highest importance correspond to propositions P and Q and correlate with them very strongly, while neurons corresponding to proposition R are less important. This makes sense, since a network that cannot predict its opponent's strategy can still act as a level 1 player and choose strategies with high average payoffs,

while a network that does not compare its own payoffs properly will make incorrect decisions even if it knows what the opponent will do. Propositions P and Q are important in every game, while proposition R is only important when the player does not have a dominant strategy.

In addition to having lower BFI, neurons corresponding to proposition R also tend to have weaker correlations. We also note that several such neurons have high ci, meaning they correlate strongly with each other, but not with the ideal neurons. This suggests that they are playing some role that is similar to proposition R, but not quite the same. We made an actual level 2 player and paired it against several trained neural networks, and the level 2 player averaged a 10% error against the networks. This indicates that the networks are doing useful computations beyond level 2 play, though it is not obvious which component of their structure performs them.

Neurons corresponding to R look like a neuron with weights coming from the opponent's payoffs, with positive weights from two and negative weights from two. In fact, the substructure created by a neuron for R and ¬R along with the input neurons is precisely a level one player. If a network with two hidden layers is paired against an opponent with none, the larger network will often develop a nearly identical copy of the opponent. We also found that if, after some initial training, the opponent is manually altered in some way that would not normally occur during training, such as crippling it by pruning important weights, the first network will adjust its internal model to reflect these changes after they are trained together some more. The network does not have an abstract reasoning process that determines what a level 0 player ought to do and then make decisions based on that; its structure develops by actually playing games against an opponent and, when trained consistently against the same opponent, it minimizes error when it accurately predicts that opponent's actual behavior. Networks playing against opponents of the same size cannot contain a full copy of their opponent due to a lack of space, but in some sense they contain a compressed level 1 representation of their opponent as a means of predicting its behavior. This prediction is not perfectly accurate, and tends to fail when the opponent's choice deviates from level 1 play. However, the smaller version of the opponent differs from a level 1 player in ways that depend on the specific opponent, which is how the networks tend to achieve lower error than an actual level 2 player.

However, this internal modelling of opponents does not seem to extend to higher levels of play. Larger networks trained using our error function do not tend to contain level 2 descriptions of their opponent, even when paired against such an opponent. If we

take a network and manually construct a larger opponent which contains an internal copy of the smaller network, along with transistors making it choose the best response, then this constructed network will receive approximately 0 error against the smaller one (it still receives a small error because the finite values of the weights mean it always plays a slightly mixed strategy). However, such a structure is unstable: if we enable learning, the opponent will update, quickly causing the larger network to lose track of it and revert to typical error rates.

5.6 Discussion

5.6.1 Summary

In this chapter we develop a method for interpreting neural network outputs as mixed strategies in arbitrary games, and an error function that weights the strength of training based on the payoffs the network receives. We use this to train networks under various parameters, and develop several techniques to compare them and analyze their internal structures. We investigate the effect of pruning on the errors of networks. We measure the correlation of neuron firing, which allows us to identify neurons in different networks that play similar roles. We use this to permute networks in order to make them more closely resemble each other in Euclidean space, which we then use to make paths through this space. We display results using a novel polar projection technique, which preserves radial distances from a fixed point and the arc length of paths. We also use neuron correlations to compare networks directly by averaging these correlations, and identify similarities in neurons with high correlation. We find that all networks share neurons that correspond to key propositions in level-k hierarchy theory, and investigate this further.

We did not explicitly design or bias our networks towards learning how to play games in any particular manner, but find that they all seem to learn it in a similar way. Networks with different initial conditions and different training games will develop in different ways. However, they all share the learning mechanism of backpropagation via our error function, with the common goal of decreasing error. The fact that they develop the same types of neurons that approximate level 2 play suggests that level 2 play occupies a special position in the set of possible behaviors, balancing low error with enough simplicity that neural networks of the size we use are able to implement it and find it via backpropagation.

5.6.2 Extensions of this model

In this chapter, we have covered a broad range of topics in investigating these neural networks. Each of these topics could be investigated more thoroughly using different techniques, and many have parameters that would benefit greatly from more computation power. We also make several broad observations and conjectures based on patterns we observe in the data, which could be converted into more formal definitions and then proven by analytical or statistical methods.

The largest avenue of further research would be considering larger games than 2 × 2 matrices. Because of how we convert from multiple output neurons to a mixed strategy, it is possible to construct networks that can play arbitrary n × n games by giving each network n² input neurons and n output neurons, and having the network play strategy i with probability A_i / Σ_j A_j, where A_i is the value of output neuron i (a minimal sketch of this conversion appears at the end of this subsection). Solving larger games requires more computation, so networks would likely require more hidden layers and neurons per layer in order to perform this task. This would greatly increase the simulation times for playing and training networks, as well as increase their internal complexity. All of the techniques we use for analyzing networks trained on 2 × 2 games should be applicable to these larger networks in some form, but with greatly increased time cost and often with less clear results. Thus, future research in this direction would also include refining these techniques and developing new ones that can better handle the larger networks.

We also observe that most of the remaining error in trained networks comes from their performance in mixed-strategy games. This is also the primary source of their lack of convergence, as players continue to update their behavior based on these errors. Our decision to interpret the outputs of neurons as mixed strategies in each game and update players immediately based on the results is not the only way of modeling mixed strategies. Additionally, our error function does not distinguish between points lost from the player making a poor game-theoretic decision, such as playing a dominated strategy, and from less obvious situations, such as losing a matching pennies game against their opponent. Computing the error as the difference between the actual payoff and the maximum possible means that changes in the opponent's behavior can change a player's error without changing their actual received payoffs. A method which averaged payoffs over multiple games before updating could resolve some of these issues and create more stable networks. A different error function could be used in a way that still encouraged "good" play in games, but under different criteria for what that means. Alternatively, we could construct a population of neural networks which all play together simultaneously, and average their payoffs over all such opponents. Similar to the population in Chapter 3, a large population could have subgroups sampled from it in various sizes, ranging from 2 up to the entire population. This may create more robust networks that perform better in mixed games, or against opponents they haven't encountered before, though it could also negatively affect the ability of networks to reach a consensus in coordination games.

Different structures of networks could also yield different behavior. We only consider feed-forward networks without biases, but including biases may enable certain strategies that couldn't be implemented without them. Recurrent neural networks allow networks to retain information between games, which may help with remembering and predicting opponents' actions. These and other structures may lead to interesting behaviors worth investigating.

Our investigation into low-error networks in Euclidean space yielded some results, but leaves much unanswered. More sophisticated techniques for analyzing paths and gradients in high dimensional spaces could produce a better description of how these networks are arranged and connect to each other. We find that neurons inside networks correlate highly with neurons in other networks, and that these correlations are mostly explained by level-k hierarchy theory. However, this description is imperfect, as neurons correlate with each other more strongly than with the ideal neurons given by this description, and our networks perform better than a level 2 player in the same situation. Future research could attempt to elucidate this discrepancy and provide a more accurate description of the typical network's behavior, as well as of differences within networks, which seem to have some less important neurons that still benefit the network in some way. A more sophisticated notion of "importance" that better accounted for redundant neurons may also help in this analysis.

We also observed that larger networks did not seem to emulate higher levels of play after training, despite the possibility of such behavior in a network. Future research could look more into the reason for this, as well as create a better description of larger networks' structures if they are doing something other than level-k behavior. We also briefly investigate pruning weights off neurons, but leave most networks unpruned for the remainder of the chapter. Future research could investigate how pruning networks affects the behavior in each of the other sections.
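The output-to-strategy conversion referred to above amounts to a normalization of the output neuron values; a minimal Python sketch (ours, with illustrative function names) follows.

import numpy as np

def mixed_strategy(outputs):
    """Convert n output neuron values A_1..A_n into a mixed strategy: strategy i
    is played with probability A_i / sum_j A_j. Assumes the outputs are positive,
    as they are under the activation used in this chapter."""
    outputs = np.asarray(outputs, dtype=float)
    return outputs / outputs.sum()

def play(outputs, rng=np.random.default_rng()):
    """Sample a strategy index according to the mixed strategy."""
    p = mixed_strategy(outputs)
    return rng.choice(len(p), p=p)

print(mixed_strategy([0.9, 0.3, 0.6]))   # -> [0.5, 0.1667, 0.3333] (rounded), for a 3 x 3 game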

5.6.3 Applications

Additionally, some of the techniques developed here could be applied in other contexts. Our polar projection is interesting as a technique to visually display high dimensional data in two

dimensions, but is unlikely to yield analytical results that couldn't be found simply by using radial distance and arc length on their own. Nevertheless, it may convey information more effectively and help recognize patterns in complicated high dimensional paths. Other techniques we use may help with studying neural networks with different training tasks. Permuting networks to align the neurons within them and projecting them into Euclidean space would allow for a similar analysis of paths and low-error networks. Computing correlations between neurons in different networks trained on the same task could help identify features that the networks share, effectively allowing one to detect how common or rare a neuron is based on how likely it is to show up in a set of networks.

Chapter 6 | Conclusion

6.1 Discussion

In this dissertation we have explored four different models, each with its own learning mechanism that results in different dynamics. Although some of the differences in these models come from the differences in the games they play, the majority are from the different types of players and the mechanisms they use to reason and learn. Here we discuss some of the similarities and differences of these models.

6.1.1 Types of Information

In the oracle games model, players are rational agents and start with complete but imperfect information. One player is given the opportunity to purchase perfect information, and if he does so and the oracle responds, he uses it to rationally compute the best response. This is the only model in which players are fully rational and consciously represent information in this way. Additionally, in this model learning does not accumulate over time: players play a one-shot game with a single moment in which information can be acquired, as opposed to slowly gathering more and more data over repeated games.

In the public goods model, players are informed of the gradient of best response when they play games, which is a sort of complete information, but they do not use the full payoff function to compute the Nash equilibrium, and only update incrementally. Players have imperfect information, and initially have no knowledge of what strategies other players intend to play. However, as they play games they interact with different subsets of the population, and thus gain understanding of their environment over time. Although players never create an actual mental representation of the population and

the strategies being played in it, the incremental update takes care of this for them. Players in a population of overcontributors will decrease their contribution on average, which allows them to improve their payoff in such a population. Similarly, players in a population of undercontributors will increase their contribution over time. Thus, players' contributions respond to the overall distribution of players, in some sense approximating this missing information.

In the rock paper scissors model, players can distinguish between a win/tie and a loss, which is sort of like complete information, but again they don't ever make calculations using actual payoffs, so it's ambiguous whether they truly "know" the payoffs of the game. Players also have imperfect information, with no initial knowledge about what the general population of players is going to play. The win-stay lose-shift strategy only allows players to store one bit of memory, so it's hard to say if individual players ever really acquire this information, but they seem to do so probabilistically. In a population with many paper players, each individual with a {S,R} card will be more likely to be playing scissors than rock at any given time after enough games have been played for them to update, which increases the average payoff of such players compared to playing either strategy with equal probability.

In the neural network model, players are given all 8 values of the payoff matrix, but have no preconceptions about what these mean or what the rules of game theory are, so they could be considered to have incomplete information. (Though an argument could be made that they do have complete information, depending on whether one considers the error function to be something the player computes for itself based on the strategies it observes, or something computed and attached to the training data externally, as is typical of other forms of training data.) Players also have imperfect information, since they play simultaneously and don't get to see the other player's strategy until after both have played. During training, players end up learning both types of information. They learn general strategies in game theory, allowing them to perform well even when paired against different opponents, but they also learn and adapt to trends in the behavior of their specific training partner, allowing them to perform even better when playing against that network.

In all of these models, players start with imperfect information, but can gain it in one form or another and thus adjust their strategies in an attempt to do better in the environment they are actually in. Determining the completeness of information is somewhat ambiguous for all of the subrational players, since it's not necessarily clear whether a player lacks information, or possesses the information but fails to use it in their

decision-making algorithm. We could resolve this in each model by explicitly defining whether or not players "know" the full payoffs and rules of the game, but since such a decision would be arbitrary and have no impact on the actual dynamics of the model, we refrain from doing so.

6.1.2 Benefit to players

In general, the goal of learning is to alter one's strategy in a way that improves one's payoff. Thus, we expect a player with a learning mechanism to receive higher payoffs than one who stays at the same strategy. Although we have focused primarily on population dynamics rather than the actual payoffs of players, we can still discuss qualitatively how payoffs are impacted by learning in each model.

In the oracle games model, the player with access to the oracle usually receives higher payoffs in equilibria of the game with the oracle than in equilibria of the game without it. The cheaper the oracle is, the more information he'll purchase and the higher his payoff will be. However, in Section 2.5 we discuss a counterexample where the presence of the oracle decreases his payoff in equilibrium due to the other player's reaction. Although the information is still useful, in that a unilateral decision not to purchase it would decrease his payoff even further, its very existence decreases the payoff of the player.

In the public goods model, learning tends to be beneficial for each individual player, though this depends on the overall structure of the population. If the population is undercontributing on average, a player is more likely to play in an undercontributing subgroup, and thus increase their future contribution and improve their future payoffs. However, if they happen to end up in an overcontributing group, they will decrease their contribution. This decrease might improve their payoff if they are paired with the exact same subgroup again, but it decreases their performance in other groups. Thus, an individual's expected payoff will tend to increase over time, but with infrequent decreases. Although the permanent freeloader we discuss in Section 3.5 can benefit from not learning for large group sizes, they will have decreased payoff for small group sizes. Additionally, a permanent overcontributor will have decreased payoff in both cases, and by a larger amount. Thus, if players' initial contributions are chosen randomly, a non-learning player will have lower payoff in expectation compared to a player that can learn.

The rock paper scissors model has similar results. The expected payoff of a player is simply the frequency of players they win against minus the frequency of players they lose against in the population. In any given population state, each player has two possible expected payoffs corresponding to their two strategies. Although Win-Stay Lose-Shift

will sometimes cause a player to shift to a strategy with worse payoff, by definition it shifts when a player loses, which makes it more likely to shift away from a strategy that is more likely to lose. Thus, a player using this learning mechanism will spend more time playing a strategy with a higher win rate, and so have a slightly higher expected payoff than a hypothetical player with a fixed strategy that is never updated.

In the neural network model, the learning method is incredibly important. Untrained networks have no built-in concept of game theory; they play random strategies and receive an expected payoff of 0, or an error of 100%. It is only by learning that they gain the ability to think strategically and play strategies with higher expected payoffs. Once they've trained for some time, further training tends to decrease their performance in certain games at about the same rate it increases their performance in other games, causing their error rate to stagnate or fluctuate, but overall the learning process is highly beneficial to the network.

Thus, we see that in most cases learning is beneficial to players, but there are some exceptions. Often this benefit is in expectation, with common increases in payoff mixed with occasional decreases.

Aside from the benefit of learning to the players doing the learning, we also consider its effect on the other players in the population. In the oracle games model, the player without the oracle usually receives a lower payoff in equilibria with the oracle compared to ones without the oracle. However, this is primarily due to our focus on strictly competitive games, and we discuss a counterexample in Section 2.6. It is not obvious which type of game is more common. In fact, for such a question to even be well-defined, one would have to select a measure on the space of matrix games, and the answer would be determined by this measure.

In the public goods model, players benefit from other players' contributions being as high as possible. Thus, when a player updates their strategy, they increase the expected payoffs of all other players whenever they increase their own contribution, and decrease them whenever they decrease their own contribution. If players' initial contributions are randomly chosen from a distribution centered around the fairpoint, as we typically do, then increases and decreases will be equally common. So as a first-order effect, a player's learning is neutral on the payoffs of other players. However, due to the nonlinearity of the payoff function, players receive a higher payoff for playing at the equilibrium point compared to a mix of over- and under-contributing groups; thus the convergence of the population towards the fairpoint, which happens due to their learning, slightly improves the payoffs of players, since they can play in more consistent groups.

In the rock paper scissors model, the players are playing a zero-sum game. Therefore, any increase in expected payoff players gain for themselves by learning automatically causes an equal decrease in expected payoff for other players.

In the neural network model, learning by one player is generally negative for the other player. Although it can allow both players to benefit in some games, such as coordination games, the continued learning and updating of a network creates a constantly changing target for the error function of its opponent. Networks perform better against a frozen opponent that is not able to learn than against a regular network that is, since the learning network is able to learn and exploit the frozen network's behavior in games with mixed-strategy equilibria without that behavior changing in response.

We see a mixture of positive and negative effects of learning throughout these models. This seems to depend more on the games players are playing than on which learning mechanism they are using, as better performance by one player in cooperative games will tend to help other players, while better performance by one player in competitive games will tend to harm other players.

6.1.3 Convergence to a Nash equilibrium

In the oracle model, players are defined as rational agents which always compute and play the Nash equilibrium. This is not surprising: there is no dynamical system here that could converge; players simply play the best strategy out of those available to them.

In the public goods model, there are infinitely many Nash equilibria. Each individual player updates their strategy in the direction of best response, but the actual best response depends on what other players are doing. When m = n, all players stay in the same formation until they eventually hit the Nash equilibrium that maintains this formation. When m < n, players are constantly playing in different groups, and thus the direction of best response often changes. However, in aggregate this causes the population to drift towards the fair point, which is the unique symmetric Nash equilibrium. Each player individually seeking to maximize their own payoff leads to a Nash equilibrium, despite the fact that players do not consciously calculate where it is. We note that due to the discrete stochastic nature of the model, players do not actually converge on it (except in the rare extinction case) but instead cluster within a few steps of it, though this clustering can be made arbitrarily close by decreasing the step-size of the update rule.

In the rock paper scissors model, players do not converge to the Nash equilibrium. Rock paper scissors does not have a pure strategy equilibrium, and players only play

pure strategies, so it is impossible for any individual game between two players to be an equilibrium. The population as a whole could converge to the mixed strategy equilibrium (1/3, 1/3, 1/3), but in general it clusters around an interior fixed point based on the proportions of each card in the population, which is only equal to the Nash equilibrium in the symmetric case where A = B = C = 1/3. Further, because the Win-Stay Lose-Shift strategy only cares whether values are above or below a certain threshold, we could change the payoffs of the game matrix in a way that kept all of them on the same side of the threshold value, which would change the Nash equilibrium without affecting the model's behavior in any way. This is the only one of the four models we consider in this dissertation in which such a change is possible.

In the neural network model, players play pure strategy equilibria for the most part, but not mixed strategy equilibria. Although players can play mixed strategies, these typically change rapidly as the networks continue to receive errors, and do not converge to any particular strategy. Players often play pure strategy equilibria in coordination games when they are faced with a player they are familiar with. Even though players have no hard-coded notion of Nash equilibria, the goal of maximizing their own payoff naturally drives them towards equilibria.

6.1.4 Intelligence

Intelligence is a complicated and nebulous concept which is difficult to rigorously define in a mathematical sense, though some research has been done in this direction [96]. There are several parameters that vary in our models which seem correlated with intelligence, such as the amount of memory a player has, the length of the player's learning algorithm or the simulation time required to compute it, or how well we would expect players with this learning mechanism to perform in a variety of situations with different games. Although these measures do not correlate perfectly in general, we feel that a strong case can be made for an ordering of intelligence among the four models considered in this dissertation.

The Win-Stay Lose-Shift method only remembers one bit of information: which state the player is currently in. Players only compute whether their received payoff is above a fixed threshold, and are thus insensitive to changes beyond this value. They cannot accumulate data over time, and often flip their strategy to one with a worse expected payoff because they only react to the results of a single game. Win-Stay Lose-Shift could be implemented in a variety of games using cards, but players with such strategies might not perform well at all, especially if the payoffs did not interact nicely with their threshold value. Thus, we consider these players to be the least intelligent.

Next are the players in the public goods model. Their memory consists of a real number, and thus they can aggregate data over a longer period of time. They move in the gradient of best response, which would potentially enable them to find Nash equilibria, or at least moderately good strategies, in a variety of games with variable strategies. However, if the payoff function contains local maxima that are inferior to distant global maxima, players may converge on suboptimal equilibria. Thus, we consider these players to be more intelligent than those in the rock paper scissors model, but less than those in the other models.

In the neural network model, players' memory consists of a real number for each of many weights (144 under our standard parameters), which interact with each other in a sophisticated pattern. Networks require many games in order to learn properly; however, once they do, they can perform well in a large range of games. Players are fairly robust since they are trained on all 2 × 2 games, though not perfectly, as they perform poorly on games with only mixed strategy equilibria. Thus, we consider these to be the second most intelligent type of player.

In the oracle games model, players are defined as rational agents. This means they are perfectly intelligent, able to take any information in any game and compute the optimal strategy with perfect accuracy. Thus, we consider these to be the most intelligent type of player. However, we note that in some sense this intelligence is artificial, defined into these hypothetical players. The players in the other three models have their reasoning processes defined by an algorithm and can be programmed and simulated, with their actions determined by the results. Meanwhile, rational players are defined as always taking the best action, and their method of deducing this action is left ambiguous. In practice, we mathematically calculate the Nash equilibrium for each particular game and then post hoc declare it to be the action of the rational agents, while for simulated agents it is the other way around.

6.2 Future Research

We discuss possible directions for future research related to each individual model in the discussion section of the corresponding chapter. However, there is much more research that could be done studying learning and information in game theory. Although we study four different methods of learning in this dissertation, we have by no means exhausted the set of possible methods, so future research might construct models with different learning mechanisms and study the behavior that emerges as a result.

Aside from looking at individual models with different learning mechanisms, research should be done directly comparing different learning methods within the same model. Similar to the different cards in our rock paper scissors model, a heterogeneous population could be made whose members interact with each other and play the same game. Rather than varying players by restricting them to different subsets of strategies, players could vary in how they acquire and use information to update their strategy. This could be done by having completely different learning mechanisms, such as having one type of player incrementally update their strategy while the other switches between two very different strategies using Win-Stay Lose-Shift, or by having agents with the same fundamental learning mechanism but different parameters, such as two players who incrementally update their strategy with different step sizes.

As a basic example, one could put agents using learning mechanisms from all four of our models into the public goods model from Chapter 3. Some players would be incrementally updating players and would behave as normal in this model. Some players would have cards with two contribution amounts on them and some payoff threshold, and would flip their card every time they received a payoff below that threshold. Some players would determine their contribution using a neural network. Since the game's rules are the same every time, the neural network would have to use other data as inputs, such as the number of each type of player in the currently chosen group, or a history of the most recent group contributions that have been played. Some players would be rational players with the option to pay an oracle for a chance of it telling them the contributions of the other players, though this would have to be implemented in a way that allowed multiple oracle players to play in the same group. The long term behavior of each type of player could then be studied, as well as their average payoffs, and how these depended on the frequencies of each type of player in the population (a minimal skeleton of such a mixed population is sketched at the end of this section).

We do not anticipate this specific model being the most fruitful version of this idea; it is just one example out of many possibilities. A more sophisticated model would choose learning types and the underlying game in a way that fit together nicely and would elucidate certain details that the researchers wanted to explore. A model with multiple different games that players switch between could explore a tradeoff between a learning mechanism with good robustness among different games but mediocre performance in each, and one which performs well in certain games but poorly in others. A more rigorous measurement of intelligence, combined with a penalty for more intelligent agents, could explore which situations made such a trade worth the cost, similar to the costly computation model by Halpern and Pass [23]. A model with a gradually changing ruleset

could explore the tradeoff between quickly adapting agents, which struggle to converge, and slowly changing agents, which take longer to reach a new equilibrium but settle closely into it once they do.

The models we explore demonstrate the usefulness of learning in games, but also show that this learning can take a wide range of forms and still be useful. Simpler learning rules tend to be less robust than the more complex ones, but can still perform well when put in the right situation. The possibilities are too numerous to list exhaustively, and future research into learning dynamics may go in a completely different direction from what we have postulated here. However, we expect that some of the models and techniques developed in this dissertation will be useful for future investigations into the areas of game theory and learning.
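The following minimal Python skeleton illustrates the kind of heterogeneous population described above, with incremental updaters and win-stay lose-shift card players sharing one public goods loop. It is only a structural sketch: the payoff function, update rules, thresholds, and all parameter values are crude placeholders of our own, not the nonlinear model of Chapter 3, and the neural network and oracle player types are omitted.

import numpy as np

rng = np.random.default_rng(1)

# Placeholder public-goods payoff: the group's contributions are scaled and
# shared equally. This is NOT the nonlinear payoff function of Chapter 3.
def payoff(own, group_total, group_size, r=1.6):
    return r * group_total / group_size - own

class IncrementalPlayer:
    """Nudges its contribution up if its payoff improved on the previous round,
    down otherwise (a crude stand-in for an incremental update rule)."""
    def __init__(self, step=0.02):
        self.c, self.step, self.last = rng.uniform(0, 1), step, 0.0
    def contribute(self):
        return self.c
    def update(self, pay):
        self.c = float(np.clip(self.c + self.step * np.sign(pay - self.last), 0, 1))
        self.last = pay

class CardPlayer:
    """Win-stay lose-shift between two fixed contribution levels: flips its card
    whenever its payoff falls below a threshold."""
    def __init__(self, low=0.1, high=0.9, threshold=0.0):
        self.levels, self.side, self.threshold = (low, high), int(rng.integers(2)), threshold
    def contribute(self):
        return self.levels[self.side]
    def update(self, pay):
        if pay < self.threshold:
            self.side = 1 - self.side

players = [IncrementalPlayer() for _ in range(20)] + [CardPlayer() for _ in range(20)]
for _ in range(5000):
    group = rng.choice(players, size=4, replace=False)   # a randomly sampled subgroup plays one round
    contribs = [p.contribute() for p in group]
    for p, c in zip(group, contribs):
        p.update(payoff(c, sum(contribs), len(group)))

print(np.mean([p.contribute() for p in players[:20]]),   # mean contribution, incremental updaters
      np.mean([p.contribute() for p in players[20:]]))   # mean contribution, card players

The long-term contribution levels and payoffs of each type, and how they depend on the population mix, are the quantities such a study would track.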

Appendix | Master Equation to Langevin Derivation from Chapter 4

Starting with the master equation,

\[
\begin{aligned}
\frac{d}{dt}p_{x,y}(t) = 2\gamma\Big[ &-\big(x(A-x)+xy\big)p_{x,y} - \big(x(1-A-y)+y(1-x-y)\big)p_{x,y} \\
&+ \Big(x-\tfrac{1}{N}\Big)\Big(A-x+\tfrac{1}{N}\Big)p_{x-\frac{1}{N},y} + \Big(x+\tfrac{1}{N}\Big)y\,p_{x+\frac{1}{N},y} \\
&+ x\Big(1-A-y+\tfrac{1}{N}\Big)p_{x,y-\frac{1}{N}} + \Big(y+\tfrac{1}{N}\Big)\Big(1-x-y-\tfrac{1}{N}\Big)p_{x,y+\frac{1}{N}} \Big]
\end{aligned}
\]

To simplify the notation in future steps, let

\[
\begin{aligned}
B_x &= \phi_{x+}/2\gamma = x(A-x), \\
D_x &= \phi_{x-}/2\gamma = xy, \\
B_y &= \phi_{y+}/2\gamma = x(1-A-y), \\
D_y &= \phi_{y-}/2\gamma = y(1-x-y).
\end{aligned}
\]

Substituting these in yields

\[
\begin{aligned}
\frac{d}{dt}p_{x,y}(t) = 2\gamma\Big[ &(-B_x - D_x - B_y - D_y)p_{x,y} \\
&+ \Big(B_x - \tfrac{A}{N} - \tfrac{1}{N^2}\Big)p_{x-\frac{1}{N},y} + \Big(D_x + \tfrac{y}{N}\Big)p_{x+\frac{1}{N},y} \\
&+ \Big(B_y + \tfrac{x}{N}\Big)p_{x,y-\frac{1}{N}} + \Big(D_y + \tfrac{1-x-2y}{N} - \tfrac{1}{N^2}\Big)p_{x,y+\frac{1}{N}} \Big]
\end{aligned}
\]

Let ν = 1/N. Let E_x^ν be an operator such that E_x^ν p_{x,y} = p_{x+ν,y}, and E_y^ν an operator such that E_y^ν p_{x,y} = p_{x,y+ν}. Then we get

\[
\begin{aligned}
\frac{d}{dt}p_{x,y}(t) = 2\gamma\Big[ &(-B_x - D_x - B_y - D_y) \\
&+ (B_x - \nu A - \nu^2)E_x^{-\nu} + (D_x + \nu y)E_x^{\nu} \\
&+ (B_y + \nu x)E_y^{-\nu} + \big(D_y + \nu(1-x-2y) - \nu^2\big)E_y^{\nu} \Big] p_{x,y}
\end{aligned}
\]

In the limit as ν goes to zero, we can take the Taylor expansion of these operators,

\[
E_x^{\nu} = 1 + \nu\frac{\partial}{\partial x} + \frac{\nu^2}{2}\frac{\partial^2}{\partial x^2} + \ldots, \qquad
E_y^{\nu} = 1 + \nu\frac{\partial}{\partial y} + \frac{\nu^2}{2}\frac{\partial^2}{\partial y^2} + \ldots
\]
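As a worked intermediate step (ours, not spelled out in the original), applying these expansions to p_{x,y} simply recovers the shifted probabilities; in particular, the negative-shift operator used below picks up alternating signs:

\[
E_x^{-\nu} p_{x,y} = p_{x-\nu,y} = p_{x,y} - \nu\frac{\partial p_{x,y}}{\partial x} + \frac{\nu^2}{2}\frac{\partial^2 p_{x,y}}{\partial x^2} - \ldots
\]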

Substituting these expansions into the previous equation yields

\[
\begin{aligned}
\frac{d}{dt}p_{x,y}(t) = 2\gamma\Big[ &(-B_x - D_x - B_y - D_y) \\
&+ (B_x - \nu A - \nu^2)\Big(1 - \nu\frac{\partial}{\partial x} + \frac{\nu^2}{2}\frac{\partial^2}{\partial x^2} + \ldots\Big) \\
&+ (D_x + \nu y)\Big(1 + \nu\frac{\partial}{\partial x} + \frac{\nu^2}{2}\frac{\partial^2}{\partial x^2} + \ldots\Big) \\
&+ (B_y + \nu x)\Big(1 - \nu\frac{\partial}{\partial y} + \frac{\nu^2}{2}\frac{\partial^2}{\partial y^2} + \ldots\Big) \\
&+ \big(D_y + \nu(1-x-2y) - \nu^2\big)\Big(1 + \nu\frac{\partial}{\partial y} + \frac{\nu^2}{2}\frac{\partial^2}{\partial y^2} + \ldots\Big) \Big] p_{x,y}
\end{aligned}
\]

We now distribute, rearrange and simplify terms by differential order, and as an approximation drop all terms of order greater than 2. Additionally, as N increases, the change in x and y each time a game is played goes to zero, so to compensate for this we wish to increase the number of games played per unit of time at the same rate. This means we multiply everything by N, which is equivalent to dividing by ν. All of this together yields:

\[
\begin{aligned}
\frac{d}{dt}p_{x,y}(t) = 2\gamma\Big[ &\big[(1-y-A) - \nu\big] + \big[(-B_x + D_x) + (A+y)\nu + \nu^2\big]\frac{\partial}{\partial x} \\
&+ \frac{1}{2}\big[(B_x + D_x)\nu + (1-y-A)\nu^2 - \nu^3\big]\frac{\partial^2}{\partial x^2} \\
&+ \big[(1-2y) - \nu\big] + \big[(-B_y + D_y) + (1-2x-2y)\nu - \nu^2\big]\frac{\partial}{\partial y} \\
&+ \frac{1}{2}\big[(B_y + D_y)\nu + (1-2y)\nu^2 - \nu^3\big]\frac{\partial^2}{\partial y^2} \Big] p_{x,y}
\end{aligned}
\]

which is the Fokker-Planck equation for this model. This is like the master equation, in that it is a set of deterministic differential equations in p_{x,y} which represents the probability mass of the system being in each state (x, y). However, it is now continuous in x and y as well as t. Once in this form, we can use the continuous nature of the Fokker-Planck equation to construct a stochastic model that corresponds to the probabilities it represents. This will mimic the original model, but with continuous x, y, and t. We note that our Fokker-Planck equation has the form

\[
\frac{d}{dt}p_{x,y}(t) = C\Big[ -F_1\frac{\partial}{\partial x} + \frac{1}{2}G_1\frac{\partial^2}{\partial x^2} - F_2\frac{\partial}{\partial y} + \frac{1}{2}G_2\frac{\partial^2}{\partial y^2} \Big] p_{x,y}
\]

where F_1 and F_2 are drift terms and G_1 and G_2 are diffusion terms [97]. For a Fokker-Planck equation in our form, the Langevin equation

\[
\begin{aligned}
dx &= F_1(x,y)\,dt + \sqrt{G_1(x,y)}\,dW_t^x \\
dy &= F_2(x,y)\,dt + \sqrt{G_2(x,y)}\,dW_t^y
\end{aligned}
\]

will approximate the original model [97,98]. When N is very large, ν is very small, so higher powers of ν will be insignificant compared to lower powers, and we additionally approximate by dropping the higher powers of ν in each term. Putting all of these together gives the Langevin equations

\[
\begin{aligned}
dx &= (B_x - D_x)\,dt + \sqrt{(B_x + D_x)\nu}\,dW_t^x \\
dy &= (B_y - D_y)\,dt + \sqrt{(B_y + D_y)\nu}\,dW_t^y
\end{aligned}
\]

which, substituting our original variables back in, gives

\[
\begin{aligned}
dx &= 2\gamma\big(x(A-x) - xy\big)\,dt + \frac{1}{\sqrt{N}}\sqrt{2\gamma\big(x(A-x) + xy\big)}\,dW_t^x \\
dy &= 2\gamma\big(x(1-A-y) - y(1-x-y)\big)\,dt + \frac{1}{\sqrt{N}}\sqrt{2\gamma\big(x(1-A-y) + y(1-x-y)\big)}\,dW_t^y
\end{aligned}
\]

which is the Langevin equation for the Fokker-Planck model we define on page 89.
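As a sanity check on this result (our own illustration, not part of Chapter 4), the Langevin equations above can be integrated numerically with a simple Euler-Maruyama scheme; the values of γ, A, N, the initial condition, and the step size below are arbitrary placeholders.

import numpy as np

def simulate_langevin(A=0.4, gamma=1.0, N=1000, x0=0.2, y0=0.3,
                      dt=1e-3, steps=20_000, rng=np.random.default_rng(0)):
    """Euler-Maruyama integration of the Langevin equations derived above.
    Parameter values here are illustrative, not those used in Chapter 4."""
    x, y = x0, y0
    path = np.empty((steps, 2))
    for t in range(steps):
        bx, dx_ = x * (A - x), x * y                  # B_x, D_x
        by, dy_ = x * (1 - A - y), y * (1 - x - y)    # B_y, D_y
        dWx, dWy = rng.normal(scale=np.sqrt(dt), size=2)
        x += 2 * gamma * (bx - dx_) * dt + np.sqrt(2 * gamma * max(bx + dx_, 0.0) / N) * dWx
        y += 2 * gamma * (by - dy_) * dt + np.sqrt(2 * gamma * max(by + dy_, 0.0) / N) * dWy
        x, y = np.clip(x, 0.0, 1.0), np.clip(y, 0.0, 1.0)   # keep fractions in [0, 1]
        path[t] = (x, y)
    return path

path = simulate_langevin()
print(path[-1])   # final (x, y) of one sample path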

Bibliography

[1] Smith, J. M. and G. R. Price (1973) “The logic of animal conflict,” Nature, 246(5427), p. 15.

[2] Hofbauer, J. and K. Sigmund (2003) “Evolutionary game dynamics,” Bulletin of the American Mathematical Society, 40(4), pp. 479–519.

[3] Schuster, P. and K. Sigmund (1983) "Replicator dynamics," Journal of theoretical biology, 100(3), pp. 533–538.

[4] Arthur, W. B. (1994) "Inductive reasoning and bounded rationality," The American economic review, 84(2), pp. 406–411.

[5] Harsanyi, J. C. (1967) “Games with incomplete information played by “Bayesian” players, I–III Part I. The basic model,” Management Science, 14(3), pp. 159–182.

[6] Bonabeau, E. (2002) "Agent-based modeling: Methods and techniques for simulating human systems," Proceedings of the national academy of sciences, 99(suppl 3), pp. 7280–7287.

[7] Yang, C., S. O. Prasher, J. Landry, and A. DiTommaso (2000) “Application of artificial neural networks in image recognition and classification of crop and weeds,” Canadian agricultural engineering, 42(3), pp. 147–152.

[8] Young, M. J. and A. Belmonte (2020) "Simultaneous games with purchase of randomly supplied perfect information: Oracle Games," arXiv preprint arXiv:2002.08309.

[9] Eliaz, K. and A. Schotter (2010) “Paying for confidence: An experimental study of the demand for non-instrumental information,” Games and Economic Behavior, 70(2), pp. 304–324.

[10] McDonald, J. (1996) Strategy in poker, business and war, WW Norton and Company.

[11] Morris, S. and H. S. Shin (2002) “Social value of public information,” American Economic Review, 92(5), pp. 1521–1534.

[12] Asahina, K., V. Pavlenkovich, and L. B. Vosshall (2008) "The survival advantage of olfaction in a competitive environment," Current Biology, 18(15), pp. 1153–1155.

[13] Gabaix, X., D. Laibson, G. Moloche, and S. Weinberg (2006) “Costly information acquisition: Experimental analysis of a boundedly rational model,” American Economic Review, 96(4), pp. 1043–1068.

[14] Hellwig, C. and L. Veldkamp (2009) “Knowing what others know: Coordination motives in information acquisition,” The Review of Economic Studies, 76(1), pp. 223–251.

[15] Myatt, D. P. and C. Wallace (2012) “Endogenous information acquisition in coordination games,” The Review of Economic Studies, 79(1), pp. 340–374.

[16] Rigos, A. (2018) “Flexible Information Acquisition in Large Coordination Games,” Preprint at https://swopec.hhs.se/lunewp/abs/lunewp2018_030.htm.

[17] Myatt, D. P. and C. Wallace (2015) "Cournot competition and the social value of information," Journal of Economic Theory, 158, pp. 466–506.

[18] Yang, M. (2015) “Coordination with flexible information acquisition,” Journal of Economic Theory, 158, pp. 721–738.

[19] Szkup, M. and I. Trevino (2015) “Information acquisition in global games of regime change,” Journal of Economic Theory, 160, pp. 387–428.

[20] Li, Z., H. Yang, and L. Zhang (2019) “Pre-communication in a coordination game with incomplete information,” International Journal of Game Theory, 48(1), pp. 109–141.

[21] Hu, Y., J. Kagel, H. Yang, and L. Zhang (2018) “The effects of pre-play communication in a coordination game with incomplete information,” SSRN.

[22] Martinelli, C. (2007) “Rational ignorance and voting behavior,” International Journal of Game Theory, 35(3), pp. 315–335.

[23] Halpern, J. Y. and R. Pass (2015) “Algorithmic rationality: Game theory with costly computation,” Journal of Economic Theory, 156, pp. 246–268.

[24] Ben-Porath, E. and M. Kahneman (2003) “Communication in repeated games with costly monitoring,” Games and Economic Behavior, 44(2), pp. 227–250.

[25] Flesch, J. and A. Perea (2009) “Repeated games with voluntary information purchase,” Games and Economic Behavior, 66(1), pp. 126–145.

[26] Miklós-Thal, J. and H. Schumacher (2013) “The value of recommendations,” Games and Economic Behavior, 79, pp. 132–147.

[27] Sakai, Y. (1986) "Cournot and Bertrand equilibria under imperfect information," Journal of Economics, 46(3), pp. 213–232.

[28] Ruiz-Hernández, D., J. Elizalde, and D. Delgado-Gómez (2017) "Cournot–Stackelberg games in competitive delocation," Annals of Operations Research, 256(1), pp. 149–170.

[29] Halpern, J. Y. and R. Pass (2018) “Game theory with translucent players,” International Journal of Game Theory, 47(3), pp. 949–976.

[30] Antonioni, A., M. P. Cacault, R. Lalive, and M. Tomassini (2014) “Know thy neighbor: Costly information can hurt cooperation in dynamic networks,” PloS One, 9(10), p. e110788.

[31] Solan, E. and L. Yariv (2004) “Games with espionage,” Games and Economic Behavior, 47(1), pp. 172–199.

[32] González-Díaz, J., I. García-Jurado, and M. G. Fiestras-Janeiro (2010) An introductory course on mathematical game theory, American Mathematical Society.

[33] Nowak, M. (2006) “Five rules for the evolution of cooperation,” Science, 314, pp. 1560–1563.

[34] Hamburger, H. (1973) "N-person prisoner's dilemma," Journal of Mathematical Sociology, 3(1), pp. 27–48.

[35] West, S., A. Griffen, and A. Gardner (2007) “Evolutionary explanations for cooperation.” Current Biology, 17(16), pp. R661–R672.

[36] Hamilton, W. (1964) “The genetical theory of social behaviour, I, II.” Journal of Theoretical Biology, 7, pp. 17–52.

[37] Grafen, A. (1984) "Natural selection, kin selection and group selection," Behavioural ecology, 2, pp. 62–84.

[38] Brown, G. E. and J. A. Brown (1996) “Kin discrimination in salmonids,” Reviews in Fish Biology and Fisheries, 6(2), pp. 201–219.

[39] Strassmann, J. E., O. M. Gilbert, and D. C. Queller (2011) "Kin discrimination and cooperation in microbes," Annual review of microbiology, 65, pp. 349–367.

[40] Miller, S. and J. Knowles (2016) “The emergence of cooperation in public goods games on randomly growing dynamic networks,” , pp. 363–378.

[41] Hauert, C., S. D. Monte, J. Hofbauer, and K. Sigmund (2002) “Volunteering as red queen mechanism for cooperation in public goods games,” Science, 296(5570), pp. 1129–1132.

[42] Fehr, E. and S. Gächter (2000) "Cooperation and Punishment in Public Goods Experiments," American Economic Review, 90(4), pp. 980–994.

[43] Hauert, C., A. Traulsen, H. D. Silva, M. Nowak, and K. Sigmund (2008) “Public goods with punishment and abstaining in finite and infinite populations,” Biological Theory, 3(2), pp. 114–122.

[44] Hauert, C. (2010) “Replicator dynamics of reward and reputation in public goods games,” Journal of Theoretical Biology, 267, pp. 22–8.

[45] Archetti, M. and I. Scheuring (2012) “Game theory of public goods in one-shot social dilemmas without assortment,” Journal of theoretical biology, 299, pp. 9–20.

[46] Hemker, H. and P. Hemker (1969) "The kinetics of enzyme cascade systems: General kinetics of enzyme cascades," Proceedings of the Royal Society of London. Series B. Biological Sciences, 173(1032), pp. 411–420.

[47] Packer, C., D. Scheel, and A. E. Pusey (1990) “Why lions form groups: food is not enough,” The American Naturalist, 136(1), pp. 1–19.

[48] Bshary, R. (2010) “Cooperation between unrelated individuals—a game theoretic approach,” in Animal behaviour: evolution and mechanisms (P. Kappeler, ed.), chap. 8, Springer, pp. 213–240.

[49] Wedekind, C. and M. Milinski (2000) “Cooperation through image scoring in humans,” Science, 288(5467), pp. 850–852.

[50] Soares, M. C., R. Bshary, S. C. Cardoso, and I. M. Côté (2008) “The meaning of jolts by fish clients of cleaning gobies,” Ethology, 114(3), pp. 209–214.

[51] Hamilton, W. D. (1971) “Geometry for the selfish herd,” Journal of theoretical Biology, 31(2), pp. 295–311.

[52] Clutton-Brock, T., P. Brotherton, M. O'Riain, A. Griffin, D. Gaynor, L. Sharpe, R. Kansky, M. B. Manser, and G. McIlrath (2000) "Individual contributions to babysitting in a cooperative mongoose, Suricata suricatta," Proceedings of the Royal Society of London. Series B: Biological Sciences, 267(1440), pp. 301–305.

[53] Camerer, C. F. (2010) "Behavioural game theory," in Behavioural and Experimental Economics, Springer, pp. 42–50.

[54] Orbell, J. M., A. J. Van de Kragt, and R. M. Dawes (1988) “Explaining discussion-induced cooperation.” Journal of Personality and social Psychology, 54(5), p. 811.

[55] Ostrom, E., R. Gardner, J. Walker, and J. Walker (1994) Rules, games, and common-pool resources, University of Michigan Press.

[56] Güth, W., R. Schmittberger, and B. Schwarze (1982) "An experimental analysis of ultimatum bargaining," Journal of economic behavior & organization, 3(4), pp. 367–388.

[57] Rabin, M. (1993) “Incorporating fairness into game theory and economics,” The American economic review, pp. 1281–1302.

[58] Zhu, Q., S. Rajtmajer, and A. Belmonte (in preparation) "The emergence of fairness in an agent-based ultimatum game."

[59] Rajtmajer, S., A. Squicciarini, J. M. Such, J. Semonsen, and A. Belmonte (2017) “An ultimatum game model for the evolution of privacy in jointly managed content,” in International Conference on Decision and Game Theory for Security, Springer, pp. 112–130.

[60] Santos, F. P., F. C. Santos, A. Paiva, and J. M. Pacheco (2015) "Evolutionary dynamics of group fairness," Journal of theoretical biology, 378, pp. 96–102.

[61] Gore, J., H. Youk, and A. Van Oudenaarden (2009) “Snowdrift game dynamics and facultative cheating in yeast,” Nature, 459(7244), p. 253.

[62] Axelrod, R. (1980) “Effective choice in the prisoner’s dilemma,” Journal of conflict resolution, 24(1), pp. 3–25.

[63] ——— (1980) “More effective choice in the prisoner’s dilemma,” Journal of Conflict Resolution, 24(3), pp. 379–403.

[64] Robbins, H. (1952) “Some aspects of the sequential design of experiments,” Bulletin of the American Mathematical Society, 58(5), pp. 527–535.

[65] Kraines, D. and V. Kraines (1989) “Pavlov and the prisoner’s dilemma,” Theory and decision, 26(1), pp. 47–79.

[66] Nowak, M. and K. Sigmund (1993) “A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner’s Dilemma game,” Nature, 364(6432), p. 56.

[67] Imhof, L. A., D. Fudenberg, and M. A. Nowak (2007) “Tit-for-tat or win-stay, lose-shift?” Journal of theoretical biology, 247(3), pp. 574–580.

[68] Sinervo, B. and C. M. Lively (1996) “The rock–paper–scissors game and the evolution of alternative male strategies,” Nature, 380(6571), p. 240.

[69] Bleay, C., T. Comendant, and B. Sinervo (2007) “An experimental test of frequency-dependent selection on male mating strategy in the field,” Proceedings of the Royal Society B: Biological Sciences, 274(1621), pp. 2019–2025.

[70] Mills, S. C., L. Hazard, L. Lancaster, T. Mappes, D. Miles, T. A. Oksanen, and B. Sinervo (2008) "Gonadotropin hormone modulation of testosterone, immune function, performance, and behavioral trade-offs among male morphs of the lizard Uta stansburiana," The American Naturalist, 171(3), pp. 339–357.

[71] Sinervo, B., D. B. Miles, W. A. Frankino, M. Klukowski, and D. F. DeNardo (2000) "Testosterone, endurance, and Darwinian fitness: natural and sexual selection on the physiological bases of alternative male behaviors in side-blotched lizards," Hormones and Behavior, 38(4), pp. 222–233.

[72] Chalfoun, A. D. and T. E. Martin (2010) “Facultative nest patch shifts in response to nest predation risk in the Brewer’s sparrow: a “win-stay, lose-switch” strategy?” Oecologia, 163(4), pp. 885–892.

[73] McCoy, A. N. and M. L. Platt (2005) “Risk-sensitive neurons in macaque posterior cingulate cortex,” Nature neuroscience, 8(9), p. 1220.

[74] Hayden, B. Y. and M. L. Platt (2009) “Gambling for Gatorade: risk-sensitive decision making for fluid rewards in humans,” Animal cognition, 12(1), pp. 201–207.

[75] Worthy, D. A., M. J. Hawthorne, and A. R. Otto (2013) “Heterogeneity of strategy use in the Iowa gambling task: A comparison of win-stay/lose-shift and reinforcement learning models,” Psychonomic bulletin & review, 20(2), pp. 364–371.

[76] Olton, D. S. and P. Schlosberg (1978) “Food-searching strategies in young rats: Win-shift predominates over win-stay.” Journal of Comparative and Physiological Psychology, 92(4), p. 609.

[77] Means, L. W. (1988) “Rats acquire win-stay more readily than win-shift in a water escape situation,” Animal Learning & Behavior, 16(3), pp. 303–311.

[78] Rosenblatt, F. (1958) “The perceptron: a probabilistic model for information storage and organization in the brain.” Psychological review, 65(6), p. 386.

[79] Marini, F., R. Bucci, A. Magrì, and A. Magrì (2008) “Artificial neural networks in chemometrics: History, examples and perspectives,” Microchemical journal, 88(2), pp. 178–185.

[80] Wasserman, P. D. and T. Schwartz (1988) “Neural networks. II. What are they and why is everybody so interested in them now?” IEEE Expert, 3(1), pp. 10–15.

[81] Warner, B. and M. Misra (1996) “Understanding neural networks as statistical tools,” The american statistician, 50(4), pp. 284–293.

[82] Iyer, R., Y. Li, H. Li, M. Lewis, R. Sundar, and K. Sycara (2018) "Transparency and explanation in deep reinforcement learning neural networks," in Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, pp. 144–150.

[83] Bologna, G. and Y. Hayashi (2017) "Characterization of symbolic rules embedded in deep DIMLP networks: a challenge to transparency of deep learning," Journal of Artificial Intelligence and Soft Computing Research, 7(4), pp. 265–286.

[84] Bhatia, S. and R. Golman (2014) “A recurrent neural network for game theoretic decision making,” in Proceedings of the Annual Meeting of the Cognitive Science Society, vol. 36.

[85] Schuster, A. and Y. Yamaguchi (2010) “Application of game theory to neuronal networks,” Advances in Artificial Intelligence, 2010.

[86] Choudhury, N. D. and S. Goswami (2009) "Transmission loss allocation using game theory based artificial neural networks," in 2009 6th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, vol. 1, IEEE, pp. 186–189.

[87] Radford, A., L. Metz, and S. Chintala (2015) “Unsupervised representation learning with deep convolutional generative adversarial networks,” arXiv preprint arXiv:1511.06434.

[88] Hassibi, B. and D. G. Stork (1993) “Second order derivatives for network pruning: Optimal brain surgeon,” in Advances in neural information processing systems, pp. 164–171.

[89] Fürnkranz, J. (1997) “Pruning algorithms for rule learning,” Machine learning, 27(2), pp. 139–172.

[90] Quinlan, J. R. (1987) "Simplifying decision trees," International journal of man-machine studies, 27(3), pp. 221–234.

[91] Hinton, G. E., N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov (2012) “Improving neural networks by preventing co-adaptation of feature detectors,” arXiv preprint arXiv:1207.0580.

[92] Mitchell, M. (1998) An introduction to genetic algorithms, MIT press.

[93] Stahl, D. O. (1993) "Evolution of Smart_n players," Games and Economic Behavior, 5(4), pp. 604–617.

[94] Camerer, C. F., T.-H. Ho, and J.-K. Chong (2004) “A cognitive hierarchy model of games,” The Quarterly Journal of Economics, 119(3), pp. 861–898.

[95] Brañas-Garza, P., T. Garcia-Munoz, and R. H. González (2012) “Cognitive effort in the beauty contest game,” Journal of Economic Behavior & Organization, 83(2), pp. 254–260.

[96] Legg, S. and M. Hutter (2007) “Universal intelligence: A definition of machine intelligence,” Minds and machines, 17(4), pp. 391–444.

[97] Öttinger, H. C. (2012) Stochastic processes in polymeric fluids: tools and examples for developing simulation algorithms, Springer Science & Business Media.

[98] Van Kampen, N. G. (1992) Stochastic processes in physics and chemistry, vol. 1, Elsevier.

Vita

Matthew Young

Education

• Ph.D. Mathematics, Pennsylvania State University, 2020

• B.S. Mathematics and Physics, University of Oklahoma, 2014

Selected Presentations

Non-cooperative strategic games with cost of partial information: Oracle Games poster, SIAM Conference on Applications of Dynamical Systems, Snowbird UT, May 2017.

Convergence to Fair Contributions in a Stochastic Nonlinear Public Goods Game With Random Subgroup Associations, Joint Mathematics Meetings, Baltimore MD, January 2019.

Population dynamics in cyclic games with restricted strategy transitions, Barry Sinervo’s game theory class, University of California Santa Cruz, March 2020.