Mean Field Asymptotic of Markov Decision Evolutionary Games and Teams
Hamidou Tembine, Jean-Yves Le Boudec, Rachid El-Azouzi, Eitan Altman *†

* This work was partially supported by the INRIA ARC Program: Populations, Game Theory, and Evolution (POPEYE) and by an EPFL PhD internship grant.
† H. Tembine and R. El-Azouzi are with University of Avignon, LIA/CERI, France; J.-Y. Le Boudec is with EPFL, Laboratory for Computer Communications and Applications, Lausanne, Switzerland; E. Altman is with INRIA, MAESTRO Group, Sophia-Antipolis, France.

Abstract

We introduce Markov Decision Evolutionary Games with N players, in which each individual in a large population interacts with other randomly selected players. The states and actions of each player in an interaction together determine the instantaneous payoff for all involved players. They also determine the transition probabilities to move to the next state. Each individual wishes to maximize the total expected discounted payoff over an infinite horizon. We provide a rigorous derivation of the asymptotic behavior of this system as the size of the population grows to infinity. We show that under any Markov strategy, the random process consisting of one specific player and the remaining population converges weakly to a jump process driven by the solution of a system of differential equations. We characterize the solutions to the team and to the game problems at the limit of infinite population and use these to construct almost optimal strategies for the case of a finite, but large, number of players. We show that the large-population asymptotic of the microscopic model is equivalent to a (macroscopic) Markov decision evolutionary game in which a local interaction is described by a single player against a population profile. We illustrate our model by deriving the equations for a dynamic evolutionary Hawk and Dove game with energy levels.

1. Introduction

We consider a large population of players in which frequent interactions occur between small numbers of chosen individuals. Each player is thus involved in infinitely many interactions with other randomly selected players. Each interaction in which a player is involved can be described as one stage of a dynamic game. The state and actions of the players at each stage determine an immediate payoff (also called fitness in behavioral ecology) for each player, as well as the transition probabilities of a controlled Markov chain associated with each player. Each player wishes to maximize its expected fitness averaged over time.

This model extends basic evolutionary games by introducing a controlled state that characterizes each player. Stochastic dynamic games at each interaction replace the matrix games, and the objective of maximizing the expected long-term payoff over an infinite time horizon replaces the objective of maximizing the outcome of a matrix game. Instead of choosing a (possibly mixed) action, a player is now faced with the choice of decision rules (called strategies) that determine what action should be chosen at a given interaction for given present and past observations.

This model with a finite number of players, called a mean field interaction model, is in general difficult to analyze because of the huge state space required to describe the states of all players. Taking the asymptotics as the number of players grows to infinity, the whole behavior of the population is replaced by a deterministic limit that represents the system's state, namely the fraction of the population in each individual state that uses a given action.

In this paper we study the asymptotic dynamic behavior of the system, in which the population profile evolves in time. For large N, under mild assumptions (see Section 3), the mean field converges to a deterministic measure that satisfies a non-linear ordinary differential equation under any stationary strategy. We show that the mean field interaction is asymptotically equivalent to a Markov decision evolutionary game. When the rest of the population uses a fixed strategy u, any given player sees an equivalent game against a collective of players whose state evolves according to an ordinary differential equation (ODE), which we explicitly compute. In addition to providing the exact limiting asymptotic, the ODE approach provides tight approximations for fixed large N. The mean field asymptotic calculations for large N for given choices of strategies allow us to compute the equilibrium of the game in the asymptotic regime.

Related Work. Mean field interaction models have already been used in standard evolutionary games in a completely different context: that of evolutionary game dynamics (such as replicator dynamics); see e.g. [7] and references therein. The paradigm there has been to associate a relative growth rate to actions according to the fitness they achieve, and then to study the asymptotic trajectories of the state of the system, i.e. the fraction of users that adopt the different actions. Non-atomic Markov Decision Evolutionary Games have been applied in [8] to firm idiosyncratic random shocks and in [1] to cellular communications.

Structure. The remainder of this paper is organized as follows. In the next section we present the model assumptions and notations. In Section 3 we present convergence results of the ODE in the random number of interacting players. In Section 4 a resource competition between animals with two types of behaviors and several states is presented. All sketches of proofs are given in the Appendix. Section 5 concludes the paper.

2. Model description

2.1. Markov Decision Evolutionary Process With N Players

We consider the following model, which we call a Markov Decision Evolutionary Game with N players.

• There are N ∈ ℕ players.
• Each player has its own state. A state has two components: the type of the player and the internal state. The type is constant during the game. The state of player j at time t is denoted by X_j^N(t) = (q_j, S_j^N(t)), where q_j is the type. The set of possible states 𝒳 = {1, ..., Q} × 𝒮 is finite.
• Time is discrete, taking values in (1/N)ℕ := {0, 1/N, 2/N, ...}.
• The global detailed description of the system at time t is X^N(t) = (X_1^N(t), ..., X_N^N(t)).

Define M^N(t) to be the current population profile, i.e. M_x^N(t) = (1/N) ∑_{j=1}^N 1{X_j^N(t) = x}. At each time t, M^N(t) lies in the finite set {0, 1/N, 2/N, ..., 1}^𝒳, and M_{q,s}^N(t) is the fraction of players who belong to the population of type q (also called subpopulation q) and have internal state s. Also let M̄_q^N = N ∑_{s∈𝒮} M_{q,s}^N(t) be the size of subpopulation q (independent of t by hypothesis). We do not make any specific hypothesis on the ratios M̄_q^N / N as N gets large (they may be constant or not; they may tend to 0 or not).

• Strategies and local interaction: At time slot t, an ordered list B^N(t) of players in {1, 2, ..., N}, without repetition, is selected randomly as follows. First, we draw a random number of players K(t) (the distribution governing this selection is the probability distribution J^N below). Each player j ∈ B^N(t) takes part in a one-shot event at time t, as follows. First, the player chooses an action a in the finite set 𝒜 with probability u_q(a|s), where (q, s) is the current player state. The stochastic array u is the strategy profile of the population, and u_q is the strategy of subpopulation q. A vector of probability distributions u which depends only on the type of the player and its internal state is called a stationary strategy.

Second, say that B^N(t) = (j_1, ..., j_k). Given the actions a_{j_1}, ..., a_{j_k} drawn by the k players, we draw a new set of internal states (s'_{j_1}, ..., s'_{j_k}) with probability L^N_{q,s,a,s'}(k, m⃗), where

q = (q_{j_1}, ..., q_{j_k}),  s = (s_{j_1}, ..., s_{j_k}),  a = (a_{j_1}, ..., a_{j_k}),  s' = (s'_{j_1}, ..., s'_{j_k}).

Then the collection of k players makes one synchronized transition, such that

S_{j_i}^N(t + 1/N) = s'_{j_i},  i = 1, ..., k.

Note that S_j^N(t + 1/N) = S_j^N(t) if j is not in B^N(t).

It can easily be shown that this form of interaction has the following properties: (1) X^N is Markov, and (2) players can be observed only through their state.

The model is entirely specified by the probability distributions J^N, the Markov transition kernels L^N, and the strategy profile u. In this paper, we assume that J^N and L^N are fixed for all N, but u can be changed and does not depend on N (though it would be trivial to extend our results to strategies that depend on N, this appears to be an unnecessary complication). We are interested in large N.

It follows from our assumptions that

1. M^N(t) is Markov.
2. For any fixed j ∈ {1, ..., N}, (X_j^N(t), M^N(t)) is Markov. This means that the evolution of one specific player X_j^N(t) depends on the other players only through the occupancy measure M^N(t).

2.2. Payoffs

We consider two types of instantaneous payoff and one discounted payoff:

• Instant Gain: This is the random gain G_j^N(t) obtained by one player whenever it is involved in an event at time t.
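The dynamics of Section 2.1 can be sketched in code. The following is a minimal toy instance, not the paper's general model: one type (Q = 1), internal states 𝒮 = {0, 1}, actions 𝒜 = {0, 1}, and pairwise meetings (k = 2 at every slot). The strategy u and the kernel L below are illustrative assumptions chosen only to make the sketch runnable.

```python
import random
from collections import Counter

S_STATES = (0, 1)   # internal states S (toy choice)
ACTIONS = (0, 1)    # action set A (toy choice)

def u(s):
    """Stationary strategy u_q(.|s): action distribution given internal state s."""
    return [0.7, 0.3] if s == 0 else [0.2, 0.8]

def L(s_pair, a_pair):
    """Toy transition kernel: each involved player lands in state 1 with a
    probability that increases with the opponent's action (an assumption)."""
    a1, a2 = a_pair
    new1 = 1 if random.random() < 0.3 + 0.4 * a2 else 0
    new2 = 1 if random.random() < 0.3 + 0.4 * a1 else 0
    return new1, new2

def profile(states):
    """Occupancy measure M^N(t): fraction of players in each internal state."""
    c = Counter(states)
    return {s: c[s] / len(states) for s in S_STATES}

def simulate(N=1000, steps=5000, seed=0):
    random.seed(seed)
    states = [0] * N                            # all players start in state 0
    for _ in range(steps):                      # each event advances time by 1/N
        j1, j2 = random.sample(range(N), 2)     # ordered list B^N(t), k = 2
        a1 = random.choices(ACTIONS, weights=u(states[j1]))[0]
        a2 = random.choices(ACTIONS, weights=u(states[j2]))[0]
        states[j1], states[j2] = L((states[j1], states[j2]), (a1, a2))
    return profile(states)

m = simulate()
```

Only the two selected players change state at each slot, so the sketch reproduces the property that S_j^N(t + 1/N) = S_j^N(t) for j outside B^N(t), and `profile` plays the role of the occupancy measure M^N(t) through which any single player sees the rest of the population.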
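The introduction states that for large N the mean field converges to a deterministic measure satisfying a non-linear ODE. The following sketch integrates such a limiting ODE for an assumed toy dynamic (not taken from the paper): pairwise meetings in which each of the two involved players lands in state 1 with probability 0.42 + 0.2·m, where m is the current fraction of the population in state 1. With two of N players updating per event and N events per unit of time, the expected drift of m is g(m) = 2·((0.42 + 0.2·m) − m) = 2·(0.42 − 0.8·m).

```python
def g(m):
    """Assumed mean-field drift: 2 updated players per event, each landing
    in state 1 with probability 0.42 + 0.2*m, replacing a fraction m."""
    return 2.0 * (0.42 - 0.8 * m)

def integrate(m0=0.0, dt=0.01, t_end=20.0):
    """Forward-Euler integration of dm/dt = g(m) from m(0) = m0."""
    m, t = m0, 0.0
    while t < t_end:
        m += dt * g(m)
        t += dt
    return m

m_inf = integrate()
# The unique rest point of this linear drift solves 0.42 - 0.8*m = 0,
# i.e. m* = 0.525, and the trajectory converges to it from any m0 in [0, 1].
```

For this toy drift the ODE is linear and converges exponentially to m* = 0.525; in the paper's setting the ODE is non-linear in general and depends on the stationary strategy u, which is what makes the equilibrium computation of Section 3 non-trivial.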