A Dynamic Approach for Load Balancing

Dominique Barth, Olivier Bournez, Octave Boussaton, Johanne Cohen


Dominique Barth
Université de Versailles, Laboratoire PRiSM, 45, avenue des Etats-Unis, 78000 Versailles, FRANCE
Dominique.Barth@prism.uvsq.fr

Olivier Bournez
Ecole Polytechnique, Laboratoire LIX, 91128 Palaiseau Cedex, FRANCE
Olivier.Bournez@polytechnique.fr

Octave Boussaton
LORIA, CNRS, 615 Rue du Jardin Botanique, 54602 Villers-Lès-Nancy, FRANCE
Octave.Boussaton@loria.fr

Johanne Cohen
Université de Versailles, 45, avenue des Etats-Unis, 78000 Versailles, FRANCE
Johanne.Cohen@prism.uvsq.fr

ABSTRACT

We study how to reach a Nash equilibrium in a load balancing scenario where each task is managed by a selfish agent and attempts to migrate to a machine which will minimize its cost. The cost of a machine is a function of the load on it, and the load on a machine is the sum of the weights of the jobs running on it. We prove that Nash equilibria can be learned on these games with incomplete information, using some Lyapunov techniques.

1. INTRODUCTION

We consider in this paper a now classical setting for selfish load balancing, where some dynamical aspects are introduced. We consider tasks or jobs (players) whose objective is to minimize their own cost. Such games are sometimes called the KP model [21, 28]. We consider a particular learning scenario for the tasks in this game that we prove to converge to Nash equilibria of the game. Our learning algorithm, already considered in [26] for general games, is proved to be convergent on these allocation games, through some Lyapunov function.

The setting is the following [21, 28]: weighted tasks or jobs (players) are to be assigned to a set of machines having different speeds, and tasks try to minimize their own cost. Their cost under an assignment A depends on the load of the machine they selected, and the load of a machine under A is the sum of the weights of the jobs running on it. We consider a system in which jobs are unsplittable, so that a player's entire job goes to one machine, according to this player's probability vector.

More formally, we consider a set of n independent tasks (players) with weights w_1, ..., w_n respectively. We assume the weights to be positive integers. These tasks are to be associated with E, a set of m resources (machines). Let [n] = {1, ..., n} denote the set of tasks and [m] = {1, ..., m} the set of machines.

An assignment A : [n] → [m] specifies for each task i the machine on which it runs: each task i selects exactly one machine A[i] (also denoted by a_i by abuse of notation).

The load λ_ℓ of machine ℓ under the assignment A is the sum of the weights of the tasks running on it: λ_ℓ = Σ_{i : a_i = ℓ} w_i. The cost of machine ℓ corresponds to its finish time, i.e., it is a function of its load, denoted by C_ℓ(λ_ℓ). The cost of task i under the assignment A corresponds to the cost of machine a_i, i.e., its cost is c_i = C_{a_i}(λ_{a_i}).

We will focus in this paper on linear (affine) cost functions: C_ℓ(λ) = α_ℓ λ + β_ℓ, with α_ℓ, β_ℓ ≥ 0. This is sometimes called the uniformly related machine case [28].

We assume from now on that all costs are not greater than 1. This is without loss of generality, since dividing all costs by some big enough constant preserves the dynamics considered in what follows.

We assume that tasks (players) migrate selfishly, without any centralized control, and with only a local view of the system. All of the tasks know how many resources (machines) are available. At each elementary step t+1, they know their current cost and the machine they chose at step t. We study how to reach a (mixed) Nash equilibrium in this load balancing scenario. Each agent i selects a machine (at each time step, or instance of the game) using a mixed strategy q_i(t) at step t, with q_{i,ℓ}(t) denoting the probability for agent i to select machine ℓ at step t.

We are interested in building distributed learning algorithms for the tasks so that (q_i(t))_{i∈[n]} converges towards some (possibly mixed) Nash equilibrium of the system.
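To make this setting concrete, here is a minimal Python sketch of the cost structure. The weights w and the machine coefficients alpha, beta are illustrative values chosen for this example, not data from the paper.

```python
# Sketch of the allocation model with affine machine costs
# C_l(lambda) = alpha[l] * lambda + beta[l]; illustrative parameters.

w = [2, 1, 3]           # task weights w_1, ..., w_n
alpha = [0.05, 0.10]    # slope alpha_l of each machine
beta = [0.01, 0.02]     # offset beta_l of each machine

def loads(assignment):
    """Load lambda_l of each machine: sum of the weights assigned to it."""
    lam = [0] * len(alpha)
    for i, machine in enumerate(assignment):
        lam[machine] += w[i]
    return lam

def task_cost(i, assignment):
    """Cost c_i of task i: the (affine) cost of the machine it selected."""
    lam = loads(assignment)
    l = assignment[i]
    return alpha[l] * lam[l] + beta[l]

A = [0, 1, 0]           # tasks 0 and 2 on machine 0, task 1 on machine 1
print(loads(A))         # [5, 1]
print(task_cost(0, A))  # 0.05 * 5 + 0.01 = 0.26
```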
Notice that such games are known to always admit at least one pure Nash equilibrium: see [15, 9] for a proof. At least one of them is a global optimum.

2. RELATED WORK

The allocation problems that we consider in this paper can also be formally defined as a routing problem between two nodes connected by parallel edges with possibly different speeds. Each agent has an amount of traffic to map to one of the edges, so that the load on this edge is as small as possible [21, 28].

The price of anarchy, introduced by [21], comparing the cost of Nash equilibria to the cost of the optimum (social cost), has been intensively studied on these games: see e.g. [28] for a reference introduction.

Similar games but with more general cost functions have also been considered: see e.g. [28, 4, 7, 23], and all references in [28]. Refer to [8, 12, 22] for surveys about results for selfish load balancing and selfish routing on parallel links.

There are a few works considering dynamic versions of these allocation games, where agents try to learn Nash equilibria, in the spirit of this paper.

First, notice that the proof of existence of a pure Nash equilibrium can be turned into a dynamic: players play in turn, and move to machines with a lower load. Such a strategy can be proved to lead to a pure Nash equilibrium. Bounds on the convergence time have been investigated in [9, 10]. The considered algorithms are centralized and require complete information games. The obtained bounds mostly follow from bounding the possible variations of suitable potential functions. Since players play in turns, this is often called the Elementary Step System. Other results of convergence in this model have been investigated in [16, 23, 25].

Concerning models that allow concurrent redecisions, we can mention the following works. In [11], tasks are allowed to migrate in parallel from overloaded to underloaded resources. The process is proved to terminate in expected O(log log n + log m) rounds. The analysis is restricted to the case of unitary weights and identical machines, and the considered process requires a global knowledge: one must determine whether one's load is above or under the average.

A stochastic version of the best-response dynamic avoiding that latter problem has been investigated in [2]. It is proved to terminate in expected O(log log n + m^4) rounds for uniform tasks and uniform machines. This has been extended to weighted tasks and uniform machines in [3]. The expected time of convergence to an ε-Nash equilibrium is in O(nmW³ε⁻²), where W denotes the maximum weight of any task.

The dynamic considered in this paper has been studied in [26] for general stochastic games. It is proved in [26] that this dynamic is weakly convergent to a function that is a solution of an ordinary differential equation, and this ordinary differential equation turns out to be a replicator equation. A sufficient condition for convergence is proved there, but no Lyapunov function is established for systems similar to the ones considered in this paper.

Replicator equations have been deeply studied in evolutionary game theory [19, 29]. [19] does not restrict to these dynamics, but considers a whole family of dynamics that satisfy folk theorems in the spirit of Theorem 1 below.

Bounds on the rate of convergence of fictitious play dynamics have been established in [17], and in [20] for the best response dynamic. Fictitious play has been reproved to be convergent for zero-sum games using numerical analysis methods, or more generally stochastic approximation theory: fictitious play can be proved to be a Euler discretization of a certain continuous time process [19].

Evolutionary game theory has been applied to routing problems in the Wardrop traffic model in [14, 13].

A replicator equation for the allocation games considered in this paper has been considered in [1], where a Lyapunov function is established. The dynamics considered in [1] is not the same as ours: we have a replicator dynamic where fitnesses are given by true costs, whereas [1] considers marginal costs.

In [5, 6], the replicator dynamics for particular allocation games have been proved to converge to a pure Nash equilibrium by modifying the game costs in order to obtain Lyapunov functions. It has also been proved that only pure Nash equilibria can be learned.

3. GAME THEORETIC SETTINGS

Each one of the n players (tasks) has the same finite set of actions (machines) E, of cardinality m: machines are known by and available to all of the players. An element of E, i.e. of [m], is called a pure strategy.

Define functions d_i : ∏_{j=1}^n E → [0, 1], 1 ≤ i ≤ n, by:

    d_i(a_1, a_2, ..., a_n) = c_i(player j plays strategy a_j ∈ E, 1 ≤ j ≤ n)    (1)

where (a_1, ..., a_n) is the set of pure strategies played by the player team.

In our case, d_i(a_1, a_2, ..., a_n) = C_{a_i}(Σ_{j : a_j = a_i} w_j).

We call d_i the payoff function, or utility function, of player i; the objective of each player is to minimize his payoff.

We assume without loss of generality in this paper that payoffs stay in [0, 1]. If this does not hold, an affine transformation on the payoffs yields this property.

As usual, we want to extend the payoff functions to mixed strategies. To do so, let S denote the simplex of dimension m, i.e. the set of m-dimensional probability vectors:

    S = {q = (q_1, ..., q_m) ∈ [0, 1]^m : Σ_{ℓ=1}^m q_ℓ = 1}.

An element of S is called a mixed strategy for a player.

Let K = S^n be the space of mixed strategies.

Payoff functions d_i defined on pure strategies in Equation (1) can be extended to functions d_i on the space of mixed strategies K as follows:

    d_i(q_1, ..., q_n) = E[c_i | player j plays mixed strategy q_j, 1 ≤ j ≤ n]
                       = Σ_{j_1,...,j_n} d_i(j_1, ..., j_n) ∏_{s=1}^n q_{s,j_s}    (2)

where (q_1, ..., q_n) is the set of mixed strategies played by the player team, and q_{j,ℓ} denotes the probability for player j to play strategy ℓ.
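For small instances, the extension (2) can be checked by brute-force enumeration of all pure profiles. The sketch below reuses task_cost from the earlier example; it is exponential in n and meant only as an executable restatement of Equation (2).

```python
from itertools import product

def expected_payoff(i, Q):
    """d_i(q_1, ..., q_n) of Equation (2): enumerate all pure profiles.
    Q[j][l] is the probability q_{j,l} that task j selects machine l."""
    n, m = len(Q), len(Q[0])
    total = 0.0
    for profile in product(range(m), repeat=n):
        prob = 1.0
        for j, l in enumerate(profile):
            prob *= Q[j][l]
        total += prob * task_cost(i, list(profile))
    return total

Q = [[0.5, 0.5], [1.0, 0.0], [0.2, 0.8]]
print(expected_payoff(0, Q))
```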

Definition 1. The n-tuple of mixed strategies (q̃_1, ..., q̃_n) is said to be a Nash equilibrium (in mixed strategies) if, for each i, 1 ≤ i ≤ n, we have

    d_i(q̃_1, ..., q̃_{i−1}, q̃_i, q̃_{i+1}, ..., q̃_n) ≤ d_i(q̃_1, ..., q̃_{i−1}, q, q̃_{i+1}, ..., q̃_n)  for all q ∈ S.

It is well known that every n-person game has at least one Nash equilibrium in mixed strategies [24].

Let K* be defined by K* = (S*)^n, where S* = {q ∈ S : q is an m-dimensional probability vector with one component unity}. In other words, the set K* denotes the corners of the simplex K.

Clearly, K* is in one-to-one correspondence with the pure strategy profiles, i.e. with ∏_{j=1}^n E: any pure strategy ℓ corresponds to the unit vector e_ℓ, with ℓth component unity.

Definition 2. The n-tuple of actions (ã_1, ..., ã_n) (or, equivalently, the set of strategies (e_{ã_1}, ..., e_{ã_n})) is called a Nash equilibrium in pure strategies if, for each i, 1 ≤ i ≤ n,

    d_i(ã_1, ..., ã_{i−1}, ã_i, ã_{i+1}, ..., ã_n) ≤ d_i(ã_1, ..., ã_{i−1}, a_i, ã_{i+1}, ..., ã_n)  for all pure strategies a_i.

Unlike general games, the allocation games considered in this paper always admit Nash equilibria in pure strategies [15, 9].

Now the learning problem for the game can be stated as follows. Assume that the game repeats at times k = 0, 1, 2, .... At any instant k, let q_i[k] be the strategy employed by the ith player, and let a_i[k] and c_i[k] be the actual action selected and the payoff received by player i at time k. Find a decentralized learning algorithm for the players, that is, design functions T_i with q_i[k+1] = T_i(q_i[k], a_i[k], c_i[k]), such that q_i[k] → q̃_i as k → +∞, where (q̃_1, ..., q̃_n) is a Nash equilibrium of the game.

4. THE CONSIDERED RANDOMIZED DISTRIBUTED ALGORITHM

We consider the following learning algorithm, already considered in [26].

Definition 3 (Considered Algorithm).

1. At every time step, each task (player) chooses an action according to his current strategy, or Action Probability Vector (APV). Thus, the ith player selects machine ℓ = a_i(k) at instant k with probability q_{i,ℓ}(k).

2. Each player obtains a payoff based on the set of all actions. The reward to player i at time k is c_i(k).

3. Each player updates his APV according to the rule

    q_i(k+1) = q_i(k) + b (1 − c_i(k)) (e_{a_i(k)} − q_i(k)),    (3)

where 0 < b < 1 is a parameter and e_{a_i(k)} is the unit vector of dimension m with a_i(k)th component unity.

Notice that, componentwise, Equation (3) can be rewritten:

    q_{i,ℓ}(k+1) = q_{i,ℓ}(k) − b (1 − c_i(k)) q_{i,ℓ}(k)          if a_i(k) ≠ ℓ,
    q_{i,ℓ}(k+1) = q_{i,ℓ}(k) + b (1 − c_i(k)) (1 − q_{i,ℓ}(k))    if a_i(k) = ℓ.    (4)

Decisions made by the players are completely decentralized: at each time step, player i only needs c_i and q_i, respectively her payoff and her strategy, to update her APV. A sketch of one round of this scheme follows.
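The following sketch transcribes rule (3) directly, with one synchronous round in which all players update at once. It reuses task_cost from the first example, and assumes (as in the model) that the costs fed to the update lie in [0, 1].

```python
import random

def update_apv(q, action, cost, b):
    """Rule (3): q <- q + b * (1 - cost) * (e_action - q).
    Componentwise this is exactly (4); the coordinates still sum to 1."""
    return [ql + b * (1.0 - cost) * ((1.0 if l == action else 0.0) - ql)
            for l, ql in enumerate(q)]

def play_round(Q, b):
    """One round of the algorithm of Definition 3 for all players."""
    m = len(Q[0])
    actions = [random.choices(range(m), weights=qi)[0] for qi in Q]
    for i in range(len(Q)):
        c = task_cost(i, actions)  # assumed rescaled so that c is in [0, 1]
        Q[i] = update_apv(Q[i], actions[i], c, b)
    return Q
```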

Let Q[k] = (q_1(k), ..., q_n(k)) ∈ K denote the state of the player team at instant k. Our interest is in the asymptotic behavior of Q[k] and in its convergence to a Nash equilibrium. Clearly, under the learning algorithm specified by (3), {Q[k], k ≥ 0} is a Markov process.

Observe that this dynamic can also be put in the form

    Q[k+1] = Q[k] + b · G(Q[k], a[k], c[k]),    (5)

where a[k] = (a_1(k), ..., a_n(k)) denotes the actions selected by the player team at k, and c[k] = (c_1(k), ..., c_n(k)) their resulting payoffs, for some function G(·, ·, ·) representing the updating specified by Equation (3), which does not depend on b.

Consider the piecewise-constant interpolation Q^b(·) of Q[k], defined by

    Q^b(t) = Q[k],  t ∈ [kb, (k+1)b),    (6)

where b is the parameter used in (3).

Q^b(·) belongs to the space of all functions from R into K that are right continuous and have left-hand limits. Now consider the sequence {Q^b(·) : b > 0}. We are interested in the limit Q(·) of this sequence as b → 0.

The following is proved in [26], and follows for example from [27, Theorem 11.2.3].

Proposition 1. The sequence of interpolated processes {Q^b(·)} converges weakly, as b → 0, to Q(·), the (unique) solution of the Cauchy problem

    dQ/dt = φ(Q),  Q(0) = Q_0,    (7)

where Q_0 = Q^b(0) = Q[0], and φ : K → K is given by φ(Q) = E[G(Q[k], a[k], c[k]) | Q[k] = Q], with G the function of Equation (5).

Recall that a family of random variables (Y_t)_{t∈R} converges weakly to a random variable Y if E[h(Y_t)] converges to E[h(Y)] for every bounded and continuous function h. As discussed in [26], weak convergence is admittedly a very weak type of convergence.

The proof of Proposition 1 is based on weak-convergence methods. It is non-constructive in several aspects (it mainly relies on the existence of limits of sequences in some functional spaces) and does not provide error bounds.

Using (4), we can rewrite E[G(Q[k], a[k], c[k])] in the general case as follows:

    E[G(Q[k], a[k], c[k])]_{i,ℓ}
      = q_{i,ℓ} (1 − q_{i,ℓ}) (1 − E[c_i | Q(k), a_i = ℓ]) − Σ_{ℓ′≠ℓ} q_{i,ℓ′} q_{i,ℓ} (1 − E[c_i | Q(k), a_i = ℓ′])
      = q_{i,ℓ} [ Σ_{ℓ′≠ℓ} q_{i,ℓ′} (1 − E[c_i | Q(k), a_i = ℓ]) − Σ_{ℓ′≠ℓ} q_{i,ℓ′} (1 − E[c_i | Q(k), a_i = ℓ′]) ]
      = −q_{i,ℓ} ( E[c_i | Q(k), a_i = ℓ] − Σ_{ℓ′} q_{i,ℓ′} E[c_i | Q(k), a_i = ℓ′] ),    (8)

using the fact that 1 − q_{i,ℓ} = Σ_{ℓ′≠ℓ} q_{i,ℓ′}.

Let h_{i,ℓ} be the expectation of the payoff of player i if player i plays pure strategy ℓ while the players j ≠ i play (mixed) strategies q_j. Formally,

    h_{i,ℓ}(q_1, ..., q_{i−1}, q_{i+1}, ..., q_n) = E[c_i | Q(k), a_i = ℓ].

Let h_i(Q) be the mean value of the h_{i,ℓ}, in the sense that

    h_i(Q) = Σ_{ℓ′} q_{i,ℓ′} h_{i,ℓ′}(Q).

We obtain from (8):

    E[G(Q[k], a[k], c[k])]_{i,ℓ} = −q_{i,ℓ} (h_{i,ℓ} − h_i(Q)).

Hence, the dynamics given by Ordinary Differential Equation (7) is, componentwise:

    dq_{i,ℓ}/dt = −q_{i,ℓ} (h_{i,ℓ} − h_i(Q)).    (9)

This is a (multi-population) replicator equation, that is to say a well-known and well-studied dynamics in evolutionary game theory [19, 29]. In that context, h_{i,ℓ} is interpreted as the fitness of a strategy, and h_i(Q) as the mean fitness, in the above sense.

In particular, solutions are known to satisfy the following theorem (an instance of the so-called folk theorems of evolutionary game theory) [19].

Theorem 1 (see e.g. [19]). The following are true for the solutions of replicator equation (9):

• All Nash equilibria are stationary points.
• All strict Nash equilibria are asymptotically stable.
• All stable stationary points are Nash equilibria.

Hence, stable stationary points correspond to Nash equilibria. However, unstable stationary points can exist. Actually, all corners of the simplex K are stationary points, as well as, from the form of (9), more generally any state Q in which all strategies in a player's support perform equally well. Such a state Q is not a Nash equilibrium as soon as there is an unused strategy (i.e., outside of the support) that performs better.

There may also exist unstable solutions of Dynamic (9): consider for example a trajectory that stays on some face of K where some well-performing strategy is never used. However, such trajectories are extremely unstable: we may expect the underlying stochastic learning algorithm to leave such a face almost surely.

As limit points of a dynamic correspond to stationary points, we can conclude from this theorem that the dynamics (9), and hence the learning algorithm when b goes to 0, will have stable limit points that correspond to Nash equilibria. However, for general games, there is no convergence in the general case.

We will now show that for our allocation games there is always convergence. It will then follow from the previous discussion that the considered learning algorithm converges towards Nash equilibria, i.e., solves the learning problem for allocation games.

First, we specialize the dynamics to allocation games. Here we have

    c_i = C_{a_i}(λ_{a_i}) = α_{a_i} w_i + α_{a_i} Σ_{j≠i : a_j = a_i} w_j + β_{a_i}
        = α_{a_i} w_i + α_{a_i} Σ_{j≠i} 1_{a_j = a_i} w_j + β_{a_i},

where 1_{a_j = a_i} is 1 whenever a_j = a_i, and 0 otherwise. Taking expectations, and using E[1_{a_j = a_i}] = q_{j,a_i}, we get

    E[c_i] = α_{a_i} w_i + β_{a_i} + α_{a_i} Σ_{j≠i} q_{j,a_i} w_j,

and

    h_{i,ℓ} = α_ℓ w_i + β_ℓ + α_ℓ Σ_{j≠i} q_{j,ℓ} w_j.
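With this closed form, the right-hand side of (9) is cheap to evaluate, with no enumeration over profiles. A sketch under the same illustrative parameters as before:

```python
def h(i, l, Q):
    """Closed form for the allocation game:
    h_{i,l} = alpha[l]*w[i] + beta[l] + alpha[l] * sum_{j != i} q_{j,l}*w[j]."""
    others = sum(Q[j][l] * w[j] for j in range(len(Q)) if j != i)
    return alpha[l] * w[i] + beta[l] + alpha[l] * others

def replicator_field(Q):
    """Right-hand side of (9): dq_{i,l}/dt = -q_{i,l} * (h_{i,l} - h_i(Q))."""
    field = []
    for i, qi in enumerate(Q):
        hi = [h(i, l, Q) for l in range(len(qi))]
        hbar = sum(p * v for p, v in zip(qi, hi))  # mean fitness h_i(Q)
        field.append([-p * (v - hbar) for p, v in zip(qi, hi)])
    return field
```

A forward Euler step, Q plus a small multiple of this field, then gives a deterministic approximation of the stochastic algorithm for small b, in the spirit of Proposition 1.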

We claim the following.

Theorem 2 (Extension of Theorem 3.3 from [26]). Suppose there is a non-negative function F : K → R such that, for some constants w_i > 0, for all i, ℓ, Q,

    ∂F/∂q_{i,ℓ}(Q) = w_i × h_{i,ℓ}(Q).    (10)

Then, for any initial condition, the algorithm will converge to a stationary point.

Proof. We claim that F(·) is a Liapunov function of the dynamic, i.e., that F is monotone along trajectories. Indeed,

    dF(Q(t))/dt = Σ_{i,ℓ} (∂F/∂q_{i,ℓ})(Q) dq_{i,ℓ}/dt
                = −Σ_{i,ℓ} (∂F/∂q_{i,ℓ})(Q) q_{i,ℓ} Σ_{ℓ′} q_{i,ℓ′} [h_{i,ℓ}(Q) − h_{i,ℓ′}(Q)]
                = −Σ_{i,ℓ} w_i h_{i,ℓ}(Q) q_{i,ℓ} Σ_{ℓ′} q_{i,ℓ′} [h_{i,ℓ}(Q) − h_{i,ℓ′}(Q)]
                = −Σ_i w_i Σ_ℓ Σ_{ℓ′} q_{i,ℓ} q_{i,ℓ′} [h_{i,ℓ}(Q)² − h_{i,ℓ}(Q) h_{i,ℓ′}(Q)]
                = −Σ_i w_i Σ_ℓ Σ_{ℓ′>ℓ} q_{i,ℓ} q_{i,ℓ′} [h_{i,ℓ}(Q) − h_{i,ℓ′}(Q)]²
                ≤ 0.    (11)

In the above formula, we used the following fact.

Lemma 1.

    Σ_ℓ Σ_{ℓ′} q_{i,ℓ} q_{i,ℓ′} [h_{i,ℓ}(Q)² − h_{i,ℓ}(Q) h_{i,ℓ′}(Q)] = Σ_ℓ Σ_{ℓ′>ℓ} q_{i,ℓ} q_{i,ℓ′} [h_{i,ℓ}(Q) − h_{i,ℓ′}(Q)]².

Proof. Indeed, the terms with ℓ′ = ℓ vanish, so we have

    Σ_ℓ Σ_{ℓ′} q_{i,ℓ} q_{i,ℓ′} [h_{i,ℓ}² − h_{i,ℓ} h_{i,ℓ′}]
      = Σ_ℓ Σ_{ℓ′>ℓ} q_{i,ℓ} q_{i,ℓ′} [h_{i,ℓ}² − h_{i,ℓ} h_{i,ℓ′}] + Σ_ℓ Σ_{ℓ′<ℓ} q_{i,ℓ} q_{i,ℓ′} [h_{i,ℓ}² − h_{i,ℓ} h_{i,ℓ′}]
      = Σ_ℓ Σ_{ℓ′>ℓ} q_{i,ℓ} q_{i,ℓ′} [h_{i,ℓ}² − h_{i,ℓ} h_{i,ℓ′}] + Σ_ℓ Σ_{ℓ′>ℓ} q_{i,ℓ′} q_{i,ℓ} [h_{i,ℓ′}² − h_{i,ℓ′} h_{i,ℓ}]   (by permuting the indices)
      = Σ_ℓ Σ_{ℓ′>ℓ} q_{i,ℓ} q_{i,ℓ′} [h_{i,ℓ} − h_{i,ℓ′}]²,

as requested.

It is clear that a replicator equation dynamic preserves the simplex K, and hence that trajectories stay in the compact set K. From the Liapunov stability theorem ([18], page 194), asymptotically all trajectories will be in the set K′ = {Q* ∈ K : dF(Q*)/dt = 0}. Now, from (11), we know that dF(Q*)/dt = 0 implies

    q_{i,ℓ} q_{i,ℓ′} [h_{i,ℓ}(Q*) − h_{i,ℓ′}(Q*)] = 0

for all i, ℓ, ℓ′, hence that Q* is a stationary point of the dynamic.

A solution of Dynamic (9) that starts from a stationary point (for example a corner of K) will clearly stay invariant, and such a starting point can be unstable. In other words, unless a trajectory is started from some unstable corner of K, we may expect its limit points to be stable stationary points. From Theorem 1, stable stationary points are precisely Nash equilibria. Hence, if the stochastic algorithm is not started from a corner, and if a function satisfying the hypotheses of Theorem 2 can be found, the stochastic algorithm will converge to a Nash equilibrium.

We claim that such a function always exists for our allocation games. Notice that our dynamic is actually quite different from the one considered in [1], for which another potential function has been established: we have a replicator dynamic where the fitnesses are given by true costs, whereas [1] considers marginal costs. Hence, the constructions from [1] cannot be used directly.

Proposition 2. For the allocation games considered in this paper, the following function F satisfies the hypotheses of the previous theorem:

    F(Q) = Σ_{ℓ=1}^m [ β_ℓ Σ_{j=1}^n q_{j,ℓ} w_j + (α_ℓ/2) (Σ_{j=1}^n q_{j,ℓ} w_j)² + α_ℓ Σ_{j=1}^n q_{j,ℓ} w_j² (1 − q_{j,ℓ}/2) ].

Proof. We have

    ∂F/∂q_{i,ℓ′}(Q) = β_{ℓ′} w_i + α_{ℓ′} (Σ_{j=1}^n q_{j,ℓ′} w_j) w_i + α_{ℓ′} w_i² − α_{ℓ′} q_{i,ℓ′} w_i²
                    = w_i [ β_{ℓ′} + α_{ℓ′} (Σ_{j=1}^n q_{j,ℓ′} w_j) + α_{ℓ′} w_i − α_{ℓ′} q_{i,ℓ′} w_i ]
                    = w_i [ α_{ℓ′} w_i + β_{ℓ′} + α_{ℓ′} Σ_{j≠i} q_{j,ℓ′} w_j ]
                    = w_i × h_{i,ℓ′}(Q),

as requested.

Notice that the hypothesis of affine cost functions is crucial here.
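Proposition 2 can be sanity-checked numerically: a finite difference of F in the coordinate q_{i,ℓ} should match w_i h_{i,ℓ}(Q), per identity (10). A sketch, reusing h and the illustrative parameters from the earlier examples:

```python
import copy

def F(Q):
    """Candidate potential of Proposition 2 (affine costs)."""
    total = 0.0
    for l in range(len(alpha)):
        s = sum(Q[j][l] * w[j] for j in range(len(Q)))
        total += beta[l] * s + 0.5 * alpha[l] * s * s
        total += alpha[l] * sum(Q[j][l] * w[j] ** 2 * (1 - Q[j][l] / 2)
                                for j in range(len(Q)))
    return total

Q = [[0.5, 0.5], [0.3, 0.7], [0.2, 0.8]]
i, l, eps = 0, 1, 1e-6
Qp = copy.deepcopy(Q)
Qp[i][l] += eps
print((F(Qp) - F(Q)) / eps)  # finite difference of F in coordinate (i, l)
print(w[i] * h(i, l, Q))     # w_i * h_{i,l}(Q); the two should agree
```

Here the constants w_i of (10) coincide with the task weights, as the proof above shows.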

Proposition 3. Suppose for example that the cost functions were quadratic:

    C_ℓ(λ) = α_ℓ λ² + β_ℓ λ + γ_ℓ,

with α_ℓ, β_ℓ, γ_ℓ ≥ 0, α_ℓ ≠ 0. Then a function F of class C² that satisfies (10) for all i, ℓ, Q and for a general choice of weights (w_i)_i cannot exist.

Proof. By the Schwarz theorem, we must have

    ∂/∂q_{i′,ℓ′} (∂F/∂q_{i,ℓ}) = ∂/∂q_{i,ℓ} (∂F/∂q_{i′,ℓ′}),

and hence

    w_i ∂h_{i,ℓ}/∂q_{i′,ℓ′} = w_{i′} ∂h_{i′,ℓ′}/∂q_{i,ℓ},

for all i, i′, ℓ, ℓ′, for some constants w_i, w_{i′} > 0. It is easy to see that this does not hold for a general choice of Q and weights (w_i)_i in that case.

Coming back to our model (affine costs), we obtain the following.

Theorem 3. For the allocation games considered in this paper, for any initial condition in K − K*, the considered learning algorithm converges to a (mixed) Nash equilibrium.

We now discuss the time of convergence. From the dynamics (9), it is clear that if q_{i,ℓ} is 0 (respectively 1) at time 0, it will stay 0 (respectively 1). Hence, if player i starts with some strategy q_i(0) at time 0, we know that at any time t its strategy q_i(t) will have a support included in the support of q_i(0):

    q_{i,ℓ}(0) = 0 ⇒ q_{i,ℓ}(t) = 0, ∀t ≥ 0,

or, equivalently,

    ∀t ≥ 0, q_{i,ℓ}(t) > 0 ⇒ q_{i,ℓ}(0) > 0.

This motivates the following definitions.

Definition 4. Given a strategy q ∈ S, we write S(q) ⊂ S for the set of strategies q′ whose support is included in the support of q, i.e., such that q′_ℓ > 0 ⇒ q_ℓ > 0.

Definition 5. The n-tuple of mixed strategies (q̃_1, ..., q̃_n) is said to be a relative ε-Nash equilibrium if, for each i, 1 ≤ i ≤ n, we have

    d_i(q̃_1, ..., q̃_{i−1}, q̃_i, q̃_{i+1}, ..., q̃_n) ≤ d_i(q̃_1, ..., q̃_{i−1}, q, q̃_{i+1}, ..., q̃_n) + ε,  for all q ∈ S(q̃_i).

In other words, an n-tuple of mixed strategies (q̃_1, ..., q̃_n) is not a relative ε-Nash equilibrium if there exist some i, 1 ≤ i ≤ n, and some q ∈ S(q̃_i) with

    d_i(q̃_1, ..., q̃_{i−1}, q, q̃_{i+1}, ..., q̃_n) < d_i(q̃_1, ..., q̃_{i−1}, q̃_i, q̃_{i+1}, ..., q̃_n) − ε.

Clearly, if there exist such an i and q ∈ S(q̃_i), then there must exist some pure strategies ℓ and ℓ′ in S(q̃_i) with

    d_i(q̃_1, ..., q̃_{i−1}, e_ℓ, q̃_{i+1}, ..., q̃_n) < d_i(q̃_1, ..., q̃_{i−1}, e_{ℓ′}, q̃_{i+1}, ..., q̃_n) − ε,

that is to say, such that h_{i,ℓ} < h_{i,ℓ′} − ε, and hence

    [h_{i,ℓ}(Q) − h_{i,ℓ′}(Q)]² ≥ ε².

Since we know from (11) that

    dF(Q(t))/dt = −Σ_i w_i Σ_ℓ Σ_{ℓ′>ℓ} q_{i,ℓ} q_{i,ℓ′} [h_{i,ℓ}(Q) − h_{i,ℓ′}(Q)]²,

this means that at a point Q(t) that is not a relative ε-Nash equilibrium we have

    dF(Q(t))/dt ≤ −w_i q_{i,ℓ} q_{i,ℓ′} ε²

for some i, ℓ, ℓ′.

Theorem 4. Let Q(0) be any initial condition in K − K*. Let

    T = F(Q(0)) / ((min_i w_i) ε² δ*).

Assume that

    δ_i(Q) = min_{ℓ,ℓ′ : q_{i,ℓ} q_{i,ℓ′} ≠ 0} q_{i,ℓ} q_{i,ℓ′}

stays bounded from below, for all i, by some positive δ* on the time interval t ∈ [0, T]. Then a relative ε-Nash equilibrium is reached by the dynamic (9) at a time less than T.

Proof. We have

    dF(Q(t))/dt ≤ −(min_i w_i) ε² δ*

whenever Q(t) is not a relative ε-Nash equilibrium. This implies that

    F(Q(t)) ≤ F(Q(0)) − (min_i w_i) ε² δ* t

if Q(t) is not a relative ε-Nash equilibrium from time 0 to t. Since F is non-negative, we get that a relative ε-Nash equilibrium must be reached at a time T with

    T ≤ F(Q(0)) / ((min_i w_i) ε² δ*),

which yields the theorem.
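Given the quantities already defined, the bound of Theorem 4 is directly computable. The sketch below reuses F, w and the illustrative parameters from the earlier examples, and treats ε and the lower bound δ* as given inputs.

```python
def time_bound(Q0, eps, delta_star):
    """T <= F(Q(0)) / ((min_i w_i) * eps**2 * delta_star), per Theorem 4."""
    return F(Q0) / (min(w) * eps ** 2 * delta_star)

print(time_bound([[0.5, 0.5], [0.3, 0.7], [0.2, 0.8]], 0.1, 0.05))
```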

5. CONCLUSION

In this paper we considered a classical setting [21] for selfish load balancing, extended with some dynamical aspects. We considered a learning algorithm proposed by [26], and we proved that this learning algorithm learns mixed Nash equilibria of the game, extending several results of [26].

To do so, we proved that the learning algorithm is asymptotically equivalent to an ordinary differential equation, which turns out to be a replicator equation. Using a folk theorem from evolutionary game theory, one knows that, if the dynamics converges, it converges towards some Nash equilibrium. We proved, using a Liapunov function argument, that the dynamics converges in our considered setting. We also showed that it is possible to provide a general estimation of what one could call an epsilon convergence time, for any suitable starting condition.

6. REFERENCES

[1] E. Altman, Y. Hayel, and H. Kameda. Evolutionary dynamics and potential games in non-cooperative routing. In Wireless Networks: Communication, Cooperation and Competition (WNC3 2007), 2007.
[2] Petra Berenbrink, Tom Friedetzky, Leslie Ann Goldberg, Paul Goldberg, Zengjian Hu, and Russell Martin. Distributed selfish load balancing. In SODA '06: Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 354-363, New York, NY, USA, 2006. ACM.
[3] Petra Berenbrink and Oliver Schulte. Evolutionary equilibrium in Bayesian routing games: Specialization and niche formation. In ESA, pages 29-40, 2007.
[4] I. Caragiannis, M. Flammini, C. Kaklamanis, P. Kanellopoulos, and L. Moscardelli. Tight bounds for selfish and greedy load balancing. In Proceedings of the 33rd International Colloquium on Automata, Languages, and Programming (ICALP'06), pages 311-322, 2006.
[5] Pierre Coucheney, Corinne Touati, and Bruno Gaujal. Fair and efficient user-network association algorithm for multi-technology wireless networks. In Proc. of the 28th Conference on Computer Communications miniconference (INFOCOM), 2009.
[6] Pierre Coucheney, Corinne Touati, and Bruno Gaujal. Selection of efficient pure strategies in allocation games. In Proc. of the International Conference on Game Theory for Networks, 2009.
[7] A. Czumaj, P. Krysta, and B. Vöcking. Selfish traffic allocation for server farms. In ACM Symposium on Theory of Computing (STOC). ACM Press, New York, NY, USA, 2002.
[8] Artur Czumaj. Handbook of Scheduling: Algorithms, Models, and Performance Analysis, chapter Selfish Routing on the Internet. CRC Press, 2004.
[9] E. Even-Dar, A. Kesselman, and Y. Mansour. Convergence time to Nash equilibria. In 30th International Colloquium on Automata, Languages and Programming (ICALP), pages 502-513, 2003.
[10] Eyal Even-Dar, Alexander Kesselman, and Yishay Mansour. Convergence time to Nash equilibrium in load balancing. ACM Transactions on Algorithms, 3(3), 2007.
[11] Eyal Even-Dar and Yishay Mansour. Fast convergence of selfish rerouting. In SODA '05: Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 772-781, Philadelphia, PA, USA, 2005. Society for Industrial and Applied Mathematics.
[12] Feldmann, Gairing, Lücking, Monien, and Rode. Selfish routing in non-cooperative networks: A survey. In MFCS: Symposium on Mathematical Foundations of Computer Science, 2003.
[13] S. Fischer, H. Räcke, and B. Vöcking. Fast convergence to Wardrop equilibria by adaptive sampling methods. In Proceedings of the Thirty-Eighth Annual ACM Symposium on Theory of Computing, pages 653-662, 2006.
[14] S. Fischer and B. Vöcking. On the evolution of selfish routing. In Algorithms - ESA 2004: 12th Annual European Symposium, Bergen, Norway, September 14-17, 2004, Proceedings, 2004.
[15] D. Fotakis, S. Kontogiannis, E. Koutsoupias, M. Mavronicolas, and P. Spirakis. The structure and complexity of Nash equilibria for a selfish routing game. In Automata, Languages and Programming: 29th International Colloquium, ICALP 2002, Málaga, Spain, July 8-13, 2002, Proceedings, 2002.
[16] Paul W. Goldberg. Bounds for the convergence rate of randomized local search in a multiplayer load-balancing game. In PODC '04: Proceedings of the Twenty-Third Annual ACM Symposium on Principles of Distributed Computing, pages 131-140, New York, NY, USA, 2004. ACM.
[17] C. Harris. On the rate of convergence of continuous-time fictitious play. Games and Economic Behavior, 22(2):238-259, 1998.
[18] Morris W. Hirsch, Stephen Smale, and Robert Devaney. Differential Equations, Dynamical Systems, and an Introduction to Chaos. Elsevier Academic Press, 2003.
[19] J. Hofbauer and K. Sigmund. Evolutionary game dynamics. Bulletin of the American Mathematical Society, 4:479-519, 2003.
[20] J. Hofbauer and S. Sorin. Best response dynamics for continuous zero-sum games. Discrete and Continuous Dynamical Systems - Series B, 6(1), 2006.
[21] E. Koutsoupias and C. Papadimitriou. Worst-case equilibria. In Symposium on Theoretical Aspects of Computer Science (STACS'99), pages 404-413, Trier, Germany, 4-6 March 1999.
[22] Elias Koutsoupias. Selfish task allocation. EATCS Bulletin, 81, 2003.
[23] L. Libman and A. Orda. Atomic resource sharing in noncooperative networks. Telecommunication Systems, 17(4):385-409, 2001.
[24] John F. Nash. Equilibrium points in n-person games. Proc. of the National Academy of Sciences, 36:48-49, 1950.
[25] A. Orda, R. Rom, and N. Shimkin. Competitive routing in multiuser communication networks. IEEE/ACM Transactions on Networking (TON), 1(5):510-521, 1993.
[26] P.S. Sastry, V.V. Phansalkar, and M.A.L. Thathachar. Decentralized learning of Nash equilibria in multi-person stochastic games with incomplete information. IEEE Transactions on Systems, Man, and Cybernetics, 24(5), 1994.
[27] D.W. Stroock and S.R.S. Varadhan. Multidimensional Diffusion Processes. Springer, 1979.
[28] Berthold Vöcking. Algorithmic Game Theory, chapter Selfish Load Balancing. Cambridge University Press, 2007.
[29] Jörgen W. Weibull. Evolutionary Game Theory. The MIT Press, 1995.
Worst-case Selection of efficient pure strategies in allocation equilibria. In Symposium on Theoretical Computer games. In Proc. of the International Conference on Science (STACS’99), pages 404–413, Trier, Germany, Game Theory for Networks, 2009. 4–6March 1999. [7] A. Czumaj, P. Krysta, and B. V¨ocking. Selfish traffic [22] Elias Koutsoupias. Selfish task allocation. EATCS allocation for server farms. In ACM Symposium on Bulletin, 81, 2003. Theory of Computing (STOC). ACM Press New York, [23] L. Libman and A. Orda. Atomic Resource Sharing in NY, USA, 2002. Noncooperative Networks. Telecommunication [8] Artur Czumaj. Handbook of Scheduling: Algorithms, Systems, 17(4):385–409, 2001. Models, and Performance Analysis, chapter Selfish [24] John F. Nash. Equilibrium points in n-person games. Routing on the Internet. CRC Press, 2004. Proc. of the National Academy of Sciences, 36:48–49, [9] E. Even-Dar, A. Kesselman, and Y. Mansour. 1950. Convergence time to Nash equilibria. 30th [25] A. Orda, R. Rom, and N. Shimkin. Competitive International Conference on Automata, Languages routing in multiuser communication networks. and Programming (ICALP), pages 502–513, 2003. IEEE/ACM Transactions on Networking (TON), [10] Eyal Even-Dar, Alexander Kesselman, and Yishay 1(5):510–521, 1993. Mansour. Convergence time to Nash equilibrium in [26] M.A.L. Thathachar P.S. Sastry, V.V. Phansalkar. load balancing. ACM Transactions on Algorithms, Decentralized learning of Nash equilibria in 3(3), 2007. multi-person stochastic games with incomplete [11] Eyal Even-Dar and Yishay Mansour. Fast convergence information. IEEE transactions on system, man, and of selfish rerouting. In SODA ’05: Proceedings of the cybernetics, 24(5), 1994. sixteenth annual ACM-SIAM symposium on Discrete [27] D.W. Stroock and SRS Varadhan. Multidimensional algorithms, pages 772–781, Philadelphia, PA, USA, Diffusion Processes. Springer, 1979. 2005. Society for Industrial and Applied Mathematics. [28] Berthold V¨ocking. Algorithmic Game Theory, chapter [12] Feldmann, Gairing, Lucking, Monien, and Rode. Selfish Load Balancing. Cambridge University Press, Selfish routing in non-cooperative networks: A survey. 2007. In MFCS: Symposium on Mathematical Foundations [29] J¨orgen W. Weibull. Evolutionary Game Theory. The of Computer Science, 2003. MIT Press, 1995. [13] S. Fischer, H. R¨acke, and B. V¨ocking. Fast convergence to Wardrop equilibria by adaptive sampling methods. Proceedings of the thirty-eighth annual ACM symposium on Theory of computing, pages 653–662, 2006. [14] S. Fischer and B. Vocking. On the Evolution of Selfish Routing. Algorithms–ESA 2004: 12th Annual European Symposium, Bergen, Norway, September 14-17, 2004, Proceedings, 2004.