arXiv:0906.2094v2 [math.PR] 21 Oct 2010

The Annals of Applied Probability 2010, Vol. 20, No. 4, 1359–1388
DOI: 10.1214/09-AAP651
© Institute of Mathematical Statistics, 2010

THE EMERGENCE OF RATIONAL BEHAVIOR IN THE PRESENCE OF STOCHASTIC PERTURBATIONS

By Panayotis Mertikopoulos (1,2) and Aris L. Moustakas (1)

University of Athens

We study repeated games where players use an exponential learning scheme in order to adapt to an ever-changing environment. If the game's payoffs are subject to random perturbations, this scheme leads to a new stochastic version of the replicator dynamics that is quite different from the "aggregate shocks" approach of evolutionary game theory. Irrespective of the perturbations' magnitude, we find that strategies which are dominated (even iteratively) eventually become extinct and that the game's strict Nash equilibria are stochastically asymptotically stable. We complement our analysis by illustrating these results in the case of congestion games.

Received June 2009; revised September 2009.
1 Supported in part by the European Commission Grants EU-FET-FP6-IST-034413 (Net-ReFound) and EU-IST-NoE-FP6-2007-216715 (NewCom++) and the Greek Research Council project Kapodistrias (no. 70/3/8831).
2 Supported by the Empirikeion Foundation of Athens, Greece.
AMS 2000 subject classifications. Primary 91A26, 60J70; secondary 91A22, 60H10.
Key words and phrases. Asymptotic stochastic stability, congestion games, dominated strategies, exponential learning, Lyapunov function, Nash equilibrium, replicator dynamics, stochastic differential equation.

1. Introduction. Ever since it was introduced in [19], the notion of a Nash equilibrium and its refinements have remained the most prominent among the solution concepts of noncooperative game theory. In turn, noncooperative game theory has not only found applications in such diverse topics as economics, biology and network design, but it has also become the standard language to actually describe complex agent interactions in these fields.

Still, the issue of why and how players may arrive at equilibrium in the first place remains an actively debated question. After all, the complexity of most games increases exponentially with the number of players and, hence, identifying a game's equilibria quickly becomes prohibitively difficult. Accordingly, as was first pointed out by Aumann in [3], a player has no incentive to play his component of a Nash equilibrium unless he is convinced that all other players will play theirs. And if the game in question has multiple Nash equilibria, this argument gains additional momentum: in that case, even players with unbounded deductive capabilities will be hard-pressed to choose a strategy. From this point of view, rational individuals would appear to be more in tune with Aumann's notion of a correlated equilibrium where subjective beliefs are also taken into account [3]. Nevertheless, the seminal work of Maynard Smith on animal conflicts [15] has cast Nash equilibria in a different light because it unearthed a profound connection between evolution and rationality: roughly speaking, one leads to the other. So, when different species contend for the limited resources of their habitat, evolution and natural selection steer the ensuing conflict to an equilibrial state which leaves no room for irrational behavior. As a consequence, instinctive "fight or flight" responses that are deeply ingrained in a species can be seen as a form of rational behavior, acquired over the species' evolutionary course.

Of course, this evolutionary approach concerns large populations of different species which are rarely encountered outside the realm of population biology.
However, the situation is not much different in the case of a finite number of players who try to learn the game by playing again and again and who strive to do better with the help of some learning algorithm. Therein, evolution does not occur as part of a birth/death process; rather, it is a byproduct of the players' acquired experience in playing the game—see [6] for a most comprehensive account.

It is also worth keeping in the back of our mind that in some applications of game theory, "rationality" requirements precede evolution. For example, recent applications to network design start from a set of performance aspirations (such as robustness and efficiency) that the players (network devices) seek to attain in the network's equilibrial state. Thus, to meet these requirements, one has to literally reverse-engineer the process by finding the appropriate game whose equilibria will satisfy the players—the parallel with mechanism design being obvious.

In all these approaches, a fundamental selection mechanism is that of the replicator dynamics put forth in [23] and [22] which reinforces a strategy proportionately to the difference of its payoff from the mean (taken over the species or the player's strategies, depending on the approach). As was shown in the multi-population setting of Samuelson and Zhang [21] (which is closer to learning than the self-interacting single-population scenaria of [23] and [22]), these dynamics are particularly conducive to rationality. Strategies that are suboptimal when paired against any choice of one's adversaries rapidly become extinct, and in the long run, only rationally admissible strategies can survive. Even more to the point, the only attracting states of the dynamics turn out to be precisely the (strict) Nash equilibria of the game—see [11] for a masterful survey.

We thus see that Nash equilibria arise over time as natural attractors for rational individuals, a fact which further justifies their prominence among noncooperative solution concepts. Yet, this behavior is also conditional on the underlying game remaining stationary throughout the time horizon that it takes players to adapt to it—and unfortunately, this stationarity assump- tion is rarely met in practical applications. In biological models, for exam- ple, the reproductive fitness of an individual may be affected by the ever- changing weather conditions; in networks, communication channels carry time-dependent noise and interference as well as signals; and when players try to sample their strategies, they might have to deal with erroneous or imprecise readings. It is thus logical to ask: does rational behavior still emerge in the presence of stochastic perturbations that interfere with the underlying game? In evolutionary games, these perturbations traditionally take the form of “aggregate shocks” that are applied directly to the population of each phenotype. This approach by Fudenberg and Harris [5] has spurred quite a bit of interest and there is a number of features that differentiate it from the deterministic one. For example, Cabrales showed in [4] that dominated strategies indeed become extinct, but only if the variance of the shocks is low enough. More recently, the work of Imhof and Hofbauer [8, 10] revealed that even equilibrial play arises over time but again, conditionally on the variance of the shocks. Be that as it may, if one looks at games with a finite number of players, it is hardly relevant to consider shocks of this type because there are no longer any populations to apply them to. Instead, the stochastic fluctuations should be reflected directly on the stimuli that incite players to change their strategies: their payoffs. This leads to a picture which is very different from the evolutionary one and is precisely the approach that we will be taking.

Outline of results. In this paper, we analyze the evolution of players in stochastically perturbed games of this sort. The particular stimulus-response model that we consider is simple enough: players keep cumulative scores of their strategies' performance and employ exponentially more often the one that scores better. After a few preliminaries in Section 2, this approach is made precise in Section 3 where we derive the stochastic replicator dynamics that govern the behavior of players when their learning curves are subject to random perturbations.

The replicator equation that we get is different from the "aggregate shocks" approach of [4, 5, 8, 10] and, as a result, it exhibits markedly different rationality properties as well. In stark contrast to the results of [4, 10], we show in Section 4 that dominated strategies become extinct irrespective of the noise level (Proposition 4.1) and provide an exponential bound for the rate of decay of these strategies (Proposition 4.2). In fact, by induction on the rounds of elimination of dominated strategies, we show that this is true even for iteratively dominated strategies: despite the noise, only rationally admissible strategies can survive in the long run (Theorem 4.3). Then, as an easy corollary of the above, we infer that players will converge to a strict equilibrium (Corollary 4.4) whenever the underlying game is dominance-solvable.

We continue with the issue of equilibrial play in Section 5 by making a suggestive detour in the land of congestion games. If the noise is relatively mild with respect to the rate with which players learn, we find that the game's potential is a Lyapunov function which ensures that strict equilibria are stochastically attracting; and if the game is dyadic (i.e., players only have two choices), this tameness assumption can be dropped altogether.

Encouraged by the results of Section 5, we attack the general case in Section 6. As it turns out, strict equilibria are always asymptotically stochastically stable in the perturbed replicator dynamics that stem from exponential learning (Theorem 6.1). This begs to be compared to the results of [8, 10] where it is the equilibria of a suitably modified game that are stable, and not necessarily those of the actual game being played. Fortunately, exponential learning seems to give players a clearer picture of the original game and there is no need for similar modifications in our case.

Notational conventions. Given a finite set S = {s_0, ..., s_n}, we will routinely identify the set ∆(S) of probability measures on S with the standard n-dimensional simplex of R^{n+1}: ∆(S) ≡ {x ∈ R^{n+1} : Σ_α x_α = 1 and x_α ≥ 0}. Under this identification, we will also make no distinction between s_α ∈ S and the vertex e_α of ∆(S); in fact, to avoid an overcluttering of indices, we will frequently use α to refer to either s_α or e_α, writing, for example, "α ∈ S" or "u(α)" instead of "s_α ∈ S" or "u(e_α)," respectively.

To streamline our presentation, we will consistently employ Latin indices for players (i, j, k, ...) and Greek for their strategies (α, β, µ, ...), separating the two by a comma when it would have been æsthetically unpleasant not to. In like manner, when we have to discriminate between strategies, we will assume that indices from the first half of the Greek alphabet start at 0 (α, β = 0, 1, 2, ...) while those taken from the second half start at 1 (µ, ν = 1, 2, ...).

Finally, if X(t) is some stochastic process in R^n starting at X(0) = x, its law will be denoted by P_{X;x} or simply by P_x if there is no danger of confusion; and if the context leaves no doubt as to which process we are referring to, we will employ the term "almost surely" in place of the somewhat unwieldy "P_x-almost surely."

2. Preliminaries.

2.1. Basic facts and definitions from game theory. As is customary, our starting point will be a (finite) set of N players, indexed by i ∈ N = {1, ..., N}. The players' possible actions are drawn from their strategy sets S_i = {s_{iα} : α = 0, ..., S_i − 1} and they can combine them by choosing their α_i-th (pure) strategy with probability p_{iα_i}. In that case, the players' mixed strategies will be described by the points p_i = (p_{i,0}, p_{i,1}, ...) ∈ ∆_i := ∆(S_i) or, more succinctly, by the strategy profile p = (p_1, ..., p_N) ∈ ∆ := Π_i ∆_i.

In particular, if e_{iα} denotes the α-th vertex of the i-th component simplex ∆_i ֒→ ∆, the (pure) profile q = (e_{1,α_1}, ..., e_{N,α_N}) simply corresponds to player i playing α_i ∈ S_i. On the other hand, if we wish to focus on the strategy of a particular player i ∈ N against that of his opponents N_{−i} := N \ {i}, we will employ the shorthand notation (p_{−i}; q_i) = (p_1, ..., q_i, ..., p_N) to denote the profile where i plays q_i ∈ ∆_i against his opponents' strategy p_{−i} ∈ ∆_{−i} := Π_{j≠i} ∆_j.

So, once players have made their strategic choices, let u_{i,α_1,...,α_N} be the reward of player i in the profile (α_1, ..., α_N) ∈ S = Π_i S_i, that is, the payoff that strategy α_i ∈ S_i yields to player i against the strategy α_{−i} ∈ S_{−i} = Π_{j≠i} S_j of i's opponents. Then, if players mix their strategies, their expected reward will be given by the (multilinear) payoff functions u_i : ∆ → R:

(2.1) u_i(p) = Σ_{α_1∈S_1} ··· Σ_{α_N∈S_N} u_{i,α_1···α_N} p_{1,α_1} ··· p_{N,α_N}.

Under this light, the payoff that a player receives when playing a pure strategy α ∈ S_i deserves special mention and will be denoted by

(2.2) u_{iα}(p) := u_i(p_{−i}; α) ≡ u_i(p_1, ..., α, ..., p_N).

This collection of players i ∈ N, their strategies α_i ∈ S_i and their payoffs u_i will be our working definition for a game in normal form, usually denoted by G—or G(N, S, u) if we need to keep track of more data.

Needless to say, rational players who seek to maximize their individual payoffs will avoid strategies that always lead to diminished payoffs against any play of their opponents. We will thus say that the strategy q_i ∈ ∆_i is (strictly) dominated by q'_i ∈ ∆_i and we will write q_i ≺ q'_i when

(2.3) u_i(p_{−i}; q_i) < u_i(p_{−i}; q'_i)

for all strategies p_{−i} ∈ ∆_{−i} of i's opponents N_{−i}.

With this in mind, dominated strategies can be effectively removed from the analysis of a game because rational players will have no incentive to ever use them. However, by deleting such a strategy, another strategy (perhaps of another player) might become dominated and further deletions of iteratively dominated strategies might be in order (see Section 4 for more details). Proceeding ad infinitum, we will say that a strategy is rationally admissible if it survives every round of elimination of dominated strategies. If the set of rationally admissible strategies is a singleton (e.g., as in the Prisoner's Dilemma), the game will be called dominance-solvable and the sole surviving strategy will be the game's rational solution.

Then again, not all games can be solved in this way and it is natural to look for strategies which are stable at least under unilateral deviations. Hence, we will say that a strategy profile p ∈ ∆ is a Nash equilibrium of the game G when

(2.4) u_i(p) ≥ u_i(p_{−i}; q) for all q ∈ ∆_i, i ∈ N.

If the equilibrium profile p only contains pure strategies α_i ∈ S_i, we will refer to it as a pure equilibrium; and if the inequality (2.4) is strict for all q ≠ p_i ∈ ∆_i, i ∈ N, the equilibrium p will instead carry the characterization strict.

Clearly, if two pure strategies α, β ∈ S_i are present with positive probability in an equilibrial strategy p_i ∈ ∆_i, then we must have u_{iα}(p) = u_{iβ}(p) as a result of u_i being linear in p_i. Consequently, only pure profiles can satisfy the strict version of (2.4) so that strict equilibria must also be pure. The converse implication is false but only barely so: a pure equilibrium fails to be strict only if a player has more than one pure strategy that returns the same rewards. Since this is almost always true (in the sense that the degenerate case can be resolved by an arbitrarily small perturbation of the payoff functions), we will relax our terminology somewhat and use the two terms interchangeably.

To recover the connection of equilibrial play with strategic dominance, note that if a game is solvable by iterated elimination of dominated strategies, the single rationally admissible strategy that survives will be the game's unique strict equilibrium. But the significance of strict equilibria is not exhausted here: strict equilibria are exactly the evolutionarily stable strategies of multi-population evolutionary games—Proposition 5.1 in [11]. Moreover, as we shall see a bit later, they are the only asymptotically stable states of the multi-population replicator dynamics—again, see Chapter 5, pages 216 and 217 of [11].

Unfortunately, strict equilibria do not always exist, Rock-Paper-Scissors being the typical counterexample. Nevertheless, pure equilibria do exist in many large and interesting classes of games, even when we leave out dominance-solvable ones. Perhaps the most noteworthy such class is that of congestion games.

Definition 2.1. A game G ≡ G(N, S, u) will be called a congestion game when:

1. all players i ∈ N share a common set F of facilities as their strategy set: S_i = F for all i ∈ N;
2. the payoffs are functions of the number of players sharing a particular facility: u_{i,α_1···α···α_N} ≡ u_α(N_α), where N_α is the number of players choosing the same facility α as i.

Amazingly enough, Monderer and Shapley made the remarkable discovery in [18] that these games are actually equivalent to the class of potential games.

Definition 2.2. A game G ≡ G(N, S, u) will be called a potential game if there exists a function V : ∆ → R such that

(2.5) u_i(p_{−i}; q_i) − u_i(p_{−i}; q'_i) = −(V(p_{−i}; q_i) − V(p_{−i}; q'_i))

for all players i ∈ N and all strategies p_{−i} ∈ ∆_{−i}, q_i, q'_i ∈ ∆_i.

This equivalence reveals that both classes of games possess equilibria in pure strategies: it suffices to look at the vertices of the face of ∆ where the (necessarily multilinear) potential function V is minimized.

2.2. Learning, evolution and the replicator dynamics. As one would expect, locating the Nash equilibria of a game is a rather complicated problem that requires a great deal of global calculations, even in the case of potential games (where it reduces to minimizing a multilinear function over a convex polytope). Consequently, it is of interest to see whether there are simple and distributed learning schemes that allow players to arrive at a reasonably stable solution.

One such scheme is based on an exponential learning behavior where players play the game repeatedly and keep records of their strategies' performance. In more detail, at each instance of the game all players i ∈ N update the cumulative scores U_{iα} of their strategies α ∈ S_i as specified by the recursive formula

(2.6) U_{iα}(t + 1) = U_{iα}(t) + u_{iα}(p(t)),

where p(t) ∈ ∆ is the players' strategy profile at the t-th iteration of the game and, in the absence of initial bias, we assume that U_{iα}(0) = 0 for all i ∈ N, α ∈ S_i. These scores reinforce the perceived success of each strategy as measured by the average payoff it yields and hence, it stands to reason that players will lean towards the strategy with the highest score. The precise way in which they do that is by playing according to the namesake exponential law:

(2.7) p_{iα}(t + 1) = e^{U_{iα}(t+1)} / Σ_{β∈S_i} e^{U_{iβ}(t+1)}.
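To fix ideas, here is a minimal Python sketch of the discrete exponential learning scheme (2.6)–(2.7) for a two-player game. The payoff matrices A and B below are hypothetical example data (not taken from the paper), and each player is assumed to score pure strategies against the opponent's current mixed strategy.

```python
import numpy as np

def exponential_learning(A, B, rounds=200):
    """Discrete exponential learning (2.6)-(2.7) for a 2-player game.

    A[a, b]: payoff of player 1, B[a, b]: payoff of player 2 when
    player 1 plays a and player 2 plays b.  Returns the mixed strategies."""
    U1 = np.zeros(A.shape[0])                 # cumulative scores, U(0) = 0
    U2 = np.zeros(A.shape[1])
    softmax = lambda u: np.exp(u - u.max()) / np.exp(u - u.max()).sum()
    p1, p2 = softmax(U1), softmax(U2)
    for _ in range(rounds):
        U1 += A @ p2                          # score update (2.6): u_{1a}(p) = sum_b A[a,b] p2[b]
        U2 += B.T @ p1                        # u_{2b}(p) = sum_a B[a,b] p1[a]
        p1, p2 = softmax(U1), softmax(U2)     # exponential law (2.7)
    return p1, p2

# Example: strategy 0 of each player is strictly dominant.
A = np.array([[2.0, 1.0], [1.0, 0.0]])
B = np.array([[2.0, 1.0], [1.0, 0.0]])
print(exponential_learning(A, B))             # both players concentrate on strategy 0
```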

For simplicity, we will only consider the case where players update their scores in continuous time, that is, according to the coupled equations

(2.8a) dU_{iα}(t) = u_{iα}(x(t)) dt,
(2.8b) x_{iα}(t) = e^{U_{iα}(t)} / Σ_β e^{U_{iβ}(t)}.

Then, if we differentiate (2.8b) to decouple it from (2.8a), we obtain the standard (multi-population) replicator dynamics

(2.9) dx_{iα}/dt = x_{iα}( u_{iα}(x) − Σ_β x_{iβ} u_{iβ}(x) ) = x_{iα}(u_{iα}(x) − u_i(x)).

Alternatively, if players learn at different speeds as a result of varied stimulus-response characteristics, their updating will take the form

(2.10) x_{iα}(t) = e^{λ_i U_{iα}(t)} / Σ_β e^{λ_i U_{iβ}(t)},

where λ_i represents the learning rate of player i, that is, the "weight" which he assigns to his perceived scores U_{iα}. In this way, the replicator equation evolves at a different time scale for each player, leading to the rate-adjusted dynamics

(2.11) dx_{iα}/dt = λ_i x_{iα}(u_{iα}(x) − u_i(x)).

Naturally, the uniform dynamics (2.9) are recovered when all players learn at the "standard" rate λ_i = 1.

If we view the exponential learning model (2.7) from a stimulus-response angle, we see that the payoff of a strategy simply represents an (exponential) propensity of employing said strategy. It is thus closely related to the algorithm of logistic fictitious play [6] where the strategy x_i of (2.10) can be seen as the (unique) best reply to the profile x_{−i} in some suitably modified payoffs v_i(x) = u_i(x) + (1/λ_i) H(x_i). Interestingly enough, H(x_i) turns out to be none other than the entropy of x_i:

(2.12) H(x_i) = − Σ_{β: x_{iβ}>0} x_{iβ} log x_{iβ}.

That being so, we deduce that the learning rates λ_i act the part of (player-specific) inverse temperatures: in high temperatures (small λ_i), the players' learning curves are "soft" and the payoff differences between strategies are toned down; on the contrary, if λ_i → ∞, the scheme "freezes" to a myopic best-reply process.
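The temperature effect of λ_i is easy to see numerically; the following short sketch (with hypothetical cumulative scores, purely for illustration) evaluates the exponential law (2.10) at a small and at a large learning rate.

```python
import numpy as np

def logit_choice(scores, lam):
    """Exponential/logit choice rule (2.10) with learning rate lam."""
    z = lam * np.asarray(scores, dtype=float)
    z -= z.max()                          # numerical stability
    return np.exp(z) / np.exp(z).sum()

scores = [1.0, 0.8, 0.1]                  # hypothetical cumulative scores U_i
print(logit_choice(scores, 0.1))          # high temperature: nearly uniform mixing
print(logit_choice(scores, 50.0))         # low temperature: essentially a best reply
```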

The replicator dynamics were first derived in [23] in the context of pop- ulation biology, first for different phenotypes within a single species (single- population models), and then for different species altogether (multi-population models; [9] and [11] provide excellent surveys). In both these cases, one be- gins with large populations of individuals that are programmed to a particu- lar behavior (e.g., fight for “hawks” or flight for “doves”) and matches them randomly in a game whose payoffs directly affect the reproductive fitness of the individual players. More precisely, let ziα(t) be the population size of the phenotype (strat- egy) α i of species (player) i in some multi-population model where ∈ S ∈N individuals are matched to play a game G with payoff functions ui. Then, the relative frequency (share) of α will be specified by the population state x = (x ,...,xN ) ∆ where xiα = ziα/ ziβ. So, if N individuals are drawn 1 ∈ β randomly from the N species, their expectedP payoffs will be given by ui(x), i , and if these payoffs represent a proportionate increase in the pheno- type’s∈N fitness (measured as the number of offsprings in the unit of time), we will have

(2.13) dz_{iα}(t) = z_{iα}(t) u_{iα}(x(t)) dt.

As a result, the population state x(t) will evolve according to

(2.14) dx_{iα}/dt = (1/Σ_β z_{iβ}) dz_{iα}/dt − (x_{iα}/Σ_β z_{iβ}) Σ_γ dz_{iγ}/dt = x_{iα}(u_{iα}(x) − u_i(x)),

which is exactly (2.9) viewed from an evolutionary perspective.

On the other hand, we should note here that in single-population models the resulting equation is cubic and not quadratic because strategies are matched against themselves. To wit, assume that individuals are randomly drawn from a large population and are matched against one another in a (symmetric) 2-player game G with strategy space S = {1, ..., S} and payoff matrix u = {u_{αβ}}. Then, if x_α denotes the population share of individuals that are programmed to the strategy α ∈ S, their expected payoff in a random match will be given by u_α(x) := Σ_β u_{αβ} x_β ≡ u(α, x); similarly, the population average payoff will be u(x, x) = Σ_α x_α u_α(x). Hence, by following the same procedure as above, we end up with the single-population replicator dynamics

(2.15) dx_α/dt = x_α(u_α(x) − u(x, x)),

which behave quite differently than their multi-population counterpart (2.14).

As far as rational behavior is concerned, the replicator dynamics have some far-reaching ramifications. If we focus on multi-population models, Samuelson and Zhang showed in [21] that the share x_{iα}(t) of a strategy α ∈ S_i which is strictly dominated (even iteratively) converges to zero along any interior solution path of (2.9); in other words, dominated strategies become extinct in the long run. Additionally, there is a remarkable equivalence between the game's Nash equilibria and the stationary points of the replicator dynamics: the asymptotically stable states of (2.9) coincide precisely with the strict Nash equilibria of the underlying game [11].

2.3. Elements of stability analysis. A large part of our work will be focused on examining whether the rationality properties of exponential learning (elimination of dominated strategies and asymptotic stability of strict equilibria) remain true in a stochastic setting. However, since asymptotic stability is (usually) too stringent an expectation for stochastic dynamical systems, we must instead consider its stochastic analogue.

That being the case, let W(t) = (W_1(t), ..., W_n(t)) be a standard Wiener process in R^n and consider the stochastic differential equation (SDE)

(2.16) dX_α(t) = b_α(X(t)) dt + Σ_β σ_{αβ}(X(t)) dW_β(t).

Following [1, 7], the notion of asymptotic stability in this SDE is expressed by the following.

Definition 2.3. We will say that q ∈ R^n is stochastically asymptotically stable when, for every neighborhood U of q and every ε > 0, there exists a neighborhood V of q such that

(2.17) P_x{ X(t) ∈ U for all t ≥ 0, lim_{t→∞} X(t) = q } ≥ 1 − ε

for all initial conditions X(0) = x ∈ V of the SDE (2.16).

Much the same as in the deterministic case, stochastic asymptotic stability is often established by means of a Lyapunov function. In our context, this notion hinges on the second order differential operator that is associated to (2.16), namely the generator L of X(t):

(2.18) L = Σ_{α=1}^n b_α(x) ∂/∂x_α + (1/2) Σ_{α,β=1}^n (σ(x)σ^T(x))_{αβ} ∂²/(∂x_α ∂x_β).

The importance of this operator can be easily surmised from Itô's lemma; indeed, if f : R^n → R is sufficiently smooth, the generator L simply captures the drift of the process Y(t) = f(X(t)):

(2.19) dY(t) = Lf(X(t)) dt + Σ_{α,β} σ_{αβ}(X(t)) (∂f/∂x_α)|_{X(t)} dW_β(t).

In this way, L can be seen as the stochastic version of the time derivative d/dt; this analogy then leads to the following.

Definition 2.4. Let q ∈ R^n and let U be an open neighborhood of q. We will say that f is a (local) stochastic Lyapunov function for the SDE (2.16) if:

1. f(x) ≥ 0 for all x ∈ U, with equality iff x = q;
2. there exists a constant k > 0 such that Lf(x) ≤ −kf(x) for all x ∈ U.

Whenever such a Lyapunov function exists, it is known that the point q ∈ R^n where f attains its minimum will be stochastically asymptotically stable—for example, see Theorem 4 in pages 314 and 315 of [7].

A final point that should be mentioned here is that our analysis will be constrained on the compact polytope ∆ = Π_i ∆_i instead of all of Π_i R^{S_i}. Accordingly, the "neighborhoods" of Definitions 2.3 and 2.4 should be taken to mean "neighborhoods in ∆," that is, neighborhoods in the subspace topology of ∆ ֒→ Π_i R^{S_i}. This minor point should always be clear from the context and will only be raised in cases of ambiguity.
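Definition 2.4 can be checked mechanically on a toy example. The sketch below considers a hypothetical one-dimensional SDE dX = −θX dt + σX dW with multiplicative noise (not one of the game dynamics studied in this paper) and applies the generator (2.18) to the candidate f(x) = x² symbolically; the origin is then stochastically asymptotically stable whenever σ² < 2θ.

```python
import sympy as sp

x, theta, sigma = sp.symbols('x theta sigma', positive=True)
b = -theta * x          # drift of the hypothetical SDE dX = -theta*X dt + sigma*X dW
s = sigma * x           # diffusion coefficient
f = x**2                # Lyapunov candidate: f(0) = 0 and f > 0 elsewhere

Lf = b * sp.diff(f, x) + sp.Rational(1, 2) * s**2 * sp.diff(f, x, 2)
print(sp.simplify(Lf))  # x**2*(sigma**2 - 2*theta): Lf <= -k*f whenever sigma**2 < 2*theta
```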

3. Learning in the presence of noise. Of course, it could be argued that the rationality properties of the exponential learning scheme are a direct consequence of the players’ receiving accurate information about the game when they update their scores. However, this is a requirement that cannot always be met: the interference of nature in the game or imperfect readings of one’s utility invariably introduce fluctuations in (2.8a), and in their turn, these lead to a perturbed version of the replicator dynamics (2.9). To account for these random perturbations, we will assume that the play- ers’ scores are now governed instead by the stochastic differential equation

(3.1) dU_{iα}(t) = u_{iα}(X(t)) dt + η_{iα}(X(t)) dW_{iα}(t),

where, as before, the strategy profile X(t) ∈ ∆ is given by the logistic law

(3.2) X_{iα}(t) = e^{U_{iα}(t)} / Σ_β e^{U_{iβ}(t)}.

In this last equation, W(t) is a standard Wiener process living in Π_i R^{S_i} and the coefficients η_{iα} measure the impact of the noise on the players' scoring systems. Of course, these coefficients need not be constant: after all, the effect of the noise on the payoffs might depend on the state of the game in some typically continuous way. For this reason, we will assume that the functions η_{iα} are continuous on ∆, and we will only note en passant that our results still hold for essentially bounded coefficients η_{iα} (we will only need to replace min and max with ess inf and ess sup, respectively, in all expressions involving η_{iα}).

A very important instance of this dependence can be seen if η_{iα}(x_{−i}; α) = 0 for all i ∈ N, α ∈ S_i, x_{−i} ∈ ∆_{−i}, in which case equation (3.1) becomes a convincing model for the case of insufficient information. It states that when a player actually uses a strategy, his payoff observations are accurate enough; but with regards to strategies he rarely employs, his readings could be arbitrarily off the mark.

Now, to decouple (3.1) and (3.2), we may simply apply Itô's lemma to the process X(t). To that end, recall that W(t) has independent components across players and strategies, so that dW_{jβ} · dW_{kγ} = δ_{jk} δ_{βγ} dt (the Kronecker symbols δ_{βγ} being 0 for β ≠ γ and 1, otherwise). Then, Itô's formula gives

(3.3) dX_{iα} = Σ_j Σ_β (∂X_{iα}/∂U_{jβ}) dU_{jβ} + (1/2) Σ_{j,k} Σ_{β,γ} (∂²X_{iα}/∂U_{jβ} ∂U_{kγ}) dU_{jβ} · dU_{kγ}
      = Σ_β [ u_{iβ}(X) ∂X_{iα}/∂U_{iβ} + (1/2) η²_{iβ}(X) ∂²X_{iα}/∂U²_{iβ} ] dt + Σ_β η_{iβ}(X) (∂X_{iα}/∂U_{iβ}) dW_{iβ}.

On the other hand, a simple differentiation of (3.2) yields

(3.4a) ∂X_{iα}/∂U_{iβ} = X_{iα}(δ_{αβ} − X_{iβ}),
(3.4b) ∂²X_{iα}/∂U²_{iβ} = X_{iα}(δ_{αβ} − X_{iβ})(1 − 2X_{iβ})

and by plugging these expressions back into (3.3), we get

(3.5) dX_{iα} = X_{iα}[u_{iα}(X) − u_i(X)] dt
      + X_{iα}[ (1/2) η²_{iα}(X)(1 − 2X_{iα}) − (1/2) Σ_β η²_{iβ}(X) X_{iβ}(1 − 2X_{iβ}) ] dt
      + X_{iα}[ η_{iα}(X) dW_{iα} − Σ_β η_{iβ}(X) X_{iβ} dW_{iβ} ].
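Numerically, (3.5) can be explored with a standard Euler–Maruyama discretization. The sketch below is a minimal single-player illustration under simplifying assumptions that are not made in the paper (the opponents' profile is held fixed, so the payoffs u_{iα} and noise coefficients η_{iα} are constants); it is only meant to show how the drift and diffusion terms of (3.5) translate into code.

```python
import numpy as np

def simulate_perturbed_replicator(payoff, eta, x0, T=50.0, dt=1e-3, seed=0):
    """Euler-Maruyama discretization of (3.5) for one player with constant
    payoffs payoff[a] = u_{ia} and noise levels eta[a] = eta_{ia}."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(int(T / dt)):
        u_mean = x @ payoff
        drift = x * (payoff - u_mean) \
              + 0.5 * x * (eta**2 * (1 - 2*x) - np.sum(eta**2 * x * (1 - 2*x)))
        dW = rng.normal(scale=np.sqrt(dt), size=x.shape)
        diffusion = x * (eta * dW - np.sum(eta * x * dW))
        x = np.clip(x + drift * dt + diffusion, 1e-12, None)
        x /= x.sum()                     # guard against discretization error off the simplex
    return x

# Hypothetical example: strategy 0 earns the higher payoff; the noise is loud.
print(simulate_perturbed_replicator(np.array([1.0, 0.0]),
                                    np.array([2.0, 2.0]),
                                    [0.5, 0.5]))   # the share of strategy 0 tends to 1
```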

Alternatively, if players update their strategies with different learning rates λ_i, we should instead apply Itô's formula to (2.10). In so doing, we obtain

(3.5′) dX_{iα} = λ_i X_{iα}[u_{iα}(X) − u_i(X)] dt
      + X_{iα} (λ_i²/2)[ η²_{iα}(X)(1 − 2X_{iα}) − Σ_β η²_{iβ}(X) X_{iβ}(1 − 2X_{iβ}) ] dt
      + λ_i X_{iα}[ η_{iα}(X) dW_{iα} − Σ_β η_{iβ}(X) X_{iβ} dW_{iβ} ]
      = b_{iα}(X) dt + Σ_β σ_{i,αβ}(X) dW_{iβ},

where, in obvious notation, b_{iα}(x) and σ_{i,αβ}(x) are, respectively, the drift and diffusion coefficients of the diffusion X(t). Obviously, when λ_i = 1, we recover the uniform dynamics (3.5); equivalently (and this is an interpretation that is well worth keeping in mind), the rates λ_i can simply be regarded as a commensurate inflation of the payoffs and noise coefficients of player i ∈ N in the uniform logistic model (3.2).

Equation (3.5) and its rate-adjusted sibling (3.5′) will constitute our stochastic version of the replicator dynamics and thus merit some discussion in and by themselves. First, note that these dynamics admit a (unique) strong solution for any initial state X(0) = x ∈ ∆, even though they do not satisfy the linear growth condition |b(x)| + |σ(x)| ≤ C(1 + |x|) that is required for the existence and uniqueness theorem for SDEs (e.g., Theorem 5.2.1 in [20]). Instead, an addition over α ∈ S_i reveals that every simplex ∆_i ⊆ ∆ remains invariant under (3.5): if X_i(0) = x_i ∈ ∆_i, then d(Σ_α X_{iα}) = 0 and hence, X_i(t) will stay in ∆_i for all t ≥ 0—actually, it is not harder to see that every face of ∆ is a trap for X(t). So, if φ is a smooth bump function that is equal to 1 on some open neighborhood U ⊇ ∆ and which vanishes outside some compact set K ⊇ U, the SDE

(3.6) dX_{iα} = φ(X)[ b_{iα}(X) dt + Σ_β σ_{i,αβ}(X) dW_{iβ} ]

will have bounded diffusion and drift coefficients and will thus admit a unique strong solution. But since this last equation agrees with (3.5) on ∆ and any solution of (3.5) always stays in ∆, we can easily conclude that our perturbed replicator dynamics admit a unique strong solution for any initial X(0) = x ∈ ∆.

It is also important to compare the dynamics (3.5), (3.5′) to the "aggregate shocks" approach of Fudenberg and Harris [5] that has become the principal incarnation of the replicator dynamics in a stochastic environment. So, let us first recall how aggregate shocks enter the replicator dynamics in the first place. The main idea is that the reproductive fitness of an individual is not only affected by deterministic factors but is also subject to stochastic shocks due to the "weather" and the interference of nature with the game.

More precisely, if Z_{iα}(t) denotes the population size of phenotype α ∈ S_i of the species i ∈ N in some multi-population evolutionary game G, its growth will be determined by

(3.7) dZ_{iα}(t) = Z_{iα}(t)( u_{iα}(X(t)) dt + η_{iα} dW_{iα}(t) ),

where, as in (2.13), X(t) ∈ ∆ denotes the population shares X_{iα} = Z_{iα}/Σ_β Z_{iβ}. In this way, Itô's lemma yields the replicator dynamics with aggregate shocks:

(3.8) dX_{iα} = X_{iα}[ (u_{iα}(X) − u_i(X)) − ( η²_{iα} X_{iα} − Σ_β η²_{iβ} X²_{iβ} ) ] dt
      + X_{iα}[ η_{iα} dW_{iα} − Σ_β η_{iβ} X_{iβ} dW_{iβ} ].

We thus see that the effects of noise propagate differently in the case of exponential learning and in the case of evolution. Indeed, if we compare equations (3.5) and (3.8) term by term, we see that the drifts are not quite the same: even though the payoff adjustment u_{iα} − u_i ties both equations back together in the deterministic setting (η = 0), the two expressions differ by

(3.9) X_{iα}[ (1/2) η²_{iα} − (1/2) Σ_β η²_{iβ} X_{iβ} ] dt.

Innocuous as this term might seem, it is actually crucial for the rationality properties of exponential learning in games with randomly perturbed payoffs. As we shall see in the next sections, it leads to some miraculous cancellations that allow rationality to emerge in all noise levels.

This difference further suggests that we can pass from (3.5) to (3.8) simply by modifying the game's payoffs to ũ_{iα} = u_{iα} + (1/2) η²_{iα}. Of course, this presumes that the noise coefficients η_{iα} be constant—the general case would require us to allow for games whose payoffs may not be multilinear. This apparent lack of generality does not really change things but we prefer to keep things simple and for the time being, it suffices to point out that this modified game was precisely the one that came up in the analysis of [8, 10]. As a result, this modification appears to play a pivotal role in setting apart learning and evolution in a stochastic setting: whereas the modified game is deeply ingrained in the process of natural selection, exponential learning seems to give players a clearer picture of the actual underlying game.
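Both drifts are simple enough to compare numerically. The following sketch (one player, constant noise coefficients, hypothetical numbers) verifies the difference term (3.9) and the fact that (3.5) coincides with (3.8) evaluated on the modified payoffs ũ_{iα} = u_{iα} + ½η²_{iα}.

```python
import numpy as np

def drift_exponential(x, u, eta):
    """Drift of the exponential-learning dynamics (3.5) for one player."""
    return x * (u - x @ u) \
         + 0.5 * x * (eta**2 * (1 - 2*x) - np.sum(eta**2 * x * (1 - 2*x)))

def drift_aggregate(x, u, eta):
    """Drift of the aggregate-shocks dynamics (3.8) for one player."""
    return x * ((u - x @ u) - (eta**2 * x - np.sum(eta**2 * x**2)))

x   = np.array([0.2, 0.3, 0.5])       # hypothetical mixed strategy
u   = np.array([1.0, 0.4, -0.2])      # hypothetical payoffs u_{ia}
eta = np.array([0.7, 1.1, 0.3])       # hypothetical constant noise levels

correction = x * (0.5 * eta**2 - 0.5 * np.sum(eta**2 * x))          # the term (3.9)
print(np.allclose(drift_exponential(x, u, eta) - drift_aggregate(x, u, eta), correction))
print(np.allclose(drift_exponential(x, u, eta), drift_aggregate(x, u + 0.5*eta**2, eta)))
```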

4. Extinction of dominated strategies. Thereby armed with the stochastic replicator equations (3.5), (3.5′) to model exponential learning in noisy environments, the logical next step is to see if the rationality properties of the deterministic dynamics carry over to this stochastic setting. In this direction, we will first show that dominated strategies always become extinct in the long run and that only the rationally admissible ones survive.

As in [4] (implicitly) and [10] (explicitly), the key ingredient of our approach will be the cross entropy between two mixed strategies q_i, x_i ∈ ∆_i of player i ∈ N:

(4.1) H(q_i, x_i) := − Σ_{α: q_{iα}>0} q_{iα} log(x_{iα}) ≡ H(q_i) + d_KL(q_i, x_i),

where H(q_i) = −Σ_α q_{iα} log q_{iα} is the entropy of q_i and d_KL is the intimately related Kullback–Leibler divergence (or relative entropy):

(4.2) d_KL(q_i, x_i) := H(q_i, x_i) − H(q_i) = Σ_{α: q_{iα}>0} q_{iα} log(q_{iα}/x_{iα}).

This divergence function is central in the stability analysis of the (deterministic) replicator dynamics because it serves as a distance measure in probability space [11]. As it stands however, d_KL is not a distance function per se: neither is it symmetric, nor does it satisfy the triangle inequality. Still, it has the very useful property that d_KL(q_i, x_i) < ∞ iff x_i employs with positive probability all pure strategies α ∈ S_i that are present in q_i [i.e., iff supp(q_i) ⊆ supp(x_i) or iff q_i is absolutely continuous w.r.t. x_i]. Therefore, if d_KL(q_i, x_i) = ∞ for all dominated strategies q_i of player i, it immediately follows that x_i cannot be dominated itself. In this vein, we have the following.

Proposition 4.1. Let X(t) be a solution of the stochastic replicator dynamics (3.5) for some interior initial condition X(0) = x ∈ Int(∆). Then, if q_i ∈ ∆_i is (strictly) dominated,

(4.3) lim_{t→∞} d_KL(q_i, X_i(t)) = ∞ almost surely.

In particular, if q_i = α ∈ S_i is pure, we will have lim_{t→∞} X_{iα}(t) = 0 (a.s.): strictly dominated strategies do not survive in the long run.

Proof. Note first that X(0) = x Int(∆) and hence, Xi(t) will almost ∈ surely stay in Int(∆i) for all t 0; this is a simple consequence of the unique- ≥ ness of strong solutions and the invariance of the faces of ∆i under the dynamics (3.5).

Let us now consider the cross entropy Gqi (t) between qi and Xi(t):

(4.4) Gq (t) H(qi, Xi(t)) = qiα log Xiα(t). i ≡ − Xα 16 P. MERTIKOPOULOS AND A. L. MOUSTAKAS

As a result of Xi(t) being an interior path, Gqi (t) will remain finite for all t 0 (a.s.). So, by applying Itˆo’s lemma we get ≥ 2 ∂Gqi 1 ∂ Gqi dGqi = dXiβ + dXiβ dXiγ ∂Xiβ 2 ∂Xiγ ∂Xiβ · Xβ Xβ,γ (4.5) qiβ 1 qiβ 2 = dXiβ + 2 (dXiβ) − Xiβ 2 X Xβ Xβ iβ and, after substituting dXiβ from the dynamics (3.5), this last equation becomes

1 2 dGq = qiβ ui(X) uiβ(X)+ η (X)Xiγ (1 Xiγ ) dt i  − 2 iγ −  Xβ Xγ (4.6) + qiβ (Xiγ δβγ )ηiγ (X) dWiγ . − Xβ Xγ ′ Accordingly, if qi ∆i is another mixed strategy of player i, we readily obtain ∈ ′ dG dG ′ = (u (X ; q ) u (X ; q )) dt qi qi i −i i i −i i (4.7) − − ′ + (q qiβ)ηiβ(X) dWiβ iβ − Xβ and, after integrating, t ′ ′ G ′ (t)= H(q q ,x)+ u (X (s); q q ) ds qi−qi i i i −i i i − Z0 − (4.8) t ′ + (q qiβ) ηiβ(X(s)) dWiβ (s). iβ − Z Xβ 0 ′ ′ Suppose then that qi q and let vi = inf ui(x−i; q qi) : x−i ∆−i . With ≺ i { i − ∈ } ∆−i compact, it easily follows that vi > 0 and the first term of (4.8) will be bounded from below by vit. However, since monotonicity fails for Itˆointegrals, the second term must ′ be handled with more care. To that end, let ξi(s)= (q qiβ)ηiβ(X(s)) β iβ − and note that the Cauchy–Schwarz inequality gives P 2 ′ 2 2 ξ (s) Si (q qiβ) η (X(s)) i ≤ iβ − iβ Xβ (4.9) 2 ′ 2 2 Siη (q qiβ) 2Siη , ≤ i iβ − ≤ i Xβ RATIONALITY AND STOCHASTIC PERTURBATIONS 17 where Si = i is the number of pure strategies available to player i and |S | ′ ηi = max ηiβ(x) : x ∆, β i ; recall also that qi,qi ∆i for the last step. {| | ∈ ′ ∈ S } t ∈ Therefore, if ψi(t)= (q qiβ) ηiβ(X(s)) dWiβ(s) denotes the martin- β iβ − 0 gale part of (4.7) andPρi(t) is its quadraticR variation, the previous inequality yields

t 2 2 (4.10) ρi(t) = [ψi, ψi](t)= ξi (s) ds 2Siηi t. Z0 ≤

Now, if limt→∞ ρi(t)= , it follows from the time-change theorem for ∞ martingales (e.g., Theorem 3.4.6 in [12]) that there exists a Wiener process Wi such that ψi(t)= Wi(ρi(t)). Hence, by the law of the iterated logarithm we get f f

lim inf Gq −q′ (t) t→∞ i i ′ H(qi q ,x) + lim inf(vit + Wi(ρi(t))) ≥ − i t→∞ ′ f (4.11) H(qi qi,x) + lim inf(vit 2ρi(t) log log ρi(t)) t→∞ ≥ − − p ′ 2 H(qi qi,x) + lim inf(vit 2ηi Sit log log(2Siηi t)) ≥ − t→∞ − q = (almost surely). ∞

On the other hand, if limt→∞ ρi(t) < , it is trivial to obtain Gq −q′ (t) ∞ i i →∞ by letting t in (4.8). Therefore, with Gq (t) Gq (t) Gq′ (t) , we →∞ i ≥ i − i →∞ readily get limt→∞ d (qi, Xi(t)) = (a.s.); and since Gα(t)= log Xiα(t) KL ∞ − for all pure strategies α i, our proof is complete.  ∈ S As in [10], we can now obtain the following estimate for the lifespan of pure dominated strategies.

Proposition 4.2. Let X(t) be a solution path of (3.5) with initial condition X(0) = x ∈ Int(∆) and let P_x denote its law. Assume further that the strategy α ∈ S_i is dominated; then, for any M > 0 and for t large enough, we have

(4.12) P_x{ X_{iα}(t) < e^{−M} } ≥ (1/2) erfc( (M − h_i(x_i) − v_i t) / (2η_i √(S_i t)) ),

where S_i = |S_i| is the number of strategies available to player i, η_i = max{ |η_{iβ}(y)| : y ∈ ∆, β ∈ S_i } and the constants v_i > 0 and h_i(x_i) do not depend on t.

Proof. The proof is pretty straightforward and for the most part fol- lows [10]. Surely enough, if α pi ∆i and we use the same notation as in the proof of Proposition 4.1, we≺ have∈

log Xiα(t)= Gα(t) Gα(t) Gp (t) − ≥ − i (4.13) H(α, x) H(pi,x)+ vit + Wi(ρi(t)) ≥ − = hi(xi)+ vit + Wi(ρi(t)), f where vi := minx ui(x−i; pi) ui(x−i; α) > 0 and hi(xi) := log xiα −i { − f } − β piβ log xiβ. Then P −M Px(Xiα(t) < e ) Px Wi(ρi(t)) > M hi(xi) vit (4.14) ≥ { − − } 1 f M hi(xi) vit = erfc − − 2  2ρi(t)  p 2 and, since the quadratic variation ρi(t) is bounded above by 2Siηi t (4.10), the estimate (4.12) holds for all sufficiently large t [i.e., such that M < hi(xi)+ vit]. 

Some remarks are now in order: first and foremost, our results should be contrasted to those of Cabrales [4] and Imhof [10] where dominated strategies die out only if the noise coefficients (shocks) η_{iα} satisfy certain tameness conditions. The origin of this notable difference is the form of the replicator equation (3.5) and, in particular, the extra terms that are propagated there by exponential learning and which are absent from the aggregate shocks dynamics (3.8). As can be seen from the derivations in Proposition 4.1, these terms are precisely the ones that allow players to pick up on the true payoffs u_{iα} instead of the modified ones ũ_{iα} = u_{iα} + (1/2)η²_{iα} that come up in [8, 10] (and, indirectly, in [4] as well).

Secondly, it turns out that the way that the noise coefficients η_{iβ} depend on the profile x ∈ ∆ is not really crucial: as long as η_{iβ}(x) is continuous (or essentially bounded), our arguments are not affected. The only way in which a specific dependence influences the extinction of dominated strategies is seen in Proposition 4.2: a sharper estimate of the quadratic variation of ∫_0^t η_{iβ}(X(s)) dW_{iβ}(s) could conceivably yield a more accurate estimate for the cumulative distribution function of (4.12).

Finally, it is only natural to ask if Proposition 4.1 can be extended to strategies that are only iteratively dominated. As it turns out, this is indeed the case.

Theorem 4.3. Let X(t) be a solution path of (3.5) starting at X(0) = x ∈ Int(∆). Then, if q_i ∈ ∆_i is iteratively dominated,

(4.15) lim_{t→∞} d_KL(q_i, X_i(t)) = ∞ almost surely,

that is, only rationally admissible strategies survive in the long run.

Proof. As in the deterministic case [21], the main idea is that the so- lution path X(t) gets progressively closer to the faces of ∆ that are spanned by the pure strategies which have not yet been eliminated. Following [4], we will prove this by induction on the rounds of elimination of dominated strategies; Proposition 4.1 is simply the case n = 1. To wit, let Ai ∆i, A−i ∆−i and denote by Adm(Ai, A−i) the set of ⊆ ⊆ strategies qi Ai that are admissible (i.e., not dominated) with respect to ∈ 0 0 0 any strategy q−i A−i. So, if we start with =∆i and = , we ∈ Ai A−i j=6 i Aj may define inductively the set of strategies that remain admissibleQ after n elimination rounds by n := Adm( n−1, n−1) where n−1 := n−1; Ai Ai A−i Ai j=6 i Aj similarly, the pure strategies that have survived after n such roundsQ will n n be denoted by i := i i . Clearly, this sequence forms a descending chain 0 1 S andS ∩A the set ∞ := ∞ n will consist precisely of the Ai ⊇Ai ⊇··· Ai 0 Ai strategies of player i that are rationallyT admissible. Assume then that the cross entropy Gqi (t)= H(qi, Xi(t)) = α qiα k − × log Xiα(t) diverges as t for all strategies qi / i that die outP within →∞ k ∈A the first k rounds; in particular, if α / this implies that Xiα(t) 0 as ∈ Si → t . We will show that the same is true if qi survives for k rounds but is eliminated→∞ in the subsequent one. k k+1 ′ k+1 Indeed, if qi but qi / , there will exist some q such that ∈Ai ∈Ai i ∈Ai ′ k (4.16) ui(x−i; q ) > ui(x−i; qi) for all x−i . i ∈A−i adm dom Now, note that any x−i ∆−i can be decomposed as x−i = x−i + x−i adm ∈ where x−i is the “admissible” part of x−i, that is, the projection of x−i k k on the subspace spanned by the surviving vertices −i = j=6 i i . Hence, ′ k S S if vi = min ui(α−i; qi) ui(α−i; qi) : α−i −i , we will haveQvi > 0 and, by linearity, { − ∈ S } adm ′ adm (4.17) ui(x ; q ) ui(x ; qi) vi > 0 for all x−i ∆−i. −i i − −i ≥ ∈ Moreover, by the induction hypothesis, we also have Xdom(t) 0 as t . −i → →∞ Thus, there exists some t0 such that dom ′ dom (4.18) ui(X (t),q ) ui(X (t),qi) < vi/2 | −i i − −i | dom for all t t0 [recall that X−i (t) is spanned by already eliminated strategies]. Therefore,≥ as in the proof of Proposition 4.1, we obtain for t t ≥ 0 t 1 ′ (4.19) Gq (t) Gq′ (t) M + vit + (q qiβ) ηiβ(X(s)) dWiβ(s), i − i ≥ 2 iβ − Z Xβ 0 where M is a constant depending only on t0. In this way, the same reasoning as before gives limt→∞ Gq (t)= and the theorem follows.  i ∞ 20 P. MERTIKOPOULOS AND A. L. MOUSTAKAS

As a result, if there exists only one rationally admissible strategy, we get the following.

Corollary 4.4. Let X(t) be an interior solution path of the replicator equation (3.5) for some dominance-solvable game G and let x_0 ∈ S be the (unique) strict equilibrium of G. Then

(4.20) lim_{t→∞} X(t) = x_0 almost surely,

that is, players converge to the game's strict equilibrium (a.s.).

In concluding this section, it is important to note that all our results on the extinction of dominated strategies remain true in the adjusted dynamics (3.5′) as well: this is just a matter of rescaling. The only difference in using different learning rates λi comes about in Proposition 4.2 where the estimate (4.12) becomes

(4.21) P_x{ X_{iα}(t) < e^{−M} } ≥ (1/2) erfc( (M − h_i(x_i) − λ_i v_i t) / (2λ_i η_i √(S_i t)) ).

As it stands, this is not a significant difference in itself because the two estimates are asymptotically equal for large times. Nonetheless, it is this very lack of contrast that clashes with the deterministic setting where faster learning rates accelerate the emergence of rationality. The reason for this gap is that an increased learning rate λ_i also carries a commensurate increase in the noise coefficients η_i, and thus deflates the benefits of accentuating payoff differences. In fact, as we shall see in the next sections, the learning rates do not really allow players to learn any faster as much as they help diminish their shortsightedness: by effectively being lazy, it turns out that players are better able to average out the noise.
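Since the results of this section are phrased in terms of rationally admissible strategies, it may help to recall how the underlying deterministic notion is computed. The sketch below implements iterated elimination of pure strategies that are strictly dominated by other pure strategies in a bimatrix game (a simplification: domination by mixed strategies is not checked), on a Prisoner's-Dilemma-like example where the procedure isolates the unique strict equilibrium.

```python
import numpy as np

def iterated_elimination(A, B):
    """Iteratively remove pure strategies strictly dominated by another pure strategy.
    A[a, b], B[a, b]: payoffs of players 1 and 2 when they play (a, b)."""
    rows, cols = list(range(A.shape[0])), list(range(A.shape[1]))
    changed = True
    while changed:
        changed = False
        for r in rows[:]:
            if any(all(A[s, c] > A[r, c] for c in cols) for s in rows if s != r):
                rows.remove(r); changed = True
        for c in cols[:]:
            if any(all(B[r, s] > B[r, c] for r in rows) for s in cols if s != c):
                cols.remove(c); changed = True
    return rows, cols

A = np.array([[3, 0], [4, 1]])        # row player's payoffs (hypothetical PD-like game)
B = np.array([[3, 4], [0, 1]])        # column player's payoffs
print(iterated_elimination(A, B))     # ([1], [1]): the sole rationally admissible profile
```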

5. Congestion games: A suggestive digression. Having established that irrational choices die out in the long run, we turn now to the question of whether equilibrial play is stable in the stochastic replicator dynamics of exponential learning. However, before tackling this issue in complete gen- erality, it will be quite illustrative to pay a visit to the class of congestion games where the presence of a potential simplifies things considerably. In this way, the results we obtain here should be considered as a motivating precursor to the general case analyzed in Section 6.

5.1. Congestion games. To begin with, it is easy to see that the potential V of Definition 2.2 is a Lyapunov function for the deterministic replicator dynamics. Indeed, assume that player i ∈ N is learning at a rate λ_i > 0 and let x(t) be a solution path of the rate-adjusted dynamics (2.11). Then, a simple differentiation of V(x(t)) gives

(5.1) dV/dt = Σ_{i,α} (∂V/∂x_{iα}) dx_{iα}/dt = −Σ_{i,α} u_{iα}(x) λ_i x_{iα}(u_{iα}(x) − u_i(x))
      = −Σ_i λ_i [ Σ_α x_{iα} u²_{iα}(x) − u_i²(x) ] ≤ 0,

the last step following from Jensen's inequality—recall that ∂V/∂x_{iα} = −u_{iα}(x) on account of (2.5) and also that u_i(x) = Σ_α x_{iα} u_{iα}(x). In particular, this implies that the trajectories x(t) are attracted to the local minima of V, and since these minima coincide with the strict equilibria of the game, we painlessly infer that strict equilibrial play is asymptotically stable in (2.11)—as mentioned before, we plead guilty to a slight abuse of terminology in assuming that all equilibria in pure strategies are also strict.

It is therefore reasonable to ask whether similar conclusions can be drawn in the noisy setting of (3.5′). Mirroring the deterministic case, a promising way to go about this question is to consider again the potential function V of the game and try to show that it is stochastically Lyapunov in the sense of Definition 2.4. Indeed, if q_0 = (e_{1,0}, ..., e_{N,0}) ∈ ∆ is a local minimum of V (and hence, a strict equilibrium of the underlying game), we may assume without loss of generality that V(q_0) = 0 so that V(x) > 0 in a neighborhood of q_0. We are thus left to examine the negativity condition of Definition 2.4, that is, whether there exists some k > 0 such that LV(x) ≤ −kV(x) for all x sufficiently close to q_0.

To that end, recall that ∂V/∂x_{iα} = −u_{iα} and that ∂²V/∂x²_{iα} = 0. Then, the generator L of the rate-adjusted dynamics (3.5′) applied to V produces

(5.2) LV(x) = −Σ_{i,α} λ_i x_{iα} u_{iα}(x)(u_{iα}(x) − u_i(x))
      − Σ_{i,α} (λ_i²/2) x_{iα} u_{iα}(x)[ η²_{iα}(1 − 2x_{iα}) − Σ_β η²_{iβ} x_{iβ}(1 − 2x_{iβ}) ],

where, for simplicity, we have assumed that the noise coefficients η_{iα} are constant.

We will study (5.2) term by term by considering the perturbed strategies x_i = (1 − ε_i)e_{i,0} + ε_i y_i where y_i belongs to the face of ∆_i that lies opposite to e_{i,0} (i.e., y_{iµ} ≥ 0, µ = 1, 2, ... and Σ_µ y_{iµ} = 1) and ε_i > 0 measures the distance of player i from e_{i,0}. In this way, we get

ui(x)= xiαuiα(x) = (1 εi)ui, (x)+ εi yiµuiµ(x) − 0 Xα Xµ 22 P. MERTIKOPOULOS AND A. L. MOUSTAKAS

(5.3) = ui, (x)+ εi yiµ[uiµ(x) ui, (x)] 0 − 0 Xµ 2 = ui, (x) εi yiµ∆uiµ + (ε ), 0 − O i Xµ where ∆uiµ = ui, (q ) uiµ(q ) > 0. Then, by going back to (5.2), we obtain 0 0 − 0 xiαuiα(x)[uiα(x) ui(x)] − Xα

= (1 εi)ui, (x)[ui, (x) ui(x)] − 0 0 − + εi yiµuiµ(x)[uiµ(x) ui(x)] − Xµ

(5.4) = (1 εi)ui, (x) εi yiµ∆uiµ − 0 · Xµ 2 εi yiµuiµ(q )∆uiµ + (ε ) − 0 O i Xµ 2 = εi yiµui, (q )∆uiµ εi yiµuiµ(q )∆uiµ + (ε ) 0 0 − 0 O i Xµ Xµ 2 2 = εi yiµ(∆uiµ) + (ε ). O i Xµ As for the second term of (5.2), some easy algebra reveals that

2 2 η (1 2xi, ) η xiβ(1 2xiβ) i,0 − 0 − iβ − Xβ

2 2 2 = η (1 2εi) η (1 εi) εi η yiµ − i,0 − − i,0 − − iµ Xµ (5.5) 2 2 2 2 2 + 2(1 εi) η + 2ε η y − i,0 i iµ iµ Xµ

2 2 2 = εi η + yiµη + (ε ) −  i,0 iµ O i Xµ and, after a (somewhat painful) series of calculations, we get

2 2 xiαuiα(x) η (1 2xiα) η xiβ(1 2xiβ)  iα − − iβ −  Xα Xβ

2 2 = (1 εi)ui, (x) η (1 2xi, ) η xiβ(1 2xiβ) − 0  i,0 − 0 − iβ −  Xβ RATIONALITY AND STOCHASTIC PERTURBATIONS 23

2 2 + εi yiµ η (1 2xiµ) η xiβ(1 2xiβ)  iµ − − iβ −  Xµ Xβ (5.6) 2 2 = εiui, (q ) η + yiµη − 0 0  i,0 iµ Xµ 2 2 2 + εi yiµuiµ(q )(η + η )+ (ε ) 0 iµ i,0 O i Xµ 2 2 2 = εi yiµ∆uiµ(η + η )+ (ε ). − iµ i,0 O i Xµ

Finally, if we assume without loss of generality that V (q0) = 0 and set ξ = x q (i.e., ξi, = εi and ξiµ = εiyiµ for all i , µ i 0 ), we readily − 0 0 − ∈N ∈ S \ { } get

∂V 2 V (x)= ξiα + (ξ ) ∂x O Xi,α iα

∂ui 2 = ξiα + (ε ) − ∂xiα O Xi,α q0

(5.7) 2 = uiα(q )ξiα + (ε ) − 0 O Xi,α

2 = εi yiµ∆uiµ + (ε ), O Xi Xµ

2 2 where ε = i εi . Therefore, by combining (5.4), (5.6) and (5.7), the nega- tivity conditionP LV (x) kV (x) becomes ≤− λi 2 2 λiεi yiµ∆uiµ ∆uiµ (η + η )  − 2 iµ i,0  Xi Xµ (5.8) 2 k εi yiµ∆uiµ + (ε ). ≥ O Xi Xµ

λi 2 2 Hence, if ∆uiµ > (η + η ) for all µ i 0 , this last inequality will be 2 iµ i,0 ∈ S \ { } satisfied for some k> 0 whenever ε is small enough. Essentially, this proves the following.

Proposition 5.1. Let q = (α_1, ..., α_N) be a strict equilibrium of a congestion game G with potential function V and assume that V(q) = 0. Assume further that the learning rates λ_i are sufficiently small so that, for all µ ∈ S_i \ {α_i} and all i ∈ N,

(5.9) V(q_{−i}, µ) > (λ_i/2)(η²_{iµ} + η²_{i,0}).

Then q is stochastically asymptotically stable in the rate-adjusted dynamics (3.5′).

We thus see that no matter how loud the noise η_i might be, stochastic stability is always guaranteed if the players choose a learning rate that is slow enough as to allow them to average out the noise (i.e., λ_i < ∆V_i/η_i²). Of course, it can be argued here that it is highly unrealistic to expect players to be able to estimate the amount of Nature's interference and choose a suitably small rate λ_i. On top of that, the very form of the condition (5.9) is strongly reminiscent of the "modified" game of [8, 10], a similarity which seems to contradict our statement that exponential learning favors rational reactions in the original game. The catch here is that condition (5.9) is only sufficient and Proposition 5.1 merely highlights the role of a potential function in a stochastic environment. As we shall see in Section 6, nothing stands in the way of choosing a different Lyapunov candidate and dropping requirement (5.9) altogether.
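As a small illustration of Proposition 5.1, the following sketch uses a hypothetical 2-player, 2-facility congestion game (the facility payoffs and noise levels are made-up numbers); it computes the payoff gaps at a strict equilibrium, which by (2.5) equal the potential differences V(q_{−i}, µ), and the learning-rate threshold implied by condition (5.9).

```python
# Facility payoff u_f(k): reward of a facility shared by k players (hypothetical values).
facility_payoff = {('A', 1): 4.0, ('A', 2): 1.0, ('B', 1): 3.0, ('B', 2): 0.5}

def payoffs(profile):
    """Payoff of each player in a pure profile of facility choices."""
    return [facility_payoff[(f, profile.count(f))] for f in profile]

equilibrium = ('A', 'B')                 # a strict equilibrium: 4.0 > 0.5 and 3.0 > 1.0
eta = {0: (0.8, 0.6), 1: (0.5, 0.9)}     # hypothetical noise levels (eta_{i,0}, eta_{i,mu})

for i in (0, 1):
    deviation = list(equilibrium)
    deviation[i] = 'B' if equilibrium[i] == 'A' else 'A'
    gap = payoffs(equilibrium)[i] - payoffs(tuple(deviation))[i]   # Delta u_{i,mu} > 0
    lam_max = 2 * gap / (eta[i][0]**2 + eta[i][1]**2)              # threshold from (5.9)
    print(f"player {i}: payoff gap {gap:.2f}, (5.9) holds for learning rates below {lam_max:.2f}")
```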

5.2. The dyadic case. To gain some further intuition into why the condition (5.9) is redundant, it will be particularly helpful to examine the case where players compete for the resources of only two facilities (i.e., S_i = {0, 1} for all i ∈ N) and try to learn the game with the help of the uniform replicator equation (3.5). This is the natural setting for the El Farol bar problem [2] and the ensuing minority game [14] where players choose to "buy" or "sell" and are rewarded when they are in the minority—buyers in a sellers' market or sellers in an abundance of buyers. As has been shown in [17], such games always possess strict equilibria, even when players have distinct payoff functions.

So, by relabeling indices if necessary, let us assume that q_0 = (e_{1,0}, ..., e_{N,0}) is such a strict equilibrium and set x_i ≡ x_{i,0}. Then, the generator of the replicator equation (3.5) takes the form

(5.10) L = Σ_i x_i(1 − x_i)[ ∆u_i(x) + (1/2)(1 − 2x_i) η_i²(x) ] ∂/∂x_i + (1/2) Σ_i x_i²(1 − x_i)² η_i²(x) ∂²/∂x_i²,

where now ∆u_i ≡ u_{i,0} − u_{i,1} and η_i² = η²_{i,0} + η²_{i,1}.

It thus appears particularly appealing to introduce a new set of variables y_i such that ∂/∂y_i = x_i(1 − x_i) ∂/∂x_i; this is just the "logit" transformation: y_i = logit x_i ≡ log( x_i/(1 − x_i) ). In these new variables, (5.10) assumes the astoundingly suggestive guise

(5.11) L = Σ_i [ ∆u_i ∂/∂y_i + (1/2) η_i² ∂²/∂y_i² ],

which reveals that the noise coefficients can be effectively decoupled from the payoffs. We can then take advantage of this by letting L act on the function f(y) = Σ_i e^{−a_i y_i} (a_i > 0):

(5.12) Lf(y) = −Σ_i a_i [ ∆u_i − (1/2) a_i η_i² ] e^{−a_i y_i}.

Hence, if a_i is chosen small enough so that ∆u_i − (1/2) a_i η_i² ≥ m_i > 0 for all sufficiently large y_i [recall that ∆u_i(q_0) > 0 since q_0 is a strict equilibrium], we get

(5.13) Lf(y) ≤ −Σ_i a_i m_i e^{−a_i y_i} ≤ −kf(y),

where k = min_i{a_i m_i} > 0. And since f is strictly positive for y_{i,0} > 0 and only vanishes as y → ∞ (i.e., at the equilibrium q_0), a trivial modification of the stochastic Lyapunov method (see, e.g., pages 314 and 315 of [7]) yields the following.

Proposition 5.2. The strict equilibria of minority games are stochastically asymptotically stable in the uniform replicator equation (3.5).

Remark 5.3. It is trivial to see that strict equilibria of minority games will also be stable in the rate-adjusted dynamics (3.5′): in that case we simply need to choose a_i such that ∆u_i − (1/2) a_i λ_i η_i² ≥ m_i > 0.

Remark 5.4. A closer inspection of the calculations leading to Proposition 5.2 reveals that nothing hinges on the minority mechanism per se: it is (5.11) that is crucial to our analysis and L takes this form whenever the underlying game is a dyadic one (i.e., |S_i| = 2 for all i ∈ N). In other words, Proposition 5.2 also holds for all games with 2 strategies and should thus be seen as a significant extension of Proposition 5.1.

Proposition 5.5. The strict equilibria of dyadic games are stochastically asymptotically stable in the replicator dynamics (3.5), (3.5′) of exponential learning.
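In practice, the exponents a_i in the Lyapunov function f(y) = Σ_i e^{−a_i y_i} can be written down explicitly. The minimal sketch below (hypothetical payoff gaps and noise levels, not from any particular game) makes the point that a valid choice exists no matter how large the noise is.

```python
# Hypothetical payoff gaps Delta u_i(q0) and noise levels eta_i^2 at a strict equilibrium.
gaps   = [1.5, 0.4, 2.0]     # Delta u_i > 0
eta_sq = [4.0, 9.0, 1.0]     # eta_i^2 = eta_{i,0}^2 + eta_{i,1}^2 (can be arbitrarily large)

# Pick a_i so that Delta u_i - a_i * eta_i^2 / 2 >= m_i > 0, e.g. a_i = Delta u_i / eta_i^2.
a = [g / e for g, e in zip(gaps, eta_sq)]
m = [g - ai * e / 2 for g, ai, e in zip(gaps, a, eta_sq)]
print(a)   # exponents of the Lyapunov function f(y) = sum_i exp(-a_i * y_i)
print(m)   # margins m_i = Delta u_i / 2 > 0, so (5.13) gives Lf <= -k*f with k = min_i a_i m_i
```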

6. Stability of equilibrial play. In deterministic environments, the "folk theorem" of evolutionary game theory provides some pretty strong ties between equilibrial play and stability: strict equilibria are asymptotically stable in the multi-population replicator dynamics (2.9) [11]. In our stochastic setting, we have already seen that this is always true in two important classes of games: those that can be solved by iterated elimination of dominated strategies (Corollary 4.4) and dyadic ones (Proposition 5.5).

Although interesting in themselves, these results clearly fall short of adding up to a decent analogue of the folk theorem for stochastically perturbed games. Nevertheless, they are quite strong omens in that direction and such expectations are vindicated in the following.

Theorem 6.1. The strict equilibria of a game G are stochastically asymptotically stable in the replicator dynamics (3.5), (3.5′) of exponential learning.

Before proving Theorem 6.1, we should first take a slight detour in order to properly highlight some of the issues at hand. On that account, assume again that the profile q_0 = (e_{1,0}, ..., e_{N,0}) is a strict equilibrium of G. Then, if q_0 is to be stochastically stable, say in the uniform dynamics (3.5), one would expect the strategy scores U_{i,0} of player i to grow much faster than the scores U_{iμ}, μ ∈ S_i \ {0}, of his other strategies. This is captured remarkably well by the "adjusted" scores

(6.1a)    Z_{i,0} = λ_i U_{i,0} − log(Σ_μ e^{λ_i U_{iμ}}),

(6.1b)    Z_{iμ} = λ_i (U_{iμ} − U_{i,0}),

where λ_i > 0 is a sensitivity parameter akin (but not identical) to the learning rates of (3.5′) (the choice of common notation is fairly premeditated though). Clearly, whenever Z_{i,0} is large, U_{i,0} will be much greater than any other score U_{iμ} and, hence, the strategy 0 ∈ S_i will be employed by player i far more often. To see this in more detail, it is convenient to introduce the variables

(6.2a)    Y_{i,0} := e^{Z_{i,0}} = e^{λ_i U_{i,0}} / Σ_ν e^{λ_i U_{iν}},

(6.2b)    Y_{iμ} := e^{Z_{iμ}} / Σ_ν e^{Z_{iν}} = e^{λ_i U_{iμ}} / Σ_ν e^{λ_i U_{iν}},

where Y_{i,0} is a measure of how close X_i is to e_{i,0} ∈ Δ_i and (Y_{i,1}, Y_{i,2}, ...) ∈ Δ^{S_i − 1} is a direction indicator; the two sets of coordinates are then related by

the transformation Y_{iα} = X_{iα}^{λ_i} / Σ_μ X_{iμ}^{λ_i}, α ∈ S_i, μ ∈ S_i \ {0}. Consequently, to show that the strict equilibrium q_0 = (e_{1,0}, ..., e_{N,0}) is stochastically asymptotically stable in the replicator equation (3.5), it will suffice to show that Y_{i,0} diverges to infinity as t → ∞ with arbitrarily high probability.
Our first step in this direction will be to derive an SDE for the evolution of the Y_{iα} processes. To that end, Itô's lemma gives

(6.3)    dY_{iα} = Σ_{j,β} (∂Y_{iα}/∂U_{jβ}) dU_{jβ} + ½ Σ_{j,k} Σ_{β,γ} (∂²Y_{iα}/∂U_{jβ} ∂U_{kγ}) dU_{jβ} · dU_{kγ}
                 = Σ_β [u_{iβ} ∂Y_{iα}/∂U_{iβ} + ½ η_{iβ}² ∂²Y_{iα}/∂U_{iβ}²] dt + Σ_β η_{iβ} (∂Y_{iα}/∂U_{iβ}) dW_{iβ},

where, after a simple differentiation of (6.2a), we have

(6.4a)    ∂Y_{i,0}/∂U_{i,0} = λ_i Y_{i,0},        ∂²Y_{i,0}/∂U_{i,0}² = λ_i² Y_{i,0},

(6.4a′)   ∂Y_{i,0}/∂U_{iν} = −λ_i Y_{i,0} Y_{iν},   ∂²Y_{i,0}/∂U_{iν}² = −λ_i² Y_{i,0} Y_{iν}(1 − 2Y_{iν})

and, similarly, from (6.2b)

(6.4b)    ∂Y_{iμ}/∂U_{i,0} = 0,        ∂²Y_{iμ}/∂U_{i,0}² = 0,

(6.4b′)   ∂Y_{iμ}/∂U_{iν} = λ_i Y_{iμ}(δ_{μν} − Y_{iν}),   ∂²Y_{iμ}/∂U_{iν}² = λ_i² Y_{iμ}(δ_{μν} − Y_{iν})(1 − 2Y_{iν}).

In this way, by plugging everything back into (6.3) we finally obtain

(6.5a)    dY_{i,0} = λ_i Y_{i,0} [u_{i,0} − Σ_μ Y_{iμ} u_{iμ} + (λ_i/2) η_{i,0}² − (λ_i/2) Σ_μ Y_{iμ}(1 − 2Y_{iμ}) η_{iμ}²] dt
                    + λ_i Y_{i,0} [η_{i,0} dW_{i,0} − Σ_μ η_{iμ} Y_{iμ} dW_{iμ}],

(6.5b)    dY_{iμ} = λ_i Y_{iμ} [u_{iμ} − Σ_ν u_{iν} Y_{iν}] dt
                   + (λ_i²/2) Y_{iμ} [η_{iμ}²(1 − 2Y_{iμ}) − Σ_ν η_{iν}² Y_{iν}(1 − 2Y_{iν})] dt
                   + λ_i Y_{iμ} [η_{iμ} dW_{iμ} − Σ_ν η_{iν} Y_{iν} dW_{iν}],

where we have suppressed the arguments of u_i and η_i in order to reduce notational clutter.
This last SDE is particularly revealing: roughly speaking, we see that if λ_i is chosen small enough, the deterministic term u_{i,0} − Σ_μ Y_{iμ} u_{iμ} will dominate the rest (cf. the "soft" learning rates of Proposition 5.1). And, since we know that strict equilibria are asymptotically stable in the deterministic case, it is plausible to expect the SDE (6.5) to behave in a similar fashion.
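Before proceeding to the proof, a small numerical sanity check of the coordinate change (6.1)–(6.2) may be helpful (purely illustrative, for a single player with arbitrary scores). The snippet below assumes the uniform choice map X_{iα} ∝ e^{U_{iα}} and verifies the relation Y_{iα} = X_{iα}^{λ_i} / Σ_{μ≠0} X_{iμ}^{λ_i} quoted above; the names U, lam, Z0, Zmu are ours.

    import numpy as np

    rng = np.random.default_rng(1)
    U = rng.normal(size=5)          # arbitrary strategy scores U_{i,alpha}, alpha = 0,...,4
    lam = 0.3                       # sensitivity parameter lambda_i

    X = np.exp(U) / np.exp(U).sum()             # uniform logit choice map

    # adjusted scores (6.1)
    Z0 = lam * U[0] - np.log(np.exp(lam * U[1:]).sum())
    Zmu = lam * (U[1:] - U[0])

    # Y-coordinates (6.2)
    Y0 = np.exp(Z0)
    Ymu = np.exp(Zmu) / np.exp(Zmu).sum()

    # the relation Y_{i,alpha} = X_{i,alpha}^lambda / sum_{mu != 0} X_{i,mu}^lambda
    denom = (X[1:] ** lam).sum()
    print(np.allclose(Y0, X[0] ** lam / denom))      # True
    print(np.allclose(Ymu, X[1:] ** lam / denom))    # True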

Proof of Theorem 6.1. Tying in with our previous discussion, we will establish stochastic asymptotic stability of strict equilibria in the dynamics (3.5) by looking at the processes Y_i = (Y_{i,0}, Y_{i,1}, ...) ∈ R × Δ^{S_i − 1} of (6.2). In these coordinates, we just need to show that for every M_i > 0, i ∈ N, and any ε > 0, there exist Q_i > M_i such that if Y_{i,0}(0) > Q_i, then, with probability greater than 1 − ε, lim_{t→∞} Y_{i,0}(t) = ∞ and Y_{i,0}(t) > M_i for all t ≥ 0. In the spirit of the previous section, we will accomplish this with the help of the stochastic Lyapunov method.
Our first task will be to calculate the generator of the diffusion Y = (Y_1, ..., Y_N), that is, the second order differential operator

(6.6)    L = Σ_{i∈N} Σ_{α∈S_i} b_{iα}(y) ∂/∂y_{iα} + ½ Σ_{i∈N} Σ_{α,β∈S_i} (σ_i(y) σ_i^T(y))_{αβ} ∂²/∂y_{iα} ∂y_{iβ},

where b_i and σ_i are the drift and diffusion coefficients of the SDE (6.5), respectively. In particular, if we restrict our attention to sufficiently smooth functions of the form f(y) = Σ_{i∈N} f_i(y_{i,0}), the application of L yields

(6.7)    Lf(y) = Σ_{i∈N} λ_i y_{i,0} [u_{i,0} + (λ_i/2) η_{i,0}² − Σ_μ y_{iμ} (u_{iμ} − (λ_i/2)(1 − 2y_{iμ}) η_{iμ}²)] ∂f_i/∂y_{i,0}
                + ½ Σ_{i∈N} λ_i² y_{i,0}² [η_{i,0}² + Σ_μ η_{iμ}² y_{iμ}²] ∂²f_i/∂y_{i,0}².

Therefore, let us consider the function f(y) = Σ_i 1/y_{i,0} for y_{i,0} > 0. With ∂f/∂y_{i,0} = −1/y_{i,0}² and ∂²f/∂y_{i,0}² = 2/y_{i,0}³, (6.7) becomes

(6.8)    Lf(y) = −Σ_{i∈N} (λ_i / y_{i,0}) [u_{i,0} − Σ_μ u_{iμ} y_{iμ} − (λ_i/2) η_{i,0}² − (λ_i/2) Σ_μ y_{iμ}(1 − y_{iμ}) η_{iμ}²].

However, since q_0 = (e_{1,0}, ..., e_{N,0}) has been assumed to be a strict Nash equilibrium of G, we will have u_{i,0}(q_0) > u_{iμ}(q_0) for all μ ∈ S_i \ {0}. Then, by continuity, there exists some positive constant v_i > 0 with u_{i,0} − Σ_μ u_{iμ} y_{iμ} ≥ v_i > 0 whenever y_{i,0} is large enough (recall that Σ_μ y_{iμ} = 1). So, if we set η_i = max{|η_{iβ}(x)| : x ∈ Δ, β ∈ S_i} and pick positive λ_i with λ_i < v_i/η_i², we get

(6.9)    Lf(y) ≤ −Σ_{i∈N} (λ_i v_i) / (2 y_{i,0}) ≤ −½ min_i{λ_i v_i} f(y)

for all sufficiently large y_{i,0}. Moreover, f is strictly positive for y_{i,0} > 0 and vanishes only as y_{i,0} → ∞. Hence, as in the proof of Proposition 5.2, our claim follows on account of f being a (local) stochastic Lyapunov function. Finally, in the case of the rate-adjusted replicator dynamics (3.5′), the proof is similar and only entails a rescaling of the parameters λ_i. □

Remark 6.2. If we trace our steps back to the coordinates X_{iα}, our Lyapunov candidate takes the form f(x) = Σ_i x_{i,0}^{−λ_i} Σ_μ x_{iμ}^{λ_i}. It thus begs to be compared to the Lyapunov function Σ_μ x_μ^λ employed by Imhof and Hofbauer in [8] to derive a conditional version of Theorem 6.1 in the evolutionary setting. As it turns out, the obvious extension f(x) = Σ_i Σ_μ x_{iμ}^{λ_i} works in our case as well, but the calculations are much more cumbersome and they are also shorn of their ties to the adjusted scores (6.1).

Remark 6.3. We should not neglect to highlight the dual role that the learning rates λ_i play in our analysis. In the logistic learning model (2.10), they measure the players' convictions and how strongly they react to a given stimulus (the scores U_{iα}); in this role, they are fixed at the outset of the game and form an intrinsic part of the replicator dynamics (3.5′). On the other hand, they also make a virtual appearance as free temperature parameters in the adjusted scores (6.1), to be softened until we get the desired result. For this reason, even though Theorem 6.1 remains true for any choice of learning rates, the function f(x) = Σ_i x_{i,0}^{−λ_i} Σ_μ x_{iμ}^{λ_i} is a Lyapunov function only if the sensitivity parameters λ_i are small enough. It might thus seem unfortunate that we chose the same notation in both cases, but we feel that our decision is justified by the intimate relation of the two parameters.
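The following sketch (again illustrative only) integrates the score dynamics for an ad hoc 2-player, 3-strategy coordination game with a strict equilibrium at (0, 0) and monitors the Lyapunov candidate f(x) = Σ_i x_{i,0}^{−λ_i} Σ_μ x_{iμ}^{λ_i} of Remark 6.2 along the simulated path; the payoff matrices A and B, the noise level, the value of λ and the helpers choice and lyap are all our own choices, not part of the paper's model specification.

    import numpy as np

    rng = np.random.default_rng(0)

    # 2-player coordination game with 3 strategies; (0, 0) is a strict equilibrium.
    A = np.diag([3.0, 2.0, 1.0])          # player 1 payoff matrix (arbitrary)
    B = A.copy()                          # player 2 payoff matrix

    eta, lam = 0.25, 0.2                  # noise level and a "small" sensitivity lambda
    dt, T = 1e-3, 20.0

    U = np.zeros((2, 3))
    U[:, 0] = 1.0                          # start in a neighborhood of (e_{1,0}, e_{2,0})

    def choice(U):                         # uniform logit choice map
        E = np.exp(U)
        return E / E.sum(axis=1, keepdims=True)

    def lyap(X):                           # f(x) = sum_i x_{i,0}^{-lam} * sum_{mu>0} x_{i,mu}^lam
        return sum(X[i, 0] ** (-lam) * (X[i, 1:] ** lam).sum() for i in (0, 1))

    for step in range(int(T / dt)):
        X = choice(U)
        u1 = A @ X[1]                      # u_{1,alpha}(X)
        u2 = B.T @ X[0]                    # u_{2,alpha}(X)
        U[0] += u1 * dt + eta * rng.normal(scale=np.sqrt(dt), size=3)
        U[1] += u2 * dt + eta * rng.normal(scale=np.sqrt(dt), size=3)
        if step % 5000 == 0:
            print(f"t = {step * dt:5.1f}   f = {lyap(X):.3e}")

    print("final mixed strategies:", np.round(choice(U), 4))

In a typical run the monitored value decays toward 0 and both players concentrate on strategy 0, in line with Theorem 6.1; a single sample path is, of course, only an illustration.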

7. Discussion. Our aim in this last section will be to discuss a number of important issues that we have not been able to address thoroughly in the rest of the paper; truth be told, a good part of this discussion can be seen as a roadmap for future research.

Ties with evolutionary game theory. In single-population evolutionary models, an evolutionarily stable strategy (ESS) is a strategy which is robust against invasion by mutant phenotypes [15]. Strategies of this kind can be considered as a stepping stone between mixed and strict equilibria and they are of such significance that it makes one wonder why they have not been included in our analysis.
The reason for this omission is pretty simple: even the weakest evolutionary criteria in multi-population models tend to reject all strategies which are not strict Nash equilibria [11]. Therefore, since our learning model (2.9) corresponds exactly to the multi-population environment (2.14), we lose nothing by concentrating our analysis only on the strict equilibria of the game. If anything, this equivalence between ESS and strict equilibria in multi-population settings further highlights the importance of the latter.
However, this also brings out the gulf between the single-population setting and our own, even when we restrict ourselves to 2-player games (which are the norm in single-population models). Indeed, the single-population version of the dynamics (3.8) is:

(7.1)    dX_α = X_α [(u_α(X) − u(X, X)) − (η_α² X_α − Σ_β η_β² X_β²)] dt + X_α [η_α dW_α − Σ_β η_β X_β dW_β].

As it turns out, if a game possesses an interior ESS and the shocks are mild enough, the solution paths X(t) of the (single-population) replicator dynamics will be recurrent (Theorem 2.1 in [10]). Theorem 6.1 rules out such behavior in the case of strict equilibria (the multi-population analogue of ESS), but does not answer the following question: if the underlying game only has mixed equilibria, will the solution X(t) of the dynamics (3.5) be recurrent?
This question is equivalent to showing that a profile x is stochastically asymptotically stable in the replicator equations (3.5), (3.5′) only if it is a strict equilibrium. Since Theorem 6.1 provides the converse "if" part, an answer in the positive would yield a strong equivalence between stochastically stable states and strict equilibria; we leave this direction to be explored in future papers.
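To experiment numerically with this question in the single-population setting, one can integrate the two-strategy reduction of (7.1); the sketch below uses a Hawk–Dove game (V = 2, C = 4), whose unique interior ESS sits at x* = 1/2, together with mild constant shocks. The payoff difference udiff, the noise levels and the clipping safeguard are arbitrary choices of ours; consistently with the recurrence result of [10], one expects the sample path to keep fluctuating around 1/2 rather than converging to a vertex.

    import numpy as np

    rng = np.random.default_rng(2)

    # Hawk-Dove game (V = 2, C = 4): payoff difference u_H - u_D = 1 - 2x,
    # so the unique interior ESS is x* = 1/2.
    def udiff(x):
        return 1.0 - 2.0 * x

    etaH = etaD = 0.2            # mild, constant shocks (arbitrary)
    dt, T = 1e-3, 200.0
    x, path = 0.5, []

    for _ in range(int(T / dt)):
        dWH, dWD = rng.normal(scale=np.sqrt(dt), size=2)
        # two-strategy reduction of (7.1) with x = share of Hawks
        drift = x * (1 - x) * (udiff(x) - (etaH**2 * x - etaD**2 * (1 - x)))
        noise = x * (1 - x) * (etaH * dWH - etaD * dWD)
        x = np.clip(x + drift * dt + noise, 1e-9, 1 - 1e-9)   # crude Euler step, clipped
        path.append(x)

    path = np.array(path)
    print(path.min(), path.mean(), path.max())   # the path should hover around 1/2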

Itô vs. Stratonovich. For comparison purposes (but also for simplicity), let us momentarily assume that the noise coefficients η_{iα} do not depend on the state X(t) of the game. In that case, it is interesting (and very instructive) to note that the SDE (3.1) remains unchanged if we use Stratonovich integrals instead of Itô ones:

(7.2)    dU_{iα}(t) = u_{iα}(X(t)) dt + η_{iα} ∂W_{iα}(t).

Then, after a few calculations, the corresponding replicator equation reads

(7.3)    ∂X_{iα} = X_{iα} (u_{iα}(X) − u_i(X)) dt + X_{iα} [η_{iα} ∂W_{iα} − Σ_β η_{iβ} X_{iβ} ∂W_{iβ}].

The form of this last equation is remarkably suggestive. First, it highlights the role of the modified game ũ_{iα} = u_{iα} + ½ η_{iα}² even more crisply than (3.5): the payoff terms are completely decoupled from the noise, in contrast to what one obtains by introducing Stratonovich perturbations in the evolutionary setting [8, 13]. Secondly, one can seemingly use this simpler equation to get a much more transparent proof of Proposition 4.1: the estimates for the cross entropy terms G_{q_i − q_i′} are recovered almost immediately from the Stratonovich dynamics. However, since (7.3) takes this form only for constant coefficients η_{iα} (the general case is quite a bit uglier), we chose the route of consistency and employed Itô integrals throughout our paper.
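For a quick numerical consistency check (with a single player, two strategies, and the constant payoffs, noise level and step size below being arbitrary), one can integrate (7.3) with a stochastic Heun step, which targets the Stratonovich solution, and compare it with the logit image of the score SDE (3.1) integrated with Euler–Maruyama and driven by the same Brownian increments; the helper functions a and b are ours.

    import numpy as np

    rng = np.random.default_rng(3)

    u0, u1 = 1.0, 0.5            # fixed payoffs of the two strategies (arbitrary)
    eta = 0.3                    # constant noise coefficient
    dt, T = 1e-3, 5.0

    U = np.array([0.0, 0.0])     # reference path: Euler-Maruyama on the scores
    x = 0.5                      # Stratonovich path: stochastic Heun on (7.3), x = X_0

    def a(x):                    # drift of the two-strategy reduction of (7.3)
        return x * (1 - x) * (u0 - u1)

    def b(x):                    # diffusion coefficients against dW_0 and dW_1
        return x * (1 - x) * eta, -x * (1 - x) * eta

    for _ in range(int(T / dt)):
        dW = rng.normal(scale=np.sqrt(dt), size=2)
        U += np.array([u0, u1]) * dt + eta * dW                # score update

        b0, b1 = b(x)
        xp = x + a(x) * dt + b0 * dW[0] + b1 * dW[1]           # Heun predictor
        b0p, b1p = b(xp)
        x += 0.5 * (a(x) + a(xp)) * dt \
             + 0.5 * (b0 + b0p) * dW[0] + 0.5 * (b1 + b1p) * dW[1]

    x_ref = np.exp(U[0]) / np.exp(U).sum()
    print(x_ref, x)              # the two values should nearly coincide for small dt

Since the payoffs and noise coefficients are constant here, the logit image of the scores is (up to the discretization of the Brownian path) the exact solution, so the closeness of the two outputs illustrates the pathwise agreement between (7.3) and the Itô dynamics claimed above.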

Applications in network design. Before closing, it is worth pointing out the applicability of the above approach to networks where the presence of noise or uncertainty has two general sources. The first of these has to do with the time variability of the connections, which may be due to the fluctuations of the link quality because of mobility in the wireless case or because of external factors (e.g., load conditions) in wireline networks. This variability is usually dependent on the state of the network and was our original motivation in considering noise coefficients η_{iα} that are functions of the players' strategy profile; incidentally, it was also our original motivation for considering randomly fluctuating payoffs in the first place: travel times and delays in traffic models are not determined solely by the players' choices, but also by the fickle interference of nature.
The second source stems from errors in the measurement of the payoffs themselves (e.g., the throughput obtained in a particular link) and also from the lack of information on the payoff of strategies that were not employed. The variability of the noise coefficients η_{iα} again allows for a reasonable approximation to this problem. Indeed, if η_{iα} : Δ → R is continuous and satisfies η_{iα}(x_{−i}; α) = 0 for all i ∈ N, α ∈ S_i, this means that there are only errors in estimating the payoffs of strategies that were not employed (or small errors for pure strategies that are employed with high probability). Of course, this does not yet give the full picture [one should consider the discrete-time dynamical system (2.6) instead, where the players' actual choices are considered], but we conjecture that our results will remain essentially unaltered.
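The system (2.6) itself is not reproduced here; purely as a rough sketch of the kind of discrete learning loop we have in mind, the snippet below lets two players repeatedly play a minority game on two facilities, updates the scores with an exactly observed payoff for the employed strategy and a noisy estimate for the unemployed one, and draws actions from the logit map. The game, the noise model sigma and the step size delta are hypothetical choices of ours and not the specification of (2.6).

    import numpy as np

    rng = np.random.default_rng(4)

    # 2-player minority game on two facilities: a player earns 1 when it is alone.
    def payoffs(other_choice):
        return np.array([1.0, 0.0] if other_choice == 1 else [0.0, 1.0])

    sigma = 0.3        # measurement noise on the strategy that was NOT employed
    delta = 0.1        # step size of the discrete update (arbitrary)
    U = np.zeros((2, 2))

    for _ in range(5000):
        X = np.exp(U) / np.exp(U).sum(axis=1, keepdims=True)   # logit choice rule
        acts = [rng.choice(2, p=X[i]) for i in (0, 1)]          # sampled actions
        for i in (0, 1):
            u_hat = payoffs(acts[1 - i]).copy()
            for alpha in (0, 1):
                if alpha != acts[i]:                 # unused strategy: noisy estimate
                    u_hat[alpha] += sigma * rng.normal()
            U[i] += delta * u_hat                    # exponential-learning score update

    print(np.round(np.exp(U) / np.exp(U).sum(axis=1, keepdims=True), 3))
    # in a typical run the two players end up (close to) splitting across the facilities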

Acknowledgments. We would like to extend our gratitude to the anonymous referee for his insightful comments and to David Leslie from the University of Bristol for the fruitful discussions on the discrete version of the exponential learning model.
Some of the results of Section 4 were presented at the conference "Game Theory for Networks" at Boğaziçi University, Istanbul, May 2009 [16].

REFERENCES

[1] Arnold, L. (1974). Stochastic Differential Equations: Theory and Applications. Wiley, New York. MR0443083
[2] Arthur, W. B. (1994). Inductive reasoning and bounded rationality (the El Farol problem). American Economic Review 84 406–411.
[3] Aumann, R. J. (1974). Subjectivity and correlation in randomized strategies. J. Math. Econom. 1 67–96. MR0421694
[4] Cabrales, A. (2000). Stochastic replicator dynamics. Internat. Econom. Rev. 41 451–481. MR1760461
[5] Fudenberg, D. and Harris, C. (1992). Evolutionary dynamics with aggregate shocks. J. Econom. Theory 57 420–441. MR1180005
[6] Fudenberg, D. and Levine, D. K. (1998). The Theory of Learning in Games. MIT Press Series on Economic Learning and Social Evolution 2. MIT Press, Cambridge, MA. MR1629477
[7] Gikhman, I. I. and Skorokhod, A. V. (1971). Stochastische Differentialgleichungen. Akademie-Verlag, Berlin.
[8] Hofbauer, J. and Imhof, L. A. (2009). Time averages, recurrence and transience in the stochastic replicator dynamics. Ann. Appl. Probab. 19 1347–1368. MR2538073
[9] Hofbauer, J. and Sigmund, K. (1988). The Theory of Evolution and Dynamical Systems. Cambridge Univ. Press, Cambridge. MR1071180
[10] Imhof, L. A. (2005). The long-run behavior of the stochastic replicator dynamics. Ann. Appl. Probab. 15 1019–1045. MR2114999
[11] Weibull, J. W. (1995). Evolutionary Game Theory. MIT Press, Cambridge, MA. MR1347921
[12] Karatzas, I. and Shreve, S. E. (1998). Brownian Motion and Stochastic Calculus. Springer, New York.
[13] Khas'minskii, R. Z. and Potsepun, N. (2006). On the replicator dynamics behavior under Stratonovich type random perturbations. Stoch. Dyn. 6 197–211. MR2239089
[14] Marsili, M., Challet, D. and Zecchina, R. (2000). Exact solution of a modified El Farol's bar problem: Efficiency and the role of market impact. Phys. A 280 522–553.
[15] Maynard Smith, J. (1974). The theory of games and the evolution of animal conflicts. J. Theoret. Biol. 47 209–221. MR0444115
[16] Mertikopoulos, P. and Moustakas, A. L. (2009). Learning in the presence of noise. In GameNets'09: Proceedings of the 1st International Conference on Game Theory for Networks 308–313. IEEE Press, Piscataway, NJ.
[17] Milchtaich, I. (1996). Congestion games with player-specific payoff functions. Games Econom. Behav. 13 111–124. MR1385132
[18] Monderer, D. and Shapley, L. S. (1996). Potential games. Games Econom. Behav. 14 124–143. MR1393599
[19] Nash, J. F. (1951). Non-cooperative games. Ann. Math. 54 286–295. MR0043432
[20] Øksendal, B. (2006). Stochastic Differential Equations, 6th ed. Springer, Berlin.
[21] Samuelson, L. and Zhang, J. (1992). Evolutionary stability in asymmetric games. J. Econom. Theory 57 363–391. MR1180002
[22] Schuster, P. and Sigmund, K. (1983). Replicator dynamics. J. Theoret. Biol. 100 533–538. MR0693413
[23] Taylor, P. D. and Jonker, L. B. (1978). Evolutionary stable strategies and game dynamics. Math. Biosci. 40 145–156. MR0489983

Department of Physics University of Athens Panepistimioupolis, Zografou Athens, 15784 Greece E-mail: [email protected] [email protected]