Santa Fe Institute Working Paper 04-08-025
arxiv.org/abs/nlin.AO/0408039

Stability and Diversity in Collective Adaptation

Yuzuru Sato,^{1,∗} Eizo Akiyama,^{2,†} and James P. Crutchfield^{1,‡}

^1 Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA
^2 Institute of Policy and Planning Sciences, University of Tsukuba, Tennodai 1-1-1, Tsukuba, Ibaraki 305-8573, Japan

∗ Electronic address: [email protected]
† Electronic address: [email protected]
‡ Electronic address: [email protected]

We derive a class of macroscopic differential equations that describe collective adaptation, starting from a discrete-time stochastic microscopic model. The behavior of each agent is a dynamic balance between adaptation that locally achieves the best action and memory loss that leads to randomized behavior. We show that, although individual agents interact with their environment and other agents in a purely self-interested way, macroscopic behavior can be interpreted as game dynamics. Application to several familiar, explicit game interactions shows that the adaptation dynamics exhibits a diversity of collective behaviors, including stable limit cycles, quasiperiodicity, intermittency, and deterministic chaos. The simplicity of the assumptions underlying the macroscopic equations suggests that these behaviors should be expected broadly in collective adaptation. We also analyze the adaptation dynamics from an information-theoretic viewpoint and discuss self-organization induced by information flux between agents, giving a novel view of collective adaptation.

PACS numbers: 05.45.-a, 89.75.Fb, 02.50.Le
Keywords: collective adaptation, evolutionary dynamics, game theory, information theory, dynamical systems

I. INTRODUCTION

Collective adaptation in groups of adaptive systems is an important and cross-cutting topic that appears under various guises in many fields, including biology, cognitive neurosciences, computer science, and social science. In all these adaptive systems, individual agents interact with one another and modify their behaviors according to the information they receive through those interactions. Often, though, collective behaviors emerge that are beyond the individual agent's perceptual capabilities and that sometimes frustrate satisfying the local goals. With competitive interactions, dynamic adaptation can produce rich and unexpected behaviors. This kind of mutual adaptation has been discussed, for example, in studies of biological group interaction [1-3], interactive learning [4-6], large-scale adaptive systems [7-9], and learning in games [10, 11].

Here we develop a class of coupled differential equations for mutual adaptation in agent collectives—systems in which agents learn how to act in their environment and with other agents through reinforcement of their actions. We show that the adaptive behavior in agent collectives, in special cases, reduces to a generalized form of multipopulation replicator equations and, generally, can be viewed as a kind of information-theoretic self-organization in a collective adaptive system.

Suppose that many agents interact with an environment and each independently attempts to adjust its behavior to the environment based on its sensory stimuli. The environment consists of other agents and other exogenous influences. The agents could be humans, animals, or machines, but we make no assumptions about their detailed internal structures. That is, the central hypothesis in the following is that collective adaptation is a dynamical behavior driven by agents' environment-mediated interactions. By separating the time scales of change in the environment, of agents' adaptation, and of agent-agent interactions, our models describe, not the action-taking itself, but the temporal change in the probability distribution of choices.

A. Related Work

This approach should be compared and contrasted with the game-theoretic view [12]. First, classical game theory often assumes that players have knowledge of the entire environmental structure and of other players' decision-making processes. Our adaptive agents, however, have no knowledge of a game in which they might be playing. Thus, unlike classical game theory, in our setting there is no bird's eye view of the entire collective that is available to the agents. Agents have only a myopic model of the environment, since any information external to them is given implicitly via the reinforcements for their action choices. Second, although we employ game-theoretic concepts such as Nash equilibria, we focus almost exclusively on dynamics—transients, attractors, and so on—of collective adaptation, while, naturally, making contact with the statics familiar from game theory. Finally, despite the differences, game structures can be introduced as a set of parameters corresponding to approximated static environments.

While replicator dynamics were introduced originally for evolutionary game theory [13-15], the relationship between learning with reinforcement and replicator equations has been discussed only recently [10, 11]. Briefly stated, in our model the state space represents an individual agent's probability distribution over action choices, and the adaptation equations describe the temporal evolution of the choice probabilities as the agents interact. Here, we extend these considerations to collective adaptation, introducing the theory behind a previously reported model [16, 17]. The overall approach, though, establishes a general framework for dynamical-systems modeling and analysis of adaptive behavior in collectives. It is important to emphasize that our framework goes beyond the multipopulation replicator equations and asymmetric game dynamics since it does not require a static environment (cf. Refs. [18, 19] for dynamic environments) and it includes the key element of the temporal loss of memory.

We model adaptation in terms of the distribution of agents' choices, developing a set of differential equations that are a continuous-time limit of a discrete-time stochastic process; cf. Ref. [20]. We spend some time discussing the origin of action probabilities, since this is necessary to understand the model variables and also to clarify the limits that we invoke to arrive at our model. One is tempted to give a game-theoretic interpretation of the model and its development. For example, the mixed strategies in game play are often interpreted as weights over all (complete plans of) actions. However, the game-theoretic view is inappropriate for analyzing local, myopic adaptation and the time evolution of collective behavior.

Another interpretation of our use of action probabilities comes from regarding them as frequencies of action choices. In this view, one needs long-time trials so that the frequencies take on statistical validity for an agent. Short of this, they would be dominated by fluctuations due to undersampling. In particular, one requires that stable limit distributions exist. Moreover, the underlying deterministic dynamics of adaptation should be ergodic and have strong mixing properties. Finally, considering agent-agent interactions, one needs to assume that their adaptation is very slow compared to the interaction dynamics. For rapid, say, real-time, adaptation these assumptions would be invalid. Nonetheless, they are appropriate for long-term reinforcement, as found in learning motion through iterated exercise and learning customs through social interaction.

B. Synopsis

The approach we take is ultimately phenomenological. We are reminded of the reaction-diffusion models of biological morphogenesis introduced originally in Ref. [21]. There, the detailed processes of biological development and pattern formation were abstracted, since their biochemical basis was (and still is) largely unknown, and a behavioral phenomenology was developed on this basis. Similarly, we abstract the detailed and unknown perceptual processes that underlie agent adaptation and construct a phenomenology that captures adaptive behavior at a larger scale, in agent collectives.

The phenomenology that we develop is one based on communications systems. Agents in a collective are confronted with the same three communication problems posed by Weaver in the founding work of information theory—The Mathematical Theory of Communication [22]: (i) communicating accurately, (ii) interpreting the transmitted information, and (iii) modifying future behavior based on the information. Shannon solved the first problem, developing his theory of error-free transmission [22]. In his vocabulary adaptive agents are information sources. Each (i) receives information from the external environment, which includes other agents, (ii) interprets the received information and modifies its internal model accordingly, and (iii) then, making decisions based on those modifications, generates future behavior.

We will show that this information-theoretic view provides useful tools for analyzing collective adaptation and also an appropriate description for our assumed frequency dynamics. Using these we derive a new state space based on the self-informations of agents' actions, and this allows one to investigate the dynamics of uncertainty in collective adaptation. It will become clear, though, that the assumption of global information maximization has limited relevance here, even for simple adaptation in a static environment. Instead, self-organization that derives from the information flux between agents gives us a new view of collective adaptation.

To illustrate collective adaptation, we present several simulations of example environments; in particular, those having frustrated agent-agent interactions [23]. Interestingly, for two agents with perfect memory interacting via zero-sum rock-scissors-paper interactions the dynamics exhibits Hamiltonian chaos [16]. With memory loss, though, the dynamics becomes dissipative and displays the full range of nonlinear dynamical behaviors, including limit cycles, intermittency, and deterministic chaos [17].

The examples illustrate that Nash equilibria often play little or no role in collective adaptation. Sometimes the dynamics is explicitly excluded from reaching Nash equilibria, even asymptotically. Rather, it turns out that the network describing the switching between deterministic actions is a dominant factor in structuring the state-space flows. From it, much of the dynamics, including the origins of chaos, becomes intuitively clear.

In the next section (Sec. II), we develop a dynamical system that models adaptive behavior in collectives. In Sec. III we introduce an information-theoretic view and a coordinate transformation for adaptive dynamics. To illustrate the rich range of behaviors, in Sec. IV we give several examples of adaptive dynamics based on non-transitive interactions. Finally, in Sec. V we interpret our results and suggest future directions.

II. DYNAMICS FOR COLLECTIVE ADAPTATION

Before developing the full equations for a collective of adaptive agents, it is helpful to first describe the dynamics of how an individual agent adapts to the constraints imposed by its environment using the memory of its past behaviors. We then build up a description of how multiple agents interact, focusing only on the additional features that come from interaction. The result is a set of coupled differential equations that determine the behavior of adaptive agent collectives and are amenable to various kinds of geometric, statistical, and information-theoretic analyses.

A. Individual Agent Adaptation

Here we develop a continuous-time model for adaptation in an environment with a single adaptive agent. Although the behavior in this case is relatively simple, the single-agent case allows us to explain several basic points about dynamic adaptation, without the complications of a collective and agent-agent interactions. In particular, we discuss how and why we go from a discrete-time stochastic process to a continuous-time limit. We also describe an agent's effective internal model of the environment and how we model its adaptation process via a probability distribution of action choices.

An agent takes one of N possible actions, i = 1, 2, ..., N, at each time step τ. Let the probability for the agent to choose action i be x_i(τ), where τ is the number of steps from the initial state x_i(0). The agent's state vector—its choice distribution—at time τ is x(τ) = (x_1(τ), x_2(τ), ..., x_N(τ)), where Σ_{n=1}^{N} x_n(τ) = 1. In the following we refer to the temporal behavior of x(τ) as the dynamics of adaptation.

Let r_i(τ) denote the reinforcement the agent receives for taking action i at step τ and denote the collection of these by the vector r(τ) = (r_1(τ), ..., r_N(τ)). The agent's memories—denoted Q(τ) = (Q_1(τ), ..., Q_N(τ))—of past rewards from its actions are updated according to

  Q_i(τ+1) − Q_i(τ) = (1/T) [ δ_i(τ) r_i(τ) − α Q_i(τ) ] ,  (1)

where δ_i(τ) = 1 if action i is chosen at step τ and 0 otherwise, with i = 1, ..., N and Q_i(0) = 0. T is a constant that sets the agent-environment interaction time scale. α ∈ [0, 1) controls the agent's memory loss rate. For α = 0 the agent has a perfect memory as the sum of the past reinforcements; for α > 0 the memory is attenuated in that older reinforcements have less effect on the current Q_i's and more recent reinforcements are given larger weight. One imagines that the agent constructs a histogram of past reinforcements and this serves as a simple internal memory of its environment.

An agent chooses its next action according to its choice distribution, which is updated from the reinforcement memory according to

  x_i(τ) = e^{β Q_i(τ)} / Σ_{n=1}^{N} e^{β Q_n(τ)} ,  (2)

where i = 1, 2, ..., N. β ∈ [0, ∞] controls the adaptation rate: how much the choice distribution is changed by the memory of past reinforcements. For example, if β = 0, the choice distribution is unaffected by past reinforcements. Specifically, it becomes independent of Q and one has x_i(τ) = 1/N. In this case, the agent chooses actions with uniform probability and so behaves completely randomly. In a complementary fashion, in the limit β → ∞ the agent chooses the action i with the maximum Q_i(τ) and x_i(τ) → 1.

Given Eq. (2), the time evolution of the agent's choice distribution is

  x_i(τ+1) = x_i(τ) e^{β (Q_i(τ+1) − Q_i(τ))} / Σ_{n=1}^{N} x_n(τ) e^{β (Q_n(τ+1) − Q_n(τ))} ,  (3)

where i = 1, 2, ..., N. This determines how the agent adapts its choice distribution using the reinforcements it has received from the environment for its past actions.

This simple kind of adaptation was introduced as a principle of behavioral learning [24, 25] and as a model of stochastic learning [26]. It is sometimes referred to as reinforcement learning [27, 28]. Arguably, it is the simplest form of adaptation in which an agent develops relationships or behavior patterns through reinforcements from external stimuli.

Starting with the discrete-time model above, one can develop a continuous-time model that corresponds to the agent performing a large number of actions—iterates of Eq. (1)—for each choice distribution update—an iterate of Eq. (2). Thus, we recognize two different time scales: one for agent-environment interactions and one for adaptation of the agent's internal model based on its internal memory. We assume that the adaptation dynamics is very slow compared to interactions and so x is essentially constant during interactions. See Fig. 1.

FIG. 1: The time scale (τ) of a single agent interacting with its environment and the time scale (t) of the agent's adaptation: τ ≪ t.

Starting from Eq. (1), one can show that the continuous-time dynamics of memory updates is given by the differential equations

  Q̇_i(t) = R_i(t) − α Q_i(t) ,  (4)

with i = 1, 2, ..., N and Q_i(0) = 0. (See App. A.) Here R_i is the reward the environment gives to the agent choosing action i: the average of r_i(τ) during the time interval between updates of x at t and t + dt.

From Eq. (2) one sees that the map from Q(t) to x(t) at time t is given by

  x_i(t) = e^{β Q_i(t)} / Σ_{n=1}^{N} e^{β Q_n(t)} ,  (5)

where i = 1, 2, ..., N. Differentiating Eq. (5) gives the continuous-time dynamics

  ẋ_i(t) = β x_i(t) ( Q̇_i(t) − Σ_{n=1}^{N} Q̇_n(t) x_n(t) ) ,  (6)

with i = 1, 2, ..., N.

Assembling Eqs. (4), (5), and (6), one finds the basic dynamic that governs agent behavior on the adaptation time scale:

  ẋ_i / x_i = β (R_i − R̄) + α (H_i − H̄) ,  (7)

where i = 1, 2, ..., N. Here

  R̄ = Σ_{n=1}^{N} x_n R_n  (8)

is the net reinforcement averaged over the agent's possible actions, and H_i = −log x_i is the self-information or degree of surprise when the agent takes action i [22]. The average self-information, or Shannon entropy of the choice distribution, also appears:

  H̄ = Σ_{n=1}^{N} x_n H_n = −Σ_{n=1}^{N} x_n log x_n .  (9)

These are the uncertainties of the agent's choice distribution measured, not in bits (binary digits), but in nats (natural digits), since the natural logarithm is used. The entropy measures the choice distribution's flatness, being maximized when the choices all have equal probability.

Fortunately, the basic dynamic captured by Eq. (7) is quite intuitive, being the balance of two terms on the right-hand side. The first term describes an adaptation dynamic, whose time scale is controlled by β. The second describes the loss of memory with a time scale controlled by α. That is, the adaptation in choice probabilities is driven by a balance between two forces: the tendency to concentrate the choice probability based on the reinforcements {R_i} and the tendency to make choices equally likely. Finally, on the left-hand side, one has the logarithmic derivative of the choice probabilities: ẋ_i / x_i = d(log x_i)/dt.

Note that each of the terms on the right-hand side is a difference between a function of a particular choice and that function's average. Specifically, the first term ΔR_i ≡ R_i − R̄ is the relative benefit in choosing action i compared to the mean reinforcement across all choices. Other things being held constant, if this term is positive, then action i is the better choice compared to the mean and x_i will increase. The second term ΔH_i ≡ H_i − H̄ is the relative informativeness of taking action i compared to the average H̄. Thus, x_i decreases in proportion to how likely action i is at time t, and so this term works to increase the uncertainty of the agent's actions, flattening the choice distribution by increasing the probability of unlikely actions. When x_i = N^{−1}, the distribution is flat (purely random choices), ΔH_i = 0, and memory loss effects disappear.

Mathematically, the adaptation equations have quite a bit of structure and this has important consequences, as we will see. Summarizing, the adaptation equations describe a dynamic that balances the tendency to concentrate on choices associated with the best action against the tendency to make the choices equally likely. The net result is to increase the choice uncertainty, subject to the constraints imposed by the environment via the reinforcements. Thus, the choice distribution is the least biased distribution consistent with environmental constraints and individual memory loss. We will return to discuss this mechanism in detail using information theory in Sec. III.

FIG. 2: A dynamic balance of adaptation and memory loss: Adaptation concentrates the probability distribution on the best actions. Memory loss of past history leads to a distribution that is flatter and has higher entropy.

Since the reinforcement determines the agent's interactions with the environment, there are, in fact, three different time scales operating: that for agent-environment interactions, that for each agent's adaptation, and that for changes to the environment. However, if the environment changes very slowly compared to the agent's internal adaptation, the environment r_i(t) can be regarded as effectively constant, as shown in Fig. 3.
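For concreteness, the discrete-time microscopic process behind Eqs. (1)-(3) can be sketched as follows. This is a minimal illustration under the stated assumptions—a static reward vector a (as used later in Eq. (10)) and arbitrary parameter values—not the authors' simulation code.

import numpy as np

def softmax(q, beta):
    # Eq. (2): map memories Q to a choice distribution x
    w = np.exp(beta * (q - q.max()))   # subtract max for numerical stability
    return w / w.sum()

def run_single_agent(a, beta=0.1, alpha=0.05, T=100, steps=200, seed=0):
    """Iterate the memory update, Eq. (1), T times per choice-distribution update, Eq. (2)."""
    rng = np.random.default_rng(seed)
    N = len(a)
    Q = np.zeros(N)                      # reinforcement memory, Q_i(0) = 0
    history = []
    for _ in range(steps):
        x = softmax(Q, beta)             # choice distribution, held fixed during interactions
        for _ in range(T):
            i = rng.choice(N, p=x)       # action drawn from x
            update = -alpha * Q          # memory loss for all actions
            update[i] += a[i]            # reinforcement only for the chosen action
            Q += update / T              # Eq. (1)
        history.append(x)
    return np.array(history)

# Example: three actions with the reinforcements used for Fig. 4 (epsilon = 0.5)
eps = 0.5
a = np.array([2 * eps / 3, -1 - eps / 3, 1 - eps / 3])
traj = run_single_agent(a)
print(traj[-1])   # concentrates on the highest-reward action when alpha is small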

FIG. 3: The time scales of dynamic adaptation (interaction τ, adaptation t, environment t′): Agent adaptation is slow compared to agent-environment interaction, and environmental change is slower still compared to adaptation.

FIG. 4: Dynamics of single-agent adaptation: Here there are three actions, labeled 1, 2, and 3, and the environment gives reinforcements according to a = (2ε/3, −1 − ε/3, 1 − ε/3). The figure shows two trajectories from simulations with ε = 0.5 and β = 0.1 and with α = 0.0 (right) and α = 0.3 (left).

In this case r_i(t) can be approximated as a static relationship between an agent's actions and the reinforcements given by the environment. Let r_i(t) = a_i, where a = (a_1, ..., a_N) are constants that are normalized: Σ_{n=1}^{N} a_n = 0. Given this, the agent's time-average reinforcements are a_i (R_i = a_i) and the continuous-time dynamic simplifies to

  ẋ_i / x_i = β ( a_i − Σ_{n=1}^{N} a_n x_n ) + α (H_i − H̄) ,  (10)

where i = 1, 2, ..., N.

The behavior of single-agent adaptation given by Eq. (10) is very simple. When α is small, so that adaptation is dominant, x_i → 1, where i is the action with the highest reward a_i, and x_j → 0 for j ≠ i. The agent receives this information from the fixed environment and its behavior is simply to choose the action with the maximum reward; the choice distribution moves to the associated simplex vertex (0, ..., x_i = 1, ..., 0). In the special case α = 0, it is known that for arbitrary a Eq. (10) moves x to the vertex corresponding to the maximum a_i [2]. In a complementary way, when α is large enough to overcome the relative differences in reinforcements—that is, when β/α → 0—memory loss dominates, the agent state goes to the uniform choice distribution (x_i = N^{−1}), and the system converges to the simplex center. This balance between local optimization and randomized behavior, which selects nonoptimal actions, is referred to as the exploitation-exploration trade-off [28].

For instance, consider an agent that takes N = 3 actions, {1, 2, 3}, in an environment described by a = (2ε/3, −1 − ε/3, 1 − ε/3), with ε ∈ [−1, 1]. In the perfect memory case (α = 0), the choice distribution converges to a stable fixed point (0, 0, 1); x* = (1/3, 1/3, 1/3) is an unstable hyperbolic fixed point. In the memory loss case (α > 0), the dynamics converges to a stable fixed point inside the simplex. (These cases are illustrated in Fig. 4.)

Even when the environment is time-dependent, the agent's behavior can track the highest-reward action as long as the time scale of environment change is slow compared to the agent's adaptation. However, the situation is more interesting when environment change occurs at a rate near the time scale set by adaptation. Mutual adaptation in agent collectives, the subject of the following sections, corresponds to just this situation. Other agents provide, through their own adaptation, a dynamic environment to any given agent, and if their time scales of adaptation are close the dynamics can be quite rich and difficult to predict and analyze.

B. Two Agent Adaptation

To develop equations of motion for adaptation in an agent collective we initially assume, for simplicity, that there are only two agents. The agents, denoted X and Y, at each moment take one of N or M actions, respectively. The agents' states at time t are x = (x_1, ..., x_N) and y = (y_1, ..., y_M), with Σ_{n=1}^{N} x_n = Σ_{m=1}^{M} y_m = 1. x(0) and y(0) are the initial conditions. We view the time evolution of each agent's state vector in the simplices x ∈ Δ_X and y ∈ Δ_Y and the group dynamics in the collective state space Δ, which is the product of the agent simplices:

  X = (x, y) ∈ Δ = Δ_X × Δ_Y .  (11)

There are again three different time scales to consider: one for agent-agent interaction, one for each agent's internal adaptation, and one for the environment, which now mediates agent interactions via the reinforcements given to the agents. Here we distinguish between the global environment experienced by the agents and the external environment, which is the global environment with the agent states removed. The external environment controls, for example, the degree of coupling between the agents. In contrast with the single-agent case, in the many-agent setting each agent's behavior produces a dynamic environment for the other. This environment dynamics is particularly important when the adaptation time scales of the agents are close.

Following the single-agent case, though, we assume that the adaptation dynamic is very slow compared to that of agent-agent interactions and that the dynamics of the external environment changes very slowly compared to that of the agents' mutual adaptation. Under these assumptions the agent state vectors x and y are effectively constant during the agent-agent interactions that occur between adaptation updates. The immediate consequence is that one can describe the collective state space in terms of the frequencies of actions (the choice distributions). Additionally, the environment is essentially constant relative to changes in the states x and y.

Denote the agents' memories by Q^X = (Q^X_1, ..., Q^X_N) for X and Q^Y = (Q^Y_1, ..., Q^Y_M) for Y, and set Q^X_i(0) = 0 and Q^Y_j(0) = 0, for i = 1, ..., N and j = 1, ..., M. For the dynamic governing memory updates we have

  Q̇^X_i = R^X_i − α_X Q^X_i ,
  Q̇^Y_j = R^Y_j − α_Y Q^Y_j ,  (12)

for i = 1, 2, ..., N and j = 1, 2, ..., M. R^X_i is the reward for agent X choosing action i, averaged over agent Y's actions between adaptive updates; R^Y_j is Y's. The parameters α_X, α_Y ∈ [0, 1) control each agent's memory loss rate, respectively.

The map from Q^X(t) to x(t) and from Q^Y(t) to y(t) at time t is

  x_i(t) = e^{β_X Q^X_i(t)} / Σ_{k=1}^{N} e^{β_X Q^X_k(t)} ,
  y_j(t) = e^{β_Y Q^Y_j(t)} / Σ_{k=1}^{M} e^{β_Y Q^Y_k(t)} ,  (13)

for i = 1, ..., N and j = 1, ..., M. Here β_X, β_Y ∈ [0, ∞] control the agents' adaptation rates, respectively. Differentiating Eq. (13) with respect to t, the continuous-time adaptation for two agents is governed by

  ẋ_i = β_X x_i ( Q̇^X_i − Σ_{k=1}^{N} Q̇^X_k x_k ) ,
  ẏ_j = β_Y y_j ( Q̇^Y_j − Σ_{k=1}^{M} Q̇^Y_k y_k ) ,  (14)

for i = 1, ..., N and j = 1, ..., M.

Putting together Eqs. (12), (13), and (14), one finds the coupled adaptation equations for two agents:

  ẋ_i / x_i = β_X (R^X_i − R̄^X) + α_X (H^X_i − H̄^X) ,
  ẏ_j / y_j = β_Y (R^Y_j − R̄^Y) + α_Y (H^Y_j − H̄^Y) ,  (15)

for i = 1, ..., N and j = 1, ..., M, and where

  R̄^X = Σ_{k=1}^{N} x_k R^X_k ,  R̄^Y = Σ_{k=1}^{M} y_k R^Y_k ,
  H̄^X = Σ_{k=1}^{N} x_k H^X_k ,  H̄^Y = Σ_{k=1}^{M} y_k H^Y_k .  (16)

The interpretations of the ΔR_i = R_i − R̄ and ΔH_i = H_i − H̄ terms are not essentially different from those introduced to describe the single-agent case. That is, the behavior of each agent is a dynamic balance between (i) adaptation: concentrating the choice probability on the best action at time t, and (ii) memory loss: increasing the choice uncertainty. What is new here is that there are two (and eventually more) agents attempting to achieve this balance together using information that comes from their interactions.

As given, the adaptation equations include the possibility of a time-dependent environment, which would be implemented (say) using a time-dependent reinforcement scheme. However, as with the single-agent case, it is helpful to simplify the model by assuming a static external environment and, in particular, static relationships between the agents.

Assume that the external environment changes slowly compared to the dynamics of mutual adaptation, as illustrated in Fig. 3. This implies a (near) static relationship between pairs of action choices (i, j) and reinforcements r^X_{ij} and r^Y_{ji} for both agents. Since the environmental dynamics is very slow compared to each agent's adaptation, r^X_{ij}(t) and r^Y_{ji}(t) are essentially constant during adaptation. The r's can then be approximated as constant:

  r^X_{ij}(t) = a_{ij} ,
  r^Y_{ji}(t) = b_{ji} ,  (17)

for i = 1, ..., N and j = 1, ..., M. The a_{ij} and b_{ji} are normalized so that, summing over all actions, the reinforcements vanish:

  Σ_{n=1}^{N} a_{nj} = 0 ,
  Σ_{m=1}^{M} b_{mi} = 0 .  (18)

Given the form of ΔR in the adaptation equations, this normalization does not affect the dynamics.

Assume further that x and y are independently distributed. This is equivalent to agents never having a global view of the collective or of their interactions with the environment (other agents). Each agent's knowledge of the environment is uncorrelated, at each moment, with the state of the other agents. The time-average rewards for X and Y now become

  R^X_i = Σ_{m=1}^{M} a_{im} y_m = (Ay)_i ,
  R^Y_j = Σ_{n=1}^{N} b_{jn} x_n = (Bx)_j ,  (19)

for i = 1, ..., N and j = 1, ..., M. In this restricted case, the continuous-time dynamic is given by the coupled adaptation equations

  ẋ_i / x_i = β_X [ (Ay)_i − x·Ay ] + α_X [ −log x_i + Σ_{k=1}^{N} x_k log x_k ] ,
  ẏ_j / y_j = β_Y [ (Bx)_j − y·Bx ] + α_Y [ −log y_j + Σ_{k=1}^{M} y_k log y_k ] ,  (20)

for i = 1, ..., N and j = 1, ..., M. A is an N × M matrix and B is an M × N matrix, with (A)_{ij} = a_{ij} and (B)_{ji} = b_{ji}, respectively. x·Ay is the inner product between x and Ay: x·Ay = Σ_{n}^{N} Σ_{m}^{M} a_{nm} x_n y_m.

C. Collective Adaptation

Generalizing to an arbitrary number of agents at this point should appear straightforward. It simply requires extending Eqs. (15) to a collection of adaptive agents. Suppose there are S agents labeled s = 1, 2, ..., S and each agent can take one of N^s actions; then we have

  ẋ^s_i / x^s_i = β_s (R^s_i − R̄^s) + α_s (H^s_i − H̄^s) ,  (21)

for i = 1, ..., N^s and s = 1, ..., S. Equations (21) constitute our general model for adaptation in agent collectives. One describes the time evolution of the agents' state vectors in the simplices x^1 ∈ Δ_1, x^2 ∈ Δ_2, ..., x^S ∈ Δ_S. The adaptation dynamics in the higher-dimensional collective state space occurs within

  X = (x^1, x^2, ..., x^S) ∈ Δ = Δ_1 × Δ_2 × ... × Δ_S .  (22)

Note that the general model includes heterogeneous network settings with local interactions; see App. E.

With three agents X, Y, and Z, one obtains, for example:

  ẋ_i / x_i = β_X (R^X_i − R̄^X) + α_X [H^X_i − H̄^X] ,
  ẏ_j / y_j = β_Y (R^Y_j − R̄^Y) + α_Y [H^Y_j − H̄^Y] ,
  ż_k / z_k = β_Z (R^Z_k − R̄^Z) + α_Z [H^Z_k − H̄^Z] ,  (23)

for i = 1, ..., N, j = 1, ..., M, and k = 1, ..., L. The static environment version reduces to

  ẋ_i / x_i = β_X [ (Ayz)_i − x·Ayz ] + α_X [ −log x_i + Σ_{n=1}^{N} x_n log x_n ] ,
  ẏ_j / y_j = β_Y [ (Bzx)_j − y·Bzx ] + α_Y [ −log y_j + Σ_{m=1}^{M} y_m log y_m ] ,
  ż_k / z_k = β_Z [ (Cxy)_k − z·Cxy ] + α_Z [ −log z_k + Σ_{l=1}^{L} z_l log z_l ] ,  (24)

for i = 1, ..., N, j = 1, ..., M, and k = 1, ..., L, and with tensors (A)_{ijk} = a_{ijk}, (B)_{jki} = b_{jki}, (C)_{kij} = c_{kij}. Here (Ayz)_i = Σ_{m}^{M} Σ_{l}^{L} a_{iml} y_m z_l and x·Ayz = Σ_{n}^{N} Σ_{m}^{M} Σ_{l}^{L} a_{nml} x_n y_m z_l, and similarly for Y and Z.

D. Evolutionary Dynamics and Game Theory

We now interrupt the development to discuss the connections between the model developed thus far and models from population dynamics and game theory. There are interesting connections and also some important distinctions that need to be kept in mind before we can move forward.

The special case that allows us to make contact with evolutionary dynamics and game theory is the restriction to agents with perfect memory interacting in a static environment. (For further details see App. B.) In the two-agent case we set α_X = α_Y = 0 and equal adaptation rates, β_X = β_Y. Under these assumptions our model, Eqs. (20), reduces to what are called either multipopulation replicator equations [14] or asymmetric game dynamics [10, 14]. The equations are:

  ẋ_i / x_i = (Ay)_i − x·Ay ,
  ẏ_j / y_j = (Bx)_j − y·Bx .  (25)

From the perspective of game theory, one regards the interactions determined by A and B as X's and Y's payoff matrices for a linear game in which X plays action i against Y's action j. Additionally, x and y, the agent state vectors, are interpreted as the mixed strategies. In fact, x·Ay and y·Bx in Eqs. (25) formally satisfy von Neumann-Morgenstern utilities [12]. If they exist in the interior of the collective simplices Δ_X and Δ_Y, interior Nash equilibria of the game (A, B) are the fixed points determined by the intersections of the x- and y-nullclines of Eqs. (25).

One must be careful, though, in drawing parallels between our general dynamic setting and classical game theory. For idealized economic agents, it is often assumed that agents have knowledge of the entire game structure and of other agents' decision-making processes. The central concern there is to derive how these rational players should act. Our adaptive agents, in contrast, have no knowledge of a game in which they might be playing, only a myopic model of the environment and, even then, this is given only implicitly via the reinforcements the agents receive from the environment. In particular, the agents do not know whether they are playing a game or not, how many agents there are beyond themselves, or even whether other agents exist or not. Our model of dynamic adaptation under such constraints is nonetheless appropriate for many real-world learning systems, whether animal, human, or economic agent collectives [29]. The bi-matrix game (A, B) appears above as a description of the collective's global dynamic only under the assumption that the external environment changes very slowly.

The connection with evolutionary dynamics is formal and comes from the fact that Eqs. (25) are the well known replicator equations of population dynamics [2]. However, the interpretation of the variables is rather different. Population dynamics views x and y as two separate, but interacting, (infinite size) groups. These two populations are described as distributions of various organismal phenotypes. The equations of motion determine the evolution of these populations over time and through interaction. In our model, in contrast, x and y represent the probabilities of agents choosing actions. The equations of motion describe their dynamic adaptation to each other through interaction.

Despite the similarities that one can draw in this special case, it is important to emphasize that our framework goes beyond the multipopulation replicator equations and asymmetric game dynamics. First, the reinforcement scheme R need not lead to linear interactions. Second, the model does not require a static environment described by a constant bi-matrix (A, B). Finally, the occurrence of the memory loss term is entirely new and not found in game theory or evolutionary dynamics.

III. INFORMATION, UNCERTAINTY, AND DYNAMIC ADAPTATION

We now shift away from a dynamical-systems view and, as promised earlier, begin to think of the agent collective as a communication system. Although this will initially appear unrelated, we will show that there is a close connection between the dynamical and information-theoretic perspectives—connections that have both mathematical and pragmatic consequences.

We consider the adaptive agents in the collective to be information sources. Each agent receives information from its environment, which includes the other agents. Each agent interprets the received information and modifies its behavior accordingly, changing from x(t) to x(t + dt). Each agent generates a series of messages (actions) based on its updated internal model and introduces this new behavior back into the environment. This is a different interpretation of the interaction process in the collective, which we motivated up to now only as a dynamical process. Now we discuss the adaptive dynamics from an information-theoretic viewpoint.

A. Dynamics in Information Space

In this section we introduce a new state space that directly represents the uncertainties of agent actions. First, as before, for clarity we focus on the two-agent static-environment case, Eqs. (20). Since the components of the agents' states are probabilities, the quantities

  ξ_i = −log x_i ,
  η_j = −log y_j ,  (26)

are the self-informations of agents X and Y choosing actions i and j, respectively. When x_i is small, for example, the self-information ξ_i is large since action i is rarely chosen by agent X. Consider the resulting change in coordinates in R_+^N × R_+^M:

  Ξ = (ξ, η) = (ξ_1, ..., ξ_N) × (η_1, ..., η_M) .  (27)

The normalization conditions—Σ_{n=1}^{N} x_n = Σ_{m=1}^{M} y_m = 1—that restrict the agent states to lie in simplices become Σ_{n=1}^{N} e^{−ξ_n} = Σ_{m=1}^{M} e^{−η_m} = 1 in Ξ.

In this space the equations of motion become:

  ξ̇_i = −β_X [ (A e^{−η})_i − e^{−ξ}·A e^{−η} ] − α_X [ ξ_i − e^{−ξ}·ξ ] ,
  η̇_j = −β_Y [ (B e^{−ξ})_j − e^{−η}·B e^{−ξ} ] − α_Y [ η_j − e^{−η}·η ] ,  (28)

for i = 1, ..., N and j = 1, ..., M, and where e^{−ξ} = (e^{−ξ_1}, ..., e^{−ξ_N}) and e^{−η} = (e^{−η_1}, ..., e^{−η_M}).

Recall that both the ΔR interaction term and the ΔH memory loss term are differences from means. This suggests yet another transformation to remove these comparisons to the mean:

  u_i = ξ_i − N^{−1} Σ_{k=1}^{N} ξ_k ,
  v_j = η_j − M^{−1} Σ_{k=1}^{M} η_k ,  (29)

with i = 1, ..., N and j = 1, ..., M. This leads to the normalized space in R^N × R^M:

  U = (u, v) = (u_1, ..., u_N) × (v_1, ..., v_M) ,  (30)

with the constraints Σ_{n=1}^{N} u_n = Σ_{m=1}^{M} v_m = 0. u and v are the normalized self-informations relative to their means. We refer to this space as information space.

The combined coordinate transformation, Eq. (29) composed with Eq. (26), gives the well known centered log-ratio coordinates [30]. The inverse transformation is:

  x_i = e^{−u_i} / Σ_{k}^{N} e^{−u_k} ,
  y_j = e^{−v_j} / Σ_{k}^{M} e^{−v_k} .  (31)

The resulting transformed adaptation equations directly model the dynamics of the uncertainties of agents' behavior:

  u̇_i = −β_X [ (Ay)_i − N^{−1} Σ_{k} (Ay)_k ] − α_X u_i ,
  v̇_j = −β_Y [ (Bx)_j − M^{−1} Σ_{k} (Bx)_k ] − α_Y v_j .  (32)

When the interaction matrices are normalized to zero mean, the equations simplify even further to

  u̇ = −β_X A y − α_X u ,
  v̇ = −β_Y B x − α_Y v .  (33)

The origin O = (0, 0, ..., 0) of the normalized information space U corresponds to random behavior: (x, y) = (1/N, ..., 1/N, 1/M, ..., 1/M). The Shannon entropy of the choice distribution is maximized at this point. In contrast, when agents choose an action with probability 1 the entropy vanishes and the agent state is located in Δ at the simplex vertices and in U at infinity.
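The coordinate changes in Eqs. (26)-(31) amount to the centered log-ratio transform and its inverse. A minimal sketch, with hypothetical helper names not taken from the paper:

import numpy as np

def to_information_space(x):
    """Eqs. (26) and (29): simplex point -> centered self-information coordinates u."""
    xi = -np.log(x)            # self-informations, Eq. (26)
    return xi - xi.mean()      # subtract the mean, Eq. (29)

def from_information_space(u):
    """Eq. (31): inverse map back onto the simplex."""
    w = np.exp(-u)
    return w / w.sum()

x = np.array([0.2, 0.3, 0.5])
u = to_information_space(x)
print(np.allclose(from_information_space(u), x))   # True: the maps are mutually inverse
print(np.isclose(u.sum(), 0.0))                     # coordinates sum to zero, Eq. (30)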

In Eqs. (33) the first term is related to information influx to an agent from outside, i.e., from other agents and the environment. The second term is related to the information dissipation due to internal memory loss. Eqs. (33) are useful for theory, for analysis in certain limits, as we will shortly demonstrate, and for numerical stability during simulation, which we will illustrate when considering example collectives below. Note that Eqs. (20), Eqs. (28), and Eqs. (32) are topologically orbit equivalent.

FIG. 5: Dynamics of zero-sum interaction without memory loss: The constant of motion E = β_X^{−1} D(x*‖x) + β_Y^{−1} D(y*‖y) keeps the linear sum of distances between the interior Nash equilibrium and each agent's state.

B. Self-organization Induced by Dynamics of Uncertainty

Equations (32) describe a dynamics of uncertainty between deterministic and random behavior. Information influx occurs when the agents adapt to environmental constraints and accordingly change their choice distributions. Information dissipation occurs when memory loss dominates and the agents increase their uncertainty to behave more randomly, with less regard to the environmental constraints. The dissipation rate γ of the dynamics in U is controlled entirely by the memory loss rate α:

  γ = Σ_{k=1}^{N} ∂u̇_k/∂u_k + Σ_{k=1}^{M} ∂v̇_k/∂v_k = −N α_X − M α_Y .  (34)

Therefore, Eqs. (33) are volume preserving in U when α_X = α_Y = 0.

In the case that agents behave without memory loss (α_X = α_Y = 0), if the interaction specified by (A, B) is zero-sum, B = −A^T, and if, in addition, it determines an interior Nash equilibrium (x*, y*) (see App. B), then the collective has a constant of motion:

  E = β_X^{−1} D(x*‖x) + β_Y^{−1} D(y*‖y) ,  (35)

where D(p‖q) = Σ_k p_k log(p_k/q_k) is the relative entropy or information gain, which measures the similarity between probability distributions p and q [31]. (Appendix C gives the derivation of Eq. (35).) Since the constant of motion E is a linear sum of relative entropies, the collective maintains the information-theoretic distance between the interior Nash equilibrium and each agent's state. Thus, in the perfect memory case (α = 0), by the inequality D(p‖q) ≥ 0, the interior Nash equilibrium cannot be reached unless the initial condition itself starts on it (Fig. 5). This is an information-theoretic interpretation of the constant of motion noted in Ref. [32].

Moreover, when N = M the dynamics has a symplectic structure in U with the Hamiltonian E given in Eq. (35) [32]. In this case, Eqs. (32) are described quite simply,

  U̇ = J ∇_U E ,  with  J = [ O  P ; −P^T  O ] ,  (36)

where P = β_X β_Y A. (Again, see App. C.)

When the bi-matrix interaction (A, B) satisfies B = A^T, E is a Lyapunov function of the dynamics and decreases to 0 over time [2]. In this case, (x*, y*) may not be in the interior of the collective simplices Δ. In some cases when neither B = −A^T nor B = A^T, E increases non-monotonically, the dynamics in U diverges, and the Shannon entropies of the agents' choice distributions asymptotically decrease. (See Figs. 17 and 19 below.)

Note that in single-agent adaptation with state x, normalizing the environment's reinforcements to a probability distribution p_e, D(p_e‖x) is a Lyapunov function of the dynamics. It always decreases monotonically. In mutual adaptation, though, agents adapt to a dynamic environment that includes the other agents. As a result, in some cases E, a linear sum of agent relative entropies, will itself exhibit nontrivial dynamics and, in addition, the uncertainties of the agents' choices will asymptotically decrease.

When agents adapt with memory loss (α > 0), the dynamics is dissipative. Since the memory loss terms induce information dissipation, the dynamics remains well inside the simplices. Notably, when the agents attempt to achieve this balance together by interacting and, in particular, when the interaction has nontransitive structure, the dynamics can persistently wander in a bounded area in information space. Since, in some cases, mutual adaptation and memory loss produce successive stretching and folding, deterministic chaos can occur with decreasing α, even with only two agents. A schematic view of the flow in mutual adaptation is given in Fig. 6. Roughly, information space locally splits into subspaces governed by the effects of mutual adaptation (information influx) and memory loss (information dissipation). Information influx and information dissipation correspond to unstable and stable flow directions. In this case, "weak" uncertainty of behavior plays an important role in organizing the collective's behavior. Small fluctuations in behavior are amplified through repeated mutual adaptation with competitive interactions.

FIG. 6: Horseshoe in mutual adaptation: The effects of mutual adaptation and memory loss produce unstable and stable directions. The nontransitive structure of interactions leads to state-space folding.

Summarizing, in single-agent adaptation, information flows unidirectionally from the environment to the agent and the agent adapts its behavior to the environmental constraints. Adaptation leads to D(p_e‖x) → 0. For mutual adaptation in an agent collective, however, information flow is multidirectional, since each agent obtains information from its environment and organizes its behavior based on that information. In this situation E need not be a Lyapunov function for the dynamics. As we will see, when the dynamics is chaotic, global information maximization is of doubtful utility. A dynamic view of collective adaptation is more appropriate in these cases.

Now consider many agents interacting. In the perfect memory case, when the game is zero-sum and has an interior Nash equilibrium (x^{1*}, x^{2*}, ..., x^{S*}), following Eq. (35), the following constant of motion exists:

  E = Σ_{s=1}^{S} (1/β_s) D(x^{s*} ‖ x^s) = Σ_{s=1}^{S} (1/β_s) Σ_k x^{s*}_k log( x^{s*}_k / x^s_k ) .  (37)

Although, strictly speaking, Hamiltonian dynamics and the associated symplectic structure of information space occur only for two agents, one can describe multiple-agent dynamics as a generalized Hamiltonian system [33]. In the general case with α > 0, dissipative dynamics and high-dimensional chaotic flows can give rise to several unstable directions, since information influx has a network structure relative to the other agents. At least S stable directions are expected since memory loss comes from each individual's internal dynamics. A detailed information-theoretic analysis along these lines will be reported elsewhere.

IV. EXAMPLES

To illustrate collective adaptation, we now give several examples of the dynamics in a static environment with two and three agents interacting via versions of Matching Pennies and Rock-Scissors-Paper, games with non-transitive structures. Appendix D gives the details of the reinforcement schemes for these cases. The agents will have equal adaptation rates (β_X = β_Y = ···) and the same number of actions (N = M = L = ···). In these simplified cases, the equations of motion for two agents are given by

  ẋ_i / x_i = [ (Ay)_i − x·Ay ] + α_X [H_i − H̄^X] ,
  ẏ_j / y_j = [ (Bx)_j − y·Bx ] + α_Y [H_j − H̄^Y] ,  (38)

for i = 1, ..., N and j = 1, ..., N. A detailed analysis of this case with zero memory loss (α = 0) is given in Ref. [2] in terms of asymmetric game dynamics. We will present results for zero and positive memory loss rates.

We then consider three agents, for which the adaptation equations are

  ẋ_i / x_i = [ (Ayz)_i − x·Ayz ] + α_X [H_i − H̄^X] ,
  ẏ_j / y_j = [ (Bzx)_j − y·Bzx ] + α_Y [H_j − H̄^Y] ,
  ż_k / z_k = [ (Cxy)_k − z·Cxy ] + α_Z [H_k − H̄^Z] ,  (39)

for i, j, k = 1, ..., N. We again will describe cases with and without memory loss.

Computer simulations are executed in the information space U and the results are shown in the state space X. We ignore the dynamics on the boundary of the simplex and concentrate on the case that all variables are greater than 0 and less than 1.

A. Two Agents Adapting under Matching Pennies Interaction

In the Matching Pennies game, agents play one of two actions: heads (H) or tails (T). Agent X wins when the plays do not agree; agent Y wins when they do. Agent X's state space is Δ_X = (x_1, x_2) with x_i ∈ (0, 1) and x_1 + x_2 = 1. That is, x_1 is the probability that agent X plays heads; x_2, tails. Agent Y is described similarly. Thus, each agent's state space is effectively one dimensional and the collective state space Δ = Δ_X × Δ_Y is two dimensional.

The environment for two agents interacting via the Matching Pennies game leads to the following matrices for Eqs. (38):

  A = [ ε_X  −ε_X ; −ε_X  ε_X ]  and  B = [ ε_Y  −ε_Y ; −ε_Y  ε_Y ] ,  (40)

where ε_X ∈ (0.0, 1.0] and −ε_Y ∈ (0.0, 1.0].

Figure 7 shows a heteroclinic cycle of the adaptation dynamics on the boundary of Δ when the αs vanish. Flows on the border occur only when agents completely ignore an action at the initial state; that is, when x_i(0) = 0 or y_j(0) = 0 for at least one i or j. Each vertex of the simplex is a saddle since the interaction is non-transitive.
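As the text notes, the simulations are run in the information space U and displayed in X. A minimal sketch of that workflow, using Eqs. (33) with the Matching Pennies matrices of Eq. (40) and mapping back to the simplex via Eq. (31); the initial condition and integrator settings are arbitrary choices, not the paper's:

import numpy as np
from scipy.integrate import solve_ivp

def simplex(u):
    """Eq. (31): information-space coordinates -> choice distribution."""
    w = np.exp(-u)
    return w / w.sum()

def info_flow(t, s, A, B, betaX, betaY, alphaX, alphaY):
    """Eqs. (33) for zero-mean A, B; state s = (u, v)."""
    N = A.shape[0]
    u, v = s[:N], s[N:]
    x, y = simplex(u), simplex(v)
    du = -betaX * (A @ y) - alphaX * u
    dv = -betaY * (B @ x) - alphaY * v
    return np.concatenate([du, dv])

epsX, epsY = 0.5, -0.3                      # the values used for Fig. 8
A = np.array([[ epsX, -epsX], [-epsX,  epsX]])
B = np.array([[ epsY, -epsY], [-epsY,  epsY]])
s0 = np.array([0.4, -0.4, -0.8, 0.8])        # zero-sum coordinates in U
sol = solve_ivp(info_flow, (0, 500), s0, args=(A, B, 1, 1, 0.02, 0.01),
                rtol=1e-9, atol=1e-12)
print(simplex(sol.y[:2, -1]), simplex(sol.y[2:, -1]))  # both near (1/2, 1/2)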

FIG. 7: Flows on the boundary in the Matching Pennies interaction: Actions H and T correspond to "heads" and "tails", respectively. Arrows indicate the direction of the adaptation dynamics on the boundary of the state space Δ.

The Nash equilibrium (x*, y*) of the Matching Pennies game is in the center of Δ: (x*, y*) = (1/2, 1/2, 1/2, 1/2), and this is also a fixed point of the adaptation dynamics. The Jacobian at (x*, y*) is

  J = [ −α_X (1 + log 2)/2   ε_X/2 ; −ε_Y/2   −α_Y (1 + log 2)/2 ] ,  (41)

and its eigenvalues are

  4λ_i = −(α_X + α_Y)(1 + log 2) ± (1 + log 2) √[ (α_X − α_Y)^2 + 4 ε_X ε_Y / (1 + log 2)^2 ] .  (42)

In the perfect memory case (α_X = α_Y = 0), trajectories near (x*, y*) are neutrally stable periodic orbits, since λ_i = ±(1/2)√(ε_X ε_Y) are pure imaginary. In the memory loss case (α_X > 0 and α_Y > 0), (x*, y*) is globally asymptotically stable, since Re(λ_1) and Re(λ_2) are strictly negative. Examples of the trajectories in these two cases are given in Figure 8.

FIG. 8: Adaptation dynamics in the Matching Pennies interaction: Here ε_X = 0.5 and ε_Y = −0.3 with (left) α_X = α_Y = 0 and (right) α_X = 0.02 and α_Y = 0.01.

B. Three Agents Adapting with Even-Odd Interaction

Now consider extending Matching Pennies for two agents so that it determines the interactions between three. Here we introduce the Even-Odd interaction, in which there are again two actions, H and T, but agents win according to whether or not the number of heads in the group of three plays by the agents is even or odd. The environment now is given by, for agent X,

  a^X_{ijk} = { ε_X , number of Hs is even ; −ε_X , otherwise } ,  (43)

with actions for agents X, Y, and Z given by i, j, k = {H, T} and ε_X ∈ (0.0, 1.0]. The interaction tensors b_{jki} and c_{kij} for agents Y and Z, respectively, are given similarly, but with ε_Y ∈ (0.0, 1.0] and ε_Z ∈ [−1.0, 0.0). Appendix D gives the details of the reinforcement scheme.

Following the reasoning used in Matching Pennies, the collective state space Δ = Δ_X × Δ_Y × Δ_Z is now a solid three-dimensional cube. Figure 9 shows the heteroclinic network of adaptation dynamics on the boundary of Δ when the αs vanish. Δ is partitioned into four prism-shaped subspaces. Each prism subspace has a heteroclinic cycle on the face that is also a face of Δ.

FIG. 9: Flows on the state-space boundary under the Even-Odd interaction: H and T correspond to "heads" and "tails", respectively. Arrows indicate the direction of the adaptation dynamics on Δ's boundary when the αs vanish.

The Nash equilibrium of the Even-Odd interaction is (x*, y*, z*) = (1/2, 1/2, 1/2, 1/2, 1/2, 1/2) at the center of Δ, and this is also a fixed point of the adaptation dynamics. The Jacobian there is

  J = [ −α_X  0  0 ; 0  −α_Y  0 ; 0  0  −α_Z ] .  (44)

Its eigenvalues are λ = −α_X, −α_Y, −α_Z. Thus, in the complete memory case (α_X = α_Y = α_Z = 0), trajectories near (x*, y*, z*) are neutrally stable periodic orbits. With memory decay (α_X, α_Y, α_Z > 0), (x*, y*, z*) is globally asymptotically stable. The hyperbolic fixed points in the top and bottom faces are unstable in all cases. Examples of the trajectories are given in Figure 10.

Notably, when a single agent (say, Z) has memory loss and the others have perfect memory, the crossed lines given by {z = x = 0.5} and {z = y = 0.5} become an invariant subspace and trajectories are attracted to points in this subspace. Thus, there are infinitely many neutrally stable points. With α_X = α_Y = 0 and α_Z = 0.01, for example, the adaptive dynamics alternates between a Matching Pennies interaction between agents X and Z and one between agents Y and Z during the transient relaxation to a point on the invariant subspace.

FIG. 10: Dynamics of adaptation in the Even-Odd interaction: ε_X = 0.5, ε_Y = 0.2, and ε_Z = −0.3 with α_X = α_Y = α_Z = 0 (left) and with α_X = α_Y = 0 and α_Z = 0.01 (right). Note that the neutral subspace is shown as the horizontal cross. The trajectory chosen illustrates the attraction to a point in this subspace.

C. Two Agents Adapting under Rock-Scissors-Paper Interactions

In this subsection, we give an example of an environment in which agents have three actions. One of the most commonly studied games with three actions is the Rock-Scissors-Paper (RSP) game, in which an agent playing Rock beats one playing Scissors, which in turn beats an agent playing Paper, which finally beats Rock.

First we examine two agents, which is a straightforward implementation of the RSP game, and then extend the RSP interaction to three agents and analyze the higher-dimensional behavior. The interaction matrices for these cases are given in App. D.

Under the RSP interaction each agent has the option of playing one of three actions: "rock" (R), "scissors" (S), and "paper" (P). Agent X's probabilities of playing these are denoted x_1, x_2, and x_3, with x_1 + x_2 + x_3 = 1. Agent Y's probabilities are given similarly. Thus, the agent state spaces, Δ_X and Δ_Y, are each two-dimensional simplices, and the collective state space Δ = Δ_X × Δ_Y is four dimensional.

For two agents the environment is given by the interaction matrices

  A = [ ε_X  1  −1 ; −1  ε_X  1 ; 1  −1  ε_X ]  and  B = [ ε_Y  1  −1 ; −1  ε_Y  1 ; 1  −1  ε_Y ] ,  (45)

where ε_X, ε_Y ∈ [−1.0, 1.0] are the rewards for ties, and normalized to

  A = [ 2ε_X/3  1 − ε_X/3  −1 − ε_X/3 ; −1 − ε_X/3  2ε_X/3  1 − ε_X/3 ; 1 − ε_X/3  −1 − ε_X/3  2ε_X/3 ]  (46)

and

  B = [ 2ε_Y/3  1 − ε_Y/3  −1 − ε_Y/3 ; −1 − ε_Y/3  2ε_Y/3  1 − ε_Y/3 ; 1 − ε_Y/3  −1 − ε_Y/3  2ε_Y/3 ] .  (47)

Note that the reinforcements are normalized to zero mean and that this does not affect the dynamics.

The flow on Δ's boundary is shown in Fig. 11. This represents the heteroclinic network of adaptation dynamics on Δ's edges when the αs vanish. Each vertex is a saddle since the interaction has non-transitive structure.

FIG. 11: Flows on the boundary of the simplex in the Rock-Scissors-Paper interaction for two agents: R, S, and P denote "rock", "scissors", and "paper", respectively. The arrows indicate the direction of the adaptation dynamics on the boundary of the collective state space Δ when the αs vanish.

The Nash equilibrium (x*, y*) is given by the centers of the simplices:

  (x*, y*) = (1/3, 1/3, 1/3, 1/3, 1/3, 1/3) .  (48)

This is also a fixed point of the adaptation dynamics. The Jacobian there is

  J = [ −α_X  0  (1 + ε_X)/3  2/3 ;
        0  −α_X  −2/3  (−1 + ε_X)/3 ;
        (1 + ε_Y)/3  2/3  −α_Y  0 ;
        −2/3  (−1 + ε_Y)/3  0  −α_Y ] .  (49)

Its eigenvalues are

  2λ_i = −(α_X + α_Y) ± √{ (α_X − α_Y)^2 + 4 [ ε_X ε_Y − 3 ± √(−3) (ε_X + ε_Y) ] / 9 } .  (50)

Thus, when (A, B) is zero-sum (ε_X + ε_Y = 0) and the agents have complete memory (α_X = α_Y = 0), trajectories near (x*, y*) are neutrally stable periodic orbits since all λ's are pure imaginary. The dynamics is Hamiltonian in this case. With memory decay (α_X, α_Y > 0) and |α_X − α_Y|^2 < (4/9)(ε_X^2 + 3), (x*, y*) is globally asymptotically stable.

For the nonzero-sum case, we will give examples of dynamics with ε_X = 0.5, ε_Y = −0.3, α_Y = 0.01.

∗ ∗ (S, S) case, when αX > αc, (x , y ) is globally asymptotically stable. At the point αc 0.055008938, period-doubling ∆x X ∆y bifurcation occurs. The∼ example of two agents adapt- (S, R) S S (P, S) (S, P) (R, S) ing in the Rock-Scissors-Paper interaction adaptation dy- ∆ ∆ namics illustrates various types of low-dimensional chaos. x y

We now explore several cases. (P, R) (R, R) PR PR

1. Hamiltonian Limit

When the agent memories are perfect (α_X = α_Y = 0) and the game is zero-sum (ε_X = −ε_Y), the dynamics in the information space U is Hamiltonian with a function consisting of relative entropies, E = D(x*‖x) + D(y*‖y).

The left columns of Figs. 12 and 13 give trajectories in the collective state space ∆, while the middle and right columns show these trajectories projected onto the individual agent simplices ∆_X and ∆_Y. The trajectories were generated using a fourth-order symplectic integrator [34] in U.

FIG. 12: Quasiperiodic tori: Collective dynamics in ∆ (left column) and individual dynamics projected onto ∆_X and ∆_Y, respectively (right two columns). Here ε_X = −ε_Y = 0.0 and α_X = α_Y = 0. The initial condition is (A): (x, y) = (0.26, 0.113333, 0.626667, 0.165, 0.772549, 0.062451) for the top row and (B): (x, y) = (0.05, 0.35, 0.6, 0.1, 0.2, 0.7) for the bottom row. The constant of motion (Hamiltonian) is E = 0.74446808 ≡ E_0. The Poincaré section used for Fig. 14 is given by x_1 = x_2 and y_1 < y_2 and is indicated here as the straight diagonal line in agent X's simplex ∆_X.

FIG. 13: Quasiperiodic tori and chaos: Collective dynamics in ∆ (left column) and individual dynamics projected onto ∆_X and ∆_Y, respectively (right two columns). Here ε_X = −ε_Y = 0.5 and α_X = α_Y = 0. The initial conditions are the same as in Fig. 12, (A) for the top row and (B) for the bottom row. The constant of motion is also the same: E = E_0. The Poincaré section is given by 3x_1 − x_2 − 2/3 = 0 and y_1 − 3y_2 + 2/3 < 0 and is indicated as a straight line in ∆_X.

When ε_X = −ε_Y = 0.0, the dynamics appears to be integrable, since only quasiperiodic tori exist for all initial conditions. For some initial conditions the tori are knotted, forming trefoils. When ε_X = −ε_Y > 0.0, in contrast, Hamiltonian chaos occurs, with positive-negative pairs of Lyapunov exponents. (See Table I.) The game-theoretic behavior of this example was investigated briefly in Ref. [16]. The dynamics is very rich. For example, there are infinitely many distinct behaviors near the fixed point at the center (the interior Nash equilibrium) and a periodic orbit arbitrarily close to any chaotic one.

A more detailed view of the complex dynamics is given in Fig. 14, which shows Poincaré sections of the trajectories of Eqs. (38). The Poincaré section is given by u̇_3 > 0 and v̇_3 = 0. In (x, y) space the section is determined by the constraints

  (1 − ε_X) y_1 − (1 + ε_X) y_2 + (2/3) ε_X < 0 ,
  (1 − ε_Y) x_1 − (1 + ε_Y) x_2 + (2/3) ε_Y = 0 .    (51)

These sections are indicated as the straight lines drawn in the ∆_X simplices of Figs. 12 and 13. In Fig. 14, when ε_X = −ε_Y = 0.0, closed loops that depend on the initial condition reveal tori in the Poincaré section. When ε_X = −ε_Y = 0.5, some tori collapse and become chaotic. The scatter of dots among the remaining closed loops is characteristic of Hamiltonian chaos.

Table I shows Lyapunov spectra in U for the dynamics with ε_X = −ε_Y = 0.0 and ε_X = −ε_Y = 0.5, with initial conditions (x(0), y(0)) = (x_1, 0.35, 0.65 − x_1, 0.1, y_2, 0.9 − y_2) and E = E_0 = 0.74446808 fixed. Here (x_1, y_2) satisfies

  e^{−3(E_0 + 2 log 3)} / 0.035 = x_1 (0.65 − x_1) y_2 (0.9 − y_2) .    (52)

When x_1(0) = 0.05, the initial condition is (B): (x, y) = (0.05, 0.35, 0.6, 0.1, 0.2, 0.7), which we used in the preceding examples.
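The constant of motion is easy to check numerically. The following is a minimal sketch, not the code behind Figs. 12-14 (which used a symplectic integrator in U): it assumes the perfect-memory adaptation equations in the replicator-like form d(log x_i)/dt = (Ay)_i − x·Ay quoted in Appendix B, with β_X = β_Y = 1 and the RSP matrices of Table IV (Appendix D), and verifies that E stays nearly constant in the zero-sum case. The plain Runge-Kutta stepper is not symplectic, so a small drift remains.

# Minimal sketch (not the paper's code): integrate the perfect-memory adaptation
# dynamics d(log x_i)/dt = (Ay)_i - x.Ay, d(log y_j)/dt = (Bx)_j - y.Bx with
# beta_X = beta_Y = 1 and the RSP matrices of Table IV, and verify that
# E = D(x*||x) + D(y*||y) stays (nearly) constant when eps_Y = -eps_X.
import numpy as np

def rsp_matrix(eps):
    # Two-person RSP reinforcement matrix with tie reward eps (Table IV).
    return np.array([[eps, 1.0, -1.0],
                     [-1.0, eps, 1.0],
                     [1.0, -1.0, eps]])

def flow(state, A, B):
    # Coupled replicator-like vector field for the two agents.
    x, y = state[:3], state[3:]
    dx = x * (A @ y - x @ A @ y)
    dy = y * (B @ x - y @ B @ x)
    return np.concatenate([dx, dy])

def rk4_step(state, dt, A, B):
    k1 = flow(state, A, B)
    k2 = flow(state + 0.5 * dt * k1, A, B)
    k3 = flow(state + 0.5 * dt * k2, A, B)
    k4 = flow(state + dt * k3, A, B)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def energy(state):
    # E = D(x*||x) + D(y*||y) with x* = y* = (1/3, 1/3, 1/3).
    c = np.full(3, 1.0 / 3.0)
    return np.sum(c * np.log(c / state[:3])) + np.sum(c * np.log(c / state[3:]))

eps_x = 0.5
A, B = rsp_matrix(eps_x), rsp_matrix(-eps_x)        # zero-sum: eps_Y = -eps_X
state = np.array([0.05, 0.35, 0.6, 0.1, 0.2, 0.7])  # initial condition (B)
E0 = energy(state)                                   # about 0.7445
for _ in range(100_000):
    state = rk4_step(state, 1e-2, A, B)
print(E0, energy(state) - E0)                        # the drift should remain small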

When ε_X = 0.5, the Lyapunov exponents indicate positive-negative pairs for x_1(0) = 0.05, 0.06, and 0.08, which clearly show Hamiltonian chaos. Note that λ_2 ≈ 0.0, λ_3 ≈ 0.0, and λ_4 ≈ −λ_1, as expected.

FIG. 14: Poincaré sections of the behavior in the preceding two figures, that is, ε_X = −ε_Y = 0.0 (left) and ε_X = −ε_Y = 0.5 (right). The Poincaré section is given by x_1 = x_2 and y_1 < y_2 (left) and by 3x_1 − x_2 − 2/3 = 0 and y_1 − 3y_2 + 2/3 < 0 (right). There are 25 randomly selected initial conditions, including the two, (A) and (B), used in Figs. 12 and 13. The constant of motion (E = E_0) forms the outer border of the Poincaré sections.

TABLE I: Lyapunov spectra for different initial conditions (columns) and different values of the tie-breaking parameter ε_X. The initial conditions are (x_1, x_2, x_3, y_1, y_2, y_3) = (x_1, 0.35, 0.65 − x_1, 0.1, y_2, 0.9 − y_2) with E = E_0 = 0.74446808 fixed. We choose (x_1, y_2) = (0.05, 0.2), (0.06, 0.160421), (0.07, 0.135275), (0.08, 0.117743), (0.09, 0.104795), (0.10, 0.0948432). The Lyapunov exponents are multiplied by 10^3. Note that λ_2 ≈ 0.0, λ_3 ≈ 0.0, and λ_4 ≈ −λ_1, as expected. The Lyapunov exponents indicating chaos are shown in boldface.

  ε_X         x_1(0)=0.05   0.06     0.07     0.08     0.09     0.10
  0.0   λ_1     +0.881     +0.551   +0.563   +0.573   +0.575   +0.589
        λ_2     +0.436     +0.447   +0.464   +0.467   +0.460   +0.461
        λ_3     −0.436     −0.447   −0.464   −0.467   −0.460   −0.461
        λ_4     −0.881     −0.551   −0.563   −0.573   −0.575   −0.589
  0.5   λ_1     +36.4      +41.5    +0.487   +26.3    +0.575   +0.487
        λ_2     +0.543     +0.666   +0.204   +0.350   +0.460   +0.460
        λ_3     −0.637     −0.666   −0.197   −0.338   −0.460   −0.467
        λ_4     −36.3      −41.5    −0.494   −26.3    −0.575   −0.480

2. Conservative Dynamics

With perfect memory (α_X = α_Y = 0) and a game that is not zero-sum (ε_X ≠ −ε_Y), the dynamics is conservative in U and one observes transients that are attracted to heteroclinic networks in the state space ∆. (See Fig. 15.)

FIG. 15: Heteroclinic cycle with ε_X = −0.1 and ε_Y = 0.05 (top row). Chaotic transient to a heteroclinic network (bottom row) with ε_X = 0.1 and ε_Y = −0.05. For both, α_X = α_Y = 0.

When ε_X + ε_Y < 0, the behavior is intermittent and orbits are guided by the flow on ∆'s edges, which describes a network of possible heteroclinic cycles. Since action ties are not rewarded, there is only one such cycle. It is shown in the top row of Fig. 15: (R,P) → (S,P) → (S,R) → (P,R) → (P,S) → (R,S) → (R,P). Note that during the cycle each agent switches between almost deterministic actions in the order R → S → P. The agents are out of phase with respect to each other and they alternate winning each turn.

With ε_X + ε_Y > 0, however, the orbit is an infinitely persistent chaotic transient [35]. Since, in this case, agent X can choose a tie, the cycles are not closed. For example, with ε_X > 0, at (R,P), X has the option of moving to (P,P) instead of (S,P) with positive probability. This embeds an instability along the heteroclinic cycle and so orbits are chaotic. (See Fig. 15, bottom row.)

Figure 16 shows the time series for these behaviors. Usually, in a transient relaxation to a heteroclinic cycle, the duration over which orbits stay near the saddle vertices increases exponentially. In our case, however, it appears to increase subexponentially. This is because the exponent is very small: (1 + δ)^n ∼ 1 + nδ + ... for δ << 1. In the second, chaotic transient case, the duration still increases subexponentially, but the visited vertices change irregularly.

Figure 17 shows the behavior of H_X, H_Y, and E. For both cases E eventually increases monotonically and H_X and H_Y asymptotically decrease. The agents show a tendency to decrease their choice uncertainty and to switch between almost deterministic actions. H_X and H_Y oscillate over the range [0, log 2] for ε_X = −0.1 and ε_Y = 0.05 and over [0, log 3] for ε_X = 0.1 and ε_Y = −0.05.
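The qualitative picture in Figs. 16 and 17 is easy to reproduce. The sketch below makes the same modeling assumptions as the previous snippet (a replicator-like perfect-memory form with unit βs and the Table IV matrices, which is an assumption rather than a transcription of Eqs. (38)) and simply tracks H_X, H_Y, and E along one trajectory of the heteroclinic-cycle case.

# Minimal sketch (same assumed equations as the previous snippet): integrate the
# perfect-memory dynamics for the non-zero-sum case eps_X = -0.1, eps_Y = 0.05
# (so eps_X + eps_Y < 0) and track H_X, H_Y, and E.  As described for Figs. 16-17,
# E should grow while the action entropies decay toward near-deterministic switching.
import numpy as np

def rsp_matrix(eps):
    return np.array([[eps, 1.0, -1.0], [-1.0, eps, 1.0], [1.0, -1.0, eps]])

def flow(s, A, B):
    x, y = s[:3], s[3:]
    return np.concatenate([x * (A @ y - x @ A @ y), y * (B @ x - y @ B @ x)])

def entropy(p):
    return -np.sum(p * np.log(p))

A, B = rsp_matrix(-0.1), rsp_matrix(0.05)
s = np.array([0.3, 0.4, 0.3, 0.2, 0.35, 0.45])
c = np.full(3, 1.0 / 3.0)
dt, samples = 0.01, []
for i in range(500_000):                    # integrate up to t = 5000, as in Fig. 17
    s = s + dt * flow(s, A, B)              # simple Euler step
    if i % 50_000 == 0:
        E = np.sum(c * np.log(c / s[:3])) + np.sum(c * np.log(c / s[3:]))
        samples.append((i * dt, entropy(s[:3]), entropy(s[3:]), E))
for t, hx, hy, E in samples:
    print(f"t={t:6.0f}  H_X={hx:.3f}  H_Y={hy:.3f}  E={E:.3f}")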

FIG. 16: Time series of the action probabilities during the heteroclinic cycles of Fig. 15. ε_X = −0.1 and ε_Y = 0.05 for the left column. The right column shows the chaotic transient to a possible heteroclinic cycle when ε_X = 0.1 and ε_Y = −0.05. For both, α_X = α_Y = 0.

FIG. 17: Dynamics of H_X, H_Y, and E in conservative adaptive dynamics: ε_X = −0.1 and ε_Y = 0.05 for the left plot and ε_X = 0.1 and ε_Y = −0.05 for the right. For both, α_X = α_Y = 0. Note that E increases asymptotically and H_X and H_Y tend to decrease.

3. Dissipative Dynamics

If the memory loss rates (α_X and α_Y) are positive, the dynamics becomes dissipative in the information space U and exhibits limit cycles and chaotic attractors. (See Fig. 18.) Figure 20 (top) shows a diverse range of bifurcations as a function of α_X: it gives the dynamics on the surface specified by u̇_3 < 0 and v̇_3 = 0, projected onto v_3. The fixed point (x*, y*) becomes unstable when α_X drops below α_c ≈ 0.055008938. Typically, a period-doubling bifurcation to chaos occurs with decreasing α_X. Chaos can occur only when ε_X + ε_Y > 0 [17].

Figure 19 shows the dynamics of H_X, H_Y, and E in dissipative adaptive dynamics. In both cases shown, E does not diverge, due to memory loss. When α_X = 0.025, H_X and H_Y converge to oscillations over the range [log 2, log 3]. When α_X = 0.01, H_X and H_Y exhibit chaotic behavior over the range [0, log 3]. Figure 20 (bottom) shows that the largest Lyapunov exponent in U is positive across a significant fraction of the parameter space, indicating that chaos is common.

FIG. 18: Dissipative adaptive dynamics: stable limit cycles for α_X = 0.025 (top) and α_X = 0.021 (middle), and chaotic dynamics for α_X = 0.0198 (bottom). All cases have ε_X = 0.5, ε_Y = −0.3, and α_Y = 0.01. Period-doubling bifurcation to chaos occurs with decreasing α_X.

FIG. 19: Dynamics of H_X, H_Y, and E in dissipative adaptive dynamics: ε_X = 0.5, ε_Y = −0.3, and α_Y = 0.01 for both plots; α_X = 0.025 for the left plot and α_X = 0.01 for the right. t* ≈ 10^8 in the right figure is the (rather long) transient time. In both cases E does not diverge, due to memory loss.

The dual aspects of chaos, coherence and irregularity, imply that agents may behave cooperatively or competitively (or switch between both). This ultimately derives from the agents' successive mutual adaptation and memory loss in non-transitive interactions, such as the RSP game, as was explained in Sec. III. Note that such global organization of behavior is induced by each agent's self-interested and myopic adaptation and by "weak" uncertainty about its environment.
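A qualitative version of the bifurcation scan in Fig. 20 (top) can be assembled from the same assumed equations, now with the memory-loss term α(−log x_i + Σ_n x_n log x_n) of Appendix E included. The sketch below sweeps α_X and records points on a simple Poincaré section; note that this section (downward crossings of x_1 = x_2) is not the paper's (u̇_3 < 0, v̇_3 = 0) surface and the recorded coordinate is y_1 rather than v_3, so only the qualitative structure (limit cycles, period doubling, chaos) should be compared.

# Minimal sketch: sweep alpha_X with eps_X = 0.5, eps_Y = -0.3, alpha_Y = 0.01
# and collect Poincare-section points, in the spirit of Fig. 20 (top).
import numpy as np

def rsp_matrix(eps):
    return np.array([[eps, 1.0, -1.0], [-1.0, eps, 1.0], [1.0, -1.0, eps]])

def flow(s, A, B, ax, ay):
    x, y = s[:3], s[3:]
    dx = x * (A @ y - x @ A @ y) + ax * x * (-np.log(x) + x @ np.log(x))
    dy = y * (B @ x - y @ B @ x) + ay * y * (-np.log(y) + y @ np.log(y))
    return np.concatenate([dx, dy])

def section_points(ax, ay=0.01, dt=0.05, transient=100_000, steps=200_000):
    A, B = rsp_matrix(0.5), rsp_matrix(-0.3)
    s = np.array([0.3, 0.4, 0.3, 0.4, 0.3, 0.3])
    prev, hits = s[0] - s[1], []
    for i in range(transient + steps):
        s = s + dt * flow(s, A, B, ax, ay)          # simple Euler step
        cur = s[0] - s[1]
        if i >= transient and prev > 0.0 >= cur:    # downward crossing of x1 = x2
            hits.append(s[3])                        # record y1 at the crossing
        prev = cur
    return hits

for ax in np.linspace(0.01, 0.03, 21):               # the range scanned in Fig. 20
    for y1 in section_points(ax):
        print(ax, y1)                                 # two columns for a scatter plot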

FIG. 20: Bifurcation diagram (top) of the dissipative dynamics (adaptation with memory loss) projected onto the coordinate v_3 from the Poincaré section (u̇_3 > 0, v̇_3 = 0), and the largest two Lyapunov exponents λ_1 and λ_2 (bottom), as a function of α_X ∈ [0.01, 0.03]. Here ε_X = 0.5, ε_Y = −0.3, and α_Y = 0.01. Simulations show that λ_3 and λ_4 are always negative.

D. Three Agents Adapting under Rock-Scissors-Paper Interactions

Consider three agents adapting via (an extension of) the RSP interaction. Here the environment is given by the following interaction:

  a_{ijk} =   2    win over the others,
             −2    lose to the other two,
              1    win over one other,
             −1    lose to one other,
              ε_X  tie,    (53)

and similarly for b_{jki} and c_{kij}, with i, j, k ∈ {R, S, P}. Here ε_X, ε_Y, ε_Z ∈ (−1.0, 1.0). (See App. D for the detailed listing of the reinforcement scheme.) As before, we use normalized a'_{ijk}, b'_{jki}, and c'_{kij}:

  a'_{ijk} =   2 − ε_X/5    win over the others,
              −2 − ε_X/5    lose to the other two,
               1 − ε_X/5    win over one other,
              −1 − ε_X/5    lose to one other,
               4ε_X/5       tie.    (54)

The normalization does not affect the dynamics. The Nash equilibrium (x*, y*, z*) is at the simplex center,

  (x*, y*, z*) = (1/3) (1, 1, 1, 1, 1, 1, 1, 1, 1) .    (55)

It is also a fixed point of the adaptation dynamics. The Jacobian there is

  J = ( −α_X    0     1/3   2/3   1/3   2/3 )
      (   0   −α_X   −2/3  −1/3  −2/3  −1/3 )
      (  1/3   2/3   −α_Y    0    1/3   2/3 )
      ( −2/3  −1/3     0   −α_Y  −2/3  −1/3 )
      (  1/3   2/3    1/3   2/3  −α_Z    0  )
      ( −2/3  −1/3   −2/3  −1/3    0   −α_Z ) .    (56)

When α_X = α_Y = α_Z = α, its eigenvalues are

  λ_i + α = (i/√3) (1, −1, −2, 1, −1, 2) .    (57)

In the perfect-memory case (α_X = α_Y = α_Z = 0), trajectories near (x*, y*, z*) are neutrally stable periodic orbits, since the λs are purely imaginary. In the memory-loss case (α_X, α_Y, α_Z > 0), (x*, y*, z*) is asymptotically stable, since all Re(λ_i) are strictly negative. One expects multiple attractors in this case.

The collective state space ∆ is now six dimensional, being the product of three two-dimensional agent simplices, ∆ = ∆_X × ∆_Y × ∆_Z. The flow on ∆'s boundary is shown in Fig. 21, giving the adaptation dynamics on the edges of ∆ when the αs vanish.

FIG. 21: Flows on the simplex edges in three-agent RSP: arrows indicate the direction of the adaptation dynamics on ∆'s boundary when the αs vanish.

We give two examples with α_X = α_Y = α_Z = 0.01 in Fig. 22: ε_X = 0.5, ε_Y = −0.365, ε_Z = 0.8 (top: limit cycle) and ε_X = 0.5, ε_Y = −0.3, ε_Z = 0.6 (bottom: chaos). Chaos is typically observed when ε_X + ε_Y + ε_Z > 0. Limit cycles are highly complex manifolds depending on the six-dimensional heteroclinic network on the simplex boundary.

FIG. 22: Periodic orbit (top: ε_X = 0.5, ε_Y = −0.365, ε_Z = 0.8) and chaotic orbit (bottom: ε_X = 0.5, ε_Y = −0.3, ε_Z = 0.6); the other parameters are α_X = α_Y = α_Z = 0.01. The Lyapunov spectrum for the chaotic dynamics is (λ_1, ..., λ_6) = (+45.2, +6.48, −0.336, −19.2, −38.5, −53.6) × 10^−3.

The Lyapunov spectrum for the chaotic dynamics is (λ_1, ..., λ_6) = (+45.2, +6.48, −0.336, −19.2, −38.5, −53.6) × 10^−3; the dynamics has two positive Lyapunov exponents. Note that this dynamics could have many neutrally stable subspaces in three or more dimensions. These subspaces act as quasistable attractors and may even have symplectic structure. These properties of high-dimensional dynamics will be reported elsewhere.
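As a quick numerical consistency check on Eqs. (56) and (57), the following minimal sketch assembles J from its 2×2 blocks and confirms that, for equal memory-loss rates α, the eigenvalues are −α ± i/√3 (twice) and −α ± 2i/√3.

# Minimal sketch: build the Jacobian of Eq. (56) and check Eq. (57) numerically.
import numpy as np

def three_agent_rsp_jacobian(ax, ay, az):
    C = np.array([[1.0, 2.0], [-2.0, -1.0]]) / 3.0       # inter-agent coupling block
    return np.block([[-ax * np.eye(2), C, C],
                     [C, -ay * np.eye(2), C],
                     [C, C, -az * np.eye(2)]])

alpha = 0.01
eigs = np.linalg.eigvals(three_agent_rsp_jacobian(alpha, alpha, alpha))
# Shift by alpha and rescale: the result should be {+-1j, +-1j, +-2j}.
print(np.sort_complex((eigs + alpha) * np.sqrt(3)))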

V. CONCLUDING REMARKS

We developed a class of dynamical systems for collective adaptation. We started with very simple agents, whose adaptation was a dynamic balance between adaptation to environmental constraints and memory loss. A macroscopic description of a network of adaptive agents was produced. In one special case we showed that the dynamical system reduces to replicator equations, familiar in evolutionary game theory and population biology. In a more general setting, we investigated several of the resulting periodic, intermittent, and chaotic behaviors in which agent-agent interactions were explicitly given as game interactions. Self-organization induced by information flux was discussed using an information-theoretic viewpoint.

We pointed out that, unlike single-agent adaptation, information flow is multidirectional in collective adaptation and a dynamical analysis is required. We also noted that for nontransitive interactions deterministic chaos occurs, due to the agents' local adaptation, which amplifies fluctuations in behavior, and to memory loss, which stabilizes behavior. Since deterministic chaos occurs even in this simple setting, one expects that in higher-dimensional and heterogeneous adaptive systems intrinsic unpredictability would become a dominant collective behavior.

We close by indicating some future directions in which to extend the model. First, as we alluded to during the development, there are difficulties in scaling the model to large numbers of agents. We focused on collectives with global coupling between all agents. In this case, however, the complexity of the interaction terms grows exponentially with the number of agents, which is both impractical from the viewpoints of analysis and simulation and unrealistic for natural systems that are large collectives. The solution to this, given in App. E, is to develop either spatially distributed agent collectives or to extend the equations to include explicit communication networks between agents. Both of these extensions will be helpful in modeling the many adaptive collectives noted in the introduction.

Second, and important for applications, is to develop the stochastic generalization of the deterministic equations of motion, which accounts for the effects of finite and fluctuating numbers of agents and also of finite histories for adaptation. Each of these introduces its own kind of sampling stochasticity and will require a statistical dynamics analysis reminiscent of that found in population genetics [36].

One necessary and possibly difficult extension will be to agents that adapt continuous-valued actions (say, learning the spatial location of objects) to their environments. Mathematically, this requires a continuous-space extension of the adaptation equations (Eq. (15)), and it results in models that are described by PDEs [37].

Finally, another direction, especially useful if one attempts to quantify global function in large collectives, will be structural and information-theoretic analyses of local and global adaptive behaviors [38, 39]. Analyzing the stored information and the causal architecture [40, 41] in each agent versus that in the collective, communication in large networks, and emerging hierarchical structures in collective adaptation are projects now made possible using this framework.

Acknowledgments

The authors thank D. Albers, K. Engo, D. Krakauer, I. Tsuda, and J. D. Farmer for helpful discussions. YS was supported by the Special Postdoctoral Researchers Program at RIKEN. EA was supported by the Grant-in-Aid for Young Scientists (B) No. 14780342, The Ministry of Education, Culture, Sports, Science and Technology, Japan. This work was also supported at the Santa Fe Institute under the Network Dynamics Program by Intel Corporation and under DARPA contract F30602-00-2-0583.

APPENDIX A: CONTINUOUS TIME

Here we give the derivation of the continuous-time limits that lead to the differential equations from the original stochastic discrete-time adaptation model.

Denote the agent-agent interaction time scale, the number of interactions per adaptation interval, and the adaptation time scale by dτ, T, and t, respectively. We assume that adaptation is very slow compared to agent-agent interactions and take the limits dτ → 0 and T → ∞, keeping dt = T dτ finite. Then we take the limit dt → 0 to get the derivative of the vector Q(t).

With Eq. (2) and Q_i^X(0) = 0, we have

  Q_i^X(T) = (1/T) [ Σ_{k=1}^T δ_{ij}(k) r_{ij}^X(k) − α_X Σ_{k=1}^T Q_i^X(k) ] .    (A1)

Thus, for continuous time, when action i is chosen at step n,

  [Q_i^X(t + dt) − Q_i^X(t)] / dt
    = (1/(T dt)) [ Σ_{k=Tt}^{T(t+dt)} δ_{ij}(k/T) r_{ij}^X(k/T) − α_X Σ_{k=Tt}^{T(t+dt)} Q_i^X(k/T) ] .    (A2)

Taking T → ∞ and dτ → 0, we have

  [Q_i^X(t + dt) − Q_i^X(t)] / dt
    = (1/dt) ∫_t^{t+dt} δ_{ij}(s) r_{ij}^X(s) ds − (α_X/dt) ∫_t^{t+dt} Q_i^X(s) ds .    (A3)

Assuming r_{ij}(t) changes as slowly as the adaptive dynamics, r_{ij}(t) is constant during the adaptation interval t ∼ t + dt. If we assume in addition that the behaviors of the two agents X and Y are statistically independent at time t, then the law of large numbers gives

  (1/dt) ∫_t^{t+dt} δ_{ij}(s) r_{ij}^X(s) ds → Σ_{j=1}^M r_{ij}(t) y_j(t) ≡ R_i^X(t) .    (A4)

Now take dt → 0. Eqs. (A3) and (A4) together give

  Q̇_i^X(t) = R_i^X(t) − α_X Q_i^X(t)    (A5)

for the continuous-time updating of the reinforcement memory. When environmental change is slow compared to adaptation,

  R_i^X(t) = Σ_k^N a_{ik} y_k(t) .    (A6)

The single-agent case is given by letting y = (1, 0, 0, ..., 0) and a_{i1} = a_i, i = 1, ..., N.

APPENDIX B: NASH EQUILIBRIA

The Nash equilibria (x*, y*) of the interaction (game) (A, B) are those states in which all players can do no better by changing state; that is,

  x* A y* ≥ x A y*  and  y* B x* ≥ y B x* ,    (B1)

for all (x, y) ∈ ∆_X × ∆_Y. If they exist in the interior, the solutions of the following simultaneous equations are Nash equilibria:

  (Ay)_i = (Ay)_1  and  (Bx)_j = (Bx)_1
  ⟺ (Ay)_i − x·Ay = 0  and  (Bx)_j − y·Bx = 0 ,    (B2)

where Σ_n^N x_n = Σ_m^M y_m = 1.

It is known that N = M is a necessary condition for the existence of a unique Nash equilibrium in the interior of ∆. With N = M, in the perfect-memory case (α_X = α_Y = 0) the unique Nash equilibrium, if it exists, is the fixed point given by the intersection of the x- and y-nullclines of Eqs. (20).

This Nash equilibrium is not asymptotically stable, but the time average of trajectories converges to it. To see this, suppose that x_i(t) > δ for all t sufficiently large. We have

  d(log x_i)/dt = ẋ_i / x_i = (Ay)_i − x·Ay ,
  d(log y_i)/dt = ẏ_i / y_i = (Bx)_i − y·Bx .    (B3)

Integrating both sides from 0 to T and dividing by T, we get

  [log x_i(T) − log x_i(0)] / T = Σ_j a_{ij} ȳ_j − S_A ,
  [log y_i(T) − log y_i(0)] / T = Σ_j b_{ij} x̄_j − S_B ,    (B4)

where x̄_i = T^{−1} ∫_0^T x_i dt, ȳ_i = T^{−1} ∫_0^T y_i dt, S_A = T^{−1} ∫_0^T x·Ay dt, and S_B = T^{−1} ∫_0^T y·Bx dt. Letting T → ∞, the left-hand sides converge to 0. Thus, x̄ and ȳ are a solution of Eqs. (B2). (This proof follows Ref. [42].)
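To make Eqs. (B2) concrete, here is a minimal illustrative sketch (not part of the paper) that solves the interior-Nash conditions as a linear system, using the two-person RSP matrices of Table IV as an example.

# Minimal sketch: find an interior Nash equilibrium from Eqs. (B2) by solving
# (A y)_i = (A y)_1 with sum(y) = 1, and (B x)_j = (B x)_1 with sum(x) = 1.
import numpy as np

def interior_nash(A, B):
    n = A.shape[0]
    rows_y = np.vstack([A[1:] - A[0], np.ones(n)])       # indifference + normalization
    y = np.linalg.solve(rows_y, np.r_[np.zeros(n - 1), 1.0])
    rows_x = np.vstack([B[1:] - B[0], np.ones(n)])
    x = np.linalg.solve(rows_x, np.r_[np.zeros(n - 1), 1.0])
    return x, y

eps_x, eps_y = 0.5, -0.3
A = np.array([[eps_x, 1, -1], [-1, eps_x, 1], [1, -1, eps_x]], dtype=float)
B = np.array([[eps_y, 1, -1], [-1, eps_y, 1], [1, -1, eps_y]], dtype=float)
print(interior_nash(A, B))      # both equal (1/3, 1/3, 1/3) for RSP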

APPENDIX C: HAMILTONIAN DYNAMICS

Consider a game (A, B) that admits an interior Nash equilibrium (x*, y*) and is zero-sum (B = −A^T). Then

  E = β_X^{−1} D(x*‖x) + β_Y^{−1} D(y*‖y)    (C1)

is a constant of the motion. This follows by direct calculation:

  dE/dt = −(1/β_X) Σ_k^N x*_k ẋ_k/x_k − (1/β_Y) Σ_k^M y*_k ẏ_k/y_k
        = −(x* A y − x A y) − (y* B x − y B x)
        = (x* − x) A (y* − y) + (y* − y) B (x* − x)
        = 0 .    (C2)

This holds for any number of agents.

Give the agents equal numbers of actions (N = M), set the αs to zero (perfect memory), and make all βs finite and positive. Then the adaptive dynamics is Hamiltonian in the information space U = (u, v), with the above constant of motion E, and has Poisson structure J:

  J = (  O    P )
      ( −P^T  O ) ,    (C4)

with P = −β_X β_Y A.

Proof:

  ∂E/∂u_i = ∂/∂u_i [ β_X^{−1} Σ_n^N x*_n log x*_n + β_Y^{−1} Σ_n^N y*_n log y*_n
                      + β_X^{−1} ( Σ_n^N x*_n u_n + log Σ_n^N e^{−u_n} )
                      + β_Y^{−1} ( Σ_n^N y*_n v_n + log Σ_n^N e^{−v_n} ) ]
          = β_X^{−1} ( x*_i − e^{−u_i} / Σ_n^N e^{−u_n} )
          = β_X^{−1} ( x*_i − x_i ) .    (C5)

Similarly, ∂E/∂v_j = β_Y^{−1} (y*_j − y_j). Since (x*, y*) is an interior Nash equilibrium, with Eq. (17), (Ay*)_i = (Bx*)_j = 0. Thus,

  A ∂E/∂v = −(1/β_Y) A y ,
  B ∂E/∂u = −(1/β_X) B x ,    (C6)

and

  J ∇_U E = (  −β_X β_Y A ∂E/∂v      )  =  ( β_X A y )  =  ( u̇ )  =  U̇ .    (C7)
            ( −(−β_X β_Y A)^T ∂E/∂u  )     ( β_Y B x )     ( v̇ )

We can transform U = (u, v) to canonical coordinates U' = (p, q),

  U̇' = S ∇_{U'} E ,    (C8)

with

  S = ( O  −I )
      ( I   O ) ,    (C9)

where I is an N × N identity matrix, via a linear transformation U' = MU to the Hamiltonian form.

APPENDIX D: REINFORCEMENT SCHEMES AND INTERACTION MATRICES

Here we give the reinforcement-scheme interaction matrices for the constant-environment collectives investigated in the main text.

1. Matching Pennies

This game describes a non-transitive competition. Each agent chooses a coin, which turns up either heads (H) or tails (T). Agent X wins when the coins differ; otherwise agent Y wins. Table II gives the reinforcement scheme for the various possible plays. Note that the εs determine the size of the winner's rewards. When ε_X + ε_Y = 0, the game is zero-sum. The Nash equilibrium is x* = y* = (1/2, 1/2).

TABLE II: The two-person Matching Pennies game: ε_X ∈ (0.0, 1.0] and ε_Y ∈ [−1.0, 0.0).

  X Y    r^X     r^Y
  H H   −ε_X    −ε_Y
  H T    ε_X     ε_Y
  T H    ε_X     ε_Y
  T T   −ε_X    −ε_Y

Various extensions of Matching Pennies to more than two players are known. We give the Even-Odd game as an example for three agents X, Y, and Z in a collective. All flip a coin. Agents X and Y win when the number of heads is even; otherwise Z wins. Table III gives the reinforcement scheme. When the εs add to zero, the game is zero-sum. The unique mixed Nash equilibrium is x* = y* = z* = (1/2, 1/2), the simplex center.

TABLE III: The three-player Even-Odd game: ε_X ∈ (0.0, 1.0] and ε_Y, ε_Z ∈ [−1.0, 0.0).

  X Y Z    r^X     r^Y     r^Z
  H H H   −ε_X    −ε_Y    −ε_Z
  H H T    ε_X     ε_Y     ε_Z
  H T H    ε_X     ε_Y     ε_Z
  H T T   −ε_X    −ε_Y    −ε_Z
  T H H    ε_X     ε_Y     ε_Z
  T H T   −ε_X    −ε_Y    −ε_Z
  T T H   −ε_X    −ε_Y    −ε_Z
  T T T    ε_X     ε_Y     ε_Z
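As an illustration of how these reinforcement tables translate into interaction matrices, the following minimal sketch (an illustrative representation, not code from the paper) assembles the Matching Pennies matrices A and B from Table II and checks the zero-sum condition and the mixed equilibrium when ε_X + ε_Y = 0.

# Minimal sketch: build the Matching Pennies matrices from Table II
# (A[i, j] = X's reward when X plays i and Y plays j; B[j, i] = Y's reward)
# and check B = -A^T when eps_X + eps_Y = 0, plus X's indifference at y* = (1/2, 1/2).
import numpy as np

def matching_pennies(eps_x, eps_y):
    A = np.array([[-eps_x,  eps_x],
                  [ eps_x, -eps_x]])      # actions ordered (H, T)
    B = np.array([[-eps_y,  eps_y],
                  [ eps_y, -eps_y]])
    return A, B

A, B = matching_pennies(0.3, -0.3)
print(np.allclose(B, -A.T))               # True: zero-sum when eps_X + eps_Y = 0
print(A @ np.array([0.5, 0.5]))           # equal entries: X indifferent at y*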

2. Rock-Scissors-Paper

This game describes a non-transitive, three-sided competition between two agents: rock (R) beats scissors (S), scissors beats paper (P), but paper beats rock. Table IV gives the reinforcement scheme. The εs here control the rewards for ties. When they add to zero, the game is zero-sum. The unique mixed Nash equilibrium is x* = y* = (1/3, 1/3, 1/3), again the center of the simplex.

TABLE IV: The two-person Rock-Scissors-Paper game: ε_X, ε_Y ∈ (−1.0, 1.0).

  X Y    r^X    r^Y
  R R    ε_X    ε_Y
  R S     1     −1
  R P    −1      1
  S R    −1      1
  S S    ε_X    ε_Y
  S P     1     −1
  P R     1     −1
  P S    −1      1
  P P    ε_X    ε_Y

The extension of the RSP interaction to three agents is straightforward. The reinforcement scheme is given in Table V. When ε_X + ε_Y + ε_Z = 0, the game is zero-sum. The Nash equilibrium is x* = y* = z* = (1/3, 1/3, 1/3).

TABLE V: The three-person Rock-Scissors-Paper game: ε_X, ε_Y, ε_Z ∈ (−1.0, 1.0).

  X Y Z   r^X  r^Y  r^Z    X Y Z   r^X  r^Y  r^Z    X Y Z   r^X  r^Y  r^Z
  R R R   ε_X  ε_Y  ε_Z    S R R   −2    1    1     P R R    2   −1   −1
  R R S    1    1   −2     S R S   −1    2   −1     P R S   ε_X  ε_Y  ε_Z
  R R P   −1   −1    2     S R P   ε_X  ε_Y  ε_Z    P R P    1   −2    1
  R S R    1   −2    1     S S R   −1   −1    2     P S R   ε_X  ε_Y  ε_Z
  R S S    2   −1   −1     S S S   ε_X  ε_Y  ε_Z    P S S   −2    1    1
  R S P   ε_X  ε_Y  ε_Z    S S P    1    1   −2     P S P   −1    2   −1
  R P R   −1    2   −1     S P R   ε_X  ε_Y  ε_Z    P P R    1    1   −2
  R P S   ε_X  ε_Y  ε_Z    S P S    1   −2    1     P P S   −1   −1    2
  R P P   −2    1    1     S P P    2   −1   −1     P P P   ε_X  ε_Y  ε_Z

APPENDIX E: NETWORK INTERACTIONS

We can describe heterogeneous network interactions within our model. We give an example of a model for lattice interactions here. Agents s = 1, 2, ..., S are on a spatial lattice: agent s interacts with agent s − 1 through the bi-matrices (A^s, B^{s−1}) and with agent s + 1 through (B^s, A^{s+1}). Each bi-matrix is 2 × 2. See Fig. 23.

FIG. 23: Agent s interacts with agent s − 1 through the bi-matrices (A^s, B^{s−1}) and with agent s + 1 through (B^s, A^{s+1}).

Agents choose actions among the 2 × 2 action pairs for both the right and left neighboring agents. The action pairs are (1, 1), (1, 2), (2, 1), (2, 2) and are weighted with probabilities x_1, ..., x_4. Inserting the interaction bi-matrices into the S-agent adaptive dynamics of Eq. (21) gives

  ẋ^s_i / x^s_i = β_s [ (A^s x^{s−1})_i − p^s · A^s x^{s−1} + (B^s x^{s+1})_i − q^s · B^s x^{s+1} ]
                  + α_s ( −log x^s_i + Σ_n^4 x^s_n log x^s_n ) ,    (E1)

where Σ_i x^s_i = 1, p^s = (x^s_1 + x^s_2, x^s_3 + x^s_4), and q^s = (x^s_1 + x^s_3, x^s_2 + x^s_4).

In a similar way, arbitrary network interactions can be described by our adaptive dynamics.
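A sketch of one possible implementation of this lattice scheme follows. Since the compact notation of Eq. (E1) leaves some details implicit, the interpretation below is an assumption: each agent keeps a distribution over the four (left-action, right-action) pairs, earns the 2 × 2 bi-matrix payoff A^s against its left neighbor's marginal toward it and B^s against its right neighbor's marginal, and loses memory at rate α_s; the random bi-matrices and the ring topology are placeholders.

# Minimal sketch of lattice-coupled adaptation in the spirit of Eq. (E1); the
# exact coupling used here is an interpretation, not a transcription.
import numpy as np

rng = np.random.default_rng(0)
S, alpha, beta, dt = 10, 0.01, 1.0, 0.01
A = [rng.normal(size=(2, 2)) for _ in range(S)]      # hypothetical 2x2 bi-matrices
B = [rng.normal(size=(2, 2)) for _ in range(S)]

def marginals(x):
    # p = distribution of the action played toward the left neighbor,
    # q = distribution of the action played toward the right neighbor.
    return np.array([x[0] + x[1], x[2] + x[3]]), np.array([x[0] + x[2], x[1] + x[3]])

def flow(X):
    dX = np.zeros_like(X)
    for s in range(S):
        x = X[s]
        q_left = marginals(X[(s - 1) % S])[1]        # left neighbor's play toward s
        p_right = marginals(X[(s + 1) % S])[0]       # right neighbor's play toward s
        rL, rR = A[s] @ q_left, B[s] @ p_right       # rewards per own left/right action
        r = np.array([rL[iL] + rR[iR] for iL in (0, 1) for iR in (0, 1)])
        game = r - x @ r                             # replicator-style comparison to the mean
        memory = alpha * (-np.log(x) + x @ np.log(x))
        dX[s] = x * (beta * game + memory)
    return dX

X = rng.dirichlet(np.ones(4), size=S)                # one action-pair distribution per agent
for _ in range(20_000):
    X = X + dt * flow(X)                             # simple Euler integration
print(np.round(X, 3))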

[1] A. T. Winfree, The Geometry of Biological Time (Springer Verlag, 1980).
[2] J. Hofbauer and K. Sigmund, Evolutionary Games and Population Dynamics (Cambridge University Press, Cambridge, 1988).
[3] S. Camazine, J.-L. Deneubourg, N. R. Franks, J. Sneyd, G. Theraulaz, and E. Bonabeau, eds., Self-Organization in Biological Systems (Princeton University Press, Princeton, 2001).
[4] G. Bateson, Steps to an Ecology of Mind: Collected Essays in Anthropology, Psychiatry, Evolution, and Epistemology (Aronson, Northvale, NJ, 1987).
[5] O. E. Rossler, Ann. NY Acad. Sci. 504, 229 (1987).
[6] M. Taiji and T. Ikegami, Physica D 134, 253 (1999).
[7] H. A. Simon, The Sciences of the Artificial, Karl Taylor Compton Lectures (MIT Press, Cambridge, 1996), first edition 1969.
[8] R. Brooks and L. Steele, The Artificial Life Route to Artificial Intelligence: Building Embodied, Situated Agents (Lawrence Erlbaum Associates, New York, 1995).
[9] J. H. Holland, Adaptation in Natural and Artificial Systems (MIT Press, Cambridge, MA, 1992), second edition (first edition, 1975).
[10] T. Borgers and R. Sarin, J. Econ. Th. 77, 1 (1997).
[11] D. Fudenberg and D. K. Levine, Theory of Learning in Games (MIT Press, 1998).
[12] J. von Neumann and O. Morgenstern, Theory of Games and Economic Behavior (Princeton University Press, 1944).
[13] P. D. Taylor and L. B. Jonker, Mathematical Biosciences 40, 145 (1978).
[14] P. D. Taylor, J. Appl. Probability 16, 76 (1979).
[15] J. W. Weibull, Evolutionary Game Theory (MIT Press, 1995).
[16] Y. Sato, E. Akiyama, and J. D. Farmer, Proc. Natl. Acad. Sci. USA 99, 4748 (2002).
[17] Y. Sato and J. P. Crutchfield, Phys. Rev. E 67, 015206(R) (2003).
[18] E. Akiyama and K. Kaneko, Physica D 147, 221 (2000).
[19] E. Akiyama and K. Kaneko, Physica D 167, 36 (2002).
[20] Y. Ito, J. Appl. Prob. 16, 36 (1979).
[21] A. M. Turing, Phil. Trans. R. Soc. London B 237, 37 (1952).
[22] C. E. Shannon, The Mathematical Theory of Communication (The University of Illinois Press, 1949).
[23] W. S. McCulloch, Bulletin of Mathematical Biophysics 5, 115 (1945).
[24] B. F. Skinner, The Behavior of Organisms: An Experimental Analysis (Appleton-Century, New York, 1938).
[25] D. O. Hebb, The Organization of Behaviour (John Wiley and Sons, New York, 1949).
[26] M. F. Norman, Markov Processes and Learning Models (Academic Press, New York, 1972).
[27] A. L. Samuel, IBM Journal on Research and Development 11, 601 (1967).
[28] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (MIT Press, 1998).
[29] D. Kahneman and A. Tversky, eds., Choices, Values, and Frames (Cambridge University Press, 2000).
[30] J. Aitchison, The Statistical Analysis of Compositional Data (Blackburn Press, Caldwell, New Jersey, 1986).
[31] T. M. Cover and J. A. Thomas, Elements of Information Theory (John Wiley and Sons, Inc., 1991).
[32] J. Hofbauer, J. Math. Biol. 34, 675 (1996).
[33] A. M. Perelomov, Integrable Systems of Classical Mechanics and Lie Algebras (Birkhauser, 1990).
[34] H. Yoshida, Phys. Lett. A 150, 262 (1990).
[35] T. Chawanya, Prog. Theo. Phys. 94, 163 (1995).
[36] E. van Nimwegen, J. P. Crutchfield, and M. Mitchell, Theoret. Comp. Sci. 229, 41 (1999).
[37] J. Hofbauer, V. Hutson, and G. T. Vickers, Nonlinear Analysis, Theory, Methods, and Applications 30, 1235 (1997).
[38] R. Shaw, The Dripping Faucet as a Model Chaotic System (Aerial Press, Santa Cruz, California, 1984).
[39] J. P. Crutchfield and K. Young, Phys. Rev. Lett. 63, 105 (1989).
[40] J. P. Crutchfield and C. R. Shalizi, Physical Review E 59, 275 (1999).
[41] J. P. Crutchfield and D. P. Feldman, Chaos 13, 25 (2003).
[42] P. Schuster, K. Sigmund, J. Hofbauer, and R. Wolff, Biol. Cybern. 40, 1 (1981).