The dynamics of social innovation

H. Peyton Young1

Department of Economics, University of Oxford, Oxford OX1 3UQ, United Kingdom

Edited by Paul Robert Milgrom, Stanford University, Stanford, CA, and approved June 16, 2011 (received for review January 18, 2011)

Social norms and institutions are mechanisms that facilitate coordination between individuals. A social innovation is a novel mechanism that increases the welfare of the individuals who adopt it compared with the status quo. We model the dynamics of social innovation as a coordination game played on a network. Individuals experiment with a novel strategy that would increase their payoffs provided that it is also adopted by their neighbors. The rate at which a social innovation spreads depends on three factors: the topology of the network and in particular the extent to which agents interact in small local clusters, the payoff gain of the innovation relative to the status quo, and the amount of noise in the process. The analysis shows that local clustering greatly enhances the speed with which social innovations spread. It also suggests that the welfare gains from innovation are more likely to occur in large jumps than in a series of small incremental improvements.

convergence time | learning | game dynamics

This paper results from the Arthur M. Sackler Colloquium of the National Academy of Sciences, “Dynamics of Social, Political, and Economic Institutions,” held December 3–4, 2010, at the Arnold and Mabel Beckman Center of the National Academies of Sciences and Engineering in Irvine, California. The complete program and audio files of most presentations are available on the NAS Web site at www.nasonline.org/Sackler_Dynamics.
Author contributions: H.P.Y. performed research and wrote the paper.
The author declares no conflict of interest.
This article is a PNAS Direct Submission.
1E-mail: [email protected].
www.pnas.org/cgi/doi/10.1073/pnas.1100973108 | PNAS Early Edition

Social institutions are forms of capital that, together with physical and human capital, determine the rate of economic growth. Notable examples include private property rights and legal systems to protect them, accounting methods, forms of corporate governance, and the terms of economic contracts. However, in contrast with the literature on technological progress, relatively little is known about the ways in which new institutions are created and how they become established within a given social framework. In this paper I discuss one approach to this problem using methods from evolutionary game theory.

In the abstract, an institution can be viewed as a set of rules that structure a given type of interaction between individuals (1). These rules can be simple coordination devices, such as which hand to extend in greeting or who goes through the door first. Or they can be very elaborate, such as rituals of courtship and marriage, cycles of retribution, performance criteria in employment contracts, and litigation procedures in the courts. We want to know how a particular set of rules becomes established as common practice and what process describes the displacement of one set of rules by another.

The viewpoint we adopt here is that new norms and institutions are introduced through a process of experimentation by individuals and that the value of adoption by a given individual is an increasing function of the number of his neighbors who have adopted. Under certain conditions that we discuss below, this trial-and-error process will eventually trigger a general change in expectations and behaviors that establishes the new institution within society at large. However, it can take a very long time for this to happen even when the new way of doing things is superior to the status quo.

One reason for this inertia is lack of information: It may not be immediately evident that an innovation actually is superior to the status quo, due to the small number of prior instances and variability in their outcomes. Thus, it may take a long time for enough information to accumulate before it becomes clear that the innovation is superior (2). A second reason is that an innovation as initially conceived may not work very well in practice; it must be refined over time through a process of learning by doing (3). A third reason is that social innovations often exhibit increasing returns. Indeed this property is characteristic of social innovations, because they require coordinated change in expectations and behaviors by multiple individuals. For example, an individual who invents a new form of legal contract cannot simply institute it on his own: First, the other parties to the contract must enter into it, and second, the ability to enforce the contract will depend on its acceptance in society more generally. Institutions exhibit strongly increasing returns precisely because of their function as coordination technologies. This property is of course not limited to social innovations; it is also a feature of technological innovations with positive network externalities (4–6), and the results in this paper apply to these situations as well.

The increasing returns feature of social innovation has important implications for the dynamics governing institutional change. Of particular relevance is the social network that governs people’s interactions. The reason is that, when a social innovation first appears, it will typically gain a foothold in a relatively small subgroup of individuals that are closely linked by geography or social connections. Once the new way of doing things has become firmly established within a local social group, it propagates to the rest of society through the social network. Thus, a key determinant of the speed with which institutional change occurs is the network topology and in particular the extent to which interactions are “localized”.

The relationship between speed of diffusion and network structure has been investigated in a variety of settings. One branch of the literature is concerned with dynamics in which each agent responds deterministically to the number of his neighbors who have adopted the innovation. From an initial seeding among a subset of agents, the question is how long it takes for the innovation to spread as a function of network structure (7–11). A second branch of the literature explores dynamics in which agents respond to their neighbors’ choices probabilistically; that is, actions with higher expected payoffs are more likely to be chosen than actions with low expected payoffs. Here the question is how long it takes in expectation for the innovation to spread in the population at large (12–15). The rate of spread hinges on topological properties of the network that are different from those that determine the speed of convergence in the nonstochastic case. A third branch of the literature examines the issue of how the network structure itself evolves as agents are matched and rematched to play a given game (16–19).

In this paper we assume that agents adjust their behavior probabilistically and that the network structure is fixed. In contrast to earlier work in this vein, however, we emphasize that it is not only the network topology that determines how fast an innovation spreads. Rather, the speed depends on the interaction between three complementary factors: i) the payoff gain represented by the innovation in relation to the status quo; ii) the responsiveness or rationality of the agents, that is, the probability that they choose a (myopic) best response given their information; and iii) the presence of small autonomous enclaves where the innovation can gain an initial foothold, combined with pathways that allow the innovation to spread by contagion once it has gained a sufficient number of distinct footholds.

We also use a somewhat subtle notion of what it means for an innovation to “spread quickly”. Namely, we ask how long it takes in expectation for a high proportion of agents to adopt the innovation and stick with it with high probability. This latter condition is needed because, in a stochastic learning process, it is quite possible for an innovation to spread initially, but then go into reverse and die out. We wish to know how long it takes for an innovation to reach the stage where it is in widespread use and continues to be widely used with high probability thereafter.

The main results of this paper can be summarized as follows. First we distinguish between fast and slow rates of diffusion. Roughly speaking, an innovation spreads quickly in a given class of networks if the expected waiting time to reach a high level of penetration and stay at that level with high probability is bounded above independently of the number of agents; otherwise the innovation spreads slowly. Whether it spreads quickly or slowly depends on the particular learning rule used, the degree of rationality of the agents, the gain in payoff from the innovation, and the structure of the network. In the interest of keeping the model simple and transparent, we restrict our attention to situations that can be represented by a two-person coordination game with two actions—the status quo and the innovation. Although this assumption is somewhat restrictive, many coordination mechanisms are essentially dyadic, such as principal-agent contracts, the division of property rights between married couples, and rules governing bilateral trade. We also focus mainly on log-linear response processes by individuals, because this class of rules is particularly easy to work with analytically. Among the key findings are the following:

1. If the agents’ level of rationality is too low, the waiting time to spread successfully is very long (in fact it may be infinite), because there is too much noise in the system for a substantial proportion of the agents to stay coordinated on the innovation once it has spread initially. However, if the level of rationality is too high, it takes an exponentially long time in expectation for the innovation to gain a foothold anywhere. (Here “rationality” refers to the probability of choosing a myopic best response to the actions of one’s neighbors. Under some circumstances a forward-looking rational agent might try to induce a change in his neighbors’ behavior by adopting first, a situation that is investigated in ref. 20.) Hence only for intermediate levels of rationality can one expect the waiting time to be fast in an absolute sense and to be bounded independently of the number of agents. In this respect the analysis differs substantially from that in Montanari and Saberi (15).

2. Certain topological characteristics of the network promote fast learning. In particular, if the agents fall into small clusters or enclaves that are mainly connected with each other as opposed to outsiders, then learning will be fast, assuming that the level of rationality is neither too low nor too high. One reason why clustering may occur in practice is homophily—the tendency of like to associate with like (9, 21). Moreover, recent empirical work suggests that some behaviors actually do spread more quickly in clustered networks than in random ones (22). However, not everyone has to be contained in a small cluster for learning to be fast: It suffices that the innovation be able to gain a foothold in a subset of clusters from which it can spread by contagion.

3. For convergence to be fast, it is not necessary for the agents to be contained in enclaves that are small in an absolute sense; it suffices that everyone be contained in a subgroup of bounded (possibly large) size that has a sufficiently high proportion of its interactions with other members of the group as opposed to outsiders. Various natural networks have this property, including those in which agents are embedded more or less uniformly in a finite Euclidean space and are neighbors if and only if they are within some specified distance of one another. (This result follows from Proposition 2 below; a similar result is proved in ref. 15.)

4. The payoff gain from the innovation relative to the status quo has an important bearing on the absolute speed with which it spreads. We call this gain the advance in welfare represented by the innovation. If the advance is sufficiently large, no special topological properties of the network are required for fast learning: It suffices that the maximum degree is bounded.

5. An innovation that leads to a small advance will tend to take much longer to spread than an innovation with a large advance. A consequence is that successful innovations will tend to occur in big bursts, because a major advance will tend to overtake prior innovations that represent only minor advances.

Model

Let Γ be a graph with vertex set V and edge set E, where the edges are assumed to be undirected. Thus, E is a collection of unordered pairs of vertices {i, j}, where i ≠ j. Assume that there are n vertices or nodes. Each edge {i, j} has a weight \(w_{ij} = w_{ji} > 0\) that we interpret as a measure of the mutual influence that i and j have on one another. For example, \(w_{ij}\) may increase the closer that i and j are to each other geographically. Because we can always assume that \(w_{ij} = w_{ji} = 0\) whenever {i, j} is not an edge, the graph Γ is completely specified by a set V consisting of n vertices and a symmetric nonnegative n × n matrix of weights \(W = (w_{ij})\), where \(w_{ii} = 0\) for all i.

Assume that each agent has two available choices, A and B. We think of B as the status quo behavior and A as the innovative behavior. The state of the evolutionary process at time t is a vector \(x^t \in \{A, B\}^n\), where \(x_i^t\) is i’s choice at time t. Let G be a symmetric two-person game with payoff function u(x, y), which is the payoff to the player who chooses x against an opponent who chooses y. We assume that u(x, x) > u(y, x) and u(y, y) > u(x, y); that is, G is a coordination game. The game G induces an n-person network game on Γ as follows: The payoff to individual i in any given period results from playing the game against each of his neighbors once, where i’s current strategy \(x_i^t\) is unconditional on which neighbor he plays. The payoff from any given match is weighted according to the influence of that neighbor. For each agent i, let \(N_i\) denote the set of i’s neighbors, that is, the set of all vertices j such that {i, j} is an edge. Thus, the payoff to i in state x is

\[ U_i(x) = \sum_{j \in N_i} w_{ij}\, u(x_i, x_j). \qquad [1] \]

To give this a concrete interpretation, suppose that B is a form of contractual negotiation that relies solely on a verbal understanding and a handshake, whereas A requires a written agreement and a witness to the parties’ signatures. If one side insists on a written agreement (A) whereas the other side views a verbal understanding as appropriate (B), they will fail to coordinate because they disagree on the basic rules of the game. Note that this metagame (agreeing on the rules of the game) can be viewed as a pure coordination game with zero off-diagonal payoffs: There is no inherent value in adopting A or B unless someone else also adopts it. This situation is shown below, where α is the added benefit from the superior action:

\[
\begin{array}{c|cc}
 & A & B \\ \hline
A & 1+\alpha,\ 1+\alpha & 0,\ 0 \\
B & 0,\ 0 & 1,\ 1
\end{array} \qquad [2]
\]

[In the case of a symmetric 2 × 2 coordination game with nonzero off-diagonal payoffs, the adoption dynamics are qualitatively similar, but they tend to select the risk-dominant equilibrium, which may differ from the Pareto-dominant equilibrium (13, 23, 24).]

Learning

How do agents adapt their expectations and behaviors in such an environment? Several models of the learning process have been proposed in the literature; here we discuss two particular examples. Assume that each agent updates at random times according to the realization of a Poisson process with unit expectation. The processes are assumed to be independent among agents. Thus, the probability is zero that two agents update simultaneously, and each agent updates once per unit time period in expectation. We think of each update by some agent as defining a “tick” of the clock (a period), and the periods are denoted by t = 1, 2, 3, ... .

Denote the state of the process at the start of period t by \(x^{t-1}\). Let ε > 0 be a small error probability. Suppose that agent i is the unique agent who updates at time t. Assume that, with probability 1 − ε, i chooses an action that maximizes his expected payoff, and with probability ε he chooses an action uniformly at random. This process is the uniform error model (12, 23, 24).

An alternative approach is the log-linear model (13). Given a real number β ≥ 0 and two actions A and B, suppose that agent i chooses A with probability

\[ \frac{e^{\beta U_i(A,\, x_{-i}^{t-1})}}{e^{\beta U_i(A,\, x_{-i}^{t-1})} + e^{\beta U_i(B,\, x_{-i}^{t-1})}}. \qquad [3] \]

In other words, the log probability of choosing A minus the log probability of choosing B is β times the difference in payoff; hence the term “log-linear learning”. [This model is also standard in the discrete choice literature (25).] The parameter β measures the responsiveness or rationality of the agent: the larger β is, the more likely it is that he chooses a best reply given the actions of his neighbors, i.e., the less likely it is that he experiments with a suboptimal action.

More generally, we may define a perturbed best response learning process as follows. Let G be a 2 × 2 game with payoff matrix as in Eq. 2, and let Γ be a weighted undirected network with n nodes, each occupied by an agent. For simplicity we continue to assume that agents update asynchronously according to i.i.d. Poisson arrival processes. Let ε > 0 be a noise parameter, and let \(q_i^{\varepsilon}(x^{t-1})\) be the probability that agent i chooses action A when i is chosen to update and the state is \(x^{t-1}\). We assume that i) \(q_i^{\varepsilon}(x^{t-1})\) depends only on the choices of i’s neighbors; ii) \(q_i^{0}(x^{t-1}) = 1\) if and only if A is a strict best response to the choices of i’s neighbors; iii) \(q_i^{\varepsilon}(x^{t-1})\) does not decrease when one or more of i’s neighbors switches to A; and iv) \(q_i^{\varepsilon}(x^{t-1}) \to q_i^{0}(x^{t-1})\) in an exponentially smooth manner; that is, there is a real number r ≥ 0 such that \(0 < \lim_{\varepsilon \to 0} q_i^{\varepsilon}(x^{t-1})/\varepsilon^{r} < \infty\). (In the case of log-linear learning these conditions are satisfied if we take \(\varepsilon = e^{-\beta}\).) This process defines a Markov chain on the finite set of states \(\{A, B\}^n\), whose probability transition matrix is denoted by \(P^{\varepsilon}\).

Speed of Diffusion

Let \(\vec{B}\) be the initial state in which everyone is following the status quo behavior B. How long does it take for the innovative behavior A to become widely adopted? One criterion would be the expected waiting time until the first time that everyone plays A. Unfortunately this definition is not satisfactory due to the stochastic nature of the learning process. To understand the nature of the problem, consider a situation in which β is close to zero, so that the probability of playing A is only slightly larger than the probability of playing B. No matter how long we wait, the probability is high that a sizable proportion of the population will be playing B at any given future time. Thus, the expected waiting time until everyone first plays A is not the relevant concept. A similar difficulty arises for any stochastic learning process: If the noise is too large, it is unlikely that everyone is playing A in any given period.

We are therefore led to the following definition. Given a stochastic learning process \(P^{\varepsilon}\) on a graph Γ (not necessarily log-linear learning), for each state x let a(x) denote the proportion of agents playing A in state x. Given a target level of penetration 0 < p < 1, define

\[ T(P^{\varepsilon}, \Gamma, \alpha, p) = E\big[\min\{t : a(x^t) \ge p \ \text{and}\ \forall\, t' \ge t,\ P(a(x^{t'}) \ge p) \ge p\}\big]. \qquad [4] \]

In words, \(T(P^{\varepsilon}, \Gamma, \alpha, p)\) is the expected waiting time until at least p of the agents are playing A, and the probability is at least p that at least this proportion plays A at all subsequent times. The waiting time depends on the learning process \(P^{\varepsilon}\) (including the specific level of noise ε), as well as on the graph Γ and the size of the advance (the payoff gain) α. Note that the higher the value of p is, the smaller the noise must be for the waiting time to be finite.

To distinguish between fast and slow learning as a function of the number of agents, we consider families of networks of different sizes, where the size of a network is the number of nodes (equivalently, the number of agents).

Fast Versus Slow Learning. Given a family of networks G and a size of advance α > 0 (the payoff gain from the innovation), learning is fast for G and α if, for every p < 1 there exists \(\varepsilon_p > 0\) such that for each \(\varepsilon \in (0, \varepsilon_p)\),

\[ T(P^{\varepsilon}, \Gamma, \alpha, p) \ \text{is bounded above for all}\ \Gamma \in \mathcal{G}. \qquad [5] \]

Otherwise learning is slow; that is, for every ε > 0 there is an infinite sequence of graphs \(\Gamma_1, \Gamma_2, \ldots, \Gamma_n, \ldots \in \mathcal{G}\) such that \(\lim_{n \to \infty} T(P^{\varepsilon}, \Gamma_n, \alpha, p) = \infty\).

Autonomy

In this section we describe a general condition on families of networks that guarantees fast learning. Fix a network Γ = (V, W), a learning process \(P^{\varepsilon}\), and an advance α > 0. Given a subset of vertices S ⊆ V, define the restricted learning process \(\tilde{P}_S^{\varepsilon}\) as follows: All nodes i ∉ S are held fixed at B whereas the nodes in S update according to the process \(P^{\varepsilon}\). Let \((\vec{A}_S, \vec{B}_{V-S})\) denote the state in which every member of S plays A and every member of V − S plays B.

The set S is autonomous for \((P^{\varepsilon}, \Gamma, \alpha)\) if and only if \((\vec{A}_S, \vec{B}_{V-S})\) is strictly stochastically stable for the restricted process \(\tilde{P}_S^{\varepsilon}\). [A state is strictly stochastically stable if it has probability one in the limit as the noise goes to zero (24, 26, 27).]

Proposition 1. Given a learning process \(P^{\varepsilon}\), a family of networks G, and an innovation with advance α > 0, suppose that there exists a positive integer s such that for every Γ ∈ G, every node of Γ is contained in a subset of size at most s that is autonomous for \((P^{\varepsilon}, \Gamma, \alpha)\). Then learning is fast.

Concretely this means that given any target level of penetration p < 1, there is an upper bound on the noise, \(\varepsilon_{p,\alpha}\), such that for any given ε in the range \(0 < \varepsilon < \varepsilon_{p,\alpha}\), the expected waiting time until at least p of the agents play A (and continue to do so with probability at least p in each subsequent period) is bounded above independently of the number of agents n in the network.
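To make the definitions above concrete, the following sketch runs asynchronous log-linear updating (Eq. 3, with the payoffs of Eq. 2) on a ring of n agents and reports the first update at which a fraction p of them play A. This is illustrative code of my own, not the paper's: it counts single updates as periods and checks only first passage to the penetration level, not the persistence clause in the waiting time of Eq. 4, so it is a rough empirical proxy for \(T(P^{\varepsilon}, \Gamma, \alpha, p)\).

```python
import math
import random

# Illustrative sketch, not the paper's simulation code: asynchronous
# log-linear updating (Eq. 3) on a ring, reporting the first update at which
# a fraction p of the agents play A, starting from the all-B state.

def ring(n):
    """Circle network with unit edge weights: W[i][j] = w_ij."""
    return {i: {(i - 1) % n: 1.0, (i + 1) % n: 1.0} for i in range(n)}

def log_linear_step(state, W, alpha, beta, rng):
    """One asynchronous update: a random agent revises via Eq. 3."""
    i = rng.randrange(len(state))
    # Payoffs from Eq. 2: coordinating on A pays 1 + alpha, on B pays 1,
    # miscoordination pays 0.
    ua = (1.0 + alpha) * sum(w for j, w in W[i].items() if state[j] == "A")
    ub = sum(w for j, w in W[i].items() if state[j] == "B")
    p_a = math.exp(beta * ua) / (math.exp(beta * ua) + math.exp(beta * ub))
    state[i] = "A" if rng.random() < p_a else "B"

def first_passage(n, alpha, beta, p=0.99, cap=200_000, seed=0):
    """Updates until at least p of the agents play A (None if never)."""
    rng = random.Random(seed)
    W, state = ring(n), ["B"] * n
    for t in range(1, cap + 1):
        log_linear_step(state, W, alpha, beta, rng)
        if state.count("A") >= p * n:
            return t
    return None  # target not reached within the cap
```

With a large advance (e.g. α = 5, β = 1) the target is typically reached quickly; pushing β very high makes the all-B state hard to escape, in line with finding 1 above.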

This result differs in a key respect from those in Montanari and Saberi (15), who show that for log-linear learning the expected waiting time to reach all-A with high probability is bounded as a function of n provided that the noise level is arbitrarily small. Unfortunately this result implies that the absolute waiting time to reach all-A is arbitrarily large, because the waiting time until any given agent first switches to A grows exponentially as the noise level goes to zero.

The proof of Proposition 1 runs as follows. For each agent i let \(S_i\) be an autonomous set containing i such that \(|S_i| \le s\). By assumption the state \((\vec{A}_{S_i}, \vec{B}_{V-S_i})\) in which all members of \(S_i\) play A is stochastically stable. Given a target level of penetration p < 1, we can choose ε so small that the probability of being in this state after some finite time \(t \ge T_{S_i}\) is at least \(1 - (1-p)^2\). Because this result holds for the restricted process, and the probability that i chooses A does not decrease when someone else switches to A, it follows that in the unrestricted process \(P(x_i^t = A) \ge 1 - (1-p)^2\) for all \(t \ge T_{S_i}\). (This result uses a coupling argument similar to that in ref. 27.) Because \(|S_i| \le s\) for all i, we can choose ε and T so that \(P(x_i^t = A) \ge 1 - (1-p)^2\) for all i and all t ≥ T. It follows that the expected proportion of agents playing A is at least \(1 - (1-p)^2\) at all times t ≥ T.

Now suppose, by way of contradiction, that the probability is less than p that at least p of the agents are playing A at some time t ≥ T. Then the probability is greater than 1 − p that at least 1 − p of the agents are playing B. Hence the expected proportion playing A is less than \(1 - (1-p)^2\), which is a contradiction.

Autonomy and Close-Knittedness

Autonomy is defined in terms of a given learning process, but the underlying idea is topological: If agents are contained in small subgroups that interact to a large degree with each other as opposed to outsiders, then autonomy will hold under a variety of learning rules. The precise formulation of this topological condition depends on the particular learning rule in question; here we examine the situation for log-linear learning.

The degree of vertex i is \(d_i = \sum_{j \in V} w_{ij}\).

Let Γ = (V, W) be a graph and α > 0 the size of the advance. For every subset of vertices S ⊆ V let

\[ d(S) = \sum_{i \in S} d_i. \qquad [6] \]

Further, for every nonempty subset S′ ⊆ S let

\[ d(S', S) = \sum_{\{i,j\}:\, i \in S',\, j \in S} w_{ij}. \qquad [7] \]

Thus, d(S) is the sum of the degrees of the vertices in S, and d(S′, S) is the weighted sum of edges with one end in S′ and the other end in S. Given any real number r ∈ (0, 1/2), we say that the set S is r-close-knit if

\[ \forall\, S' \subseteq S,\ S' \neq \emptyset:\quad d(S', S)/d(S') \ge r. \qquad [8] \]

S is r-close-knit if no subset has “too many” (i.e., more than 1 − r) interactions with outsiders. This condition implies in particular that no individual i ∈ S has more than 1 − r of its interactions with outsiders, a property known as r-cohesiveness (8).

One consequence of close-knittedness is that the “perimeter” of S must not be too large relative to its “area”. Specifically, let us define the perimeter and area of any nonempty set of vertices S ⊆ V as follows:

\[ \mathrm{peri}(S) = d(S, V-S), \qquad \mathrm{area}(S) = d(S, S). \qquad [9] \]

Next observe that d(S) = 2 area(S) + peri(S). It follows from Eq. 8 with S′ = S that

\[ \mathrm{peri}(S)/\mathrm{area}(S) \le (1/r) - 2. \qquad [10] \]

If S is autonomous under log-linear learning, then by definition the potential function of the restricted process is maximized when everyone in S chooses A. Straightforward computations show that this result implies the following.

Proposition 2. Given a graph Γ and innovation advance α > 0, S is autonomous for α under log-linear learning if and only if S is r-close-knit for some r > 1/(α + 2).

Corollary 2.1. If S is autonomous for α under log-linear learning, then peri(S)/area(S) < α.

A family of graphs G is close-knit if for every r ∈ (0, 1/2) there exists a positive integer s(r) such that, for every Γ ∈ G, every node of Γ is in an r-close-knit set of cardinality at most s(r) (27).

Corollary 2.2. Given any close-knit family of graphs G, log-linear learning is fast for all α > 0.

Examples

Consider n nodes located around a circle, where each node is linked by an edge to its two immediate neighbors and the edge weights are one. [This is the situation originally studied by Ellison (12).] Any set of s consecutive nodes is (1/2 − 1/2s)-close-knit. It follows that for any α > 0, log-linear learning is fast.

Next consider a 2D regular lattice (a square grid) in which every vertex has degree 4 (Fig. 1). Assume that each edge has weight 1. The shaded region in Fig. 1 is a subset of nine nodes that is 1/3-close-knit. Hence it is autonomous whenever α > 1. More generally, observe that any square S of side m has 2m(m − 1) internal edges and \(m^2\) vertices, each of degree 4; hence

\[ d(S, S)/d(S) = \frac{2m(m-1)}{4m^2} = \frac{1}{2} - \frac{1}{2m}. \qquad [11] \]

Furthermore it is easily checked that for every nonempty subset S′ ⊆ S,

\[ d(S', S)/d(S') \ge \frac{1}{2} - \frac{1}{2m}. \qquad [12] \]

Therefore, every square of side m is (1/2 − 1/2m)-close-knit and hence is autonomous for all α > 2/(m − 1). A similar argument holds for any d-dimensional regular lattice: Given any α > 0, every sufficiently large sublattice is autonomous for α, and this holds independently of the number of vertices in the full lattice.

Fig. 1. A 2D lattice with a subset of nine vertices (circled) that is autonomous for any α > 1 under log-linear learning.
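The close-knit condition of Eq. 8 can be checked directly for small sets by enumerating subsets. The sketch below does exactly that (my own illustrative code, exponential in |S|, so suitable only for small S) and confirms the circle example: s consecutive nodes on a ring are (1/2 − 1/2s)-close-knit.

```python
from itertools import combinations

# Brute-force check of Eq. 8: S is r-close-knit if every nonempty subset
# S' of S satisfies d(S', S)/d(S') >= r. Enumerates all 2^|S| subsets,
# so this is a sketch for small S only; all names are illustrative.

def degree(i, W):
    """d_i = sum_j w_ij."""
    return sum(W[i].values())

def d_between(Sp, S, W):
    """d(S', S) of Eq. 7: total weight of edges with one end in S' and the
    other in S, each unordered edge counted once (so d(S, S) is the 'area'
    of Eq. 9)."""
    Sp, S = set(Sp), set(S)
    total, seen = 0.0, set()
    for i in W:
        for j, w in W[i].items():
            e = frozenset((i, j))
            if e not in seen:
                seen.add(e)
                if (i in Sp and j in S) or (j in Sp and i in S):
                    total += w
    return total

def is_r_close_knit(S, W, r):
    """Eq. 8, written as d(S', S) >= r * d(S') to avoid division."""
    S = list(S)
    return all(
        d_between(Sp, S, W) >= r * sum(degree(i, W) for i in Sp)
        for k in range(1, len(S) + 1)
        for Sp in combinations(S, k)
    )

# Circle of n nodes with unit weights, as in the Ellison example: three
# consecutive nodes should be (1/2 - 1/6)-close-knit, i.e. 1/3-close-knit.
n = 6
W = {i: {(i - 1) % n: 1.0, (i + 1) % n: 1.0} for i in range(n)}
print(is_r_close_knit({0, 1, 2}, W, 1/2 - 1/6))  # True
```

The binding subset is S′ = S itself, whose internal-to-total ratio is exactly 1/2 − 1/2s; any strictly larger r fails.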

Note that in these examples fast learning does not arise because neighbors of neighbors tend to be neighbors of one another. In fact, a d-dimensional lattice has the property that none of the neighbors of a given agent are themselves neighbors. Rather, fast learning arises from a basic fact of Euclidean geometry: The ratio of “surface” to “volume” of a d-dimensional cube goes to zero as the cube becomes arbitrarily large.

A d-dimensional lattice illustrates the concept of autonomy in a very transparent way, but it applies in many other situations as well. Indeed one could argue that many real-world networks are composed of relatively small autonomous groups, because people tend to cluster geographically, or because they tend to interact with people of their own kind (homophily), or for both reasons.

To understand the difference between a network with small autonomous groups and one without, consider the pair of networks in Fig. 2. Fig. 2, Upper shows a tree in which every node other than the end nodes has degree 4, and there is a “hub” (not shown) that is connected to all of the end nodes. Fig. 2, Lower shows a graph with a similar overall structure in which every node other than the hub has degree 4; however, in this case everyone (except the hub) is contained in a clique of size 4. In both networks all edges are assumed to have weight 1.

Suppose that we begin in the all-B state in both networks, that agents use log-linear learning with β = 1, and that the size of the advance is α > 2/3. Let each network have n vertices. It can be shown that the waiting time to reach at least 99% A (and for this to hold in each subsequent period with probability at least 99%) is unbounded in n for the network on the top, whereas it is bounded independently of n for the network on the bottom. In the latter case, simulations show that it takes <25 periods (on average) for A to penetrate to the 99% level independently of n. The key difference between the two situations is that, in the network with cliques, the innovation can establish a toehold in the cliques relatively quickly, which then causes the hub to switch to the innovation also.

Note, however, that fast learning in the network with cliques does not follow from Proposition 2, because not every node is contained in a clique. In particular, the hub is connected to all of the leaves, the number of which grows with the size of the tree, so it is not in an r-close-knit set of bounded size for any given r < 1/2. Nevertheless learning is fast: Any given clique adopts A with high probability in bounded time; hence a sizable proportion of the cliques linked to the hub switch to A in bounded time, and then the hub switches to A also. In other words, fast learning occurs through a combination of autonomy and contagion, a topic that we explore in the next section.

Autonomy and Contagion

Contagion expresses the idea that once an innovation has become established for some group, it spreads throughout the network via the best reply dynamic. Morris (8) was the first to study the properties of contagion in the setting of local interaction games and to formulate graph-theoretic conditions under which contagion causes the innovation to spread throughout the network (for more recent work along these lines see refs. 9–11). Whereas contagion by itself may not guarantee fast learning in a stochastic environment, a combination of autonomy and contagion does suffice. The idea is that autonomy allows the innovation to gain a foothold quickly on certain key subsets of the network, after which contagion completes the job.

Consider a subset S of nodes, all of which are playing A, and choose some i ∉ S. Let α be the size of the advance of A relative to B. Then A is a strict best response by i provided that

\[ (1+\alpha) \sum_{j \in S} w_{ij} > \sum_{j \notin S} w_{ij}. \qquad [13] \]

Letting r = 1/(α + 2), we can write this as follows:

\[ \sum_{j \in S} w_{ij} > r \sum_{j \in V} w_{ij} = r\, d_i. \qquad [14] \]
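The algebra behind this rewriting (add \(\sum_{j \in S} w_{ij}\) to both sides of Eq. 13 and divide by α + 2) can be double-checked numerically. The sketch below, with names of my own choosing, confirms on a small ring that the best-response test of Eq. 13 and the threshold test of Eq. 14 agree:

```python
# Numerical check that Eq. 13 and Eq. 14 are the same condition: with
# r = 1/(alpha + 2), A is a strict best response for i against an A-playing
# set S iff d(i, S) > r * d_i. All names are illustrative.

def best_response_is_A(i, S, W, alpha):
    """Eq. 13: payoff from A against S exceeds payoff from B against V - S."""
    gain_a = (1.0 + alpha) * sum(w for j, w in W[i].items() if j in S)
    gain_b = sum(w for j, w in W[i].items() if j not in S)
    return gain_a > gain_b

def orbit_condition(i, S, W, alpha):
    """Eq. 14: d(i, S) > r * d_i with r = 1/(alpha + 2)."""
    r = 1.0 / (alpha + 2.0)
    d_i_S = sum(w for j, w in W[i].items() if j in S)
    return d_i_S > r * sum(W[i].values())

# A ring of 5 nodes with unit weights; seed set S = {0, 1}, alpha = 1.
n = 5
W = {i: {(i - 1) % n: 1.0, (i + 1) % n: 1.0} for i in range(n)}
S = {0, 1}
print(all(best_response_is_A(i, S, W, 1.0) == orbit_condition(i, S, W, 1.0)
          for i in range(n) if i not in S))  # True
```

(On knife-edge cases where the two sides of Eq. 13 are exactly equal, floating-point rounding could in principle make the two tests disagree; the comparison is exact for the integer-weight examples used here.)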

Recall that for any vertex i and subset of vertices S, \(d(i, S) = \sum_{j \in S} w_{ij}\) is the total weight on the edges linking i to a member of S. Given a graph Γ = (V, W), a real number r ∈ (0, 1/2), and a subset of vertices S, define the first r-orbit of S as follows:

\[ O_r^1(S) = S \cup \{\, i \notin S : d(i, S) > r\, d_i \,\}. \qquad [15] \]

Similarly, for each integer k > 1 recursively define the kth r-orbit by

\[ O_r^k(S) = O_r^{k-1}(S) \cup \{\, i \notin O_r^{k-1}(S) : d(i, O_r^{k-1}(S)) > r\, d_i \,\}. \qquad [16] \]

Now suppose that r = 1/(α + 2). Suppose also that everyone in the set S is playing A. If the learning process is deterministic best response dynamics, and if everyone updates once per time period, then after k periods everyone in the kth r-orbit of S will be playing A. Of course, this argument does not show that log-linear learning with asynchronous updating will produce the same result. The key difficulty is that the core set S of A-players might unravel before contagion converts the other players to A. However, this problem can be avoided if i) the core set S reaches the all-A state within a bounded period and ii) S is autonomous and hence its members continue to play A with high probability. These conditions are satisfied if the core set is the union of autonomous sets of bounded size. We therefore have the following result.

Fig. 2. Two networks of degree 4, except for a hub (not shown) that is connected to every end node (dashed lines). All edge weights equal 1.

Proposition 3. Let G be a family of graphs. Suppose that there exist positive integers s, k and a real number r ∈ (0, 1/2) such that, for

Young PNAS Early Edition | 5 of 7

every Γ = (V, W) ∈ G, there is a subset of vertices S such that i) S is the union of r-close–knit sets of size at most s and ii) O_r^k(S) = V. Then log-linear learning is fast on G whenever α > (1/r) − 2.

We illustrate with a simple example. Let the network consist of a circle of n agents (the rim) plus a central agent (the hub). Each agent on the rim is adjacent to the hub and to its two immediate neighbors on the rim. Note that the hub is not contained in an r-close–knit set of bounded size for any r < 1/2. However, for every r < 1/2, the hub is in the first r-orbit of the rim. Moreover, for every r < 1/3, there is an r-close–knit set of bounded size that consists of rim nodes; namely, choose any sequence of k adjacent rim nodes where k > 1/(1 − 3r). It follows from Proposition 3 that learning is fast for this family of graphs whenever α > 1.

Proposition 3 shows that for some families of networks, log-linear learning is fast provided that α is large enough, where the meaning of "large enough" depends on the network topology. When the degrees of the vertices are bounded above, however, there is a simple lower bound on α that guarantees fast learning independently of the particular topology.

Proposition 4. Let G be a family of graphs with no isolated vertices and bounded degree d ≥ 3. If α > d − 2, then log-linear learning is fast.

Proof sketch. Given α > d − 2, let k be the smallest integer that is strictly larger than (α + 2)/(α + 2 − d). I claim that every connected set S of size k is autonomous. By Proposition 2 it suffices to show that any such set S is r-close–knit for some r > 1/(α + 2). In particular it suffices to show that

\[
\forall\, S' \subseteq S,\; S' \neq \emptyset: \quad \frac{d(S', S)}{d(S')} = \frac{d(S', S)}{\sum_{i \in S'} d_i} \; > \; 1/(\alpha + 2). \tag{17}
\]

Since S is connected, it contains a spanning tree, hence d(S, S) ≥ |S| − 1. Furthermore, d_i ≤ d for all i. It follows that

\[
\frac{d(S, S)}{d(S)} \;\geq\; \frac{|S| - 1}{d\,|S|} \; > \; 1/(\alpha + 2), \tag{18}
\]

where the strict inequality follows from the assumption that |S| = k > (α + 2)/(α + 2 − d). This verifies eq. 17 for the case S′ = S. It is straightforward to show that eq. 17 also holds for all proper subsets S′ of S. We have therefore established that every connected set of size k is autonomous. Clearly every connected component of the network with size less than k is also autonomous. Therefore every vertex is contained in an autonomous set of size at most k. It follows that log-linear learning is fast on all networks in G when α > d − 2.

Fig. 4. Expected number of periods to reach 99% playing A as a function of the size of advance α when β = 1, estimated from 10,000 simulations starting from the all-B state.

Fast learning says that the waiting time is bounded for a given family of networks, but it does not specify the size of the bound concretely. Actual examples show that the waiting time can be surprisingly short in an absolute sense. Consider an innovation with advance α = 1, and suppose that all agents use log-linear learning with β = 1. Fig. 3 shows the expected waiting time to reach the 99% penetration level for two families of networks: circles where agents are adjacent to their nearest four neighbors and 2D lattices. (Thus, in both cases the networks are regular of degree 4.) The expected waiting time is less than 25 periods in both situations. In other words, almost everyone will be playing A after about 25 revision opportunities per individual.

Note that this waiting time is substantially shorter than the time it takes for a given individual to switch to A when his neighbors are playing B. Indeed, the probability of such a switch is e^0/(e^0 + e^{4β}) ≈ e^{−4} ≈ 0.018. Hence, in expectation, it takes about 1/0.018 ≈ 54 periods for any given agent to adopt A when none of his neighbors has adopted, yet it takes only about half as much time for nearly everyone to adopt. The reason, of course, is that the process is

Fig. 3. Simulated waiting times to reach 99% A starting from all B for circles (solid curve) and 2D lattices (dashed curve). Time periods represent the average number of updates per individual.

Fig. 5. The level of innovative advance α = f(r) required to achieve an average growth rate of r per period. Simulation results for log-linear learning with β = 1 on a 30 × 30 2D lattice are shown.
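Waiting times of the kind reported in Figs. 3 and 4 can be reproduced in outline by direct simulation. The following is a sketch, not the paper's code: the torus boundary, the update scheme (n² random revision draws per period), and the default parameter values are our illustrative assumptions.

```python
import math
import random

def simulate_lattice(n=20, alpha=1.0, beta=1.0, target=0.99,
                     max_periods=500, seed=0):
    """Asynchronous log-linear learning on an n x n torus (degree 4),
    starting from the all-B state.  Returns the number of periods
    (n*n revision draws each) until at least `target` of the agents
    play A, or None if that never happens within max_periods.
    Payoffs: coordinating on A pays 1 + alpha, on B pays 1."""
    rng = random.Random(seed)
    state = [[0] * n for _ in range(n)]          # 0 = B, 1 = A
    count_A = 0
    for period in range(1, max_periods + 1):
        for _ in range(n * n):
            i, j = rng.randrange(n), rng.randrange(n)
            a = (state[(i - 1) % n][j] + state[(i + 1) % n][j]
                 + state[i][(j - 1) % n] + state[i][(j + 1) % n])
            u_A = (1.0 + alpha) * a              # payoff of A vs. a A-neighbors
            u_B = 1.0 * (4 - a)                  # payoff of B vs. the rest
            p_A = 1.0 / (1.0 + math.exp(beta * (u_B - u_A)))
            new = 1 if rng.random() < p_A else 0
            count_A += new - state[i][j]
            state[i][j] = new
        if count_A >= target * n * n:
            return period
    return None
```

With α = 1 and β = 1 a run typically terminates within a few dozen periods, in line with the waiting times reported in the text; with small α the same sketch exhibits the much longer waits discussed in the next section.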

6 of 7 | www.pnas.org/cgi/doi/10.1073/pnas.1100973108 Young

speeded up by contagion. The rate at which the innovation spreads results from a combination of autonomy and contagion.

Bursts of Innovation

We have seen that the speed with which innovations spread in a social network depends crucially on the interaction between three features: the size of the advance α, the degree of rationality β, and the existence of autonomous groups that allow the innovation to gain a secure foothold. The greater the advance from the innovation relative to the status quo, the more rapidly it spreads for any given topology, and the more that people are clustered in small autonomous groups the more rapidly the innovation spreads for any given size of advance. Furthermore, the degree of rationality must be at an intermediate level for the rate of spread to be reasonably fast: If β is too high it will take a very long time before anyone even tries out the innovation, whereas if β is too low there will be so much random behavior that even approximate convergence will not take place.

In this section we examine how the size of the advance α affects the rate of diffusion for a given family of networks. The results suggest that social innovations tend to occur in large bursts rather than through small incremental improvements. The reason is that a small improvement takes a much longer time to gain an initial foothold than does an innovation that results in a major gain.

To illustrate this point concretely, let G be the family of 2D lattices where each agent has degree 4. Fig. 4 shows the simulated waiting time to reach the target p = 0.99 as a function of α. For small values of α the waiting time grows as a power of α and is many thousands of periods long, whereas for large values of α (e.g., α > 1) the waiting time is less than 20 periods.

To understand the implications of this relationship, suppose that each successive innovation leads to an advance of size α. Let T(α) be the waiting time for such an innovation to spread (for a given level of penetration p and level of rationality β). Assume that each new advance starts as soon as the previous one has reached the target p. Then the rate of advance per period is r(α), where (1 + r(α))^{T(α)} = 1 + α; that is,

\[
r(\alpha) = (1 + \alpha)^{1/T(\alpha)} - 1. \tag{19}
\]

The inverse function f(r) mapping r to α is shown in Fig. 5 for a 2D lattice and log-linear learning with β = 1. Fig. 5 shows that to achieve a 1% average growth rate per period requires innovative bursts of size at least 40%. A 3% growth rate requires innovative bursts of at least 80%, and so forth.

Of course, these numbers depend on the topological properties of the grid and on our assumption that agents update using a log-linear model. Different network structures and different learning rules may yield different results. Nevertheless, this example illustrates a general phenomenon that we conjecture holds across a range of situations. Institutional change that involves a series of small step-by-step advances may take a very long time to spread compared with a change of comparable total magnitude that occurs all at once. The basic reason is that it takes much longer for a small advance to gain a secure foothold in an autonomous group: The group must be quite large and/or it must be quite interconnected to prevent a small advance from unraveling. Furthermore, under a small advance there are fewer pathways through the network that allow contagion to complete the diffusion process. The key point is that social innovations are technologies that facilitate (and require) coordination with others to be successful. It is this feature that makes social change so difficult and that favors large advances when change does occur.

ACKNOWLEDGMENTS. I am indebted to Gabriel Kreindler for suggesting a number of improvements and for running the simulations reported here. The paper also benefited from comments by Tomas Rodriguez Barraquer and participants at the Sackler Colloquium held at the Beckman Institute, December 3–4, 2010. This research was supported by Grant FA 9550-09-1-0538 from the Air Force Office of Scientific Research.
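Eq. 19 converts a waiting time T(α) into an implied average growth rate per period. A small sketch of that conversion; the T value in the comment is illustrative rather than a measurement from the paper:

```python
def growth_rate(alpha, T):
    """Average rate of advance per period implied by eq. 19:
    (1 + r)^T = 1 + alpha, hence r = (1 + alpha)**(1/T) - 1."""
    return (1.0 + alpha) ** (1.0 / T) - 1.0

# e.g., an advance of alpha = 1 that takes T = 25 periods to spread
# compounds to roughly a 2.8% advance per period.
```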

1. North D (1990) Institutions, Institutional Change and Economic Performance (Cambridge Univ Press, Cambridge, UK).
2. Young HP (2009) Innovation diffusion in heterogeneous populations: Contagion, social influence, and social learning. Am Econ Rev 99:1899–1924.
3. Arrow KJ (1962) The economic implications of learning by doing. Rev Econ Stud 29:155–173.
4. Katz M, Shapiro C (1985) Network externalities, competition, and compatibility. Am Econ Rev 75:424–440.
5. David PA (1985) Clio and the economics of QWERTY. Am Econ Rev Papers Proc 75:332–337.
6. Arthur WB (1994) Increasing Returns and Path Dependence in the Economy (Univ of Michigan Press, Ann Arbor, MI).
7. Watts D (1999) Small Worlds: The Dynamics of Networks Between Order and Randomness (Princeton Univ Press, Princeton).
8. Morris S (2000) Contagion. Rev Econ Stud 67:57–78.
9. Golub B, Jackson MO (2008) How homophily affects the speed of learning and diffusion in networks. Stanford University arXiv:0811.4013v2 (physics.soc-ph).
10. Jackson MO, Yariv L (2007) The diffusion of behaviour and equilibrium structure properties on social networks. Am Econ Rev Papers Proc 97:92–98.
11. Centola D, Macy M (2007) Complex contagions and the weakness of long ties. Am J Sociol 113:702–734.
12. Ellison G (1993) Learning, local interaction, and coordination. Econometrica 61:1046–1071.
13. Blume LE (1995) The statistical mechanics of best-response strategy revision. Games Econ Behav 11:111–145.
14. Young HP (2006) The Economy as a Complex Evolving System III, eds Blume LE, Durlauf SN (Oxford Univ Press), pp 267–282.
15. Montanari A, Saberi A (2010) The spread of innovations in social networks. Proc Natl Acad Sci USA 107:20196–20201.
16. Skyrms B, Pemantle R (2000) A dynamic model of social network formation. Proc Natl Acad Sci USA 97:9340–9346.
17. Jackson MO, Watts A (2002) The evolution of social and economic networks. J Econ Theory 106:265–295.
18. Goyal S, Vega-Redondo F (2005) Learning, network formation, and coordination. Games Econ Behav 50:178–207.
19. Jackson MO (2008) Social and Economic Networks (Princeton Univ Press, Princeton).
20. Ellison G (1997) Learning from personal experience: One rational guy and the justification of myopia. Games Econ Behav 19:180–210.
21. Lazarsfeld PF, Merton RK (1954) Freedom and Control in Modern Society, ed Berger M (Van Nostrand, New York).
22. Centola D (2010) The spread of behavior in an online social network experiment. Science 329:1194–1197.
23. Kandori M, Mailath G, Rob R (1993) Learning, mutation, and long run equilibria in games. Econometrica 61:29–56.
24. Young HP (1993) The evolution of conventions. Econometrica 61:57–84.
25. McFadden D (1973) Frontiers in Econometrics, ed Zarembka P (Academic, New York), pp 105–142.
26. Foster DP, Young HP (1990) Stochastic evolutionary game dynamics. Theor Popul Biol 38:219–232.
27. Young HP (1998) Individual Strategy and Social Structure: An Evolutionary Theory of Institutions (Princeton Univ Press, Princeton).
