How network structure impacts socially reinforced diffusion?

by

Jad Sassine

M.S. Applicable Mathematics London School of Economics (2013)

SUBMITTED TO THE SLOAN SCHOOL OF MANAGEMENT IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF SCIENCE IN MANAGEMENT RESEARCH

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

MAY 2020

©2020 Massachusetts Institute of Technology. All rights reserved.

Signature of Author: ______

Department of Management April 17, 2020

Certified by: ______

Hazhir Rahmandad Associate Professor of System Dynamics Thesis Supervisor

Accepted by: ______

Catherine Tucker Sloan Distinguished Professor of Management Professor, Marketing Faculty Chair, MIT Sloan PhD Program

How network structure impacts socially reinforced diffusion? by

Jad Sassine

Submitted to the Sloan School of Management on April 17, 2020 in Partial Fulfillment of the Requirements for the Degree of Master of Science in Management Research

Abstract

Social scientists have long studied adoption choices that depend on the number of prior adopters. What is the effect of network structure on such adoption dynamics? The emerging consensus holds that when agents require a high reinforcement threshold for adoption, clustered networks are better conduits of social contagion than random ones. Using models with deterministic thresholds, this argument formalizes the idea that transmission will get ‘stuck’ should the number of neighboring adopters fall below a threshold. In this paper, we explore the effect of stochastic thresholds on the diffusion races between random and clustered networks. We show that even low probabilities of adoption upon a single contact would tilt the balance in favor of random networks, a tendency that is reinforced with the size of the network. Moreover, if repeated signals from the same adopter can reinforce a message, random networks are further promoted. However, we also show that clustered networks can still be preferred over random networks if adopters become ‘inactive’ (i.e. they stop sending messages) with high probability. These findings refocus our theoretical understanding of how network structure moderates social influence, and raise new questions on contagion phenomena that benefit from clustered networks.

Thesis Supervisor: Hazhir Rahmandad Title: Associate Professor of System Dynamics


Introduction

Social influence is at the heart of social sciences. From joining an online , to adopting a social norm, participating in a riot or buying a novel product, many important choices are strongly influenced by the existence and the number of other adopters (Karsai, Iniguez, Kaski, & Kertész, 2014; Scott, Konstantin, & Milan, 2016; Ugander, Backstrom, Marlow, & Kleinberg, 2012). These adoption dynamics have historically been modeled using behavioral thresholds, where a threshold represents the minimum number of social connections who should have signaled their adoption before the focal actor is convinced to adopt (Granovetter, 1978).

In understanding social influence, a central question of theoretical and practical relevance relates to the type of network that best facilitates diffusion: Would cohesive communities promote the adoption of new social norms? What about the impact of random boundary-spanning ties on the diffusion of infectious diseases, or social movements? In promoting new organizational practices, should intra-organizational social networks be seeded with cohesive clusters or cross-cutting relationships? Communication networks are rarely fully connected, and even neighboring adopters do not always broadcast their choices. Earlier studies highlighted the central role of random ties connecting disparate parts of social networks in promoting diffusion (Granovetter, 1973). However, threshold models identified an important value of repetition: unless located in a highly clustered network, high-threshold agents may not receive sufficient reinforcement to adopt, thus breaking the diffusion dynamics in a population of high-threshold individuals.

It is this intuition that led Morris (2000), Centola & Macy (2007), and Montanari & Saberi (2010) to argue that for an important class of diffusion dynamics clustered networks dominate random networks, i.e. they lead to more or faster diffusion, because they decrease the fraction of isolated agents. From a policy perspective, this insight has implied that maintaining social cohesion increases the speed of diffusion when adoption involves risk, complementarity or normative acceptance (Centola, 2018). The emerging consensus has thus separated social diffusion into “simple” and “complex” alternatives, where simple diffusion is enhanced by more random networks and the corresponding weak ties, but clustered networks engendering significant reinforcement are needed for enabling complex contagion.

The argument in favor of clustered networks relies on the idea that transmission will get ‘stuck’ should the number of neighboring adopters fall below the adoption threshold. This argument is elegant; however, it has only been formally shown using deterministic thresholds where adoption is impossible below the threshold reinforcement. Considering (the arguably more realistic) stochastic activation functions, where transmission likelihood is non-zero for any level of reinforcement but becomes significantly more likely at a given threshold, could change the calculus. The transmission would not be stuck indefinitely for any actor, unless additional mechanisms are invoked. As a result, a tradeoff emerges between the cost of being temporarily slowed down due to low reinforcement and the benefits of seeding the contagion in distant parts of a social network. The resulting tradeoff is intricately dependent on the state of diffusion. As the diffusion spreads in a random network, the frontier connecting susceptibles to adopters expands, creating exponential growth dynamics not observed in clustered networks. Moreover, later in the diffusion, susceptible nodes in a random network become increasingly likely to be connected to more than one adopter as the network saturates. Therefore, in the later stages random networks may actually facilitate complex contagion by exposing all susceptible nodes to a large level of reinforcement simultaneously. The implications of these mechanisms for the race between random and clustered networks have not been previously studied, and they may be of significant theoretical and practical relevance. If these mechanisms promote random networks significantly, we may need to revisit the dichotomy between simple and complex contagion, or seek other mechanisms that explain why clustered networks may promote diffusion under specific conditions.

We develop a general model allowing us to study this question under a variety of probability activation functions. We build the model on the principle that two people may behave differently, even if they have the same adoption threshold and reinforcement levels. In other words, the thresholds are stochastic and not deterministic. We then consider that repeated signals from the same source may accumulate and reinforce adoption. After all, repetition is a very powerful tool of persuasion and experimental evidence shows that agents generally overweight repeated information (Enke & Zimmermann, 2019). By comparing the behavior of this model on different network structures, we can identify the conditions under which clustered networks dominate random ones. Our analysis suggests those conditions are rather narrow and strongly depend both on the shape of the probability activation function and the size of the network. As prior theory had suggested, deterministic thresholds promote clustered networks, but this regularity breaks down as soon as realistically stochastic activation functions are considered. Moreover, the larger the network, the stronger the benefits of random networks for complex contagion. Finally, if repetition can engender adoption, then random networks are further strengthened against clustered diffusion. Overall, incorporating two behaviorally realistic features of social influence, stochastic thresholds and repetition, can significantly tilt the balance of diffusion speed in favor of random networks.

We also explore how the horserace between random and clustered networks is impacted by heterogeneity in adopters’ motivation to share information. In ‘The Strength of Weak Ties’, Granovetter (1973) had already postulated that “if the motivation to spread [a] rumor is dampened a bit on each wave of retelling […] bridges will not be crossed”. In other words, the spread will get ‘stuck’. We confirm this insight and show that clustered networks can still be preferred over random networks over some parts of the space as the probability that an adopter becomes ‘inactive’ (i.e. stops sending messages) increases.

Therefore, we show that the strength of clustered networks depends on two likelihoods – one for adoption upon a single contact, the other for adopters becoming silent. We end by discussing the policy implication of this insight, proposing that how the messages are sent may be as important as ‘what is to be diffused’.

Specifically, if the environment limits repeated interactions, then maintaining social cohesion may speed up diffusion. Otherwise, from common social networks to the so-called conversational firms (Turco, 2016), where repeated signals are common and low cost, the benefits of distant ties are exponential, increase with network size, and may well apply to ‘whatever is to be diffused’.


From deterministic to stochastic thresholds

Connecting socially distant actors promotes diffusion by enabling it to spread simultaneously in different parts of the network (Granovetter, 1973). Thus rewiring highly clustered networks to create such shortcuts increases the speed of diffusion (Watts & Strogatz, 1998). In Complex Contagion and the Strength of Weak ties, Centola & Macy (2007) showed that this simple intuition may not generalize to ‘whatever is to be diffused’. Specifically, they created a typology of contagion in terms of the distinct number of adopters required for transmission to occur. If that threshold is equal to one, then the contagion is simple and random networks dominate clustered ones because of their many shortcuts across different parts of the network.

Centola and Macy showed that in ‘complex’ contagion, where the threshold is higher than one, those shortcuts become less effective, even harmful. Because adoption is contingent upon multiple reinforcements, a single weak tie is an ineffective conduit. In fact, rewiring to create those distant connections may hurt complex diffusion by reducing the density of local reinforcement. This intuition has promoted the distinction between simple and complex contagion as fundamental to understanding diffusion processes. However, the basic insight was developed by comparing two extremes: where all that it takes for diffusion is a single contact (simple diffusion) vs. a requirement for at least two reinforcements from different adopters (complex diffusion). We know little about the intermediate cases where additional reinforcement adds value but a single contact, with a small probability, may be sufficient for adoption.

Arguably most real-world diffusion processes are better represented by such stochastic thresholds. In practice we expect agents to influence each other over repeated interactions, with each reinforcement incrementally increasing the probability of adoption. How should we think about the relative merits of clustered vs. random networks under such repeated interactions with stochastic adoption thresholds?

We first build intuition about relevant tradeoffs using a very simple example, and then extend the analysis to the more general case. Let us consider the impact of rewiring local links to create shortcuts on a ring lattice with four links per node (Figure 1). We show a random rewiring that replaces two local links (A-B and C-D) with the distant shortcuts (A-C and B-D). Such a rewiring increases the randomness of the graph while keeping the degree of each node constant at four. In the graph, black nodes are the adopters and white nodes are susceptible.

Figure 1: 1-dimensional lattice network (left) and with a single rewiring (right)

First consider the classic ‘simple’ and ‘complex’ contagion, where one and two adopter neighbors, respectively, are necessary (and sufficient) for adoption. In the local graph (left) the diffusion progresses locally in both clockwise and counter-clockwise directions. In simple contagion each frontier of contagion (the region where susceptible nodes and adopters come together) includes two susceptible nodes that adopt every period, for a speed of four nodes per period. That rate goes down to two nodes per period in complex contagion, where a single susceptible node is connected to two adopters on each of the two contagion frontiers. The rewiring (right graph) impacts simple and complex contagion differently. In simple contagion rewiring seeds the diffusion in another region of the network, opening two new frontiers and (while the different frontiers have not merged) doubling the diffusion speed. In contrast, rewiring hurts complex diffusion because, absent the A-B link, node B is only connected to one adopter, which stops the diffusion when a frontier reaches B. In this case rewiring cuts the diffusion speed by half. By developing this simple intuition Centola and Macy (2007) showed that random rewiring may have very different effects depending on whether one is concerned with simple or complex contagion.
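The per-period speeds quoted for the unrewired ring (four nodes under simple contagion, two under complex contagion) can be checked with a short simulation. This is a minimal sketch rather than the thesis code: the function names, the ring size of 100 and the three-node seed block are illustrative assumptions.

```python
def ring_neighbors(i, n):
    """Neighbors of node i on a ring lattice with four links per node."""
    return [(i + d) % n for d in (-2, -1, 1, 2)]


def spread(n, threshold, seeds, steps):
    """Synchronous deterministic threshold contagion; returns adopter counts per period."""
    adopters = set(seeds)
    counts = [len(adopters)]
    for _ in range(steps):
        # A susceptible node adopts once enough of its neighbors are adopters.
        new = {
            i for i in range(n)
            if i not in adopters
            and sum(j in adopters for j in ring_neighbors(i, n)) >= threshold
        }
        adopters |= new
        counts.append(len(adopters))
    return counts


seeds = [0, 1, 2]  # small contiguous block of initial adopters (illustrative choice)
simple = spread(n=100, threshold=1, seeds=seeds, steps=5)
complex_ = spread(n=100, threshold=2, seeds=seeds, steps=5)
print(simple)    # grows by four nodes per period
print(complex_)  # grows by two nodes per period
```

With a threshold of one, both nodes within reach of each end of the adopter block adopt every period; with a threshold of two, only the node adjacent to each end sees two adopters, which halves the speed.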


But what if we consider complex contagions where, with probability $p$, adoption may also happen in response to a signal from a single adopter neighbor? An approximation of early diffusion speed in this more general scenario is informative. The contagion may now get stuck at B for multiple periods. However, if the interaction is repeated, a non-zero value of $p$ implies that transmission will ultimately occur both locally and across the network, doubling the rate of adoption per round moving forward. For the diffusion rate to double, A needs to infect C, and B needs to infect D. However, B needs to adopt (before he can infect D), creating a bottleneck of two successive single-neighbor adoptions with the expected waiting time before (B, C and) D adopt at $1/p$. During that time, the initial rate of spread (call that $r$) is divided by two, so the total number of susceptible nodes goes down by $r/(2p)$, leaving us with $N - r/(2p)$ susceptible nodes adopting with rate $2r$. Compare that with the diffusion speed in the clustered network (left), which after $1/p$ has $N - r/p$ susceptibles adopting at rate $r$. Comparing these adoption rates, rewiring helps if

$$\frac{r}{N - r/p} < \frac{2r}{N - r/(2p)}$$

which offers a threshold for the probability of single-contact adoption that promotes rewiring:

$$p > \frac{3r}{2N} \qquad (1)$$
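The step from the rate comparison to threshold (1) is short, and can be spelled out as follows. The derivation assumes both denominators are positive (that is, $N > r/p$), so that cross-multiplying preserves the direction of the inequality.

```latex
\begin{align*}
\frac{r}{N - r/p} &< \frac{2r}{N - r/(2p)} \\
N - \frac{r}{2p} &< 2N - \frac{2r}{p} \\
\frac{2r}{p} - \frac{r}{2p} &< N \\
\frac{3r}{2p} &< N
\quad\Longleftrightarrow\quad
p > \frac{3r}{2N}
\end{align*}
```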

This result is consistent with Eckles et al. (2019) who proved the existence of a similar threshold on a graph generated by the union of a lattice network and a random network.

The value of rewiring depends on the network size, N: the higher the N, the lower the p can be while rewiring enhances diffusion speed. This analytical solution focused on early diffusion speed, and additional considerations should be accounted for as the number of shortcuts increases. On the one hand, as the number of shortcuts increases, the speed may double further, because diffusion can occur over more chunks at the same time. However, as the number of shortcuts increases, the size of each chunk decreases.

Since the maximum diffusion speed (2r in the above example with a single rewiring) is not realized for a while, the smaller and overlapping chunks are especially costly. Diffusion must wait significantly for any chunk to activate, but once active it is quickly consumed, so the benefits of the rate increase are short-lived. At the extreme, when all triangles are rewired to form shortcuts, resulting in a regular random graph, the number of chunks grows to the size of the network, but each offers little gain for diffusion speed. The resulting tradeoffs are not analytically tractable, and next we turn to a simulation model to understand the race between clustered and random networks in facilitating a broad class of contagion processes.

A model of Contagion with repeated interactions and stochastic thresholds

We model a population of agents where each ‘active’ node $j$ (i.e. an adopter that is actively sending messages to her neighbors) sends messages to all her neighbors at a given rate. Each susceptible node keeps track of all the messages received from each adopter. Let $m_{ij}(t)$ represent the total number of messages from adopter $j$ remembered by agent $i$ at time $t$ (we consider forgetting in an extension; the base model increments messages from agent $j$ without forgetting). The susceptible agent (down)weights multiple messages from the same adopter using a discount factor $w$ and combines the resulting signals from all her $n_i(t)$ adopter neighbors into an activation score, $U_i(t)$:

$$U_i(t) = \sum_{j=1}^{n_i(t)} m_{ij}(t)^{w} \qquad (2)$$

The discount factor is between 0 and 1, with a value of zero reflecting susceptible agents who ignore all but the first message from any given adopter. In fact, in this scenario the model reduces to the one by Centola & Macy (2007), where additional messages have no effect. In contrast, with w=1 susceptibles are convinced equally by repeated messages from the same person as by the same number of messages from distinct adopters. Finally, in every period in which a susceptible agent receives any message, she adopts with probability $P(U_i(t))$, where $P$ is an increasing function of the activation score starting from zero when $U_i(t) = 0$ and not exceeding one. For our model we use the following logit function to operationalize $P$:

$$P(U) = \frac{1}{1 + e^{-\beta (U - \tau)}} \qquad (3)$$


$\tau$ reflects the inflection point in $P$ where the probability of adoption reaches 0.5, and $\beta$ determines the steepness of the curve, and therefore the activation probability when $U$ is smaller than $\tau$. We use a value of $\tau = 2$ in the following analyses and vary $\beta$ to control how much a single message can trigger adoption.
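To make the roles of the discount factor and the steepness parameter concrete, the sketch below codes the activation score and the logit activation function. It assumes the score takes the form of a sum of per-adopter message counts raised to the power w, and that the logit has inflection point tau = 2 and steepness beta; the function names are hypothetical and not taken from the thesis.

```python
import math

TAU = 2.0  # inflection point: P(TAU) = 0.5


def activation_score(messages, w):
    """Activation score U from per-adopter message counts (assumed form: sum of m**w)."""
    return sum(m ** w for m in messages if m > 0)


def adoption_probability(u, beta, tau=TAU):
    """Logit activation function: 1 / (1 + exp(-beta * (u - tau)))."""
    return 1.0 / (1.0 + math.exp(-beta * (u - tau)))


def beta_for_p1(p1, tau=TAU):
    """Steepness that yields a target P(1), from P(1) = 1 / (1 + exp(beta * (tau - 1)))."""
    return math.log(1.0 / p1 - 1.0) / (tau - 1.0)


beta = beta_for_p1(0.02)                  # calibrate so one adopter convinces with p = 0.02
print(adoption_probability(1, beta))      # single-contact adoption probability
print(adoption_probability(2, beta))      # probability at the threshold
print(activation_score([3, 1], w=0))      # repeats ignored: counts distinct adopters
print(activation_score([3, 1], w=1))      # repeats count fully: counts all messages
```

Under these assumptions, w = 0 reduces the score to the number of distinct adopters who have sent at least one message, while w = 1 makes it the total message count.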

In short, this model introduces two realistic mechanisms that broaden the threshold models of diffusion (Granovetter, 1978). First, repetition from the same source may lead to persuasion. Repeated messages may be truly informative if the adopter learns about new features of ‘what is to be diffused’ over time. But even when they are not, experimental evidence suggests people do not fully discount repeated signals from the same adopter (Enke & Zimmermann, 2019). With w values above zero the same adopter can convince a susceptible node through repeated messages. The second mechanism is the use of a stochastic P function. Deterministic threshold models specify a threshold T and assume P(U|U<T) = 0 and P(U|U≥T) = 1.

Allowing P to take non-integer values reflects the stochastic nature of most real diffusion processes: some susceptibles may adopt with a positive probability after a single message, and some may not adopt after two or more messages. Next we analyze the impact of each mechanism on the race between clustered and random networks, before combining the two mechanisms for more general outcomes.

Analysis

Following prior theoretical analyses (e.g. Centola & Macy (2007)) we assume agents are communicating on a two-dimensional lattice (also known as Moore) network or its rewired variants. We use a constant degree for all nodes to focus the analysis on the impact of clustering without confounding it with degree heterogeneity. Each node in the network starts as ‘susceptible’, except for one random node and her neighbors, who are initialized as ‘adopters’. At each time step, the simulation proceeds as follows. First, each adopter sends a message to a fraction of his friends (we will show results when that fraction is equal to 1, i.e. when each adopter sends a message to all his friends, but results are robust to smaller fractions). Then, we go through each susceptible node who received a message and update his state according to equations (2) and (3).
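The simulation loop described above can be sketched in a few dozen lines. This is an illustrative reconstruction, not the thesis code. It assumes a Moore lattice on a torus, a random counterpart obtained by degree-preserving edge swaps, an activation score of the form sum of m**w, and a logit activation function with threshold tau = 2. All function names and parameter values (network size, number of swaps, random seeds) are hypothetical.

```python
import math
import random


def moore_lattice(side):
    """Adjacency lists for a side x side torus; every node has its 8 Moore neighbors."""
    def idx(x, y):
        return (x % side) * side + (y % side)
    offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
    return {idx(x, y): [idx(x + dx, y + dy) for dx, dy in offsets]
            for x in range(side) for y in range(side)}


def rewire(adj, swaps, rng):
    """Randomize a graph with degree-preserving edge swaps (assumed rewiring scheme)."""
    nbrs = {i: set(v) for i, v in adj.items()}
    edges = [(i, j) for i in nbrs for j in nbrs[i] if i < j]
    for _ in range(swaps):
        a, b = rng.randrange(len(edges)), rng.randrange(len(edges))
        (u, v), (x, y) = edges[a], edges[b]
        # Skip swaps that would create a self-loop or a duplicate edge.
        if len({u, v, x, y}) < 4 or y in nbrs[u] or v in nbrs[x]:
            continue
        for s, t in ((u, v), (x, y)):   # drop the two old edges
            nbrs[s].discard(t)
            nbrs[t].discard(s)
        for s, t in ((u, y), (x, v)):   # add the two new edges
            nbrs[s].add(t)
            nbrs[t].add(s)
        edges[a], edges[b] = (u, y), (x, v)
    return {i: sorted(v) for i, v in nbrs.items()}


def simulate(adj, beta, w=0.0, tau=2.0, steps=200, rng=None):
    """Return the cumulative number of adopters after each period."""
    rng = rng or random.Random(0)
    seed = rng.choice(sorted(adj))
    adopters = {seed, *adj[seed]}   # one random node and her neighbors
    received = {}                   # received[i][j]: messages i remembers from adopter j
    counts = [len(adopters)]
    for _ in range(steps):
        for j in sorted(adopters):  # every adopter messages all her neighbors
            for i in adj[j]:
                if i not in adopters:
                    inbox = received.setdefault(i, {})
                    inbox[j] = inbox.get(j, 0) + 1
        new = set()
        for i, inbox in received.items():
            if i in adopters:
                continue
            u = sum(m ** w for m in inbox.values())   # activation score, assumed form
            if rng.random() < 1.0 / (1.0 + math.exp(-beta * (u - tau))):
                new.add(i)
        adopters |= new
        counts.append(len(adopters))
    return counts


lattice = moore_lattice(20)                               # N = 400 nodes, degree 8
random_net = rewire(lattice, swaps=20_000, rng=random.Random(1))
beta = math.log(1 / 0.05 - 1)                             # gives P(1) = 0.05 when tau = 2
clustered_run = simulate(lattice, beta, rng=random.Random(2))
random_run = simulate(random_net, beta, rng=random.Random(2))
print(clustered_run[-1], random_run[-1])
```

Edge swaps are used here because they keep every node at degree eight, which matches the constant-degree design described in the text; other rewiring schemes would serve equally well for a sketch.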


Clustered vs random networks

Changing the rewiring probability allows one to span the range between 2-dimensional lattice and random networks with the same constant degree. For simplicity we focus the exposition on comparing the two extremes, which best highlights the key intuitions. In our networks each node is connected to eight (closest/random) neighbors (for lattice/random networks). Prior work has shown that random (clustered) networks lead to faster diffusion when deterministic thresholds of τ = 1 (τ = 2) are combined with non-repetitive interactions (w=0). To understand the more general case, we first explore what happens when 0 < P(1) < P(2).

Figure 2: Cumulative adoption for clustered (blue/dashed) vs random (orange) networks as P(1), the probability of adoption when the number of neighboring adopters is 1, increases from 0.006 to 0.1 (N = 2500)

Changing the parameter β offers various P(1) values (note that assuming a threshold of τ=2, we fix P(2) at 0.5; see equation 3) that span the range between simple (P(1)=P(2)) and complex (P(1)=0) diffusion.

Figure 2 compares diffusion trajectories over time on clustered (blue dashed) and random (solid red) networks for three (rather small) values of P(1). A couple of patterns emerge. First, we can replicate settings in which clustered networks dominate random ones (panel a). However, those scenarios require very small values for P(1). As P(1) grows, random networks quickly gain the upper hand: the dominant network switches at P(1) close to the analytical threshold of $1/\sqrt{N} \approx 0.02$ we calculated earlier. Second, diffusion on random networks is characterized by two distinct phases: a slow early phase (typically slower than the clustered case) and a much faster late diffusion. The higher the P(1), the sooner that phase shift in the diffusion pattern occurs, which promotes the random network over the clustered one.

What is the typical range for P(1) in practice? A few empirical pointers may inform the choice of P(1) relative to P(2) in realistic networks. For example, Centola's (2010) experiments on the spread of health behavior showed a P(2)/P(1) ratio close to 2. Similar studies of peer effects in online social networks have found values between 1 and 4 (Aral, Muchnik, & Sundararajan, 2009; Bakshy, Rosenn, Marlow, & Adamic, 2012; Bjarke, Sapieżyński, Ferrara, & Lehmann, 2017; Ugander et al., 2012). Those findings further underline the importance of exploring stochastic thresholds. For example, the ratio goes to infinity for a deterministic threshold of τ=2. Given the observed empirical ranges for P(2)/P(1), it may well be the case that in typical settings we would not observe clustered networks outperforming random ones, at least based on the mechanisms explored so far.
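A quick calculation shows what these empirical ratios would imply inside the model. The sketch below assumes, as in the text, that P(2) is fixed at 0.5 by a threshold of τ=2, and that P follows the logit form of equation 3. The function names are hypothetical.

```python
import math


def p1_from_ratio(ratio, p2=0.5):
    """Single-contact adoption probability implied by a given P(2)/P(1) ratio."""
    return p2 / ratio


def beta_from_ratio(ratio, tau=2.0):
    """Logit steepness for which P(tau) = 0.5 and P(1) = 0.5 / ratio."""
    return math.log(2.0 * ratio - 1.0) / (tau - 1.0)


for ratio in (1, 2, 4):
    print(ratio, p1_from_ratio(ratio), beta_from_ratio(ratio))
```

Under these assumptions a ratio of 4 corresponds to P(1) = 0.125 and a ratio of 1 to P(1) = 0.5, both well above the small P(1) values at which clustered networks were found to dominate.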

Figure 3: Frontiers of the three network structures considered: (1) 1-dimensional lattice network, (2) 2-dimensional or Moore network, and (3) tree network. The node labels denote the number of adopters each susceptible node is connected to.


The emergence of two distinct phases in diffusion as we move from clustered to random networks is a central feature of diffusion dynamics and key to understanding the relative diffusion speed in different networks. This dynamic is partially informed by the shape of the diffusion frontier in a given network. Figure 3 shows those frontiers for three different networks: panels A and B represent one- and two-dimensional lattice networks, and panel C shows a tree network. The diffusion frontier is where adopters (filled circles) connect to susceptible ones (empty circles). Networks A and B are clustered: one’s contacts are likely to be connected to each other. In contrast, a tree network has no triangles and thus provides a good approximation for what happens on a random graph in the first phase of diffusion.

In a one-dimensional lattice network, the size of the frontier (the number of contacts between susceptibles and adopters) is always constant (four in figure 3-1). In a two-dimensional lattice the size of the frontier increases with the number of adopters at a rate that reflects the growth in the boundary around an area filled with adopters. In particular, if $A$ is the total number of adopters, the size of the frontier equals $4\sqrt{A} + 1$. In a tree network, the size of the frontier grows much faster with the number of adopters: since each adopter is connected to 7 unique susceptible nodes, the number of contacts is equal to $7A$. While in the tree network the number of contacts between susceptibles and adopters grows more rapidly with the number of adopters, each connected susceptible is connected to a unique adopter. Therefore the progression of diffusion on the tree network requires each node to be convinced by signals from a single adopter, a low-probability event. In contrast, many nodes on the frontiers of lattice networks are connected to two adopters, benefiting from the higher adoption likelihoods that scale with P(2). To formalize the argument, we can distinguish between two types of susceptible nodes on the frontier: $S_1$ represents those with contact to a single adopter, and $S_2$ represents those with connections to more than one adopter. $S_1$ nodes adopt with rates proportional to P(1), whereas $S_2$ nodes adopt with rates proportional to P(2) or higher. Note that P(K) for K>2 is relevant for nodes with contact to more than two adopters; but with P(2)=0.5 in our setting, explicitly capturing nodes with higher-order connections does not change the qualitative argument. Total adoption rate, AR, would then scale with:

$$AR \approx S_1 P(1) + S_2 P(2) \qquad (4)$$

In clustered networks (networks 1 and 2 in figure 3, abbreviated $N_1$ and $N_2$), the majority of nodes on the frontier are $S_2$, so the adoption rates can be approximated by $AR_{N_1} = S_2 P(2)$ and $AR_{N_2} = S_2 P(2)$ respectively, with $S_2$ counted on the respective network. In the tree network $N_3$ all frontier nodes are $S_1$, so the adoption rate can be approximated by $AR_{N_3} = S_1 P(1)$. Completing a differential equation approximation for these three networks, we can calculate the early trajectory for adoption assuming that the initial number of adopters is equal to 9 (one focal adopter and all her contacts):

$$\frac{dA}{dt} = AR_{N_1} \approx S_2 P(2) = 6P(2) \;\Rightarrow\; A(t) \approx 6P(2)\,t + 9 \qquad (5)$$

$$\frac{dA}{dt} = AR_{N_2} \approx S_2 P(2) = 4P(2)\sqrt{A} \;\Rightarrow\; A(t) \approx \frac{1}{4}\left(4P(2)\,t + 6\right)^2 \qquad (6)$$

$$\frac{dA}{dt} = AR_{N_3} \approx S_1 P(1) = 7P(1)A \;\Rightarrow\; A(t) \approx 9e^{7P(1)t} \qquad (7)$$

In short, the ring lattice shows linear growth in adopters with a slope proportional to P(2), the Moore lattice has quadratic growth with a $P(2)^2$ multiplier, and the tree network grows exponentially, but with an exponent that scales with P(1). Given the smaller values of P(1) compared to P(2), early growth is slow for $N_3$, but once under way the exponential term takes over and dominates the rates in lattice networks. These results explain the faster early adoption rates in clustered networks, as well as the gradual catch-up in random networks. The shift to a rapid, step-like growth in random networks is, however, driven by another mechanism, to which we turn next.
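The linear, quadratic and exponential regimes can be compared numerically. The sketch below evaluates the three early-growth approximations for an unbounded network, starting from 9 adopters. The Moore-lattice curve is obtained by integrating dA/dt = 4P(2)√A from A(0) = 9, which gives (2P(2)t + 3)². The function names and the parameter values P(1) = 0.02 and P(2) = 0.5 are illustrative assumptions, and saturation in a finite population is ignored.

```python
import math


def ring(t, p2, a0=9):
    """Linear growth on the ring lattice: dA/dt = 6 * P(2)."""
    return 6 * p2 * t + a0


def moore(t, p2, a0=9):
    """Quadratic growth on the Moore lattice: dA/dt = 4 * P(2) * sqrt(A)."""
    return (2 * p2 * t + math.sqrt(a0)) ** 2


def tree(t, p1, a0=9):
    """Exponential growth on the tree: dA/dt = 7 * P(1) * A."""
    return a0 * math.exp(7 * p1 * t)


def crossover_time(p1, p2, horizon=10_000):
    """First integer period at which the tree curve exceeds the Moore curve, if any."""
    for t in range(1, horizon):
        if tree(t, p1) > moore(t, p2):
            return t
    return None


t_star = crossover_time(p1=0.02, p2=0.5)
print(t_star, moore(t_star, 0.5), tree(t_star, 0.02))
```

Lowering P(1) pushes the crossover later, which is the sense in which small single-contact probabilities delay, but do not prevent, the tree-like network from overtaking the lattice.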

Early in the diffusion a tree network provides a good approximation for a random network because a random network has very small clustering and thus most nodes on the frontier belong to $S_1$. However, as diffusion progresses random networks diverge from unbounded trees in two ways. First, with a limited population the number of available susceptible nodes drops, limiting diffusion through saturation. This limiting effect is counter-balanced by a different force: as adopters send messages to disparate parts of the network, random networks increasingly host susceptible nodes with contacts to multiple adopters. As a result, the term $S_2 P(2)$ becomes more important in the later stages of the diffusion. Once the majority of susceptibles fall into the $S_2$ category, the diffusion goes through a phase shift where all remaining nodes adopt in a few periods (because of high P(2)). This mechanism contrasts with the more traditional diffusion models in which saturation takes over after initial growth. In fact, with small P(1) values, the fastest diffusion rates in random graphs unfold not because of the exponential term (equation 7), but due to the emergence of many reinforcing distinct contacts on the frontier (i.e. the $S_2 P(2)$ term). We show this phase transition in figure 4, tracking the cumulative adopters of each node type ($S_1$, $S_2$, $S_3$, etc.) over time on a random network. Initially, all the adoption is from $S_1$ (blue), just as is the case in tree networks. As the contagion spreads, however, the number of $S_2$ nodes increases rapidly, taking over as the driver of diffusion around time 10. The phase shift ensues, where in a few periods all the remaining nodes, exposed to multiple reinforcing messages, become adopters. It is the timing of this phase shift that determines the outcome of the race between clustered and random networks: if it happens before the completion of diffusion in the clustered networks, random networks dominate.

To characterize the timing of the phase shift, we can write the probability of adoption as a weighted average between P(1) and P(2):

$$p \sim \left(1 - \frac{A(t)}{N}\right) P(1) + \frac{A(t)}{N} P(2) \qquad (8)$$

Equation (8) says that when the cumulative number of adopters $A(t)$ is small, the probability of adoption is approximately equal to P(1). However, as $A(t)$ increases, the weight continuously moves to P(2). The change of phase occurs when $\frac{A(t)}{N} P(2) > \left(1 - \frac{A(t)}{N}\right) P(1)$, i.e. when

$$\frac{A(t)}{N} > \frac{P(1)}{P(1) + P(2)} \qquad (9)$$

That happens when enough adopters are generated through the $S_1 P(1)$ term so that the majority of new adoption can come from the $S_2 P(2)$ term. The exponential growth driven by $S_1 P(1)$ is the initial engine to turn the phase shift on, after which diffusion completes rapidly.
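The condition for the change of phase follows from a one-line rearrangement, written out here with $a = A(t)/N$ as shorthand for the adopted fraction (a symbol introduced only for this illustration). The numerical case uses P(1) = 0.02 and P(2) = 0.5 as illustrative values.

```latex
\begin{align*}
a\,P(2) &> (1 - a)\,P(1) \\
a\,\bigl(P(1) + P(2)\bigr) &> P(1) \\
a &> \frac{P(1)}{P(1) + P(2)}
\end{align*}
% Illustration: P(1) = 0.02 and P(2) = 0.5 give a > 0.02 / 0.52, roughly 0.038.
```

With these values the reinforced term would start to dominate once roughly 4% of the network has adopted, so under this approximation the phase shift can begin while the large majority of nodes are still susceptible.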


Figure 4: cumulative adopters of each node type over time on a random network

In summary, we have identified two types of mechanisms regulating the race between clustered and random networks. The first type involves the tradeoff between probability of adoption and the number of contacts. Due to reinforced signals clustered networks have higher probability of adoption for each node on the frontier, but the frontier grows more slowly in the absence of long-distance links. Random networks are slowed down when single contacts are the main channel for diffusion, but benefit from exponential growth in the size of the frontier. The second mechanism involves a change of phase in random graphs where the exponential growth in the number of susceptible nodes ceases, but remaining susceptible nodes receive so much reinforcement that they all adopt in tandem. The initial exponential growth phase, even if slow, is crucial to get the network to the second phase. However, prior work using deterministic thresholds for complex contagion, by assuming P(1)=0, turned off the exponential growth mechanism and thus significantly hampered diffusion on random graphs. Our extended model shows how the second phase is key to strength of random networks and why even small, non-zero adoption probabilities for single messages may be quite consequential.


Relationship between infectivity and network size

At the heart of the previous argument is the comparison between linear and exponential growth. Figure 5 generalizes that comparison for different combinations of P(1) and network size. The y-axis shows variation in P(1), the value of the activation function when the number of adopters is equal to 1. The contour value associated with each point shows the normalized difference in area below the cumulative adoption time series between random and clustered networks. Specifically, we calculated the area below each time series between 0 and T, where T is the time after which adoption stops in both time series. If $a_C$ and $a_R$ correspond to the area below the series for clustered and random networks respectively, we graph the relative strength of random networks as $DA = (a_R - a_C)/\bar{a}$, where the normalization term $\bar{a}$ is the average between $a_C$ and $a_R$ (averaged over 100 runs). A positive value implies that random networks dominate clustered networks, and the higher the value, the higher the relative strength of random networks.

There are three main reasons why we choose this measure (relative to, for example, the difference in time for diffusion to reach 100% of nodes). First, when we analyze extensions to this model where the maximum adoption fraction is less than 1, we need to compare two outcomes (not just the time to reach the maximum fraction, but also the maximum fraction itself, which differs across runs). This measure collapses these two dimensions into one and simplifies the comparison of graphs across sections. Second, this measure formalizes the intuition that speed (early on in the process) matters, especially if the maximum reached is not significantly different. Finally, qualitative insights from Figure 5 do not change whether we use this measure or the difference in time for diffusion to reach 100% of nodes (which can be done in this section because the maximum fraction is always one).

To provide more intuition, we calculate this measure for figure 2.B. The time at which both time series reach their maximum is 49. The area below the random time series over [0, 49] is 27. The area below the clustered time series in that same interval is 23. So, the difference is 4. Normalized by (27 + 23)/2, the score is equal to 0.17. A similar calculation would show that in figure 2.A (where clustered networks dominate random ones) the score is -0.74, and in figure 2.C (where the dominance of random networks is more pronounced than in figure 2.B) the score is 0.51. Note that in Figure 5, we calculate the average score, not the score of the average (as we just did in this example).
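
The arithmetic of this example, and the distinction between the average score and the score of the average, can be sketched as follows. With the rounded areas quoted in the text (27 and 23) the formula gives 0.16, close to the reported 0.17, which presumably reflects unrounded areas (an assumption). The per-run areas in `runs` are hypothetical and chosen only so that their means equal 27 and 23.

```python
def score(area_random, area_clustered):
    """Difference in area normalized by the average of the two areas."""
    normalizer = (area_random + area_clustered) / 2.0
    return (area_random - area_clustered) / normalizer


# Rounded areas quoted in the text for figure 2.B: 4 / 25.
example = score(27, 23)

# Hypothetical per-run areas as (random, clustered); not thesis data.
runs = [(40, 20), (14, 26)]

# Average score: score each run first, then average the scores.
average_of_scores = sum(score(r, c) for r, c in runs) / len(runs)

# Score of the average: average the areas first, then score once.
mean_random = sum(r for r, _ in runs) / len(runs)
mean_clustered = sum(c for _, c in runs) / len(runs)
score_of_averages = score(mean_random, mean_clustered)
```

In this made-up case the two quantities differ noticeably even though the mean areas match the example, because each run is normalized by its own average area.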

Figure 5: Interaction between P(1) (y-axis) and network size (x-axis) in determining the relative strength of random networks. The contour value associated with each point shows the normalized difference in area (DA) below the cumulative adoption time series between random and clustered networks. Contour values above 0 mark regions where random networks dominate clustered ones. We also indicate the location of each parameter configuration explored in figure 2.

Two features are noteworthy in understanding the diffusion race between random and clustered networks. First, random networks dominate as we increase P(1). The time at which the initial exponential growth (in random networks) takes over the (second order) growth in clustered networks depends on P(1) relative to P(2), which regulate the outcomes of equations 6 and 7. The lower P(1) is relative to P(2), the longer it will take for the exponential curve to take over. However, in most of the parameter space random networks dominate. In practice, the only part of the space where clustered networks boost diffusion is where P(1) is less than 0.02. This corresponds to a P(2)/P(1) ratio of 25, not a realistic range for empirical estimates of P(2)/P(1). For example, Centola (2010) ran an experiment to measure the effect of network structure on the adoption of health behavior. In this experiment the reported value of P(2)/P(1) is close to two, i.e. about 12 times smaller. Second, network size generally aids random networks (i.e. reduces the P(1) values where the two networks are on par). The larger the network, the more time there is for the exponential growth to show its full impact, and therefore the lower P(1) can be relative to P(2) for random networks to dominate clustered networks. An exception to this is when P(1) is so small that the diffusion rate is faster on clustered networks than on random networks (up until clustered networks win the race). Here increasing N increases the advantage that clustered networks can accumulate over random networks. Figure 6 shows this when P(1) is set to 0.001.
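
To make the role of network size concrete, the toy sketch below races a quadratic growth curve (standing in for clustered networks) against an exponential one (standing in for random networks) and records how many periods each needs to reach N adopters. It is not a reproduction of equations 6 and 7: the functional forms and the constants `CLUSTERED_RATE` and `RANDOM_GROWTH` are hypothetical, and only the seed count A(0)=9 comes from the text.

```python
def periods_to_reach(target, adopters_at):
    """Smallest integer t >= 0 such that adopters_at(t) >= target."""
    t = 0
    while adopters_at(t) < target:
        t += 1
    return t


SEEDS = 9             # A(0) = 9, as in the text
CLUSTERED_RATE = 1.0  # hypothetical coefficient of the quadratic term
RANDOM_GROWTH = 0.1   # hypothetical per-period growth rate


def clustered(t):
    """Toy second-order growth: adopters grow like t squared."""
    return SEEDS + CLUSTERED_RATE * t ** 2


def random_net(t):
    """Toy exponential growth, slow at first because the rate is small."""
    return SEEDS * (1.0 + RANDOM_GROWTH) ** t


# Map each network size to (periods for clustered, periods for random).
race = {
    n: (periods_to_reach(n, clustered), periods_to_reach(n, random_net))
    for n in (900, 2500, 1_000_000)
}
```

With these made-up constants the quadratic curve finishes first for N = 900 and N = 2500, while the exponential curve finishes first for N = 1,000,000: a slow exponential needs a large enough network before it can overtake.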

Figure 6: Cumulative adoption for clustered (blue/dashed) vs random (orange) as the size of the network increases from 900 to 2500 (p(1) = 0.001)

Finally, note that when network size is very small, random networks always dominate clustered networks. This result emerges because in a very small network many nodes find themselves connected to multiple adopters at the very beginning of diffusion (remember that A(0)=9 in our setting). Therefore the second phase, where random networks have the advantage, is reached almost from the start of the simulation (see equation 9), and random networks again gain the upper hand.


What if repetition matters?

So far we excluded the impact of repetitive contacts on adoption by assuming w=0. However, experimental research shows that repetition matters, because people do not fully discount repeated messages from the same source (Enke & Zimmermann, 2019). Therefore, in this section we simulate the race between random and clustered networks when repeated messages from the same contact are not fully discounted. Figure 7 reports the contour plots relating P(1) and network size for four different values of w. Counting repeated contacts clearly benefits diffusion on random networks.

Figure 7: Interaction between P(1) (y-axis) and network size (x-axis) in determining the relative difference in area (DA) for different values of w. Note that figure 7.A is the same as figure 5.

The basic mechanism is simple: if repeated contacts are accumulated, even a single adopter in one’s network could transmit enough reinforcing messages to convince individuals with high thresholds, strengthening the impact of single, distant connections in the overall diffusion. Note the existence of a threshold at w = 0.5 above which random networks dominate clustered ones everywhere. This is because, while w <= 0.5, the score never goes above 2. Since for low P(1) the logit activation function (equation 3) is very steep around its inflection point at a score of 2, it is almost flat when the score is below 2. As a result, figures 7.A and 7.B are almost the same. However, when w is such that the score can rise above 2, random networks dominate everywhere (as can be seen from figures 7.C and 7.D).
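
A threshold at w = 0.5 is what one would expect if the k-th repeated message from the same adopter were discounted geometrically by w^k, since the score contributed by a single adopter is then bounded by 1/(1 - w), which equals 2 at w = 0.5. The score definition itself is given earlier in the thesis and is not reproduced here, so the geometric form and the function name below are assumptions used only to illustrate the bound.

```python
def single_source_score(w, n_messages):
    """Score from n_messages sent by one adopter.

    Assumed form: the first message counts fully and the k-th repeat is
    weighted by w**k, so w = 0 ignores repeats and w = 1 counts them all.
    """
    return sum(w ** k for k in range(n_messages))


# With w = 0 only the first message counts, whatever the number of repeats.
no_repetition = single_source_score(0.0, 50)

# With w = 0.5 the score approaches 2 but never exceeds it.
at_threshold = single_source_score(0.5, 1000)

# With w = 0.6 a fourth message is enough to push the score past 2.
below = single_source_score(0.6, 3)
above = single_source_score(0.6, 4)
```

Under this assumed form, a lone distant adopter can lift a neighbor past the inflection point of the activation function only when w exceeds 0.5.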


When senders become inactive

So far we have assumed that transmission occurs at every period. Yet sending information is an individual decision, and different individuals have different propensities for sending information that may change over time, or in response to environmental factors such as communication costs. For example, before digital communication was common, the local spread of social movements was partly explained by the cost of coordination with distant actors (Hedstrom, 1994). More generally, communication costs are likely to reduce individuals’ motivation to broadcast their choices over time as the erosion of novelty tempers that motivation. Therefore, we test the robustness of our results to variation in the probability of sending information. At each period, an adopter becomes ‘inactive’ with probability π, i.e. he stops sending messages from that point on. Because a susceptible agent only (re)considers adoption upon receiving a new message, inactivation reduces the ‘contagiousness’ of adopters over time, relaxing the assumption of temporal homogeneity (Strang & Tuma, 1993). Another way for inactivation to reduce contagiousness would be to (re)consider adoption at each time step and let agents forget some of the messages they have received (potentially reducing the benefits of repetition); results are robust to either approach.
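
The effect of inactivation on repetition can be illustrated with a small simulation of how many messages a single adopter sends before falling silent. The timing is an assumption: the adopter is taken to transmit once per period and then to become permanently inactive with probability π, which gives an expected 1/π messages. The function name, the seed, and the sample size are illustrative choices, not part of the original model code.

```python
import random


def messages_sent(pi, rng, max_periods=10_000):
    """Number of messages one adopter sends before becoming inactive.

    Assumed timing: transmit first, then go inactive with probability pi.
    """
    sent = 0
    for _ in range(max_periods):
        sent += 1
        if rng.random() < pi:
            break
    return sent


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_draws = 20_000
mean_messages = {
    pi: sum(messages_sent(pi, rng) for _ in range(n_draws)) / n_draws
    for pi in (0.1, 0.5, 1.0)
}
```

Under this timing, π = 1 leaves every adopter with exactly one transmission, so repetition can no longer compensate for a low probability of adoption upon a single contact.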

Figure 8: Effect of P(inactive) on diffusion. Here N=2500 and P(1)=0.06 (similar to figure 2.C)

Figure 8 shows the effect of increasing π on the diffusion curves when √N = 50. Panel 1 replicates figure 2.C, where random networks dominate clustered ones. Setting π=0.5 has the effect of silencing several active nodes after a single transmission, rendering ineffective the repetition needed for single connections to be the conduit of transmission. As a result, the reduction in diffusion speed is more pronounced in random networks, where nodes with a single connection to adopters are more common. With π = 1, no random network is able to reach more than a few nodes, and we recover the intuition that the spread can get ‘stuck’, consistent with deterministic thresholds. Figure 9 generalizes the interaction between P(1) and network size in determining the relative strength of random networks: as π increases, so does the area below the 0 contour line, which is where clustered networks dominate random ones.

Figure 9: Interaction between P(1) (y-axis) and network size (x-axis) in determining the relative strength of random networks. Contour values above 0 mark regions where random networks dominate clustered ones. The leftmost graph reproduces Figure 5. The other two graphs show how the area below the 0 contour increases as π, the probability that a node becomes inactive, increases.

We then explored how π interacts with the previously analyzed parameters. To do so, we randomly sampled points where π ranges between 0 and 1, P(1) between 0.001 and 0.3, w between 0 and 1, and network size between 10 and 100. Based on these points, we estimated a model predicting the probability that a random network dominates a clustered one from these parameters as well as their polynomial combinations of degree up to 8. We included an L1 penalty to select the most important features.
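
To give a sense of why a penalty is needed for feature selection, the sketch below enumerates the polynomial terms that four parameters generate up to degree 8. It assumes that "polynomial combinations" means all monomials, including cross terms, and excludes the constant; the variable labels and helper names are illustrative and not taken from the original analysis.

```python
from itertools import combinations_with_replacement


def polynomial_terms(variables, max_degree):
    """All monomials of degree 1..max_degree, each as a tuple of names."""
    terms = []
    for degree in range(1, max_degree + 1):
        terms.extend(combinations_with_replacement(variables, degree))
    return terms


def evaluate(term, values):
    """Value of one monomial at a sampled parameter point."""
    product = 1.0
    for name in term:
        product *= values[name]
    return product


variables = ("pi", "p1", "w", "size")  # illustrative labels
terms = polynomial_terms(variables, 8)
n_features = len(terms)

# Example: the term pi * pi * p1 at a hypothetical sampled point.
point = {"pi": 0.5, "p1": 0.1, "w": 0.2, "size": 50}
value = evaluate(("pi", "pi", "p1"), point)
```

Under this assumption the four parameters expand to 494 candidate features, so a sparsity-inducing penalty is a natural way to isolate the few that carry predictive weight.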

Results show that the two most important variables are π and P(1): including these two variables produces an AUC of 0.95 on a held-out test set. Figure 10 shows the decision boundary above which random networks dominate clustered networks as a function of π and P(1) for different combinations of w and network size.

Each point represents a predicted probability conditional on the given parameters. As we can see from the graph, the shape of the decision boundary does not change across conditions: the higher π, the higher P(1) needs to be for random networks to dominate clustered networks. Increasing w pushes the decision boundary to the right, reducing the area where clustered networks dominate. Network size has practically no effect, which shows that although it is important for understanding the magnitude of the difference between random and clustered networks (as shown in the previous analyses), it has no effect on the sign.

Figure 10: Decision boundary above which random networks dominate clustered ones. The contour line corresponds to a probability of 0.5, based on predictions from a logistic regression model providing an AUC > 0.98 on both training and a held-out test set

Becoming inactive directly manipulates the number of messages a node receives from her neighbors, which in turn impacts diffusion dynamics. Forgetting of past messages may also impact the store of messages in a receiving node, potentially reducing the benefits of repetition. To explore this effect we ran additional tests, letting susceptible agents forget messages at each period with probability ϕ. Results are robust to ϕ going up to 1. We also tested the impact of reducing the rate at which adopters send messages or increasing the rate at which susceptible nodes consider adoption (i.e. not just following the reception of a new message but at each time step). Results are robust across all these relaxations. All in all, these results show that ‘turning off’ is an almost necessary condition for clustered networks to dominate random ones.

Discussion

In this paper we revisited the role of network clustering in the dynamics of social contagion. Recognizing the importance of stochasticity in threshold views of contagion, we built a model that spans the traditional mechanisms with deterministic thresholds and those that allow stochastic responses to prior reinforcements. When deterministic thresholds are used, our model replicates the dominance of clustered networks over random ones when it comes to complex contagion. However, we find that this result is subject to two behaviorally important limitations. First, random connections to distant parts of a social network regain their value subject to small but non-zero probabilities of adoption upon a single contact.

When distant connections can occasionally trigger adoption, a reinforcing feedback process is activated that generates an exponential growth in the number of adopters, ultimately overwhelming the polynomial growth dynamics in clustered networks. The advantage of random connections increases with the size of the network because larger networks allow more time for the exponential growth mode to take over.

Curiously, very small networks also benefit disproportionately from randomness, because they quickly expose a large fraction of susceptible nodes to multiple adopter contacts. Thus contagion transitions rapidly to a second phase where all susceptible nodes in a random network, experiencing reinforcement from all directions, adopt in just a few periods. The second mechanism, the incorporation of repetitive messages in adoption choices, further tilts the balance in favor of random networks. When a random connection can convince a distant agent through repetition, those random connections become more viable as conduits of social influence. Put together, these mechanisms suggest that once we move beyond strictly deterministic thresholds, it is harder to find settings in which clustered networks dominate random ones.

Yet clustered networks may dominate when agents stop broadcasting their choices soon after adoption. When agents become inactive soon after adoption they are less likely to convince their distant contacts through repeated messages, reducing the value of those long-distance connections. This mechanism, under-appreciated in previous theoretical work, may actually be more relevant for understanding conditions under which clustering can boost diffusion.

The mechanisms we consider matter for the theory and practice of social influence. The theoretical view that has solidified in recent years separates contagion dynamics into complex and simple categories, with the former being strengthened by cohesive and clustered networks, in contrast to the latter, which benefits from random networks (Centola & Macy, 2007). Our initial results question that distinction, suggesting that mechanisms invoked to promote clustered networks are not robust to behaviorally plausible extensions, and that random networks should be dominant more widely than previously recognized. Yet our results also highlight the importance of repetition (or lack thereof) in understanding the value of clustering.

For example, some experimental research has shown that clustered networks could indeed speed up diffusion (Centola, 2010) even when the probability of adoption after a single contact is rather high. Our analysis shows that this can only happen if adopters become inactive with high probability, i.e. they do not want to (or are constrained not to) send more than one message. This suggests creating a typology of ‘what is to be diffused’ according to ‘where it is to be diffused’. If the social space is such that repeated interactions with distant actors are costly, then clustered networks could indeed dominate random ones. These competing processes matter for understanding adoption of practices, beliefs, and technologies in organizational settings. For example tools reduce the costs of repeated interactions. This shift in organizational structures towards more open communication environments (Turco, 2016) can also shift the type of organizational networks best suited for adoption of new technologies and ideas. An understanding of the role of repetition suggests modern organizations may increasingly benefit from random connections across boundaries that facilitate diffusion processes, encouraging organizational members to speak up and share ideas across formal organizational boundaries.

A couple of boundary conditions are noteworthy in interpreting our results. First, our study varied network structures while keeping the adoption thresholds and the motivation to share information constant across networks. This assumption may not hold in practice. For example, it is well known that social cohesion and range affect people’s motivation to share (Reagans & McEvily, 2003). Therefore, the full impact of randomizing a network should include the possibility that distant connections have higher thresholds to adopt, or less motivation to share, as replacing triangles with shortcuts reduces social cohesion (Coleman, 1988, 1994; Granovetter, 1985) while increasing range (Burt, 2009). A more nuanced analysis could endogenize this effect.

Second, repetition of signals may have more nuanced impacts on adoption. There are at least three mechanisms explaining why individuals may not be infinitely susceptible to changing behavior in response to repeated signals. The first mechanism is cognitive dissonance: past choices impact current preferences (Brehm, 1956; Velleman, 2000). Cognitive dissonance becomes relevant in the context of multiple exposures to cues when individuals "rationalize the choices they make when confronted with difficult decisions by claiming they never wanted the option they did not choose" (Jarcho, Berkman, & Lieberman, 2011). As a result, if an individual happens (potentially randomly) not to adopt a behavior, cognitive dissonance implies that the probability of adoption may actually decline with additional rounds of (non-adopting) exposure. The second mechanism is habit formation. When a behavior is repeated multiple times, it will lead to habit (see Wood & Rünger, 2016 for a review), whereby behavior is triggered by a learned association between contextual cues and the behavior (i.e. the choice). Over multiple exposures to affirmative cues, if an individual repeatedly ignores the choice, or chooses an alternative, extra exposures may actually reduce the probability of adoption regardless of true payoffs from adoption. A third mechanism is rational inattention. It is well known that attention is a limited resource (Broadbent, 1958; Schneider & Shiffrin, 1977; Simon, 1971) and as a result ought to be allocated with care. This may lead users to be ‘rationally inattentive’ (Caplin & Dean, 2014; Sallee, 2014; Sims, 2003) to uninformative pieces of information. Here uninformativeness is related to surprise in an information theoretic sense. Unless the world is changing fast, repetition necessarily leads signals to become uninformative. Taken together, these mechanisms provide additional pathways through which repetition may become less important, and diffusion may get stuck on random networks.

References

Aral, S., Muchnik, L., & Sundararajan, A. (2009). Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. Proceedings of the National Academy of Sciences, 106(51), 21544–21549.
Bakshy, E., Rosenn, I., Marlow, C., & Adamic, L. (2012). The Role of Social Networks in Information Diffusion. Proceedings of the 21st International Conference on World Wide Web (WWW), 519–528.
Bjarke, M., Sapieżyński, P., Ferrara, E., & Lehmann, S. (2017). Evidence of complex contagion of information in social media: An experiment using Twitter bots. PloS One, 12(9).
Brehm, J. W. (1956). Postdecision changes in the desirability of alternatives. Journal of Abnormal and Social Psychology, 52(3), 384–389. https://doi.org/10.1037/h0041006
Broadbent, D. E. (1958). Perception and Communication. New York: Pergamon Press.
Burt, R. S. (2009). Structural holes: The social structure of competition. Harvard University Press.
Caplin, A., & Dean, M. (2014). Revealed Preference, Rational Inattention, and Costly Information Acquisition. National Bureau of Economic Research Working Paper 19876.
Centola, D. (2010). The Spread of Behavior in an Online Social Network Experiment. Science, 329(5996), 1194–1197.
Centola, D. (2018). How behavior spreads: The science of complex contagions. Vol. 3. Princeton University Press.
Centola, D., & Macy, M. (2007). Complex Contagions and the Weakness of Long Ties. American Journal of Sociology, 113(3), 702–734.
Coleman, J. S. (1988). Social capital in the creation of human capital. American Journal of Sociology, 94, S95–S120.
Coleman, J. S. (1994). Foundations of social theory. Harvard University Press.
Eckles, D., Mossel, E., Rahimian, A., & Sen, S. (2019). Long ties accelerate noisy threshold-based contagions.
Enke, B., & Zimmermann, F. (2019). Correlation Neglect in Belief Formation. The Review of Economic Studies, 86(1), 313–332.
Granovetter, M. (1973). The Strength of Weak Ties. American Journal of Sociology, Vol. 78, pp. 1360–1380.
Granovetter, M. (1978). Threshold Models of Collective Behavior. American Journal of Sociology, 83(6), 1420–1443.
Granovetter, M. (1985). Economic action and social structure: The problem of embeddedness. American Journal of Sociology, 91(3), 481–510.
Hedstrom, P. (1994). Contagious Collectivities: On the Spatial Diffusion of Swedish Trade Unions. American Journal of Sociology, 99, 1157–1179.
Jarcho, J. M., Berkman, E. T., & Lieberman, M. D. (2011). The neural basis of rationalization: cognitive dissonance reduction during decision-making. Social Cognitive and Affective Neuroscience, 6(4), 460–467.
Karsai, M., Iniguez, G., Kaski, K., & Kertész, J. (2014). Complex contagion process in spreading of online innovation. Journal of The Royal Society Interface, 11(101).
Montanari, A., & Saberi, A. (2010). The spread of innovations in social networks. Proceedings of the National Academy of Sciences, 107(47), 1–6.
Morris, S. (2000). Contagion. The Review of Economic Studies, 67(1), 57–78.
Reagans, R., & McEvily, B. (2003). Network structure and knowledge transfer: The effects of cohesion and range. Administrative Science Quarterly, 48(2), 240–267.
Sallee, J. M. (2014). Rational Inattention and Energy Efficiency. Journal of Law and Economics, 57(3), 781–820.
Schneider, W., & Shiffrin, R. M. (1977). Controlled and Automatic Human Information Processing: I. Detection, Search, and Attention. Psychological Review, 84(1), 1–66.
Scott, G., Konstantin, S., & Milan, S. (2016). Formal Models of Nondemocratic Politics. Annual Review of Political Science, 19, 565–584.
Simon, H. (1971). Designing Organizations for an Information-Rich World. In Martin Greenberger (Ed.), Computers, Communications, and the Public Interest (pp. 37–72). Baltimore, MD: John Hopkins University Press.
Sims, C. A. (2003). Implications of rational inattention. Journal of Monetary Economics, 50(3), 665–690. https://doi.org/10.1016/S0304-3932(03)00029-1
Strang, D., & Tuma, N. B. (1993). Spatial and Temporal Heterogeneity in Diffusion. American Journal of Sociology, 99(3), 614–639.
Turco, C. J. (2016). The conversational firm: Rethinking bureaucracy in the age of social media. Columbia University Press.
Ugander, J., Backstrom, L., Marlow, C., & Kleinberg, J. (2012). Structural diversity in social contagion. Proceedings of the National Academy of Sciences, 109(16), 5962–5966.
Velleman, J. D. (2000). From self psychology to moral philosophy.
Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of “small-world” networks. Nature, 393(June), 1–3.
Wood, W., & Rünger, D. (2016). Psychology of habit. Annual Review of Psychology, 67, 289–314.