Supporting Information for: The limits of social mobilization Alex Rutherford, Manuel Cebrian, Sohan Dsouza, Esteban Moro, Alex Pentland,

Contents

A Population Density Distribution

B Simulation Details
  B.1 Simulation Method
  B.2 Branching Factor Distribution

C Further Results
  C.1 Parameter Exploration
  C.2 DARPA Network Challenge Balloon Locations
  C.3 Search Completion Times
  C.4 Effect of Search Origin
  C.5 Searchability
  C.6 Search Efficiency
  C.7 Logarithmic Blendability Function

D Density Dependent Mobility & Distribution of Passive Recruits
  D.1 Density Dependent Mobility Radius
  D.2 Completion Time & Balloon Location Probability
  D.3 Super-linear Blendability Function
  D.4 Logarithmic Blendability Function

E Analysis of Findability Function

A Population Density Distribution

As input to the simulations, we use gridded population density [1] based on census data [2] for the mainland USA. This comprises 7,820,528 cells, each with an area of 1 km², of which 5,060,288 are populated (i.e. 2,760,240 are empty). The distribution of population amongst the cells displays a familiar fat-tailed behaviour (Fig. (2)). As a result, 90% of the cells contain a population below 10, allowing for very precise simulation of the recruitment dynamics.

The fat-tailed behaviour of cell populations is a result of the highly heterogeneous distribution of population typical of a country with large urban centres. This can be seen more clearly still in Fig. (3). The spatial autocorrelation is seen to decay very slowly with distance, with a characteristic lengthscale of around 14 km. Thus on average, any location with high population density has a surrounding area of π(14²) ≈ 616 km² with comparable density, and likewise a cell of low density will be surrounded by an area of low density. This small-scale homogeneity is observed despite large-scale heterogeneity.

A further pathological demonstration of the population heterogeneity is the difference between Liben-Nowell et al. neighborhoods in a dense urban environment (New York City) and a relatively less populated urban environment (Yuma, AZ). Correlations such as transitivity and clustering, as well as bursty dynamics, have been found to slow diffusive behaviour on networks [3], and we conjecture an analogous effect due to spatial clustering of population. In New York City we see a very small Liben-Nowell et al. neighborhood, whereas in Yuma the lower population density gives rise to a larger neighborhood. In both cases the closer cells have a much stronger weighting, i.e. probability of social tie, compared to more distant cells.

Figure 1: Map of Population Density (logarithmic scale, per km²) Across Mainland USA and Locations of Balloons in Red Balloon Challenge. (Lambert Azimuthal Equal Area Projection)

Figure 2: Log-log Plot of Population Distribution of Cells

Figure 3: Spatial Autocorrelation of Population Density

Figure 4: Heatmaps of Liben-Nowell et al. neighborhoods in New York City (top) and Yuma, AZ (bottom) relative to the central black cell. Each square is a 1 km² area; the shading reflects the weight of each cell (high to low, red to blue), i.e. the probability of a friendship between a person in that cell and a person in the central cell. The irregular shape of the lower image is due to unpopulated regions.

B Simulation Details

B.1 Simulation Method

The simulation begins by seeding the cell corresponding to MIT in Cambridge, MA with 164 seeds representing the first round of recruitment from the MIT team. Each seed recruits a given number of new active nodes taken from the empirical branching distribution in [4]. Each new active recruit is assigned an action time, given as the current time plus a waiting time taken from a log-normal distribution as observed in [5], and is placed into a priority queue sorted by future action time. The waiting time represents the time between the parent node sending the message and the child node sending the message to others and joining the search. Recruits are of two geographical types: background recruits, which are chosen uniformly at random from the entire population, and rank based recruits, which are selected in inverse proportion to their rank according to (1), as in [6]. The rank of j with respect to i is the total population located closer to i than j is:

$$P_{ij} \propto \frac{1}{\sum_{k:\, r_{ik} < r_{ij}} p_k} \qquad (1)$$

where $P_{ij}$ is the probability of friendship between agents $i$ and $j$ and $p_k$ is the population at $k$. Each successful recruitment is determined to be a background recruit with probability

$$p = \frac{n_{\mathrm{background}}}{n_{\mathrm{background}} + n_{\mathrm{rank}}} \qquad (2)$$

and rank based with probability $1 - p$. The values $n_{\mathrm{rank}} = 5.5$ and $n_{\mathrm{background}} = 2.5$ are taken from [6], giving $p = 2.5/8 \approx 0.31$. The probability of geographical recruits is truncated at 105 and distances of 104km. As well as the active recruits which join the branching process, each parent node also gives rise to $n_{\mathrm{pass}}$ passive recruits, regardless of the number of active recruits. These behave in exactly the same way as the active recruits described above, except that upon activation they search for balloons but do not perform further recruitment.
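The choice of a recruit's location described by (1) and (2) can be illustrated with a short Python sketch. It is not the code used for the simulations. The names `rank_weights` and `pick_recruit_cell`, the dictionary of cells keyed by grid coordinates, the counting of the source cell's own population as "closer", the floor of 1 on the rank, and the weighting of a target cell by its population times the per-person probability are all assumptions made here for illustration. The truncation of the rank based probability is omitted.

```python
import math
import random

N_BACKGROUND = 2.5
N_RANK = 5.5
P_BACKGROUND = N_BACKGROUND / (N_BACKGROUND + N_RANK)  # Eq. (2)


def rank_weights(source, cells):
    """Unnormalised per-person weights 1/rank from Eq. (1).

    cells maps (x, y) grid coordinates to population. The rank of a target
    cell is the total population strictly closer to the source.
    """
    by_distance = sorted((math.dist(source, c), c) for c in cells if c != source)
    weights = {}
    # Assumption: residents of the source cell itself count as "closer".
    closer_population = cells.get(source, 0)
    i = 0
    while i < len(by_distance):
        # Cells at exactly the same distance share one rank.
        j = i
        while j < len(by_distance) and by_distance[j][0] == by_distance[i][0]:
            j += 1
        rank = max(closer_population, 1)  # assumption: avoid division by zero
        for _, cell in by_distance[i:j]:
            weights[cell] = 1.0 / rank
        closer_population += sum(cells[cell] for _, cell in by_distance[i:j])
        i = j
    return weights


def pick_recruit_cell(source, cells, rng):
    """Choose the cell of one new recruit (background or rank based)."""
    if rng.random() < P_BACKGROUND:
        # Background: uniform over people, so cells are weighted by population.
        targets = list(cells)
        weights = [cells[c] for c in targets]
    else:
        per_person = rank_weights(source, cells)
        targets = list(per_person)
        # Assumption: cell weight = population x per-person probability.
        weights = [cells[c] * per_person[c] for c in targets]
    return rng.choices(targets, weights=weights, k=1)[0]
```

Precomputing `rank_weights` once per source cell, as the text describes for the full neighborhoods, avoids repeating the sort for every recruitment.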

The simulation proceeds by stepping forward in time until the activation time of the recruit at the top of the priority queue. This recruit is removed from the queue, performs any recruitment of its own, adds any such new recruits to the queue and locates any balloons in its vicinity. The number of people recruited from each cell is counted as the simulation progresses, and may not exceed the population of the cell. Any further recruitment from a cell beyond its population is ignored and assumed to represent a in the recruitment network. Passive recruits may later be recruited into an active role; however, agents selected as active recruits and later selected as passive recruits have no effect on the search process.
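The event-driven loop described above can be sketched in Python with a binary heap as the priority queue. This is a minimal sketch of the queue mechanics only: geography, balloon detection and the per-cell population cap are left out. The function name `run_branching`, the callback `draw_children` and the log-normal parameters `mu` and `sigma` are hypothetical placeholders, not values from the paper.

```python
import heapq
import random


def run_branching(n_seeds, draw_children, n_pass, rng, mu=0.0, sigma=1.0):
    """Return the activation times of all recruits, in order of activation.

    draw_children(rng) returns the number of active recruits of one parent.
    mu and sigma are hypothetical log-normal waiting-time parameters.
    """
    queue = []    # heap of (action_time, tie_breaker, is_active)
    counter = 0   # unique tie-breaker for entries with equal times
    for _ in range(n_seeds):
        heapq.heappush(queue, (0.0, counter, True))
        counter += 1

    activation_times = []
    while queue:
        # Step forward to the earliest pending action time.
        now, _, is_active = heapq.heappop(queue)
        activation_times.append(now)
        # A full simulation would search for balloons near this recruit here.
        if not is_active:
            continue  # passive recruits search but do not recruit
        # Each active parent adds n_pass passive recruits regardless of
        # how many active recruits it produces.
        children = [True] * draw_children(rng) + [False] * n_pass
        for child_is_active in children:
            wait = rng.lognormvariate(mu, sigma)
            heapq.heappush(queue, (now + wait, counter, child_is_active))
            counter += 1
    return activation_times
```

The loop terminates only if the branching process dies out, which is the case when the mean number of active recruits per parent is below one.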

The calculation of the weights and ranks for each Liben-Nowell et al. neighborhood of every cell at this level of precision is extremely computationally demanding. Regions of low population density give rise to larger neighborhoods containing up to 20,000 other cells. Therefore the full set of cells and weights for each neighborhood was calculated in advance and retrieved from a database as required during the course of each simulation.

If a balloon is located within the search neighborhood of a recruit, that balloon is ‘found’ immediately, which is reasonable since the agent in question is able to report any sightings made before her recruitment. Once a balloon in a cell is found, any further recruitment from the population within that cell will have no effect. The balloon is found with probability 1 within its neighborhood of size $r_{\mathrm{mob}}$.

Figure (5) shows the results of a typical (unsuccessful) search simulation. Initially the number of recruits grows steadily but eventually saturates around $5 \times 10^5$; likewise, the rate at which cells are searched decreases. The difference between the number of activated recruits (blue) and recruited but not activated individuals (red) represents the size of the action queue; when they converge there are no further agents waiting to act and the branching process terminates. Only the initial dynamics are displayed here: due to the skew in waiting time, the final 20 recruits take 3 years to join the search.

Figure 5: Plot of the Number of Activated Recruits (blue), Number of Agents Recruited But Not Yet Activated (red) and Number of Cells Searched (green). Parameters are $n_{\mathrm{pass}} = 400$, $r_{\mathrm{mob}} = 1$ km.

B.2 Branching Factor Distribution

We determine the distribution of the branching factor to be sampled in our simulations by fitting to a subset of the 4495 individuals who signed up to the balloon challenge. We consider the initial round of recruitment by the MIT seed node to be atypical, since it targeted a number of individuals far greater than the average number of friends of an individual, and it is likely that a larger proportion of those targeted will be recruited due to the affinity of the team with the challenge. Thus our simulations begin with 164 seed recruits, and further recruitment proceeds in accordance with the typical branching behaviour. Since extra effort is likely to be exerted by seed nodes to recruit individuals in the initial stages of any social mobilization task, our setup maintains generality. Due to the small sample set, we exclude from the fitting procedure several large outliers which were considered atypical, i.e. media outlets or other individuals with a strong affinity to the task. It must be emphasised that the exact distribution of the branching factor is difficult to determine due to the sparsity and uniqueness of the data. In any case it is not important: since the mean is well below the tipping point, other processes dominate the search, as discussed in the main paper.

The branching behaviour of the remaining 4483 nodes was fit to a power law distribution with exponent $\alpha = 2.0786$ and mean $\langle R_0 \rangle = 0.8906$. In order to appropriately sample the power law distribution, we construct a Harris discrete distribution function for branching factor $k$,

$$P(k) = \frac{H_{\alpha\beta}}{\beta + k^{\alpha}} \qquad (3)$$

where $H_{\alpha\beta}$ is chosen to ensure normalisation and $\beta$ allows fitting to a given empirical mean value.
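A sketch of how (3) might be normalised, fitted to a target mean and sampled is given below in Python. It is an illustration under assumptions, not the fitting procedure used for the paper: the support is truncated at a hypothetical `k_max`, the support is taken to start at k = 0, and β is found by bisection. Because the exponent is only slightly above 2, the mean is sensitive to the truncation, so the fitted β depends on `k_max`.

```python
import random


def harris_pmf(alpha, beta, k_max):
    """P(k) = H / (beta + k**alpha) for k = 0..k_max (truncated support)."""
    weights = [1.0 / (beta + k ** alpha) for k in range(k_max + 1)]
    total = sum(weights)  # equals 1 / H on the truncated support
    return [w / total for w in weights]


def pmf_mean(pmf):
    """Mean branching factor of a probability mass function over 0..k_max."""
    return sum(k * p for k, p in enumerate(pmf))


def fit_beta(alpha, target_mean, k_max, lo=1e-9, hi=1e9, iterations=100):
    """Bisect on a logarithmic scale for the beta giving target_mean.

    The mean increases with beta, because a larger beta flattens the
    distribution and moves weight away from k = 0.
    """
    for _ in range(iterations):
        mid = (lo * hi) ** 0.5
        if pmf_mean(harris_pmf(alpha, mid, k_max)) < target_mean:
            lo = mid
        else:
            hi = mid
    return (lo * hi) ** 0.5


def sample_branching(pmf, rng):
    """Draw one branching factor from the fitted distribution."""
    return rng.choices(range(len(pmf)), weights=pmf, k=1)[0]
```

For example, `fit_beta(2.0786, 0.8906, 1000)` returns a β for which the truncated distribution has the empirical mean quoted above, and `sample_branching` can then serve as the source of active recruits per parent.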

C Further Results

C.1 Parameter Exploration

Figure 6: Heat map of the average number of balloons located (top) and probability of success (bottom) in 100 distinct search simulations for different values of passive recruits and mobility radius.

Figure (6) shows the average number of balloons located (top), and the probability of locating all 10 balloons (bottom), in 500 simulations for each of a range of values of passive recruits and mobility radii. We see that an increased level of mobility and number of passive recruits leads to increased success in locating balloons, with a minimum threshold of $r_{\mathrm{mob}} = 1$ km and $n_{\mathrm{pass}} = 300$ for a reasonable level of success (e.g. $\langle n_{\mathrm{found}} \rangle = 8$). It is important to note that the process of passive recruitment is critical for success: among 10,000 realisations of the search process, even with a large mobility ($n_{\mathrm{pass}} = 0$, $r_{\mathrm{mob}} = 3$ km), not one instance successfully located all balloons and in fact only 5 found any at all. We find that for many higher values of the parameters ($r_{\mathrm{mob}}$, $n_{\mathrm{pass}}$) the majority of the balloons are found, but the same few persistently remain elusive. These particular balloons (numbers 6 and 8 in [4]) were placed in particularly sparsely populated areas; the population density in the vicinity of the target can be seen to have a systematic effect on the time for location.

C.2 DARPA Network Challenge Balloon Locations

Due to the confounding effects of population heterogeneity and geography, we do not expect all balloon locations to be equivalent. Clearly the probability of success is not linear in the parameters $n_{\mathrm{pass}}$ and $r_{\mathrm{mob}}$ (see Figure (2) in main paper). Figure (7) and Table (1) show the geographical location of each of the balloons.

Balloon      Location                                      Cell Population   r = 2 km Averaged Population
Balloon 1    Union Square, San Francisco, California       2717              5259
Balloon 2    Chaparral Park, Scottsdale, Arizona           1692              1661
Balloon 3    Tonsler Park, Charlottesville, Virginia       3629              1862
Balloon 4    Chase Palm Park, Santa Barbara, California    3762              1193
Balloon 5    Lee Park, Memphis, Tennessee                  1681              1270
Balloon 6    Collins Avenue, Miami, Florida                5                 501
Balloon 7    Glasgow Park, Christiana, Delaware            100               475
Balloon 8    Katy Park, Katy, Texas                        43                176
Balloon 9    Waterfront Park, Portland, Oregon             3345              2914
Balloon 10   Centennial Park, , Georgia                    217               2099

Table 1: Location and population density in balloon cell and within a 2km radius for each balloon in DARPA Balloon Challenge

In most cases the balloons were placed in fairly visible public places such as parks and squares, although often these places had very low local population density despite their visibility. A good example is balloon number 6 (see Figure (7)), which was placed on Collins Avenue in Miami. Although the local population density according to [1] is just 5, since it is on a major road it is likely that many individual mobility patterns overlapped with it. This particular case was also exacerbated by the fact that the location was on the coastline, so simple geometry provided fewer opportunities for recruitment nearby. Table (1) demonstrates that these sparsely populated single cells (particularly balloons 6, 7 and 8) were surrounded by other sparsely populated cells, and these were among the most difficult balloons to find in our simulations. However, an increased mobility radius allows a balloon to be found more easily.

Figure (8) develops this idea further. We see that the majority of balloons are found with relative ease with a moderate number of passive recruits ($n_{\mathrm{pass}} \approx 100$). Yet balloons 6 and 8 in particular require a large degree of mobility, since they are in sparse surroundings and so must be located from more distant, populated cells.

Figure 7: Maps of DARPA Balloon Challenge Locations

Figure 8: Multiplot of probability distribution of locating each of 10 balloons averaged over 500 searches as a function of $r_{\mathrm{mob}}$ and $n_{\mathrm{pass}}$

C.3 Search Completion Times

Simple comparison of the search completion times is confounded by the fact that a search may terminate before locating all 10 balloons, and indeed at many points in the parameter space it is overwhelmingly likely that this will happen. Therefore we compare the completion times of all successful search simulations and of all unsuccessful search simulations separately, shown in Figures (9) and (10) respectively. In both cases, each plot is annotated with the percentage of successful/unsuccessful searches (i.e. all 10 balloons located/not located before termination of the branching process).

The successful searches shown in Figure (9) complete faster with more passive recruitment. Meanwhile the overall probability of being successful depends strongly on the level of mobility, as the balloons in areas of low population otherwise hinder success, as discussed above. That is, moving from top to bottom within the array of plots, the overall likelihood of success increases, while moving from left to right the centre of gravity of the completion time distribution moves to smaller times. Likewise, increased passive recruitment (moving from left to right in Figure (10)) simply prolongs unsuccessful searches, with little increase in the chance of success at low values of mobility. Increased mobility (moving from top to bottom in the array of plots) systematically reduces the chance of failure.

Thus both individual mobility and passive recruitment are essential for timely, successful search, although they contribute in subtly different ways. More passive recruits per agent lead to successful location of the balloons in shorter times. Yet a few isolated balloons mean that success is impossible without a certain level of mobility, and the overall probability of success is strongly dependent on this parameter.

Figure 9: Multiplot of distribution of completion times for 500 searches (including only successful searches) as a function of $r_{\mathrm{mob}}$ and $n_{\mathrm{pass}}$. Each plot is annotated with the percentage of successful searches which are included.

Figure 10: Multiplot of distribution of completion times for 500 searches (including only unsuccessful searches) as a function of $r_{\mathrm{mob}}$ and $n_{\mathrm{pass}}$. Each plot is annotated with the percentage of unsuccessful searches which are included.

C.4 Effect of Search Origin

We briefly investigate how the choice of search origin affects the dynamics. Figure (11) shows the searchability of all cells averaged over 1000 distinct searches, excluding cells within 15 km of MIT (black points); the searchability of these excluded cells is then superimposed in red. It can be seen that many of the cells near MIT have a searchability much higher than the prevailing trend, and that all significant outliers are accounted for by these privileged cells.

Figure 11: Scatter plot of searchability over 1000 searches for all cells. Those within 15km of MIT are in red and the remainder are in black.

C.5 Searchability

Figure (12) shows the variation of searchability, blendability and findability calculated from 10,000 searches using $r_{\mathrm{mob}} = 2$ km and $n_{\mathrm{pass}} = 400$, as well as the underlying population density across the entire US.

Figure 12: Clockwise from top left: searchability, blendability, findability and underlying population density using $r_{\mathrm{mob}} = 2$ km and $n_{\mathrm{pass}} = 400$. Grey circles indicate locations used in the DARPA balloon challenge. Searchability indicates the probability that a cell is searched in our simulations, blendability is the Bettencourt et al. super-linear scaling of population density and findability is the ratio of searchability and blendability. Each is log-scaled.

C.6 Search Efficiency

Due to the heterogeneities in the population density discussed above, we find that the search efficiencies of background recruits and rank based recruits differ considerably. The relative efficiency of each type of recruit can be quantified by counting the number of cells which are first searched by each type. Since distance-independent ties are fewer and necessarily operate on a longer lengthscale, there is less chance of background recruits overlapping. Conversely, the clustering of population in built up areas leads to continuous, overlapping rank based recruitment within neighboring cells; the branching search process is effectively ‘trapped’ within a small region.

In Figure (13) we consider the search due simply to the branching process, with no passive recruitment. The solid lines show the number of recruits of each type, which exist in the ratio $n_{\mathrm{background}}/n_{\mathrm{rank}} = 2.5/5.5$; the dashed lines represent how many of these rank based and background recruits are recruited into cells which have not yet been searched. While both recruitment processes are inefficient, the ratio between the proportions of each type which search new cells (red line) demonstrates that background recruits have roughly 10 times the search efficiency. It should be noted that, due to the sparsity of recruits in this case, there is some noise and transient behaviour at shorter times.

C.7 Logarithmic Blendability Function

We now consider a logarithmic blendability function related to the scaling of urban walking speed as reported by Bornstein and Bornstein [7]. We omit the findability behaviour below a population density of 10 km⁻² due to divergent behaviour of the logarithm function at low values.

Figure 13: Number of background and rank based recruits (solid lines) over time, and of these recruits the number which are in a previously unsearched cell (dashed lines). The red line shows the ratio of new cells searched by each type ($= n_{\mathrm{background}}/n_{\mathrm{rank}}$); the dashed red line represents the ratio expected if both types of recruits search with equal efficiency. Parameters are $r_{\mathrm{mob}} = 0$ and $n_{\mathrm{pass}} = 0$.

Figure 14: Scatter plot of cell searchability with population (black) calculated using a fixed mobility radius of 2km, scaled logarithmic blendability function (blue) and scaled findability (red)

D Density Dependent Mobility & Distribution of Passive Recruits

D.1 Density Dependent Mobility Radius

We consider a non-uniform mobility radius determined by local population density smoothed over a 5 km radius. We assign a mobility radius in inverse proportion to the local population density: we divide the full range of smoothed population density into 6 equal bins and, for each cell, assign a mobility radius sampled from a uniform distribution between 0 km and a maximum in the range 1 to 6 km.
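One possible reading of this assignment is sketched below in Python. The text does not state whether the bins are equal in density or in log-density, nor how bins map to maxima, so the sketch assumes bins of equal width in density and integer maxima from 6 km (sparsest bin) down to 1 km (densest bin). The function names are hypothetical.

```python
import random


def max_radius_km(density, d_min, d_max, n_bins=6):
    """Maximum mobility radius for a cell with the given smoothed density.

    Assumption: bins have equal width in density between d_min and d_max;
    the sparsest bin gets n_bins km and the densest bin gets 1 km.
    """
    if d_max == d_min:
        return n_bins
    fraction = (density - d_min) / (d_max - d_min)
    bin_index = min(int(fraction * n_bins), n_bins - 1)  # 0 is the sparsest bin
    return n_bins - bin_index


def sample_radius_km(density, d_min, d_max, rng):
    """Draw a mobility radius uniformly between 0 km and the bin's maximum."""
    return rng.uniform(0.0, max_radius_km(density, d_min, d_max))
```

With this mapping a cell in the sparsest bin can reach up to 6 km, while a cell in the densest bin never exceeds 1 km.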

Figure 15: Distribution of population densities smoothed over 5km

The number of passive recruits is now modeled as a distribution. In the absence of a well defined functional form of the social graph reported in [8], we model the distribution of passive recruits as a power law with an exponent equal to that describing the number of active recruits, but with a mean chosen as a free parameter in {0, 100, 200, 300, 400}.

The completion time distribution and searchability plots using this variable mobility radius and passive recruit distribution mean of 400 are shown below.

D.2 Completion Time & Balloon Location Probability

Comparing Figure (17) to Figure (1) in the main paper, we see little qualitative difference. The mean completion time remains close to the 48 hour baseline. One effect of the variable mobility radius is an overall reduction in success rate from 89% using ($n_{\mathrm{pass}} = 400$, $r_{\mathrm{mob}} = 2$ km) to 67%.

The distributions of completion times for successful and unsuccessful searches, as a function of the mean number of passive recruits (in analogy to Figures (9) and (10)), are shown below. The same trends persist in both cases: increasing passive recruits improves the success rate and leads to faster completion amongst these successful searches. However, unsuccessful searches take longer to fail.

Figure (20) shows the number of recruits and probability of success in finding randomly placed balloons using a variable passive recruit number and mobility radius as described above (to be compared to Figure (2) in main paper).

Figure 16: Scatter plot of smoothed cell population density and mobility radius sampled from a uniform distribution in a range between 0 and a maximum value determined by local density.

Figure 17: Histogram of completion times for successful searches with variable mobility radius out of 3000 instances and inset for unsuccessful searches. Success rate is 62%.

Figure 18: Histograms of completion times for successful searches as a function of mean number of passive recruits. Annotation is percentage success rate.

Figure 19: Histograms of completion times for unsuccessful searches as a function of mean number of passive recruits. Annotation is percentage failure rate.

Figure 20: Scatter plot of number of recruits at completion in a search for a single randomly placed balloon as a function of the population in the cell in which the balloon is placed for 5,000 randomly selected balloon locations. Black dots represent only successful searches (bottom). Histogram represents the probability to successfully find the balloon. Dashed black vertical lines indicate the populations of the locations used in the DARPA balloon challenge. The red line represents the mean number of recruits for each histogram bin (top).

D.3 Super-linear Blendability Function

The variable mobility radius leads to more scatter in the individual cell searchabilities around the underlying logistic trend with population density; now there are a few cells with very large searchability despite low population density. In order to show the underlying trend more clearly, we take the mean of the searchability amongst cells with identical population density and plot this average as a function of population. We plot the individual cell searchabilities as well as the average at each population value and indicate 1 standard deviation. The findability derived from this mean searchability is then considered.
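The averaging step can be sketched in a few lines of Python. This is an illustrative sketch: the function name and the input format (pairs of cell population and searchability) are assumptions, and the population standard deviation is used since the text does not say which estimator was applied.

```python
from collections import defaultdict
from statistics import mean, pstdev


def mean_searchability_by_population(cells):
    """Group cells by identical population; return {population: (mean, std)}.

    cells is an iterable of (population, searchability) pairs.
    """
    groups = defaultdict(list)
    for population, searchability in cells:
        groups[population].append(searchability)
    # Assumption: population standard deviation (pstdev), not the sample one.
    return {p: (mean(v), pstdev(v)) for p, v in sorted(groups.items())}
```

Dividing each mean by the blendability at that population then gives the findability derived from the mean searchability.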

Figure 21: Scatter plot of cell searchability as a function of cell population density. The mean searchability among cells with the same population density (red line) and 1 standard deviation of this distribution.

D.4 Logarithmic Blendability Function

We now consider the same plot as above using the logarithmic scaling of blendability reported by Bornstein and Bornstein [7]. We omit the findability behaviour below a population density of 10 km⁻² due to divergent behaviour of the logarithm function at low values.

Figure 22: The mean searchability (black line), scaled super-linear blendability function (blue line) and their ratio, the scaled findability (red line)

Figure 23: The mean searchability (black line), scaled logarithmic blendability function (blue line) and their ratio, the scaled findability (red line)

E Analysis of Findability Function

One of the main results in the paper is the nontrivial behavior of the findability as a function of the population density. Here we study the conditions for the existence of a maximum in the findability function. Although $s_i$ is not strictly a function, we will assume here that it is well represented by the functional form $s_i(p_i)$. Thus

$$f_i = \frac{s_i(p_i)}{p_i^{\beta - 1}}$$

and then we will have a maximum at the population $p_i$ at which $\frac{df_i}{dp_i} = 0$, i.e. when

$$p_i \frac{ds_i}{dp_i} = s_i (\beta - 1) \qquad (4)$$

Here we have the first condition: the shape of $s_i$ in Figure 3 of the main paper tells us that $\frac{ds_i}{dp_i} \geq 0$ always, and thus the above condition can only be met if $\beta > 1$. The second condition is based on the fact that the left hand side of (4) is a bell-shaped function, that is, it takes zero value at $p_i = 0$ and $p_i \to \infty$. Thus, in order to have a non-trivial solution of equation (4), we need the slope $\gamma$ of the left hand side of (4) to be greater than the slope of the right hand side at $p_i = 0$. Assuming that $s_i(p_i) \simeq p_i^{\gamma}$ for $p_i \simeq 0$, we get the condition

$$\beta < \gamma + 1 \qquad (5)$$

Putting the two conditions together we get that

$$1 < \beta < \gamma + 1$$

We now investigate two specific functional forms for $s(p)$ to exemplify the condition (5) found above:

• $s(p) = \tanh(p/p_0)$. In this case we get that $\gamma = 1$, since $\tanh(p/p_0) \simeq p/p_0$ for $p \simeq 0$. Thus in this case we get that $1 < \beta < 2$. This corresponds to the situation in the main paper.

• $s(p) = \tanh(\sqrt{p/p_0})$. In this case we get that $\gamma = 1/2$ and thus the condition is $1 < \beta < 3/2$. The results are found in Figure 24.
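The two conditions can be checked numerically with a small Python sketch. It evaluates the findability, i.e. $s(p)$ divided by $p^{\beta-1}$, on a logarithmic grid and reports whether the largest value lies strictly inside the grid. The function names, the grid limits and the choice $p_0 = 20$ (the value appearing in Figure 24) are assumptions of this sketch, and a grid search only suggests, rather than proves, the presence or absence of a maximum.

```python
import math


def findability(p, beta, p0=20.0, root=False):
    """f(p) = s(p) / p**(beta - 1) with s = tanh(p/p0) or tanh(sqrt(p/p0))."""
    x = math.sqrt(p / p0) if root else p / p0
    return math.tanh(x) / p ** (beta - 1.0)


def interior_maximum(beta, root=False, lo=1e-3, hi=1e5, n=2000):
    """Return the grid point maximising f, or None if it sits on the edge."""
    grid = [lo * (hi / lo) ** (i / (n - 1)) for i in range(n)]
    values = [findability(p, beta, root=root) for p in grid]
    best = max(range(n), key=values.__getitem__)
    return grid[best] if 0 < best < n - 1 else None
```

Under these assumptions, `interior_maximum(1.5)` finds a maximum for the first functional form, whereas `interior_maximum(2.5)` returns `None`; for the square-root form, `interior_maximum(1.3, root=True)` finds a maximum and `interior_maximum(1.7, root=True)` does not, in line with the bounds on β given above.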

References

[1] Center NNCD (2008), Gridded 1 km Population for the Conterminous . http://www.ncdc.noaa.gov/oa/climate/research/population/, [Online; accessed 3-September-2012].
[2] Bureau USC (2001), Census 2000: Census Tract Cartographic Boundary Files - U.S. Census Bureau. http://www.census.gov/geo/www/cob/tr2000.html, [Online; accessed 3-September-2012].
[3] Karsai M, et al. (2011) Small but slow world: How network topology and burstiness slow down spreading. Physical Review E 83:025102.
[4] Pickard G, et al. (2011) Time-critical social mobilization. Science 334:509–512.
[5] Iribarren JL, Moro E (2009) Impact of Human Activity Patterns on the Dynamics of Information Diffusion. Physical Review Letters 103:038702+.
[6] Liben-Nowell D, Novak J, Kumar R, Raghavan P, Tomkins A (2005) Geographic routing in social networks. Proceedings of the National Academy of Sciences 102:11623–11628.
[7] Bornstein M, Bornstein H (1976) The pace of life. Nature 259:557–559.
[8] Ugander J, Karrer B, Backstrom L, Marlow C (2011) The anatomy of the facebook social graph. CoRR abs/1111.4503.

Figure 24: Findability for $s_i(p) = \tanh(\sqrt{p/20})$ and different values of $\beta$. Curves shown: $s(p)$ and $f(p)$ for $\beta$ = 1.1, 1.2, 1.3, 1.4 and 1.5. Horizontal axis: $p$ (logarithmic, 1e+01 to 1e+05); vertical axis: $s(p)$, $f(p)$.