Supporting Information for: The Limits of Social Mobilization
Alex Rutherford, Manuel Cebrian, Sohan Dsouza, Esteban Moro, Alex Pentland, Iyad Rahwan

Contents
A Population Density Distribution
B Simulation Details
B.1 Simulation Method
B.2 Branching Factor Distribution
C Further Results
C.1 Parameter Exploration
C.2 DARPA Network Challenge Balloon Locations
C.3 Search Completion Times
C.4 Effect of Search Origin
C.5 Searchability
C.6 Search Efficiency
C.7 Logarithmic Blendability Function
D Density Dependent Mobility & Distribution of Passive Recruits
D.1 Density Dependent Mobility Radius
D.2 Completion Time & Balloon Location Probability
D.3 Super-linear Blendability Function
D.4 Logarithmic Blendability Function
E Analysis of Findability Function

A Population Density Distribution
As input to the simulations, we use gridded population density [1] based on census data [2] for the mainland USA. This comprises 7,820,528 cells, each with an area of 1 km², of which 5,060,288 are populated (i.e. 2,760,240 are empty). The distribution of population amongst the cells displays a familiar fat-tailed behaviour (Fig. (2)). As a result, 90% of the cells contain a population below 10, which allows the recruitment dynamics to be simulated very precisely. The fat tail of cell populations reflects the highly heterogeneous distribution of population typical of a country with large urban centres. This can be seen more clearly still in Fig. (3). The spatial autocorrelation decays very slowly with distance, with a characteristic lengthscale of around 14 km. Thus, on average, any location with high population density is surrounded by an area of π × 14² ≈ 616 km² with comparable density. Likewise, a cell of low density will be surrounded by an area of low density. This small-scale homogeneity is observed despite large-scale heterogeneity.
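The characteristic lengthscale above summarises how quickly the spatial autocorrelation of density falls off with distance (Fig. (3)). The following is a hedged illustration of what such a curve measures, not the estimator used for Fig. (3), which is not specified here. It computes a Pearson correlation between a gridded density field and a copy of itself shifted by `lag` cells along one axis. The function `autocorrelation_at_lag` and the Gaussian "city" grid are hypothetical. The block also checks the π × 14² ≈ 616 km² arithmetic.

```python
import math

def autocorrelation_at_lag(grid, lag):
    """Pearson correlation between each cell and the cell `lag` steps east.

    Illustrative estimator only: one axis, pairs pooled over all rows.
    """
    xs, ys = [], []
    for row in grid:
        for j in range(len(row) - lag):
            xs.append(row[j])
            ys.append(row[j + lag])
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# Hypothetical toy "city": a Gaussian density peak on a 21 x 21 grid of 1 km cells.
grid = [[1000 * math.exp(-((i - 10) ** 2 + (j - 10) ** 2) / 32) for j in range(21)]
        for i in range(21)]
profile = [autocorrelation_at_lag(grid, d) for d in range(6)]

# Area of a disc whose radius is the ~14 km correlation length.
area_km2 = math.pi * 14 ** 2   # about 616 km^2
```

On a smooth field like this, `profile` starts at 1 for zero lag and decreases as the lag grows. A slowly decaying profile corresponds to the long lengthscale described in the text.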
A further, extreme illustration of the population heterogeneity is the difference between Liben-Nowell et al. neighborhoods in a dense urban environment (New York City) and a relatively less populated urban environment (Yuma, AZ) (Fig. (4)). In New York the Liben-Nowell et al. neighborhood is very small, whereas in Yuma the lower population density gives rise to a larger neighborhood. In both cases, closer cells have a much stronger weighting, i.e. a higher probability of a social tie, than more distant cells. Correlations in network topology such as transitivity and clustering, as well as bursty dynamics, have been found to slow diffusive behaviour on networks [3]. We conjecture an analogous effect due to the spatial clustering of population.

Figure 1: Map of Population Density (logarithmic scale, per km²) Across Mainland USA and Locations of Balloons in Red Balloon Challenge (Lambert Azimuthal Equal Area Projection).
Figure 2: Log-log Plot of Population Distribution of Cells.
Figure 3: Spatial Autocorrelation of Population Density.
Figure 4: Heatmaps of Liben-Nowell et al. neighborhoods in New York City (top) and Yuma, AZ (bottom) relative to the central black cell. Each square is a 1 km² area. The shading reflects the weight of each cell (high to low, red to blue), i.e. the probability of a friendship between a person in that cell and a person in the central cell. The irregular shape of the lower image is due to unpopulated regions.

B Simulation Details
B.1 Simulation Method
The simulation begins by seeding the cell corresponding to MIT in Cambridge, MA, with 164 seeds representing the first round of recruitment by the MIT team. Each seed recruits a number of new active nodes drawn from the empirical branching distribution in [4]. Each new active recruit is assigned an action time equal to the current time plus a waiting time drawn from a log-normal distribution, as observed in [5]. The recruit is then placed into a priority queue sorted by future action time.
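The queue-based scheduling just described can be sketched with Python's `heapq` module, used as a min-heap keyed on action time. This is a minimal sketch under stated assumptions:

- The log-normal parameters `mu` and `sigma` are placeholders, not the values fitted in [5].
- The helper `schedule` is hypothetical.
- The rule that seeds 0 to 4 each recruit one child is a toy stand-in for sampling the branching distribution.

```python
import heapq
import random

def schedule(queue, now, recruit_id, rng, mu=0.0, sigma=1.0):
    """Push a recruit with action time = now + log-normal waiting time.

    mu and sigma are placeholder parameters, not the values fitted in [5].
    """
    action_time = now + rng.lognormvariate(mu, sigma)
    heapq.heappush(queue, (action_time, recruit_id))
    return action_time

rng = random.Random(42)
queue = []                      # min-heap ordered by future action time
for seed_id in range(164):      # the 164 seed recruits, all recruited at t = 0
    schedule(queue, 0.0, seed_id, rng)

next_id = 164
processed = []
# Step forward in time: always act on the recruit with the earliest action time.
while queue:
    action_time, recruit_id = heapq.heappop(queue)
    processed.append((action_time, recruit_id))
    # Toy branching rule (hypothetical): seeds 0-4 each recruit one child now.
    if recruit_id < 5:
        schedule(queue, action_time, next_id, rng)
        next_id += 1
```

Every waiting time is positive, so each child is scheduled later than its parent. The sequence of processed action times is therefore non-decreasing, which is what makes the simulation event-driven rather than time-stepped.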
The waiting time represents the time between the parent node sending the message and the child node forwarding the message to others and joining the search. Recruits are of two geographical types:

- background recruits, which are chosen uniformly at random from the entire population;
- rank-based recruits, which are selected in inverse proportion to their rank according to Eq. (1), as in [6]:

$$P_{ij} \propto \frac{1}{\sum_{k:\, r_{ik} < r_{ij}} p_k} \qquad (1)$$

where $P_{ij}$ is the probability of friendship between agents i and j, $r_{ij}$ is the distance between i and j, and $p_k$ is the population at k. Each successful recruitment is determined to be a background recruit with probability

$$p = \frac{n_{background}}{n_{background} + n_{rank}} \qquad (2)$$

and a rank-based recruit with probability 1 − p. Here $n_{rank}$ and $n_{background}$ are 5.5 and 2.5 respectively, taken from [6], so that p = 2.5/8 ≈ 0.31. The probability of geographical recruits is truncated at $10^5$ and at distances of $10^4$ km.

In addition to the active recruits, which join the branching process, each parent node also gives rise to $n_{pass}$ passive recruits, regardless of the number of active recruits. These behave in exactly the same way as the active recruits described above, except that upon activation they search for balloons but do not perform further recruitment.

The simulation proceeds by stepping forward in time to the activation time of the recruit at the top of the priority queue. This recruit is removed from the queue. It then performs any recruitment of its own, adds any such new recruits to the queue, and locates any balloons in its vicinity. The number of people recruited from each cell is counted as the simulation progresses and may not exceed the population of the cell. Any further recruitment from a cell beyond its population is ignored and assumed to represent a loop in the recruitment network. Passive recruits may later be recruited into an active role. However, agents selected as active recruits and later selected as passive recruits have no effect on the search process.
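Equations (1) and (2) above can be illustrated on a toy set of cells. The rank of cell j relative to cell i is the total population living strictly closer to i than j does, and the weight of j is the reciprocal of that rank. The sketch rests on the following assumptions:

- Distances are Euclidean between cell centres.
- The origin cell is populated, so that every rank is positive.
- Weights are returned only up to the proportionality constant in Eq. (1).

The names `rank_weights` and `recruit_type`, and the toy cells, are hypothetical.

```python
import math
import random

def rank_weights(origin, cells):
    """Weights proportional to P_ij in Eq. (1) for every cell j != origin.

    origin: (x, y) centre of a populated cell i.
    cells:  dict (x, y) -> population p_k, including the origin cell itself.
    Distances are Euclidean between cell centres (an assumption of this sketch).
    """
    dist = {c: math.dist(origin, c) for c in cells}
    weights = {}
    for j in cells:
        if j == origin:
            continue
        # rank of j: total population living strictly closer to i than j does
        rank = sum(p for k, p in cells.items() if dist[k] < dist[j])
        weights[j] = 1.0 / rank   # rank >= p_i > 0 because the origin is populated
    return weights

# Eq. (2) with n_background = 2.5 and n_rank = 5.5 from [6].
N_BACKGROUND, N_RANK = 2.5, 5.5
P_BACKGROUND = N_BACKGROUND / (N_BACKGROUND + N_RANK)

def recruit_type(rng):
    """Draw 'background' with probability P_BACKGROUND, else 'rank'."""
    return "background" if rng.random() < P_BACKGROUND else "rank"

# Hypothetical toy row of 1 km cells; the origin (0, 0) holds 10 people.
cells = {(0, 0): 10, (1, 0): 5, (2, 0): 5, (3, 0): 20}
w = rank_weights((0, 0), cells)   # 1/10, 1/15, 1/20 for increasing distance
```

Because ranks accumulate the population of all nearer cells, a cell just beyond a dense neighbourhood receives a small weight even if it is geographically close. This is how Eq. (1) produces the small New York and large Yuma neighborhoods in Fig. (4).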
Calculating the weights and ranks for the Liben-Nowell et al. neighborhood of every cell at this level of precision is extremely computationally demanding. Regions of low population density give rise to larger neighborhoods containing up to 20,000 other cells. Therefore the full set of cells and weights for each neighborhood was calculated in advance and retrieved from a database as required during each simulation. If a balloon is located within the search neighborhood of a recruit, that balloon is 'found' immediately. This is reasonable, since the agent in question is able to report any sightings made before her recruitment. A recruit finds any balloon within its search neighborhood of radius $r_{mob}$ with probability 1. Once the balloon in a cell has been found, any further recruitment from the population within that cell has no effect.

Figure (5) shows the results of a typical (unsuccessful) search simulation. Initially the number of recruits grows steadily but eventually saturates at around $5 \times 10^5$; likewise, the rate at which cells are searched decreases. The difference between the number of activated recruits (blue) and recruited but not activated individuals (red) represents the size of the action queue. When the two converge, there are no further agents waiting to act and the branching process terminates. Only the initial dynamics are displayed here: because of the skew in the waiting-time distribution, the final 20 recruits take 3 years to join the search.

Figure 5: Plot of the Number of Activated Recruits (blue), Number of Agents Recruited But Not Yet Activated (red) and Number of Cells Searched (green). Parameters are $n_{pass} = 400$, $r_{mob} = 1$ km.

B.2 Branching Factor Distribution
We determine the distribution of the branching factor to be sampled in our simulations by fitting to a subset of the 4,495 individuals who signed up for the balloon challenge.
We consider the initial round of recruitment by the MIT seed node to be atypical, since it targeted a number of individuals far greater than the average number of friends of an individual. In addition, a larger proportion of those targeted was likely to be recruited owing to the team's affinity with the challenge. Our simulations therefore begin with 164 seed recruits, and further recruitment proceeds according to the typical branching behaviour. Since seed nodes are likely to exert extra effort to recruit individuals in the initial stages of any social mobilization task, this setup maintains generality. Owing to the small sample set, we exclude from the fit several large outliers that were considered atypical, i.e. media outlets or other individuals with a strong affinity to the task. It must be emphasised that the exact distribution of the branching factor is difficult to determine owing to the sparsity and uniqueness of the data. In any case, its precise form is not important: since the mean is well below the tipping point, other processes dominate the search, as discussed in the main paper. The branching behaviour of the remaining 4,483 nodes was fitted to a power-law distribution with exponent $\alpha = 2.0786$ and mean $\langle R_0 \rangle = 0.8906$. In order to sample the power-law distribution appropriately, we construct a Harris discrete distribution function for branching factor k:

$$P(k) = \frac{H_{\alpha\beta}}{\beta + k^{\alpha}} \qquad (3)$$

where $H_{\alpha\beta}$ is chosen to ensure normalisation and β allows fitting to a given empirical mean value.

C Further Results
C.1 Parameter Exploration
Figure 6: Heat map of the average number of balloons located (top) and probability of success (bottom) in 100 distinct search simulations for different values of passive recruits and mobility radius.
Figure (6) shows the average number of balloons located (top) and the probability of locating all 10 balloons (bottom) in 500 simulations for each of a range of values of passive recruits and mobility radii.
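Returning to the branching model of Section B.2, the Harris distribution of Eq. (3) can be built and sampled as follows. This is a minimal sketch using the reported α = 2.0786 and ⟨R_0⟩ = 0.8906, and it rests on several assumptions that are not details given in the text:

- The support is truncated at a finite K_MAX = 10,000.
- β is chosen by bisection so that the mean matches ⟨R_0⟩.
- Samples are drawn by inverse-CDF lookup.

All function names are hypothetical. Bisection is valid here because, for every k > 0, the ratio P(k)/P(0) = β/(β + k^α) increases with β, so the mean grows monotonically with β.

```python
import bisect
import random

def harris_pmf(alpha, beta, k_max):
    """Eq. (3), truncated at k_max: P(k) = H / (beta + k**alpha), k = 0..k_max."""
    weights = [1.0 / (beta + k ** alpha) for k in range(k_max + 1)]
    h = 1.0 / sum(weights)                 # normalising constant H_{alpha beta}
    return [h * w for w in weights]

def mean_of(pmf):
    return sum(k * p for k, p in enumerate(pmf))

def fit_beta(alpha, target_mean, k_max, lo=1e-6, hi=1e3, iters=60):
    """Bisection on beta: the mean of the truncated pmf increases with beta."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_of(harris_pmf(alpha, mid, k_max)) < target_mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def make_sampler(pmf, rng):
    """Inverse-CDF sampler over k = 0..len(pmf) - 1."""
    cdf, acc = [], 0.0
    for p in pmf:
        acc += p
        cdf.append(acc)
    def draw():
        u = rng.random() * cdf[-1]
        return min(bisect.bisect_right(cdf, u), len(pmf) - 1)
    return draw

ALPHA, MEAN_R0, K_MAX = 2.0786, 0.8906, 10_000   # K_MAX is an assumption
beta = fit_beta(ALPHA, MEAN_R0, K_MAX)
pmf = harris_pmf(ALPHA, beta, K_MAX)
draw = make_sampler(pmf, random.Random(7))
branching_factors = [draw() for _ in range(1000)]
```

With α only slightly above 2, the sum Σ k P(k) converges slowly, so the fitted β depends noticeably on the truncation K_MAX. Any real use should state the cutoff alongside β.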
