"RIOTS AND SOCIABILITY: A CASE STUDY OF HUMAN INTERACTION NETWORKS IN QATIF, SAUDI ARABIA"

BY MICHAEL ANGELO GRECO III B.S. COMPUTER SCIENCE, MATHEMATICS, ART UNIVERSITY OF WISCONSIN MADISON, 2006

SUBMITTED TO THE DEPARTMENT OF URBAN STUDIES AND PLANNING AND THE ENGINEERING SYSTEMS DIVISION IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREES OF MASTER IN CITY PLANNING AND MASTER OF SCIENCE IN TECHNOLOGY POLICY AT THE MASSACHUSETTS INSTITUTE OF TECHNOLOGY SEPTEMBER 2014

© 2014 MASSACHUSETTS INSTITUTE OF TECHNOLOGY. ALL RIGHTS RESERVED.

Signature of Author: Signature redacted
Department of Urban Studies and Planning, Engineering Systems Division
[month illegible] 3, 2014

Certified by: Signature redacted
Carlo Ratti
Professor of the Practice, Department of Urban Studies and Planning
Thesis Supervisor

Accepted by: Signature redacted
Dennis Frenchman
Professor of Urban Design and Planning
Chair, MCP Committee, Department of Urban Studies and Planning

Accepted by: Signature redacted
Dava J. Newman
Professor of Aeronautics and Astronautics and Engineering Systems
Director, Technology and Policy Program

"Riots and Sociability: A Case Study of Human Interaction Networks in Qatif, Saudi Arabia" by Michael Angelo Greco III

Submitted to the Department of Urban Studies and Planning and the Engineering Systems Division in partial fulfillment of the requirements for the degrees of Master in City Planning and Master of Science in Technology Policy

ABSTRACT

Since the onset of the Arab Spring in late 2010, waves of political activism have reverberated across much of the Arab world. A growing body of literature has emerged that explores how new communications and social media technologies have contributed to, and in certain cases instigated, various forms of collective action. However, little research has examined the effect of these activities on communication patterns themselves. This thesis aims to investigate the reorganization of sociability under civil duress at an aggregate, urban scale.

The study employs a novel approach to communications analysis, applying the Synthetic Control Method to estimate the causal effect of riots on different characteristics of human interaction within Qatif, Saudi Arabia, after an exogenous shock triggered a surge in public demonstrations. The analysis reveals a strong, statistically significant drop in total call volume, relative to other cities in Saudi Arabia. This is combined with a similarly strong and statistically significant drop in unique daily callers, demonstrating that people were not only making fewer calls; fewer people were participating in the telecom network each day. Interestingly, daily phone activity is shown to increase within the subnetwork of users identified to hold strong spatiotemporal ties to the city, even though their total activity measures (which include connections both internal and external to the subnetwork) remain constant. This suggests a shift in callee preference for individuals who are more directly affected by urban unrest. Lastly, information transmission tests are performed on Qatif's pre- and post-treatment interaction networks. Initial research shows that, beyond a 26% diffusion threshold, information reaches more people faster through the post-treatment network. This provides some support to the hypothesis that communities under duress intelligently reorganize communications to increase dissemination speed and breadth; however, further research will be required to refine these findings and demonstrate a causal link.

Thesis Supervisor: Carlo Ratti Title: Professor of the Practice, Department of Urban Studies and Planning

Contents

1 INTRODUCTION

2 DATA AND PROCESSING
2.1 Call Detail Records
2.2 Tweets
2.3 City Selection and Data Aggregation
2.4 Data Limitations

3 METHODS
3.1 Synthetic Control Methods

4 ANALYSIS: CALL BEHAVIOR

5 ANALYSIS: INTER AND INTRACITY CALLING PATTERNS
5.1 Location Identification
5.2 Urban Call Counts

6 ANALYSIS: TWITTER ACTIVITY
6.1 Geotagged Activity

7 FUTURE DIRECTIONS
7.1 Location Estimation for Non-Geotagged Tweets
7.2 Communication Networks
7.3 Religiosity

8 DISCUSSION

APPENDICES

A APPENDIX: CALL BEHAVIOR

B APPENDIX: INTER AND INTRACITY CALLING PATTERNS

C APPENDIX: TWITTER ACTIVITY

REFERENCES

Listing of figures

1.0.1 Protest Images From Qatif Following Ahmad al-Matar's Death. Found at: http://khaleejsaihat.com/web3/showthread.php?t=129754
2.1.1 Geographic Distribution of Cell Towers in Saudi Arabia
2.1.2 Left: Service Type Histogram, Right: Service Detail Description Histogram
2.1.3 Phone Activity Timeline Over Study Period (Top), Daily Phone Activity Timeline of Saudi Arabia, Dec. 12th (Bottom)
2.2.1 Tweet Timeline Over Study Period (Top), Daily Tweet Timeline of Saudi Arabia, Dec. 12th (Bottom)
4.0.1 Daily call distributions for Dec. 21st and Dec. 28th for All KSA governorates (Left), and Qatif (Right)
4.0.2 Trends in total daily network activity, Qatif vs. Other Saudi Governorates, Dec. 20th - Jan. 3rd (Top), and Trends in Average Daily Call Duration, Qatif vs. Other Saudi Governorates, Dec. 20th - Jan. 3rd (Bottom). "Treatment" indicated by dashed pink line
4.0.3 Trends in Total Network Activity, Qatif and Synthetic Qatif (Left), and Total Network Activity Gap Between Qatif and Synthetic Qatif (Right)
4.0.4 Trends in Average Daily Call Duration, Qatif and Synthetic Qatif (Left), and Average Daily Call Duration Gap Between Qatif and Synthetic Qatif (Right)
4.0.5 Trends in Unique Callers, Qatif and Synthetic Qatif (Left), and Gap in Unique Callers, Qatif and Synthetic Qatif (Right)
4.0.6 Synthetic Control Placebo Tests with Sabya. Total Daily Network Activity (Left), Average Call Duration (Middle), and Daily Unique Callers (Right)
4.0.7 Across-Unit Placebo Tests: Total Activity (all, 500x or less, 100x or less, 50x or less)
4.0.8 Across-Unit Placebo Tests: Daily Unique Callers (all, 500x or less, 100x or less, 50x or less)
4.0.9 In-Time Placebo Tests with Qatif. Total Daily Network Activity (Left) and Average Call Duration (Right)
5.2.1 Trends in standardized intra (top), inter-in (middle), and inter-out (bottom) call volumes daily network activity, Qatif (solid) vs. other Saudi governorates (dashed). Dec. 20th - Jan. 3rd. Treatment indicated by dashed pink line
5.2.2 Trends in Intra Call Volumes
5.2.3 Trends in Inter-In Call Volumes
5.2.4 Trends in Inter-Out Call Volumes
6.1.1 Trends in standardized daily Tweet volume (top), Tweet length (middle), and Tweets per user (bottom), Qatif (solid) vs. other Saudi governorates (dashed). Dec. 20th - Jan. 3rd. Treatment indicated by dashed pink line
6.1.2 Trends in Total Tweet Activity, Qatif and Synthetic Qatif
6.1.3 Trends in Average Tweet Length, Qatif and Synthetic Qatif
6.1.4 Trends in Tweets Per User, Qatif and Synthetic Qatif
7.1.1 Trends in Total Tweet Activity, Qatif and Synthetic Qatif
7.1.2 Trends in Average Tweet Length, Qatif and Synthetic Qatif
7.1.3 Trends in Tweets Per User, Qatif and Synthetic Qatif
7.2.1 Total Degree Distribution (Left), and Edge Weight Distribution (Right) of the Complete Reciprocated Network, KSA
7.2.2 Fraction of Infected Nodes as Function of Time (Top), Number of Infected Nodes at each instance of t (Middle), and Distributions of Edge Weights Responsible for Infection
7.3.1 Daily Network Activity Distributions from (Western Saudi Arabia), (Central Saudi Arabia), and the Eastern Region
7.3.2 Trends in Daily Prayer Time Disruption, All KSA Cities. Qatif drawn in pink
A.0.1 Total Network Activity, Qatif and Synthetic Qatif (3 Weeks)
A.0.2 Number of Unique Daily Callers, Qatif and Synthetic Qatif (3 Weeks)
B.0.1 Intra Call Activity Synthetic Control Placebo Test with Samteh (Left), In-time Intra Call Activity Placebo with Qatif (Right)
B.0.2 Daily Local Call Activity
B.0.3 Across-Unit Placebo Tests: Intra Call Activity (all, 20x or less, 10x or less, 5x or less)
B.0.4 Intra Call Activity, Qatif and Synthetic Qatif (3 Weeks)
C.0.1 Tweets Per User Synthetic Control Placebo Test with Ahad Rufaydah (Left), In-time Tweets Per User Placebo with Qatif (Right)
C.0.2 Across-Unit Placebo Tests: Tweets Per User (all, 50x or less, 20x or less, 5x or less)

1 Introduction

This paper seeks to explore how social unrest affects broad-scale sociability in a city or region. Since the Arab Spring in the early 2010s, there have been waves of political activism across much of the Arab world, including the Kingdom of Saudi Arabia. A growing body of literature has developed to investigate how new communications and social media technologies have contributed to, and in certain cases instigated, various forms of collective action. A few studies have considered the impact of mobile phone access in facilitating collective action, though most have narrowed in on the effects of a specific emerging technology, like Twitter or Facebook. Furthermore, these studies follow a wide range of methods that can be broadly grouped into the following three categories: qualitative approaches that relied on survey data and expert interviews; quantitative approaches that characterized the nature of communication patterns through these new media outlets; and more advanced analytical methods that sought to isolate the role various communications media played during social unrest. Examples of these types of research will be discussed in brief over the remainder of this section.

Expert interviews are a popular method for subjectively exploring the impact of social connectivity in urban environments. Tufekci et al. examined the protests in Egypt by surveying participants in Tahrir Square. They argued that respondents who used social media were much more likely to attend the demonstrations on the first day. They further noted that approximately half of those questioned spread media from the protests online, and that in many cases communication through social media, mobile phones and face-to-face conversations superseded the role of traditional news media during the protests. Thus, the authors concluded that social technologies were critical in diffusing information related to, and encouraging involvement in, the activist intervention in Tahrir Square [1]. Similarly, Breuer et al. used a qualitative approach to characterize the role of social media in the Tunisian Revolution. Using a combination of expert interviews with protest participants and preference survey data from Tunisian internet users, the authors claimed that new social media forms helped overcome a government-enforced media blackout by enabling activists to broadcast information on the movement. Further, they found these collaborative technologies facilitated the emergence of partnerships between activist groups, and encouraged a kind of 'emotional mobilization' through the depiction of the regime's atrocities in the uncensored content [2].

Qualitative research has been a popular method for identifying a symbiosis between online communication technologies and social unrest. Another body of research pushed the relationship between social media and collective action further by quantifying aspects of user behavior in response to specific conflict events. Lotan et al. compared and contrasted Tweet broadcasting and news consumption in Tunisia and Egypt [3]. At a high level, this study reiterated the importance of social media as a communications medium during periods of cultural instability. Furthermore, they built on the existing state of the literature by highlighting important differences and similarities in information flow across cultures [3]. They found that tweets produced by both independent activists and mainstream news outlets produced larger responses in Egypt than in Tunisia. On the other hand, Tunisians responded more to news disseminated by blogger sites, while both countries showed a heavy reliance on information spread by journalists. This research suggests a link between social media platforms and a city's cultural fabric, even though pinpointing this exact relationship was beyond the scope of the paper.

The papers by Szell et al. and Bagrow et al. examined how the social temperature of a city or region manifested itself in communication patterns. Szell et al. studied the nature of collective reaction to major events like presidential elections, sports tournaments, and weather abnormalities on Twitter. They found a negative correlation between users' excitation and message length [4]. Bagrow et al. also explored how emotions impacted communication patterns. They used geotagged mobile phone call records gathered from cities in emergency states. They found that spikes in communications volume were spatially and temporally bounded, and asserted that affected individuals "will only invoke the social network to propagate information under the most extreme circumstances" [5]. On the other hand, even though Szell et al. and Bagrow et al.'s research suggested that the importance of communication mediums increased during periods of social excitation, not all elements of social media played a strong role in these events. Aday et al. investigated the role of specific features of social media in intra-country collective and regional diffusion. They limited their analysis to exploring how bit.ly links (a URL shortening service popular on Twitter) were used to share and consume relevant protest data [6]. They found no evidence that the service played a significant role in Tunisia, Egypt, Bahrain, or Libya.

A final subset of papers on social media and sociability posits that a causal relationship exists between communication technology market penetration and the probability of social action. Pierskalla et al. explored the effects of mobile phone access on collective action in Africa. The authors employed UCDP conflict data from 1989 to 2010 alongside data on mobile phone coverage from the GSMA. Using a number of binary dependent variable models, they showed the availability of mobile phone coverage significantly and substantially increased the likelihood of violent conflict in the region [7]. They further concluded that the adoption of communication technologies produced intrinsic changes in a city or region's communications patterns and willingness to collaborate. They also noted that the ability to assemble did not automatically yield a social good; in their case it facilitated violent collective action and increased overall instability in the study region.

Most of the research on communication technologies and collective action has focused on the utilization of social media and mobile phones. Researchers tend to agree that these new technologies facilitate collective action, although there are disputes over the scale of their impact. Researchers also agree that engagement attributes measured through these technologies change to reflect states of heightened tension. However, few studies have addressed the broader social implications of these communication devices. This analysis builds on existing studies by estimating the causal effect of civil unrest on various characteristics of the human interaction networks as expressed through Call Detail Records (CDRs), and social media as expressed through Twitter messages. Using Synthetic Control Methods, differences in call and tweet behavior are calculated between a region experiencing riots (Qatif, Saudi Arabia, the treated unit) and regions that do not experience riots (other cities in Saudi Arabia, the control units). Surprisingly, the results show that total daily network activity across the city of study significantly decreases with treatment, while call activity within the subset of individuals who hold strong spatiotemporal ties to the region significantly increases. The effects on average call duration and inter-city call volumes remain inconclusive. When examining tweets, the number of Tweets per user, per day drastically increases at the symbolic climax of the demonstrations, but placebo tests indicate that this finding is not robust. Lastly, preliminary information transmission tests performed on city-scale interaction networks show that information reaches more people faster during the post-treatment period; however, further research will be required to refine these findings and demonstrate a causal link.

CONTEXT: THE DEATH OF AHMAD AL-MATAR

Large-scale public demonstrations have plagued Saudi Arabia since February 2011. The protests have been mainly concentrated in the oil-rich Eastern Province. The Eastern Province, and the city of Qatif in particular, is home to the largest proportion of Shiites, members of the minority denomination of Islam. Sunni-Shia relations have a history of tension in the nation, and, following the early events of the Arab Spring in late 2010, sparks of unrest began to reignite. A movement began to coalesce around the early demonstrations of 2011, with protesters calling for the release of political prisoners, freedom of expression and assembly, and an end to widespread discrimination against Shiites. In February, after seven young Shiites were killed, the country experienced its largest collective uprising since 1979 [8]. Protests then occurred on an almost regular basis before cooling off through the spring and early summer of 2012.

This period of relative tranquility was broken on July 8th with the shooting and arrest of Nimr al-Nimr [8], a Shia Sheikh and outspoken leader of the movement, which re-escalated tensions and sparked a new wave of demonstrations [9]. Large protests engulfed the city of Qatif and quickly became violent, leading to the deaths of two more protestors [10]. Security forces began to crack down on dissidents, pursuing 23 men whom the government claimed were wanted for inciting unrest in Qatif. Raids in late September brought about the deaths or injuries of several of these men [11].

As time passed, younger activists in the region began to incorporate protest tactics employed by Bahraini youth [10], which included the nightly burning of tires on the roads around the city. It was at one of these demonstrations, near midnight of December 27th, that teenage Shia activist Ahmad al-Matar was shot dead by security forces. In spite of little media coverage, reports indicate the protest was held to demand the release of political prisoners, and several other protesters were injured and/or arrested [12]. This event set in motion a wave of riots and demonstrations throughout the city that culminated in a funeral procession on December 31st with an estimated crowd of 50,000 [10]. Activists also took to Twitter, starting a campaign that used the hashtag "We All Are Qatif" [13].

Since protests occur on a regular basis in Qatif, the study has been framed around al-Matar's death as an exogenous 'treatment' applied to the social fabric of the city.

Figure 1.0.1: Protest Images From Qatif Following Ahmad al-Matar's Death. Found at: http://khaleejsaihat.com/web3/showthread.php?t=129754

2 Data and Processing

2.1 CALL DETAIL RECORDS

Call Detail Records (CDRs) are the primary dataset of interest in this study. The records cover 12/03/12 to 1/3/13. They were obtained from a Saudi telecommunications agency that offers mobile services to the Kingdom. Cellular activity is one of the most powerful real-time sensing mechanisms currently available to us; the ubiquity of digital devices allows us to capture extremely high-resolution traces of humanity across a variety of dimensions. Saudi Arabia's mobile phone penetration is above 198%, an astonishing figure suggesting that many across the Kingdom own more than one mobile device.

The data cover the entire country of Saudi Arabia, with over 100 million daily network connections to over 10 thousand unique cell towers, with approximately 18 million unique phones. The CDR dataset consists of anonymous location measurements generated each time a device connects to the cellular network. Each anonymized record holds a precise time and duration measure for the connection, the caller's location (by cell tower), the 'service type', and 'service detail description.' The service type consists of an identifier that logs the type of origin and destination telephones. The 'service detail description' describes the record's type of communication, e.g. voice, data, SMS, etc. There are 486 unique codes; however, only 135 appear in the dataset. SMS activities are among those excluded. Internet requests were found to hold broken spatial identifiers and were consequently excluded. Thus, for the purposes of this study, all but voice activity have been culled from the dataset.

Figure 2.1.1: Geographic Distribution of Cell Towers in Saudi Arabia

Figure 2.1.2: Left: Service Type Histogram, Right: Service Detail Description Histogram

To construct the composite data table, all tower pings were summed per city, per day to arrive at a total activity measure. Average daily call duration, the number of unique callers, and the number of calls per individual were constructed in a similar fashion. Lastly, these results were combined with population statistics obtained from KSA's Ministry of Economy & Planning, Central Department of Statistics & Information, Department of Analysis & Reports. The final dataset includes percentages of men and women, and non-Saudis, for each governorate from the year 2010.
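The per-city, per-day aggregation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline actually used in the study: the record layout (caller ID, city, ISO timestamp, duration in seconds), the sample rows, and the function name `aggregate_daily` are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical record layout (not the study's actual schema):
# (caller_id, city, ISO timestamp, call duration in seconds)
records = [
    ("u1", "Qatif", "2012-12-21T09:15:00", 60),
    ("u1", "Qatif", "2012-12-21T18:40:00", 120),
    ("u2", "Qatif", "2012-12-21T19:05:00", 30),
    ("u3", "Riyadh", "2012-12-21T13:00:00", 90),
]


def aggregate_daily(records):
    """Collapse individual call records into one row per (city, day)."""
    calls = defaultdict(int)      # number of connections
    seconds = defaultdict(int)    # summed call duration
    callers = defaultdict(set)    # distinct caller IDs
    for caller, city, stamp, duration in records:
        day = datetime.fromisoformat(stamp).date().isoformat()
        key = (city, day)
        calls[key] += 1
        seconds[key] += duration
        callers[key].add(caller)
    return {
        key: {
            "total_activity": calls[key],
            "avg_duration": seconds[key] / calls[key],
            "unique_callers": len(callers[key]),
            "calls_per_caller": calls[key] / len(callers[key]),
        }
        for key in calls
    }


daily = aggregate_daily(records)
```

Each resulting row corresponds to one city-day observation, which is the unit of analysis the synthetic control comparison needs; population covariates would be joined onto these rows by city.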

The top panel of Figure 2.1.3 presents a snapshot of daily activity across the nation. To capture activity at a more granular scale, a daily histogram of call activity was recorded at 15-minute intervals over the course of each day (shown in the bottom panel). Each day follows a very stable pattern of low early-morning activity, a mid-day peak around 1:00pm, a lull in calls until approximately 3:00pm, and a daily maximum between 6:00 and 7:00pm. The overall stability of these plots suggests that it may be possible to detect a city-wide disruption.
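As a rough illustration of the 15-minute binning, the sketch below counts events in each of the 96 quarter-hour slots of a day. The timestamps are invented and the helper names `quarter_hour_histogram` and `bin_label` are hypothetical; the study's own tooling is not described in the text.

```python
from collections import Counter
from datetime import datetime


def quarter_hour_histogram(timestamps):
    """Count events in each of the 96 fifteen-minute bins of a day."""
    counts = Counter()
    for stamp in timestamps:
        t = datetime.fromisoformat(stamp)
        # Minutes since midnight, integer-divided into 15-minute slots
        counts[(t.hour * 60 + t.minute) // 15] += 1
    return [counts.get(b, 0) for b in range(96)]


def bin_label(index):
    """Return the start time of a bin as HH:MM."""
    minutes = index * 15
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


# Invented sample timestamps, for illustration only
sample = [
    "2012-12-12T13:02:00",
    "2012-12-12T13:14:59",
    "2012-12-12T18:30:00",
]
hist = quarter_hour_histogram(sample)
busiest = max(range(96), key=lambda b: hist[b])
```

Comparing such histograms for the same weekday before and after an event is one simple way to look for a departure from the stable daily pattern.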

2.2 TWEETS

The second dataset consists of messages posted on Twitter from Saudi Arabia over the study period (12/20/12 to 1/3/13). Twitter is a social networking service that allows users to share and read 'Tweets,' which the company defines as expressions of a moment or idea [14]. Tweets are limited to 140 characters and can be posted through a website interface, text message, or mobile app. Since its founding in 2006, Twitter has become increasingly popular across the globe. While its usage varies from country to country, a 2012 survey of European and Middle East markets conducted by GlobalWebIndex found that 51% of Saudi Arabian internet users are active on Twitter, the highest penetration rate of any locale in their report [15]. Saudi Arabia was also found to hold the fastest rate of growth over much of 2012.

Figure 2.1.3: Phone Activity Timeline Over Study Period (Top), Daily Phone Activity Timeline of Saudi Arabia, Dec. 12th (Bottom)

The Twitter dataset used in this study is composed of geotagged Tweets from Twitter's Decahose service, which provides a "statistically valid sample of at least 10% of all Tweets, selected at random" [16]. Each record contains a user identifier, message (up to 140 characters long), timestamp, and location (represented as a pair of latitude and longitude coordinates). These messages were posted by users who chose to include additional locational metadata with each Tweet. Roughly 0.001% of the total Tweet stream is geotagged. For the Saudi dataset (tweets corresponding to the cities under comparison) this amounts to about 160,000 tweets from 62,000 unique users with an average message length of 70.5 characters. While this dataset is meager in comparison to CDRs, a method for estimating locations for non-geotagged Tweets is presented in Chapter 7.

Figure 2.2.1: Tweet Timeline Over Study Period (Top), Daily Tweet Timeline of Saudi Arabia, Dec. 12th (Bottom)

Unlike the phone activity shown above, daily tweet distributions shown in Figure 2.2.1 are significantly noisier and don't appear to follow an obvious pattern. It may be harder to characterize daily patterns and detect changes introduced by an exogenous shock.

2.3 CITY SELECTION AND DATA AGGREGATION

To isolate the treatment effects on Qatif, additional Saudi cities were used as units of comparison. All cities that held a population of over 100,000 were selected, amounting to a total of 40 urban governorates (including Qatif). A city's collection of cell towers was identified by intersecting all towers with its geographic boundaries. The call records were then clustered by tower ID. The geotagged tweets were simply aggregated by city boundaries.
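The tower-to-city assignment can be illustrated with a plain point-in-polygon test. The sketch below uses a standard ray-casting check; the rectangular boundaries, tower coordinates, and names such as `assign_towers` are made up for the example, and the text does not say which GIS tooling was actually used for the intersection.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_towers(towers, boundaries):
    """Map each tower ID to the first city whose boundary contains it."""
    assignment = {}
    for tower_id, (lon, lat) in towers.items():
        for city, polygon in boundaries.items():
            if point_in_polygon(lon, lat, polygon):
                assignment[tower_id] = city
                break
    return assignment


# Made-up rectangular boundaries and tower coordinates, for illustration
boundaries = {
    "CityA": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    "CityB": [(3.0, 0.0), (5.0, 0.0), (5.0, 2.0), (3.0, 2.0)],
}
towers = {"t1": (1.0, 1.0), "t2": (4.0, 0.5), "t3": (10.0, 10.0)}
tower_city = assign_towers(towers, boundaries)
```

Towers that fall outside every boundary (here `t3`) are simply left unassigned, so their records drop out when calls are grouped by city. The same test could be applied to the latitude and longitude of each geotagged tweet.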

2.4 DATA LIMITATIONS

After exploring the basic properties of the phone call and tweet datasets it's worth underscoring their limitations. Both are susceptible to some degree of sampling bias. Regarding the phone records, the data lack explicit market share values per city. While this could be calculated indirectly through demographic indicators, it remains difficult to know how representative the sample is of each location. In spite of concerns related to possible sampling biases of CDRs [17], they remain one of the most comprehensive data sources available in representing large-scale human interaction. The geotagged tweets, on the other hand, capture high spatial resolution, but the population coverage is not nearly as high as the CDRs. Data from Twitter are also liable to other demographic biases, as sampling individuals who participate in online media is inherently biased towards groups who have access to the internet. Lastly, and this is true of both Tweets and CDRs, a user may not be tied to a single individual, and a single individual may not be tied to a single user. It's entirely possible for one person to hold multiple accounts (e.g. an individual owning a phone for business and personal use), as it's possible for a group to communicate through one account (e.g. a company utilizing bots for automated calling or Tweeting). This should not be detrimental to this study due to the level of aggregation in much of the analysis. However, care has been taken to eliminate this bias in specific instances which examine narrower subpopulations.


3 Methods

3.1 SYNTHETIC CONTROL METHODS

The Synthetic Control Method (SCM) is used as the primary methodological tool in this study. The statistical technique was developed by Abadie et al. as a means to investigate causal inference in comparative case studies with aggregate data [18]. SCM was primarily developed as a means to assess the impact of policy interventions that are applied at an aggregate level, or the effects of a 'treatment' that has been implemented at an aggregate scale (e.g., over a country, region, or city), to a small number of units. The traditional approach to comparative case studies of this nature is to use a control group's outcome to approximate the outcome that would have been observed for the treatment group in the absence of treatment. The choice of control units is typically at the researcher's discretion, which has aroused questions over whether or not the control can be interpreted as a plausible counterfactual. It is also difficult to find a single untreated unit that appropriately approximates the unit that has received treatment. SCM overcomes this by implementing a data-driven selection process for the control group, offering a much more empirically defined means of inference. SCM incorporates a weighted combination of units to better approximate the unit that has been exposed to the treatment [19].

The method was first used to examine the economic effects of conflict in the Basque Country, where the authors found that after an outbreak of terrorism in the 1970s, the region's per capita GDP declined about 10 percent [18]. SCM was then applied to California's Proposition 99, a cigarette tax enacted in 1988. The authors estimated that in 2000 the annual per-capita cigarette sales were roughly 26 packs lower than what they would have been without the tax [20]. This study represents the first time this method has been applied to data on a daily timescale.

In this study the synthetic control approach will be employed to select a combination of urban governorates (a special Saudi designation for cities at the second level of regional administration within the country) to construct a better comparison for the governorate exposed to the treatment than any single governorate alone. The potential controls were chosen from a list of all Saudi governorates that had a population greater than 100,000 as of the 2010 official census.

Qatif is the treated unit, as the riots were concentrated there. Other cities in the Eastern Province may have experienced heightened unrest during the treatment period as well. With this in mind, these cities were included in the donor pool, but the synthetic control method did not make significant use of them in its construction of synthetic Qatif. This permits the assumption of no interference between units, that is, no violation of the stable unit treatment value assumption. Additionally, it is assumed that the treatment has no effect on the outcome variables before the implementation period. However, this may be a strong assumption since, as stated previously, the Eastern Province has been experiencing unrest since February 2011. Following Abadie et al. [21], the model works out as follows:

For units $i = 1, \dots, J+1$ and time periods $t = 1, \dots, T$, let:

* $T_0$ be the number of pre-treatment periods, with $1 \le T_0 < T$

* $Y_{it}^N$ be the dependent variable for unit $i$ at time $t$ in the absence of treatment

* $Y_{it}^I$ be the dependent variable for unit $i$ at time $t$ if unit $i$ is exposed to treatment in periods $T_0 + 1$ to $T$.

Only the first city (Qatif), $i = 1$, is exposed to treatment after period $T_0$, thus: $D_{it} = 1$ if $i = 1$ and $t > T_0$, and $D_{it} = 0$ otherwise.

The observed outcome for unit $i$ at time $t$ is $Y_{it} = Y_{it}^N + \alpha_{it} D_{it}$. The desired estimate is $\alpha_{1t} = Y_{1t}^I - Y_{1t}^N = Y_{1t} - Y_{1t}^N$ for $t > T_0$. $Y_{1t}^I$ can be observed, so to estimate $\alpha_{1t}$ an estimate of $Y_{1t}^N$ is required, which can be given by the factor model:

$$Y_{it}^N = \delta_t + \theta_t Z_i + \lambda_t \mu_i + \varepsilon_{it}$$

where $\delta_t$ represents the unobserved common time-dependent factor; $\theta_t$ is a vector of unknown parameters; $Z_i$ is a vector of observed covariates not affected by the treatment; $\lambda_t$ is a vector of unobserved common factors; $\mu_i$ is a vector of unobserved covariates; and $\varepsilon_{it}$ are error terms representing unobserved transitory shocks. The estimate of $\alpha_{1t}$ will be unbiased if a $(J \times 1)$ vector of weights $W = (w_2, \dots, w_{J+1})'$ is chosen such that $w_j \ge 0$ for $j = 2, \dots, J+1$ and $w_2 + \dots + w_{J+1} = 1$, where each particular value of the vector $W$ represents a potential synthetic control. Suppose that there are $(w_2^*, \dots, w_{J+1}^*)$ such that

$$\sum_{j=2}^{J+1} w_j^* Y_{j1} = Y_{11}, \quad \dots, \quad \sum_{j=2}^{J+1} w_j^* Y_{jT_0} = Y_{1T_0}, \quad \text{and} \quad \sum_{j=2}^{J+1} w_j^* Z_j = Z_1.$$

The synthetic control units are selected such that this equation can hold approximately given an appropriate number of pre-treatment time periods.

Let $J$ be the number of available control units and $W = (w_2, \ldots, w_{J+1})'$ be a $(J \times 1)$ vector of nonnegative weights which sum to 1. The scalar $w_j$ $(j = 2, \ldots, J+1)$ is the weight of region $j$ in synthetic Qatif. Let $X_1$ be a $(K \times 1)$ vector of pre-treatment characteristics for the treated unit Qatif. Let $X_0$ be a $(K \times J)$ matrix which contains the values of the same variables for the $J$ possible control governorates. A vector of weights $W^*$ is chosen to minimize

$$\|X_1 - X_0 W\|_V = \sqrt{(X_1 - X_0 W)' V (X_1 - X_0 W)}$$

subject to $w_j \geq 0$ $(j = 2, \ldots, J+1)$ and $w_2 + \cdots + w_{J+1} = 1$. $V$ is a $(K \times K)$ diagonal matrix that assigns weights to linear combinations of the variables in $X_0$ and $X_1$ to minimize the mean square prediction error (MSPE) of the synthetic control estimator.
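The constrained minimization over $W$ can be illustrated with a small numerical sketch. The optimizer used in this study is not described here, so the example below substitutes a simple projected gradient descent on the squared $V$-weighted distance (which has the same minimizer as the norm). The diagonal of $V$ is taken as given, the outer search over $V$ is not shown, and all function names are hypothetical.

```python
from typing import List


def project_to_simplex(v: List[float]) -> List[float]:
    """Euclidean projection of v onto {w : w_j >= 0, sum_j w_j = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        candidate = (cumulative - 1.0) / k
        if u_k - candidate > 0:
            theta = candidate  # keep the threshold of the last valid k
    return [max(x - theta, 0.0) for x in v]


def fit_weights(x1: List[float], x0: List[List[float]],
                v_diag: List[float], steps: int = 2000) -> List[float]:
    """Minimize (X1 - X0 W)' V (X1 - X0 W) over the simplex.

    x1 has K entries, x0 is K rows by J columns, v_diag is the diagonal of V.
    """
    k, j = len(x0), len(x0[0])
    w = [1.0 / j] * j  # start from equal weights
    # Upper bound on the gradient's Lipschitz constant: 2 * trace(X0' V X0).
    lipschitz = 2.0 * sum(v_diag[r] * sum(x * x for x in x0[r]) for r in range(k))
    step = 1.0 / lipschitz if lipschitz > 0 else 1.0
    for _ in range(steps):
        resid = [x1[r] - sum(x0[r][c] * w[c] for c in range(j)) for r in range(k)]
        grad = [-2.0 * sum(x0[r][c] * v_diag[r] * resid[r] for r in range(k))
                for c in range(j)]
        w = project_to_simplex([w[c] - step * grad[c] for c in range(j)])
    return w
```

Because the weights are restricted to the simplex, the resulting synthetic unit is always a convex combination of donor cities, which prevents extrapolation beyond the range of the donor pool.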


4 Analysis: Call Behavior

As an introductory sanity check, daily call distributions were computed for Qatif by itself, and for all other governorates in aggregate, for two days: December 21st, the first pre-treatment Friday in the dataset, and December 28th, the first post-treatment Friday in the dataset. The first figure depicts the daily distributions for all governorates except Qatif and shows very little variation over the course of the day (Figure 4.0.1, left), while the second figure demonstrates a clear decrease in combined activity between 9am and 9pm (Figure 4.0.1, right).

Figure 4.0.2 plots the trends in total network activity (top) and call duration (bottom) in Qatif and the rest of the governorates in the KSA. There exists some similarity between the plots during the pre-treatment period, with some divergence from the treatment day onward. However, it remains difficult to judge how closely the aggregate group of cities compares with Qatif. The plot of average call duration is even harder to compare visually, but it is worth noting the pronounced jump in Qatif's average call duration at the beginning of the post-treatment period, before it again falls below the country-wide average.

Figure 4.0.1: Daily call distributions for Dec. 21st and Dec. 28th for all KSA governorates (Left), and Qatif (Right)

Synthetic Qatif is constructed as a weighted average of potential control cities with weights chosen so that the result best reproduces the values of a set of predictors of sociability before the riots began on December 28th. The initial variables of interest are: total daily network activity, daily unique users, and the average duration of daily calls.

Using the synthetic control method described previously, two distinct synthetic Qatifs are constructed such that they mirror the selected predictors. The treatment effects are then estimated as the differences between Qatif and its synthetic versions in the days following.

The predictors of total daily network activity, daily unique callers, and average call duration are the same. They include:

1. total network activity in the week before treatment, or daily unique callers in the week before treatment

2. average duration of calls in the week before treatment

3. percent of males as obtained from the Department of Analysis & Reports in 2010

4. percent of Saudis, also obtained from the Department of Analysis & Reports in 2010

Figure 4.0.2: Trends in total daily network activity, Qatif vs. Other Saudi Governorates, Dec. 20th - Jan. 3rd (Top), and Trends in Average Daily Call Duration, Qatif vs. Other Saudi Governorates, Dec. 20th - Jan. 3rd (Bottom). "Treatment" indicated by dashed pink line

Table 4.0.1 displays the balance between Qatif and synthetic Qatif for the outcome variable total network activity. As explained previously, the matrix V was chosen to minimize the MSPE during the pre-treatment week; the values associated with total network activity, average call duration, and percent men were the largest (see Appendix A). Table 4.0.2 displays the weights of each control city in synthetic Qatif. Synthetic Qatif is largely composed of a combination of Sabya and Abu Arish. Both Sabya and Abu Arish are less developed than Qatif. Sabya, like Qatif, is known for having a higher concentration of Shiites than most cities in Saudi Arabia.

Table 4.0.1: Total Daily Activity Predictor Means

Variables                 Treated   Synthetic   Sample Mean
avgDuration (days 1-7)    122.141   122.208     137.957
totalActivity (days 1-7)  12.400    12.402      13.059
percentMen                0.545     0.570       0.600
percentSaudi              0.870     0.855       0.822

Table 4.0.2: Total Daily Network Activity Governorate Weights in Synthetic Qatif

Governorate      Weight   Governorate         Weight
Al Bahah         0.010    Ar Rass             0.000
Ar ar            0.007    Unayzah             0.007
Sakaka           0.007    Al Riyadh           0.000
Yanbu Al Bahar   0.004    Ad Duwadimi         0.010
Buraydah         0.004    Al Majmaah          0.009
Al Kharj         0.005    Al Quwayiyah        0.000
Khamis Mushayt   0.004    Abha                0.003
Ha il            0.003    Ahad Rufaydah       0.005
Sabya            0.435    Al Majardah         0.003
Al Qunfidhah     0.080    Bishah              0.006
Najran           0.002    Al Qhazaiah         0.006
Tabuk            0.001    Abu Arish           0.311
Ad Dammam        0.003    Ahad Al Masarihah   0.008
Al Ahsa          0.003    Jizan               0.006
Al Jubayl        0.004    Samtah              0.008
Al Khubar        0.004    Al Taif             0.002
Haft Al Babin    0.006    Al Lith             0.008
Al Quryyat       0.013    Jiddah              0.008
Medina           0.004    Mecca               0.004
Muhayil          0.003


Figure 4.0.3: Trends in Total Network Activity, Qatif and Synthetic Qatif (Left), and Total Network Activity Gap Between Qatif and Synthetic Qatif (Right)

The first panel of Figure 4.0.3 shows the total network activity in Qatif relative to synthetic Qatif in the pre and post treatment periods. Total network activity of synthetic Qatif closely tracks real Qatif in the week before the riots. This, along with the relatively good balance in Table 4.0.1, suggests that synthetic Qatif provides a good approximation of the total network activity in Qatif before treatment. To assess if this approximation holds throughout the pre-treatment period, the graph was extended an additional week prior to treatment (see Figure B.0.4 in the Appendix). After Ahmad al-Matar was shot at midnight of December 27th (the end of day 7, beginning of day 8), the two lines begin to diverge substantially, with total network activity decreasing for actual Qatif. The second panel of Figure 4.0.3 plots the daily gaps in total network activity between Qatif and its synthetic counterpart and suggests that the treatment had a large effect on total activity for the following few days but, as expected, the effect is not sustained.

Table 4.0.3: Daily Average Call Duration Predictor Means

Variables                 Treated   Synthetic   Sample Mean
avgDuration (days 1-7)    122.141   122.153     137.957
totalActivity (days 1-7)  12.4      12.402      13.059
percentMen                0.545     0.551       0.600
percentSaudi              0.870     0.839       0.822

Table 4.0.4: Daily Average Call Duration Governorate Weights in Synthetic Qatif

Governorate      Weight   Governorate         Weight
Al Bahah         0.007    Ar Rass             0.005
Ar ar            0.008    Unayzah             0.006
Sakaka           0.007    Al Riyadh           0.001
Yanbu Al Bahar   0.003    Ad Duwadimi         0.008
Buraydah         0.005    Al Majmaah          0.006
Al Kharj         0.005    Al Quwayiyah        0.006
Khamis Mushayt   0.004    Abha                0.003
Ha il            0.003    Ahad Rufaydah       0.006
Sabya            0.000    Al Majardah         0.001
Al Qunfidhah     0.009    Bishah              0.006
Najran           0.001    Al Qhazaiah         0.007
Tabuk            0.001    Abu Arish           0.826
Ad Dammam        0.003    Ahad Al Masarihah   0.005
Al Ahsa          0.005    Jizan               0.005
Al Jubayl        0.003    Samtah              0.005
Al Khubar        0.003    Al Taif             0.004
Haft Al Babin    0.007    Al Lith             0.004
Al Quryyat       0.008    Jiddah              0.002
Medina           0.005    Mecca               0.004
Muhayil          0.002

Table 4.0.3 displays the balance between Qatif and its synthetic control for the outcome variable of average call duration. As expected under this scenario, the diagonal element of V associated with pre-treatment average call duration is the largest; total network activity was not a predictor at all (see Table A.0.2 in the Appendix). Table 4.0.4 displays the weights of each control city in synthetic Qatif for average call duration. The weights indicate that, in this case, synthetic Qatif is constructed mainly from Abu Arish, although others in the donor pool played a small part.


Figure 4.0.4: Trends in Average Daily Call Duration, Qatif and Synthetic Qatif (Left), and Average Daily Call Duration Gap Between Qatif and Synthetic Qatif (Right)

The first panel of Figure 4.0.4 shows average call duration for Qatif and its synthetic control. The two lines remain closely aligned before and after treatment. This outcome suggests that the current predictors are not well suited to estimating the causal effect of treatment on average call duration.

Figure 4.0.5 shows the total number of unique callers per day for Qatif and synthetic Qatif. The plots are similar to the trends in daily network activity, in that the synthetic and real plots look consistent leading up to the treatment event, after which they deviate considerably.

Table 4.0.5 displays the balance between Qatif and synthetic Qatif for Daily Unique Callers. Table 4.0.6 displays the weights of each control city in synthetic Qatif. Sabya and Abu Arish again make up the majority of the counterfactual.


Figure 4.0.5: Trends in Unique Callers, Qatif and Synthetic Qatif (Left), and Gap in Unique Callers, Qatif and Synthetic Qatif (Right)

Table 4.0.5: Daily Unique Callers Predictor Means

Variables                 Treated   Synthetic   Sample Mean
uniqueUser (days 1-7)     10.905    10.904      11.210
avgDuration (days 1-7)    122.141   122.208     137.957
percentMen                0.545     0.570       0.600
percentSaudi              0.870     0.855       0.822

INFERENCE


Figure 4.0.6: Synthetic Control Placebo Tests with Sabya. Total Daily Network Activity (Left), Average Call Duration (Middle), and Daily Unique Callers (Right)

To assess the significance of the estimates, a variety of placebo tests are now conducted.

Table 4.0.6: Daily Unique Callers Governorate Weights in Synthetic Qatif

Governorate Weight Governorate Weight Al Bahah 0.004 Ar Rass 0.009 Ar ar 0.006 Unayzah 0.009 Sakaka 0.007 Al Riyadh 0.004 Yanbu Al Bahar 0.005 Ad Duwadimi 0.004 Buraydah o.oo8 Al Majmaah 0.003 Al Kharj 0.007 Al Quwayiyah 0.000 o.oo78 Khamis Mushayt Abha 0.006 o.oo86 Ha ii 0.006 Ahad Rufaydah 0.007 Sabya 0.471 Al Majardah 0.004 Al Qunfidhah 0.003 Bishah 0.006 Najran 0.005 Al Qhazaiah 0.004 Tabuk 0.005 Abu Arish 0.274 Ad Dammam o.oo8 Ahad Al Masarihah 0.005 Al Ahsa 0.013 Jizan 0.008 Al Jubayl o.oo6 Samtah 0.005 Al Khubar 0.007 Al Taif 0.006 o.oo4 Haft Al Babin 0.009 Al Lith 0.004 Al Quryyat 0.001 Jiddah 0.005 Medina 0.018 Mecca 0.038 Muhayil 0.005

First, the synthetic control method is applied to Sabya, a city similar to Qatif in its daily network activity profile and the contributor of the highest weight to the synthetic control for the outcome variable total network activity. Figure 4.0.6 shows there is no difference in outcome trajectory between pre and post treatment for Sabya.

Across-unit and in-time permutation tests are now performed. The across-unit placebo test iteratively assigns treatment status to every other city in the donor pool and applies the synthetic control method. If the placebo studies create gaps of similar magnitude to the one estimated for Qatif, the analysis does not provide significant evidence of a negative effect of the treatment on total network activity.


Figure 4.0.7: Across-Unit Placebo Tests: Total Activity (all, 500x or less, 100x or less, 50x or less)

In the four graphs in Figure 4.0.8, the gray lines are the control cities and their divergence from their synthesized analogs, and the black line is the same divergence for Qatif. This helps in assessing whether the estimated treatment effect for the treated unit is distinguishable from randomness. The top left graph includes all placebo cases. The top right graph excludes cities whose MSPEs are 20 times greater, the bottom left graph excludes cities whose MSPEs are 10 times greater, and the bottom right graph excludes cities whose MSPEs are 5 times greater. Once cities that do not provide a good counterfactual are excluded, the gaps are smaller in magnitude than Qatif's, suggesting that the result is significant.
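The exclusion step can be sketched in a few lines. The example below assumes that the thresholds are applied to each placebo city's pre-treatment MSPE relative to Qatif's pre-treatment MSPE, which is the usual convention for this test but is an assumption here; the function names and the dictionary layout (city name mapped to its daily gap series) are also hypothetical.

```python
from typing import Dict, List


def mspe(gaps: List[float]) -> float:
    """Mean squared prediction error of a series of gaps."""
    return sum(g * g for g in gaps) / len(gaps)


def filter_placebos(gaps_by_city: Dict[str, List[float]], treated: str,
                    t0: int, max_ratio: float) -> Dict[str, List[float]]:
    """Drop placebo cities whose pre-treatment MSPE exceeds
    max_ratio times the treated city's pre-treatment MSPE."""
    limit = max_ratio * mspe(gaps_by_city[treated][:t0])
    return {
        city: gaps for city, gaps in gaps_by_city.items()
        if city == treated or mspe(gaps[:t0]) <= limit
    }


def share_at_least_as_extreme(gaps_by_city: Dict[str, List[float]],
                              treated: str, t0: int) -> float:
    """Share of placebo cities whose post-treatment MSPE is at least
    as large as the treated city's."""
    treated_post = mspe(gaps_by_city[treated][t0:])
    placebos = [g for city, g in gaps_by_city.items() if city != treated]
    if not placebos:
        return 0.0
    return sum(mspe(g[t0:]) >= treated_post for g in placebos) / len(placebos)
```

A small share after filtering indicates that few well-fitted placebo cities show post-treatment gaps as large as the treated city's.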


Figure 4.0.8: Across-Unit Placebo Tests: Daily Unique Callers (all, 500x or less, 100x or less, 50x or less)

The in-time placebo test assigns the treatment period to a time $t < T_0$ for the actual treated unit. A placebo $T_0$ of 4 days before the shooting of al-Matar was chosen. As shown in the leftmost panel of Figure 4.0.9, total network activity of synthetic Qatif closely tracks real Qatif during both pre and post treatments, demonstrating that no effect is detected in the absence of treatment. The middle panel of the figure shows the average call duration of synthetic and real Qatif. Although the actual treatment at day 8 has no effect on average duration, there exists a slight but discernible difference at the placebo treatment of day 4. Finally, the rightmost panel of unique daily callers shows a strong correspondence between the synthetic and actual plots over the entire time period, again signifying no effect.

Figure 4.0.9: In-Time Placebo Tests with Qatif. Total Daily Network Activity (Left) and Average Call Duration (Right)

5 Analysis: Inter and Intracity Calling Patterns

The aggregate activity analysis in the previous section serves as a strong indicator that the population of Qatif altered their communications patterns in response to the violence of December 27th. However, little can be said about the nature of this change. The study will now turn its attention to some of the more nuanced characteristics of call behavior, namely, the relationships between and across cities. Three distinct properties of city communications will now be examined:

* Intra call patterns: calls whose source and destination are within the city

* Inter-in call patterns: calls whose source is outside of the city and destination is within the city

* Inter-out call patterns: calls whose source is within the city and destination is outside the city

These measures may provide a better understanding of whether people turn inward to their communities during times of political duress, or whether they turn outward to spread news beyond their localities. Due to the structure of the dataset, callee location must first be inferred to calculate each of the above quantities per city. The process is known in the literature as home/work estimation, and has often been used to explore urban mobility behavior [22] [23] [24] [25].

5.1 LOCATION IDENTIFICATION

Initially, the location identification procedure followed a filtration procedure similar to the one discussed in Phithakkitnukoon et al. [26]. First, all weekend activities (weekends in Saudi Arabia are Thursdays and Fridays) were culled from the dataset. Then, call sequences that were too infrequent were removed to focus on meaningful estimates; only calls that were made within a 16 hour time window were included. Nighttime and daytime periods were defined as 10:00pm to 6:00am and 9:00am to 3:00pm respectively, and calls were binned accordingly. For each phone user, day and night locations were ranked by activity (with a small correction for call hops amongst nearby towers), and representative towers were selected if the user made more than 60% of his or her calls in this location. Although this helps eliminate false traces, it does limit the study to users who hold occupations that follow traditional business hours. While this may encapsulate students, it is likely that the jobless, disenfranchised youth who may orient protest movements have been culled from the sample. On the whole, this filtration method resulted in well-defined home/work location pairs for roughly 11% of the unique identifiers in the full dataset.

Unfortunately, the resulting coverage was too sparse to identify weekly patterns at a city scale; the filtration was too stringent to capture phenomena related to the protests. The location identification procedure was therefore modified to encapsulate a greater sample size. After all, this study is only interested in a caller's primary city of residence, which permits a greater degree of leniency in the filtration process. In the modified approach, all unique users are selected from the full, month-long dataset. Then a record of each user's activities is aggregated per city in the donor pool. To guarantee a meaningful sample of call records, any user who made fewer than one call every 2 days is culled from the dataset. Then, for each user, governorates are ranked by total activity. A city is designated as a user's home location if more than 75% of his or her total calls are made there. In general, some care should be taken to prevent misidentification due to users moving or vacationing, but this is not considered in this instance due to the relatively tight time window of the dataset.

The new dataset consists of roughly 7.3 million users, or about 40% of the total dataset.
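The modified home-location rule can be summarized in a short sketch. It is an illustration of the two thresholds described above (at least one call every 2 days, and more than 75% of calls in a single city), not the pipeline used in this study; the function name, parameter names, and the `(user_id, city)` record layout are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def assign_home_cities(calls: Iterable[Tuple[str, str]], n_days: int,
                       min_calls_per_day: float = 0.5,
                       min_share: float = 0.75) -> Dict[str, str]:
    """Map each sufficiently active user to a home city.

    calls: one (user_id, city) pair per call record.
    n_days: length of the observation window in days.
    """
    per_user = defaultdict(Counter)
    for user, city in calls:
        per_user[user][city] += 1

    homes = {}
    for user, counts in per_user.items():
        total = sum(counts.values())
        if total / n_days < min_calls_per_day:
            continue  # fewer than one call every 2 days: user is culled
        city, top = counts.most_common(1)[0]
        if top / total > min_share:
            homes[user] = city  # more than 75% of calls made in this city
    return homes
```

Users who split their activity across several cities receive no home location and drop out of the subsequent analysis, which is the source of the reduced coverage.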

5.2 URBAN CALL COUNTS

After subsetting the set of users whose locations were identifiable, intra, inter-in, and inter-out activity counts are made per city, per day.
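The counting step can be sketched as follows. The example assumes that both the caller and the callee are located by their inferred home city and that calls involving an unlocated user are skipped; the record layout `(day, caller_id, callee_id)` and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def count_call_types(calls: Iterable[Tuple[int, str, str]],
                     home: Dict[str, str]) -> Dict[Tuple[str, int], Dict[str, int]]:
    """Count intra, inter-in, and inter-out calls per (city, day)."""
    counts = defaultdict(lambda: {"intra": 0, "inter_in": 0, "inter_out": 0})
    for day, caller, callee in calls:
        src, dst = home.get(caller), home.get(callee)
        if src is None or dst is None:
            continue  # at least one endpoint has no identified home city
        if src == dst:
            counts[(src, day)]["intra"] += 1
        else:
            counts[(src, day)]["inter_out"] += 1  # leaves the source city
            counts[(dst, day)]["inter_in"] += 1   # arrives in the destination city
    return dict(counts)
```

Each intercity call is counted twice by design: once as inter-out for the source city and once as inter-in for the destination city.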

Figure 5.2.1 shows the standardized profiles for Qatif (in black), and all other Saudi cities (in gray) over the study period. Contrary to the total daily call activity plot in the previous section (Figure 4.0.2), one sees a gradual uptick in daily calls from pre to post treatment weeks. In fact, in terms of week-over-week changes, Qatif experiences a 9.08% increase in intra call activity, against a country-wide 6.64% increase. Similarly, Qatif sees a 7.81% increase in inter-in activity and a 10.20% increase in inter-out call activity, while the other cities in the donor pool experience increases of roughly 7.7% for both activity types. Looking more closely at the weekly rhythms of the charts, one can see a clear drop in activity on Fridays, similar to the total activity plot from the previous section.

SCM is now applied, using the following predictors:

* Log of daily intra city call activity the week before treatment

* Log of daily inter-in call activity the week before treatment

* Log of daily inter-out call activity the week before treatment

* Percent of males as obtained from the Department of Analysis & Reports in 2010

* Percent of Saudis as obtained from the Department of Analysis & Reports in 2010

Figure 5.2.1: Trends in standardized intra (top), inter-in (middle), and inter-out (bottom) daily call volumes, Qatif (solid) vs. other Saudi governorates (dashed). Dec. 20th - Jan. 3rd. Treatment indicated by dashed pink line

Most strikingly, Figure 5.2.2 shows a marked increase in phone activity immediately following al-Matar's death. The plot shows a strong increase in intracity call activity at the onset of the protests on December 27th (day 8 of the study period), and another peak on January 1st (day 13 of the study period). In a Western context one could mistake this for an effect of the New Year's holiday; however, this measure is relative to all cities in the donor pool, so synthetic Qatif would have seen an increase as well. Additionally, Saudis follow the Hijri calendar, as opposed to the Gregorian calendar. January 1st was significant to Qatif for a different reason: it was the day after al-Matar's funeral procession.

Figure 5.2.2: Trends in Intra Call Volumes

Figure 5.2.3: Trends in Inter-In Call Volumes


Figure 5.2.4: Trends in Inter-Out Call Volumes

See Tables C.0.1 and B.0.2 in the appendix for more detail on these results, and Figures B.0.1 and B.0.3 for their robustness checks. As an additional investigation, Figure B.0.2 shows daily measures of all calls made by individuals identified as residents against a synthetic control. The plot shows a strong correspondence between the real and synthetic measures, both pre and post treatment. This adds more nuance to the shift in communications patterns, suggesting that while a residential call 'budget' remained fixed before and after the event, a greater proportion of calls were made to within-city individuals.

Unfortunately, the trends in inter-in and inter-out activity (Figures 5.2.3 and 5.2.4) do not tell as clear a story. Qatif's aggregate change in inter-in activity week over week is roughly consistent with the nation-wide average, and the daily plot closely mirrors the synthetic control. On the other hand, Qatif's increase in inter-out activity is a good deal higher than the mean for all cities, yet the daily plot holds a consistent profile with the synthetic profile before and during the period of unrest.

The next section will apply SCM to activity on Twitter.

6 Analysis: Twitter Activity

The Twitter dataset is examined through the following dimensions:

1. The total number of tweets across the city.

2. The number of tweets per unique user.

3. The average message length.

6.1 GEOTAGGED ACTIVITY

The first pass of the analysis looks at trends in Twitter activity through the geotagged dataset. The changes from the pre-treatment to post-treatment weeks show only a slight increase in overall tweet volume for Qatif (0.43%) and a 7.91% decrease in unique daily users, against an average 18.32% increase in tweet volume for the rest of the country, with a corresponding 13.12% increase in unique daily users. The standardized daily activity plots of Qatif and the rest of Saudi Arabia (reproduced in the appendix) show profiles that are much noisier than the phone activity trends. This may indicate that the Twitter sample is too small, or that the temporal resolution is too fine to quantify an impact on behavior.


Figure 6.1.1: Trends in standardized daily Tweet volume (top), Tweet length (middle), and Tweets per user (bottom), Qatif (solid) vs. other Saudi governorates (dashed). Dec. 20th - Jan. 3rd. Treatment indicated by dashed pink line

The following predictors are used:

* Log of the tweet volume or the number of tweets per unique user, per day in the week before treatment

* The average tweet length per day in the week before treatment

* Percent of males as obtained from the Department of Analysis & Reports in 2010

* Percent of Saudis, also obtained from the Department of Analysis & Reports in 2010


Figure 6.1.2: Trends in Total Tweet Activity, Qatif and Synthetic Qatif


Figure 6.1.3: Trends in Average Tweet Length, Qatif and Synthetic Qatif

Figure 6.1.4: Trends in Tweets Per User, Qatif and Synthetic Qatif

As evidenced in Figures 6.1.2 and 6.1.3, SCM is unable to capture or quantify effects on either geotagged tweet volume or average tweet length. The plots of daily Tweets per user are slightly better, however. Figure 6.1.4 shows a slight but imperfect correspondence between Qatif and its counterfactual for days 1 through 11, before a striking divergence for the remainder of the study period. These dates roughly coincide with the funeral procession of Ahmad al-Matar, which took place on December 31st, or day 12. Predictor weights, governorate weights, and v-weights are displayed in Tables C.0.1, C.0.2, and C.0.3 in Appendix C.

Similar robustness checks are now performed using the techniques explained in previous chapters. The second panel of Figure C.0.1 shows the in-time placebo test over the pre-treatment period. The lines for Qatif and synthetic Qatif are mostly consistent, showing that no effects are detected in the absence of treatment. The first across-unit placebo is conducted using Ahad Rufaydah, the city that represents the largest share of Qatif's synthetic counterfactual. As the first panel in Figure C.0.1 depicts, SCM does not produce a consistent counterfactual for the governorate. The complete across-unit permutation test is shown in Figure C.0.2. The results suggest that the estimated treatment effect for Qatif is not distinguishable from randomness. These are indications that the change seen in daily Tweets Per User may be the result of a statistical fluke, not the onset of city riots. Ultimately, it is likely that the Tweet data is simply too noisy at this scale. In order to capture a more comprehensive view of Twitter activity in each city, the "Location Estimation for Non-Geotagged Tweets" section in Chapter 7 presents a procedure to estimate locational data from non-geotagged tweets using a probabilistic classifier.

7 Future Directions

7.1 LOCATION ESTIMATION FOR NON-GEOTAGGED TWEETS

While the majority of Twitter users do not share latitude and longitude data with their tweets, many record an individually-defined, publicly-accessible location string in their account information. Typically, this is the city and country in which they reside. Using this assumption, it is possible to estimate where tweets were made using a naive Bayes text classifier.

The Naive Bayes model is a straightforward probabilistic learning method that has found great popularity in text classification problems due to its relative simplicity yet highly effective performance in many real-world problem domains [27]. It works particularly well for classification problems with high-dimensional feature spaces, which makes it well suited for location identification based on a number of diverse textual indicators. Following the approach of Manning et al. [28], the probability of tweet $t$ originating from city $c$ is based on Bayes' Rule:

$$P(C = c \mid t = \{w_1, w_2, \ldots, w_{n_t}\}) = \frac{P(C = c)\, P(t = \{w_1, w_2, \ldots, w_{n_t}\} \mid C = c)}{P(t = \{w_1, w_2, \ldots, w_{n_t}\})}$$

Where:

* City $c$ exists in the set of all Saudi cities in the donor pool: $c \in C = \{c_1, c_2, \ldots, c_{41}\}$

* Each tweet $t \in X$

* $n_t$ is the number of terms in the tweet's location field.

* $P(w_i \mid c)$ is the conditional probability of term $w_i$ occurring in a tweet of city $c$.

* $P(c)$ is the prior probability of a tweet originating from city $c$.

The denominator of the above expression is constant over the target cities, as it is a function of values in t. Additionally, Naive Bayes assumes conditional independence between the attribute values (the tweet location tokens) given a target value (each city). Hence:

$$P(c \mid t) \propto P(c) \prod_{1 \leq i \leq n_t} P(w_i \mid c)$$

Using a training set $T$ of tweets labeled such that $(t, c) \in X \times C$, a classifier $\gamma$ is developed that maps tweets to cities: $\gamma : X \to C$

If a tweet's location terms don't produce clear evidence for a specific city, the city that has the highest prior probability is chosen (the maximum a posteriori decision rule). The Naive Bayes classifier is defined as:

$$\arg\max_{c \in C} P(c \mid t) = \arg\max_{c \in C} P(c) \prod_{1 \leq i \leq n_t} P(w_i \mid c)$$

Thus, the estimate to determine which city a tweet belongs to is the product of the probability of each location term of the tweet given a specific city, multiplied by the probability of the city. After calculating for each $c \in C$, the $c$ with the highest probability is selected.

To avoid floating point underflow (situations where the probabilities are so small that they cannot be represented in floating point), the probabilities are logged and summed instead of multiplied:

$$\arg\max_{c \in C} \left[ \log P(c) + \sum_{1 \leq i \leq n_t} \log P(w_i \mid c) \right]$$

This effectively treats every conditional parameter $\log P(w_i \mid c)$ as a weight that indicates how good an indicator $w_i$ is for $c$, while the prior $\log P(c)$ is a weight that reflects the relative frequency of $c$. The intuition here is that more frequently cited cities are more likely to be correct.

The maximum likelihood estimate is used to estimate the parameter $P(c)$. Given the training set of geotagged tweets, this is the relative frequency of $c$, i.e.:

$$\hat{P}(c) = \frac{N_c}{N}$$

where $N_c$ is the number of tweets from city $c$, and $N$ is the total number of tweets.

The conditional probability $P(w \mid c)$ is estimated as the relative frequency of the location term $w$ within tweets belonging to city $c$:

$$\hat{P}(w \mid c) = \frac{W_{cw}}{\sum_{w'} W_{cw'}}$$

Here $W_{cw}$ is the count of the location term $w$ in tweets from city $c$, and the sum in the denominator runs over all location terms $w'$. The count runs over all positions $k$ in the training set of tweets, so term positions have no impact on the estimates. This may pose problems in other applications of Naive Bayes; however, the problem at hand is narrowly defined, and the simplification is unlikely to skew the results.

As a safeguard against misclassification, location terms were only associated with Saudi cities during the training process if at least 50% of their occurrences were found in that city.
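The training and classification steps described in this section can be sketched compactly. The following is a minimal illustration, not the classifier built for this study: it uses the unsmoothed maximum likelihood estimates and the 50% term filter described above, scores cities in log space, and makes two assumptions of its own, namely that terms never seen in training are ignored and that a tweet whose terms rule out every city falls back to the city with the highest prior. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def train(labeled: List[Tuple[List[str], str]], min_share: float = 0.5):
    """Estimate log P(c) and log P(w|c) from (location_terms, city) pairs."""
    city_counts = Counter(city for _, city in labeled)
    term_counts = defaultdict(Counter)
    term_totals = Counter()
    for terms, city in labeled:
        term_counts[city].update(terms)
        term_totals.update(terms)

    n = len(labeled)
    log_prior = {c: math.log(k / n) for c, k in city_counts.items()}
    log_cond = {}
    for c in city_counts:
        # Keep a term for city c only if at least min_share of its
        # occurrences in the training set come from that city.
        kept = {w: k for w, k in term_counts[c].items()
                if k / term_totals[w] >= min_share}
        total = sum(kept.values())
        log_cond[c] = {w: math.log(k / total) for w, k in kept.items()}
    return log_prior, log_cond


def classify(terms: List[str], log_prior: Dict[str, float],
             log_cond: Dict[str, Dict[str, float]]) -> str:
    """Return the city with the highest log posterior score."""
    vocab = set().union(*(d.keys() for d in log_cond.values()))
    known = [w for w in terms if w in vocab]  # assumption: skip unseen terms
    scores = {}
    for c, lp in log_prior.items():
        conds = log_cond[c]
        if all(w in conds for w in known):
            scores[c] = lp + sum(conds[w] for w in known)
    if not scores:  # assumption: conflicting evidence falls back to the prior
        scores = log_prior
    return max(scores, key=scores.get)
```

Summing logarithms rather than multiplying probabilities keeps the scores well within floating point range even for long location strings.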

Applying the supervised learning procedure to a stream of 10% of total tweets resulted in about 1,000,000 new tweets from roughly 69,000 unique users over the entire study period. The analysis from Chapter 6 was redone with the new dataset and the results are reproduced below. As the plots demonstrate, it seems that the data are still too sparse to identify any city-level trends through SCM. While it is possible to lower the classifier's stringency and accumulate more urban-level data, it is worth noting that, overall, the scale of coverage between CDRs and Tweet data is completely different: when looking at Qatif only, the CDR dataset holds roughly 235,000 records per day, while the Tweet dataset holds approximately 90. It is entirely possible that Twitter's user base in Saudi Arabia was simply too underdeveloped and/or uneven at this point in time, making it impossible to aggregate and compare at the spatial scale of a city, or the temporal scale of a day. Further research will work to balance the classifier's rigidity and total output in an effort to better characterize the treatment effect through social media.

7.1.1 INITIAL RESULTS: NON-GEOTAGGED TWEETS

As in Chapter 6, the following predictors are used:

* Log of the tweet volume or the number of tweets per unique user, per day in the week before treatment

* The average tweet length per day in the week before treatment

* Percent of males as obtained from the Department of Analysis & Reports in 2010

* Percent of Saudis, also obtained from the Department of Analysis & Reports in 2010


Figure 7.1.1: Trends in Total Tweet Activity, Qatif and Synthetic Qatif

7.2 COMMUNICATION NETWORKS

The analyses on aggregate activity through call records and social media demonstrate a profound change in Qatif's communication patterns in response to civil unrest. The most important next step will be in exploring the compositional changes that occur in the city's social network: is it possible to identify any emergent reorganization strategies that either encourage or impede information flow? This section presents a few initial investigations in this direction.

Figure 7.1.2: Trends in Average Tweet Length, Qatif and Synthetic Qatif

Figure 7.1.3: Trends in Tweets Per User, Qatif and Synthetic Qatif

NETWORK GENERATION

Human interaction networks based on CDR data have been constructed for each city in the Saudi donor pool. The generation procedure follows [29], wherein each mobile phone user is defined as a node, and links are formed among nodes according to the communications records. This study focuses only on cities' reciprocal networks, in which two nodes are connected if and only if both of the corresponding users initiated at least one call to the other over the study period. A non-reciprocal network, in which a link exists if either side initiated activity, may contain unidirectional communications, possibly interactions between individuals who do not know each other. Thus, it is presumed to represent a more superficial social network than the reciprocal alternative. Again, following Schlapfer et al., all nodes which never receive nor initiate calls are eliminated, in an effort to remove potential bias from call centers and/or other business hubs.
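The reciprocity rule described above can be sketched as follows. This is only an illustration under assumed inputs: call records are taken to be `(caller, callee)` pairs, and the function name `reciprocal_network` is hypothetical.

```python
from collections import defaultdict


def reciprocal_network(calls):
    """Build a reciprocal network from (caller, callee) records.

    Returns {(i, j): weight}, keeping the directed pair i -> j only when
    j -> i was also observed; weight is the number of calls from i to j.
    """
    counts = defaultdict(int)
    for caller, callee in calls:
        if caller != callee:              # ignore self-calls
            counts[(caller, callee)] += 1

    # A link survives only if both users initiated at least one call.
    return {(i, j): w for (i, j), w in counts.items() if (j, i) in counts}
```

For example, the records `a->b, b->a, a->b, a->c` yield the two directed entries `(a, b)` with weight 2 and `(b, a)` with weight 1, while the unreturned call to `c` is dropped.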

The network is composed only of users whose home locations have been identified following the procedure described in Chapter 5. This set of users represents roughly 40% of the total individuals in the dataset. Each edge weight $w_{ij}$ is defined by the number of communications initiated by individual $n_i$ to individual $n_j$. The degree and edge weight distributions of the two-week nationwide network are shown in Figure 7.2.1. Once the complete network was constructed, it was split into 40 different city networks by severing intercity edges. A variety of timescales were used, but single-day networks proved too sparse to offer meaningful insight. Ultimately, two week-long, directed networks were built, corresponding to the pre- and post-treatment periods. This analysis looks at the largest connected cluster (LCC, giant component) extracted from each Qatif network. The networks' basic properties are summarized in Table 7.2.1.
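Severing intercity edges and extracting the largest connected cluster might look like the sketch below. The names `city_subnetwork`, `largest_connected_component`, and the `home_city` mapping are assumptions for illustration, and connectivity is evaluated with edge direction ignored, which is one possible reading of "connected" for a directed network.

```python
from collections import defaultdict, deque


def city_subnetwork(edges, home_city, city):
    """Keep only edges whose two endpoints both have their home in `city`."""
    return {
        (i, j): w
        for (i, j), w in edges.items()
        if home_city.get(i) == city and home_city.get(j) == city
    }


def largest_connected_component(edges):
    """Return the node set of the largest component, ignoring direction."""
    neighbours = defaultdict(set)
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)

    seen, best = set(), set()
    for start in neighbours:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:                      # breadth-first search
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in component:
                    component.add(nxt)
                    queue.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return best
```

Dividing the size of the returned node set by the number of nodes in the city network gives the LCC share in the form of a percentage.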

Figure 7.2.1: Total Degree Distribution (Left), and Edge Weight Distribution (Right) of the Complete Reciprocated Network, KSA

Week         n       m       Avg Degree   GCC     LCC
Qatif Post   23193   60379   5.21         0.092   76%
Qatif Pre    22522   56392   5.01         0.092   73%

Table 7.2.1: Summary statistics for Communication Networks. The size of the largest connected component (LCC) is presented as a percentage of the number of nodes in the full city network. The total number of nodes (n), number of links (m), average degree, and global clustering coefficient (GCC) correspond to the complete city networks.
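The average degrees in Table 7.2.1 are consistent with the usual identity for undirected links, 2m/n: 2 × 60379 / 23193 ≈ 5.21 and 2 × 56392 / 22522 ≈ 5.01. The sketch below computes the same summary quantities for a small network. It assumes each undirected link is listed once, and it assumes the GCC is the transitivity ratio (closed connected triples over all connected triples), since the text does not state which definition was used; the name `summary_stats` is hypothetical.

```python
from itertools import combinations


def summary_stats(links):
    """links: iterable of undirected (i, j) pairs, each listed once."""
    neighbours = {}
    m = 0
    for i, j in links:
        neighbours.setdefault(i, set()).add(j)
        neighbours.setdefault(j, set()).add(i)
        m += 1
    n = len(neighbours)

    closed = 0    # connected triples whose two end nodes are also linked
    triples = 0   # all connected triples (paths of length two)
    for nbrs in neighbours.values():
        for a, b in combinations(nbrs, 2):
            triples += 1
            if b in neighbours[a]:
                closed += 1

    return {
        "n": n,
        "m": m,
        "avg_degree": 2 * m / n,
        "gcc": closed / triples if triples else 0.0,
    }
```

For a triangle with one pendant node attached, this gives n = 4, m = 4, an average degree of 2.0, and a GCC of 3/5.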

INFORMATION DIFFUSION

Following Onnela et al. [30], Figure 7.2.2 explores global information diffusion across both the pre- and post-treatment networks for Qatif. The process is based on a simple infection model in which the probability that an infected node passes the disease to its nearest neighbor node is proportional to the strength of their connection. The procedure randomly selects an individual and 'infects' him or her with information at time $t_0 = 0$. At each following time step $t$, each infected individual $n_i$ will pass the information to another individual $n_j$ in its contact list with probability $P_{ij} = x w_{ij}$, where $w_{ij}$ is the edge weight. Thus, if two individuals have a higher number of connections between them, they will be more likely to pass information to each other. $x$ is used as a control parameter for the rate of overall spread through the network. The most straightforward choice is $x = 1/\max(w_{ij})$, such that the strongest weight will result in a probability of 1. However, as Onnela et al. state, this creates very long simulation times due to the skewness of the weight distribution; normalizing by the maximum weight creates very small transmission probabilities for the majority of connections. By increasing the value of $x$, the simulation can be sped up without dramatically altering the overall system. This produces a cutoff $w^*$ to the transmission probability, such that transmission will always occur for weights above $w^*$. $w^*$ is chosen so that $P_{\mathrm{cum}}(w^*) \approx 0.965$, or $w^* \approx 14$. Thus $P_{ij} \propto w_{ij}$ for 96.5% of the weights.
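A minimal sketch of this kind of weighted spreading process is given below. It is not the simulation code used for Figure 7.2.2: the function name `simulate_diffusion` is hypothetical, and the sketch assumes that at every time step each infected node makes one independent transmission attempt toward each of its uninfected contacts, with the probability capped at 1 so that weights at or above 1/x always transmit.

```python
import random


def simulate_diffusion(weights, x, seed_node, max_steps, rng=None):
    """Weighted SI spreading with P_ij = min(1, x * w_ij).

    weights -- {(i, j): w_ij} directed edge weights
    Returns the fraction of infected nodes at t = 0, 1, 2, ...
    """
    rng = rng or random.Random()
    out_edges = {}
    nodes = set()
    for (i, j), w in weights.items():
        out_edges.setdefault(i, []).append((j, w))
        nodes.update((i, j))

    infected = {seed_node}
    history = [len(infected) / len(nodes)]
    for _ in range(max_steps):
        newly = set()
        for i in infected:
            for j, w in out_edges.get(i, []):
                # Transmission succeeds with probability min(1, x * w_ij).
                if j not in infected and rng.random() < min(1.0, x * w):
                    newly.add(j)
        infected |= newly
        history.append(len(infected) / len(nodes))
        if len(infected) == len(nodes):
            break
    return history
```

Under the cutoff described above, `x` would be about 1/14. Averaging the returned histories over many random seeds and runs gives curves of the kind shown in the top panel of Figure 7.2.2.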

The diffusion simulation has been conducted 1,000 times over each network. As the top panel in Figure 7.2.2 demonstrates, beyond a threshold of roughly 25% infected, the rate of transmission is actually faster in the post-treatment network. Additionally, as seen in the bottom panel of Figure 7.2.2, the distributions of edge weights of the links responsible for infecting an individual favor low edge strengths for both networks, suggesting that the majority of individuals get their information through weak ties, a finding that is consistent with the revered role of weak ties in information sharing [31]. Moreover, the phenomenon is slightly more pronounced in the post-treatment distribution, which may imply that weak ties become increasingly important during times of duress. It must be stated, however, that these results are strictly preliminary. They provide some indication that communities intelligently reorganize communications to increase dissemination speed and breadth during periods of civil unrest, but further research will be required to clarify this relationship and demonstrate a causal link.

7.3 RELIGIOSITY

An intriguing pattern was found in the Saudi mobile activity distributions: at various points in the day, activity would simply drop off for around 30 to 40 minutes before returning to its typical trend. These inactivity "valleys" were actually the result of daily prayer times. Millions of Muslims across the country put down their phones to turn and face the holy city of Mecca to give prayer five times a day. Shops and businesses essentially close for 20-30 minutes while the religious police, the Mutaween, surveil the streets in the hopes of sending all loiterers to the nearest mosques. Interestingly, the activity distributions capture this behavior very closely. The precise timing of these calls to prayer depends on the position of the sun in the sky, and thus, by differentiating the CDR distributions into western, central, and eastern regions, one can see the prayer times moving across the country, as shown in Figure 7.3.1.

This prayer time disruption could function as a rough proxy for urban-level religiosity. The following method has been created to catalogue daily disruption. It utilizes the fourth prayer, Maghrib (the sunset call to prayer), because of its strong presence in the data.

Figure 7.2.2: Fraction of Infected Nodes as Function of Time (Top), Number of Infected Nodes at Each Instance of t (Middle), and Distributions of Edge Weights Responsible for Infection (Bottom)

Figure 7.3.1: Daily Network Activity Distributions from Jeddah (Western Saudi Arabia), Riyadh (Central Saudi Arabia), and the Eastern Region

Method 1: Let $n_{d,\max_0}$ be the maximum total network traffic at the beginning of the window for day $d$, occurring at time $t_{d,\max_0}$; let $n_{d,\max_1}$ be the maximum total network traffic at the end of the window, occurring at $t_{d,\max_1}$; and let $n_{d,\min}$ be the minimum total network traffic over the window, occurring at $t_{d,\min}$. Now let $C(t)$ equal the call count at time $t$, and define $\hat{C}(t)$ as

$$\hat{C}(t) = \frac{n_{d,\max_1} - n_{d,\max_0}}{t_{d,\max_1} - t_{d,\max_0}}\, t + n_{d,\min}$$

(the estimated curve had no disruption occurred). The disruption is then calculated as the ratio of the disturbance area over the total possible area:

$$R_d = \frac{\int_{t_{d,\max_0}}^{t_{d,\max_1}} \left(\hat{C}(t) - C(t)\right) dt}{\int_{t_{d,\max_0}}^{t_{d,\max_1}} \hat{C}(t)\, dt}$$
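As a rough numerical illustration of the disruption ratio, the sketch below integrates sampled call counts with the trapezoidal rule. Several choices here are assumptions rather than details taken from the method: the name `disruption_ratio` is hypothetical, the first and last samples are taken to be the two peaks that bound the prayer window, and the no-disruption baseline is drawn as the straight line joining those two peaks (that is, anchored at the opening peak), which is one reading of the intercept term.

```python
def disruption_ratio(times, counts):
    """Ratio of the 'missing' call area to the area under the baseline.

    times, counts -- samples of the call count across the prayer window;
    the first and last samples are assumed to be the bounding peaks.
    """
    t0, t1 = times[0], times[-1]
    n0, n1 = counts[0], counts[-1]
    slope = (n1 - n0) / (t1 - t0)
    # Assumed baseline: straight line through the two bounding peaks.
    baseline = [n0 + slope * (t - t0) for t in times]

    def trapezoid(values):
        return sum(
            (times[k + 1] - times[k]) * (values[k] + values[k + 1]) / 2
            for k in range(len(times) - 1)
        )

    gap = [b - c for b, c in zip(baseline, counts)]
    return trapezoid(gap) / trapezoid(baseline)
```

With samples at t = 0, 10, 20 and counts 100, 50, 100, the baseline is flat at 100, the missing area is 500, the baseline area is 2000, and the ratio is 0.25.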

Figure 7.3.2 shows disruption for all cities over pre and post treatment periods. The plot is messy at best. Unfortunately it seems that religiosity is not tractable using this method over a daily time scale. Quantifying this phenomenon remains an open question for the future.

Figure 7.3.2: Trends in Daily Prayer Time Disruption, All KSA Cities. Qatif drawn in pink.

8 Discussion

The analysis presented above points to a number of compelling, statistically significant changes in communications across Qatif, relative to other urban agglomerations in Saudi Arabia, in the week following Ahmad al-Matar's death. The effects of social unrest undoubtedly reverberate through phone behavior, as Qatif's daily call activity appears to shift in both magnitude and composition in response to the exogenous shock.

The most powerful trend identified in Qatif is the decrease in city-wide phone activity over the post-treatment time window; the volume of daily calls exhibits a dramatic drop immediately following the boy's death, and holds well below the synthetic counterfactual for the remainder of the study period. A similarly significant drop in unique daily callers is also found, evidence that not only were fewer calls being made, but fewer people were actively communicating through the city's telecom infrastructure. Interestingly, the differences between the treated and control units at the extremes of the post-treatment period for unique daily callers are slightly less pronounced than those of total daily calls, suggesting that the treatment effect on unique callers has a slower initial response time and a faster falloff.

These results raise two possible, but not mutually exclusive, post-treatment scenarios: (1) individuals who had the means to leave the city did so; or (2) people limited their daily calls, potentially switching to other forms of communication. To begin with the former, it's possible that individuals who had been relative 'outsiders' in Qatif, individuals who had ties to other areas, turned away at the first sign of violence. This is not unlikely, given allegations that the Government had labeled the city as a dangerous place to visit [13]. It's within the realm of possibility that the alleged scaremongering had made individuals apprehensive about spending time in Qatif.

To address the latter, there seems to be some sensitivity in the Saudi population regarding privacy concerns related to mobile phones. The government has a huge stake in mobile infrastructure; STC, for instance, is majority-owned by the Saudi government through Saudi Arabia's Public Investment Fund. There have long been rumors circulating that the government monitors its citizens' activities through mobile devices [39]. These rumors came to a head recently, when Human Rights Watch, an international human rights advocacy group, accused the Kingdom of tapping Qatif residents' phones and monitoring their activity [42]. Allegedly, the surveillance software had been propagated through the city as malware masquerading as a local news app since the onset of Shia-led protests in 2011 [43]. It's possible that people, even those not involved with the demonstrations, wanted to avoid any potential scrutiny and avoided phone use at the first indication of civil disruption.

While daily call volume and unique daily callers experience steep declines, no evidence of a change in the average duration of daily calls is found. Social unrest appears to have a stronger, more generalized effect on how often people make calls and whom they call than on how long they stay on the phone. Duration measures appear too noisy to isolate patterns at the levels of aggregation employed here. The study period may need to be constrained to a tighter time window around the event, and/or measurements may need to be obtained at more granular intervals to extract any changes in response to treatment. It's also possible that changes in call duration are only measurable within subpopulations who are active in the protest movement. These remain issues to be explored in the future.

Beyond changes in daily aggregate activity, strong evidence exists of a transformation in the call composition of individuals who are identified as local residents. The analysis presents an increase in daily activity within the subnetwork of users identified to hold strong spatiotemporal ties to the city, even though their total activity (the number of connections both internal and external to this subnetwork) remains constant. This increase in intra-network communication suggests that people strengthen their connections with others in the urban community during periods of social unrest. Interestingly, the call measures within this network peak on the day of al-Matar's funeral procession, adding credence to the notion that these changes were tied to the treatment. Inter-in activity, calls originating in other cities and terminating in Qatif, sees a slight increase week over week, but the change is statistically consistent with all other cities in the donor pool. On the other hand, inter-out activity, calls originating in Qatif and terminating in other cities, sees an increase of roughly 10% week over week, which is higher than the national increase of 7.7%. However, daily trends do not appear to be tractable using SCM.

Additionally, an initial exploration of Qatif's city-wide human interaction networks provides some evidence that information diffusion increases in breadth and speed after treatment. The transmission simulations also point to a higher reliance on weak ties in the post-treatment network, which is consistent with the leading theory on the topic. While these findings are strictly preliminary, they offer some suggestion that communities under duress intelligently reorganize communications to increase overall information flow. Further research will be required to better identify and articulate the structural changes to Qatif's human interaction network before work can be done to determine a causal link.

Finally, examining the behavior on Twitter yields some interesting, if not altogether robust, findings. Both geotagged and location-estimated tweet activity, aggregated to the city scale, show no recognizable trends from one week to the next. Average Tweet length is similarly noisy for both Tweet datasets. However, when looking at daily Tweets per user, there appears to be a striking increase before, during, and after al-Matar's funeral on December 31st. It should be stated, however, that this finding is tenuous; the robustness checks do not provide much support that this is more than a statistical anomaly. Ultimately it seems that, while Twitter adoption is quite high in Saudi Arabia, Tweet coverage per city is simply too uneven and sparse to capture any recognizable trends during this time. Upcoming research will attempt to soften the locational classifier's stringency in the hopes of creating a more comprehensive view of social media usage during this period of time.

Appendices

A Appendix: Call Behavior

Table A.0.1: V-Weights for Total Daily Activity

Predictor                          v.Weights
log(Total Activity) (days 1-7)     0.421
Avg. Duration (days 1-7)           0.315
Percent Men                        0.26
Percent Saudi                      0.003

Table A.0.2: V-Weights for Average Daily Duration

Predictor                          v.Weights
log(Total Activity) (days 1-7)     0
Avg. Duration (days 1-7)           0.912
Percent Men                        0.068
Percent Saudi                      0.02

Table A.0.3: V-Weights for Daily Unique Callers

Predictor                          v.Weights
log(Unique Callers) (days 1-7)     0.893
Avg. Duration (days 1-7)           0.055
Percent Men                        0.048
Percent Saudi                      0.004

Figure A.0.1: Total Network Activity, Qatif and Synthetic Qatif (3 Weeks)

Figure A.0.2: Number of Unique Daily Callers, Qatif and Synthetic Qatif (3 Weeks)

B Appendix: Inter and Intracity Calling Patterns

Table B.0.1: Daily Intra Call Activity Predictor Means

Variables                 Treated   Synthetic   Sample Mean
intraLog (days 1-7)       10.521    10.521      11.643
interInLog (days 1-7)     9.059     9.030       10.120
interOutLog (days 1-7)    9.008     9.068       10.145
percentMen                0.545     0.546       0.602
percentSaudi              0.870     0.804       0.822

Table B.0.2: Governorate Weights in Synthetic Qatif (Daily Intra Call Activity)

Governorate        Weight    Governorate          Weight
Al Bahah           0.002     Ar Rass              0.005
Ar'ar              0.029     Unayzah              0.003
Sakaka             0.000     Al Riyadh            0.001
Yanbu Al Bahar     0.002     Ad Duwadimi          0.005
Buraydah           0.002     Al Majmaah           0.003
Al Kharj           0.002     Al Quwayiyah         0.004
Khamis Mushayt     0.002     Abha                 0.002
Ha'il              0.003     Ahad Rufaydah        0.002
Sabya              0.004     Al Majardah          0.227
Al Qunfidhah       0.001     Bishah               0.416
Najran             0.002     Al Qhazaiah          0.012
Tabuk              0.001     Abu Arish            0.006
Ad Dammam          0.001     Ahad Al Masarihah    0.09
Al Ahsa            0.001     Jizan                0.002
Al Jubayl          0.001     Samtah               0.004
Al Khubar          0.012     Al Taif              0.224
Hafr Al Batin      0.002     Al Lith              0.022
Al Quryyat         0.002     Jiddah               0.001
Medina             0.002     Mecca                0.001
Muhayil            0.004
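In the synthetic control method, the counterfactual series is a weighted average of the donor units' outcomes, with non-negative weights that sum to one. The sketch below shows how a table of governorate weights would be applied; the name `synthetic_series` and the toy donor values are assumptions for illustration, and the weights themselves would normally come from the fitting procedure (for instance the Synth package [21]) rather than from this code.

```python
def synthetic_series(donor_series, weights):
    """Weighted average of donor outcome series, day by day.

    donor_series -- {governorate: [outcome on day 1, day 2, ...]}
    weights      -- {governorate: weight} as reported in a weights table
    """
    n_days = len(next(iter(donor_series.values())))
    return [
        sum(weights[g] * donor_series[g][day] for g in weights)
        for day in range(n_days)
    ]


# Toy example with two hypothetical donors.
donors = {"Donor A": [1.0, 2.0], "Donor B": [3.0, 4.0]}
toy_weights = {"Donor A": 0.25, "Donor B": 0.75}
synthetic = synthetic_series(donors, toy_weights)
```

The treatment effect on a given day is then read off as the gap between the observed Qatif series and this synthetic series.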

Figure B.0.1: Intra Call Activity Synthetic Control Placebo Test with Samtah (Left), In-time Intra Call Activity Placebo with Qatif (Right)

Figure B.0.2: Daily Local Call Activity

Table B.0.3: V-Weights for Daily Unique Callers

Predictor                      v.Weights
Intra Calls (days 1-7)         0.267
Inter-In Calls (days 1-7)      0.222
Inter-Out Calls (days 1-7)     0.12
Percent Men                    0.232
Percent Saudi                  0.16

Figure B.0.3: Across-Unit Placebo Tests: Intra Call Activity (all, 20x or less, 10x or less, 5x or less)

Figure B.0.4: Intra Call Activity, Qatif and Synthetic Qatif (3 Weeks)

C Appendix: Twitter Activity

Table C.0.1: Daily Tweets Per User Predictor Means

Variables                            Treated   Synthetic   Sample Mean
Daily Tweets Per User (days 1-7)     0.267     0.267       0.246
Log Total Daily Tweets (days 1-7)    3.501     3.501       3.740
percentMen                           0.545     0.546       0.602
percentSaudi                         0.870     0.804       0.822

Table C.0.2: Governorate Weights in Synthetic Qatif (Tweets Per User)

Governorate        Weight    Governorate          Weight
Al Bahah           0.007     Ar Rass              0.014
Ar'ar              0.023     Unayzah              0.009
Sakaka             0.011     Al Riyadh            0.009
Yanbu Al Bahar     0.008     Ad Duwadimi          0.012
Buraydah           0.013     Al Majmaah           0.007
Al Kharj           0.010     Al Quwayiyah         0.006
Khamis Mushayt     0.014     Abha                 0.013
Ha'il              0.018     Ahad Rufaydah        0.311
Sabya              0.006     Al Majardah          0.072
Al Qunfidhah       0.098     Bishah               0.105
Najran             0.010     Al Qhazaiah          0.004
Tabuk              0.028     Abu Arish            0.007
Ad Dammam          0.007     Ahad Al Masarihah    0.000
Al Ahsa            0.029     Jizan                0.007
Al Jubayl          0.004     Samtah               0.012
Al Khubar          0.002     Al Taif              0.001
Hafr Al Batin      0.032     Al Lith              0.010
Al Quryyat         0.013     Jiddah               0.007
Medina             0.010     Mecca                0.007
Muhayil            0.042

Table C.0.3: V-Weights for Daily Unique Callers

Predictor                            v.Weights
Daily Tweets Per User (days 1-7)     0.745
Log Total Daily Tweets (days 1-7)    0.069
Percent Men                          0.185
Percent Saudi                        0.001

Figure C.0.1: Tweets Per User Synthetic Control Placebo Test with Ahad Rufaydah (Left), In-time Tweets Per User Placebo with Qatif (Right)

Figure C.0.2: Across-Unit Placebo Tests: Tweets Per User (all, 50x or less, 20x or less, 5x or less)

References

[1] Z. Tufekci and C. Wilson, "Social media and the decision to participate in political protest: Observations from tahrir square," Journal of Communication, vol. 62, pp. 363-379, Apr. 2012.

[2] A. Breuer, T. Landman, and D. Farquhar, "Social media and protest mobilization: Evidence from the tunisian revolution," SSRN Scholarly Paper ID 2133897, Social Science Research Network, Rochester, NY, Aug. 2012.

[3] G. Lotan, E. Graeff, M. Ananny, D. Gaffney, I. Pearce, and D. Boyd, "The revolutions were tweeted: Information flows during the 2011 tunisian and egyptian revolutions," International Journal of Communication, vol. 5, pp. 1375-1405, 2011.

[4] M. Szell, S. Grauwin, and C. Ratti, "Contraction of online response to major events," PLoS ONE, vol. 9, p. e89052, Feb. 2014.

[5] J. P. Bagrow, D. Wang, and A.-L. Barabasi, "Collective response of human populations to large-scale emergencies," PLoS ONE, vol. 6, p. e17680, Mar. 2011.

[6] S. Aday, H. Farrell, M. Lynch, and D. Freelon, Blogs and Bullets II: New Media and Conflict after the Arab Spring. United States Institute of Peace Press, Mar. 2014.

[7] J. H. Pierskalla and F. M. Hollenbach, "Technology and collective action: The effect of cell phone coverage on political violence in africa," American Political Science Review, vol. 107, pp. 207-224, May 2013.

[8] T. Matthiesen, "Saudi arabia's shiite escalation," July 2012.

[9] "Saudi police arrest prominent shi'ite muslim cleric," Reuters, July 2012.

[10] T. Matthiesen, Sectarian Gulf: Bahrain, Saudi Arabia, and the Arab Spring that wasn't. Stanford, California: Stanford Briefs, an imprint of Stanford University Press, 2013.

[11] R. Staff, "Two killed as saudi security forces try to arrest shi'ite man," Reuters, Sept. 2012.

[12] "Saudis protest killing of teen protester in qatif," Press TV, Jan. 2013.

[13] B. Perazzo, "Propaganda & sectarianism: How the saudi government stifles the truth about qatif," Jan. 2013.

[14] "The story of a tweet," Aug. 2014.

[15] M. Mari, "Twitter usage is booming in saudi arabia - GlobalWebIndex (," Mar. 2013.

[16] "Realtime twitter data access," Aug. 2014.

[17] D. Boyd and K. Crawford, "Six provocations for big data," SSRN Electronic Journal, 2011.

[18] A. Abadie and J. Gardeazabal, "The economic costs of conflict: A case-control study for the basque country," Tech. Rep. w8478, National Bureau of Economic Research, Cambridge, MA, Sept. 2001.

[19] A. Abadie, A. Diamond, and J. Hainmueller, "Comparative politics and the synthetic control method," SSRN Scholarly Paper ID 1950298, Social Science Research Network, Rochester, NY, Feb. 2014.

[20] A. Abadie, A. Diamond, and J. Hainmueller, "Synthetic control methods for comparative case studies: Estimating the effect of california's tobacco control program," Journal of the American Statistical Association, vol. 105, pp. 493-505, June 2010.

[21] A. Abadie, A. Diamond, and J. Hainmueller, "Synth: An r package for synthetic control methods in comparative case studies," Journal of Statistical Software, vol. 42, pp. 1-17, June 2011.

[22] F. Calabrese, M. Colonna, P. Lovisolo, D. Parata, and C. Ratti, "Real-time urban monitoring using cell phones: A case study in rome," IEEE Transactions on Intelligent Transportation Systems, vol. 12, pp. 141-151, Mar. 2011.

[23] F. Calabrese, G. Di Lorenzo, L. Liu, and C. Ratti, "Estimating origin-destination flows using mobile phone location data," IEEE Pervasive Computing, vol. 10, pp. 36-44, Apr. 2011.

[24] M. C. Gonzalez, C. A. Hidalgo, and A.-L. Barabasi, "Understanding individual human mobility patterns," Nature, vol. 453, pp. 779-782, June 2008.

[25] P. Wang, T. Hunter, A. M. Bayen, K. Schechtner, and M. C. Gonzalez, "Understanding road usage patterns in urban areas," Scientific Reports, vol. 2, Dec. 2012.

[26] S. Phithakkitnukoon, Z. Smoreda, and P. Olivier, "Socio-geography of human mobility: A study using longitudinal mobile phone data," PLoS ONE, vol. 7, p. e39253, June 2012.

[27] D. J. Hand and K. Yu, "Idiot's bayes: Not so stupid after all?," International Statistical Review / Revue Internationale de Statistique, vol. 69, p. 385, Dec. 2001.

[28] C. D. Manning, Introduction to information retrieval. New York: Cambridge University Press, 2008.

[29] M. Schlapfer, L. M. A. Bettencourt, S. Grauwin, M. Raschke, R. Claxton, Z. Smoreda, G. B. West, and C. Ratti, "The scaling of human interactions with city size," Journal of The Royal Society Interface, vol. 11, pp. 20130789-20130789, July 2014.

[30] J.-P. Onnela, J. Saramaki, J. Hyvonen, G. Szabo, D. Lazer, K. Kaski, J. Kertesz, and A.-L. Barabasi, "Structure and tie strengths in mobile communication networks," Proceedings of the National Academy of Sciences, vol. 104, pp. 7332-7336, May 2007.

[31] M. Granovetter, "The strength of weak ties," The American Journal of Sociology, vol. 78, pp. 1360-1380, May 1973.

[32] J.-P. Onnela, J. Saramäki, J. Hyvönen, G. Szabó, M. A. d. Menezes, K. Kaski, A.-L. Barabasi, and J. Kertesz, "Analysis of a large-scale weighted network of one-to-one human communication," New Journal of Physics, vol. 9, pp. 179-179, June 2007.

[33] "Another young man shot dead in qatif," Saudi Shia, Dec. 2012.

[34] H. al Khoei, "Deadly shootings in saudi arabia, but arab media look the other way," The Guardian, Nov. 2011.

[35] J.-P. Onnela, S. Arbesman, M. C. Gonzalez, A.-L. Barabási, and N. A. Christakis, "Geographic constraints on social network groups," PLoS ONE, vol. 6, p. e16939, Apr. 2011.

[36] P. Wood, "How saudis are learning to protest," Mar. 2011.

[37] N. Eagle, A. Pentland, and D. Lazer, "Inferring friendship network structure by using mobile phone data," Proceedings of the National Academy of Sciences, vol. 106, pp. 15274-15278, Sept. 2009.

[38] F. Calabrese, Z. Smoreda, V. D. Blondel, and C. Ratti, "Interplay between telecommunications and face-to-face interactions: A study using mobile phone data," PLoS ONE, vol. 6, p. e20814, July 2011.

[39] Anonymous, "Interview with saudi lawyer," May 2013.

[40] C. Song, Z. Qu, N. Blumm, and A.-L. Barabasi, "Limits of predictability in human mobility," Science, vol. 327, pp. 1018-1021, Feb. 2010.

[41] A.-M. Staff, "Questions over death of protester in saudi arabia's eastern province - al-monitor: the pulse of the middle east," Jan. 2013.

[42] "Riyadh accused of tapping dissidents' phones," June 2014.

[43] "Saudi arabia: Malicious spyware app identified | human rights watch," June 2014.

[44] "Saudi government monitoring internet to stifle protests."

[45] "Saudi telecom sought US researcher's help in spying on mobile users (wired UK)."

[46] G. Tavares and A. Faisal, "Scaling-laws of human broadcast communication enable distinction between human, corporate and robot twitter users," PLoS ONE, vol. 8, p. e65774, July 2013.

[47] P. A. Grabowicz, J. J. Ramasco, E. Moro, J. M. Pujol, and V. M. Eguiluz, "Social features of online networks: The strength of intermediary ties in online social media," PLoS ONE, vol. 7, p. e29358, Jan. 2012.

[48] M. Granovetter, "The impact of social structure on economic outcomes," Journal of Economic Perspectives, vol. 19, pp. 33-50, Jan. 2005.

[49] M. Batty, "The size, scale, and shape of cities," Science, vol. 319, pp. 769-771, Feb. 2008.

[50] M. Fujita, P. R. Krugman, and A. J. Venables, The spatial economy: cities, regions and international trade, vol. 213. Wiley Online Library, 1999.

[51] M. Karsai, N. Perra, and A. Vespignani, "Time varying networks and the weakness of strong ties," Scientific Reports, vol. 4, Feb. 2014.

Colophon

THIS THESIS WAS TYPESET using LaTeX, originally developed by Leslie Lamport and based on Donald Knuth's TeX. The body text is set in 11 point Arno Pro, designed by Robert Slimbach in the style of book types from the Aldine Press in Venice, and issued by Adobe in 2007. A template, which can be used to format a PhD thesis with this look and feel, has been released under the permissive MIT (X11) license, and can be found online at github.com/suchow/ or from the author at [email protected].