TREATMENT AND SPILLOVER EFFECTS UNDER NETWORK INTERFERENCE Michael P. Leung*

Abstract—We study nonparametric and regression estimators of treatment and spillover effects when interference is mediated by a network. Inference is nonstandard due to dependence induced by treatment spillovers and network-correlated effects. We derive restrictions on the network degree distribution under which the estimators are consistent and asymptotically normal and show they can be verified under a strategic model of network formation. We also construct consistent variance estimators robust to heteroskedasticity and network dependence. Our results allow for the estimation of spillover effects using data from only a single, possibly sampled, network.

I. Introduction

A vast literature in econometrics and statistics studies inference on treatment effects. A fundamental assumption in this literature is the stable unit treatment value assumption (SUTVA), which states that the ego’s treatment response is invariant to the treatment assignment of any alter. In other words, there are no treatment “spillovers.” However, there are many important contexts in which this assumption fails to hold, and the measurement of spillover effects is often of inherent interest. This paper studies inference on treatment and spillover effects in the absence of SUTVA. We assume spillovers are mediated through a network, which may be partially observed by the econometrician, and that treatments are randomly assigned. This setting is relevant for a wide variety of experimental contexts, including those in development economics (Bhattacharya, Dupas, & Kanaya, 2013; Duflo, Dupas, & Kremer, 2011; Miguel & Kremer, 2004), social economics (Kling, Liebman, & Katz, 2007; Sobel, 2006), medical science (Christakis, 2004; Kim et al., 2015; Valente et al., 2007), and the study of online social networks (Bond et al., 2012; Kramer, Guillory, & Hancock, 2014).1

To identify spillover effects in settings without network data, researchers often employ two-stage randomization designs, where each cluster c is randomly assigned a treatment saturation $s_c \in [0, 1]$ and then each individual i in c is randomized into treatment with probability $s_c$ (Baird et al., 2018; Hudgens & Halloran, 2008). Under the partial interference assumption of no spillovers across clusters, this design identifies the treatment effect using variation in treatment status within cluster and identifies the spillover effect using variation in treatment saturation across clusters.

Partial interference is a special case of network interference because we can define a block-diagonal network adjacency matrix that captures the interference structure by connecting two units if and only if they are members of the same cluster. In this paper, we study a general network interference setting in which the network need not be block-diagonal. This is important in settings where interference is mediated through a social network. For example, Cai et al. (2015) and Oster and Thornton (2012) study the diffusion of product adoption through social networks by randomizing knowledge of, or access to, the product across units. To separately identify treatment and spillover effects, we can consider subpopulations of units with the same degree, or number of network neighbors. In such subpopulations, variation in the number of treated neighbors identifies the spillover effect, and variation in the ego’s treatment assignment identifies the treatment effect.

Whereas inference in the partial interference setting is standard using conventional clustered standard errors, inference with network interference is challenging because a social network often cannot be partitioned into a large set of plausibly independent clusters. For example, Miguel and Kremer (2004) consider a setting with a single network where units are students, students are treated if they underwent deworming, and students are connected if they study at the same school or nearby schools. The authors are interested in the spillover effect of deworming on other students’ absenteeism. While it appears natural to cluster standard errors at the school level, this may not adequately account for the dependence structure for two reasons. First, the outcome of student i depends on $N_i$, the number of treated students in neighboring schools, due to the spillover effect. Consequently, the data are not i.i.d. across students because $N_i$ is a function of the treatment status of any student j in a neighboring school. Second, the error term $\varepsilon_i$ may exhibit network dependence. The underlying causal mechanism behind absenteeism implies that the error is a function of whether the student has a worm infection, and this is correlated between neighbors i and j. Clustering standard errors at the school level assumes students in neighboring schools are independent, which is violated in both of these scenarios.

To deal with network dependence, some studies simply collect data on many plausibly independent (usually geographically isolated) networks, but this can be costly in practice, since it requires a network survey in addition to a randomized control trial in each cluster. Other studies choose some rule to partition the network into many subnetworks, which are assumed to be independent. For example, they may cluster standard errors according to some observed groups, like

Received for publication July 5, 2016. Revision accepted for publication January 11, 2019. Editor: Bryan S. Graham.
*Leung: University of Southern California.
I thank the editor and referees for comments that helped improve the exposition of the paper. This work benefited from research support by MIT IDSS.
A supplemental appendix is available online at http://www.mitpressjournals.org/doi/suppl/10.1162/rest_a_00818.
1 Other papers in applied economics that estimate treatment spillovers in experimental settings include Bandiera, Barankay, and Rasul (2009), Bursztyn et al. (2014), Cai, De Janvry, and Sadoulet (2015), Duflo and Saez (2003), Dupas (2014), Kim et al. (2015), and Oster and Thornton (2012).

The Review of Economics and Statistics, May 2020, 102(2): 368–380 © 2019 by the President and Fellows of Harvard College and the Massachusetts Institute of Technology https://doi.org/10.1162/rest_a_00818

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/rest_a_00818 by guest on 26 September 2021 TREATMENT AND SPILLOVER EFFECTS 369

schools or classrooms, or they may apply a community detection algorithm. However, as in the deworming example, the independence assumption may not be credible, since it ignores links that bridge these subnetworks.

We study nonparametric and regression estimators of treatment and spillover effects, develop conditions under which they are consistent and asymptotically normal in the single-network setting, and provide new variance estimators robust to heteroskedasticity and network dependence. The practical relevance of these results is twofold. First, they obviate the need to collect data on a large sample of plausibly independent clusters or networks. Instead, only a single network is required, which substantially reduces the cost of network data collection. We illustrate this point in an empirical application, qualitatively replicating the results of Cai et al. (2015) but using data from the largest two villages in their sample, comprising 142 of the 1,225 available units. Second, the assumption that SUTVA holds across clusters in two-stage randomized designs may be unreasonable in settings where clusters are not sufficiently geographically isolated and cross-cluster links exist. Using our results, it is unnecessary to partition the data into many independent clusters, as the entire sample can be treated as a single network.

Our results exploit the fact that under restrictions on spillover effects widely used in practice and restrictions on network-correlated effects, a single network can hold a great deal of independent information. For example, in Miguel and Kremer (2004), suppose students i and j do not study in the same or neighboring schools and, further, that worm infections are independent between students at such a distance. Then absenteeism will be independent between i and j. The degree of dependence between units now depends on the network structure. In the extreme case where the network is completely connected, all outcomes are highly correlated; when the network is empty, this corresponds to the conventional SUTVA setting. More generally, units with many connections, what are called high-degree units, generate strong correlation, since each of their treatments can affect many neighbors. We formally establish asymptotic normality of nonparametric and regression estimators under new weak-dependence conditions that require the existence of higher-order moments of the degree distribution of the observed network. Thus, these conditions limit the occurrence of high-degree units. Furthermore, we verify these conditions in an economically motivated model of network formation with strategic interactions, drawing on laws of large numbers for network moments (Leung, 2019a; Penrose & Yukich, 2003).

The existing literature focuses on nonparametric estimation, but regression estimators are widely used in practice because the effective sample size for nonparametric estimators can be quite small after conditioning. We therefore develop corresponding results for regression estimators. In addition, most of the literature studies finite-population models. If the network is only partially observed, estimands in these settings cannot be extrapolated to the broader network and therefore may be of limited policy relevance. Unfortunately, it is quite common in practice for the network to be sampled and partially observed. We therefore consider a superpopulation model, allowing network data to be obtained through a common snowball-sampling scheme. Nonetheless, the general central limit theorem we derive that underlies our main results can be applied to either the finite- or superpopulation case.

The econometrics literature predominantly focuses on asymptotic inference in the many-networks case, whereas we consider the single, large network case. Manski (2013) and Lazzati (2015) study partial identification of treatment response under group interference. There is also a large literature on identifying and estimating peer effects, usually in the specific context of the linear-in-means model (Bramoullé, Djebbari, & Fortin, 2009; Brock & Durlauf, 2001; De Giorgi, Pellizzari, & Redaelli, 2010; Graham, 2008; Hirano & Hahn, 2010; Manski, 1993).

A growing literature in statistics studies estimation of treatment effects when SUTVA is violated (Basse & Airoldi, 2015; Hudgens & Halloran, 2008; Toulis & Kao, 2013). Most papers assume that the sample contains many independent clusters or networks. Aronow and Samii (2017), Liu and Hudgens (2014), Sofrygin and van der Laan (2015), and van der Laan (2014) discuss inference with a single large cluster or network. Aronow and Samii (2017) do not explicitly impose a network structure and instead require high-level restrictions on potential outcomes that ensure weak dependence. In contrast, a primary objective of this paper is to provide primitive restrictions on the network that induce weak dependence. Liu and Hudgens (2014) discuss asymptotic inference for a single complete network.

For asymptotic inference in the single-network case, the literature draws on limit theorems that impose uniform bounds on the maximum K-neighborhood size.2 However, such conditions are often unrealistic and violated by typical models of network formation. For example, the basic Erdős-Rényi model generates a limiting Poisson degree distribution, which has unbounded positive support. There is little discussion of whether reasonable models can generate networks that satisfy these uniform bounds, likely because obtaining bounds on the maximum K-neighborhood size of a random network is a difficult combinatorial problem. We instead derive a new central limit theorem for locally dependent data that does not require uniformly bounded degrees in the dependency graph. This result may be of independent interest; as we discuss, it can be applied to data with overlapping clusters or certain forms of spatial dependence. Furthermore, we show that the required weak-dependence conditions can be verified in an economically motivated model of network formation.

Several papers study finite-sample inference in the single-network setting. Athey, Eckles, and Imbens (2015) show how to compute exact p-values for randomization tests of a variety

2 The K-neighborhood of a unit i is the number of units connected to i via a sequence of at most K links. The choice of K depends on the model, in our case two.
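As an illustration of the K-neighborhood size in footnote 2, the following is a minimal sketch using breadth-first search. The function name `k_neighborhood_size`, the adjacency-list representation, and the two toy networks are assumptions made for this example; the count excludes unit i itself, which is one possible reading of the footnote.

```python
from collections import deque


def k_neighborhood_size(adj, i, K):
    """Count units (other than i) reachable from i via at most K links."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if dist[u] == K:
            continue  # do not expand beyond path distance K
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return len(dist) - 1  # exclude unit i itself


# Hypothetical line network 0 - 1 - 2 - 3 - 4
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
# Hypothetical star network: unit 0 is linked to units 1..5
star = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0], 5: [0]}

sizes_line = [k_neighborhood_size(line, i, 2) for i in sorted(line)]
sizes_star = [k_neighborhood_size(star, i, 2) for i in sorted(star)]
```

In the star network every unit's 2-neighborhood contains all other units, so a single high-degree unit is enough to make the maximum 2-neighborhood size grow with the network, which is why uniform bounds on this quantity are restrictive.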

of null hypotheses when the population is connected through a network. Tchetgen and VanderWeele (2012) construct conservative finite-sample confidence intervals for the model of Hudgens and Halloran (2008) using large-deviations inequalities.

In the next section, we present the model and discuss identification. In section III, we present results for estimation and inference. We illustrate these results in an empirical application in section IV and a Monte Carlo study in section V. Finally, section VI concludes. All proofs are in the appendix.

II. Model

Consider N units connected through an undirected network, represented by an $N \times N$ symmetric matrix $A$, with ijth entry $A_{ij}$ equal to 1 if units i and j are linked and 0 otherwise.3 We assume there are no self links, so $A_{ii} = 0$ for all i. Associate each unit i with a binary treatment $D_i$, unobserved heterogeneity $\varepsilon_i \in \mathbb{R}^{d_\varepsilon}$, and treatment response $Y_i \in \mathbb{R}$. We assume $\{D_i\}_{i=1}^N$ is i.i.d., and the elements of $\{(\varepsilon_i, \varepsilon_j, A_{ij})\}_{i \neq j}$ are identically distributed.

As an example, consider the setting of Cai et al. (2015). They study the diffusion of a weather insurance product through a social network in the context of a randomized experiment on farmers in rural China. The outcome $Y_i$ is an indicator for whether farmer i purchases weather insurance, $D_i$ is an indicator for attending an intensive information session explaining the benefits of insurance, and $\varepsilon_i$ captures unobserved determinants of the purchase decision. The hypothesis is that having more treated friends increases the likelihood of own take-up.

As is common in applied work, Cai et al. (2015) assume outcomes follow a linear-in-means model with no endogenous effect, a simplified version of which can be written as

$$Y_i = \beta_1 + \beta_2 D_i + \beta_3 T_i/\gamma_i + \varepsilon_i, \tag{1}$$

where $T_i = \sum_{j=1}^N D_j A_{ij}$ is the number of i’s treated neighbors, and $\gamma_i = \sum_{j=1}^N A_{ij}$ is i’s degree, or number of neighbors. The treatment effect is captured by $\beta_2$ and the spillover effect by $\beta_3$. They observe data on a sample of N farmers $\{(Y_i, D_i, T_i, \gamma_i)\}_{i=1}^N$. The inferential challenge is that if few villages are observed, then clustered standard errors are invalid. Robust standard errors are also invalid because the data are not i.i.d. across farmers. For example, $T_i$ can clearly be correlated across i, and $\varepsilon_i$ may also exhibit correlated effects.

We study inference on the parameters of equation (1), as well as a more general nonparametric model,

$$Y_i = r(D_i, T_i, \gamma_i, \varepsilon_i), \tag{2}$$

where $r(\cdot)$ is an unknown, real-valued function. This can be viewed as a potential outcomes model with multivalued “treatment” $(D_i, T_i)$, or what Manski (2013) calls an “effective treatment.” Note that $\gamma_i$ is not considered part of the effective treatment because we will allow for correlation between degree and unobservables. Hence, $\gamma_i$ functions as a control variable. Both models allow only for exogenous peer effects, in the terminology of Manski (1993). It is typical in applied work to focus on such effects, even in settings with an incomplete network (Bandiera et al., 2009; Cai et al., 2015; Manresa, 2016; Oster & Thornton, 2012).

Hudgens and Halloran (2008) consider a model similar to equation (2) for the case in which $A$ is the complete network, meaning all units are linked with one another. Aronow and Samii (2017) study generalizations of equation (2) that replace $T_i$ and $\gamma_i$ with more complicated network summary statistics. We focus on equation (2) for two reasons. First, $T_i$ and $\gamma_i$ appear to be the most common summary statistics used in practice. Second, we can provide more primitive justifications for equation (2) in terms of nonparametric shape restrictions on the response function, as discussed in section SA.1 of the appendix. Nonetheless, our model can easily be generalized to allow for higher-order and heterogeneous spillovers, as discussed in section SA.1. What is important for inference is the restriction that the treatment assignments of units further than some fixed path distance K do not affect the ego’s outcome, which is satisfied by K = 1 in the models above. Such a restriction can be tested using the methods proposed by Athey et al. (2015).

A. Sampling Frame

We take the tuple $W_i = (Y_i, D_i, T_i, \gamma_i)$ to be the data on unit i available to the econometrician and assume that we observe a large sample of $W_i$’s. Additionally, we assume the 1-neighborhood of each sampled unit i is observed, that is, the set of units j connected to i in $A$. It will be important to know the identities of network neighbors in order to construct standard errors.

Such a sample may be obtained from a single, fully observed network consisting of N units. It may also be obtained from a sampled network, which is common in practice. A popular method of sampling networks is snowball-sampling 1-neighborhoods.4 Under this scheme, the researcher first draws a random sample of $n \le N$ focal units and collects their treatment responses, treatment assignments, and identities of network neighbors. Then the researcher collects the treatment assignments of each neighbor. Note that these neighbors need not also be focal units.

Formally, our data consist of the triangular array $\{W_i\}_{i=1}^n$ indexed by the network size N, where the number of focal units $n \le N$ and each $W_i$ implicitly depends on N through $T_i$

3 While aspects of the proofs are specific to undirected networks, the results can also be applied to directed networks. Directedness changes only the definitions of the dependency graph and network-correlated effects. The empirical application in section IV considers a setting with directed links.
4 This is the sampling scheme used by AddHealth and many applied papers, including Banerjee et al. (2013), Cai et al. (2015), Oster and Thornton (2012), Paluck, Shepherd, and Aronow (2016), and Valente et al. (2007).
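To make the construction of $W_i = (Y_i, D_i, T_i, \gamma_i)$ under snowball sampling of 1-neighborhoods concrete, here is a minimal sketch. The function name `snowball_sample`, the seed argument, and the four-unit toy network with its treatments and outcomes are all assumptions made for illustration; for simplicity the treatments are taken as given rather than assigned after sampling. The point is that only the focal unit's own row of the adjacency matrix and its neighbors' treatments are needed.

```python
import random


def snowball_sample(A, D, Y, n, seed=0):
    """Draw n focal units; return {i: (Y_i, D_i, T_i, gamma_i)} for each."""
    rng = random.Random(seed)
    focal = sorted(rng.sample(range(len(A)), n))
    data = {}
    for i in focal:
        # 1-neighborhood of i: identities of the units linked to i
        neighbors = [j for j, a in enumerate(A[i]) if a == 1]
        T_i = sum(D[j] for j in neighbors)  # number of treated neighbors
        gamma_i = len(neighbors)            # degree
        data[i] = (Y[i], D[i], T_i, gamma_i)
    return data


# Hypothetical line network 0 - 1 - 2 - 3 with made-up treatments and outcomes
A = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
D = [1, 0, 1, 0]
Y = [0.3, 1.2, 0.7, 0.9]

W_full = snowball_sample(A, D, Y, n=4)  # every unit is focal
W_sub = snowball_sample(A, D, Y, n=2)   # only two focal units
```

Because each focal unit's neighbors are recorded whether or not they are focal themselves, the values of $T_i$ and $\gamma_i$ in `W_sub` coincide with those computed from the full network.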

and $\gamma_i$. We imagine this array being generated according to the following data-generating process:

• For each N, a network $A$ on N units is formed according to some unknown process.

• Conditional on $A$, all N units draw $\varepsilon_i$.

• The econometrician snowball-samples 1-neighborhoods starting from a simple random sample of $n \le N$ focal units, where $n/N \to c \in (0, 1]$ as $N \to \infty$.

• Units in our snowball sample are randomly assigned treatments $D_i$, and treatment responses are realized as in equation (2).

In what follows, we will let $\tilde{A}$ denote the set of sampled links. All limits and asymptotic statements in this paper are taken under this sequence sending $N \to \infty$.

B. Estimands

For any integrable, real-valued function $h(\cdot)$, define

$$\mu_h(d, t, \gamma) = E\left[h\left(r(d, t, \gamma, \varepsilon_1(\gamma))\right)\right],$$

where, for given $(d, t, \gamma)$, $\varepsilon_1(\gamma)$ follows the conditional distribution of $\varepsilon_1$ given $\gamma_1 = \gamma$. The expectation is taken with respect to this conditional distribution, which implicitly depends on N due to $\gamma_1$. When $h(\cdot)$ is the identity function, this corresponds to the (conditional on $\gamma$) average structural function (ASF), which captures the mean outcome under counterfactual values of the treatment assignment d and number of treated neighbors t. Interest often centers on average treatment or spillover effects, defined as the difference

$$\mu_h(d, t, \gamma) - \mu_h(d', t', \gamma), \tag{3}$$

where $d, d' \in \{0, 1\}$, $t, t' \in \mathbb{N}$, and $t, t' \le \gamma$. Thus, for fixed t, variation in $d, d'$ captures an individual-level treatment effect, whereas for fixed d, variation in $t, t'$ captures a spillover effect. These effects are analogous to the estimands proposed by Hudgens and Halloran (2008) for two-stage randomized designs. Note that the degree is kept fixed at $\gamma$ because we will allow $\varepsilon_i$ to be correlated with $\gamma_i$. This means that variation in degree is not causal, which is why $\gamma_i$ constitutes a control variable and is not part of the effective treatment.

We can also obtain the conditional quantile structural function (QSF) by choosing $h(x) = 1\{x \le q\}$ and setting the QSF for the $\tau$th quantile equal to the smallest q such that $E[h(r(d, t, \gamma, \varepsilon_1(\gamma)))] \ge \tau$. We can then define quantile treatment or spillover effects analogous to equation (3). When referring to the conditional ASF, we will omit the subscript h, writing

$$\mu(d, t, \gamma) = E\left[r(d, t, \gamma, \varepsilon_1(\gamma))\right].$$

C. Identification

When treatments are completely randomized, we can intuitively identify treatment and spillover effects (3) as follows. First, we collect the subpopulation of units with degree $\gamma$. Then within this subpopulation, we compare the outcomes of those with treatment assignment d and t treated neighbors against those with treatment assignment $d'$ and $t'$ treated neighbors. In practice, it is often of interest to compare against an uncontaminated control, which corresponds to $\mu(0, 0, \gamma)$. Intuitively, this can be identified by the average outcome of the untreated with no treated neighbors and degree $\gamma$. We next formalize these ideas.

Recall that $\tilde{A}$ denotes the set of sampled links and n the number of sampled focal units. Let $D = \{D_i\}_{i=1}^n$, and similarly define $\varepsilon$. We impose two assumptions on the distribution of $(D, \tilde{A}, \varepsilon)$ that identify the estimands above.

Assumption 1 (Exogeneity). (a) $D \perp\!\!\!\perp (\tilde{A}, \varepsilon)$. (b) For any unit i, $\varepsilon_i \perp\!\!\!\perp \tilde{A} \mid \gamma_i$.

Assumption 1a requires treatment assignments to be independent of $\tilde{A}$, which has two implications. First, it means that the network is unaffected by treatment. This is a testable implication, since $D$ and $\tilde{A}$ are observed. For example, one could regress $\tilde{A}_{ij}$ on $D_i$ and $D_j$. In applied work, this assumption appears to be pervasive because researchers typically obtain network links through a survey, then implement a randomized control trial, and finally estimate spillover effects using the prior surveyed network. Hence, by assuming the pretreatment network is the relevant one, they implicitly assume that $\tilde{A}$ is unaffected by treatment due to the temporal ordering.

Since we assume treatments are i.i.d., the second implication of part a is that the treatments are not assigned on the basis of the network structure. In principle this can be relaxed to allow treatments to be weakly dependent conditional on $\tilde{A}$, for example, similar to the network correlated effects assumption below (assumption 3). We focus on the simple i.i.d. case, since randomized assignment is common in practice; it is apparently satisfied by all empirical papers listed in section I.

Assumption 1b is weaker than requiring a fully exogenous network, a standard assumption in the econometric literature on peer effects and the applied economics literature. Our assumption relaxes full exogeneity by allowing unobservables to be correlated with own degree $\gamma_i$. For example, it holds if $(\varepsilon_i, \gamma_i)$ is initially drawn i.i.d. and the network is formed according to the configuration model (Jackson, 2008).

Example 1. Suppose $Y_i$ represents a student’s test score. This may be affected by unobserved characteristics $\nu_j$ of a friend j, such as parental income. Then if

$$\varepsilon_i = \left(\nu_i, \frac{\sum_{j=1}^n A_{ij}\nu_j}{\sum_{j=1}^n A_{ij}}\right),$$

this allows for peer effects in unobservables. Assumption 1b is satisfied, for example, if $\nu_i$ is i.i.d. across students and independent of the friendship network $A$.

Remark 1. It is possible to generalize assumption 1b by assuming $\varepsilon_i \perp\!\!\!\perp \tilde{A} \mid S_i(A)$, where $S_i(A)$ is some vector of network statistics that includes degree. However, it is an open question what primitive models can rationalize conditional independence assumptions of this type. Also, it should be noted that while assumption 1b allows for some dependence between the network and unobservables, it does not allow for unobserved homophily, a major challenge for identification in this literature.5

In order to nonparametrically identify the ASF, support conditions are required. In particular, there needs to be sufficient variation in the treatment assignment and network degrees in large samples. The next assumption formalizes this requirement:

Assumption 2 (Support).

a. $P(D_1 = 1) \in (0, 1)$.
b. There exists a limiting degree distribution $P : \mathbb{N} \to [0, 1]$ such that for all $\gamma \in \mathbb{N}$,

$$\frac{1}{n}\sum_{i=1}^n 1\{\gamma_i = \gamma\} \xrightarrow{p} P(\gamma).$$

Define $\Gamma = \{\gamma : P(\gamma) > 0\}$.

Assumption 2b defines the limiting degree distribution $P(\cdot)$ of the sampled focal units and its support $\Gamma$. In words, $P(\gamma)$ is the fraction of sampled units with degree $\gamma$ in the large-N limit. We can identify the conditional ASF only for values of $\gamma \in \Gamma$. Clearly, if $\Gamma = \{0\}$, then the network is empty in the limit, and spillover effects are impossible to identify. Establishing convergence in assumption 2b requires a law of large numbers for degrees, inherently dependent objects, which in turn requires restrictions on the underlying model of network formation. In section A.3 of the appendix, we verify the existence of $P$ for an economically motivated model of network formation.

The following proposition characterizes the values of $(d, t, \gamma)$ for which the conditional ASF is identified. The proof is straightforward and therefore omitted.

Proposition 1 (Identification). Under assumptions 1 and 2,

$$\mu_h(d, t, \gamma) = E[h(Y_1) \mid D_1 = d, T_1 = t, \gamma_1 = \gamma]$$

for all $d \in \{0, 1\}$, $\gamma \in \Gamma$, and integers $t \le \gamma$.

Strictly speaking, this proves $\mu_h(d, t, \gamma)$ is identified only if the conditional expectation on the right-hand side can be consistently estimated. In section III, we study the asymptotic properties of the sample analog estimator. In the standard i.i.d. setting, it is immediate from the usual law of large numbers that such estimators are consistent. However, outcomes in our setting are dependent due to spillovers, which motivates our asymptotic theory.

D. Network Correlated Effects

Observations $\{W_i\}_{i=1}^n$ may be correlated for two reasons. First, treatment spillovers induce correlation between units that are linked together or to a common unit. Second, there may be correlation between the unobservables of different units. We will focus on the following network correlated effects assumption in which the network $A$ induces dependence between the unobservables of directly connected units.

Assumption 3 (Correlated Effects). For any sampled unit pair (i, j),

a. $\varepsilon_i$ is independent of $\varepsilon_j$ conditional on $\tilde{A}$ and the event that $A_{ij} = \max_k A_{ik}A_{jk} = 0$,6 and
b. $(\varepsilon_i, \varepsilon_j)$ is independent of $\tilde{A}$ conditional on $A_{ij}$, $\gamma_i$, $\gamma_j$, $\sum_k A_{ik}A_{jk}$.

Assumption 3a is the main requirement, which allows $\varepsilon_i$ and $\varepsilon_j$ to be correlated if i and j are linked ($A_{ij} = 1$) or they have a common neighbor ($\max_k A_{ik}A_{kj} = 1$), but they must be conditionally independent otherwise. Assumption 3b allows the joint distribution of the unobservables for a pair of units to depend on the network through their potential link, degrees, and number of common neighbors. We have in mind two main examples.

Example 2. Suppose units represent students, and two students are linked if they share a common class. Then correlated effects may arise from classroom-level heterogeneity, such as teacher ability. In this case, it is reasonable to assume that $\varepsilon_i$ and $\varepsilon_j$ are correlated only if i and j are in the same class, so that $\varepsilon_i \perp\!\!\!\perp \varepsilon_j \mid A_{ij} = 0$, and assumption 3a holds. Furthermore, the joint distribution of $(\varepsilon_i, \varepsilon_j)$ depends only on whether i and j share a class, so $(\varepsilon_i, \varepsilon_j) \perp\!\!\!\perp \tilde{A} \mid A_{ij}$, and assumption 3b holds.

To take another example, in the setting of Miguel & Kremer (2004), we would define $A_{ij} = 1$ if students (i, j) study at the same school or neighboring schools. Since linked students are physically close, their worm infection statuses are likely to be correlated. It may be reasonable to assume students who are far away (more than two hops away in the school network) have independent infection rates, in which case assumption 3a holds.

Example 3. In example 1, it is clear that assumption 3a is satisfied. Assumption 3b is also satisfied because conditional on $A_{ij}$, $\gamma_i$, $\gamma_j$, $\sum_k A_{ik}A_{jk}$, the term $\gamma_i^{-1}\sum_j A_{ij}\nu_j$ is just a weighted sum of a known number of i.i.d. random variables.

In principle, other sources of correlation are possible, for example, if units are partitioned into clusters, and there

5 Inference on social interactions when the network is generally endogenous is a well-known open problem (Carrell, Sacerdote, & West, 2013; Goldsmith-Pinkham & Imbens, 2013; Shalizi & Thomas, 2011).
6 Note that because we assume the 1-neighborhoods of all focal units $\{W_i\}_{i=1}^n$ are observed, the quantities $\gamma_i$, $\gamma_j$, $\sum_k A_{ik}A_{jk}$, and $\max_k A_{ik}A_{jk}$ are the same whether denoted using the sampled network $\tilde{A}$ or the full network $A$.
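The conditional expectation in proposition 1 suggests a simple cell-mean calculation: restrict to units with the same $(d, t, \gamma)$ and average $h(Y_i)$. The sketch below illustrates only this point estimate, under stated assumptions: the function name `cell_mean`, the tuple layout `(Y, D, T, gamma)`, and the toy data are made up for the example, and nothing here addresses standard errors or the dependence between observations.

```python
def cell_mean(W, d, t, gamma, h=lambda y: y):
    """Sample analog of E[h(Y) | D = d, T = t, degree = gamma].

    W is a list of tuples (Y, D, T, gamma). Returns None for an empty cell.
    """
    cell = [h(Y) for (Y, D, T, g) in W if (D, T, g) == (d, t, gamma)]
    if not cell:
        return None  # no sampled unit has this (d, t, gamma) combination
    return sum(cell) / len(cell)


# Hypothetical observations (Y_i, D_i, T_i, gamma_i); values are invented
W = [
    (1.0, 0, 0, 2),
    (3.0, 0, 0, 2),
    (4.0, 0, 1, 2),
    (6.0, 0, 1, 2),
    (7.0, 1, 0, 2),
    (9.0, 1, 0, 2),
    (5.0, 1, 1, 3),
]

# Uncontaminated control: untreated, no treated neighbors, degree 2
control = cell_mean(W, 0, 0, 2)
# Spillover contrast: vary t with d fixed at 0 and degree fixed at 2
spillover = cell_mean(W, 0, 1, 2) - control
# Treatment contrast: vary d with t fixed at 0 and degree fixed at 2
treatment = cell_mean(W, 1, 0, 2) - control
# A cell with no observations cannot be estimated
empty = cell_mean(W, 1, 2, 2)
```

Holding the degree fixed in every contrast mirrors the role of $\gamma$ as a control variable rather than part of the effective treatment; the empty cell shows how quickly the effective sample size shrinks after conditioning.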

are cluster-correlated effects in addition to network effects. Remark 2 discusses such generalizations.

E. Network Formation

In the main body of this paper, we impose general conditions on the sampled network $\tilde{A}$ to obtain a central limit theorem. In section A.3 in the appendix, we verify these conditions under the sampling scheme in section IIA when the network is realized according to an economically motivated model of network formation. We next discuss some aspects of this model. (Readers who are interested in the application of the main methods of this paper may skip this section at first reading.)

The model is a special case of that studied by Leung (2019a), who establishes a law of large numbers (LLN) for a large class of network moments under weak-dependence conditions on the primitives. We use this result to verify the general conditions required in this paper. In principle, these conditions can be verified for other network sampling schemes and models of network formation, but to our knowledge, no other papers provide a suitable law of large numbers for strategic models. Our model allows for strategic interactions that encompass most cases of interest considered in the econometric literature (Christakis et al., 2010; Goldsmith-Pinkham & Imbens, 2013; Graham, 2016; Mele, 2017; Sheng, forthcoming). Alternatively, link formation may be purely dyadic, with no strategic interactions, as in Graham (2017).

We next discuss a simple example of the model used in our simulation study in section V. We present the general model in section A.3 of the appendix. Note that our formal results will not actually assume a particular parametric model of network formation but will rather impose general nonparametric conditions to obtain weak dependence.

Fafchamps and Gubert (2007) study the formation of risk-sharing relationships in the rural Philippines. Consider the following variant of their setting in which units represent households and two households are connected ($A_{ij} = 1$) if and only if they share risk. The authors run a logistic regression of link formation $A_{ij}$ on various exogenous characteristics of i and j, including education level and geographic distance. We augment their model to allow for strategic interactions. Suppose households are endowed with two characteristics: $\alpha_i$, an indicator for whether the head of household is well educated, and $\rho_i$, which represents the geographic location of i. We assume potential network links satisfy

$$A_{ij} = 1\left\{\theta_1 + \theta_2(\alpha_i + \alpha_j) + \theta_3 d(\rho_i, \rho_j) + \theta_4 \max_k A_{ik}A_{kj} + \zeta_{ij} > 0\right\}, \tag{4}$$

where $\zeta_{ij} \sim N(0, 1)$ is a random-utility shock and $\max_k A_{ik}A_{kj}$ is an indicator for whether i and j both share risk with some common household k. The distance function $d(\rho_i, \rho_j)$ is equal to 0 if the distance $r^{-1}\|\rho_i - \rho_j\|$ is below 1 and equal to $\infty$ otherwise, where r is a constant defined below. As Leung (2019a) discussed, this model can be microfounded as a network-formation game, where the equilibrium concept corresponds to the widely used notion of pairwise stability with transferable utility (Jackson, 2008).

The parameter $\theta_2$ captures the effect of education level on the propensity to share risk. If $\theta_3$ is strictly negative, it captures geographic homophily, since risk-sharing relationships do not occur between geographically distant households. Finally, $\theta_4$ captures strategic interactions, in this case a so-called taste for transitivity. We might expect $\theta_4 > 0$ because the common risk-sharing partner k can insure i against defaults by j and vice versa, which promotes risk sharing between i and j.

As discussed in section IIIA, below, most real-world networks are sparse, meaning that the average degree is substantially smaller than the number of units. This is formalized by requiring the expected degree of any unit to be finite in the large-network limit. Toward this end, we assume $r = (\kappa/N)^{1/d}$ for some $\kappa > 0$. Then, as discussed in section 2.1 of Leung (2019a), the resulting network is sparse.

In our simulation study, we pick $(\theta_1, \theta_2, \theta_3, \theta_4) = (-0.25, 0.5, -1, 0.25)$ and $\kappa = 3.28$. These choices of parameters, in particular the choice of $\theta_4$, ensure that a key weak-dependence condition (assumption 6 of Leung, 2019a) holds. The condition restricts the strength of strategic interactions by ensuring that $\theta_4$ is not too large, which is analogous to restrictions on autoregressive parameters in time series and spatial econometrics.

Finally, we need to specify an equilibrium selection mechanism to complete the model, since there may be multiple “equilibrium” networks consistent with equation (4). A second key weak-dependence condition required in Leung (2019a) (assumption 7) imposes restrictions on the selection mechanism. These restrictions are satisfied by myopic best-response dynamics, which generates a network by starting at a random initial network, repeatedly picking a random pair of units, and having the selected pair form or sever a link in best response to the current state of the network. This mechanism is well studied in the theory literature (Jackson & Watts, 2002) and commonly assumed in dynamic models of network formation.

III. Estimation and Inference

In this section, we first present our estimators and network-robust standard errors for nonparametric and regression estimators of the ASF and treatment or spillover effects. We then discuss the asymptotic theory, including the intuition behind our main weak-dependence condition.

In light of proposition 1, it is natural to estimate the conditional ASF using the sample analog

$$\hat{\mu}_h(d, t, \gamma) = \frac{\sum_{i=1}^n h(Y_i)\,1_i(d, t, \gamma)}{\sum_{i=1}^n 1_i(d, t, \gamma)},$$

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/rest_a_00818 by guest on 26 September 2021 374 THE REVIEW OF ECONOMICS AND STATISTICS

FIGURE 1. A LINE NETWORK

where we abbreviate

\[
\mathbf{1}_i(d, t, \gamma) = \mathbf{1}\{D_i = d,\; T_i = t,\; \gamma_i = \gamma\}.
\]
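As a rough illustration of the sample analog just defined, the sketch below computes the cell mean with $h$ equal to the identity, the difference in equation (5), and the quantities $G_{ij}$, $a_i$, $b_i$, and $\hat{\sigma}^2_{TS}$ from equations (6) and (7) of this section. It is a minimal sketch and not the paper's own code: the function names, the NumPy array layout, and the toy data are all assumptions made for the example, and the network is assumed to be a symmetric 0/1 adjacency matrix.

```python
import numpy as np

def dependency_graph(A):
    """Equation (6): G_ij = 1 if i == j, i and j are linked, or they share a neighbor."""
    A = np.asarray(A, dtype=int)
    two_step = (A @ A) > 0                      # positive iff some k has A_ik = A_kj = 1
    return ((A > 0) | two_step | np.eye(len(A), dtype=bool)).astype(int)

def cell_indicator(D, T, gamma, d, t, g):
    """1_i(d, t, gamma) = 1{D_i = d, T_i = t, gamma_i = gamma}."""
    return ((D == d) & (T == t) & (gamma == g)).astype(float)

def mu_hat(Y, ind):
    """Sample-analog ASF for h equal to the identity: mean of Y within the cell."""
    return (Y * ind).sum() / ind.sum()

def effect_and_variance(Y, D, T, gamma, A, cell, cell_prime):
    """Difference (5) and the variance estimator (7) for two (d, t, gamma) cells."""
    n = len(Y)
    G = dependency_graph(A)
    i1 = cell_indicator(D, T, gamma, *cell)
    i0 = cell_indicator(D, T, gamma, *cell_prime)
    m1, m0 = mu_hat(Y, i1), mu_hat(Y, i0)
    rho1, rho0 = i1.mean(), i0.mean()           # rho_hat for each cell
    a = i1 / rho1 - i0 / rho0
    b = m1 * i1 / rho1 - m0 * i0 / rho0
    z = Y * a - b
    return m1 - m0, z @ G @ z / n

# Hypothetical toy data on a four-unit line network 1 - 2 - 3 - 4.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
Y = np.array([1.0, 3.0, 5.0, 7.0])
D = np.zeros(4, dtype=int)
T = np.array([1, 1, 0, 0])
gamma = np.ones(4, dtype=int)                   # made up; not the degrees implied by A
effect, sigma2 = effect_and_variance(Y, D, T, gamma, A, (0, 1, 1), (0, 0, 1))
```

With so few observations the quadratic form in equation (7) need not even be positive, so the toy data only illustrate the mechanics. A standard error for the difference would be $\sqrt{\hat{\sigma}^2_{TS}/n}$ under the asymptotic normality result stated in this section.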

When $h$ is the identity function, we simply write $\hat{\mu}(d, t, \gamma)$. Then average treatment or spillover effects can be estimated by the difference

\[
\hat{\mu}(d, t, \gamma) - \hat{\mu}(d', t', \gamma). \tag{5}
\]

Below, we provide conditions under which

\[
\sqrt{n}\Big(\big(\hat{\mu}(d, t, \gamma) - \hat{\mu}(d', t', \gamma)\big) - \big(\mu(d, t, \gamma) - \mu(d', t', \gamma)\big)\Big) \xrightarrow{d} \mathcal{N}(0, \sigma^2_{TS}).
\]

To define our estimator for $\sigma^2_{TS}$, for each pair of sampled units $(i, j)$, define the indicator

\[
G_{ij} = \mathbf{1}\Big\{A_{ij} + \max_k A_{ik}A_{kj} + \mathbf{1}\{i = j\} > 0\Big\}. \tag{6}
\]

That is, $G_{ij} = 1$ if and only if $i$ and $j$ are the same individual, are connected, or share a common neighbor $k$. By construction, our sampling scheme ensures that $G_{ij}$ is observed for all sampled units. Further define

\[
a_i = \frac{\mathbf{1}_i(d, t, \gamma)}{\hat{\rho}(d, t, \gamma)} - \frac{\mathbf{1}_i(d', t', \gamma)}{\hat{\rho}(d', t', \gamma)},
\]

\[
b_i = \hat{\mu}(d, t, \gamma)\,\frac{\mathbf{1}_i(d, t, \gamma)}{\hat{\rho}(d, t, \gamma)} - \hat{\mu}(d', t', \gamma)\,\frac{\mathbf{1}_i(d', t', \gamma)}{\hat{\rho}(d', t', \gamma)},
\]

\[
\hat{\rho}(d, t, \gamma) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}_i(d, t, \gamma).
\]

Our variance estimator is given by

\[
\hat{\sigma}^2_{TS} = \frac{1}{n}\sum_{i=1}^n\sum_{j=1}^n G_{ij}\,(Y_i a_i - b_i)(Y_j a_j - b_j). \tag{7}
\]

In finite samples, the nonparametric estimator may perform poorly when the effective sample size $\sum_{i=1}^n \mathbf{1}_i(d, t, \gamma)$ is small. This motivates the use of semiparametric estimators. We discuss generalized method of moments in section A.1 in the appendix. Here, we consider a simple linear analog of model (2),

\[
Y_i = X_i'\beta + \varepsilon_i,
\]

where $X_i = (D_i, T_i, \gamma_i)'$. Other specifications are of course possible (for example, equation (1)). In section SA.2, we derive primitive sufficient conditions for consistency and asymptotic normality of the OLS estimator and show consistency of the following variance estimator,

\[
\hat{\Sigma} = (X'X)^{-1} M'GM\,(X'X)^{-1}, \tag{8}
\]

where $M = (X_1\hat{\varepsilon}_1 \cdots X_n\hat{\varepsilon}_n)'$, $\hat{\varepsilon}$ is the vector of regression residuals, and $G$ is an $n \times n$ matrix with $ij$th entry equal to $G_{ij}$ defined in equation (6).

This generalizes the usual formula for robust standard errors to allow for correlation between observations $\{W_i\}_{i=1}^n$ captured by the dependency graph $G$. If $G$ is the identity matrix, this reduces to the standard Eicker-Huber-White formula, which corresponds to the case of independent observations. If $G$ has a block-diagonal structure, where the blocks are matrices of 1s and the off-diagonals are 0s, then equation (8) is equivalent to clustered standard errors, where the blocks represent clusters. With network data, the structure of $G$ will typically be more heterogeneous in practice, since it is dictated by network linkages. As discussed in remark 2, we can also consider settings with both network and cluster dependence by constructing $G$ appropriately.

A. Dependence Structure

This section presents our main weak-dependence condition required to prove consistency and asymptotic normality. In order to build intuition, we first clarify the dependence structure of $\{W_i\}_{i=1}^n$ under the previous assumptions. Consider figure 1. Since $A_{12} = 1$ in the figure, $W_1$ and $W_2$ are correlated due to treatment spillovers (both $Y_1$ and $Y_2$ may depend on $D_2$) and network-correlated effects ($\varepsilon_1$ and $\varepsilon_2$ may be correlated under assumption 3). Since $A_{12}A_{23} = 1$, $W_1$ and $W_3$ are also correlated for the same reasons (e.g., $D_2$ affects both $Y_1$ and $Y_3$). However, since 1 and 4 are more than two links apart, $W_1 \perp\!\!\!\perp W_4 \mid A$.

In general, $W_i$ and $W_j$ are dependent if and only if $i$ and $j$ are at most two links apart in the observed network. We can represent this dependence structure using a dependency graph, a symmetric and binary $n \times n$ matrix with $ij$th entry given by equation (6). This graph connects two units if and only if they are no more than two links apart in the observed network $\tilde{A}$. Then by construction, for any two disjoint subsets $I_1, I_2 \subseteq \{1, \ldots, n\}$ with $G_{jk} = 0$ for all $j \in I_1$ and $k \in I_2$, we have

\[
\{W_i : i \in I_1\} \perp\!\!\!\perp \{W_j : j \in I_2\} \mid G.
\]

That is, two observations are linked in $G$ if and only if they are dependent.

This suggests that sparsity of $G$ is important for weak dependence. In the extreme case where all units are connected in $G$, clearly the sample is "strongly dependent" because all $\varepsilon_i$'s may be correlated under assumption 3. If $G$ is empty, then the $W_i$'s are mutually independent. Thus, for "weak" dependence, $G$ cannot contain too many high-degree units, that is, units connected with many other units. That is, $G$


needs to be sufficiently sparse, as formalized in the next assumption.

Define $N_i = \{j : G_{ij} = 1,\ 1 \le j \le n\}$, the set of sampled neighbors of $i$ in the dependency graph $G$, and let $(G^3)_{ij}$ denote the $ij$th entry of the third matrix power of $G$. Finally, let $|S|$ denote the cardinality of any set $S$.

Assumption 4 (Degree Distribution). $n^{-1}\sum_{i=1}^n |N_i|^3$ and $n^{-1}\sum_{i=1}^n \sum_{j \ne i} (G^3)_{ij}$ are bounded in probability.

Note that $|N_i|$ is the number of units at most two links apart from $i$ in the sampled network $\tilde{A}$. Since $\gamma_i \le |N_i|$, the first condition implies that the average degree in the sampled network $\tilde{A}$ is asymptotically bounded. This is precisely what we mean by the network being sparse; it is well known that most real-world networks have average degree substantially smaller than the network size (Barabási, 2015).

In practice, sparsity can be assessed by computing the density of the dependency graph, which is the fraction of linked pairs:

\[
\frac{1}{\binom{n}{2}} \sum_{i \le j} G_{ij}.
\]

This is equal to the average degree times $2/(n - 1)$. While there is no universally accepted value of the density above which a network is considered dense and below which it is sparse, it is useful to compare the value against those of other social networks. For example, the sparse networks surveyed in Chandrasekhar (2016) all have density below 12% and most below 7%.

A simple example of a network that satisfies assumption 4 is one with uniformly bounded degree. For instance, online social networks may have a hard cap on the number of friends or followers, which is typically substantially smaller than the universe of users. This also holds if $A$ is composed of many disconnected subnetworks of uniformly bounded size, which corresponds to the usual sampling frame for clustered standard errors. However, this is more reasonably considered a large sample of networks rather than a single one. In section A.3 of the appendix, we verify assumption 4 for a more realistic model of network formation that allows for an unbounded degree distribution.

Assumption 4 requires the existence of higher-order moments of the degree distribution of $G$. Since degrees in $G$ are necessarily larger than degrees in $A$, this in turn implies the existence of higher-order moments of $P(\cdot)$ defined in assumption 2b. The requirement $n^{-1}\sum_{i=1}^n |N_i|^3 = O_p(1)$ means that the third moment of the degree distribution is asymptotically bounded. From the previous discussion on sparsity, it is clear why the first moment must be bounded. If the second moment were not, then the variance of the distribution would be infinite. This would mean that many observations have a high degree in $G$, so the sample is strongly correlated for the same reasons. The second average $n^{-1}\sum_{i=1}^n \sum_{j \ne i} (G^3)_{ij}$ is a higher-order moment of the degree distribution, which is also required to remain asymptotically bounded. Such a requirement plays a similar role to the restriction on the third moment of the degree distribution to ensure that the tails of this distribution are sufficiently thin so that most units have small degree. Formally, the inner sum $\sum_{j \ne i} (G^3)_{ij}$ counts the number of walks of length 3 emanating from unit $i$ in the graph induced by $G$.

The existence of higher-order moments of the degree distribution is testable by estimating its tail index, which is the subject of a large literature (Ibragimov, Ibragimov, & Walden, 2015). Of course, this requires additional weak-dependence conditions on the network formation model and appropriate standard errors for the tail index estimator, which is beyond the scope of this paper. For recent progress related to this topic, see Leung & Moon (2019) for a CLT for network data and Leung (2019b) for testing for power law degree distributions.

Remark 2. It is possible to relax assumption 3 to accommodate a general class of heterogeneous correlated effects, including those arising from cluster-level correlation with possibly overlapping clusters or certain forms of spatial dependence. Suppose that in addition to treatment spillovers, there exists dependence in unobservables through an exogenous network $A' \ne A$ for which assumption 3 holds with $A$ replaced by $A'$. Then the correct dependency graph would be $G'$, where $G'_{ij} = \max\{G_{ij}, A'_{ij}\}$. Assumption 4 must then hold for $G'$. The network $A'$ can capture cluster dependence if the network has a block-diagonal structure with blocks corresponding to clusters. The clusters may even be overlapping. We can also allow for direct generalizations of assumption 3 where $\varepsilon_i$ and $\varepsilon_j$ are correlated (so $A'_{ij} = 1$) if and only if the path distance between $i$ and $j$ in $A$ is below $K$ for any finite $K$. In the case of assumption 3, this holds for $K \le 2$.

B. Large-Sample Theory

The next results establish consistency and asymptotic normality of our nonparametric estimator. The results use two regularity conditions in section A.2 of the appendix, which require the existence of certain fourth moments and limits. All proofs can be found in section SA.5 of the appendix.

Theorem 1 (LLN). Under assumptions 1 to 4 and regularity conditions (assumptions 5 and 6),

\[
\hat{\mu}_h(d, t, \gamma) \xrightarrow{p} \lim_{N \to \infty} \mu_h(d, t, \gamma)
\]

for all $d \in \{0, 1\}$, $\gamma \in \mathbb{N}$, and integers $t \le \gamma$.

This implies consistency of analog estimators of average treatment and spillover effects, as well as quantile analogs. The next theorem establishes asymptotic normality for average effects.

Theorem 2 (CLT for ATE). Under assumptions 1 to 4 and regularity conditions (assumptions 5 and 6 in the appendix for $h$ equal to the identity function), for any $d, d' \in \{0, 1\}$,


$\gamma \in \mathbb{N}$, and integers $t, t' \le \gamma$, there exists $\sigma^2_{TS} < \infty$ such that

\[
\sqrt{n}\Big(\big(\hat{\mu}(d, t, \gamma) - \hat{\mu}(d', t', \gamma)\big) - \big(\mu(d, t, \gamma) - \mu(d', t', \gamma)\big)\Big) \xrightarrow{d} \mathcal{N}(0, \sigma^2_{TS}).⁷
\]

TABLE 1. LINEAR REGRESSION ESTIMATES

                 Treatment   Spillover
Estimates          0.017       0.468
Network SEs        0.050       0.232
HC SEs             0.076       0.254

Note: Network SEs computed from equation (8). HC SEs are conventional robust standard errors. n = 142.
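The two rows of standard errors in Table 1 differ only in the matrix $G$ plugged into equation (8). The following sketch, run on made-up data rather than the Cai et al. (2015) sample, shows the sandwich formula for an arbitrary $G$, checks the two special cases mentioned in the text (identity and block-diagonal $G$, without small-sample corrections), and then repeats the normal-approximation arithmetic for the spillover column of the table. The function name `ols_network_vcov`, the simulated regressors, and the cluster layout are assumptions introduced for illustration.

```python
import numpy as np

def ols_network_vcov(X, Y, G):
    """OLS fit plus the sandwich of equation (8) for a given dependency graph G."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ Y
    resid = Y - X @ beta
    M = X * resid[:, None]                  # row i is X_i' times the residual of unit i
    vcov = XtX_inv @ (M.T @ G @ M) @ XtX_inv
    return beta, vcov

# Synthetic illustration (hypothetical data).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.integers(0, 2, n), rng.uniform(size=n)])
Y = X @ np.array([0.1, 0.0, 0.5]) + rng.standard_normal(n)

# G = identity: the Eicker-Huber-White covariance for independent observations.
beta, v_hc = ols_network_vcov(X, Y, np.eye(n))

# G = blocks of ones: the cluster-robust covariance (here 20 clusters of 10 units).
clusters = np.repeat(np.arange(20), 10)
G_cluster = (clusters[:, None] == clusters[None, :]).astype(float)
_, v_cl = ols_network_vcov(X, Y, G_cluster)

# z-statistics for the spillover estimate in Table 1, compared with 1.96.
for label, se in [("network", 0.232), ("HC", 0.254)]:
    z = 0.468 / se
    print(label, round(z, 2), abs(z) > 1.96)
# prints: network 2.02 True
#         HC 1.84 False
```

Standard errors are the square roots of the diagonal of the returned matrix. With network data, $G$ would instead be built from the adjacency matrix as in equation (6).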

Our last result concerns the consistency of $\hat{\sigma}^2_{TS}$.

Proposition 2 (Variance Estimator). Under the assumptions of theorem 2, $\hat{\sigma}^2_{TS} \xrightarrow{p} \sigma^2_{TS}$.

In section SA.2 of the appendix, we provide analogous results for linear regression.

IV. Empirical Application

Cai et al. (2015) conduct a randomized controlled trial in rural China in which farmers are randomly provided information on a weather insurance product. They collect data on farmers' social networks and estimate the spillover effect of having more treated friends on own take-up of the insurance product. The hypothesis is that information obtained by treated farmers may diffuse through the social network, inducing take-up even among the untreated. We illustrate the use of our methods by replicating their analysis on a small subset of their data set.

Their network data are obtained from a household census asking each head of household to list up to five close friends. This corresponds to snowball-sampling 1-neighborhoods (albeit with censoring) and induces a directed network on the set of sampled units. By "directed," we mean that if $A_{ij} = 1$ ($j$ is listed as a close friend of $i$), it does not necessarily imply that $A_{ji} = 1$. Since degrees are uniformly bounded, this satisfies assumption 4.

The authors find a significant network spillover effect on take-up of the insurance product, using data on many villages, with standard errors clustered at the village level. We replicate their main specification, equation (3), using only data on the largest two villages in their analysis. This reduces the sample size from 1,255 in their pooled specification to 142 units.⁸ With so few clusters, the only available option in the literature for computing standard errors is heteroskedasticity-consistent (HC) standard errors, but these can only be applied to independent data. As discussed in section IID, treatment spillovers and network-correlated effects lead to violations of independence across units. This motivates the use of our standard errors, which can account for this dependence.

Following equation (3) of Cai et al. (2015), we estimate the following linear probability model:

\[
Y_i = \beta_1 + \beta_2 D_i + \beta_3 T_i/\gamma_i + X_i'\beta_4 + \varepsilon_i,
\]

where $X_i$ is a vector of controls discussed in the paper. We compute standard errors using our variance estimator in equation (8), where the dependency graph $G$ used in the formula is constructed by converting $A$ to an undirected network (i.e., we set $A_{ij} = 1$ whenever $A_{ji} = 1$) and then following equation (6). This accounts for dependence through treatment spillovers as well as network-correlated effects as in, for instance, Example 1. The resulting dependency graph has 485 links, and the density is 6%, corresponding to a sparse network. For comparison, we also report conventional heteroskedasticity-consistent (HC) standard errors that treat observations as i.i.d. These are obtained by setting $G$ in equation (8) equal to the identity matrix, which corresponds to a graph with only $n = 142$ self-links.

Table 1 displays our results. Similar to Cai et al. (2015), we find positive spillover effects, which are significant at the 5% level when using our proposed standard errors. The magnitude of the spillover effect at 0.468 is similar to the original full-sample estimate of 0.278 and indicates that when the fraction of treated friends increases from 0 to 1, take-up increases by 46.8 percentage points. The treatment effect is substantially smaller and insignificant, in line with the full-sample estimate. If HC standard errors, which are larger than ours, are used instead, then both the treatment and spillover effects are insignificant. Note that this is not a general phenomenon; for instance, in our simulation study, the correlated-effects structure induces network standard errors that are larger than the HC standard errors.

⁷ The notion of convergence used is "in probability" conditional convergence in distribution, where we condition on the sampled network $\tilde{A}$. That is, the conditional (on $\tilde{A}$) Wasserstein distance between the laws of $\sqrt{n}\big((\hat{\mu}(\tau) - \hat{\mu}(\tau')) - (\mu(\tau) - \mu(\tau'))\big)$ and $\mathcal{N}(0, \sigma^2_{TS})$ converges in probability (unconditionally) to 0. See theorem SA.4.1 in the appendix.

⁸ Following Cai et al. (2015), observations used in the regression consist only of farmers assigned to second-round groups who did not receive first-round take-up information.

V. Monte Carlo

In this section, we provide simulation results illustrating the finite-sample properties of our proposed estimators. Our first exercise concerns the nonparametric estimator of average treatment and spillover effects and the second the linear regression estimator. For the former, we estimate the spillover effect:

\[
\mu(0, 1, 3) - \mu(0, 0, 3).
\]

Outcomes are realized according to the random-coefficients model,

\[
Y_i = \theta_{i1} + \theta_{i2} D_i + \theta_{i3} T_i + \theta_{i4} T_i^2 + \theta_{i5} T_i \gamma_i,
\]


where the parameters are independent and distributed as follows: $\theta_{i2}, \theta_{i3} \overset{iid}{\sim} \mathcal{N}(1, 1)$, $\theta_{i4}, \theta_{i5} \overset{iid}{\sim} \mathcal{N}(-1, 1)$, and $\theta_{i1} = \gamma_i^{-1}\sum_j A_{ij}\nu_j$, where $\nu_j \overset{iid}{\sim} \mathcal{N}(1, 1)$. Thus, $\theta_{i1}$ captures peer effects in unobservables, and the true average spillover effect is $-3$ (since $E[\theta_{i3}] + E[\theta_{i4}] + 3E[\theta_{i5}] = 1 - 1 - 3$). The probability of being treated is 0.3.

A. Network Formation

We assume observation of the full network, so $n = N$. The network formation model follows equation (4) in section IIE. We assume geographic locations $\rho_i$ are uniformly distributed on $[0, 1]^2$, $\alpha_i \overset{iid}{\sim} \text{Ber}(0.5)$, and $\zeta_{ij} \overset{iid}{\sim} \mathcal{N}(0, 1)$. The true parameter for the network-formation model is $(\theta_1, \theta_2, \theta_3, \theta_4) = (-0.25, 0.5, -1, 0.25)$. We set the scaling constant $r$ equal to $(\kappa/N)^{1/2}$ to ensure network sparsity, where $\kappa = 3.28$.⁹ We simulate networks using myopic best-response dynamics, where the initial network is the network of geographic neighbors $\mathbf{1}\{r^{-1}\|\rho_i - \rho_j\| \le 1\}$.

B. Results

Table 2 displays the results of our first exercise. The cells display averages over the simulation draws. Row "Estimate" is the point estimate and "SE" the standard error, computed using equation (7), where $G$ is constructed as in equation (6). Row "Eff. N (0, 1, 3)" equals $\sum_{i=1}^N \mathbf{1}_i(0, 1, 3)$ and "Eff. N (0, 0, 3)" equals $\sum_{i=1}^n \mathbf{1}_i(0, 0, 3)$; these give the (average) effective sample sizes, rounded to the nearest integer. The standard frequency estimator is unbiased, so unsurprisingly, the estimates are close to the true value of $-3$ across all sample sizes. Coverage using our standard errors is close to the nominal level.

TABLE 2. NONPARAMETRIC ESTIMATOR

N                   1,000    2,000    3,000
Eff. N (0,1,3)         22       43       62
Eff. N (0,0,3)         17       33       48
Estimate            −3.00    −2.98    −3.02
SE                   0.71     0.52     0.43
Coverage             0.93     0.94     0.94

Note: Three thousand simulations. Target coverage: 0.95. Estimates computed from equation (5) and standard errors from equation (7). Second and third rows display effective sample sizes of the two means in the nonparametric estimator.

C. Linear Regression

For our second exercise, we consider the linear-in-means model (1), with $\beta_0 = (-0.25, 0.5, 0.25)$ and $\varepsilon_i = \gamma_i^{-1}\sum_j A_{ij}\nu_j$ for $\nu_i \overset{iid}{\sim} \mathcal{N}(0, 1)$. As in the random-coefficients specification, $\varepsilon_i$ captures peer effects in unobservables. Table 3 shows simulation results for the linear regression estimator with our standard errors (8), where $G$ is constructed as in equation (6). For comparison, we display conventional heteroskedasticity-consistent (HC) standard errors computed under the assumption of i.i.d. data ("HC SE") and the corresponding coverage ("HC Coverage"). Our standard errors perform well in terms of coverage. HC standard errors for the intercept and spillover parameters are smaller than ours and substantially undercover.

TABLE 3. LINEAR REGRESSION

             β0                  −0.25    0.5     0.25
N = 500      Estimate            −0.25    0.50    0.25
             Network SE           0.14    0.10    0.30
             Network coverage     0.93    0.93    0.93
             HC SE                0.09    0.11    0.25
             HC coverage          0.78    0.94    0.87
N = 1000     Estimate            −0.25    0.50    0.25
             Network SE           0.10    0.07    0.22
             Network coverage     0.94    0.94    0.94
             HC SE                0.07    0.08    0.18
             HC coverage          0.79    0.95    0.87

Note: Three thousand simulations. Target coverage: 0.95. Network SEs computed from equation (8). HC SE displays conventional robust standard errors.

VI. Conclusion

This paper studies inference on treatment and spillover effects when SUTVA is violated and spillovers are mediated by a network. In contrast to most of the existing literature, we only require the observation of a single large network, and we consider a super-population model that allows the network to be partially observed. We study the large-sample properties of nonparametric and linear regression estimators of treatment and spillover effects. We show that the estimators are asymptotically normal under restrictions on the network structure and develop new network-robust standard errors. Our technical contributions include a new CLT for locally dependent data without requiring uniformly bounded degree and verification of high-level conditions for local dependence under a strategic model of network formation.

While we focus on network-driven sources of dependence, our results are more broadly applicable to other forms of heterogeneous cross-sectional dependence, including spatial correlation and cluster dependence with overlapping clusters. We also anticipate that it is straightforward to extend our results to unconfounded treatment assignment and quantile effects. An important question unaddressed by this work is optimal assignment of treatments under network interference. (For some work on this topic, see Basse & Airoldi, 2015, and Ugander et al., 2013.)

REFERENCES

Aronow, P., and C. Samii, "Estimating Average Causal Effects under General Interference, with Application to a Social Network Experiment," Annals of Applied Statistics 11 (2017), 1912–1947.
Athey, S., D. Eckles, and G. Imbens, "Exact P-Values for Network Interference," NBER technical report (2015).
Baird, S., J. Bohren, C. McIntosh, and B. Ozler, "Optimal Design of Experiments in the Presence of Interference," this REVIEW 100 (2018), 844–860.

⁹ More precisely, $\kappa = \big(\pi\,\|P(-\theta_4 < \theta_1 + \theta_2(\alpha_i + \alpha_j) + \zeta_{ij} \le 0 \mid \alpha_i, \alpha_j)\|_2\big)^{-1} - 0.3$ for $\|\cdot\|_2$ equal to the $L_2$ norm. This choice of $\kappa$ ensures a weak-dependence condition holds (assumption 6 of Leung, 2019a).
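The value $\kappa = 3.28$ in footnote 9 can be checked numerically. The sketch below is an illustration under two stated assumptions: the probability in the footnote is read as that of the event $-\theta_4 < \theta_1 + \theta_2(\alpha_i + \alpha_j) + \zeta_{ij} \le 0$, that is, the event in which the strategic term is pivotal for the link, and the $L_2$ norm is taken over the joint distribution of $(\alpha_i, \alpha_j)$ with $\alpha_i \overset{iid}{\sim} \text{Ber}(0.5)$. Under this reading the formula gives approximately 3.28. The helper names are hypothetical.

```python
from math import erf, pi, sqrt

def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

theta1, theta2, theta4 = -0.25, 0.5, 0.25

def pivotal_prob(s):
    """P(-theta4 < theta1 + theta2 * s + zeta <= 0) for zeta ~ N(0, 1)."""
    base = theta1 + theta2 * s
    return norm_cdf(-base) - norm_cdf(-theta4 - base)

# alpha_i and alpha_j are iid Bernoulli(0.5), so s = alpha_i + alpha_j takes
# the values 0, 1, 2 with probabilities 0.25, 0.5, 0.25.
weights = {0: 0.25, 1: 0.5, 2: 0.25}
l2_norm = sqrt(sum(w * pivotal_prob(s) ** 2 for s, w in weights.items()))
kappa = 1.0 / (pi * l2_norm) - 0.3
print(round(kappa, 2))  # 3.28
```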


Bandiera, O., I. Barankay, and I. Rasul, "Social Connections and Incentives in the Workplace: Evidence from Personnel Data," Econometrica 77 (2009), 1047–1094.
Banerjee, A., A. Chandrasekhar, E. Duflo, and M. Jackson, "The Diffusion of Microfinance," Science 341 (2013), 1236498.
Barabási, A., Network Science (Cambridge: Cambridge University Press, 2015).
Basse, G., and E. Airoldi, "Optimal Model-Assisted Design of Experiments for Network Correlated Outcomes Suggests New Notions of Network Balance," arXiv:1507.00803 (2015).
Bhattacharya, D., P. Dupas, and S. Kanaya, "Estimating the Impact of Means-Tested Subsidies under Treatment Externalities with Application to Anti-Malarial Bednets," NBER working paper 18833 (2013).
Bond, R., C. Fariss, J. Jones, A. Kramer, C. Marlow, S. J. Settle, and J. H. Fowler, "A 61-Million-Person Experiment in Social Influence and Political Mobilization," Nature 489 (2012), 295–298.
Bramoullé, Y., H. Djebbari, and B. Fortin, "Identification of Peer Effects through Social Networks," Journal of Econometrics 150 (2009), 41–55.
Brock, W., and S. Durlauf, "Discrete Choice with Social Interactions," Review of Economic Studies 68 (2001), 235–260.
Bursztyn, L., F. Ederer, B. Ferman, and N. Yuchtman, "Understanding Mechanisms Underlying Peer Effects: Evidence from a Field Experiment on Financial Decisions," Econometrica 82 (2014), 1273–1301.
Cai, J., A. De Janvry, and E. Sadoulet, "Social Networks and the Decision to Insure," American Economic Journal: Applied Economics 7 (2015), 81–108.
Carrell, S., B. Sacerdote, and J. West, "From Natural Variation to Optimal Policy? The Importance of Endogenous Peer Group Formation," Econometrica 81 (2013), 855–882.
Chandrasekhar, A., "Econometrics of Network Formation," in Y. Bramoullé, A. Galeotti, and B. Rogers, eds., Oxford Handbook on the Econometrics of Networks (New York: Oxford University Press, 2016).
Christakis, N., "Social Networks and Collateral Health Effects," British Medical Journal 329 (2004), 184.
Christakis, N., J. Fowler, G. Imbens, and K. Kalyanaraman, "An Empirical Model for Strategic Network Formation," Harvard University working paper (2010).
De Giorgi, G., M. Pellizzari, and S. Redaelli, "Identification of Social Interactions through Partially Overlapping Peer Groups," American Economic Journal: Applied Economics 2 (2010), 241–275.
Duflo, E., and E. Saez, "The Role of Information and Social Interactions in Retirement Plan Decisions: Evidence from a Randomized Experiment," Quarterly Journal of Economics 118 (2003), 815–842.
Duflo, E., P. Dupas, and M. Kremer, "Peer Effects, Teacher Incentives, and the Impact of Tracking: Evidence from a Randomized Evaluation in Kenya," American Economic Review 101 (2011), 1739.
Dupas, P., "Short-Run Subsidies and Long-Run Adoption of New Health Products: Evidence from a Field Experiment," Econometrica 82 (2014), 197–228.
Fafchamps, M., and F. Gubert, "The Formation of Risk-Sharing Networks," Journal of Development Economics 83 (2007), 326–350.
Goldsmith-Pinkham, P., and G. Imbens, "Social Networks and the Identification of Peer Effects," Journal of Business and Economic Statistics 31 (2013), 253–264.
Graham, B., "Identifying Social Interactions through Conditional Variance Restrictions," Econometrica 76 (2008), 643–660.
Graham, B., "Homophily and Transitivity in Dynamic Network Formation," University of California, Berkeley working paper (2016).
Graham, B., "An Empirical Model of Network Formation: Detecting Homophily When Agents Are Heterogeneous," Econometrica 85 (2017), 1033–1063.
Hirano, K., and J. Hahn, "Design of Randomized Experiments to Measure Social Interaction Effects," Economics Letters 106:1 (2010), 51–53.
Hoff, P., A. Raftery, and M. Handcock, "Latent Space Approaches to Social Network Analysis," Journal of the American Statistical Association 97 (2002), 1090–1098.
Hudgens, M., and M. Halloran, "Toward Causal Inference with Interference," Journal of the American Statistical Association 103 (2008), 832–842.
Ibragimov, M., R. Ibragimov, and J. Walden, Heavy-Tailed Distributions and Robustness in Economics and Finance (New York: Springer, 2015).
Jackson, M., Social and Economic Networks (Princeton, NJ: Princeton University Press, 2008).
Jackson, M., and A. Watts, "The Evolution of Social and Economic Networks," Journal of Economic Theory 106 (October 2002), 265–295.
Kim, D., A. Hwong, D. Stafford, D. Hughes, A. O'Malley, J. Fowler, and N. Christakis, "Social Network Targeting to Maximise Population Behaviour Change: A Cluster Randomised Controlled Trial," Lancet 386 (2015), 145–153.
Kling, J., J. Liebman, and L. Katz, "Experimental Analysis of Neighborhood Effects," Econometrica 75 (2007), 83–119.
Kramer, A., J. Guillory, and J. Hancock, "Experimental Evidence of Massive-Scale Emotional Contagion through Social Networks," Proceedings of the National Academy of Sciences 111 (2014), 8788–8790.
Lazzati, N., "Treatment Response with Social Interactions: Partial Identification via Monotone Comparative Statics," Quantitative Economics 6 (2015), 49–83.
Leung, M., "A Weak Law for Moments of Pairwise-Stable Networks," Journal of Econometrics 210 (2019a), 310–326.
Leung, M., "Dependence-Robust Inference Using Resampled Statistics," University of Southern California working paper (2019b).
Leung, M., and R. Moon, "Normal Approximation in Large Network Models," University of Southern California working paper (2019).
Liu, L., and M. Hudgens, "Large Sample Randomization Inference of Causal Effects in the Presence of Interference," Journal of the American Statistical Association 109 (2014), 288–301.
Manresa, E., "Estimating the Structure of Social Interactions Using Panel Data," MIT working paper (2016).
Manski, C., "Identification of Endogenous Social Effects: The Reflection Problem," Review of Economic Studies 60 (1993), 531–542.
Manski, C., "Identification of Treatment Response with Social Interactions," Econometrics Journal 16 (2013), S1–S23.
Mele, A., "A Structural Model of Segregation in Social Networks," Econometrica 85 (2017), 825–850.
Miguel, E., and M. Kremer, "Worms: Identifying Impacts on Education and Health in the Presence of Treatment Externalities," Econometrica 72 (2004), 159–217.
Oster, E., and R. Thornton, "Determinants of Technology Adoption: Peer Effects in Menstrual Cup Take-Up," Journal of the European Economic Association 10 (2012), 1263–1293.
Paluck, E., H. Shepherd, and P. Aronow, "Changing Climates of Conflict: A Social Network Experiment in 56 Schools," Proceedings of the National Academy of Sciences 113 (2016), 566–571.
Penrose, M., and J. Yukich, "Weak Laws of Large Numbers in Geometric Probability," Annals of Applied Probability 13:1 (2003), 277–303.
Shalizi, C., and A. Thomas, "Homophily and Contagion Are Generically Confounded in Observational Social Network Studies," Sociological Methods and Research 40 (2011), 211–239.
Sheng, S., "A Structural Econometric Analysis of Network Formation Games through Subnetworks," Econometrica, forthcoming.
Sobel, M., "What Do Randomized Studies of Housing Mobility Demonstrate? Causal Inference in the Face of Interference," Journal of the American Statistical Association 101 (2006), 1398–1407.
Sofrygin, O., and M. van der Laan, "Semi-Parametric Estimation and Inference for the Mean Outcome of the Single Time-Point Intervention in a Causally Connected Population," University of California, Berkeley working paper (2015).
Tchetgen Tchetgen, E., and T. VanderWeele, "On Causal Inference in the Presence of Interference," Statistical Methods in Medical Research 21 (2012), 55–75.
Toulis, P., and E. Kao, "Estimation of Causal Peer Influence Effects" (pp. 1489–1497), in Proceedings of the 30th International Conference on Machine Learning (JMLR: W&CP, 2013).
Ugander, J., B. Karrer, L. Backstrom, and J. Kleinberg, "Graph Cluster Randomization: Network Exposure to Multiple Universes" (pp. 329–337), in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (New York: ACM, 2013).


Valente, T., A. Ritt-Olson, A. Stacy, J. Unger, J. Okamoto, and S. Sussman, “Peer Acceleration: Effects of a Social Network Tailored Substance Abuse Prevention Program among High-Risk Adolescents,” Addiction 102 (2007), 1804–1815.

van der Laan, M., “Causal Inference for a Population of Causally Connected Units,” Journal of Causal Inference 2:1 (2014), 13–74.

Appendix

A. Generalized Method of Moments

Let $m(\cdot)$ be a $p$-dimensional vector of moments. In this section, we assume the model is defined by the moment condition

$$E[m(W_i, \theta_0) \mid X_i, \tilde{A}] = 0. \tag{A1}$$

This allows for nonlinear models, such as models with discrete outcomes. For instance, if $Y_i$ is binary, we can consider a logit analog of equation (1) and set $m(W_i, \theta_0)$ to be the implied score function.

The GMM objective function is $M_n(\theta)' \,\Xi\, M_n(\theta)$, where $M_n(\theta) = n^{-1} \sum_{i=1}^n m(W_i, \theta)$ is the vector of sample moments and $\Xi$ a $p \times p$ weight matrix. Using theorem SA.4.1 in the appendix, it is straightforward to derive regularity conditions such that the GMM estimator $\hat{\theta}$ satisfies

$$\sqrt{n}\,(\hat{\theta} - \theta_0) \xrightarrow{d} N(0, \Sigma),$$

where for $S_n(\theta) = n^{-1} \sum_{i=1}^n \nabla_\theta m(W_i, \theta)$,

$$\Sigma = \lim_{n \to \infty} V_n^{-1}\, E[S_n(\theta_0) \mid \tilde{A}]' \,\Xi\, \Omega_n(\theta_0) \,\Xi\, E[S_n(\theta_0) \mid \tilde{A}]\, V_n^{-1},$$

$$V_n = E[S_n(\theta_0) \mid \tilde{A}]' \,\Xi\, E[S_n(\theta_0) \mid \tilde{A}],$$

$$\Omega_n(\theta_0) = E[\, n\, M_n(\theta_0) M_n(\theta_0)' \mid \tilde{A}\,].$$

An estimator for $\Sigma$ analogous to equation (8) for the OLS estimator is

$$\hat{\Sigma} = \big(S_n(\hat{\theta})' \,\Xi\, S_n(\hat{\theta})\big)^{-1} S_n(\hat{\theta})' \,\Xi\, M' G M \,\Xi\, S_n(\hat{\theta}) \big(S_n(\hat{\theta})' \,\Xi\, S_n(\hat{\theta})\big)^{-1},$$

where

$$M = \begin{pmatrix} m_1(W_1, \hat{\theta}) & \cdots & m_p(W_1, \hat{\theta}) \\ \vdots & & \vdots \\ m_1(W_n, \hat{\theta}) & \cdots & m_p(W_n, \hat{\theta}) \end{pmatrix}.$$

B. Regularity Conditions

The asymptotic theory makes use of the following regularity conditions. The first condition requires the existence of fourth moments. Note that for consistency, this can be relaxed to second moments.

Assumption 5. For any $d \in \{0, 1\}$, $\gamma \in \mathbb{N}$, and integer $t \le \gamma$,

$$\sup_n E\big[h(r(d, t, \gamma, \varepsilon_1(\gamma)))^4\big] < \infty.$$

The next condition requires the existence of certain limits. We verify this assumption under a strategic model of network formation in section A.3.

Assumption 6. For any $d \in \{0, 1\}$, $\gamma \in \mathbb{N}$, and $t \le \gamma$,

$$\frac{1}{n} \sum_{i=1}^n \sum_{j \in \mathcal{N}_i \setminus \{i\}} E\big[\, h(r(d, t, \gamma, \varepsilon_i(\gamma)))\, h(r(d, t, \gamma, \varepsilon_j(\gamma))) \times \mathbf{1}\{D_i = d, T_i = t, \gamma_i = \gamma\}\, \mathbf{1}\{D_j = d, T_j = t, \gamma_j = \gamma\} \mid \tilde{A} \,\big]$$

has a finite probability limit. Also, $\lim_{n \to \infty} \mu_h(d, t, \gamma)$ exists.

C. General Model of Network Formation

In this section, we introduce a strategic model of network formation under which we verify assumptions 2b, 4, and 6, and SA.2.1, the last of which is used to derive the asymptotic properties of the linear regression estimator. The model is a special case of that studied in Leung (2019a), which provides general weak-dependence conditions under which network moments obey a weak law of large numbers. This weak law is used to verify the assumptions. For an illustrative example of the model, see equation (4) in section IIE.

In addition to $(D_i, \varepsilon_i)$, each unit $i$ is endowed with a type $(\rho_i, \alpha_i)$ relevant for link formation, which can include characteristics like age and education level. The component $\rho_i$ is a vector of homophilous attributes, meaning that as the dissimilarity between $\rho_i$ and $\rho_j$ grows, link formation between $i$ and $j$ becomes increasingly unlikely. Examples include income and geographic location. They can also more abstractly represent locations in a latent “social space” following the literature on latent-space models (Hoff, Raftery, & Handcock, 2002). Finally, each pair of units $(i, j)$ is endowed with a random-utility shock to link-formation incentives $\zeta_{ij}$.

Potential links in the network satisfy the following pairwise-stability condition. For all pairs of units $i, j$,

$$A_{ij} = \mathbf{1}\big\{ V(r^{-1}\|\rho_i - \rho_j\|, S_{ij}, \alpha_i, \alpha_j, \zeta_{ij}) > 0 \big\}, \tag{A2}$$

where $\|\cdot\|$ is a norm on $\mathbb{R}^d$ and $r$ a scaling constant defined below. The vector of statistics

$$S_{ij} = S\big( r^{-1}\|\rho_i - \rho_j\|,\ \{ r^{-1}\|\rho_k - \rho_l\| : k, l = 1, \ldots, N \},\ A_{-ij},\ \{(\alpha_i, \alpha_j, \zeta_{ij})\}_{i \ne j} \big)$$

captures strategic interactions, where $A_{-ij}$ is the adjacency matrix excluding the potential link between $i$ and $j$. In equation (4), $S_{ij} = \max_k A_{ik} A_{jk}$. Another example of strategic interactions is preferential attachment, the tendency for high-degree individuals to obtain additional connections.

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/rest_a_00818 by guest on 26 September 2021 380 THE REVIEW OF ECONOMICS AND STATISTICS

This is captured by including the degrees $\gamma_i, \gamma_j$ in $S_{ij}$. In equation (4), $V$ takes a linear form. Here we only assume that the joint surplus is strictly decreasing in the first component of $S_{ij}$, which captures homophily in $\rho_i$.

As discussed in section IIE, most real-world social networks are sparse, formalized by the requirement that the expected degree of any unit is asymptotically finite. The next assumption imposes a stronger version of this requirement.

Assumption 7 (Sparsity). Let $r = (\kappa/N)^{1/d}$, where $\kappa$ is a positive constant. Furthermore,

$$\sup_{N \in \mathbb{N}} \sup_{\rho_i, \alpha_i} N\, E\Big[ \big(r^{-1}\|\rho_i - \rho_j\| + 1\big)\, \mathbf{1}\Big\{ \sup_s V(r^{-1}\|\rho_i - \rho_j\|, s, \alpha_i, \alpha_j, \zeta_{ij}) > 0 \Big\} \,\Big|\, \rho_i, \alpha_i \Big] < \infty.$$

The scaling constant $r$ is taken to zero as $N \to \infty$ at the rate $N^{-1/d}$, where $d$ is the dimension of $\rho_i$. The rate is standard to ensure sparsity of random geometric graphs, which are special cases of the model studied here. For more complicated models, we have to impose the second requirement to ensure sparsity. Leung (2019a, sec. D) discusses primitive conditions for this. For example, the assumption is satisfied by the model in section IIE.

We maintain the following distributional assumptions. Types are i.i.d. across units, and random-utility shocks are i.i.d. across unit pairs. Homophilous attributes $\rho_i$ are continuously distributed and independent of all other primitives. This independence assumption is maintained to simplify the proofs, but it can be relaxed to assumption 8 of Leung (2019a).

Proposition 3. Assumptions 2b, 4, and 6 and SA.2.1 in the supplemental appendix are satisfied under our assumptions 1 and 5 and assumptions 1–3, 6, and 7 in Leung (2019a).

Proof. See section SA.5.

Compared with the assumptions for theorem 1 of Leung (2019a), the only difference here is that we replace the sparsity condition (assumption 4) and the uniform square-integrability condition (equation 14) with assumption 7, which is a sufficient condition. The main weak-dependence conditions imposed on the network-formation model are assumptions 6 and 7 of Leung (2019a). The former restricts the strength of strategic interactions, and the latter restricts the equilibrium selection mechanism. These weak-dependence conditions imply a law of large numbers for a large class of network moments, which we apply to prove the proposition.
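To make the scaling in assumption 7 concrete, the following Python sketch simulates the simplest special case noted in this section, a random geometric graph. Everything specific to it is an illustrative assumption rather than part of the paper's model: attributes $\rho_i$ are drawn uniformly on the unit cube, the norm is Euclidean, the surplus is $V(\delta, \cdot) = 1 - \delta$ (strictly decreasing in the scaled distance, with no strategic interactions and no random-utility shocks), and the function names `simulate_rgg` and `mean_degree` are hypothetical.

```python
import numpy as np


def simulate_rgg(N, kappa=1.0, d=2, seed=0):
    """Adjacency matrix of a random geometric graph with r = (kappa/N)^(1/d).

    Assumed special case of the link rule: A_ij = 1{1 - ||rho_i - rho_j|| / r > 0},
    i.e. units link if and only if their attributes are closer than r.
    """
    rng = np.random.default_rng(seed)
    rho = rng.uniform(size=(N, d))           # homophilous attributes (assumed uniform)
    r = (kappa / N) ** (1.0 / d)             # scaling constant, shrinks at rate N^(-1/d)
    diff = rho[:, None, :] - rho[None, :, :]
    dist = np.linalg.norm(diff, axis=2)      # pairwise Euclidean distances
    surplus = 1.0 - dist / r                 # assumed surplus, decreasing in scaled distance
    A = (surplus > 0).astype(int)
    np.fill_diagonal(A, 0)                   # no self-links
    return A


def mean_degree(A):
    """Average number of network neighbors per unit."""
    return A.sum(axis=1).mean()


# The average degree should stay of the same order as N grows.
for N in (200, 400, 800):
    print(N, mean_degree(simulate_rgg(N, seed=1)))
```

Under these assumptions with $d = 2$, a unit away from the boundary of the unit square has expected degree $(N-1)\pi r^2 = (N-1)\pi\kappa/N$, which is close to $\pi\kappa$ for every $N$. The average degree therefore does not grow with the network size, which is the sense of sparsity the scaling is designed to deliver.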

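The sandwich form of the GMM variance estimator in appendix A can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `gmm_sandwich` is hypothetical, the weight matrix is passed as `Xi`, and the $n \times n$ matrix `G` from equation (8) is not defined in this appendix, so it is treated as a user-supplied input whose normalization is left to the caller.

```python
import numpy as np


def gmm_sandwich(S, Xi, M, G):
    """Sandwich estimator (S' Xi S)^{-1} S' Xi (M' G M) Xi S (S' Xi S)^{-1}.

    S  : (p, k) sample Jacobian of the moments at the estimate
    Xi : (p, p) weight matrix
    M  : (n, p) matrix whose i-th row is the moment vector of unit i
    G  : (n, n) matrix from equation (8); supplied by the caller (assumption)
    """
    bread = np.linalg.inv(S.T @ Xi @ S)
    meat = S.T @ Xi @ (M.T @ G @ M) @ Xi @ S
    return bread @ meat @ bread


# Illustrative check with linear moments m(W_i, theta) = x_i (y_i - x_i' theta).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # exactly identified case: OLS
resid = y - X @ theta_hat
M = X * resid[:, None]                           # rows are x_i * e_i
S = -(X.T @ X) / n                               # Jacobian of the sample moments
# Assumption for this check only: G = I/n, which ignores network dependence.
Sigma_hat = gmm_sandwich(S, np.eye(2), M, np.eye(n) / n)
print(Sigma_hat)
```

With the assumed choice $G = I/n$ and an identity weight matrix, the expression collapses to the familiar heteroskedasticity-robust variance for $\sqrt{n}(\hat{\theta} - \theta_0)$ in a linear model. A matrix $G$ with nonzero off-diagonal entries for connected units is what would let the same formula account for network dependence.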