
Treatment and Spillover Effects under Network Interference

Michael P. Leung*

July 3, 2016

Abstract. We study identification and estimation of treatment and spillover effects when interference is mediated by a network, focusing on the case of randomized assignment. We show that identification of spillover effects is possible with only a single, possibly sampled, network under common shape restrictions on treatment response. Estimation in the single-network setting is nonstandard due to correlation between outcomes induced by network spillovers and correlated effects. Nonetheless, under restrictions on moments of the network, nonparametric estimators for treatment and spillover effects are consistent and asymptotically normal, and we can construct consistent variance estimators. We also derive analogous results for GMM estimators, including robust standard errors.

JEL Codes: C13, C21

Keywords: social networks, treatment effects, spillovers, SUTVA, Stein’s method

*USC, Department of Economics and MIT, Institute for Data, Systems, and Society. E-mail: [email protected].

1 Introduction

A vast literature in econometrics and statistics studies identification and estimation of treatment effects. In this literature, the stable-unit treatment value assumption (SUTVA) is a fundamental assumption, which states that the ego’s treatment response is invariant to the treatment assignment of any alter. In other words, there is no “interference” and there are no “spillovers” in treatment assignment. However, there are many important contexts in which this assumption fails to hold, and the measurement of spillover effects is often of inherent interest.

This paper studies identification and estimation of treatment and spillover effects in the absence of SUTVA. In our model, treatment response may depend on the treatment assignment of others in a network. We focus on randomized assignment, which is relevant for a wide variety of experimental contexts, including those in development economics (Bhattacharya et al., 2013; Duflo et al., 2011; Miguel and Kremer, 2004), social economics (Kling et al., 2007; Sobel, 2006), medical science (Christakis, 2004; Kim et al., 2015; Valente et al., 2007), and the study of online social networks (Bond et al., 2012; Kramer et al., 2014).¹

Unlike most papers in the literature, we assume the econometrician observes a small sample of networks, possibly only a single network. Our results also hold if the network is sampled under a common sampling scheme. Because standard inference procedures treat the network, rather than the node (individual), as the unit of observation, a common strategy is to partition the observed network into subunits, which are then assumed to be independent, often without apparent justification. Our analysis obviates the need to arbitrarily cluster in this fashion. Perhaps as a consequence, the collection of extensive data on many social networks through large-scale surveys is becoming more common (e.g. Banerjee et al., 2013; Cai et al., 2015). Our results substantially reduce the burden of data collection, since, at least under our assumptions, it is unnecessary to sample a large number of networks for the purposes of internal validity.

Let the treatment response $Y_i$ of any node $i$ be given by

$$Y_i = r(i, D, G, \varepsilon_i),$$

where $D$ is the vector of treatment assignments for all nodes in the network $G$, and $\varepsilon_i$ is unobserved heterogeneity. We are interested in functionals of the response function $r(\cdot)$, for instance the average structural function (ASF) $E[r(i, d, g, \varepsilon_i)]$, where $d$ and $g$ are fixed realizations of the treatment assignment vector and the network. Average treatment and spillover effects can then be recovered from differences $E[r(i, d, g, \varepsilon_i)] - E[r(i, d', g', \varepsilon_i)]$. The quantile structural function (QSF) and quantile treatment and spillover effects can be similarly defined.

¹ Other works in applied economics that estimate treatment spillovers in experimental settings include Bandiera et al. (2009), Bursztyn et al. (2014), Cai et al. (2015), Duflo and Saez (2003), Dupas (2014), and Oster and Thornton (2012).

If the econometrician observes a large number of independent networks, then identification and estimation largely follow from the standard theory by viewing networks as “individuals” and a vector of treatment assignments as the treatment assigned to a network, assuming SUTVA now holds across networks. Instead, we consider an asymptotic sequence that sends the size of a single network, rather than the number of networks, to infinity, to approximate the finite-sample reality in which we observe a small sample of networks. Note that observing multiple networks is a special case, since one can define the union of all observed networks as a single network.

It is evident that without further restrictions on $r(\cdot)$, identification of treatment/spillover effects is impossible in a single-network context, since the dimensions of $D$ and $G$ grow with the number of nodes. We therefore consider two natural shape restrictions on $r(\cdot)$ that are often used in practice, namely local spillovers (individual responses may depend only on own treatment and the treatment assignments of network neighbors) and anonymity (individual identities do not enter the response function). We find that $r(\cdot)$ satisfies local spillovers and anonymity if and only if $(i, D, G)$ enter $r(\cdot)$ only through the following vector of sufficient statistics: own treatment assignment, number of treated neighbors, and number of untreated neighbors. This is useful because the dimension of this vector is fixed with respect to the network size. Then under nontrivial treatment assignment and support conditions on the degree distribution,² the average and quantile structural functions are identified, provided consistent estimators exist.

² The degree of a node is its number of connections. A network $G$’s degree distribution is the distribution of node degrees.

Estimation and inference require a central limit theorem, but the dependence structure of the set of outcomes is nonstandard and network-dependent. In particular, conditional on $G$, the responses of two nodes are dependent if the nodes are connected or they share a common network neighbor, due to treatment spillovers. There may also be dependence due to correlated effects. Notice that unlike time series or spatial settings, there is no underlying metric space with which to define $\alpha$-mixing or other more standard forms of dependence. Fortunately, a central limit theorem may still be proved using Stein’s method (Penrose, 2003; Ross, 2011). This implies a weak law of large numbers, and therefore the ASF and QSF can be nonparametrically estimated using sample conditional means. These convergence results hold conditional on $G$, provided certain restrictions on “moments” of $G$ are satisfied. We provide primitive conditions on a large class of network formation models that guarantee these restrictions hold.

An additional complication is that, conditional on the network, responses $Y_i$ are non-identically distributed, since they depend on the network through spillovers. It is well known that even in the case of independent and non-identically distributed random variables, it is impossible in general to consistently estimate the asymptotic variance of means, and only conservative upper bounds are estimable. Fortunately, we find that local spillovers and anonymity generate sufficient homogeneity in our setting that a consistent estimator for the asymptotic variance exists for nonparametric estimators of treatment and spillover effects. Therefore, asymptotically exact inference is possible.

Lastly, we derive analogous results for linear regression, including robust standard errors that account for dependence due to spillovers. We also discuss generalizations to method of moments. While we focus on network-driven sources of dependence, our approach accommodates more general forms of heterogeneous cross-sectional dependence, including spatial correlation and clustering with overlapping groups.

A growing literature in statistics studies estimation of treatment effects when SUTVA is violated (e.g. Basse and Airoldi, 2015; Hudgens and Halloran, 2012; Toulis and Kao, 2013). The majority of papers implicitly assume that the sample contains a large number of independent networks or groups, which, as previously noted, simply corresponds to assuming SUTVA across groups. While we study structural quantities such as the ASF, this literature focuses on estimands

$$\frac{1}{n} \sum_{i=1}^{n} \left[ \frac{1}{|\mathcal{D}_i|} \sum_{d \in \mathcal{D}_i} y_i(d) - \frac{1}{|\mathcal{D}_i'|} \sum_{d \in \mathcal{D}_i'} y_i(d) \right], \tag{1}$$

where $\mathcal{D}_i, \mathcal{D}_i'$ are sets of treatment assignment vectors $D$, and $y_i(\cdot)$ is the treatment response function.³ For instance, Toulis and Kao define “k-level effects” by setting $\mathcal{D}_i'$ to be a singleton set containing the zero vector (no treated nodes) and $\mathcal{D}_i$ to be the set of treatment assignments in which exactly $k$ neighbors of $i$ are treated. While such estimands appear to be natural analogs of the average treatment effect (ATE) under SUTVA, there are many possible choices of $\mathcal{D}_i$, and it is unclear why averaging responses over treatment assignments within such classes delivers policy-relevant estimands without further restrictions on the response function. Aronow and Samii (2015) impose homogeneity in the response function by assuming that $y_i(d) = y_i(d')$ for all $d, d' \in \mathcal{D}_i$, but this still leaves open the question of the proper choice of $\mathcal{D}_i$. In this respect, one of our contributions is to show that natural and widely used shape restrictions on the response function exist that justify simple choices of $\mathcal{D}_i$. In particular, we can let $\mathcal{D}_i = \mathcal{D}_i(d, t, u)$, the set of $(D, G)$ under which $i$ is assigned to treatment $d$, $i$ has $t$ treated network neighbors, and $i$ has $u$ untreated network neighbors. Furthermore, for such choices of $\mathcal{D}_i$, the limit of each term in the difference (1) has a structural interpretation as an ASF, and the QSF may be similarly defined.

³ The only source of randomness in these models comes from the treatment assignment mechanism.

Aronow and Samii (2015) and van der Laan (2014) are some of the few papers that discuss inference with a single large network. There are several differences between the work of Aronow and Samii and ours. First, rather than explicitly impose a network structure, they impose high-level conditions on $\{y_i(d)\}_{i=1}^{n}$ that ensure weak dependence but do not provide primitive sufficient conditions on the network structure or response function. Moreover, their setting only allows for a conservative estimate of the asymptotic variance, and they do not discuss semiparametric estimation. van der Laan studies a panel data setting in which response at a given period only depends on the treatment assignment of others in past periods. His limit results require degree to be uniformly bounded in $n$, which is a strong assumption we do not impose.

In the econometrics literature, Manski (2013) and Lazzati (2015) study partial identification of treatment response under group interference. Their identification arguments assume that the econometrician observes a large number of independent groups (networks). Manski characterizes the identified set under various shape restrictions on the response function. His constant treatment response assumption is analogous to the homogeneity condition imposed by Aronow and Samii discussed above. While Manski’s model can allow for endogenous peer effects in a reduced-form way, this is possible because he implicitly focuses on the many-networks case. For the case of a single large network, the relevant asymptotic theory does not yet exist.

Lastly, there is a large literature in econometrics on identifying and estimating peer effects, usually in the specific context of the linear-in-means model (e.g. Bramoullé et al., 2009; Brock and Durlauf, 2001; De Giorgi et al., 2010; Graham, 2008a; Manski, 1993). This paper studies more general nonparametric and semiparametric models. Furthermore, this literature predominantly focuses on the many-networks case.

In the next section, we outline the model and discuss identification. In §3, we present results on estimation and inference. We then illustrate these results in a Monte Carlo study in §4. Lastly, §5 concludes.

2 Model

There are $n$ nodes (individuals) positioned on an undirected network $G$, represented by a symmetric matrix with $ij$th entry $G_{ij} = 1$ if nodes $i$ and $j$ are linked and $G_{ij} = 0$ otherwise. As usual, we assume there are no self links, so $G_{ii} = 0$ for all $i$. A node $i$ is endowed with a binary treatment $D_i$ and unobserved heterogeneity $\varepsilon_i$.⁴ We assume $\{D_i\}_{i=1}^{n}$ is distributed i.i.d., and $\{(\varepsilon_i, \varepsilon_j, G_{ij})\}_{i \neq j}$ is identically distributed.⁵ Outcomes $Y_i$ are realized according to the model

$$Y_i = r(i, D, G, \varepsilon_i), \tag{2}$$

where $D = (D_1, \dots, D_n)$. Similarly define $Y$ and $\varepsilon$. The econometrician observes a single realization of $Y, D, G$.⁶

⁴ It is straightforward to extend the results to finitely supported treatment dosages.

⁵ This implies that the adjacency matrix $G$ is an exchangeable array, which is the usual case. It holds, for instance, in the class of network formation models considered in §A.1.

⁶ We discuss sampled network data below. See e.g. Remark 6.

We focus on the canonical case of randomized treatment assignment and an exogenous network.

Assumption 1 (Exogeneity). (a) $D \perp\!\!\!\perp (G, \varepsilon)$. (b) For any node $i$, $\varepsilon_i \perp\!\!\!\perp G$.

Part (a) implies that the treatment is independent of unobserved heterogeneity. It is straightforward to generalize the model to include covariates and relax this assumption to unconfoundedness. We focus on randomized assignment to simplify the presentation. The assumption also implies that treatments are not assigned on the basis of network structure. This is a commonly used assumption in the literature (e.g. Aronow and Samii, 2015; Athey et al., 2015) and a natural starting point for analysis.⁷ It is apparently satisfied by all empirical papers listed in §1. Part (b) of the assumption says the marginal distribution of unobservables is independent of the network. Exogeneity of the network, while strong in some contexts, is a standard condition in the literature. Inference on social interactions when the network is endogenous is a well-known open problem (Carrell et al., 2013; Shalizi and Thomas, 2011).

2.1 Shape Restrictions

Even with $D$ and $G$ exogenous, some restrictions on $r(\cdot)$ are necessary for identification with a single network to be possible, since the dimensions of $D$ and $G$ grow with $n$. We introduce two commonly used shape restrictions.

Assumption 2 (Local Spillovers). Fix any node $i$. For any $D, D', G, G'$ such that $D_i = D_i'$, $G_{ij} = G_{ij}'$, and $D_j G_{ij} = D_j' G_{ij}'$ for all nodes $j$, it is the case that $r(i, D, G, e) = r(i, D', G', e)$ for any $e \in \mathrm{supp}(\varepsilon_1)$.⁸

This states that treatment outcomes depend on $D, G$ through own treatment and treatments of network neighbors (those linked to the ego), which is a widely used assumption found in most papers listed in §1. This condition can be tested using the methods developed in Athey et al. (2015) and Song (2015).

Assumption 3 (Anonymity). Let $\pi$ be a bijection on $\{1, \dots, n\}$. Let $D, D'$ be vectors of treatment assignments and $G, G'$ be networks satisfying $D' = (D_{\pi(1)}, \dots, D_{\pi(n)})$ and

$$G' = \begin{pmatrix} G_{\pi(1)\pi(1)} & \cdots & G_{\pi(1)\pi(n)} \\ \vdots & & \vdots \\ G_{\pi(n)\pi(1)} & \cdots & G_{\pi(n)\pi(n)} \end{pmatrix}.$$

For any $e \in \mathrm{supp}(\varepsilon_1)$, $r(i, D, G, e) = r(\pi(i), D', G', e)$.

This states that, fixing a node $i$ and its unobserved heterogeneity, if you swap the treatment assignments and network positions of nodes $j$ and $k$, where either $j$ or $k$ may possibly equal $i$, then the response function for node $i$ remains unaltered. Hence, $r(\cdot)$ does not depend directly on labels assigned to nodes, which is a commonly imposed restriction. For instance, if nodes 1 and 2 are connected to 3, then 3’s response is the same whether $D_1 = 1 - D_2 = 1$ or $D_2 = 1 - D_1 = 1$, all else equal. For further discussion, see Remark 3 below.

The next theorem shows that these shape restrictions are equivalent to assuming that $r(\cdot)$ depends on $(i, D, G)$ only through the following vector of sufficient statistics: own treatment, number of treated network neighbors, and number of untreated neighbors. This will enable us to analyze identification with a single large network.

⁷ As we will see, the asymptotic theory developed in this paper conditions on the network $G$, which determines the correlation structure of relevant statistics used for inference. If $D$ and $G$ are dependent, then treatment assignments are not i.i.d. conditional on $G$. In general, asymptotically exact inference is not possible when observations are non-identically distributed. When observations are also dependent, we know of no procedure to construct “sharp,” consistently estimable upper bounds on the asymptotic variance.

⁸ For a density $f(\cdot)$ and random vector $X$, we will denote their respective supports by $\mathrm{supp}(f)$ and $\mathrm{supp}(X)$.

Theorem 1. The response function satisfies Assumptions 2 and 3 if and only if for any nodes $i, j$ and $D, D', G, G'$ such that $D_i = D_j'$, $\sum_k D_k G_{ik} = \sum_k D_k' G_{jk}'$, and $\sum_k (1 - D_k) G_{ik} = \sum_k (1 - D_k') G_{jk}'$, it is the case that $r(i, D, G, e) = r(j, D', G', e)$ for any $e \in \mathrm{supp}(\varepsilon_1)$.

We can therefore reparametrize our model as

$$Y_i = r(D_i, T_i, U_i, \varepsilon_i), \tag{3}$$

where $T_i = \sum_j D_j G_{ij}$ and $U_i = \sum_j (1 - D_j) G_{ij}$.
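The sufficient statistics in model (3) are simple neighbor counts. The following is a minimal sketch of how they could be computed from an observed assignment vector and adjacency matrix; the function name `sufficient_statistics` and the four-node example network are illustrative assumptions, not part of the paper.

```python
def sufficient_statistics(D, G):
    """Return (T, U): treated and untreated neighbor counts for each node.

    D: list of 0/1 treatment assignments, length n.
    G: symmetric n x n adjacency matrix (list of lists) with zero diagonal.
    """
    n = len(D)
    # T_i = sum_j D_j G_ij: number of treated neighbors of node i.
    T = [sum(D[j] * G[i][j] for j in range(n)) for i in range(n)]
    # U_i = sum_j (1 - D_j) G_ij: number of untreated neighbors of node i.
    U = [sum((1 - D[j]) * G[i][j] for j in range(n)) for i in range(n)]
    return T, U


# Hypothetical path network 0 - 1 - 2 - 3 with nodes 0 and 2 treated.
G = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
D = [1, 0, 1, 0]
T, U = sufficient_statistics(D, G)
# The degree of node i is recovered as T[i] + U[i].
```

Each node's degree equals the sum of its two counts, which is why the dimension of the sufficient statistic does not grow with the network.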

Remark 1. Theorem 1 can be extended to allow for higher-order spillovers by generalizing Assumption 2 in the natural way. If $i$’s treatment response depends on the treatment assignment of $\ell$-neighbors for all $\ell = 1, \dots, k$, that is, nodes $j$ for which the path distance between $i$ and $j$ is no more than $k$, then the model in equation (3) would also depend on the number of treated and untreated $\ell$-neighbors for all $\ell = 1, \dots, k$. We focus on local spillovers in this paper to simplify notation and because it is the most widely used assumption in practice.

Remark 2. This is a model with exogenous peer effects. Even in contexts without an explicit network structure, it is common for papers to focus only on estimating exogenous peer effects, captured by regressing $Y_i$ on $T_i$ or similar summary statistics (e.g. Duflo et al., 2011; Dupas, 2014; Graham, 2008b; Mas and Moretti, 2009; Miguel and Kremer, 2004). In contexts with an explicit network structure, many papers assume that $Y_i$ only depends on $D_i$ and $T_i$, and often linearly (e.g. Bandiera et al., 2009; Cai et al., 2015; Manresa, 2013; Oster and Thornton, 2012).

Remark 3. Assumption 3 can be relaxed. Suppose that nodes are endowed with one of $K$ possible “types.” Then one can weaken anonymity to requiring that the response function is invariant to swapping treatment assignments and network positions of nodes with identical types. In that case, the response function in (3) would depend on $\{(T_i^k, U_i^k);\ i = 1, \dots, n;\ k = 1, \dots, K\}$, where $T_i^k$ is the number of treated neighbors of type $k$ and $U_i^k$ is the number of untreated neighbors of type $k$.
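As an illustration of the type-specific counts in Remark 3, the sketch below tallies treated and untreated neighbors separately by neighbor type. The function name, the integer coding of types as 0, ..., K-1, and the example data are assumptions made for this illustration.

```python
def typed_sufficient_statistics(D, G, types, K):
    """Return (T, U) where T[i][k] and U[i][k] count the treated and
    untreated neighbors of node i that have type k (types coded 0..K-1)."""
    n = len(D)
    T = [[0] * K for _ in range(n)]
    U = [[0] * K for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if G[i][j] == 1:
                k = types[j]
                if D[j] == 1:
                    T[i][k] += 1
                else:
                    U[i][k] += 1
    return T, U


# Hypothetical path network 0 - 1 - 2 - 3 with two types.
G = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
T, U = typed_sufficient_statistics([1, 0, 1, 0], G, types=[0, 0, 1, 1], K=2)
```

With a single type (K = 1) the counts reduce to the neighbor totals that appear in model (3).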

2.2 Identification

In light of model (3), we take the tuple $W_i = (Y_i, D_i, T_i, U_i)$ to be the unit of observation. We assume that we observe a large sample of $W_i$’s, typically from the same network. For any integrable function $h(\cdot)$, define

$$\mu_h(d, t, u) = E[h(r(d, t, u, \varepsilon))],$$

where $(d, t, u) \in \mathrm{supp}(D_i, T_i, U_i)$ and $\gamma = t + u$. We can recover the ASF by setting $h(\cdot)$ to be the identity function and the QSF by choosing $h(x) = \mathbf{1}\{x \le q\}$, where $q$ satisfies $P(r(d, t, u, \varepsilon) \le q) = \alpha$. When referring to the ASF, we will omit the subscript, writing $\mu(d, t, u) = E[r(d, t, u, \varepsilon)]$. We define treatment/spillover effects as the difference

$$\mu_h(d, t, u) - \mu_h(d', t', u'), \tag{4}$$

where $d, d' \in \{0, 1\}$ and $t, t', u, u' \in \mathbb{N}$ satisfy $t + u = \gamma$ and $t' + u' = \gamma'$ for some $\gamma, \gamma' \in \Gamma$. The following support restrictions immediately yield identification of these quantities, provided analog estimators are consistent. These estimators are discussed in the next section.

Assumption 4 (Support).

(a) $P(D_1 = 1) \in (0, 1)$.

(b) There exists a function $P: \mathbb{N} \to [0, 1]$ and a nonempty set $\Gamma \subseteq \mathbb{N}$ such that for all $\gamma \in \Gamma$,

$$\frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{\gamma_i = \gamma\} \xrightarrow{p} P(\gamma)$$

for $\gamma_i = \sum_{j=1}^{n} G_{ij}$.

Condition (b) defines $\Gamma$, the support of the limiting degree distribution $P(\cdot)$ of $G$. In §A.1, we show that it is satisfied for a broad class of network-formation models. Under this assumption, treatment/spillover effects are identified for all $d, d' \in \{0, 1\}$ and integers $t, t', u, u'$ satisfying $t + u,\ t' + u' \in \Gamma$.
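The finite-sample object appearing in Assumption 4(b) is the empirical degree distribution of the observed network. A minimal sketch of its computation follows; the function name is a hypothetical choice, and the code says nothing about whether the limit in the assumption exists for a given network model.

```python
from collections import Counter


def empirical_degree_distribution(G):
    """Map each observed degree gamma to the share of nodes with that degree,
    i.e. (1/n) * sum_i 1{gamma_i = gamma}."""
    n = len(G)
    degrees = [sum(row) for row in G]  # gamma_i = sum_j G_ij
    counts = Counter(degrees)
    return {gamma: count / n for gamma, count in sorted(counts.items())}


# Hypothetical path network 0 - 1 - 2 - 3: two nodes of degree 1, two of degree 2.
G = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
shares = empirical_degree_distribution(G)
```

Degrees with a non-negligible share in large samples are the natural candidates for cells $(d, t, u)$ with $t + u$ in the support.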

3 Estimation

In this section, we construct estimators and standard errors for nonparametric and semiparametric estimators of the ASF and treatment/spillover effects. Dependence in our setting arises from two potential sources. First, treatment spillovers induce dependence between nodes who are linked together or to a common node. Second, there may be correlation between the unobservables of different nodes. In this paper, we will focus on the following “correlated effects” assumption in which the network G induces dependence between the unobservables of linked nodes.

Assumption 5 (Correlated Effects). For any node pair $(i, j)$, $(\varepsilon_i, \varepsilon_j) \perp\!\!\!\perp G \mid G_{ij}$, and $\varepsilon_i \perp\!\!\!\perp \varepsilon_j \mid G_{ij} = 0$.

This states that the joint distribution of the unobservables for a pair of nodes may depend on their link status. In particular, $\varepsilon_i$ and $\varepsilon_j$ may be correlated if $i$ and $j$ are linked, but they must be independent otherwise. For instance, if nodes are students, and two students are linked if they share a common class, then correlated effects may arise from classroom-level heterogeneity, such as teacher ability. In principle, other sources of correlation are possible, for example if nodes are partitioned into groups and we wish to allow for clustering. Remark 4 below discusses such generalizations.

A key object used in the asymptotic theory is the dependency graph, an artificial network on the set of observations that defines their correlation structure. As such, this object is a key input in the formulas. The dependency graph $A$ is a network on the set of observed nodes such that $A_{ii} = 1$, and for $j \neq i$, $A_{ij} = 1$ if and only if either $G_{ij} = 1$ or $\max_k G_{ik} G_{jk} = 1$. That is, two observations $W_i$ and $W_j$ are correlated if the two nodes $(i, j)$ are directly connected or indirectly connected with a common neighbor. Given our two sources of dependence in the model, for any two disjoint subsets $I_1, I_2 \subseteq \{1, \dots, n\}$, we have $\{W_i;\ i \in I_1\} \perp\!\!\!\perp \{W_i;\ i \in I_2\} \mid A$ if $A_{ij} = 0$ for all $i \in I_1$ and $j \in I_2$. Let $N_i = \{j : A_{ij} = 1\}$, and recall that $\gamma_i = \sum_{j=1}^{n} G_{ij}$. We will make use of the following restrictions on moments that involve both $G$ and $A$.
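The definition of the dependency graph translates directly into code. The sketch below builds it from an adjacency matrix and lists each node's dependency neighborhood; the function names and the example network are illustrative assumptions.

```python
def dependency_graph(G):
    """Build A with A_ii = 1 and, for j != i, A_ij = 1 if and only if
    nodes i and j are linked in G or share a common neighbor in G."""
    n = len(G)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            shares_neighbor = any(G[i][k] and G[j][k] for k in range(n))
            if i == j or G[i][j] == 1 or shares_neighbor:
                A[i][j] = 1
    return A


def neighborhoods(A):
    """Return N_i = {j : A_ij = 1} for each node i (N_i includes i itself)."""
    return [[j for j, a in enumerate(row) if a == 1] for row in A]


# Hypothetical path network 0 - 1 - 2 - 3. Nodes 0 and 3 are at path
# distance 3, so they are the only pair not linked in A.
G = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
A = dependency_graph(G)
N = neighborhoods(A)
```

Because links in the dependency graph connect nodes within path distance two in the original network, it is typically much denser than that network, which is why the moment restrictions that follow are stated in terms of it.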

Assumption 6 (Network Moments I). The following moments are bounded in probability: $\frac{1}{n} \sum_{i=1}^{n} |N_i|$, $\frac{1}{n} \sum_{i=1}^{n} |N_i|^2$, and $\frac{1}{n} \sum_{i=1}^{n} \sum_{j \neq i} (A^3)_{ij}$.⁹

⁹ $(A^3)_{ij}$ is the $ij$-th entry of the third matrix power of $A$. We write $|S|$ for the cardinality of a set $S$.

This is the main assumption needed for a CLT. Roughly, it ensures that the sample is not too correlated. To take an extreme counterexample, if the dependency graph were complete, this would mean that all observations are correlated in a manner that does not vanish with the sample size, so a CLT would be impossible to obtain. We must therefore ensure that any given observation is not linked with “too many” other observations in $A$, i.e. that the typical degree in $A$ is not large. This idea is formalized in Assumption 6. The requirement that $\frac{1}{n} \sum_{i=1}^{n} |N_i| = O_p(1)$ means that the average degree in $A$ does not diverge, so the typical observation is correlated with only a small number of other observations, on average. That is, the network $A$ is sparse. However, we also require restrictions on the tails of the degree distribution. For instance, if the variance of this distribution were infinite, then a large number of observations would have high degree, and the sample would be too dependent. This is the purpose of the restriction $\frac{1}{n} \sum_{i=1}^{n} |N_i|^2 = O_p(1)$; it ensures that the variance of the degree distribution does not diverge. The last average $\frac{1}{n} \sum_{i=1}^{n} \sum_{j \neq i} (A^3)_{ij}$ is a higher-order moment of the degree distribution, which is also required to remain asymptotically bounded. This plays a similar role to the restriction on the variance of the degree distribution: it ensures that the tails of this distribution are sufficiently thin, so that most nodes have small degree.

A simple example of a network that satisfies these conditions is one with uniformly bounded degree. This would be the case, for example, if $G$ were composed of a large number of disconnected subnetworks of uniformly bounded size. As discussed in §A.1, Assumption 6 also holds for a much broader class of network formation models.
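For a given dependency graph, the three sample moments in Assumption 6 can be computed directly. The following sketch does so with plain nested lists; the helper names are hypothetical. Boundedness in probability is an asymptotic property, so the output is only a finite-sample diagnostic, not a test of the assumption.

```python
def matmul(X, Y):
    """Multiply two square matrices stored as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def network_moments(A):
    """Return the three averages of Assumption 6 for dependency graph A:
    mean |N_i|, mean |N_i|^2, and (1/n) sum_i sum_{j != i} (A^3)_ij."""
    n = len(A)
    sizes = [sum(row) for row in A]  # |N_i|, which counts i itself
    A3 = matmul(matmul(A, A), A)
    m1 = sum(sizes) / n
    m2 = sum(s * s for s in sizes) / n
    m3 = sum(A3[i][j] for i in range(n) for j in range(n) if j != i) / n
    return m1, m2, m3


# Complete dependency graph on three nodes: every observation depends on
# every other, the extreme case discussed in the text.
moments = network_moments([[1, 1, 1], [1, 1, 1], [1, 1, 1]])
```

For a complete dependency graph these averages grow with the number of nodes, while for networks of uniformly bounded degree they stay bounded.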

Assumption 7 (Network Moments II). For any $\gamma, \gamma' \in \Gamma$, there exists $J(\gamma, \gamma') < \infty$ such that

$$\frac{1}{n} \sum_{i=1}^{n} \sum_{j \in N_i} \mathbf{1}\{\gamma_i = \gamma\} \mathbf{1}\{\gamma_j = \gamma'\} \xrightarrow{p} J(\gamma, \gamma').$$

The probability limit in this assumption corresponds to the limiting joint distribution of the degrees of node pairs $(i, j)$ that are linked in the dependency graph. This is needed for the existence of the limits of certain expectations, including the asymptotic variance of frequency estimators. We verify this assumption for a class of network-formation models in §9.

Remark 4. We can relax Assumption 5 to accommodate a general class of heterogeneous correlated effects, including those arising from group-level correlation with possibly overlapping groups or certain forms of spatial dependence. Suppose there exists an exogenous network $G^{*}$ for which Assumption 5 holds with $G$ replaced by $G^{*}$. Then the correct dependency graph would be $A^{*}$, where $A^{*}_{ij} = \max\{A_{ij}, G^{*}_{ij}\}$. Assumptions 6 and 7 must then hold for $A^{*}$. The network $G^{*}$ can capture group-level correlated effects if the network has a block-diagonal structure with blocks corresponding to groups. The groups may even be overlapping. We could also allow for direct generalizations of Assumption 5 where $\varepsilon_i$ and $\varepsilon_j$ are correlated if the path distance between $i$ and $j$ in $G$ is below $K$ for any finite $K$.
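To make Remark 4 concrete, the sketch below builds a correlated-effects network from (possibly overlapping) group memberships and combines it with a dependency graph by taking the entrywise maximum. The function names and the representation of groups as lists of node indices are assumptions made for this illustration.

```python
def group_network(groups, n):
    """Link every pair of distinct nodes that share at least one group.

    groups: list of lists of node indices; groups may overlap.
    """
    G_star = [[0] * n for _ in range(n)]
    for members in groups:
        for i in members:
            for j in members:
                if i != j:
                    G_star[i][j] = 1
    return G_star


def augmented_dependency_graph(A, G_star):
    """Entrywise maximum of the dependency graph A and the
    correlated-effects network G_star."""
    n = len(A)
    return [[max(A[i][j], G_star[i][j]) for j in range(n)] for i in range(n)]


# Three nodes with no network spillovers (A is the identity) but two
# overlapping groups {0, 1} and {1, 2}.
A = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
G_star = group_network([[0, 1], [1, 2]], n=3)
A_star = augmented_dependency_graph(A, G_star)
```

With non-overlapping groups and no spillovers, the augmented graph is block diagonal, which corresponds to the familiar clustered dependence structure.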

3.1 Nonparametric Estimation

We next establish consistency and asymptotic normality of frequency estimators

$$\hat{\mu}_h(d, t, u) = \frac{\sum_{i=1}^{n} h(Y_i)\, \mathbf{1}_i(d, t, u)}{\sum_{i=1}^{n} \mathbf{1}_i(d, t, u)}$$

and their differences, where $\mathbf{1}_i(d, t, u) = \mathbf{1}\{D_i = d, T_i = t, U_i = u\}$. The results below assume the entirety of $\{W_i\}_{i=1}^{n}$ is observed, but they also hold if only a subsample is observed. See Remark 6 below.
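The frequency estimator is a cell mean: it averages the transformed outcomes over nodes whose own treatment, number of treated neighbors, and number of untreated neighbors match the target cell. A minimal sketch follows, with a hypothetical function name and made-up data.

```python
def frequency_estimator(Y, D, T, U, d, t, u, h=lambda y: y):
    """Average h(Y_i) over nodes with (D_i, T_i, U_i) = (d, t, u).

    With h the identity this estimates the ASF at (d, t, u); with an
    indicator h(y) = 1{y <= q} it estimates a distribution function value.
    """
    cell = [i for i in range(len(Y)) if (D[i], T[i], U[i]) == (d, t, u)]
    if not cell:
        raise ValueError("no observations in cell (d, t, u)")
    return sum(h(Y[i]) for i in cell) / len(cell)


# Made-up data: three of the four nodes fall in the cell (1, 1, 0).
Y = [1.0, 2.0, 3.0, 6.0]
D = [1, 1, 0, 1]
T = [1, 1, 1, 1]
U = [0, 0, 0, 0]
asf_hat = frequency_estimator(Y, D, T, U, d=1, t=1, u=0)
effect_hat = asf_hat - frequency_estimator(Y, D, T, U, d=0, t=1, u=0)
```

The difference of two cell means, as in the last line, is the analog estimator of a treatment/spillover effect; empty cells raise an error because the estimator is undefined there.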

Theorem 2 (LLN). Maintain Assumptions 1-4 and 7, and suppose that for any $d \in \{0, 1\}$, $\gamma \in \Gamma$, and $t, u \in \mathbb{N}$ such that $t + u = \gamma$, we have $E[h(r(d, t, u, \varepsilon))^2] < \infty$. Then $\hat{\mu}_h(d, t, u) \xrightarrow{p} \mu_h(d, t, u)$.

This implies consistency of analog estimators of the average and quantile treatment/spillover effects. The next theorem establishes asymptotic normality.

Theorem 3 (CLT). Maintain Assumptions 1-7, and suppose that for any $d \in \{0, 1\}$, $\gamma \in \Gamma$, and $t, u \in \mathbb{N}$ such that $t + u = \gamma$, we have $E[h(r(d, t, u, \varepsilon))^4] < \infty$. Then there exists a function $\psi_h(\cdot)$ such that

$$\sqrt{n} \left( \hat{\mu}_h(d, t, u) - \mu_h(d, t, u) \right) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \psi_h(W_i) + o_p(1). \tag{5}$$

Let $v_h^n(d, t, u) = \mathrm{Var}\left( \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \psi_h(W_i) \,\Big|\, G \right)$. Then there exists $\sigma_h^2(d, t, u) < \infty$ such that

$$v_h^n(d, t, u) \xrightarrow{p} \sigma_h^2(d, t, u).$$

If $\sigma_h^2(d, t, u) > 0$, then (5) $\xrightarrow{d} N\left(0, \sigma_h^2(d, t, u)\right)$.¹⁰

The following proposition exhibits a consistent variance estimator.

¹⁰ The notion of convergence used is “in probability” conditional (on $G$) convergence in distribution. That is, the conditional (on $G$) Wasserstein distance between the laws of $\sqrt{n}\left(\hat{\mu}_h(d, t, u) - \mu_h(d, t, u)\right)$ and $N\left(0, \sigma_h^2(d, t, u)\right)$ converges in probability to zero. See Theorem 4 in the appendix.

Proposition 1 (Variance Estimator). Let $\hat{\rho}(d, t, u) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_i(d, t, u)$ and $\hat{\sigma}_h^2(d, t, u)$ equal

$$\hat{\rho}(d, t, u)^{-2}\, \frac{1}{n} \sum_{i=1}^{n} \sum_{j \in N_i} \left( h(Y_i) - \hat{\mu}_h(d, t, u) \right) \left( h(Y_j) - \hat{\mu}_h(d, t, u) \right) \mathbf{1}_i(d, t, u)\, \mathbf{1}_j(d, t, u).$$

Under the assumptions of Theorem 3, $\hat{\sigma}_h^2(d, t, u) \xrightarrow{p} \sigma_h^2(d, t, u)$.

The estimator $\hat{\sigma}_h^2(d, t, u)$ is not necessarily non-negative. However, a negative estimate appears to be a rare occurrence: in our simulations (§4), the variance estimate always turns out to be positive. Nonetheless, it is useful to have an estimator that is guaranteed to be non-negative. We propose the following simple fix.

Let $Z$ be an $n$-vector with $i$th entry equal to $\left( h(Y_i) - \hat{\mu}_h(d, t, u) \right) \mathbf{1}_i(d, t, u)\, \hat{\rho}(d, t, u)^{-1}$. Then $\hat{\sigma}_h^2(d, t, u) = \frac{1}{n} Z' A Z$, which may be negative because $A$ is not necessarily positive semidefinite. A simple solution is to replace $A$ with $\tilde{A} = A + \epsilon I$, where $I$ is the $n \times n$ identity matrix, and then optimize over $\epsilon$ to obtain the smallest estimated variance. When $\epsilon = \max_i |N_i|$, the matrix $\tilde{A}$ is diagonally dominant and symmetric and therefore positive semidefinite. Hence,

$$\hat{s}_h^2(d, t, u) \equiv \min_{\epsilon \ge 0} \frac{1}{n} Z' \tilde{A} Z \ge 0.^{11} \tag{6}$$
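The variance estimator of Proposition 1 is a quadratic form in cell-restricted residuals, weighted by the dependency graph. The sketch below computes it and optionally adds a multiple of the identity to the dependency graph. The function names and the `ridge` parameter are hypothetical, and the sketch uses only the conservative choice of adding the largest neighborhood size, which the text shows yields a positive semidefinite matrix; it does not carry out the optimization in (6).

```python
def variance_estimator(Y, D, T, U, A, d, t, u, h=lambda y: y, ridge=0.0):
    """Compute (1/n) Z'(A + ridge * I)Z, where
    Z_i = (h(Y_i) - mu_hat) * 1_i(d, t, u) / rho_hat.

    ridge = 0 gives the estimator of Proposition 1, which can be negative.
    """
    n = len(Y)
    ind = [1 if (D[i], T[i], U[i]) == (d, t, u) else 0 for i in range(n)]
    if sum(ind) == 0:
        raise ValueError("no observations in cell (d, t, u)")
    rho = sum(ind) / n
    mu = sum(h(Y[i]) * ind[i] for i in range(n)) / sum(ind)
    Z = [(h(Y[i]) - mu) * ind[i] / rho for i in range(n)]
    quad = sum(Z[i] * A[i][j] * Z[j] for i in range(n) for j in range(n))
    quad += ridge * sum(z * z for z in Z)
    return quad / n


def conservative_ridge(A):
    """max_i |N_i|: adding this multiple of the identity makes the matrix
    symmetric and diagonally dominant, hence positive semidefinite."""
    return max(sum(row) for row in A)
```

A natural usage pattern is to compute the estimate with `ridge=0.0` first and fall back to `ridge=conservative_ridge(A)` only if the result is negative, keeping in mind that the fallback can be quite conservative.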

Remark 5. As can be seen from the proof, the result in Proposition 1 follows from showing convergence conditional on the network $G$. At first glance this may seem surprising, since $Y_i$ depends on the network due to spillovers, so conditional on $G$, outcomes are non-identically distributed in addition to being dependent. It is well known that asymptotically exact inference on means is not achievable even with independent and non-identically distributed data. The key difference here is that we can represent the model as (3) under local spillovers and anonymity. Under this model, $Y_i$ depends on $G$ only through $\gamma_i$, so observations are identically distributed conditional on degree. Since $\hat{\mu}_h(d, t, u)$ is computed only using a subsample of observations with the same degree, it follows that asymptotically exact inference is possible.

Remark 6 (Sampled Network Data). In many cases, the econometrician may not fully observe the entire network. It is quite clear that the results in this section still hold if only a random fraction of the data $\{W_i\}_{i=1}^{n}$ is available, since $\hat{\mu}_h(d, t, u)$ is merely a ratio of averages of functions of $W_i$. Such a sample can be collected using the following snowball sampling procedure. Sample $n$ nodes at random from the population, collecting treatment responses, treatment assignments, and identities of network neighbors. Then collect treatment assignments of each neighbor.

¹¹ An even simpler alternative is to add $|N_i| - 1$ to the $ii$-th entry of $A$, for each $i$, and to use the resulting matrix in place of $\tilde{A}$. However, this estimator can be quite conservative.

We next provide results for inference on treatment/spillover effects (4). By Theorem 2, $\hat{\mu}_h(d, t, u) - \hat{\mu}_h(d', t', u')$ is a consistent estimator. Unlike the standard SUTVA setting, the two quantities in this difference are dependent, so additional work is required for a CLT. This is done in Theorem 5, §A.3. The theorem also establishes the existence of the limiting variance $\sigma_{TSE}^2$. The following proposition gives a consistent estimate of this variance.

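Proposition 2. Define

$$\hat{\sigma}_{TSE}^2 = \frac{1}{n} \sum_{i=1}^{n} \sum_{j \in N_i} \left( h(Y_i) \hat{\lambda}_i - \hat{\kappa}_i \right) \left( h(Y_j) \hat{\lambda}_j - \hat{\kappa}_j \right),$$

$$\hat{\lambda}_i = \frac{\mathbf{1}_i(d, t, u)}{\hat{\rho}(d, t, u)} - \frac{\mathbf{1}_i(d', t', u')}{\hat{\rho}(d', t', u')},$$

$$\hat{\kappa}_i = \frac{\hat{\mu}_h(d, t, u)\, \mathbf{1}_i(d, t, u)}{\hat{\rho}(d, t, u)} - \frac{\hat{\mu}_h(d', t', u')\, \mathbf{1}_i(d', t', u')}{\hat{\rho}(d', t', u')}.$$

Under the assumptions of Theorem 5, $\hat{\sigma}_{TSE}^2 \xrightarrow{p} \sigma_{TSE}^2$.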

This estimator is not necessarily non-negative in finite samples, but this can be repaired using an analog of (6).
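The estimator in Proposition 2 can be sketched in the same style as the single-cell variance estimator: form one influence-type term per node and sum products over pairs linked in the dependency graph. The function name and the tuple encoding of cells are assumptions for this illustration, and the non-negativity repair is not included.

```python
def tse_variance(Y, D, T, U, A, cell, cell_prime, h=lambda y: y):
    """Variance estimate for sqrt(n) times the difference of the cell means
    at cell = (d, t, u) and cell_prime = (d', t', u')."""
    n = len(Y)

    def indicator(c):
        return [1 if (D[i], T[i], U[i]) == c else 0 for i in range(n)]

    ind, ind_p = indicator(cell), indicator(cell_prime)
    if sum(ind) == 0 or sum(ind_p) == 0:
        raise ValueError("both cells must contain observations")
    rho, rho_p = sum(ind) / n, sum(ind_p) / n
    mu = sum(h(Y[i]) * ind[i] for i in range(n)) / sum(ind)
    mu_p = sum(h(Y[i]) * ind_p[i] for i in range(n)) / sum(ind_p)
    # lambda_i and kappa_i as defined in Proposition 2.
    lam = [ind[i] / rho - ind_p[i] / rho_p for i in range(n)]
    kap = [mu * ind[i] / rho - mu_p * ind_p[i] / rho_p for i in range(n)]
    psi = [h(Y[i]) * lam[i] - kap[i] for i in range(n)]
    return sum(psi[i] * A[i][j] * psi[j]
               for i in range(n) for j in range(n)) / n
```

When the dependency graph is the identity matrix, the double sum collapses to a sum of squares and the estimate equals the sum of the two within-cell variances, each divided by its cell share, which is the usual formula for a difference in means of independent samples.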

3.2 Generalized Method of Moments Frequency estimators require many observations per “cell,” which motivates the use of semiparametric estimators. We focus here on generalized method of moments (GMM). Let mp¨q be a p-dimensional vector of moments, so that the model is defined by the moment condition ErmpWi, θ0q | Xis “ 0, (7) n where tWiui“1 is a set of n observations with Xi a subvector of Wi, and the depen- dence structure for observations Wi is captured by some dependency graph A. This setup is fairly general and accommodates a wide range of settings with dependent data, including spatial dependence, clustering with overlapping groups, and network spillovers. In all cases, the variance estimator will take the same basic form under a given dependency graph determined by the specific correlation structure. 1 n 1 n Let Mnpθq “ n i“1 mpWi, θq, Snpθq “ n i“1 ∇θmpWi, θq, Ψ be a fixed p ˆ p weighting matrix, and I the p ˆ p identity matrix. Using Theorem4, it is possible to derive regularity conditionsř such that ř

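$$\sqrt{n}(\hat{\theta} - \theta_0) \xrightarrow{d} N(0, \Lambda),$$

where

$$\Lambda = \lim_{n \to \infty} V_n^{-1}\, E[S_n(\theta_0) \mid G]'\, \Psi\, \Omega_n(\theta_0)\, \Psi'\, E[S_n(\theta_0) \mid G]\, V_n^{-1},$$

$$V_n = E[S_n(\theta_0) \mid G]'\, \Psi\, E[S_n(\theta_0) \mid G],$$

$$\Omega_n(\theta_0) = E[n M_n(\theta_0) M_n(\theta_0)' \mid G].$$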

The main distinction relative to the standard i.i.d. case is the asymptotic variance $\Omega_n(\theta_0)$. Because $W_i \perp\!\!\!\perp W_j \mid G$ if $A_{ij} = 0$ by construction of the dependency graph, the $rs$-th entry of this matrix equals $\frac{1}{n}\sum_i \sum_{j \in N_i} E[m_r(W_i, \theta_0)\, m_s(W_j, \theta_0) \mid G]$, where $m_r(\cdot)$ is the $r$th component of $m(\cdot)$. This can be consistently estimated using the sample analog $\frac{1}{n}\sum_i \sum_{j \in N_i} m_r(W_i, \hat{\theta})\, m_s(W_j, \hat{\theta})$ using arguments similar to the proof of Proposition 1. Thus, under regularity conditions, a consistent estimator for $\Lambda$ is

$$\hat{\Lambda} = \big(S_n(\hat{\theta})' \Psi S_n(\hat{\theta})\big)^{-1} S_n(\hat{\theta})'\, \Psi\, \hat{\Omega}(\hat{\theta})\, \Psi'\, S_n(\hat{\theta}) \big(S_n(\hat{\theta})' \Psi S_n(\hat{\theta})\big)^{-1},$$

where $\hat{\Omega}(\hat{\theta}) = \frac{1}{n} M' A M$, with

$$M = \begin{pmatrix} m_1(W_1, \hat{\theta}) & \cdots & m_p(W_1, \hat{\theta}) \\ \vdots & & \vdots \\ m_1(W_n, \hat{\theta}) & \cdots & m_p(W_n, \hat{\theta}) \end{pmatrix}.$$

Linear Regression. It is a matter of accounting to provide primitive conditions for the existence of the limits and inverses in the expression for $\Lambda$. We next formally derive these conditions for a linear version of model (3) with the correlated effects structure of Assumption 5. Similar derivations are possible for any other GMM estimator. Suppose the response function $r(\cdot)$ in model (3) is linear in each argument, so that (7) holds for

$$m(W_i, \theta) = X_i (Y_i - X_i'\theta),$$

where $X_i = (1, D_i, T_i, \gamma_i)$.$^{12}$ This model relaxes Assumption 1 to mean independence of $\varepsilon_i$. We will need some new restrictions on moments of $G$ in place of Assumption 7.

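Assumption 8 (Network Moments III).

(a) $\frac{1}{n}\sum_{i=1}^n \gamma_i^p$ has a finite probability limit for $p \in \{1, 2\}$.

(b) $\frac{1}{n}\sum_{i=1}^n \sum_{j \in N_i} \gamma_j^p$ has a finite probability limit for $p \in \{1, 2\}$.

(c) $\frac{1}{n}\sum_{i=1}^n \sum_{j \in N_i} \sum_{k \in N_i \cup N_j} \sum_{l \in N_k} (\gamma_i + 1)(\gamma_j + 1)(\gamma_k + 1)(\gamma_l + 1) = O_p(1)$.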

12 Note that $T_i, U_i, \gamma_i$ are collinear, since $\gamma_i = T_i + U_i$. Because the derivations are slightly simpler, we will include $\gamma_i$ as a regressor, rather than $U_i$.

Parts (a) and (b) are needed to establish the existence of the limiting asymptotic variance. Part (c) plays the role of fourth moment assumptions used to show consistency of variance estimators. The next proposition establishes a central limit theorem. Let $X$ be the $n \times 4$ matrix of regressors $X_i$.

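Proposition 3 (CLT). Under Assumption 8(a), there exists a matrix $V$ such that $\frac{1}{n} E[X'X \mid G] \xrightarrow{p} V$. Let $\rho_{ij}(X, A) = E[\varepsilon_i \varepsilon_j \mid X_i, X_j, A_{ij} = 1]$. Suppose there exists a matrix $S$ such that

$$\frac{1}{n} \sum_{i=1}^n \sum_{j \in N_i} E\big[\rho_{ij}(X, A)\, X_i X_j' \,\big|\, G\big] \xrightarrow{p} S. \tag{8}$$

If $V$ and $S$ are positive-definite, then under Assumption 6,

$$\sqrt{n}(\hat{\theta} - \theta_0) \xrightarrow{d} N(0, V^{-1} S V^{-1}).$$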

Primitive conditions for (8) can be established in the homoskedastic case.

Proposition 4 (Homoskedastic Case). Suppose there exist constants $s, c$ such that $\rho_{ii}(X, A) = s$ for all $i$, and $\rho_{ij}(X, A) = c$ for all $i \neq j$. Then (8) holds under parts (a) and (b) of Assumption 8.

Proposition 5 (Variance Estimator). Under parts (a) and (c) of Assumption 8, $\frac{1}{n} X'X \xrightarrow{p} V$. Let $\hat{\varepsilon}_i = Y_i - X_i'\hat{\theta}$ and $M$ be the $n \times 4$ matrix whose $i$th row equals $X_i'\hat{\varepsilon}_i$. If $E[\varepsilon_i^4 \mid X_i]$ is uniformly bounded and (8) and Assumption 8(c) hold, then

$$\hat{S} \equiv \frac{1}{n} M'AM \xrightarrow{p} S.$$

Thus, robust standard errors for $\hat{\theta}$ are given by

$$\hat{\Sigma} \equiv (X'X)^{-1} M'AM (X'X)^{-1}. \tag{9}$$

This estimator is not necessarily positive semidefinite. However, we can employ a fix similar to (6). Let $\tilde{A} = A + \epsilon I$, where $I$ is the $n \times n$ identity matrix. Construct $\hat{\Sigma}_\epsilon$ by replacing $A$ in (9) with $\tilde{A}$. We propose using the estimator

$$\hat{\Sigma}^* = \min\big\{\hat{\Sigma}_\epsilon \succeq 0 : \epsilon \geq 0\big\},$$

where $M \succeq 0$ means the matrix $M$ is positive semidefinite. The minimum is well defined because $\hat{\Sigma}_\epsilon$ is positive semidefinite for $\epsilon = \max_i |N_i|$.

Another, simpler approach is to construct $\tilde{A}$ by adding $|N_i| - 1$ to the $ii$th entry of $A$ for each $i$ and replacing $A$ with $\tilde{A}$ in (9). However, this estimator may be quite conservative.
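The sandwich form (9) and the two positive-semidefiniteness repairs just described can be sketched in Python with NumPy as follows. This is an illustrative sketch rather than the paper's implementation: the function names are hypothetical, $|N_i|$ is taken to be the $i$th row sum of $A$ (so it counts $i$ itself), and the smallest admissible $\epsilon$ is located by bisection, which is one convenient search method among several. Bisection is valid here because increasing $\epsilon$ adds a positive semidefinite term to the estimator.

```python
import numpy as np


def sandwich(X, resid, A):
    """(X'X)^{-1} M'AM (X'X)^{-1}, where row i of M is X_i' * resid_i."""
    M = X * resid[:, None]
    bread = np.linalg.inv(X.T @ X)
    return bread @ (M.T @ A @ M) @ bread


def is_psd(S, tol=1e-10):
    """Numerical positive-semidefiniteness check on the symmetrized matrix."""
    eig = np.linalg.eigvalsh((S + S.T) / 2)
    return eig.min() >= -tol * max(1.0, np.abs(eig).max())


def min_eps_cov(X, resid, A, iters=60):
    """Sandwich with A + eps*I for (approximately) the smallest eps >= 0 that is PSD."""
    n = A.shape[0]
    base = sandwich(X, resid, A)
    if is_psd(base):
        return base, 0.0
    # eps = max_i |N_i| makes A + eps*I diagonally dominant, hence PSD.
    lo, hi = 0.0, float(A.sum(axis=1).max())
    for _ in range(iters):
        mid = (lo + hi) / 2
        if is_psd(sandwich(X, resid, A + mid * np.eye(n))):
            hi = mid
        else:
            lo = mid
    return sandwich(X, resid, A + hi * np.eye(n)), hi


def conservative_cov(X, resid, A):
    """Simpler repair: add |N_i| - 1 to the ii-th entry of A."""
    deg = A.sum(axis=1)  # |N_i|, counting i itself because A_ii = 1
    return sandwich(X, resid, A + np.diag(deg - 1))


# Small synthetic example (all data below are made up for illustration).
rng = np.random.default_rng(0)
n = 40
G = np.triu((rng.random((n, n)) < 0.08).astype(float), 1)
G = G + G.T                          # undirected network
A = ((G + G @ G) > 0).astype(float)  # neighbors and neighbors-of-neighbors
np.fill_diagonal(A, 1.0)
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
resid = rng.normal(size=n)

cov_star, eps_star = min_eps_cov(X, resid, A)
cov_cons = conservative_cov(X, resid, A)
```

When $A$ is the identity matrix, `sandwich` reduces to the usual heteroskedasticity-robust (White) covariance estimator, which gives a quick sanity check on the implementation.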

4 Monte Carlo

For our first simulation exercise, we estimate the ASF $\mu(d,t,u)$ and ATE $\mu(d,t,u) - \mu(d',t',u')$, where $1(\cdot)$ is the identity function, $(d,t,u) = (1,2,2)$, and $(d',t',u') = (0,2,2)$. Outcomes are realized according to the model

$$Y_i = \theta_{i1} + \theta_{i2} D_i + \theta_{i3} T_i + \theta_{i4} T_i^2 + \theta_{i5} T_i \gamma_i,$$

where the parameters are independent and distributed as follows: $\theta_{i1} \overset{iid}{\sim} N(1,1)$, $\theta_{i2}, \theta_{i4}, \theta_{i5} \overset{iid}{\sim} U(0,2)$ and $\theta_{i3} \overset{iid}{\sim} U(-1,3)$.

The network $G$ is a random geometric graph. That is, nodes are endowed with positions $\{\rho_i\}_{i=1}^n$ drawn i.i.d. uniformly from $[0,1]^2$, and $G_{ij} = 1\{\|\rho_i - \rho_j\| \leq r_n\}$ where $r_n = \left(\frac{2.75\,n}{4}\right)^{-1/2}$. Note that $n r_n^2 = \frac{4}{2.75}$, so Assumption 9(b) is satisfied. The meaning of the fraction $\frac{4}{2.75}$ is (1) it exceeds the percolation threshold so that a giant component exists with high probability,$^{13}$ and (2) the limiting expected degree is $\pi$ times this quantity, which is approximately 4.6 and thus close to $t + u = 4$.

The results for the ASF are displayed in Table 1. In each table, the rows “ASF,” “SE,” “Coverage,” “% Giant,” and “Cell Count” are averaged over 1000 simulations. “Cell Count” equals $\sum_{i=1}^n 1_i(1,2,2)$, the number of “effective” observations for the standard frequency estimator. “% Giant” is the proportion of nodes in the giant component. In no simulations did the variance estimators turn out negative.

The standard frequency estimator is unbiased, so unsurprisingly, the entries in the “ASF” row of Table 1 are close to the desired value of $\mu(1,2,2) = 16$ across all sample sizes. When the effective sample size (cell count) is small, the standard frequency estimator severely undercovers at 90 percent, but this undercoverage disappears in larger sample sizes, as expected. Similar results for the ATE are displayed in Table 2.

For our second exercise, we consider the linear model

$$Y_i = X_i'\theta_0 + \varepsilon_i,$$

where $X_i = (1, D_i, T_i, \gamma_i)$, $\theta_0 = (1, 2, 1.5, 0.5)$, and $\varepsilon_i \overset{iid}{\sim} N(0,1)$. Table 3 shows simulation results for the linear regression estimator with robust standard errors (9) for 1000 simulations and $n = 1000$. In no simulations did the variance estimators turn out negative.

13 In the sparse regime of random geometric graphs, $n r_n^d \to \lambda < \infty$, and percolation requires $\lambda > 1.44$ when positions lie in $\mathbb{R}^2$ (Penrose, 2003, p. 189). In our case, $\lambda = \frac{4}{2.75} \approx 1.45$, which exceeds the necessary threshold.
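The ingredients of the first simulation design can be checked with a short Python sketch using only the standard library. It computes the target $\mu(1,2,2)$ from the means of the stated parameter distributions, evaluates $\lambda = 4/2.75$ and $\pi\lambda$ directly, and draws one random geometric graph to inspect its average degree. The helper names, the seed, and the choice $n = 500$ are illustrative assumptions, and the code is not the paper's simulation code.

```python
import math
import random

# Means of theta_i1, ..., theta_i5 under N(1,1), U(0,2), U(-1,3), U(0,2), U(0,2).
MEAN_THETA = (1.0, 1.0, 1.0, 1.0, 1.0)


def asf(d, t, u):
    """mu(d, t, u): expected outcome at D_i = d, T_i = t, U_i = u, gamma_i = t + u."""
    b1, b2, b3, b4, b5 = MEAN_THETA
    return b1 + b2 * d + b3 * t + b4 * t ** 2 + b5 * t * (t + u)


def rgg_degrees(n, seed=0):
    """Degrees of a random geometric graph on [0,1]^2 with r_n = (2.75 n / 4)^(-1/2)."""
    rng = random.Random(seed)
    r_n = (2.75 * n / 4) ** -0.5
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    deg = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if math.hypot(pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]) <= r_n:
                deg[i] += 1
                deg[j] += 1
    return deg


lam = 4 / 2.75                 # n * r_n^2
limit_degree = math.pi * lam   # limiting expected degree, ignoring boundary effects
target_asf = asf(1, 2, 2)
target_ate = asf(1, 2, 2) - asf(0, 2, 2)
degrees = rgg_degrees(500)
mean_degree = sum(degrees) / len(degrees)
```

In a finite sample the average degree sits somewhat below $\pi\lambda$, because nodes near the boundary of the unit square have part of their neighborhood outside the support.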


Table 1: ASF

n             500      1k       5k       10k
Cell Count    17.35    34.88    175.94   352.49
ASF           16.00    16.01    16.00    16.00
SE            1.33     0.96     0.43     0.31
Coverage      0.91     0.93     0.95     0.96
% Giant       0.57     0.54     0.52     0.53

Table 2: ATE

n               500      1k       5k       10k
Cell Count L    17.35    34.88    175.94   352.49
Cell Count R    17.66    34.69    175.75   351.75
ATE             0.95     1.01     0.99     1.01
SE              1.90     1.37     0.61     0.43
Coverage        0.92     0.94     0.94     0.94

“Cell Count L” is the cell count for $\hat{\mu}(1,2,2)$, while “Cell Count R” is the count for $\hat{\mu}(0,2,2)$.

Table 3: Linear Regression, $n = 1000$.

$\hat{\theta}$    1.00    2.00    1.50    0.50
SE                0.08    0.06    0.03    0.02
Coverage          0.95    0.95    0.94    0.94

5 Conclusion

This paper studies identification and estimation of treatment and spillover effects when SUTVA is violated. We depart from the literature by only requiring the observation of a single large network in order to approximate the finite-sample reality in which only a small number of independent networks are typically observed. Focusing on the case of randomized assignment, we show that the average and quantile structural functions are identified under common shape restrictions on the treatment response function. Despite the fact that observations are dependent and non-identically distributed conditional on the network, we provide conditions under which the nonparametric frequency estimators and GMM estimators are consistent and asymptotically normal, and we construct new consistent variance estimators.

While we focus on network-driven sources of dependence, these results are more broadly applicable to other forms of heterogeneous cross-sectional dependence, including spatial correlation and clustering with overlapping groups. We also anticipate that it is straightforward to extend these results to unconfounded treatment assignment and quantile effects using existing approaches in the treatment effects literature.

A Appendix

A.1 Network Formation

This section discusses models of network formation that satisfy Assumptions 4(b), 6, and 7.

The setup follows Leung (2016) (see his Assumption 2). For $n \in \mathbb{N}$, let $\mathcal{N}_n \subseteq \mathbb{R}^d$ be a set of $n$ i.i.d. random vectors drawn from a common density $f(\cdot)$. We associate each node with a unique element of $\mathcal{N}_n$, which we term its position. Because positions are almost surely unique, it will be convenient to label nodes according to their positions, so that we can view $\mathcal{N}_n$ as the set of nodes. Hence, in this section $i \in \mathcal{N}_n$ represents both a node as well as its position in $\mathbb{R}^d$.

In some models, a node's position carries no economic meaning and simply functions as a label. In others, position can represent a vector of homophilous attributes. For instance, in a random geometric graph, nodes are linked if and only if the distance between their positions falls below some threshold $r_n$. If position represents geographic location, then this network exhibits geographic homophily. In contrast, if links are simply drawn i.i.d. from some Bernoulli distribution, then positions are merely labels.

Each $i \in \mathcal{N}_n$ is endowed with attributes $\nu_i$, and each $i, j \in \mathcal{N}_n$ is endowed with pair-level attributes $\eta_{ij}$. Let $\zeta_{ij} \equiv \zeta(i, j, \mathcal{N}_n)$ be the mapping $(i, j) \mapsto (\nu_i, \nu_j, \eta_{ij})$. For any $\mathcal{X} \subseteq \mathbb{R}^d$, let $\{\nu_i; i \in \mathcal{X}\}$ be i.i.d. random vectors, and let $\{\eta_{ij}; i, j \in \mathcal{X}\}$ be independently distributed random vectors that satisfy stationarity: for any $i, j, k, l \in \mathcal{X}$ satisfying $r_n^{-1}\|i - j\| = r_n^{-1}\|k - l\|$, we have $\eta_{ij} = \eta_{kl}$, where $\|\cdot\|$ is some norm on $\mathbb{R}^d$. Stationarity is innocuous if positions are labels. If positions represent locations, then this is analogous to the usual stationarity condition imposed in time series and spatial econometrics.

The network $G \equiv G(\mathcal{N}_n, \zeta)$ is formed according to the following random graph model:

$$G_{ij} = 1\big\{u\big(r_n^{-1}\|i - j\|, \zeta_{ij}\big) \geq 0\big\} \tag{10}$$

for any $i, j \in \mathcal{N}_n$, where $r_n = \left(\frac{\kappa}{n}\right)^{1/d}$ with $\kappa < \infty$. Because $G$ is undirected, we must have $\zeta_{ij} = \zeta_{ji}$ with probability one.

We assume the latent index $u(\cdot)$ also satisfies the following restrictions.

Assumption 9.

(a) For any fixed $i, j \in \mathbb{R}^d$, $P\big(u(\|i - j\|, \zeta_{ij}) \leq 0 \,\big|\, \nu_i\big) > 0$.

(b) (Sparsity) $\sup_{n \in \mathbb{N}} \sup_{i, \nu_i} n\, E\big[G_{ij} \,\big|\, i, \nu_i\big] < \infty$.

Leung (2016) discusses a number of canonical random graph models that are special cases of this model. These include the Erdős-Rényi model, stochastic block model, random geometric graph model, and random connection model. In all of these examples, Assumption 9(a) holds under very mild conditions. For instance, if $u(\cdot)$ is additively separable in $\eta_{ij}$, which has full support, and the remaining part of $u(\cdot)$ is either bounded above or below, then the assumption is satisfied. Assumption 9(b) is substantive. It ensures that the graph is sparse in the sense that the expected degree tends to a finite constant. This is a desirable property, as it is well-known that most social networks are sparse (Barabási, 2015; Chandrasekhar, forthcoming).

Proposition 6. Let $A$ be the dependency graph such that $A_{ii} = 1$, and for $j \neq i$, $A_{ij} = 1$ if and only if either $G_{ij} = 1$ or $\max_k G_{ik} G_{jk} = 1$. Then Assumptions 4(b), 6, and 7 are satisfied under Assumption 9.
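The dependency graph in Proposition 6 links each node to itself, its neighbors, and its neighbors' neighbors. A minimal pure-Python sketch of this construction, with a hypothetical function name and a four-node path graph as a made-up example, is:

```python
def dependency_graph(G):
    """Build A from an undirected 0/1 adjacency matrix G.

    A[i][i] = 1, and for j != i, A[i][j] = 1 iff G[i][j] = 1
    or i and j share a common neighbor (max_k G[i][k] * G[j][k] = 1).
    """
    n = len(G)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 1
        for j in range(n):
            if j == i:
                continue
            linked = G[i][j] == 1
            common = any(G[i][k] and G[j][k] for k in range(n))
            A[i][j] = 1 if (linked or common) else 0
    return A


# Path graph 0 - 1 - 2 - 3.
G = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
A = dependency_graph(G)
```

In the path example, nodes 0 and 3 are three steps apart and share no neighbor, so they are the only pair left unlinked in $A$.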

Remark 7 (Strategic Network Formation). Our results also generalize to strategic models of network formation for $G$ for which there exists a random graph $\Pi$ such that $G \subseteq \Pi$ with probability one, and $\Pi$ is generated by model (10). Such models are discussed in Leung (2016). No restrictions on strategic interactions are necessary. Under this model, the moments in Assumptions 4(b) and 7 may not necessarily have probability limits, but they are still bounded in probability. Then although the various probability limits derived in the appendix may not exist, consistency still holds. Provided that the variance is asymptotically nondegenerate, i.e. $v_h^n(d,t,u)^{-1} = O_p(1)$, a CLT also holds.

Proof of Proposition 6. Moment 1: Consider $\frac{1}{n}\sum_{i \in \mathcal{N}_n} 1\{\gamma_i = \gamma\}$. Since each $\gamma_i$ is determined by $\mathcal{N}_n$, $\zeta$, and $r_n$, there exists a functional $\xi$ such that $\xi(i, \mathcal{N}_n, \zeta, r_n) = 1\{\gamma_i = \gamma\}$. We verify the conditions of Theorem 3 in Leung (2016). It is evident that the summands are uniformly bounded. Also, $\zeta$ satisfies the same restrictions as $\phi$ in §A.3 of Leung (2016).

It remains to verify strong stability. Let $\mathcal{X} \subseteq \mathbb{R}^d$ be a random, locally finite set of nodes. For $i \in \mathcal{X}$, let $N_k(i, \mathcal{X}, \zeta)$ be the $k$-neighborhood of $i$, the set of nodes $j \in \mathcal{X}$ such that the length of the shortest path from $i$ to $j$ in the network $G(\mathcal{X}, \zeta)$ is at most $k$. We can write $X_i^n = \xi(i, \mathcal{X}, \zeta, r_n)$, where $\xi$ depends on $G(\mathcal{X}, \zeta)$ only through $N_1(i)$. We will show that $\xi$ is $\zeta$-strongly stabilizing on $\mathcal{P}_\tau$ for any $\tau = \kappa f(i)$ with $i \in \mathrm{supp}(f)$ (see Definition 4 in Leung, 2016).

Let $R \equiv R(\mathcal{X})$ be the smallest radius (possibly infinite) large enough such that $N_1(i, \mathcal{X}, \zeta) \subseteq B(i, R)$, the ball of radius $R$ centered at $i$. Let $d_\nu$ and $d_\eta$ be the respective dimensions of


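$\nu_i$ and $\eta_{ij}$. Using Assumption 9(a), for almost all $(i, j, \nu_i, \nu_j)$, there exist random functions $v_e, v_a : \mathbb{R}^d \times \mathbb{R}^{d_\nu} \to \mathbb{R}^{d_\nu}$ and $\eta_e, \eta_a : \mathbb{R}^d \times \mathbb{R}^{d_\nu} \to \mathbb{R}^{d_\eta}$ independent of $(i, j, \mathcal{X}, \zeta)$ satisfying

$$\max\big\{u\big(\|i - j\|, (\nu_i, v_e(i, \nu_i), \eta_e(i, \nu_i))\big),\; u\big(\|i - j\|, (v_a(j, \nu_j), \nu_j, \eta_a(j, \nu_j))\big)\big\} \leq 0.$$

Then define

$$\phi^*(j, k, \mathcal{X}) = \begin{cases} (\nu_i, \nu_j, \eta_{ij}) & \text{if } j, k \in \mathcal{X} \cap B(i, R) \\ (\nu_i, \nu_j, \eta_{ij}) & \text{if } j, k \in \mathbb{R}^d \setminus B(i, R) \\ (\nu_j, v_e(j, \nu_j), \eta_e(j, \nu_j)) & \text{if } j \in \mathcal{X} \cap B(i, R),\; k \in \mathbb{R}^d \setminus B(i, R) \\ (v_a(k, \nu_k), \nu_k, \eta_a(k, \nu_k)) & \text{if } j \in \mathbb{R}^d \setminus B(i, R),\; k \in \mathcal{X} \cap B(i, R) \end{cases}$$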

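By construction, $N_1(i, \mathcal{X}, \zeta) = N_1(i, \mathcal{X}, \phi^*)$. Also, $R(\mathcal{P}_\tau) < \infty$ a.s. since node degrees are a.s. finite by Assumption 9(b), so $N_1(i, \mathcal{P}_\tau, \zeta) = N_1\big(i, \mathcal{P}_\tau \cap B(i, R(\mathcal{P}_\tau)) \cup \mathcal{A}, \phi^*\big)$ for any locally finite $\mathcal{A} \subseteq \mathbb{R}^d \setminus B(i, R(\mathcal{P}_\tau))$. Since $\xi(i, \mathcal{X}, \phi)$ only depends on $i, \mathcal{X}, \phi$ through $N_1(i, \mathcal{X}, \phi)$, we have shown that $\xi$ is $\phi$-strongly stabilizing, as desired.

Moment 2: Consider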

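$$\frac{1}{n}\sum_{i=1}^n \sum_{j \in N_i} 1\{\gamma_i = \gamma\} 1\{\gamma_j = \gamma\} = \frac{1}{n}\sum_{i=1}^n 1\{\gamma_i = \gamma\} + \frac{1}{n}\sum_{i=1}^n \sum_{j \in N_i \setminus i} G_{ij}\, 1\{\gamma_i = \gamma\} 1\{\gamma_j = \gamma'\}$$
$$\qquad + \frac{1}{n}\sum_{i=1}^n \sum_{j \in N_i \setminus i} (1 - G_{ij}) \max_k G_{ik} G_{jk}\, 1\{\gamma_i = \gamma\} 1\{\gamma_j = \gamma'\}. \tag{11}$$

We derive probability limits for each of these three terms. The first term is the same as moment 1 above. Under the definition of $N_i$ in §3, the second term equals

$$\frac{1}{n}\sum_{i=1}^n \sum_{j \neq i} G_{ij}\, 1\{\gamma_i = \gamma\} 1\{\gamma_j = \gamma'\} = \frac{1}{n}\sum_{i \in \mathcal{N}_n} \sum_{j \neq i} G_{ij}\, 1\{\gamma_i = \gamma\} 1\{\gamma_j = \gamma'\}.$$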

We show that the right-hand side converges in probability to some $m_1(\gamma)$. The summands (over the index $i$) are bounded by $\gamma_i$, so we verify that $\sup_n E[\gamma_i^p] < \infty$ for $p = 3$:

$$E\big[\gamma_i^3\big] = \sum_{j,k,l} E[G_{ij} G_{ik} G_{il}] \leq 3 \sum_{m=1}^{3} \Big(n \sup_{i, \nu_i} E[G_{ij} \mid i, \nu_i]\Big)^m,$$

which is finite by Assumption 9(b). Lastly, $\zeta$-strong stability follows from the arguments for moment 1 above, replacing $N_1(i)$ with $N_2(i)$, since almost-sure finiteness of node degrees implies almost-sure finiteness of 2-neighborhoods.

Finally we consider the third term of (11). Under the definition of $N_i$ in §3,

$$\frac{1}{n}\sum_{i=1}^n \sum_{j \neq i} (1 - G_{ij}) \max_k G_{ik} G_{jk}\, 1\{\gamma_i = \gamma\} 1\{\gamma_j = \gamma'\} = \frac{1}{n}\sum_{i \in \mathcal{N}_n} \sum_{j \neq i} (1 - G_{ij}) \max_k G_{ik} G_{jk}\, 1\{\gamma_i = \gamma\} 1\{\gamma_j = \gamma'\}.$$


We show that the right-hand side converges in probability to some $m_2(\gamma)$. Write the right-hand side as $\frac{1}{n}\sum_{i \in \mathcal{N}_n} X_i^n$. This is nonnegative and bounded by

$$\frac{1}{n}\sum_{i \in \mathcal{N}_n} \sum_{j \neq i} \sum_k G_{ik} G_{jk} \leq \frac{1}{n}\sum_{i \in \mathcal{N}_n} \Big(\sum_{j \neq i} G_{ij}\Big)^2 \equiv \frac{1}{n}\sum_{i \in \mathcal{N}_n} Y_i^n.$$

Since $\frac{1}{n}\sum_{i \in \mathcal{N}_n} X_i^n \leq \frac{1}{n}\sum_{i \in \mathcal{N}_n} Y_i^n$ with probability one for all $n$ and the summands are identically distributed, we argue that $\sup_n E[(Y_1^n)^3] < \infty$ implies $\sup_n E[(X_1^n)^3] < \infty$. Suppose otherwise. Then $\sup_n E\big[\big(\frac{1}{n}\sum_{i \in \mathcal{N}_n} X_i^n\big)^3\big] = \infty$. But $\sup_n E[(Y_1^n)^3] < \infty$ implies $\sup_n E\big[\big(\frac{1}{n}\sum_{i \in \mathcal{N}_n} Y_i^n\big)^3\big] < \infty$ by the arithmetic-geometric mean inequality. This contradicts the fact that $\frac{1}{n}\sum_{i \in \mathcal{N}_n} X_i^n \leq \frac{1}{n}\sum_{i \in \mathcal{N}_n} Y_i^n$ with probability one for all $n$.

Therefore, it is enough to verify that $\sup_n E[\gamma_i^{2p}] < \infty$ for $p = 3$ to verify the uniform integrability condition. We have

$$E\big[\gamma_i^6\big] \leq c \sum_{m=1}^{6} \Big(n \sup_{i, \nu_i} E[G_{ij} \mid i, \nu_i]\Big)^m$$

for some $c < \infty$. This expression is finite by Assumption 9(b). Lastly, $\zeta$-strong stability follows from the arguments for moment 1 above, replacing $N_1(i)$ with $N_3(i)$, since almost-sure finiteness of node degrees implies almost-sure finiteness of 3-neighborhoods.

Moment 3: Consider

$$\frac{1}{n}\sum_{i \in \mathcal{N}_n} |N_i| \leq \frac{1}{n}\sum_{i \in \mathcal{N}_n} \sum_{j \in \mathcal{N}_n} \Big(G_{ij} + \sum_k G_{ik} G_{kj}\Big) = \frac{1}{n}\sum_{i \in \mathcal{N}_n} \gamma_i + \frac{1}{n}\sum_{i \in \mathcal{N}_n} \gamma_i^2.$$

To see that this is $O_p(1)$, we take expectations:

$$\frac{1}{n}\sum_{i=1}^n E[\gamma_i] = \frac{1}{n}\sum_{i=1}^n \sum_{j=1}^n E[G_{ij}] \leq n \sup_{i, \nu_i} E[G_{ij} \mid i, \nu_i],$$

which is finite by Assumption 9(b). Likewise,

$$\frac{1}{n}\sum_{i=1}^n E[\gamma_i^2] = n^2 E\big[E[G_{ij} G_{ik} \mid i, \nu_i]\big] = n^2 E\big[E[G_{ij} \mid i, \nu_i]\, E[G_{ik} \mid i, \nu_i]\big] \leq \Big(n \sup_{i, \nu_i} E[G_{ij} \mid i, \nu_i]\Big)^2 < \infty.$$

Moment 4: Consider $\frac{1}{n}\sum_{i \in \mathcal{N}_n} |N_i|^2$. This is bounded above by

$$\frac{1}{n}\sum_{i=1}^n \Big(\sum_{j=1}^n G_{ij} + \sum_{j=1}^n \sum_{k=1}^n G_{ik} G_{kj}\Big)^2.$$

As with moment 3, we will take expectations to prove that this is $O_p(1)$. For brevity, we will only consider the term

$$\frac{1}{n}\sum_{i,j,k,l,m} E[G_{ik} G_{kj} G_{il} G_{lm}]. \tag{12}$$

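The argument for the other terms is similar. Consider first the quintuple summation when $i \neq j \neq k \neq l \neq m$. In this case, the summands equal

$$E\big[E[G_{ik} G_{kj} \mid i, \nu_i]\, E[G_{il} G_{lm} \mid i, \nu_i]\big]$$
$$= E\Big[E\big[E[G_{ik} G_{kj} \mid i, \nu_i, k, \nu_k] \,\big|\, i, \nu_i\big]\; E\big[E[G_{il} G_{lm} \mid i, \nu_i, l, \nu_l] \,\big|\, i, \nu_i\big]\Big]$$
$$\leq \sup_{k, \nu_k} E[G_{kj} \mid k, \nu_k]\; \sup_{l, \nu_l} E[G_{lm} \mid l, \nu_l]\; E\big[E[G_{ik} \mid i, \nu_i]\, E[G_{il} \mid i, \nu_i]\big]$$
$$\leq \Big(\sup_{i, \nu_i} E[G_{ij} \mid i, \nu_i]\Big)^4.$$

Hence, $\frac{1}{n}\sum_{i \neq j \neq k \neq l \neq m} E[G_{ik} G_{kj} G_{il} G_{lm}] \leq \big(n \sup_{i, \nu_i} E[G_{ij} \mid i, \nu_i]\big)^4$, which is finite by Assumption 9(b).

We consider one final case, as the others are similar but tedious, which is the quadruple summation when $i \neq j \neq k \neq m$ but $j = l$: $\frac{1}{n}\sum_{i \neq j \neq k \neq m} E[G_{ik} G_{kj} G_{ij} G_{jm}]$. The summands equal

$$E\big[E[G_{ik} G_{kj} G_{ij} \mid i, \nu_i, j, \nu_j]\, E[G_{jm} \mid i, \nu_i, j, \nu_j]\big] \leq E[G_{ik} G_{ij}] \sup_{j, \nu_j} E[G_{jm} \mid j, \nu_j] \leq \Big(\sup_{i, \nu_i} E[G_{ij} \mid i, \nu_i]\Big)^3.$$

Hence, $\frac{1}{n}\sum_{i \neq j \neq k \neq m} E[G_{ik} G_{kj} G_{ij} G_{jm}] \leq \big(n \sup_{i, \nu_i} E[G_{ij} \mid i, \nu_i]\big)^3 < \infty$.

Moment 5: Consider

$$\frac{1}{n}\sum_{i \in \mathcal{N}_n} \sum_{j \neq i} (A^3)_{ij} \leq \frac{1}{n}\sum_{i \in \mathcal{N}_n} \Big(\sum_{j \in \mathcal{N}_n} \sum_{k \in \mathcal{N}_n} \sum_{l \in \mathcal{N}_n} G_{ij} G_{jk} G_{kl}\Big).$$

As with moments 3 and 4, to show this is $O_p(1)$, we take expectations. We will only consider the derivation for $\frac{1}{n}\sum_{i \neq j \neq k \neq l} E[G_{ij} G_{jk} G_{kl}]$, as the other cases are similar. The summand equals

$$E\big[E[G_{ij} G_{jk} \mid k, \nu_k]\, E[G_{kl} \mid k, \nu_k]\big] \leq E[G_{ij} G_{jk}] \sup_{k, \nu_k} E[G_{kl} \mid k, \nu_k]$$
$$= E\big[E[G_{ij} \mid j, \nu_j]\, E[G_{jk} \mid j, \nu_j]\big] \sup_{k, \nu_k} E[G_{kl} \mid k, \nu_k] \leq \Big(\sup_{i, \nu_i} E[G_{ij} \mid i, \nu_i]\Big)^3.$$

Hence, $\frac{1}{n}\sum_{i \neq j \neq k \neq l} E[G_{ij} G_{jk} G_{kl}] \leq \big(n \sup_{i, \nu_i} E[G_{ij} \mid i, \nu_i]\big)^3 < \infty$.

A.2 CLT for Local Dependence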

Let $Z_1, \ldots, Z_n$ be real-valued random variables. An $n \times n$ binary symmetric adjacency matrix $A$ is a dependency graph on $Z$ if for any two disjoint subsets $I_1, I_2 \subseteq \{1, \ldots, n\}$, we have $\{Z_i; i \in I_1\} \perp\!\!\!\perp \{Z_i; i \in I_2\}$ conditional on the event that $A_{ij} = 0$ for all $i \in I_1$ and $j \in I_2$. Notice that $A_{ii} = 1$ for all $i$, and we allow $A$ to be random. Let $\mathcal{B}$ be a $\sigma$-field containing the $\sigma$-field generated by $A$. Define $\sigma_n^2 = \mathrm{Var}\big(\sum_{i=1}^n Z_i \,\big|\, \mathcal{B}\big)$. Assume $E[Z_i \mid A] = 0$ for all $i$.

The usual CLTs for local dependence (Penrose, 2003; Ross, 2011) assume that the maximum degree of the dependency graph $\max_i |N_i|$ is of sufficiently small asymptotic order. For instance, if $\frac{n}{\sigma_n^2} = O_p(1)$, then $\max_i |N_i| = o_p(n)$ is necessary for a CLT using the usual bounds derived via Stein's method. We instead derive restrictions on moments of the dependency graph rather than requiring uniformly bounded degree.
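The moments of the dependency graph referred to here are the average degree, the average squared degree, and the average off-diagonal row sum of $A^3$, which are the three quantities restricted in condition (c) of Theorem 4. A minimal pure-Python sketch for computing them from a given $A$ follows. It assumes $|N_i|$ is the $i$th row sum of $A$ (so it counts $i$ itself, since $A_{ii} = 1$), and the function names are hypothetical.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two square lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dependency_moments(A):
    """Return (mean |N_i|, mean |N_i|^2, mean off-diagonal row sum of A^3)."""
    n = len(A)
    deg = [sum(row) for row in A]  # |N_i|, including i itself
    A3 = matmul(matmul(A, A), A)
    m1 = sum(deg) / n
    m2 = sum(d * d for d in deg) / n
    m3 = sum(A3[i][j] for i in range(n) for j in range(n) if j != i) / n
    return m1, m2, m3


# Two extreme cases on three nodes: independence and full dependence.
identity = [[1 if i == j else 0 for j in range(3)] for i in range(3)]
complete = [[1] * 3 for _ in range(3)]
moments_indep = dependency_moments(identity)
moments_dense = dependency_moments(complete)
```

Tracking how these three numbers grow with $n$ in simulated networks gives an informal check on whether the rate conditions are plausible for a given network formation model.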

Theorem 4. Under the following assumptions, $\frac{1}{\sigma_n}\sum_{i=1}^n Z_i \xrightarrow{d} N(0, 1)$.

(a) $\frac{n}{\sigma_n^2} = O_p(1)$.

(b) $\max_i E\big[|Z_i|^4 \,\big|\, \mathcal{B}\big] = O_p(1)$.

(c) $\frac{1}{n}\sum_{i=1}^n |N_i| = o_p(n)$, $\frac{1}{n}\sum_{i=1}^n |N_i|^2 = o_p(\sqrt{n})$, and $\frac{1}{n}\sum_{i=1}^n \sum_{j \neq i} (A^3)_{ij} = o_p(n)$.

Part (a) requires that the conditional variance is nondegenerate. Part (c) restricts various moments of the dependency graph, ensuring that observations are not too dependent. The notion of convergence here is convergence in distribution “in probability.” That is, we first bound the distance between the laws of $\frac{1}{\sigma_n}\sum_{i=1}^n Z_i$ and a standard normal random variable as a function of $A$, and then show that the bound converges in probability to zero.

Proof of Theorem 4. We follow the proof of Theorem 3.5 in Ross (2011). For any two random variables $U, V$ with respective conditional probability laws $\mu(\cdot \mid \mathcal{B})$ and $\nu(\cdot \mid \mathcal{B})$, define their conditional Wasserstein distance

$$\Delta(U, V \mid \mathcal{B}) = \sup_{h \in \mathcal{H}} \left| \int h(x)\, d\mu(x \mid \mathcal{B}) - \int h(x)\, d\nu(x \mid \mathcal{B}) \right|,$$

where $\mathcal{H} = \{h : \mathbb{R} \to \mathbb{R} : |h(x) - h(y)| \leq |x - y|\}$. Convergence of $\Delta(U, V \mid \mathcal{B})$ implies convergence of the Kolmogorov metric distance between $U$ and $V$ (see e.g. Ross, 2011, Proposition 1.2), which in turn implies weak convergence. Thus, the theorem holds if

$$\Delta\left(\frac{1}{\sigma_n}\sum_{i=1}^n Z_i,\; N(0, 1) \,\middle|\, \mathcal{B}\right) \xrightarrow{p} 0.$$

Throughout the proof, expectations, variances, and the Wasserstein distance are conditional on $\mathcal{B}$. Let $\mathcal{F} = \big\{f : \mathbb{R} \to \mathbb{R} : \|f\|, \|f''\| \leq 2,\; \|f'\| \leq \sqrt{2/\pi}\big\}$ and $S = \frac{1}{\sigma_n}\sum_{i=1}^n Z_i$. By Stein's lemma (e.g. Ross, 2011, Theorem 3.1),

$$\Delta(S, N(0, 1)) \leq \sup_{f \in \mathcal{F}} \big| E[f'(S) - S f(S)] \big|. \tag{13}$$

Thus, we seek to bound the right-hand side of this expression.


Let $S_i = \sum_{j \notin N_i} X_j$. By equation (3.9) of Ross,

$$\big| E[f'(S) - S f(S)] \big| \leq \left| E\left[\frac{1}{\sigma_n}\sum_{i=1}^n Z_i \big(f(S) - f(S_i) - (S - S_i) f'(S)\big)\right] \right| + \left| E\left[f'(S)\left(1 - \frac{1}{\sigma_n}\sum_{i=1}^n Z_i (S - S_i)\right)\right] \right|. \tag{14}$$

Label the two terms on the right-hand side $[I]$ and $[II]$.

Term $[I]$ is straightforward to bound. By a Taylor expansion,

$$[I] \leq \frac{\|f''\|}{2\sigma_n} \sum_{i=1}^n E\big[|Z_i| (S - S_i)^2\big] \leq \frac{1}{\sigma_n^3} \sum_{i=1}^n \sum_{j,k \in N_i} E\big[|Z_i Z_j Z_k|\big] \leq \max_i E\big[|Z_i|^3\big] \frac{1}{\sigma_n^3} \sum_{i=1}^n |N_i|^2, \tag{15}$$

where the second inequality follows from the AM-GM inequality. Turning to term $[II]$,

$$[II] \leq \left(\frac{2}{\pi \sigma_n^4} \mathrm{Var}\left(\sum_{i=1}^n \sum_{j \in N_i} Z_i Z_j\right)\right)^{1/2}. \tag{16}$$

We seek to bound the variance on the right-hand side. First notice

$$E\left[\left(\sum_{i=1}^n \sum_{j \in N_i} Z_i Z_j\right)^2\right] = \sum_{i \neq j} \sum_{k \in N_i} \sum_{l \in N_j} E[Z_i Z_j Z_k Z_l] + \sum_{i=1}^n \sum_{j \in N_i} E\big[Z_i^2 Z_j^2\big] + \sum_{i=1}^n \sum_{j \in N_i} \sum_{k \in N_i \setminus \{j\}} E\big[Z_i^2 Z_j Z_k\big]. \tag{17}$$

The sum of the second and third terms on the right-hand side of (17) is bounded above by

$$\max_i E\big[Z_i^4\big] \sum_{i=1}^n \big(|N_i| + |N_i|^2\big) \tag{18}$$

by the AM-GM inequality. The first term on the right-hand side of (17) is bounded above by$^{14}$

$$\sigma_n^4 + \sum_{i=1}^n \sum_{j \in N_i} \sum_{k \in N_i \cup N_j} \sum_{l \in N_i \cup N_j \cup N_k} \big(E[Z_i Z_j Z_k Z_l] - E[Z_i Z_k] E[Z_j Z_l]\big). \tag{19}$$

Note that for a given $i$, the inner triple sum is over all four-node connected components in $A$ that contain $i$. Hence, using the AM-GM inequality, the quadruple sum is bounded above

14 See Ross equation (3.15).

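by

$$\max_i 2E\big[Z_i^4\big] \sum_{i=1}^n \sum_{j \in N_i} \sum_{k \in N_i \cup N_j} \big(|N_i| + |N_j| + |N_k|\big) \leq \max_i 2E\big[Z_i^4\big] \left(2\sum_{i=1}^n |N_i|^2 + 2\sum_{i=1}^n \sum_{j \in N_i} |N_i|\,|N_j| + \sum_{i=1}^n \sum_{j \in N_i} \sum_{k \in N_i \cup N_j} |N_k|\right). \tag{20}$$

Notice

$$\sum_{i=1}^n \sum_{j \in N_i} |N_i|\,|N_j| = \big(G^2 1_n\big)' (G 1_n) = 1_n' G^3 1_n = \sum_{i=1}^n \sum_{j \in N_i} \sum_{k \in N_i \cup N_j} |N_k|.$$

Furthermore,

$$\sum_{i=1}^n \sum_{j \in N_i} \sum_{k \in N_i \cup N_j} |N_k| \leq \sum_{i=1}^n |N_i|^2 + \sum_{i=1}^n \sum_{j \in N_i} \sum_{k \in N_j \setminus N_i} |N_k|.$$

Thus,

$$(20) \leq \max_i 2E\big[Z_i^4\big] \left(2\sum_{i=1}^n |N_i|^2 + 3\sum_{i=1}^n \sum_{j \in N_i} \sum_{k \in N_i \cup N_j} |N_k|\right)$$
$$\leq \max_i 2E\big[Z_i^4\big] \left(5\sum_{i=1}^n |N_i|^2 + 3\sum_{i=1}^n \sum_{j \in N_i} \sum_{k \in N_j \setminus N_i} |N_k|\right)$$
$$= \max_i 2E\big[Z_i^4\big] \left(5\sum_{i=1}^n |N_i|^2 + 3\sum_{i=1}^n \sum_{j \neq i} (A^3)_{ij}\right). \tag{21}$$

Therefore, combining (16), (17), (18), (19), (20), and (21),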

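$$[II] \leq \left(\frac{2}{\pi} \max_i E\big[Z_i^4\big]\right)^{1/2} \frac{1}{\sigma_n^2} \left(\sum_{i=1}^n |N_i| + 11\sum_{i=1}^n |N_i|^2 + 6\sum_{i=1}^n \sum_{j \neq i} (A^3)_{ij}\right)^{1/2}.$$

Combining this with (13), (14), and (15), we obtain

$$\Delta(S, N(0, 1)) \leq n^{-1/2} \left(\frac{\sigma_n^2}{n}\right)^{-3/2} \frac{1}{n}\sum_{i=1}^n |N_i|^2 + n^{-1/2} \left(\frac{\sigma_n^2}{n}\right)^{-1} \sqrt{\frac{2}{\pi}} \left(\frac{1}{n}\sum_{i=1}^n |N_i| + \frac{11}{n}\sum_{i=1}^n |N_i|^2 + \frac{6}{n}\sum_{i=1}^n \sum_{j \neq i} (A^3)_{ij}\right)^{1/2},$$

which is $O_p(n^{-1/2})$ by assumptions (a)–(c) of the theorem.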


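Lemma A.1 (Conditional Slutsky). For $n \in \mathbb{N}$, let $\mathcal{B}_n$ be a sub-$\sigma$-algebra of the Borel $\sigma$-algebra on $\mathbb{R}^k$, and let $X_n, Y_n, X$ be $\mathbb{R}^k$-valued random vectors. If $\Delta(X_n, X \mid \mathcal{B}_n) \xrightarrow{p} 0$ and $Y_n \xrightarrow{p} c$ for a constant $c$, then $Y_n X_n \xrightarrow{d} cX$ in the usual sense of weak convergence.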

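Proof. For $n \in \mathbb{N}$, let $\mathcal{F}_n$ be a sub-$\sigma$-algebra of the Borel $\sigma$-algebra on $\mathbb{R}^\ell$, and let $W_n, V_n, V$ be $\mathbb{R}^\ell$-valued random vectors. First, we show that $|W_n - V_n| \xrightarrow{p} 0$ and $\Delta(V_n, V \mid \mathcal{F}_n) \xrightarrow{p} 0$ imply $W_n \xrightarrow{d} V$ in the usual sense of weak convergence. For some $M$, let $f : \mathbb{R}^\ell \to \mathbb{R}$ satisfy $\|f\| < M$ and $|f(x) - f(y)| \leq |x - y|$. Then for any $\epsilon > 0$,

$$\big| E[f(W_n)] - E[f(V_n)] \big|$$
$$\leq E\big[|f(W_n) - f(V_n)|\, 1\{|W_n - V_n| < \epsilon\}\big] + E\big[|f(W_n) - f(V_n)|\, 1\{|W_n - V_n| \geq \epsilon\}\big]$$
$$\leq E\big[|W_n - V_n|\, 1\{|W_n - V_n| < \epsilon\}\big] + E\big[2M\, 1\{|W_n - V_n| \geq \epsilon\}\big]$$
$$\leq \epsilon + 2M\, P\big(|W_n - V_n| \geq \epsilon\big).$$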

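Therefore,

$$\big| E[f(W_n)] - E[f(V)] \big| \leq \big| E[f(W_n)] - E[f(V_n)] \big| + \big| E[f(V_n)] - E[f(V)] \big|$$
$$\leq \epsilon + 2M\, P\big(|W_n - V_n| \geq \epsilon\big) + E\big[\big| E[f(V_n) \mid \mathcal{F}_n] - E[f(V) \mid \mathcal{F}_n] \big|\big].$$

The last two terms tend to zero by hypothesis, and $\epsilon$ is arbitrary, so the left-hand side tends to zero. Therefore, we have shown that $\Delta(W_n, V) \to 0$, which implies weak convergence.

Now let $W_n = (X_n, Y_n)$, $V_n = (X_n, c)$, and $V = (X, c)$. By the hypotheses of the lemma, we have $|W_n - V_n| \xrightarrow{p} 0$ and $\Delta(V_n, V \mid \mathcal{B}_n) \xrightarrow{p} 0$. The result then follows from the above result and the continuous mapping theorem.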

A.3 Proofs of Main Results

Proof of Theorem 1. The “if” direction is obvious. For the “only if” direction, fix any nodes $i$ and $j$. Suppose $D, D', G, G'$ satisfy $t \equiv \sum_k D_k G_{ik} = \sum_k D_k' G_{jk}'$ and $\sum_k (1 - D_k) G_{ik} = \sum_k (1 - D_k') G_{jk}'$. It follows that $m \equiv \sum_k G_{ik} = \sum_k G_{jk}'$. Let $a_1, \ldots, a_t$ be the nodes linked to $i$ in $G$ for which $D = 1$, and let $a_{t+1}, \ldots, a_m$ be the nodes linked to $i$ in $G$ for which $D = 0$. Similarly, let $b_1, \ldots, b_t$ and $b_{t+1}, \ldots, b_m$ be the nodes linked to $j$ in $G'$ for which $D' = 1$ and $D' = 0$, respectively.

Let $\pi : \{1, \ldots, n\} \to \{1, \ldots, n\}$ be the bijection that maps $j$ to $i$ and $a_q$ to $b_q$ for all $q = 1, \ldots, m$ but leaves all other nodes unchanged. Let $D'' = (D_{\pi(1)}', \ldots, D_{\pi(n)}')$ and

$$G'' = \begin{pmatrix} G_{\pi(1)\pi(1)}' & \cdots & G_{\pi(1)\pi(n)}' \\ \vdots & & \vdots \\ G_{\pi(n)\pi(1)}' & \cdots & G_{\pi(n)\pi(n)}' \end{pmatrix}.$$

Then by Assumption 3,

$$r\big(j, D', G', e\big) = r\big(i, D'', G'', e\big) \tag{22}$$

for any $e \in \mathrm{supp}(\varepsilon_1)$.


Now, by construction of $D''$, we have $D_j' = D_i''$, and since $D_i = D_j'$ by assumption, it follows that $D_i = D_i''$. By construction of $a_q$ and $b_q$, it is the case that $D_{a_q} = D_{b_q}'$ and $G_{i a_q} = G_{j b_q}' = 1$ for all $q = 1, \ldots, m$, and $G_{ik} = 0$ for all $k \notin \{a_1, \ldots, a_m\}$. Furthermore, by construction of $D''$, we have $D_{a_q}'' = D_{b_q}'$ and $G_{i a_q}'' = G_{j b_q}' = 1$ for all $q = 1, \ldots, m$, and $G_{ik}'' = 0$ for all $k \notin \{a_1, \ldots, a_m\}$. Then it follows that for all nodes $k$, $G_{ik} = G_{ik}''$ and $D_i G_{ik} = D_i'' G_{ik}''$. Therefore by Assumption 2,

$$r(i, D, G, e) = r\big(i, D'', G'', e\big) \tag{23}$$

for any $e \in \mathrm{supp}(\varepsilon_1)$. Equations (22) and (23) yield

$$r(i, D, G, e) = r\big(j, D', G', e\big),$$

as desired.

Lemma A.2 (Expectations). Maintain Assumptions 1-4 and 6. Suppose for $h(\cdot) \in \{h_1(\cdot), h_2(\cdot)\}$, $E\big[h(r(d,t,u,\varepsilon))^2\big] < \infty$. For any $d \in \{0, 1\}$, $\gamma \in \Gamma$, and $t, u \in \mathbb{N}$ such that $t + u = \gamma$,

$$\frac{1}{n}\sum_{i=1}^n E\big[h(Y_i) 1_i(d,t,u) \,\big|\, G\big] \xrightarrow{p} \mu_h(d,t,u)\, P(\mathrm{Bin}(\gamma, p) = t)\, p^d (1 - p)^{1-d} P(\gamma), \tag{24}$$

where $p = P(D_1 = 1)$ and $\mathrm{Bin}(\gamma, p)$ is a binomial random variable with parameters $(\gamma, p)$. Furthermore, the following quantities have finite probability limits:

• $\frac{1}{n}\sum_{i=1}^n \mathrm{Var}\big(h(Y_i) 1_i(d,t,u) \,\big|\, G\big)$,

• $\frac{1}{n}\sum_{i=1}^n \sum_{j \in N_i \setminus \{i\}} E\big[h_1(Y_i) 1_i(d,t,u) \,\big|\, G\big]\, E\big[h_2(Y_j) 1_j(d,t,u) \,\big|\, G\big]$, and

• $\frac{1}{n}\sum_{i=1}^n \sum_{j \in N_i \setminus \{i\}} E\big[h_1(Y_i) h_2(Y_j) 1_i(d,t,u) 1_j(d,t,u) \,\big|\, G\big]$.

Proof. We first show (24). For any $t, u$ such that $t + u = \gamma \in \Gamma$,

$$E[h(Y_i) 1_i(d,t,u) \mid G] = E\big[E\big[h(r(d,t,u,\varepsilon)) \,\big|\, D, G\big]\, 1_i(d,t,u) \,\big|\, G\big] = \mu_h(d,t,u)\, P(D_i = d, T_i = t \mid \gamma_i = \gamma)\, 1\{\gamma_i = \gamma\}, \tag{25}$$

where the last line uses Assumptions 1-3, as well as the fact that $T_i + U_i = \gamma_i$. Since treatments are i.i.d., by Assumption 1, $P(D_i = d, T_i = t \mid \gamma_i = \gamma) = P(\mathrm{Bin}(\gamma, p) = t)\, p^d (1 - p)^{1-d}$. Then (24) follows from Assumption 4.

Next, we turn to the bullet points. Finiteness of the limits that appear below will follow from the finite second moments assumption $E\big[h(r(d,t,u,\varepsilon))^2\big] < \infty$. Using (25),

$$\frac{1}{n}\sum_{i=1}^n \mathrm{Var}\big(h(Y_i) 1_i(d,t,u) \,\big|\, G\big) \xrightarrow{p} \Big(E\big[h(r(d,t,u,\varepsilon))^2\big]\, P(\mathrm{Bin}(\gamma, p) = t, D_1 = d) - \mu_h(d,t,u)^2\, P(\mathrm{Bin}(\gamma, p) = t, D_1 = d)^2\Big) P(\gamma). \tag{26}$$

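Note that Assumptions 4(b) and 7 imply that for any $\gamma, \gamma' \in \Gamma$, there exist $m_1(\gamma), m_2(\gamma) < \infty$ such that

$$\frac{1}{n}\sum_{i=1}^n \sum_{j \in N_i \setminus i} G_{ij}\, 1\{\gamma_i = \gamma\} 1\{\gamma_j = \gamma'\} \xrightarrow{p} m_1(\gamma), \text{ and}$$

$$\frac{1}{n}\sum_{i=1}^n \sum_{j \in N_i \setminus i} (1 - G_{ij}) \max_k G_{ik} G_{jk}\, 1\{\gamma_i = \gamma\} 1\{\gamma_j = \gamma'\} \xrightarrow{p} m_2(\gamma).$$

Thus similar to (26), we have

$$\frac{1}{n}\sum_{i=1}^n \sum_{j \in N_i \setminus \{i\}} E\big[h_1(Y_i) 1_i(d,t,u) \,\big|\, G\big]\, E\big[h_2(Y_j) 1_j(d,t,u) \,\big|\, G\big] \xrightarrow{p} \mu_{h_1}(d,t,u)\, \mu_{h_2}(d,t,u)\, P(\mathrm{Bin}(\gamma, p) = t, D_1 = d)^2 \big(m_1(\gamma) + m_2(\gamma)\big). \tag{27}$$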

For $i \neq j$,

$$E\big[h_1(Y_i) h_2(Y_j) 1_i(d,t,u) 1_j(d,t,u) \,\big|\, G\big] = E\big[h_1(r(d,t,u,\varepsilon_1))\, h_2(r(d,t,u,\varepsilon_2)) \,\big|\, G_{12} = 1\big]\, \underbrace{E\big[G_{ij} 1_i(d,t,u) 1_j(d,t,u) \,\big|\, G\big]}_{[I]}$$
$$\qquad + \mu_{h_1}(d,t,u)\, \mu_{h_2}(d,t,u)\, \underbrace{E\big[(1 - G_{ij}) 1_i(d,t,u) 1_j(d,t,u) \,\big|\, G\big]}_{[II]}, \tag{28}$$

by Assumptions 1-3. Let $\{D_i^{uni}\}_{i=1}^{2(\gamma - 1 - k)}$, $\{D_i^{fof}\}_{i=1}^{k}$ be i.i.d. Bernoulli($p$) random variables that are jointly independent and independent of $D$. Since treatments are i.i.d., by Assumption 1,

$$[I] = G_{ij}\, 1\{\gamma_i = \gamma_j = \gamma\} \underbrace{\sum_{k=0}^{\gamma - 1} P\left(\sum_{l=1}^{2(\gamma - 1 - k)} D_l^{uni} + 2\sum_{m=1}^{k} D_m^{fof} = 2(t - d),\; D_1 = D_2 = d\right)}_{\beta_I(d,t,u)},$$

$$[II] = (1 - G_{ij})\, 1\{\gamma_i = \gamma_j = \gamma\} \underbrace{\sum_{k=0}^{\gamma} P\left(\sum_{l=1}^{2(\gamma - 1 - k)} D_l^{uni} + 2\sum_{m=1}^{k} D_m^{fof} = 2t,\; D_1 = D_2 = d\right)}_{\beta_{II}(d,t,u)}.$$

Thus, the probability limit of $\frac{1}{n}\sum_{i=1}^n \sum_{j \in N_i \setminus \{i\}} E\big[h_1(Y_i) h_2(Y_j) 1_i(d,t,u) 1_j(d,t,u) \,\big|\, G\big]$ equals

$$E\big[h_1(r(d,t,u,\varepsilon_1))\, h_2(r(d,t,u,\varepsilon_2)) \,\big|\, G_{12} = 1\big]\, \beta_I(d,t,u)\, m_1(\gamma) + \mu_{h_1}(d,t,u)\, \mu_{h_2}(d,t,u)\, \beta_{II}(d,t,u)\, m_2(\gamma). \tag{29}$$


Lemma A.3. Under the assumptions of Lemma A.2, there exists $\sigma_h^2(d,t,u) < \infty$, defined in (30), such that

$$\mathrm{Var}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^n h(Y_i) 1_i(d,t,u) \,\middle|\, G\right) \xrightarrow{p} \sigma_h^2(d,t,u).$$

It follows that $\frac{1}{n}\sum_{i=1}^n h(Y_i) 1_i(d,t,u)$ converges in $L_2$ to the right-hand side of (24) at rate $n^{-1/2}$.

Proof. Compute:

$$n\, \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^n h(Y_i) 1_i(d,t,u) \,\middle|\, G\right) = \frac{1}{n}\sum_{i=1}^n \mathrm{Var}\big(h(Y_i) 1_i(d,t,u) \,\big|\, G\big) + \frac{1}{n}\sum_{i=1}^n \sum_{j \in N_i \setminus \{i\}} \mathrm{Cov}\big(h(Y_i) 1_i(d,t,u),\, h(Y_j) 1_j(d,t,u) \,\big|\, G\big).$$

Using the calculations in Lemma A.2, we have

$$\sigma_h^2(d,t,u) = (26) + (29) - (27). \tag{30}$$

This is finite since $E\big[h(r(d,t,u,\varepsilon))^2\big] < \infty$ by assumption. Convergence of the mean follows from Lemma A.2.

Proof of Theorem 2. This is a consequence of applying Lemma A.3 to the numerator and denominator of the estimator.

Proof of Theorem 3. We first prove (5). Recall the definition of $\hat{\rho}(d,t,u)$, and define

$$\hat{\nu}(d,t,u) = \frac{1}{n}\sum_{i=1}^n h(Y_i) 1_i(d,t,u).$$

Let $\nu(d,t,u) = E[\hat{\nu}(d,t,u) \mid G]$ and $\rho(d,t,u) = E[\hat{\rho}(d,t,u) \mid G]$. Note that $\rho(\cdot)$ is a random quantity and depends on $n$. In the following expression, we will suppress the dependence of these objects on $(d,t,u)$ for ease of notation. By a Taylor expansion,

$$\sqrt{n}\big(\hat{\mu}(d,t,u) - \mu(d,t,u)\big) = \sqrt{n}\left(\frac{\hat{\nu}}{\hat{\rho}} - \frac{\nu}{\rho}\right) = \frac{1}{\rho}\sqrt{n}(\hat{\nu} - \nu) - \frac{\nu}{\rho^2}\sqrt{n}(\hat{\rho} - \rho) + O_p\Big(\sqrt{n}(\hat{\nu} - \nu)^2 + \sqrt{n}(\hat{\rho} - \rho)^2 + \sqrt{n}(\hat{\nu} - \nu)(\hat{\rho} - \rho)\Big).$$

By Lemma A.3, the last term on the right-hand side is $o_p(1)$. Then the right-hand side equals

$$\frac{1}{\sqrt{n}}\sum_{i=1}^n \left(\frac{1}{\rho(d,t,u)} h(Y_i) 1_i(d,t,u) - \frac{\mu_h(d,t,u)}{\rho(d,t,u)} 1_i(d,t,u)\right) \tag{31}$$

up to an $o_p(1)$ term, since $E[(31) \mid G] = 0$. We will complete the proof by showing that conditional on $G$, the variance of (31) has a finite probability limit and deriving a conditional CLT.

Variance. It is straightforward to compute that $\mathrm{Var}\big((31) \,\big|\, G\big)$ equals

$$\frac{1}{n}\sum_{i=1}^n \sum_{j \in N_i} E\Big[\big(h(Y_i) h(Y_j) - \mu_h(d,t,u)\big(h(Y_i) + h(Y_j)\big) + \mu_h(d,t,u)^2\big)\, 1_i(d,t,u) 1_j(d,t,u) \,\Big|\, G\Big] \times \rho(d,t,u)^{-2}. \tag{32}$$

This has a finite probability limit $\sigma_h^2(d,t,u)$ by Assumption 4(b), Lemma A.2, and the assumption of finite fourth moments.

CLT. We verify the conditions of Theorem 4 for $\mathcal{B} = \sigma(G)$ and

$$Z_i = \frac{1}{\rho(d,t,u)} h(Y_i) 1_i(d,t,u) - \frac{\mu(d,t,u)}{\rho(d,t,u)} 1_i(d,t,u).$$

Then the result follows from Theorem2 and Lemma A.1. Condition (a) holds since, as argued above, the variance tends to σ2pd, t, uq, which is strictly positive by assumption. To verify condition (b), it is enough to show that

4 max E phpYiq1ipd, t, uqq G “ Opp1q, (33) i „ ˇ  ˇ p ˇ since ρpd, t, uq ÝÑ P pBinpγ, pq “ tq pdp1 ´ pq1´dρpγqˇ by Lemma A.2 and Assumption4(b). Equation (33) follows because

4 4 E phpYiq1ipd, t, uqq G “ E phprpd, t, u, εqq1ipd, t, uqq G ” ˇ ı “ E ”hprpd, t, u, εqq4 Er1 pd, t, uˇqs ı ˇ i ˇ 4 d 1´d “ E “hprpd, t, u, εqq ‰ P pBinpγ, pq “ tq p p1 ´ pq 1tγi “ γu.

Since E hprpd, t, u, εqq4 ă 8, this“ establishes condition‰ (b). Lastly, Assumption6 is suffi- cient for condition (c). The result follows from Lemma A.1. “ ‰
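The linearization in the proof above can be illustrated numerically. The following is a minimal sketch, assuming that $\hat\rho(d,t,u)$ is the sample share of units with exposure $(d,t,u)$ and that $\hat\mu = \hat\nu / \hat\rho$, as the Taylor expansion suggests; the function name `ratio_estimator` and the simulated arrays are hypothetical and only serve the illustration.

```python
import numpy as np

def ratio_estimator(h, ind):
    """Return (mu_hat, rho_hat, z) for h[i] = h(Y_i) and ind[i] = 1_i(d, t, u).

    Assumption: rho_hat is the sample mean of the indicators.
    """
    h = np.asarray(h, dtype=float)
    ind = np.asarray(ind, dtype=float)
    nu_hat = np.mean(h * ind)      # numerator: (1/n) sum_i h(Y_i) 1_i
    rho_hat = np.mean(ind)         # denominator: (1/n) sum_i 1_i
    if rho_hat == 0:
        raise ValueError("no unit has the requested exposure (d, t, u)")
    mu_hat = nu_hat / rho_hat
    # Plug-in analogue of Z_i in the proof: (h_i - mu_hat) 1_i / rho_hat
    z = (h - mu_hat) * ind / rho_hat
    return mu_hat, rho_hat, z

# Hypothetical data, for illustration only
rng = np.random.default_rng(0)
h = rng.normal(size=200)
ind = rng.integers(0, 2, size=200)
mu_hat, rho_hat, z = ratio_estimator(h, ind)
```

By construction the plug-in terms `z` sum to zero, which mirrors the statement that (31) has conditional mean zero.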

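Lemma A.4. Under the conditions of Lemma A.2 and Assumption 6,

\[
\frac{1}{n} \sum_{i=1}^n \sum_{j \in N_i} \Big( h_1(Y_i) h_2(Y_j) \mathbf{1}_i(d,t,u) \mathbf{1}_j(d,t,u) - E\big[ h_1(Y_i) h_2(Y_j) \mathbf{1}_i(d,t,u) \mathbf{1}_j(d,t,u) \,\big|\, G \big] \Big) \overset{p}{\longrightarrow} 0.
\]

Proof. Let $Z_{ki} = h_k(Y_i) \mathbf{1}_i(d,t,u)$. The conditional (on $G$) variance of the left-hand side equals

\[
\begin{aligned}
&\frac{1}{n^2} \sum_{i=1}^n \sum_{j \in N_i} \sum_{k=1}^n \sum_{l \in N_k} \operatorname{Cov}(Z_{1i} Z_{2j}, Z_{1k} Z_{2l} \mid G) \\
&\quad = \frac{1}{n^2} \sum_{i=1}^n \sum_{j \in N_i} \sum_{k \in N_i \cup N_j} \sum_{l \in N_k} \operatorname{Cov}(Z_{1i} Z_{2j}, Z_{1k} Z_{2l} \mid G) \\
&\qquad + \frac{1}{n^2} \sum_{i=1}^n \sum_{j \in N_i} \sum_{k \notin N_i \cup N_j} \sum_{l \in (N_i \cup N_j) \cap N_k} \operatorname{Cov}(Z_{1i} Z_{2j}, Z_{1k} Z_{2l} \mid G) \\
&\qquad + \frac{1}{n^2} \sum_{i=1}^n \sum_{j \in N_i} \sum_{k \notin N_i \cup N_j} \sum_{l \in N_k \setminus (N_i \cup N_j)} \operatorname{Cov}(Z_{1i} Z_{2j}, Z_{1k} Z_{2l} \mid G).
\end{aligned} \tag{34}
\]

The third term on the right-hand side is zero by definition of the dependency graph. By the arithmetic-geometric mean inequality, the first term is bounded in absolute value by a constant times

\[
n^{-1} \Big( \max_i E\big[ Z_{1i}^4 \,\big|\, G \big] + \max_i E\big[ Z_{2i}^4 \,\big|\, G \big] \Big) \frac{1}{n} \sum_{i=1}^n \sum_{j \in N_i} \sum_{k \in N_i \cup N_j} \gamma_k,
\]

which is $o_p(1)$ by finite fourth moments, Assumption 6, and (20). The second term on the right-hand side of (34) equals

\[
\frac{1}{n^2} \sum_{i=1}^n \sum_{j \in N_i} \sum_{k \in N_i \cup N_j} \sum_{l \in N_k \setminus (N_i \cup N_j)} \operatorname{Cov}(Z_i Z_j, Z_k Z_l \mid G),
\]

which is $o_p(1)$ by finite fourth moments, Assumption 6, and (21).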

Proof of Proposition 1. As argued in the proof of Theorem 3, $(32) \overset{p}{\longrightarrow} \sigma_h^2(d,t,u)$, so by the triangle inequality, it remains to show that $|\hat\sigma_h^2(d,t,u) - (32)| \overset{p}{\longrightarrow} 0$. First note that $|\hat\rho(d,t,u) - \rho(d,t,u)| \overset{p}{\longrightarrow} 0$ by Theorem 2. Second, the distance between

\[
\rho(d,t,u)^{-2}\, \frac{1}{n} \sum_{i=1}^n \sum_{j \in N_i} \Big( h(Y_i) h(Y_j) - \mu_h(d,t,u) \big( h(Y_i) + h(Y_j) \big) + \mu_h(d,t,u)^2 \Big) \mathbf{1}_i(d,t,u) \mathbf{1}_j(d,t,u)
\]

and its conditional (on $G$) expectation converges in probability to zero by Lemma A.4. Note that this conditional expectation is precisely (32). By the triangle inequality, we complete the proof by showing

\[
\begin{aligned}
\big( \hat\mu_h(d,t,u) - \mu_h(d,t,u) \big)\, \frac{1}{n} \sum_{i=1}^n \sum_{j \in N_i} \big( h(Y_i) + h(Y_j) \big) \mathbf{1}_i(d,t,u) \mathbf{1}_j(d,t,u) &\overset{p}{\longrightarrow} 0, \\
\big( \hat\mu_h(d,t,u)^2 - \mu_h(d,t,u)^2 \big)\, \frac{1}{n} \sum_{i=1}^n \sum_{j \in N_i} \mathbf{1}_i(d,t,u) \mathbf{1}_j(d,t,u) &\overset{p}{\longrightarrow} 0.
\end{aligned}
\]

By Theorem 2, $\hat\mu_h(d,t,u) \overset{p}{\longrightarrow} \mu_h(d,t,u)$, so these equations hold by Lemmas A.2 and A.4.
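To make the object in this proof concrete, the sketch below computes a plug-in analogue of (32). It assumes that $\hat\sigma_h^2(d,t,u)$ replaces $\mu_h$ and $\rho$ in (32) with $\hat\mu_h$ and $\hat\rho$ and drops the conditional expectation, which is what the triangle-inequality steps suggest; the estimator's formal definition lies outside this section. The function name `sigma2_hat`, the matrix `dep` encoding the neighborhoods $N_i$ (with $i \in N_i$), and the toy data are hypothetical.

```python
import numpy as np

def sigma2_hat(h, ind, dep):
    """Plug-in analogue of (32), under the assumptions stated in the text.

    h[i] = h(Y_i), ind[i] = 1_i(d, t, u), and dep[i, j] is True when j is in N_i.
    The diagonal of dep should be True because the sums in (32) include j = i.
    """
    h = np.asarray(h, dtype=float)
    ind = np.asarray(ind, dtype=float)
    dep = np.asarray(dep, dtype=float)
    n = h.shape[0]
    rho_hat = ind.mean()
    mu_hat = (h * ind).mean() / rho_hat
    # h_i h_j - mu (h_i + h_j) + mu^2 factors as (h_i - mu)(h_j - mu)
    e = (h - mu_hat) * ind
    return float(e @ dep @ e) / (n * rho_hat ** 2)

# Toy example: a ring of six nodes, with N_i taken to be i and its two neighbors
n = 6
adj = np.zeros((n, n), dtype=bool)
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = True
dep = adj | np.eye(n, dtype=bool)
h = np.array([1.0, 2.0, 0.5, 3.0, 1.5, 2.5])
ind = np.array([1, 0, 1, 1, 0, 1])
s2 = sigma2_hat(h, ind, dep)
```

Because `dep` need not be positive semidefinite, nothing in this sketch forces the returned value to be nonnegative in a finite sample.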

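Theorem 5 (CLT for TSE). Maintain the assumptions of Theorem 3. Let $\tau = (d,t,u)$ and $\tau' = (d',t',u')$. Then

\[
\sqrt{n}\Big( \big( \hat\mu_h(\tau) - \hat\mu_h(\tau') \big) - \big( \mu_h(\tau) - \mu_h(\tau') \big) \Big)
= \frac{1}{\sqrt{n}} \sum_{i=1}^n \underbrace{\Big[ h(Y_i) \Big( \frac{\mathbf{1}_i(\tau)}{\rho(\tau)} - \frac{\mathbf{1}_i(\tau')}{\rho(\tau')} \Big) - \Big( \frac{\mu(\tau) \mathbf{1}_i(\tau)}{\rho(\tau)} - \frac{\mu(\tau') \mathbf{1}_i(\tau')}{\rho(\tau')} \Big) \Big]}_{\tilde\psi_h(W_i)} + o_p(1). \tag{35}
\]

Let $\tilde v_h^n = \operatorname{Var}\big( \frac{1}{\sqrt{n}} \sum_{i=1}^n \tilde\psi_h(W_i) \,\big|\, G \big)$. Then there exists $\sigma_{TSE}^2 < \infty$ such that

\[
\tilde v_h^n \overset{p}{\longrightarrow} \sigma_{TSE}^2.
\]

If $\sigma_{TSE}^2 > 0$, then $(35) \overset{d}{\longrightarrow} N(0, \sigma_{TSE}^2)$.

Proof. Equation (35) follows from the proof of Theorem 3; see (31).

Variance. By rote computation,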

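\[
\begin{aligned}
\tilde v_h^n
&= \frac{1}{n} \sum_{i=1}^n \sum_{j \in N_i} \Big( E\big[ h(Y_i) h(Y_j) \mathbf{1}_i(\tau) \mathbf{1}_j(\tau) \,\big|\, G \big] \rho(\tau)^{-2}
 - E\big[ h(Y_i) h(Y_j) \mathbf{1}_i(\tau) \mathbf{1}_j(\tau') \,\big|\, G \big] \big( \rho(\tau) \rho(\tau') \big)^{-1} \\
&\qquad\qquad - E\big[ h(Y_i) h(Y_j) \mathbf{1}_i(\tau') \mathbf{1}_j(\tau) \,\big|\, G \big] \big( \rho(\tau) \rho(\tau') \big)^{-1}
 + E\big[ h(Y_i) h(Y_j) \mathbf{1}_i(\tau') \mathbf{1}_j(\tau') \,\big|\, G \big] \rho(\tau')^{-2} \Big) \\
&\quad - \frac{1}{n} \sum_{i=1}^n \sum_{j \in N_i} \Big( E\big[ \big( h(Y_i) + h(Y_j) \big) \mathbf{1}_i(\tau) \mathbf{1}_j(\tau) \,\big|\, G \big] \frac{\mu(\tau)}{\rho(\tau)^2}
 - E\big[ \big( h(Y_i) + h(Y_j) \big) \mathbf{1}_i(\tau) \mathbf{1}_j(\tau') \,\big|\, G \big] \frac{\mu(\tau)}{\rho(\tau) \rho(\tau')} \\
&\qquad\qquad - E\big[ \big( h(Y_i) + h(Y_j) \big) \mathbf{1}_i(\tau') \mathbf{1}_j(\tau) \,\big|\, G \big] \frac{\mu(\tau')}{\rho(\tau) \rho(\tau')}
 + E\big[ \big( h(Y_i) + h(Y_j) \big) \mathbf{1}_i(\tau') \mathbf{1}_j(\tau') \,\big|\, G \big] \frac{\mu(\tau')}{\rho(\tau')^2} \Big) \\
&\quad + \frac{1}{n} \sum_{i=1}^n \sum_{j \in N_i} \Big( E\big[ \mathbf{1}_i(\tau) \mathbf{1}_j(\tau) \,\big|\, G \big] \rho(\tau)^{-2}
 - E\big[ \mathbf{1}_i(\tau) \mathbf{1}_j(\tau') \,\big|\, G \big] \big( \rho(\tau) \rho(\tau') \big)^{-1} \\
&\qquad\qquad - E\big[ \mathbf{1}_i(\tau') \mathbf{1}_j(\tau) \,\big|\, G \big] \big( \rho(\tau) \rho(\tau') \big)^{-1}
 + E\big[ \mathbf{1}_i(\tau') \mathbf{1}_j(\tau') \,\big|\, G \big] \rho(\tau')^{-2} \Big).
\end{aligned}
\]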

This has a finite limit by Lemma A.2, Assumption 4(b), and the assumption of finite fourth moments.

CLT. We verify the conditions of Theorem 4 for $B = \sigma(G)$ and $Z_i = \tilde\psi_h(W_i)$. Then the result follows from Theorem 2 and Lemma A.1. Condition (a) holds since, as argued above, the variance tends to $\sigma_{TSE}^2$, which is strictly positive by assumption. To verify condition (b), it is enough to show that

\[
\max_i E\big[ (h(Y_i) \mathbf{1}_i(d,t,u))^4 \,\big|\, G \big] = O_p(1),
\]

since $\rho(d,t,u) \overset{p}{\longrightarrow} P(\mathrm{Bin}(\gamma,p) = t)\, p^d (1-p)^{1-d} P(\gamma)$ by Lemma A.2 and Assumption 4(b). Equation (33) follows from the proof of Theorem 3. Lastly, Assumption 6 is sufficient for condition (c).

Proof of Proposition 2. After expanding the product in the definition of $\hat\sigma_{TSE}^2$, the proof proceeds similarly to that of Proposition 1, but we use Theorem 5 rather than Theorem 3.
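The influence-function representation (35) suggests a direct way to compute a variance estimate for the difference of two exposure means. The sketch below is an illustration under assumptions: it takes $\hat\sigma_{TSE}^2$ to be $\frac{1}{n} \sum_i \sum_{j \in N_i} \hat\psi_i \hat\psi_j$, where $\hat\psi_i$ is the plug-in version of $\tilde\psi_h(W_i)$. The definition of $\hat\sigma_{TSE}^2$ is not restated in this section, so this form is a guess consistent with "expanding the product". The names `tse_variance`, `ind_a`, `ind_b`, and `dep` are hypothetical.

```python
import numpy as np

def tse_variance(h, ind_a, ind_b, dep):
    """Difference of exposure means and a plug-in variance estimate.

    ind_a[i] = 1_i(tau), ind_b[i] = 1_i(tau'), dep[i, j] True when j is in N_i
    (diagonal True). The variance formula is an assumption, not the paper's text.
    """
    h = np.asarray(h, dtype=float)
    ind_a = np.asarray(ind_a, dtype=float)
    ind_b = np.asarray(ind_b, dtype=float)
    dep = np.asarray(dep, dtype=float)
    n = h.shape[0]
    rho_a, rho_b = ind_a.mean(), ind_b.mean()
    mu_a = (h * ind_a).mean() / rho_a
    mu_b = (h * ind_b).mean() / rho_b
    # psi_i = (h_i - mu(tau)) 1_i(tau)/rho(tau) - (h_i - mu(tau')) 1_i(tau')/rho(tau')
    psi = (h - mu_a) * ind_a / rho_a - (h - mu_b) * ind_b / rho_b
    var = float(psi @ dep @ psi) / n
    return mu_a - mu_b, psi, var

# Toy example on a six-node ring; exposures tau and tau' are mutually exclusive
n = 6
adj = np.zeros((n, n), dtype=bool)
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = True
dep = adj | np.eye(n, dtype=bool)
h = np.array([1.0, 2.0, 0.5, 3.0, 1.5, 2.5])
ind_a = np.array([1, 0, 1, 0, 0, 1])
ind_b = np.array([0, 1, 0, 1, 1, 0])
tse, psi, var = tse_variance(h, ind_a, ind_b, dep)
```

Writing $\hat\psi_i$ as a difference of two centered terms makes it clear why the cross terms in the variance expansion carry the factor $(\rho(\tau)\rho(\tau'))^{-1}$.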

Proof of Proposition 3. Showing that $\frac{1}{n} E[X'X \mid G] \overset{p}{\longrightarrow} V$ is a simple computation followed by an application of Assumption 8(a).

Turning to the CLT proper, the left-hand side of (8) equals $\frac{1}{n} E[X' \varepsilon \varepsilon' X \mid G]$, which is also the conditional variance of $\frac{1}{\sqrt{n}} \sum_{i=1}^n X_i \varepsilon_i$. Since $S$ is positive definite, by Theorem 4, the Cramér-Wold device, and Lemma A.1,

\[
\frac{1}{\sqrt{n}} \sum_{i=1}^n X_i \varepsilon_i \overset{d}{\longrightarrow} N(0, S).
\]

Distributional convergence of $\sqrt{n}(\hat\theta - \theta_0)$ then follows from Lemma A.1.

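Proof of Proposition 4. Under homoskedasticity, $\frac{1}{n} \sum_{i=1}^n \sum_{j \in N_i} E\big[ \rho_{ij}(X, A) X_i X_j' \,\big|\, G \big]$ reduces to

\[
s\, \frac{1}{n} \sum_{i=1}^n E\big[ X_i X_i' \,\big|\, G \big] + c\, \frac{1}{n} \sum_{i=1}^n \sum_{j \in N_i \setminus \{i\}} E\big[ X_i X_j' \,\big|\, G \big].
\]

By definition,

\[
\frac{1}{n} \sum_{i=1}^n \sum_{j \in N_i \setminus \{i\}} E\big[ X_i X_j' \,\big|\, G \big]
= \frac{1}{n} \sum_{i=1}^n \sum_{j \in N_i \setminus \{i\}}
\begin{pmatrix}
1 & D_j & T_j & \gamma_j \\
D_i & D_i D_j & D_i T_j & D_i \gamma_j \\
T_i & T_i D_j & T_i T_j & T_i \gamma_j \\
\gamma_i & \gamma_i D_j & \gamma_i T_j & \gamma_i \gamma_j
\end{pmatrix}. \tag{36}
\]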

We compute the limits for each entry. First, recalling that $p = E[D_1]$,

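\[
\begin{aligned}
\frac{1}{n} \sum_{i=1}^n \sum_{j \in N_i \setminus \{i\}} E\big[ D_i T_j \,\big|\, G \big]
&= p\, \frac{1}{n} \sum_{i=1}^n \sum_{j \in N_i \setminus \{i\}} G_{ij} + p^2\, \frac{1}{n} \sum_{i=1}^n \sum_{j \in N_i \setminus \{i\}} \sum_{k \neq j \neq i} G_{jk} \\
&= p\, \frac{1}{n} \sum_{i=1}^n \gamma_i + p^2 \Big( \frac{1}{n} \sum_{i=1}^n \sum_{j \in N_i} \gamma_j - 2\, \frac{1}{n} \sum_{i=1}^n \gamma_i \Big),
\end{aligned}
\]

which tends to a finite probability limit by Assumption 8(b). Second,

\[
\begin{aligned}
\frac{1}{n} \sum_{i=1}^n \sum_{j \in N_i \setminus \{i\}} E\big[ T_i T_j \,\big|\, G \big]
&= p\, \frac{1}{n} \sum_{i=1}^n \sum_{j \in N_i \setminus \{i\}} \sum_{k \neq j} G_{jk} + p^2\, \frac{1}{n} \sum_{i=1}^n \sum_{j \in N_i \setminus \{i\}} \sum_{k \neq j} \sum_{l \neq k \neq j} G_{jk} G_{jl} \\
&= p \Big( \frac{1}{n} \sum_{i=1}^n \sum_{j \in N_i} \gamma_j - \frac{1}{n} \sum_{i=1}^n \gamma_i \Big) + p^2 \Big( \frac{1}{n} \sum_{i=1}^n \sum_{j \in N_i} \gamma_j - \frac{1}{n} \sum_{i=1}^n \gamma_j^2 \Big),
\end{aligned}
\]

which tends to a finite probability limit by Assumption 8(b). The other entries in (36) are similar, as are the entries of $\frac{1}{n} \sum_{i=1}^n E[X_i X_i' \mid G]$.

Proof of Proposition 5. In light of Proposition 3, to establish $\frac{1}{n} X'X \overset{p}{\longrightarrow} V$, it is enough to show concentration of the variance elementwise. We do so first for the entry $\frac{1}{n} \sum_{i=1}^n D_i T_i$. Its variance equals

\[
\frac{1}{n^2} \sum_{i=1}^n \sum_{j \in N_i} E\big[ D_i D_j T_i T_j \,\big|\, G \big] - \frac{1}{n^2} \sum_{i=1}^n \sum_{j \in N_i} E[D_i T_i \mid G]\, E[D_j T_j \mid G]. \tag{37}
\]

This is bounded above by

\[
\frac{2}{n^2} \sum_{i=1}^n \sum_{j \in N_i} \gamma_i \gamma_j,
\]

which is $o_p(1)$ under Assumption 8(c). Next we turn to the entry $\frac{1}{n} \sum_{i=1}^n T_i^2$. Its variance equals

\[
\frac{1}{n^2} \sum_{i=1}^n \sum_{j \in N_i} E\big[ T_i^2 T_j^2 \,\big|\, G \big] - \frac{1}{n^2} \sum_{i=1}^n \sum_{j \in N_i} E[T_i^2 \mid G]\, E[T_j^2 \mid G]. \tag{38}
\]

This is bounded above by

\[
\frac{2}{n^2} \sum_{i=1}^n \sum_{j \in N_i} \gamma_i^2 \gamma_j^2,
\]

which is $o_p(1)$ under Assumption 8(c). The other entries of $\frac{1}{n} X'X$ follow by similar arguments.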

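We next show that $\hat S \overset{p}{\longrightarrow} S$. Let $\mathbf{1}_4$ be a $4 \times 4$ matrix of ones. Since $\hat\varepsilon_i = \varepsilon_i + X_i'(\theta_0 - \hat\theta)$,

\[
\begin{aligned}
\hat S &= \frac{1}{n} \sum_{i=1}^n \sum_{j \in N_i} \hat\varepsilon_i \hat\varepsilon_j X_i X_j' \\
&= \frac{1}{n} \sum_{i=1}^n \sum_{j \in N_i} \Big( \varepsilon_i \varepsilon_j X_i X_j' + \varepsilon_i X_j'(\theta_0 - \hat\theta) X_i X_j' + X_i'(\theta_0 - \hat\theta) \varepsilon_j X_i X_j' + (\theta_0 - \hat\theta)' X_i X_j' (\theta_0 - \hat\theta) X_i X_j' \Big) \\
&\leq \frac{1}{n} \sum_{i=1}^n \sum_{j \in N_i} \varepsilon_i \varepsilon_j X_i X_j' + 4 \|\theta_0 - \hat\theta\|\, \mathbf{1}_4\, \frac{1}{n} \sum_{i=1}^n \sum_{j \in N_i} (\varepsilon_i + \varepsilon_j)(\gamma_i + 1)^2 (\gamma_j + 1)^2 \\
&\quad + 16 \|\theta_0 - \hat\theta\|^2\, \mathbf{1}_4\, \frac{1}{n} \sum_{i=1}^n \sum_{j \in N_i} (\gamma_i + 1)^2 (\gamma_j + 1)^2.
\end{aligned}
\]

The last two terms are $o_p(1)$ because

\[
E\Big[ \Big| \frac{1}{n} \sum_{i=1}^n \sum_{j \in N_i} (\varepsilon_i + \varepsilon_j)(\gamma_i + 1)^2 (\gamma_j + 1)^2 \Big| \,\Big|\, G \Big]
\leq 2 E[|\varepsilon_1|]\, \frac{1}{n} \sum_{i=1}^n \sum_{j \in N_i} (\gamma_i + 1)^2 (\gamma_j + 1)^2,
\]

which is $O_p(1)$ by Assumption 8(c). Hence, in light of (8), it remains to show that

\[
\frac{1}{n} \sum_{i=1}^n \sum_{j \in N_i} \Big( \varepsilon_i \varepsilon_j - E\big[ \varepsilon_i \varepsilon_j \,\big|\, X_i, X_j, A_{ij} = 1 \big] \Big) X_i X_j' \overset{p}{\longrightarrow} 0.
\]

The conditional variance of the left-hand side, conditioning on $(X, A)$, equals

\[
\begin{aligned}
&\frac{1}{n^2} \sum_{i=1}^n \sum_{j \in N_i} \sum_{k \in N_i \cup N_j} \sum_{l \in N_k} \operatorname{Var}\big( \varepsilon_i \varepsilon_j \,\big|\, X_i, X_j, A_{ij} = 1 \big) X_i X_j' X_k X_l' \\
&\quad \leq \frac{1}{n^2} \sum_{i=1}^n \sum_{j \in N_i} \sum_{k \in N_i \cup N_j} \sum_{l \in N_k} E\big[ \varepsilon_i^4 \,\big|\, X_i \big] X_i X_j' X_k X_l',
\end{aligned}
\]

where the inequality follows from Jensen's inequality and Cauchy-Schwarz. The last expression is bounded by

\[
4 C\, \mathbf{1}_4\, \frac{1}{n^2} \sum_{i=1}^n \sum_{j \in N_i} \sum_{k \in N_i \cup N_j} \sum_{l \in N_k} (\gamma_i + 1)(\gamma_j + 1)(\gamma_k + 1)(\gamma_l + 1),
\]

where $C$ is the uniform upper bound on $E[\varepsilon_i^4 \mid X_i]$ that exists by assumption. This last expression is $o_p(1)$ by Assumption 8(c).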
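For readers who want to see $\hat S$ computed, the following sketch evaluates $\hat S = \frac{1}{n} \sum_i \sum_{j \in N_i} \hat\varepsilon_i \hat\varepsilon_j X_i X_j'$ on simulated data. Several ingredients are assumptions made only for illustration: $N_i$ is taken to be node $i$ together with its network neighbors, $T_i$ is the number of treated neighbors, $\hat\theta$ is obtained by least squares, and the data-generating process, the function name `network_hac_meat`, and all variable names are hypothetical.

```python
import numpy as np

def network_hac_meat(X, resid, dep):
    """S_hat = (1/n) sum_i sum_{j in N_i} resid_i resid_j X_i X_j'.

    dep[i, j] is True when j is in N_i (diagonal True); this encoding is an assumption.
    """
    X = np.asarray(X, dtype=float)
    resid = np.asarray(resid, dtype=float)
    dep = np.asarray(dep, dtype=float)
    n = X.shape[0]
    score = X * resid[:, None]          # row i holds resid_i * X_i'
    return score.T @ dep @ score / n

# Simulated network and regressors X_i = (1, D_i, T_i, gamma_i)
rng = np.random.default_rng(1)
n = 30
upper = np.triu(rng.random((n, n)) < 0.15, k=1)
adj = upper | upper.T                   # symmetric adjacency, no self-links
D = rng.integers(0, 2, size=n).astype(float)
T = adj.astype(float) @ D               # number of treated neighbors
gamma = adj.sum(axis=1).astype(float)   # degree
X = np.column_stack([np.ones(n), D, T, gamma])
y = 1.0 + 0.5 * D + 0.2 * T + rng.normal(size=n)   # hypothetical outcome model

theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ theta_hat
dep = adj | np.eye(n, dtype=bool)
S_hat = network_hac_meat(X, resid, dep)
```

The matrix form `score.T @ dep @ score` is the double sum over $i$ and $j \in N_i$ written as a quadratic form, which avoids explicit loops.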

References

Aronow, P. and C. Samii, “Estimating Average Causal Effects under Interference between Units,” working paper, 2015.

Athey, S., D. Eckles, and G. Imbens, “Exact P-values for Network Interference,” Technical Report, National Bureau of Economic Research 2015.

Bandiera, O., I. Barankay, and I. Rasul, “Social Connections and Incentives in the Workplace: Evidence from Personnel Data,” Econometrica, 2009, 77 (4), 1047–1094.

Banerjee, A., A. Chandrasekhar, E. Duflo, and M. Jackson, “The Diffusion of Microfinance,” Science, 2013, 341 (6144), 1236498.

Barabási, A., Network Science, Cambridge University Press, 2015.

Basse, G. and E. Airoldi, “Optimal Design of Experiments in the Presence of Network-Correlated Outcomes,” arXiv preprint arXiv:1507.00803, 2015.

Bhattacharya, D., P. Dupas, and S. Kanaya, “Estimating the Impact of Means-tested Subsidies under Treatment Externalities with Application to Anti-Malarial Bednets,” Technical Report, National Bureau of Economic Research 2013.

Bond, R., C. Fariss, J. Jones, A. Kramer, C. Marlow, J. Settle, and J. Fowler, “A 61-Million-Person Experiment in Social Influence and Political Mobilization,” Nature, 2012, 489 (7415), 295–298.

Bramoullé, Y., H. Djebbari, and B. Fortin, “Identification of Peer Effects through Social Networks,” Journal of Econometrics, 2009, 150 (1), 41–55.

Brock, W. and S. Durlauf, “Discrete Choice with Social Interactions,” The Review of Economic Studies, 2001, 68 (2), 235–260.

Bursztyn, L., F. Ederer, B. Ferman, and N. Yuchtman, “Understanding Mechanisms Underlying Peer Effects: Evidence From a Field Experiment on Financial Decisions,” Econometrica, 2014, 82 (4), 1273–1301.

Cai, J., A. De Janvry, and E. Sadoulet, “Social Networks and the Decision to Insure,” American Economic Journal: Applied Economics, 2015, 7 (2), 81–108.

Carrell, S., B. Sacerdote, and J. West, “From Natural Variation to Optimal Policy? The Importance of Endogenous Peer Group Formation,” Econometrica, 2013, 81 (3), 855–882.

Chandrasekhar, A., “Econometrics of Network Formation,” in Y. Bramoullé, A. Galeotti, and B. Rogers, eds., Oxford Handbook on the Econometrics of Networks, forthcoming.

Christakis, N., “Social Networks and Collateral Health Effects,” 2004.

Duflo, E. and E. Saez, “The Role of Information and Social Interactions in Retirement Plan Decisions: Evidence from a Randomized Experiment,” The Quarterly Journal of Economics, 2003, pp. 815–842.

Duflo, E., P. Dupas, and M. Kremer, “Peer Effects, Teacher Incentives, and the Impact of Tracking: Evidence from a Randomized Evaluation in Kenya,” The American Economic Review, 2011, 101 (5), 1739.

Dupas, P., “Short-Run Subsidies and Long-Run Adoption of New Health Products: Evidence From a Field Experiment,” Econometrica, 2014, 82 (1), 197–228.

Giorgi, G. De, M. Pellizzari, and S. Redaelli, “Identification of Social Interactions through Partially Overlapping Peer Groups,” American Economic Journal: Applied Economics, 2010, 2, 241–275.

Graham, B., “Identifying Social Interactions through Conditional Variance Restrictions,” Econometrica, 2008, 76 (3), 643–660.

Hudgens, M. and M. Halloran, “Toward Causal Inference with Interference,” Journal of the American Statistical Association, 2012.

Kim, D., A. Hwong, D. Stafford, D. Hughes, A. O’Malley, J. Fowler, and N. Christakis, “Social Network Targeting to Maximise Population Behaviour Change: a Cluster Randomised Controlled Trial,” The Lancet, 2015, 386 (9989), 145–153.

Kling, J., J. Liebman, and L. Katz, “Experimental analysis of neighborhood effects,” Econometrica, 2007, 75 (1), 83–119.

Kramer, A., J. Guillory, and J. Hancock, “Experimental Evidence of Massive-Scale Emotional Contagion through Social Networks,” Proceedings of the National Academy of Sciences, 2014, 111 (24), 8788–8790.

Lazzati, Natalia, “Treatment Response with Social Interactions: Partial Identification via Monotone Comparative Statics,” Quantitative Economics, 2015, 6 (1), 49–83.

Leung, M., “A Weak Law for Moments of Pairwise-Stable Networks,” working paper, 2016.

Manresa, E., “Estimating the Structure of Social Interactions using Panel Data,” working paper, 2013.

Manski, C., “Identification of Endogenous Social Effects: The Reflection Problem,” The Review of Economic Studies, 1993, 60 (3), 531–542.

Manski, Charles F., “Identification of Treatment Response with Social Interactions,” The Econometrics Journal, 2013, 16 (1), S1–S23.

Mas, A. and E. Moretti, “Peers at Work,” The American Economic Review, 2009, 99 (1), 112–145.

Miguel, E. and M. Kremer, “Worms: Identifying Impacts on Education and Health in the Presence of Treatment Externalities,” Econometrica, 2004, 72 (1), 159–217.

Oster, E. and R. Thornton, “Determinants Of Technology Adoption: Peer Effects In Menstrual Cup Take-Up,” Journal of the European Economic Association, 2012, 10 (6), 1263–1293.

Penrose, M., Random Geometric Graphs, number 5 in Oxford Studies in Probability, Oxford; New York: Oxford University Press, 2003.

Ross, N., “Fundamentals of Stein’s method,” Probability Surveys, 2011, 8, 210–293.

Shalizi, C. and A. Thomas, “Homophily and Contagion Are Generically Confounded in Observational Social Network Studies,” Sociological Methods and Research, 2011, 40 (2), 211–239.

Sobel, M., “What Do Randomized Studies of Housing Mobility Demonstrate? Causal Inference in the Face of Interference,” Journal of the American Statistical Association, 2006, 101 (476), 1398–1407.

Song, K., “Measuring the Graph Concordance of Locally Dependent Observations,” working paper, 2015.

Toulis, P. and E. Kao, “Estimation of Causal Peer Influence Effects,” in “Proceedings of The 30th International Conference on Machine Learning” 2013, pp. 1489–1497.

Valente, T., A. Ritt-Olson, A. Stacy, J. Unger, J. Okamoto, and S. Sussman, “Peer Acceleration: Effects of a Social Network Tailored Substance Abuse Prevention Program among High-Risk Adolescents,” Addiction, 2007, 102 (11), 1804–1815.

van der Laan, M., “Causal Inference for a Population of Causally Connected Units,” Journal of Causal Inference, 2014, 2 (1), 13–74.