
An Experimental Approach to Causal Identification of Spillover Effects under General Interference

Alexander Coppock∗ and Neelanjan Sircar†

Columbia University

July 18, 2013

Abstract

This paper extends the Rubin Causal Model to a general interference framework. In particular, we develop a framework to define estimands in a general way using "reference assignments." This framework allows for the flexible stipulation of the non-interference assumptions required for estimation. We apply this approach to two common estimands, the average direct treatment effect and the average indirect exposure effect, recasting the framework in the language of network analysis. We use the network context to demonstrate the relationship between the potential outcomes and edges in a network, as well as to define intuitive non-interference assumptions. Finally, we derive unbiased and consistent estimators of average causal effects under interference and demonstrate their properties with simulated data.


We would like to thank Peter Aronow, Albert Fang, Donald Green, Dominik Hangartner, Macartan Humphreys, Ryan Moore, and participants at the European Political Science Association Annual Meeting 2013, the 6th Annual Meeting of the Political Networks Section of the American Political Science Association, and the Columbia University Methods Workshop for useful comments and advice. All errors are our own.

1 Introduction

Estimation of causal effects has typically relied on a strong assumption of non-interference between experimental units. Many treatments, such as information, monitoring, or transfers, violate this assumption in ways that are of great social scientific relevance, i.e., average spillover effects themselves are estimands of interest. We address the special estimation challenges associated with interference by extending the Rubin Causal Model to a general interference framework. In this paper, we derive a principled approach to the analysis of experiments under interference. We define estimands of interest with respect to "reference assignments," which are treatment assignment vectors that guarantee that we will observe the outcome of interest. The benefit of this approach is that estimands can be decoupled from assumptions about non-interference. Once an experiment has been conducted, we can flexibly state assumptions of non-interference, which allow for unbiased and consistent estimation of relevant estimands. We apply this approach to two common estimands, the average direct treatment effect and the average indirect exposure effect. In doing so, we recast a typical experiment involving spillovers in the language of network analysis. This allows us to comment on the relationship between the edges of a network and the potential outcomes for indirect exposure to a treatment. Furthermore, the network exposition allows us to frame plausible non-interference conditions in an intuitive way.

Section 2 discusses relevant literature and a motivation for our approach. Section 3 derives our general framework using set-theoretic and measure-theoretic principles. Section 4 explores techniques to define non-interference conditions, as well as the bias and efficiency of estimation strategies, by applying the framework to a network context. Section 5 concludes the paper.

2 Related Literature and Motivation

The assumption of "no interference between units" (Cox, 1958) is commonly made in experiments and is one component of the popularly assumed stable unit treatment value assumption, or SUTVA (Rubin, 1980). Non-interference, however, may not always hold between experimental units, as with treatments that are likely to spill over, such as persuasion, monitoring, or education. Spillovers, or indirect exposure effects, bias naïve treatment effect estimates and may be of substantive relevance to researchers in their own right. In the past decade, researchers have begun seriously investigating indirect exposure effects through experimental methods. Estimands of interest from such studies include, for instance, the effects of vaccination on infection rates in surrounding areas, the effects of get-out-the-vote campaigns on neighbors' voting rates, and the effects of educational interventions for students on their classmates. Such studies have often been conducted through multilevel experimentation or double randomization (e.g., Duflo and Saez (2003), Giné and Mansuri (2011), Sinclair, McConnell and Green (2012)).

In multilevel experiments, clusters are selected randomly from the universe of clusters of units in the population, e.g., neighborhoods or counties. For a random subset of selected clusters, a fixed percentage of units are assigned to treatment, and in the other clusters no units are assigned to treatment. Crucially, non-interference is assumed to hold across clusters but not within clusters. This yields three types of experimental groups, consisting of: (a) those units that are directly treated; (b) those units that are not directly treated but are located in clusters where units were treated (thus experiencing spillovers); and (c) those units that are not directly treated and are in clusters where no other unit was treated (serving as a control group). The average indirect effect is estimated as the average outcome in group (b) minus the average outcome in group (c). The statistical foundations of estimation via multilevel experimentation are discussed in detail in Hudgens and Halloran (2008).

However, multilevel experiments may not always be feasible. The interference structure may not resemble clusters, as is often the case in the study of peer effects and spatial spillovers. In such cases, it is useful to stipulate a network of nodes and edges that describes the paths along which spillovers can occur. Experiments conducted in this more general network setting include Angelucci, de Giorgi, Rangel and Rasul (2010), Chen, Humphreys and Modi (2010), and Ichino and Schuendeln (2012).

The framework we develop in this paper is a generalization of the classic Rubin Causal Model (Rubin, 1974) and its extension to multilevel experimentation (Hudgens and Halloran, 2008). In particular, we follow the insights of Rubin (1990) and Sobel (2006) and explicitly define potential outcomes as a function of the vector of treatment assignments resulting from the randomization.
We use this idea to define "reference assignments," which are assignment vectors that guarantee that we can observe the potential outcome of interest. For example, we are guaranteed to see the control potential outcome for each unit under the assignment vector where no unit receives treatment. Following Sobel (2006), we then define the estimand of interest by dividing differences in the appropriate potential outcomes by the number of reference assignments. The benefit of this approach is that it defines estimands of interest independently of non-interference assumptions. The stable unit treatment value assumption entails one of a set of many possible non-interference assumptions that may be appropriate for a given network structure. A common non-interference assumption is that units' potential outcomes do not vary with the treatment assignments of those units that they are not connected to over some network. A more complex version of this assumption is often implicit: units' potential outcomes may vary with the treatment assignments of direct neighbors but they do not vary with the assignments of indirect neighbors. For instance, we might allow a voter's turnout decision to be affected when a friend is treated with a get-out-the-vote message, but assume that the voter's decision is not affected when a friend of a friend is treated. Defining a non-interference assumption entails classifying the set of assignment vectors that would yield the same potential outcomes as a reference assignment. We may then use the known randomization distribution from the experiment and a stipulated non-interference assumption to construct the probability of assignment to particular treatment conditions. We use inverse probability weighting to construct unbiased or consistent estimates of our estimands of interest. The inverse probability weighting approach, with application, is discussed in Chen, Humphreys and Modi (2010). We refer readers to Aronow and Samii (2013b), who discuss the statistical and mathematical underpinnings of the inverse probability approach, including variance estimation, using "exposure mappings."
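The multilevel comparison described in this section, group (b) minus group (c), can be sketched in a few lines. This is only an illustration: the record layout (`cluster_treated`, `treated`, `y`), the function name, and the toy values are assumptions made for the example and do not come from any of the cited studies.

```python
from statistics import mean

def multilevel_indirect_effect(units):
    """Difference in mean outcomes between group (b) and group (c).

    units: list of dicts with hypothetical keys
      'cluster_treated' (bool): some unit in this unit's cluster was treated
      'treated' (bool): this unit was directly treated
      'y' (float): observed outcome
    """
    # Group (b): untreated units in clusters where others were treated.
    spillover = [u["y"] for u in units if u["cluster_treated"] and not u["treated"]]
    # Group (c): units in clusters where nobody was treated.
    control = [u["y"] for u in units if not u["cluster_treated"]]
    return mean(spillover) - mean(control)

# Hypothetical toy data: one treated cluster, one pure-control cluster.
units = [
    {"cluster_treated": True, "treated": True, "y": 5.0},
    {"cluster_treated": True, "treated": False, "y": 3.0},
    {"cluster_treated": True, "treated": False, "y": 4.0},
    {"cluster_treated": False, "treated": False, "y": 2.0},
    {"cluster_treated": False, "treated": False, "y": 3.0},
]
effect = multilevel_indirect_effect(units)
```

The sketch relies on the assumption stated above that there is no interference across clusters; without it, group (c) would not be a clean control group.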

3 General Framework

In this section, we develop a principled approach to constructing estimands of interest under interference. In particular, following Rubin (1990) and Sobel (2006), we define general potential outcomes as a function of each possible vector of assignments to treatment statuses. Our estimands of interest are defined with respect to reference assignments over which we are guaranteed to observe the relevant potential outcomes. After a set of estimands has been defined, we can flexibly stipulate non-interference assumptions and corresponding estimators.

Many of the studies discussed above are interested in two estimands in particular: the direct and indirect effects of treatment. The average effect of direct exposure to a treatment is the average of all unit treatment effects in the absence of any interference. The average effect of (first-order) indirect exposure is the average of all spillover treatment effects, which are defined with respect to individual edges with receiver i and sender j. These quantities can be estimated if the treatment assignment vector, along with a known spillover structure, randomly assigns units into three treatment conditions (direct treatment, indirect treatment, or pure control) with known probabilities.

The task of identifying reference assignments focuses on defining an assignment vector in which we are guaranteed to see the potential outcome of interest for a specific experimental unit. We are guaranteed to see the control condition for unit i if no other unit in the population is assigned to treatment. Similarly, we are guaranteed to isolate the spillover from sender j to receiver i under the assignment vector that assigns only unit j to the treatment, and all other units in the population to control. Notice, however, that i may have many neighbors, so we need to define one such reference assignment for each neighboring unit.
Since we are making no parametric assumptions about potential outcomes, each reference assignment may yield a unique potential outcome. Thus, there are as many potential outcomes in each treatment condition as there are reference assignments. In the control condition, there is only one reference assignment, but in the indirect treatment condition, there are as many reference assignments for i as there are neighbors, and each potential outcome can be associated with a reference assignment assigning a unique neighbor to treatment.

Two features of this framework are particularly important. First, spillovers are non-anonymous: each potential outcome in the indirect exposure condition for a unit is associated with a particular neighbor. For some treatments, it may matter a great deal which of a unit's neighbors is assigned to direct treatment. Second, spillovers are non-reciprocal: the spillover from unit i to unit j may be quite different from the spillover from j to i.

The non-anonymous and non-reciprocal properties of spillovers are accounted for within our estimands. The average indirect effect of treatment is defined as an average over all edges ij, meaning that for a given node i with three neighbors j, k, and l, edges ij, ik, and il are counted separately in the estimand. Further, edge ji, if it exists, is counted separately from ij. Note that because the average indirect effect is defined over edges, not nodes, a node (as a sender or a receiver) may appear multiple times in the estimand.

Section 3.1 defines a general notion of treatment conditions and general interference estimands, incorporating the logic above, by appealing to standard set theory and probability measure theory. Sections 3.2 and 3.3 describe estimation procedures under particular non-interference assumptions.
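The edge-based counting described above can be made concrete with a small sketch. The adjacency dictionary and the function name `receiver_sender_pairs` are hypothetical and serve only to illustrate how a node with three neighbors contributes three receiver-sender pairs, and how each reverse pair is counted separately.

```python
def receiver_sender_pairs(neighbors):
    """List every (receiver, sender) pair over which spillovers are averaged.

    neighbors: dict mapping each node to the set of its neighbors.
    Each pair (i, j) stands for the spillover received by i when j is treated.
    """
    return [(i, j) for i in sorted(neighbors) for j in sorted(neighbors[i])]

# Hypothetical star-shaped network: node "i" has neighbors "j", "k", and "l".
neighbors = {"i": {"j", "k", "l"}, "j": {"i"}, "k": {"i"}, "l": {"i"}}
pairs = receiver_sender_pairs(neighbors)
```

In this toy network node "i" appears in all six pairs, three times as a receiver and three times as a sender, which is what it means for the estimand to be defined over edges rather than nodes.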

3.1 Defining Potential Outcomes and Estimands

Consider an arbitrary number, $t$, of treatments, i.e., each subject $i$ is assigned a treatment $d_i \in \{0, 1, \ldots, t-1\}$.¹ Any particular treatment assignment vector over the entire population, $\Omega = \{1, \ldots, N\}$, will be denoted as the $N$-tuple $\mathbf{d} \in D$, where $D = \{0, 1, \ldots, t-1\}^N$. Let $y : D \to \mathbb{R}^N$ be a real vector-valued potential outcome function over the treatment assignment draw, where $y_i(\mathbf{d})$ denotes the potential outcome for subject $i$ associated with a particular treatment assignment vector, $\mathbf{d}$. In studies of interference, we are generally interested in the expected difference in the population between two treatment conditions. In the presence of spillovers, a treatment condition is distinct from a treatment assignment: a treatment condition is any subset of assignment vectors defined for each unit that necessarily satisfy our condition of interest (e.g., spillover from a single neighboring unit).

Definition 3.1 (Treatment Conditions). A treatment condition is defined by the collection $S = \{S_1, \ldots, S_N\}$ where $S_i \subseteq \{0, \ldots, t-1\}^N$ is a set of $N$-tuples (possibly empty) for each $i \in \Omega$. The set $S_i$ will be referred to as the treatment condition with respect to $i$. The set of treatment conditions will be denoted by $\Psi$.

The cardinality of possible treatment conditions is quite high. For a fixed $N$ (i.e., $S_i \neq \emptyset$ for all $i \in \Omega$), $|\Psi| = t^{N^2}$. If we allow treatment conditions to be over any non-empty subset of $\Omega$, then $|\Psi| = \sum_{k=0}^{N-1} \binom{N}{k} t^{(N-k)^2} = (t^2 + 1)^N - 1$.

Each element $\mathbf{d} \in S_i$ will be referred to as a reference assignment for $i$ in treatment condition $S$. As in the discussion above, each of these reference assignments yields a unique potential outcome that satisfies our treatment condition of interest. Intuitively, reference assignments are those assignment vectors for which we are guaranteed to see the potential outcome of interest.

Definition 3.2 (Reference Assignments). Each $\mathbf{d} \in S_i$ is referred to as a reference assignment for person $i$ in treatment condition $S$. Given $\mathbf{d}, \mathbf{d}' \in S_i$, we have $y_i(\mathbf{d}) \neq y_i(\mathbf{d}')$.

The estimand of interest is defined over treatment conditions. In particular, a general interference estimand will typically involve the (weighted) average of the difference in potential outcomes between the treatment conditions over all of the units, defined through the reference assignments. The notation for a general interference estimand is made more complex by the fact that each unit has a different number of potential outcomes in the same treatment condition, and each unit may have a different number of potential outcomes across treatment conditions. For a fixed set of units, $\Omega' \subseteq \Omega$ (i.e., $S$ and $S'$ have non-empty components for the corresponding units in $\Omega'$), we define a general interference estimand as follows:

¹ We assume that $d_i = 0$ implies that $i$ has been assigned to control.

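Definition 3.3. Let a general interference estimand, $\tau : D \times D \to \mathbb{R}$, be defined over $\Omega' \subseteq \Omega$. Then,

$$
\begin{aligned}
\tau(S, S') &= \frac{1}{\sum_{i \in \Omega'} |S_i||S'_i|} \sum_{i \in \Omega'} \sum_{\mathbf{d} \in S_i} \sum_{\mathbf{d}' \in S'_i} \left[ w_i y_i(\mathbf{d}) - w'_i y_i(\mathbf{d}') \right] \\
&= \frac{1}{\sum_{i \in \Omega'} |S_i||S'_i|} \sum_{i \in \Omega'} \sum_{\mathbf{d} \in S_i} |S'_i| w_i y_i(\mathbf{d}) - \frac{1}{\sum_{i \in \Omega'} |S_i||S'_i|} \sum_{i \in \Omega'} \sum_{\mathbf{d}' \in S'_i} |S_i| w'_i y_i(\mathbf{d}')
\end{aligned}
$$

where $w_i, w'_i$ are unit-specific weights applied to $y_i$. It will be useful at times to deal with the two terms in the final equation separately.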

Definition 3.4. The weighted-average spillover outcome for a treatment condition, $S$, with respect to another treatment condition $S'$, is given by:

$$
\bar{y}(S, S')_w = \frac{1}{\sum_{i \in \Omega'} |S_i||S'_i|} \sum_{i \in \Omega'} \sum_{\mathbf{d} \in S_i} |S'_i| w_i y_i(\mathbf{d})
$$

Thus, we may write our estimand as $\tau = \bar{y}(S, S')_w - \bar{y}(S', S)_{w'}$.

Note that the definition of an estimand does not depend on model or non-interference assumptions in this setting. Non-interference assumptions, which are necessary for estimation, are made independently of the definition of the estimand. As explored in the following section, a researcher may estimate causal effects under many alternative non-interference assumptions without altering the definitions of the estimands of interest.
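To make Definition 3.3 concrete, the following sketch evaluates the general interference estimand when all potential outcomes are known (as in a simulation). The function name, the dictionary representation of treatment conditions, and the toy outcome model are assumptions introduced for illustration only.

```python
def general_estimand(y, S, S_prime, w=None, w_prime=None):
    """Evaluate tau(S, S') from Definition 3.3 with known potential outcomes.

    y: function (i, d) -> potential outcome of unit i under assignment d
    S, S_prime: dicts mapping unit -> list of reference assignments (tuples)
    w, w_prime: optional dicts of unit-specific weights (default 1 for all)
    """
    # Units for which both treatment conditions are non-empty (Omega').
    units = [i for i in S if S[i] and S_prime.get(i)]
    w = w or {i: 1.0 for i in units}
    w_prime = w_prime or {i: 1.0 for i in units}
    norm = sum(len(S[i]) * len(S_prime[i]) for i in units)
    first = sum(len(S_prime[i]) * w[i] * y(i, d) for i in units for d in S[i])
    second = sum(len(S[i]) * w_prime[i] * y(i, d) for i in units for d in S_prime[i])
    return (first - second) / norm

# Hypothetical line network 0 - 1 - 2 and a toy outcome model (an assumption):
# baseline 1, direct effect 2, and 0.5 for each treated neighbor.
neighbors = {0: [1], 1: [0, 2], 2: [1]}

def y(i, d):
    return 1.0 + 2.0 * d[i] + 0.5 * sum(d[j] for j in neighbors[i])

# Indirect exposure: one reference assignment per neighbor of each unit.
S = {0: [(0, 1, 0)], 1: [(1, 0, 0), (0, 0, 1)], 2: [(0, 1, 0)]}
# Pure control: the all-zero assignment for every unit.
S_prime = {i: [(0, 0, 0)] for i in range(3)}
tau_indirect = general_estimand(y, S, S_prime)
```

Unit 1 has two reference assignments in the indirect condition, so its control outcome enters the second term with multiplicity $|S_1| = 2$, as in the second line of the definition.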

3.2 Experiments and Non-Interference Partitions

Definition 3.5. An experiment is defined by the triple $(D, F, P)$, where $D = \{0, \ldots, t-1\}^N$ is a space of possible assignment draws, and $P$ is a probability measure with $P(\emptyset) = 0$ and $P(D) = 1$. $F$ is the usual σ-algebra formed by unions, intersections, and complements from the elements of $D$.

In order to associate an experiment with potential outcomes and the estimand of interest, we must deduce the conditions under which each potential outcome will be observed. We observe a potential outcome when a particular assignment vector allows us to cleanly observe the potential outcome without "interference." The goal, then, is to define the set of assignment vectors that will allow us to observe each potential outcome in the estimand of interest, and to deduce the probability of such an observation. We will refer to this set of assignment vectors as the non-interference partition.

A non-interference partition is comprised of the treatment assignment vectors over which we observe the same potential outcome as the reference assignment, for a particular unit $i$. For instance, under the standard non-interference assumption made in most experiments, the set of assignment vectors $\{\mathbf{d} \mid d_i = 1\}$ yields the same potential outcome as $\mathbf{d}_i$, the vector that assigns $i$ to treatment and all other units to no treatment. Each distinct potential outcome in the estimand of interest must be separately observable. This will imply that two reference assignments in the relevant treatment conditions for $i$, $S_i$ and $S'_i$, have non-overlapping non-interference conditions (due to the fact that they yield distinct potential outcomes). We make these ideas more precise below.

Definition 3.6 (Non-Interference). A non-interference partition for unit $i$ is a partition $Q^i = \{Q^i_1, \ldots, Q^i_{M_i}\}$ of $D$. For ease of exposition we denote the element of the partition containing $\mathbf{d}$ as $Q^i(\mathbf{d})$. The following three conditions always hold for non-interference partitions:

1. $\bigcup_{j=1}^{M_i} Q^i_j = D$ and $Q^i_j \cap Q^i_k = \emptyset$ for $j \neq k$.

2. Let $\mathbf{d}^* \in Q^i(\mathbf{d})$. Then, $y_i(\mathbf{d}) = y_i(\mathbf{d}^*)$.

3. For distinct $\mathbf{d}, \mathbf{d}' \in S_i \cup S'_i$, $Q^i(\mathbf{d}) \cap Q^i(\mathbf{d}') = \emptyset$.

It is crucial to understand the difference between treatment conditions and non-interference partitions. Treatment conditions allow us to select the reference assignments, and thus the potential outcomes used in the estimand of interest. Non-interference partitions allow us to define the set of assignment vectors over which we observe those potential outcomes. The probability that an outcome $y_i(\mathbf{d})$ is observed is just the probability measure of the partition element containing the reference assignment $\mathbf{d}$:

$$
\pi_{y_i(\mathbf{d})} = P(Q^i(\mathbf{d})) \tag{3.1}
$$

In order for an estimand to be estimable, we must be able to observe each potential outcome in the estimand with positive probability, and at least one unit in each treatment condition must be observed.
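As an illustration of equation (3.1), the sketch below computes $\pi_{y_i(\mathbf{d})}$ by brute-force enumeration. It makes two assumptions that are not part of the framework itself: the experiment is a complete randomization that treats exactly `m` of `n` units, and the stipulated non-interference partition groups together all assignment vectors that agree on unit i's own status and on the status of each of its neighbors. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def assignments(n, m):
    """All vectors that assign exactly m of n units to treatment."""
    for treated in combinations(range(n), m):
        yield tuple(1 if k in treated else 0 for k in range(n))

def signature(d, i, neighbors):
    """Label of the partition element Q^i(d): own status plus each neighbor's status."""
    return (d[i], tuple(d[j] for j in sorted(neighbors[i])))

def exposure_probability(d_ref, i, neighbors, n, m):
    """P(Q^i(d_ref)) under complete randomization of m out of n units."""
    draws = list(assignments(n, m))
    target = signature(d_ref, i, neighbors)
    hits = sum(signature(d, i, neighbors) == target for d in draws)
    return Fraction(hits, len(draws))

# Hypothetical path network 0 - 1 - 2 - 3 with a single treated unit.
neighbors = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
# Spillover onto unit 1 from unit 0: reference assignment treats only unit 0.
pi_spill = exposure_probability((1, 0, 0, 0), 1, neighbors, n=4, m=1)
# Control outcome of unit 1: reference assignment treats nobody.
pi_control = exposure_probability((0, 0, 0, 0), 1, neighbors, n=4, m=1)
```

The all-zero reference assignment is never drawn when exactly one unit is treated, yet the partition element containing it has probability 1/4, because treating the non-neighbor (unit 3) is assumed to leave unit 1's outcome unchanged.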

Definition 3.7 (Estimability). Let $Q^i$ denote the non-interference condition for unit $i$. An estimand, $\tau$, over the units $\Omega' \subseteq \Omega$ is said to be estimable if:

1. $\pi_{y_i(\mathbf{d})} > 0$ for all $\mathbf{d} \in S_i \cup S'_i$ and all $i \in \Omega'$.

2. For each $\mathbf{d} \in D$, $P(\{\mathbf{d}\}) > 0$ implies that there exist $\mathbf{d}^i \in S_i$, $\mathbf{d}^j \in S'_j$ such that $\mathbf{d} \in Q^i(\mathbf{d}^i) \cap Q^j(\mathbf{d}^j)$ for $i, j \in \Omega'$.

To understand this framework, consider a standard experimental setup with binary treatment status and $N$ units in the population. In this setting each unit, $i$, has treatment assignment $d_i \in \{0, 1\}$. Intuitively, when we isolate the effect of $d_i = 1$ or $d_i = 0$, we are assuming that all other units are not given any other treatment. Thus, the "treated potential outcome" for $i$, generally denoted as $y_i(1)$, can be written as $y_i(\mathbf{d}_i)$ where the reference assignment $\mathbf{d}_i = \{\mathbf{d} \mid d_i = 1, d_j = 0, j \neq i\}$ denotes an assignment draw where unit $i$ has been assigned to treatment and all other units have been assigned to control. Similarly, the "control potential outcome" for unit $i$, generally denoted as $y_i(0)$, may be written as $y_i(\mathbf{0})$, where $\mathbf{0}$ is the assignment vector where each unit in the population is assigned to no treatment. The two treatment conditions of interest are the isolated effect of treatment on a unit and the control condition for a unit, i.e., $S = \{\{\mathbf{d}_1\}, \ldots, \{\mathbf{d}_N\}\}$ and $S' = \{\{\mathbf{0}\}, \ldots, \{\mathbf{0}\}\}$. Averaging over all units, the usual estimand can then be written as:

$$
\tau(S, S') = \frac{1}{N} \sum_{i=1}^{N} \left[ y_i(\mathbf{d}_i) - y_i(\mathbf{0}) \right] \tag{3.2}
$$

We reiterate that this estimand, the standard direct treatment estimand, requires no assumptions about non-interference. Independently of this experiment, $(\{0, 1\}^N, F, P)$, and the associated estimands, we define a complete non-interference condition (i.e., SUTVA) as $Q^i(\mathbf{d}_i) = \{\mathbf{d} \mid d_i = 1\}$ and $Q^i(\mathbf{0}) = \{\mathbf{d} \mid d_i = 0\}$. The probabilities of assignment to treated and control potential outcomes for $i$, generally denoted as $\pi_i(1)$ and $\pi_i(0)$, are defined as:

$$
\begin{aligned}
\pi_{y_i(\mathbf{d}_i)} &= \pi_i(1) = P(Q^i(\mathbf{d}_i)) = P(\{\mathbf{d} \mid d_i = 1\}) \\
\pi_{y_i(\mathbf{0})} &= \pi_i(0) = P(Q^i(\mathbf{0})) = P(\{\mathbf{d} \mid d_i = 0\})
\end{aligned}
\tag{3.3}
$$

While we have stipulated the usual SUTVA condition, we can just as easily define a more complicated interference structure and analyze the data with respect to that structure.
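Under the SUTVA partition, the probabilities in (3.3) depend only on the marginal probability that unit i is treated. The short sketch below checks this by enumeration for a design that is assumed here purely for illustration: complete randomization of exactly `m` of `n` units. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def sutva_probabilities(n, m, i):
    """Return (pi_i(1), pi_i(0)) when exactly m of n units are treated at random."""
    draws = [set(c) for c in combinations(range(n), m)]
    # pi_i(1) = P({d | d_i = 1}): share of draws in which unit i is treated.
    pi_1 = Fraction(sum(i in treated for treated in draws), len(draws))
    # pi_i(0) = P({d | d_i = 0}) is the complement.
    return pi_1, 1 - pi_1

pi_1, pi_0 = sutva_probabilities(n=5, m=2, i=0)
```

For this design the enumeration reproduces the familiar values m/n and 1 - m/n; a less restrictive partition would replace the membership test with a check on a larger set of units.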

3.3 Estimation

Once we have determined the probability of observing each potential outcome, we can define unbiased or consistent estimators of average causal effects. The estimator, $\hat{\tau} : \mathbb{R}^N \to \mathbb{R}$, takes an observed set of potential outcomes $y(\mathbf{d})$ and maps it to a real number (as an estimate of a treatment effect, $\tau$). Thus, under general interference, the estimation of a causal effect can be defined as the compositional map: $\hat{\tau} \circ y : \{0, \ldots, t-1\}^N \to \mathbb{R}$.

In this subsection, we consider Horvitz-Thompson-type (Horvitz and Thompson, 1952) and Hájek-type estimators (Hájek, 1964). The Horvitz-Thompson estimator for the weighted average outcome for a treatment condition (over $\Omega' \subseteq \Omega$) is given by:

$$
\bar{y}(S, S')^{HT}_w = \frac{1}{\sum_{i \in \Omega'} |S_i||S'_i|} \sum_{i \in \Omega'} \sum_{\mathbf{d} \in S_i} \frac{I(Q^i(\mathbf{d}))}{\pi_{y_i(\mathbf{d})}} |S'_i| w_i y_i(\mathbf{d}) \tag{3.4}
$$

where $I$ is an indicator function that takes the value 1 for each $\mathbf{d}^* \in Q^i(\mathbf{d})$ and 0 otherwise. An unbiased estimator for the estimand of interest is:

$$
\hat{\tau}^{HT}(S, S') = \bar{y}(S, S')^{HT}_w - \bar{y}(S', S)^{HT}_{w'} \tag{3.5}
$$

It is trivial to show that $\hat{\tau}^{HT}$ is unbiased using the fact that $E(I(Q^i(\mathbf{d}))) = \pi_{y_i(\mathbf{d})}$. An alternative approach would be to use Hájek-type estimators, which would provide estimates that are consistent but biased. It often makes sense to use Hájek estimators to reduce the mean-squared-error of the estimator since Horvitz-Thompson estimators may be highly inefficient (Basu, 1971). The Hájek estimator is given by:
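The unbiasedness of the Horvitz-Thompson estimator can be checked exactly in a small example by averaging the estimate over every possible assignment draw. The sketch below covers only the simplest special case, the direct effect with unit weights and one reference assignment per condition, and it assumes that the SUTVA partition is correct and that exactly `m` of `n` units are treated at random. The potential outcomes are made-up numbers.

```python
from fractions import Fraction
from itertools import combinations

def ht_direct_effect(d, y_obs, pi_1):
    """Horvitz-Thompson estimate of the average direct effect under SUTVA.

    Special case of (3.4)-(3.5) with w_i = 1 and |S_i| = |S'_i| = 1.
    """
    n = len(d)
    treated = sum(Fraction(y_obs[i]) / pi_1 for i in range(n) if d[i] == 1)
    control = sum(Fraction(y_obs[i]) / (1 - pi_1) for i in range(n) if d[i] == 0)
    return (treated - control) / n

# Hypothetical potential outcomes that satisfy SUTVA by construction.
y1 = [5, 3, 8, 6]
y0 = [2, 1, 4, 5]
n, m = 4, 2
pi_1 = Fraction(m, n)

# Enumerate every equally likely draw and average the estimates.
draws = [tuple(1 if k in c else 0 for k in range(n)) for c in combinations(range(n), m)]
estimates = [
    ht_direct_effect(d, [y1[i] if d[i] else y0[i] for i in range(n)], pi_1)
    for d in draws
]
expected = sum(estimates) / len(draws)
true_tau = Fraction(sum(y1) - sum(y0), n)
```

Exact rational arithmetic is used so that the expectation of the estimator and the true estimand can be compared without rounding error.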

$$
\bar{y}(S, S')^{Haj}_w = \frac{\sum_{i \in \Omega'} \sum_{\mathbf{d} \in S_i} \frac{I(Q^i(\mathbf{d}))}{\pi_{y_i(\mathbf{d})}} |S'_i| w_i y_i(\mathbf{d})}{\sum_{\mathbf{d} \in S} \frac{I(Q^i(\mathbf{d}))}{\pi_{y_i(\mathbf{d})}}} \tag{3.6}
$$

The Hájek estimator is simply the Horvitz-Thompson estimator for the total (i.e., (3.4) multiplied by $\sum_{\mathbf{d} \in S_i} |S_i||S'_i|$) divided by the sum of the inverse probabilities included in the sample. The implied consistent estimator is:

$$
\hat{\tau}^{Haj}(S, S') = \bar{y}(S, S')^{Haj}_w - \bar{y}(S', S)^{Haj}_{w'} \tag{3.7}
$$

Consistency is established by the fact that $E\left( \sum_{\mathbf{d} \in S} \frac{I(Q^i(\mathbf{d}))}{\pi_{y_i(\mathbf{d})}} \right) = \sum_{\mathbf{d} \in S_i} |S_i||S'_i|$ and the expectation of a ratio converges to the ratio of expectations of the numerator and denominator as the sample size grows.
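The ratio form of the Hájek estimator can be illustrated with a simplified sketch. It assumes one potential outcome per unit, unit weights, and $|S'_i| = 1$, so it is a special case rather than a full implementation of (3.6); the function name and the numbers are hypothetical.

```python
def hajek_mean(y_obs, observed, pi):
    """Hájek-type mean: HT total divided by the sum of inverse probabilities.

    y_obs: outcomes (only entries with observed[i] == True are used)
    observed: indicator that unit i landed in the condition of interest
    pi: probability that unit i lands in that condition
    """
    num = sum(y / p for y, o, p in zip(y_obs, observed, pi) if o)
    den = sum(1 / p for o, p in zip(observed, pi) if o)
    return num / den

# Hypothetical draw: units 0 and 1 are observed in the condition, unit 2 is not.
y_obs = [4.0, 6.0, 10.0]
observed = [True, True, False]
pi = [0.5, 0.25, 0.25]
hajek = hajek_mean(y_obs, observed, pi)
# Horvitz-Thompson mean for comparison: same numerator, fixed denominator n.
ht = sum(y / p for y, o, p in zip(y_obs, observed, pi) if o) / len(y_obs)
```

In this draw the Hájek mean lies between the two observed outcomes, while the Horvitz-Thompson mean exceeds both of them, which illustrates the kind of instability that motivates the ratio estimator.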

3.4 Variance Estimation

In general, it will not be possible to derive an unbiased or consistent estimator of the true variance of the estimators. This is due to two reasons. First, the inability to observe multiple treatment conditions at the same time implies that it is impossible to model the true covariance between many potential outcomes, just as in a standard potential outcomes framework. Second, there will be a large number of potential outcomes (e.g., those corresponding to different reference assignments in the same treatment condition for the same unit) that cannot be observed simultaneously, and thus the joint probability of observing such potential outcomes is 0. It is well-known that in such a scenario, standard formulas fail to provide consistent estimates of the true variance. We refer the readers to Aronow and Samii (2013a) for discussion of these issues and a conservative estimator of the variance in zero joint probability designs.

At the same time, analytically-defined estimators for the variance in complex designs are rarely used in practice due to their instability. Recent advances in jackknife variance estimation for the Hájek estimator (Berger and Skinner, 2005) and the Horvitz-Thompson estimator (Berger and Escobar, 2011) provide more stable estimates. While we have yet to implement a variance estimation strategy, we are likely to recommend jackknife estimates of variance, with conditions under which corrections for the zero joint probability of potential outcomes will be necessary.

3.5 Tuning Non-Interference Assumptions

A salient feature of the approach advocated in this paper is the ability to define non-interference assumptions independently of estimands. Researchers can estimate quantities of interest under a variety of assumptions about non-interference partitions. The following definitions and theorems outline the basis for tuning assumptions about non-interference partitions.

Definition 3.8. A refinement, $R^i$, of a partition, $Q^i$, is any partition such that for each $\mathbf{d} \in D$, $R^i(\mathbf{d}) \subseteq Q^i(\mathbf{d})$. $R^i$ is said to be a finer partition than $Q^i$, and $Q^i$ is said to be a coarser partition than $R^i$.

Intuitively, a refinement, $R^i$, of another partition, $Q^i$, is any partition that further subdivides the elements of $Q^i$. This implies that $R^i$ denotes more restrictive non-interference assumptions than $Q^i$. For some reference assignment, $\mathbf{d}^*$, an assignment vector $\mathbf{d} \in R^i(\mathbf{d}^*)$ implies that $\mathbf{d} \in Q^i(\mathbf{d}^*)$, but the converse is not true. The next theorem shows that treatment conditions of interest may be observed for sufficiently restrictive non-interference assumptions, assuming each reference assignment can be selected with positive probability.
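Definition 3.8 translates directly into a subset check. The sketch below assumes that both arguments are valid partitions of the same set of assignment vectors (it does not verify this) and represents each partition as a list of frozensets; the function name and the string labels for assignment vectors are hypothetical.

```python
def is_refinement(R, Q):
    """True if every element of partition R is contained in some element of Q."""
    return all(any(r <= q for q in Q) for r in R)

# Two units with binary treatment; assignment vectors written as strings.
# Q: SUTVA-style partition for unit 1 (split only on unit 1's own status).
Q = [frozenset({"00", "01"}), frozenset({"10", "11"})]
# R: additionally splits the control element by unit 2's status.
R = [frozenset({"00"}), frozenset({"01"}), frozenset({"10", "11"})]

r_refines_q = is_refinement(R, Q)
q_refines_r = is_refinement(Q, R)
```

Here R encodes the more restrictive assumption: it treats unit 1's control outcome as potentially different depending on whether unit 2 is treated.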

Theorem 3.9 (Existence). Consider a treatment condition, $S$, defined over $\Omega' \subseteq \Omega$. Let $(D, F, P)$ be an experiment such that for each $i \in \Omega'$ and $\mathbf{d} \in S_i$, $P(\{\mathbf{d}\}) > 0$. Then, for each $i$, there exists a non-interference partition, $Q^i$, such that each element of the treatment condition for unit $i$, $S_i$, is observed.

Proof: Consider the singleton partition, $R^i = \{\{\mathbf{d}\}\}_{\mathbf{d} \in D}$, where the set containing each individual assignment vector constitutes an element of the partition. $R^i$ is the finest possible non-interference partition. Let $\mathbf{d} \in S_i$ be a reference assignment in the treatment condition of interest for unit $i$. It is trivially true that $y_i(\mathbf{d})$ is always observed for all elements in the partition $R^i(\mathbf{d}) = \{\mathbf{d}\}$. Thus each outcome in the treatment condition, $S_i$, will always be observed under $R^i$, implying that $R^i$ is a non-interference partition. Thus, the singleton partition, $R^i$, or a coarser partition of assignment vectors, always satisfies the conditions of a non-interference partition.

Theorem 3.9 suggests that we can almost always find the non-interference partition for a treatment condition by taking refinements of any partition of the set of assignment vectors. In other words, we can eventually reach a correct non-interference assumption by making sequentially more restrictive assumptions. This technique of taking refinements of the partition of assignment vectors (i.e., sequentially more restrictive assumptions) also converges to unbiased (Horvitz-Thompson) or consistent (Hájek) estimates of the estimand.

Theorem 3.10 (Estimation Convergence). Let $\hat{\tau}^{HT}(Q^i)$ denote an estimable Horvitz-Thompson estimator under the non-interference partitions $Q^i$, and let $\hat{\tau}^{HT}(R^i)$ denote an estimable Horvitz-Thompson estimator under the partitions $R^i$, where $R^i$ is a refinement of $Q^i$. Then, $E(\hat{\tau}(Q^i)) = E(\hat{\tau}(R^i))$.

Proof: $R^i$ constitutes a non-interference partition since it is a refinement of $Q^i$, which is also a non-interference partition. We verify the three properties of a non-interference partition in Definition 3.6 for $R^i$. First, any refinement of a partition is a partition, so the first property is satisfied. Second, consider some reference assignment, $\mathbf{d}^*$. By definition, $y_i(\mathbf{d}) = y_i(\mathbf{d}^*)$ if $\mathbf{d} \in Q^i(\mathbf{d}^*)$. Since $R^i(\mathbf{d}^*) \subseteq Q^i(\mathbf{d}^*)$, it follows that $\mathbf{d} \in R^i(\mathbf{d}^*)$ implies $y_i(\mathbf{d}) = y_i(\mathbf{d}^*)$, verifying the second property. Finally, for two reference assignments $\mathbf{d}, \mathbf{d}' \in S_i \cup S'_i$, it follows that $Q^i(\mathbf{d}) \cap Q^i(\mathbf{d}') = \emptyset$. Since $R^i(\mathbf{d}) \subseteq Q^i(\mathbf{d})$ and $R^i(\mathbf{d}') \subseteq Q^i(\mathbf{d}')$, it follows that $R^i(\mathbf{d}) \cap R^i(\mathbf{d}') = \emptyset$, verifying the third property. Since the Horvitz-Thompson estimator is an unbiased estimate of the estimand of interest under non-interference partitions, it follows that $E(\hat{\tau}(Q^i)) = E(\hat{\tau}(R^i))$ since both $Q^i$ and $R^i$ are non-interference partitions.

Taken together, Theorems 3.9 and 3.10 show that one may generally select sequentially more restrictive non-interference assumptions in order to reach a correct non-interference assumption and converge to an unbiased (or consistent) estimate of the estimand of interest.

4 Applying the Framework

Consider the following problem. Over a population of 50 farmers, 10 farmers are randomly selected to receive the treatment, encouragement to adopt high-yielding seeds. The researcher is interested in the following question: What is the average effect of encouraging a farmer to adopt high-yielding seeds upon his/her yield and upon the yield of farmers on adjacent plots of land? The goal is to isolate the usual direct treatment effect and the average effect of having a single treated neighbor. The three treatment conditions of interest are the direct exposure condition, i.e., the direct effect of being treated without any other interference; the indirect exposure condition, i.e., the spillover effect from a neighboring treated farmer onto an untreated farmer; and the pure control condition, i.e., an untreated farmer experiencing no spillovers. The related estimands are the direct treatment effect (the average difference between the direct and pure control conditions) and the indirect exposure effect (the average difference between the indirect exposure and pure control conditions), each defined over all 50 farmers.

Potential outcomes are a function of reference assignments that guarantee that we will see the outcome of interest. We are guaranteed to see the pure control condition for farmer i if no farmer in the population is assigned to treatment. Similarly, we are guaranteed to isolate the spillover from neighboring farmer j to farmer i under the assignment vector that only assigns farmer j to the treatment, and all other units in the population to no treatment. Finally, we are guaranteed to isolate the direct treatment condition for farmer i under the assignment vector that only assigns i to treatment.

Most problems involving spillovers can be most intuitively represented over a network or mathematical graph. Accordingly, we apply the framework in section 3 through a graph-theoretic approach. In section 4.1, we state the problem, treatment conditions, and estimands of interest in formal terms. In section 4.2, we define a class of intuitive non-interference partitions and corresponding estimators by appealing to the concept of "SUTVA degree." In section 4.3, we assess the performance of Horvitz-Thompson, Hájek, and unweighted OLS estimators with respect to our estimands of interest.

4.1 Defining the Estimands of Interest

Estimation problems involving spillovers are often best visualized over a network or a mathematical graph. Thus, we apply the spillover framework in section 3 by appealing to graph-theoretic ideas. We estimate causal effects for a population of $N$ farmers, the units of our analysis. The sample space is $\Omega = \{1, \ldots, N\}$ with $N = 50$. We define an edge between two farmers if they have adjacent plots of land, where the set of such edges is denoted by $E \subseteq \Omega \times \Omega$.² The graph $G$ is uniquely defined by the pair $G = (\Omega, E)$. Figure 1 is a representation of the problem with 50 farmers. The farmers are represented as nodes of the graph, and an edge exists between any two nodes if the corresponding farmers own adjacent plots of land. The "neighbors" of a farmer, $i$, are those farmers that have adjacent plots of land to farmer $i$. As shown in figure 1, the neighbors of $i$, $\nu(i)$, are described by those nodes that share an edge with $i$. The degree of node $i$, $\delta_i = |\nu(i)|$, is the number of neighbors for node $i$ in the graph.

In this experiment, the treatment is binary, di ∈ {0, 1}. Our treatment conditions of interest are the direct treatment condition (S1), the indirect exposure condition (S01), and the pure control condition (S0). Let dj = {d | dj = 1, dk = 0, k ≠ j} denote a vector of treatment assignments where unit j is assigned to treatment and all other units have been assigned to no treatment. Furthermore, let 0 denote the vector of treatment assignments where each unit is assigned to no treatment. Our three treatment conditions of interest can be written as follows:

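$$S_1 = \{\{\mathbf{d}_1\}, \ldots, \{\mathbf{d}_N\}\} \tag{4.1}$$

$$S_{01} = \{\{\mathbf{d}_j,\ j \in \nu(1)\}, \ldots, \{\mathbf{d}_j,\ j \in \nu(N)\}\}$$

$$S_0 = \{\{\mathbf{0}\}, \ldots, \{\mathbf{0}\}\}$$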

2 Although our discussion assumes "undirected edges," i.e., an edge between i and j implies an edge between j and i, this framework is applicable to "directed networks" as well, where this property need not hold. Also note that this implies that each undirected edge is counted as two reciprocal directed edges in E.


Figure 1: An Example of a Network or General Interference Structure

Figure 1 shows a general interference structure or network, G. The set of farmers is denoted by the nodes in the network. The edges between two nodes denote farmers with adjacent plots of land. The set of neighbors of i, ν(i), are shown as the nodes shaded in a darker blue. The degree of i is δi = |ν(i)| = 3.

The potential outcomes in S1 are of the form yi(di), the potential outcomes in S01 are of the form yi(dj) if j ∈ ν(i), and the potential outcomes in S0 are of the form yi(0). In order to define a more intuitive notation similar to Hudgens and Halloran (2008), we let yi(di) ≡ yi(1), yi(dj) ≡ yij(0, 1),³ and yi(0) ≡ yi(0). The advantage of this notation is that it becomes clear that there is a potential outcome for each pair of nodes i, j with j ∈ ν(i). It follows that our estimands of interest, the direct treatment effect, τ(S1, S0), and the indirect exposure effect, τ(S01, S0), can be written as:

$$\tau(S_1, S_0) = \frac{1}{N}\sum_{i=1}^{N}\left[y_i(1) - y_i(0)\right] = \frac{1}{N}\sum_{i=1}^{N} y_i(1) - \frac{1}{N}\sum_{i=1}^{N} y_i(0) \tag{4.2}$$

$$\tau(S_{01}, S_0) = \frac{1}{|E|}\sum_{i=1}^{N}\sum_{j \in \nu(i)}\left[y_{ij}(0,1) - y_i(0)\right] = \frac{1}{|E|}\sum_{i=1}^{N}\sum_{j \in \nu(i)} y_{ij}(0,1) - \frac{1}{|E|}\sum_{i=1}^{N}\delta_i\, y_i(0) \tag{4.3}$$
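To make the two estimands concrete, the following Python sketch evaluates (4.2) and (4.3) on a full schedule of potential outcomes. The three-node graph and the potential outcome values are invented for illustration, and the function names are hypothetical. The sketch also checks that the edge-level form of (4.3) agrees with the degree-weighted form on its right-hand side.

```python
def direct_effect(y1, y0):
    """Equation (4.2): average of y_i(1) - y_i(0) over the N nodes."""
    return sum(y1[i] - y0[i] for i in y0) / len(y0)


def indirect_effect(y01, y0, neighbors):
    """Equation (4.3), left form: average over directed edges (i, j)."""
    size_e = sum(len(neighbors[i]) for i in neighbors)
    total = sum(y01[(i, j)] - y0[i] for i in neighbors for j in neighbors[i])
    return total / size_e


def indirect_effect_split(y01, y0, neighbors):
    """Equation (4.3), right form: y_i(0) enters weighted by the degree of i."""
    size_e = sum(len(neighbors[i]) for i in neighbors)
    exposed = sum(y01[(i, j)] for i in neighbors for j in neighbors[i]) / size_e
    control = sum(len(neighbors[i]) * y0[i] for i in neighbors) / size_e
    return exposed - control


# Hypothetical schedule of potential outcomes on a 3-node star graph.
neighbors = {1: {2, 3}, 2: {1}, 3: {1}}
y0 = {1: 1.0, 2: 2.0, 3: 3.0}
y1 = {i: y0[i] + 2.0 for i in y0}  # constant direct effect of 2
y01 = {(i, j): y0[i] + 0.5 * j for i in neighbors for j in neighbors[i]}

tau_direct = direct_effect(y1, y0)
tau_indirect = indirect_effect(y01, y0, neighbors)
tau_indirect_split = indirect_effect_split(y01, y0, neighbors)
```

Because the spillover in this toy schedule depends on which neighbor j is treated, the indirect estimand averages over edges rather than nodes, which is why |E| rather than N appears in the denominator.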

3We may also write the condition where farmer i is treated and neighboring farmer j is treated (with all other farmers assigned to control) as yi(di + dj) = yij(1, 1).

4.2 Defining Non-Interference Partitions with SUTVA Degree

In this subsection, we will analyze a simulated experiment where exactly 10 of the 50 units shown in figure 1 are randomly treated. A natural choice for a non-interference assumption over such a graph is what we refer to as the SUTVA degree, or λ. The SUTVA degree is defined as the maximum "path length" between two nodes over which spillovers may occur. If the SUTVA degree is 1, the potential outcomes of unit i may vary with the treatment status of its neighbors, but not with the treatment status of the neighbors of neighbors. If the SUTVA degree is 2, a neighbor of a neighbor may interfere, and so on. The classic non-interference assumption has a SUTVA degree of 0. We state the formal definition of SUTVA degree below.

Definition 4.1. A path is an alternating sequence of edges and nodes where each node is incident with the edge before and after it in the sequence. The shortest path between two nodes i and j, p(i, j), is the path connecting i and j with the smallest number of edges. Where p(i, j) is compared with λ, it denotes the number of edges in that path.
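Shortest path lengths on an unweighted graph can be computed with a breadth-first search. The sketch below is one standard way to do so and is not taken from the paper; the function name and the toy graph are hypothetical. Nodes that cannot be reached from the source are simply absent from the result.

```python
from collections import deque


def path_lengths_from(source, neighbors):
    """Breadth-first search: number of edges on the shortest path to each node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in neighbors[node]:
            if nb not in dist:
                # First visit is along a shortest path in an unweighted graph.
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


# Hypothetical toy graph: a path 1-2-3-4 plus an isolated node 5.
toy = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}, 5: set()}
dist_from_1 = path_lengths_from(1, toy)
```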

Definition 4.2. The SUTVA degree, λ, is a number such that, for any two nodes i and j with p(i, j) ≤ λ:

$$\mathbf{d}' \notin Q^i(\mathbf{d}) \quad \text{if } d'_j \neq d_j$$

The resulting non-interference partition corresponding to a reference assignment, d∗, is:

$$Q^i(\mathbf{d}^*) = \{\mathbf{d} \mid d_k = d^*_k \text{ if } p(i, k) \leq \lambda\}$$

Using this definition, we can easily state our non-interference assumptions for a SUTVA degree of λ for the problem at hand:

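$$Q^i(\mathbf{d}_j) = \{\mathbf{d} \mid d_j = 1,\ d_k = 0 \text{ if } j \neq k \text{ and } p(i, k) \leq \lambda\} \tag{4.4}$$

$$Q^i(\mathbf{0}) = \{\mathbf{d} \mid d_k = 0 \text{ if } p(i, k) \leq \lambda\}$$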
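One way to read these partitions operationally is as a rule that maps a realized assignment to the condition observed for each unit. The Python sketch below is an interpretation under stated assumptions, not the authors' code: the labels "direct", "indirect", "control" and "complex", the function names, and the five-node path graph are all hypothetical. A unit is labeled "indirect" only when exactly one unit within path length λ is treated and that unit is a direct neighbor.

```python
from collections import deque


def within_radius(i, neighbors, lam):
    """Nodes k != i whose shortest path length p(i, k) is at most lam."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        node = queue.popleft()
        if dist[node] == lam:
            continue  # do not expand beyond the SUTVA degree
        for nb in neighbors[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    del dist[i]
    return set(dist)


def classify(i, d, neighbors, lam):
    """Condition observed for unit i under assignment d and SUTVA degree lam."""
    treated_nearby = {k for k in within_radius(i, neighbors, lam) if d[k] == 1}
    if d[i] == 1:
        return "direct" if not treated_nearby else "complex"
    if not treated_nearby:
        return "control"
    if len(treated_nearby) == 1 and next(iter(treated_nearby)) in neighbors[i]:
        return "indirect"
    return "complex"


# Hypothetical toy example: path graph 1-2-3-4-5 with units 1 and 5 treated.
toy = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
d = {1: 1, 2: 0, 3: 0, 4: 0, 5: 1}
labels_lam1 = {i: classify(i, d, toy, 1) for i in toy}
labels_lam2 = {i: classify(i, d, toy, 2) for i in toy}
```

In this toy example, unit 3 is a clean control when λ = 1 but becomes a complex outcome when λ = 2, because both treated units lie within path length 2 of it.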

Figure 2 depicts how the choice of λ affects observed potential outcomes. In figure 2, 10 of 50 units are selected to receive treatment (those units with a thick red border), while no other units receive treatment. Among the magenta units, we observe yi(1), and we observe yi(0) among the light blue units. We observe yij(0, 1) among the light green units, those units with a single treated neighbor. Finally, we observe complex outcomes among the gray units, which do not correspond to any of our outcomes of interest. Increasing λ from 1 to 2 makes it more difficult to observe the outcomes of interest. Only one untreated unit and four treated units cannot be classified as an outcome of interest when λ = 1, but 19 unclassified outcomes for untreated units and 6 unclassified outcomes for treated units emerge when λ is increased to 2. For example, two additional treated units cannot be classified when λ is increased from 1 to 2 because they are separated by a path length of 2.

In general, stipulating more stringent non-interference assumptions through a higher λ, or assuming a denser network, will yield more units observed with complex spillovers. Thus, under complete randomization, estimation is likely to become increasingly inefficient as λ or network density grows. In a related paper, we consider techniques for incorporating beliefs about λ directly into the randomization protocol, allowing for more efficient estimation. It can be shown that the non-interference assumptions induced by higher values of λ correspond to refinements of partitions induced by lower values of λ. As discussed in section 3.5, it will generally be advisable to test multiple values of λ to make sure that the estimator has "converged" to an unbiased estimate.

Figure 2: Observing Potential Outcomes with λ = 1 and λ = 2

[Two panels: (a) λ = 1 and (b) λ = 2. Legend: Control, Indirect, Direct, Complex; Treated.]

Figure 2 describes how selecting λ = 1 and λ = 2 affects the ability to observe various potential outcomes, with 2(a) depicting λ = 1, and 2(b) depicting λ = 2. Each subfigure depicts a graph with 50 units, of which 10 have been selected for treatment. Those units with a thick red border correspond to units selected for treatment, while the rest of the units are not selected for treatment. The magenta units are those units for which yi(1) is cleanly observed, and the light blue units are those for which yi(0) is cleanly observed. The light green units are those for which yij(0, 1) is cleanly observed, i.e., the spillover to an untreated unit, i, from a single treated neighbor, j, is observed without interference. The gray units correspond to complex spillovers where none of the potential outcomes of interest are observed.

4.3 Simulation and Analysis of Estimators

We performed a series of simulations to assess the performance of this approach with respect to the estimators derived in section 3. The first goal of these simulations is to demonstrate the bias-efficiency tradeoff present when choosing between the Hájek and Horvitz-Thompson estimators of average causal effects. The second goal is to demonstrate the importance of calculating inverse probability weights according to observed treatment conditions. In order to demonstrate the performance of the estimators, we scale up the problem from section 4.2 by a factor of ten; that is, we analyze a simulated experiment that assigns 100 of 500 units to treatment. The basic Monte Carlo simulation was designed as follows: we randomly generated an Erdős-Rényi graph of 500 nodes, where the probability of an edge between any two nodes was 3/500. We then randomly selected 100 nodes for treatment. Using λ = 1, we classified each unit as belonging to S1, S01, S0, or a complex spillover condition.
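A single draw of this design can be sketched in Python using only the standard library. This is an illustrative reconstruction under stated assumptions rather than the simulation code behind Table 1: the seed, the function names, and the condition labels are hypothetical, and the classification rule hard-codes λ = 1.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, rng):
    """Erdős-Rényi graph: each pair of nodes is linked independently with probability p."""
    neighbors = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            neighbors[i].add(j)
            neighbors[j].add(i)
    return neighbors


def complete_randomization(n, m, rng):
    """Assign exactly m of n units to treatment."""
    treated = set(rng.sample(range(n), m))
    return [1 if i in treated else 0 for i in range(n)]


def classify_lambda1(i, d, neighbors):
    """Condition observed for unit i when the SUTVA degree is 1."""
    n_treated_nb = sum(d[k] for k in neighbors[i])
    if d[i] == 1:
        return "direct" if n_treated_nb == 0 else "complex"
    if n_treated_nb == 0:
        return "control"
    return "indirect" if n_treated_nb == 1 else "complex"


rng = random.Random(2013)  # arbitrary seed, for reproducibility only
graph = erdos_renyi(500, 3 / 500, rng)
d = complete_randomization(500, 100, rng)
labels = [classify_lambda1(i, d, graph) for i in range(500)]
counts = {c: labels.count(c) for c in ("direct", "indirect", "control", "complex")}
```

The dictionary `counts` then reports how many of the 500 units fall into each condition for this particular graph and assignment.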

We calculated the probabilities of assignment by simulating the assignment procedure 10,000 times, counting the number of times an outcome in each of the treatment conditions was observed, and dividing that number by 10,000. Note that in accordance with (4.3), the unit of observation in S01 is the edge, not the node, meaning that we have to divide by the sum of the degrees of the nodes, rather than simply the number of nodes.

Potential outcomes were generated in a way that demonstrates how naïve analyses might go astray. Baseline outcomes (yi(0)) were set equal to the inverse of πi(1). This was chosen for two reasons: first, regardless of network characteristics, all units have a non-zero value of πi(1); and second, ordinary least squares performs particularly poorly when potential outcomes are related to probabilities of assignment. Each unit, if it receives treatment, imposes on its neighbors a spillover that is equal to its baseline outcome, rescaled by a constant so that the average of all spillovers is equal to 1. This setup ensures that the magnitude of the spillover effect depends on which neighbor spills over. Finally, we set the direct treatment effect to 2 for all units.
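The simulation-based probabilities of assignment can be sketched as follows, assuming λ = 1 and complete randomization. Indexing the indirect exposure probabilities by directed edge (i, j) is our reading of the remark that the edge is the unit of observation in S01; the function name, the toy path graph, and the reduced number of draws are hypothetical choices made to keep the example small.

```python
import random


def simulate_probabilities(neighbors, n_treated, n_sims, rng):
    """Monte Carlo estimates of the probability of each condition (lambda = 1)."""
    units = sorted(neighbors)
    direct = {i: 0 for i in units}
    control = {i: 0 for i in units}
    # One counter per directed edge (i, j): i untreated, j its only treated neighbor.
    indirect = {(i, j): 0 for i in units for j in neighbors[i]}
    for _ in range(n_sims):
        treated = set(rng.sample(units, n_treated))
        for i in units:
            treated_nb = [j for j in neighbors[i] if j in treated]
            if i in treated:
                if not treated_nb:
                    direct[i] += 1
            elif not treated_nb:
                control[i] += 1
            elif len(treated_nb) == 1:
                indirect[(i, treated_nb[0])] += 1

    def to_share(counts):
        return {key: count / n_sims for key, count in counts.items()}

    return to_share(direct), to_share(control), to_share(indirect)


# Hypothetical toy design: a path graph 0-1-2-3 with exactly one treated unit.
toy = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
pi_direct, pi_control, pi_indirect = simulate_probabilities(
    toy, n_treated=1, n_sims=4000, rng=random.Random(1)
)
```

With a single treated unit no complex outcomes can arise, so for every unit the three sets of estimated probabilities sum to one. In the toy design the exact values for unit 0 are 1/4 (direct), 1/2 (control) and 1/4 (indirect exposure from unit 1), which the simulated shares should approximate.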

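Estimator | Statistic | λ = 0, Direct: τ(S1, S0) | λ = 1, Direct: τ(S1, S0) | λ = 1, Indirect: τ(S01, S0)
----------|-----------|--------------------------|--------------------------|----------------------------
truth     |           | 2.00                     | 2.00                     | 1.00
HT        | mean      | 1.60                     | 2.00                     | 1.00
HT        | RMSE      | (4.29)                   | (3.44)                   | (2.21)
Hájek     | mean      | 1.53                     | 1.97                     | 1.00
Hájek     | RMSE      | (2.13)                   | (1.67)                   | (0.64)
OLS       | mean      | -2.25                    | -2.27                    | -1.59
OLS       | RMSE      | (4.29)                   | (4.38)                   | (2.80)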

Table 1: Bias and Efficiency of HT, Hájek, and OLS estimators

Table 1 presents the results of 10,000 simulations, comparing the bias and efficiency of the Horvitz-Thompson (HT), Hájek, and Ordinary Least Squares (OLS) estimates of our two estimands of interest, the direct effect, τ(S1, S0), and the indirect exposure effect, τ(S01, S0). Additionally, we demonstrate that ignoring the true interference structure (λ = 1) by assuming λ = 0 would yield biased estimates. Had we made overly conservative non-interference assumptions, e.g., λ = 2, we would retrieve unbiased estimates, although less efficiently (not shown). For each estimator and non-interference assumption, we report the average estimates of these two estimands as well as the associated root mean squared error relative to the true magnitude of the causal effects. Ordinary Least Squares is clearly biased and inefficient, with all estimates having the wrong sign. When the correct non-interference assumption is made, the Hájek estimator is approximately unbiased, whereas the Horvitz-Thompson estimator is exactly unbiased. Nevertheless, Hájek outperforms Horvitz-Thompson across the board in terms of precision. For almost any practical implementation, Hájek is clearly preferred. We do not report estimated confidence intervals here, but these will be included once decisions on variance estimation have been made.
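The contrast between the two weighting estimators can be illustrated with a short Python sketch. Section 3 is not reproduced here, so the functions below use the textbook Horvitz-Thompson and Hájek estimators of a mean potential outcome; that they match the section 3 estimators exactly is an assumption. The function names and the four-unit data set are hypothetical. For the indirect exposure condition, the observations would be directed edges and N would be replaced by |E|.

```python
def horvitz_thompson_mean(y_obs, in_condition, pi):
    """HT estimator: inverse-probability-weighted total divided by the known N."""
    n = len(y_obs)
    return sum(y / p for y, z, p in zip(y_obs, in_condition, pi) if z) / n


def hajek_mean(y_obs, in_condition, pi):
    """Hajek estimator: the same weighted total divided by the sum of the weights."""
    num = sum(y / p for y, z, p in zip(y_obs, in_condition, pi) if z)
    den = sum(1 / p for z, p in zip(in_condition, pi) if z)
    return num / den


# Hypothetical data: four units, two of which are observed in the condition.
y_obs = [2.0, 4.0, 6.0, 8.0]
in_condition = [1, 0, 1, 0]
pi = [0.5, 0.5, 0.25, 0.25]  # probability that each unit lands in the condition

ht = horvitz_thompson_mean(y_obs, in_condition, pi)
hajek = hajek_mean(y_obs, in_condition, pi)
```

An average causal effect is then estimated as the difference between two such means, one per condition. The Hájek version normalizes by the realized sum of weights, which typically trades a small finite-sample bias for lower variance, consistent with the pattern in Table 1.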

5 Conclusion

In this paper, we have provided a standardized framework for the analysis of spillovers in an experimental setting. In particular, we showed that it is possible to define estimands with respect to reference assignments, so that the definition of the estimand is decoupled from the non-interference partition. This allows for greater flexibility in defining assumptions for non-interference. We also showed how the general framework can be applied to a network setting. The network setting allows for estimands and non-interference partitions to be defined in an intuitive way. Finally, we showed that in this complex experimental environment, as in many other environments, the Hájek estimator will outperform the Horvitz-Thompson estimator.

Many tasks remain in this research plan. First, we intend to undertake a series of Monte Carlo simulations to determine the most practical advice for variance estimation and confidence intervals. Second, we are developing methods to incorporate beliefs about λ into sampling procedures that will allow for the more efficient estimation of causal effects. Third, we intend to define and discuss more complicated non-interference conditions, which affect not only the node receiving spillovers but also the node sending spillovers. Finally, we intend to develop a strategy for deciding between alternative non-interference assumptions.

References

Angelucci, Manuela, Giacomo de Giorgi, Marcos A. Rangel and Imran Rasul. 2010. “Family Networks and School Enrolment: Evidence from a Randomized Social Experiment.” Journal of Public Economics 94(3):197–221.
Aronow, Peter M. and Cyrus Samii. 2013a. “Conservative Variance Estimation for Sampling Designs with Zero Pairwise Inclusion Probabilities.” Survey Methodology.
Aronow, Peter M. and Cyrus Samii. 2013b. “Estimating Average Causal Effects under General Interference.” Unpublished.
Basu, Debabrata. 1971. An essay on the logical foundations of survey sampling, Part 1. In Foundations of Statistical Inference, ed. Vidyadhar Prabhakar Godambe and David A. Sprott. Toronto: Holt, Rinehart, and Winston pp. 203–242.
Berger, Yves G. and Chris J. Skinner. 2005. “A Jackknife Variance Estimator for Unequal Probability Sampling.” Journal of the Royal Statistical Society, Series B 67(1):79–89.
Berger, Yves G. and Emilio L. Escobar. 2011. Jackknife Variance Estimation for Functions of Horvitz-Thompson Estimators under Unequal Probability Sampling without Replacement. In Conference Papers of the 58th World Congress of the ISI.
Chen, Jiehua, Macartan Humphreys and Vijay Modi. 2010. “Technology Diffusion and Social Networks: Evidence from a Field Experiment in Uganda.” Unpublished.
Cox, David R. 1958. Planning of Experiments. New York: Wiley.
Duflo, Esther and Emmanuel Saez. 2003. “The Role of Information and Social Interactions in Retirement Plan Decisions: Evidence from a Randomized Experiment.” Quarterly Journal of Economics 118(3):815–842.
Giné, Xavier and Ghazala Mansuri. 2011. “Together We Will: Experimental Evidence on Female Voting Behavior in Pakistan.” Unpublished.
Hájek, Jaroslav. 1964. “Asymptotic Theory of Rejective Sampling with Varying Probabilities from a Finite Population.” Annals of Mathematical Statistics.
Horvitz, Daniel G. and Donovan J. Thompson. 1952. “A Generalization of Sampling Without Replacement from a Finite Universe.” Journal of the American Statistical Association.
Hudgens, Michael G. and M. Elizabeth Halloran. 2008. “Toward Causal Inference with Interference.” Journal of the American Statistical Association 103(482):832–843.
Ichino, Nahomi and Matthias Schuendeln. 2012. “Deterring or Displacing Electoral Irregularities? Spillover Effects of Observers in a Randomized Field Experiment in Ghana.” Journal of Politics 74(1):292–307.
Rubin, Donald B. 1974. “Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies.” Journal of Educational Psychology 74(5):688–701.
Rubin, Donald B. 1980. “Discussion of ‘Randomization Analysis of Experimental Data in the Fisher Randomization Test.’” Journal of the American Statistical Association 75:591–593.
Rubin, Donald B. 1990. “Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies.” Statistical Science 5(4):472–480.
Sinclair, Betsy, Margaret McConnell and Donald P. Green. 2012. “Detecting Spillover Effects: Design and Analysis of Multi-level Experiments.” American Journal of Political Science 56(4):1055–1069.
Sobel, Michael E. 2006. “What Do Randomized Studies of Housing Mobility Demonstrate?: Causal Inference in the Face of Interference.” Journal of the American Statistical Association 101(476):1398–1407.
