<<

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION 2021, VOL. 116, NO. 534, 632–644: Applications and Case Studies https://doi.org/10.1080/01621459.2020.1775612

Causal Inference With Interference and Noncompliance in Two-Stage Randomized

Kosuke Imaia , Zhichao Jiangb, and Anup Malanic,d aDepartment of Government and Department of , Institute for Quantitative Social Science, Harvard University, Cambridge, MA; bDepartment of Biostatistics and Epidemiology, University of Massachusetts, Amherst, MA; cPritzker School of Medicine, University of Chicago Law School, Chicago, IL; dNational Bureau of Economic Research, Cambridge, MA

ABSTRACT ARTICLE HISTORY In many social science experiments, subjects often interact with each other and as a result one unit’s Received November 2018 treatment influences the outcome of another unit. Over the last decade, a significant progress hasbeen Accepted May 2020 made toward causal inference in the presence of such interference between units. Researchers have shown that the two-stage randomization of treatment assignment enables the identification of average direct KEYWORDS and spillover effects. However, much of the literature has assumed perfect compliance with treatment Complier average causal effects; Encouragement assignment. In this article, we establish the nonparametric identification of the complier average direct and design; Program evaluation; spillover effects in two-stage randomized experiments with interference and noncompliance. In particular, Randomization inference; we consider the spillover effect of the treatment assignment on the treatment receipt as well asthe Spillover effects; Two-stage spillover effect of the treatment receipt on the outcome. We propose consistent estimators and derive least squares their randomization-based variances under the stratified interference assumption. We also prove the exact relationships between the proposed randomization-based estimators and the popular two-stage least squares estimators. The proposed methodology is motivated by and applied to our own randomized evaluation of India’s National Health Insurance Program (RSBY), where we find some evidence of spillover effects. The proposed methods are implemented via an open-source software package. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.

1. Introduction Unfortunately, the ITT analysis is unable to tell, for example, whether a small causal effect arises due to ineffective treatment Early methodological research on causal inference has assumed or low compliance. While researchers have developed methods no interference between units (e.g., Neyman 1923;Fisher1935; to deal with noncompliance (e.g., Angrist, Imbens, and Rubin Holland 1986;Rubin1990). That is, spillover effects are assumed 1996), they are based on the assumption of no interference to be absent. In many social science experiments, however, subjects often interact with each other and as a result one between units. This assumption may be unrealistic since there unit’s treatment influences the outcome of another unit. Over are multiple ways in which spillover effects could arise. For the last decade, a significant progress has been made toward example, one unit’s treatment assignment may influence another causal inference in the presence of such interference between unit’s decision to receive the treatment. It is also possible that units (e.g., Sobel 2006;Rosenbaum2007;HudgensandHalloran one’s treatment receipt affects the outcomes of other units. 2008; Tchetgen Tchetgen and VanderWeele 2010;Aronow2012; In this article, we show how to analyze two-stage random- Vanderweele et al. 2013;LiuandHudgens2014;Hong2015; ized experiments with both interference and noncompliance Forastiere, Airoldi, and Mealli 2016; Aronow and Samii 2017; (Section 3). In an influential paper, Hudgens and Halloran Athey, Eckles, and Imbens 2018;Bairdetal.2018;Basseand (2008) proposed two-stage randomized experiments as a gen- Feller 2018). eral approach to causal inference with interference. We extend Much of this literature, however, has not addressed another their framework so that it is applicable even in the presence of common feature of social science experiments where some con- two-sided noncompliance. In particular, we define the complier trol units decide to take the treatment while others in the treat- average direct and spillover effects, propose consistent esti- ment group refuse to receive one. Such noncompliance often mators, and derive their randomization-based variances under occurs in these experiments because for ethical and logistical the stratified interference assumption. Like Aronow2012 ( ), we reasons, researchers typically cannot force experimental sub- follow Hudgens and Halloran (2008) by referring to the effect of jects to adhere to experimental protocol. The existing methods one’s own treatment as a direct effect and the effect of another either assume perfect compliance with treatment assignment unit’s treatment as a spillover effect. In a closely related work- or focus on intention-to-treat (ITT) analyses by ignoring the ing paper, Kang and Imbens (2016)alsoanalyzedtwo-stage information about actual receipt of treatment. randomized experiments with interference and noncompliance.

CONTACT Zhichao Jiang [email protected] Department of Biostatistics and Epidemiology, University of Massachusetts, Amherst, MA 01003. Supplementary materials for this article are available online. Please go to www.tandfonline.com/r/JASA. These materials were reviewed for reproducibility. © 2020 American Statistical Association JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION 633

We consider a more general pattern of interference by allowing insurance scheme was cashless, with the plan paying providers forthespillovereffectofthetreatmentassignmentonthetreat- directly rather than reimbursing beneficiaries for expenses. The ment receipt as well as the spillover effect of the treatment receipt plan also covered INR 100 (or approximately USD 5.77 using on the outcome. the OECD’s purchasing-power parity adjusted exchange rate Finally, we prove the exact relationships between the pro- of INR 17.34/USD for 2013) of transportation costs per hos- posed randomization-based estimators and the popular two- pitalization. The coverage lasted one year starting the month stage least squares estimators as well as those between their after the first enrollment in a particular district, but was often corresponding variance estimators. Our results build upon and extended without cost to beneficiaries. The insurance plan was extend the work of Basse and Feller (2018) to the case with provided by private insurance companies, but the premium was noncompliance. We also conduct simulation studies to investi- paid by the government. In Karnataka, the state in which the gate the finite sample performance of the confidence intervals randomized evaluation was conducted, premiums were roughly basedontheproposedvarianceestimators(seeAppendixD INR 200 (USD 11.53) per year during the study. Households in the supplementary materials). The proposed methods are only had to pay INR 30 (USD 1.73) per year user fee to obtain implemented via an open-source software package, aninsurancecard.Therewerenodeductiblesorco-payments (Imai and Jiang 2018), which is available at https://cran.r-project. and there was an annual cap of INR 30,000 (USD 1,783) per org/package=experiment. household. Theproposedmethodologyismotivatedbyourown We conducted a randomized controlled trial to determine randomized evaluation of Indian’s Health Insurance Scheme whether RSBY increases access to hospitalization, and thus (known by the acronym RSBY), a study that employed the two- health, and reduced impoverishment due to high medical stage randomized design. In Section 2, we briefly describe the expenses. The findings are policy-relevant because the Indian background and experimental design of this study. In Section 4, government has announced a new scheme called the National we apply the proposed methodology to this study. We present Health Protection Scheme (known by the acronym PMJAY) some evidence concerning the existence of positive spillover that seeks to build on RSBY to provide coverage for nearly 500 effects of treatment assignment on the enrollment in the RSBY. million Indians, but has not yet finalized its design or how much In addition, we estimate the complier average direct effect to fund it. (CADE) to be positive under the “low” treatment assignment In this evaluation, spillover effects are of concern because mechanism, where fewer households in a village are encouraged formal insurance may crowd out informal insurance, which is a to enroll in the insurance program. Finally, Section 5 concludes. substitute method of smoothing health care shocks (e.g., Jowett 2003;Lin,Liu,andMeng2014). That is, the enrollment in RSBY by one household may depend on the treatment assignment of 2. A Motivating Empirical Application other households. In addition, we also must address noncompli- In this section, we describe the randomized evaluation of the ance because some households in the treatment group decided Indian health insurance program, which serves as our motivat- not to enroll in RSBY while others in the control group managed ing empirical application. We provide a brief background of the to join the insurance program. evaluation and introduce its experimental design.1 2.2. Experimental Design 2.1. Randomized Evaluation of the Indian Health Our evaluation study is based on a total of 11,089 above poverty Insurance Program line (APL) households in two districts of Karnataka state who Each year, 150 million people worldwide face financial catastro- had no pre-existing health insurance coverage and lived within phe due to spending on health. According to a 2010 study, more 25 km of an RSBY empaneled hospital. We selected APL house- than one third of them live in India (Shahrawat and Rao 2011). holds because they are not otherwise eligible for RSBY, but are Almost 63 million Indians fall below the poverty line (BPL) candidates for any expansion of RSBY. The two districts were due to health spending (Berman, Ahuja, and Bhandari 2010). Gulbarga and Mysore, which are economically and culturally In 2008, the Indian government introduced its first national, representative of central and southern India, respectively. We public health insurance scheme, Rastriya Swasthya Bima Yojana required proximity to a hospital as hospital insurance has little (RSBY), to address the problem. Its aim was to provide coverage value if there is no local hospital at which to use the insurance. for hospitalization to its BPL population, comprising roughly As shown in Table 1,weemployedatwo-stagerandomized 250 million persons. The program ran from 2013 to 2019. design to study both direct and spillover effects of RSBY. In RSBY provided access to an insurance plan that covered the first stage, randomly selected 219 villages were assigned to inpatienthospitalcareforuptofivemembersofeachhousehold. The plan covered all pre-existing diseases and there was no age limit of the beneficiaries. The rates of most surgical proce- Table 1. Two-stage randomization design. dures were fixed by the government. Beneficiaries could obtain Village-level arms Household-level arms treatment at any hospital empaneled in the RSBY network. The Mechanisms Number of Treatment Control Number of Enrollment Mechanisms villages Treatment Control households rates 1For a more detailed description of the design, see the preanalysis plan High 219 80% 20% 5714 67.0% posted on the American Economic Association’s Registry at https://www. Low 216 40% 60% 5373 46.2% socialscienceregistry.org/trials/1793 634 K. IMAI, Z. JIANG, AND A. MALANI the “High” treatment assignment mechanism whereas the rest (Aj = 0) indicates that a high (low) proportion of units are of villages were assigned to the “Low” treatment assignment assigned to the treatment within cluster j. In our application, mechanism. In the second stage, under the “High” assignment Aj = 1 corresponds to the treatment assignment mechanism, 80% of the households within a cluster are com- of 80%, whereas Aj = 0 represents 40%. We assume complete pletely randomly assigned to the treatment condition, whereas randomization, in which a total of Ja clusters are assigned to therestofhouseholdswereassignedtothecontrolgroup.In the assignment mechanism a for a = 0, 1 with J0 + J1 = J. contrast, under the “Low” assignment mechanism, 40% of the Finally, A = (A1, A2, ..., AJ) denotes the vector of treatment households within a cluster are completely randomly assigned to assignment mechanisms for all clusters. the treatment condition. The households in the treatment group The second stage of randomization concerns the treatment are given RSBY essentially for free, whereas some households in assignment for each unit within cluster j based on the assign- thecontrolgroupwereabletobuyRSBYatthegovernmentprice ment mechanism Aj.LetZij be the binary treatment assignment 2 of roughly INR 200. variable for unit i in cluster j where Zij = 1(Zij = 0) implies Households were informed of the assigned treatment con- that the unit is assigned to the treatment (control) condition. Let = ditionsandweregiventheopportunitiestoenrollinRSBY Zj (Z1j, ..., Znjj) denote the vector of assigned treatments for from April to May 2015. Approximately 18 months later, we the nj units in cluster j and Pr(Zj = zj | Aj = a) represent the carried out a post-treatment survey and measured a variety distribution of the treatment assignment vector when cluster j is of outcomes. Policy makers are interested in the health and assigned to the assignment mechanism Aj = a. We assume the financial effects of RSBY. To evaluate the efficacy of RSBY,we complete randomization such that a total of njz units in cluster must estimate the effects of actual treatment receipt as well j are assigned to the treatment condition z for z = 0, 1, where astheITTeffectsbecausesomehouseholdsinthetreatment nj1 + nj0 = nj. group may not enroll in RSBY while others in the control group may do so. Assumption 1 (Two-stage randomization). 1. Complete randomization of treatment assignment mecha- 3. The Proposed Methodology nism at the cluster level: 1 ( = ) = In this section, we first review the ITT analysis of two-stage Pr A a J randomized experiments proposed by Hudgens and Halloran J1 (2008) and others. We then introduce a new causal quantity  for all a such that 1 a = J where 1 is the J dimensional of interest, the CADE, present a nonparametric identification J 1 J vector of ones. result, and propose a consistent estimator. We further consider 2. Complete randomization of treatment assignment within the identification and inference of the CADE under the assump- each cluster: tion of stratified interference, and derive the randomization- 1 based variance of the proposed estimator. We also establish Pr(Z = z | A = a) = j j nj the direct connections between these randomization-based esti- nj1 mators and the two-stage least squares estimators. Finally, we  = present analogous results for another new causal quantity, the for all z such that 1nj z nj1. complier average spillover effect (CASE), in Appendix A in the supplementary materials. Following the literature, we adopt the finite population framework, in which potential outcomes are treated as constants and randomness comes from treatment assignment alone. We 3.1. Two-Stage Randomized Experiments consider two-stage randomized experiments with noncompli- We consider a two-stage randomized experiment (Hudgens and ance, in which the actual receipt of treatment may differ from the Halloran 2008)withatotalofN units and J clusters where each treatment assignment. Let Dij be the treatment receipt for unit i in cluster j and Dj = (D1j, ..., Dn j) be the vector of treatment unit belongs to one of the clusters. We use nj to denote the j = J receipts for the nj unitsinthecluster.TheoutcomevariableYij number of units in cluster j with N j=1 nj.Inatwo-stage is observed for each unit and Yj = (Y1j, ..., Yn j) denotes the randomized experiment, we first randomly assign each cluster j vector of observed outcomes for the nj units in cluster j. to one of the treatment assignment mechanisms, which in turn We use the potential outcomes framework of causal inference assigns different proportions of units within each cluster to the (e.g., Neyman 1923;Holland1986;Rubin1990). For unit i in treatment condition. For the sake of simplicity, we consider two ( ) ∈{ } = cluster j,letDij z represent the potential value of treatment assignment mechanisms indicated by Aj 0, 1 where Aj 1 receipt, when the treatment assignment vector for all N units in the experiment equals z.Inaddition,weuseYij(z; d) to denote 2For the sake of simplicity, we analyze this dichotomized assignment. In the the potential outcome, when the treatment assignment vector original experiment, households could be assigned to any of four groups; equals z and treatment receipt vector equals d. Lastly, let Yij(z) Group A was given RSBY for free, Group B was given RSBY for free and a cash transfer equal to the premium on insurance, Group C was sold RSBY for the represent the potential value of outcome when the treatment same premium as the government paid for RSBY coverage, and Group D assignment vector equals z,thatis,Yij(z) = Yij(z; Dij(z)).The had no intervention. Here, the High assignment group consists of villages observed values of treatment receipt and outcome are given by where 80% of households were assigned to Groups A and B, whereas = = the Low assignment group consists of villages with 60% of households Dij Dij(Z) and Yij Yij(Z) where Z is the N dimen- assigned to Groups C and D. sional vector of treatment assignment for all units. If there JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION 635 were no restriction on the pattern of interference, each unit has We define the ITT effects, starting with the average direct 2N potential values of treatment receipt and outcome, making effect of treatment assignment on the treatment receipt and identification infeasible. Hence, following the literature (e.g., outcome under the treatment assignment mechanism a,as Hong and Raudenbush 2006;Sobel2006; Hudgens and Halloran DEDij(a) = Dij(1, a) − Dij(0, a), 2008), we only allow interference within each cluster. DEYij(a) = Yij(1, a) − Yij(0, a), Assumption 2 (Partial interference). whereDEDandDEYstandfortheaveragedirect effect on D   Yij(z) = Yij(z ) and Dij(z) = Dij(z ) and Y, respectively. These parameters quantify how the treat- ment assignment of a unit may affect its treatment receipt and  =  for all z and z with zj zj. outcome by averaging the treatment assignments of other units within the same cluster under a specific assignment mechanism. Assumption 2 implies that although the treatment receipt Finally, averaging these unit-level quantities gives the following and outcome of a unit can be influenced by the treatment assign- averagedirecteffectsoftreatmentassignmentforeachcluster ment of another unit within the same cluster, they cannot be andfortheentire(finite)population, affected by units in other clusters. This assumption substantially nj J reduces the number of potential values of treatment receipt and 1 1 n DEDj(a) = DEDij(a),DED(a) = njDEDj(a), outcome for each unit in cluster j from 2N to 2 j . n N j i=1 j=1 nj J 1 1 3.2. Intention-to-Treat Effects: A Review DEYj(a) = DEYij(a),DEY(a) = njDEYj(a). nj = N = We next review the previous results about the ITT analysis of i 1 j 1 two-stage randomized experiments under the partial interfer- Another quantity of interest is the spillover effect, which ence assumption (Hudgens and Halloran 2008). Our analysis quantifies how one unit’s treatment receipt or outcome is differs from the existing ones in that we weight each unit equally affected by other units’ treatment assignments. Following instead of giving an equal weight to each cluster as done in the Halloran and Struchiner (1995), we define the unit-level literature. spillover effects on the treatment receipt and outcome as,

SEDij(z) = Dij(z,1) − Dij(z,0), 3.2.1. Causal Quantities of Interest = − We begin by defining preliminary average quantities. First, we SEYij(z) Yij(z,1) Yij(z,0), define the average potential value of treatment receipt for unit i which compare the average potential values under two different in cluster j when the unit is assigned to the treatment condition assignment mechanisms, that is, a = 1anda = 0, while holding z under the treatment assignment mechanism a.Wedosoby one’s treatment assignment at z. We then define the spillover averaging over the distribution of treatment assignments for the effects on the treatment receipt and outcome at the cluster and other units within the same cluster, population levels, = = = nj J Dij(z, a) Dij(Zij z, Z−i,j z−i,j) 1 1 ∈Z SEDj(z) = SEDij(z),SED(z) = njSEDj(z), z−i,j −i,j n N j i=1 j=1 Pr(Z−i,j = z−i,j | Zij = z, Aj = a), nj J = = 1 = 1 where Z−i,j (Z1i, ..., Zi−1,j, Zi+1,j, ..., Znjj) represents the SEYj(z) SEYij(z),SEY(z) njSEYj(z). nj N (nj − 1) dimensional subvector of Zj with the entry for unit i i=1 j=1 Z− ={( ... − + ... ) |  ∈ removed and i,j z1j, , zi 1,j, zi 1,j, , znjj zi j The quantities defined above differ from those introduced { }  = − + } 0, 1 for i 1, ..., i 1, i 1, ..., nj is the set of all possible in the literature in that we equally weight each unit (see Basse values of the assignment vector Z−i,j. Similarly, we define the and Feller 2018). In contrast, Hudgens and Halloran (2008)gave average potential outcome for unit i in cluster j as, an equal weight to each cluster regardless of its size. While our analysis focuses on the individual-weighted estimands rather Yij(z, a) = Yij(Zij = z, Z−i,j = z−i,j) than cluster-weighted estimands, our method can be general- z− ∈Z− i,j i,j ized to any weighting scheme, and as such the proofs in the = | = = Pr(Z−i,j z−i,j Zij z, Aj a). supplementary appendix are based on general weights. Given these unit-level average values of potential outcomes, Finally, in actual policy implementations, the treatment we consider the cluster-level and population-level average assignment is typically based on a deterministic criterion potential values of treatment receipt and outcome, rather than randomization, suggesting that the causal quantities discussed above may not be of direct interest to policy makers. nj J Even in this situation, however, these causal quantities can = 1 = 1 Dj(z, a) Dij(z, a), D(z, a) njDj(z, a), provide some policy implications by telling us whether or not nj N i=1 j=1 spillover effects exist at all. We discuss this issue in the context nj J 1 1 of our application (see Section 4)andconsideramodel-based Yj(z, a) = Yij(z, a), Y(z, a) = njYj(z, a). approach to further address this point (see Appendix E in the n N j i=1 j=1 supplementary materials). 636 K. IMAI, Z. JIANG, AND A. MALANI

3.2.2. Nonparametric Identification cluster. Thus, the compliance status of a unit is a function of the Hudgens and Halloran (2008) established the nonparametric treatment assignment of other units in the same cluster, identification of the ITT effects, which equally weight each clus- C (z− ) = I{D (1, z− ) = 1, D (0, z− ) = 0}.(1) ter regardless of its size. Here, we present analogous results by ij i,j ij i,j ij i,j weighting each unit equally as done above. Define the following We consider a measure of compliance behavior for each unit quantities, by averaging over the distribution of treatment assignments of the other units within the same cluster under the treatment 1 J = N j=1 njDj(z, a)I(Aj a) assignment mechanism a. This general measure of compliance D(z, a) = , 1 J = behavior ranges from 0 to 1 and is defined as, j=1 I(Aj a) J 1 J Cij(z−i,j) = Cij(z−i,j) Pr(Z−i,j = z−i,j | Aj = a) (2) = n Y (z, a)I(A = a) N j 1 j j j ∈Z Y(z, a) = , z−i,j −i,j 1 J I(A = a) J j=1 j for a = 0, 1. Given this compliance measure, we now define where the CADE as the average direct effect of treatment assignment among compliers, nj = i=1 DijI(Zij z) J nj Dj(z, a) = , j=1 i=1 CDYij(a) nj = CADE(a) = , i=1 I(Zij z) J nj = = Cij(z−i,j) nj = j 1 i 1 i=1 YijI(Zij z) Yj(z, a) = . where nj = i=1 I(Zij z) CDYij(a) = {Yij(1, z−i,j) − Yij(0, z−i,j)} Then, we can obtain the unbiased estimators of the direct effects z−i,j∈Z−i,j and the spillover effects. × Cij(z−i,j) Pr(Z−i,j = z−i,j | Aj = a). Theorem 1 (Unbiased estimation of the ITT effects). Define the The definition requires that there exists at least one complier in following estimators, the population. If units do not influence each other, we have Y (z , z− ) = Y (z ) and D (z , z− ) = D (z ).Hence, ij ij i,j ij ij ij ij i,j ij ij DED(a) = D(1, a) − D(0, a), SED(z) = D(z,1) − D(z,0), the compliance status for each unit in Equations (1)and(2) DEY (a) = Y(1, a) − Y(0, a), SEY(z) = Y(z,1) − Y(z,0). no longer depends on the treatment assignment of the other units. As a result, under this setting, the CADE equals the finite Under Assumptions 1 and 2, these estimators are unbiased for sample version of the complier average causal effect defined the ITT effects, in Angrist, Imbens, and Rubin (1996). Finally, in the absence of noncompliance, that is, C (z− ) = 1forallz− and E{ }= E{ }= ij i,j i,j DED(a) DED(a), SED(z) SED(z), i, j,thenCADE(a) asymptotically equals DEY(a) as the cluster E{DEY(a)}=DEY(a), E{SEY(z)}=SEY(z). size grows. The CADE combines two causal pathways: a unit’s treatment Proof is straightforward and hence omitted. assignment Zij can affect its outcome Yij either through its own treatment receipt Dij or that of the other units D−i,j = (D , ..., D − , D + , ..., D ). If there is either no spillover 3.3. Complier Average Direct Effects 1j i 1,j i 1,j njj effect of encouragement on treatment receipt or no spillover We now address the issue of noncompliance in the presence of effect of treatment receipt on outcome, then the second causal interference between units. In a seminal paper, Angrist, Imbens, pathway no longer exists. Under this scenario, the CADE corre- and Rubin (1996) showed how to identify the complier aver- sponds to the average direct effect of one’s own treatment receipt age causal effect (CACE) in standard randomized experiments among compliers because the treatment assignment is the same under the assumption of no interference. The CACE represents as the treatment receipt. In contrast, when both types of spillover the average effect of treatment receipt among the compliers who effects exist, the CADE includes the indirect effect of one’s own wouldreceivethetreatmentonlywhenassignedtothetreatment encouragement on the outcome through the treatment receipt condition. Below, we introduce the CADE, which is a general- of other units in the same village as well as the direct effect ization of the CACE to settings with interference, and show how of one’s own treatment receipt on the outcome. Unfortunately, to nonparametrically identify and consistently estimate it using without additional assumptions, the CADE is not identifiable. the data from two-stage randomized experiments. We therefore propose a set of assumptions for nonparametric identification. In addition, Appendix E.2 in the supplementary 3.3.1. Causal Quantity of Interest materials considers a model-based approach to the identifica- We first generalize the definition of compliers to settings with tion and estimation for further distinguishing the two causal interference between units. Under the assumption of no inter- pathways. ference, compliers are those who receive the treatment only when assigned to the treatment condition. However, in the 3.3.2. Nonparametric Identification presence of partial interference, the treatment receipt is also To establish the nonparametric identification of the CADE, affected by the treatment assignment of other units in the same we begin by generalizing the exclusion restriction of JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION 637

Angrist, Imbens, and Rubin (1996), which assumes no inter- is plausible in our application because the encouragement is ference between units. expected to increase the enrollment in the RSBY. In the absence of interference between units, exclusion Assumption 3 (Exclusion restriction with interference between restriction and monotonicity are sufficient for the nonpara- units). metric identification of the complier average causal effect.   However, when interference exists, an additional restriction ( ) = ( ) Yij zj; dj Yij zj; dj for any zj, zj and dj. ontheinterferencestructureisnecessary.Thereasonisthat there are two types of possible spillover effects: the spillover Assumption 3 states that the outcome of a unit does not effect of treatment assignment on the treatment receipt and depend on the treatment assignment of any unit within the same the spillover effect of treatment receipt on the outcome. cluster (including itself) so long as the treatment receipt for As a result, even under exclusion restriction, the treatment all the units of the cluster remains identical. In other words, assignment of a noncomplier can still affect its outcome the outcome of a unit depends only on the treatment receipt through the treatment receipts of other units within in the same vector of all units within its own cluster. The assumption is cluster. violated if the outcome of one unit is influenced by its own To address this problem, we propose the following identifi- treatmentassignmentorthatofanotherunitwithinthesame cation assumption. clusterevenwhenthetreatmentreceiptsofalltheunitsinthe cluster including itself are held constant. In our application, the Assumption 5 (Restricted interference under noncompliance). assumption is plausible since the encouragement to enroll in the For any unit i in cluster j,ifDij(1, z−i,j) = Dij(0, z−i,j) for some RSBY is unlikely to affect the hospital expenditure other than z−i,j ∈ Z−i,j,thenYij(Dj(1, z−i,j)) = Yij(Dj(0, z−i,j)) holds. through the actual enrollment itself. Under Assumption 3,wecanwritethepotentialoutcome The assumption states that if the treatment receipt of a unit is as the function of treatment receipt alone, Yij(dj).Thus,the not affected by its own treatment assignment (i.e., the unit is a = observed outcome is written as Yij(Dj) where Dj Dj(Zj).We noncomplier), then its outcome should also not be affected by maintain Assumption 3 for the remainder of the article. To avoid its own treatment assignment through the treatment receipts confusion, we will explicitly write out the treatment receipt as of other units in the same cluster. Although Assumption 5 = the argument of potential outcome. For example, Yij(Dj 1nj ) appears to be concerned only with the spillover effects of treat- = = represents the potential outcome when Dij 1forj 1, ..., nj, ment receipt on the outcome, its plausibility also depends on = while Yij(1nj ) represents the potential outcome when Zij 1for the spillover effects of treatment assignment on the treatment = j 1, ..., nj. receipt. We next generalize the monotonicity assumption of Angrist, To facilitate the understanding of this assumption, we con- Imbens, and Rubin (1996). sider the following three scenarios under which Assumption 5 is satisfied. First, assume no spillover effect of treatment receipt Assumption 4 (Monotonicity with interference between units). on the outcome (Scenario I of Figure 1(a)), D (1, z− ) ≥ D (0, z− ) for all z− ∈ Z− . ij i,j ij i,j i,j i,j =  =  Yij(dij, d−i,j) Yij(dij, d−i,j) for dij 0, 1, and any d−i,j, d−i,j. The assumption states that being assigned to the treatment (3) condition never negatively affects the treatment receipt ofa unit, regardless of how the other units within the same cluster TestableconditionsforthisscenarioaregiveninAppendixB.1 are assigned to the treatment/control conditions. Assumption 4 in the supplementary materials.

Figure 1. Three scenarios that imply Assumption 5: (a) no spillover effect of the treatment receipt on the outcome; (b) no spillover effect of the treatment assignment on the treatment receipt; (c) if treatment assignment of a unit does not affect its own treatment receipt, then it should not affect other units’ treatmenteceipts, r that is, the dotted edges do not exist. 638 K. IMAI, Z. JIANG, AND A. MALANI

Second, suppose that the treatment assignment has no Theorem 2 (Nonparametric identification and consistent estima- spillover effect on the treatment receipt (Scenario II of tion of the CADE). Figure 1(b)), 1. Under Assumptions 1–5,wehave   Dij(zij, z−i,j) = Dij(zij, z− ) for zij = 0, 1, and any z−i,j, z− . i,j i,j DEY(a) (4) = ( ) lim→∞ lim→∞ CADE a . nj DED(a) nj Such an assumption is made by Kang and Imbens (2016)in the context of online experiments, in which the assignment 2. Suppose that the outcome is bounded and the restriction on of treatment (e.g., social media messaging) can be individual- interference in Sävje, Aronow, and Hudgens (2017)holdsfor ized but units may interact with each other once they receive both the treatment receipt and the outcome. Then, as both thetreatment.WecantestthisscenariobyestimatingSED(1) the cluster size nj and the number of clusters J go to infinity, and SED(0). we can consistently estimate the CADE, Third, we can weaken the condition in Equation (4)by DEY (a) considering an alternative condition that if a unit’s treatment plim = lim CADE(a) →∞ →∞ n →∞,J→∞ receipt is not affected by its own treatment assignment (i.e., the nj ,J DED(a) j unit is a noncomplier), then the treatment assignment of this for each a = 0, 1. unit has no effect on the treatment receipts of the other units in the same cluster (the absence of dotted edges in Scenario III of The CADE is nonparametrically identifiable as the cluster Figure 1(c)), size and the number of clusters tend to infinity, and can be if Dij(1, z−i,j) = Dij(0, z−i,j),then consistently estimated by the ratio of two estimated ITT effects. Theasymptoticpropertiesarederivedwithinthefinitepopula- D−i,j(1, z−i,j) = D−i,j(0, z−i,j). tion framework, approximating the sampling distribution of an In our application, this scenario is violated, for example, if a estimator by embedding it in an asymptotically stable sequence household that already has insurance and is not going to be of finite populations (Hájek 1960;Lehmann2004). affected by the encouragement influences the enrollment deci- sion of another household by recommending the RSBY to it. To increase the plausibility of this scenario in our application, 3.4. Stratified Interference we excluded all the households with pre-existing insurance Unfortunately, as pointed out by Hudgens and Halloran (2008), from the experiment. As a result, this scenario is plausible a valid estimator of the variances of these ITT effect estimators because one’s encouragement is expected to have a much greater is unavailable without an additional assumption. Hudgens and influence on his/her own enrollment than the enrollment of Halloran (2008) relied upon the stratified interference assump- another unit. tion that the outcome of one unit depends on the treatment Although all three scenarios above satisfy Assumption 5,the assignment of other units only through the number of those who interpretation of the CADE is different. In particular, under areassignedtothetreatmentconditionwithinthesamecluster. Scenarios I and II, we can interpret the CADE as the average In other words, what matters is the number of units rather than direct effect of one’s own treatment receipt on the outcome which units are assigned to the treatment condition. among compliers. In contrast, under Scenario III, the CADE We assume that stratified interference applies to both the also includes the average direct effect of one’s own encourage- outcome and treatment receipt. ment on the outcome through the treatment receipts of other units. Nevertheless, this combined direct effect of encourage- Assumption 6 (Stratified interference). ment may be of interest to policy makers because most govern- =  =  =  ment programs including the RSBY are based on the encourage- Dij(zj) Dij(zj) and Yij(zj) Yij(zj) if zij zij and ment design. In Appendix E.2 in the supplementary materials, nj nj we address this issue using a model-based approach. =  zij zij. The next theorem establishes the nonparametric identifica- i=1 i=1 tion of the CADE as the cluster size tends to infinity. Under Assumptions 1–5, we show that in the limit, the CADE equals In our application, stratified interference for the treatment the ratio of the average direct effects of treatment assignment receipt requires that the enrollment decisions of households ontheoutcomeandonthetreatmentreceiptwhileholdingthe depend only on their own encouragement and the number of treatment assignment mechanism fixed. Although the unbiased encouraged households in their village. Under the assumption of estimation of DEY(a) and DED(a) is readily available (Hudgens no spillover effect of treatment receipt on the outcome, stratified and Halloran 2008), for the consistent estimation of the CADE, interference for the outcome holds so long as it is applicable we need an additional restriction on the structure of interfer- to the treatment receipt. However, for more general scenarios, ence.WefollowSävje,Aronow,andHudgens’s(2017)result Assumption 6 may not be satisfied for the outcome even if it on the consistency of average causal effect in finite population holds for the treatment receipt. framework, and assume that the average amount of interference per unit does not grow proportionally to the cluster size (see 3.4.1. Nonparametric Identification Appendix B.2 in the supplementary materials for a proof of the Under Assumption 6, we can simplify the CADE because the theorem and the details). number of the units assigned to the treatment condition in each JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION 639 cluster is fixed given treatment assignment mechanism. This Thus, the treatment assignment can affect its outcome either implies that we can write Dij(zj) and Yij(zj) as Dij(z, a) and directly through its own treatment or indirectly through the Yij(z, a),respectively,andasaresultCADE(a) equals treatment receipts of the other units in the same cluster. For noncompliers (Dij(1, a) = Dij(0, a) = d), the exclusion CADE(a) restriction implies, n J j { ( ) − ( )} { ( ) − ( ) = } j=1 i=1 Yij 1, a Yij 0, a I Dij 1, a Dij 0, a 1 Y (Z = 1, a) − Y (Z = 0, a) = , ij ij ij ij J nj { − = } j=1 i=1 I Dij(1, a) Dij(0, a) 1 = Yij(Dij = dij, D−i,j(Zij = 1, a)) − = = where the complier status can also be simplified as a function Yij(Dij dij, D−i,j(Zij 0, a)).(7) ( ) = { ( ) = of assignment mechanism alone, that is, Cij a I Dij 1, a The treatment assignment affects its own outcome only through ( ) = } 1, Dij 0, a 0 . the treatment receipt of the other units in the same cluster. We now present the results on nonparametric identification Furthermore, Assumption 5 implies Yij(Dij = dij, D−i,j(Zij = and consistent estimation under stratified interference. 1, a)) = Yij(Dij = dij, D−i,j(Zij = 0, a)). Under this assumption, Equation (7) equals zero, implying NADE(a) = 0andthe Theorem 3 (Nonparametric identification and consistent estima- identification of CADE(a). tion of the CADE under stratified interference). Suppose that the outcome is bounded. Then, under Assumptions 1–6,wehave 3.4.3. Randomization-Based Variances DEY (a) We derive the randomization-based variances of the proposed lim CADE(a) = plim n →∞,J→∞ estimators within the finite population framework, in which j nj→∞,J→∞ DED(a) the uncertainty comes solely from the two-stage randomization. for a = 0, 1. As shown by Hudgens and Halloran (2008) in the context of ITT analysis, stratified interference enables the estimation of Proof is in Appendix B.4 in the supplementary materials. variance. Here, we first derive the variances of the ITT effect Under the stratified interference assumption, the consistent esti- estimators and then derive the variance of the proposed CADE mation of CADE no longer requires the restrictions on interfer- estimator. We begin by defining the following quantities, ence in Sävje, Aronow, and Hudgens (2017). nj 2 1 2 σ (z, a) = {Yij(z, a) − Yj(z, a)} , j n − 1 3.4.2. Effect Decomposition j i=1 Under stratified interference, we can decompose the average J 2 direct effect of treatment assignment as the sum of the average 2 1 njJ σ (a) = DEYj(a) − DEY(a) , DE J − 1 N direct effects for compliers and noncompliers, j=1 DEY(a) = CADE(a) · π (a) + NADE(a) ·{1 − π (a)},(5) nj c c 2 1 ω (a) = {Yij(1, a) − Yij(0, a)} j n − 1 where NADE(a) is the non-CADE and is defined as, j i=1 2 −{ − } NADE(a) Yj(1, a) Yj(0, a) , J nj { − } { = } j=1 i=1 Yij(1, a) Yij(0, a) I Dij(1, a) Dij(0, a) 2 = , where σ (z, a) isthewithin-clustervarianceofpotentialout- J nj { = } j j=1 i=1 I Dij(1, a) Dij(0, a) 2 comes, σDE(a) is the between-cluster variance of DEYij(a),and 2 and the proportion of compliers is given by, ωj (a) is the within-cluster variance of DEYij(a).Usingthis notation, we give the results for the ITT effects of treatment J nj 1 assignment on the outcome. The results for the ITT effects of π (a) = I{D (1, a) = 1, D (0, a) = 0}. c ij ij treatment assignment on the treatment receipt can be obtained N = = j 1 i 1 in the same way. According to the exclusion restriction given in Assumption 3, Theorem 4 (Randomization-based variances of the ITT effect for compliers with Dij(1, a) = 1andDij(0, a) = 0, we can write estimators). Under Assumptions 1, 2,and6,wehave the unit-level direct effect on the outcome as the sum of the direct effect through its own treatment receipt and the indirect σ 2 ( ) = − Ja DE a effect through the treatment receipts of other units within the var DEY(a) 1 J Ja same cluster, J 1 njJ = − = + var DEYj(a) | Aj = a , Yij(Zij 1, a) Yij(Zij 0, a) J J N a j=1 ={Yij(Dij = 1, D−i,j(Zij = 1, a)) − Yij(Dij = 0, D−i,j(Zij = 1, a))} where +{ = = σ 2( ) σ 2( ) ω2( ) Yij(Dij 0, D−i,j(Zij 1, a)) j 1, a j 0, a j a var DEYj(a) | Aj = a = + − . − Yij(Dij = 0, D−i,j(Zij = 0, a))}.(6) nj1 nj0 nj 640 K. IMAI, Z. JIANG, AND A. MALANI

Proof is given in Appendix B.5 in the supplementary materi- estimator, its variance blows up when DED is close to zero. als. Because we cannot observe Yij(1, a) and Yij(0, a) simultane- This is similar to the weak instrument problem in the standard 2 ously, no unbiased estimator exists for ωj (a), implying that no instrumental variable settings. unbiased estimation of the variances is possible. Thus, follow- We obtain the following variance estimator by replacing each ing Hudgens and Halloran (2008), we propose a conservative term in the brackets with its conservative estimator, estimator, 2  = 1 Ja σDE(a) var CADE(a) var DEY(a) var DEY(a) = 1 − DED(a)2 J Ja J 2 2 2 2 DEY(a) n J σ (1, a) σ (0, a) − ( ) ( ) 1 j j j 2 cov DEY a , DED a + + I(Aj = a),(8) DED(a) J J N2 n n a j=1 j1 j0 DEY (a)2 + var DED(a) ,(9) where DED(a)2 nj 2 = {Yij − Yj(z, a)} I(Zij = z) σ 2(z, a) = i 1 , where var DED(a) and cov DEY(a), DED(a) are obtained j n − 1 jz by replacing Y with D and the sample variances with the 2 J njJ − = sample covariances in Equation (8), respectively. Similar to j=1 N DEYj(a) DEY(a) I(Aj a) σ 2 ( ) = the ITT analysis, each of the three terms in the brackets of DE a − . Ja 1 var CADE (a) is a weighted average of between-cluster and 2 In Equation (8), σDE(a) represents the between-cluster sample within-cluster sample variances. 2 variance, and σj (z, a) isthewithin-clustervarianceinclusterj. Because the expectation of product is generally not equal Thus, the variance of the ITT direct effect estimator is a weighted to the product of expectations, unlike the ITT analysis, averageofthebetween-clustersamplevarianceandthewithin- var CADE (a) is not a conservative variance estimator in cluster sample variance. finite samples. In Appendix B.8 in the supplementary materials, It can be shown that this variance estimator is on average no however, we show that it is asymptotically conservative. Finally, less than the true variance, to evaluate the robustness of the variance estimator based on E var DEY (a) ≥ var DEY (a) , Assumption 6, we conduct simulation studies and find that the proposed variance estimator works well so long as the number of where the inequality becomes equality when the unit-level clusters is relatively large (see Appendix D in the supplementary ( ) − ( ) direct effect, that is, Yij 1, a Yij 0, a ,isconstantwithin materials). each cluster (see Appendix B.7 in the supplementary materials for a proof). In Appendix B.3 in the supplementary materials, we provide the asymptotic normality result of the ITT effect 3.5. Connections to Two-Stage Least Squares Regression estimators under additional regularity conditions based on In this section, we establish direct connections between the the finite population central limit theorems in Hájek1960 ( ), proposed estimator of the CADE and the two-stage least squares Ohlsson (1989), and Li and Ding (2017). These conditions are estimator, which is popular among applied researchers. Basse satisfied for a bounded outcome as the cluster size and the and Feller (2018) studied the relationships between the ordinary number of clusters go to infinity. See Chin (2018)formore least squares and randomization-based estimators for the ITT refined results on the asymptotic normality of the ITT effect analysis under a particular two-stage randomized experiment estimators without stratified interference. design(seealsoBairdetal.2018). Here, we further extend these We next derive the asymptotic randomization-based vari- previous results. ance of the proposed estimator.

Theorem 5 (Randomization-based variance of the CADE esti- 3.5.1. Point Estimates mator). Under Assumptions 1–6,theasymptoticvarianceof We begin with the ITT analysis. To account for different cluster CADE (a) is sizes, we transform the treatment and outcome variables so that each unit, rather than each cluster, is equally weighted. 1 − DEY(a) Specifically, we multiply them by the weights proportional to var DEY(a) 2 cov DEY(a), DED(a) ∗ ∗ DED(a)2 DED(a) the cluster size, that is, D = n JD /N and Y = n JY /N (see ij j ij ij j ij DEY(a)2 Appendix C in the supplementary materials for the results with + var DED(a) . DED(a)2 general weights). We consider the following linear models for the treatment receipt and outcome, Proof of Theorem 5 is a direct application of the Delta ∗ D = γ I(A = a) + γ Z I(A = a) + ξ , (10) methodbasedontheasymptoticnormalityoftheITTeffect ij a j 1a ij j ij a=0,1 a=0,1 estimators shown in Appendix B.3 in the supplementary materials. Due to the space limitation, we give the expression of ∗ Y = αaI(Aj = a) + α1aZijI(Aj = a) + ij, (11) ij cov DEY(a), DED(a) in Appendix B.6 in the supplementary a=0,1 a=0,1 materials. Because the proposed CADE estimator is a ratio where ξij and ij are error terms. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION 641

Unlike the two-step procedure in Basse and Feller (2018), giveninEquations(10)and(11)withXij = (I(Aj = 1), I(Aj = = =  =    we fit the weighted least squares regression with the following 0), ZijI(Aj 1), ZijI(Aj 0)) .LetX (X1 , ..., XJ ) be = inverse probability weights, the entire design matrix, and Wj diag(w1j, ..., wnjj) be the = 1 1 weight matrix in cluster j, W diag(W1, ..., WJ) be the entire wij = · . (12) weight matrix. We use ˆ = (ˆ , ..., ˆ ) to denote the residual J n j 1j njj Aj jZij vector in cluster j obtained from the weighted least squares fit The next theorem shows that the resulting weighted least ofthemodelgiveninEquation(11), and ˆ = (ˆ1, ..., ˆJ) to squares estimators are equivalent to the randomization-based represent the residual vector for the entire sample. ITT effect estimators. Proof is given in Appendix C.1 in the Using the weights, the cluster-robust generalization of HC2 cluster wls supplementary materials. variance, varhc2 (α ),isgivenby ⎧ ⎫ Theorem 6 (Weighted least squares regression estimators for the ⎨J ⎬ wls wls  −1  ˜ −1/2 ˜ −1/2  −1 ITT analysis). Let γ and α be the weighted least squares (X WX) ⎩ Xj WjPj jj Pj WjXj⎭ (X WX) , estimators of the coefficients in the models given in Equa- j=1 tions (10)and(11), respectively. The regression weights are where Pj is the following “annihilator” matrix, given in Equation (12). Then, /  −  / P˜ = I − W1 2X (X WX) 1X W1 2, wls = wls = j nj j j j j γ1a DED(a), γa D(0, a), wls = wls = with In being the nj × nj identity matrix. α1a DEY(a), αa Y(0, a). j cluster wls = 2 It can be shown that varhc2 (α1a ) σDE(a)/Ja,represent- For the CADE, we consider the weighted two-stage least ing the between-cluster sample variance. However, as shown in squares regression where the weights are the same as before and Theorem 4, var DEY (a) is a weighted average of between- given in Equation (12). In our setting, the first-stage regression cluster and within-cluster sample variances. Thus, unlike the model is given by Equation (10) while the second-stage regres- results in Basse and Feller (2018), the cluster-robust HC2 vari- sion is given by ance no longer equals the randomization-based variance esti- mator, because it only takes into account the between-cluster ∗ = = + ∗ = + Yij βaI(Aj a) β1aDijI(Aj a) ηij, (13) variance. = = a 0,1 a 0,1 To address this problem, we introduce the following individ- ind wls where ηij is an error term. The weighted two-stage least squares ual individual-robust HC2 variance, varhc2(α) ,givenby estimators of the coefficients for the model in Equation (13) ⎧ ⎫ can be obtained by first fitting the model in Equation (10) ⎨J nj ⎬  −1 2∗2 ˜ −1   −1 with weighted least squares and then fitting the model in (X WX) ⎩ wijij Pij XijXij⎭(X WX) , ∗ j i=1 Equation (13) again via weighted least squares, in which Dij is replaced by its predicted values based on the first stage   − where P˜ = 1 − w X (X W X ) 1X is the individual anni- regression model. The following theorem establishes the ij ij ij j j j ij ∗ nj = −   = equivalence between the resulting weighted two-stage least hilator and ij ij i=1 i jI(Zi j z)/njz is the adjusted squares regression and randomization-based estimators. Proof = ∗ = residuals for Zij z so that we have Xj j 04. The next the- is given in Appendix C.2 in the supplementary materials. orem establishes that the weighted average of the cluster-robust and individual-robust HC2 variance estimators is numerically Theorem 7 (Weighted two-stage least squares regression estimator w2sls w2sls equivalent to the randomization-based variance estimator. for the CADE). Let βa and β1a be the weighted two-stage least squares estimators of the coefficients for the model given in Theorem 8 (Regression-based variance estimators for the ITT Equation (13). The first stage regression model is given in Equa- effects). The randomization-based variance estimator of the tion (10), and the regression weights are given in Equation (12). direct effect is a weighted average of the cluster-robust and Then, individual-robust HC2 variances, βw2sls = CADE (a), βw2sls = Y(0, a) − CADE (a) · D(0, a). 1a a = − Ja cluster wls + Ja ind wls var DED(a) 1 varhc2 (γ1a ) varhc2(γ1a ), J J 3.5.2. Variances = − Ja cluster wls + Ja ind wls Basse and Feller (2018) showed that the cluster-robust HC2 vari- var DEY(a) 1 varhc2 (α1a ) varhc2(α1a ). ance (Bell and McCaffrey 2002)isequaltotherandomization- J J based variance of the average spillover effect estimator under Proof is given in Appendix C.3 in the supplementary the assumption of equal cluster size. We first generalize this materials. equivalence result to the case where the cluster size varies and To gain some intuition about the weighted average of two then proposes a regression-based variance estimator for the robust variances, consider the following model commonly used CADE estimator that is equivalent to the randomization-based for split-plot designs, variance estimator. = ∗ We begin by introducing additional notation. Let Xj = α ( = ) + α ( = ) +  +   Yij aI Aj a 1aZijI Aj a Bj Wij, (X1j, ..., Xnjj) be the design matrix of cluster j for the model a=0,1 a=0,1 642 K. IMAI, Z. JIANG, AND A. MALANI where Bj represents the random effects for whole plots (or complier direct effect is a weighted average of the cluster-robust clusters), and Wij is the random effects for split-plots (or and individual-robust HC2 variances, individuals).Thecluster-robustHC2varianceisrelatedto Bj and the individual-robust HC2 variance is related to Wij. J var CADE (a) = 1 − a varcluster(βw2sls) In Appendix C.4 in the supplementary materials, we discuss J hc2 1a the connection between the random effects model and the + Ja ind w2sls randomization-based inference and explain why the adjustment varhc2(β1a ). ∗ J for ij is necessary. Finally, we consider the weighted two-stage least squares Proof is given in Appendix C.5 in the supplementary = regression given in Equations (10)and(13). Let Mj materials.  (M1j, ..., Mnjj) be the design matrix for cluster j in the second-  = = = ∗ = stage regression with Mij (I(Aj 1), I(Aj 0), DijI(Aj ∗ = ∗ 4. Empirical Analysis 1), DijI(Aj 0)) where Dij represents the fitted value given in =    In this section, we analyze the data introduced in Section 2 by Equation (10). Let M (M1 , ..., MJ ) be the entire design  applying the proposed methodology. We focus on the annual matrix. We use η = (η , ...,η ) to denote the residual j 1j njj household hospital expenditure, which ranges from 0 to INR vector in cluster j obtained from the model given in Equation    500,000 (USD 28,831) with the median value of 1,000 (USD (13), whereas η = (η , ...,η ) represents the residual 1 J 58). The outcome is missing for 926 households, which is less vector for the entire sample. We define the cluster-robust HC2 w2sls than 10% of the sample. For simplicity, we discard the observa- cluster(β ) variance, varhc2 ,as tions with missing data from the current analysis and leave the ⎧ ⎫ development of a method for analyzing two-stage randomized ⎨J ⎬  −1  ˜ −1/2 ˜ −1/2  −1 experiments with missing data to future research. (M WM) ⎩ Mj WjQj ηjηj Qj WjMj⎭ (M WM) , (14) j=1 As expected, the enrollment rate in the villages assigned to the “High” assignment mechanism is 67.0%, whereas the enroll- where Q is the cluster annihilator matrix, j ment rate in the villages under the “Low” assignment mecha- ˜ = − 1/2  −1  1/2 nism is just 46.2%. Because the encouragement proportion is Qj Inj M (M WM) Mj W . j j 80% under the “High” assignment mechanism and 40% under ind w2sls the “Low” assignment mechanism, this implies the existence The individual-robust HC2 variance, varhc2(β) ,isgivenby ⎧ ⎫ of two-sided noncompliance, in which some households in the ⎨J nj ⎬ treatment group did not receive the treatment and others in the  −1 2∗2 ˜ −1   −1 (M WM) ⎩ wijη Qij MijMij ⎭ (M WM) , control group managed to receive it. j=1 i=1 Table 2 presents the estimates of ITT effects and complier   − averagedirectandspillovereffects.Weshowtheresultsfor where Q˜ = 1 − w M (M W M ) 1M is the individual ij ij ij j j j ij both the individual/household-weighted and cluster/village- ∗ nj = −   = = annihilator and ηij ηij i=1 ηi jI(Zi j z)/njz for Zij z weighted estimands. In the top row, we show the estimated ∗ = is the adjusted residual with Xj ηj 04.Asinthecaseof average direct effect on enrollment in RSBY under the “High” ITT analysis, we can show that the weighted average of cluster- treatment mechanism (DED(1)) and under the “Low” treatment robust and individual-robust variance estimators is numerically mechanism (DED(0))aswellastheestimatedaveragespillover equivalent to the randomization-based variance estimator. effects under the treatment condition (SED(1))andthecontrol condition (SED(0)). The treatment assignment is estimated to Theorem 9 (Regression-based variance estimator for the CADE). increase the enrollment rate by more than 40 percentage points. The randomization-based variance estimator of the average This quantity represents the estimated proportions of compliers.

Table 2. Estimated intention-to-treat (ITT) and complier average direct and spillover effects. Enrollment in RSBY DED(1) DED(0) SED(1) SED(0) Household-weighted 0.482 (0.023) 0.441 (0.021) 0.086 (0.053) 0.045 (0.028) Village-weighted 0.457 (0.019) 0.445 (0.017) 0.044 (0.018) 0.031 (0.021) Hospital expenditure DEY(1) DEY(0) SEY(1) SEY(0) Household-weighted −795 (514) 875 (530) −1374 (823) 297 (858) Village-weighted −222 (575) 1666 (734) −1677 (972) 211 (761) Hospital expenditure CADE(1) CADE(0) CASE(1) CASE(0) Household-weighted −1649 (1061) 1984 (1215) −15,900 (15,342) 6568 (18,305) Village-weighted −485 (1258) 3752 (1652) −38,341 (26,845) 6846 (25,042) NOTE: For the “household-weighted”estimates, we equally weight households, whereas each village is equally weighted for the “village-weighted”estimates. The top row presents the average direct effects on enrollment in RSBY under the high treatment mechanism (DED(1)) and under the low treatment mechanism (DED(0))aswellasthe average spillover effects under the treatment (SED(1))andcontrol(SED(0)) conditions. The middle row presents the same set of ITT estimates for hospital expenditure. Finally, the bottom row presents the complier average direct effect under the “High” ((CADE 1))and“Low”(CADE(0)) treatment assignment mechanisms as well as the complier average spillover effect under the treatment ((CASE 1))andcontrol(CASE(0)) conditions. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION 643

Interestingly, we find that the average spillover effects are hospitals when many households newly enroll in the RSBY. The estimated to be positive and, sometimes, statistically significant. government can address this issue by increasing the capacity of In particular, the village-weighted average spillover effect on local hospitals. enrollment is 4.4 percentage points with the of Inadditiontothequantitiesinouranalysisabove,wemay 1.8 under the treatment condition. The finding implies that also be interested in other quantities, for example, the average assigning a greater proportion of households to the treatment spillover effect of the treatment assignment when all households condition makes another household of the same village more are assigned to the treatment condition versus all households likely to enroll in RSBY especially if the latter is also encouraged are assigned to the control condition, and the direct effect of to enroll.3 one’s own treatment receipt when the treatment receipts of other ThemiddlerowofTable 2 presents the estimated ITT effects households are fixed at some constant levels. Unfortunately, on the outcome. The estimated average direct effect under the without modeling assumptions, we are unable to identify these “Low” assignment mechanism (DEY(0))tendstobepositive quantities. In Appendix E in the supplementary materials, we where the village-weighted estimate is statistically significant. In propose a model-based approach and estimate these quantities contrast, the estimated average direct effect under the “High” using our application data. assignment mechanism (DEY(1))isnegativealthoughnotsta- tistically significant.4 One possible explanation for this difference is that the assign- 5. Concluding Remarks ment to the treatment condition makes people visit hospitals In this article, we consider two-stage randomized experiments more often and spend more on healthcare so long as fewer with noncompliance and interference. We merge two strands of householdswithinthesamevillageareassignedtothetreat- the causal inference literature, one on experiments with non- ment. When a large number of households within the same vil- compliance and the other on experiments with interference. lage are assigned to the treatment condition, the overcrowding We introduce new causal quantities of interest, propose non- of hospitals may reduce hospital visits of each treated household. parametric identification results and consistent estimators, and We examine the plausibility of this explanation by estimating derive their variances. We connect these randomization-based the direct effect of the treatment assignment on the number estimators to two-stage least squares regressions that are com- of hospital visits. The estimated direct effect under the “High” monly used by applied researchers. We apply the proposed treatment assignment mechanism is −0.157, whereas that under methodology to evaluate the efficacy of the India’s National the “Low” treatment assignment mechanism is 0.132. Although Health Insurance program (RSBY) and find some evidence of these estimates are not statistically significant, they provide sug- spillover effects. We believe that the proposed methodology can gestive evidence consistent with the overcrowding hypothesis. help applied researchers make best use of this effective exper- The bottom row of Table 2 presents the estimates of the com- imental design for studying interference problems. In future plier average direct and spillover effects. The village-weighted research, it is of interest to relax the assumption of partial CADE under the “Low” assignment mechanism (CADE(0))is interference and allow for interference between units of different positive and statistically significant, implying that enrollment clusters. in RSBY directly increases the household hospital expenditure whenfewhouseholdsareassignedtothetreatmentcondition.In contrast, the CADE under the “High” assignment mechanism Supplementary Materials ( ) (CADE 1 ) is negative. This difference is consistent with the The supplementary appendix includes the mathematical proofs, additional overcrowding hypothesis discussed above. simulation results, and an alternative parametric modeling strategy. In addition, we also estimate the CASEs. Unfortunately, they are imprecisely estimated, making it difficult to draw a defi- nite conclusion about whether or not the proportion of treated Acknowledgments households in a village directly affects one’s outcome among The proposed methodology is implemented via an open-source software those who enroll in RSBY only when a greater proportion of package experiment (Imai and Jiang 2018), which is available at https:// households is encouraged to sign up for the insurance program. cran.r-project.org/package=experiment.WethankNaokiEgamiforhelpful Becausemostoftheestimatesarenotstatisticallysignifi- comments. We thank the editor, associate editor, and three reviewers for cant, it is difficult to draw a definitive conclusion. However, careful reading and many constructive comments. our analysis provides some suggestive policy recommendations. First, the estimated positive spillover effect of encouragement Funding on enrollment suggests that the government could increase the enrollment rate by leveraging existing social networks among This study was partially funded by grants from the Law School, the households within each village. Second, the estimated nega- MacLean Center for Bioethics, and the Becker-Friedman Institute at the University of Chicago, U.S.; the Department for International tive CADE under the “High” treatment assignment mechanism Development, U.K.; the International Growth Centre, U.K.; the Tata Trusts condition suggests that there might be overcrowding of local through the Tata Centre for Development at the University of Chicago; and SRM University, Andhra Pradesh, India. 3This result should be viewed as an illustration that spillovers are possible in this study. We leave a more complete examination of spillover effects to subsequent work based on the prespecified analysis. ORCID 4This result should also be viewed as illustrative. Analysis pursuant to the pre- specified analysis is left to future work. Kosuke Imai http://orcid.org/0000-0002-2748-1022 644 K. IMAI, Z. JIANG, AND A. MALANI

References Imai, K., and Jiang, Z. (2018), “Experiment: R Package for Designing and Analyzing Randomized Experiments,” Comprehensive R Archive Angrist, J. D., Imbens, G. W., and Rubin, D. B. (1996), “Identification of Network (CRAN), available at https://CRAN.R-project.org/package= Causal Effects Using Instrumental Variables,” Journal of the American experiment.[633,643] Statistical Association, 91, 444–455. [632,636,637] Jowett, M. (2003), “Do Informal Risk Sharing Networks Crowd Out Public Aronow, P. M. (2012), “A General Method for Detecting Interference Voluntary Health Insurance? Evidence From Vietnam,” Applied Eco- Between Units in Randomized Experiments,” Sociological Methods & nomics, 35, 1153–1161. [633] Research, 41, 3–16. [632] Kang, H., and Imbens, G. (2016), “Peer Encouragement Designs in Causal Aronow, P. M., and Samii, C. (2017), “Estimating Average Causal Inference With Partial Interference and Identification of Local Average Effects Under General Interference,” Annals of Applied Statistics, 11, Network Effects,” arXiv no. 1609.04464. [632,638] 1912–1947. [632] Lehmann, E. L. (2004), Elements of Large-Sample Theory,Boston,MA: Athey, S., Eckles, D., and Imbens, G. W.(2018), “Exact p-Values for Network Springer. [638] Interference,” Journal of the American Statistical Association, 113, 230– Li, X., and Ding, P. (2017), “General Forms of Finite Population Central 240. [632] Limit Theorems With Applications to Causal Inference,” Journal of the Baird, S., Bohren, J. A., McIntosh, C., and Özler, B. (2018), “Optimal Design American Statistical Association, 112, 1759–1769. [640] of Experiments in the Presence of Interference,” The Review of Economics Lin, W., Liu, Y., and Meng, J. (2014), “The Crowding-Out Effect of Formal and Statistics, 100(5), 844–860. [632,640] Insurance on Informal Risk Sharing: An Experimental Study,” Games Basse, G., and Feller, A. (2018), “Analyzing Two-Stage Experiments in the and Economic Behavior, 86, 184–211. [633] Presence of Interference,” Journal of the American Statistical Association, Liu, L., and Hudgens, M. G. (2014), “Large Sample Randomization Infer- 113, 41–55. [632,633,635,640,641] ence of CAUSAl Effects in the Presence of Interference,” Journal of the Bell, R. M., and McCaffrey, D. F. (2002), “Bias Reduction in Standard Errors American Statistical Association, 109, 288–301. [632] for Linear Regression With Multi-Stage Samples,” Survey Methodology, Neyman, J. (1923), “On the Application of Probability Theory to Agricul- 28, 169–181. [641] tural Experiments: Essay on Principles, Section 9” (Translated in 1990), Berman, P., Ahuja, R., and Bhandari, L. (2010), “The Impoverishing Effect Statistical Science, 5, 465–480. [632,634] of Healthcare Payments in India: New Methodology and Findings,” Ohlsson, E. (1989), “Asymptotic Normality for Two-Stage Sampling From Economic and Political Weekly, 45, 65–71. [633] a Finite Population,” ProbabilityTheoryandRelatedFields, 81, 341–352. Chin, A. (2018), “Central Limit Theorems via Stein’s Method for Random- [640] ized Experiments Under Interference,” arXiv no. 1804.03105. [640] Rosenbaum, P. R. (2007), “Interference Between Units in Randomized Fisher, R. A. (1935), The Design of Experiments,London:Oliverand Experiments,” JournaloftheAmericanStatisticalAssociation, 102, 191– Boyd. [632] 200. [632] Forastiere, L., Airoldi, E. M., and Mealli, F. (2016), “Identification and Esti- Rubin, D. B. (1990), “Comments on ‘On the Application of Probability mation of Treatment and Interference Effects in Observational Studies Theory to Agricultural Experiments. Essay on Principles. Section 9’ on Networks,” arXiv no. 1609.06245. [632] by J. Splawa-Neyman Translated from the Polish and edited by D. M. Hájek, J. (1960), “Limiting Distributions in Simple Random Sampling From Dabrowska and T. P. Speed,” Statistical Science, 5, 472–480. [632,634] a Finite Population,” Publications of the Mathematical Institute of the Sävje, F., Aronow, P. M., and Hudgens, M. G. (2017), “Average Treatment Hungarian Academy of Sciences, 5, 361–374. [638,640] Effects in the Presence of Unknown Interference,”arXiv no. 1711.06399. Halloran, M. E., and Struchiner, C. J. (1995), “Causal Inference in Infectious [638,639] Diseases,” Epidemiology, 6, 142–151. [635] Shahrawat, R., and Rao, K. D. (2011), “Insured Yet Vulnerable: Out-of- Holland, P. W. (1986), “Statistics and Causal Inference” (with discus- Pocket Payments and India’s Poor,” Health Policy and Planning, 27, 213– sion), Journal of the American Statistical Association, 81, 945–960. 221. [633] [632,634] Sobel, M. E. (2006), “What Do Randomized Studies of Housing Mobility Hong, G. (2015), Causality in a Social World: Moderation, Mediation and Demonstrate? Causal Inference in the Face of Interference,” Journal of Spill-Over, Chichester: Wiley. [632] the American Statistical Association, 101, 1398–1407. [632,635] Hong, G., and Raudenbush, S. W. (2006), “Evaluating Kindergarten Reten- Tchetgen Tchetgen, E. J., and VanderWeele, T. J. (2010), “On Causal Infer- tion Policy: A Case Study of Causal Inference for Multilevel Observa- ence in the Presence of Interference,” Statistical Methods in Medical tional Data,” Journal of the American Statistical Association, 101, 901– Research, 21, 55–75. [632] 910. [635] Vanderweele, T. J., Hong, G., Jones, S. M., and Brown, J. L. (2013), “Medi- Hudgens, M. G., and Halloran, M. E. (2008), “Toward Causal Inference ation and Spillover Effects in Group-Randomized Trials: A Case Study With Interference,” Journal of the American Statistical Association, 103, of the 4Rs Educational Intervention,” Journal of the American Statistical 832–842. [632,634,635,636,638,639,640] Association, 108, 469–482. [632]