
arXiv:2003.06023v4 [econ.EM] 24 Apr 2021

Causal Spillover Effects Using Instrumental Variables∗

Gonzalo Vazquez-Bare†

April 27, 2021

Abstract

I set up a potential-outcomes framework to analyze spillover effects using instrumental variables. I characterize the population compliance types in a setting in which spillovers can occur on both treatment take-up and outcomes, and provide conditions for identification of the marginal distribution of compliance types. I show that intention-to-treat (ITT) parameters aggregate multiple direct and spillover effects for different compliance types, and hence do not have a clear link to causally interpretable parameters. Moreover, rescaling ITT parameters by first-stage estimands generally recovers a weighted combination of average effects where the sum of weights is larger than one. I then analyze identification of causal direct and spillover effects under one-sided noncompliance, and show that these effects can be estimated by 2SLS. I illustrate the proposed methods using data from an experiment on social interactions and voting behavior.

Keywords: causal inference, spillover effects, instrumental variables, treatment effects.

∗ I thank Matias Cattaneo, Clément de Chaisemartin, Xinwei Ma, Kenichi Nagasawa, Olga Namen, Dick Startz and Doug Steigerwald for valuable discussions and suggestions that greatly improved the paper. I also thank seminar participants at Stanford University, UCSB Applied Lunch, Northwestern University, UCLA, UC San Diego, University of Chicago and UC Davis for helpful comments.

† Department of Economics, University of California, Santa Barbara. [email protected]

1 Introduction

An accurate assessment of spillover effects is crucial to understand the costs and benefits of policies or treatments (Athey and Imbens, 2017; Abadie and Cattaneo, 2018). Previous literature has shown that appropriately designed randomized controlled trials (RCTs) are a powerful tool to analyze spillovers (Moffit, 2001; Duflo and Saez, 2003; Hudgens and Halloran, 2008; Baird et al., 2018; Vazquez-Bare, 2020). In particular, Vazquez-Bare (2020) shows that, when treatment is randomly assigned, causal average spillover effects can be identified and estimated under mild restrictions on a general treatment assignment mechanism by analyzing how average observed outcomes change in response to different configurations of own and peers’ treatment assignments.

The validity of this approach, however, relies on perfect compliance, by which individuals who are assigned to treatment effectively receive it and individuals who are assigned to the untreated condition remain effectively untreated. The assumption of perfect compliance is often restrictive, as individuals assigned to treatment may refuse it or individuals not assigned to treatment may find alternative sources to receive it. In other cases, researchers may not have control over the treatment assignment, and instead may need to rely on quasi-experimental variation from a natural experiment (see e.g. Angrist and Krueger, 2001; Titiunik, 2019). In such cases, actual treatment receipt becomes endogenous as it is the result of individual decisions, even if treatment assignment is randomized (or quasi-randomized). As a result, simple comparisons of outcomes under different treatment status configurations do not generally recover causally interpretable parameters. While previous literature has shown that instrumental variables (IVs) can address this type of endogeneity and identify local average treatment effects when spillovers are ruled out (Angrist et al., 1996), little is known about what an IV can identify in the presence of spillovers.

This paper provides a framework to study causal spillover effects using instrumental variables and offers three main contributions. First, I define causal direct and spillover effects in a setup with two-sided noncompliance. This setup highlights that, when treatment take-up is endogenous, spillover effects can occur in both treatment take-up, when the instrument values of a unit can affect its peers’ treatment status, and outcomes, when the treatment status of a unit can affect its peers’ responses. Focusing on the case in which spillovers occur within pairs (such as spouses or roommates), I propose a generalization of the monotonicity assumption (Imbens and Angrist, 1994) that partitions the population into five compliance types: in addition to always-takers, compliers and never-takers, units may be social-interaction compliers, who receive the treatment when either they or their peer is assigned to it, and group compliers, who only receive the treatment when both they and their peer are assigned to it. Proposition 1 provides conditions for identification of the marginal distribution of compliance types, and shows that the joint distribution is generally not identified.

Second, I analyze intention-to-treat (ITT) parameters and show that these estimands conflate multiple direct and indirect effects for different compliance subpopulations, and hence do not have a clear causal interpretation in general. Moreover, rescaling the ITT by the first-stage estimand, which would recover the average effect on compliers in the absence of spillovers, generally yields a weighted average of direct and spillover effects where the sum of weights exceeds one.

Third, I show that, when noncompliance is one-sided, it is possible to identify the average direct effect on compliers and the average spillover effect on units with compliant peers. I provide a way to assess the external validity of these parameters and discuss testable implications of the identification assumptions. In addition, I show that direct and indirect local average effects can be estimated by two-stage least squares (2SLS), which provides a straightforward way to estimate these effects in practice based on standard regression methods and to conduct inference that is robust to weak instruments using Fieller-Anderson-Rubin confidence intervals. The proposed methods are illustrated using data from an experiment on social interactions in the household and voter turnout.

Finally, I discuss two generalizations of my results. The first one considers the case in which the IV identification assumptions hold after conditioning on a set of observed covariates. The second one generalizes the results to arbitrary group sizes by introducing a novel assumption, independence of peers’ types, by which potential outcomes are independent of peers’ compliance types conditional on own type. I show how this alternative assumption permits identification under two-sided noncompliance by limiting the amount of heterogeneity in the distribution of potential outcomes.

This paper contributes to the literature on causal inference under interference, which generally focuses on the case of two-stage experimental designs with perfect compliance (see Halloran and Hudgens, 2016, for a recent review). Existing studies analyzing imperfect compliance (Sobel, 2006; Kang and Imbens, 2016; Kang and Keele, 2018; Imai et al., forthcoming) consider identification, estimation and inference for specific experimental designs or by imposing specific restrictions on the spillovers structure. My findings add to this literature by introducing a novel set of estimands and identification conditions that are independent of the experimental design and that simultaneously allow for spillovers on outcomes, spillovers on treatment take-up and multiple compliance types in a superpopulation setting.

This paper can also be linked to the literature on multiple instruments (Imbens and Angrist, 1994; Mogstad et al., 2019) since the vectors of own and peers’ instruments and treatments in spillover analysis can be rearranged as a multivalued instrument and a multivalued treatment. While most of the existing literature in this area considers the case of using multiple instruments for a single binary treatment, my setting with unrestricted spillovers introduces both multiple instruments and multiple treatments (see Remark 2 for further discussion).

The rest of the article is organized as follows. Section 2 describes the setup, introduces the notation and defines the causal parameters of interest. Section 3 analyzes intention-to-treat (ITT) parameters. Section 4 provides the main identification results under one-sided noncompliance. Section 5 analyzes estimation and inference, and Section 6 applies the proposed methods in an empirical setting. Finally, Section 7 generalizes the results to conditional-on-observables IV and multiple units per group, and Section 8 concludes. The supplemental appendix contains the technical proofs and additional results and discussions not included here to conserve space.

2 Setup

Consider a random sample of independent and identically distributed groups indexed by g = 1,...,G. Spillovers are assumed to occur between units in the same group, but not between units in different groups. I start by considering the case in which each group consists of two identically-distributed units, so that each unit i in group g has one peer. This setup has a wide range of applications in which groups consist, for example, of roommates in college dormitories (Sacerdote, 2001; Babcock et al., 2015; Garlick, 2018), spouses (Fletcher and Marksteiner, 2017; Foos and de Rooij, 2017), siblings (Barrera-Osorio et al., 2011), etc. Section 7 generalizes this setup to the case of multiple units per group.

The goal is to study the effect of a treatment or policy on an outcome of interest $Y_{ig}$, allowing for the possibility of within-group spillovers. Individual treatment status of unit i in group g is denoted by $D_{ig}$, taking values $d \in \{0,1\}$. For each unit i, $D_{jg}$ with $j \neq i$ is the treatment indicator corresponding to unit i’s peer. Treatment take-up can be endogenous and is allowed to be arbitrarily correlated with individual characteristics, both observable and unobservable. To address this endogeneity, I assume the researcher has access to a pair of instrumental variables, $(Z_{ig}, Z_{jg})$ for unit i and her peer j, taking values $(z, z') \in \{0,1\}^2$. These instruments are as-if randomly assigned in a sense formalized below. Borrowing from the literature on imperfect compliance in RCTs, I will often refer to the instruments $(Z_{ig}, Z_{jg})$ as “assignments”, as they indicate whether an individual is assigned (or encouraged) to get the treatment. However, all the results in the paper apply not only to cases in which the researcher has control over the assignment mechanism of $(Z_{ig}, Z_{jg})$, as in an encouragement design, but also to cases in which the instruments come from a natural experiment (see e.g. Angrist and Krueger, 2001; Titiunik, 2019).

For a given realization of the treatment status $(D_{ig}, D_{jg}) = (d, d')$ and the instruments $(Z_{ig}, Z_{jg}) = (z, z')$, the potential outcome for unit i in group g is a random variable denoted by $Y_{ig}(d, d', z, z')$. I assume that instruments do not directly affect potential outcomes, an assumption commonly known as the exclusion restriction.

Assumption 1 (Exclusion restriction) $Y_{ig}(d, d', z, z') = Y_{ig}(d, d', \tilde{z}, \tilde{z}')$ for all $(z, z', \tilde{z}, \tilde{z}')$.

Under Assumption 1, potential outcomes are a function of treatment status only, $Y_{ig}(d, d')$. In this setting, the direct effect of the treatment on unit i given peer’s treatment status $d'$ is defined as $Y_{ig}(1, d') - Y_{ig}(0, d')$ and the spillover effect on unit i given own treatment status $d$ is defined as $Y_{ig}(d, 1) - Y_{ig}(d, 0)$. The observed outcome for unit i in group g is the value of the potential outcome under the observed treatment realization, given by $Y_{ig} = Y_{ig}(D_{ig}, D_{jg})$.

The possibility of endogenous treatment status introduces an additional channel through which spillovers can materialize. For example, in an encouragement design targeted at couples, unit i may not be offered the incentive to participate in the program, so that $Z_{ig} = 0$, but she may hear about the program from her spouse who is assigned the incentive, $Z_{jg} = 1$, and decide to participate so that $D_{ig} = 1$. More generally, treatment status may depend on both own and peer’s assignment. To formalize this phenomenon, the potential treatment status for unit i in group g given instrument values $(Z_{ig}, Z_{jg}) = (z, z')$ will be denoted by $D_{ig}(z, z')$, and the spillover effect on unit i’s treatment status is $D_{ig}(z, 1) - D_{ig}(z, 0)$ for $z = 0, 1$. The observed treatment status is $D_{ig} = D_{ig}(Z_{ig}, Z_{jg})$.

The following assumption formalizes the requirement that instruments are as-if randomly assigned.

Assumption 2 (Independence) Let $y_{ig} = (Y_{ig}(d, d'))_{(d,d')}$ and $\bar{d}_{ig} = (D_{ig}(z, z'))_{(z,z')}$. Then, $(y_{ig}, \bar{d}_{ig}, y_{jg}, \bar{d}_{jg}) \perp\!\!\!\perp (Z_{ig}, Z_{jg})$.

Assumption 2 imposes statistical independence between the vectors of potential outcomes and treatment statuses and the instruments in each group, by which the instrument can be considered (as-if) randomly assigned. Section 7.1 offers an alternative version of this assumption in which independence holds after conditioning on a set of observable covariates.

The different values that $D_{ig}(z, z')$ can take determine each unit’s compliance type, which indicates how each unit’s treatment status responds to different configurations of instruments $(Z_{ig}, Z_{jg})$. A careful analysis of compliance types is crucial for defining and identifying causal parameters in this setting. As originally pointed out by Imbens and Angrist (1994), the instruments affect each compliance type in a different way, which in turn determines what parameters can and cannot be identified by harnessing the exogenous variation generated by the instruments, as I discuss next.

2.1 Compliance Types

As mentioned, the vector $(D_{ig}(0,0), D_{ig}(0,1), D_{ig}(1,0), D_{ig}(1,1))$, which indicates the unit’s treatment status for each possible assignment, determines each unit’s compliance type. For example, a unit with $D_{ig}(z, z') = 0$ for all $(z, z')$ always refuses the treatment regardless of her own and her peer’s assignment. A unit with $D_{ig}(z, z') = 1$ for all $(z, z')$ always receives the treatment regardless of her own and peer’s assignment. A unit with $D_{ig}(1,1) = D_{ig}(1,0) = 1$ and $D_{ig}(0,1) = D_{ig}(0,0) = 0$ only receives the treatment when she is assigned to it, regardless of her peer’s assignment, and so on. Without further restrictions, there are 16 different compliance types in the population. To reduce the number of compliance types, I will introduce the following restriction that generalizes the commonly invoked monotonicity assumption (Imbens and Angrist, 1994) to the spillovers case.

Table 1: Population types

D_ig(1,1)  D_ig(1,0)  D_ig(0,1)  D_ig(0,0)  Type
    1          1          1          1      Always-taker (AT)
    1          1          1          0      Social-interaction complier (SC)
    1          1          0          0      Complier (C)
    1          0          0          0      Group complier (GC)
    0          0          0          0      Never-taker (NT)

Assumption 3 (Monotonicity) For all i and g, $D_{ig}(1,1) \ge D_{ig}(1,0) \ge D_{ig}(0,1) \ge D_{ig}(0,0)$.

Assumption 3 reduces the compliance types to five. Table 1 lists the five different compliance types in the population under Assumption 3. Always-takers (AT) are units who receive treatment regardless of own and peer treatment assignment. Social-interaction compliers (SC), a term coined by Duflo and Saez (2003), are units who receive the treatment as soon as someone in their group (either themselves or their peer) is assigned to it. Compliers (C) are units that receive the treatment if and only if they are assigned to it. Group compliers (GC) are units who only receive the treatment when their whole group (i.e. both themselves and their peer) is assigned to treatment. Finally, never-takers (NT) are never treated regardless of own and peer’s assignment. The categories in Table 1 are listed in decreasing order of likelihood of being treated. Note that this assumption is not testable, as one can only observe one out of the four possible potential treatment statuses, and hence its validity needs to be assessed on a case-by-case basis.
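The count of compliance types can be checked mechanically. The following is a minimal sketch, not part of the original analysis: it enumerates all 16 binary vectors of potential treatment statuses and keeps those satisfying the monotonicity ordering. The names `all_types`, `is_monotone` and `labels` are illustrative.

```python
from itertools import product

# Each type is the vector (D(1,1), D(1,0), D(0,1), D(0,0)) of potential
# treatment statuses, in the column order of Table 1.
all_types = list(product([0, 1], repeat=4))

def is_monotone(t):
    """Monotonicity ordering: D(1,1) >= D(1,0) >= D(0,1) >= D(0,0)."""
    d11, d10, d01, d00 = t
    return d11 >= d10 >= d01 >= d00

# Sort from most to least likely to be treated, as in Table 1.
monotone_types = sorted((t for t in all_types if is_monotone(t)), reverse=True)

labels = {
    (1, 1, 1, 1): "AT",
    (1, 1, 1, 0): "SC",
    (1, 1, 0, 0): "C",
    (1, 0, 0, 0): "GC",
    (0, 0, 0, 0): "NT",
}

print(len(all_types), "unrestricted types")
for t in monotone_types:
    print(t, labels[t])
```

The five surviving vectors coincide with the rows of Table 1.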

Remark 1 (Monotonicity and ordering) The ordering in Assumption 3 is without loss of generality and can be rearranged depending on the context. For example, if the treatment is alcohol consumption and the instrument is a randomly assigned incentive to reduce alcohol consumption, the inequalities can be reversed, $D_{ig}(1,1) \le D_{ig}(1,0) \le D_{ig}(0,1) \le D_{ig}(0,0)$. The key restriction introduced by monotonicity is that there exists an ordering between potential treatment statuses that is the same for all units in the population.

Remark 2 (Connection to multi-valued instruments and treatments) The setup in this paper can be mapped into one where the pair $(Z_{ig}, Z_{jg})$ is viewed as a multi-valued instrument $A_{ig} = 2Z_{ig} + Z_{jg} \in \{0,1,2,3\}$ and $(D_{ig}, D_{jg})$ is viewed as a multi-valued treatment $T_{ig} = 2D_{ig} + D_{jg} \in \{0,1,2,3\}$, where $T_{ig}(a)$ denotes potential treatment status under assignment $a$. This setup is analyzed under general conditions by Heckman and Pinto (2018). As shown in the upcoming sections, the spillovers case introduces specific modeling restrictions and a different way to interpret heterogeneity in treatment effects. In particular, unordered monotonicity (Assumption A-3 in Heckman and Pinto, 2018) does not hold in this setting. For example, unordered monotonicity requires that either $\mathbb{1}(T_{ig}(1) = 3) \ge \mathbb{1}(T_{ig}(2) = 3)$ for all i, g or $\mathbb{1}(T_{ig}(1) = 3) \le \mathbb{1}(T_{ig}(2) = 3)$ for all i, g. In my setting, $\mathbb{1}(T_{ig}(1) = 3) = D_{ig}(0,1)D_{jg}(1,0)$ and $\mathbb{1}(T_{ig}(2) = 3) = D_{ig}(1,0)D_{jg}(0,1)$. Now, in the event $\{AT_{ig}, C_{jg}\}$, we have that $D_{ig}(0,1)D_{jg}(1,0) = 1 > D_{ig}(1,0)D_{jg}(0,1) = 0$, whereas in the event $\{C_{ig}, AT_{jg}\}$, $D_{ig}(0,1)D_{jg}(1,0) = 0 < D_{ig}(1,0)D_{jg}(0,1) = 1$, and hence neither weak inequality holds for all units.
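The counterexample in Remark 2 can be verified directly. The sketch below is illustrative only: the function names are hypothetical, and it encodes just the two types involved (always-takers and compliers), evaluating the indicator that both units are treated under assignments a = 1 and a = 2.

```python
# Potential treatment D(z_own, z_peer) for the two types used in Remark 2.
def d_always_taker(z_own, z_peer):
    return 1

def d_complier(z_own, z_peer):
    return z_own

def both_treated(d_i, d_j, z_i, z_j):
    """Indicator 1(T_ig(a) = 3) for a = 2*z_i + z_j, i.e. D_ig = D_jg = 1."""
    return d_i(z_i, z_j) * d_j(z_j, z_i)

# Event {AT_ig, C_jg}: assignment a = 1 is (0, 1), a = 2 is (1, 0).
a1_at_c = both_treated(d_always_taker, d_complier, 0, 1)
a2_at_c = both_treated(d_always_taker, d_complier, 1, 0)

# Event {C_ig, AT_jg}.
a1_c_at = both_treated(d_complier, d_always_taker, 0, 1)
a2_c_at = both_treated(d_complier, d_always_taker, 1, 0)

print("AT_i, C_j :", a1_at_c, a2_at_c)
print("C_i, AT_j :", a1_c_at, a2_c_at)
```

The two events order the indicators in opposite directions, so no single weak inequality can hold for every unit.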

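In what follows, let $\xi_{ig}$ denote a random variable indicating unit i’s compliance type, $\xi_{ig} \in \{AT, SC, C, GC, NT\}$. Also, let $C_{ig}$ denote the event that unit i in group g is a complier, $C_{ig} = \{\xi_{ig} = C\}$, and similarly for $AT_{ig} = \{\xi_{ig} = AT\}$, $SC_{ig} = \{\xi_{ig} = SC\}$ and so on. Under Assumptions 1, 2 and 3, the marginal distribution of compliance types in the population is identified, as the following proposition shows.

Proposition 1 (Distribution of compliance types) Under Assumptions 1, 2 and 3,

\begin{align*}
\mathbb{P}[AT_{ig}] &= \mathbb{E}[D_{ig} \mid Z_{ig} = 0, Z_{jg} = 0] \\
\mathbb{P}[SC_{ig}] &= \mathbb{E}[D_{ig} \mid Z_{ig} = 0, Z_{jg} = 1] - \mathbb{E}[D_{ig} \mid Z_{ig} = 0, Z_{jg} = 0] \\
\mathbb{P}[C_{ig}] &= \mathbb{E}[D_{ig} \mid Z_{ig} = 1, Z_{jg} = 0] - \mathbb{E}[D_{ig} \mid Z_{ig} = 0, Z_{jg} = 1] \\
\mathbb{P}[GC_{ig}] &= \mathbb{E}[D_{ig} \mid Z_{ig} = 1, Z_{jg} = 1] - \mathbb{E}[D_{ig} \mid Z_{ig} = 1, Z_{jg} = 0]
\end{align*}

and $\mathbb{P}[NT_{ig}] = 1 - \mathbb{P}[AT_{ig}] - \mathbb{P}[SC_{ig}] - \mathbb{P}[C_{ig}] - \mathbb{P}[GC_{ig}]$. Finally, $\mathbb{P}[AT_{ig}, AT_{jg}] = \mathbb{E}[D_{ig}D_{jg} \mid Z_{ig} = 0, Z_{jg} = 0]$ and $\mathbb{P}[NT_{ig}, NT_{jg}] = \mathbb{E}[(1 - D_{ig})(1 - D_{jg}) \mid Z_{ig} = 1, Z_{jg} = 1]$.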

All the proofs can be found in the supplemental appendix. Proposition 1 can be used to test for the presence of average spillover effects on treatment status. Note that under Assumption 3, $\mathbb{E}[D_{ig}(0,1) - D_{ig}(0,0)] = \mathbb{P}[SC_{ig}]$ and $\mathbb{E}[D_{ig}(1,1) - D_{ig}(1,0)] = \mathbb{P}[GC_{ig}]$, and thus testing for the presence of average spillover effects on treatment status amounts to testing for the presence of social-interaction compliers and group compliers. Because the instruments are as-if randomly assigned, these issues can be analyzed within the framework in Vazquez-Bare (2020).
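As an illustration of Proposition 1, the following sketch simulates take-up for one unit per group and recovers the marginal type shares from the four first-stage cell means. Everything in it is an assumption made for illustration: the sample size, the type shares in `true_shares`, the independent fair-coin instruments, and the helper name `first_stage`. It is not the estimation procedure of Section 5.

```python
import numpy as np

rng = np.random.default_rng(0)
G = 200_000  # number of groups (hypothetical sample size)

# Rows: potential take-up D(z, z') for each type.
# Columns are indexed by 2*z + z', i.e. (0,0), (0,1), (1,0), (1,1).
TYPES = ["AT", "SC", "C", "GC", "NT"]
D_TABLE = np.array([
    [1, 1, 1, 1],  # AT
    [0, 1, 1, 1],  # SC
    [0, 0, 1, 1],  # C
    [0, 0, 0, 1],  # GC
    [0, 0, 0, 0],  # NT
])
true_shares = np.array([0.10, 0.15, 0.40, 0.15, 0.20])  # assumed values

# Types are drawn independently of the instruments (Assumption 2).
xi = rng.choice(len(TYPES), size=G, p=true_shares)
z_own = rng.integers(0, 2, size=G)
z_peer = rng.integers(0, 2, size=G)
d_obs = D_TABLE[xi, 2 * z_own + z_peer]

def first_stage(z, zp):
    """Sample analogue of E[D_ig | Z_ig = z, Z_jg = zp]."""
    return d_obs[(z_own == z) & (z_peer == zp)].mean()

est = {
    "AT": first_stage(0, 0),
    "SC": first_stage(0, 1) - first_stage(0, 0),
    "C": first_stage(1, 0) - first_stage(0, 1),
    "GC": first_stage(1, 1) - first_stage(1, 0),
}
est["NT"] = 1.0 - sum(est.values())

for name, share in zip(TYPES, true_shares):
    print(f"{name}: true {share:.2f}, estimated {est[name]:.3f}")
```

Only the marginal distribution of one unit's type is simulated here, since the peer's type plays no role in these four cell means.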

2.2 Causal Parameters and Estimands of Interest

In the presence of spillovers, average direct and spillover effects can be defined as differences between average potential outcomes under different treatment configurations, $\mathbb{E}[Y_{ig}(d, d') - Y_{ig}(\tilde{d}, \tilde{d}')]$. When the treatment vector $(D_{ig}, D_{jg})$ is randomly assigned, average potential outcomes (and thus treatment effects) are identified by the relationship $\mathbb{E}[Y_{ig}(d, d')] = \mathbb{E}[Y_{ig} \mid D_{ig} = d, D_{jg} = d']$ (see Vazquez-Bare, 2020, and references therein).

When the treatment is endogenous, average causal effects are generally not point identified. Imbens and Angrist (1994) analyze such settings in the absence of spillovers, that is, when $Y_{ig}(d, d') = Y_{ig}(d)$ and $D_{ig}(z, z') = D_{ig}(z)$. They show that the instrument’s exogenous variation can be leveraged to identify the local average effect on compliers, $\mathbb{E}[Y_{ig}(1) - Y_{ig}(0) \mid D_{ig}(1) > D_{ig}(0)]$, emphasizing that average potential outcomes and treatment effects can vary over compliance types, and that the instrument only provides identifying variation on the specific subpopulation whose behavior is affected by the instrument. In the presence of spillovers, without further restrictions, average potential outcomes can depend on both own and peer’s compliance types, $\mathbb{E}[Y_{ig}(d, d') \mid \xi_{ig} = \xi, \xi_{jg} = \xi']$. I refer to these objects as “local average potential outcomes”.

The upcoming section shows that, in general, the simultaneous presence of spillovers on treatment status and outcomes can impede identification of causally-interpretable parameters even when the instruments are randomly assigned. However, Section 4 shows that when noncompliance is one-sided, it is possible to identify the average direct effect on compliers, $\mathbb{E}[Y_{ig}(1,0) - Y_{ig}(0,0) \mid C_{ig}]$, and the average spillover effect on units with compliant peers, $\mathbb{E}[Y_{ig}(0,1) - Y_{ig}(0,0) \mid C_{jg}]$. Section 7 provides an alternative identification assumption that applies to the case of multiple units per group.

3 Intention-to-Treat Parameters

Intention-to-treat (ITT) analysis focuses on the variation in $Y_{ig}$ generated by the instruments. Imbens and Angrist (1994) showed that, in the absence of spillovers, the ITT estimand $\mathbb{E}[Y_{ig} \mid Z_{ig} = 1] - \mathbb{E}[Y_{ig} \mid Z_{ig} = 0]$ is an attenuated measure of the average treatment effect on compliers, or local average treatment effect (LATE). Furthermore, the LATE can be easily recovered by rescaling the ITT parameter by the proportion of compliers, which is identified under monotonicity and as-if random assignment of the instrument. This section shows that, in the presence of spillovers, the link between ITT parameters and local average effects is much less clear, as the former will conflate multiple potentially different effects into a single number that may be hard to interpret in the presence of effect heterogeneity.

I will refer to differences in average outcomes from changing own instrument while holding the peer’s instrument fixed as direct ITT parameters, $\mathbb{E}[Y_{ig} \mid Z_{ig} = 1, Z_{jg} = z'] - \mathbb{E}[Y_{ig} \mid Z_{ig} = 0, Z_{jg} = z']$, and to differences from fixing own instrument and varying the peer’s instrument as indirect or spillover ITT parameters, $\mathbb{E}[Y_{ig} \mid Z_{ig} = z, Z_{jg} = 1] - \mathbb{E}[Y_{ig} \mid Z_{ig} = z, Z_{jg} = 0]$. Finally, the total ITT is defined as $\mathbb{E}[Y_{ig} \mid Z_{ig} = 1, Z_{jg} = 1] - \mathbb{E}[Y_{ig} \mid Z_{ig} = 0, Z_{jg} = 0]$.

The following result links the direct ITT estimand to potential outcomes. In what follows, the notation $\{C_{ig}, SC_{ig}\} \times \{AT_{jg}\}$ refers to the event $(C_{ig} \cap AT_{jg}) \cup (SC_{ig} \cap AT_{jg})$, that is, unit j is an always-taker and unit i can be a complier or a social complier. Similarly, $\{C_{ig}, SC_{ig}\} \times \{C_{jg}, GC_{jg}, NT_{jg}\}$ represents all the combinations in which unit i is a complier or a social complier and unit j is a complier, a group complier or a never-taker, and so on.

Lemma 1 (Direct ITT effects) Under Assumptions 1-3,

\begin{align*}
&\mathbb{E}[Y_{ig} \mid Z_{ig} = 1, Z_{jg} = 0] - \mathbb{E}[Y_{ig} \mid Z_{ig} = 0, Z_{jg} = 0] = \\
&\quad \mathbb{E}[Y_{ig}(1,0) - Y_{ig}(0,0) \mid \{C_{ig}, SC_{ig}\} \times \{C_{jg}, GC_{jg}, NT_{jg}\}] \, \mathbb{P}[\{C_{ig}, SC_{ig}\} \times \{C_{jg}, GC_{jg}, NT_{jg}\}] \\
&\quad + \mathbb{E}[Y_{ig}(1,1) - Y_{ig}(0,0) \mid \{C_{ig}, SC_{ig}\} \times \{SC_{jg}\}] \, \mathbb{P}[\{C_{ig}, SC_{ig}\} \times \{SC_{jg}\}] \\
&\quad + \mathbb{E}[Y_{ig}(1,1) - Y_{ig}(0,1) \mid \{C_{ig}, SC_{ig}\} \times \{AT_{jg}\}] \, \mathbb{P}[\{C_{ig}, SC_{ig}\} \times \{AT_{jg}\}] \\
&\quad + \mathbb{E}[Y_{ig}(0,1) - Y_{ig}(0,0) \mid \{GC_{ig}, NT_{ig}\} \times \{SC_{jg}\}] \, \mathbb{P}[\{GC_{ig}, NT_{ig}\} \times \{SC_{jg}\}] \\
&\quad + \mathbb{E}[Y_{ig}(1,1) - Y_{ig}(1,0) \mid AT_{ig}, SC_{jg}] \, \mathbb{P}[AT_{ig}, SC_{jg}].
\end{align*}

The corresponding results for the indirect ITT and the total ITT are analogous, and are presented in Section A of the supplemental appendix to conserve space.

To interpret the above result, consider the effect of switching $Z_{ig}$ from 0 to 1, while leaving $Z_{jg}$ fixed at zero. First, if unit i is either a complier or a social complier, switching $Z_{ig}$ from 0 to 1 will change her treatment status $D_{ig}$ from 0 to 1. This case corresponds to the first three expectations on the right-hand side of Lemma 1. Now, if unit j is a complier, a group complier or a never-taker, her observed treatment status would be $D_{jg} = 0$. Hence, in these cases, switching $Z_{ig}$ from 0 to 1 while leaving $Z_{jg}$ fixed at zero would let us observe $Y_{ig}(1,0) - Y_{ig}(0,0)$. This corresponds to the first expectation on the right-hand side of Lemma 1. On the other hand, if unit j was a social complier, switching $Z_{ig}$ from 0 to 1 would push her to get the treatment, and hence in this case we would see $Y_{ig}(1,1) - Y_{ig}(0,0)$. This case corresponds to the second expectation on the right-hand side of Lemma 1. If instead unit j was an always-taker, she would be treated in both scenarios, so we would see $Y_{ig}(1,1) - Y_{ig}(0,1)$ (third expectation of the above display). Next, suppose unit i was a group complier or a never-taker. Then, switching $Z_{ig}$ from 0 to 1 would not affect her treatment status, which would be fixed at 0, but it would affect unit j’s treatment status if she is a social complier. This case is shown in the fourth expectation on the right-hand side of Lemma 1. Finally, if unit i was an always-taker, her treatment status would be fixed at 1 but her peer’s treatment status would switch from 0 to 1 if unit j was a social complier. This case is shown in the last expectation on the right-hand side of Lemma 1.

Hence the direct ITT effect is averaging five different treatment effects, $Y_{ig}(1,0) - Y_{ig}(0,0)$, $Y_{ig}(1,1) - Y_{ig}(0,0)$, $Y_{ig}(1,1) - Y_{ig}(1,0)$, $Y_{ig}(0,1) - Y_{ig}(0,0)$, and $Y_{ig}(1,1) - Y_{ig}(0,1)$, each one over different combinations of compliance types. Therefore, Lemma 1 shows that, even when fixing the peer’s assignment, the ITT parameter is unable to isolate direct and indirect effects, which blurs its link to causal effects.
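The case-by-case reasoning behind Lemma 1 can be reproduced by brute force. The sketch below is an illustration under the compliance types of Table 1, with hypothetical names: for every pair of own and peer types, it computes the treatment pair under assignments (1, 0) and (0, 0) and groups the pairs by the potential-outcome contrast they reveal.

```python
from itertools import product

# D(z_own, z_peer) for each compliance type (Table 1).
D = {
    "AT": lambda z, zp: 1,
    "SC": lambda z, zp: max(z, zp),
    "C": lambda z, zp: z,
    "GC": lambda z, zp: z * zp,
    "NT": lambda z, zp: 0,
}

contrasts = {}
for ti, tj in product(D, repeat=2):
    # Treatment pair (D_ig, D_jg) when (Z_ig, Z_jg) = (1, 0) versus (0, 0).
    # The peer's own instrument is Z_jg and her peer's instrument is Z_ig.
    after = (D[ti](1, 0), D[tj](0, 1))
    before = (D[ti](0, 0), D[tj](0, 0))
    if after != before:
        contrasts.setdefault((after, before), []).append((ti, tj))

for (after, before), cells in sorted(contrasts.items()):
    print(f"Y{after} - Y{before}: {cells}")
```

Type pairs whose treatment vector is unaffected by the switch contribute a zero difference and are omitted, which leaves exactly the five contrasts listed in the lemma.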

In some contexts, ITT parameters are deemed policy relevant as they measure the “effect” of offering the treatment or making the treatment available, as opposed to the effect of the treatment itself (Abadie and Cattaneo, 2018). This interpretation is based on the fact that, without spillovers, the ITT is $\mathbb{E}[Y_{ig} \mid Z_{ig} = 1] - \mathbb{E}[Y_{ig} \mid Z_{ig} = 0] = \mathbb{E}[Y_{ig}(1) - Y_{ig}(0) \mid C_{ig}] \, \mathbb{P}[C_{ig}]$, which is the local average treatment effect, down-weighted by the compliance rate. In particular, this well-known fact has three implications that facilitate the interpretation of the ITT as a policy-relevant parameter: (i) it has the same sign as the LATE, and it equals zero if and only if the LATE is zero (unless the instrument is completely irrelevant); (ii) it is a lower bound for the LATE (in absolute terms); and (iii) it is proportional to the LATE, so it can be easily rescaled to recover the LATE.

Lemma 1 shows that the close link between the ITT and the LATE breaks down in the presence of spillovers. First, the ITT now combines multiple average direct and spillover effects which can have different signs and magnitudes. In particular, this implies that the ITT could be zero even if all treatment effects are non-zero. Second, for this same reason, the ITT is no longer a lower bound for any of the direct or spillover effects. Finally, rescaling the ITT by the first stage does not recover a treatment effect. Specifically, the weights from the direct ITT sum to $\mathbb{P}[C_{ig}] + \mathbb{P}[SC_{ig}] + \mathbb{P}[SC_{ig}, GC_{jg}] + \mathbb{P}[SC_{ig}, NT_{jg}] + \mathbb{P}[SC_{ig}, AT_{jg}]$, whereas $\mathbb{E}[D_{ig} \mid Z_{ig} = 1, Z_{jg} = 0] - \mathbb{E}[D_{ig} \mid Z_{ig} = 0, Z_{jg} = 0] = \mathbb{P}[C_{ig}] + \mathbb{P}[SC_{ig}]$ from Proposition 1. As an illustration, consider the (extreme) case in which types are independent and equally likely, so that $\mathbb{P}[\xi_{ig} = \xi, \xi_{jg} = \xi'] = 1/25$. In this case the rescaled weights equal (0.6, 0.2, 0.2, 0.2, 0.1), which sum to 1.3.
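The arithmetic of this illustration can be checked exactly with rational numbers. The sketch below assumes, as in the text, that each of the 25 type pairs has probability 1/25; the helper `prob` and the variable names are illustrative.

```python
from fractions import Fraction

TYPES = ["AT", "SC", "C", "GC", "NT"]
p = Fraction(1, 25)  # probability of each (own type, peer type) pair

def prob(own, peer):
    """Probability that own type is in `own` and peer type is in `peer`."""
    return p * len(own) * len(peer)

# Weights of the five terms in Lemma 1, in the order they appear.
weights = [
    prob(["C", "SC"], ["C", "GC", "NT"]),
    prob(["C", "SC"], ["SC"]),
    prob(["C", "SC"], ["AT"]),
    prob(["GC", "NT"], ["SC"]),
    prob(["AT"], ["SC"]),
]

# First stage from Proposition 1: P[C_ig] + P[SC_ig].
first_stage = prob(["C", "SC"], TYPES)

rescaled = [w / first_stage for w in weights]
print([float(w) for w in rescaled], float(sum(rescaled)))
```

The first three weights alone already sum to one, so the extra mass comes entirely from the terms in which unit i's own treatment status does not change.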

Remark 3 (Spillovers and instrument validity) Another way to interpret the result in Lemma 1 is to think of spillovers in treatment take-up as violating instrument validity. Since $D_{jg}$ is a function of $Z_{ig}$, the instrument $Z_{ig}$ can affect the outcome $Y_{ig}$ not only through the variable it is instrumenting, $D_{ig}$, but also through another variable, $D_{jg}$. Thus, spillovers on treatment take-up may render an instrument invalid even when the instrument would have been valid in the absence of spillovers. This fact shows that identification of causal parameters based on $(Z_{ig}, Z_{jg})$ will require further assumptions, as discussed in the next section.

In all, this section shows that ITT parameters are generally not a useful measure of average direct and spillover effects. Instead of focusing on ITT parameters, which rely exclusively on variation generated by the instruments $(Z_{ig}, Z_{jg})$, an alternative approach for identification is to exploit the combined variation in $(Z_{ig}, Z_{jg}, D_{ig}, D_{jg})$. Imbens and Rubin (1997) show that, in the absence of spillovers, this approach allows for separate point identification of average (or distributions of) potential outcomes for compliers. This approach, however, breaks down in the presence of spillovers. The reason is that, without further assumptions, the possible combinations of treatment and instrument values are not enough to disentangle all the different compliance types. Further details on this issue are provided in Section A.3 of the supplemental appendix. In what follows, I show that point identification can be restored when the degree of noncompliance is restricted.

4 Identification Under One-sided Noncompliance

This section shows that identification of some causal parameters can be achieved by limiting the amount of noncompliance. I will analyze the case in which noncompliance is one-sided. One-sided noncompliance refers to the case in which individual deviations from their assigned treatment, $D_{ig} \neq Z_{ig}$, can only occur in one direction.

In many applications, units who are not assigned to treatment are unable to get the treatment through other channels. For example, consider a voter turnout experiment in which individuals in two-voter households are randomly assigned to receive a telephone call encouraging them to vote (as in Foos and de Rooij, 2017). In this case, units that are assigned $Z_{ig} = 1$ may fail to receive the actual phone call (for example, because they refuse to pick up the phone), in which case $Z_{ig} = 1$ and $D_{ig} = 0$, but whenever a unit is assigned $Z_{ig} = 0$, this automatically implies $D_{ig} = 0$. More generally, one-sided noncompliance is common when the experimenter is the only provider of a treatment (Abadie and Cattaneo, 2018). I formalize this case as follows.

Assumption 4 (One-sided Noncompliance - OSN) For all i and g, Dig(0, 1) = Dig(0, 0) = 0.

One-sided noncompliance implies the absence of always-takers and social-interaction compliers. In what follows, all the results focus on identifying the expectation of potential outcomes, but these results immediately generalize to identification of marginal distributions of potential outcomes by replacing $Y_{ig}$ by $\mathbb{1}(Y_{ig} \le y)$.

Proposition 2 (Local average potential outcomes under OSN) Under Assumptions 1-4, the following equalities hold:

\begin{align*}
\mathbb{E}[Y_{ig}(0,0)] &= \mathbb{E}[Y_{ig} \mid Z_{ig} = 0, Z_{jg} = 0] \\
\mathbb{E}[Y_{ig}(1,0) \mid C_{ig}] \, \mathbb{P}[C_{ig}] &= \mathbb{E}[Y_{ig} D_{ig} \mid Z_{ig} = 1, Z_{jg} = 0] \\
\mathbb{E}[Y_{ig}(0,1) \mid C_{jg}] \, \mathbb{P}[C_{jg}] &= \mathbb{E}[Y_{ig} D_{jg} \mid Z_{ig} = 0, Z_{jg} = 1] \\
\mathbb{E}[Y_{ig}(0,0) \mid C_{ig}] \, \mathbb{P}[C_{ig}] &= \mathbb{E}[Y_{ig} \mid Z_{ig} = 0, Z_{jg} = 0] - \mathbb{E}[Y_{ig}(1 - D_{ig}) \mid Z_{ig} = 1, Z_{jg} = 0] \\
\mathbb{E}[Y_{ig}(0,0) \mid C_{jg}] \, \mathbb{P}[C_{jg}] &= \mathbb{E}[Y_{ig} \mid Z_{ig} = 0, Z_{jg} = 0] - \mathbb{E}[Y_{ig}(1 - D_{jg}) \mid Z_{ig} = 0, Z_{jg} = 1] \\
\mathbb{E}[Y_{ig}(0,0) \mid NT_{ig}, NT_{jg}] \, \mathbb{P}[NT_{ig}, NT_{jg}] &= \mathbb{E}[Y_{ig}(1 - D_{ig})(1 - D_{jg}) \mid Z_{ig} = 1, Z_{jg} = 1]
\end{align*}

where $\mathbb{P}[NT_{ig}, NT_{jg}] = \mathbb{E}[(1 - D_{ig})(1 - D_{jg}) \mid Z_{ig} = 1, Z_{jg} = 1]$.

Combined with Proposition 1, the above result shows which local average potential outcomes can be identified by exploiting variation in the observed treatment status and instruments $(D_{ig}, D_{jg}, Z_{ig}, Z_{jg})$. Proposition 2 has the following implication.

Corollary 1 (Local average direct and spillover effects under OSN) Under Assumptions 1-4, if $P[C_{ig}] > 0$,

\[
E[Y_{ig}(1,0) - Y_{ig}(0,0) \mid C_{ig}] = \frac{E[Y_{ig} \mid Z_{ig}=1, Z_{jg}=0] - E[Y_{ig} \mid Z_{ig}=0, Z_{jg}=0]}{E[D_{ig} \mid Z_{ig}=1, Z_{jg}=0]}
\]

and

\[
E[Y_{ig}(0,1) - Y_{ig}(0,0) \mid C_{jg}] = \frac{E[Y_{ig} \mid Z_{ig}=0, Z_{jg}=1] - E[Y_{ig} \mid Z_{ig}=0, Z_{jg}=0]}{E[D_{jg} \mid Z_{ig}=0, Z_{jg}=1]}.
\]

In the above result, $E[Y_{ig}(1,0) - Y_{ig}(0,0) \mid C_{ig}]$ represents the average direct effect on compliers with untreated peers, whereas $E[Y_{ig}(0,1) - Y_{ig}(0,0) \mid C_{jg}]$ is the average effect on untreated units with compliant peers. See Section 6 for a detailed discussion on these estimands in the context of an empirical application. In addition, Section A.4 generalizes these results to the case of multiple treatment levels.
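The two ratios in Corollary 1 can be computed by replacing each conditional expectation with a cell mean. The following is a minimal sketch, not the paper's code: the function name, argument names and the toy data are hypothetical, and the arrays are assumed to hold one entry per unit.

```python
import numpy as np

def conditional_wald(y, d_own, d_peer, z_own, z_peer):
    """Sample analogs of the two ratios in Corollary 1.

    Inputs are 1-D arrays with one entry per unit (i, g); the names are
    hypothetical and mirror Y_ig, D_ig, D_jg, Z_ig and Z_jg.
    """
    y = np.asarray(y, dtype=float)
    d_own = np.asarray(d_own, dtype=float)
    d_peer = np.asarray(d_peer, dtype=float)
    z_own = np.asarray(z_own)
    z_peer = np.asarray(z_peer)

    cell00 = (z_own == 0) & (z_peer == 0)   # nobody in the pair assigned
    cell10 = (z_own == 1) & (z_peer == 0)   # only unit i assigned
    cell01 = (z_own == 0) & (z_peer == 1)   # only the peer assigned

    baseline = y[cell00].mean()
    direct = (y[cell10].mean() - baseline) / d_own[cell10].mean()
    spillover = (y[cell01].mean() - baseline) / d_peer[cell01].mean()
    return direct, spillover


# Toy data, made up for illustration: four units per assignment cell,
# respecting one-sided noncompliance (D = 0 whenever Z = 0).
z_own  = np.array([0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0])
z_peer = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
d_own  = np.array([0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0])
d_peer = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0])
y      = np.array([0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0])

direct, spillover = conditional_wald(y, d_own, d_peer, z_own, z_peer)
print(direct, spillover)
```

The sketch returns point estimates only; standard errors that account for within-pair correlation require the cluster-robust formulas of Section 5.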

4.1 Assessing Heterogeneity

In addition to identifying these treatment effects, Proposition 2 can be used to assess, at least partially, whether average potential outcomes vary across own and peer compliance types, as the following corollary shows. In what follows, $C^c_{ig}$ represents the event in which unit $i$ is not a complier, that is, $C^c_{ig} = NT_{ig} \cup GC_{ig}$.

Corollary 2 (Assessing heterogeneity over compliance types) Under Assumptions 1-4, if $0 < P[C_{ig}] < 1$,

\[
E[Y_{ig}(0,0) \mid C_{ig}] - E[Y_{ig}(0,0) \mid C^c_{ig}] = \left\{ \frac{E[Y_{ig} \mid Z_{ig}=0, Z_{jg}=0] - E[Y_{ig}(1-D_{ig}) \mid Z_{ig}=1, Z_{jg}=0]}{E[D_{ig} \mid Z_{ig}=1, Z_{jg}=0]} - E[Y_{ig} \mid Z_{ig}=0, Z_{jg}=0] \right\} \frac{1}{1 - E[D_{ig} \mid Z_{ig}=1, Z_{jg}=0]}
\]

and

\[
E[Y_{ig}(0,0) \mid C_{jg}] - E[Y_{ig}(0,0) \mid C^c_{jg}] = \left\{ \frac{E[Y_{ig} \mid Z_{ig}=0, Z_{jg}=0] - E[Y_{ig}(1-D_{jg}) \mid Z_{ig}=0, Z_{jg}=1]}{E[D_{jg} \mid Z_{ig}=0, Z_{jg}=1]} - E[Y_{ig} \mid Z_{ig}=0, Z_{jg}=0] \right\} \frac{1}{1 - E[D_{jg} \mid Z_{ig}=0, Z_{jg}=1]}.
\]

The first term in the above corollary, $E[Y_{ig}(0,0) \mid C_{ig}] - E[Y_{ig}(0,0) \mid C^c_{ig}]$, is the difference in the average baseline outcome $Y_{ig}(0,0)$ between compliers and non-compliers (i.e. group compliers or never-takers), whereas $E[Y_{ig}(0,0) \mid C_{jg}] - E[Y_{ig}(0,0) \mid C^c_{jg}]$ is the difference in average baseline potential outcomes among units with compliant and non-compliant peers. These differences can be used to determine whether average baseline potential outcomes vary with own and peers’ compliance types, which can help assess the external validity of the local average effects. More precisely, if these differences are small, the local effects may be considered informative, at least to some extent, about average effects for the whole population, whereas finding marked heterogeneity across types would emphasize the local nature of the parameters in Corollary 1.
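The baseline gap between compliers and non-compliers can be assembled directly from the moments in Proposition 2: $E[Y_{ig}(0,0)]$ comes from the cell where nobody is assigned, and the non-complier part comes from units that were assigned but remained untreated. The sketch below is only an illustration under those identities; the function name, arguments and toy data are hypothetical.

```python
import numpy as np

def baseline_gap(y, d, assigned_cell, control_cell):
    """E[Y(0,0) | C] - E[Y(0,0) | C^c] from sample moments.

    y, d          : outcome and the relevant treatment indicator
                    (own D for own type, peer's D for peer type)
    assigned_cell : boolean mask for the cell where only that unit is assigned
    control_cell  : boolean mask for the cell where nobody is assigned
    """
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    assigned_cell = np.asarray(assigned_cell, dtype=bool)
    control_cell = np.asarray(control_cell, dtype=bool)

    share_c = d[assigned_cell].mean()                  # estimates P[C]
    mean_y00 = y[control_cell].mean()                  # estimates E[Y(0,0)]
    # estimates E[Y(0,0) 1(C^c)]: assigned but untreated units reveal Y(0,0)
    mass_nc = (y[assigned_cell] * (1.0 - d[assigned_cell])).mean()

    mean_c = (mean_y00 - mass_nc) / share_c            # E[Y(0,0) | C]
    mean_nc = mass_nc / (1.0 - share_c)                # E[Y(0,0) | C^c]
    return mean_c - mean_nc


# Toy data, made up for illustration.
z_own  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
z_peer = np.zeros(8, dtype=int)
d_own  = np.array([0, 0, 0, 0, 1, 1, 0, 0])
y      = np.array([1, 1, 1, 0, 1, 0, 1, 0])

gap_own = baseline_gap(y, d_own,
                       (z_own == 1) & (z_peer == 0),
                       (z_own == 0) & (z_peer == 0))
print(gap_own)
```

For the peer-type gap, the same function would be called with the peer's treatment indicator and the cell in which only the peer is assigned.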

4.2 Testing Instrument Validity

While identification assumptions are generally not testable, the existing literature on instrumental variables has provided ways to indirectly assess their validity by deriving testable implications (Balke and Pearl, 1997; Imbens and Rubin, 1997; Kitagawa, 2015). Based on these results, the following proposition provides a way to test for instrument validity in the presence of spillovers.

Proposition 3 (Testing Instrument Validity) Under Assumptions 1-4, for any Borel set $\mathcal{Y}$,

\[
P[Y_{ig} \in \mathcal{Y}, D_{ig}=0 \mid Z_{ig}=1, Z_{jg}=0] - P[Y_{ig} \in \mathcal{Y} \mid Z_{ig}=0, Z_{jg}=0] \le 0
\]

and

\[
P[Y_{ig} \in \mathcal{Y}, D_{jg}=0 \mid Z_{ig}=0, Z_{jg}=1] - P[Y_{ig} \in \mathcal{Y} \mid Z_{ig}=0, Z_{jg}=0] \le 0.
\]

Because all the terms in Proposition 3 only involve observable variables, estimating them can provide a way to assess the validity of the identification assumptions. Estimation and inference for these objects can be conducted using the local regression distribution methods in Cattaneo et al. (forthcoming a) and Cattaneo et al. (forthcoming b). In particular, when the outcome variable is binary, the conditions in Proposition 3 reduce to:

\[
E[Y_{ig}(1-D_{ig}) \mid Z_{ig}=1, Z_{jg}=0] - E[Y_{ig} \mid Z_{ig}=0, Z_{jg}=0] \le 0
\]

and

\[
E[Y_{ig}(1-D_{jg}) \mid Z_{ig}=0, Z_{jg}=1] - E[Y_{ig} \mid Z_{ig}=0, Z_{jg}=0] \le 0.
\]

In this case, these two restrictions can be tested straightforwardly using linear regression methods, as illustrated in the empirical application.

5 Estimation and Inference

This section discusses estimation and inference for the causal effects in Corollary 1. In what follows, let $\mathbf{y}_g = (Y_{1g}, Y_{2g})'$, $\mathbf{Z}_g = (Z_{1g}, Z_{2g})'$ and $\mathbf{D}_g = (D_{1g}, D_{2g})'$.

Assumption 5 (Sampling and moments)

(a) $(\mathbf{y}_g', \mathbf{Z}_g', \mathbf{D}_g')_{g=1}^{G}$ are independent and identically distributed across $g$.

(b) For each $g$, $(Y_{1g}, Z_{1g}, D_{1g})$ and $(Y_{2g}, Z_{2g}, D_{2g})$ are identically distributed, not necessarily independent.

(c) $E[Y_{ig}^4] < \infty$.

Part (a) of Assumption 5 states that the researcher has access to a random sample of iid groups. Part (b) states that observations within each group are identically distributed, but allows for an unrestricted correlation structure within groups. Part (c) is a standard regularity condition to ensure the appropriate moments are bounded.¹

The average direct and spillover effects in Corollary 1 can be estimated straightforwardly by replacing the right-hand side of each expression by their sample analog. Because these estimators are IV estimators using a binary instrument and a binary treatment on a specific subsample, I will refer to these estimators as conditional Wald estimators. More precisely, I formally define conditional Wald estimators and their corresponding cluster-robust variance estimator as follows.

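Definition 1 (Direct conditional Wald estimator) Let $\tilde{\mathbf{z}}_{ig} = (1, Z_{ig})'$, $\tilde{\mathbf{d}}_{ig} = (1, D_{ig})'$, $\tilde{\mathbf{z}}_g = (\tilde{\mathbf{z}}_{1g}, \tilde{\mathbf{z}}_{2g})'$ and, analogously, $\tilde{\mathbf{d}}_g = (\tilde{\mathbf{d}}_{1g}, \tilde{\mathbf{d}}_{2g})'$. The direct conditional Wald estimator $\hat{\delta} = (\hat{\delta}_0, \hat{\delta}_1)'$ is defined as the estimator from the 2SLS regression of $Y_{ig}$ on an intercept and $D_{ig}$ using $Z_{ig}$ as an instrument, on the subsample of observations with $Z_{jg} = 0$, that is,

\[
\hat{\delta} = \begin{bmatrix} \hat{\delta}_0 \\ \hat{\delta}_1 \end{bmatrix} = \Big( \sum_g \tilde{\mathbf{z}}_g' \tilde{\mathbf{w}}_g \tilde{\mathbf{d}}_g \Big)^{-1} \sum_g \tilde{\mathbf{z}}_g' \tilde{\mathbf{w}}_g \mathbf{y}_g
\]

whenever $\sum_g \sum_i (1-Z_{ig})(1-Z_{jg}) > 0$, $\sum_g \sum_i Z_{ig}(1-Z_{jg}) > 0$ and $\sum_g \sum_i D_{ig}Z_{ig}(1-Z_{jg}) > 0$, where

\[
\tilde{\mathbf{w}}_g = \begin{bmatrix} 1-Z_{2g} & 0 \\ 0 & 1-Z_{1g} \end{bmatrix}.
\]

The cluster-robust variance estimator for $\hat{\delta}$ is:

\[
\hat{V}_{cr}(\hat{\delta}) = \Big( \sum_g \tilde{\mathbf{z}}_g' \tilde{\mathbf{w}}_g \tilde{\mathbf{d}}_g \Big)^{-1} \Big( \sum_g \tilde{\mathbf{z}}_g' \tilde{\mathbf{w}}_g \hat{\mathbf{u}}_g \hat{\mathbf{u}}_g' \tilde{\mathbf{w}}_g \tilde{\mathbf{z}}_g \Big) \Big( \sum_g \tilde{\mathbf{d}}_g' \tilde{\mathbf{w}}_g \tilde{\mathbf{z}}_g \Big)^{-1}
\]

where $\hat{u}_{ig} = Y_{ig} - \tilde{\mathbf{d}}_{ig}' \hat{\delta}$ and $\hat{\mathbf{u}}_g = (\hat{u}_{1g}, \hat{u}_{2g})'$.

¹ Notice that the bounded-moments condition for the treatment indicators and instruments is automatically satisfied, as these variables are bounded.

Definition 2 (Indirect conditional Wald estimator) Let $\check{\mathbf{z}}_{ig} = (1, Z_{jg})'$, $\check{\mathbf{d}}_{ig} = (1, D_{jg})'$, $\check{\mathbf{z}}_g = (\check{\mathbf{z}}_{1g}, \check{\mathbf{z}}_{2g})'$ and, analogously, $\check{\mathbf{d}}_g = (\check{\mathbf{d}}_{1g}, \check{\mathbf{d}}_{2g})'$. The indirect conditional Wald estimator $\hat{\lambda} = (\hat{\lambda}_0, \hat{\lambda}_1)'$ is defined as the estimator from the 2SLS regression of $Y_{ig}$ on an intercept and $D_{jg}$ using $Z_{jg}$ as an instrument, on the subsample of observations with $Z_{ig} = 0$, that is,

\[
\hat{\lambda} = \begin{bmatrix} \hat{\lambda}_0 \\ \hat{\lambda}_1 \end{bmatrix} = \Big( \sum_g \check{\mathbf{z}}_g' \check{\mathbf{w}}_g \check{\mathbf{d}}_g \Big)^{-1} \sum_g \check{\mathbf{z}}_g' \check{\mathbf{w}}_g \mathbf{y}_g
\]

whenever $\sum_g \sum_i (1-Z_{ig})(1-Z_{jg}) > 0$, $\sum_g \sum_i Z_{jg}(1-Z_{ig}) > 0$ and $\sum_g \sum_i D_{jg}Z_{jg}(1-Z_{ig}) > 0$, where

\[
\check{\mathbf{w}}_g = \begin{bmatrix} 1-Z_{1g} & 0 \\ 0 & 1-Z_{2g} \end{bmatrix}.
\]

The cluster-robust variance estimator for $\hat{\lambda}$ is:

\[
\hat{V}_{cr}(\hat{\lambda}) = \Big( \sum_g \check{\mathbf{z}}_g' \check{\mathbf{w}}_g \check{\mathbf{d}}_g \Big)^{-1} \Big( \sum_g \check{\mathbf{z}}_g' \check{\mathbf{w}}_g \hat{\boldsymbol{\nu}}_g \hat{\boldsymbol{\nu}}_g' \check{\mathbf{w}}_g \check{\mathbf{z}}_g \Big) \Big( \sum_g \check{\mathbf{d}}_g' \check{\mathbf{w}}_g \check{\mathbf{z}}_g \Big)^{-1}
\]

where $\hat{\nu}_{ig} = Y_{ig} - \check{\mathbf{d}}_{ig}' \hat{\lambda}$ and $\hat{\boldsymbol{\nu}}_g = (\hat{\nu}_{1g}, \hat{\nu}_{2g})'$.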

An alternative estimation strategy in a setting with multiple instruments and multiple endogenous variables is to combine all regressors and instruments into a single 2SLS regression. In a constant coefficients setting, this estimation strategy yields consistent and asymptotically normal estimators. However, these features do not generally extend to a setting with heterogeneous effects. In what follows I show that, under one-sided noncompliance, if the 2SLS regression combining both instruments and endogenous variables is fully saturated, the resulting estimators and cluster-robust standard errors are in fact equivalent to the conditional Wald estimators. I start by defining the fully-saturated 2SLS regression estimator as follows.

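Definition 3 (Saturated 2SLS regression) Let $\mathbf{z}_{ig} = (1, Z_{ig}, Z_{jg}, Z_{ig}Z_{jg})'$, $\mathbf{d}_{ig} = (1, D_{ig}, D_{jg}, D_{ig}D_{jg})'$, $\mathbf{z}_g = (\mathbf{z}_{1g}, \mathbf{z}_{2g})'$ and, analogously, $\mathbf{d}_g = (\mathbf{d}_{1g}, \mathbf{d}_{2g})'$. The saturated 2SLS regression estimator $\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3)'$ is defined as the estimator from the 2SLS regression of $Y_{ig}$ on an intercept, $D_{ig}$, $D_{jg}$ and $D_{ig}D_{jg}$ using $Z_{ig}$, $Z_{jg}$ and $Z_{ig}Z_{jg}$ as instruments on the full sample, that is,

\[
\hat{\beta} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \\ \hat{\beta}_3 \end{bmatrix} = \Big( \sum_g \mathbf{z}_g' \mathbf{d}_g \Big)^{-1} \sum_g \mathbf{z}_g' \mathbf{y}_g
\]

whenever $\sum_g \sum_i (1-Z_{ig})(1-Z_{jg}) > 0$, $\sum_g \sum_i Z_{ig}(1-Z_{jg}) > 0$, $\sum_g \sum_i Z_{ig}Z_{jg} > 0$, $\sum_g \sum_i D_{ig}Z_{ig}(1-Z_{jg}) > 0$ and $\sum_g \sum_i D_{ig}D_{jg}Z_{ig}Z_{jg} > 0$. The cluster-robust variance estimator for $\hat{\beta}$ is:

\[
\hat{V}_{cr}(\hat{\beta}) = \Big( \sum_g \mathbf{z}_g' \mathbf{d}_g \Big)^{-1} \Big( \sum_g \mathbf{z}_g' \hat{\boldsymbol{\varepsilon}}_g \hat{\boldsymbol{\varepsilon}}_g' \mathbf{z}_g \Big) \Big( \sum_g \mathbf{d}_g' \mathbf{z}_g \Big)^{-1}
\]

where $\hat{\varepsilon}_{ig} = Y_{ig} - \mathbf{d}_{ig}' \hat{\beta}$ and $\hat{\boldsymbol{\varepsilon}}_g = (\hat{\varepsilon}_{1g}, \hat{\varepsilon}_{2g})'$.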

The following theorem establishes the equivalence between conditional Wald estimators and the saturated 2SLS estimators.

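Theorem 1 (Equivalence between conditional Wald and 2SLS) Consider the estimators $\hat{\delta}$, $\hat{\lambda}$ and $\hat{\beta}$ from Definitions 1, 2 and 3. Suppose that Assumptions 1-5 hold, and that $\sum_g \sum_i (1-Z_{ig})(1-Z_{jg}) > 0$, $\sum_g \sum_i Z_{ig}(1-Z_{jg}) > 0$, $\sum_g \sum_i Z_{ig}Z_{jg} > 0$, $\sum_g \sum_i D_{ig}Z_{ig}(1-Z_{jg}) > 0$ and $\sum_g \sum_i D_{ig}D_{jg}Z_{ig}Z_{jg} > 0$. Then,

\[
\hat{\delta}_0 = \hat{\lambda}_0 = \hat{\beta}_0, \quad \hat{\delta}_1 = \hat{\beta}_1, \quad \hat{\lambda}_1 = \hat{\beta}_2
\]

and

\[
\hat{V}_{cr,11}(\hat{\delta}) = \hat{V}_{cr,11}(\hat{\lambda}) = \hat{V}_{cr,11}(\hat{\beta}), \quad \hat{V}_{cr,22}(\hat{\delta}) = \hat{V}_{cr,22}(\hat{\beta}), \quad \hat{V}_{cr,22}(\hat{\lambda}) = \hat{V}_{cr,33}(\hat{\beta}).
\]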

Remark 4 (Interaction terms) Although the coefficient $\hat{\beta}_3$ corresponding to the interaction term $D_{ig}D_{jg}$ does not have a direct causal interpretation, the terms $D_{ig}D_{jg}$ and $Z_{ig}Z_{jg}$ need to be included to ensure the equivalence of the estimators by saturating the model. If $\sum_g \sum_i Z_{ig}Z_{jg} = 0$, then under one-sided noncompliance $D_{ig}D_{jg} = 0$ and the results from Theorem 1 hold after excluding the interaction terms $D_{ig}D_{jg}$ and $Z_{ig}Z_{jg}$ from the estimation procedure.
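The equivalence of the point estimates in Theorem 1 can be checked numerically. The sketch below simulates pairs under one-sided noncompliance and compares the two conditional Wald estimators with the saturated just-identified 2SLS estimator. The data-generating process, sample size and all names are assumptions made for illustration, and the variance equivalence is not checked here.

```python
import numpy as np

rng = np.random.default_rng(0)
G = 2000                                        # number of pairs (hypothetical)
z = rng.integers(0, 2, size=(G, 2))             # Z_1g, Z_2g
z_peer = z[:, ::-1]                             # each unit's peer assignment
# Compliance types: 0 = complier, 1 = group complier, 2 = never-taker
ctype = rng.choice(3, size=(G, 2), p=[0.4, 0.3, 0.3])
takes = (ctype == 0) | ((ctype == 1) & (z_peer == 1))
d = z * takes                                   # one-sided: D = 0 whenever Z = 0
d_peer = d[:, ::-1]
y = 0.2 + 0.3 * d + 0.1 * d_peer + 0.05 * d * d_peer + rng.normal(size=(G, 2))

# Stack both units of every pair into long vectors.
Y = y.ravel()
Zi, Zj = z.ravel().astype(float), z_peer.ravel().astype(float)
Di, Dj = d.ravel().astype(float), d_peer.ravel().astype(float)
one = np.ones_like(Y)

def just_identified_iv(instruments, regressors, outcome):
    """Solve (Z'D) b = Z'Y, the just-identified IV / 2SLS estimator."""
    return np.linalg.solve(instruments.T @ regressors, instruments.T @ outcome)

# Direct conditional Wald: subsample with Z_jg = 0.
s = Zj == 0
delta = just_identified_iv(np.column_stack([one, Zi])[s],
                           np.column_stack([one, Di])[s], Y[s])
# Indirect conditional Wald: subsample with Z_ig = 0.
s = Zi == 0
lam = just_identified_iv(np.column_stack([one, Zj])[s],
                         np.column_stack([one, Dj])[s], Y[s])
# Saturated 2SLS on the full sample.
beta = just_identified_iv(np.column_stack([one, Zi, Zj, Zi * Zj]),
                          np.column_stack([one, Di, Dj, Di * Dj]), Y)

print(np.allclose(delta, beta[:2]), np.allclose(lam, beta[[0, 2]]))
```

The agreement is algebraic rather than approximate: with saturated instruments the moment conditions force the residual to average zero within each assignment cell, and under one-sided noncompliance only one treatment indicator can be nonzero in the cells where a single unit is assigned.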

In what follows let “$\to_P$” denote convergence in probability and “$\to_D$” denote convergence in distribution. The following result shows that the 2SLS estimators are consistent and asymptotically normal. This result is standard for 2SLS and is included only for completeness.

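Lemma 2 (Consistency and asymptotic normality) Under Assumptions 1-5, if $P[Z_{ig}=0, Z_{jg}=0] > 0$, $P[Z_{ig}=1, Z_{jg}=0] > 0$, $P[Z_{ig}=1, Z_{jg}=1] > 0$, $E[D_{ig} \mid Z_{ig}=1, Z_{jg}=0] > 0$ and $E[D_{ig}D_{jg} \mid Z_{ig}=1, Z_{jg}=1] > 0$, then as $G \to \infty$,

\[
\hat{\beta} \to_P \beta = \begin{bmatrix} E[Y_{ig}(0,0)] \\ E[Y_{ig}(1,0) - Y_{ig}(0,0) \mid C_{ig}] \\ E[Y_{ig}(0,1) - Y_{ig}(0,0) \mid C_{jg}] \\ \beta_3 \end{bmatrix}
\]

and

\[
\sqrt{2G}(\hat{\beta} - \beta) \to_D \mathcal{N}(0, V), \qquad V = E[\mathbf{z}_g' \mathbf{d}_g]^{-1} E[\mathbf{z}_g' \boldsymbol{\varepsilon}_g \boldsymbol{\varepsilon}_g' \mathbf{z}_g] E[\mathbf{d}_g' \mathbf{z}_g]^{-1}
\]

where $\varepsilon_{ig} = Y_{ig} - \mathbf{d}_{ig}' \beta$ and $\boldsymbol{\varepsilon}_g = (\varepsilon_{1g}, \varepsilon_{2g})'$. In addition, $2G\,\hat{V}_{cr}(\hat{\beta}) \to_P V$.

The interaction population coefficient $\beta_3$ does not have a direct causal interpretation, and its exact formula is given in the proof in the supplemental appendix.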

Notice that the magnitudes of interest defined in Corollary 2 and Proposition 3 cannot be written as 2SLS estimators, and hence Theorem 1 and Lemma 2 do not apply directly. However, the estimators of all these parameters are nonlinear functions of sample means, and hence consistency and asymptotic normality follow under standard conditions. To conserve space, I provide further details on estimation and inference for these parameters in the supplemental appendix.

5.1 Weak-Instrument Robust Inference

When non-compliance is very severe, the proportion of compliers can be close to zero and, as a result, instruments may be weak. In such cases, the estimators analyzed above may have poor finite-sample properties, and inference based on the normal approximation may be unreliable. The 2SLS literature has provided several alternatives to conduct inference that is robust to weak instruments (see Andrews et al., 2019, for a recent review). In particular, Fieller or Anderson-Rubin (AR) confidence intervals have been shown to provide correct coverage regardless of the strength of the instruments.

By Theorem 1, the average direct and spillover effects can be estimated by two separate regressions that involve a single binary instrument and a single binary endogenous variable (the conditional Wald estimators), which give the same estimates and standard errors as the saturated 2SLS regression. The advantage of the conditional Wald estimators is that their corresponding AR confidence intervals can be obtained by solving a quadratic inequality as shown by Dufour and Taamouti (2005) and Mikusheva (2010), without the need for computationally intensive grid searches or projection methods. Section C provides further details on how to construct AR confidence intervals in this context. I illustrate these issues in the empirical application.
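To make the quadratic inequality concrete, the sketch below inverts the AR (Fieller-type) test for a single instrument and a single endogenous variable. It is a generic textbook construction and not necessarily the procedure of Section C: it takes as given a reduced-form estimate `rf`, a first-stage estimate `fs`, and their estimated variances and covariance (which in this setting should be cluster-robust at the group level). All names and the numerical inputs are hypothetical.

```python
import math

def ar_confidence_set(rf, fs, var_rf, var_fs, cov_rf_fs, crit=1.959964):
    """Set of values b with (rf - b*fs)^2 <= crit^2 * Var(rf - b*fs).

    Expanding both sides gives qa*b^2 + qb*b + qc <= 0, which is solved
    in closed form. Returns a tuple whose first entry names the shape.
    """
    q = crit ** 2
    qa = fs ** 2 - q * var_fs
    qb = -2.0 * (rf * fs - q * cov_rf_fs)
    qc = rf ** 2 - q * var_rf
    if qa == 0.0:
        raise ValueError("degenerate case qa == 0 is not handled in this sketch")
    disc = qb ** 2 - 4.0 * qa * qc
    if disc < 0.0:
        # The quadratic never changes sign.
        return ("real_line",) if qa < 0.0 else ("empty",)
    r1 = (-qb - math.sqrt(disc)) / (2.0 * qa)
    r2 = (-qb + math.sqrt(disc)) / (2.0 * qa)
    lo, hi = min(r1, r2), max(r1, r2)
    if qa > 0.0:
        return ("interval", lo, hi)      # bounded interval [lo, hi]
    return ("outside", lo, hi)           # (-inf, lo] union [hi, +inf)


# Hypothetical inputs: a strong and a weak first stage.
strong = ar_confidence_set(rf=0.03, fs=0.45, var_rf=1e-4, var_fs=1e-4, cov_rf_fs=0.0)
weak = ar_confidence_set(rf=0.03, fs=0.01, var_rf=1e-4, var_fs=1e-4, cov_rf_fs=0.0)
print(strong[0], weak[0])
```

When the first stage is strong relative to its standard error, the leading coefficient is positive and the set is a bounded interval around the Wald estimate; when it is weak, the set becomes unbounded, which is how the AR construction signals lack of identification strength.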

6 Application: Spillovers in Voting Behavior

In this section I illustrate the results in this paper using data from a randomized experiment on voter mobilization conducted by Foos and de Rooij (2017). The goal of their study is to assess if political discussions within close social networks such as the household have an effect on voter turnout, and, if so, in what direction.

To this end, the authors conducted a randomized experiment in which two-voter households in Birmingham, UK were randomly assigned to receive a telephone message providing information and encouraging people to vote in the West Midlands Police and Crime Commissioner (PCC) 2012 election. A sample of 5,190 two-voter households with landline numbers was divided into treatment and control households, and within the households assigned to the treatment, only one household member was randomly selected to receive the telephone message.²

Because the telephone message is delivered by landline, this type of experiment is usually subject to severe rates of nonresponse, since individuals assigned to treatment may be unavailable, may refuse to participate or may have moved, and their phone numbers may be outdated or wrong. For these and other reasons, it is common to find compliance rates below 50 percent (Gerber and Green, 2000; John and Brannan, 2008). In the experiment described here, the response rate among individuals assigned to receive the message is about 45 percent. To account for the potential endogeneity of this type of noncompliance, the randomized treatment assignment can be used as an instrument for actual treatment receipt.

More precisely, for each household $g$, let $(Z_{ig}, Z_{jg})$ be the randomized treatment assignment for each unit, where $Z_{ig} = 1$ if individual $i$ is randomly assigned to receive the phone call. Let $(D_{ig}, D_{jg})$ be the treatment indicators, where $D_{ig} = 1$ if individual $i$ actually receives the phone message. Finally, the outcome of interest $Y_{ig}$, voter turnout, equals 1 if individual $i$ voted in the election.

In this experiment, noncompliance is one-sided, as units assigned to treatment can fail to receive the phone call, but units assigned to control do not receive it. Since only one member of each treated household was selected to receive the call, we also have that $P[Z_{ig}=1, Z_{jg}=1] = 0$. Given this experimental design, the first stage reduces to estimating $E[D_{ig} \mid Z_{ig}=1, Z_{jg}=0] = E[D_{ig} \mid Z_{ig}=1]$. The estimated coefficient is 0.451, significantly different from zero at the one percent level and with an F-statistic of 1759.03, which suggests a strong instrument.

The estimation results are shown in Table 2. Column (1) shows the naive estimates obtained by ignoring the presence of spillovers, that is, running a 2SLS using $Z_{ig}$ as an instrument for $D_{ig}$ without accounting for peer’s assignment or treatment status. Taken at face value, these estimates suggest an ITT effect of about 2 percentage points and a local average effect of about 4 percentage points on voter turnout. However, in the presence of spillovers, these magnitudes do not generally have a clear causal interpretation. In fact, by Proposition 2, given this assignment mechanism,

\[
\frac{E[Y_{ig} \mid Z_{ig}=1] - E[Y_{ig} \mid Z_{ig}=0]}{E[D_{ig} \mid Z_{ig}=1]} = E[Y_{ig}(1,0) - Y_{ig}(0,0) \mid C_{ig}] - E[Y_{ig}(0,1) - Y_{ig}(0,0) \mid C_{jg}]\,P[Z_{jg}=1 \mid Z_{ig}=0],
\]

so the naive 2SLS estimand that ignores spillovers equals the difference between the direct and indirect LATEs, where the indirect LATE is rescaled by the conditional probability of treatment assignment. Therefore, the naive 2SLS estimand can be close to zero whenever direct and spillover effects have the same sign.

² The experiment had two different treatment intensities, which I pool here for illustration purposes. Section D in the supplemental appendix provides a detailed analysis of the multi-level treatment.
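The decomposition of the naive estimand can be traced in three steps. The derivation below is a sketch that uses only Corollary 1 and the design restriction $P[Z_{ig}=1, Z_{jg}=1] = 0$; the equality $P[C_{ig}] = P[C_{jg}]$ used in the last step is an assumption stated here for the sketch (it holds when the two units in a household are identically distributed).

```latex
% Step 1: with no household having both members assigned, Z_ig = 1 implies Z_jg = 0,
% while Z_ig = 0 mixes the cells (0,0) and (0,1).
\begin{aligned}
E[Y_{ig} \mid Z_{ig}=1] &= E[Y_{ig} \mid Z_{ig}=1, Z_{jg}=0], \\
E[Y_{ig} \mid Z_{ig}=0] &= E[Y_{ig} \mid Z_{ig}=0, Z_{jg}=0]
  + \big( E[Y_{ig} \mid Z_{ig}=0, Z_{jg}=1] - E[Y_{ig} \mid Z_{ig}=0, Z_{jg}=0] \big)
    P[Z_{jg}=1 \mid Z_{ig}=0].
\end{aligned}

% Step 2: subtract and rewrite each cell contrast using Corollary 1.
\begin{aligned}
E[Y_{ig} \mid Z_{ig}=1] - E[Y_{ig} \mid Z_{ig}=0]
 &= E[Y_{ig}(1,0) - Y_{ig}(0,0) \mid C_{ig}]\, P[C_{ig}] \\
 &\quad - E[Y_{ig}(0,1) - Y_{ig}(0,0) \mid C_{jg}]\, P[C_{jg}]\, P[Z_{jg}=1 \mid Z_{ig}=0].
\end{aligned}

% Step 3: divide by the first stage E[D_ig | Z_ig = 1] = P[C_ig] and use P[C_jg] = P[C_ig].
\frac{E[Y_{ig} \mid Z_{ig}=1] - E[Y_{ig} \mid Z_{ig}=0]}{E[D_{ig} \mid Z_{ig}=1]}
 = E[Y_{ig}(1,0) - Y_{ig}(0,0) \mid C_{ig}]
 - E[Y_{ig}(0,1) - Y_{ig}(0,0) \mid C_{jg}]\, P[Z_{jg}=1 \mid Z_{ig}=0].
```

The second term enters with a negative sign because units whose peer is assigned are part of the naive "control" group, so any spillover they receive is subtracted from the direct effect.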

Column (2) in Table 2 shows the estimated direct and indirect ITT and LATE parameters based on Corollary 1. The results reveal strong evidence of direct and indirect effects. On the one hand, ITT effects are positive and significant, with estimates of around 3 and 5 percentage points, respectively. While these magnitudes do not directly measure causal effects, as shown in Lemma 1, under one-sided noncompliance ITT parameters are attenuated (rescaled) measures of local average effects, that is, $E[Y_{ig} \mid Z_{ig}=1, Z_{jg}=0] - E[Y_{ig} \mid Z_{ig}=0, Z_{jg}=0] = E[Y_{ig}(1,0) - Y_{ig}(0,0) \mid C_{ig}]\,P[C_{ig}]$ and $E[Y_{ig} \mid Z_{ig}=0, Z_{jg}=1] - E[Y_{ig} \mid Z_{ig}=0, Z_{jg}=0] = E[Y_{ig}(0,1) - Y_{ig}(0,0) \mid C_{jg}]\,P[C_{jg}]$. These magnitudes can be interpreted as the “effect” of offering the treatment (see the discussion on ITT and LATE in Section 3).

On the other hand, 2SLS estimates reveal that the phone message increases voter turnout among compliers with untreated peers by about 7 percentage points, and turnout for untreated individuals with treated compliant peers by about 10 percentage points, both effects significant at the 1 percent level. Although the first-stage estimate suggests a strong instrument, I also calculate weak-instrument-robust AR 95%-confidence intervals as a robustness check. These intervals are shown in parentheses in Table 2. Reassuringly, the AR confidence intervals are slightly wider but very close to the large-sample-based confidence intervals, and lead to the same conclusions.

The finding that the estimated spillover effect is larger than the direct effect may seem surprising, as one may intuitively expect indirect effects to be weaker than direct ones. This comparison, however, must be done with care, as the estimated effects correspond to different subpopulations. More precisely, the direct effect is estimated for compliers, whereas the spillover effect is estimated for units with compliant peers, averaging over own compliance type. This is different from comparing the direct and spillover effects on the population of compliers. Note that the indirect LATE is:

\[
\begin{aligned}
E[Y_{ig}(0,1) - Y_{ig}(0,0) \mid C_{jg}] &= E[Y_{ig}(0,1) - Y_{ig}(0,0) \mid C_{ig}, C_{jg}]\,P[C_{ig} \mid C_{jg}] \\
&\quad + E[Y_{ig}(0,1) - Y_{ig}(0,0) \mid C^c_{ig}, C_{jg}]\,P[C^c_{ig} \mid C_{jg}]
\end{aligned}
\]

so it combines the effects on compliers and non-compliers, conditional on them having a compliant peer.

Table 3 provides some further insight to interpret these findings. The results estimate the difference in average baseline potential outcomes $Y_{ig}(0,0)$ between compliers and noncompliers (first row) and between units with compliant and non-compliant peers (second row) following Corollary 2. The differences are about 17 and 19 percentage points, respectively, significant at the 1 percent level. Because the outcome of interest is binary, the fact that compliers start from a higher baseline leaves a smaller margin for the treatment to increase turnout. For this reason, we can expect the noncompliers to show larger average effects than compliers, which could explain the difference between the direct and indirect effects in Table 2.

Table 2: Main Estimation Results

| | (1) | (2) |
|---|---|---|
| ITT | | |
| $Z_{ig}$ | 0.0178 | 0.0305 |
| | (0.0089) | (0.0114) |
| | [0.0004, 0.0352] | [0.0081, 0.0529] |
| $Z_{jg}$ | | 0.0459 |
| | | (0.0116) |
| | | [0.0232, 0.0686] |
| 2SLS | | |
| $D_{ig}$ | 0.0394 | 0.0675 |
| | (0.0196) | (0.0252) |
| | [0.0010, 0.0778] | [0.0182, 0.1168] |
| | (0.0008, 0.0782) | (0.0175, 0.1208) |
| $D_{jg}$ | | 0.1017 |
| | | (0.0255) |
| | | [0.0517, 0.1518] |
| | | (0.0502, 0.1570) |
| N | 9,860 | 9,860 |
| Clusters | 4,930 | 4,930 |

Notes: estimated results from reduced-form regressions (“ITT”) and 2SLS regressions (“2SLS”). Column (1) shows the naive reduced-form and 2SLS estimates that ignore spillovers. Column (2) shows the ITT effects and local average direct and spillover effects obtained by 2SLS. Standard errors in parentheses. 95%-confidence intervals in brackets are based on the large-sample normal approximation. 95%-confidence intervals in parentheses are weak-instrument-robust AR confidence intervals. Estimation accounts for clustering at the household level.

Finally, instrument validity can be assessed based on Proposition 3. In this application, because the outcome variable is binary, it is sufficient to run the following regression:

\[
Y_{ig}(1-D_{ig})(1-D_{jg}) = \gamma_0 + \gamma_1 Z_{ig} + \gamma_2 Z_{jg} + u_{ig} \qquad (1)
\]

and test that $\gamma_1 \le 0$ and $\gamma_2 \le 0$. The results from this regression are shown in Table 4. As expected, both coefficients are negative and significant.
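Because no household has both members assigned, the regression in Equation (1) is saturated in the three assignment cells, so its OLS coefficients are simply differences in cell means of $Y_{ig}(1-D_{ig})(1-D_{jg})$. The sketch below illustrates this with ordinary least squares; the function name and the toy data are hypothetical, and cluster-robust standard errors (needed for the actual test) are omitted.

```python
import numpy as np

def validity_regression(y, d_own, d_peer, z_own, z_peer):
    """OLS of Y(1 - D_own)(1 - D_peer) on an intercept, Z_own and Z_peer."""
    y, d_own, d_peer, z_own, z_peer = (
        np.asarray(a, dtype=float) for a in (y, d_own, d_peer, z_own, z_peer)
    )
    lhs = y * (1.0 - d_own) * (1.0 - d_peer)
    X = np.column_stack([np.ones_like(lhs), z_own, z_peer])
    gamma, *_ = np.linalg.lstsq(X, lhs, rcond=None)
    return gamma                      # (gamma_0, gamma_1, gamma_2)


# Toy data, made up for illustration: four units per assignment cell,
# with no unit in the cell where both members are assigned.
z_own  = np.array([0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0])
z_peer = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
d_own  = np.array([0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0])
d_peer = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0])
y      = np.array([0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0])

gamma = validity_regression(y, d_own, d_peer, z_own, z_peer)
print(gamma)
```

In this toy example both slope coefficients are negative, which is the direction implied by Proposition 3; a positive and statistically significant slope would be evidence against the identification assumptions.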

Table 3: Assessing heterogeneity over types

| | (1) |
|---|---|
| $E[Y_{ig}(0,0) \mid C_{ig}] - E[Y_{ig}(0,0) \mid C^c_{ig}]$ | 0.1705 |
| | (0.0297) |
| | [0.1123, 0.2286] |
| $E[Y_{ig}(0,0) \mid C_{jg}] - E[Y_{ig}(0,0) \mid C^c_{jg}]$ | 0.1913 |
| | (0.0300) |
| | [0.1325, 0.2500] |
| N | 9,860 |
| Clusters | 4,930 |

Notes: estimated heterogeneity measures from Corollary 2. Standard errors in parentheses. 95%-confidence intervals based on the large-sample normal approximation. Estimation accounts for clustering at the household level.

Table 4: Testing instrument validity

| | (1) |
|---|---|
| $Z_{ig}$ | −0.0996 |
| | (0.0094) |
| | [−0.1181, −0.0811] |
| $Z_{jg}$ | −0.0893 |
| | (0.0096) |
| | [−0.1082, −0.0704] |
| N | 9,860 |
| Clusters | 4,930 |

Notes: estimated results from Equation (1). Standard errors in parentheses. 95%-confidence intervals based on the large-sample normal approximation. Estimation accounts for clustering at the household level.

7 Generalizations and Extensions

7.1 Conditional-on-observables IV

In many cases, the assumption that the instruments are as good as randomly assigned is more credible after conditioning on a set of covariates. This is particularly relevant when the variation in the instruments is not controlled by the researcher, but instead comes from a natural experiment (see Angrist and Krueger, 1991; Card, 1995; Angrist and Evans, 1998; Currie and Moretti, 2003, for well-known examples). Furthermore, Abadie (2003) shows that covariates can be exploited to identify the characteristics of the compliant subpopulation.

In this section I generalize my results to the case in which quasi-random assignment of $(Z_{ig}, Z_{jg})$ holds after conditioning on observable characteristics, following Abadie (2003). Let $X_g = (X_{ig}', X_{jg}')'$ be a vector of observable characteristics for units $i$ and $j$ in group $g$. I introduce the following assumption.

Assumption 6 (Conditional-on-observables IV)

1. Exclusion restriction: $Y_{ig}(d, d', z, z') = Y_{ig}(d, d')$ for all $(d, d', z, z')$,

2. Independence: for all $i$, $j \neq i$ and $g$, $\big( (Y_{ig}(d,d'))_{(d,d')}, (D_{ig}(z,z'))_{(z,z')} \big) \perp\!\!\!\perp (Z_{ig}, Z_{jg}) \mid X_g$,

3. Monotonicity: $P[D_{ig}(1,1) \ge D_{ig}(1,0) \ge D_{ig}(0,1) \ge D_{ig}(0,0) \mid X_g] = 1$,

4. One-sided noncompliance: $P[D_{ig}(0, z') = 0 \mid X_g] = 1$ for $z' = 0, 1$.

Let $p_{zz'}(X_g) = P[Z_{ig} = z, Z_{jg} = z' \mid X_g]$. Then we have the following result.

Proposition 4 (Identification conditional on observables) Under Assumption 6,

\[
\begin{aligned}
P[C_{ig} \mid X_g] &= E[D_{ig} \mid Z_{ig}=1, Z_{jg}=0, X_g] \\
P[GC_{ig} \mid X_g] &= E[D_{ig} \mid Z_{ig}=1, Z_{jg}=1, X_g] - E[D_{ig} \mid Z_{ig}=1, Z_{jg}=0, X_g] \\
P[NT_{ig} \mid X_g] &= 1 - P[GC_{ig} \mid X_g] - P[C_{ig} \mid X_g]
\end{aligned}
\]

and for any (integrable) function $g(\cdot, \cdot)$,

\[
\begin{aligned}
E[g(Y_{ig}(0,0), X_g)] &= E\left[ g(Y_{ig}, X_g) \frac{(1-Z_{ig})(1-Z_{jg})}{p_{00}(X_g)} \right] \\
E[g(Y_{ig}(1,0), X_g) \mid C_{ig}]\,P[C_{ig}] &= E\left[ g(Y_{ig}, X_g) D_{ig} \frac{Z_{ig}(1-Z_{jg})}{p_{10}(X_g)} \right] \\
E[g(Y_{ig}(0,1), X_g) \mid C_{jg}]\,P[C_{jg}] &= E\left[ g(Y_{ig}, X_g) D_{jg} \frac{(1-Z_{ig})Z_{jg}}{p_{01}(X_g)} \right] \\
E[g(Y_{ig}(0,0), X_g) \mid C_{ig}]\,P[C_{ig}] &= E\left[ g(Y_{ig}, X_g) \frac{(1-Z_{ig})(1-Z_{jg})}{p_{00}(X_g)} \right] - E\left[ g(Y_{ig}, X_g)(1-D_{ig}) \frac{Z_{ig}(1-Z_{jg})}{p_{10}(X_g)} \right] \\
E[g(Y_{ig}(0,0), X_g) \mid C_{jg}]\,P[C_{jg}] &= E\left[ g(Y_{ig}, X_g) \frac{(1-Z_{ig})(1-Z_{jg})}{p_{00}(X_g)} \right] - E\left[ g(Y_{ig}, X_g)(1-D_{jg}) \frac{(1-Z_{ig})Z_{jg}}{p_{01}(X_g)} \right],
\end{aligned}
\]

whenever the required conditional probabilities $p_{zz'}(X_g)$ are positive. Furthermore, these equalities also hold conditional on $X_g$.

This result shows identification of functions of potential outcomes and covariates for compliers and for units with compliant peers. In particular, note that setting $g(y,x) = y$ recovers the result from Proposition 2, which gives identification of local direct and spillover effects, both unconditionally and conditional on $X_g$. On the other hand, setting $g(y,x) = x$ shows that it is possible to identify the average characteristics of compliers and units with compliant peers. Hence, even if compliance type is unobservable, it is possible to characterize the distribution of observable characteristics for these subgroups.
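As an illustration of the case $g(y,x) = x$, the average of a covariate among compliers can be estimated as a ratio of two weighted sample means: the numerator uses $g = x$ and the denominator uses $g = 1$, which estimates $P[C_{ig}]$. The sketch below assumes the assignment probabilities $p_{10}(X_g)$ are known or have been estimated beforehand; the function name, arguments and toy numbers are hypothetical.

```python
import numpy as np

def complier_covariate_mean(x, d_own, z_own, z_peer, p10):
    """Weighted-mean estimate of E[X | C] based on Proposition 4.

    x      : scalar covariate (any function of X_g), one entry per unit
    p10    : P[Z_ig = 1, Z_jg = 0 | X_g] for each unit, assumed known
    """
    x, d_own, z_own, z_peer, p10 = (
        np.asarray(a, dtype=float) for a in (x, d_own, z_own, z_peer, p10)
    )
    # Weight is nonzero only for treated units in the cell (Z_ig, Z_jg) = (1, 0).
    w = d_own * z_own * (1.0 - z_peer) / p10
    return (x * w).mean() / w.mean()


# Toy inputs, made up for illustration.
x      = np.array([1.0, 2.0, 3.0, 4.0])
d_own  = np.array([1, 0, 1, 0])
z_own  = np.array([1, 1, 1, 0])
z_peer = np.array([0, 0, 0, 0])
p10    = np.array([0.5, 0.5, 0.25, 0.5])

print(complier_covariate_mean(x, d_own, z_own, z_peer, p10))
```

Taking the ratio avoids having to estimate $P[C_{ig}]$ separately; the analogous calculation for units with compliant peers would use the peer's treatment indicator, the cell in which only the peer is assigned, and $p_{01}(X_g)$.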

7.2 Multiple Units Per Group

Without further assumptions, identification becomes increasingly difficult as group size grows. The larger the group, the larger the set of compliance types, as units may respond in different ways to the different possible combinations of own and peers’ treatment assignments, and it is generally not possible to pin down each unit’s type. In particular, one-sided noncompliance is not enough to identify causal parameters when more than one unit is assigned to treatment, since in that case it is not possible to distinguish between compliers and group compliers or between group compliers and never-takers.

Some studies have addressed this problem by restricting the way in which potential treatment status depends on peers’ assignments (see e.g. Kang and Imbens, 2016; Imai et al., forthcoming). In this section I provide an alternative assumption, independence of peers’ types (IPT), under which average potential outcomes do not depend on peers’ compliance types. I then show that, under a generalization of the monotonicity assumption, average potential outcomes can be identified in the presence of two-sided noncompliance and an unrestricted spillovers structure in both outcomes and treatment statuses.

Suppose that each group $g$ has $n_g + 1$ identically-distributed units, so that each unit in group $g$ has $n_g$ neighbors or peers. The vector of treatment statuses in each group is given by $\mathbf{D}_g = (D_{1g}, \ldots, D_{n_g+1,g})$. For each unit $i$, $D_{j,ig}$ is the treatment indicator corresponding to unit $i$’s $j$-th neighbor, collected in the vector $\mathbf{D}_{(i)g} = (D_{1,ig}, D_{2,ig}, \ldots, D_{n_g,ig})$. This vector takes values $\mathbf{d}_g = (d_1, d_2, \ldots, d_{n_g}) \in \mathcal{D}_g \subseteq \{0,1\}^{n_g}$. For a given realization of the treatment status $(d, \mathbf{d}_g)$, the potential outcome for unit $i$ in group $g$ is $Y_{ig}(d, \mathbf{d}_g)$ with observed outcome $Y_{ig} = Y_{ig}(D_{ig}, \mathbf{D}_{(i)g})$. In what follows, $\mathbf{0}_g$ and $\mathbf{1}_g$ will denote $n_g$-dimensional vectors of zeros and ones, respectively, and the analysis is conducted for a given group size $n_g$.

Let $\mathbf{Z}_{(i)g}$ be the vector of unit $i$’s peers’ instruments, taking values $\mathbf{z}_g \in \{0,1\}^{n_g}$. To reduce the complexity of the model, I will assume that potential statuses and outcomes satisfy an exchangeability condition under which the identities of the treated peers do not matter, and thus the variables depend on the vectors $\mathbf{z}_g$ and $\mathbf{d}_g$, respectively, only through the sum of their elements. See Vazquez-Bare (2020) for a detailed discussion on this assumption. Under this condition, we have that $D_{ig}(z, \mathbf{z}_g) = D_{ig}(z, w_g)$ where $w_g = \mathbf{1}_g' \mathbf{z}_g$ and $Y_{ig}(d, \mathbf{d}_g) = Y_{ig}(d, s_g)$ where $s_g = \mathbf{1}_g' \mathbf{d}_g$.

The following assumption collects the required conditions for the upcoming results.

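Assumption 7 (Identification conditions for general $n_g$)

(a) Exclusion restriction: $Y_{ig}(d, \mathbf{d}_g, \mathbf{z}_g) = Y_{ig}(d, \mathbf{d}_g, \tilde{\mathbf{z}}_g)$ for all $\mathbf{z}_g, \tilde{\mathbf{z}}_g$.

(b) Independence: let $\mathbf{y}_{ig} = (Y_{ig}(d, \mathbf{d}_g))_{(d, \mathbf{d}_g)}$, $\mathbf{y}_{(i)g} = (\mathbf{y}_{jg})_{j \neq i}$, $\bar{\mathbf{d}}_{ig} = (D_{ig}(z, \mathbf{z}_g))_{(z, \mathbf{z}_g)}$ and $\bar{\mathbf{d}}_{(i)g} = (\bar{\mathbf{d}}_{jg})_{j \neq i}$. Then, $(\mathbf{y}_{ig}, \bar{\mathbf{d}}_{ig}, \mathbf{y}_{(i)g}, \bar{\mathbf{d}}_{(i)g}) \perp\!\!\!\perp (Z_{ig}, \mathbf{Z}_{(i)g})$.

(c) Exchangeability:

(i) $D_{ig}(z, \mathbf{z}_g) = D_{ig}(z, w_g)$ where $w_g = \mathbf{1}_g' \mathbf{z}_g$.

(ii) $Y_{ig}(d, \mathbf{d}_g) = Y_{ig}(d, s_g)$ where $s_g = \mathbf{1}_g' \mathbf{d}_g$.

(d) Monotonicity:

(i) $D_{ig}(z, w_g) \ge D_{ig}(z, w_g')$ for $w_g \ge w_g'$ and $z = 0, 1$,

(ii) $D_{ig}(1, 0) \ge D_{ig}(0, n_g)$.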

Parts (a) and (b) in Assumption 7 generalize Assumptions 1 and 2 to the case of general group sizes. Part (c) imposes exchangeability as discussed above. Part (d) generalizes the monotonicity assumption requiring that treatment take-up becomes more likely as the number of peers assigned to treatment increases, and that the effect of own assignment is stronger than the effect of peers’ assignments.

Under monotonicity, we can define five compliance classes. First, always-takers, AT, are units with $D_{ig}(0,0) = 1$, which implies $D_{ig}(z, w_g) = 1$ for all $(z, w_g)$. Next, $w^*$-social compliers, SC($w^*$), are units for whom $D_{ig}(1, w_g) = 1$ for all $w_g$, and for which there exists $0 < w^* < n_g$ such that $D_{ig}(0, w_g) = 1$ for all $w_g \ge w^*$. Thus, $w^*$-social compliers start receiving treatment as soon as $w^*$ of their peers are assigned to treatment. Compliers, C, are units with $D_{ig}(1, w_g) = 1$ and $D_{ig}(0, w_g) = 0$ for all $w_g$. Next, $w^*$-group compliers, GC($w^*$), have $D_{ig}(0, w_g) = 0$ for all $w_g$ and there exists $0 < w^* < n_g$ such that $D_{ig}(1, w_g) = 1$ for all $w_g \ge w^*$. That is, $w^*$-group compliers need to be assigned to treatment and have at least $w^*$ peers assigned to treatment to actually receive the treatment. Finally, never-takers, NT, are units with $D_{ig}(z, w_g) = 0$ for all $(z, w_g)$.

Let $\xi_{ig}$ be a random variable indicating unit $i$’s compliance type, with $\xi_{ig} \in \Xi = \{NT, GC(w^*), C, SC(w^*), AT \mid w^* = 1, \ldots, n_g\}$ and $\xi_{(i)g}$ the vector collecting $\xi_{jg}$ for $j \neq i$. As before, let the event $AT_{ig} = \{\xi_{ig} = AT\}$, $C_{ig} = \{\xi_{ig} = C\}$, $GC_{ig}(w^*) = \{\xi_{ig} = GC(w^*)\}$ and similarly for the other compliance types. Finally, let $W_{ig} = \sum_{j \neq i} Z_{jg}$ be the observed number of unit $i$’s peers assigned to treatment. The following result discusses identification of the distribution of compliance types.

Proposition 5 Under Assumption 7,

\[
P[AT_{ig}] = E[D_{ig} \mid Z_{ig}=0, W_{ig}=0]
\]

\[
P[SC_{ig}(w^*)] = E[D_{ig} \mid Z_{ig}=0, W_{ig}=w^*] - E[D_{ig} \mid Z_{ig}=0, W_{ig}=w^*-1],
\]

1 < w

The following assumption restricts the amount of heterogeneity in potential outcomes by ensuring that potential outcomes are independent from peers’ compliance types, conditional on own type.

Assumption 8 (Independence of peers’ types) Let $\mathbf{y}_{ig} = (Y_{ig}(d, \mathbf{d}_g))_{(d, \mathbf{d}_g)}$. Potential outcomes are independent of peers’ types: $\mathbf{y}_{ig} \perp\!\!\!\perp \xi_{(i)g} \mid \xi_{ig}$.

The following result shows which average potential outcomes are identified under these assumptions.

Proposition 6 (Identification under IPT) Under Assumptions 7 and 8, if $P[D_{ig}=d, Z_{ig}=z, S_{ig}=s, W_{ig}=w] > 0$ for all $(d, z, s, w)$, then $E[Y_{ig}(1,s) \mid \xi_{ig}]$ is identified for all $s$ and $\xi_{ig} \neq NT$, and $E[Y_{ig}(0,s) \mid \xi_{ig}]$ is identified for all $s$ and $\xi_{ig} \neq AT$.

In particular, this result implies that all the average potential outcomes for compliers $E[Y_{ig}(d, s_g) \mid C_{ig}]$ are identified. See the proof in the supplemental appendix for further details.

8 Conclusion

This paper proposed a potential outcomes framework to analyze identification and estimation of causal spillover effects using instrumental variables. The findings in this paper highlight the challenges of analyzing spillover effects with imperfect compliance and provide practical guidance on how to address them. First, when groups consist of pairs (such as couples, roommates, siblings), local average effects can be identified under one-sided noncompliance using 2SLS methods. One advantage of this approach is that one-sided noncompliance is typically straightforward to verify in practice. While 2SLS methods do not work in general under two-sided noncompliance or when units have multiple peers, the independence of peers’ types assumption introduced in Section 7.2 provides an alternative restriction on effect heterogeneity that permits identification of causally interpretable parameters. Finally, Section 7.1 generalizes the results to cases in which the assumption of as-if random assignment of the instruments holds after conditioning on a set of covariates, which allows the researcher to apply the results in this paper in more general settings such as natural experiments.

24 References

Abadie, A. (2003), “Semiparametric instrumental variable estimation of treatment response models,” Journal of , 113, 231 – 263.

Abadie, A., and Cattaneo, M. D. (2018), “Econometric Methods for Program Evaluation,” Annual Review of Economics, 10, 465–503.

Andrews, I., Stock, J. H., and Sun, L. (2019), “Weak Instruments in Instrumental Variables Regression: Theory and Practice,” Annual Review of Economics, 11, 727–753.

Angrist, J. D., and Evans, W. N. (1998), “Children and Their Parents’ Labor Supply: Evi- dence from Exogenous Variation in Family Size,” American Economic Review, 88, 450–477.

Angrist, J. D., Imbens, G. W., and Rubin, D. B. (1996), “Identification of Causal Effects Using Instrumental Variables,” Journal of the American Statistical Association, 91, 444– 455.

Angrist, J. D., and Krueger, A. B. (1991), “Does Compulsory School Attendance Affect Schooling and Earnings?” Quarterly Journal of Economics, 106, 979–1014.

Angrist, J. D., and Krueger, A. B. (2001), “Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments,” Journal of Economic Perspectives, 15, 69–85.

Athey, S., and Imbens, G. (2017), “The Econometrics of Randomized Experiments,” in Handbook of Field Experiments, eds. A. V. Banerjee and E. Duflo, Vol. 1 of Handbook of Economic Field Experiments, North-Holland, pp. 73 – 140.

Babcock, P., Bedard, K., Charness, G., Hartman, J., and Royer, H. (2015), “Letting Down the Team? Social Effects of Team Incentives,” Journal of the European Economic Association, 13, 841–870.

Baird, S., Bohren, A., McIntosh, C., and Özler, B. (2018), “Optimal Design of Experiments in the Presence of Interference,” The Review of Economics and Statistics, 100, 844–860.

Balke, A., and Pearl, J. (1997), “Bounds on Treatment Effects from Studies with Imperfect Compliance,” Journal of the American Statistical Association, 92, 1171–1176.

Barrera-Osorio, F., Bertrand, M., Linden, L. L., and Perez-Calle, F. (2011), “Improving the Design of Conditional Transfer Programs: Evidence from a Randomized Education Experiment in Colombia,” American Economic Journal: Applied Economics, 3, 167–195.

Card, D. (1995), “Using Geographic Variation in College Proximity to Estimate the Return to Schooling,” in Aspects of Labor Behaviour: Essays in Honour of John Vanderkamp, eds. E. G. L.N. Christofides and R. Swidinsky.

Cattaneo, M. D., Jansson, M., and Ma, X. (forthcoming a), “Local Regression Distribution Estimators,” Journal of Econometrics.

Cattaneo, M. D., Jansson, M., and Ma, X. (forthcoming b), “lpdensity: Local Polynomial Density Estimation and Inference,” Journal of Statistical Software.

Currie, J., and Moretti, E. (2003), “Mother’s Education and the Intergenerational Transmission of Human Capital: Evidence from College Openings,” The Quarterly Journal of Economics, 118, 1495–1532.

Duflo, E., and Saez, E. (2003), “The Role of Information and Social Interactions in Retirement Plan Decisions: Evidence from a Randomized Experiment,” The Quarterly Journal of Economics, 118, 815–842.

Dufour, J.-M., and Taamouti, M. (2005), “Projection-Based Statistical Inference in Linear Structural Models with Possibly Weak Instruments,” Econometrica, 73, 1351–1365.

Fletcher, J., and Marksteiner, R. (2017), “Causal Spousal Health Spillover Effects and Implications for Program Evaluation,” American Economic Journal: Economic Policy, 9, 144–66.

Foos, F., and de Rooij, E. A. (2017), “All in the Family: Partisan Disagreement and Electoral Mobilization in Intimate Networks—A Spillover Experiment,” American Journal of Political Science, 61, 289–304.

Garlick, R. (2018), “Academic Peer Effects with Different Group Assignment Policies: Residential Tracking versus Random Assignment,” American Economic Journal: Applied Economics, 10, 345–69.

Gerber, A. S., and Green, D. P. (2000), “The Effects of Canvassing, Telephone Calls, and Direct Mail on Voter Turnout: A Field Experiment,” American Political Science Review, 94, 653–663.

Halloran, M. E., and Hudgens, M. G. (2016), “Dependent Happenings: a Recent Methodological Review,” Current Epidemiology Reports, 3, 297–305.

Heckman, J. J., and Pinto, R. (2018), “Unordered Monotonicity,” Econometrica, 86, 1–35.

Hudgens, M. G., and Halloran, M. E. (2008), “Toward Causal Inference with Interference,” Journal of the American Statistical Association, 103, 832–842.

Imai, K., Jiang, Z., and Malani, A. (forthcoming), “Causal Inference with Interference and Noncompliance in Two-Stage Randomized Experiments,” Journal of the American Statistical Association.

Imbens, G. W., and Angrist, J. D. (1994), “Identification and Estimation of Local Average Treatment Effects,” Econometrica, 62, 467–475.

Imbens, G. W., and Rubin, D. B. (1997), “Estimating Outcome Distributions for Compliers in Instrumental Variables Models,” The Review of Economic Studies, 64, 555–574.

John, P., and Brannan, T. (2008), “How Different Are Telephoning and Canvassing? Results from a ‘Get Out the Vote’ Field Experiment in the British 2005 General Election,” British Journal of Political Science, 38, 565–574.

Kang, H., and Imbens, G. (2016), “Peer Encouragement Designs in Causal Inference with Partial Interference and Identification of Local Average Network Effects,” arXiv:1609.04464.

Kang, H., and Keele, L. (2018), “Spillover Effects in Cluster Randomized Trials with Noncompliance,” arXiv:1808.06418.

Kitagawa, T. (2015), “A Test for Instrument Validity,” Econometrica, 83, 2043–2063.

Mikusheva, A. (2010), “Robust confidence sets in the presence of weak instruments,” Journal of Econometrics, 157, 236–247.

Moffitt, R. (2001), “Policy Interventions, Low-level Equilibria and Social Interactions,” in Social Dynamics, eds. S. N. Durlauf and P. Young, MIT Press, pp. 45–82.

Mogstad, M., Torgovitsky, A., and Walters, C. R. (2019), “The Causal Interpretation of Two-Stage Least Squares with Multiple Instrumental Variables,” NBER Working Paper.

Sacerdote, B. (2001), “Peer Effects with Random Assignment: Results for Dartmouth Room- mates,” The Quarterly Journal of Economics, 116, 681–704.

Sobel, M. E. (2006), “What Do Randomized Studies of Housing Mobility Demonstrate?: Causal Inference in the Face of Interference,” Journal of the American Statistical Association, 101, 1398–1407.

Titiunik, R. (2019), “Natural Experiments,” in Advances in Experimental Political Science, eds. J. Druckman and D. Green, In preparation for Cambridge University Press.

Vazquez-Bare, G. (2020), “Identification and Estimation of Spillover Effects in Randomized Experiments,” arXiv:1711.02745.

Appendices

A Additional Identification Results

A.1 Indirect ITT Effects

The following result characterizes the indirect ITT effect.

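Lemma 3 Under Assumptions 1-3,

\[
\begin{aligned}
\mathbb{E}[Y_{ig} \mid Z_{ig}=0, Z_{jg}=1] &- \mathbb{E}[Y_{ig} \mid Z_{ig}=0, Z_{jg}=0] = \\
&\mathbb{E}[Y_{ig}(1,0) - Y_{ig}(0,0) \mid \{SC_{ig}\} \times \{GC_{jg}, NT_{jg}\}] \times \mathbb{P}[\{SC_{ig}\} \times \{GC_{jg}, NT_{jg}\}] \\
&+ \mathbb{E}[Y_{ig}(1,1) - Y_{ig}(0,0) \mid \{SC_{ig}\} \times \{SC_{jg}, C_{jg}\}] \times \mathbb{P}[\{SC_{ig}\} \times \{SC_{jg}, C_{jg}\}] \\
&+ \mathbb{E}[Y_{ig}(1,1) - Y_{ig}(0,1) \mid SC_{ig}, AT_{jg}] \times \mathbb{P}[SC_{ig}, AT_{jg}] \\
&+ \mathbb{E}[Y_{ig}(0,1) - Y_{ig}(0,0) \mid \{C_{ig}, GC_{ig}, NT_{ig}\} \times \{SC_{jg}, C_{jg}\}] \times \mathbb{P}[\{C_{ig}, GC_{ig}, NT_{ig}\} \times \{SC_{jg}, C_{jg}\}] \\
&+ \mathbb{E}[Y_{ig}(1,1) - Y_{ig}(1,0) \mid \{AT_{ig}\} \times \{SC_{jg}, C_{jg}\}] \times \mathbb{P}[\{AT_{ig}\} \times \{SC_{jg}, C_{jg}\}].
\end{aligned}
\]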

A.2 Total ITT Effects

The following result characterizes the total ITT effect.

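Lemma 4 Under Assumptions 1-3,

\[
\begin{aligned}
\mathbb{E}[Y_{ig} \mid Z_{ig}=1, Z_{jg}=1] &- \mathbb{E}[Y_{ig} \mid Z_{ig}=0, Z_{jg}=0] = \\
&\mathbb{E}[Y_{ig}(1,0) - Y_{ig}(0,0) \mid \{SC_{ig}, C_{ig}, GC_{ig}\} \times \{NT_{jg}\}] \times \mathbb{P}[\{SC_{ig}, C_{ig}, GC_{ig}\} \times \{NT_{jg}\}] \\
&+ \mathbb{E}[Y_{ig}(1,1) - Y_{ig}(1,0) \mid \{AT_{ig}\} \times \{SC_{jg}, C_{jg}, GC_{jg}\}] \times \mathbb{P}[\{AT_{ig}\} \times \{SC_{jg}, C_{jg}, GC_{jg}\}] \\
&+ \mathbb{E}[Y_{ig}(0,1) - Y_{ig}(0,0) \mid \{NT_{ig}\} \times \{SC_{jg}, C_{jg}, GC_{jg}\}] \times \mathbb{P}[\{NT_{ig}\} \times \{SC_{jg}, C_{jg}, GC_{jg}\}] \\
&+ \mathbb{E}[Y_{ig}(1,1) - Y_{ig}(0,1) \mid \{SC_{ig}, C_{ig}, GC_{ig}\} \times \{AT_{jg}\}] \times \mathbb{P}[\{SC_{ig}, C_{ig}, GC_{ig}\} \times \{AT_{jg}\}] \\
&+ \mathbb{E}[Y_{ig}(1,1) - Y_{ig}(0,0) \mid \{SC_{ig}, C_{ig}, GC_{ig}\} \times \{SC_{jg}, C_{jg}, GC_{jg}\}] \times \mathbb{P}[\{SC_{ig}, C_{ig}, GC_{ig}\} \times \{SC_{jg}, C_{jg}, GC_{jg}\}].
\end{aligned}
\]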

A.3 Identification Under Monotonicity

In the absence of spillovers, Imbens and Rubin (1997) show that different combinations of $D_{ig}$ and $Z_{ig}$ can be exploited to identify average potential outcomes for compliers. The intuition behind this approach is that a unit with $D_{ig}=1$ and $Z_{ig}=0$ is necessarily an always-taker, whereas a unit with $D_{ig}=1$ and $Z_{ig}=1$ can be an always-taker or a complier, and hence the combination of these two cases identifies $\mathbb{E}[Y_{ig}(1) \mid C_{ig}]$ (and an analogous argument implies identification of $\mathbb{E}[Y_{ig}(0) \mid C_{ig}]$). To see why this approach does not work in the presence of spillovers, notice that a unit with $D_{ig}=1$, $D_{jg}=1$, $Z_{ig}=0$, $Z_{jg}=0$ is an always-taker with an always-taker peer. However, a unit with $D_{ig}=1$, $D_{jg}=1$, $Z_{ig}=1$, $Z_{jg}=0$ could be an always-taker, a social complier or a complier, and her peer could be an always-taker or a social complier, so it is not possible to disentangle each unit’s compliance type.

Table 5 shows what can be identified under monotonicity without further restrictions by exploiting the variation in (Dig, Djg,Zig,Zjg). For instance, the first row in the table indicates that:

\[
\mathbb{E}[Y_{ig} \mid D_{ig}=1, D_{jg}=1, Z_{ig}=0, Z_{jg}=0] = \mathbb{E}[Y_{ig}(1,1) \mid AT_{ig}, AT_{jg}].
\]

The table shows that, without further assumptions, it is not possible to point identify average potential outcomes for specific compliance types, with the exception of $\mathbb{E}[Y_{ig}(1,1) \mid AT_{ig}, AT_{jg}]$ and $\mathbb{E}[Y_{ig}(0,0) \mid NT_{ig}, NT_{jg}]$.

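Table 5: System of equations

| $D_{ig}$ | $D_{jg}$ | $Z_{ig}$ | $Z_{jg}$ | $Y_{ig}(d,d')$ | $\xi_{ig}$ | $\xi_{jg}$ |
|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | $Y_{ig}(1,1)$ | AT | AT |
| 1 | 1 | 0 | 1 | $Y_{ig}(1,1)$ | AT,SC | AT,SC,C |
| 1 | 1 | 1 | 0 | $Y_{ig}(1,1)$ | AT,SC,C | AT,SC |
| 1 | 1 | 1 | 1 | $Y_{ig}(1,1)$ | AT,SC,C,GC | AT,SC,C,GC |
| 0 | 0 | 1 | 1 | $Y_{ig}(0,0)$ | NT | NT |
| 0 | 0 | 1 | 0 | $Y_{ig}(0,0)$ | GC,NT | C,GC,NT |
| 0 | 0 | 0 | 1 | $Y_{ig}(0,0)$ | C,GC,NT | GC,NT |
| 0 | 0 | 0 | 0 | $Y_{ig}(0,0)$ | SC,C,GC,NT | SC,C,GC,NT |
| 1 | 0 | 0 | 0 | $Y_{ig}(1,0)$ | AT | SC,C,GC,NT |
| 1 | 0 | 0 | 1 | $Y_{ig}(1,0)$ | AT,SC | GC,NT |
| 1 | 0 | 1 | 0 | $Y_{ig}(1,0)$ | AT,SC,C | C,GC,NT |
| 1 | 0 | 1 | 1 | $Y_{ig}(1,0)$ | AT,SC,C,GC | NT |
| 0 | 1 | 1 | 1 | $Y_{ig}(0,1)$ | NT | AT,SC,C,GC |
| 0 | 1 | 1 | 0 | $Y_{ig}(0,1)$ | GC,NT | AT,SC |
| 0 | 1 | 0 | 1 | $Y_{ig}(0,1)$ | C,GC,NT | AT,SC,C |
| 0 | 1 | 0 | 0 | $Y_{ig}(0,1)$ | SC,C,GC,NT | AT |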

A.4 Multiple Treatment Levels

The identification results in the paper can be extended to the case of multi-level treatments. To adapt the notation to this case, suppose that $D_{ig} \in \mathcal{D} = \{0, 1, \ldots, K\}$. The potential outcome is $Y_{ig}(k, k')$ (which implicitly imposes the exclusion restriction) where $k, k' \in \{0, 1, \ldots, K\}$ indicate different treatment levels. The observed outcome is:

\[
Y_{ig} = \sum_{k \in \mathcal{D}} \sum_{k' \in \mathcal{D}} Y_{ig}(k, k') \, \mathbb{1}(D_{ig} = k) \, \mathbb{1}(D_{jg} = k').
\]

Suppose that the instrument $Z_{ig}$ takes as many values as the treatment, that is, $Z_{ig} \in \mathcal{Z}$ where $\mathcal{Z} = \mathcal{D}$. For example, $Z_{ig}$ could indicate random assignment into different treatment levels, and $D_{ig}$ indicates whether unit $i$ actually receives the assigned treatment level. The potential treatment status is $D_{ig}(k, k')$ where $k, k' \in \mathcal{Z}$. Each unit $i$ can have up to $(K+1)^2$ different potential treatment statuses given by own and peer’s treatment assignment.

I will assume that noncompliance is one-sided, and that units cannot switch between non-control treatment statuses. For example, in Foos and de Rooij (2017), the treatment assignment consists of three treatment levels: control, low-intensity treatment and high-intensity treatment. In this setting, units may refuse the treatment they are assigned, but units assigned to the low-intensity treatment cannot receive the high-intensity treatment and vice versa.

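Proposition 7 Suppose that, in addition to Assumption 2, $\mathcal{D} = \mathcal{Z} = \{0, 1, 2, \ldots, K\}$, and:

(a) $D_{ig}(0, k') = 0$ for all $k' \in \mathcal{Z}$,

(b) $D_{ig}(k, k') \in \{0, k\}$ for all $k, k' \in \mathcal{Z}$.

Then, for any $k, k' \in \{0, 1, \ldots, K\}$ such that $\mathbb{E}[\mathbb{1}(D_{ig}(k,0) = k)] > 0$ and $\mathbb{E}[\mathbb{1}(D_{jg}(k',0) = k')] > 0$,

\[
\begin{aligned}
\mathbb{E}[\mathbb{1}(D_{ig} = k) \mid Z_{ig} = k, Z_{jg} = 0] &= \mathbb{E}[\mathbb{1}(D_{ig}(k,0) = k)] \\
\frac{\mathbb{E}[Y_{ig} \mid Z_{ig} = k, Z_{jg} = 0] - \mathbb{E}[Y_{ig} \mid Z_{ig} = 0, Z_{jg} = 0]}{\mathbb{E}[\mathbb{1}(D_{ig} = k) \mid Z_{ig} = k, Z_{jg} = 0]} &= \mathbb{E}[Y_{ig}(k,0) - Y_{ig}(0,0) \mid D_{ig}(k,0) = k] \\
\frac{\mathbb{E}[Y_{ig} \mid Z_{ig} = 0, Z_{jg} = k'] - \mathbb{E}[Y_{ig} \mid Z_{ig} = 0, Z_{jg} = 0]}{\mathbb{E}[\mathbb{1}(D_{jg} = k') \mid Z_{ig} = 0, Z_{jg} = k']} &= \mathbb{E}[Y_{ig}(0,k') - Y_{ig}(0,0) \mid D_{jg}(k',0) = k'].
\end{aligned}
\]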

Condition (a) in Proposition 7 implies that units who are assigned to the control condition remain untreated, and Condition (b) states that a unit who is offered treatment level $k$ can either receive that treatment level or remain untreated. This result shows how to identify the proportion of units who comply with treatment level $k$, $\mathbb{E}[\mathbb{1}(D_{ig}(k,0) = k)]$, the average direct effect on units who comply with treatment level $k$, $\mathbb{E}[Y_{ig}(k,0) - Y_{ig}(0,0) \mid D_{ig}(k,0) = k]$, and the average spillover effect on units whose peer complies with treatment level $k'$.
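The ratios in Proposition 7 can be computed by replacing each expectation with a cell mean. The following is a minimal sketch of that plug-in calculation, not the code used in the paper; the function name `wald_effect` and its argument names are hypothetical, and the inputs are assumed to be one-dimensional arrays with one entry per unit.

```python
import numpy as np

def wald_effect(y, d_ref, z_ref, z_other, k):
    """Plug-in version of the ratios in Proposition 7 (illustrative only).

    Compares the cell {z_ref = k, z_other = 0} with the pure control cell
    {z_ref = 0, z_other = 0} and rescales by the share of units in the
    first cell whose reference treatment equals k.
    """
    y, d_ref, z_ref, z_other = map(np.asarray, (y, d_ref, z_ref, z_other))
    assigned_cell = (z_ref == k) & (z_other == 0)
    control_cell = (z_ref == 0) & (z_other == 0)
    first_stage = np.mean(d_ref[assigned_cell] == k)   # share complying with level k
    reduced_form = y[assigned_cell].mean() - y[control_cell].mean()
    return first_stage, reduced_form / first_stage
```

For the direct effect, pass the unit's own treatment and instrument as `d_ref` and `z_ref` and the peer's instrument as `z_other`. For the spillover effect, pass the peer's treatment and instrument as `d_ref` and `z_ref` and the unit's own instrument as `z_other`.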

B Further Details on Estimation and Inference

All the parameters of interest in Section 4 can be recovered by estimating expectations using sample means. More precisely, let

\[
\mathbb{1}^z_{ig} = \begin{bmatrix} \mathbb{1}(Z_{ig}=0, Z_{jg}=0) \\ \mathbb{1}(Z_{ig}=1, Z_{jg}=0) \\ \mathbb{1}(Z_{ig}=0, Z_{jg}=1) \\ \mathbb{1}(Z_{ig}=1, Z_{jg}=1) \end{bmatrix}
\]

and let $H(\cdot)$ be a vector-valued function whose exact shape depends on the parameters to be estimated, as illustrated below. Then the goal is to estimate:

\[
\boldsymbol{\mu} = \mathbb{E}\begin{bmatrix} \mathbb{1}^z_{ig} \\ H(Y_{ig}, D_{ig}, D_{jg}) \otimes \mathbb{1}^z_{ig} \end{bmatrix}
\]

where the first four elements correspond to the assignment probabilities $\mathbb{P}[Z_{ig}=z, Z_{jg}=z']$ and the remaining elements correspond to estimands of the form $\mathbb{E}[Y_{ig}\,\mathbb{1}(D_{ig}=d, D_{jg}=d')\,\mathbb{1}(Z_{ig}=z, Z_{jg}=z')]$. The most general choice of $H$ in this setup is the following:

\[
H(Y_{ig}, D_{ig}, D_{jg}) = \begin{bmatrix} \mathbb{1}(D_{ig}=0, D_{jg}=0) \\ \vdots \\ \mathbb{1}(D_{ig}=1, D_{jg}=1) \\ Y_{ig}\,\mathbb{1}(D_{ig}=0, D_{jg}=0) \\ \vdots \\ Y_{ig}\,\mathbb{1}(D_{ig}=1, D_{jg}=1) \end{bmatrix}
\]

which is a vector of dimension equal to eight that can be used to estimate all the first-stage estimands $\mathbb{E}[\mathbb{1}(D_{ig}=d, D_{jg}=d') \mid Z_{ig}=z, Z_{jg}=z']$ and average outcomes $\mathbb{E}[Y_{ig}\,\mathbb{1}(D_{ig}=d, D_{jg}=d') \mid Z_{ig}=z, Z_{jg}=z']$. In this general case, the total number of equations to be estimated is 36: four probabilities $\mathbb{P}[Z_{ig}=z, Z_{jg}=z']$ plus the four indicators $\mathbb{1}(Z_{ig}=z, Z_{jg}=z')$ times each of the eight elements in $H(\cdot)$. The dimension of $H(\cdot)$ can be reduced, for example, by focusing on ITT parameters $\mathbb{E}[Y_{ig} \mid Z_{ig}=z, Z_{jg}=z']$, which corresponds to:

\[
H(Y_{ig}, D_{ig}, D_{jg}) = Y_{ig},
\]

or by imposing the assumptions described in previous sections. For instance, under one-sided noncompliance, the parameters in Corollaries 1 and 2 can be estimated by defining:

\[
\mathbb{1}^z_{ig} = \begin{bmatrix} \mathbb{1}(Z_{ig}=0, Z_{jg}=0) \\ \mathbb{1}(Z_{ig}=1, Z_{jg}=0) \\ \mathbb{1}(Z_{ig}=0, Z_{jg}=1) \end{bmatrix}
\quad \text{and} \quad
H(Y_{ig}, D_{ig}, D_{jg}) = \begin{bmatrix} D_{ig} \\ Y_{ig} \\ Y_{ig}(1 - D_{ig}) \\ Y_{ig}(1 - D_{jg}) \end{bmatrix}.
\]

Regardless of the choice of $\mathbb{1}^z_{ig}$ and $H(\cdot)$, the dimension of the vector of parameters to be estimated is fixed (and at most 36). Consider the following sample mean estimator:

\[
\hat{\boldsymbol{\mu}} = \frac{1}{G} \sum_{g=1}^{G} W_g
\]

where

\[
W_g = \begin{bmatrix} (\mathbb{1}^z_{1g} + \mathbb{1}^z_{2g})/2 \\ (H(Y_{1g}, D_{1g}, D_{2g}) \otimes \mathbb{1}^z_{1g} + H(Y_{2g}, D_{2g}, D_{1g}) \otimes \mathbb{1}^z_{2g})/2 \end{bmatrix}.
\]

It is straightforward to see that under Assumption 5, $\hat{\boldsymbol{\mu}}$ is consistent for $\boldsymbol{\mu}$ and converges in distribution to a normal random variable after centering and rescaling as $G \to \infty$:

\[
\sqrt{G}(\hat{\boldsymbol{\mu}} - \boldsymbol{\mu}) \to_{\mathcal{D}} \mathcal{N}(0, \Sigma)
\]

where $\Sigma = \mathbb{E}[(W_g - \boldsymbol{\mu})(W_g - \boldsymbol{\mu})']$, and where the limiting variance can be consistently estimated by:

\[
\hat{\Sigma} = \frac{1}{G} \sum_{g} (W_g - \hat{\boldsymbol{\mu}})(W_g - \hat{\boldsymbol{\mu}})'.
\]

Finally, once $\hat{\boldsymbol{\mu}}$ has been estimated, the treatment effects of interest can be estimated as (possibly nonlinear) transformations of $\hat{\boldsymbol{\mu}}$, and their variance estimated using the delta method.
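As an illustration of the sample mean estimator described in this appendix, the sketch below builds the group-level vectors for the ITT choice of H (H equal to the outcome) and returns the estimated mean vector and variance matrix. It is a simplified example under stated assumptions, not the paper's implementation: the function names are hypothetical, and the data are assumed to be arrays of shape (G, 2) with one row per pair.

```python
import numpy as np

CELLS = [(0, 0), (1, 0), (0, 1), (1, 1)]  # (own assignment, peer assignment)

def cell_indicators(z_own, z_peer):
    """Return a (G, 4) array of indicators 1(Z_own = z, Z_peer = z')."""
    return np.stack(
        [(z_own == a) & (z_peer == b) for a, b in CELLS], axis=-1
    ).astype(float)

def moment_estimates(y, z):
    """Sample mean of W_g and its variance for H(Y, D, D') = Y.

    y, z: arrays of shape (G, 2); column 0 is unit 1, column 1 is unit 2.
    """
    y = np.asarray(y, dtype=float)
    z = np.asarray(z)
    ind1 = cell_indicators(z[:, 0], z[:, 1])  # unit 1: own = col 0, peer = col 1
    ind2 = cell_indicators(z[:, 1], z[:, 0])  # unit 2: roles reversed
    w = np.hstack([
        (ind1 + ind2) / 2,                        # assignment-probability block
        (y[:, [0]] * ind1 + y[:, [1]] * ind2) / 2,  # outcome-by-cell block
    ])
    mu_hat = w.mean(axis=0)
    centered = w - mu_hat
    sigma_hat = centered.T @ centered / len(w)
    return mu_hat, sigma_hat
```

With this choice of H, the ITT cell means are the ratios `mu_hat[4:] / mu_hat[:4]`, which is a simple example of the nonlinear transformations to which the delta method would be applied.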

C Further Details on AR confidence intervals

In this section I outline the procedure to construct weak-instrument-robust confidence intervals for the direct effect on compliers (the procedure for the spillover effect is analogous). Given some hypothesized value $\beta_1^*$ for this effect, consider the following regression on the subsample of units with untreated peers:

\[
Y_{ig} - \beta_1^* D_{ig} = \theta_0 + \theta_1 Z_{ig} + \epsilon_{ig}.
\]

Since $Z_{ig}$ is binary, $\hat{\theta}_1 = \hat{\gamma}_1 - \beta_1^* \bar{D}_{10}$ where $\bar{D}_{10} = \sum_{g,i} D_{ig} Z_{ig}(1 - Z_{jg}) / \sum_{g,i} Z_{ig}(1 - Z_{jg})$ is the first-stage estimate and $\hat{\gamma}_1$ is the reduced-form estimate $\sum_{g,i} Y_{ig} Z_{ig}(1 - Z_{jg}) / \sum_{g,i} Z_{ig}(1 - Z_{jg}) - \sum_{g,i} Y_{ig}(1 - Z_{ig})(1 - Z_{jg}) / \sum_{g,i} (1 - Z_{ig})(1 - Z_{jg})$. By previous results, as $G \to \infty$, $\hat{\theta}_1 \to_{\mathbb{P}} (\mathbb{E}[Y_{ig}(1,0) - Y_{ig}(0,0) \mid C_{ig}] - \beta_1^*)\,\mathbb{P}[C_{ig}]$ and thus under the null hypothesis that $\mathbb{E}[Y_{ig}(1,0) - Y_{ig}(0,0) \mid C_{ig}] = \beta_1^*$, $\theta_1 = 0$ and:

\[
\frac{\hat{\theta}_1^2}{V(\hat{\theta}_1)} \to_{\mathcal{D}} \chi^2_1
\]

where $\chi^2_1$ is the chi-squared distribution with one degree of freedom and $V(\hat{\theta}_1)$ is the variance of $\hat{\theta}_1$. Noting that both $\hat{\theta}_1$ and its variance depend on $\beta_1^*$, an AR confidence interval for $\mathbb{E}[Y_{ig}(1,0) - Y_{ig}(0,0) \mid C_{ig}]$ is given by:

\[
\{\beta_1^* : \hat{\theta}_1^2 \le V(\hat{\theta}_1)\,\chi^2_{1,1-\alpha}\}.
\]

Finally, since $\hat{\theta}_1 = \hat{\gamma}_1 - \beta_1^* \bar{D}_{10}$, $V(\hat{\theta}_1) = V(\hat{\gamma}_1) + (\beta_1^*)^2 V(\bar{D}_{10}) - 2\beta_1^* \mathrm{Cov}(\hat{\gamma}_1, \bar{D}_{10})$. Therefore, the AR confidence interval can be obtained by solving the inequality:

\[
(\beta_1^*)^2 \left[\bar{D}_{10}^2 - \chi^2_{1,1-\alpha} V(\bar{D}_{10})\right] + 2\beta_1^* \left[\chi^2_{1,1-\alpha} \mathrm{Cov}(\hat{\gamma}_1, \bar{D}_{10}) - \hat{\gamma}_1 \bar{D}_{10}\right] + \hat{\gamma}_1^2 - \chi^2_{1,1-\alpha} V(\hat{\gamma}_1) \le 0
\]

which depends only on the vector of estimated coefficients, their variance matrix and a quantile from the $\chi^2_1$ distribution. In particular, notice that this quadratic function is strictly convex whenever $\bar{D}_{10}^2 / V(\bar{D}_{10}) > \chi^2_{1,1-\alpha}$, which holds when the null hypothesis that the first-stage coefficient is zero is rejected at the $\alpha$ level. In this case, the AR confidence interval is bounded and convex.
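The quadratic inequality above can be solved in closed form. The sketch below is one way to do so, assuming the reduced-form estimate, the first-stage estimate and their variances and covariance have already been computed; the function name `ar_interval` and its arguments are hypothetical. It only handles the bounded case discussed in the text and returns `None` otherwise.

```python
from math import sqrt
from statistics import NormalDist

def ar_interval(gamma_hat, d_bar, var_gamma, var_d, cov_gd, alpha=0.05):
    """Solve a*b^2 + b_lin*b + c <= 0 for b in the strictly convex case."""
    # The chi-squared(1) quantile equals the squared two-sided normal quantile.
    crit = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    a = d_bar ** 2 - crit * var_d
    b_lin = 2 * (crit * cov_gd - gamma_hat * d_bar)
    c = gamma_hat ** 2 - crit * var_gamma
    if a <= 0:
        return None  # first stage not significant: set is not a bounded interval
    disc = b_lin ** 2 - 4 * a * c
    if disc < 0:
        return None  # guard against numerically inconsistent inputs
    root = sqrt(disc)
    return ((-b_lin - root) / (2 * a), (-b_lin + root) / (2 * a))
```

When the first-stage variance and the covariance are both zero, the interval reduces to the usual normal-approximation interval for the reduced form divided by the first-stage estimate, which gives a quick sanity check.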

D Additional Empirical Results

The experiment conducted by Foos and de Rooij (2017) included two different treatment levels. More precisely, the sample of 5,190 two-voter households with landline numbers was stratified into three blocks based on their last recorded party preference (Labour party supporter, rival party supporter, unattached) and randomly assigned to one of three treatment arms:

• High-intensity treatment: the telephone message had a strong partisan tone, explicitly mentioning the Labour party and policies, taking an antagonistic stance toward the main rival party.

• Low-intensity treatment: the telephone message avoided statements about party competition and did not mention the candidate’s affiliation or the rival party.

• Control: did not receive any form of contact from the campaign.

Finally, within the households assigned to the low- or high-intensity treatment arms, only one household member was randomly selected to receive the telephone message.

In this section, I apply the results from Proposition 7 to analyze the effect of each treatment arm by separately comparing households exposed to each treatment intensity to the control households. The empirical results are shown in Table 6. The estimates suggest that both treatment arms had very similar effects.

Table 6: Estimation Results with Multiple Treatments

| | Low-intensity treatment | High-intensity treatment |
|---|---|---|
| ITT | | |
| $Z_{ig}$ | 0.0300 | 0.0310 |
| | (0.0145) | (0.0146) |
| | [0.0016 , 0.0584] | [0.0023 , 0.0597] |
| $Z_{jg}$ | 0.0485 | 0.0433 |
| | (0.0148) | (0.0149) |
| | [0.0195 , 0.0775] | [0.0142 , 0.0724] |
| 2SLS | | |
| $D_{ig}$ | 0.0662 | 0.0688 |
| | (0.0318) | (0.0323) |
| | [0.0039 , 0.1285] | [0.0055 , 0.1322] |
| $D_{jg}$ | 0.1070 | 0.0962 |
| | (0.0324) | (0.0328) |
| | [0.0435 , 0.1706] | [0.0318 , 0.1606] |
| N | 7,750 | 7,696 |
| Clusters | 3,875 | 3,848 |

Notes: estimated results from reduced-form regressions (“ITT”) and 2SLS regressions (“2SLS”) separately for each treatment arm against the control households. The first column shows the ITT and 2SLS results for the low-intensity treatment arm. The second column shows the ITT and 2SLS results for the high-intensity treatment arm. Standard errors in parentheses. 95%-confidence intervals in brackets are based on the large-sample normal approximation. Estimation accounts for clustering at the household level.

E Proofs of Main Results

E.1 Proof of Proposition 1

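By Assumption 2, $\mathbb{E}[D_{ig} \mid Z_{ig}=z, Z_{jg}=z'] = \mathbb{E}[D_{ig}(z,z')]$. Thus, under monotonicity (Assumption 3),

\[
\begin{aligned}
\mathbb{E}[D_{ig} \mid Z_{ig}=0, Z_{jg}=0] &= \mathbb{E}[D_{ig}(0,0)] = \mathbb{P}[AT_{ig}] \\
\mathbb{E}[D_{ig} \mid Z_{ig}=0, Z_{jg}=1] &= \mathbb{E}[D_{ig}(0,1)] = \mathbb{P}[AT_{ig}] + \mathbb{P}[SC_{ig}] \\
\mathbb{E}[D_{ig} \mid Z_{ig}=1, Z_{jg}=0] &= \mathbb{E}[D_{ig}(1,0)] = \mathbb{P}[AT_{ig}] + \mathbb{P}[SC_{ig}] + \mathbb{P}[C_{ig}] \\
\mathbb{E}[D_{ig} \mid Z_{ig}=1, Z_{jg}=1] &= \mathbb{E}[D_{ig}(1,1)] = \mathbb{P}[AT_{ig}] + \mathbb{P}[SC_{ig}] + \mathbb{P}[C_{ig}] + \mathbb{P}[GC_{ig}]
\end{aligned}
\]

and by solving the system it follows that:

\[
\begin{aligned}
\mathbb{P}[AT_{ig}] &= \mathbb{E}[D_{ig} \mid Z_{ig}=0, Z_{jg}=0] \\
\mathbb{P}[SC_{ig}] &= \mathbb{E}[D_{ig} \mid Z_{ig}=0, Z_{jg}=1] - \mathbb{E}[D_{ig} \mid Z_{ig}=0, Z_{jg}=0] \\
\mathbb{P}[C_{ig}] &= \mathbb{E}[D_{ig} \mid Z_{ig}=1, Z_{jg}=0] - \mathbb{E}[D_{ig} \mid Z_{ig}=0, Z_{jg}=1] \\
\mathbb{P}[GC_{ig}] &= \mathbb{E}[D_{ig} \mid Z_{ig}=1, Z_{jg}=1] - \mathbb{E}[D_{ig} \mid Z_{ig}=1, Z_{jg}=0]
\end{aligned}
\]

and by monotonicity $\mathbb{P}[NT_{ig}] = 1 - \mathbb{P}[AT_{ig}] - \mathbb{P}[SC_{ig}] - \mathbb{P}[C_{ig}] - \mathbb{P}[GC_{ig}]$. Finally,

\[
\begin{aligned}
\mathbb{E}[D_{ig} D_{jg} \mid Z_{ig}=0, Z_{jg}=0] &= \mathbb{E}[D_{ig}(0,0) D_{jg}(0,0)] = \mathbb{P}[AT_{ig}, AT_{jg}] \\
\mathbb{E}[(1 - D_{ig})(1 - D_{jg}) \mid Z_{ig}=1, Z_{jg}=1] &= \mathbb{E}[(1 - D_{ig}(1,1))(1 - D_{jg}(1,1))] = \mathbb{P}[NT_{ig}, NT_{jg}].
\end{aligned}
\]

See Tables 7 and 8 for the whole system of equations. □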
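The system solved in this proof maps the four first-stage conditional means directly into the marginal shares of the five compliance types. A minimal sketch of that mapping follows; the function name and the argument names `d00`, `d01`, `d10`, `d11` (standing for the mean of the treatment indicator given own and peer assignment) are hypothetical labels chosen for this example.

```python
def compliance_shares(d00, d01, d10, d11):
    """Marginal compliance-type shares implied by monotonicity.

    dzz is the mean of D_ig given Z_ig = z and Z_jg = z'.
    """
    shares = {
        "AT": d00,        # always-takers
        "SC": d01 - d00,  # social compliers
        "C": d10 - d01,   # compliers
        "GC": d11 - d10,  # group compliers
    }
    shares["NT"] = 1.0 - d11  # never-takers: the remaining mass
    return shares
```

Under monotonicity each difference should be non-negative, so a clearly negative estimated share in a sample would be informal evidence against the ordering of the first-stage means that the assumption implies.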

E.2 Proof of Lemma 1

Using that:

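\[
\begin{aligned}
\mathbb{E}[Y_{ig} \mid Z_{ig}=z, Z_{jg}=z'] = \mathbb{E}[Y_{ig}(0,0)] &+ \mathbb{E}[(Y_{ig}(1,0) - Y_{ig}(0,0))\, D_{ig}(z,z')(1 - D_{jg}(z',z))] \\
&+ \mathbb{E}[(Y_{ig}(0,1) - Y_{ig}(0,0))\, (1 - D_{ig}(z,z')) D_{jg}(z',z)] \\
&+ \mathbb{E}[(Y_{ig}(1,1) - Y_{ig}(0,0))\, D_{ig}(z,z') D_{jg}(z',z)],
\end{aligned}
\]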

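Table 7: System of equations

| $D_{ig}$ | $D_{jg}$ | $Z_{ig}$ | $Z_{jg}$ | Probabilities |
|---|---|---|---|---|
| 1 | 1 | 0 | 0 | $p_{AA}$ |
| 1 | 1 | 0 | 1 | $p_{AA} + p_{AS} + p_{AC} + p_{SA} + p_{SS} + p_{SC}$ |
| 1 | 1 | 1 | 0 | $p_{AA} + p_{AS} + p_{AC} + p_{SA} + p_{SS} + p_{SC}$ |
| 1 | 1 | 1 | 1 | $1 - 2p_N + p_{NN}$ |
| 0 | 0 | 1 | 1 | $p_{NN}$ |
| 0 | 0 | 1 | 0 | $p_{GC} + p_{GG} + p_{GN} + p_{NC} + p_{NG} + p_{NN}$ |
| 0 | 0 | 0 | 1 | $p_{GC} + p_{GG} + p_{GN} + p_{NC} + p_{NG} + p_{NN}$ |
| 0 | 0 | 0 | 0 | $1 - 2p_A + p_{AA}$ |
| 1 | 0 | 0 | 0 | $p_{AS} + p_{AC} + p_{AG} + p_{AN}$ |
| 1 | 0 | 1 | 1 | $p_{NA} + p_{NS} + p_{NC} + p_{NG}$ |
| 1 | 0 | 0 | 1 | $p_{AG} + p_{AN} + p_{SG} + p_{SN}$ |
| 1 | 0 | 1 | 0 | $p_{AC} + p_{AG} + p_{AN} + p_{SC} + p_{SG} + p_{SN} + p_{CC} + p_{CG} + p_{CN}$ |
| 0 | 1 | 0 | 0 | $p_{AS} + p_{AC} + p_{AG} + p_{AN}$ |
| 0 | 1 | 1 | 1 | $p_{NA} + p_{NS} + p_{NC} + p_{NG}$ |
| 0 | 1 | 1 | 0 | $p_{AG} + p_{AN} + p_{SG} + p_{SN}$ |
| 0 | 1 | 0 | 1 | $p_{AC} + p_{AG} + p_{AN} + p_{SC} + p_{SG} + p_{SN} + p_{CC} + p_{CG} + p_{CN}$ |

Table 8: System of equations - simplified

| $D_{ig}$ | $D_{jg}$ | $Z_{ig}$ | $Z_{jg}$ | Probabilities | Independent? |
|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | $p_{AA}$ | 1 |
| 1 | 1 | 0 | 1 | $p_A + p_S - (p_{AG} + p_{AN} + p_{SG} + p_{SN})$ | 2 |
| 1 | 1 | 1 | 0 | $p_A + p_S - (p_{AG} + p_{AN} + p_{SG} + p_{SN})$ | - |
| 1 | 1 | 1 | 1 | $1 - 2p_N + p_{NN}$ | 3 |
| 0 | 0 | 1 | 1 | $p_{NN}$ | 4 |
| 0 | 0 | 1 | 0 | $p_G + p_N - (p_{AG} + p_{AN} + p_{SG} + p_{SN})$ | 5 |
| 0 | 0 | 0 | 1 | $p_G + p_N - (p_{AG} + p_{AN} + p_{SG} + p_{SN})$ | - |
| 0 | 0 | 0 | 0 | $1 - 2p_A + p_{AA}$ | 6 |
| 1 | 0 | 0 | 0 | $p_A - p_{AA}$ | - |
| 1 | 0 | 1 | 1 | $p_N - p_{NN}$ | - |
| 1 | 0 | 0 | 1 | $p_{AG} + p_{AN} + p_{SG} + p_{SN}$ | 7 |
| 1 | 0 | 1 | 0 | $p_C + (p_{AG} + p_{AN} + p_{SG} + p_{SN})$ | - |
| 0 | 1 | 0 | 0 | $p_A - p_{AA}$ | - |
| 0 | 1 | 1 | 1 | $p_N - p_{NN}$ | - |
| 0 | 1 | 1 | 0 | $p_{AG} + p_{AN} + p_{SG} + p_{SN}$ | - |
| 0 | 1 | 0 | 1 | $p_C + (p_{AG} + p_{AN} + p_{SG} + p_{SN})$ | - |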

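Table 9: $D_{ig}(1,0)(1 - D_{jg}(0,1)) - D_{ig}(0,0)(1 - D_{jg}(0,0))$

| $D_{ig}(1,0)$ | $D_{jg}(0,1)$ | $D_{ig}(0,0)$ | $D_{jg}(0,0)$ | Difference | $\xi_{ig}$ | $\xi_{jg}$ |
|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 0 | | |
| 1 | 1 | 1 | 0 | -1 | AT | SC |
| 1 | 1 | 0 | 1 | 0 | | |
| 1 | 1 | 0 | 0 | 0 | | |
| 1 | 0 | 1 | 0 | 0 | | |
| 0 | 1 | 0 | 1 | 0 | | |
| 0 | 1 | 0 | 0 | 0 | | |
| 1 | 0 | 0 | 0 | 1 | C,SC | C,GC,NT |
| 0 | 0 | 0 | 0 | 0 | | |

Table 10: $(1 - D_{ig}(1,0)) D_{jg}(0,1) - (1 - D_{ig}(0,0)) D_{jg}(0,0)$

| $D_{ig}(1,0)$ | $D_{jg}(0,1)$ | $D_{ig}(0,0)$ | $D_{jg}(0,0)$ | Difference | $\xi_{ig}$ | $\xi_{jg}$ |
|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 0 | | |
| 1 | 1 | 1 | 0 | 0 | | |
| 1 | 1 | 0 | 1 | -1 | C,SC | AT |
| 1 | 1 | 0 | 0 | 0 | | |
| 1 | 0 | 1 | 0 | 0 | | |
| 0 | 1 | 0 | 1 | 0 | | |
| 0 | 1 | 0 | 0 | 1 | GC,NT | SC |
| 1 | 0 | 0 | 0 | 0 | | |
| 0 | 0 | 0 | 0 | 0 | | |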

we have:

\[
\begin{aligned}
\mathbb{E}[Y_{ig} \mid Z_{ig}=1, Z_{jg}=0] &- \mathbb{E}[Y_{ig} \mid Z_{ig}=0, Z_{jg}=0] = \\
&\mathbb{E}[(Y_{ig}(1,0) - Y_{ig}(0,0))(D_{ig}(1,0)(1 - D_{jg}(0,1)) - D_{ig}(0,0)(1 - D_{jg}(0,0)))] \\
&+ \mathbb{E}[(Y_{ig}(0,1) - Y_{ig}(0,0))((1 - D_{ig}(1,0)) D_{jg}(0,1) - (1 - D_{ig}(0,0)) D_{jg}(0,0))] \\
&+ \mathbb{E}[(Y_{ig}(1,1) - Y_{ig}(0,0))(D_{ig}(1,0) D_{jg}(0,1) - D_{ig}(0,0) D_{jg}(0,0))].
\end{aligned}
\]

Tables 9, 10 and 11 list the possible values that the terms that depend on the potential treatment statuses can take, which gives the desired result after some algebra. □

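Table 11: $D_{ig}(1,0) D_{jg}(0,1) - D_{ig}(0,0) D_{jg}(0,0)$

| $D_{ig}(1,0)$ | $D_{jg}(0,1)$ | $D_{ig}(0,0)$ | $D_{jg}(0,0)$ | Difference | $\xi_{ig}$ | $\xi_{jg}$ |
|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 0 | | |
| 1 | 1 | 1 | 0 | 1 | AT | SC |
| 1 | 1 | 0 | 1 | 1 | C,SC | AT |
| 1 | 1 | 0 | 0 | 1 | C,SC | SC |
| 1 | 0 | 1 | 0 | 0 | | |
| 0 | 1 | 0 | 1 | 0 | | |
| 0 | 1 | 0 | 0 | 0 | | |
| 1 | 0 | 0 | 0 | 0 | | |
| 0 | 0 | 0 | 0 | 0 | | |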

E.3 Proof of Proposition 2

The result follows using that

\[
\begin{aligned}
\mathbb{E}[Y_{ig} \mid Z_{ig}=z, Z_{jg}=z'] = \mathbb{E}[Y_{ig}(0,0)] &+ \mathbb{E}[(Y_{ig}(1,0) - Y_{ig}(0,0))\, D_{ig}(z,z')(1 - D_{jg}(z',z))] \\
&+ \mathbb{E}[(Y_{ig}(0,1) - Y_{ig}(0,0))\, (1 - D_{ig}(z,z')) D_{jg}(z',z)] \\
&+ \mathbb{E}[(Y_{ig}(1,1) - Y_{ig}(0,0))\, D_{ig}(z,z') D_{jg}(z',z)],
\end{aligned}
\]

combined with the facts that under one-sided noncompliance, $D_{ig}(0,1) = D_{ig}(0,0) = 0$ for all $i$, $D_{ig}(1,0) = 1$ implies that $i$ is a complier and $D_{ig}(1,1) = 0$ implies that $i$ is a never-taker. □

E.4 Proof of Corollary 1

Combine lines 2 and 5 from the display in Proposition 2 and the results in Proposition 1, noting that under one-sided noncompliance $\mathbb{E}[D_{ig} \mid Z_{ig}=0, Z_{jg}=1] = 0$. □

E.5 Proof of Corollary 2

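We have that $\mathbb{E}[Y_{ig}(0,0)] = \mathbb{E}[Y_{ig}(0,0) \mid C_{ig}]\,\mathbb{P}[C_{ig}] + \mathbb{E}[Y_{ig}(0,0) \mid C^c_{ig}]\,\mathbb{P}[C^c_{ig}]$ and thus

\[
\mathbb{E}[Y_{ig}(0,0) \mid C^c_{ig}] = \frac{\mathbb{E}[Y_{ig}(0,0)] - \mathbb{E}[Y_{ig}(0,0) \mid C_{ig}]\,\mathbb{P}[C_{ig}]}{1 - \mathbb{P}[C_{ig}]}
\]

from which

\[
\mathbb{E}[Y_{ig}(0,0) \mid C_{ig}] - \mathbb{E}[Y_{ig}(0,0) \mid C^c_{ig}] = \frac{\mathbb{E}[Y_{ig}(0,0) \mid C_{ig}] - \mathbb{E}[Y_{ig}(0,0)]}{1 - \mathbb{P}[C_{ig}]}.
\]

Using Proposition 2, we obtain

\[
\mathbb{E}[Y_{ig}(0,0) \mid C_{ig}] - \mathbb{E}[Y_{ig}(0,0) \mid C^c_{ig}] = \left\{ \frac{\mathbb{E}[Y_{ig} D_{ig} \mid Z_{ig}=1, Z_{jg}=0]}{\mathbb{E}[D_{ig} \mid Z_{ig}=1, Z_{jg}=0]} - \mathbb{E}[Y_{ig} \mid Z_{ig}=0, Z_{jg}=0] \right\} \frac{1}{1 - \mathbb{E}[D_{ig} \mid Z_{ig}=1, Z_{jg}=0]}.
\]

Similarly,

\[
\mathbb{E}[Y_{ig}(0,0) \mid C_{jg}] - \mathbb{E}[Y_{ig}(0,0) \mid C^c_{jg}] = \frac{\mathbb{E}[Y_{ig}(0,0) \mid C_{jg}] - \mathbb{E}[Y_{ig}(0,0)]}{1 - \mathbb{P}[C_{jg}]}
\]

and thus

\[
\mathbb{E}[Y_{ig}(0,0) \mid C_{jg}] - \mathbb{E}[Y_{ig}(0,0) \mid C^c_{jg}] = \left\{ \frac{\mathbb{E}[Y_{ig} D_{jg} \mid Z_{ig}=0, Z_{jg}=1]}{\mathbb{E}[D_{jg} \mid Z_{ig}=0, Z_{jg}=1]} - \mathbb{E}[Y_{ig} \mid Z_{ig}=0, Z_{jg}=0] \right\} \frac{1}{1 - \mathbb{E}[D_{jg} \mid Z_{ig}=0, Z_{jg}=1]}.
\]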

E.6 Proof of Proposition 3

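Under one-sided noncompliance, for any Borel set $\mathcal{Y}$,

\[
\begin{aligned}
\mathbb{P}[Y_{ig} \in \mathcal{Y}, D_{ig}=0 \mid Z_{ig}=1, Z_{jg}=0] &= \mathbb{P}[Y_{ig}(0,0) \in \mathcal{Y}, D_{ig}(1,0)=0 \mid Z_{ig}=1, Z_{jg}=0] \\
&= \mathbb{P}[Y_{ig}(0,0) \in \mathcal{Y}, D_{ig}(1,0)=0] \\
&= \mathbb{P}[Y_{ig}(0,0) \in \mathcal{Y}, C^c_{ig}]
\end{aligned}
\]

and

\[
\begin{aligned}
\mathbb{P}[Y_{ig} \in \mathcal{Y} \mid Z_{ig}=0, Z_{jg}=0] &= \mathbb{P}[Y_{ig}(0,0) \in \mathcal{Y} \mid Z_{ig}=0, Z_{jg}=0] \\
&= \mathbb{P}[Y_{ig}(0,0) \in \mathcal{Y}] \\
&= \mathbb{P}[Y_{ig}(0,0) \in \mathcal{Y}, C^c_{ig}] + \mathbb{P}[Y_{ig}(0,0) \in \mathcal{Y}, C_{ig}]
\end{aligned}
\]

from which

\[
\mathbb{P}[Y_{ig}(0,0) \in \mathcal{Y}, C_{ig}] = \mathbb{P}[Y_{ig} \in \mathcal{Y} \mid Z_{ig}=0, Z_{jg}=0] - \mathbb{P}[Y_{ig} \in \mathcal{Y}, D_{ig}=0 \mid Z_{ig}=1, Z_{jg}=0].
\]

Since $\mathbb{P}[Y_{ig}(0,0) \in \mathcal{Y}, C_{ig}] \ge 0$, a testable implication of this model is:

\[
\mathbb{P}[Y_{ig} \in \mathcal{Y} \mid Z_{ig}=0, Z_{jg}=0] - \mathbb{P}[Y_{ig} \in \mathcal{Y}, D_{ig}=0 \mid Z_{ig}=1, Z_{jg}=0] \ge 0.
\]

By the same reasoning,

\[
\mathbb{P}[Y_{ig}(0,0) \in \mathcal{Y}, C_{jg}] = \mathbb{P}[Y_{ig} \in \mathcal{Y} \mid Z_{ig}=0, Z_{jg}=0] - \mathbb{P}[Y_{ig} \in \mathcal{Y}, D_{jg}=0 \mid Z_{ig}=0, Z_{jg}=1]
\]

and the testable implication is:

\[
\mathbb{P}[Y_{ig} \in \mathcal{Y} \mid Z_{ig}=0, Z_{jg}=0] - \mathbb{P}[Y_{ig} \in \mathcal{Y}, D_{jg}=0 \mid Z_{ig}=0, Z_{jg}=1] \ge 0
\]

as required. □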
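The first testable implication can be inspected descriptively by taking the sets in the inequality to be histogram bins. The sketch below computes the sample analogue of the left-hand side for each bin; it is only an exploratory check that ignores sampling error, not a formal test, and the function name and arguments are hypothetical.

```python
import numpy as np

def implication_gaps(y, d, z_own, z_peer, bin_edges):
    """Sample analogue of P[Y in bin | Z=(0,0)] - P[Y in bin, D=0 | Z=(1,0)].

    Returns one value per bin; the model implies each population
    counterpart is non-negative.
    """
    y, d, z_own, z_peer = map(np.asarray, (y, d, z_own, z_peer))
    cell00 = (z_own == 0) & (z_peer == 0)
    cell10 = (z_own == 1) & (z_peer == 0)
    counts00, _ = np.histogram(y[cell00], bins=bin_edges)
    counts10, _ = np.histogram(y[cell10 & (d == 0)], bins=bin_edges)
    # Joint probability in the second term: divide by the full cell size.
    return counts00 / cell00.sum() - counts10 / cell10.sum()
```

The analogous check for the peer's compliance status swaps the roles of the own and peer instruments and uses the peer's treatment indicator in place of `d`.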

E.7 Proof of Theorem 1

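First, consider the 2SLS regression of $Y_{ig}$ on $1 - D_{ig}$ and $D_{ig}$ (without an intercept) using $Z_{ig}$ and $1 - Z_{ig}$ as instruments, on the subsample of units with $Z_{jg} = 0$. The 2SLS estimator is given by:

\[
\hat{\boldsymbol{\alpha}} = (\tilde{Z}'\tilde{D})^{-1}\tilde{Z}'Y = \left( \sum_{g,i} \begin{bmatrix} 1 - Z_{ig} \\ Z_{ig} \end{bmatrix} \begin{bmatrix} 1 - D_{ig} & D_{ig} \end{bmatrix} (1 - Z_{jg}) \right)^{-1} \sum_{g,i} \begin{bmatrix} 1 - Z_{ig} \\ Z_{ig} \end{bmatrix} Y_{ig}(1 - Z_{jg}) = \begin{bmatrix} \bar{Y}_{00} \\ \frac{\bar{Y}_{10} - \bar{Y}_{00}}{\bar{D}_{10}} + \bar{Y}_{00} \end{bmatrix}.
\]

The cluster-robust variance estimator is:

\[
\hat{V}_{cr}(\hat{\boldsymbol{\alpha}}) = \frac{1}{4G^2} \left( \frac{1}{2G} \tilde{Z}'\tilde{D} \right)^{-1} \sum_{g} \tilde{\mathbf{z}}_g' \hat{\mathbf{u}}_g \hat{\mathbf{u}}_g' \tilde{\mathbf{z}}_g \left( \frac{1}{2G} \tilde{D}'\tilde{Z} \right)^{-1}
\]

where $\hat{u}_{ig} = (Y_{ig} - \hat{\alpha}_0(1 - D_{ig}) - \hat{\alpha}_1 D_{ig})(1 - Z_{jg})$ and $\hat{\mathbf{u}}_g' = [\hat{u}_{1g} \;\; \hat{u}_{2g}]$. Next,

\[
\tilde{\mathbf{z}}_g' \hat{\mathbf{u}}_g \hat{\mathbf{u}}_g' \tilde{\mathbf{z}}_g = \begin{bmatrix} (1 - Z_{1g})(1 - Z_{2g})(\hat{u}_{1g}^2 + \hat{u}_{2g}^2 + 2\hat{u}_{1g}\hat{u}_{2g}) & 0 \\ 0 & Z_{1g}(1 - Z_{2g})\hat{u}_{1g}^2 + Z_{2g}(1 - Z_{1g})\hat{u}_{2g}^2 \end{bmatrix}.
\]

Then,

\[
\hat{V}_{cr,11}(\hat{\boldsymbol{\alpha}}) = \frac{1}{4G^2 \hat{p}_{00}^2} \sum_{g,i} (1 - Z_{ig})(1 - Z_{jg})(\hat{u}_{ig}^2 + \hat{u}_{ig}\hat{u}_{jg})
\]

and

\[
\hat{V}_{cr,22}(\hat{\boldsymbol{\alpha}}) = \frac{1}{4G^2 \bar{D}_{10}^2 \hat{p}_{10}^2} \sum_{g,i} Z_{ig}(1 - Z_{jg})\hat{u}_{ig}^2 + \frac{(1 - \bar{D}_{10})^2}{4G^2 \bar{D}_{10}^2 \hat{p}_{00}^2} \sum_{g,i} (1 - Z_{ig})(1 - Z_{jg})(\hat{u}_{ig}^2 + \hat{u}_{ig}\hat{u}_{jg}).
\]

Now, for any invertible transformation $\tilde{A}$, consider the transformed variables $\tilde{Z}\tilde{A}$ and $\tilde{D}\tilde{A}$. Let $\hat{\boldsymbol{\delta}}(\tilde{A}) = ((\tilde{Z}\tilde{A})'(\tilde{D}\tilde{A}))^{-1}(\tilde{Z}\tilde{A})'Y$. The 2SLS estimator using these transformed variables is:

\[
\begin{aligned}
\hat{\boldsymbol{\delta}}(\tilde{A}) &= ((\tilde{Z}\tilde{A})'(\tilde{D}\tilde{A}))^{-1}(\tilde{Z}\tilde{A})'Y = (\tilde{A}'\tilde{Z}'\tilde{D}\tilde{A})^{-1}\tilde{A}'\tilde{Z}'Y \\
&= \tilde{A}^{-1}(\tilde{Z}'\tilde{D})^{-1}(\tilde{A}')^{-1}\tilde{A}'\tilde{Z}'Y = \tilde{A}^{-1}(\tilde{Z}'\tilde{D})^{-1}\tilde{Z}'Y = \tilde{A}^{-1}\hat{\boldsymbol{\alpha}}
\end{aligned}
\]

and

\[
\hat{V}_{cr}(\hat{\boldsymbol{\delta}}(\tilde{A})) = \tilde{A}^{-1}\hat{V}_{cr}(\hat{\boldsymbol{\alpha}})(\tilde{A}^{-1})'.
\]

Now, setting

\[
\tilde{A} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}
\]

gives the 2SLS estimator from the regression of $Y_{ig}$ on $D_{ig}$ and a constant using $Z_{ig}$ as an instrument, conditional on $Z_{jg} = 0$, which is the parameterization considered in the paper. It follows that the conditional Wald estimator of the direct effect on compliers is:

\[
\hat{\delta}_1 = \frac{\bar{Y}_{10} - \bar{Y}_{00}}{\bar{D}_{10}}
\]

and

\[
\hat{V}_{cr}(\hat{\delta}_1) = \hat{V}_{cr,11}(\hat{\boldsymbol{\alpha}}) + \hat{V}_{cr,22}(\hat{\boldsymbol{\alpha}}).
\]

The case for the spillover effect estimator based on the regression of $Y_{ig}$ on $D_{jg}$ using $Z_{jg}$ as an instrument on the subsample of units with $Z_{ig} = 0$ follows analogously.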

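Next, consider the 2SLS regression of $Y_{ig}$ on $(1 - D_{ig})(1 - D_{jg})$, $D_{ig}(1 - D_{jg})$, $(1 - D_{ig})D_{jg}$ and $D_{ig}D_{jg}$ using $(1 - Z_{ig})(1 - Z_{jg})$, $Z_{ig}(1 - Z_{jg})$, $(1 - Z_{ig})Z_{jg}$ and $Z_{ig}Z_{jg}$ as instruments. The 2SLS estimator is:

\[
\hat{\boldsymbol{\theta}} = \left( \frac{1}{2G} Z'D \right)^{-1} \frac{1}{2G} Z'Y.
\]

Now,

\[
\begin{aligned}
\frac{1}{2G} Z'D &= \frac{1}{2G} \sum_{g,i} \begin{bmatrix} (1 - Z_{ig})(1 - Z_{jg}) \\ Z_{ig}(1 - Z_{jg}) \\ (1 - Z_{ig})Z_{jg} \\ Z_{ig}Z_{jg} \end{bmatrix} \begin{bmatrix} (1 - D_{ig})(1 - D_{jg}) & D_{ig}(1 - D_{jg}) & (1 - D_{ig})D_{jg} & D_{ig}D_{jg} \end{bmatrix} \\
&= \begin{bmatrix} \hat{p}_{00} & 0 & 0 & 0 \\ (1 - \bar{D}_{10})\hat{p}_{10} & \bar{D}_{10}\hat{p}_{10} & 0 & 0 \\ (1 - \bar{D}_{10})\hat{p}_{10} & 0 & \bar{D}_{10}\hat{p}_{10} & 0 \\ \bar{D}^{00}_{11}\hat{p}_{11} & \bar{D}^{10}_{11}\hat{p}_{11} & \bar{D}^{01}_{11}\hat{p}_{11} & \bar{D}^{11}_{11}\hat{p}_{11} \end{bmatrix}
\end{aligned}
\]

where

\[
\begin{aligned}
\hat{p}_{zz'} &= \frac{1}{2G} \sum_{g,i} \mathbb{1}(Z_{ig} = z)\,\mathbb{1}(Z_{jg} = z') \\
\bar{D}_{zz'} &= \frac{\sum_{g,i} D_{ig}\,\mathbb{1}(Z_{ig} = z)\,\mathbb{1}(Z_{jg} = z')}{\sum_{g,i} \mathbb{1}(Z_{ig} = z)\,\mathbb{1}(Z_{jg} = z')} \\
\bar{D}^{dd'}_{zz'} &= \frac{\sum_{g,i} \mathbb{1}(D_{ig} = d)\,\mathbb{1}(D_{jg} = d')\,\mathbb{1}(Z_{ig} = z)\,\mathbb{1}(Z_{jg} = z')}{\sum_{g,i} \mathbb{1}(Z_{ig} = z)\,\mathbb{1}(Z_{jg} = z')}.
\end{aligned}
\]

The inverse can be found by direct calculation as the matrix $\hat{Q}$ such that $\frac{1}{2G} Z'D \times \hat{Q} = I$. This gives:

\[
\hat{Q} = \left( \frac{1}{2G} Z'D \right)^{-1} = \begin{bmatrix}
\frac{1}{\hat{p}_{00}} & 0 & 0 & 0 \\
-\frac{(1 - \bar{D}_{10})}{\bar{D}_{10}\hat{p}_{00}} & \frac{1}{\bar{D}_{10}\hat{p}_{10}} & 0 & 0 \\
-\frac{(1 - \bar{D}_{10})}{\bar{D}_{10}\hat{p}_{00}} & 0 & \frac{1}{\bar{D}_{10}\hat{p}_{10}} & 0 \\
\frac{2\bar{D}^{10}_{11}}{\hat{p}_{00}\bar{D}^{11}_{11}} \frac{1 - \bar{D}_{10}}{\bar{D}_{10}} - \frac{\bar{D}^{00}_{11}}{\hat{p}_{00}\bar{D}^{11}_{11}} & -\frac{\bar{D}^{10}_{11}}{\bar{D}_{10}\bar{D}^{11}_{11}\hat{p}_{10}} & -\frac{\bar{D}^{10}_{11}}{\bar{D}_{10}\bar{D}^{11}_{11}\hat{p}_{10}} & \frac{1}{\bar{D}^{11}_{11}\hat{p}_{11}}
\end{bmatrix}.
\]

On the other hand,

\[
\frac{1}{2G} Z'Y = \begin{bmatrix} \bar{Y}_{00}\hat{p}_{00} \\ \bar{Y}_{10}\hat{p}_{10} \\ \bar{Y}_{01}\hat{p}_{10} \\ \bar{Y}_{11}\hat{p}_{11} \end{bmatrix}
\quad \text{where} \quad
\bar{Y}_{zz'} = \frac{\sum_{g,i} Y_{ig}\,\mathbb{1}(Z_{ig} = z)\,\mathbb{1}(Z_{jg} = z')}{\sum_{g,i} \mathbb{1}(Z_{ig} = z)\,\mathbb{1}(Z_{jg} = z')}.
\]

Thus:

\[
\hat{\boldsymbol{\theta}} = \begin{bmatrix}
\bar{Y}_{00} \\
\frac{\bar{Y}_{10} - \bar{Y}_{00}}{\bar{D}_{10}} + \bar{Y}_{00} \\
\frac{\bar{Y}_{01} - \bar{Y}_{00}}{\bar{D}_{10}} + \bar{Y}_{00} \\
\frac{\bar{Y}_{11} - \bar{Y}_{00}}{\bar{D}^{11}_{11}} - \frac{\bar{D}^{10}_{11}}{\bar{D}^{11}_{11}} \frac{\bar{Y}_{10} + \bar{Y}_{01} - 2\bar{Y}_{00}}{\bar{D}_{10}} + \bar{Y}_{00}
\end{bmatrix}.
\]

Now, for any invertible transformation $A$, consider the transformed variables $ZA$ and $DA$. Let $\hat{\boldsymbol{\beta}}(A) = ((ZA)'(DA))^{-1}(ZA)'Y$. The 2SLS estimator using these transformed variables is:

\[
\begin{aligned}
\hat{\boldsymbol{\beta}}(A) &= ((ZA)'(DA))^{-1}(ZA)'Y = (A'Z'DA)^{-1}A'Z'Y \\
&= A^{-1}(Z'D)^{-1}(A')^{-1}A'Z'Y = A^{-1}(Z'D)^{-1}Z'Y = A^{-1}\hat{\boldsymbol{\theta}}
\end{aligned}
\]

and

\[
\hat{V}_{cr}(\hat{\boldsymbol{\beta}}(A)) = A^{-1}\hat{V}_{cr}(\hat{\boldsymbol{\theta}})(A^{-1})'.
\]

Now, setting

\[
A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix}
\]

gives the 2SLS estimator from the regression of $Y_{ig}$ on $D_{ig}$, $D_{jg}$, $D_{ig}D_{jg}$ and a constant using $Z_{ig}$, $Z_{jg}$, $Z_{ig}Z_{jg}$ as instruments, which is the parameterization considered in the paper. It follows that the 2SLS estimator, $\hat{\boldsymbol{\beta}}$, is:

\[
\hat{\boldsymbol{\beta}} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \\ \hat{\beta}_3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 1 & -1 & -1 & 1 \end{bmatrix} \hat{\boldsymbol{\theta}} = \begin{bmatrix}
\bar{Y}_{00} \\
\frac{\bar{Y}_{10} - \bar{Y}_{00}}{\bar{D}_{10}} \\
\frac{\bar{Y}_{01} - \bar{Y}_{00}}{\bar{D}_{10}} \\
\frac{\bar{Y}_{11} - \bar{Y}_{00}}{\bar{D}^{11}_{11}} - \frac{\bar{D}^{11}_{11} + \bar{D}^{10}_{11}}{\bar{D}^{11}_{11}} \left( \frac{\bar{Y}_{10} - \bar{Y}_{00}}{\bar{D}_{10}} + \frac{\bar{Y}_{01} - \bar{Y}_{00}}{\bar{D}_{10}} \right)
\end{bmatrix}
\]

which implies that the treatment effect estimators $\hat{\beta}_1$ and $\hat{\beta}_2$ are identical to the conditional Wald ratio estimators.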
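The equivalence between the saturated 2SLS coefficients and the conditional Wald ratios can be checked numerically. The sketch below simulates pairs under one-sided noncompliance and compares the two sets of estimates; the data-generating process, sample size and coefficient values are arbitrary assumptions made for illustration and are unrelated to the paper's empirical application.

```python
import numpy as np

rng = np.random.default_rng(0)
G = 2000  # number of pairs (arbitrary)

# Columns index the two units in each pair.
z = rng.integers(0, 2, size=(G, 2))
# One-sided noncompliance: a unit can be treated only if assigned;
# take-up probability also depends on the peer's assignment.
takeup = rng.random((G, 2)) < 0.5 + 0.2 * z[:, ::-1]
d = z * takeup
y = 1.0 + 2.0 * d + 0.5 * d[:, ::-1] + rng.normal(size=(G, 2))

# Stack both units of every pair, keeping own and peer variables aligned.
y_s = y.ravel()
d_own, d_peer = d.ravel(), d[:, ::-1].ravel()
z_own, z_peer = z.ravel(), z[:, ::-1].ravel()

# Just-identified 2SLS: regressors (1, D_i, D_j, D_i*D_j),
# instruments (1, Z_i, Z_j, Z_i*Z_j).
X = np.column_stack([np.ones(2 * G), d_own, d_peer, d_own * d_peer])
Zm = np.column_stack([np.ones(2 * G), z_own, z_peer, z_own * z_peer])
beta = np.linalg.solve(Zm.T @ X, Zm.T @ y_s)

# Conditional Wald ratios built from cell means.
cell00 = (z_own == 0) & (z_peer == 0)
cell10 = (z_own == 1) & (z_peer == 0)
cell01 = (z_own == 0) & (z_peer == 1)
wald_direct = (y_s[cell10].mean() - y_s[cell00].mean()) / d_own[cell10].mean()
wald_spill = (y_s[cell01].mean() - y_s[cell00].mean()) / d_peer[cell01].mean()
```

In this simulation `beta[1]` and `beta[2]` should coincide with `wald_direct` and `wald_spill` up to floating-point error, since the instruments are saturated and the model is just-identified.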

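The cluster-robust variance estimator for $\hat{\boldsymbol{\theta}}$ is:

\[
\hat{V}_{cr}(\hat{\boldsymbol{\theta}}) = \frac{1}{4G^2} \left( \frac{1}{2G} Z'D \right)^{-1} \sum_{g} \mathbf{z}_g' \hat{\boldsymbol{\varepsilon}}_g \hat{\boldsymbol{\varepsilon}}_g' \mathbf{z}_g \left( \frac{1}{2G} D'Z \right)^{-1}
\]

where $\hat{\varepsilon}_{ig} = Y_{ig} - \mathbf{D}_{ig}'\hat{\boldsymbol{\theta}}$. We have that:

\[
\mathbf{z}_g' \hat{\boldsymbol{\varepsilon}}_g \hat{\boldsymbol{\varepsilon}}_g' = \begin{bmatrix}
(1 - Z_{1g})(1 - Z_{2g})(\hat{\varepsilon}_{1g}^2 + \hat{\varepsilon}_{1g}\hat{\varepsilon}_{2g}) & (1 - Z_{2g})(1 - Z_{1g})(\hat{\varepsilon}_{2g}^2 + \hat{\varepsilon}_{1g}\hat{\varepsilon}_{2g}) \\
Z_{1g}(1 - Z_{2g})\hat{\varepsilon}_{1g}^2 + Z_{2g}(1 - Z_{1g})\hat{\varepsilon}_{1g}\hat{\varepsilon}_{2g} & Z_{2g}(1 - Z_{1g})\hat{\varepsilon}_{2g}^2 + Z_{1g}(1 - Z_{2g})\hat{\varepsilon}_{1g}\hat{\varepsilon}_{2g} \\
(1 - Z_{1g})Z_{2g}\hat{\varepsilon}_{1g}^2 + (1 - Z_{2g})Z_{1g}\hat{\varepsilon}_{1g}\hat{\varepsilon}_{2g} & (1 - Z_{2g})Z_{1g}\hat{\varepsilon}_{2g}^2 + (1 - Z_{1g})Z_{2g}\hat{\varepsilon}_{1g}\hat{\varepsilon}_{2g} \\
Z_{1g}Z_{2g}(\hat{\varepsilon}_{1g}^2 + \hat{\varepsilon}_{1g}\hat{\varepsilon}_{2g}) & Z_{1g}Z_{2g}(\hat{\varepsilon}_{2g}^2 + \hat{\varepsilon}_{1g}\hat{\varepsilon}_{2g})
\end{bmatrix}
\]

and

\[
\hat{\Omega} = \sum_{g} \mathbf{z}_g' \hat{\boldsymbol{\varepsilon}}_g \hat{\boldsymbol{\varepsilon}}_g' \mathbf{z}_g = \begin{bmatrix} \omega_{11} & 0 & 0 & 0 \\ 0 & \omega_{22} & \omega_{23} & 0 \\ 0 & \omega_{23} & \omega_{33} & 0 \\ 0 & 0 & 0 & \omega_{44} \end{bmatrix}
\]

where

\[
\begin{aligned}
\omega_{11} &= \sum_{g,i} (1 - Z_{ig})(1 - Z_{jg})(\hat{\varepsilon}_{ig}^2 + \hat{\varepsilon}_{ig}\hat{\varepsilon}_{jg}) \\
\omega_{22} &= \sum_{g,i} Z_{ig}(1 - Z_{jg})\hat{\varepsilon}_{ig}^2 \\
\omega_{23} &= \sum_{g,i} Z_{ig}(1 - Z_{jg})\hat{\varepsilon}_{ig}\hat{\varepsilon}_{jg} \\
\omega_{33} &= \sum_{g,i} (1 - Z_{ig})Z_{jg}\hat{\varepsilon}_{ig}^2 \\
\omega_{44} &= \sum_{g,i} Z_{ig}Z_{jg}(\hat{\varepsilon}_{ig}^2 + \hat{\varepsilon}_{ig}\hat{\varepsilon}_{jg}).
\end{aligned}
\]

Let

\[
\hat{Q} = \left( \frac{1}{2G} Z'D \right)^{-1} = \begin{bmatrix} q_{11} & 0 & 0 & 0 \\ q_{21} & q_{22} & 0 & 0 \\ q_{21} & 0 & q_{22} & 0 \\ q_{41} & q_{42} & q_{43} & q_{44} \end{bmatrix}
\]

where the exact values of the matrix were given above. The cluster-robust variance estimator for $\hat{\boldsymbol{\theta}}$ can be rewritten as:

\[
\hat{V}_{cr}(\hat{\boldsymbol{\theta}}) = \frac{1}{4G^2} \hat{Q}\hat{\Omega}\hat{Q}'.
\]

Then,

\[
\begin{aligned}
\hat{V}_{cr,11}(\hat{\boldsymbol{\theta}}) &= \frac{1}{4G^2 \hat{p}_{00}^2} \sum_{g,i} (1 - Z_{ig})(1 - Z_{jg})(\hat{\varepsilon}_{ig}^2 + \hat{\varepsilon}_{ig}\hat{\varepsilon}_{jg}) \\
\hat{V}_{cr,22}(\hat{\boldsymbol{\theta}}) &= \frac{1}{4G^2 \bar{D}_{10}^2 \hat{p}_{10}^2} \sum_{g,i} Z_{ig}(1 - Z_{jg})\hat{\varepsilon}_{ig}^2 + \frac{(1 - \bar{D}_{10})^2}{4G^2 \bar{D}_{10}^2 \hat{p}_{00}^2} \sum_{g,i} (1 - Z_{ig})(1 - Z_{jg})(\hat{\varepsilon}_{ig}^2 + \hat{\varepsilon}_{ig}\hat{\varepsilon}_{jg}).
\end{aligned}
\]

But notice that, if the residuals are the same, we have that $\hat{V}_{cr,11}(\hat{\boldsymbol{\theta}}) = \hat{V}_{cr,11}(\hat{\boldsymbol{\alpha}})$ and $\hat{V}_{cr,22}(\hat{\boldsymbol{\theta}}) = \hat{V}_{cr,22}(\hat{\boldsymbol{\alpha}})$, which in turn implies that the cluster-robust variance estimators for $\hat{\beta}_1$ and $\hat{\delta}_1$ are equal.

To see that the residuals are indeed equal, note that:

\[
\begin{aligned}
(1 - Z_{jg})\hat{u}_{ig}^2 &= (1 - Z_{jg})(Y_{ig} - \hat{\alpha}_0(1 - D_{ig}) - \hat{\alpha}_1 D_{ig})^2 \\
&= (1 - Z_{jg})(Y_{ig} - \hat{\theta}_0(1 - D_{ig}) - \hat{\theta}_1 D_{ig})^2 \\
&= (1 - Z_{jg})\hat{\varepsilon}_{ig}^2
\end{aligned}
\]

which implies that $\hat{\delta}_0 = \hat{\beta}_0$, $\hat{V}_{cr}(\hat{\delta}_0) = \hat{V}_{cr}(\hat{\beta}_0)$, $\hat{\delta}_1 = \hat{\beta}_1$ and $\hat{V}_{cr}(\hat{\delta}_1) = \hat{V}_{cr}(\hat{\beta}_1)$. The results for $\hat{\delta}_2$ and $\hat{\beta}_2$ follow by the same argument. □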

E.8 Proof of Lemma 2

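This result is well known and follows from standard 2SLS properties and from Theorem 1 as $G \to \infty$. To find the exact formula for $\beta_3$, note that

\[
\begin{aligned}
\mathbb{E}[Y_{ig} \mid Z_{ig}=z, Z_{jg}=z'] = \beta_0 &+ \beta_1 \mathbb{E}[D_{ig} \mid Z_{ig}=z, Z_{jg}=z'] \\
&+ \beta_2 \mathbb{E}[D_{jg} \mid Z_{ig}=z, Z_{jg}=z'] \\
&+ \beta_3 \mathbb{E}[D_{ig}D_{jg} \mid Z_{ig}=z, Z_{jg}=z'].
\end{aligned}
\]

Under one-sided noncompliance, $\mathbb{E}[D_{ig} \mid Z_{ig}=0, Z_{jg}=z'] = \mathbb{E}[D_{jg} \mid Z_{ig}=z, Z_{jg}=0] = 0$ and thus:

\[
\begin{aligned}
\mathbb{E}[Y_{ig} \mid Z_{ig}=0, Z_{jg}=0] &= \beta_0 \\
\mathbb{E}[Y_{ig} \mid Z_{ig}=1, Z_{jg}=0] &= \beta_0 + \beta_1 \mathbb{E}[D_{ig} \mid Z_{ig}=1, Z_{jg}=0] \\
\mathbb{E}[Y_{ig} \mid Z_{ig}=0, Z_{jg}=1] &= \beta_0 + \beta_2 \mathbb{E}[D_{jg} \mid Z_{ig}=0, Z_{jg}=1]
\end{aligned}
\]

from which:

\[
\begin{aligned}
\beta_0 &= \mathbb{E}[Y_{ig} \mid Z_{ig}=0, Z_{jg}=0] = \mathbb{E}[Y_{ig}(0,0)] \\
\beta_1 &= \frac{\mathbb{E}[Y_{ig} \mid Z_{ig}=1, Z_{jg}=0] - \mathbb{E}[Y_{ig} \mid Z_{ig}=0, Z_{jg}=0]}{\mathbb{E}[D_{ig} \mid Z_{ig}=1, Z_{jg}=0]} = \mathbb{E}[Y_{ig}(1,0) - Y_{ig}(0,0) \mid C_{ig}] \\
\beta_2 &= \frac{\mathbb{E}[Y_{ig} \mid Z_{ig}=0, Z_{jg}=1] - \mathbb{E}[Y_{ig} \mid Z_{ig}=0, Z_{jg}=0]}{\mathbb{E}[D_{jg} \mid Z_{ig}=0, Z_{jg}=1]} = \mathbb{E}[Y_{ig}(0,1) - Y_{ig}(0,0) \mid C_{jg}] \\
\beta_3 &= \frac{\mathbb{E}[Y_{ig} \mid Z_{ig}=1, Z_{jg}=1] - \beta_0 - \beta_1 \mathbb{E}[D_{ig}(1,1)] - \beta_2 \mathbb{E}[D_{jg}(1,1)]}{\mathbb{E}[D_{ig}(1,1)D_{jg}(1,1)]}
\end{aligned}
\]

as long as $\mathbb{E}[D_{ig}(1,1)D_{jg}(1,1)] > 0$ (otherwise, $\beta_3$ is not identified). Finally, note that

\[
\begin{aligned}
\mathbb{E}[Y_{ig} \mid Z_{ig}=1, Z_{jg}=1] = \mathbb{E}[Y_{ig}(0,0)] &+ \mathbb{E}[Y_{ig}(1,0) - Y_{ig}(0,0) \mid D_{ig}(1,1)=1]\,\mathbb{E}[D_{ig}(1,1)] \\
&+ \mathbb{E}[Y_{ig}(0,1) - Y_{ig}(0,0) \mid D_{jg}(1,1)=1]\,\mathbb{E}[D_{jg}(1,1)] \\
&+ \mathbb{E}[Y_{ig}(1,1) - Y_{ig}(1,0) - Y_{ig}(0,1) + Y_{ig}(0,0) \mid D_{ig}(1,1)=1, D_{jg}(1,1)=1] \times \mathbb{E}[D_{ig}(1,1)D_{jg}(1,1)]
\end{aligned}
\]

and use the fact that $D_{ig}(1,1) = 1$ if $i$ is a complier or a group complier to get that:

\[
\begin{aligned}
\beta_3 = {} & (\mathbb{E}[Y_{ig}(1,0) - Y_{ig}(0,0) \mid GC_{ig}] - \mathbb{E}[Y_{ig}(1,0) - Y_{ig}(0,0) \mid C_{ig}]) \frac{\mathbb{P}[GC_{ig}]}{\mathbb{E}[D_{ig}(1,1)D_{jg}(1,1)]} \\
&+ (\mathbb{E}[Y_{ig}(0,1) - Y_{ig}(0,0) \mid GC_{jg}] - \mathbb{E}[Y_{ig}(0,1) - Y_{ig}(0,0) \mid C_{jg}]) \frac{\mathbb{P}[GC_{jg}]}{\mathbb{E}[D_{ig}(1,1)D_{jg}(1,1)]} \\
&+ \mathbb{E}[Y_{ig}(1,1) - Y_{ig}(1,0) - (Y_{ig}(0,1) - Y_{ig}(0,0)) \mid D_{ig}(1,1)=1, D_{jg}(1,1)=1]
\end{aligned}
\]

which gives the desired result. □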

E.9 Proof of Proposition 4

First,

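$$
\begin{aligned}
E[D_{ig}\mid Z_{ig}=1,Z_{jg}=0,X_g] &= E[D_{ig}(1,0)\mid Z_{ig}=1,Z_{jg}=0,X_g] \\
&= E[D_{ig}(1,0)\mid X_g]=P[C_{ig}\mid X_g]
\end{aligned}
$$

and

$$
\begin{aligned}
E[D_{ig}\mid Z_{ig}=1,Z_{jg}=1,X_g] &= E[D_{ig}(1,1)\mid Z_{ig}=1,Z_{jg}=1,X_g] \\
&= E[D_{ig}(1,1)\mid X_g]=P[C_{ig}\mid X_g]+P[GC_{ig}\mid X_g].
\end{aligned}
$$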

For the second part, we have that for the first term,

$$
\begin{aligned}
E\left[g(Y_{ig},X_g)\frac{(1-Z_{ig})(1-Z_{jg})}{p_{00}(X_g)}\right]
&= E_{X_g}\left\{E\left[g(Y_{ig},X_g)\frac{(1-Z_{ig})(1-Z_{jg})}{p_{00}(X_g)}\,\middle|\,X_g\right]\right\} \\
&= E_{X_g}\{E[g(Y_{ig},X_g)\mid Z_{ig}=0,Z_{jg}=0,X_g]\} \\
&= E_{X_g}\{E[g(Y_{ig}(0,0),X_g)\mid Z_{ig}=0,Z_{jg}=0,X_g]\} \\
&= E_{X_g}\{E[g(Y_{ig}(0,0),X_g)\mid X_g]\} \\
&= E[g(Y_{ig}(0,0),X_g)].
\end{aligned}
$$

For the second term,

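$$
\begin{aligned}
E\left[g(Y_{ig},X_g)D_{ig}\frac{Z_{ig}(1-Z_{jg})}{p_{10}(X_g)}\right]
&= E_{X_g}\left\{E\left[g(Y_{ig},X_g)D_{ig}\frac{Z_{ig}(1-Z_{jg})}{p_{10}(X_g)}\,\middle|\,X_g\right]\right\} \\
&= E_{X_g}\{E[g(Y_{ig},X_g)D_{ig}\mid Z_{ig}=1,Z_{jg}=0,X_g]\} \\
&= E_{X_g}\{E[g(Y_{ig}(1,0),X_g)D_{ig}(1,0)\mid Z_{ig}=1,Z_{jg}=0,X_g]\} \\
&= E_{X_g}\{E[g(Y_{ig}(1,0),X_g)D_{ig}(1,0)\mid X_g]\} \\
&= E[g(Y_{ig}(1,0),X_g)D_{ig}(1,0)] \\
&= E[g(Y_{ig}(1,0),X_g)\mid C_{ig}]\,P[C_{ig}].
\end{aligned}
$$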

For the third term,

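$$
\begin{aligned}
E\left[g(Y_{ig},X_g)D_{jg}\frac{(1-Z_{ig})Z_{jg}}{p_{01}(X_g)}\right]
&= E_{X_g}\left\{E\left[g(Y_{ig},X_g)D_{jg}\frac{(1-Z_{ig})Z_{jg}}{p_{01}(X_g)}\,\middle|\,X_g\right]\right\} \\
&= E_{X_g}\{E[g(Y_{ig},X_g)D_{jg}\mid Z_{ig}=0,Z_{jg}=1,X_g]\} \\
&= E_{X_g}\{E[g(Y_{ig}(0,1),X_g)D_{jg}(1,0)\mid Z_{ig}=0,Z_{jg}=1,X_g]\} \\
&= E_{X_g}\{E[g(Y_{ig}(0,1),X_g)D_{jg}(1,0)\mid X_g]\} \\
&= E[g(Y_{ig}(0,1),X_g)D_{jg}(1,0)] \\
&= E[g(Y_{ig}(0,1),X_g)\mid C_{jg}]\,P[C_{jg}].
\end{aligned}
$$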

For the fourth term,

$$
\begin{aligned}
E\left[g(Y_{ig},X_g)(1-D_{ig})\frac{Z_{ig}(1-Z_{jg})}{p_{10}(X_g)}\right]
&= E_{X_g}\left\{E\left[g(Y_{ig},X_g)(1-D_{ig})\frac{Z_{ig}(1-Z_{jg})}{p_{10}(X_g)}\,\middle|\,X_g\right]\right\} \\
&= E_{X_g}\{E[g(Y_{ig},X_g)(1-D_{ig})\mid Z_{ig}=1,Z_{jg}=0,X_g]\} \\
&= E_{X_g}\{E[g(Y_{ig}(0,0),X_g)(1-D_{ig}(1,0))\mid Z_{ig}=1,Z_{jg}=0,X_g]\} \\
&= E_{X_g}\{E[g(Y_{ig}(0,0),X_g)(1-D_{ig}(1,0))\mid X_g]\} \\
&= E[g(Y_{ig}(0,0),X_g)(1-D_{ig}(1,0))] \\
&= E[g(Y_{ig}(0,0),X_g)\mid C^c_{ig}]\,P[C^c_{ig}]
\end{aligned}
$$

and the result follows from $E[g(Y_{ig}(0,0),X_g)]=E[g(Y_{ig}(0,0),X_g)\mid C_{ig}]P[C_{ig}]+E[g(Y_{ig}(0,0),X_g)\mid C^c_{ig}]P[C^c_{ig}]$. Similarly for the fifth term,

$$
\begin{aligned}
E\left[g(Y_{ig},X_g)(1-D_{jg})\frac{(1-Z_{ig})Z_{jg}}{p_{01}(X_g)}\right]
&= E_{X_g}\left\{E\left[g(Y_{ig},X_g)(1-D_{jg})\frac{(1-Z_{ig})Z_{jg}}{p_{01}(X_g)}\,\middle|\,X_g\right]\right\} \\
&= E_{X_g}\{E[g(Y_{ig},X_g)(1-D_{jg})\mid Z_{ig}=0,Z_{jg}=1,X_g]\} \\
&= E_{X_g}\{E[g(Y_{ig}(0,0),X_g)(1-D_{jg}(1,0))\mid Z_{ig}=0,Z_{jg}=1,X_g]\} \\
&= E_{X_g}\{E[g(Y_{ig}(0,0),X_g)(1-D_{jg}(1,0))\mid X_g]\} \\
&= E[g(Y_{ig}(0,0),X_g)(1-D_{jg}(1,0))] \\
&= E[g(Y_{ig}(0,0),X_g)\mid C^c_{jg}]\,P[C^c_{jg}]
\end{aligned}
$$

and it can be seen that all these equalities also hold conditional on $X_g$.
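The inverse-probability-weighting identities above can be verified by exact enumeration. The sketch below is illustrative only: the distribution of $X_g$, the type shares, the assignment probabilities $p_{zz'}(x)$, the outcome model `Y` and the function `g` are all hypothetical choices, and the two units' types are taken to be independent given $X_g$ for simplicity. It checks the first and second terms; the remaining terms can be checked in the same way.

```python
from itertools import product

TYPES = ("C", "GC", "NT")
PX = {0: 0.4, 1: 0.6}  # hypothetical P[X_g = x]
PT = {0: {"C": 0.5, "GC": 0.2, "NT": 0.3},  # hypothetical P[type | X_g = x]
      1: {"C": 0.3, "GC": 0.3, "NT": 0.4}}
PZ = {0: {(0, 0): 0.4, (1, 0): 0.2, (0, 1): 0.2, (1, 1): 0.2},  # p_{zz'}(x)
      1: {(0, 0): 0.1, (1, 0): 0.3, (0, 1): 0.3, (1, 1): 0.3}}

def D(t, z, zp):
    """Potential treatment D(z, z') under one-sided noncompliance."""
    if z == 0:
        return 0
    return int(t == "C" or (t == "GC" and zp == 1))

def Y(t, x, d, dp):
    """Hypothetical potential outcome Y(d, d') for type t when X_g = x."""
    base = {"C": 1.0, "GC": 2.0, "NT": 0.5}[t]
    return base + x + 2.0 * d + 0.7 * dp + 0.3 * d * dp

def g(y, x):
    """Arbitrary test function of the outcome and the covariate."""
    return y ** 2 + x

def E(f):
    """Exact expectation of f(x, ti, tj, z, zp) over the joint distribution."""
    total = 0.0
    for x, ti, tj, (z, zp) in product(PX, TYPES, TYPES, PZ[0]):
        weight = PX[x] * PT[x][ti] * PT[x][tj] * PZ[x][(z, zp)]
        total += weight * f(x, ti, tj, z, zp)
    return total

def observed(x, ti, tj, z, zp):
    """Observed own treatment and outcome of unit i under assignment (z, z')."""
    di, dj = D(ti, z, zp), D(tj, zp, z)
    return di, Y(ti, x, di, dj)

def term1(x, ti, tj, z, zp):
    _, y = observed(x, ti, tj, z, zp)
    return g(y, x) * (1 - z) * (1 - zp) / PZ[x][(0, 0)]

def term2(x, ti, tj, z, zp):
    di, y = observed(x, ti, tj, z, zp)
    return g(y, x) * di * z * (1 - zp) / PZ[x][(1, 0)]

lhs1 = E(term1)
rhs1 = E(lambda x, ti, tj, z, zp: g(Y(ti, x, 0, 0), x))  # E[g(Y(0,0), X)]
lhs2 = E(term2)
# E[g(Y(1,0), X) | C] P[C], written as E[g(Y(1,0), X) 1(C)]
rhs2 = E(lambda x, ti, tj, z, zp: g(Y(ti, x, 1, 0), x) * (ti == "C"))
```

The weights cancel the assignment probabilities cell by cell, so `lhs1` matches `rhs1` and `lhs2` matches `rhs2` up to floating-point error even though the assignment probabilities vary with $X_g$.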

E.10 Proof of Proposition 5

By independence, $E[D_{ig}\mid Z_{ig}=z,W_{ig}=w]=E[D_{ig}(z,w)]$. Then, under monotonicity, $E[D_{ig}\mid Z_{ig}=0,W_{ig}=0]=E[D_{ig}(0,0)]=P[D_{ig}(0,0)=1]=P[AT_{ig}]$. Next,

$$
\begin{aligned}
E[D_{ig}\mid Z_{ig}=0,W_{ig}=w^*]-E[D_{ig}\mid Z_{ig}=0,W_{ig}=w^*-1]
&= E[D_{ig}(0,w^*)]-E[D_{ig}(0,w^*-1)] \\
&= P[D_{ig}(0,w^*)>D_{ig}(0,w^*-1)]=P[SC_{ig}(w^*)].
\end{aligned}
$$

Similarly, $E[D_{ig}\mid Z_{ig}=1,W_{ig}=0]-E[D_{ig}\mid Z_{ig}=0,W_{ig}=n_g]=P[D_{ig}(1,0)>D_{ig}(0,n_g)]=P[C_{ig}]$ and $E[D_{ig}\mid Z_{ig}=1,W_{ig}=w^*]-E[D_{ig}\mid Z_{ig}=1,W_{ig}=w^*-1]=P[D_{ig}(1,w^*)>D_{ig}(1,w^*-1)]=P[GC_{ig}(w^*)]$. Finally, $E[1-D_{ig}\mid Z_{ig}=1,W_{ig}=n_g]=P[D_{ig}(1,n_g)=0]=P[NT_{ig}]$.
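The differencing argument can be illustrated with a short sketch. It assumes, as the proof suggests, that the assignment cells are ordered $(0,0),\dots,(0,n_g),(1,0),\dots,(1,n_g)$ and that each type starts taking the treatment at one cell of that ordering and keeps it afterwards. The value of `n_g` and the type shares are hypothetical.

```python
n_g = 2  # hypothetical number of peers

# Assignment cells (z, w) in the order implied by monotonicity.
cells = [(z, w) for z in (0, 1) for w in range(n_g + 1)]

# One type per cell at which take-up can start, plus never-takers.
labels = (["AT"] + [f"SC({w})" for w in range(1, n_g + 1)]
          + ["C"] + [f"GC({w})" for w in range(1, n_g + 1)] + ["NT"])
shares = dict(zip(labels, [0.10, 0.15, 0.05, 0.30, 0.10, 0.10, 0.20]))

# First stage E[D | Z = z, W = w]: cumulative share of types taking up by that cell.
first_stage = {}
running = 0.0
for cell, label in zip(cells, labels):  # zip stops before "NT"
    running += shares[label]
    first_stage[cell] = running

# Recover the shares from first-stage levels and successive differences.
recovered = {"AT": first_stage[(0, 0)]}
for prev, cell, label in zip(cells, cells[1:], labels[1:]):
    recovered[label] = first_stage[cell] - first_stage[prev]
recovered["NT"] = 1.0 - first_stage[(1, n_g)]
```

Here `recovered["C"]` is the difference between the first stage at $(1,0)$ and at $(0,n_g)$, matching the expression for $P[C_{ig}]$ in the proof.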

E.11 Proof of Proposition 6

Under the assumptions in the proposition,

$$
\begin{aligned}
E[Y_{ig}\mid D_{ig}=d,S_{ig}=s,Z_{ig}=z,W_{ig}=w]
&= \sum_{\mathbf{z}_g}E[Y_{ig}\mid D_{ig}=d,S_{ig}=s,Z_{ig}=z,W_{ig}=w,\mathbf{Z}_{(i)g}=\mathbf{z}_g] \\
&\qquad\times P[\mathbf{Z}_{(i)g}=\mathbf{z}_g\mid D_{ig}=d,S_{ig}=s,Z_{ig}=z,W_{ig}=w] \\
&= \sum_{\mathbf{z}_g}E[Y_{ig}(d,s)\mid D_{ig}(z,w)=d,S_{ig}(z,\mathbf{z}_g)=s,Z_{ig}=z,\mathbf{Z}_{(i)g}=\mathbf{z}_g] \\
&\qquad\times P[\mathbf{Z}_{(i)g}=\mathbf{z}_g\mid D_{ig}=d,S_{ig}=s,Z_{ig}=z,W_{ig}=w] \\
&= \sum_{\mathbf{z}_g}E[Y_{ig}(d,s)\mid D_{ig}(z,w)=d,S_{ig}(z,\mathbf{z}_g)=s] \\
&\qquad\times P[\mathbf{Z}_{(i)g}=\mathbf{z}_g\mid D_{ig}=d,S_{ig}=s,Z_{ig}=z,W_{ig}=w] \\
&= E[Y_{ig}(d,s)\mid D_{ig}(z,w)=d]
\end{aligned}
$$

where the second equality uses the fact that $S_{ig}$ depends on the whole vector of instruments, the third equality follows by independence and the fourth equality uses independence of peers' types.

Now, note that for any $s$, $E[Y_{ig}\mid D_{ig}=1,S_{ig}=s,Z_{ig}=0,W_{ig}=0]=E[Y_{ig}(1,s)\mid D_{ig}(0,0)=1]=E[Y_{ig}(1,s)\mid AT_{ig}]$, which shows that $E[Y_{ig}(1,s)\mid AT_{ig}]$ is identified. Then,

$$
\begin{aligned}
E[Y_{ig}\mid D_{ig}=1,S_{ig}=s,Z_{ig}=0,W_{ig}=1] &= E[Y_{ig}(1,s)\mid D_{ig}(0,1)=1] \\
&= E[Y_{ig}(1,s)\mid AT_{ig}]\frac{P[AT_{ig}]}{P[AT_{ig}]+P[SC_{ig}(1)]} \\
&\quad+E[Y_{ig}(1,s)\mid SC_{ig}(1)]\frac{P[SC_{ig}(1)]}{P[AT_{ig}]+P[SC_{ig}(1)]}
\end{aligned}
$$

and hence $E[Y_{ig}(1,s)\mid SC_{ig}(1)]$ is identified by the results above and Proposition 5. By the same logic,

$$
\begin{aligned}
E[Y_{ig}\mid D_{ig}=1,S_{ig}=s,Z_{ig}=0,W_{ig}=2] &= E[Y_{ig}(1,s)\mid D_{ig}(0,2)=1] \\
&= E[Y_{ig}(1,s)\mid AT_{ig}]\frac{P[AT_{ig}]}{P[AT_{ig}]+P[SC_{ig}(1)]+P[SC_{ig}(2)]} \\
&\quad+E[Y_{ig}(1,s)\mid SC_{ig}(1)]\frac{P[SC_{ig}(1)]}{P[AT_{ig}]+P[SC_{ig}(1)]+P[SC_{ig}(2)]} \\
&\quad+E[Y_{ig}(1,s)\mid SC_{ig}(2)]\frac{P[SC_{ig}(2)]}{P[AT_{ig}]+P[SC_{ig}(1)]+P[SC_{ig}(2)]}
\end{aligned}
$$

and thus $E[Y_{ig}(1,s)\mid SC_{ig}(2)]$ is identified. The same reasoning shows that as long as all required probabilities are non-zero, $E[Y_{ig}(1,s)\mid\xi_{ig}]$ is identified for all values of $\xi_{ig}$ except for $\xi_{ig}=NT$, since the assignment $(1,s)$ is never observed for never-takers. Identification of $E[Y_{ig}(0,s)\mid\xi_{ig}]$ for all values of $\xi_{ig}\neq AT$ follows similarly.
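To make the first recursive step explicit, the $W_{ig}=1$ equation can be solved for the social-complier mean. The rearrangement below is a worked step using only quantities that appear in the proof, and it assumes $P[SC_{ig}(1)]>0$:

```latex
E[Y_{ig}(1,s)\mid SC_{ig}(1)]
  = \frac{E[Y_{ig}\mid D_{ig}=1,S_{ig}=s,Z_{ig}=0,W_{ig}=1]\,
          \big(P[AT_{ig}]+P[SC_{ig}(1)]\big)
          - E[Y_{ig}(1,s)\mid AT_{ig}]\,P[AT_{ig}]}
         {P[SC_{ig}(1)]}
```

Every term on the right-hand side is either an observed conditional mean, a type share identified in Proposition 5, or the always-taker mean identified from the $W_{ig}=0$ cell. The $W_{ig}=2$ equation is solved in the same way once this mean is known.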

F Proofs of Additional Results

F.1 Proof of Lemma 3

Using that:

$$
\begin{aligned}
E[Y_{ig}\mid Z_{ig}=z,Z_{jg}=z'] &= E[Y_{ig}(0,0)] \\
&\quad+E[(Y_{ig}(1,0)-Y_{ig}(0,0))D_{ig}(z,z')(1-D_{jg}(z',z))] \\
&\quad+E[(Y_{ig}(0,1)-Y_{ig}(0,0))(1-D_{ig}(z,z'))D_{jg}(z',z)] \\
&\quad+E[(Y_{ig}(1,1)-Y_{ig}(0,0))D_{ig}(z,z')D_{jg}(z',z)],
\end{aligned}
$$

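Table 12: $D_{ig}(0,1)(1-D_{jg}(1,0))-D_{ig}(0,0)(1-D_{jg}(0,0))$

| $D_{ig}(0,1)$ | $D_{jg}(1,0)$ | $D_{ig}(0,0)$ | $D_{jg}(0,0)$ | Difference | $\xi_{ig}$ | $\xi_{jg}$ |
|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 0 | | |
| 1 | 1 | 1 | 0 | -1 | AT | SC,C |
| 1 | 0 | 1 | 0 | 0 | | |
| 1 | 1 | 0 | 1 | 0 | | |
| 1 | 1 | 0 | 0 | 0 | | |
| 1 | 0 | 0 | 0 | 1 | SC | GC,NT |
| 0 | 1 | 0 | 1 | 0 | | |
| 0 | 1 | 0 | 0 | 0 | | |
| 0 | 0 | 0 | 0 | 0 | | |

Table 13: $(1-D_{ig}(0,1))D_{jg}(1,0)-(1-D_{ig}(0,0))D_{jg}(0,0)$

| $D_{ig}(0,1)$ | $D_{jg}(1,0)$ | $D_{ig}(0,0)$ | $D_{jg}(0,0)$ | Difference | $\xi_{ig}$ | $\xi_{jg}$ |
|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 0 | | |
| 1 | 1 | 1 | 0 | 0 | | |
| 1 | 0 | 1 | 0 | 0 | | |
| 1 | 1 | 0 | 1 | -1 | SC | AT |
| 1 | 1 | 0 | 0 | 0 | | |
| 1 | 0 | 0 | 0 | 0 | | |
| 0 | 1 | 0 | 1 | 0 | | |
| 0 | 1 | 0 | 0 | 1 | C,GC,NT | SC,C |
| 0 | 0 | 0 | 0 | 0 | | |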

we have:

$$
\begin{aligned}
&E[Y_{ig}\mid Z_{ig}=0,Z_{jg}=1]-E[Y_{ig}\mid Z_{ig}=0,Z_{jg}=0] \\
&\quad= E[(Y_{ig}(1,0)-Y_{ig}(0,0))(D_{ig}(0,1)(1-D_{jg}(1,0))-D_{ig}(0,0)(1-D_{jg}(0,0)))] \\
&\qquad+E[(Y_{ig}(0,1)-Y_{ig}(0,0))((1-D_{ig}(0,1))D_{jg}(1,0)-(1-D_{ig}(0,0))D_{jg}(0,0))] \\
&\qquad+E[(Y_{ig}(1,1)-Y_{ig}(0,0))(D_{ig}(0,1)D_{jg}(1,0)-D_{ig}(0,0)D_{jg}(0,0))].
\end{aligned}
$$

Tables 12, 13 and 14 list the possible values that the terms that depend on the potential treatment statuses can take, which gives the desired result after some algebra. 

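Table 14: $D_{ig}(0,1)D_{jg}(1,0)-D_{ig}(0,0)D_{jg}(0,0)$

| $D_{ig}(0,1)$ | $D_{jg}(1,0)$ | $D_{ig}(0,0)$ | $D_{jg}(0,0)$ | Difference | $\xi_{ig}$ | $\xi_{jg}$ |
|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 0 | | |
| 1 | 1 | 1 | 0 | 1 | AT | SC,C |
| 1 | 0 | 1 | 0 | 0 | | |
| 1 | 1 | 0 | 1 | 1 | SC | AT |
| 1 | 1 | 0 | 0 | 1 | SC | SC,C |
| 1 | 0 | 0 | 0 | 0 | | |
| 0 | 1 | 0 | 1 | 0 | | |
| 0 | 1 | 0 | 0 | 0 | | |
| 0 | 0 | 0 | 0 | 0 | | |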
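The nonzero rows of Tables 12, 13 and 14 can be reproduced by enumerating pairs of compliance types. The sketch below is not part of the proof. It assumes the five types are defined by the potential treatments $D(z,z')$ listed in `TYPES`, a definition inferred here from the type labels in the tables, and the function name `contrasts` is hypothetical.

```python
from itertools import product

# Assumed type definitions: potential treatment D(z, z') for each compliance type.
TYPES = {
    "AT": {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 1},
    "SC": {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1},
    "C":  {(0, 0): 0, (0, 1): 0, (1, 0): 1, (1, 1): 1},
    "GC": {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1},
    "NT": {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0},
}

def contrasts(zi, zj):
    """For each (type_i, type_j), the three treatment-status differences
    between assignment (zi, zj) and assignment (0, 0)."""
    out = {}
    for ti, tj in product(TYPES, TYPES):
        di, dj = TYPES[ti][(zi, zj)], TYPES[tj][(zj, zi)]
        di0, dj0 = TYPES[ti][(0, 0)], TYPES[tj][(0, 0)]
        out[(ti, tj)] = (
            di * (1 - dj) - di0 * (1 - dj0),   # only i treated
            (1 - di) * dj - (1 - di0) * dj0,   # only j treated
            di * dj - di0 * dj0,               # both treated
        )
    return out

spill = contrasts(0, 1)  # unit i unassigned, peer j assigned
only_i = {pair: c[0] for pair, c in spill.items() if c[0] != 0}
only_j = {pair: c[1] for pair, c in spill.items() if c[1] != 0}
both = {pair: c[2] for pair, c in spill.items() if c[2] != 0}
```

Under these assumed definitions, `only_i`, `only_j` and `both` contain exactly the type pairs with a nonzero difference in the three tables. Calling `contrasts(1, 1)` gives the analogous quantities for the contrast between $(Z_{ig},Z_{jg})=(1,1)$ and $(0,0)$ used in Lemma 4.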

F.2 Proof of Lemma 4

Using that:

$$
\begin{aligned}
E[Y_{ig}\mid Z_{ig}=z,Z_{jg}=z'] &= E[Y_{ig}(0,0)] \\
&\quad+E[(Y_{ig}(1,0)-Y_{ig}(0,0))D_{ig}(z,z')(1-D_{jg}(z',z))] \\
&\quad+E[(Y_{ig}(0,1)-Y_{ig}(0,0))(1-D_{ig}(z,z'))D_{jg}(z',z)] \\
&\quad+E[(Y_{ig}(1,1)-Y_{ig}(0,0))D_{ig}(z,z')D_{jg}(z',z)],
\end{aligned}
$$

we have:

$$
\begin{aligned}
&E[Y_{ig}\mid Z_{ig}=1,Z_{jg}=1]-E[Y_{ig}\mid Z_{ig}=0,Z_{jg}=0] \\
&\quad= E[(Y_{ig}(1,0)-Y_{ig}(0,0))(D_{ig}(1,1)(1-D_{jg}(1,1))-D_{ig}(0,0)(1-D_{jg}(0,0)))] \\
&\qquad+E[(Y_{ig}(0,1)-Y_{ig}(0,0))((1-D_{ig}(1,1))D_{jg}(1,1)-(1-D_{ig}(0,0))D_{jg}(0,0))] \\
&\qquad+E[(Y_{ig}(1,1)-Y_{ig}(0,0))(D_{ig}(1,1)D_{jg}(1,1)-D_{ig}(0,0)D_{jg}(0,0))].
\end{aligned}
$$

Tables 15, 16 and 17 list the possible values that the terms that depend on the potential treatment statuses can take, which gives the desired result after some algebra. 

F.3 Proof of Proposition 7

Under the conditions of the proposition,

$$
E[\mathbb{1}(D_{ig}=k)\mid Z_{ig}=k,Z_{jg}=0]=E[\mathbb{1}(D_{ig}(k,0)=k)\mid Z_{ig}=k,Z_{jg}=0]=E[\mathbb{1}(D_{ig}(k,0)=k)].
$$

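Table 15: $D_{ig}(1,1)(1-D_{jg}(1,1))-D_{ig}(0,0)(1-D_{jg}(0,0))$

| $D_{ig}(1,1)$ | $D_{jg}(1,1)$ | $D_{ig}(0,0)$ | $D_{jg}(0,0)$ | Difference | $\xi_{ig}$ | $\xi_{jg}$ |
|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 0 | | |
| 1 | 1 | 0 | 1 | 0 | | |
| 0 | 1 | 0 | 1 | 0 | | |
| 1 | 1 | 1 | 0 | -1 | AT | SC,C,GC |
| 1 | 1 | 0 | 0 | 0 | | |
| 0 | 1 | 0 | 0 | 0 | | |
| 1 | 0 | 1 | 0 | 0 | | |
| 1 | 0 | 0 | 0 | 1 | SC,C,GC | NT |
| 0 | 0 | 0 | 0 | 0 | | |

Table 16: $(1-D_{ig}(1,1))D_{jg}(1,1)-(1-D_{ig}(0,0))D_{jg}(0,0)$

| $D_{ig}(1,1)$ | $D_{jg}(1,1)$ | $D_{ig}(0,0)$ | $D_{jg}(0,0)$ | Difference | $\xi_{ig}$ | $\xi_{jg}$ |
|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 0 | | |
| 1 | 1 | 0 | 1 | -1 | SC,C,GC | AT |
| 0 | 1 | 0 | 1 | 0 | | |
| 1 | 1 | 1 | 0 | 0 | | |
| 1 | 1 | 0 | 0 | 0 | | |
| 0 | 1 | 0 | 0 | 1 | NT | SC,C,GC |
| 1 | 0 | 1 | 0 | 0 | | |
| 1 | 0 | 0 | 0 | 0 | | |
| 0 | 0 | 0 | 0 | 0 | | |

Table 17: $D_{ig}(1,1)D_{jg}(1,1)-D_{ig}(0,0)D_{jg}(0,0)$

| $D_{ig}(1,1)$ | $D_{jg}(1,1)$ | $D_{ig}(0,0)$ | $D_{jg}(0,0)$ | Difference | $\xi_{ig}$ | $\xi_{jg}$ |
|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 0 | | |
| 1 | 1 | 1 | 0 | 1 | AT | SC,C,GC |
| 1 | 0 | 1 | 0 | 0 | | |
| 1 | 1 | 0 | 1 | 1 | SC,C,GC | AT |
| 1 | 1 | 0 | 0 | 1 | SC,C,GC | SC,C,GC |
| 1 | 0 | 0 | 0 | 0 | | |
| 0 | 1 | 0 | 1 | 0 | | |
| 0 | 1 | 0 | 0 | 0 | | |
| 0 | 0 | 0 | 0 | 0 | | |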

On the other hand, for any $k\in\{0,1,\dots,K\}$,

$$
\begin{aligned}
E[Y_{ig}\mathbb{1}(D_{ig}=k)\mid Z_{ig}=k,Z_{jg}=0] &= E[Y_{ig}(k,0)\mathbb{1}(D_{ig}(k,0)=k)], \\
E[Y_{ig}\mathbb{1}(D_{ig}=0)\mid Z_{ig}=k,Z_{jg}=0] &= E[Y_{ig}(0,0)\mathbb{1}(D_{ig}(k,0)=0)] \\
&= E[Y_{ig}(0,0)\mid D_{ig}(k,0)=0]\,E[\mathbb{1}(D_{ig}(k,0)=0)], \\
E[Y_{ig}\mid Z_{ig}=0,Z_{jg}=0] &= E[Y_{ig}(0,0)\mathbb{1}(D_{ig}(k,0)=k)]+E[Y_{ig}(0,0)\mathbb{1}(D_{ig}(k,0)=0)],
\end{aligned}
$$

and $E[Y_{ig}\mid Z_{ig}=k,Z_{jg}=0]=E[Y_{ig}\mathbb{1}(D_{ig}=k)\mid Z_{ig}=k,Z_{jg}=0]+E[Y_{ig}\mathbb{1}(D_{ig}=0)\mid Z_{ig}=k,Z_{jg}=0]$, from which:

$$
\frac{E[Y_{ig}\mid Z_{ig}=k,Z_{jg}=0]-E[Y_{ig}\mid Z_{ig}=0,Z_{jg}=0]}{E[\mathbb{1}(D_{ig}=k)\mid Z_{ig}=k,Z_{jg}=0]}=E[Y_{ig}(k,0)-Y_{ig}(0,0)\mid D_{ig}(k,0)=k].
$$

Similarly,

$$
\frac{E[Y_{ig}\mid Z_{ig}=0,Z_{jg}=k]-E[Y_{ig}\mid Z_{ig}=0,Z_{jg}=0]}{E[\mathbb{1}(D_{jg}=k)\mid Z_{ig}=0,Z_{jg}=k]}=E[Y_{ig}(0,k)-Y_{ig}(0,0)\mid D_{jg}(k,0)=k]
$$

as required.
