
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION 2019, VOL. 114, NO. 526, 529–540: Applications and Case Studies https://doi.org/./..

Causal Interaction in Factorial Experiments: Application to Conjoint Analysis

Naoki Egami (a) and Kosuke Imai (b, c)
(a) Department of Politics, Princeton University, Princeton, NJ; (b) Department of Government and Department of Statistics, Harvard University, Cambridge, MA; (c) Department of Politics and Center for Statistics and Machine Learning, Princeton University, Princeton, NJ

ABSTRACT
We study causal interaction in factorial experiments, in which several factors, each with multiple levels, are randomized to form a large number of possible treatment combinations. Examples of such experiments include conjoint analysis, which is often used by social scientists to analyze multidimensional preferences in a population. To characterize the structure of causal interaction in factorial experiments, we propose a new causal interaction effect, called the average marginal interaction effect (AMIE). Unlike the conventional interaction effect, the relative magnitude of the AMIE does not depend on the choice of baseline conditions, making its interpretation intuitive even for higher-order interactions. We show that the AMIE can be nonparametrically estimated using ANOVA regression with weighted zero-sum constraints. Because the AMIEs are invariant to the choice of baseline conditions, we directly regularize them by collapsing levels and selecting factors within a penalized ANOVA framework. This regularized estimation procedure reduces false discovery and further facilitates interpretation. Finally, we apply the proposed methodology to the conjoint analysis of ethnic voting behavior in Africa and find clear patterns of causal interaction between politicians' ethnicity and their prior records. The proposed methodology is implemented in an open source software package. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.

ARTICLE HISTORY
Received January; Revised January

KEYWORDS
ANOVA; Causal inference; Heterogenous treatment effects; Interaction effects; Randomized experiments; Regularization

1. Introduction

Statistical interaction among treatment variables can be interpreted as causal relationships when the treatments are randomized in an experiment. Causal interaction plays an essential role in the exploration of heterogenous treatment effects. This article develops a framework for studying causal interaction in randomized experiments with a factorial design, in which there are multiple factorial treatments, each having several levels. A primary goal of causal interaction analysis is to identify the combinations of treatments that induce large additional effects beyond the sum of effects separately attributable to each treatment.

Our motivating application is conjoint analysis, which is a type of randomized survey experiment with a factorial design (Luce and Tukey 1964). Conjoint analysis has been extensively used in marketing research to investigate consumer preferences and predict product sales (e.g., Green, Krieger, and Wind 2001; Marshall and Bradlow 2002). In a typical conjoint analysis, respondents are asked to evaluate pairs of product profiles where several characteristics of a commercial product such as price and color are randomly chosen. Because these product characteristics are represented by factorial variables, conjoint analysis can be seen as an application of randomized factorial design. Thus, the causal estimands and estimation methods proposed in this article are widely applicable to any factorial experiments with many factors.

Recently, conjoint analysis has also gained popularity among medical and social scientists who study multidimensional preferences among a population of individuals (e.g., Marshall et al. 2010; Hainmueller and Hopkins 2015). In this article, we focus on the latter use of conjoint analysis by estimating population average causal effects. Specifically, we analyze a conjoint analysis about coethnic voting in Africa to examine the conditions under which voters prefer political candidates of the same ethnicity (see Section 2 for the details of the experiment and Section 6 for our empirical analysis).

One important limitation of conjoint analysis, as currently conducted in applied research, is that causal interactions are largely ignored. This is unfortunate because studies of multidimensional choice necessarily involve the consideration of interaction effects. However, the exploration of causal interactions in conjoint analysis is often difficult for two reasons. First, the relative magnitude of the conventional causal interaction effect depends on the choice of baseline condition. This is problematic because many factors used in conjoint analysis do not have natural baseline conditions (e.g., gender, racial group, religion, occupation). Second, a typical conjoint analysis has several factors, each having multiple levels. This means that we must apply a regularization method to reduce false discovery and facilitate interpretation. Yet, the lack of the invariance property means that the results of standard regularized estimation will depend on the choice of baseline conditions.

CONTACT Kosuke Imai [email protected] Department of Government and Department of Statistics, Harvard University, Cambridge, MA. Color versions of one or more of the figures in the article can be found online at www.tandfonline.com/r/JASA. Supplementary materials for this article are available online. Please go to www.tandfonline.com/r/JASA. These materials were reviewed for reproducibility. © American Statistical Association

To overcome these problems, we propose an alternative definition of causal interaction effect that is invariant to the choice of baseline condition, making its interpretation intuitive even for higher-order interactions (Sections 3 and 4). We call this new causal quantity of interest the average marginal interaction effect (AMIE), because it marginalizes the other treatments rather than conditioning on their baseline values as done in the conventional causal interaction effect. The proposed approach enables researchers to effectively summarize the structure of causal interaction in high dimensions by decomposing the total effect of any treatment combination into the separate effect of each treatment and their interaction effects.

Finally, we also establish the identification condition and develop estimation strategies for the AMIE (Section 5). We propose a nonparametric estimator of the AMIE and show that this estimator can be recast as an ANOVA with weighted zero-sum constraints (Scheffe 1959). Exploiting this equivalence relationship, we apply the method proposed by Post and Bondell (2013) and directly regularize the AMIEs within the ANOVA framework by collapsing levels and selecting factors. Because the AMIE is invariant to the choice of baseline condition, our regularization also has the same invariance property. This also enables a proper regularization of the conditional average effects, which can be computed using the AMIEs. Without the invariance property, the results of regularized estimation will depend on the choice of baseline conditions. All of our theoretical results and estimation strategies are shown to hold for causal interaction of any order. The proposed methodology is implemented via an open-source software package, FindIt: Finding Heterogeneous Treatment Effects (Egami, Ratkovic, and Imai 2017), which is available for download at the Comprehensive R Archive Network (CRAN; https://cran.r-project.org/package=FindIt).

Our article builds on the causal inference and experimental design literatures that are concerned with interaction effects (see, e.g., Cox 1984; Jaccard and Turrisi 2003; de González and Cox 2007; VanderWeele and Knol 2014). In addition, we draw upon the recent articles that provide the potential outcomes framework for causal inference with factorial experiments and conjoint analysis (Hainmueller, Hopkins, and Yamamoto 2014; Dasgupta, Pillai, and Rubin 2015; Lu 2016a, 2016b). Indeed, the AMIE is a direct generalization of the average marginal effect studied in this literature that can be used to characterize the causal heterogeneity of a high-dimensional treatment.

Finally, this article is also related to the literature on heterogenous treatment effects, in which the goal of analysis is to find an optimal treatment regime. Much of this literature, however, focuses on the interaction between a single treatment and pretreatment covariates (e.g., Hill 2012; Green and Kern 2012; Wager and Athey 2017; Grimmer, Messing, and Westwood 2017) or a dynamic setting where a sequence of treatment decisions is optimized (e.g., Murphy 2003; Robins 2004). We emphasize that if the goal of analysis is to find an optimal treatment regime, rather than to understand the structure of causal heterogeneity, the marginalized causal quantities such as the one proposed in this article may be of little use. In such settings, researchers typically estimate the causal effects of specific treatment combinations (e.g., Imai and Ratkovic 2013).

2. Conjoint Analysis of Ethnic Voting

Conjoint analysis has a long history dating back to the theoretical article by Luce and Tukey (1964). In terms of its application, it has been widely used by marketing researchers over the last 40 years to measure consumer preferences and predict product sales (Green and Rao 1971; Green, Krieger, and Wind 2001; Marshall and Bradlow 2002). It has also become a popular statistical tool in the medical and social sciences (e.g., Marshall et al. 2010; Hainmueller and Hopkins 2015) to study multidimensional preferences of a variety of populations such as patients and voters.

Conjoint analysis can be considered as an application of factorial randomized experiments. For example, in a typical conjoint analysis used for marketing research, respondents evaluate a commercial product whose characteristics, such as price and color, are randomly selected. Factorial variables represent these characteristics with several levels (e.g., $1, $5, $10 for price, and red, green, and blue for color). Similarly, in political science research, conjoint analysis may be used to evaluate candidates where factors may represent their party identification, race, gender, and other attributes.

In this article, we examine a recent conjoint analysis conducted to study coethnic voting in Uganda (Carlson 2015). Coethnic voting refers to the tendency of some voters to prefer political candidates whose ethnicity is the same as their own. Researchers have observed that coethnic voting occurs frequently among African voters, but the identification of causal effects is often difficult because the ethnicity of candidates is often correlated with other characteristics that may influence voting behavior. To address this problem, the original author conducted a conjoint analysis, in which respondents were asked to choose one of the two hypothetical candidates whose attributes were randomly assigned.

For the experiment, a total of 547 respondents were sampled from villages in Uganda. We analyze a subset of 544 observations after removing three observations with missing data. Each respondent was given the description of three pairs of hypothetical presidential candidates. They were then asked to cast a vote for one of the candidates within each pair. These hypothetical candidates are characterized by a total of four factors shown in Table 1: Coethnicity (2 levels), Record (7 levels), Platform (3 levels), and Degree (2 levels).

While the levels of all factors are randomly and independently selected for each hypothetical candidate, the distribution of candidate ethnicity depends on the local ethnic diversity so that enough respondents share the same ethnicity as their assigned hypothetical candidates. The original analysis was based on a mixed effects model with a respondent random effect. While previous studies showed that many voters unconditionally favor coethnic candidates, Carlson (2015) found that voters tend to favor only coethnic candidates with good prior record.

We focus on two methodological challenges of the original analysis. First, the author tests the existence of causal interaction between Coethnicity and Record, but does not explicitly estimate causal interaction effects. We propose a definition of causal interaction effects in randomized experiments with a factorial design and show how to estimate them. Second, the author dichotomized two factors, Record and Platform, which have more than two levels and do not have a natural baseline condition. We show how to use a data-driven regularization method when estimating causal interaction effects in a high-dimensional setting. Our reanalysis of this experiment appears in Section 6.

Table 1. Levels of four factors from the conjoint analysis in Carlson (2015).

Factors       Levels         Description
Coethnicity   Yes            a coethnic of a respondent
              No             not a coethnic of a respondent
Record        Yes/Village    politician for a village with good prior record
              Yes/District   politician for a district with good prior record
              Yes/MP         member of parliament with good prior record
              No/Village     politician for a village without good prior record
              No/District    politician for a district without good prior record
              No/MP          member of parliament without good prior record
              No/Business    businessman without good prior record
Platform      Job            promise to create new jobs
              Clinic         promise to create clinics
              Education      promise to improve education
Degree        Yes            masters degree in business, law, economics, or development
              No             bachelors degree in tourism, horticulture, forestry or theater

3. Two-Way Causal Interaction

In this section, we introduce a new causal quantity, the average marginal interaction effect (AMIE), and show that, unlike the conventional causal interaction effect, it is invariant to the choice of baseline condition. The invariance property enables simple interpretation and effective regularization even when there are many factors. While this section focuses on two-way causal interaction for the sake of simplicity, all definitions and results will be generalized beyond two-way interaction in Section 4.

3.1. The Setup

Consider a simple random sample of \(n\) units from the target population \(\mathcal{P}\). Let \(A_i\) and \(B_i\) be two factorial treatment variables of interest for unit \(i\), where \(L_A\) and \(L_B\) are the numbers of ordered or unordered levels for factors \(A\) and \(B\), respectively. We use \(a_\ell\) and \(b_m\) to represent levels of the two factors where \(\ell \in \{0, 1, \ldots, L_A - 1\}\) and \(m \in \{0, 1, \ldots, L_B - 1\}\). The support of treatment variables \(A\) and \(B\), therefore, is given by \(\mathcal{A} = \{a_0, a_1, \ldots, a_{L_A - 1}\}\) and \(\mathcal{B} = \{b_0, b_1, \ldots, b_{L_B - 1}\}\), respectively.

We call a combination of factor levels \((a_\ell, b_m)\) a treatment combination. Thus, in the current set-up, the total number of unique treatment combinations is \(L_A \times L_B\). Let \(Y_i(a_\ell, b_m)\) denote the potential outcome variable of unit \(i\) if the unit receives the treatment combination \((a_\ell, b_m)\). For each unit, only one of the potential outcome variables can be observed, and the realized outcome variable is denoted by \(Y_i = \sum_{a_\ell \in \mathcal{A},\, b_m \in \mathcal{B}} \mathbf{1}\{A_i = a_\ell, B_i = b_m\}\, Y_i(a_\ell, b_m)\), where \(\mathbf{1}\{A_i = a_\ell, B_i = b_m\}\) is an indicator variable taking the value 1 when \(A_i = a_\ell\) and \(B_i = b_m\), and taking the value 0 otherwise. In this article, we make the stability assumption, which states that there is neither interference between units nor different versions of the treatment (Cox 1958; Rubin 1990).

In addition, we assume that the treatment assignment is randomized.

\[ \{Y_i(a_\ell, b_m)\}_{a_\ell \in \mathcal{A},\, b_m \in \mathcal{B}} \;\perp\!\!\!\perp\; \{A_i, B_i\} \quad \text{for all } i = 1, \ldots, n. \tag{1} \]

\[ \Pr(A_i = a_\ell, B_i = b_m) > 0 \quad \text{for all } a_\ell \in \mathcal{A} \text{ and } b_m \in \mathcal{B}. \tag{2} \]

This assumption rules out the use of fractional factorial designs where certain combinations of treatments have zero probability of occurrence. In some cases, however, researchers may wish to eliminate certain treatment combinations for substantive reasons. The standard recommendation is to set the probability for those treatment combinations to small nonzero values under a full factorial design so that the assumption continues to hold (see Hainmueller, Hopkins, and Yamamoto 2014, footnote 18). Another possibility is to restrict one's analysis to a subset of data and hence the corresponding subset of estimands so that the assumption is satisfied.

Under this set-up, we review two noninteractive causal effects of interest. First, we define the average combination effect (ACE), which represents the average causal effect of a treatment combination \((A_i, B_i) = (a_\ell, b_m)\) relative to a prespecified baseline condition \((A_i, B_i) = (a_0, b_0)\) (e.g., Dasgupta, Pillai, and Rubin 2015):

\[ \tau_{AB}(a_\ell, b_m; a_0, b_0) \equiv E\{Y_i(a_\ell, b_m) - Y_i(a_0, b_0)\}, \tag{3} \]

where \(a_\ell, a_0 \in \mathcal{A}\) and \(b_m, b_0 \in \mathcal{B}\).

Another causal quantity of interest is the average marginal effect (AME). For each unit, we define the marginal effect of treatment condition \(A_i = a_\ell\) relative to a baseline condition \(a_0\) by averaging over the distribution of the other treatment \(B_i\). Then, the AME is the population average of this unit-level marginal effect (e.g., Hainmueller, Hopkins, and Yamamoto 2014; Dasgupta, Pillai, and Rubin 2015):

\[ \psi_A(a_\ell, a_0) \equiv E\left[ \int \{Y_i(a_\ell, B_i) - Y_i(a_0, B_i)\}\, dF(B_i) \right], \tag{4} \]

where \(a_\ell, a_0 \in \mathcal{A}\) and \(B_i\) is another factor whose distribution function is \(F(B_i)\). The AME of \(b_m\) relative to \(b_0\), that is, \(\psi_B(b_m, b_0)\), can be defined similarly.

We emphasize that while these two causal quantities require the specification of baseline conditions, the relative magnitude is not sensitive to this choice. For example, if we sort the ACEs by their relative magnitude, the resulting order does not depend on the values of the treatment variables selected for the baseline conditions \((a_0, b_0)\). The same property is applicable to the AMEs where the choice of baseline condition \(a_0\) does not alter their relative magnitude.

3.2. The Average Marginal Interaction Effect

We propose a new two-way causal interaction effect, called the average marginal interaction effect (AMIE), which is useful for randomized experiments with a factorial design. For each unit, the marginal interaction effect represents the causal effect induced by the treatment combination beyond the sum of the marginal effects separately attributable to each treatment. The AMIE is the population average of this unit-level marginal interaction effect. Specifically, the two-way AMIE of treatment combination \((a_\ell, b_m)\), with baseline condition \((a_0, b_0)\), is defined as

\[
\begin{aligned}
\pi_{AB}(a_\ell, b_m; a_0, b_0)
&\equiv E\Big[ Y_i(a_\ell, b_m) - Y_i(a_0, b_0) - \int \{Y_i(a_\ell, B_i) - Y_i(a_0, B_i)\}\, dF(B_i) \\
&\qquad\quad - \int \{Y_i(A_i, b_m) - Y_i(A_i, b_0)\}\, dF(A_i) \Big] \\
&= \tau_{AB}(a_\ell, b_m; a_0, b_0) - \psi_A(a_\ell, a_0) - \psi_B(b_m, b_0),
\end{aligned}
\tag{5}
\]

where \(a_\ell, a_0 \in \mathcal{A}\) and \(b_m, b_0 \in \mathcal{B}\), \(\pi_{AB}(a_\ell, b_m; a_0, b_0)\) is the AMIE, and \(\psi(\cdot, \cdot)\) is the AME defined in Equation (4).

The AMIE is closely connected to the conventional definition of the average interaction effect (AIE). In the causal inference literature (e.g., Cox 1984; VanderWeele 2015; Dasgupta, Pillai, and Rubin 2015), researchers define the AIE of treatment combination \((a_\ell, b_m)\) relative to baseline condition \((a_0, b_0)\) as

\[ \xi_{AB}(a_\ell, b_m; a_0, b_0) \equiv E\{Y_i(a_\ell, b_m) - Y_i(a_0, b_m) - Y_i(a_\ell, b_0) + Y_i(a_0, b_0)\}, \tag{6} \]

where \(a_\ell, a_0 \in \mathcal{A}\) and \(b_m, b_0 \in \mathcal{B}\).

Similar to the AMIE, the AIE has an interactive effect interpretation, representing the additional average causal effect induced by the treatment combination beyond the sum of the average causal effects separately attributable to each treatment. This interpretation is based on the following algebraic equality:

\[ \xi_{AB}(a_\ell, b_m; a_0, b_0) = \tau_{AB}(a_\ell, b_m; a_0, b_0) - E\{Y_i(a_\ell, b_0) - Y_i(a_0, b_0)\} - E\{Y_i(a_0, b_m) - Y_i(a_0, b_0)\}. \tag{7} \]

The difference between the AMIE and the AIE is that the former subtracts the AMEs from the ACE while the latter subtracts the sum of two separate effects due to \(A_i = a_\ell\) and \(B_i = b_m\) while holding the other treatment variable at its baseline value, that is, \(A_i = a_0\) or \(B_i = b_0\).

In addition, the AIE has a conditional effect interpretation,

\[ \xi_{AB}(a_\ell, b_m; a_0, b_0) = E\{Y_i(a_\ell, b_m) - Y_i(a_0, b_m)\} - E\{Y_i(a_\ell, b_0) - Y_i(a_0, b_0)\}, \]

which denotes the difference in the average causal effect of \(A_i = a_\ell\) relative to \(A_i = a_0\) between the two scenarios, one when \(B_i = b_m\) and the other when \(B_i = b_0\). When such conditional effects are of interest, the AMIE can be used to obtain them. For example, we have

\[ E\{Y_i(a_\ell, b_0) - Y_i(a_0, b_0)\} = \psi_A(a_\ell, a_0) + \pi_{AB}(a_\ell, b_0; a_0, b_0). \tag{8} \]

Clearly, the scientific question of interest should determine the choice between the AMIE and AIE. In Section 6, we illustrate how to use the AMIEs for estimating the average conditional effects when necessary.

Finally, the AMIE and the AIE are linear functions of one another. This result is presented below as a special case of Theorem 1 presented in Section 4.

Result 1 (Relationships Between the Two-Way AMIE and the Two-Way AIE). The two-way average marginal interaction effect (AMIE), defined in Equation (5), equals the following linear function of the two-way average interaction effects (AIEs), defined in Equation (6):

\[ \pi_{AB}(a_\ell, b_m; a_0, b_0) = \xi_{AB}(a_\ell, b_m; a_0, b_0) - \sum_{a \in \mathcal{A}} \Pr(A_i = a)\, \xi_{AB}(a, b_m; a_0, b_0) - \sum_{b \in \mathcal{B}} \Pr(B_i = b)\, \xi_{AB}(a_\ell, b; a_0, b_0). \]

Likewise, the AIE can be expressed as the following linear function of the AMIEs:

\[ \xi_{AB}(a_\ell, b_m; a_0, b_0) = \pi_{AB}(a_\ell, b_m; a_0, b_0) - \pi_{AB}(a_\ell, b_0; a_0, b_0) - \pi_{AB}(a_0, b_m; a_0, b_0). \]

Result 1 implies that all the AMIEs are zero if and only if all the AIEs are zero. Thus, testing the absence of causal interaction can be done by an F-test, investigating whether either all the AIEs or all the AMIEs are zero. All causal estimands introduced in this section are identifiable under the assumption of randomized treatment assignment (i.e., Equations (1) and (2)).

3.3. Invariance to the Choice of Baseline Condition

One advantage of the AMIE over the AIE is its invariance to the choice of baseline condition. That is, the relative difference of any pair of AMIEs remains unchanged even if one chooses a different baseline condition. Most causal effects, including the ACE and the AME, have this invariance property. In contrast, the relative magnitude of any two AIEs depends on the choice of baseline condition unless all AIEs are zero. The invariance property is important because without it researchers cannot systematically compare interaction effects of different treatment combinations. We state this as Result 2, which is a special case of Theorem 2 presented in Section 4.

Result 2 (Invariance and Lack Thereof to the Choice of Baseline Condition). The average marginal interaction effect (AMIE), defined in Equation (5), is interval invariant. That is, for any \((a_\ell, b_m) \neq (a_{\ell'}, b_{m'})\) and \((a_0, b_0) \neq (a_{\tilde\ell}, b_{\tilde m})\), the following equality holds,

\[ \pi_{AB}(a_\ell, b_m; a_0, b_0) - \pi_{AB}(a_{\ell'}, b_{m'}; a_0, b_0) = \pi_{AB}(a_\ell, b_m; a_{\tilde\ell}, b_{\tilde m}) - \pi_{AB}(a_{\ell'}, b_{m'}; a_{\tilde\ell}, b_{\tilde m}). \]

Note that the above difference of the AMIEs is also equal to another AMIE, \(\pi_{AB}(a_\ell, b_m; a_{\ell'}, b_{m'})\).

In contrast, the average interaction effect (AIE), defined in Equation (6), does not have the invariance property. That is, the following equality does not generally hold,

\[ \xi_{AB}(a_\ell, b_m; a_0, b_0) - \xi_{AB}(a_{\ell'}, b_{m'}; a_0, b_0) = \xi_{AB}(a_\ell, b_m; a_{\tilde\ell}, b_{\tilde m}) - \xi_{AB}(a_{\ell'}, b_{m'}; a_{\tilde\ell}, b_{\tilde m}). \]

In addition, the AIE is interval invariant if and only if all the AIEs are zero.

The sensitivity of the AIEs to the choice of baseline condition can be further illustrated by the fact that the AIE of any treatment combination that shares a level with the baseline condition is equal to zero. That is, if \((a_0, b_0)\) is the baseline condition, then \(\xi_{AB}(a_0, b_m; a_0, b_0) = \xi_{AB}(a_\ell, b_0; a_0, b_0) = 0\). If researchers are only interested in the conditional effect interpretation of the AIEs, these zero AIEs are not of interest. However, this restriction is problematic for the interactive effect interpretation, especially when no natural baseline condition exists. In such circumstances, zero AIEs make it impossible to explore all relevant causal interaction effects. In contrast, researchers need not restrict their quantities of interest when using the AMIE, which can take a nonzero value even when one treatment is set to the baseline condition. For example, the AMIE can be positive if the effect of the second treatment is large when the first treatment is set to its baseline value.

While it is invariant to the choice of baseline condition, the AMIE critically depends on the distribution of treatments, that is, \(P(A, B)\). This is because the AMIE is a function of the AMEs, which are themselves obtained by marginalizing out other treatments. This dependency of causal quantities is not new. The potential outcomes framework for \(2^k\) factorial experiments introduced by Dasgupta, Pillai, and Rubin (2015), for example, defines causal estimands based on the uniform distribution of treatments. Many applied researchers independently randomize multiple treatments and then estimate the AME of each treatment by simply ignoring the other treatments. This estimation procedure implicitly conditions on the empirical distribution of treatment assignments.

Although the uniform or empirical distribution would be a reasonable default choice for many experimentalists, researchers can improve the external validity of their experiment by using a treatment distribution based on the target population (Hainmueller, Hopkins, and Yamamoto 2014). This is important for conjoint analysis, in which treatments are often characteristics of people. In our empirical application (see Section 2), for example, researchers could obtain detailed information about the attributes of actual candidates and use it as the basis of the treatment distribution.

4. Generalization to Higher-Order Interaction

In this section, we generalize the two-way AMIE introduced in Section 3 to higher-order causal interaction with more than two factors. We prove that a higher-order AMIE retains the same desirable properties and intuitive interpretation.

4.1. The Setup

Suppose that we have a total of \(J\) factorial treatments denoted by a vector \(T_i = (T_{i1}, T_{i2}, \ldots, T_{iJ})\) where \(J \geq 2\) and each factor \(T_{ij}\) has a total of \(L_j\) levels. Without loss of generality, let \(T_i^{1:K}\) be a subset of \(K\) treatments of interest where \(K \leq J\), whereas \(T_i^{(K+1):J}\) denotes the remaining \((J - K)\) factorial treatment variables, which are not of interest. As before, we assume that the treatment assignment is randomized.

Assumption 1 (Randomized Treatment Assignment).

\[ Y_i(t) \;\perp\!\!\!\perp\; T_i \quad \text{and} \quad \Pr(T_i = t) > 0 \text{ for all } t. \]

In addition, we assume that the \(J\) factorial treatments are independent of one another.

Assumption 2 (Independent Treatment Assignment).

\[ T_{ij} \;\perp\!\!\!\perp\; T_{i,-j} \quad \text{for all } j \in \{1, 2, \ldots, J\}, \]

where \(T_{i,-j}\) denotes the \((J - 1)\) factorial treatments excluding \(T_{ij}\).

Assumption 2 is not required for some of the results obtained below, but it considerably simplifies the notation.

We now generalize the definition of the two-way ACE given in Equation (3) by accommodating more than two factorial treatments of interest \(T_i^{1:K}\) while allowing for the existence of additional treatments \(T_i^{(K+1):J}\), which are marginalized out.

Definition 1 (The K-Way Average Combination Effect). The K-way average combination effect (ACE) of treatment combination \(T_i^{1:K} = t^{1:K}\) relative to baseline condition \(T_i^{1:K} = t_0^{1:K}\) is defined as

\[ \tau_{1:K}(t^{1:K}; t_0^{1:K}) \equiv E\left[ \int \left\{ Y_i\big(T_i^{1:K} = t^{1:K},\, T_i^{(K+1):J}\big) - Y_i\big(T_i^{1:K} = t_0^{1:K},\, T_i^{(K+1):J}\big) \right\} dF\big(T_i^{(K+1):J}\big) \right]. \]

The generalization of the AME defined in Equation (4) to this setting is straightforward. For example, the AME of \(T_{i1}\) is obtained by marginalizing out the remaining factors \(T_i^{2:J}\).

4.2. The K-Way Average Marginal Interaction Effect

We now extend the definition of the two-way AMIE, given in Equation (5), to higher-order causal interaction and discuss its relationships with the conventional higher-order causal interaction effect. We define the K-way AMIE as the additional effect of a treatment combination beyond the sum of all lower-order AMIEs.

Definition 2 (The K-Way Average Marginal Interaction Effect). The K-way average marginal interaction effect (AMIE) of treatment combination \(T_i^{1:K} = t^{1:K}\), relative to baseline condition \(T_i^{1:K} = t_0^{1:K}\), is given by

\[
\begin{aligned}
\pi_{1:K}(t^{1:K}; t_0^{1:K})
&\equiv E\left\{ \tau^{(i)}_{1:K}(t^{1:K}; t_0^{1:K}) - \sum_{k=1}^{K-1} \sum_{\mathcal{K}_k \subseteq \mathcal{K}_K} \pi^{(i)}_{\mathcal{K}_k}\big(t^{\mathcal{K}_k}; t_0^{\mathcal{K}_k}\big) \right\} \\
&= \tau_{1:K}(t^{1:K}; t_0^{1:K}) - \sum_{k=1}^{K-1} \sum_{\mathcal{K}_k \subseteq \mathcal{K}_K} \pi_{\mathcal{K}_k}\big(t^{\mathcal{K}_k}; t_0^{\mathcal{K}_k}\big),
\end{aligned}
\]

where \(\mathcal{K}_k \subseteq \mathcal{K}_K = \{1, \ldots, K\}\) such that \(|\mathcal{K}_k| = k\) with \(k = 1, \ldots, K\), \(\tau^{(i)}_{1:K}(t^{1:K}; t_0^{1:K})\) is the unit-level combination effect, and \(\pi^{(i)}_{1:K}(t^{1:K}; t_0^{1:K})\) is the unit-level K-way marginal interaction effect.

This definition reduces to Equation (5) when \(K = 2\) because the one-way AMIE is equal to the AME, that is, \(\pi_1(t; t_0) = \psi_1(t, t_0)\).

As in the two-way case, the K-way AMIE is closely related to the K-way AIE. To generalize the two-way AIE given in Equation (6), we first define the two-way AIE of treatment combination $t^{1:2} = (t_1, t_2)$, relative to baseline condition $t_0^{1:2} = (t_{01}, t_{02})$, by marginalizing the remaining treatments $T^{3:J}$. The unit-level two-way interaction effect and the two-way AIE are defined as

\[
\xi_{1:2}(t^{1:2}; t_0^{1:2}) \equiv \mathbb{E}\left[ \int \left\{ Y_i(t_1, t_2, T_i^{3:J}) - Y_i(t_{01}, t_2, T_i^{3:J}) - Y_i(t_1, t_{02}, T_i^{3:J}) + Y_i(t_{01}, t_{02}, T_i^{3:J}) \right\} dF(T_i^{3:J}) \right].
\]

In addition, define the conditional two-way AIE by fixing the level of another treatment $T_{i3}$ at $t^*$,

\[
\xi_{1:2}(t^{1:2}; t_0^{1:2} \mid T_{i3} = t^*) \equiv \mathbb{E}\left[ \int \left\{ Y_i(t_1, t_2, t^*, T_i^{4:J}) - Y_i(t_{01}, t_2, t^*, T_i^{4:J}) - Y_i(t_1, t_{02}, t^*, T_i^{4:J}) + Y_i(t_{01}, t_{02}, t^*, T_i^{4:J}) \right\} dF(T_i^{4:J}) \right].
\]

Then, the three-way AIE can be defined as the difference between the ACE of treatment combination $t^{1:3} = (t_1, t_2, t_3)$ and the sum of all conditional two-way and one-way AIEs while conditioning on the baseline condition $t_0^{1:3} = (t_{01}, t_{02}, t_{03})$,

\[
\begin{aligned}
\xi_{1:3}(t^{1:3}; t_0^{1:3}) = {} & \tau_{1:3}(t^{1:3}; t_0^{1:3}) \\
& - \left\{ \xi_{1:2}(t^{1:2}; t_0^{1:2} \mid T_{i3} = t_{03}) + \xi_{2:3}(t^{2:3}; t_0^{2:3} \mid T_{i1} = t_{01}) + \xi_{1,3}(t^{1,3}; t_0^{1,3} \mid T_{i2} = t_{02}) \right\} \\
& - \left\{ \xi_{1}(t_1; t_{01} \mid T_i^{2:3} = t_0^{2:3}) + \xi_{2}(t_2; t_{02} \mid T_i^{1,3} = t_0^{1,3}) + \xi_{3}(t_3; t_{03} \mid T_i^{1:2} = t_0^{1:2}) \right\}.
\end{aligned}
\tag{9}
\]

Note that the one-way conditional AIEs are equivalent to the average effects of single treatments while holding the other treatments at their base level. For example, $\xi_1(t_1; t_{01} \mid T_i^{2:3} = t_0^{2:3})$ is equal to $\tau_{1:3}(t_1, t_0^{2:3}; t_0^{1:3})$. We also note that $\xi_1(t_1; t_{01}) = \psi_1(t_1; t_{01}) = \pi_1(t_1; t_{01})$ holds. In this way, we can generalize the AIE to higher-order causal interaction.

Definition 3 (The K-Way Average Interaction Effect). The K-way average interaction effect (AIE) of treatment combination $T_i^{1:K} = t^{1:K} = (t_1, \ldots, t_K)$ relative to baseline condition $T_i^{1:K} = t_0^{1:K} = (t_{01}, \ldots, t_{0K})$ is given by,

\[
\begin{aligned}
\xi_{1:K}(t^{1:K}; t_0^{1:K}) & \equiv \mathbb{E}\left[ \tau^{(i)}_{1:K}(t^{1:K}; t_0^{1:K}) - \sum_{k=1}^{K-1} \sum_{\mathcal{K}_k \subseteq \mathcal{K}_K} \xi^{(i)}_{\mathcal{K}_k}\bigl(t^{\mathcal{K}_k}; t_0^{\mathcal{K}_k} \mid T_i^{\mathcal{K}_K \setminus \mathcal{K}_k} = t_0^{\mathcal{K}_K \setminus \mathcal{K}_k}\bigr) \right] \\
& = \tau_{1:K}(t^{1:K}; t_0^{1:K}) - \sum_{k=1}^{K-1} \sum_{\mathcal{K}_k \subseteq \mathcal{K}_K} \xi_{\mathcal{K}_k}\bigl(t^{\mathcal{K}_k}; t_0^{\mathcal{K}_k} \mid T_i^{\mathcal{K}_K \setminus \mathcal{K}_k} = t_0^{\mathcal{K}_K \setminus \mathcal{K}_k}\bigr),
\end{aligned}
\]

where the second summation is taken over the set of all possible $\mathcal{K}_k \subseteq \mathcal{K}_K = \{1, 2, \ldots, K\}$ such that $|\mathcal{K}_k| = k$, $\tau^{(i)}_{1:K}(t^{1:K}; t_0^{1:K})$ is the unit-level combination effect, and $\xi^{(i)}_{\mathcal{K}_k}(t^{\mathcal{K}_k}; t_0^{\mathcal{K}_k} \mid T_i^{\mathcal{K}_K \setminus \mathcal{K}_k} = t_0^{\mathcal{K}_K \setminus \mathcal{K}_k})$ represents the unit-level interaction effect.

While both estimands have similar interpretations, the K-way AMIE differs from the K-way AIE in important ways. First, the AMIE is expressed as a function of its lower-order effects whereas the AIE is based on the lower-order conditional AIEs rather than the lower-order AIEs. This implies that we can decompose the K-way ACE as the sum of the K-way AMIE and all lower-order AMIEs,

\[
\tau_{1:K}(t^{1:K}; t_0^{1:K}) = \sum_{k=1}^{K} \sum_{\mathcal{K}_k \subseteq \mathcal{K}_K} \pi_{\mathcal{K}_k}(t^{\mathcal{K}_k}; t_0^{\mathcal{K}_k}). \tag{10}
\]

The decomposition is useful for understanding how interaction effects of various orders relate to the overall effect of a treatment combination. However, because of conditioning on the baseline value, a similar decomposition is not applicable to the AIEs.

Second, in the experimental design literature, the K-way AIE is often interpreted as a conditional interaction effect (see, e.g., Jaccard and Turrisi 2003; Wu and Hamada 2011). For example, the three-way AIE of treatment combination $T_i^{1:3} = t^{1:3} = (t_1, t_2, t_3)$ relative to baseline condition $T_i^{1:3} = t_0^{1:3} = (t_{01}, t_{02}, t_{03})$, given in Equation (9), can be rewritten as the difference in the conditional two-way AIEs where the third factorial treatment is either set to $t_3$ or $t_{03}$,

\[
\xi_{1:3}(t^{1:3}; t_0^{1:3}) = \xi_{1:2}(t^{1:2}; t_0^{1:2} \mid T_{i3} = t_3) - \xi_{1:2}(t^{1:2}; t_0^{1:2} \mid T_{i3} = t_{03}).
\]

Lemma 1 shows that this equivalence relationship can be generalized to the K-way AIE (see Appendix A.1).

Unfortunately, as recognized by others (see, e.g., Wu and Hamada 2011, p. 112), although it is useful when $K = 2$, this conditional interpretation faces difficulty when $K$ is greater than two. For example, the three-way AIE has the conditional effect interpretation, characterizing how the conditional two-way AIE varies as a function of the third factorial treatment. However, according to this interpretation, the two-way AIE, which varies according to the second treatment of interest, itself describes how the main effect of one treatment changes as a function of another treatment. This means that the three-way AIE is the conditional effect of another conditional effect, making it difficult for applied researchers to gain an intuitive understanding.

Finally, as in the two-way case, we can express the K-way AMIE and K-way AIE as linear functions of one another. The next theorem summarizes this result.

Theorem 1 (Relationships Between the K-Way AMIE and the K-Way AIE). Under Assumption 2, the K-way average marginal interaction effect (AMIE), given in Definition 2, equals the following linear function of the K-way average interaction effects (AIEs), given in Definition 3. That is, for any $t^{1:K}$ and $t_0^{1:K}$, we have

\[
\pi_{1:K}(t^{1:K}; t_0^{1:K}) = \xi_{1:K}(t^{1:K}; t_0^{1:K}) + \sum_{k=1}^{K-1} (-1)^{k} \sum_{\mathcal{K}_k \subseteq \mathcal{K}_K} \int \xi_{1:K}\bigl(T^{\mathcal{K}_k}, t^{\mathcal{K}_K \setminus \mathcal{K}_k}; t_0^{1:K}\bigr) \, dF(T^{\mathcal{K}_k}),
\]

where $\mathcal{K}_k \subseteq \mathcal{K}_K = \{1, \ldots, K\}$ such that $|\mathcal{K}_k| = k$ with $k = 1, \ldots, K$. Likewise, but without requiring Assumption 2, the K-way AIE can be written as the following linear function of the K-way AMIEs:

\[
\xi_{1:K}(t^{1:K}; t_0^{1:K}) = \sum_{k=1}^{K} (-1)^{K-k} \sum_{\mathcal{K}_k \subseteq \mathcal{K}_K} \pi_{1:K}\bigl(t^{\mathcal{K}_k}, t_0^{\mathcal{K}_K \setminus \mathcal{K}_k}; t_0^{\mathcal{K}_k}, t_0^{\mathcal{K}_K \setminus \mathcal{K}_k}\bigr).
\]

Proof is in Appendix A.2. All causal estimands introduced above are identifiable under Assumption 1. We propose nonparametric unbiased estimators in Section 5.
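To make the two-way versions of these relationships concrete, the following is a minimal numerical sketch rather than part of the original analysis. It assumes three independently randomized factors, a made-up table of mean potential outcomes (`ybar`), and made-up marginal assignment probabilities (`p1`, `p2`, `p3`); all function and variable names are hypothetical. It also assumes the two-way AMIE is the ACE minus the two AMEs, as the estimator in Section 5.1 suggests, and that independent randomization is what Assumption 2 provides. Under those assumptions it checks Equation (10) and both directions of Theorem 1 for $K = 2$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mean potential outcomes E[Y_i(t1, t2, t3)] for factors with 2, 3, 2 levels.
ybar = rng.normal(size=(2, 3, 2))

# Hypothetical independent marginal assignment distributions for the three factors.
p1 = np.array([0.5, 0.5])
p2 = np.array([0.2, 0.3, 0.5])
p3 = np.array([0.4, 0.6])

m12 = ybar @ p3   # mean outcome at (t1, t2), marginalizing factor 3
m1 = m12 @ p2     # mean outcome at t1, marginalizing factors 2 and 3
m2 = p1 @ m12     # mean outcome at t2, marginalizing factors 1 and 3


def tau(a, b, a0, b0):
    """Two-way ACE of (a, b) relative to baseline (a0, b0)."""
    return m12[a, b] - m12[a0, b0]


def psi1(a, a0):
    """AME of factor 1."""
    return m1[a] - m1[a0]


def psi2(b, b0):
    """AME of factor 2."""
    return m2[b] - m2[b0]


def amie(a, b, a0, b0):
    """Two-way AMIE: ACE minus the two AMEs."""
    return tau(a, b, a0, b0) - psi1(a, a0) - psi2(b, b0)


def aie(a, b, a0, b0):
    """Two-way AIE: the usual double difference against the baseline."""
    return m12[a, b] - m12[a0, b] - m12[a, b0] + m12[a0, b0]


a, b, a0, b0 = 1, 2, 0, 0

# Equation (10) with K = 2: the ACE is the sum of the AMEs and the AMIE.
ace_decomposed = psi1(a, a0) + psi2(b, b0) + amie(a, b, a0, b0)

# Theorem 1, AIE written in terms of AMIEs (K = 2).
aie_direct = aie(a, b, a0, b0)
aie_from_amies = amie(a, b, a0, b0) - amie(a, b0, a0, b0) - amie(a0, b, a0, b0)

# Theorem 1, AMIE written in terms of AIEs (K = 2), integrating one factor at a time.
amie_direct = amie(a, b, a0, b0)
amie_from_aies = (
    aie(a, b, a0, b0)
    - sum(p1[x] * aie(x, b, a0, b0) for x in range(len(p1)))
    - sum(p2[y] * aie(a, y, a0, b0) for y in range(len(p2)))
)

print(np.isclose(tau(a, b, a0, b0), ace_decomposed))
print(np.isclose(aie_direct, aie_from_amies))
print(np.isclose(amie_direct, amie_from_aies))
```

The sketch works at the population level, so no sampling error is involved; changing `p1` or `p2` changes the AMEs and the AMIE but leaves the AIE untouched, which reflects the AMIE's dependence on the treatment distribution.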

4.3. Invariance to the Choice of Baseline Condition

As is the case for the two-way AMIE, the K-way AMIE is invariant to the choice of baseline condition. In contrast, the K-way AIEs lack this invariance property. The next theorem generalizes Result 2 to the K-way causal interaction.

Theorem 2 (Invariance and Lack Thereof to the Choice of Baseline Condition). The K-way average marginal interaction effect (AMIE), given in Definition 2, is interval invariant. That is, for any treatment combinations $t^{1:K} \neq \tilde{t}^{1:K}$ and control conditions $t_0^{1:K} \neq \tilde{t}_0^{1:K}$, the following equality holds,

\[
\pi_{1:K}(t^{1:K}; t_0^{1:K}) - \pi_{1:K}(\tilde{t}^{1:K}; t_0^{1:K}) = \pi_{1:K}(t^{1:K}; \tilde{t}_0^{1:K}) - \pi_{1:K}(\tilde{t}^{1:K}; \tilde{t}_0^{1:K}).
\]

In contrast, the average interaction effect (AIE), given in Definition 3, does not possess the invariance property. That is, the following equality does not generally hold,

\[
\xi_{\mathcal{K}_K}(t^{\mathcal{K}_K}; t_0^{\mathcal{K}_K}) - \xi_{\mathcal{K}_K}(\tilde{t}^{\mathcal{K}_K}; t_0^{\mathcal{K}_K}) = \xi_{\mathcal{K}_K}(t^{\mathcal{K}_K}; \tilde{t}_0^{\mathcal{K}_K}) - \xi_{\mathcal{K}_K}(\tilde{t}^{\mathcal{K}_K}; \tilde{t}_0^{\mathcal{K}_K}). \tag{11}
\]

Proof is in Appendix A.3.

5. Estimation and Regularization

In this section, we show how to estimate the AMIE using the general notation introduced in Section 4. For the sake of simplicity, our discussion focuses on the two-way AMIE but we show that all the results presented here can be generalized to the K-way AMIE. We first introduce nonparametric estimators based on difference in sample means. We then prove that the AMIE can also be nonparametrically estimated using ANOVA with weighted zero-sum constraints (Scheffé 1959). While ANOVA is mainly used for a balanced design, our approach is applicable to the unbalanced design as well so long as Assumptions 1 and 2 hold. Finally, we show how to directly regularize the AMIEs by collapsing levels and selecting factors (Post and Bondell 2013). Because of the invariance property of the AMIEs, this regularization method is also invariant to the choice of baseline condition. The proposed method reduces false discovery and facilitates interpretation when there are many factors and levels.

5.1. Difference-in-Means Estimators

In the causal inference literature, the following difference-in-means estimators have been used to nonparametrically estimate the ACE and AME (e.g., Hainmueller, Hopkins, and Yamamoto 2014; Dasgupta, Pillai, and Rubin 2015):

\[
\hat{\tau}_{jj'}(\ell, m; 0, 0) = \frac{\sum_{i=1}^{n} Y_i \mathbf{1}\{T_{ij} = \ell, T_{ij'} = m\}}{\sum_{i=1}^{n} \mathbf{1}\{T_{ij} = \ell, T_{ij'} = m\}} - \frac{\sum_{i=1}^{n} Y_i \mathbf{1}\{T_{ij} = 0, T_{ij'} = 0\}}{\sum_{i=1}^{n} \mathbf{1}\{T_{ij} = 0, T_{ij'} = 0\}},
\]

\[
\hat{\psi}_{j}(\ell; 0) = \frac{\sum_{i=1}^{n} Y_i \mathbf{1}\{T_{ij} = \ell\}}{\sum_{i=1}^{n} \mathbf{1}\{T_{ij} = \ell\}} - \frac{\sum_{i=1}^{n} Y_i \mathbf{1}\{T_{ij} = 0\}}{\sum_{i=1}^{n} \mathbf{1}\{T_{ij} = 0\}}.
\]

These estimators are unbiased only when the treatment assignment distribution of an experimental study is used to define the AMEs and AMIEs. Then, Definition 2 naturally implies the following nonparametric estimator of the two-way AMIE:

\[
\hat{\pi}_{jj'}(\ell, m; 0, 0) = \hat{\tau}_{jj'}(\ell, m; 0, 0) - \hat{\psi}_{j}(\ell; 0) - \hat{\psi}_{j'}(m; 0).
\]

Similarly, the nonparametric estimator of higher-order AMIE can be constructed. It is important to emphasize that these nonparametric estimators do not assume the absence of higher-order interactions (Hainmueller, Hopkins, and Yamamoto 2014).

5.2. Nonparametric Estimation with ANOVA

Alternatively, the AMIEs can be estimated nonparametrically using ANOVA with weighted zero-sum constraints, which is a convex optimization problem (Scheffé 1959). For example, the two-way AMIE considered above can be estimated by the saturated ANOVA whose objective function is as follows,

\[
\sum_{i=1}^{n} \Biggl( Y_i - \mu - \sum_{j=1}^{J} \sum_{\ell=0}^{L_j - 1} \beta^{j}_{\ell} \mathbf{1}\{T_{ij} = \ell\} - \sum_{j=1}^{J-1} \sum_{j' > j} \sum_{\ell=0}^{L_j - 1} \sum_{m=0}^{L_{j'} - 1} \beta^{jj'}_{\ell m} \mathbf{1}\{T_{ij} = \ell, T_{ij'} = m\} - \sum_{k=3}^{J} \sum_{\mathcal{K}_k \subseteq \mathcal{K}_J} \sum_{t^{\mathcal{K}_k}} \beta^{\mathcal{K}_k}_{t^{\mathcal{K}_k}} \mathbf{1}\{T_i^{\mathcal{K}_k} = t^{\mathcal{K}_k}\} \Biggr)^{2}, \tag{12}
\]

where $\mu$ is the global mean, $\beta^{j}_{\ell}$ is the coefficient for the first-order term for the $j$th factor with level $\ell$, $\beta^{jj'}_{\ell m}$ is the coefficient for the second-order interaction term for the $j$th and $j'$th factors with levels $\ell$ and $m$, respectively, and more generally $\beta^{\mathcal{K}_k}_{t^{\mathcal{K}_k}}$ is the coefficient for the interaction term for a set of $k$ factors $\mathcal{K}_k$ when their levels are equal to $t^{\mathcal{K}_k}$. Note that as in Section 4, we have $|\mathcal{K}_k| = k$ and $\mathcal{K}_J = \{1, 2, \ldots, J\}$. We emphasize that the nonparametric estimation requires all interaction terms up to the J-way interaction. See Section 5.3 for efficient parametric estimation.

We minimize the objective function given in Equation (12) subject to the following weighted zero-sum constraints where the weights are given by the marginal distribution of treatment assignment,

\[
\sum_{\ell=0}^{L_j - 1} \Pr(T_{ij} = \ell) \, \beta^{j}_{\ell} = 0 \quad \text{for all } j, \tag{13}
\]

\[
\sum_{\ell=0}^{L_j - 1} \Pr(T_{ij} = \ell) \, \beta^{jj'}_{\ell m} = 0 \quad \text{for all } j \neq j' \text{ and } m \in \{0, 1, \ldots, L_{j'} - 1\}, \tag{14}
\]

\[
\sum_{\ell=0}^{L_j - 1} \Pr(T_{ij} = \ell) \, \mathbf{1}\{t_j = \ell\} \, \beta^{\mathcal{K}_k}_{t^{\mathcal{K}_k}} = 0 \quad \text{for all } j, \; t^{\mathcal{K}_k \setminus j}, \text{ and } \mathcal{K}_k \subseteq \mathcal{K}_J \text{ such that } k \geq 3 \text{ and } j \in \mathcal{K}_k. \tag{15}
\]

Finally, the next theorem shows that the difference in the estimated ANOVA coefficients represents a nonparametric estimate of the AMIE.
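Before the formal statement, a small sketch may help show what Sections 5.1 and 5.2 compute. It is an illustration under stated assumptions, not the authors' implementation (the paper's software is the R package FindIt). The data are simulated, the design is a balanced two-factor full factorial with uniform weights, and all names (`tau_hat`, `psi_hat`, `amie_hat`, `C`, `Z`) are hypothetical. The constrained least squares problem of Equations (12)–(14) is solved by restricting the coefficients to the null space of the constraint matrix, which is one of several possible ways to impose the weighted zero-sum constraints.

```python
import numpy as np

rng = np.random.default_rng(1)
L1, L2, reps = 3, 2, 4

# Hypothetical balanced full factorial: each (l, m) cell appears `reps` times.
T = np.array([(l, m) for l in range(L1) for m in range(L2) for _ in range(reps)])
y = rng.normal(size=len(T)) + 0.5 * T[:, 0] * T[:, 1]
n = len(y)


# --- Section 5.1: difference-in-means estimators (baseline level 0) ---
def tau_hat(l, m):
    cell = (T[:, 0] == l) & (T[:, 1] == m)
    base = (T[:, 0] == 0) & (T[:, 1] == 0)
    return y[cell].mean() - y[base].mean()


def psi_hat(j, l):
    return y[T[:, j] == l].mean() - y[T[:, j] == 0].mean()


def amie_hat(l, m):
    return tau_hat(l, m) - psi_hat(0, l) - psi_hat(1, m)


# --- Section 5.2: saturated ANOVA with weighted zero-sum constraints ---
# Column layout: [mu | beta^1 (L1) | beta^2 (L2) | beta^12 (L1 * L2, row-major)].
off1, off2, off12 = 1, 1 + L1, 1 + L1 + L2
p = off12 + L1 * L2
rows = np.arange(n)
X = np.zeros((n, p))
X[:, 0] = 1.0
X[rows, off1 + T[:, 0]] = 1.0
X[rows, off2 + T[:, 1]] = 1.0
X[rows, off12 + T[:, 0] * L2 + T[:, 1]] = 1.0

# Uniform marginal assignment probabilities serve as the weights.
w1 = np.full(L1, 1.0 / L1)
w2 = np.full(L2, 1.0 / L2)

constraints = []
c = np.zeros(p); c[off1:off1 + L1] = w1; constraints.append(c)      # Eq. (13), factor 1
c = np.zeros(p); c[off2:off2 + L2] = w2; constraints.append(c)      # Eq. (13), factor 2
for m in range(L2):                                                  # Eq. (14), sum over l
    c = np.zeros(p)
    c[off12 + np.arange(L1) * L2 + m] = w1
    constraints.append(c)
for l in range(L1):                                                  # Eq. (14), sum over m
    c = np.zeros(p)
    c[off12 + l * L2 + np.arange(L2)] = w2
    constraints.append(c)
C = np.vstack(constraints)

# Basis Z of the null space of C, so that beta = Z @ gamma satisfies C @ beta = 0.
_, s, Vt = np.linalg.svd(C)
rank = int((s > 1e-10).sum())
Z = Vt[rank:].T
gamma = np.linalg.lstsq(X @ Z, y, rcond=None)[0]
beta = Z @ gamma

b1 = beta[off1:off1 + L1]
b12 = beta[off12:].reshape(L1, L2)

# Differences in coefficients versus the difference-in-means estimates.
for l in range(L1):
    for m in range(L2):
        print(l, m, round(b12[l, m] - b12[0, 0], 6), round(amie_hat(l, m), 6))
```

In this balanced design with uniform weights the two columns printed by the loop agree up to floating-point error. In an unbalanced sample the two approaches would generally differ in finite samples, which is why the text describes them only as asymptotically equivalent when the assignment distribution is used as weights.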

Theorem 3 (Nonparametric Estimation with ANOVA). Under Assumptions 1 and 2, differences in the estimated coefficients from ANOVA based on Equations (12)–(15) represent nonparametric unbiased estimators of the AME and the AMIE:

\[
\mathbb{E}(\hat{\beta}^{j}_{\ell} - \hat{\beta}^{j}_{0}) = \psi_{j}(\ell; 0), \qquad
\mathbb{E}(\hat{\beta}^{jj'}_{\ell m} - \hat{\beta}^{jj'}_{00}) = \pi_{jj'}(\ell, m; 0, 0), \qquad
\mathbb{E}(\hat{\beta}^{\mathcal{K}_k}_{t^{\mathcal{K}_k}} - \hat{\beta}^{\mathcal{K}_k}_{t_0^{\mathcal{K}_k}}) = \pi_{\mathcal{K}_k}(t^{\mathcal{K}_k}; t_0^{\mathcal{K}_k}).
\]

Proof is given in Appendix A.4. These estimators are asymptotically equivalent to their corresponding difference-in-means estimators when the treatment assignment distribution of an experimental study is used as weights. The proposed ANOVA framework, however, allows researchers to use any treatment assignment distributions to define the AME and the AMIE so long as Assumptions 1 and 2 hold.

5.3. Regularization

A key advantage of this ANOVA-based estimator in Section 5.2 over the difference-in-means estimator in Section 5.1 is that we can directly regularize the AMIEs in a penalized regression framework. The regularization is especially useful for reducing false positives and facilitating interpretation when the number of factors is large.

We apply the regularization method (Grouping and Selection using Heredity in ANOVA or GASH-ANOVA) proposed by Post and Bondell (2013), which places penalties on differences in coefficients of the ANOVA regression. As shown above, these differences correspond to the AMEs and AMIEs. While there exist other regularization methods for categorical variables (e.g., Yuan and Lin 2006; Meier, Van De Geer, and Bühlmann 2008; Zhao, Rocha, and Yu 2009; Huang et al. 2009; Huang, Breheny, and Ma 2012; Lim and Hastie 2015), these methods regularize coefficients rather than their differences. In addition, GASH-ANOVA collapses levels and selects factors by jointly considering the AMEs and AMIEs rather than the AMEs alone. This is attractive because many social scientists believe large interaction effects can exist even when marginal effects are small. The method also collapses levels in a mutually consistent manner.

Finally, because the AMEs and AMIEs are invariant to the choice of baseline condition, this regularization method also inherits the invariance property, which is not generally the case (Lim and Hastie 2015). In particular, even if one is interested in conditional average causal effects, regularization should be based on the AMEs and AMIEs because of their invariance property. As shown in Equation (8), we can compute the conditional average effects directly from these quantities.

To illustrate the application of GASH-ANOVA, consider a situation of practical interest in which we assume the absence of causal interaction higher than the second order. That is, in Equation (12), we assume $\beta^{\mathcal{K}_k}_{t^{\mathcal{K}_k}} = 0$ for all $k \geq 3$. GASH-ANOVA collapses two levels within a factor by directly and jointly regularizing the AMEs and AMIEs that involve those two levels. Define the set of all the AMEs and AMIEs that involve levels $\ell$ and $\ell'$ of the $j$th factor as follows,

\[
\phi^{j}(\ell, \ell') = \Bigl\{ \bigl| \beta^{j}_{\ell} - \beta^{j}_{\ell'} \bigr| \Bigr\} \cup \Biggl\{ \bigcup_{j' \neq j} \bigcup_{m=0}^{L_{j'} - 1} \bigl| \beta^{jj'}_{\ell m} - \beta^{jj'}_{\ell' m} \bigr| \Biggr\}.
\]

Finally, the penalty is given by,

\[
\sum_{j=1}^{J} \sum_{\ell, \ell'} w^{j}_{\ell \ell'} \max\{\phi^{j}(\ell, \ell')\} \leq c,
\]

where $c$ is the cost parameter and $w^{j}_{\ell \ell'}$ is the adaptive weight of the following form,

\[
w^{j}_{\ell \ell'} = \Bigl[ (L_j + 1) \sqrt{L_j} \, \max\{\bar{\phi}^{j}(\ell, \ell')\} \Bigr]^{-1},
\]

where $(L_j + 1)\sqrt{L_j}$ is the standardization factor (Bondell and Reich 2009), and $\bar{\phi}^{j}(\ell, \ell')$ represents the corresponding set of all AMEs and AMIEs estimated without regularization. Post and Bondell (2013) showed that, when combined with Equations (12)–(15), the resulting optimization problem is a quadratic programming problem. They also prove that the method has the oracle property.

6. Empirical Analysis

We apply the proposed method to the conjoint analysis of coethnic voting described in Section 2. Although conjoint analysis is based on the randomization of multiple factors, it differs from factorial experiments in that respondents evaluate pairs of randomly selected profiles. Thus, we only observe which profile they prefer within a given pair but do not know how much they like each profile. As shown below, this particular feature of conjoint analysis leads to a modified formulation of the ANOVA model. As explained in Section 5, we can apply the standard ANOVA (possibly with regularization) to estimate the AMEs and AMIEs in a typical factorial experiment. Our analysis finds clear patterns of causal interaction between the Record and Coethnicity variables as well as between the Record and Platform variables.

6.1. A Model of Preference Differentials

Our empirical application is based on the choice-based conjoint analysis, in which respondents are asked to evaluate three pairs of hypothetical presidential candidates in turn. Let $Y_i(t)$ be the potential preference by respondent $i$ for a hypothetical candidate characterized by a vector of attributes $t$. In this experiment, $t$ is a four-dimensional vector, based on the values of factorial treatments shown in Table 1 where each factor $T_{ij}$ has $L_j$ levels (i.e., {Coethnicity, Record, Platform, Degree}).

Given the limited sample size, we assume the absence of three-way or higher-order causal interaction and use the following ANOVA regression model of potential outcomes with all one-way effects and two-way interactions:

\[
Y_i(t) = \mu + \sum_{j=1}^{4} \sum_{\ell=0}^{L_j - 1} \beta^{j}_{\ell} \mathbf{1}\{t_{ij} = \ell\} + \sum_{j=1}^{3} \sum_{j' > j} \sum_{\ell=0}^{L_j - 1} \sum_{m=0}^{L_{j'} - 1} \beta^{jj'}_{\ell m} \mathbf{1}\{t_{ij} = \ell, t_{ij'} = m\} + \epsilon_i(t). \tag{16}
\]

The results in Section 5.2 imply that the coefficients in this model represent the AMEs and AMIEs.

In this conjoint analysis, respondents evaluate a pair of hypothetical candidates with different attributes. This means that we only observe whether respondent $i$ prefers a candidate with attributes $T_i^{*}$ over another candidate with attributes $T_i^{\dagger}$. Thus, based on the model of preference given in Equation (16), we construct a linear probability model of preference differential,

\[
\begin{aligned}
\Pr\bigl(Y_i(T_i^{*}) > Y_i(T_i^{\dagger}) \mid T_i^{*}, T_i^{\dagger}\bigr) = {} & \tilde{\mu} + \sum_{j=1}^{4} \sum_{\ell=0}^{L_j - 1} \beta^{j}_{\ell} \bigl( \mathbf{1}\{T^{*}_{ij} = \ell\} - \mathbf{1}\{T^{\dagger}_{ij} = \ell\} \bigr) \\
& + \sum_{j, j'} \sum_{\ell, m} \beta^{jj'}_{\ell m} \bigl( \mathbf{1}\{T^{*}_{ij} = \ell, T^{*}_{ij'} = m\} - \mathbf{1}\{T^{\dagger}_{ij} = \ell, T^{\dagger}_{ij'} = m\} \bigr),
\end{aligned}
\]

where $\tilde{\mu} = 0.5$ if the position within a pair does not matter. Note that the independence of irrelevant alternatives is assumed. If we additionally assume that the difference in errors follows independent Type I extreme value distributions, the model becomes the conditional logit model, which is popular in conjoint analysis (McFadden 1974).

We minimize the sum of squared residuals, subject to the constraints given in Equations (13) and (14) where $\Pr(T_{ij} = \ell)$ represents the marginal distribution of $T^{*}_{ij}$ and $T^{\dagger}_{ij}$ together. We also apply the regularization method discussed in Section 5.3. To be consistent with the original dummy coding, we treat Record and Platform as ordered categorical variables and place penalties on the differences between adjacent levels rather than the differences based on every pairwise comparison. We use the order of levels as shown in Table 1. We choose the uniform distribution for treatment assignment and select the value of the cost parameter $c$ based on the minimum mean squared error criterion in 10-fold cross-validation.

Since the inference for a regularization method that collapses levels of factorial variables is not established in the literature (Bühlmann and Dezeure 2016), we focus on the stability of selection (e.g., Breiman 1996; Meinshausen and Bühlmann 2010). In particular, we estimate the selection probability for each AME and AMIE using one minus the proportion of 5000 bootstrap replicates in which all coefficients for the corresponding factor or factor interaction are estimated to be zero (Efron 2014; Hastie, Tibshirani, and Wainwright 2015). Although we do not control the family-wise error rate, we follow Meinshausen and Bühlmann (2010) and use a 90% cutoff as our default.

Another possible inferential approach is sample splitting where we collapse levels and select factors using training data and then estimate and compute confidence intervals for the AMEs and AMIEs using test data (Wasserman and Roeder 2009; Athey and Imbens 2016; Chernozhukov et al. 2018). Although we do not present the results based on this approach here, it can be implemented through our open-source software package, FindIt.

6.2. Findings

We begin by reporting the ranges of the estimated AMEs and AMIEs and their selection probability to determine significant factors and factor interactions, respectively. As shown in Table 2, three factors (Record, Platform, and Coethnicity) are found to be significant whereas Degree is not. In terms of the AMIEs, the interaction Coethnicity × Record, which is the basis of the main finding in the original article, is estimated to have a large range of 5.4 percentage points, and is selected with probability one. The range of this AMIE is as great as that of the AME of Coethnicity and is greater than that of Platform. Additionally, the proposed method selects the causal interactions, Record × Platform and Platform × Coethnicity, with probability close to one. We focus on the two largest causal interactions, Coethnicity × Record and Record × Platform.

Table 2. Ranges of the estimated average marginal effects (AMEs) and estimated average marginal interaction effects (AMIEs). The estimated selection probability of the AME (AMIE) is one minus the proportion of 5000 bootstrap replicates in which all coefficients for the corresponding factor (factor interaction) are estimated to be zero.

                                Range    Selection prob.
AME
  Record                          .            .
  Coethnicity                     .            .
  Platform                        .            .
  Degree                          .            .
AMIE
  Coethnicity × Record            .            .
  Record × Platform               .            .
  Platform × Coethnicity          .            .
  Record × Degree                 .            .
  Coethnicity × Degree            .            .
  Platform × Degree               .            .

Next, we examine the estimated AMEs presented in Table 3. For the Record variable, under the 90% selection probability rule, we collapse the original seven levels into three groups: {Yes/Village, Yes/District, Yes/MP}, {No/Village, No/District, No/MP}, and {No/Businessman}. This partition suggests that politicians with a good record are preferred over those without one, including businessmen. Similarly, we find two groups in the Platform variable, {Jobs, Clinic} and {Education}, where voters appear to favor candidates with the education platform on average.

Table 3. The estimated average marginal effects (AMEs). The estimated selection probability is the proportion of bootstrap replicates in which the difference between two adjacent levels is estimated to be different from zero.

Factor / level          AME      Selection prob. (vs. next level)
Record
  Yes/Village            .         0.64
  Yes/District           .         0.80
  Yes/MP                 .         1.00
  No/Village             .         0.76
  No/District            .         0.84
  No/MP                  .         0.99
  No/Businessman        base
Platform
  Jobs                   .         0.80
  Clinic                 .         0.97
  Education             base
Coethnicity              .          .
Degree                   .          .

We now investigate two significant causal interactions, Coethnicity × Record and Record × Platform. Figure 1 visualizes all estimated AMIEs within each factor interaction. The cells with warmer red (colder blue) color

represents a greater (smaller) AMIE than the average AMIE within that factor interaction. The estimates with regularization (right column) show clearer patterns for causal interaction than those without regularization (left column).

Figure 1. The estimated AMIEs for Coethnicity × Record (the first row) and Platform × Record (the second row). The first and second columns show the estimated AMIEs without and with regularization, respectively.

First, regarding the Coethnicity × Record interaction (upper panel of the figure), for example, we find that being coethnic gives an average bonus of 5.3 percentage points if a candidate is an MP with a good record beyond the average effect of coethnicity (selec. prob. = 1). In contrast, being coethnic has an additional penalty of 4.6 percentage points when a candidate is a district-level politician without a good record (selec. prob. = 0.98). As shown in Equation (8), we can compute the average conditional effect as the sum of the AME and AMIE. As expected, while the conditional average effect of being coethnic for an MP candidate with a good record is 10.7 percentage points (selec. prob. = 1), this effect is almost zero for an MP candidate without a good record. These findings support the argument of Carlson (2015).

The decomposition shown in Equation (10) can be used to understand the ACE. As an illustration, we decompose the ACE of {Coethnic, No/Business} relative to {Non-coethnic, No/MP}, which is an estimated negative effect of 2.4 percentage points (selec. prob. = 0.89), as follows,

\[
\underbrace{\tau(\text{Coethnic}, \text{No/Business}; \text{Non-coethnic}, \text{No/MP})}_{-2.4}
= \underbrace{\psi(\text{Coethnic}; \text{Non-coethnic})}_{5.3}
+ \underbrace{\psi(\text{No/Business}; \text{No/MP})}_{-4.7}
+ \underbrace{\pi(\text{Coethnic}, \text{No/Business}; \text{Non-coethnic}, \text{No/MP})}_{-3.0}.
\]

We observe that while the average effect of being coethnic is 5.3 percentage points, being a businessman, relative to being an MP without a good record, yields an average effect of negative 4.7 percentage points. In addition, being a coethnic businessman has an additional penalty of 3 percentage points relative to a non-coethnic MP without a good record. All three estimates are selected with probability close to one.

Finally, we examine the Platform × Record interaction, which was not discussed in the original study. We find two distinct groups: (1) politicians with record, businessmen without record and (2) politicians without record. Candidates in the second group appear to receive an additional penalty by promising to improve education. Specifically, the estimated AMIE of {Education, No/MP} relative to {Job, No/MP} is −2.3 percentage points (selec. prob. = 0.99). In fact, the average conditional effect of Education relative to Job given No/MP is about zero (selec. prob. = 0.75). These results suggest that even though promising to improve education is effective on average (the estimated AME of Education relative to Job is 2.3 percentage points, selec. prob. = 0.98), it has no effect for politicians without record.

7. Concluding Remarks

In this article, we propose a new causal interaction effect for randomized experiments with a factorial design, in which there exist many factors, each having several levels. We call this quantity the average marginal interaction effect (AMIE). Unlike the conventional causal interaction effect, the AMIE is invariant to the choice of baseline. This enables us to provide a simpler interpretation even in a high-dimensional setting. We show how to nonparametrically estimate the AMIE within the ANOVA regression framework. The invariance property also enables us to apply a regularization method by directly penalizing the AMIEs. This reduces false discovery and facilitates interpretation.

We emphasize that the AMIE, which is a generalization of the average marginal effect studied in the literature on factorial experiments, critically depends on the distribution of treatments. For example, in a well-known audit study of labor market discrimination where researchers randomize the information on the resume of a fictitious job applicant (e.g., Bertrand and Mullainathan 2004), the average effect of an applicant's race requires the specification of other attributes such as education levels and prior job experiences. In the real world, these characteristics may be correlated with race and act as an effect modifier. Thus, ideally, researchers should obtain the target population distribution of treatments, for example, the characteristics of job applicants in a relevant labor market, and use it as the basis for treatment randomization. This will improve the external validity of experimental studies.

Finally, our method is motivated by and applied to conjoint analysis, a popular survey experiment with a factorial design. The methodological literature on conjoint analysis has largely ignored the role of causal interaction. The method proposed in this article allows researchers to effectively explore significant causal interaction among several factors. Future research should investigate interaction between treatments and pretreatment covariates, which this article does not address. It is also of interest to develop sequential experimental designs in the context of factorial experiments so that researchers can efficiently reduce the number of treatments.

Supplementary Materials

In the supplementary materials, we provide proofs of all the theorems presented in the article.

Acknowledgments

The proposed methods are implemented through open-source software FindIt: Finding Heterogeneous Treatment Effects (Egami, Ratkovic, and Imai 2017), which is freely available as an R package at the Comprehensive R Archive Network (CRAN; https://cran.r-project.org/package=FindIt). The authors thank Elizabeth Carlson for providing them with data and answering their questions. The authors are also grateful to Jens Hainmueller, Walter Mebane, Dustin Tingley, Teppei Yamamoto, Tyler VanderWeele, and seminar participants at Carnegie Mellon University (Statistics), Georgetown University (School of Public Policy), Stanford (Political Science), Umea University (Statistics), University of Bristol (Mathematics), and UCLA (Political Science) for helpful comments on an earlier version of the article.

ORCID

Naoki Egami http://orcid.org/0000-0002-5491-2174
Kosuke Imai http://orcid.org/0000-0002-2748-1022

References

Athey, S., and Imbens, G. (2016), "Recursive Partitioning for Heterogeneous Causal Effects," Proceedings of the National Academy of Sciences, 113, 7353–7360.
Bertrand, M., and Mullainathan, S. (2004), "Are Emily and Greg More Employable Than Lakisha and Jamal?: A Field Experiment on Labor Market Discrimination," American Economic Review, 94, 991–1013.
Bondell, H. D., and Reich, B. J. (2009), "Simultaneous Factor Selection and Collapsing Levels in ANOVA," Biometrics, 65, 169–177.
Breiman, L. (1996), "Heuristics of Instability and Stabilization in Model Selection," The Annals of Statistics, 24, 2350–2383.
Bühlmann, P., and Dezeure, R. (2016), Discussion of "Regularized Regression for Categorical Data" by Tutz and Gertheiss, Statistical Modelling, 16, 205–211.
Carlson, E. (2015), "Ethnic Voting and Accountability in Africa: A Choice Experiment in Uganda," World Politics, 67, 353–385.
Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018), "Double/Debiased Machine Learning for Treatment and Structural Parameters," The Econometrics Journal, 21, C1–C68.
Cox, D. R. (1958), Planning of Experiments, New York: Wiley.
Cox, D. R. (1984), "Interaction," International Statistical Review, 52, 1–24.
Dasgupta, T., Pillai, N. S., and Rubin, D. B. (2015), "Causal Inference From $2^k$ Factorial Designs by Using Potential Outcomes," Journal of the Royal Statistical Society, Series B, 77, 727–753.
de González, A. B., and Cox, D. R. (2007), "Interpretation of Interaction: A Review," The Annals of Applied Statistics, 1, 371–385.
Efron, B. (2014), "Estimation and Accuracy After Model Selection," Journal of the American Statistical Association, 109, 991–1007.
Egami, N., Ratkovic, M., and Imai, K. (2017), "FindIt: Finding Heterogeneous Treatment Effects," available at the Comprehensive R Archive Network (CRAN), https://CRAN.R-project.org/package=FindIt.
Green, D. P., and Kern, H. L. (2012), "Modeling Heterogeneous Treatment Effects in Survey Experiments With Bayesian Additive Regression Trees," Public Opinion Quarterly, 76, 491–511.
Green, P. E., Krieger, A. M., and Wind, Y. (2001), "Thirty Years of Conjoint Analysis: Reflections and Prospects," Interfaces, 31, 56–73.
Green, P. E., and Rao, V. R. (1971), "Conjoint Measurement for Quantifying Judgmental Data," Journal of Marketing Research, 8, 355–363.
Grimmer, J., Messing, S., and Westwood, S. J. (2017), "Estimating Heterogeneous Treatment Effects and the Effects of Heterogeneous Treatments With Ensemble Methods," Political Analysis, 25, 413–434.
Hainmueller, J., and Hopkins, D. J. (2015), "The Hidden American Immigration Consensus: A Conjoint Analysis of Attitudes Toward Immigrants," American Journal of Political Science, 59, 529–548.
Hainmueller, J., Hopkins, D. J., and Yamamoto, T. (2014), "Causal Inference in Conjoint Analysis: Understanding Multidimensional Choices via Stated Preference Experiments," Political Analysis, 22, 1–30.
Hastie, T., Tibshirani, R., and Wainwright, M. (2015), Statistical Learning With Sparsity: The Lasso and Generalizations, Boca Raton, FL: CRC Press.
Hill, J. L. (2012), "Bayesian Nonparametric Modeling for Causal Inference," Journal of Computational and Graphical Statistics, 20, 217–240.
Huang, J., Breheny, P., and Ma, S. (2012), "A Selective Review of Group Selection in High-Dimensional Models," Statistical Science, 27, 481–499.
Huang, J., Ma, S., Xie, H., and Zhang, C.-H. (2009), "A Group Bridge Approach for Variable Selection," Biometrika, 96, 339–355.
Imai, K., and Ratkovic, M. (2013), "Estimating Treatment Effect Heterogeneity in Randomized Program Evaluation," Annals of Applied Statistics, 7, 443–470.
Jaccard, J., and Turrisi, R. (2003), Interaction Effects in Multiple Regression, Thousand Oaks, CA: Sage Publications.
Lim, M., and Hastie, T. (2015), "Learning Interactions via Hierarchical Group-Lasso Regularization," Journal of Computational and Graphical Statistics, 24, 627–654.
Lu, J. (2016a), "Covariate Adjustment in Randomization-Based Causal Inference for $2^k$ Factorial Designs," Statistics & Probability Letters, 119, 11–20.
Lu, J. (2016b), "On Randomization-Based and Regression-Based Inferences for $2^k$ Factorial Designs," Statistics & Probability Letters, 112, 72–78.
Luce, R. D., and Tukey, J. W. (1964), "Simultaneous Conjoint Measurement: A New Type of Fundamental Measurement," Journal of Mathematical Psychology, 1, 1–27.
Marshall, D., Bridges, J. F., Hauber, B., Cameron, R., Donnalley, L., Fyie, K., and Johnson, F. R. (2010), "Conjoint Analysis Applications in Health: How are Studies Being Designed and Reported?" The Patient: Patient-Centered Outcomes Research, 3, 249–256.

Marshall, P., and Bradlow, E. T. (2002), "A Unified Approach to Conjoint Analysis Models," Journal of the American Statistical Association, 97, 674–682.
McFadden, D. (1974), "Conditional Logit Analysis of Qualitative Choice Behavior," in Frontiers in Econometrics, ed. P. Zarembka, New York: Academic Press, pp. 105–142.
Meier, L., Van De Geer, S., and Bühlmann, P. (2008), "The Group Lasso for Logistic Regression," Journal of the Royal Statistical Society, Series B, 70, 53–71.
Meinshausen, N., and Bühlmann, P. (2010), "Stability Selection," Journal of the Royal Statistical Society, Series B, 72, 417–473.
Murphy, S. A. (2003), "Optimal Dynamic Treatment Regimes (with discussions)," Journal of the Royal Statistical Society, Series B, 65, 331–366.
Post, J. B., and Bondell, H. D. (2013), "Factor Selection and Structural Identification in the Interaction ANOVA Model," Biometrics, 69, 70–79.
Robins, J. M. (2004), "Optimal Structural Nested Models for Optimal Sequential Decisions," in Proceedings of the Second Seattle Symposium in Biostatistics: Analysis of Correlated Data, New York: Springer, pp. 189–326.
Rubin, D. B. (1990), Comments on "On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9" by J. Splawa-Neyman translated from the Polish and edited by D. M. Dabrowska and T. P. Speed, Statistical Science, 5, 472–480.
Scheffé, H. (1959), The Analysis of Variance, New York: Wiley.
VanderWeele, T. (2015), Explanation in Causal Inference: Methods for Mediation and Interaction, Oxford: Oxford University Press.
VanderWeele, T. J., and Knol, M. J. (2014), "A Tutorial on Interaction," Epidemiologic Methods, 3, 33–72.
Wager, S., and Athey, S. (2017), "Estimation and Inference of Heterogeneous Treatment Effects Using Random Forests," Journal of the American Statistical Association.
Wasserman, L., and Roeder, K. (2009), "High Dimensional Variable Selection," Annals of Statistics, 37, 2178–2201.
Wu, C. J., and Hamada, M. S. (2011), Experiments: Planning, Analysis, and Optimization (Vol. 552), New York: Wiley.
Yuan, M., and Lin, Y. (2006), "Model Selection and Estimation in Regression With Grouped Variables," Journal of the Royal Statistical Society, Series B, 68, 49–67.
Zhao, P., Rocha, G., and Yu, B. (2009), "The Composite Absolute Penalties Family for Grouped and Hierarchical Variable Selection," The Annals of Statistics, 37, 3468–3497.