Sufficient Covariates and Linear Propensity Analysis
Hui Guo, A. Philip Dawid
Statistical Laboratory, University of Cambridge

Appearing in Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS) 2010, Chia Laguna Resort, Sardinia, Italy. Volume 9 of JMLR: W&CP 9. Copyright 2010 by the authors.

Abstract

Working within the decision-theoretic framework for causal inference, we study the properties of "sufficient covariates", which support causal inference from observational data, and possibilities for their reduction. In particular we illustrate the rôle of a propensity variable by means of a simple model, and explain why such a reduction typically does not increase (and may reduce) estimation efficiency.

Keywords: Average causal effect; Propensity variable; Linear discriminant; Quadratic discriminant; Sufficient covariate.

1 Introduction: Decision-theoretic causality

Our concern is to understand the causal effect of a binary treatment variable $T$ on a real-valued outcome variable $Y$, and to consider when and how it might be estimated from observational data. In particular, we shall be concerned with defining, and making appropriate adjustment for, confounding variables.

In contrast to the prevalent "potential outcomes" interpretation of statistical causality (Rubin 1974; Rubin 1978), we shall operate within the decision-theoretic framework for causal inference (Dawid 2002). This aims to identify appropriate assumptions allowing transfer of distributional information between various regimes, comprising an observational regime, whose properties can be identified from data, and interventional regimes, that arise when the treatment is assigned by external manipulation. We introduce a non-stochastic regime indicator variable $F_T$, with values $\emptyset, 0, 1$: $F_T = \emptyset$ labels the observational regime, while, for $t = 0, 1$, $F_T = t$ labels the interventional regime in which $T$ is set to $t$. There will be a joint distribution $P_f$ of all relevant variables associated with each regime $f$. Notations such as $P_f(A)$ and $P(A \mid F_T = f)$ will be used interchangeably.

Causal assumptions, relating the different regimes, can be conveniently expressed using the notation and calculus of conditional independence, extended to allow some of the variables (here, the regime indicator $F_T$) to be non-random (Dawid 1979a; Dawid 1980; Dawid 2002). For example, the "ignorable treatment assignment" assumption, which states that the distribution of $Y$ given $T = t$ in the observational regime is the same as in the regime that intervenes to set $T = t$, can be expressed as

$$Y \perp\!\!\!\perp F_T \mid T. \tag{1}$$

This is however a strong condition that will rarely be appropriate in the absence of randomisation.

2 Sufficient covariate

For simplicity we confine attention to the average causal effect ACE of $T$ on $Y$, defined by:

$$\mathrm{ACE} := E_1(Y) - E_0(Y). \tag{2}$$

Because it is defined in terms of interventional regimes, ACE has a direct causal interpretation. Our prime task is to try to identify ACE from data collected under the observational regime, $F_T = \emptyset$. The natural observational counterpart of ACE is the "face-value average causal effect", $\mathrm{FACE} := E_\emptyset(Y \mid T = 1) - E_\emptyset(Y \mid T = 0)$. We typically will not have $\mathrm{FACE} = \mathrm{ACE}$ unless we can assume ignorable treatment assignment, which will often be unreasonable. However, external considerations may make it relatively easy to argue that a certain variable $X$ is a sufficient covariate, defined as follows.

Definition 1 A (possibly multivariate) variable $X$ is a covariate (with respect to treatment $T$) if:

Property 1: $X \perp\!\!\!\perp F_T$. □

Property 1 requires that the distribution of $X$ be the same in all regimes, whether observational or interventional. This will typically be appropriate when $X$ is an attribute of the unit to which treatment is applied, or of its environment, determined prior to the treatment decision.

Definition 2 $X$ is a sufficient covariate (for the effect of treatment $T$ on outcome $Y$) if, in addition to Property 1, we have

Property 2: $Y \perp\!\!\!\perp F_T \mid (X, T)$. □

Property 2 states, informally, that the conditional distribution of $Y$, given $X$ and $T$, is the same in all regimes. Property 2 can also be described as "ignorable treatment assignment, given $X$" (Rosenbaum and Rubin 1983). In any given problem there may be several distinct sufficient covariates, or none at all. In contrast to the case for statistical (Fisher) sufficiency, there need not exist a minimal sufficient covariate.

A rigorous statement (Dawid 1979a; Dawid 1979b) of Property 2 is as follows. Let $Z$ be a function of $Y$ (which we henceforth notate as $Z \preceq Y$) whose expectation exists in each regime (which we henceforth denote by "$Z$ is integrable"). Then there exists a random variable $W(X, T)$ such that, for each regime $f = 0, 1, \emptyset$, $W$ serves as a version of the conditional expectation $E_f(Z \mid X, T)$ under the distribution $P_f$ associated with regime $f$. (Because we focus on ACE, we will only need this property for the case $Z \equiv Y$, assumed integrable.)

Properties 1 and 2 can be represented graphically by means of the DAG (influence diagram) of Figure 1.^1

[Figure 1: Sufficient covariate. Influence diagram on the nodes $X$, $F_T$, $T$, $Y$.]

^1 The dotted arrow indicates a link that disappears under an interventional regime: when $F_T = t$, $T$ will have the 1-point distribution at $t$, independently of $X$.

For many purposes we will also require the following reasonable positivity condition, requiring that, for each possible level of $X$, both treatments are used in the observational regime:

Definition 3 A variable $X$ is a strongly sufficient covariate if it is a sufficient covariate and, in addition:

Property 3: For $t = 0$ or $1$, $P_\emptyset(T = t \mid X) > 0$ with probability 1. □

Lemma 1 Suppose $X$ is a strongly sufficient covariate. Then, as distributions for $(Y, X, T)$, $P_t \ll P_\emptyset$ ($t = 0, 1$).

Proof. Let $A$ be an event for $(Y, X, T)$. Property 2, expressed equivalently as $(Y, X, T) \perp\!\!\!\perp F_T \mid (X, T)$, asserts that there exists a function $w(X, T)$ such that $P_f(A \mid X, T) = w(X, T)$, a.s. for $f = 0, 1, \emptyset$. If now $P_\emptyset(A) = 0$, then, a.s. $[P_\emptyset]$,

$$\sum_{t=0}^{1} P_\emptyset(T = t \mid X)\, w(X, t) = \sum_{t=0}^{1} P_\emptyset(T = t \mid X)\, P_\emptyset(A \mid X, T = t) = P_\emptyset(A \mid X) = 0.$$

By Property 3, for $t = 0, 1$, $w(X, t) = 0$ a.s. $[P_\emptyset]$ and hence, by Property 1, a.s. $[P_t]$. So $P_t(A) = E_t\{w(X, t)\} = 0$. □

Theorem 1 Let $X$ be a strongly sufficient covariate. Then for any integrable $Z \preceq (Y, X, T)$, and any versions of the conditional expectations,

$$E_t(Z \mid X) = E_\emptyset(Z \mid X, T = t) \quad (t = 0, 1) \tag{3}$$

almost surely in any regime. We can take $E_\emptyset(Z \mid X, T)$ or $E_T(Z \mid X)$ as a version of $E(Z \mid X, T)$ in all regimes.

Proof. By Property 2 there exists a function $w(X, T)$ which is a version of $E_f(Z \mid X, T)$ for $f = 0, 1, \emptyset$. In particular, $E_\emptyset(Z \mid X, T) = w(X, T)$ a.s. $[P_\emptyset]$ and thus, by Lemma 1, a.s. $[P_t]$ ($t = 0, 1$); thus we can take $E_\emptyset(Z \mid X, T)$ as the common version of $E_f(Z \mid X, T)$ for $f = 0, 1, \emptyset$. In particular, a.s. $[P_t]$, $E_t(Z \mid X) = E_t(Z \mid X, T = t) = E_\emptyset(Z \mid X, T = t)$. So (3) holds a.s. $[P_t]$ and thus, by Property 1, a.s. in each regime. □

Theorem 1 expresses rigorously what we mean by saying that the observational conditional distribution of $Y$, given a strongly sufficient covariate $X$, for those happening to receive treatment $t$, is the same as the interventional conditional distribution of $Y$ given $X$, for those given treatment $t$.

2.1 Back-door formula

Let $X$ be a covariate.

Definition 4 The specific causal effect (of $T$ on $Y$, relative to $X$) is the random variable

$$\mathrm{SCE} := E_1(Y \mid X) - E_0(Y \mid X).$$ □

Then SCE is a function of $X$, defined almost surely (under any regime). Where we need to indicate its construction from the specific covariate $X$, we annotate SCE as $\mathrm{SCE}_X$; we also write $\mathrm{SCE}(X)$ or $\mathrm{SCE}_X(X)$ to express SCE as a function of $X$. Because it is defined in terms of interventional regimes, SCE has a direct causal interpretation: $\mathrm{SCE}(x)$ is the average causal effect in the subpopulation having $X = x$.

Theorem 2 For any covariate $X$, $\mathrm{ACE} = E(\mathrm{SCE}_X)$ (where the expectation may be taken under any regime).

Proof. By Property 1, $E_\emptyset\{E_t(Y \mid X)\} = E_t\{E_t(Y \mid X)\} = E_t(Y)$. By subtraction, $\mathrm{ACE} = E_f(\mathrm{SCE}_X)$ for $f = \emptyset$ and thus, again by Property 1, also for $f = 0, 1$. □

$\mathrm{SCE}_X$ is typically not identifiable from purely observational data. However, if $X$ is strongly sufficient, by (3) we can also express $\mathrm{SCE}_X$ as $E_\emptyset(Y \mid X, T =$

That is, for each applied treatment, once $V$ is known, any further information about $X$ is of no value for predicting $Y$.

(b). Treatment-sufficient reduction:

$$T \perp\!\!\!\perp X \mid (V, F_T = \emptyset). \tag{7}$$

That is, in the observational regime, the choice of treatment depends on $X$ only through $V$.

Proof. Assume first Condition (a). By (6), for any integrable $Z \preceq Y$ there exists a version, $w(X, t)$ say, of $E_t(Z \mid X)$ that is a function of $V$ ($t = 0, 1$).