Sufficient Covariates and Linear Propensity Analysis

Hui Guo, A. Philip Dawid
Statistical Laboratory, University of Cambridge

Appearing in Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS) 2010, Chia Laguna Resort, Sardinia, Italy. Volume 9 of JMLR: W&CP 9. Copyright 2010 by the authors.

Abstract

Working within the decision-theoretic framework for causal inference, we study the properties of "sufficient covariates", which support causal inference from observational data, and possibilities for their reduction. In particular we illustrate the rôle of a propensity variable by means of a simple model, and explain why such a reduction typically does not increase (and may reduce) estimation efficiency.

Keywords: Average causal effect; Propensity variable; Linear discriminant; Quadratic discriminant; Sufficient covariate.

1 Introduction: Decision-theoretic causality

Our concern is to understand the causal effect of a binary treatment variable $T$ on a real-valued outcome variable $Y$, and to consider when and how it might be estimated from observational data. In particular, we shall be concerned with defining, and making appropriate adjustment for, confounding variables.

In contrast to the prevalent "potential outcomes" interpretation of statistical causality (Rubin 1974; Rubin 1978), we shall operate within the decision-theoretic framework for causal inference (Dawid 2002). This aims to identify appropriate assumptions allowing transfer of distributional information between various regimes, comprising an observational regime, whose properties can be identified from data, and interventional regimes, that arise when the treatment is assigned by external manipulation. We introduce a non-stochastic regime indicator variable $F_T$, with values $\emptyset, 0, 1$: $F_T = \emptyset$ labels the observational regime, while, for $t = 0, 1$, $F_T = t$ labels the interventional regime in which $T$ is set to $t$. There will be a joint distribution $P_f$ of all relevant variables associated with each regime $f$. Notations such as $P_f(A)$ and $P(A \mid F_T = f)$ will be used interchangeably.

Causal assumptions, relating the different regimes, can be conveniently expressed using the notation and calculus of conditional independence, extended to allow some of the variables (here, the regime indicator $F_T$) to be non-random (Dawid 1979a; Dawid 1980; Dawid 2002). For example, the "ignorable treatment assignment" assumption, which states that the distribution of $Y$ given $T = t$ in the observational regime is the same as in the regime that intervenes to set $T = t$, can be expressed as

$$ Y \perp\!\!\!\perp F_T \mid T. \qquad (1) $$

This is however a strong condition that will rarely be appropriate in the absence of randomisation.

2 Sufficient covariate

For simplicity we confine attention to the average causal effect ACE of $T$ on $Y$, defined by:

$$ \mathrm{ACE} := E_1(Y) - E_0(Y). \qquad (2) $$

Because it is defined in terms of interventional regimes, ACE has a direct causal interpretation. Our prime task is to try and identify ACE from data collected under the observational regime, $F_T = \emptyset$. The natural observational counterpart of ACE is the "face-value average causal effect", $\mathrm{FACE} := E_\emptyset(Y \mid T = 1) - E_\emptyset(Y \mid T = 0)$. We typically will not have FACE = ACE unless we can assume ignorable treatment assignment, which will often be unreasonable. However, external considerations may make it relatively easy to argue that a certain variable $X$ is a sufficient covariate, defined as follows.

Definition 1 A (possibly multivariate) variable $X$ is a covariate (with respect to treatment $T$) if:

Property 1: $X \perp\!\!\!\perp F_T$. □

Property 1 requires that the distribution of $X$ be the same in all regimes, whether observational or interventional. This will typically be appropriate when $X$ is an attribute of the unit to which treatment is applied, or of its environment, determined prior to the treatment decision.

Definition 2 $X$ is a sufficient covariate (for the effect of treatment $T$ on outcome $Y$) if, in addition to Property 1, we have

Property 2: $Y \perp\!\!\!\perp F_T \mid (X, T)$. □

Property 2 states, informally, that the conditional distribution of $Y$, given $X$ and $T$, is the same in all regimes. Property 2 can also be described as "ignorable treatment assignment, given $X$" (Rosenbaum and Rubin 1983). In any given problem there may be several distinct sufficient covariates, or none at all. In contrast to the case for statistical (Fisher) sufficiency, there need not exist a minimal sufficient covariate.

A rigorous statement (Dawid 1979a; Dawid 1979b) of Property 2 is as follows. Let $Z$ be a function of $Y$ (which we henceforth notate as $Z \preceq Y$) whose expectation exists in each regime (which we henceforth denote by "$Z$ is integrable"). Then there exists a random variable $W(X, T)$ such that, for each regime $f = 0, 1, \emptyset$, $W$ serves as a version of the conditional expectation $E_f(Z \mid X, T)$ under the distribution $P_f$ associated with regime $f$. (Because we focus on ACE, we will only need this property for the case $Z \equiv Y$, assumed integrable.)

Properties 1 and 2 can be represented graphically by means of the DAG (influence diagram) of Figure 1.[1]

[Figure 1: Sufficient covariate. Influence diagram on the nodes $X$, $F_T$, $T$ and $Y$.]

[1] The dotted arrow indicates a link that disappears under an interventional regime: when $F_T = t$, $T$ will have the 1-point distribution at $t$, independently of $X$.

For many purposes we will also require the following reasonable positivity condition, requiring that, for each possible level of $X$, both treatments are used in the observational regime:

Definition 3 A variable $X$ is a strongly sufficient covariate if it is a sufficient covariate and, in addition:

Property 3: For $t = 0$ or $1$, $P_\emptyset(T = t \mid X) > 0$ with probability 1. □

Lemma 1 Suppose $X$ is a strongly sufficient covariate. Then, as distributions for $(Y, X, T)$, $P_t \ll P_\emptyset$ $(t = 0, 1)$.

Proof. Let $A$ be an event for $(Y, X, T)$. Property 2, expressed equivalently as $(Y, X, T) \perp\!\!\!\perp F_T \mid (X, T)$, asserts that there exists a function $w(X, T)$ such that $P_f(A \mid X, T) = w(X, T)$, a.s. for $f = 0, 1, \emptyset$. If now $P_\emptyset(A) = 0$, then, a.s. $[P_\emptyset]$,

$$ \sum_{t=0}^{1} P_\emptyset(T = t \mid X)\, w(X, t) = \sum_{t=0}^{1} P_\emptyset(T = t \mid X)\, P_\emptyset(A \mid X, T = t) = P_\emptyset(A \mid X) = 0. $$

By Property 3, for $t = 0, 1$, $w(X, t) = 0$ a.s. $[P_\emptyset]$ and hence, by Property 1, a.s. $[P_t]$. So $P_t(A) = E_t\{w(X, t)\} = 0$. □

Theorem 1 Let $X$ be a strongly sufficient covariate. Then for any integrable $Z \preceq (Y, X, T)$, and any versions of the conditional expectations,

$$ E_t(Z \mid X) = E_\emptyset(Z \mid X, T = t) \quad (t = 0, 1) \qquad (3) $$

almost surely in any regime. We can take $E_\emptyset(Z \mid X, T)$ or $E_T(Z \mid X)$ as a version of $E(Z \mid X, T)$ in all regimes.

Proof. By Property 2 there exists a function $w(X, T)$ which is a version of $E_f(Z \mid X, T)$ for $f = 0, 1, \emptyset$. In particular, $E_\emptyset(Z \mid X, T) = w(X, T)$ a.s. $[P_\emptyset]$ and thus, by Lemma 1, a.s. $[P_t]$ $(t = 0, 1)$; thus we can take $E_\emptyset(Z \mid X, T)$ as the common version of $E_f(Z \mid X, T)$ for $f = 0, 1, \emptyset$. In particular, a.s. $[P_t]$, $E_t(Z \mid X) = E_t(Z \mid X, T = t) = E_\emptyset(Z \mid X, T = t)$. So (3) holds a.s. $[P_t]$ and thus, by Property 1, a.s. in each regime. □

Theorem 1 expresses rigorously what we mean by saying that the observational conditional distribution of $Y$, given a strongly sufficient covariate $X$, for those happening to receive treatment $t$, is the same as the interventional conditional distribution of $Y$ given $X$, for those given treatment $t$.

2.1 Back-door formula

Let $X$ be a covariate.

Definition 4 The specific causal effect (of $T$ on $Y$, relative to $X$) is the random variable

$$ \mathrm{SCE} := E_1(Y \mid X) - E_0(Y \mid X). $$ □

Then SCE is a function of $X$, defined almost surely (under any regime). Where we need to indicate its construction from the specific covariate $X$, we annotate SCE as $\mathrm{SCE}_X$; we also write $\mathrm{SCE}(X)$ or $\mathrm{SCE}_X(X)$ to express SCE as a function of $X$. Because it is defined in terms of interventional regimes, SCE has a direct causal interpretation: $\mathrm{SCE}(x)$ is the average causal effect in the subpopulation having $X = x$.

Theorem 2 For any covariate $X$, $\mathrm{ACE} = E(\mathrm{SCE}_X)$ (where the expectation may be taken under any regime).

Proof. By Property 1, $E_\emptyset\{E_t(Y \mid X)\} = E_t\{E_t(Y \mid X)\} = E_t(Y)$. By subtraction, $\mathrm{ACE} = E_f(\mathrm{SCE}_X)$ for $f = \emptyset$ and thus, again by Property 1, also for $f = 0, 1$. □

$\mathrm{SCE}_X$ is typically not identifiable from purely observational data. However, if $X$ is strongly sufficient, by (3) we can also express $\mathrm{SCE}_X$ as $E_\emptyset(Y \mid X, T = {}$ [...]

[...]

That is, for each applied treatment, once $V$ is known, any further information about $X$ is of no value for predicting $Y$.

(b). Treatment-sufficient reduction:

$$ T \perp\!\!\!\perp X \mid (V, F_T = \emptyset). \qquad (7) $$

That is, in the observational regime, the choice of treatment depends on $X$ only through $V$.

Proof. Assume first Condition (a). By (6), for any integrable $Z \preceq Y$ there exists a version, $w(X, t)$ say, of $E_t(Z \mid X)$ that is a function of $V$ $(t = 0, 1)$.
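As a side illustration of the contrast between FACE and ACE drawn in Section 2, the sketch below works through a small discrete example in Python. The model is hypothetical and not taken from the paper: a binary covariate X, observational treatment probabilities that depend on X, and a conditional mean of Y given (X, T) assumed common to all regimes. Under these assumptions, applying (3) with Z = Y gives E_t(Y | X) = E_∅(Y | X, T = t), so SCE can be read off the observational conditional means, and Theorem 2 averages it over the distribution of X to obtain ACE. All names (`P_X`, `P_T1_GIVEN_X`, `mean_y`, and so on) are illustrative choices.

```python
# Hypothetical toy model (an assumption for illustration, not from the paper).
P_X = {0: 0.5, 1: 0.5}            # P(X = x); the same in every regime (Property 1)
P_T1_GIVEN_X = {0: 0.2, 1: 0.8}   # observational P(T = 1 | X = x); strictly in (0, 1) (Property 3)


def mean_y(x, t):
    """E(Y | X = x, T = t), assumed identical in all regimes (Property 2)."""
    return 2.0 * x + t * (1.0 + x)


def p_t_given_x(t, x):
    """Observational probability of treatment t given X = x."""
    return P_T1_GIVEN_X[x] if t == 1 else 1.0 - P_T1_GIVEN_X[x]


def sce(x):
    """Specific causal effect at X = x, computed from conditional means via (3)."""
    return mean_y(x, 1) - mean_y(x, 0)


def ace_backdoor():
    """ACE = E(SCE_X): average SCE over the marginal distribution of X (Theorem 2)."""
    return sum(P_X[x] * sce(x) for x in P_X)


def observational_mean_y_given_t(t):
    """E(Y | T = t) in the observational regime, weighting X by P(X = x | T = t)."""
    p_t = sum(P_X[x] * p_t_given_x(t, x) for x in P_X)
    weighted = sum(P_X[x] * p_t_given_x(t, x) * mean_y(x, t) for x in P_X)
    return weighted / p_t


def face():
    """Face-value average causal effect: E(Y | T = 1) - E(Y | T = 0), observational."""
    return observational_mean_y_given_t(1) - observational_mean_y_given_t(0)


if __name__ == "__main__":
    print("ACE via adjustment for X:", ace_backdoor())
    print("FACE (no adjustment):    ", face())
```

In this toy model the treated group over-represents X = 1, which also raises Y, so FACE (about 3.0) overstates ACE (1.5). The example computes exact expectations rather than simulating data, so it says nothing about estimation efficiency.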
