Targeted Estimation of Nuisance Parameters to Obtain Valid Statistical Inference
doi 10.1515/ijb-2012-0038    The International Journal of Biostatistics 2014; 10(1): 29–57

Research Article

Mark J. van der Laan*

Abstract: In order to obtain concrete results, we focus on estimation of the treatment-specific mean, controlling for all measured baseline covariates, based on observing independent and identically distributed copies of a random variable consisting of baseline covariates, a subsequently assigned binary treatment, and a final outcome. The statistical model only assumes possible restrictions on the conditional distribution of treatment, given the covariates, the so-called propensity score. Estimators of the treatment-specific mean involve estimation of the propensity score and/or estimation of the conditional mean of the outcome, given the treatment and covariates. In order to make these estimators asymptotically unbiased at any data distribution in the statistical model, it is essential to use data-adaptive estimators of these nuisance parameters such as ensemble learning, and specifically super-learning. Because such estimators involve an optimal trade-off of bias and variance w.r.t. the infinite-dimensional nuisance parameter itself, they result in a sub-optimal bias/variance trade-off for the resulting real-valued estimator of the estimand. We demonstrate that additional targeting of the estimators of these nuisance parameters guarantees that this bias for the estimand is second order and thereby allows us to prove theorems that establish asymptotic linearity of the estimator of the treatment-specific mean under regularity conditions. These insights result in novel targeted minimum loss-based estimators (TMLEs) that use ensemble learning with additional targeted bias reduction to construct estimators of the nuisance parameters. In particular, we construct collaborative TMLEs (C-TMLEs) with known influence curve allowing for statistical inference, even though these C-TMLEs involve variable selection for the propensity score based on a criterion that measures how effective the resulting fit of the propensity score is in removing bias for the estimand. As a particular special case, we also demonstrate the required targeting of the propensity score for the inverse probability of treatment weighted estimator using super-learning to fit the propensity score.

Keywords: asymptotic linearity; cross-validation; efficient influence curve; influence curve; targeted minimum loss-based estimation

*Corresponding author: Mark J. van der Laan, Division of Biostatistics, University of California – Berkeley, Berkeley, CA, USA, E-mail: [email protected]

1 Introduction and overview

This introduction provides an atlas for the contents of this article. It starts with formulating the role of estimation of nuisance parameters to obtain asymptotically linear estimators of a target parameter of interest. This demonstrates the need to target this estimator of the nuisance parameter in order to make the estimator of the target parameter asymptotically linear when the model for the nuisance parameter is large. The general approach to obtain such a targeted estimator of the nuisance parameter is described. Subsequently, we present our concrete example to which we will apply this general method for targeted estimation of the nuisance parameter, and for which we establish a number of formal theorems.
Finally, we discuss the link to previous articles that concerned some kind of targeting of the estimator of the nuisance parameter, and we provide an organization of the remainder of the article.

1.1 The role of nuisance parameter estimation

Suppose we observe $n$ independent and identically distributed copies of a random variable $O$ with probability distribution $P_0$. In addition, assume that it is known that $P_0$ is an element of a statistical model $\mathcal{M}$ and that we want to estimate $\psi_0 = \Psi(P_0)$ for a given target parameter mapping $\Psi: \mathcal{M} \to \mathbb{R}$. In order to guarantee that $P_0 \in \mathcal{M}$ one is forced to only incorporate real knowledge, and, as a consequence, such models $\mathcal{M}$ are always very large and, in particular, are infinite dimensional. We assume that the target parameter mapping is path-wise differentiable and let $D^*(P)$ denote the canonical gradient of the path-wise derivative of $\Psi$ at $P \in \mathcal{M}$ [1]. An estimator $\psi_n = \hat{\Psi}(P_n)$ is a functional applied to the empirical distribution $P_n$ of $O_1, \ldots, O_n$ and can thus be represented as a mapping $\hat{\Psi}: \mathcal{M}_{NP} \to \mathbb{R}$ from the non-parametric statistical model $\mathcal{M}_{NP}$ into the real line. An estimator $\hat{\Psi}$ is efficient if and only if it is asymptotically linear with influence curve $D^*(P_0)$:
\[
\psi_n - \psi_0 = \frac{1}{n} \sum_{i=1}^n D^*(P_0)(O_i) + o_P(1/\sqrt{n}).
\]
The empirical mean of the influence curve $D^*(P_0)$ represents the first-order linear approximation of the estimator as a functional of the empirical distribution, and the derivation of the influence curve is a by-product of the application of the so-called functional delta-method for statistical inference based on functionals (i.e. $\hat{\Psi}$) of the empirical distribution [2–4].

Suppose that $\Psi(P)$ only depends on $P$ through a parameter $Q(P)$ and that the canonical gradient depends on $P$ only through $Q(P)$ and a nuisance parameter $g(P)$. The construction of an efficient estimator requires the construction of estimators $Q_n$ and $g_n$ of these nuisance parameters $Q_0$ and $g_0$, respectively. Targeted minimum loss-based estimation (TMLE) represents a method for construction of (e.g. efficient) asymptotically linear substitution estimators $\Psi(Q_n^*)$, where $Q_n^*$ is a targeted update of $Q_n$ that relies on the estimator $g_n$ [5–7]. The targeting of $Q_n$ is achieved by specifying a parametric submodel $\{Q_n(\epsilon): \epsilon\} \subset \{Q(P): P \in \mathcal{M}\}$ through the initial estimator $Q_n$ and a loss function $O \mapsto L(Q)(O)$ for $Q_0 = \arg\min_Q P_0 L(Q)$, where $P L(Q) \equiv \int L(Q)(o)\, dP(o)$, so that the generalized score $\frac{d}{d\epsilon} L(Q_n(\epsilon)) \big|_{\epsilon = 0}$ spans a desired user-supplied estimating function $D(Q_n, g_n)$. In addition, one may decide to target $g_n$ by specifying a parametric submodel $\{g_n(\epsilon_1): \epsilon_1\} \subset \{g(P): P \in \mathcal{M}\}$ and loss function $O \mapsto L(g)(O)$ for $g_0 = \arg\min_g P_0 L(g)$, so that the generalized score $\frac{d}{d\epsilon_1} L(g_n(\epsilon_1)) \big|_{\epsilon_1 = 0}$ spans another desired estimating function $D_1(g_n, \eta_n)$ for some estimator $\eta_n$ of a nuisance parameter $\eta$. The parameter $\epsilon$ is fitted with the MLE $\epsilon_n = \arg\min_{\epsilon} P_n L(Q_n(\epsilon))$, providing the first-step update $Q_n^1 = Q_n(\epsilon_n)$, and similarly $\epsilon_{1,n} = \arg\min_{\epsilon_1} P_n L(g_n(\epsilon_1))$ yields $g_n^1 = g_n(\epsilon_{1,n})$. This updating process, which maps a current fit $(Q_n, g_n)$ into an update $(Q_n^1, g_n^1)$, is iterated till convergence, at which point the TMLE $(Q_n^*, g_n^*)$ solves $P_n D(Q_n^*, g_n^*) = 0$, i.e. the empirical mean of the estimating function equals zero at the final TMLE $(Q_n^*, g_n^*)$. If one also targeted $g_n$, then it also solves $P_n D_1(g_n^*, \eta_n) = 0$. The submodel through $Q_n$ will depend on $g_n$, while the submodel through $g_n$ will depend on another nuisance parameter $\eta_n$.
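To make this targeting step concrete for the running example of this article (the treatment-specific mean, with outcome regression $\bar{Q}_0(A, W) = E_0(Y \mid A, W)$ and propensity score $g_0(1 \mid W) = P_0(A = 1 \mid W)$), the sketch below implements the standard one-dimensional logistic fluctuation with clever covariate $A / g_n(1 \mid W)$ and the quasi-binomial loss, assuming an outcome scaled to $[0, 1]$. It is only an illustration of the generic submodel/loss-function recipe described above, not the super-learning based, additionally targeted estimators developed in this article; the function and argument names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def tmle_treatment_specific_mean(Y, A, Qbar1, QbarA, g1):
    """One-step TMLE sketch for psi_0 = E_0[ E_0(Y | A=1, W) ].

    Y     : outcome, assumed bounded and scaled to [0, 1]
    A     : binary treatment indicator (0/1)
    Qbar1 : initial estimates of E(Y | A=1, W_i), e.g. from a super-learner
    QbarA : initial estimates of E(Y | A=A_i, W_i)
    g1    : estimates of the propensity score P(A=1 | W_i)
    """
    logit = lambda p: np.log(p / (1.0 - p))
    expit = lambda x: 1.0 / (1.0 + np.exp(-x))

    H_A = A / g1                                   # clever covariate at the observed A
    H_1 = 1.0 / g1                                 # clever covariate evaluated at A = 1
    off_A = logit(np.clip(QbarA, 1e-6, 1 - 1e-6))  # offset: logit of the initial fit
    off_1 = logit(np.clip(Qbar1, 1e-6, 1 - 1e-6))

    # Fit the fluctuation parameter epsilon by minimizing the empirical risk of the
    # quasi-binomial loss along the logistic submodel
    #   logit Qbar_n(eps)(A, W) = logit Qbar_n(A, W) + eps * H(A, W),
    # whose score at eps = 0 spans the relevant component of the efficient influence curve.
    def neg_loglik(eps):
        p = np.clip(expit(off_A + eps * H_A), 1e-10, 1 - 1e-10)
        return -np.mean(Y * np.log(p) + (1 - Y) * np.log(1 - p))

    eps_n = minimize_scalar(neg_loglik, bounds=(-10.0, 10.0), method="bounded").x

    Qbar1_star = expit(off_1 + eps_n * H_1)        # targeted update of Qbar(1, W)
    psi_n = Qbar1_star.mean()                      # plug-in (substitution) estimator

    # Estimated efficient influence curve and a Wald-type standard error
    D_star = H_A * (Y - expit(off_A + eps_n * H_A)) + Qbar1_star - psi_n
    se = np.sqrt(np.var(D_star) / len(Y))
    return psi_n, se
```

Because the fitted $\epsilon_n$ solves the score equation of this submodel and $\psi_n$ is the plug-in mean over the empirical distribution of $W$, a single update already solves the efficient influence curve estimating equation $P_n D^*(Q_n^*, g_n) = 0$ (provided $\epsilon_n$ is an interior solution), which is the property used in the efficiency argument that follows.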
By setting $D(Q, g)$ equal to the efficient influence curve $D^*(Q, g)$, the resulting TMLE solves the efficient influence curve estimating equation $P_n D^*(Q_n^*, g_n) = 0$ and thereby will be asymptotically efficient when $(Q_n^*, g_n)$ is consistent for $(Q_0, g_0)$, under appropriate regularity conditions, where the targeting of $g_n$ is not needed. The latter is shown as follows. By the property of the canonical gradient (in fact, any gradient) we have $\Psi(Q_n^*) - \Psi(Q_0) = -P_0 D^*(Q_n^*, g_n) + R_n(Q_n^*, Q_0, g_n, g_0)$, where $R_n$ involves integrals of second-order products of the differences $(Q_n^* - Q_0)$ and $(g_n - g_0)$. Combined with $P_n D^*(Q_n^*, g_n) = 0$, this implies the following identity:
\[
\Psi(Q_n^*) - \Psi(Q_0) = (P_n - P_0) D^*(Q_n^*, g_n) + R_n(Q_n^*, Q_0, g_n, g_0).
\]
The first term is an empirical process term that, under empirical process conditions (mentioned below), equals $(P_n - P_0) D^*(Q, g)$, where $(Q, g)$ denotes the limit of $(Q_n^*, g_n)$, plus an $o_P(1/\sqrt{n})$-term. This then yields
\[
\Psi(Q_n^*) - \Psi(Q_0) = (P_n - P_0) D^*(Q, g) + R_n(Q_n^*, Q_0, g_n, g_0) + o_P(1/\sqrt{n}).
\]
To obtain the desired asymptotic linearity of $\Psi(Q_n^*)$ one needs $R_n = o_P(1/\sqrt{n})$, which in general requires at a minimum that both nuisance parameters are consistently estimated: $Q = Q_0$ and $g = g_0$. However, in many problems of interest, $R_n$ only involves a cross-product of the differences $Q_n^* - Q_0$ and $g_n - g_0$, so that $R_n$ converges to zero if either $Q_n^*$ is consistent or $g_n$ is consistent: i.e. $Q = Q_0$ or $g = g_0$. In this latter case, the TMLE is so-called double robust. Either way, the consistency of the TMLE now relies on one of the nuisance parameter estimators being consistent, thereby requiring the use of non-parametric adaptive estimation such as super-learning [8–10] for at least one of the nuisance parameters. If only one of the nuisance parameter estimators is consistent, and we are in the double robust scenario, then the bias of the TMLE is of the same order as the bias of the consistent nuisance parameter estimator. However, if that nuisance parameter estimator is not based on a correctly specified parametric model, but instead is a data-adaptive estimator, then this bias will converge to zero at a rate slower than $1/\sqrt{n}$: i.e. $\sqrt{n}\, R_n$ converges to infinity as $n \to \infty$. Thus, in that case, the estimator of the target parameter may be overly biased and thereby will not be asymptotically linear.
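For the treatment-specific mean example of the abstract, the cross-product structure of the remainder can be made explicit. As a sketch, treating the marginal distribution of $W$ as estimated by its empirical distribution and using the standard notation $\bar{Q}(A, W)$ for the outcome regression and $g(1 \mid W)$ for the propensity score (which the article formalizes in later sections), the efficient influence curve and exact remainder are
\[
D^*(Q, g)(O) = \frac{A}{g(1 \mid W)} \bigl( Y - \bar{Q}(1, W) \bigr) + \bar{Q}(1, W) - \Psi(Q),
\]
\[
\Psi(Q) - \Psi(Q_0) = -P_0 D^*(Q, g) + R\bigl((Q, g), (Q_0, g_0)\bigr), \qquad
R = \int \frac{g(1 \mid w) - g_0(1 \mid w)}{g(1 \mid w)} \bigl( \bar{Q}(1, w) - \bar{Q}_0(1, w) \bigr) \, dP_0(w).
\]
Thus $R$ vanishes if either $\bar{Q} = \bar{Q}_0$ or $g = g_0$, which is exactly the double robustness invoked above, and, assuming $g$ is bounded away from zero, the Cauchy–Schwarz inequality gives $R = o_P(1/\sqrt{n})$ when, for example, both $\bar{Q}_n^* - \bar{Q}_0$ and $g_n - g_0$ are $o_P(n^{-1/4})$ in $L^2(P_0)$, which makes precise the rate issue for data-adaptive nuisance parameter estimators raised in the last sentences above.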