
Targeted Maximum Likelihood Estimation in Safety Analysis

Sam Lendle¹, Bruce Fireman², Mark van der Laan¹

¹ UC Berkeley

² Kaiser Permanente

ISPE Advanced Topics Session, Barcelona, August 2012

Outline

1 Introduction

2 Super learning

3 TMLE and collaborative TMLE

4 Kaiser Permanente example

5 Simulations based on KP data

1 Introduction

Traditional approach in epidemiology and clinical medicine

• Fit several parametric models, and select a favorite one.
• Report the point estimate of the coefficient in front of treatment, confidence intervals, and p-value, as if this parametric model had been specified a priori.
• Problems
  • The parametric model is misspecified, but estimates are interpreted as if the model were correct.
  • Estimates of variance do not account for model selection, so confidence intervals and p-values are wrong, even if the final model is somehow correct!

The statistical estimation problem

• Observed data: Realizations of random variables with a probability distribution.
• Statistical model: Set of possible distributions for the data-generating distribution, defined by actual knowledge about the data. For example, in an RCT we know the probability of each subject receiving treatment.
• Statistical target parameter: Function of the data-generating distribution that we wish to learn from the data.
• Estimator: An a priori-specified algorithm that takes the observed data and returns an estimate of the target parameter. Benchmarked by a dissimilarity measure (e.g., MSE) w.r.t. the target parameter.

Causal inference

• Non-testable assumptions in addition to the assumptions defining the statistical model (e.g. the “no unmeasured confounders” assumption).
• Allows for causal interpretation of statistical parameter estimates.
• Even if we don’t believe the non-testable causal assumptions, the statistical estimation problem is still the same, and estimates still have valid statistical interpretations.

Targeted learning

• Define true statistical models and interesting target parameters.
• Avoid reliance on human art and unrealistic parametric models.
• Target the fit of the data-generating distribution to the parameter of interest.
• Has been applied to: static or dynamic treatments, direct and indirect effects, parameters of MSMs, variable importance analysis, longitudinal/repeated measures data with time-dependent confounding, censoring/missingness, case-control studies, RCTs.

Two stage estimation methodology

• Super learning (SL) (van der Laan et al. 2007)
  • Uses a library of candidate estimators (e.g. multiple parametric models, machine learning algorithms like neural networks, random forests, etc.)
  • Builds a data-adaptive weighted combination of estimators using cross-validation
• Targeted maximum likelihood estimation (TMLE) (van der Laan and Rubin 2006)
  • Updates the initial estimate, often a Super Learner, to remove bias for the parameter of interest
  • Calculates the final parameter estimate from the updated fit of the data-generating distribution

2 Super learning

Super learning

• No need to choose a priori a particular parametric model or machine learning algorithm for a particular problem.
• Allows one to combine many data-adaptive estimators into one improved estimator.
• Grounded by oracle results for loss-function-based cross-validation (van der Laan and Dudoit 2003). The loss function needs to be bounded.
• Performs asymptotically as well as the best (oracle) weighted combination, or achieves the parametric rate of convergence.
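The cross-validated weighting described above can be sketched in plain Python. Everything concrete here is an assumption made for illustration and is not taken from the slides: the toy one-dimensional data, the three-member library (mean, linear fit, 5-nearest-neighbours), and the coarse grid search over convex weights, which stands in for the constrained regression normally used to fit the weights.

```python
import math
import random

# Hypothetical candidate library: each fitter returns a prediction function.
def fit_mean(xs, ys):
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_linear(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def fit_knn(xs, ys, k=5):
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

LIBRARY = [fit_mean, fit_linear, fit_knn]

def cv_predictions(xs, ys, n_folds=5):
    """Out-of-fold predictions: row i holds each candidate's prediction for point i."""
    n = len(xs)
    z = [[0.0] * len(LIBRARY) for _ in range(n)]
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        valid = [i for i in range(n) if i % n_folds == fold]
        fits = [fit([xs[i] for i in train], [ys[i] for i in train]) for fit in LIBRARY]
        for i in valid:
            z[i] = [f(xs[i]) for f in fits]
    return z

def cv_risk(weights, z, ys):
    """Cross-validated squared error of a weighted combination of candidates."""
    return sum((y - sum(w * p for w, p in zip(weights, row))) ** 2
               for row, y in zip(z, ys)) / len(ys)

def simplex_grid(steps=10):
    """All non-negative weight triples on a grid that sum to 1 (three candidates only)."""
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            yield (i / steps, j / steps, (steps - i - j) / steps)

def super_learner(xs, ys):
    z = cv_predictions(xs, ys)
    weights = min(simplex_grid(), key=lambda w: cv_risk(w, z, ys))
    fits = [fit(xs, ys) for fit in LIBRARY]  # refit every candidate on all data
    predict = lambda x: sum(w * f(x) for w, f in zip(weights, fits))
    return weights, predict, z

random.seed(0)
xs = [random.random() for _ in range(200)]
ys = [math.sin(2 * math.pi * x) + random.gauss(0, 0.3) for x in xs]
weights, predict, z = super_learner(xs, ys)
candidate_risks = [cv_risk(w, z, ys) for w in [(1, 0, 0), (0, 1, 0), (0, 0, 1)]]
sl_risk = cv_risk(weights, z, ys)
print("weights:", weights)
print("candidate CV risks:", candidate_risks, "combined CV risk:", sl_risk)
```

Because the grid contains the three corner points, the combined cross-validated risk can never exceed that of the best single candidate in the library.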

Figure: Relative Cross-Validated Squared Error (compared to main terms regression)

3 TMLE and collaborative TMLE

TMLE algorithm

Targeted MLE

1. Identify the least favorable parametric model for fluctuating the initial P̂: a small “fluctuation” gives the maximum change in the target.
2. Identify the optimum amount of fluctuation by MLE.
3. Apply the optimal fluctuation to P̂ → 1st-step targeted maximum likelihood estimator.
4. Repeat until the incremental “fluctuation” is zero. In some important cases: 1 step to convergence.
5. The final probability distribution solves the efficient score equation for the target parameter.

→ T-MLE is a double robust & locally efficient plug-in estimator
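The steps above can be sketched for the average treatment effect with a binary outcome, where a single logistic fluctuation converges in one step. This is a minimal sketch under stated assumptions, not the authors' implementation: the simulated data (one binary confounder W, no true treatment effect), the deliberately misspecified initial outcome fit, and all function names are hypothetical.

```python
import math
import random

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Hypothetical data: binary confounder W, treatment A, outcome Y; the true ATE is 0.
random.seed(1)
data = []
for _ in range(5000):
    w = int(random.random() < 0.5)
    a = int(random.random() < expit(-0.5 + 1.5 * w))
    y = int(random.random() < expit(-1.0 + 1.0 * w))
    data.append((w, a, y))

# Initial outcome fit: deliberately misspecified, it ignores W.
q_init = {a: mean(y for (_, ai, y) in data if ai == a) for a in (0, 1)}
def Q_initial(a, w):
    return q_init[a]

# Treatment mechanism g(w) = P(A = 1 | W = w), estimated by stratum proportions.
g = {w: mean(a for (wi, a, _) in data if wi == w) for w in (0, 1)}

def clever_covariate(a, w):
    return a / g[w] - (1 - a) / (1 - g[w])

# Steps 1-2: fluctuate logit(Q) along the clever covariate and fit epsilon by
# maximum likelihood (one-dimensional Newton-Raphson on the logistic likelihood).
eps = 0.0
for _ in range(50):
    score, info = 0.0, 0.0
    for w, a, y in data:
        h = clever_covariate(a, w)
        p = expit(logit(Q_initial(a, w)) + eps * h)
        score += h * (y - p)
        info += h * h * p * (1 - p)
    step = score / info
    eps += step
    if abs(step) < 1e-12:
        break

# Step 3: apply the fitted fluctuation to obtain the updated outcome regression.
def Q_updated(a, w):
    return expit(logit(Q_initial(a, w)) + eps * clever_covariate(a, w))

# Step 5: plug-in estimate, plus a check that the score equation is solved.
initial_estimate = mean(Q_initial(1, w) - Q_initial(0, w) for (w, _, _) in data)
tmle_estimate = mean(Q_updated(1, w) - Q_updated(0, w) for (w, _, _) in data)
score_residual = mean(clever_covariate(a, w) * (y - Q_updated(a, w)) for (w, a, y) in data)
print("initial:", initial_estimate, "TMLE:", tmle_estimate, "score residual:", score_residual)
```

In this toy setting the initial plug-in estimate is confounded by W, while the updated estimate should sit close to the true value of zero because the treatment mechanism is estimated well.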

Collaborative TMLE (CTMLE) algorithm

• Like TMLE, but chooses an estimate ĝ of the treatment mechanism/propensity score based on how well it helps estimate Ψ(Q0) instead of how well it estimates the true g0.
• Build the estimate for g0 in a stepwise fashion
  • Strongest confounders are adjusted for first
  • Instrumental variables and weak confounders tend to be excluded
  • The order of terms added to ĝ is chosen via a penalized log likelihood, and the number of terms is chosen via cross-validation

Kang and Schafer (2007) simulations

• Outcome Y is continuous and subject to missingness, with 4 covariates W1, W2, W3, W4.
• True population mean (target parameter) is 210; mean among the non-missing is 200.
• Positivity violations: g0(∆ = 1 | W) as small as 0.01.
• Modification 1: stronger positivity violations, g0(∆ = 1 | W) as small as 1.1 × 10^−5.
• Modification 2: same as 1, but one covariate no longer affects Y, so it is an instrumental variable.
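To make the setup concrete, here is a sketch of the original (unmodified) data-generating process. The coefficients are not given on the slides; they are written as recalled from Kang and Schafer (2007) and should be checked against the paper. The function name and sample size are hypothetical. The sketch shows why the complete-case mean is biased: outcomes are more likely to be observed when they are low.

```python
import math
import random

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def simulate_kang_schafer(n, rng):
    """Return (y, delta, g) rows; delta = 1 means the outcome is observed.

    Coefficients as recalled from Kang and Schafer (2007): an assumption here.
    """
    rows = []
    for _ in range(n):
        z1, z2, z3, z4 = (rng.gauss(0, 1) for _ in range(4))
        y = 210 + 27.4 * z1 + 13.7 * z2 + 13.7 * z3 + 13.7 * z4 + rng.gauss(0, 1)
        g = expit(-z1 + 0.5 * z2 - 0.25 * z3 - 0.1 * z4)  # P(delta = 1 | covariates)
        delta = int(rng.random() < g)
        rows.append((y, delta, g))
    return rows

rows = simulate_kang_schafer(50000, random.Random(2007))
population_mean = sum(y for y, _, _ in rows) / len(rows)
observed = [y for y, delta, _ in rows if delta == 1]
observed_mean = sum(observed) / len(observed)
smallest_g = min(g for _, _, g in rows)
print("population mean:", round(population_mean, 1))
print("mean among the non-missing:", round(observed_mean, 1))
print("smallest observation probability:", smallest_g)
```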

Figure: Kang and Schafer Simulation. Estimators compared: OLS, WLS, TMLE, A-IPCW, C-TMLE (axis ticks from −10 to 10).

Figure: Modification 1 to Kang and Schafer Simulation. Estimators compared: OLS, WLS, TMLE, A-IPCW, C-TMLE (axis ticks from −40 to 40).

Figure: Modification 2 to Kang and Schafer Simulation. Estimators compared: OLS, WLS, TMLE, A-IPCW, C-TMLE (axis ticks from −40 to 40).

4 Kaiser Permanente data example

Description of dataset

• A subset of data from Kaiser Permanente, part of which is used in FDA’s Mini-Sentinel drug safety surveillance.
• Population: diabetic patients without prior cardiovascular disease who are new users of pioglitazone or a sulfonylurea (two anti-diabetic drugs) and who are followed up for at least 6 months without also starting the other drug.¹
• Treatment arm (in this example): pioglitazone (treatment variable A = 1)
• Comparator: sulfonylurea (A = 0)
• Outcome (Y): acute myocardial infarction (AMI) in the first 6 months of new anti-diabetic drug use.
• Baseline covariates (W): fifty covariates including demographics, comorbidities, and other drug use.

¹ We found that adjusting for missing outcomes had no effect on the results in this case, so we suppress those results and ignore missingness in this example.

Causal model, counterfactual outcomes, and parameter of interest

• Non-parametric structural equation model: Each variable is an unknown deterministic function of the past and an error.
  • W = fW(UW)
  • A = fA(W, UA)
  • Y = fY(A, W, UY)
• Counterfactual outcomes: substitute a fixed treatment a for A in fY: Ya = fY(a, W, UY) for a ∈ {0, 1}.
• Causal parameter of interest: the average treatment effect (ATE), E(Y1 − Y0).
• Statistical parameter of interest:

Ψ(P0) = E[E(Y | A = 1, W) − E(Y | A = 0, W)]

equals E(Y1 − Y0) under the “no unmeasured confounders” assumption and the positivity assumption.

Analysis results

• Summary of outcome by treatment

         Treatment     Comparator      Total
Total    2146          25022           27168
AMI      5 (0.233%)    86 (0.3437%)    91 (0.335%)

• Estimates

Estimator     Estimate    p-value
Unadjusted    −0.0011     0.3943
G-comp        −0.0007     0.6134
PS            −0.0013     0.4512
IPTW          −0.0005     0.7476
AIPTW         −0.0003     0.8585
TMLE          −0.0004     0.8042

• Though the sample size is large, there are so few AMIs in this subset of data from Kaiser Permanente that it is hard to tell if adjustment for potential confounders is important.
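As a worked check of the unadjusted row, the risk difference can be recomputed from the counts in the summary table. The slides do not say which test produced the p-value; the pooled two-proportion z-test used below is an assumption, although it appears to agree with the reported value to rounding.

```python
import math

# Counts from the summary table above.
n_treated, ami_treated = 2146, 5
n_comparator, ami_comparator = 25022, 86

risk_treated = ami_treated / n_treated
risk_comparator = ami_comparator / n_comparator
risk_diff = risk_treated - risk_comparator  # unadjusted estimate of the ATE

# Assumed test: two-sided z-test with a pooled variance estimate.
pooled = (ami_treated + ami_comparator) / (n_treated + n_comparator)
se = math.sqrt(pooled * (1 - pooled) * (1 / n_treated + 1 / n_comparator))
z = risk_diff / se
p_value = math.erfc(abs(z) / math.sqrt(2))

print("risk difference:", round(risk_diff, 4))
print("p-value:", round(p_value, 4))
```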

5 Simulations based on KP data

Strategy

Simulate datasets based on real study data where the true effect is known, to highlight properties of estimators.

• Start with KP data, including additional new users of three other anti-diabetic drugs.
• Sample W with replacement from the empirical distribution of baseline covariates.
• Simulate treatment assignments A based on a known function of baseline covariates.
• Simulate outcome Y based on a function of W, adjusted so that Y is not too rare.
• Because Y is simulated based on a function of only baseline covariates and not the treatment, the true average treatment effect is known to be zero.
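The strategy above can be sketched as follows. The covariate rows, the treatment and outcome functions, and their coefficients are hypothetical placeholders; the actual KP covariates and functions are not given on the slides.

```python
import math
import random

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical stand-in for the observed baseline covariate rows (three binary covariates).
observed_w = [(0, 0, 1), (1, 0, 0), (1, 1, 0), (0, 1, 1), (1, 1, 1), (0, 0, 0)]

def treatment_probability(w):
    # Known treatment mechanism (illustrative coefficients).
    return expit(-1.0 + 0.8 * w[0] + 0.6 * w[1])

def outcome_probability(w):
    # Depends on W only, never on A, so the true ATE is zero by construction.
    # The intercept is chosen so that the outcome is not too rare.
    return expit(-1.5 + 1.0 * w[0] + 0.5 * w[2])

def simulate_dataset(n, rng):
    rows = []
    for _ in range(n):
        w = rng.choice(observed_w)  # sample W with replacement
        a = int(rng.random() < treatment_probability(w))
        y = int(rng.random() < outcome_probability(w))
        rows.append((w, a, y))
    return rows

rng = random.Random(2012)
dataset = simulate_dataset(1000, rng)
outcome_rate = sum(y for _, _, y in dataset) / len(dataset)
print("simulated outcome rate:", outcome_rate)
```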

Simulation 1

• Treatment mechanism is a function of 12 covariates strongly predictive of the outcome.
• Outcome and propensity score models are known and can be correctly specified.
• Misspecified outcome and propensity score models leave out half of the important confounders.
• Results demonstrate the double-robustness of TMLE and AIPTW: when either the model for the outcome regression or the PS is specified correctly, the parameter estimate is consistent, which is not the case for the G-computation estimator or IPTW.

                               Bias                MSE
Estimator                      n=1000    n=5000    n=1000    n=5000
Unadjusted                     0.0584    0.0575    0.0038    0.0034
G-comp                         0.0015    0.0000    0.0003    0.0001
PSM                            0.0012    0.0003    0.0006    0.0001
IPTW                           0.0017    0.0002    0.0005    0.0001
AIPTW                          0.0013    0.0002    0.0004    0.0001
TMLE                           0.0014    0.0002    0.0004    0.0001
G-comp, misspecified           0.0183    0.0168    0.0007    0.0004
PSM, misspecified              0.0179    0.0167    0.0008    0.0004
IPTW, misspecified             0.0180    0.0166    0.0007    0.0004
AIPTW, Outcome misspecified    0.0016    0.0002    0.0004    0.0001
AIPTW, PS misspecified         0.0014    0.0001    0.0004    0.0001
TMLE, Outcome misspecified     0.0015    0.0002    0.0004    0.0001
TMLE, PS misspecified          0.0015    0.0001    0.0004    0.0001
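The double-robustness pattern in this table can be reproduced in miniature. The sketch below is an illustration under assumptions, not the simulation used for the slides: it uses two hypothetical binary confounders instead of twelve, saturated stratum-mean models, and "misspecified" means leaving out the second confounder.

```python
import math
import random
from collections import defaultdict

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def simulate(n, rng):
    """Rows are (w1, w2, a, y); Y depends on W only, so the true ATE is 0."""
    rows = []
    for _ in range(n):
        w1 = int(rng.random() < 0.5)
        w2 = int(rng.random() < 0.5)
        a = int(rng.random() < expit(-1.0 + w1 + w2))
        y = int(rng.random() < expit(-1.5 + w1 + w2))
        rows.append((w1, w2, a, y))
    return rows

def stratum_means(rows, key, value):
    total, count = defaultdict(float), defaultdict(int)
    for r in rows:
        total[key(r)] += value(r)
        count[key(r)] += 1
    return {k: total[k] / count[k] for k in total}

def fit(rows, use_w2):
    """Saturated outcome regression Q and propensity score g on the chosen covariates."""
    wkey = (lambda r: (r[0], r[1])) if use_w2 else (lambda r: (r[0],))
    q = stratum_means(rows, lambda r: (r[2],) + wkey(r), lambda r: r[3])
    g = stratum_means(rows, wkey, lambda r: r[2])
    return (lambda a, r: q[(a,) + wkey(r)]), (lambda r: g[wkey(r)])

def gcomp(rows, Q):
    return mean(Q(1, r) - Q(0, r) for r in rows)

def iptw(rows, g):
    return mean(r[2] * r[3] / g(r) - (1 - r[2]) * r[3] / (1 - g(r)) for r in rows)

def aiptw(rows, Q, g):
    def term(r):
        a, y = r[2], r[3]
        h = a / g(r) - (1 - a) / (1 - g(r))
        return h * (y - Q(a, r)) + Q(1, r) - Q(0, r)
    return mean(term(r) for r in rows)

rows = simulate(50000, random.Random(1))
Q_ok, g_ok = fit(rows, use_w2=True)     # correctly specified
Q_bad, g_bad = fit(rows, use_w2=False)  # leaves out the confounder w2

gcomp_wrong = gcomp(rows, Q_bad)
iptw_wrong = iptw(rows, g_bad)
aiptw_outcome_wrong = aiptw(rows, Q_bad, g_ok)
aiptw_ps_wrong = aiptw(rows, Q_ok, g_bad)
print("G-comp, misspecified:", gcomp_wrong)
print("IPTW, misspecified:", iptw_wrong)
print("AIPTW, outcome misspecified:", aiptw_outcome_wrong)
print("AIPTW, PS misspecified:", aiptw_ps_wrong)
```

With one of the two models correct, the AIPTW estimates should stay near the true value of zero, while the misspecified G-computation and IPTW estimates retain the confounding bias.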

Simulation 2

• Treatment mechanism now depends on a covariate that is very predictive of treatment, resulting in positivity violations, but is not a confounder.
• Results illustrate that IPTW has much higher variance than other estimators, particularly in small samples, and that CTMLE is very robust to violations of the positivity assumption, particularly in small samples.

              Bias                  MSE
Estimator     n=100      n=500      n=100     n=500
Unadjusted    0.1624     0.1663     0.0499    0.0319
G-comp        0.0171     0.0062     0.0308    0.0049
PSM           0.0141     0.0116     0.1090    0.0240
IPTW²         0.0108     0.0093     1.4677    0.0227
AIPTW         0.0206     −0.0004    0.9779    0.0193
TMLE          0.0149     0.0046     0.0835    0.0120
CTMLE         0.0064     −0.0096    0.0540    0.0077

² Some estimates are out of the parameter space (> 1) due to very large weights, resulting in the high variance.

Simulation 3

• Treatment mechanism depends on the interactions between binary covariates.
• Main terms logistic regression for the PS is not sufficient to account for all confounding.
• Results demonstrate that data-adaptive super learning is necessary to estimate the PS well enough to adjust for confounding.

                             Bias                    MSE
Estimator                    n=1000      n=5000      n=1000     n=5000
Unadjusted                   −0.04816    −0.05050    0.00312    0.00270
PSM, PS main terms only      0.00330     0.00041     0.00172    0.00032
IPTW, PS main terms only     0.05930     0.05575     0.00494    0.00335
AIPTW, PS main terms only    0.02672     0.02343     0.00193    0.00076
TMLE, PS main terms only     0.01197     0.00958     0.00113    0.00027
PSM, PS SuperLearner         0.00270     −0.00003    0.00189    0.00033
IPTW, PS SuperLearner        −0.00192    −0.00038    0.00173    0.00033
AIPTW, PS SuperLearner       −0.00062    −0.00033    0.00160    0.00031
TMLE, PS SuperLearner        0.00292     0.00033     0.00166    0.00031

Here the outcome regression in TMLE and AIPTW is unadjusted, to emphasize the benefits of super learning for the PS.

Further Materials

Targeted Learning Book (Springer Series in Statistics), van der Laan & Rose: targetedlearningbook.com

References

J. Kang and J. Schafer. Demystifying double robustness: a comparison of alternative strategies for estimating a population mean from incomplete data. Statistical Science, 22(4):523–539, 2007.

M. van der Laan and S. Dudoit. Unified cross-validation methodology for selection among estimators and a general cross-validated adaptive epsilon-net estimator: Finite sample oracle inequalities and examples. UC Berkeley Division of Biostatistics Working Paper Series, Working Paper 130, 2003.

M. J. van der Laan and S. Rose. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer, New York, 2011. ISBN 1441997814.

M. J. van der Laan and D. Rubin. Targeted Maximum Likelihood Learning. The International Journal of Biostatistics, 2(1), Jan. 2006. ISSN 1557-4679. doi: 10.2202/1557-4679.1043.

M. J. van der Laan, E. C. Polley, and A. E. Hubbard. Super learner. Statistical Applications in Genetics and Molecular Biology, 6(1), Jan. 2007. ISSN 1544-6115. doi: 10.2202/1544-6115.1309.
