
Index

A
Absolute standardized difference (ASMD), 121–122
ACEAIPW precision
  known propensity score model
    arbitrary function, 77
    disadvantage, 79
    Monte Carlo computations, 79
    quadratic function minimization, 78
    simulated 100 datasets, 79, 80
    , 78
    weighted mean squared error, 79
  known response regression model, 80–81
Acquired immunodeficiency syndrome (AIDS), 203
AdaBoost, 213
AIDS Clinical Trials Group (ACTG) Study A5095, 204
ASMD. See Absolute standardized mean difference (ASMD)
ATE. See Average treatment effect (ATE)
ATT. See Average treatment effect among the treated (ATT)
Augmented inverse probability weighted (AIPW) estimator
  construction, 75
  correct PM, 74
  correct RRM, 74
  double robustness property, 76
  HT estimator, 74, 75
Average causal mediation effect (ACME), 20
Average treatment effect (ATE), 112, 121
Average treatment effect among the treated (ATT), 112, 121

B
Bayesian approach, 218
BlackBoost, 214
Boosting model, 119–120

C
cART. See Combined antiretroviral therapies (cART)
Causal inference
  counterfactual outcome
    mediation, treatment effect, 8, 9
    post-treatment confounders, RCT, 7–8
    potential outcome, 5–6
    , 5
    selection bias, observational studies, 7
  and clinical trials, 4
  ITT, 4
  statistical models
    case-control designs, 10
    causal mediation, 19–20
    MAR mechanism, 9
    and propensity score matching, 10–11
    missing , 9
    MSMs, 12–13
    post-treatment confounders, RCT, 13–19
    sequential ignorability (SI) and model identification, 20, 21
Causal mediation models
  ACME, 20, 23
  causal diagram, 244, 254
  CDF, , 21
  direct effect/natural direct effect, 19

© Springer International Publishing Switzerland 2016
H. He et al. (eds.), Statistical Causal Inferences and Their Applications in Public Health Research, ICSA Book Series in Statistics, DOI 10.1007/978-3-319-41259-7

Causal mediation models (cont.)
  equivalence of different choices, 260
  estimation of parameters
    general treatment X, 252–253
    maximum likelihood estimation, 250
    estimation, 250
    three-value treatment, 251–252
  GMM, 243
  identifiability of parameters
    continuous mediator M, 248–249
    discrete variable M, 248
    general conditions, 246–248
    of M, 249–250
  indirect and direct effects, 241
  equation, 258
  LSEM, 21, 22
  matrix Geff, 261
  of estimates, 255
  mediator–outcome relationship, 242
  moderated-mediation model, 255–257
  necessity and sufficiency theorem, 258–259
  notation and definitions, 243–246
  OLS estimates, 254
  OLS regression, 243
  pure indirect effect, 20
  three linear models, 242
  total effect of treatment, 20
  treatment–outcome relationship, 242
  unobserved pre-treatment confounder, 242
Causal models
  CBT, 187
  continuous measures, 187
  CTQ study
    compliance model, 196–197
    compliance regions, 200
    data, 195–196
    estimated causal effects, 199
    maximum likelihood estimates, 198
    principal effects, 196–197
    two-stage ML approach, 200
  likelihood and inference methods
    compliance regions, 193–195
    contribution, 192
    two-stage approach, 192–193
  placebo-controlled trials, 188
  principal stratification approach, 188
  structural principal effects model
    compliance distributions, 191–192
    ITT effects, 190–191
    notation and assumptions, 189–190
Causal relative risk, 176, 177
CBT. See Cognitive behavioral smoking cessation therapy (CBT)
CD4 cell, 210–211
CDE. See Controlled direct effect (CDE)
CFI. See Comparative fit index (CFI)
Child Resilience Project (CRP), 219, 234–236
Chronic thromboembolic pulmonary hypertension (CTEPH), 104
Cognitive behavioral smoking cessation therapy (CBT), 187
Combined antiretroviral therapies (cART)
  ACTG A5095, 204, 209–211
  AdaBoost, 213
  BlackBoost, 214
  HIV-1 infected patients, 203
  methods
    , 205–207
    two-stage designs, 207–209
    variance estimate, 209
  non-parametric estimator, 212
  simulation studies, 211–212
The Commit to Quit (CTQ) study
  compliance model, 196–197
  compliance regions, 200
  data, 195–196
  estimated causal effects, 199
  maximum likelihood estimates, 198
  principal effects, 196–197
  two-stage ML approach, 200
The Commit to Quit (CTQ) trials, 187
Comparative fit index (CFI), 303
Compliance behavior, 13
Complier average causal effect (CACE), 14
Controlled direct effect (CDE), 19, 269
Cox regression, 104
CRP. See Child Resilience Project (CRP)
CTQ trials. See The Commit to Quit (CTQ) trials
Cumulative distribution function (CDF), 21

D
Data-adaptive matching score, 117–118
DomEXT Baseline, 234–235
Donsker classes, 158–161

E
Empirical processes
  average treatment effect, 161–163
  Donsker classes, 158–161
  estimating equation, 157
  motivation and setup, 157–158
Estimated propensity variable (EPV), 62, 63
Estimating equation (EE), 37
Exposure to agents, 7

F
Face-value average causal effect (FACE), 52
Fisher’s linear discriminant (LD), 61
Functional response models (FRM), 221

G
Generalized boosted model (GBM), 116–117
Generalized Linear Structural Equation Models (GLSEM), 21
Generalized method of moments (GMM), 243
Genetic Epidemiology Network of Salt Sensitivity (GenSalt) Study
  covariate adjustment, 43–44
  covariates, 40
  outcomes, 39
  parameter estimations, 40, 41
  pre vs. post score matching, 41–43
  propensity score weighting approach, 43
  treatment conditions, 39–40
GMM. See Generalized method of moments (GMM)
Greedy algorithm, 33

H
Heteroscedasticity
  balancing property of PS/PV, 67
  matrices, 66
  linear discriminant, 67
  QD, 67
  simulations, 68–70
High-Risk Youth Demonstration Grant Programs, 95
Homoscedasticity
  asymptotic variance analysis
    EPV, 62, 63
    propensity variable, 62, 63
    sample size, 63
    variance multiplier of coefficient, 63–64
  model construction, 60–61
  precision, propensity analysis, 62
  simulations, 65, 66
Horvitz-Thompson (HT) estimator, 74, 75

I
Important variables stratification (IVS), 128
Intention to treat (ITT) approach, 4, 218, 219
Inverse probability of treatment weights (IPTW), 102
Inverse probability weighting (IPW), 36, 106, 273, 309
ITT approach. See Intention to treat (ITT) approach

J
Jackknife method, 182–183

L
Latent growth modeling (LGM), 296
Likelihood and inference methods
  compliance regions, 193–195
  contribution, 192
  two-stage approach, 192–193
Linear predictor (LP), 60
Linear SEM (LSEM), 21, 22
LISREL formulation, 299–300
Logistic regression
  model construction, 70–71
  propensity analysis, custodial sanctions study, 71–73

M
Mahalanobis distance, 33
Mahalanobis metric matching, 33
Mann-Whitney-Wilcoxon rank sum test, 222
MAR. See Missing at random (MAR)
Marginal structural models (MSMs), 12–13, 274
mboost package, 207, 212
MCAR. See Missing Complete at Random (MCAR)
Mean squared error (MSE), 278
Missing at random (MAR), 9, 31, 302
Missing Complete at Random (MCAR), 235, 296, 308
MMDP. See Monotone missing data patterns (MMDP)
Moderated-mediation model, 255–257
Monotone missing data patterns (MMDP), 227
Monte Carlo (MC) cross-validation criteria, 122–123
Monte Carlo (MC) mean, 278
Monte Carlo (MC) replications, 212
MSE. See Mean squared error (MSE)
MSMs. See Marginal structural models (MSMs)
Multinomial logistic regression (MLR), 115

N
Natural direct effects (NDEs), 268–269
Natural indirect effects (NIEs), 268–269
Newton’s method, 170
Nonparametric black-box algorithms, 112
Nonparametric curve regression methods, 37
Nonparametric , 119

Nonparametric models, 147
Non-randomized controlled trials (non-RCTs), 91
Nucleoside reverse-transcriptase inhibitor (NRTI), 210–211

O
Observational data, 91
OLS regression. See Ordinary least squares (OLS) regression
Optimal pair matching (OPM) method
  advantages, 126
  control-Philadelphia creation
    covariate balance before and after matching, 130, 132–133
    PSS illustration, 130, 131
    standardized differences, 133
    stratification tree, 130, 132
    stratification variables and stratification intervals, 129–130
    tolerance number of subclasses, 129
    tolerance size of distance matrix, 129
  covariate balance, 126
  massive obstetric unit closures in Philadelphia, 127, 134
  rank-based Mahalanobis distances, 126, 127
  R package, 134
  stratification tree construction
    checking matching feasibility, 127–128
    checking statistical criteria, 127
    checking the number of strata after propensity, 128
    flowchart, 128, 129
    IVS, 128
    PSS, 128
  structure of data, 127
Ordinary least squares (OLS) regression, 243

P
Pearl’s causal framework, 50
PHQ-9, 304
Potential outcome approaches
  causal inference, 265–266
  define mediation effects, 287–288
    CDE, 269
    controlled effects, 266
    identification assumptions, 269–270
    natural effects, 266
    NDEs and NIEs, 268–269
    principal stratification, 266–268
  estimation, 288
    controlled effects, 274–275
    natural effects, 274
    principal strata effects, 273–274
  identification, 289
    controlled effects, 272–274
    natural effects, 271–272
    principal strata effects, 270–271
  limitations, 289
  RPM G-estimator, 275, 287
  simulation study
    conditions, 275
    data generation, 276–278
    IPWCDE, 286
    MC SD, 286
    no confounders, 279–282
    population values, 278–279
    post-T confounder, M and Y, 284
    pre-T confounder, M and Y, 281–284
    pre-T confounder, T, M, and Y, 285–286
    squared MC SD, 278, 280
  TSLS IV estimator, 275, 285, 287
Principal stratification (PST), 15–16, 218
PROC SYSLIN, 175
Propensity analysis
  balancing score, 58
  logistic regression, 70–73
  normal linear model (see Heteroscedasticity; Homoscedasticity)
  propensity variable, 59
  PS, 58, 59
Propensity score (PS) estimation
  assessment steps, 97–98
  definition, 92
  empirical example
    approximated Type IV Pearson distribution, 95, 96
    composite score, 30-day substance use, 95
    empirical distribution of robustness, 95, 97
    empirical distribution of sensitivity indices, 95, 96
    High-Risk Youth Demonstration Grant Programs, 95
  in literature, 105
  missing confounder data
    balance assessment, 107–108
    IPTW, 107
    IPW method, 106
    method selection, 107
    missing value indicators, 105–106
    multiple imputation, 106
    observed distribution of covariates, 106

    pattern mixture models, 105
    patterns of missing covariates, 106
    propensity score matching, 107
    sensitivity analysis, 108
    unmeasured confounder, 107, 108
    using complete records only, 105
  reference distribution, 98
  robustness, 94
  sensitivity, 93–94
  uncontrolled confounders, 92–93
Propensity score (PS) evaluation
  by checking balance, 121–122
  method selection, 120
  two-stage procedure, 122–123
Propensity Score (PS) matching, 11
Propensity score (PS) methods
  causal inference
    covariate adjustment, 37–39
    matching, 32–34
    stratification/subclassification, 34–36
    weighting, 36–37
  causal inferences, 92
  counterfactual outcome framework, 30
  definition, 31–32, 105
  distribution of measured confounders, 105
  GenSalt study
    covariate adjustment, 43–44
    covariates, 40
    outcomes, 39
    parameter estimations, 40, 41
    pre vs. post score matching, 41–43
    propensity score weighting approach, 43
    treatment conditions, 39–40
  marginal structural models, 105
  propensity score estimation (see Propensity score estimation)
  randomization, 30
  SAS program codes, 45–46
  selection bias, 29, 30
  selection bias reduction, 91
  virtual randomization, 105
Propensity score (PS) modeling
  binary treatment
    ATE, 112
    ATT, 112
    linear discriminant analysis, 112
    logistic regression, 112
    machine learning techniques, 112–113
    notations, 111–112
    parametric approaches, 112
    probit regression modeling, 112
    via balancing covariates, 113–114
  continuous treatment
    dose-response function, 118
    generalized propensity score, 118
    ignorability assumption, 118
    inverse probability weight, 118
    techniques, 119–120
    parametric approaches, 119
  multi-level treatment
    issues, 114
    machine learning techniques, 116–118
    MLR, 115
    multinomial probit model, 114
    nested logit model, 114
    nonparametric algorithms, 114
    notations, 114
Propensity scores stratification (PSS), 128
Pseudo-isolation condition, 9
PS method. See Principal Stratification (PS) method
Pulmonary endarterectomy (PEA), 104

Q
Quadratic discriminant (QD), 67

R
Random forests (RF) model, 117–118
Randomized clinical trials
  controlling potential , 101
  controlling time-varying confounders, 102
  long-term safety, biologics treatment in rheumatoid arthritis patients, 103–104
  long-term survival after PEA surgery, 104
  methodological issues, 102
Randomized control trials (RCT), 218
  independent treatment assignment, 6
  population-level estimation, 5
  post-treatment confounders
    instrumental variable estimate, 13–15
    ITT, 7, 8, 13
    PST, 15–16
    regression, 7
    structural mean models, 16–19
Rank-based Mahalanobis distances, 126, 127
Rank preserving model (RPM), 274
RCT. See Randomized control trials (RCT)
Response-sufficient reduction, 56, 57
Root Mean Square Error of Approximation (RMSEA), 303
Root-n rates, 163–164
RPM. See Rank preserving model (RPM)
RPM G-estimator, 275, 287

Rubin’s causal model (RCM), 5–6, 51. See also Rubin’s potential response framework
Rubin’s potential response framework, 50

S
Sample means, 6
School-based water, sanitation, and hygiene (WASH) study, 170
SD. See Standard deviation (SD)
SEM. See Structural equation modeling (SEM)
Semiparametric models, 147
Semiparametric theory
  average treatment effect, 153–154
  classical maximum likelihood approaches, 142
  counterfactual questions, 141
  data-generating process, 142
  efficiency benchmarks, 143
  full vs. observed data influence functions, 154–156
  influence functions, 143, 148–150
  nonparametric models, 147
  parametric assumptions, 147–148
  semiparametric models, 147
  setup
    identification, 145–146
    the target parameter, 143–144
    statistical model, 146–147
  tangent spaces, 150–152
SFRM. See Structural functional response models (SFRM)
SMM. See Structural mean model (SMM)
SNMs. See Structural nested models (SNMs)
Specific causal effect (SCE), 55–56
SRMR. See Standardized Root Mean Square Residual (SRMR)
Standard deviation (SD), 278
Standardized Root Mean Square Residual (SRMR), 303
Standard models, 38
Statistical causal inference
  ACE identification
    dimension reduction, strongly sufficient covariate, 56–57
    SCE, 55–56
    strongly sufficient covariate, 52–55
  causal effect, definition, 51
  causal statements, 126
  counterfactual, 126
  Dawid’s decision theoretic framework, 50–51
  distributions of covariates, 126
  double robustness
    ACEAIPW precision, 77–81
    AIPW estimator, 74–76
    parametric models, 76–77
  pair matching
    artificial twins, 126
    OPM method (see Optimal pair matching (OPM) method)
  propensity analysis
    balancing score, 58
    logistic regression, 70–73
    normal linear model (see Heteroscedasticity; Homoscedasticity)
    propensity variable, 59
    PS, 58, 59
  R code of simulations and data analysis, 82–88
  standard statistical modelling, 125–126
, 146–147
Strongly ignorable treatment assignment, 31, 52
Structural equation modeling (SEM), 8, 9
  advantages, 296
  ANOVA, 297
  applications, 303–305
  causal networks, 297
  covariance between variables, 297
  definition, 297
  , 297
  FRM-based distribution-free SEM approach
    child resilience, 310–312
    MAR assumption, 305, 309
    MCAR assumption, 308
    ML-based methods, 306–307
    Newton Raphson algorithm, 308
    pseudo-isolation assumption, 308
    revised mediation model, 307–308
    Wald and score tests, 308
  LGM framework, 296
  limitations, 303
  linear regression, 297
  LISREL formulation, 299–300
  measurement error, 297
  mediation model, 300–302
  model fit, 302–303
  path diagram, 298–299
Structural functional response models (SFRM)
  causal inference
    average/population-level, 220
    counterfactual outcome, 220
    potential outcome, 220
  CRP, 219, 234–236

  FRM, 221
  inference, 225–226
  ITT analyses, 219
  ITT approach, 218
  longitudinal data, 226–228
  Mann-Whitney-Wilcoxon rank sum test, 222
  multi-layered interventions, 228–230
  post-treatment confounders, 223–225
  PS method, 218
  RCTs, 218
  selection bias, 222–223
  simulation studies
    cross-sectional data scenario, 230
    Model I, 231–232
    Model II, 233–234
    Monte Carlo (MC) sample, 230
  SMM, 236
Structural mean model (SMM), 236
  active treatments, 16
  compliance explainable condition, 18
  compliance non-selective assumption, 17
  linear function, 18
  medication vs. placebo study, 16
  psychosocial intervention studies, 18–19
Structural nested mean models (SMM), 218
Structural nested models (SNMs)
  assumptions, 172–173
  causal relative risk, 172
  double-logistic structural mean model, 170
  estimation methodology
    construct confidence intervals, 177–178
    ordinary SNM approach, 173–176
    weighted SNM approach, 176–177
  instrumental variables, 169–170
  instrumental variables software, 184
  Newton’s method, 170, 184
  ordinary SNMs, 170
  potential outcomes, 171
  simulation study
    jackknife method, 182–183
    joint distribution, 179–181
    linear SNM, 179
    logistic SNM, 179
    loglinear SNM, 178, 180–181
    t-test, 182
  time-to-event outcomes, 170
  WASH intervention, 183–184
  WASH study, 170–171
  weighted SNMs, 170
Structural principal effects model
  compliance distributions, 191–192
  ITT effects, 190–191
  notation and assumptions, 189–190
Substance Abuse and Mental Health Services Administration, 95

T
TLI. See Tucker–Lewis Index (TLI)
Treatment-sufficient reduction, 56, 57
TSLS. See Two-stage least-squares (TSLS)
TSLS IV estimator, 275, 285, 287
Tucker–Lewis Index (TLI), 303
Two-stage least-squares (TSLS), 271

W
Wald and score tests, 308
WASH study. See School-based water, sanitation, and hygiene (WASH) study
Weighted generalized estimating equations (WGEE), 227