Propensity Score Analysis with Hierarchical Data
Fan Li Alan Zaslavsky Mary Beth Landrum
Department of Health Care Policy Harvard Medical School
May 19, 2008 Introduction
I Population-based observational studies are increasingly important sources for estimating treatment effects.
I Proper adjustment for differences between treatment groups is crucial to valid comparison and causal inference.
I Regression has long been the standard method.
I Propensity score (Rosenbaum and Rubin, 1983) is a robust alternative to regression for adjusting for observed differences. Hierarchically structured data
I Propensity score has been developed and applied in cross-sectional settings with unstructured data.
I Data in medical care and health policy research are often hierarchically structured.
I Subjects are grouped in natural clusters, e.g., geographical area, hospitals, health service provider, etc.
I Significant within- and between-cluster variations are often the case. Hierarchically structured data
I Ignoring cluster structure often leads to invalid inference.
I Standard errors could be underestimated. I Cluster level effect could be confounded with individual level effect.
I Hierarchical regression models provide a unified framework to study clustered data.
I Propensity score methods for hierarchical data have been less explored. Propensity score
I Propensity score: e(x) = P(z = 1|x).
I Balancing on propensity score also balances the covariates of different treatment groups: z ⊥ x|e(x).
I Two steps procedure.
I Step 1: estimate the propensity score, e.g., by logistic regression. I Step 2: estimate the treatment effect by incorporating (e.g., weighting, stratification) the estimated propensity score.
I We will introduce and compare several possible estimators of treatment effect using propensity score in context of hierarchical data.
I We will investigate the large sample behavior of each estimator. Notation
I h cluster; k individual.
I m no. of clusters; nh no. of subjects in cluster h.
I zhk binary treatment assignment - individual level.
I xhk individual level covariates; vh cluster level covariates.
I ehk propensity score.
I yhk outcome.
I Estimand: “treatment effect” ∆ = Ex [E(Y |X, Z = 1)] − Ex [E(Y |X, Z = 0)].
I Note: ∆ does not necessarily have a causal interpretation, it is the difference of the average of a outcome between two populations controlling for covariates. Step 1: Marginal model
I Marginal analysis ignores clustering.
I Marginal propensity score model ehk e e log = β xhk + κ vh, 1 − ehk
where ehk = P(zhk = 1 | xhk , vh).
I If treatment assignment mechanism (TAM) follows above
(xhk , vh) ⊥ zhk | ehk . Step 1: Pooled within-cluster model
I Pooled within-cluster model for propensity score (ehk = P(zhk = 1 | xhk , h))
ehk e e log( ) = δh + β xhk , 1 − ehk
e e where δh is a cluster-level main effect, δh ∼ N(0, ∞). I General (weaker) assumption of TAM than marginal model:
(xhk , h) ⊥ zhk | ehk .
e 2 I Assuming δh ∼ N(0, σδ ) gives a similar random effects model. Step 1: Surrogate indicator model
P h zhk I Define d = = cluster-specific proportion of being h nh treated.
I Propensity score model
ehk dh e e log( ) = λ log( ) + β xhk + κ vh. 1 − ehk 1 − dh
I logit(dh) is “surrogate” for the cluster indicator with the coefficient being around 1.
I Analytic model is same as the marginal analysis with proportion treated dh as additional variable.
I Greatly reduce computation, but based on strong linear assumption. Step 2: Estimate “treatment effect”
Estimate treatment effect using propensity score.
I Weighting - weight as function of propensity score.
I Stratification.
I Matching.
I Regression using propensity score as a predictor. Marginal weighted estimator - ignore cluster structure
I wh1(wh0: sum of whk with z = 1(z = 0) in cluster h. P P I w1 = h wh1, w0 = h wh0, w = w1 + w0. I Marginal weighted estimator - difference of weighted mean
zhk =1 zhk =0 ˆ X whk X whk ∆.,marg = yhk − yhk . w1 w0 h,k h,k
I Large sample variance under homoscedasticity of yhk 2 ˆ s.,marg = var(∆.,marg) z =1 z =0 Xhk w 2 Xhk w 2 = σ2( hk + hk ). w 2 w 2 h,k 1 h,k 0
2 I In practice, σ estimated from sample variance of yhk . Clustered weighted estimator
I Cluster-specific weighted estimator
zhk =1 zhk =0 ˆ X whk X whk ∆h = yhk − yhk . wh1 wh0 k∈h k∈h
I The overall clustered estimator X w ∆ˆ = h ∆ˆ . .,clu w h h Clustered weighted estimator
ˆ I Variance of ∆h under within-cluster homoscedasticity
zhk =1 2 zhk =0 2 2 ˆ 2 X whk X whk s = var(∆h) = σ ( + ). h h w 2 w 2 k∈h h1 k∈h h0
I Overall variance
X w 2 s2 = var(∆ˆ ) = h s2. .,clu .,clu w 2 h h
I Standard error can also be obtained using bootstrap. Doubly-robust estimators (Scharfstein et al., 1999)
I Weighted mean can be viewed as a weighted regression without covariates.
I In step 2, replace the weighted mean by a weighted regression.
I Estimator is consistent if either or both of step 1 and 2 models are correctly specified.
I Numerous combination of regression models in two steps. Choice of weight
I Horvitz-Thompson (inverse probability) weight
( 1 , for zhk = 1 w = ehk hk 1 , for z = 0. 1−ehk hk
I Balance covariates distribution between two groups: XZ X(1 − Z ) E = E . e(X) 1 − e(X)
I H-T estimator compares the counterfactual scenario: all subjects placed in trt=0 vs. all subjects placed in trt=1. YZ Y (1 − Z) E − = E[(Y |Z = 1) − (Y |Z = 0)]. e(X) 1 − e(X)
I H-T has large variance if e(X) approaches 0 or 1. Choice of weight
I Population-overlap weight 1 − ehk , for zhk = 1 whk = ehk , for zhk = 0.
I Each subject is weighted by the probability of being assigned to the other trt group.
I Balance covariates distribution between two groups:
E{XZ[1 − e(X)]} = E[X(1 − Z )e(X)].
I Small variance, different estimand.
E{YZ[1 − e(X)] − Y (1 − Z )e(X)} = E{[(Y |Z = 1) − (Y |Z = 0)]e(X)[1 − e(X)]}. Bias of Estimators
I Focus on the simplest case with two-level hierarchical structure and no covariates.
I nh1(nh0): no. of subjects with z = 1(z = 0) in cluster h. P P I n1 = h nh1, n0 = h nh0, n = n1 + n0. I Assume outcome generating mechanism is:
yhk = δh + γhzhk + αdh + hk , (1)
2 2 where δh ∼ N(0, σδ ), hk ∼ N(0, σ ), and the true treatment 2 effect: γh ∼ N(γ0, σγ). Bias of Marginal Estimator
ˆ n1 I For marginal model in step 1, ehk = n , ∀h, k. I The marginal estimator is ˆ ∆marg,marg
zhk =1 zhk =0 X yhk X yhk = − n1 n0 h,k h,k
zhk =1 zhk =0 X nh1 X nh1 nh0 X hk X hk = γh + ( − )δh + ( − ) n1 n1 n0 n1 n0 h h h,k h,k n P − nhdh(1 − dh) n1n0 h +α n n1n0 Bias of Marginal Estimator
I By WLLN of weighted sum of i.i.d. random variables 2 P∞ nh1 (assuming h 2 < ∞): n1
X nh1 nh,m→∞ γh → γ0. n1 h
I Similarly the middle two parts go to 0 as nh, m → ∞ . n I = var(n ): variance of total no. of treated, if all n1n0 1 n1 clusters follow the same TAM, z ∼ Bernoulli( n ). P P I h nhdh(1 − dh) = h var(nh1): sum of variance of no. of treated within each cluster, if each cluster follows a nh1 separate TAM: z ∈ ∼ Bernoulli( ). k h nh Bias of Marginal Estimator
I Exact form of bias P var(n1) − h var(nh1) Bias(∆ˆ marg,marg) = α . (2) var(n1)
I Controlled by two factors: (1) variance ratio - treatment assignment mechanism; (2) |α| - outcome generating mechanism.
I Both are ignored by the marginal estimator ∆ˆ marg,marg. Bias of Clustered Estimator
nh1 I For pooled within-cluster model in step 1, eˆ = , k ∈ h. hk nh I The clustered estimator with p.s. estimated from pooled ˆ within-cluster model ∆pool,clu is consistent ˆ ∆pool,clu P (Pzhk =1 yhk ) P (Pzhk =0 yhk ) = h k∈h nh1 − h k∈h nh0 m m P Pzhk =1 hk P Pzhk =0 hk P γ ( ) ( ) = h h + h k∈h nh1 − h k∈h nh0 m m m nh,m→∞ → γ0 (3)
I This result is free of type of weights. Bias of Clustered Estimator
I Clustered estimator with p.s. estimated from marginal ˆ model, ∆marg,clu, exactly follows (3), thus consistent.
I Marginal estimator with p.s. estimated from pooled ˆ within-cluster model, ∆pool,marg, also consistent.
I But different small sample behavior between H-T and population-overlap weights. Extensions
I Without covariates, surrogate indicator model gives the estimated p.s. as pooled within-cluster model.
I Above results regarding pooled within-cluster model automatically hold for surrogate indicator model.
I Proofs are analogous for data with higher order of hierarchical level. Double-robustness
I For the simplest case without covariates, we show “double-robustness” of the p.s. estimators:
I When both of the true underlying treatment assignment mechanism and outcome generating mechanism are hierarchically structured: I Estimators using a balancing weight are consistent as if hierarchical structure is taken into account in at least one of the two steps in the p.s. procedure.
I A special case of Scharfstein et al. (1999), but free of form of weights. Cases with covariates
I No closed-form solution to p.s. models, thus no closed-form of the bias of those estimators.
I Can be explored by (1) large-scale simulations; or (2) adopting a probit (instead of logistic) link for estimating p.s.
I Intuitively, “double-robustness” property still holds.
I Bias of ∆ˆ marg,marg is affected by: P var(n1)− h var(nh1) I α and in (2); var(n1) I Size of true trt effect γ (negative correlated); I Ratio of between-cluster and within-cluster variance, 2 σδ g = 2 (positively correlated). σ Racial disparity data
I Disparity: racial differences in care attributed to operations of health care system.
I Breast cancer screening data are collected from health insurance plans.
I Focus on the plans with at least 25 whites and 25 blacks: 64 plans with a total sample size of 75012.
I Subsample 3000 subjects from large (>3000) clusters to restrict impact of extremely large clusters, resulting sample size 56480. Racial disparity data
I Cluster level covariates vh: geographical code, non/for-profit status, practice model.
I Individual level covariates xhk : age category, eligibility for medicaid, poor neighborhood.
I “Treatment” variable zhk : black race (1=black, 0=white).
I Not strictly causal. Compare groups with balanced covariates.
I Outcome yhk : receive screening for breast cancer or not.
I Research aim: investigate racial disparity in breast cancer screening. Estimated propensity score Estimated propensity score
I Different propensity score models give quite different estimates.
I Each method leads to good overall covariates balance between groups in this data.
I Marginal analysis does not lead to balance in covariates in each cluster, surrogate indicator analysis does better, pooled- within the best. Analysis results: racial disparity estimated from Horvitz-Thompson weight
weighted doubly-robust regression pooled clustered marginal pooled-within marginal -0.050 -0.020 -0.042 -0.021 -0.044 (0.008) (0.008) (0.004) (0.004) (0.007) pooled- -0.024 -0.021 -0.018 -0.022 -0.032 within (0.009) (0.008) (0.004) (0.004) (0.007) surrogate -0.017 -0.015 -0.012 -0.015 -0.014 indicator (0.009) (0.008) (0.004) (0.004) (0.007) Analysis results: racial disparity estimated from population-overlap weight
weighted doubly-robust regression pooled clustered marginal pooled-within marginal -0.043 -0.030 -0.043 -0.032 -0.044 (0.007) (0.008) (0.004) (0.004) (0.007) pooled- -0.030 -0.031 -0.031 -0.031 -0.032 within (0.007) (0.008) (0.004) (0.004) (0.007) surrogate -0.035 -0.030 -0.031 -0.030 -0.014 indicator (0.007) (0.008) (0.004) (0.004) (0.007) Diagnostics
I Check the balance of weighted covariates between treatment groups. Each method leads to good balance in this data.
I Quantiles table. Remarks on results
I Ignoring cluster structure in both steps gives results greatly defer from others.
I Results from surrogate indicator analysis are different from others, suggesting Portion treated is correlated with covariates.
I Taking into account cluster structure in at least one of the two steps leads to similar results - “doubly-robustness”.
I Doubly-robust estimates have smaller s.e., extra variation is explained by covariates in step 2.
I Incorporating cluster structure in step 2 is preferable to step 1.
I Between-cluster variation is large in breast cancer data.
I Standard errors obtained from bootstrap are much larger than those from analytic formula. Summary
I We introduce and compare several possible propensity score analyses for hierarchical data.
I We show “double-robustness” property of propensity score weighted estimators: cluster structure must be taken into account in at least one of the two steps.
I We obtain the analytic form of bias of the marginal estimator.
I Case by case. In practice, total number of clusters, size of each cluster, within- and between- cluster variations can greatly affect the conclusion.