Explained Variation and Explained Randomness for Proportional Hazard Models
John O’Quigley and Ronghui Xu
Lukas Bütikofer
Biostatistics Journal Club
Zurich, 15 May 2013 Introduction Expl. Variation Expl. Randomness Based on Fisher Info Properties/Interpretation Extensions Illustrations
Outline
1. Introduction
2. Explained Variation
3. Explained Randomness
4. Measure of Predictability based on Fisher Information
5. Properties and Interpretation
6. Extensions
7. Illustrations
Goal
provide a measure that quantifies the predictability of a proportional hazards model
→ how strong are the predictive effects?
statements about:
- how much of survival is explained by treatment?
- changes in predictability by adding a prognostic factor to a model
compare non-nested models
Predictability Measures
regression coefficients:
- depend on scale of covariates
explained variation $R^2$:
- only straightforward for linear models
- generalizations for proportional hazard models depend on distribution of the covariate
- not invariant to transformations of response variable
deviance:
- likelihood has to be fully specified
- nested models required
Desired Characteristics of Predictability Measures
direct relation to predictability of survival ranks
→ perfect prediction: 1; absence of effect: 0
→ intermediate values should be interpretable
invariance to monotonic transformation of time (as for the regression coefficient in the Cox model)
invariance to linear transformation of covariates
unaffected by independent censoring
accommodation of time-dependent covariates
interpretation as explained variation or randomness
Model and Notation
$T$: potential failure time, $C$: potential censoring time
→ time observed: $X = \min(T, C)$
$Z(t)$: possibly time-dependent covariates
$i = 1, 2, \ldots, n$ subjects with $(T_i, C_i, Z_i)$
$\delta = I(T \le C)$ → 1: failure, 0: censoring
$Y(t) = I(X \ge t)$ → 1: at risk at time $t$
proportional hazard: $h(t \mid Z(t)) = h_0(t) \exp(\beta Z(t))$
Cox Partial Likelihood
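Conditional probability that subject $i$ is the one chosen to fail, given all individuals at risk and given that one failure occurs:
$$\pi_i(\beta, t) = \frac{Y_i(t) \exp(\beta Z_i(t))}{\sum_{j=1}^{n} Y_j(t) \exp(\beta Z_j(t))}$$
Cox partial likelihood [Cox, 1972]:
$$L_p(\beta) = \prod_{\text{failures}} \pi_i(\beta, t) = \prod_{i=1}^{n} \pi_i(\beta, t)^{\delta_i}$$
$$l_p(\beta) = \sum_{\text{failures}} \Big\{ \log\big(Y_i(t) \exp(\beta Z_i(t))\big) - \log \sum_{j=1}^{n} \big[Y_j(t) \exp(\beta Z_j(t))\big] \Big\}$$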
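The two formulas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a single time-invariant covariate, no tied failure times, and a made-up toy data set; the function names `risk_set_probs` and `log_partial_likelihood` are hypothetical and do not come from the slides.

```python
import math

def risk_set_probs(beta, t, times, z):
    """pi_i(beta, t) for every subject; Y_i(t) = 1 if X_i >= t, else 0."""
    w = [math.exp(beta * zi) if xi >= t else 0.0 for xi, zi in zip(times, z)]
    total = sum(w)
    return [wi / total for wi in w]

def log_partial_likelihood(beta, times, events, z):
    """Sum of log pi_i(beta, X_i) over the observed failures (delta_i = 1)."""
    ll = 0.0
    for i, (xi, di) in enumerate(zip(times, events)):
        if di == 1:
            ll += math.log(risk_set_probs(beta, xi, times, z)[i])
    return ll

# Hypothetical toy data: observed times X, failure indicators delta, covariate Z.
times = [2.0, 3.0, 5.0, 7.0, 11.0]
events = [1, 0, 1, 1, 0]
z = [1.0, 0.0, 1.0, 0.0, 0.0]

probs_at_5 = risk_set_probs(0.0, 5.0, times, z)   # uniform over the risk set when beta = 0
ll_null = log_partial_likelihood(0.0, times, events, z)
print(probs_at_5, ll_null)
```

With $\beta = 0$ every subject at risk is equally likely to fail, so each failure contributes minus the log of the risk-set size.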
Cox Scores
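Scores:
$$S(\beta) = \sum_{\text{failures}} \left[ Z_i(t) - \frac{\sum_{j=1}^{n} Z_j(t) Y_j(t) \exp(\beta Z_j(t))}{\sum_{j=1}^{n} Y_j(t) \exp(\beta Z_j(t))} \right] = \sum_{\text{failures}} \left[ Z_i(t) - E_\beta(Z \mid t) \right]$$
since:
$$E_\beta(Z \mid t) = \sum_{j=1}^{n} Z_j(t) \pi_j(\beta, t) = \frac{\sum_{j=1}^{n} Z_j(t) Y_j(t) \exp(\beta Z_j(t))}{\sum_{j=1}^{n} Y_j(t) \exp(\beta Z_j(t))}$$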
Residuals
MLE of $\beta$: $S(\hat\beta) = 0 = \sum_{\text{failures}} [Z_i(t) - E_{\hat\beta}(Z \mid t)]$
Schoenfeld residuals: $r_i(\hat\beta) = Z_i(X_i) - E_{\hat\beta}(Z \mid X_i)$ for $\delta_i = 1$ [Schoenfeld, 1982]
Note that: $S(\beta) = \sum_{\text{failures}} r_i(\beta) = \sum_{i=1}^{n} \delta_i r_i(\beta)$
sum of residuals is set to 0 to estimate the unknown parameter → analogous to ordinary regression
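The link between the score and the Schoenfeld residuals can be checked numerically. The sketch below assumes one time-invariant covariate, no ties, and a made-up data set; `cond_mean`, `schoenfeld`, `score` and `solve_score` are hypothetical helper names, and the bisection is a stand-in for the Newton-Raphson iteration a survival package would normally use.

```python
import math

def cond_mean(beta, t, times, z):
    """E_beta(Z | t): mean of Z over the risk set at t, weighted by exp(beta * Z_j)."""
    num = den = 0.0
    for xj, zj in zip(times, z):
        if xj >= t:                      # Y_j(t) = 1
            w = math.exp(beta * zj)
            num += zj * w
            den += w
    return num / den

def schoenfeld(beta, times, events, z):
    """r_i(beta) = Z_i - E_beta(Z | X_i) for failures; 0.0 as a placeholder if censored."""
    return [zi - cond_mean(beta, xi, times, z) if di == 1 else 0.0
            for xi, di, zi in zip(times, events, z)]

def score(beta, times, events, z):
    """S(beta) = sum over failures of the Schoenfeld residuals."""
    return sum(schoenfeld(beta, times, events, z))

def solve_score(times, events, z, lo=-10.0, hi=10.0, iters=100):
    """Bisection for S(beta) = 0; assumes the score changes sign on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid, times, events, z) > 0:   # score is decreasing in beta
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical toy data (not from the slides).
times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
events = [1, 1, 0, 1, 1, 1]
z = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0]

beta_hat = solve_score(times, events, z)
residuals = schoenfeld(beta_hat, times, events, z)
print(beta_hat, residuals, sum(residuals))
```

At the estimate the residuals sum to zero up to numerical error, which mirrors the normal equations of ordinary least squares.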
Fisher Information
$$J(\beta) = E[S^2(\beta)] = \sum_{i=1}^{n} \delta_i E[r_i^2(\beta)] = -\sum_{i=1}^{n} \delta_i E\left[\frac{\partial r_i(\beta)}{\partial \beta}\right]$$
residual sum of squares ($SS_{res}$): estimate of Fisher information
$SS_{res}$ is used to study predictability:
→ explained variation, linear model:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} = 1 - \frac{SS_{res}}{SS_{total}}$$
→ based on a measure of information, interpretation as explained randomness
Explained Variation versus Explained Randomness
explained variation $\Omega^2$/$R^2$:
- proportion to which a model accounts for the variation of a given data set
- proportion of total variation explained by conditioning on a model (or a random variable)
explained randomness $\rho^2$:
- information gain of one model over the other
- distance between distributions
- interpretable as explained variation in some cases
Explained Variation
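consider the random pair $(T, Z)$:
$$\operatorname{Var}(T) = E[\operatorname{Var}(T \mid Z)] + \operatorname{Var}[E(T \mid Z)]$$
$\operatorname{Var}[E(T \mid Z)]$: signal, explained by conditioning on $Z$
$E[\operatorname{Var}(T \mid Z)]$: noise, residual variance
explained variation: amount of variation in $T$ explained by conditioning on $Z$
$$\Omega^2_T(Z) = \frac{\operatorname{Var}[E(T \mid Z)]}{\operatorname{Var}(T)} = 1 - \frac{E[\operatorname{Var}(T \mid Z)]}{\operatorname{Var}(T)}$$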
Explained Variation
Idea: the more the variance can be reduced by the model, the greater the model’s predictability
Problem: more complicated if the variables are linked by a regression model → no practical interpretation
Cox:
- upper bound on $\Omega^2_T(Z)$ that depends heavily on the distribution of the covariate $Z$
- e.g. binary covariates with $\beta_A = 2$ and $\beta_B = 20$ → $\Omega^2_T(A) = 0.25$ and $\Omega^2_T(B) = 0.16$
- consider $\Omega^2_Z(T)$
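To see why an upper bound on the explained variation of $T$ given $Z$ arises, the variance decomposition can be evaluated in closed form for a simple case. The sketch below assumes a setting that is not taken from the slides: a Bernoulli($p$) covariate and exponential failure times with hazard $\exp(\beta Z)$ (baseline hazard 1). It does not attempt to reproduce the numbers quoted above; the function name `omega2_T_given_Z` is hypothetical.

```python
import math

def omega2_T_given_Z(beta, p):
    """Var[E(T|Z)] / Var(T) for Z ~ Bernoulli(p), T | Z ~ Exponential(rate = exp(beta * Z))."""
    m0, m1 = 1.0, math.exp(-beta)          # E(T | Z = 0), E(T | Z = 1)
    v0, v1 = m0 ** 2, m1 ** 2              # Var(T | Z = z) equals mean^2 for an exponential
    mean_t = (1 - p) * m0 + p * m1
    signal = (1 - p) * (m0 - mean_t) ** 2 + p * (m1 - mean_t) ** 2   # Var[E(T|Z)]
    noise = (1 - p) * v0 + p * v1                                     # E[Var(T|Z)]
    return signal / (signal + noise)

for beta in (0.0, 1.0, 2.0, 20.0):
    print(beta, omega2_T_given_Z(beta, 0.5))
```

Under these assumptions the value tends to $p/(1+p)$ as $\beta \to \infty$, so even an arbitrarily strong effect cannot push the measure close to 1, and the ceiling moves with the covariate distribution.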
Explained Randomness
Quantifies how far the null model is from the model under consideration by measuring the distance between the distributions
measures of distance between distributions: Fisher information or Kullback-Leibler information
more general than explained variation
coincides with explained variation in special cases: multivariate normal, Cox regression (if $\Omega^2_Z(T)$ is used)
Explained Randomness based on Kullback-Leibler Info (1)
Kullback-Leibler discrepancy:
$p_X(x)$, $p_Y(y)$: probability density functions
$$D(p_X \,\|\, p_Y) = E[\log p_X(X)] - E[\log p_Y(X)]$$
distance between null model ($\beta = 0$, no regression effect) and model of interest: $\Gamma(\beta) = 2[I(\beta) - I(0)]$
explained randomness: $\rho^2(\beta) = 1 - \exp(-\Gamma(\beta))$
→ measure of predictability, not depending on scale
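As a worked illustration of why the transformation $1 - \exp(-\Gamma)$ is natural, consider a normal linear model. This is a sketch under assumptions not stated on the slide: $T = \alpha + \beta Z + \varepsilon$ with $\varepsilon \sim N(0, \sigma^2)$, $I(\theta)$ read as the expected log-density $E[\log f(T \mid Z; \theta)]$ under the true model, and the nuisance parameters (intercept and error variance) set to their best-fitting values under the null.

```latex
% Assumed model (not from the slides): T = \alpha + \beta Z + \varepsilon, \varepsilon \sim N(0, \sigma^2)
\begin{align*}
I(\beta) &= E\left[\log f(T \mid Z; \beta)\right]
          = -\tfrac{1}{2}\log(2\pi\sigma^2) - \tfrac{1}{2}, \\
I(0) &= -\tfrac{1}{2}\log\left(2\pi \operatorname{Var}(T)\right) - \tfrac{1}{2},
  \qquad \operatorname{Var}(T) = \beta^2 \operatorname{Var}(Z) + \sigma^2, \\
\Gamma(\beta) &= 2\,[I(\beta) - I(0)] = \log\frac{\operatorname{Var}(T)}{\sigma^2}, \\
\rho^2(\beta) &= 1 - \exp(-\Gamma(\beta)) = 1 - \frac{\sigma^2}{\operatorname{Var}(T)}
  = \frac{\beta^2 \operatorname{Var}(Z)}{\beta^2 \operatorname{Var}(Z) + \sigma^2}
  = \frac{\operatorname{Var}[E(T \mid Z)]}{\operatorname{Var}(T)}.
\end{align*}
```

Under these assumptions the explained randomness reduces to the explained variation of $T$ given $Z$, which is the sense in which the two notions coincide for the normal model.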
Explained Randomness based on Kullback-Leibler Info (2)
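$$I_1(\theta) = \int_{\mathcal{Z}} \int_{\mathcal{T}} \log[f(t \mid z; \theta)] \, f(t \mid z; \beta) \, dt \, dG(z)$$
$$I_2(\theta) = \int_{\mathcal{T}} \int_{\mathcal{Z}} \log[g(z \mid t; \theta)] \, g(z \mid t; \beta) \, dz \, dF(t)$$
rank invariant to monotonic transformations of time
estimation of $g(z \mid t; \beta)$ by $\pi_i(\hat\beta, t)$
estimation of $F(t)$ by Kaplan-Meier → $W(X_i)$
$$\hat\Gamma_2(\hat\beta) = 2 \sum_{j=1}^{k} W(X_j) \sum_{i=1}^{n} \pi_i(\hat\beta, X_j) \log\left( \frac{\pi_i(\hat\beta, X_j)}{\pi_i(0, X_j)} \right)$$
$j = 1, \ldots, k$: failures → $T$; $i = 1, \ldots, n$: subjects → $Z$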
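The estimator above is a weighted sum of Kullback-Leibler divergences between the fitted and the null selection probabilities, one per failure time. A minimal sketch follows, assuming one time-invariant covariate, no ties, made-up data, and Kaplan-Meier weights taken as the jumps of the Kaplan-Meier curve; `km_jumps`, `pi_probs` and `gamma2_hat` are hypothetical names, and the value of `beta` is an arbitrary illustration rather than an estimate.

```python
import math

def km_jumps(times, events):
    """Jump of the Kaplan-Meier curve at each subject's time (0.0 if censored); no ties."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    surv, n_at_risk = 1.0, len(times)
    w = [0.0] * len(times)
    for i in order:
        if events[i] == 1:
            w[i] = surv / n_at_risk      # S(t-) divided by the number at risk
            surv -= w[i]
        n_at_risk -= 1
    return w

def pi_probs(beta, t, times, z):
    """pi_i(beta, t) over the risk set at time t."""
    w = [math.exp(beta * zi) if xi >= t else 0.0 for xi, zi in zip(times, z)]
    total = sum(w)
    return [wi / total for wi in w]

def gamma2_hat(beta, times, events, z):
    """2 * sum_j W(X_j) * KL( pi(beta, X_j) || pi(0, X_j) ) over failure times X_j."""
    w = km_jumps(times, events)
    total = 0.0
    for j, (xj, dj) in enumerate(zip(times, events)):
        if dj == 1:
            p_beta = pi_probs(beta, xj, times, z)
            p_null = pi_probs(0.0, xj, times, z)
            kl = sum(pb * math.log(pb / p0) for pb, p0 in zip(p_beta, p_null) if pb > 0)
            total += w[j] * kl
    return 2.0 * total

# Hypothetical toy data and an arbitrary coefficient value.
times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
events = [1, 1, 0, 1, 1, 1]
z = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0]
beta = 1.0

gamma = gamma2_hat(beta, times, events, z)
rho2 = 1.0 - math.exp(-gamma)
print(gamma, rho2)
```

Because only the ordering of the observed times enters `pi_probs` and `km_jumps`, the result does not change under an increasing transformation of time.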
Explained Variation/Randomness based on Fisher Info (I)
Schoenfeld residuals: $r_i(\beta) = Z_i(X_i) - E_\beta(Z \mid X_i)$ for $\delta_i = 1$
residual sum of squares: $\mathcal{I}(\beta) = \sum_{i=1}^{n} \delta_i r_i^2(\beta)$
total sum of squares: $\mathcal{I}(0) = \sum_{i=1}^{n} \delta_i r_i^2(0)$
explained variation: $R^2(\beta) = 1 - \dfrac{\mathcal{I}(\beta)}{\mathcal{I}(0)}$
Normal model: coincides with coefficient of correlation, percentage of explained variation
Explained Variation/Randomness based on Fisher Info (II)
multivariate case:
- prognostic index: $\eta(t) = \beta Z(t)$
- individuals with different $Z$ but same $\eta$ have same survival probability
- univariate: $Z$ equivalent to $\eta$ → residuals of $Z$ can be used
- multivariate: residuals of $\eta$ instead of $Z$
$$\mathcal{I}(\beta) = \sum_{i=1}^{n} \delta_i [\beta r_i(\beta)]^2 \quad \text{and} \quad \mathcal{I}(0) = \sum_{i=1}^{n} \delta_i [\beta r_i(0)]^2$$
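The multivariate version replaces the residuals of $Z$ by residuals of the prognostic index. The sketch below shows this for the unweighted sums of squares, assuming time-invariant covariates, no ties, made-up data and an arbitrary coefficient vector (not an estimate); `ss_eta` and `r2_multi` are hypothetical names.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def ss_eta(beta, b, times, events, z):
    """Sum over failures of [beta . r_i(b)]^2, where r_i(b) = Z_i - E_b(Z | X_i).
    Equivalent to the squared residual of eta = beta . Z under coefficient b."""
    eta = [dot(beta, zj) for zj in z]
    total = 0.0
    for xi, di, eta_i in zip(times, events, eta):
        if di == 1:
            num = den = 0.0
            for xj, zj, eta_j in zip(times, z, eta):
                if xj >= xi:                      # subject j at risk at X_i
                    w = math.exp(dot(b, zj))
                    num += eta_j * w
                    den += w
            total += (eta_i - num / den) ** 2
    return total

def r2_multi(beta, times, events, z):
    """R^2 = 1 - I(beta) / I(0), both built from residuals of the prognostic index."""
    zero = [0.0] * len(beta)
    return 1.0 - ss_eta(beta, beta, times, events, z) / ss_eta(beta, zero, times, events, z)

# Hypothetical toy data: two covariates per subject, arbitrary coefficients.
times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
events = [1, 1, 0, 1, 1, 1]
z = [[1.0, 0.3], [0.0, 1.2], [1.0, 0.5], [1.0, 2.0], [0.0, 0.1], [0.0, 0.9]]
beta = [1.0, -0.5]

print(r2_multi(beta, times, events, z))
```

When all but one coefficient is zero, the factor $\beta^2$ cancels in the ratio and the univariate definition is recovered.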
Correction for Censoring
$R^2$ asymptotically dependent on censoring → correction
weight squared residuals by estimate of $F(t)$ → Kaplan-Meier weights:
$$W(t) = \frac{\hat{S}(t)}{\sum_{i=1}^{n} Y_i(t)}$$
(relative jumps of the KM curve)
weighted $R^2$:
$$R^2(\beta) = 1 - \frac{\mathcal{I}(\beta)}{\mathcal{I}(0)} = 1 - \frac{\sum_{i=1}^{n} \delta_i W(X_i) r_i^2(\beta)}{\sum_{i=1}^{n} \delta_i W(X_i) r_i^2(0)}$$
unaffected by independent censoring; coincides with previous definition in the absence of censoring
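A minimal sketch of the weighted coefficient, assuming one time-invariant covariate, no ties and made-up data. One assumption deserves emphasis: the weight uses the left-continuous Kaplan-Meier estimate $\hat{S}(t-)$, because that choice makes all weights equal to $1/n$ without censoring, as the last statement above requires. The names `km_weights`, `residual`, `weighted_ss` and `r2` are hypothetical.

```python
import math

def km_weights(times, events):
    """W(X_i) = S(X_i-) / (number at risk at X_i) for failures, 0.0 if censored; no ties."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    surv, n_at_risk = 1.0, len(times)
    w = [0.0] * len(times)
    for i in order:
        if events[i] == 1:
            w[i] = surv / n_at_risk
            surv -= w[i]                  # Kaplan-Meier curve drops by exactly this jump
        n_at_risk -= 1
    return w

def residual(beta, i, times, z):
    """Schoenfeld residual r_i(beta) = Z_i - E_beta(Z | X_i)."""
    num = den = 0.0
    for xj, zj in zip(times, z):
        if xj >= times[i]:
            wj = math.exp(beta * zj)
            num += zj * wj
            den += wj
    return z[i] - num / den

def weighted_ss(beta, times, events, z, w):
    return sum(w[i] * residual(beta, i, times, z) ** 2
               for i in range(len(times)) if events[i] == 1)

def r2(beta, times, events, z):
    w = km_weights(times, events)
    return 1.0 - weighted_ss(beta, times, events, z, w) / weighted_ss(0.0, times, events, z, w)

# Hypothetical toy data; beta_hat approximately solves the score equation for these data.
times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
events = [1, 1, 0, 1, 1, 1]
z = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0]
beta_hat = 1.073

print(r2(beta_hat, times, events, z))
```

Only the ordering of the times and the centred, rescaled covariate enter the ratio, so an increasing transformation of time or a linear transformation of $Z$ (with $\beta$ rescaled accordingly) leaves the value unchanged.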
Population Parameter
in practice: $R^2(\hat\beta)$ instead of $R^2(\beta)$, with $\hat\beta$: partial likelihood estimate
population parameter of $R^2(\hat\beta)$:
$$\Omega^2(\beta) = 1 - \frac{\int E_\beta\{[Z(t) - E_\beta(Z(t) \mid t)]^2 \mid t\} \, dF(t)}{\int E_\beta\{[Z(t) - E_0(Z(t) \mid t)]^2 \mid t\} \, dF(t)}$$
$F$: failure time distribution function; $F = 1 - S$
if $Z$ time-invariant: $\Omega^2(\beta)$ = explained variation
$\Omega^2(\beta)$ depends only weakly on covariate distribution
Properties of $R^2$
$R^2(0) = 0$
$R^2(\hat\beta) \le 1$
$R^2(\hat\beta)$ is invariant under linear transformation of $Z$ and monotonically increasing transformations of $T$
$R^2(\hat\beta)$ consistently estimates $\Omega^2(\beta)$
$R^2(\hat\beta)$ is asymptotically normal
$R^2(\hat\beta)$ can be negative (best fitting model provides poorer fit than null model)
Properties of $\Omega^2$
$\Omega^2(0) = 0$
$0 \le \Omega^2(\beta) \le 1$
$\Omega^2(\beta)$ is invariant under linear transformation of $Z$ and monotonically increasing transformations of $T$
$\Omega^2(\beta)$ increases with $|\beta|$, and as $|\beta| \to \infty$, $\Omega^2(\beta) \to 1$
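Interpretation of $R^2$
sum of squares decomposition
$$SS_{tot} = \sum_{i=1}^{n} \delta_i W(X_i) r_i^2(0)$$
$$SS_{res} = \sum_{i=1}^{n} \delta_i W(X_i) r_i^2(\hat\beta)$$
$$SS_{reg} = \sum_{i=1}^{n} \delta_i W(X_i) [E_{\hat\beta}(Z \mid X_i) - E_0(Z \mid X_i)]^2$$
for $n \to \infty$: $SS_{tot} = SS_{res} + SS_{reg}$
$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = \frac{SS_{reg}}{SS_{tot}}$$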
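Interpretation of $\Omega^2$
for time-invariant covariates and independent censoring
$\Omega^2$ can be interpreted as explained variation of $Z$ by conditioning on $T$:
$$\Omega^2(\beta) \approx \Omega^2_Z(T) = 1 - \frac{E[\operatorname{Var}(Z \mid T)]}{\operatorname{Var}(Z)} = \frac{\operatorname{Var}[E(Z \mid T)]}{\operatorname{Var}(Z)}$$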
Partial Coefficients
partial effect of $Z_{q+1}, \ldots, Z_p$ after having accounted for the effect of $Z_1, \ldots, Z_q$ ($q < p$):
$$1 - R^2(Z_{q+1}, \ldots, Z_p \mid Z_1, \ldots, Z_q) = \frac{1 - R^2(Z_1, \ldots, Z_p)}{1 - R^2(Z_1, \ldots, Z_q)}$$
$R^2(Z_1, \ldots, Z_p)$: based on model with $p$ covariates $Z_1, \ldots, Z_p$
$R^2(Z_1, \ldots, Z_q)$: based on model with $q$ covariates $Z_1, \ldots, Z_q$
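The partial coefficient is a one-line computation once the two model-level values are known. As an illustration, the sketch below plugs in the rounded values reported in the multivariate breast cancer table later in the talk (0.12 for age and histology, 0.26 after adding stage); the function name `partial_r2` is hypothetical.

```python
def partial_r2(r2_full, r2_reduced):
    """Partial R^2 of the added covariates: 1 - (1 - R^2_full) / (1 - R^2_reduced)."""
    return 1.0 - (1.0 - r2_full) / (1.0 - r2_reduced)

# R^2 for (Age, Hist) and for (Age, Hist, Stage), as rounded in the table.
stage_given_age_hist = partial_r2(0.26, 0.12)
print(round(stage_given_age_hist, 2))
```

Since the inputs are rounded to two decimals, recomputed partial values can differ from the tabulated ones in the last digit.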
Stratified Model
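stratum $s = 1, \ldots, S$:
$$r_i(b; s) = Z_{is}(X_{is}) - E_b(Z \mid X_{is})$$
$$\mathcal{I}(b) = \sum_{i=1}^{n} \sum_{s=1}^{S} \delta_{is} W(X_{is}) r_i^2(b, s)$$
$$R^2(\beta) = 1 - \frac{\mathcal{I}(\beta)}{\mathcal{I}(0)} = 1 - \frac{\sum_{i=1}^{n} \sum_{s=1}^{S} \delta_{is} W(X_{is}) r_i^2(\beta, s)}{\sum_{i=1}^{n} \sum_{s=1}^{S} \delta_{is} W(X_{is}) r_i^2(0, s)}$$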
Other Relative Risk Models
general form of relative risk $r(t, z)$
$$\pi_i(t) = \frac{Y_i(t) \hat{r}(t, Z_i)}{\sum_{j=1}^{n} Y_j(t) \hat{r}(t, Z_j)}$$
→ similar definition of $R^2$
risk functions
- $r(t, z) = \exp(\beta z)$ → Cox
- $r(t, z) = 1 + \beta z$
- $r(t, z) = \exp(\beta(t) z)$ → time-varying regression effects
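The selection probabilities only need the relative risk function, so the three forms listed above can share one implementation. The sketch assumes one time-invariant covariate and made-up data; `pi_general`, the coefficient value and the step function `beta_t` are hypothetical choices for illustration.

```python
import math

def pi_general(risk, t, times, z):
    """pi_i(t) = Y_i(t) r(t, Z_i) / sum_j Y_j(t) r(t, Z_j) for a relative risk function r."""
    w = [risk(t, zi) if xi >= t else 0.0 for xi, zi in zip(times, z)]
    total = sum(w)
    return [wi / total for wi in w]

beta = 0.8                                   # hypothetical coefficient

def cox_risk(t, zv):
    return math.exp(beta * zv)

def linear_risk(t, zv):
    return 1.0 + beta * zv                   # requires 1 + beta * z > 0

def beta_t(t):
    return 0.8 if t < 3.0 else 0.2           # hypothetical time-varying effect

def time_varying_risk(t, zv):
    return math.exp(beta_t(t) * zv)

# Hypothetical toy data.
times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
z = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0]

for risk in (cox_risk, linear_risk, time_varying_risk):
    print(risk.__name__, pi_general(risk, 4.0, times, z))
```

Residuals and sums of squares are then built from these probabilities exactly as in the Cox case.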
Breast Cancer: Univariate Analysis
covariate   $\hat\beta$   p-value   $R^2$
Age         -0.24         < 0.01    0.005
Hist         0.37         < 0.01    0.12
Stage        0.53         < 0.01    0.20
Prog        -0.73         < 0.01    0.07
Size         0.02         < 0.01    0.18

all variables significant, but prognostic power differs
stage and tumor size: high predictability
non-proportional regression effects for histology
→ model with time-varying coefficient for histology: $R^2 = 0.24$
age has only a weak effect, suboptimal coding?
→ recoding in three groups (0-33, 34-40, >40), either linear (1-2-3) or by two binary variables
→ similar $R^2$, keep simplest model (1-2-3)
Breast Cancer: Multivariate Analysis
covariates                          $R^2$   partial $R^2$
Age                                 0.01
Age and Hist                        0.12    0.12
Age, Hist, and Stage                0.26    0.16
Age, Hist, Stage, and Prog          0.33    0.09
Age, Hist, Stage, Prog, and Size    0.33    0.01

partial $R^2$ for size, having accounted for the other covariates, is small
→ extra amount of variation in survival explained by size is limited
References
Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society. Series B (Methodological), pages 187–220.
Cox, D. R. (1975). Partial likelihood. Biometrika, 62(2):269–276.
Crowley, J. and Hoering, A. (2012). Handbook of statistics in clinical oncology. Chapman and Hall.
Nagelkerke, N. J. (1991). A note on a general definition of the coefficient of determination. Biometrika, 78(3):691–692.
O’Quigley, J. and Flandre, P. (1994). Predictive capability of proportional hazards regression. Proceedings of the National Academy of Sciences, 91(6):2310–2314.
Schoenfeld, D. (1982). Partial residuals for the proportional hazards regression model. Biometrika, 69(1):239–241.
Xu, R. (1996). Inference for the proportional hazards model. PhD thesis, University of California, San Diego, CA.
Xu, R. and O’Quigley, J. (2000). Estimating average regression effect under non-proportional hazards. Biostatistics, 1(4):423–439.