Explained Variation and Explained Randomness for Proportional Hazard Models

John O’Quigley and Ronghui Xu

Lukas Bütikofer

Biostatistics Journal Club

Zurich, 15 May 2013 Introduction Expl. Variation Expl. Randomness Based on Fisher Info Properties/Interpretation Extensions Illustrations

Outline

1. Introduction

2. Explained Variation

3. Explained Randomness

4. Measure based on Fisher Information

5. Properties and Interpretation

6. Extensions

7. Illustrations

Goal

- provide a measure that quantifies the predictability of a proportional hazard model
  → how strong are the predictive effects?
- make statements about:
  - how much of survival is explained by treatment?
  - how does predictability change when a prognostic factor is added to a model?
- compare non-nested models

Predictability Measures

- regression coefficients:
  - depend on the scale of the covariates
- explained variation $R^2$:
  - only straightforward for linear models
  - generalizations for proportional hazard models depend on the distribution of the covariate
  - not invariant to transformations of the response variable
- deviance:
  - likelihood has to be fully specified
  - nested models required

Desired Characteristics of Predictability Measures

- direct relation to predictability of survival ranks
  → perfect prediction: 1; absence of effect: 0
  → intermediate values should be interpretable
- invariance to monotonic transformation of time (as for the regression coefficient in the Cox model)
- invariance to linear transformation of covariates
- unaffected by independent censoring
- accommodation of time-dependent covariates
- interpretation as explained variation or explained randomness

Model and Notation

- $T$: potential failure time, $C$: potential censoring time
  → observed time: $X = \min(T, C)$
- $Z(t)$: possibly time-dependent covariates
- subjects $i = 1, 2, \ldots, n$ with $(T_i, C_i, Z_i)$
- $\delta = I(T \le C)$ → 1: failure, 0: censoring
- $Y(t) = I(X \ge t)$ → 1: at risk at time $t$
- proportional hazard:

\[ h(t \mid Z(t)) = h_0(t) \exp(\beta Z(t)) \]

Cox Partial Likelihood

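Conditional probability that subject $i$ is chosen to fail at time $t$, given all individuals at risk and given that one failure occurs:

\[ \pi_i(\beta, t) = \frac{Y_i(t) \exp(\beta Z_i(t))}{\sum_{j=1}^{n} Y_j(t) \exp(\beta Z_j(t))} \]

Cox partial likelihood [Cox, 1972]:

\[ L_p(\beta) = \prod_{\text{failures}} \pi_i(\beta, t) = \prod_{i=1}^{n} \pi_i(\beta, t)^{\delta_i} \]

\[ l_p(\beta) = \sum_{\text{failures}} \left\{ \log\left( Y_i(t) \exp(\beta Z_i(t)) \right) - \log \sum_{j=1}^{n} \left[ Y_j(t) \exp(\beta Z_j(t)) \right] \right\} \]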
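
The partial likelihood can be evaluated directly on a small data set. The following is a minimal sketch, assuming a single time-invariant covariate and no tied failure times; the function names and the toy data are illustrative assumptions and do not come from the talk.

```python
import math

def pi_probs(beta, t, X, Z):
    """pi_i(beta, t) for every subject; Y_i(t) = 1 if X_i >= t, else 0."""
    weights = [math.exp(beta * z) if x >= t else 0.0 for x, z in zip(X, Z)]
    total = sum(weights)
    return [w / total for w in weights]

def log_partial_likelihood(beta, X, delta, Z):
    """l_p(beta): sum of log pi_i(beta, X_i) over the observed failures."""
    return sum(
        math.log(pi_probs(beta, X[i], X, Z)[i])
        for i in range(len(X))
        if delta[i] == 1
    )

# Hypothetical toy data: observed times, failure indicators, binary covariate.
X = [2.0, 3.0, 5.0, 7.0, 8.0, 11.0]
delta = [1, 1, 0, 1, 1, 0]
Z = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]

lp0 = log_partial_likelihood(0.0, X, delta, Z)
```

At $\beta = 0$ every subject at risk is equally likely to fail, so `lp0` reduces to minus the sum of the log risk-set sizes at the failure times.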

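Cox Scores

Scores:

\[ S(\beta) = \sum_{\text{failures}} \left\{ Z_i(t) - \frac{\sum_{j=1}^{n} Z_j(t) Y_j(t) \exp(\beta Z_j(t))}{\sum_{j=1}^{n} Y_j(t) \exp(\beta Z_j(t))} \right\} = \sum_{\text{failures}} \left[ Z_i(t) - E_\beta(Z \mid t) \right] \]

since:

\[ E_\beta(Z \mid t) = \sum_{j=1}^{n} Z_j(t) \pi_j(\beta, t) = \frac{\sum_{j=1}^{n} Z_j(t) Y_j(t) \exp(\beta Z_j(t))}{\sum_{j=1}^{n} Y_j(t) \exp(\beta Z_j(t))} \]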

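Residuals

MLE $\hat\beta$ of $\beta$:

\[ S(\hat\beta) = 0 = \sum_{\text{failures}} \left[ Z_i(t) - E_{\hat\beta}(Z \mid t) \right] \]

Schoenfeld residuals [Schoenfeld, 1982]:

\[ r_i(\hat\beta) = Z_i(X_i) - E_{\hat\beta}(Z \mid X_i) \quad \text{for } \delta_i = 1 \]

Note that:

\[ S(\beta) = \sum_{\text{failures}} r_i(\beta) = \sum_{i=1}^{n} \delta_i r_i(\beta) \]

The sum of the residuals is set to 0 to estimate the unknown parameter → analogous to ordinary regression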
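
The relation between the score and the Schoenfeld residuals can be checked numerically. This is a rough sketch, assuming one time-invariant covariate and no ties; the bisection solver, the function names and the toy data are assumptions made for illustration only.

```python
import math

def expected_z(beta, t, X, Z):
    """E_beta(Z | t): covariate mean over the risk set, weighted by exp(beta * z)."""
    num = den = 0.0
    for x, z in zip(X, Z):
        if x >= t:  # subject is at risk at time t
            w = math.exp(beta * z)
            num += z * w
            den += w
    return num / den

def schoenfeld(beta, X, delta, Z):
    """Residuals r_i(beta) = Z_i - E_beta(Z | X_i), one per observed failure."""
    return [Z[i] - expected_z(beta, X[i], X, Z)
            for i in range(len(X)) if delta[i] == 1]

def score(beta, X, delta, Z):
    """S(beta) is the sum of the Schoenfeld residuals."""
    return sum(schoenfeld(beta, X, delta, Z))

def solve_score(X, delta, Z, lo=-10.0, hi=10.0, tol=1e-10):
    """Bisection for S(beta) = 0; relies on S being decreasing in beta."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, X, delta, Z) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical toy data.
X = [2.0, 3.0, 5.0, 7.0, 8.0, 11.0]
delta = [1, 1, 0, 1, 1, 0]
Z = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]

beta_hat = solve_score(X, delta, Z)
```

Bisection is used here only because it needs no derivatives; in practice a Newton-type iteration is the usual choice.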

Fisher Information

\[ J(\beta) = E[S^2(\beta)] = \sum_{i=1}^{n} \delta_i E[r_i^2(\beta)] = -\sum_{i=1}^{n} \delta_i E\left[ \frac{\partial r_i(\beta)}{\partial \beta} \right] \]

- residual sum of squares (SSres): estimate of the Fisher information
- SSres is used to study predictability:
  → explained variation in the linear model:

\[ R^2 = 1 - \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} = 1 - \frac{SS_{res}}{SS_{total}} \]

  → based on a measure of information, interpretation as explained randomness
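
For reference, the linear-model $R^2$ can be computed in a few lines. A minimal sketch with made-up data; the least-squares fit and the variable names are assumptions for illustration.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SSres / SStotal for observed y and fitted values y_hat."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_total

# Hypothetical data and a simple least-squares line.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * xi for xi in x]

r2 = r_squared(y, y_hat)
```

For a simple linear regression this value equals the squared sample correlation between `x` and `y`.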

Explained Variation versus Explained Randomness

explained variation $\Omega^2$ / $R^2$:
- proportion to which a model accounts for the variation of a given data set
- proportion of total variation explained by conditioning on a model (or a )

explained randomness $\rho^2$:
- information gain of one model over the other
- distance between distributions
- interpretable as explained variation in some cases

Explained Variation

consider the random pair $(T, Z)$:

\[ \mathrm{Var}(T) = E[\mathrm{Var}(T \mid Z)] + \mathrm{Var}[E(T \mid Z)] \]

- $\mathrm{Var}[E(T \mid Z)]$: signal, explained by conditioning on $Z$
- $E[\mathrm{Var}(T \mid Z)]$: noise, residual

explained variation: amount of variation in $T$ explained by conditioning on $Z$

\[ \Omega^2_T(Z) = \frac{\mathrm{Var}[E(T \mid Z)]}{\mathrm{Var}(T)} = 1 - \frac{E[\mathrm{Var}(T \mid Z)]}{\mathrm{Var}(T)} \]
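
The variance decomposition can be verified on a small discrete example. This is a sketch under the assumption of a made-up joint distribution of $(T, Z)$; the function name and the probabilities are illustrative only.

```python
def explained_variation(joint):
    """joint: dict {(t, z): probability}.
    Returns (Var(T), E[Var(T|Z)], Var[E(T|Z)])."""
    p_z = {}
    for (t, z), p in joint.items():
        p_z[z] = p_z.get(z, 0.0) + p
    mean_t = sum(t * p for (t, z), p in joint.items())
    var_t = sum((t - mean_t) ** 2 * p for (t, z), p in joint.items())
    # Conditional means E(T | Z = z).
    cond_mean = {
        z: sum(t * p for (t, zz), p in joint.items() if zz == z) / p_z[z]
        for z in p_z
    }
    noise = sum((t - cond_mean[z]) ** 2 * p for (t, z), p in joint.items())
    signal = sum((cond_mean[z] - mean_t) ** 2 * p_z[z] for z in p_z)
    return var_t, noise, signal

# Hypothetical joint distribution: T in {1, 3} when Z = 0, T in {5, 7} when Z = 1.
joint = {(1, 0): 0.25, (3, 0): 0.25, (5, 1): 0.25, (7, 1): 0.25}
var_t, noise, signal = explained_variation(joint)
omega2 = signal / var_t
```

Here the signal is 4 and the noise is 1, so conditioning on $Z$ explains 80% of the variation in $T$.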

Explained Variation

- Idea: the more the variance can be reduced by the model, the greater the model’s predictability
- Problem: more complicated if the variables are linked by a regression model → no practical interpretation
- Cox:
  - upper bound on $\Omega^2_T(Z)$ that depends heavily on the distribution of the covariate $Z$
  - e.g. binary covariates with $\beta_A = 2$ and $\beta_B = 20$ → $\Omega^2_T(A) = 0.25$ and $\Omega^2_T(B) = 0.16$
  - consider $\Omega^2_Z(T)$ instead

Explained Randomness

Quantifies how far the null model is from the model under consideration by measuring the distance between the distributions

- measures of distance between distributions: Fisher information or Kullback-Leibler information
- more general than explained variation
- coincides with explained variation in special cases: multivariate normal, Cox regression (if $\Omega^2_Z(T)$ is used)

Explained Randomness based on Kullback-Leibler Info (1)

Kullback-Leibler discrepancy:

$p_X(x)$, $p_Y(y)$: probability density functions

\[ D(p_X \,\|\, p_Y) = E[\log p_X(X)] - E[\log p_Y(X)] \]

distance between the null model ($\beta = 0$, no regression effect) and the model of interest:

\[ \Gamma(\beta) = 2[I(\beta) - I(0)] \]

explained randomness:

\[ \rho^2(\beta) = 1 - \exp(-\Gamma(\beta)) \]

→ measure of predictability that does not depend on scale
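
The two formulas can be tried out on discrete distributions. The sketch below is only an illustration: it treats `p` as the distribution under the model of interest and `q` as the distribution under the null model, which is a simplifying assumption, and all names and numbers are made up.

```python
import math

def kl_discrepancy(p, q):
    """D(p || q) = E_p[log p(X)] - E_p[log q(X)] for discrete distributions."""
    return sum(pi * (math.log(pi) - math.log(qi))
               for pi, qi in zip(p, q) if pi > 0)

def explained_randomness(gamma):
    """rho^2 = 1 - exp(-Gamma)."""
    return 1.0 - math.exp(-gamma)

# Hypothetical distributions on two outcomes.
p = [0.5, 0.5]   # stands in for the model of interest
q = [0.9, 0.1]   # stands in for the null model

gamma = 2.0 * kl_discrepancy(p, q)   # twice the information gain
rho2 = explained_randomness(gamma)
```

The transformation $1 - \exp(-\Gamma)$ maps an information gain of 0 to 0 and an unbounded gain towards 1, which gives $\rho^2$ the same range as an explained-variation measure.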

Explained Randomness based on Kullback-Leibler Info (2)

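\[ I_1(\theta) = \int_{\mathcal{Z}} \int_{\mathcal{T}} \log[f(t \mid z; \theta)] \, f(t \mid z; \beta) \, dt \, dG(z) \]

\[ I_2(\theta) = \int_{\mathcal{T}} \int_{\mathcal{Z}} \log[g(z \mid t; \theta)] \, g(z \mid t; \beta) \, dz \, dF(t) \]

- rank invariant to monotonic transformations of time
- estimation of $g(z \mid t; \beta)$ by $\pi_i(\hat\beta, t)$
- estimation of $F(t)$ by Kaplan-Meier → $W(X_i)$

\[ \hat\Gamma_2(\hat\beta) = 2 \sum_{j=1}^{k} W(X_j) \sum_{i=1}^{n} \pi_i(\hat\beta, X_j) \log\left( \frac{\pi_i(\hat\beta, X_j)}{\pi_i(0, X_j)} \right) \]

$j = 1, \ldots, k$: failures → $T$; $i = 1, \ldots, n$: subjects → $Z$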
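
The estimator can be sketched in code as follows. This assumes one time-invariant covariate, no ties, and that $W(X_j)$ is the jump of the Kaplan-Meier curve at the failure time $X_j$ (the left-continuous survival estimate divided by the number at risk); the function names, the toy data and the value of `beta` are assumptions for illustration.

```python
import math

def pi_probs(beta, t, X, Z):
    """pi_i(beta, t) over all subjects; zero for subjects not at risk."""
    weights = [math.exp(beta * z) if x >= t else 0.0 for x, z in zip(X, Z)]
    total = sum(weights)
    return [w / total for w in weights]

def km_weights(X, delta):
    """W(X_i): Kaplan-Meier jump at X_i for failures, 0 for censored subjects."""
    W = [0.0] * len(X)
    surv = 1.0  # Kaplan-Meier estimate just before the current time
    for i in sorted(range(len(X)), key=lambda k: X[k]):
        n_risk = sum(1 for x in X if x >= X[i])
        if delta[i] == 1:
            W[i] = surv / n_risk
            surv *= 1.0 - 1.0 / n_risk
    return W

def gamma2_hat(beta, X, delta, Z):
    """Weighted sum over failures of the KL distance between pi(beta) and pi(0)."""
    W = km_weights(X, delta)
    total = 0.0
    for j in range(len(X)):
        if delta[j] != 1:
            continue
        p_beta = pi_probs(beta, X[j], X, Z)
        p_null = pi_probs(0.0, X[j], X, Z)
        total += W[j] * sum(pb * math.log(pb / p0)
                            for pb, p0 in zip(p_beta, p_null) if pb > 0)
    return 2.0 * total

# Hypothetical toy data and an arbitrary coefficient value.
X = [2.0, 3.0, 5.0, 7.0, 8.0, 11.0]
delta = [1, 1, 0, 1, 1, 0]
Z = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
gamma = gamma2_hat(0.274, X, delta, Z)
rho2 = 1.0 - math.exp(-gamma)
```

Each inner sum is a Kullback-Leibler distance over the risk set, so the estimate is non-negative and equals 0 when `beta` is 0.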

Explained Variation/Randomness based on Fisher Info (I)

Schoenfeld residuals: $r_i(\beta) = Z_i(X_i) - E_\beta(Z \mid X_i)$ for $\delta_i = 1$

residual sum of squares:

\[ I(\beta) = \sum_{i=1}^{n} \delta_i r_i^2(\beta) \]

total sum of squares:

\[ I(0) = \sum_{i=1}^{n} \delta_i r_i^2(0) \]

explained variation:

\[ R^2(\beta) = 1 - \frac{I(\beta)}{I(0)} \]

Normal model: coincides with the squared coefficient of correlation, i.e. the percentage of explained variation
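
A minimal sketch of this unweighted $R^2$, assuming one time-invariant covariate and no ties; the function names and the toy data are hypothetical.

```python
import math

def expected_z(beta, t, X, Z):
    """E_beta(Z | t) over the risk set at time t."""
    num = den = 0.0
    for x, z in zip(X, Z):
        if x >= t:
            w = math.exp(beta * z)
            num += z * w
            den += w
    return num / den

def sum_sq_residuals(beta, X, delta, Z):
    """I(beta): sum of squared Schoenfeld residuals over the failures."""
    return sum((Z[i] - expected_z(beta, X[i], X, Z)) ** 2
               for i in range(len(X)) if delta[i] == 1)

def r2(beta, X, delta, Z):
    """R^2(beta) = 1 - I(beta) / I(0)."""
    return 1.0 - sum_sq_residuals(beta, X, delta, Z) / sum_sq_residuals(0.0, X, delta, Z)

# Hypothetical toy data.
X = [2.0, 3.0, 5.0, 7.0, 8.0, 11.0]
delta = [1, 1, 0, 1, 1, 0]
Z = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]

r2_value = r2(0.274, X, delta, Z)
```

With only four failures and a weak effect, the resulting value is small, which matches the intuition that the covariate barely sharpens the prediction of who fails next.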

Explained Variation/Randomness based on Fisher Info (II)

multivariate case:

- prognostic index: $\eta(t) = \beta Z(t)$
- individuals with different $Z$ but the same $\eta$ have the same survival probability
- univariate: $Z$ equivalent to $\eta$ → residuals of $Z$ can be used
- multivariate: residuals of $\eta$ instead of $Z$

\[ I(\beta) = \sum_{i=1}^{n} \delta_i [\beta r_i(\beta)]^2 \quad \text{and} \quad I(0) = \sum_{i=1}^{n} \delta_i [\beta r_i(0)]^2 \]

Correction for Censoring

- $R^2$ depends asymptotically on censoring → correction
- weight the squared residuals by an estimate of $F(t)$ → Kaplan-Meier weights (relative jumps of the KM curve):

\[ W(t) = \frac{\hat{S}(t)}{\sum_{i=1}^{n} Y_i(t)} \]

- weighted $R^2$:

\[ R^2(\beta) = 1 - \frac{I(\beta)}{I(0)} = 1 - \frac{\sum_{i=1}^{n} \delta_i W(X_i) r_i^2(\beta)}{\sum_{i=1}^{n} \delta_i W(X_i) r_i^2(0)} \]

- unaffected by independent censoring; coincides with the previous definition in the absence of censoring
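
The censoring correction can be sketched as follows, assuming no ties, one time-invariant covariate, and that $\hat{S}$ in the weight is the left-continuous Kaplan-Meier estimate, so that $W(X_i)$ is the jump of the curve at a failure. Function names and data are illustrative assumptions.

```python
import math

def expected_z(beta, t, X, Z):
    """E_beta(Z | t) over the risk set at time t."""
    num = den = 0.0
    for x, z in zip(X, Z):
        if x >= t:
            w = math.exp(beta * z)
            num += z * w
            den += w
    return num / den

def km_weights(X, delta):
    """W(X_i): Kaplan-Meier jump at X_i for failures, 0 for censored subjects."""
    W = [0.0] * len(X)
    surv = 1.0  # Kaplan-Meier estimate just before the current time
    for i in sorted(range(len(X)), key=lambda k: X[k]):
        n_risk = sum(1 for x in X if x >= X[i])
        if delta[i] == 1:
            W[i] = surv / n_risk
            surv *= 1.0 - 1.0 / n_risk
    return W

def weighted_r2(beta, X, delta, Z, W):
    """1 - I(beta) / I(0) with squared residuals weighted by W(X_i)."""
    def info(b):
        return sum(W[i] * (Z[i] - expected_z(b, X[i], X, Z)) ** 2
                   for i in range(len(X)) if delta[i] == 1)
    return 1.0 - info(beta) / info(0.0)

# Hypothetical toy data with two censored subjects.
X = [2.0, 3.0, 5.0, 7.0, 8.0, 11.0]
delta = [1, 1, 0, 1, 1, 0]
Z = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]

W = km_weights(X, delta)
r2_w = weighted_r2(0.274, X, delta, Z, W)
```

Without censoring every weight equals $1/n$, so the weights cancel in the ratio and the unweighted definition is recovered.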

Population Parameter

- in practice: $R^2(\hat\beta)$ instead of $R^2(\beta)$, with $\hat\beta$ the partial likelihood estimate
- population parameter of $R^2(\hat\beta)$:

\[ \Omega^2(\beta) = 1 - \frac{\int E_\beta\{[Z(t) - E_\beta(Z(t) \mid t)]^2 \mid t\} \, dF(t)}{\int E_\beta\{[Z(t) - E_0(Z(t) \mid t)]^2 \mid t\} \, dF(t)} \]

- $F$: failure time distribution function; $F = 1 - S$
- if $Z$ is time-invariant: $\Omega^2(\beta)$ = explained variation
- $\Omega^2(\beta)$ depends only weakly on the covariate distribution

Properties of $R^2$

- $R^2(0) = 0$
- $R^2(\hat\beta) \le 1$
- $R^2(\hat\beta)$ is invariant under linear transformations of $Z$ and monotonically increasing transformations of $T$
- $R^2(\hat\beta)$ consistently estimates $\Omega^2(\beta)$
- $R^2(\hat\beta)$ is asymptotically normal
- $R^2(\hat\beta)$ can be negative (the best-fitting model then provides a poorer fit than the null model)

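Properties of $\Omega^2$

- $\Omega^2(0) = 0$
- $0 \le \Omega^2(\beta) \le 1$
- $\Omega^2(\beta)$ is invariant under linear transformations of $Z$ and monotonically increasing transformations of $T$
- $\Omega^2(\beta)$ increases with $|\beta|$, and $\Omega^2(\beta) \to 1$ as $|\beta| \to \infty$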

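Interpretation of $R^2$

sum of squares decomposition:

\[ SS_{tot} = \sum_{i=1}^{n} \delta_i W(X_i) r_i^2(0) \]

\[ SS_{res} = \sum_{i=1}^{n} \delta_i W(X_i) r_i^2(\hat\beta) \]

\[ SS_{reg} = \sum_{i=1}^{n} \delta_i W(X_i) [E_{\hat\beta}(Z \mid X_i) - E_0(Z \mid X_i)]^2 \]

for $n \to \infty$: $SS_{tot} = SS_{res} + SS_{reg}$

\[ R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = \frac{SS_{reg}}{SS_{tot}} \]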
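
The three sums of squares can be computed side by side. This sketch assumes an uncensored toy sample, so that every weight is $W(X_i) = 1/n$, a single time-invariant covariate and no ties; names, data and the value of `beta` are hypothetical.

```python
import math

def expected_z(beta, t, X, Z):
    """E_beta(Z | t) over the risk set at time t."""
    num = den = 0.0
    for x, z in zip(X, Z):
        if x >= t:
            w = math.exp(beta * z)
            num += z * w
            den += w
    return num / den

def sums_of_squares(beta, X, delta, Z, W):
    """Return (SStot, SSres, SSreg) evaluated at the coefficient beta."""
    ss_tot = ss_res = ss_reg = 0.0
    for i in range(len(X)):
        if delta[i] != 1:
            continue
        e_beta = expected_z(beta, X[i], X, Z)
        e_null = expected_z(0.0, X[i], X, Z)
        ss_tot += W[i] * (Z[i] - e_null) ** 2
        ss_res += W[i] * (Z[i] - e_beta) ** 2
        ss_reg += W[i] * (e_beta - e_null) ** 2
    return ss_tot, ss_res, ss_reg

# Hypothetical uncensored sample: all weights equal 1/n.
X = [2.0, 3.0, 5.0, 7.0, 8.0, 11.0]
delta = [1, 1, 1, 1, 1, 1]
Z = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
W = [1.0 / len(X)] * len(X)

ss_tot, ss_res, ss_reg = sums_of_squares(0.3, X, delta, Z, W)
gap = ss_tot - (ss_res + ss_reg)  # cross term; need not vanish in finite samples
```

In a sample this small `gap` is generally not zero, since the decomposition is stated only for $n \to \infty$.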

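Interpretation of $\Omega^2$

for time-invariant covariates and independent censoring, $\Omega^2$ can be interpreted as the explained variation of $Z$ by conditioning on $T$:

\[ \Omega^2(\beta) \approx \Omega^2_Z(T) = 1 - \frac{E[\mathrm{Var}(Z \mid T)]}{\mathrm{Var}(Z)} = \frac{\mathrm{Var}[E(Z \mid T)]}{\mathrm{Var}(Z)} \]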

Partial Coefficients

partial effect of $Z_{q+1}, \ldots, Z_p$ after having accounted for the effect of $Z_1, \ldots, Z_q$ ($q < p$):

\[ 1 - R^2(Z_{q+1}, \ldots, Z_p \mid Z_1, \ldots, Z_q) = \frac{1 - R^2(Z_1, \ldots, Z_p)}{1 - R^2(Z_1, \ldots, Z_q)} \]

- $R^2(Z_1, \ldots, Z_p)$: based on the model with the $p$ covariates $Z_1, \ldots, Z_p$
- $R^2(Z_1, \ldots, Z_q)$: based on the model with the $q$ covariates $Z_1, \ldots, Z_q$
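
The partial coefficient follows from the two $R^2$ values by simple arithmetic. A short sketch; the function name is an assumption, and the inputs are the rounded $R^2$ values 0.12, 0.26 and 0.33 reported in the multivariate breast cancer table of this talk.

```python
def partial_r2(r2_full, r2_reduced):
    """Partial R^2 of the added covariates, given R^2 of the full and reduced models."""
    return 1.0 - (1.0 - r2_full) / (1.0 - r2_reduced)

# Adding Stage to a model with Age and Hist (rounded values from the talk).
stage_given_age_hist = partial_r2(0.26, 0.12)

# Adding Prog to a model with Age, Hist and Stage.
prog_given_rest = partial_r2(0.33, 0.26)
```

Because the inputs are rounded to two decimals, the results only approximately reproduce the partial $R^2$ values listed in the table.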

Stratified Model

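stratum $s = 1, \ldots, S$:

\[ r_i(b; s) = Z_{is}(X_{is}) - E_b(Z \mid X_{is}) \]

\[ I(b) = \sum_{i=1}^{n} \sum_{s=1}^{S} \delta_{is} W(X_{is}) r_i^2(b; s) \]

\[ R^2(\beta) = 1 - \frac{I(\beta)}{I(0)} = 1 - \frac{\sum_{i=1}^{n} \sum_{s=1}^{S} \delta_{is} W(X_{is}) r_i^2(\beta; s)}{\sum_{i=1}^{n} \sum_{s=1}^{S} \delta_{is} W(X_{is}) r_i^2(0; s)} \]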

Other Relative Risk Models

general form of relative risk $r(t, z)$:

\[ \pi_i(t) = \frac{Y_i(t) \hat{r}(t, Z_i)}{\sum_{j=1}^{n} Y_j(t) \hat{r}(t, Z_j)} \]

→ similar definition of $R^2$

risk functions:
- $r(t, z) = \exp(\beta z)$ → Cox
- $r(t, z) = 1 + \beta z$
- $r(t, z) = \exp(\beta(t) z)$ → time-varying regression effects
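
The generalization amounts to swapping the risk function inside $\pi_i(t)$. A minimal sketch, assuming a time-invariant covariate; the function names, the value of `BETA`, the particular form chosen for $\beta(t)$ and the toy data are all hypothetical.

```python
import math

def pi_general(t, X, Z, risk):
    """pi_i(t) for an arbitrary relative risk function risk(t, z)."""
    weights = [risk(t, z) if x >= t else 0.0 for x, z in zip(X, Z)]
    total = sum(weights)
    return [w / total for w in weights]

BETA = 1.0  # arbitrary illustrative coefficient

def cox_risk(t, z):
    return math.exp(BETA * z)

def linear_risk(t, z):
    # Only valid while 1 + BETA * z stays positive.
    return 1.0 + BETA * z

def time_varying_risk(t, z):
    # Hypothetical beta(t) = BETA / (1 + t): an effect that fades over time.
    return math.exp(BETA / (1.0 + t) * z)

# Hypothetical toy data.
X = [2.0, 3.0, 5.0, 7.0, 8.0, 11.0]
Z = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]

pi_linear = pi_general(7.0, X, Z, linear_risk)
pi_cox = pi_general(7.0, X, Z, cox_risk)
pi_tv = pi_general(7.0, X, Z, time_varying_risk)
```

Once the probabilities are available, the residuals and sums of squares are formed exactly as in the Cox case, which is why the definition of $R^2$ carries over.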

Breast Cancer: Univariate Analysis

covariate   $\hat\beta$   p-value   $R^2$
Age         -0.24         < 0.01    0.005
Hist         0.37         < 0.01    0.12
Stage        0.53         < 0.01    0.20
Prog        -0.73         < 0.01    0.07
Size         0.02         < 0.01    0.18

- all variables are significant, but their prognostic power differs
- stage and tumor size: high predictability
- non-proportional regression effects for histology → model with time-varying coefficient for histology: $R^2 = 0.24$
- age has only a weak effect; suboptimal coding? → recoding in three groups (0-33, 34-40, >40), either linear (1-2-3) or by two binary variables → similar $R^2$, keep the simplest model (1-2-3)

Breast Cancer: Multivariate Analysis

covariates                          $R^2$   partial $R^2$
Age                                 0.01
Age and Hist                        0.12    0.12
Age, Hist, and Stage                0.26    0.16
Age, Hist, Stage, and Prog          0.33    0.09
Age, Hist, Stage, Prog, and Size    0.33    0.01

- the partial $R^2$ for size, having accounted for the other covariates, is small → the extra amount of variation in survival explained by size is limited

References

Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society. Series B (Methodological), pages 187–220.

Cox, D. R. (1975). Partial likelihood. Biometrika, 62(2):269–276.

Crowley, J. and Hoering, A. (2012). Handbook of Statistics in Clinical Oncology. Chapman and Hall.

Nagelkerke, N. J. (1991). A note on a general definition of the coefficient of determination. Biometrika, 78(3):691–692.

O’Quigley, J. and Flandre, P. (1994). Predictive capability of proportional hazards regression. Proceedings of the National Academy of Sciences, 91(6):2310–2314.

Schoenfeld, D. (1982). Partial residuals for the proportional hazards regression model. Biometrika, 69(1):239–241.

Xu, R. (1996). Inference for the proportional hazards model. PhD thesis, University of California, San Diego, CA.

Xu, R. and O’Quigley, J. (2000). Estimating average regression effect under non-proportional hazards. Biostatistics, 1(4):423–439.
