Survival Analysis: Weeks 2-3

Home , Survival analysis, Survival function

Lu Tian and Richard Olshen Stanford University 2

Kaplan-Meier(KM) Estimator

• Nonparametric estimation of the survival function S(t) = pr(T > t) • The nonparametric estimation is more robust and does not depend on any parametric assumption. 3

If there is no censoring

• S(t) can be consistently estimated by ∑n −1 Sˆ(t) = n I(Ti > t). i=1

• Sˆ is a discrete distribution with mass probability of n−1 at

observed times T1, ··· ,Tn

• Sˆ(t) is the nonparametric maximum likelihood estimator (NPMLE) for S(t). 4

CDF as NPMLE

• Assuming that F (·) is discrete with mass probability at

T1 < T2 < ··· < Tn, where {T1,T2, ···} are observed times.

• Let f1 = pr(T = T1), f2 = pr(T = T2), ··· .

• Objective: estimate f1, f2, ··· . ∏ ∑ • n n Method: maximize i=1 fi subject to i=1 fi = 1. ˆ ˆ ˆ −1 • Solution: f1 = f2 = ··· = fn = n . 5

Data and Assumptions

• Data: {(Ui, δi), i = 1, ··· , n} where Ui = min(Ti,Ci) and

δi = I(Ti ≤ Ci). • Assumptions:

1. T1, ··· ,Tn i.i.d. ∼ F (·) = 1 − S(·)

2. C1, ··· ,Cn i.i.d. ∼ G(·)

3. Ti ⊥ Ci, i = 1, ··· , n. Noninformative censoring! 6

If there is censoring

• Assuming that F (·) is discrete with mass probability at

v1 < v2 < ··· , where {v1, v2, ···} are observed times.

• Let f1 = pr(T = v1), f2 = pr(T = v2), ··· .

• Objective: estimate f1, f2, ··· . 7

Example

• Obs: 2, 2, 3+, 5, 5+, 7, 9, 16, 16, 18+, where + means censored

+ • v1 = 2; v2 = 3, v3 = 5, v4 = 7, v5 = 9, v6 = 16, v7 = 18, v8 = 18

• The likelihood function in terms of (f1, f2, ··· ):

2 2 L(F ) = f1 (f3+f4+f5+f6+f7+f8)f3(f4+f5+f6+f7+f8)f4f5f6 f8,

where f1 + f2 + f3 + f4 + f5 + f6 + f7 + f8 = 1 8

Reparametrization tricks

• The discrete hazard function: h1 = pr(T = v1) and

hj = pr(T = vj|T > vj−1), j > 2

• For t ∈ [vj, vj+1)

∏j S(t) = pr(T > t) = pr(T > vj) = (1 − hi). i=1

• For t = vj

j∏−1 fj = f(t) = pr(T = t) = hj (1 − hi). i=1 9

Reparametrization tricks

• The likelihood function in terms of (h1, h2, ··· ): 2 × { − − } × { − − } L(F ) =h1 (1 h1)(1 h2) (1 h1)(1 h2)h3

× {(1 − h1)(1 − h2)(1 − h3)} × {(1 − h1)(1 − h2)(1 − h3)h4}

× {(1 − h1)(1 − h2)(1 − h3)(1 − h4)h5} 2 × {(1 − h1)(1 − h2)(1 − h3)(1 − h4)(1 − h5)h6}

× {(1 − h1)(1 − h2)(1 − h3)(1 − h4)(1 − h5)(1 − h6)(1 − h7)} 2 − 8 × − 8 × − 6 =h1(1 h1) (1 h2) h3(1 h3) − 4 × − 3 × 2 − × − h4(1 h4) h5(1 h5) h6(1 h6) (1 h7) 10

KM estimation

• The likelihood function ∏ d − j − Y (vj ) dj L(F ) = hj (1 hj) j where ∑n dj = δiI(Ui = vj) = #failures at vj i=1 ∑n Y (vj) = I(Ui ≥ vj) = #“at risk” at vj. i=1 11

KM estimation

ˆ • hj = dj/Y (vj)

•   1 t < v1 Sˆ(t) = ∏ ,  j − ˆ ≤ i=1(1 hi) vj t < vj+1 which is the Kaplan-Meier Estimator. 12

Example

∏j vj Y (vj ) dj hˆj Sˆ(vj ) = (1 − hˆi) = Pˆ(T > vj ) i=1

2 10 2 2/10 .8 × 6 5 7 1 1/7 .69 (= .8 7 ) × 4 7 5 1 1/5 .55 (= .69 5 ) × 3 9 4 1 1/4 .41 (= .55 4 ) × 1 16 3 2 2/3 .14 (= .41 3 ) 18 1 0 0 .14 13

KM estimation

• Suppose that vg denotes the largest vj for which Y (vj) > 0.

1. if dg = Y (vj), then Sˆ(t) = 0 for t ≥ vg

2. if dg < Y (vj), then Sˆ(t) > 0 but not deﬁned for t > vg. (Not

identiﬁable beyond vg.) • The survival distribution may not be estimable with right-censored data. Implicit extrapolation is sometimes used. • The KM estimator can also be used to estimate the survival function for the censoring distribution. 14

KM estimation

• KM estimator is a special MLE − ˆ 1 S(t) = argmaxF L(F ) where F is the CDF for all discrete random variables (nonparametric MLE). 15

Self-Consistency

∑ • ˆ −1 n No censoring S(t) = n i=1 I(Ti > t) ∑ • ˆ −1 n | Right censoring: S(t) = n i=1 E(I(Ti > t) Ui, δi)

1. E(I(Ti > t)|Ui, δi = 1) = I(Ui > t)

2. E(I(Ti > t)|Ui, δi = 0) = S(t)/S(Ui)I(t ≥ Ui) + I(Ui > t) • Self-consistency iteration: { } ∑n Sˆ (t) Sˆ (t) = n−1 I(U > t) + (1 − δ ) old I(U ≤ t) new i i ˆ i i=1 Sold(Ui) • The solution is still the KM estimator. 16

Redistribution of Mass

Step 1 2 2 3+ 5 5+ 7 9 16 16 18+

1 1 1 1 1 1 1 1 1 1 Step 2 10 10 10 10 10 10 10 10 10 10 ↓ ↓ → 1 1 1 1 1 1 1 Step 3 , 70 70 70 70 70 70 70

↓ ↓ ↓ → 1 8 1 8 1 8 1 8 1 8 , 5 ( 70 ) 5 ( 70 ) 5 ( 70 ) 5 ( 70 ) 5 ( 70 )

| {z } | {z } | {z } 2 8 24 24 48 Total 10 0 70 0 175 175 175 Assume Mass this is somewhere > 18 17

Nelson-Aalen Estimator

• How to estimate the cumulative hazard function? ∑j ˆ Hˆ (t) = hi for vj ≤ t < vj+1 i=1

• H(t) = − log{S(t)}

∑j ∑j ˆ ˆ − log{Sˆ(t)} = {− log(1 − hi)} ≈ hi = Hˆ (t) i=1 i=1

for vj ≤ t < vj+1. 18

Asymptotical properties of KM estimator

• As n → ∞, Sˆ(t) → S(t) in probability. • As n → ∞, n1/2{Sˆ(t) − S(t)} converges to N(0, σ2(t)) in distribution. 19

Asymptotical properties of KM estimator

• How to estimate the variance of Sˆ(t) ˆ • hi is an estimated probability. ˆ • The variance of hi can be approximated by ˆ ˆ hi(1 − hi) di(Y (vi) − di) = 3 Y (vi) Y (vi) ˆ ˆ • hi and hj are asymptotically independent (why?). 20

Asymptotical variance of KM estimator

( ) ∑j ( ) ˆ For vj ≤ t < vj+1 : Var lnSˆ(t) ≈ Var ln(1 − hi) i=1

∑j ( ) δ−method ˆ 1 = Var hi · − ˆ 2 i=1 (1 hi)

j ∑ d = i . Y (v )(Y (v ) − d ) i=1 i i i 21

Asymptotical variance of KM estimator

Using the δ-method

( ) ( )( )2 Var Sˆ(t) ≈ Var lnSˆ(t) elnSˆ(t)

( ) = Sˆ(t)2Var lnSˆ(t)

j ∑ d ≈ Sˆ(t)2 i (v ≤ t < v ) Y (v )(Y (v ) − d ) j j+1 i=1 i i i =σ ˆ2(t)

Greenwood’s formula 22

By-Product

The by-product of the Greenwood’s formula is the variance estimator for Nelson-Aalen Estimator:

j ∑ d var(Hˆ (t)) = i , v ≤ t < v Y (v )(Y (v ) − d ) j j+1 i=1 i i i 23

Conﬁdence Interval

• Sˆ(t) 1.96ˆσ(t), drawbacks?

• By δ-method σˆ2(t) var(log(− log(Sˆ(t)))) = (log(Sˆ(t)))2Sˆ(t)2

• The conﬁdence interval for Sˆ(t) [ ] ˆ log(− log(Sˆ(t)))− 1.96σ(t) log(− log(Sˆ(t)))+ 1.96ˆσ(t) exp{−e log(Sˆ(t))Sˆ(t) }, exp{−e log(Sˆ(t))Sˆ(t) } 24

Median survival time

• How to estimate the median survival time

• Solving Sˆ(tˆM ) = 1/2, not always solvable!

• How to construct the CI for the median survival time? Suppose that

1. pr(SˆL(t) < S(t)) = pr(SˆU (t) > S(t)) = 0.975.

2. SˆL(tˆML) = 0.5

3. SˆU (tˆMU ) = 0.5

4. The conﬁdence interval for tM is [tˆML, tˆMU ]. 25

CI for median survival time

25403086 3445 0.0 0.2 0.4 0.6 0.8 1.0

0 1000 2000 3000 4000 26

Median survival time

0.975 = pr(SˆL(tM ) < S(tM )) = pr(SˆL(tM ) < 0.5)

= pr(SˆL(tM ) < SˆL(tˆML)) = pr(tM ≥ tˆML)

0.975 = pr(SˆU (tM ) > S(tM )) = pr(SˆU (tM ) > 0.5)

= pr(SˆU (tM ) > SˆU (tˆMU )) = pr(tM ≤ tˆMU ) 27

Restricted mean survival time

• The area under the survival curve is a nice summary for the curve ∫ • τ The AUC µ = 0 S(t)dt : ∫ τ τ

µ = tS(t) + tf(t)dt 0 0 ∫ ∫ τ ∞ = τS(τ) + tf(t)dt = min(t, τ)f(t)dt = E{min(t, τ)} 0 0 • µ can be estimated as ∫ τ Sˆ(t)dt. 0 28

Restricted mean survival time

• The restricted mean survival time E{min(T, τ)} can also be estimated as ∑n δ + (1 − δ )I(U ≥ τ) µˆ = n−1 i i i T ∧ τ, IPW b ∧ i i=1 SC (Ti τ) b where SC (·) is a consistent estimator of the survival function of the censoring time C. (How?) • Rational [ ] ≥ ∧ ≥ ∧ | I(Ci τ Ti) ∧ | ≈ ∧ P (Ci τ Ti Ti) ∧ E b Ti τ Ti (Ti τ) ∧ = Ti τ SC (Ti ∧ τ) SC (Ti τ)

• This type of estimator is called the inverse probability weighting estimator 29

Restricted mean survival time

• µˆ andµ ˆIPW are equivalent! • The variance of the estimated area under the survival curve is complicated (the derivation will be given later). ∫ {∫ } τ τ 2 1 t S(u)du h(t)dt ≥ . n 0 P (U t) 30

Area under the KM curve 0.0 0.2 0.4 0.6 0.8 1.0

0 1000 2000 3000 4000 31

Model Checking

• Under the exponential distribution: log(S(t)) = −λt • Plot t vs. log(Sˆ(t)) to visually check the expected linear pattern. • Under the Weibull distribution log(− log(S(t))) = p log(λ) + p log(t)

• Plot log(t) vs. log(− log(Sˆ(t))) to visually check the expected linear pattern. 32

Model Checking log(−log(S(t))) −5 −4 −3 −2 −1 0

4 5 6 7 8

log(t) 33

Coming Soon

• The distribution of Sˆ(t) as a process.