1
Survival Analysis: Weeks 2-3
Lu Tian and Richard Olshen Stanford University 2
Kaplan-Meier(KM) Estimator
• Nonparametric estimation of the survival function S(t) = pr(T > t) • The nonparametric estimation is more robust and does not depend on any parametric assumption. 3
If there is no censoring
• S(t) can be consistently estimated by ∑n −1 Sˆ(t) = n I(Ti > t). i=1
• Sˆ is a discrete distribution with mass probability of n−1 at
observed times T1, ··· ,Tn
• Sˆ(t) is the nonparametric maximum likelihood estimator (NPMLE) for S(t). 4
CDF as NPMLE
• Assuming that F (·) is discrete with mass probability at
T1 < T2 < ··· < Tn, where {T1,T2, ···} are observed times.
• Let f1 = pr(T = T1), f2 = pr(T = T2), ··· .
• Objective: estimate f1, f2, ··· . ∏ ∑ • n n Method: maximize i=1 fi subject to i=1 fi = 1. ˆ ˆ ˆ −1 • Solution: f1 = f2 = ··· = fn = n . 5
Data and Assumptions
• Data: {(Ui, δi), i = 1, ··· , n} where Ui = min(Ti,Ci) and
δi = I(Ti ≤ Ci). • Assumptions:
1. T1, ··· ,Tn i.i.d. ∼ F (·) = 1 − S(·)
2. C1, ··· ,Cn i.i.d. ∼ G(·)
3. Ti ⊥ Ci, i = 1, ··· , n. Noninformative censoring! 6
If there is censoring
• Assuming that F (·) is discrete with mass probability at
v1 < v2 < ··· , where {v1, v2, ···} are observed times.
• Let f1 = pr(T = v1), f2 = pr(T = v2), ··· .
• Objective: estimate f1, f2, ··· . 7
Example
• Obs: 2, 2, 3+, 5, 5+, 7, 9, 16, 16, 18+, where + means censored
+ • v1 = 2; v2 = 3, v3 = 5, v4 = 7, v5 = 9, v6 = 16, v7 = 18, v8 = 18
• The likelihood function in terms of (f1, f2, ··· ):
2 2 L(F ) = f1 (f3+f4+f5+f6+f7+f8)f3(f4+f5+f6+f7+f8)f4f5f6 f8,
where f1 + f2 + f3 + f4 + f5 + f6 + f7 + f8 = 1 8
Reparametrization tricks
• The discrete hazard function: h1 = pr(T = v1) and
hj = pr(T = vj|T > vj−1), j > 2
• For t ∈ [vj, vj+1)
∏j S(t) = pr(T > t) = pr(T > vj) = (1 − hi). i=1
• For t = vj
j∏−1 fj = f(t) = pr(T = t) = hj (1 − hi). i=1 9
Reparametrization tricks
• The likelihood function in terms of (h1, h2, ··· ): 2 × { − − } × { − − } L(F ) =h1 (1 h1)(1 h2) (1 h1)(1 h2)h3
× {(1 − h1)(1 − h2)(1 − h3)} × {(1 − h1)(1 − h2)(1 − h3)h4}
× {(1 − h1)(1 − h2)(1 − h3)(1 − h4)h5} 2 × {(1 − h1)(1 − h2)(1 − h3)(1 − h4)(1 − h5)h6}
× {(1 − h1)(1 − h2)(1 − h3)(1 − h4)(1 − h5)(1 − h6)(1 − h7)} 2 − 8 × − 8 × − 6 =h1(1 h1) (1 h2) h3(1 h3) − 4 × − 3 × 2 − × − h4(1 h4) h5(1 h5) h6(1 h6) (1 h7) 10
KM estimation
• The likelihood function ∏ d − j − Y (vj ) dj L(F ) = hj (1 hj) j where ∑n dj = δiI(Ui = vj) = #failures at vj i=1 ∑n Y (vj) = I(Ui ≥ vj) = #“at risk” at vj. i=1 11
KM estimation
ˆ • hj = dj/Y (vj)
• 1 t < v1 Sˆ(t) = ∏ , j − ˆ ≤ i=1(1 hi) vj t < vj+1 which is the Kaplan-Meier Estimator. 12
Example
∏j vj Y (vj ) dj hˆj Sˆ(vj ) = (1 − hˆi) = Pˆ(T > vj ) i=1
2 10 2 2/10 .8 × 6 5 7 1 1/7 .69 (= .8 7 ) × 4 7 5 1 1/5 .55 (= .69 5 ) × 3 9 4 1 1/4 .41 (= .55 4 ) × 1 16 3 2 2/3 .14 (= .41 3 ) 18 1 0 0 .14 13
KM estimation
• Suppose that vg denotes the largest vj for which Y (vj) > 0.
1. if dg = Y (vj), then Sˆ(t) = 0 for t ≥ vg
2. if dg < Y (vj), then Sˆ(t) > 0 but not defined for t > vg. (Not
identifiable beyond vg.) • The survival distribution may not be estimable with right-censored data. Implicit extrapolation is sometimes used. • The KM estimator can also be used to estimate the survival function for the censoring distribution. 14
KM estimation
• KM estimator is a special MLE − ˆ 1 S(t) = argmaxF L(F ) where F is the CDF for all discrete random variables (nonparametric MLE). 15
Self-Consistency
∑ • ˆ −1 n No censoring S(t) = n i=1 I(Ti > t) ∑ • ˆ −1 n | Right censoring: S(t) = n i=1 E(I(Ti > t) Ui, δi)
1. E(I(Ti > t)|Ui, δi = 1) = I(Ui > t)
2. E(I(Ti > t)|Ui, δi = 0) = S(t)/S(Ui)I(t ≥ Ui) + I(Ui > t) • Self-consistency iteration: { } ∑n Sˆ (t) Sˆ (t) = n−1 I(U > t) + (1 − δ ) old I(U ≤ t) new i i ˆ i i=1 Sold(Ui) • The solution is still the KM estimator. 16
Redistribution of Mass
Step 1 2 2 3+ 5 5+ 7 9 16 16 18+
1 1 1 1 1 1 1 1 1 1 Step 2 10 10 10 10 10 10 10 10 10 10 ↓ ↓ → 1 1 1 1 1 1 1 Step 3 , 70 70 70 70 70 70 70
↓ ↓ ↓ → 1 8 1 8 1 8 1 8 1 8 , 5 ( 70 ) 5 ( 70 ) 5 ( 70 ) 5 ( 70 ) 5 ( 70 )
| {z } | {z } | {z } 2 8 24 24 48 Total 10 0 70 0 175 175 175 Assume Mass this is somewhere > 18 17
Nelson-Aalen Estimator
• How to estimate the cumulative hazard function? ∑j ˆ Hˆ (t) = hi for vj ≤ t < vj+1 i=1
• H(t) = − log{S(t)}
∑j ∑j ˆ ˆ − log{Sˆ(t)} = {− log(1 − hi)} ≈ hi = Hˆ (t) i=1 i=1
for vj ≤ t < vj+1. 18
Asymptotical properties of KM estimator
• As n → ∞, Sˆ(t) → S(t) in probability. • As n → ∞, n1/2{Sˆ(t) − S(t)} converges to N(0, σ2(t)) in distribution. 19
Asymptotical properties of KM estimator
• How to estimate the variance of Sˆ(t) ˆ • hi is an estimated probability. ˆ • The variance of hi can be approximated by ˆ ˆ hi(1 − hi) di(Y (vi) − di) = 3 Y (vi) Y (vi) ˆ ˆ • hi and hj are asymptotically independent (why?). 20
Asymptotical variance of KM estimator
( ) ∑j ( ) ˆ For vj ≤ t < vj+1 : Var lnSˆ(t) ≈ Var ln(1 − hi) i=1
∑j ( ) δ−method ˆ 1 = Var hi · − ˆ 2 i=1 (1 hi)
j ∑ d = i . Y (v )(Y (v ) − d ) i=1 i i i 21
Asymptotical variance of KM estimator
Using the δ-method
( ) ( )( )2 Var Sˆ(t) ≈ Var lnSˆ(t) elnSˆ(t)
( ) = Sˆ(t)2Var lnSˆ(t)
j ∑ d ≈ Sˆ(t)2 i (v ≤ t < v ) Y (v )(Y (v ) − d ) j j+1 i=1 i i i =σ ˆ2(t)
Greenwood’s formula 22
By-Product
The by-product of the Greenwood’s formula is the variance estimator for Nelson-Aalen Estimator:
j ∑ d var(Hˆ (t)) = i , v ≤ t < v Y (v )(Y (v ) − d ) j j+1 i=1 i i i 23
Confidence Interval
• Sˆ(t) 1.96ˆσ(t), drawbacks?
• By δ-method σˆ2(t) var(log(− log(Sˆ(t)))) = (log(Sˆ(t)))2Sˆ(t)2
• The confidence interval for Sˆ(t) [ ] ˆ log(− log(Sˆ(t)))− 1.96σ(t) log(− log(Sˆ(t)))+ 1.96ˆσ(t) exp{−e log(Sˆ(t))Sˆ(t) }, exp{−e log(Sˆ(t))Sˆ(t) } 24
Median survival time
• How to estimate the median survival time
• Solving Sˆ(tˆM ) = 1/2, not always solvable!
• How to construct the CI for the median survival time? Suppose that
1. pr(SˆL(t) < S(t)) = pr(SˆU (t) > S(t)) = 0.975.
2. SˆL(tˆML) = 0.5
3. SˆU (tˆMU ) = 0.5
4. The confidence interval for tM is [tˆML, tˆMU ]. 25
CI for median survival time
25403086 3445 0.0 0.2 0.4 0.6 0.8 1.0
0 1000 2000 3000 4000 26
Median survival time
0.975 = pr(SˆL(tM ) < S(tM )) = pr(SˆL(tM ) < 0.5)
= pr(SˆL(tM ) < SˆL(tˆML)) = pr(tM ≥ tˆML)
0.975 = pr(SˆU (tM ) > S(tM )) = pr(SˆU (tM ) > 0.5)
= pr(SˆU (tM ) > SˆU (tˆMU )) = pr(tM ≤ tˆMU ) 27
Restricted mean survival time
• The area under the survival curve is a nice summary for the curve ∫ • τ The AUC µ = 0 S(t)dt : ∫ τ τ
µ = tS(t) + tf(t)dt 0 0 ∫ ∫ τ ∞ = τS(τ) + tf(t)dt = min(t, τ)f(t)dt = E{min(t, τ)} 0 0 • µ can be estimated as ∫ τ Sˆ(t)dt. 0 28
Restricted mean survival time
• The restricted mean survival time E{min(T, τ)} can also be estimated as ∑n δ + (1 − δ )I(U ≥ τ) µˆ = n−1 i i i T ∧ τ, IPW b ∧ i i=1 SC (Ti τ) b where SC (·) is a consistent estimator of the survival function of the censoring time C. (How?) • Rational [ ] ≥ ∧ ≥ ∧ | I(Ci τ Ti) ∧ | ≈ ∧ P (Ci τ Ti Ti) ∧ E b Ti τ Ti (Ti τ) ∧ = Ti τ SC (Ti ∧ τ) SC (Ti τ)
• This type of estimator is called the inverse probability weighting estimator 29
Restricted mean survival time
• µˆ andµ ˆIPW are equivalent! • The variance of the estimated area under the survival curve is complicated (the derivation will be given later). ∫ {∫ } τ τ 2 1 t S(u)du h(t)dt ≥ . n 0 P (U t) 30
Area under the KM curve 0.0 0.2 0.4 0.6 0.8 1.0
0 1000 2000 3000 4000 31
Model Checking
• Under the exponential distribution: log(S(t)) = −λt • Plot t vs. log(Sˆ(t)) to visually check the expected linear pattern. • Under the Weibull distribution log(− log(S(t))) = p log(λ) + p log(t)
• Plot log(t) vs. log(− log(Sˆ(t))) to visually check the expected linear pattern. 32
Model Checking log(−log(S(t))) −5 −4 −3 −2 −1 0
4 5 6 7 8
log(t) 33
Coming Soon
• The distribution of Sˆ(t) as a process.