Survival Analysis: Counting Process and Martingale
Lu Tian and Richard Olshen Stanford University
1 Lebesgue-Stieltjes Integrals
• G(·) is a right-continuous step function having jumps at x_1, x_2, ···. Then
  ∫_a^b f(x)dG(x) = Σ_{a < x_j ≤ b} f(x_j){G(x_j) − G(x_j−)} = Σ_{a < x_j ≤ b} f(x_j)∆G(x_j).

2 One-sample problem

• In the one-sample problem: observations (U_i, δ_i), i = 1, 2, ···, n.

• N(t) = Σ_{i=1}^n I(U_i ≤ t, δ_i = 1), the number of observed failures by time t. If τ_1 < ··· < τ_K are the distinct observed failure times and d_j is the number of failures at τ_j, then
  ∫_0^∞ f(t)dN(t) = Σ_{j=1}^K f(τ_j)d_j.

• With f(t) = {Σ_{i=1}^n I(U_i ≥ t)}^{−1} = 1/Y(t),
  ∫_0^∞ f(t)dN(t) = Σ_{j=1}^K d_j/Y(τ_j),
  the NA estimator for the cumulative hazard function!

3 Two-sample problem

• In the two-sample problem: observations (U_i, δ_i, Z_i), i = 1, 2, ···, n.

• N_l(t) = Σ_{i=1}^n I(U_i ≤ t, δ_i = 1, Z_i = l), the number of observed failures by time t in group l, l = 0, 1.

• Y_l(t) = Σ_{i=1}^n I(U_i ≥ t, Z_i = l), the number at risk at time t in group l, l = 0, 1.

• Consider the stochastic integral
  W = ∫_0^∞ {Y_0(t)/Y(t)}dN_1(t) − ∫_0^∞ {Y_1(t)/Y(t)}dN_0(t).

4 Two-sample problem

• With d_{lj} the number of failures at τ_j in group l and d_j = d_{0j} + d_{1j},
  W = Σ_{j=1}^K {Y_0(τ_j)/Y(τ_j)}∆N_1(τ_j) − Σ_{j=1}^K {Y_1(τ_j)/Y(τ_j)}∆N_0(τ_j)
    = Σ_{j=1}^K [{Y_0(τ_j)/Y(τ_j)}d_{1j} − {Y_1(τ_j)/Y(τ_j)}d_{0j}]
    = Σ_{j=1}^K {Y_0(τ_j)d_{1j} − Y_1(τ_j)(d_j − d_{1j})}/Y(τ_j)
    = Σ_j {d_{1j} − d_j · Y_1(τ_j)/Y(τ_j)} = Σ_j (O_j − E_j).

5 Regression problem

• In Cox regression: observations (U_i, δ_i, Z_i), i = 1, 2, ···, n.

• N_i(t) = I(U_i ≤ t, δ_i = 1).

• The score function related to the PL function is
  S_n(β) = Σ_{i=1}^n δ_i{Z_i − S^{(1)}(β, U_i)/S^{(0)}(β, U_i)}
         = Σ_{i=1}^n ∫_0^∞ {Z_i − S^{(1)}(β, s)/S^{(0)}(β, s)}dN_i(s).

6 σ-algebra

• F is a non-empty collection of subsets of Ω. F is called a σ-algebra if
  1. If E ∈ F, then E^c ∈ F.
  2. If E_i ∈ F, i = 1, 2, ···, then ∪_{i=1}^∞ E_i ∈ F.

• A probability space (Ω, F, P) is an abstract space Ω equipped with a σ-algebra F and a set function P defined on F such that
  1. P{E} ≥ 0 when E ∈ F.
  2. P{Ω} = 1.
  3. P{∪_{i=1}^∞ E_i} = Σ_{i=1}^∞ P{E_i} when E_i ∈ F, i = 1, 2, ···, and E_i ∩ E_j = ∅ for i ≠ j.

8 Random Variables and Stochastic Process

• (Ω, F, P) is a probability space. A random variable X : Ω → R is a real-valued function satisfying {ω : X(ω) ∈ (−∞, b]} ∈ F for any b.
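For a finite Ω, the σ-algebra and measurability definitions above can be checked exhaustively. The sketch below, with an Ω and functions of our own choosing (none of this is from the slides), builds the σ-algebra generated by a two-cell partition, verifies closure under complement and union, and checks {ω : X(ω) ≤ b} ∈ F for each threshold b:

```python
# Sketch: sigma-algebra generated by a finite partition, plus a measurability
# check in the spirit of {w : X(w) <= b} in F. Illustrative choices only.

from itertools import combinations

def sigma_from_partition(partition):
    """All unions of partition cells: the generated sigma-algebra."""
    cells = [frozenset(c) for c in partition]
    sigma = set()
    for r in range(len(cells) + 1):
        for combo in combinations(cells, r):
            sigma.add(frozenset().union(*combo))
    return sigma

def is_measurable(X, omega, sigma):
    """Check {w : X(w) <= b} in sigma for every relevant threshold b."""
    return all(frozenset(w for w in omega if X[w] <= b) in sigma
               for b in set(X.values()))

omega = {0, 1, 2, 3}
sigma = sigma_from_partition([{0, 1}, {2, 3}])

# Closure under complement and (finite) union:
assert all(frozenset(omega - E) in sigma for E in sigma)
assert all(E | F in sigma for E in sigma for F in sigma)

# Constant on cells -> measurable; separating 0 from 1 -> not measurable.
assert is_measurable({0: 1.0, 1: 1.0, 2: 2.0, 3: 2.0}, omega, sigma)
assert not is_measurable({0: 1.0, 1: 2.0, 2: 2.0, 3: 2.0}, omega, sigma)
```

A function is measurable with respect to a partition-generated σ-algebra exactly when it is constant on each cell, which is what the last two assertions exhibit.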
• A stochastic process is a family of random variables X = {X(t) : t ∈ Γ} indexed by a set Γ, all defined on the same probability space (Ω, F, P). Often Γ = [0, ∞) in survival analysis.

9 Filtration and Adaptability

• A family of σ-algebras {F_t : t ≥ 0} is called a filtration of a probability space (Ω, F, P) if F_t ⊂ F and F_s ⊂ F_t for s ≤ t.

• A filtration {F_t : t ≥ 0} is right-continuous if F_{t+} = F_t for any t, where F_{t+} = ∩_{ε>0} F_{t+ε}.

• A stochastic process {X(t) : t ≥ 0} is said to be adapted to {F_t : t ≥ 0} if X(t) is F_t-measurable for each t, i.e., {ω : X(t; ω) ∈ (−∞, b]} ∈ F_t for any b.

10 History of a stochastic process

• Let X = {X(t) : t ≥ 0} be a stochastic process, and let F_t = σ{X(s) : 0 ≤ s ≤ t}, the smallest σ-algebra making all of the random variables X(s), 0 ≤ s ≤ t, measurable. The filtration {F_t : t ≥ 0} is called the history of X.

11 Conditional Expectation

• Suppose that Y is a random variable on a probability space (Ω, F, P) and let G be a sub-σ-algebra of F. Let X be a random variable satisfying
  1. X is G-measurable;
  2. ∫_B Y dP = ∫_B X dP for all sets B ∈ G.
  The random variable X is then called the conditional expectation of Y given G and is denoted by E(Y|G).

12 Conditional Expectation

• If F_t = {∅, Ω}, then E(X|F_t) = E(X).
• E{E(X|F_t)} = E(X).
• If F_s ⊂ F_t, then E{E(X|F_s)|F_t} = E(X|F_s).
• If Y is F_t-measurable, then E(XY|F_t) = Y E(X|F_t).
• If X and Y are independent, then E(Y|σ(X)) = E(Y).
• E(aX + bY|F_t) = aE(X|F_t) + bE(Y|F_t).
• If g(·) is a convex function, then E{g(X)|F_t} ≥ g{E(X|F_t)}.
• E(Y|σ(X_t, t ∈ Γ)) = E(Y|X_t, t ∈ Γ).

13 Example of conditional expectation

• The probability space (Ω, F, P):
  1. Ω = [0, 1);
  2. F is the Borel σ-algebra generated by all the open intervals (a, b) ⊂ [0, 1);
  3. the probability measure is Lebesgue measure: P((a, b)) = b − a.

• Define the σ-fields
  1. F_1 = σ([0, 1/2), [1/2, 1));
  2. F_2 = σ([0, 1/4), [1/4, 2/4), [2/4, 3/4), [3/4, 1));
  3.
F_k = σ([i/2^k, (i + 1)/2^k), i = 0, 1, ···, 2^k − 1).

14 Example of conditional expectation

• Random variable X : [0, 1) → R^1, X(ω) = f(ω).
• X_k = E(X|F_k).
• X_k is F_k-measurable:
  – X_k(ω) is a step function: X_k(ω) = a_i for ω ∈ [i/2^k, (i + 1)/2^k).
• ∫_B X_k dP = ∫_B X dP:
  – for i = 0, 1, ···, 2^k − 1,
    a_i = 2^k ∫_{i/2^k}^{(i+1)/2^k} f(ω)dω.

15 Martingale

• Let X(·) = {X(t), t ≥ 0} be a right-continuous stochastic process with left-hand limits and let F_t be a filtration on a common probability space. X(·) is a martingale if
  1. X is adapted to {F_t : t ≥ 0};
  2. E|X(t)| < ∞ for any t;
  3. E[X(t + s)|F_t] = X(t) for any t, s ≥ 0.

• X(·) is called a sub-martingale if = is replaced by ≥, and a super-martingale if = is replaced by ≤.

16 Martingale

• In survival analysis: F_t = σ{X(u) : 0 ≤ u ≤ t}.
• For t, s ≥ 0,
  E[X(t + s)|F_t] = E[X(t + s)|X(u), 0 ≤ u ≤ t],
  where (X(u), 0 ≤ u ≤ t) is the history of the process X(·) from 0 to t.

17 Example of martingale

• Y_1, Y_2, . . . are i.i.d. random variables satisfying
  Y_i = +1 w.p. 1/2, −1 w.p. 1/2.
• Define X(0) = 0 and
  X(n) = Σ_{j=1}^n Y_j, n = 1, 2, ···.
• (X(u) : 0 ≤ u ≤ n) = (X(1), . . . , X(n)) ∼ (Y_1, . . . , Y_n), so
  F_n = σ(X(1), . . . , X(n)) = σ(Y_1, . . . , Y_n).

18 Example of martingale

• Clearly, E|X(n)| < ∞ for each n.
•
  E[X(n + k)|F_n] = E[Σ_{j=1}^{n+k} Y_j | X(1), . . . , X(n)]
                  = E[Σ_{j=1}^{n+k} Y_j | Y_1, . . . , Y_n]
                  = Y_1 + ··· + Y_n + E[Y_{n+1} + ··· + Y_{n+k} | Y_1, . . . , Y_n]
                  = Y_1 + ··· + Y_n
                  = X(n).

19 Two important properties of martingale

• Let X be a martingale with respect to a filtration {F_t : t ≥ 0}; then
  E{X(t)|F_{t−}} = X(t−), where F_{t−} = σ(∪_{s<t} F_s).
• Let X be a martingale with respect to a filtration {F_t : t ≥ 0}; then
  E{dX(t)|F_{t−}} = 0:
  E{X(t) − X(t − ε)|F_{t−ε}} = X(t − ε) − X(t − ε) = 0.

20 Counting Process

• A stochastic process N(·) = (N(t) : t ≥ 0) is called a counting process if
  1. N(0) = 0;
  2. N(t) < ∞ for all t;
  3. with probability 1, N(t) is a right-continuous step function with jumps of size +1.

• N_i(t), N(t) and N_l(t) are all counting processes.
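The ±1 random-walk example above can be verified by brute force on a computer: conditioning on (Y_1, . . . , Y_n) and averaging X(n + k) over all 2^k equally likely continuations returns exactly X(n). A minimal sketch (the function name and parameter choices are ours):

```python
# Sketch: verify E[X(n+k) | F_n] = X(n) for the +/-1 random walk by
# exhaustively averaging over all equally likely continuations.

from itertools import product

def cond_exp_given_prefix(prefix, k):
    """E[X(n+k) | Y_1..Y_n = prefix]: average X(n+k) over all 2^k suffixes."""
    sums = [sum(prefix) + sum(suffix)
            for suffix in product([1, -1], repeat=k)]
    return sum(sums) / len(sums)

# For every prefix of length 3 and k = 2 extra steps, the conditional
# expectation equals the current value X(n) = sum(prefix).
for prefix in product([1, -1], repeat=3):
    assert cond_exp_given_prefix(list(prefix), 2) == sum(prefix)
print("martingale property verified for all prefixes")
```

The cancellation happens because each suffix sum averages to zero by the ±1 symmetry, which is exactly the E[Y_{n+1} + ··· + Y_{n+k} | Y_1, . . . , Y_n] = 0 step in the slide.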
21 Compensator Process

• The stochastic process A(·) is defined by
  A(t) def= ∫_0^t Y(u)h(u)du.
• E(N(t)) = E(A(t)).
• M(t) = N(t) − A(t) is a mean-zero process.

22 Example

• T ∼ EXP(λ) ⇒ h(t) = λ.
• A(t) = ∫_0^t h(u)Y(u)du = λ∫_0^t Y(u)du = λ min(t, U).
• M(t) = I(U ≤ t)δ − λ min(t, U).
• In general, M(t) = I(U ≤ t)δ − ∫_0^t Y(u)h(u)du.
• Indeed, M(t) is a martingale under the independent censoring assumption!

23 Martingale

• M(t) = N(t) − ∫_0^t Y(s)h(s)ds and F_t = σ{N(u), N^C(u), u ∈ [0, t]}. The key is to show that
  E(M(t + s) − M(t)|F_t) = 0 a.s.
• Need to show that
  E[N(t + s) − N(t)|F_t] = E[∫_t^{t+s} h(u)Y(u)du | F_t].
• Consider the value of Y(t):
  E[N(t + s) − N(t)|F_t] = 0 if Y(t) = 0, and = E[N(t + s) − N(t)|Y(t) = 1] if Y(t) = 1.

24 Martingale

• Since N(t + s) − N(t) is zero or one,
  E[N(t + s) − N(t)|Y(t) = 1] = P[N(t + s) − N(t) = 1|Y(t) = 1]
                              = P[t < U ≤ t + s, δ = 1|U ≥ t]
                              = P[t < T ≤ t + s, C > T] / P[U ≥ t].

25 Martingale

• Consider the value of Y(t):
  E[∫_t^{t+s} h(u)Y(u)du | F_t] = 0 if Y(t) = 0, and = E[∫_t^{t+s} h(u)Y(u)du | Y(t) = 1] if Y(t) = 1.
• When Y(t) = 1,
  E[∫_t^{t+s} h(u)Y(u)du | Y(t) = 1] = ∫_t^{t+s} h(u)E[Y(u)|Y(t) = 1]du
                                     = ∫_t^{t+s} h(u)P[U ≥ u|U ≥ t]du
                                     = ∫_t^{t+s} h(u)P(U ≥ u)du / P(U ≥ t)
                                     = P[t < T ≤ t + s, C > T] / P[U ≥ t].

26 Martingale

• In the last step, we argue that
  ∫_t^{t+s} h(u)P(U ≥ u)du / P(U ≥ t) = P[t < T ≤ t + s, C > T] / P[U ≥ t].
• It holds if
  h(u) = lim_{ε→0} P(u ≤ T ≤ u + ε|T ≥ u, C ≥ u)/ε,
  which is the noninformative censoring assumption!

27 Predictable process

• A stochastic process X(·) is called predictable with respect to a filtration F_t if, for every t, X(t) is measurable with respect to F_{t−}.
• Any left-continuous adapted process is predictable. N(t) is not predictable since it is right-continuous, but Y(t) = I(U ≥ t) is a predictable process.

28 Doob-Meyer decomposition

• Suppose that X(·) is a non-negative submartingale adapted to {F_t, t ≥ 0}.
Then there exists a right-continuous and non-decreasing predictable process A(·) such that E(A(t)) < ∞ for all t and M(·) = X(·) − A(·) is a martingale. If A(0) = 0 a.s., then A(·) is unique!

• The process A(·) is then called the compensator for X(·).
• ∫_0^t h(u)Y(u)du is the unique compensator for the counting process N(t) = I(U ≤ t, δ = 1).

29 Property of Martingale associated with counting process

• If M(·) is a martingale, then M^2(·) is a submartingale (why?).
• By the Doob-Meyer decomposition, there is a unique predictable process ⟨M, M⟩ such that M^2(·) − ⟨M, M⟩(·) is a martingale.
• E(M^2(t)) = E{⟨M, M⟩(t)}.
• If M(·) = N(·) − A(·), then ⟨M, M⟩(·) = A(·).
• We can calculate the variance of N(·) − A(·)!
  var{N(t) − A(t)} = E{A(t)} = E∫_0^t Y(u)h(u)du = ∫_0^t P(U > u)h(u)du.
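The mean-zero and variance identities above can be checked by simulation in the exponential example of slide 22. In the sketch below, T ∼ Exp(λ) and C ∼ Exp(μ) are independent (the values of λ, μ, t and the sample size are our illustrative choices, not from the slides); under these exponentials E{N(t)} = P(T ≤ t, T ≤ C) has the closed form (λ/(λ+μ))(1 − e^{−(λ+μ)t}), which should match both E{A(t)} and var{M(t)}:

```python
# Sketch: Monte Carlo check of the compensator identities for
# M(t) = I(U <= t)*delta - lam*min(t, U), with T ~ Exp(lam), C ~ Exp(mu)
# independent: E{M(t)} = 0 and var{M(t)} = E{N(t)} = E{A(t)}.
# lam, mu, t and the sample size are illustrative choices.

import math
import random

random.seed(1)
lam, mu, t, n = 1.0, 0.5, 1.0, 200_000

M = []
for _ in range(n):
    T = random.expovariate(lam)               # failure time
    C = random.expovariate(mu)                # independent censoring time
    U, delta = min(T, C), (1 if T <= C else 0)
    N_t = 1 if (U <= t and delta == 1) else 0  # N(t) = I(U <= t, delta = 1)
    A_t = lam * min(t, U)                      # compensator A(t)
    M.append(N_t - A_t)

mean_M = sum(M) / n
var_M = sum(m * m for m in M) / n - mean_M ** 2
# Closed form for E{N(t)} under these exponentials:
EN_t = lam / (lam + mu) * (1 - math.exp(-(lam + mu) * t))

print(abs(mean_M), abs(var_M - EN_t))  # both small, up to Monte Carlo error
```

The match of var{M(t)} with E{N(t)} is exactly the last display: the predictable variation of a counting-process martingale is its compensator.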