Survival Analysis: Counting Process and Martingale
Lu Tian and Richard Olshen Stanford University
1 Lebesgue-Stieltjes Integrals
• G(·) is a right-continuous step function having jumps at x_1, x_2, ···. Then
  ∫_a^b f(x)dG(x) = Σ_{a < x_j ≤ b} f(x_j){G(x_j) − G(x_j−)} = Σ_{a < x_j ≤ b} f(x_j)∆G(x_j).

2 One-sample problem

• In the one-sample problem: observations (U_i, δ_i), i = 1, 2, ···, n.

• N(t) = Σ_{i=1}^n I(U_i ≤ t, δ_i = 1), the number of observed failures by time t. If τ_1 < ··· < τ_K are the distinct observed failure times and d_j is the number of failures at τ_j, then
  ∫_0^∞ f(t)dN(t) = Σ_{j=1}^K f(τ_j)d_j.

• With f(t) = {Σ_{i=1}^n I(U_i ≥ t)}^{−1} = 1/Y(t),
  ∫_0^∞ f(t)dN(t) = Σ_{j=1}^K d_j/Y(τ_j),
  the NA estimator for the cumulative hazard function!

3 Two-sample problem

• In the two-sample problem: observations (U_i, δ_i, Z_i), i = 1, 2, ···, n.

• N_l(t) = Σ_{i=1}^n I(U_i ≤ t, δ_i = 1, Z_i = l), the number of observed failures by time t in group l, l = 0, 1.

• Y_l(t) = Σ_{i=1}^n I(U_i ≥ t, Z_i = l), the number at risk at time t in group l, l = 0, 1.

• Consider the stochastic integral
  W = ∫_0^∞ {Y_0(t)/Y(t)}dN_1(t) − ∫_0^∞ {Y_1(t)/Y(t)}dN_0(t).

4 Two-sample problem

• With d_{lj} the number of failures at τ_j in group l and d_j = d_{0j} + d_{1j},
  W = Σ_{j=1}^K {Y_0(τ_j)/Y(τ_j)}∆N_1(τ_j) − Σ_{j=1}^K {Y_1(τ_j)/Y(τ_j)}∆N_0(τ_j)
    = Σ_{j=1}^K [{Y_0(τ_j)/Y(τ_j)}d_{1j} − {Y_1(τ_j)/Y(τ_j)}d_{0j}]
    = Σ_{j=1}^K {Y_0(τ_j)d_{1j} − Y_1(τ_j)(d_j − d_{1j})}/Y(τ_j)
    = Σ_j {d_{1j} − d_j · Y_1(τ_j)/Y(τ_j)} = Σ_j (O_j − E_j).

5 Regression problem

• In Cox regression: observations (U_i, δ_i, Z_i), i = 1, 2, ···, n.

• N_i(t) = I(U_i ≤ t, δ_i = 1).

• The score function related to the PL function is
  S_n(β) = Σ_{i=1}^n δ_i{Z_i − S^{(1)}(β, U_i)/S^{(0)}(β, U_i)}
         = Σ_{i=1}^n ∫_0^∞ {Z_i − S^{(1)}(β, s)/S^{(0)}(β, s)}dN_i(s).

6 σ-algebra

• F is a non-empty collection of subsets of Ω. F is called a σ-algebra if
  1. If E ∈ F, then E^c ∈ F.
  2. If E_i ∈ F, i = 1, 2, ···, then ∪_{i=1}^∞ E_i ∈ F.

• A probability space (Ω, F, P) is an abstract space Ω equipped with a σ-algebra F and a set function P defined on F such that
  1. P{E} ≥ 0 when E ∈ F.
  2. P{Ω} = 1.
  3. P{∪_{i=1}^∞ E_i} = Σ_{i=1}^∞ P{E_i} when E_i ∈ F, i = 1, 2, ···, and E_i ∩ E_j = ∅ for i ≠ j.

8 Random Variables and Stochastic Process

• (Ω, F, P) is a probability space. A random variable X : Ω → R is a real-valued function satisfying {ω : X(ω) ∈ (−∞, b]} ∈ F for any b.
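For a finite Ω, the σ-algebra and measurability definitions above can be checked exhaustively. The sketch below, with an Ω and functions of our own choosing (none of this is from the slides), builds the σ-algebra generated by a two-cell partition, verifies closure under complement and union, and checks {ω : X(ω) ≤ b} ∈ F for each threshold b:

```python
# Sketch: sigma-algebra generated by a finite partition, plus a measurability
# check in the spirit of {w : X(w) <= b} in F. Illustrative choices only.

from itertools import combinations

def sigma_from_partition(partition):
    """All unions of partition cells: the generated sigma-algebra."""
    cells = [frozenset(c) for c in partition]
    sigma = set()
    for r in range(len(cells) + 1):
        for combo in combinations(cells, r):
            sigma.add(frozenset().union(*combo))
    return sigma

def is_measurable(X, omega, sigma):
    """Check {w : X(w) <= b} in sigma for every relevant threshold b."""
    return all(frozenset(w for w in omega if X[w] <= b) in sigma
               for b in set(X.values()))

omega = {0, 1, 2, 3}
sigma = sigma_from_partition([{0, 1}, {2, 3}])

# Closure under complement and (finite) union:
assert all(frozenset(omega - E) in sigma for E in sigma)
assert all(E | F in sigma for E in sigma for F in sigma)

# Constant on cells -> measurable; separating 0 from 1 -> not measurable.
assert is_measurable({0: 1.0, 1: 1.0, 2: 2.0, 3: 2.0}, omega, sigma)
assert not is_measurable({0: 1.0, 1: 2.0, 2: 2.0, 3: 2.0}, omega, sigma)
```

A function is measurable with respect to a partition-generated σ-algebra exactly when it is constant on each cell, which is what the last two assertions exhibit.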
• A stochastic process is a family of random variables X = {X(t) : t ∈ Γ} indexed by a set Γ, all defined on the same probability space (Ω, F, P). Often Γ = [0, ∞) in survival analysis.

9 Filtration and Adaptability

• A family of σ-algebras {F_t : t ≥ 0} is called a filtration of a probability space (Ω, F, P) if F_t ⊂ F and F_s ⊂ F_t for s ≤ t.

• A filtration {F_t : t ≥ 0} is right-continuous if F_{t+} = F_t for any t, where F_{t+} = ∩_{ε>0} F_{t+ε}.

• A stochastic process {X(t) : t ≥ 0} is said to be adapted to {F_t : t ≥ 0} if X(t) is F_t-measurable for each t, i.e., {ω : X(t; ω) ∈ (−∞, b]} ∈ F_t for any b.

10 History of a stochastic process

• Let X = {X(t) : t ≥ 0} be a stochastic process, and let F_t = σ{X(s) : 0 ≤ s ≤ t}, the smallest σ-algebra making all of the random variables X(s), 0 ≤ s ≤ t, measurable. The filtration {F_t : t ≥ 0} is called the history of X.

11 Conditional Expectation

• Suppose that Y is a random variable on a probability space (Ω, F, P) and let G be a sub-σ-algebra of F. Let X be a random variable satisfying
  1. X is G-measurable;
  2. ∫_B Y dP = ∫_B X dP for all sets B ∈ G.
  The random variable X is then called the conditional expectation of Y given G and is denoted by E(Y|G).

12 Conditional Expectation

• If F_t = {∅, Ω}, then E(X|F_t) = E(X).
• E{E(X|F_t)} = E(X).
• If F_s ⊂ F_t, then E{E(X|F_s)|F_t} = E(X|F_s).
• If Y is F_t-measurable, then E(XY|F_t) = Y E(X|F_t).
• If X and Y are independent, then E(Y|σ(X)) = E(Y).
• E(aX + bY|F_t) = aE(X|F_t) + bE(Y|F_t).
• If g(·) is a convex function, then E{g(X)|F_t} ≥ g{E(X|F_t)}.
• E(Y|σ(X_t, t ∈ Γ)) = E(Y|X_t, t ∈ Γ).

13 Example of conditional expectation

• The probability space (Ω, F, P):
  1. Ω = [0, 1);
  2. F is the Borel σ-algebra generated by all the open intervals (a, b) ⊂ [0, 1);
  3. the probability measure is Lebesgue measure: P((a, b)) = b − a.

• Define the σ-fields
  1. F_1 = σ([0, 1/2), [1/2, 1));
  2. F_2 = σ([0, 1/4), [1/4, 2/4), [2/4, 3/4), [3/4, 1));
  3.
F_k = σ([i/2^k, (i + 1)/2^k), i = 0, 1, ···, 2^k − 1).

14 Example of conditional expectation

• Random variable X : [0, 1) → R^1, X(ω) = f(ω).
• X_k = E(X|F_k).
• X_k is F_k-measurable:
  – X_k(ω) is a step function: X_k(ω) = a_i for ω ∈ [i/2^k, (i + 1)/2^k).
• ∫_B X_k dP = ∫_B X dP:
  – for i = 0, 1, ···, 2^k − 1,
    a_i = 2^k ∫_{i/2^k}^{(i+1)/2^k} f(ω)dω.

15 Martingale

• Let X(·) = {X(t), t ≥ 0} be a right-continuous stochastic process with left-hand limits and let F_t be a filtration on a common probability space. X(·) is a martingale if
  1. X is adapted to {F_t : t ≥ 0};
  2. E|X(t)| < ∞ for any t;
  3. E[X(t + s)|F_t] = X(t) for any t, s ≥ 0.

• X(·) is called a sub-martingale if = is replaced by ≥, and a super-martingale if = is replaced by ≤.

16 Martingale

• In survival analysis: F_t = σ{X(u) : 0 ≤ u ≤ t}.
• For t, s ≥ 0,
  E[X(t + s)|F_t] = E[X(t + s)|X(u), 0 ≤ u ≤ t],
  where (X(u), 0 ≤ u ≤ t) is the history of the process X(·) from 0 to t.

17 Example of martingale

• Y_1, Y_2, . . . are i.i.d. random variables satisfying
  Y_i = +1 w.p. 1/2, −1 w.p. 1/2.
• Define X(0) = 0 and
  X(n) = Σ_{j=1}^n Y_j, n = 1, 2, ···.
• (X(u) : 0 ≤ u ≤ n) = (X(1), . . . , X(n)) ∼ (Y_1, . . . , Y_n), so
  F_n = σ(X(1), . . . , X(n)) = σ(Y_1, . . . , Y_n).

18 Example of martingale

• Clearly, E|X(n)| < ∞ for each n.
•
  E[X(n + k)|F_n] = E[Σ_{j=1}^{n+k} Y_j | X(1), . . . , X(n)]
                  = E[Σ_{j=1}^{n+k} Y_j | Y_1, . . . , Y_n]
                  = Y_1 + ··· + Y_n + E[Y_{n+1} + ··· + Y_{n+k} | Y_1, . . . , Y_n]
                  = Y_1 + ··· + Y_n
                  = X(n).

19 Two important properties of martingale

• Let X be a martingale with respect to a filtration {F_t : t ≥ 0}; then
  E{X(t)|F_{t−}} = X(t−), where F_{t−} = σ(∪_{s<t} F_s).
• Let X be a martingale with respect to a filtration {F_t : t ≥ 0}; then
  E{dX(t)|F_{t−}} = 0:
  E{X(t) − X(t − ε)|F_{t−ε}} = X(t − ε) − X(t − ε) = 0.

20 Counting Process

• A stochastic process N(·) = (N(t) : t ≥ 0) is called a counting process if
  1. N(0) = 0;
  2. N(t) < ∞ for all t;
  3. with probability 1, N(t) is a right-continuous step function with jumps of size +1.

• N_i(t), N(t) and N_l(t) are all counting processes.
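The ±1 random-walk example above can be verified by brute force on a computer: conditioning on (Y_1, . . . , Y_n) and averaging X(n + k) over all 2^k equally likely continuations returns exactly X(n). A minimal sketch (the function name and parameter choices are ours):

```python
# Sketch: verify E[X(n+k) | F_n] = X(n) for the +/-1 random walk by
# exhaustively averaging over all equally likely continuations.

from itertools import product

def cond_exp_given_prefix(prefix, k):
    """E[X(n+k) | Y_1..Y_n = prefix]: average X(n+k) over all 2^k suffixes."""
    sums = [sum(prefix) + sum(suffix)
            for suffix in product([1, -1], repeat=k)]
    return sum(sums) / len(sums)

# For every prefix of length 3 and k = 2 extra steps, the conditional
# expectation equals the current value X(n) = sum(prefix).
for prefix in product([1, -1], repeat=3):
    assert cond_exp_given_prefix(list(prefix), 2) == sum(prefix)
print("martingale property verified for all prefixes")
```

The cancellation happens because each suffix sum averages to zero by the ±1 symmetry, which is exactly the E[Y_{n+1} + ··· + Y_{n+k} | Y_1, . . . , Y_n] = 0 step in the slide.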
21 Compensator Process

• The stochastic process A(·) is defined by
  A(t) def= ∫_0^t Y(u)h(u)du.
• E(N(t)) = E(A(t)).
• M(t) = N(t) − A(t) is a mean-zero process.

22 Example

• T ∼ EXP(λ) ⇒ h(t) = λ.
• A(t) = ∫_0^t h(u)Y(u)du = λ∫_0^t Y(u)du = λ min(t, U).
• M(t) = I(U ≤ t)δ − λ min(t, U).
• In general, M(t) = I(U ≤ t)δ − ∫_0^t Y(u)h(u)du.
• Indeed, M(t) is a martingale under the independent censoring assumption!

23 Martingale

• M(t) = N(t) − ∫_0^t Y(s)h(s)ds and F_t = σ{N(u), N^C(u), u ∈ [0, t]}. The key is to show that
  E(M(t + s) − M(t)|F_t) = 0 a.s.
• Need to show that
  E[N(t + s) − N(t)|F_t] = E[∫_t^{t+s} h(u)Y(u)du | F_t].
• Consider the value of Y(t):
  E[N(t + s) − N(t)|F_t] = 0 if Y(t) = 0, and = E[N(t + s) − N(t)|Y(t) = 1] if Y(t) = 1.

24 Martingale

• Since N(t + s) − N(t) is zero or one,
  E[N(t + s) − N(t)|Y(t) = 1] = P[N(t + s) − N(t) = 1|Y(t) = 1]
                              = P[t < U ≤ t + s, δ = 1|U ≥ t]
                              = P[t < T ≤ t + s, C > T] / P[U ≥ t].

25 Martingale

• Consider the value of Y(t):
  E[∫_t^{t+s} h(u)Y(u)du | F_t] = 0 if Y(t) = 0, and = E[∫_t^{t+s} h(u)Y(u)du | Y(t) = 1] if Y(t) = 1.
• When Y(t) = 1,
  E[∫_t^{t+s} h(u)Y(u)du | Y(t) = 1] = ∫_t^{t+s} h(u)E[Y(u)|Y(t) = 1]du
                                     = ∫_t^{t+s} h(u)P[U ≥ u|U ≥ t]du
                                     = ∫_t^{t+s} h(u)P(U ≥ u)du / P(U ≥ t)
                                     = P[t < T ≤ t + s, C > T] / P[U ≥ t].

26 Martingale

• In the last step, we argue that
  ∫_t^{t+s} h(u)P(U ≥ u)du / P(U ≥ t) = P[t < T ≤ t + s, C > T] / P[U ≥ t].
• It holds if
  h(u) = lim_{ε→0} P(u ≤ T ≤ u + ε|T ≥ u, C ≥ u)/ε,
  which is the noninformative censoring assumption!

27 Predictable process

• A stochastic process X(·) is called predictable with respect to a filtration F_t if, for every t, X(t) is measurable with respect to F_{t−}.
• Any left-continuous adapted process is predictable. N(t) is not predictable since it is right-continuous, but Y(t) = I(U ≥ t) is a predictable process.

28 Doob-Meyer decomposition

• Suppose that X(·) is a non-negative submartingale adapted to {F_t, t ≥ 0}.
Then there exists a right-continuous and non-decreasing predictable process A(·) such that E(A(t)) < ∞ for all t and M(·) = X(·) − A(·) is a martingale. If A(0) = 0 a.s., then A(·) is unique!

• The process A(·) is then called the compensator for X(·).
• ∫_0^t h(u)Y(u)du is the unique compensator for the counting process N(t) = I(U ≤ t, δ = 1).

29 Property of Martingale associated with counting process

• If M(·) is a martingale, then M^2(·) is a submartingale (why?).
• By the Doob-Meyer decomposition, there is a unique predictable process ⟨M, M⟩ such that M^2(·) − ⟨M, M⟩(·) is a martingale.
• E(M^2(t)) = E{⟨M, M⟩(t)}.
• If M(·) = N(·) − A(·), then ⟨M, M⟩(·) = A(·).
• We can calculate the variance of N(·) − A(·)!
  var{N(t) − A(t)} = E{A(t)} = E∫_0^t Y(u)h(u)du = ∫_0^t P(U > u)h(u)du.
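The mean-zero and variance identities above can be checked by simulation in the exponential example of slide 22. In the sketch below, T ∼ Exp(λ) and C ∼ Exp(μ) are independent (the values of λ, μ, t and the sample size are our illustrative choices, not from the slides); under these exponentials E{N(t)} = P(T ≤ t, T ≤ C) has the closed form (λ/(λ+μ))(1 − e^{−(λ+μ)t}), which should match both E{A(t)} and var{M(t)}:

```python
# Sketch: Monte Carlo check of the compensator identities for
# M(t) = I(U <= t)*delta - lam*min(t, U), with T ~ Exp(lam), C ~ Exp(mu)
# independent: E{M(t)} = 0 and var{M(t)} = E{N(t)} = E{A(t)}.
# lam, mu, t and the sample size are illustrative choices.

import math
import random

random.seed(1)
lam, mu, t, n = 1.0, 0.5, 1.0, 200_000

M = []
for _ in range(n):
    T = random.expovariate(lam)               # failure time
    C = random.expovariate(mu)                # independent censoring time
    U, delta = min(T, C), (1 if T <= C else 0)
    N_t = 1 if (U <= t and delta == 1) else 0  # N(t) = I(U <= t, delta = 1)
    A_t = lam * min(t, U)                      # compensator A(t)
    M.append(N_t - A_t)

mean_M = sum(M) / n
var_M = sum(m * m for m in M) / n - mean_M ** 2
# Closed form for E{N(t)} under these exponentials:
EN_t = lam / (lam + mu) * (1 - math.exp(-(lam + mu) * t))

print(abs(mean_M), abs(var_M - EN_t))  # both small, up to Monte Carlo error
```

The match of var{M(t)} with E{N(t)} is exactly the last display: the predictable variation of a counting-process martingale is its compensator.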