Comparison of Survival Curves How can we form a basis for comparison? At a specific point in time, we could see whether the confi- dence intervals for the survival curves overlap. We spent the last class looking at some nonparametric ap- proaches for estimating the survival function, Sˆ(t), over time However, the confidence intervals we have been calculating for a single sample of individuals. are “pointwise” ⇒ they correspond to a confidence inter- val for Sˆ(t∗) at a single point in time, t∗. Now we want to compare the survival estimates between two groups. In other words, we can’t say that the true survival function Example: Time to remission of leukemia patients S(t) is contained between the pointwise confidence intervals with 95% probability.
(Aside: if you’re interested, the issue of confidence bands for the estimated survival function are discussed in Section 4.4 of Klein and Moeschberger)
1 2 Looking at whether the confidence intervals for Sˆ(t∗) overlap Nonparametric comparisons of groups between the 6MP and placebo groups would only focus on comparing the two treatment groups at a single point in time, t∗. We want an overall comparison. All of these are pretty reasonable options, and we’ll see that there have been several proposals for how to compare the survival of two groups. For the moment, we are sticking to Should we base our overall comparison of Sˆ(t) on: nonparametric comparisons. • the furthest distance between the two curves?
• the median survival for each group? Why nonparametric? • the average hazard? (for exponential distributions, this would be like comparing the mean event times) • fairly robust • adding up the difference between the two survival esti- • efficient relative to parametric tests mates over time? • often simple and intuitive ˆ ˆ S(tjA) − S(tjB) j • a weighted sum of differences, where the weights reflect Before continuing the description of the two-sample compar- the number at risk at each time? ison, I’m going to try to put this in a general framework to • a rank-based test? i.e., we could rank all of the event give a perspective of where we’re heading in this class. times, and then see whether the sum of ranks for one group was less than the other.
3 4 General Framework for Survival Analysis We’ll proceed as follows:
We observe (Xi,δi, Zi) for individual i, where • Two group comparisons • Multigroup and stratified comparisons - stratified logrank • X is a censored failure time random variable i • Failure time regression models • δ is the failure/censoring indicator i – Cox proportional hazards model • Z represents a set of covariates i – Accelerated failure time model
Note that Zi might be a scalar (a single covariate, say treat- ment or gender) or may be a (p × 1) vector (representing several different covariates).
These covariates might be: • continuous • discrete • time-varying (more later)
If Zi is a scalar and is binary, then we are comparing the survival of two groups, like in the leukemia example.
More generally though, it is useful to build a model that characterizes the relationship between survival and all of the covariates of interest.
5 6 Two sample tests Mantel-Haenszel Logrank test
• Mantel-Haenszel logrank test • Peto & Peto’s version of the logrank test The logrank test is the most well known and widely used.
• Gehan’s Generalized Wilcoxon It also has an intuitive appeal, building on standard meth- • Peto & Peto’s and Prentice’s generalized Wilcoxon ods for binary data. (Later we will see that it can also be obtained as the score test from a partial likelihood from the • Tarone-Ware and Fleming-Harrington classes Cox Proportional Hazards model.) • Cox’s F-test (non-parametric version)
First consider the following (2 × 2) table classifying those with and without the event of interest in a two group setting: References:
Hosmer & Lemeshow Section 2.4 Event Collett Section 2.5 Group Yes No Total
Klein & Moeschberger Section 7.3 0 d0 n0 − d0 n0 Kleinbaum Chapter 2 1 d1 n1 − d1 n1 Lee Chapter 5 Total dn− dn
7 8 If the margins of this table are considered fixed, then d0 Example: Toxicity in a clinical trial with two treatments follows a ? distribution. Under the null hypothesis of no association between the event and group, it follows that Toxicity n d E(d )= 0 Group Yes No Total 0 n 0 8 42 50 n n d(n − d) 1 2 48 50 Var(d )= 0 1 0 n2(n − 1) Total 10 90 100
Therefore, under H0: [d − n d/n]2 χ2 = 0 0 ∼ χ2 MH n0 n1 d(n−d) 1 n2(n−1) 2 χp =4.00 (p =0.046)
2 This is the Mantel-Haenszel statistic and is approximately χMH =3.96 (p =0.047) equivalent to the Pearson χ2 test for equality of the two groups given by: (o − e)2 χ2 = p e
Note: recall that the Pearson χ2 test was derived for the case where only the row margins were fixed, and thus the variance above was replaced by: n n d(n − d) Var(d )= 0 1 0 n3
9 10 Now suppose we have K (2×2) tables, all independent, and How does this apply in survival analysis? we want to test for a common group effect. The Cochran- Suppose we observe Mantel-Haenszel test for a common odds ratio not equal to 1 can be written as: Group 1: (X11,δ11) ...(X1n1,δ1n1)
Group 0: (X01,δ01) ...(X0n0,δ0n0) [ K (d − n ∗ d /n )]2 2 j=1 0j 0j j j We could just count the numbers of failures: eg., d = χCMH = K 2 1 j=1 n1jn0jdj(nj − dj)/[nj(nj − 1)] K j=1 δ1j Example: Leukemia data, just counting up the number of remissions in each treatment group. where the subscript j refers to the j-th table: Fail Event Group Yes No Total Group Yes No Total 0210 21 0 d0j n0j − d0j n0j 1 9 12 21 1 d1j n1j − d1j n1j Total 30 12 42 Total dj nj − dj nj
2 χp =16.8(p =0.001) 2 χMH =16.4(p =0.001) 2 This statistic is distributed approximately as χ1.
But, this doesn’t account for the time at risk.
Conceptually, we would like to compare the KM survival curves. Let’s put the components side-by-side and compare.
11 12 Cox & Oakes Table 1.1 Leukemia example Logrank Test: Formal Definition
The logrank test is obtained by constructing a (2 × 2) ta- Ordered Group 0 Group 1 ble at each distinct death time, and comparing the death Death Times dj cj rj dj cj rj 120210021 rates between the two groups, conditional on the number at 220190021 risk in the groups. The tables are then combined using the 310170021 Cochran-Mantel-Haenszel test. 420160021 520140021 Note: The logrank is sometimes called the Cox-Mantel test. 600123121 700121017 Let t1, ..., tK represent the K ordered, distinct death times. 840120016 At the j-th death time, we have the following table: 90080116 10 0 0 8 1115 11 2 0 8 0113 Die/Fail 12 2 0 6 0012 Group Yes No Total 13 0 0 4 1012 15 1 0 4 0011 0 d0j r0j − d0j r0j 16 0 0 3 1011 17 1 0 3 0110 1 d1j r1j − d1j r1j 19 0 0 2 01 9 20 0 0 2 01 8 Total dj rj − dj rj 22 1 0 2 10 7 23 1 0 1 10 6 where d and d are the number of deaths in group 0 and 25 0 0 0 01 5 0j 1j 1, respectively at the j-th death time, and r0j and r1j are the number at risk at that time, in groups 0 and 1. Note that I wrote down the number at risk for Group 1 for times 1-5 even though there were no events or censorings at those times.
13 14 The logrank test is: First several tables of leukemia data
CMH analysis of leukemia data