
CHAPTER 6 ST 745, Daowen Zhang

6 Modeling Survival Data with Cox Regression Models

6.1 The Proportional Hazards Model

A proportional hazards model proposed by D.R. Cox (1972) assumes that

$$\lambda(t|z) = \lambda_0(t)\, e^{z_1\beta_1 + \cdots + z_p\beta_p} = \lambda_0(t)\, e^{z^T\beta}, \qquad (6.1)$$

where z is a p × 1 vector of covariates such as treatment indicators, prognostic factors, etc., and β is a p × 1 vector of regression coefficients. Note that there is no intercept β0 in model (6.1).

Obviously,

λ(t|z =0)=λ0(t).

So λ0(t) is often called the baseline hazard function. It can be interpreted as the hazard function for the population of subjects with z =0.

The baseline hazard function λ0(t) in model (6.1) can take any shape as a function of t. The only requirement is that λ0(t) > 0. This is the nonparametric part of the model, and $z^T\beta$ is the parametric part of the model. So Cox's proportional hazards model is a semiparametric model.

Interpretation of a proportional hazards model

1. It is easy to show that under model (6.1)

$$S(t|z) = [S_0(t)]^{\exp(z^T\beta)},$$

where S(t|z) is the survival function of the subpopulation with covariate z and S0(t) is the survival function of the baseline population (z = 0). That is,

$$S_0(t) = e^{-\int_0^t \lambda_0(u)\,du}.$$


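2. For any two sets of covariates z0 and z1,

$$\frac{\lambda(t|z_1)}{\lambda(t|z_0)} = \frac{\lambda_0(t)\, e^{z_1^T\beta}}{\lambda_0(t)\, e^{z_0^T\beta}} = e^{(z_1 - z_0)^T\beta}, \quad \text{for all } t \ge 0,$$

which is constant over time (hence the name proportional hazards model). Equivalently,

$$\log\left[\frac{\lambda(t|z_1)}{\lambda(t|z_0)}\right] = (z_1 - z_0)^T\beta, \quad \text{for all } t \ge 0.$$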

3. With a one-unit increase in zk while the other covariate values are held fixed,

$$\log\left[\frac{\lambda(t|z_k+1)}{\lambda(t|z_k)}\right] = \log(\lambda(t|z_k+1)) - \log(\lambda(t|z_k)) = \beta_k.$$

Therefore, βk is the increase in log hazard (i.e., the log hazard ratio) at any time with a unit increase in the kth covariate zk. Equivalently,

$$\frac{\lambda(t|z_k+1)}{\lambda(t|z_k)} = e^{\beta_k}, \quad \text{for all } t \ge 0.$$

So exp(βk) is the hazard ratio associated with a one-unit increase in zk. Furthermore, since for small Δt

$$\frac{\lambda(t|z_k+1)}{\lambda(t|z_k)} \approx \frac{P[t \le T < t+\Delta t \mid T \ge t, z_k+1]}{P[t \le T < t+\Delta t \mid T \ge t, z_k]},$$

exp(βk) can be loosely interpreted as the ratio of two conditional probabilities of dying in the near future given that a subject is alive at time t. Moreover,

$$\frac{\lambda(t|z_k+1) - \lambda(t|z_k)}{\lambda(t|z_k)} = e^{\beta_k} - 1,$$

so $e^{\beta_k} - 1$ can be interpreted as the relative change (increase or decrease; multiply by 100 for a percentage) in hazard with a one-unit increase in zk while adjusting for other covariates.
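Item 1 above is stated without proof. A short derivation, assuming only model (6.1) and the standard relation between a survival function and its hazard, $S(t) = \exp\{-\int_0^t \lambda(u)\,du\}$, might run as follows:

```latex
\begin{align*}
S(t|z) &= \exp\left\{-\int_0^t \lambda(u|z)\,du\right\}
        = \exp\left\{-e^{z^T\beta}\int_0^t \lambda_0(u)\,du\right\} \\
       &= \left[\exp\left\{-\int_0^t \lambda_0(u)\,du\right\}\right]^{\exp(z^T\beta)}
        = [S_0(t)]^{\exp(z^T\beta)}.
\end{align*}
```

The second equality uses the fact that $e^{z^T\beta}$ does not depend on u, which is exactly the proportionality assumption.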

Inferential Problems

From the interpretation of the model, it is obvious that β characterizes the “effect” of z. So β should be the focus of our inference while λ0(t) is a nuisance “parameter”. Given a sample of censored survival data, our inferential problems include:

1. Estimate β; derive its statistical properties.


2. Test the hypothesis H0 : β = 0, or a hypothesis about part of β.

3. Diagnostics.

Estimation

Since the baseline hazard λ0(t) is left completely unspecified (infinite dimensional), ordinary likelihood methods can’t be used to estimate β. Cox conceived of the idea of a partial likelihood to remove the nuisance parameter λ0(t) from the proposed estimating equation.

Historical Note: Cox described the proportional hazards model in JRSSB (1972), in what is now one of the most quoted statistical papers in history. He also outlined in this paper the method for estimation, which he referred to as using conditional likelihood. It was pointed out to him in the literature that what he proposed was not a conditional likelihood and that there may be some flaws in his logic. Cox (1975) was able to recast his method of estimation through what he called “partial likelihood” and published this in Biometrika. This approach seemed to be based on sound inferential principles. Rigorous proofs showing the consistency and asymptotic normality were not published until 1981, when Tsiatis (Annals of Statistics) demonstrated these large sample properties. In 1982, Andersen and Gill (Annals of Statistics) simplified and generalized these results through the use of counting processes.

6.2 Estimation Using Partial Likelihood

Data and Model

1. Data: (Xi, ∆i,zi), i =1, ···,n, where for the ith individual

Xi =min(Ti,Ci).

∆i = I(Ti ≤ Ci).

$z_i = (z_{i1}, z_{i2}, \cdots, z_{ip})^T$ is a vector of covariates.


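2. Model: Proportional hazards model

$$\lambda(t|z_i) = \lambda_0(t)\, e^{z_i^T\beta},$$

where

$$\lambda(t|z_i) = \lim_{h \to 0}\left\{\frac{P[t \le T_i < t+h \mid T_i \ge t, z_i]}{h}\right\}.$$

Assume that Ci and Ti are conditionally independent given zi. Then the cause-specific hazard can be used to represent the hazard of interest. That is (in terms of conditional probabilities), for small Δx,

$$P[x \le X_i < x+\Delta x,\ \Delta_i = 1 \mid X_i \ge x, z_i] \approx \lambda(x|z_i)\,\Delta x = \lambda_0(x)\, e^{z_i^T\beta}\,\Delta x.$$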

Similar to the case of log rank test, we need to define some notation. Let us break the time

axis (patient time) into a grid of points. Assume the survival time is continuous. We hence can

take the grid points dense enough so that at most one death can occur within any interval.

Let dNi(u) denote the indicator for the ith individual being observed to die in [u, u +∆u).

Namely,

dNi(u)=I(Xi ∈ [u, u +∆u), ∆i =1).

Let Yi(u) denote the indicator for whether or not the ith individual is at risk at time u.

Namely,

Yi(u)=I(Xi ≥ u).

Let $dN(u) = \sum_{i=1}^n dN_i(u)$ denote the number of deaths for the whole sample occurring in [u, u + ∆u). Since we are assuming ∆u is sufficiently small, dN(u) is either 1 or 0 at any time u.

Let $Y(u) = \sum_{i=1}^n Y_i(u)$ be the total number from the entire sample who are at risk at time u.
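The indicators dNi(u) and Yi(u) and their sums can be sketched in a few lines of Python. The four observations below are hypothetical (they are not data from these notes), and the function names `at_risk` and `death_indicator` are made up for illustration.

```python
# Hypothetical observed data (X_i, Delta_i); not taken from the notes.
sample = [(2.0, 1), (2.5, 0), (3.1, 1), (4.7, 1)]

def at_risk(u, sample):
    """Y_i(u) = I(X_i >= u) for each individual."""
    return [1 if x >= u else 0 for x, _ in sample]

def death_indicator(u, du, sample):
    """dN_i(u) = I(X_i in [u, u + du), Delta_i = 1) for each individual."""
    return [1 if (u <= x < u + du and d == 1) else 0 for x, d in sample]

u, du = 3.0, 0.25
Y_i = at_risk(u, sample)
dN_i = death_indicator(u, du, sample)

# Y(u) and dN(u) are the sums over individuals.
print("Y(u) =", sum(Y_i), " dN(u) =", sum(dN_i))
```

With a grid this fine, each interval contains at most one observed death, so dN(u) is 0 or 1 as assumed in the text.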

Let F(x) denote the information up to time x (one of the grid points)

F(x) = {(dNi(u), Yi(u), zi), i = 1, ···, n; for grid points u < x, and dN(x)}.


Note: Conditional on F(x), we know who has died or was censored prior to x, when they died or were censored, together with their covariate values. We know the individuals at risk at time x and their corresponding covariate value. In addition, we also know if a death occurs at interval [x, x +∆x).

What we don’t know is the individual who was observed to die among those at risk at time x if dN(x)=1.

Let I(x) denote the individual in the sample who died at time x if someone died. If no one dies at time x, then I(x) = 0.

For example, if I(x)=j, then this means that the jth individual in the sample with covariate vector zj died in [x, x +∆x).

Let F(∞) denote all the data in the sample. Namely

F(∞)={(Xi, ∆i,zi),i=1, ···,n}.

If we let u1 < u2 < ··· denote the grid points, then the data (with redundancy) can be expressed as

(F(u1),I(u1), F(u2),I(u2), ···, F(∞)).

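Denote the observed values of the above random variables by lower case letters. Then the likelihood of the parameters λ0(t) and β can be written as

$$\begin{aligned}
& P[F(u_1) = f(u_1); \lambda_0(\cdot), \beta] \times P[I(u_1) = i(u_1) \mid F(u_1) = f(u_1); \lambda_0(\cdot), \beta] \\
& \times P[F(u_2) = f(u_2) \mid F(u_1) = f(u_1), I(u_1) = i(u_1); \lambda_0(\cdot), \beta] \\
& \times P[I(u_2) = i(u_2) \mid F(u_1) = f(u_1), I(u_1) = i(u_1), F(u_2) = f(u_2); \lambda_0(\cdot), \beta] \\
& \times \cdots
\end{aligned}$$

and the last term can be simplified as

$$\begin{aligned}
& P[I(u_2) = i(u_2) \mid F(u_1) = f(u_1), I(u_1) = i(u_1), F(u_2) = f(u_2); \lambda_0(\cdot), \beta] \\
& = P[I(u_2) = i(u_2) \mid F(u_2) = f(u_2); \lambda_0(\cdot), \beta].
\end{aligned}$$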

That is, the full likelihood can be written as the product of a series of conditional likelihoods.

The partial likelihood (as defined by D.R. Cox) consists of the product of every other conditional probability in the above presentation. That is,

$$PL = \prod_{\{\text{all grid pt } u\}} P[I(u) = i(u) \mid F(u) = f(u); \lambda_0(\cdot), \beta].$$

Suppose we have the following small data set; we will try to find its partial likelihood:

Patient ID   x   δ   z
    1        2   1   2
    2        2   0   2
    3        3   1   1
    4        4   1   3

It turns out that the partial likelihood is

$$PL(\beta) = \frac{e^{2\beta}}{e^{2\beta} + e^{2\beta} + e^{\beta} + e^{3\beta}} \times \frac{e^{\beta}}{e^{\beta} + e^{3\beta}} \times \frac{e^{3\beta}}{e^{3\beta}}. \qquad (6.2)$$
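To see where each factor in (6.2) comes from, the sketch below rebuilds the partial likelihood from risk sets for the four patients in the table and compares it with (6.2) written out term by term. It assumes a single covariate and no tied death times; the function names are illustrative only.

```python
import math

# Toy data from the table above: (x, delta, z) for patients 1 to 4.
data = [(2, 1, 2), (2, 0, 2), (3, 1, 1), (4, 1, 3)]

def partial_likelihood(beta, data):
    """Product over observed deaths of exp(z_i*beta) / sum over the risk set."""
    pl = 1.0
    for x_i, delta_i, z_i in data:
        if delta_i == 1:
            # Risk set at x_i: everyone with X_l >= x_i, i.e. Y_l(x_i) = 1.
            denom = sum(math.exp(z_l * beta) for x_l, _, z_l in data if x_l >= x_i)
            pl *= math.exp(z_i * beta) / denom
    return pl

def closed_form(beta):
    """Equation (6.2) written out factor by factor."""
    e = math.exp
    return (e(2 * beta) / (2 * e(2 * beta) + e(beta) + e(3 * beta))
            * e(beta) / (e(beta) + e(3 * beta))
            * e(3 * beta) / e(3 * beta))

for b in (-1.0, 0.0, 0.5):
    print(b, partial_likelihood(b, data), closed_form(b))
```

Note that patient 2, censored at time 2, contributes no factor of its own but still appears in the denominator of the first factor, because that patient is at risk at time 2.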

In general, we have to consider two cases in calculating the above partial likelihood.

Case 1: Suppose conditional on F(u) we have dN(u) = 0. That is, no death is observed at time u. In such a case, I(u) = 0 with probability 1.

Hence for any grid point u where dN(u) = 0, we have

P [I(u)=0|F(u)=f(u)] = 1.

Therefore, the partial likelihood is not affected at any point u such that dN(u)=0.


Case 2: dN(u) = 1. Conditional on F(u), if we know that one individual dies at time u,

then it must be one of the individuals still at risk (alive and not censored) at time u; i.e.,

among the following individuals

{i : Yi(u)=1}.

Also conditional on F(u), we know the covariate vector zi associated to each individual i

such that Yi(u) = 1. Therefore, we ask the following question:

Among the $Y(u) = \sum_{i=1}^n Y_i(u)$ individuals, what is the probability that the observed death happened to the ith subject (who is actually observed to die at u) rather than to the other patients?

Unlike the null hypothesis case for the two-sample problem, the probabilities of choosing these subjects are not equal; rather, they are proportional to the subjects' cause-specific hazards of dying at time u, which can be derived as follows:

Let Ai = the event that subject i is going to die in [u, u + ∆u) given that he/she is still alive at u. If a patient is not at risk at u (i.e., Yi(u) = 0), then Ai = ∅. Since we chose ∆u to be so small that there is at most one death in [u, u + ∆u), we know

A1, A2, ···, An are mutually exclusive.

Because of the independence of survival times and censoring times, those Y(u) patients who are at risk at u (not censored and still alive at u) make up a random sample of the subpopulation consisting of the patients who will survive up to u (and with the same covariate value). Under the independent censoring assumption, we already showed in Chapter 3 that the cause-specific hazard is the same as the hazard of interest; i.e.,

$$\lambda(u, \delta_i = 1 | z_i) = \lambda(u | z_i).$$

Since ∆u is chosen to be very small,

$$\begin{aligned}
P[A_i] &\approx Y_i(u)\,\lambda(u, \delta_i = 1 | z_i)\,\Delta u \\
&= Y_i(u)\,\lambda(u | z_i)\,\Delta u \\
&= Y_i(u)\,\lambda_0(u) \exp(z_i^T\beta)\,\Delta u,
\end{aligned}$$


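where the last equality is due to the assumption of the Cox model. Therefore

$$\begin{aligned}
P[I(u) = i(u) \mid F(u) = f(u); \lambda_0(\cdot), \beta] &= P[A_{i(u)} \mid A_1 \cup \cdots \cup A_n] \\
&= \frac{P[A_{i(u)}]}{\sum_{l=1}^n P[A_l]} \\
&\approx \frac{\lambda_0(u) \exp(z_{i(u)}^T\beta)\,\Delta u}{\sum_{l=1}^n \lambda_0(u) \exp(z_l^T\beta)\, Y_l(u)\,\Delta u} \\
&= \frac{\exp(z_{i(u)}^T\beta)}{\sum_{l=1}^n \exp(z_l^T\beta)\, Y_l(u)}.
\end{aligned}$$

Here $Y_{i(u)}(u) = 1$ since we know this patient had to be at risk at u (because this patient died in [u, u + ∆u)).

Combining these cases, the partial likelihood can be written as

$$PL(\beta) = \prod_{\{\text{all grid pt } u\}} \left[\frac{\exp(z_{i(u)}^T\beta)}{\sum_{l=1}^n \exp(z_l^T\beta)\, Y_l(u)}\right]^{dN(u)}.$$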

Remark: To be formal, we need to define z0 even though it is never used. We can, for example, take z0 =0.

Other equivalent ways of writing the partial likelihood include the following. Let t1, ···, td denote the distinct death times; then

$$PL(\beta) = \prod_{j=1}^d \left[\frac{\exp(z_{i(t_j)}^T\beta)}{\sum_{l=1}^n \exp(z_l^T\beta)\, Y_l(t_j)}\right];$$

$$PL(\beta) = \prod_{i=1}^n \prod_{\{\text{all grid pt } u\}} \left[\frac{\exp(z_i^T\beta)}{\sum_{l=1}^n \exp(z_l^T\beta)\, Y_l(u)}\right]^{dN_i(u)};$$

$$PL(\beta) = \prod_{i=1}^n \left[\frac{\exp(z_i^T\beta)}{\sum_{l=1}^n \exp(z_l^T\beta)\, Y_l(x_i)}\right]^{\delta_i}.$$

Remark: Stare at these different representations for a while and you will convince yourself that they are all equivalent.

The importance of using the partial likelihood is that this function depends only on β, the parameter of interest, and is free of the baseline hazard λ0(t), which is an infinite-dimensional nuisance function.


Cox suggested treating PL as a regular likelihood and making inference on β accordingly. For example, we maximize the partial likelihood to get the estimate of β, often called the MPLE (maximum partial likelihood estimate), and use minus the second derivative of the log partial likelihood as the information matrix, etc.

Properties of the score of the partial likelihood

For ease of presentation, let us focus on one covariate case. The extension is straightforward.

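Obviously, the log partial likelihood function of β is

$$\ell(\beta) = \sum_{\{\text{all grid pts } u\}} dN(u) \left[ z_{I(u)}\beta - \log\left( \sum_{l=1}^n \exp(z_l\beta)\, Y_l(u) \right) \right].$$

The score function is

$$U(\beta) = \frac{\partial \ell(\beta)}{\partial \beta} = \sum_{\{\text{all grid pts } u\}} dN(u) \left[ z_{I(u)} - \frac{\sum_{l=1}^n z_l \exp(z_l\beta)\, Y_l(u)}{\sum_{l=1}^n \exp(z_l\beta)\, Y_l(u)} \right],$$

and the second derivative is

$$\frac{\partial^2 \ell(\beta)}{\partial \beta^2} = -\sum_u dN(u) \left[ \frac{\sum_{l=1}^n z_l^2 \exp(z_l\beta)\, Y_l(u)}{\sum_{l=1}^n \exp(z_l\beta)\, Y_l(u)} - \left( \frac{\sum_{l=1}^n z_l \exp(z_l\beta)\, Y_l(u)}{\sum_{l=1}^n \exp(z_l\beta)\, Y_l(u)} \right)^2 \right].$$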

Define

$$\bar z(u, \beta) = \frac{\sum_{l=1}^n z_l \exp(z_l\beta)\, Y_l(u)}{\sum_{l=1}^n \exp(z_l\beta)\, Y_l(u)} = \sum_{l=1}^n z_l w_l,$$

where

$$w_l = \frac{\exp(z_l\beta)\, Y_l(u)}{\sum_{l'=1}^n \exp(z_{l'}\beta)\, Y_{l'}(u)}$$

is the weight that is proportional to the hazard of the individual failing. So $\bar z(u, \beta)$ can be interpreted as the weighted average of the covariate z among those individuals still at risk at time u with weights wl.

Define P ÃP !  n z2 z β Y u n z z β Y u 2  l=1 l exp( l ) l( ) l=1 lexp( l ) l( )  Vz(u, β)= Pn − Pn l=1 exp(zlβ)Yl(u) l=1 exp(zlβ)Yl(u)

PAGE 128 CHAPTER 6 ST 745, Daowen Zhang " # Pn 2 l=1 zl exp(zlβ)Yl(u) 2 = Pn − (¯z(u, β)) l=1 exp(zlβ)Yl(u) Xn 2 2 = zl wl − (¯z(u, β)) . l=1

This can be shown to be equal to " # Xn 2 Xn (zl − z¯(u, β)) exp(zlβ)Yl(u) 2 Vz(u, β)= Pn = (zl − z¯(u, β)) wl. l=1 l=1 exp(zlβ)Yl(u) l=1

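This last representation says that Vz(u, β) can be interpreted as the weighted variance of the covariate among those individuals still at risk at u, and hence Vz(u, β) ≥ 0, with strict inequality whenever the covariate values in the risk set are not all equal. Consequently, provided this happens at one or more death times,

$$\frac{\partial^2 \ell(\beta)}{\partial \beta^2} = -\sum_u dN(u)\, V_z(u, \beta) < 0.$$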
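As a numerical check on the two expressions for Vz(u, β), the sketch below computes the weights wl, the weighted mean, and both forms of the weighted variance for one risk set. The function name `risk_set_moments` is hypothetical; the covariate values are those of the four patients at risk at time 2 in the small data set behind (6.2).

```python
import math

def risk_set_moments(z_at_risk, beta):
    """Return (w, zbar, V via second moment, V via centered form) for one risk set."""
    raw = [math.exp(z * beta) for z in z_at_risk]
    total = sum(raw)
    w = [r / total for r in raw]                      # weights w_l, summing to 1
    zbar = sum(z * wl for z, wl in zip(z_at_risk, w)) # weighted mean zbar(u, beta)
    v_moment = sum(z * z * wl for z, wl in zip(z_at_risk, w)) - zbar ** 2
    v_centered = sum((z - zbar) ** 2 * wl for z, wl in zip(z_at_risk, w))
    return w, zbar, v_moment, v_centered

# Covariates of the four patients at risk at time 2 in the toy data.
z_risk = [2, 2, 1, 3]
for beta in (0.0, 0.7):
    w, zbar, v1, v2 = risk_set_moments(z_risk, beta)
    print(beta, zbar, v1, v2)
```

At β = 0 all weights are equal, so the weighted mean and variance reduce to the ordinary mean and (population) variance of the covariate in the risk set.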

The above property can also be displayed graphically. For example, the partial likelihood function (6.2) is plotted in Figure 6.1.

Figure 6.1: The partial likelihood (6.2). [Plot of the partial likelihood (vertical axis, ticks from 0.00 to 0.15) against beta (horizontal axis, from −4 to 4).]

Therefore ℓ(β) has a unique maximizer, which can be obtained by solving the following partial likelihood equation:

$$U(\beta) = \frac{\partial \ell(\beta)}{\partial \beta} = \sum_{\{\text{all grid pts } u\}} dN(u) \left[ z_{I(u)} - \frac{\sum_{l=1}^n z_l \exp(z_l\beta)\, Y_l(u)}{\sum_{l=1}^n \exp(z_l\beta)\, Y_l(u)} \right] = 0.$$


This maximizer βˆ defines the MPLE of β.
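The notes do not say how the partial likelihood equation is solved in practice. One common choice, used here purely as an illustration, is Newton-Raphson: repeatedly update β by U(β)/J(β), where J(β) is the sum of dN(u)Vz(u, β) over death times. The sketch below applies it to the four-patient data set behind (6.2); the function names and the starting value 0 are assumptions.

```python
import math

# Toy data behind (6.2): (x, delta, z) for patients 1 to 4.
data = [(2, 1, 2), (2, 0, 2), (3, 1, 1), (4, 1, 3)]

def score_and_information(beta, data):
    """Return U(beta) and J(beta) = sum over deaths of V_z(u, beta)."""
    u_total, j_total = 0.0, 0.0
    for x_i, delta_i, z_i in data:
        if delta_i != 1:
            continue                       # censored: dN(u) = 0, no contribution
        risk = [z_l for x_l, _, z_l in data if x_l >= x_i]
        s0 = sum(math.exp(z * beta) for z in risk)
        s1 = sum(z * math.exp(z * beta) for z in risk)
        s2 = sum(z * z * math.exp(z * beta) for z in risk)
        zbar = s1 / s0
        u_total += z_i - zbar              # z_{I(u)} - zbar(u, beta)
        j_total += s2 / s0 - zbar ** 2     # V_z(u, beta)
    return u_total, j_total

def newton_raphson(data, beta=0.0, tol=1e-10, max_iter=50):
    """Iterate beta <- beta + U(beta)/J(beta) until the step is tiny."""
    for _ in range(max_iter):
        u, j = score_and_information(beta, data)
        step = u / j
        beta += step
        if abs(step) < tol:
            break
    return beta

beta_hat = newton_raphson(data)
u_hat, j_hat = score_and_information(beta_hat, data)
print("MPLE:", beta_hat, " U(beta_hat):", u_hat, " J(beta_hat):", j_hat)
```

Because the log partial likelihood is concave here, the iteration settles quickly; the reciprocal of `j_hat` is the quantity later used as the estimated variance of the MPLE.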

Terminology: The quantity

$$-\frac{\partial^2 \ell(\beta)}{\partial \beta^2} = \sum_u dN(u)\, V_z(u, \beta)$$

is defined as the partial likelihood observed information and is denoted by J(β).

Ultimately, we want to show that the MPLE βˆ has nice statistical properties. These include:

• Consistency: That is, βˆ will converge to the true value of β which generated the data as

the sample size gets larger. We call this true value β0.

• Asymptotic Normality: βˆ will be approximately normally distributed with mean β0 and a

variance which can be estimated from the data. This approximation will be better as the

sample size gets larger. This result is useful in making inference for the true β.

• Efficiency: Among all competing estimators of β, the MPLE has the smallest variance, at least when the sample size gets large.

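In order to show the properties of $\hat\beta$, we expand $U(\hat\beta)$ at the true value β0 using a Taylor expansion:

$$0 = U(\hat\beta) \approx U(\beta_0) + \frac{\partial U(\beta_0)}{\partial \beta}(\hat\beta - \beta_0).$$

Since

$$\frac{\partial U(\beta_0)}{\partial \beta} = \frac{\partial^2 \ell(\beta_0)}{\partial \beta^2} = -J(\beta_0),$$

therefore

$$(\hat\beta - \beta_0) \approx [J(\beta_0)]^{-1} U(\beta_0).$$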

This expression indicates that we need to investigate the properties of the score function U(β0):

$$U(\beta_0) = \sum_u dN(u) \left[ z_{I(u)} - \bar z(u, \beta_0) \right].$$


Properties of the score:

(1) E[U(β0)] = 0.

Note that

$$E[U(\beta_0)] = E\left[ \sum_u dN(u) \left( z_{I(u)} - \bar z(u, \beta_0) \right) \right] = \sum_u E\left[ dN(u) \left( z_{I(u)} - \bar z(u, \beta_0) \right) \right],$$

and

$$E\left[ dN(u) \left( z_{I(u)} - \bar z(u, \beta_0) \right) \right] = E\left[ E\left[ dN(u) \left( z_{I(u)} - \bar z(u, \beta_0) \right) \,\middle|\, F(u) \right] \right].$$

Conditional on F(u), dN(u) and $\bar z(u, \beta_0)$ are both known. Consequently the inner expectation can be written as

$$dN(u) \left[ E[z_{I(u)} | F(u)] - \bar z(u, \beta_0) \right].$$

Remember that I(u) is the patient identifier for the individual that dies at time u and is set to zero if no one dies at u. If no one dies at u, then dN(u) = 0, and hence the above quantity is zero. If someone dies at u, then dN(u) = 1, and conditional on F(u), we know it has to be one of the Y(u) people at risk at time u; i.e., I(u) must be one of the values {i : Yi(u) = 1}.

The conditional distribution of $z_{I(u)}$ given F(u) can be derived through the conditional distribution of I(u) given F(u), as shown in Table 6.1.

Therefore

$$E[z_{I(u)} | F(u)] = \sum_{l=1}^n z_l w_l = \frac{\sum_{l=1}^n z_l \exp(z_l\beta_0)\, Y_l(u)}{\sum_{l=1}^n \exp(z_l\beta_0)\, Y_l(u)} = \bar z(u, \beta_0).$$

From this, we immediately get

E[U(β0)] = 0.


Table 6.1: Conditional distribution of $z_{I(u)}$ given F(u)

Values of I(u)   Values of z_I(u)   Probability
      1                z1           exp(z1β0)Y1(u) / Σ_{l=1}^n exp(zlβ0)Yl(u) = w1
      2                z2           exp(z2β0)Y2(u) / Σ_{l=1}^n exp(zlβ0)Yl(u) = w2
     ...              ...           ...
      n                zn           exp(znβ0)Yn(u) / Σ_{l=1}^n exp(zlβ0)Yl(u) = wn

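Note: From the conditional distribution of $z_{I(u)}$ given F(u), it is easy to see that the conditional variance of $z_{I(u)}$ is

$$\begin{aligned}
Var[z_{I(u)} | F(u)] &= \sum_{l=1}^n \left( z_l - E[z_{I(u)} | F(u)] \right)^2 w_l \\
&= \frac{\sum_{l=1}^n (z_l - \bar z(u, \beta_0))^2 \exp(z_l\beta_0)\, Y_l(u)}{\sum_{l=1}^n \exp(z_l\beta_0)\, Y_l(u)} \\
&= V_z(u, \beta_0).
\end{aligned}$$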

(2) Finding an unbiased estimate for the variance of U(β0)

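Since E[U(β0)] = 0,

$$\begin{aligned}
Var[U(\beta_0)] &= E[U^2(\beta_0)] \\
&= E\left[ \left( \sum_u dN(u) \left[ z_{I(u)} - \bar z(u, \beta_0) \right] \right)^2 \right] \\
&= E\left[ \sum_u \left\{ dN(u) \left[ z_{I(u)} - \bar z(u, \beta_0) \right] \right\}^2 \right] \\
&\quad + E\left[ \sum_{u \ne u'} \left\{ dN(u) \left[ z_{I(u)} - \bar z(u, \beta_0) \right] \right\} \left\{ dN(u') \left[ z_{I(u')} - \bar z(u', \beta_0) \right] \right\} \right].
\end{aligned}$$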


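As usual, we will take an arbitrary cross-product and show it has zero expectation. Assume u′ > u and denote

$$A(u) = dN(u) \left[ z_{I(u)} - \bar z(u, \beta_0) \right], \quad A(u') = dN(u') \left[ z_{I(u')} - \bar z(u', \beta_0) \right].$$

Then the expectation of the cross-product is

$$E[A(u) A(u')] = E\left[ E[A(u) A(u') | F(u')] \right].$$

Since u′ > u, conditional on F(u′), A(u) is known. So

$$E[A(u) A(u') | F(u')] = A(u)\, E[A(u') | F(u')] = 0,$$

where the last equality holds because $E[A(u') | F(u')] = dN(u') \{ E[z_{I(u')} | F(u')] - \bar z(u', \beta_0) \} = 0$, as shown for property (1).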

Therefore

$$Var[U(\beta_0)] = E\left[ \sum_u A^2(u) \right] = \sum_u E\left[ A^2(u) \right] = \sum_u E\left[ E\left[ A^2(u) \,\middle|\, F(u) \right] \right].$$

The inner conditional expectation is

$$E\left[ A^2(u) \,\middle|\, F(u) \right] = E\left[ \left\{ dN(u) \left[ z_{I(u)} - \bar z(u, \beta_0) \right] \right\}^2 \,\middle|\, F(u) \right].$$

Since we pick the grid points in our partition of time fine enough so that dN(u) is either 0 or 1, dN²(u) = dN(u). Hence

$$E\left[ A^2(u) \,\middle|\, F(u) \right] = E\left[ dN(u) \left[ z_{I(u)} - \bar z(u, \beta_0) \right]^2 \,\middle|\, F(u) \right].$$

Conditional on F(u), dN(u) is known, $\bar z(u, \beta_0)$ is also known, and from Table 6.1

$$\bar z(u, \beta_0) = E[z_{I(u)} | F(u)].$$


Therefore

$$\begin{aligned}
E\left[ A^2(u) \,\middle|\, F(u) \right] &= dN(u)\, E\left[ \left( z_{I(u)} - \bar z(u, \beta_0) \right)^2 \,\middle|\, F(u) \right] \\
&= dN(u)\, Var[z_{I(u)} | F(u)] \\
&= dN(u)\, V_z(u, \beta_0).
\end{aligned}$$

Consequently,

$$Var[U(\beta_0)] = \sum_u E[dN(u)\, V_z(u, \beta_0)] = E\left[ \sum_u dN(u)\, V_z(u, \beta_0) \right].$$

Note that the quantity $\sum_u dN(u) V_z(u, \beta_0)$ is a statistic (it can be calculated from the observed data), so $\sum_u dN(u) V_z(u, \beta_0)$ is an unbiased estimate of Var[U(β0)]. In fact, $\sum_u dN(u) V_z(u, \beta_0)$ is the partial likelihood observed information J(β0) we defined before.

Conclusion

The score $U(\beta_0) = \sum_u A(u)$ is a sum of conditionally uncorrelated mean zero random variables, and its variance can be unbiasedly estimated by

$$J(\beta_0) = \sum_u dN(u)\, V_z(u, \beta_0).$$

By the martingale CLT, we have:

$$U(\beta_0) \overset{a}{\sim} N(0, J(\beta_0)).$$

Previously, we have shown that

$$(\hat\beta - \beta_0) \approx [J(\beta_0)]^{-1} U(\beta_0).$$

Treating J(β0) as a constant, we get the approximate distribution of $(\hat\beta - \beta_0)$:

$$(\hat\beta - \beta_0) \overset{a}{\sim} N(0, J^{-1}(\beta_0)).$$


Of course, in practice, β0 is unknown. But we can substitute $\hat\beta$ for β0 and use $J^{-1}(\hat\beta)$ as the estimated variance of $\hat\beta$. That is, we use the following approximate distribution for $(\hat\beta - \beta_0)$:

$$(\hat\beta - \beta_0) \overset{a}{\sim} N(0, J^{-1}(\hat\beta)),$$

where

$$J(\hat\beta) = \sum_u dN(u) \left[ V_z(u, \hat\beta) \right],$$

and $\hat\beta$ is the MPLE of β solving the following equation:

$$U(\hat\beta) = \sum_u dN(u) \left[ z_{I(u)} - \bar z(u, \hat\beta) \right] = 0.$$

Inference with a Single Covariate

Assume a proportional hazards model with a single covariate z

$$\lambda(t) = \lambda_0(t)\, e^{z\beta}.$$

After we get our data (xi, δi, zi), we can obtain the MPLE $\hat\beta$ by solving the partial likelihood equation; i.e., setting the partial score to zero. Then asymptotically,

$$\hat\beta \overset{a}{\sim} N(\beta_0, J^{-1}(\hat\beta)).$$

We can use this fact to construct a confidence interval for β and test the hypothesis H0 : β = β0, etc. For example, a (1 − α) CI of β is

$$\hat\beta \pm z_{\alpha/2} [J^{-1}(\hat\beta)]^{1/2}.$$

Myelomatosis data revisited: We analyzed the myelomatosis data and did not find a statistically significant difference between treatments 1 and 2. We want to quantify the difference by assuming the hazards of these two treatments are proportional to each other. Define a treatment indicator trt1 which takes value 0 for treatment 1 and takes value 1 for treatment 2. Then we can use Proc Phreg for this purpose.

    /* dur = observed time; status = 0 marks a censored observation */
    proc phreg data=myel;
      model dur*status(0)=trt1;
    run;

Part of the output is given as follows:

    16:43 Thursday, March 2, 2000   15

                      The PHREG Procedure

    Data Set: WORK.MYEL
    Dependent Variable: DUR
    Censoring Variable: STATUS
    Censoring Value(s): 0
    Ties Handling: BRESLOW

        Summary of the Number of Event and Censored Values

                                            Percent
          Total      Event    Censored     Censored
             25         17           8        32.00

            Testing Global Null Hypothesis: BETA=0

                   Without         With
    Criterion    Covariates    Covariates    Model Chi-Square
    -2 LOG L         94.084        92.765    1.319 with 1 DF (p=0.2508)
    Score                 .             .    1.297 with 1 DF (p=0.2547)
    Wald                  .             .    1.263 with 1 DF (p=0.2610)

            Analysis of Maximum Likelihood Estimates

                   Parameter    Standard        Wald          Pr >     Risk
    Variable  DF    Estimate       Error  Chi-Square    Chi-Square    Ratio
    TRT1       1    0.572807     0.50960     1.26344        0.2610    1.773

So $\hat\beta$ = 0.5728 with standard error 0.5096. This means that, compared to treatment 1, treatment 2 is estimated to increase the hazard of dying at any time by 77% ($\exp(\hat\beta) - 1$). A 95% CI of β is

$$\hat\beta \pm 1.96 \times se[\hat\beta] = 0.5728 \pm 1.96 \times 0.5096 = [-0.426, 1.572].$$

And a 95% CI for the hazard ratio exp(β) is

$$[e^{-0.426}, e^{1.572}] = [0.653, 4.816].$$
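The arithmetic behind these intervals can be reproduced directly from the two numbers in the PHREG output (the parameter estimate and its standard error). The sketch below is only a check of that arithmetic; 1.96 is the usual approximation to the 0.975 standard normal quantile.

```python
import math

beta_hat = 0.572807   # parameter estimate for TRT1 from the PHREG output
se = 0.50960          # its standard error from the same output
z_crit = 1.96         # approximate 0.975 quantile of N(0, 1)

# Wald interval for beta, then exponentiate the endpoints for the hazard ratio.
lower = beta_hat - z_crit * se
upper = beta_hat + z_crit * se
hazard_ratio = math.exp(beta_hat)
hr_ci = (math.exp(lower), math.exp(upper))

# Wald chi-square statistic for H0: beta = 0.
wald = (beta_hat / se) ** 2

print("CI for beta:", round(lower, 3), round(upper, 3))
print("hazard ratio:", round(hazard_ratio, 3), " CI:", hr_ci)
print("Wald chi-square:", round(wald, 4))
```

The upper limit for the hazard ratio computed from the unrounded endpoint comes out slightly below 4.816, because the notes exponentiate the endpoint after rounding it to 1.572.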


Note: The output also gives three tests for H0 : β = 0: likelihood ratio, score and Wald tests.

Comparison of score test and two-sample log rank test

Assume z is the dichotomous indicator for treatment; i.e.,

$$z = \begin{cases} 1 & \text{for treatment 1} \\ 0 & \text{for treatment 0} \end{cases},$$

and the proportional hazards model:

$$\lambda(t) = \lambda_0(t)\, e^{z\beta}.$$

Score test: Under H0 : β = 0, the score U(0) (evaluated under H0) has the distribution

$$U(0) \overset{a}{\sim} N(0, J(0)).$$

Or equivalently,

$$\left[ \frac{U(0)}{J^{1/2}(0)} \right]^2 \overset{a}{\sim} \chi^2_1.$$

The score U(0) has the expression

$$U(0) = \sum_u dN(u) \left[ z_{I(u)} - \bar z(u, 0) \right].$$

Then

1. If a death occurs at time u, then dN(u) = 1, in which case there will be a contribution to U(0) of $[z_{I(u)} - \bar z(u, 0)]$. Otherwise there is no contribution.

2. Since z = 1 for treatment 1 and z = 0 for treatment 0, $z_{I(u)}$ will then be the number of deaths at time u from treatment 1.

3. Under H0 : β = 0, $\bar z(u, 0)$ simplifies to

$$\bar z(u, 0) = \frac{\sum_{l=1}^n z_l Y_l(u)}{\sum_{l=1}^n Y_l(u)},$$


which is the proportion of individuals in group 1 among those at risk at time u. Since we only assume one death at time u, this proportion is the expected number of deaths for treatment 1 among those at risk at time u, under the null hypothesis of no treatment difference.

4. Therefore, U(0) is the sum over the death times of the observed number of deaths from treatment 1 minus the expected number of deaths under the null hypothesis. This was the numerator of the two-sample log rank test:

$$\sum_u \left[ dN_1(u) - \frac{Y_1(u)}{Y(u)}\, dN(u) \right],$$

where dN1(u) = # of observed deaths from treatment 1, Y1(u) = # at risk at time u from treatment 1, Y(u) = total # at risk at time u from the 2 treatments, and dN(u) = total # of deaths from the 2 treatments.

5. The denominator of the score test was computed as

$$J^{1/2}(0) = \left[ \sum_u dN(u)\, V_z(u, 0) \right]^{1/2},$$

where

$$V_z(u, 0) = \frac{\sum_l [z_l - \bar z(u, 0)]^2\, Y_l(u)}{\sum_l Y_l(u)}.$$

Note: Among the Y(u) individuals at risk at time u, there are Y1(u) individuals with zl = 1 and Y0(u) individuals with zl = 0. We already argued that

$$\bar z(u, 0) = \frac{Y_1(u)}{Y(u)}.$$

Therefore,

$$\begin{aligned}
V_z(u, 0) &= \frac{\sum_l [z_l - \bar z(u, 0)]^2\, Y_l(u)}{\sum_l Y_l(u)} \\
&= \frac{\left[ 1 - \frac{Y_1(u)}{Y(u)} \right]^2 Y_1(u) + \left[ 0 - \frac{Y_1(u)}{Y(u)} \right]^2 Y_0(u)}{Y(u)} \quad (z_l \text{ takes 1 or 0}) \\
&= \frac{\frac{Y_0^2(u)\, Y_1(u)}{Y^2(u)} + \frac{Y_1^2(u)\, Y_0(u)}{Y^2(u)}}{Y(u)} \quad (Y_1(u) + Y_0(u) = Y(u)) \\
&= \frac{Y_0(u)\, Y_1(u)\, Y(u)}{Y^3(u)} \\
&= \frac{Y_0(u)\, Y_1(u)}{Y^2(u)}.
\end{aligned}$$

Therefore,

$$J(0) = \sum_u dN(u)\, \frac{Y_0(u)\, Y_1(u)}{Y^2(u)}.$$

Let us contrast this with the variance used to compute the logrank test statistic:

$$\sum_u \left[ \frac{Y_1(u)\, Y_0(u)\, dN(u)\, [Y(u) - dN(u)]}{Y^2(u)\, [Y(u) - 1]} \right].$$

Note: In the special case where dN(u) can only be one or zero, the above expression reduces to

$$\sum_u \left[ \frac{Y_1(u)\, Y_0(u)\, dN(u)\, [Y(u) - 1]}{Y^2(u)\, [Y(u) - 1]} \right] = \sum_u \left[ \frac{Y_1(u)\, Y_0(u)\, dN(u)}{Y^2(u)} \right],$$

which is exactly equal to J(0).

Therefore, we have demonstrated that, for continuous survival time data with no ties, the score test of the hypothesis H0 : β = 0 in the proportional hazards model is exactly the same as the logrank test for a dichotomous covariate z.
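The equivalence can also be checked numerically. The sketch below computes U(0) and J(0) from the Cox score formulas and, separately, the logrank numerator and variance from the two-sample formulas, for a small hypothetical data set with no tied death times (the data and function names are made up for illustration).

```python
# Hypothetical two-group data (x, delta, z) with no tied death times.
data = [(1, 1, 1), (2, 1, 0), (3, 0, 1), (4, 1, 0),
        (5, 1, 1), (6, 0, 1), (7, 1, 0), (8, 1, 0)]

def score_test_pieces(data):
    """U(0) and J(0): at beta = 0 every subject at risk has equal weight."""
    u0 = j0 = 0.0
    for x_i, d_i, z_i in data:
        if d_i != 1:
            continue
        risk = [z_l for x_l, _, z_l in data if x_l >= x_i]
        zbar = sum(risk) / len(risk)
        u0 += z_i - zbar
        j0 += sum((z - zbar) ** 2 for z in risk) / len(risk)
    return u0, j0

def logrank_pieces(data):
    """Observed minus expected deaths in group 1, and the logrank variance."""
    num = var = 0.0
    for t in sorted({x for x, d, _ in data if d == 1}):
        y = sum(1 for x, _, _ in data if x >= t)                # Y(u)
        y1 = sum(1 for x, _, z in data if x >= t and z == 1)    # Y_1(u)
        dn = sum(1 for x, d, _ in data if x == t and d == 1)    # dN(u)
        dn1 = sum(1 for x, d, z in data if x == t and d == 1 and z == 1)
        num += dn1 - dn * y1 / y
        if y > 1:  # with a single subject at risk the variance term is zero
            var += y1 * (y - y1) * dn * (y - dn) / (y * y * (y - 1))
    return num, var

u0, j0 = score_test_pieces(data)
num, var = logrank_pieces(data)
print("score:", u0, j0, " logrank:", num, var)
print("chi-square:", u0 ** 2 / j0)
```

With tied death times the two variances would generally differ, which is why the statement above is restricted to data without ties.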

The score test

$$\left[ \frac{U(0)}{J^{1/2}(0)} \right]^2$$

can be used to test the hypothesis H0 : β = 0 for the model

$$\lambda(t|z) = \lambda_0(t)\, e^{z\beta}$$

for any covariate z, whether z is discrete or continuous. The null hypothesis H0 : β = 0 implies that the hazard rate at any time t is unaffected by the covariate z. This also implies that the survival distribution does not depend on z. The alternative hypothesis HA : β ≠ 0 implies that the hazard rate increases or decreases (depending on the sign of β) as z increases, throughout all time. Therefore, belief in this alternative hypothesis would mean that individuals with a higher value of z would have a stochastically larger (or smaller, depending on the sign of β) survival distribution than those individuals with smaller values of z. The test command in Proc Lifetest computes the score test of the hypothesis H0 : β = 0 for the proportional hazards model. Consequently, when using the test command, the covariate z is not limited to being dichotomous or even discrete.

For example, we can test the treatment difference between treatments 1 and 2 for the myelomatosis data using the following SAS command:

    proc lifetest data=myel;
      time dur*status(0);
      test trt;
    run;

and part of the output is presented in the following:

    Univariate Chi-Squares for the LOG RANK Test

                    Test     Standard                       Pr >
    Variable   Statistic    Deviation    Chi-Square    Chi-Square
    TRT          -2.3376       2.0522        1.2975        0.2547

    Covariance Matrix for the LOG RANK Statistics

    Variable          TRT
    TRT           4.21151

    Forward Stepwise Sequence of Chi-Squares for the LOG RANK Test

                                       Pr >    Chi-Square         Pr >
    Variable   DF   Chi-Square   Chi-Square     Increment    Increment
    TRT         1       1.2975       0.2547        1.2975       0.2547

Likelihood Ratio Test

As in the ordinary likelihood theory, the (partial) likelihood ratio test can also be used to

test the null hypothesis:

H0 : β = β0.


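Recall that ℓ(β) is the log partial likelihood. Intuitively, if H0 is true, then $\hat\beta$, the MPLE of β, should be close to β0. Hence $\ell(\hat\beta)$ should be close to ℓ(β0). Since $\ell(\hat\beta) - \ell(\beta_0)$ is always non-negative, we should reject H0 when this difference is large.

The likelihood ratio test uses the fact that

$$2\left[ \ell(\hat\beta) - \ell(\beta_0) \right] \overset{a}{\sim} \chi^2_1, \quad \text{under } H_0 : \beta = \beta_0.$$

Therefore, for a given level of significance α, we reject H0 : β = β0 if

$$2\left[ \ell(\hat\beta) - \ell(\beta_0) \right] \ge \chi^2_{1,\alpha},$$

where $\chi^2_{1,\alpha}$ is the value such that $P[\chi^2_1 > \chi^2_{1,\alpha}] = \alpha$.

Expanding ℓ(β0) at the MPLE $\hat\beta$, we get

$$\ell(\beta_0) \approx \ell(\hat\beta) + \frac{d\ell(\hat\beta)}{d\beta}(\beta_0 - \hat\beta) + \frac{1}{2!} \frac{d^2\ell(\hat\beta)}{d\beta^2}(\beta_0 - \hat\beta)^2.$$

Since the MPLE $\hat\beta$ maximizes ℓ(β), i.e.,

$$U(\hat\beta) = \frac{d\ell(\hat\beta)}{d\beta} = 0,$$

and

$$\frac{d^2\ell(\hat\beta)}{d\beta^2} = -J(\hat\beta),$$

we have

$$2\left[ \ell(\hat\beta) - \ell(\beta_0) \right] \approx J(\hat\beta)(\hat\beta - \beta_0)^2.$$

We already derived that

$$(\hat\beta - \beta_0) \overset{a}{\sim} N(0, J^{-1}(\hat\beta)).$$

Therefore,

$$2\left[ \ell(\hat\beta) - \ell(\beta_0) \right] \approx J(\hat\beta)(\hat\beta - \beta_0)^2 = \left[ \frac{\hat\beta - \beta_0}{J^{-1/2}(\hat\beta)} \right]^2 \overset{a}{\sim} \chi^2_1 \quad \text{under } H_0 : \beta = \beta_0.$$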
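As a check against the PHREG output for the myelomatosis data shown earlier, the likelihood ratio statistic is the difference of the two "-2 LOG L" values, and each of the three reported p-values is an upper tail probability of a chi-square distribution with 1 degree of freedom. The sketch below uses the identity $P[\chi^2_1 > x] = \mathrm{erfc}(\sqrt{x/2})$, so it needs only the standard library; the helper name `chi2_1_sf` is made up.

```python
import math

def chi2_1_sf(x):
    """P[chi-square(1) > x] = P[|Z| > sqrt(x)] = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

# Values copied from the PHREG output for the myelomatosis data.
neg2logl_without = 94.084   # -2 log PL at beta = 0
neg2logl_with = 92.765      # -2 log PL at the MPLE

lr_stat = neg2logl_without - neg2logl_with   # 2[l(beta_hat) - l(0)]
score_stat = 1.2975                          # from the Proc Lifetest output
wald_stat = 1.26344                          # (estimate / standard error)^2

for name, stat in (("LR", lr_stat), ("Score", score_stat), ("Wald", wald_stat)):
    print(name, round(stat, 4), round(chi2_1_sf(stat), 4))
```

The three statistics are close to one another here, as expected when all three are first-order approximations to the same quantity.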

Note: The SAS procedure Phreg can ONLY handle right censored data.
