7 Cox Proportional Hazards Regression Models (Cont'd)

CHAPTER 7    ST 745, Daowen Zhang

7.1 Handling Tied Data in Proportional Hazards Models

So far we have assumed that there are no tied observed survival times in our data when constructing the partial likelihood function for the proportional hazards model. In practice, however, tied survival times are quite common, typically because times are recorded only to a limited precision. We therefore need a different technique to construct the partial likelihood in the presence of tied data.

Throughout this subsection, we will work with the following simple example:

    Patient    x     δ    z
    1          x1    1    z1
    2          x2    1    z2
    3          x3    0    z3
    4          x4    1    z4
    5          x5    1    z5

where $x_1 = x_2 < x_3 < x_4 < x_5$, so the first two patients have tied survival times. We assume the following proportional hazards model:

$$\lambda(t \mid z_i) = \lambda_0(t)\exp(z_i\beta).$$

Since there are 3 distinct death times (i.e., $x_1, x_4, x_5$) in this data set, intuitively the partial likelihood function of $\beta$ takes the form

$$L(\beta) = L_1(\beta)\,L_2(\beta)\,L_3(\beta),$$

where $L_j(\beta)$ is the component of the partial likelihood corresponding to the $j$th distinct death time. Since the second and third death times $x_4$ and $x_5$ are not tied, $L_2(\beta)$ and $L_3(\beta)$ can be constructed in the usual way. In fact,

$$L_2(\beta) = \frac{e^{z_4\beta}}{e^{z_4\beta} + e^{z_5\beta}}, \qquad L_3(\beta) = 1.$$

So we will focus on the construction of $L_1(\beta)$. We will discuss 4 methods that are implemented in SAS.

1. The Exact Method: This method assumes that the survival time has a continuous distribution, so that the true survival times of patients 1 and 2 are different. The two patients have the same survival time in our data only because the measurement is not accurate enough, or because the original data were rounded for convenience and the ordering information was lost. Without any knowledge of the true ordering of the survival times of patients 1 and 2, we have to consider all possible orderings. There are 2! = 2 possible orderings.
Let $A_1$ denote the event that patient 1 died before patient 2, and $A_2$ the event that patient 2 died before patient 1. Since $A_1$ and $A_2$ are mutually exclusive, we have

$$L_1(\beta) = P[\text{observe two deaths at } x_1] = P[A_1 \cup A_2] = P[A_1] + P[A_2],$$

and $P[A_1]$, $P[A_2]$ are given in the usual way:

$$P[A_1] = \frac{e^{z_1\beta}}{e^{z_1\beta} + e^{z_2\beta} + e^{z_3\beta} + e^{z_4\beta} + e^{z_5\beta}} \times \frac{e^{z_2\beta}}{e^{z_2\beta} + e^{z_3\beta} + e^{z_4\beta} + e^{z_5\beta}},$$

$$P[A_2] = \frac{e^{z_2\beta}}{e^{z_2\beta} + e^{z_1\beta} + e^{z_3\beta} + e^{z_4\beta} + e^{z_5\beta}} \times \frac{e^{z_1\beta}}{e^{z_1\beta} + e^{z_3\beta} + e^{z_4\beta} + e^{z_5\beta}}.$$

After the partial likelihood $L(\beta)$ is constructed, inference on $\beta$ is exactly the same as in the case where there are no tied survival times (a survival time tied with a censoring time has no effect on the construction of the partial likelihood). Specifically, we maximize the new partial likelihood $L(\beta)$ to obtain the MPLE of $\beta$, and use the inverse of minus the second derivative of the log partial likelihood function to estimate the variability of the MPLE. We can also perform the score test and the likelihood ratio test.

The exact method is implemented in Proc Phreg in SAS. Suppose that in our data set mydata we use time to denote the (censored) survival times, cens the censoring indicator, and z the covariate. Then the PH model can be fit with the exact method using the following SAS code:

    Proc Phreg data=mydata;
      model time*cens(0) = z / ties=exact;
    run;

Under its assumption of continuous survival times, the exact method uses the correct partial likelihood. However, it can be computationally intensive. Suppose there are $d_j$ tied survival times at the $j$th distinct survival time. Then $d_j!$ different orderings have to be considered, and $L_j(\beta)$ is the sum of $d_j!$ different terms, each of which is the product of $d_j$ terms (conditional probabilities). This number can be very large. For example, when $d_j = 5$, then $d_j \times d_j! = 5 \times 5! = 600$ different terms have to be calculated to get $L_j(\beta)$. Because of this computational difficulty, two methods have been proposed to approximate the exact partial likelihood.

2. Breslow's Approximation (default in Proc Phreg): In our example, enlarging each reduced risk set to the full risk set at $x_1$ gives the following approximations:

$$\frac{e^{z_2\beta}}{e^{z_2\beta} + e^{z_3\beta} + e^{z_4\beta} + e^{z_5\beta}} \approx \frac{e^{z_2\beta}}{e^{z_1\beta} + e^{z_2\beta} + e^{z_3\beta} + e^{z_4\beta} + e^{z_5\beta}},$$

$$\frac{e^{z_1\beta}}{e^{z_1\beta} + e^{z_3\beta} + e^{z_4\beta} + e^{z_5\beta}} \approx \frac{e^{z_1\beta}}{e^{z_1\beta} + e^{z_2\beta} + e^{z_3\beta} + e^{z_4\beta} + e^{z_5\beta}}.$$

Therefore both $P[A_1]$ and $P[A_2]$ can be approximated by

$$\frac{e^{z_1\beta}}{\sum_{l=1}^{5} e^{z_l\beta}} \times \frac{e^{z_2\beta}}{\sum_{l=1}^{5} e^{z_l\beta}} = \frac{e^{(z_1+z_2)\beta}}{\left[\sum_{l=1}^{5} e^{z_l\beta}\right]^2},$$

and hence so can $L_1(\beta)$, up to the constant factor $2!$, which does not involve $\beta$ and can be dropped.

In general, if there are $d_j$ tied survival times at the $j$th distinct survival time, then $L_j(\beta)$ is approximated by

$$L_j(\beta) \approx \frac{\exp\left(\beta \sum_{l \in D_j} z_l\right)}{\left[\sum_{l \in R_j} \exp(z_l\beta)\right]^{d_j}},$$

where $R_j$ is the risk set at the $j$th distinct survival time and $D_j$ is the event (death) set at the $j$th distinct survival time. So the partial likelihood of $\beta$ is

$$L(\beta) = \prod_{j=1}^{D} L_j(\beta) \approx \prod_{j=1}^{D} \frac{\exp\left(\beta \sum_{l \in D_j} z_l\right)}{\left[\sum_{l \in R_j} \exp(z_l\beta)\right]^{d_j}},$$

where $D$ is the total number of distinct event times. This approximation was proposed by Breslow (1974) and is the default in Proc Phreg of SAS.

If at each distinct survival time the number of events (failures) $d_j$ is small and/or the number of patients at risk $n_j$ is large (so that the ratio $d_j/n_j$ is small), then Breslow's approximation should work well, in the sense that the approximated partial likelihood should be very close to the exact partial likelihood. However, if these conditions are not satisfied, the approximation can be poor. Therefore Efron (1977) suggested another approximation.

3. Efron's Approximation: For our example, write $a = \sum_{l=1}^{5} e^{z_l\beta}$, $b = e^{z_1\beta}$ and $c = e^{z_2\beta}$. Then $L_1(\beta)$ from the exact method can be written as

$$L_1(\beta) = \frac{bc}{a(a-b)} + \frac{bc}{a(a-c)},$$

which can be approximated by

$$L_1(\beta) \approx \frac{2bc}{a\left(a - (b+c)/2\right)}.$$

This motivates the general approximation (dropping the constant factor, which does not involve $\beta$):

$$L_1(\beta) \approx \frac{\exp\left(\sum_{l \in D_1} z_l\beta\right)}{\prod_{j=1}^{d_1}\left(\sum_{l \in R_1} e^{z_l\beta} - \frac{j-1}{d_1}\sum_{l \in D_1} e^{z_l\beta}\right)}.$$

We can specify the option ties=efron in Proc Phreg for this approximation.

4. Discrete Method: This method does not assume that there is an underlying ordering of the tied survival times.
Instead, time is assumed to be discrete, which may arise in some applications. For example, suppose we are interested in studying the number of times we drop a dish before it breaks. In this case, we consider the following model: for any death time $t$, let

$$\pi_{it} = P[\text{subject } i \text{ dies at } t \mid \text{subject } i \text{ survives up to } t],$$

and assume the following proportional odds model (a logistic regression with time-varying intercepts):

$$\log\left(\frac{\pi_{it}}{1 - \pi_{it}}\right) = \alpha_t + z_i\beta,$$

where the $\alpha_t$'s are nuisance parameters and $\beta$ is the parameter of interest (a treatment effect, for example). In this case, $L_1(\beta)$ can be interpreted as

$$L_1(\beta) = P[\text{deaths occurred to subjects 1 and 2} \mid \text{there are 2 deaths out of 5 subjects}].$$

It can be shown that the above probability is equal to

$$L_1(\beta) = \frac{e^{(z_1+z_2)\beta}}{\sum_{\text{all } D_j} e^{s_j\beta}},$$

where the sum is over all $\binom{5}{2} = 10$ possible sets $D_j$ of two subjects chosen from the five at risk, and $s_j = \sum_{l \in D_j} z_l$ is the sum of the covariates of the subjects in $D_j$.

The model considered here is not a proportional hazards model. However, when there are no tied observations in the data set, the resulting likelihood is exactly the same as the Cox partial likelihood. This is the main reason that the discrete method is included in Proc Phreg. Note that the conditional logistic model is a special case of this model, so Proc Phreg can be used to fit a conditional logistic model. Also note that this method can be even more computationally intensive than, say, the exact method.

7.2 Multiple Covariates

The real strength of the proportional hazards model is that it allows us to model the relationship of survival time, through its hazard function, to many covariates simultaneously:

$$\lambda(t \mid z) = \lambda_0(t)\,e^{z_1\beta_1 + \cdots + z_p\beta_p} = \lambda_0(t)\,e^{z^T\beta},$$

where $z = (z_1, \cdots, z_p)^T$ is a $(p \times 1)$ vector of covariates and $\beta = (\beta_1, \cdots, \beta_p)^T$ is a $(p \times 1)$ vector of regression coefficients.

Estimation of $\beta$ is entirely analogous to the case of one covariate.
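As a small illustration of the multi-covariate model above, the following sketch evaluates the relative hazard $e^{z^T\beta}$ and the hazard ratio between two subjects, in which the baseline hazard $\lambda_0(t)$ cancels. The function names, coefficient values and covariate vectors are hypothetical and chosen only for illustration; they do not come from these notes.

```python
import math

def relative_hazard(z, beta):
    """Return exp(z^T beta), the multiplier applied to the baseline hazard."""
    if len(z) != len(beta):
        raise ValueError("z and beta must have the same length")
    return math.exp(sum(zk * bk for zk, bk in zip(z, beta)))

def hazard_ratio(z_a, z_b, beta):
    """Hazard ratio of subject a versus subject b; lambda_0(t) cancels."""
    return relative_hazard(z_a, beta) / relative_hazard(z_b, beta)

# Hypothetical values (assumptions for illustration only).
beta = [0.5, -0.2]   # coefficients for (treatment indicator, age in decades)
z_a = [1.0, 6.0]     # treated subject, age 60
z_b = [0.0, 6.0]     # untreated subject, age 60

# Only the treatment indicator differs, so the ratio reduces to exp(beta[0]).
hr = hazard_ratio(z_a, z_b, beta)
```

Because the two subjects differ in a single covariate, the hazard ratio depends only on the corresponding coefficient, which is the usual way a fitted $\beta_k$ is read.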
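The partial likelihood of $\beta$ is given by

$$PL(\beta) = \prod_{\{\text{all grid pts } u\}} \left[\frac{\exp\left(z_{I(u)}^T\beta\right)}{\sum_{l=1}^{n} \exp\left(z_l^T\beta\right) Y_l(u)}\right]^{dN(u)},$$

and the log partial likelihood of $\beta$ is

$$\ell(\beta) = \sum_{\{\text{all grid pts } u\}} dN(u)\left[z_{I(u)}^T\beta - \log\left(\sum_{l=1}^{n} \exp\left(z_l^T\beta\right) Y_l(u)\right)\right].$$

Note: $z_l$ is the covariate vector for the $l$th individual; i.e., $z_l = (z_{l1}, \cdots, z_{lp})^T$.

The maximum partial likelihood estimate $\hat\beta$ (MPLE) of $\beta$ is obtained by maximizing $\ell(\beta)$, i.e., by setting the score vector to zero:

$$U(\beta) = \frac{\partial \ell(\beta)}{\partial \beta} = 0,$$

where

$$\frac{\partial \ell(\beta)}{\partial \beta} = \left(\frac{\partial \ell(\beta)}{\partial \beta_1}, \cdots, \frac{\partial \ell(\beta)}{\partial \beta_p}\right)^T.$$

Similar to the previous chapter, we have

$$\frac{\partial \ell(\beta)}{\partial \beta_j} = \sum_{u} dN(u)\left[z_{I(u)j} - \bar z_j(u, \beta)\right],$$

where $z_{I(u)j}$ denotes the $j$th element of the covariate vector for the individual $I(u)$ who died at time $u$, and

$$\bar z_j(u, \beta) = \frac{\sum_{l=1}^{n} z_{lj}\exp\left(z_l^T\beta\right) Y_l(u)}{\sum_{l=1}^{n} \exp\left(z_l^T\beta\right) Y_l(u)} = \sum_{l=1}^{n} z_{lj} w_l, \qquad w_l = \frac{\exp\left(z_l^T\beta\right) Y_l(u)}{\sum_{k=1}^{n} \exp\left(z_k^T\beta\right) Y_k(u)},$$

is the weighted average of the $j$th element of the covariate vector over the individuals at risk at time $u$.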
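The score formula above can be sketched directly in code. The following is a minimal pure-Python version that assumes no tied death times, takes $Y_l(u) = 1$ when $x_l \ge u$, and uses a small made-up data set; the function names and data are illustrative assumptions, not part of these notes or of any SAS procedure.

```python
import math

def _linear_predictor(z, beta):
    """Return z^T beta for one covariate vector."""
    return sum(zk * bk for zk, bk in zip(z, beta))

def log_partial_likelihood(times, events, Z, beta):
    """Log partial likelihood l(beta), assuming no tied death times."""
    ll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i != 1:
            continue  # censored subjects enter only through the risk sets
        risk = [l for l, t_l in enumerate(times) if t_l >= t_i]  # Y_l(t_i) = 1
        denom = sum(math.exp(_linear_predictor(Z[l], beta)) for l in risk)
        ll += _linear_predictor(Z[i], beta) - math.log(denom)
    return ll

def score_vector(times, events, Z, beta):
    """Score U(beta): sum over deaths of z_{I(u)} minus the weighted mean z-bar."""
    p = len(beta)
    U = [0.0] * p
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i != 1:
            continue
        risk = [l for l, t_l in enumerate(times) if t_l >= t_i]
        e = {l: math.exp(_linear_predictor(Z[l], beta)) for l in risk}
        total = sum(e.values())
        for j in range(p):
            # Weighted average of covariate j over the risk set, weights w_l.
            zbar_j = sum(Z[l][j] * e[l] for l in risk) / total
            U[j] += Z[i][j] - zbar_j
    return U

# Hypothetical data: five subjects, two covariates, third subject censored.
times = [2.0, 3.0, 5.0, 7.0, 8.0]
events = [1, 1, 0, 1, 1]
Z = [[1.0, 0.5], [0.0, 1.2], [1.0, 0.3], [0.0, 0.9], [1.0, 1.5]]

# At beta = 0 all weights are equal, so z-bar is a plain risk-set average.
U0 = score_vector(times, events, Z, [0.0, 0.0])
```

Solving $U(\beta) = 0$ would additionally need an iterative method such as Newton-Raphson, which is not shown here; comparing `score_vector` against a finite-difference gradient of `log_partial_likelihood` is a simple way to check the implementation.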
