CPS397-P1-S.Pdf
Total Page:16
File Type:pdf, Size:1020Kb
Model selection curves for survival analysis with accelerated failure time models Karami, J. H.*, Luo, K., Fung, T. Macquarie University, Sydney, Australia – [email protected], [email protected], [email protected] Abstract Many model selection processes involve minimizing a loss function of the data over a set of models. A recent introduced approach is model selection curves, in which the selection criterion is expressed as a function of penalty multiplier in a linear (mixed) model or generalized linear model. In this article, we have adopted the model selection curves for accelerated failure time (AFT) models of survival data. In our simulation study, it was found that for data with small sample size and high proportion of censoring, the model selection curves approach outperformed the traditional model selection criteria, Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC). In other situations with respect to sample size and proportion of censoring, the model selection curves correctly identify the true model whether it is a full or reduced model. Moreover, through bootstrapping, it was shown that the model selection curves can be used to enhance the model selection process in AFT models. Keywords: log-likelihood function; penalty function; penalty multiplier; survival data. 1. Introduction Many model selection strategies are based on minimizing 푙표푠푠 + 푝푒푛푎푙푡푦 over a set of models. The penalty term usually consists of a penalty multiplier 휆 and a penalty function 푓푛(푝훼), where 푝훼 is the number of parameters in model 훼 and 푛 is the sample size. For a fixed penalty function, say 푓푛(푝훼) = 푝훼, if we restrict ourselves to a single value of 휆, we get a specififc model selection criteria. For example, 휆 = 2 will give AIC (Akaike, 1974) and 휆 = log(푛) will give BIC (Schwarz, 1978). Other model selection criteria can be obtained using other values of 휆, or other penalty functions. Recently, Müller and Welsh (2010) studied model selection criteria as a function of penalty multiplier, which resulted in what is known as model selection curves. In their study the model selection curves approach for linear regression models was illustrated, where they used residual sum of squares as the loss function. This approach allows researchers to study and rank candidate models before selecting a model. The stability of the selected model can be assessed in a range of 휆 values using the model selection curves. This approach has the potential to outperform other model selection criteria that are based on single value for penalty multiplier. For many models, including accelerated failure time (AFT) model in survival analysis, it is common to use the log-likelihood to compare models. See for example Liang and Zou (2008), and Murray et al (2012). Our article concentrates on using the log-likelihood (푙) for measuring the descriptive ability of an AFT model, and thus log-likelihood is a natural choice of loss function. There are also other loss functions, such as mean squared error, that can be used. Suppose, we wish to choose one model 훼 from all candidate models in 풜 using model selection curves. The function of 푙표푠푠 + 푝푒푛푎푙푡푦 can be expressed as: ℳ(휆; 훼) = −2푙 + 휆푝훼, 휆 ≥ 0, 훼 ∈ 풜(1) For each specified 휆, a model is chosen, which minimizes ℳ(휆; 훼) over 훼 ∈ 풜 in (1). Now we can define a rank function: 푟(휆; 훼) = 푟푎푛푘(ℳ(휆; 훼))(2) The 푟(휆; 훼) is computed at each 휆. Note that rank functions are step functions, pairs of which have jumps at the values of 휆 where the ranks of models change. Now, assuming that no ties of ℳ(휆; 훼) occur for 훼 ∈ 풜, the 푘 (1 ≤ 푘 ≤ 푚, and 푚 is the number of models in 풜) rank model selection curves can be defined by 퐶(푘)(휆; 풜) = 푚푎푥{ℳ(휆; 훼); 훼 ∈ 풜 ∧ 푟(휆, 훼) ≤ 푘}(3) This definition can be extended to tied ℳ(휆; 훼)’s for 훼 ∈ 풜 by considering continuous locus of 퐶(푘)(휆; 풜) over 휆 > 0. The 1 rank model selection curve is the lower enveloping curve, which is defined as 퐶(1)(휆; 풜) = 푚푖푛{ℳ(휆; 훼); 훼 ∈ 풜}(4) This will be referred to as model selection curve. If perpendiculars are drawn from the vertices of the lower enveloping curve to the horizontal axis, the line segments on the horizontal axis give the respective length of cathetus corresponding to models that appear on the model selection curves. Thus model 훼 should be chosen or selected if it has the longest cathetus in 퐶(1)(휆; 풜), ie, the 1 rank model selection curve. Let’s now illustrate how to construct and use model selection curves using a simple example considering only four models, ie, 훼 = 1, 2, 3, 4. The plot of ℳ(휆; 훼) in 푒푞푢푎푡푖표푛(1) against 휆 are shown on the top left of Figure 1 where 푀1 to 푀4 corresponds to model 훼 = 1 to 4, respectively. Plot of ranks, 푟(휆; 훼) in (2), against 휆 is on top right, ie, Figure 1(ii). Figure 1(iii) is the four rank selection curves, based on (3), and the model selection curve, obtained via (4), is presented on the bottom right of Figure 1. According to the model selection curve, Figure 1(iv), and also with reference to Figure 1(ii), model 2 (ie, 푀2) has the longest cathetus and is chosen from the four models considered. In this paper, it is intended to apply model selection curves for accelerated failure time models of right censored survival data using a simulation study. The rest of the paper is organized as follows: In Section 2, we briefly discuss the model selection curves in the case of Weibull AFT models. In Section 3, a simulation study with different combination of coefficients, censoring proportions and sample sizes are considered and discussed. In Section 4, we conclude our findings. 2. Model selection curves in Weibull AFT model A general form of an AFT model is given by: 푙표푔푇 = 훼0 + 훼1푥1 + ⋯ + 훼푝푥푝 + 휏푊(5) where 푇 is survival time with a specific distribution, 푊 is random error term with a known form of density (eg, extreme value distribution), 푥푗 ’s are predictors and 훼푗 ’s are coefficients of 푥푗 (푗 = 1, 2, ⋯ , 푝), 훼0 is the model intercept and 휏(휏 > 0) is a scale parameter. Let {(푦푖, 훿푖), 푖 = 1, 2, ⋯ , 푛} denote the survival data with right censoring where 푦푖 = 푚푖푛(푇푖, 퐶푖), 퐶푖 is the censoring time for 푖푡ℎ individual and 훿푖 = 퐼(푇푖 ≤ 퐶푖) is the censoring indicator. Assuming the pairs (푦푖, 훿푖), 푖 = 1, 2, ⋯ , 푛 are independent, a general form of likelihood function for the 푛 훿푖 1−훿푖 survival data is 퐿 = ∏푖=1 푓(푦푖) 푆(푦푖) , and thus the log-likelihood function is 푛 푙(휽) = ∑ {훿푖푙표푔푓(푦푖; 푥푖) + (1 − 훿푖)푙표푔푆(푦푖)} 푖=1 푇 푇 푇 where, 휽 = (훼0, 훼 , 휏) , 훼 = (훼1, 훼2, ⋯ , 훼푝) , 푓(⋅) is the density function and 푆(⋅) is the survival function. The 푙(휽) can be expressed in terms of the density function (푔0(푤)) and the survival function (퐺0(푤)) of 푊 as below: 푛 1 푙(휽) = ∑ [훿푖푙표푔 { 푔0(푤푖)} + (1 − 훿푖)푙표푔퐺0(푤푖)] 푖=1 푦푖휏 or, 푛 푛 푛 푙(휽) = −푙표푔휏 ∑ 훿푖 + ∑ {훿푖푙표푔푔0(푤푖) + (1 − 훿푖)푙표푔퐺0(푤푖)} − ∑ 훿푖푙표푔푦푖 (6) 푖=1 푖=1 푖=1 푙표푔푦 −훼 −훼푇푥 where, 푤 = 푖 0 푖. 푖 휏 A Weibull AFT model arises if the error term 푊 in model (5) follows an extreme value distribution with density and survival functions respectively as 푔0(푤) = 푒푥푝[푤 − 푒푥푝(푤)] and 퐺0(푤) = 푒푥푝[−푒푥푝(푤)], 푤 ∈ ℝ. Plugging these two functions into equation (6) gives the log-likelihood for the Weibull AFT model as: 푛 푛 푇 푛 1 − 휏 푙표푔푦푖 − 훼0 − 훼 푥푖 훼0 푇 푙(휽) = ( ) ∑ 훿푖푙표푔푦푖 − ∑ 푒푥푝 ( ) − ∑ 훿푖 (푙표푔휏 + + 훼 푥푖) 휏 푖=1 푖=1 휏 푖=1 휏 The maximum likelihood estimates (MLE) of the parameters can be obtained from the above equation using an iterative procedure (eg, Newton-Raphson), and the model selection curves for the Weibull AFT model can then be constructed from ℳ(휆; 훼) = −2푙(휽̂) + 휆푝훼, 휆 > 0(7) where, 휽̂ is the MLE of 휽. As expected, both AIC and BIC would appear as single points on the model selection curves. To construct model selection curves, we choose 휆 ∈ [0, 4푙표푔(푛)] in (7), same as Müller and Welsh’s study (2010). Such range of 휆 would cover most of existed model selection criteria that are based on single values of 휆, 푠푢푐ℎ푎푠퐴퐼퐶푎푛푑퐵퐼퐶. 3. A simulation study For our simulation study, we consider Weibull AFT model with four predictors, ie, 푙표푔푇푖 = 훼0 + 훼1푥1 + 훼2푥2+훼3푥3 + 훼4푥4 + 휏푊푖, 푖 = 1, 2, ⋯ , 푛. Observations for the (weakly correlated) predictors 푥1, 푥2, 푥3, 푥4 are drawn from a multivariate normal distribution 푀푁(ퟎ, 휮0 ) with zero mean vector and dispersion matrix, 1 0.001 0.001 0.001 0.001 1 0.001 0.001 휮 = [ ]. We considered different sample sizes ranging from 30 to 300 0 0.001 0.001 1 0.001 0.001 0.001 0.001 1 at various censoring proportions such as 10% and 50% along with different sets of coefficients: coefs1 = (0.1, 0, 0, 0.9, 0.8), coefs2 = (0.1, 0, 0, 0.9, 0), and coefs3 = (0.1, 1, 0.7, 0.9, 0.8). Both uncensored and censored observations (ie, 푇푖 and 퐶푖 ) are drawn separately from Weibull distribution with specified shape and scale.