Penalised maximum likelihood estimation for multi-state models
Robson Jose´ Mariano Machado
A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy of University College London.
Department of Statistical Science University College London
October 17, 2018 2
I, Robson Jose´ Mariano Machado, confirm that the work presented in this thesis is my own. Where information has been derived from other sources, I confirm that this has been indicated in the work. Abstract
Multi-state models can be used to analyse processes where change of status over time is of interest. In medical research, processes are commonly defined by a set of living states and a dead state. Transition times between living states are often in- terval censored. In this case, models are usually formulated in a Markov processes framework. The likelihood function is then constructed using transition probabili- ties. Models are specified using proportional hazards for the effect of covariates on transition intensities. Time-dependency is usually defined by parametric models, which can represent a strong model assumption. Semiparametric hazards specifica- tion with splines is a more flexible method for modelling time-dependency in multi- state models. Penalised maximum likelihood is used to estimate these models. Se- lecting the optimal amount of smoothing is challenging as the problem involves multiple penalties. This thesis aims to develop methods to estimate multi-state models with splines for interval-censored data. We propose a penalised likelihood method to estimate multi-state models that allow for parametric and semiparametric hazards specifications. The estimation is based on a scoring algorithm, and a grid search method to estimate the smoothing parameters. This method is shown using an application to ageing research. Furthermore, we extend the proposed method by developing a computationally more efficient method to estimate multi-state models with splines. For this extension, the estimation is based on a scoring algorithm, and an automatic smoothing parameters selection. The extended method is illustrated with two data analyses and a simulation study. Acknowledgements
I would like to thank the Conselho Nacional de Desenvolvimento Cient´ıfico e Tec- nologico´ - Brazil for funding this research. I would like to express my sincere grati- tude to Dr Ardo van den Hout for his thorough supervision of my PhD. In particular, I am thankful for all the opportunities he provided, for motivating me and for his patience. I would like to thank Dr Giampiero Marra for providing good directions to this research. I am thankful to Dr Aidan O’Keeffe and Dr Andrew Titman for their valuable comments that significantly improved this thesis. I thank Dr Boo Jo- hansson for permission to use the OCTO data, a connection that was made possible by Dr Graciela Muniz-Terrera to whom I am also thankful. In addition, I wish to thank the team of researchers behind the ELSA study. I would like to thank my friends and colleagues from University College London for the time spent together and exchange of ideas. Finally, I am thankful to my family and friends for their support in general and throughout my academic path, especially to my parents for making me independent. Special thanks to Verena for making everything easier during the development of this research, for providing support in difficult moments and for celebrating the good ones. Contents
1 Introduction 14 1.1 Research aim and overview ...... 14 1.2 Special features of multi-state processes ...... 16 1.2.1 Model structure ...... 17 1.2.2 Observational patterns ...... 18 1.3 Literature review ...... 19 1.4 Survival analysis ...... 22 1.4.1 The exponential distribution ...... 25 1.4.2 The Gompertz distribution ...... 25 1.4.3 The Weibull distribution ...... 26 1.4.4 The lognormal distribution ...... 27 1.5 Simulation of multi-state processes ...... 29 1.6 Examples ...... 29 1.6.1 Cardiac allograft vasculopathy data ...... 29 1.6.2 English longitudinal study of ageing data ...... 31 1.6.3 Origins of variance in the oldest-old data ...... 32 1.7 Discussion ...... 33
2 Parametric multi-state models 35 2.1 Continuous-time Markov processes ...... 35 2.2 Model representation ...... 38 2.3 Piecewise-constant hazards approximation ...... 40 2.4 Maximum likelihood estimation ...... 41 Contents 6
2.4.1 Likelihood function of the model ...... 41 2.4.2 Scoring algorithm ...... 43 2.5 Confidence intervals ...... 45 2.6 Model selection ...... 46 2.7 Model validation ...... 47 2.8 Applications ...... 48 2.8.1 Origins of variance in the oldest-old data ...... 48 2.8.2 Cardiac allograft vasculopathy data ...... 51 2.9 Discussion ...... 56
3 Multi-state models with splines 57 3.1 Introduction ...... 57 3.1.1 Smoothing methods ...... 58 3.1.2 Cubic regression splines ...... 59 3.1.3 P-splines ...... 61 3.2 Model representation ...... 63 3.3 Penalised maximum likelihood estimation ...... 65 3.3.1 Penalised log-likelihood function ...... 65 3.3.2 Parameter estimation ...... 66 3.3.3 Smoothing parameter estimation ...... 68 3.4 Prediction ...... 69 3.5 Application to the English longitudinal study of ageing data . . . . . 69 3.5.1 Predicting cognitive function ...... 76 3.6 Discussion ...... 78
4 Automatic smoothing for multi-state models 81 4.1 Background for automatic smoothing ...... 81 4.2 Penalised maximum likelihood estimation ...... 85 4.2.1 Model representation ...... 85 4.2.2 Piecewise-constant hazards ...... 85 4.2.3 Penalised log-likelihood function ...... 86 Contents 7
4.2.4 Parameter estimation ...... 86 4.2.5 Smoothing parameters estimation ...... 88 4.2.6 Summary of the algorithm ...... 89 4.2.7 Confidence intervals ...... 89 4.3 Simulation study ...... 90 4.4 Applications ...... 92 4.4.1 Origins of variance in the oldest-old data ...... 93 4.4.2 Cardiac allograft vasculopathy data ...... 98 4.5 Discussion ...... 101
5 Conclusions and future work 103
Appendices 107
A Code for the R software 107
Bibliography 116 List of Figures
1.1 A two-state survival model ...... 17 1.2 An unidirectional model ...... 17 1.3 An illness-death model ...... 18 1.4 A competing risks model ...... 18 1.5 The trajectory of an illness-death process. Dashed vertical lines represent the follow-up times. Solid horizontal lines represent the state occupancy over time ...... 19 1.6 Gompertz hazard functions for scale and shape parameters (λ,θ) = (1,−0.8),(0.1,0.15) and (0.01,0.45) ...... 26 1.7 Weibull hazard function for scale parameter λ = 1 and the shape parameters γ = 0.5, 1, 2 and 3 ...... 27 1.8 Lognormal hazard functions for µ = 1 and σ 2 = 0.75, 1, and 1.75 . 28 1.9 Four-state model for disease progression after transplant for the CAV data ...... 30 1.10 Five-state model for longitudinal data in ELSA on number of words remembered in a recall ...... 32 1.11 Four-state model for longitudinal data in OCTO ...... 33
2.1 Histogram of age at baseline transformed by minus 80 in the OCTO data. The variable t represents age minus 80 ...... 49 2.2 Estimated Gompertz hazards (solid lines) for women, with 95% confidence intervals (dashed lines). The confidence intervals are obtained by simulation with b = 1000 replications ...... 51 List of Figures 9
2.3 Comparison of model-based survival from states 1, 2, and 3 with Kaplan-Meier curves. Model-based survival: grey lines for indi- viduals, blue lines for the mean of the individual survival curves. Kaplan-Meier in black lines with 95% confidence intervals. Fre- quencies for baseline state along vertical axes ...... 52 2.4 Illness-death without recovery model for disease progression after transplant for the CAV data ...... 53 2.5 Histogram of time since transplant in the CAV data ...... 54 2.6 Estimated Gompertz hazards for subjects with IHD and with donor age of 26 (solid lines), with 95% confidence intervals (dashed lines). The confidence intervals are obtained by simulation with b = 1000 replications ...... 55 2.7 Comparison of model-based survival from states 1 with Kaplan- Meier curves. Model-based survival: grey lines for individuals, blue line for the mean of the individual survival curves. Kaplan-Meier in black lines with 95% confidence intervals ...... 56
3.1 The left hand panel illustrates one basis function, B4(x), for a cu- bic regression spline. On the right hand panel, the various curves
of medium thickness show the basis functions, Bi(x), of a cubic re-
gression spline, each multiplied by its coefficients α j. These scaled basis functions are summed to get the smooth curve illustrated by the thick continuous curve ...... 61 3.2 Illustrations of B-splines basis functions of degrees (a) one, (b) two, and (c) three ...... 62 3.3 Illustration of smooth curves made up of B-spline basis functions. On the left hand panel, the dashed curves show B-splines basis func- tions with m = 1 multiplied by their associated coefficients. The thick solid smooth curve is the sum of the scaled basis functions. The right hand panel shows the same, but for B-splines basis func- tion with m = 2...... 63 List of Figures 10
3.4 Five-state model for longitudinal data in ELSA on number of words remembered in a recall ...... 70
3.5 AIC results and fitted hazard transitions for men with ten or more
year of education. In (a) and (c), the AIC results for fixed λ2 = 10 −3 and fixed λ1 = 10 , respectively. In (b) and (d), the estimated hazards for 2 → 3 and 3 → 4, respectively. Solid line for P-splines I and dotted line for Gompertz. In (e) and (f), the AIC results for model P-splines II and fitted hazard for 3 → 4, respectively. Time denotes age transformed by subtracting 49 years ...... 74
3.6 Comparison of model-based survival from states 1, 2, 3, and 4 with Kaplan-Meier curves. Model-based survival: grey lines for indi- viduals, smooth black line for the mean of the individual survival curves. Kaplan-Meier in black lines with 95% confidence bands. Frequencies for baseline state along vertical axes ...... 77
3.7 For the P-splines II model, estimated ten-year transition probabil- ities for men aged 60 with ten or more years of education, and in state 3 at baseline. Solid line for transition probabilities (with B = 1000) and dashed lines for 95% confidence bands ...... 78
4.1 An unidirectional three-state model ...... 82
4.2 Simulation study: true (black lines), estimated (grey lines) and mean estimated (red lines) hazards for the illness-death model for 100 replications ...... 91
4.3 Four-state model for longitudinal data in OCTO ...... 94
4.4 Histogram of age transformed by minus 80 in the OCTO data . . . . 95
4.5 Estimated smooth hazards (solid lines) for women, with 95% con- fidence intervals (dashed lines) ...... 96 List of Figures 11
4.6 Comparison of model-based survival from states 1, 2, and 3 with Kaplan-Meier curves. Model-based survival: grey lines for indi- viduals, smooth blue lines for the mean of the individual survival curves. Kaplan-Meier in black lines with 95% confidence intervals. Frequencies for baseline state along vertical axes ...... 97 4.7 Illness-death without recovery model for disease progression after transplant ...... 98 4.8 Histogram of time since transplant in the CAV data ...... 99 4.9 Estimated smooth hazards for subjects with IHD and with donor age of 26 (solid lines), with 95% confidence intervals (dashed lines) 100 4.10 Comparison of model-based survival with Kaplan-Meier curves for the Gompertz model (left-hand side) and spline model (right-hand side). Model-based survival: grey lines for individuals, blue lines for the mean of the individual curves. Kaplan-Meier in black lines with 95% confidence intervals ...... 101 List of Tables
1.1 State table for the CAV data: number of times each pair of states was observed at successive observation times. The three living states are defined by CAV severity ...... 31
1.2 State table for the OCTO data: number of times each pair of states was observed at successive observation times. The three living states are defined by grades of cognitive function ...... 34
2.1 State table for the OCTO data for the history of State 3: number of times each pair of states was observed at successive observation times. The three living states are defined by grades of cognitive function ...... 48
2.2 Parameter estimates for the four-state for the OCTO data. Estimated standard errors in parentheses. Time scale t is age in years minus 80 49
2.3 State table for the CAV data: number of times each pair of states was observed at successive observation times. The two living states are defined by CAV severity ...... 53
2.4 Parameter estimates for the illness-death without recovery for the CAV data. Estimated standard errors in parentheses. The variable t represents time since baseline ...... 54
3.1 State table for the ELSA data: number of times each pair of states was observed at successive observation times. The four living states are defined by number of words remembered ...... 70 List of Tables 13
3.2 Comparison between models for the ELSA data with N = 1000, where -2LL stands for -2 times the (penalised) loglikelihood func- tion evaluated at its maximum. The variable t denotes age trans- formed by subtracting 49 years ...... 73 3.3 Results for sex, education and time for the five-state P-splines II model for the ELSA data. Estimated standard errors in parentheses. The variable t denotes age transformed by subtracting 49 years . . . 75
4.1 Simulation study to investigate the performance of the multi- state models with splines for modelling time-dependent processes. Mean, bias and estimated standard errors (eSE) for R = 100 repli- cations. Absolute bias less than x is denoted by dxe ...... 93 Chapter 1
Introduction
1.1 Research aim and overview
Multi-state models are a broadly applicable approach to analysing longitudinal data, where change of status over time is of interest. In medical research, these statuses are usually defined by the severity of a disease or condition of subjects in a follow- up study. Some standard examples include, dementia research in which interest lies in the decline of cognitive function, post-transplantation chronic disease studies, which aim to investigate disease progression after transplant, and human immun- odeficiency virus (HIV), where modelling the transition of subjects across disease stages is of interest. Data deriving from these medical conditions share similar char- acteristics inherent to how data are obtained. Specifically, time of change in a med- ical condition cannot always be observed exactly, instead changes are recorded at pre-specified follow-up times. In this case, transition times are said to be interval- censored. On the other hand, if a dead state is defined, time of death is usually known exactly. Multi-state models for data with these characteristics are the focus of this thesis.
Multi-state models for interval-censored data are commonly formulated in a Markov processes framework (Kalbfleisch and Lawless, 1985). The Markov prop- erty states that the future of the process only depends on the current state. Models can be specified through the transition intensities (also called hazards), which rep- resent the instantaneous risks of moving across states. To facilitate estimation, a 1.1. Research aim and overview 15 time homogeneous Markov process is usually assumed (Kalbfleisch and Lawless, 1985; Jackson, 2011). In this case, the hazards are assumed to be constant over time. For a wide range of applications, the risks of moving across states depend on the current state and on time. In this case, a non-homogeneous Markov assumption is assumed to model the multi-state process. Several time-dependent models can be fitted with parametric specifications (Van den Hout, 2017). However, the functional forms underlying the hazards are often unknown and parametric models can be too restrictive.
Flexible multi-state models can be obtained with smooth nonparametric haz- ard functions specification. In this case, the hazards are specified in terms of splines function basis. Splines are polynomial functions, which allow for flexible modelling De Boor (1978). In particular, they enable modelling time-dependent hazards with- out making strong model assumptions. For each hazard function, an extra parameter is employed to control model smoothness. These parameters are commonly called smoothing parameters. In the frequentist context, penalised maximum likelihood is used to estimate the models. Estimation of multi-state models with splines can be carried out in two challenging steps. First, for fixed values of smoothing parame- ters, penalised maximum likelihood estimation is used to obtain estimates for the model parameters. Second, given the estimates of model parameters in the first step, we aim to define a stable and efficient method to estimate the optimal values for the smoothing parameters. These two steps are iterated until a convergence criterion is met. At the present time, methods available cannot fully address the problem of estimating flexible multi-state models in the presence of interval censoring.
This thesis aims to develop a new efficient method for estimating multi-state models with splines in the presence of censoring. The focus is on observation schemes in which the transition times between living states are interval-censored and times into the dead (or absorbing) state are known exactly (or right-censored).
In the remainder of this chapter, some special features of multi-state modelling are presented. This includes a description of basic concepts of survival analysis that are used throughout this thesis. In addition, the literature on techniques for 1.2. Special features of multi-state processes 16 parametric and semiparametric multi-state models is reviewed. This also includes a review of smoothing methods useful for the development of methods in this thesis. Finally, we introduce data used to illustrate methods studied and developed in this thesis. Chapter 2 follows this introduction by presenting the theory of multi-state mod- elling. Specifically, we discuss parametric model specification and estimation. A detailed description of a scoring algorithm to obtain the maximum likelihood esti- mates is discussed. The method is then illustrated through an application to data for post-heart transplantation patients, and data for decline of cognitive function. Chapter 3 then extends parametric multi-state models by allowing for transition-specific hazards specification with splines. Firstly, we provide an in- troduction to smoothing methods that will be used for the remainder of the thesis. In particular, we present cubic regression splines and P-splines. We then present our method for specification and estimation of multi-state models with splines, that uses a scoring algorithm for estimating the model parameters, and a grid search for selecting the optimal amount of smoothing. This method is then illustrated with an application to ageing research. Chapter 4 proposes an unifying and efficient framework for estimating multi- state models with splines in the presence of censoring. A simulation study is carried out to analyse the performance of the method presented. Subsequently, the method is illustrated with an application to data used in Chapter 2. The final chapter provides a discussion of the methods and applications devel- oped in this thesis, and indicates areas for future research.
1.2 Special features of multi-state processes
In this section, a number of different multi-state models are presented in order to illustrate some essential features of these models. In particular, we discuss some model structures and observational patterns commonly found in medical research. The discussion presented in this section is mainly based on the papers by Andersen and Keiding (2002) and Commenges (2002). 1.2. Special features of multi-state processes 17 1.2.1 Model structure Graphically, multi-state models can be illustrated using diagrams with boxes rep- resenting the states and with arrows between the states representing the possible transitions. A state is said an absorbing state if further transitions cannot occur from that state. A transient state is a state that is not absorbing, meaning that at least one transition is possible from that state.
Survival model A simple multi-state model can be defined for survival data in which individuals are followed-up until the event death (or censoring) occurs. The two-state model is then defined by the transient State 1 (alive) and the absorbing State 2 (death). This model is illustrated in Figure 1.1.
Figure 1.1: A two-state survival model
Unidirectional model Unidirectional models extend the survival model by defining a set of transient states before an absorbing State D (Figure 1.2) (Titman, 2008). For this model, individuals start at a transient state and can move forward to the next state until the absorbing state. Cook and Lawless (2002) use these models for the analysis of repeated events data.
Figure 1.2: An unidirectional model
The illness-death model The illness-death model is defined by a disease-free state (State 1), from which subjects can move into the dead state (State 3), or move into a diseased state (State 2). Once in State 2, subjects can move back to State 1 or move into State 3, see Figure 1.3. 1.2. Special features of multi-state processes 18
Figure 1.3: An illness-death model
Competing risks model In the context of medical research, competing risks models are defined by one transient state (State 1: alive) and a number, k, of absorbing states, State h, for h = 1,...,k corresponding to “death from cause h”. Figure 1.4 illustrates the model for k = 2.
Figure 1.4: A competing risks model
1.2.2 Observational patterns As discussed in Commenges (2002), the observations are often incomplete in the sense that it is not possible to observe the whole population of interest, and it is not possible to observe the process continuously over time. The first problem can be addressed by drawing a representative sample of the population. The second problem leads to what is called censored observations. Commenges (2002) provides a thorough discussion on how to approach different types of censoring in multi-state processes. We next describe right and interval-censoring as these are the focus of this thesis. Consider first that a multi-state process is observed in continuous time from time t0 until time t0 + c. If the state of the process at time t0 + c is an absorbing 1.3. Literature review 19
Dead
Illness
Healthy
0 1 2 3 4 6
Figure 1.5: The trajectory of an illness-death process. Dashed vertical lines represent the follow-up times. Solid horizontal lines represent the state occupancy over time state, then the process is observed completely. If not, then some transition times are said to be right-censored. For many applications, a multi-state process is observed at only a finite number of times. In this case, the exact time of transitions are only known to have occurred within a time interval. In this case, the transition times are said to be interval cen- sored. This type of censoring is common in cohort studies, where states are recorded at fixed visit times. If a dead state is defined, the time of death is usually known exactly. In this case, we have a mix of discrete and continuous time observations. This thesis focus on methods for these pattern of observations. Figure 1.5 illustrates the trajectory of an illness-death process over time. The vertical dashed lines repre- sent the follow-up times at which state occupancy is recorded. The horizontal lines represent the state occupancy over time. The process moves back and forth between the living states, subsequently moves from the illness state into the dead state.
1.3 Literature review For the analysis of time-homogeneous multi-state processes, Kalbfleisch and Law- less (1985) developed a general procedure for obtaining maximum likelihood esti- mates of the model parameters. The method is based on a scoring algorithm that uses an approximation to the second order derivatives of the log-likelihood func- tion. Kay (1986) proposes a similar method that calculates the first and second or- 1.3. Literature review 20 der derivatives of some particular multi-state processes. Kay also provides methods for hypothesis testing and model diagnostics. Satten and Longini (1996) developed a method for fitting these models when states are subject to measurement errors. Jackson (2011) presented the msm package in R for analysing time homogeneous multi-state processes for interval-censored data.
For time-dependent multi-state processes, Kalbfleisch and Lawless (1985) sug- gested using two methods. The first method uses piecewise-constant hazards to approach time-dependent processes. In this case, the hazards are constant within specified intervals, but can change for different intervals, see also Kay (1986). For many applications, it is not reasonable to assume that the underlying hazards are piecewise-constant functions. Also, the total number of parameters in the model can become large with the number of transition specific-hazards. The second method fo- cuses on a special case in which the non-homogeneity is due to a time-varying mul- tiplicative change in the matrix of transition intensities. In this case, it is possible to find a transformation function so that the resulting process is time homogeneous. Hubbard et al. (2008) investigated this method and proposed a flexible approach for estimating the transformation functions. Titman (2011) uses a numerical approx- imation to calculate the transition probabilities at the level of the corresponding differential equations. A disadvantage of this method is that estimation can become computationally expensive if many time-varying covariates are included. Jackson (2011) suggests using a piecewise-constant approximation for parametric models, which is employed to obtain the transition probabilities for the likelihood function. Van den Hout (2017) provides a general method for estimating multi-state models for interval-censored data. Focus is given to time-dependent parametric models, such as, Gompertz and Weibull distributions. Given a piecewise-constant approx- imation to the parametric hazards, a scoring algorithm is used for estimating the models. Further literature on time-dependent multi-state models includes Omar et al. (1995), Van den Hout and Matthews (2008a) and Van den Hout and Matthews (2008b).
Even though multi-state models with splines for known transition times are out 1.3. Literature review 21 of the scope of this research, a literature review of these models can be important for the development of this work. Fahrmeir and Klinger (1998) approached these models in a counting process framework. Transition intensities are specified in terms of spline functions. Estimation is based on a backfitting scheme with internal smoothing parameter selection by AIC optimisation. The application is to human sleep data with a discrete set of sleep states. Measurements are made every 30 seconds for a group of 30 patients. A Bayesian approach to this model and data is given in Kneib and Hennerfeind (2008). In this case, the inferential procedures developed allow for simultaneous estimation of model and smoothing parameters. Sennhenn-Reulen and Kneib (2016) developed an estimation procedure based on a structured lasso penalisation for multi-state models. The aim of their research is to identify covariate effect coefficient equal to zero. Baseline transition intensities are specified with piecewise-constant models, or unspecified and equal across all transitions. The R package Flexsurv is mainly designed for modelling time-to- event data (Jackson, 2016). It can fit any parametric time-to-event distribution and the spline models of (Royston and Parmar, 2002). This package can also be used for fitting some multi-state model by writing data in a survival format. Models can be specified with a rage of parametric and nonparametric shapes.
In the context of multi-state models with splines for interval-censored data, a penalised approach for an unidirectional three-state model is used in Joly and Com- menges (1999). Estimation is performed with an algorithm which uses derivatives of the penalised log-likelihood. The smoothing parameters are selected using a grid search with cross-validation. Joly et al. (2002) use the same approach for an illness- death without recovery model. Joly et al. (2009) further extend their method for a five-state model without recovery. These methods require explicit expressions for the transition probabilities. Calculating those formulae can be intractable for more complex models, such as, models with more than four states and backward transi- tions. In addition, their methods can be computationally intractable for models with multiple smoothing parameters, as the grid search method requires models to be fitted for all possible combination of smoothing parameters values given in the grid. 1.4. Survival analysis 22
The method proposed in Titman (2011) allows for nonparametric hazard specifica- tions with B-splines; however, the log-likelihood is maximised without penalisation. If a large set of B-splines basis is used to model a transition-specific hazard, models can become unidentifiable.
Efficient smoothing parameter selection methods are important for practical multi-state modelling with splines. We next discuss some relevant literature for automatic smoothing parameter estimation that will be useful for defining an au- tomatic and efficient algorithm to estimate multi-state models with splines. Gu and Wahba (1991) developed an efficient Newton method for multiple smoothing parameter selection. Wood (2000) extends their method for multiple smoothing parameter selection in generalized ridge regression problems. Wood (2004) pro- vides a stable and efficient method that improves the method developed in Wood (2000). These methods are widely used in generalised additive models. However, they can be further extended for more general settings. Radice et al. (2016) pro- poses a method for multiple smoothing parameter estimation for copula regression. Their method is general, but lead to numerical instability for some applications. Marra et al. (2017) developed a more general and stable method that can be esti- mate multiple smoothing parameters using only the gradient and Hessian (or Fisher information matrix). This method is general and can be used in a variety of settings.
1.4 Survival analysis
Some concepts of time-to-event analysis can be used and extended to multi-state modelling. In this section, we describe important features of these concepts that will be useful for the development of this thesis. Time-to-event analysis aims to describe the analysis of data in the form of a well-defined time origin until the occurrence of some particular event. In medical research, the time origin will often correspond to beginning of a treatment and the event of interest can be relief of pain, the recurrence of symptoms, or death. If the event is death of a patient, then time-to- event data are commonly called survival data. In this thesis, we focus on examples that include a dead state, and we shall refer to time-to-event data as survival data. 1.4. Survival analysis 23
This section is partly based on Collett (2015). Let T be the random variable associated with the survival time of patients, which can take any non-negative values. The distribution function of T is given by
Z t F(t) = P(T < t) = f (u)du, (1.1) 0 where f (t) represents the probability density function of T. Equation (1.1) repre- sents the probability that survival is less than some value t. The survival function, S(t), is defined to be the probability that the survival time is greater than or equal to t,
S(t) = P(T ≥ t) = 1 − F(t), (1.2) where F(t) is as defined in (1.1). The hazard function is widely used to express the risk or hazard of an event occurring at some time t. This function is obtained from the probability that an individual dies at time t, conditional on them having survived to that time. Formally, the hazard function, h(t), is given by
P(t ≤ T ≤ t + δt|T ≥ t) h(t) = lim . (1.3) δt→0 δt Equations (1.1), (1.2) and (1.3) provide that the hazard function can be ex- pressed as f (t) h(t) = . (1.4) S(t) The cumulative hazard function is used to express the cumulative risk of an event occuring by time t. Formally, the cumulative hazard function, H(t), is given by Z t H(t) = h(u)du. (1.5) 0 d From Equation (1.4), it follows that h(t) = − dt {logS(t)}, which implies that S(t) = exp{−H(t)}. Therefore, the survival function can also be expressed in terms of the cumulative hazard function. 1.4. Survival analysis 24
Empirical non-parametric methods are useful to summarise the survival times for individuals in a particular group. Suppose that for a single sample of survival times none of the observations is censored. The empirical survival function is given by Number of individuals with survival times ≥ t Sb(t) = . (1.6) Number of individuals in the data The empirical survival function is equal to unity for values of t before the first death time, and zero after the final death time. The estimated survivor function, Sb(t), is assumed to be constant between two adjacent death times, and so a plot of Sb(t) against t is a step-function. The function decreases immediately after each observed survival time.
Suppose that there are n individuals with survival times at t1,...,tn. These ob- servations can be right-censored and more than one death can occur at a given time.
Suppose that there are r deaths among the individuals, where r ≤ n. Let t(1),...,t(r) be the r ordered death times, n j the number of individuals who are alive just before t( j), including those who are about to die, and d j the number of individuals who die at t( j). The Kaplan-Meier estimate of the survival function is given by
k n j − d j Sb(t) = ∏ (1.7) j=1 n j
for t(k) ≤ t < t(k+1), k = 1,...,r, with Sb(t) = 1 for t < t(1), and where t(r+1) is taken to be ∞. If the largest observation is a censored survival time, t∗, then Sb(t) is undefined ∗ for t > t . If the largest observed survival time, t(r), is an uncensored observation, nr = dr, and so Sb(t) = 0 for t ≥ t(r). A plot of the Kaplan-Meier estimate of the survival function is a step-function, in which the estimated survival probabilities are constant between adjacent death times and decrease at each death time. This thesis uses the Kaplan-Meier estimate of the survival function for model validation. 1.4. Survival analysis 25 1.4.1 The exponential distribution
The probability density function of a random variable T that has an exponential distribution with parameter λ > 0 is given by
f (t) = λ exp−λt, (1.8) for 0 ≤ t < ∞. The survival function is then given by
Z t S(t) = 1 − f (t)dt 0 S(t) = exp(−λt). (1.9)
d Using the relation h(t) = − dt log(S(t)), the hazard function is given by
h(t) = λ, (1.10) for t ≥ 0. Therefore, if the hazard is constant over time. For many applications, it may be more reasonable to consider time-dependent hazard functions.
1.4.2 The Gompertz distribution
The probability density function of a random variable T that has a Gompertz distri- bution with parameters λ > 0 and θ ∈ R is given by
λ f (t) = λeθt exp (1 − eθt) , (1.11) θ for 0 ≤ t < ∞. The survival function of the Gompertz distribution is given by
λ S(t) = exp (1 − eθt) . (1.12) θ
The hazard function of the Gompertz is given by
h(t) = λeθt, (1.13) 1.4. Survival analysis 26
λ =
1.0 0.01 θ = 0.45 0.8 0.6
λ = 0.1 θ = 0.15 0.4 Hazard function 0.2
λ = 1
θ = − 0.8 0.0 0 2 4 6 8 10 Time
Figure 1.6: Gompertz hazard functions for scale and shape parameters (λ,θ) = (1,−0.8),(0.1,0.15) and (0.01,0.45)
for t ≥ 0. The parameters θ and λ are the shape and scale parameters, respec- tively. For the case in which θ = 0, the hazard function is constant and equal to λ, and the survival times then have an exponential distribution. Positive values of θ lead to increasing hazards, while negative values of θ lead to decreasing hazards. As an illustration, the hazard function for Gompertz distributions with parameters (λ,θ) = (1,−0.8),(0.1,0.15) and (0.01,0.45) are shown in Figure 1.6.
1.4.3 The Weibull distribution
The probability density function of a random variable T that has a Weibull distribu- tion, with parameters λ > 0 and γ > 0 is given by
f (t) = λγγ−1 exp(−λtγ ), (1.14) for 0 ≤ t < ∞. The survival function of the Weibull distribution is given by
S(t) = exp(−λtγ ). (1.15) 1.4. Survival analysis 27
γ = 5 3
γ = 2 4 3 2 Hazard function
γ = 1 1
γ = 0.5 0
0.0 0.5 1.0 1.5 2.0 Time
Figure 1.7: Weibull hazard function for scale parameter λ = 1 and the shape parameters γ = 0.5, 1, 2 and 3
The corresponding hazard function is given by
h(t) = λγtγ−1 (1.16) for t ≥ 0. For the case in which γ = 1, the hazard function is constant over time, and the survival times have an exponential distribution. The shape of the hazard function is controlled by the parameter γ, which is then defined as the shape parameter, while λ is a scale parameter. A plot of the Weibull hazard function for distributions with λ = 1 and γ = 0.5,1,2 and 3 is shown in Figure 1.7.
1.4.4 The lognormal distribution
A random variable T has a lognormal distribution, with parameters µ and σ, if the random variable logT has a normal distribution with mean µ and variance σ 2. The probability density function of T is given by
1 f (t) = √ t−1 exp−(logt − µ)2/2σ 2 , (1.17) σ 2π 1.4. Survival analysis 28 0.5 0.4
σ = 0.3 0.75
0.2 σ = 1 Hazard function
0.1 σ = 1.75 0.0 0 2 4 6 8 10 Time
Figure 1.8: Lognormal hazard functions for µ = 1 and σ 2 = 0.75, 1, and 1.75
for 0 ≤ t < ∞ and σ. The survival function of the lognormal distribution is
logt − µ S(t) = 1 − Φ , (1.18) σ where Φ(·) is the standard normal distribution function, given by
1 Z z Φ(z) = √ exp(−u2/2)du. (1.19) 2π −∞
The hazard function can be found from the relation h(t) = f (t)/S(t). This function is zero at t = 0, increases to a maximum and then decreases to zero as t tends to infinity. The hazard and survival functions are given in terms of integrals limits, which makes the use of this model difficult for applications. However, the flexibility of this model is attractive and they will be useful in Chapter 4 to simulate transition times from a distribution function that leads to a non-linear shape of haz- ard functions. For illustration, the hazard function for lognormal distributions with parameters µ = 1 and σ 2 = 0.75,1, and 1.75 are shown in Figure 1.6. 1.5. Simulation of multi-state processes 29 1.5 Simulation of multi-state processes
In this section, we describe how to simulate multi-state processes using the inversion method (Bender et al., 2005). For a fixed transition from state r to state s, r 6= s, the random variable T represents the time to event . If the cumulative distribution function for leaving state r to state s is given by F(t) = 1 − S(t), then U = F(T) has a uniform distribution on the interval [0,1], denoted by U ∼ U[0,1]. Moreover, if a random variable has distribution U ∼ U[0,1] then 1 −U ∼ U[0,1] as well. It means that S(T) ∼ U[0,1]. Therefore, if the function H(t) can be inverted, the time to event from state r to state s can expressed as
−1 > T = H [−log(U)exp(β rsz)] (1.20) where U ∼ U[0,1]. Suppose there are m − 1 competing intensities for leaving state r. The time to next event can be obtained by taking T = min(T1,...,Tm−1), where Ti is obtained applying (1.20). For the calculation of subsequent event times, we use that the survival function conditional on entering the state r at time t0 > 0 is given by
S(t|t0) = P(T > t|T > t0) S(t) = . (1.21) S(t0)
1.6 Examples
This section further discusses special features of multi-state processes through ex- amples. Data from these examples will then be used to illustrate the statistical methods presented in subsequent chapters.
1.6.1 Cardiac allograft vasculopathy data Cardiac allograft vasculopathy (CAV) is a narrowing of the arterial walls and one of the main cause of death in heart transplantation patients. The data are a series of approximately yearly angiographic examinations of heart transplant recipients. 1.6. Examples 30
Figure 1.9: Four-state model for disease progression after transplant for the CAV data
The data come from Papworth Hospital U.K. The data contain 3217 rows which are grouped by 622 patients and ordered by years after transplant. For the CAV data, baseline means years since the beginning of the study.
Of interest is the onset of CAV after transplantation. The state at each follow- up time is a grade of CAV which can be normal, mild, moderate or severe. Dead is the absorbing state and time of death is known within one day. Three living states are defined by CAV severity: State 1, 2, and 3, for Normal, Mild/Moderate, and Severe, respectively. An additional State 4 is defined as the Dead state, see Figure 1.9.
The interval-censored multi-state process is summarised by the frequencies in Table 1.1. The code 99 represents right-censored observations. As it can be seen, some backward transitions are recorded. However, the narrowing process is assumed to be biologically irreversible. There are essentially two ways to approach this problem. First, it is possible to fit multi-state models with misclassification of states (Jackson et al., 2003). Such an approach is out of the scope of the current research. Second, the data can be redefined for the history of CAV. In this case, the states are classified as Healthy (1) if the patient has not developed the disease, Mild/Moderate (2) if the patient has developed mild/moderate or Severe (3) if the patient has developed severe CAV and Dead (4) if the patient has died. Then, the data are consistent with the model in Figure 1.9.
The CAV data include several covariates which may be used to analyse pa- tient’s progression across states. These include recipient age at examination, donor age, the sexes of recipient and donor, and primary diagnosis of ischaemic heart dis- ease (IHD). Sharples et al. (2003) showed in an analyse of these data that IHD and 1.6. Examples 31
Table 1.1: State table for the CAV data: number of times each pair of states was observed at successive observation times. The three living states are defined by CAVseverity
To From 1 2 3 4 99 1 1367 204 44 148 276 2 46 134 54 48 69 3 4 13 107 55 26
donor age are major risk factors of disease onset.
1.6.2 English longitudinal study of ageing data
The English Longitudinal Study of Ageing (ELSA) baseline (1998-2001) is a rep- resentative sample of the English population aged 50 and older. ELSA contains information on health, economic position, and quality of life. Data from ELSA can be obtained via the Economic and Social Data Service (www.esds.ac.uk). There are 11932 individuals in the ELSA baseline. The following description of these data can also be found in Van den Hout (2017). Of interest is the change of cognitive function in older population. For the analysis in this thesis, a random sample of size N = 1000 is taken from ELSA. Of these 1000 individuals, 205 died during the follow-up with age at death available. Because ELSA data are publicly available, measures have been taken by the data provider to prevent identification of the individuals. One of those measures is the censoring of ages above 90 years. In the sampling of the subset of N = 1000, indi- viduals who were 90 years or older at baseline were ignored, where baseline means entry in the study. The sample has 544 women and 456 men. Highest educational qualification is dichotomised according to years of formal education: fewer than ten versus ten or more. There are 558 individuals with fewer than ten years of education. The focus is on the number of words remembered in a delayed recall from a list of ten. The score on this test is equal to the number of words remembered, i.e., score ∈ {0,1,2,··· ,10}. It is of interest to explore the effect of age and gender on cognitive change over time when controlling for education. Four living states are 1.6. Examples 32
Figure 1.10: Five-state model for longitudinal data in ELSA on number of words remem- bered in a recall defined by the number of words an individual can remember: State 1, 2, 3, and 4, for the number of words {7,8,9,10}, {6,5}{4,3,2}, and {1,0}, respectively. An additional State 5 is defined as the dead state, see Figure 1.10.
1.6.3 Origins of variance in the oldest-old data
The origins of variance in the oldest-old (OCTO) study included dizygotic (DZ) and monozygotic (MZ) twin pairs aged 80 years of age and older. The sample was selected from older adults in the population-based Swedish Twin Registry. Older adults participating in the study were tested in their residence by nurses. Informed consent was obtained from each participant. Five cycles of longitudinal data were collected at two year intervals. The initial sample consisted of 702 individuals (351 same-sex pairs). The final analysis included 694 participants. Of interest is the effects of age on cognitive function. The mini-mental state examination (MMSE) is used to assess global cognitive functioning of participants at each time point (Folstein et al., 1975). The state at each follow-up time is a clas- sification of respondents in terms of MMSE, which can be normal MMSE (27 ≤ MMSE ≤ 30), mild MMSE impairment (23≤ MMSE ≤ 26) and severe MMSE im- pairment (MMSE ≤ 22). The MMSE is not used to determine clinical diagnosis but as suggestive of mild cognitive impairment and dementia. These MMSE severity are used to define three living states: State 1, 2, and 3, for No cognitive impair- ment, Mild cognitive impairment, and Severe cognitive impairment, respectively. An additional State 3 is defined as the Dead state, see Figure 1.11. The interval-censored multi-state process is summarised by the frequencies in Table 1.2. Even though there are some backwards transitions from State 3, severe 1.7. Discussion 33
State 1 State 2 State 3 No cognitive Mild cognitive Severe cognitive impairment impairment impairment
Dead
Figure 1.11: Four-state model for longitudinal data in OCTO cognitive impairment is assumed to be irreversible. Here, the OCTO data are rede- fined for the history of severe cognitive impairment. In this case, if a participant has moved into State 3, they are only allowed to stay in State 3 or move into the dead state, see Figure 1.11. The data include several covariates which may be used to analyse patient’s decline of cognitive function and death. These include education, sex, and socioe- conomic status (SES). Robitaille et al. (2018) investigate the effect of age and these covariates on the transitions between cognitive states and life.
1.7 Discussion To conclude, estimation of time-dependent multi-state models for interval- censoring can be challenging. Most parametric models are specified with restrictive functional forms, such as, Gompertz and Weibull. Flexible parametric distributions can also be used, but obtaining the derivatives of the log-likelihood function for those models can be intractable. For the examples in Section 1.6, there is no clear information on good parametric shapes. Nonparametric hazards specification with splines provides a flexible approach for modelling time-dependent multi-state pro- cess. The methods developed so far focus on particular cases and are not feasible for many applications. 1.7. Discussion 34
Table 1.2: State table for the OCTO data: number of times each pair of states was observed at successive observation times. The three living states are defined by grades of cognitive function
To From 1 2 3 4 1 715 133 82 233 2 67 106 116 110 3 8 21 274 319 Chapter 2
Parametric multi-state models
This chapter introduces parametric multi-state models for interval-censored data. The general formulation builds up within a Markov processes framework, and mod- els are specified through hazard functions. Maximum likelihood is used for estima- tion. Model selection and model validation are briefly discussed as they will be useful for comparing and validating models throughout this thesis. The method is illustrated with the CAV and OCTO data sets. These applications show the versatil- ity of multi-state modelling for longitudinal data, but also highlight some limitations of parametric model specifications. The aim of this chapter is to present the general theory of multi-state models. All methods presented are standard and can be found in textbooks (Cox, 2017; Van den Hout, 2017). The applications are original, as the CAV and OCTO data sets have not been analysed with the formatting proposed in this chapter.
2.1 Continuous-time Markov processes
A stochastic process is a collection of random variables {Y(t)|t ∈ U}, where U is an index set that can take discrete or continuous values. The state space, S , is the set of possible values of Y(t), which can also be discrete or continuous. The focus here is on the case in which U is continuous and S is discrete, where the former is a set of model states and Y(t) denotes the state of S occupied by the system at time t. The content presented in this section follows the presentation in Cox (2017).
Given the time points t1,...tn, it is of interest to examine the joint distribu- 2.1. Continuous-time Markov processes 36 tion of Y1,...,Yn, where Yj = Y(t j) for j = 1,...,n. Commonly, {Y(t)|t ∈ U} is assumed to be a Markov process, which means that the future state of the process only depends on the current state. Formally, a continuous-time Markov process on the discrete states S is defined through a set of probabilities, prs(t), such that,