Regression Modeling of Time to Event Data Using the Ornstein-Uhlenbeck Process

Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By

Roger Alan Erich, M.S.

Graduate Program in Biostatistics

The Ohio State University

2012

Dissertation Committee:

Professor Michael L. Pennell, Advisor

Professor Thomas J. Santner

Professor Dennis K. Pearl

© Copyright by

Roger Alan Erich

2012

Abstract

In this research, we develop innovative regression models for survival analysis that model time to event data using a latent health process which stabilizes around an equilibrium point, a characteristic often observed in biological systems. Regression modeling in survival analysis is typically accomplished using Cox regression, which requires the assumption of proportional hazards. An alternative model, which does not require proportional hazards, is the First Hitting Time (FHT) model, where a subject’s health is modeled using a latent stochastic process. In this modeling framework, an event occurs once the process hits a predetermined boundary. The parameters of the process are related to covariates through generalized link functions, thereby providing regression coefficients with clinically meaningful interpretations. In this dissertation, we present an FHT model based on the Ornstein-Uhlenbeck (OU) process, a modified Wiener process which drifts from the starting value of the process toward a state of equilibrium or homeostasis present in many biological applications.

We extend previous OU process models to allow the process to change according to covariate values. We also discuss extensions of our methodology to include random effects accounting for unmeasured covariates. In addition, we present a mixture model with a cure rate using the OU process to model the latent health status of those subjects susceptible to experiencing the event under study. We apply these methods to survival data collected on melanoma patients and to another survival data set pertaining to carcinoma of the oropharynx.

This document is dedicated to my family and to those brave men and women of the Armed Forces that gave their lives to protect our country’s freedom during the completion of this PhD.

Acknowledgments

Without the support, patience and guidance of the following people, this study would not have been completed. It is to them that I owe my deepest gratitude.

• Dr. Michael Pennell, my advisor, who guided me through the entire process of this research. Without his expertise, this would not have been possible.

• Dove Erich, my dear wife, without whom this effort would have been worth nothing. Your love, support, and sacrifice helped tremendously during this trying time, and I will be forever grateful.

• Ashley and Ellie Erich, my girls, who have sacrificed so much time with me.

• Arnold and Doris Erich, my parents, who have always believed in me.

• My dissertation committee that provided me with valuable guidance and feedback to make this research stronger and more viable.

• All of the faculty and staff at The Air Force Institute of Technology who supported me through this longer than anticipated PhD program.

• Dr. Bill Baker who provided valuable mathematical insight to help me get “unstuck” in my research which allowed me to complete this degree.

• Last, but not least, I am ever grateful to God who makes all things possible.

Vita

June 1992 ...... St Marys Area High School

2003 ...... B.S. Mathematics, Pennsylvania State University

2005 ...... M.S. Applied Mathematics, A.F. Inst. of Technology

2007 to 2012 ...... Graduate Student, Department of Biostatistics, The Ohio State University

Fields of Study

Major Field: Biostatistics

Table of Contents

Abstract

Dedication

Acknowledgments

List of Figures

List of Tables

1. Introduction

2. Threshold Regression: A First Hitting Time Regression Model

    2.1 Gamma Process and Inverse Gamma First Hitting Time
    2.2 Wiener Process Models
        2.2.1 The Wiener Process
        2.2.2 The Inverse Gaussian Distribution
        2.2.3 The Inverse Gaussian Distribution and Survival Analysis
        2.2.4 Previous Work using the FHT model with an Underlying Wiener Stochastic Process
        2.2.5 Strengths and Limitations of Using the Wiener Process Model
    2.3 Survival Models Based on the Ornstein-Uhlenbeck Process
        2.3.1 First Hitting Time for the Ornstein-Uhlenbeck Process
        2.3.2 The Shape of the Hazard Function
        2.3.3 Modeling the Hazard as the Square of an Ornstein-Uhlenbeck Process
        2.3.4 The Ornstein-Uhlenbeck Process in Biostatistical Applications
    2.4 Operational Time

3. The Ornstein-Uhlenbeck Model With Initial State Dependent On Covariates

    3.1 Proposed OU Threshold Regression Model
    3.2 Simulation Study
    3.3 Application of OU-TR Model to Overall Survival of Patients with Carcinoma of the Oropharynx
    3.4 Discussion

4. The Ornstein-Uhlenbeck Mixture Model

    4.1 Proposed OU Threshold Regression Mixture Model
    4.2 Simulation Study
    4.3 Application of OU-TR Mixture Model to Time to Relapse Data from Patients with Melanoma
    4.4 Discussion

5. The Ornstein-Uhlenbeck Random Effects Model for Survival Data with Unmeasured Covariates

    5.1 OU-TR Random Effects Model
        5.1.1 Proposed Model
        5.1.2 Simulation Study
        5.1.3 Application of the OU-TR Random Effects Model to Overall Survival of Patients with Carcinoma of the Oropharynx
    5.2 OU-TR Random Effects Mixture Model
        5.2.1 Proposed Model
        5.2.2 Simulation Study
        5.2.3 Application of OU-TR Random Effects Mixture Model to Time to Relapse Data from Patients with Melanoma
    5.3 Discussion

6. Conclusion

Bibliography

Appendices

A. Standard Error Derivations for OU-TR Model

B. Standard Error Derivations for OU-TR Mixture Model

C. Random Effects Density Function and Survival Function Derivations

D. Simplification of the Likelihood Function Under the OU-TR Random Effects Model

E. Standard Error Derivations for OU-TR Random Effects Model

F. Standard Error Derivations for OU-TR Random Effects Mixture Model

G. Newly Developed Matlab Functions for Fitting OU-TR Models to Data

    G.1 OU-TR Model
    G.2 OU-TR Mixture Model
    G.3 OU-TR Random Effects Model
    G.4 OU-TR Random Effects Mixture Model

List of Figures

Figure

2.1 Inverse Gaussian Densities with τ = 1 for Several Values of λ

2.2 Inverse Gaussian Densities with λ = 1 for Several Values of τ

2.3 Sample Paths of OU Process and Wiener Process

2.4 Hazard function of time to absorption (parameter values: a = 0, b = 1, σ² = 2)

3.1 Goodness of Fit of Best BIC Carcinoma of the Oropharynx Model

3.2 Goodness of Fit of Second Best BIC Carcinoma of the Oropharynx Model for Subjects with a Disability

3.3 Goodness of Fit of Second Best BIC Carcinoma of the Oropharynx Model for Subjects without a Disability

3.4 Comparing Goodness of Fit of Model with Interaction Between Disability Status and Tumor Size (Int) with the best BIC Main Effects Model (No Int)

4.1 Goodness of Fit of Best and Second Best Melanoma Models (in terms of BIC) for Nodal Categories 0 and 1

4.2 Goodness of Fit of Best and Second Best Melanoma Models (in terms of BIC) for Nodal Categories 2 and 3

5.1 Estimated Survival Curves for OU-TR and OU-TR Random Effects Models When Psi = 0.25 (Scenario 1 from Table 5.6)

5.2 Estimated Survival Curves for OU-TR and OU-TR Random Effects Models When Psi = 0.5 (Scenario 2 from Table 5.6)

5.3 Estimated Survival Curves for OU-TR and OU-TR Random Effects Models When Psi = 1 (Scenario 3 from Table 5.6)

5.4 Estimated Survival Curves for OU-TR and OU-TR Random Effects Models When Psi = 2 (Scenario 4 from Table 5.6)

5.5 Goodness of Fit of Best BIC Carcinoma of the Oropharynx Model (OU-TR and OU-TR Random Effects) for Subjects with Disability

5.6 Goodness of Fit of Best BIC Carcinoma of the Oropharynx Model (OU-TR and OU-TR Random Effects) for Subjects with No Disability

5.7 Comparison of Survival Estimates When Bias is Present in $\hat{\psi}$

5.8 Estimated Survival Curves for OU-TR Mixture Model and OU-TR Random Effects Mixture Model When Psi = 0.25 (Scenario 1 from Table 5.13)

5.9 Estimated Survival Curves for OU-TR Mixture Model and OU-TR Random Effects Mixture Model When Psi = 0.5 (Scenario 2 from Table 5.13)

5.10 Estimated Survival Curves for OU-TR Mixture Model and OU-TR Random Effects Mixture Model When Psi = 1 (Scenario 3 from Table 5.13)

5.11 Estimated Survival Curves for OU-TR Mixture Model and OU-TR Random Effects Mixture Model When Psi = 2 (Scenario 4 from Table 5.13)

5.12 Goodness of Fit of Best BIC Melanoma Model (Mixture Model and Random Effects Mixture Model) for Nodal Categories 0 and 1

5.13 Goodness of Fit of Best BIC Melanoma Model (Mixture Model and Random Effects Mixture Model) for Nodal Categories 2 and 3

List of Tables

Table

3.1 Results of Simulation Study Based on 1000 Data Sets of Size 200

3.2 Summary of Variables Considered in Modeling the Oropharynx Data

3.3 Final Model for Death from Carcinoma of the Oropharynx

4.1 Results of Mixture Model Simulation Study Based on 1000 Data Sets of Size 300

4.2 Summary Statistics of Variables Considered in Modeling the Melanoma Data

4.3 Stage 1 of OU-TR Mixture Model Building for Melanoma Data With Relapse as Event

4.4 Stage 2 of OU-TR Mixture Model Building for Melanoma Data With Relapse as Event

4.5 Final Model for Relapse from Melanoma

5.1 Simulation Results for OU-TR Random Effects Model Based on 1000 Data Sets of Size 200 with Psi = 0.25

5.2 Simulation Results for OU-TR Random Effects Model Based on 1000 Data Sets of Size 200 with Psi = 0.5

5.3 Simulation Results for OU-TR Random Effects Model Based on 1000 Data Sets of Size 200 with Psi = 1

5.4 Simulation Results for OU-TR Random Effects Model Based on 1000 Data Sets of Size 200 with Psi = 2

5.5 Simulation Results for OU-TR RE Model Based on 1000 Data Sets of Size 300 with Psi = 2

5.6 Simulation Results Examining Effect of Ignoring Random Effect (RE) in the OU-TR Model. Results are Based on 1000 Data Sets of Size 200

5.7 OU-TR and OU-TR Random Effects Models for Carcinoma of the Oropharynx Data

5.8 Simulation Results Based on 1000 Data Sets of Size 300 with Psi = 0.25

5.9 Simulation Results Based on 1000 Data Sets of Size 300 with Psi = 0.5

5.10 Simulation Results Based on 1000 Data Sets of Size 300 with Psi = 1

5.11 Simulation Results Based on 1000 Data Sets of Size 300 with Psi = 2

5.12 Simulation Results Based on 1000 Data Sets of Size 1000 with Varying True Values of Psi

5.13 Simulation Results Based on 1000 Data Sets of Size 300 for OU-TR Random Effects Mixture Model Comparison to OU-TR Mixture Model without Random Effects

5.14 Final OU-TR Mixture Model and OU-TR Random Effects Mixture Model for the Melanoma Data

Chapter 1: Introduction

In biostatistical research, the goal is often to identify important factors affecting a subject’s survival time or time to development of disease. Numerous models are used to identify these factors. One popular choice is the Cox proportional hazards model (Cox, 1972). This model has several attractive features: no distribution needs to be assumed for the baseline hazard function, the regression parameters are interpreted in terms of relative risk, and estimation is based on the partial likelihood function. However, the model can provide erroneous results if the proportional hazards assumption is violated because of time varying covariate effects. For example, the effectiveness of a drug treatment may increase or decrease over time. A doctor may prescribe an antibiotic which loses its ability to fight infection over time. Thus, another treatment may be prescribed that builds up in the system, eliminating the infection.

If we look at a patient’s health post-surgery, there is usually an initial increase in mortality risk immediately following surgery before a beneficial health effect is observed. Unexplained heterogeneity in a subject’s risk, or frailty, may also result in non proportional hazards (Hougaard, 1991; Keiding, 1997). To address this, a shared frailty model may be fit to the data (Vaupel et al., 1979). However, within each cluster of subjects with the shared frailty value, the proportional hazards assumption must still be met in order to draw sound inferences. Other methods are available to remedy problems associated with non proportional hazards under the Cox model. They include using time dependent covariate effects in the model or simply stratifying on the covariate that is introducing the non proportional hazards (Klein and Moeschberger, 2003).

Another useful, though less frequently used, approach for identifying important prognostic variables of survival is a First Hitting Time (FHT) model. This approach does not require the proportional hazards assumption. In an FHT model, a stochastic process represents patient health, with failure occurring once the process hits a boundary (Lee and Whitmore, 2006). For example, we may model a subject’s health status using a Wiener process, resulting in an FHT with an inverse Gaussian distribution (Chhikara and Folks, 1989).

Since death or disease is the outcome of a series of genetic and physiological events where a subject’s health deteriorates until it reaches a boundary, the FHT model is theoretically an attractive choice (Pennell et al., 2010). Take, for instance, a subject who has been diagnosed with lung cancer. This cancer has stages 0 through 5 with higher stages indicating more extensive disease. If left untreated, subjects will transition from one stage to the next, until they ultimately die from the disease. In this context, an FHT model would be well suited to analyze time to event data and highlight important variables that have an impact on survival.

In threshold regression, covariate information is integrated into the parameters of FHT models via generalized link functions (Lee and Whitmore, 2006). For example, the initial state and variance of the Wiener process have been related to covariates using a log-link, and an identity link has been used for the drift parameter (cf. Lee et al., 2000, 2004; Aalen and Gjessing, 2001; Aalen et al., 2008). The proportional hazards assumption is avoided in these models given that the effects on the hazard vary with time.

In a 2004 paper, Aalen and Gjessing discuss survival models based on the Ornstein-Uhlenbeck (OU) process. The OU process is a modification of a Wiener process to include drift toward an equilibrium state. Many biological processes have the property, termed homeostasis, of diffusing back and forth while simultaneously tending to stabilize around a certain point. Examples of homeostatic biological processes include body temperature regulation in warm-blooded animals and blood pH regulation in the human body (Blessing, 1997). Also, the urinary system in the human body removes salt, excess ions and waste from plasma, which is vital in the homeostatic regulation of the ionic composition, volume and pH of the internal environment (Chiras, 2005).

In this dissertation, we developed new statistical methodologies for analyzing time to event data based on the OU process. We extended previous models based on the OU process to incorporate available covariate information. To demonstrate the usefulness of these methodologies, we applied the methods to real data from biomedical studies and assessed model fit by comparing our estimated OU survival curve to the Kaplan-Meier curve generated from the same data. The first data set consists of 192 subjects from a clinical trial in the treatment of carcinoma of the oropharynx found in Kalbfleisch and Prentice (1980). Patients were randomly assigned to one of two treatments: radiation therapy alone or radiation therapy in conjunction with chemotherapy. An objective of this study was to compare the two treatments with respect to patient survival. Covariates considered in this OU model approach included age, sex, treatment, patient physical condition, tumor site, tumor grade, tumor T-stage and tumor N-stage. Time until death from cancer of the oropharynx was recorded. The second data set comes from a clinical trial of 713 melanoma patients who, after definitive surgery, were randomly assigned to treatment or observation groups. We applied our model to data from 315 subjects who did not receive treatment during the study in order to analyze the natural progression of the disease. Covariate information for each subject, including age, sex, treatment, nodal category, and Breslow score, was available for analysis. Time until relapse and/or death from melanoma was recorded. We also broadened our threshold regression approach to model data using a mixture model which introduces a cure rate. Finally, we extended our approach by including subject specific random effects which account for unexplained heterogeneity in initial health status.

The remainder of this dissertation is organized as follows. First we describe the concept of the threshold regression model and provide some examples of these models. Next, we detail two specific types of threshold regression models: the Wiener process model and the Ornstein-Uhlenbeck process model. Then, we perform a simulation study using the OU process with covariates incorporated into the initial health status (called the OU-TR model). Following this section, we apply the OU-TR model to the carcinoma of the oropharynx clinical trial data and present results. In the next chapter, we describe the use of a mixture model incorporating the OU process to model those subjects susceptible to experiencing the event under study. A simulation study is conducted using this OU-TR mixture model, and the model is applied to the melanoma study data. Following this chapter, an explanation is given for the OU process models in which random effects are incorporated to capture unexplained heterogeneity between subjects. Simulation studies of this random effect modeling method are conducted for both the OU-TR random effects model and the OU-TR random effects mixture model, and these models are applied to the carcinoma of the oropharynx and the melanoma clinical trial data, respectively. Finally, we highlight future work that may enhance the capabilities of the OU process models described in this dissertation.

Chapter 2: Threshold Regression: A First Hitting Time Regression Model

There are two basic components to the FHT model, as described in Lee and Whitmore (2006). The first is a parent stochastic process $\{X_t, t \in \mathcal{T}, X_t = x \in \mathcal{X}\}$ with initial value $X_0 = x_0$, where $\mathcal{T}$ is the time space and $\mathcal{X}$ is the state space of the process. The second component is a boundary set or threshold $\mathcal{B}$, where $\mathcal{B} \subset \mathcal{X}$. The process $X_t$ may have many different properties, such as one or more dimensions, the Markov property, a continuous or discrete state or a monotonic sample path. In the context of medical applications, $X_t$ is often latent and describes the health status of the subject. In epidemiological applications, $X_t$ frequently describes the unobservable status of the disease under investigation.

As described in Lee and Whitmore (2006), if we take $x_0$ to lie outside of $\mathcal{B}$, the first hitting time of $\mathcal{B}$ is the random variable $S = \inf\{t : X_t \in \mathcal{B}\}$. Therefore, the first hitting time is the time when the stochastic process first encounters $\mathcal{B}$. The threshold state is the first state encountered by the process in the boundary set, $X_S \in \mathcal{B}$. Thus, a stopping condition is defined by the boundary set. If the parent process is latent, we cannot observe the FHT event in the state space of the process directly. For example, several factors determine the initial health status of liver transplant patients after transplant. These factors may include type of transplant, age and weight, to name a few. The boundary can be set as death due to complications from the liver transplant. Thus, the process models the decline in health from the initial point to death, which occurs when the process hits the threshold.
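As a rough illustration of these two components, the sketch below simulates a latent path on a discrete time grid and records the first time it enters the boundary set. The Gaussian random-walk parent process, the function name `first_hitting_time`, and all parameter values are illustrative assumptions rather than part of the models discussed here; a finite follow-up window stands in for right censoring.

```python
import math
import random


def first_hitting_time(x0, drift, sigma, boundary=0.0, dt=0.01, t_max=50.0, rng=None):
    """Simulate one latent path and return its first hitting time.

    Assumption: the parent process is approximated by a Gaussian random walk
    with step mean drift*dt and step variance sigma**2 * dt. Returns None when
    the path has not reached the boundary by t_max (a censored subject).
    """
    rng = rng or random.Random()
    x, t = x0, 0.0
    while t < t_max:
        x += drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
        if x <= boundary:  # the path has entered the boundary set
            return t
    return None


if __name__ == "__main__":
    rng = random.Random(2012)
    times = [first_hitting_time(2.0, -0.5, 1.0, rng=rng) for _ in range(500)]
    observed = [s for s in times if s is not None]
    print(f"{len(observed)} events, {len(times) - len(observed)} censored")
    print(f"mean observed hitting time: {sum(observed) / len(observed):.2f}")
```

Because the path is only inspected on the grid, the simulated hitting times are slightly late relative to a continuous-time process; a smaller `dt` reduces this discretization effect at the cost of more computation.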

First hitting time (FHT) models have been applied in an array of fields such as engineering, economics, business and medicine. They have been used to model labour turnover (Whitmore, 1979), the onset time for a cancer induced by occupational exposure (Lee et al., 2004), length of a hospital stay (Eaton and Whitmore, 1977) and strike duration (Linden, 2000). What makes FHT models valuable in applications is the capability to include regression structures. This allows effects of covariates to account for natural dispersion of the data, thereby explaining variability and sharpening inferences. Regression structures also provide scientific insights into potential causal roles of covariates in the underlying processes, boundary sets and time scales (Lee and Whitmore, 2006). As described by Lee and Whitmore (2006), there are several possible choices for the stochastic process $X_t$, including a Poisson process, Wiener process, gamma process and an Ornstein-Uhlenbeck (OU) process.

2.1 Gamma Process and Inverse Gamma First Hitting Time

In the gamma process model, described in Lee and Whitmore (2006), the parent process is $\{X_t, t \ge 0\}$ with initial value $X_0 = x_0 > 0$ and $X_t = x_0 - G_t$, where $\{G_t, t \ge 0\}$ is a gamma process with $G_0 = 0$. The gamma process, described in Kyprianou (2006) in section 1.2.4 (pages 7-8), has increments $G_{t-s} = G_t - G_s$, where $0 \le s < t$. The density of an increment under the gamma($\alpha$, $\beta$) distribution is

$$f(G_{t-s} \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} (G_{t-s})^{\alpha - 1} e^{-\beta G_{t-s}}$$

where $G_{t-s} \in (0, \infty)$, $\alpha > 0$ and $\beta > 0$. Some authors have considered generalizations of the gamma process in which the shape or scale parameter varies monotonically with time. For instance, Kalbfleisch (1978) used a gamma process with shape parameter $\alpha(t) \propto \Lambda_0^*(t)$ as the prior for the cumulative baseline hazard function ($\Lambda_0(t)$) in a Bayesian analysis of the Cox model, where $\Lambda_0^*(t)$ is a parametric cumulative baseline hazard function representing one’s best a priori guess at $\Lambda_0(t)$.

Since a gamma process has monotonic sample paths, the first hitting time of the parent process (to $X_t = 0$) has an inverse gamma distribution. An advantage of this model is that computational routines for the gamma distribution are readily available. In Singpurwalla (1995), the use of this model is motivated by the fact that item wear is nondecreasing and failure of many components or systems of components is more likely due to wear than a traumatic event. In Lawless and Crowder (2004), Singpurwalla’s gamma process model is extended to incorporate covariates to better explain reliability of items with certain characteristics. Random effects are also included to explain heterogeneity between these items not accounted for by the observed covariates. Lawless and Crowder (2004) set up the gamma process model by defining $G_t$ to be gamma($\alpha$, $\eta(t)$), where $\eta(t)$ is a given monotone increasing function of time. Covariates are incorporated in the gamma process by changing $\alpha$ to $\alpha(v)$, where $v$ is a vector of covariate values, which allows rescaling of $G_t$ without changing the shape parameter of its gamma distribution. To incorporate random effects into this model, $\alpha$ is further altered by using $r\alpha(v)$, where $r$ is the random effect. In their paper, an application is presented involving metal fatigue crack growth data with random effects specific to each unit but no covariates.

However, it is suggested that $\alpha(v) = \exp(\beta v')$ be used as the regression function specification when covariates are involved. They define the monotone increasing function of time to be $\eta(t) = \beta_0 (1 - y_0^{\beta_2} \beta_1 \beta_2 t)^{-\beta_2^{-1}}$, where $y_0$ is the initial crack length and the $\beta$’s are parameters that vary randomly across units.

A possible use of the gamma process model in a biological application is given in Lee and Whitmore (2006). Here, they define the process as $X_t = x_0$ with probability $1 - p$ and $X_t = x_0 - Z_t$ with probability $p$, where $p$ is a susceptibility probability and $Z_t$ is a gamma process. For example, a patient can have a benign form of disease with probability $1 - p$ or a malignant form with probability $p$. Thus, the malignant form of the disease advances monotonically en route to death from the disease. In contrast, the gamma process model may not be a good choice in applications where health does not decline consistently over time; for example, diseases with long latency periods.
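The gamma process first hitting time described in this section can be sketched by simulation. The code below assumes a stationary gamma process whose increment over a step of length `dt` has shape `alpha * dt` and rate `beta`; this is one common parameterization and not necessarily the one used by the authors cited above, and the function names and parameter values are likewise illustrative. Because the sample paths are monotone, the event that the hitting time exceeds t is the same as the event that the accumulated process at time t is still below $x_0$.

```python
import random


def gamma_first_hitting_time(x0, alpha, beta, dt=0.05, t_max=100.0, rng=None):
    """First time the parent process X_t = x0 - G_t reaches zero.

    Assumption: G_t is a stationary gamma process whose increment over a step
    of length dt is gamma distributed with shape alpha*dt and rate beta.
    Returns None if the threshold is not reached by t_max.
    """
    rng = rng or random.Random()
    steps = int(round(t_max / dt))
    g = 0.0
    for k in range(1, steps + 1):
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        g += rng.gammavariate(alpha * dt, 1.0 / beta)
        if g >= x0:  # X_t = x0 - G_t has reached the threshold at zero
            return k * dt
    return None


def survival_estimate(t, hitting_times):
    """Empirical P(S > t); paths that never hit count as survivors."""
    alive = sum(1 for s in hitting_times if s is None or s > t)
    return alive / len(hitting_times)


if __name__ == "__main__":
    rng = random.Random(7)
    sims = [gamma_first_hitting_time(3.0, 2.0, 1.0, rng=rng) for _ in range(2000)]
    for t in (0.5, 1.0, 2.0):
        print(f"estimated S({t}) = {survival_estimate(t, sims):.3f}")
```

The estimated survival function can only decrease in t, which reflects the point made above: a monotone process suits wear-type deterioration but not health trajectories that can stall or recover.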

2.2 Wiener Process Models

2.2.1 The Wiener Process

We begin by defining the Wiener process (Prabhu, 1965, Section 3, p. 10), $X_t$, with drift $\mu \in (-\infty, \infty)$ and variance $\sigma^2 > 0$. The process has the following properties for any $t_1 < t_2 < t_3 < t_4$:

1. $X_t$ has independent increments; $X_{t_2} - X_{t_1}$ and $X_{t_4} - X_{t_3}$ are independent.

2. $X_{t_2} - X_{t_1}$ has a normal distribution with mean $\mu(t_2 - t_1)$ and variance $\sigma^2(t_2 - t_1)$, where $t_1 < t_2$.

Under these conditions, the probability density function (pdf) for $X_t = x$ given that the process started at $x_0$ is

$$f(x \mid x_0, t) = \frac{1}{\sigma\sqrt{2\pi t}} \exp\left[-\frac{(x - x_0 - \mu t)^2}{2\sigma^2 t}\right]. \tag{2.1}$$

Further details on the Wiener process can be found in Chhikara and Folks (1989) and Prabhu (1965).

Next we focus on the first passage time $T$ of $X_t$ to $a < x_0$, where $a$ is the predetermined threshold value. The conditions $X_0 = x_0$, $X_t > a$ for $0 < t < T$, and $X_T = a$ are necessary for $T$ to be the first passage time. If $T$ is finite, the density function of $T$ is derived by finding the Laplace transform (Prabhu, 1965). Details of these derivations can be found in Chhikara and Folks (1989). The resulting first hitting time distribution is inverse Gaussian, which is explained in the following section.

2.2.2 The Inverse Gaussian Distribution

The probability density function (pdf) of an inverse Gaussian random variable $X$ is

$$f^*(x \mid \tau, \lambda) = \sqrt{\frac{\lambda}{2\pi}}\, x^{-3/2} \exp\left[-\frac{\lambda (x - \tau)^2}{2\tau^2 x}\right], \quad x > 0 \tag{2.2}$$

where $\tau$ and $\lambda$ are greater than zero. The mean of the distribution is $\tau$ and the scale parameter is $\lambda$. If we define $\phi = \lambda/\tau$, the shape of the distribution depends on $\phi$ only. The inverse Gaussian distribution represents a broad class of distributions, varying from a highly skewed to a symmetrical distribution as $\phi$ goes from 0 to $\infty$ (Chhikara and Folks, 1989). Since $\phi^{-1} = \tau/\lambda$, the inverse Gaussian distribution moves closer to normal when $\phi$ is increased. As shown in Chhikara and Folks (1989), the density curves in Figures 2.1 and 2.2 illustrate the wide range of shapes possible when using the inverse Gaussian distribution.

Figure 2.1: Inverse Gaussian Densities with τ = 1 for Several Values of λ (curves shown for λ = 0.2, 0.5, 1, 5 and 30; horizontal axis: Time)

Figure 2.2: Inverse Gaussian Densities with λ = 1 for Several Values of τ (curves shown for τ = 0.2, 0.5, 1, 5 and 30; horizontal axis: Time)
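The effect of $\phi$ on the shape of the density can be checked numerically. The sketch below codes density (2.2) directly and uses a simple trapezoidal rule to confirm that it integrates to one with mean $\tau$; the function names, grid settings and parameter values are illustrative assumptions. It also prints the skewness $3/\sqrt{\phi}$, a standard property of the inverse Gaussian distribution, which shrinks toward zero as $\phi$ grows.

```python
import math


def inverse_gaussian_pdf(x, tau, lam):
    """Density (2.2) of the inverse Gaussian distribution with mean tau and scale lam."""
    if x <= 0.0:
        return 0.0
    return math.sqrt(lam / (2.0 * math.pi)) * x ** -1.5 * math.exp(
        -lam * (x - tau) ** 2 / (2.0 * tau ** 2 * x)
    )


def trapezoid(func, lower, upper, n=100_000):
    """Composite trapezoidal rule on an evenly spaced grid."""
    h = (upper - lower) / n
    total = 0.5 * (func(lower) + func(upper))
    for i in range(1, n):
        total += func(lower + i * h)
    return total * h


if __name__ == "__main__":
    tau = 1.0
    for lam in (0.2, 0.5, 1.0, 5.0, 30.0):  # the values of lambda listed for Figure 2.1
        area = trapezoid(lambda x: inverse_gaussian_pdf(x, tau, lam), 0.0, 100.0)
        mean = trapezoid(lambda x: x * inverse_gaussian_pdf(x, tau, lam), 0.0, 100.0)
        phi = lam / tau
        print(f"lambda={lam:5.1f}  area={area:.4f}  mean={mean:.4f}  "
              f"skewness={3.0 / math.sqrt(phi):.3f}")
```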

2.2.3 The Inverse Gaussian Distribution and Survival Analysis

When studying subject survival or time to disease occurrence/recurrence, the inverse Gaussian model has some useful properties. The hazard function for the inverse Gaussian tends to initially increase, then decrease, and approach a constant value as the lifetime becomes infinite. This property is frequently found when lifetimes are dominated by early event times (Chhikara and Folks, 1989), such as in studies involving organ and bone marrow transplants (Klein and Moeschberger, 2003). Another important property is that the family of inverse Gaussian distributions is rather broad. This distribution can range from highly skewed to almost normal (Chhikara and Folks, 1989).

Suppose $F(t)$ denotes the cdf of a subject’s survival time. Then, the subject’s survival function $S(t)$ at time $t$ is the probability of experiencing the event after time $t$. Therefore, $S(t) = 1 - F(t)$. The cdf of $t$ in terms of the standard normal distribution function, given by Schuster (1968), is

$$F(t) = \Phi\left[\sqrt{\frac{\lambda}{t}}\left(\frac{t}{\tau} - 1\right)\right] + \exp(2\lambda/\tau)\,\Phi\left[-\sqrt{\frac{\lambda}{t}}\left(1 + \frac{t}{\tau}\right)\right]. \tag{2.3}$$

Therefore, the survival function for the inverse Gaussian is

$$S(t) = \Phi\left[\sqrt{\frac{\lambda}{t}}\left(1 - \frac{t}{\tau}\right)\right] - \exp(2\lambda/\tau)\,\Phi\left[-\sqrt{\frac{\lambda}{t}}\left(1 + \frac{t}{\tau}\right)\right]. \tag{2.4}$$

As mentioned in Section 2.2.1, the first hitting time ($T$) of a Wiener process follows an inverse Gaussian distribution. The drift of the Wiener process may be positive, negative or zero. If the Wiener process has negative drift ($\mu < 0$), then there is a propensity to drift toward the threshold ($a$) (Whitmore, 1979); i.e., $S(\infty) = 0$. With $\mu < 0$, we obtain, from derivations explained in Chhikara and Folks (1989), a proper inverse Gaussian distribution IG($\tau$, $\lambda$) of the first hitting times with parameters defined as follows:

$$\tau = \frac{(x_0 - a)}{\mu} \quad \text{and} \quad \lambda = \frac{(a - x_0)^2}{\sigma^2}. \tag{2.5}$$

Thus, the pdf of the inverse Gaussian first hitting time distribution when $\mu < 0$ in terms of the Wiener process parameters is

$$f^*(t \mid \mu, \sigma^2) = \frac{x_0 - a}{\sqrt{2\pi\sigma^2}}\, t^{-3/2} \exp\left[-\frac{\mu^2 \left(t + \frac{x_0 - a}{\mu}\right)^2}{2\sigma^2 t}\right], \quad t > 0,\ \sigma^2 > 0. \tag{2.6}$$

The corresponding cdf is

$$F(t) = \Phi\left[\sqrt{\frac{(x_0 - a)^2}{\sigma^2 t}}\left(\frac{t\mu}{x_0 - a} - 1\right)\right] + \exp\left[\frac{2(x_0 - a)\mu}{\sigma^2}\right] \Phi\left[-\sqrt{\frac{(x_0 - a)^2}{\sigma^2 t}}\left(1 + \frac{t\mu}{x_0 - a}\right)\right]. \tag{2.7}$$

In a process that has positive drift ($\mu > 0$), hitting a preset threshold ($a$) theoretically may never occur (Whitmore, 1979); i.e., it has a cure rate ($S(\infty)$ is not necessarily 0). For example, in a clinical trial where a patient’s health is modeled using the Wiener process with positive drift, the subject may never experience the event under study (they are cured). With $\mu > 0$, the resulting improper distribution of the first hitting time is inverse Gaussian IG($-\tau$, $\lambda$) and the cure rate is $S(\infty) = 1 - \exp(-2x_0\mu/\sigma^2)$. If $\mu = 0$, the process also has a propensity to drift toward $a$ (Whitmore, 1979). However, as seen in Chhikara and Folks (1989), the FHT is not inverse Gaussian when $\mu = 0$, but it is a stable distribution with index 1/2 (see Feller, 1966, Section 6.1, p. 170) with probability density function

$$f(t \mid \sigma^2) = \frac{1}{2\sigma\sqrt{2\pi t^3}} \exp\left(-\frac{1}{8t\sigma^2}\right).$$
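The behavior of the first hitting time distribution under negative and positive drift can be checked numerically. The sketch below is a minimal illustration, not the implementation used in this dissertation: the function names are hypothetical, the threshold defaults to $a = 0$, and the cdf is the standard closed form for Brownian motion with drift, written in terms of the distance $x_0 - a$ so that one expression covers either sign of $\mu$. For $\mu < 0$ it coincides with (2.3) when the inverse Gaussian mean is taken to be $(x_0 - a)/|\mu|$; that sign convention is an assumption of the sketch.

```python
import math


def std_normal_cdf(z):
    """Standard normal distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def wiener_fht_pdf(t, x0, mu, sigma2=1.0, a=0.0):
    """First hitting time density of the threshold a, as in (2.6)."""
    d = x0 - a  # distance from the starting value to the threshold
    return d / math.sqrt(2.0 * math.pi * sigma2 * t ** 3) * math.exp(
        -(d + mu * t) ** 2 / (2.0 * sigma2 * t)
    )


def wiener_fht_cdf(t, x0, mu, sigma2=1.0, a=0.0):
    """P(T <= t) for a Wiener process started at x0 > a with drift mu.

    Assumption: standard closed form for Brownian motion with drift, valid
    for either sign of mu.
    """
    d = x0 - a
    s = math.sqrt(sigma2 * t)
    return std_normal_cdf(-(d + mu * t) / s) + math.exp(
        -2.0 * mu * d / sigma2
    ) * std_normal_cdf((mu * t - d) / s)


def cure_rate(x0, mu, sigma2=1.0, a=0.0):
    """S(infinity): probability that the threshold is never reached."""
    if mu <= 0.0:
        return 0.0
    return 1.0 - math.exp(-2.0 * (x0 - a) * mu / sigma2)


if __name__ == "__main__":
    for mu in (-0.5, 0.5):
        late = wiener_fht_cdf(1e6, x0=2.0, mu=mu)
        print(f"mu={mu:+.1f}  F(1e6)={late:.4f}  cure rate={cure_rate(2.0, mu):.4f}")
```

With negative drift the cdf approaches one, while with positive drift it levels off below one and the remaining mass equals the cure rate $1 - \exp(-2x_0\mu/\sigma^2)$ given above.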

In threshold regression, the process $\{X_t\}$ and boundary set $\mathcal{B}$ have parameters that are dependent on covariates differing between individuals (Lee and Whitmore, 2006). Using appropriate regression link functions, these parameters are joined to linear combinations of covariates, such as $g_\theta(\theta_i) = z_i\gamma$ for $\theta$. Here $g_\theta$ is the link function, the parameter $\theta_i$ is the value of the parameter $\theta$ for individual $i$, $z_i = (1, z_{i1}, z_{i2}, \ldots, z_{ik})$ is the covariate vector of individual $i$ and $\gamma$ is the associated vector of regression coefficients. Normally, the link function will be chosen to map the parameter space into the real line. Likewise, covariates and their mathematical forms in the regression function $z\gamma$ must be chosen appropriately, as is the case in a conventional regression analysis. An attractive feature of the threshold regression model is that we are able to relate subject characteristics to clinically meaningful parameters. We illustrate this using the Wiener process FHT model. The Wiener process has mean (drift) parameter $\mu$, variance parameter $\sigma^2$, initial process level parameter $x_0$ and a boundary set that includes the threshold $a$. However, the survival function depends on $\mu$, $\sigma^2$ and $x_0$ only through $x_0/\sigma$ and $\mu/\sigma$. Hence, when analyzing right censored data, there are essentially only two free parameters. Thus, we can arbitrarily set $\sigma^2 = 1$ without loss of generality (Aalen and Gjessing, 2001).

In the railroad worker case-control study presented in Lee et al. (2009), covariates, such as smoking status, asbestos exposure and whether or not the subject worked as an engineer, were incorporated into the Wiener process model to determine their effect on survival. In this case-control study, the following link functions were used:

\[ \mu = \beta_0 + \beta_1 y_1 + \cdots + \beta_k y_k, \]

\[ \ln(x_0) = \gamma_0 + \gamma_1 y_1 + \cdots + \gamma_k y_k, \]

where $y = (y_1, y_2, \ldots, y_k)$ is a vector of regression covariates. The underlying process was assumed to be a Wiener process with negative drift. Therefore, the first hitting time distribution was $IG(-\tau, \lambda)$, with τ and λ as defined in (2.5), and maximum likelihood techniques were used to find the estimates of β and γ from the link functions above. In this model setup, β represents the covariate effects on the rate of decline in health status (the drift) and γ represents the covariate effects on the initial health status.
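The two link functions and the resulting likelihood contributions can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used by Lee et al. (2009): it assumes σ² = 1, a threshold at zero, and the standard closed forms for the first hitting time density and survival function of a Wiener process with drift. All function names are hypothetical.

```python
import math

def norm_cdf(u):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-u / math.sqrt(2.0))

def linked_parameters(y, beta, gamma):
    """Identity link for the drift mu, log link for the initial level x0."""
    z = [1.0] + list(y)                      # prepend the intercept term
    mu = sum(b * v for b, v in zip(beta, z))
    x0 = math.exp(sum(g * v for g, v in zip(gamma, z)))
    return mu, x0

def fht_density(t, x0, mu):
    """FHT density at the zero boundary (assumes sigma^2 = 1)."""
    return x0 / math.sqrt(2.0 * math.pi * t ** 3) * math.exp(-(x0 + mu * t) ** 2 / (2.0 * t))

def fht_survival(t, x0, mu):
    """P(T > t) for the same process; tends to 1 - exp(-2*x0*mu) when mu > 0."""
    root_t = math.sqrt(t)
    return (norm_cdf((x0 + mu * t) / root_t)
            - math.exp(-2.0 * x0 * mu) * norm_cdf((mu * t - x0) / root_t))

def log_likelihood(times, events, covariates, beta, gamma):
    """Events contribute log f(t); right-censored subjects contribute log S(t)."""
    total = 0.0
    for t, d, y in zip(times, events, covariates):
        mu, x0 = linked_parameters(y, beta, gamma)
        total += math.log(fht_density(t, x0, mu)) if d else math.log(fht_survival(t, x0, mu))
    return total
```

Maximizing `log_likelihood` over `beta` and `gamma` with any general-purpose optimizer would give the maximum likelihood estimates described above.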

2.2.4 Previous Work using the FHT model with an Underlying Wiener Stochastic Process

Several authors have utilized the Wiener process FHT model in survival, reliability and economic applications. In this section, we will summarize several specific uses of this model.

The first example comes from research conducted in Lee et al. (2009). An FHT model with the Wiener process as the underlying stochastic process was used to analyze data, detailed in Garshick et al. (2004), from a case-control study which includes 3641 railroad workers where 1256 died from lung cancer (cases) and 2385 workers from the same population that did not die of lung cancer, suicide, accident or unknown cause (controls). Since 1959, the rail industry used diesel power for their locomotives. Thus, railroad workers began to be exposed to diesel exhaust.

For this case-control study, diesel exhaust exposure was captured by breaking down jobs with the railroad into three categories. The first category contains engineers, brakemen, firemen, conductors and hostlers. The second category consists of railroad shop workers and the third includes all other workers such as ticket and station agents, clerks and rail car repair workers. Since this data comes from a case-control study, each case subject contributes an observed lifetime from the reference date to the year

of death and each control subject contributes a censored survival time (censored by

some other cause of death) measured from the same reference date (Lee et al., 2009).

Covariates were incorporated into the model via the drift parameter and the initial

health status. Operational time was also used in this study and is explained at the

end of this chapter.

In Pennell et al. (2010), a Bayesian methodology was used in a Wiener process

FHT model that accounts for unmeasured covariates in both the initial health status

and the drift. To accomplish this, a random effect was included in the drift component

and each subject’s initial health status, x0i, was modeled as a truncated normal random variable. This methodology was applied to data from malignant melanoma patients where non proportional hazards and unexplained heterogeneity were present.

The results are compared to previous studies conducted on this data using Cox regression and fitting a similar FHT model without random effects.

Research conducted by Lee, Whitmore and Rosner (2010) explored threshold regression for survival data in longitudinal studies involving time-varying covariates.

To handle this type of data, the authors suggest breaking up longitudinal data into intervals and modeling time to event over each interval using threshold regression with a latent Wiener process under a Markov assumption. This method was illustrated using data from the Nurses' Health Study on lung cancer risk, with completion times of surveys defining the different time intervals.

Lee, Chang and Whitmore (2008) conducted research using a threshold regression

mixture model for assessing treatment efficacy in a multiple myeloma clinical trial.

The subjects in this study were initially randomized to either receive Velcade or a

high-dose Dexamethasone treatment. Based on the subject’s response, they were

switched to the other treatment if necessary. A mixture of two Wiener process FHT

models was fit to the survival data since there was evidence of a bimodal FHT

distribution in each treatment group. The parameter in this model is the

proportion of patients receiving one of the two treatments. A composite time scale

was used to distinguish the rate of disease progression before and after switching

treatments (see section 2.4 at the end of this chapter). Covariates were incorporated in the model via the drift parameter.

An extension of the univariate Wiener process model can be found in research conducted by Whitmore, Crowder and Lawless (1998). Here, a bivariate Wiener process was used to jointly model a latent process and an observable marker process.

This technique was demonstrated with a simulated example and applied to a data set obtained from an aluminum production process. The data set contained failure age in days of the reduction cells (used to perform electrolysis of molten alumina and cryolite) and failure data on two markers which include the percentage iron contamination level and horizontal distortion of the cell in inches.

The bivariate Wiener process model of Whitmore et al. (1998) was extended by

Tong et al. (2008) to the case when only current status data are available. Current status data is also known as interval censored data where only one observation on each subject is available and the failure time is either smaller or larger than the observed time. This type of data can be found in cross-sectional studies and animal studies examining time to appearance of internal tumors.

Horrocks and Thompson (2004) proposed a Wiener process model for competing risks data. The model is based on the time, T, that a Wiener process hits one of two boundaries which represent two possible competing outcomes. Covariates

were incorporated into the model via the drift component of the underlying Wiener

process. The model was used on a subset of data from the Utah Department of

Health representing all hospital discharges in 1996. The two competing outcomes

were healthy discharge and death in hospital. The upper and lower thresholds

were modeled as a linear function of covariates. Horrocks and Thompson discussed

an extension of their model for length of stay that accounts for the presence of

heterogeneity in the population (accomplished through use of a mixture model).

Another competing risk application of the Wiener process FHT model was used in research conducted by Lindqvist and Skogsrud (2009). They considered a competing risks framework for a component that will experience either failure or a preventive maintenance procedure to avoid failure. A novel approach is presented that models component degradation using the Wiener process with failure associated with hitting a predetermined threshold. In addition, a potential time for maintenance associated with hitting a threshold before the failure threshold is accounted for in the model.

A final example of an application involving the Wiener process comes from research conducted by Saebo et al. (2005) on genetic evaluation of mastitis resistance in cows.

In the model setup, it is assumed that each cow is in a unique state of health at any given time that is a certain distance from onset of disease. The latent physiological battle against the disease can be modeled by a Wiener process with drift toward the disease threshold. Two risk patterns are associated with development of mastitis that include physical changes known to start in the days leading up to calving and the cow’s environment such as milking technique and hygiene. Thus, these two risk patterns invite the model setup involving two latent Wiener processes.

2.2.5 Strengths and Limitations of Using the Wiener Process Model

The Wiener process is widely used as the underlying stochastic process in research involving first hitting time models in survival and reliability applications. The first hitting time distribution, when using this process, is inverse Gaussian. As explained in Chhikara and Folks (1989), this distribution is very flexible and can represent skewed as well as approximately normal data. Thus, the Wiener process can be an important tool when modeling data characterized by early incidence of events. Also, when using the Wiener process, the inverse Gaussian distribution and the survival function for the first hitting times are easily computable and provide an efficient means of finding maximum likelihood estimates of model parameters. Another useful feature is the ability to model cure rates since the drift parameter can either be positive or negative. Finally, the ease of incorporating covariates into the model via the drift parameter or the applicable threshold makes the Wiener process a viable choice in biostatistical research.

A limitation of the Wiener process model, originating in the defining properties of the process, is that disjoint time increments are independent. This can cause problems for modeling, for example, movements of an organism or a patient’s health status that logically depend on the state in the previous time increment of the process. This virtually eliminates the capability to model homeostasis in the underlying process as the Wiener process models rapidly fluctuating phenomena (Horsthemke and Lefever,

1984). A possible solution to this deficiency is the incorporation of the Ornstein-Uhlenbeck process, a modification of the Wiener process, to allow adequate modeling of the homeostatic properties of many biological processes. The OU process is described in the next section.

2.3 Survival Models Based on the Ornstein-Uhlenbeck Process

In a 2004 paper, Aalen and Gjessing discussed survival models based on the

Ornstein-Uhlenbeck (OU) process. This process is a mean reverting modification of a Wiener process in that it has a propensity to drift in the direction of a fixed equilibrium level. Homeostasis, defined as simultaneously diffusing back and forth while stabilizing around a certain point, is a characteristic found in many natural processes. Thus, the OU process is natural to consider in a biological context. An example of a biological process that exhibits homeostasis is kidney function. Kidneys get rid of extra water and ions from blood through passage of urine. Thus, the kidneys carry out homeostatic regulation by removing waste or excess products from the body.

For the purposes of this research, we consider two concepts when modeling with the

OU process. If the event under study is a positive one, such as modeling time to discharge from the hospital or ICU, the threshold, in the FHT model context, will be regarded as a healthy homeostasis. Also, in another modeling situation, we can have subjects being pulled from a healthy status toward an unhealthy homeostasis or threshold representing death or disease. In the following sections, details of the OU process are explained and an example is given describing a previous use of this model in the literature.

2.3.1 First Hitting Time for the Ornstein-Uhlenbeck Process

Aalen and Gjessing (2004) state that the Wiener process, represented by Wt, is well known for modeling random processes with continuous sample paths. Its time steps

over an interval are normally distributed with mean 0 and variance proportional to

the interval length. The OU process, represented by Xt, is actually a modified Wiener

process with a drift toward a state of equilibrium. The OU process can be defined by

the stochastic differential equation (Cox and Miller, 1965, Section 5.8, p. 226)

\[ dX_t = (a - bX_t)\,dt + \sigma\,dW_t, \tag{2.8} \]

where $-\infty < a < \infty$, $b > 0$ and $\sigma > 0$. According to Aalen and Gjessing (2004), $X_0$ is typically modeled as Gaussian or treated as a constant. This equation tells us that for small time intervals $(t, t + \Delta t)$, the change in $X_t$ has drift toward a/b, but is agitated by the Gaussian noise contained in $dW_t$ (often called white noise).

This process is attracted to the equilibrium point a/b. This attraction is known as

the mean-reverting property of the OU process. As shown in Aalen and Gjessing (2004), $X_t$ is Gaussian, and $E X_t = a/b + (X_0 - a/b)\exp(-bt)$, which converges to a/b as $t \to \infty$. Also, $Var(X_t) = [\sigma^2/(2b)](1 - \exp(-2bt))$, which converges to $\sigma^2/(2b)$ as $t \to \infty$, and $Cov(X_s, X_t) = [\sigma^2/(2b)][\exp(-b|s - t|) - \exp(-b(s + t))]$. If we ignore the initial fluctuations at the start of the process due to $X_0 \neq a/b$, the OU process is stationary and Gaussian with an autocorrelation function that decays exponentially over time (Aalen and Gjessing, 2004). Details regarding the OU process are also available in Aalen et al. (2008). To demonstrate the tendency for the OU process to reach a state of equilibrium in contrast to the Wiener process, corresponding sample paths are generated for the two processes and shown in Figure 2.3. The initial value, $X_0$, of all processes was set to 4, $\sigma^2$ was set to 2, the mean of the OU process was set to 0 (a = 0, b = 1) and the drift parameter for the Wiener process was set to −2, 0 and 2. In this plot we see the Wiener process with positive drift tends to move away from $X_0$ in a positive direction, and the Wiener process with negative drift tends to move away from $X_0$ in a negative direction. For the Wiener process with zero drift, the path tends to stay close to the starting point of the process. The OU sample path moves toward the process mean of 0 and stabilizes. This behavior exhibits the OU process mean reverting property. The variance parameter of the Wiener processes in this plot is 2, while the variance of the OU process converges to $\sigma^2/(2b) = 1$ as t approaches infinity.

[Figure: sample paths of the Wiener process (drift = −2, 0, and 2) and the OU process (mean = 0) with X0 = 4; horizontal axis: Time (years); vertical axis: Health Status.]

Figure 2.3: Sample Paths of OU Process and Wiener Process
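Paths like those in Figure 2.3 can be generated with a short simulation. The sketch below is an illustration under stated assumptions rather than the code used to produce the figure: the OU path uses the exact Gaussian transition implied by the mean and variance formulas above, the Wiener path uses independent Gaussian increments, and the grid spacing, seed and function names are arbitrary choices made here.

```python
import math
import random

def simulate_ou(x0, a, b, sigma2, dt, n_steps, rng):
    """OU path on a grid of spacing dt using the exact transition (requires b > 0)."""
    mean_level = a / b                                   # equilibrium point a/b
    decay = math.exp(-b * dt)
    step_sd = math.sqrt(sigma2 / (2.0 * b) * (1.0 - decay ** 2))
    path = [x0]
    for _ in range(n_steps):
        path.append(mean_level + (path[-1] - mean_level) * decay
                    + step_sd * rng.gauss(0.0, 1.0))
    return path

def simulate_wiener(x0, drift, sigma2, dt, n_steps, rng):
    """Wiener path with drift: independent normal increments over each step."""
    step_sd = math.sqrt(sigma2 * dt)
    path = [x0]
    for _ in range(n_steps):
        path.append(path[-1] + drift * dt + step_sd * rng.gauss(0.0, 1.0))
    return path

# Settings taken from the text: X0 = 4, sigma^2 = 2, OU mean 0 (a = 0, b = 1),
# Wiener drifts -2, 0 and 2, over 10 years. The seed and step size are arbitrary.
rng = random.Random(2012)
ou_path = simulate_ou(4.0, 0.0, 1.0, 2.0, 0.01, 1000, rng)
wiener_paths = {d: simulate_wiener(4.0, d, 2.0, 0.01, 1000, rng) for d in (-2.0, 0.0, 2.0)}
```

These paths are not absorbed at zero; an FHT analysis would stop each path the first time it reaches the threshold.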

From this point on, it is assumed that the process is absorbed once it hits zero.

The OU process Xt is a process describing the latent progression of a subject toward a

health-related event. Suppose we let Xt correspond to a subject’s disease development

which may be latent. Then, the time the development reaches a particular level, an

event occurs for that subject. Therefore, we define the subject’s event time as the

first time the latent process Xt hits a threshold. We assume the OU process, Xt,

starts at a deterministic positive value $x_0$. We define $T = \inf\{t : X_t = 0\}$, where $T \ge 0$, to be the event time (Aalen and Gjessing, 2004). We can define the hazard rate for continuous T by

\[ h(t) = \frac{-\frac{d}{dt}P(T > t)}{P(T > t)}. \tag{2.9} \]

It can be shown that when t approaches ∞, the hazard rate h(t), the rate of the first

passage across the boundary, converges to a constant h0 (Aalen and Gjessing, 2004).

2.3.2 The Shape of the Hazard Function

Exact formulas for the hazard rate exist in the symmetric case (a = 0 in equation

(2.8)). Thus, in this situation, we are modeling time to homeostasis since the mean and the threshold of the process are equal to 0. Unfortunately, under the general

OU model, there is no closed form for the hazard rate. In Finch (2004), an attempt was made to find the general formula in closed form, but only a numerical solution is available. Also, in Aalen and Gjessing (2004), they state a closed-form symbolic inversion is hardly possible in general for the Laplace transforms required to find formulas for the density and survival functions.

In Ricciardi and Sato (1988, p. 46) the probability density of time to event when starting in $X_0$ (parameter values $a = 0$, $b = 1$, $\sigma^2 = 2$) is given by

\[ f(t) = \sqrt{\frac{2}{\pi}}\, X_0\, \frac{e^{2t}}{(e^{2t} - 1)^{3/2}} \exp\left(-\frac{X_0^2}{2(e^{2t} - 1)}\right), \tag{2.10} \]

and the corresponding survival function is

\[ S(t) = 2\Phi\left(\frac{X_0}{\sqrt{e^{2t} - 1}}\right) - 1, \tag{2.11} \]

where Φ(.) is the standard normal cumulative distribution function. The corresponding

hazard rate is calculated as h(t) = f(t)/S(t). For the OU process, the hazard rate, where the parameter values are a = 0,b = 1,σ2 = 2, starts at 0 and then converges toward an equilibrium level. In the case of these specific model parameters, the hazard rate converges to 1. According to Aalen and Gjessing (2004), this convergence indicates the advancement of the underlying distribution toward quasi-stationarity on the state space, (see also Aalen and Gjessing, 2001).
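The two limits quoted here can be checked directly from (2.10) and (2.11). The following is a short sketch of that check; the shorthand $u(t)$ is introduced here for convenience and does not appear in the cited sources.

```latex
\[
u(t) = \frac{X_0}{\sqrt{e^{2t} - 1}}, \qquad
h(t) = \frac{f(t)}{S(t)}
     = \frac{e^{2t}}{e^{2t} - 1} \cdot
       \frac{\sqrt{2/\pi}\; u(t)\, e^{-u(t)^2/2}}{2\Phi(u(t)) - 1}.
\]
% Large t: u(t) -> 0 and 2\Phi(u) - 1 = \sqrt{2/\pi}\, u + O(u^3),
% so both factors tend to 1:
\[
\lim_{t \to \infty} h(t) = 1 .
\]
% Small t: u(t) -> \infty, so S(t) -> 1, while e^{-u^2/2} decays faster than
% u\, e^{2t}/(e^{2t} - 1) grows:
\[
\lim_{t \to 0^{+}} h(t) = 0 .
\]
```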

Aalen and Gjessing (2004) discuss how the shape of the hazard function changes with X0; Figure 2.4 is a redrawing of their Figure 1. The hazard rate is generally increasing if X0 is far from 0. When X0 moves closer to zero, we get a more unimodal hazard. For small X0 that are close to 0, we obtain a generally decreasing hazard rate. Note that the hazard function corresponding to X0 = 0.2 in Figure 2.4 starts

out at zero, has a strong initial increase and then is generally decreasing. Therefore,

the shape of the hazard rate is driven by the distance X0 is from the threshold (Aalen

and Gjessing, 2001).

[Figure: hazard function of time to absorption for X0 = 0.2, 0.8 and 2.0; horizontal axis: Time (years); vertical axis: Hazard Rate.]

Figure 2.4: Hazard function of time to absorption (parameter values: a = 0, b = 1, σ² = 2)
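The hazard curves can be evaluated numerically from (2.10) and (2.11). The sketch below is illustrative only (function names are hypothetical) and applies solely to the symmetric case a = 0, b = 1, σ² = 2. It uses the identity 2Φ(u) − 1 = erf(u/√2) so that small survival probabilities are computed without cancellation.

```python
import math

def ou_survival(t, x0):
    """S(t) from (2.11); 2*Phi(u) - 1 is written as erf(u / sqrt(2))."""
    u = x0 / math.sqrt(math.expm1(2.0 * t))
    return math.erf(u / math.sqrt(2.0))

def ou_density(t, x0):
    """f(t) from (2.10) for a = 0, b = 1, sigma^2 = 2."""
    e = math.expm1(2.0 * t)                              # e^{2t} - 1
    return (math.sqrt(2.0 / math.pi) * x0 * math.exp(2.0 * t) / e ** 1.5
            * math.exp(-x0 ** 2 / (2.0 * e)))

def ou_hazard(t, x0):
    """h(t) = f(t) / S(t)."""
    return ou_density(t, x0) / ou_survival(t, x0)

# Tabulate the hazard for the three starting values shown in Figure 2.4.
for x0 in (0.2, 0.8, 2.0):
    values = [round(ou_hazard(t, x0), 3) for t in (0.1, 0.5, 1.0, 2.0)]
    print(x0, values)
```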

2.3.3 Modeling the Hazard as the Square of an Ornstein-Uhlenbeck Process

Woodbury and Manton (1977), Myers (1981), and Yashin (1985), used the square of an OU process to model the hazard rate. In Myers (1981), a quadratic hazard is applied to a stochastic process and used to determine the survival function. The resulting survival function is evaluated with regards to the solution of a related Riccati equation where the coefficients of the equation are dependent on the coefficients of the process and the quadratic hazard function. This is important since the coefficients of the Riccati equation can depend on time allowing integration of aging effects into the model. According to Myers (1981), this modeling technique is particularly useful for clinical cancer trials evaluating effectiveness of therapies. Woodbury and Manton

(1977) develop models to represent the mechanics of human physiological aging and mortality using a stochastic process. Hazard rates and the corresponding survival

functions are modeled directly from several different models involving different ways

of incorporating chronological age.

There are similarities between using the OU process in an FHT model and modeling

the hazard function as the square of an OU process. If we let Xt be the OU process

defined by (2.8) and h(t) is the individual hazard at time t, we can assume a model where the individual hazard rate is $h(t) = Z_t = X_t^2$. Since the process $X_t$ is Gaussian, $X_t$ is still Gaussian conditional on survival (Yashin, 1985). To express the conditional

distribution of X, it is enough to obtain its conditional mean and variance. This model

also generates quasi-stationarity for X with its limiting distribution being Gaussian.

Now, instead of being restricted to $b \ge 0$, we can use any value for b (Aalen and Gjessing, 2004).

Let $X_t$ denote the level of the stochastic process at time $t \ge 0$. In Yashin's (1985) model, the distribution of $X_t$, given that the subject's survival time T is greater than t, is normally distributed with mean $m(t) = E[X_t \mid T > t]$ and variance $\gamma(t) = Var[X_t \mid T > t]$. The population hazard rate for this model is then

\[ \theta(t) = E[Z_t \mid T > t] = m^2(t) + \gamma(t). \tag{2.12} \]

After some derivation (see Aalen and Gjessing, 2004), where $g_1 = (-b - B)/2$, $g_2 = (-b + B)/2$, $c = \frac{1}{2}\log\left|(\gamma(0) - g_1)/(\gamma(0) - g_2)\right|$ and $B = \sqrt{b^2 + 2\sigma^2}$, we find that

\[ \gamma(t) = -\frac{b}{2} + \frac{B}{2}\tanh(Bt + c), \tag{2.13} \]

\[ m(t) = m(0)\,\frac{\cosh(c)}{\cosh(Bt + c)} + \frac{\sinh(Bt + c) - \sinh(c)}{\cosh(Bt + c)}\,\frac{a}{B}. \tag{2.14} \]

From the equations for m and γ, we can see that when t increases, the mean and variance converge to the limits $\lim_{t \to \infty} m(t) = a/B$ and $\lim_{t \to \infty} \gamma(t) = (-b + B)/2$, respectively. Thus, the limiting distribution is $N[a/B, (-b + B)/2]$. The limiting value, a/B, for m(t) is always less than the mean-reversion value a/b. Also, the greater the diffusion parameter σ, the more the quasi-stationary distribution of X is

pulled toward zero (Aalen and Gjessing, 2004).
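Equations (2.12) to (2.14) are straightforward to evaluate. The following sketch is an illustration with a hypothetical function name; it assumes the initial conditional variance γ(0) lies strictly between g1 and g2 (for example γ(0) = 0 for a deterministic start), which is the range in which the tanh form reproduces γ(0) at t = 0.

```python
import math

def squared_ou_moments(t, a, b, sigma2, m0, gamma0):
    """Return (m(t), gamma(t), theta(t)) from equations (2.12)-(2.14).

    Assumes g1 < gamma0 < g2, e.g. gamma0 = 0 for a fixed starting value.
    """
    B = math.sqrt(b * b + 2.0 * sigma2)
    g1 = (-b - B) / 2.0
    g2 = (-b + B) / 2.0
    c = 0.5 * math.log(abs((gamma0 - g1) / (gamma0 - g2)))
    s = B * t + c
    gamma_t = -b / 2.0 + (B / 2.0) * math.tanh(s)                       # (2.13)
    m_t = (m0 * math.cosh(c) / math.cosh(s)
           + (math.sinh(s) - math.sinh(c)) / math.cosh(s) * a / B)      # (2.14)
    theta_t = m_t ** 2 + gamma_t                                        # (2.12)
    return m_t, gamma_t, theta_t

# Example with arbitrary illustrative values: a = 1, b = 1, sigma^2 = 2, m(0) = 3.
for t in (0.0, 0.5, 2.0, 20.0):
    print(t, squared_ou_moments(t, 1.0, 1.0, 2.0, 3.0, 0.0))
```

For large t the returned mean and variance approach a/B and (−b + B)/2, the limits stated above.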

2.3.4 The Ornstein-Uhlenbeck Process in Biostatistical Applications

As described previously, the term homeostasis refers to the state of balance in

an organism or system needed to survive and function correctly. In this research,

we consider two concepts when modeling with the OU process. One of the concepts

involves modeling time to a positive event, such as time to hospital discharge. The

threshold, in the FHT model context, is regarded as a healthy homeostasis. In the

opposite modeling situation, we can have subjects being pulled from a healthy status

toward an unhealthy homeostasis or threshold representing death or disease. Since

the OU process is mean-reverting, it is a sensible choice in these two cases.

In the medical literature, there are many papers pertaining to homeostasis. In an

article by de Leeuw and Dees (2003) titled “Fluid Homeostasis in Chronic Obstructive

Lung Disease”, the steady state of extracellular body fluid is disturbed by problems

from chronic obstructive pulmonary disease (COPD). This paper takes a look at some

of the mechanisms involved in this complication. Perhaps the OU process could be

used to model time to return of the steady state of extracellular body fluid under a

COPD treatment. Another article by Fry et al. (2003) in the Blood journal examines

the potential role for interleukin-7 in T-cell homeostasis. The size of the peripheral

T-cell compartment is firmly regulated all through life, meaning that homeostatic

systems operate to regulate peripheral T-cell numbers. This regulated state of health could possibly be modeled as a healthy homeostasis threshold.

An article from the American College of Chest Physicians by Cely et al. (2004) explores the connection of baseline glucose homeostasis to hyperglycemia during medical critical illness. Hyperglycemia occurs regularly in critically ill patients, including those with acute stroke, trauma, sepsis and myocardial infarction. Aggressive control of euglycemia (normal concentration of glucose in the blood) was linked with decreased morbidity and mortality. The investigators argue that glucose levels previously considered routine are pathologic, especially for surgical ICU patients.

Thus, if the glucose levels can be maintained in a state of healthy homeostasis, there may be a direct positive impact on the outcome of a critically ill patient.

Using the OU process to model the state of homeostasis would be relevant in the aforementioned applications due to its mean reverting property. An application of the OU process in biostatistical research can be found in a paper by Shen and

Pearl (2007). This research explores liver damage in drug treatment trials. The data includes a number of different measurements on analytes collected over irregular time points depending on the time of visits the patient had during the trial. To analyze the data, the researchers assumed there were two parts to the model. The

first part models the latent liver function process over time. The second makes a connection between the measurement process and the latent liver function process.

In the part of the model describing liver function, there is a homeostatic property modeled using a discretized OU process due to the researchers' belief that immune and other organ repair systems instantaneously adjust liver function when it strays from its normal level. Another application involving the OU process can be found

in Boscardin, Taylor and Law (1998) regarding longitudinal models for AIDS marker

data. In their research, an integrated OU process is utilized in univariate and bivariate

longitudinal mixed models to allow for a range of biologically plausible derivative

tracking encompassing both random trajectory and Brownian motion behavior.

In previous sections, two modeling concepts were presented. They involved using the OU process to model latent health status and using the square of the OU process to model the hazard function directly. In this project, we use the OU process to model the latent health status due to its biological plausibility. Later we show that the model can be extended to include covariate information with meaningful covariate effects; such is not the case when the square of the OU process is used. However, under the square model, we are not restricted to the symmetric case (a = 0) and nonnegative values of b in order to obtain a closed form of the hazard or survival function.

2.4 Operational Time

An important detail to consider in FHT models is how the time scale should be represented. According to Lee and Whitmore (2006), calendar time is usually not the natural time scale of the parent process. For example, an automobile component may wear according to mileage rather than the amount of elapsed calendar time.

Hence some authors have proposed transforming calendar time into an operational time scale that more accurately describes the “age” of the stochastic process. In Lee and

Whitmore (2006), r(t) is the function that transforms calendar time t to operational

time r with r(0) = 0. An absolute requirement is that r(t) is a monotonic transformation

but it does not need to be strictly increasing.

An example of operational time is a composite operational time which is a mixture of different accumulation measures. A composite running time can be defined by

\[ r(t) = \sum_{j=1}^{J} \alpha_j r_j(t). \tag{2.15} \]

The $r_j(t)$ are the different accumulation measures used to represent factors that

may strengthen disease progression. The αj are positive parameters that weight the

effects of the jth factor. To ensure identifiability, one αj parameter, αk, will be set

equal to one allowing all other factors to be weighted in terms of the kth factor. For

example, we could consider a model characterizing the degradation of a person’s lung

health. Naturally one would expect lung health to degrade according to time spent

smoking. Using equation (2.15), our setup for composite operational time would be

r(t)= α1r1(t)+ α2r2(t) where r1(t) is the number of months spent not smoking and

r2(t) is the number of months spent smoking. We set α1 = 1 and hence all time

is translated into nonsmoking time. If, for example, the estimate obtained for α2 is

greater than one, we say that lung degradation when smoking is greater than when

not smoking.
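The smoking example can be made concrete with a small calculation. The numbers below (120 months not smoking, 60 months smoking, and a weight of 2.5 for smoking time) are hypothetical values chosen only for illustration, and the function name is likewise an assumption.

```python
def composite_operational_time(durations, weights):
    """Equation (2.15): weighted sum of the accumulation measures r_j(t)."""
    if len(durations) != len(weights):
        raise ValueError("one weight is needed per accumulation measure")
    return sum(alpha * r for alpha, r in zip(weights, durations))

months = [120.0, 60.0]    # r1(t): months not smoking, r2(t): months smoking
alphas = [1.0, 2.5]       # alpha1 fixed at 1 for identifiability; alpha2 is hypothetical
r_t = composite_operational_time(months, alphas)
```

With these values, 180 calendar months translate into 270 months of nonsmoking-equivalent time, since each month of smoking counts as 2.5 months on the operational scale.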

There are several examples of operational time use in the literature. In Lee et al.

(2009), operational time is incorporated into a FHT Wiener process model to capture

the effects of diesel exhaust exposure on disease progression for subjects working in

specific railroad job categories. Various calendar time periods during the subject’s

lifetime are believed to have different effects on disease progression dependent on the

disease risk and health effects exposed to during the time period. Time intervals

examined in their research reflect the subject’s job categories experienced throughout

their lifetime. They include years of life prior to joining the railroad, years spent

working in an exposed job category such as an engineer or brakeman, years spent

working in an unexposed job category such as ticket agents and rail car workers,

years spent working in the locomotive repair shop and years spent in retirement until

death. The retirement job category is used as the reference category (αk = 1) to allow weighting of the other job category factors in terms of retirement time. Thus, the calendar time t is replaced by r(t) in the model and estimates are obtained for the

$\alpha_j$ to determine the effect of exposure on disease progression for the job categories under study.

Another study conducted by Lee, Chang and Whitmore (2008) examined a Wiener process FHT mixture model for assessing treatment efficacy in a multiple myeloma clinical trial. This trial compares the effects of Velcade and high-dose Dexamethasone for treatment of this disease. Subjects are randomized to either treatment at the start of the study and they are switched between treatments during the study based on subject response. The operational time concept is applied in this study to capture differences in disease progression before and after switching treatments. The operational time scale in this study has the form r = αt1 + t2 where t1 corresponds to time spent under the primary therapy and t2 corresponds to time spent under the alternative therapy. Thus, the model parameter α represents the ratio of the disease progression under the primary therapy relative to the alternate therapy. Other applications using time scales other than calendar time, as suggested in Lee, Chang and Whitmore

(2008), can be found in Oakes (1995), Duchesne and Lawless (2000) and Lawless

(2003).

Chapter 3: The Ornstein-Uhlenbeck Model With Initial State Dependent On Covariates

Due to its mean reverting property, the Ornstein-Uhlenbeck (OU) process is a logical choice to model the state of homeostasis found in many biological processes.

In this research, homeostasis is cast as an unhealthy state toward which the subjects are being pulled. Death or disease is viewed as the mean to which the OU process reverts. However, the OU process may also be used to model time to a healthy homeostasis if the event under study is a positive one such as modeling time to discharge from the hospital or ICU.

To demonstrate the use of the OU threshold regression (OU-TR) model, we examine a data set consisting of 192 subjects from a clinical trial of the treatment of carcinoma of the oropharynx found in Kalbfleisch and Prentice (1980). This clinical trial contains information on subjects who experienced death or survived past the end of the trial. The median survival time after entrance in the trial is 465 days, the mean patient age is 60 years and the censoring rate is 28%. These patients appear to be very ill and their Kaplan-Meier survival curve decreases rather quickly toward death. This study provides a good framework to apply the OU-TR model due to the mean reverting property of the underlying OU process where the mean is set as death from oropharynx cancer. Thus, we can model the subject’s health status where they

can be pulled toward death from oropharynx cancer, and the effects of covariates collected at the time of trial entry can be examined.

This chapter is organized as follows. First, we describe the OU-TR model with initial state dependent on covariates. A simulation study is then conducted to demonstrate the model’s ability to produce unbiased maximum likelihood estimates.

We also verify, through simulation, that the standard errors of these estimates are consistent with empirical estimates. Finally, this model is applied to the carcinoma of the oropharynx data set and the results and limitations of the model are discussed.

3.1 Proposed OU Threshold Regression Model

Often, in lifetime analysis, the goal is to determine covariate effects on the time to event. In a first hitting time model with an underlying stochastic process, this is possible through the use of a link function. In the Wiener process FHT model described in Lee et al. (2009), the covariates are linked to the drift parameter through an identity link, and they are also linked to the starting point (X0) of the process through a log-link function. Interpretation of the parameters under this structure can be found in Lee et al. (2009). In the OU-TR model, we look to incorporate covariates by use of a log-link function for the initial state given by

log(X )= γ + γ Z + + γ Z (3.1) 0 0 1 1 ··· k k where Z1,...,Zk are k covariates and γ0,...γk are their corresponding regression coefficients with γ0 denoting the intercept which is the value of log(X0) when

Z1 = ... = Zk = 0. Using this method to incorporate covariates into the OU-TR FHT model allows for interpretation of covariate effects in terms of relative initial health.

Recall from Section 2.3.2 that, under the OU model, the density function of $t$ is given by

\[
f(t) = \sqrt{\frac{2}{\pi}}\, X_0\, \frac{e^{2t}}{(e^{2t}-1)^{3/2}} \exp\!\left( -\frac{X_0^2}{2(e^{2t}-1)} \right) \tag{3.2}
\]

and the corresponding survival function is

\[
S(t) = 2\Phi\!\left( \frac{X_0}{\sqrt{e^{2t}-1}} \right) - 1, \tag{3.3}
\]

where $\Phi(\cdot)$ is the standard normal cumulative distribution function. Each censored observation’s contribution to the likelihood function is the survival function in equation (3.3) and each subject’s event time contributes the density function in equation (3.2). For ease in writing the likelihood, we assume that the first $m$ observations are events and the last $N - m$ are censored. Thus, the likelihood function is given by

\[
L = \prod_{i=1}^{m} \left[ \sqrt{\frac{2}{\pi}}\, x_{0i}\, \frac{e^{2t_i}}{(e^{2t_i}-1)^{3/2}} \exp\!\left( -\frac{x_{0i}^2}{2(e^{2t_i}-1)} \right) \right] \prod_{i=m+1}^{N} \left[ 2\Phi\!\left( \frac{x_{0i}}{\sqrt{e^{2t_i}-1}} \right) - 1 \right] \tag{3.4}
\]

and the corresponding log-likelihood is

\[
\log(L) = \sum_{i=1}^{m} \log\!\left[ \sqrt{\frac{2}{\pi}}\, x_{0i}\, \frac{e^{2t_i}}{(e^{2t_i}-1)^{3/2}} \exp\!\left( -\frac{x_{0i}^2}{2(e^{2t_i}-1)} \right) \right] + \sum_{i=m+1}^{N} \log\!\left[ 2\Phi\!\left( \frac{x_{0i}}{\sqrt{e^{2t_i}-1}} \right) - 1 \right] \tag{3.5}
\]
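The density, survival function and log-likelihood above translate fairly directly into code. The following is a minimal Python sketch using only the standard library; it is not the OUTR Matlab function described in the dissertation, and the names `ou_density`, `ou_survival` and `neg_log_lik`, as well as the data layout (parallel lists of times, event indicators and covariate rows), are assumptions made for illustration.

```python
import math
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal CDF, Phi(.)


def ou_density(t, x0):
    """First-hitting-time density (3.2), assuming mu = 0, rho = 1, sigma^2 = 2."""
    d = math.expm1(2.0 * t)  # e^{2t} - 1
    return (math.sqrt(2.0 / math.pi) * x0 * math.exp(2.0 * t) / d ** 1.5
            * math.exp(-x0 ** 2 / (2.0 * d)))


def ou_survival(t, x0):
    """Survival function (3.3)."""
    return 2.0 * _PHI(x0 / math.sqrt(math.expm1(2.0 * t))) - 1.0


def neg_log_lik(gamma, times, events, Z):
    """Negative of the log-likelihood (3.5).

    gamma  : [gamma_0, gamma_1, ..., gamma_k]
    times  : observed times t_i
    events : 1 if the event was observed, 0 if right censored
    Z      : list of covariate rows [Z_1i, ..., Z_ki]
    """
    total = 0.0
    for t, d, z in zip(times, events, Z):
        # log link (3.1): log(x0) is linear in the covariates
        x0 = math.exp(gamma[0] + sum(g * zj for g, zj in zip(gamma[1:], z)))
        total += math.log(ou_density(t, x0) if d else ou_survival(t, x0))
    return -total
```

Unlike equations (3.4) and (3.5), the sketch does not require events to be sorted ahead of censored observations; the event indicator selects the appropriate contribution for each subject.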

In order to estimate the standard errors of the maximum likelihood estimates (MLEs) under the OU-TR model, the observed Fisher information is used. In Efron and Hinkley (1978), it was convincingly argued that the variance of an MLE can be estimated by

\[
\widehat{\operatorname{var}}(\tau(\theta)) \approx \left. \frac{\tau'(\theta)}{-\dfrac{\partial^2 \ln L(\theta \mid Z)}{\partial \theta^2}} \right|_{\theta = \hat{\theta}_{MLE}} \tag{3.6}
\]

In our model setup, $\tau(\theta) = \theta$ since each regression coefficient is linear in the link function. Thus, the numerator of the above equation is always equal to 1. Let $Z$ be a matrix of covariate values for all subjects, $\gamma = (\gamma_0, \gamma_1, \ldots, \gamma_k)'$ and $\hat{\gamma}$ be the MLE of $\gamma$. The approximate variance of the parameter estimates $\hat{\gamma}$ can be found by calculating the observed Fisher information matrix as follows

\[
\left. -\frac{\partial^2 \ln L(\gamma \mid Z)}{\partial \gamma\, \partial \gamma'} \right|_{\gamma = \hat{\gamma}_{MLE}}
\]

and then taking the inverse of this matrix. The diagonal elements of the inverse matrix will be the approximate variances of the respective model parameters. The results of the mathematical derivations of these approximate model parameter variances are found in Appendix A.
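The dissertation derives the second derivatives analytically (Appendix A). As an illustration of the same idea, the sketch below swaps in a central finite-difference Hessian of the negative log-likelihood, inverts it, and reads standard errors off the diagonal. The function names and the step size `h` are assumptions, and the numerical derivative is a stand-in for the analytic expressions, not a reproduction of them.

```python
import math


def numerical_hessian(f, theta, h=1e-4):
    """Central-difference Hessian of f at theta (theta is a list of floats)."""
    k = len(theta)
    H = [[0.0] * k for _ in range(k)]

    def shifted(i, di, j, dj):
        p = list(theta)
        p[i] += di
        p[j] += dj
        return f(p)

    for i in range(k):
        for j in range(i, k):
            val = (shifted(i, h, j, h) - shifted(i, h, j, -h)
                   - shifted(i, -h, j, h) + shifted(i, -h, j, -h)) / (4.0 * h * h)
            H[i][j] = H[j][i] = val
    return H


def invert(M):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if abs(A[p][c]) < 1e-12:
            raise ValueError("information matrix is numerically singular")
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(n):
            if r != c:
                fac = A[r][c]
                A[r] = [vr - fac * vc for vr, vc in zip(A[r], A[c])]
    return [row[n:] for row in A]


def standard_errors(neg_log_lik, theta_hat, h=1e-4):
    """Square roots of the diagonal of the inverse observed information.

    The Hessian of the NEGATIVE log-likelihood at the MLE is the observed
    information matrix, so no extra sign change is needed here.
    """
    info = numerical_hessian(neg_log_lik, theta_hat, h)
    cov = invert(info)
    return [math.sqrt(cov[i][i]) for i in range(len(theta_hat))]
```

In use, `neg_log_lik` would be a one-argument function of the parameter vector (for example, a closure over the data) and `theta_hat` the maximizer returned by the optimizer.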

To utilize the OU-TR model for data analysis, a new Matlab function called OUTR was developed. An explanation of the function can be found in Appendix G.

3.2 Simulation Study

A simulation study was conducted to examine the properties of our estimation procedure. The natural log of initial health status was simulated using a fixed intercept term along with a N(2, 1) random variable (Norm), truncated below at 0, simulating a continuous measurement on a subject (e.g., height), and a Bernoulli(0.5) random variable (Bern) simulating the presence or absence of an attribute (e.g., exposed or unexposed to a toxin). The log-link function for subject $i$ was

\[
\log(X_{0i}) = \gamma_0 + \gamma_1 \mathrm{Norm}_i + \gamma_2 \mathrm{Bern}_i \tag{3.7}
\]

We used an algorithm proposed by Finch (2004) to generate a sample path of the OU process over a time interval $[0, T]$. Let $R$ be a large integer, let $c_0, c_1, \ldots, c_R$ be independent random variables generated from a normal distribution with mean 0 and variance $\sigma^2/(2\rho)$, and let $X_0$ be as defined above. We define the following recursive relationship:

\[
x_r = \mu + k_R (x_{r-1} - \mu) + \sqrt{1 - k_R^2}\; c_r \tag{3.8}
\]

for $1 \le r \le R$, where $k_R = \exp(-\rho T / R)$. The sequence $x_0, x_1, \ldots, x_R$ is called a first-order autoregressive sequence (a discrete analog of the OU process) with lag-one correlation coefficient $k_R$. We assume that $\mu = 0$, $\rho = 1$, and $\sigma^2 = 2$ to obtain Ricciardi and Sato’s (1988) FHT distribution. The sample path was simulated with a threshold value of 0 and the initial health status ($X_0$) determined by link function (3.1). If the sample path reached the threshold before it completed the time interval, the survival time was recorded as $S = \min\{t = 1, \ldots, R : X_t \le 0 \mid X_0 = c\}$. If the sample path did not reach the threshold in the time interval, the survival time was a right censored observation. After simulating the sample, maximum likelihood estimation was used to estimate the parameters in the OU-TR model.
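The simulation steps above can be sketched in Python as follows. This is an illustrative reading of recursion (3.8), not the author's Matlab code: the values of `T` and `R`, the conversion of the hitting index `r` to the time `r * T / R`, the rejection loop used for truncation, and all function names are assumptions.

```python
import math
import random


def simulate_ou_fht(x0, T=10.0, R=2000, mu=0.0, rho=1.0, sigma2=2.0, rng=random):
    """Simulate one path with recursion (3.8); return (time, event).

    event = 1 if the path reached the threshold 0 within [0, T],
    event = 0 if the observation is right censored at T.
    """
    k = math.exp(-rho * T / R)               # lag-one correlation k_R
    sd_c = math.sqrt(sigma2 / (2.0 * rho))   # sd of the innovations c_r
    scale = math.sqrt(1.0 - k * k)
    x = x0
    for r in range(1, R + 1):
        x = mu + k * (x - mu) + scale * rng.gauss(0.0, sd_c)
        if x <= 0.0:                          # first step at or below the threshold
            return r * T / R, 1
    return T, 0


def simulate_dataset(n, gamma, T=10.0, R=2000, seed=1):
    """Draw n subjects using the link (3.7); rows are (time, event, Norm, Bern)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        norm = rng.gauss(2.0, 1.0)
        while norm < 0.0:                     # N(2, 1) truncated below at 0
            norm = rng.gauss(2.0, 1.0)
        bern = 1 if rng.random() < 0.5 else 0
        x0 = math.exp(gamma[0] + gamma[1] * norm + gamma[2] * bern)
        t, d = simulate_ou_fht(x0, T=T, R=R, rng=rng)
        rows.append((t, d, norm, bern))
    return rows
```

Because the path is only inspected at the `R` grid points, a crossing between grid points can be missed, so simulated hitting times tend to be slightly late; a larger `R` reduces this effect at the cost of run time.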

Matlab Version 7.9 was used to conduct the simulations and the fminsearch procedure was utilized to obtain the maximum likelihood estimates of γ. The fminsearch function in Matlab employs the simplex search algorithm, which is a direct search method that does not use numerical or analytic gradients. In order to get the MLEs, we minimized the negative of the log-likelihood. Since the algorithm can converge to a local minimum, we implemented the method under 10 randomly selected sets of starting values and only saved the estimates that generated the greatest log-likelihood. However, the algorithm rarely converged to a local minimum during the simulation analysis. The sample standard deviations of the estimates (SSE) were recorded and compared to the average of the standard errors of the estimates (SEE) obtained from the inverse of the observed information matrix. Five different simulation scenarios were considered that involved different combinations of parameter values to simulate data sets with differing numbers of events. The goal was to evaluate the model’s estimation performance under various censoring rates and parameter values. For each scenario, we generated 1000 data sets of size 200.

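| Scenario | Avg Events | Statistic | γ0 | γ1 | γ2 |
|---|---|---|---|---|---|
| 1 | 200 | True | 1 | 0.5 | 0.5 |
| 1 | 200 | MLE | 1.0712 | 0.4968 | 0.4996 |
| 1 | 200 | Bias | 7.12% | -0.63% | -0.07% |
| 1 | 200 | SSE | 0.1348 | 0.0543 | 0.1073 |
| 1 | 200 | SEE | 0.1288 | 0.0532 | 0.1011 |
| 1 | 200 | Percent Diff | 4.53% | 1.97% | 5.96% |
| 2 | 199 | True | 3 | 0.25 | 2 |
| 2 | 199 | MLE | 3.0602 | 0.2491 | 2.0062 |
| 2 | 199 | Bias | 2.01% | -0.37% | 0.31% |
| 2 | 199 | SSE | 0.1321 | 0.0547 | 0.1068 |
| 2 | 199 | SEE | 0.1284 | 0.0531 | 0.1011 |
| 2 | 199 | Percent Diff | 2.79% | 2.96% | 5.52% |
| 3 | 143 | True | 7.5 | 0.5 | 0.5 |
| 3 | 143 | MLE | 7.5624 | 0.4991 | 0.5050 |
| 3 | 143 | Bias | 0.83% | -0.17% | 0.99% |
| 3 | 143 | SSE | 0.1333 | 0.0558 | 0.1046 |
| 3 | 143 | SEE | 0.1287 | 0.0532 | 0.1012 |
| 3 | 143 | Percent Diff | 3.54% | 4.69% | 3.30% |
| 4 | 114 | True | 8 | 0.5 | 0.5 |
| 4 | 114 | MLE | 8.0668 | 0.4976 | 0.4977 |
| 4 | 114 | Bias | 0.84% | -0.48% | -0.47% |
| 4 | 114 | SSE | 0.1361 | 0.0571 | 0.1019 |
| 4 | 114 | SEE | 0.1311 | 0.0550 | 0.1020 |
| 4 | 114 | Percent Diff | 3.77% | 3.67% | 0.08% |
| 5 | 63 | True | 8.75 | 0.5 | 0.5 |
| 5 | 63 | MLE | 8.7653 | 0.5002 | 0.4948 |
| 5 | 63 | Bias | 0.75% | 0.04% | -1.04% |
| 5 | 63 | SSE | 0.1494 | 0.0683 | 0.1172 |
| 5 | 63 | SEE | 0.1443 | 0.0646 | 0.1098 |
| 5 | 63 | Percent Diff | 3.48% | 5.69% | 6.52% |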

Table 3.1: Results of Simulation Study Based on 1000 Data Sets of Size 200.

The results of our simulation study are in Table 3.1. Bias was small (< 8%) even under a high censoring rate (Scenario 5). In addition, the SEEs of the estimates obtained via the observed information matrix methods were very close to the SSEs across the differing number of events in the sample. This is a good indication that we have reliable standard error estimates for the model parameters regardless of the number of events. We conclude, through simulation, that the maximum likelihood estimates behave as expected, like minimum variance consistent estimators.

3.3 Application of OU-TR Model to Overall Survival of Patients with Carcinoma of the Oropharynx

To demonstrate the use of our model, we briefly examine a real data set consisting of 192 subjects from a clinical trial of treatment for carcinoma of the oropharynx conducted by the Radiation Therapy Oncology Group in the United States. This data set and description can be found in Kalbfleisch and Prentice (1980). This trial consisted of patients randomly assigned to one of two treatments, radiation therapy alone or radiation therapy in conjunction with chemotherapy. An objective of this original study was to compare the two treatments with respect to patient survival. Covariates considered in this OU analysis included age, sex, treatment, patient physical condition (1 = no disability, 2 = restricted work, 3 = requires assistance with self care and 4 = bed confined), site of the tumor (1 = faucial arch, 2 = tonsillar fossa, 3 = posterior pillar, 4 = pharyngeal tongue and 5 = posterior wall; note that there were no observations in categories 3 and 5), grade of the tumor (1 = well differentiated, 2 = moderately differentiated and 3 = poorly differentiated), T-stage (1 = primary tumor measuring 2 cm or less in largest diameter, 2 = primary tumor measuring 2 cm to 4 cm in largest diameter with minimal infiltration in depth, 3 = primary tumor measuring more than 4 cm, 4 = massive invasive tumor) and N-stage (0 = no clinical evidence of node metastases, 1 = single positive node 3 cm or less in diameter, not fixed, 2 = single positive node more than 3 cm in diameter, not fixed and 3 = multiple positive nodes or fixed positive nodes). Due to low counts in some of the categorical variables, some modifications to the data were made. The T-stage variable was modified by combining categories 1, 2 and 3 into one category labeled “not massive” and category 4 was left as “massive”. The N-stage variable was modified by leaving category 0 as “no clinical evidence of a lymph node metastasis” and categories 1, 2, and 3 were combined into one group called “lymph node involvement”. Finally, the condition variable was recoded to 0 for “no disability” and categories 2, 3, and 4 were combined into one category called “disability” which was coded as 1. Survival time was recorded as years from diagnosis to death or the end of the trial. Three patients were removed from the data set due to missing data, leaving 192 patients for analysis; 139 died and 53 (28%) were censored. Descriptive statistics for the covariates are given in Table 3.2.

Continuous variable:

| Variable | Mean | Std Err | Median | Max | Min |
|---|---|---|---|---|---|
| Age (years) | 60.083 | 10.921 | 60 | 90 | 20 |

Categorical variables:

| Variable | Category | Count (%) |
|---|---|---|
| Grade | Well Differentiated | 49 (25.52%) |
| Grade | Moderately Differentiated | 109 (56.77%) |
| Grade | Poorly Differentiated | 34 (17.71%) |
| Site | Faucial Arch | 65 (33.85%) |
| Site | Tonsillar Fossa | 63 (32.81%) |
| Site | Pharyngeal Tongue | 64 (33.33%) |
| Condition | Disability | 50 (26.04%) |
| Condition | No Disability | 142 (73.96%) |
| T-Stage | Not Massive | 126 (65.63%) |
| T-Stage | Massive | 66 (34.38%) |
| Sex | Male | 147 (76.56%) |
| Sex | Female | 45 (23.44%) |
| Treatment | Radiation | 98 (51.04%) |
| Treatment | Radiation and Chemotherapy | 94 (48.96%) |
| N-Stage | No lymph node involvement | 38 (19.79%) |
| N-Stage | Lymph node involvement | 154 (80.21%) |

Table 3.2: Summary Statistics of Variables Considered in Modeling the Oropharynx Data

The OU-TR model selection procedure was conducted using Matlab Version 7.9. Initially, all variables were checked in separate univariate models using the likelihood ratio test. The variables age (p-value = 0.9774) and sex (p-value = 0.7436) were removed from further consideration. All subsets analysis was performed with main effects only using BIC as the selection criterion. The best BIC model (BIC = 486.107) contained the condition and T-stage covariates while the second best BIC model (BIC = 486.519) contained condition, T-stage and N-stage. To assess fit of the OU-TR model, we compare the estimated survival curves with the Kaplan-Meier curves generated from the survival data. The closer the OU-TR curve is to the Kaplan-Meier curve, the better the fit. Currently, this is the best method available to assess fit of threshold regression models. The goodness-of-fit plots in Figures 3.1, 3.2 and 3.3 are consistent with the BIC and suggest that the model containing condition and T-stage is best. Additionally, we checked for a possible interaction between disability and T-stage. The resulting BIC was 488.28 and the goodness-of-fit plot (Figure 3.4) showed no improvement in fit over the main effects model. Therefore, we selected the main effects model with disability and T-stage as the final model. Note that under this model, the fit was not very good for the patients with no disability whose tumors were not considered massive. This highlights a limitation of the OU-TR model which we describe in Section 3.4.

[Figure 3.1, two panels: “OU Survival curves vs KM Plots for Subjects with a Disability by T-Stage” and “OU Survival curves vs KM Plots for Subjects without a Disability by T-Stage”. Each panel plots Survival Probability against Time to Death (Years). Legend: OU T-Stage: Not Massive; OU T-Stage: Massive; KM T-Stage: Not Massive; KM T-Stage: Massive.]

Figure 3.1: Goodness of Fit of Best BIC Carcinoma of the Oropharynx Model

Information on the parameter estimates is found in Table 3.3. Based on this model, having no disability was associated with better health at the onset of the clinical trial. In fact, those with a disability were estimated to be $100(1 - e^{-0.9807}) = 62\%$ closer to the threshold (death) at baseline, holding T-stage constant. Also, not having a massive tumor was associated with better health at the onset of the trial. Those with a massive tumor were estimated to be $100(1 - e^{-0.5126}) = 40\%$ closer to the threshold (death) at baseline, holding disability constant.

[Figure 3.2, two panels: “GOF Plots for Subjects with a Disability and Tstage: Not Massive by N-Stage” and “GOF Plots for Subjects with a Disability and Tstage: Massive by N-Stage”. Each panel plots Survival Probability against Time to Death (Years). Legend: OU N-Stage: No Metastasis; OU N-Stage: Metastasis; KM N-Stage: No Metastasis; KM N-Stage: Metastasis.]

Figure 3.2: Goodness of Fit of Second Best BIC Carcinoma of the Oropharynx Model for Subjects with a Disability

In Kalbfleisch and Prentice (1980), a proportional hazards model was fit to these data. The significant covariates were sex, condition and T-stage. Being male, having a disability and having a larger tumor had negative effects on the patient’s survival. These results support our findings under our OU-TR model. However, we analyzed the data with respect to the condition and T-stage covariates using the stcox procedure in Stata 10.1 (Stata Corp., College Station, TX) to determine if the proportional hazards assumption was violated. We performed a 1 degree of freedom likelihood ratio test to determine if there was a significant interaction between an indicator variable representing condition and the log of time. The proportional hazards assumption was violated for condition (likelihood ratio test p-value = 0.003). For the T-stage covariate, the proportional hazards assumption was not violated (p-value = 0.116).

[Figure 3.3, two panels: “GOF Plots for Subjects without a Disability and Tstage: Not Massive by N-Stage” and “GOF Plots for Subjects without a Disability and Tstage: Massive by N-Stage”. Each panel plots Survival Probability against Time to Death (Years). Legend: OU N-Stage: No Metastasis; OU N-Stage: Metastasis; KM N-Stage: No Metastasis; KM N-Stage: Metastasis.]

Figure 3.3: Goodness of Fit of Second Best BIC Carcinoma of the Oropharynx Model for Subjects without a Disability

This highlights an advantage of our OU-TR model since it does not require that the proportional hazards assumption be satisfied. Also, our OU-TR model provides a different insight on covariate effects than the Cox model. With the OU-TR model, we can estimate the effect a covariate has on a subject’s health at the start of the observation period and how close the subject is to experiencing the event under study. For the oropharynx study, having no disability was associated with better health at the onset of the clinical trial (those with a disability were estimated to be $100(1 - e^{-0.9807}) = 62\%$ closer to the threshold (death) at baseline, holding T-stage constant). Also, not having a massive tumor was associated with better health at the onset of the trial (those with a massive tumor were $100(1 - e^{-0.5126}) = 40\%$ closer to the threshold (death) at baseline, holding disability constant). The proportional hazards model provides us insight on the hazard ratio between those with a disability and those without a disability while holding T-stage constant and the hazard ratio between those with a massive tumor and those without while holding condition constant.

[Figure 3.4, two panels: “OU Survival curves vs KM Plots for Subjects with a Disability by T-Stage” and “OU Survival curves vs KM Plots for Subjects without a Disability by T-Stage”. Each panel plots Survival Probability against Time to Death (Years). Legend: OU T-Stage: Not Massive (No Int); OU T-Stage: Not Massive (Int); OU T-Stage: Massive (No Int); OU T-Stage: Massive (Int); KM T-Stage: Not Massive; KM T-Stage: Massive.]

Figure 3.4: Comparing Goodness of Fit of Model with Interaction Between Disability Status and Tumor Size (Int) with the best BIC Main Effects Model (No Int)

| Variable | Parameter Estimate | Std Err | P-value |
|---|---|---|---|
| Intercept | 1.1161 | 0.0675 | <0.0001 |
| Disability | -0.9807 | 0.1189 | <0.0001 |
| Massive Tumor | -0.5126 | 0.1099 | <0.0001 |

Table 3.3: Final Model for Death from Carcinoma of the Oropharynx
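To make the interpretation of Table 3.3 concrete, the following Python sketch plugs the reported estimates into the log link (3.1) and the survival function (3.3). The coefficients are taken from Table 3.3; the function names are hypothetical, and the median formula is obtained here by solving $S(t) = 0.5$ in (3.3) for $t$, which is a derivation added for illustration and not a result stated in the source.

```python
import math
from statistics import NormalDist

# Estimates from Table 3.3 (final main effects model)
COEF = {"intercept": 1.1161, "disability": -0.9807, "massive": -0.5126}


def baseline_health(disability, massive):
    """Fitted initial state x0 from the log link; arguments are 0/1 indicators."""
    return math.exp(COEF["intercept"]
                    + COEF["disability"] * disability
                    + COEF["massive"] * massive)


def percent_closer(coef):
    """100 * (1 - e^coef): percent reduction in x0 for a negative coefficient."""
    return 100.0 * (1.0 - math.exp(coef))


def ou_survival(t, x0):
    """Survival function (3.3)."""
    return 2.0 * NormalDist().cdf(x0 / math.sqrt(math.expm1(2.0 * t))) - 1.0


def median_survival(x0):
    """Solve S(t) = 0.5: x0 / sqrt(e^{2t} - 1) = Phi^{-1}(0.75)."""
    z = NormalDist().inv_cdf(0.75)
    return 0.5 * math.log1p((x0 / z) ** 2)


for disability in (0, 1):
    for massive in (0, 1):
        x0 = baseline_health(disability, massive)
        print(disability, massive, round(x0, 3), round(median_survival(x0), 2))

print(round(percent_closer(COEF["disability"])),   # 62, as quoted in the text
      round(percent_closer(COEF["massive"])))      # 40, as quoted in the text
```

The loop prints the fitted initial state and the model-implied median survival time (in years) for each of the four disability by T-stage groups, which is one way to compare the fitted curves with the Kaplan-Meier medians.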

3.4 Discussion

In this chapter, we proposed an Ornstein-Uhlenbeck threshold regression model for survival data. A simulation study was conducted under the model and excellent results were obtained. The maximum likelihood estimates generated had small bias even for data with a high censoring rate. Also, reliable standard error estimates were obtained through use of the observed Fisher information. Thus, our model performs very well and is suitable for use in survival data applications.

In addition, the OU-TR model was fit to the oropharynx data set described previously. Application of our model to these data provided valuable insight on the effect of covariates on patient health at the onset of the clinical trial. However, it also demonstrated a limitation of the methodology. Our results indicated that those patients who were worse off in terms of condition and T-stage of the primary tumor were well represented by the OU-TR model. However, for those patients who tended to be better off in terms of these two variables, the model does not fit very well. In general, we expect the OU-TR model to be a good fit to the survival data when subjects experience the event rather quickly. For example, time to death of high grade cancer patients or the elderly would be a good fit because the health of these subjects is being pulled quickly to a point of unhealthy homeostasis at which point they die. However, when the data suggest that a longer time is needed for subjects to experience the event, the model fits the data poorly. An example is a cancer that progresses slowly, such as prostate cancer.

Chapter 4: The Ornstein-Uhlenbeck Mixture Model

Sometimes in survival analysis, the event of interest is not inevitable. For example, smokers may be prone to dying from lung cancer, whereas nonsmokers may not. Thus, two distinct groups may emerge from the data: those whose lung condition declines until the event occurs, and those whose lung condition does not decline and who never experience the event. This phenomenon is captured using a cure rate model.

One of the most popular types of cure rate models is the mixture model, presented in Berkson and Gage (1952), where it is assumed that a certain fraction $p$ of the population is cured and the other $1 - p$ is not cured. Previous works using this model include Farewell (1982), Kuk and Chen (1992), Taylor (1995), Laska and Meisner (1992), Yamaguchi (1992), and Xiao et al. (2012); a gamma process cure rate model is mentioned in Lee and Whitmore (2006). In Farewell (1982), a general parametric mixture model was proposed to be used for survival data with a significant proportion of long-term survivors. In Farewell’s paper, the cure rate was modeled using both conditional and non-conditional logistic regression. In Kuk and Chen (1992), a mixture model was proposed that combines a logistic function for the probability of the event occurring with a proportional hazards model for the time of event occurrence. The proposed model is a semi-parametric generalization of a parametric model due to Farewell (1982). Taylor (1995) proposed a different semi-parametric generalization of the mixture model of Farewell (1982). The cure rate portion uses a logistic regression model, and a Kaplan-Meier approach is used to estimate the latency distribution, that is, the survival function for those subjects who are not “cured”. In Laska and Meisner (1992), nonparametric generalized maximum likelihood product limit point estimators and corresponding confidence intervals are developed for a cure rate model with random censoring. Yamaguchi (1992) discusses accelerated failure-time regression models with an added regression model for the cure rate. His modeling techniques attempt to estimate the effects of covariates on the acceleration/deceleration of the timing of a given event and the surviving fraction simultaneously.

Using a threshold regression mixture model has several advantages over previous cure rate methods. In threshold regression, we have a parametric model which allows us to determine covariate effects on both the baseline health of the subject and the cure rate. Also, we are not constrained by the proportional hazards assumption required in Cox regression, which highlights the flexibility offered in threshold regression mixture models. In the context of threshold regression, Lee and Whitmore (2006) discuss a gamma process model with a cure rate. The latent health process is defined as a constant set to the starting point of the process when the subject is “cured”, and it follows a gamma process if the subject is not “cured”. The mixing parameter $p$ ($0 \le p \le 1$) helps determine which path the latent health process will follow. For example, a population may be at risk of death from cholera with probability $p$. Thus, a proportion $p$ of the population will advance monotonically toward death from cholera whereas a proportion $(1 - p)$ will not experience death from this disease.

A similar mixture model framework was used by Xiao et al. (2012). Subjects who were at risk of disease or death follow the Wiener process, and a logit link function was used for $p$ to incorporate covariates into the model. They developed a threshold regression model package utilizing the Wiener process (stthreg) that is available in Stata.

Much like the gamma and Wiener process models, we can formulate a mixture model incorporating the OU process. For those subjects that are not susceptible to the event under study, their latent health process is constant at a level equal to their initial health. For those subjects who are susceptible, we can model their latent health process with the mean reverting OU process. The mean reverting nature of the OU process will pull them toward the threshold which we can define as the point of death or disease. Using the OU-TR mixture model makes more biological sense than the gamma process model because the subject’s health is being drawn to an unhealthy homeostasis, defined as death or disease, along a path which is not necessarily monotonic. There may be periods of time where the subject’s health levels off before declining again until they experience the event.

In order to implement our OU-TR mixture model, we must extend our OU-TR model approach, which assumes everyone experiences the event at some time, to allow a cure rate. This chapter is organized as follows. First, we describe our OU-TR mixture model. Then, a simulation study is conducted to evaluate our model estimates in terms of bias and accuracy of the standard error estimates. Finally, this model is applied to a melanoma data set, described in Section 4.3, and the results and limitations of the model are discussed.

4.1 Proposed OU Threshold Regression Mixture Model

Our model assumes two latent groups of subjects and a binary latent variable ($Y$) denoting group membership; $Y = 1$ if the subject is not susceptible to experiencing the event under study (or is cured), $Y = 0$ if the subject will (eventually) experience the event. As in Chapter 3, we let $X_t$ be the latent health status of a subject at time $t$. Given $Y$, we assume

\[
X_t \mid Y = Y X_0 + (1 - Y) X_t^{*} \tag{4.1}
\]

where $X_t^{*}$ follows the OU process defined in Chapter 3 and $X_0 = x_0$. Lee and Whitmore (2006) proposed a related model extension where the gamma process was the underlying stochastic process. In words, this model means that the health of “cured” subjects remains at $x_0$ over the time period under study. Subjects who are not “cured” follow the OU stochastic process defined in Chapter 3 and revert to the process mean at which point the event occurs.

Covariates are introduced into our model in two different places. First, covariates are linked to the cure rate ($p_i$) which is modeled as the logistic function:

\[
p_i = \frac{\exp(\beta_0 + \beta_1 W_{1i} + \cdots + \beta_q W_{qi})}{1 + \exp(\beta_0 + \beta_1 W_{1i} + \cdots + \beta_q W_{qi})} \tag{4.2}
\]

where $W_{1i}, \ldots, W_{qi}$ are $q$ covariates for subject $i$ and $\beta_0, \ldots, \beta_q$ are their corresponding regression coefficients, with $\beta_0$ denoting the intercept, where $\exp(\beta_0)/(1 + \exp(\beta_0))$ is the cure rate when $W_{1i} = W_{2i} = \cdots = W_{qi} = 0$. The modeling of the cure rate with the logistic function is similar to previous cure rate models found in Farewell (1982), Kuk and Chen (1992), and Taylor (1995). As in Chapter 3, the initial health status ($x_{0i}$) is linked to a set of covariates ($Z_{1i}, \ldots, Z_{ki}$) via a log-link function defined as:

\[
\ln(x_{0i}) = \gamma_0 + \gamma_1 Z_{1i} + \cdots + \gamma_k Z_{ki}. \tag{4.3}
\]

The covariates in W1i,...,Wqi and Z1i,...,Zki need not be the same; the choice depends on the application. For example, in a randomized drug trial, we might expect treatment to affect the cure rate, but not the initial health. Having two regression models also allows you to distinguish between two different types of covariate effects; internal and external. Internal effects influence a subject’s health after the observation or study period has started whereas external effects impact the subject’s initial health at the start of the study period. In our model, we can capture external effects through analysis of the regression parameters found in the initial health link function. Internal effects can be determined through analysis of the regression parameters in the logistic function associated with the mixing parameter. For example, in a randomized drug trial, the treatment may cure the subject or prolong their life beyond the end of the study. Thus, the effect of the treatment would be considered internal. On the other hand, if a subject were admitted to the hospital for treatment for a particular condition, their previous living conditions and time spent working a particular job would be external effects since they are no longer experienced while in the hospital.

As discussed in Aalen and Gjessing (2001), internal and external effects are captured by the different models for µ and X0 in a Wiener process model.

As in Chapter 3, subjects may have either uncensored or right censored survival times. When a subject experiences an event, their contribution to the likelihood is

\[
\begin{aligned}
f^{*}(t_i) &= P(Y_i = 0)\, f(t_i) \\
           &= (1 - p_i)\, f(t_i),
\end{aligned} \tag{4.4}
\]

where $f(t_i)$ is the pdf under the OU-TR model in Chapter 3. When a subject does not experience an event during the study, the following could be true: (1) the subject will never experience the event ($Y_i = 1$) or (2) the subject will eventually experience the event ($Y_i = 0$), but did not during the observation period. Thus, the corresponding survival function and contribution to the likelihood is

\[
S_f^{*}(t_i) = p_i + (1 - p_i)\, S_f(t_i), \tag{4.5}
\]

where $S_f(t_i)$ is the survival function under the OU-TR model in Chapter 3. The portion of the survival function $S_f^{*}(t_i)$ that captures subjects in category (1) is $p_i$, and $(1 - p_i) S_f(t_i)$ captures those in category (2). Note that $S_f^{*}(\infty) = p_i > 0$, indicating an improper survival function.

For ease in writing the likelihood, we assume that the first $m$ observations are events and the last $N - m$ are censored. Thus, the likelihood function is given by

\[
L = \prod_{i=1}^{m} \left[ (1 - p_i) \sqrt{\frac{2}{\pi}}\, \frac{x_{0i}\, e^{2t_i}}{(e^{2t_i}-1)^{3/2}} \exp\!\left( -\frac{x_{0i}^2}{2(e^{2t_i}-1)} \right) \right] \prod_{i=m+1}^{N} \left[ p_i + (1 - p_i) \left( 2\Phi\!\left( \frac{x_{0i}}{\sqrt{e^{2t_i}-1}} \right) - 1 \right) \right] \tag{4.6}
\]

and the corresponding log-likelihood is

\[
\log(L) = \sum_{i=1}^{m} \log\!\left[ (1 - p_i) \sqrt{\frac{2}{\pi}}\, \frac{x_{0i}\, e^{2t_i}}{(e^{2t_i}-1)^{3/2}} \exp\!\left( -\frac{x_{0i}^2}{2(e^{2t_i}-1)} \right) \right] + \sum_{i=m+1}^{N} \log\!\left[ p_i + (1 - p_i) \left( 2\Phi\!\left( \frac{x_{0i}}{\sqrt{e^{2t_i}-1}} \right) - 1 \right) \right] \tag{4.7}
\]

Estimating the standard errors proceeds using the observed Fisher information as shown previously in Chapter 3. The expressions for the standard errors are provided in Appendix B.
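The mixture log-likelihood (4.7) can be sketched in Python along the same lines as the Chapter 3 likelihood. This is an illustrative sketch using only the standard library, not the OUTRMM Matlab function; the function names and the list-based data layout are assumptions.

```python
import math
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal CDF


def ou_density(t, x0):
    """OU first-hitting-time density from Chapter 3 (mu = 0, rho = 1, sigma^2 = 2)."""
    d = math.expm1(2.0 * t)
    return (math.sqrt(2.0 / math.pi) * x0 * math.exp(2.0 * t) / d ** 1.5
            * math.exp(-x0 ** 2 / (2.0 * d)))


def ou_survival(t, x0):
    """OU survival function from Chapter 3."""
    return 2.0 * _PHI(x0 / math.sqrt(math.expm1(2.0 * t))) - 1.0


def cure_probability(beta, w):
    """Logistic cure rate (4.2) for covariate row w = [W_1i, ..., W_qi]."""
    eta = beta[0] + sum(b * wj for b, wj in zip(beta[1:], w))
    return 1.0 / (1.0 + math.exp(-eta))


def mixture_neg_log_lik(gamma, beta, times, events, Z, W):
    """Negative of the mixture log-likelihood (4.7)."""
    total = 0.0
    for t, d, z, w in zip(times, events, Z, W):
        x0 = math.exp(gamma[0] + sum(g * zj for g, zj in zip(gamma[1:], z)))  # (4.3)
        p = cure_probability(beta, w)
        if d:   # observed event: (1 - p_i) f(t_i), equation (4.4)
            total += math.log((1.0 - p) * ou_density(t, x0))
        else:   # censored: p_i + (1 - p_i) S_f(t_i), equation (4.5)
            total += math.log(p + (1.0 - p) * ou_survival(t, x0))
    return -total
```

For a censored subject followed for a very long time, the survival term of the susceptible group vanishes and the contribution approaches $\log(p_i)$, which is the improper-survival behavior noted after equation (4.5).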

To utilize the OU-TR mixture model for data analysis, a new Matlab function called OUTRMM was developed. An explanation of the function can be found in Appendix G.

4.2 Simulation Study

A simulation study was conducted to examine the properties of our estimation procedure. The natural log of initial health status was simulated using an intercept term along with a standard normal random variable (Norm) simulating a centered continuous measurement on a subject (e.g., height or weight) and a Bernoulli(0.5) random variable (Bern) simulating the presence or absence of an attribute. The same simulated covariates (Norm and Bern) were used in the cure rate parameter ($p$) to simulate the covariates’ effects on the subject’s susceptibility to experience the event. The log-link function was

\[
\log(X_{0i}) = \gamma_0 + \gamma_1 \mathrm{Norm}_i + \gamma_2 \mathrm{Bern}_i \tag{4.8}
\]

and the cure rate was

\[
p_i = \frac{\exp(\beta_0 + \beta_1 \mathrm{Norm}_i + \beta_2 \mathrm{Bern}_i)}{1 + \exp(\beta_0 + \beta_1 \mathrm{Norm}_i + \beta_2 \mathrm{Bern}_i)} \tag{4.9}
\]

To simulate the survival data, we first drew $Y_i \sim \mathrm{Bern}(p_i)$. If a 1 was generated, this subject was not prone to experiencing the event, and hence their health status remained at $x_{0i}$. If a 0 was generated, the subject was prone to experiencing the event and the path of their health status was simulated from an OU process initiated at $x_{0i}$, as described in Chapter 3, Section 2. After simulating the sample, maximum likelihood estimation was used to estimate the parameters in the OU-TR mixture model.
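The two-stage draw described above (cure indicator first, OU path only for susceptible subjects) can be sketched as follows. As with the Chapter 3 sketch, this is an illustration and not the author's Matlab code: `T`, `R`, the conversion of the hitting index to a time, the choice to censor cured subjects at `T`, and the function name are assumptions.

```python
import math
import random


def simulate_mixture_subject(x0, p, T=10.0, R=2000, rng=random):
    """Return (time, event, cured) for one subject under the cure-rate mixture.

    Assumes mu = 0, rho = 1, sigma^2 = 2 and a threshold at 0, as in Chapter 3.
    """
    if rng.random() < p:              # Y_i = 1: cured, health stays at x0
        return T, 0, 1                # never hits the threshold, censored at T
    k = math.exp(-T / R)              # k_R = exp(-rho * T / R) with rho = 1
    scale = math.sqrt(1.0 - k * k)
    x = x0
    for r in range(1, R + 1):
        # c_r ~ N(0, sigma^2 / (2 rho)) = N(0, 1) under the assumed parameters
        x = k * x + scale * rng.gauss(0.0, 1.0)
        if x <= 0.0:
            return r * T / R, 1, 0    # event at the first grid point at or below 0
    return T, 0, 0                    # susceptible but censored at T
```

In a data analysis the third element (the cure indicator) would be unobserved; it is returned here only so that simulated data sets can be checked against the intended cure rate.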

Matlab Version 7.9 was used to conduct the simulations and the fminsearch procedure was utilized to obtain the maximum likelihood estimates of γ and β as described in Chapter 3, Section 2. Note that the fminsearch function’s simplex search algorithm rarely converged to a local minimum during the simulation analysis. Five different simulation scenarios were considered that involved different combinations of parameter values to simulate data sets with differing numbers of events. The goal was to evaluate the model’s estimation performance under various censoring rates and parameter values. For each scenario, we generated 1000 data sets of size 300.

The results of our simulation study are in Table 4.1. Bias was small even under heavily censored data (Scenario 5), with bias being the largest for the mixing parameter coefficients (β0, β1, and β2). In addition, the SEEs (see Chapter 3, Section 2 for definition) obtained via the observed information matrix were close to the SSEs (≤ 10% different) across the differing number of events in the sample. We conclude, through simulation, that the maximum likelihood estimates behave like minimum variance consistent estimators.

| Scenario | Avg Events | Statistic | γ0 | γ1 | γ2 | β0 | β1 | β2 |
|---|---|---|---|---|---|---|---|---|
| 1 | 276 | Truth | 5 | 0.5 | 0.5 | -3 | 0.5 | 0.5 |
| 1 | 276 | MLE | 5.0592 | 0.4976 | 0.4989 | -3.0589 | 0.5092 | 0.5321 |
| 1 | 276 | Bias | 1.17% | -0.47% | -0.22% | 1.92% | 1.81% | 6.03% |
| 1 | 276 | SSE | 0.0616 | 0.0460 | 0.0898 | 0.4652 | 0.2716 | 0.5655 |
| 1 | 276 | SEE | 0.0598 | 0.0437 | 0.0855 | 0.4333 | 0.2582 | 0.5328 |
| 1 | 276 | Percent Diff | 3.06% | 5.07% | 4.80% | 7.09% | 5.06% | 5.95% |
| 2 | 198 | Truth | 5 | 1 | 0.5 | -1 | 0.5 | 0.5 |
| 2 | 198 | MLE | 5.0632 | 0.9987 | 0.5009 | -1.0099 | 0.5114 | 0.5061 |
| 2 | 198 | Bias | 1.25% | -0.13% | 0.19% | 0.98% | 2.23% | 1.20% |
| 2 | 198 | SSE | 0.0708 | 0.0582 | 0.1084 | 0.1946 | 0.1392 | 0.2574 |
| 2 | 198 | SEE | 0.0690 | 0.0530 | 0.1017 | 0.1930 | 0.1390 | 0.2613 |
| 2 | 198 | Percent Diff | 2.56% | 9.30% | 6.34% | 0.83% | 0.13% | -1.53% |
| 3 | 176 | Truth | 5 | 2 | 1.5 | -1 | 1 | 0.9 |
| 3 | 176 | MLE | 5.0635 | 1.9932 | 1.4961 | -0.9990 | 1.0329 | 0.9108 |
| 3 | 176 | Bias | 1.25% | -0.34% | -0.26% | -0.10% | 3.18% | 1.19% |
| 3 | 176 | SSE | 0.0758 | 0.0629 | 0.1100 | 0.2138 | 0.1902 | 0.2914 |
| 3 | 176 | SEE | 0.0722 | 0.0606 | 0.1088 | 0.2067 | 0.1808 | 0.2868 |
| 3 | 176 | Percent Diff | 4.96% | 3.78% | 1.09% | 3.39% | 5.05% | 1.59% |
| 4 | 114 | Truth | 5 | 1 | 0.5 | 0.3 | 0.75 | 0.5 |
| 4 | 114 | MLE | 5.0648 | 0.9979 | 0.5021 | 0.3045 | 0.7621 | 0.5148 |
| 4 | 114 | Bias | 1.28% | -0.22% | 0.41% | 1.47% | 1.59% | 2.87% |
| 4 | 114 | SSE | 0.0979 | 0.0768 | 0.1423 | 0.1811 | 0.1515 | 0.2643 |
| 4 | 114 | SEE | 0.0933 | 0.0738 | 0.1370 | 0.1781 | 0.1453 | 0.2589 |
| 4 | 114 | Percent Diff | 4.88% | 4.02% | 3.81% | 1.65% | 4.20% | 2.08% |
| 5 | 70 | Truth | 5 | 1 | 0.5 | 1 | 0.5 | 0.5 |
| 5 | 70 | MLE | 5.0756 | 1.0021 | 0.5099 | 1.0255 | 0.5167 | 0.5017 |
| 5 | 70 | Bias | 1.49% | 0.21% | 1.94% | 2.49% | 3.23% | 0.33% |
| 5 | 70 | SSE | 0.1219 | 0.0984 | 0.1871 | 0.2002 | 0.1550 | 0.2938 |
| 5 | 70 | SEE | 0.1173 | 0.0933 | 0.1777 | 0.1929 | 0.1509 | 0.2880 |
| 5 | 70 | Percent Diff | 3.87% | 5.32% | 5.16% | 3.70% | 2.68% | 1.97% |

Table 4.1: Results of Mixture Model Simulation Study Based on 1000 Data Sets of Size 300.

4.3 Application of OU-TR Mixture Model to Time to Relapse Data from Patients with Melanoma

To demonstrate the use of the OU-TR mixture model in a data application, we apply it to time to relapse data from the E1690 clinical trial analyzed in Kirkwood et al. (2000). This data set or a variation thereof was also used in a paper by Chen et

al. (2002) to demonstrate a Bayesian cure rate model. The outlook for subjects with

this disease depends largely on the number of lymph nodes to which the disease has

spread and thickness of the primary tumor known as the Breslow score (Kirkwood

et al. 2000). Those patients with more severe diagnoses of these two factors are at

high risk of relapse after definitive surgery (Kirkwood et al. 2000). This clinical trial

randomized patients into two groups: one that received treatment with Interferon

Alfa-2b and one that did not receive treatment. For the purpose of our research,

we consider only those 315 subjects who did not receive treatment. This allows us

to apply the OU-TR mixture model to study the natural progression of melanoma

after entrance into the study: a certain proportion of subjects (p) is likely to be cured by surgery while the remaining subjects are pulled toward cancer recurrence. The research in Kirkwood et al. (2000) and Chen et al. (2002) focused on differences in survival between the treated and control groups, and thus their analyses are not comparable to ours.

Explanatory variables in this study include the subject’s age at the start of the clinical trial, sex, Breslow score, and nodal category. The nodal category is broken down into four groups: N0 denotes no spread to nearby lymph nodes; N1 indicates spread to one nearby lymph node; N2 denotes spread to two or three nearby lymph nodes, or spread to nearby skin or toward a lymph node area without actually reaching the lymph nodes; and N3 indicates spread to four or more lymph nodes, spread to lymph nodes which are clumped together, or spread to nearby skin or toward a lymph node area and into the lymph node(s) (Amer. Cancer Soc. 2011).

Summary statistics for these covariates are given in Table 4.2.

Continuous Variables
Variable         Mean      Std Err   Median    Max       Min
Breslow Score    3.971     3.604     3         35        0.08
Age              48.621    13.142    48.95     78.053    18.680

Categorical Variables
Nodal Cat        N0: 68 (21.59%)   N1: 106 (33.65%)   N2: 75 (23.81%)   N3: 66 (20.95%)
Sex              M: 198 (62.86%)   F: 117 (37.14%)

Table 4.2: Summary Statistics of Variables Considered in Modeling the Melanoma Data

Two different times were recorded during the study: time to relapse and time to death, both measured from the date of study entry. We focused on the former. Patients were all required to have surgery within two months (Kirkwood et al., 2000); thus, date of surgery and date of study entry were not the same but were relatively close considering the lengthy follow-up in this study. Each patient who experienced relapse during the trial contributed an observed survival time. Each patient who remained cancer free past the end of the trial contributed a censored survival time.

The original data set was modified for these analyses by dropping 37 subjects with missing covariate data. Thus, a total of 315 subjects were included in this analysis, of whom 205 experienced the event and 110 (35%) were censored.

The OU-TR mixture model selection procedure was conducted using Matlab

Version 7.9. Our cure rate modeling approach involved two stages. First we fit the OU-TR model (Chapter 3) to the data and used an all subsets approach to select the model with the smallest BIC. The set of covariates from the best OU-TR model was then considered in the OU-TR mixture model. All subsets variable selection was then performed in which each covariate was considered in the model for initial health status, cure rate, both initial health status and cure rate or neither. The two models

with the smallest BIC values were compared using goodness-of-fit plots (described in

Chapter 3, Section 3) to determine the best fitting model.

                      Nodal Category                          Breslow   Log
Intercept  Age        1         2         3         Male      Score     Likelihood  BIC
0.0170     -          -         -         -         -         -         -763.4140   1532.5806
0.0172     0.0123     -         -         -         -         -         -763.3578   1538.2207
0.0684     -          -0.5921   -0.7986   -0.9141   -         -         -736.5809   1496.1721
-0.0240    -          -         -         -         0.0669    -         -763.0833   1537.6717
0.0211     -          -         -         -         -         0.0774    -762.1569   1535.8190
0.7229     -0.0624    -0.6156   -0.8663   -0.9624   -         -         -735.2907   1499.3443
-0.0221    0.0065     -         -         -         0.0639    -         -763.0682   1543.3942
0.0211     -0.0027    -         -         -         -         0.0783    -762.1544   1541.5665
0.6532     -          -0.5843   -0.7875   -0.9132   0.0416    -         -736.4601   1501.6832
0.7786     -          -0.7015   -0.9156   -1.0347   -         -0.1009   -735.2200   1499.2028
-0.0142    -          -         -         -         0.0573    0.0747    -761.9159   1541.0895
0.6748     -0.0697    -0.6089   -0.8562   -0.9677   0.0726    -         -734.9342   1504.3838
0.7885     -0.0482    -0.6987   -0.9431   -1.0438   -         -0.0786   -734.5229   1503.5613
-0.0161    -0.0077    -         -         -         0.0604    0.0770    -761.8958   1546.8018
0.7434     -          -0.6948   -0.9044   -1.0364   0.0496    -0.1029   -735.0489   1504.6132
0.7400     -0.0558    -0.6923   -0.9328   -1.0495   0.0737    -0.0788   -734.1589   1508.5859

Table 4.3: Stage 1 of OU-TR Mixture Model Building for Melanoma Data With Relapse as Event

The results from the first stage of our analysis are found in Table 4.3. The best

OU-TR model contained only the nodal category. Therefore, in all subsets model selection, we compared the following: nodal category in initial health with only an intercept in the mixing parameter (Model 1), intercept only in initial health with nodal category in the mixing parameter (Model 2) and nodal category in both initial health and the mixing parameter (Model 3). The results are given in Table 4.4. Model

1 is the best in terms of BIC followed by Model 3. Figures 4.1 and 4.2 demonstrate the goodness-of-fit of these two models. Based on the goodness-of-fit plots, the second best BIC model (Model 3), with nodal category in both the initial health status and the mixing parameter, more closely resembles the Kaplan-Meier curves and thus provides the better fit to the melanoma data. Therefore, we selected this as the final model. Information on the parameter estimates for our final model is found in Table 4.5.

                    Model 1     Model 2     Model 3
Initial Health
  Intercept         0.3692      -0.1832     0.3601
  Node Cat 1        -0.5127     -           -0.5058
  Node Cat 2        -0.6274     -           -0.6148
  Node Cat 3        -0.7073     -           -0.6971
Mixing Parameter
  Intercept         -0.7124     -0.0613     -0.0845
  Node Cat 1        -           -0.4254     -0.4040
  Node Cat 2        -           -1.0589     -1.0298
  Node Cat 3        -           -1.3722     -1.3451
Log Like.           -404.4910   -405.9336   -396.7977
BIC                 837.7449    840.6300    839.6160

Table 4.4: Stage 2 of OU-TR Mixture Model Building for Melanoma Data With Relapse as Event
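The BIC values reported for the candidate models are consistent with the usual definition BIC = −2ℓ + k log(n), where ℓ is the maximized log likelihood, k is the number of estimated coefficients, and n = 315 is the number of subjects. The following minimal sketch assumes that definition (it is not restated in this section) and uses hypothetical helper names; it recomputes the Stage 2 BIC values from the tabulated log likelihoods and ranks the three models.

```python
import math

def bic(log_lik: float, n_params: int, n_obs: int) -> float:
    """Schwarz criterion, assuming BIC = -2*loglik + k*log(n)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)

N_SUBJECTS = 315  # melanoma subjects included in the analysis

# (log likelihood, number of estimated coefficients) taken from Table 4.4
stage2_models = {
    "Model 1": (-404.4910, 5),  # 4 initial-health terms + mixing intercept
    "Model 2": (-405.9336, 5),  # initial-health intercept + 4 mixing terms
    "Model 3": (-396.7977, 8),  # 4 initial-health terms + 4 mixing terms
}

stage2_bic = {name: bic(ll, k, N_SUBJECTS)
              for name, (ll, k) in stage2_models.items()}

# Smaller BIC is better, so sort in ascending order.
ranking = sorted(stage2_bic, key=stage2_bic.get)

for name in ranking:
    print(f"{name}: BIC = {stage2_bic[name]:.4f}")
```

Because the penalty grows with k, Model 3 needs a noticeably larger log likelihood than Model 1 to achieve a comparable BIC, which is why the two models end up close despite the difference in fit.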

Based on the final model results, as nodal category increases in severity (category 0 being the least severe and category 3 the most severe), there is a decrease in health immediately following entry into the study, which, as stated earlier, was close to the time of surgery. Patients in nodal category 1 were $100(1 - e^{-0.5058}) = 40\%$ closer to tumor recurrence than those in nodal category 0 at the beginning of the study. Also, nodal category 2 patients were 46% and nodal category 3 patients were 50% closer to tumor recurrence at the beginning of the study than those in nodal category 0. Likewise, as nodal category increases in severity, the probability that surgery cures the subject decreases. For those in nodal category 0 (no spread of cancer to lymph nodes), there is a 48% chance that the patient will remain cancer free following surgery. For those in nodal category 1, the cure rate decreases to 0.38. The remaining cure rates for nodal categories 2 and 3 are 0.25 and 0.19, respectively. This is biologically plausible since the higher the nodal category, the worse off the patient becomes in terms of survival. Overall, the cure rate model provided a good fit to the melanoma data. The estimated OU survival curves had a tendency to slightly underestimate survival for nodal categories 0 through 2. However, the estimated survival curve and Kaplan-Meier curve for nodal category 3 agree almost exactly.

[Figure 4.1, two panels: "OU Plots Versus Kaplan-Meier Plots for Node 0" and "OU Plots Versus Kaplan-Meier Plots for Node 1". Curves: OU Model 1, OU Model 3, KM Plot. Vertical axis: Relapse Free Probability. Horizontal axis: Time to Relapse (Years).]

Figure 4.1: Goodness of Fit of Best and Second Best Melanoma Models (in terms of BIC) for Nodal Categories 0 and 1

[Figure 4.2, two panels: "OU Plots Versus Kaplan-Meier Plots for Node 2" and "OU Plots Versus Kaplan-Meier Plots for Node 3". Curves: OU Model 1, OU Model 3, KM Plot. Vertical axis: Relapse Free Probability. Horizontal axis: Time to Relapse (Years).]

Figure 4.2: Goodness of Fit of Best and Second Best Melanoma Models (in terms of BIC) for Nodal Categories 2 and 3

In addition to the model building analysis, we checked the proportional hazards assumption for the nodal category variable under the Cox model. The data were analyzed using the stcox procedure in Stata 10.1 (Stata Corp., College Station, TX). We performed a 3 degree of freedom likelihood ratio test to determine if there was a significant interaction between indicator variables representing nodal category and the log of time. There was significant evidence that the proportional hazards assumption was violated (p-value of likelihood ratio test was 0.0296). This highlights an advantage of our OU-TR mixture model since it does not require that the proportional hazards assumption be satisfied. Also, our OU-TR mixture model provides a different insight on covariate effects than a mixture model in which the Cox model is used. With our model, we can estimate the effect a covariate has on a subject’s health at the start of the observation period and how close the subject is to experiencing the event under study. For the melanoma study, as nodal category increases in severity (category zero being the least severe while category 3 is the most severe), there is a decrease in health immediately following entry into the study. The proportional hazards model, in contrast, provides insight on the hazard ratio between those in different nodal categories.

Variable          Parameter Estimate   Std Err   P-value
X0:Intercept      0.3601               0.1201    0.0027
X0:Nodal Cat 1    -0.5058              0.1487    0.0007
X0:Nodal Cat 2    -0.6148              0.1530    <0.0001
X0:Nodal Cat 3    -0.6971              0.1544    <0.0001
P:Intercept       -0.0845              0.2506    0.7361
P:Nodal Cat 1     -0.4040              0.3240    0.2124
P:Nodal Cat 2     -1.0298              0.3763    0.0062
P:Nodal Cat 3     -1.3451              0.4029    0.0008

Table 4.5: Final Model for Relapse from Melanoma
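The interpretations of the final model can be recomputed from the coefficients in Table 4.5. The sketch below is illustrative only: it assumes a log link for initial health, so that a coefficient γ corresponds to a 100(1 − e^γ)% reduction in initial distance from the threshold relative to nodal category 0, and a logit link for the cure probability p. The logit link is an assumption here, since the link function is defined outside this section, and the function names are hypothetical.

```python
import math

# Final-model coefficients from Table 4.5 (nodal category 0 is the reference).
x0_coef = {1: -0.5058, 2: -0.6148, 3: -0.6971}
p_intercept = -0.0845
p_coef = {0: 0.0, 1: -0.4040, 2: -1.0298, 3: -1.3451}

def percent_closer(gamma: float) -> float:
    """Percent reduction in initial health versus the reference (log link)."""
    return 100.0 * (1.0 - math.exp(gamma))

def cure_probability(eta: float) -> float:
    """Inverse logit of the linear predictor (assumed link for p)."""
    return 1.0 / (1.0 + math.exp(-eta))

closer = {cat: percent_closer(g) for cat, g in x0_coef.items()}
cure = {cat: cure_probability(p_intercept + b) for cat, b in p_coef.items()}

for cat in sorted(cure):
    print(f"nodal category {cat}: cure probability = {cure[cat]:.2f}")
for cat in sorted(closer):
    print(f"nodal category {cat}: {closer[cat]:.0f}% closer to recurrence")
```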

4.4 Discussion

In this chapter, we proposed an Ornstein-Uhlenbeck threshold regression mixture model for survival data with a cure rate. Similar to the OU-TR model presented in Chapter 3, a thorough simulation study was conducted and excellent results were obtained. The MLEs generated had small bias even for data with a high censoring rate. Also, reliable standard error estimates were obtained through use of the observed

Fisher information.

While demonstrating the OU-TR mixture model using the melanoma data, we observed close agreement between the Kaplan-Meier curve and the estimated OU survival curve for each nodal category. This suggests the model fit quite well for this data application. More generally, we would expect the OU-TR mixture model to fit cure rate data well when those who will experience the event approach it rather quickly, since this model allows for two different survival experiences: those who are “cured” with probability p and those who are not with probability 1 − p. For those who are not cured, the OU-TR model can capture their health decline to the event under study due to its mean-reverting property. For example, consider a scenario where patients had surgery to remove a malignant tumor. If the surgery completely removed all of the cancer cells, eliminating the possibility of regrowth, those subjects would fall into the “cured” category with probability p. Those whose tumor returned after surgery and whose health rapidly deteriorated would be pulled toward the threshold representing death from cancer with probability 1 − p. Thus, we would expect the OU-TR mixture model to fit these survival data quite well.

We do not expect all survival data to behave this way since there may not always be a cure rate present. Even if a cure rate is present, the OU-TR mixture model may not fit well if the subjects who will experience the event do not approach the threshold quickly. We have to be careful not to assume there is a cure rate unless there is strong scientific evidence suggesting subjects belong to two or more distinct populations

(Farewell, 1982). Even though we are not dealing with distinct populations in the melanoma study, there are notable distinctions between the health conditions of those in the different nodal categories due to significant increases in severity of the disease associated with increasing nodal categories.

We can gain valuable clinical insight from the estimated regression coefficients for p and X0. These two regression models allow the researcher to distinguish between internal and external covariate effects. Internal effects influence a subject’s health after the observation or study period has started whereas external effects impact the subject’s initial health at the start of the study period. In our melanoma data example, nodal category was deemed a significant covariate in both the initial health and the mixing parameter and thus had both internal and external effects on the subject’s health. The more severe the nodal category, the worse their health was at the start of the study (external). Also, the more severe the nodal category, the less chance the subject was cured from surgery (internal). The ability to distinguish between these two types of effects is an advantage of our approach over previous cure rate models.

Chapter 5: The Ornstein-Uhlenbeck Random Effects Model for Survival Data with Unmeasured Covariates

An assumption common in threshold regression is that individuals with the same set of covariates have the same expected sample path. From a reliability standpoint, where machines are under study, this assumption may be reasonable. However, in biological studies, this assumption may not hold due to unexplained heterogeneity in unobserved covariates which may have an effect on the parameters associated with the latent stochastic process. To account for these unmeasured covariates, a random effects (or frailty) model can be used. Failure to account for these covariates can lead to inaccurate estimates of survival (Pennell et al., 2010).

A few authors have considered threshold regression models with random effects.

One model described in Aalen and Gjessing (2001) incorporates the random effect in the drift component of the Wiener process. Since it is natural for the health of some individuals to move slower or faster toward the threshold than others, it is biologically plausible to make this adjustment. For example, there may be some subjects who are able to fight off disease better than others and thus their movement

(drift) toward the threshold is much slower. In the same paper, Aalen and Gjessing also considered a gamma distributed initial state for the Wiener process model.

Whitmore (1986) used a normal-gamma mixture of inverse Gaussian distributions

to account for variation in the Wiener process parameters from one observation to another. Whitmore provided scenarios where a random effects model was necessary and others where it was unnecessary. For instance, repair times for an airborne communications receiver did not warrant the use of the normal-gamma mixture.

However, he found that his normal-gamma mixture model was needed to model reaction time data of a subject in a psychological experiment and hospital stay data of schizophrenic patients. Another example concerning frailty models in the threshold regression setting was provided by Saebo and others (2005). They utilized random effects in a Wiener process model taking heterogeneity of dairy cows into account when modeling susceptibility to diseases varying across different sires. This random effect entered into the model via the drift parameter.

Recently, Pennell et al. (2010) proposed a Bayesian methodology for fitting TR models with random drifts and initial states. Results from a simulation study showed that when random effects were not used, several survival estimates varied from the true value by 0.2 or more. When random effects were incorporated into the model, the estimates differed from the true values by no more than 0.11.

This demonstrates the importance of accounting for between-subject heterogeneity in threshold regression.

In Chapter 3, we proposed an Ornstein-Uhlenbeck threshold regression model for survival data with initial state dependent on covariates. This model provided excellent results in simulations and, for the most part, reasonably fit overall survival data of oropharynx cancer patients. However, we may be able to improve the fit of this model and provide more accurate regression parameter estimates by accounting for the unexplained heterogeneity between subjects. For instance, in the oropharynx

cancer study, genotype (which was not examined in the study) may differ substantially between subjects and may have an influence on their overall survival from this disease.

In this chapter, we extend the OU-TR model presented in Chapter 3 to include random effects. We incorporate the random effect into the initial health status to capture the between-subject heterogeneity that may be present in the data. We define the distribution of the square of the random effect to be gamma(ψ,ψ). Since this distribution forms a conjugate pair with the conditional pdf, we are able to integrate out this random effect and provide a new survival distribution. Simulation studies are presented which demonstrate the numerical accuracy of parameter estimates and their standard errors and demonstrate the consequences of ignoring the random effects in the analysis. This random effects model is then applied to the oropharynx data used in Chapter 3.

We next present an OU-TR random effects mixture model for survival data with a cure rate. In conjunction with accounting for unexplained heterogeneity between subjects, this model allows us to analyze survival data that contain two distinct groups of subjects: one that is susceptible to the event under study and another that is not susceptible. As with the OU-TR random effects model, we evaluate our method and the consequences of ignoring random effects in a simulation study. Finally, the OU-TR random effects mixture model is applied to the melanoma data considered in

Chapter 4. The chapter concludes with a discussion of the results and strengths and limitations of our models.

5.1 OU-TR Random Effects Model

5.1.1 Proposed Model

In Chapter 3, we presented an OU-TR model with initial state dependent on

covariates. This model performed well in simulations and adequately when we applied

the model to the carcinoma of the oropharynx data. However, if there is unexplained,

between-subject heterogeneity present in the data, the OU-TR model will not adequately

account for it. As a result, we may obtain biased regression coefficient estimates from

the OU-TR model, incorrect standard errors, and poor fit. To address these issues,

we extend the OU-TR model by adding random effects to account for unmeasured

covariates in the initial health status. The initial health status for subject i is modeled as follows:

$$\log(X_{0i}^{*}) = \log(\lambda_i) + \gamma_0 + \gamma_1 Z_{1i} + \gamma_2 Z_{2i} + \cdots + \gamma_k Z_{ki} = \log(\lambda_i) + \log(X_{0i}). \tag{5.1}$$

The subject-specific random effect is denoted as $\lambda_i$; $Z_{1i},\ldots,Z_{ki}$ are k covariates for subject i; and $\gamma_0,\ldots,\gamma_k$ are their corresponding regression coefficients, with $\gamma_0$ denoting the intercept, so that $\log(X_{0i}^{*}) = \log(\lambda_i) + \gamma_0$ when $Z_{1i} = \cdots = Z_{ki} = 0$. Substituting this new initial health status, $X_{0i}^{*}$, into Equation (2.10), we get

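$$g(t_i \mid \lambda_i) = \sqrt{\frac{2}{\pi}}\, \lambda_i X_{0i}\, \frac{e^{2t_i}}{(e^{2t_i}-1)^{3/2}} \exp\left\{-\frac{(\lambda_i X_{0i})^2}{2(e^{2t_i}-1)}\right\}, \tag{5.2}$$

the pdf of the event time given $\lambda_i$. We let $\phi_i = \lambda_i^2 \sim \text{gamma}(\psi,\psi)$ be defined as

$$f(\phi_i) = \frac{\psi^{\psi}}{\Gamma(\psi)}\, \phi_i^{\psi-1} e^{-\psi\phi_i}$$

where $\phi_i \in (0,\infty)$ and $\psi > 0$. We selected $\phi_i = \lambda_i^2$ to be distributed as gamma(ψ,ψ) since it is a conjugate pair with the conditional pdf in Equation (5.2); we can integrate over the space of the random effects and end up with a closed form for the marginal pdf. We note that the expected value and variance of the subject specific random effect $\lambda_i$ are

$$E(\lambda_i) = \frac{\Gamma(\psi+1/2)}{\Gamma(\psi)\sqrt{\psi}}, \qquad VAR(\lambda_i) = 1 - \left[\frac{\Gamma(\psi+1/2)}{\Gamma(\psi)\sqrt{\psi}}\right]^2. \tag{5.3}$$

We also note that

$$E[\log(\lambda)] = \frac{\psi^{\psi}}{2\Gamma(\psi)} \int_0^{\infty} \log(\phi)\, \phi^{\psi-1} e^{-\phi\psi}\, d\phi. \tag{5.4}$$

Equation (5.2) can be re-written as

$$g(t_i \mid \phi_i) = \sqrt{\frac{2}{\pi}}\, \sqrt{\phi_i}\, X_{0i}\, \frac{e^{2t_i}}{(e^{2t_i}-1)^{3/2}} \exp\left\{-\frac{\phi_i X_{0i}^2}{2(e^{2t_i}-1)}\right\}. \tag{5.5}$$

Integrating $\phi_i$ out of the joint distribution, $f(t,\phi_i)$, we get the marginal pdf

$$g(t_i) = \sqrt{\frac{2}{\pi}}\, \frac{X_{0i}\, e^{2t_i}\, \psi^{\psi}\, \Gamma(\psi+1/2)}{(e^{2t_i}-1)^{3/2}\, \Gamma(\psi) \left[\frac{X_{0i}^2}{2(e^{2t_i}-1)} + \psi\right]^{\psi+1/2}} \tag{5.6}$$

and cdf

$$G(t_i) = \sqrt{\frac{2}{\pi}}\, \frac{X_{0i}\, \Gamma(\psi+1/2)\, \psi^{\psi}}{\Gamma(\psi)} \int_0^{t_i} \frac{e^{2z}}{(e^{2z}-1)^{3/2} \left[\frac{X_{0i}^2}{2(e^{2z}-1)} + \psi\right]^{\psi+1/2}}\, dz. \tag{5.7}$$

More details regarding the derivations involved can be found in Appendix C. For ease in writing the likelihood, we assume that the first m observations are events and the last N − m are censored. Thus, the likelihood function is given by

$$L = \prod_{i=1}^{m} \left\{ \sqrt{\frac{2}{\pi}}\, \frac{X_{0i}\, e^{2t_i}\, \psi^{\psi}\, \Gamma(\psi+1/2)}{(e^{2t_i}-1)^{3/2}\, \Gamma(\psi) \left[\frac{X_{0i}^2}{2(e^{2t_i}-1)} + \psi\right]^{\psi+1/2}} \right\} \times \prod_{i=m+1}^{N} \left\{ 1 - \sqrt{\frac{2}{\pi}}\, \frac{X_{0i}\, \Gamma(\psi+1/2)\, \psi^{\psi}}{\Gamma(\psi)} \int_0^{t_i} \frac{e^{2z}}{(e^{2z}-1)^{3/2} \left[\frac{X_{0i}^2}{2(e^{2z}-1)} + \psi\right]^{\psi+1/2}}\, dz \right\}.$$

Unfortunately, this likelihood is difficult to maximize with respect to the regression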

coefficients and random effects precision parameter (ψ) due to the complicated form

coupled with the necessary use of numerical integration. A much simpler form of the

likelihood may be derived through a change of variables:

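$$L = \prod_{i=1}^{m} \left[ \frac{2u_i \left( \frac{2\psi u_i^2}{X_{0i}^2} + 1 \right)}{B(\tfrac{1}{2},\psi)\,(u_i^2+1)^{\psi+\frac{1}{2}}} \right] \prod_{i=m+1}^{N} \left[ \frac{B_{x_i}(\tfrac{1}{2},\psi)}{B(\tfrac{1}{2},\psi)} \right] \tag{5.8}$$

where $u_i = X_{0i}/\sqrt{2\psi(e^{2t_i}-1)}$, $B(a,b)$ is the beta function, and $B_{x_i}(a,b)$ is the incomplete beta function with $x_i = \sin^2[\tan^{-1}(u_i)]$. The details of the derivations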

involved are found in Appendix D. Standard errors were estimated using the observed

Fisher information as in Chapter 3. The results of the mathematical derivations of

these approximate variances are found in Appendix E.
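To make the structure of the likelihood in Equation (5.8) concrete, the following minimal sketch evaluates its logarithm in plain Python. It is not the OUTRREM Matlab function; all function names are hypothetical. Because the Python standard library has no incomplete beta function, a standard continued-fraction evaluation (modified Lentz method) is swapped in for it, and the marginal density of Equation (5.6) is included only as a cross-check on the event term. The argument x0 holds the covariate-based initial states $X_{0i}$, assumed to have been computed beforehand from the regression coefficients.

```python
import math

def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd steps of the recurrence share the same update.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta I_x(a, b) = B_x(a, b) / B(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def ou_re_loglik(times, events, x0, psi):
    """Log of Equation (5.8); events[i] is 1 for an event, 0 if censored."""
    log_beta = math.lgamma(0.5) + math.lgamma(psi) - math.lgamma(psi + 0.5)
    total = 0.0
    for t, event, x in zip(times, events, x0):
        u = x / math.sqrt(2.0 * psi * math.expm1(2.0 * t))
        if event:   # density term
            total += (math.log(2.0 * u)
                      + math.log(2.0 * psi * u * u / (x * x) + 1.0)
                      - log_beta - (psi + 0.5) * math.log1p(u * u))
        else:       # survival term B_x(1/2, psi) / B(1/2, psi)
            total += math.log(reg_inc_beta(0.5, psi, u * u / (1.0 + u * u)))
    return total

def marginal_pdf(t, x, psi):
    """Marginal density of Equation (5.6), used only as a cross-check."""
    s = math.expm1(2.0 * t)
    log_g = (0.5 * math.log(2.0 / math.pi) + math.log(x) + 2.0 * t
             + psi * math.log(psi) + math.lgamma(psi + 0.5) - math.lgamma(psi)
             - 1.5 * math.log(s)
             - (psi + 0.5) * math.log(x * x / (2.0 * s) + psi))
    return math.exp(log_g)

# Toy data (made up for illustration): two events and one censored subject.
print(ou_re_loglik([0.4, 1.2, 2.0], [1, 1, 0], [1.5, 0.8, 2.0], psi=0.5))
```

In practice the negative of this function would be handed to a numerical optimizer, with x0 recomputed from the regression coefficients at each step. Note that $x_i = \sin^2[\tan^{-1}(u_i)]$ simplifies to $u_i^2/(1+u_i^2)$, which is the form used in the code.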

To utilize the OU-TR random effects model for data analysis, a new Matlab function called OUTRREM was developed. An explanation of the function can be found in Appendix G.

5.1.2 Simulation Study

A simulation study was conducted to examine the properties of our random effects

(RE) model and estimation procedure. The natural log of the initial health status was simulated using a fixed intercept term along with a N(2, 1) random variable

(Norm), truncated below at 0, simulating a continuous measurement on a subject

(e.g., height), a Bernoulli(0.5) random variable (Bern) simulating the presence or

absence of an attribute (e.g., exposed or unexposed to a toxin), and the log of a

subject-specific random effect ($\lambda_i$) with $\lambda_i^2 \sim \text{gamma}(\psi,\psi)$; i.e.,

$$\log(X_{0i}^{*}) = \log(\lambda_i) + \gamma_0 + \gamma_1 \text{Norm}_i + \gamma_2 \text{Bern}_i. \tag{5.9}$$

To generate the data, the path of the subject’s health status was simulated from an

OU process initiated at $X_{0i}^{*}$, as described in Chapter 3, Section 2. After generating

the sample, maximum likelihood estimation was used to estimate the parameters

(γ0,γ1,γ2,ψ) in the OU-TR random effects model.
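The first step of this data-generating scheme, drawing the initial states in Equation (5.9), can be sketched as follows. This is an illustrative reconstruction in Python rather than the Matlab code used for the study: the function names are hypothetical, the truncated normal is drawn by simple rejection sampling (an assumption about the mechanism), and the subsequent simulation of the OU sample path and censoring is not shown. The coefficient values correspond to Scenario 11 (γ0 = 4, γ1 = 1, γ2 = 0.5, ψ = 1).

```python
import math
import random

def truncated_normal(rng, mean, sd, lower):
    """Rejection sampler for a normal variate truncated below at `lower`."""
    while True:
        z = rng.gauss(mean, sd)
        if z >= lower:
            return z

def simulate_initial_states(n, gammas, psi, seed=0):
    """Draw (lambda_i, X*_0i) pairs following Equation (5.9)."""
    g0, g1, g2 = gammas
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        norm = truncated_normal(rng, 2.0, 1.0, 0.0)  # continuous covariate
        bern = 1 if rng.random() < 0.5 else 0         # binary covariate
        # gammavariate takes (shape, scale); rate psi means scale 1/psi.
        phi = rng.gammavariate(psi, 1.0 / psi)
        lam = math.sqrt(phi)
        x0_star = math.exp(math.log(lam) + g0 + g1 * norm + g2 * bern)
        draws.append((lam, x0_star))
    return draws

sample = simulate_initial_states(50_000, gammas=(4.0, 1.0, 0.5), psi=1.0, seed=1)
mean_lambda = sum(lam for lam, _ in sample) / len(sample)
mean_log_lambda = sum(math.log(lam) for lam, _ in sample) / len(sample)
print(f"Monte Carlo E(lambda)      = {mean_lambda:.4f}")
print(f"Monte Carlo E[log(lambda)] = {mean_log_lambda:.4f}")
```

The two Monte Carlo averages can be compared with the closed-form mean in Equation (5.3) and the integral in Equation (5.4) as a check that the gamma(ψ,ψ) parameterization (shape ψ, rate ψ) has been coded correctly.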

Matlab Version 7.9 was used to conduct the simulations and the fminsearch

procedure was utilized to obtain the maximum likelihood estimates of γ and ψ as described in Chapter 3, Section 2. Note that the fminsearch function’s simplex

search algorithm rarely converged to a local minimum during the simulation analysis.

Twenty different simulation scenarios were considered. Four different values of ψ

were used in the simulations (0.25, 0.5, 1, and 2), and, under each of these values,

five simulations were conducted that involved different combinations of regression

parameter values to simulate data sets with differing numbers of events. The goal

was to evaluate the model’s estimation performance under various censoring rates and

parameter values. For each scenario, we generated 1000 data sets of size 200. The

results of our simulation study are in Tables 5.1 through 5.4. For all combinations

of true parameter values, bias was small under sparsely and heavily censored data

for the regression estimates with bias being the largest for the intercept (γ0) when

the true value of γ0 was small (γ0 = 4). In addition, the SEEs (see Chapter

3, Section 2 for definition) of the regression coefficient estimates obtained via the

observed information matrix were close to the SSEs (≤ 10% different) across the differing number of events in the sample regardless of the true value of ψ. Thus,

we conclude, through simulation, that the maximum likelihood estimates of the

regression parameters appear to behave like minimum variance consistent estimators.

When we look at the simulation results for the random effect precision parameter,

ψ, we also see small bias and small differences between the SEE and SSE values. Bias

in ψˆ increased with its true value. Note that, in Equation (5.3), Var(λ) decreases as ψ increases. When between subject variability is small, we do not have much information about the random effects in our model and hence the observed trend in accuracy of ψˆ makes logical sense. The greatest bias in ψ was observed in scenario

20 where ψ = 2 and approximately 3/4 of the data were censored. We also note that

the percent difference between the SEE and SSE for ψˆ tended to be larger when there

were more events in the simulated data since ψˆ had a smaller SSE forcing the SEE

to be even more precise.
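The claim that Var(λ) decreases as ψ increases can be checked directly from Equation (5.3). The short sketch below (function name hypothetical) evaluates the closed-form mean and variance at the four values of ψ used in the simulations, using log-gamma functions for numerical stability.

```python
import math

def lambda_moments(psi):
    """Mean and variance of lambda when lambda^2 ~ gamma(psi, psi), Eq. (5.3)."""
    mean = math.exp(math.lgamma(psi + 0.5) - math.lgamma(psi)) / math.sqrt(psi)
    return mean, 1.0 - mean ** 2

psi_values = (0.25, 0.5, 1.0, 2.0)
variances = [lambda_moments(psi)[1] for psi in psi_values]

for psi, var in zip(psi_values, variances):
    print(f"psi = {psi:<4}  Var(lambda) = {var:.4f}")
```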

The sample size used in these simulations was relatively small (200). If we slightly

increase it to 300, we see the bias issue disappear when the true value of ψ is 2 (shown

in Table 5.5). Thus, if the sample size is at least 300, the bias and the difference

between the SSE and SEE remained at 10% or less. Therefore, we conclude, through

simulation, that the maximum likelihood estimate of ψ appears to behave like the minimum variance consistent estimator. Thus we can draw inferences on the covariate effects using the estimated regression coefficients and their estimated standard errors, and we can accurately estimate survival probabilities under the random effects model.

We conducted a second simulation study to quantify the effects of ignoring random effects on the estimation of the regression coefficients and survival. This study consisted of four different simulation scenarios in which the true regression parameters remained the same but the true values of ψ varied (ψ = 0.25, 0.5, 1 or 2). First,

the OU-TR random effects model was fit to the simulated random effects data and the resulting estimates and standard errors were recorded. Then we fit the OU-TR model (without random effects) to these same data. Again, the resulting estimates and standard errors were recorded.

Scenarios                       γ0        γ1        γ2        ψ
Number 1          Truth         4         1         0.5       0.25
Avg Events (195)  MLE           4.0709    0.9980    0.5033    0.2619
                  Bias          1.74%     -0.20%    0.65%     4.53%
                  SSE           0.3238    0.1350    0.2664    0.0247
                  SEE           0.3337    0.1337    0.2674    0.0274
                  Percent Diff  3.01%     0.92%     0.36%     10.23%
Number 2          Truth         7         0.5       0.5       0.25
Avg Events (176)  MLE           7.1111    0.4928    0.4985    0.2549
                  Bias          1.56%     -1.45%    -0.30%    1.91%
                  SSE           0.3467    0.1345    0.2716    0.0246
                  SEE           0.3386    0.1362    0.2696    0.0262
                  Percent Diff  2.36%     1.27%     0.72%     6.11%
Number 3          Truth         8         0.75      -0.5      0.25
Avg Events (145)  MLE           8.0949    0.7501    -0.4896   0.2541
                  Bias          1.17%     0.01%     -2.13%    1.63%
                  SSE           0.3448    0.1484    0.2724    0.0250
                  SEE           0.3466    0.1494    0.2809    0.0264
                  Percent Diff  0.54%     0.65%     3.06%     5.38%
Number 4          Truth         11        -1        0.5       0.25
Avg Events (143)  MLE           11.0846   -0.9989   0.4968    0.2553
                  Bias          0.76%     -0.11%    -0.64%    2.09%
                  SSE           0.3925    0.1567    0.2755    0.0253
                  SEE           0.3978    0.1559    0.2851    0.0268
                  Percent Diff  1.35%     0.47%     3.42%     5.45%
Number 5          Truth         9         0.5       0.5       0.25
Avg Events (106)  MLE           9.0668    0.5066    0.5063    0.2553
                  Bias          0.74%     1.31%     1.25%     2.08%
                  SSE           0.3813    0.1714    0.3122    0.0276
                  SEE           0.3698    0.1656    0.3167    0.0281
                  Percent Diff  3.08%     3.46%     1.42%     1.73%

Table 5.1: Simulation Results for OU-TR Random Effects Model Based on 1000 Data Sets of Size 200 with Psi = 0.25.

Scenarios                       γ0        γ1        γ2        ψ
Number 6          Truth         4         1         0.5       0.5
Avg Events (194)  MLE           4.0988    0.9969    0.4991    0.5048
                  Bias          2.44%     -0.31%    -0.17%    0.96%
                  SSE           0.2690    0.1060    0.2131    0.0677
                  SEE           0.2581    0.1021    0.2035    0.0649
                  Percent Diff  4.13%     3.69%     4.63%     4.27%
Number 7          Truth         7         0.5       0.5       0.5
Avg Events (171)  MLE           7.0884    0.5017    0.5049    0.5049
                  Bias          1.25%     0.34%     0.96%     0.98%
                  SSE           0.2651    0.1064    0.2075    0.0675
                  SEE           0.2577    0.1023    0.2035    0.0646
                  Percent Diff  2.84%     3.92%     1.95%     4.35%
Number 8          Truth         8         0.75      -0.5      0.5
Avg Events (132)  MLE           8.0860    0.7553    -0.5041   0.5082
                  Bias          1.06%     0.70%     0.82%     1.61%
                  SSE           0.2637    0.1116    0.2108    0.0696
                  SEE           0.2644    0.1118    0.2103    0.0666
                  Percent Diff  0.27%     0.22%     0.27%     4.39%
Number 9          Truth         11        -1        0.5       0.50
Avg Events (130)  MLE           11.0774   -0.9946   0.4965    0.5084
                  Bias          0.70%     -0.54%    -0.70%    1.65%
                  SSE           0.3016    0.1170    0.2276    0.0701
                  SEE           0.3032    0.1175    0.2136    0.0671
                  Percent Diff  0.52%     0.44%     6.33%     4.44%
Number 10         Truth         9         0.5       0.5       0.5
Avg Events (82)   MLE           9.0753    0.5023    0.5036    0.5156
                  Bias          0.83%     0.45%     0.71%     3.03%
                  SSE           0.2919    0.1284    0.2388    0.0751
                  SEE           0.2801    0.1257    0.2359    0.0734
                  Percent Diff  4.13%     2.11%     1.25%     2.26%

Table 5.2: Simulation Results for OU-TR Random Effects Model Based on 1000 Data Sets of Size 200 with Psi = 0.5.

The results of these simulations are found in Table 5.6. As expected under all scenarios, the bias and difference between the SSE and SEE for the OU-TR random effects model were small for the regression parameter estimates. However, when we look at the results under the OU-TR model, there were larger biases for the intercept estimate when the true value of ψ was small. This occurs since the smaller ψ is, the greater the variance of the gamma random effect and the greater the amount of heterogeneity in X0 not accounted for by the OU-TR model. One would expect bias in the intercept when ignoring the random effect since the mean of the log of the random effect (E[log(λ)]) is not equal to 0 (expression given in Equation (5.4)).

Scenarios                       γ0        γ1        γ2        ψ
Number 11         Truth         4         1         0.5       1
Avg Events (193)  MLE           4.0935    0.9990    0.4984    1.0257
                  Bias          2.28%     -0.10%    -0.32%    2.51%
                  SSE           0.2181    0.0819    0.1694    0.2116
                  SEE           0.2076    0.0807    0.1608    0.1977
                  Percent Diff  4.94%     1.46%     5.16%     6.77%
Number 12         Truth         7         0.5       0.5       1
Avg Events (168)  MLE           7.0916    0.4965    0.4997    1.0301
                  Bias          1.29%     -0.70%    -0.06%    2.92%
                  SSE           0.2130    0.0821    0.1596    0.2102
                  SEE           0.2068    0.0805    0.1605    0.1984
                  Percent Diff  3.00%     1.94%     0.59%     5.78%
Number 13         Truth         8         0.75      -0.5      1
Avg Events (123)  MLE           8.0843    0.7470    -0.4915   1.0347
                  Bias          1.04%     -0.40%    -1.74%    3.35%
                  SSE           0.2175    0.0884    0.1699    0.2167
                  SEE           0.2117    0.0881    0.1653    0.2057
                  Percent Diff  2.71%     0.28%     2.76%     5.22%
Number 14         Truth         11        -1        0.5       1
Avg Events (120)  MLE           11.0775   -0.9998   0.5053    1.0356
                  Bias          0.70%     -0.02%    1.05%     3.44%
                  SSE           0.2584    0.1006    0.1734    0.2154
                  SEE           0.2453    0.0942    0.1687    0.2078
                  Percent Diff  5.17%     6.65%     2.84%     3.74%
Number 15         Truth         9         0.5       0.5       1
Avg Events (65)   MLE           9.0666    0.5011    0.4985    1.0767
                  Bias          0.73%     0.21%     -0.29%    7.12%
                  SSE           0.2290    0.1067    0.1868    0.2517
                  SEE           0.2248    0.1019    0.1859    0.2422
                  Percent Diff  1.85%     4.52%     0.46%     3.85%

Table 5.3: Simulation Results for OU-TR Random Effects Model Based on 1000 Data Sets of Size 200 with Psi = 1.

However, the bias we observed was not equal to E[log(λ)]. For example, when ψ = 1,

E[log(λ)] = −0.2886 and the bias observed for the intercept under the OU-TR model was −0.4994.

To demonstrate the importance of accounting for random effects in terms of

survival estimation, plots of the simulated survival curves are shown in Figures 5.1

through 5.4 corresponding to the different true values of ψ (0.25, 0.5, 1, 2) respectively.

When there is a strong presence of a random effect (ψ =0.25, 0.5), the OU-TR model

provides highly inaccurate survival estimates. On the other hand, when there is less

of a random effect present (ψ =1, 2), the survival estimates from the OU-TR model

75 Scenarios γ0 γ1 γ2 ψ Number 16 Truth 4 1 0.5 2 Avg Events (193) MLE 4.0872 0.9969 0.4996 2.1904 Bias 2.13% -0.31% -0.08% 8.69% SSE 0.1896 0.0723 0.1374 0.9341 SEE 0.1779 0.0680 0.1349 0.8444 Percent Diff 6.38% 6.17% 1.86% 10.08% Number 17 Truth 7 0.5 0.5 2 Avg Events (166) MLE 7.0823 0.4983 0.5008 2.2115 Bias 1.16% -0.33% 0.16% 9.56% SSE 0.1844 0.0686 0.1389 0.9196 SEE 0.1770 0.0680 0.1345 0.8548 Percent Diff 4.13% 0.95% 3.19% 7.30% Number 18 Truth 8 0.75 -0.5 2 Avg Events (117) MLE 8.0787 0.7508 -0.5005 2.1958 Bias 0.97% 0.11% 0.09% 8.92% SSE 0.1871 0.0751 0.1435 0.8629 SEE 0.1809 0.0743 0.1389 0.8445 Percent Diff 3.35% 1.13% 3.30% 2.16% Number 19 Truth 11 -1 0.5 2 Avg Events (114) MLE 11.0762 -0.9979 0.4897 2.2343 Bias 0.69% -0.21% -2.11% 10.49% SSE 0.2140 0.0830 0.1454 0.8630 SEE 0.2117 0.0797 0.1413 0.8863 Percent Diff 1.06% 4.12% 2.88% 2.67% Number 20 Truth 9 0.5 0.5 2 Avg Events (55) MLE 9.0781 0.4948 0.5009 2.3260 Bias 0.86% -1.05% 0.17% 14.01% SSE 0.1956 0.0908 0.1585 1.0054 SEE 0.1941 0.0884 0.1575 1.0548 Percent Diff 0.76% 2.67% 0.66% 4.80%

Table 5.4: Simulation Results for OU-TR Random Effects Model Based on 1000 Data Sets of Size 200 with Psi = 2.

are reasonably accurate. Based on these results, the OU-TR random effects model consistently provides accurate survival estimates regardless of the value of ψ.

Scenario  Avg Events  Statistic     γ0        γ1        γ2        ψ
16        289         Truth         4         1         0.5       2
                      MLE           4.0868    0.9982    0.5069    2.0233
                      Bias          2.1%      -0.18%    1.37%     1.15%
                      SSE           0.1480    0.0572    0.1134    0.5966
                      SEE           0.1445    0.0553    0.1104    0.5606
                      Percent Diff  2.35%     3.40%     2.70%     6.24%
17        249         Truth         7         0.5       0.5       2
                      MLE           7.0790    0.5030    0.4983    2.0469
                      Bias          1.12%     0.59%     -0.34%    2.29%
                      SSE           0.1538    0.0587    0.1166    0.6198
                      SEE           0.1450    0.0555    0.1101    0.5944
                      Percent Diff  5.93%     5.55%     5.66%     4.19%
18        176         Truth         8         0.75      -0.5      2
                      MLE           8.0865    0.7452    -0.4966   2.0864
                      Bias          1.07%     -0.64%    -0.68%    4.14%
                      SSE           0.1503    0.0609    0.1154    0.6408
                      SEE           0.1473    0.0604    0.1129    0.6098
                      Percent Diff  2.04%     0.86%     2.20%     4.96%
19        171         Truth         11        -1        0.5       2
                      MLE           11.0769   -0.9970   0.4922    2.1089
                      Bias          0.69%     -0.30%    -1.58%    5.17%
                      SSE           0.1792    0.0666    0.1203    0.7035
                      SEE           0.1715    0.0649    0.1153    0.6430
                      Percent Diff  4.39%     2.58%     4.24%     8.99%
20        82          Truth         9         0.5       0.5       2
                      MLE           9.0681    0.4988    0.4918    2.2125
                      Bias          0.75%     -0.24%    -1.66%    9.60%
                      SSE           0.1579    0.0735    0.1294    0.7748
                      SEE           0.1572    0.0715    0.1275    0.7582
                      Percent Diff  0.41%     2.78%     1.46%     2.17%

Table 5.5: Simulation Results for OU-TR RE Model Based on 1000 Data Sets of Size 300 with Psi = 2.

Truth (all scenarios): γ0 = 9.3, γ1 = 0.5, γ2 = 0.5

Scenario  ψ     Avg # Events  Model           Statistic     γ0        γ1        γ2
1         0.25  119           OU-TR RE Model  MLE           9.3856    0.5018    0.4970
                                              Bias          0.91%     0.36%     -0.61%
                                              SSE           0.2884    0.2640    0.2899
                                              SEE           0.2866    0.2641    0.2956
                                              Percent Diff  0.63%     0.06%     1.94%
                              OU-TR Model     MLE           2.7683    0.5714    0.5513
                                              Bias          -235.95%  12.49%    9.30%
                                              SSE           2.1022    1.7642    2.3273
                                              SEE           0.1377    0.1366    0.1356
                                              Percent Diff  176.09%   171.25%   177.98%
2         0.5   99            OU-TR RE Model  MLE           9.3839    0.4960    0.4892
                                              Bias          0.89%     -0.81%    -2.20%
                                              SSE           0.2169    0.2002    0.2096
                                              SEE           0.2167    0.1958    0.2177
                                              Percent Diff  0.10%     2.23%     3.81%
                              OU-TR Model     MLE           6.7991    0.5229    0.4773
                                              Bias          -36.78%   4.37%     -4.76%
                                              SSE           1.2437    0.9185    1.3583
                                              SEE           0.1176    0.1139    0.1200
                                              Percent Diff  165.44%   155.88%   167.53%
3         1     83            OU-TR RE Model  MLE           9.3731    0.5063    0.4925
                                              Bias          0.78%     1.25%     -1.51%
                                              SSE           0.1813    0.1612    0.1787
                                              SEE           0.1752    0.1556    0.1708
                                              Percent Diff  3.45%     3.58%     4.49%
                              OU-TR Model     MLE           8.5006    0.4932    0.5013
                                              Bias          -9.41%    -1.38%    0.26%
                                              SSE           0.4253    0.3307    0.4841
                                              SEE           0.1054    0.0951    0.1062
                                              Percent Diff  120.55%   110.66%   128.02%
4         2     74            OU-TR RE Model  MLE           9.3711    0.5045    0.5024
                                              Bias          0.76%     0.89%     0.47%
                                              SSE           0.1551    0.1362    0.1470
                                              SEE           0.1509    0.1318    0.1424
                                              Percent Diff  2.74%     3.29%     3.16%
                              OU-TR Model     MLE           9.0521    0.4852    0.4948
                                              Bias          -2.74%    -3.05%    -1.06%
                                              SSE           0.1950    0.1633    0.1963
                                              SEE           0.1019    0.0903    0.1039
                                              Percent Diff  62.76%    53.97%    61.62%

Table 5.6: Simulation Results Examining Effect of Ignoring Random Effect (RE) in the OU-TR Model. Results are Based on 1000 Data Sets of Size 200.

[Figure: "OU Estimated Survival vs True Survival Plots by Berns". Survival Probability versus Time to Death (Years, 0 to 10); curves: OU-TR, OU-TR RE, and True, each for Berns = 0 and Berns = 1.]

Figure 5.1: Estimated Survival Curves for OU-TR and OU-TR Random Effects Models When Psi = 0.25 (Scenario 1 from Table 5.6)

[Figure: "OU Estimated Survival vs True Survival Plots by Berns". Survival Probability versus Time to Death (Years, 0 to 10); curves: OU-TR, OU-TR RE, and True, each for Berns = 0 and Berns = 1.]

Figure 5.2: Estimated Survival Curves for OU-TR and OU-TR Random Effects Models When Psi = 0.5 (Scenario 2 from Table 5.6)

[Figure: "OU Estimated Survival vs True Survival Plots by Berns". Survival Probability versus Time to Death (Years, 0 to 10); curves: OU-TR, OU-TR RE, and True, each for Berns = 0 and Berns = 1.]

Figure 5.3: Estimated Survival Curves for OU-TR and OU-TR Random Effects Models When Psi = 1 (Scenario 3 from Table 5.6)

[Figure: "OU Estimated Survival vs True Survival Plots by Berns". Survival Probability versus Time to Death (Years, 0 to 10); curves: OU-TR, OU-TR RE, and True, each for Berns = 0 and Berns = 1.]

Figure 5.4: Estimated Survival Curves for OU-TR and OU-TR Random Effects Models When Psi = 2 (Scenario 4 from Table 5.6)

5.1.3 Application of the OU-TR Random Effects Model to Overall Survival of Patients with Carcinoma of the Oropharynx

To demonstrate the use of the OU-TR random effects model in a data application, we examined the data from the clinical trial conducted by the Radiation Therapy Oncology Group of treatment for carcinoma of the oropharynx found in Kalbfleisch and Prentice (1980). This data set was previously described in Chapter 3, Section 3. In that section, the OU-TR model was fit to the data and it was determined that the best fitting model contained the disability and massive tumor indicator variables in the model for initial health. The OU-TR random effects model was also fit to the oropharynx data incorporating the disability and massive tumor covariates. The OU-TR random effects model data analysis was conducted using Matlab Version 7.9.

Model                        Variable       Parameter Estimate  Std Err  P-value
OU-TR Model                  Intercept      1.1161              0.0675   <0.0001
                             Disability     -0.9807             0.1189   <0.0001
                             Massive Tumor  -0.5126             0.1099   <0.0001
OU-TR Random Effects Model   Intercept      1.3832              0.1355   <0.0001
                             Disability     -0.9866             0.1544   <0.0001
                             Massive Tumor  -0.5384             0.1438   0.0002
                             ψ              2.0501              0.7475   N/A

Table 5.7: OU-TR and OU-TR Random Effects Models for Carcinoma of the Oropharynx Data
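The p-values in Table 5.7 can be sanity-checked from the estimates and standard errors. The sketch below assumes they are two-sided Wald tests against a standard normal reference; the section does not state which test was used, and the function name is hypothetical.

```python
import math

def wald_p_value(estimate, std_err):
    """Two-sided p-value for H0: coefficient = 0, using a standard normal reference."""
    z = estimate / std_err
    # 2 * (1 - Phi(|z|)) written with the complementary error function
    return math.erfc(abs(z) / math.sqrt(2.0))

# Estimates and standard errors from the OU-TR random effects model rows of Table 5.7
rows = [("Disability", -0.9866, 0.1544), ("Massive Tumor", -0.5384, 0.1438)]
for name, est, se in rows:
    print(f"{name}: z = {est / se:.2f}, p = {wald_p_value(est, se):.4f}")
```

Under this assumption the massive tumor coefficient gives a p-value that rounds to 0.0002, in line with the tabulated entry, and the disability coefficient falls well below 0.0001.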

[Figure: "OU Survival Curves vs KM Plots for Subjects With a Disability by T-Stage". Survival Probability versus Time to Death (Years, 0 to 2); curves: OU-TR, OU-TR RE, and KM, each for T-Stage Not Massive and T-Stage Massive.]

Figure 5.5: Goodness of Fit of Best BIC Carcinoma of the Oropharynx Model (OU-TR and OU-TR Random Effects) for Subjects with Disability

[Figure: "OU Survival Curves vs KM Plots for Subjects Without a Disability by T-Stage". Survival Probability versus Time to Death (Years, 0 to 4.5); curves: OU-TR, OU-TR RE, and KM, each for T-Stage Not Massive and T-Stage Massive.]

Figure 5.6: Goodness of Fit of Best BIC Carcinoma of the Oropharynx Model (OU-TR and OU-TR Random Effects) for Subjects with No Disability

The results for both of these models are shown in Table 5.7. The estimated regression coefficients differ little between the two models except for the intercept term. This is expected since the mean of log(λ) ≠ 0, as shown in Equation (5.4). The estimate of ψ in the random effects model is approximately equal to 2. This relatively large value of ψ indicates a smaller variance in the random effect distribution. Therefore, only a small amount of unexplained heterogeneity is present when disability and T-stage are in the model. The goodness of fit plots in Figures 5.5 and 5.6 for both the OU-TR model and the OU-TR random effects model confirm this. The estimated survival curves generated under both models are similar to each other. In fact, the OU-TR model looks to provide a slightly better fit to the data. However, the standard errors of the regression coefficients may still be underestimated by the OU-TR model, as demonstrated by our simulation study in Section 5.1.2. In Table 5.6, Scenario 4 involves simulated random effects data where ψ = 2. We notice that the standard errors for the regression parameters were underestimated by roughly 60% in the simulation. In the data application, the standard errors from the OU-TR model were 26% lower than those from the random effects model. We must also note that, like the OU-TR model, the OU-TR random effects model had difficulty fitting survival data for subjects with no disability whose primary tumor was not massive. In general, we expect the OU-TR random effects model to be a good fit to the survival data when subjects experience the event rather quickly. However, when the data suggest a longer time is needed for subjects to experience the event, the model fits the data poorly, as seen in Figure 5.6.
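One way to arrive at the 26% figure quoted for the data application is sketched below. It assumes the comparison uses the same percent-difference convention that the simulation tables appear to follow (absolute difference divided by the mean of the two values); the text does not state the formula, so this is a plausible reading rather than the author's stated calculation. The standard errors are those of the disability and massive tumor coefficients in Table 5.7.

```latex
\[
\text{Disability: } \frac{|0.1544 - 0.1189|}{(0.1544 + 0.1189)/2}
  = \frac{0.0355}{0.13665} \approx 26.0\%,
\qquad
\text{Massive tumor: } \frac{|0.1438 - 0.1099|}{(0.1438 + 0.1099)/2}
  = \frac{0.0339}{0.12685} \approx 26.7\%.
\]
```

Under the same convention the intercept standard errors (0.1355 versus 0.0675) differ by considerably more, so the 26% figure seems to refer to the covariate coefficients only.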

5.2 OU-TR Random Effects Mixture Model

5.2.1 Proposed Model

We now extend our OU-TR random effects model to incorporate a cure rate. This model has the same framework as the mixture model in Chapter 4, with the added feature of accounting for unexplained heterogeneity between subjects. Thus, we are able to simultaneously model data where a cure rate is present and account for unmeasured covariates in the initial health status. The initial health status for subject i remains the same as in the OU-TR random effects model:

$$
\begin{aligned}
\log(X^{*}_{0i}) &= \log(\lambda_i) + \gamma_0 + \gamma_1 Z_{1i} + \gamma_2 Z_{2i} + \cdots + \gamma_k Z_{ki} \\
                 &= \log(\lambda_i) + \log(X_{0i})
\end{aligned}
\tag{5.10}
$$

Recall from Chapter 4 that our model assumes two latent groups of subjects and a binary latent variable (Y) denoting group membership: Y = 1 if the subject is not susceptible to experiencing the event under study (or is cured), and Y = 0 if the subject will (eventually) experience the event. Also, subjects may have either uncensored or right-censored survival times. When a subject experiences an event, their contribution to the likelihood is

$$
\begin{aligned}
g^{*}(t_i) &= P(Y_i = 0)\, g(t_i) \\
           &= (1 - p_i)\, g(t_i),
\end{aligned}
\tag{5.11}
$$

where $g(t_i)$ is the pdf under the OU-TR random effects model in equation (5.6). When a subject does not experience an event during the study, one of the following is true: (1) the subject will never experience the event ($Y_i = 1$), or (2) the subject will eventually experience the event ($Y_i = 0$) but did not during the observation period. Thus, the corresponding survival function and contribution to the likelihood is

$$
S_g^{*}(t_i) = p_i + (1 - p_i)\, S_g(t_i), \tag{5.12}
$$

where $S_g(t_i) = 1 - G(t_i)$ with $G(t_i)$ given under the OU-TR random effects model in equation (5.7). The portion of the survival function, $S_g^{*}(t_i)$, that captures subjects in category (1) is $p_i$, and $(1 - p_i) S_g(t_i)$ captures those in category (2). Note that $S_g^{*}(\infty) = p_i > 0$, indicating an improper survival function.

For ease in writing the likelihood, we assume that the first m observations are events and the last N − m are censored. As shown earlier in equation (5.8) for the OU-TR random effects model, we can write the random effects mixture model likelihood function in a simplified form given by

$$
L = \prod_{i=1}^{m} (1 - p_i)\,
    \frac{2 u_i \left( \dfrac{2 \psi u_i^{2}}{X_{0i}^{2}} + 1 \right)}
         {B\!\left(\tfrac{1}{2}, \psi\right) \left(u_i^{2} + 1\right)^{\psi + \frac{1}{2}}}
    \prod_{i=m+1}^{N} \left[ p_i + (1 - p_i)\,
    \frac{B_{x_i}\!\left(\tfrac{1}{2}, \psi\right)}{B\!\left(\tfrac{1}{2}, \psi\right)} \right]
\tag{5.13}
$$

where $u_i = X_{0i} / \sqrt{2\psi\,(e^{2 t_i} - 1)}$, $B(a, b)$ is the beta function, and $B_{x_i}(a, b)$ is the incomplete beta function where $x_i = \sin^{2}[\tan^{-1}(u_i)]$. Estimating the standard errors proceeds using the observed Fisher information as shown previously in Chapter 3. The derivations for the standard errors are provided in Appendix F.

To utilize the OU-TR random effects mixture model for data analysis, a new Matlab function called OUTRREMM was developed. An explanation of the function can be found in Appendix G.
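The two factors of the mixture likelihood can be evaluated along the following lines. This is a minimal Python sketch and not the OUTRREMM function described in Appendix G; all function names are hypothetical. It takes the event density to be g(t) = 2u(2ψu²/X0² + 1) / [B(1/2, ψ)(u² + 1)^(ψ + 1/2)] and the susceptible survival function to be B_x(1/2, ψ)/B(1/2, ψ), with u and x as defined after Equation (5.13). Here x0 stands for X0i (the covariate part of the initial health status) and p for the cure probability pi, both assumed to be computed beforehand.

```python
import math

def beta_half(psi):
    """Complete beta function B(1/2, psi)."""
    return math.gamma(0.5) * math.gamma(psi) / math.gamma(psi + 0.5)

def u_value(x0, t, psi):
    """u = X0 / sqrt(2 * psi * (exp(2t) - 1))."""
    return x0 / math.sqrt(2.0 * psi * math.expm1(2.0 * t))

def susceptible_survival(x0, t, psi, n=400):
    """S_g(t) = B_x(1/2, psi) / B(1/2, psi) with x = sin^2(arctan(u)).

    Substituting s = sin^2(theta) turns the incomplete beta integral into
    2 * integral from 0 to arctan(u) of cos(theta)^(2*psi - 1), evaluated
    here with composite Simpson's rule (n must be even).
    """
    upper = math.atan(u_value(x0, t, psi))
    h = upper / n

    def integrand(theta):
        return math.cos(theta) ** (2.0 * psi - 1.0)

    total = integrand(0.0) + integrand(upper)
    total += sum((4 if k % 2 else 2) * integrand(k * h) for k in range(1, n))
    return 2.0 * (h / 3.0) * total / beta_half(psi)

def susceptible_density(x0, t, psi):
    """Event-time density g(t) for susceptible subjects."""
    u = u_value(x0, t, psi)
    numer = 2.0 * u * (2.0 * psi * u * u / (x0 * x0) + 1.0)
    return numer / (beta_half(psi) * (u * u + 1.0) ** (psi + 0.5))

def log_likelihood(records, psi):
    """records: iterable of (x0, t, event, p), with event = 1 for an observed event."""
    ll = 0.0
    for x0, t, event, p in records:
        if event:
            ll += math.log((1.0 - p) * susceptible_density(x0, t, psi))
        else:
            ll += math.log(p + (1.0 - p) * susceptible_survival(x0, t, psi))
    return ll
```

The Simpson step keeps the sketch free of external dependencies; its accuracy degrades for ψ < 0.5 at very small t, and a library routine for the regularized incomplete beta function (for example `scipy.special.betainc`) would be the more usual choice in practice.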

5.2.2 Simulation Study

A simulation study was conducted to examine the properties of our estimation procedure. The natural log of the initial health status was simulated using a fixed intercept term along with a N(0, 1) random variable (Norm) simulating a centered continuous measurement on a subject, a Bernoulli(0.5) random variable (Bern) which simulates the presence or absence of an attribute (e.g., exposed or unexposed to a toxin), and the log of a subject-specific random effect ($\lambda_i$) with $\lambda_i^{2} \sim \text{gamma}(\psi, \psi)$; i.e.,

$$
\log(X^{*}_{0i}) = \log(\lambda_i) + \gamma_0 + \gamma_1 \text{Norm}_i + \gamma_2 \text{Bern}_i. \tag{5.14}
$$

The cure rate was simulated from the model

$$
p_i = \frac{\exp(\beta_0 + \beta_1 \text{Norm}_i + \beta_2 \text{Bern}_i)}
           {1 + \exp(\beta_0 + \beta_1 \text{Norm}_i + \beta_2 \text{Bern}_i)}. \tag{5.15}
$$

To simulate the survival data, we first drew $Y_i \sim \text{Bern}(p_i)$. If a 1 was generated, this subject was not prone to experiencing the event, and hence their health status remained at $X^{*}_{0i}$. If a 0 was generated, the subject was prone to experiencing the event and the path of their health status was simulated from an OU process initiated at $X^{*}_{0i}$, as described in Chapter 3, Section 2. After simulating the sample, maximum likelihood estimation was used to estimate the parameters (γ0, γ1, γ2, ψ, β0, β1, β2) in the OU-TR random effects mixture model.
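The data-generating steps above can be sketched as follows. Several choices are assumptions made for illustration and are not taken from the text: gamma(ψ, ψ) is read as shape ψ and rate ψ; the latent OU process is taken to have unit mean-reversion rate, variance parameter 2, and an absorbing boundary at 0 (which is what the form of u after Equation (5.13) suggests; Chapter 3 has the actual specification); and follow-up is administratively censored at a hypothetical horizon of 10 years. Instead of simulating a full OU path as the dissertation does, the sketch draws the first hitting time directly through the time change to Brownian motion, which is a swapped-in shortcut. The function name is hypothetical.

```python
import math
import random

def simulate_dataset(n, gamma, beta, psi, horizon=10.0, rng=None):
    """Simulate n records (norm, bern, time, event, cured) under the stated assumptions."""
    rng = rng or random.Random()
    g0, g1, g2 = gamma
    b0, b1, b2 = beta
    records = []
    for _ in range(n):
        norm = rng.gauss(0.0, 1.0)                  # centered continuous covariate
        bern = 1 if rng.random() < 0.5 else 0       # binary attribute
        lam_sq = rng.gammavariate(psi, 1.0 / psi)   # shape psi, scale 1/psi (rate psi)
        # Initial health status X*_0i, as in Eq. (5.14)
        x0_star = math.sqrt(lam_sq) * math.exp(g0 + g1 * norm + g2 * bern)
        # Cure probability p_i, logistic form of Eq. (5.15)
        eta = b0 + b1 * norm + b2 * bern
        p = 1.0 / (1.0 + math.exp(-eta))
        cured = 1 if rng.random() < p else 0        # Y_i ~ Bern(p_i)
        if cured:
            time, event = horizon, 0                # never reaches the boundary
        else:
            # Brownian first passage to level x0_star has the law (x0_star / Z)^2;
            # invert the assumed time change tau = exp(2t) - 1 to get the OU hitting time.
            z = abs(rng.gauss(0.0, 1.0))
            tau = (x0_star / z) ** 2 if z > 0.0 else math.inf
            hit = 0.5 * math.log1p(tau)
            time, event = (hit, 1) if hit <= horizon else (horizon, 0)
        records.append((norm, bern, time, event, cured))
    return records

# Example call with the Scenario 14 parameter values from Table 5.10
data = simulate_dataset(300, (5, 1, 0.5), (0.3, 0.75, 0.5), 1.0, rng=random.Random(1))
print(sum(rec[3] for rec in data), "events out of", len(data))
```

Because the censoring mechanism and horizon are assumptions, the event counts from this sketch need not match the average numbers of events reported in the tables.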

Matlab Version 7.9 was employed to conduct the simulations, and the fminsearch procedure was utilized to obtain the maximum likelihood estimates of γ, β and ψ as described in Chapter 3, Section 2. Note that the simplex search algorithm used by fminsearch sometimes converged to a local minimum during the simulation analysis. Thus, a wide range of random starting values was necessary to find the MLEs when using fminsearch. Twenty different simulation scenarios were considered. Four different values of ψ were used in the simulations (0.25, 0.5, 1, 2), and, under each of these values, five simulations were conducted that involved different combinations of regression parameter values to simulate data sets with differing numbers of events. The goal was to evaluate the model's estimation performance under various censoring rates and parameter values. For each scenario, we generated 1000 data sets of size 300.
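The random-restart strategy mentioned above can be illustrated with a small, self-contained sketch. It is not the Matlab code used for the simulations: the Nelder-Mead routine below is a minimal stand-in for the simplex search that fminsearch performs, the toy objective with two local minima replaces the negative log-likelihood, and the function names, bounds, and number of starts are all hypothetical.

```python
import random

def nelder_mead(f, x0, step=0.5, xtol=1e-8, ftol=1e-12, max_iter=5000):
    """Minimal Nelder-Mead simplex search; returns (best_point, best_value)."""
    n = len(x0)
    simplex = [list(x0)] + [
        [x0[j] + (step if j == k else 0.0) for j in range(n)] for k in range(n)
    ]
    fvals = [f(v) for v in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda k: fvals[k])
        simplex = [simplex[k] for k in order]
        fvals = [fvals[k] for k in order]
        best, worst = simplex[0], simplex[-1]
        size = max(abs(v[j] - best[j]) for v in simplex[1:] for j in range(n))
        if size < xtol and fvals[-1] - fvals[0] < ftol:
            break
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]

        def along(coef):
            # Point centroid + coef * (centroid - worst)
            return [c + coef * (c - w) for c, w in zip(centroid, worst)]

        xr = along(1.0)                                   # reflection
        fr = f(xr)
        if fr < fvals[0]:
            xe = along(2.0)                               # expansion
            fe = f(xe)
            simplex[-1], fvals[-1] = (xe, fe) if fe < fr else (xr, fr)
        elif fr < fvals[-2]:
            simplex[-1], fvals[-1] = xr, fr
        else:
            xc = along(0.5 if fr < fvals[-1] else -0.5)   # outside or inside contraction
            fc = f(xc)
            if fc < min(fr, fvals[-1]):
                simplex[-1], fvals[-1] = xc, fc
            else:                                         # shrink toward the best vertex
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, v)] for v in simplex[1:]
                ]
                fvals = [fvals[0]] + [f(v) for v in simplex[1:]]
    k = min(range(n + 1), key=lambda j: fvals[j])
    return simplex[k], fvals[k]

def multi_start(f, bounds, n_starts=20, seed=0):
    """Run the simplex search from random starting values and keep the best result."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(n_starts):
        start = [rng.uniform(lo, hi) for lo, hi in bounds]
        x, fx = nelder_mead(f, start)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f

def toy_objective(v):
    """Toy surface with a local minimum near x = 1 and a lower one near x = -1."""
    x, y = v
    return (x * x - 1.0) ** 2 + 0.3 * x + y * y

best_x, best_f = multi_start(toy_objective, [(-2.0, 2.0), (-2.0, 2.0)])
print(best_x, best_f)
```

A single run started on the wrong side of the surface can settle in the higher minimum; keeping the best of many random starts is what guards against reporting such a local solution as the MLE.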

Scenario  Avg Events  Statistic     γ0        γ1        γ2        β0        β1        β2        ψ
1         278         Truth         5         0.5       0.5       -3        0.5       0.5       0.25
                      MLE           5.0582    0.4989    0.5035    -3.0684   0.5046    0.5221    0.2637
                      Bias          1.15%     -0.22%    0.69%     2.23%     0.91%     4.23%     5.19%
                      SSE           0.1753    0.1238    0.2366    0.4279    0.2629    0.5621    0.0203
                      SEE           0.1707    0.1150    0.2268    0.4424    0.2626    0.5368    0.0246
                      Percent Diff  2.65%     7.33%     4.23%     3.32%     0.11%     4.60%     19.17%
2         200         Truth         5         1         0.5       -1        0.5       0.5       0.25
                      MLE           5.0718    0.9915    0.4752    -1.0141   0.5119    0.5035    0.2674
                      Bias          1.42%     -0.85%    -5.22%    1.39%     2.32%     0.70%     6.52%
                      SSE           0.2061    0.1435    0.2894    0.2037    0.1468    0.2739    0.0246
                      SEE           0.1979    0.1400    0.2688    0.1953    0.1401    0.2626    0.0300
                      Percent Diff  4.06%     2.49%     7.37%     4.22%     4.71%     4.19%     19.92%
3         179         Truth         5         2         1.5       -1        1         0.9       0.25
                      MLE           5.0255    1.9692    1.4818    -1.0053   1.0201    0.9144    0.2763
                      Bias          0.51%     -1.56%    -1.23%    0.53%     1.97%     1.58%     9.52%
                      SSE           0.2191    0.1725    0.3067    0.2126    0.1804    0.2932    0.0276
                      SEE           0.2072    0.1621    0.2896    0.2090    0.1774    0.2846    0.0329
                      Percent Diff  5.59%     6.23%     5.75%     1.73%     1.67%     2.99%     17.46%
4         114         Truth         5         1         0.5       0.3       0.75      0.5       0.25
                      MLE           5.0179    0.9863    0.5098    0.3194    0.7693    0.4885    0.2775
                      Bias          0.36%     -1.38%    1.92%     6.07%     2.51%     -2.36%    9.25%
                      SSE           0.2765    0.2051    0.3930    0.1819    0.1500    0.2702    0.0361
                      SEE           0.2665    0.1914    0.3581    0.1790    0.1461    0.2595    0.0419
                      Percent Diff  3.67%     6.92%     9.28%     1.64%     2.63%     4.02%     14.96%
5         71          Truth         5         1         0.5       1         0.5       0.5       0.25
                      MLE           5.0520    1.0024    0.4864    1.0192    0.5135    0.4916    0.2786
                      Bias          1.03%     0.24%     -2.80%    1.88%     2.64%     -1.71%    10.26%
                      SSE           0.3540    0.2606    0.5031    0.1887    0.1596    0.2892    0.0468
                      SEE           0.3341    0.2381    0.4640    0.1930    0.1509    0.2877    0.0543
                      Percent Diff  5.80%     9.01%     8.10%     2.22%     5.62%     0.55%     14.81%

Table 5.8: Simulation Results Based on 1000 Data Sets of Size 300 with Psi = 0.25.

The results of our simulation study are in Tables 5.8 through 5.11. For all combinations of true values, bias in the regression estimates was small under both sparsely and heavily censored data, with bias generally larger for the regression parameter estimates in the mixing parameter than for those in the initial health. In addition, the SEEs (see Chapter 3, Section 2 for definition) of the regression coefficient estimates obtained via the observed information matrix were close to the SSEs (≤ 10% different) across the differing numbers of events in the sample, regardless of the true value of ψ. We conclude, through simulation, that the maximum likelihood estimates of the regression parameters appear to behave like minimum variance consistent estimators.

Scenario  Avg Events  Statistic     γ0        γ1        γ2        β0        β1        β2        ψ
6         277         Truth         5         0.5       0.5       -3        0.5       0.5       0.5
                      MLE           5.0943    0.4972    0.4986    -3.0735   0.5110    0.5159    0.5023
                      Bias          1.85%     -0.55%    -0.28%    2.39%     2.16%     3.09%     0.46%
                      SSE           0.1333    0.0892    0.1770    0.4773    0.2663    0.5682    0.0554
                      SEE           0.1334    0.0874    0.1731    0.4431    0.2585    0.5428    0.0544
                      Percent Diff  0.09%     2.05%     2.20%     7.43%     2.96%     4.59%     1.70%
7         199         Truth         5         1         0.5       -1        0.5       0.5       0.5
                      MLE           5.0925    0.9858    0.4931    -1.0190   0.5218    0.5138    0.5103
                      Bias          1.82%     -1.44%    -1.39%    1.87%     4.18%     2.69%     2.03%
                      SSE           0.1581    0.1120    0.2101    0.1988    0.1398    0.2626    0.0668
                      SEE           0.1545    0.1063    0.2053    0.1932    0.1390    0.2611    0.0662
                      Percent Diff  2.27%     5.25%     2.30%     2.86%     0.60%     0.55%     0.92%
8         178         Truth         5         2         1.5       -1        1         0.9       0.5
                      MLE           5.0758    1.9862    1.4928    -1.0090   1.0257    0.9133    0.5178
                      Bias          1.49%     -0.70%    -0.48%    0.90%     2.50%     1.45%     3.43%
                      SSE           0.1697    0.1261    0.2240    0.1950    0.1813    0.2803    0.0707
                      SEE           0.1627    0.1230    0.2209    0.2063    0.1776    0.2845    0.0714
                      Percent Diff  4.19%     2.49%     1.41%     5.66%     2.04%     1.45%     1.03%
9         113         Truth         5         1         0.5       0.3       0.75      0.5       0.5
                      MLE           5.0750    1.0023    0.5044    0.3103    0.7700    0.5164    0.5294
                      Bias          1.48%     0.23%     0.87%     3.31%     2.60%     3.19%     5.55%
                      SSE           0.2210    0.1537    0.2894    0.1758    0.1424    0.2546    0.1021
                      SEE           0.2075    0.1449    0.2737    0.1781    0.1454    0.2590    0.0942
                      Percent Diff  6.35%     5.89%     5.57%     1.32%     2.13%     1.71%     7.96%
10        71          Truth         5         1         0.5       1         0.5       0.5       0.5
                      MLE           5.0822    0.9935    0.4941    0.9989    0.5132    0.5170    0.5432
                      Bias          1.62%     -0.66%    -1.19%    -0.11%    2.58%     3.28%     7.96%
                      SSE           0.2726    0.1909    0.3573    0.1890    0.1482    0.2939    0.1440
                      SEE           0.2575    0.1799    0.3519    0.1914    0.1502    0.2867    0.1259
                      Percent Diff  5.71%     5.95%     1.51%     1.27%     1.30%     2.48%     13.43%

Table 5.9: Simulation Results Based on 1000 Data Sets of Size 300 with Psi = 0.5.

When we look at the simulation results for the random effect precision parameter ψ, we see some issues with bias and with discordance between the SEE and SSE values. In Tables 5.8 through 5.11, we observe that the estimates for ψ are biased when the number of events present in the data is small. The bias generally increases as the number of events in the simulated data decreases. Also, the accuracy of ψ̂ decreases as the true value of ψ increases. In the gamma distribution of our random effect, the variance decreases as the value of ψ increases. Thus, we do not have much information about the random effect in our model and the trend in accuracy of ψ̂ makes logical sense. However, we do notice that the results in Table 5.8 indicate higher bias when the true value of ψ is 0.25 than in Table 5.9 when the true value of ψ is 0.5. This may be due to a more difficult numerical likelihood evaluation since ψ is closer to zero. Also, the majority of the percent differences between the SSEs and SEEs for ψ̂ are not within the 10% range of acceptability. However, since this random effect is considered a nuisance parameter, we need not be too concerned with the variance because we will not be performing inference on it. All of our regression estimates appear to behave like minimum variance consistent estimators regardless of the difficulties encountered with ψ̂.

Scenario  Avg Events  Statistic     γ0        γ1        γ2        β0        β1        β2        ψ
11        277         Truth         5         0.5       0.5       -3        0.5       0.5       1
                      MLE           5.0865    0.4991    0.4997    -3.0473   0.5120    0.5031    1.0138
                      Bias          1.70%     -0.19%    -0.06%    1.55%     2.34%     0.62%     1.36%
                      SSE           0.1129    0.0717    0.1384    0.4403    0.2603    0.5491    0.1709
                      SEE           0.1095    0.0690    0.1364    0.4276    0.2579    0.5280    0.1623
                      Percent Diff  3.04%     3.75%     1.41%     2.91%     0.92%     3.93%     5.17%
12        199         Truth         5         1         0.5       -1        0.5       0.5       1
                      MLE           5.0861    0.9981    0.5002    -0.9960   0.5103    0.4880    1.0242
                      Bias          1.69%     -0.19%    0.04%     -0.40%    2.01%     -2.46%    2.37%
                      SSE           0.1311    0.0821    0.1685    0.1846    0.1392    0.2520    0.2169
                      SEE           0.1277    0.0837    0.1625    0.1918    0.1385    0.2605    0.1974
                      Percent Diff  2.57%     1.93%     3.60%     3.80%     0.48%     3.32%     9.44%
13        177         Truth         5         2         1.5       -1        1         0.9       1
                      MLE           5.0689    1.9932    1.5008    -1.0084   1.0252    0.9262    1.0579
                      Bias          1.36%     -0.34%    0.05%     0.84%     2.45%     2.83%     5.47%
                      SSE           0.1404    0.0995    0.1686    0.2002    0.1917    0.2755    0.2504
                      SEE           0.1334    0.0965    0.1735    0.2066    0.1792    0.2852    0.2248
                      Percent Diff  5.08%     3.08%     2.89%     3.17%     6.69%     3.47%     10.75%
14        114         Truth         5         1         0.5       0.3       0.75      0.5       1
                      MLE           5.0757    0.9955    0.4884    0.3007    0.7726    0.5144    1.1097
                      Bias          1.49%     -0.45%    -2.37%    0.23%     2.92%     2.80%     9.88%
                      SSE           0.1753    0.1206    0.2250    0.1787    0.1458    0.2650    0.4111
                      SEE           0.1710    0.1139    0.2157    0.1781    0.1453    0.2587    0.3282
                      Percent Diff  2.47%     5.74%     4.22%     0.35%     0.38%     2.40%     22.42%
15        71          Truth         5         1         0.5       1         0.5       0.5       1
                      MLE           5.0841    0.9924    0.4923    1.0064    0.5084    0.5125    1.2153
                      Bias          1.65%     -0.77%    -1.56%    0.63%     1.65%     2.44%     17.72%
                      SSE           0.2214    0.1460    0.2834    0.1920    0.1520    0.2811    0.6311
                      SEE           0.2159    0.1418    0.2772    0.1918    0.1500    0.2870    0.5535
                      Percent Diff  2.49%     2.88%     2.20%     0.12%     1.32%     2.06%     13.11%

Table 5.10: Simulation Results Based on 1000 Data Sets of Size 300 with Psi = 1.

Scenario  Avg Events  Statistic     γ0        γ1        γ2        β0        β1        β2        ψ
16        277         Truth         5         0.5       0.5       -3        0.5       0.5       2
                      MLE           5.0798    0.5004    0.5043    -3.0772   0.5210    0.5178    2.0917
                      Bias          1.57%     0.09%     0.85%     2.51%     4.03%     3.44%     4.38%
                      SSE           0.0988    0.0616    0.1155    0.4568    0.2693    0.5518    0.7162
                      SEE           0.0962    0.0580    0.1145    0.4349    0.2605    0.5347    0.6093
                      Percent Diff  2.63%     5.96%     0.86%     4.89%     3.33%     3.15%     16.12%
17        199         Truth         5         1         0.5       -1        0.5       0.5       2
                      MLE           5.0815    0.9963    0.5026    -0.9955   0.5068    0.4866    2.1902
                      Bias          1.60%     -0.38%    0.52%     -0.46%    1.34%     -2.76%    8.68%
                      SSE           0.1208    0.0742    0.1452    0.1984    0.1372    0.2582    0.9511
                      SEE           0.1122    0.0698    0.1360    0.1924    0.1383    0.2608    0.8674
                      Percent Diff  7.33%     6.10%     6.50%     3.07%     0.78%     1.02%     9.20%
18        177         Truth         5         2         1.5       -1        1         0.9       2
                      MLE           5.0802    1.9916    1.4867    -1.0070   1.0209    0.9199    2.1991
                      Bias          1.58%     -0.42%    -0.90%    0.70%     2.05%     2.16%     9.05%
                      SSE           0.1187    0.0802    0.1502    0.2118    0.1879    0.2940    0.8426
                      SEE           0.1180    0.0804    0.1454    0.2065    0.1796    0.2857    1.0272
                      Percent Diff  0.61%     0.25%     3.23%     2.54%     4.48%     2.86%     19.75%
19        113         Truth         5         1         0.5       0.3       0.75      0.5       2
                      MLE           5.0688    0.9976    0.5119    0.3193    0.7614    0.4951    2.5247
                      Bias          1.36%     -0.24%    2.32%     6.05%     1.50%     -0.99%    20.78%
                      SSE           0.1564    0.1034    0.1901    0.1800    0.1486    0.2698    1.4792
                      SEE           0.1520    0.0966    0.1821    0.1780    0.1452    0.2587    1.7072
                      Percent Diff  2.87%     6.84%     4.32%     1.10%     2.38%     4.19%     14.31%
20        71          Truth         5         1         0.5       1         0.5       0.5       2
                      MLE           5.0861    0.9986    0.5021    1.0036    0.5110    0.5205    3.6832
                      Bias          1.69%     -0.14%    0.42%     0.36%     2.16%     3.94%     45.70%
                      SSE           0.1934    0.1282    0.2496    0.1946    0.1515    0.2961    6.6534
                      SEE           0.1941    0.1220    0.2369    0.1917    0.1501    0.2873    10.4324
                      Percent Diff  0.34%     4.92%     5.22%     1.49%     0.96%     3.02%     44.24%

Table 5.11: Simulation Results Based on 1000 Data Sets of Size 300 with Psi = 2.

It is possible that some of the bias observed in ψ̂ was due to limited sample size. To investigate this further, we repeated selected scenarios with high percent bias in ψ̂, increasing n to 1000. The results improved and are shown in Table 5.12. If we look at Scenario 19 in Table 5.11, the percent bias for ψ̂ was 20.78%. When the simulation was conducted with a sample size of 1000, the percent bias was reduced to 0.02%. Thus, we can conclude in this case that the bias was driven by sample size. To see how this bias in ψ̂ may affect the survival estimates, we plotted the two survival curves corresponding to the two sets of estimates generated with sample sizes of 300 and 1000 (Figure 5.7). The difference in the survival estimates was negligible. Thus, the bias in ψ̂ was not a concern in this case.

Scenario  Avg Events  Statistic     γ0        γ1        γ2        β0        β1        β2        ψ
4         378         Truth         5         1         0.5       0.3       0.75      0.5       0.25
                      MLE           5.0500    0.9962    0.5057    0.3107    0.7572    0.4963    0.2682
                      Bias          0.99%     -0.38%    1.14%     3.45%     0.95%     -0.74%    6.78%
                      SSE           0.1455    0.1036    0.1951    0.0977    0.0765    0.1398    0.0173
                      SEE           0.1454    0.1034    0.1952    0.0973    0.0790    0.1410    0.0221
                      Percent Diff  0.10%     0.12%     0.07%     0.47%     3.31%     0.85%     24.27%
9         378         Truth         5         1         0.5       0.3       0.75      0.5       0.5
                      MLE           5.0862    1.0021    0.5076    0.3076    0.7526    0.4972    0.5036
                      Bias          1.69%     0.21%     1.50%     2.46%     0.34%     -0.55%    0.71%
                      SSE           0.1090    0.0805    0.1465    0.0975    0.0810    0.1413    0.0459
                      SEE           0.1133    0.0789    0.1495    0.0967    0.0784    0.1405    0.0470
                      Percent Diff  3.84%     2.07%     2.03%     0.79%     3.25%     0.59%     2.26%
14        379         Truth         5         1         0.5       0.3       0.75      0.5       1
                      MLE           5.0857    0.9980    0.4996    0.3016    0.7527    0.5019    0.9975
                      Bias          1.68%     -0.20%    -0.08%    0.53%     0.36%     0.37%     -0.25%
                      SSE           0.0929    0.0656    0.1204    0.0917    0.0788    0.1369    0.1415
                      SEE           0.0931    0.0624    0.1181    0.0967    0.0784    0.1405    0.1343
                      Percent Diff  0.26%     5.00%     1.88%     5.40%     0.52%     2.54%     5.24%
19        378         Truth         5         1         0.5       0.3       0.75      0.5       2
                      MLE           5.0792    1.0011    0.5054    0.3081    0.7547    0.4955    2.0004
                      Bias          1.56%     0.11%     1.07%     2.61%     0.62%     -0.90%    0.02%
                      SSE           0.0845    0.0564    0.0993    0.0955    0.0808    0.1401    0.5248
                      SEE           0.0820    0.0526    0.0993    0.0969    0.0786    0.1405    0.4669
                      Percent Diff  2.97%     7.06%     0.04%     1.37%     2.67%     0.35%     11.68%

Table 5.12: Simulation Results Based on 1000 Data Sets of Size 1000 with Varying True Values of Psi.

We conducted a second simulation study to quantify the impact of ignoring random effects on the estimation of the regression coefficients. This study consisted of four different simulation scenarios in which the true regression parameters remained the same and the true values of ψ were set to either 0.25, 0.5, 1 or 2. First, the OU-TR random effects mixture model was fit to the simulated random effects mixture data and the resulting estimates and standard errors were recorded. The same covariate data, censoring indicators, and survival times generated under the OU-TR random effects mixture model were retained and the OU-TR mixture model was fit to these data. Again, the resulting estimates and standard errors were recorded.

[Figure: "OU Survival Curves vs True Survival Plots by Berns". Survival Probability versus Time to Death (Years, 0 to 10); curves: OU-TR REMM (Sample Size 300), OU-TR REMM (Sample Size 1000), and True Survival, each for Berns = 0 and Berns = 1.]

Figure 5.7: Comparison of Survival Estimates When Bias is Present in ψˆ (Scenario 19) for OU-TR Random Effects Mixture Model (OU-TR REMM)

The results of these simulations are found in Table 5.13. As expected under all scenarios, the bias and difference between the SSE and SEE for the OU-TR random effects mixture model were small for all of the regression parameter estimates. However, when we look at the results under the OU-TR mixture model, there were large biases for the initial health regression parameter estimates when the true value of ψ was 0.25, 0.5 and 1. When ψ = 1, only the intercept was biased. These bias issues occurred since the smaller ψ is, the greater the variance of the gamma random effect and the greater the amount of heterogeneity in X0 not accounted for by the mixture model. Since E(log(λ)) ≠ 0, data generated with the more variable random effect caused the intercept estimate and, in some cases, the covariate regression parameter estimates to be biased in an attempt to account for the random effect's absence from the model. We encountered this same issue when analyzing the impact of overlooking the random effect under the OU-TR model. When ψ = 2, the initial health regression parameters were unbiased for the OU-TR mixture model. Since there is smaller variation of the gamma random effect, there is minimal heterogeneity present in X0, allowing for proper estimation under the OU-TR mixture model.

Truth (all scenarios): γ0 = 5, γ1 = 2, γ2 = 1.5, β0 = -1, β1 = 1, β2 = 0.9

Scenario  ψ     Avg # Events  Model                         Statistic     γ0        γ1        γ2        β0        β1        β2
1         0.25  179           OU-TR Rand Eff Mixture Model  MLE           5.0299    1.9685    1.4907    -1.0067   1.0201    0.9178
                                                            Bias          0.59%     -1.60%    -0.62%    0.67%     1.97%     1.94%
                                                            SSE           0.2180    0.1704    0.2987    0.2080    0.1761    0.2805
                                                            SEE           0.2087    0.1616    0.2899    0.2090    0.1776    0.2843
                                                            Percent Diff  4.32%     5.30%     3.02%     0.45%     0.89%     1.36%
                              OU-TR Mixture Model           MLE           0.4599    0.9781    0.7325    -0.9885   1.1099    0.9799
                                                            Bias          -996.62%  -104.48%  -104.77%  -1.17%    9.90%     8.15%
                                                            SSE           0.6179    0.5750    0.9782    0.2082    0.1690    0.2716
                                                            SEE           0.0781    0.0889    0.1281    0.2032    0.1655    0.2741
                                                            Percent Diff  155.14%   146.43%   153.67%   2.46%     2.09%     0.92%
2         0.5   178           OU-TR Rand Eff Mixture Model  MLE           5.0735    1.9933    1.5038    -1.0084   1.0214    0.9023
                                                            Bias          1.45%     -0.33%    0.25%     0.83%     2.10%     0.25%
                                                            SSE           0.1642    0.1257    0.2261    0.2109    0.1783    0.2855
                                                            SEE           0.1621    0.1224    0.2198    0.2064    0.1774    0.2843
                                                            Percent Diff  1.26%     2.61%     2.78%     2.18%     0.55%     0.42%
                              OU-TR Mixture Model           MLE           2.7283    1.7973    1.3489    -0.9895   1.1155    0.9695
                                                            Bias          -83.26%   -11.28%   -11.20%   -1.06%    10.36%    7.17%
                                                            SSE           0.7486    0.5010    1.1037    0.2097    0.1712    0.2782
                                                            SEE           0.0776    0.0793    0.1250    0.2041    0.1686    0.2764
                                                            Percent Diff  162.41%   145.33%   159.30%   2.74%     1.49%     0.67%
3         1     177           OU-TR Rand Eff Mixture Model  MLE           5.0786    1.9915    1.4948    -0.9946   1.0300    0.8986
                                                            Bias          1.55%     -0.43%    -0.35%    -0.54%    2.91%     -0.16%
                                                            SSE           0.1359    0.0954    0.1738    0.2097    0.1812    0.2900
                                                            SEE           0.1341    0.0966    0.1740    0.2059    0.1790    0.2849
                                                            Percent Diff  1.29%     1.28%     0.13%     1.84%     1.20%     1.78%
                              OU-TR Mixture Model           MLE           4.2290    1.9769    1.4612    -0.9814   1.0862    0.9392
                                                            Bias          -18.23%   -1.17%    -2.65%    -1.90%    7.94%     4.18%
                                                            SSE           0.3337    0.2159    0.4862    0.2097    0.1792    0.2873
                                                            SEE           0.0738    0.0666    0.1140    0.2046    0.1740    0.2800
                                                            Percent Diff  127.52%   105.72%   124.04%   2.49%     2.94%     2.58%
4         2     177           OU-TR Rand Eff Mixture Model  MLE           5.0750    1.9897    1.4896    -1.0071   1.0263    0.9209
                                                            Bias          1.48%     -0.52%    -0.70%    0.71%     2.56%     2.27%
                                                            SSE           0.1204    0.0813    0.1488    0.2059    0.1873    0.2877
                                                            SEE           0.1183    0.0802    0.1456    0.2067    0.1798    0.2858
                                                            Percent Diff  1.76%     1.35%     2.18%     0.37%     4.08%     0.66%
                              OU-TR Mixture Model           MLE           4.7400    1.9811    1.4881    -0.9975   1.0532    0.9379
                                                            Bias          -5.49%    -0.96%    -0.80%    -0.25%    5.05%     4.04%
                                                            SSE           0.1308    0.1049    0.2038    0.2064    0.1845    0.2853
                                                            SEE           0.0725    0.0615    0.1098    0.2057    0.1771    0.2831
                                                            Percent Diff  57.38%    52.24%    59.95%    0.33%     4.08%     0.76%

Table 5.13: Simulation Results Based on 1000 Data Sets of Size 300 for OU-TR Random Effects Mixture Model Comparison to OU-TR Mixture Model without Random Effects.

Under all four scenarios, the standard errors of the initial health regression parameter

MLEs were considerably underestimated when the random effect was overlooked. The

percent difference in the standard errors decreased as the true value of ψ increased.

This makes logical sense because a larger true ψ indicates a smaller presence of between-subject heterogeneity in the simulated data. For the most part, the coefficient estimates for the mixing parameter, provided by the model ignoring the random effects, were unbiased (note that in Scenario 2, βˆ1 had a bias of 10.36%). The corresponding standard errors (SEEs) were also within 10% of the empirical values

(SSEs).
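The "Percent Diff" rows of Table 5.13 can be checked by hand. The sketch below assumes that Percent Diff is the symmetric percent difference between the empirical standard error (SSE) and the average estimated standard error (SEE), that is, their absolute difference divided by their average. This definition is inferred from the tabulated values, which it reproduces up to rounding; it is not stated in this section. The function name is hypothetical.

```python
def percent_difference(sse: float, see: float) -> float:
    """Symmetric percent difference between SSE and SEE.

    ASSUMPTION: inferred from the values in Table 5.13, not quoted
    from the text.
    """
    return 100.0 * abs(sse - see) / ((sse + see) / 2.0)


# Scenario 1 (psi = 0.25), gamma_0 column of Table 5.13.
with_random_effects = percent_difference(0.2180, 0.2087)
ignoring_random_effects = percent_difference(0.6179, 0.0781)

print(f"random effects mixture model: {with_random_effects:.2f}%")
print(f"mixture model without random effects: {ignoring_random_effects:.2f}%")
```

Under this reading, a value near 0% means the model-based standard errors track the empirical variability of the estimates, while values above 100% indicate that the model-based standard errors are only a small fraction of the empirical ones.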

To demonstrate the importance of accounting for random effects in terms of survival estimation, plots of the simulated survival curves corresponding to scenarios in Table 5.13 are shown in Figures 5.8 through 5.11 corresponding to the different true values of ψ (0.25, 0.5, 1, 2). When there is a strong presence of a random effect

(ψ =0.25, 0.5), the OU-TR mixture model inadequately fits the data. On the other hand, when there is less of a random effect present (ψ = 1, 2), the OU-TR mixture model does a better job in fitting the random effect data.

[Plot: "OU Estimated Survival vs True Survival Plots by Berns." Curves: OU-TR MM, OU-TR REMM, and True, each for Berns = 0 and Berns = 1. Axes: Survival Probability versus Time to Death (Years).]

Figure 5.8: Estimated Survival Curves for OU-TR Mixture Model and OU-TR Random Effects Mixture Model When Psi = 0.25 (Scenario 1 from Table 5.13).

[Plot: "OU Estimated Survival vs True Survival Plots by Berns." Curves: OU-TR MM, OU-TR REMM, and True, each for Berns = 0 and Berns = 1. Axes: Survival Probability versus Time to Death (Years).]

Figure 5.9: Estimated Survival Curves for OU-TR Mixture Model and OU-TR Random Effects Mixture Model When Psi = 0.5 (Scenario 2 from Table 5.13).

[Plot: "OU Estimated Survival vs True Survival Plots by Berns." Curves: OU-TR MM, OU-TR REMM, and True, each for Berns = 0 and Berns = 1. Axes: Survival Probability versus Time to Death (Years).]

Figure 5.10: Estimated Survival Curves for OU-TR Mixture Model and OU-TR Random Effects Mixture Model When Psi = 1 (Scenario 3 from Table 5.13).

[Plot: "OU Estimated Survival vs True Survival Plots by Berns." Curves: OU-TR MM, OU-TR REMM, and True, each for Berns = 0 and Berns = 1. Axes: Survival Probability versus Time to Death (Years).]

Figure 5.11: Estimated Survival Curves for OU-TR Mixture Model and OU-TR Random Effects Mixture Model When Psi = 2 (Scenario 4 from Table 5.13).

5.2.3 Application of OU-TR Random Effects Mixture Model to Time to Relapse Data from Patients with Melanoma

To demonstrate the use of the OU-TR random effects mixture model on real data, we examined time to relapse data from melanoma patients enrolled in the

E1690 clinical trial analyzed in Kirkwood et al. (2000). This data set was previously described in Chapter 4, Section 3. In that section, the OU-TR mixture model was

fit to the data and it was determined that the best fitting model contained nodal category (larger category number indicates larger spread of melanoma cells) in both the initial health and the mixing parameter. Likewise, in this section, we fit the

OU-TR random effects mixture model to the melanoma data incorporating nodal category in X0 and p. The OU-TR random effects mixture model data application was conducted using Matlab Version 7.9.

The results for the OU-TR mixture model and the OU-TR random effects mixture model are shown in Table 5.14. The estimated regression coefficients in X0 are close to each other under the two models, with the exception of the intercept term. This is expected since the mean of log(λ) ≠ 0, as shown in Equation (5.4). The estimate of ψ in the random effects model is 2.52. This relatively large value of ψ indicates there is small variability in the random effects.

Therefore only a small amount of unexplained heterogeneity is present when nodal category is in the model. The mixing parameter regression estimates are in close agreement between the two models.
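The link between ψ, the variability of the random effect, and the shift in the intercept can be illustrated by simulation. The sketch below assumes, purely for illustration, that the random effect λ follows a gamma distribution with shape ψ and scale 1/ψ, so that it has mean 1 and variance 1/ψ. Equation (5.4) is not reproduced in this section, so this parameterization is an assumption rather than the dissertation's stated form, and the function name is hypothetical.

```python
import math
import random


def summarize_random_effect(psi, n=50_000, seed=1):
    """Monte Carlo summary of lambda ~ Gamma(shape=psi, scale=1/psi).

    ASSUMPTION: this mean-one parameterization is illustrative only;
    it is not taken from Equation (5.4).
    """
    rng = random.Random(seed)
    # random.gammavariate(alpha, beta) takes shape alpha and scale beta.
    draws = [rng.gammavariate(psi, 1.0 / psi) for _ in range(n)]
    mean = sum(draws) / n
    var = sum((d - mean) ** 2 for d in draws) / (n - 1)
    mean_log = sum(math.log(d) for d in draws) / n
    return mean, var, mean_log


for psi in (0.25, 0.5, 1.0, 2.0, 2.5152):
    mean, var, mean_log = summarize_random_effect(psi)
    print(f"psi={psi:<7} mean={mean:.3f} var={var:.3f} "
          f"mean of log(lambda)={mean_log:.3f}")
```

Under this assumed parameterization, the variance of λ shrinks as ψ grows, and the mean of log(λ) is negative but moves toward zero. A model that omits λ would have to absorb that nonzero mean into its intercept, which is one way to read the intercept difference in Table 5.14.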

The goodness of fit of both the OU-TR mixture model and the OU-TR random effects mixture model is demonstrated in Figures 5.12 and 5.13. The estimated survival curves generated under both models are very similar to each other. It appears

OU-TR Mixture Model
Variable          Parameter Estimate   Std Err   P-value
X0:Intercept       0.3601              0.1201    0.0027
X0:Nodal Cat 1    -0.5058              0.1487    0.0007
X0:Nodal Cat 2    -0.6148              0.1530    <0.0001
X0:Nodal Cat 3    -0.6971              0.1544    <0.0001
P:Intercept       -0.0845              0.2506    0.7361
P:Nodal Cat 1     -0.4040              0.3240    0.2124
P:Nodal Cat 2     -1.0298              0.3763    0.0062
P:Nodal Cat 3     -1.3451              0.4029    0.0008

OU-TR Random Effects Mixture Model
Variable          Parameter Estimate   Std Err   P-value
X0:Intercept       0.6964              0.1891    0.0002
X0:Nodal Cat 1    -0.5607              0.1985    0.0047
X0:Nodal Cat 2    -0.7694              0.2042    0.0001
X0:Nodal Cat 3    -0.8744              0.2062    <0.0001
P:Intercept       -0.0995              0.2525    0.6892
P:Nodal Cat 1     -0.4002              0.3262    0.2191
P:Nodal Cat 2     -1.0247              0.3786    0.0068
P:Nodal Cat 3     -1.3329              0.4046    0.0010
ψ                  2.5152              0.9864    N/A

Table 5.14: Final OU-TR Mixture Model and OU-TR Random Effects Mixture Model for the Melanoma Data

the OU-TR random effects mixture model provides slightly better fit to the melanoma

data than the OU-TR mixture model. The standard errors of the regression estimates

in X0 are slightly higher under the random effects mixture model when compared to

the mixture model. We saw a similar trend in the simulation study. In Table 5.13,

simulation scenario 4 involved simulated random effects data where ψ = 2. We notice

that the OU-TR mixture model standard errors for the regression parameters in X0 were underestimated by roughly 56% in the simulation. In the data application, the standard errors from the OU-TR mixture model were 8% lower than those from the

OU-TR random effects mixture model.

[Plots, two panels: "OU MM and REMM Plots Versus Kaplan-Meier Plots for Node 0" and "OU MM and REMM Plots Versus Kaplan-Meier Plots for Node 1." Curves: OU REMM Model, OU MM Model, KM Plot. Axes: Relapse Free Probability versus Time to Relapse (Years).]

Figure 5.12: Goodness of Fit of Best BIC Melanoma Model (Mixture Model and Random Effects Mixture Model) for Nodal Categories 0 and 1

[Plots, two panels: "OU MM and REMM Plots Versus Kaplan-Meier Plots for Node 2" and "OU MM and REMM Plots Versus Kaplan-Meier Plots for Node 3." Curves: OU REMM Model, OU MM Model, KM Plot. Axes: Relapse Free Probability versus Time to Relapse (Years).]

Figure 5.13: Goodness of Fit of Best BIC Melanoma Model (Mixture Model and Random Effects Mixture Model) for Nodal Categories 2 and 3

5.3 Discussion

In Chapters 3 and 4, we introduced the Ornstein-Uhlenbeck threshold regression

and mixture models respectively. These models performed well in the data applications

and simulations, but they lacked the ability to account for unexplained between-

subject heterogeneity which may be present in biomedical studies. In this chapter,

we introduced a random effect component to each of these models to account for this

phenomenon.

We first examined the OU-TR random effects model which performed considerably

better than the OU-TR model in simulations, especially when there was considerable

variability in the random effects. Even when the data had a small random effect

variance, the OU-TR random effects model still did a much better job estimating

standard errors than the OU-TR model.

When fitting the OU-TR random effects model to the oropharynx data set described in Chapter 3, we observed estimates of the regression coefficients similar to those from the OU-TR model except for the intercept term. We expected this since there was minimal variability in the random effects indicated by the estimate of ψ. As a result,

the estimated survival curves generated under both these models were very similar

to each other. We must also note that, like the OU-TR model, the OU-TR random

effects model had difficulty fitting data with flat survival curves.

Next, we extended our OU-TR mixture model to include random effects. As

seen before with the OU-TR model, when the OU-TR mixture model (without

random effects) was applied to simulated data containing considerable variability in

the random effects, estimates of the regression coefficients and survival probabilities

were biased and standard errors were grossly underestimated. On the other hand,

when data had a small random effect presence, the OU-TR mixture model without

random effects did well with regard to bias and survival estimation, but the standard

errors of the estimates were still significantly underestimated. In contrast, when the

OU-TR random effects mixture model was applied to these data, the standard errors

of the estimates were accurate, bias was small and the estimated survival curve was

almost exactly equal to the true survival curve.

Lastly, when the OU-TR random effects mixture model was fit to the melanoma

data set described in Chapter 4, the resulting estimated regression coefficients in X0 were not much different from the estimates obtained under the OU-TR mixture model without random effects, except for the intercept term. Again, this was expected since our estimate of ψ was large, indicating a minimal random effect presence. The OU-TR mixture and OU-TR random effects mixture models also provided similar survival

estimates.

Based on the research conducted in this chapter, we have shown that both the

OU-TR random effects model and the OU-TR random effects mixture model can

be powerful tools in the field of survival analysis. They allow for proper estimation

of covariate effects when there are random effects present in the data and when

there are not. In human subject studies, this is extremely important since factors

influencing survival can either be unmeasured or immeasurable. Also, it may be

good research practice to apply the random effects model even though the random

effect presence is minimal. Regardless of the level of variability in the random effects,

estimates of the standard errors were more accurate under the random effects models

compared to those models that did not account for the random effects. Having accurate

standard error estimates plays a significant role in drawing inferences on the regression

estimates (e.g., Wald tests). On the other hand, we have to be careful with regard to the accuracy of survival estimates when applying the OU-TR random effects model when there is only a minimal random effect presence. When we analyzed the oropharynx data with the OU-TR random effects model, the survival estimates were slightly worse than those generated under the OU-TR model. Thus, we may have to sacrifice some accuracy in the survival estimates in order to obtain better standard errors of the regression estimates.

Further research should be conducted on these random effects models. In our real data analysis sections of this chapter, we fit our OU-TR random effects and

OU-TR random effects mixture models to data containing only a minimal amount of unexplained heterogeneity between the subjects. Demonstrating the ability of our random effects models to fit these types of data is important, but our models need to be applied to data with a significant random effect presence. This would further demonstrate the advantage of these models over the OU-TR and OU-TR mixture models.

We must also note that, in general, we expect the OU-TR random effects model to be a good fit to the survival data when subjects experience the event rather quickly.

However, when the data suggest that a longer time is needed for subjects to experience the event, our model fits the data poorly (similar results were obtained with the OU-TR models without random effects). Thus, adding the capability to account for unexplained between-subject heterogeneity did not improve model fit for this type of data.

Chapter 6: Conclusion

In this dissertation, we introduced several threshold regression models where the underlying stochastic process was the Ornstein-Uhlenbeck process. Homeostasis, defined as the tendency to regulate internal conditions by diffusing back and forth to stabilize health, is a characteristic found in many biological processes. The OU process is a natural model for the latent health process in threshold regression because it diffuses back and forth around its mean, which can be thought of as a point of homeostasis. Throughout this research, we regarded homeostasis as the point where the event occurs (called the threshold).

First, we introduced the Ornstein-Uhlenbeck threshold regression (OU-TR) model.

This model utilized the OU process to model a subject’s health where covariates are linked to the baseline level of health using a log-link function. Simulations demonstrated that maximum likelihood methods provided unbiased estimates of the regression coefficients in the log-link function for baseline health, and reliable standard errors were calculated for these estimates. We applied the OU-TR model to survival data of subjects afflicted with oropharynx cancer. The model fit the survival data well for those subjects who were less healthy, but underestimated survival for those subjects who were healthier (no disability and the tumor was not massive).

Next, we extended the OU-TR model to allow for a cure rate. This model allowed us to examine two types of covariate effects. Like the OU-TR model, we can examine covariate effects on baseline health. In addition, we can also examine covariate effects on the cure rate through use of a logistic function. Again, simulations demonstrated that maximum likelihood methods provided unbiased estimates of the regression coefficients in the log-link function as well as the regression coefficients in the logistic function. Also, reliable standard errors were calculated for these estimates. Our model was then applied to relapse free survival data of melanoma patients following definitive surgery and demonstrated excellent fit.
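To make the two kinds of covariate effect concrete, the sketch below evaluates a cure-rate mixture survival function of the standard form S_pop(t) = p + (1 - p) S(t), using the OU-TR mixture model estimates from Table 5.14. Several things here are assumptions for illustration rather than statements from the text: that p is the cure probability on the logit scale, that nodal category 0 is the reference level, that S(t) for susceptible subjects has the standardized form 2Φ(x0 / sqrt(e^{2t} - 1)) - 1 appearing in Appendix A, and that t is on the model's own time scale, which may differ from calendar years. All function and variable names are hypothetical.

```python
import math


def logistic(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))


def ou_survival(t, x0):
    """S(t) = 2*Phi(x0 / sqrt(e^{2t} - 1)) - 1, written with erf.

    Uses the identity 2*Phi(u) - 1 = erf(u / sqrt(2)).
    """
    if t <= 0.0:
        return 1.0
    return math.erf(x0 / math.sqrt(2.0 * math.expm1(2.0 * t)))


def mixture_survival(t, x0, p):
    """Cure-rate mixture: cured fraction p never experiences the event."""
    return p + (1.0 - p) * ou_survival(t, x0)


# OU-TR mixture model estimates from Table 5.14.
# ASSUMPTION: nodal category 0 is the reference level.
x0_coef = {0: 0.0, 1: -0.5058, 2: -0.6148, 3: -0.6971}
p_coef = {0: 0.0, 1: -0.4040, 2: -1.0298, 3: -1.3451}
x0_intercept, p_intercept = 0.3601, -0.0845

cure_prob = {}
for cat in range(4):
    x0 = math.exp(x0_intercept + x0_coef[cat])             # log link
    cure_prob[cat] = logistic(p_intercept + p_coef[cat])   # logistic link
    s1 = mixture_survival(1.0, x0, cure_prob[cat])
    print(f"nodal cat {cat}: x0={x0:.3f} cure prob={cure_prob[cat]:.3f} "
          f"S_pop(t=1)={s1:.3f}")
```

Under these assumptions, the estimated cure probability falls as nodal category increases, and the mixture survival curve levels off at the cure probability for large t.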

Even though the OU-TR and OU-TR mixture models performed very well in our

simulations and data examples, they lacked the ability to account for unexplained

between-subject heterogeneity which may be present in biomedical studies. To handle

this phenomenon, we introduced a random effect component to each of these models.

Our first model examined was the OU-TR random effects model. In simulation

studies, this model generated unbiased estimates and reliable standard error estimates

under various degrees of unexplained heterogeneity. We also examined the effects of

ignoring random effects on the estimation of the regression coefficients. When data

with a large random effect presence was modeled using the OU-TR model (ignoring

random effects), the intercept was biased, standard errors were underestimated and

survival estimates were inaccurate. When data had a small random effect presence,

the OU-TR model did well with regard to bias and survival estimates, but the

standard errors of the estimates were still significantly underestimated. We then

fit the OU-TR random effects model to the oropharynx cancer data set. Since the

random effect parameter estimate ψˆ indicated there was minimal variability in the

random effects, the fit of this random effects model was very similar to the fit obtained

using the OU-TR model. However, the OU-TR model provided a slightly better fit

to the data.

We also extended our OU-TR mixture model for data with a cure rate to include

random effects. In simulations, this model also generated unbiased estimates and

reliable standard error estimates under various degrees of unexplained heterogeneity.

There were some issues with bias and discordance between the SEE and SSE values

for the random effects parameter estimate ψˆ most notably when the censoring was more than 60%. However, survival estimates were unaffected by the bias, and, since the random effect variance is not of inferential interest, we need not be too concerned with the variance. Just as we did with the OU-TR random effects model, we studied the effects of ignoring random effects on the estimation of the regression coefficients.

When the OU-TR mixture model without random effects was fit to data generated with significant variability in the random effects, the coefficients in the initial health status model were biased and their standard errors were underestimated. When the generated data had small variability in the random effects, the OU-TR mixture model did well with regard to bias, but the standard errors of the estimates were still significantly underestimated. Lastly, the OU-TR random effects mixture model was fit to relapse free survival data of melanoma patients. Since the random effect parameter estimate ψˆ indicated there was minimal variability in the random effects, the fit of this random effects mixture model was very similar to the fit obtained using the OU-TR mixture model without random effects. However, the random effects mixture model provided a slightly better fit to the data.

We expect our OU-TR models to be better suited for some data sets than others.

The OU-TR models worked best for data corresponding to subjects whose health declined fairly rapidly toward the event under study (e.g., high grade cancer patients whose health declined rapidly toward death or relapse). On the other hand, our models tended to underestimate survival for subjects whose health declined slowly toward the event (e.g., time to death of low grade cancer patients). This makes logical sense because our OU-TR model pulls the subject’s health toward the threshold which we set as the mean of the OU process. Thus, data with survival curves that decrease at a much slower rate will not correspond as well with the mean-reverting property of the OU process.
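The pull toward the threshold can be seen in a small simulation. The sketch below simulates one standardized OU process whose mean is the threshold 0, namely dX = -X dt + sqrt(2) dW started at x0 > 0, and compares the simulated proportion of paths that have not yet hit 0 with the closed form 2Φ(x0 / sqrt(e^{2t} - 1)) - 1 that appears in Appendix A. The choice of this particular standardization, the step size, and the number of paths are illustrative assumptions, and the function names are hypothetical. Because the paths are only monitored on a time grid, the simulated survival is expected to sit slightly above the exact value.

```python
import math
import random


def ou_fht_survival(t, x0):
    """Closed form 2*Phi(x0 / sqrt(e^{2t} - 1)) - 1, written with erf."""
    if t <= 0.0:
        return 1.0
    return math.erf(x0 / math.sqrt(2.0 * math.expm1(2.0 * t)))


def simulate_survival(t_end, x0, n_paths=2000, dt=0.002, seed=0):
    """Fraction of simulated paths that stay above 0 up to time t_end.

    Uses the exact one-step transition of dX = -X dt + sqrt(2) dW:
    X(t + dt) | X(t) ~ Normal(X(t) * exp(-dt), 1 - exp(-2 * dt)).
    """
    rng = random.Random(seed)
    decay = math.exp(-dt)
    step_sd = math.sqrt(-math.expm1(-2.0 * dt))
    n_steps = int(round(t_end / dt))
    survived = 0
    for _ in range(n_paths):
        x = x0
        for _ in range(n_steps):
            x = x * decay + step_sd * rng.gauss(0.0, 1.0)
            if x <= 0.0:      # threshold reached: the event occurs
                break
        else:
            survived += 1     # never reached the threshold
    return survived / n_paths


estimate = simulate_survival(t_end=1.0, x0=1.0)
exact = ou_fht_survival(1.0, 1.0)
print(f"simulated S(1) = {estimate:.3f}, closed form S(1) = {exact:.3f}")
```

In this standardized form the closed-form survival function tends to 0 as t grows for any starting value, so on its own it has no plateau. This is consistent with the difficulty in fitting flat or slowly declining survival curves described above.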

The OU-TR mixture models we proposed are most appropriate for data containing subjects who were either immediately cured from treatment or were not cured and their health declined rapidly toward death or relapse. A good example would be the melanoma data studied in Chapters 4 and 5. Those with the least severe nodal category were better off (cured) after the malignancy was removed than those who had a more severe nodal category (not cured). The health of those subjects with a more severe nodal category tended to decline rapidly toward death from melanoma.

If the data contained subjects who were not cured, but did not exhibit a rapid decline in health toward the event, the OU-TR mixture model may not be a good fit.

Future research should address the aforementioned limitations of the OU-TR mixture model. For instance, it may be possible to extend the OU-TR mixture model by adding a time-varying component. We could section the survival time interval into m components allowing separate cure rate modeling within each interval. For example, we could model treatment for a particular disease involving administration

of medication on a set dosing schedule (m medication administrations). There would be m intervals where time 0 equals the first dosing and the last interval is from time of last dosing to end of follow-up. Then we would define a different cure rate for each of the m intervals. Thus, in each interval, we would have subjects who were cured

and subjects who were not cured and their health would decline according to the OU

process and approach the threshold. Similar extensions of Wiener process threshold

regression models can be found in Lee et al. (2010) which includes time-varying

covariates and in Li and Lee (2011) which involves time-varying coefficients.

We could also apply the Cox proportional hazards model to the carcinoma of

the oropharynx and melanoma data sets. Then we can compare the effects of the

covariates on survival between the Cox model and our OU-TR models to further

validate the results contained in this research.

Another area of future research to consider, in the context of the random effects

models, is to apply our models to data with a significant presence of between-subject

heterogeneity. In our real data analysis sections of Chapter 5, we fit our OU-

TR random effects and OU-TR random effects mixture models to data containing

only a minimal amount of unexplained heterogeneity present between the subjects.

Demonstrating the ability of our random effects models to fit these types of data was

important, but our models should also be applied to data with a significant random

effect presence. This would better demonstrate the advantage of these models over

the OU-TR and OU-TR mixture models.

We could also extend our OU-TR random effects model to accommodate clustered

survival data. In this type of data scenario, the random effect accounts for between

cluster heterogeneity where subjects within the cluster are considered to have similar

characteristics. Adding this random effect accounts for the dependency within the clusters and allows us to model the survival data assuming independence, conditional on the random effect. Saebo et al. (2004) modeled mastitis resistance in dairy cattle as first-hitting times using a Wiener process and accounted for heterogeneity within the population by expressing process parameters as functions of random variables shared between cows from a particular sire. This sire effect (potentially due to the genetic traits of the sire) could account for the particular sire’s ability to produce calves that are resistant to mastitis. Potentially an extended version of our model could be applied to the Saebo et al. data. In this scenario, all cattle from the jth sire (j = 1,...,k) would share the same random effect (λj) in the model for the initial health status. This type of random effects model could also be used in multi-center cancer clinical trial studies such as found in Glidden and Vittinghoff (2004) and the

λ’s would capture differences across study centers not accounted for by the predictors in the model. This model differs from the random effects models presented in this dissertation, which capture unexplained between-subject heterogeneity.

Currently, we are unable to pick an arbitrary mean for the OU process to revert to since there is no closed form for the first hitting distribution in the general case. A general OU process model, which would allow all of the parameters defining the OU process to be estimated from the data, may improve fit in datasets that we currently have difficulty modeling. However, derivation of the first hitting time distribution in the general case is extremely difficult and has never been accomplished.

Bibliography

Aalen, O. O. and Gjessing, H. K. (2001). Understanding the shape of the hazard rate: A process point of view. Statistical Science 16, 1-14.

Aalen, O. O. and Gjessing, H. K. (2004). Survival models based on the Ornstein-Uhlenbeck Process. Lifetime Data Analysis 10, 407-423.

Aalen, O. O., Borgan, O. and Gjessing, H. K. (2008). Survival and Event History Analysis. Springer: New York.

American Cancer Society, Melanoma Skin Cancer, 28 April 2011, (http://www.cancer.org/Cancer/SkinCancer-Melanoma/DetailedGuide/melanoma-skin-cancer-staging). Retrieved August 7, 2011.

Berkson, J. and Gage, R.P., (1952). Survival curve for cancer patients following treatment. Journal of the American Statistical Association 47, 501-515.

Blessing, W. W. (1987). The Lower Brainstem and Bodily Homeostasis. Oxford University Press: New York.

Boscardin, J.W., Taylor, J.M.G, Law, N., (1998). Longitudinal models for AIDS marker data. Statistical Methods in Medical Research 7, 13-27.

Cely, C. M. et al. (2004). Relationship of baseline glucose homeostasis to hyperglycemia during medical critical illness. Chest 126, 879-887.

Chen, M. H., Harrington, D. P., Ibrahim, J. G. (2002). Bayesian cure rate models for malignant melanoma: A case study of Eastern Cooperative Oncology Group trial E1690. Applied Statistics 51, 135-150.

Chiras, D. D. (2005). Human Biology. Jones and Bartlett Learning: Sudbury.

Chhikara, R. S. and Folks, J. L. (1989). The Inverse Gaussian Distribution: Theory, Methods, and Applications. Marcel Dekker: New York.

Cox, D. R. (1972). Regression Models and Life Tables. Journal of the Royal Statistical Society. Series B 34, 187-220.

Cox, D. R. and Miller, H. D. (1965). The Theory of Stochastic Processes. Wiley: New York.

de Leeuw, P. W. and Dees, A. (2003). Fluid homeostasis in chronic obstructive lung disease. European Respiratory Journal 22, Supplement 46, 33s-40s.

Duchesne, T., Lawless, J. (2004). Alternative time scales and failure time models. Lifetime Data Analysis 6, 157-179.

Eaton, W. W. and Whitmore, G. A. (1977). Length of stay as a stochastic process: A general approach and application to hospitalization for schizophrenia. J. Math. Sociology 5, 273-292.

Efron, B. and Hinkley, D. V. (1978). The observed versus expected information. Biometrika. Series B 65, 457-487.

Farewell, V. T. (1982). The use of mixture models for the analysis of survival data with long-term survivors. Biometrics 38, 1041-1046.

Feller, W. (1966). An Introduction to Probability Theory and Its Applications. Wiley: New York.

Finch, S. (2004). Ornstein-Uhlenbeck Process. (http://algo.inria.fr/csolve/ou.pdf). Retrieved 1 July 2009.

Fry, T. J. et al. (2001). A potential role for interleukin-7 in T-cell homeostasis. Blood 97, 2983-2990.

Garshick, E., Laden, F., Hart, J. E., Rosner, B., Smith, T. J., Dockery, D. W., and Speizer, F. E. (2004). Lung cancer in railroad workers exposed to diesel exhaust. Environmental Health Perspectives 112, 1539-1543.

Glidden, D. V. and Vittinghoff, E. (2004). Modelling clustered survival data from multicentre clinical trials. Statistics in Medicine 23, 369-388.

Horrocks, J. and Thompson, M. E. (2004). Modeling event times with multiple outcomes using the Wiener process with drift. Lifetime Data Analysis 10, 29-49.

Horsthemke, W. and Lefever, R. (1984). Noise-Induced Transitions: Theory and Applications in Physics, Chemistry and Biology. Springer: Berlin.

Hougaard, P., (1991). Modeling Heterogeneity in survival data. Journal of Applied Probability 28, 695-701.

Ibrahim, J. G., Chen, M-H., Sinha, D. (2001). Bayesian Survival Analysis. Springer: New York.

Kalbfleisch, J. D. (1978). Nonparametric Bayesian analysis of survival time data. Journal of the Royal Statistical Society, Series B 40, 214-221.

Kalbfleisch, J. D. and Prentice, R. L. (1980). The Statistical Analysis of Failure Time Data. Wiley: Hoboken, NJ.

Kirkwood, J. et al. (2000). High and Low-Dose Interferon Alfa-2b in High-Risk Melanoma: First Analysis of Intergroup Trial E1690/S9111/C9190. Journal of Clinical Oncology 18, 2444-2458.

Keiding, N., Andersen, P. K., Klein, J. P. (1997). The role of frailty models and accelerated failure time models in describing heterogeneity due to omitted covariates. Statistics in Medicine 16, 215-224.

Klein, J. P. and Moeschberger, M. L. (2003). Survival Analysis: Techniques for Censored and Truncated Data, Second Edition. Springer: New York.

Kuk, A. Y. C. and Chen, C-H. (1992). A mixture model combining logistic regression with proportional hazards regression. Biometrika 79, 531-541.

Kyprianou, A. E. (2006). Introductory Lectures on Fluctuations of Levy Processes with Applications. Springer: Berlin.

Laska, E. M. and Meisner, M. J. (1992). Nonparametric estimation and testing in a cure rate model. Biometrics 48, 1223-1234.

Lawless, J. (2003). Statistical Models and Methods for Lifetime Data, Second Edition. Wiley: Hoboken, NJ.

Lawless, J. and Crowder, M. (2004). Covariates and random effects in a gamma process model with application to degradation and failure. Lifetime Data Analysis 10, 213-227.

Lee, M. L. T., Chang, M., Whitmore, G. A. (2008). A threshold regression mixture model for assessing treatment efficacy in a multiple myeloma clinical trial. Journal of Biopharmaceutical Statistics 18, 1136-1149.

Lee, M. L. T., DeGruttola, V., Schoenfeld, D. (2000). A model for markers and latent health status. Journal of the Royal Statistical Society. Series B 62, 747-762.

Lee, M. L. T., Whitmore, G. A. et al. (2004). Assessing lung cancer risk in railroad workers using a first hitting time regression model. Environmetrics 15, 501-512.

Lee, M. L. T., Whitmore, G. A. (2006). Threshold regression for survival analysis: modeling event times by a stochastic process reaching a boundary. Statistical Science 21, 501-513.

Lee, M. L. T., Whitmore, G. A. et al. (2009). A case-control study relating railroad worker mortality to diesel exhaust exposure using a threshold regression model. J Stat Plan Inference 139(5), 1633-1642.

Lee, M. L. T., Whitmore, G. A., and Rosner, C. (2010). Threshold regression for survival data with time-varying covariates. Statistics in Medicine 29, 896-905.

Li, J. and Lee, M. L. T. (2011). Analysis of failure time using threshold regression with semi-parametric varying coefficients. Statistica Neerlandica 65, 164-182.

Li, Q., Shen, X. and Pearl, D. K. (2007). Bayesian modeling of the dynamics of hepatotoxicity. Statistics in Medicine 26(19), 3525-3680.

Linden, M. (2000). Modeling strike duration distribution: A controlled Wiener process approach. Applied Stochastic Models in Business and Industry 16, 35-45.

Lindqvist, B. H. and Skogsrud, G. (2009). Modeling of dependent competing risks by first passage times of Wiener processes. IIE Transactions 41, 72-80.

Meyers, L. E., (1981). Survival functions induced by stochastic covariate processes. Journal of Applied Probability 18, 523-529.

Oakes, D. (1995). Multiple time scales in survival analysis. Lifetime Data Analysis 1, 7-18.

Pennell, M. L., Whitmore, G. A., Lee, M. L. T. (2010). Bayesian random-effects threshold regression with application to survival data with non proportional hazards. Biostatistics 11, 111-126.

Prabhu, N. U. (1965). Stochastic Processes: Basic Theory and its Applications. Macmillan: New York.

Ricciardi, L. M. and Sato, S. (1988). First-passage-time density and moments for the Ornstein-Uhlenbeck process. Journal of Applied Probability 25, 43-57.

Saebo, S., Almoy, T., and Aastveit, A. H. (2005). Disease resistance modeled as first-passage times of genetically dependent stochastic processes. Applied Statistics 54, 273-285.

Saebo, S., Almoy, T., Heringstad, B., Klemetsdal, G. and Aastveit, A. H. (2005). Genetic evaluation of mastitis resistance using a first-passage time model for Wiener processes for analysis of time to first treatment. Journal of Dairy Science 88, 834-841.

Singpurwalla, N. D. (1995). Survival in dynamic environments. Statistical Science 1, 86-103.

Taylor, J. M. G. (1995). Semi-parametric estimation in failure time mixture models. Biometrics 51, 899-907.

Tong, X, He, X., Sun, J. and Lee, M. L. T. (2008). Joint analysis of current status and marker data: an extension of a bivariate threshold model. The International Journal of Biostatistics 4, 1122-1135.

Vaupel, J. M., Manton, K. G., and Stallard, E. (1979). The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography 16, 439-454.

Whitmore, G. A. (1979). An Inverse Gaussian Model for Labour Turnover. Journal of the Royal Statistical Society. Series B 142, 468-478.

Whitmore, G. A. (1986). Normal-gamma Mixtures of Inverse Gaussian Distributions. Scandinavian Journal of Statistics 13, 211-220.

Whitmore, G. A. and Su, Y. (1979). Modeling low birth weights using threshold regression: results for U. S. birth data. Lifetime Data Analysis 13, 161-190.

Whitmore, G. A., Crowder, M. J. and Lawless, J. F. (1998). Failure inference from a marker process based on a bivariate Wiener model. Lifetime Data Analysis 4, 229-251.

Woodbury, M. A. and Manton, K. G. (1977). A random-walk model of human mortality and aging. Theoretical Population Biology 11, 37-48.

Xiao, T., Whitmore, G. A., He, X., and Lee, M. L. T. (2012). Threshold regression for time-to-event analysis: the stthreg package. The Stata Journal, In Press.

Yamaguchi, K. (1992). Accelerated failure-time regression models with a regression model of surviving fraction: An application to the analysis of “permanent employment” in Japan. Journal of the American Statistical Assoc. 87, 284-292.

Yashin, A. I. (1985). Dynamics in survival analysis: conditional Gaussian property versus Cameron-Martin formula, in: Statistics and Control of Stochastic Processes, Springer: New York.

Appendices

Appendix A: Standard Error Derivations for OU-TR Model

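The likelihood under this model, assuming that the first $m$ observations are event times and the last $N - m$ are censoring times, is

\[
L = \prod_{i=1}^{m}\left[\sqrt{\frac{2}{\pi}}\, x_{0i}\, \frac{e^{2t_i}}{(e^{2t_i}-1)^{3/2}} \exp\left(-\frac{x_{0i}^2}{2(e^{2t_i}-1)}\right)\right] \prod_{i=m+1}^{N}\left[2\Phi\left(\frac{x_{0i}}{\sqrt{e^{2t_i}-1}}\right)-1\right]
\]

and the corresponding log-likelihood is

\[
\log(L) = \sum_{i=1}^{m}\log\left[\sqrt{\frac{2}{\pi}}\, x_{0i}\, \frac{e^{2t_i}}{(e^{2t_i}-1)^{3/2}} \exp\left(-\frac{x_{0i}^2}{2(e^{2t_i}-1)}\right)\right] + \sum_{i=m+1}^{N}\log\left[2\Phi\left(\frac{x_{0i}}{\sqrt{e^{2t_i}-1}}\right)-1\right].
\]

First we define several vectors to make the presentation cleaner. The regression coefficient vector is given by $\gamma = (\gamma_0,\gamma_1,\ldots,\gamma_p)^T$ and the covariate vector is given by $Z_i = (z_0,z_1,\ldots,z_p)^T$. Also, $f_z(\cdot)$ is the standard normal density function and $\Phi(\cdot)$ is the standard normal cumulative distribution function. Rewriting the log-likelihood in vector form, with $x_{0i} = \exp(\gamma^T Z_i)$, we get

\[
\log(L) = \sum_{i=1}^{m}\left[\log\left(\frac{\sqrt{2/\pi}}{(e^{2t_i}-1)^{3/2}}\right) + 2t_i + \gamma^T Z_i - \frac{\exp(2\gamma^T Z_i)}{2(e^{2t_i}-1)}\right] + \sum_{i=m+1}^{N}\log\left[2\Phi\left(\frac{\exp(\gamma^T Z_i)}{\sqrt{e^{2t_i}-1}}\right)-1\right].
\]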
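The vector form of the log-likelihood can be evaluated directly. The following is a minimal Python sketch, not the author's Matlab OUTRlikelihood function; the function names and the toy data are hypothetical and serve only to show how the event and censoring terms are assembled.

```python
import math

def std_normal_cdf(x):
    """Standard normal cumulative distribution function, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def outr_loglik(gamma, Z, t, c):
    """OU-TR log-likelihood in vector form.

    gamma : regression coefficients (gamma_0, ..., gamma_p)
    Z     : covariate rows Z_i, each starting with 1 for the intercept
    t     : observation times t_i (> 0)
    c     : event indicators (1 = event, 0 = censored)
    """
    total = 0.0
    for z_i, t_i, c_i in zip(Z, t, c):
        eta = sum(g * z for g, z in zip(gamma, z_i))   # gamma^T Z_i = log(x_0i)
        d = math.expm1(2.0 * t_i)                       # e^{2 t_i} - 1
        if c_i == 1:
            # log of the first hitting time density at t_i
            total += (0.5 * math.log(2.0 / math.pi) - 1.5 * math.log(d)
                      + 2.0 * t_i + eta - math.exp(2.0 * eta) / (2.0 * d))
        else:
            # log of the survival function 2 * Phi(x_0i / sqrt(d)) - 1
            a = math.exp(eta) / math.sqrt(d)
            total += math.log(2.0 * std_normal_cdf(a) - 1.0)
    return total

# Toy data (hypothetical): two events and one censored subject,
# with an intercept and one covariate.
Z = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0]]
t = [0.8, 0.4, 2.0]
c = [1, 1, 0]
print(outr_loglik([0.5, -0.3], Z, t, c))
```

Negating this function gives an objective that a general-purpose minimizer could work on, which mirrors the role the OUTRlikelihood function plays with fminsearch in Appendix G.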

Taking the second derivative of the log-likelihood with respect to $\gamma$ we get

\[
\begin{aligned}
\frac{\partial^2 \log(L)}{\partial\gamma\,\partial\gamma^T} ={}& \sum_{i=1}^{m}\left[\frac{-2\exp(2\gamma^T Z_i)\, Z_i Z_i^T}{e^{2t_i}-1}\right] \\
&+ \sum_{i=m+1}^{N} \frac{\left\{-2 f_z\!\left(\tfrac{\exp(\gamma^T Z_i)}{\sqrt{e^{2t_i}-1}}\right)\left[2\Phi\!\left(\tfrac{\exp(\gamma^T Z_i)}{\sqrt{e^{2t_i}-1}}\right)-1\right]\left(\tfrac{\exp(\gamma^T Z_i)}{\sqrt{e^{2t_i}-1}}\right)^{3} - 4\left[f_z\!\left(\tfrac{\exp(\gamma^T Z_i)}{\sqrt{e^{2t_i}-1}}\right)\right]^{2}\left(\tfrac{\exp(\gamma^T Z_i)}{\sqrt{e^{2t_i}-1}}\right)^{2}\right\} Z_i Z_i^T}{\left[2\Phi\!\left(\tfrac{\exp(\gamma^T Z_i)}{\sqrt{e^{2t_i}-1}}\right)-1\right]^{2}} \\
&+ \sum_{i=m+1}^{N} \frac{2 f_z\!\left(\tfrac{\exp(\gamma^T Z_i)}{\sqrt{e^{2t_i}-1}}\right)\left(\tfrac{\exp(\gamma^T Z_i)}{\sqrt{e^{2t_i}-1}}\right) Z_i Z_i^T}{2\Phi\!\left(\tfrac{\exp(\gamma^T Z_i)}{\sqrt{e^{2t_i}-1}}\right)-1}.
\end{aligned}
\]

The negative inverse of this matrix, evaluated at the MLE of $\gamma$, is the estimated variance-covariance matrix. The square roots of the diagonal elements of this inverted matrix equal the estimated standard errors of the maximum likelihood estimators of the model parameters. The derivations in this appendix are original work by the author.
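As a numerical illustration of this appendix, the sketch below evaluates the analytic second derivative for an intercept-only model (so $Z_i = 1$ and $\gamma$ is a scalar) and compares it with a central finite difference of the log-likelihood. This is a Python sketch under stated assumptions, not the author's Matlab code: the data and the evaluation point `g` are hypothetical, and the last quantity is a standard error only when `g` is the MLE.

```python
import math

def phi(x):
    """Standard normal density f_z."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def loglik(g, t, c):
    """Intercept-only OU-TR log-likelihood with x_0 = exp(g)."""
    total = 0.0
    for t_i, c_i in zip(t, c):
        d = math.expm1(2.0 * t_i)
        if c_i == 1:
            total += (0.5 * math.log(2.0 / math.pi) - 1.5 * math.log(d)
                      + 2.0 * t_i + g - math.exp(2.0 * g) / (2.0 * d))
        else:
            total += math.log(2.0 * Phi(math.exp(g) / math.sqrt(d)) - 1.0)
    return total

def second_derivative(g, t, c):
    """Analytic second derivative from this appendix with Z_i = 1."""
    h = 0.0
    for t_i, c_i in zip(t, c):
        d = math.expm1(2.0 * t_i)
        if c_i == 1:
            h += -2.0 * math.exp(2.0 * g) / d
        else:
            a = math.exp(g) / math.sqrt(d)
            s = 2.0 * Phi(a) - 1.0
            h += (-2.0 * phi(a) * s * a ** 3 - 4.0 * phi(a) ** 2 * a ** 2) / s ** 2
            h += 2.0 * phi(a) * a / s
    return h

t = [0.3, 0.6, 0.9, 1.5, 2.0]   # hypothetical observation times
c = [1, 1, 1, 0, 0]             # 1 = event, 0 = censored
g = 0.2                         # hypothetical evaluation point

step = 1e-4
numeric = (loglik(g + step, t, c) - 2.0 * loglik(g, t, c) + loglik(g - step, t, c)) / step ** 2
analytic = second_derivative(g, t, c)
std_error = math.sqrt(-1.0 / analytic)   # scalar version of sqrt(diag(-H^{-1}))
print(analytic, numeric, std_error)
```

With more than one coefficient the same idea applies, with the scalar reciprocal replaced by a matrix inverse.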

Appendix B: Standard Error Derivations for OU-TR Mixture Model

The likelihood under this model, assuming that the first $m$ observations are event times and the last $N - m$ are censoring times, is

\[
L = \prod_{i=1}^{m}\left[(1-p_i)\sqrt{\frac{2}{\pi}}\, X_{0i}\, \frac{e^{2t_i}}{(e^{2t_i}-1)^{3/2}} \exp\left(-\frac{X_{0i}^2}{2(e^{2t_i}-1)}\right)\right] \prod_{i=m+1}^{N}\left\{p_i + (1-p_i)\left[2\Phi\left(\frac{X_{0i}}{\sqrt{e^{2t_i}-1}}\right)-1\right]\right\}
\]

and the corresponding log-likelihood is

\[
\begin{aligned}
\log(L) ={}& \sum_{i=1}^{m}\left[\log(1-p_i) + \log\left(\sqrt{\frac{2}{\pi}}\right) + \log(X_{0i}) + \log\left(\frac{e^{2t_i}}{(e^{2t_i}-1)^{3/2}}\right) - \frac{X_{0i}^2}{2(e^{2t_i}-1)}\right] \\
&+ \sum_{i=m+1}^{N}\log\left\{p_i + (1-p_i)\left[2\Phi\left(\frac{X_{0i}}{\sqrt{e^{2t_i}-1}}\right)-1\right]\right\}.
\end{aligned}
\]

In this model, $X_{0i} = \exp(\gamma^T Z_i)$ where the regression coefficient vector is given by $\gamma = (\gamma_0,\gamma_1,\ldots,\gamma_p)^T$ and the covariate vector is given by $Z_i = (z_0,z_1,\ldots,z_p)^T$. The mixing parameter is defined as $p_i = \exp(\beta^T W_i)/(1 + \exp(\beta^T W_i))$ where the mixing parameter regression coefficient vector is $\beta = (\beta_0,\beta_1,\ldots,\beta_q)^T$ and $W_i = (w_0,w_1,\ldots,w_q)^T$ is the covariate vector. Note that $Z_i$ and $W_i$ need not be the same set of covariates. For ease of readability in the following derivations, $p_i$ and $X_{0i}$ are not shown in vector form. Also, $f_z(\cdot)$ is the standard normal density function, $\Phi(\cdot)$ is the standard normal cumulative distribution function and we denote $\star = X_{0i}/\sqrt{e^{2t_i}-1}$.
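The mixture log-likelihood differs from the OU-TR log-likelihood only through the mixing parameter $p_i$. A minimal Python sketch follows, assuming the log link for $X_{0i}$ and the logistic form of $p_i$ stated above; the function name and arguments are hypothetical and this is not the author's OUTRMMlikelihood function.

```python
import math

def std_normal_cdf(x):
    """Standard normal cumulative distribution function, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def outr_mixture_loglik(gamma, beta, Z, W, t, c):
    """Log-likelihood of the OU-TR mixture model.

    Z, W : covariate rows for initial health and for the mixing parameter
           (each row starts with 1 for the intercept)
    t, c : observation times and event indicators (1 = event, 0 = censored)
    """
    total = 0.0
    for z_i, w_i, t_i, c_i in zip(Z, W, t, c):
        x0 = math.exp(sum(g * z for g, z in zip(gamma, z_i)))   # X_0i = exp(gamma^T Z_i)
        lin = sum(b * w for b, w in zip(beta, w_i))              # beta^T W_i
        p = 1.0 / (1.0 + math.exp(-lin))                         # p_i, logistic in beta^T W_i
        d = math.expm1(2.0 * t_i)                                # e^{2 t_i} - 1
        if c_i == 1:
            # events can only come from the susceptible fraction (1 - p_i)
            total += (math.log(1.0 - p) + 0.5 * math.log(2.0 / math.pi) + math.log(x0)
                      + 2.0 * t_i - 1.5 * math.log(d) - x0 * x0 / (2.0 * d))
        else:
            # censored subjects are either cured or susceptible and still event-free
            surv = 2.0 * std_normal_cdf(x0 / math.sqrt(d)) - 1.0
            total += math.log(p + (1.0 - p) * surv)
    return total

# Toy call (hypothetical data): one event and one censored subject, intercepts only.
print(outr_mixture_loglik([0.0], [0.0], [[1.0], [1.0]], [[1.0], [1.0]], [1.0, 1.0], [1, 0]))
```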

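The second derivative of the log-likelihood with respect to $\gamma$ is

\[
\begin{aligned}
\frac{\partial^2 \log(L)}{\partial\gamma\,\partial\gamma^T} ={}& \sum_{i=1}^{m}\left[\frac{-2X_{0i}^2\, Z_i Z_i^T}{e^{2t_i}-1}\right] + \sum_{i=m+1}^{N}\left[\frac{2(1-p_i)\,Z_i}{\sqrt{e^{2t_i}-1}}\right] \\
&\times \left(\frac{f_z(\star)X_{0i}Z_i^T(1-\star^2)\left\{p_i + (1-p_i)[2\Phi(\star)-1]\right\} - 2(1-p_i)(f_z(\star))^2 \star X_{0i}Z_i^T}{\left\{p_i + (1-p_i)[2\Phi(\star)-1]\right\}^2}\right).
\end{aligned}
\]

Now, we must calculate $\partial^2\log(L)/\partial\gamma\,\partial\beta^T$. This expression is

\[
\frac{\partial^2 \log(L)}{\partial\gamma\,\partial\beta^T} = \sum_{i=m+1}^{N} \frac{-2 f_z(\star)\left(\dfrac{X_{0i}Z_i}{\sqrt{e^{2t_i}-1}}\right) e^{W_i^T\beta}\, W_i^T}{\left\{(1 + e^{W_i^T\beta})\,(p_i + (1-p_i)[2\Phi(\star)-1])\right\}^2}.
\]

Continuing the formulation of the observed information matrix, we now calculate the second derivative of the log-likelihood with respect to $\beta$ to get

\[
\begin{aligned}
\frac{\partial^2 \log(L)}{\partial\beta\,\partial\beta^T} ={}& \sum_{i=1}^{m}\left(\frac{-e^{W_i^T\beta}\, W_i W_i^T}{(1 + e^{W_i^T\beta})^2}\right) + \sum_{i=m+1}^{N}\left(\frac{1 - [2\Phi(\star)-1]}{(p_i + (1-p_i)[2\Phi(\star)-1])\,(1 + e^{W_i^T\beta})^4}\right) \\
&\times \left\{\frac{-e^{2W_i^T\beta}\,(1 - [2\Phi(\star)-1])\, W_i W_i^T}{p_i + (1-p_i)[2\Phi(\star)-1]} + e^{W_i^T\beta}\,(1 - e^{2W_i^T\beta})\, W_i W_i^T\right\}.
\end{aligned}
\]

Then, calculating $\partial^2\log(L)/\partial\beta\,\partial\gamma^T$, we get

\[
\frac{\partial^2 \log(L)}{\partial\beta\,\partial\gamma^T} = \sum_{i=m+1}^{N} \frac{-2 e^{W_i^T\beta}\, f_z(\star)\, \star\, W_i Z_i^T}{(1 + e^{W_i^T\beta})^2\,(p_i + (1-p_i)[2\Phi(\star)-1])^2}.
\]

Thus, the observed information matrix is

\[
\begin{pmatrix}
-\left.\dfrac{\partial^2\log(L)}{\partial\gamma\,\partial\gamma^T}\right|_{\gamma=\hat{\gamma}_{MLE}} & -\left.\dfrac{\partial^2\log(L)}{\partial\gamma\,\partial\beta^T}\right|_{\gamma=\hat{\gamma}_{MLE},\ \beta=\hat{\beta}_{MLE}} \\[2ex]
-\left.\dfrac{\partial^2\log(L)}{\partial\beta\,\partial\gamma^T}\right|_{\gamma=\hat{\gamma}_{MLE},\ \beta=\hat{\beta}_{MLE}} & -\left.\dfrac{\partial^2\log(L)}{\partial\beta\,\partial\beta^T}\right|_{\beta=\hat{\beta}_{MLE}}
\end{pmatrix}
\]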

The inverse of this matrix is the estimated variance-covariance matrix. The square roots of the diagonal elements of this inverted matrix equal the estimated standard errors of the maximum likelihood estimators of the model parameters. The derivations in this appendix are original work by the author.

Appendix C: Random Effects Density Function and Survival Function Derivations

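We begin our derivation of the first hitting time density function by adding the random effect ($\lambda$) to the OU-TR model through the initial health component. We define this as $X_0 = \lambda\exp(Z^T\beta)$. Then, our density function becomes

\[
f(t \mid \lambda) = \sqrt{\frac{2}{\pi}}\,\lambda X_0\,\frac{e^{2t}}{(e^{2t}-1)^{3/2}}\exp\left(-\frac{(\lambda X_0)^2}{2(e^{2t}-1)}\right).
\]

This is the pdf of the event time given $\lambda$. We let $\phi = \lambda^2 \sim \mathrm{gamma}(\psi,\psi)$. We selected $\phi = \lambda^2$ to be distributed as gamma($\psi$,$\psi$) since it is a conjugate pair with our likelihood; we can integrate over the space of random effects and end up with a closed form marginal pdf. We substitute $\phi = \lambda^2$ into the above conditional pdf to get

\[
f(t \mid \phi) = \sqrt{\frac{2}{\pi}}\,\sqrt{\phi}\, X_0\,\frac{e^{2t}}{(e^{2t}-1)^{3/2}}\exp\left(-\frac{\phi X_0^2}{2(e^{2t}-1)}\right).
\]

From the definition of conditional density, the joint density follows as $f(t,\phi) = f(t \mid \phi)f(\phi)$. Integrating this joint density over the space of $\phi$ we can get the marginal pdf of $t$ by performing the following calculations

\[
\begin{aligned}
f(t) &= \int_0^\infty \sqrt{\frac{2}{\pi}}\,\sqrt{\phi}\,X_0\,\frac{e^{2t}}{(e^{2t}-1)^{3/2}}\exp\left(-\frac{\phi X_0^2}{2(e^{2t}-1)}\right)\frac{\psi^\psi}{\Gamma(\psi)}\,\phi^{\psi-1}e^{-\phi\psi}\,d\phi \\
&= \sqrt{\frac{2}{\pi}}\,X_0\,\frac{e^{2t}}{(e^{2t}-1)^{3/2}}\,\frac{\psi^\psi}{\Gamma(\psi)}\int_0^\infty \exp\left(-\frac{\phi X_0^2}{2(e^{2t}-1)} - \phi\psi\right)\phi^{\psi-1}\sqrt{\phi}\,d\phi \\
&= \sqrt{\frac{2}{\pi}}\,X_0\,\frac{e^{2t}}{(e^{2t}-1)^{3/2}}\,\frac{\psi^\psi}{\Gamma(\psi)}\int_0^\infty \exp\left(-\phi\left[\psi + \frac{X_0^2}{2(e^{2t}-1)}\right]\right)\phi^{\psi-1/2}\,d\phi.
\end{aligned}
\]

We recognize the integrand as the kernel of a gamma($\psi + 1/2$, $\psi + X_0^2/2(e^{2t}-1)$) distribution. Multiplying the integrand by $\left((\psi + X_0^2/2(e^{2t}-1))^{\psi+1/2}\right)/\Gamma(\psi+1/2)$, the integral becomes 1 and the resulting marginal pdf of $t$ is

\[
f(t) = \sqrt{\frac{2}{\pi}}\,X_0\,\frac{e^{2t}\,\psi^\psi\,\Gamma(\psi+1/2)}{(e^{2t}-1)^{3/2}\,\Gamma(\psi)\left(\dfrac{X_0^2}{2(e^{2t}-1)} + \psi\right)^{\psi+1/2}}.
\]

The cdf is

\[
F(t) = \sqrt{\frac{2}{\pi}}\left(\frac{X_0\,\Gamma(\psi+1/2)\,\psi^\psi}{\Gamma(\psi)}\right)\int_0^t \frac{e^{2z}}{(e^{2z}-1)^{3/2}\left[\dfrac{X_0^2}{2(e^{2z}-1)} + \psi\right]^{\psi+1/2}}\,dz.
\]

Thus, the survival function can be written as

\[
S(t) = 1 - \sqrt{\frac{2}{\pi}}\left(\frac{X_0\,\Gamma(\psi+1/2)\,\psi^\psi}{\Gamma(\psi)}\right)\int_0^t \frac{e^{2z}}{(e^{2z}-1)^{3/2}\left[\dfrac{X_0^2}{2(e^{2z}-1)} + \psi\right]^{\psi+1/2}}\,dz.
\]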

The derivations in this appendix are original work by the author.
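The closed-form marginal density can be checked numerically by integrating the conditional density against the gamma($\psi$,$\psi$) density. The sketch below is a Python illustration under stated assumptions: the parameter values are hypothetical, the upper integration limit of 30 is an arbitrary truncation point beyond which the integrand is negligible for these values, and a simple midpoint rule stands in for a proper quadrature routine.

```python
import math

def conditional_pdf(t, phi_, x0):
    """f(t | phi): first hitting time density with initial value sqrt(phi) * x0."""
    d = math.expm1(2.0 * t)
    return (math.sqrt(2.0 / math.pi) * math.sqrt(phi_) * x0 * math.exp(2.0 * t)
            / d ** 1.5 * math.exp(-phi_ * x0 * x0 / (2.0 * d)))

def gamma_pdf(phi_, psi):
    """gamma(psi, psi) density (shape psi, rate psi, so the mean is 1)."""
    return math.exp(psi * math.log(psi) - math.lgamma(psi)
                    + (psi - 1.0) * math.log(phi_) - psi * phi_)

def marginal_pdf(t, x0, psi):
    """Closed-form marginal density f(t) from this appendix."""
    d = math.expm1(2.0 * t)
    log_f = (0.5 * math.log(2.0 / math.pi) + math.log(x0) + 2.0 * t
             + psi * math.log(psi) + math.lgamma(psi + 0.5)
             - 1.5 * math.log(d) - math.lgamma(psi)
             - (psi + 0.5) * math.log(x0 * x0 / (2.0 * d) + psi))
    return math.exp(log_f)

def marginal_pdf_numeric(t, x0, psi, upper=30.0, n=100000):
    """Midpoint-rule integral of f(t | phi) * f(phi) over phi in (0, upper)."""
    step = upper / n
    total = 0.0
    for k in range(n):
        phi_ = (k + 0.5) * step
        total += conditional_pdf(t, phi_, x0) * gamma_pdf(phi_, psi)
    return step * total

t, x0, psi = 0.7, 1.5, 2.0   # hypothetical values chosen for illustration
print(marginal_pdf(t, x0, psi), marginal_pdf_numeric(t, x0, psi))
```

The two printed values should agree to several decimal places, which is the conjugacy argument of this appendix carried out numerically.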

Appendix D: Simplification of the Likelihood Function Under the OU-TR Random Effects Model

Under the OU-TR random effects model, assuming that the first $m$ observations are event times and the last $N - m$ are censoring times, the likelihood is

\[
\begin{aligned}
L ={}& \prod_{i=1}^{m}\left[\sqrt{\frac{2}{\pi}}\,\frac{X_{0i}\,e^{2t_i}\,\psi^\psi\,\Gamma(\psi+1/2)}{(e^{2t_i}-1)^{3/2}\,\Gamma(\psi)\left(\dfrac{X_{0i}^2}{2(e^{2t_i}-1)} + \psi\right)^{\psi+1/2}}\right] \\
&\times \prod_{i=m+1}^{N}\left[1 - \sqrt{\frac{2}{\pi}}\left(\frac{X_{0i}\,\Gamma(\psi+1/2)\,\psi^\psi}{\Gamma(\psi)}\right)\int_0^{t_i}\frac{e^{2z}}{(e^{2z}-1)^{3/2}\left[\dfrac{X_{0i}^2}{2(e^{2z}-1)} + \psi\right]^{\psi+1/2}}\,dz\right].
\end{aligned}
\]

This likelihood is difficult to maximize with respect to the regression coefficients and $\psi$ parameter due to the complicated form coupled with the necessary use of numerical integration. A much simpler form of the likelihood was derived through simplification and a change of variables. We start with the simplification of the marginal probability density of the first hitting time, $f(t)$. As derived in Appendix C:

\[
f(t) = \sqrt{\frac{2}{\pi}}\,X_0\,\frac{e^{2t}\,\psi^\psi\,\Gamma(\psi+1/2)}{(e^{2t}-1)^{3/2}\,\Gamma(\psi)\left(\dfrac{X_0^2}{2(e^{2t}-1)} + \psi\right)^{\psi+1/2}}.
\]

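We let $u = (e^{2t}-1)^{-1/2}X_0/\sqrt{2\psi}$ and we also note that $B(a,b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$ is the beta function and $\Gamma(1/2) = \sqrt{\pi}$. Then, we can rewrite $f(t)$ as

\[
\begin{aligned}
f(u) &= \frac{\sqrt{2}\,X_0\,\dfrac{\sqrt{2\psi}\,u}{X_0}\left[\dfrac{2\psi u^2}{X_0^2} + 1\right]\psi^\psi\,\Gamma(\psi+1/2)}{\Gamma(1/2)\,\Gamma(\psi)\left(\dfrac{X_0^2}{2}\,\dfrac{2\psi u^2}{X_0^2} + \psi\right)^{\psi+1/2}} \\
&= \frac{\sqrt{2}\,X_0\,\dfrac{\sqrt{2\psi}\,u}{X_0}\left[\dfrac{2\psi u^2}{X_0^2} + 1\right]\psi^\psi}{B(1/2,\psi)\,(u^2+1)^{\psi+1/2}\,\psi^{\psi+1/2}} \\
&= \frac{2\sqrt{\psi}\,u\left[\dfrac{2\psi u^2}{X_0^2} + 1\right]}{B(1/2,\psi)\,(u^2+1)^{\psi+1/2}\,\sqrt{\psi}} \\
&= \frac{2u\left(\dfrac{2\psi u^2}{X_0^2} + 1\right)}{B(1/2,\psi)\,(u^2+1)^{\psi+1/2}}.
\end{aligned}
\]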

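To simplify the survival function, we start by working on the integral portion

\[
I = \int_0^t \frac{e^{2z}}{(e^{2z}-1)^{3/2}\left[\dfrac{X_0^2}{2(e^{2z}-1)} + \psi\right]^{\psi+1/2}}\,dz.
\]

Then, using substitution, we set $v = \dfrac{X_0}{\sqrt{2\psi}}(e^{2z}-1)^{-1/2}$. It follows that

\[
dv = -\frac{X_0}{\sqrt{2\psi}}\,(e^{2z}-1)^{-3/2}\,e^{2z}\,dz.
\]

Letting $v_0(t) = \dfrac{X_0}{\sqrt{2\psi}}(e^{2t}-1)^{-1/2}$, which is greater than 0 for all $t > 0$, we can rewrite $I$ as

\[
\begin{aligned}
I &= \frac{\sqrt{2\psi}}{X_0}\int_{v_0(t)}^{\infty}\frac{dv}{\left[\dfrac{X_0^2}{2}\,\dfrac{2\psi v^2}{X_0^2} + \psi\right]^{\psi+1/2}} \\
&= \frac{\sqrt{2\psi}}{X_0}\int_{v_0(t)}^{\infty}\frac{dv}{(v^2\psi + \psi)^{\psi+1/2}} \\
&= \frac{\sqrt{2\psi}}{X_0}\int_{v_0(t)}^{\infty}\frac{dv}{\psi^{\psi+1/2}\,(v^2+1)^{\psi+1/2}} \\
&= \frac{\sqrt{2}}{X_0\,\psi^\psi}\int_{v_0(t)}^{\infty}\frac{dv}{(v^2+1)^{\psi+1/2}}.
\end{aligned}
\]

If we let $v = \tan(\theta) < \infty$, then $\theta^\star(t) = \tan^{-1}(v_0(t)) > 0$ with $\theta^\star(t) < \pi/2$. Then we can write $I$ as

\[
I = \frac{\sqrt{2}}{X_0\,\psi^\psi}\int_{\theta^\star(t)}^{\pi/2}[\cos(\theta)]^{2\psi-1}\,d\theta.
\]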

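To further simplify $I$, for $x, y > 0$, we note that

\[
B(x,y) = 2\int_0^{\pi/2}[\sin(\theta)]^{2x-1}[\cos(\theta)]^{2y-1}\,d\theta
\]

is, by definition, the beta function. This implies

\[
\frac{1}{2}B(1/2,\psi) = \int_0^{\pi/2}[\cos(\theta)]^{2\psi-1}\,d\theta.
\]

Now we can rewrite $I$ as

\[
\begin{aligned}
I &= \frac{\sqrt{2}}{X_0\,\psi^\psi}\left[\int_0^{\pi/2}[\cos(\theta)]^{2\psi-1}\,d\theta - \int_0^{\theta^\star}[\cos(\theta)]^{2\psi-1}\,d\theta\right] \\
&= \frac{\sqrt{2}}{X_0\,\psi^\psi}\left[\frac{1}{2}B(1/2,\psi) - \frac{1}{2}B_{\theta^\star}(1/2,\psi)\right] \\
&= \frac{1}{\sqrt{2}\,X_0\,\psi^\psi}\left[B(1/2,\psi) - B_{\theta^\star}(1/2,\psi)\right]
\end{aligned}
\]

where $B_{\theta^\star}(1/2,\psi)$ is the incomplete beta function defined, with the same factor of 2 as the beta function above, as

\[
B_z(x,y) = 2\int_0^{z}[\sin(\theta)]^{2x-1}[\cos(\theta)]^{2y-1}\,d\theta.
\]

Substituting $I$ back into the survival function, $S(t)$, we get

\[
\begin{aligned}
S(t) &= 1 - \sqrt{\frac{2}{\pi}}\,\frac{X_0\,\Gamma(\psi+1/2)\,\psi^\psi}{\Gamma(\psi)}\left[\frac{1}{\sqrt{2}\,X_0\,\psi^\psi}\left[B(1/2,\psi) - B_{\theta^\star}(1/2,\psi)\right]\right] \\
&= 1 - \frac{\Gamma(\psi+1/2)}{\sqrt{\pi}\,\Gamma(\psi)}\left[B(1/2,\psi) - B_{\theta^\star}(1/2,\psi)\right] \\
&= 1 - \frac{1}{B(1/2,\psi)}\left[B(1/2,\psi) - B_{\theta^\star}(1/2,\psi)\right] \\
&= \frac{B_{\theta^\star}(1/2,\psi)}{B(1/2,\psi)}.
\end{aligned}
\]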

Thus, the likelihood simplifies to

\[
L = \prod_{i=1}^{m}\left[\frac{2u_i\left(\dfrac{2\psi u_i^2}{X_{0i}^2} + 1\right)}{B(\tfrac{1}{2},\psi)\,(u_i^2+1)^{\psi+\frac{1}{2}}}\right]\prod_{i=m+1}^{N}\left[\frac{B_{\theta_i^\star}(\tfrac{1}{2},\psi)}{B(\tfrac{1}{2},\psi)}\right]
\]

where $u_i = X_{0i}/\sqrt{2\psi(e^{2t_i}-1)}$, $B(a,b)$ is the beta function, and $B_{\theta_i^\star}(a,b)$ is the incomplete beta function where $\theta_i^\star = \sin^2[\tan^{-1}(u_i)]$.

The derivations in this appendix are original work by the author with assistance from Dr. William Baker from the Air Force Institute of Technology, Wright-Patterson AFB.
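The simplified survival function can be compared numerically with the integral form from Appendix C. In the Python sketch below, the survival probability is computed as the ratio of $\int_0^{\theta}[\cos(\theta)]^{2\psi-1}d\theta$ with $\theta = \tan^{-1}(u)$ to the same integral over $[0, \pi/2]$, the latter being $B(1/2,\psi)/2$. The parameter values are hypothetical, and the midpoint rule is an assumption made to keep the example free of external libraries.

```python
import math

def marginal_pdf(z, x0, psi):
    """Marginal first hitting time density f(z) from Appendix C."""
    d = math.expm1(2.0 * z)
    log_f = (0.5 * math.log(2.0 / math.pi) + math.log(x0) + 2.0 * z
             + psi * math.log(psi) + math.lgamma(psi + 0.5)
             - 1.5 * math.log(d) - math.lgamma(psi)
             - (psi + 0.5) * math.log(x0 * x0 / (2.0 * d) + psi))
    return math.exp(log_f)

def midpoint(func, a, b, n=20000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    step = (b - a) / n
    return step * sum(func(a + (k + 0.5) * step) for k in range(n))

def survival_beta_form(t, x0, psi):
    """S(t) as a ratio of cosine-power integrals (simplified form)."""
    u = x0 / math.sqrt(2.0 * psi * math.expm1(2.0 * t))
    theta = math.atan(u)
    power = 2.0 * psi - 1.0
    partial = midpoint(lambda th: math.cos(th) ** power, 0.0, theta)
    # B(1/2, psi) = Gamma(1/2) Gamma(psi) / Gamma(psi + 1/2), twice the full integral
    beta_fn = math.exp(math.lgamma(0.5) + math.lgamma(psi) - math.lgamma(psi + 0.5))
    return 2.0 * partial / beta_fn

def survival_integral_form(t, x0, psi):
    """S(t) = 1 - integral of the marginal density over (0, t)."""
    return 1.0 - midpoint(lambda z: marginal_pdf(z, x0, psi), 0.0, t)

t, x0, psi = 0.7, 1.5, 2.0   # hypothetical values chosen for illustration
print(survival_beta_form(t, x0, psi), survival_integral_form(t, x0, psi))
```

In practice a library routine for the regularized incomplete beta function, evaluated at $\sin^2[\tan^{-1}(u)]$, would replace the first numerical integral; that substitution is the computational gain the simplification is after.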

Appendix E: Standard Error Derivations for OU-TR Random Effects Model

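In order to make the presentation cleaner, we denote $v_i = (e^{2u_i} - 1)$ where $u_i$ is used in the integrals from 0 to $t_i$, $w_i = (\psi + x_{0i}^2/(2v_i))$, and $x_{0i} = \exp(\gamma^T Z_i)$ where the regression coefficient vector is given by $\gamma = (\gamma_0,\gamma_1,\ldots,\gamma_p)^T$ and the covariate vector is given by $Z_i = (z_0,z_1,\ldots,z_p)^T$. In the event-time terms, $v_i$ and $w_i$ are evaluated at $u_i = t_i$. The likelihood under this model, assuming that the first $m$ observations are event times and the last $N - m$ are censoring times, is

\[
L = \prod_{i=1}^{m}\left[\sqrt{\frac{2}{\pi}}\,x_{0i}\,\frac{e^{2t_i}}{v_i^{3/2}}\,\frac{\Gamma(\psi+1/2)\,\psi^\psi}{\Gamma(\psi)\,w_i^{\psi+1/2}}\right]\prod_{i=m+1}^{N}\left[1 - \sqrt{\frac{2}{\pi}}\,x_{0i}\,\frac{\Gamma(\psi+1/2)\,\psi^\psi}{\Gamma(\psi)}\int_0^{t_i}\frac{e^{2u_i}}{v_i^{3/2}\,w_i^{\psi+1/2}}\,du\right]
\]

and the corresponding log-likelihood is

\[
\log(L) = \sum_{i=1}^{m}\log\left[\sqrt{\frac{2}{\pi}}\,x_{0i}\,\frac{e^{2t_i}}{v_i^{3/2}}\,\frac{\Gamma(\psi+1/2)\,\psi^\psi}{\Gamma(\psi)\,w_i^{\psi+1/2}}\right] + \sum_{i=m+1}^{N}\log\left[1 - \sqrt{\frac{2}{\pi}}\,x_{0i}\,\frac{\Gamma(\psi+1/2)\,\psi^\psi}{\Gamma(\psi)}\int_0^{t_i}\frac{e^{2u_i}}{v_i^{3/2}\,w_i^{\psi+1/2}}\,du\right].
\]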

Further annotation is necessary to make the derivations clear. We will need to take the first and second derivatives of the gamma function (Γ(X)). In Matlab, the function psi(X) is the first derivative of log(Γ(X)), and psi(1,X) is the second derivative of log(Γ(X)). Therefore, we will denote these derivatives as such in the following derivations.
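For readers working outside Matlab, the two functions psi(X) and psi(1,X) are the digamma and trigamma functions. The following Python sketch computes both with the standard recurrence and asymptotic series; it is an illustrative implementation for positive arguments, not the routine Matlab uses, and the helper names are hypothetical. It also shows how $\Gamma'$ and $\Gamma''$, which appear in the expressions below, follow from these two functions.

```python
import math

def digamma(x):
    """First derivative of log(Gamma(x)) for x > 0 (Matlab: psi(x))."""
    result = 0.0
    while x < 6.0:                     # recurrence: psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # asymptotic series: log(x) - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0)))

def trigamma(x):
    """Second derivative of log(Gamma(x)) for x > 0 (Matlab: psi(1, x))."""
    result = 0.0
    while x < 6.0:                     # recurrence: psi1(x) = psi1(x + 1) + 1/x^2
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # asymptotic series: 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7)
    return (result + inv + 0.5 * inv2
            + inv * inv2 * (1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 / 42.0)))

def gamma_prime(x):
    """Gamma'(x) = Gamma(x) * psi(x)."""
    return math.gamma(x) * digamma(x)

def gamma_double_prime(x):
    """Gamma''(x) = Gamma(x) * (psi(x)^2 + psi(1, x))."""
    return math.gamma(x) * (digamma(x) ** 2 + trigamma(x))

psi_value = 2.0   # hypothetical value of the precision parameter
print(digamma(psi_value), trigamma(psi_value), gamma_prime(psi_value + 0.5))
```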

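For simplification, we denote the following:

\[
\begin{aligned}
Q1 &= \int_0^{t_i}\frac{e^{2u_i}}{v_i^{3/2}\,w_i^{\psi+1/2}}\,du \\
\star &= 1 - \sqrt{\frac{2}{\pi}}\,x_{0i}\,\frac{\Gamma(\psi+1/2)\,\psi^\psi}{\Gamma(\psi)}\,Q1 \\
A &= \frac{\Gamma'(\psi+1/2)\,\psi^\psi + \Gamma(\psi+1/2)\,\psi^\psi(\log(\psi)+1) - \Gamma(\psi+1/2)\,\psi^\psi\,\mathrm{psi}(\psi)}{\Gamma(\psi)} \\
Q2 &= \int_0^{t_i}\frac{e^{2u_i}}{v_i^{3/2}}\left[\frac{-\left(\log(w_i) + (\psi+1/2)/w_i\right)}{w_i^{\psi+1/2}}\right]du \\
d\star_\psi &= -\sqrt{\frac{2}{\pi}}\,x_{0i}\left[A\,Q1 + \frac{\Gamma(\psi+1/2)\,\psi^\psi}{\Gamma(\psi)}\,Q2\right] \\
dA_\psi &= \frac{\left[\Gamma'(\psi+1/2)\,\psi^\psi + \Gamma(\psi+1/2)\,\psi^\psi(\log(\psi)+1)\right]\left[\log(\psi) + 1 - 2\,\mathrm{psi}(\psi)\right] + \Gamma(\psi+1/2)\,\psi^\psi\left[1/\psi - \mathrm{psi}(1,\psi) + (\mathrm{psi}(\psi))^2\right]}{\Gamma(\psi)} \\
&\quad + \frac{\Gamma''(\psi+1/2)\,\psi^\psi + \Gamma'(\psi+1/2)\,\psi^\psi(\log(\psi)+1)}{\Gamma(\psi)}
\end{aligned}
\]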

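\[
\begin{aligned}
Q3 &= \int_0^{t_i}\frac{e^{2u_i}}{v_i^{3/2}}\left[\frac{-2/w_i + (\psi+1/2)/w_i^2 + \left(\log(w_i) + (\psi+1/2)/w_i\right)^2}{w_i^{\psi+1/2}}\right]du \\
Q4 &= \int_0^{t_i}\frac{e^{2u_i}}{v_i^{3/2}}\left[\frac{-(\psi+1/2)\,\dfrac{x_{0i}^2}{v_i}}{w_i^{\psi+3/2}}\right]du \\
d\star_\gamma &= -\sqrt{\frac{2}{\pi}}\,\frac{\Gamma(\psi+1/2)\,\psi^\psi}{\Gamma(\psi)}\,(Q1 + Q4)\,x_{0i}Z_i \\
Q5 &= \int_0^{t_i}\frac{e^{2u_i}}{v_i^{3/2}}\left[\frac{\dfrac{(\psi+1/2)\,x_{0i}^2}{w_i^2 v_i} - \dfrac{x_{0i}^2}{w_i v_i} + \left(\log(w_i) + \dfrac{\psi+1/2}{w_i}\right)\dfrac{(\psi+1/2)\,x_{0i}^2}{w_i v_i}}{w_i^{\psi+1/2}}\right]du \\
Q6 &= \int_0^{t_i}\frac{e^{2u_i}}{v_i^{5/2}\,w_i^{\psi+3/2}}\,du \\
Q7 &= \int_0^{t_i}\frac{e^{2u_i}}{v_i^{5/2}}\left[\frac{(\psi+1/2)\,x_{0i}^2}{w_i^{\psi+3/2}}\right]du \\
Q8 &= \int_0^{t_i}\frac{e^{2u_i}}{v_i^{5/2}}\left[\frac{x_{0i}^2\left(1 - (\psi+1/2)\left(\log(w_i) + \dfrac{\psi+3/2}{w_i}\right)\right)}{w_i^{\psi+3/2}}\right]du \\
Q9 &= \int_0^{t_i}\frac{e^{2u_i}}{v_i^{5/2}}\left[\frac{2 - (\psi+3/2)\,\dfrac{x_{0i}^2}{w_i v_i}}{w_i^{\psi+3/2}}\right]du
\end{aligned}
\]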

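The second derivative of the log-likelihood with respect to $\gamma$ is

\[
\begin{aligned}
\frac{\partial^2\log(L)}{\partial\gamma\,\partial\gamma^T} ={}& \sum_{i=1}^{m} -(\psi+1/2)\left[-\frac{x_{0i}^4}{w_i^2 v_i^2} + \frac{2x_{0i}^2}{w_i v_i}\right]Z_i Z_i^T \\
&- \sum_{i=m+1}^{N}\left\{\frac{d\star_\gamma\, d\star_\gamma^T}{\star^2} + \frac{x_{0i}}{\star}\sqrt{\frac{2}{\pi}}\,\frac{\Gamma(\psi+1/2)\,\psi^\psi}{\Gamma(\psi)}\left[Q1 - 2x_{0i}^2(\psi+1/2)\,Q6 - x_{0i}^2(\psi+1/2)\,Q9\right]Z_i Z_i^T\right\}.
\end{aligned}
\]

Now, we must calculate $\partial^2\log(L)/\partial\gamma\,\partial\psi$. This expression is

\[
\begin{aligned}
\frac{\partial^2\log(L)}{\partial\gamma\,\partial\psi} ={}& \sum_{i=1}^{m} -\frac{x_{0i}^2 Z_i}{v_i w_i}\left[1 - \frac{\psi+1/2}{w_i}\right] \\
&- \sum_{i=m+1}^{N}\left\{\frac{d\star_\gamma\, d\star_\psi}{\star^2} + \frac{\sqrt{2/\pi}\;x_{0i}Z_i}{\star}\left[A\,(Q1 - Q7) + \frac{\Gamma(\psi+1/2)\,\psi^\psi}{\Gamma(\psi)}\,(Q2 - Q8)\right]\right\}.
\end{aligned}
\]

Continuing the formulation of the observed information matrix, we now calculate the second derivative of the log-likelihood with respect to $\psi$ to get

\[
\begin{aligned}
\frac{\partial^2\log(L)}{\partial\psi^2} ={}& \sum_{i=1}^{m}\left[\mathrm{psi}(1,\psi+1/2) + \frac{1}{\psi} - \mathrm{psi}(1,\psi) - \frac{2}{w_i} + \frac{\psi+1/2}{w_i^2}\right] \\
&- \sum_{i=m+1}^{N}\left\{\frac{(d\star_\psi)^2}{\star^2} + \frac{\sqrt{2/\pi}\;x_{0i}}{\star}\left[dA_\psi\,Q1 + 2A\,Q2 + \frac{\Gamma(\psi+1/2)\,\psi^\psi}{\Gamma(\psi)}\,Q3\right]\right\}.
\end{aligned}
\]

Then, calculating $\partial^2\log(L)/\partial\psi\,\partial\gamma$, we get

\[
\begin{aligned}
\frac{\partial^2\log(L)}{\partial\psi\,\partial\gamma} ={}& \sum_{i=1}^{m}\left[-\frac{1}{w_i v_i} + \frac{\psi+1/2}{w_i^2 v_i}\right]x_{0i}^2 Z_i \\
&- \sum_{i=m+1}^{N}\left\{\frac{d\star_\gamma\, d\star_\psi}{\star^2} + \frac{\sqrt{2/\pi}}{\star}\left[A\,(Q1 + Q4) + \frac{\Gamma(\psi+1/2)\,\psi^\psi}{\Gamma(\psi)}\,(Q2 + Q5)\right]x_{0i}Z_i\right\}.
\end{aligned}
\]

Thus, the observed information matrix is

\[
\begin{pmatrix}
-\left.\dfrac{\partial^2\log(L)}{\partial\gamma\,\partial\gamma^T}\right|_{\gamma=\hat{\gamma}_{MLE}} & -\left.\dfrac{\partial^2\log(L)}{\partial\gamma\,\partial\psi}\right|_{\gamma=\hat{\gamma}_{MLE},\ \psi=\hat{\psi}_{MLE}} \\[2ex]
-\left.\dfrac{\partial^2\log(L)}{\partial\psi\,\partial\gamma^T}\right|_{\psi=\hat{\psi}_{MLE},\ \gamma=\hat{\gamma}_{MLE}} & -\left.\dfrac{\partial^2\log(L)}{\partial\psi^2}\right|_{\psi=\hat{\psi}_{MLE}}
\end{pmatrix}
\]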

The inverse of this matrix is the estimated variance-covariance matrix. The square roots of the diagonal elements of this inverted matrix equal the estimated standard errors of the maximum likelihood estimators of the model parameters. The derivations in this appendix are original work by the author.

Appendix F: Standard Error Derivations for OU-TR Random Effects Mixture Model

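In order to make the presentation cleaner, we denote $v_i = (e^{2u_i} - 1)$ where $u_i$ is used in the integrals from 0 to $t_i$, $r_i = (\psi + x_{0i}^2/(2v_i))$, and $x_{0i} = \exp(\gamma^T Z_i)$ where the regression coefficient vector is given by $\gamma = (\gamma_0,\gamma_1,\ldots,\gamma_p)^T$ and the covariate vector is given by $Z_i = (z_0,z_1,\ldots,z_p)^T$. In the event-time terms, $v_i$ and $r_i$ are evaluated at $u_i = t_i$. The mixing parameter is defined as $p_i = \exp(\beta^T W_i)/(1 + \exp(\beta^T W_i))$ where the mixing parameter regression coefficient vector is $\beta = (\beta_0,\beta_1,\ldots,\beta_q)^T$ and $W_i = (w_0,w_1,\ldots,w_q)^T$ is the applicable covariate vector. Note that $Z_i$ and $W_i$ need not be the same set of covariates. The likelihood under this model, assuming that the first $m$ observations are event times and the last $N - m$ are censoring times, is

\[
\begin{aligned}
L ={}& \prod_{i=1}^{m}\left[(1-p_i)\sqrt{\frac{2}{\pi}}\,x_{0i}\,\frac{e^{2t_i}}{v_i^{3/2}}\,\frac{\Gamma(\psi+1/2)\,\psi^\psi}{\Gamma(\psi)\,r_i^{\psi+1/2}}\right] \\
&\times \prod_{i=m+1}^{N}\left\{p_i + (1-p_i)\left[1 - \sqrt{\frac{2}{\pi}}\,x_{0i}\,\frac{\Gamma(\psi+1/2)\,\psi^\psi}{\Gamma(\psi)}\int_0^{t_i}\frac{e^{2u_i}}{v_i^{3/2}\,r_i^{\psi+1/2}}\,du\right]\right\}
\end{aligned}
\]

and the corresponding log-likelihood is

\[
\begin{aligned}
\log(L) ={}& \sum_{i=1}^{m}\log\left[(1-p_i)\sqrt{\frac{2}{\pi}}\,x_{0i}\,\frac{e^{2t_i}}{v_i^{3/2}}\,\frac{\Gamma(\psi+1/2)\,\psi^\psi}{\Gamma(\psi)\,r_i^{\psi+1/2}}\right] \\
&+ \sum_{i=m+1}^{N}\log\left\{p_i + (1-p_i)\left[1 - \sqrt{\frac{2}{\pi}}\,x_{0i}\,\frac{\Gamma(\psi+1/2)\,\psi^\psi}{\Gamma(\psi)}\int_0^{t_i}\frac{e^{2u_i}}{v_i^{3/2}\,r_i^{\psi+1/2}}\,du\right]\right\}.
\end{aligned}
\]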

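Further annotation is necessary to make the derivations clear. We will need to take the first and second derivatives of the gamma function ($\Gamma(X)$). In Matlab, the function psi(X) is the first derivative of $\log(\Gamma(X))$, and psi(1,X) is the second derivative of $\log(\Gamma(X))$. Therefore, we will denote these derivatives as such in the following derivations. For simplification, we denote the following:

\[
\begin{aligned}
Q1 &= \int_0^{t_i}\frac{e^{2u_i}}{v_i^{3/2}\,r_i^{\psi+1/2}}\,du \\
\star &= 1 - \sqrt{\frac{2}{\pi}}\,x_{0i}\,\frac{\Gamma(\psi+1/2)\,\psi^\psi}{\Gamma(\psi)}\,Q1 \\
\Omega &= p_i + (1-p_i)\left[1 - \sqrt{\frac{2}{\pi}}\,x_{0i}\,\frac{\Gamma(\psi+1/2)\,\psi^\psi}{\Gamma(\psi)}\,Q1\right] \\
A &= \frac{\Gamma'(\psi+1/2)\,\psi^\psi + \Gamma(\psi+1/2)\,\psi^\psi(\log(\psi)+1) - \Gamma(\psi+1/2)\,\psi^\psi\,\mathrm{psi}(\psi)}{\Gamma(\psi)} \\
Q2 &= \int_0^{t_i}\frac{e^{2u_i}}{v_i^{3/2}}\left[\frac{-\left(\log(r_i) + (\psi+1/2)/r_i\right)}{r_i^{\psi+1/2}}\right]du \\
d\star_\psi &= -\sqrt{\frac{2}{\pi}}\,x_{0i}\left[A\,Q1 + \frac{\Gamma(\psi+1/2)\,\psi^\psi}{\Gamma(\psi)}\,Q2\right] \\
dA_\psi &= \frac{\left[\Gamma'(\psi+1/2)\,\psi^\psi + \Gamma(\psi+1/2)\,\psi^\psi(\log(\psi)+1)\right]\left[\log(\psi) + 1 - 2\,\mathrm{psi}(\psi)\right] + \Gamma(\psi+1/2)\,\psi^\psi\left[1/\psi - \mathrm{psi}(1,\psi) + (\mathrm{psi}(\psi))^2\right]}{\Gamma(\psi)} \\
&\quad + \frac{\Gamma''(\psi+1/2)\,\psi^\psi + \Gamma'(\psi+1/2)\,\psi^\psi(\log(\psi)+1)}{\Gamma(\psi)}
\end{aligned}
\]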

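\[
\begin{aligned}
Q3 &= \int_0^{t_i}\frac{e^{2u_i}}{v_i^{3/2}}\left[\frac{-2/r_i + (\psi+1/2)/r_i^2 + \left(\log(r_i) + (\psi+1/2)/r_i\right)^2}{r_i^{\psi+1/2}}\right]du \\
Q4 &= \int_0^{t_i}\frac{e^{2u_i}}{v_i^{3/2}}\left[\frac{-(\psi+1/2)\,\dfrac{x_{0i}^2}{v_i}}{r_i^{\psi+3/2}}\right]du \\
d\star_\gamma &= -\sqrt{\frac{2}{\pi}}\,\frac{\Gamma(\psi+1/2)\,\psi^\psi}{\Gamma(\psi)}\,(Q1 + Q4)\,x_{0i}Z_i \\
Q5 &= \int_0^{t_i}\frac{e^{2u_i}}{v_i^{3/2}}\left[\frac{\dfrac{(\psi+1/2)\,x_{0i}^2}{r_i^2 v_i} - \dfrac{x_{0i}^2}{r_i v_i} + \left(\log(r_i) + \dfrac{\psi+1/2}{r_i}\right)\dfrac{(\psi+1/2)\,x_{0i}^2}{r_i v_i}}{r_i^{\psi+1/2}}\right]du \\
Q6 &= \int_0^{t_i}\frac{e^{2u_i}}{v_i^{5/2}\,r_i^{\psi+3/2}}\,du \\
Q7 &= \int_0^{t_i}\frac{e^{2u_i}}{v_i^{5/2}}\left[\frac{(\psi+1/2)\,x_{0i}^2}{r_i^{\psi+3/2}}\right]du \\
Q8 &= \int_0^{t_i}\frac{e^{2u_i}}{v_i^{5/2}}\left[\frac{x_{0i}^2\left(1 - (\psi+1/2)\left(\log(r_i) + \dfrac{\psi+3/2}{r_i}\right)\right)}{r_i^{\psi+3/2}}\right]du \\
Q9 &= \int_0^{t_i}\frac{e^{2u_i}}{v_i^{5/2}}\left[\frac{2 - (\psi+3/2)\,\dfrac{x_{0i}^2}{r_i v_i}}{r_i^{\psi+3/2}}\right]du
\end{aligned}
\]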

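The second derivative of the log-likelihood with respect to $\gamma$ is

\[
\begin{aligned}
\frac{\partial^2\log(L)}{\partial\gamma\,\partial\gamma^T} ={}& \sum_{i=1}^{m} -(\psi+1/2)\left[-\frac{x_{0i}^4}{r_i^2 v_i^2} + \frac{2x_{0i}^2}{r_i v_i}\right]Z_i Z_i^T \\
&- \sum_{i=m+1}^{N}(1-p_i)\left\{\frac{(1-p_i)\,d\star_\gamma\, d\star_\gamma^T}{\Omega^2} + \frac{x_{0i}}{\Omega}\sqrt{\frac{2}{\pi}}\,\frac{\Gamma(\psi+1/2)\,\psi^\psi}{\Gamma(\psi)}\left[Q1 - 2x_{0i}^2(\psi+1/2)\,Q6 - x_{0i}^2(\psi+1/2)\,Q9\right]Z_i Z_i^T\right\}.
\end{aligned}
\]

The remaining partial derivatives required are

\[
\begin{aligned}
\frac{\partial^2\log(L)}{\partial\gamma\,\partial\psi} ={}& \sum_{i=1}^{m} -\frac{x_{0i}^2 Z_i}{v_i r_i}\left[1 - \frac{\psi+1/2}{r_i}\right] \\
&- \sum_{i=m+1}^{N}(1-p_i)\left\{\frac{(1-p_i)\,d\star_\gamma\, d\star_\psi}{\Omega^2} + \frac{\sqrt{2/\pi}\;x_{0i}Z_i}{\Omega}\left[A\,(Q1 - Q7) + \frac{\Gamma(\psi+1/2)\,\psi^\psi}{\Gamma(\psi)}\,(Q2 - Q8)\right]\right\} \\
\frac{\partial^2\log(L)}{\partial\gamma\,\partial\beta^T} ={}& \sum_{i=m+1}^{N} -\frac{d\star_\gamma}{\Omega^2}\left\{p_i(1-p_i)\left[\Omega + (1-p_i)(1-\star)\right]W_i^T\right\}.
\end{aligned}
\]

Continuing the formulation of the observed information matrix, we now calculate the second derivative of the log-likelihood with respect to $\beta$ to get

\[
\frac{\partial^2\log(L)}{\partial\beta\,\partial\beta^T} = \sum_{i=1}^{m} -p_i(1-p_i)\,W_i W_i^T + \sum_{i=m+1}^{N}(1-\star)\left[\frac{\left(p_i(1-p_i)^2 - p_i^2(1-p_i)\right)\Omega - p_i^2(1-p_i)^2(1-\star)}{\Omega^2}\right]W_i W_i^T.
\]

The remaining partial derivatives required are

\[
\begin{aligned}
\frac{\partial^2\log(L)}{\partial\beta\,\partial\gamma^T} &= -\sum_{i=m+1}^{N}\frac{p_i(1-p_i)\,W_i\, d\star_\gamma^T}{\Omega}\left[1 + \frac{(1-p_i)(1-\star)}{\Omega}\right] \\
\frac{\partial^2\log(L)}{\partial\beta\,\partial\psi} &= -\sum_{i=m+1}^{N}\frac{p_i(1-p_i)\,d\star_\psi\,W_i}{\Omega}\left[1 + \frac{(1-p_i)(1-\star)}{\Omega}\right].
\end{aligned}
\]

To finish the formulation of the observed information matrix, we now calculate the second derivative of the log-likelihood with respect to $\psi$. We get

\[
\begin{aligned}
\frac{\partial^2\log(L)}{\partial\psi^2} ={}& \sum_{i=1}^{m}\left[\mathrm{psi}(1,\psi+1/2) + \frac{1}{\psi} - \mathrm{psi}(1,\psi) - \frac{2}{r_i} + \frac{\psi+1/2}{r_i^2}\right] \\
&- \sum_{i=m+1}^{N}(1-p_i)\left\{\frac{(1-p_i)(d\star_\psi)^2}{\Omega^2} + \frac{\sqrt{2/\pi}\;x_{0i}}{\Omega}\left[dA_\psi\,Q1 + 2A\,Q2 + \frac{\Gamma(\psi+1/2)\,\psi^\psi}{\Gamma(\psi)}\,Q3\right]\right\}.
\end{aligned}
\]

The remaining partial derivatives required are

\[
\begin{aligned}
\frac{\partial^2\log(L)}{\partial\psi\,\partial\gamma} ={}& \sum_{i=1}^{m}\left[-\frac{1}{r_i v_i} + \frac{\psi+1/2}{r_i^2 v_i}\right]x_{0i}^2 Z_i \\
&- \sum_{i=m+1}^{N}(1-p_i)\left\{\frac{(1-p_i)\,d\star_\gamma\, d\star_\psi}{\Omega^2} + \frac{\sqrt{2/\pi}}{\Omega}\left[A\,(Q1 + Q4) + \frac{\Gamma(\psi+1/2)\,\psi^\psi}{\Gamma(\psi)}\,(Q2 + Q5)\right]x_{0i}Z_i\right\} \\
\frac{\partial^2\log(L)}{\partial\psi\,\partial\beta} ={}& \sum_{i=m+1}^{N} d\star_\psi\left[\frac{-p_i(1-p_i)\,W_i\,\Omega - p_i(1-p_i)^2\,W_i\,(1-\star)}{\Omega^2}\right].
\end{aligned}
\]

Thus, the observed information matrix is

\[
\begin{pmatrix}
-\left.\dfrac{\partial^2\log(L)}{\partial\gamma\,\partial\gamma^T}\right|_{\gamma=\hat{\gamma}_{MLE}} & -\left.\dfrac{\partial^2\log(L)}{\partial\gamma\,\partial\beta^T}\right|_{\gamma=\hat{\gamma}_{MLE},\ \beta=\hat{\beta}_{MLE}} & -\left.\dfrac{\partial^2\log(L)}{\partial\gamma\,\partial\psi}\right|_{\gamma=\hat{\gamma}_{MLE},\ \psi=\hat{\psi}_{MLE}} \\[2ex]
-\left.\dfrac{\partial^2\log(L)}{\partial\beta\,\partial\gamma^T}\right|_{\beta=\hat{\beta}_{MLE},\ \gamma=\hat{\gamma}_{MLE}} & -\left.\dfrac{\partial^2\log(L)}{\partial\beta\,\partial\beta^T}\right|_{\beta=\hat{\beta}_{MLE}} & -\left.\dfrac{\partial^2\log(L)}{\partial\beta\,\partial\psi}\right|_{\beta=\hat{\beta}_{MLE},\ \psi=\hat{\psi}_{MLE}} \\[2ex]
-\left.\dfrac{\partial^2\log(L)}{\partial\psi\,\partial\gamma^T}\right|_{\psi=\hat{\psi}_{MLE},\ \gamma=\hat{\gamma}_{MLE}} & -\left.\dfrac{\partial^2\log(L)}{\partial\psi\,\partial\beta^T}\right|_{\psi=\hat{\psi}_{MLE},\ \beta=\hat{\beta}_{MLE}} & -\left.\dfrac{\partial^2\log(L)}{\partial\psi^2}\right|_{\psi=\hat{\psi}_{MLE}}
\end{pmatrix}
\]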

The inverse of this matrix is the estimated variance-covariance matrix. The square roots of the diagonal elements of this inverted matrix equal the estimated standard errors of the maximum likelihood estimators of the model parameters. The derivations in this appendix are original work by the author.

Appendix G: Newly Developed Matlab Functions for Fitting OU-TR Models to Data

The following Matlab functions were developed during this research by the author.

Requests for the code may be sent to Roger Erich at roger.erich@afit.edu or Michael

Pennell at [email protected].

G.1 OU-TR Model

OUTR

Ornstein-Uhlenbeck threshold regression model using maximum likelihood estimation

Syntax

[gam,eststderror,pvalues,loglike,convergence] = OUTR(data,t,c,start)

Description

This function returns maximum likelihood estimates (gam), standard errors (eststderror), and the Wald test p-values (pvalues) for the intercept and regression coefficients from the log-link function for initial health (X0). The corresponding log-likelihood (loglike) and an output flag (convergence) identifying whether or not convergence was achieved through use of the fminsearch function (1 = converged, 0 = did not converge) are also returned.

Input into the function consists of data, an n-by-(p + 1) design matrix where the rows correspond to observations and the columns correspond to predictor variables (the first column of this data matrix must contain all ones in order to appropriately estimate the intercept term in the log-link function), t which is an n-by-1 vector of observation times, c which is an n-by-1 vector of event indicators (0 = censored, 1 = event) and start which is a 1-by-(p+1) vector of initial guesses for the intercept (g0) and p regression coefficients (g1,...,gp). For example, starting values required for a model with 3 covariates linked to X0 need to be in the following order: [g0,g1,g2,g3].

This function utilizes fminsearch (an already existing Matlab function) to maximize the log-likelihood. Computational time required for the OUTR function to estimate the regression coefficients ranges from less than a second for models with one covariate to 8 seconds for a model with 6 covariates. While our previous simulations and real data analyses utilizing this function have not been sensitive to starting values, we still recommend trying several sets of starting values prior to reporting the results.

Additionally, OUTR requires the OUTRlikelihood function. This is the negative of the log-likelihood function (Equation (3.5)) minimized using fminsearch in order to obtain the MLEs.

Example

To demonstrate the use of OUTR, we fit the final OU-TR model for the oropharynx data described in Section 3.3. onesvec is a column vector of ones, and condition and tstage are column vectors consisting of the values of the variables “condition” (coded as 1 if patient had a disability, 0 if not) and “T-Stage” (coded as 1 if patient had a massive tumor, 0 if not). The following function call to OUTR produces the results given below:

[gam,eststderror,pvalues,loglike,convergence] = OUTR([onesvec condition tstage],t,c,[0 0 0])

Output:
gam = 1.1161 -0.9807 -0.5126
eststderror = 0.0675 0.1189 0.1099
pvalues = 1.0e-005 *( 0 0 0.3084)
loglike = -235.1671
convergence = 1
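The relationship between the reported estimates, standard errors and p-values can be sketched as follows. This Python fragment assumes that the Wald test is the usual two-sided test comparing estimate divided by standard error with a standard normal distribution; the source does not spell out the test, so this is an assumption, and the function name is hypothetical.

```python
import math

def wald_p_value(estimate, std_error):
    """Two-sided p-value for H0: coefficient = 0 under a standard normal reference."""
    z = estimate / std_error
    return math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))

gam = [1.1161, -0.9807, -0.5126]          # estimates reported in the output above
eststderror = [0.0675, 0.1189, 0.1099]    # standard errors reported in the output above
pvalues = [wald_p_value(g, s) for g, s in zip(gam, eststderror)]
print(pvalues)
```

Under this assumption the third p-value comes out on the order of 3e-6, close to the reported 1.0e-005 * 0.3084; small differences are expected because the inputs here are rounded to four decimals.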

G.2 OU-TR Mixture Model

OUTRMM

Ornstein-Uhlenbeck threshold regression mixture model using maximum likelihood estimation

Syntax

[gam,beta,stderrgam,stderrbeta,pvgam,pvbeta,loglike,convergence] = OUTRMM(dataxo,datap,t,c,start)

Description

This function returns maximum likelihood estimates (gam and beta), standard errors (eststderrorgam and eststderrorbeta), and the Wald test p-values (pvaluesgam and pvaluesbeta) for the intercept and regression coefficients from the log-link function for initial health (X0) and the logistic function linked to the cure rate (p). The corresponding log-likelihood (loglike) and an output flag (convergence) identifying whether or not convergence was achieved through use of the fminsearch function (1 = converged, 0 = did not converge) are also returned.

Input into the function consists of dataxo in the form of an n-by-(m + 1) design matrix where the rows correspond to the observations and the columns correspond to predictor variables linked to X0 (the first column of this dataxo matrix must contain all ones in order to appropriately estimate the intercept term in the log-link function), datap in the form of an n-by-(q + 1) design matrix where the rows correspond to the observations and the columns correspond to predictor variables linked to p (the first column of this datap matrix must contain all ones in order to appropriately estimate the intercept term in the logistic function), t which is an n-by-1 vector of observation times, c which is an n-by-1 vector of event indicators (0 = censored, 1 = event) and start which is a 1-by-(m + q + 2) vector of initial guesses for the intercepts and regression coefficients (g0,g1,...,gm) and (b0,b1,...,bq) in that order. For example, the starting values needed for a model with 3 covariates linked to initial health and 2 covariates linked to the cure rate need to be in the following order: [g0,g1,g2,g3,b0,b1,b2].

This function utilizes the fminsearch function to maximize the log-likelihood.

Computational time required for the OUTRMM function to estimate the regression coefficients ranges from 6 seconds for models with one covariate in X0 and one covariate in p to 31 seconds for a model with 4 covariates in X0 and 4 covariates in p. Due to model sensitivity to the starting values, we recommend trying several sets of starting values prior to reporting the results.

Additionally, OUTRMM requires the OUTRMMlikelihood function. This is the negative of the log-likelihood function (Equation (4.7)) minimized using fminsearch in order to obtain the MLEs.

Example

To demonstrate the use of OUTRMM, we fit the final OU-TR mixture model for the melanoma data described in Section 4.3. onesvec is a column vector of ones, and node1, node2 and node3 are column vectors of data from the melanoma study containing the values of indicator variables for nodal categories N1-N3. The following function call to OUTRMM produces the results given below:

[gam,beta,stderrgam,stderrbeta,pvgam,pvbeta,loglike,convergence] = OUTRMM([onesvec node1 node2 node3],[onesvec node1 node2 node3],t,c,[0 -1 -1 -1 -1 -1 -1 -1])

Output:
gam = 0.3602 -0.5058 -0.6148 -0.6971
beta = -0.0844 -0.4040 -1.0298 -1.3452
eststderrorgam = 0.1201 0.1487 0.1530 0.1544
eststderrorbeta = 0.2506 0.3240 0.3763 0.4029
pvaluesgam = 0.0027 0.0007 0.0001 0.0000
pvaluesbeta = 0.7362 0.2125 0.0062 0.0008
loglike = -396.7977
convergence = 1
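To see what the beta estimates imply, the logistic function can be applied to each nodal category. The Python sketch below assumes that the three node indicators are mutually exclusive and that a row of all zeros (apart from the intercept) is the reference category; that coding is an assumption about the melanoma data rather than something stated in this appendix, and the profile labels are hypothetical.

```python
import math

def logistic(x):
    """p = exp(x) / (1 + exp(x)), the form used for the mixing parameter."""
    return 1.0 / (1.0 + math.exp(-x))

beta = [-0.0844, -0.4040, -1.0298, -1.3452]   # estimates reported in the output above

# Rows of W: intercept followed by indicators for N1, N2 and N3
# (assumed mutually exclusive, with all zeros as the reference category).
profiles = {
    "reference": [1, 0, 0, 0],
    "N1": [1, 1, 0, 0],
    "N2": [1, 0, 1, 0],
    "N3": [1, 0, 0, 1],
}

cure_rates = {name: logistic(sum(b * w for b, w in zip(beta, row)))
              for name, row in profiles.items()}
for name, p in cure_rates.items():
    print(f"{name}: {p:.3f}")
```

Because every node coefficient is negative, the estimated cure probability decreases from the reference category through N3 under this coding.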

G.3 OU-TR Random Effects Model

OUTRREM

Ornstein-Uhlenbeck random effect threshold regression model using maximum likelihood estimation

Syntax

[gam,Psi,eststderrorgam,eststderrorpsi,pvaluesgam,loglike,convergence] = OUTRREM(data,t,c,start)

Description

This function returns maximum likelihood estimates (gam), standard errors (eststderrorgam), and the Wald test p-values (pvaluesgam) for the intercept and regression coefficients from the log-link function for initial health (X0). In addition, the MLE of the random effect precision parameter ψ (Psi) and the corresponding standard error (eststderrorpsi) are returned. Also, the corresponding log-likelihood (loglike) and an output flag (convergence) identifying whether or not convergence was achieved through use of the fminsearch function (1 = converged, 0 = did not converge) are returned.

Input into the function consists of data in the form of an n-by-(p + 1) design matrix where the rows correspond to observations and the columns correspond to predictor variables (the first column of this data matrix must contain all ones in order to appropriately estimate the intercept term in the log-link function), t which is an n-by-1 vector of observation times, c which is an n-by-1 vector of event indicators (0 = censored, 1 = event) and start which is a 1-by-(p+2) vector of initial guesses for the intercept (g0), p regression coefficients (g1,...,gp), and the random effect precision parameter ψ. For example, the starting values needed for a model with 3 covariates linked to initial health need to be in the following order: [g0,g1,g2,g3,ψ].

This function utilizes fminsearch (an already existing Matlab function) to maximize the log-likelihood. Computational time required for the OUTRREM function to estimate the regression coefficients and ψ ranges from 17 seconds for models with one covariate to 30 seconds for a model with 6 covariates. Since our previous simulations and real data analyses utilizing this function were moderately sensitive to starting values, we recommend trying several sets of starting values (with the starting value of ψ > 0) prior to reporting the results.

Additionally, OUTRREM requires the OUTREMlikelihood, rndeff1, rndeff2, rndeff3, rndeff4, rndeff5, rndeff6, rndeff7, rndeff8 and rndeff9 functions. OUTREMlikelihood is the negative of the log-likelihood function (Equation (5.8)) minimized using fminsearch in order to obtain the MLEs, and the 9 rndeff functions are necessary for estimating standard errors.

Example

To demonstrate the use of OUTRREM, we fit the OU-TR random effects model applied to the oropharynx data described in Section 5.1.3. onesvec is a column vector of ones, and condition and tstage are column vectors consisting of the values of the variables “condition” (coded as 1 if patient had a disability, 0 if not) and “T-Stage” (coded as 1 if patient had a massive tumor, 0 if not). The following function call to OUTRREM produces the results given below:

[gam,Psi,eststderrorgam,eststderrorpsi,pvaluesgam,loglike,convergence] = OUTRREM([onesvec condition tstage],t,c,[1 0.5 0.5 1])

Output:
gam = 1.3833 -0.9866 -0.5384
Psi = 2.0501
eststderrorgam = 0.1355 0.1544 0.1438
eststderrorpsi = 0.7476
pvaluesgam = 1.0e-003*(0 0 0.1813)
loglike = -108.0626
convergence = 1

G.4 OU-TR Random Effects Mixture Model

OUTRREMM

Ornstein-Uhlenbeck random effect threshold regression mixture model using maximum likelihood estimation

Syntax

[gam,beta,Psi,stderrgam,stderrbeta,stderrPsi,pvgam,pvbeta,loglike,convergence] = OUTRREMM(dataxo,datap,t,c,start)

Description

This function returns maximum likelihood estimates (gam and beta), standard errors (eststderrorgam and eststderrorbeta), and the Wald test p-values (pvaluesgam and pvaluesbeta) for the intercept and regression coefficients from the log-link function for initial health (X0) and the logistic function linked to the cure rate (p). In addition, the MLE of the random effect parameter ψ (Psi) and the corresponding standard error (eststderrorPsi) are returned. Also, the corresponding log-likelihood (loglike) and an output flag (convergence) identifying whether or not convergence was achieved through use of the fminsearch function (1 = converged, 0 = did not converge) are returned.

Input into the function consists of dataxo in the form of an n-by-(m + 1) design matrix where the rows correspond to the observations and the columns correspond to predictor variables linked to X0 (the first column of this dataxo matrix must contain all ones in order to appropriately estimate the intercept term in the log-link function), datap in the form of an n-by-(q + 1) design matrix where the rows correspond to the observations and the columns correspond to predictor variables linked to p (the first column of this datap matrix must contain all ones in order to appropriately estimate the intercept term in the logistic function), t which is an n-by-1 vector of observation times, c which is an n-by-1 vector of event indicators (0 = censored, 1 = event) and start which is a 1-by-(m + q + 3) vector of initial guesses for the intercepts and regression coefficients (g0,g1,...,gm) and (b0,b1,...,bq) and the random effect parameter ψ in that order. For example, the starting values needed for a model with 3 covariates linked to initial health and 2 covariates linked to the cure rate need to be in the following order: [g0,g1,g2,g3,b0,b1,b2,ψ].

This function utilizes the fminsearch function to maximize the log-likelihood.

Computational time required for the OUTRREMM function to estimate the regression coefficients and ψ ranges from 45 seconds for models with one covariate in X0 and one covariate in p to 122 seconds for a model with 4 covariates in X0 and 4 covariates in p. Due to model sensitivity to the starting values, we recommend trying several sets of starting values (with the starting value of ψ > 0) prior to reporting the results.

Additionally, OUTRREMM requires the OUTRREMMlikelihood, rndeff1, rndeff2, rndeff3, rndeff4, rndeff5, rndeff6, rndeff7, rndeff8 and rndeff9 functions.

OUTRREMMlikelihood is the negative of the log-likelihood function (Equation (5.13)) minimized using fminsearch in order to obtain the MLEs, and the 9 rndeff functions are necessary for estimating standard errors.

Example

To demonstrate the use of OUTRREMM, we fit the OU-TR random effects mixture model applied to the melanoma data described in Section 5.2.3. onesvec is a column vector of ones, and node1, node2 and node3 are column vectors of data from the melanoma study containing the values of the indicator variables for nodal categories N1-N3. The following function call to OUTRREMM produces the results given below:

[gam,beta,Psi,stderrgam,stderrbeta,stderrPsi,pvgam,pvbeta,loglike,convergence] = OUTRREMM([onesvec node1 node2 node3],[onesvec node1 node2 node3],t,c,[0 -1 -1 -1 -1 -1 -1 -1 1.5])

Output:
gam = 0.6964 -0.5607 -0.7694 -0.8744
beta = -0.0994 -0.4003 -1.0247 -1.3329
Psi = 2.5152
eststderrorgam = 0.1891 0.1985 0.2042 0.2062
eststderrorbeta = 0.2525 0.3262 0.3786 0.4046
eststderrorPsi = 0.9863
pvaluesgam = 0.0002 0.0047 0.0002 0.0000
pvaluesbeta = 0.6936 0.2198 0.0068 0.0010
loglike = -391.9231
convergence = 1
