arXiv:0708.0346v1 [stat.ME] 2 Aug 2007

Statistical Science
2006, Vol. 21, No. 4, 501–513
DOI: 10.1214/088342306000000330
© Institute of Mathematical Statistics, 2006

Threshold Regression for Survival Analysis: Modeling Event Times by a Stochastic Process Reaching a Boundary

Mei-Ling Ting Lee and G. A. Whitmore

Abstract. Many researchers have investigated first hitting times as models for survival data. First hitting times arise naturally in many types of stochastic processes, ranging from Wiener processes to Markov chains. In a survival context, the state of the underlying process represents the strength of an item or the health of an individual. The item fails or the individual experiences a clinical endpoint when the process reaches an adverse threshold state for the first time. The time scale can be calendar time or some other operational time scale reflecting degradation or disease progression. In many applications, the process is latent (i.e., unobservable). Threshold regression refers to first-hitting-time models with regression structures that accommodate covariate data. The parameters of the process, threshold state and time scale may depend on the covariates. This paper reviews aspects of this topic and discusses fruitful avenues for future research.

Key words and phrases: Accelerated testing, calendar time, competing risk, cure rate, duration, environmental studies, first hitting time, gamma process, lifetime, latent variable models, maximum likelihood, occupational exposure, operational time, Ornstein–Uhlenbeck process, Poisson process, running time, stochastic process, stopping time, survival analysis, threshold regression, time-to-event, Wiener diffusion process.

Mei-Ling Ting Lee is Professor and Chair, Division of Biostatistics, School of Public Health, Ohio State University, Columbus, Ohio 43210, USA. G. A. Whitmore is Professor, Management Science, Desautels Faculty of Management, McGill University, Montreal, Quebec, Canada H3A 1G5.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in Statistical Science, 2006, Vol. 21, No. 4, 501–513. This reprint differs from the original in pagination and typographic detail.

1. INTRODUCTION

Many types of lifetime, duration or time-to-event data may be interpreted as first hitting times (FHT's) of a boundary or threshold state by sample paths of a stochastic process, which may be latent or observable. FHT models have a long history in diverse fields of application, including medicine, environmental science, engineering, business, economics and sociology. For example, FHT models may describe the length of a hospital stay, the survival time of a transplant patient, the time of onset of a cancer induced by an occupational exposure, the failure time of an engineering system, the depletion time of an inventory, the survival time of a new business, the transition time for a stock price change, and the length of a marriage. Relevant articles, covering both theory and application, appear in the lifetime data and reliability literatures. These FHT models are gradually becoming widely adopted because of their realism and conceptual appeal. Recently, interest in them has deepened and spread, and exciting new areas of application are being encountered. The potential applications require new conceptual viewpoints, theoretical advances, analytical techniques and methodological extensions, to which the discussion returns later.

To make FHT models truly valuable in applications, they must be capable of extension to include regression structures. Regression structures allow the effects of covariates to explain the inherent dispersion of the data, thereby taking account of variability and sharpening inferences. Regression structures also provide scientific insights into the potential causal roles of covariates in the underlying processes, boundary sets and time scales. This article reviews aspects of FHT models and is concerned especially with regression structures for FHT models, which will be referred to as threshold regression, or TR for short. The word "threshold" refers to the fact that the FHT is triggered by the underlying process reaching a threshold state within a boundary set, as described in more detail in the next section.

2. THE FIRST-HITTING-TIME (FHT) MODEL

A first-hitting-time (FHT) model has two basic components: (1) a parent stochastic process {X(t), t ∈ 𝒯, x ∈ 𝒳} with initial value X(0) = x_0, where 𝒯 is the time space and 𝒳 is the state space of the process, and (2) a boundary set ℬ, where ℬ ⊂ 𝒳. We shall refer to the boundary set ℬ as a boundary, barrier or threshold, depending on which is most descriptive or conventional in the context. The process {X(t)} may have a variety of properties, such as one or many dimensions, the Markov property, continuous or discrete states, and monotonic sample paths. Whether the sample path of the parent process is observable or latent (unobservable) is an important distinguishing characteristic of the FHT model. Latent processes are by far the most common. The boundary set ℬ may also have different features, as will be illustrated in later examples.

Taking the initial value X(0) = x_0 of the process to lie outside the boundary set ℬ, the first hitting time of ℬ is the random variable S defined as

$$ S = \inf\{ t : X(t) \in \mathcal{B} \}. \tag{1} $$

Thus, the first hitting time is the time when the stochastic process first encounters the set ℬ. We refer to the state first encountered in the boundary set by the process, that is, X(S) ∈ ℬ, as the threshold state. The boundary set defines a stopping condition for the process and, therefore, the FHT is usually a stopping time in the formal sense of that term in stochastic process theory. Note that when the parent process is latent, there is no direct way of observing the FHT event in the state space of the process.

In some versions of the FHT model, there is no guarantee that the process X(t) will reach the boundary set ℬ, so P(S < ∞) < 1. We let S = ∞ denote the absence of a finite hitting time, with P(S = ∞) = 1 − P(S < ∞). The later discussion will show situations where this condition is plausible and a desirable model feature. The basic FHT model (1) assumes that ℬ is fixed in time. In some applications, however, it varies with time, that is, ℬ(t). This variation may be deterministic or follow a stochastic process.

An exhaustive review of the first-hitting-time literature is impossible within the confines of a single article. Eaton and Whitmore (1977) discuss FHT's as a general model for hospital stay. Aalen and Gjessing (2001) provide an excellent overview of much of this subject. Likewise, Lawless (2003) gives a complete and compact summary of theory, models and methods (see Section 11.5, pages 518–523). Lee and Whitmore (2004) also provide an overview of first-hitting-time models for survival and time-to-event data. We will make numerous references to selected work as we proceed. There is a huge literature dealing with theoretical and mathematical aspects of FHT models that we will not attempt to review or incorporate. We also will not cover random growth curve models, such as those of Carey and Koenig (1991) and Lu and Meeker (1993), which have an FHT interpretation but where the only randomness at the level of the individual parent process is confined to a noise factor. Nor do we cover the large literature on linear and nonlinear regression methods for survival data and reliability in which the underlying models have no central FHT motivation (such as accelerated failure time and proportional hazards models).

3. EXAMPLES OF FIRST-HITTING-TIME MODELS

The parent stochastic processes may take many forms, from Wiener processes to Markov chains. Likewise, the nature of the boundary state may vary widely: for example, a fixed threshold in a Wiener process or an absorbing state in a Markov chain. As the preceding description of a first-hitting-time model is quite abstract, we now list a few basic examples to illustrate the variety encountered in applications.

1. Bernoulli process and negative binomial first hitting time. The number of trials S required to reach the mth success in a Bernoulli process {B_t, t = 1, 2, ...} has a negative binomial distribution with parameters m and p, where p is the success probability on each trial. To give this setup our standard representation, we consider a parent process {X_t, t = 0, 1, 2, ...} with initial value X_0 = x_0 = m and let X_t = x_0 − B_t, t = 1, 2, ..., where {B_t} is the preceding Bernoulli process (here B_t counts the successes in the first t trials). The first hitting time is the first Bernoulli trial t = S for which X_t equals 0. The number of rocket launches required to get m satellites in orbit is a simple example of this FHT model.

2. Poisson process and Erlang first hitting time. The time S until the occurrence of the mth event in a Poisson process {N(t), t ≥ 0} with rate parameter […]

[…] (1995), Whitmore and Schenkelberg (1997) and Horrocks and Thompson (2004), to name a few. Onar and Padgett (2000) and Padgett and Tomlinson (2004) extend the Wiener diffusion model to an accelerated testing context. Pettit and Young (1999) set the model in a Bayesian context. As illustrated later, the inverse Gaussian distribution has a closed-form probability density function (p.d.f.) and a computationally simple cumulative distribution function (c.d.f.). Their formulas vary slightly depending on whether the parent process is defined as rising or falling to hit the relevant boundary.

4. Gamma process and inverse gamma first hitting time. Consider a parent process {X(t), t ≥ 0} with initial value X(0) = x_0 > 0. Let X(t) = x_0 − Z(t), where {Z(t), t ≥ 0} is a gamma process with scale parameter β, shape parameter α and Z(0) = 0. The first hitting time of the zero level in the parent process (X = 0) has an inverse gamma c.d.f., defined by the identity P(S > t) = P(Z(t) […]
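Example 1 is simple enough to check numerically. The following Python sketch (function names are illustrative, not from the source) simulates the parent process X_t = m − B_t until it first reaches 0 and compares the result with the exact negative binomial probabilities. It assumes independent trials with a constant success probability p, and takes B_t to be the running count of successes.

```python
import math
import random


def bernoulli_fht(m: int, p: float, rng: random.Random) -> int:
    """Simulate S, the first trial t at which X_t = m - B_t reaches 0.

    B_t counts the successes among the first t independent Bernoulli(p) trials,
    so S is the trial on which the m-th success occurs.
    """
    x = m  # parent process starts at x_0 = m
    t = 0
    while x > 0:
        t += 1
        if rng.random() < p:  # a success on trial t lowers the parent process by one
            x -= 1
    return t


def negative_binomial_pmf(s: int, m: int, p: float) -> float:
    """Exact P(S = s): m - 1 successes in the first s - 1 trials, then a success."""
    if s < m:
        return 0.0
    return math.comb(s - 1, m - 1) * p**m * (1 - p) ** (s - m)


if __name__ == "__main__":
    rng = random.Random(2006)  # fixed seed for reproducibility
    m, p, n = 3, 0.4, 20000
    draws = [bernoulli_fht(m, p, rng) for _ in range(n)]
    print("simulated mean:", sum(draws) / n, "  exact mean m/p:", m / p)
    print("P(S = 5) simulated:", draws.count(5) / n,
          "  exact:", negative_binomial_pmf(5, m, p))
```

With m = 3 and p = 0.4, the simulated mean should be close to m/p = 7.5, the mean of the negative binomial FHT.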

FHT models offer the possibility of inferences about this distance.

To illustrate an FHT competing risks model with a concrete medical example, consider a multidimensional Wiener process of C dimensions with a boundary ℬ_c in each dimension. Each of the C dimensions defines a different cause of death c ∈ {1, 2, ..., C}. In such a model, one may make inferences about secondary medical conditions that are not the primary cause of death. For example, in studies of occupational exposure to diesel exhaust, workers may be found to have increased risks of death from lung disease, cardiovascular disease and other causes. It is desirable in such a context to have an FHT model that can consider different causes of death simultaneously. A worker dying of lung cancer (the primary cause of death) may also have advanced cardiovascular disease, both conditions being aggravated by exposure to an occupational hazard such as diesel exhaust. An investigator may then wish to make inferences about the worker's cardiovascular disease status at the time of death from lung cancer. We note that if the underlying multidimensional Wiener process is correlated, then the latent survival times for different causes of death will be dependent.

5. CURE RATES

We mentioned at the outset that some FHT models may offer a positive probability of no FHT taking place in finite time. Thus, for example, a medical treatment may offer a cure, some animals in a population may be immune to infection, some stock prices may never reach $100, and some marriages may never end in divorce. The fact that P(S = ∞) > 0 in some FHT models is closely related to competing risks. Generally, if the FHT model takes account of all competing risks, then eventual failure from some cause is assured. If, however, the FHT model takes account of only one or a few competing risks, then there is a positive probability that the FHT will be infinite, which accommodates those individuals who are not susceptible to the limited array of causes of failure considered in the model. To illustrate the natural way in which FHT models take account of a cure rate, consider a Wiener diffusion model with a fixed boundary at zero (the time axis). If the drift of the process is away from the boundary, that is, µ > 0, then a finite FHT is not assured and, in particular, P(S < ∞) = exp(−2x_0µ/σ²). Likewise, a gamma process with a cure rate might be defined as

$$ X(t) = \begin{cases} x_0, & \text{with probability } 1 - p, \\ x_0 - Z(t), & \text{with probability } p. \end{cases} \tag{4} $$

Here the parameter p is a susceptibility fraction, with 0 ≤ p ≤ 1. As an example of this last model, a subject may have a malignant or benign form of a disease with probabilities p and 1 − p, respectively. The malignant form progresses monotonically toward a medical endpoint (e.g., death).

6. COVARIATES AND LINK FUNCTIONS FOR THRESHOLD REGRESSION

The parent process {X(t)} and boundary set ℬ of the FHT model will both generally have parameters that depend on covariates that vary across individuals. To illustrate, consider the Wiener process model in Example 3. The Wiener process has mean parameter µ and variance parameter σ², and the boundary set has parameter x_0, the initial process level. In threshold regression, these parameters are connected to linear combinations of covariates through suitable regression link functions, as illustrated below for a generic parameter θ:

$$ g_\theta(\theta_i) = z_i \beta. \tag{5} $$

Here g_θ is the link function, θ_i is the value of the parameter θ for individual i, z_i = (1, z_{i1}, ..., z_{ik}) is the covariate vector of individual i (with a leading unit to include an intercept term) and β is the associated vector of regression coefficients. The mathematical form of the link function must suit the application. Generally, it will be chosen to map the parameter space onto the real line. For example, a variance parameter such as σ² may employ a logarithmic link function, that is, ln(σ²) = zβ. Likewise, the list of covariates and their mathematical forms in the regression function zβ must be chosen appropriately, as in a conventional regression analysis.

Previous work that has considered regression structures for FHT models includes Whitmore (1983), Whitmore, Crowder and Lawless (1998), Lee, DeGruttola and Schoenfeld (2000) and Lee et al. (2004). To illustrate one of these applications, Lee, DeGruttola and Schoenfeld (2000) use a bivariate Wiener diffusion process as the basis of a threshold regression model for the study of progression to death in AIDS, with CD4 cell count serving as a marker process (marker processes are discussed in Section 8). The initial health status and the mean parameter of the parent process are made to depend on baseline covariates and treatment variables through log-linear and identity link functions, respectively. The mean and variance parameters of the marker process are also given a regression structure, with identity and log-linear link functions. Finally, the correlation parameter for the parent and marker processes uses a correlation transform as a link function.

Threshold regression raises some new issues for estimation and inference in FHT models. Where FHT models are estimated only from censored survival data, parameter estimators may exhibit significant multicollinearity, especially with highly parameterized regression functions. This does not reflect any deficiency of the FHT model but, rather, the limited information content of sample data in a rich modeling context. Reparameterizing the model can help with computational problems arising from this multicollinearity, but generally the condition is not severe enough to prevent estimates from being computed. The impact is felt primarily in the interpretability of the estimation results. As with conventional regression, where regression effects are highly collinear, it will be difficult to attribute an effect to a particular model component. For example, in threshold regression based on censored inverse Gaussian survival data, estimates of covariate effects on the initial value x_0 and on the mean parameter µ may be collinear because the mean survival time depends on their ratio x_0/|µ|. The high correlation of their sampling errors can therefore be mitigated only if the data reveal fine details of the dispersion pattern of the survival times. Censoring or small sample sizes may mask those fine details and thus make estimation more difficult.

7. RUNNING TIME VERSUS CALENDAR TIME

In many applications of threshold regression, the natural time scale of the parent process is not calendar or clock time. For example, a mechanical component may wear according to the amount of its usage, or liver disease may progress according to an individual's cumulative consumption of alcohol. Mathematical research on different time scales has been carried out by many researchers. Cox and Oakes (1984, Section 1.2, pages 3–4) pointed out that "often the 'scale' for measuring time is clock time, although other possibilities certainly arise, such as the use of operating time of a system, mileage of a car, or some measure of cumulative load encountered." These accumulation measures increase with calendar time and thus are alternative progression scales for the stochastic process. Such measures are given a variety of names, depending on the context, such as operational time, disease progression, or running time. We shall mainly use the last name here. If r(t) denotes the transformation of calendar time t to running time r, with r(0) = 0, and {X(r)} is the parent process defined in terms of running time r, then the resulting process expressed in terms of calendar time is the subordinated process X*(t) = X[r(t)], where the asterisk identifies the subordinated process. FHT models may be adapted to running time scales in a variety of ways, as we describe below, including both random and nonrandom transformations. We note that r(t) must be monotonic (nondecreasing), but it need not be strictly increasing. Interesting effects arise, for example, where r(t) is a function with jump discontinuities.

1. Some applications require a monotonic mathematical transformation of the time scale. In these cases, r(t) is a deterministic function of calendar time t. A typical example from an engineering application is the strictly monotonic transformation r = 1 − exp(−λt^γ) with λ > 0 and γ > 0. See, for example, Carey and Koenig (1991), Whitmore (1995) and Whitmore and Schenkelberg (1997). The mathematical transformation may depend on covariates, as in Bagdonavičius and Nikulin (2001), where the running time scale forms part of an accelerated life model.

2. Running time may also enter an FHT model through a stochastic process used for subordination. In this context, the parent process {X(r)} is directed by a second stochastic process {R(t)} having monotonic sample paths. We refer to {R(t)} as the directing process, and the subordinated process takes the form {X*(t)} = {X[R(t)]}. Unlike a monotonic mathematical transformation, subordination with a stochastic process gives the transformation random properties that can greatly enrich the model. Lee and Whitmore (1993) examine the connection between subordinated stochastic processes and running time. As a specific example of a subordinated process, one can consider a Poisson parent process that is directed by
a gamma process (which has monotonic sample paths). The result is a clustering Poisson process (Hougaard, Lee and Whitmore, 1997) in which an FHT can be triggered by the occurrence of a cluster of Poisson events.

3. The running time scale may be a combination of different accumulation measures. For example, Oakes (1995) and Kordonsky and Gertsbakh (1997) look at multiple running time scales in survival data analysis. Duchesne and Lawless (2000) and Duchesne and Rosenthal (2003) describe various advances in running time models for survival data. The concept of collapsible time within the context of accelerated failure time models is central to this earlier work. The basic idea appears in various forms. For example, a composite running time might be defined by

$$ r(t) = \sum_{j=1}^{J} \alpha_j r_j(t), \tag{6} $$

where the r_j(t) are different accumulation measures that can advance degradation or disease progression and the α_j are positive parameters that weight the contributions of the different measures. One of the measures, say r_1(t), may be calendar time itself, so r_1(t) = t. One α_j parameter must be set to unity to give a well-defined scale. Typically, in this setup, composite running time has a fixed mathematical form for any given individual, but individuals will have different scales because the r_j(t) vary randomly among individuals. As a simple example of a composite running time, consider the mechanical aging of a motor vehicle, which may be related both to the passage of calendar time r_1(t) = t and to accumulated mileage r_2(t). In this case, (6) has two components: r(t) = t + αr_2(t). Notice that α_1 is set to 1 and α_2 = α.

A practical case of the last kind of running time is illustrated in Lee et al. (2004), where railroad workers are employed in different types of jobs, indexed by j = 1, ..., J, which have differential exposures to diesel exhaust, an occupational risk. The running time (6) is here defined as a weighted sum of different exposure intervals. The quantity r_j(t) is the time spent by the worker in job type j during the time interval [0, t]. The α_j are positive weights that determine the rates at which running time advances per unit of calendar time spent in the different job types. One α_j is set to unity as a numeraire, say α_J = 1. The r_j(t) must also satisfy the accounting constraint Σ_{j=1}^{J} r_j(t) = t. Observe that (6) is a deterministic transformation for any given set of exposure intervals r_j(t), but that these intervals vary randomly from one worker to another according to their individual work histories.

Some processes may be defined in terms of running time from the outset. For example, in a Bernoulli process or a Markov chain, the progress parameter represents the sequence of trials or steps of the process, respectively. These parameters may already be seen as reflecting a kind of running time. The mapping of calendar time to this running time, as represented by r(t) = r, is already implicit in the discrete progress parameter of the process.

The running time scale r(t) is included in the FHT model in order to make the model a more valid representation of reality. With a correct specification of running time, one would expect health status or component strength to decline steadily and predictably against the scale that measures the accumulating "wear and tear" of running time. In other words, X(r) would retain very little or no inherent variability if r(t) could be chosen carefully. This situation describes an ideal that is unattainable in most practical applications of FHT models but is a target of model building.

8. INCORPORATING MARKER PROCESSES IN THRESHOLD REGRESSION

A marker process refers to an external process that covaries with the parent process. It assists in tracking the progress of the parent process if the parent process is latent or only infrequently observed. In this way, the marker process forms a basis for predictive inference about the status of the parent process and its progress toward an FHT. Marker processes may also be of scientific interest in their own right. As markers of the parent process, they offer potential insights into the causal forces that generate the movements of the parent process. Examples of marker processes include CD4 cell count for AIDS, blood pressure for cardiovascular disease, personal medical cost for health status, input drive current for a laser, and ambient temperature for equipment.

The basic analytical framework for a marker process conceives of a bivariate stochastic process {X(r), Y(r)}, where the parent process {X(r)} is one component process and the marker process {Y(r)} is the other. Both are assumed to be one-dimensional for convenience of exposition. They are also both defined on the running time scale r; we discuss the implications of this last point shortly. Whitmore, Crowder and Lawless (1998) look at failure inference based on a bivariate Wiener model in which failure is governed by the FHT of a latent degradation process while auxiliary readings are available from a correlated marker process. As noted earlier, Lee, DeGruttola and Schoenfeld (2000) apply this bivariate marker model to CD4 cell counts in the context of AIDS survival.

An application may offer a variety of marker processes, say, {Y_k(r)}, k = 1, ..., K, that may be of potential scientific value. They can be studied separately or combined into a composite marker process. For marker processes that involve measurements, the following additive form for the composite marker might be appropriate:

$$ Y(r) = \gamma_0 + \sum_{k=1}^{K} \gamma_k Y_k(r). \tag{7} $$

The concept of a composite marker was first proposed by Whitmore, Crowder and Lawless (1998) in an engineering context. The aim in constructing the composite marker process is to find the linear combination of the K candidate markers that has the largest predictive correlation or association with the parent process. The γ_k parameters define the linear combination and generally must be estimated. The approach is reminiscent of regression analysis, with the composite marker serving as a regression function to predict the parent process {X(r)}. Here γ_0 serves as the intercept term of the regression relationship. If the composite marker can mimic the parent process perfectly, then {X(r)} and {Y(r)} will be perfectly correlated. An exact model for the preceding setup is a (K + 1)-variate Wiener diffusion process in which the parent process is one component and the K markers are the remaining components. The conditional process {X(r) | Y_1(r) = y_1(r), ..., Y_K(r) = y_K(r)} then defines an exact linear regression structure. Where one is dealing with a parent process or marker processes that are not measurement processes, such as Markov chains, the concept of a composite marker process must be redefined in a suitable manner.

Fig. 1. This conceptual framework shows the connections between the parent process (often a latent process), running time (RT) and an external marker process that is correlated with the parent process. Time subordination links calendar time (CT) to running time. The threshold regression structure stands in the background and is not displayed explicitly in the figure.

The FHT modeling framework has evolved in the literature to encompass three major components, as shown in Figure 1, namely, an FHT model (parent process and boundary set) that defines the relevant endpoint, a running time scale and a marker process. The threshold regression (TR) structure stands in the background of the schematic in Figure 1 and allows the parameters of the various components in the figure to depend on baseline and other covariates. Although the schematic shows only one marker, there may clearly be many. The framework in Figure 1 has several noteworthy features. For example, if a marker process has monotonic sample paths, it may serve either as a marker or as a running time r(t), as the investigator deems appropriate. The framework reminds us that a marker process {Y(r)} should be defined on the running time scale r when its correlation with the parent process {X(r)} is being considered. Thus, for example, if r(t) measures an individual's cumulative exposure to a potential carcinogen at time t and the marker y is a serum measurement for the individual on a cancer-specific antigen at time t, then the serum reading y should be recorded as a function of cumulative exposure r. In other words, the progress parameter of the serum marker process is cumulative exposure r, not calendar time t.

We have said that the parent process is generally latent. This feature is certainly common in medical applications, where the inherent health condition cannot be observed (and, indeed, may be deemed unmeasurable). Marker processes are surrogates for health
status, especially if they are highly correlated with the underlying medical condition. These markers may range from biomedical measurement processes, such as serum measurements, to more qualitative processes, such as periodic subjective evaluations of health status by a patient or caregiver. In engineering systems, there will be contexts in which wear and tear, for example, can be observed and measured. In many physical settings, however, only surrogates for the system condition are available. For example, the drive current of a laser is a marker for its physical con[…]sures of solvency). We also add that marker processes may be leading, lagging or coincident with respect to the parent process, and their phase will be important in predictive inference for the parent process and its FHT.

9. DATA STRUCTURES FOR THRESHOLD REGRESSION MODELS

The data structures of threshold regression studies vary widely. To be specific, we look at the case in […] available for the parent process and the covariate vector. In this case, the data structure for a single individual can be summarized as follows:

Time points: 0 = t_0 ≤ t_1 ≤ ··· ≤ t_m,
Failure codes: f_0 = 0, f_1 = 0, ..., f_{m−1} = 0, f_m = 0 or 1,
Readings on parent process: x_0, x_1, ..., x_m,
[…]ses X^(i)(t) and boundary sets ℬ^(i). The individual processes are often assumed to be mutually independent.

2. Where there are competing modes of failure, the cause of failure d will be recorded for each individual.

3. The final observation time t_m for an individual is a random stopping time if f_m = 1. Thus, t_m = S and x_m = X(S) ∈ ℬ if f_m = 1. Here X(S) is the threshold state realized by the individual at the FHT. If f_m = 0, then time t_m is a right-censoring time for the FHT, that is, t_m […] 1 for some individuals.

5. If the parent process is latent, then the data set will have no observations x_j, although there may still be readings on the covariate vectors z_j.

6. If the data set consists only of a single time t and failure indicator f for each individual, then the data set constitutes censored survival data. With a baseline covariate vector z_0 available, the data provide a basis for censored survival threshold regression.

7. Let X(t_j) be abbreviated X_j for any individual. The reading x_j on the parent process, for j […] t_j. The conditioning event is that the process has reached state x_j at time t_j without experiencing an FHT.

8. Where {X(t)} is a Markov process (which is the most common type of model), we have for any individual that

$$ P(X_j = x_j \mid x_{j-1}, \ldots, x_0, S > t_j) = P(X_j = x_j \mid X_{j-1} = x_{j-1}, S > t_j) \tag{8} $$

for j […]
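The per-individual data layout of this section can be held in a small container that checks the stated constraints. The following Python sketch is a minimal illustration under stated assumptions: the class name FHTRecord and its methods are hypothetical; a latent parent process is represented by readings=None (item 5); one covariate vector is stored per time point; and the reduction to censored survival data keeps only the final time, the final failure code and the baseline covariate vector z_0 (item 6).

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class FHTRecord:
    """Data for one individual in a threshold regression study (hypothetical container).

    times      : 0 = t_0 <= t_1 <= ... <= t_m
    failure    : f_0 = ... = f_{m-1} = 0 and f_m in {0, 1}
    readings   : x_0, ..., x_m on the parent process, or None if it is latent
    covariates : one covariate vector z_j per time point
    """
    times: Sequence[float]
    failure: Sequence[int]
    readings: Optional[Sequence[float]]
    covariates: Sequence[Sequence[float]]

    def __post_init__(self) -> None:
        m = len(self.times) - 1
        if m < 0 or self.times[0] != 0:
            raise ValueError("time points must start at t_0 = 0")
        if any(later < earlier for earlier, later in zip(self.times, self.times[1:])):
            raise ValueError("time points must be nondecreasing")
        if len(self.failure) != m + 1 or len(self.covariates) != m + 1:
            raise ValueError("need one failure code and one covariate vector per time point")
        if any(f != 0 for f in self.failure[:-1]) or self.failure[-1] not in (0, 1):
            raise ValueError("only the final failure code f_m may equal 1")
        if self.readings is not None and len(self.readings) != m + 1:
            raise ValueError("need one parent-process reading per time point")

    @property
    def failed(self) -> bool:
        """True if t_m is the FHT S; False if t_m is a right-censoring time."""
        return self.failure[-1] == 1

    def to_censored_survival(self) -> Tuple[float, int, Sequence[float]]:
        """Reduce to censored survival data: (final time t, indicator f, baseline z_0)."""
        return self.times[-1], self.failure[-1], self.covariates[0]
```

Validating the constraints when the record is built keeps malformed histories, such as a failure code of 1 before the final time point, out of any later likelihood calculation.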

10. PARAMETER ESTIMATION AND INFERENCE

In applications to date, parameter estimation for FHT models and threshold regression has been heavily dominated by maximum likelihood methods. The reason is that the probabilistic specification of the parent stochastic process in FHT models is usually explicit and, hence, likelihood expressions follow as a matter of course. The optimization required by this estimation method may employ a variety of computational techniques, but gradient methods work very well. Extensions to Bayesian methods have been developed in some cases. For example, Pettit and Young (1999) and Shubina (2005a, 2005b) have embedded the Wiener diffusion FHT model in a Bayesian framework. Lee, DeGruttola and Schoenfeld (2000) have developed some predictive inference results for this case, in conjunction with a marker process. Nothing seems to stand in the way of developing nonparametric and semiparametric approaches for these models, but these approaches have not yet been taken up in the literature.

Case illustration. To illustrate the nature of inference for one of the simple threshold regression settings, we now set up the sample log-likelihood function for censored inverse Gaussian regression for a medical application like that found in Lee et al. (2004). We consider a latent health status process defined on a running time scale $r$. We let the parent process be a Wiener diffusion process. The FHT for such a process follows an inverse Gaussian distribution. The inverse Gaussian distribution depends on the mean and variance parameters of the underlying Wiener process ($\mu$ and $\sigma^2$) and the initial health status level ($x_0$). We let $f(r \mid \mu, \sigma^2, x_0)$ and $F(r \mid \mu, \sigma^2, x_0)$ denote the probability density function (p.d.f.) and cumulative distribution function (c.d.f.) of the FHT distribution, both defined in terms of running time $r$. These functions have simple computational forms. For the case where the process begins at $x_0 > 0$ and the boundary is the zero level, the p.d.f. for the first hitting time is given by

$$f(r \mid \mu, \sigma^2, x_0) = \frac{x_0}{\sqrt{2\pi\sigma^2 r^3}} \exp\left[-\frac{(x_0 + \mu r)^2}{2\sigma^2 r}\right] \qquad (9)$$

for $-\infty < \mu < \infty$, $\sigma^2 > 0$, $x_0 > 0$.

If $\mu > 0$, then the FHT is not certain to occur and the p.d.f. is improper. Specifically, in this case, $P(S = \infty) = 1 - \exp(-2x_0\mu/\sigma^2)$. The c.d.f. corresponding to (9) is

$$F(r \mid \mu, \sigma^2, x_0) = \Phi\left[-\frac{\mu r + x_0}{\sqrt{\sigma^2 r}}\right] + \exp(-2x_0\mu/\sigma^2)\,\Phi\left[\frac{\mu r - x_0}{\sqrt{\sigma^2 r}}\right], \qquad (10)$$

where $\Phi(\cdot)$ is the c.d.f. of the standard normal distribution.

The health status process is latent here and, hence, can be given an arbitrary measurement unit. Thus, one parameter may be fixed. We set the variance parameter $\sigma^2$ to unity. Both parameters $\mu$ and $x_0$ are linked to $k$ regression covariates that are represented by the row vector $\mathbf{z} = (1, z_1, \ldots, z_k)$. The leading 1 in $\mathbf{z}$ allows for a constant term in the regression relationship. An identity function of the form

$$\mu = \mathbf{z}\boldsymbol{\beta} = \beta_0 + \beta_1 z_1 + \cdots + \beta_k z_k$$

is used to link the parameter $\mu$ to the covariates, and a logarithmic function

$$\ln(x_0) = \mathbf{z}\boldsymbol{\gamma} = \gamma_0 + \gamma_1 z_1 + \cdots + \gamma_k z_k$$

is used to link the parameter $x_0$ to the covariates. Here $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_k)'$ and $\boldsymbol{\gamma} = (\gamma_0, \gamma_1, \ldots, \gamma_k)'$, where $\beta_0$ and $\gamma_0$ are regression constants. Parameters of the running time scale, such as $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_J)$ in (6), may also be linked to covariates using link functions of appropriate form.

We now denote $\mu$ and $x_0$ for subject $i$ by $\mu^{(i)}$ and $x_0^{(i)}$. We let $r^{(i)}$ denote the running time for subject $i$. Time $r^{(i)}$ is the running time at the moment of death for a dying subject and a right-censored value of that running time for a surviving subject. Hence, each dying subject $i$ contributes probability density $f(r^{(i)} \mid \mu^{(i)}, x_0^{(i)})$ to the sample likelihood function, for $i = 1, \ldots, n_1$, and each surviving subject $i$ contributes survival probability $\bar{F}(r^{(i)} \mid \mu^{(i)}, x_0^{(i)}) = 1 - F(r^{(i)} \mid \mu^{(i)}, x_0^{(i)})$ to the sample likelihood function, for $i = n_1 + 1, \ldots, n_1 + n_0$. The sum $n = n_1 + n_0$ is the total number of subjects. The sample log-likelihood function to be maximized therefore has the form

$$\ln L(\boldsymbol{\alpha}, \boldsymbol{\beta}, \boldsymbol{\gamma}) = \sum_{i=1}^{n_1} \ln f(r^{(i)} \mid \mu^{(i)}, x_0^{(i)}) + \sum_{i=n_1+1}^{n_1+n_0} \ln \bar{F}(r^{(i)} \mid \mu^{(i)}, x_0^{(i)}). \qquad (11)$$

Numerical gradient methods can be used to find maximum likelihood estimates for $\boldsymbol{\beta}$, $\boldsymbol{\gamma}$ and $\boldsymbol{\alpha}$.
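The censored inverse Gaussian regression of the case illustration can be prototyped in a few lines. The following is a minimal sketch in Python with NumPy and SciPy, assuming $\sigma^2 = 1$, running time equal to calendar time (so there are no $\boldsymbol{\alpha}$ parameters), a single hypothetical covariate and simulated data; the function names, parameter values and censoring time are illustrative assumptions, not taken from Lee et al. (2004). It codes (9) and (10) on the log scale, applies the identity and logarithmic links, and maximizes (11) with BFGS using finite-difference gradients, which is one instance of the numerical gradient methods mentioned above.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize
from scipy.stats import norm


def fht_logpdf(r, mu, x0, sigma2=1.0):
    """Log of the FHT density (9): Wiener process started at x0 > 0, boundary at zero."""
    return (np.log(x0) - 0.5 * np.log(2.0 * np.pi * sigma2 * r**3)
            - (x0 + mu * r) ** 2 / (2.0 * sigma2 * r))


def fht_logsf(r, mu, x0, sigma2=1.0):
    """Log survival probability ln[1 - F(r)], with F the c.d.f. (10).

    The two terms of (10) are added on the log scale so that the factor
    exp(-2 x0 mu / sigma2) cannot overflow when mu is strongly negative.
    """
    s = np.sqrt(sigma2 * r)
    log_term1 = norm.logcdf(-(mu * r + x0) / s)
    log_term2 = -2.0 * x0 * mu / sigma2 + norm.logcdf((mu * r - x0) / s)
    cdf = np.exp(np.logaddexp(log_term1, log_term2))
    return np.log1p(-np.minimum(cdf, 1.0 - 1e-12))  # guard against rounding up to 1


def neg_loglik(params, Z, r, died):
    """Negative of (11) divided by n, with mu = Z beta and ln(x0) = Z gamma."""
    k = Z.shape[1]
    beta, gamma = params[:k], params[k:]
    mu = Z @ beta                 # identity link
    x0 = np.exp(Z @ gamma)        # logarithmic link keeps x0 > 0
    ll = np.where(died, fht_logpdf(r, mu, x0), fht_logsf(r, mu, x0))
    return -np.mean(ll)           # averaging keeps gradients on a moderate scale


# Sanity check: density (9) integrated over (0, 3] should reproduce F(3) from (10).
area, _ = quad(lambda u: np.exp(fht_logpdf(u, -0.5, 2.0)), 0.0, 3.0)
print(area, -np.expm1(fht_logsf(3.0, -0.5, 2.0)))

# Simulated data with one covariate z1 ~ Uniform(0, 1); all values are hypothetical.
rng = np.random.default_rng(2006)
n = 4000
Z = np.column_stack([np.ones(n), rng.uniform(0.0, 1.0, n)])
beta_true = np.array([-1.0, -0.5])   # mu < 0 for every subject, so every FHT is finite
gamma_true = np.array([1.0, 0.5])
mu_true = Z @ beta_true
x0_true = np.exp(Z @ gamma_true)
# With sigma^2 = 1 the FHT is inverse Gaussian with mean x0/|mu| and shape x0^2.
S = rng.wald(x0_true / -mu_true, x0_true**2)
C = 4.0                              # administrative censoring time
died = S <= C                        # n1 = died.sum() deaths, n0 = n - n1 survivors
r = np.minimum(S, C)

theta_true = np.concatenate([beta_true, gamma_true])
fit = minimize(neg_loglik, np.zeros(4), args=(Z, r, died), method="BFGS")
print("beta_hat:", fit.x[:2], "gamma_hat:", fit.x[2:])
```

Working on the log scale matters in practice: for strongly negative $\mu$ the factor $\exp(-2x_0\mu/\sigma^2)$ in (10) overflows if computed directly, while its product with the normal c.d.f. stays bounded by 1. The printed estimates should lie near $(-1, -0.5)$ and $(1, 0.5)$; how near depends on the simulated sample.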

11. THRESHOLD REGRESSION FOR LONGITUDINAL DATA ANALYSIS

Our discussion of data structures in Section 9 has anticipated that longitudinal data are gathered on the respective stochastic processes of individuals in some applications. Using our previous notation, we now let $\{A_j\}$ denote the longitudinal observation process, defined on the time points $t_j$, $j = 0, 1, \ldots$. If the individual survives beyond time $t_j$, then the failure code $f_j = 0$ and $A_j = \{S > t_j, x_j, \mathbf{z}_j\}$ for $j \le m$. If the individual fails in the final interval $(t_{m-1}, t_m]$, then $f_m = 1$ and $A_m = \{S \in (t_{m-1}, t_m], x_m \in \mathcal{B}\}$. As defined earlier, $S$ is the stopping time for the longitudinal observation process. We note that $\mathbf{z}_m$ is not defined when the individual has failed and, hence, is dropped from the expression for $A_m$. Moreover, the final reading $x_m$ for the parent process lies inside the boundary set $\mathcal{B}$ when the individual has failed.

Longitudinal data of this kind pose an interesting challenge for first-hitting-time models, as for most time-to-event models. Lu (1995) considers the problem for the basic Wiener model where longitudinal observations are made on the process $\{X(t)\}$ up to the hitting or censoring time, as the case may be. She formulates the likelihood function and computes exact maximum likelihood estimates. The methodology is somewhat intricate but manageable. Lee, DeGruttola and Schoenfeld (2000) consider the issue of modeling longitudinal data for a bivariate Wiener model representing a latent health status process and a correlated marker process. These authors mention an interesting approach to handling longitudinal data which they anticipated would be technically satisfactory and practical to implement. Their suggested approach, however, is not elaborated in their article, so we sketch one direction of development below but leave a full exploration of the approach as an open research question. We refer to this method as an uncoupling procedure because it effectively unlinks the longitudinal observations into a set of independent conditional observations.

With the preceding notation, the probability of observing the longitudinal data record of an individual can be expanded as a product of conditional probabilities as

$$P(A_m, A_{m-1}, \ldots, A_1, A_0) = P(A_0) \prod_{j=1}^{m} P(A_j \mid A_{j-1}, \ldots, A_0). \qquad (12)$$

Now we come to the crucial assumption. If it can be assumed that $\{A_j, j = 0, 1, \ldots\}$ is a Markov process with initial state $A_0$, then (12) can be simplified as

$$P(A_m, A_{m-1}, \ldots, A_1, A_0) = P(A_0) \prod_{j=1}^{m} P(A_j \mid A_{j-1}). \qquad (13)$$

In other words, the probability of observing $A_j$ depends only on its preceding state $A_{j-1}$ and not on the earlier history of the observation process. The explicit forms of the probability elements on the right-hand side of (13) are

$$P(A_j \mid A_{j-1}) = P(S > t_j, x_j, \mathbf{z}_j \mid S > t_{j-1}, x_{j-1}, \mathbf{z}_{j-1}) \quad \text{if } f_j = 0,\ j \le m, \qquad (14)$$

$$P(A_m \mid A_{m-1}) = P(S \in (t_{m-1}, t_m], x_m \in \mathcal{B} \mid S > t_{m-1}, x_{m-1}, \mathbf{z}_{m-1}) \quad \text{if } f_m = 1. \qquad (15)$$

If no observations are available on the parent process, then $x_j$ is dropped from the $A_j$ notation, giving $A_j = \{S > t_j, \mathbf{z}_j\}$ if $f_j = 0$, $j \le m$, and $A_m = \{S \in (t_{m-1}, t_m]\}$ if $f_m = 1$. Again, invoking the Markov assumption for the observation process, (14) and (15) take the revised forms

$$P(A_j \mid A_{j-1}) = P(S > t_j, \mathbf{z}_j \mid S > t_{j-1}, \mathbf{z}_{j-1}) \quad \text{if } f_j = 0 \text{ for } j \le m, \qquad (16)$$

$$P(A_m \mid A_{m-1}) = P(S \in (t_{m-1}, t_m] \mid S > t_{m-1}, \mathbf{z}_{m-1}) \quad \text{if } f_m = 1. \qquad (17)$$

Statement (13) is the theoretical justification for the uncoupling procedure. Neither this theoretical development nor issues of practical implementation of the procedure were taken up by Lee, DeGruttola and Schoenfeld (2000). As already noted, the procedure remains an open topic for future research.

12. MODEL VALIDATION, DIAGNOSTICS AND REMEDIES

Although procedures for model validation, diagnostics and remedies are not as well developed for threshold regression as for conventional regression
models for survival data, a number of techniques have been proposed and applied successfully in earlier FHT investigations. For example, procedures are available for checking the assumptions of the TR model having a Wiener process and inverse Gaussian FHT, both with and without associated marker processes. Lee, DeGruttola and Schoenfeld (2000) present some procedures for this TR model and demonstrate the techniques using a medical case application. Lee and Whitmore (2002) present a larger suite of techniques for checking assumptions of this model and also discuss a number of remedies that might be used where assumptions do not hold. Lee et al. (2004) also discuss validation for an extension of this same model in which the calendar time scale is replaced by a job-exposure disease progression scale. One of the proposed validation procedures relies on the fact that the inverse Gaussian (IG) distribution is the first-stopping-time distribution of a Wiener process. Hence, comparisons can be made between Kaplan–Meier (KM) survival curves and the IG survival curves implied by the model (for different covariate subgroups). Applications with longitudinal observations on the parent process or marker measurements offer even more data for model validation. The previous work also points out the importance of having subject-matter specialists understand the model features and compare them with the fundamental physical processes at play. For example, the concept of an FHT is one feature whose mechanism is found frequently in nature, is easily understood by scientists and can be checked against their scientific understanding of the application context.

13. SOME OPEN RESEARCH PROBLEMS

Many interesting aspects of threshold regression require further study. We noted earlier that multicollinearity of parameter estimates can be a practical issue. It remains to be seen which parameterizations of threshold regression models tend to have relatively independent estimation errors. Multicollinearity within regression functions will tend to show itself in familiar ways and will likely be dealt with by conventional remedies.

The Cox proportional hazards regression model is widely used for survival data analysis. Threshold regression models do not generally possess the proportional hazards feature for different configurations of covariates. A useful research contribution would be made by comparing and contrasting the results of Cox regression and TR in the same context. Some public sets of survival data that are scientifically important and have a plausible FHT interpretation might very well be reanalyzed to see if the key research conclusions are materially affected when a TR model is used in place of a more conventional technique.

Both parent and marker processes may be subject to measurement error. For example, blood pressure is known to be measured with error. Whitmore (1995) studied a Wiener diffusion FHT model with measurement error. The true state of a process is also often randomly masked. The incorporation of measurement or masking errors in TR models, where these extensions are motivated by significant applications, would represent a useful research extension.

The identification of individual marker processes and the construction of composite marker processes to track or mimic a latent parent process are challenging subjects that need further theoretical work and more experience with real applications. The challenge will be especially great where the marker processes and latent processes are from different classes of processes. A related open research issue concerns the investigation of whether markers are leading, lagging or coincident with the parent process.

Nonparametric, semiparametric and other robust estimation methods seem to have much to contribute to the successful application of threshold regression. Quasi-likelihood methods and generalized estimating equations may offer feasible approaches. As threshold regression estimation in a general setting involves parameter estimation for the boundary set, the parent process and the running time scale, it is conceivable that a blend of nonparametric and parametric methods may be effective in some applications. For example, nonparametric estimation of running time parameters might be combined with parametric estimation of the parent process.

Our discussion of the analysis of longitudinal data in the context of threshold regression has already pointed out that a full theoretical development and justification of the uncoupling method remains an open research issue. In the same vein, practical experience with this method or other methods for handling longitudinal data in threshold regression would be a valuable contribution.

Much remains to be done on model validation and diagnostic techniques in the context of threshold regression. These tools are likely to be developed as threshold regression is applied in a broader range of practical cases. The earlier work on model validation has been largely restricted to the Wiener FHT model and thus extensions to other FHT models need attention. For example, comparisons of KM survival curves with fitted TR survival curves, both defined on running time scales, will require new methods that take account of the fact that the running time scale is itself fitted by a statistical model. As another example, TR models assume that particular functions link the model parameters to the regression covariates. Both the forms of the link functions and the adequacy of the regression functions must be validated. Whether the correct directing process has been chosen is also a feature that must be checked by model validation techniques. Although model validity is likely to be established by standard techniques (such as cross-validation), new techniques and modifications of conventional methods will surely be needed. In addition to using statistical methods for model verification, it is desirable to work closely with subject-matter specialists to ensure that the FHT models have realistic features and that the findings emerging from the analysis make practical sense.

The last sentence of the preceding paragraph hints at the largest open research question. Threshold regression will prove itself through beneficial practical application. With exploration of fresh application areas will come ideas for better methods and models for this new type of regression approach.

ACKNOWLEDGMENTS

This research was supported in part by NIH Grants OH008649 and HL40619 (Lee) and by a research grant from the Natural Sciences and Engineering Research Council of Canada (Whitmore).

REFERENCES

Aalen, O. O. and Gjessing, H. K. (2001). Understanding the shape of the hazard rate: A process point of view. Statist. Sci. 16 1–22. MR1838599

Aalen, O. O. and Gjessing, H. K. (2004). Survival models based on the Ornstein–Uhlenbeck process. Lifetime Data Anal. 10 407–423. MR2125423

Bagdonavičius, V. and Nikulin, M. (2001). Estimation in degradation models with explanatory variables. Lifetime Data Anal. 7 85–103. MR1819926

Carey, M. B. and Koenig, R. H. (1991). Reliability assessment based on accelerated degradation: A case study. IEEE Transactions on Reliability 40 499–506.

Cox, D. R. and Oakes, D. (1984). Analysis of Survival Data. Chapman and Hall, London. MR0751780

Crowder, M. J. (2001). Classical Competing Risks. Chapman and Hall/CRC, Boca Raton, FL.

Doksum, K. and Høyland, A. (1992). Models for variable-stress accelerated life testing experiments based on Wiener processes and the inverse Gaussian distribution. Technometrics 34 74–82.

Doksum, K. A. and Normand, S.-L. (1995). Gaussian models for degradation processes. I. Methods for the analysis of biomarker data. Lifetime Data Anal. 1 131–144. MR1353845

Duchesne, T. and Lawless, J. (2000). Alternative time scales and failure time models. Lifetime Data Anal. 6 157–179. MR1766199

Duchesne, T. and Rosenthal, J. S. (2003). On the collapsibility of lifetime regression models. Adv. in Appl. Probab. 35 755–772. MR1990613

Eaton, W. W. and Whitmore, G. A. (1977). Length of stay as a stochastic process: A general approach and application to hospitalization for schizophrenia. J. Math. Sociology 5 273–292.

Hazelton, W. D., Luebeck, E. G., Heidenreich, W. F. and Moolgavkar, S. H. (2001). Analysis of a historical cohort of Chinese tin miners with arsenic, radon, cigarette smoke, and pipe smoke exposures using the biologically based two-stage clonal expansion model. Radiation Research 156 78–94.

Horrocks, J. C. and Thompson, M. E. (2004). Modelling event times with multiple outcomes using the Wiener process with drift. Lifetime Data Anal. 10 29–49. MR2058573

Hougaard, P., Lee, M.-L. T. and Whitmore, G. A. (1997). Analysis of overdispersed count data by mixtures of Poisson variables and Poisson processes. Biometrics 53 1225–1238. MR1614370

Kalbfleisch, J. D. and Prentice, R. L. (1980). The Statistical Analysis of Failure Time Data. Wiley, New York. MR0570114

Kalbfleisch, J. D. and Prentice, R. L. (2002). The Statistical Analysis of Failure Time Data, 2nd ed. Wiley, New York. MR1924807

Kordonsky, K. B. and Gertsbakh, I. (1997). Multiple time scales and the lifetime coefficient of variation: Engineering applications. Lifetime Data Anal. 3 139–156.

Lancaster, T. (1972). A stochastic model for the duration of a strike. J. Roy. Statist. Soc. Ser. A 135 257–271.

Lawless, J. F. (2003). Statistical Models and Methods for Lifetime Data, 2nd ed. Wiley, New York. MR1940115

Lawless, J. and Crowder, M. (2004). Covariates and random effects in a gamma process model with application to degradation and failure. Lifetime Data Anal. 10 213–227. MR2086957

Lee, M.-L. T., DeGruttola, V. and Schoenfeld, D. (2000). A model for markers and latent health status. J. R. Stat. Soc. Ser. B Stat. Methodol. 62 747–762. MR1796289

Lee, M.-L. T. and Whitmore, G. A. (1993). Stochastic processes directed by randomized time. J. Appl. Probab. 30 302–314. MR1212663

Lee, M.-L. T. and Whitmore, G. A. (2002). Assumptions of a latent survival model. In Goodness-of-Fit Tests and Model Validity (C. Huber-Carol, N. Balakrishnan, M. S. Nikulin and M. Mesbah, eds.) 227–235. Birkhäuser, Boston. MR1901838

Lee, M.-L. T. and Whitmore, G. A. (2004). First hitting time models for lifetime data. In Advances in Survival Analysis (C. R. Rao and N. Balakrishnan, eds.) 537–543. North-Holland, Amsterdam. MR2065787

Lee, M.-L. T., Whitmore, G. A., Laden, F., Hart, J. E. and Garshick, E. (2004). Assessing lung cancer risk in rail workers using a first hitting time regression model. Environmetrics 15 501–512.

Lu, C. J. and Meeker, W. Q. (1993). Using degradation measures to estimate a time-to-failure distribution. Technometrics 35 161–174. MR1225093

Lu, J. (1995). A reliability model based on degradation and lifetime data. Ph.D. dissertation, McGill Univ.

Luebeck, E. G., Heidenreich, W. F., Hazelton, W. D., Paretzke, H. G. and Moolgavkar, S. H. (1999). Biologically based analysis of the data for the Colorado uranium miners cohort: Age, dose and dose-rate effects. Radiation Research 152 339–351.

Moolgavkar, S. H., Luebeck, E. G. and Anderson, E. L. (1998). Estimation of unit risk for coke oven emissions. Risk Analysis 18 813–825.

Oakes, D. (1995). Multiple time scales in survival analysis. Lifetime Data Anal. 1 7–18.

Onar, A. and Padgett, W. J. (2000). Inverse Gaussian accelerated test models based on cumulative damage. J. Stat. Comput. Simul. 66 233–247. MR1807537

Padgett, W. J. and Tomlinson, M. A. (2004). Inference from accelerated degradation and failure data based on Gaussian process models. Lifetime Data Anal. 10 191–206. MR2081721

Park, C. and Padgett, W. J. (2005). Accelerated degradation models for failure based on geometric Brownian motion and gamma processes. Lifetime Data Anal. 11 511–527. MR2213502

Pettit, L. I. and Young, K. D. S. (1999). Bayesian analysis for inverse Gaussian lifetime data with measures of degradation. J. Stat. Comput. Simul. 63 217–234. MR1703821

Ricciardi, L. M. and Sato, S. (1988). First-passage-time density and moments of the Ornstein–Uhlenbeck process. J. Appl. Probab. 25 43–57. MR0929503

Shubina, M. (2005a). Bayesian analysis for markers and degradation. Ph.D. dissertation, Harvard School of Public Health.

Shubina, M. (2005b). Threshold models with markers measured before observed event times. Ph.D. dissertation, Harvard School of Public Health.

Singpurwalla, N. D. (1995). Survival in dynamic environments. Statist. Sci. 1 86–103.

Whitmore, G. A. (1975). The inverse Gaussian distribution as a model of hospital stay. Health Services Research 10 297–302.

Whitmore, G. A. (1979). An inverse Gaussian model for labour turnover. J. Roy. Statist. Soc. Ser. A 142 468–478.

Whitmore, G. A. (1983). A regression method for censored inverse-Gaussian data. Canad. J. Statist. 11 305–315. MR0732859

Whitmore, G. A. (1986). First-passage-time models for duration data: Regression structures and competing risks. The Statistician 35 207–219.

Whitmore, G. A. (1995). Estimating degradation by a Wiener diffusion process subject to measurement error. Lifetime Data Anal. 1 307–319.

Whitmore, G. A., Crowder, M. J. and Lawless, J. F. (1998). Failure inference from a marker process based on a bivariate Wiener model. Lifetime Data Anal. 4 229–251.

Whitmore, G. A. and Neufeldt, A. H. (1970). An application of statistical models in mental health research. Bull. Math. Biophys. 32 563–579.

Whitmore, G. A. and Schenkelberg, F. (1997). Modelling accelerated degradation data using Wiener diffusion with a time scale transformation. Lifetime Data Anal. 3 27–45.