
AMS 588 – Failure and Survival Data Analysis Instructor: Prof. Wei Zhu 1. Introduction Some Basics • Binomial distribution The Binomial distribution is based on the idea of a Bernoulli trial, which is an experiment with two, and only two, possible outcomes. 푌1,…푌푛 iid Bernoulli(p), 푝(푌 = 1) = 휋 = 1 − p(푌 = 0) 푌 = 1 is often termed as a “success” and 휋 is referred to as the success probability. 퐸 푌 = 휋; 푉푎푟 푌 = 휋(1 − 휋) The total number of “success” in 푛 trails follows a Binomial distribution: trials pmf: for i=0,1,…,n Binomial Theorem: For any real numbers푛and 푦 and integer 푛 ≥ 0, 푛 푛 (푥 + 푦)푛= 푥푦푛− =0 Some Basics • Hypergeometric distribution Suppose we have a large urn filled with 푁 balls that are identical in every way except that 푀 are red and N − M are green. We reach in, blindfolded, and select 퐾 balls at random. The number of red balls 푋 in a sample of size 퐾 follows a Hypergeometric 푀 푁−푀 distribution: 푥 퐾−푥 푃 푋 = 푥 푁, 푀, 퐾 = 푁 , 푥 = 0,1, … , 퐾 퐾푀 퐾 E X = μ = ; 푁 퐾푀 (푁 − 푀)(푁 − 퐾) 푉푎푟(푋) = 휎2 = 푁 푁(푁 − 1) Example: Let 푋1 and 푋2 be independent observations with 푋1~푏푛표푚푎푙 푛1, 푝1 and 푋2~푏푛표푚푎푙 푛2, 푝2 . Consider testing 퐻0: 푝1 = 푝2 푣푠 퐻1: 푝1 > 푝2. It is easy to show that under 퐻0: 푝1 = 푝2 = 푝, 푋 = 푋1 + 푋2is a sufficient statistics for 푝, and 푋1|푋 = 푥~ℎ푦푝푒푟푔푒표푚푒푡푟푐(푛1 + 푛2, 푛1, 푥) -- Fisher’s exact test The data can also be formulated into a 2x2 table Under 퐻0: 푝1 = 푝2 x1 n1-x1 n1 푛1 푛2 푥 푥−푥 x2 n2-x2 n2 1 1 (hypergeometric) 푓 푥1 푛1, 푛2, 푥 = 푛 x n-x n 푥 Some Basics • Iterative expectation & variance If 푋 and 푌 are any two random variables, then 퐸푋 = 퐸 퐸(푋|푌) 푉푎푟 푋 = 퐸 푉푎푟 푋 푌 + 푉푎푟 퐸 푋|푌 Provided that the expectations exist Example: 푋|푃~binomial(푛, 푃) and 푃~beta(α, β) α 퐸푋 = 퐸 퐸(푋|푃) = 퐸 푛푃 = 푛 α + β 푉푎푟 푋 = 퐸 푉푎푟 푋 푃 + 푉푎푟 퐸 푋|푃 = 퐸 푛푃(1 − 푃) + 푉푎푟 푛푃 αβ αβ = 푛 + 푛2 α + β α + β + 1 α + β 2 α + β + 1 αβ α + β + 푛 = 푛 α + β 2 α + β + 1 Some Basics Central Limit Theorem: n Delta method: Survival Analysis In many biomedical applications the primary endpoint of interest is time to a certain event, such as time to death; time it takes for a patient to respond to a therapy; time from response until disease relapse (i.e., disease returns); time to eradicate an infection after treatment with antibiotics; etc. We may be interested in • characterizing the distribution of “time to event” for a given population ; • comparing this “time to event” among different groups (e.g., treatment vs. control in a clinical trial or an observational study); • modeling the relationship of “time to event” to other covariates (sometimes called prognostic factors or predictors). Difficulties: • Censoring: Since the data are collected over a finite period of time, the “time to event” may not be observed for all the individuals in our study population (sample). This results in what is called censored data. • Differential follow-up: It is also common that the amount of follow-up for the individuals in a sample vary from subject to subject. The standard statistical methods cannot properly handle these difficulties in the analysis of the “time to event” data. A new research area in statistics has emerged which is called Survival Analysis or Censored Survival Analysis. “Time To Event” Let the random variable T denote time to the event of our interest. Of course, T is a positive random variable which has to be unambiguously defined; that is, we must be very specific about the start and end with the length of the time period in-between corresponding to T. EX: Survival time of a treatment for a population with certain disease: measured from the time of treatment initiation until death. cdf of T: 퐹 푡 = 푃 푇 ≤ 푡 , 푡 ≥ 0 Right continuous: lim 퐹(푢) = 퐹 푡 푢→푡+ When 푇 is a survival time, 퐹 푡 is the probability that a randomly selected subject from the population will die before time 푡. 푑퐹 푡 푡 pdf of T: 푓 푡 = , 퐹 푡 = 푓 푢 푑푢 푑푡 0 In biomedical applications, it is often common to use the survival function 푆 푡 = 푃 푇 ≥ 푡 = 1 − 퐹(푡−) 푤ℎ푒푟푒, 퐹 푡− = lim 퐹(푢) 푢→푡− When T is a survival time, S(t) is the probability that a randomly selected individual will survive to time t or beyond. Note: Sometimes, a survival function may be defined as 푆 푡 = 푃 푇 > 푡 = 1 − 퐹 푡 . This definition will be identical to the above one if T is a continuous random variable, which is the case we will focus on in this course. 푆 푡 • Non-increasing; • 푆 0 = 1 and 푆 ∞ = 0 for a proper random variable, which means that everyone will eventually experience the event, e.g. death. • However, we may also allow the possibility that 푆 ∞ > 0. This corresponds to a situation where there is a positive probability of not “dying” or not experiencing the event. For example, if the event of interest is the time from response until disease relapse and the disease has a cure for some proportion of individuals in the population, then we have 푆 ∞ > 0, where 푆 ∞ corresponds to the proportion of cured individuals. • If 푇 is continuous r.v., ∞ 푑푆 푡 푆 푡 = න 푓 푢 푑푢 , 푓 푡 = − 푡 푑푡 That is, there is a one-to-one correspondence between 푓 푡 and 푆 푡 . Here f(t) is the usual probability density function (pdf), where f(t) = dF(t)/dt Definitions Mean Survival Time: μ = 퐸(푇). Due to censoring, sample mean of observed survival times is no longer an unbiased estimate of μ = 퐸(푇). If we can estimate 푆 푡 well, then we can estimate μ = 퐸(푇) using the following fact: ∞ ∞ ∞ 퐸 푇 = න 푡푑퐹 푡 = − න 푡푑푆 푡 = න 푆 푡 푑푡 0 0 0 Median Survival Time: Median survival time 푚 is defined as the quantity 푚 satisfying 푆 푚 = 0.5. Sometimes denoted by 푡0.5. If 푆 푡 is not strictly decreasing, 푚 is the smallest one such that 푆 푚 ≤ 0.5. pth quantile of Survival Time (100pth percentile): 푡푝 such that 푆 푡푝 = 1 − 푝 0 < 푝 < 1 . If 푆 푡 is not strictly decreasing, 푡푝 is the smallest one such that 푆 푡푝 ≤ 1 − 푝 Mean Residual Life Time(푚푟푙): 푚푟푙 푡0 = 퐸[푇 − 푡0|푇 ≥ 푡0] i.e., 푚푟푙 푡0 = average remaining survival time given the population has survived beyond 푡0. It can be shown that ∞ 푆 푡 푑푡 푡0 푚푟푙 푡0 = 푆 푡0 EX: Figure 1.1: The survival function for a hypothetical population In this population: • 70% of the individuals will survive at least 2 years (i.e., 푡0.3 = 2) and • the median survival time is 2.8 years (i.e., 50% of the population will survive at least 2.8 years). is stochastically larger than Figure 1.2: We say that the survival distribution for group 1 is stochastically larger than the survival distribution for group 2 if 푆1 푡 ≥ 푆2 푡 , for all 푡 ≥ 0, where 푆 푡 is the survival function for group . If 푇 is the corresponding survival time for groups , we also say that 푇1 is stochastically (not deterministically) larger than 푇2. Note that 푇1 being stochastically larger than 푇1 does NOT necessarily imply that 푇1 ≥ 푇2. Mortality Rate The mortality rate at time 푡, where 푡 is generally taken to be an integer in terms of some unit of time (e.g., years, months, days, etc), is the proportion of the population who fail (die) between times 푡 and 푡 + 1among individuals alive at time 푡, , i.e., 푚 푡 = 푃[푡 ≤ 푇 < 푡 + 1|푇 ≥ 푡] Figure 1.3: A typical mortality pattern for human Hazard Rate The hazard rate is the limit of a mortality rate if the interval of time is taken to be small (rather than one unit). The hazard rate is the instantaneous rate of failure (experiencing the event) at time 푡 given that an individual is alive at time 푡. 푃[푡 ≤ 푇 < 푡 + ℎ|푇 ≥ 푡] 휆 푡 = lim ℎ→0 ℎ 푃[푡 ≤ 푇 < 푡 + ℎ] lim ℎ = ℎ→0 푃[푇 ≥ 푡] 푓(푡) 푆′ 푡 푑푙표푔{푆 푡 } = = − = − 푆(푡) 푆 푡 푑푡 • A useful way of describing the distribution of “time to event” because it has a natural interpretation that relates to the aging of a population; • Very popular in biomedical community. • If ℎ is very small, we have 푃 푡 ≤ 푇 < 푡 + ℎ 푇 ≥ 푡 ≈휆 푡 ℎ 푡 Λ 푡 = 0 휆 푢 푑푢 = −푙표푔{푆 푡 }, where Λ 푡 is referred to as the cumulative • hazard function. Here we used the fact that S(0) = 1. Hence, 푡 휆 푢 푑푢 − 푆 푡 = 푒−Λ 푡 = 푒 0 Note: 1. There is a one-to-one relationship between hazard rate 휆 푡 , 푡 ≥ 0, and survival function 푆 푡 , namely, 푡 { 휆 푢 푑푢 푑푙표푔{푆 푡 − 푆 푡 = 푒 0 , 휆 푡 = − 푑푡 2. The hazard rate is NOT a probability, but a probability rate. Therefore it is possible that a hazard rate can exceed one in the same fashion as a density function푓 푡 may exceed one.
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages19 Page
-
File Size-