Median Survival Time: Median Survival Time 푚 Is Defined As the Quantity 푚 Satisfying 푆 푚 = 0.5

AMS 588 – Failure and Survival Data Analysis Instructor: Prof. Wei Zhu 1. Introduction Some Basics • Binomial distribution The Binomial distribution is based on the idea of a Bernoulli trial, which is an experiment with two, and only two, possible outcomes. 푌1,…푌푛 iid Bernoulli(p), 푝(푌 = 1) = 휋 = 1 − p(푌 = 0) 푌 = 1 is often termed as a “success” and 휋 is referred to as the success probability. 퐸 푌 = 휋; 푉푎푟 푌 = 휋(1 − 휋) The total number of “success” in 푛 trails follows a Binomial distribution: trials pmf: for i=0,1,…,n Binomial Theorem: For any real numbers푛and 푦 and integer 푛 ≥ 0, 푛 푛 (푥 + 푦)푛= ෍ 푥푦푛− =0 Some Basics • Hypergeometric distribution Suppose we have a large urn filled with 푁 balls that are identical in every way except that 푀 are red and N − M are green. We reach in, blindfolded, and select 퐾 balls at random. The number of red balls 푋 in a sample of size 퐾 follows a Hypergeometric 푀 푁−푀 distribution: 푥 퐾−푥 푃 푋 = 푥 푁, 푀, 퐾 = 푁 , 푥 = 0,1, … , 퐾 퐾푀 퐾 E X = μ = ; 푁 퐾푀 (푁 − 푀)(푁 − 퐾) 푉푎푟(푋) = 휎2 = 푁 푁(푁 − 1) Example: Let 푋1 and 푋2 be independent observations with 푋1~푏푛표푚푎푙 푛1, 푝1 and 푋2~푏푛표푚푎푙 푛2, 푝2 . Consider testing 퐻0: 푝1 = 푝2 푣푠 퐻1: 푝1 > 푝2. It is easy to show that under 퐻0: 푝1 = 푝2 = 푝, 푋 = 푋1 + 푋2is a sufficient statistics for 푝, and 푋1|푋 = 푥~ℎ푦푝푒푟푔푒표푚푒푡푟푐(푛1 + 푛2, 푛1, 푥) -- Fisher’s exact test The data can also be formulated into a 2x2 table Under 퐻0: 푝1 = 푝2 x1 n1-x1 n1 푛1 푛2 푥 푥−푥 x2 n2-x2 n2 1 1 (hypergeometric) 푓 푥1 푛1, 푛2, 푥 = 푛 x n-x n 푥 Some Basics • Iterative expectation & variance If 푋 and 푌 are any two random variables, then 퐸푋 = 퐸 퐸(푋|푌) 푉푎푟 푋 = 퐸 푉푎푟 푋 푌 + 푉푎푟 퐸 푋|푌 Provided that the expectations exist Example: 푋|푃~binomial(푛, 푃) and 푃~beta(α, β) α 퐸푋 = 퐸 퐸(푋|푃) = 퐸 푛푃 = 푛 α + β 푉푎푟 푋 = 퐸 푉푎푟 푋 푃 + 푉푎푟 퐸 푋|푃 = 퐸 푛푃(1 − 푃) + 푉푎푟 푛푃 αβ αβ = 푛 + 푛2 α + β α + β + 1 α + β 2 α + β + 1 αβ α + β + 푛 = 푛 α + β 2 α + β + 1 Some Basics Central Limit Theorem: n Delta method: Survival Analysis In many biomedical applications the primary endpoint of interest is time to a certain event, such as time to death; time it takes for a patient to respond to a therapy; time from response until disease relapse (i.e., disease returns); time to eradicate an infection after treatment with antibiotics; etc. We may be interested in • characterizing the distribution of “time to event” for a given population ; • comparing this “time to event” among different groups (e.g., treatment vs. control in a clinical trial or an observational study); • modeling the relationship of “time to event” to other covariates (sometimes called prognostic factors or predictors). Difficulties: • Censoring: Since the data are collected over a finite period of time, the “time to event” may not be observed for all the individuals in our study population (sample). This results in what is called censored data. • Differential follow-up: It is also common that the amount of follow-up for the individuals in a sample vary from subject to subject. The standard statistical methods cannot properly handle these difficulties in the analysis of the “time to event” data. A new research area in statistics has emerged which is called Survival Analysis or Censored Survival Analysis. “Time To Event” Let the random variable T denote time to the event of our interest. Of course, T is a positive random variable which has to be unambiguously defined; that is, we must be very specific about the start and end with the length of the time period in-between corresponding to T. EX: Survival time of a treatment for a population with certain disease: measured from the time of treatment initiation until death. cdf of T: 퐹 푡 = 푃 푇 ≤ 푡 , 푡 ≥ 0 Right continuous: lim 퐹(푢) = 퐹 푡 푢→푡+ When 푇 is a survival time, 퐹 푡 is the probability that a randomly selected subject from the population will die before time 푡. 푑퐹 푡 푡 pdf of T: 푓 푡 = , 퐹 푡 = ׬ 푓 푢 푑푢 푑푡 0 In biomedical applications, it is often common to use the survival function 푆 푡 = 푃 푇 ≥ 푡 = 1 − 퐹(푡−) 푤ℎ푒푟푒, 퐹 푡− = lim 퐹(푢) 푢→푡− When T is a survival time, S(t) is the probability that a randomly selected individual will survive to time t or beyond. Note: Sometimes, a survival function may be defined as 푆 푡 = 푃 푇 > 푡 = 1 − 퐹 푡 . This definition will be identical to the above one if T is a continuous random variable, which is the case we will focus on in this course. 푆 푡 • Non-increasing; • 푆 0 = 1 and 푆 ∞ = 0 for a proper random variable, which means that everyone will eventually experience the event, e.g. death. • However, we may also allow the possibility that 푆 ∞ > 0. This corresponds to a situation where there is a positive probability of not “dying” or not experiencing the event. For example, if the event of interest is the time from response until disease relapse and the disease has a cure for some proportion of individuals in the population, then we have 푆 ∞ > 0, where 푆 ∞ corresponds to the proportion of cured individuals. • If 푇 is continuous r.v., ∞ 푑푆 푡 푆 푡 = න 푓 푢 푑푢 , 푓 푡 = − 푡 푑푡 That is, there is a one-to-one correspondence between 푓 푡 and 푆 푡 . Here f(t) is the usual probability density function (pdf), where f(t) = dF(t)/dt Definitions Mean Survival Time: μ = 퐸(푇). Due to censoring, sample mean of observed survival times is no longer an unbiased estimate of μ = 퐸(푇). If we can estimate 푆 푡 well, then we can estimate μ = 퐸(푇) using the following fact: ∞ ∞ ∞ 퐸 푇 = න 푡푑퐹 푡 = − න 푡푑푆 푡 = න 푆 푡 푑푡 0 0 0 Median Survival Time: Median survival time 푚 is defined as the quantity 푚 satisfying 푆 푚 = 0.5. Sometimes denoted by 푡0.5. If 푆 푡 is not strictly decreasing, 푚 is the smallest one such that 푆 푚 ≤ 0.5. pth quantile of Survival Time (100pth percentile): 푡푝 such that 푆 푡푝 = 1 − 푝 0 < 푝 < 1 . If 푆 푡 is not strictly decreasing, 푡푝 is the smallest one such that 푆 푡푝 ≤ 1 − 푝 Mean Residual Life Time(푚푟푙): 푚푟푙 푡0 = 퐸[푇 − 푡0|푇 ≥ 푡0] i.e., 푚푟푙 푡0 = average remaining survival time given the population has survived beyond 푡0. It can be shown that ∞ ׬ 푆 푡 푑푡 푡0 푚푟푙 푡0 = 푆 푡0 EX: Figure 1.1: The survival function for a hypothetical population In this population: • 70% of the individuals will survive at least 2 years (i.e., 푡0.3 = 2) and • the median survival time is 2.8 years (i.e., 50% of the population will survive at least 2.8 years). is stochastically larger than Figure 1.2: We say that the survival distribution for group 1 is stochastically larger than the survival distribution for group 2 if 푆1 푡 ≥ 푆2 푡 , for all 푡 ≥ 0, where 푆 푡 is the survival function for group . If 푇 is the corresponding survival time for groups , we also say that 푇1 is stochastically (not deterministically) larger than 푇2. Note that 푇1 being stochastically larger than 푇1 does NOT necessarily imply that 푇1 ≥ 푇2. Mortality Rate The mortality rate at time 푡, where 푡 is generally taken to be an integer in terms of some unit of time (e.g., years, months, days, etc), is the proportion of the population who fail (die) between times 푡 and 푡 + 1among individuals alive at time 푡, , i.e., 푚 푡 = 푃[푡 ≤ 푇 < 푡 + 1|푇 ≥ 푡] Figure 1.3: A typical mortality pattern for human Hazard Rate The hazard rate is the limit of a mortality rate if the interval of time is taken to be small (rather than one unit). The hazard rate is the instantaneous rate of failure (experiencing the event) at time 푡 given that an individual is alive at time 푡. 푃[푡 ≤ 푇 < 푡 + ℎ|푇 ≥ 푡] 휆 푡 = lim ℎ→0 ℎ 푃[푡 ≤ 푇 < 푡 + ℎ] lim ℎ = ℎ→0 푃[푇 ≥ 푡] 푓(푡) 푆′ 푡 푑푙표푔{푆 푡 } = = − = − 푆(푡) 푆 푡 푑푡 • A useful way of describing the distribution of “time to event” because it has a natural interpretation that relates to the aging of a population; • Very popular in biomedical community. • If ℎ is very small, we have 푃 푡 ≤ 푇 < 푡 + ℎ 푇 ≥ 푡 ≈휆 푡 ℎ 푡 Λ 푡 = ׬0 휆 푢 푑푢 = −푙표푔{푆 푡 }, where Λ 푡 is referred to as the cumulative • hazard function. Here we used the fact that S(0) = 1. Hence, 푡 ׬ 휆 푢 푑푢 − 푆 푡 = 푒−Λ 푡 = 푒 0 Note: 1. There is a one-to-one relationship between hazard rate 휆 푡 , 푡 ≥ 0, and survival function 푆 푡 , namely, 푡 { ׬ 휆 푢 푑푢 푑푙표푔{푆 푡 − 푆 푡 = 푒 0 , 휆 푡 = − 푑푡 2. The hazard rate is NOT a probability, but a probability rate. Therefore it is possible that a hazard rate can exceed one in the same fashion as a density function푓 푡 may exceed one.

Median Survival Time: Median Survival Time 푚 Is Defined As the Quantity 푚 Satisfying 푆 푚 = 0.5

Summary Notes for Survival Analysis

SOLUTION for HOMEWORK 1, STAT 3372 Welcome to Your First

Study of Generalized Lomax Distribution and Change Point Problem

Review of Multivariate Survival Data

Survival and Reliability Analysis with an Epsilon-Positive Family of Distributions with Applications

Bp, REGRESSION on MEDIAN RESIDUAL LIFE FUNCTION FOR

Log-Epsilon-S Ew Normal Distribution and Applications

Introduction to Survival Analysis in Practice

Estimating the Survival Function

University of California Santa Cruz Bayesian Inference

Modeling Absolute Differences in Life Expectancy with a Censored Skew-Normal Regression Approach

LIFE CONGINGENCY MODELS I Contents 1 Survival Models