HST 190: Introduction to Biostatistics

Lecture 8: Analysis of time-to-event data

• Survival analysis is studied on its own because survival data have features that distinguish them from other types of data.
• Nonetheless, our tour of survival analysis will visit all of the topics we have learned so far:
  § Estimation
  § One-sample inference
  § Two-sample inference
  § Regression

Survival analysis

• Survival analysis is the area of statistics that deals with time-to-event data.
• Despite the name, "survival" analysis isn't only for analyzing time until death.
  § It deals with any situation where the quantity of interest is the amount of time until a study subject experiences some relevant endpoint.
• The terms "failure" and "event" are used interchangeably for the endpoint of interest.
• For example, say researchers record time from diagnosis to death (in months) for a sample of patients diagnosed with laryngeal cancer.
  § How would you summarize the survival of laryngeal cancer patients?

• A simple idea is to report the mean or median survival time.
  § However, the sample mean is not robust to outliers: a small increase in the mean does not show whether everyone lived a little longer or a few people lived a lot longer.
  § The median is also just a single summary measure.
• If considering your own prognosis, you'd want to know more!
• Another simple idea: report the survival rate at $t$ years.
  § How to choose the cutoff time $t$?
  § This is also just a single summary measure: very different survival curves can share the same 5-year rate.

• Summary measures such as means and proportions are highly useful for concisely describing data. However, they capture some features of the data but not all.
  § As we see, the survival curve itself is far more informative.
  § So, let's estimate that!
Patient   Months
1         1
2         2.5
3         3
4         3
5         5.5
6         6
7         7
8         7
9         9
10        13

Measuring survival time

• We measure time with the variable $t$. The beginning of the period we're studying (e.g., time of diagnosis) is $t = 0$.
  § $t$ can represent days, weeks, months, years, etc.
• Time is recorded relative to when a person enters the study, not according to the calendar.
  § In other words, a person reaches the point $t = 1$ after being in the study for 1 year (or month, etc.).
  § It doesn't matter how far along the others are.

• Person #1 enters the study in 2003.
• Person #2 enters the study in 2002.
• Both die in 2006.
• If $t$ is measured in years, person #1 dies at $t = 3$ and person #2 dies at $t = 4$.

[Figure: calendar timeline, 2002 to 2007, with an X marking the death of Person #1 and Person #2]

• The probability of an individual in our population surviving beyond a given time $t$ is called the survival function, written as $S(t)$.
• Here we want to make inference about a function, not just a single population value.
• It seems logical to estimate $S(t)$ by taking a sample and recording the proportion of the sample who are still alive after time $t$ has elapsed.
  § However, it is not as simple as that sounds.

Censored data

• Analyses of survival times often include censored data (a type of missingness).
• Valid inference in the presence of missing data is a topic of ongoing research in statistics.
• To do valid inference when some data are missing, we must make assumptions.
• Time-to-event data often have data missing in a particular way: individuals are lost to follow-up (or the study ends) before they experience the event of interest. This is called (right) censoring.
• Censored data provide partial information: you don't know how long a patient lived, but you know that she or he lived at least as long as the time before being lost to follow-up.

• Why would a person be lost to follow-up?
• The person could have…
  § moved to another city
  § withdrawn from the study
  § died of a different cause
  § or may still be in the study without an event at the time of the analysis
• To do inference in the setting of missing data, we must be willing to make a big assumption.
  § The assumption is that censoring is non-informative.
• In other words, assume that being lost to follow-up is unrelated to prognosis.
• If this assumption can't be made, inference becomes more complicated and requires strong assumptions.

• As an important counterexample, say researchers administer a new chemotherapy drug to 10 cancer patients to estimate survival time while on the drug.
• 5 patients can't tolerate the side effects and drop out of the study.
• If non-informative censoring were assumed, the drug would probably appear falsely impressive.
  § Those who dropped out were probably more ill; hence shorter survival times were disproportionately removed from the sample.

Estimating the survival curve

• The Kaplan-Meier estimator provides an estimate of $S(t)$ at all time points, even if some data are censored.
  § Also known as the product-limit estimator.
• Using the rules of probability, we'll see where this estimator comes from. Choose specific times $t_1, \dots, t_k$; then at $t_j$,

$$
\begin{aligned}
S(t_j) &= P(\text{alive at } t_j) \\
       &= P(\text{alive at } t_j \cap \text{alive at } t_{j-1}) \\
       &= P(\text{alive at } t_j \mid \text{alive at } t_{j-1}) \cdot P(\text{alive at } t_{j-1}) \\
       &= P(\text{alive at } t_j \mid \text{alive at } t_{j-1}) \cdot S(t_{j-1})
\end{aligned}
$$

  § Put simply, the probability of surviving to time $t_j$ is the probability of surviving to $t_{j-1}$ and then, given you made it that far, surviving to $t_j$.
  § What if we applied this trick repeatedly?

$$
\begin{aligned}
S(t_j) &= P(\text{alive at } t_j \mid \text{alive at } t_{j-1}) \cdot S(t_{j-1}) \\
       &= P(\text{alive at } t_j \mid \text{alive at } t_{j-1}) \cdot P(\text{alive at } t_{j-1} \mid \text{alive at } t_{j-2}) \cdot S(t_{j-2}) \\
       &= P(\text{alive at } t_j \mid \text{alive at } t_{j-1}) \cdot \ldots \cdot P(\text{alive at } t_2 \mid \text{alive at } t_1) \cdot S(t_1)
\end{aligned}
$$

• After writing the survival function as the product of these individual pieces, we can estimate it by estimating each piece individually.
• If there were no censoring, we could simply estimate $P(\text{alive at } t_j \mid \text{alive at } t_{j-1})$ by the sample quantity

$$
\frac{\#\,\text{alive at } t_j}{\#\,\text{alive at } t_{j-1}}
$$

• However, a patient who is alive but censored at time $t_{j-1}$ never really had a chance to make it to $t_j$.
  § That patient was not eligible to die during the interval from $t_{j-1}$ to $t_j$ and therefore shouldn't be counted when computing the survival rate in this interval.

• When there is censoring, we estimate $P(\text{alive at } t_j \mid \text{alive at } t_{j-1})$ using the sample quantity

$$
\frac{\#\,\text{alive at } t_j}{\#\,\text{alive at } t_{j-1} - \#\,\text{censored at } t_{j-1}}
$$

• The denominator counts those at risk for the event at time $t_j$.
  § It is exactly because of independent (non-informative) censoring that we can estimate the conditional probability as $P(\text{alive at } t_j \mid \text{alive at } t_{j-1} \text{ and uncensored at } t_j)$.

• Let's define the following:
  § $d_j$ = # died at time $t_j$
  § $c_j$ = # censored at time $t_j$
  § $s_j$ = # still alive and not censored at $t_j$
• Notice that $s_{j-1} = d_j + c_j + s_j$, so we can write the previous estimator as

$$
\frac{\#\,\text{alive at } t_j}{\#\,\text{alive at } t_{j-1} - \#\,\text{censored at } t_{j-1}}
= \frac{s_j + c_j}{s_{j-1} + c_{j-1} - c_{j-1}}
= \frac{s_{j-1} - d_j}{s_{j-1}}
= 1 - \frac{d_j}{s_{j-1}}
$$

• So, given a set of observed time points $t_1, \dots, t_k$, the Kaplan-Meier estimator of the survival probability at time $t_j$ is

$$
\hat{S}(t_j) = \left(1 - \frac{d_1}{s_0}\right) \cdot \left(1 - \frac{d_2}{s_1}\right) \cdot \ldots \cdot \left(1 - \frac{d_j}{s_{j-1}}\right)
$$

  § Two key features:
    1) The estimated curve jumps at event times only.
    2) The curve goes to zero if the last observed time is an event, not a censoring.
• Consider outcomes, at 2-year intervals, for 100 patients with some disease or other.
  § Estimate the survival function.

Year   Fail   Censored
2      7      2
4      16     5
6      19     8
8      14     7
10     11     4
12     5      2

  § $\hat{S}(2) = 1 - \frac{7}{100} = 0.930$
  § $\hat{S}(4) = \left(1 - \frac{7}{100}\right)\left(1 - \frac{16}{91}\right) = 0.766$
  § $\hat{S}(6) = \left(1 - \frac{7}{100}\right)\left(1 - \frac{16}{91}\right)\left(1 - \frac{19}{70}\right) = 0.558$ …

Year   Fail = d_j   Censored = c_j   Survive = s_j           Total = s_{j-1}
2      7            2                100 - (7 + 2) = 91      100
4      16           5                91 - (16 + 5) = 70      91
6      19           8                70 - (19 + 8) = 43      70
8      14           7
10     11           4
12     5            2

• Estimating $S(t)$ this way is also called the life-table method because it pre-specifies the time intervals.
• Using software, estimates of $S(t)$ are usually defined at each individual event time to get a smoother curve.

[Figure: Kaplan-Meier curves from Liu R et al. NEJM 2007 Jan 18; 356(3):217-226 (Fig. 2)]

[Figure: Kaplan-Meier curves from Brahmer J et al. NEJM 2015; 373(2):123-135]

Confidence intervals for KM estimator

• In addition to estimating $S(t)$ at any time point, we can also form a confidence interval for it.
• As with the odds ratio, we use the log-transformation for this (which improves the normal approximation):
  1) Take the logarithm of $\hat{S}(t)$
  2) Form a CI for $\ln S(t)$
  3) Convert back to a CI for $S(t)$

• At time $t_j$, the variance of $\ln \hat{S}(t_j)$ is

$$
\operatorname{Var}\left(\ln \hat{S}(t_j)\right) = \sum_{i=1}^{j} \frac{d_i}{s_{i-1}\,(s_{i-1} - d_i)}
$$

• Therefore, the $100(1 - \alpha)\%$ CI for $\ln S(t_j)$ is

$$
\ln \hat{S}(t_j) \pm z_{1-\alpha/2} \sqrt{\sum_{i=1}^{j} \frac{d_i}{s_{i-1}\,(s_{i-1} - d_i)}} = (a_1, a_2)
$$

• So the $100(1 - \alpha)\%$ CI for $S(t_j)$ is $(e^{a_1}, e^{a_2})$.

• Returning to our example, $\hat{S}(6) = 0.558 \Rightarrow \ln \hat{S}(6) = -0.583$.
• Then the variance of $\ln \hat{S}(6)$ is

$$
\sum_{i=1}^{3} \frac{d_i}{s_{i-1}\,(s_{i-1} - d_i)}
= \frac{d_1}{s_0 (s_0 - d_1)} + \frac{d_2}{s_1 (s_1 - d_2)} + \frac{d_3}{s_2 (s_2 - d_3)}
= \frac{7}{100\,(100 - 7)} + \frac{16}{91\,(91 - 16)} + \frac{19}{70\,(70 - 19)}
\approx 0.0084
$$

• Then the 95% CI ($\alpha = 0.05$, $z_{0.975} = 1.96$) for $\ln S(6)$ is $-0.583 \pm 1.96\sqrt{0.0084} = (-0.763, -0.403)$.
• Thus, the 95% CI for $S(6)$ is $(e^{-0.763}, e^{-0.403}) = (0.466, 0.668)$.

Log-rank test

• In addition to estimating survival, we may want to compare two groups' survival functions using a hypothesis test.
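Looking back at the Kaplan-Meier example for the 100 patients, the product formula and the log-transformed confidence interval can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `kaplan_meier`, the constant `ROWS`, and the choice to return `None` for the interval when the estimate reaches zero are not from the lecture, and the failures and censorings are grouped by 2-year interval exactly as in the life-table example.

```python
import math

# (year, failures d_j, censored c_j) from the worked example; 100 patients at t = 0.
ROWS = [(2, 7, 2), (4, 16, 5), (6, 19, 8), (8, 14, 7), (10, 11, 4), (12, 5, 2)]


def kaplan_meier(rows, n_start, z=1.96):
    """Life-table Kaplan-Meier estimate with a log-transformed CI at each time.

    z = 1.96 is the standard normal quantile for a 95% interval.
    """
    at_risk = n_start   # s_{j-1}: alive and uncensored entering the interval
    surv = 1.0          # running product of (1 - d_i / s_{i-1})
    var_log = 0.0       # running sum of d_i / (s_{i-1} * (s_{i-1} - d_i))
    results = []
    for year, died, censored in rows:
        if at_risk <= 0:
            break  # nobody left at risk; later rows cannot be estimated
        surv *= 1 - died / at_risk
        if died < at_risk:
            var_log += died / (at_risk * (at_risk - died))
            half_width = z * math.sqrt(var_log)
            # exp(ln S_hat -/+ half_width), i.e. the CI converted back from the log scale
            ci = (surv * math.exp(-half_width), surv * math.exp(half_width))
        else:
            ci = None  # S_hat = 0, so ln(S_hat) and its CI are undefined
        results.append({"year": year, "at_risk": at_risk, "surv": surv, "ci": ci})
        at_risk -= died + censored  # s_j = s_{j-1} - d_j - c_j
    return results


for r in kaplan_meier(ROWS, n_start=100):
    lo, hi = r["ci"]
    print(f'{r["year"]:>4} {r["at_risk"]:>5} {r["surv"]:.3f} ({lo:.3f}, {hi:.3f})')
```

The first three estimates reproduce the values 0.930, 0.766, and 0.558 from the example. The interval at year 6 agrees with the hand calculation up to rounding, since the code carries full precision while the worked example rounds intermediate quantities.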
