<<

Maths and Help Centre

Survival analysis Survival relates to the time taken for an individual to reach a certain event. This event is not always and not everyone will have experienced the event by the end point of the study. that an individual has not experienced the event by the end of the study e.g. they withdrew from the study or died from an unrelated event.

Producing a Kaplan-Meier plot DATA USED IN THE EXAMPLE: Worcester Heart Attack Study data from Dr. Robert J. Goldberg of the Department of Cardiology at the University of Massachusetts Medical School.

DESCRIPTIVE ABSTRACT: The main goal of this study is to describe factors associated with trends over time in the and survival rates following hospital admission for acute myocardial infarction (MI). Data have been collected during thirteen 1-year periods beginning in 1975 and extending through 2001 on all MI patients admitted to hospitals in the Worcester, Massachusetts Standard Metropolitan Statistical Area.

A Kaplan-Meier plot displays survival (cumulative of an individual remaining alive/ free etc at any time after baseline). The cumulative survival probability is the product of the survival probabilities up to that point in time. n  d No. alive the day before - No. dying on day i Probability of survival on day i: i i  ni No. alive the day before

ni  di n1  d1 n2  d2 ni  di Survival until day i: S(ti )   S(ti1)   ..... ni n1 n2 ni

These probabilities are only calculated for days when the event (e.g. death) happens. When observations are censored, the probability of survival remains the same.

To produce a Kaplan-Meier plot in SPSS, select ANALYSE  SURVIVAL  KAPLAN-MEIER and select the following options:

Define event: Tell SPSS what number defines an event. Death is indicated by 1 for this study. Select the log rank test option to test for a difference in survival We are interested in the difference between the groups between males and females so enter gender into the factor box.

Survival analysis Maths and Statistics Help Centre

Summary statistics for the two groups:

The estimated time until death is 1907.42 days for males and 1475.21 days for females. In survival analysis the survival probabilities are usually reported at certain time points on the curve (e.g. 1 year and 5 year survival); otherwise the survival time (the time at which 50% of the subjects have reached the event) can be reported. The median time between admission for myocardial infarction and death is 2624 days for males compared to 1806 days for females. The next plot shows that the survival probability is lower for females at all time points so they are less likely to survive.

The log rank test The log-rank test tests the hypothesis that there is no difference in survival times between the groups studied at all time points in the study. The log rank test compares the observed and expected number of events for each group using the same test as the Chi-squared test although the calculations for the expected frequencies are different. The assumptions for the test are that the survival times are continuous or ordinal and that the ratio of the risk of the event happening in group 1 compared to the risk in group 2 remains constant (proportional assumption).

2 2 2 OA  EA  OB  EB  Test Statistic for comparing two groups A and B: logrank   EA EB

The calculation for the expected values is a bit tricky and time consuming:

dinAi EA   ni

di  no. of events at time ti , nAi  no. of people at risk at time i in group A and ni  total no. of people at risk Ask a tutor for help or just use SPSS to calculate the test statistic.

Compare the test statistic with the critical value from the Chi-squared table. The degrees of freedom are: 2 number of groups – 1, thus for a significance level of 5% and 1 df, the critical value is 0.05,1  3.84 . If the test statistic is larger than 3.84, reject the null hypothesis and conclude that there is significant evidence to suggest a difference in survival times for the two groups.

Survival analysis Maths and Statistics Help Centre

The p-value (sig) is the probability of getting a test statistic of at least 3.971 if there really is no difference in survival times for males and females. As the p-value (0.046) is less than 0.05, conclude that there is significant evidence of a difference in survival times for males and females. The estimated time until death is 1907.42 days for males and 1475.21 days for females, i.e. males have an increased chance of survival.

Cox’s regression Cox’s regression compares the hazards (as ratios) of the two treatment groups and allows several variables to be taken into account. The is the risk (probability) of reaching the endpoint (e.g. death) at time point i, given that the individual has not reached it up to that point.

ANALYSE  SURVIVAL  COX REGRESSION

For all categorical variables, select the ‘Categorical’ option. Tell SPSS whether you want to compare factor levels to the first or last category and select the ‘Simple’ option (although indicator appears to give the same main output). Click on change after each variable. Look at the help for info on other options but stick with ‘simple’ for now.

For gender, selecting the reference category as ‘first’ means that males are the reference category (as male = 0, female =1).

Requesting a hazard plot within the plots options, gives the following plot:

It is clear from the plot that the risk of dying increases with age. Survival analysis Maths and Statistics Help Centre

For standard multiple regression the model produced is of the form y  0  1x1  2 x2  ..... k xk For modelling the hazard function the model is  (t)   (t)exp x   x  ......   x  i 0 1 1 2 2 k k  (t)  hazard function at time point t for individual i, i 0 (t)  baseline hazard function (hazard function when all explanatory variables are set to 0)

There is a lot of output from SPSS but the following table probably contains all that is needed.

To look for significant effects, use the p-values in the ‘sig’ column. Compare the p-values to the standard significance level of 0.05. To understand the effects of individual predictors, look at Exp(B), which is the and can be interpreted as the predicted change in the hazard for a unit increase in the predictor. There is one assumption for Cox’s regression which is the proportional hazards assumption that the hazard ratio between two groups remains constant over time.

For gender, the p-value is 0.553 so there is no evidence of a greater risk of death following acute myocardial infarction in either sex. For age group, the reference category is < 60 which means that all the other categories are compared to the group containing patients under the age of 60. The next category is 60 – 69, then 70 – 79 and lastly 80+. There is no evidence of a difference between 60 – 69 year olds and under 60’s but there is a difference between the other two age groups and the under 60’s. For categorical variables, interpret the hazard ratio, Exp(B) directly from the output e.g. after controlling for gender and BMI, the hazard for 70 – 79 year olds is 2.475 times that of under 60s.

What’s a hazard ratio? h (t) risk of event in group A (e.g. death) HR  A  h (t) risk of event in group B B h (t) 70 79  2.475 h (t) For comparing the risk in the 70 – 79 year olds with the risk in the under 60’s, 60 A hazard ratio can be interpreted in a similar way to . It compares the risk of an event occurring in two groups. If the ratio is above 1, the risk of the event happening in group A is higher.

The effect of BMI is statistically significant. For each additional unit of BMI, the hazard decreases by (1 – 0.931)*100 = 6.9%. For an additional 5 units increase in BMI, the hazard decreases by (1 – 0.9315 )*100 = 30%