Get the Correct Hazard Ratio from SAS® PROC PHREG Procedure
Betty Ying Wang, Genentech Inc., South San Francisco, CA

ABSTRACT

The Cox proportional hazards model is commonly used to estimate a hazard ratio that compares the survival times of two population groups. The exponentiated linear part of the model describes the effects of the explanatory variables on the hazard ratio.

PROC PHREG is a SAS procedure that implements the Cox model and provides the hazard ratio estimate. The estimate is interpreted as the ratio of the hazards of two population groups whose values of a given explanatory variable differ by one unit, with all other explanatory variables held at fixed values. When the difference in the given explanatory variable is not one unit, the hazard ratio estimate should be interpreted with caution.

This paper illustrates this issue with an example in which a multi-level treatment group variable is the explanatory variable. It provides three options (with sample code) to obtain the correct hazard ratio when the difference in the explanatory variable is not equal to one unit:

1. Computing the hazard ratio from the regression coefficient estimates in the PROC PHREG output.
2. Recoding the values of the explanatory variable so that the difference is equal to one unit.
3. Using the CLASS statement to specify the explanatory variable in the experimental PROC TPHREG procedure.

This paper is not limited to any particular operating system. It is intended for users with basic knowledge of the SAS DATA step.

INTRODUCTION

In survival analysis, the hazard function is a useful way to describe the distribution of survival times. The hazard ratio is the ratio of the hazard functions of two population groups. If the hazard ratio is less than one, the hazard for the group in the numerator is smaller than the hazard for the group in the denominator.

The Cox proportional hazards model (Cox model) is a commonly used semi-parametric model for computing hazard ratio estimates. It is a general proportional hazards model that does not require specifying the underlying survival function. The model assumes a parametric form (an exponentiated linear regression form) for the effects of the explanatory variables and leaves the baseline hazard, and hence the underlying survival function, unspecified and non-parametric. Thus, the Cox model can estimate the hazard ratio without knowing the underlying survival function.

With the parametric assumption that the relationship between the explanatory variables and the log hazard is linear, and the further assumption that the effects of the explanatory variables are the same at all values of time (that is, the explanatory variables are not time dependent), a typical form of the Cox model is (Note: equations in this section are from Survival Analysis Using the Proportional Hazards Model, Instructor-based Training Course Notes. 2005. Cary, NC: SAS Institute Inc.):

$$
h_i(t) = h_0(t)\, e^{\beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik}}
$$

where $h_0(t)$ is the baseline hazard function and the exponent is a linear function of the explanatory variables.

Taking the logarithm of both sides of the above equation, the model becomes

$$
\log h_i(t) = \log h_0(t) + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik}
$$

The hazard ratio between group A and group B is:

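$$
\text{hazard ratio} = \frac{\text{hazard in group B}}{\text{hazard in group A}}
= \frac{h_B(t)}{h_A(t)}
= \frac{h_0(t)\, e^{\beta_1 X_{B1} + \beta_2 X_{B2} + \cdots + \beta_k X_{Bk}}}{h_0(t)\, e^{\beta_1 X_{A1} + \beta_2 X_{A2} + \cdots + \beta_k X_{Ak}}}
= e^{\beta_1 (X_{B1} - X_{A1}) + \beta_2 (X_{B2} - X_{A2}) + \cdots + \beta_k (X_{Bk} - X_{Ak})}
$$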

The linear regression coefficients ($\beta_1, \beta_2, \ldots, \beta_k$) estimated by the model measure the effects of the explanatory variables ($X_1, X_2, \ldots, X_k$) on the hazard ratio. The exponentiated regression coefficient $e^{\beta_j}$ is interpreted as the ratio of the hazard in group B to the hazard in group A when $X_j$ in group B is one unit larger than in group A (holding all other explanatory variables constant).

PROC PHREG is a SAS procedure that implements the Cox model and computes the hazard ratio estimate. For continuous explanatory variables, the interpretation of the hazard ratio is straightforward. When the explanatory variable is categorical, coded with numeric category values, and the two category values being compared do not differ by one unit, the hazard ratio estimate should be interpreted with caution.

In this paper, we illustrate this issue with a very simple Cox model that has only one categorical explanatory variable. Time-dependent variables and non-proportional hazards are not considered in this paper.

The hazard ratio between group A and group B with only one explanatory variable becomes:

$$
\text{hazard ratio} = \frac{\text{hazard in group B}}{\text{hazard in group A}} = e^{\hat\beta (X_B - X_A)} \qquad (1)
$$

When the increase in the explanatory variable is one unit ($X_B - X_A = 1$), the hazard ratio is the exponential of the regression coefficient, $e^{\hat\beta}$.
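Formula (1) also shows how the printed value must be adjusted when the difference is not one unit. Writing $\Delta = X_B - X_A$ (a symbol introduced here only for convenience), a short derivation:

$$
\text{hazard ratio} = e^{\hat\beta \Delta} = \left( e^{\hat\beta} \right)^{\Delta}
$$

So the value reported in the Hazard Ratio column, $e^{\hat\beta}$, has to be squared when $\Delta = 2$ and inverted when $\Delta = -1$.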

DISCUSSION

We use the following data set (SURV) for our illustration. The data set SURV contains the variable SURVTIME (the survival time in months), the variable SURVCEN (the censoring indicator: 0 if not censored and 1 if censored), and the categorical explanatory variable TRT (the treatment group indicator). We use $X_A$, $X_B$, and $X_C$ to denote the value of X (TRT) in treatment groups A (trt=0), B (trt=1), and C (trt=2), respectively.

The objectives are:
1. To estimate the hazard ratio of death between treatment group A (trt=0) and treatment group B (trt=1).
2. To estimate the hazard ratio of death between treatment group A (trt=0) and treatment group C (trt=2).

Output 1. Data set SURV, partial printout

Obs   Survcen   Survtime    Patid   Trt
  1         0      60.63   900001     1
  2         0      49.55   900002     2
  3         1      53.64   900003     0
  4         0      65.14   900004     2
  5         0      57.63   900005     2
  6         0      64.37   900006     0
  7         0      62.51   900007     0
  8         0      57.37   900008     0
  9         0      63.22   900009     2
 10         1      75.62   900010     1
 11         0      54.06   900011     2
 12         0      60.38   900012     2
 13         0      63.25   900013     1
 14         1      71.17   900014     1
more data lines ------

WHEN THE INCREASE IN THE CATEGORY VARIABLE (XB-XA) EQUALS 1 UNIT

First, we are interested in estimating the hazard ratio of death between treatment group A and treatment group B (trt=0 vs. trt=1). Specify the following statements in SAS:

proc phreg data=surv(where=(trt in (0,1)));
   model survtime*survcen(1)=trt;
run;   /* (2) */

The partial SAS output with the estimates for β and the hazard ratio is:

Output 2. trt=0 vs. trt=1, partial printout from PROC PHREG

Analysis of Maximum Likelihood Estimates

                 Parameter   Standard                             Hazard
Variable   DF     Estimate      Error   Chi-Square   Pr > ChiSq    Ratio
trt         1     -0.59851    0.09400      40.5370       <.0001    0.550

The estimate of $\beta$ is -0.59851 and the hazard ratio estimate is 0.550, which is simply $e^{\hat\beta} = e^{-0.59851}$. SAS reports this value under the assumption that the explanatory variable X increases by one unit. Applying formula (1):

$$
\text{hazard ratio} = e^{\hat\beta (X_B - X_A)} = e^{\hat\beta (1 - 0)} = e^{\hat\beta} = e^{-0.59851} = 0.550
$$

The hazard ratio in the above SAS output means that the hazard of death in treatment group B is 0.550 times the hazard of death in treatment group A.

WHEN THE INCREASE IN THE CATEGORY VARIABLE (XC-XA) DOES NOT EQUAL 1 UNIT

Next, we are interested in estimating the hazard ratio of death between treatment group A and treatment group C (trt=0 vs. trt=2). Replace (2) with the corresponding treatment groups:

proc phreg data=surv(where=(trt in (0,2)));
   model survtime*survcen(1)=trt;
run;   /* (3) */

The partial SAS output with the estimates for β and hazard ratio is:

Output 3. trt=0 vs. trt=2, partial printout from PROC PHREG

Analysis of Maximum Likelihood Estimates

                 Parameter   Standard                             Hazard
Variable   DF     Estimate      Error   Chi-Square   Pr > ChiSq    Ratio
trt         1      0.40119    0.05008      64.1866       <.0001    1.494

It would be incorrect to interpret the hazard ratio in the above SAS output as meaning that the hazard of death in treatment group C is 1.494 ($e^{0.40119}$) times the hazard of death in treatment group A. The printed value assumes that the explanatory variable X increases by one unit, which is not true here: X increases by two units ($X_C - X_A = 2 - 0 = 2$).

There are three options to obtain the correct hazard ratio estimate in this scenario.


Option 1: Computing from regression coefficient estimates of PROC PHREG output

The correct hazard ratio can be computed from the regression coefficient estimate in the same PROC PHREG output (Output 3) by applying formula (1) with $X_C - X_A = 2$ instead of 1:

$$
\text{hazard ratio} = e^{\hat\beta (X_C - X_A)} = e^{\hat\beta (2 - 0)} = e^{0.40119 \times 2} = 2.231
$$

So the hazard of death in treatment group C is 2.231 times the hazard of death in treatment group A.
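Option 1 is simple arithmetic, so it can be left to SAS rather than done by hand. The DATA step below is a minimal sketch: the coefficient estimates are typed in from Output 2 and Output 3, and the data set and variable names (HR_FORMULA1, DELTA, HR_PRINTED, HR_CORRECT) are hypothetical, chosen only for illustration.

```sas
/* Minimal sketch of Option 1: apply formula (1), exp(beta_hat*(X_cmp - X_ref)).    */
/* Estimates are copied by hand from Output 2 and Output 3; names are hypothetical. */
data hr_formula1;
   length comparison $ 8;
   input comparison $ beta_hat x_ref x_cmp;
   delta      = x_cmp - x_ref;          /* coded difference in TRT between the groups */
   hr_printed = exp(beta_hat);          /* what PROC PHREG prints: one-unit ratio     */
   hr_correct = exp(beta_hat*delta);    /* formula (1): hazard(cmp) / hazard(ref)     */
   datalines;
B_vs_A -0.59851 0 1
C_vs_A 0.40119 0 2
;
run;

proc print data=hr_formula1 noobs;
   var comparison beta_hat delta hr_printed hr_correct;
   format hr_printed hr_correct 6.3;
run;
```

For C versus A, HR_PRINTED is about 1.494 while HR_CORRECT is about 2.231, matching the hand calculation; for B versus A the two agree (about 0.550) because the difference is one unit. Rather than typing the estimate, the PROC PHREG coefficient table can be captured with an ODS OUTPUT statement (the table is named ParameterEstimates and the coefficient is in its Estimate column) and the same DATA step logic applied to it.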

Option 2: Recoding the values of the category variables such that the increase is equal to one unit

To avoid computing the hazard ratio by hand as in Option 1, a straightforward approach is to recode the values of the treatment group variable X so that $X_C - X_A$ equals one. Since X is a categorical variable whose values 0, 1, and 2 merely identify the treatment groups, we can recode $X_C$ from 2 to 1 while keeping $X_A$ as 0, or recode $X_A$ from 0 to 1 while keeping $X_C$ as 2.

The following SAS statements recode $X_C$ from 2 to 1 while keeping $X_A$ as 0. The recoded values are saved in a new variable (trt_cd) in the data set SURV:

data surv;
   set surv;
   if trt=2 then trt_cd=1;          /* group C */
   else if trt=0 then trt_cd=0;     /* group A */
run;

Replace variable trt with trt_cd in (3):

proc phreg data=surv(where=(trt in (0,2)));
   model survtime*survcen(1)=trt_cd;
run;   /* (4) */

The partial SAS output with the estimates for β and hazard ratio is:

Output 4. trt_cd=1 vs. trt_cd=0, partial printout from PROC PHREG

Analysis of Maximum Likelihood Estimates

                 Parameter   Standard                             Hazard
Variable   DF     Estimate      Error   Chi-Square   Pr > ChiSq    Ratio
trt_cd      1      0.80238    0.10015      64.1866       <.0001    2.231

With the recoded variable trt_cd, the hazard ratio is

$$
e^{\hat\beta (X_{C,\mathrm{cd}} - X_{A,\mathrm{cd}})} = e^{\hat\beta (1 - 0)} = e^{\hat\beta} = e^{0.80238} = 2.231
$$

Therefore, the hazard ratio estimate 2.231 is obtained directly from the SAS output (Output 4).

As mentioned above, we can either recode $X_C$ from 2 to 1 while keeping $X_A$ as 0 (as we just did), or recode $X_A$ from 0 to 1 while keeping $X_C$ as 2; either way, $X_C - X_A$ equals one. Again, be careful when interpreting the hazard ratio read directly from the SAS output after recoding. It is the hazard in group C versus the hazard in group A if $X_C - X_A$ equals 1, but the hazard in group A versus the hazard in group C if $X_C - X_A$ equals -1.

To illustrate this, try the following recoding, which results in $X_{C,\mathrm{cd}} - X_{A,\mathrm{cd}} = 1 - 2 = -1$:

data surv;
   set surv;
   if trt=2 then trt_cd=1;          /* group C */
   else if trt=0 then trt_cd=2;     /* group A */
run;

Running the SAS statements in (4), the partial SAS output with the estimates for β and hazard ratio is:


Output 5. trt_cd=1 vs. trt_cd=2, partial printout from PROC PHREG

Analysis of Maximum Likelihood Estimates

                 Parameter   Standard                             Hazard
Variable   DF     Estimate      Error   Chi-Square   Pr > ChiSq    Ratio
trt_cd      1     -0.80238    0.10015      64.1866       <.0001    0.448

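$$
\text{Hazard ratio } (X_{C,\mathrm{cd}} \text{ vs. } X_{A,\mathrm{cd}}) = e^{\hat\beta (X_{C,\mathrm{cd}} - X_{A,\mathrm{cd}})} = e^{\hat\beta \times (-1)} = e^{(-0.80238)(-1)} = 2.231
$$

or equivalently,

$$
\text{Hazard ratio } (X_{A,\mathrm{cd}} \text{ vs. } X_{C,\mathrm{cd}}) = e^{\hat\beta (X_{A,\mathrm{cd}} - X_{C,\mathrm{cd}})} = e^{\hat\beta \times 1} = e^{(-0.80238)(1)} = 0.448
$$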

So the hazard of death in treatment group C is 2.231 times the hazard of death in treatment group A, or equivalently, the hazard of death in treatment group A is 0.448 times the hazard of death in treatment group C, which is the reciprocal of the previous one.
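The claim behind Option 2 is that recoding only rescales the coefficient (and, for the reversed coding, flips its sign). The sketch below checks this with the 14 rows printed in Output 1. Because it uses only those rows, its estimates will not match Outputs 3 to 5; the data set names (SURV14, SURV14_CD, PE_TRT, PE_CD, PE_REV, RECODE_CHECK) and the variable TRT_REV are hypothetical. It captures each fit with ODS OUTPUT of the PROC PHREG ParameterEstimates table.

```sas
/* Sketch: check that Option 2 recoding only rescales the PROC PHREG coefficient.   */
/* Data: only the 14 rows printed in Output 1, so estimates differ from Outputs 3-5. */
/* All data set names and TRT_REV are hypothetical.                                  */
data surv14;
   input survcen survtime patid trt;
   datalines;
0 60.63 900001 1
0 49.55 900002 2
1 53.64 900003 0
0 65.14 900004 2
0 57.63 900005 2
0 64.37 900006 0
0 62.51 900007 0
0 57.37 900008 0
0 63.22 900009 2
1 75.62 900010 1
0 54.06 900011 2
0 60.38 900012 2
0 63.25 900013 1
1 71.17 900014 1
;
run;

data surv14_cd;
   set surv14(where=(trt in (0,2)));
   if trt=2 then trt_cd=1;              /* Option 2 recoding: C -> 1, A -> 0 */
   else if trt=0 then trt_cd=0;
   trt_rev = 2 - trt_cd;                /* reversed recoding: C -> 1, A -> 2 */
run;

ods output ParameterEstimates=pe_trt;   /* original coding: A = 0, C = 2 */
proc phreg data=surv14_cd;
   model survtime*survcen(1)=trt;
run;

ods output ParameterEstimates=pe_cd;
proc phreg data=surv14_cd;
   model survtime*survcen(1)=trt_cd;
run;

ods output ParameterEstimates=pe_rev;
proc phreg data=surv14_cd;
   model survtime*survcen(1)=trt_rev;
run;

data recode_check;
   merge pe_trt(keep=estimate rename=(estimate=b_trt))
         pe_cd(keep=estimate rename=(estimate=b_cd))
         pe_rev(keep=estimate rename=(estimate=b_rev));
   hr_option1 = exp(2*b_trt);   /* Option 1: formula (1) with X_C - X_A = 2 */
   hr_option2 = exp(b_cd);      /* Option 2: C vs. A read directly          */
   hr_reverse = exp(b_rev);     /* reversed coding: A vs. C = 1/hr_option2  */
run;

proc print data=recode_check noobs;
run;
```

In RECODE_CHECK, B_CD should equal twice B_TRT and B_REV should equal minus B_CD, so HR_OPTION1 and HR_OPTION2 agree and HR_REVERSE is their reciprocal, the same pattern as 2.231 and 0.448 in Outputs 4 and 5.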

Option 3: Using the CLASS statement to specify the category variable in PROC TPHREG (experimental) procedure

PROC TPHREG, an experimental procedure in SAS 9.1, supports the CLASS statement, which lets you specify categorical explanatory variables. PROC TPHREG contains most of the functionality of PROC PHREG, plus additional experimental features such as specifying CLASS variables as in PROC GLM.

A two-level categorical variable (in our example, trt with values 0 and 1, or 0 and 2) specified in the CLASS statement is assigned design values of 0 (the reference level) and 1, regardless of its original values. The original values of the categorical variable no longer matter, and the difference between the design values is always one for a two-level categorical variable. Therefore, the hazard ratio estimate can be read directly from the SAS output.
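In terms of formula (1), reference coding replaces the original TRT values with a design variable, written $D$ here only for illustration, that is 1 for the non-reference level and 0 for the reference level. The difference that enters the exponent is therefore always one:

$$
\text{hazard ratio (non-reference vs. reference)} = e^{\hat\beta (D_{\text{non-ref}} - D_{\text{ref}})} = e^{\hat\beta (1 - 0)} = e^{\hat\beta}
$$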

In the following PROC TPHREG statements, we copy the exact model statements from (3) and add an additional CLASS statement:

proc tphreg data=surv(where=(trt in (0,2)));
   class trt;
   model survtime*survcen(1)=trt;
run;   /* (5) */

The partial SAS output with the estimates for β and hazard ratio is:

Output 6. trt=0 vs. trt=2, partial printout from PROC TPHREG

Analysis of Maximum Likelihood Estimates

                 Parameter   Standard                             Hazard
Parameter  DF     Estimate      Error   Chi-Square   Pr > ChiSq    Ratio
trt 0       1     -0.80238    0.10015      64.1866       <.0001    0.448

In order to interpret the hazard ratio estimate from the SAS output (Output 6) correctly (is it treatment group A versus group C or vice versa?), we need to know which treatment group (A or C) is assigned design variable value 0 or 1. This information can also be found in the PROC TPHREG SAS output, as displayed below:

Output 7. Design variable values, partial printout from PROC TPHREG

Class Level Information

                 Design
Class   Value    Variables
trt     0        1
        2        0

In this case, the design variable value is 1 for $X_A$ (trt=0) and 0 for $X_C$ (trt=2), because by default the CLASS statement designates the last ordered value of the CLASS variable as the reference level. The hazard ratio estimate in Output 6 is therefore the hazard in group A versus the hazard in group C. In other words, the hazard of death in treatment group A is 0.448 times the hazard of death in treatment group C (the reference). This is consistent with the PROC PHREG result in Output 5.

To obtain the hazard ratio of treatment group C versus treatment group A (the reference) directly from the SAS output, we can change the reference level in PROC TPHREG by reversing the order of the CLASS variable. In the above example, specify the DESCENDING option for the CLASS variable in (5):

proc tphreg data=surv(where=(trt in (0,2)));
   class trt (descending);
   model survtime*survcen(1)=trt;
run;

The design variable values now become:

Output 8. Design variable values (with descending option), partial printout from PROC TPHREG

Class Level Information

                 Design
Class   Value    Variables
trt     2        1
        0        0

The partial SAS output with the estimates for β and hazard ratio is:

Output 9. trt=0 vs. trt=2 (with descending option), partial printout from PROC TPHREG

Analysis of Maximum Likelihood Estimates

                 Parameter   Standard                             Hazard
Parameter  DF     Estimate      Error   Chi-Square   Pr > ChiSq    Ratio
trt 2       1      0.80238    0.10015      64.1866       <.0001    2.231

The hazard of death in treatment group C is 2.231 times the hazard of death in treatment group A.

The PROC PHREG code above was tested in SAS 8.2 and SAS 9.1.3. The PROC TPHREG code was tested in SAS 9.1.3.
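PROC TPHREG is specific to SAS 9.1; starting with SAS 9.2, PROC PHREG itself accepts a CLASS statement, and the reference level can be named with the REF= option instead of relying on DESCENDING. The following is a hedged sketch of Option 3 in that setting, assuming SAS 9.2 or later and using only the 14 rows printed in Output 1, so its estimates will not match Output 9. The data set names (SURV14, PE_CLASS, PE_NUMERIC, CLASS_CHECK) are hypothetical. It also fits TRT as a numeric 0/2 variable so the CLASS result can be compared with Option 1; it was not part of the testing described above.

```sas
/* Hedged sketch, assuming SAS 9.2 or later (PROC PHREG with a CLASS statement). */
/* Uses only the 14 rows printed in Output 1; data set names are hypothetical.   */
data surv14;
   input survcen survtime patid trt;
   datalines;
0 60.63 900001 1
0 49.55 900002 2
1 53.64 900003 0
0 65.14 900004 2
0 57.63 900005 2
0 64.37 900006 0
0 62.51 900007 0
0 57.37 900008 0
0 63.22 900009 2
1 75.62 900010 1
0 54.06 900011 2
0 60.38 900012 2
0 63.25 900013 1
1 71.17 900014 1
;
run;

/* Option 3 in PROC PHREG: group A (trt=0) is named as the reference level. */
ods output ParameterEstimates=pe_class;
proc phreg data=surv14(where=(trt in (0,2)));
   class trt(ref='0');
   model survtime*survcen(1)=trt;
run;

/* Same data, TRT treated as a numeric variable with values 0 and 2. */
ods output ParameterEstimates=pe_numeric;
proc phreg data=surv14(where=(trt in (0,2)));
   model survtime*survcen(1)=trt;
run;

data class_check;
   merge pe_class(keep=estimate rename=(estimate=b_class))
         pe_numeric(keep=estimate rename=(estimate=b_numeric));
   hr_c_vs_a_class   = exp(b_class);       /* design values 1 (C) and 0 (A)  */
   hr_c_vs_a_option1 = exp(2*b_numeric);   /* formula (1) with X_C - X_A = 2 */
run;

proc print data=class_check noobs;
run;
```

The two hazard ratios in CLASS_CHECK should agree, because the CLASS design variable is simply TRT divided by 2 for these two groups.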

CONCLUSION

This paper discussed three options for obtaining the correct hazard ratio estimate with the SAS PROC PHREG or PROC TPHREG procedure when the assumption that the increase in the explanatory variable equals one unit does not hold. By recoding the values of the categorical explanatory variable in PROC PHREG, or by specifying the categorical explanatory variable in the CLASS statement of the experimental PROC TPHREG procedure, we can obtain the correct hazard ratio estimate directly from the procedure output. By also paying attention to the recoded values in PROC PHREG or the design variable values assigned by PROC TPHREG, we can interpret the hazard ratio correctly.

REFERENCES

Allison, Paul D. 2003. Survival Analysis Using SAS®: A Practical Guide. Cary, NC: SAS Institute Inc.

SAS Institute Inc. 2005. Survival Analysis Using the Proportional Hazards Model, Instructor-based Training Course Notes. Cary, NC: SAS Institute Inc.

SAS Institute Inc. 2005. SAS/STAT® 9.1 User’s Guide, Volume 6. Cary, NC: SAS Institute Inc.

ACKNOWLEDGMENTS

I would like to express my special gratitude to my manager, Ms. Phuong Dang, for her continuous support and advice on my work. Many thanks to Ms. Phuong Dang and Dr. Yuting Zhang for their in-depth review and valuable feedback on this paper. I greatly appreciate Ms. Phuong Dang's very helpful suggestions, thorough editorial notes, and detailed proofreading of the paper. I am also very grateful to Dr. Yuting Zhang for her input and tireless discussions with me on the related statistical rationale.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:
Betty Ying Wang
Genentech, Inc.
MS #59, 1 DNA Way
South San Francisco, CA 94080
(650) 467-8542 (voice)

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are trademarks of their respective companies.
