Stata Application Tutorial 8: Competing Risks/Split Population

August 2005

Data Note: The code uses data on state adoption of restrictive abortion legislation. These data are available on the Event History website. The code is based on Stata version 8.

Preliminaries: With competing risks, the question arises as to how to handle the fact that different kinds of events can occur. This is a common issue with many of the duration-type problems social scientists work with. For example, in what ways can a career end? Can regimes “fail” in different ways (overthrown, elections, resignation, etc.)? And so on.

I start with the restrictive abortion legislation adoption data. Suppose we just estimate a garden-variety single-state Cox model. We obtain:

. stcox south lgctsid nbrestr mooneyp ugov conright, nohr exactp

         failure _d:  event
   analysis time _t:  time

Iteration 0:   log likelihood = -189.66918
Iteration 1:   log likelihood = -173.75225
Iteration 2:   log likelihood = -172.84282
Iteration 3:   log likelihood = -172.83956
Iteration 4:   log likelihood = -172.83956
Refining estimates:
Iteration 0:   log likelihood = -172.83956

Cox regression -- exact partial likelihood

No. of subjects =          418                     Number of obs   =       418
No. of failures =           44
Time at risk    =         3170
                                                   LR chi2(6)      =     33.66
Log likelihood  =   -172.83956                     Prob > chi2     =    0.0000
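The LR chi2 statistic in the header can be recovered from the iteration log: it is twice the difference between the final log likelihood and the log likelihood at iteration 0 (all coefficients set to zero). A minimal check, using only the numbers printed above; the scalar names are my own and purely illustrative:

```stata
* Log likelihoods copied from the iteration log above
scalar ll_null = -189.66918
scalar ll_full = -172.83956

* Likelihood-ratio statistic and its p-value on 6 degrees of freedom
scalar lr = 2*(ll_full - ll_null)
scalar p  = chi2tail(6, lr)
display "LR chi2(6) = " %6.2f lr "   Prob > chi2 = " %6.4f p
```

The same arithmetic can be applied to any of the Cox outputs in this tutorial.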

Under this model, it is assumed that any event j is equivalent to any event k; that is, we are not distinguishing among possibly disparate events (competing risks). Note also that the code above will replicate results from Jones and Branton 2005.

There are a variety of ways to handle competing risks. The simplest way, via a Cox model, is to estimate a stratified model. Under this model, we make the (strong?) assumption that heterogeneity due to different event types is found in the baseline hazards; the covariate effects are the same over the competing risks. This is a strong assumption insofar as it disallows directly estimating different covariate effects for the J competing risks. Instead, heterogeneity is “swept” into the baseline hazards.

Bradford S. Jones, ICPSR MLE-2 Event History Course, Stata Tutorial

Estimation is straightforward (assuming the data are constructed appropriately). Under this model, it is assumed that an observation is at risk of experiencing any one of the J events. Consequently, in terms of the data setup, the data set will consist of multiple records per observation, with each observation contributing a separate record of data for each of the competing risks.

The model makes an important assumption about the competing risks. At the entry time, an observation is assumed to be at risk of experiencing any one of the J events. After event j is experienced, it is assumed the observation is no longer at risk of experiencing that event. Hence, it is now at risk of experiencing only the J-1 remaining events.

See below. Here the variable “stcode” is the state identifier; the variable “type_r” labels the type of event (1, 2, 4, 5; the coding is arbitrary) the state is at risk of experiencing. In this context, each event type is a type of restrictive abortion legislation (informed consent, parental consent, limited funding, and spousal consent). The variable “rev_even” is the event indicator: 1 if policy j was adopted, 0 if not. The variable “year” is self-explanatory, and _d is redundant with rev_even.

Note that there are initially 4 lines of data per year. This is because it is assumed the state is at risk of adopting any one of those four kinds of policies. Note how “type_r” is adjusted after an event is experienced. Look at state 1 in 1977. Here it adopted policy type “5” (spousal consent). After 1977, the events state 1 is at risk of experiencing are risks 1, 2, and 4. Risk 5 was observed, and so it is assumed risk 5 cannot be repeated. Is this a reasonable assumption? This is a call you have to make.

      +----------------------------------------+
      | stcode   type_r   rev_even   year   _d |
      |----------------------------------------|
   1. |      1        1          0   1974    0 |
   2. |      1        2          0   1974    0 |
   3. |      1        4          0   1974    0 |
   4. |      1        5          0   1974    0 |
   5. |      1        1          0   1975    0 |
      |----------------------------------------|
   6. |      1        2          0   1975    0 |
   7. |      1        4          0   1975    0 |
   8. |      1        5          0   1975    0 |
   9. |      1        1          0   1976    0 |
  10. |      1        2          0   1976    0 |
      |----------------------------------------|
  11. |      1        4          0   1976    0 |
  12. |      1        5          0   1976    0 |
  13. |      1        1          0   1977    0 |
  14. |      1        2          0   1977    0 |
  15. |      1        4          0   1977    0 |
      |----------------------------------------|
  16. |      1        5          1   1977    1 |
  17. |      1        1          0   1978    0 |
  18. |      1        2          0   1978    0 |
  19. |      1        4          0   1978    0 |
  20. |      1        1          0   1979    0 |
      |----------------------------------------|
  21. |      1        2          0   1979    0 |
  22. |      1        4          0   1979    0 |
  23. |      1        1          0   1980    0 |
  24. |      1        2          0   1980    0 |
  25. |      1        4          0   1980    0 |
      |----------------------------------------|
  26. |      1        1          0   1981    0 |
  27. |      1        2          0   1981    0 |
  28. |      1        4          0   1981    0 |
  29. |      1        1          0   1982    0 |
  30. |      1        2          0   1982    0 |
      |----------------------------------------|
  31. |      1        4          0   1982    0 |
  32. |      1        1          0   1983    0 |
  33. |      1        2          0   1983    0 |
  34. |      1        4          0   1983    0 |
  35. |      1        1          0   1984    0 |
      |----------------------------------------|
  36. |      1        2          0   1984    0 |
  37. |      1        4          0   1984    0 |
  38. |      1        1          0   1985    0 |
  39. |      1        2          0   1985    0 |
  40. |      1        4          0   1985    0 |
      |----------------------------------------|
  41. |      1        1          0   1986    0 |
  42. |      1        2          1   1986    1 |
  43. |      1        4          0   1986    0 |
  44. |      1        1          0   1987    0 |
  45. |      1        4          0   1987    0 |
      |----------------------------------------|
  46. |      1        1          0   1988    0 |
  47. |      1        4          0   1988    0 |
  48. |      1        1          0   1989    0 |
  49. |      1        4          0   1989    0 |
  50. |      1        1          0   1990    0 |
      |----------------------------------------|
  51. |      1        4          0   1990    0 |
  52. |      1        1          0   1991    0 |
  53. |      1        4          0   1991    0 |
  54. |      1        1          0   1992    0 |
  55. |      1        4          0   1992    0 |
      |----------------------------------------|
  56. |      1        1          0   1993    0 |
  57. |      1        4          1   1993    1 |
      +----------------------------------------+
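A stacked data set like the listing above could be built from a file with one record per state-year. The sketch below is only an illustration on toy data for state 1, 1974 to 1978: the variable adopt_year is hypothetical (it is not in the tutorial data), and the single adoption it encodes (type 5 in 1977) is copied from the listing.

```stata
* Toy data: one record per state-year (state 1, 1974-1978 only)
clear
input stcode year
1 1974
1 1975
1 1976
1 1977
1 1978
end

* One record per risk type for every state-year
expand 4
bysort stcode year: generate type_r = real(word("1 2 4 5", _n))

* Hypothetical adoption year for each risk; only type 5 is adopted here
generate adopt_year = .
replace adopt_year = 1977 if type_r == 5

* Event indicator, then drop a risk from the risk set once it has occurred
generate rev_even = (year == adopt_year)
drop if year > adopt_year & !missing(adopt_year)

sort stcode year type_r
list stcode type_r rev_even year, sepby(year)
```

In this toy version the 1978 block has three records (types 1, 2, and 4), which mirrors the pattern in the listing after the 1977 adoption.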

You can continue this exercise throughout state 1’s history if you want. Below, I estimate this model:

. stcox south lgctsid nbrestr mooneyp UGOV conright, nohr exactp strata(type_r)

         failure _d:  rev_even
   analysis time _t:  duration

Iteration 0:   log likelihood = -478.38134
Iteration 1:   log likelihood = -455.87676
Iteration 2:   log likelihood = -455.23175
Iteration 3:   log likelihood = -455.23085
Refining estimates:
Iteration 0:   log likelihood = -455.23085

Stratified Cox regr. -- exact partial likelihood

No. of subjects =         2554                     Number of obs   =      2554
No. of failures =           98
Time at risk    =        23593
                                                   LR chi2(6)      =     46.30
Log likelihood  =   -455.23085                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       south |   .6588073   .2610267     2.52   0.012     .1472043     1.17041
     lgctsid |   -.142223   .0758693    -1.87   0.061    -.2909241    .0064782
     nbrestr |   .1128489   .1619133     0.70   0.486    -.2044954    .4301933
     mooneyp |  -.1923744   .0587138    -3.28   0.001    -.3074513   -.0772974
        UGOV |   .1085451   .2174234     0.50   0.618    -.3175969     .534687
    conright |  -.9162597   .2803983    -3.27   0.001     -1.46583   -.3666891
------------------------------------------------------------------------------
                                                          Stratified by type_r
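Because the nohr option is specified, the table above reports coefficients rather than hazard ratios. Exponentiating a coefficient gives the hazard ratio for a one-unit increase in that covariate. A small sketch using two coefficients copied from the output (the scalar names are my own):

```stata
* Coefficients copied from the stratified Cox output above
scalar b_south    =  .6588073
scalar b_conright = -.9162597

* exp(coefficient) is the hazard ratio for a one-unit increase
display "south:    " %5.3f exp(b_south)
display "conright: " %5.3f exp(b_conright)
```

The results are roughly 1.93 for south and roughly 0.40 for conright; dropping nohr from the stcox command would report hazard ratios directly.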

The major difference between this and the other model is we are explicitly allowing for different kinds of events. The covariate effects differ, depending on which conceptualization of events you believe is most valid. I summarize this below (table taken from Jones and Branton 2005).

Table 2. Comparing Competing Risks and Single Event Cox Models of State Adoption of Restrictive Abortion Legislation, 1974-1993

                         Stratified Competing     Single-Event
                         Risks Model              Model
Variable                 Estimate (s.e.)          Estimate (s.e.)
South                    .66 (.26)                .82 (.43)
Ideology distance        .14 (.07)                .14 (.12)
Neighbor                 .11 (.16)                .24 (.23)
Pre-Roe                  .19 (.06)                .25 (.09)
Unified Government       .11 (.22)                .01 (.34)
Constitutional Right     .92 (.28)                .31 (.44)
N                        2554                     418
Log-Likelihood           455.23                   172.84

Note: Data are from Brace, Hall, and Langer 2001. Both models are semi-parametric Cox models. Minus signs are not shown; the signed estimates for the stratified model appear in the Stata output above.

In the contrast between column 1 and column 2, I would choose column 1 on the grounds that it explicitly allows for competing risks.

The usual and mostly valid complaint against this model is that covariate effects are constrained to be equal across risks. Maybe this is a bad assumption.

Here is another way: partitioned likelihood.

Under this approach, we model the type-specific hazards using a Cox model. Practically speaking, this requires us to estimate 4 models, one for each risk. The basic idea is that, when modeling risk j, we treat failures from the J-1 remaining risks as if they were right-censored cases. This isolates the risk of interest. Mechanically (in Stata), we re-stset the data for each estimation.

In Stata this would mean 4 stsets, one for each value of type_r (coded 1, 2, 4, 5 above):

stset duration, failure(type_r==1)
(estimate Cox model)
stset duration, failure(type_r==2)
(estimate Cox model)
stset duration, failure(type_r==4)
(estimate Cox model)
stset duration, failure(type_r==5)
(estimate Cox model)
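The re-stset-and-estimate cycle can also be written as a loop. The sketch below runs on simulated data, not the abortion legislation data, and makes several assumptions of my own: one record per subject, two risks, a single covariate x, and a hypothetical variable etype that records which event (if any) ended the spell.

```stata
* Simulated single-record data with two competing risks (illustration only)
clear
set seed 12345
set obs 200
generate x  = rnormal()
generate t1 = -ln(runiform())/exp( 0.5*x)   // latent time to event type 1
generate t2 = -ln(runiform())/exp(-0.5*x)   // latent time to event type 2
scalar cens = 3                             // administrative censoring time
generate duration = min(t1, t2, cens)

* etype: 0 = censored, 1 or 2 = type of event observed
generate byte etype = cond(min(t1, t2) > cens, 0, cond(t1 < t2, 1, 2))

* One Cox model per risk; failures of the other type count as censored
foreach j of numlist 1 2 {
    stset duration, failure(etype==`j')
    stcox x, nohr
}
```

Each pass through the loop redefines the failure event, so the spells that ended in the other event type enter the likelihood only as right-censored observations.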

Below, I summarize the results that would be obtained from this estimation.

Table 3. Cox Type-Specific Competing Risks Models of State Adoption of Restrictive Abortion Legislation, (years)

                        Informed          Parental          Limited           Spousal
                        Consent           Consent           Funding           Consent
Variable                Estimate (s.e.)   Estimate (s.e.)   Estimate (s.e.)   Estimate (s.e.)
South                   .32 (.59)         .31 (.46)         .93 (.49)         .35 (.63)
Ideology Distance       .17 (.16)         .32 (.14)         .08 (.14)         .05 (.18)
Neighbor States         .05 (.36)         .05 (.28)         .01 (.31)         .38 (.36)
Pre-Roe                 .26 (.14)         .14 (.10)         .12 (.11)         .29 (.16)
Unified Government      .34 (.47)         .24 (.36)         .21 (.44)         .34 (.53)
Constitutional Right    1.06 (.65)        .93 (.47)         .82 (.53)         .66 (.63)
N                       386               386               386               386
Log-Likelihood          132.75            222.67            157.08            91.25

Note: Data are from Brace, Hall, and Langer 1999.

Differences? There are definite changes from the stratified model to this one. Which one would I report? Most likely this one. It accounts for competing risks but has the flexibility of permitting risk-specific covariate effects.

A Very Quick Look at a Split-Population Model:

First, a split-population model using the cloglog link (thanks to S. Jenkins 2001):

. spsurv dead mooneymean neighbor, id(stcode) seq(t)

Iteration 0:   log likelihood = -129.08367  (not concave)
Iteration 1:   log likelihood = -128.78891
Iteration 2:   log likelihood = -127.66949
Iteration 3:   log likelihood = -127.09029
Iteration 4:   log likelihood =  -127.0878
Iteration 5:   log likelihood =  -127.0878

Split population survival model                   Number of obs   =        386
                                                  LR chi2(3)      =      15.58
Log likelihood = -127.0878                        Prob > chi2     =     0.0014

------------------------------------------------------------------------------
        dead |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
hazard       |
  mooneymean |  -.3223903   .0959937    -3.36   0.001    -.5105345    -.134246
    neighbor |  -.0677823   .4140629    -0.16   0.870    -.8793307    .7437661
       _cons |  -1.713602   .3519405    -4.87   0.000    -2.403393   -1.023812
-------------+----------------------------------------------------------------
cure_p       |
       _cons |  -2.484965   .6332793    -3.92   0.000     -3.72617    -1.24376
------------------------------------------------------------------------------
c = Pr(never fail) = .07691892; Std.Err. = .04496435; z = 1.7106645
Likelihood ratio test of c=0: chibar2(01)= 5.45 Prob>=chibar2 = 0.010
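The reported c = Pr(never fail) appears to be the inverse logit of the constant in the cure_p equation; at least, the two printed numbers agree under that reading. A quick check, using only values from the output above (the scalar names are my own):

```stata
* Constant from the cure_p equation in the spsurv output above
scalar b_cure = -2.484965

* Inverse logit: exp(b)/(1 + exp(b))
scalar c_hat = invlogit(b_cure)
display "c = Pr(never fail) = " %9.8f c_hat
```

So the model estimates that a bit under 8 percent of the observations would never experience the event.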

Second, a standard cloglog model.

. cloglog dead mooneymean neighbor

Iteration 0:   log likelihood = -129.95727
Iteration 1:   log likelihood = -129.81491
Iteration 2:   log likelihood = -129.81464
Iteration 3:   log likelihood = -129.81464

Complementary log-log regression                  Number of obs    =       386
                                                  Zero outcomes    =       343
                                                  Nonzero outcomes =        43

                                                  LR chi2(2)       =     10.13
Log likelihood = -129.81464                       Prob > chi2      =    0.0063
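The two log likelihoods tie the models together: with c = 0 the split-population model reduces to the standard cloglog model, and the printed "Likelihood ratio test of c=0" matches twice the gap between the two log likelihoods. Because c = 0 lies on the boundary of the parameter space, the chibar2(01) p-value is half the usual chi-squared(1) tail area. A sketch of the arithmetic, with scalar names of my own:

```stata
* Log likelihoods copied from the two outputs above
scalar ll_split   = -127.0878
scalar ll_cloglog = -129.81464

* LR statistic; chibar2(01) is a 50:50 mix of chi2(0) and chi2(1)
scalar lr = 2*(ll_split - ll_cloglog)
scalar p  = 0.5*chi2tail(1, lr)
display "chibar2(01) = " %5.2f lr "   Prob >= chibar2 = " %5.3f p
```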

I did this quickly! You would want to account for f(t) (some function of time capturing duration dependence) here!
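One way to do that in a discrete-time cloglog model is to add a function of the period counter as a covariate. The sketch below is only an illustration on simulated person-period data: the data, the variable names (id, x, lnt), and the choice of ln(t) as f(t) are my assumptions, not part of the abortion legislation analysis.

```stata
* Simulated person-period data: 100 subjects, up to 10 periods each
clear
set seed 2005
set obs 100
generate id = _n
generate x  = rnormal()
expand 10
bysort id: generate t = _n
generate lnt = ln(t)

* Discrete-time hazard with a cloglog link and log-time duration dependence
generate h    = 1 - exp(-exp(-2 + 0.5*x + 0.3*lnt))
generate dead = runiform() < h

* Keep each subject's records only up to and including the first failure
bysort id (t): generate cumdead = sum(dead)
drop if cumdead > 1 | (cumdead == 1 & dead == 0)

* f(t) enters as an ordinary covariate
cloglog dead x lnt
```

Other choices for f(t), such as period dummies or a polynomial in t, would enter the model the same way.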
