Biostat 140.656 Multilevel Statistical Models Final Project

Analysis of Hospital Expenditures Using Multilevel Models

Background

Data on medical expenditures are of frequent interest to health care providers, medical insurance companies, and public health policy makers. Expenditure is believed to be associated with the patient’s age as well as with the specialty of the patient’s principal care provider (PCP). In this project, I used a dataset containing medical expenditure information for 18-64-year-old male patients with MD PCPs in 1996. Specifically, I analyzed the association of allowable outpatient charges with the patients’ age and PCP specialty. I then estimated the proportion of patients with positive allowable outpatient charges after controlling for age and PCP specialty, and assessed the proportion of patients with positive allowable specialist charges as a function of age and PCP specialty. Because the patients came from different PCPs, and we expect variation both within each PCP and across PCP clusters, I used multilevel models to assess the unexplained variance at the PCP level. For most of the analyses I used both fixed effects models and random intercept models; in one analysis, a random slope was added in addition to the random intercept.

Methods

1. Assessment of median allowable outpatient expenditures using fixed and random effects models

The allowable outpatient expenditure (denoted as Allop) was obtained by summing the annual allowed outpatient charges for primary care, specialist, laboratory, and other services. Only patients with positive expenditure were used in the analysis. I first checked the distribution of

Allop and found it was highly right-skewed. I then transformed the data with the natural log; the transformed values looked approximately normally distributed, as shown in the following figure. Fixed effects models, with or without the covariates age and/or PCP specialty, were used to assess the average log Allop. PCP specialty is coded as Spec = 1 if the PCP specializes in internal medicine and Spec = 0 if the PCP is in family practice. The models are as follows:

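[Figure: density of allop and density of logallop]

a) $\log(\mathrm{Allop}_{ij}) = \beta_0 + e_{ij}$

b) $\log(\mathrm{Allop}_{ij}) = \beta_0 + \beta_1 \mathrm{Age}_{ij} + e_{ij}$

c) $\log(\mathrm{Allop}_{ij}) = \beta_0 + \beta_1 \mathrm{Spec}_{ij} + e_{ij}$

d) $\log(\mathrm{Allop}_{ij}) = \beta_0 + \beta_1 \mathrm{Age}_{ij} + \beta_2 \mathrm{Spec}_{ij} + e_{ij}$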

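A random intercept at the PCP cluster level was then added to the models, where i indexes PCP clusters and j indexes individual patients within a cluster. The random intercept $b_{0i}$ captures the PCP-level deviation, and its variance is the unexplained variance at the PCP level.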

a) Allopij = 0 + b0i + eij; b) Allopij = 0 + b0i + 1*Ageij+ eij ; c) Allopij = 0+ b0i + 1*Specij+ eij; d)

Allopij = 0 + b0i + 1*Ageij + 2*Specij+ eij

2. Assessment of the proportion of patients with positive allowable outpatient charges as a function of age and PCP specialty

a) I first used a fixed effects logistic regression model to assess the proportion of patients with positive allowable outpatient charges (denoted as op_bin). Because the outcome is binary, the logistic model has no additive error term; individual-level variation comes from the Bernoulli distribution of op_bin. The model is as follows:

$\operatorname{logit} P(\mathrm{op\_bin}_{ij}=1) = \beta_0 + \beta_1 \mathrm{Age}_{ij} + \beta_2 \mathrm{Spec}_{ij}$

b) Then a random intercept at the PCP level was added:

$\operatorname{logit} P(\mathrm{op\_bin}_{ij}=1) = \beta_0 + b_{0i} + \beta_1 \mathrm{Age}_{ij} + \beta_2 \mathrm{Spec}_{ij}$

c) Finally, I added a random slope on age at the PCP level in addition to the random intercept. To make the WinBUGS program run successfully for this model, I replaced continuous age with an indicator variable “ages” (ages = 1 when age > 45, otherwise ages = 0). The model is as follows:

$\operatorname{logit} P(\mathrm{op\_bin}_{ij}=1) = \beta_0 + b_{0i} + (\beta_1 + b_{1i})\,\mathrm{Ages}_{ij} + \beta_2 \mathrm{Spec}_{ij}$

3. Assessment of the proportion of patients with positive allowable specialist charges as a function of age and PCP specialty

a) I first used a fixed effects logistic model without covariates to estimate the proportion of patients with positive allowable specialist charges (denoted as sp_bin). The model is as follows:

$\operatorname{logit} P(\mathrm{sp\_bin}_{ij}=1) = \beta_0$

b) I then fitted a model accounting for the random PCP cluster effect:

$\operatorname{logit} P(\mathrm{sp\_bin}_{ij}=1) = \beta_0 + b_{0i}$

c) Then a random intercept model was fitted with the age and PCP specialty covariates:

$\operatorname{logit} P(\mathrm{sp\_bin}_{ij}=1) = \beta_0 + b_{0i} + \beta_1 \mathrm{Age}_{ij} + \beta_2 \mathrm{Spec}_{ij}$

Results

1. Estimation of median allowable outpatient expenditure using fixed and random effects models.

a) Only individuals with positive outpatient expenditure were used in the analysis. Without covariates, the average log Allop is estimated to be 5.827 by the fixed effects model and 5.823 by the random effects model. These two estimates are quite similar. Because the log transformation is monotone and log Allop is approximately symmetric, its mean also estimates its median, so the median Allop is estimated to be exp(5.827) ≈ $339 by the fixed effects model and exp(5.823) ≈ $338 by the random effects model. For the random effects model, the unexplained variance at the PCP level is estimated to be 0.041.
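The conversion from average log Allop to a median can be checked directly. This minimal sketch assumes, as above, that log Allop is roughly symmetric, so its mean also estimates its median; it uses the unrounded intercepts from the STATA output for model a).

```python
import math

# Intercept-only estimates of the mean of log Allop (STATA _cons, model a).
mean_log_fixed = 5.827588    # fixed effects model
mean_log_random = 5.823024   # random intercept model

# exp() is monotone, so exp(median of log Allop) = median of Allop; for a
# roughly symmetric log Allop, the mean of log Allop estimates that median.
median_fixed = math.exp(mean_log_fixed)
median_random = math.exp(mean_log_random)

print(f"fixed: ${median_fixed:.1f}, random: ${median_random:.1f}")
```

With the unrounded intercepts the medians come out at about $339.5 and $338.0, consistent with the rounded figures above.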

Fixed effect

logallop |     Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
---------+---------------------------------------------------------------
   _cons |  5.827588    .0152364   382.48   0.000    5.797721    5.857455

Random effect:

logallop |     Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
---------+---------------------------------------------------------------
   _cons |  5.823024    .0203764   285.77   0.000    5.783087    5.862961

Variance at level 1: 2.0488351 (.03109222)
Variances and covariances of random effects
***level 2 (pcp)
    var(1): .0414679 (.0103667)

b) With age as the only covariate, log Allop is estimated to increase by 0.032 per additional year of age in both the fixed effects and the random effects models. The unexplained variance at the PCP level is estimated to be 0.026. This variance is smaller than in a), probably because part of it is explained by the added age covariate.

Fixed effects:

logallop |     Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
---------+---------------------------------------------------------------
     age |  .0322313    .0012465    25.86   0.000    .0297878    .0346747
   _cons |  4.427768    .0560964    78.93   0.000    4.317807     4.53773

Random intercept:

logallop |     Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
---------+---------------------------------------------------------------
     age |  .0320034    .0012541    25.52   0.000    .0295454    .0344613
   _cons |  4.437345    .0573523    77.37   0.000    4.324936    4.549753

Variance at level 1: 1.918158 (.0290958)
Variances and covariances of random effects
***level 2 (pcp)
    var(1): .02617777 (.00833582)

c) With PCP specialty as the only covariate, log Allop is estimated to be 0.22 (fixed effects model) and 0.25 (random effects model) higher for patients whose PCP specializes in internal medicine than for those whose PCP is in family practice. The unexplained variance at the PCP level in the random effects model is estimated to be 0.029, which is smaller than the variance estimated in a). Again, this is probably due to the added Spec covariate.

Fixed effect:

logallop |     Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
---------+---------------------------------------------------------------
    spec |  .2249364    .0310367     7.25   0.000    .1640973    .2857754
   _cons |  5.738034    .0195835   293.00   0.000    5.699645    5.776422

Random effect:

logallop |     Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
---------+---------------------------------------------------------------
    spec |  .2480291    .0389789     6.36   0.000    .1716318    .3244265
   _cons |  5.721612    .0250865   228.08   0.000    5.672443    5.770781

Variance at level 1: 2.047152 (.03101318)
Variances and covariances of random effects
***level 2 (pcp)
    var(1): .02908714 (.00866454)

d) With both age and PCP specialty as covariates, log Allop is estimated to be 0.12 (fixed effects model) and 0.14 (random effects model) higher in the internal medicine group than in the family practice group. Both models estimate that log Allop increases by 0.031 per additional year of age. The unexplained variance at the PCP level in the random effects model is estimated to be 0.024.

Fixed effect:

logallop |     Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
---------+---------------------------------------------------------------
     age |   .031545    .0012571    25.09   0.000    .0290808    .0340092
    spec |   .121667    .0302862     4.02   0.000    .0622991    .1810349
   _cons |  4.409132    .0562409    78.40   0.000    4.298887    4.519377

Random effect:

logallop |     Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
---------+---------------------------------------------------------------
     age |   .031472    .0012598    24.98   0.000     .029003    .0339411
    spec |   .143023    .0372116     3.84   0.000    .0700896    .2159563
   _cons |  4.401555    .0579371    75.97   0.000       4.288     4.51511

Variance at level 1: 1.9165153 (.02903962)
Variances and covariances of random effects
***level 2 (pcp)
    var(1): .02377868 (.00782131)
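The coefficients in model d) are on the log scale, so exponentiating them gives multiplicative effects on the median Allop. A short sketch using the random intercept estimates for model d), assuming the log-linear model holds; the helper name pct_change is hypothetical.

```python
import math

# Random intercept model d) estimates (log Allop scale), from the STATA output.
beta_age = 0.031472    # per additional year of age
beta_spec = 0.143023   # internal medicine vs family practice


def pct_change(beta, units=1.0):
    """Percent change in median Allop for a `units` change in a covariate
    whose coefficient on the log scale is `beta`."""
    return 100.0 * (math.exp(beta * units) - 1.0)


print(f"age, +1 year:    {pct_change(beta_age):.1f}%")
print(f"age, +10 years:  {pct_change(beta_age, 10):.1f}%")
print(f"spec (IM vs FP): {pct_change(beta_spec):.1f}%")
```

Within a given PCP cluster, this corresponds to roughly a 3.2% higher median per year of age, about 37% per decade, and about 15% higher for patients of internal medicine PCPs.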

2. Proportion of patients with positive allowable outpatient charges as a function of age and PCP specialty in fixed effects, random intercept, and random slope models

a) I first fitted a fixed effects logistic regression model to estimate the log odds of having positive allowable outpatient charges as a function of age and PCP specialty, using both WinBUGS and STATA. Both programs produced similar regression coefficients and standard errors (in the WinBUGS output, alpha0 is the intercept, beta0 the coefficient on spec, and beta1 the coefficient on age). The log odds ratio of having positive allowable outpatient expenditure is estimated to be 0.12 between PCPs specializing in internal medicine and family practice, and 0.032 between persons aged t and t-1.

WinBUGS output:

node     mean      sd        MC error   2.5%      median    97.5%     start   sample
alpha0   -0.3277   0.07473   0.003346   -0.4783   -0.3269   -0.1791   1001    10000
beta0    0.1198    0.04391   5.458E-4   0.03359   0.1196    0.2045    1001    10000
beta1    0.03198   0.001783  7.952E-5   0.02841   0.03197   0.03555   1001    10000

STATA output:

  op_bin |     Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
---------+---------------------------------------------------------------
     age |  .0320178    .0017677    18.11   0.000    .0285531    .0354825
    spec |  .1191968    .0437575     2.72   0.006    .0334337    .2049599
   _cons | -.3292473    .0739045    -4.46   0.000   -.4740974   -.1843971

b) I then fitted a logistic regression model with a random intercept at the PCP level, using both WinBUGS and STATA. Given a specific PCP cluster, the log odds ratio of having positive allowable outpatient expenditure is estimated to be 0.14 between PCPs specializing in internal medicine and family practice, and 0.032 between persons aged t and t-1. The unexplained variance at the PCP level is estimated to be 0.033 by both WinBUGS and STATA.

WinBUGS:

node     mean      sd        MC error   2.5%      median    97.5%     start   sample
alpha0   -0.3366   0.07283   0.004855   -0.4879   -0.335    -0.2032   1001    4000
beta0    0.1392    0.04911   0.001408   0.04521   0.1391    0.2362    1001    4000
beta1    0.03208   0.00173   1.153E-4   0.02878   0.03203   0.03558   1001    4000
sigma2   0.03328   0.0136    0.0013     0.008793  0.0324    0.06384   1001    4000

STATA:

  op_bin |     Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
---------+---------------------------------------------------------------
     age |  .0320859    .0017827    18.00   0.000    .0285919    .0355799
    spec |   .140124    .0511835     2.74   0.006    .0398062    .2404418
   _cons | -.3374741     .076275    -4.42   0.000   -.4869704   -.1879777

Variances and covariances of random effects
***level 2 (pcp)
    var(1): .0334997 (.01245831)

c) Finally, I generated an indicator variable “ages” from the patients’ ages: patients older than 45 years were coded as 1 and all others as 0. I added a random slope on ages, along with the random intercept. WinBUGS and STATA produced similar estimates of the fixed coefficients (in this WinBUGS output, beta0 is the coefficient on ages and beta1 the coefficient on spec). Given a specific PCP cluster, the log odds ratio of having positive allowable outpatient expenditure is estimated to be 0.17 between PCPs specializing in internal medicine and family practice, and 0.58 (WinBUGS) or 0.59 (STATA) between persons older than 45 and those aged 45 or younger.

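The unexplained variance at the PCP level induced by the random intercept is estimated to be 0.026 and 0.032 by WinBUGS and STATA, respectively. The variance of the random slope is estimated to be 0.0042 by WinBUGS but only 2.6e-06 by STATA (standard error 0.00024), with an estimated intercept-slope correlation of essentially 1 (0.99997) in STATA, which suggests that the data carry little information about the slope variance.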

WinBUGS:

node     mean      sd        MC error   2.5%      median    97.5%     start   sample
V[1,1]   0.02614   0.01164   9.617E-4   0.009449  0.02466   0.0511    1001    10000
V[1,2]   0.002806  0.008447  8.286E-4   -0.01729  0.00363   0.02118   1001    10000
V[2,1]   0.002806  0.008447  8.286E-4   -0.01729  0.00363   0.02118   1001    10000
V[2,2]   0.004187  0.005505  5.341E-4   1.987E-4  0.00234   0.02147   1001    10000
alpha0   0.7289    0.03324   7.879E-4   0.6646    0.7287    0.795     1001    10000
beta0    0.5813    0.04325   7.016E-4   0.4963    0.5813    0.665     1001    10000
beta1    0.1675    0.04931   0.001176   0.0733    0.1674    0.2661    1001    10000

STATA:

  op_bin |     Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
---------+---------------------------------------------------------------
    ages |  .5927755    .0442118    13.41   0.000    .5061219    .6794291
    spec |  .1691844    .0510158     3.32   0.001    .0691954    .2691735
   _cons |  .7414851    .0338559    21.90   0.000    .6751289    .8078414

Variances and covariances of random effects
***level 2 (pcp)
    var(1): .03242069 (.01532337)
    cov(2,1): .00029059 (.01351884)   cor(2,1): .99996697
    var(2): 2.605e-06 (.00024311)
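To make the random slope model concrete, the fixed coefficients can be turned into conditional probabilities for a PCP whose random intercept and slope are both zero. A minimal sketch using the STATA estimates for model c); the function names are hypothetical, and b0i and b1i would have to come from predicted random effects, which this report does not show.

```python
import math


def ages_indicator(age):
    """Age indicator used in model c): 1 if age > 45, otherwise 0."""
    return 1 if age > 45 else 0


def inv_logit(eta):
    """Convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def prob_op_bin(age, spec, b0i=0.0, b1i=0.0):
    """Conditional P(op_bin = 1) under the random slope model, using the STATA
    estimates; b0i and b1i are a PCP's random intercept and random slope."""
    beta0, beta_ages, beta_spec = 0.7414851, 0.5927755, 0.1691844
    eta = beta0 + b0i + (beta_ages + b1i) * ages_indicator(age) + beta_spec * spec
    return inv_logit(eta)


for age in (30, 50):
    for spec in (0, 1):
        print(f"age={age}, spec={spec}: P = {prob_op_bin(age, spec):.3f}")
```

For example, a 50-year-old patient of an internal medicine PCP in such a cluster has an estimated probability of about 0.82 of having positive outpatient charges. A nonzero b1i shifts only the ages effect, which is what the random slope allows.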

3. Proportion of patients with positive specialist charges in fixed effects and random intercept models

a) First I estimated the proportion of patients with positive specialist charges without any covariates and without accounting for the PCP cluster effect. This proportion is estimated to be 0.245 by both WinBUGS and STATA.

WinBUGS:

node     mean      sd        MC error   2.5%      median    97.5%     start   sample
p        0.2451    0.003948  5.387E-5   0.2374    0.2451    0.253     1001    4000

STATA:

 op_spec |     Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
---------+---------------------------------------------------------------
   _cons | -1.124473    .0210496   -53.42   0.000    -1.16573   -1.083217

Pr(op_spec) |    Freq.   Percent     Cum.
------------+------------------------------
   .2451825 |   12,195    100.00   100.00
------------+------------------------------
      Total |   12,195    100.00

b) A random intercept was then added to the model to account for the PCP cluster effect. In this model, the average log odds of having a positive allowable specialist charge is estimated to be -1.167, so the probability of having a positive allowable specialist charge for a PCP cluster with a random intercept of zero is estimated to be 0.24. The unexplained variance induced by the random intercept is estimated to be 0.146 by WinBUGS and 0.145 by STATA.

WinBUGS:

node      mean      sd        MC error   2.5%      median    97.5%     start   sample
alpha0    -1.167    0.03295   0.001017   -1.234    -1.167    -1.102    1001    3000
pop.mean  0.2374    0.005964  1.841E-4   0.2256    0.2373    0.2494    1001    3000
sigma2    0.1464    0.02832   0.001727   0.09717   0.1441    0.2056    1001    3000

STATA:

  sp_bin |     Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
---------+---------------------------------------------------------------
   _cons | -1.167869    .0330721   -35.31   0.000   -1.232689   -1.103049

Variances and covariances of random effects
***level 2 (pcp)
    var(1): .14486629 (.02785468)

   phat4 |    Freq.   Percent     Cum.
---------+------------------------------
.2438479 |   12,195    100.00   100.00
---------+------------------------------
   Total |   12,195    100.00

c) A random intercept model was then fitted with the age and PCP specialty covariates.

Conditioning on a specific PCP cluster, the log odds ratio of having positive allowable specialist expenditure is estimated to be 0.39 between PCPs specializing in internal medicine and family practice, and 0.029 between persons aged t and t-1. The unexplained variance at the PCP level is estimated to be 0.094 by WinBUGS and 0.092 by STATA, which is smaller than the variance estimated in b). This is probably because some of the unexplained variance is explained by the added covariates.

WinBUGS:

node     mean      sd        MC error   2.5%      median    97.5%     start   sample
alpha0   -2.574    0.0963    0.004931   -2.751    -2.577    -2.381    1001    10000
beta0    0.3925    0.06189   0.001401   0.2707    0.3921    0.5172    1001    10000
beta1    0.02893   0.002002  1.019E-4   0.02491   0.02897   0.03275   1001    10000
sigma2   0.0942    0.02241   9.571E-4   0.05648   0.09208   0.1436    1001    10000

STATA:

  sp_bin |     Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
---------+---------------------------------------------------------------
     age |  .0290063    .0018979    15.28   0.000    .0252865     .032726
    spec |  .3938617    .0601857     6.54   0.000    .2758999    .5118236
   _cons | -2.577151    .0915153   -28.16   0.000   -2.756518   -2.397784

Variances and covariances of random effects
***level 2 (pcp)
    var(1): .09206369 (.02173319)
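The specialist-charge results can be cross-checked with two simple transformations: the logit of the sample proportion should reproduce the intercept of the fixed effects model in a), and the inverse logit of the intercept in b) gives the probability for a PCP cluster with a random intercept of zero. A minimal sketch using the STATA estimates; the odds ratios for c) are added for readability.

```python
import math


def logit(p):
    """Log odds of a probability p."""
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    """Probability corresponding to log odds eta."""
    return 1.0 / (1.0 + math.exp(-eta))


# a) Intercept-only fixed effects model: the MLE of the intercept is logit(p_hat).
p_hat = 0.2451825                      # STATA Pr(op_spec)
print(round(logit(p_hat), 4))          # compare with _cons = -1.124473

# b) Random intercept model: probability for a cluster with b0i = 0.
print(round(inv_logit(-1.167869), 4))  # compare with WinBUGS pop.mean = 0.2374

# c) Cluster-specific odds ratios from the model with covariates.
or_spec = math.exp(0.3938617)              # internal medicine vs family practice
or_age_decade = math.exp(10 * 0.0290063)   # 10 additional years of age
print(round(or_spec, 2), round(or_age_decade, 2))
```

Within a given PCP cluster, the odds of having positive specialist charges are about 1.48 times higher for patients of internal medicine PCPs and about 1.34 times higher per additional decade of age.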

Conclusion and Discussion

In this study, I analyzed hospital expenditure data from a dataset on 18-64-year-old male patients with MD principal care providers in 1996. Because the patients had different PCPs, it is reasonable to assume that there is variance across PCP clusters in addition to variance within each PCP cluster. Therefore, multilevel models were used in the analysis. Because the patients’ ages range from 18 to 64, and older people are likely to get sick more often than younger people, it is reasonable to include age as a covariate in the models. In addition, some PCPs specialized in internal medicine and others in family practice, so it also makes sense to include PCP specialty as another covariate.

I first used fixed effects and random intercept linear regression models to estimate the average log annual allowable outpatient expenditure among patients with positive expenditure. Without covariates, the average log Allop is estimated to be about 5.82 (5.827 by the fixed effects model and 5.823 by the random effects model), so the median allowable outpatient charge is estimated to be about $338 to $339. Although the fixed and random effects models gave similar estimates, their interpretations differ. The fixed effects model estimates the population average, whereas the random effects model assumes that the expected log expenditure differs across clusters; its intercept estimates the average of the cluster-specific expected log expenditures. In the random effects model, the unexplained variance induced by the random intercept at the PCP level is estimated to be 0.041. After adding the age and PCP specialty covariates, it is reduced to 0.024.

Then I assessed the log odds of having positive allowable outpatient expenditure as a function of age and PCP specialty using a fixed effects logistic regression model, a random intercept model, and a model with both a random intercept and a random slope. In the fixed effects model, the log odds ratio of having positive allowable outpatient expenditure is estimated to be 0.12 between PCPs specializing in internal medicine and family practice, and 0.032 between persons aged t and t-1. In the random intercept model, given a specific PCP cluster, the corresponding log odds ratios are estimated to be 0.14 for PCP specialty and 0.032 for one additional year of age. When adding the random slope, I created an age indicator variable to compare people older than 45 with those aged 45 or younger. Given a specific PCP cluster, the log odds ratio is then estimated to be 0.17 between PCPs specializing in internal medicine and family practice, and about 0.58 to 0.59 between persons older than 45 and those aged 45 or younger.
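Because these coefficients are log odds ratios, exponentiating them gives odds ratios, which are often easier to read. A short sketch using the STATA estimates for the op_bin models; in the random effects models these are cluster-specific (conditional) odds ratios, not population-averaged ones.

```python
import math

# Log odds ratios for op_bin from the STATA fits (Results 2 a, b, c).
log_or = {
    "fixed: spec": 0.1191968,
    "fixed: age (+1 yr)": 0.0320178,
    "random intercept: spec": 0.140124,
    "random intercept: age (+1 yr)": 0.0320859,
    "random slope: spec": 0.1691844,
    "random slope: ages (>45 vs <=45)": 0.5927755,
}

# exp(log odds ratio) = odds ratio.
odds_ratios = {name: math.exp(b) for name, b in log_or.items()}
for name, value in odds_ratios.items():
    print(f"{name:34s} OR = {value:.3f}")
```

For example, within a given PCP cluster the odds of having positive outpatient charges are about 1.8 times higher for patients older than 45 than for those aged 45 or younger.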

Finally, the log odds of having positive allowable specialist charges were also assessed by logistic regression. The probability of having specialist charges is estimated to be 0.2451 in the fixed effects model, whereas the random intercept model estimates the probability for a PCP cluster with a random intercept of zero at about 0.24. Using the random intercept model with the age and PCP specialty covariates, the log odds ratio of having specialist charges is 0.39 between PCPs specializing in internal medicine and family practice, and 0.029 between persons aged t and t-1. In logistic regression analysis, the fixed effects and random effects models should also be interpreted differently. The fixed effects model assumes there is no variance across clusters, so the intercept and the slopes on the covariates are the same for the whole population. In contrast, the random intercept model takes the cluster effect into account: it assumes that the intercept varies across clusters according to a specified distribution. The random slope model additionally assumes that the slope varies across clusters according to a specified distribution.

I also noticed that in the random intercept models, the unexplained variance induced by the random intercept became smaller when covariates were added. This is probably because part of the unexplained variance was accounted for by the added covariates.
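The size of these reductions can be quantified as the proportional drop in the PCP-level variance relative to the model without covariates. A minimal sketch using the STATA estimates; it also computes the share of the total variance of log Allop that lies at the PCP level in the intercept-only linear model (the intraclass correlation). In the logistic models the individual-level variance is fixed by the link, so adding covariates can also rescale the other parameters, and the sp_bin reduction is only a rough guide.

```python
# PCP-level (level 2) variance estimates from the STATA random intercept fits.
linear_var = {            # log Allop models, Results 1
    "a) no covariates": 0.0414679,
    "b) age": 0.02617777,
    "c) spec": 0.02908714,
    "d) age + spec": 0.02377868,
}
specialist_var = {        # sp_bin models, Results 3
    "b) no covariates": 0.14486629,
    "c) age + spec": 0.09206369,
}


def reduction(baseline, var):
    """Proportional reduction in level-2 variance relative to a baseline model."""
    return (baseline - var) / baseline


for name, var in linear_var.items():
    print(f"log Allop {name}: {reduction(linear_var['a) no covariates'], var):.1%}")

sp_red = reduction(specialist_var["b) no covariates"], specialist_var["c) age + spec"])
print(f"sp_bin c) vs b): {sp_red:.1%}")

# Share of the total variance of log Allop at the PCP level in model a),
# using the level-1 variance 2.0488351 from the same fit.
icc = 0.0414679 / (0.0414679 + 2.0488351)
print(f"ICC for log Allop model a): {icc:.3f}")
```

With these estimates, age and PCP specialty together account for roughly 43% of the PCP-level variance in log Allop, while only about 2% of the total variance of log Allop lies at the PCP level to begin with.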

WinBUGS and STATA generally produced similar, but not identical, results. This is probably because they use different approaches to obtain the estimates: WinBUGS uses Bayesian methods to obtain posterior distributions based on the priors, whereas STATA uses frequentist maximum likelihood estimation. In the random slope model, there is a large difference in the estimated variance of the random slope (0.0042 by WinBUGS versus 2.6e-06 by STATA). This might be due to the different estimation methods, or it might be because the prior distribution (omega) I set in my WinBUGS program was not the best choice.

Taken together, in this project I analyzed the medical expenditure dataset using random effects models, with both WinBUGS and STATA. The results showed significant effects of age and PCP specialty on the median annual allowable outpatient expenditure, as well as on the proportions of patients with positive allowable outpatient charges and with positive specialist charges.