ATTRIBUTABLE RISK ESTIMATION USING SAS/STAT SOFTWARE

Olaf Gefeller, Abteilung Medizinische Statistik, Georg-August-Universität Göttingen

Abstract

In epidemiologic studies devoted to the analysis of the relationship between an exposure characteristic and a disease, the attributable risk constitutes an interesting epidemiologic risk measure. It can be interpreted as the proportion of cases of disease due to the exposure under study among all cases of disease in the study population. Thus, it indicates the impact of the exposure factor on the disease load of the population and can aid public health administrators in determining promising targets for disease prevention strategies. The estimation of attributable risk from data arranged in contingency tables has received much attention in the statistical literature in recent years. However, none of the leading statistical software packages, including SAS, has incorporated the attributable risk into the catalogue of measures of association calculated automatically by some procedure. This paper provides a brief introduction to the theory and application of attributable risk estimation from epidemiologic data. It then presents a general method for calculating attributable risk estimators under the multinomial sampling model using SAS/STAT* software. Data from the Luebeck Blood Pressure Study, a cross-sectional study in the field of cardiovascular disease, are used to illustrate the practical application of the method.

Keywords: attributable risk, epidemiologic methods, measure of association

1. Introduction

The quantification and measurement of disease risk associated with some exposure characteristic constitute a major goal of epidemiologic research. However, different conceptual approaches to describing the association between exposure and disease have been put forward in the epidemiologic literature. The most popular concept consists of applying the relative risk or, as its approximation in some special study designs, the odds ratio to this problem. Relative risk is defined as the relative increase (or decrease) of the disease probability among exposed subjects compared to unexposed subjects. It may be thought of as indicating the strength of the physiologic effects of the exposure on the disease under study. On the other hand, this measure does not take into account the proportion of individuals exposed to the risk factor in the population, so it is quite possible to identify a risk factor that has a high relative risk but is not an important public health problem because very few individuals are exposed to it. Therefore, another measure has been developed to address the impact of an exposure on the disease burden of a population: the attributable risk. This paper provides a brief introduction to the theory and application of this concept and proposes a method to calculate estimators of attributable risk and their asymptotic variances using SAS/STAT* software.


2. Definition of Attributable Risk

Consider a population that is classified into groups according to a dichotomous disease variable D and a dichotomous exposure characteristic E. Let $P(D \mid E)$, $P(D \mid \bar{E})$ and $P(D)$ represent the probability (or prevalence) of disease among the exposed, the unexposed, and the entire population, respectively. The attributable risk according to Levin (1953) is defined as

$$AR = \frac{P(D) - P(D \mid \bar{E})}{P(D)}.$$

This measure can be interpreted as the proportion of cases of disease due to the exposure among all cases of disease in the population or, in other words, as the percentage of cases of disease preventable by total elimination of the exposure in the population. These interpretations assume a causal relationship between exposure and disease, i.e., the exposure has to play an essential role in producing an occurrence of the disease and must not be a mere correlate of the disease. The term 'attributable risk' is unwarranted if no cause-effect relation exists (Rothman, 1976).

In the simple 2 x 2 - table situation the maximum likelihood estimator of AR and its asymptotic variance can be derived easily under all sampling models (Walter, 1976). In the particular situation of a 2 x 2 - table from cross-sectional data the sampling model is represented by a multinomial distribution with four independent parameters (the sample size and three cell probabilities).
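The definition can also be written in terms of the exposure prevalence $P(E)$ and the relative risk $RR = P(D \mid E) / P(D \mid \bar{E})$, because $P(D) = P(D \mid \bar{E}) \cdot [1 + P(E)(RR - 1)]$. The following minimal Python sketch illustrates this relation; the function name and the numerical inputs are illustrative assumptions and are not taken from the study data. It shows how a strong but rare exposure can carry a small attributable risk, while a moderate but common exposure carries a large one.

```python
def attributable_risk(p_exposed: float, relative_risk: float) -> float:
    """Levin's attributable risk from exposure prevalence and relative risk.

    Follows from AR = (P(D) - P(D | not E)) / P(D) together with
    P(D) = P(D | not E) * (1 + P(E) * (RR - 1)).
    """
    if not 0.0 <= p_exposed <= 1.0:
        raise ValueError("p_exposed must lie in [0, 1]")
    if relative_risk < 0.0:
        raise ValueError("relative_risk must be non-negative")
    excess = p_exposed * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical inputs, chosen only for illustration.
rare_strong = attributable_risk(p_exposed=0.001, relative_risk=10.0)
common_moderate = attributable_risk(p_exposed=0.5, relative_risk=2.0)
print(f"rare exposure, RR = 10:  AR = {rare_strong:.4f}")
print(f"common exposure, RR = 2: AR = {common_moderate:.4f}")
```

With these made-up inputs the tenfold relative risk accounts for less than 1% of all cases, whereas the twofold relative risk accounts for one third of them.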

The maximum likelihood estimator of the attributable risk, which is consistent and asymptotically normal, and its asymptotic variance can be easily derived as:

$$\widehat{AR} = \frac{ad - bc}{(a + c)(c + d)}$$

$$\widehat{\mathrm{Var}}(\widehat{AR}) = \frac{cN \left( ad\,(N - c) + bc^{2} \right)}{(a + c)^{3} (c + d)^{3}}$$

where a denotes the number of exposed cases, b the number of exposed non-cases, c the number of unexposed cases, d the number of unexposed non-cases, and N the total number of subjects.
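As a quick check of the two formulas, they can be evaluated directly from the four cell counts. The sketch below is written in Python rather than SAS and uses a hypothetical helper name (`ar_2x2`); it is only meant to make the arithmetic explicit. Evaluated for the counts of Table 1 in section 5 (165 obese hypertensive, 553 obese non-hypertensive, 46 non-obese hypertensive and 301 non-obese non-hypertensive men), it should reproduce the values 0.331 and 0.00646 reported there.

```python
def ar_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Return (AR estimate, estimated asymptotic variance) for a 2 x 2 table.

    a: exposed cases,   b: exposed non-cases,
    c: unexposed cases, d: unexposed non-cases.
    """
    n = a + b + c + d
    cases = a + c        # all cases of disease
    unexposed = c + d    # all unexposed subjects
    if cases == 0 or unexposed == 0:
        raise ValueError("need at least one case and one unexposed subject")
    ar = (a * d - b * c) / (cases * unexposed)
    var = c * n * (a * d * (n - c) + b * c**2) / (cases**3 * unexposed**3)
    return ar, var


# Cell counts of the crude table from the Luebeck Blood Pressure Study.
ar, var = ar_2x2(a=165, b=553, c=46, d=301)
print(f"AR = {ar:.3f}, Var = {var:.5f}")
```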

3. Extension to the Multivariable Case

Typically in observational epidemiologic studies the exposure-disease relationship is affected by confounding and/or effect-modification due to other variables. The reduction to a simple 2 x 2 - table of disease and exposure often constitutes an unrealistic approach to estimating the potential exposure impact on the disease load in the population. As a consequence, the estimator derived from the simple 2 x 2 - table cannot be interpreted in the way described above. The extension to the multivariable case to adjust for potential confounding and effect-modification leads to the situation of a 2 x 2 x K - table, where the third dimension is represented by a stratum variable C with K levels, which could be one observed variable or a construct of the combination of two or more variables. The distributional assumption in the situation of an unrestricted sampling of subjects with post-sampling stratification is that of one multinomial distribution with 4K independent parameters (the sample size and 4K - 1 cell probabilities). Two different models should be distinguished: (i) the homogeneity model and (ii) the interaction model. In (i) C acts as a pure confounding variable, i.e., the relative risks in the strata are all equal but different from the crude relative risk in the collapsed 2 x 2 - table. In (ii) C acts as an effect-modifying variable, i.e., the relative risks differ between the strata.

Under both models the adjusted attributable risk to be estimated from the data of the 2 x 2 x K - table can be expressed analogously to the 2 x 2 - situation as:

$$AR_{adjusted} = \frac{P(D) - \sum_{i=1}^{K} P(C_i) \cdot P(D \mid \bar{E}, C_i)}{P(D)}$$

where $P(C_i)$ represents the proportion of the population in the i-th stratum of C, and $P(D \mid \bar{E}, C_i)$ denotes the stratum-specific disease probability among the unexposed in the i-th stratum of C.

Different strategies to adjust for confounding and effect-modification in the estimation of attributable risks have been proposed in the literature. These strategies can be categorized into three types. Type I estimators employ a weighting procedure of the stratum-specific attributable risk estimates (Ejigou, 1979, Walter, 1980, Whittemore, 1982). Type II estimators use the functional relationship of relative and attributable risk to adjust attributable risk via adjustment of the relative risk (Miettinen, 1974, Greenland, 1984, Bruzzi et al., 1985, Greenland, 1987, Kuritz & Landis, 1988a, 1988b). The type III estimator adapts Miettinen's factorization idea in the context of relative risk adjustment (Miettinen, 1972) to attributable risk adjustment (Walter, 1976). A formal presentation and a detailed discussion of these estimators are provided elsewhere (Gefeller, 1991a).

A simulation study was conducted to investigate the finite-sample properties of some of these adjusted attributable risk estimators under the unrestricted multinomial sampling model of the cross-sectional study design (Gefeller, 1991b). The simulation study demonstrated that the maximum likelihood estimator, a special type I estimator resulting from the 'case load weighting' of the stratum-specific attributable risk estimates, is the best overall choice under this sampling model. It was practically unbiased in all situations. Other adjusted attributable risk estimators depended heavily on the underlying structure of the multinomial model.
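The link between the adjusted attributable risk and 'case load weighting' can be made concrete with a small numerical sketch. The Python code below is an illustration under stated assumptions, not the program used in the paper: the function names and the four strata of counts are hypothetical. It computes the plug-in estimate of the adjusted attributable risk and, separately, the average of the stratum-specific estimates weighted by each stratum's share of all cases; algebraically the two quantities coincide.

```python
def adjusted_ar(strata):
    """Plug-in estimate of the adjusted AR from a 2 x 2 x K table.

    strata: list of (a, b, c, d) per stratum, with a = exposed cases,
    b = exposed non-cases, c = unexposed cases, d = unexposed non-cases.
    """
    n = sum(a + b + c + d for a, b, c, d in strata)
    p_d = sum(a + c for a, _, c, _ in strata) / n
    # Sum over strata of P(C_i) * P(D | unexposed, C_i).
    expected = sum(((a + b + c + d) / n) * (c / (c + d))
                   for a, b, c, d in strata)
    return (p_d - expected) / p_d


def case_load_weighted_ar(strata):
    """Stratum-specific AR estimates weighted by the share of cases."""
    total_cases = sum(a + c for a, _, c, _ in strata)
    result = 0.0
    for a, b, c, d in strata:
        ar_i = (a * d - b * c) / ((a + c) * (c + d))
        result += (a + c) / total_cases * ar_i
    return result


# Hypothetical counts for K = 4 strata (not data from the study).
hypothetical_strata = [(10, 90, 5, 95), (20, 80, 8, 72),
                       (40, 60, 12, 48), (60, 40, 15, 35)]
print(f"plug-in adjusted AR:   {adjusted_ar(hypothetical_strata):.4f}")
print(f"case load weighted AR: {case_load_weighted_ar(hypothetical_strata):.4f}")
```

Both functions require at least one unexposed subject in every stratum, and the weighted version additionally requires at least one case per stratum.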

4. Computational Realization using SAS/STAT Software

The estimation of attributable risks and their asymptotic variances from the data of 2 x 2 - or 2 x 2 x K - tables is not directly performed by any of the statistical analysis procedures of the SAS software (nor has any other leading statistical software package incorporated the attributable risk into its catalogue of automatically calculated measures of association). Of course, a specific program using the matrix language SAS/IML* or the SAS command language could be written to do the calculations. However, in the situation of the multinomial sampling model there exists an easier way: 'adapting' a standard procedure of the SAS/STAT* software. As outlined in other papers (Gefeller & Woltering, 1991a, Gefeller & Woltering, 1991b), a general method of estimating measures of association and their asymptotic variances under the multinomial sampling model using PROC CATMOD can be employed. In the context of attributable risk estimation this new method is presented here for

(a) the estimation of the attributable risk in 2 x 2 - tables,

(b) the estimation of the adjusted attributable risk in 2 x 2 x K - tables using the 'case load weighting' method (Walter, 1980, Whittemore, 1982).

(a) Estimation in 2 x 2 - tables

The simple SAS program realizing the general method in this particular situation is printed below. CONTFREQ denotes the variable containing the observed frequencies of the cells of the contingency table defined by the variables E (= exposure) and D (= disease).

    /* The RESPONSE operations are applied from right to left to the four
       cell probabilities. Assumed cell order: (exposed, case),
       (exposed, non-case), (unexposed, case), (unexposed, non-case). */
    proc catmod;
      weight CONTFREQ;
      model E * D = (1) / nodesign noprofile;
      response exp  1 -1
               log  1 -1  0  0  0,
                    1  0  1  1  1
               exp  1  0  0  1,
                    0  1  1  0,
                    1  0  1  0,
                    0  0  2  0,
                    0  0  1  1
               log
               / out = estimate(keep = _obs_ _seobs_
                                rename = (_obs_ = ar _seobs_ = stdar));
    run;

Running the program on an appropriate data set yields the estimator of the attributable risk and its asymptotic standard error in the output of PROC CATMOD under the heading 'Analysis of weighted-least-squares estimates' and in the data set 'estimate' as well.
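The chain of operations in the RESPONSE statement can be traced numerically outside SAS. The following Python sketch is an illustration, not part of the published program: it assumes that the operations are applied from right to left and that the cells are ordered (exposed, case), (exposed, non-case), (unexposed, case), (unexposed, non-case). The helper names are hypothetical. For the counts of the crude table in section 5 the chain returns the same value as the closed-form estimator (ad - bc) / ((a + c)(c + d)).

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def ar_via_response_chain(a, b, c, d):
    """Evaluate exp A3 log A2 exp A1 log applied to the cell proportions."""
    n = a + b + c + d
    p = [a / n, b / n, c / n, d / n]
    a1 = [[1, 0, 0, 1],       # log(p_a * p_d)
          [0, 1, 1, 0],       # log(p_b * p_c)
          [1, 0, 1, 0],       # log(p_a * p_c)
          [0, 0, 2, 0],       # log(p_c ** 2)
          [0, 0, 1, 1]]       # log(p_c * p_d)
    a2 = [[1, -1, 0, 0, 0],   # numerator:   p_a p_d - p_b p_c
          [1, 0, 1, 1, 1]]    # denominator: (p_a + p_c)(p_c + p_d)
    a3 = [[1, -1]]            # log numerator - log denominator
    step = [math.log(x) for x in p]
    step = [math.exp(x) for x in matvec(a1, step)]
    step = [math.log(x) for x in matvec(a2, step)]  # needs ad > bc, c > 0
    return math.exp(matvec(a3, step)[0])


ar_chain = ar_via_response_chain(165, 553, 46, 301)
ar_direct = (165 * 301 - 553 * 46) / ((165 + 46) * (46 + 301))
print(f"chain: {ar_chain:.6f}  direct: {ar_direct:.6f}")
```

Because the second logarithm is taken of ad - bc, the chain as written presupposes a positive association between exposure and disease and no empty cell.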

(b) Estimation in 2 x 2 x K - tables

In addition to the variables used in (a), the variable C (= confounder) denotes the stratifying factor. The RESPONSE statement of PROC CATMOD depends on the actual dimension of the contingency table specified by the value K. As an example, the program for the estimation from a 2 x 2 x 4 - table is presented below (this program corresponds to the practical example discussed in section 5).

    /* The RESPONSE operations are applied from right to left to the 16
       cell probabilities. Assumed cell order: exposed cases, exposed
       non-cases, unexposed cases, unexposed non-cases, each listed for
       strata 1 to 4. */
    proc catmod;
      weight CONTFREQ;
      model E * D * C = (1) / nodesign noprofile;
      response 1 -1  1 -1  1 -1  1 -1
               exp 0  1  0  0  0  0  0  0  0  0  0  0  0,
                   1  1 -1  0  0  0  0  0  0  0  0  0 -1,
                   0  0  0  0  1  0  0  0  0  0  0  0  0,
                   0  0  0  1  1 -1  0  0  0  0  0  0 -1,
                   0  0  0  0  0  0  0  1  0  0  0  0  0,
                   0  0  0  0  0  0  1  1 -1  0  0  0 -1,
                   0  0  0  0  0  0  0  0  0  0  1  0  0,
                   0  0  0  0  0  0  0  0  0  1  1 -1 -1
               log 0 0 0 0  0 0 0 0  1 0 0 0  0 0 0 0,
                   1 0 0 0  1 0 0 0  1 0 0 0  1 0 0 0,
                   0 0 0 0  0 0 0 0  1 0 0 0  1 0 0 0,
                   0 0 0 0  0 0 0 0  0 1 0 0  0 0 0 0,
                   0 1 0 0  0 1 0 0  0 1 0 0  0 1 0 0,
                   0 0 0 0  0 0 0 0  0 1 0 0  0 1 0 0,
                   0 0 0 0  0 0 0 0  0 0 1 0  0 0 0 0,
                   0 0 1 0  0 0 1 0  0 0 1 0  0 0 1 0,
                   0 0 0 0  0 0 0 0  0 0 1 0  0 0 1 0,
                   0 0 0 0  0 0 0 0  0 0 0 1  0 0 0 0,
                   0 0 0 1  0 0 0 1  0 0 0 1  0 0 0 1,
                   0 0 0 0  0 0 0 0  0 0 0 1  0 0 0 1,
                   1 1 1 1  0 0 0 0  1 1 1 1  0 0 0 0
               / out = estimate(keep = _obs_ _seobs_
                                rename = (_obs_ = ar _seobs_ = stdar));
    run;

Running the program on an appropriate data set yields the estimator of the attributable risk and its asymptotic standard error in the output of PROC CATMOD under the heading 'Analysis of weighted-least-squares estimates' and in the data set 'estimate' as well.
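Since the RESPONSE statement has to be rewritten for every value of K, it may help to see how its three matrices follow a fixed pattern. The Python sketch below is a hedged illustration with hypothetical function names and hypothetical counts: it builds the matrices for an arbitrary K, assuming the cell order exposed cases, exposed non-cases, unexposed cases, unexposed non-cases (each listed for strata 1 to K), and evaluates the chain from right to left. For K = 4 the generated matrices agree with those of the program above.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def build_matrices(k):
    """Return the inner, middle and outer matrices of the chain for K strata."""
    ncell = 4 * k
    inner = []
    for i in range(k):
        unexposed_cases = [0] * ncell
        unexposed_cases[2 * k + i] = 1
        stratum_total = [0] * ncell
        for block in range(4):
            stratum_total[block * k + i] = 1
        unexposed = [0] * ncell
        unexposed[2 * k + i] = 1
        unexposed[3 * k + i] = 1
        inner += [unexposed_cases, stratum_total, unexposed]
    inner.append([1] * k + [0] * k + [1] * k + [0] * k)  # all cases, P(D)

    ncol = 3 * k + 1
    middle = []
    for i in range(k):
        weight = [0] * ncol          # log P(C_i)
        weight[3 * i + 1] = 1
        term = [0] * ncol            # log of P(C_i) P(D | unexposed, C_i) / P(D)
        term[3 * i] = 1
        term[3 * i + 1] = 1
        term[3 * i + 2] = -1
        term[-1] = -1
        middle += [weight, term]
    outer = [[1, -1] * k]
    return inner, middle, outer


def adjusted_ar_via_chain(strata):
    """strata: list of (a, b, c, d) per stratum; every c must be positive."""
    n = sum(sum(s) for s in strata)
    p = [s[j] / n for j in range(4) for s in strata]  # a_1..a_K, b_1..b_K, ...
    inner, middle, outer = build_matrices(len(strata))
    step = [math.log(x) for x in matvec(inner, p)]
    step = [math.exp(x) for x in matvec(middle, step)]
    return matvec(outer, step)[0]


# Hypothetical counts for K = 4 strata (not data from the study).
hypothetical_strata = [(10, 90, 5, 95), (20, 80, 8, 72),
                       (40, 60, 12, 48), (60, 40, 15, 35)]
print(f"adjusted AR via chain: {adjusted_ar_via_chain(hypothetical_strata):.4f}")
```

The outer row of alternating 1 and -1 sums the stratum proportions (which add up to one) and subtracts the stratum terms, which yields the adjusted attributable risk.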

5. Practical Example: The Luebeck Blood Pressure Study

As a practical illustration of the methods discussed in this paper, data from the Luebeck Blood Pressure Study (LBS), a study in the field of cardiovascular disease epidemiology, are used. The LBS is a cross-sectional study of a systematic random sample of 3100 (2833 available) Luebeck citizens, aged 30 to 69 years, of whom 2359 took part in the study. Details of the study have been published elsewhere (Keil et al., 1986, Keil et al., 1988, Keil et al., 1989). The principal goal of the LBS was to provide data on the prevalence, awareness, treatment, and control of hypertension and on other cardiovascular risk factors. For the purpose of this illustration, data on the relationship between hypertension (the disease variable) and obesity (the exposure variable) in men are selected. Hypertension was defined according to the WHO criteria, and obesity was assessed using the Body Mass Index with a dichotomization cutpoint of 25 kg/m² as recommended by Bray (1978). The crude 2 x 2 - table is given in Table 1.

Table 1: Data of Luebeck men aged 30 - 69 yrs on the relationship between hypertension and obesity (LBS 1984)

                  hypertensive   not hypertensive   total
    obese              165              553           718
    not obese           46              301           347
    total              211              854          1065

    $\widehat{AR} = 0.331$, $\widehat{\mathrm{Var}}(\widehat{AR}) = 0.00646$

Using these data, Levin's attributable risk AR is computed as nearly one third, meaning that 33.1% of all cases of hypertension can be attributed to obesity. However, this value is epidemiologically meaningless unless we consider the potentially confounding or effect-modifying effects of other variables. In this situation age is known to be a relevant confounder. After stratifying by age (in ten-year groups), different adjustment methods (type I - III estimators) were employed. The computations were carried out using PROC CATMOD of the SAS/STAT* software as outlined in the preceding section. Table 2 summarizes the results of these adjustment methods.

Table 2: Adjusted attributable risks, their estimated asymptotic standard errors, and 95% - confidence intervals for the association between hypertension and obesity in 1065 Luebeck men aged 30 - 69 years

    Estimator                Point      Asymptotic       95% - CI
                             estimate   standard error   (logit version)
    Type I:
      - case load            0.317      0.085            (0.177, 0.500)
      - precision            0.318      0.083            (0.181, 0.496)
    Type II:
      - Mantel-Haenszel      0.318      0.083            (0.181, 0.496)
      - Tarone               0.318      0.083            (0.180, 0.496)
      - OR approximation     0.369      0.088            (0.217, 0.551)
    Type III:                0.326      0.097            (0.169, 0.535)

The point estimates are very similar, with the exception of the type II estimator based on the odds ratio approximation and the type III estimator. The former is much too high because the odds ratio constitutes a poor approximation to the relative risk in this situation (hypertension is not rare here: 211 of the 1065 men are hypertensive). The type III estimator showed poor properties in a variety of situations in the simulation study (Gefeller, 1991b); in this particular case its bias is only small.
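The confidence intervals in Table 2 are labelled as the logit version, but the formula is not spelled out in the text. A plausible reading, offered here only as an assumption, is the usual delta-method interval: the estimate is transformed to the logit scale, a symmetric interval with standard error SE / (AR (1 - AR)) is formed there, and the limits are transformed back. The Python sketch below (hypothetical function name) applies this to the rounded case load entries of Table 2 and should give limits close to the tabulated (0.177, 0.500).

```python
import math


def logit_ci(estimate: float, std_error: float, z: float = 1.96):
    """Delta-method confidence interval computed on the logit scale.

    Assumes the estimate lies strictly between 0 and 1; z = 1.96 gives
    an approximate 95% interval.
    """
    if not 0.0 < estimate < 1.0:
        raise ValueError("estimate must lie strictly between 0 and 1")
    logit = math.log(estimate / (1.0 - estimate))
    se_logit = std_error / (estimate * (1.0 - estimate))
    lower = logit - z * se_logit
    upper = logit + z * se_logit
    # Back-transform the limits to the probability scale.
    return 1.0 / (1.0 + math.exp(-lower)), 1.0 / (1.0 + math.exp(-upper))


# Rounded point estimate and standard error of the case load estimator.
lower, upper = logit_ci(0.317, 0.085)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

Unlike the symmetric interval AR ± 1.96 SE, an interval built this way always stays inside (0, 1) and is asymmetric around the point estimate.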

6. Discussion

The concept of attributable risk addresses the question of the public health impact of an exposure on a disease in a particular population under study. It should in no way be regarded as a substitute for the relative risk, but rather as an additional dimension of health hazard appraisal. The identification of an exposure characteristic with a high relative risk may yield important clues to the mechanism of disease development. But this would be of little interest to health administrators planning preventive strategies for the disease if the same exposure were found only rarely in the population. Therefore, the attributable risk should play an important role in guiding health administrators to a rational choice of disease prevention strategies.

Attributable risk estimation from the data of an observational study is highly dependent on the prevalence of other confounding and/or effect-modifying factors in the population. The multifactorial approach to adjust for confounding and effect-modification in the estimation of attributable risk is often more realistic than the reduction to a simple 2 x 2 - table of disease and exposure. Therefore, one should routinely incorporate these adjustment methods when estimating attributable risk. The resulting estimate may reveal an interesting feature of the association under study.

The paper has presented a general method of using SAS/STAT* software to estimate attributable risks and their asymptotic variances under the multinomial sampling model. The method can be applied to the estimation in 2 x 2 - tables and to summary attributable risk estimation from 2 x 2 x K - tables as well. The restriction of the computational method results from the distributional assumption implicitly employed when using PROC CATMOD. However, for all situations of the multinomial sampling model (e.g. given in cross-sectional and special cohort studies) the approach provides a flexible and convenient way of estimating attributable risks and their asymptotic variances.

References

Bray, G.A. (1978). Definition, measurement and classification of the syndromes of obesity. Int. J. Obesity 2, 99-112.
Bruzzi, P., Green, S.B., Byar, D.P., Brinton, L.A., Schairer, C. (1985). Estimating the population attributable risk for multiple risk factors using case-control data. Am. J. Epidemiol. 122, 904-914.
Ejigou, A. (1979). Estimation of attributable risk in the presence of confounding. Biom. J. 21, 155-165.
Gefeller, O. (1991a). Comparison of adjusted attributable risk estimators. Stat. Med. (in press).
Gefeller, O. (1991b). Summary attributable risk estimation in 2 x 2 x K - tables. In: MIE 91 Proceedings. Springer Verlag, Berlin (in press).
Gefeller, O. & Woltering, F. (1991a). A general method of estimating measures of association and their asymptotic variances under the multinomial model using standard SAS software. Comput. Statist. Data Analysis (submitted).
Gefeller, O. & Woltering, F. (1991b). How to use PROC CATMOD in estimation problems. In: SEUGI 91 Proceedings. SAS Institute, Heidelberg (in press).
Greenland, S. (1984). Bias in methods for deriving standardized morbidity ratio and attributable fraction estimates. Stat. Med. 3, 131-141.
Greenland, S. (1987). Variance estimators for attributable fraction estimates consistent in both large strata and sparse data. Stat. Med. 6, 701-708.
Keil, U., Remmers, A., Chambless, L., Hense, H.W., Stieber, J., Lauck, A. (1986). Epidemiologie des Bluthochdruckes. Häufigkeit, Verteilung, Bekanntheits- und Behandlungsgrad der Hypertonie in der Hansestadt Luebeck. MMW 128, 424-429.
Keil, U., Gefeller, O., Stieber, J. (1988). Rauchverhalten in Luebeck. Ergebnisse der Luebecker Blutdruckstudie. Fortschr. Med. 106, 563-567.
Keil, U., Chambless, W., Remmers, A. (1989). Alcohol and blood pressure: Results from the Luebeck Blood Pressure Study. Prev. Med. 18, 1-10.
Kuritz, S.J. & Landis, J.R. (1988a). Summary attributable risk estimation from unmatched case-control data. Stat. Med. 7, 507-517.
Kuritz, S.J. & Landis, J.R. (1988b). Attributable risk estimation from matched case-control data. Biometrics 44, 355-367.
Levin, M.L. (1953). The occurrence of lung cancer in man. Acta Unio Internationalis contra Cancrum 9, 531-541.
Miettinen, O.S. (1972). Components of the crude risk ratio. Am. J. Epidemiol. 96, 168-172.
Miettinen, O.S. (1974). Proportion of disease caused or prevented by a given exposure, trait or intervention. Am. J. Epidemiol. 99, 325-332.
Rothman, K.J. (1986). Modern epidemiology. Little, Brown & Co., Boston.
Walter, S.D. (1976). The estimation and interpretation of attributable risk in health research. Biometrics 32, 829-849.
Walter, S.D. (1980). Prevention for multifactorial diseases. Am. J. Epidemiol. 112, 409-416.
Whittemore, A.S. (1982). Statistical methods for estimating attributable risk from retrospective data. Stat. Med. 1, 229-243.

* SAS/STAT and SAS/IML are registered trademarks of SAS Institute Inc., Cary, NC, USA.
