Attributable Risk Estimation Using SAS/STAT Software
Olaf Gefeller
Abteilung Medizinische Statistik, Georg-August-Universität Göttingen

Abstract

In epidemiologic studies devoted to the analysis of the relationship between an exposure characteristic and a disease, the attributable risk constitutes an interesting epidemiologic risk measure. It can be interpreted as the proportion of cases of disease due to the exposure under study among all cases of disease in the study population. Thus, it indicates the impact of the exposure factor on the disease load of the population and can aid public health administrators in determining promising targets for disease prevention strategies in the population. The estimation of attributable risk from data arranged in contingency tables has received much attention in the statistical literature during recent years. However, none of the leading statistical software packages, including SAS, has incorporated the attributable risk into the catalogue of measures of association that are calculated automatically by some procedure. This paper provides a brief introduction to the theory and application of attributable risk estimation from epidemiologic data. It then presents a general method for calculating attributable risk estimators under the multinomial sampling model using SAS/STAT* software. Data from the Luebeck Blood Pressure Study, a cross-sectional study in the field of cardiovascular disease epidemiology, are used to illustrate the practical application of the method.

Keywords: attributable risk, epidemiologic methods, measure of association

1. Introduction

The quantification and measurement of disease risk associated with some exposure characteristic constitute a major goal of epidemiologic research. However, different conceptual approaches to describing the association between exposure and disease have been put forward in the epidemiologic literature.
The most popular concept consists of applying the relative risk or, as its approximation in some special study designs, the odds ratio to this problem. Relative risk is defined as the relative increase (or decrease) of the disease probability among exposed subjects compared to unexposed subjects. It may be thought of as indicating the strength of the physiologic effects of the exposure on the disease under study. On the other hand, this measure does not take into account the proportion of individuals exposed to the risk factor in the population. It is therefore quite possible to identify a risk factor that has a high relative risk but is not an important public health problem, because very few individuals are exposed to it. Therefore, another measure has been developed to address the impact of an exposure on the disease burden of a population: the attributable risk. This paper provides a brief introduction to the theory and application of this concept and proposes a method to calculate estimators of attributable risk and their asymptotic variances using SAS/STAT* software.

2. Definition of Attributable Risk

Consider a population that is classified into groups according to a dichotomous disease variable D and some dichotomous exposure characteristic E. Let $P(D \mid E)$, $P(D \mid \bar{E})$ and $P(D)$ represent the incidence (or prevalence) of disease among the exposed, the unexposed, and the entire population, respectively. The attributable risk according to Levin (1953) is defined as

$$AR = \frac{P(D) - P(D \mid \bar{E})}{P(D)}.$$

This measure can be interpreted as the proportion of cases of disease due to the exposure among all cases of disease in the population or, in other words, as the percentage of cases of disease preventable by total elimination of the exposure in the population.
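The definition can be evaluated directly once the exposure prevalence and the two disease probabilities are known. The following Python sketch is only an illustration and not part of the SAS-based method of this paper; the function name `attributable_risk` and all input values are hypothetical. It uses the law of total probability to obtain P(D) and shows that a rare exposure with a relative risk of 10 can have a much smaller attributable risk than a common exposure with a relative risk of 2.

```python
def attributable_risk(p_exposed, risk_exposed, risk_unexposed):
    """Levin's attributable risk: (P(D) - P(D | not E)) / P(D).

    p_exposed      -- P(E), proportion of the population exposed
    risk_exposed   -- P(D | E), disease probability among the exposed
    risk_unexposed -- P(D | not E), disease probability among the unexposed
    """
    # Law of total probability: P(D) = P(E) P(D|E) + (1 - P(E)) P(D|not E)
    p_disease = p_exposed * risk_exposed + (1.0 - p_exposed) * risk_unexposed
    return (p_disease - risk_unexposed) / p_disease


# Hypothetical values, chosen only to illustrate the contrast.
rare_exposure = attributable_risk(0.001, 0.10, 0.01)    # relative risk 10
common_exposure = attributable_risk(0.30, 0.02, 0.01)   # relative risk 2

print(f"rare exposure,   RR = 10: AR = {rare_exposure:.4f}")
print(f"common exposure, RR = 2:  AR = {common_exposure:.4f}")
```

In this made-up example the exposure with the larger relative risk accounts for less than 1% of all cases, whereas the weaker but more common exposure accounts for roughly 23%.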
These interpretations assume a causal relationship between exposure and disease, i.e., the exposure has to play an essential role in producing an occurrence of the disease and must not be a mere correlate of the disease. The term 'attributable risk' is unwarranted if no cause-effect relation exists (Rothman, 1976).

In the simple 2 x 2 table situation the maximum likelihood estimator of AR and its asymptotic variance can be derived easily under all sampling models (Walter, 1976). In the particular situation of a 2 x 2 table from cross-sectional data the sampling model is represented by a multinomial distribution with four independent parameters (the sample size and three cell probabilities). The maximum likelihood estimator of the attributable risk, which is consistent and asymptotically normal, and its asymptotic variance can be easily derived as:

$$\widehat{AR} = \frac{ad - bc}{(a + c)(c + d)}$$

$$\widehat{\operatorname{Var}}\bigl(\widehat{AR}\bigr) = \frac{cN \bigl( ad\,(N - c) + bc^{2} \bigr)}{(a + c)^{3}\,(c + d)^{3}}$$

where a denotes the number of exposed cases, b the number of exposed non-cases, c the number of unexposed cases, d the number of unexposed non-cases, and N the total number of subjects.

3. Extension to the Multivariable Case

Typically in observational epidemiologic studies the exposure-disease relationship is affected by confounding and/or effect-modification due to other variables. The reduction to a simple 2 x 2 table of disease and exposure is often an unrealistic approach to estimating the potential exposure impact on the disease load in the population. As a consequence, the estimator derived from the simple 2 x 2 table cannot be interpreted in the way described above.
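As a numerical check on the 2 x 2 formulas of Section 2, the estimator and its asymptotic variance can be sketched in a few lines of Python. This is an illustration only and not the SAS-based method of this paper. The function names and the cell counts are hypothetical, and the Wald-type confidence interval is an assumption that relies on the asymptotic normality stated above.

```python
from math import sqrt
from statistics import NormalDist


def ar_2x2(a, b, c, d):
    """ML estimate of AR and its asymptotic variance for a cross-sectional
    2 x 2 table (a, b: exposed cases / non-cases; c, d: unexposed cases /
    non-cases). Requires a + c > 0 and c + d > 0."""
    n = a + b + c + d
    ar = (a * d - b * c) / ((a + c) * (c + d))
    var = c * n * (a * d * (n - c) + b * c**2) / ((a + c) ** 3 * (c + d) ** 3)
    return ar, var


def wald_interval(estimate, variance, level=0.95):
    """Normal-approximation interval (an assumption, see lead-in)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sqrt(variance)
    return estimate - half_width, estimate + half_width


# Hypothetical cell counts, not data from the Luebeck Blood Pressure Study.
ar, var = ar_2x2(a=30, b=70, c=20, d=180)
low, high = wald_interval(ar, var)
print(f"AR = {ar:.3f}, Var = {var:.5f}, 95% CI = ({low:.3f}, {high:.3f})")
```

For these counts, P(D) = 50/300 and the disease probability among the unexposed is 20/200, so the definition of AR gives 1 - 0.1/(1/6) = 0.4, which agrees with the closed-form estimator.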
The extension to the multivariable case to adjust for potential confounding and effect-modification leads to the situation of a 2 x 2 x K table, where the third dimension is represented by a stratum variable C with K levels, which could be one observed variable or a construct of the combination of two or more variables. The distributional assumption in the situation of an unrestricted sampling of subjects with post-sampling stratification is that of one multinomial distribution with 4K independent parameters (sample size and 4K - 1 cell probabilities). Two different models should be distinguished: (i) homogeneity model, (ii) interaction model. In (i) C acts as a pure confounding variable, i.e., the relative risks in the strata are all equal but different from the crude relative risk in the collapsed 2 x 2 table. In (ii) C acts as an effect-modifying variable, i.e., the relative risks differ between the strata.

Under both models the adjusted attributable risk to be estimated from the data of the 2 x 2 x K table can be expressed analogously to the 2 x 2 situation as:

$$AR_{\text{adjusted}} = \frac{P(D) - \sum_{i=1}^{K} P(C_i)\, P(D \mid \bar{E}, C_i)}{P(D)}$$

where $P(C_i)$ represents the proportion of the population in the i-th stratum of C, and $P(D \mid \bar{E}, C_i)$ denotes the stratum-specific disease probability among the unexposed in the i-th stratum of C.

Different strategies to adjust for confounding and effect-modification in the estimation of attributable risks have been proposed in the literature. These strategies can be categorized into three types. Type I estimators employ a weighting procedure of the stratum-specific attributable risk estimates (Ejigou, 1979, Walter, 1980, Whittemore, 1982). Type II estimators use the functional relationship of relative and attributable risk to adjust attributable risk via adjustment of the relative risk (Miettinen, 1974, Greenland, 1984, Bruzzi et al., 1985, Greenland, 1987, Kuritz & Landis, 1988a, 1988b).
The type III estimator adapts Miettinen's factorization idea in the context of relative risk adjustment (Miettinen, 1972) to attributable risk adjustment (Walter, 1976). A formal presentation and a detailed discussion of these estimators are provided elsewhere (Gefeller, 1991a).

A simulation study was conducted to investigate the finite-sample properties of some of these adjusted attributable risk estimators under the unrestricted multinomial sampling model of the cross-sectional study design (Gefeller, 1991b). The simulation study demonstrated that the maximum likelihood estimator, a special type I estimator resulting from the 'case load weighting' of the stratum-specific attributable risk estimates, is the best overall choice under this sampling model. It was practically unbiased in all situations. The performance of the other adjusted attributable risk estimators depended heavily on the underlying structure of the multinomial model.

4. Computational Realization using SAS/STAT Software

The estimation of attributable risks and their asymptotic variances from the data of 2 x 2 or 2 x 2 x K tables is not directly performed by any of the statistical analysis procedures of the SAS software (none of the leading statistical software packages has incorporated the attributable risk into the catalogue of measures of association that are calculated automatically by some procedure). Of course, a specific program using the matrix language SAS/IML* or the SAS command language could be written to do the calculations. However, in the situation of the multinomial sampling model there exists an easier way: 'adapting' a standard procedure of the SAS/STAT* software. As outlined in other papers (Gefeller & Woltering, 1991a, Gefeller & Woltering, 1991b), a general method of estimating measures of association and their asymptotic variances under the multinomial sampling model using PROC CATMOD can be employed.
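Independently of the SAS realization, the 'case load weighting' estimator referred to in Section 3 can be checked numerically. The Python sketch below is an illustration under stated assumptions and not the PROC CATMOD method of this paper; the function names and the stratum counts are hypothetical. It computes the adjusted attributable risk in two ways: as a weighted average of stratum-specific estimates with weights equal to each stratum's share of all cases, and by plugging observed proportions into the adjusted attributable risk formula of Section 3. Algebraically the two coincide.

```python
def adjusted_ar_case_load(strata):
    """Weighted average of stratum-specific AR estimates; the weight of a
    stratum is its share of all cases. Each stratum is a tuple (a, b, c, d)
    laid out as in the 2 x 2 table; every stratum needs a + c > 0 and
    c + d > 0."""
    total_cases = sum(a + c for a, b, c, d in strata)
    adjusted = 0.0
    for a, b, c, d in strata:
        n_i = a + b + c + d
        # Stratum-specific AR: 1 - P(D | not E, C_i) / P(D | C_i)
        ar_i = 1.0 - (c / (c + d)) * n_i / (a + c)
        adjusted += (a + c) / total_cases * ar_i
    return adjusted


def adjusted_ar_plug_in(strata):
    """Observed proportions inserted into the adjusted AR formula."""
    n = sum(a + b + c + d for a, b, c, d in strata)
    p_disease = sum(a + c for a, b, c, d in strata) / n
    # Sum over strata of P(C_i) * P(D | not E, C_i)
    unexposed_part = sum((a + b + c + d) / n * c / (c + d)
                         for a, b, c, d in strata)
    return (p_disease - unexposed_part) / p_disease


# Hypothetical 2 x 2 x 2 table (K = 2 strata), not data from the study.
strata = [(20, 30, 10, 90), (10, 40, 20, 80)]
weighted = adjusted_ar_case_load(strata)
plug_in = adjusted_ar_plug_in(strata)
print(f"case load weighting: {weighted:.4f}, plug-in: {plug_in:.4f}")
```

The sketch covers the point estimate only; the asymptotic variance of the adjusted estimator is not reproduced here.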
In the context of attributable risk estimation the PROC CATMOD-based method is presented here for

(a) the estimation of the attributable risk in 2 x 2 tables,
(b) the estimation of the adjusted attributable risk in 2 x 2 x K tables using the 'case load weighting' method (Walter, 1980, Whittemore, 1982).

(a) Estimation