Deriving Relative Risks from Aggregate Data. 1. Theory

J Epidemiol Community Health: first published as 10.1136/jech.42.4.333 on 1 December 1988. Downloaded from Journal of Epidemiology and Community Health, 1988, 42, 333-335 Deriving relative risks from aggregate data. 1. Theory THOR NORSTROM From the Swedish Institute for Social Research, University of Stockholm, S-10691 Stockholm, Sweden SUMMARY Sociological macro analyses ofthe association between risk factors and mortality can be seen as a valuable supplement to epidemiological micro studies. However, sociologists and epidemiologists typically employ different measures of association and this hampers strict comparisons of findings. This study presents a synthetic approach relying on both micro and data. In Part 1, the macro mathematical relations between the relative risk and the attributable fraction on the one hand, and the regression coefficient on the other are derived in order to make cross level comparisons possible. Part 2 provides an empirical illustration of the approach. guest. Protected by copyright. The traditional method in epidemiology for For some investigating the purposes, eg, public health planning, it is relationship between a suspected desirable to go beyond the relative risk and to estimate aetiological agent and a specific cause of death is the the overall importance of a risk factor for a specific observational study, either prospective or cause ofdeath. In this context the attributable fraction retrospective. A sociological approach which is (AF) is useful. There are several formulas for its becoming increasingly widespread is to estimate this calculation, but the following is often applied.' relationship on the basis of aggregate ecological or time series data. With these differences in approach there come also AF 0 ) differences in measures ofassociation 0 (RR-l (2) between risk factors and mortality, a state of affairs (RR-1)+1 that tends to obstruct the cumulation ofevidence. This where 0 is the proportion ofthe population to is a loss of research efficiency, because the findings the exposed from the two risk factor, and RR is the relative risk. The traditions actually supplement each attributable fraction the other. The present study outlines a expresses relative importance synthetic approach of the aetiological agent for the at in which both micro data (epidemiological) and macro mortality issue. Or put in another way: AF* 100 indicates by how many data (sociological) are relied upon, the aim being to cent integrate the per mortality is expected to decrease if the findings from the two. This requires an exposure were eliminated. understanding of how the micro and macro measures For those many situations where relate to each other, which is treated in 1 experimental http://jech.bmj.com/ Part of this studies are not feasible, the relative risk is based on study. In Part 2, the approach is empirically illustrated observational by micro studies, prospective or retrospective. and macro analyses of the association There are two main problems ofsuch The first between unemployment and designs. suicide. one is the difficulty in obtaining adequate exposure data. the Measures of Typically, exposure measure refers to one association in epidemiology and sociology single point of time prior to the follow up period. To keep track ofchanges in exposure requires designs that In the epidemiological literature, the common are expensive as well as time consuming (see on September 23, 2021 by measure of association between a risk factor and a Schlesselman2 for a more detailed discussion). The specific cause of death is the relative risk (RR). This second major problem is selection bias. This is found expresses how many times higher (or lower) the when correlates of the outcome variable (eg, mortality rate is among the exposed, as compared with mortality) are associated with the probability of being the unexposed: included in the study population.3 As an example, such bias may be present in studies of the association between and RR unemployment mortality, since Mortality rate in the exposed category unemployed persons as a group are probably less Mortality rate in the unexposed category healthy at the outset. 333 J Epidemiol Community Health: first published as 10.1136/jech.42.4.333 on 1 December 1988. Downloaded from 334 Thor Norstrom To substantiate the association between a potential In this example (table), the proportion of the risk factor and a specific cause of death, a larger population which is exposed to the risk factor is 0 10 at number of observational studies are typically relied time t = 1, and 0 11 at t = 2. The relative risk (RR) of upon. If the findings are consistent, one may have the exposed category is 150/50, or 3-0. As noted above more confidence in causal inferences. There is still the the regression coefficient (b) expresses the increase in disquieting possibility that all these estimates are the mortality rate associated with a one unit increase in plagued by a common source ofbias, eg, selection bias. the predictor. If we regard the risk factor (measured as One avenue out of this dilemma is to broaden the per cent exposed) as the predictor (X) in a regression empirical basis by including studies in which the analysis, there is a one unit (ie, one per cent point) potential bias is quite different. Such an opportunity is increase in X between t, and t2. Hence b is equal to the offered by time series studies of aggregate data. change in the mortality rate between t, and t2, ie, By the same token, aggregate studies of the 61-60= 1.0. relationships between potential risk factors and Linking with the example displayed in the table, the mortality would become more powerful if the relation between RR and b is derived as follows: researchers more systematically related their macro denote population size N, and the death intensity of findings to existing micro evidence. There is thus a the unexposed category PO. The change between t I and need for cross level comparisons. However, as t2 yields N/100 additional individuals in the exposed lamented by Rothman,3 an obstacle to the comparison category which produces an increment of the number of evidence from micro and macro inquiries is that of deaths shown by expression (4): different measures of association are used, and this RRPON PoN PON (RR- 1) guest. Protected by copyright. permits only crude judgements of consistency. The (4) next section describes a bridge between the micro and 100 100 100 macro measures. Expression (4) thus represents the number of deaths THE RELATION BETWEEN MICRO AND MACRO associated with a one unit increase in the predictor (X). MEASURES To obtain b, (4) is converted into the metric of the The common and preferable measure ofassociation in dependent variable (mortality per 100 000 of the time series analyses of aggregate data is the population), ie, it is multiplied by 100 OOO/N: unstandardised regression coefficient (b). Given the simple model: b PoN(RR - 1)100 000 PO(RR- 1)1000 b =lOO -(5) Yt = a + bXt + et (3) (The correction factor is equal to 1000 only when the b expresses the change expected in Y when there is a dependent variable is expressed as mortality per one unit change in X (controlling for other 100 000). Expression (5) is applied when we want to determinants included in e). It is seldom recognised obtain the b expected from given values of RR and PO. that there is a relation between the relative risk and the We can illustrate this relation by using the attributable fraction on the one hand, and the hypothetical data on RR and P0 in the table: regression coefficient on the other. b = 0-0005(3-1)1000 = 1-0 Since the formulas for these relations do not appear When working the other way round, ie, when we http://jech.bmj.com/ either in epidemiological or sociological text books on want the RR associated with an observed b, it is more methodology, their derivations are shown below, practicable to proceed in two steps, starting from an supplemented by a hypothetical example (to simplify, expression which yields the attributable fraction from we assume no selection effects). aggregate data. This is derived as follows. Denote the Table. Hypothetical example of mortality in and unexposed persons at t = I and t= 2 exposed on September 23, 2021 by Time t=1 t=2 Mortality rate Mortality, rate N % per 100 000 Deaths N % per 100 000 Deaths Exposed 100000 10 150 150 110000 II 150 165 Unexposed 900 000 90 50 450 890 000 89 50 445 Total I 000000 100 60 600 1 000000 100 61 610 J Epidemiol Community Health: first published as 10.1136/jech.42.4.333 on 1 December 1988. Downloaded from Deriving relative risks from aggregate data 335 risk factor X, and let b be the parameter expression the measured by the regression coefficient) should be due effect of X on Y (the mortality rate). The mortality to changes in exposure, and not to changes in self rate produced by X will then be equal to bX, and thus: selection. Analytically, this can be shown as follows. bX Consider the following micro model: AF== (6) Y Yit = Ci + bXit + eit (7) We may compute the AF for each unit of observation where Yi, is the response ofindividual i on the outcome (ie, each year if annual data are used), or for the variable at time t; C, is the unmeasured time invariant average ofa time period. From the figures in the table, attributes which are correlated with the outcome we get for t1: variable; and eit is a random error term. The potential bias in the estimation of b is due to the correlation 1*10 AF = = 0167 between Yit and X1t that may arise because of self 60 selection.

Deriving Relative Risks from Aggregate Data. 1. Theory

Design and Implementation of N-Of-1 Trials: a User's Guide

Generalized Linear Models for Aggregated Data

Meta-Analysis Using Individual Participant Data: One-Stage and Two-Stage Approaches, and Why They May Differ Danielle L

Learning to Personalize Medicine from Aggregate Data

Good Statistical Practices for Contemporary Meta-Analysis: Examples Based on a Systematic Review on COVID-19 in Pregnancy

Reconciling Household Surveys and National Accounts Data Using a Cross Entropy Estimation Method

Supervised Learning by Training on Aggregate Outputs

Census Aggregate Data Workshop – 17 February 2015 • Ukdataservice.Ac.Uk/News-And-Events/Events

Aggregation Effects in Generalized Linear Models: a Biochemical Engineering Application

Effects of Data Aggregation on Time Series Analysis of Seasonal Infections

A Critique of Structural Vars Using Real Business Cycle Theory∗

Comparing the Forecasting Performance of Var, Bvar and U-Midas