RANDOM BYTES
Editor: Garrett Fitzmaurice, ScD

The Odds Ratio: Impact of Study Design

Garrett Fitzmaurice, ScD
From the Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, USA

Correspondence to: Garrett Fitzmaurice, ScD, Department of Biostatistics, Harvard School of Public Health, Room 423, 665 Huntington Avenue, Boston, MA 02115, USA. E-mail: fitzmaur@hsph.harvard.edu

In the previous column,1 we considered the "two-by-two" contingency table, a simple square array used to provide a tabular display of the relationship between two categorical variables, each having only two levels. Figure 1 shows such a table, where the rows correspond to nutritional status and the columns correspond to disease status. We showed that the relationship between the two variables can be described in terms of the odds ratio,

$$\mathrm{OR} = \frac{\Pr(\text{disease} \mid \text{malnourished}) \,/\, \Pr(\text{no disease} \mid \text{malnourished})}{\Pr(\text{disease} \mid \text{well nourished}) \,/\, \Pr(\text{no disease} \mid \text{well nourished})} = \frac{a \cdot d}{b \cdot c}$$

where OR denotes the odds ratio, Pr denotes probability, and a, b, c, and d are the cell counts of the table in Figure 1. In the previous column, we also showed that the odds ratio has many appealing properties that account for its widespread use in practice. First, the odds ratio can often be interpreted as an approximation to the relative risk (or risk ratio) of disease when the disease is rare, i.e., when its probability is small.2 Second, the odds ratio is invariant to reversals of the orientation of the two-by-two contingency table. That is, the odds ratio remains the same when the rows and columns of the table are interchanged, a property that is not shared by other measures of association, e.g., the relative risk. This latter property implies that, to estimate the odds ratio, it is not necessary to distinguish which of the two variables is considered to be the outcome and which is considered to be the predictor. In this column, we will see that a very appealing feature of the odds ratio is that it is equally valid regardless of whether the study design is prospective or retrospective. This property is not shared by other measures of association and has implications for the design of studies that examine the relationship between disease and a hypothesized risk factor (e.g., nutritional status).

STUDY DESIGN

To examine the relationship between disease and a hypothesized risk factor, e.g., nutritional status, we ordinarily must obtain data on both disease and nutritional status in a sample of individuals. There are two common designs for observational studies of the association between disease and a specific risk factor: the prospective study and the retrospective study. The prospective study, sometimes known as the cohort study, attempts to mimic a designed experiment. That is, in a prospective study individuals are selected into the study on the basis of their nutritional status. In the simplest case, the greatest power for detecting an effect of nutritional status on disease is obtained by choosing an equal number of malnourished and well-nourished individuals. The individuals in the study are then followed for a specified period to determine the development of disease in each of these two groups. Note that in a prospective study we can express the relationship between nutritional status and subsequent disease in terms of the odds ratio,

$$\mathrm{OR} = \frac{\Pr(\text{disease} \mid \text{malnourished}) \,/\, \Pr(\text{no disease} \mid \text{malnourished})}{\Pr(\text{disease} \mid \text{well nourished}) \,/\, \Pr(\text{no disease} \mid \text{well nourished})}$$

or in terms of the relative risk (RR),

$$\mathrm{RR} = \frac{\Pr(\text{disease} \mid \text{malnourished})}{\Pr(\text{disease} \mid \text{well nourished})}$$

Both of these measures of association attempt to describe the same phenomenon, namely whether nutritional status has any effect on the probability of disease.

The prospective design is generally the method of choice if the disease outcome can be observed relatively soon after the commencement of the study. However, in many instances, particular diseases may develop decades after initial exposure to specific risk factors. In such instances the prospective study would take decades to complete, making it very costly, if not entirely infeasible.

The retrospective study, also known as the case-control study, in some sense takes the opposite design approach. In a case-control study individuals are selected into the study on the basis of their disease status. Often an equal number of diseased individuals (called cases) and non-diseased individuals (called controls) are used. When the disease under study is relatively rare, the cases may include all diseased subjects in a clinic or registry. The controls, however, should be drawn from the same population. Cases and controls are then interviewed to determine their nutritional status; thus, data on nutritional status are typically obtained retrospectively, after the disease has been determined.

It should be intuitively clear that finding elevated rates of malnutrition among diseased cases compared with controls provides evidence for an association between disease and nutritional status. However, because of the sampling design, with data from a case-control study we can only estimate Pr(malnourished | disease status) and not Pr(disease | nutritional status). As a result, we cannot directly estimate the relative risk with data from a case-control study. The association between nutritional status and disease can nevertheless be expressed in terms of the odds ratio. Using Bayes' rule3 (a fundamental theorem that the reader may have encountered in an introductory statistics course) and a little algebra, it can be shown that the odds ratio can be defined not only in terms of Pr(disease | nutritional status) but also in terms of Pr(malnourished | disease status). That is,

$$\mathrm{OR} = \frac{\Pr(\text{malnourished} \mid \text{disease}) \,/\, \Pr(\text{well nourished} \mid \text{disease})}{\Pr(\text{malnourished} \mid \text{no disease}) \,/\, \Pr(\text{well nourished} \mid \text{no disease})}$$

Briefly, Bayes' rule gives Pr(disease | malnourished) = Pr(malnourished | disease) Pr(disease) / Pr(malnourished), and analogous expressions hold for the other three probabilities in the first definition of the odds ratio; after substitution, the marginal probabilities Pr(disease), Pr(no disease), Pr(malnourished), and Pr(well nourished) all cancel. As a result, the odds ratio can be estimated regardless of whether the study design is prospective or retrospective.

CONCLUSION

In summary, the odds ratio is often considered to be the measure of choice for quantifying the association between two dichotomous variables. Its widespread adoption is due, at least in part, to its unique mathematical properties. The odds ratio, unlike other measures of association, can be defined in terms of the conditional probabilities of either one of the two variables, given the other. As a result, the odds ratio can be estimated from either a prospective or a case-control (retrospective) design. Furthermore, this attractive property readily generalizes beyond the simple two-by-two table.
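
The central claim of this column can be checked numerically. Below is a minimal Python sketch, under stated assumptions, comparing a prospective sample and a case-control sample drawn from the same population. Everything in it is hypothetical and chosen for illustration: the source population (20,000 malnourished and 80,000 well-nourished people), the disease risks (2% and 1%), the sample sizes, and the helper names. The cell layout (a, b in the malnourished row; c, d in the well-nourished row; disease column first) is an assumption consistent with OR = (a·d)/(b·c). To avoid sampling noise, the sketch uses expected counts stored as exact fractions rather than random draws.

```python
from fractions import Fraction

# Hypothetical source population (illustrative numbers, not data):
# 20,000 malnourished and 80,000 well-nourished people with assumed
# disease risks of 2% and 1%. Keys are (nutritional status, disease status).
POP = {
    ("mal", "dis"): 400, ("mal", "no"): 19_600,
    ("well", "dis"): 800, ("well", "no"): 79_200,
}
ROWS, COLS = ("mal", "well"), ("dis", "no")


def prospective_table(pop, n_per_group):
    """Expected counts when n_per_group subjects are drawn from each
    nutritional-status group, i.e. the row totals are fixed by design."""
    table = {}
    for row in ROWS:
        row_total = sum(pop[(row, col)] for col in COLS)
        for col in COLS:
            table[(row, col)] = Fraction(n_per_group * pop[(row, col)], row_total)
    return table


def case_control_table(pop, n_per_status):
    """Expected counts when n_per_status cases and n_per_status controls
    are drawn, i.e. the column totals are fixed by design."""
    table = {}
    for col in COLS:
        col_total = sum(pop[(row, col)] for row in ROWS)
        for row in ROWS:
            table[(row, col)] = Fraction(n_per_status * pop[(row, col)], col_total)
    return table


def odds_ratio(t):
    """OR = (a*d)/(b*c): a, b form the malnourished row, c, d the
    well-nourished row, with the disease column first."""
    a, b = t[("mal", "dis")], t[("mal", "no")]
    c, d = t[("well", "dis")], t[("well", "no")]
    return (a * d) / (b * c)


def relative_risk(t):
    """RR = Pr(disease | malnourished) / Pr(disease | well nourished),
    computed directly from the row proportions of the table."""
    a, b = t[("mal", "dis")], t[("mal", "no")]
    c, d = t[("well", "dis")], t[("well", "no")]
    return (a / (a + b)) / (c / (c + d))


pro = prospective_table(POP, 10_000)  # equal malnourished / well-nourished groups
cc = case_control_table(POP, 300)     # equal numbers of cases and controls

print("OR, prospective: ", odds_ratio(pro))     # 99/49
print("OR, case-control:", odds_ratio(cc))      # 99/49
print("RR, prospective: ", relative_risk(pro))  # 2
print("RR, case-control (not valid):", relative_risk(cc))  # 272/197
```

Both odds ratios equal 99/49 (about 2.02), which is also the odds ratio in the hypothetical population. The relative risk is 2 in the prospective table, but the same row-proportion calculation applied to the case-control table gives about 1.38, because there the column totals were fixed by the investigator rather than by the population.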

More generally, when the risk factor is continuous (e.g., urinary nitrogen as a biomarker for protein intake) rather than dichotomous, and/or when there are additional variables to control for in the analysis, a logistic regression model4 can be used, with disease status treated as the outcome and the risk factor as the predictor, regardless of whether the design is prospective or retrospective. That is, case-control data can be treated as if they were prospective data in a logistic regression analysis to determine the odds ratio relating disease and exposure to the hypothesized risk factor.5

FIG. 1. Illustration of a two-by-two contingency table.

REFERENCES

1. Fitzmaurice G. Some aspects of interpretation of the odds ratio. Nutrition 2000;16:462
2. Cornfield J. A method of estimating comparative rates from clinical data: applications to cancer of the lung, breast, and cervix. J Natl Cancer Inst 1951;11:1269
3. Pagano M, Gauvreau K. Principles of biostatistics. Belmont, CA: Duxbury Press, 1993
4. Pagano M. Nutrition 1996;12:135
5. Prentice RL, Pyke R. Logistic disease incidence models and case-control studies. Biometrika 1979;66:403

Nutrition 16:1114–1115, 2000 (Volume 16, Numbers 11/12). 0899-9007/00/$20.00 ©Elsevier Science Inc., 2000. Printed in the United States. All rights reserved. PII S0899-9007(00)00437-8