Case–Control Studies: Basic Concepts
Total Page:16
File Type:pdf, Size:1020Kb
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/ by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. Published by Oxford University Press on behalf of the International Epidemiological Association International Journal of Epidemiology 2012;41:1480–1489 ß The Author 2012; all rights reserved. doi:10.1093/ije/dys147 Case–control studies: basic concepts Jan P Vandenbroucke1* and Neil Pearce2,3 1Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands, 2Faculty of Epidemiology and Population Health, Departments of Medical Statistics and Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK and 3Centre for Public Health Research, Massey University, Wellington, New Zealand *Corresponding author. Department of Clinical Epidemiology, Leiden University Medical Center, PO Box 9600, 2300 RC Leiden, The Netherlands. E-mail: [email protected] Accepted 7 August 2012 Downloaded from The purpose of this article is to present in elementary mathematical and statistical terms a simple way to quickly and effectively teach and understand case–control studies, as they are commonly done in dy- namic populations—without using the rare disease assumption. Our focus is on case–control studies of disease incidence (‘incident case– http://ije.oxfordjournals.org/ control studies’); we will not consider the situation of case–control studies of prevalent disease, which are published much less frequently. Keywords Education corner, methods, case–control studies, teaching, rare dis- ease assumption, dynamic population at Universidad de Navarra on January 8, 2013 Introduction decades,2–19 and its history has been described.20 Still, Readers of the medical literature were once taught that the theory is not well known or well understood outside case–control studies are ‘cohort studies in reverse’, in professional epidemiological and statistical circles. which persons who developed disease during follow-up Introductory textbooks of epidemiology often fall back are compared with persons who did not. In addition, on methods of control sampling, which involve the ‘rare disease assumption’ as it was proposed by Cornfield in they were told that the odds ratio calculated from 3 1 case–control studies is an approximation of the risk 1951, because it seems easier to explain. Moreover, ratio or rate ratio, but only if the disease is ‘rare’ (say, several advanced textbooks or articles depict the differ- if <5% of the population develops disease). These no- ent ways of sampling cases and control subjects from tions are no longer compatible with present-day epide- the point of view of a cohort with fixed member- 13,18 miological theory of case–control studies which is based ship. This reinforces the view of case–control stu- on ‘density sampling’. Moreover, a recent survey found dies as constructed within a cohort, even though this that the large majority of case–control studies do not applies to only a small minority of published case–con- sample cases and control subjects from a cohort with trol studies. fixed membership; rather, they sample from dynamic The purpose of this article is to present in elemen- populations with variable membership.1 Of all case– tary mathematical and statistical terms a simple way control studies involving incident cases, 82% sampled to quickly and effectively teach and understand case– from a dynamic population; only 18% of studies control studies as they are commonly done in sampled from a cohort, and only some of these may dynamic populations––without using the rare disease need the ‘rare disease assumption’ (depending on how assumption. Our focus is on case–control studies of the control subjects were sampled). Thus, the ‘rare dis- disease incidence (‘incident case–control studies’); ease assumption’ is not needed for the large majority of we will not consider the situation of case–control stu- published case–control studies. In addition, different dies of prevalent disease, which are published much assumptions are needed for case–control studies in dy- less frequently,1 except in certain situations as dis- namic populations and those in cohorts to ensure that cussed by Pearce21 (e.g. for diseases such as asthma the odds ratios are estimates of ratios of incidence rates. in which it is difficult to identify incident cases). The underlying theory for case–control studies in dy- The theory of case–control studies in dynamic popu- namic populations has been developed in epidemiologi- lations cannot be explained before first going back to cal and statistical journals and textbooks over several the calculation of incidence rates and risks in 1480 CASE–CONTROL STUDIES 1481 dynamic populations. In a previous article, we have reviewed the demographic concepts that underpin these calculations.22 In the current article, these con- cepts will first be applied to case–control studies involving sampling from dynamic populations. Second, we discuss how to teach the theory in the situation of sampling from a cohort. In the third part, it is explained how these two distinct ways of sampling cases and control subjects can be unified conceptually in the proportional hazards model (Cox regression). Finally, we discuss the consequences of this way of teaching case–control studies for under- standing the assumptions behind these studies, and for appropriately designing studies. We propose that the explanation of case–control studies within dy- namic populations should become the basis for teach- Downloaded from ing case–control studies, in both introductory and more advanced courses. Case–control studies in dynamic http://ije.oxfordjournals.org/ populations Figure 1 The underlying dynamic ‘source’ population of a Basic teaching study of myocardial infarction (MI) and oral contraceptive use. The bold undulating lines show the fluctuating number To understand the application of the basic concepts of of users and non-users of oral contraceptives in a population incidence rate calculations to case–control studies, we that is in a steady state. The finer lines below it depict in- start with the demographic perspective of a dynamic dividuals who enter and leave the populations of users and population in which we calculate and compare inci- non-users. Closed circles indicate cases of MI emanating 22 from the population. For users and non-users separately, an dence rates of disease. at Universidad de Navarra on January 8, 2013 Suppose that investigators are interested in the incidence rate (IR) of MI can be calculated. The incidence effect of oral contraceptive use on the incidence of rate ratio (IRR) can be used to compare the incidence of MI myocardial infarction among women of reproductive between users and non-users. In the description of the ex- ample in the text, the time t was set to one calendar year. age. They might investigate this in a large town in a Figure adapted from Miettinen9 particular calendar year (we base this example loosely on one of the first case–control studies that investi- gated this association23). The time-population struc- ture of the study is depicted in Figure 1. live in the town. Suppose that, on average, 40 000 In Figure 1, for the sake of simplicity, imagine that, women use oral contraceptives and 80 000 do not. on average, 120 000 young women of reproductive age Again, these are two dynamic subpopulations that (between ages 15 and 45 years) who have never had can be regarded as being in a steady state. Women coronary heart disease (CHD), are living in the town, start and stop using oral contraceptives for various on each day during the calendar year of investigation. reasons and switch from use to non-use and back This is a dynamic population: each day, new young again. As such, in one calendar year, we have women will become 15 years old, others will turn 46, 40 000 woman-years of pill use and 80 000 woman- some will leave town and others will come to live in years of non-use, free of CHD. the town, some will develop CHD and be replaced by Suppose that a group of investigators surveys all cor- others who do not have the disease and so forth. Such onary care units in the town each week to identify all a population can be safely regarded as being ‘in women, aged 15–45 years, admitted with acute myocar- steady state’. The demographic principle of a steady- dial infarction during that period. When a young state population was explained in our previous art- woman is admitted, the investigators enquire whether icle;22 in brief, it assumes that over a small period, she was on the pill––and whether she had previously e.g. a calendar year, the number of people in a popu- had a coronary event (if she had, she is excluded from lation is approximately constant from day to day the study). Suppose that, in total, 12 women were because the population is constantly depleted and admitted for first myocardial infarction during the replenished at about the same rate. It was also year of study: eight pill users and four non-users. explained why this assumption holds, even if the That produces an incidence rate of 8/40 000 population is not perfectly in a steady state.22 Thus, woman-years among pill users and 4/80 000 woman- we take it that each day of the year, 120 000 women years among non-users. The ratio of these incidence of reproductive age, free of clinically recognized CHD, rates becomes (8/40 000 woman-years)/(4/80 000 1482 INTERNATIONAL JOURNAL OF EPIDEMIOLOGY woman-years), which is a rate ratio of 4, indicating that measure, nor to estimate, all the person-years of women on the pill have an incidence rate of myocardial pill-using and non-using women in town; we can infarction that is four times that of those not on the pill.