
Concepts in Cancer Epidemiology and Etiology

PAGONA LAGIOU, DIMITRIOS TRICHOPOULOS, AND HANS-OLOV ADAMI

Epidemiology has been a powerful tool in the identification of causes of infectious diseases and the elucidation of the conditions underlying epidemic outbreaks that are frequently, but not always, of infectious etiology. Around the middle of the twentieth century, first in the United Kingdom (Doll and Hill, 1950) and later in the United States and the rest of the world (Wynder and Graham, 1950; Clemmesen and Nielsen, 1957; MacMahon, 1957), epidemiology expanded in scope by focusing also on the etiology of chronic diseases, irrespective of the nature of the causal agents. Since then, epidemiology has developed and matured to become a rich and powerful toolbox for the study of biologic phenomena in humans.

With a number of fine textbooks nowadays available to students of epidemiology (for instance Miettinen, 1985; Hennekens and Buring, 1987; Walker, 1991; MacMahon and Trichopoulos, 1996; Rothman and Greenland, 1998; Rothman, 2002; and several others), this chapter is not intended to expand on methods or quantitative considerations. For the purpose of better understanding the logic underlying cancer epidemiology, however, central concepts in epidemiology, the study of disease etiology, will be reviewed. We examine cohort and case-control studies (with special reference to studies of genetic epidemiology), we consider the impact of chance and systematic errors (confounding and bias), and we trace the process of causal reasoning. Familiarity with these concepts is essential for critical reading and understanding of the chapters on specific cancers. A glossary found at the end of the chapter provides a summary of definitions for words in italics.

ETIOLOGY

The definition of a cause should apply to all diseases, whether defined on the basis of a particular exposure, such as many infectious and occupational diseases, or documented by a constellation of clinical and/or laboratory findings, for example, malignant tumors, connective tissue disorders, or psychoses. In terms of a particular individual, exposure to a cause of a disease implies that the individual is now more likely to develop the disease, although there is no certainty that this will happen. The complexity of biological phenomena and our ignorance or limited understanding of many of the underlying processes hinder a deterministic, logically unassailable, explanation of disease causation. Hence, causation of disease can only be conceptualized in a probabilistic (stochastic) sense that involves statistical terms and procedures. For instance, while heavy smokers are much more likely to develop lung cancer than nonsmokers, most smokers never develop lung cancer and some nonsmokers do.

In epidemiology, there are several models of causality that have been applied to help clarify the role of various exposures in the etiology of disease. The causal pies presented by Rothman (1976) provide perhaps the most coherent approach to conceptualizing causality in a variety of epidemiologic settings (Rothman, 1986). Each of these pies describes a set of exposures that work together on the same pathway to cause disease (Fig. 6-1). Different exposures may occur within a short span, or may happen decades apart. Once every exposure in a causal pie has occurred, that is, once the pie is complete, disease is, in a deterministic context, inevitable. Table 6-1 provides a summary of the attributes of the causal pie model.

Causality is rarely, if ever, characterized by a simple one-to-one correspondence between a particular exposure and a specific disease. If so, the presence of the exposure would be both necessary and sufficient for the occurrence of the disease. By necessary we mean that the disease cannot occur without the presence of that exposure (although other exposures may be required for the occurrence of the disease). By sufficient we mean a set of exposures that inevitably produce disease. There may of course be different ways by which one could get disease, and thus sufficient causes may not be necessary.

In cancer epidemiology, the only known examples of exposures that are sufficient to cause disease refer to the genetic origin of some rare cancers due to dominant genes with complete penetrance. In this instance, the causal pie would require only one factor for the pie to be complete and this would be the way that carriers would get the specific cancer. Also rare is the existence of single factors that are in and by themselves sufficient (although not necessary) for the causation of a certain disease. Even powerful exogenous factors, such as life-long heavy smoking, and strong genetic influences, like those conveyed by dominant breast cancer genes, do not always cause disease in an individual.


Figure 6.1. The causal pie model describes a set of exposures that work together in the same pathway to cause disease. These are hypothesized ways in which a series of exposures could interact biologically over time to cause disease. This figure provides an example of sufficient causes from cancer epidemiology. Tobacco is an established component cause in many cases of oral cancer. However, tobacco use by itself is not enough for the disease to occur; in addition, oral cancer can occur among people who have never used tobacco. In a given causal pie, the complementary exposures can occur simultaneously, or many years apart. If even one of the component causes did not occur, disease would be prevented by this pathway, although a person could develop the disease by another mechanism (a different causal pie).
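The logic of the causal pie model can be made concrete with a small sketch. It treats each sufficient cause as a set of component causes and asks whether a person's exposures complete any pie. The first pie loosely follows the oral cancer example in the text; the second pie and all identifier names are hypothetical placeholders, and the deterministic rule is a simplification of the model, not a statement about real disease mechanisms.

```python
# Hypothetical sufficient causes ("causal pies") for a single disease.
# Each pie is a set of component causes. The second pie and all names
# are assumptions made for illustration only.
CAUSAL_PIES = [
    frozenset({"tobacco", "alcohol", "no_dental_visit"}),
    frozenset({"component_x", "component_y"}),
]


def completed_pies(exposures, pies=CAUSAL_PIES):
    """Return every pie whose component causes are all present."""
    present = set(exposures)
    return [pie for pie in pies if pie <= present]


def disease_occurs(exposures, pies=CAUSAL_PIES):
    """In the deterministic model, disease follows once any pie is complete."""
    return len(completed_pies(exposures, pies)) > 0


def necessary_causes(pies=CAUSAL_PIES):
    """A necessary cause is a component that appears in every pie."""
    if not pies:
        return set()
    return set(frozenset.intersection(*pies))


def is_sufficient_alone(exposure, pies=CAUSAL_PIES):
    """A single exposure is sufficient if it forms a complete pie by itself."""
    return frozenset({exposure}) in pies


# Tobacco and alcohol without the third component leave the pie incomplete.
print(disease_occurs({"tobacco", "alcohol"}))                      # False
# Completing the first pie produces disease.
print(disease_occurs({"tobacco", "alcohol", "no_dental_visit"}))   # True
# Disease can also arise through a different pie, without tobacco.
print(disease_occurs({"component_x", "component_y"}))              # True
# No component is shared by all pies, so none is necessary.
print(necessary_causes())                                          # set()
# Tobacco alone is not a sufficient cause.
print(is_sufficient_alone("tobacco"))                              # False
```

Blocking any one component of a pie prevents disease by that pathway only, which is why removing "tobacco" in this sketch would not protect someone who completes the second pie.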

Table 6-1. Attributes of the causal pie

ATTRIBUTE               DESCRIPTION
Inevitability           Completion of a sufficient cause (causal pie) is synonymous with eventual occurrence (though not necessarily diagnosis) of the disease.
Causality               A component cause (piece of a causal pie) can involve presence of a detrimental exposure or absence of a preventive exposure.
Burden of disease       The amount of disease caused by a sufficient cause depends on the prevalence of all complementary component causes.
Temporality             Component causes can act far apart in time.
Interaction             Component causes in the same pie interact biologically to cause disease.
Attributable fraction   Different component causes are responsible for more than 100 percent of disease cases.
Disease prevention      Blocking the action of any component cause prevents completion of the respective sufficient cause and therefore prevents disease by that pathway.

Source: Rothman, 1976.

Certain exposures are by definition necessary (although not sufficient) for the occurrence of a particular disease. For example, chronic lead disease cannot occur in the absence of lead exposure, and a motor vehicle injury requires the involvement of a motor vehicle (MacMahon et al, 1960; Hill, 1965; Rothman, 1976; Susser, 1991; MacMahon and Trichopoulos, 1996). Again, while these represent necessary causes, there are additional cofactors that must work in concert before disease is inevitable. Most human cancers can occur via several pathways, so it is hard to define any single necessary cause. Asbestos, in relation to mesothelioma (cancer of the pleura), and human papillomavirus infection, in relation to cervical squamous cell cancer, are close to being necessary. However, cases of these cancers do arise without the exposure being documentable, either because the exposure occurred but could not be identified, or because these exposures are not necessary for all cases.

For most diseases, there is no necessary cause. Indeed there may be numerous causal pies by which disease can occur. Such an example is illustrated in Figure 6-1, with suggested sufficient causes of oral cancer. In the first example, exposure to tobacco and alcohol over time are contributing factors (component causes) in the etiology of oral cancer. However, the oral cancer would not have occurred in the presence of a dental visit that could have treated precancerous lesions and might have prevented the disease. While smoking is a component cause in many causal pies for oral cancer, people can get oral cancer without smoking, as shown by the second causal pie in this figure.

Interventional Epidemiology

How do we design a scientific study to evaluate whether a particular exposure (for example, asbestos) is a cause of a specific disease (for example, lung cancer)? To understand the most appropriate design in practice, it is useful to begin by describing the ideal scientific study. Imagine for a moment that we have access to a time machine. In an imaginary study, we follow a group of individuals from birth to death, where everyone is exposed to asbestos, and we observe whether they develop lung cancer. We then send everyone back in a time machine, to live the exact same lives they lived, except that we completely remove asbestos from the environment so that no one is exposed. We then compare whether there are changes in the frequency of occurrence of lung cancer before and after use of the time machine. Since the same people live identical lives but for the presence/absence of asbestos, any difference in the frequency may be attributed to alterations in the exposure to asbestos, which leads to the definition of cause.

How then can we develop the time machine analogy into a realistic epidemiologic approach? We could study two groups of people who are comparable on every characteristic, except that one group had exposure and one did not. The randomized controlled trial closely approximates this goal. By randomly allocating who receives an exposure, for example treatment, and who does not, the exposure occurs only because the investigator has assigned it. For example, an investigator randomly assigns one group of people to receive vitamin E supplements (exposed), while the other group receives a placebo (unexposed). Study participants are then followed forward in time to see whether they develop cancer. Whether someone receives vitamin E then does not depend on whether or not the subject, for example, smokes, drinks, eats a high-fat diet, or has a certain genetic susceptibility.

In this way, the randomization in a trial makes the two (or more) groups, those exposed and those unexposed, comparable on other study factors that might cause the disease. Hence, the unexposed group is a proxy of what would have happened to the exposed group if they had been unexposed, that is, if we could have sent them back in the time machine. Comparability is essential in order to ascribe any changes in the frequency of disease to alterations in the exposure.

While some researchers describe the randomized controlled trial as the gold standard of scientific studies, this design is impractical in the majority of epidemiologic situations. For one thing, most exposures we study are detrimental. If we want to study the impact of asbestos on lung cancer, we cannot ethically randomize people to live in a house with asbestos. But even for exposures that are not necessarily detrimental, randomization may be difficult or impractical. For instance, it is very difficult and expensive to randomize a large group to eat a low-fat versus a normal diet, and have everyone comply with this allocation over the course of many years. Most trials are thus only conducted for no more than a few years, an unrealistically short period to test the effect of most exposures because of the long latency between exposure and diagnosis of cancer. Furthermore, in many randomized trials, subjects become noncompliant over time, that is, people allocated to the intervention arm stop taking the intervention, and those in the original placebo or usual care arm may adopt the intervention (a phenomenon called cross-over). This diminishes the contrast between the original randomized groups, reducing the power to detect a difference in disease rates between the groups.

Because of the limitations of the randomized controlled trial, the observational cohort and case-control designs are extensively utilized in epidemiology. As will be discussed later in the chapter, attention to both the design and analysis of these studies may allow us to approximate the standards of comparability necessary to validly evaluate the effect of an exposure on the frequency of a disease.

Observational Epidemiology

The essence of observational epidemiology is the noninterventional investigation of disease causation in human population groups. The argument is that only by studying humans is it possible to draw confident conclusions about normal or pathological processes concerning humans (MacMahon, 1979; MacMahon and Trichopoulos, 1996). In vitro studies, such as those involving cell cultures, and studies in laboratory animals are valuable. They are indeed indispensable when toxic exposures or invasive procedures like repeated biopsies are needed for the study of physiologic or pathologic processes, such as carcinogenesis. However, in vitro systems are frequently artificial, and there are physiological and metabolic differences between humans and laboratory animals that hinder interspecies analogies.

These analogies are further complicated by the unavoidably limited number of animals used in laboratory studies and the relatively short life span of these animals, both of which impose the administration of high doses of suspected agents in order to generate a sufficient number of outcomes. Consequently, questionable quantitative extrapolations to humans have to be undertaken.

Even when experimental studies, such as randomized controlled trials, in humans are ethical, they are, with few exceptions, impractical because most diseases are rare and their latent period, that is, the time between exposure to a cause and the appearance of a clinical disease, is long. This makes it necessary to enroll unrealistically large numbers of compliant volunteers for a very long period (Hennekens and Buring, 1987; MacMahon, 1979; MacMahon and Trichopoulos, 1996).

Observational, that is nonexperimental, studies represent the mainstream of modern epidemiology. Such studies seek to document causal relations on the basis of associations between particular exposures and cancer or other diseases. Inferring causation on the basis of association is easy when the association is both strong and biologically credible; smoking and lung cancer, or hepatitis B virus and liver cancer, are striking examples. It becomes more difficult when the association is biologically compelling but the epidemiologic experience weak, for example, in studies of low-level ionizing radiation and leukemia or passive smoking and lung cancer. Causal interpretation also becomes problematic when the epidemiologic association is fairly convincing but the biological rationale is uncertain, as it is with respect to red meat and colorectal cancer or alcohol and breast cancer. When an epidemiologic association is weak, is derived from a study with questionable quality, and floats in a biological vacuum, inferring causation is perilous.

STUDY DESIGN

Descriptive Studies

It is possible to distinguish observational epidemiological studies into descriptive and analytic. In descriptive studies the frequency of occurrence of a disease (incidence), or of death from a disease (mortality), is estimated in a population, by routinely available time, place, and/or group characteristics. Descriptive studies are essentially exploratory and hypothesis generating. For instance, descriptive studies that documented the increasing trend of lung cancer incidence among men, but not among women, in the early part of the twentieth century pointed to tobacco smoking as a likely cause of this disease. In contrast, the objective of analytic studies is to document causation from the pattern of association in individuals between one or more exposures on the one hand, and a particular disease on the other.

Ecologic Studies

Ecologic studies in epidemiology occupy an intermediate position between descriptive and analytic investigations, in that they share many characteristics with descriptive studies, but serve etiologic objectives. In ecologic studies, the exposure and the disease under investigation are ascertained not for individuals but for groups or even whole populations (Morgenstern, 1982). Thus the prevalence of hepatitis B virus (HBV) in several populations could be correlated with the incidence of liver cancer in these populations, even though no information could be obtained as to whether any particular individual in these populations was or was not an HBV carrier and has or has not developed liver cancer. Associations from ecologic studies are viewed with skepticism, because these studies are susceptible to unidentifiable and intractable confounding as well as to several other forms of bias (Morgenstern, 1982; Greenland and Robins, 1994).

When an exposure is fairly common, for example, smoking, or even prevalence of HBV carriers, ecologic studies can provide useful evidence on the possible effects of exposures. For instance, following the increase in tobacco consumption, the incidence of lung cancer increased sharply over time, and the incidence of primary liver cancer is higher in populations with higher prevalence of HBV.
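As a rough illustration of how an ecologic comparison works, the sketch below correlates group-level HBV prevalence with group-level liver cancer incidence. All numbers and population labels are invented for illustration; they are not data from any study. The key point is that only one pair of values exists per population, so nothing can be said about whether the individuals who carry HBV are the ones who develop cancer.

```python
from math import sqrt

# Invented population-level figures, for illustration only:
# (HBV carrier prevalence in percent,
#  liver cancer incidence per 100,000 person-years)
POPULATIONS = {
    "A": (0.5, 2.0),
    "B": (2.0, 5.0),
    "C": (5.0, 11.0),
    "D": (8.0, 18.0),
    "E": (12.0, 25.0),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# One observation per population: exposure and disease are both aggregates.
prevalence = [p for p, _ in POPULATIONS.values()]
incidence = [i for _, i in POPULATIONS.values()]

r = pearson_r(prevalence, incidence)
print(f"ecologic correlation across {len(POPULATIONS)} populations: r = {r:.2f}")
```

A strong correlation of this kind is compatible with a causal effect, but it is equally compatible with confounding by any characteristic that differs between the populations, which is the reason for the skepticism described above.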

As a corollary, lack of an association in ecologic studies between a widespread exposure that has rapidly increased over time and the incidence of a disease allegedly caused by this exposure does not support a strong causal relation.

Analytic Studies

Analytic epidemiologic investigations ascertain exposure and disease outcome in individuals and are usually distinguished into cohort and case-control studies, although there are also several variants of these prototype designs (MacMahon and Trichopoulos, 1996; Rothman and Greenland, 1998). The objective of analytic epidemiologic studies is to ascertain whether a particular exposure, such as a physical, chemical, or biological agent, and a specific cancer or other disease are unrelated (independent) or associated. An association does not necessarily indicate causation. Chance, bias, and confounding (see following) can also generate associations, and they frequently do. Causation is unlikely when there is no association observed. Even if a causal relation does exist, however, it may sometimes be difficult to document it, particularly when the association is weak, the study has limited statistical power, or the exposure is seriously misclassified.

Person-time and Study Base

The concepts of person-time and study base are fundamental to the design and analysis of epidemiologic studies. As the name implies, there are two key components in our description of the person-time, namely the number of people and the time they are followed. To illustrate this, we could ask how many brain cancer cases we would expect if we followed one million people exposed to x-rays for zero seconds. Conversely, how many cases would we expect if we followed zero people for one million years? The answer in both instances is, of course, zero. Hence, neither people nor time alone provides adequate information about the disease experience of a population, and thus both should be taken into account. Person-time is the sum of all the time contributed in a study by subjects at risk of a disease.

Theoretically, an ambitious investigator might wish to include the entire world population in an epidemiologic study during many decades. Needless to say, such a study would provide marvelous opportunities to evaluate many different exposures in relation to many diseases. Millions of person-years would be generated even within a very short time. In real life, however, any investigator has to restrict the person-time from which information is harvested. This specified person-time is called the study base. Defining the person-time to be included in the study base may include geographic restrictions, defined time periods, and certain age limits. Personal characteristics such as gender, ethnicity, and occupation may further specify the study base. For example, the study base may comprise all British doctors who answered a questionnaire in 1951 (Doll and Hill, 1956), or all Swedish women who were aged 50 to 74 between 1994 and 1995 (Weiderpass et al, 1999), and who generated person-time until they died or until the follow-up was completed.

Thus, the study base is simply the person-time of a population of individuals at risk of a disease under study. Defining the study base is a crucial step in the design and conduct of an epidemiologic study. There are three central considerations. One is to accommodate realistic goals with regard to feasibility and resources, as certainly no investigator is independent of time and money. A second goal is to make the study efficient. For example, it would make little sense to study the association between smoking and cancer in a population where very few are smokers. Likewise, a study of diet and prostate cancer would be inefficient among men younger than 40, since virtually no cases arise among such young people. The final challenge is to identify a study base that allows valid inferences concerning associations between exposure(s) and a particular disease, that is, a study base that does not impose intractable confounding or raise insurmountable obstacles of other biases.

Person-time is the source of whatever we want to investigate, for example the occurrence of cancer. To help set a foundation for better understanding person-time in a study base, we will use the example of x-rays and risk of brain cancer (Fig. 6-2). In this population, five people have been exposed to x-rays, and another five have not been exposed and remain unexposed during the study period. While in real life the study populations are much larger, we use this elementary example to illustrate the principles.

Among the people exposed to x-rays, persons 3 and 5 were followed from the time they were exposed to x-rays till the end of the study period, a total of 5 years each. Persons 1, 2, and 4, however, developed brain cancer at the end of years 1, 4, and 2, respectively. Once these individuals develop brain cancer, they are no longer at risk of the disease, and thus no longer contribute information to the study base. The person-time among those exposed to x-rays is estimated by summing up the person-time of all the individuals while at risk for the disease, that is:

(2 persons x 5 years) + (1 person x 1 year) + (1 person x 4 years) + (1 person x 2 years) = 17 person-years

We can similarly sum up the person-time among the group of five individuals who were not exposed to x-rays:

(4 persons x 5 years) + (1 person x 2 years) = 22 person-years

Later on when we discuss analysis of epidemiologic studies, we will see how the person-time data will help us to compare disease incidence between exposed and unexposed people.

Figure 6.2. Experience of a theoretical study population over time. Five individuals who were exposed to x-rays and five individuals who are unexposed are followed over time to see if they develop brain cancer. Among those exposed to x-rays, persons 3 and 5 are followed until the end of the study period, which in this case was five years. Person 1 develops brain cancer after 1 year, person 2 develops brain cancer after 4 years, while person 4 develops cancer after 2 years. Persons 1, 2 and 4 stop contributing person-time after they develop brain cancer, since they are no longer at risk for the disease. The total person-time in the exposed group is 17.0 person-years. Similarly, we can look at the population of people unexposed to x-rays over time. Of the five people who are unexposed, only one develops brain cancer during the study period (person 9). The remaining people are followed completely for five years. The experience of these ten individuals over time, that is the person-time that the subjects contributed, is the study base.
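The person-time sums in this example can be reproduced with a few lines of code. The sketch below records, for each of the ten people in Figure 6.2, the years spent at risk and whether brain cancer developed; the dictionary layout and function names are illustrative choices, and the two-year follow-up for person 9 is inferred from the 22 person-year total given in the text. Dividing cases by person-time to obtain an incidence rate is added here only as a preview of how such data are typically used.

```python
# Follow-up taken from the worked example: (years at risk, developed cancer).
# The data layout and names are illustrative assumptions.
FOLLOW_UP = {
    "exposed": {
        1: (1, True),
        2: (4, True),
        3: (5, False),
        4: (2, True),
        5: (5, False),
    },
    "unexposed": {
        6: (5, False),
        7: (5, False),
        8: (5, False),
        9: (2, True),   # the only unexposed case; 2 years inferred from the total
        10: (5, False),
    },
}


def person_years(group):
    """Sum the time each person spent at risk of the disease."""
    return sum(years for years, _ in group.values())


def case_count(group):
    """Count the people who developed the disease during follow-up."""
    return sum(1 for _, is_case in group.values() if is_case)


def incidence_rate(group):
    """Cases divided by the person-time that generated them."""
    return case_count(group) / person_years(group)


for name, group in FOLLOW_UP.items():
    print(
        f"{name}: {case_count(group)} case(s) in "
        f"{person_years(group)} person-years, "
        f"rate = {incidence_rate(group):.3f} per person-year"
    )
```

Note that a person stops contributing person-time at diagnosis, which is why the exposed group, with more cases, accumulates fewer person-years than the unexposed group even though both start with five people.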

Cohort Studies

The word cohort derives from the similar Latin word, which identified one of the ten divisions in a Roman legion. In epidemiology, cohorts are groups of individuals, which can be followed over time. In cohort studies, individuals are classified according to their exposure and are observed for ascertainment of the frequency of disease occurrence or death in the various exposure-defined categories (Fig. 6-3A). In each category the frequency of occurrence is calculated either as risk or as incidence rate. Risk describes the proportion of those who developed the disease under study among all individuals in this category. Rate describes the number of those who developed the disease divided by the person-time during which the individuals in this category have been under observation. Cohort studies have the following defining characteristics.

Cohort studies are exposure-based. The groups to be studied are selected on the basis of exposure. In special exposure cohorts, the groups are chosen on the basis of a particular exposure. In general population cohorts, groups offering logistical advantages for follow-up are initially chosen and the individuals are classified according to their exposure status. Special exposure cohorts may be necessary when rare exposures need to be studied, such as those encountered in the occupational setting. For example, to study efficiently the effect of vinyl chloride on liver angiosarcoma, or aromatic amines on bladder cancer, epidemiologic studies have been conducted in cohorts of workers in the plastic and dyestuff manufacturing industries, respectively.

The general population cohort is appropriate when the exposure under consideration is fairly common. Classical examples of general population cohorts, in which the profession facilitated accessibility of cohort members rather than being a study factor, include the British Doctors Study and the Nurses Health Study. The British Doctors cohort, established in 1951, consisted of more than 30 000 doctors from Great Britain. In this landmark study, Doll and colleagues prospectively followed the cohort and collected updated information on multiple exposures, particularly smoking, over several decades. Indeed, prospective data from the British doctors were among the first to demonstrate convincingly the role of tobacco in the etiology of lung cancer (Doll and Hill, 1956). More than four decades later, data from the British Doctors have continued to provide insight into the etiology of cancer (Doll et al, 2005).

Another notable cohort is the Nurses Health Study, which began in 1976 with over 120 000 US registered nurses. This cohort was assembled initially to evaluate prospectively the effect of oral contraceptives on the risk of breast cancer (Hennekens et al, 1984). Subsequently, diet and many other exposures have been studied in relation to the risk of cancer as well as other chronic conditions (Zhang et al, 2005). Information on these diverse exposures has been collected biennially through questionnaires. Moreover, blood samples have allowed researchers to explore biomarkers and genetic factors. For example, prospective data from the Nurses Health Study have provided insight into the role of both exogenous and endogenous estrogens in breast cancer etiology. A particular characteristic of these types of cohorts is that the individuals can be followed almost completely over time, due to their membership in groups with a high interest in health studies and registration requirements that facilitate initial contact and long-term follow-up.

Cohort studies are patently or conceptually longitudinal. The study groups are observed over a period of time to determine the frequency of disease occurrence among them. The distinction between retrospective and prospective cohort studies depends on whether the cases of disease occurred in the cohort at the time the study began. In a retrospective cohort study, exposures and health outcomes occurred before the investigation started. These are typically assembled from pre-existing records of a

population over time; for example, the employment histories of a factory can be linked to recorded health-outcome information of the workers. In a prospective cohort study, the relevant causes may or may not have acted and the cases of disease certainly have not yet occurred. Hence, following identification of the study cohort, the investigator must wait for the disease to appear among cohort members.

[Figure 6.3A: schematic of a cohort followed over time, marking exposed cases (x1) and non-exposed cases (x0) against exposed person-time (P1) and non-exposed person-time (P0). Incidence rate ratio: $(x_1/P_1)/(x_0/P_0)$.]

Figure 6.3A. A cohort study comprises individuals who are either exposed or unexposed to the factor(s) of interest. When these people are followed over time, they generate person-time. Newly diagnosed cases of a particular disease that occur while person-time is accumulated are recorded. The exposure status of a person can change. A person could be smoking high tar cigarettes for five years, then switch to light cigarettes for fifteen years, and then quit. Consequently, each person can contribute to person-time in different exposure groups. A case is considered exposed if the disease occurred when the person who developed the disease was accumulating exposed person-time. A case is non-exposed if it occurred while the person was accumulating non-exposed person-time. The example assumes, for simplicity, zero latency. The total amount of exposed and non-exposed person-time and the number of exposed and non-exposed cases can be calculated. After that, one can determine whether more cases occurred in the exposed or non-exposed group per unit of person-time, that is, one can calculate the incidence rate ratio. This ratio will indicate whether there is a relationship between the exposure and the disease of interest.

Methodologically, there are two types of cohort studies: closed or fixed cohorts, and open or dynamic cohorts. Closed cohorts are frequent in occupational epidemiology and the study of outbreaks, whereas open cohorts dominate cancer epidemiology and form the conceptual basis for most case-control studies. The key distinction between open and closed cohorts is how membership in the cohort is determined. In a closed cohort, it is determined by a membership-defining event that occurs at a point in time. For example, people who were living in Hiroshima and Nagasaki when the atomic bombs were dropped in 1945 are part of a cohort whose membership began on the date of the bombing. These subjects remain in the cohort until they die.

Open cohorts are composed of individuals who contribute person-time to the cohort only as long as they meet the criteria for a membership-defining state (Fig. 6-3A). Examples of such criteria include place of residence, age, and health status. Once individuals can no longer be characterized by the defining state(s), they cease to contribute person-time to the open cohort and are no longer members. Open cohorts are

used, for example, in cancer epidemiology studies based on registry data (Hansson et al, 1996). A person could be a member of the cohort, for example, only as long as he or she was a resident of Sweden and was not diagnosed with the cancer under study. If the person emigrated from Sweden to another country, he or she stopped contributing person-time to the cohort at that time. Similarly, if someone born outside of Sweden immigrates there later in life, he or she would begin contributing person-time to the cohort at that time. In studies based on open cohorts it is not possible to directly measure risk, otherwise referred to as cumulative incidence. Analyses are based on person-time using incidence rate measures.

As an example, assume that in a closed cohort among 5000 nonsmoking men followed for an average period of 10 years ($P_0$ = 50 000 person-years), $x_0$ = 25 were diagnosed with lung cancer, and among 10 000 smoking men followed for an average period of 8 years ($P_1$ = 80 000 person-years), $x_1$ = 600 were diagnosed with lung cancer. In this example the incidence rate among nonexposed would then be 50 per $10^5$ person-years and among exposed 750 per $10^5$ person-years. The relative risk (incidence rate ratio) would be $(750/10^5)/(50/10^5)$, or 15. The conclusion is that there is a 15-fold increase in lung cancer occurrence from smoking.

Case-control Studies

In case-control studies, patients diagnosed with the disease under consideration form the case series. As in cohort studies, their exposure to the factor under investigation is ascertained, for example, through questionnaires, interviews, examination of records, undertaking of laboratory tests in biological samples, and other means (Fig. 6-3B). Using the same methods, the pattern of exposure to the study factor(s) is then estimated in the population, or more strictly in the person-time from which the case series arose. This is done among control subjects selected as a sample of the study base from which the cases arose. If only two categories of exposure are relevant (exposed and unexposed), the relative risk can be estimated by dividing the odds of exposure among cases with the corresponding odds among the controls, the odds ratio.

Thus, if among 200 male patients diagnosed with lung cancer (cases), a = 150 were smokers and b = 50 nonsmokers, whereas among 300 men similar in age to the cases but without lung cancer (controls), c = 50 were smokers and d = 250 were nonsmokers, the odds ratio would be $(a/b)/(c/d) = (150/50)/(50/250)$, or 15. This measure is a good approximation to the relative risk (or risk ratio, or rate ratio). Hence, similar to the cohort study, these data from a case-control study show a 15-fold excess of lung cancer among smokers.

There are some features of case-control studies that make this design susceptible to bias (see following). A well-designed case-control study, however, is a valid and cost-efficient approach to the study of the etiology of cancer and other conditions.

Nested case-control studies

Some case-control designs are methodologically superior to others. The best example is the nested case-control design. The definition of this study design is still somewhat ambiguous (Walker, 1991; Rothman and Greenland, 1998). A definite requirement, however, is that controls are chosen from the clearly defined person-time from which all cases have arisen. In other words, if one of the controls had developed the disease under study, he or she would have definitely been included among the cases. Defining the underlying person-time from which a series of cases (for example, lung cancer cases presenting at a referral hospital) arose can be difficult. Sampling controls from a cohort different from the one that gave rise to the cases often results in selection bias.

According to a more strict definition, the term nested case-control study is used only when the underlying cohort and the corresponding person-time have been previously enumerated and the exposure information was collected prior to the diagnosis. In other words, the controls selected are from exactly the same person-time that gave rise to the cases.
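The arithmetic of the closed-cohort smoking example earlier in this section can be checked with a few lines of Python. This is only a sketch: the function name `incidence_rate` is an illustrative choice, and the count of 600 cases among smokers is the figure consistent with the stated rate of 750 per 100 000 person-years.

```python
def incidence_rate(cases, person_years, per=100_000):
    """Return the number of cases per `per` person-years of observation."""
    return cases * per / person_years

# Nonsmokers: 5000 men followed for an average of 10 years, 25 cases.
x0, P0 = 25, 5_000 * 10
# Smokers: 10 000 men followed for an average of 8 years, 600 cases.
x1, P1 = 600, 10_000 * 8

rate_unexposed = incidence_rate(x0, P0)      # 50.0 per 100 000 person-years
rate_exposed = incidence_rate(x1, P1)        # 750.0 per 100 000 person-years
rate_ratio = rate_exposed / rate_unexposed   # 15.0

print(rate_unexposed, rate_exposed, rate_ratio)
```

Because both rates use the same person-time unit, the unit cancels in the ratio, which is why the rate ratio can be read directly as a 15-fold difference.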

[Figure 6.3B: schematic of a case-control study, marking exposed cases (x1), non-exposed cases (x0), exposed controls (y1), and unexposed controls (y0) against exposed and non-exposed person-time (P0). Odds ratio = $(x_1 y_0)/(x_0 y_1)$.]

Figure 6.3B. It is not always practical or economical to evaluate the entire study person-time.
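As a small illustration of the odds ratio, the case-control figures quoted in the text (150 smokers and 50 nonsmokers among cases, 50 smokers and 250 nonsmokers among controls) can be plugged into the cross-product form. The function and argument names below are illustrative assumptions, not notation from the chapter.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)

# Lung cancer cases: a smokers, b nonsmokers; controls: c smokers, d nonsmokers.
a, b = 150, 50
c, d = 50, 250

or_smoking = odds_ratio(a, b, c, d)
print(or_smoking)  # 15.0
```

The cross-product (a × d)/(b × c) is algebraically the same as (a/b)/(c/d); it simply avoids forming the two intermediate odds.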

because the respective relative risks deviate very little from the null value, but also because the tools to examine alleles across the genome simultaneously (rather than at a limited number usually selected on the basis of weak prior probabilities of being truly associated) are only just becoming available.

THE ROLE OF CHANCE

Before an epidemiologic association could be considered true and therefore deserve interpretation in causal terms, the role of chance and systematic errors should be evaluated.

The P-value

Our daily lives are full of highly unlikely events and coincidences. At the extremes, thousands of people have become wealthy from lotteries; many more have died in strange accidents, even though the probabilities for the respective events are extremely small, say one in 100 000 or smaller. The lesson is simple: Highly unlikely events happen by chance all the time. Chance does not operate differently in scientific research and everyday life. In science, however, proper quantification and judgment, relying on sound substantive knowledge, are necessary before considering chance as an unlikely explanation for a phenomenon.

Let us take, as an example, tossing a fair (unbiased) coin that has a 50% or 0.5 probability of turning up heads and an identical probability of turning up tails. Tossing the coin three times and getting three heads in a row is somewhat unusual but it can hardly be taken as an indication that the coin is systematically influenced (biased) toward heads. The p-value in this instance is 0.25 and is calculated by multiplying 0.5 × 0.5 × 0.5 = 0.125, and then doubling 0.125, because the symmetrically opposite outcome, three tails in a row, is as extreme as three heads in a row. Getting five heads or tails in a row generates some suspicion [$p = (1/2)^5 \times 2 = 0.0625$]. But if 100 people have tossed a fair (unbiased) coin five times each, it should be expected that about six (100 × 0.0625) among them would have obtained either five heads or five tails in a row.

It must be realized that stochastic (probabilistic), in contrast to deterministic, processes always have built-in uncertainty. In their research, all investigators want to reduce chance-related uncertainty as much as possible in order to allow more reliable conclusions. This can be achieved mainly by enrolling progressively larger numbers of individuals in a study. The remaining uncertainty can always be assessed by utilizing statistical procedures that generate a number of summary statistics, including the p-value.

The true meaning of the p-value, however, is poorly understood and the concept itself is widely misused. Surprisingly, this misunderstanding and misuse is quite common even in scientific research. Traditionally, p-values are expressed as numerical fractions of 1. For example, a p-value of 0.1 for a particular positive association (or difference) indicates that there is a 10% chance that such an association or a more extreme one (or a symmetrically opposite one, that is, an inverse association) would appear by chance, even if there were in reality no association at all.

In essence, the p-value is interpretable as such when only one comparison or one test is performed. When multiple comparisons or multiple tests are carried out, the set of the respective p-values loses its interpretability. Various procedures for adjusting p-values according to the number of comparisons undertaken or tests performed have been proposed (Wacholder et al, 2004).

A p-value of 0.05 or smaller is traditionally, and indeed arbitrarily, treated in medical research as evidence that an observed association may not have arisen by chance. For example, the proportion of long-term smokers is found to be larger among lung cancer patients than among individuals without the disease and the p-value for this difference is, say, 0.05. This implies that the probability of finding a difference of this magnitude or larger (in absolute terms) is 5% if smoking were unrelated to lung cancer. In this situation, chance is considered unlikely to explain the association.
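The coin-tossing figures quoted earlier in this section can be reproduced exactly with Python's `fractions` module. The sketch below assumes a perfectly fair coin and independent tosses. The last quantity (the chance that at least one of 100 tossers sees such a run) is not stated in the text and is added here only to illustrate why many comparisons make "unlikely" results common.

```python
from fractions import Fraction

def two_sided_run_p(n_tosses):
    """P(all heads or all tails) in n independent tosses of a fair coin."""
    return 2 * Fraction(1, 2) ** n_tosses

p3 = two_sided_run_p(3)    # three in a row, either face
p5 = two_sided_run_p(5)    # five in a row, either face

# Expected number of such runs if 100 people each toss the coin five times.
expected_runs = 100 * p5

# Illustrative addition: chance that at least one of the 100 people sees a run.
at_least_one = 1 - (1 - p5) ** 100

print(float(p3), float(p5), float(expected_runs))  # 0.25 0.0625 6.25
print(float(at_least_one))
```

Under these assumptions, a result with p = 0.0625 for any single tosser is close to certain to show up somewhere among 100 tossers, which is the intuition behind adjusting p-values for multiple comparisons.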

However, small p-values, including values considerably smaller than 0.05, do not guarantee that an association (or difference) is genuine, let alone causal. Even when the p-value is very small and was generated from a carefully conducted study, it could still be dismissed when the relevant result makes no sense (Miettinen, 1985). Hence, a statistically significant association, linked to a p-value of 0.05 or less, does not necessarily imply causation. Systematic errors, generated by confounding or bias (see following), cannot always be confidently discounted in observational epidemiology. Moreover, as indicated at the end of this chapter, the existence of a genuine association that can be confidently attributed to causation does not necessarily imply that someone who developed the disease following the exposure did so because of that exposure.

A common misconception (Miettinen, 1985) is that if a p-value (for example, p = 0.03) has been properly derived, then its complement (0.97 in our example) can be interpreted as the likelihood that the respective association is indeed causal. This misconception is rarely stated explicitly in the scientific literature, but it underlies the conclusions of many epidemiologic reports that are not securely anchored in methodological principles and biomedical substance.

Lastly, it must be recognized that the p-value itself does not convey any information about the strength of the respective association. A weak association may be statistically highly significant (very small p) when the study is large, and a strong association may be statistically nonsignificant (large p) when the study is small. Hence, all p-values are inherently dependent on the study size, because statistical power (the ability to detect an association or a difference when it exists) increases when a study is larger (Rothman and Greenland, 1998).

Confidence Intervals

In order to integrate information about the strength of an association (as reflected in the relative risk, an effect measure described later on) and its statistical significance, the concept of confidence interval has been developed. Most common are 95% confidence intervals. With a 95% confidence interval, one can be 95% confident that the interval covers the true measure of association (for example the relative risk). But in 5 times out of 100, the true measure is not included. The confidence interval is closely linked to the p-value. The width of the confidence interval is determined primarily by the desired level of confidence and the sample size. Hence, the interval is wider if it includes the true value with 95% confidence than with, for example, 80% confidence. Likewise, smaller studies create wider confidence intervals (that is, greater uncertainty about the true value) than larger studies.

SYSTEMATIC ERRORS

The Experimental Study

The chance-related issues apply to all types of studies, observational as well as experimental. As discussed earlier, experimental studies undertaken under optimal conditions are methodologically superior to observational studies. With randomization of exposure, complete follow-up of study subjects, and double-blind assessment of outcome, they are not as liable to the pitfalls of typical observational studies, that is, confounding and bias (Miettinen, 1985; Hennekens and Buring, 1987; MacMahon and Trichopoulos, 1996; Rothman and Greenland, 1998; Rothman, 2002). Proper evaluation of the association between a particular exposure and a specific disease presupposes that every other factor that could influence disease occurrence is either constant among subjects studied or distributed equally between exposed and unexposed subjects.

In other words, an experimental study uses random allocation of study subjects into those who will be exposed and those who will not. Thus, the two or more groups will tend to be similar in distribution to known as well as unknown factors that may influence the results.
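A small simulation can make the balancing effect of random allocation concrete. The sketch below is purely illustrative: the 30% prevalence of an unmeasured risk factor, the cohort size, and the function name `randomize` are all assumptions chosen for the example, not values from the chapter.

```python
import random

def randomize(n_subjects, prevalence, seed=0):
    """Randomly allocate subjects to two arms; return the share carrying
    a hypothetical unmeasured risk factor in each arm."""
    rng = random.Random(seed)
    arms = {"exposed": [], "unexposed": []}
    for _ in range(n_subjects):
        has_factor = rng.random() < prevalence            # unmeasured factor
        arm = "exposed" if rng.random() < 0.5 else "unexposed"
        arms[arm].append(has_factor)
    return {arm: sum(flags) / len(flags) for arm, flags in arms.items()}

# Hypothetical trial: 10 000 subjects, 30% carry the unmeasured factor.
shares = randomize(10_000, 0.30)
print(shares)
```

With a large number of subjects the two shares come out close to each other and to the underlying prevalence, even though the allocation never looked at the factor; with small numbers the balance is much less reliable, which echoes the earlier point about chance and study size.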

In some studies, blinding of researchers and study subjects through the use of appropriate procedures and devices (for example, indistinguishable inert pills, the so-called placebos) may further assure that every factor that can affect disease occurrence, other than the exposure under study, is kept at about the same level between the exposed and unexposed groups. Experimental studies aim to fulfill the Latin dictum ceteris paribus (other things being equal). However, in humans, the optimal conditions that completely eliminate confounding and bias are difficult to create even in randomized controlled trials. Moreover, as already indicated, there is no way to fully control the inherently unpredictable role of chance, except by the use of very large numbers of study subjects, which is an unrealistic objective in many studies.

The randomized controlled trial, with its methodological advantages, dominates experimental research in laboratory animals. In humans, however, the undertaking of experiments faces serious obstacles, the most important of which are ethical. It is obviously not acceptable to expose humans intentionally to a potentially carcinogenic agent in order to ascertain cancer causation. For this reason, most randomized controlled trials in humans have been performed to evaluate treatment effectiveness and occasionally to determine the preventive potential of vaccines, vitamins, or other supplements. In most instances, research on disease etiology has to rely either on animal models (with inherently dubious assumptions about interspecies similarities and exposure dose extrapolations) or on epidemiologic studies with an observational design.

Epidemiologic studies have indeed generated most of what is currently known about the etiology of human diseases in general, and cancer in particular. At the same time, however, epidemiologic studies have also generated conflicting results, unwarranted concern about everyday exposures, and considerable confusion over the rational ranking of public health priorities (Taubes, 1995). The problem arises because epidemiologic studies must confront not only the vagaries of chance but also the problems of systematic errors that undermine their validity.

Confounding

Confounding is the systematic error generated when another factor that causes the disease under study, or is otherwise related to it, is also related to the exposure under investigation (Fig. 6-4A). Thus, if one wishes to examine whether hepatitis C virus (HCV) causes liver cancer, hepatitis B virus (HBV) would be a likely confounder. Confounding arises because HBV causes liver cancer and carriers of HBV are more likely to also be carriers of HCV (because these two viruses are largely transmitted by the same routes). Hence, if the confounding influence of HBV is not accounted for in the design (by limiting the study to HBV-negative subjects) or in analyses of the data, then the strength of the association between HCV and liver cancer would be overestimated (Fig. 6-4A).

Figure 6.4A. Infection with hepatitis C virus (HCV), a cause of liver cancer, is (positively) confounded by hepatitis B virus (HBV) infection, another cause of liver cancer. If this confounding is disregarded, the strength of the association between HCV and liver cancer will be overestimated.
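The overestimation described for HCV and HBV can be illustrated with a toy stratified analysis. Every count below is HYPOTHETICAL and chosen only so that the arithmetic is easy to follow; none of it comes from the chapter or from any real study. The sketch compares the crude odds ratio, which ignores HBV, with the odds ratios inside each HBV stratum.

```python
def odds_ratio(a, b, c, d):
    """a, b = exposed and unexposed cases; c, d = exposed and unexposed controls."""
    return (a * d) / (b * c)

# HYPOTHETICAL counts per stratum:
# (HCV+ cases, HCV- cases, HCV+ controls, HCV- controls)
strata = {
    "HBV-positive": (40, 20, 20, 20),
    "HBV-negative": (10, 50, 20, 200),
}

# Collapsing the strata gives the table an analyst would see if HBV were ignored.
crude_table = tuple(sum(column) for column in zip(*strata.values()))
crude_or = odds_ratio(*crude_table)

stratum_or = {name: odds_ratio(*table) for name, table in strata.items()}

print(crude_table)   # (50, 70, 40, 220)
print(crude_or)      # about 3.9
print(stratum_or)    # 2.0 in each stratum
```

In this made-up example the HCV odds ratio is 2.0 among HBV carriers and 2.0 among non-carriers, yet the crude figure is nearly 4, because HBV is both more common among HCV carriers and itself related to the disease. Restricting to one stratum or analysing within strata removes that inflation.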

A more trivial example is the strong association between carrying matches or a cigarette lighter and developing lung cancer. Obviously, neither matches nor lighters cause lung cancer and their association to the disease is due entirely to confounding by cigarette smoking. The confounding factor, cigarette smoking, is the true cause of lung cancer and the dependence of cigarette lighting on matches or lighters generates the confounded, entirely spurious association of the latter two factors with the disease (Fig. 6-4B).

Figure 6.4B. An association between carrying matches and lung cancer would arise spuriously due to confounding by smoking, the major cause of lung cancer, unless this confounding is accounted for in the design or the analysis.

There are several ways to deal with confounding: some simple, others more complicated. They all assume that two conditions are satisfied: (1) that all the confounders have been identified or at least suspected, and (2) that the identified or suspected confounders can be adequately conceptualized and accurately measured. When the study is fairly large, it is always possible to evaluate all suspected confounders in the analysis. However, the ability to conceptualize and accurately measure all of them is frequently beyond the control of any investigator. The result is what has been termed residual confounding, that is, confounding left unaccounted for (MacMahon and Trichopoulos, 1996; Rothman and Greenland, 1998).

Bias

Compounding the problems of epidemiologic studies is that the data are almost never of optimal quality. Data collection relies on the recollection of exposures and their accurate reporting by study participants, laboratory procedures, or existing records. These sources are rarely perfect. For example, studies on diet rely on individuals' imperfect recall on how frequently they eat specific foods, or on serum markers of nutrients that are far from perfect indicators of long-term consumption. Such misclassification, or information bias, can influence the relative risk in any direction and, thus, entails exaggeration, underestimation, or even reversal of the true associations.

In case-control studies, the ascertainment of exposure occurs after the occurrence of disease. Therefore, this study design is particularly subject to information bias. In particular, cases may be likely to remember their exposures differently than controls, a form of information bias called recall bias. For example, a reasonable concern is that cases, or their relatives, are inclined to ruminate about the disease and identify a particular exposure as the causative agent, either for conscious or subconscious reasons. Cases may also try harder than controls to recall all relatives with the disease of interest, leading to a biased estimate of the effect of the family history (Chang et al, 2006).

A well thought-out protocol, standardized procedures, and built-in quality control measures can reduce bias and allow some quantification of its potential impact. However, complete assurance that bias has been eliminated can never be achieved. In addition, the reliance of case-control studies on a control series that simultaneously has to meet criteria of compliance, comparability to the case series, statistical efficiency, and general practicality makes them susceptible to selection bias of unpredictable direction and magnitude. Such biases arise when eligible controls are not representative of the population, or more strictly the person-time, that gave rise to the cases (Wacholder et al, 1992a; Wacholder et al, 1992b; Wacholder et al, 1992c).

Assume, as in the previous example, that controls refuse to participate more often if they are smokers than if they are nonsmokers. We would then underestimate smoking in the control group and thereby overestimate both the difference between cases and controls and the excess risk. Hospital controls, neighborhood controls, and controls enrolled through searches of telephone lists have their own problems, and these have been extensively discussed (MacMahon and Trichopoulos, 1996).

In contrast to selection and information biases, issues of chance and confounding are equally relevant to cohort and case-control investigations (Hennekens and Buring, 1987; MacMahon and Trichopoulos, 1996; Rothman and Greenland, 1998).

ANALYSIS OF EPIDEMIOLOGIC STUDIES

Effect Measures

The underlying goal of epidemiology is to determine the magnitude of change in disease frequency caused by an exposure. How do we accomplish this? We could measure the cumulative incidence or incidence rate among those exposed to a factor. For example, we could observe that the incidence rate of breast cancer in a population of alcoholic women is 50/10 000 person-years. This information provides an estimate of the overall disease burden in this study base. However, we do not know how many cases would have arisen in the study base if all the women in this population had not been alcoholics. In epidemiology, the unexposed group stands in for the person-time experience of the exposed group had it not been exposed. Thus, we need to harvest information from both exposed and unexposed person-time.

There are several ways through which an association, or lack thereof, is assessed. Consider a population of women exposed to a high saturated fat diet and a group exposed to low saturated fat diets that are followed for 5 years to see if they develop breast cancer. The absolute effect of the high-fat diet would be the difference in the cumulative incidence between the two groups, or the difference in the incidence rates. Since the experience of the low saturated fat group should represent what would have happened to the high saturated fat group if they had not eaten the high saturated fat, and if the two groups are equivalent with respect to other breast cancer risk factors, the difference in risks or rates represents the excess risk or rate. These absolute-effect measures are called the risk difference and rate difference, respectively.

Although the absolute measures are easily interpreted, more common are effect measures that are taken as ratios and collectively known as the relative risk. This term includes the risk ratio, rate ratio, odds ratio, standardized mortality ratio, and standardized incidence ratio. The risk ratio is simply the cumulative incidence of disease among the exposed, divided by the cumulative incidence among the unexposed. The rate ratio is a ratio of the rates of disease among the exposed and unexposed. The odds ratio is the odds of disease among the exposed divided by the odds of disease among the unexposed. Lastly, the standardized mortality ratio or standardized incidence ratio is a ratio of the observed number of deaths or cases in a cohort, divided by the expected

number of deaths or cases in the general population, usually stratified by age and gender.

A relative risk value of 1 implies that the exposure under study does not affect the incidence of the disease under consideration. Values below and above 1 indicate a negative (inverse) and a positive association, respectively. For example, a relative risk of 0.5 implies that the disease occurs only half as frequently among exposed as among unexposed individuals; the studied factor appears to be protective. In contrast, if the relative risk is 1.5, then the occurrence (usually the incidence) is 50% higher among exposed than among unexposed individuals.

Studies based on follow-up of closed cohorts may be analyzed by using either cumulative incidence (risk) measures or by counting person-time and calculating incidence rate measures. Analyses based on cumulative incidence measures are only useful under certain conditions, such as no loss to follow-up, no competing risks, and unchanged exposure status throughout follow-up. In addition, study subjects should be followed for the same period of time. Whether or not these conditions are met, it is always valid to conduct analyses based on person-time, using incidence rate measures.

Interaction

The term interaction has been used to describe different biological and statistical concepts. Indeed, even in the epidemiologic literature, statements about interaction are often ambiguous and inadequately specified. From a biological point of view, component causes within the same sufficient cause may be thought of as interacting (Fig. 6-1). In other words, the exposures act synergistically to produce disease, since in the absence of one factor, disease will not occur by that mechanism. From an epidemiologic point of view, interaction is frequently characterized as effect-modification: That is, a factor A and factor B alone have a certain relationship with a disease, but together the factors have an effect different than that expected on the basis of the magnitude of their individual effects. The expectation of the joint effect of factors A and B can be assessed in either an additive or a multiplicative way.

We can use the example in Table 6-2 to illustrate how interaction is assessed. When a multiplicative scale is assumed, there is a statistical interaction if the relative risk among those exposed to both factors A and B (that is, RR_AB) is different than the product of the two individual relative risks (that is, RR_A × RR_B). When an additive scale is assumed, there is interaction if the RR_AB is different than (RR_A + RR_B - 1). In this example, the expected relative risk for someone with both exposures is 6.0 (6.0 = 4.0 + 3.0 - 1) under the additive-effect assumption (Table 6-2A), whereas it is 12.0 (12.0 = 4.0 × 3.0) under the multiplicative-effect assumption (Table 6-2B).

Hence, interaction between two exposures is present when the relative risk is significantly different from what is expected according to a specified scale. Thus, for those with both exposures, we would have interaction on the additive scale if the relative risk is significantly different from 6.0 (Table 6-2A), and on the multiplicative scale if the relative risk is significantly different from 12.0 (Table 6-2B). If the relative risk following exposure to both factors compared to having neither is greater than the sum (minus the reference risk of 1, which should not be counted twice) or product of the individual risks, we call this interaction superadditive or supermultiplicative, respectively. If the relative risk is significantly lower, we refer to this as either subadditive or submultiplicative.

We can illustrate the concept of interaction using data from an epidemiological study of asbestos, smoking, and lung cancer risk. The source population for the data shown in Table 6-2C is a cohort of insulation workers from the United States and Canada (Hammond et al, 1979). The exposed person-time was the experience of over 12 000 male workers with at least 20 years of asbestos exposure. The comparison person-time came externally from the experience of more than 73 000 men of similar social class.
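The absolute and relative effect measures defined earlier in this section can be sketched in a few lines of Python. This is only an illustration: the `Group` structure, the function names, and the counts are hypothetical and are not taken from any study cited in this chapter.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """Follow-up summary for one exposure group (hypothetical structure)."""
    cases: int           # new cases observed during follow-up
    persons: int         # people at risk at the start of a closed cohort
    person_years: float  # total person-time at risk


def risk(g: Group) -> float:
    """Cumulative incidence: cases per person initially at risk."""
    return g.cases / g.persons


def rate(g: Group) -> float:
    """Incidence rate: cases per unit of person-time."""
    return g.cases / g.person_years


def odds(g: Group) -> float:
    """Odds of disease: cases relative to non-cases."""
    return g.cases / (g.persons - g.cases)


def effect_measures(exposed: Group, unexposed: Group) -> dict:
    """Absolute (difference) and relative (ratio) effect measures."""
    return {
        "risk_difference": risk(exposed) - risk(unexposed),
        "rate_difference": rate(exposed) - rate(unexposed),
        "risk_ratio": risk(exposed) / risk(unexposed),
        "rate_ratio": rate(exposed) / rate(unexposed),
        "odds_ratio": odds(exposed) / odds(unexposed),
    }


def standardized_ratio(observed: float, expected: float) -> float:
    """SMR or SIR: observed events in the cohort over the expected number."""
    return observed / expected


# Made-up counts, for illustration only.
high_fat = Group(cases=30, persons=1000, person_years=4900.0)
low_fat = Group(cases=10, persons=1000, person_years=4950.0)
measures = effect_measures(high_fat, low_fat)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

With rare outcomes such as these, the three ratio measures come out close to one another, which is one reason they are grouped under the single term relative risk.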

Table 6-2. Definitions of interaction. Relative risks of developing a certain disease among subjects exposed (+) or not exposed (-) to one or both factors denoted A and B. Subjects exposed to neither of these factors comprise the reference category and their relative risk is by definition 1.0.

Table 6-2A. Statistical interaction on the additive scale with examples of subadditive and superadditive factors.

                 Factor B
  Factor A       -                  +
  -              1.0 (reference)    3.0
  +              4.0                2.0   Subadditive
                                    6.0   Expected under additive effects assumption
                                    8.0   Superadditive

Table 6-2B. Statistical interaction on the multiplicative scale with examples of submultiplicative and supermultiplicative factors.

                 Factor B
  Factor A       -                  +
  -              1.0 (reference)    3.0
  +              4.0                8.0   Submultiplicative
                                    12.0  Expected under multiplicative effects assumption
                                    16.0  Supermultiplicative

Table 6-2C. Effects on lung cancer risk of smoking, asbestos, and both factors.

                 Asbestos
  Smoking        -                  +
  -              1.0 (reference)    5.2
  +              10.9               53.2

Source: Hammond et al, 1979.

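The expected joint relative risks in Table 6-2 follow from simple arithmetic, which the sketch below reproduces. The function names are hypothetical, and the comparison is deliberately simplified: it contrasts point estimates only, whereas the text requires the departure from the expected value to be statistically significant.

```python
def expected_joint_rr(rr_a: float, rr_b: float, scale: str) -> float:
    """Expected relative risk for exposure to both factors, absent interaction."""
    if scale == "additive":
        # The reference risk of 1 must not be counted twice.
        return rr_a + rr_b - 1.0
    if scale == "multiplicative":
        return rr_a * rr_b
    raise ValueError(f"unknown scale: {scale!r}")


def describe_interaction(rr_ab: float, rr_a: float, rr_b: float,
                         scale: str, tol: float = 1e-9) -> str:
    """Label the observed joint relative risk against the expected one.

    Simplification: sampling variability is ignored, so this labels the
    direction of the departure, not its statistical significance.
    """
    expected = expected_joint_rr(rr_a, rr_b, scale)
    if abs(rr_ab - expected) <= tol:
        return "no interaction"
    prefix = "super" if rr_ab > expected else "sub"
    return prefix + scale


# Values from Tables 6-2A and 6-2B (RR_A = 4.0, RR_B = 3.0).
print(expected_joint_rr(4.0, 3.0, "additive"))
print(expected_joint_rr(4.0, 3.0, "multiplicative"))
print(describe_interaction(8.0, 4.0, 3.0, "additive"))
print(describe_interaction(8.0, 4.0, 3.0, "multiplicative"))
```

The same observed joint relative risk of 8.0 is labeled superadditive on one scale and submultiplicative on the other, which is why the scale has to be stated whenever interaction is reported.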
Compared to men who had neither exposure, the relative risk of those who were smokers, but who were not exposed to asbestos occupationally, was 10.9; the relative risk of those exposed to asbestos, but who were not smokers, was 5.2 (Table 6-2C; Hammond et al, 1979). For those exposed to both asbestos and smoking, the relative risk of lung cancer was 53.2 compared to those with neither factor. In this example, there appears to be interaction on the additive scale, since the relative risk for smoking and asbestos combined (53.2) is substantially higher than the expected relative risk of 15.1 under the additive model (RR_smoker + RR_asbestos - 1 = 10.9 + 5.2 - 1). We do not, however, observe interaction on the multiplicative scale, since the relative risk for both smoking and asbestos (53.2) does not represent a significant departure from what is expected under the multiplicative-effect assumption (56.7 = RR_smoker × RR_asbestos = 10.9 × 5.2).

There are not any clear-cut guidelines on whether to assess interaction in the additive or multiplicative setting for the various disease outcomes examined in epidemiology, although both approaches are used (Brennan, 1999).

Meta-analysis

Random variation per se in epidemiologic studies is not an insurmountable problem.

Larger studies and eventually quantitative summary analyses are increasingly used. Such systematic statistical evaluations of results of several independent investigations can effectively address genuine chance-related concerns. Quantitative summary analyses have been termed meta-analyses and pooled analyses. There is no completely accepted distinction between the two terms, although meta-analysis is used more frequently when published results are combined. By contrast, in pooled analysis primary individual-level data from different studies may be made available to an inves-

gin, but they should not be confused with the establishment of causation based on scientific considerations alone.

When results of an observational epidemiologic study designed to address a specific hypothesis are striking, the study is large, and there is no evidence of overt confounding or major biases, it is legitimate to attempt etiologic inferences. In contrast, interpretation becomes problematic when a weak association turns out to be statistically significant, for example, in a large but imperfect data set. Although that association could reflect a weak but genuine

the likelihood of causation in a certain individual. The perceived likelihood of a causal association between a particular exposure and a specific disease moves forward or backward in a continuous spectrum as research results accumulate. The evidence for causality is declared as sufficient when a particular threshold has been reached, but on occasion requires reevaluation in the light of subsequent evidence (Cole, 1997).

Table 6-3. The Hill criteria for inferring causation

  Criteria: Definition

  Strength: A strong association is more likely to be causal. The measure of strength of an association is the relative risk and not statistical significance.

  Consistency: An association is more likely to be causal when it is observed in different population groups.

  Specificity: When an exposure is associated with a specific outcome only (for example, a cancer site or even better a particular histological type of this cancer), then it is more likely to be causal. There are exceptions, however, for example, smoking causing several forms of cancer.

  Temporality: A cause should not only precede the outcome (disease), but also the timing of the exposure should be compatible with the latency period (in non-infectious diseases) or the incubation period (in infectious diseases).

  Gradient: This criterion refers to the presence of an exposure-response relationship. If the frequency or intensity of the outcome increases when an exposure is more intense or lasts longer, then it is more likely that the association is causal.

  Plausibility: An association is more likely to be causal when it is biologically plausible.

  Coherence: A cause and effect interpretation of an association should not conflict with what is known about the natural history and biology of the disease, or its distribution in time and place.

  Experimental evidence: If experimental evidence exists, then the association is more likely to be causal. Such evidence, however, is seldom available in human populations.

  Analogy: The existence of an analogy (for example, if a drug causes birth defects, then another drug could also have the same effect) could strengthen the view that an association is causal.

Source: Hill, 1965.

The IARC Classification

The International Agency for Research on Cancer (IARC) evaluates the risk of specific agents to determine if they are carcinogenic in humans. In order to come to a conclusion, the IARC has implemented its own set of criteria for evaluating the carcinogenicity of agents. After considering all the evidence, the IARC working group assigns the agent to one of five categories, summarized in Table 6-4. Group 1 indicates that there is sufficient evidence to conclude that the agent is carcinogenic to humans. A label of group 2A means that there are insufficient human data, but there is strong evidence that the agent is carcinogenic in animal models. Agents for which there is limited evidence in humans and insufficient evidence in experimental animals are assigned to group 2B. Group 3 is used when there is inadequate human and animal data to come to a conclusion. Group 4 indicates that the agent is most likely not a carcinogen in humans based on adequate evidence suggesting that it is not a carcinogen in both animal models and human studies.

The Process of Causal Inference

Criteria for causality can be invoked, explicitly or implicitly, in evaluating the results of a single epidemiologic study, although,

in this instance, a firm conclusion is all but impossible. In the approach introduced by Cole (1997), this situation is denoted as single study level, or level I. Criteria for causality are more frequently used for the assessment of evidence accumulated from several epidemiologic studies and other biomedical investigations. At this stage, the intellectual process is inductive, moving from the specifics to generalization (several studies level, or level II). Finally, when causation has been established at level II, then, and only then, can the cause of the disease in a particular individual be considered (specific person level, or level III). At this level, the intellectual process is deductive, moving from the general concept of disease causation to the examination of what might have caused disease in a particular individual.

Table 6-4. International Agency for Research on Cancer (IARC) classification of carcinogenicity of agents, mixtures or processes

  Group 1    The agent is carcinogenic to humans
  Group 2A   The agent is probably carcinogenic to humans
  Group 2B   The agent is possibly carcinogenic to humans
  Group 3    The agent is not classifiable in terms of its carcinogenicity
  Group 4    The agent is not carcinogenic to humans

The individual study (level I)

Causality can never be inferred on the basis of a single epidemiologic study, but the likelihood that an observed association is causal is strengthened when several of the following criteria are met: (1) minimal confounding; (2) minimal bias; (3) limited chance variation; (4) relatively strong association; (5) monotonic exposure-disease association, otherwise referred to as exposure-response or dose-response association; (6) internal consistency, exemplified by similarity of exposure-response patterns among various subgroups of study subjects; (7) compatibility of the temporal sequence of exposure and outcome with the known or presumed latency of the disease; and, lastly, (8) biologic plausibility, that is, a causal link between the exposure and the disease should be, at a minimum, biologically conceivable (it should not contradict physical theory or biological principles).

The general case (several studies, level II)

Establishment of the etiologic role of a particular exposure on the occurrence of a disease ideally requires strong epidemiologic evidence, an appropriate and reproducible animal model, and documentation at the molecular or cellular level of the morphological or functional pathogenetic process. Sometimes, an intended or unintended change, or natural experiment, greatly facilitates etiologic inference: This happens when, for example, an occupational group is exposed to high levels of compounds rarely encountered in other settings, a religious group avoids an exposure that is otherwise widespread, or a vaccine that creates herd immunity against a particular virus turns out to reduce the incidence of a certain form of cancer.

These conditions, however, are rarely collectively satisfied. Instead investigators have to be guided by the best available biomedical evidence in order to interpret correctly epidemiologic data from several studies. The following criteria need to be considered: (1) consistency, that is similarity (lack of heterogeneity) of results obtained by different investigators using different study designs in different populations; (2) overwhelming biomedical evidence for weak associations, whereas for strong associations reliance on powerful biomedical knowledge is less critical; (3) compatibility of exposure-response patterns across different studies exploring the exposure-disease association in different exposure ranges; (4) coherence, which requires results from analytic epidemiologic studies to be compatible with ecologic patterns and time trends, such as the increasing incidence of lung cancer over time, following the increasing use of tobacco products by the population; (5) specificity, which exists when one type of disease is consistently linked with one type of exposure rather than several exposures all being associated with a certain disease, or one type of exposure being associated with several diseases; and (6) biological analogy, which exists when a similar exposure has been shown to cause a similar disease in another species or a different form of the disease in humans. For example, viruses have been shown to cause leukemia in several animal species and at least one rare form of leukemia in humans.

None of these criteria can be considered as absolutely necessary for causal inference, a sine qua non. But the evidence for causality is strengthened when most of them are met.

Disease in a specific person (level III)

Causality can be conclusively established between a particular exposure as an entity and a particular disease as an entity. In contrast, it is not possible to establish such a link conclusively between an exposure and a particular disease of a given individual, for example, smoking in a patient with lung cancer. It is possible, however, to infer deductively that the specific individual's illness was more likely than not caused by the specified exposure.

For this conclusion to be drawn, all the following criteria must be met (Cole, 1997): (1) The exposure under consideration, as an entity, must be an established cause of the disease under consideration, as an entity (level II). (2) The relevant exposure of the particular individual must have properties comparable (in terms of intensity, duration, associated latency, etc) to those that have been shown to cause the disease under consideration. (3) The disease of the specified person must be identical to, or within the symptomatological spectrum of, the disease that, as an entity, has been etiologically linked to the exposure. (4) The patient must not have been exposed to another established or likely cause of this disease. If the patient has been exposed to both the factor under consideration (for example, smoking) and to another causal factor (for example, asbestos), individual attribution becomes a function of several relative risks, all versus the completely unexposed: (a) relative risk of those who only had the exposure under consideration, (b) relative risk of those who had only been exposed to the other causal factor(s), and (c) relative risk of those who have had a combination of these exposures. (5) The relative risk should be reasonably elevated (eg, 2 or more).

The last criterion stems from the fact that the relative risk comprises a baseline component equal to 1, which characterizes the unexposed, plus another component that applies only to the exposed. When the relative risk is higher than 1 but less than 2, the individual who has been exposed and has developed the disease is more likely than not to have developed the disease for reasons not entirely due to the exposure. For instance, if the risk of a light-smoking 55-year-old man to suffer a first heart attack in the next five years is 6%, and that of a same-age non-smoking man is 4% (relative risk 1.5), then only 33% of the smoker's risk (that is, 1/3 of the total 6%) can be attributed to his smoking. When the relative risk is higher than 2, a particular individual who has been exposed and has developed the disease under consideration is more likely than not to have developed the disease because of the exposure.

CONCLUSION

Manipulation of exposures in humans, many of which may be harmful, is frequently unfeasible, unethical, or both. Therefore, epidemiologists have to base their inferences on experiments that humans subject themselves to intentionally, naturally, or even unconsciously. The study of risk for lung cancer among smokers compared with nonsmokers is one classic example of a natural experiment.

Because human life is characterized by myriad complex, often interrelated, behaviors and exposures (ranging from genetic traits and features of the intrauterine environment to growth rate; physical activity; sexual practices; use of tobacco, alcohol, and pharmaceutical compounds; dietary intake; exposure to infections, environmental pollutants, and occupational hazards; and so on), epidemiologic investigation is difficult and challenging. Given this complexity, it is not surprising that from time to time epidemiologic studies generate results that appear confusing, biologically absurd, or contradictory. However, it is reassuring that a wealth of new knowledge has been generated by epidemiologic studies over the last few decades. This knowledge now lays the scientific ground for primary prevention of many major cancers and other chronic diseases among humans globally.

A detailed study of epidemiologic methodology in any textbook (Hennekens and Buring, 1987; Miettinen, 1985; Walker, 1991; MacMahon and Trichopoulos, 1996; Rothman and Greenland, 1998; Rothman, 2002) can be fascinating and indeed necessary for those who want to pursue their own research. However, for the reader of this textbook, the general concepts introduced in this chapter should provide a sufficient basis. We have tried to convey that the sometimes esoteric theory of modern epidemiology can be condensed to a few central issues, namely (1) how to quantify and understand the impact of chance, (2) how to best harvest information on exposures and outcomes from a source population by using a cohort design, a case-control design, or variants thereof, (3) how to achieve valid results by minimizing the impact of confounding and bias, and (4) how to address the central issue of causality in a structured way.

GLOSSARY

Cause: A factor is a cause of a certain disease when alterations in the frequency or intensity of this factor, without concomitant alterations in other factors, are followed by changes in the frequency of occurrence of the disease, after the passage of a certain time period (latency, or induction period).

Closed cohort: A closed cohort comprises a set of individuals who are followed for a defined period of time. After becoming a member of the cohort, an individual remains in the cohort until the end of the study, or development of the outcome.

Competing risks: The risk of death from a certain disease competes with the risk of death from another disease by affecting time at risk. Competing risks generally bias risk ratios, but not rate ratios, since person-time allows for different follow-up time.

Component cause: An exposure that acts in concert with other factors (component causes) to produce disease. None of these factors are sufficient in themselves to cause disease.

Confidence interval: A statistical measure that provides a range of possible values that include the true measure of association with a particular degree of certainty. For example, a 95% confidence interval provides a range of values that will include the true value 95% of the time.

Confounding: A systematic error generated when another factor, that causes the disease under study or is otherwise related with it, is also related to the exposure under investigation, without being in the pathway that links exposure under investigation with the disease under study.

Ecologic study: The study of exposure and the disease at the population level, rather than at the individual level.

Epidemiology: The nonexperimental investigation of determinants of human disease.

Experimental study: See randomized controlled trial.

Information bias: A random, or nonrandom, misclassification of information on either the exposure, outcome, or confounding variables that leads to a biased estimation of the true effect.

Loss to follow-up: The inability to follow beyond a certain point in time and thus ascertain the ultimate fate of individuals in a cohort study.

Necessary cause: A factor or exposure that is essential in the etiology of the disease and without which the disease cannot occur.

For example, the human immunodeficiency virus is a necessary cause of acquired immunodeficiency syndrome, although other factors may be involved in order for the disease to occur.

Nonexperimental study: See observational study.

Observational study: A study in which the investigator cannot control the circumstances of the exposure.

Odds ratio: A relative measure of association, calculated as the odds of disease among the exposed divided by the odds of disease among the unexposed.

Open cohort: A cohort of individuals whose membership changes over time, with people entering or exiting based on defining criteria.

Person-time: The sum of all time spent by each study participant at risk for a disease.

p-value: A value that indicates the likelihood of observing an association as extreme as, or more extreme than, the one found between a particular exposure and a certain disease, if there were in fact no association.

Randomized controlled trial: An experimental study design in which the researcher randomly allocates subjects to groups that will be subjected or not to a particular exposure.

Recall bias: A misclassification of an exposure, common in case-control studies, that occurs when subjects with the disease remember or report their exposures differently than those without disease.

Relative risk: A term that collectively describes the various relative measures of association, that is, the risk ratio, the rate ratio, the odds ratio, and the standardized incidence or mortality ratio.

Selection bias: A systematic error that results from the process of selecting participants for the study or on account of factors that influence participation in the study. Selection bias occurs when the relationship between the exposure and the disease is different for those in the study than for those not in the study.

Study base: The person-time of a group of individuals at risk for a disease from which an investigator aims to harvest information about disease occurrence.

Sufficient cause: A minimal set of factors or exposures that inevitably produce the disease after a certain period of time.

REFERENCES

Brennan P. Chapter 12: Design and analysis issues in case-control studies addressing genetic susceptibility. IARC Sci Publ 1999;148:123-32.
Chang ET, Smedby KE, Hjalgrim H, Glimelius B, Adami HO. Reliability of self-reported family history of cancer in a large case-control study of lymphoma. J Natl Cancer Inst 2006;98(1):51-58.
Clemmesen J, Nielsen A. Comparison of age-adjusted cancer incidence rates in Denmark and the United States. J Natl Cancer Inst 1957;19:989-98.
Cole P. Causality in epidemiology, health policy and law. Environmental Law Reporter 1997;27:10279-85.
Cordell HJ, Clayton DG. Genetic Epidemiology 3: Genetic association studies. Lancet 2005;366:1121-37.
Doll R, Hill AB. Smoking and lung cancer: preliminary report. Brit Med J 1950;2:739-48.
Doll R, Hill AB. Lung cancer and other causes of death in relation to smoking. Brit Med J 1956;2:1071-81.
Doll R, Peto R, Boreham J, Sutherland I. Mortality from cancer in relation to smoking: 50 years observations on British doctors. Br J Cancer 2005;92:426-29.
Feinstein AR. Meta-analysis: statistical alchemy for the 21st century. J Clinical Epidemiology 1995;48:71-79.
Greenland S, Robins J. Invited commentary: ecologic studies: biases, misconceptions, and counterexamples. Am J Epidemiol 1994;139:747-50.
Hammond EC, Selikoff IJ, Seidman H. Asbestos exposure, cigarette smoking and death rates. Ann NY Acad Sci 1979;330:473-90.
Hansson LE, Nyren O, Hsing AW, Bergstrom R, Josefsson S, Chow WH, et al. The risk of stomach cancer in patients with gastric or duodenal ulcer disease. New Engl J Med 1996;335:242-49.
Hennekens CH, Buring JE. Epidemiology in Medicine. Boston: Little, Brown, 1987.
Hennekens CH, Speizer FE, Lipnick RJ, Rosner B, Bain C, Belanger C, et al. A case-control study of oral contraceptive use and breast cancer. J Natl Cancer Inst 1984;72:39-42.

Hill AB. The environment and disease: association or causation? Proc Roy Soc Med 1965;58:295-300.
Hunter DJ, Morris JS, Stampfer MJ, Colditz GA, Speizer FE, Willett WC. A prospective study of selenium status and breast cancer risk. JAMA 1990;264:1128-31.
International Agency for Research on Cancer. IARC Monographs on the Evaluation of Carcinogenic Risks to Humans, Supplement 7, Overall Evaluations of Carcinogenicity: An Updating of IARC Monographs, Volumes 1 to 42. Lyon, 1987.
MacMahon B. Epidemiological evidence on the nature of Hodgkin's disease. Cancer 1957;10:1045-54.
MacMahon B. Strengths and limitations of epidemiology. In: The National Research Council in 1979. Current issues and studies. Washington, DC: National Academy of Sciences, 1979:91-104.
MacMahon B, Pugh TF, Ipsen J. Epidemiologic Methods. Boston: Little, Brown, 1960.
MacMahon B, Trichopoulos D. Epidemiology: Principles and Methods. Boston: Little, Brown, 1996.
Miettinen OS. Theoretical Epidemiology: Principles of Occurrence Research in Medicine. New York: Wiley, 1985.
Morgenstern H. Uses of ecologic analysis in epidemiologic research. Am J Public Health 1982;72:1336-44.
Rothman KJ. Causes. Am J Epidemiol 1975;104:587-92.
Rothman KJ. Modern Epidemiology. Boston: Little, Brown, 1986.
Rothman KJ. Epidemiology: An Introduction. New York: Oxford University Press, 2002.
Rothman KJ, Greenland S. Modern Epidemiology, 2nd Ed. Philadelphia: Lippincott-Raven, 1998.
Sacks HS, Berrier J, Reitman D. Meta-analysis of randomized controlled trials. N Engl J Med 1987;316:450-55.
Shapiro S. Meta-analysis/Shmeta-analysis. Am J Epidemiol 1994;140:771-78.
Susser M. What is a cause and how do we know one? A grammar for pragmatic epidemiology. Am J Epidemiol 1991;133:635-48.
Taubes G. Epidemiology faces its limits. Science 1995;269:164-69.
Teare DM, Barrett JH. Genetic Epidemiology 2: Genetic linkage studies. Lancet 2005;366:1036-44.
US Department of Health, Education and Welfare. Smoking and Health. Report of the Advisory Committee to the Surgeon General of the Public Health Service. Publication 1103. Washington, DC: US Government Printing Office; 1964.
Wacholder S, McLaughlin JK, Silverman DT, Mandel JS. Selection of controls in case-control studies: I. Principles. Am J Epidemiol 1992;135:1019-28.
Wacholder S, Chanock S, Garcia-Closas M, El Ghormli L, Rothman N. Assessing the probability that a positive report is false: an approach for molecular epidemiology studies. J Natl Cancer Inst 2004;96:434-42.
Wacholder S, Silverman DT, McLaughlin JK, Mandel JS. Selection of controls in case-control studies: II. Types of controls. Am J Epidemiol 1992;135:1029-41.
Wacholder S, Silverman DT, McLaughlin JK, Mandel JS. Selection of controls in case-control studies: III. Design options. Am J Epidemiol 1992;135:1042-50.
Walker AM. Observation and inference: an introduction to the methods of epidemiology. Newton Lower Falls, MA: Epidemiology Resources Inc, 1991.
Weiderpass E, Adami HO, Baron JA, Magnusson C, Bergstrom R, Lindgren A, et al. Risk of endometrial cancer following estrogen replacement with and without progestins. J Natl Cancer Inst 1999;91:1131-37.
Wynder EL, Graham EA. Tobacco smoking as a possible etiologic factor in bronchiogenic carcinoma: a study of 684 proved cases. JAMA 1950;143:329-36.
Zhang SM, Hankinson SE, Hunter DJ, Giovannucci EL, Colditz GA, Willett WC. Folate intake and risk of breast cancer characterized by hormone receptor status. Cancer Epidemiol Biomarkers Prev 2005;14:2004-8.