Design Considerations in Molecular Epidemiology

Montserrat Garcia-Closas, Qing Lan, and Nathaniel Rothman
Division of Cancer Epidemiology and Genetics, National Cancer Institute, Department of Health and Human Services, Bethesda, Maryland, U.S.A.

INTRODUCTION

There is a wide range of biomarkers that can be used in population-based molecular epidemiological studies of cancer. These include biomarkers of exposure, intermediate endpoints (e.g., biomarkers of early biological effect), disease, and susceptibility (1-7) (Fig. 1). Hypothesis-driven biomarkers have been used for many years in molecular epidemiology studies of cancer (e.g., measurement of xenobiotics and endogenous carcinogens, macromolecular adducts, cytogenetic endpoints in cultured lymphocytes, DNA mutations in tumor suppressor genes, and phenotypic and genotypic measures of genetic variation in candidate genes). Perhaps the most revolutionary change that has occurred in molecular epidemiology in the past several years has been the emergence of discovery technologies that can be incorporated into a variety of study designs and include genome-wide scans of common genetic variants, messenger RNA (mRNA) and microRNA expression arrays, proteomics, and metabolomics (also referred to as metabonomics) (8-14). These approaches are allowing investigators to explore biological responses to exogenous and endogenous exposures, to evaluate potential modification of those responses by variants in essentially the entire genome, and to define tumors at the chromosomal, DNA, RNA, and protein levels. At the same time, with the incorporation of more powerful technologies into molecular epidemiology studies, there has been greater concern that the rights and confidentiality of study subjects be protected. A discussion of informed consent is outside the scope of this chapter, but we do note the critical need to consider ethical issues and informed consent procedures at the outset of designing a study.

The focus of this chapter is on design considerations for epidemiological studies of cancer that use biomarkers primarily in the context of etiological research. We first discuss the advantages and disadvantages of classical epidemiological study designs for the application of biomarkers. We then describe biospecimen collections and sample size requirements for certain types of molecular epidemiology studies.

Figure 1 A continuum of biomarker categories reflecting the carcinogenic process resulting from xenobiotic exposures. Source: From Ref. 1.

STUDY DESIGNS IN MOLECULAR EPIDEMIOLOGY

A description of the general principles of study design (15-17) is outside the scope of this chapter. Instead, we will discuss the advantages and disadvantages of classical epidemiological study designs (i.e., cross-sectional, case-control, and prospective cohort) that are particularly relevant to the collection and use of biological specimens. Potential new biomarkers for epidemiological research continually arise because of advances in the understanding of disease etiology and in molecular laboratory techniques. When a promising new biomarker emerges from the laboratory, some very basic issues, such as assay accuracy and reliability, need to be assessed before considering its application in epidemiological studies. These initial efforts to characterize biomarkers for use in epidemiological studies have been called transitional studies by some investigators (18-21), a term that serves to heighten awareness about the critical need to characterize the determinants of biomarker levels and assays before they are used in molecular epidemiological studies with precious, nonreplenishable biological samples. In this section, we will focus on study design considerations for the use of biomarkers that have already been characterized.

Cross-sectional Studies with Biomarker Endpoints

Cross-sectional studies are used when there is interest in studying the relationship between particular exposures or demographic characteristics and a biomarker, which is treated as the outcome variable, and are generally carried out on healthy subjects. Biomarkers of exposure and intermediate endpoints can be measured at one or several points in time, depending on the temporal variability in the exposure and intraindividual variation in the response.

The standard design is to have one group of "exposed" study subjects and a comparably sized group of "unexposed" subjects, drawn from the same base population and often matched on several factors, such as age, sex, and tobacco use, to improve efficiency. When biomarker endpoints have relatively short half-lives and a population can be studied before exposure begins, an alternative design can be used, where subjects are sampled before exposure begins and again after an appropriate length of time.

Cross-sectional studies generally focus on biomarkers of exposure and intermediate endpoints. This design is often used to determine if a study population has been exposed to a particular compound, the level of exposure, and the determinants of the exposure (22,23), and sometimes is used to validate various approaches to measuring external exposure (e.g., questionnaires, environmental monitoring). Biomarkers of exposure, discussed in chapter 7, measure internal exposure levels of exogenous or endogenously produced compounds in either tissues or body fluids. A wide range of exposures can be measured biologically, including environmental factors, nutrients, infectious agents, and endogenous compounds.

Cross-sectional studies also can be used to evaluate intermediate endpoints from exposures in the diet, general environment, and workplace, as well as from lifestyle factors such as obesity and reproductive status. This design can be used to provide mechanistic insight into well-established exposure-disease relationships and to supplement suggestive but inconclusive evidence of the carcinogenicity of an exposure (24). As such, these studies complement classic epidemiological studies that use cancer endpoints. In addition, intermediate biomarkers can provide initial clues about the carcinogenic potential of new exposures years before cancer develops (1,6,25-27).

One group of intermediate biomarkers, biomarkers of early biological effect (1,28) (Fig. 1), generally measures early biological changes that reflect early, nonclonal, and generally nonpersistent effects. Examples of early biological effect biomarkers include measures of cellular toxicity, chromosomal alterations, DNA, RNA, and protein expression, and early nonneoplastic alterations in cell function (e.g., altered DNA repair, altered immune function). Generally, early biological effect markers are measured in substances such as blood and blood components (e.g., red blood cells, white blood cells, DNA, RNA, plasma, sera) because they are easily accessible and because in some instances it is reasonable to assume that they can serve as surrogates for other organs. Early biological effect markers also can be measured in other accessible tissues such as skin, cervical and colon biopsies, epithelial cells from surface tissue scrapings or sputum samples, exfoliated urothelial cells in urine, colonic cells in feces, and epithelial cells in breast nipple aspirates. Other early effect markers include measures of circulating biologically active compounds in plasma that may have epigenetic effects on cancer development (e.g., hormones, growth factors, cytokines).

Cross-sectional studies can also be used to extensively evaluate the genetic determinants of a biomarker endpoint. Traditionally, the candidate gene approach has been employed, where functional or putatively functional variants in biologically relevant genes are analyzed to determine how they influence biomarker levels (22,23,29). With the advent of genome-wide scanning technology, a new generation of studies is being launched that will agnostically evaluate a large number of genetic variants for their potential influence on biomarker endpoints. These include classic genotoxicity, cytogenetic, hematological, and immunological biomarkers; a new generation of assays that include measures of genomic stability and epigenetic alterations such as telomere length and global methylation status (30-32); and biomarkers identified by discovery technologies described earlier.

A distinct advantage of the cross-sectional study is that very detailed and accurate information can be collected on current as well as past exposure patterns (23,33) and potential confounders and effect modifiers. Further, as sample size in these studies can be much smaller than in case-control or prospective cohort studies, it is typically feasible to invest substantial resources into very extensive processing of biological samples, often beyond what resources allow in a larger study. This also enables an evaluation of new technologies that require biological samples to be collected and processed in very precise and intensive ways (23,33).

At the same time, an important caveat in these studies is that it is often unknown if the intermediate biomarker under study is predictive of developing cancer (25). As such, it is important to cautiously interpret results from these study designs, as a particular exposure may cause measurable biological perturbations that are of uncertain relevance.

Case-Control Studies

In contrast to cross-sectional studies where biomarkers are the outcome variables, in case-control and prospective cohort studies the risk of disease is the outcome of interest. In case-control studies, risk factors, measured by questionnaire, medical record abstraction, external databases, biomarkers, etc., are compared between subjects with and without a particular disease. This design allows efficient enrolling of large numbers of cancer cases in a relatively short period of time. This is of particular importance for the study of uncommon tumors that occur in small numbers in prospective cohort studies.

Case-control studies can be hospital- or population-based depending on how the cases and controls are identified (Table 1). Population-based studies attempt to identify all cases occurring in a predefined population during a specified period of time, and controls are a random sample of the source population from which the cases come. On the other hand, cases and controls in hospital-based studies are identified among subjects admitted to or seen in clinics associated with specific hospitals. As in the population-based design, the distribution of exposures in the control group should represent that from the source population of the cases. However, the source population is often more difficult to define in hospital-based studies. Molecular epidemiology studies often use the hospital-based case-control design because the hospital setting facilitates the enrollment of subjects as well as the collection and processing of biological specimens. Enrollment of subjects is also facilitated by having in-person contact with study participants by doctors, nurses, or interviewers, which usually results in higher participation rates (34). Because study subjects are generally less geographically spread out than those in population-based or cohort studies, rapid shipment of specimens to central laboratories for more elaborate processing protocols such as cryopreservation of lymphocytes is facilitated. Rapid ascertainment of cases through the hospitals also facilitates the collection of specimens from cases before treatment, thus avoiding the potential influence of treatment on some biomarker measurements.

[Table 1: hospital-based versus population-based case-control designs; table body not recoverable]

Exposure Assessment in Case-Control Studies

Exposure assessment through questionnaires in case-control studies of a single disease or multiple diseases sharing risk factors (e.g., breast, ovarian, and endometrial cancer) can be more detailed and focused than in prospective cohort studies that often study multiple, unrelated diseases. However, exposure information and biological specimens are collected after diagnosis of the disease and sometimes after treatment, and therefore are vulnerable to exposure measurement error/misclassification that is differential between cases and controls. Differential errors or recall bias from questionnaire information collected in case-control studies, although of concern, have only been proven for a few exposures. Similarly, the influence of the disease process or treatment on a biomarker of interest is often raised as a concern, but rarely proven. Differences in biomarker levels among cases diagnosed at different stages of disease can help evaluate whether differences in biomarker levels between cases and controls reflect an influence of the disease on the biomarker rather than the contrary.

The applicability of exposure biomarkers in case-control studies depends on certain intrinsic features related to the marker itself (e.g., half-life, variability, specificity) and the exposure pattern. The first prerequisite for successful application of an exposure marker is that the assay is reliable and accurate, the marker is detectable in human populations, and important effect modifiers (e.g., nutrition and demographic variables) and kinetics are known (24). Second, the timing of sample collection in combination with the biological half-life of a biomarker of exposure is the key, as this determines the exposure time window that a marker of exposure reflects. The time of collection may be critical if the exposure is of brief duration, is highly variable in time, or has a distinct exposure pattern (e.g., diurnal variation for certain endogenous markers such as hormones). However, chronic, near-constant exposures pose fewer problems. Ideally, the biomarker should persist over time and not be affected by disease status in case-control studies. However, most biomarkers of internal dose generally provide information about recent exposures (hours to days), with the exception of markers such as persistent pesticides, dioxins, polychlorinated biphenyls, certain metals, and serological markers related to infectious agents, which can reflect exposures received many years before. If the pattern of exposure being measured is relatively continuous, short-term markers may be applicable in case-control studies of patients with early disease, so that disease bias would be less likely. However, in general, short-term markers have limited use in case-control studies as they are less likely to reflect usual patterns, and the disease or treatment might influence their absorption, metabolism, storage, or excretion.
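The dependence of the exposure time window on biological half-life can be illustrated with a small calculation. The sketch below assumes simple first-order (exponential) elimination, which is an assumption made here for illustration and not a model stated in the text; the half-life values and the 5% threshold are hypothetical and do not refer to any specific biomarker.

```python
import math


def fraction_remaining(hours_since_exposure, half_life_hours):
    """Fraction of the initial marker level left, assuming first-order decay."""
    return 0.5 ** (hours_since_exposure / half_life_hours)


def window_hours(half_life_hours, detectable_fraction=0.05):
    """Hours until the marker falls to `detectable_fraction` of its initial level.

    The threshold is a hypothetical stand-in for an assay's practical limit.
    """
    return half_life_hours * math.log2(1.0 / detectable_fraction)


# Hypothetical markers: one short-lived, one persistent.
short_lived_half_life = 6.0                # hours
persistent_half_life = 7.0 * 365.0 * 24.0  # seven years, in hours

for name, half_life in [("short-lived", short_lived_half_life),
                        ("persistent", persistent_half_life)]:
    days = window_hours(half_life) / 24.0
    print(f"{name}: reflects roughly the previous {days:.1f} days of exposure")
```

Under these assumptions, the short-lived marker reflects little more than the day before collection, while the persistent marker integrates exposure over decades, which is the contrast the paragraph draws between recent-dose markers and markers such as persistent pesticides.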

Biomarkers of Susceptibility in Case-Control Studies

The approaches to studying genetic susceptibility factors for cancer have evolved very quickly


Prospective Cohort Studies

In prospective cohort studies, exposure information and biological specimens are collected from healthy subjects, who are then followed up to identify those who develop disease. In fact, case-control studies are conceptualized as a retrospective sampling of cases and controls from an underlying prospective cohort, referred to as the source population (15,17). Although a cohort study is initially very costly and time consuming, in the long run it becomes more cost efficient since it can study multiple disease endpoints and provides a well-defined population that can be easily sampled for efficiency (41). Biological specimens are collected before disease diagnosis and, ideally, before the beginning of the disease process. Therefore, it is the only method able to study biomarkers that are directly or indirectly affected by the disease process (42). Although cohort studies have the theoretical advantage of collecting serial biological samples over time, many large studies have been able to collect a biological sample at only one point in time. Although this is not a concern for DNA-based assays of inherited susceptibility markers, it poses some limitations for several other categories of markers, particularly for short-term exposure markers that may vary substantially from day to day. In addition, it can be difficult to evaluate the relevant time window of exposure for disease causation unless serial collections of specimens over time are available.

The advantage of prospective cohorts over case-control studies for the study of genetic associations, even though DNA-based markers are not influenced by disease, has been advocated on the basis of prospective designs being better suited for studying interactive effects of environmental and genetic exposures (43-46). In particular, prospective studies are better suited to evaluate genotype associations and interactions with biomarkers of exposure or intermediate endpoints, if these biomarkers are influenced by disease status, or if measures close to the time of diagnosis do not reflect past events relevant to disease onset. Although cohort studies can minimize the occurrence of differential misclassification, nondifferential misclassification of exposure or biomarkers can still limit the assessment of genotype-environment and genotype-phenotype interactions. As mentioned before, case-control studies evaluating only one disease outcome or a few related diseases that focus on particular exposure hypotheses can obtain more detailed information from questionnaires than cohort studies, thus reducing exposure misclassification. Therefore, unless cohort studies can measure exposure accurately and with repeated measures over time, they might not have clear advantages over case-control studies for the study of certain hypotheses.

Given that most members of a cohort will not develop cancer, nested case-control and, less commonly, case-cohort studies are used to improve efficiency (47). In these designs, only samples from cases and a random subset of noncases are analyzed, reducing the laboratory requirements and cost considerably. The nested case-control design includes all cases identified in the cohort up to a particular point in time and a random sample of subjects free of disease at the time of the case diagnosis. Increasing the control-to-case ratio to two or three controls per case can easily increase the efficiency of nested case-control studies. A case-cohort design includes a random sample of the cohort population at the onset of the study and all cases identified in the cohort up to a particular point in time. The case-cohort design allows for the evaluation of several disease endpoints using the same comparison group (referred to as a subcohort); however, since the same disease-free subjects are repeatedly used as "controls" for different disease endpoints, depletion of samples from this group can be an issue. Perhaps some historical biomarker data from a subcohort can be compared against a future series of cases with newly analyzed data (e.g., genetic biomarkers, which are now analyzed with extremely high accuracy and precision). However, in general, biomarkers should be analyzed in cases and controls, or in a comparison subcohort, at the same time, in the same laboratory, on the same platform, with the same reagents, and by the same personnel whenever feasible, to minimize assay errors that are differential between cases and controls, and the influence of secular trends.

Multiple prospective cohort studies are currently being followed up for cancer incidence with basic risk factor information from questionnaires and stored blood components, including white blood cells that can be used as a source of DNA. At the completion of ongoing collections, current studies will have stored DNA samples on over two million individuals (7). These studies will provide very large numbers of cases of the more common cancer sites (e.g., breast, lung, prostate, and colon) to evaluate genetic markers of susceptibility and biomarkers in serum or plasma such as hormone levels, chemical carcinogen levels, and proteomic patterns. Most cohort studies do not have cryopreserved blood samples since the procedure is very expensive and logistically challenging in large studies. Also, cohort studies often have a limited capability to collect tumor samples on large numbers of subjects and to follow up cases to carry out survival studies.
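The nested case-control sampling described earlier in this section (all cases, plus a random sample of subjects free of disease at the time of each case diagnosis) can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout (`id`, `exit_time`, `is_case`), the function name, and the toy cohort are all hypothetical, and matching factors, ties, and competing risks are ignored.

```python
import random


def nested_case_control(cohort, controls_per_case=2, seed=0):
    """Sample controls for each case from subjects still disease free.

    `cohort` is a list of dicts with hypothetical keys:
      id        - subject identifier
      exit_time - time of diagnosis (cases) or end of follow-up (noncases)
      is_case   - True if the subject developed the disease
    """
    rng = random.Random(seed)
    sampled_sets = []
    for case in (s for s in cohort if s["is_case"]):
        t = case["exit_time"]
        # Eligible controls: anyone still under follow-up and undiagnosed at t.
        risk_set = [s for s in cohort
                    if s["id"] != case["id"] and s["exit_time"] > t]
        k = min(controls_per_case, len(risk_set))
        controls = rng.sample(risk_set, k)
        sampled_sets.append({
            "case": case["id"],
            "time": t,
            "controls": [s["id"] for s in controls],
        })
    return sampled_sets


# Toy cohort of 200 subjects with made-up follow-up times (years).
cohort = [{"id": i, "exit_time": 1.0 + (i % 10), "is_case": i % 20 == 3}
          for i in range(200)]
sets = nested_case_control(cohort, controls_per_case=2)
```

Only the specimens of the subjects listed in `sets` would need to be retrieved and assayed, which is where the savings in laboratory cost come from.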
New cohort studies based on large institutions such as health maintenance organizations (HMOs) could enable access to tumor samples and easier follow-up of cases for treatment response and survival.

Prospective cohort studies are sometimes designed within screening cohorts. In this design, screening failures lead to missing prevalent cases among cohort participants that are misclassified as controls (48). Although repeated screening reduces misclassification of subjects, cases discovered in follow-up cannot be classified with certainty as prevalent cases missed by the initial screening or as incident disease. However, the degree of misclassification of prevalent and incident cases can be assessed by analyses of time to diagnosis or pathological characteristics. Intensive screening may also uncover a reservoir of latent diseases that would not otherwise become clinically relevant and that might differ from disease detected through clinical symptoms (49,50).

Other Study Designs

Case-Series Design

In the so-called case-series, case-case, or case-only design, only subjects with the disease of interest and no controls are enrolled in the study. This design has been proposed to evaluate etiological heterogeneity using tumor markers. The degree of etiological heterogeneity is quantified by the ratio of the odds ratio for the effect of exposure on marker-positive tumors to the odds ratio for marker-negative tumors. This parameter is equivalent to the odds ratio for the association between exposure and tumor marker in the cases (51).
However, case-only studies are limited to the estimation of the ratio of odds ratios and cannot be used to obtain estimates of the odds ratios for different tumor types. It should be noted that the odds ratio from a case-only design would underestimate the odds ratio derived in a case-control design when the exposure of interest is associated with more than one tumor type. In addition, demonstration that expected associations between established factors and a particular type of cancer are identifiable in a particular study population provides reassurances of the generalizability of findings. Case-series studies where cases can be identified and obtained using well-characterized population-based registries could overcome some of these limitations.
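The equivalence between the ratio of odds ratios and the case-only odds ratio, and the underestimation noted above, can be checked with simple arithmetic. The counts below are hypothetical and chosen only to make the numbers easy to follow; they do not come from any study.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table: (a / b) / (c / d) = a * d / (b * c)."""
    return (a * d) / (b * c)


# Hypothetical counts as (exposed, unexposed).
controls = (200, 800)
marker_positive_cases = (90, 120)
marker_negative_cases = (60, 240)

# Case-control odds ratios for each tumor subtype against the same controls.
or_positive = odds_ratio(*marker_positive_cases, *controls)
or_negative = odds_ratio(*marker_negative_cases, *controls)
ratio_of_ors = or_positive / or_negative

# Case-only odds ratio: exposure versus marker status among cases alone.
case_only_or = odds_ratio(*marker_positive_cases, *marker_negative_cases)

# Second scenario: exposure is also associated with marker-negative tumors.
marker_negative_cases_b = (120, 240)
or_negative_b = odds_ratio(*marker_negative_cases_b, *controls)
case_only_or_b = odds_ratio(*marker_positive_cases, *marker_negative_cases_b)

print(or_positive, or_negative, ratio_of_ors, case_only_or)
print(or_negative_b, case_only_or_b)
```

The control counts cancel in the ratio, which is why the case-only odds ratio matches the ratio of the two case-control odds ratios. In the second scenario the exposure is associated with both tumor types, and the case-only odds ratio falls below the case-control odds ratio for marker-positive tumors.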

The case-only study has also been proposed as a valid design to evaluate multiplicative gene-gene (52) and gene-environment interactions (53). However, this design has important limitations: most notably, it cannot be used to obtain estimates of relative risk for disease or for additive interactions, is susceptible to misinterpretation of the interaction parameter (54), and is highly dependent on the assumption of independence between the exposure and the genotype under study (55). Because of these limitations, case-control designs are preferable over case-series designs when an appropriate control group can be enrolled.

Clinical Trials

Randomized clinical trials are the gold standard for the evaluation of therapeutic or preventive interventions. The key advantage over observational studies such as case-control and prospective cohort studies is the potential to avoid selection and confounding biases through randomization of interventions. Within the limits of chance variation, randomization ensures similar distributions of known and unknown confounding factors in the groups of patients being compared. Although clinical trials cannot be used to address etiological questions because of the lack of a control population, assessment of risk factors for disease through questionnaires and biomarkers can be valuable in studying etiological heterogeneity using the case-only analyses described above. In addition, this design is well suited to evaluate the influence of genetic and environmental risk factors for disease on disease progression and response to treatment. A potential limitation is the lack of generalizability of findings, as discussed for other case-only designs of highly selected cohorts of patients.

Other Study Designs

Alternative study designs have been proposed to address some limitations of the classical epidemiological designs. For instance, two-phase sampling designs can be used to improve efficiency and reduce the cost of measuring biomarkers in large epidemiological studies (56). The first phase of this design could be a case-control or cohort study with basic exposure information and no biomarker measurements. In a second phase, more elaborate exposure information and/or determination of biomarkers (with collection of biological specimens if these were not collected in the first phase) is carried out in an informative sample of individuals defined by disease and exposure (e.g., subjects with extreme or uncommon exposures). Multiple statistical methods such as simple conditional likelihood (57) or estimated-score (58) methods have been developed to analyze data from two-phase sampling designs. Another example is the use of the kin-cohort design as a more efficient alternative to case-control or cohort studies when the main aim is to estimate age-specific penetrance for rare inherited mutations in the general population (59,60). In this design, relatives of selected individuals with genetic testing form a retrospective cohort that is followed from birth to onset of disease or censoring.
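The second-phase selection in a two-phase design can be sketched as stratified sampling on disease status and exposure. This is an illustrative sketch only: the record layout, the function name, and the equal-allocation rule are assumptions, and the inverse-sampling-fraction weight attached to each subject is one simple bookkeeping device, not the conditional likelihood or estimated-score methods cited in the text.

```python
import random
from collections import defaultdict


def select_phase_two(subjects, per_stratum, seed=0):
    """Pick up to `per_stratum` subjects from each disease-by-exposure stratum.

    `subjects` is a list of dicts with hypothetical keys `id`, `case`, and
    `exposed`. Each selected subject carries `weight`, the inverse of the
    sampling fraction in its stratum.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for s in subjects:
        strata[(s["case"], s["exposed"])].append(s)

    selected = []
    for members in strata.values():
        n = min(per_stratum, len(members))
        weight = len(members) / n
        for s in rng.sample(members, n):
            selected.append({**s, "weight": weight})
    return selected


# Toy phase-one study: 1000 subjects, 100 cases, 40 with an uncommon exposure.
phase_one = [{"id": i, "case": i % 10 == 0, "exposed": i % 25 == 0}
             for i in range(1000)]
phase_two = select_phase_two(phase_one, per_stratum=20)
```

With these toy numbers, all 20 exposed cases and all 20 exposed noncases are retained, while only 20 of the 880 unexposed noncases are assayed, so the rare, informative strata are oversampled relative to the common one.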

DESIGN CONSIDERATIONS IN BIOSPECIMEN COLLECTION

The proper collection, processing, and storage of biological specimens in epidemiological studies is critical for the successful determination of biomarkers (61). In this section we describe design considerations in biospecimen collection. Other aspects related to biospecimens such as informed consent, sample sources and processing and storage protocols, biobanks, and quality control considerations are addressed in chapters 5 ("Biosampling Methods") and 6 ("Principles of High Quality Genotyping").


Biological specimens in prospective cohort studies are collected before the clinical onset of disease, ensuring identical sample collection, processing, and storage conditions for samples collected from individuals that develop the disease and those that remain disease free. In addition, the potential effects of disease processes on biomarker measurements make the collection of specimens, particularly of sequential collections over time, very valuable in prospective studies. Biomarker measurements can be very sensitive to differences in handling of samples, e.g., fasting status at blood collection and time between collection and processing of specimens. Therefore, to avoid or minimize spurious differences in case-control studies, it is important that samples from cases and controls are collected during the same timeframe and using identical protocols. Ideally, the nursing and laboratory staff should be blinded with respect to the case-control status of the subjects to avoid differences in collection, processing, and storage. However, because differences between cases and controls in handling of samples are not always completely avoided, it is important to record key information such as date and time of collection, processing and storage problems, time since last meal, current medication, and current tobacco and alcohol use to be able to account for the influence of these variables at the data analysis stage. In fact, this information should be collected in all study designs. In addition, since biomarkers requiring elaborate and expensive protocols are often measured only in a subset of study participants, this information can be used to match cases and controls selected for biomarker measurements. This will ensure efficient adjustment for these extraneous factors during data analysis.

Biomarkers measured in samples collected in subjects during a hospital stay might not reflect measurements from samples collected outside the hospital because many habits and exposures change during hospitalization, e.g., dietary habits, medication used, physical activity. Therefore, even if cases and controls are selected through a hospital-based design, collection of specimens after the patients return home and are no longer taking medications for the conditions that brought them to the hospital should be considered, if feasible. On the other hand, specimens to measure biomarkers that are influenced by long-term effects of treatment should be collected before treatment is started at the hospital, within logistic limitations.

Blood and Buccal Specimens

Because new molecular techniques often require special processing of biological samples, it is important to design protocols that maximize opportunities to apply future assays to samples being collected in epidemiology studies. For instance, blood samples are a very valuable source of specimens that can be used for the determination of a wide range of biomarkers. Leukocytes or white blood cells (granulocytes, lymphocytes, and monocytes), erythrocytes or red blood cells, platelets, and plasma/serum can be obtained through appropriate separation of blood components. Blood samples can also be a source of viable lymphocytes to perform phenotypic assays (62-64). In spite of the advantages of blood samples, the use of venipuncture to obtain blood samples in large-scale epidemiological studies has two important limitations: relatively high cost and, in some populations, relatively low acceptability. Small amounts of DNA can be obtained from dried blood spots on filter paper using finger-pricks, avoiding the use of venipuncture. Advantages of blood spots include lower costs for collection, shipping, and storage (65). Epidemiological studies often need less expensive methods of collection with lower levels of discomfort to the study participant to increase participation rates. Further, in some instances, methods that are suitable for self-collection, such as expectorated buccal epithelial cells (66-69), may be particularly advantageous. Although the use of venipuncture to collect blood samples has clear advantages, alternative less invasive methods to collect genomic DNA, at a minimum, should be considered in most etiological studies.

Urine Specimens

A wide variety of biological markers of exposure and metabolic markers can be measured in urine samples (70). Often, the collection and processing of urine samples are uncomplicated, with the sample being kept cold, both to maintain stability of the analytes and to avoid bacterial overgrowth. Generally, urine is simply aliquoted and frozen; however, for some analytes, collection and storage containers have specific requirements and preservatives may be needed. For most exposure markers, the gold standard is the 24-hour urine sample collection, followed, in general, by the 12-hour evening/overnight sample, the 8-hour overnight collection, the first morning voided sample, and the so-called spot single urine sample.
The utility of a single spot urine sample, relative to longer, timed collections,is highly specific to the kinetics associatedwith the pattern of exposure and the half-life of the biomarker. SA
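The dependence of a spot urine sample on biomarker half-life can be illustrated with a minimal sketch. It assumes, purely for illustration, a one-compartment model with first-order elimination and an identical exposure once every 24 hours; the function name and the example half-lives are hypothetical and are not taken from any study cited here. Under these assumptions, the steady-state ratio of the lowest to the highest biomarker level within a day is 0.5 raised to the power (interval / half-life).

```python
def trough_to_peak_ratio(half_life_hours, exposure_interval_hours=24.0):
    """Steady-state trough/peak ratio of a biomarker level.

    Assumes (for illustration only) first-order elimination and an
    identical exposure repeated once per interval.
    """
    return 0.5 ** (exposure_interval_hours / half_life_hours)


if __name__ == "__main__":
    # Hypothetical half-lives: a few hours versus several days.
    for half_life in (4.0, 24.0, 96.0):
        ratio = trough_to_peak_ratio(half_life)
        print(f"half-life {half_life:5.1f} h -> trough/peak = {ratio:.3f}")
```

A ratio close to 1 means the level is stable over the day, so the timing of a spot sample matters little; a ratio close to 0 means that a spot sample mostly reflects how recently the last exposure occurred, which is one reason timed collections are preferred for short-lived markers.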

Tissue Specimens

Consideration of tissue protocols during the design of the study is critical for the successful retrieval and use of specimens. As with other types of specimens, study protocols should be designed in conjunction with experts, in this case study pathologists, who can assess options for obtaining, processing, storing, and testing specimens. Considerations on specimen labeling, storing, tracking, and shipping during the planning stages can also greatly facilitate the use of specimens in the future. Paraffin-embedded tissue blocks are often the most accessible sources of tissue, since these are routinely prepared in pathology laboratories for clinical purposes. To optimize the utility of tissue blocks in epidemiological studies, it is important to collect information on the protocols used for tissue preparation. This information includes dates when blocks were prepared, criteria used for tissue sampling, and methods used for the processing and storage of blocks. These factors are relevant in the analysis and interpretation of assays performed in the tissues. For instance, for surgical specimens it is critical to know if patients underwent presurgical chemotherapy or radiotherapy, since these therapies can lead to extensive necrosis that can affect the representativeness of the tissue specimens.

Collection of paraffin-embedded tumor samples is made easier in hospital-based studies compared with population-based studies, since the number of pathology departments where cases are diagnosed tends to be smaller. Hospitals typically discard diagnostic specimens some years after the initial diagnosis, and thus, retrieval of archived tissue blocks years after the diagnosis of disease often results in low success rates. Requesting the retrieval of archived tissue blocks shortly after diagnosis increases the chances of obtaining tissues; however, these specimens usually need to be returned to the hospitals since they might be needed for medical care of the patients. Tissue microarray (TMA) technology can be used to sample small tissue cores of pathological targets from paraffin blocks and transfer them systematically into one or a few recipient blocks containing multiple tissue cores (71). Sections of single TMA blocks can provide representations of hundreds of cases suitable for testing in a single batch, thereby reducing cost and interbatch variability.

Although TMAs offer opportunities to standardize immunohistochemistry (IHC) performance, many important factors that impact the reliability of IHC data need to be addressed. These concerns can be pronounced in multicenter studies in which tumor tissues are collected from different centers with varying tissue-processing protocols. Primary factors influencing IHC results include delays in the time to formalin fixation (72), variation in the adequacy of formalin fixation (73), improper storage of cut and unstained slides (74,75), and variable reproducibility in IHC interpretation (76). Development of markers of tissue quality and processing (77) could be a useful quality control measure to improve the interpretation and analysis of IHC information, particularly in multicenter studies.

Implementation of tissue processing protocols not routinely performed for clinical care can be of interest in epidemiological investigations. For instance, the use of tissue fixatives that preserve RNA or snap freezing of tissue samples may be required to obtain high-quality RNA for gene expression arrays. The proximity to laboratory facilities and pathology departments in hospital-based studies facilitates the implementation of specialized protocols since it allows for rapid processing of specimens.

SAMPLE SIZE CONSIDERATIONS

Sample size considerations during the design of a molecular epidemiology study are important to ensure adequate numbers to evaluate questions of interest with sufficient statistical power. All general epidemiological principles that apply to power and sample size (16) apply to molecular epidemiology studies. For example, the main determinants of sample size requirements for a given test in a case-control study are the rate of disease in the population under study, the type and distribution of the biomarker (e.g., frequency of a categorical biomarker such as genotypes and distribution of a continuous biomarker such as serum hormone levels), the magnitude of biomarker differences (e.g., measured as the odds ratio for a biomarker-disease association or differences in means of biomarker levels between two groups in cross-sectional studies), the desired statistical power to detect these differences, and the alpha-level of the test.

Sample size considerations are critical for the design of studies of genetic associations and gene-environment interactions (78). Generally, sample size requirements are large (hundreds to thousands of subjects) because the expected effects of individual genetic variants are relatively small. Recent findings from large-scale candidate gene and genome-wide association studies that have confirmed associations between common polymorphisms and cancer risk have shown magnitudes of association within the odds ratio (OR) range of 1.2-1.5. For instance, a deletion in the glutathione-S transferase M1 (GSTM1) gene and the slow acetylation genotype for the N-acetyl transferase 2 (NAT2) gene are associated, respectively, with 1.5- and 1.4-fold increases in bladder cancer risk (79,80); uncommon versus common homozygous genotypes in novel breast cancer susceptibility loci are associated with ORs ranging from 1.2 to 1.6 (36,81,82); and a polymorphism in the tumor necrosis factor (TNF -308G->A) gene was associated with a 1.6-fold increased risk of diffuse large B-cell non-Hodgkin lymphoma (83). Because the costs of genome-wide association studies, in which hundreds of thousands of genetic markers are evaluated in thousands of individuals, are very high at current genotyping costs, staged designs are commonly used. The sample size needs for these types of studies are described in chapter 15. However, as the costs for primary genome-wide scans continue to decrease, there may be less need for two-stage designs in the future. Sample size requirements for more complex analyses of genotype data, such as pathway-based analyses, haplotype analyses, and novel high-dimensional analyses, are less well understood (chaps. 12-14).
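To make the determinants listed above concrete, the following is a minimal sketch of a textbook normal-approximation sample size calculation for an unmatched case-control study with one control per case and a binary biomarker. It is not the method of reference 16 or 78; the function name, the assumed 30% biomarker frequency among controls, and the default alpha and power are illustrative assumptions only.

```python
from math import ceil, sqrt
from statistics import NormalDist


def cases_needed(control_prevalence, odds_ratio, alpha=0.05, power=0.80):
    """Approximate number of cases (with an equal number of controls)
    needed to detect `odds_ratio` for a binary biomarker, using the
    two-proportion normal approximation without continuity correction.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    p0 = control_prevalence
    # Biomarker frequency among cases implied by the odds ratio.
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    p_bar = (p0 + p1) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p0 * (1 - p0))
    ) ** 2
    return ceil(numerator / (p1 - p0) ** 2)


if __name__ == "__main__":
    # Hypothetical scenario: variant carried by 30% of controls.
    for odds_ratio in (1.2, 1.5):
        print(f"OR {odds_ratio}: {cases_needed(0.30, odds_ratio)} cases")
```

Under these assumptions, moving from an OR of 1.5 to an OR of 1.2 raises the requirement from roughly 400 to roughly 2,200 cases, which is in line with the statement that studies of individual genetic variants typically need hundreds to thousands of subjects.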

Evaluation of gene-environment interactions often requires large sample sizes (see chap. 11, "Statistical Approaches to Studies of Gene-Gene and Gene-Environment Interactions"), and sample size needs are further increased by the presence of errors in measuring environmental and/or genetic exposures, even when the errors are small (84,85). Although multiplicative parameters for gene-environment interactions tend to be attenuated by nondifferential misclassification of exposure (84), this does not hold for the estimation of exposure main effects, joint effects, and subgroup effects or additive interactions. In addition, misclassification leads to biased estimates of risk (86). Thus, high-quality exposure assessment and almost perfect genotype determinations are required for the evaluation of gene-environment interactions. This highlights the importance of validating genotype assays and including quality control samples during genotype determinations to assess the reproducibility of the assays (see chap. 6, "Principles of High-Quality Genotyping").

Current case-control or cohort studies usually include somewhere between a few hundred and a few thousand cases and similar numbers of controls. Therefore, to meet the larger sample size requirements to identify weak associations and interactions, especially when considering histological subtypes of cancers, an increasing number of consortia of existing studies are being formed. Consortia can achieve the large sample sizes necessary to confirm or refute associations by coordinating the analysis of pooled data from many studies, and can evaluate the consistency of findings across studies of different quality and with different sources of bias (see chap. 17 for a discussion on the value of consortia to validate and confirm associations through meta-analyses and pooled analyses).
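The effect of exposure misclassification on gene-environment interaction estimates discussed in this section can be illustrated with a small expected-value calculation. This is a sketch under stated assumptions, not a reproduction of the analyses in references 84 and 85: genotype and exposure are binary and independent among controls, genotype is measured without error, and the exposure is misclassified with the same sensitivity and specificity in cases and controls and in both genotype groups (nondifferential error). The function name and all numerical inputs are hypothetical.

```python
def observed_odds_ratios(p_g, p_e, or_g, or_e, or_ge, sensitivity, specificity):
    """Expected observed odds ratios when a binary exposure E is
    misclassified nondifferentially and genotype G is error free.

    p_g, p_e: frequencies of G and E among controls (assumed independent).
    or_g, or_e, or_ge: true ORs versus the G=0, E=0 reference group.
    """
    true_or = {(0, 0): 1.0, (1, 0): or_g, (0, 1): or_e, (1, 1): or_ge}
    controls, cases = {}, {}
    for g in (0, 1):
        for e in (0, 1):
            prob = (p_g if g else 1 - p_g) * (p_e if e else 1 - p_e)
            controls[(g, e)] = prob
            # Case distribution up to a constant that cancels in the ORs.
            cases[(g, e)] = prob * true_or[(g, e)]

    def misclassify(cells):
        observed = {}
        for g in (0, 1):
            exposed, unexposed = cells[(g, 1)], cells[(g, 0)]
            observed[(g, 1)] = sensitivity * exposed + (1 - specificity) * unexposed
            observed[(g, 0)] = (1 - sensitivity) * exposed + specificity * unexposed
        return observed

    obs_controls, obs_cases = misclassify(controls), misclassify(cases)

    def odds_ratio(cell):
        return (obs_cases[cell] / obs_cases[(0, 0)]) / (
            obs_controls[cell] / obs_controls[(0, 0)]
        )

    obs_g, obs_e, obs_ge = odds_ratio((1, 0)), odds_ratio((0, 1)), odds_ratio((1, 1))
    return {
        "or_g": obs_g,
        "or_e": obs_e,
        "or_ge": obs_ge,
        "interaction": obs_ge / (obs_g * obs_e),  # multiplicative parameter
    }


if __name__ == "__main__":
    # Hypothetical scenario: true multiplicative interaction of 3.0.
    result = observed_odds_ratios(0.3, 0.3, 1.0, 2.0, 6.0,
                                  sensitivity=0.8, specificity=0.9)
    for name, value in result.items():
        print(f"{name}: {value:.2f}")
```

In this hypothetical scenario the multiplicative interaction parameter is attenuated from 3.0 to about 2.1, while the odds ratio for genotype among subjects classified as unexposed moves away from its true value of 1.0 to about 1.3. This matches the point made above that attenuation of the interaction parameter does not carry over to main and subgroup effects.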

CONCLUDING REMARKS

The field of molecular epidemiology is undergoing a transformational change with the recent incorporation of powerful genomic technology, which should continue to improve in its comprehensiveness, cost, and efficiency into the foreseeable future, and provide an unprecedented opportunity to understand the fundamental process of carcinogenesis. At the same time, large and high-quality case-control studies have been established with detailed exposure data and stored biological specimens; previously established cohorts with biologic samples are being followed up; and new cohort studies with biological samples are still being established, particularly in developing countries. The confluence of extraordinary technology and the availability of large epidemiological studies should ultimately lead to new preventive, screening, and treatment strategies. However, this will only be achieved if the field of molecular epidemiology adheres to the time-tested and fundamental epidemiological principles of high-quality study design, vigilant quality control, thoughtful data analysis and interpretation, and well-powered replication of important findings.

ACKNOWLEDGMENTS

This chapter has been adapted and updated from a book chapter, "Application of Biomarkers in Cancer Epidemiology," by Garcia-Closas et al. (7). We thank the other coauthors from the earlier chapter, Drs. Roel Vermeulen, Mark E Sherman, Lee E Moore, and Martyn T Smith, for their valuable contributions.

REFERENCES

1. National Research Council. Biological markers in environmental health research. Committee on Biological Markers of the National Research Council. Environ Health Perspect 1987; 74:3-9.
2. Rothman N, Wacholder S, Caporaso NE, et al. The use of common genetic polymorphisms to enhance the epidemiologic study of environmental carcinogens. Biochimica et Biophysica Acta 2001; 1471:C1-C10.
3. Schulte PA. Methodologic issues in the use of biologic markers in epidemiologic research. Am J Epidemiol 1987; 126(6):1006-1016.
4. Perera FP. Molecular cancer epidemiology: a new tool in cancer prevention. J Natl Cancer Inst 1987; 78(5):887-898.
5. Perera FP. Molecular epidemiology: on the path to prevention? J Natl Cancer Inst 2000; 92(8):602-612.
6. Toniolo P, Boffetta P, Shuker DEG, et al. Application of Biomarkers in Cancer Epidemiology. Lyon: IARC, 1997.
7. Garcia-Closas M, Vermeulen R, Sherman ME, et al. Application of biomarkers in cancer epidemiology. In: Fraumeni DSJF, ed. Cancer Epidemiology and Prevention. Third Edition. New York: Oxford University Press, 2006.
8. Aardema MJ, MacGregor JT. Toxicology and genetic toxicology in the new era of "toxicogenomics": impact of "-omics" technologies. Mutat Res 2002; 499(1):13-25.
9. Wang W, Zhou H, Lin H, et al. Quantification of proteins and metabolites by mass spectrometry without isotopic labeling or spiked standards. Anal Chem 2003; 75(18):4818-4826.
10. Hanash S. Disease proteomics. Nature 2003; 422(6928):226-232.
11. Baak JP, Path FR, Hermsen MA, et al. Genomics and proteomics in cancer. Eur J Cancer 2003; 39(9):1199-1215.
12. Sellers TA, Yates JR. Review of proteomics with applications to genetic epidemiology. Genet Epidemiol 2003; 24(2):83-98.
13. Staudt LM. Molecular diagnosis of the hematologic cancers. N Engl J Med 2003; 348(18):1777-1785.
14. Strausberg RL, Simpson AJ, Wooster R. Sequence-based cancer genomics: progress, lessons and opportunities. Nat Rev Genet 2003; 4(6):409-418.
15. Wacholder S, McLaughlin JK, Silverman DT, et al. Selection of controls in case-control studies. I. Principles. Am J Epidemiol 1992; 135(9):1019-1028.
16. Breslow NE, Day NE. Design Considerations. In: Breslow NE, Day NE, eds. Statistical Methods in Cancer Research Volume II. The Design and Analysis of Cohort Studies. Lyon: IARC Press, 1987.
17. Rothman KJ, Greenland S. Modern Epidemiology. Philadelphia: Lippincott-Raven, 1998.
18. Schulte PA, Perera FP, Toniolo P, et al. Transitional studies. In: Application of Biomarkers in Cancer Epidemiology. Lyon, France: IARC Scientific Publications, 1997:19-29.
19. Rothman N. Genetic susceptibility biomarkers in studies of occupational and environmental cancer: methodologic issues. Toxicology Letters 1995; 77(1-3):221-225.
20. Hulka BS, Margolin BH. Methodological issues in epidemiologic studies using biologic markers. Am J Epidemiol 1992; 135(2):200-209.
21. Hulka BS. ASPO Distinguished Achievement Award Lecture. Epidemiological studies using biological markers: issues for epidemiologists. Cancer Epidemiol Biomarkers Prev 1991; 1(1):13-19.
22. Kim S, Lan Q, Waidyanatha S, et al. Genetic polymorphisms and benzene metabolism in humans exposed to a wide range of air concentrations. Pharmacogenet Genomics 2007; 17(10):789-801.
23. Lan Q, Zhang L, Li G, et al. Hematotoxicity in workers exposed to low levels of benzene. Science 2004; 306(5702):1774-1776.
24. Rothman N, Stewart WF, Schulte PA. Incorporating biomarkers into cancer epidemiology: a matrix of biomarker and study design categories. Cancer Epidemiol Biomarkers Prev 1995; 4(4):301-311.
25. Schatzkin A, Freedman LS, Schiffman MH, et al. Validation of intermediate end points in cancer research. J Natl Cancer Inst 1990; 82(22):1746-1752.
26. Schulte PA, Rothman N, Schottenfeld D. Design considerations in molecular epidemiology. In: Molecular Epidemiology: Principles and Practices. San Diego, CA: Academic Press, 1993:159-198.
27. Schatzkin A, Gail M. The promise and peril of surrogate end points in cancer research. Nat Rev Cancer 2002; 2(1):19-27.
28. Forrest MS, Lan Q, Hubbard AE, et al. Discovery of novel biomarkers by microarray analysis of peripheral blood mononuclear cell gene expression in benzene-exposed workers. Environ Health Perspect 2005; 113(6):801-807.
29. Lan Q, Zhang L, Shen M, et al. Polymorphisms in cytokine and cellular adhesion molecule genes and susceptibility to hematotoxicity among workers exposed to benzene. Cancer Res 2005; 65(20):9574-9581.
30. Bollati V, Baccarelli A, Hou L, et al. Changes in DNA methylation patterns in subjects exposed to low-dose benzene. Cancer Res 2007; 67(3):876-880.
31. Chen H, Li S, Liu J, et al. Chronic inorganic arsenic exposure induces hepatic global and individual gene hypomethylation: implications for arsenic hepatocarcinogenesis. Carcinogenesis 2004; 25(9):1779-1786.
32. Morla M, Busquets X, Pons J, et al. Telomere shortening in smokers with and without COPD. Eur Respir J 2006; 27(3):525-528.
33. Vermeulen R, Li G, Lan Q, et al. Detailed exposure assessment for a molecular epidemiology study of benzene in two shoe factories in China. Ann Occup Hyg 2004; 48(2):105-106.
34. Morton LM, Cahill J, Hartge P. Reporting participation in epidemiologic studies: a survey of practice. Am J Epidemiol 2006; 163(3):197-203.
35. Cox A, Dunning AM, Garcia-Closas M, et al. A common coding variant in CASP8 is associated with breast cancer risk. Nat Genet 2007; 39(3):352-358.
36. Easton DF, Pooley KA, Dunning AM, et al. Genome-wide association study identifies novel breast cancer susceptibility loci. Nature 2007; 447(7148):1087-1093.
37. Ahsan H, Rundle AG. Measures of genotype versus gene products: promise and pitfalls in cancer prevention. Carcinogenesis 2003; 24(9):1429-1434.
38. Wu X, Gu J, Spitz MR. Mutagen sensitivity: a genetic predisposition factor for cancer. Cancer Res 2007; 67(8):3493-3495.
39. Berwick M, Vineis P. Markers of DNA repair and susceptibility to cancer in humans: an epidemiologic review. J Natl Cancer Inst 2000; 92(11):874-897.
40. Spitz MR, Wei Q, Dong Q, et al. Genetic susceptibility to lung cancer: the role of DNA damage and repair. Cancer Epidemiol Biomarkers Prev 2003; 12(8):689-698.
41. Potter JD, Toniolo P, Boffetta P, et al. Logistics and design issues in the use of biological specimens in observational epidemiology. In: Application of Biomarkers in Cancer Epidemiology. Lyon, France: IARC Scientific Publications, 1997:31-37.
42. Hunter DJ, Toniolo P, Boffetta P, et al. Methodological issues in the use of biological markers in cancer epidemiology: cohort studies. In: Application of Biomarkers in Cancer Epidemiology. Lyon, France: IARC Scientific Publications, 1997:39-46.
43. Banks E, Meade T. Study of genes and environmental factors in complex diseases. Lancet 2002; 359(9312):1156-1157 (author reply 1157).
44. Burton P, McCarthy M, Elliott P. Study of genes and environmental factors in complex diseases. Lancet 2002; 359(9312):1155-1156 (author reply 1157).
45. Clayton D, McKeigue PM. Epidemiological methods for studying genes and environmental factors in complex diseases. Lancet 2001; 358(9290):1356-1360.
46. Wacholder S, Garcia-Closas M, Rothman N. Study of genes and environmental factors in complex diseases. Lancet 2002; 359(9312):1155 (author reply 1157).
47. Wacholder S. Practical considerations in choosing between the case-cohort and nested case-control designs. Epidemiology 1991; 2(2):155-158.
48. Franco EL. Statistical issues in human papillomavirus testing and screening. Clin Lab Med 2000; 20(2):345-367.
49. Welch HG, Black WC. Using autopsy series to estimate the disease "reservoir" for ductal carcinoma in situ of the breast: how much more breast cancer can we find? Ann Intern Med 1997; 127(11):1023-1028.
50. Morrison AS, Rothman KJ, Greenland S. Screening. In: Modern Epidemiology, 2003:49-518.
51. Begg CB, Zhang ZF. Statistical analysis of molecular epidemiology studies employing case-series. Cancer Epidemiol Biomarkers Prev 1994; 3(2):173-175.
52. Yang Q, Khoury MJ, Sun F, et al. Case-only design to measure gene-gene interaction. Epidemiology 1999; 10(2):167-170.
53. Khoury MJ, Flanders WD. Nontraditional epidemiologic approaches in the analysis of gene-environment interaction: case-control studies with no controls! Am J Epidemiol 1996; 144(3):207-213.
54. Schmidt S, Schaid DJ. Potential misinterpretation of the case-only study to assess gene-environment interaction. Am J Epidemiol 1999; 150(8):878-885.
55. Albert PS, Ratnasinghe D, Tangrea J, et al. Limitations of the case-only design for identifying gene-environment interactions. Am J Epidemiol 2001; 154(8):687-693.
56. White JE. A two stage design for the study of the relationship between a rare exposure and a rare disease. Am J Epidemiol 1982; 115(1):119-128.
57. Cain KC, Breslow NE. Logistic regression analysis and efficient design for two-stage studies. Am J Epidemiol 1988; 128(6):1198-1206.
58. Chatterjee N, Chen Y, Breslow N. A pseudoscore estimator for regression problems for two phase sampling. J Am Stat Assoc 2003; 98:10.
59. Wacholder S, Hartge P, Struewing JP, et al. The kin-cohort study for estimating penetrance. Am J Epidemiol 1998; 148(7):623-630.
60. Chatterjee N, Shih J, Hartge P, et al. Association and aggregation analysis using kin-cohort designs with applications to genotype and family history data from the Washington Ashkenazi Study. Genet Epidemiol 2001; 21(2):123-138.
61. Holland NT, Smith MT, Eskenazi B, et al. Biological sample collection and processing for molecular epidemiological studies. Mutat Res 2003; 543(3):217-234.
62. Kleeberger CA, Lyles RH, Margolick JB, et al. Viability and recovery of peripheral blood mononuclear cells cryopreserved for up to 12 years in a multicenter study. Clin Diagn Lab Immunol 1999; 6(1):14-19.
63. Beck JC, Beiswanger CM, John EM, et al. Successful transformation of cryopreserved lymphocytes: a resource for epidemiological studies. Cancer Epidemiol Biomarkers Prev 2001; 10(5):551-554.
64. Hayes RB, Smith CO, Huang WY, et al. Whole blood cryopreservation in epidemiological studies. Cancer Epidemiol Biomarkers Prev 2002; 11(11):1496-1498.
65. Steinberg KK, Sanderlin KC, Ou CY, et al. DNA banking in epidemiologic studies. Epidemiologic Reviews 1997; 19(1):156-162.
66. Hansen TV, Simonsen MK, Nielsen FC, et al. Collection of blood, saliva, and buccal cell samples in a pilot study on the Danish nurse cohort: comparison of the response rate and quality of genomic DNA. Cancer Epidemiol Biomarkers Prev 2007; 16(10):2072-2076.
67. Garcia-Closas M, Egan KM, Abruzzo J, et al. Collection of genomic DNA from adults in epidemiological studies by buccal cytobrush and mouthwash. Cancer Epidemiol Biomarkers Prev 2001; 10(6):687-696.
68. Paynter RA, Skibola DR, Skibola CF, et al. Accuracy of multiplexed Illumina platform-based single-nucleotide polymorphism genotyping compared between genomic and whole genome amplified DNA collected from multiple sources. Cancer Epidemiol Biomarkers Prev 2006; 15(12):2533-2536.
69. Feigelson HS, Rodriguez C, Robertson AS, et al. Determinants of DNA yield and quality from buccal cell samples collected with mouthwash. Cancer Epidemiol Biomarkers Prev 2001; 10(9):1005-1008.
70. Gunter EW, McQuillan G. Quality control in planning and operating the laboratory component for the Third National Health and Nutrition Examination Survey. J Nutr 1990; 120(suppl 11):1451-1454.

71. Kononen J, Bubendorf L, Kallioniemi A, et al. Tissue microarrays for high-throughput molecular profiling of tumor specimens. Nat Med 1998; 4(7):844-847.
72. Oyama T, Ishikawa Y, Hayashi M, et al. The effects of fixation, processing and evaluation criteria on immunohistochemical detection of hormone receptors in breast cancer. Breast Cancer (Tokyo, Japan) 2007; 14(2):182-188.
73. Goldstein NS, Ferkowicz M, Odish E, et al. Minimum formalin fixation time for consistent estrogen receptor immunohistochemical staining of invasive breast carcinoma. Am J Clin Pathol 2003; 120(1):86-92.
74. Jacobs TW, Prioleau JE, Stillman IE, et al. Loss of tumor marker-immunostaining intensity on stored paraffin slides of breast cancer. J Natl Cancer Inst 1996; 88(15):1054-1059.
75. Fergenbaum JH, Garcia-Closas M, Hewitt SM, et al. Loss of antigenicity in stored sections of breast cancer tissue microarrays. Cancer Epidemiol Biomarkers Prev 2004; 13(4):667-672.
76. Rhodes A, Borthwick D, Sykes R, et al. The use of cell line standards to reduce HER-2/neu assay variation in multiple European cancer centers and the potential of automated image analysis to provide for more accurate cut points for predicting clinical response to trastuzumab. Am J Clin Pathol 2004; 122(1):51-60.
77. De Marzo AM, Fedor HH, Gage WR, et al. Inadequate formalin fixation decreases reliability of p27 immunohistochemical staining: probing optimal fixation time using high-density tissue microarrays. Hum Pathol 2002; 33(7):756-760.
78. Garcia-Closas M, Lubin JH. Power and sample size calculations in case-control studies of gene-environment interactions: comments on different approaches. Am J Epidemiol 1999; 149(8):689-692.
79. Garcia-Closas M, Malats N, Silverman D, et al. NAT2 slow acetylation, GSTM1 null genotype, and risk of bladder cancer: results from the Spanish Bladder Cancer Study and meta-analyses. Lancet 2005; 366(9486):649-659.
80. Rothman N, Garcia-Closas M, Hein DW. Commentary: reflections on G. M. Lower and colleagues' 1979 study associating slow acetylator phenotype with urinary bladder cancer: meta-analysis, historical refinements of the hypothesis, and lessons learned. Int J Epidemiol 2007; 36(1):23-28.
81. Hunter DJ, Kraft P, Jacobs KB, et al. A genome-wide association study identifies alleles in FGFR2 associated with risk of sporadic postmenopausal breast cancer. Nat Genet 2007; 39(7):870-874.
82. Stacey SN, Manolescu A, Sulem P, et al. Common variants on chromosomes 2q35 and 16q12 confer susceptibility to estrogen receptor-positive breast cancer. Nat Genet 2007; 39(7):865-869.
83. Rothman N, Skibola CF, Wang SS, et al. Genetic variation in TNF and IL10 and risk of non-Hodgkin lymphoma: a report from the InterLymph Consortium. Lancet Oncol 2006; 7(1):27-38.
84. Garcia-Closas M, Rothman N, Lubin J. Misclassification in case-control studies of gene-environment interactions: assessment of bias and sample size. Cancer Epidemiol Biomarkers Prev 1999; 8(12):1043-1050.
85. Deitz AC, Garcia-Closas M, Rothman N, et al. Impact of misclassification in genotype-disease association studies: example of N-acetyltransferase 2 (NAT2), smoking, and bladder cancer. Proc Am Assoc Cancer Res 2000; 41:559.
86. Armstrong BK, White E, Saracci R, et al. Exposure measurement error and its effects. In: Principles of Exposure Measurement in Epidemiology. New York: Oxford University Press, 1992.