
15 Meta-Analysis and Pooled Analysis - Genetic and Environmental Data

Camille Ragin and Emanuela Taioli
University of Pittsburgh Cancer Institute and Graduate School of Public Health, Pittsburgh, PA, USA

Molecular Epidemiology of Chronic Diseases, edited by C. P. Wild, P. Vineis and S. Garte. © 2008 John Wiley & Sons, Ltd

15.1. INTRODUCTION

A large amount of data has been produced and published on biological markers measured in human subjects, but a common obstacle in reaching definite conclusions has been the lack of statistical power of the studies. To overcome this problem, it is customary to perform summaries of the published studies while new, larger studies are completed. Reviews of the evidence can be performed in two main ways: by completing a meta-analysis of published data, or a pooled analysis of individual data (both published and unpublished).

15.2. META-ANALYSIS

Meta-analysis provides summary estimates by combining individual results that were published independently. This approach increases power and produces a more accurate estimate, while reducing the possibility of false-negative results (Greenland 1987). One limitation is the impossibility of performing more refined analyses, such as dose-response and stratified analyses. While several guidelines and methodological papers have been published for clinical trials, the organization and summarization of data for observational studies (Blettner et al. 1999; Stroup et al. 2000) and for molecular genetic research (Bogardus et al. 1999) have been addressed less frequently.

Four steps can be identified when performing a meta-analysis (see Box 15.1):

1. Identification of the relevant studies.
2. Setting of the eligibility criteria for the inclusion and exclusion of the identified studies in the meta-analysis.
3. Abstracting the relevant data.
4. Analysis of the data, including formal statistical testing for heterogeneity and investigation of the reasons for heterogeneity if it exists.

Box 15.1 Four steps to performing a meta-analysis

1. Identify the relevant studies.
2. Set the eligibility criteria for the inclusion and exclusion.
3. Abstract the relevant data.
4. Analyse the data:
   (a) Generate summary estimates.
   (b) Test for heterogeneity.
   (c) Assess publication bias.

Database searching, eligibility criteria and data extraction

A bibliographic search (e.g. in MEDLINE or EMBASE) should be conducted to identify the studies of interest. Potentially relevant publications may be identified from the abstracts, and full-text versions should be obtained for a review. The next step is to define eligibility criteria for the meta-analysis, since not all studies can or should be included in a meta-analysis. This is to ensure reproducibility of the meta-analysis and minimize bias. If the eligibility criteria are well defined, any person should be able to reproduce the path followed for identifying studies to be included. Bias will be reduced by the systematic selection of studies, which should not be influenced by the knowledge of the study results or aspects of the study conduct. The basic considerations in defining eligibility criteria for a meta-analysis should include study design, years of publication, completeness of information in the publication, similarities of treatment and/or exposure, languages, and the choice among studies that have overlapping datasets.

Data extraction should include data on the relevant outcomes and the general characteristics of each study, such as sample size, source of the controls, and race or ethnicity of the study population.

Graphical summaries

The results for each of the studies included in the meta-analysis can be graphically displayed with the odds ratios (ORs) or risk ratios (RRs) and confidence intervals (CIs), using a Forest plot (Figure 15.1). Each study is represented by a square and a solid line. The square corresponds to the OR or RR. The size of each square corresponds to the contribution or weight of that particular study in the meta-analysis. Larger studies tend to contribute more to the meta-analysis than smaller studies. The solid horizontal line drawn through each square represents the study's 95% CI. Note that the 95% CI would contain the true underlying effect of the measured outcome 95% of the time if the study were repeated again and again. The solid vertical line (at OR = 1) represents no association. If the study's 95% CI crosses this line, then the effect of the measured outcome is not statistically significant (i.e. p > 0.05). The diamond corresponds to the summary estimate or combined OR for all the studies in the analysis.

Summary estimates and assessment of heterogeneity

The summary estimate provides the overall effect of the measured outcome by combining the data from all the studies included in the meta-analysis. A weighted average of the results of each study is used to calculate this summary estimate; the simple arithmetic average would be misleading. The size of the study must be taken into consideration when calculating the summary estimate. Larger studies have more weight than smaller studies because their results are less subject to chance. Fixed effects and random effects are two types of models used for calculating the summary estimate.

Fixed effects models consider that the variability between studies is due to random variation because of the size of the study. This means that if all the studies were large, they would yield the same results (i.e. the studies are homogeneous). Statistical methods which calculate the summary estimates based on the assumption of fixed effects include the Mantel-Haenszel method (Mantel and Haenszel 1959), the Peto method (Yusuf et al. 1985), general variance-based methods (Wolf 1986) and the CI methods (Greenland 1987; Prentice and Thomas 1987).

Random effects models consider that the observed variability between studies is due to distinct differences between the studies (i.e. the studies are heterogeneous). Heterogeneity can arise when there are differences in study design, lengths of follow-up or inclusion criteria of the study participants. The statistical methods described by DerSimonian and Laird (1986) calculate the summary estimates based on the assumption of random effects.

Although meta-analyses provide the opportunity to generate summary estimates of published studies, it is important to note that this summary estimate may not be appropriate when the included studies are heterogeneous. To establish whether the results are consistent between studies, reports of meta-analyses commonly present a statistical test of heterogeneity. The classical measure of heterogeneity between studies is the Q-statistic, where heterogeneity exists when p < 0.05. Another measure, the I² statistic, describes the percentage of variation across studies that is due to heterogeneity rather than chance (Higgins et al. 2003; Higgins and Thompson 2002). The I² ranges from 0% to 100%.
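As an illustration of the weighted average, the Q-statistic and the I² statistic described in this section, the following is a minimal sketch in Python using only the standard library. It assumes each study is summarized by a log odds ratio and its standard error, uses generic inverse-variance weighting for the fixed effects estimate (not the Mantel-Haenszel or Peto calculations), and uses the moment estimator of DerSimonian and Laird for the between-study variance. The function name `pool_log_odds_ratios` and the dictionary keys are hypothetical, chosen only for this example.

```python
import math


def pool_log_odds_ratios(log_ors, std_errs):
    """Pool study-level log odds ratios by inverse-variance weighting.

    Illustrative sketch only: inputs are log ORs and their standard errors.
    """
    k = len(log_ors)
    if k != len(std_errs) or k < 2:
        raise ValueError("need at least two studies with matching inputs")

    # Fixed effects: each study is weighted by the inverse of its variance,
    # so larger (more precise) studies contribute more.
    w = [1.0 / se ** 2 for se in std_errs]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sum_w

    # Q statistic: weighted squared deviations from the fixed effects estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))
    df = k - 1

    # I^2: percentage of variation across studies beyond chance.
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0

    # DerSimonian-Laird moment estimate of the between-study variance.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random effects: add tau^2 to every study variance before weighting.
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errs]
    sum_w_re = sum(w_re)
    random_ = sum(wi * yi for wi, yi in zip(w_re, log_ors)) / sum_w_re

    def as_or(estimate, total_weight):
        # Back-transform to the OR scale with an approximate 95% CI.
        half = 1.96 / math.sqrt(total_weight)
        return (math.exp(estimate),
                math.exp(estimate - half),
                math.exp(estimate + half))

    return {
        "fixed": as_or(fixed, sum_w),      # (OR, lower, upper)
        "random": as_or(random_, sum_w_re),
        "q": q, "df": df, "i2": i2, "tau2": tau2,
    }


if __name__ == "__main__":
    # Two hypothetical studies with equal precision and different effects.
    result = pool_log_odds_ratios([0.0, 1.0], [0.5, 0.5])
    print(result)
```

The p-value for Q is not computed here because the standard library has no chi-square distribution; in practice Q would be compared with a chi-square distribution on `df` degrees of freedom using a statistics package.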

An I² value of 0% indicates that there is no heterogeneity between the studies in the meta-analysis. A fixed-effects model should be used to calculate the summary estimate when no heterogeneity is observed between studies, while a random-effects model should be used when heterogeneity across studies is observed (Normand 1999). When differences among studies exist, it is the task of the meta-analyst to determine the sources of these differences. The reporting of the random effects estimate does not remedy the problem and can sometimes conceal the fact that the summary estimate or fitted model is a poor summary of the data (due to heterogeneity). Sources of heterogeneity can be examined by stratification of the studies by ethnicity or control source (Raimondi et al. 2006), or detection method (Hobbs et al. 2006), but in some cases this may not be possible because the relevant information is missing in the publication.

Figure 15.1 Forest plot of studies included in the meta-analysis, and summary estimate for the combined studies. (Plot rows: studies A (1994), B (1996), C (2000), D (2002), E (2002), F (2003), G (2003), H (2004), I (2004), J (2006), and Combined.)

Assessing publication bias

Publication bias arises when studies with statistically significant results are more likely to be published and cited, and are preferentially published by scientific journals. A formal test for publication bias can be done by performing the test for asymmetry proposed by Egger et al. (1997). In the absence of bias, the plot will resemble a symmetrical inverted funnel (Figure 15.2) and the Egger's test value will be p > 0.05. Conversely, if there is bias, funnel plots will often be skewed and asymmetrical. In this case, the Egger's test value would be p < 0.05.

15.3. POOLED ANALYSIS

Another way to summarize results from observational studies is to pool individual records and re-analyse the data (Fenech et al. 1999; Friedenreich 2002; Taioli 1999). This approach (see Box 15.2) allows for the performance of statistical interaction tests, sub-group analyses, and refined dose-response curves (Friedenreich 1993). Guidelines and methods for pooling data from molecular epidemiological research have been published (Taioli and Bonassi 2002).

15.4. ISSUES IN POOLED ANALYSIS OF EPIDEMIOLOGICAL STUDIES INVOLVING MOLECULAR MARKERS

Choice of study design

There are two commonly used study designs: case-control and cross-sectional studies. Case-control studies are more popular when genetic
markers are involved, while cross-sectional studies are preferred when markers of exposure in otherwise healthy subjects are explored. Cohort studies may also be included in pooled analyses. Pooling data from cohorts has its advantages as well as limitations. For example, pooling cohort studies that involve diet assessed prior to development of the disease would limit recall and selection bias. A limitation of these studies, shared with case-control studies, is that they may have been designed and implemented differently, thus leading to heterogeneity between studies and in many cases to the inability to standardize data before pooling.

Box 15.2 Steps to performing a pooled analysis

1. Select the studies:
   (a) Develop clear inclusion and exclusion criteria.
   (b) Request the data.
   (c) Validate the data.
2. Standardize the data.
3. Analyse the data:
   (a) Generate pooled estimates.
   (b) Test for heterogeneity.
   (c) Assess publication bias - stratify according to published/unpublished datasets.
   (d) Conduct subgroup and stratified analyses.

Planning of the study

A pooled analysis of studies involving molecular markers in human populations needs to take into account all the aspects related to biological modelling or laboratory practice. The first step is locating and collecting a list of relevant studies, followed by the tabulation of the study design, laboratory methods, and analysis of the data. Human studies using biomarkers often represent the joint effort of researchers from different areas; therefore an advisory group of recognized experts in several disciplines would greatly benefit the success of the pooled analysis.

Selection of studies

A further step is the selection of studies to be included in the pooled analysis. At this point, clear inclusion and exclusion criteria need to be set and agreed upon. For example, it may be important to exclude convenience samples of healthy subjects, or subjects with precursor lesions (e.g. colon polyposis), with specific life-style habits (e.g. biological markers measured in alcoholic subjects), or with specific exposure (e.g. workers exposed to chemicals). For genotype studies, it must be evaluated whether phenotype studies can be used as a surrogate of genotype.

Figure 15.2 Funnel plot for the evaluation of publication bias (Begg's funnel plot with pseudo 95% confidence limits).

Data request

The collection of the original databases from the investigators can be performed through a communication network based on e-mail. Pre-compiled worksheets reporting the variables to be included in the study should be sent along with the invitation to participate in the pooled analysis. This will encourage the investigators to contribute data according to a common framework. All the data received and included in the dataset must be anonymous.

Evaluation of the validity of the study

Ad hoc information collected from all the participants, in order to characterize each study, is very useful. This information is essential to evaluate how reliable the evidence provided by single studies is. A minimum of epidemiological information should be collected, e.g. type of controls in case-control studies, incident vs. prevalent cases, availability of histological diagnosis, response rate for both cases and controls, and percentage of subjects in which the biological marker was measured. Information dealing with the sensitivity and reproducibility of the assay, such as the time of sample collection and storage, the length of storage, the reagents used, the scoring criteria, as well as data on inter- and intra-subject and -laboratory variability, should also be collected. It is possible to apply a quality scoring system to studies included in the analysis. This method, used in some meta-analyses of randomized clinical trials, has many potential advantages and could be tentatively applied to biomarkers, where an agreement on the quality of technical procedures seems easier to reach.

Data standardization

An important issue when pooling data from different studies is the degree of standardization of the biomarker measure across studies. Studies involving genetic markers suffer fewer problems of standardization of the laboratory technique, since the most common method (PCR) is fairly universal. However, new techniques are now used that require comparison with the common PCR methods. The outcome of genetic testing is also commonly defined as either binary or in three levels (wild-type, heterozygous, homozygous variant). Comparison of levels of biomarkers of exposure is more difficult, since the protocols, equipment and methods may be different among studies. An example of this issue can be found in a re-evaluation of 25 cytogenetic studies (Bonassi et al. 1996). A possible solution is to obtain a post-hoc data standardization by asking the participating investigators to test their method against a common gold standard, so that some correcting factor can be produced. Often the laboratories involved are not available for further testing; an alternative approach is to ask them to provide detailed information on the method used, so that differences and similarities across laboratories can be better evaluated.

Another aspect to consider is the standardization of the collection of risk factors. In a pooled analysis on micronuclei (the HUMN study; Bonassi et al. 2001), few variables were included in the overall statistical analysis, i.e. age, sex and country of origin, because several other factors, such as drinking, medical treatments and occupational exposure, were collected but not standardized and their categories were not comparable across studies. A common approach in pooled analysis is to collect a minimum set of epidemiological variables, for which a standard can be set and compliance to the standard can be expected from most of the investigators.

The standardization of the outcome variable is also relevant. A requirement is that comparable measurements reflect the same biological events, but this assumption is quite often not verified (Bonassi et al. 2001). Two approaches are possible. One involves the categorization of data within each laboratory before data are assembled. This approach was used by the European Study Group on Cytogenetic Biomarkers and Health (ESCH) (Bonassi et al. 2000; Hagmar et al. 1998) and it is generally used in cohort studies linking cytogenetic biomarkers to cancer (Bonassi et al. 2000; Hagmar et al. 1998; Smerhovsky et al. 2001, 2002). Another solution is to apply statistical modelling for correlated data. Each laboratory is considered as a cluster of events, and the within-cluster correlation is taken into account when the association between biomarker and outcome is evaluated (Bonassi et al. 1999; Goldstein 1995; Rasbash et al. 1999).

Heterogeneity among studies

The presence of heterogeneity among studies, and the distinction between heterogeneity due to the different distribution of risk factors in the study population and heterogeneity attributable to external variables (study design, laboratory protocols, etc.), need to be addressed. Some factors that determine heterogeneity can be efficiently studied with a pooled analysis, where major sources of variability in protocols and in scoring can be identified (Bonassi et al. 2001).

The first step is to run a univariate analysis of the biomarker distribution by study or laboratory to assess the distribution of relevant variables and identify outliers. This approach should provide information on whether to include a dataset, and in certain circumstances to rank the datasets by quality. Information on the time of sample collection and storage, length of storage, reagents used, scoring criteria, as well as inter- and intra-subject and laboratory variability are useful elements to judge the possible source of variability. Inter-study variability can be accounted for in the statistical analysis; however, the removal of outliers with no obvious possible explanation should be considered.

Publication bias

Pooled analysis allows for the collection of unpublished studies, of unpublished data from published studies or of pilot studies. Sometimes, epidemiological studies involving biomarkers remain unpublished for reasons that do not necessarily relate to less rigorous study design, analysis or laboratory work, but because of the quick decrease in interest for markers that are less fashionable, lack of funding or difficulty in recruiting subjects, etc. Pooling these data gives the opportunity to collect data on rare cancers or on small, geographically isolated populations, with the potential of raising new research questions. Differences between published and unpublished data should always be tested by comparing the frequency of relevant variables in the two groups before pooling data. In the Genetic Susceptibility to Environmental Carcinogens (GSEC) study, for example, we assessed the frequency of the polymorphisms in the control population coming from published studies vs. unpublished studies, and no differences were observed (Garte et al. 2001).

Any study that pools original data should establish inclusion criteria beforehand. For example, the GSEC study uses the following criteria to include unpublished studies: (a) score the quality of the study according to a pre-defined list of variables including study design, choice of controls, participation rate of cases and controls, laboratory methods, and reasons for not publishing the data; and (b) check the frequency of both genetic and epidemiological variables to assess any unusual pattern, such as genotype frequencies not in Hardy-Weinberg equilibrium. In addition, a list of studies included in the analysis and studies not included because of lack of interest of the original investigator should always be generated, to assess inclusion bias.

Another option is to identify a committee of experts to evaluate the quality of the assays and the reliability of the information. Among basic criteria that should be considered for including a database, size is one of the most relevant. Often biomarker studies are small because of costs and working time required by technical procedures; however, a certain number of samples processed is necessary to achieve good practical skill. An important issue is the use of the same population by several different laboratories, with the possibility of duplicating the same subjects when pooling. This is typical of studies involving biomarkers, where a population of interest, e.g. a case-control study, is shared by laboratories performing different assays. This has to be checked carefully by cross-checking IDs, ages, dates of birth if available, or the names of the authors on related publications.

Ethical issues

Ethical considerations should be made in all phases of pooling studies. A suitable strategy is to request the original investigators to send data to the coordinating centre without personal identifiers but with a numerical ID. A second ID can be centrally assigned to each individual when the data are received, and this ID will then be used in the analysis. This will make it impossible for whoever pools the data to identify and/or contact the subjects included in the dataset.

Usually pooled analysis involves studies conducted in different countries, which implies a variety of ethical regulations (see Chapter 22). The investigator should strictly adhere to all existing ethical and safety provisions applicable in the countries in which the research is carried out, and in no circumstances should data with personal identifications be sent from one country to another.

References

Blettner M et al. 1999. Traditional reviews, meta-analyses and pooled analyses in epidemiology. Int J Epidemiol 28: 1-9.
Bogardus ST Jr, Concato J, Feinstein AR. 1999. Clinical epidemiological quality in molecular genetic research: the need for methodological standards. J Am Med Assoc 281: 1919-1926.
Bonassi S et al. 1996. Is human exposure to styrene a cause of cytogenetic damage? Re-analysis of the available evidence. Biomarkers 1: 217-225.
Bonassi S et al. 1999. Analysis of correlated data in human biomonitoring studies. The case of high sister chromatid exchange frequency cells. Mutat Res 438: 13-21.
Bonassi S et al. 2000. Chromosomal aberrations in lymphocytes predict human cancer independently of exposure to carcinogens. European Study Group on Cytogenetic Biomarkers and Health. Cancer Res 60: 1619-1625.
Bonassi S et al. 2001. HUman MicroNucleus project: international database comparison for results with the cytokinesis-block micronucleus assay in human lymphocytes: I. Effect of laboratory protocol, scoring criteria, and host factors on the frequency of micronuclei. Environ Mol Mutagen 37: 31-45.
DerSimonian R, Laird N. 1986. Meta-analysis in clinical trials. Control Clin Trials 7: 177-188.
Egger M et al. 1997. Bias in meta-analysis detected by a simple, graphical test. Br Med J 315: 629-634.
Fenech M et al. 1999. The HUman MicroNucleus Project - an international collaborative study on the use of the micronucleus technique for measuring DNA damage in humans. Mutat Res 428: 271-283.
Friedenreich CM. 1993. Methods for pooled analyses of epidemiologic studies. Epidemiology 4: 295-302.
Friedenreich CM. 2002. Commentary: improving pooled analyses in epidemiology. Int J Epidemiol 31: 86-87.
Garte S et al. 2001. Metabolic gene polymorphism frequencies in control populations. Cancer Epidemiol Biomarkers Prevent 10: 1239-1248.
Goldstein H. 1995. Multilevel Statistical Models. Halsted: New York.
Greenland S. 1987. Quantitative methods in the review of epidemiologic literature. Epidemiol Rev 9: 1-30.
Hagmar L et al. 1998. Chromosomal aberrations in lymphocytes predict human cancer: a report from the European Study Group on Cytogenetic Biomarkers and Health (ESCH). Cancer Res 58: 4117-4121.
Higgins JP, Thompson SG. 2002. Quantifying heterogeneity in a meta-analysis. Stat Med 21: 1539-1558.
Higgins JP et al. 2003. Measuring inconsistency in meta-analyses. Br Med J 327: 557-560.
Hobbs CG et al. 2006. Human papillomavirus and head and neck cancer: a systematic review and meta-analysis. Clin Otolaryngol 31: 259-266.
Mantel N, Haenszel W. 1959. Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst 22: 719-748.
Normand SL. 1999. Meta-analysis: formulating, evaluating, combining, and reporting. Stat Med 18: 321-359.
Prentice RL, Thomas DB. 1987. On the epidemiology of oral contraceptives and disease. Adv Cancer Res 49: 285-401.
Raimondi S et al. 2006. Meta- and pooled analysis of GSTT1 and lung cancer: a HuGE-GSEC review. Am J Epidemiol 164: 1027-1042.
Rasbash J et al. 1999. MLwiN version 2.0, Multilevel Models Project. Institute of Education, University of London.
Smerhovsky Z et al. 2001. Risk of cancer in an occupationally exposed cohort with increased level of chromosomal aberrations. Environ Health Perspect 109: 41-45.
Smerhovsky Z et al. 2002. Increased risk of cancer in radon-exposed miners with elevated frequency of chromosomal aberrations. Mutat Res 514: 165-176.
Stroup DF et al. 2000. Meta-analysis of observational studies in epidemiology: a proposal for reporting. Meta-analysis Of Observational Studies in Epidemiology (MOOSE) group. J Am Med Assoc 283: 2008-2012.
Taioli E. 1999. International collaborative study on genetic susceptibility to environmental carcinogens. Cancer Epidemiol Biomarkers Prevent 8: 727-728.
Taioli E, Bonassi S. 2002. Methodological issues in pooled analysis of biomarker studies. Mutat Res 512: 85-92.
Wolf F. 1986. Meta-Analysis: Quantitative Methods for Research Synthesis. Sage: Newbury Park, CA.
Yusuf S et al. 1985. Beta blockade during and after myocardial infarction: an overview of the randomized trials. Prog Cardiovasc Dis 27: 335-371.