REVIEW

From DNA to RNA to disease and back: The ‘central dogma’ of regulatory disease variation

BarbaraE.Stranger* and Emmanouil T. Dermitzakis

The Wellcome Trust Sanger Institute,Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK * Correspondence to:Tel: þ 44 (0)1223 834244; Fax: þ 44 (0)1223 494919; E-mail: [email protected]

Date received (in revised form): 25th April 2006

Abstract Much of the focus of human disease genetics is directed towards identifying nucleotide variants that contribute to disease phenotypes. This is acomplex problem, often involving contributions from multiple loci and their interactions, as well as effects due to environmental factors. Although some diseases with agenetic basis are caused by nucleotide changes that alter an amino acid sequence,inother cases, disease risk is associated with alteredgene regulation. This paper focuses on how studies of expression variation might complement disease studies and provide crucial links between genotype and phenotype.

Keywords: , human disease,linkage mapping, association mapping

Introduction disease creates new directions for disease research and is an important step towards improving human health. Understanding thecauses of human disease is one of the most Approaches to identifying the involved in complex fundamental goals of modernmedicine.Individuals differ with disease can be generally grouped into twocategories: candidate respect to disease susceptibility,disease progression and gene studies and linkage/association studies. Candidate gene effectiveness of treatment. Identifying the factorscontributing studies use knowledge about the biology of adisease,and about to these differences, and elucidating their interactions as they genes in physiologically or biochemically relevant pathways, contribute to aspectsofdisease phenotype,isaprecursor to and attempt to correlate genetic variation at these‘candidate improved prevention, detection and treatment of disease. genes’ with disease phenotype.Unfortunately,for most Much of the understandingofhumandisease derives from diseases, this type of information is not available or complete the study of those diseases that segregate in families in a enoughtoprove widely useful, and including them in some Mendelianfashion,where the causative variants and the genes of the analyses is more likely to increase the noise than it is in which they reside have been identifiedthrough classical to reduce the search space for the disease.Genome-wide family linkage approaches1 and through studies in large linkage studies and association analyses serve as alternative pedigrees and in isolated populations based on founder approaches to surveying the contribution to disease of genetic effects.2 The vast majority of common diseases exhibitamore variants located anywhere in the genome.The genome-wide complex mode of inheritance,however,aggregating in aspect means that these studies do not requireany apriori families but rarely exhibiting Mendelianinheritance.Examples hypothesis that aparticular region is involved, although of diseases of this type include diabetes, obesity,schizophrenia predictions about the potential effect of specific variants and asthma. Understandingofthese ‘complex’ diseases is (eg non-synonymous singlenucleotide polymorphisms[SNPs]) improving, although still limited, but it is clear that genetic can be incorporated in the models.Inthis respect, these variationplays an important role in susceptibility to disease,for approaches are unbiased. Family-based linkage studies entail exampleinautoimmune and infectious diseases.3 Most identifying genetic variants in families that co-segregate with complex disease is thought to be caused by the combined disease more often than would be expected by chance.In effect of genetic variants at afew loci or multiple loci, each general, linkage studies have achieved limitedsuccess in iden- with only modestfunctional effects on susceptibility. tifying genomic regionsinvolved in complex disease,inpart Additional roles areplayedbyenvironmental factorsand their because they are underpowered to detect moderate genetic interactions. Mapping thegenomic regions contributing to effects. Furthermore,because identification of aregion or

q HENRYSTEWART PUBLICATIONS 1473–9542. HUMANGENOMICS .VOL 2. NO 6. 383–390 JUNE 2006 383 REVIEWReview Stranger and Dermitzakis

regions associated with the disease or trait requires identifying Ta ble 1. Genes with non-coding variants affecting disease. those alleles that segregate with the disease in families, which in Gene Disease Reference turndependsonrecombinationwithin the families, it can be difficult to narrowaregion exhibiting significant linkage. CAT Hypertension 12 An alternativemethodology is to performassociation analysis, CCR5 HIV-1 progression and 13,14 which looks for correlation of genetic variants with aspects of transmission phenotype,but does not requireapedigree structurefor the individuals. Association analyses aremore powerfulfor the CD209 Dengue fever 15 detection of common disease alleles with small to modest CRP Cardiovascular disease risk 16 effects,4,5 and increasingly are being used successfully in studies to identify genes contributing to disease. 6,7 CTLA4 Autoimmune disease 9 Although many phenotypic differences among individuals DARC Malaria susceptibility 10 are attributable to variants in codingDNA,8 variants in non- coding DNA can have profoundeffects on phenotypes, DAT1 Attention deficit hyperactivity 17 including disease phenotypes. For example,regulatoryvariants disorder affecting transcription initiation, splicing, RNA stability and FCRL3 Rheumatoid arthritis and 18 translational efficiency areknown to playroles in conditions autoimmune disease including autoimmune disease ( CTLA49 ), malaria ( DARC10), variouscancers(SMYD311)and other examples (shown in GSK3B Parkinson’sdisease 19 Table 1). In studies of suchdiseases, gene expression mayserve IGFBP3 Breast cancer 20 as an intermediate phenotype between disease phenotype and genotype. 30,31 Gene expression, or mRNAlevels, can be INS Type Idiabetes 21 modulated by variants in coding or non-coding DNA IPF1 Type II diabetes 22 (eg transcription factorsorbinding sites within promoters). Whole-genomeassociation studies of gene expression MMP-1 Breast cancer progression 23 (expression quantitativetrait loci[eQTL] mapping) may MMP-3 Coronaryarterydisease 24,25 generate hypotheses for disease susceptibility by identifying those regions of the genome with functional effects on gene NFKB1 Inflammatorybowel disease 26 expression. These might then serveascandidate regions for RET Hirschsprung’sdisease 27,28 evaluating for association with disease phenotypes, as is discussed below. SMYD3 Colorectal cancer,breast cancer, 11 hepatocellular carcinoma TNF Malaria 29 Resources for genome-wide analysis An efficient approach to the study of humandisease different tissues becausegene expression is highly dependent benefits from the use of shared resources. For example,in on developmental and cellular context and,indeed, some order to performgenome-wide linkage or association analyses, diseases manifest their phenotypes only in specific tissues. suitable DNA markersare required. The humangenome is In addition, thecell perturbationsthat accompany the estimated to harbourmore than 10 million SNPs, present at establishment of cell lines suggest the study of geneexpression . 1per cent frequency, 32 and theseSNPs are located in primarytissues, although, clearly,the choice of sample throughout the genome in regions of coding and non-coding dependsonthe purpose,stage and feasibilityofastudy,the DNA. Publicly available databases of SNP alleles, assays and sample size required and its availability.Despite some genotypes areaccessible online (egdbSNP33 and HapMap34). shortcomings of cell lines as perfect proxies for the complete High-throughput genotyping platformsand reductions in set of humantissues, data can be collected on alarge scale with genotyping costs nowmakewhole-genome genotyping respect to sample size and reproducibilityand can provide feasible for largenumbers of samples. Gene expression can also candidates for further study in other samples. Currently, there be quantified in ahigh-throughput manner usingcommer- are relatively few data on gene expression across the diversity cially available microarrays, permitting the detection of small of healthyhuman tissues or across multiple individuals from differences in expression levelsamongsamples. different populations.These data from healthyindividuals will The establishment of cell lines creates resourcesthat can be provide important information on the range of naturally used by multiple research groups from around the world to occurring gene expression variationand will serve as abaseline surveyvarious cellular phenotypes. With respect to the study againstwhich to comparedisease-associated molecular of gene expression, it is desirable to establish cell lines from phenotypes.

384 q HENRYSTEWART PUBLICATIONS 1473–9542. HUMANGENOMICS .VOL 2. NO 6. 383–390 JUNE 2006 From DNAtoRNA to disease and back RevREVIEWiew

Statistical issues in genome-wide functional effect of the candidate disease variants. Below, the analysis twodirections will be explored in more detail. Although genome-wide association studies arethoughttohave more powerthan family-based linkage studies, they present Generating functional candidates using eQTL strong challenges in the formofstatisticalinterpretation. mapping For example,asimple genome-wide association study maytest Regionswith functional effects on gene expression can be hundreds of thousands of SNPs for association to aphenotype localised through the use of association mapping. Gene (or,moretypically,multiplephenotypes), and more expression, or mRNAlevel, is aquantitativephenotype that complicated modelsallowing for SNP–SNP interactions can be assayedinmultiple individuals. When the same vastly increase the already largenumber of statistical tests. With individuals are surveyedfor genetic variationatmarker loci, such alarge number of tests, the significance threshold must be for example SNPs,association analysistests whether variation adjusted to control for the number of false-positiveassociations. at each SNP can explain the observedphenotypic variation. Although procedures for multiple test correction exist —for The rationale behind this analysis is that markersthemselves example, Bonferroni correction, false discoveryrate35,36 and are either the causalvariant or are highly correlated (in LD) permutations of phenotypes relative to genotypes37 —it with the causalvariant. remains unclearwhich is the bestmethod to apply in this Associationmapping of gene expression variation has been 42 –45 46 –48 context.Itisalso not trivial to infer the biological significance successful in many species, including human, yeast, 49 –52 53 54,55 52 of an association from statistical significance,becauseallele mouse, rat, fish and maize. Together, these studies frequencies, variance of the phenotype,density of markersand provide several striking observationsrelated to the nature of linkage disequilibrium (LD) can have atremendous impact on functional variation influencing gene expression. First, the statistical significance inferred. variationingene expression levelsamongindividuals is Human genetic variation is structured into haplotypes, common —and much of that phenotypic variationhas a such that alleles at nearbyloci often showstrong statistical genetic basis. Much of the association signal is located cis-to 45,52,56 association with one another.Becauseofthis association, the gene of interest, although trans-actingvariants have known as LD,alarge region maycontain multiple SNPs also been observed. Hotspots of gene regulation (ie regions of exhibiting asignificantassociation withagiven phenotype. the genome influencing expression of several genes) have been Although this structure of humangenetic variation facilitates observedinsome, 44 but not all, studies. association mapping, it can complicate subsequent fine-scale There areseveral ways in which the study of the regulation mappingtonarrowthe associatedregion and locate the causal of gene expression can enhance disease studies as well as variant, as discussed below. Another concerninassociation narrowthe choice of candidate regions for disease association studies is the potentialfor false associations caused by studies. Where information exists about the contribution of population stratification,38 so care must be taken to reduce particular genes to adisease phenotype or susceptibility, these effects through appropriate experimental design and understanding theregulatorycontrol of those genes mayassist data analysis.39,40 in elucidating the complete set of effects. In addition, understanding the regulation of categories of genes,orgenes of aparticular pathway,may providetargetsfor further eQTL mapping follow-up in disease studies. It mayprove more time-efficient and cost-effectivetohavealistofmany potentialfunctional The interrogation of gene expression to facilitate thedesign variants located throughout the genome,however, and test and interpretation of disease association studies can be a them againstalarge number of diseases. Whole-genome powerful tool for theidentification of biologically functional eQTL studies can providealistofregionsofthe genome with variants and the interpretation of biological effects.41 By functional effects on the expression of known genes introducing aquantifiable and easily measurable biological (Figure 1a).SNPslocated within these regions can then serve outcome,itispossible to assess therelevance of statistical as candidates for disease association studies, much in the same significance and eliminatesome of the issues raised above. waythat non-synonymous SNPs areoften considered because The use of gene expression variation in the context of disease of their potential functional effect. There areseveral advan- mappingcan be viewed in twoways(Figure 1). On the one tages to this type of targeted approach over awhole-genome hand, hypotheses can be generated by discovering functional scan. First, because the number of SNPs to be genotyped in variationusingeQTL mappingand subsequently testingthose each individual is reduced, many more individuals can be functional variants in large case-control samples. On the other surveyedinadisease study without vastly increasing costs. The hand, following adisease association study that has identified reduction in the number of markerstested can eliminate some several signals located within regions of non-coding DNA, of the problems of multiple test correction, more sensible eQTL mappingcan be used to interpret and dissectthe thresholds can be used and smaller effect variants can be

q HENRYSTEWART PUBLICATIONS 1473–9542. HUMANGENOMICS .VOL 2. NO 6. 383–390 JUNE 2006 385 REVIEW 386 Revie

AB w Mapping regulatory regions Disease association Casesand controls Healthy individuals Genotyping Gene expression survey q Whole-genome HE genotypes NR Whole-genome Whole-genome YS SNP expression TE genotypes phenotypes SNPs with allele frequency WA differences in cases and RT Association controls

PU analysis BL IC

AT Regulatory List of disease-associated

IO regions SNPs/regions NS 14

73 Diseaseassociation

–9 SNPs in coding regions SNPs in non-coding

542 Focused set or near genes regions

. of regulatory HU SNPs MA Genotyping Candidate disease genes NG

EN Genotypes in large OM numbers of cases and Gene expression support

IC controls S .V

OL Identify regulatory Identify gene Assess regulatory

2. region for the gene beingregulated potential of region

NO SNPs with allele frequency differences in cases and 6. controls 383–390

Identify gene being regulated JUNE

List of genes associated Strang 2006 with disease risk er Figure1. Flow charts demonstratingthe use of genome-wide gene expression studies in relation to disease studies. (A) Generating hypotheses for disease studies. and Expression quantitativetrait loci (eQTL) mapping studies identify variable regions of the genome with functional effects on gene expression. Single nucleotide polymorphisms within these functional regions can be candidates for involvement in disease. (B) Supporting disease association studies. Disease mapping studies often identify non-coding Dermit regions of the genome exhibiting significant association with disease. eQTL studies can provide alink to an associated gene by providing annotation of the function of that

non-coding region. zakis From DNAtoRNA to disease and back RevREVIEWiew

detected. Secondly,any significantassociationsdetected and that many of the significant peaks in an association analysis between SNPand disease phenotype provide both a will fall in regions devoid of genes. mechanism (gene regulation) and the identity of the affected Disease association studies often identify non-coding regions gene.Finally,the fact that potentialcausal regulatory of the genome exhibiting significantassociation withdisease. variants were initially discovered in healthyindividuals and The exploration of those non-coding regions will benefit from subsequently have been associated withdisease means that the surveyofgene expression variationand howitrelates to such variants are common and arelikely to contribute genetic variation (eQTLmapping). For anydisease-associated significantly to the disease risk of the population. non-coding region (eg from acase-control study), it is possible The methodology above carries the risk of focusing only to test whether the disease-associated SNPs and haplotypes on certain types of genomic variants, while it is known are also associated with geneexpression variationofnearby that much of genomefunction is still missing. Away to genes (as identifiedfromeQTL studies; Figure 2). This enables circumvent this problem is to enhance disease studies by conclusions to be drawn about the nature of the function of incorporating the data on functional regulatoryregions while the causal variant. For instance, if the same haplotype that using commercially available whole-genome SNP genotyping appearstoincrease the disease risk also appears to be associated chips in disease studies, in order to performthe association with high expression of anearbygene, it is possible to start analysisusing Bayesian methods that assign different prior makingsomeconnections between the biology of the affected probabilities to SNPs on the array. Under such ascenario, gene and the disease itself.Moreover, one can hypothesise SNPs located in regions with known functional effects on (and hopefully test) howlevels of expression of agenemight expression of specific genes —asidentifiedthrough eQTL affect disease risk. This simple connection between the two studies —would be assigned ahigher prior probability of types of study could providenot only the identityofthe being associated withaphenotype.Inaddition, one might gene that is linked to the disease,but also the consequence assign ahigher prior probabilitytoSNPs in known promoters, of genome variationthat linked the gene with the disease.It enhancers or transcription factors. Thus, one could focus on mayalso providesome clues about other candidates (upstream the effects of candidate variants without missing other transcriptional regulators, interacting proteinsetc). important signals. Another substantial advance of knowing Several studies illustrate the utility and validity of usinggene regulatoryvariants before performing agenome-wide expression variationfor disease fine mapping. Twoofthese association study is that one can correlate phenotypes and studies have focused on identifying functional nucleotide regulatorynetworksand utilise such information in the variationbyfocusing primarily on the regions surrounding statisticalmodelling of thedisease. each of aset of genes(cis-), but also considering other regions located trans-tothe genes.42,45 These studies showedthata large fraction of genes (10–20per cent) have significant variationthat affect their gene expression in cis,and in some Supporting genome-wide disease cases in trans.The regulatoryvariants that affectgene association studies: Narrowing on expression variationcan be mapped withthe sameresolution disease-associated non-coding signals as disease variants in genome-wide association studies because, in both cases,the resolution dependsonthe LD structure of Genome-wide association studies arenow increasing in the human populations. These studies, which allowthe frequency, 6,57 and although it would have been preferable to identification of regulatoryhaplotypes, need to be verified have identifiedall functional regulatoryvariants in advance, before functional experiments can be performed.The most investigatorswill be faced with the challenge of interpreting appropriate waytoperformafirst-pass verificationistotest some of the strongest association signals. Many of the associ- whether allelic imbalance in expression is correlated with ation studies have amulti-phase design, wherein afraction of heterozygosityinthe same SNPs as those that showed the SNPs with the topstatistical significance in the first phase genotypic association with gene expression. are genotyped in asubsequent phase in anew setofindivid- Even with this information, and thefact that the effect of a uals. The statistical exercise must eventuallygiveway to causal variant maybeknown to have an effect on gene biological interpretation, however, and the identification of regulation, it is still along wayfrombeing able to identifythe the causal variant will be necessary. Although most of the exact DNA variant that causes the regulatoryeffect and confirmed disease-causing variants arelocated in coding subsequent increased disease risk. This is astage where things regions,this observation is due to an ascertainment bias in the become complexfor many reasons. For example,althoughthe ability to predict the potentialfunctional consequences of genome-wide distribution of LD is quite variable,average LD nucleotide variation. As thehumangenome is composed of in the extends over large regions, which only , 3–5per cent codingDNA, and studies increasingly makes it challenging to fine map acausalvariant in many attribute function to non-codingDNA, it might be expected regions.Inthe bestcasescenario,associated SNPs would be that much of the disease-causing variation will be non-coding identifiedinaregion of very lowLD, thus reducingthe

q HENRYSTEWART PUBLICATIONS 1473–9542. HUMANGENOMICS .VOL 2. NO 6. 383–390 JUNE 2006 387 REVIEWReview Stranger and Dermitzakis

12 10 8 Disease association 6 4 2 0 050100 150 200 250 300

12 10 8 Gene expression association 6 Gene A 4 2 0 050100 150 200 250 300

12 10 8 Gene expression association 6 Gene B 4 2 0 050100 150 200 250 300

Gene AGene B Figure 2. The three panels show the signal of association (as 2 log p -value of individual single nucleotide polymorphism (SNP)–pheno- type associations) of genomic regions with disease,gene expression of gene Aand gene expression of gene B. From this plot, it is suggested that genetic variation that increases disease risk is also associated with gene expression variation of gene A(assuming that the associated SNPs and haplotypes arethe same). This probably indicates that the disease risk is aregulatoryeffect and that the amount of transcript (or ) of gene Aiscritical for the development of the disease.

number of potentialcausal variants to test subsequently.More disease and phenotypic variation. This, and other ongoing often, an associated region of approximately 10–20kilobases studies, will offer afirst-pass annotation of functional elements will be identified.45 Although fine mappinginapopulation in the human genome and will providethe framework for with reduced LD (eg Africans) mightassist in identifying detailed characterisationoffunctional variation. shorter associated regions,itisatthis stage where extensive If an established and confirmed association of aregion amountsofinformation about genome function arecrucial. with disease and gene expression variation exists, and there is The diversity of methodologies for large-scale interrogation of light annotation of the associated region for coding and the human genome for function is increasing; the resulting non-coding elements, it is possible to apply brute force information will be very important for prioritising which of approaches to identifythe specific DNA changes that are those associated DNA segments to focus on first. causal.Arecommended strategy is to performextensive resequencing of potentially functional segments of theregion in high and lowexpressing individuals. The number of Interpreting regulatoryvariation individuals required to be assayeddependsonthe magnitude of the functional effect and the predicted within-population The identification of the causal variant can benefit from frequency of the causalvariant. This can be assisted by initial incorporating information about genome function. Many powercalculations that allowprediction of what is likely to be studies to determine functionality within the humangenome identified, giventhe study design. Theoptimal approach sequence are nowinprogress usinghigh-throughput, seemstobetosample sequences from individuals at each of genome-wide methodologies. TheENCyclopedia Of DNA the twoends of the phenotypic (expression) distribution, and Elements (ENCODE) project is the best example58 of this then proceed inwards. type of study.The aim of this project is to attribute afunc- As soon as aset of genomic segments have been tional identity to each nucleotide of the humangenome.Inits resequenced, one shouldlook for variants that appear to pilot phase,1per cent of the humangenome (44 genomic have equalorbetter correlation with the phenotype than regions)has been studied extensively for function, interspecies that observedinthe initial association study.This can be conservation and populationgenetic variation. The compre- determined by genotyping all of the potentially functional hensiveanalysisofthese 44 regionswill provide important variants (identifiedinthe resequencing approach in asubset of clues for the pattern and structure of genome function and will the individuals) in the originalcomplete sample.Depending allowpredictions for the nature of variations behind complex on the strength of these correlations, and where the highly

388 q HENRYSTEWART PUBLICATIONS 1473–9542. HUMANGENOMICS .VOL 2. NO 6. 383–390 JUNE 2006 From DNAtoRNA to disease and back RevREVIEWiew

correlated variants are located, the appropriate approach 15. Sakuntabhai, A., Turbpaiboon, C.,Casademont, I. et al. (2005), ‘A variant should be adopted for directfunctional testing of causal in the CD209 promoter is associated with severity of dengue disease’, Nat. Genet. Vol. 37, pp.507–513. haplotypes. Such approaches can include reporter constructs, 16. Carlson, C.S.,Aldred, S.F. ,Lee,P.K. et al. (2005), ‘Polymorphisms within binding assays, RNAstability assays and chromatin modifi- the C-reactiveprotein (CRP) promoter region are associated with plasma cation assays usingall of the alternativehaplotypes. CRP levels’, Am. J. Hum. Genet. Vol. 77, pp.64–77. 17. VanNess, S.H., Owens, M.J.and Kilts, C.D. (2005), ‘The variablenumber of tandem repeats element in DAT1 regulates in vitrodopamine trans- porter density’, BMC Genet. Vol. 6, p. 55. Summary 18. Kochi, Y. ,Yamada, R., Suzuki, A. et al. (2005), ‘A functional variant in FCRL3, encoding Fc receptor-like3,isassociated with rheumatoid In this paper,some issues have been discussed that arise from arthritis and several autoimmunities’, Nat. Genet. Vol. 37, pp.478–485. the incorporation of gene expression variation data in disease 19. Kwok, J.B.,Hallupp,M., Loy, C.T. et al. (2005), ‘GSK3B polymorphisms studies. Theoverall message is that gene expression can greatly alter transcription and splicing in Parkinson’sdisease’, Ann. Neurol. Vo l. 58, assist the discovery of disease variants, as well as the inter- pp.829–839. 20. Al-Zahrani, A., Sandhu, M.S., Luben, R.N. et al. (2006), ‘IGF1 and IGFBP3 pretation of the biological effects of causal variants. Further tagging polymorphisms are associated with circulating levels of IGF1, exploration of gene expression variationinmore samples IGFBP3 and risk of breast cancer’, Hum. Mol. Genet. Vol. 15, pp.1–10. and more cell types will greatly enhance both our under- 21. Bennett, S.T.,Lucassen, A.M., Gough, S.C. et al. (1995), ‘Susceptibility to standingofphenotypic variation in humans and also the nature human type 1diabetes at IDDM2 is determined by tandem repeat variation at the insulin gene minisatellite ’, Nat. Genet. Vol. 9, pp.284–292. of regulatoryvariationand its impact on complex disease. 22. Karim, M.A., Wang, X., Hale,T.C.and Elbein, S.C. (2005), ‘Insulin promoter factor 1variation is associated with type 2diabetes in African Americans’, BMC Med. Genet. Vol. 6, p. 37. 23. Przybylowska, K., Kluczna, A., Zadrozny,M. et al. (2006), ‘Poly- References morphisms of the promoter regions of matrix metalloproteinases genes MMP-1 and MMP-9 in breast cancer’, Breast Cancer Res.Treat. Vol. 95, 1. Jimenez-Sanchez, G.,Childs, B. and Valle,D.(2001), ‘Human disease pp.65–72. genes’, Nature Vol. 409, pp.853–855. 24. Humphries, S.E., Luong, L.A., Talmud, P. J. et al. (1998), ‘The 5A/6A 2. Gulcher,J.R., Kong, A. and Stefansson, K. (2001), ‘The role of linkage polymorphism in the promoter of the stromelysin-1 (MMP-3) gene pre- studies for commondiseases’, Curr. Opin. Genet. Dev. Vol. 11, pp.264–267. dicts progression of angiographically determined coronaryarterydisease 3. Cooke, G.S. and Hill, A.V.(2001), ‘Genetics of susceptibility to human in men in the LOCATgemfibrozil study Lopid CoronaryAngiography infectious disease’, Nat. Rev.Genet. Vol. 2, pp.967–977. Tr ial’, Atherosclerosis Vol. 139, pp.49–56. 4. Risch, N. and Merikangas, K. (1996), ‘The futureofgenetic studies 25. Ye ,S., Eriksson, P. ,Hamsten, A. et al. (1996), ‘Progression of coronary of complex human diseases’, Science Vol. 273, pp.1516–1517. atherosclerosis is associated with acommon genetic variant of the human 5. Hirschhorn, J.N. and Daly,M.J.(2005), ‘Genome-wide association stromelysin-1 promoter which results in reducedgene expression’, J. Biol. studies for common diseases and complex traits’, Nat. Rev.Genet. Chem. Vol. 271, pp.13055–13060. Vol. 6, pp.95–108. 26. Borm, M.E., vanBodegraven, A.A., Mulder,C.J. et al. (2005), ‘A NFKB1 6. Klein, R.J., Zeiss, C.,Chew, E.Y. et al. (2005), ‘Complementfactor H promoter polymorphism is involved in susceptibility to ulcerative colitis’, polymorphism in age-related macular degeneration’, Science Vo l. 308, Int. J. Immunogenet. Vol. 32, pp.401–405. pp.385–389. 27. Emison, E.S., McCallion, A.S., Kashuk, C.S. et al. (2005), ‘A common 7. Ozaki, K., Ohnishi, Y. ,Iida, A. et al. (2002), ‘FunctionalSNPs in the sex-dependent mutation in aRET enhancer underlies Hirschsprung lymphotoxin-alpha gene that areassociated with susceptibility to disease risk’, Nature Vo l. 434, pp.857–863. myocardial infarction’, Nat. Genet. Vo l. 32, pp.650–654. 28. Grice,E.A., Rochelle,E.S., Green, E.D. et al. (2005), ‘Evaluation of the 8. Kasvosve, I., Delanghe,J.R., Gomo,Z.A. et al. (2000), ‘Transferrin RET regulatorylandscape reveals the biological relevance of aHSCR- polymorphism influences iron status in blacks’, Clin. Chem. Vol. 46, implicated enhancer’, Hum. Mol. Genet. Vol. 14, pp.3837–3845. pp.1535–1539. 29. Knight, J.C.,Udalova,I., Hill, A.V. et al. (1999), ‘A polymorphism that 9. Ueda, H., Howson, J.M., Esposito,L. et al. (2003), ‘Associationofthe affects OCT-1 binding to the TNF promoter region is associated with T- cell regulatorygene CTLA4 with susceptibility to autoimmune disease’, severemalaria’, Nat. Genet. Vol. 22, pp.145–150. Nature Vol. 423, pp.506–511. 30. Gottesman, I.I. and Gould, T.D. (2003), ‘The endophenotype concept in 10. To urnamille,C., Le VanKim, C.,Gane,P. et al. (1995), ‘Molecular psychiatry:Etymology and strategic intentions’, Am. J. Psychiatry Vol. 160, basis and PCR-DNAtyping of the Fya/fyb blood group polymorphism’, pp.636–645. Hum. Genet. Vol. 95, pp.407–410. 31. Wa tts, J.A., Morley, M., Burdick, J.T. et al. (2002), ‘Gene expression 11. Tsuge,M., Hamamoto,R., Silva, F. P. et al. (2005), ‘A variable number phenotype in heterozygous carriersofataxia telangiectasia’, Am. J. Hum. of tandem repeats polymorphism in an E2F-1 binding element in the 5 0 Genet. Vol. 71, pp.791–800. flanking region of SMYD3 is arisk factor for human cancers’, Nat. Genet. 32. Kruglyak, L. and Nickerson, D.A. (2001), ‘Variation is the spice of life’, Vol. 37, pp.1104–1107. Nat. Genet. Vo l. 27, pp.234–236. 12. Zhou, X.F., Cui, J.,DeStefano,A.L. et al. (2005), ‘Polymorphisms in the 33. http://www.ncbi.nlm.nih.gov/SNP/index.html promoter region of catalase gene and essential hypertension’, Dis.Markers 34. http://www.hapmap.org Vol. 21, pp.3–7. 35. Benjamini, Y. and Hochberg, Y. (1995), ‘Controlling the false discovery 13. McDermott, D.H., Zimmerman, P. A., Guignard, F. et al. (1998), ‘CCR5 rate —Apractical approach to multiple testing’, J. R. Stat. Soc.Ser.B promoter polymorphism and HIV-1 disease progression. Multicenter Methodol. Vol. 57, pp.289–300. AIDS CohortStudy (MACS)’, Lancet Vol. 352, pp.866–870. 36. Storey, J.D. and Tibshirani, R. (2003), ‘Statistical significance for 14. Kostrikis, L.G., Neumann, A.U., Thomson, B. et al. (1999), ‘A poly- genomewide studies’, Proc.Natl. Acad. Sci. USA Vol. 100, pp.9440–9445. morphism in the regulatoryregion of the CC-chemokine receptor 5 37. Doerge,R.W.and Churchill, G.A. (1996), ‘Permutation tests for multiple gene influences perinatal transmission of human immunodeficiency loci affecting aquantitativecharacter’, Genetics Vol. 142, pp.285–294. virus type 1toAfrican-American infants’, J. Virol. Vol. 73, 38. Cardon, L.R. and Palmer,L.J.(2003), ‘Population stratification and pp.10264–10271. spurious allelic association’, Lancet Vol. 361, pp.598–604.

q HENRYSTEWART PUBLICATIONS 1473–9542. HUMANGENOMICS .VOL 2. NO 6. 383–390 JUNE 2006 389 REVIEWReview Stranger and Dermitzakis

39. Pritchard, J.K., Stephens,M., Rosenberg, N.A. and Donnelly,P.(2000), 49. Cowles, C.R., Hirschhorn, J.N.,Altshuler,D.and Lander,E.S.(2002), ‘Associationmapping in structured populations’, Am. J. Hum. Genet. ‘Detection of regulatoryvariation in mouse genes’, Nat. Genet. Vol. 32, Vol. 67, pp.170–181. pp.432–437. 40. Tang, H., Quertermous, T.,Rodriguez, B. et al. (2005), ‘Genetic struc- 50. Doss, S.,Schadt, E.E., Drake, T.A. and Lusis, A.J.(2005), ‘Cis-acting ture, self-identified race/ethnicity,and confounding in case-control expression quantitative trait loci in mice’, Genome Res. Vol. 15, association studies’, Am J. Hum. Genet. Vol. 76, pp.268–275. pp.681–691. 41. Stranger,B.E. and Dermitzakis, E.T.(2005), ‘The genetics of regulatory 51. Sandberg, R., Yasuda, R., Pankratz, D.G. et al. (2000), ‘Regional and variation in the human genome’, Hum. Genomics Vol. 2, pp.126–131. strain-specific gene expression mapping in the adult mouse brain’, Proc. 42. Cheung, V. G.,Spielman,R.S., Ewens, K.G. et al. (2005), ‘Mapping Natl. Acad. Sci. USA Vol. 97, pp.11038–11043. determinants of human gene expression by regional and genome-wide 52. Schadt, E.E., Monks, S.A., Drake, T.A. et al. (2003), ‘Genetics of gene association’, Nature Vol. 437, pp.1365–1369. expression surveyedinmaize,mouse and man’, Nature Vol. 422, 43. Monks, S.A., Leonardson, A., Zhu, H. et al. (2004), ‘Genetic inheritance pp.297–302. of gene expression in human cell lines’, Am. J. Hum. Genet. Vol. 75, 53. Wa lker,J.R., Su, A.I., Self,D.W. et al. (2004), ‘Applications of arat pp.1094–1105. multiple tissue gene expression data set’, Genome Res. Vol. 14, 44. Morley, M., Molony, C.M., We ber,T.M. et al. (2004), ‘Genetic analysis of pp.742–749. genome-wide variation in human gene expression’, Nature Vo l. 430, 54. Oleksiak, M.F., Churchill, G.A. and Crawford, D.L. (2002), ‘Variation in pp.743–747. gene expression within and among natural populations’, Nat. Genet. 45. Stranger,B.E., Forrest, M.S., Clark, A.G. et al. (2005), ‘Genome-wide Vol. 32, pp.261–266. associationsofgene expression variation in humans’, PLoS Genet. Vol. 1, 55. Oleksiak, M.F., Roach, J.L. and Crawford, D.L. (2005), ‘Natural variation p. e78. in cardiacmetabolism and gene expression in Fundulus heteroclitus’, Nat. 46. Brem, R.B.and Kruglyak, L. (2005), ‘The landscape of genetic com- Genet. Vol. 37, pp.67–72. plexity across 5,700 gene expression traits in yeast’, Proc.Natl. Acad. Sci. 56. Ya n, H., Yu an, W. ,Velculescu, V. E. et al. (2002), ‘Allelic variation in USA Vol. 102, pp.1572–1577. human gene expression’, Science Vol. 297, pp.1143. 47. Brem, R.B., Yvert, G.,Clinton, R. and Kruglyak, L. (2002), ‘Genetic 57. Herbert, A., Gerry,N.P., McQueen, M.B. et al. (2004), ‘A common dissection of transcriptional regulation in budding yeast’, Science Vol. 296, genetic variant is associated with adult and childhood obesity’, Science pp.752–755. Vol. 312, pp.279–283. 48. Yvert, G.,Brem, R.B., Whittle,J. et al. (2003), ‘Trans-acting regulatory 58. The ENCODE Project Consortium (2004), ‘The ENCODE variation in Saccharomyces cerevisiae and the role of transcription factors’, (ENCyclopedia Of DNA Elements) Project’, Science Vol. 306, Nat. Genet. Vo l. 35, pp.57–64. pp.636–640.

390 q HENRYSTEWART PUBLICATIONS 1473–9542. HUMANGENOMICS .VOL 2. NO 6. 383–390 JUNE 2006