<<

Nature Reviews Genetics | AOP, published online 11 June 2013; doi:10.1038/nrg3461 REVIEWS

GENOME-WIDE ASSOCIATION STUDIES Pleiotropy in complex traits: challenges and strategies

Nadia Solovieff1,2,3, Chris Cotsapas4,5, Phil H. Lee1,2,3, Shaun M. Purcell1,2,3,6 and Jordan W. Smoller1,2,3 Abstract | Genome-wide association studies have identified many variants that each affects multiple traits, particularly across autoimmune diseases, and neuropsychiatric disorders, suggesting that pleiotropic effects on human complex traits may be widespread. However, systematic detection of such effects is challenging and requires new methodologies and frameworks for interpreting cross- results. In this Review, we discuss the evidence for pleiotropy in contemporary genetic mapping studies, new and established analytical approaches to identifying pleiotropic effects, sources of spurious cross-phenotype effects and study design considerations. We also outline the molecular and clinical implications of such findings and discuss future directions of .

1Center for Human Genetics In the past 7 years, a wave of genome-wide association CP effects in GWASs mirror epidemiological obser- Research, Massachusetts studies (GWASs) has identified more than 8,500 vations of shared and comorbidity. For General Hospital, 185 Cambridge Street, Boston, genome-wide-significant associations with more than 350 example, twin and family studies have long provided Massachusetts 02114, USA. human complex traits, including susceptibility to a wide evidence for genetic correlations among diseases (such 2Department of Psychiatry, variety of diseases1. An interesting observation has as major depressive disorder and generalized anxiety Harvard Medical School, been that many genetic loci appear to harbour variants disorder10, or rheumatoid arthritis and systemic lupus 2 West, Room 305, 11 401 Park Drive, Boston, that are associated with multiple, sometimes seemingly erythematosus ), suggesting a role for pleiotropic Massachusetts 02215, USA. distinct traits, and such associations are termed cross- genetic effects. In addition, the co‑occurrence of multi- 3Stanley Center for phenotype (CP) associations. CP associations have ple diseases in the same individual (for example, type 1 Psychiatric Research, been identified in several disease areas. Examples diabetes and autoimmune thyroid disease12) also point Broad Institute of MIT and Harvard, 7 Cambridge Center, include: phosphatase non- to shared genetic causes. Cambridge, Massachusetts type 22 (PTPN22) for immune-related disorders, such In some cases, the same variants show association 02142, USA. as rheumatoid arthritis2, Crohn’s disease3, systemic with multiple traits; in other cases, although the same 4Departments of Neurology lupus erythematosus4 and type 1 diabetes5; the tel- overall region is implicated, distinct nearby mark- and Genetics, Yale University School of Medicine, Yale omerase reverse transcriptase (TERT)–CLPTM1‑like ers show signals of association with different traits. University School of Medicine, (CLPTM1L) for glioma, bladder and lung Distinguishing the associations that represent genu- 333 Cedar Street, New Haven, cancers6; and channel, voltage-dependent, inely shared effects of single variants from those that Connecticut 06520, USA. L-type, alpha 1C subunit (CACNA1C) for bipolar dis- represent the effects of colocalizing but independent 5 Medical and Population order and schizophrenia7. These CP associations high- variants is crucial, as they imply different notions of Genetics, Broad Institute of MIT and Harvard, 7 light that these traits share common genetic pathways pleiotropy and mechanistic models of shared function. Cambridge Center, Cambridge, and underscore the relevance of pleiotropy8,9 in human In this article, we define three types of such CP genetic Massachusetts 02142, USA. complex disease. The distinction between a CP associ- effects that occur when a genetic variant or is cor- 6 Division of Psychiatric ation and pleiotropy is important to define. A CP asso- related with more than one trait: biological pleiotropy, , Mount Sinai School of Medicine, 1 Gustave ciation occurs when a genetic locus is associated with mediated pleiotropy and spurious pleiotropy. In brief, L. Levy Place, New York, more than one trait in a study, regardless of the under- biological pleiotropy refers to a genetic variant or gene New York 10029–6574, USA. lying cause for the observed association. Pleiotropy that has a direct biological influence on more than one Correspondence to J.W.S. occurs when a genetic locus truly affects more than . Mediated pleiotropy occurs when e‑mail: jsmoller@hms. harvard.edu one trait and is one possible underlying cause for one phenotype is itself causally related to a sec- doi:10.1038/nrg3461 an observed CP association (others are discussed ond phenotype so that a variant associated with the Published online 11 June 2013 below). first phenotype is indirectly associated with the second.

NATURE REVIEWS | GENETICS ADVANCE ONLINE PUBLICATION | 1

© 2013 Macmillan Publishers Limited. All rights reserved REVIEWS

Table 1 | CP associations in the literature Type Locus Result Refs SNP (same rs11209026 Crohn’s disease, ankylosing spondylitis, The minor (A) of rs11209026 is protective for 125–128 direction of risk) (IL23R) ulcerative colitis, psoriasis Crohn’s disease, ankylosing spondylitis, ulcerative colitis and psoriasis SNP (same rs6983267 Prostate and colorectal The G allele increases risk for prostate cancer and 23,24 direction of risk) (8q24) colorectal cancer SNP (different rs12720356 Crohn’s disease and psoriasis The G allele increases risk for Crohn’s disease and 128,129 direction of risk) (TYK2) decreases risk for psoriasis Gene (different DNAH11 LDL cholesterol and multiple myeloma rs12670798 is associated with LDL cholesterol 130,131 SNPs) and rs4487645 is associated with multiple myeloma Gene (different FTO BMI and melanoma rs8050136 is associated with body mass index 17,18 SNPs) and rs16953002 is associated with melanoma Region (different 9q21.3 Coronary artery disease, glioma, intracranial rs4977574 is associatied with coronary artery 19–22 SNPs) aneurysm disease, rs4977756 with glioma, rs1333040 with intracranial aneurysm Copy number 16p2.11 , , , CNV duplication increases risk for all five disorders 26 variation duplication developmental delay, congenital malformations Copy number 7q11.23 Autism and Williams–Beuren syndrome CNV deletion causes Williams–Beuren syndrome and 132,133 variation de novo CNV duplication increases risk for autism Pathway Immune cell Autoimmune thyroid disease, coeliac disease, in this pathway have been implicated across 34 signalling Crohn’s disease, rheumatoid arthritis, systemic six diseases lupus erythematosus, T1D Polygenic scores – Schizophrenia and Schizophrenia and bipolar disorder share genetic 28 factors that increase risk to both disorders Genetic – T2D and Positive between T2D and 41 correlation hypertension suggests that shared genetic factors increase risk for both traits BMI, body mass index; CNV, copy number variant; CP, cross-phenotype; DNAH11, dynein, axonemal, heavy chain 11; FTO, fat mass and obesity associated; IL23R, interleukin 23 receptor; LDL, low-density lipoprotein; SNP, single-nucleotide ; T1D, type 1 diabetes; T2D, type 2 diabetes; TYK2, tyrosine kinase 2. This table provides some examples of different types of observed CP effects. These are illustrative examples and are not exhaustive; many additional CP associations have been published.

Spurious pleiotropy encompasses various sources of be conservative. Nonetheless, a startling level of overlap Genome-wide association bias that cause a genetic variant falsely to appear to be has been observed. studies (GWASs). Studies in which associated with multiple phenotypes. A recent evaluation of genome-wide-significant hundreds of thousands (or Here, we first review evidence of CP associations in single-nucleotide polymorphisms (SNPs) listed in the millions) of genetic markers are the literature and the underlying causal models that they National Human Genome Research Institute (NHGRI) tested for association with a imply. We next outline the analytical strategies that are Catalogue of Published Genome-Wide Association phenotypic trait; they are an required for detecting CP effects, particularly methods Studies found that 4.6% of SNPs and 16.9% of genes have unbiased approach to survey 13 the entire genome for that can be readily applied to existing GWAS data sets, and CP effects . These are underestimates as they rely on disease-associated regions how the types of pleiotropy can be distinguished highly conservative criteria (for example, an association of using common variation. and functionally characterized. Finally, we discuss the genome-wide significance for each trait) and were limited clinical implications of CP associations and visions for by the incomplete database of GWAS-associated SNPs at Genome-wide-significant A term describing the the future. Overall, we conclude that despite various the start of 2011. The first examples of cross-disease meta- statistical significance conceptual and technical challenges, the identification analyses (using methods described later) have discovered threshold that accounts for and characterization of this widespread pleiotropy is even higher levels of overlap: Cotsapas et al.14 estimate multiple testing in GWASs. crucial for a comprehensive biological understanding that at least 44% of SNPs associated with one autoim- of complex traits and disease states. mune disease are associated with another. Interestingly, Complex traits 15 Traits controlled by a Sirota et al. show that opposite effects — in which an combination of many genes Cross-phenotype effects in GWASs allele appears to increase the risk of one disease trait and and environmental factors. The results of GWASs have highlighted numerous CP decrease the risk of another disease trait — are also fre- effects, particularly across autoimmune diseases and quent. Recently, a large meta-analysis of Crohn’s disease Pleiotropy (TABLE 1) A gene or genetic variant psychiatric traits . Such observations have usu- and ulcerative colitis identified 110 SNPs that are associ- that affects more than one ally been incidental, and studies of different traits have ated with both disorders and found that 70% of SNPs phenotypic trait. independently led to discoveries of associations with the were shared across other immune-mediated diseases16. same marker or region. As the power of most GWASs is A CP association can be observed for an individual Heritability sufficient to detect only a subset of the many true asso- SNP or at the level of a gene or region (including in non- The proportion of phenotypic variance attributed to genetic ciations, the chance of two independent studies both coding DNA), in which different independent variants differences among individuals detecting a true association at the same locus is corre- in the same gene or region affect multiple phenotypes. in a population. spondingly low. Estimates of overlaps are thus likely to Both SNP-level and gene- or region-level CP effects

2 | ADVANCE ONLINE PUBLICATION www.nature.com/reviews/genetics

© 2013 Macmillan Publishers Limited. All rights reserved REVIEWS

a b c Biological pleiotropy: Biological pleiotropy: different causal Biological pleiotropy: different causal single causal variant variants colocalizing in same gene and variants colocalizing the same gene tagged by the same genetic variant

P1 P2 P1 P2 P1 P2

s s s s

d e f Mediated pleiotropy Spurious pleiotropy: design artefact Spurious pleiotropy: causal variants in different genes Misclassification or ascertainment bias P1 P2 P1 P2 P1 P2

s s s Region of strong LD

Gene Causal variant (generally not observed) s Genetic variant identified in GWASs

Figure 1 | Types of pleiotropy. In each scenario, the observed genetic variant (S) is associated with phenotypes 1 Nature Reviews | Genetics and 2 (P1 and P2). We assume that the observed genetic variant is in linkage disequilibrium (LD) with a causal variant (red star) that affects one or more phenotypes. In some cases, the causal variant may be identified directly and the figures can be simplified accordingly. The various figures correspond to the unobserved underlying pleiotropic structure. a | Biological pleiotropy at the allelic level: the causal variant affects both phenotypes. b | Colocalizing association (biological pleiotropy): the observed genetic variant is in strong LD with two causal variants in the same gene that affect different phenotypes. c | Biological pleiotropy at the genic level: two independent causal variants in the same

gene affect different phenotypes. d | Mediated pleiotropy: the causal variant affects P1, which lies on the causal path to

P2, and thus an association occurs between the observed variant and both phenotypes. e | Spurious pleiotropy: the

causal variant affects only P1, but P2 is enriched for P1 owing to misclassification or ascertainment bias, and a spurious association occurs between the observed variant and the phenotype 2. f | Spurious pleiotropy: the observed variant is in LD with two causal variants in different genes that affect different phenotypes. GWAS, genome-wide association study.

can be considered to be real forms of pleiotropy and Finally, studies using aggregate measures of genetic Colocalizing polygenic Different genetic variants in provide insight into the shared underlying biology. For variation (such as genetic risk scores) have been high linkage disequilibrium example, variants in 1 of fat mass and obesity asso- used to demonstrate genetic covariation between two or located in the same gene that ciated (FTO) have been robustly associated with body more disorders. For example, using molecular genetic affect different phenotypes. mass index (BMI)17. Recently, variants elsewhere in the data, Purcell et al.28 showed that a substantial propor- linkage disequilibrium Single-nucleotide gene (and not in apparent with tion of heritability is shared between schizophrenia and polymorphisms the obesity-associated SNPs) have been associated bipolar disorder, which is consistent with family-based Single-nucleotides in the with melanoma and not with BMI18. CP effects outside epidemiological studies29. genome that vary across protein-coding genes include the 9q21.3 locus19–22 and individuals in the population. rs6983267 (REFS 23,24; TABLE 1) and point to possible cis- Biological pleiotropy 25 Linkage disequilibrium regulatory effects on gene expression . In fact, the 88% Characterizing the underlying biological mechanism (LD). The correlation between of SNPs reported in the NGHRI catalogue are intronic or of a pleiotropic effect is a major challenge in the field genetic markers owing to intergenic1. GWASs and other genomic analyses have also as many alternative models for an apparent CP effect limited recombination. identified rare structural variations that have CP effects. can fit the observed data (FIG. 1). Pleiotropy can occur copy number variants Copy number variants For example, rare (CNVs) in mul- at the allelic level, where a single causal variant is Regions of the genome in tiple chromosomal regions have been found to increase related to multiple phenotypes (FIG. 1a), or at the gene which the copy number is the risk of a range of neurodevelopmental disorders26,27. (or region) level, at which multiple variants in the same polymorphic (for example, Distinguishing between biological and spurious pleiotropy gene (or region) are associated with different pheno- deletions and duplications) for CNVs is particularly challenging because it is unclear types (FIG. 1b,c). For example, the common coding vari- across individuals. whether the same gene affects multiple traits (for bio- ant in PTPN22 described above seems to influence the 30 Polygenic logical pleiotropy) or whether different genes within the function of various subpopulations of T cells but also Controlled by many genes. region affect different traits (for spurious pleiotropy). interferes with the removal of auto-reactive B cells31.

NATURE REVIEWS | GENETICS ADVANCE ONLINE PUBLICATION | 3

© 2013 Macmillan Publishers Limited. All rights reserved REVIEWS

Box 1 | Spurious CP associations and study design considerations An example of biological pleiotropy in an intergenic region is the rs6983267 SNP on 8q24 Ascertainment bias can induce spurious cross-phenotype (CP) effects and occurs when that is a risk variant for prostate and colorectal cancer the recruitment of individuals with one phenotype increases the prevalence of a second (TABLE 1). This allele alters the ability of this region to act unrelated phenotype92 in the cohort and thus the spurious correlation between them 93 as an for the downstream MYC oncogene in (FIG. 1e). Such ascertainment bias (for example, Berkson’s bias ) is common in clinically 35,36 ascertained samples, as patients suffering from two conditions are often more likely to both colon and prostate tissue types . seek treatment than those suffering from one92. In addition, unaffected control individuals are often shared across multiple studies, and a spurious CP association Mediated pleiotropy could occur if an artefact (such as population stratification or batch effects) CP effects can also occur when one phenotype is causal systematically biases the shared controls and not the cases. for a second phenotype and a genetic variant is directly Spurious CP effects between two phenotypes can also occur when subjects with one (or ‘more proximally’) associated with the first pheno- phenotype are systematically misclassified with a different phenotype (FIG. 1e). For type (FIG. 1d). In such cases of mediated pleiotropy, the example, patients with schizophrenia can sometimes be misdiagnosed with bipolar genetic variant will be associated with both phenotypes disorder and vice versa, and this could result in artefactual genetic correlation between if tested separately. Mediated pleiotropy is a real form the traits and hence could generate spurious CP effects. However, the misclassification of pleiotropy, in contrast to spurious pleiotropy, but it is rate must be quite high94 to have a substantial impact on the genetic correlation. In the cases of schizophrenia and bipolar disorder, the misclassification rate would need to be important to distinguish this category from what we call larger than 20% to generate the genetic correlation (0.60) observed between the traits biological pleiotropy in order to describe the underly- if the true genetic correlation were zero94. Although misclassification must be carefully ing aetiology of the phenotypes properly. For example, considered as a source of bias during the design of the study, it is unlikely, in our opinion, genetic variants have been found to be associated with that a substantial number of reported CP effects are caused by this type of bias. both low-density lipoprotein (LDL) levels and risk of These sources of bias are relevant for both multivariate and univariate analytical myocardial infarction37. However, LDL levels are them- approaches (see the main text) and can be avoided with careful study design. General selves risk factors for myocardial infarction, so we must guidelines for study design of genome-wide association studies (GWASs) should be deconvolute whether a genetic variant influences myo- followed, including appropriate control selection, adequate quality control and cardial infarction risk by altering LDL levels or whether 95,96 adjustment for population stratification . Population stratification is a major source it has an additional effect that is independent of LDL of confounding in GWASs, and established methods of population stratification 97,98 levels. Another example includes the observed asso- adjustment should be applied within studies . In addition, appropriate adjustments 38 for multiple testing95 should be implemented to avoid false-positive CP associations, ciation of 15q24–15q25.1 with lung cancer and nico- 39 which could inflate the observed genetic overlap between traits. When testing for CP tine dependence , which has spurred a debate about effects across different studies (which is particularly relevant for univariate whether this region has a direct effect on lung cancer40. approaches), similar populations should be used, as the underlying linkage disequilibrium structure varies across populations and the single-nucleotide Sources of spurious pleiotropy polymorphism (SNP) of interest may differentially tag the causal variant99. Additionally, There are several ways in which a spurious CP associa- as the same individuals (particularly controls) might be used in multiple studies, tion can occur and falsely suggest underlying pleiotropy. the overlap across samples should be minimized and appropriately accounted for in the These include defects in the studies that identify CP analysis. If the participant overlap is not accounted for, the estimates of effects can be effects, such as ascertainment bias, phenotypic misclas- biased, and the discovery power may be reduced66. A further consideration is that when (FIG. 1e) combining phenotypes across studies, participants may have different sets of SNPs sification and shared controls . Further details available owing to differences in genotyping arrays; in this setting, genotype on these aspects and their minimization by careful study imputation100 can be used to obtain the same set of SNPs for all individuals. Special design are described in BOX 1. considerations are needed when the proportion of cases differs across genotyping Additionally, spurious associations can arise when arrays, as differences in genotype quality can induce spurious findings in this setting100. there is ambiguity in mapping the true underlying causal variant. There is currently limited evidence that the pri- mary SNPs identified in GWASs are causal variants; The equivalent variant in mice promotes degradation of instead, they are often tag SNPs that typically associate LYP (also known as PEP), which is the protein encoded with the trait because they are in strong linkage disequi- by PTPN22. This suggests that this is a loss‑of‑function librium (LD) with the nearby causal variant. In regions 32 Population stratification allele , although much more work is required to dem- of high LD, such a SNP could tag multiple causal variants A source of bias in onstrate the causal mechanism33. This variant decreases located in different genes with completely different func- genome-wide association the risk of Crohn’s disease but increases the risk of rheu- tions and thus lead to a spurious CP finding (FIG. 1f). This studies that occurs when a matoid arthritis and type 1 diabetes34, prompting ques- issue can be demonstrated by the major histocompat- phenotype and the of a single- tions about whether the opposite effects correspond ibility complex region that has been implicated in many 34 nucleotide polymorphism vary to functional changes in different cells or whether the complex traits, including autoimmune diseases . This owing to ancestral differences. overall homeostatic changes to T and/or B cell popu- region contains more than 100 genes and has high lev- lations are responsible for risk versus protective states. els of LD. A CP association in this region will probably Batch effect Systematic biases in the data At first glance, several scenarios fit these observations: tag multiple genes, and thus it can be particularly chal- that arise from differences in distinct effects of the same allele in different cell popula- lenging to distinguish between biological and spurious sample handling. tions underlying associations with different diseases or pleiotropy. disease groups; a single molecular effect having multiple Genotype imputation morphological or physiological consequences; or a CP Analytical strategies to identify CP effects Inference of missing genotypes or untyped single-nucleotide effect tagging two different causal variants within the Many methods have been proposed to test the associa- polymorphisms using statistical same gene (FIG. 1b) that result in different functions and tion between a genetic variant and multiple phenotypes. techniques. affect different phenotypes. These can be broadly classified into multivariate analyses

4 | ADVANCE ONLINE PUBLICATION www.nature.com/reviews/genetics

© 2013 Macmillan Publishers Limited. All rights reserved REVIEWS

and univariate analyses, and the most suitable approach Other approaches include a dimension reduction depends on the circumstances. These methods facilitate technique on the phenotypes before testing the associa- the initial identification of CP effects, and details of study tion with the genetic variant. Principal components analysis design considerations to minimize spurious associations (PCA)49–51 extracts linear combinations (that is, principal are discussed in BOX 1. The subsequent approaches for components) of the traits that can be used as the pheno- Ascertainment bias classifying and characterizing the identified CP effects types in a genetic association analysis. Canonical cor- A consequence of collecting a are discussed later. relation analysis52 extracts a linear combination of the nonrandom subsample with Before searching for specific CP variants, it is possi- phenotypes that explains the largest amount of covaria- a systematic bias so that results based on the subsample are ble first to implement a polygenic approach that uses all tion with the genetic variant. The weights for the linear not representative of the entire or a large proportion of SNPs genome-wide to establish combination will differ for each genetic variant, in con- sample. genetic overlap between two traits. For example, com- trast to PCA, and will provide information about which mon genetic variants were found to underlie schizophre- phenotypes are most strongly related. Tag SNPs nia and bipolar disorder (as shown by polygenic scoring)28 These and other multivariate methods have recently Single-nucleotide polymorphisms (SNPs) chosen and also type 2 diabetes and hypertension (as shown by been reviewed, and we refer the reader to those 41 53 to represent a region of the a linear mixed-effect model) . Note that both approaches summaries for further details. genome owing to strong assess whether pleiotropy exists between phenotypes linkage disequilibrium. but do not point to any particular variant or region of Univariate approaches. It is also possible to combine

Multivariate analyses the genome. results from standard univariate analyses (such as GWAS The simultaneous inclusion of associations between variants and single phenotypes) two or more phenotypes in Multivariate approaches. Multivariate analyses jointly by combining these associations across various pheno- one analysis when testing the analyse more than one phenotype in a unified frame- types to identify those variants that are associated with association with a genetic work and test for the association of multiple pheno- multiple phenotypes (summarized in TABLE 2). Thus, variant. types with a genetic variant. Because most multivariate univariate approaches are well suited to analysing Univariate analyses methods require that all phenotypes be measured on existing GWAS results, including the plethora of well- Tests of association between the same individual, they are only well suited for studies powered GWASs conducted54 by consortia already one phenotype and a genetic in which subjects are phenotyped across various dis- organizing themselves into cross-disease groups (such variant. eases (for example, large cohort studies or cross-sectional as the Psychiatric Genomics Consortium7). This will Polygenic scoring studies). This is usually not feasible for diseases with be especially important for rare diseases, which are A score that aggregates the a low prevalence, which are typically collected using a less likely to be ascertained in cohort studies. As the number of risk a subject case–control study design. However, if phenotyping indi- genetic effects for most complex traits are small54, com- carries weighted by the effect viduals on all traits is possible, this allows the investiga- bining results across studies of different phenotypes size of the allele for a particular trait. The risk allele and effect tion of the correlations between the traits themselves, can improve the power of detecting CP associations. size for each single-nucleotide rather than just testing of associations between genetic This improvement in power will generally outweigh the polymorphism is generally variants and the traits. One complication of multivari- advantages of using one study in which individuals are taken from a genome-wide ate methods is that they generally require pooling of phenotyped on all traits. Another advantage of univari- association study of an independent study. individual-level data, and this may not be possible with- ate approaches is that, unlike multivariate approaches, out reacquiring patient consent, implementing privacy most of them are based on summary statistics, which Linear mixed-effect model protection measures and seeking additional ethical do not divulge individual-level data and thus maintain A linear model that contains review board approval. participant confidentiality. both fixed and random effects. Numerous multivariate approaches have been pro- The simplest univariate approach is to take the This type of model can be used to estimate genetic posed for testing the association between a genetic vari- known genome-wide-significant associations between correlation between traits ant and multiple phenotypes, particularly for correlated variants and individual phenotypes and to compare the using a genome-wide set of phenotypes. The choice of method will largely depend results across multiple phenotypes. CP effects are then single-nucleotide on the types of traits (that is, continuous, categorical or declared at markers that satisfy the significance thresh- polymorphisms. binary) included in the analysis. For continuous pheno­ old for multiple traits. Alternatively, the set of genome- Cohort studies types, a multivariate regression framework (such as a wide-significant SNPs for one phenotype can be tested Observational studies in which multivariate analysis of variance) can be used, but the for association with other phenotypes; in this case, the defined groups of people (the approach requires that the phenotypes are approximately significance level for multiple testing is adjusted only cohorts) are followed over time normally distributed. Several methods extend the regres- for the number of tested SNPs rather than for SNPs and outcomes are compared generalized estimating in subsets of the cohort who sion framework, using variations of genome-wide. Both of these approaches require robust were exposed to different equations (GEE), to allow non-normally distributed discovery as a starting point: because the known associa- levels of factors of interest. phenotypes42–44. To model multiple categorical pheno­ tions are probably only a subset of the true associations These studies can either types (for example, multiple binary disease traits), a (even in traits for which large sample sizes have been be prospectively or log-linear model45 Bayesian network46 28,55 retrospectively carried out and a have been analysed ), these analyses are fairly underpowered and from historical records. used. In addition, there are several approaches that can will overlook SNPs that are only moderately associated accommodate a mixture of continuous and categorical across a set of phenotypes. Cross-sectional studies phenotypes44,47,48. Ordinal regression47 uses the genotype Variations on meta-analysis have also been adapted Studies in which data are as the outcome variable and the set of phenotypes as the for CP effect detection. Traditional meta-analysis collected on subjects at one non-parametric approach specific point in time and predictors. A has been devel- approaches combine evidence for association with the subjects are not selected for a oped for a mixture of phenotypes but cannot incorporate same phenotype across numerous studies; for discover- particular trait or exposure. additional predictors beyond the genetic variant48. ing CP effects, the evidence for association is combined

NATURE REVIEWS | GENETICS ADVANCE ONLINE PUBLICATION | 5

© 2013 Macmillan Publishers Limited. All rights reserved REVIEWS

Table 2 | Univariate approaches for detecting CP associations Input Explicit Allows effect Types of Accommodates Combine Identify Genetic Refs test of CP heterogeneity phenotype (such overlapping data across subset of variant association as continuous or subjects multiple associated versus categorical) studies phenotypes region Fisher P value No Yes Any No Yes No Variant 56 CPMA P value Yes Yes Any No Yes No Variant 14 Fixed Effect No No Same type; need No Yes No Variant 54,57, effects estimate to standardize 58|| meta- continuous analysis phenotypes Random Effect No Moderate level; Same type; need No Yes No Variant 54,57, effects estimate not opposite to standardize 58|| meta- effects continuous analysis phenotypes Subset- Effect No Yes Same type; need No; offer Yes Yes Variant 59 based estimate to standardize extension to meta- continuous account for some analysis phenotypes overlap Extensions Effect No Yes Any Yes; all subjects No§ No Variant 61,62 to O’Brien estimate overlap* TATES P value No Yes Any Yes; all subjects No§ No Variant 63 overlap‡ PRIMe P value No Yes Any Yes Yes No Region 64 CP, cross-phenotype; CPMA, cross-phenotype meta-analysis; PRIMe, Pleiotropy Regional Identification Method; TATES, Trait-based Association Test that uses Extended Simes. *Can accommodate values missing completely at random. ‡Can accommodate values missing completely at random and blockwise missingness. §Can combine across multiple studies if all subjects have non-missing values for all phenotypes; TATES can accommodate situations in which a subset of studies have missing values for a subset of the phenotypes. ||References are given for meta-analytical methods typically used in genome-wide association studies.

across studies of multiple phenotypes. Meta-analytical from increased numbers of phenotypes, making it approaches aggregate summary statistics from individ- particularly well suited to broad phenotypic surveys. ual studies into one statistic to test for CP effects and Standard meta-analysis based on effect estimates can be applied genome-wide or on a pre-specified set of is commonly used to combine evidence of association SNPs. Broadly speaking, these methods can be split into across multiple GWASs for the same phenotype54,57,58 and two groups. First, those methods based on association has also been used to combine evidence across multiple P values ignore allelic effect direction (a positive versus phenotypes7. Fixed-effects meta-analysis assumes that Case–control study negative effect on the trait) and effect heterogeneity (dif- the genetic variant has the same effect on each pheno- Compares cases (that is, a ferent effect sizes across traits). Second, methods based type, whereas random-effects meta-analysis allows the selected group of individuals: for example, those diagnosed on the effect sizes are sensitive to allelic effect direc- genetic effect to differ across phenotypes. Although with a disorder) with controls tion and effect size. We note that in GWASs in which random-effects meta-analysis incorporates a moderate (that is, a comparison group of effect sizes are generally very small, accounting for effect level of effect heterogeneity, it is not well suited for situa- individuals: for example, those heterogeneity may be of less concern. tions in which the genetic variant has opposite effects on who are not diagnosed with 56 the disorder). Genome-wide The simplest meta-analytical approach aggregates different phenotypes. In addition, both will have lower association case–control P values across phenotypes in different studies to test the power when only a subset of analysed phenotypes is studies test whether genetic null hypothesis that the genetic variant is not associated associated. marker allele frequencies differ with any phenotype. Note that this approach (which is The subset-based meta-analysis59 extends standard between cases and controls. similar to most methods in this section) does not explic- fixed-effects meta-analysis to allow for opposite effects Generalized estimating itly test for CP effects, as a significant association could and to include situations in which association is only to equations be driven by one phenotype as opposed to two or more a subset of traits. This method exhaustively evaluates all A statistical technique used phenotypes. possible combinations of non-null models for associa- to estimate regression The cross-phenotype meta-analysis (CPMA) sta- tion, selects the strongest association and then adjusts parameters that does not 14 require the joint distribution tistic also uses association P values and tests whether for the multiple comparisons generated by the search. of the variables to be fully the observed P values deviate from the expected dis- At present, this is the only method that identifies which specified. tribution of P values under the null hypothesis of no traits a variant influences (through the model selection additional associations beyond those already known. step), but this advantage comes with a steep multiple Log-linear model Because the alternative hypothesis includes only mod- testing price: the number of possible non-null combi- A statistical model that captures the dependence els in which two or more of the phenotypes are associ- nations to be adjusted for exponentially increases with among a set of categorical ated with the SNP, this approach explicitly tests for CP the number of traits selected so that detection power variables. effects. It is also worth noting that this approach benefits decreases for even moderate phenotype counts.

6 | ADVANCE ONLINE PUBLICATION www.nature.com/reviews/genetics

© 2013 Macmillan Publishers Limited. All rights reserved REVIEWS

Several groups have proposed extensions to O’Brien’s same subjects, alternative methods can be used, includ- linear combination test60, which uses a weighted sum of ing the extensions to the O’Brien linear combination test, the univariate test statistics. The extensions61,62 account the TATES procedure or one of the many multivariate for effect heterogeneity by allowing the weights to dif- approaches. fer by phenotype and mainly differ in how they arrive at those weights. These approaches were specifically Distinguishing and characterizing CP effects developed for correlated traits measured in the same The forms of pleiotropy are important to distinguish individuals and simplify to standard meta-analysis if the because they imply distinct molecular mechanisms and 61 Bayesian network underlying data are taken from independent studies . have different implications for disease risk and patho- A network that captures Similarly to the O’Brien’s test, the ‘Trait-based genesis. Strategies to achieve this are described below, relationships between variables Association Test that uses Extended Simes’ (TATES) and further functional characterization of CP-effect loci or nodes of interest (for procedure63 was developed to detect effects across cor- is discussed in BOX 3. example, phenotypes and related traits measured in the same individuals but in SNPs). Bayesian networks can incorporate prior information contrast uses the P value for each trait. For each variant, Fine mapping to distinguish biological and spurious in establishing relationships the approach takes the minimum P value across the set pleiotropy. Careful study design is required in order between variables. of univariate tests carried out on each phenotype and to minimize the identification of spurious pleiotropy then applies a weight to the P value to account for the caused by artefactual CP associations (BOX 1); addition- Ordinal regression fine mapping A regression model in which number phenotypes tested and their correlation. ally, when feasible, of the region that sur- the outcome variable is ordinal. The ‘Pleiotropy Regional Identification Method’ rounds a CP effect can help to discriminate spurious (PRIMe)64 searches for regions of the genome that con- from biological pleiotropy. Such mapping is used to Non-parametric approach tain genetic variants associated with multiple traits but locate the causal variant or variants that are responsible A statistical analysis method does not require that the same genetic variant be asso- for a CP effect. If a single variant or variants in the same that does not rely on specific distributional assumptions (for ciated with multiple phenotypes. For each region, the gene are causal for the diseases, this indicates biological example, normality) for the approach calculates a pleiotropic index as the number of pleiotropy (FIG. 1a–c), whereas causal variants in different variables being analysed. traits that have at least one SNP with a univariate P value genes that are in LD is suggestive of spurious pleiotropy less than P (which is a pre-defined threshold) and then (FIG. 1f). Fine mapping can also aid in distinguishing the Principal components S analysis assesses the significance of the pleiotropic index. A different forms of biological pleiotropy and, in particu- A statistical method used related approach assesses whether expression quantitative lar, can identify whether the observed CP association to simplify data sets by trait loci (eQTLs) overlap disease associations; identifying is driven by one variant (FIG. 1a) or multiple variants transforming a series of effects on gene expression that result from variants in (FIG. 1b,c) in the same gene that is associated with dif- correlated variables into the identified region increases the confidence that this ferent phenotypes. This can be particularly challenging a smaller number of uncorrelated factors. It is region harbours causal molecular candidates underlying when two variants in the same gene are in strong LD 65 also commonly used to infer the trait . and may be related to different diseases (FIG. 1b), because continuous axes of variation Overall, choosing the appropriate statistical approach these variants will typically co‑occur in individuals, such in genetic data, often for detecting a CP variant depends on study design, the that the effects of each individual SNP will rarely be able representing genetic ancestry. type of phenotype, assumptions on effect heterogeneity to be dissected. For common diseases that can co‑occur Summary statistics and other factors that are summarized in TABLE 2. We in the same individual, variants for the first disease can A statistic that summarizes a will not enumerate all possible scenarios but aim to pro- be mapped in the presence of the second disease and set of observations. In the vide some general guidelines. When focusing on a small then in its absence to establish which variant is related context of genome-wide number of phenotypes (such as five or less) that are of to the first disease (and vice versa). association studies, meta-analyses can be carried the same type (for example, all binary or all continuous), Custom genotyping arrays have been designed to out solely by using summary standard meta-analysis can be used, but this has the dis- fine-map regions identified in GWASs for immune- statistics and typically include advantage that SNPs with opposite effects on the pheno- mediated traits (Immunochip67) and for metabolic, car- estimates of the effect size types will be missed. CPMA can accommodate opposite diovascular and anthropometric traits (Metabochip68). (for example, odds ratio) and standard error. risk effects and different types of phenotypic traits and is This provides a low-cost alternative to sequencing and well suited for moderate to large numbers of phenotypes allows for fine mapping in large sample sizes. Effect heterogeneity (such as more than five). After conducting standard Finally, it is worth noting that in many cases, estab- Different effect sizes meta-analysis or CPMA genome-wide, a model selection lishing whether a variant is truly causal cannot be across phenotypes. technique (for example, subset-based meta-analysis) established by fine mapping alone and requires biologi- Expression quantitative can be applied to the top selection of SNPs to refine the cal and animal studies to determine the exact function trait loci association and to identify which of the phenotypes is of the variant (BOX 3). Loci at which genetic driving the signal (BOX 2). When there are overlapping allelic variation is associated subjects (for example, shared controls) across studies, Identifying mediated pleiotropy. In cases of potential with variation in gene expression. the overlapping subjects can be split across the differ- mediated pleiotropy, the association between the genetic ent studies, and then univariate tests are carried out so variant and the second phenotype (that is, target pheno- Fine mapping that each subject is used only once. Then the tests can type) can be tested while adjusting or stratifying by the Extensively genotyping or be assumed to be uncorrelated. Alternatively, Lin et al.66 first (that is, intermediate phenotype). If the association sequencing a region of the have provided an adjustment for overlapping subjects for persists (that is, if the variant is associated with the tar- genome that was identified 59 in genome-wide association standard meta-analysis, and Bhattacharjee et al. have get phenotype even when the intermediate phenotype studies to identify the causal proposed a similar extension to the subset-based meta- is not present), then the CP effect is probably not fully variant. analysis. Finally, if the phentoypes are measured on the mediated. However, this approach can produce biased

NATURE REVIEWS | GENETICS ADVANCE ONLINE PUBLICATION | 7

© 2013 Macmillan Publishers Limited. All rights reserved REVIEWS

Box 2 | An analysis to identify CP effect loci for psychiatric disorders a Individual studies of different phenotypes

Attention-deficit Individual-level Autism spectrum Bipolar Major depressive hyperactivity Schizophrenia data combined by disorder disorder disorder disorder phenotype; GWAS for each phenotype

Meta-analysis across five phenotypes Meta-analysis to detect cross- phenotype effects

b –8 –8 rs11191454, P = 1.4 × 10 rs1024582, P = 1.9 × 10

Attention-deficit hyperactivity disorder

Autism spectrum disorder

Bipolar disorder

Major depressive disorder

Schizophrenia

All

–0.05 0 0.05 0.10 0.15 0.20 0.25 –0.05 0 0.05 0.10 0.15 0.20 ln(odds ratio), 95% confidence interval ln(odds ratio), 95% confidence interval

To illustrate a realistic application of a meta-analysis of cross-phenotype meta-analysed results (‘all’ in the figure). Fixed-effects meta-analysis Nature Reviews | Genetics (CP) effects, we provide examples from a study of psychiatric disorders. was chosen because the power can be higher than random-effects The Psychiatric Genomics Consortium (PGC) conducted a large- analysis for situations in which effects are not substantially different. scale meta-analysis in 61,220 cases and controls across five A significant CP result indicates that the SNP is associated with at least psychiatric disorders7: autism spectrum disorder, attention-deficit one of the phenotypes but requires an additional step to identify which hyperactivity disorder, bipolar disorder, major depressive disorder and phenotypes are driving the association (note that most meta-analytical schizophrenia (see part a of the figure). As data were collected across approaches require this step (TABLE 2)). To identify which of the five dozens of studies in over 19 countries and were genotyped on different phenotypes were associated, the authors used a multinomial logistic arrays, all individual-level raw data were subjected to the same regression model developed by Lee et al.45 that allows comparisons quality-control measures and followed the same protocol for imputation between multiple CP-specific disease models and uses a model selection to obtain a common set of single-nucleotide polymorphisms (SNPs). This step to identify the best-fitting configuration of disorder-specific CP step is essential for reducing biases in the data that can lead to spurious effects. The approach jointly models multiple categorical phenotypes and CP associations (BOX 1). In addition, controls appearing in more than one requires the availability of individual-level genotype data. The model study were randomly assigned to a control group for one of the selection technique found that the best-fit model indicated an effect on phenotypes. all five phenotypes for rs11191454 and an effect limited to schizophrenia Univariate genome-wide association studies (GWASs) were carried out and bipolar disorder for rs1024582. for each phenotype after combining individual-level data for each disorder. In addition to identifying SNP-level CP effects, polygenic scoring A fixed-effects meta-analysis was carried out on the summary statistics analyses were conducted to assess the overall evidence for pleiotropy from the univariate GWAS to test for CP effects genome-wide and among these disorders using thousands of SNPs in aggregate. The identified four genome-wide-significant SNPs: two are shown in forest plots results indicated significant genetic overlap among schizophrenia, in part b of the figure. In each forest plot, the effect size and 95% confidence bipolar disorder and major depressive disorder and also between autism interval are plotted for each individual phenotype and for the overall spectrum disorders and schizophrenia, although to a lesser extent.

8 | ADVANCE ONLINE PUBLICATION www.nature.com/reviews/genetics

© 2013 Macmillan Publishers Limited. All rights reserved REVIEWS

Confounding factor results when the phenotypes share a confounding factor More generally, identifying mediated pleiotropic 69 A variable (for example, batch that is influenced by the genetic variant . To address this genetic effects can provide a tool by which causation effects or population structure) shortcoming, approaches using causal inference meth- and correlation can be teased apart under some con- that is associated with both the odology have been developed to test whether a genetic ditions in an approach called Mendelian randomiza- genotype and the phenotype 37,73,74 (BOX 4) of interest and can give rise to variant influences the target phenotype through a path tion . This framework for causal inference 69–71 a spurious association. that does not involve the intermediate phenotype . tests whether the intermediate phenotype causally Such an approach demonstrates that the association affects the target phenotype. Specifically, if the effect of between SNPs at 15q25.1 with both smoking and lung a genetic variant can be taken as a proxy for the inter- cancer mostly reflects direct effects on each phenotype, mediate phenotype, this is used to establish the causal rather than mediated pleiotropy72. relationship between the intermediate phenotype and the disease. Using , Voight and colleagues37 found that LDL levels causally affect myocardial infarction risk, whereas high-density lipo- Box 3 | Functional characterization of CP effects protein (HDL) levels do not. This counter-intuitive result For a functional understanding of how a cross-phenotype (CP)-effect locus contributes suggests that low HDL may be a consequence rather to disease, various computational and experimental steps are carried out. The starting than a cause of myocardial infarction risk, thus chal- point is typically a genetic variant that was initially identified by genome-wide lenging the established view that increasing the levels of association studies (GWASs) and has subsequently satisfied CP-effect criteria across HDL cholesterol will uniformly lower the risk of myo- multiple traits. Therefore, many characterization steps are common between standard cardial infarction. However, we note that the assump- GWAS variants and CP-effect variants. These variants are typically not causal because tions underlying Mendelian randomization are quite they are usually only tags of the true causal variant or variants. (BOX 4) Sequencing and fine mapping are necessary steps for identifying causal variants in strong , and thus extreme care needs to be taken in the region that affect more than one trait. After the causal variants have been experimental design and data interpretation. identified, investigators are faced with the challenge of functionally characterizing the variants. Functional categories of the causal variants (such as missense or Clinical implications of CP effects nonsense in protein-coding genes) can provide crucial clues for Characterizing the molecular mechanisms of CP effects characterizing the CP effects of genes. Various bioinformatics tools and databases101 (BOX 3) will undoubtedly expand our understanding of are available for predicting the deleterious, potentially disease-causing biomolecular the underlying biology of complex diseases and will have 102 effects of mutations on the basis of the functional category (such as PolyPhen or clinical implications for drug discovery. First, character- 103 SIFT ). Although most of these tools focus on the functional effects of either izing CP effects may have clinically relevant implications protein-coding or splice-site variants, mutations in non-protein-coding genes (such as for the classification (nosology) of medical disorders. microRNAs) or intergenic regulatory elements (such as enhancers) can result in the dysregulation of hundreds of target and thus could have a major role (refer For example, psychiatric disorders are currently defined to the Encyclopedia of DNA Elements (ENCODE) project). It is also noteworthy that as distinct syndromes on the basis of their constella- regulatory variants may confer tissue-specific effects on multiple genes104, some tions of signs and symptoms. As noted earlier, however, of which could occur on different (trans-effects105). Examination of recent GWASs7,28 have demonstrated shared heritability expression (eQTL) data in a relevant tissue type can help to among many of these disorders29,75,76. As further studies identify the tissue-specific regulatory changes caused by mutations106,107, as provide a more comprehensive account of the distinct demonstrated in the Genotype-Tissue Expression (GTEx) eQTL Project108. Thus, single and overlapping of psychiatric dis- variants can have distinct effects on different tissues. orders, the goal of an aetiology-based classification may The CP effect of a single variant can also occur when the gene is involved in multiple become more feasible. Of note, imperfect nosology poses pathways or when it is involved in the same pathway but has a different phenotypic a challenge for teasing apart biological pleiotropy from effect on the associated disorders. Public resources of canonical pathways, biological functions or protein–protein interaction data can be used to compare and contrast spurious pleiotropy (particularly the bias resulting diverse biological roles of gene products as well as potential pathogenetic mechanisms from misdiagnosis) as the distinction between two dis- underlying distinct disorders (for example, Pathguide)109. It is often informative to use a orders may not be aetiologically valid. In such cases, the statistical strategy, such as multivariate pathway analysis, for identifying statistically pleiotropy may be real, but the diagnostic categories are enriched sets of biologically related genes for single disorders and comparing the in fact spurious. implicated pathways in the functional characterization of the CP effect110. The growing catalogue of genetic variants with pleio- Finally, it is essential to validate the predicted CP effects of genetic variation on tropic effects has important implications for genetic test- 104 cellular using experimental methods . Typically, the molecular effects of a ing and personal genomics. As genetic information is variant can be demonstrated using cultured cells in vitro, a knock‑in or knockout increasingly integrated into medical practice, clinicians strategy in animal models111–113 or, more recently, in vivo genome editing114,115. Changes and medical genetics professionals will need to be aware in protein expression, localization and mRNA indicate the functional effects of the . However, it should be noted that unless the replacement with that genetic tests for one disease may have implications the variant results in phenotypic changes that are directly related to the disorder, for risks of other diseases. In some cases, discovery of experimental validation of the functional effects does not necessarily imply causality to these secondary risks may emerge well after the original the disease in humans116. Therefore, clarification of the CP effects requires that test information has been provided, thus complicating functional studies of gene mutation be carried out separately in pathogenic cell types the process of genetic counselling and raising complex that are relevant to the implicated disorders117. A successful example is a series of ethical and ‘duty to warn’ issues. At the same time, the functional studies that verified the CP effects of an endogenous β‑galactoside-binding growth of direct-to‑consumer genetic tests will mean protein galectin 3; the knockout mouse model of galectin 3 revealed that the deficiency that an increasing number of individuals will be con- 118,119 of the protein leads to a concanavalin-A‑induced hepatitis in the liver , whereas fronted with CP risk information without the benefit of inhibition of galectin 3 expression suppresses tumour growth in human breast genetic counselling. The case of APOE provides a famil- carcinoma cells120–123. iar example of a common variant with well-established

NATURE REVIEWS | GENETICS ADVANCE ONLINE PUBLICATION | 9

© 2013 Macmillan Publishers Limited. All rights reserved REVIEWS

Box 4 | Mendelian randomization

Mendelian randomization is a form of instrumental G PA PB variable analysis — a common approach in causal Genetic Intermediate Outcome inference — that uses a genetic variable (G, the variable phenotype phenotype instrumental variable) to test whether an intermediate

phenotype (PA) causes another phenotype (PB; see the figure)73,74. For example, Mendelian randomization was used to test whether the relationship between high-density lipoprotein cholesterol and myocardial infarction is causal37. To conduct a valid Mendelian randomization C experiment, the following assumptions must be met73,74: Confounders • Assumption 1: G (which is a single-nucleotide polymorphism (SNP) or a 124 combination of multiple SNPs ) is robustly associated with PA. Nature Reviews | Genetics • Assumption 2: G is unrelated to C, which are confounding factors that bias the relationship between PA and PB.

In other words, there are no common causes of G and PB.

• Assumption 3: G is related to PB only through its association with PA.

If these assumptions are met, it is possible to test the hypothesis that PA causes PB and to derive an estimate of this 73 relationship (βIV) by using the regression coefficients for testing the association of G and PB, and G and PA : βIV = βG,PB/βG,PA. The assumptions of Mendelian randomization are quite strong and thus the instrumental variable (G) must be carefully selected. Even small violations of the assumptions can result in severe bias74. Assumption 1 can easily be

verified by selecting a G that is robustly associated with PA. Assumption 2 generally holds because G is randomized at birth and thus is independent of non-genetic confounders and is not modified by the course of disease. However,

population stratification could violate this assumption if ancestry is related both to G and to PB. Assumption 3

explicitly assumes that G is not associated with PA and PB through biological pleiotropy, meaning that G is associated

with PB only through PA (that is, the association instead involves mediated pleiotropy) and that G is not associated with

any unmeasured phenotype that is related to PB. Knowledge about the causal nature of the association between G and 74 PA can help to verify this assumption . Additionally, using multiple different instrumental variables for different genes and showing consistent results can also help to rule out violations73,74. Although assumptions 2 and 3 cannot be empirically proved, there are several additional tests that can be used to try to falsify assumptions 2 and 3 and thus to minimize the chance of bias74.

CP effects. The APOE4 allele is a known risk factor for example, the finding that the L‑type calcium channel both atherosclerotic heart disease and Alzheimer’s dis- subunit gene CACNA1C is a risk gene for bipolar dis- ease but has also been shown to exert a protective effect order78 has revived interest in trials of calcium channel on risk of age-related macular degeneration77. Very lit- antagonist antihypertensive drugs as possible mood tle research is available to evaluate the psychological disorder treatments (R. H. Perlis, personal communica- impact of such ‘competing risk’ information. In addition, tion). Alternatively, a drug targeting a shared pathway accurately characterizing CP effects and distinguishing could be beneficial for one disease and detrimental for between biological and mediated pleiotropy will affect others; this scenario could result in ‘off-target’ effects at how this information is interpreted and used in clinical the disease level despite being on‑target at the pharma- practice. For example, if a patient carries a variant that cological level. For example, several genes have oppos- directly affects myocardial infarction through LDL, the ing effects on autoimmune disorders79–81, suggesting that mediated relationship provides clinicians with a more drugs modulating these gene products to treat one dis- proximal target for the prevention of myocardial infarc- order could have unintended adverse effects on another. tion. Furthermore, distinguishing between CP effects This is exemplified by the utility of treatments targeted caused by single versus multiple variants can improve to tumour necrosis factor (TNF) in Crohn’s disease and the accuracy of these genetic tests and the interpreta- rheumatoid arthritis but their counter-indication in tion of results. For example, although the same gene may multiple sclerosis. The adverse effect on multiple scle- be implicated in multiple diseases, if distinct variants in rosis is also supported by evidence of a genetic variant that gene are differentially associated with alternative identified in GWASs that increases the risk of disease diseases, then testing for both variants might provide and mimics the effect of TNF-targeted treatments82. separate risk information for each disease. In the realm of therapeutics, the existence of com- Conclusions and future directions mon pathological mechanisms in distinct disorders An exciting picture is emerging of startling genetic over- may suggest new opportunities and challenges for drug lap between seemingly unrelated diseases and traits. development. Drugs developed for one disorder could The promise is twofold: using ever-larger sample sizes be repurposed to treat another disorder if the therapeu- across genetic cohorts will further increase discoveries Genetic architecture tic target is found to be common to the biology of both of genetic association, and the patterns of sharing will A genetic model (that is, the disorders. In such cases, a gene or multiple genes in a help to sort associations into discrete pathways, which number of single-nucleotide polymorphisms, effect sizes, pathway might be considered to be pleiotropic if they will further our understanding of biology and disease. allele frequency, and so on) affect more than one phenotype, regardless of whether In this Review, we have outlined analytical strategies to underlying a phenotypic trait. the specific variants are shown to have CP effects. For discover CP effects systematically in existing GWAS data

10 | ADVANCE ONLINE PUBLICATION www.nature.com/reviews/genetics

© 2013 Macmillan Publishers Limited. All rights reserved REVIEWS

sets as the first step in this direction. Several advances identified in GWASs and thus to distinguish between will be instrumental in allowing us to reap the full ben- the different types of pleiotropy more accurately. The efits of shared genetic architecture across traits: ana- current focus on whole-exome sequencing will probably lytical frameworks, such as those we describe, must be bias findings towards gene-centric pleiotropic effects, developed, tested and implemented; multi-disease mega- whereas whole-genome sequencing will provide a more consortia must be formed to pool data across traits; and robust survey of the genomic landscape for CP effects. systems-level approaches must be developed to charac- Sequencing will also allow us to interrogate lower-fre- terize the molecular mechanisms perturbed by common quency variants (which are typically not represented on CP associations of modest effect (BOX 3). SNP-genotyping microarrays) for CP effects, and some This Review has focused on the detection of of these variants are likely to have higher penetrance CP effects, but functionally characterizing identified than those found in GWASs. In addition, the observed variants and understanding the underlying mechanism comorbidity between mapped Mendelian disorders and remains a major challenge in the field (BOX 3). Although complex traits can be exploited by carrying out focused many resources are available for characterizing protein- sequencing of the mapped region. For example, comor- coding variants, experiments in animal or cellular mod- bidity between Gaucher’s disease and Parkinsonism led els are generally necessary to establish causality. The to the identification of risk alleles for Parkinson’s dis- Encyclopedia of DNA Elements (ENCODE) project ease in GBA, which is the gene implicated in Gaucher’s provides a valuable resource for characterizing non- disease86. protein-coding variants and regulatory elements and has Extending observations of CP effects to a wider found that most GWAS associations overlap a functional range of phenotypes is an emerging area. Systematic region25. In addition, examining eQTLs in the relevant and unbiased phenome-wide association studies tissue for each phenotype of a CP effect can help to elu- (PheWASs) are now beginning in which a SNP with cidate the functional consequence and to distinguish an established association with a phenotype is tested between biological and spurious pleiotropy. Finally, net- for association with hundreds of other phenotypes87,88. work-based approaches83,84 have highlighted the impor- The Population Architecture using Genomics and tance of pleiotropy in human disease, and understanding Epidemiology (PAGE) network89 is a large-scale col- CP effects in the context of these models can provide laboration for harmonizing phenotypes across eight insight into the mechanisms of shared pathophysiology. epidemiological studies and five ethnic groups for the For example, proteins involved in the same disease are purpose of conducting PheWASs on replicated GWAS more likely to interact with each other83, pathopheno- hits90. Other efforts aim to analyse a broad range of types within the same disease class are more likely to share phenotypes that are extracted from electronic medical genes84, and increased comorbidity has been identified records88,91. These approaches will increase our under- among diseases that are metabolically linked85. standing of the extent of shared genetics among traits As the field moves towards sequencing-based asso- and our global understanding of phenotypes as a range ciation studies, we will have the opportunity directly of inter-related manifestations of biological mechanisms to identify the causal alleles underlying the CP effects rather than isolated events.

1. Hindorff, L. A. et al. Potential etiologic and functional 9. Wagner, G. P. & Zhang, J. The pleiotropic structure of 16. Jostins, L. et al. Host-microbe interactions have implications of genome-wide association loci for the genotype–phenotype map: the of shaped the genetic architecture of inflammatory bowel human diseases and traits. Proc. Natl Acad. Sci. USA complex organisms. Nature Rev. Genet. 12, 204–213 disease. Nature 491, 119–124 (2012). 106, 9362–9367 (2009). (2011). This is the largest study of Crohn’s disease and Characteristics of reported GWAS results listed in This excellent Review discusses pleiotropy ulcerative colitis and identifies more than 100 CP the US National Human Genome Research Institute in model organisms and the implications for associations. (NHGRI) catalogue are discussed in this paper. . 17. Thorleifsson, G. et al. Genome-wide association yields 2. Plenge, R. M. et al. Replication of putative candidate- 10. Kendler, K. S., Neale, M. C., Kessler, R. C., Heath, A. C. new sequence variants at seven loci that associate gene associations with rheumatoid arthritis in >4,000 & Eaves, L. J. Major depression and generalized with measures of obesity. Nature Genet. 41, 18–24 samples from North America and Sweden: association anxiety disorder. Same genes, (partly) different (2009). of susceptibility with PTPN22, CTLA4, and PADI4. environments? Arch. Gen. Psychiatry 49, 716–722 18. Iles, M. M. et al. A variant in FTO shows association Am. J. Hum. Genet. 77, 1044–1060 (2005). (1992). with melanoma risk not due to BMI. Nature Genet. 3. Barrett, J. C. et al. Genome-wide association defines 11. Criswell, L. A. et al. Analysis of families in the Multiple 45, 428–432 (2013). more than 30 distinct susceptibility loci for Crohn’s Autoimmune Disease Genetics Consortium (MADGC) 19. Schunkert, H. et al. Large-scale association analysis disease. Nature Genet. 40, 955–962 (2008). collection: the PTPN22 620W allele associates with identifies 13 new susceptibility loci for coronary artery 4. Kyogoku, C. et al. Genetic association of the R620W multiple autoimmune phenotypes. Am. J. Hum. Genet. disease. Nature Genet. 43, 333–338 (2011). polymorphism of protein tyrosine phosphatase 76, 561–571 (2005). 20. The Coronary Artery Disease (C4D) Genetics PTPN22 with human SLE. Am. J. Hum. Genet. 75, 12. Eaton, W. W., Rose, N. R., Kalaydjian, A., Consortium. A genome-wide association study in 504–507 (2004). Pedersen, M. G. & Mortensen, P. B. Epidemiology of Europeans and South Asians identifies five new loci for 5. Todd, J. A. et al. Robust associations of four new autoimmune diseases in Denmark. J. Autoimmun. 29, coronary artery disease. Nature Genet. 43, 339–344 chromosome regions from genome-wide analyses of 1–9 (2007). (2011). type 1 diabetes. Nature Genet. 39, 857–864 (2007). 13. Sivakumaran, S. et al. Abundant pleiotropy in human 21. Shete, S. et al. Genome-wide association study 6. Fletcher, O. & Houlston, R. S. Architecture of inherited complex diseases and traits. Am. J. Hum. Genet. 89, identifies five susceptibility loci for glioma. Nature susceptibility to common cancer. Nature Rev. Cancer 607–618 (2011). Genet. 41, 899–904 (2009). 10, 353–361 (2010). 14. Cotsapas, C. et al. Pervasive sharing of genetic effects 22. Yasuno, K. et al. Genome-wide association study of 7. Cross-Disorder Group of the Psychiatric Genomics in autoimmune disease. PLoS Genet. 7, e1002254 intracranial aneurysm identifies three new risk loci. Consortium. Identification of risk loci with shared (2011). Nature Genet. 42, 420–425 (2010). effects on five major psychiatric disorders: a genome- Systematic evaluation of CP associations is carried 23. Tomlinson, I. et al. A genome-wide association scan of wide analysis. Lancet 381, 1371–1379 (2013). out in this study across seven autoimmune diseases tag SNPs identifies a susceptibility variant for This paper presents a genome-wide analysis of CP and application of CPMA method. colorectal cancer at 8q24.21. Nature Genet. 39, associations across five psychiatric disorders. 15. Sirota, M., Schaub, M. A., Batzoglou, S., 984–988 (2007). 8. Stearns, F. W. One hundred years of pleiotropy: Robinson, W. H. & Butte, A. J. Autoimmune disease 24. Thomas, G. et al. Multiple loci identified in a genome- a retrospective. Genetics 186, 767–773 (2010). classification by inverse association with SNP alleles. wide association study of prostate cancer. Nature This is a historical review of pleiotropy. PLoS Genet. 5, e1000792 (2009). Genet. 40, 310–315 (2008).

NATURE REVIEWS | GENETICS ADVANCE ONLINE PUBLICATION | 11

© 2013 Macmillan Publishers Limited. All rights reserved REVIEWS

25. Schaub, M. A., Boyle, A. P., Kundaje, A., Batzoglou, S. 50. Lange, C. et al. A family-based association test for 75. McGuffin, P. et al. The heritability of bipolar affective & Snyder, M. Linking disease associations with repeatedly measured quantitative traits adjusting disorder and the genetic relationship to unipolar regulatory information in the human genome. Genome for unknown environmental and/or polygenic effects. depression. Arch. Gen. Psychiatry 60, 497–502 Res. 22, 1748–1759 (2012). Stat. Appl. Genet. Mol. Biol. 3, Article17 (2004). (2003). 26. Malhotra, D. & Sebat, J. CNVs: harbingers of a rare 51. Klei, L., Luca, D., Devlin, B. & Roeder, K. 76. Rommelse, N. N., Franke, B., Geurts, H. M., variant revolution in psychiatric genetics. Cell 148, Pleiotropy and principal components of heritability Hartman, C. A. & Buitelaar, J. K. Shared heritability of 1223–1241 (2012). combine to increase power for association analysis. attention-deficit/hyperactivity disorder and autism 27. Heinzen, E. L. et al. Rare deletions at 16p13.11 Genet. Epidemiol. 32, 9–19 (2008). spectrum disorder. Eur. Child Adolesc. Psychiatry 19, predispose to a diverse spectrum of sporadic epilepsy 52. Ferreira, M. A. & Purcell, S. M. A multivariate test of 281–295 (2010). syndromes. Am. J. Hum. Genet. 86, 707–718 (2010). association. Bioinformatics 25, 132–133 (2009). 77. McKay, G. J. et al. Evidence of association of APOE 28. Purcell, S. M. et al. Common polygenic variation 53. Shriner, D. Moving toward system genetics through with age-related macular degeneration: a pooled contributes to risk of schizophrenia and bipolar multiple trait analysis in genome-wide association analysis of 15 studies. Hum. Mutat. 32, 1407–1416 disorder. Nature 460, 748–752 (2009). studies. Front. Genet. 3, 1 (2012). (2011). 29. Lichtenstein, P. et al. Common genetic determinants This is a review of multivariate approaches for 78. Ferreira, M. A. et al. Collaborative genome-wide of schizophrenia and bipolar disorder in Swedish detecting CP associations. association analysis supports a role for ANK3 and families: a population-based study. Lancet 373, 54. Ioannidis, J. P., Thomas, G. & Daly, M. J. Validating, CACNA1C in bipolar disorder. Nature Genet. 40, 234–239 (2009). augmenting and refining genome-wide association 1056–1058 (2008). 30. Rieck, M. et al. Genetic variation in PTPN22 signals. Nature Rev. Genet. 10, 318–329 (2009). 79. Wang, K. et al. Comparative genetic analysis of corresponds to altered function of T and B 55. Stahl, E. A. et al. Bayesian inference analyses of the inflammatory bowel disease and type 1 diabetes lymphocytes. J. Immunol. 179, 4704–4710 (2007). polygenic architecture of rheumatoid arthritis. Nature implicates multiple loci with opposite effects. 31. Menard, L. et al. The PTPN22 allele encoding an Genet. 44, 483–489 (2012). Hum. Mol. Genet. 19, 2059–2067 (2010). R620W variant interferes with the removal of 56. Fisher, R. A. Statistical Methods for Research Workers 80. Smyth, D. J. et al. Shared and distinct genetic variants developing autoreactive B cells in humans. (Oliver & Boyd, 1925). in type 1 diabetes and celiac disease. N. Engl. J. Med. J. Clin. Invest. 121, 3635–3644 (2011). 57. Kavvoura, F. K. & Ioannidis, J. P. Methods for meta- 359, 2767–2777 (2008). 32. Zhang, J. et al. The autoimmune disease-associated analysis in genetic association studies: a review of 81. Zhernakova, A. et al. Meta-analysis of genome-wide PTPN22 variant promotes calpain-mediated Lyp/Pep their potential and pitfalls. Hum. Genet. 123, 1–14 association studies in celiac disease and rheumatoid degradation associated with lymphocyte and dendritic (2008). arthritis identifies fourteen non-HLA shared loci. cell hyperresponsiveness. Nature Genet. 43, 902–907 58. de Bakker, P. I. et al. Practical aspects of imputation- PLoS Genet. 7, e1002004 (2011). (2011). driven meta-analysis of genome-wide association 82. Gregory, A. P. et al. TNF receptor 1 genetic risk 33. Behrens, T. W. Lyp breakdown and autoimmunity. studies. Hum. Mol. Genet. 17, R122–R128 (2008). mirrors outcome of anti-TNF therapy in multiple Nature Genet. 43, 821–822 (2011). 59. Bhattacharjee, S. et al. A subset-based approach sclerosis. Nature 488, 508–511 (2012). 34. Zhernakova, A., van Diemen, C. C. & Wijmenga, C. improves power and interpretation for the combined 83. Barabasi, A. L., Gulbahce, N. & Loscalzo, J. Detecting shared pathogenesis from the shared analysis of genetic association studies of Network medicine: a network-based approach to genetics of immune-related diseases. Nature Rev. heterogeneous traits. Am. J. Hum. Genet. 90, human disease. Nature Rev. Genet. 12, 56–68 Genet. 10, 43–55 (2009). 821–835 (2012). (2011). 35. Pomerantz, M. M. et al. The 8q24 cancer risk variant 60. O’Brien, P. C. Procedures for comparing samples with 84. Goh, K. I. et al. The human disease network. rs6983267 shows long-range interaction with MYC multiple endpoints. Biometrics 40, 1079–1087 Proc. Natl Acad. Sci. USA 104, 8685–8690 (2007). in colorectal cancer. Nature Genet. 41, 882–884 (1984). A first step is taken in this study towards the (2009). 61. Xu, X., Tian, L. & Wei, L. J. Combining dependent tests construction of the genotype–phenotype map in 36. Wasserman, N. F., Aneas, I. & Nobrega, M. A. An for linkage or association across multiple phenotypic humans using known disease genes reported 8q24 gene desert variant associated with prostate traits. Biostatistics 4, 223–229 (2003). in OMIM (Online in Man). cancer risk confers differential in vivo activity to a MYC 62. Yang, Q., Wu, H., Guo, C. Y. & Fox, C. S. 85. Lee, D. S. et al. The implications of human metabolic enhancer. Genome Res. 20, 1191–1197 (2010). Analyze multivariate phenotypes in genetic association network topology for disease comorbidity. Proc. Natl 37. Voight, B. F. et al. Plasma HDL cholesterol and risk of studies by combining univariate association tests. Acad. Sci. USA 105, 9880–9885 (2008). myocardial infarction: a Mendelian randomisation Genet. Epidemiol. 34, 444–454 (2010). 86. DePaolo, J., Goker-Alpan, O., Samaddar, T., Lopez, G. study. Lancet 380, 572–580 (2012). 63. van der Sluis, S., Posthuma, D. & Dolan, C. V. & Sidransky, E. The association between mutations in This paper presents an example of Mendelian TATES: efficient multivariate genotype-phenotype the lysosomal protein glucocerebrosidase and randomization using results from GWASs. analysis for genome-wide association studies. PLoS parkinsonism. Mov. Disord. 24, 1571–1578 (2009). 38. Hung, R. J. et al. A susceptibility locus for lung cancer Genet. 9, e1003235 (2013). 87. Denny, J. C. et al. PheWAS: demonstrating the maps to nicotinic acetylcholine receptor subunit genes 64. Huang, J., Johnson, A. D. & O’Donnell, C. J. feasibility of a phenome-wide scan to discover gene- on 15q25. Nature 452, 633–637 (2008). PRIMe: a method for characterization and evaluation disease associations. Bioinformatics 26, 1205–1210 39. Thorgeirsson, T. E. et al. A variant associated with of pleiotropic regions from multiple genome-wide (2010). nicotine dependence, lung cancer and peripheral association studies. Bioinformatics 27, 1201–1206 88. Denny, J. C. et al. Variants near FOXE1 are associated arterial disease. Nature 452, 638–642 (2008). (2011). with hypothyroidism and other thyroid conditions: 40. Chanock, S. J. & Hunter, D. J. Genomics: when the 65. Nica, A. C. et al. Candidate causal regulatory effects using electronic medical records for genome- and smoke clears. Nature 452, 537–538 (2008). by integration of expression QTLs with complex trait phenome-wide studies. Am. J. Hum. Genet. 89, 41. Lee, S. H., Yang, J., Goddard, M. E., Visscher, P. M. & genetic associations. PLoS Genet. 6, e1000895 529–542 (2011). Wray, N. R. Estimation of pleiotropy between complex (2010). 89. Pendergrass, S. A. et al. The use of phenome-wide diseases using SNP-derived genomic relationships and 66. Lin, D. Y. & Sullivan, P. F. Meta-analysis of genome- association studies (PheWAS) for exploration of novel restricted maximum likelihood. Bioinformatics 28, wide association studies with overlapping subjects. genotype-phenotype relationships and pleiotropy 2540–2542 (2012). Am. J. Hum. Genet. 85, 862–872 (2009). discovery. Genet. Epidemiol. 35, 410–422 (2011). 42. Zeger, S. L. & Liang, K. Y. Longitudinal data analysis 67. Cortes, A. & Brown, M. A. Promise and pitfalls of the 90. Pendergrass, S. A. et al. Phenome-wide association for discrete and continuous outcomes. Biometrics 42, immunochip. Arthritis Res. Ther. 13, 101 (2011). study (PheWAS) for detection of pleiotropy within the 121–130 (1986). 68. Voight, B. F. et al. The metabochip, a custom population architecture using genomics and 43. Lange, C., Silverman, E. K., Xu, X., Weiss, S. T. & genotyping array for genetic studies of metabolic, epidemiology (PAGE) network. PLoS Genet. 9, Laird, N. M. A multivariate family-based association cardiovascular, and anthropometric traits. PLoS Genet. e1003087 (2013). test using generalized estimating equations: 8, e1002793 (2012). 91. Rasmussen-Torvik, L. J. et al. High density GWAS for FBAT-GEE. Biostatistics 4, 195–206 (2003). 69. Vansteelandt, S. et al. On the adjustment for LDL cholesterol in African Americans using electronic 44. Liu, J., Pei, Y., Papasian, C. J. & Deng, H. W. covariates in genetic association analysis: a novel, medical records reveals a strong protective variant in Bivariate association analyses for the mixture of simple principle to infer direct causal effects. Genet. APOE. Clin. Transl. Sci. 5, 394–399 (2012). continuous and binary traits with the use of extended Epidemiol. 33, 394–405 (2009). 92. Smoller, J. W., Lunetta, K. L. & Robins, J. generalized estimating equations. Genet. Epidemiol. 70. Lipman, P. J. & Lange, C. CGene: an R package Implications of comorbidity and ascertainment bias 33, 217–227 (2009). for implementation of causal genetic analyses. for identifying disease genes. Am. J. Med. Genet. 96, 45. Lee, P. H. et al. Modifiers and subtype-specific Eur. J. Hum. Genet. 19, 1292–1294 (2011). 817–822 (2000). analyses in whole-genome association studies: 71. Vanderweele, T. J. & Vansteelandt, S. Odds ratios 93. Berkson, J. Limitations of the application of fourfold a likelihood framework. Hum. Hered. 72, 10–20 for mediation analysis for a dichotomous outcome. table analysis to hospital data. Biometrics 2, 47–53 (2011). Am. J. Epidemiol. 172, 1339–1348 (2010). (1946). 46. Hartley, S. W., Monti, S., Liu, C. T., Steinberg, M. H. & 72. VanderWeele, T. J. et al. Genetic variants on 15q25.1, 94. Wray, N. R., Lee, S. H. & Kendler, K. S. Impact of Sebastiani, P. Bayesian methods for multivariate smoking, and lung cancer: an assessment of mediation diagnostic misclassification on estimation of genetic modeling of pleiotropic SNP associations and genetic and interaction. Am. J. Epidemiol. 175, 1013–1020 correlations using genome-wide genotypes. risk prediction. Front. Genet. 3, 176 (2012). (2012). Eur. J. Hum. Genet. 20, 668–674 (2012). 47. O’Reilly, P. F. et al. MultiPhen: joint model of multiple 73. Lawlor, D. A., Harbord, R. M., Sterne, J. A., 95. McCarthy, M. I. et al. Genome-wide association phenotypes can increase discovery in GWAS. PLoS Timpson, N. & Davey Smith, G. Mendelian studies for complex traits: consensus, uncertainty ONE 7, e34861 (2012). randomization: using genes as instruments for making and challenges. Nature Rev. Genet. 9, 356–369 48. Zhang, H., Liu, C. T. & Wang, X. An association test for causal inferences in epidemiology. Stat. Med. 27, (2008). multiple traits based on the generalized Kendall’s tau. 1133–1163 (2008). This Review presents an overview of key J. Am. Stat. Assoc. 105, 473–481 (2010). 74. Glymour, M. M., Tchetgen, E. J. & Robins, J. M. considerations and challenges in GWASs. 49. Ott, J. & Rabinowitz, D. A principal-components Credible Mendelian randomization studies: 96. Laurie, C. C. et al. Quality control and quality approach based on heritability for combining approaches for evaluating the instrumental variable assurance in genotypic data for genome-wide phenotype information. Hum. Hered. 49, 106–111 assumptions. Am. J. Epidemiol. 175, 332–339 association studies. Genet. Epidemiol. 34, 591–602 (1999). (2012). (2010).

12 | ADVANCE ONLINE PUBLICATION www.nature.com/reviews/genetics

© 2013 Macmillan Publishers Limited. All rights reserved REVIEWS

97. Price, A. L. et al. Principal components analysis 112. Smithies, O., Gregg, R. G., Boggs, S. S., 126. Evans, D. M. et al. Interaction between ERAP1 and corrects for stratification in genome-wide Koralewski, M. A. & Kucherlapati, R. S. Insertion of HLA‑B27 in ankylosing spondylitis implicates peptide association studies. Nature Genet. 38, 904–909 DNA sequences into the human chromosomal β-globin handling in the mechanism for HLA‑B27 in disease (2006). locus by homologous recombination. Nature 317, susceptibility. Nature Genet. 43, 761–767 (2011). 98. Price, A. L., Zaitlen, N. A., Reich, D. & Patterson, N. 230–234 (1985). 127. Silverberg, M. S. et al. Ulcerative colitis-risk loci on New approaches to population stratification in 113. Thomas, K. R., Folger, K. R. & Capecchi, M. R. High chromosomes 1p36 and 12q15 found by genome- genome-wide association studies. Nature Rev. Genet. frequency targeting of genes to specific sites in the wide association study. Nature Genet. 41, 216–220 11, 459–463 (2010). mammalian genome. Cell 44, 419–428 (1986). (2009). 99. Rosenberg, N. A. et al. Genome-wide association 114. Li, H. et al. In vivo genome editing restores 128. Strange, A. et al. A genome-wide association study studies in diverse populations. Nature Rev. Genet. 11, haemostasis in a mouse model of haemophilia. Nature identifies new psoriasis susceptibility loci and an 356–366 (2010). 475, 217–221 (2011). interaction between HLA‑C and ERAP1. Nature Genet. 100. Marchini, J. & Howie, B. Genotype imputation for 115. Esvelt, K. M. & Wang, H. H. Genome-scale engineering 42, 985–990 (2010). genome-wide association studies. Nature Rev. Genet. for systems and synthetic biology. Mol. Syst. Biol. 9, 129. Franke, A. et al. Genome-wide meta-analysis increases 11, 499–511 (2010). 641 (2013). to 71 the number of confirmed Crohn’s disease 101. Kann, M. Advances in translational bioinformatics: 116. Cooper, G. M. & Shendure, J. Needles in stacks of susceptibility loci. Nature Genet. 42, 1118–1125 computational approaches for the hunting of disease needles: finding disease-causal variants in a wealth of (2010). genes. Brief. Bioinform. 11, 96–110 (2010). genomic data. Nature Rev. Genet. 12, 628–640 130. Teslovich, T. M. et al. Biological, clinical and population 102. Adzhubei, I. et al. A method and server for predicting (2011). relevance of 95 loci for blood lipids. Nature 466, damaging missense mutations. Nature Methods 7, 117. Hu, X. et al. Integrating autoimmune risk loci with 707–713 (2010). 248–249 (2010). gene-expression data identifies specific pathogenic 131. Broderick, P. et al. Common variation at 3p22.1 and 103. Kumar, P., Henikoff, S. & Ng, P. C. Predicting the immune cell subsets. Am. J. Hum. Genet. 89, 7p15.3 influences multiple myeloma risk. Nature effects of coding non-synonymous variants on protein 496–506 (2011). Genet. 44, 58–61 (2012). function using the SIFT algorithm. Nature Protoc. 4, 118. Henderson, N. C. et al. Galectin‑3 regulates 132. Sanders, S. J. et al. Multiple recurrent de novo CNVs, 1073–1081 (2009). myofibroblast activation and hepatic fibrosis. including duplications of the 7q11.23 Williams 104. Freedman, M. et al. Principles for the post-GWAS Proc. Natl Acad. Sci. USA 103, 5060–5065 (2006). syndrome region, are strongly associated with autism. functional characterization of cancer risk loci. Nature 119. Radosavljevic, G. et al. The roles of galectin‑3 in 70, 863–885 (2011). Genet. 43, 513–518 (2011). autoimmunity and tumor progression. Immunol. Res. 133. Pober, B. R. Williams-Beuren syndrome. N. Engl. 105. Fehrmann, R. et al. Trans-eQTLs reveal that 52, 100–110 (2012). J. Med. 362, 239–252 (2010). independent genetic variants associated with a 120. Honjo, Y., Nangia-Makker, P., Inohara, H. & Raz, A. complex phenotype converge on intermediate genes, Down-regulation of galectin‑3 suppresses Acknowledgements with a major role for the HLA. PLoS Genet. 7, tumorigenicity of human breast carcinoma cells. This work was supported in part by the US National Institute e1002197 (2011). Clin. Cancer Res. 7, 661–668 (2001). of Mental Health (NIMH) grants R01‑MH079799 and 106. Majewski, J. & Pastinen, T. The study of eQTL 121. Shekhar, M. P., Nangia-Makker, P., Tait, L., Miller, F. & K24MH094614 (both to J.W.S.). variations by RNA-seq: from SNPs to phenotypes. Raz, A. Alterations in galectin‑3 expression and Trends Genet. 27, 72–79 (2011). distribution correlate with breast cancer progression: Competing interests statement 107. Gilad, Y., Rifkin, S. A. & Pritchard, J. K. Revealing functional analysis of galectin‑3 in breast epithelial- The authors declare no competing financial interests. the architecture of gene regulation: the promise endothelial interactions. Am. J. Pathol. 165, of eQTL studies. Trends Genet. 24, 408–415 1931–1941 (2004). (2008). 122. Baptiste, T. A., James, A., Saria, M. & Ochieng, J. FURTHER INFORMATION 108. Baker, M. Biorepositories: building better biobanks. Mechano-transduction mediated secretion and uptake Chris Cotsapas’s homepage: http://www.cotsapaslab.info Nature 486, 141–146 (2012). of galectin‑3 in breast carcinoma cells: implications in ENCODE Project: http://www.nature.com/encode 109. Cantor, R., Lange, K. & Sinsheimer, J. S. Prioritizing the extracellular functions of the lectin. Exp. Cell Res. Genotype-Tissue Expression eQTL Browser: GWAS results: a review of statistical methods and 313, 652–664 (2007). http://www.ncbi.nlm.nih.gov/gtex/GTEX2/gtex.cgi recommendations for their application. Am. J. Hum. 123. Nangia-Makker, P. et al. Cleavage of galectin‑3 by NHGRI GWAS Catalogue: http://www.genome.gov/gwastudies Genet. 86, 6–22 (2010). matrix metalloproteases induces angiogenesis in Online Mendelian Inheriatnce in Man (OMIM): http://omim.org 110. Eleftherohorinou, H. et al. Pathway analysis of breast cancer. Int. J. Cancer 127, 2530–2541 (2010). PAGE network: http://www.pagestudy.org GWAS provides new insights into genetic 124. Palmer, T. M. et al. Using multiple genetic variants as Pathguide: http://www.pathguide.org susceptibility to 3 inflammatory diseases. PLoS instrumental variables for modifiable risk factors. POLYPHEN: http://genetics.bwh.harvard.edu/pph2/ Genet. 4, e8068 (2009). Stat. Methods Med. Res. 21, 223–242 (2012). index.shtml 111. Evans, M. J. & Kaufman, M. H. Establishment in 125. Duerr, R. H. et al. A genome-wide association study SIFT: http://sift.jcvi.org culture of pluripotential cells from mouse embryos. identifies IL23R as an inflammatory bowel disease ALL LINKS ARE ACTIVE IN THE ONLINE PDF Nature 292, 154–156 (1981). gene. Science 314, 1461–1463 (2006).

NATURE REVIEWS | GENETICS ADVANCE ONLINE PUBLICATION | 13

© 2013 Macmillan Publishers Limited. All rights reserved