Dissection of Complex Gene Expression Using the Combined Analysis of Pleiotropy and Epistasis
VIVEK M. PHILIP, ANNA L. TYLER, and GREGORY W. CARTER∗
The Jackson Laboratory, Bar Harbor, ME, 04609, USA
∗E-mail: [email protected]

Global transcript expression experiments are commonly used to investigate the biological processes that underlie complex traits. These studies can exhibit complex patterns of pleiotropy when trans-acting genetic factors influence overlapping sets of multiple transcripts. Dissecting these patterns into biological modules with distinct genetic etiology can provide models of how genetic variants affect specific processes that contribute to a trait. Here we identify transcript modules associated with pleiotropic genetic factors and apply genetic interaction analysis to disentangle the regulatory architecture in a mouse intercross study of kidney function. The method, called the combined analysis of pleiotropy and epistasis (CAPE), has been previously used to model genetic networks for multiple physiological traits. It simultaneously models multiple phenotypes to identify direct genetic influences as well as influences mediated through genetic interactions. We first identified candidate trans expression quantitative trait loci (eQTL) and the transcripts potentially affected. We then clustered the transcripts into modules of co-expressed genes, from which we computed summary module phenotypes. Finally, we applied CAPE to map the network of interacting module QTL (modQTL) affecting the gene modules. The resulting network mapped how multiple modQTL both directly and indirectly affect modules associated with metabolic functions and biosynthetic processes. This work demonstrates how the integration of pleiotropic signals in gene expression data can be used to infer a complex hypothesis of how multiple loci interact to co-regulate transcription programs, thereby providing additional constraints to prioritize validation experiments.
Keywords: pleiotropy, genetic interaction, genetic network.

1. Introduction

The widespread adoption of genomic technologies has greatly increased the power and scope of genetic studies. One especially fruitful approach to understanding how genetic variation affects biological processes is the study of the genetics of gene expression.1-5 In these studies, transcript levels are treated as panels of thousands of phenotypes that quantify the cellular composition and gene expression of a tissue sample that is related to a physiological phenotype such as disease. These data are commonly analyzed to identify expression quantitative trait loci (eQTL), which are specific chromosomal regions that associate with the expression level of a given transcript.

Associated eQTL are generally classified as local, cis-acting variants that affect the expression of a gene located near the associated variant, or remote, trans-acting variants that affect the expression of a gene located at a distance (i.e., outside of linkage disequilibrium (LD) or on another chromosome). The more common cis associations have the straightforward biological interpretation of a sequence variant directly affecting the production, stability, or splicing of its own transcript. However, trans associations are often more difficult to interpret. The structure of gene regulatory networks suggests that these trans associations are caused by transcription factors or other proteins that bind and regulate DNA or RNA. The co-regulatory structures of these networks, in which proteins regulate multiple transcripts in complex hierarchies,6 suggest that genetic variation in one regulatory gene could have significant effects on the expression of multiple target transcripts. This would generate extensive pleiotropy, as many redundantly regulated transcripts would associate with the variant. While this is pleiotropy in the sense that one genetic variant influences multiple traits, it is somewhat trivial in that the multiple traits are redundant outputs of the same regulatory module. This effect can be efficiently modeled by first finding modules of co-expressed transcripts that map to a common trans-acting module QTL (modQTL). Pleiotropy between modQTL, in which a single variant is associated with multiple distinct gene modules, is more informative in the sense of a single variant affecting multiple regulatory programs in a more complex genetic architecture (Figure 1). Distinguishing between trivial and informative pleiotropy can be difficult for complex regulatory networks in which multiple regulatory variants combine to affect hundreds of transcript outputs.

[Figure 1: three panels. (A) eQTL1 linked to transcripts T1-T6, which affect the trait. (B) eQTL1 linked to two transcript modules (T1-T3 and T4-T6), which affect the trait. (C) modQTL1 and modQTL2 linked to transcripts T1-T6, which affect the trait.]

Fig. 1. Hypothetical regulatory architecture of transcripts (T1, ..., T6) that serve as an endophenotype for an organism-level trait. (A) Simple model in which all transcripts are associated with trans-acting eQTL1 and are part of a single underlying biological process affecting the trait. (B) Model with transcripts grouped into two modules that combine to affect the trait. Models (A) and (B) are indistinguishable using single-locus association. (C) Model obtained with co-expression clustering and CAPE analysis, in which the eQTL has been replaced by two module QTL (modQTL). The genetic effects now map to the two modules distinctly, and the modQTL are linked by a directional influence mapping feed-forward regulation from modQTL1 to the red module via modQTL2.

In this paper, we address this problem by modeling interacting trans associations for modules of co-expressed genes. We use kidney transcript data from a panel of F2 mouse intercross progeny to dissect the genetic regulation of multiple biological processes that affect overall kidney function in these genetically diverse mouse models.
We use co-expression analysis to identify gene modules with correlated expression and common function and derive summary endophenotypes that describe transcriptional states. We next use a combined analysis of pleiotropy and epistasis (CAPE7) to simultaneously assess patterns of pleiotropy and statistical interactions between trans modQTL, in order to infer the variant-to-variant ordering of regulatory influences on the multiple processes. This approach improves the interpretation of genetic interactions in terms of directed QTL-to-QTL influences that map how a given locus suppresses or enhances the effects of a second locus. By integrating evidence of epistasis across multiple phenotypes, the CAPE method can improve power to detect modQTL interactions and assign directionality to the relationship. Furthermore, CAPE inherently parses QTL-to-phenotype associations into direct effects and effects modified through genetic interactions, thereby separating the target transcripts into subsets that are influenced by distinct combinations of modQTL. In the case of transcript data, the result is a model of how multiple modQTL affect one another and, in turn, the regulation of multiple modules of co-expressed genes (Figure 1C). The resulting network model provides a clearer dissection of the nature of the observed pleiotropy and generates more specific hypotheses of variant activity and action.

2. Methods

We followed a multi-step strategy to systematically identify and model multiple gene modules that underlie kidney health and disease. The procedure is outlined in Figure 2 and consists of three main steps: a preliminary eQTL analysis to identify transcripts affected by one or more genetic factors; clustering of the affected transcripts into co-expressed gene modules; and a network analysis to map how the gene modules are regulated by multiple interacting genetic loci.
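
As a rough illustration of the first step, the sketch below classifies eQTL associations as cis or trans and keeps transcripts that map to two or more trans loci, the selection criterion shown in Figure 2. It is a minimal sketch rather than the authors' implementation: the `Association` record, the function names, the example data, and the 10 Mb cis window (a stand-in for "outside of linkage disequilibrium") are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Association(NamedTuple):
    """One transcript-to-locus association (hypothetical record layout)."""
    transcript: str
    gene_chr: str        # chromosome of the gene encoding the transcript
    gene_pos_mb: float   # gene position in Mb
    locus_chr: str       # chromosome of the associated locus
    locus_pos_mb: float  # locus position in Mb


# Assumed cis window; the paper defines trans as outside LD or on another
# chromosome and does not give a distance threshold in this section.
CIS_WINDOW_MB = 10.0


def is_trans(a: Association, window_mb: float = CIS_WINDOW_MB) -> bool:
    """Trans if on another chromosome or farther away than the window."""
    if a.gene_chr != a.locus_chr:
        return True
    return abs(a.gene_pos_mb - a.locus_pos_mb) > window_mb


def select_transcripts(associations, min_trans_loci: int = 2):
    """Return {transcript: set of trans loci} for transcripts that map to
    at least `min_trans_loci` distinct trans loci."""
    trans_loci = defaultdict(set)
    for a in associations:
        if is_trans(a):
            trans_loci[a.transcript].add((a.locus_chr, a.locus_pos_mb))
    return {t: loci for t, loci in trans_loci.items()
            if len(loci) >= min_trans_loci}


# Made-up example associations, not data from the study.
example = [
    Association("T1", "5", 30.0, "1", 50.0),    # trans (other chromosome)
    Association("T1", "5", 30.0, "4", 120.0),   # trans (other chromosome)
    Association("T2", "15", 60.0, "15", 62.0),  # cis (within window)
    Association("T2", "15", 60.0, "17", 20.0),  # trans (other chromosome)
    Association("T3", "1", 10.0, "1", 150.0),   # trans (same chr, distant)
    Association("T3", "1", 10.0, "4", 120.0),   # trans (other chromosome)
]
selected = select_transcripts(example)
print(sorted(selected))
```

In this toy input, T2 is dropped because only one of its two associations is trans. A real analysis would also require each association to pass a significance threshold before it is counted.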
We began with a study of gene expression related to kidney function in a mouse intercross.8 An F2 intercross population was derived from the kidney damage-susceptible SM/J inbred strain and the nonsusceptible MRL/MpJ inbred strain. Male SM/J mice exhibit kidney dysfunction, as measured by an increase in urinary albumin-to-creatinine ratio (ACR). To identify causal genetic loci, ACR was measured in 173 male F2 progeny. Significant QTL were mapped on chromosomes (Chrs) 1, 4, and 15, with an additional suggestive QTL on Chr 17.8 This established ACR as a trait affected by multiple QTL that vary between the SM/J and MRL/MpJ lines.

[Figure 2: flowchart of three steps, each shown as input, actions, and output.
1. Transcript Selection. Input: 33,881 transcripts, 258 loci. Actions: map eQTL for all transcripts; select candidate trans eQTL; select transcripts that map to 2+ trans eQTL. Output: 8,144 transcripts, 8 trans loci.
2. Gene Module Analysis. Input: 8,144 transcripts, 8 trans loci. Actions: cluster transcripts by co-expression using WGCNA; identify gene modules and query for functional coherence; compute summary profiles to define module phenotypes. Output: 14 gene modules, 8 trans loci.
3. Network Derivation. Input: 14 gene modules, 8 trans loci. Actions: scan module phenotypes to identify key modules and candidate modQTL; scan key module phenotypes to identify interactions between modQTL; reparametrize with CAPE to derive a common model for all module phenotypes; derive network of modQTL-to-modQTL and modQTL-to-module influences. Output: 3 gene modules, 8 interacting trans loci.]

Fig. 2. Overview of analytical strategy.

2.1. Data

To identify the biological pathways and processes underlying the ACR results, mRNA was collected from whole kidneys of the 173 F2 animals. Data generation and processing are described in depth in the initial publication8 and will be summarized here. All mice were genotyped using an array that contained 258 polymorphisms that were informative between the MRL/MpJ and SM/J strains. RNA samples were