Understanding Tissue-Specific Gene Regulation
bioRxiv preprint doi: https://doi.org/10.1101/110601; this version posted August 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.

Abhijeet R. Sonawane1, John Platig2,3, Maud Fagny2,3, Cho-Yi Chen2,3, Joseph N. Paulson2,3, Camila M. Lopes-Ramos2,3, Dawn L. DeMeo1, John Quackenbush2,3,4, Kimberly Glass1,†,∗, Marieke L. Kuijjer2,3,†,∗

1 Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
2 Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA
3 Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA
4 Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA

Although all human tissues carry out common processes, tissues are distinguished by gene expression patterns, implying that distinct regulatory programs control tissue-specificity. In this study, we investigate gene expression and regulation across 38 tissues profiled in the Genotype-Tissue Expression project. We find that network edges (transcription factor to target gene connections) have higher tissue-specificity than network nodes (genes) and that regulating nodes (transcription factors) are less likely to be expressed in a tissue-specific manner as compared to their targets (genes). Gene set enrichment analysis of network targeting also indicates that regulation of tissue-specific function is largely independent of transcription factor expression. In addition, tissue-specific genes are not highly targeted in their corresponding tissue-network. However, they do assume bottleneck positions due to variability in transcription factor targeting and the influence of non-canonical regulatory interactions.
These results suggest that tissue-specificity is driven by context-dependent regulatory paths, providing transcriptional control of tissue-specific processes.

∗ [email protected]; [email protected]
† Equal contribution

1. INTRODUCTION

Although all human cells carry out common processes that are essential for survival, in the physical context of their tissue-environment, they also exhibit unique functions that help define their phenotype. These common and tissue-specific processes are ultimately controlled by gene regulatory networks that alter which genes are expressed and control the extent of that expression. While tissue-specificity is often described based on gene expression levels, we recognize that, by themselves, individual genes, or even sets of genes, cannot adequately capture the variety of processes that distinguish different tissues. Rather, biological function requires the combinatorial involvement of multiple regulatory elements, primarily transcription factors (TFs), that work together and with other genetic and environmental factors to mediate the transcription of genes and their protein products [1, 2].

Gene regulatory network modeling provides a mathematical framework that can summarize the complex interactions between transcription factors, genes, and gene products [3–6]. Despite the complexity of the regulatory process, the most widely used network modeling methods are based on pairwise gene co-expression information [7–10]. While these correlation-based networks may provide some biological insight concerning the associations between both tissue-specific and other genes [11, 12], they do not explicitly model key elements of the gene regulatory process.

PANDA (Passing Attributes between Networks for Data Assimilation) is an integrative gene regulatory network inference method that models the complexity of the regulatory process, including interactions between transcription factors and their targets [13]. PANDA uses a message passing approach to optimize an initial network between transcription factors and target genes by integrating it with gene co-expression and protein-protein interaction information. In contrast to other network approaches, PANDA does not directly incorporate co-expression information between regulators and targets. Instead, edges in PANDA-predicted networks reflect the overall consistency between a transcription factor's canonical regulatory profile and its target genes' co-expression patterns. A number of studies have shown that analyzing the structure of the regulatory networks estimated by PANDA can help elucidate the regulatory context of genes and transcription factors and provide insight in the associated biological processes [14–17].

The transcriptomic data produced by the Genotype-Tissue Expression (GTEx) consortium [18] provide us with an unprecedented opportunity to investigate the complex regulatory patterns important for maintaining the diverse functional activity of genes across different tissues in the human body [19, 20]. These data include high-throughput RNA sequencing (RNA-Seq) information from 551 research subjects, sampled from 51 postmortem body sites and cell lines derived from two tissue types.

In this study, we apply PANDA to infer gene regulatory networks for thirty-eight different tissues by integrating GTEx RNA-Seq data with a canonical set of transcription factor to target gene edges (based on a motif scan of proximal promoter regions) and protein-protein interactions. We then use these tissue-networks to identify tissue-specific regulatory interactions, to study the tissue-specific regulatory context of biological function, and to understand how tissue-specificity manifests itself within the global regulatory framework. By studying the structure of these networks and comparing them between tissues, we are able to gain several important insights into tissue-specific gene regulation. Our overall approach is summarized in Figure 1.

2. RESULTS

2.1. Identifying Tissue-Specific Network Edges

We started by reconstructing genome-wide regulatory networks for each human tissue. We downloaded GTEx RNA-Seq data from dbGaP (phs000424.v6.p1, 2015-10-05 release). The RNA-Seq data were preprocessed and then normalized in a sparse-aware manner [21] so as to retain genes that are expressed in only a single or small number of tissues. After filtering and quality control, our RNA-Seq data included expression information for 30,243 genes measured across 9,435 samples and 38 distinct tissues (Supplemental Materials and Methods). For each tissue, we used PANDA to integrate gene-gene co-expression information from this data set with an initial regulatory network based on a genome-wide motif scan of 644 transcription factors [22] and pairwise transcription factor protein-protein interactions (PPI) from StringDb v10 [23] (Figure 1 and Supplemental Mate-

[Figure 1 panels: GTEx Expression data from 38 Tissues/Tissue-Sites; (1) Integrate Regulatory Information (PANDA): PPI + Motif + Co-expression, yielding Regulatory Networks for 38 Tissues/Tissue-Sites; (2) Identify Tissue-Specific Network Elements: Tissue-Specific Edges, Tissue-Specific Nodes (TFs and Genes); (3) Characterize Tissue-Specific Regulation of Biological Processes: GO Terms; (4) Investigate the Regulatory Context of Tissue-Specificity: TF Targeting Profiles across 38 Tissues/Tissue-Sites]

Figure 1: Schematic overview of our approach to characterize tissue-specific gene regulation using the GTEx expression data. We started with gene expression for 9,435 samples across 38 tissues; the relative sample size of each of the 38 tissues in the GTEx expression data is