Li and Altman BMC Biology (2018) 16:4
DOI 10.1186/s12915-017-0469-0

RESEARCH ARTICLE Open Access

Systematic target function annotation of human transcription factors

Yong Fuga Li1,2,3* and Russ B. Altman2,4*

Abstract

Background: Transcription factors (TFs), the key players in transcriptional regulation, have attracted great experimental attention, yet the functions of most human TFs remain poorly understood. Recent capabilities in genome-wide protein binding profiling have stimulated systematic studies of the hierarchical organization of the human gene regulatory network and the DNA-binding specificity of TFs, shedding light on combinatorial gene regulation. We show here that these data also enable a systematic annotation of the biological functions and functional diversity of TFs.

Results: We compiled a human gene regulatory network for 384 TFs covering 146,096 TF–target gene (TF–TG) relationships, extracted from over 850 ChIP-seq experiments as well as the literature. By integrating this network of TF–TF and TF–TG relationships with 3715 functional concepts from six sources of gene function annotations, we obtained over 9000 confident functional annotations for 279 TFs. We observe extensive connectivity between TFs and Mendelian diseases, GWAS phenotypes, and pharmacogenetic pathways. Further, we show that TFs link apparently unrelated functions, even when the two functions do not share common genes. Finally, we analyze the pleiotropic functions of TFs and suggest that the increased number of upstream regulators contributes to the functional pleiotropy of TFs.

Conclusion: Our computational approach is complementary to focused experimental studies on TF functions, and the resulting knowledge can guide experimental design for the discovery of unknown roles of TFs in human disease and drug response.
Keywords: Transcription factor, Regulatory network, Gene function annotation, Functional pleiotropy, Regulator diversity, Target gene, Database, Function enrichment, Co-regulation

* Correspondence: [email protected]; [email protected]
1 Stanford Genome Technology Center, Stanford, CA, USA
2 Department of Bioengineering, Stanford University, Stanford, CA, USA
Full list of author information is available at the end of the article

© Li et al. 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background

Regulation of gene expression is essential for the realization of cell type-specific phenotypes [1] during normal development [2] and the adaptation of cellular organisms to their environment [3]. To a large degree, transcriptional regulation occurs through the interaction of protein factors with the genomic DNA [4]. Multiple proteins, including the chromatin remodelers, transcription factors (TFs), cofactors, and other transcription initiation factors [5], work in coordination to regulate the spatiotemporal details of gene expression. In the narrow sense, TFs are proteins that bind DNA in a sequence-specific manner and mediate the integrations of other proteins with specific target genes (TGs) for fine-granular expression control [6]. In this study, we adopt a broad definition of TF that includes the cofactors and other transcription initiation factors.

The pivotal role of TFs in development and cell identity determination is highlighted by the induced pluripotent stem cell (iPSC) technology [7, 8] and trans-induction techniques [9, 10], in which the introduction of just a few specific TFs is sufficient for converting fibroblast cells into pluripotent stem cells, or converting one differentiated cell type, e.g., pancreatic exocrine cells, directly into another differentiated cell type, e.g., β-cells. In addition, TFs are key players controlling diverse physiological functions, ranging from metabolism [11, 12], chemical and mechanical stress responses [13–16], and song-learning [17, 18] to longevity and aging [19–21]. Many TFs are directly involved in diseases such as cancer, diabetes, and neural developmental disorders [9].

TFs have attracted intense research attention [22]; yet, the biological functions of most TFs are still poorly understood. The number of human TFs is estimated to be approximately 1500–2000 based on DNA-binding domain evidence [23–26]. In total, the sequence-specific DNA-binding activities of only 564 TFs are confirmed by experimental evidence and the existence of an additional 490 TFs is supported indirectly by phylogenetic evidence or author claims, based on the Gene Ontology (GO) database [27–30]. Limited knowledge is available on the biological functions of most TFs, with a small number of ‘famous’ TFs, such as TP53, attracting much attention [23]. However, recent developments of high-throughput technologies such as ChIP-seq and DNase-seq [31] provide an unprecedented amount of data on gene regulation, with binding profiles for over 100 TFs from ENCODE alone [32]. This has spurred systematic data-driven studies on transcriptional regulation, such as the discovery of cis-regulatory motifs [33, 34], the mapping of the hierarchical architecture of human gene regulatory networks, and the modeling of combinatorial regulation [32, 35–38]. At the same time, analytics tools have been developed for annotating ChIP-seq data [39–41], some allowing analysis of GO term enrichment for the binding sites [42–44].

In this study, we integrate the existing knowledge about functions and phenotypes of human genes with the transcriptional regulatory network to study the functions of human TFs. We define the ‘target functions’ of a TF as the statistically overrepresented functions among its TGs, and provide a systematic annotation of TF functions, ranging from metabolic pathways to disease phenotypes. In parallel, we define the functional similarity of two TFs based on their TG overlaps, independent of the availability of gene function annotations, and annotate each TF by functionally similar TFs (Fig. 1). We study the pleiotropic functions of individual TFs and show that multifunctionality is associated with the number of upstream regulators of the TFs. With these analyses, we demonstrate a computational approach for achieving systematic understanding of TF functions.

Results

The compendium of human TF TGs

We compiled a TF–TG data compendium covering the direct transcriptional regulation targets of 384 unique TFs extracted from over 850 ChIP-seq experiments as well as the literature with low-throughput experimental evidence. Low-throughput experiments, ENCODE ChIP- […] evaluates each gene as a potential TG based on both the locations and the intensities of the TF binding signals relative to the gene transcriptional start site(s). Overall, 149 (39%) TFs are covered only by high-throughput experiments, among which 52 (35%) are covered by the ENCODE consortium [32, 37] and 107 are covered by individual research labs (based on data published by October 2013). Meanwhile, 122 (32%) TFs are retrieved only from low-throughput experiments, and 113 (29%) TFs from both low- and high-throughput experiments (Additional file 1: Figure S1A).

A total of 16,967 unique TGs of TFs are available, including both TFs and non-TFs. We filtered the TGs identified in high-throughput experiments to achieve an estimated false discovery rate (FDR) of 0.01. Combining all sources, 146,096 TF–TG relationships were obtained. Each gene was regulated by 8.6 TFs in the compendium on average, while each TF in the compendium regulated 380.5 genes on average (Additional file 1: Figure S1B). Further, 63% of TGs were each regulated by five or more TFs, while 18% were each regulated by a single TF in the compendium. Most TFs also had regulators within the compendium, with the exception of 14 TFs that appeared to be master regulators among the TFs in the compendium, including BCOR, GLI2, HLF, HNF4G, MAZ, NELFE, NFATC1, NOTCH1, PHOX2A, RXRA, STAT4, SOX10, TEAD2, and THRA, although RXRA and SOX10 were self-regulated. Note that these TFs could still be regulated by TFs without existing ChIP-seq data or regulated through distant cis-elements not effectively captured by current experimental/computational approaches.

Defining the target functions of TFs

Transcription factors perform their functions by (1) interacting with proteins and cis-regulatory elements and (2) consequently regulating the expression of downstream TGs. There are hence two aspects of functions for a TF: the molecular functions of a TF that enable its regulation of the TGs, and the biological functions exerted by the genes that are under control of the TF. Formally, we define the target functions (e.g., target diseases, target signaling pathways) of a TF as the consensus functions of the TGs, and we identify the target functions of a TF by detecting the enrichment of functional terms in the TGs. The TGs, as a whole, precisely define the biological functions regulated by a TF, while the target functions summarize the functional impacts upon perturbation of a TF.

We first compiled 3715 functional concepts
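As an illustration of what "detecting the enrichment of functional terms in the TGs" can look like in practice, the following is a minimal sketch that assumes a one-sided hypergeometric test, a common choice for over-representation analysis. The excerpt above does not state which test the authors used, so this should not be read as their method. The function names, the gene identifiers, and the toy annotation sets are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which carry the functional term (one-sided upper tail)."""
    upper = min(K, n)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def target_function_enrichment(targets, annotations, background):
    """Return {term: p-value} for over-representation of each term
    among the target genes of one TF (assumed test, see lead-in)."""
    background = set(background)
    targets = set(targets) & background      # only genes in the background count
    N, n = len(background), len(targets)
    pvalues = {}
    for term, genes in annotations.items():
        genes = set(genes) & background
        K = len(genes)                        # background genes with the term
        k = len(targets & genes)              # target genes with the term
        pvalues[term] = hypergeom_sf(k, N, K, n)
    return pvalues


# Toy data: hypothetical gene IDs and terms, for illustration only.
background = {f"g{i}" for i in range(10)}
targets = {"g0", "g1", "g2"}
annotations = {"term_A": {"g0", "g1", "g2"}, "term_B": {"g5"}}
pvalues = target_function_enrichment(targets, annotations, background)
```

In this toy case all three targets carry term_A, so its p-value is small, while term_B is absent from the targets and receives a p-value of 1. A real analysis over thousands of functional concepts would also need a correction for multiple testing.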
