Downloaded 13/6/12) (Kanehisa Et Al, 2010)
Total Page:16
File Type:pdf, Size:1020Kb
bioRxiv preprint doi: https://doi.org/10.1101/455709; this version posted October 30, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Coherent Transcription Factor Target Networks Illuminate Epithelial Remodelling and Oncogenic Notch Ian M. Overton1,2,3,4*, Andrew H. Sims1, Jeremy A. Owen2,5, Bret S. E. Heale1,6, Matthew J. Ford1,7, Alexander L. R. Lubbock1,8, Erola Pairo-Castineira1, Abdelkader Essafi1,9 1. MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK 2. Department of Systems Biology, Harvard University, Boston MA 02115, USA 3. Centre for Synthetic and Systems Biology (SynthSys), University of Edinburgh, Edinburgh EH9 3BF, UK 4. Centre for Cancer Research and Cell Biology, Queen’s University Belfast, Belfast BT9 7AE, UK 5. Department of Physics, Massachusetts Institute of Technology, Cambridge MA 02139, USA 6. Current address: Intermountain Healthcare, 3930 River Walk, West Valley City UT 84120, USA 7. Current address: Rosalind & Morris Goodman Cancer Research Centre, McGill University, Montreal H3A 0G4, Canada 8. Current address: Department of Biochemistry, Vanderbilt University, Nashville TN 37232, USA 9. Current address: School of Cellular and Molecular Medicine, University of Bristol, Bristol BS8 1TD, UK *Corresponding author: [email protected] Running title: Mapping Transcription Factor Networks Subject categories: Network biology; Chromatin, Epigenetics, Genomics & Functional Genomics Manuscript character count: 168,671 1 bioRxiv preprint doi: https://doi.org/10.1101/455709; this version posted October 30, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Abstract Cell identity is governed by gene expression, regulated by Transcription Factor (TF) binding at cis- regulatory modules. Decoding the relationship between patterns of TF binding and the regulation of cognate target genes is nontrivial, remaining a fundamental limitation in understanding cell decision-making mechanisms. Identification of TF physical binding that is biologically ‘neutral’ is a current challenge. We present the ‘NetNC’ software for discovery of functionally coherent TF targets, applied to study gene regulation in early embryogenesis. Predicted neutral binding accounted for 50% to ≥80% of candidate target genes assigned from significant binding peaks. Novel gene functions and network modules were identified, including regulation of chromatin organisation and crosstalk with notch signalling. Orthologues of predicted TF targets discriminated breast cancer molecular subtypes and our analysis evidenced new tumour biology; for example, predicting networks that reshape Waddington’s landscape during EMT-like phenotype switching. Predicted invasion roles for SNX29, ATG3, UNK and IRX4 were validated using a tractable cell model. This work illuminates conserved molecular networks that regulate epithelial remodelling in development and disease, with potential implications for precision medicine. 2 bioRxiv preprint doi: https://doi.org/10.1101/455709; this version posted October 30, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 1 Introduction Transcriptional regulatory factors (TFs) govern gene expression, which is a crucial determinant of phenotype. Therefore, mapping transcriptional regulatory networks is an attractive approach to gain understanding of the molecular mechanisms underpinning both normal biology and disease (Shlyueva et al, 2014; Stampfel et al, 2015; Rhee et al, 2014). TF action is controlled in multiple ways; including protein-protein interactions, DNA sequence affinity, 3D chromatin conformation, post-translational modifications and the processes required for TF delivery to the nucleus (Zabidi & Stark, 2016; Rhee et al, 2014; Khoueiry et al, 2017). The interplay of mechanisms influencing TF specificity across different biological contexts encompasses considerable complexity and genome- scale assignment of TFs to individual genes is challenging (Shlyueva et al, 2014; Wilczynski & Furlong, 2010; Khoueiry et al, 2017). Indeed, much remains to be learned about the regulation of gene expression. For example, the relationship between enhancer sequences and the transcriptional activity of cognate promoters is only beginning to be understood (Khoueiry et al, 2017; Zabidi & Stark, 2016). Prediction of TF occupancy from DNA sequence composition alone has had only limited success, likely because protein interactions influence TF binding specificity (Jolma et al, 2015; Khoueiry et al, 2017). TF binding sites may be determined experimentally using chromatin immunoprecipitation followed by sequencing (ChIP-seq) or microarray (ChIP-chip). These and related methods (e.g. ChIP-exo, DamID) have revealed a substantial proportion of statistically significant ‘neutral’ TF binding, that has apparently no effect on transcription from the promoters of assigned target genes (Shlyueva et al, 2014; Li et al, 2008; Ozdemir et al, 2011; Biggin, 2011). Evidence suggests that neutral binding can arise from TF association with euchromatin; for example, the binding of randomly-selected TFs and genome-wide transcription levels are correlated (Cheng et al, 2012; Consortium, 2012; Brown & Celniker, 2015). Genomic regions that bind large numbers of TFs have been termed Highly Occupied Target (HOT) regions (Roy et al, 2010). HOT regions are enriched for disease SNPs and can function as developmental enhancers (Kvon et al, 2012; Li et al, 2015). However, a considerable proportion of individual TF binding events at HOT regions may have little effect on gene expression and association with chromatin accessibility suggests non-canonical regulatory function such as sequestration of TFs or in 3D genome organisation (Moorman et al, 2006; Montavon et al, 2011) as well as possible technical artefacts (Teytelman et al, 2013). A proportion of apparently neutral binding sites may also have more subtle functions; for example in combinatorial context-specific regulation and in buffering transcriptional noise (Cannavò et al, 3 bioRxiv preprint doi: https://doi.org/10.1101/455709; this version posted October 30, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 2016; Stampfel et al, 2015). Furthermore, enhancers may control the expression of genes that are sequence-distant but spatially close due to the 3D chromatin conformation (Moorman et al, 2006; Montavon et al, 2011). Current approaches to match bound TFs to candidate target genes may miss these distant regulatory relationships. Identification of bona fide, functional TF target genes remains a major obstacle in understanding the regulatory networks that control cell behaviour (Biggin, 2011; Stampfel et al, 2015; Keung et al, 2014; Brown & Celniker, 2015; Khoueiry et al, 2017). The set of genes regulated by an individual TF typically have overlapping expression patterns and coherent biological function (Igual et al, 1996; Karczewski et al, 2014; MacArthur et al, 2009). Indeed, gene regulatory networks are organised in a hierarchical, modular structure and TFs frequently act upon multiple nodes of a given module (Hartwell et al, 1999; Hooper et al, 2007). Therefore, we hypothesised that functional TF targets collectively share network properties that may differentiate them from neutrally bound sites. Graph theoretic analysis can reveal biologically meaningful gene modules, including cross-talk between canonical pathways (Ideker et al, 2002; Vidal et al, 2011; Jaeger et al, 2017) and conversely may enable elimination of neutrally bound candidate TF targets derived from statistically significant ChIP-seq or ChIP-chip peaks. For this purpose, we have developed a novel algorithm (NetNC) that may be applied to discover functional TF targets and so help to illuminate mechanisms controlling cell phenotype, for example to inform causality in regulatory network inference (Shlyueva et al, 2014; Wilczynski & Furlong, 2010). NetNC analyses the connectivity between candidate TF target genes in the context of a functional gene network (FGN), in order to discover biologically coherent TF targets. Network approaches afford significant advantages for handling biological complexity, enable genome-scale analysis of gene function (Hu et al, 2016; Greene et al, 2015), and are not restricted to predefined gene groupings used by standard functional annotation tools (e.g. GSEA, DAVID) (Ideker et al, 2002; Subramanian et al, 2005; Huang et al, 2009). FGNs seek to comprehensively represent gene function and provide a useful framework for analysis of noisy real-world data (Marcotte et al, 1999; Pe’er & Hacohen, 2011). Clustering is frequently applied to a FGN in order to define a fixed network decomposition, as basis for identification of biological modules (Enright et al, 2002; Wang et al, 2011). Modules with a high proportion