Modeling Genomic Regulatory Networks with Big Data

Modeling Genomic Regulatory Networks with Big Data

TIGS-1108; No. of Pages 10 Review Modeling genomic regulatory networks with big data Hamid Bolouri Division of Human Biology, Fred Hutchinson Cancer Research Center (FHCRC), 1100 Fairview Avenue North, PO Box 19024, Seattle, WA 98109, USA High-throughput sequencing, large-scale data generation throughput sequencing-based technologies and to compu- projects, and web-based cloud computing are changing tational modeling and analysis. At the same time, GRNs how computational biology is performed, who performs are both complex (i.e., can exhibit hard-to-predict/nonline- it, and what biological insights it can deliver. I review here ar behaviors) and complicated (i.e., they are composed of the latest developments in available data, methods, and large numbers of component parts and interactions). For software, focusing on the modeling and analysis of the this reason, mathematical and computational approaches gene regulatory interactions in cells. Three key findings are essential in GRN research. are: (i) although sophisticated computational resources Cellular behaviors have traditionally been character- are increasingly available to bench biologists, tailored ized as being mediated through highly distinct processes ongoing education is necessary to avoid the erroneous (e.g., DNA replication) and pathways (e.g., the canonical use of these resources. (ii) Current models of the regula- WNT signaling pathway). However, because of widespread tion of gene expression are far too simplistic and need interactions among cellular processes and pathways, the updating. (iii) Integrative computational analysis of large- use of unbiased, genome-wide technologies is essential to scale datasets is becoming a fundamental component of the discovery and characterization of GRNs. molecular biology. I discuss current and near-term oppor- In addition to the bedrock of ‘classical’ cis-regulatory tunities and challenges related to these three points. analysis, GRN modeling today is buttressed by four cor- nerstones: (i) high-throughput technologies, (ii) integrative Gene regulatory networks (GRNs) analysis of complementary data types, (iii) leveraging The past few years have witnessed dramatic milestones in large-scale public datasets, (iv) computational modeling high-throughput sequencing, large-scale data generation, and analysis. This article reviews recent developments and cloud computing, and computational biology. Supra-expo- discusses their implications for future research. nential improvements in the throughput and cost of DNA To maintain coherence and brevity, this review will sequencing (http://www.genome.gov/sequencingcosts/) have focus on developments in human GRN modeling and anal- been accompanied by improvements in accuracy and reduc- ysis. Diverse new GRN modeling opportunities are also tions in the required sample size. These improvements have opening up in both well-studied and less-studied organ- in turn led to the widespread adoption of a broad range of isms. These and the complex GRNs underlying interac- sequencing-based technologies (reviewed in [1]) to charac- tions between hosts and commensal or pathogenic terize not only genomes but also the regulatory interactions organisms are beyond the scope of the present review. that allow genomes to specify cellular structure, function, and behavior. Types and uses of human GRN modeling GRNs are defined as the set of interactions among genes A model is any representation of a system that can facili- and their products (RNAs and proteins) that determine the tate its analysis, communication, or documentation [3]. isoforms, location (cell type), timing, and rate of RNA Modeling is at the heart of GRN research at multiple expression [2] (see Figure 1 for examples). With the possi- levels. At the most basic level, statistical models are at ble exception of some metabolic and physiological process- the heart of all high-throughput data analysis. For exam- es, GRNs are the primary drivers of cellular behavior and ple, statistical models are commonly used to characterize function. DNA fragment length distribution as a first step towards Because GRNs are ultimately specified by the digital the identification of transcription factor (TF) binding peaks code of DNA, they are uniquely accessible to both high- in ChIP-seq (chromatin immunoprecipitation followed by high-throughput DNA sequencing) data. Given filtered data, methods such as network inference Corresponding author: Bolouri, H. ([email protected]). Keywords: gene regulatory networks; modeling; network biology; big data; computa- [4], guilt-by-association (e.g., through network or expres- tional biology; bioinformatics; systems biology. sion clustering; see Figures 2 and 3), and enrichment/over- 0168-9525/$ – see front matter representation analysis (e.g., to identify the impacted ß 2014 Published by Elsevier Ltd. http://dx.doi.org/10.1016/j.tig.2014.02.005 pathways or processes [5]) are used to organize genes and their products into broad-brush conceptual models. These models can then be refined and extended by inte- grating multiple data types each highlighting a different Trends in Genetics xx (2014) 1–10 1 TIGS-1108; No. of Pages 10 Review Trends in Genetics xxx xxxx, Vol. xxx, No. x (A) SourceId TargetId Source Sign Target Evidence Weak early effect? Stronger indirect effect? (Franco et al. 2006 shows effect on 32 171 Notch/CSL REPRESSES C/EBPa purified Thy-1+ cells; not seen in crude fracon in Taghon et al 2007) DN2, DN3 Stage negave 265 100 Regulator REPRESSES GATA-2 Tydell et al. 2007 J. Immunol. 264 163 DN3 Specific regulator PROMOTES Eva1 Tydell et al. 2007 J. Immunol. DN2, DN3 Stage negave 265 161 Regulator REPRESSES c-Kit Taghon et al. 2006 Immunity; Yui & Rothenberg 2004 J. Immunol. 264 165DN3 Specific regulator PROMOTES Notch1 Taghon et al. 2006 Immunity Key: 192 82 Notch-mod. GATA-3 REPRESSES TCF-1 Taghon et al 2007 Nat Immunol Expression: Wang, D.et al.: Anderson, J Immunol. 2006 Jul 1;177(1):109-19; Franco et al 2006 32 117 Notch/CSL PROMOTES HEBAlt PNAS. High Medium Low 32 160 Notch/CS L PROMOTES CD25 Taghon et al 2005 ; I. Maillardet al.: Pear, J Exp Med. 2006 Oct 2;203(10):2239-45. Taghon et al 2005 Genes Dev; Franco et al. 2006 PNAS; Taghon et al 2007 Nat 32 89 Notch/CSL PROMOTES Deltex1 Immunol; many earlier references 32 116 Notch/CSL PROMOTES Ets2 Taghon et al 2007 Nat Immunol; Franco et al 2006 PNAS GATA-3 GATA-3 GATA-3 GATA-3 32 34 Notch/CSL PROMOTES HES-1 Taghon et al 2005 Genes Dev; many other references Notch-mod. Notch-mod. Notch-mod. Notch-mod. 32 149 GATA-3 Notch/CSL PROMOTEGATA-3S LEF-1 Taghon et al 2007 Nat ImmunolGATA-3 GATA-3 32 165 Notch/CSL PROMOTES Notch1 Taghon et al 2007 Nat Immunol 32 166TCF-1 Notch/CSBcl11BL PROMOTE TCF-1S Notch3Bcl11B Taghon et al 2007 Nat ImmunoTCF-1 l Bcl11B TCF-1 Bcl11B Reizis, B. & Leder Genes Dev. 2002 Feb 1;16(3):295-300; Franco et al. 2006 PNAS; 32 0 Notch/CSL PROMOTES PTa Taghon et al 2007 Nat Immunol α α HES-1 Nrarp PTα HES-1 Nrarp PTα HES-1 Nrarp PT HES-1 Nrarp PT Taghon et al 2007 Nat Immunol; Franco et al 2006 PNAS; cf. also Nakagawa, 32 153 Notch/CSL PROMOTES Runx1 M.et al.: Chiba, Blood. 2006 Nov 15;108(10):3329-34. Notch3 Notch3 Notch3 Notch3 Deltex1 Notch1 CD25 Deltex1 Notch1 CD25 Deltex1 Notch1 CD25 Deltex1 Notch1 CD25 Pou6f1 LEF-1 SATB1 Pou6f1 LEF-1 SATB1 Pou6f1 LEF-1 SATB1 Pou6f1 LEF-1 SATB1 DN1 Bcl2 Erg DN2 Bcl2 Erg DN3 Bcl2 Erg DN4 Bcl2 Erg (B) TRENDS in Genetics Figure 1. Examples of gene regulatory network analysis, documentation, and visualization. (A) Part of BioTapestry visualization of a proposed early T cell specification gene regulatory network (GRN) (adapted from: http://www.its.caltech.edu/tcellgrn/Oldnetwork.html). Each gene (symbol with a bent arrow) is represented as having a regulatory region (horizontal line) and a transcriptional output (arrow). A transcription factor (TF)–DNA binding interaction is depicted as an arrow incident on the regulatory region of a gene. Protein–protein interactions are depicted by circles with incident and output arrows. The background color of each gene indicates the fold-change in expression of the gene at a particular developmental stage. Snapshots of the network over four developmental stages are shown [double negative (DN) 1 to 4]. In the interactive viewer, clicking on a gene brings up a table showing the experimental data supporting the indicated regulatory interactions. (B) Cytoscape visualization of potential T cell specification gene regulatory interactions derived from ChIP-seq and gene expression data. Arrows represent regulatory interactions. Node colors and sizes represent gene expression levels at early and late developmental stages. The inset shows a zoomed-in view of the lower portion of the network. Using Cytoscape utilities, the user can quickly and easily identify a set of genes coregulated by Sfpi1 and Lyl1 (edge arrows highlighted in gold). This example network was derived during a 1.5 h introductory laboratory session by novice computational biology students (see http://www.bu.edu/computationalimmunology/summer-school/ for details). 2 TIGS-1108; No. of Pages 10 Review Trends in Genetics xxx xxxx, Vol. xxx, No. x (A) (B) PC2 Expression in condion 2 Expression in condion 1 PC1 (C) (D) PC2 PC2 PC1 PC1 TRENDS in Genetics Figure 2. An example of how choices in data processing impact the findings. (A) Scatter plot of example simulated data showing the expression of 200 genes in two conditions. This simulated dataset was intentionally generated to have two distinct characteristics. First, genes near the lower-right (upper-left) corner of the plot are high (low) in condition 1 and low (high) in condition 2. In addition, there are two well-separated groups of genes distinguished by having generally higher (lower) expression in both conditions. (B) The same data plotted in the space of the principal components (PC) (in this case, because there were only two conditions, there are only two PCs).

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    10 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us