Functional Identification and Characterization of Cis-Regulatory Elements Christopher Michael Fiore Washington University in St

Total Page:16

File Type:pdf, Size:1020Kb

Functional Identification and Characterization of Cis-Regulatory Elements Christopher Michael Fiore Washington University in St Washington University in St. Louis Washington University Open Scholarship Arts & Sciences Electronic Theses and Dissertations Arts & Sciences Summer 8-15-2015 Functional Identification and Characterization of cis-Regulatory Elements Christopher Michael Fiore Washington University in St. Louis Follow this and additional works at: https://openscholarship.wustl.edu/art_sci_etds Part of the Biology Commons Recommended Citation Fiore, Christopher Michael, "Functional Identification and Characterization of cis-Regulatory Elements" (2015). Arts & Sciences Electronic Theses and Dissertations. 579. https://openscholarship.wustl.edu/art_sci_etds/579 This Dissertation is brought to you for free and open access by the Arts & Sciences at Washington University Open Scholarship. It has been accepted for inclusion in Arts & Sciences Electronic Theses and Dissertations by an authorized administrator of Washington University Open Scholarship. For more information, please contact [email protected]. WASHINGTON UNIVERSITY IN ST. LOUIS Division of Biology and Biomedical Sciences Computational and Systems Biology Dissertation Examination Committee: Barak A. Cohen, Chair Joe Dougherty Justin Fay Gary Stormo Andrew Yoo Functional Identification and Characterization of cis-Regulatory Elements by Christopher M Fiore A dissertation presented to the Graduate School of Arts & Sciences of Washington University in partial fulfillment of the requirements for the degree of Doctor of Philosophy August 2015 St. Louis, Missouri © 2015, Christopher M Fiore Table of Contents List of Figures ................................................................................................................................. iv List of Tables .................................................................................................................................. vi Acknowledgments ........................................................................................................................ vii Abstract ........................................................................................................................................... ix Chapter 1: Introduction .................................................................................................................... 1 Importance of transcriptional regulation ...................................................................................... 1 Mechanisms of transcriptional regulation .................................................................................... 2 Discovery and modeling of cis-regulatory elements ................................................................... 4 Regulation of Pluripotency .......................................................................................................... 9 High-throughput reporter assays and synthetic CREs ............................................................... 12 Focus of Dissertation ................................................................................................................. 12 Chapter 2: Interactions between transcription factors help specify cis-regulation in pluripotency ....................................................................................................................................................... 16 Introduction ................................................................................................................................ 17 Results ........................................................................................................................................ 19 Discussion .................................................................................................................................. 23 Materials and Methods ............................................................................................................... 25 Supplemental information .......................................................................................................... 33 Figures ........................................................................................................................................ 37 Supplemental Tables .................................................................................................................. 40 Supplemental Figures ................................................................................................................. 44 Chapter 3: Changes in cis-regulatory rules during differentiation ................................................ 51 Abstract ...................................................................................................................................... 51 Methods ...................................................................................................................................... 55 Figures ........................................................................................................................................ 57 Chapter 4: High-throughput functional testing of ENCODE segmentation predictions ............... 61 Introduction ................................................................................................................................ 63 Results ........................................................................................................................................ 64 Discussion .................................................................................................................................. 71 ii Methods ...................................................................................................................................... 73 Tables ......................................................................................................................................... 80 Figures ........................................................................................................................................ 81 Supplementary Figures and Tables ............................................................................................ 85 Chapter 5: Discussion .................................................................................................................. 101 Synthetic CREs and TF interactions ........................................................................................ 101 Predicting active genomic regulatory elements ....................................................................... 104 Future Directions ..................................................................................................................... 105 Conclusions .............................................................................................................................. 109 References ................................................................................................................................... 110 iii List of Figures Figure 2.1: Thermodynamic model of OSKE library…………………………………….…...…37 Figure 2.2: Expression by unique types of binding sites …………………………………..……38 Figure 2.3: Expression of CREs with only Klf4 binding sites…………………………………...39 Figure 2.S1: OSKE CRE-seq library…………………………………………………………….44 Figure 2.S2: Interaction rules in statistical thermodynamic model of transcription……………..45 Figure 2.S3: Enriched TF binding near interacting partners……………………………………..46 Figure 2.S4: Expression of CREs with two types of binding sites………………………………47 Figure 2.S5: Expression of Klf genes in overexpression conditions…………………………….48 Figure 2.S6: Predicted expression of KBS library……………………………………………….49 Figure 2.S7: Clusters of Klf4 binding sites function in the genome……………………………..50 Figure 3.1: Reproducibility of expression measurements in RA………………...………………57 Figure 3.2: Expression of CREs in LIF vs. RA media…………………………………………...58 Figure 3.3: Expression of CREs by unique sites…………………………………………………59 Figure 3.4: Thermodynamic model predictions of RA expression………………………………60 Figure 4.1: Reproducible expression measurements show differences in expression by segmentation class…………………..………………………………...……….…….81 Figure 4.2: Lower H3K27ac and H3K36me3 signal are associated with higher Weak Enhancer expression……...…………………………………………………...…….………….82 Figure 4.3: Chromatin features and sequence-specific binding identify active sequences………83 Figure 4.S1: Luciferase assay expression measurements…………………………………….….95 Figure 4.S2: Computing the fraction of active sequences does not depend on the method of choosing short segments………………………………………………….………..96 Figure 4.S3: Weak Enhancers control higher median expression than Enhancers….….………..97 Figure 4.S4: Expression and GC Fraction of Scrambled CREs………………………….………98 iv Figure 4.S5: Active Weak Enhancer and Enhancer CREs have lower levels of H3K36me3 and H3K27ac……………...………………………………………...……...…………..99 Figure 4.S6: Diagram of 200-mer Oligos Used to Construct the CRE-seq Library…..………..100 v List of Tables Table 2.S1: Fit parameter values from thermodynamic models. ................................................... 40 Table 2.S2: Sequences used………………………...……………………………………………41 Table 4.1: Percentage of active CREs by segmentation class…………………...……..………..80 Table 4.S1. Percentage of repressed CREs by segmentation class………………………………85 Table 4.S2: Predictive capability of single chromatin features………………………………….86 Table 4.S3: Enrichment of single chromatin features in subsets of CREs………..….………….87 Table 4.S4: Enriched Motifs in Activated Sequences………………………...…..……………..88 Table 4.S5: Logistic Regression
Recommended publications
  • Transformations of Lamarckism Vienna Series in Theoretical Biology Gerd B
    Transformations of Lamarckism Vienna Series in Theoretical Biology Gerd B. M ü ller, G ü nter P. Wagner, and Werner Callebaut, editors The Evolution of Cognition , edited by Cecilia Heyes and Ludwig Huber, 2000 Origination of Organismal Form: Beyond the Gene in Development and Evolutionary Biology , edited by Gerd B. M ü ller and Stuart A. Newman, 2003 Environment, Development, and Evolution: Toward a Synthesis , edited by Brian K. Hall, Roy D. Pearson, and Gerd B. M ü ller, 2004 Evolution of Communication Systems: A Comparative Approach , edited by D. Kimbrough Oller and Ulrike Griebel, 2004 Modularity: Understanding the Development and Evolution of Natural Complex Systems , edited by Werner Callebaut and Diego Rasskin-Gutman, 2005 Compositional Evolution: The Impact of Sex, Symbiosis, and Modularity on the Gradualist Framework of Evolution , by Richard A. Watson, 2006 Biological Emergences: Evolution by Natural Experiment , by Robert G. B. Reid, 2007 Modeling Biology: Structure, Behaviors, Evolution , edited by Manfred D. Laubichler and Gerd B. M ü ller, 2007 Evolution of Communicative Flexibility: Complexity, Creativity, and Adaptability in Human and Animal Communication , edited by Kimbrough D. Oller and Ulrike Griebel, 2008 Functions in Biological and Artifi cial Worlds: Comparative Philosophical Perspectives , edited by Ulrich Krohs and Peter Kroes, 2009 Cognitive Biology: Evolutionary and Developmental Perspectives on Mind, Brain, and Behavior , edited by Luca Tommasi, Mary A. Peterson, and Lynn Nadel, 2009 Innovation in Cultural Systems: Contributions from Evolutionary Anthropology , edited by Michael J. O ’ Brien and Stephen J. Shennan, 2010 The Major Transitions in Evolution Revisited , edited by Brett Calcott and Kim Sterelny, 2011 Transformations of Lamarckism: From Subtle Fluids to Molecular Biology , edited by Snait B.
    [Show full text]
  • An Atlas of Gene Regulatory Elements in Adult Mouse Cerebrum Yang Eric Li1*, Sebastian Preissl2*, Xiaomeng Hou2, Ziyang Zhang1
    bioRxiv preprint doi: https://doi.org/10.1101/2020.05.10.087585; this version posted May 11, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 1 An Atlas of Gene Regulatory Elements in Adult Mouse Cerebrum 2 3 Yang Eric Li1*, Sebastian Preissl2*, Xiaomeng Hou2, Ziyang Zhang1, Kai Zhang1, Rongxin 4 Fang1, Yunjiang Qiu1, Olivier Poirion2, Bin Li1, Hanqing Liu3, Xinxin Wang2, Jee Yun Han2, 5 Jacinta Lucero4, Yiming Yan1, Samantha Kuan1, David Gorkin2, Michael Nunn3, Eran A. 6 Mukamel5, M. Margarita Behrens4, Joseph Ecker3,6 and Bing Ren1,2,7 7 8 *these authors contributed equally 9 10 1Ludwig Institute for Cancer Research, 9500 Gilman Drive, La Jolla, CA 92093, USA 11 2Center for Epigenomics, Department of Cellular and Molecular Medicine, University of 12 California, San Diego, School of Medicine, La Jolla, CA, USA. 13 3Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, 14 92037, USA. 15 4Computational Neurobiology Laboratory, Salk Institute for Biological Studies, La Jolla, 16 CA 92037, USA 17 5Department of Cognitive Science, University of California, San Diego, La Jolla, CA 18 92037, USA. 19 6Howard Hughes Medical Institute, The Salk Institute for Biological Studies, La Jolla, CA, 20 92037, USA. 21 7Institute of Genomic Medicine, Moores Cancer Center, University of California San 22 Diego, School of Medicine, La Jolla, CA, USA. 23 24 Correspondence: Bing Ren ([email protected]) 1 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.10.087585; this version posted May 11, 2020.
    [Show full text]
  • Evolution of Genomic Expression
    C H A P T E R 5 Evolution of Genomic Expression Bernardo Lemos, Christian R. Landry, Pierre Fontanillas, Susan P. Renn, Rob Kulathinal, Kyle M. Brown, and Daniel L. Hartl Introduction Genomic regulation is key to cellular differentiation, tissue morphogenesis, and development. Increasing evidence indicates that evolutionary diversity of phenotypes—from cellular to organismic—may also be, in large part, the result of variation in the regulation of genomic expression. In this chapter we explore the complexity of gene regulation from the perspective of single genes and whole genomes. The first part describes the major factors affecting gene expression levels, from rates of gene transcrip- tion—as mediated by promoter–enhancer interactions and chromatin mod- ifications—to rates of mRNA degradation. This description underscores the multiple levels at which genomic expression can be regulated as well as the complexity and variety of mechanisms used. We then briefly describe the major experimental and computational biology techniques for analyzing gene expression variation and its underlying causes. The final section reviews our understanding of the role of regulatory variation in evolution, including the molecular evolution and population genetics of noncoding DNA, as well as the inheritance and phenotypic evolution of levels of mRNA abundance. The Complex Regulation of Genomic Expression The regulation of gene expression is a complex and dynamic process. It is not a simple matter to turn a gene on and off, let alone precisely regulate its level of expression. Regulation can be accomplished through various mech- anisms at nearly every step of the process of gene expression. Furthermore, each mechanism may require a variety of elements, including DNA sequences, RNA molecules, and proteins, acting in combination to deter- 2 Chapter Five Evolution of Genomic Expression 3 mine the final amount, timing, and location of functional gene product.
    [Show full text]
  • In Response to DNA Damage and C&Sol
    Oncogene (2009) 28, 3235–3245 & 2009 Macmillan Publishers Limited All rights reserved 0950-9232/09 $32.00 www.nature.com/onc ORIGINAL ARTICLE C/EBPa expression is partially regulated by C/EBPb in response to DNA damage and C/EBPa-deficient fibroblasts display an impaired G1 checkpoint R Ranjan1, EA Thompson1, K Yoon2 and RC Smart1 1Cell Signaling and Cancer Group, Department of Environmental and Molecular Toxicology, North Carolina State University, Raleigh, NC, USA and 2National Cancer Center, Division of Common Cancers, Lung Cancer Branch, Goyang-si, Gyeonggi-do, South Korea We observed that CCAAT/enhancer-binding protein involved in homo- or hetero-dimerization (Ramji and (C/EBP)a is highly inducible in primary fibroblasts by Foka, 2002). The N-terminal region contains transcrip- DNA-damaging agents that induce strand breaks, alky- tion activation and regulatory domains that interact late and crosslink DNA as well as those that produce with basal transcription apparatus and transcription bulky DNA lesions. Fibroblasts deficient in C/EBPa co-activators. There are six members of the C/EBP family À/À (C/EBPa ) display an impaired G1 checkpoint as and C/EBPs have important functions in fundamental evidenced by an inappropriate entry into the S-phase in cellular processes, including proliferation, apoptosis, response to DNA damage, and these cells also display an differentiation, inflammation, senescence and energy enhanced G1/S transition in response to mitogens. The metabolism (Ramji and Foka, 2002; Johnson, 2005). induction of C/EBPa by DNA
    [Show full text]
  • The Life-Cycle of Operons
    Lawrence Berkeley National Laboratory Lawrence Berkeley National Laboratory Title The Life-cycle of Operons Permalink https://escholarship.org/uc/item/0sx114h9 Authors Price, Morgan N. Arkin, Adam P. Alm, Eric J. Publication Date 2005-11-18 Peer reviewed eScholarship.org Powered by the California Digital Library University of California Title: The Life-cycle of Operons Authors: Morgan N. Price, Adam P. Arkin, and Eric J. Alm Author a±liation: Lawrence Berkeley Lab, Berkeley CA, USA and the Virtual Institute for Microbial Stress and Survival. A.P.A. is also a±liated with the Howard Hughes Medical Institute and the UC Berkeley Dept. of Bioengineering. Corresponding author: Eric Alm, [email protected], phone 510-486-6899, fax 510-486-6219, address Lawrence Berkeley National Lab, 1 Cyclotron Road, Mailstop 977-152, Berkeley, CA 94720 Abstract: Operons are a major feature of all prokaryotic genomes, but how and why operon structures vary is not well understood. To elucidate the life-cycle of operons, we compared gene order between Escherichia coli K12 and its relatives and identi¯ed the recently formed and destroyed operons in E. coli. This allowed us to determine how operons form, how they become closely spaced, and how they die. Our ¯ndings suggest that operon evolution is driven by selection on gene expression patterns. First, both operon creation and operon destruction lead to large changes in gene expression patterns. For example, the removal of lysA and ruvA from ancestral operons that contained essential genes allowed their expression to respond to lysine levels and DNA damage, respectively. Second, some operons have undergone accelerated evolution, with multiple new genes being added during a brief period.
    [Show full text]
  • Molecular Basis of the Function of Transcriptional Enhancers
    cells Review Molecular Basis of the Function of Transcriptional Enhancers 1,2, 1, 1,3, Airat N. Ibragimov y, Oleg V. Bylino y and Yulii V. Shidlovskii * 1 Laboratory of Gene Expression Regulation in Development, Institute of Gene Biology, Russian Academy of Sciences, 34/5 Vavilov St., 119334 Moscow, Russia; [email protected] (A.N.I.); [email protected] (O.V.B.) 2 Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Institute of Gene Biology, Russian Academy of Sciences, 34/5 Vavilov St., 119334 Moscow, Russia 3 I.M. Sechenov First Moscow State Medical University, 8, bldg. 2 Trubetskaya St., 119048 Moscow, Russia * Correspondence: [email protected]; Tel.: +7-4991354096 These authors contributed equally to this study. y Received: 30 May 2020; Accepted: 3 July 2020; Published: 5 July 2020 Abstract: Transcriptional enhancers are major genomic elements that control gene activity in eukaryotes. Recent studies provided deeper insight into the temporal and spatial organization of transcription in the nucleus, the role of non-coding RNAs in the process, and the epigenetic control of gene expression. Thus, multiple molecular details of enhancer functioning were revealed. Here, we describe the recent data and models of molecular organization of enhancer-driven transcription. Keywords: enhancer; promoter; chromatin; transcriptional bursting; transcription factories; enhancer RNA; epigenetic marks 1. Introduction Gene transcription is precisely organized in time and space. The process requires the participation of hundreds of molecules, which form an extensive interaction network. Substantial progress was achieved recently in our understanding of the molecular processes that take place in the cell nucleus (e.g., see [1–9]).
    [Show full text]
  • Transcription in Eukaryotes
    Transcription in eukaryotes Chromatin structure and its effects on transcription RNA polymerases Promoters General Transcription Factors Activators and Repressors Enhancers and ( Silencers ) Order of events leading to transcription initiation in eukaryotes at a specific promoter CRC … and chemical DNA modifications The order of steps on the pathway to transcription initiation appears to be different for different promoters Acção concertada de: -Activadores/ repressores ( proteínas auxiliares acessórias) -Proteínas de remodelação da cromatina -Capacidade de ligação dos factores gerais da transcrição Chromatin Remodeling Complexes (CRC) or Nucleosome remodeling factors ATPase/Helicase activity and DNA binding protein motifs Histone acetylation is one of the Histone histone chemical modifications acetylation characteristic of actively transcribed chromatin Interaction with other histones and with DNA Lys + HAT- histone acetyltransferase HDAC- histone deacetylase DNA chemical modifications affecting transcription initiation in eukaryotes How DNA methylation may help turning off genes? The binding of gene regulatory proteins and the general transcription machinery near an active promoter may prevent DNA methylation by excluding de novo methylases . If most of these proteins dissociate from the DNA, however, as generally occurs when a cell no longer produces the required activator proteins , the DNA becomes methylated , which enables other proteins to bind, and these shut down the gene completely by further altering chromatin structure . DNA
    [Show full text]
  • GENE REGULATION Differences Between Prokaryotes & Eukaryotes
    GENE REGULATION Differences between prokaryotes & eukaryotes Gene function Description of Prokaryotic Chromosome and E.coli Review Differences between Prokaryotic & Eukaryotic Chromosomes Four differences Eukaryotic Chromosomes Form Length in single human chromosome Length in single diploid cell Proteins beside histones Proportion of DNA that codes for protein in prokaryotes eukaryotes humans Regulation of Gene Expression in Prokaryotes Terms promoter structural gene operator operon regulator repressor corepressor inducer The lac operon - Background E.coli behavior presence of lactose and absence of lactose behavior of mutants outcome of mutants The Lac operon Regulates production of b-galactosidase http://www.sumanasinc.com/webcontent/animations/content/lacoperon.html The trp operon Regulates the production of the enzyme for tryptophan synthesis http://bcs.whfreeman.com/thelifewire/content/chp13/1302002.html General Summary During transcription, RNA remains briefly bound to the DNA template Structural genes coding for polypeptides with related functions often occur in sequence Two kinds of regulatory control positive & negative General Summary Regulatory efficiency is increased because mRNA is translated into protein immediately and broken down rapidly. 75 different operons comprising 260 structural genes in E.coli Gene Regulation in Eukaryotes some regulation occurs because as little as one % of DNA is expressed Gene Expression and Differentiation Characteristic proteins are produced at different stages of differentiation producing cells with their own characteristic structure and function. Therefore not all genes are expressed at the same time As differentiation proceeds, some genes are permanently “turned” off. Example - different types of hemoglobin are produced during development and in adults. DNA is expressed at a precise time and sequence in time.
    [Show full text]
  • Minireview Mcbindall—A Better Name for CCAAT/Enhancer Binding
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Elsevier - Publisher Connector Cell, Vol. 107, 259–261, November 2, 2001, Copyright 2001 by Cell Press McBindall—A Better Name for Minireview CCAAT/Enhancer Binding Proteins? Steven L. McKnight1 code a heterodimeric transcription factor, as well as Department of Biochemistry recognition that the Myc proto-oncogene encodes a UT Southwestern Medical Center transcription factor. Such studies also provided a frame- 5323 Harry Hines Boulevard work and powerful battery of reagents useful for the Dallas, Texas 75390 biological study of C/EBP itself. Finally, biochemical studies of C/EBP helped dispel the myth, anticipated from earlier work on bacterial gene expression, that C/EBP-related transcription factors regulate the bal- eukaryotic transcription factors must have exquisite ance between cell proliferation and mitotic growth ar- binding specificity. This shift of thinking led to the under- rest during terminal differentiation. Three new studies standing that eukaryotic genes employ complex en- give evidence that this regulation is mediated by pro- hancers that integrate the instruction of multiple tran- tein:protein interactions completely distinct from the scription factors acting in a combinatorial manner role of C/EBPs in gene expression. (Thompson and McKnight, 1992). The previous issue of Cell and the most recent issue In the mid-1980s Barbara Graves and Peter Johnson of Molecular Cell present three exciting papers relevant were postdoctoral fellows in my laboratory at the Carne- to the function of various members of the C/EBP family gie Embryology Department. Barbara and Peter were of transcription factors.
    [Show full text]
  • I = Chpt 15. Positive and Negative Transcriptional Control at Lac BMB
    BMB 400 Part Four - I = Chpt 15. Positive and Negative Transcriptional Control at lac B M B 400 Part Four: Gene Regulation Section I = Chapter 15 POSITIVE AND NEGATIVE CONTROL SHOWN BY THE lac OPERON OF E. COLI A. Definitions and general comments 1. Operons An operon is a cluster of coordinately regulated genes. It includes structural genes (generally encoding enzymes), regulatory genes (encoding, e.g. activators or repressors) and regulatory sites (such as promoters and operators). 2. Negative versus positive control a. The type of control is defined by the response of the operon when no regulatory protein is present. b. In the case of negative control, the genes in the operon are expressed unless they are switched off by a repressor protein. Thus the operon will be turned on constitutively (the genes will be expressed) when the repressor in inactivated. c. In the case of positive control, the genes are expressed only when an active regulator protein, e.g. an activator, is present. Thus the operon will be turned off when the positive regulatory protein is absent or inactivated. Table 4.1.1. Positive vs. negative control BMB 400 Part Four - I = Chpt 15. Positive and Negative Transcriptional Control at lac 3. Catabolic versus biosynthetic operons a. Catabolic pathways catalyze the breakdown of nutrients (the substrate for the pathway) to generate energy, or more precisely ATP, the energy currency of the cell. In the absence of the substrate, there is no reason for the catabolic enzymes to be present, and the operon encoding them is repressed. In the presence of the substrate, when the enzymes are needed, the operon is induced or de-repressed.
    [Show full text]
  • Escherichia Coli (Gene Fusion/Attenuator/Terminator/RNA Polymerase/Ribosomal Proteins) GERARD BARRY, CATHERINE L
    Proc. Natl. Acad. Sci. USA Vol. 76, No. 10, pp. 4922-4926, October 1979 Biochemistry Control features within the rplJL-rpoBC transcription unit of Escherichia coli (gene fusion/attenuator/terminator/RNA polymerase/ribosomal proteins) GERARD BARRY, CATHERINE L. SQUIRES, AND CRAIG SQUIRES Department of Biological Sciences, Columbia University, New York, New York 10027 Communicated by Cyrus Levinthal, July 2, 1979 ABSTRACT Gene fusions constructed in vitro have been regulation could occur (4-7). Yet under certain conditions, used to examine transcription regulatory signals from the operon coordinate of the RNA which encodes ribosomal proteins L10 and L7/12 and the RNA expression polymerase subunits and polymerase P and #I subunits (the rplJL-rpoBC operon). Por- ribosomal proteins is not observed. This is especially true of the tions of this operon, which were obtained by in vitro deletions, rplJL-rpoBC transcription unit which encodes the ribosomal have been placed between the ara promoter and the lacZgene proteins L10 and L7/12 and the RNA polymerase subunits 13 in the gene-fusion plasmid pMC81 developed by M. Casadaban and 13'. For example, only the ribosomal proteins are modulated and S. Cohen. The effect of the inserted DNA segment on the by the stringent regulation system (8, 9) whereas a transient expression of the IacZ gene (in the presence and absence of arabinose) permits the localization of regulatory signals to dis- stimulatory effect of rifampicin is specific for the RNA poly- crete regions of the rpIJL-rpoBC operon. An element that re- merase subunits (10, 11). In addition, different amounts of duces the level of distal gene expression to one-sixth is located mRNA hybridize to the rpl and rpo regions of the rplJL-rpoBC on a fragment which spans the rplL-rpoB intercistronic region.
    [Show full text]
  • Cross-Species Regulatory Sequence Activity Prediction
    bioRxiv preprint doi: https://doi.org/10.1101/660563; this version posted September 25, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. Cross-species regulatory sequence activity prediction David R. Kelley Calico Life Sciences South San Francisco, CA [email protected] September 25, 2019 Abstract Machine learning algorithms trained to predict the regulatory activity of nucleic acid sequences have revealed principles of gene regulation and guided genetic variation analysis. While the human genome has been extensively annotated and studied, model organisms have been less explored. Model organism genomes offer both additional training sequences and unique annotations describing tissue and cell states unavailable in humans. Here, we develop a strategy to train deep convolutional neural networks simultaneously on multiple genomes and apply it to learn sequence predictors for large compendia of human and mouse data. Training on both genomes improves gene expression prediction accuracy on held out sequences. We further demonstrate a novel and powerful transfer learning approach to use mouse regulatory models to analyze human genetic variants associated with molecular phenotypes and disease. Together these techniques unleash thousands of non-human epigenetic and transcriptional profiles toward more effective investigation of how gene regulation affects human disease. Introduction Predicting the behavior of any nucleic acid sequence in any environment is a primary objective of gene regulation research. In recent years, machine learning approaches to directly tackle this problem have achieved significant accuracy gains predicting transcription factor (TF) binding, chromatin features, and gene expression from input DNA sequence (1{7).
    [Show full text]