The BEN Domain Is a Novel Sequence- Specific DNA-Binding Domain Conserved in Neural Transcriptional Repressors

Total Page:16

File Type:pdf, Size:1020Kb

The BEN Domain Is a Novel Sequence- Specific DNA-Binding Domain Conserved in Neural Transcriptional Repressors Downloaded from genesdev.cshlp.org on September 25, 2021 - Published by Cold Spring Harbor Laboratory Press The BEN domain is a novel sequence- specific DNA-binding domain conserved in neural transcriptional repressors Qi Dai,1,3 Aiming Ren,2,3 Jakub O. Westholm,1 Artem A. Serganov,2 Dinshaw J. Patel,2 and Eric C. Lai1,4 1Department of Developmental Biology, 2Department of Structural Biology, Sloan-Kettering Institute, New York, New York 10065, USA We recently reported that Drosophila Insensitive (Insv) promotes sensory organ development and has activity as a nuclear corepressor for the Notch transcription factor Suppressor of Hairless [Su(H)]. Insv lacks domains of known biochemical function but contains a single BEN domain (i.e., a ‘‘BEN-solo’’ protein). Our chromatin immuno- precipitation (ChIP) sequencing (ChIP-seq) analysis confirmed binding of Insensitive to Su(H) target genes in the Enhancer of split gene complex [E(spl)-C]; however, de novo motif analysis revealed a novel site strongly enriched in Insv peaks (TCYAATHRGAA). We validate binding of endogenous Insv to genomic regions bearing such sites, whose associated genes are enriched for neural functions and are functionally repressed by Insv. Unexpectedly, we found that the Insv BEN domain binds specifically to this sequence motif and that Insv directly regulates transcription via this motif. We determined the crystal structure of the BEN–DNA target complex, revealing homodimeric binding of the BEN domain and extensive nucleotide contacts via a helices and a C-terminal loop. Point mutations in key DNA-contacting residues severely impair DNA binding in vitro and capacity for transcriptional regulation in vivo. We further demonstrate DNA-binding and repression activities by the mammalian neural BEN-solo protein BEND5. Altogether, we define novel DNA-binding activity in a conserved family of transcriptional repressors, opening a molecular window on this extensive gene family. [Keywords: BEN domain; neurogenesis; transcriptional repression] Supplemental material is available for this article. Received January 3, 2013; revised version accepted February 6, 2013. Sequence-specific transcription factors coordinate and annotated DNA-binding proteins that have yet to be drive all programs of development and physiology, and characterized, but perhaps some of them may represent a few dozen families of sequence-specific DNA-binding target sites of proteins whose DNA-binding activity is not folds have been characterized to date (Yusuf et al. 2012). evident from a primary sequence. With the passing years, the recognition of novel DNA- The Drosophila peripheral nervous system has been an binding domains has become increasingly infrequent. In exemplary model system for studying the sequential particular, the latest DNA-binding domains were recog- adoption of cell fates during development (Lai and Orgogozo nized from bacterial or fungal species (Boch et al. 2009; 2004). The capacity for neurogenesis within the epider- Moscou and Bogdanove 2009; Wiedenheft et al. 2009; mis is conferred by the spatially patterned expression and Lohse et al. 2010), with few new domains described in activity of proneural basic helix–loop–helix (bHLH) acti- metazoans. Nevertheless, the observation that many un- vator proteins in proneural clusters. For example, the characterized sequence motifs emerge from multigenome bHLH factors Achaete and Scute are responsible for ini- alignments and genome-wide data collection efforts tiating the development of the several hundred external (Stark et al. 2007; Lindblad-Toh et al. 2011; Neph et al. mechanosensory organs that decorate the body surface. 2012) indicates that many binding sites remain to be con- Local cell interactions mediated by the Notch signaling nected to their cognate transcription factors. Undoubtedly, pathway subsequently restrict neural potential to indi- many of these correspond to sites for the hundreds of vidual sensory organ precursors (SOPs), which then ex- ecute a stereotyped lineage to generate the cells of a 3These authors contributed equally to this work. mature sensory organ comprised of a neuron and several 4Corresponding author nonneural accessory cells (Lai 2004). Each cell division in E-mail [email protected] Article published online ahead of print. Article and publication date are the neural lineages is asymmetric and driven by Notch online at http://www.genesdev.org/cgi/doi/10.1101/gad.213314.113. signaling such that loss-of-function and gain-of-function 602 GENES & DEVELOPMENT 27:602–614 Ó 2013 by Cold Spring Harbor Laboratory Press ISSN 0890-9369/13; www.genesdev.org Downloaded from genesdev.cshlp.org on September 25, 2021 - Published by Cold Spring Harbor Laboratory Press Function and structure of DNA-binding BEN domains of the Notch pathway can induce reciprocal, symmetric Insv to multiple Notch/Su(H)-regulated targets resident cell fate outcomes. In particular, the neuron is the ulti- in the E(spl)-C, including both bHLH repressor and Bearded mate product of multiple precursor cells that successfully family genes (Duan et al. 2011). Insv mutants exhibit mild avoid activating Notch signaling. sensory organ loss and shaft-to-socket transformations A new component in Drosophila neurogenesis emerged within the sensory lineage, phenotypes that are exacer- from the recovery of Insensitive (Insv) in a molecular bated by elevation of haploinsufficiency of the charac- screen for genes specifically expressed in proneural clus- terized Su(H) corepressor Hairless. ters (Reeves and Posakony 2005). The only recognizable To gain insight into regulatory targets of Insv genome- domain on Insv is a BEN domain. Named after instances wide, we performed Insv ChIP sequencing (ChIP-seq) us- in the BANP, E5R, and NAC1 proteins, this bioinformat- ing 2.5- to 6.5-h and 6.5- to 12-h embryos. These data ically recognized motif is widely distributed among meta- revealed 5364 and 2390 peaks, respectively, relative to zoans and viruses (Abhiman et al. 2008). Some BEN control IgG ChIP-seq (Supplemental Table 1). The spec- proteins contain other motifs linked to chromatin func- ificity of Insv binding was well-illustrated at the E(spl)-C, tions, but Insv is the prototype of the ‘‘BEN-solo’’ family where Insv peaks were concentrated across the upstream that lacks other recognizable protein domains. We re- regions of most of these Notch-regulated genes (Fig. 1A), cently showed that Insv is a nuclear factor that promotes whereas binding of Insv to the flanking hundreds of peripheral neurogenesis and inhibits Notch signaling kilobases was sparse. Despite this, neither high-affinity (Duan et al. 2011). In particular, Insv has activity as a (YGTGRGA) nor low-affinity (RTGRGAR) consensus se- direct corepressor for the Notch pathway transcription quences for Su(H) (Nellesen et al. 1999) were enriched factor Suppressor of Hairless [Su(H)] and localizes in vivo among Insv peaks genome-wide, and the majority of re- to a number of Notch target genes located in the Enhancer gions strongly bound by Insv lacked Su(H)-binding sites of split-Complex [E(spl)-C] (Duan et al. 2011). (Supplemental Fig. 1). In this study, we examined the genomewide occupancy An example of such a region is shown in Figure 1B, of Insv during two stages of Drosophila embryogenesis. where both Insv ChIP-seq data sets reported on strong We confirmed binding of Insv to Notch target genes in the binding to a conserved genomic region located close to E(spl)-C; however, Insv bound thousands of regions that the promoter of the Notch modulator fringe. To under- did not contain consensus Su(H)-binding sites. Instead, stand the basis of chromatin association of Insv in the these regions defined a novel sequence motif that was en- absence of Su(H) sites, we performed de novo motif analysis riched among genes involved in neurogenesis, and genes using Weeder (Pavesi et al. 2001) and MEME (Bailey and with Insv occupied-motifs were broadly up-regulated in Elkan 1994), querying Insv peaks of various classes. We insv mutants. Surprisingly, we found that Insv bound to considered all bound regions, those regions that were ex- this motif directly via the BEN domain, and we went on clusive of Su(H)-binding sites, and regions exclusive of to solve the structure of the BEN domain in complex with highly occupied target (HOT) regions (Roy et al. 2010). its DNA target. Extensive mutational analysis confirmed Analysis by MEME reported a novel semipalindromic the structural details of BEN domain interactions with motif (TCYAATHRGAA) as the most highly enriched the DNA backbone and with critical base-specific con- sequence (Fig. 1C), while Weeder reported the perfect pal- tacts that explain the affinity of Insv for its cognate DNA indromic sequence (CCAATTGG). Indeed, these motifs target. Although BEN domains are only loosely related to were enriched even without having to preclude peaks con- each other, knowledge of the Insv structure allowed us to taining Su(H) sites. Genome-wide, the Insv ChIP peaks recognize other BEN proteins that share its base-specific bearing the TCYAATHRGAA motif were preferentially amino acids. In particular, we show that mammalian located in proximity to transcriptional start sites in both BEND5 is also a sequence-specific transcriptional repres- data sets (Fig. 1D; Supplemental Fig. 2). In addition, this sor that regulates neurogenesis, despite very limited over- motif was preferentially enriched among more strongly all homology with Insv. In summary, our work defines a bound sites in both ChIP-seq data sets, supporting its novel conserved family of metazoan transcription factors biological relevance (Fig. 1E; Supplemental Fig. 1). and provides direction to the study of the other >100 an- We confirmed in vivo binding to these genomic regions notated BEN domain proteins (Abhiman et al. 2008). by endogenous Insv. We selected Insv ChIP-seq peaks from repo, vestigial, mir-263a, fringe, grainyhead, ham- let, tramtrack, Syt1, Dlic2, Chinmo, and fne (found in Results neurons) for verification by ChIP-qPCR. All of these are genes involved in nervous system development, and none Genome-wide analysis reveals a novel DNA motif of their Insv peaks contained even low-affinity Su(H) sites.
Recommended publications
  • Representation for Discovery of Protein Motifs
    From: ISMB-93 Proceedings. Copyright © 1993, AAAI (www.aaai.org). All rights reserved. Representation for Discovery of Protein Motifs Darrell ConklinT, Suzanne Fortier~, Janice Glasgowt Departments~t of Computing and Information Sciencet and Chemistry Queen’s University Kingston, Ontario, Canada K7L 3N6 [email protected] Abstract pattern of amino acid residues. Protein motifs can be roughly classified into four categories. Sequence There are several dimensions and levels of com- motifs are linear strings of residues with an implicit plexity in which information on protein mo- topological ordering. Sequence-structure motifs are tifs may be available. For example, one- sequence motifs with secondary structure identifiers at- dimensional sequence motifs may be associated tached to one or more residues in the motif. Structure with secondary structure identifiers. Alterna- motifs are 3D structural objects, described by posi- tively, three-dimensional information on polypep- tions of residue objects in 3D Euclidian space. Apart tide segments may be used to induce prototypical from a topological linear ordering on the residues, three-dimensional structure templates. This pa- structure motifs are free of sequence information. Fi- per surveys various representations encountered nally, structure-sequence motifs are combined 1D- in the protein motif discovery literature. Many 3D structures that associate sequence information with of the representations are based on incompatible a structure motif. Figure 1 illustrates these four types semantics, making difficult the comparison and of protein motifs, along with somefurther subclassifi- combination of previous results. To make better cations which are elaborated upon later in the paper. use of machine learning techniques and to pro- The first three motif types are discussed by Thornton vide for an integrated knowledge representation and Gardner (1989); the structure-sequence motif will framework, a general representation language -- be presented in this paper.
    [Show full text]
  • Introduction to Sequence Motif Discovery and Search the Complete
    Introduction to sequence motif discovery and search EMBNET course Bioinformatics of transcriptional regulation Jan 28 2008 Christoph Schmid The complete genome era Components of transcriptional regulation Distal transcription-factor binding sites (enhancer) cis-regulatory modules Wasserman 5, 276-287 (2004) DNA-protein interaction Allen et al. © 1998 EMBL Jordan et al. © 1991 Prentice-Hall Sequence motifs sequence variants of site: ..A T C G C A.. ..T T G G A C.. ..T T G G T G.. ..A T C G G T.. matrix: A 2 0 0 0 1 1 (simplest version) C 0 0 2 0 1 1 G 0 0 2 4 1 1 T 2 4 0 0 1 1 + cutoff ! sequence logo: Title Representation of the binding specificity by a scoring matrix (also referred to as weight matrix) 1 2 3 4 5 6 7 8 9 A -10 -10 -14 -12 -10 5 -2 -10 -6 C 5 -10 -13 -13 -7 -15 -13 3 -4 G -3 -14 -13 -11 5 -12 -13 2 -7 T -5 5 5 5 -10 -9 5 -11 5 Strong C T T T G A T C T Binding site 5 + 5 + 5 + 5 + 5 + 5 + 5 + 3 + 5 = 43 Random A C G T A C G T A Sequence -10 -10 -13 + 5 -10 -15 -13 -11 - 6 = -83 Biophysical interpretation of protein binding sites Columns of a weight matrix characterize the specificity of base-pair acceptor sites on the protein surface. Weight matrix elements represent negated energy contributions to the total binding energy → weight matrix score inversely proportional to binding energy Motif search: Statistical over-representation of genomic sequence motifs Conserved motifs represent protein binding sites Putative conservation in nucleotide sequence (motif) position relative to: TSS other binding sites (protein complexes)
    [Show full text]
  • Human Induced Pluripotent Stem Cell–Derived Podocytes Mature Into Vascularized Glomeruli Upon Experimental Transplantation
    BASIC RESEARCH www.jasn.org Human Induced Pluripotent Stem Cell–Derived Podocytes Mature into Vascularized Glomeruli upon Experimental Transplantation † Sazia Sharmin,* Atsuhiro Taguchi,* Yusuke Kaku,* Yasuhiro Yoshimura,* Tomoko Ohmori,* ‡ † ‡ Tetsushi Sakuma, Masashi Mukoyama, Takashi Yamamoto, Hidetake Kurihara,§ and | Ryuichi Nishinakamura* *Department of Kidney Development, Institute of Molecular Embryology and Genetics, and †Department of Nephrology, Faculty of Life Sciences, Kumamoto University, Kumamoto, Japan; ‡Department of Mathematical and Life Sciences, Graduate School of Science, Hiroshima University, Hiroshima, Japan; §Division of Anatomy, Juntendo University School of Medicine, Tokyo, Japan; and |Japan Science and Technology Agency, CREST, Kumamoto, Japan ABSTRACT Glomerular podocytes express proteins, such as nephrin, that constitute the slit diaphragm, thereby contributing to the filtration process in the kidney. Glomerular development has been analyzed mainly in mice, whereas analysis of human kidney development has been minimal because of limited access to embryonic kidneys. We previously reported the induction of three-dimensional primordial glomeruli from human induced pluripotent stem (iPS) cells. Here, using transcription activator–like effector nuclease-mediated homologous recombination, we generated human iPS cell lines that express green fluorescent protein (GFP) in the NPHS1 locus, which encodes nephrin, and we show that GFP expression facilitated accurate visualization of nephrin-positive podocyte formation in
    [Show full text]
  • Sharmin Supple Legend 150706
    Supplemental data Supplementary Figure 1 Generation of NPHS1-GFP iPS cells (A) TALEN activity tested in HEK 293 cells. The targeted region was PCR-amplified and cloned. Deletions in the NPHS1 locus were detected in four clones out of 10 that were sequenced. (B) PCR screening of human iPS cell homologous recombinants (C) Southern blot screening of human iPS cell homologous recombinants Supplementary Figure 2 Human glomeruli generated from NPHS1-GFP iPS cells (A) Morphological changes of GFP-positive glomeruli during differentiation in vitro. A different aggregate from the one shown in Figure 2 is presented. Lower panels: higher magnification of the areas marked by rectangles in the upper panels. Note the shape changes of the glomerulus (arrowheads). Scale bars: 500 µm. (B) Some, but not all, of the Bowman’s capsule cells were positive for nephrin (48E11 antibody: magenta) and GFP (green). Scale bars: 10 µm. Supplementary Figure 3 Histology of human podocytes generated in vitro (A) Transmission electron microscopy of the foot processes. Lower magnification of Figure 4H. Scale bars: 500 nm. (B) (C) The slit diaphragm between the foot processes. Higher magnification of the 1 regions marked by rectangles in panel A. Scale bar: 100 nm. (D) Absence of mesangial or vascular endothelial cells in the induced glomeruli. Anti-PDGFRβ and CD31 antibodies were used to detect the two lineages, respectively, and no positive signals were observed in the glomeruli. Podocytes are positive for WT1. Nuclei are counterstained with Nuclear Fast Red. Scale bars: 20 µm. Supplementary Figure 4 Cluster analysis of gene expression in various human tissues (A) Unbiased cluster analysis across various human tissues using the top 300 genes enriched in GFP-positive podocytes.
    [Show full text]
  • Tools for Motif and Pattern Searching
    TOOLS FOR MOTIF AND PATTERN SEARCHING Prat Thiru OUTLINE • What are motifs? • Algorithms Used and Programs Available • Workflow and Strategies • MEME/MAST Demo (online and command line) Protein Motifs DNA Motifs MEME Output Definitions • Motif: Conserved regions of protein or DNA sequences • Pattern: Qualitative description of a motif eg. regular expression C[AT]AAT[CG]X •Profile: Quantitative description of a motif eg. position weight matrix Patterns • Regular Expression Symbols ¾[ ] – OR eg. [GA] means G or A ¾{ } – NOT eg. {P,V} means not P or V ¾( ) – repeats eg. A(3) means AAA ¾X or N or “.” – any • Complex patterns representation difficult • Loose frequency information eg. [AT] vs 20%A 80%T Profiles Sequence Logos Algorithms • Enumeration • Probabilistic Optimization • Deterministic Optimization 1. Identify motifs 2. Build a consensus Enumeration • Exhaustive search: word counting method, count all n-mers and look for overrepresentation • Less likely to get stuck in a local optimum • Computationally expensive ¾YMF http://wingless.cs.washington.edu/YMF/YMFWeb/YMFInput.pl ¾Weeder http://159.149.109.9/weederaddons/locator.html Probabilistic Optimization • Uses a Gibbs sampling approach • One n-mer from each sequence is randomly picked to determine initial model. In subsequent iterations, one sequence, i, is removed and the model is recalculated. Pick a new location of motif in sequence i iterate until convergence • Assumes most sequences will have the motif ¾ AlignAce http://atlas.med.harvard.edu/cgi-bin/alignace.pl ¾ Gibbs Motif Sampler http://bayesweb.wadsworth.org/gibbs/gibbs.html Deterministic Optimization • Based on expectation maximization (EM) • EM: iteratively estimates the likelihood given the data that is present I.
    [Show full text]
  • Degsampler: Gibbs Sampling Strategy for Predicting E3-Binding Sites with Position-Specific Prior Information
    DegSampler: Gibbs sampling strategy for predicting E3-binding sites with position-specic prior information Osamu Maruyama ( [email protected] ) Kyushu University https://orcid.org/0000-0002-5760-1507 Fumiko Matsuzaki Kyushu University Research article Keywords: motif, substrate protein, E3 ubiquitin ligase, prior, posterior, disorder, collapsed, Gibbs sampling, DegSampler Posted Date: November 6th, 2020 DOI: https://doi.org/10.21203/rs.3.rs-101967/v1 License: This work is licensed under a Creative Commons Attribution 4.0 International License. Read Full License Maruyama and Matsuzaki RESEARCH DegSampler: Gibbs sampling strategy for predicting E3-binding sites with position-specific prior information Osamu Maruyama1* and Fumiko Matsuzaki2 Background Eukaryotic cells have two major pathways for degrad- ing proteins in order to control cellular processes and maintain intracellular homeostasis. One of the path- ways is autophagy, a catabolic process that deliv- ers intracellular components to lysosomes or vacuoles Abstract [1]. The other pathway is the ubiquitin-proteasome Background: The ubiquitin-proteasome system is system, which degrades polyubiquitin-tagged pro- a pathway in eukaryotic cells for degrading teins through the proteasomal machinery [2]. In the polyubiquitin-tagged proteins through the ubiquitin-proteasome system, an E3 ubiquitin ligase proteasomal machinery to control various cellular (hereinafter E3) selectively recognizes and binds to processes and maintain intracellular homeostasis. specific regions of the target substrate proteins. These In this system, the E3 ubiquitin ligase (hereinafter binding sites are called degrons [3]. E3) plays an important role in selectively It is important to identify the E3s and substrate recognizing and binding to specific regions of its proteins that interact with one another to character- substrate proteins.
    [Show full text]
  • Bioinformatics: a Practical Guide to the Analysis of Genes and Proteins, Second Edition Andreas D
    BIOINFORMATICS A Practical Guide to the Analysis of Genes and Proteins SECOND EDITION Andreas D. Baxevanis Genome Technology Branch National Human Genome Research Institute National Institutes of Health Bethesda, Maryland USA B. F. Francis Ouellette Centre for Molecular Medicine and Therapeutics Children’s and Women’s Health Centre of British Columbia University of British Columbia Vancouver, British Columbia Canada A JOHN WILEY & SONS, INC., PUBLICATION New York • Chichester • Weinheim • Brisbane • Singapore • Toronto BIOINFORMATICS SECOND EDITION METHODS OF BIOCHEMICAL ANALYSIS Volume 43 BIOINFORMATICS A Practical Guide to the Analysis of Genes and Proteins SECOND EDITION Andreas D. Baxevanis Genome Technology Branch National Human Genome Research Institute National Institutes of Health Bethesda, Maryland USA B. F. Francis Ouellette Centre for Molecular Medicine and Therapeutics Children’s and Women’s Health Centre of British Columbia University of British Columbia Vancouver, British Columbia Canada A JOHN WILEY & SONS, INC., PUBLICATION New York • Chichester • Weinheim • Brisbane • Singapore • Toronto Designations used by companies to distinguish their products are often claimed as trademarks. In all instances where John Wiley & Sons, Inc., is aware of a claim, the product names appear in initial capital or ALL CAPITAL LETTERS. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration. Copyright ᭧ 2001 by John Wiley & Sons, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic or mechanical, including uploading, downloading, printing, decompiling, recording or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher.
    [Show full text]
  • Sequence Conservation
    2/14/17 Sequence conservation Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, FeBruary 16th 2017 After this lecture, you can… … list the mechanisms of DNA mutation … sort different biological information levels By conservation … discuss the multiple roles of protein domains and draw the parallel with computer scripting … use conservation to identify functional importance … make and interpret consensus sequences … make and interpret sequence logos & information content … recognize and explain why TFBS are often palindromic … explain the functional importance of unexpected variation 1 2/14/17 Evolution • Replication (copy) • Mutation (modify) • Selection (fitness) Time Mutations • Nucleotide suBstitutions – Replication error – Physical or chemical reaction • Insertions or deletions (indels) – Unequal crossing over during meiosis – Replication slippage • Inversions or rearrangements • Duplications of: – Partial or whole gene – Partial (polysomy) or whole chromosome (aneuploidy, polysomy) – Whole genome (polyploidy) • Horizontal gene transfer (HGT) – Conjugation (direct transfer between Bacteria) – Transformation by naturally competent Bacteria – Transduction by bacteriophages 2 2/14/17 Phenotypic/genotypic similarity • We exploit similarity to… …identify homology (shared ancestry) …determine evolutionary relationships …transfer functional information • Sequences (genotype) rarely converge • Functions (phenotype) can converge • Analogy • Homology – Similar function – Similar ancestry Low-complexity regions •
    [Show full text]
  • Evolutionary Analysis of Viral Sequences in Eukaryotic Genomes
    Evolutionary analysis of viral sequences in eukaryotic genomes Sean Schneider A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy University of Washington 2014 Reading Committee: James H. Thomas, Chair Willie Swanson Phil Green Program Authorized to Offer Degree: Genome Sciences ©Copyright 2014 Sean Schneider University of Washington Abstract Evolutionary analysis of viral sequences in eukaryotic genomes Sean Schneider Chair of the supervisory committee: Professor James H. Thomas Genome Sciences The focus of this work is several evolutionary analyses of endogenous viral sequences in eukaryotic genomes. Endogenous viral sequences can provide key insights into the past forms and evolutionary history of viruses, as well as the responses of host organisms they infect. In this work I have examined viral sequences in a diverse assortment of eukaryotic hosts in order to study coevolution between hosts and the organisms that infect them. This research consisted of two major lines of investigation. In the first portion of this work, I outline the hypothesis that the C2H2 zinc finger gene family in vertebrates has evolved by birth-death evolution in response to sporadic retroviral infection. The hypothesis suggests an evolutionary model in which newly duplicated zinc finger genes are retained by selection in response to retroviral infection. This hypothesis is supported by a strong association (R2=0.67) between the number of endogenous retroviruses and the number of zinc fingers in diverse vertebrate genomes. Based on this and other evidence, the zinc finger gene family appears to act as a “genomic immune system” against retroviral infections. The other major line of investigation in this work examines endogenous virus sequences utilized by parasitic wasps to disable hosts that they infect.
    [Show full text]
  • Alignment, Clustering and Extraction of Structured Motifs in DNA Promoter Sequences
    Syracuse University SURFACE Electrical Engineering and Computer Science - Dissertations College of Engineering and Computer Science 8-2012 Alignment, Clustering and Extraction of Structured Motifs in DNA Promoter Sequences Faisal Abdulmalek Alobaid Syracuse University Follow this and additional works at: https://surface.syr.edu/eecs_etd Part of the Computer Engineering Commons Recommended Citation Alobaid, Faisal Abdulmalek, "Alignment, Clustering and Extraction of Structured Motifs in DNA Promoter Sequences" (2012). Electrical Engineering and Computer Science - Dissertations. 322. https://surface.syr.edu/eecs_etd/322 This Dissertation is brought to you for free and open access by the College of Engineering and Computer Science at SURFACE. It has been accepted for inclusion in Electrical Engineering and Computer Science - Dissertations by an authorized administrator of SURFACE. For more information, please contact [email protected]. ABSTRACT A simple motif is a short DNA sequence found in the promoter region and believed to act as a binding site for a transcription factor protein. A structured motif is a sequence of simple motifs (boxes) separated by short sequences (gaps). Biologists theorize that the presence of these motifs play a key role in gene expression regulation. Discovering these patterns is an important step towards understanding protein-gene and gene-gene interaction thus facilitates the building of accurate gene regulatory network models. DNA sequence motif extraction is an important problem in bioinformatics. Many studies have proposed algorithms to solve the problem instance of simple motif extraction. Only in the past decade has the more complex structured motif extraction problem been examined by researchers. The problem is inherently challenging as structured motif patterns are segmented into several boxes separated by variable size gaps for each instance.
    [Show full text]
  • ER-Resident Transcription Factor Nrf1 Regulates Proteasome Expression and Beyond
    International Journal of Molecular Sciences Review ER-Resident Transcription Factor Nrf1 Regulates Proteasome Expression and Beyond Jun Hamazaki * and Shigeo Murata Laboratory of Protein Metabolism, Graduate School of Pharmaceutical Sciences, the University of Tokyo, Tokyo 1130033, Japan; [email protected] * Correspondence: [email protected] Received: 4 May 2020; Accepted: 20 May 2020; Published: 23 May 2020 Abstract: Protein folding is a substantively error prone process, especially when it occurs in the endoplasmic reticulum (ER). The highly exquisite machinery in the ER controls secretory protein folding, recognizes aberrant folding states, and retrotranslocates permanently misfolded proteins from the ER back to the cytosol; these misfolded proteins are then degraded by the ubiquitin–proteasome system termed as the ER-associated degradation (ERAD). The 26S proteasome is a multisubunit protease complex that recognizes and degrades ubiquitinated proteins in an ATP-dependent manner. The complex structure of the 26S proteasome requires exquisite regulation at the transcription, translation, and molecular assembly levels. Nuclear factor erythroid-derived 2-related factor 1 (Nrf1; NFE2L1), an ER-resident transcription factor, has recently been shown to be responsible for the coordinated expression of all the proteasome subunit genes upon proteasome impairment in mammalian cells. In this review, we summarize the current knowledge regarding the transcriptional regulation of the proteasome, as well as recent findings concerning the regulation of Nrf1 transcription activity in ER homeostasis and metabolic processes. Keywords: proteasome; Nrf1; DDI2; bounce back; proteostasis 1. Introduction Proteins are essential to the vast majority of cellular and organismal functions. Cellular protein levels are defined by the balance among protein synthesis, folding, and degradation, and the failure to accurately coordinate these processes leads to disease [1].
    [Show full text]
  • Lecture 7: Sequence Motif Discovery
    Sequence motif: definitions COSC 348: Computing for Bioinformatics • In Bioinformatics, a sequence motif is a nucleotide or amino-acid sequence pattern that is widespread and has Lecture 7: been proven or assumed to have a biological significance. Sequence Motif Discovery • Once we know the sequence pattern of the motif, then we can use the search methods to find it in the sequences (i.e. Lubica Benuskova Boyer-Moore algorithm, Rabin-Karp, suffix trees, etc.) • The problem is to discover the motifs, i.e. what is the order of letters the particular motif is comprised of. http://www.cs.otago.ac.nz/cosc348/ 1 2 Examples of motifs in DNA Sequence motif: notations • The TATA promoter sequence is an example of a highly • An example of a motif in a protein: N, followed by anything but P, conserved DNA sequence motif found in eukaryotes. followed by either S or T, followed by anything but P − One convention is to write N{P}[ST]{P} where {X} means • Another example of motifs: binding sites for transcription any amino acid except X; and [XYZ] means either X or Y or Z. factors (TF) near promoter regions of genes, etc. • Another notation: each ‘.’ signifies any single AA, and each ‘*’ Gene 1 indicates one member of a closely-related AA family: Gene 2 − WDIND*.*P..*...D.F.*W***.**.IYS**...A.*H*S*WAMRN Gene 3 Gene 4 • In the 1st assignment we have motifs like A??CG, where the Gene 5 wildcard ? Stands for any of A,U,C,G. Binding sites for TF 3 4 Sequence motif discovery from conservation Motif discovery based on alignment • profile analysis is another word for this.
    [Show full text]