Annotation of Cis-Regulatory Elements by Identification, Subclassification, and Functional Assessment of Multispecies Conserved Sequences

Total Page:16

File Type:pdf, Size:1020Kb

Annotation of Cis-Regulatory Elements by Identification, Subclassification, and Functional Assessment of Multispecies Conserved Sequences Annotation of cis-regulatory elements by identification, subclassification, and functional assessment of multispecies conserved sequences Jim R. Hughes*, Jan-Fang Cheng†, Nicki Ventress*, Shyam Prabhakar†, Kevin Clark*, Eduardo Anguita*, Marco De Gobbi*, Pieter de Jong‡, Eddy Rubin†, and Douglas R. Higgs*§ *Medical Research Council Molecular Haematology Unit, Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, Headington, Oxford OX3 9DS, United Kingdom; †Department of Energy Joint Genome Institute, Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720; and ‡Children’s Hospital and Research Center at Oakland, Oakland, CA 94609 Communicated by David Weatherall, University of Oxford, Oxford, United Kingdom, April 28, 2005 (received for review November 5, 2004) An important step toward improving the annotation of the human and͞or unknown because there is usually very limited experimental genome is to identify cis-acting regulatory elements from primary data to link genome structure to function at the locus in question. DNA sequence. One approach is to compare sequences from Therefore, to establish accurate and sensitive routines for evaluat- multiple, divergent species. This approach distinguishes multispe- ing cis-regulatory sequences, it will be necessary to analyze well cies conserved sequences (MCS) in noncoding regions from more characterized regions of the human genome so that such sequences rapidly evolving neutral DNA. Here, we have analyzed a region of can be recognized and their potential importance can be assessed Ϸ238kb containing the human ␣ globin cluster that was sequenced and prioritized for functional studies. ͞ and͞or annotated across the syntenic region in 22 species spanning Here, we have sequenced and or annotated and compared extensive regions (47–238 kb) of conserved synteny containing the 500 million years of evolution. Using a variety of bioinformatic ␣ approaches and correlating the results with many aspects of -globin cluster, its known regulatory elements and flanking genes chromosome structure and function in this region, we were able to in 22 species representing widely divergent groups, spanning 500 my of evolution. Multiple alignments were analyzed with a variety of identify and evaluate the importance of 24 individual MCSs. This bioinformatic approaches, and the results were compared with approach sensitively and accurately identified previously charac- algorithms for the detection of MCSs described in ref. 9 and a newly terized regulatory elements but also discovered unidentified pro- developed analysis tool, GUMBY (S. Prabhakar, unpublished data). moters, exons, splicing, and transcriptional regulatory elements. The approach taken here allowed us to identify 24 MCSs within a Together, these studies demonstrate an integrated approach by 238-kb region containing and surrounding the human ␣-globin which to identify, subclassify, and predict the potential importance cluster. Previous studies of this locus have localized all DNase1 of MCSs. hypersensitive sites (DHSs; refs. 10 and 11), characterized its nuclear localization (12), nuclear matrix attachment (13), pattern of conserved noncoding elements ͉ gene regulation ͉ globin gene ͉ replication (14), chromatin structure (10), chromatin modification comparative genomics (15, 16) DNA methylation (17), and pattern of gene expression in normal loci and those with natural deletions (18), allowing us to ith the development of multiple computational algorithms evaluate in full the functional significance of the MCS elements Wand appropriate biological data, it is now possible to detect identified here. coding sequences relatively efficiently and accurately from primary Materials and Methods DNA sequence (1). By contrast, it is still difficult to identify and Clone Isolation and Sequencing. Bacterial artificial chromosomes assign function to noncoding sequences that include critical cis- (BACs) used in this study were isolated and separated by using acting regulatory elements such as promoters, enhancers, silencers, standard techniques (19); all details concerning multiple align- locus control elements, nuclear matrix attachment sites, and origins ments of the discussed elements are available as additional material of replication. A promising approach is to search for conserved from the authors on request. All BACs used in this study are orthologous noncoding sequences from multiple, evolutionarily available from the BACPAC Resources Centre (http:͞͞bacpac. diverse species (2–8), so-called multispecies conserved sequences chori.org). The sequences of the chimpanzee, cow, and cat were (MCSs). However, at present, it is not established how best to design obtained from the ENCODE project (http:͞͞genome.gov͞ and carry out such searches. Furthermore, even when complete, it ENCODE), region ENm008. is not clear how to evaluate and prioritize MCSs for functional analysis. Sequence Assembly and Finishing. Sequence reads were base called An important issue is to know which species to include in by using PHRED 0.020425 and assembled by using PHRAP (www.phrap. searches for MCSs. When comparing the sequences of distantly org) into a GAP4 database. Standard finishing methods were used. related species [e.g., birds and mammals, which diverged 310 million years (my) ago], cis elements may have diverged too much to be easily identified. By contrast, when comparing more closely related Abbreviations: ␣MRE, human ␣-globin major regulatory element; DHS, DNase1 hypersen- sitive site; MCS, multispecies conserved sequences; MCS-E, MCS-previously unidentified species (e.g., rodents and primates, which diverged 64–74 my ago), exon; MCS-P, MCS promoters; MCS-R, MCS-other transcriptional regulatory element; it may be difficult to distinguish between sequence homology MCS-S, MCS splicing element; my, million years; TFB, transcription factor binding site. resulting from a slow rate of evolution and the conservation of Data deposition: The sequences reported in this paper have been deposited in the GenBank functionally important cis elements. Nevertheless, it is becoming database [(species, clone names, and accession nos.): baboon, RP41-98L24, AC145461; clear that analyzing multiple species in different combinations adds colobus, CH272–179O16, AC148220; dusky titi, LB5-13F19, AC145465; marmoset, CH259- significant power to the identification of functionally important 50K11, AC146591, and CH259-177O2, AC145483; owl monkey, CH258-305M22, AC146782; squirrel monkey, CH254-200G16, AC146463, and CH254-167L1, AC151883; lemur, LB2- cis-acting sequences (2, 3). 99K6, AC145463; pig, RP44-210L18, AC120214, and RP44-293L23, AC120215; dog, RP81- Even when MCS can be unequivocally identified, there is often 84F12, AC120212, and RP81-397M1, AC120213; hedgehog, LB4-358K2, AC150435; opos- no way to judge how sensitive the searches have been. Do these sum, LB3-33C20, AC120504, and LB3-206D4, AC139599, and LB3-171C2, AC148752]. routines overcall or undercall regulatory elements? Furthermore, §To whom correspondence should be addressed. E-mail: [email protected]. the true functional significance of MCSs often remains untested © 2005 by The National Academy of Sciences of the USA 9830–9835 ͉ PNAS ͉ July 12, 2005 ͉ vol. 102 ͉ no. 28 www.pnas.org͞cgi͞doi͞10.1073͞pnas.0503401102 Downloaded by guest on September 29, 2021 Sequence Annotation and Multiple Alignment. Sequences were an- notated against EST data when available for that species (http:͞͞ genome.ucsc.edu and www.ensembl.org) or by using cross-species comparison of available ESTs (MACVECTOR 7.2, Accelrys, Inc., San Diego). The presence of nonsyntenic features were evaluated by using a modification of HPREP and stored in AceDB (www.acedb. org). Repeat annotation was generated by using REPEATMASKER (www.repeatmasker.org). Multiple alignments were generated and͞or visualized by using VISTA (www-gsd.lbl.gov͞vista͞index. shtml); MULTIPIPMAKER (http:͞͞pipmaker.bx.psu.edu͞pipmaker), and MULTILAGAN (http:͞͞lagan.stanford.edu͞lagan࿝web͞index. shtml). Multiple alignments were analyzed as described in Results and were also compared with data obtained by using WEBMCS (http:͞͞research.nhgri.nih.gov͞MCS͞submit.shtml). In addition, MULTILAGAN alignments were analyzed for noncoding conserva- tion by using the algorithm GUMBY, a tool for detecting statistically significant conserved regions in pairwise or multiple alignments of DNA sequences at any alignable evolutionary distance. Individual MCSs were aligned by using CLUSTALW (MACVECTOR 7.2, Accelrys); BESTFIT (WISCONSIN PACKAGE 10.3, GCG) and DIALIGN (www. genomatix.de͞cgi-bin͞dialign͞dialign.pl) and manually optimized. Sequence Analysis. Potential transcription factor binding sites (TFBs) were identified by using TRANSFAC 6.0 (www.gene- regulation.com); TRES (http:͞͞bioportal.bic.nus.edu.sg͞tres), and REGULATORY VISTA (http:͞͞gsd.lbl.gov͞vista͞rvista͞submit. shtml). Plots of Observed͞Expected for the CpG dinucleotide were generated by using WINDOW and STATPLOT 10.3 (GCG) comparing the local frequency to the global expected value over a 1,000-bp window. Results A Region of Conserved Synteny Maintained Throughout 500 my of Evolution. ␣ The structure of the -globin cluster was established by Fig. 1. The arrangement (to scale) of globin genes (red boxes) and nonglobin EVOLUTION DNA sequence analysis (6- to 8-fold coverage) in 22 divergent genes (colored boxes) flanking the ␣-globin cluster. The extent of each black species representing well defined time points throughout
Recommended publications
  • Análise Integrativa De Perfis Transcricionais De Pacientes Com
    UNIVERSIDADE DE SÃO PAULO FACULDADE DE MEDICINA DE RIBEIRÃO PRETO PROGRAMA DE PÓS-GRADUAÇÃO EM GENÉTICA ADRIANE FEIJÓ EVANGELISTA Análise integrativa de perfis transcricionais de pacientes com diabetes mellitus tipo 1, tipo 2 e gestacional, comparando-os com manifestações demográficas, clínicas, laboratoriais, fisiopatológicas e terapêuticas Ribeirão Preto – 2012 ADRIANE FEIJÓ EVANGELISTA Análise integrativa de perfis transcricionais de pacientes com diabetes mellitus tipo 1, tipo 2 e gestacional, comparando-os com manifestações demográficas, clínicas, laboratoriais, fisiopatológicas e terapêuticas Tese apresentada à Faculdade de Medicina de Ribeirão Preto da Universidade de São Paulo para obtenção do título de Doutor em Ciências. Área de Concentração: Genética Orientador: Prof. Dr. Eduardo Antonio Donadi Co-orientador: Prof. Dr. Geraldo A. S. Passos Ribeirão Preto – 2012 AUTORIZO A REPRODUÇÃO E DIVULGAÇÃO TOTAL OU PARCIAL DESTE TRABALHO, POR QUALQUER MEIO CONVENCIONAL OU ELETRÔNICO, PARA FINS DE ESTUDO E PESQUISA, DESDE QUE CITADA A FONTE. FICHA CATALOGRÁFICA Evangelista, Adriane Feijó Análise integrativa de perfis transcricionais de pacientes com diabetes mellitus tipo 1, tipo 2 e gestacional, comparando-os com manifestações demográficas, clínicas, laboratoriais, fisiopatológicas e terapêuticas. Ribeirão Preto, 2012 192p. Tese de Doutorado apresentada à Faculdade de Medicina de Ribeirão Preto da Universidade de São Paulo. Área de Concentração: Genética. Orientador: Donadi, Eduardo Antonio Co-orientador: Passos, Geraldo A. 1. Expressão gênica – microarrays 2. Análise bioinformática por module maps 3. Diabetes mellitus tipo 1 4. Diabetes mellitus tipo 2 5. Diabetes mellitus gestacional FOLHA DE APROVAÇÃO ADRIANE FEIJÓ EVANGELISTA Análise integrativa de perfis transcricionais de pacientes com diabetes mellitus tipo 1, tipo 2 e gestacional, comparando-os com manifestações demográficas, clínicas, laboratoriais, fisiopatológicas e terapêuticas.
    [Show full text]
  • Imputation of Exome Sequence Variants Into Population- Based Samples and Blood-Cell-Trait-Associated Loci in African Americans: NHLBI GO Exome Sequencing Project
    ARTICLE Imputation of Exome Sequence Variants into Population- Based Samples and Blood-Cell-Trait-Associated Loci in African Americans: NHLBI GO Exome Sequencing Project Paul L. Auer,1,17 Jill M. Johnsen,2,3,17 Andrew D. Johnson,4,17 Benjamin A. Logsdon,1,17 Leslie A. Lange,5,17 Michael A. Nalls,6 Guosheng Zhang,5 Nora Franceschini,5 Keolu Fox,3 Ethan M. Lange,5 Stephen S. Rich,7 Christopher J. O’Donnell,4 Rebecca D. Jackson,8 Robert B. Wallace,9 Zhao Chen,10 Timothy A. Graubert,11 James G. Wilson,12 Hua Tang,13,17 Guillaume Lettre,14,17 Alex P. Reiner,1,6,17,* Santhi K. Ganesh,15,17 and Yun Li5,16,17,* Researchers have successfully applied exome sequencing to discover causal variants in selected individuals with familial, highly pene- trant disorders. We demonstrate the utility of exome sequencing followed by imputation for discovering low-frequency variants asso- ciated with complex quantitative traits. We performed exome sequencing in a reference panel of 761 African Americans and then imputed newly discovered variants into a larger sample of more than 13,000 African Americans for association testing with the blood cell traits hemoglobin, hematocrit, white blood count, and platelet count. First, we illustrate the feasibility of our approach by demon- strating genome-wide-significant associations for variants that are not covered by conventional genotyping arrays; for example, one such association is that between higher platelet count and an MPL c.117G>T (p.Lys39Asn) variant encoding a p.Lys39Asn amino À acid substitution of the thrombpoietin receptor gene (p ¼ 1.5 3 10 11).
    [Show full text]
  • Functional Analysis of Structural Variation in the 2D and 3D Human Genome
    FUNCTIONAL ANALYSIS OF STRUCTURAL VARIATION IN THE 2D AND 3D HUMAN GENOME by Conor Mitchell Liam Nodzak A dissertation submitted to the faculty of The University of North Carolina at Charlotte in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Bioinformatics and Computational Biology Charlotte 2019 Approved by: Dr. Xinghua Mindy Shi Dr. Rebekah Rogers Dr. Jun-tao Guo Dr. Adam Reitzel ii c 2019 Conor Mitchell Liam Nodzak ALL RIGHTS RESERVED iii ABSTRACT CONOR MITCHELL LIAM NODZAK. Functional analysis of structural variation in the 2D and 3D human genome. (Under the direction of DR. XINGHUA MINDY SHI) The human genome consists of over 3 billion nucleotides that have an average distance of 3.4 Angstroms between each base, which equates to over two meters of DNA contained within the 125 µm3 volume diploid cell nuclei. The dense compaction of chromatin by the supercoiling of DNA forms distinct architectural modules called topologically associated domains (TADs), which keep protein-coding genes, noncoding RNAs and epigenetic regulatory elements in close nuclear space. It has recently been shown that these conserved chromatin structures may contribute to tissue-specific gene expression through the encapsulation of genes and cis-regulatory elements, and mutations that affect TADs can lead to developmental disorders and some forms of cancer. At the population-level, genomic structural variation contributes more to cumulative genetic difference than any other class of mutation, yet much remains to be studied as to how structural variation affects TADs. Here, we study the func- tional effects of structural variants (SVs) through the analysis of chromatin topology and gene activity for three trio families sampled from genetically diverse popula- tions from the Human Genome Structural Variation Consortium.
    [Show full text]
  • A Catalogue of Stress Granules' Components
    Catarina Rodrigues Nunes A Catalogue of Stress Granules’ Components: Implications for Neurodegeneration UNIVERSIDADE DO ALGARVE Departamento de Ciências Biomédicas e Medicina 2019 Catarina Rodrigues Nunes A Catalogue of Stress Granules’ Components: Implications for Neurodegeneration Master in Oncobiology – Molecular Mechanisms of Cancer This work was done under the supervision of: Clévio Nóbrega, Ph.D UNIVERSIDADE DO ALGARVE Departamento de Ciências Biomédicas e Medicina 2019 i ii A catalogue of Stress Granules’ Components: Implications for neurodegeneration Declaração de autoria de trabalho Declaro ser a autora deste trabalho, que é original e inédito. Autores e trabalhos consultados estão devidamente citados no texto e constam na listagem de referências incluída. I declare that I am the author of this work, that is original and unpublished. Authors and works consulted are properly cited in the text and included in the list of references. _______________________________ (Catarina Nunes) iii Copyright © 2019 Catarina Nunes A Universidade do Algarve reserva para si o direito, em conformidade com o disposto no Código do Direito de Autor e dos Direitos Conexos, de arquivar, reproduzir e publicar a obra, independentemente do meio utilizado, bem como de a divulgar através de repositórios científicos e de admitir a sua cópia e distribuição para fins meramente educacionais ou de investigação e não comerciais, conquanto seja dado o devido crédito ao autor e editor respetivos. iv Part of the results of this thesis were published in Nunes,C.; Mestre,I.; Marcelo,A. et al. MSGP: the first database of the protein components of the mammalian stress granules. Database (2019) Vol. 2019. (In annex A). v vi ACKNOWLEDGEMENTS A realização desta tese marca o final de uma etapa académica muito especial e que jamais irei esquecer.
    [Show full text]
  • Mrna Editing, Processing and Quality Control in Caenorhabditis Elegans
    | WORMBOOK mRNA Editing, Processing and Quality Control in Caenorhabditis elegans Joshua A. Arribere,*,1 Hidehito Kuroyanagi,†,1 and Heather A. Hundley‡,1 *Department of MCD Biology, UC Santa Cruz, California 95064, †Laboratory of Gene Expression, Medical Research Institute, Tokyo Medical and Dental University, Tokyo 113-8510, Japan, and ‡Medical Sciences Program, Indiana University School of Medicine-Bloomington, Indiana 47405 ABSTRACT While DNA serves as the blueprint of life, the distinct functions of each cell are determined by the dynamic expression of genes from the static genome. The amount and specific sequences of RNAs expressed in a given cell involves a number of regulated processes including RNA synthesis (transcription), processing, splicing, modification, polyadenylation, stability, translation, and degradation. As errors during mRNA production can create gene products that are deleterious to the organism, quality control mechanisms exist to survey and remove errors in mRNA expression and processing. Here, we will provide an overview of mRNA processing and quality control mechanisms that occur in Caenorhabditis elegans, with a focus on those that occur on protein-coding genes after transcription initiation. In addition, we will describe the genetic and technical approaches that have allowed studies in C. elegans to reveal important mechanistic insight into these processes. KEYWORDS Caenorhabditis elegans; splicing; RNA editing; RNA modification; polyadenylation; quality control; WormBook TABLE OF CONTENTS Abstract 531 RNA Editing and Modification 533 Adenosine-to-inosine RNA editing 533 The C. elegans A-to-I editing machinery 534 RNA editing in space and time 535 ADARs regulate the levels and fates of endogenous dsRNA 537 Are other modifications present in C.
    [Show full text]
  • ATR16 Syndrome: Mechanisms Linking Monosomy to Phenotype 2 3 4 Christian Babbs,*1 Jill Brown,1 Sharon W
    bioRxiv preprint doi: https://doi.org/10.1101/768895; this version posted October 7, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 1 ATR16 Syndrome: Mechanisms Linking Monosomy to Phenotype 2 3 4 Christian Babbs,*1 Jill Brown,1 Sharon W. Horsley,1 Joanne Slater,1 Evie Maifoshie,1 5 Shiwangini Kumar,2 Paul Ooijevaar,3 Marjolein Kriek,3 Amanda Dixon-McIver,2 6 Cornelis L. Harteveld,3 Joanne Traeger-Synodinos,4 Douglas Higgs1 and Veronica 7 Buckle*1 8 9 10 11 12 1. MRC Molecular Haematology Unit, MRC Weatherall Institute of Molecular 13 Medicine, University of Oxford, Oxford, UK. 14 15 2. IGENZ Ltd, Auckland, New Zealand. 16 17 3. Department of Clinical Genetics, Leiden University Medical Center, Leiden, The 18 Netherlands. 19 20 4. Department of Medical Genetics, National and Kapodistrian University of Athens, 21 Greece. 22 23 24 25 *Authors for correspondence 26 [email protected] 27 [email protected] 28 29 1 bioRxiv preprint doi: https://doi.org/10.1101/768895; this version posted October 7, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 1 Abstract 2 3 Background: Sporadic deletions removing 100s-1000s kb of DNA, and 4 variable numbers of poorly characterised genes, are often found in patients 5 with a wide range of developmental abnormalities. In such cases, 6 understanding the contribution of the deletion to an individual’s clinical 7 phenotype is challenging.
    [Show full text]
  • Meta-Analysis of Gene Expression in Individuals with Autism Spectrum Disorders
    Meta-analysis of Gene Expression in Individuals with Autism Spectrum Disorders by Carolyn Lin Wei Ch’ng BSc., University of Michigan Ann Arbor, 2011 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Bioinformatics) The University of British Columbia (Vancouver) August 2013 c Carolyn Lin Wei Ch’ng, 2013 Abstract Autism spectrum disorders (ASD) are clinically heterogeneous and biologically complex. State of the art genetics research has unveiled a large number of variants linked to ASD. But in general it remains unclear, what biological factors lead to changes in the brains of autistic individuals. We build on the premise that these heterogeneous genetic or genomic aberra- tions will converge towards a common impact downstream, which might be reflected in the transcriptomes of individuals with ASD. Similarly, a considerable number of transcriptome analyses have been performed in attempts to address this question, but their findings lack a clear consensus. As a result, each of these individual studies has not led to any significant advance in understanding the autistic phenotype as a whole. The goal of this research is to comprehensively re-evaluate these expression profiling studies by conducting a systematic meta-analysis. Here, we report a meta-analysis of over 1000 microarrays across twelve independent studies on expression changes in ASD compared to unaffected individuals, in blood and brain. We identified a number of genes that are consistently differentially expressed across studies of the brain, suggestive of effects on mitochondrial function. In blood, consistent changes were more difficult to identify, despite individual studies tending to exhibit larger effects than the brain studies.
    [Show full text]
  • Mechanisms of Antisense Transcription Initiation with Implications in Gene Expression, Genomic Integrity and Disease Pathogenesis
    non-coding RNA Review Mechanisms of Antisense Transcription Initiation with Implications in Gene Expression, Genomic Integrity and Disease Pathogenesis Priyanka Barman, Divya Reddy and Sukesh R. Bhaumik * Department of Biochemistry and Molecular Biology, Southern Illinois University School of Medicine, Carbondale, IL 62901, USA; [email protected] (P.B.); [email protected] (D.R.) * Correspondence: [email protected]; Tel.: 618-453-6479; Fax: 618-453-6400 Received: 15 October 2018; Accepted: 15 January 2019; Published: 21 January 2019 Abstract: Non-coding antisense transcripts arise from the strand opposite the sense strand. Over 70% of the human genome generates non-coding antisense transcripts while less than 2% of the genome codes for proteins. Antisense transcripts and/or the act of antisense transcription regulate gene expression and genome integrity by interfering with sense transcription and modulating histone modifications or DNA methylation. Hence, they have significant pathological and physiological relevance. Indeed, antisense transcripts were found to be associated with various diseases including cancer, diabetes, cardiac and neurodegenerative disorders, and, thus, have promising potentials for prognostic and diagnostic markers and therapeutic development. However, it is not clearly understood how antisense transcription is initiated and epigenetically regulated. Such knowledge would provide new insights into the regulation of antisense transcription, and hence disease pathogenesis with therapeutic development. The recent studies on antisense transcription initiation and its epigenetic regulation, which are limited, are discussed here. Furthermore, we concisely describe how antisense transcription/transcripts regulate gene expression and genome integrity with implications in disease pathogenesis and therapeutic development. Keywords: antisense transcription; long non-coding RNA; chromatin modification; GAL10; NuA4; TFIID; SAGA; RNA polymerase II 1.
    [Show full text]
  • Close Sequence Comparisons Are Sufficient to Identify Human Cis-Regulatory Elements
    Downloaded from genome.cshlp.org on September 27, 2021 - Published by Cold Spring Harbor Laboratory Press Letter Close sequence comparisons are sufficient to identify human cis-regulatory elements Shyam Prabhakar,1,2,4 Francis Poulin,1,3 Malak Shoukry,1 Veena Afzal,1 Edward M. Rubin,1,2 Olivier Couronne,1,2 and Len A. Pennacchio1,2,4 1Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA; 2U.S. Department of Energy Joint Genome Institute, Walnut Creek, California 94598, USA Cross-species DNA sequence comparison is the primary method used to identify functional noncoding elements in human and other large genomes. However, little is known about the relative merits of evolutionarily close and distant sequence comparisons. To address this problem, we identified evolutionarily conserved noncoding regions in primate, mammalian, and more distant comparisons using a uniform approach (Gumby) that facilitates unbiased assessment of the impact of evolutionary distance on predictive power. We benchmarked computational predictions against previously identified cis-regulatory elements at diverse genomic loci and also tested numerous extremely conserved human–rodent sequences for transcriptional enhancer activity using an in vivo enhancer assay in transgenic mice. Human regulatory elements were identified with acceptable sensitivity (53%–80%) and true-positive rate (27%–67%) by comparison with one to five other eutherian mammals or six other simian primates. More distant comparisons (marsupial, avian, amphibian, and fish) failed to identify many of the empirically defined functional noncoding elements. Our results highlight the practical utility of close sequence comparisons, and the loss of sensitivity entailed by more distant comparisons. We derived an intuitive relationship between ancient and recent noncoding sequence conservation from whole-genome comparative analysis that explains most of the observations from empirical benchmarking.
    [Show full text]
  • Science Journals
    SCIENCE ADVANCES | RESEARCH ARTICLE GENETICS Copyright © 2021 The Authors, some rights reserved; An ancestral recombination graph of human, exclusive licensee American Association Neanderthal, and Denisovan genomes for the Advancement Nathan K. Schaefer1,2,3†, Beth Shapiro1,2,3, Richard E. Green3,4* of Science. No claim to original U.S. Government Works. Distributed Many humans carry genes from Neanderthals, a legacy of past admixture. Existing methods detect this archaic under a Creative hominin ancestry within human genomes using patterns of linkage disequilibrium or direct comparison to Nean- Commons Attribution derthal genomes. Each of these methods is limited in sensitivity and scalability. We describe a new ancestral re- NonCommercial combination graph inference algorithm that scales to large genome-wide datasets and demonstrate its accuracy License 4.0 (CC BY-NC). on real and simulated data. We then generate a genome-wide ancestral recombination graph including human and archaic hominin genomes. From this, we generate a map within human genomes of archaic ancestry and of genomic regions not shared with archaic hominins either by admixture or incomplete lineage sorting. We find that only 1.5 to 7% of the modern human genome is uniquely human. We also find evidence of multiple bursts of adaptive changes specific to modern humans within the past 600,000 years involving genes related to brain development and function. INTRODUCTION As prior techniques for ancestry mapping can be thought of as sum- Much of the current genetic variation within humans predates the maries of the ARG, higher resolution ancestry maps could be pro- split, estimated at 520 to 630 thousand years (ka) ago (1), between duced if the ARG were known.
    [Show full text]
  • Identification of Novel Fusion Transcripts in High Grade Serous
    International Journal of Molecular Sciences Article Identification of Novel Fusion Transcripts in High Grade Serous Ovarian Cancer Andreea Newtson 1,* , Henry Reyes 2, Eric J. Devor 3,4 , Michael J. Goodheart 1,3 and Jesus Gonzalez Bosquet 1,3 1 Department of Obstetrics and Gynecology, Division of Gynecologic Oncology, University of Iowa Hospitals and Clinics, Iowa City, IA 52242, USA; [email protected] (M.J.G.); [email protected] (J.G.B.) 2 Department of Obstetrics and Gynecology, University of Buffalo, Buffalo, NY 14260, USA; [email protected] 3 Holden Comprehensive Cancer Center, University of Iowa Hospitals and Clinics, Iowa City, IA 52242, USA; [email protected] 4 Department of Obstetrics and Gynecology, University of Iowa Hospitals and Clinics, Iowa City, IA 52242, USA * Correspondence: [email protected]; Tel.: +1-319-356-2015 Abstract: Fusion genes are structural chromosomal rearrangements resulting in the exchange of DNA sequences between genes. This results in the formation of a new combined gene. They have been implicated in carcinogenesis in a number of different cancers, though they have been understudied in high grade serous ovarian cancer. This study used high throughput tools to compare the transcrip- tome of high grade serous ovarian cancer and normal fallopian tubes in the interest of identifying unique fusion transcripts within each group. Indeed, we found that there were significantly more fusion transcripts in the cancer samples relative to the normal fallopian tubes. Following this, the Citation: Newtson, A.; Reyes, H.; role of fusion transcripts in chemo-response and overall survival was investigated. This led to the Devor, E.J.; Goodheart, M.J.; Bosquet, identification of fusion transcripts significantly associated with overall survival.
    [Show full text]
  • PRMT5 Inhibition Disrupts Splicing and Stemness in Glioblastoma
    ARTICLE https://doi.org/10.1038/s41467-021-21204-5 OPEN PRMT5 inhibition disrupts splicing and stemness in glioblastoma Patty Sachamitr1,2, Jolene C. Ho 2, Felipe E. Ciamponi 3,4, Wail Ba-Alawi5,6, Fiona J. Coutinho1, Paul Guilhamon1,5, Michelle M. Kushida1, Florence M. G. Cavalli1, Lilian Lee1, Naghmeh Rastegar1, Victoria Vu2,5, María Sánchez-Osuna 7, Jasmin Coulombe-Huntington7, Evgeny Kanshin7, Heather Whetstone1, Mathieu Durand8, Philippe Thibault8, Kirsten Hart2,6, Maria Mangos2, Joseph Veyhl2, Wenjun Chen2, Nhat Tran2, Bang-Chi Duong2, Ahmed M. Aman 9, Xinghui Che1, Xiaoyang Lan 1, Owen Whitley10,11, Olga Zaslaver10,11, Dalia Barsyte-Lovejoy 2,12, Laura M. Richards 5,6, Ian Restall 13,14, Amy Caudy 10,11,15, 10,11 16 17,18 17,19,20 1234567890():,; Hannes L. Röst , Zahid Quyoom Bonday , Mark Bernstein , Sunit Das , Michael D. Cusimano19, Julian Spears17,21, Gary D. Bader 10,11, Trevor J. Pugh 5,6,9, Mike Tyers7, Mathieu Lupien 5,6,9, Benjamin Haibe-Kains5,6,9,22,23, H. Artee Luchman13,14,24, Samuel Weiss13,14,24, ✉ ✉ ✉ Katlin B. Massirer3,4, Panagiotis Prinos 2 , Cheryl H. Arrowsmith 2,5,6 & Peter B. Dirks 1,11,17,20,25,26 Glioblastoma (GBM) is a deadly cancer in which cancer stem cells (CSCs) sustain tumor growth and contribute to therapeutic resistance. Protein arginine methyltransferase 5 (PRMT5) has recently emerged as a promising target in GBM. Using two orthogonal-acting inhibitors of PRMT5 (GSK591 or LLY-283), we show that pharmacological inhibition of PRMT5 suppresses the growth of a cohort of 46 patient-derived GBM stem cell cultures, with the proneural subtype showing greater sensitivity.
    [Show full text]