DoTS: Integrated Gene Indices for Human and Mouse Built from Transcribed Sequences

Running Title: DoTS gene indices

Y Thomas Gan 1,2, Brian Brunk 1, Jonathan Crabtree 1,2, Deborah Pinney 1,2, Steve Fischer 1,2, Joan Mazzarelli 1,2, Otto Valladares 2, Maja Bucan 2, Christian J. Stoeckert, Jr. 1,2

1 Center for Bioinformatics, University of Pennsylvania, Philadelphia, PA 19104, USA
2 Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA

Y Thomas Gan: 215-746-7013 (tel), 215-573-3111 (fax), yg [email protected] (email)
Brian Brunk: 215-573-3118 (tel), 215-573-3111 (fax), [email protected] (email)
Jonathan Crabtree: 215-573-3115 (tel), 215-573-3111 (fax), [email protected] (email)
Deborah Pinney: 215-573-3116 (tel), 215-573-3111 (fax), [email protected] (email)
Steve Fischer: 215-573-2280 (tel), 215-573-3111 (fax), [email protected] (email)
Joan Mazzarelli: 215-573-4413 (tel), 215-573-3111 (fax), [email protected] (email)
Otto Valladares: 215-898-0021 (tel), 215-573-2041 (fax), [email protected] (email)
Maja Bucan: 215-898-0020 (tel), 215-573-2041 (fax), [email protected] (email)
Corresponding author: Christian J. Stoeckert, Jr. 215-573-4409 (tel), 215-573-3111 (fax), [email protected] (email)

Genome Biology

Abbreviations used in this paper:
EST: expressed sequence tag
DoTS: database of transcribed sequences
DT: DoTS Transcript
DG: DoTS Gene
sDG: similarity-based DoTS Gene
gDG: genome-based DoTS Gene
TC: tentative consensus
BLAST: basic local alignment search tool
BLAT: BLAST-like alignment tool
UTR: untranslated region
ORF: open reading frame
CDS: (protein) coding sequence

Abstract

Background

Although sequences for large eukaryotic genomes are being completed, it remains a challenge to identify all genes encoded by them and determine or predict their functions.
To help address this challenge, we have built a Database of Transcribed Sequences (DoTS). We cluster and assemble ESTs and mRNAs into DoTS Transcripts (DTs). We further group DTs representing transcripts from the same gene into DoTS Genes (DGs). We describe human and mouse DoTS here, although DoTS is generic and applicable to other species such as apicomplexa [1].

Results

We have built an integrated transcriptome resource, DoTS, for human and mouse. In DoTS we catalogue, categorize, and annotate known and predicted transcripts and genes. We have identified 48,994 human and 37,984 mouse high-confidence DGs, of which 25,326 human and 22,024 mouse DGs are predicted to be protein-coding genes. Using these data, we can predict novel genes, as demonstrated using a 75 Mb proximal region on mouse chromosome 5. We have found that DGs can significantly enrich the models of known genes by predicting extended UTRs, novel exons, and alternative transcription starts. DoTS also enables the study of non-coding genes and singleton transcripts (DTs with only one input EST or mRNA), in addition to other studies such as the investigation of alternative splicing. A powerful query interface for human and mouse DoTS is available at http://www.allgenes.org [2].

Conclusion

DoTS Transcripts and DoTS Genes, which are extensively annotated and significantly curated, present a unique, integrated, non-redundant, and genome-mapped view of the millions of ESTs and mRNAs in the public domain. They are categorized into various subsets such as high-confidence genes, protein-coding genes, and non-coding genes. They predict many putative novel genes, enrich gene models of known genes, and enable data mining in novel directions.

Background and significance

In a post-genomic era, identifying all genes and studying their functions and relationships are among the ongoing challenges in the field of functional genomics.
Transcribed sequences (mRNAs and ESTs) may be used to build integrated transcriptome data resources to help address such challenges.

Genomic data integration

Much progress has been made recently in sequencing large eukaryotic genomes. We now have an essentially complete sequence for the human genome [3-5] and a draft for mouse [6]. Coincident with the explosion of genomic sequence data is the rapidly growing availability of vast amounts of functional genomics data such as expressed sequence tags (ESTs), proteomes, protein domains, and microarray gene expression data. For example, as of October 2003, there are 5.4 million human and 3.9 million mouse ESTs in the public EST repository dbEST [7]. It is necessary to integrate these diverse types of data to facilitate gene identification and functional annotation.

Transcribed sequences for data integration

Transcribed sequences are a good integration point. First, they are the products of gene transcription, and they are abundant as a result of the large-scale EST sequencing efforts. Therefore, they can be used for gene discovery and analysis of gene structure (e.g. exon-intron structures, alternative splicing) in genomic sequences via alignments. Second, expression information is usually available for ESTs, based on the libraries from which they originate. In addition, ESTs are commonly used to generate features on microarrays. Therefore, transcribed sequences allow easy integration of expression information with genes, providing the basis for expression analyses. Third, transcribed sequences may be translated to allow protein sequence analyses (e.g. domain-based functional annotation, ortholog identification). Fourth, they may be aligned with genomic sequences to identify regulatory regions. Finally, they may originate from genes that do not encode proteins; therefore, they allow the identification of non-coding genes.

Existing transcriptome data resources

Human and mouse genome and transcriptome data are available from several sites [8]. Although there is overlap in the information presented, the sites generally provide unique views or emphases. This is expected, as we are far from a complete understanding of the wealth of information provided by genome sequencing, EST sequencing, and microarray experiments. Groups such as Ensembl [9, 10] or the UCSC Genome Browser team [11] use the genome as their reference point. Another approach is to use shared identifiers (accessions) from different resources to organize and integrate information, as is done by GeneCards [12] and MGI [13], which focus on known genes and emphasize phenotypes. These approaches are complementary, and they provide different views and different interpretations of the data. For example, transcribed sequences that cannot be properly aligned to the genome would fail to be seen as primary entities in genome-based views.

Unigene [14] and the TIGR gene indices [15] are multi-species transcriptome data resources organized around transcribed sequences. Other efforts in this class include MGC [16], RefSeq [17], STACK [18], and MIPS [19]. Unigene uses sequence similarity to cluster all ESTs and mRNAs but does not generate consensus sequences. Essentially, the Unigene clusters represent ESTs associated with the same gene. The great strength of Unigene is its currency, but one of its weaknesses is the lack of persistent identifiers. The TIGR gene indices provide consensus sequences and persistent identifiers, and they also have data on orthologs for species other than human and mouse, which enables comparative genomics studies using more than two species. TIGR assemblies (TCs) represent transcripts rather than genes; they are therefore a transcript-centric, not gene-centric, resource.
MGC focuses on full-length cDNAs, and RefSeq emphasizes known and curated genes; both are therefore limited in scope.

DoTS as a transcriptome resource

DoTS, short for Database of Transcribed Sequences, is a collective name for DoTS Transcripts (DTs) and DoTS Genes (DGs). A DT is an assembly of transcribed sequences representing transcripts of the same splice form, and a DG is a group of DTs representing transcripts from the same gene. The goal of DoTS is to generate relationships among genes, RNAs, proteins, and their sequences to assist in discovering new genes, functions, genomic relationships (e.g. clusters by location), and regulation of gene expression. Allgenes.org is the website for public access to DoTS.

As a human and mouse transcriptome resource, DoTS organizes its data around transcribed sequences, as Unigene and the TIGR TCs do. DoTS and the TIGR TCs provide consensus sequences and persistent identifiers, both of which Unigene lacks. Although DoTS and the TIGR TCs are very similar in the degree of annotation performed and, as recently reported, in the assemblies generated [20], the two are not identical because of differences in the details of their clustering and assembly processes. For example, DoTS has more consensus transcripts but a smaller number of sequences per transcript than the TIGR TCs. This may be due to less trimming of low-quality sequences from the ends, a choice made for DoTS to better preserve representation of differentially processed transcripts. The DoTS transcript indices also differ from the TIGR TCs in some of the annotations performed on the consensus sequences (e.g. gene trap associations, signal peptide prediction, transmembrane predictions), in significant manual curation by expert annotators (Mazzarelli J. et al., manuscript in preparation), and in the availability of a powerful query interface through the Allgenes website [2]. DTs are taken a step further than TCs to generate genes.
Therefore, DoTS is also a gene index.

Gene finding and transcribed sequences

The difficulty in identifying all the genes in a mammalian genome is illustrated by the range of predictions over recent years. Estimates of the total number of human genes range from 28,000-34,000 based on homology [21], through 35,000 based on ESTs [22] and 41,000-45,000 based on validation of computational predictions [23], to 56,960-81,273 based on cDNAs [24]. The initial genome annotations by the public and private human genome projects, using similar approaches, both suggested that there are about 30,000 human protein-coding genes [4, 5], but the actual genes predicted differed significantly [25].
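
To make the DT and DG definitions given under "DoTS as a transcriptome resource" concrete, the following is a minimal Python sketch of how the two levels might relate. It is only an illustration under stated assumptions: the class names, field names, identifier formats, and accessions are hypothetical and do not reflect the actual DoTS schema or its clustering and assembly code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DoTSTranscript:
    """A DT: an assembly of input sequences taken to share one splice form."""
    dt_id: str                    # hypothetical identifier, not a real DoTS ID
    input_accessions: List[str]   # ESTs and mRNAs assembled into this DT

    @property
    def is_singleton(self) -> bool:
        # The paper defines a singleton as a DT with only one input EST or mRNA.
        return len(self.input_accessions) == 1


@dataclass
class DoTSGene:
    """A DG: a group of DTs taken to represent transcripts of the same gene."""
    dg_id: str                    # hypothetical identifier, not a real DoTS ID
    transcripts: List[DoTSTranscript] = field(default_factory=list)

    def input_count(self) -> int:
        # Number of distinct input sequences across all member DTs.
        return len({acc for dt in self.transcripts for acc in dt.input_accessions})
```

In this sketch, a DG holding one three-sequence DT and one singleton DT would report four distinct inputs; how DTs are actually assigned to DGs (similarity-based or genome-based) is outside what the sketch models.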