'Next- Generation' Sequencing Data Analysis

Total Page:16

File Type:pdf, Size:1020Kb

'Next- Generation' Sequencing Data Analysis Novel Algorithm Development for ‘Next- Generation’ Sequencing Data Analysis Agne Antanaviciute Submitted in accordance with the requirements for the degree of Doctor of Philosophy University of Leeds School of Medicine Leeds Institute of Biomedical and Clinical Sciences 12/2017 ii The candidate confirms that the work submitted is her own, except where work which has formed part of jointly-authored publications has been included. The contribution of the candidate and the other authors to this work has been explicitly given within the thesis where reference has been made to the work of others. This copy has been supplied on the understanding that it is copyright material and that no quotation from the thesis may be published without proper acknowledgement ©2017 The University of Leeds and Agne Antanaviciute The right of Agne Antanaviciute to be identified as Author of this work has been asserted by her in accordance with the Copyright, Designs and Patents Act 1988. Acknowledgements I would like to thank all the people who have contributed to this work. First and foremost, my supervisors Dr Ian Carr, Professor David Bonthron and Dr Christopher Watson, who have provided guidance, support and motivation. I could not have asked for a better supervisory team. I would also like to thank my collaborators Dr Belinda Baquero and Professor Adrian Whitehouse for opening new, interesting research avenues. A special thanks to Dr Belinda Baquero for all the hard wet lab work without which at least half of this thesis would not exist. Thanks to everyone at the NGS Facility – Carolina Lascelles, Catherine Daley, Sally Harrison, Ummey Hany and Laura Crinnion – for the generation of NGS data used in this work and creating a supportive and stimulating work environment. Finally, I am extremely grateful to Sir Jules Thorn Charitable Trust, who funded this work. ii Abstract In recent years, the decreasing cost of ‘Next generation’ sequencing has spawned numerous applications for interrogating whole genomes and transcriptomes in research, diagnostic and forensic settings. While the innovations in sequencing have been explosive, the development of scalable and robust bioinformatics software and algorithms for the analysis of new types of data generated by these technologies have struggled to keep up. As a result, large volumes of NGS data available in public repositories are severely underutilised, despite providing a rich resource for data mining applications. Indeed, the bottleneck in genome and transcriptome sequencing experiments has shifted from data generation to bioinformatics analysis and interpretation. This thesis focuses on development of novel bioinformatics software to bridge the gap between data availability and interpretation. The work is split between two core topics – computational prioritisation/identification of disease gene variants and identification of RNA N6 -adenosine Methylation from sequencing data. The first chapter briefly discusses the emergence and establishment of NGS technology as a core tool in biology and its current applications and perspectives. Chapter 2 introduces the problem of variant prioritisation in the context of Mendelian disease, where tens of thousands of potential candidates are generated by a typical sequencing experiment. Novel software developed for candidate gene prioritisation is described that utilises data mining of tissue-specific gene expression profiles (Chapter 3). The second part of chapter investigates an alternative approach to candidate variant prioritisation by leveraging functional and phenotypic descriptions of genes and diseases from multiple biomedical domain ontologies (Chapter 4). Chapter 5 discusses N6 AdenosineMethylation, a recently re-discovered post- transcriptional modification of RNA. The core of the chapter describes novel software developed for transcriptome-wide detection of this epitranscriptomic mark from sequencing data. Chapter 6 presents a case study application of the software, reporting the previously uncharacterised RNA methylome of Kaposi’s Sarcoma Herpes Virus. The chapter further discusses a putative novel N6-methyl-adenosine -RNA binding protein and its possible roles in the progression of viral infection. iii Contents 1. Introduction .................................................................................................................... 1 1.1 Nucleic acid sequencing......................................................................................................................... 1 1.2 ‘Next-Generation’ Sequencing ............................................................................................................. 6 1.3 “Next-Generation” Sequencing Applications ............................................................................. 10 1.4 Bioinformatics Challenges in ‘Next-Generation’ Sequencing Era ..................................... 12 1.5 Overview of this work ......................................................................................................................... 14 2. Candidate Disease Gene and Variant Prioritisation ......................................................... 16 2.1 Disease Gene/Variant Identification ............................................................................................. 16 2.1.1 Genetic Linkage and Association Studies .................................................................. 16 2.1.2 Whole Exome and Whole Genome Sequencing Studies ........................................... 18 2.1.3 Putative disease variant prioritisation....................................................................... 22 2.2 Computational Candidate Disease Gene Prioritisation ......................................................... 27 3. GeneTiER: gene Tissue Expression Ranker ...................................................................... 41 3.1 Motivation ................................................................................................................................................ 41 3.2 Methods ..................................................................................................................................................... 47 3.2.1 Software Implementation ......................................................................................... 47 3.2.2 User Input Processing ................................................................................................ 48 3.2.3 GeneTiER Database ................................................................................................... 51 3.2.4 The gene prioritization algorithm .............................................................................. 54 iv 3.3 Results and Discussion ........................................................................................................................ 55 3.3.1 Performance Assessment .......................................................................................... 55 3.3.2 Case Study Genes ...................................................................................................... 65 3.3.3 GeneTiER Application ................................................................................................ 74 3.3.4 Discussion .................................................................................................................. 76 4 OVA: Ontology Variant Analysis Tool .............................................................................. 79 4.1.1 Motivation ................................................................................................................. 79 4.1.2 Representing Biomedical Knowledge in Machine-Readable Ways ........................... 79 4.1.3 Biomedical Domain Ontologies ................................................................................. 83 4.2 Methods ..................................................................................................................................................... 85 4.2.1 Overview .................................................................................................................... 85 4.2.2 Software Implementation ......................................................................................... 87 4.3 Results and Discussion ...................................................................................................................... 103 4.3.1 Performance Assessment ........................................................................................ 103 4.3.2 OVA Application....................................................................................................... 121 4.3.3 Discussion ................................................................................................................ 132 5. N6-Methyl Adenosine Sequencing ............................................................................... 136 5.1 Introduction........................................................................................................................................... 136 5.1.2 N6-Methyl Adenosine Molecular Biology................................................................ 137 5.1.3 m6A detection .......................................................................................................... 155 v 5.1.4 Computational methods for m6A-seq data analysis ................................................ 159 Box 1. Hidden Markov models .......................................................................................................... 163 Box 2.
Recommended publications
  • IMGT-ONTOLOGY and IMGT Databases, Tools and Web
    Molecular Immunology 40 (2004) 647–660 Review IMGT-ONTOLOGY and IMGT databases, tools and Web resources for immunogenetics and immunoinformatics Marie-Paule Lefranc a,b,∗ a Laboratoire d’ImmunoGénétique Moléculaire, LIGM, Institut de Génétique Humaine IGH, Université Montpellier II, UPR CNRS 1142, 141 rue de la Cardonille, 34396 Montpellier Cedex 5, France b Institut Universitaire de France, France Received 18 June 2003; received in revised form 2 September 2003; accepted 16 September 2003 Abstract The international ImMunoGeneTics information system® (IMGT; http://imgt.cines.fr), is a high quality integrated information system specialized in immunoglobulins (IG), T cell receptors (TR), major histocompatibility complex (MHC), and related proteins of the immune system (RPI) of human and other vertebrates, created in 1989, by the Laboratoire d’ImmunoGénétique Moléculaire (LIGM; Université Montpellier II and CNRS) at Montpellier, France. IMGT provides a common access to standardized data which include nucleotide and protein sequences, oligonucleotide primers, gene maps, genetic polymorphisms, specificities, 2D and 3D structures. IMGT consists of several sequence databases (IMGT/LIGM-DB, IMGT/MHC-DB, IMGT/PRIMER-DB), one genome database (IMGT/GENE-DB) and one 3D structure database (IMGT/3Dstructure-DB), interactive tools for sequence analysis (IMGT/V-QUEST, IMGT/JunctionAnalysis, IMGT/PhyloGene, IMGT/Allele-Align), for genome analysis (IMGT/GeneSearch, IMGT/GeneView, IMGT/LocusView) and for 3D struc- ture analysis (IMGT/StructuralQuery), and
    [Show full text]
  • METTL3-Mediated M6a Modification Is Required for Cerebellar Development
    RESEARCH ARTICLE METTL3-mediated m6A modification is required for cerebellar development Chen-Xin Wang1,2☯, Guan-Shen Cui3,4☯, Xiuying Liu5☯, Kai Xu1,2☯, Meng Wang5☯, Xin- Xin Zhang1, Li-Yuan Jiang1, Ang Li2,3, Ying Yang3, Wei-Yi Lai2,6, Bao-Fa Sun2,3,7, Gui- Bin Jiang6, Hai-Lin Wang6, Wei-Min Tong8, Wei Li1,2,7, Xiu-Jie Wang2,5,7*, Yun- Gui Yang2,3,7*, Qi Zhou1,2,7* 1 State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China, 2 University of Chinese Academy of Sciences, Beijing, China, 3 Key Laboratory of Genomic and Precision Medicine, Collaborative Innovation Center of Genetics and Development, Beijing a1111111111 Institute of Genomics, Chinese Academy of Sciences, Beijing, China, 4 Sino-Danish College, University of a1111111111 Chinese Academy of Sciences, Beijing, China, 5 Key Laboratory of Genetic Network Biology, Institute of a1111111111 Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China, 6 State Key Laboratory a1111111111 of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese a1111111111 Academy of Sciences, Beijing, China, 7 Institute for Stem Cell and Regeneration, Chinese Academy of Sciences, Beijing, China, 8 Department of Pathology, Center for Experimental Animal Research, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China ☯ These authors contributed equally to this work. OPEN ACCESS * [email protected] (XJW); [email protected] (YGY); [email protected] (QZ) Citation: Wang C-X, Cui G-S, Liu X, Xu K, Wang M, Zhang X-X, et al.
    [Show full text]
  • Genome Wide Association Study of Response to Interval and Continuous Exercise Training: the Predict‑HIIT Study Camilla J
    Williams et al. J Biomed Sci (2021) 28:37 https://doi.org/10.1186/s12929-021-00733-7 RESEARCH Open Access Genome wide association study of response to interval and continuous exercise training: the Predict-HIIT study Camilla J. Williams1†, Zhixiu Li2†, Nicholas Harvey3,4†, Rodney A. Lea4, Brendon J. Gurd5, Jacob T. Bonafglia5, Ioannis Papadimitriou6, Macsue Jacques6, Ilaria Croci1,7,20, Dorthe Stensvold7, Ulrik Wislof1,7, Jenna L. Taylor1, Trishan Gajanand1, Emily R. Cox1, Joyce S. Ramos1,8, Robert G. Fassett1, Jonathan P. Little9, Monique E. Francois9, Christopher M. Hearon Jr10, Satyam Sarma10, Sylvan L. J. E. Janssen10,11, Emeline M. Van Craenenbroeck12, Paul Beckers12, Véronique A. Cornelissen13, Erin J. Howden14, Shelley E. Keating1, Xu Yan6,15, David J. Bishop6,16, Anja Bye7,17, Larisa M. Haupt4, Lyn R. Grifths4, Kevin J. Ashton3, Matthew A. Brown18, Luciana Torquati19, Nir Eynon6 and Jef S. Coombes1* Abstract Background: Low cardiorespiratory ftness (V̇O2peak) is highly associated with chronic disease and mortality from all causes. Whilst exercise training is recommended in health guidelines to improve V̇O2peak, there is considerable inter-individual variability in the V̇O2peak response to the same dose of exercise. Understanding how genetic factors contribute to V̇O2peak training response may improve personalisation of exercise programs. The aim of this study was to identify genetic variants that are associated with the magnitude of V̇O2peak response following exercise training. Methods: Participant change in objectively measured V̇O2peak from 18 diferent interventions was obtained from a multi-centre study (Predict-HIIT). A genome-wide association study was completed (n 507), and a polygenic predictor score (PPS) was developed using alleles from single nucleotide polymorphisms= (SNPs) signifcantly associ- –5 ated (P < 1 ­10 ) with the magnitude of V̇O2peak response.
    [Show full text]
  • Identification of the Binding Partners for Hspb2 and Cryab Reveals
    Brigham Young University BYU ScholarsArchive Theses and Dissertations 2013-12-12 Identification of the Binding arP tners for HspB2 and CryAB Reveals Myofibril and Mitochondrial Protein Interactions and Non- Redundant Roles for Small Heat Shock Proteins Kelsey Murphey Langston Brigham Young University - Provo Follow this and additional works at: https://scholarsarchive.byu.edu/etd Part of the Microbiology Commons BYU ScholarsArchive Citation Langston, Kelsey Murphey, "Identification of the Binding Partners for HspB2 and CryAB Reveals Myofibril and Mitochondrial Protein Interactions and Non-Redundant Roles for Small Heat Shock Proteins" (2013). Theses and Dissertations. 3822. https://scholarsarchive.byu.edu/etd/3822 This Thesis is brought to you for free and open access by BYU ScholarsArchive. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of BYU ScholarsArchive. For more information, please contact [email protected], [email protected]. Identification of the Binding Partners for HspB2 and CryAB Reveals Myofibril and Mitochondrial Protein Interactions and Non-Redundant Roles for Small Heat Shock Proteins Kelsey Langston A thesis submitted to the faculty of Brigham Young University in partial fulfillment of the requirements for the degree of Master of Science Julianne H. Grose, Chair William R. McCleary Brian Poole Department of Microbiology and Molecular Biology Brigham Young University December 2013 Copyright © 2013 Kelsey Langston All Rights Reserved ABSTRACT Identification of the Binding Partners for HspB2 and CryAB Reveals Myofibril and Mitochondrial Protein Interactors and Non-Redundant Roles for Small Heat Shock Proteins Kelsey Langston Department of Microbiology and Molecular Biology, BYU Master of Science Small Heat Shock Proteins (sHSP) are molecular chaperones that play protective roles in cell survival and have been shown to possess chaperone activity.
    [Show full text]
  • Nuclear Organization and the Epigenetic Landscape of the Mus Musculus X-Chromosome Alicia Liu University of Connecticut - Storrs, [email protected]
    University of Connecticut OpenCommons@UConn Doctoral Dissertations University of Connecticut Graduate School 8-9-2019 Nuclear Organization and the Epigenetic Landscape of the Mus musculus X-Chromosome Alicia Liu University of Connecticut - Storrs, [email protected] Follow this and additional works at: https://opencommons.uconn.edu/dissertations Recommended Citation Liu, Alicia, "Nuclear Organization and the Epigenetic Landscape of the Mus musculus X-Chromosome" (2019). Doctoral Dissertations. 2273. https://opencommons.uconn.edu/dissertations/2273 Nuclear Organization and the Epigenetic Landscape of the Mus musculus X-Chromosome Alicia J. Liu, Ph.D. University of Connecticut, 2019 ABSTRACT X-linked imprinted genes have been hypothesized to contribute parent-of-origin influences on social cognition. A cluster of imprinted genes Xlr3b, Xlr4b, and Xlr4c, implicated in cognitive defects, are maternally expressed and paternally silent in the murine brain. These genes defy classic mechanisms of autosomal imprinting, suggesting a novel method of imprinted gene regulation. Using Xlr3b and Xlr4c as bait, this study uses 4C-Seq on neonatal whole brain of a 39,XO mouse model, to provide the first in-depth analysis of chromatin dynamics surrounding an imprinted locus on the X-chromosome. Significant differences in long-range contacts exist be- tween XM and XP monosomic samples. In addition, XM interaction profiles contact a greater number of genes linked to cognitive impairment, abnormality of the nervous system, and abnormality of higher mental function. This is not a pattern that is unique to the imprinted Xlr3/4 locus. Additional Alicia J. Liu - University of Connecticut - 2019 4C-Seq experiments show that other genes on the X-chromosome, implicated in intellectual disability and/or ASD, also produce more maternal contacts to other X-linked genes linked to cognitive impairment.
    [Show full text]
  • Bioinformatics Analyses of Genomic Imprinting
    Bioinformatics Analyses of Genomic Imprinting Dissertation zur Erlangung des Grades des Doktors der Naturwissenschaften der Naturwissenschaftlich-Technischen Fakultät III Chemie, Pharmazie, Bio- und Werkstoffwissenschaften der Universität des Saarlandes von Barbara Hutter Saarbrücken 2009 Tag des Kolloquiums: 08.12.2009 Dekan: Prof. Dr.-Ing. Stefan Diebels Berichterstatter: Prof. Dr. Volkhard Helms Priv.-Doz. Dr. Martina Paulsen Vorsitz: Prof. Dr. Jörn Walter Akad. Mitarbeiter: Dr. Tihamér Geyer Table of contents Summary________________________________________________________________ I Zusammenfassung ________________________________________________________ I Acknowledgements _______________________________________________________II Abbreviations ___________________________________________________________ III Chapter 1 – Introduction __________________________________________________ 1 1.1 Important terms and concepts related to genomic imprinting __________________________ 2 1.2 CpG islands as regulatory elements ______________________________________________ 3 1.3 Differentially methylated regions and imprinting clusters_____________________________ 6 1.4 Reading the imprint __________________________________________________________ 8 1.5 Chromatin marks at imprinted regions___________________________________________ 10 1.6 Roles of repetitive elements ___________________________________________________ 12 1.7 Functional implications of imprinted genes _______________________________________ 14 1.8 Evolution and parental conflict ________________________________________________
    [Show full text]
  • The Role of Cyclin B3 in Mammalian Meiosis
    THE ROLE OF CYCLIN B3 IN MAMMALIAN MEIOSIS by Mehmet Erman Karasu A Dissertation Presented to the Faculty of the Louis V. Gerstner Jr. Graduate School of Biomedical Sciences, Memorial Sloan Kettering Cancer Center In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy New York, NY November, 2018 Scott Keeney, PhD Date Dissertation Mentor Copyright © Mehmet Erman Karasu 2018 DEDICATION I would like to dedicate this thesis to my parents, Mukaddes and Mustafa Karasu. I have been so lucky to have their support and unconditional love in this life. ii ABSTRACT Cyclins and cyclin dependent kinases (CDKs) lie at the center of the regulation of the cell cycle. Cyclins as regulatory partners of CDKs control the switch-like cell cycle transitions that orchestrate orderly duplication and segregation of genomes. Similar to somatic cell division, temporal regulation of cyclin-CDK activity is also important in meiosis, which is the specialized cell division that generates gametes for sexual production by halving the genome. Meiosis does so by carrying out one round of DNA replication followed by two successive divisions without another intervening phase of DNA replication. In budding yeast, cyclin-CDK activity has been shown to have a crucial role in meiotic events such as formation of meiotic double-strand breaks that initiate homologous recombination. Mammalian cells express numerous cyclins and CDKs, but how these proteins control meiosis remains poorly understood. Cyclin B3 was previously identified as germ cell specific, and its restricted expression pattern at the beginning of meiosis made it an interesting candidate to regulate meiotic events.
    [Show full text]
  • Ythdc2 Is an N6-Methyladenosine Binding Protein That Regulates Mammalian Spermatogenesis
    Cell Research (2017) 27:1115-1127. © 2017 IBCB, SIBS, CAS All rights reserved 1001-0602/17 $ 32.00 ORIGINAL ARTICLE www.nature.com/cr Ythdc2 is an N6-methyladenosine binding protein that regulates mammalian spermatogenesis Phillip J Hsu1, 2, 3, *, Yunfei Zhu4, *, Honghui Ma1, 2, *, Yueshuai Guo4, *, Xiaodan Shi4, Yuanyuan Liu4, Meijie Qi4, Zhike Lu1, 2, Hailing Shi1, 2, Jianying Wang4, Yiwei Cheng4, Guanzheng Luo1, 2, Qing Dai1, 2, Mingxi Liu4, Xuejiang Guo4, Jiahao Sha4, Bin Shen4, Chuan He1, 2, 5 1Department of Chemistry and Institute for Biophysical Dynamics, The University of Chicago, Chicago, IL 60637, USA; 2Howard Hughes Medical Institute, The University of Chicago, Chicago, IL 60637, USA; 3Committee on Immunology, The University of Chicago, Chicago, IL 60637, USA; 4State Key Laboratory of Reproductive Medicine, Department of Histology and Embryology, Nanjing Medical University, Nanjing 211166, China; 5Department of Biochemistry and Molecular Biology, The University of Chi- cago, Chicago, IL 60637, USA N6-methyladenosine (m6A) is the most common internal modification in eukaryotic mRNA. It is dynamically in- stalled and removed, and acts as a new layer of mRNA metabolism, regulating biological processes including stem cell pluripotency, cell differentiation, and energy homeostasis. m6A is recognized by selective binding proteins; YTHDF1 and YTHDF3 work in concert to affect the translation of m6A-containing mRNAs, YTHDF2 expedites mRNA decay, and YTHDC1 affects the nuclear processing of its targets. The biological function of YTHDC2, the final member of the YTH protein family, remains unknown. We report that YTHDC2 selectively binds m6A at its consensus motif. YTHDC2 enhances the translation efficiency of its targets and also decreases their mRNA abundance.
    [Show full text]
  • The Role of the X Chromosome in Embryonic and Postnatal Growth
    The role of the X chromosome in embryonic and postnatal growth Daniel Mark Snell A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy of University College London. Francis Crick Institute/Medical Research Council National Institute for Medical Research University College London January 28, 2018 2 I, Daniel Mark Snell, confirm that the work presented in this thesis is my own. Where information has been derived from other sources, I confirm that this has been indicated in the work. Abstract Women born with only a single X chromosome (XO) have Turner syndrome (TS); and they are invariably of short stature. XO female mice are also small: during embryogenesis, female mice with a paternally-inherited X chromosome (XPO) are smaller than XX littermates; whereas during early postnatal life, both XPO and XMO (maternal) mice are smaller than their XX siblings. Here I look to further understand the genetic bases of these phenotypes, and potentially inform areas of future investigation into TS. Mouse pre-implantation embryos preferentially silence the XP via the non-coding RNA Xist.XPO embryos are smaller than XX littermates at embryonic day (E) 10.5, whereas XMO embryos are not. Two possible hypotheses explain this obser- vation. Inappropriate expression of Xist in XPO embryos may cause transcriptional silencing of the single X chromosome and result in embryos nullizygous for X gene products. Alternatively, there could be imprinted genes on the X chromosome that impact on growth and manifest in growth retarded XPO embryos. In contrast, dur- ing the first three weeks of postnatal development, both XPO and XMO mice show a growth deficit when compared with XX littermates.
    [Show full text]
  • Association of Gene Ontology Categories with Decay Rate for Hepg2 Experiments These Tables Show Details for All Gene Ontology Categories
    Supplementary Table 1: Association of Gene Ontology Categories with Decay Rate for HepG2 Experiments These tables show details for all Gene Ontology categories. Inferences for manual classification scheme shown at the bottom. Those categories used in Figure 1A are highlighted in bold. Standard Deviations are shown in parentheses. P-values less than 1E-20 are indicated with a "0". Rate r (hour^-1) Half-life < 2hr. Decay % GO Number Category Name Probe Sets Group Non-Group Distribution p-value In-Group Non-Group Representation p-value GO:0006350 transcription 1523 0.221 (0.009) 0.127 (0.002) FASTER 0 13.1 (0.4) 4.5 (0.1) OVER 0 GO:0006351 transcription, DNA-dependent 1498 0.220 (0.009) 0.127 (0.002) FASTER 0 13.0 (0.4) 4.5 (0.1) OVER 0 GO:0006355 regulation of transcription, DNA-dependent 1163 0.230 (0.011) 0.128 (0.002) FASTER 5.00E-21 14.2 (0.5) 4.6 (0.1) OVER 0 GO:0006366 transcription from Pol II promoter 845 0.225 (0.012) 0.130 (0.002) FASTER 1.88E-14 13.0 (0.5) 4.8 (0.1) OVER 0 GO:0006139 nucleobase, nucleoside, nucleotide and nucleic acid metabolism3004 0.173 (0.006) 0.127 (0.002) FASTER 1.28E-12 8.4 (0.2) 4.5 (0.1) OVER 0 GO:0006357 regulation of transcription from Pol II promoter 487 0.231 (0.016) 0.132 (0.002) FASTER 6.05E-10 13.5 (0.6) 4.9 (0.1) OVER 0 GO:0008283 cell proliferation 625 0.189 (0.014) 0.132 (0.002) FASTER 1.95E-05 10.1 (0.6) 5.0 (0.1) OVER 1.50E-20 GO:0006513 monoubiquitination 36 0.305 (0.049) 0.134 (0.002) FASTER 2.69E-04 25.4 (4.4) 5.1 (0.1) OVER 2.04E-06 GO:0007050 cell cycle arrest 57 0.311 (0.054) 0.133 (0.002)
    [Show full text]
  • Large-Scale Functional Prediction of Individual M6a RNA Methylation Sites from an RNA Co-Methylation Network
    Wu et al. BMC Bioinformatics (2019) 20:223 https://doi.org/10.1186/s12859-019-2840-3 RESEARCHARTICLE Open Access m6Acomet: large-scale functional prediction of individual m6A RNA methylation sites from an RNA co- methylation network Xiangyu Wu1,2, Zhen Wei1,2, Kunqi Chen1,2, Qing Zhang1,4, Jionglong Su5, Hui Liu6, Lin Zhang6* and Jia Meng1,3* Abstract Background: Over one hundred different types of post-transcriptional RNA modifications have been identified in human. Researchers discovered that RNA modifications can regulate various biological processes, and RNA methylation, especially N6-methyladenosine, has become one of the most researched topics in epigenetics. Results: To date, the study of epitranscriptome layer gene regulation is mostly focused on the function of mediator proteins of RNA methylation, i.e., the readers, writers and erasers. There is limited investigation of the functional relevance of individual m6A RNA methylation site. To address this, we annotated human m6A sites in large-scale based on the guilt-by-association principle from an RNA co-methylation network. It is constructed based on public human MeRIP-Seq datasets profiling the m6A epitranscriptome under 32 independent experimental conditions. By systematically examining the network characteristics obtained from the RNA methylation profiles, a total of 339,158 putative gene ontology functions associated with 1446 human m6A sites were identified. These are biological functions that may be regulated at epitranscriptome layer via reversible m6A RNA methylation. The results were further validated on a soft benchmark by comparing to a random predictor. Conclusions: An online web server m6Acomet was constructed to support direct query for the predicted biological functions of m6A sites as well as the sites exhibiting co-methylated patterns at the epitranscriptome layer.
    [Show full text]
  • Supplementary Materials
    Supplementary materials Supplementary Table S1: MGNC compound library Ingredien Molecule Caco- Mol ID MW AlogP OB (%) BBB DL FASA- HL t Name Name 2 shengdi MOL012254 campesterol 400.8 7.63 37.58 1.34 0.98 0.7 0.21 20.2 shengdi MOL000519 coniferin 314.4 3.16 31.11 0.42 -0.2 0.3 0.27 74.6 beta- shengdi MOL000359 414.8 8.08 36.91 1.32 0.99 0.8 0.23 20.2 sitosterol pachymic shengdi MOL000289 528.9 6.54 33.63 0.1 -0.6 0.8 0 9.27 acid Poricoic acid shengdi MOL000291 484.7 5.64 30.52 -0.08 -0.9 0.8 0 8.67 B Chrysanthem shengdi MOL004492 585 8.24 38.72 0.51 -1 0.6 0.3 17.5 axanthin 20- shengdi MOL011455 Hexadecano 418.6 1.91 32.7 -0.24 -0.4 0.7 0.29 104 ylingenol huanglian MOL001454 berberine 336.4 3.45 36.86 1.24 0.57 0.8 0.19 6.57 huanglian MOL013352 Obacunone 454.6 2.68 43.29 0.01 -0.4 0.8 0.31 -13 huanglian MOL002894 berberrubine 322.4 3.2 35.74 1.07 0.17 0.7 0.24 6.46 huanglian MOL002897 epiberberine 336.4 3.45 43.09 1.17 0.4 0.8 0.19 6.1 huanglian MOL002903 (R)-Canadine 339.4 3.4 55.37 1.04 0.57 0.8 0.2 6.41 huanglian MOL002904 Berlambine 351.4 2.49 36.68 0.97 0.17 0.8 0.28 7.33 Corchorosid huanglian MOL002907 404.6 1.34 105 -0.91 -1.3 0.8 0.29 6.68 e A_qt Magnogrand huanglian MOL000622 266.4 1.18 63.71 0.02 -0.2 0.2 0.3 3.17 iolide huanglian MOL000762 Palmidin A 510.5 4.52 35.36 -0.38 -1.5 0.7 0.39 33.2 huanglian MOL000785 palmatine 352.4 3.65 64.6 1.33 0.37 0.7 0.13 2.25 huanglian MOL000098 quercetin 302.3 1.5 46.43 0.05 -0.8 0.3 0.38 14.4 huanglian MOL001458 coptisine 320.3 3.25 30.67 1.21 0.32 0.9 0.26 9.33 huanglian MOL002668 Worenine
    [Show full text]