Structural RNA Homology Search and Alignment Using Covariance Models Eric Nawrocki Washington University in St


Washington University in St. Louis
Washington University Open Scholarship
All Theses and Dissertations (ETDs)
January 2009

Structural RNA Homology Search and Alignment Using Covariance Models
Eric Nawrocki
Washington University in St. Louis

Follow this and additional works at: https://openscholarship.wustl.edu/etd

Recommended Citation
Nawrocki, Eric, "Structural RNA Homology Search and Alignment Using Covariance Models" (2009). All Theses and Dissertations (ETDs). 256. https://openscholarship.wustl.edu/etd/256

This Dissertation is brought to you for free and open access by Washington University Open Scholarship. It has been accepted for inclusion in All Theses and Dissertations (ETDs) by an authorized administrator of Washington University Open Scholarship. For more information, please contact [email protected].

WASHINGTON UNIVERSITY IN ST. LOUIS
Division of Biology and Biomedical Sciences (Computational Biology)

Dissertation Examination Committee:
Sean Eddy, Chair
Michael Brent
Jeremy Buhler
Justin Fay
Jeff Gordon
Rob Mitra
Gary Stormo

STRUCTURAL RNA HOMOLOGY SEARCH AND ALIGNMENT USING COVARIANCE MODELS
by
Eric Paul Nawrocki

A dissertation presented to the Graduate School of Arts and Sciences of Washington University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

December 2009
Saint Louis, Missouri

copyright by Eric Paul Nawrocki 2009

ABSTRACT OF THE DISSERTATION
Structural RNA Homology Search and Alignment Using Covariance Models
by Eric Paul Nawrocki
Doctor of Philosophy in Biology and Biomedical Sciences (Computational Biology)
Washington University in St. Louis, 2009
Sean R. Eddy, Chairman

Functional RNA elements do not encode proteins, but rather function directly as RNAs.
Many different types of RNAs play important roles in a wide range of cellular processes, including protein synthesis, gene regulation, protein transport, splicing, and more. Because important sequence and structural features tend to be evolutionarily conserved, one way to learn about functional RNAs is through comparative sequence analysis - by collecting and aligning examples of homologous RNAs and comparing them. Covariance models (CMs) are powerful computational tools for homology search and alignment that score both the conserved sequence and secondary structure of an RNA family. However, due to the high computational complexity of their search and alignment algorithms, searches against large databases and alignment of large RNAs like small subunit ribosomal RNA (SSU rRNA) are prohibitively slow. Large-scale alignment of SSU rRNA is of particular utility for environmental survey studies of microbial diversity which often use the rRNA as a phylogenetic marker of microorganisms. In this work, we improve CM methods by making them faster and more sensitive to remote homology. To accelerate searches, we introduce a query-dependent banding (QDB) technique that makes scoring sequences more efficient by restricting the possible lengths of structural elements based on their probability given the model. We combine QDB with a complementary filtering method that quickly prunes away database subsequences deemed unlikely to receive high CM scores based on sequence conservation alone. To increase search sensitivity, we apply two model parameterization strategies from protein homology search tools to CMs. As judged by our benchmark, these combined approaches yield about a 250-fold speedup and significant increase in search sensitivity compared with previous implementations. To accelerate alignment, we apply a method that uses a fast sequence-based alignment of a target sequence to determine constraints for the more expensive CM sequence- and structure-based alignment.
This technique reduces the time required to align one SSU rRNA sequence from about 15 minutes to 1 second with a negligible effect on alignment accuracy. Collectively, these improvements make CMs more powerful and practical tools for RNA homology search and alignment.

Acknowledgements

I gratefully acknowledge financial support from the Howard Hughes Medical Institute (HHMI) and a National Institutes of Health National Human Genome Research Initiative (NIH NHGRI) Institutional Training Grant in Genomic Science, T32-HG000045. Sean Eddy’s lab in Washington University at St. Louis, where this work took place from 2004 to 2006, was supported by NIH NHGRI grant RO1-HG01363, HHMI, and Alvin Goldfarb. Since then, my research in the Eddy lab at Janelia Farm Research Campus in Ashburn, Virginia has been financially supported by HHMI.

This dissertation is based on and benefits from important research from many other scientists. Zasha Weinberg and Larry Ruzzo’s excellent work on RNA homology search, Michael Brown and David Haussler’s clever methods for accelerating structural RNA alignment, and the wealth of freely available information at Robin Gutell’s Comparative RNA Website (crw) have been particularly crucial.

In addition, I have been extremely fortunate to work in the lab of Sean Eddy. Sean’s tireless work ethic and depth of knowledge in the biological and computational sciences are staggering. The past and present members of his lab bring dedication, skill and enthusiasm to their work that is unmatched. It has been an honor to learn from and work with each of them.

Finally, I want to thank my wife, Michael Hopkins, for her unwavering support and confidence, and for adding so much fun to the last six years of my life - none of this would have been possible without her.

This work is dedicated to my parents, Kathy and Ed Nawrocki. Thank you for all the love, support, and opportunities that you have provided for me.

Contents

Acknowledgements

Part 1: RNA homology search

1 Introduction to RNA homology search
    1.1 Functional RNA elements
    1.2 Comparative sequence analysis
    1.3 Computational methods for sequence homology search
    1.4 Exploiting conserved structure in RNA homology searches
    1.5 Stochastic context-free grammars for RNA sequence and structure modeling
    1.6 Covariance models: profile stochastic context-free grammars
    1.7 Addressing the limitations of covariance models
    1.8 Outline of Part 1 of this work
2 QDB for faster RNA similarity searches
    2.1 Abstract
    2.2 Introduction
    2.3 Results
    2.4 Discussion
    2.5 Materials and Methods
3 Infernal 1.0: inference of RNA alignments
    3.1 Abstract
    3.2 Introduction
    3.3 Usage
    3.4 Performance
4 A filter pipeline for accelerating covariance model searches
    4.1 Previous work on accelerating CM searches using filters
    4.2 Optimized implementations of the CYK and Inside algorithms
    4.3 Infernal’s two-stage filter acceleration pipeline
    4.4 Evaluation
    4.5 Implementation
    4.6 Conclusion and future directions
5 Computational identification of functional RNA homologs in metagenomic data
    5.1 Exploiting conserved structure in RNA similarity searches
    5.2 Infernal: software for RNA homology search and alignment
    5.3 Rfam: high-throughput RNA homology search and annotation
    5.4 Limitations of CMs
    5.5 Conclusion
6 Conclusion to RNA homology search

Part 2: RNA alignment

7 Introduction to small subunit ribosomal RNA alignment
    7.1 SSU and the tree of life
    7.2 Rapid culture-independent SSU sequence determination
    7.3 Environmental sequencing surveys
    7.4 Comparative SSU analysis for environmental surveys
    7.5 SSU alignment methods
    7.6 Dedicated SSU databases and alignment tools
    7.7 Developing a new SSU alignment tool
8 HMM banding for faster structural RNA alignment
    8.1 Faster alignment using banded dynamic programming
    8.2 HMM banded alignment in Infernal
    8.3 Comparison of HMM and query-dependent banding
    8.4 Benchmarking
    8.5 Constructing an HMM that is maximally similar to a CM
9 SSU-align: a tool for structural alignment of SSU rRNA sequences
    9.1 Aligning SSU sequences with SSU-align
    9.2 Automated probabilistic alignment masking
    9.3 Implementation
    9.4 SSU-align’s SSU rRNA sequence and structure models

Bibliography
Vita

List of Figures

1.1 RNA secondary structure elements.
1.2 Predicted secondary structure of SSU rRNA from E. coli (a bacterium) and M. vannielii (an archaeon).
1.3 Predicted secondary structure of RNase P RNA from E. coli (a bacterium) and M. vannielii (an archaeon).
1.4 Toy example of a multiple sequence alignment.
1.5 Dynamic programming matrix filled during a Needleman-Wunsch-Sellers alignment of two sequences.
1.6 A banded pairwise alignment dynamic programming matrix.
1.7 A toy hidden Markov model.
1.8 A simple profile HMM.
1.9 Additional information gained from modeling structure
1.10 Information in a sequence-only versus a sequence and structure profile.
1.11 A parse tree for a dual-stem loop grammar and the corresponding RNA secondary structure
1.12 Input alignment, CM guide tree, CM state graph and parse trees.
1.13 Examples of CM local begin and end transitions.
1.14 Histogram of number of sequences in rfam 7.0 seed alignments
1.15 A pattern defining the E-loop RNA structural motif used by the rnamotif program.
2.1 An example RNA family and corresponding CM
2.2 Effect of transition priors on band calculation
2.3 Effect of varying the β parameter on sensitivity, specificity, and speedup
2.4 CPU time required by CM searches with and without QDB
2.5 ROC curves for the benchmark (QDB chapter)
3.1 ROC curves for the benchmark (infernal 1.0 chapter)
4.1 Average QDB CYK and HMM Forward scores versus Inside scores for various RNA families.
4.2 CM score histograms of random, known, and sampled sequences for three RNA families.
4.3 MER versus time for the benchmark.

Recommended publications
  • Proquest Dissertations
    Automated learning of protein involvement in pathogenesis using integrated queries Eithon Cadag A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy University of Washington 2009 Program Authorized to Offer Degree: Department of Medical Education and Biomedical Informatics UMI Number: 3394276 All rights reserved INFORMATION TO ALL USERS The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion. UMI Dissertation Publishing UMI 3394276 Copyright 2010 by ProQuest LLC. All rights reserved. This edition of the work is protected against unauthorized copying under Title 17, United States Code. ProQuest LLC 789 East Eisenhower Parkway P.O. Box 1346 Ann Arbor, MI 48106-1346 University of Washington Graduate School This is to certify that I have examined this copy of a doctoral dissertation by Eithon Cadag and have found that it is complete and satisfactory in all respects, and that any and all revisions required by the final examining committee have been made. Chair of the Supervisory Committee: Reading Committee: Peter Tarczy-Hornoch In presenting this dissertation in partial fulfillment of the requirements for the doctoral degree at the University of Washington, I agree that the Library shall make its copies freely available for inspection. I further agree that extensive copying of this dissertation is allowable only for scholarly purposes, consistent with "fair use" as prescribed in the U.S.
  • Molecular Biology for Computer Scientists
    CHAPTER 1 Molecular Biology for Computer Scientists Lawrence Hunter “Computers are to biology what mathematics is to physics.” - Harold Morowitz One of the major challenges for computer scientists who wish to work in the domain of molecular biology is becoming conversant with the daunting intricacies of existing biological knowledge and its extensive technical vocabulary. Questions about the origin, function, and structure of living systems have been pursued by nearly all cultures throughout history, and the work of the last two generations has been particularly fruitful. The knowledge of living systems resulting from this research is far too detailed and complex for any one human to comprehend. An entire scientific career can be based in the study of a single biomolecule. Nevertheless, in the following pages, I attempt to provide enough background for a computer scientist to understand much of the biology discussed in this book. This chapter provides the briefest of overviews; I can only begin to convey the depth, variety, complexity and stunning beauty of the universe of living things. Much of what follows is not about molecular biology per se. In order to explain what the molecules are doing, it is often necessary to use concepts involving, for example, cells, embryological development, or evolution. Biology is frustratingly holistic. Events at one level can affect and be affected by events at very different levels of scale or time. Digesting a survey of the basic background material is a prerequisite for understanding the significance of the molecular biology that is described elsewhere in the book.
  • Bacterial Small Rnas in the Genus Herbaspirillum Spp
    International Journal of Molecular Sciences Article Bacterial Small RNAs in the Genus Herbaspirillum spp. Amanda Carvalho Garcia 1,* , Vera Lúcia Pereira dos Santos 1, Teresa Cristina Santos Cavalcanti 2, Luiz Martins Collaço 2 and Hans Graf 1 1 Department of Internal Medicine, Federal University of Paraná, Curitiba 80.060-240, Brazil; [email protected] (V.L.P.d.S.); [email protected] (H.G.) 2 Department of Pathology, Federal University of Paraná, PR, Curitiba 80.060-240, Brazil; [email protected] (T.C.S.C.); [email protected] (L.M.C.) * Correspondence: [email protected]; Tel.: +55-41-99833-0124 Received: 5 November 2018; Accepted: 12 December 2018; Published: 22 December 2018 Abstract: The genus Herbaspirillum includes several strains isolated from different grasses. The identification of non-coding RNAs (ncRNAs) in the genus Herbaspirillum is an important stage in studying the interaction of these molecules and the way they modulate physiological responses of different mechanisms, through RNA–RNA interaction or RNA–protein interaction. This interaction with their target occurs through the perfect pairing of short sequences (cis-encoded ncRNAs) or by the partial pairing of short sequences (trans-encoded ncRNAs). However, the companion Hfq can stabilize interactions in the trans-acting class. In addition, there are Riboswitches, located at the 5′ end of mRNA and less often at the 3′ end, which respond to environmental signals, high temperatures, or small binder molecules. Recently, CRISPR (clustered regularly interspaced palindromic repeats), in prokaryotes, have been described that consist of serial repeats of base sequences (spacer DNA) resulting from a previous exposure to exogenous plasmids or bacteriophages.
  • Global Analysis of Small RNA and Mrna Targets of Hfq
    Molecular Microbiology (2003) 50(4), 1111–1124 doi:10.1046/j.1365-2958.2003.03734.x
    Global analysis of small RNA and mRNA targets of Hfq
    Aixia Zhang,1 Karen M. Wassarman,2 Carsten Rosenow,3 Brian C. Tjaden,4† Gisela Storz1* and Susan Gottesman5*
    1Cell Biology and Metabolism Branch, National Institute of Child Health and Development, Bethesda MD 20892, USA. 2Department of Bacteriology, University of Wisconsin, Madison, WI 53706, USA. 3Affymetrix, Santa Clara, CA 95051, USA. 4Department of Computer Science, University of Washington, Seattle, WA 98195, USA. 5Laboratory of Molecular Biology, National Cancer Institute, Bethesda, MD 20892, USA.
    Summary Hfq, a bacterial member of the Sm family of RNA-
    greatly expanded over the past few years (reviewed in Gottesman, 2002; Grosshans and Slack, 2002; Storz, 2002; Wassarman, 2002; Massé et al., 2003). A subset of these small RNAs act via short, interrupted basepairing interactions with target mRNAs. How do these small RNAs find and anneal to their targets? In Escherichia coli, at least part of the answer lies in their association with and dependence upon the RNA chaperone, Hfq. The abundant Hfq protein was identified originally as a host factor for RNA phage Qβ replication (Franze de Fernandez et al., 1968), but later hfq mutants were found to exhibit multiple phenotypes (Brown and Elliott, 1996; Muffler et al., 1996). These defects are, at least in part, a reflection of the fact that Hfq is required for the function of several small RNAs including DsrA, RprA, Spot42, OxyS and RyhB (Zhang et al., 1998; Sledjeski et al., 2001; Massé and Gottesman, 2002; Møller et al., 2002).
  • Department of Energy Office of Health and Environmental Research SEQUENCING the HUMAN GENOME Summary Report of the Santa Fe Workshop March 3-4, 1986
    Department of Energy Office of Health and Environmental Research SEQUENCING THE HUMAN GENOME Summary Report of the Santa Fe Workshop March 3-4, 1986 Los Alamos National Laboratory, Los Alamos, New Mexico 87545 Los Alamos National Laboratory is operated by the University of California for the United States Department of Energy under contract W-7405-ENG-36. DEPARTMENT OF ENERGY OFFICE OF HEALTH AND ENVIRONMENTAL RESEARCH SEQUENCING THE HUMAN GENOME SUMMARY REPORT ON THE SANTA FE WORKSHOP (MARCH 3-4, 1986) Executive Summary. The following is a summary of the Santa Fe Workshop held on March 3 and 4, 1986. The workshop was sponsored by the Office of Health and Environmental Research (OHER) and Los Alamos National Laboratory (LANL) and dedicated to examining the feasibility, advisability, and approaches to sequencing the human genome. The workshop considered four principal topics: I. Technologies to be employed. II. Expected benefits. III. Architecture of the enterprise. IV. Participants and funding. I. Technology The participants of the workshop foresaw extraordinary and continuing progress in the efficiency and accuracy of mapping, ordering, and sequencing technologies. They suggested that a coordinated analysis of the human genome begin with the task of ordering overlapping recombinant DNA fragments obtained from purified human chromosomes that would provide an infrastructure for sequencing activity. At the same time, they support in-depth evaluation of current and developing strategies for sequencing including possible applications of automation and robotics that would minimize the time and cost of sequencing. II. Benefits The socio-political and health benefits, and the benefit:cost ratio were seen as highly favorable not only for human health, but in addition for the development of new diagnostic, preventative and therapeutic tools, jobs, and industries.
  • Ontology-Based Methods for Analyzing Life Science Data
    Habilitation à Diriger des Recherches presented by Olivier Dameron
    Ontology-based methods for analyzing life science data
    Publicly defended on 11 January 2016 before a jury composed of:
    Anita Burgun, Professor, Université René Descartes Paris (Examiner)
    Marie-Dominique Devignes, CNRS Research Scientist, LORIA Nancy (Examiner)
    Michel Dumontier, Associate Professor, Stanford University, USA (Reviewer)
    Christine Froidevaux, Professor, Université Paris Sud (Reviewer)
    Fabien Gandon, Research Director, Inria Sophia-Antipolis (Reviewer)
    Anne Siegel, CNRS Research Director, IRISA Rennes (Examiner)
    Alexandre Termier, Professor, Université de Rennes 1 (Examiner)
    Contents
    1 Introduction
    1.1 Context
    1.2 Challenges
    1.3 Summary of the contributions
    1.4 Organization of the manuscript
    2 Reasoning based on hierarchies
    2.1 Principle
    2.1.1 RDF for describing data
    2.1.2 RDFS for describing types
    2.1.3 RDFS entailments
    2.1.4 Typical uses of RDFS entailments in life science
    2.1.5 Synthesis
    2.2 Case study: integrating diseases and pathways
    2.2.1 Context
    2.2.2 Objective
    2.2.3 Linking pathways and diseases using GO, KO and SNOMED-CT
    2.2.4 Querying associated diseases and pathways
    2.3 Methodology: Web services composition
    2.3.1 Context
    2.3.2 Objective
    2.3.3 Semantic compatibility of services parameters
    2.3.4 Algorithm for pairing services parameters
    2.4 Application: ontology-based query expansion with GO2PUB
    2.4.1 Context
    2.4.2 Objective
  • Mapping Our Genes—Genome Projects: How Big? How Fast?
    April 1988 NTIS order #PB88-212402 Recommended Citation: U.S. Congress, Office of Technology Assessment, Mapping Our Genes-The Genome Projects: How Big, How Fast? OTA-BA-373 (Washington, DC: U.S. Government Printing Office, April 1988). Library of Congress Catalog Card Number 87-619898 For sale by the Superintendent of Documents U.S. Government Printing Office, Washington, DC 20402-9325 (order form can be found in the back of this report)
    Foreword For the past 2 years, scientific and technical journals in biology and medicine have extensively covered a debate about whether and how to determine the function and order of human genes on human chromosomes and when to determine the sequence of molecular building blocks that comprise DNA in those chromosomes. In 1987, these issues rose to become part of the public agenda. The debate involves science, technology, and politics. Congress is responsible for “writing the rules” of what various Federal agencies do and for funding their work. This report surveys the points made so far in the debate, focusing on those that most directly influence the policy options facing the U.S. Congress. The House Committee on Energy and Commerce requested that OTA undertake the project. The House Committee on Science, Space, and Technology, the Senate Committee on Labor and Human Resources, and the Senate Committee on Energy and Natural Resources also asked OTA to address specific points of concern to them. Congressional interest focused on several issues:
    ● how to assess the rationales for conducting human genome projects,
    ● how to fund human genome projects (at what level and through which mechanisms),
    ● how to coordinate the scientific and technical programs of the several Federal agencies and private interests already supporting various genome projects, and
    ● how to strike a balance regarding the impact of genome projects on international scientific cooperation and international economic competition in biotechnology.
  • Outer Membrane Protein Genes and Their Small Non-Coding RNA Regulator Genes in Photorhabdus Luminescens Dimitris Papamichail1 and Nicholas Delihas*2
    Biology Direct BioMed Central Research Open Access Outer membrane protein genes and their small non-coding RNA regulator genes in Photorhabdus luminescens Dimitris Papamichail1 and Nicholas Delihas*2 Address: 1Department of Computer Sciences, SUNY, Stony Brook, NY 11794-4400, USA and 2Department of Molecular Genetics and Microbiology, School of Medicine, SUNY, Stony Brook, NY 11794-5222, USA Email: Dimitris Papamichail - [email protected]; Nicholas Delihas* - [email protected] * Corresponding author Published: 22 May 2006 Received: 08 May 2006 Accepted: 22 May 2006 Biology Direct 2006, 1:12 doi:10.1186/1745-6150-1-12 This article is available from: http://www.biology-direct.com/content/1/1/12 © 2006 Papamichail and Delihas; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Abstract Introduction: Three major outer membrane protein genes of Escherichia coli, ompF, ompC, and ompA respond to stress factors. Transcripts from these genes are regulated by the small non-coding RNAs micF, micC, and micA, respectively. Here we examine Photorhabdus luminescens, an organism that has a different habitat from E. coli for outer membrane protein genes and their regulatory RNA genes. Results: By bioinformatics analysis of conserved genetic loci, mRNA 5'UTR sequences, RNA secondary structure motifs, upstream promoter regions and protein sequence homologies, an ompF-like porin gene in P. luminescens as well as a duplication of this gene have been predicted.
  • Characterizing the Dna-Binding Site Specificities of Cis2his2 Zinc Fingers
    MQP-ID-DH-UM1
    CHARACTERIZING THE DNA-BINDING SITE SPECIFICITIES OF CIS2HIS2 ZINC FINGERS
    A Major Qualifying Project Report Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUTE in partial fulfillment of the requirements for the Degrees of Bachelor of Science in Biochemistry and Biology and Biotechnology by Heather Bell April 26, 2012
    APPROVED:
    Scot Wolfe, PhD, Gene Function and Exp., UMass Medical School, Major Advisor
    Destin Heilman, PhD, Biochemistry, WPI Project Advisor
    David Adams, PhD, Biology and Biotech, WPI Project Advisor
    ABSTRACT The ability to modularly assemble Zinc Finger Proteins (ZFPs) as well as the wide variety of DNA sequences they can recognize, make ZFPs an ideal framework to design novel DNA-binding proteins. However, due to the complexity of the interactions between residues in the ZF recognition helix and the DNA-binding site there is currently no comprehensive recognition code that would allow for the accurate prediction of the DNA ZFP binding motifs or the design of novel ZFPs for a desired target site. Through the analysis of the DNA-binding site specificities of 98 ZFP clones, determined through a bacterial one-hybrid selection system, a predictive model was created that can accurately predict the binding site motifs of novel ZFPs.
    TABLE OF CONTENTS
    Signature Page
    Abstract
    Table of Contents
    Acknowledgements
    Background
    Project Purpose
    Methods
    Results
    Discussion
    Bibliography
    Supplemental
    ACKNOWLEDGEMENTS I would like to thank Dr.
  • The International Society for Computational Biology 10Th Anniversary Lawrence Hunter, Russ B
    Message from ISCB The International Society for Computational Biology 10th Anniversary Lawrence Hunter, Russ B. Altman, Philip E. Bourne* Introduction training programs in bioinformatics or significant part of the original computational biology. By 1996, motivation for founding the Society PLoS Computational Biology is the however, high throughput molecular was to provide a stable financial home official journal of the International biology and the attendant need for for the ISMB conference. The first few Society for Computational Biology informatics had clearly arrived. In conferences were sufficiently successful (ISCB), a partnership that was formed addition to the founding of ISCB, 1996 to create a financial nest egg that was during the Journal’s conception in saw the sequencing of the first genome used each year to start the process of 2005. With ISCB being the only of a free-living organism (yeast) and planning and executing the next year’s international body representing Affymetrix’s release of its first meeting. Initially, relatively modest computational biologists, it made commercial DNA chip. checks were cut and sent from one perfect sense for PLoS Computational The hot topics in bioinformatics organizer to another informally. As the Biology to be closely affiliated. The circa 1996, at least as reflected by the size of these checks increased, the Society had to take more of a chance conference on Intelligent Systems for organizers (and their home than similar societies, choosing to step Molecular Biology (ISMB) at institutions) became increasingly away from an existing financially Washington University in St. Louis, uncomfortable exchanging them beneficial subscription journal to align included issues which have largely been informally, and decided that they with an open access publication as a solved (such as gene finding or needed an organization—thus, ISCB.
  • Biocuration 2016 - Posters
    Biocuration 2016 - Posters Source: http://www.sib.swiss/events/biocuration2016/posters 1 RAM: A standards-based database for extracting and analyzing disease-specified concepts from the multitude of biomedical resources Jinmeng Jia and Tieliu Shi Each year, millions of people around world suffer from the consequence of the misdiagnosis and ineffective treatment of various disease, especially those intractable diseases and rare diseases. Integration of various data related to human diseases help us not only for identifying drug targets, connecting genetic variations of phenotypes and understanding molecular pathways relevant to novel treatment, but also for coupling clinical care and biomedical researches. To this end, we built the Rare disease Annotation & Medicine (RAM) standards-based database which can provide reference to map and extract disease-specified information from multitude of biomedical resources such as free text articles in MEDLINE and Electronic Medical Records (EMRs). RAM integrates disease-specified concepts from ICD-9, ICD-10, SNOMED-CT and MeSH (http://www.nlm.nih.gov/mesh/MBrowser.html) extracted from the Unified Medical Language System (UMLS) based on the UMLS Concept Unique Identifiers for each Disease Term. We also integrated phenotypes from OMIM for each disease term, which link underlying mechanisms and clinical observation. Moreover, we used disease-manifestation (D-M) pairs from existing biomedical ontologies as prior knowledge to automatically recognize D-M-specific syntactic patterns from full text articles in MEDLINE. Considering that most of the record-based disease information in public databases are textual format, we extracted disease terms and their related biomedical descriptive phrases from Online Mendelian Inheritance in Man (OMIM), National Organization for Rare Disorders (NORD) and Orphanet using UMLS Thesaurus.
  • Scalable Approaches for Auditing the Completeness of Biomedical Ontologies
    University of Kentucky UKnowledge Theses and Dissertations--Computer Science Computer Science 2021 Scalable Approaches for Auditing the Completeness of Biomedical Ontologies Fengbo Zheng University of Kentucky, [email protected] Author ORCID Identifier: https://orcid.org/0000-0001-5902-0186 Digital Object Identifier: https://doi.org/10.13023/etd.2021.128 Right click to open a feedback form in a new tab to let us know how this document benefits you. Recommended Citation Zheng, Fengbo, "Scalable Approaches for Auditing the Completeness of Biomedical Ontologies" (2021). Theses and Dissertations--Computer Science. 105. https://uknowledge.uky.edu/cs_etds/105 This Doctoral Dissertation is brought to you for free and open access by the Computer Science at UKnowledge. It has been accepted for inclusion in Theses and Dissertations--Computer Science by an authorized administrator of UKnowledge. For more information, please contact [email protected]. STUDENT AGREEMENT: I represent that my thesis or dissertation and abstract are my original work. Proper attribution has been given to all outside sources. I understand that I am solely responsible for obtaining any needed copyright permissions. I have obtained needed written permission statement(s) from the owner(s) of each third-party copyrighted matter to be included in my work, allowing electronic distribution (if such use is not permitted by the fair use doctrine) which will be submitted to UKnowledge as Additional File. I hereby grant to The University of Kentucky and its agents the irrevocable, non-exclusive, and royalty-free license to archive and make accessible my work in whole or in part in all forms of media, now or hereafter known.