Methods and Models for the Analysis of Biological Signifïcance Based on High-Throughput Data

Total Page:16

File Type:pdf, Size:1020Kb

Methods and Models for the Analysis of Biological Signifïcance Based on High-Throughput Data Methods and Models for the Analysis of Biological Signifïcance Based on High-Throughput Data Jose Luis Mosquera Mayo Aquesta tesi doctoral està subjecta a la llicència Reconeixement- NoComercial – CompartirIgual 3.0. Espanya de Creative Commons. Esta tesis doctoral está sujeta a la licencia Reconocimiento - NoComercial – CompartirIgual 3.0. España de Creative Commons. This doctoral thesis is licensed under the Creative Commons Attribution-NonCommercial- ShareAlike 3.0. Spain License. Methods and Models for the Analysis of Biological Significance Based on High Throughput Data Jose Luis Mosquera Mayo 2 Methods and Models for the Analysis of Biological Significance Based on High Throughput Data M`etodes i Models per a l’An`aliside la Significaci´oBiol`ogicaBasada en Dades d'Alt Rendiment Mem`oriapresentada per en Jose Luis Mosquera Mayo per optar al grau de doctor per la Universitat de Barcelona SIGNATURES El Doctorand: Jose Luis Mosquera Mayo El Director de la Tesi: Dr. Alexandre Sanchez´ Pla El Tutor de la Tesi: Dr. Josep Mar´ıa Oller Sala Programa de doctorat en estad´ıstica Departament d'Estad´ıstica,Facultat de Biologia Universitat de Barcelona 4 Acknowledgements \A teacher affects eternity, he can never tell where his influence stops." Henry Brooks Adams \Mathematicians do not study objects, but relations between objects." Jules Henri Poincar´e It is truly an honor for me to address these words of gratitude to everyone who has contributed, to a greater or lesser extent by supporting me during this exciting program of studies, to help make my lifelong dream come true: to complete my doctoral dissertation. First and foremost, I would like (and feel compelled) to mention a special debt of gratitude to Prof. Alex S´anchez Pla. I am enormously grateful to him, as the director of this dissertation, for having taught, guided and supported me, correcting my scientific work with an interest and dedication that have greatly exceeded all the expectations that I, as a UB student, could have had. Likewise, I would also like to thank him once again for his work as the head of the Statistics and Bioinformatics Unit at the VHIR, for directing, supervising and correcting all the projects in which I have participated over the last six years as a senior technician in the department. It is also my deep personal desire to thank all the members of the Statistical Research and Bioinformatics Group, as well as the professors in the UB Statistics Department, for everything they have taught me, both profes- sionally and as a human being. I learned something from them each and every day. For these reasons, I would like to specifically express my thanks to Prof. M. Carme Ru´ızde Villa, Prof. Francesc Carmona, Prof. Esteban Vegas, Prof. Ferran Reverter and Prof. Antoni Mi´narro. I would also like to recognize my office mate and sparring partner, both at the UB and the VHIR, Mr. Israel Ortega. I would not wish to neglect a big thank you to the department secretaries, Ms. Roser Maldonado 5 6 and Ms. Elisabet Ballester, for their hard work and dedication; without them, the bureaucracy would have been a disaster. I would like to extend these thanks to all the members of the Statistics and Bioinformatics Unit and the High Technology Unit at the VHIR. In particular, I would like to thank Mr. Alejandro Artacho, Dr. Ricardo Gonzalo and Dr. Francisca Gallego for having shared their knowledge and friendship with me. I must not forget a few words of appreciation for the researchers at the VHIR, with whom I have col- laborated closely, and who without knowing it, have at times indirectly contributed to this dissertation. I would therefore like to especially mention Dr. Mar´ıa Vicario, Dr. Cristina Mart´ınez, Dr. Javier Santos, Dr. Rosanna Paciucci, Dr. Andreas Doll and Dr. Jaume Revent´os. Finally, I would like to express my eternal gratitude to Antonio, Carmen, Chus, Mar´ıa, Magdalena, my parents, my brother and my grandparents, who are no longer with us; without them, I would never have been able to complete this project. They have always been with me, through thick and thin. Furthermore, I would like to sincerely thank Inma, Ricardo, Jordi, Ferran, Mireia and Paula for having put up with me, without asking for anything in return. Thank you very much to all those whom I have not mentioned, who have contributed in some way to this personal achievement; please forgive me for this omission. Thank you. Contents Acknowledgement5 Contents 13 List of Tables 15 List of Figures 17 Glossary 21 I Introduction and Objectives 23 1 Introduction: Scope of the Thesis 25 1.1 Meaning of Biological Significance Concept........... 27 1.2 The Gene Ontology........................ 28 1.2.1 The Ontology Domains of the GO............ 29 1.2.1.1 Cellular Component.............. 29 1.2.1.2 Biological Process............... 30 1.2.1.3 Molecular Function............... 31 1.2.2 Scope of the GO..................... 32 1.2.3 The Structure of the GO................. 33 1.3 From Biological to Statistical Significance........... 35 1.3.1 Philosophy of Semantic Similarity Measures...... 35 1.3.1.1 Semantic Similarity Methods......... 37 1.3.1.2 The Edge-Based Approach........... 38 1.3.1.3 The Node-Based Approach.......... 39 1.3.1.4 The Hybrid-Based Approach......... 39 1.3.2 Philosophy of Enrichment Analysis........... 39 1.3.2.1 Enrichment Analysis Tools........... 40 1.3.2.2 Singular Enrichment Analysis......... 41 1.3.2.3 Modular Enrichment Analysis......... 41 1.3.2.4 Gene Set Enrichment Analysis........ 42 7 CONTENTS 8 2 Hypothesis, Objectives and Outline of the Thesis 45 2.1 Hypotheses............................ 45 2.2 Objectives............................. 46 2.2.1 Main Objectives...................... 46 2.2.2 Specific Objectives.................... 46 2.2.2.1 Specific Objectives Associated with the Study of Semantic Similarity Measures.... 47 2.2.2.2 Specific Objectives Associated with the Clas- sification and Study of GO Tools for Enrich- ment Analysis.................. 47 2.3 Outline............................... 47 II Study of Two Semantic Similarity Measures for Exploring GO Categories 49 3 Material and Methods 53 3.1 Main Concepts of Graph Theory................. 53 3.1.1 Basic Graph Concepts.................. 54 3.1.2 Subgraphs......................... 55 3.1.3 Directed Graphs..................... 55 3.1.4 Paths and Connection.................. 56 3.1.5 DAG and Rooted DAG.................. 57 3.1.6 Matrices and Graphs................... 57 3.1.7 Order and Degrees.................... 59 3.2 The GO graph.......................... 61 3.2.1 Basic Concepts of the GO graph............. 61 3.2.2 The Study of the GO based on the Graph Theory... 61 3.3 Carey's Framework........................ 63 3.3.1 Refinement of Relationships............... 64 3.3.1.1 Refinement Matrix............... 65 3.3.1.2 Accessibility Matrix.............. 66 3.3.2 Mapping Genes to GO.................. 67 3.3.2.1 Mapping Matrix................ 67 3.3.2.2 Object-Ontology Complexes.......... 68 3.3.2.3 Coverage Matrix................ 68 3.4 Main concepts of POSET theory................ 69 3.4.1 Partially Ordered Sets Definition............ 69 3.4.2 Chains and Anti-chains.................. 71 3.4.3 Ideal, Filter and Hourglass................ 71 3.4.4 Upper and Lower Bounds................ 72 9 CONTENTS 3.4.5 Interval Orders and Length............... 73 3.4.6 POSET Ontology..................... 74 3.5 Similarity Measures........................ 75 3.5.1 Formal Definitions of Similarity Measure and Metric Distance.......................... 75 3.5.2 Semantic Similarity Measure............... 76 3.5.3 Organization and Classification of Semantic Similarities 77 3.6 GO Terms and Semantic Similarity Measures.......... 78 3.6.1 Lord's Measure...................... 79 3.6.1.1 The Information Content Concept...... 79 3.6.1.2 Resnik's Similarity Measure.......... 80 3.6.1.3 Lord's similarity measure........... 81 3.6.2 Joslyn's Measure..................... 82 3.6.2.1 Pseudo-distances................ 83 3.7 sims: An R Package for Computing Semantic Similarities of an Ontology............................ 84 4 Results 85 4.1 Minor Contributions....................... 85 4.1.1 Graph Theory....................... 85 4.1.2 Semantic Similarity Measures.............. 87 4.1.2.1 The Information Content Concept...... 88 4.2 Main Contributions........................ 89 4.2.1 GO Terms and Semantic Similarity Measures..... 89 4.2.2 Lord's Measure...................... 89 4.2.2.1 The Information Content Concept...... 90 4.2.2.2 Resnik's Measure in Terms of Metric Distance 92 4.2.3 Joslyn's Measure..................... 94 4.2.4 sims: An R Package for Computing Semantic Similar- ities of an Ontology.................... 96 4.2.4.1 Availability of sims .............. 96 4.2.4.2 Requirements.................. 97 4.2.4.3 Main Possibilities of the Package and the List of Functions................... 97 4.2.4.4 Semantic Similarity Measures Implemented in sims ..................... 97 4.2.5 Semantic Similarity Profiles between GO Terms.... 100 5 Discussion 103 CONTENTS 10 III Classification of GO Tools for Enrichment Anal- ysis 109 6 Material and Methods 113 6.1 Selection of GO Tools....................... 113 6.2 Definition of Standard Functionalities Set and Classification of GO Tools............................ 115 6.3 SerbGO: Searching for the Best GO Tool............ 115 6.3.1 Implementation of SerbGO ................ 116 6.4 Evolution and Clustering of GO Tools............. 116 6.4.1 Data............................ 117 6.4.1.1 Raw Data.................... 117 6.4.1.2 Homogenization of Raw Data......... 117 6.4.1.3 Data Matrices.................. 118 6.4.2 Statistical Methods.................... 118 6.4.2.1 Descriptive Statistics Methods........ 118 6.4.2.2 Inferential Analysis Methods......... 119 6.4.2.3 Multivariate Analysis Methods........ 120 6.5 DeGOT: An Ontology for Developing GO Tools......... 124 6.5.1 Basic Concepts of an Ontology............
Recommended publications
  • A Method to Infer Changed Activity of Metabolic Function from Transcript Profiles
    ModeScore: A Method to Infer Changed Activity of Metabolic Function from Transcript Profiles Andreas Hoppe and Hermann-Georg Holzhütter Charité University Medicine Berlin, Institute for Biochemistry, Computational Systems Biochemistry Group [email protected] Abstract Genome-wide transcript profiles are often the only available quantitative data for a particular perturbation of a cellular system and their interpretation with respect to the metabolism is a major challenge in systems biology, especially beyond on/off distinction of genes. We present a method that predicts activity changes of metabolic functions by scoring reference flux distributions based on relative transcript profiles, providing a ranked list of most regulated functions. Then, for each metabolic function, the involved genes are ranked upon how much they represent a specific regulation pattern. Compared with the naïve pathway-based approach, the reference modes can be chosen freely, and they represent full metabolic functions, thus, directly provide testable hypotheses for the metabolic study. In conclusion, the novel method provides promising functions for subsequent experimental elucidation together with outstanding associated genes, solely based on transcript profiles. 1998 ACM Subject Classification J.3 Life and Medical Sciences Keywords and phrases Metabolic network, expression profile, metabolic function Digital Object Identifier 10.4230/OASIcs.GCB.2012.1 1 Background The comprehensive study of the cell’s metabolism would include measuring metabolite concentrations, reaction fluxes, and enzyme activities on a large scale. Measuring fluxes is the most difficult part in this, for a recent assessment of techniques, see [31]. Although mass spectrometry allows to assess metabolite concentrations in a more comprehensive way, the larger the set of potential metabolites, the more difficult [8].
    [Show full text]
  • The Kyoto Encyclopedia of Genes and Genomes (KEGG)
    Kyoto Encyclopedia of Genes and Genome Minoru Kanehisa Institute for Chemical Research, Kyoto University HFSPO Workshop, Strasbourg, November 18, 2016 The KEGG Databases Category Database Content PATHWAY KEGG pathway maps Systems information BRITE BRITE functional hierarchies MODULE KEGG modules KO (KEGG ORTHOLOGY) KO groups for functional orthologs Genomic information GENOME KEGG organisms, viruses and addendum GENES / SSDB Genes and proteins / sequence similarity COMPOUND Chemical compounds GLYCAN Glycans Chemical information REACTION / RCLASS Reactions / reaction classes ENZYME Enzyme nomenclature DISEASE Human diseases DRUG / DGROUP Drugs / drug groups Health information ENVIRON Health-related substances (KEGG MEDICUS) JAPIC Japanese drug labels DailyMed FDA drug labels 12 manually curated original DBs 3 DBs taken from outside sources and given original annotations (GENOME, GENES, ENZYME) 1 computationally generated DB (SSDB) 2 outside DBs (JAPIC, DailyMed) KEGG is widely used for functional interpretation and practical application of genome sequences and other high-throughput data KO PATHWAY GENOME BRITE DISEASE GENES MODULE DRUG Genome Molecular High-level Practical Metagenome functions functions applications Transcriptome etc. Metabolome Glycome etc. COMPOUND GLYCAN REACTION Funding Annual budget Period Funding source (USD) 1995-2010 Supported by 10+ grants from Ministry of Education, >2 M Japan Society for Promotion of Science (JSPS) and Japan Science and Technology Agency (JST) 2011-2013 Supported by National Bioscience Database Center 0.8 M (NBDC) of JST 2014-2016 Supported by NBDC 0.5 M 2017- ? 1995 KEGG website made freely available 1997 KEGG FTP site made freely available 2011 Plea to support KEGG KEGG FTP academic subscription introduced 1998 First commercial licensing Contingency Plan 1999 Pathway Solutions Inc.
    [Show full text]
  • Applied Category Theory for Genomics – an Initiative
    Applied Category Theory for Genomics { An Initiative Yanying Wu1,2 1Centre for Neural Circuits and Behaviour, University of Oxford, UK 2Department of Physiology, Anatomy and Genetics, University of Oxford, UK 06 Sept, 2020 Abstract The ultimate secret of all lives on earth is hidden in their genomes { a totality of DNA sequences. We currently know the whole genome sequence of many organisms, while our understanding of the genome architecture on a systematic level remains rudimentary. Applied category theory opens a promising way to integrate the humongous amount of heterogeneous informations in genomics, to advance our knowledge regarding genome organization, and to provide us with a deep and holistic view of our own genomes. In this work we explain why applied category theory carries such a hope, and we move on to show how it could actually do so, albeit in baby steps. The manuscript intends to be readable to both mathematicians and biologists, therefore no prior knowledge is required from either side. arXiv:2009.02822v1 [q-bio.GN] 6 Sep 2020 1 Introduction DNA, the genetic material of all living beings on this planet, holds the secret of life. The complete set of DNA sequences in an organism constitutes its genome { the blueprint and instruction manual of that organism, be it a human or fly [1]. Therefore, genomics, which studies the contents and meaning of genomes, has been standing in the central stage of scientific research since its birth. The twentieth century witnessed three milestones of genomics research [1]. It began with the discovery of Mendel's laws of inheritance [2], sparked a climax in the middle with the reveal of DNA double helix structure [3], and ended with the accomplishment of a first draft of complete human genome sequences [4].
    [Show full text]
  • Gene Prediction: the End of the Beginning Comment Colin Semple
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by PubMed Central http://genomebiology.com/2000/1/2/reports/4012.1 Meeting report Gene prediction: the end of the beginning comment Colin Semple Address: Department of Medical Sciences, Molecular Medicine Centre, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK. E-mail: [email protected] Published: 28 July 2000 reviews Genome Biology 2000, 1(2):reports4012.1–4012.3 The electronic version of this article is the complete one and can be found online at http://genomebiology.com/2000/1/2/reports/4012 © GenomeBiology.com (Print ISSN 1465-6906; Online ISSN 1465-6914) Reducing genomes to genes reports A report from the conference entitled Genome Based Gene All ab initio gene prediction programs have to balance sensi- Structure Determination, Hinxton, UK, 1-2 June, 2000, tivity against accuracy. It is often only possible to detect all organised by the European Bioinformatics Institute (EBI). the real exons present in a sequence at the expense of detect- ing many false ones. Alternatively, one may accept only pre- dictions scoring above a more stringent threshold but lose The draft sequence of the human genome will become avail- those real exons that have lower scores. The trick is to try and able later this year. For some time now it has been accepted increase accuracy without any large loss of sensitivity; this deposited research that this will mark a beginning rather than an end. A vast can be done by comparing the prediction with additional, amount of work will remain to be done, from detailing independent evidence.
    [Show full text]
  • 3 13437143.Pdf
    Title Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones Imanishi, Tadashi; Itoh, Takeshi; Suzuki, Yutaka; O'Donovan, Claire; Fukuchi, Satoshi; Koyanagi, Kanako O.; Barrero, Roberto A.; Tamura, Takuro; Yamaguchi-Kabata, Yumi; Tanino, Motohiko; Yura, Kei; Miyazaki, Satoru; Ikeo, Kazuho; Homma, Keiichi; Kasprzyk, Arek; Nishikawa, Tetsuo; Hirakawa, Mika; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Ashurst, Jennifer; Jia, Libin; Nakao, Mitsuteru; Thomas, Michael A.; Mulder, Nicola; Karavidopoulou, Youla; Jin, Lihua; Kim, Sangsoo; Yasuda, Tomohiro; Lenhard, Boris; Eveno, Eric; Suzuki, Yoshiyuki; Yamasaki, Chisato; Takeda, Jun-ichi; Gough, Craig; Hilton, Phillip; Fujii, Yasuyuki; Sakai, Hiroaki; Tanaka, Susumu; Amid, Clara; Bellgard, Matthew; Bonaldo, Maria de Fatima; Bono, Hidemasa; Bromberg, Susan K.; Brookes, Anthony J.; Bruford, Elspeth; Carninci, Piero; Chelala, Claude; Couillault, Christine; Souza, Sandro J. de; Debily, Marie-Anne; Devignes, Marie-Dominique; Dubchak, Inna; Endo, Toshinori; Estreicher, Anne; Eyras, Eduardo; Fukami-Kobayashi, Kaoru; R. Gopinath, Gopal; Graudens, Esther; Hahn, Yoonsoo; Han, Michael; Han, Ze-Guang; Hanada, Kousuke; Hanaoka, Hideki; Harada, Erimi; Hashimoto, Katsuyuki; Hinz, Ursula; Hirai, Momoki; Hishiki, Teruyoshi; Hopkinson, Ian; Imbeaud, Sandrine; Inoko, Hidetoshi; Kanapin, Alexander; Kaneko, Yayoi; Kasukawa, Takeya; Kelso, Janet; Kersey, Author(s) Paul; Kikuno, Reiko; Kimura, Kouichi; Korn, Bernhard; Kuryshev, Vladimir; Makalowska, Izabela; Makino, Takashi; Mano, Shuhei;
    [Show full text]
  • Functional Effects Detailed Research Plan
    GeCIP Detailed Research Plan Form Background The Genomics England Clinical Interpretation Partnership (GeCIP) brings together researchers, clinicians and trainees from both academia and the NHS to analyse, refine and make new discoveries from the data from the 100,000 Genomes Project. The aims of the partnerships are: 1. To optimise: • clinical data and sample collection • clinical reporting • data validation and interpretation. 2. To improve understanding of the implications of genomic findings and improve the accuracy and reliability of information fed back to patients. To add to knowledge of the genetic basis of disease. 3. To provide a sustainable thriving training environment. The initial wave of GeCIP domains was announced in June 2015 following a first round of applications in January 2015. On the 18th June 2015 we invited the inaugurated GeCIP domains to develop more detailed research plans working closely with Genomics England. These will be used to ensure that the plans are complimentary and add real value across the GeCIP portfolio and address the aims and objectives of the 100,000 Genomes Project. They will be shared with the MRC, Wellcome Trust, NIHR and Cancer Research UK as existing members of the GeCIP Board to give advance warning and manage funding requests to maximise the funds available to each domain. However, formal applications will then be required to be submitted to individual funders. They will allow Genomics England to plan shared core analyses and the required research and computing infrastructure to support the proposed research. They will also form the basis of assessment by the Project’s Access Review Committee, to permit access to data.
    [Show full text]
  • Depleting PTOV1 Sensitizes Non-Small Cell Lung Cancer Cells to Chemotherapy Through Attenuating Cancer Stem Cell Traits
    Wu et al. Journal of Experimental & Clinical Cancer Research (2019) 38:341 https://doi.org/10.1186/s13046-019-1349-y RESEARCH Open Access Depleting PTOV1 sensitizes non-small cell lung cancer cells to chemotherapy through attenuating cancer stem cell traits Zhiqiang Wu1*† , Zhuang Liu1†, Xiangli Jiang2†, Zeyun Mi3, Maobin Meng1, Hui Wang1, Jinlin Zhao1, Boyu Zheng1 and Zhiyong Yuan1* Abstract Background: Prostate tumor over expressed gene 1 (PTOV1) has been reported as an oncogene in several human cancers. However, the clinical significance and biological role of PTOV1 remain elusive in non-small cell lung cancer (NSCLC). Methods: The Cancer Genome Atlas (TCGA) data and NCBI/GEO data mining, western blotting analysis and immunohistochemistry were employed to characterize the expression of PTOV1 in NSCLC cell lines and tissues. The clinical significance of PTOV1 in NSCLC was studied by immunohistochemistry statistical analysis and Kaplan–Meier Plotter database mining. A series of in-vivo and in-vitro assays, including colony formation, CCK-8 assays, flow cytometry, wound healing, trans-well assay, tumor sphere formation, quantitative PCR, gene set enrichment analysis (GSEA), immunostaining and xenografts tumor model, were performed to demonstrate the effects of PTOV1 on chemosensitivity of NSCLC cells and the underlying mechanisms. Results: PTOV1 is overexpressed in NSCLC cell lines and tissues. High PTOV1 level indicates a short survival time and poor response to chemotherapy of NSCLC patients. Depleting PTOV1 increased sensitivity to chemotherapy drugs cisplatin and docetaxel by increasing cell apoptosis, inhibiting cell migration and invasion. Our study verified that depleting PTOV1 attenuated cancer stem cell traits through impairing DKK1/β-catenin signaling to enhance chemosensitivity of NSCLC cells.
    [Show full text]
  • Meeting Review: Bioinformatics and Medicine – from Molecules To
    Comparative and Functional Genomics Comp Funct Genom 2002; 3: 270–276. Published online 9 May 2002 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cfg.178 Feature Meeting Review: Bioinformatics And Medicine – From molecules to humans, virtual and real Hinxton Hall Conference Centre, Genome Campus, Hinxton, Cambridge, UK – April 5th–7th Roslin Russell* MRC UK HGMP Resource Centre, Genome Campus, Hinxton, Cambridge CB10 1SB, UK *Correspondence to: Abstract MRC UK HGMP Resource Centre, Genome Campus, The Industrialization Workshop Series aims to promote and discuss integration, automa- Hinxton, Cambridge CB10 1SB, tion, simulation, quality, availability and standards in the high-throughput life sciences. UK. The main issues addressed being the transformation of bioinformatics and bioinformatics- based drug design into a robust discipline in industry, the government, research institutes and academia. The latest workshop emphasized the influence of the post-genomic era on medicine and healthcare with reference to advanced biological systems modeling and simulation, protein structure research, protein-protein interactions, metabolism and physiology. Speakers included Michael Ashburner, Kenneth Buetow, Francois Cambien, Cyrus Chothia, Jean Garnier, Francois Iris, Matthias Mann, Maya Natarajan, Peter Murray-Rust, Richard Mushlin, Barry Robson, David Rubin, Kosta Steliou, John Todd, Janet Thornton, Pim van der Eijk, Michael Vieth and Richard Ward. Copyright # 2002 John Wiley & Sons, Ltd. Received: 22 April 2002 Keywords: bioinformatics;
    [Show full text]
  • Algorithms for Computational Biology 8Th International Conference, Alcob 2021 Missoula, MT, USA, June 7–11, 2021 Proceedings
    Lecture Notes in Bioinformatics 12715 Subseries of Lecture Notes in Computer Science Series Editors Sorin Istrail Brown University, Providence, RI, USA Pavel Pevzner University of California, San Diego, CA, USA Michael Waterman University of Southern California, Los Angeles, CA, USA Editorial Board Members Søren Brunak Technical University of Denmark, Kongens Lyngby, Denmark Mikhail S. Gelfand IITP, Research and Training Center on Bioinformatics, Moscow, Russia Thomas Lengauer Max Planck Institute for Informatics, Saarbrücken, Germany Satoru Miyano University of Tokyo, Tokyo, Japan Eugene Myers Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany Marie-France Sagot Université Lyon 1, Villeurbanne, France David Sankoff University of Ottawa, Ottawa, Canada Ron Shamir Tel Aviv University, Ramat Aviv, Tel Aviv, Israel Terry Speed Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia Martin Vingron Max Planck Institute for Molecular Genetics, Berlin, Germany W. Eric Wong University of Texas at Dallas, Richardson, TX, USA More information about this subseries at http://www.springer.com/series/5381 Carlos Martín-Vide • Miguel A. Vega-Rodríguez • Travis Wheeler (Eds.) Algorithms for Computational Biology 8th International Conference, AlCoB 2021 Missoula, MT, USA, June 7–11, 2021 Proceedings 123 Editors Carlos Martín-Vide Miguel A. Vega-Rodríguez Rovira i Virgili University University of Extremadura Tarragona, Spain Cáceres, Spain Travis Wheeler University of Montana Missoula, MT, USA ISSN 0302-9743 ISSN 1611-3349 (electronic) Lecture Notes in Bioinformatics ISBN 978-3-030-74431-1 ISBN 978-3-030-74432-8 (eBook) https://doi.org/10.1007/978-3-030-74432-8 LNCS Sublibrary: SL8 – Bioinformatics © Springer Nature Switzerland AG 2021 This work is subject to copyright.
    [Show full text]
  • Genome Informatics 4–8 September 2002, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
    Comparative and Functional Genomics Comp Funct Genom 2003; 4: 509–514. Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cfg.300 Feature Meeting Highlights: Genome Informatics 4–8 September 2002, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK Jo Wixon1* and Jennifer Ashurst2 1MRC UK HGMP-RC, Hinxton, Cambridge CB10 1SB, UK 2The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK *Correspondence to: Abstract Jo Wixon, MRC UK HGMP-RC, Hinxton, Cambridge CB10 We bring you the highlights of the second Joint Cold Spring Harbor Laboratory 1SB, UK. and Wellcome Trust ‘Genome Informatics’ Conference, organized by Ewan Birney, E-mail: [email protected] Suzanna Lewis and Lincoln Stein. There were sessions on in silico data discovery, comparative genomics, annotation pipelines, functional genomics and integrative biology. The conference included a keynote address by Sydney Brenner, who was awarded the 2002 Nobel Prize in Physiology or Medicine (jointly with John Sulston and H. Robert Horvitz) a month later. Copyright 2003 John Wiley & Sons, Ltd. In silico data discovery background set was genes which had a log ratio of ∼0 in liver. Their approach found 17 of 17 known In the first of two sessions on this topic, Naoya promoters with a specificity of 17/28. None of the Hata (Cold Spring Harbor Laboratory, USA) sites they identified was located downstream of a spoke about motif searching for tissue specific TSS and all showed an excess in the foreground promoters. The first step in the process is to sample compared to the background sample.
    [Show full text]
  • Genbank by Walter B
    GenBank by Walter B. Goad o understand the significance of the information stored in memory—DNA—to the stuff of activity—proteins. The idea that GenBank, you need to know a little about molecular somehow the bases in DNA determine the amino acids in proteins genetics. What that field deals with is self-replication—the had been around for some time. In fact, George Gamow suggested in T process unique to life—and mutation and 1954, after learning about the structure proposed for DNA, that a recombination—the processes responsible for evolution-at the triplet of bases corresponded to an amino acid. That suggestion was fundamental level of the genes in DNA. This approach of working shown to be true, and by 1965 most of the genetic code had been from the blueprint, so to speak, of a living system is very powerful, deciphered. Also worked out in the ’60s were many details of what and studies of many other aspects of life—the process of learning, for Crick called the central dogma of molecular genetics—the now example—are now utilizing molecular genetics. firmly established fact that DNA is not translated directly to proteins Molecular genetics began in the early ’40s and was at first but is first transcribed to messenger RNA. This molecule, a nucleic controversial because many of the people involved had been trained acid like DNA, then serves as the template for protein synthesis. in the physical sciences rather than the biological sciences, and yet These great advances prompted a very distinguished molecular they were answering questions that biologists had been asking for geneticist to predict, in 1969, that biology was just about to end since years.
    [Show full text]
  • Computational Biology and Bioinformatics
    Vol. 30 ISMB 2014, pages i1–i2 BIOINFORMATICS EDITORIAL doi:10.1093/bioinformatics/btu304 Editorial This special issue of Bioinformatics serves as the proceedings of The conference used a two-tier review system, a continuation the 22nd annual meeting of Intelligent Systems for Molecular and refinement of a process begun with ISMB 2013 in an effort Biology (ISMB), which took place in Boston, MA, July 11–15, to better ensure thorough and fair reviewing. Under the revised 2014 (http://www.iscb.org/ismbeccb2014). The official confer- process, each of the 191 submissions was first reviewed by at least ence of the International Society for Computational Biology three expert referees, with a subset receiving between four and (http://www.iscb.org/), ISMB, was accompanied by 12 Special eight reviews, as needed. These formal reviews were frequently Interest Group meetings of one or two days each, two satellite supplemented by online discussion among reviewers and Area meetings, a High School Teachers Workshop and two half-day Chairs to resolve points of dispute and reach a consensus on tutorials. Since its inception, ISMB has grown to be the largest each paper. Among the 191 submissions, 29 were conditionally international conference in computational biology and bioinfor- accepted for publication directly from the first round review Downloaded from matics. It is expected to be the premiere forum in the field for based on an assessment of the reviewers that the paper was presenting new research results, disseminating methods and tech- clearly above par for the conference. A subset of 16 papers niques and facilitating discussions among leading researchers, were viewed as potentially in the top tier but raised significant practitioners and students in the field.
    [Show full text]