Statistical Signals in Bioinformatics

Total Page:16

File Type:pdf, Size:1020Kb

Statistical Signals in Bioinformatics FROM THE ACADEMY: COLLOQUIUM PERSPECTIVE Statistical signals in bioinformatics Samuel Karlin† Department of Mathematics, Stanford University, Stanford, CA 94305-2125 Contributed by Samuel Karlin, July 7, 2005 The Arthur M. Sackler Colloquium of the National Academy of Sciences, ‘‘Frontiers in Bioinformatics: Unsolved Problems and Chal- lenges,’’ organized by David Eisenberg, Russ Altman, and myself, was held October 15–17, 2004, to provide a forum for discussing concepts and methods in bioinformatics serving the biological and medical sciences. The deluge of genomic and proteomic data in the last two decades has driven the creation of tools that search and analyze biomolecular sequences and structures. Bioinformatics is highly interdisciplinary, using knowledge from mathematics, statistics, computer science, biology, medicine, physics, chemistry, and engineering. BLAST ͉ repeat sequences ͉ r-scan statistics ͉ frequent and rare oligonucleotides and peptides ore than 200 bacterial, 20 ment) (13, 14); r-scan statistics (15) (de- to evaluate gene expression with a differ- archaeal, and 20 eukaryotic tecting anomalous spacing of specific ent set of limitations (32–34). Ribosomal genomes as well as 1,600 markers distributed along a sequence); protein (RP) codon frequencies deviate viral genomes have been and ‘‘frequent’’ and ‘‘rare’’ oligonucleo- strongly from average codon frequencies Mcompletely sequenced. Moreover, several tides and peptides (sequence words in many bacteria, especially during rapid hundred mitochondrial chromosomal sets that occur statistically more or less fre- growth. By contrast, the expression levels and at least 30 chromosomal plastids have quently than would be expected by for RPs in archaea are variable; 15–30% been sequenced. A deeper understanding chance) (16–18). of archaeal genes have average codon of basic biology can be gained from a The standard BLAST protocol compares usage (35, 36). The most biased codon comparison of organisms in different evo- a query sequence with a large database of usages are found for genes generally lutionary lineages, and this understanding protein sequences to uncover significant involved in general processes of transcrip- is the aim of the currently intense se- similarities that help to circumscribe func- tion͞translation, chaperone͞degradation, quencing efforts. New species thriving in tion and structure of the query sequence. and energy metabolism (33, 34). environments throughout the earth and BLAST-like programs are extended to oceans reveal microbes in geothermal ar- searches using multiple alignments (13, 14, Highlights of the Sackler Colloquium eas, in arctic ice, in acidic springs, in toxic 19); to single-sequence analyses (20), e.g., Discussions at the Colloquium focused on waste sites, in assorted air currents, and in to identify DNA-binding peptides and the following topics: sequence patterns subterranean habitats (1, 2). Unique mixes transmembrane tracts; and to three-di- (12, 18, 20), comparative genomics and of microbes that thrive in different ecosys- mensional analyses, e.g., to identify charge proteomics, modeling of molecular inter- tems have been described (3). For exam- clusters in protein structures and cysteine actions and gene regulatory networks, ple, a new type of bacterial rhodopsin was knots (21, 22). The BLAST programs cur- management and interpretation of protein discovered by genomic analysis of natu- rently serve Ϸ250,000 queries per day at expression data (33, 34), microarrays (29), rally occurring marine bacterioplankton the National Center for Biotechnology over- and underrepresentations of words (4). Prokaryote genomic samples from the Information (NCBI) in Washington, DC. in genomes and proteomes, molecular human oral cavity and the intestinal tract, The theory relies on fundamental proper- evolution (37–39), alternative splicing (40– and their changes in time, are progres- ties of extremal statistical distributions and 43), polymorphisms and SNPs (44), sively being followed (5). The diversity of the stochastic theory of large deviations genomic haplotypes (HAP-MAP) (45), microbes in cutaneous wounds of humans (23–25). There are natural relationships and three-dimensional macromolecular is of great practical importance. Sequenc- between these analyses and studies on the taxonomy (46). In this article, I will sum- ing efforts facilitate health research and maximum service time among customers marize results and perspectives issuing understanding of pathogenesis and may in queuing systems, as well as in applica- from Colloquium talks and then review have commercial, industrial, and agricul- tions to insurance risk and traffic flow some general concepts and methods of tural benefits as well. It is clear that lack models (see refs. 26 and 27 and below) bioinformatics. of data is not a problem today; rather, the DNA microarrays (DNA chips) aim to George Miklos called attention to diffi- challenge will be the analysis of the vast dissect gene expression under varied phys- culties arising from noise in microarray quantity of information already available. iological, clinical, and environmental con- data sets. Conflicts in the clinical applica- Bioinformatic methods underlie most of ditions. Microarrays are used to monitor tion of microarray data to cancers and computational biology. Current emphasis well characterized genes in different situa- complex diseases are exposed in refs. 47 is on computationally efficient and wide- tions; to discover disease genes; to assess ranging algorithms that have been imple- gene expression during treatment with This paper serves as an introduction to the Arthur M. Sackler mented and tested on real and simulated drugs, chemicals, or toxins; to discover Colloquium of the National Academy of Sciences, ‘‘Frontiers data sets. Exact and empirical algorithms genes that compensate for knockout mu- in Bioinformatics: Unsolved Problems and Challenges,’’ have been applied to genomics, proteom- tations; and to profile gene expression in held October 15–17, 2004, at the Arnold and Mabel Beck- ics, gene networks, structure prediction, temporal and in tissue-specific localiza- man Center of the National Academies of Sciences and Engineering in Irvine, CA. Papers from this Colloquium will and drug design. Bioinformatic tools are tions (28–30). Experimental evaluations of be available as a collection on the PNAS web site. The used daily, including the generalized protein abundances under different cellu- complete program is available on the NAS web site at BLAST programs (sequence similarity eval- lar conditions can be assayed by two-di- www.nasonline.org͞bioinformatics. uations) (6–9); GENSCAN and GENIE (gene mensional gel electrophoresis (31) and Abbreviations: AS, alternative splicing; i.i.d., independent discovery) (10, 11); SAPS (statistical analy- supplemented by mass spectrometry, anti- identically distributed; USS, uptake signal sequence. sis of protein sequences) (12); CLUSTAL body associations, and biochemical tests. †E-mail: [email protected]. and ITERALIGN (multiple sequence align- Codon usage analysis offers another way © 2005 by The National Academy of Sciences of the USA www.pnas.org͞cgi͞doi͞10.1073͞pnas.0501804102 PNAS ͉ September 20, 2005 ͉ vol. 102 ͉ no. 38 ͉ 13355–13362 Downloaded by guest on September 24, 2021 and 48. Marc Gerstein reviewed data on interacting genes. Per Bork described gene CpG Dinucleotide Suppression the numbers and distribution of pseudo- control in metazoans and how it influ- The role of CpG dinucleotides, particu- genes (called fossil records) over several ences genome evolution. He proffered a larly in the context of CpG methylation, is complete eukaryotic genomes, especially gene neighborhood on predicting gene of considerable interest (e.g., see ref. 69). the human genome. Ribosomal protein function (43). Hanah Margalit reported In addition to causing increased mutation genes predominate among pseudogenes in how gene regulation and the arrangement rates, CpG methylation alters the shape of human sequences counting more than of protein pairs participate in protein– the major groove of DNA, leading to 2,000 cases (49, 50). In chromosomes 21 protein interactions in Escherichia coli and modified chromatin structure, and thus it and 22 (see ref. 51), the median length of Saccharomyces cerevisiae. She displayed is capable of altering patterns and rates of processed pseudogenes is approximately situations of chromosomal adjacency gene transcription (70). Recent studies in the same median length as that of single- among these genes (58). Shoshana Wodak humans have shown that the nonpromoter exon (intronless) genes. The median also dealt with protein–protein interaction CpG islands are targets for de novo meth- length of single-exon genes is approxi- and the structural problem of docking ylation, playing a role in cancer and aging mately congruent to the median internal (59). David Eisenberg reviewed several (71). As a mechanism for modifying gene exon length times the average number of programs relevant to the study of protein expression levels and tissue-specific ex- exons per gene. These observations sup- interactions comparing multiple bacterial pression, it offers regulatory possibilities port the proposition that many single-exon genomes. The information can be applied that are exploited in genomic imprinting, genes derive from processed multiexon in structural genomics to determine pro- X chromosome inactivation, transposon genes transposed into the genome (51). tein
Recommended publications
  • RECENT ADVANCES in BIOLOGY, BIOPHYSICS, BIOENGINEERING and COMPUTATIONAL CHEMISTRY
    RECENT ADVANCES in BIOLOGY, BIOPHYSICS, BIOENGINEERING and COMPUTATIONAL CHEMISTRY Proceedings of the 5th WSEAS International Conference on CELLULAR and MOLECULAR BIOLOGY, BIOPHYSICS and BIOENGINEERING (BIO '09) Proceedings of the 3rd WSEAS International Conference on COMPUTATIONAL CHEMISTRY (COMPUCHEM '09) Puerto De La Cruz, Tenerife, Canary Islands, Spain December 14-16, 2009 Recent Advances in Biology and Biomedicine A Series of Reference Books and Textbooks Published by WSEAS Press ISSN: 1790-5125 www.wseas.org ISBN: 978-960-474-141-0 RECENT ADVANCES in BIOLOGY, BIOPHYSICS, BIOENGINEERING and COMPUTATIONAL CHEMISTRY Proceedings of the 5th WSEAS International Conference on CELLULAR and MOLECULAR BIOLOGY, BIOPHYSICS and BIOENGINEERING (BIO '09) Proceedings of the 3rd WSEAS International Conference on COMPUTATIONAL CHEMISTRY (COMPUCHEM '09) Puerto De La Cruz, Tenerife, Canary Islands, Spain December 14-16, 2009 Recent Advances in Biology and Biomedicine A Series of Reference Books and Textbooks Published by WSEAS Press www.wseas.org Copyright © 2009, by WSEAS Press All the copyright of the present book belongs to the World Scientific and Engineering Academy and Society Press. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the Editor of World Scientific and Engineering Academy and Society Press. All papers of the present volume were peer reviewed
    [Show full text]
  • ISMB 2008 Toronto
    ISMB 2008 Toronto The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters Citation Linial, Michal, Jill P. Mesirov, B. J. Morrison McKay, and Burkhard Rost. 2008. ISMB 2008 Toronto. PLoS Computational Biology 4(6): e1000094. Published Version doi:10.1371/journal.pcbi.1000094 Citable link http://nrs.harvard.edu/urn-3:HUL.InstRepos:11213310 Terms of Use This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http:// nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of- use#LAA Message from ISCB ISMB 2008 Toronto Michal Linial1,2, Jill P. Mesirov1,3, BJ Morrison McKay1*, Burkhard Rost1,4 1 International Society for Computational Biology (ISCB), University of California San Diego, La Jolla, California, United States of America, 2 Sudarsky Center, The Hebrew University of Jerusalem, Jerusalem, Israel, 3 Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America, 4 Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America the integration of students, and for the of ISMB. One meeting in South Asia support of young leaders in the field. (InCoB; http://incob.binfo.org.tw/) has ISMB has also become a forum for already been sponsored by ISCB, and reviewing the state of the art in the many another one in North Asia is going to fields of this growing discipline, for follow. ISMB itself has also been held in introducing new directions, and for an- Australia (2003) and Brazil (2006).
    [Show full text]
  • From DNA Sequence to Chromatin Dynamics: Computational Analysis of Transcriptional Regulation
    From DNA Sequence to Chromatin Dynamics: Computational Analysis of Transcriptional Regulation Thesis submitted for the degree of “Doctor of Philosophy” by Tommy Kaplan Submitted to the Senate of the Hebrew University May 2008 This work was carried out under the supervision of Prof. Nir Friedman and Prof. Hanah Margalit Abstract All cells of a living organism share the same DNA. Yet, they differ in structure, activities and interactions. These differences arise through a tight regulatory system which activates different genes and pathways to fit the cell’s specialization, condition, and requirements. Deciphering the regulatory mechanisms underlying a living cell is one of the fundamental challenges in biology. Such knowledge will allow us to better understand how cells work, how they respond to external stimuli, what goes wrong in diseases like cancer (which often involves disruption of gene regulation), and how it can be fought. In my PhD, I focus on regulation of gene expression from three perspectives. First, I present an innovative algorithm for identifying the target genes of novel transcription factors, based on their protein sequence (Chapter 1). Second, I consider how several transcription factors cooperate to process external stimuli and alter the behavior of the cell (Chapter 2). Finally, I study how the genomic position of nucleosomes and their covalent modifications modulate the accessibility of DNA to transcription factors, thus adding a fascinating dimension to transcriptional regulation (Chapters 3 and 4). To understand transcriptional regulation, one should first reconstruct the architecture of the cell’s regulatory map, thus identifying which genes are regulated by which transcription factors (TFs).
    [Show full text]
  • Program Book
    Pacific Symposium on Biocomputing 2016 January 4-8, 2016 Big Island of Hawaii Program Book PACIFIC SYMPOSIUM ON BIOCOMPUTING 2016 Big Island of Hawaii, January 4-8, 2016 Welcome to PSB 2016! We have prepared this program book to give you quick access to information you need for PSB 2016. Enclosed you will find • Logistics information • Menus for PSB hosted meals • Full conference schedule • Call for Session and Workshop Proposals for PSB 2017 • Poster/abstract titles and authors • Participant List Conference materials are also available online at http://psb.stanford.edu/conference-materials/. PSB 2016 gratefully acknowledges the support the Institute for Computational Biology, a collaborative effort of Case Western Reserve University, the Cleveland Clinic Foundation, and University Hospitals; the National Institutes of Health (NIH), the National Science Foundation (NSF); and the International Society for Computational Biology (ISCB). If you or your institution are interested in sponsoring, PSB, please contact Tiffany Murray at [email protected] If you have any questions, the PSB registration staff (Tiffany Murray, Georgia Hansen, Brant Hansen, Kasey Miller, and BJ Morrison-McKay) are happy to help you. Aloha! Russ Altman Keith Dunker Larry Hunter Teri Klein Maryln Ritchie The PSB 2016 Organizers PACIFIC SYMPOSIUM ON BIOCOMPUTING 2016 Big Island of Hawaii, January 4-8, 2016 SPEAKER INFORMATION Oral presentations of accepted proceedings papers will take place in Salon 2 & 3. Speakers are allotted 10 minutes for presentation and 5 minutes for questions for a total of 15 minutes. Instructions for uploading talks were sent to authors with oral presentations. If you need assistance with this, please see Tiffany Murray or another PSB staff member.
    [Show full text]
  • Conference Proceedingssmall
    1 COMMITTEES Steering Committee Phil Bourne - University of California, San Diego Eric Davidson - California Institute of Technology Steven Salzberg - The Institute for Genomic Research John Wooley - University of California San Diego, San Diego Supercomputer Center Organizing Committee Pat Blauvelt – LSS Membership Director Karen Hauge – Palo Alto Medical Foundation, Local Arangements Kass Goldfein - Finance Consultant AlishaHolloway – The J. David Gladstone Institutes, Tutorial Chair Sami Khuri – San Jose State University, Poster Chair Ann Loraine – University of North Carolina at Charlotte, CSB Publication Chair Fenglou Mao – University of Georgia, On-Line Registration and Refereeing Website Peter Markstein – in silico Labs, Program Co-Chair Vicky Markstein - Life Sciences Society, Conference Chair, LSS President Jean Tsukamoto - Graphics Design Bill Wang - Sun Microsystems Inc, LSS Information Technology Director Ying Xu – University of Georgia, Program Co-Chair Program Committee Tatsuya Akutsu – Kyoto University Chris Bailey-Kellogg – Dartmouth College Pierre Baldi – University of California Irvine Liming Cai – University of Georgia Bill Cannon – Pacific Northwest National Laboratory Jake Chen – Indiana University Bhaskar DasGupta – University of Illinois Chicago Andrey Gorin – Oak Ridge National Laboratory Matt Hibbs – Princeton University Wen-Lian Hsu – Academia Sinica Tamer Kahveci – University of Florida Carl Kingsford – University of Maryland Christina Leslie – Memorial Sloan-Kettering Cancer Center Jing Li – Case Western Reserve
    [Show full text]
  • Assigning Folds to the Proteins Encoded by the Genome of Mycoplasma Genitalium (Protein Fold Recognition͞computer Analysis of Genome Sequences)
    Proc. Natl. Acad. Sci. USA Vol. 94, pp. 11929–11934, October 1997 Biophysics Assigning folds to the proteins encoded by the genome of Mycoplasma genitalium (protein fold recognitionycomputer analysis of genome sequences) DANIEL FISCHER* AND DAVID EISENBERG University of California, Los Angeles–Department of Energy Laboratory of Structural Biology and Molecular Medicine, Molecular Biology Institute, University of California, Los Angeles, Box 951570, Los Angeles, CA 90095-1570 Contributed by David Eisenberg, August 8, 1997 ABSTRACT A crucial step in exploiting the information genitalium (MG) (10), as a test of the capabilities of our inherent in genome sequences is to assign to each protein automatic fold recognition server and as a case study to sequence its three-dimensional fold and biological function. identify the difficulties facing automated fold assignment. Here we describe fold assignment for the proteins encoded by the small genome of Mycoplasma genitalium. The assignment MATERIALS AND METHODS was carried out by our computer server (http:yywww.doe- mbi.ucla.eduypeopleyfrsvryfrsvr.html), which assigns folds to The MG Sequences. The 468 MG sequences were obtained amino acid sequences by comparing sequence-derived predic- from The Institute for Genome Research (TIGR) through its tions with known structures. Of the total of 468 protein ORFs, Web address: http:yywww.tigr.orgytdbymdbymgdbymgd- 103 (22%) can be assigned a known protein fold with high b.html. Three types of annotation (based on searches in the confidence, as cross-validated with tests on known structures. sequence database) accompany each TIGR sequence (10): (i) Of these sequences, 75 (16%) show enough sequence similarity functional assignment—a clear sequence similarity with a to proteins of known structure that they can also be detected protein of known function from another organism was found by traditional sequence–sequence comparison methods.
    [Show full text]
  • Practical Structure-Sequence Alignment of Pseudoknotted Rnas Wei Wang
    Practical structure-sequence alignment of pseudoknotted RNAs Wei Wang To cite this version: Wei Wang. Practical structure-sequence alignment of pseudoknotted RNAs. Bioinformatics [q- bio.QM]. Université Paris Saclay (COmUE), 2017. English. NNT : 2017SACLS563. tel-01697889 HAL Id: tel-01697889 https://tel.archives-ouvertes.fr/tel-01697889 Submitted on 31 Jan 2018 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. 1 NNT : 2017SACLS563 Thèse de doctorat de l’Université Paris-Saclay préparée à L’Université Paris-Sud Ecole doctorale n◦580 (STIC) Sciences et Technologies de l’Information et de la Communication Spécialité de doctorat : Informatique par M. Wei WANG Alignement pratique de structure-séquence d’ARN avec pseudonœuds Thèse présentée et soutenue à Orsay, le 18 Décembre 2017. Composition du Jury : Mme Hélène TOUZET Directrice de Recherche (Présidente) CNRS, Université Lille 1 M. Guillaume FERTIN Professeur (Rapporteur) Université de Nantes M. Jan GORODKIN Professeur (Rapporteur) University of Copenhagen Mme Johanne COHEN Directrice de Recherche (Examinatrice)
    [Show full text]
  • BIOINFORMATICS Doi:10.1093/Bioinformatics/Btq499
    Vol. 26 ECCB 2010, pages i409–i411 BIOINFORMATICS doi:10.1093/bioinformatics/btq499 ECCB 2010 Organization CONFERENCE CHAIR B. Comparative Genomics, Phylogeny, and Evolution Yves Moreau, Katholieke Universiteit Leuven, Belgium Martijn Huynen, Radboud University Nijmegen Medical Centre, The Netherlands PROCEEDINGS CHAIR Yves Van de Peer, Ghent University & VIB, Belgium Jaap Heringa, Free University of Amsterdam, The Netherlands C. Protein and Nucleotide Structure LOCAL ORGANIZING COMMITTEE Anna Tramontano, University of Rome ‘La Sapienza’, Italy Jan Gorodkin, University of Copenhagen, Denmark Yves Moreau, Katholieke Universiteit Leuven, Belgium Jaap Heringa, Free University of Amsterdam, The Netherlands D. Annotation and Prediction of Molecular Function Gert Vriend, Radboud University, Nijmegen, The Netherlands Yves Van de Peer, University of Ghent & VIB, Belgium Nir Ben-Tal, Tel-Aviv University, Israel Kathleen Marchal, Katholieke Universiteit Leuven, Belgium Fritz Roth, Harvard Medical School, USA Jacques van Helden, Université Libre de Bruxelles, Belgium Louis Wehenkel, Université de Liège, Belgium E. Gene Regulation and Transcriptomics Antoine van Kampen, University of Amsterdam & Netherlands Jaak Vilo, University of Tartu, Estonia Bioinformatics Center (NBIC) Zohar Yakhini, Agilent Laboratories, Tel-Aviv & the Tech-nion, Peter van der Spek, Erasmus MC, Rotterdam, The Netherlands Haifa, Israel STEERING COMMITTEE F. Text Mining, Ontologies, and Databases Michal Linial (Chair), Hebrew University, Jerusalem, Israel Alfonso Valencia, National
    [Show full text]
  • A Hidden Markov Model Framewrok for Studying Regulation from Chromatin Through RNA
    Submitted in part fulfilment of the requirements for the degree of Master of Science in Computer Science and Computational Biology A Hidden Markov Model Framewrok For Studying Regulation From Chromatin Through RNA Eran Rosenthal Supervisor: Dr. Tommy Kaplan October 2017 Hebrew University of Jerusalem Faculty of Science - Computer Science and Engineering Acknowledgments I would like to deeply thank Tommy, for being a great supervisor, giving very useful advices and scientific tutoring. Tommy has great ability to express complex concepts in simple, illustrative stories, and he introduced me to different kinds of experimental data and computational approaches. I would like also to thank for the members of the lab, for the interesting ideas and discussions. I would like to thank for all the people I had worked in collaboration in my master: • Moshe Oren and Gilad Fuchs for the providing me their experimental data of nascent RNA with RNF20 knockdown for studying the role of monoubiquity- lation of H2B. • Michael Berger and Yuval Malka for the exciting project of studying post- transcriptional 3’UTR cleavage of mRNA transcripts, and for Hanah Margalit and Avital Shimony who helped us in the analysis of miRNA regulation in this project. i Abstract Information flows from DNA to RNA, through transcription, followed by trans- lation of the RNA transcript to protein. A viable cell must have proper regulation on genes expression. There are varieties of regulation mechanisms along the way, which include regulation on the DNA and expression level of genes, post-transcriptional regulation, and post-translation regulation. Most of the cells in our body share the same DNA, but they have different func- tions.
    [Show full text]
  • EYAL AKIVA, Phd
    Eyal Akiva CV, Nov. 2017 EYAL AKIVA, PhD Department of Bioengineering and Therapeutic Sciences Phone +1-650-504-9008 University of California at San Francisco Email [email protected] 1700 4th street, San Francisco, Web www.babbittlab.ucsf.edu/eakiva CA, USA EDUCATION 2012-2017 Post-doctoral fellowship at UCSF, Dept. Of Bioengineering and Therapeutic Sciences. Host: Prof. Patricia Babbitt. 2010-2012 Post-doctoral fellowship at UCSF, Dept. Of Bioengineering and Therapeutic Sciences. Host: Prof. Tanja Kortemme. 2004-2010 PhD at The Hebrew University of Jerusalem (Israel), bioinformatics. Host: Prof. Hanah Margalit. “Various Aspects of Modularity in Protein-Protein Interaction". 2001-2004 MSc at The Hebrew University of Jerusalem (Israel), bioinformatics and human genetics. Host: Prof. Muli Ben-Sasson. “Exploiting the Exploiters: Identification of Virus-Host Pep- tide Mimicry as a Source for Modules of Functional Significance”. MAGNA CUM LAUDE. 1997-2000 BSc at Bar-Ilan University (Israel), biology (major) and computer science (minor). Final project advisor: Prof. Ramit Mehr “Modeling the Evolution of the Immune System: a Sim- ulation of the Evolution of Genes that Encode the Variable Regions of Immunoglobulins”. MAGNA CUM LAUDE. 1996-1997 First year of "Industrial Engineering and Management" studies, Tel-Aviv University, Israel. OTHER WORK EXPERIENCE 2000-01 ‘Do-coop technologies’: Team leader and chemistry/microbiology researcher; development of biological applications and manufacture of proprietary nanoparticles (Or Yehuda, Israel and Tel-Aviv University (Prof. Eshel Ben-Jacob’s lab at the school of physics)). FUNDING, HONORS AND AWARDS 2017 Grant: Co-PI, “Utilizing metagenomic sequences for enzyme function prediction”, Joint Genome Institute (US Department of Energy) (http://jgi.doe.gov/doe-user-facilities-ficus- join-forces-to-tackle-biology-big-data/).
    [Show full text]
  • UC San Diego UC San Diego Electronic Theses and Dissertations
    UC San Diego UC San Diego Electronic Theses and Dissertations Title Analysis and applications of conserved sequence patterns in proteins Permalink https://escholarship.org/uc/item/5kh9z3fr Author Ie, Tze Way Eugene Publication Date 2007 Peer reviewed|Thesis/dissertation eScholarship.org Powered by the California Digital Library University of California UNIVERSITY OF CALIFORNIA, SAN DIEGO Analysis and Applications of Conserved Sequence Patterns in Proteins A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Computer Science by Tze Way Eugene Ie Committee in charge: Professor Yoav Freund, Chair Professor Sanjoy Dasgupta Professor Charles Elkan Professor Terry Gaasterland Professor Philip Papadopoulos Professor Pavel Pevzner 2007 Copyright Tze Way Eugene Ie, 2007 All rights reserved. The dissertation of Tze Way Eugene Ie is ap- proved, and it is acceptable in quality and form for publication on microfilm: Chair University of California, San Diego 2007 iii DEDICATION This dissertation is dedicated in memory of my beloved father, Ie It Sim (1951–1997). iv TABLE OF CONTENTS Signature Page . iii Dedication . iv Table of Contents . v List of Figures . viii List of Tables . x Acknowledgements . xi Vita, Publications, and Fields of Study . xiii Abstract of the Dissertation . xv 1 Introduction . 1 1.1 Protein Homology Search . 1 1.2 Sequence Comparison Methods . 2 1.3 Statistical Analysis of Protein Motifs . 4 1.4 Motif Finding using Random Projections . 5 1.5 Microbial Gene Finding without Assembly . 7 2 Multi-Class Protein Classification using Adaptive Codes . 9 2.1 Profile-based Protein Classifiers . 14 2.2 Embedding Base Classifiers in Code Space .
    [Show full text]
  • Pacific Symposium on Biocomputing 2020 January 3-7, 2020 Big Island of Hawaii
    Pacific Symposium on Biocomputing 2020 January 3-7, 2020 Big Island of Hawaii Program Book PACIFIC SYMPOSIUM ON BIOCOMPUTING 2020 Big Island of Hawaii, January 3-7, 2020 Welcome to PSB 2020! We have prepared this program book to give you quick access to information you need for PSB 2020. Enclosed you will find: • Logistics information • Medical services information • Full conference schedule (see website for the latest version) • Agendas for workshops and JBrowse demo • Call for Session and Workshop Proposals for PSB 2021 • Poster/abstract titles and author index • Participant list The latest conference materials are available online at http://psb.stanford.edu/conference-materials/. PSB 2020 gratefully acknowledges the support of the Cleveland Institute for Computational Biology; UPenn Institute for Biomedical Informatics; Variant Bio; Penn Center for Precision Medicine, Penn Medicine; Cipherome; Translational Bioinformatics Conference (TBC); the National Institutes of Health (NIH); and the International Society for Computational Biology (ISCB). If you or your institution are interested in sponsoring, PSB, please contact Tiffany Murray at [email protected] If you have any questions, the PSB registration staff (Tiffany Murray, Kasey Miller, BJ Morrison-McKay, and Cindy Paulazzo) are happy to help you. Aloha! Russ Altman Keith Dunker Larry Hunter Teri Klein Maryln Ritchie The PSB 2020 Organizers PACIFIC SYMPOSIUM ON BIOCOMPUTING 2020 Big Island of Hawaii, January 3-7, 2020 SPEAKER INFORMATION Oral presentations of accepted proceedings papers will take place in Salon 2 & 3. Speakers are allotted 10 minutes for presentation and 5 minutes for questions for a total of 15 minutes. Instructions for uploading talks were sent to authors with oral presentations.
    [Show full text]