Overview of Biological Database Mapping Services for Interoperation Between Different ‘Omics’ Datasets Shweta S

SOFTWARE REVIEW

Shweta S. Chavan (1,2), John D. Shaughnessy Jr (2) and Ricky D. Edmondson (2)

(1) University of Arkansas Little Rock (UALR)/University of Arkansas Medical Sciences (UAMS) Joint Bioinformatics Program, AR 72204, USA
(2) Myeloma Institute for Research and Therapy, UAMS, Little Rock, AR 72205, USA

*Correspondence to: Tel: +1 501 454 2764; Fax: +1 501 686 6442; E-mail: [email protected]

Date received: 17th May 2011

© HENRY STEWART PUBLICATIONS 1479–7364. HUMAN GENOMICS. VOL 5. NO 6. 703–708 OCTOBER 2011

Abstract

Many primary biological databases are dedicated to providing annotation for a specific type of biological molecule such as a clone, transcript, gene or protein, but often with limited cross-references. Therefore, enhanced mapping is required between these databases to facilitate the correlation of independent experimental datasets. For example, molecular biology experiments conducted on samples (DNA, mRNA or protein) often yield more than one type of ‘omics’ dataset as an object for analysis (eg a sample can have a genomics as well as a proteomics expression dataset available for analysis). Thus, in order to map the two datasets, the identifier type from one dataset must be linked to that of the other, thereby preventing loss of critical information in downstream analysis. This identifier mapping can be performed using identifier converter software relevant to the query and target identifier databases. This review presents the publicly available web-based biological database identifier converters, with a comparison of their usage, input and output formats, and the types of available query and target database identifiers.

Keywords: omics, genomics, proteomics, identifier mapping, biological database identifier converter

Introduction

Many primary biological databases are dedicated to providing annotation for a specific type of biological molecule such as a clone, transcript, gene or protein (eg the National Center for Biotechnology Information [NCBI] Entrez Gene [1] database provides annotation for genes, whereas UniprotKB [2,3] provides this for proteins). Other types of secondary databases provide relevant information about the attributes of these molecules, such as pathway, function(s) or structural information (eg the Kyoto Encyclopedia of Genes and Genomes [KEGG] [4] and Protein Data Bank [PDB] [5]). Often, these databases provide limited cross-references for interoperation between databases. Thus, enhanced mapping between these databases is required to facilitate the correlation of independent experimental datasets, which can be provided by Identifier (Id) mapping services.

Id mapping services are tools to connect one type of database Id to the corresponding Id in another database. This mapping includes three types of relationships: one-to-one, one-to-many and many-to-many. One-to-many and many-to-many relationships are required to account for biological processes such as alternative splicing, resulting in one gene giving rise to multiple transcripts, the presence of several isoforms of a single protein and other similar processes occurring in a cell. Also, gene expression data such as microarray data are known to have multiple probes targeting a single transcript and vice versa (eg Affymetrix [6] probes, which can be described by many-to-many relationships). Thus, mapping Ids of multiple databases to one another facilitates the correlation of different types of ‘omics’ datasets which, in turn, might provide meaningful insights into the biological processes occurring in a cell.

Several Id mapping services are publicly available (Table 1). The seven Id converters that are discussed in detail in this review were selected to represent the majority of Id converters, as well as major biological databases, used in high-throughput genomics and proteomics datasets. They have many common features, such as: (i) supporting one-to-many and many-to-many relationships; (ii) providing mapping counts, and the details of the database Ids used as a ‘bridge’ to link to target database Ids; (iii) providing a web-based graphical user interface (GUI) which allows submission of a single Id or a batch search for multiple Ids; and (iv) the output database Ids are hyperlinked to their original database for reference.

Table 1. Link to web interface of various Id converters

Mapping services: Link
Gene/Clone ID converter: http://idconverter.bioinfo.cnio.es/
ID mapping by UniProt: http://www.uniprot.org/?tab=mapping
MatchMiner: http://discover.nci.nih.gov/matchminer/MatchMinerLookup.jsp
DAVID gene ID conversion tool: http://david.abcc.ncifcrf.gov/conversion.jsp
g:Convert: http://biit.cs.ut.ee/gprofiler/gconvert.cgi
CRONOS: http://mips.helmholtz-muenchen.de/genre/proj/cronos/index.html
bioDBnet:db2db: http://biodbnet.abcc.ncifcrf.gov/db/db2db.php

These Id converters also differ in a number of ways, such as in the availability of (i) type of query input; (ii) target output databases; (iii) species; (iv) data sources, and therefore coverage of the genome or proteome of a particular species; (v) ease of use; (vi) database update frequency; (vii) possible conversion types (eg protein-to-gene, gene-to-transcripts etc.); (viii) speed of conversion; (ix) a detailed help section or tutorial describing the intended use of the application; and (x) an algorithm for mapping database Ids. In general, these services establish mapping links using existing cross-references or by using sequence alignment information to determine the match. Few Id converters have their own published algorithm for establishing the mapping, in addition to using the existing cross-references.

Many of the Id converters provide an intuitive web-based user interface, generally having three components: input types (ie query databases), output types (ie target databases) and the species under consideration. One such web application is Clone/Gene ID Converter [7]. It offers several query and target databases for human, mouse and rat. The output can be customised by selecting from a number of output databases which are divided into several logical levels, such as gene, gene clone, protein and functional annotation. Further, detailed references are provided for the resultant output Ids by hyperlinking them to their original data sources. The output can also be obtained in a spreadsheet or text format. A detailed list of input/output databases, availability and other pertinent features can be found in Table 2. A useful piece of information provided on the interface itself is the specific version that was used as the data source for individual databases. This is of importance, considering the frequent updates of sequence databases and the increasing number of novel findings about the biological molecules in the respective research areas.

Another similar type of Id converter is ID mapping hosted by UniProt [3]. It supports almost all organisms, with monthly updates, and maps approximately 90 database sources, including primary sequence databases and secondary functional/structural annotation databases. Thus, the input and output database option is divided into Uniprot, other sequence database, three-dimensional structure,

Table 2. Comparison of various Id converters

Features of mapping services, listed for each converter in the order: Gene/Clone ID converter; ID mapping by UniProt; MatchMiner; DAVID gene ID conversion tool; g:Convert; CRONOS; bioDBnet:db2db

Interface
  Gene/Clone ID converter: Web-based GUI form
  ID mapping by UniProt: Web-based GUI form
  MatchMiner: Web-based GUI form
  DAVID gene ID conversion tool: Web-based GUI form, command line
  g:Convert: Web-based GUI form
  CRONOS: Web-based GUI form
  bioDBnet:db2db: Web-based GUI form

Output format
  Gene/Clone ID converter: Html, text, spreadsheet
  ID mapping by UniProt: Html, text
  MatchMiner: Html, text, spreadsheet
  DAVID gene ID conversion tool: Html, text, spreadsheet
  g:Convert: Html, text, spreadsheet, minimal (no header)
  CRONOS: Html, email for batch mode
  bioDBnet:db2db: Text, spreadsheet

Organisms
  Gene/Clone ID converter: Human, mouse, rat
  ID mapping by UniProt: Human, mouse, rat and many other species
  MatchMiner: Human, mouse
  DAVID gene ID conversion tool: Human, about another 90,000 species
  g:Convert: Human, mouse, rat and 31 other Ensembl-supported species
  CRONOS: Human, mouse, rat, cow, dog, and fruit fly genome
  bioDBnet:db2db: A specified list could not be found

Input/output, clone or transcript
  Clone Ids, GenBank, Affymetrix Ids, Affymetrix Id, Agilent Affymetrix, Agilent, Ensembl/FlyBase Affymetrix, Agilent, Affymetrix Ids, EMBL, DDBJ GenBank Id, Illumina Id, CCDS Ids, Ensembl Transcript ID, EMBL, GenBank, RefSeq GenBank Accession, EST, GenBank Accession, transcript, Illumina, Affymetrix, Agilent, Genomic, RefSeq Accession IMAGE Clone Id, Gene symbol, RefSeq DNA/ CCDS Nucleotide (Additional FISH-mapped GenPept Accession, Genomic output: EMBL)* BAC Clone Id NCBI GI, RefSeq RNA/Genomic accession

Input/output, gene
  HUGO gene Entrez Gene, Gene Symbol Entrez gene Id, Ensembl Gene, Gene Name, Entrez Gene ID, names, Entrez HGNC, HUGO/Alias, Ensembl gene/ Entrez Gene, Ensembl/FlyBase Ensembl Gene ID, gene Ids, Ensembl Ensembl, Name, UniGene transcript Id, RefSeq mrna, Gene ID, GI, UniGene gene Ids, UniGene UniGene, Cluster Id, Entrez RefSeq mRNA UniGene GeneID, HGNC, cluster Ids, RefSeq TIGR (JCVI) Gene Id, RefSeq accession, RefSeq mRNA RNAs (Additional RNA UniGene Id output: CCDS)a

Input/output, protein
  RefSeq peptides, UniProtKB, RefSeq protein PIR accession, PIR Id, Ensembl Protein, Protein Name, UniProt Accession, SwissProt names RefSeq, PIR NREF Id, RefSeq IPI,
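
The relationship types and the ‘bridge’ Ids described in the Introduction can be illustrated with a minimal Python sketch. It is not the algorithm of any converter listed in Table 1: the identifiers (probe_A, gene_1, prot_X and so on), the two cross-reference lists and the function names are all hypothetical and were invented for illustration. The sketch maps probe Ids to protein Ids through gene Ids used as a bridge, reports a mapping count for each query, and labels a cross-reference list as one-to-one, one-to-many or many-to-many.

```python
from collections import defaultdict

# Hypothetical cross-reference pairs (source Id, target Id); not real database Ids.
PROBE_TO_GENE = [("probe_A", "gene_1"), ("probe_B", "gene_1"), ("probe_B", "gene_2")]
GENE_TO_PROTEIN = [("gene_1", "prot_X"), ("gene_1", "prot_Y"), ("gene_2", "prot_Z")]


def build_index(pairs):
    """Group target Ids by source Id so that one source can hold many targets."""
    index = defaultdict(set)
    for source, target in pairs:
        index[source].add(target)
    return index


def classify(pairs):
    """Label a cross-reference list with one of the three relationship types.

    In this sketch 'one-to-many' covers either direction (one source with
    several targets, or one target shared by several sources).
    """
    by_source = build_index(pairs)
    by_target = build_index((t, s) for s, t in pairs)
    multi_target = any(len(v) > 1 for v in by_source.values())
    multi_source = any(len(v) > 1 for v in by_target.values())
    if multi_target and multi_source:
        return "many-to-many"
    if multi_target or multi_source:
        return "one-to-many"
    return "one-to-one"


def map_via_bridge(query_ids, first, second):
    """Map each query Id to target Ids through intermediate 'bridge' Ids."""
    results = {}
    for query in query_ids:
        bridges = sorted(first.get(query, ()))
        targets = sorted({t for b in bridges for t in second.get(b, ())})
        # Keep the bridge Ids and the mapping count alongside the targets.
        results[query] = {"bridge": bridges, "targets": targets, "count": len(targets)}
    return results


if __name__ == "__main__":
    probe_index = build_index(PROBE_TO_GENE)
    gene_index = build_index(GENE_TO_PROTEIN)
    print(classify(PROBE_TO_GENE))
    print(classify(GENE_TO_PROTEIN))
    for query, hit in map_via_bridge(["probe_B", "probe_missing"], probe_index, gene_index).items():
        print(query, hit)
```

Storing targets in sets keeps duplicate links from inflating the mapping count, and a query with no cross-reference is returned with a count of zero instead of being dropped, so unmapped Ids stay visible in a batch search.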