Bioinformatic Databases

PharmaMatrix Workshop 2010
14 July 2010
Philip Winter & Ishwar Hosamani

Database Growth
Source: http://www.kokocinski.net/bioinformatics/databases.php

Database Survey
• Genes & Proteins: Pfam, PDB, TGI, UniProt, dbSNP, GenBank, GEO
• Cheminformatics (Drugs & Metabolites): SciFinder, DrugBank, PubChem, ZINC
• Gene & Protein Interactions: KEGG, NetPath, BioGRID

Curation
• Manual curation (or just curation): A human creates and annotates the database entry
• Automatic curation: A computer program creates and annotates the database entry
• Semi-automatic curation: A combination of manual and automatic

Database Identifiers
• Every database record will have a unique identifier; often this will be called an accession number, which is assigned when the record is first added to the database
• Be careful: databases will often permit a record to be modified but keep the same accession number; you should record the version number as well
• Furthermore, databases may have different rules for handling records that are merged or split

Database Identifier Cheat Sheets

Pattern | Identifier Name | Examples | Entity | Database | URL
[optional "GI:"][digits] | GenInfo Identifier | GI:34222261 | Nucleotide or protein sequence | GenBank, RefSeq | http://www.ncbi.nlm.nih.gov/
[letter][5 digits] OR [2 letters][6 digits] | GenBank ACCESSION | AB088100 | Nucleotide sequence | GenBank | http://www.ncbi.nlm.nih.gov/
[2 letter type code]_[digits] | RefSeq ACCESSION | NM_178014 | Nucleotide or protein sequence | RefSeq | http://www.ncbi.nlm.nih.gov/
[GenBank or RefSeq ACCESSION].[version number] | GenBank or RefSeq VERSION | AB088100.1, NM_178014.2 | Nucleotide or protein sequence | GenBank, RefSeq | http://www.ncbi.nlm.nih.gov/
(identical to accession for recent entries) | GenBank LOCUS | | Nucleotide or protein sequence | GenBank, RefSeq | http://www.ncbi.nlm.nih.gov/
[Protein code]_[Species code] | Swiss-Prot ID (entry name) | TBB5_HUMAN | Protein sequence | UniProtKB/Swiss-Prot | http://www.uniprot.org/
[UniProt AC]_[Species code] | UniProt ID (entry name) | Q9BUU9_HUMAN | Protein sequence | UniProtKB/TrEMBL | http://www.uniprot.org/
[A-N,R-Z][0-9][A-Z][A-Z,0-9][A-Z,0-9][0-9] OR [O,P,Q][0-9][A-Z,0-9][A-Z,0-9][A-Z,0-9][0-9] | UniProt AC (accession number) | P07437 | Protein sequence | UniProtKB | http://www.uniprot.org/
[capital letters or digits; no initial digit] | HGNC gene symbol | TUBB, TUBB1 | Human gene | HGNC database | http://www.genenames.org/
GO:[7 digits] | GO accession number | GO:0005874 | Gene class | AmiGO | http://www.geneontology.org/
[0-9][A-Z,0-9][A-Z,0-9][A-Z,0-9] | PDB ID | 1TUB | Protein, nucleic acid, or complex structure | PDB | http://www.rcsb.org/
[2 or 3 letters or digits] | PDB ligand ID | CN2 | Ligand | PDB | http://www.rcsb.org/
[up to 7 digits]-[2 digits]-[1 digit] | CAS registry number | 64-86-8 | Chemical structure | SciFinder | https://scifinder-cas-org.login.ezproxy.library.ualberta.ca/
[digits] | PubChem CID (compound ID) | 6167 | Chemical structure | PubChem | http://pubchem.ncbi.nlm.nih.gov/
ZINC[8 digits] OR [digits] | ZINC ID | ZINC00621853 or 621853 | Chemical structure | ZINC | http://zinc.docking.org/
DB[5 digits] | DrugBank accession number | DB01394 | Drug (chemical structure) | DrugBank | http://www.drugbank.ca/

Key File Formats for Sequences and Structures
• Sequences
  – FASTA format: .fasta .fst .txt
• Macromolecule structures
  – PDB format: .pdb .ent

Accessing Databases
• Web interface
• Query string, e.g. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=34222261&rettype=fasta&retmode=fasta
• Web services (SOAP)
• FTP -> local copy

Cheminformatic Database Survey
• CAS SciFinder (https://scifinder-cas-org.login.ezproxy.library.ualberta.ca/)
  – >52 million organic compounds
  – >61 million inorganic compounds
  – Physical property info
• PubChem (http://pubchem.ncbi.nlm.nih.gov/)
  – >27 million unique structures
  – >23 million with 3D conformations
  – Mostly organic, biologically interesting compounds
• DrugBank (http://www.drugbank.ca/)
  – ~4,800 drugs
  – >1,350 FDA approved drugs
  – Includes drug target info
• ZINC (http://zinc.docking.org/)
  – >13 million purchasable compounds
  – Ready to dock

Stereochemistry Issues
5 Chaetocin structures from PubChem [structure drawings omitted]
• CID 161591: no stereochemistry
• CID 5390098: bad stereochemistry
• CID 11563851: incomplete stereochemistry
• CID 46191942: enantiomer of natural product
• CID 11657687: natural product stereochemistry

Other Cheminformatic Issues
• Tautomers / protonation states?
• Salt forms?
• Implicit or explicit hydrogens?
• 2D connectivity only or 3D conformation?
• Non-organic elements?
  – Many programs only handle: CHNOPS + halogens
  – But some drugs have B, Pt, Hg, As, …

SMILES
CC(=O)N[C@H]1CCC2=CC(=C(C(=C2C3=CC=C(C(=O)C=C13)OC)OC)OC)OC [structure drawing omitted]
• Isomeric SMILES
  – Allows specification of stereochemistry
• Canonical SMILES
  – Canonicalization will generate a unique string for a molecule, regardless of atom order
  – Different programs will canonicalize differently
• SMARTS
  – Chemical patterns for searching or filtering
http://www.daylight.com/smiles/index.html

File formats
• MDL Molfile .mol
  – Allows a 3D conformation to be stored
• SDF .sdf
  – Wraps Molfile format; multiple structures; annotations
• PDB .pdb .ent
  – Not the best for small molecules

Need to convert?
-> Try OpenBabel http://openbabel.org/wiki/Main_Page

Pathway and Interaction Databases
• KEGG Pathways (http://www.genome.jp/kegg/)
  – Manually drawn pathways of metabolism, signaling, and other biological processes
  – >300 pathways + organism specific versions
• NetPath (http://www.netpath.org/)
  – Curated protein signal pathways in humans
  – 20 pathways, 1,800 interactions
• BioGRID (http://thebiogrid.org/)
  – A repository for protein and gene interaction data
  – 345,620 interactions

Pathway Formats
• SBML .xml
  – The Systems Biology Markup Language http://sbml.org/Main_Page
• Also check out the BioPAX format http://www.biopax.org/

Pathway Tools
• libSBML http://sbml.org/Software/libSBML
• CellDesigner http://www.celldesigner.org/
• Cytoscape http://www.cytoscape.org/

SBML example (excerpt):
  <?xml version="1.0" encoding="UTF-8"?>
  <sbml level="2" version="3" xmlns="http://www.sbml.org/sbml/level2/version3">
    ...
    <listOfSpecies>
      <species compartment="cytosol" id="ES" />
      <species compartment="cytosol" id="P" />
      <species compartment="cytosol" id="S" />
      <species compartment="cytosol" id="E" />
    </listOfSpecies>
    <listOfReactions>
      <reaction id="veq">
        <listOfReactants>
          <speciesReference species="E"/>
          <speciesReference species="S"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="ES"/>
        </listOfProducts>
        <kineticLaw>
          <math xmlns="http://www.w3.org/1998/Math/MathML">
            <apply>
              <times/>
              <ci>cytosol</ci>
              ...

[Figures omitted: KEGG: Pathways in Cancer; NetPath: EGFR1 pathway]

Exercises
1. What databases are these identifiers from?
   a. 3KYL
   b. EZH2
   c. Q15910
   d. GO:0008017
   e. GI:8017
   f. A9145C
2. Try finding the corresponding entries online

Exercise Answers
1. What databases are these identifiers from?
   a. 3KYL -> PDB (a protein-RNA structure for telomerase reverse transcriptase, catalytic region)
   b. EZH2 -> HGNC (a human gene for a histone lysine methyltransferase)
   c. Q15910 -> UniProt (a protein sequence for EZH2)
   d. GO:0008017 -> AmiGO (microtubule binding gene ontology)
   e. GI:8017 -> GenBank (a DNA sequence from D. melanogaster)
   f. A9145C -> this one's a trick: it's a chemical compound; you can look it up in PubChem with CID: 6438632.
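
The exercise can also be approached programmatically. The following is a minimal sketch, not part of the original slides, that transcribes some of the cheat-sheet patterns into Python regular expressions and lists every identifier type whose pattern an identifier fits. The dictionary labels and the function name `candidate_databases` are assumptions made for this illustration, and the patterns reflect the 2010 cheat sheets, so they may not cover identifier formats introduced since.

```python
import re

# Regular expressions transcribed from the cheat-sheet patterns.
# Labels are illustrative; patterns assume upper-case input.
PATTERNS = {
    "GI (GenBank/RefSeq)": r"(GI:)?[0-9]+",
    "GenBank ACCESSION": r"[A-Z][0-9]{5}|[A-Z]{2}[0-9]{6}",
    "RefSeq ACCESSION": r"[A-Z]{2}_[0-9]+",
    "UniProt AC": r"[A-NR-Z][0-9][A-Z][A-Z0-9]{2}[0-9]|[OPQ][0-9][A-Z0-9]{3}[0-9]",
    "HGNC gene symbol": r"[A-Z][A-Z0-9]*",
    "GO accession": r"GO:[0-9]{7}",
    "PDB ID": r"[0-9][A-Z0-9]{3}",
    "CAS registry number": r"[0-9]{1,7}-[0-9]{2}-[0-9]",
    "PubChem CID": r"[0-9]+",
    "ZINC ID": r"ZINC[0-9]{8}",
    "DrugBank accession": r"DB[0-9]{5}",
}


def candidate_databases(identifier):
    """Return the labels of all patterns that match the whole identifier."""
    return [
        label
        for label, pattern in PATTERNS.items()
        if re.fullmatch(pattern, identifier)
    ]


if __name__ == "__main__":
    for ident in ["3KYL", "EZH2", "Q15910", "GO:0008017", "GI:8017", "A9145C"]:
        print(ident, "->", candidate_databases(ident))
```

The function returns a list rather than a single answer because the patterns overlap: Q15910 fits the GenBank ACCESSION, UniProt AC, and HGNC symbol patterns at once, and the trick identifier A9145C fits the HGNC pattern even though it is a chemical compound. A pattern match only narrows the candidates; the entry still has to be looked up to confirm it.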