Databases and Algorithms for Pathway Bioinformatics

Databases and Algorithms for Pathway Bioinformatics Peter D. Karp, Ph.D. Bioinformatics Research Group SRI International [email protected] BioCyc.org EcoCyc.org MetaCyc.org HumanCyc.org SRI International MOD Home Pages Bioinformatics Learn or standardize? Top / Left Cascading menus or not Must-haves on home page: Citing MOD Software/data download Contact us News Publications Statistics Update history Credits BioWarehouse Peter D. Karp, Tom J. Lee, Valerie Wagner, Yannick Pouliot MAGE-ML UniProt BioCyc Taxonomy ENZYME BioWarehouse [Oracle or MySQL] Genbank CMR GO KEGG BioPAX BioCyc ENZYMEUniProt TaxonomyCMR Genbank= C-basedOracleBioWarehouseKEGG Loaderor= Java-based Loader MySQL SRI International Motivations Bioinformatics Hundreds of bioinformatics DBs exist Important problems involve queries across multiple DBs SRI International Why is the Multidatabase ApproachBioinformatics Alone Not Sufficient? Multidatabase query approaches assume databases are in a queryable DBMS Most sites that do operate DBMSs do not allow remote query access because of security and loading concerns Users want to control data stability Users want to control speed of their hardware Internet bandwidth limits query throughput Users need to capture, integrate and publish locally produced data of different types Multidatabase and Warehouse approaches complementary SRI International Technical Approach Bioinformatics Multi-platform support: Oracle (10g) and MySQL Schema support for multitude of bioinformatics datatypes Create loaders for public bioinformatics DBs Parse file format of the source DB Semantic transformations Insert DB contents into warehouse tables Provide Warehouse query access mechanisms SQL queries via ODBC, JDBC, OAA Operate public BioWarehouse server: publichouse BMC Bioinformatics 7:170 2006 SRI International BioWarehouse Schema Bioinformatics Manages many bioinformatics datatypes simultaneously Pathways, Reactions, Chemicals Proteins, Genes, Replicons Sequences, Sequence Features Organisms, Taxonomic relationships Computations (sequence matches) Citations, Controlled vocabularies Links to external databases Each type of warehouse object implemented through one or more relational tables (currently 43) SRI International Warehouse Schema Bioinformatics Different databases storing the same biological datatypes are coerced into same warehouse tables Design of most datatypes inspired by multiple databases Representational tricks to decrease schema bloat Single space of primary keys Single set of satellite tables such as for synonyms, citations, comments, etc. SRI International BioWarehouse Loaders Bioinformatics Database Loader Input Comments Language Format Any BioPAX Java BioPAX Protein interaction data only BioCyc C BioCyc attribute-value Pathway/Genome Databases CMR C CMR column-delimited Comprehensive Microbial Resource: 150+ microbial genomes ENZYME Java ENZYME attribute-value Enzyme Commission set of reactions Genbank Java XML derived from ASN.1 Bacterial subset of Genbank Gene Ontology Java OBO XML Hierarchical controlled vocabulary KEGG C KEGG format Metabolic pathway data Any MAGE-ML Java MAGE-ML format Microarray gene expression data NCBI Taxonomy C Taxonomy format Organism taxonomy UniProt Java UniProt XML SWISS-PROT and TrEMBL SRI International Uses of BioWarehouse Bioinformatics SRI Bioinformatics Research Group Extract genome data from CMR for generation of BioCyc Tier 3 Use CMR for genome-context extensions to pathway hole filler Enzyme genomics research SRI International Pathway Tools Software Bioinformatics PathoLogic Predicts operons, metabolic network, pathway hole fillers, from genome Computational creation of new model organism databases Can work as complete MOD system, or pathway module only Pathway/Genome Editors Distributed curation of PGDBs Distributed object database system, interactive editing tools Pathway/Genome Navigator WWW publishing of PGDBs Querying, visualization of pathways, chromosomes, operons Pathway visualization of omics data Global comparisons of metabolic networks Brg.ai.sri.com/ptools/ SRI International Pathway Tools Update Bioinformatics Version 10.5 Ontology upgrade for signaling interactions Consistency checker Generate metabolic map poster Zooming in Overview Sequence retrieval tool BioPAX export, pathway page New reaction editor Spelling checker Version 11.0 Regulatory network viewer SRI International Acknowledgements Bioinformatics SRI Funding sources: Suzanne Paley, Michelle Green, NIH National Institute of Ron Caspi, Ingrid Keseler, Mario General Medical Sciences Latendresse, Carol Fulcher, NIH National Center for Markus Krummenacker, Alex Research Resources Shearer NIH National Human Genome Research Institute BioCyc.org Learn more from BioCyc webinars: biocyc.org/webinar.shtml.

Databases and Algorithms for Pathway Bioinformatics

Hypertrophic Cardiomyopathy- Associated Mutations in Genes That Encode Calcium-Handling Proteins

In This Table Protein Name, Uniprot Code, Gene Name P-Value

DOE Systems Biology Knowledgebase Implementation Plan

Anti-SRI / Sorcin Antibody (ARG42959)

Genetic Determinants of Sporadic Breast Cancer in Sri Lankan Women Nirmala Dushyanthi Sirisena1*, Adebowale Adeyemo2, Anchala I

Annexins: Putative Linkers in Dynamic Membrane–Cytoskeleton

The Biocyc Collection of Microbial Genomes and Metabolic Pathways Peter D

The Sri Lankan Personal Genome Project: an Overview

An Expanded Proteome of Cardiac T-Tubules☆

PATHWAY TOOLS Ii Pathway Tools User’S Guide, Version 13.0

The Human GCOM1 Complex Gene Interacts with the NMDA Receptor and Internexin-Alpha☆

TOPSAN: a Dynamic Web Database for Structural Genomics Kyle Ellrott1,2, Christian M