Biocyc Data Sheet

Phoeni BIOINFORMATICS BioCyc Microbial Genomes and Metabolic Pathways Web Portal The BioCyc web portal from SRI International contains genome and metabolic-pathway information for more BIOCYC IN A NUTSHELL than 14,500 microbes. BioCyc databases are unique in integrating a diverse range of data and providing • Extensive genome • Metabolic pathways and a high level of curation for important microbes. BioCyc informatics and regulatory networks curators summarize and synthesize information from comparative genomics • Omics data analysis thousands of publications, saving scientists time in tools tools including the literature research, and integrating data for large-scale • Curated data Omics Dashboard computational analyses. derived from 87,000 • Quantitative metabolic BioCyc Pathway/Genome Databases (PGDB) describe publications the genome of an organism, as well as its biochemical models pathways and (for a small fraction of organisms) its regulatory network. New expanded releases occur three times per year, often with new databases present. Two members of the BioCyc collection, the EcoCyc [1] THE BIOCYC DATABASE COLLECTION and MetaCyc [2] databases, are derived from more than three decades of literature-based curation of genome Each PGDB in the BioCyc collection describes the genome and pathway data. Compare BioCyc to other microbial and metabolic network of a single organism. genome web portals [5]. All BioCyc PGDBs include: The HumanCyc database provides a curated collection • The organism’s annotated genome of many human metabolic pathways and enzymes. • Predicted metabolic pathways BioCyc bioinformatics tools combine unparalleled breadth • Predicted pathway hole fillers (genes coding and user friendliness. BioCyc provides a unique set of for missing enzymes in metabolic pathways) visualization tools to speed comprehension of its extensive • Bacterial PGDBs include predicted operons and complex data. • Predicted reaction atom mappings BIOCYC APPLICATIONS (trace atoms from reactants to products) • Predicted Gibbs free energies of formation for chemical BioCyc databases integrate extensive data for each organism, compounds and provide platforms for analysis of large-scale datasets. • Generated metabolic network poster and genome poster BioCyc enables scientists to pursue several use cases: The exact types of data present in different databases • BioCyc is a massive encyclopedic reference on microbial will vary. Many databases also include: genes, metabolites, and pathways that integrates information from many sources. Scientists consult BioCyc • Protein subcellular locations, enzyme kinetics data, protein to save large amounts of time finding, understanding, features, Gene Ontology terms, predicted Pfam domains and synthesizing material from the primary literature. • Curated regulatory information including promoters, • BioCyc is a genome informatics and comparative operons, transcription-factor binding sites genomics platform • Ortholog relationships to other BioCyc genomes, • BioCyc enables exploration of a vast set of biological organism phenotypes networks • Database links • BioCyc provides gene-expression, metabolomics, and multi-omics analysis tools • BioCyc provides executable metabolic models for a small but growing set of organisms BioCyc databases are organized into three tiers • Mycobacterium tuberculosis H37Rv, Corynebacterium to reflect their quality levels: glutamicum ATCC 13032, Synechococcus elongatus PCC 7942, Staphylococcus aureus aureus NCTC 8325, Salmonella Tier 1 Pathway/Genome Databases have received at enterica enterica 14028S, Clostridioides difficile 630, Bacillus least one person-year, and in some cases, person-decades, anthracis Ames, Helicobacter pylori 26695, and others. of manual curation: • EcoCyc: The data in this Escherichia coli K-12 MG1655 Tier 3 Pathway/Genome Databases were computationally database have been gathered during two decades of generated with no subsequent curation. literature-based curation from more than 35,000 articles [1]. • EcoCyc describes the metabolic, transport, and regulatory BIOCYC INFORMATICS TOOLS machinery of E. coli. EcoCyc curators have authored the BioCyc provides a powerful and comprehensive set of features equivalent of 3,100 textbook-pages of mini-review summaries for querying, visualization, and analysis [3]. These tools help for E. coli genes and pathways. scientists find and digest information quickly. • MetaCyc: Contains 2,600 metabolic pathways and 15,000 GENOME INFORMATICS TOOLS biochemical reactions from all domains of life. MetaCyc data and commentary were gathered from 58,000 publications • Search for genomes by name, taxonomy, phenotypic properties to provide a comprehensive metabolic encyclopedia whose • Gene information page mini-review summaries encompass the equivalent of 8,600 – Retrieve amino-acid sequence and nucleotide sequence textbook pages [2]. of arbitrary genome region • YeastCyc: Contains a highly curated metabolic network for – Query genes by gene name, accession number, sequence Saccharomyces cerevisiae length, replicon position, GO terms, and protein properties • HumanCyc: This database was derived from a computational (pI, MW, protein features, subcellular location, ligand) pathway analysis of the human genome, followed by • Genome Browser (Figure 4) depicts genomic regions at literature-based curation of human pathways and enzymes. user-selected resolution with semantic zooming that reveals Tier 2 Pathway/Genome Databases were computationally new features at higher resolutions. Visible features include generated, and then received significant subsequent curation. pseudogenes, promoters, transcription-factor binding BioCyc Tier 2 databases include sites, repeats, terminators, nucleotide sequence. Zoom to sequence. Generate genome poster. • BsubCyc: The curated metabolic and regulatory networks for this database cite 3,900 publications. 170 regulatory • Transcription-unit information page genes control 1,150 regulated genes. • BLAST search, sequence-pattern search via patmatch, map S SNPs to genes and show effects on translation p Figure 1: The Cellular Omics viewer paints omics datasets onto a diagram of the cellular biochemical network. Reaction lines can be colored with gene expression, proteomics, or reaction flux p Figure 3: Pathway collages depict data; compound nodes can be colored a user-selected set of metabolic pathways. with metabolomics data. Multi-omics data can be analyzed by coloring data onto reactions and metabolites simultaneously. Omics pop-ups graph omics data values using bar graphs, p Figure 2: The Regulatory Overview depicts the transcriptional regulatory heat maps, or X-Y plots. network in a PGDB. Here, the E. coli regulatory network is highlighted to show genes that regulate the gntR gene (blue lines), and the genes that are regulated by gntR (purple lines). The inner two rings are populated by transcription factors and sigma factors; the outer ring contains other genes. TOOLS FOR EXPLORING BIOLOGICAL NETWORKS • Regulatory Overview (Figure 2) presents the genetic • Pathway information page regulatory network stored in a PGDB. – Each pathway shows a detailed mini-review from MetaCyc • Dead-end metabolite identification algorithm. – Search pathways by name, substrates, length • Identify anti-microbial drug targets using a tool that computes metabolic choke points. • Reaction information page includes atom mappings • Metabolite information page ANALYSIS TOOLS FOR GENE EXPRESSION – Search metabolites by name, accession numbers, substructure, AND METABOLOMICS DATA mass, monoisotopic mass, elemental composition • SmartTables store lists of genes or metabolites. Browse • Customize pathway diagrams for figures in publications database attributes, share with colleagues, transform (add/remove gene names, enzyme names, chemical to pathway lists, perform enrichment analysis. structures, omics data) • The Omics Dashboard presents a visual read-out of the • Create personalized pathway diagrams – pathway collages – expression status of all cellular systems to facilitate a rapid by assembling groups of pathways into one diagram, moving top-down user survey of cellular responses (Figure 4). pathways relative to one another, customizing display styles, • and adding omics data. Cellular Omics Viewer (Figure 1) enables the user to paint omics datasets onto the Cellular Overview diagram. • Cellular Overview diagrams (Figure 1) are organism-specific Scientists can interpret gene expression, proteomics, depictions of metabolic and transporter networks that and metabolomics datasets in a pathway context, including are zoomable and searchable. animation of time-course or comparative datasets • Route Search tool finds minimum-cost paths between (example animation at http://biocyc.org/ov-expr.shtml). metabolites in the metabolic network (Figure 6). Route • Paint omics data onto individual pathways Search paths maximize the number of atoms conserved and pathway collages. from feedstock to target by using an extensive library • of reaction atom mappings. Regulatory Omics Viewer paints omics datasets onto the regulatory network to enable comparisons of expression measurements with regulatory mechanisms. • Genome-browser tracks facility (Figure 5) allows user datasets to be plotted against the genome. tq Figure 4. Left: Omics Dashboard depicts high-level summaries of transcriptomics data across multiple cellular subsystems; the user can click on a subsystem to drill down to more detailed depictions of expression levels within each

Biocyc Data Sheet

Model-Based Integration of Genomics and Metabolomics Reveals SNP Functionality in Mycobacterium Tuberculosis

6.047/6.878 Lecture 4: Comparative Genomics I: Genome Annotation Using Evolutionary Signatures

The Distribution and Evolution of Arabidopsis Thaliana Cis Natural Antisense Transcripts Johnathan Bouchard, Carlos Oliver and Paul M Harrison*

Transcriptional Interferences in Cis Natural Antisense Transcripts of Humans and Mice

Application of Comparative Genomics for Detection of Genomic Features and Transcriptional Regulatory Elements Hong Lu Iowa State University

Comparative Genomics of Gossypium and Arabidopsis: Unraveling the Consequences of Both Ancient and Recent Polyploidy

Comparative Genomics for Reliable Protein-Function Prediction from Genomic Data

Comparative Genomics and Transcriptomics Analysis Reveals Evolution Patterns of Selection in the Salix Phylogeny

Chapter 6. Comparative Genomics Contents 6

Efficient Comparative Genomics with Low Coverage Data Using PALADIN

Integrated Omics: Tools, Advances and Future Approaches

Plant Paleopolyploidy James C