Kinetic Modeling Using Biopax Ontology
Total Page:16
File Type:pdf, Size:1020Kb
Kinetic Modeling using BioPAX ontology Oliver Ruebenacker, Ion. I. Moraru, James C. Schaff, and Michael L. Blinov Center for Cell Analysis and Modeling, University of Connecticut Health Center Farmington, CT, 06030 {oruebenacker,moraru,schaff,blinov}@exchange.uchc.edu Abstract – e.g. Cytoscape (http://cytoscape.org, [5]), cPath database http://cbio.mskcc.org/software/cpath, [6]), Thousands of biochemical interactions are PathCase (http://nashua.case.edu/PathwaysWeb), available for download from curated databases such VisANT (http://visant.bu.edu, [7]). However, the as Reactome, Pathway Interaction Database and other current standard for kinetic modeling is Systems sources in the Biological Pathways Exchange Biology Markup Language, SBML ([8], (BioPAX) format. However, the BioPAX ontology http://sbml.org). Both BioPAX and SBML are used to does not encode the necessary information for kinetic encode key information about the participants in modeling and simulation. The current standard for biochemical pathways, their modifications, locations kinetic modeling is the System Biology Markup and interactions, but only SBML can be used directly Language (SBML), but only a small number of models for kinetic modeling, because elements are included in are available in SBML format in public repositories. SBML specifically for the context of a quantitative Additionally, reusing and merging SBML models theory. In contrast, concepts in BioPAX are more presents a significant challenge, because often each abstract. SBML-encoded models typically contain all element has a value only in the context of the given data necessary for simulations, such as molecular model, and information encoding biological meaning species and their concentrations, reactions among is absent. We describe a software system that enables these species, and kinetic laws for these reactions. a variety of operations facilitating the use of BioPAX This data is uniquely identified within a given SBML data to create kinetic models that can be visualized, model, but often it has no value if considered outside edited, and simulated using the Virtual Cell (VCell), of it: there is no way to compare the SPECIES element including improved conversion to SBML (for use with with name S1 of model 1 with the SPECIES element other simulation tools that support this format). with name S1 of model 2 in many SBML files. The recent introduction in SBML of the SBOTERM attribute to support the Systems Biology Ontology (SBO), and 1. Introduction the standardization of the ANNOTATION elements, solves this problem only partially – since these are optional, and relatively new. SBML does not require 1.1. Motivation the use of SBOTERM in order to encode relationships, or the use of ANNOTATION to uniquely identify model Currently, a great deal of information about elements outside of the model itself (by the use of signaling pathways (ranging from complete pathways, references to controlled vocabularies). Moreover, to just molecules participating in such pathways, or to when the ANNOTATION element is being used, SBML just individual interactions) can be obtained in does not enforce any constraints on its content, and standardized formats from multiple online resources. therefore, for example, two SPECIES elements that are The Biological Pathways Exchange standard uniquely identified within the model by different ID (BioPAX, [1], [2], http://biopax.org) allows extracting attributes, may have the same identification qualitative information from Reactome database ([3], information included in ANNOTATION elements (for http://www.reactome.org/), Pathway Interaction example, phosphorylated and unphosphorylated forms Database (http://pid.nci.nih.gov/), BioCyc collection may be linked to the same external database of Pathway/Genome databases ([4], http://biocyc.org) reference). It is the liberty and the burden of the and more (for current listing see http://biopax.org). A SBML producer to properly curate the models in a growing number of tools for analysis and visualization comprehensive and consistent way. Currently, there of interaction networks support the BioPAX standard are few resources that provide publicly accessible vocabularies and other namespaces has been SBML models that consistently include such standardized. Unfortunately, most models in SBML information. Meanwhile, most pathways available currently use few or none of these features. For from public repositories in BioPAX format, while not example, the largest public resource of curated SBML having the necessary kinetic information required for models, the BioModels database [10], although it does simulation, do typically include unique identification use cross-referencing to controlled vocabularies, it of all elements through external references, as well as does not yet include ontology information. additional information regarding relationships between The BioPAX ontology was created from the the elements of the pathway which allow for beginning with the purpose of providing a pathway automated reasoning. Providing a modeling exchange format that aims to facilitate sharing of framework that uses data in BioPAX format and pathway information between databases and users. facilitates conversion to SBML would solve two big BioPAX is based on OWL (Web Ontology Language) problems: (i) use of abundant sources of well-curated that is designed for use by applications that need to quantitative data, and (ii) creating easily reusable process the content of information instead of just quantitative models. presenting information to humans. OWL provides a framework for controlled vocabulary along with a 1.2. BioPAX, SBML, and SBO formal semantics. BioPAX concepts, unlike generic XML concepts, have relationships to each other that Systems Biology Markup Language (SBML) is can be processed automatically (see [11, 12] for more designed mainly to enable the exchange of quantitative information on using BioPAX vs. SBML). An models of biochemical networks between different automatic reasoner can infer that if B is a kind of A, simulation software packages with little or no human then B inherits all of A’s property definitions. These intervention. One feature of simulation-centric XML relationships between different concepts are the key to standards, such as SBML and CellML [9], is that no merging or linking different sets of information from hierarchy of different types of molecular species or different sources. Additionally, each element of a different types of interactions is necessary to be BioPAX file is linked to an originating biological encoded. A simulation software simply needs a list of database, providing for a well-documented biological things of the same kind, called SPECIES, and a list of identification for each element of the model. These things of the same kind, called REACTIONS, uniquely two features make the BioPAX standard a practical identified within the model, and mathematical tool for reusable modeling modules. However, it has information such as kinetic laws, initial conditions, no support for all the critical information required for etc., in order to reproduce a certain simulation result. building a quantitative model and running simulations. Additional information that can help a human understand the meaning of the model elements and 1.3. Challenges and solutions their relationships, as well as unique identification of elements across different models, can simply be Currently, there are multiple converters between ignored by the simulator in the context of the specific the SBML and BioPAX standards. The BioModels model to be simulated. In practice, it became database that stores curated models in SBML format apparent, that while one can reliably port SBML can convert each model into the BioPAX standard. models between different software tools, true The Reactome database that stores curated pathways reusability is limited. data in BioPAX format can generate an SBML file for As long as the data is small enough to be tweaked each selection. However, these converters do not by hand, flat and simple formats are most welcome. provide unique identification of species in SBML or The user knows what each symbol means and physical entities in BioPAX; thus, they do not solve therefore the software does not need to. But this is the problem of model reusability. The SBML output changing: as projects grow, the need is growing to from the Reactome pathway database contains combine data from different sources and to process absolutely no additional information for SPECIES and them by software sophisticated enough to know that REACTION elements except names, and therefore these there is some sort of difference between a complex of can not be easily identified in the context of several proteins and a small molecule. SBML has evolved to different models. Since BioPAX is an extensible provide the means for this. As of Level 2, Revision 3, format, one possible approach is to extend it to support it includes direct support for SBO, which is a new and kinetic data for simulations – but the number and comprehensive ontology that covers both general complexity of the additional required abstractions is so biological relationships as well as model-specific ones. large as to dwarf the entire existing format. A much Additionally, support for the use of external controlled more practical approach is to rely on the SBML format for kinetic models, and (i) implement some of the concentrations, kinetic laws etc), but usually has a lot model-building operations