Kinetic Modeling using BioPAX ontology

Oliver Ruebenacker, Ion. I. Moraru, James . Schaff, and Michael L. Blinov Center for Cell Analysis and Modeling, University of Connecticut Health Center Farmington, CT, 06030 {oruebenacker,moraru,schaff,blinov}@exchange.uchc.edu

Abstract – e.g. Cytoscape (http://cytoscape.org, [5]), cPath database http://cbio.mskcc.org/software/cpath, [6]), Thousands of biochemical interactions are PathCase (http://nashua.case.edu/PathwaysWeb), available for download from curated databases such VisANT (http://visant.bu.edu, [7]). However, the as , Pathway Interaction Database and other current standard for kinetic modeling is Systems sources in the Biological Pathways Exchange Biology Markup Language, SBML ([8], (BioPAX) format. However, the BioPAX ontology http://sbml.org). Both BioPAX and SBML are used to does not encode the necessary information for kinetic encode key information about the participants in modeling and simulation. The current standard for biochemical pathways, their modifications, locations kinetic modeling is the System Biology Markup and interactions, but only SBML can be used directly Language (SBML), but only a small number of models for kinetic modeling, because elements are included in are available in SBML format in public repositories. SBML specifically for the context of a quantitative Additionally, reusing and merging SBML models theory. In contrast, concepts in BioPAX are more presents a significant challenge, because often each abstract. SBML-encoded models typically contain all element has a value only in the context of the given data necessary for simulations, such as molecular model, and information encoding biological meaning species and their concentrations, reactions among is absent. We describe a software system that enables these species, and kinetic laws for these reactions. a variety of operations facilitating the use of BioPAX This data is uniquely identified within a given SBML data to create kinetic models that can be visualized, model, but often it has no value if considered outside edited, and simulated using the (VCell), of it: there is no way to compare the SPECIES element including improved conversion to SBML (for use with with name S1 of model 1 with the SPECIES element other simulation tools that support this format). with name S1 of model 2 in many SBML files. The recent introduction in SBML of the SBOTERM attribute to support the Ontology (SBO), and 1. Introduction the standardization of the ANNOTATION elements, solves this problem only partially – since these are optional, and relatively new. SBML does not require 1.1. Motivation the use of SBOTERM in order to encode relationships,

or the use of ANNOTATION to uniquely identify model Currently, a great deal of information about elements outside of the model itself (by the use of signaling pathways (ranging from complete pathways, references to controlled vocabularies). Moreover, to just molecules participating in such pathways, or to when the ANNOTATION element is being used, SBML just individual interactions) can be obtained in does not enforce any constraints on its content, and standardized formats from multiple online resources. therefore, for example, two SPECIES elements that are The Biological Pathways Exchange standard uniquely identified within the model by different ID (BioPAX, [1], [2], http://biopax.org) allows extracting attributes, may have the same identification qualitative information from Reactome database ([3], information included in ANNOTATION elements (for http://www.reactome.org/), Pathway Interaction example, phosphorylated and unphosphorylated forms Database (http://pid.nci.nih.gov/), BioCyc collection may be linked to the same external database of Pathway/Genome databases ([4], http://biocyc.org) reference). It is the liberty and the burden of the and more (for current listing see http://biopax.org). A SBML producer to properly curate the models in a growing number of tools for analysis and visualization comprehensive and consistent way. Currently, there of interaction networks support the BioPAX standard are few resources that provide publicly accessible vocabularies and other namespaces has been SBML models that consistently include such standardized. Unfortunately, most models in SBML information. Meanwhile, most pathways available currently use few or none of these features. For from public repositories in BioPAX format, while not example, the largest public resource of curated SBML having the necessary kinetic information required for models, the BioModels database [10], although it does simulation, do typically include unique identification use cross-referencing to controlled vocabularies, it of all elements through external references, as well as does not yet include ontology information. additional information regarding relationships between The BioPAX ontology was created from the the elements of the pathway which allow for beginning with the purpose of providing a pathway automated reasoning. Providing a modeling exchange format that aims to facilitate sharing of framework that uses data in BioPAX format and pathway information between databases and users. facilitates conversion to SBML would solve two big BioPAX is based on OWL (Web Ontology Language) problems: (i) use of abundant sources of well-curated that is designed for use by applications that need to quantitative data, and (ii) creating easily reusable process the content of information instead of just quantitative models. presenting information to humans. OWL provides a framework for controlled vocabulary along with a 1.2. BioPAX, SBML, and SBO formal semantics. BioPAX concepts, unlike generic XML concepts, have relationships to each other that Systems Biology Markup Language (SBML) is can be processed automatically (see [11, 12] for more designed mainly to enable the exchange of quantitative information on using BioPAX vs. SBML). An models of biochemical networks between different automatic reasoner can infer that if B is a kind of A, simulation software packages with little or no human then B inherits all of A’s property definitions. These intervention. One feature of simulation-centric XML relationships between different concepts are the key to standards, such as SBML and CellML [9], is that no merging or linking different sets of information from hierarchy of different types of molecular species or different sources. Additionally, each element of a different types of interactions is necessary to be BioPAX file is linked to an originating biological encoded. A simulation software simply needs a list of database, providing for a well-documented biological things of the same kind, called SPECIES, and a list of identification for each element of the model. These things of the same kind, called REACTIONS, uniquely two features make the BioPAX standard a practical identified within the model, and mathematical tool for reusable modeling modules. However, it has information such as kinetic laws, initial conditions, no support for all the critical information required for etc., in order to reproduce a certain simulation result. building a quantitative model and running simulations. Additional information that can help a human understand the meaning of the model elements and 1.3. Challenges and solutions their relationships, as well as unique identification of elements across different models, can simply be Currently, there are multiple converters between ignored by the simulator in the context of the specific the SBML and BioPAX standards. The BioModels model to be simulated. In practice, it became database that stores curated models in SBML format apparent, that while one can reliably port SBML can convert each model into the BioPAX standard. models between different software tools, true The Reactome database that stores curated pathways reusability is limited. data in BioPAX format can generate an SBML file for As long as the data is small enough to be tweaked each selection. However, these converters do not by hand, flat and simple formats are most welcome. provide unique identification of species in SBML or The user knows what each symbol means and physical entities in BioPAX; thus, they do not solve therefore the software does not need to. But this is the problem of model reusability. The SBML output changing: as projects grow, the need is growing to from the Reactome pathway database contains data from different sources and to process absolutely no additional information for SPECIES and them by software sophisticated enough to know that REACTION elements except names, and therefore these there is some sort of difference between a complex of can not be easily identified in the context of several proteins and a small molecule. SBML has evolved to different models. Since BioPAX is an extensible provide the means for this. As of Level 2, Revision 3, format, one possible approach is to extend it to support it includes direct support for SBO, which is a new and kinetic data for simulations – but the number and comprehensive ontology that covers both general complexity of the additional required abstractions is so biological relationships as well as model-specific ones. large as to dwarf the entire existing format. A much Additionally, support for the use of external controlled more practical approach is to rely on the SBML format for kinetic models, and (i) implement some of the concentrations, kinetic laws etc), but usually has a lot model-building operations at the level of the BioPAX of auxiliary information which may be not essential files, before translation to SBML, and (ii) make use of for simulations (organisms, different names, linking existing SBML facilities to carry over the ontology molecular species to a variety of databases, etc). and controlled vocabulary information during BioPAX model can be easily visualized more translation. expressively than SBML models, by relying on this Primary challenges of converting of BioPAX data type of information – for example, by giving different into a kinetic model include: (1) merging several BioPAX objects (proteins, small molecules, BioPAX files through unique identification of complexes etc) different representations (e.g. colors). BioPAX objects; (2) converting a BioPAX file into Each object is linked to biological information from SBML format by deciding whether references to the public databases. A modeler will need to add some same BioPAX entity have to be represented by the information to convert a BioPAX model into a same or different SBML SPECIES; (3) annotating the SBML model such that annotations would uniquely computable kinetic model in SBML format. Figure 1 illustrates a summary of possible workflows using the identify all SPECIES and interactions across multiple datasets, and not just within the given model, and BioPAX modeling framework. preserving the relationship information among model DB1 DB2 ( VCell repository (Reactome.or) (NetPath.org) DB3 elements; (4) adding simulation-specific information of reusable (BioModels.net)BioModels.net) SBML Models such as kinetic laws and initial conditions. User BioPax modeling BioPax modeling framewoek framewoek intervention may be required to perform tasks (1) and

(2), especially when data is coming from different ReusableSBML SBML BioPAXMetaModel Model BioPAXMetaModel Model ReusableSBML SBML sources. To reduce the user intervention to a Model (BioPax) (BioPax) Model minimum, if not eliminate it entirely, we employ a BioPax modeling VCell VCell series of sophisticated tests based on a wide variety of framewoek Physiological Physiological attributes, including unique database identifiers, names Model Model Merged and types of entities and interactions and relationships BioPAXMetaModel Model between them. Task (3) should be performed ReusableSBML SBML automatically, and the SBML file should ideally have Model a one-to one mapping with information from the BioPAX file. Task (4) is usually performed manually, VCell Physiological but it could also be automated via retrieval of kinetic Model information from online sources. The implementation Fig. 1. BioPAX modeling framework. Data from of mechanisms to facilitate tasks (3) and (4) fall multiple sources in BioPAX format are imported into beyond the scope of this manuscript, but are briefly the BioPAX modeling framework, and can be discussed further where relevant. converted into BioPAX-annotated reusable SBML, and further exported into the VCell modeling framework. The BioPAX modeling framework can be used to 2. Modeling using BioPAX merge several BioPAX files. BioPAX-annotated SBML files can be stored locally or in the VCell database. Creating a kinetic model using the data in BioPAX Solid lines denote the implemented conversions. standard is a non-trivial problem. Most pathway Dashed lines denote features to be implemented. databases are not model centric, and to build a particular model, one would typically select several A BioPAX model can be converted into SBML for elements from such a database (or several databases), use with different simulation software tools, and can i.e. several separate BioPAX files which should then be exported into the Virtual Cell software framework be processed. Thus one important feature of the (VCell) (http://vcell.org, [13]) for running kinetic system is to have an algorithm for identifying simulations and further model development. BioPAX molecular species and reactions that are common in models along with easily reusable, BioPAX-annotated the different BioPAX files and allow for easy merging SBML models will be stored in the VCell database. of different pathway elements into a model. We have Several BioPAX models can be merged into a larger designed a BioPAX modeling framework intended to BioPAX model. The merged model can be compactly obtain, store, merge and complement data in BioPAX, visualized as a set of modules, where all elements of thus facilitating generation of kinetic models, such as the same BioPAX model are compressed into a single can be expressed in SBML. The BioPAX data in container node. general lacks simulation-related information (such as 3. Modeling framework design (2) Mapping of BioPAX objects onto sets of SPECIES and REACTIONS. For example, several objects The BioPAX modeling framework prototype that refer to the same p-entity can translate into one or implementation is designed to be a part of the VCell more SPECIES due to modifications, such as the software. To handle the BioPAX ontology classes, we phosphorylated or unphosphorylated state of a protein. use Jena. Jena is a Java application programming (3) Converting a BioPAX model into a fully interface that provides support for handling RDF annotated SBML model. Each SPECIES and REACTION (Resource Description Framework) documents, in the SBML model has an ANNOTATION element that including OWL documents, such as BioPAX files. In uniquely identifies this element, in the same way as the description below, we will use the following BioPAX does. notations: This processing is explained in more detail below. (1) By p-interactions we denote the elements of the BioPAX class type physicalInteraction, as well as of 3.2. BioPAX to model conversion those of all of its subclasses (for example, BIOCHEMICALREACTION). 3.2.1 Merging BioPAX files and identification of (2) By p-entities we denote the elements of the unique resources. OWL provides means for one BioPAX class PHYSICALENTITIES, as well as of those file to refer to another (via OWL:IMPORT) and to link objects as identical (via OWL:SAMEAS). When several of all of its subclasses (for example, PROTEIN,). (3) By p-participants we denote the elements of the files are coming from the same source, it may be enough to simply compare UNIFXREF. When the files BioPAX utility class PHYSICALENTITYPARTICIPANT, are coming from different sources, the complexity of and of all of its subclasses (SEQUENCEPARTICIPANT). the problem varies widely because information from Generally, in BioPAX pathways, there is one p- different BioPAX files can be very different. In the participant for each participation of a p-entity in a p- best case scenario, any p-entity may still have the interaction, and a set of p-participants will be mapped complete specification, including a type (for example onto SPECIES in SBML. A p-entity corresponds to a RNA, PROTEIN, etc), properties specific for the type reacting entity independent of location and (for example, SEQUENCE for a PROTEIN, CHEMICAL- modifications (such as phosphorylation of proteins), FORMULA for a SMALLMOLECULE), and one or more and thus in many cases corresponds to SPECIESTYPE in references to a database identifier. In the worst case SBML. scenario, each SPECIES may just be called a

PHYSICALENTITY with no additional detail (an 3.1. Features example is when the user brings in a BioPAX file converted from SBML by the BioModels database As the framework is based on Jena, which allows sbml2biopax tool). To identify a list of unique handling of generic OWL files, it is not tied to the SPECIES, we use a series of tests that provide either a current version of BioPAX and can support BioPAX certain decision, or give a likelihood score that two p- extensions and future versions of BioPAX. The entities refer to the same resource. See Figure 2 for framework can use data taken from databases offering additional details about the algorithm. A similar BioPAX web interfaces, from BioPAX files provided algorithm for identifying SPECIES will be described in by the user, and from the VCell repository of BioPAX the next subsection in more detail. modeling projects (which are described in more detail in section 3.2). Some of this data, such as data taken 3.2.2 Identification of SPECIES and REACTIONS. A from the VCell repository of BioPAX files and crucial question to be answered when converting from BioPAX-annotated SBML files, might already be in a BioPAX to SBML is whether two p-participants state that allows a straight-forward translation into should correspond to one SPECIES or two different kinetic simulation models; while in general, data SPECIES, which will be decided by an algorithm available as BioPAX ontology documents, such as similar to the one described in the previous subsection. from most public pathway databases, will require non- Here, we describe the most important steps. trivial processing and the addition of information. Two p-participants are the same SPECIES if they Processing primarily includes: have the same location and the same chemical identity. (1) Merging BioPAX files, including identification The same chemical identity can be assumed if only if and linking of p-entities referring to the same resource they refer to the same p-entity with the same from different files. For example, two files brought modifications. If the p-entity is a complex, all into the framework might each have an entry for the components have to be the same. A SMALLMOLECULE same protein, possibly with a slightly different name. with modifications is a different object. To find out whether two RNAs, DNAs, or PROTEINs have the same for example, that if the normalized distance is less modifications, we evaluate SEQUENCEFEATURES. This than 0.2, there is only a five percent chance that they usually gives a definite answer in the case of p- are not the same species, and then the score would be participants of the same REACTIONS, but often only the negative logarithm of 0.05. likelihoods for p-participants of different REACTIONS. The lack of definite answers in the latter case stems no unifXREF yes are the same else from problems with SEQUENCEFEATURE recommended Distinct Identical usage, such as using SEQUENCEFEATURES both for resources resources chemical modifications (e.g. phosphorylations) and for no Names are yes similar description of non-modified features (e.g. binding Increase sites), mentioning only SEQUENCEFEATURES “relevant score to the interaction”, and defining the same no Names are yes long SEQUENCEFEATURE separately for each interaction. In Increase some, but not all, cases, these issues can be resolved score by evaluating a sequenceFeature's FEATURETYPE (e.g. no Names are yes phosphorylation site) and FEATURELOCATION, if it is identical Increase specific enough. The BioPAX standard notes that score sequence features might be replaced in future versions no Types are yes by states, which we expect to be a great improvement the same (for our purposes). If proper information on sequence noTypes are yes Increase features is absent, we might still infer chemical compatible score modifications by analyzing other properties, for Distinct Individual example similarity of names or the occurrence of resources types tests phrases like “phospho”. Since reactions listed in the Fig. 2. Flowchart of algorithm for identification of same file usually form one network, is very likely that unique p-entities. The first “else” stands for the case at least one p-participant per reaction is the same when either one or both p-entities have no UNIFXREF. SPECIES as a p-participant in another reaction. Individual types tests include comparing sequence Similarly, it is somewhat likely that not all p- property for PROTEIN and RNA, chemical-formula for participants referring to a p-entity are a different SMALLMOLECULE, etc. The user sets negative and positive thresholds for the score function. If the score is SPECIES. All of these likelihoods contribute to above a positive threshold, p-entities are declared to computing a score, which will be compared to a be identical, if below negative threshold – distinct, tunable threshold for automatic or manual decision- otherwise the user has to make the decision. making. The score for each test roughly corresponds to the 3.2.3 Extension of BioPAX. BioPAX is based on negative logarithm of the probability that a statement OWL, which provides a standard to extend any (e.g. two p-entities being the same species) is false ontology easily by adding new properties to existing although tests are positive. If the tests have a low classes. There are two main issues that we deal with: probability of false positives and little correlation to (1) To store in a BioPAX file the information each other, each positive test result lowers the necessary to enable immediate and fully automatic probability that the statement is false by a factor conversion to SBML, we introduce to BioPAX two roughly equal to the probability of a false positive new properties, SPECIESID and SPECIESTYPEID. These divided by the probability of a correct positive. If two correspond to SBML's SPECIES and SPECIESTYPE tests have high degree of correlation or high false elements, and can be assigned to p-participants and p- positive probabilities, they may be combined and treated as one test. The total score will roughly entities. A SPECIES in SBML is an entity which can be estimate the negative logarithm of the probability that quantified in a kinetic simulation, and a SPECIESTYPE the statement is wrong in spite of one or more positive is a set of SPECIES differing only in location. test results. Therefore, in SBML, two entities are the same For example, we can estimate the probability that SPECIES if they are of the same SPECIESTYPE and at two p-entities are the same species if they have similar the same location, and they are different, if the names. Similarity can be measured by edit distances location or the SPECIESTYPE is different. It is optional (such as Damerau-Levenshtein [14] or Jaro-Winkler for a SPECIES to belong to a SPECIESTYPE. In many [15] distance). For the case of Damerau-Levenshtein cases, a p-entity in BioPAX corresponds to a we need to divide it by the length of the longer name SPECIESTYPE in SBML, and in this case, all referring to get a number between zero and one. We might say, p-participants with the same location correspond to the same SPECIES. Therefore, this is our default for a BQBIOL:IS, BQBIOL:ISVERSIONOF and conversion if no SPECIESID or SPECIESTYPEID is BQBIOL:ISDESCRIBEDBY qualifiers. given. Exceptions are primarily rnas, dnas and proteins The detailed description of this process is outside with modifications, such as phosphorylations, which the scope of this paper. Furthermore, potentially the cause the same p-entity to correspond to different most effective and powerful approach to create a SPECIESTYPES and different SPECIES. The properties mapping of all BioPAX information into SPECIESID and SPECIESTYPEID are introduced to mark corresponding SBML files is the use of the SBOTERM attribute which is included in SBASE and thus these exceptions. A SPECIESID or SPECIESTYPEID can be added to any p-participant or p-entity to control to inherited essentially by all relevant SBML classes. SBO is designed to be more comprehensive ontology which SPECIES or SPECIESTYPE it corresponds in framework than BioPAX, including definitions and SBML. Adding a SPECIESID or SPECIESTYPEID to a p- relationship hierarchies for concepts that are specific entity is the same as adding it to all referring p- to quantitative modeling (in the modeling framework, participants which do not have this property already quantitative parameter, mathematical expression, and assigned. event branches). The scope of BioPAX discussed here (2) To be able to use tools which support BioPAX, (p-entity and p-particpant) is mainly covered by the but do not tolerate extensions, we enclose each non- participant type branch composed of the participant standard property within a comment tag. Thus, functional type and participant physical type sub- standard tools will simply see a comment, which is a branches, applied to the SPECIES, SPECIESTYPE, and standard BioPAX property and which does not SPECIESREFERENCE elements. However, since SBO is mandate processing. Our framework will read relatively new, and still expanding and evolving quite comments searching for new properties. For example, rapidly, we chose to relegate this approach to future SPECIESID is declared through: versions of the software framework. 3.3. Data organization: BioPAX modeling speciesID project

3.2.4 Mapping BioPAX files onto SBML files When the user imports a BioPAX file into the annotated with ontology and controlled vocabulary framework, he has an option to create a new BioPAX information. The SBML language specification has modeling project or to add the file to an existing recently introduced two features that facilitate and project. Projects are stored as a collection of BioPAX standardize the inclusion of additional information that files either locally or in the VCell database. The is not required for the numerical interpretation of the project consists of: (1) the data-source: BioPAX files model, but which can help describe the model and imported into the framework that are designated read- relate model and model elements to each other, both only and stay unmodified; and (2) BioPAX models within the same file or between files from different and BioPAX-annotated SBML models. BioPAX sources. The most flexible approach is the use of the models have new properties added to store a log of standardized syntax for the ANNOTATION element, files added to the project. If several files are added to which can decorate any class such as SPECIES or the project, the BioPAX model imports the source files REACTION. Additional XML namespaces defined at and marks identity relationships. Most ontology- arbitrary URIs can be included, using a restricted form processing tools like Jena and Protégé that can handle of the Dublin Core (http://dublincore.org) embedded any OWL file can process the composite model. in RDF. This generalized syntax allows one to add To create a model ready for simulation, the any valid RDF/OWL fragments to SBML elements, BioPAX model can be imported into the VCell effectively allowing us to directly add the BioPAX software or converted to SBML format. It can also be data (such as the corresponding p-entity information exported to a BioPAX file without extensions. This is for a particular SBML SPECIES). This is not quite useful if the user intends to process the file with tools trivial, though, especially for the case of molecular which expect BioPAX but do not tolerate extensions. complexes and modifications, where references to p- In this case new data is included as comments. participants need to be added, and additional “dummy” SBML SPECIES are being created. Additionally, the standardization of the RELATION element, a subtype of ANNOTATION, allows direct inclusion of UNIFXREF information using the a) components. Each object that is a model component can be visualized as an element of a graph.

b) 3.5. Model visualization

To facilitate the organization of the data and to make selections, the framework provides a graphical representation to view the source data as well as the data needed to create kinetic models ready for simulation. This graphical interface can handle any c) OWL model but specifically supports the BioPAX ontology. Inspired by the graphical user interface of the VCell “physiology” editor, it displays a graph consisting of p-interactions among p-entities in the VCell style (see Figure 3). This is a bipartite graph in its fully flattened form (with nodes for both entities and interactions), using for each of the p-entity and p- interaction classes a separate symbol. Each p- interaction is connected by an edge with the p-entities d) participating in it. Complexes are displayed in a way that alludes to their components. All other objects are hidden, until they are properties of an object which becomes selected.

4. Conclusions and future directions

We have introduced a modeling framework that is Fig. 3. Screenshots of BioPAX modeling user interface prototype. (a) Phosphorylation of Cyclin A:Cdk2 based on the BioPax ontology. Ontologies provide a complexes by Wee1 as displayed by Reactome. (b) A great deal of flexibility in data representation, analysis compact representation: only species and reactions and visualization. Users have the option of modifying are shown in the form compliant with the VCell GUI. (c) or adding new tests for identification of elements of An extended view, where a user gains an kinetic model and tweak corresponding score understanding that both complexes contain the same functions to address files coming from different components Cyclin_A and Cdk2; Cdk2 is a protein, sources. The tests themselves and the appropriate which participates in one of these complexes in models can be validated against known data. phosphorylated form. (d) The model exported into Visualization of ontologies can be made very flexible, VCell, kinetic parameters and rate constants added within VCell framework. allowing the user to select which resources are visible and which are hidden, similar to the CytoScape visualization framework. Arbitrary sets of objects can 3.4. System Architecture be collapsed and expanded again. The user can decide

which kinds of property relationships are represented The software is organized into several layers: (1) by graph edges. Currently, project files are stored in a initialization and configuration; (2) graphical user local directory. When such a file-based repository interface; (3) specification of actions initiated by the becomes large, searching will be inefficient and one user or by other events; (4) organization of needs to provide a BioPAX-compatible database. This background threads; (5) representation of the BioPAX can be accommodated by extending the VCell data; and (6) general utilities. Calls are made only database schema. Another possible option is to use within layers or from a higher to a lower layer. Lower BioWarehouse [16], which provides specific interfaces layers communicate to higher layers through message for different data sources. passing based on call-backs. The data is stored in RDF The framework provides explicit specification of format using the Jena API. On top of the RDF model each and every molecular species and interactions. is a layer of model components, where there is a However, the data may allow multiple interpretations, component for each node or statement of the model, as such as a given interaction can be applied to several well as components representing groups of smaller phosphoforms of a protein. An intelligent framework can be used to generate kinetic rules for interactions [17]. These efforts should be concurrent with development of the next level of BioPAX ontology References that describes protein modifications. A highly desirable feature is automated data [1] Luciano JS. (2005) PAX of mind for pathway retrieval and verification using external web- researchers. Drug Discov Today. 10:937-42. resources. After the user picks elements to be included [2] Luciano JS, Stevens RD. (2007) e-Science and biological in a model, the framework should try to infer pathway semantics. BMC Bioinformatics. 8 Suppl 3:S3. additional elements (interactions, modifications, [3] Vastrik I, et al. (2007) Reactome: a knowledgebase of kinetic constants) to make a kinetic model complete, biological pathways and processes. Genome Biol. 8. and then request this information from external web [4] Krummenacker M, Paley S, Mueller L, Yan T, Karp PD. (2005) Querying and computing with BioCyc databases. resources. For example, if the user develops a model Bioinformatics. 21:3454-5. involving interactions between proteins A and B, the [5] Shannon P, et al. (2003) Cytoscape: a software framework should search local files, the VCell environment for integrated models of biomolecular repository and external databases for all data that interaction networks. Genome Res. 13:2498-504. affect these interactions. Qualitative information, such [6] Cerami EG, Bader GD, Gross BE, Sander C. (2006) as reaction catalysts, can be requested from databases cPath: open source software for collecting, storing, and like Reactome that provide an API for querying and querying biological pathways. BMC Bioinformatics. 7:497. retrieving BioPAX data over the web. Quantitative [7] Hu Z, Mellor J, Wu J, Yamada T, Holloway D, Delisi C. information, such as kinetic constants, can be (2005) VisANT: data-integrating visual framework for biological networks and modules. Nucleic Acids Res. requested from emerging databases of reaction 33:W352-7. kinetics, such as SABIO-RK (http://sabio.villa- [8] Hucka M, et al. (2003) The Systems Biology Markup bosch.de/SABIORK). Language (SBML): A Medium for Representation and The model which is augmented with such Exchange of Biochemical Network Models. Bioinformatics quantitative data for simulation purposes will be 19, 524-531. encoded in SBML format. A critical capability is the [9] Lloyd CM, Halstead MD, Nielsen PF. (2004) CellML: its ability to preserve the ontology information from the future, present and past. Progress in Biophys and Mol. Biol. BioPAX format in the SBML format. Initially, this is 85, 433-450. [10] Le Novere N, et al. (2006) BioModels Database: a free, being implemented by extending the SBML model centralized database of curated, published, quantitative with all the relevant ancillary SPECIES (that otherwise kinetic models of biochemical and cellular systems. Nucleic may not be necessary for performing the actual Acids Res. 34:D689-91. numerical simulations), and by using the SBML [11] Stromback L, Lambrix P (2005) Representations of ANNOTATION element (that can encode arbitrary RDF- molecular pathways: an evaluation of SBML, PSI MI and type data) to hold the BioPAX-specific information. BioPAX. Bioinformatics 21:4401-7. This improved level of BioPAX/SBML translation [12] Stromback L, Jakoniene V, Tan H, Lambrix P. (2006) allows for algorithms and operations specific to Representing, storing and accessing molecular interaction ontology-processing tools to also be performed on the data: a review of models and tools. Brief Bioinform. 7:331-8. SBML files, thus enhancing the reusability of the [13] Moraru II, Schaff JC, Slepchenko BM, Loew LM. (2002) The Virtual Cell: An integrated modeling generated SBML files, as well as allowing reverse environment for experimental and computational cell generation of qualitative models from quantitative biology. Ann. NY Acad. Sci., 971:595–6. models that were further processed. Ideally, in the [14] Damerau FJ. (1964) A technique for computer detection long term, a more sophisticated translation mechanism and correction of spelling errors. Communications of the would be highly desirable, by using direct ontology- ACM, 7:171-176. level mapping to the newly developed SBO [15] Jaro MA. (1995) Probabilistic linkage of large public framework. health data files. Stat Med. 14:491-8. When fully implemented, such capabilities will [16] Lee TJ, Pouliot Y, Wagner V, Gupta P, Stringer-Calvert provide an intelligent data-driven modeling framework DW, Tenenbaum JD, Karp PD. (2006) BioWarehouse: a bioinformatics database warehouse toolkit. BMC that can exploit the growing number of systems Bioinformatics. 7:170. biology resources, such as pathway and model [17] Hlavacek WS, Faeder JR, Blinov ML, Posner RG, repositories, and experimental data repositories. Hucka M, Fontana W. (2006) Rules for modeling signal- transduction systems. Sci. STKE, re6.