Reasoning About Systems Biology Models in the Semantic Web Era
ARTICLE IN PRESS

Journal of Theoretical Biology 252 (2008) 538–543
www.elsevier.com/locate/yjtbi

The markup is the model: Reasoning about systems biology models in the Semantic Web era

Douglas B. Kell a,c,*, Pedro Mendes b,c,d

a School of Chemistry, The University of Manchester, 131 Princess Street, Manchester M1 7DN, UK
b School of Computer Science, The University of Manchester, 131 Princess Street, Manchester M1 7DN, UK
c The Manchester Centre for Integrative Systems Biology, Manchester Interdisciplinary Biocentre, The University of Manchester, 131 Princess Street, Manchester M1 7DN, UK
d Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University, Washington Street 0477, VA 24061, USA

Available online 3 December 2007

Dedicated to the memory of Reinhart Heinrich (1946–2006).

Abstract

Metabolic control analysis, co-invented by Reinhart Heinrich, is a formalism for the analysis of biochemical networks, and is a highly important intellectual forerunner of modern systems biology. Exchanging ideas and exchanging models are part of the international activities of science and scientists, and the Systems Biology Markup Language (SBML) allows one to perform the latter with great facility. Encoding such models in SBML allows their distributed analysis using loosely coupled workflows, and with the advent of the Internet the various software modules that one might use to analyze biochemical models can reside on entirely different computers and even on different continents. Optimization is at the core of many scientific and biotechnological activities, and Reinhart made many major contributions in this area, stimulating our own activities in the use of the methods of evolutionary computing for optimization.
© 2007 Elsevier Ltd. All rights reserved.

Keywords: Systems biology; Metabolic control analysis; Reinhart Heinrich

Abbreviations: ICSB, International Conference on Systems Biology; MCA, metabolic control analysis; OWL, Web Ontology Language; RDF, Resource Description Framework; SBML, Systems Biology Markup Language; WSDL, Web Services Description Language; XML, eXtensible Markup Language.

* Corresponding author. School of Chemistry, The University of Manchester, 131 Princess Street, Manchester M1 7DN, UK. Tel.: +44 161 306 4492.
E-mail address: [email protected] (D.B. Kell).
URL: http://www.dbkgroup.org (D.B. Kell).

0022-5193/$ - see front matter © 2007 Elsevier Ltd. All rights reserved.
doi:10.1016/j.jtbi.2007.10.023

1. Introduction

For most of us, including the present authors, Reinhart Heinrich’s great contribution was the independent co-invention of metabolic control analysis (MCA) (Heinrich and Rapoport, 1973, 1974), a crucial forerunner of the modern systems biology in which we recognize in particular that systems are logical graphs or networks, with emergent properties that depend in a highly non-linear manner on the ‘local’ properties of each of the elements of the network. Indeed, MCA is a highly principled version of sensitivity analysis, that has subsequently developed quite independently in other fields (e.g. Lüdtke et al., 2007; Rabitz and Aliş, 1999; Saltelli et al., 2004), a recent development being the fusing of elasticity analysis with flux balance analysis (Smallbone et al., 2007). Although D.B.K. had been traveling to East Berlin, and indeed the Humboldt University, for a couple of years previously (visiting the person who is now his wife), it was at the Il Ciocco meeting in 1989 (Cornish-Bowden and Cárdenas, 1990) that he first met Reinhart, and enjoyed many profound conversations with him over 17 years, including at the Yokohama International Conference on Systems Biology (ICSB) meeting when they discussed science together for the last time. In 1989, of course, there was no Web, and few or no common metabolic modeling packages. At Il Ciocco, P.M., then a fresh graduate, also met Reinhart for the first time, and demonstrated an early version of Gepasi (Mendes, 1993, 1997, 2001; Mendes and Kell, 1998) on an Amstrad computer with no hard disk (Mendes, 1990). P.M. still remembers vividly the excitement that Reinhart displayed for being West of the Iron Curtain; little did we know that this curtain was about to fall within a year, by the time Reinhart organized a scientific meeting in Holzhau. It was also at the ICSB meeting in Yokohama that P.M. last talked with Reinhart.

What meetings like Il Ciocco represent are opportunities for the global network of scientists to interact efficiently, and it is Reinhart’s huge contribution to the world of metabolic network modeling and optimization, within the concept of the global family of science and scientists, that we would wish to acknowledge with pleasure.

Interactions between scientists concerned with modeling and systems biology involve the exchange and analysis of such models, but models could and can be created and represented in many ways that would not be comprehensible without access to all the software (and often the very same computer) that was used to create them. In other words, there were certainly no standards of interoperability between such programs, and it was not until 10 years later in Visegrad (Cornish-Bowden and Cárdenas, 2000; Kell and Mendes, 2000) (http://bip.cnrs-mrs.fr/bip10/meet99.htm), also attended by Reinhart, that such community discussion began in earnest, culminating so far in the massively important Systems Biology Markup Language (SBML) (Hucka et al., 2003) (www.sbml.org). More specifically, a mailing list entitled MMFF (metabolic model file format) was set up by one of us (P.M.) in April 1999, just after Visegrad. The members of this list created a draft specification of a common metabolic file format (PMB; portable metabolic binary format). Most of the subscribers to this list joined the SBML effort in April 2000. The MDL (model definition language) specification draft, a predecessor of SBML, was created in August 2000, as was the sysbio mailing list. By September 2000 (in the first revision of this MDL document), SBML is referred to as such among this community. At all events, it is this last aspect, the importance of a lingua franca (Kell, 2006b), on which we wish first to focus here.

1.1. The medium is the message

Marshall McLuhan’s famous slogan (http://en.wikipedia.org/wiki/The_Medium_is_the_Massage) (e.g. (McLuhan and Fiore, 1971)) (that predates the original ARPANET by 2 years and the widespread civilian use of the Internet by 20) not only made widespread the concept of the global village but is intended to convey the idea that the generic form of a medium is more important than its ‘meaning’ or ‘content’. Consistent with this, it appears (Kell, 2006a) that the initial difficulties of interoperability between modern software systems are much more about data structures (syntax) than about their meaning (semantics) technologies. It is also important to recognize that a model is ‘just that’; it is a representation of reality (Kell and Knowles, 2006), just as is Magritte’s famous painting of a pipe (http://en.wikipedia.org/wiki/The_Treachery_Of_Images).

SBML, now at level 2 version 3, allows one to describe accurately the structure of most models in which there is not a highly complex spatial segregation within a compartment nor combinatorial variants of, e.g., protein phosphorylation states. SBML is an XML (eXtensible Markup Language) that allows one to describe facts in a principled way by marking them up in a standardized ‘language’. As well as allowing the exchange of models, SBML provides a very convenient means for storing them in model databases such as Biomodels (www.biomodels.net) (Le Novère et al., 2006). The present version of SBML also now allows one to mark up the models with rudimentary semantic information. In a similar way, the Semantic Web (e.g. Alesso and Smith, 2006; Baker and Cheung, 2007; Berners-Lee and Hendler, 2001; Berners-Lee et al., 2006; Fensel et al., 2003; Stevens et al., 2006; Taylor et al., 2006), will allow the semantic annotation of such XML documents using the Resource Description Framework (RDF) and Web Ontology Language (OWL), while the Web 2.0 concept (http://en.wikipedia.org/wiki/Web_2) relates to the general increase in social or community involvement in developing a particular domain.

Although it is plausible that RDF will add significantly to XML (Wang et al., 2005) (and SBML L2V3 integrates it now), the great power of SBML (Kell, 2006a, b) is its ability to lie at the focus of a distributed computing model in which loosely coupled software programs allow its integration in pipelines or workflows constructed via a distributed, service-oriented architecture (Foster, 2005; Hey and Trefethen, 2005). This loose integration reflects the thinking behind the Systems Biology Workbench (Sauro et al., 2003), but we consider (Kell, 2006a, b, 2007; Kell and Paton, 2007) that more global distributed workflow environments such as Taverna (Hull et al., 2006; Oinn et al., 2004, 2006, 2007) (www.taverna.sourceforge.net) are likely to provide the environment of choice. An obvious advantage of Taverna is that by using widely adopted interoperability standards (WSDL and Web Services), it can readily make available a large number of diverse services that can be incorporated into systems biology workflows. As usual, the cost of adoption of standards is largely paid for by the benefits that come from compatibility with other software. There are a great many modules that perform useful activities and that can beneficially write to SBML (see, e.g. Kell, 2006b, 2007 and Fig. 1). One example might be Web Services to retrieve bibliographic references that could be useful to annotate models (Ananiadou and McNaught, 2006; Ananiadou et al., 2006), while another whole area covers the optimization, in some sense or senses, of the models themselves (e.g.