A Systemic Multi-Scale Unified Representation of Biological Processes in Prokaryotes Vincent J
Total Page:16
File Type:pdf, Size:1020Kb
Henry et al. Journal of Biomedical Semantics (2017) 8:53 DOI 10.1186/s13326-017-0165-6 RESEARCH Open Access The bacterial interlocked process ONtology (BiPON): a systemic multi-scale unified representation of biological processes in prokaryotes Vincent J. Henry 1,2†, Anne Goelzer2*† , Arnaud Ferré1, Stephan Fischer2, Marc Dinh2, Valentin Loux2, Christine Froidevaux1 and Vincent Fromion2 Abstract Background: High-throughput technologies produce huge amounts of heterogeneous biological data at all cellular levels. Structuring these data together with biological knowledge is a critical issue in biology and requires integrative tools and methods such as bio-ontologies to extract and share valuable information. In parallel, the development of recent whole-cell models using a systemic cell description opened alternatives for data integration. Integrating a systemic cell description within a bio-ontology would help to progress in whole-cell data integration and modeling synergistically. Results: We present BiPON, an ontology integrating a multi-scale systemic representation of bacterial cellular processes. BiPON consists in of two sub-ontologies, bioBiPON and modelBiPON. bioBiPON organizes the systemic description of biological information while modelBiPON describes the mathematical models (including parameters) associated with biological processes. bioBiPON and modelBiPON are related using bridge rules on classes during automatic reasoning. Biological processes are thus automatically related to mathematical models. 37% of BiPON classes stem from different well-established bio-ontologies, while the others have been manually defined and curated. Currently, BiPON integrates the main processes involved in bacterial gene expression processes. Conclusions: BiPON is a proof of concept of the way to combine formally systems biology and bio-ontology. The knowledge formalization is highly flexible and generic. Most of the known cellular processes, new participants or new mathematical models could be inserted in BiPON. Altogether, BiPON opens up promising perspectives for knowledge integration and sharing and can be used by biologists, systems and computational biologists, and the emerging community of whole-cell modeling. Keywords: Systems biology, Multi-scale systemic description, Prokaryotic biological processes, Mathematical models, Biological ontology * Correspondence: [email protected] †Equal contributors 2INRA, UR1404, MaIAGE, Université Paris-Saclay, Jouy-en-Josas, France Full list of author information is available at the end of the article © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Henry et al. Journal of Biomedical Semantics (2017) 8:53 Page 2 of 16 Background general chemicals according to their chemical structures Systems biology emerged as a promising framework to and modifications [20]. The Systems Biology Ontology integrate the whole-cell for different model-organisms (SBO) provides a controlled vocabulary for kinetic param- [1–3]. However, current cell representations usually refer eters and mathematical models of biological processes to specific model organisms, which limits in practice the [21]. Taken together, the existing bio-ontologies cover the transfer of whole-cell models to non-model organisms. concepts necessary to the systemic representation of cells, In contrast, bio-ontologies are a suitable framework for i.e., biological processes, molecules and mathematical systematically describing biological objects and thus fa- models of biological processes. However, the systemic rep- cilitating knowledge transfer among organisms [4, 5]. In resentation of the whole cell cannot be handled without this paper, we address the following question: how to the addition of further logical relations between existing combine systems biology and bio-ontology? ontologies. Systems biology has its roots in engineering science and In this paper, we demonstrates that a systemic multi- conceptualizes the cell as a system composed of interacting scale representation of biological processes, the typical sub-systems [1, 6–11]. In this context, cellular processes are perspective of systems biology, can be formally described typically described as biological subsystems whose inputs as an ontology, and how this ontology can be built based (e.g. metabolites, proteins, or sequences, etc.) are converted on existing sparse bio-ontologies. As a proof of concept, into outputs by dedicated molecular machines. The mo- we developed the Bacterial interlocked Process ONtol- lecular machines are usually composed of proteins, con- ogy (BiPON) and showed that a) heterogeneous bio- sume energy and chemical building blocks, and display a logical processes can be described with the systemic characteristic of operation. This operation can be static or representation and b) be linked automatically to math- dynamic, deterministic or/and stochastic and is generally ematical models, and that c) information about these described by a formal mathematical model having inputs, processes can be enriched by automatic reasoning. As a outputs and model parameters. For example, a mathemat- use case, we focus on bacterial gene expression pro- ical model can be a nonlinear function or a set of ordinary cesses, which are well established and representative of differential equations. The systemic representation of cells known biological processes. They cover, among many is an efficient framework to interrelate all cellular entities other things, combination of polymers, sequence pat- (metabolites, proteins, cellular processes, sequences, etc.), terns, single molecules or complexes within biological together with their physical or biochemical properties (e.g. processes, as well as cyclic or branched-point processes. kinetic parameters, etc.) [1, 2].Systembiologiststhusneed We demonstrated on the use case how a systemic repre- now an adequate format of systemic description of the sentation of living cells can be formally described and in- whole cell to transfer and share their models. Existing stan- tegrated into an ontological model, and what benefits dardized formats for file exchange are adequate to exchange ensue from automatic reasoning on this ontology. mathematical models for specific cell processes [12, 13], but remain limited to describe a whole-cell model, i.e. a sys- Methods temic multi-scale representation of interacting complex Description of biological processes, corpus building and subsystems. entity tagging Bio-ontologies have been developed to formalize and In the absence of an exhaustive controlled vocabulary in integrate different pieces of biological knowledge [4]. systems biology, we use hereafter the notion of a “bio- The well-established Gene Ontology (GO) integrates the logical process”, which comprises the notions of (a) “bio- molecular functions of gene products (GO-MF) with cel- logical reaction” and “biochemical reaction” as in KEGG lular components (GO-CC) and biological processes (Kyoto Encyclopedia of Genes and Genomes [22]) Reac- (GO-BP) [14]. The combined sub-ontologies are com- tions database, (b) “biological phenomenon”, “biological monly used to annotate and characterize gene products pathway” and “biochemical pathway” as in PW or KEGG [5, 15], but there are also other useful bio-ontologies. Pathway database, and finally (c) “biological process” as The Ontology of Microbial Phenotypes links the pheno- in GO-BP. Moreover, we use the notion of a “chemical types of bacteria to cellular processes [16]. The Ontology entity” to denote any type of biological compound, in- of Genes and Genomes provides a list of genes from dif- cluding metabolites, proteins, protein complexes, poly- ferent organisms including prokaryotes [17], while the Se- mers, to cite a few. quence Ontology (SO) provides a detailed description of To develop a dedicated systemic representation for polymers and polymer sequence patterns [18]. At another each biological process involved in the bacterial gene ex- level, the Pathway Ontology (PW) provides a classification pression, we applied the standard state-of-art approach of metabolic, signaling and altered eukaryotic pathways of system engineering. The approach involves two main [19]. Independently, ChEBI (Chemical Entities of Bio- tasks. (A) We first gathered up-to-date available bio- logical Interest) acts as a reference for the classification of logical information about the biological process. (B) We Henry et al. Journal of Biomedical Semantics (2017) 8:53 Page 3 of 16 then converted the biological information into a sys- Elementary steps composing a biological process are temic representation using boxes, arrows, inputs and usually found in research articles while books or re- outputs, and a mathematical model. We describe and view articles provide global descriptions of processes. apply below the approach (A) and (B) on a specific ex- In a few cases, we used figures from didactic web ample (the formation of the 30S initiation complex) for sites and we checked the