Comprehensive Modeling Platform for Photosynthetic Organisms
MASARYK UNIVERSITY
FACULTY OF INFORMATICS

COMPREHENSIVE MODELING PLATFORM FOR PHOTOSYNTHETIC ORGANISMS

THESIS

MATEJ KLEMENT
2012

Contents

1 Introduction
  1.1 Objectives
2 State of the art
  2.1 Data Exchange formats
    2.1.1 SBML
    2.1.2 CellML
    2.1.3 BioPAX
    2.1.4 PSI-MI
    2.1.5 SBGN
    2.1.6 Format of Matlab
    2.1.7 Format Octave
  2.2 Data Exchange and modeling tools
    2.2.1 Biomodels.net
    2.2.2 CellML.org
    2.2.3 Copasi
    2.2.4 Vcell
    2.2.5 E-cell
    2.2.6 ProMot
    2.2.7 PaxTools
    2.2.8 Matlab
    2.2.9 Octave
    2.2.10 Scilab
    2.2.11 BioUML
  2.3 Annotation ontology
    2.3.1 Gene Ontology
    2.3.2 KEGG
    2.3.3 SBO
  2.4 Photosynthesis modeling
3 Aims
  3.1 Theoretical aims
  3.2 Practical aims
  3.3 Methodology
  3.4 Progression schedule
  3.5 Expected Outputs
4 Results
  4.1 Design and specification
    4.1.1 Ontology tree
    4.1.2 Model structure
    4.1.3 Connecting ontology and model
    4.1.4 Annotation database
    4.1.5 Ontology and model annotation
  4.2 Implemented system
  4.3 Conclusion
5 Publications

Chapter 1
Introduction

In recent decades, a great number of computer-driven sciences have emerged as a result of rapid development in microchip technology. One of them is systems biology, a new field of biology that aims at a system-level understanding of biological systems[14]. Molecular biology initially studied biological systems and made remarkable progress in this area, but it has recently focused on identifying genes and the functions of their products, which are the components of those systems. The next major task is to understand, at the system level, the components of biological systems that molecular biology has revealed. Systems biology was established to pursue this long-term task.

While systems biology covers all aspects of analyzing the behavior of system models, computational systems biology addresses only a narrower part of this research. Computational systems biology aims to understand biological systems at the system level by analyzing biological data with computational techniques[16]. Recent rapid advances in genome sequencing projects, microarrays, proteomics and metabolomics have moved this field forward, providing more powerful tools and knowledge for discovering relations and behavior in the data. With systems biology in mind, new sophisticated computational methods are being developed to analyze the data generated by these technologies in a systematic way, deciphering the complex and networked biological processes and phenomena that take place in cells, tissues and organisms. Thanks to recent developments in information technology, cheap and accessible computing power, global networks and databases have become widely available for mathematical modeling and simulation of complex biological systems.
Simulation and modeling combine different system analysis tools, such as discrete mathematics, stochastics, differential equations and complex system simulation, with model-database integration architectures. Creating and testing quantitative models that unravel the hierarchical and non-linear character of cellular systems will be feasible through the cooperative work of theoretical and experimental biologists together with system analysts, computer scientists, mathematicians, engineers and physicists. These long-term efforts demand comprehensive tools for sharing knowledge and data among the participating parties.

The latest trends move away from extremely reduced models and analyses, a shift made possible by the cooperation of large teams of scientists around the world. As a result, large amounts of simulated data are becoming available for thousands of components such as mRNA or proteins. Connecting these simulated data yields compact blocks of cellular machinery in action, from which dynamic models describing these processes can be created. Such comprehensive models explicitly represent a large number of biochemical reactions at a relatively high level of detail. These dynamic models, however, present another challenge, which originates from the transcription of non-linear systems into models: the estimation of numerical parameters. This problem can be solved in an inverse fashion, where simulated data are compared with experiments by sophisticated software that searches for local and global minima in a multidimensional space.

The last decade was fruitful for systems biology and for its formats, languages and the tools that handle them. Thanks to this development, many tools were created and remain in use today. The tools aimed at this field are mostly of a general nature, which means that none of them is dedicated to photosynthesis, its modeling and research.
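The inverse approach to parameter estimation described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and does not come from this work: the model is a single first-order decay dx/dt = -k*x, the "experimental" data are synthetic values generated with k = 0.7, and the search is a simple repeatedly narrowed grid search that stands in for the sophisticated optimization software mentioned in the text.

```python
import math

def simulate(k, times, x0=1.0):
    """Analytic solution of dx/dt = -k*x; a toy stand-in for a dynamic model."""
    return [x0 * math.exp(-k * t) for t in times]

def sum_squared_error(k, times, observed):
    """Objective: squared distance between simulated and observed values."""
    return sum((s - o) ** 2 for s, o in zip(simulate(k, times), observed))

def estimate_rate(times, observed, lo=0.0, hi=5.0, steps=50, rounds=6):
    """Grid search over [lo, hi], narrowed around the best point each round."""
    best = lo
    for _ in range(rounds):
        width = (hi - lo) / steps
        grid = [lo + i * width for i in range(steps + 1)]
        best = min(grid, key=lambda k: sum_squared_error(k, times, observed))
        # Shrink the interval to the neighbourhood of the current best value.
        lo, hi = max(best - width, 0.0), best + width
    return best

# Hypothetical "experiment": synthetic, noise-free data generated with k = 0.7.
times = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0]
observed = simulate(0.7, times)

k_hat = estimate_rate(times, observed)
print(f"estimated k = {k_hat:.6f}")
```

With one parameter and noise-free data the objective has a single minimum, so the narrowing grid search is sufficient here. Realistic models have many parameters and several local minima, which is why dedicated global and local optimizers are used in practice.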
Although photosynthesis research could lead to renewable fuels or artificial oxygen production, current biology concentrates mainly on DNA, mRNA, proteins and the like. Another drawback is that, because of the character of several of its reactions, photosynthesis also belongs to another field, physics. This motivated the formation of the CyanoTeam project, which aims to solve problems of photosynthesis. The project is carried out in cooperation with the PSI company and the Global Change Research Centre AS CR, v.v.i.

The second chapter describes the current state of systems biology, more precisely the ways of handling biological models, the tools that handle these models and the annotations that integrate them into a broader context. The third chapter deals with the aims and objectives to be reached and the steps necessary to reach them. The fourth chapter presents the current results and state of the work and describes the implementation. The fifth chapter lists the publications achieved so far.

1.1 Objectives

The main objective of this work is to create a tool and a methodology that give the parties involved in photosynthesis research a place for the exchange and maintenance of dynamic models and of knowledge about this process. Nedbal et al. created a concept called Comprehensive Modeling Space[18], which describes the fundamental ideas of photosynthesis model specification. This work should propose solutions for the Comprehensive Modeling Space concept and introduce a methodology that provides a set of rules for the correct encoding of modular models and for data composition, and that suggests a naming convention for individual model components (called the Comprehensive Modeling Platform). It should also deliver a practical output in the form of an implemented application covering the visualization, sharing, exploration, maintenance, annotation and dynamic analysis of models on a generally available platform running in the web environment.
These models should support both top-down and bottom-up modeling strategies. Moreover, the solution should support communication with commonly available tools and formats. The main benefit of the whole concept should be its domain-specific focus, which should make it possible to describe and understand the given area of interest better than general tools and approaches allow.

Chapter 2
State of the art

In the last decade, systems biology improved greatly[14] thanks to overall progress in information technology, which enables better and faster sharing of information about the particular processes under examination. Because systems biology is a new science, the areas researched first are those of common interest, mostly DNA, gene profiling and protein-protein interactions. Computational systems biology is concerned with a subgroup of the problems addressed by systems biology and puts stress primarily on systematic data analysis, which stems from the improvement of new technologies.

It is necessary to be able to exchange this newly discovered knowledge among specialists. Several languages for systems biology were created with this intention in mind, to share knowledge and models. Systems biology models are mostly dynamic, meaning that they describe the dynamics of the modeled system in time, and they mainly target the population development of the examined process. The best-known and most common languages are SBML[7] and CellML[21], both of which are based on XML[8]. The advantage of these formats is that they preserve the structure and clarity of models, which is necessary for the cooperation of various teams and for passing on knowledge. There are other formats similar to those mentioned above, such as BioPAX[5], PSI-MI[11] or SBGN[25], which are aimed at a narrower scope. BioPAX was developed for