In Silico Cell Biology and Biochemistry: a Systems Biology Approach
Diogo M. Camacho

Dissertation submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial fulfillment of the requirements for the degree of

Doctor of Philosophy
in
Genetics, Bioinformatics and Computational Biology

Pedro Mendes, Chair
Ina Hoeschele
Reinhard Laubenbacher
Vladimir Shulaev
Brenda Winkel

June 1, 2007
Blacksburg, Virginia

Keywords: Systems biology, computer simulation, mathematical modeling, reverse engineering, computational biology, biochemistry

Copyright 2007, Diogo M. Camacho

Abstract

In the post-‘omic’ era, the analysis of high-throughput data is regarded as one of the major challenges faced by researchers. One focus of this data analysis is uncovering biological network topologies and dynamics. It is believed that this kind of research will allow the development of new mathematical models of biological systems as well as aid in the improvement of already existing ones. The work presented in this dissertation addresses the problem of analyzing highly complex data sets, with the aim of developing a methodology that will enable the reconstruction of a biological network from time series data through an iterative process.

The first part of this dissertation relates to the analysis of existing methodologies that aim at inferring network structures from experimental data. These range from statistical tools such as correlation analysis (presented in Chapter 2) to more complex mathematical frameworks (presented in Chapter 3). A novel methodology that focuses on the inference of biological networks from time series data by least squares fitting is then introduced. Using a set of carefully designed inference rules, one can gain important information about the system, which can aid in the inference process.
The application of the method to a data set from the response of the yeast Saccharomyces cerevisiae to cumene hydroperoxide is explored in Chapter 5. The results show that this method can be used to generate a coarse-level mathematical model of the biological system at hand. Possible developments of this method are discussed in Chapter 6.

This work was financially sponsored by the National Institutes of Health under grant R01-GM068947.

To my parents, to my sister

Acknowledgments

I would like to thank, first and foremost, Dr. Pedro Mendes, my principal advisor, for all his help and guidance throughout my Ph.D.: for providing his insight on the subject at hand, for inciting discussions that would lead to new ideas and new approaches, and for stimulating me to work harder to achieve the goals I set for myself. Secondly, I would like to thank my advisory committee: Dr. Brenda Winkel, Dr. Ina Hoeschele, Dr. Reinhard Laubenbacher and Dr. Vladimir Shulaev. A special thanks also to Dr. Ana Martins, who was always a good friend in this adventure to foreign lands to pursue a career in science.

Many thanks to my fellow countrymen and countrywomen whom I met here in Blacksburg. Angela, Inês, João, Polanah, Beatriz and the German-American-Portuguese crowd Renate, Romy and Katja, this thesis is also a bit yours, as your friendly manners and the joy you always bring to wherever we may be made my days brighter. A warm thank-you note to all my friends back home, in the US, or in other parts of the world, especially Pedro, Nuno and Catarina.

A word of appreciation for everyone who, either in Pedro’s group or in other research groups at the VBI, worked or exchanged ideas with me. To avoid missing anyone, I’ll refrain from naming you all.
A very special thanks to Wei Sha, a cubicle-mate since the day I came to the VBI, for her joy in life and interest in science made our frustrations seem insignificant and stimulated our interest in whatever problem we had to face in the research projects we were involved in on both our dissertations.

To Emily, for being you, for being here when I needed you.

And last but certainly not least, a big big thank you to my parents and to my sister, to whom I dedicate this work, for their unlimited love and support.

Attribution

Pedro Mendes, Ph.D. (Virginia Bioinformatics Institute), now also a faculty member of the Manchester Interdisciplinary Biocentre at the University of Manchester and a Professor at the Computer Science Department of the University of Manchester. Dr. Mendes is the primary advisor and committee chair. Dr. Mendes provided important guidance in all of the projects in which I was involved during my research, from the conception of the ideas to the completion of the project. Dr. Mendes also provided funding that allowed me to pursue my research at Virginia Tech.

Chapter 2

Alberto de la Fuente, Ph.D. (Virginia Bioinformatics Institute), now coordinator of the RAGNO Group at the Center for Advanced Studies, Research and Development in Sardinia, was a graduate student at Vrije Universiteit in Amsterdam working under the guidance of Dr. Mendes. Dr. de la Fuente was responsible for the derivation of the mathematical link between metabolic control analysis and correlations.

Ana Martins, Ph.D. (Virginia Bioinformatics Institute) is a Research Associate working under Dr. Mendes. Dr. Martins was the lab coordinator and the person in charge of experimental setups, from growth conditions to sample collection and preparation for the different techniques to be applied.

Wei Sha, Ph.D. (Virginia Bioinformatics Institute) was a graduate student of Dr. Mendes. Dr. Sha was responsible for the statistical analysis of microarray data.

Joel Shuman, Ph.D.
(Virginia Bioinformatics Institute) is a Metabolomics Specialist and Laboratory Manager working under Dr. Vladimir Shulaev. Dr. Shuman was responsible for the metabolite profiling experiments.

Chapter 3

Paola Vera-Licona, Ph.D. (Department of Mathematics, Virginia Tech), now at the BioMaps Institute at Rutgers University. Dr. Vera-Licona was a graduate student working under Dr. Reinhard Laubenbacher and performed an analysis of Dynamic Bayesian networks and their performance under the conditions of the study performed in this Chapter.

Chapter 4

Abdul Jarrah, Ph.D. (Department of Mathematics, Virginia Tech), is a Research Associate working under Dr. Laubenbacher. Dr. Jarrah provided me with insightful pointers and discussions throughout the development of the method presented.

Brandy Stigler, Ph.D. (Department of Mathematics, Virginia Tech), now employed at the Mathematical Biosciences Institute at Ohio State University. Dr. Stigler was a graduate student working under Dr. Laubenbacher and contributed to this Chapter by discussing the problem of reverse engineering approaches and possible improvements that could be made to such approaches.

Chapter 5

Ana Martins, Ph.D. (Virginia Bioinformatics Institute) is a Saccharomyces cerevisiae expert with a specific emphasis on oxidative stress response and was instrumental not only in the design of experiments (as in Chapter 2) but also in discussing the results and implications in yeast physiology and their validity and impact in the community.

Contents

Acknowledgements
Attribution

1 Introduction
  1.1 Abstract
  1.2 Introduction
  1.3 Systems biology: biology in the post genomic era
    1.3.1 Holism and the birth of systems biology
    1.3.2 The complexity of biology unleashed
    1.3.3 Modules and parts lists
  1.4 Simulation and modeling in the life sciences
    1.4.1 Historical overview
    1.4.2 Bottom-up versus top-down modeling
    1.4.3 Data limitations and modeling
  1.5 Reverse engineering for the “omics”
    1.5.1 Inference of gene regulatory interaction networks
    1.5.2 Availability vs. applicability
  1.6 Data analysis for the “omes”
    1.6.1 Correlation analysis in “omics” research

2 On the origin of strong correlations in metabolomics data
  2.1 Abstract
  2.2 Introduction
  2.3 Methods
    2.3.1 Theoretical
    2.3.2 Computational
    2.3.3 Yeast model expansion
  2.4 Discussion
    2.4.1 Metabolic control analysis and correlations
    2.4.2 Scatter plots and correlation
    2.4.3 Simulations
  2.5 Yeast metabolism: model expansion and correlations
    2.5.1 Simulations and model validation
  2.6 Conclusions

3 Comparison of reverse engineering methods
  3.1 Abstract
  3.2 Introduction
    3.2.1 In silico networks
    3.2.2 Reverse engineering algorithms
    3.2.3 Benchmarking and reverse engineering
  3.3 Model
    3.3.1 Computational
    3.3.2 Genetic perturbations
    3.3.3 Environmental perturbations
    3.3.4 Adding noise
    3.3.5 Data requirements
    3.3.6 Method evaluation: measures of correctness
  3.4 Results
    3.4.1 Gene network
    3.4.2 Artificial biochemical network
  3.5 Discussion

4 Reverse engineering biological networks by least-squares fitting
  4.1 Abstract
  4.2 Introduction
    4.2.1 Reverse engineering gene networks by least squares fitting
  4.3 Methods
    4.3.1 Computational
    4.3.2 Model
    4.3.3 Fitting the model to the data
  4.4 Results and Discussion
  4.5 Conclusions

5 Beyond reverse engineering: applications to experimental data
  5.1 Abstract
  5.2 Introduction
    5.2.1 The yeast response to oxidative stress
  5.3 Methods
    5.3.1 Experimental setup
    5.3.2 Data preparation
    5.3.3 Computational approach
    5.3.4 Yeast regulatory network
  5.4 Results and Discussion
    5.4.1 Revamping the inference rules