Studies on the Topology, Modularity, Architecture and Robustness of the Protein-Protein Interaction Network of Budding Yeast Saccharomyces Cerevisiae
STUDIES ON THE TOPOLOGY, MODULARITY, ARCHITECTURE AND ROBUSTNESS OF THE PROTEIN-PROTEIN INTERACTION NETWORK OF BUDDING YEAST SACCHAROMYCES CEREVISIAE

DISSERTATION

Presented in Partial Fulfillment of the Requirements for The Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By
Jingchun Chen, M.S.

*****

The Ohio State University
2006

Dissertation Committee:
Professor Bo Yuan, Adviser
Professor Tsonwin Hai
Professor Roger Briesewitz
Professor Ralf Bundschuh

Approved by
Adviser
Integrated Biomedical Science Graduate Program

ABSTRACT

In this dissertation, statistical mechanics, graph theory, and machine learning methods have been used to study the topology, modularity, organization and robustness of the protein-protein interaction network of budding yeast Saccharomyces cerevisiae. The protein-protein interaction dataset is obtained by combining high-confidence interactions, and is validated from multiple perspectives. Statistical mechanics is then used to analyze the connectivity distribution, graph spectrum, shortest path distance and clustering coefficients of the network; the analysis indicates that the network is both scale-free and modular.

Microarray gene expression profiles are used to compute the weight for each interaction, and the network is represented as a weighted undirected graph. An edge betweenness-based algorithm is developed and applied to the graph, and a set of functional modules is identified in the network. The functional modules are then validated rigorously against gene annotation, growth phenotype and protein complexes. It is found that genes in the same functional module exhibit similar deletion phenotypes, and that known protein complexes are largely contained in the functional modules.

To uncover the organization of the yeast proteome network, the relationship between the gene expression profiles of hubs and their interacting proteins is analyzed. The results indicate that subpopulations of hubs exist in the yeast proteome network, which are classified as core, local and global hubs. By examining these hub populations from the perspectives of protein complexes, interaction overlap, clustering coefficients, module connectivity, and visualization, it is found that global hubs form the backbone of module-module interaction, while core hubs are organizers within functional modules. In addition, analysis of the interactions between the hubs indicates that each of the three types of hubs preferentially interacts with hubs from the same population, which suggests an ordered architecture for the network and the existence of a central processing subnetwork at both the global and the functional module level. Gene expression changes of the hub populations in cellular responses are then analyzed to gain insights into the dynamics of module-module interactions, and the results suggest that global hubs are the major and early responders in cellular response.

Next, network breakdown simulation and graph spectrum are used to examine the contributions of each hub population to the robustness of the yeast proteome network. The results indicate that network organizers contribute most to the robustness at both global and local levels. Finally, it is found that genes contributing most to the robustness of functional modules, not that of the entire network, are more likely to be essential.

Dedicated to my parents

ACKNOWLEDGMENTS

I wish to thank my adviser, Professor Bo Yuan, for his enthusiasm about science, scientific guidance, professional advice, intellectual assistance, and both financial and personal support. Without any of these, this dissertation would never have been completed.

I wish to thank the members of my dissertation advisory committee, Dr. Tsonwin Hai, Dr. Roger Briesewitz, and Dr. Ralf Bundschuh, for their advice, help and critical evaluation of this dissertation research.

I am indebted to Solomon Gibbs, who helped me tremendously both academically and personally during the first three years of this research.

I wish to thank Russell Sears, Fa Zhang, and all other colleagues for their help and stimulating discussions on this dissertation research.

I also wish to thank those nameless people who made their software available to the public free of charge, which greatly enhanced this dissertation research.

I am most grateful to my wife, Xu Huang, for her unconditional support, enduring belief and incredible patience during the course of my scientific pursuit.

I am in deep debt to all my family whose love has made this dissertation more meaningful.

VITA

March 25, 1971 .......... Born – Nanchong, Sichuan Province, China
June 1993 ............... B.S. Biochemistry, Sichuan University
June 1996 ............... M.S. Cell Biology, Sichuan University
March 2001 .............. M.S. Biochemistry, The Ohio State University
August 2002 ............. M.S. Computer Science, Wright State University
1998 – 1999 ............. Teaching Associate, The Ohio State University
1999 – 2001 ............. Research Associate, The Ohio State University
2001 – 2002 ............. Research Associate, Wright State University
2002 – present .......... Research and Teaching Associate, The Ohio State University

PUBLICATIONS

1. Chen, J. and Yuan, B. (2006). "Detecting functional modules in the yeast protein-protein interaction network." Bioinformatics, in press.
2. Zhang, F., Liu, Z., Chen, J. and Yuan, B. "The construction of structural templates for the modeling of conserved protein domains." International Conference on Bioinformatics and its Applications (ICBA'04), December 16-19, 2004, Fort Lauderdale, Florida, USA.
3. Ozer, H., Chen, J., Zhang, F. and Yuan, B. "Clustering of eukaryotic orthologs using the Markov graph-flow algorithm." International Conference on Bioinformatics and its Applications (ICBA'04), December 16-19, 2004, Fort Lauderdale, Florida, USA.
4. Okamoto, Y., Chaves, A., Chen, J. et al. (2001). "Transgenic mice with cardiac-specific expression of activating transcription factor 3, a stress-inducible gene, have conduction abnormalities and contractile dysfunction." American Journal of Pathology, 159(2): 639-650.
5. Chen, J. and Shi, A. (1998). "Cytological observation on the fertilization of Anodonta woodiana (Elliptica)." Journal of Fisheries of China, 22(1): 78-80.
6. Shi, A. and Chen, J. (1997). "Mussel breeding and pearl cultivation." Sichuan Sci. & Tech. Press, pp. 1-36.
7. Chen, J. and Shi, A. (1996). "Malacozoan immunobiology research: A review." Acta Hydrobiologia Sinica, 20(1): 74-78.
8. Zhou, L., Li, J., Zheng, Y. and Chen, J. (1995). "Purification and partial characterization of Endo-Polygalacturonase from commercial pectinase of Aspergillus niger." Chinese Biochemical Journal, 11(4): 446-451.

FIELDS OF STUDY

Major Field: Integrated Biomedical Science Graduate Program

TABLE OF CONTENTS

Abstract
Dedication
Acknowledgments
Vita
List of Tables
List of Figures

Chapters:

1. Introduction
2. Data integration and validation
   Materials and methods
   Results
   Discussion
3. Network topology
   Materials and methods
   Results
   Discussion
4. Module detection and validation
   Materials and methods
   Results
   Discussion
5. Module organizations
   Materials and methods
   Results
   Discussion
6. Network robustness
   Materials and methods
   Results
   Discussion
7. Conclusions and future directions

Bibliography
Appendix A

LIST OF TABLES

Table
2.1. The reliability of representative high-throughput methods
2.2. The datasets of three representative studies
2.3. The protein coverage of the dataset among the functional categories
3.1. The diameter and shortest path length of the networks
3.2. The clustering coefficients of the networks
5.1. Three types of hubs
5.2. The number of unique interactors in each hub population

LIST OF FIGURES

Figure
2.1. The protein coverage of the dataset among the functional categories
2.2. The interaction intensities between the functional categories of the CYGD annotation system
2.3. The interaction intensities between the functional categories of GO annotation system
2.4. Boxplot for the functional similarities between interacting proteins
2.5. The interaction intensities between the localization categories of CYGD database
2.6. The interaction intensities between the localization categories of SGD database
2.7. The scale-free fitting of the datasets
3.1. The connectivity distribution of the networks