ANALYZING AND MODELING LARGE BIOLOGICAL NETWORKS: INFERRING SIGNAL TRANSDUCTION PATHWAYS

by

GÜRKAN BEBEK

Submitted in partial fulfillment of the requirements for the Degree of Doctor of Philosophy

Electrical Engineering and Computer Science Department
CASE WESTERN RESERVE UNIVERSITY
January 2007

CASE WESTERN RESERVE UNIVERSITY
SCHOOL OF GRADUATE STUDIES

We hereby approve the dissertation of Gürkan Bebek, candidate for the Ph.D. degree*.

Jiong Yang (signed) (chair of the committee)
S. Cenk Sahinalp
Tekin Ozsoyoglu
Mark Adams
Jing Li

January 12, 2007 (date)

*We also certify that written approval has been obtained for any proprietary material contained therein.

to Gamze...

Contents

List of Tables
List of Figures
1 Introduction
1.1 Background
1.1.1 Graph Theoretic Definitions
1.1.2 Signal Transduction Pathways
1.1.3 Protein-Protein Interactions
1.1.4 Discovery of Protein-Protein Interactions
1.2 Contributions
2 Evolutionary Models of Proteome Networks
2.1 Biological Networks
2.1.1 The Evolution of Protein-Protein Interactions
2.1.2 Random Network Models
2.1.3 Properties of Networks
2.2 Proteome Growth Model
2.3 Analysis of the Proteome Growth Model
2.3.1 Properties of the pure duplication model
2.3.2 On the degree distribution of the proteome growth model
2.4 Discussion
3 Enhanced Duplication Model
3.1 Sequence Similarity Distribution in the Yeast Proteome
3.2 Enhanced Model Based on Sequence Similarity
3.3 Discussion
4 Discovering Signaling Pathways: PathFinder
4.1 PathFinder
4.1.1 Preliminary
4.2 Methods
4.2.1 Mapping Proteins to Functional Annotations
4.2.2 Mining Association Rules from Known Pathways
4.2.3 Constructing a Weighted Protein-Protein Interaction Network
4.2.4 Searching for Pathway Segments
4.3 Experiments on the Yeast Proteome Network
4.4 Discussion
5 Conclusions and Reflections
Bibliography

List of Tables

3.1 The average clustering coefficients of the DIP Protein-Protein Interaction Network, Proteome Growth Model, and the Enhanced Model
4.1 Binary Table Example
4.2 PathFinder Search Results

List of Figures

2-1 ℓ-hop
2-2 Percentage of singletons in the pure duplication model
2-3 Average degree of non-singleton nodes in pure duplication model
3-1 Degree distribution of the Yeast and the proteome growth model interaction networks
3-2 ℓ-hop degree distribution comparison of the Yeast and Proteome Growth Model
3-3 Distribution of pairwise sequence similarity of yeast proteins
3-4 Aggregate distribution of pairwise sequence similarity of yeast proteins
3-5 Enhanced Model Based on Sequence Similarity
3-6 Degree distribution of the proteome sequence similarity networks
3-7 Degree distribution of the interaction networks
3-8 ℓ-hop degree distribution of the yeast, proteome growth model and the sequence similarity enhanced model
4-1 MAP Kinase Pathways
4-2 PathFinder
4-3 Two interacting proteins and their linked annotation terms
4-4 Association Rule Mining Parameters
4-5 PathFinder Ste7-Dig2 Simple Path Results
4-6 PathFinder Ste7-Dig2 Signaling Pathway Segment Results
4-7 The Pheromone Response Signaling Pathway
4-8 The High Osmolarity Signaling Pathway

Acknowledgements

It is with great pleasure that I thank those who have helped me in my Ph.D. studies.

I would like to acknowledge Dr. S. Cenk Şahinalp for recruiting me as a graduate student, and for his guidance throughout my education. After his move to Vancouver, Canada, he offered me his continued help in finishing this program, both financially and academically. I have learned a great many skills from him, and I will remain a supporter of Dr. Şahinalp after my graduation.

I am very grateful to Dr. Jiong Yang for agreeing to take over my advisory duties and for helping me accelerate my studies. I appreciate his financial support during my last years and his guidance throughout my studies since he moved to Case. His guidance on finding interesting problems and accurate approaches deserves mention here. I also would like to thank him for being my dissertation committee chair.

I would like to express my gratitude to Prof. Meral Özsoyoğlu and Prof. Tekin Özsoyoğlu for their help and guidance during these last five years. It has always been an inspiration to see their academic achievements. I especially would like to mention Prof. Tekin Özsoyoğlu's support and priceless advice during my last year of study.

I would like to thank Prof. Tekin Özsoyoğlu, Dr. Mark Adams, and Dr. Jing Li for being on my dissertation committee. I deeply appreciate their input to this dissertation and my research.

Soon after I met my wife, I was privileged to be introduced to the Wise family, to whom I am eternally indebted, as I have gained so much from them. I always feel welcome among them, and I am happy to make them proud by finishing this degree. Here, I would like to mention Mrs.
Marilyn Wise for her support in every aspect of my life and for sharing her spiritual enlightenment with me. I appreciate her being my mother here in the United States. I also would like to acknowledge the moral support of Mr. Jonathon K. Wise and Ms. Cheryl Davis. Mr. Jonathon K. Wise has been a great role model, whom my wife and I respect and always look to for guidance.

I would like to acknowledge my lab friends, Can Alkan, Emre Karakoç, and Eray Tüzün. Although we have been separated by moves and graduations, they were a great support in this accomplishment. Also, I appreciate Mr. Brendan Eliott for proofreading my dissertation.

Finally, I appreciate more than anything the support and understanding of my beautiful wife, Gamze, throughout my Ph.D. program. I cannot express enough how thankful I am for her encouragement, help and endless patience. Without her I would not have finished this study.

Gürkan Bebek, Ph.D.
August 2006

Analyzing and Modeling Large Biological Networks: Inferring Signal Transduction Pathways

Abstract

by

Gürkan Bebek

Large-scale two-hybrid screens have generated a wealth of information describing potential protein-protein interactions (PPIs). When interacting proteins are associated with each other to generate networks, a map of the cell, picturing potential signaling pathways and interactive complexes, is formed.

PPI networks satisfy the small-world property, and their degree distributions follow a power law. Recently, duplication-based random graph models have been proposed to emulate the evolution of PPI networks and to satisfy these two graph-theoretic properties. In this work, we show that the previously proposed model of Pastor-Satorras et al. (2003) does not generate a power-law degree distribution with exponential cutoff as claimed, and that the more restrictive model by Chung et al. (2003) cannot be interpreted unconditionally.
It is possible to slightly modify these models to ensure that they generate a power-law degree distribution. However, even after this modification, the more general ℓ-hop degree distribution achieved by these models, for ℓ > 1, is very different from that of the yeast proteome network. We address this problem by introducing a new network growth model that takes into account the sequence similarity between pairs of proteins as well as their interactions. The new model captures the ℓ-hop degree distribution of the yeast PPI network for all ℓ > 0, as well as the immediate degree distribution of the sequence similarity network.

We further utilize the PPI networks to discover possible pathway segments. Discovering signal transduction pathways has been an arduous problem, even with the use of systematic genomic, proteomic and metabolomic technologies. Interpreting and processing the enormous amount of data is a challenging computational problem.

In this work we present a new framework to identify signaling pathways in PPI networks. Our goal is to find biologically significant pathway segments in a given interaction network. First, we discover association rules based on known signal transduction pathways and their functional annotations. Given a pair of starting and ending proteins, our methodology returns candidate pathway segments between these two proteins. These candidate pathway segments are further filtered by their gene expression levels. In our study, we used the S. cerevisiae interaction network and microarray data to successfully reconstruct signal transduction pathways in yeast.

Chapter 1

Introduction

Aristotle (384-322 B.C.) is known as the originator of the scientific study of life. Aristotle himself wrote around 146 books on the subject. Throughout the past 24 centuries