Bridging Methodological Gaps in Network-Based Systems Biology

Christopher L. Poirel

Dissertation submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial fulfillment of the requirements for the degree of

Doctor of Philosophy in Computer Science

T. M. Murali, Chair
Ananth Grama
Narendran Ramakrishnan
John Tyson
Anil Vullikanti

August 30, 2013
Blacksburg, Virginia

Keywords: Computational Biology, Functional Enrichment, Graph Theory, Network, Random Walk, Signaling Pathways, Top-Down Analysis

Copyright 2013, Christopher L. Poirel

Bridging Methodological Gaps in Network-Based Systems Biology

Christopher L. Poirel

(ABSTRACT)

Functioning of the living cell is controlled by a complex network of interactions among genes, proteins, and other molecules. A major goal of systems biology is to understand and explain the mechanisms by which these interactions govern the cell's response to various conditions. Molecular interaction networks have proven to be a powerful representation for studying cellular behavior. Numerous algorithms have been developed to unravel the complexity of these networks. Our work addresses the drawbacks of existing techniques. This thesis includes three related research efforts that introduce network-based approaches to bridge current methodological gaps in systems biology.

i. Functional enrichment methods provide a summary of biological functions that are overrepresented in an interesting collection of genes (e.g., highly differentially expressed genes between a diseased cell and a healthy cell). Standard functional enrichment algorithms ignore the known interactions among proteins. We propose a novel network-based approach to functional enrichment that explicitly accounts for these underlying molecular interactions. Through this work, we close the gap between set-based functional enrichment and topological analysis of molecular interaction networks.

ii. Many techniques have been developed to compute the response network of a cell. A recent trend in this area is to compute response networks of small size, with the rationale that only part of a pathway is often changed by disease and that interpreting small subnetworks is easier than interpreting larger ones. However, these methods may not uncover the spectrum of pathways perturbed in a particular experiment or disease. To avoid these difficulties, we propose to use algorithms that reconcile case-control DNA microarray data with a molecular interaction network by modifying per-gene differential expression p-values such that two genes connected by an interaction show similar changes in their gene expression values.

iii. Top-down analyses in systems biology can automatically find correlations among genes and proteins in large-scale datasets. However, it is often difficult to design experiments from these results. In contrast, bottom-up approaches painstakingly craft detailed models of cellular processes. However, developing the models is a manual process that can take many years. These approaches have largely been developed independently. We present Linker, an efficient and automated data-driven method that analyzes molecular interactomes. Linker combines teleporting random walks and k-shortest path computations to discover connections from a set of source proteins to a set of target proteins. We demonstrate the efficacy of Linker through two applications: proposing extensions to an existing model of cell cycle regulation in budding yeast and automated reconstruction of human signaling pathways. Linker achieves superior precision and recall compared to state-of-the-art algorithms from the literature.

Dedicated to the memory of my late grandfather Weston "Posh" Stevens for providing a lifetime of love, leadership, and laughter.

Acknowledgments

My deepest appreciation goes to T. M. Murali for inviting me to join his research group at Virginia Tech (VT) in 2009.
Murali exposed me to the wonderful world of systems biology and the exciting open graph-theoretical challenges therein. His philosophy of teaching by example has greatly impacted my research. Through his mentorship, Murali demonstrated that careful attention to detail is necessary to accomplish high quality, reproducible research. I am greatly appreciative of his persistent words of encouragement and sound advice.

Working with Murali introduced me to John Tyson. John's encyclopedic knowledge of dynamic systems (especially in yeast) has been an indispensable resource. His biological modeling expertise provided a fresh perspective to the challenging problems tackled in this thesis, and conversations with John had a profound influence on the algorithms we developed.

Naren's discussions of machine learning and knowledge discovery have greatly impacted my development and enthusiasm as a researcher. His presentation of difficult concepts in data mining has been incredibly inspiring. Naren's data mining seminars and open discussion teaching style were wonderful experiences and helped me to explore some of my own research questions with new approaches.

When I first joined the PhD program at VT, I had the pleasure of working with Anil as a graduate teaching assistant for his course on theory of algorithms. Anil's expertise in graph theory and algorithm analysis was a vital resource for ensuring this work is theoretically sound. I routinely pitched ideas to Anil and always received honest, valuable feedback.

I am grateful to have Ananth as an external committee member. Ananth's research addresses challenges in computational biology similar to those expressed in this thesis, and his work has inspired my own. I am thankful for Ananth making special arrangements to accommodate all of my committee meetings. He graciously set up a temporary office at the airport for my research defense; fortunately, he made it through security during the meeting.
I have been blessed with an overflowing fountain of support from my family. I express my deepest gratitude to my parents, Dennis Poirel and Sandra Stevens-Poirel, for providing a lifetime of love and encouragement. My parents stimulated my enthusiasm for academics when I was young and have been unquestioningly supportive of my decisions throughout my research career. They have always worked hard to ensure I received the best available education and experiences. My mother's determination to finish her PhD much later in life has provided me with a special source of inspiration. Thanks to my brother Tony for encouraging me in an unspoken way that only a big brother can. Thanks to Jim Hyatt for his unconditional support.

My grandparents, Weston "Posh" and Emma "Mos" Stevens, provided continual reminders of their love from back home. Thanks to Mos for routinely shipping boxes of homemade goods overnight; the care packages you and Mom put together truly made me feel at home. Posh will forever hold a special place in my heart for his calm and unending words of reassurance.

I am indebted to the love of my life Brooke Cox for her support throughout graduate school. She attentively listened to my ramblings when courses, coding, and manuscript submissions simply did not proceed as planned. Spending time with Brooke and our pooches always offered a refreshing respite during the most stressful times. A unique thanks goes to Rusty and Huck for their unremitting attention.

Thanks to my wonderful circle of friends in Blacksburg, many of whom participated in our weekly Drinking Club, which served as a revitalizing break from the long hours in Torgersen Hall. I have forged too many friendships through my studies in Blacksburg to explicitly name them all here. However, I would be remiss not to thank Joseph Turner for our numerous enlightening discussions over coffee, beer, and board games.
Thanks to Kirk Cameron for always offering your unabashedly honest advice and for the internship opportunity at MiserWare.

This work would not have been possible without contributions from an excellent supporting cast. Murali was intimately involved with every chapter. It was a pleasure working with Chris Lasher to write Chapter 2 on response networks. Conley Owens made significant initial progress on our algorithm for network-based functional enrichment (Chapter 3), and I enjoyed working with him to complete the project. Thanks to Ahsanur Rahman, Richard Rodrigues, Arjun Krishnan, and Jacqueline Addesa for their contributions to the (many) submissions and revisions of our work on network reconciliation (Chapter 4). Thanks to Richard Rodrigues, Jean Peccoud, David Ball, Neil Adames, John Tyson, Kathy Chen, and Pavel Kraikivski for the numerous suggestions during our weekly HGTV meetings that greatly improved Linker and its application to the yeast cell cycle (Chapter 5). Thanks to Naveed Massjouni, David Badger, and Craig Estep for developing and maintaining GraphSpace. This work is unpublished but facilitated seamless collaboration with other research groups and expedited the analyses in Chapters 5 and 6. Thanks to Anna Ritz, Hyunju Kim, and Allison Tegge for their contributions to the work on automated reconstruction of human signaling pathways (Chapter 6). It was my pleasure to work with a visiting undergraduate, Nina Blanson, during the summer of 2013 on extensions to the shortest paths algorithm used by Linker; Nina is a promising young researcher with a bright future.

A special thanks goes to past and present members of Murali's research group for their insightful feedback and constructive criticisms. I wish all of my collaborators the best in their future endeavors.

I am thankful for financial