Constructing and Analyzing Biological Interaction Networks for Knowledge Discovery

Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By
Duygu Ucar
Graduate Program in Computer Science and Engineering

The Ohio State University
2009

Dissertation Committee:
Srinivasan Parthasarathy, Advisor
Yusu Wang
Umit Catalyurek

© Copyright by Duygu Ucar 2009

ABSTRACT

Many biological datasets can be effectively modeled as interaction networks where nodes represent biological entities of interest such as proteins, genes, or complexes and edges mimic associations among them. The study of these biological network structures can provide insight into many biological questions including the functional characterization of genes and gene products, the characterization of DNA-protein bindings, and the understanding of regulatory mechanisms. Therefore, the task of constructing biological interaction networks from raw data sets and exploiting information from these networks is critical, but is also fraught with challenges. First, the network structure is not always known a priori; the structure should be inferred from raw and heterogeneous biological data sources. Second, biological networks are noisy (containing unreliable interactions) and incomplete (missing real interactions), which makes the task of extracting useful information difficult. Third, these networks typically have non-trivial topological properties (e.g., uneven degree distribution, small world) that limit the effectiveness of traditional knowledge discovery algorithms. Fourth, these networks are usually dynamic, and investigation of their dynamics is essential to understand the underlying biological system.

In this thesis, we address these issues by presenting a set of computational techniques that we developed to construct and analyze three specific types of biological interaction networks: protein-protein interaction networks, gene co-expression networks, and regulatory networks.

Dedicated to my mother, who gave me the courage for the journey to the PhD. I wish she were here to see the end of the journey.

Acknowledgments

I would like to first thank my advisor Dr. Srinivasan Parthasarathy for his invaluable support and guidance throughout my years at OSU. Starting from my first months at OSU, he supported me greatly and provided a stimulating environment for my academic studies. I am grateful for his openness to new ideas and fields including Bioinformatics. I would also like to thank Dr. Yusu Wang, Dr. Ramana Davuluri, and Dr. Umit Catalyurek for serving on my candidacy and defense committees and for providing me with invaluable insights and suggestions. I would also like to thank my collaborators Dr. Hakan Ferhatosmanoglu and Dr. Fatih Altiparmak for their assistance in my work with the gene external similarity.

I would also like to thank and acknowledge my collaborators outside OSU. First, I would like to thank Dr. Christopher Workman for supporting my study at the Technical University of Denmark and for providing a very motivating and friendly environment for me in Denmark. I would like to further thank Dr. Workman and Dr. Andreas Beyer for all their domain expertise and guidance in my work with regulatory networks. I would also like to thank Dr. Rui-Ru Ji for hosting me at Bristol-Myers Squibb as a research intern and for her efforts in our work and in arranging my accommodation and transportation in New Jersey.
I would like to acknowledge the National Science Foundation and Department of Energy for supporting this work in part through the following grants: NSF CAREER Grant IIS-0347662, NSF SGER Grant IIS-0742999, NSF RI CNS-0403342, and DOE DE-FG02-04ER25611. Any opinions, findings, and conclusions expressed in this dissertation are those of the author, her advisor, and collaborators, and do not necessarily reflect the views of the National Science Foundation or the Department of Energy.

I would like to thank my friends and colleagues at the Data Mining Research Lab (DMRL) at OSU. I heartily thank my friend Sitaram Asur for the long hours we spent discussing ideas, writing papers, and sharing our passion for food and travel. I also want to thank Shirish Tatikonda for always being so positive and genuine. I want to thank Hui Yang for being very supportive during my first years at OSU. She has been a great mentor and a very caring friend to me. I also want to thank other friends at the DMRL for sharing so many things: Amol Ghoting, Greg Buehrer, Matthew Otey and Matthew Goyder, Sameep Mehta, Venu Satuluri, Xintian Yang, and Ye Wang. It was a great pleasure to be a member of this motivating and friendly group.

I am more thankful than I can express to my family. Their patience and constant support through my studies kept me going. Special thanks are owed to my dear brother Utku Ucar for sharing my apartment, my ideas, and my life in Columbus during the last year of my PhD. It was a great pleasure to have him here as a friend and as a caring and loving family member.

My PhD studies would not have come to an end without the constant support of my friends. I am grateful to my dear friend Zulal Fazlioglu Akin for being such a good listener and a supporter. I thank her for her advice and enlightening thoughts in many difficult situations. I also want to thank Yigit and Zulal Akin for sharing so many dinners, conversations, and laughs with me over the years at Crane's.
It will always be remembered as a place filled with joy and friendship. I would like to thank Sahika Vatan Korkmaz for always listening to me and cheering me up. I also want to thank Gokhan Korkmaz for his friendship and for graduating a year ahead of Sahika. The days we spent with Sahika in 2007 were among the best in my life. I would also like to thank my dear friend Hasibe Otter for her sisterly support and for taking such good care of me during her years in Columbus. Hasibe, Thomas, Artun, and Timon always made me feel at home and welcomed me to the joyful Otter family. I would also like to thank my friends Arif and Hulya Cetintas for their friendship and for the many hours we spent together in Columbus. And I would like to thank my friends in Turkey, whom I see in person only once in a while, for being there with love, support, and encouragement.

Last of all, I want to give special thanks to my friend and partner Emre Sencer for his continuous support during my studies and for his joyful existence in my life. I am more thankful than I can say to share my life with such a loving and caring person and to have my share of his colorful stories, his appetite for ethnic food, his history lessons, and his passion for travel. I had the privilege to learn a lot from him in a very broad range of topics, though none of those are relevant enough to be included in this dissertation.

VITA

July 27, 1980: Born – Corum, Turkey
May 2003: B.S. Computer Engineering, Bilkent University, Turkey
September 2007: M.Sc. Computer Science & Engineering, The Ohio State University
2003 - 2009: Graduate Teaching/Research Associate, The Ohio State University
June 2007 - Sept 2007: Research Intern, Bristol-Myers Squibb
January 2008 - June 2008: Guest Researcher, Technical University of Denmark

PUBLICATIONS

Research Publications

Duygu Ucar, Fatih Altiparmak, Hakan Ferhatosmanoglu, and Srinivasan Parthasarathy. Mutual Information Based Extrinsic Similarity for Microarray Analysis. International Conference on Bioinformatics and Computational Biology, BiCOB, 2009.

Duygu Ucar, Andreas Beyer, Christopher T. Workman, and Srinivasan Parthasarathy. Predicting functionality of protein-DNA interactions by integrating diverse evidence. Bioinformatics, Volume 25:12, pages 137-144, 2009.

Duygu Ucar, Andreas Beyer, Christopher T. Workman, and Srinivasan Parthasarathy. Predicting functionality of protein-DNA interactions by integrating diverse evidence. In the Proceedings of the 17th Annual International Conference on Intelligent Systems for Molecular Biology, ISMB, 2009.

Duygu Ucar, Isaac Neuhaus, Petra Ross-MacDonald, Charles Tilford, Srinivasan Parthasarathy, Nathan Siemers, and Rui-Ru Ji. Construction of a Reference Gene Association Network from Multiple Profiling Data: Application to Data Analysis. Bioinformatics, Volume 23:20, pages 2716-2724, August 2007.

Sitaram Asur, Duygu Ucar, and Srinivasan Parthasarathy. An Ensemble Framework for Clustering Protein-Protein Interaction Networks. Bioinformatics, Volume 23:13, pages 29-40, July 2007.

Sitaram Asur, Duygu Ucar, and Srinivasan Parthasarathy. An Ensemble Framework for Clustering Protein-Protein Interaction Networks. In the Proceedings of the 15th Annual International Conference on Intelligent Systems for Molecular Biology, ISMB, 2007.

Hui Yang, Srinivasan Parthasarathy, and Duygu Ucar. A spatio-temporal mining approach towards summarizing and analyzing protein folding trajectories. Algorithms for Molecular Biology, Volume 2:3, April 2007.

Duygu Ucar, Fatih Altiparmak, Hakan Ferhatosmanoglu, and Srinivasan Parthasarathy. Investigating the use of Extrinsic Similarity Measures for Microarray Analysis. In the BioKDD workshop held at the 13th ACM International Conference on Knowledge Discovery and Data Mining, SIGKDD, 2007.

Sitaram Asur, Srinivasan Parthasarathy, and Duygu Ucar. An Event-based Framework for Characterizing the Evolutionary Behavior of Interaction Graphs. The 13th International Conference on Knowledge Discovery and Data Mining, SIGKDD, 2007.

Duygu Ucar, Sitaram Asur, Umit Catalyurek, and Srinivasan Parthasarathy. Functional Modularity in Protein-Protein Interactions Graphs Using Hub-induced Subgraphs. The 17th European Conference on Principles and Practice of Knowledge Discovery in Databases, PKDD, 2006.

Sitaram Asur, Srinivasan Parthasarathy, and Duygu Ucar. An Ensemble Approach for Clustering Scale-Free Graph. The LinkKDD workshop held at the 12th ACM International Conference on Knowledge Discovery and Data Mining, SIGKDD, 2006.

Hui Yang, Srinivasan Parthasarathy, and Duygu Ucar. Protein Folding Trajectories Analysis: Summarization, Folding Events Detection and Common Partial Folding Pathway Identification.
