A Framework for the Static and Dynamic Analysis of Interaction Graphs
DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By Sitaram Asur, B.E., M.Sc.

The Ohio State University
2009

Dissertation Committee:
Prof. Srinivasan Parthasarathy, Adviser
Prof. Gagan Agrawal
Prof. P. Sadayappan

Approved by: Adviser, Graduate Program in Computer Science and Engineering

© Copyright by Sitaram Asur 2009

ABSTRACT

Data originating from many different real-world domains can be represented meaningfully as interaction networks. Examples abound, ranging from gene expression networks to social networks, and from the World Wide Web to protein-protein interaction networks. The study of these complex networks can result in the discovery of meaningful patterns and can potentially afford insight into the structure, properties and behavior of these networks. Hence, there is a need to design suitable algorithms to extract or infer meaningful information from these networks.

However, the challenges involved are daunting. First, most of these real-world networks have specific topological constraints that make the task of extracting useful patterns using traditional data mining techniques difficult. Additionally, these networks can be noisy (containing unreliable interactions), which makes the process of knowledge discovery difficult. Second, these networks are usually dynamic in nature. Identifying the portions of the network that are changing, characterizing and modeling the evolution, and inferring or predicting future trends are critical challenges that need to be addressed in the context of understanding the evolutionary behavior of such networks.

To address these challenges, we propose a framework of algorithms designed to detect, analyze and reason about the structure, behavior and evolution of real-world interaction networks.
The proposed framework can be divided into three components:

• A static analysis component where we propose efficient, noise-resistant algorithms taking advantage of specific topological features of these networks to extract useful functional modules and motifs from interaction graphs.

• An event detection component where we propose algorithms to detect and characterize critical events and behavior for evolving interaction graphs.

• A temporal reasoning component where we propose approaches wherein one can make useful inferences on events, communities, individuals and their interactions over time.

For each component, we either propose new algorithms or suggest ways to apply existing techniques in a previously unused manner. Where appropriate, we compare against traditional or accepted standards. We evaluate the proposed framework on real datasets drawn from clinical, biological and social domains.

To my family

ACKNOWLEDGMENTS

First of all, I would like to acknowledge and thank my advisor, Dr Srinivasan Parthasarathy, for his invaluable support and guidance. From the beginning of my graduate study, he has provided me with the freedom to explore different types of research problems, and has constantly motivated and challenged me, which has helped me grow as a researcher. I am deeply indebted to him for his guidance and advice over these past 5 years. His passion, energy and work drive have been a source of inspiration and I have been privileged to learn a lot from him during my PhD study. I would also like to thank Dr Hakan Ferhatosmanoglu, Dr Sadayappan and Dr Gagan Agrawal for serving on my candidacy and defense committees and providing me with valuable insights and suggestions.
My research has been supported in part by grants from the National Science Foundation (CAREER Grant IIS-0347662 and NSF SGER Grant IIS-0742999) and the Department of Energy (DE-FG02-04ER25611). Any opinions, findings, and conclusions or recommendations expressed here are those of the author and, if applicable, his advisor and collaborators, and do not necessarily reflect the views of the National Science Foundation or the Department of Energy.

I am very thankful to my colleague, frequent co-author and friend, Duygu Ucar, for the innumerable discussions and collaborations relating to this research. Her patience and dedication towards work have motivated and taught me a lot. I would like to thank the other members of the Data Mining Research Lab, past and present - Sameep, Hui, Amol, Chao, Keith, the two Matts, Xintian and Venu, for their help, support and useful discussions.

The road to the PhD is a long and hard one and I am deeply indebted to my good friends (in alphabetical order) - Abhilash, Gaurav, K2, Miti, Muthu, Rajkiran, Sound, Vijay - for sharing the journey with me and making it lively, exciting and memorable. I thank them for their company and the endless bouts of dining, movies, sports and mindless and mindful discussions.

Finally, I would like to thank and acknowledge my family, my parents and grandparents for their constant support, faith and encouragement, and my brother for showing me the way and setting the standards high. I also thank my relatives and friends in India, who have waited as patiently for this day as I have.

VITA

Feb, 1981: Born - New Delhi, India
2002: B.E. (Hons.) Information Science Engineering, Visvesvaraya Technological University, Bangalore, India
2007: M.Sc Computer Science Engineering, Ohio State University, Columbus, OH
June 2007 - Sep. 2007: Research Intern, IBM T. J. Watson Research Center.
2004-2005: Graduate Teaching Associate, The Ohio State University.
2005-2009: Graduate Research Associate, The Ohio State University.
Mar 2009 - Jun 2009: Research Intern, Microsoft Research.

PUBLICATIONS

Sitaram Asur and Srinivasan Parthasarathy. A Viewpoint-based Approach for Interaction Graph Analysis. In the Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, SIGKDD, 2009.

Xintian Yang, Sitaram Asur, Srinivasan Parthasarathy, and Sameep Mehta. A visual-analytic toolkit for dynamic interaction graphs. In the Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, SIGKDD, 2008.

Sitaram Asur, Srinivasan Parthasarathy, and Duygu Ucar. An event-based framework for characterizing the evolutionary behavior of interaction graphs. In the Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 913–921, 2007.

Sitaram Asur and Srinivasan Parthasarathy. Correlation-based Feature Partitioning for Rare Event Detection in Wireless Sensor Networks. In the Proceedings of the 1st ACM Workshop on Knowledge Discovery from Sensor Data at the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Sensor-KDD), 2007.

Sitaram Asur, Duygu Ucar and Srinivasan Parthasarathy. An ensemble framework for clustering protein-protein interaction networks. Bioinformatics Volume 23, 13, i29–40, July 2007.

Sitaram Asur, Duygu Ucar and Srinivasan Parthasarathy. An Ensemble Framework for Clustering Protein-Protein Interaction Networks. In the Proceedings of the 15th Annual International Conference on Intelligent Systems for Molecular Biology, ISMB, 2007.

Duygu Ucar, Sitaram Asur, Umit V. Catalyurek, and Srinivasan Parthasarathy. Improving functional modularity in protein-protein interactions graphs using hub-induced subgraphs. In the Proceedings of the European Conference on Principles and Practice of Knowledge Discovery in Databases, PKDD, 2006.

Sitaram Asur, Srinivasan Parthasarathy and Duygu Ucar. An Ensemble Approach for Clustering Scale-Free Graphs. In the Proceedings of the LinkKDD workshop at the ACM International Conference on Knowledge Discovery and Data Mining (LinkKDD), 2006.

Sitaram Asur, Pichai Raman, Matthew Eric Otey and Srinivasan Parthasarathy. A Model-based Approach for Mining Membrane Protein Crystallization Trials. Bioinformatics Volume 22(14), e40-e48, July 2006.

Sitaram Asur, Pichai Raman, Matthew Eric Otey and Srinivasan Parthasarathy. A Model-based Approach for Mining Membrane Protein Crystallization Trials. In the Proceedings of the 14th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB), 2006.

Duygu Ucar, Srinivasan Parthasarathy, Sitaram Asur, and Chao Wang. Effective preprocessing strategies for functional clustering of a protein-protein interactions network. In the IEEE International Symposium on Bioinformatics and Bioengineering, BIBE, 2005.

FIELDS OF STUDY

Major Field: Computer Science and Engineering
Studies in Data Mining: Srinivasan Parthasarathy

TABLE OF CONTENTS

Abstract
Dedication
Acknowledgments
Vita
List of Tables
List of Figures

Chapters:

1. Introduction
    1.1 Challenges in Analyzing Interaction Graphs
    1.2 Research Overview
    1.3 Contributions
        1.3.1 Static Analysis
        1.3.2 Dynamic Analysis and Reasoning
    1.4 Organization

2. Background and Related Work
    2.1 Static Analysis
        2.1.1 Protein-Protein Interaction (PPI) Networks
        2.1.2 PPI Dataset
        2.1.3 Clustering and Graph Partitioning Algorithms
        2.1.4 Ensemble Clustering
        2.1.5 Principal Component Analysis
    2.2 Dynamic Analysis
        2.2.1 Social network analysis
        2.2.2 Related Work in Dynamic Analysis
        2.2.3 Semantic Similarity
        2.2.4 Datasets

3. Ensemble Clustering Framework
    3.1 Ensemble Framework
        3.1.1 Topological similarity measures
        3.1.2 Base algorithms
        3.1.3 Consensus Methods
    3.2 Validation