Graph Theoretic Generalizations of Clique

GRAPH THEORETIC GENERALIZATIONS OF CLIQUE: OPTIMIZATION AND EXTENSIONS A Dissertation by BALABHASKAR BALASUNDARAM Submitted to the O±ce of Graduate Studies of Texas A&M University in partial ful¯llment of the requirements for the degree of DOCTOR OF PHILOSOPHY August 2007 Major Subject: Industrial Engineering °c 2007 BALABHASKAR BALASUNDARAM ALL RIGHTS RESERVED GRAPH THEORETIC GENERALIZATIONS OF CLIQUE: OPTIMIZATION AND EXTENSIONS A Dissertation by BALABHASKAR BALASUNDARAM Submitted to the O±ce of Graduate Studies of Texas A&M University in partial ful¯llment of the requirements for the degree of DOCTOR OF PHILOSOPHY Approved by: Chair of Committee, Sergiy I. Butenko Committee Members, Illya V. Hicks Wilbert E. Wilhelm Catherine H. Yan Head of Department, Brett A. Peters August 2007 Major Subject: Industrial Engineering iii ABSTRACT Graph Theoretic Generalizations of Clique: Optimization and Extensions. (August 2007) Balabhaskar Balasundaram, B.Tech., Indian Institute of Technology { Madras Chair of Advisory Committee: Dr. Sergiy Butenko This dissertation considers graph theoretic generalizations of the maximum clique problem. Models that were originally proposed in social network analysis lit- erature, are investigated from a mathematical programming perspective for the ¯rst time. A social network is usually represented by a graph, and cliques were the ¯rst models of \tightly knit groups" in social networks, referred to as cohesive subgroups. Cliques are idealized models and their overly restrictive nature motivated the development of clique relaxations that relax di®erent aspects of a clique. Identifying large cohesive subgroups in social networks has traditionally been used in criminal network analysis to study organized crimes such as terrorism, narcotics and money laundering. More recent applications are in clustering and data mining wireless networks, biological networks as well as graph models of databases and the internet. This research has the potential to impact homeland security, bioinformatics, internet research and telecommunication industry among others. The focus of this dissertation is a degree-based relaxation called k-plex.A distance-based relaxation called k-clique and a diameter-based relaxation called k- club are also investigated in this dissertation. We present the ¯rst systematic study of the complexity aspects of these problems and application of mathematical programming techniques in solving them. Graph theoretic properties of the models are identi¯ed and used in the development of theory and algorithms. Optimization problems associated with the three models are formulated as bi- iv nary integer programs and the properties of the associated polytopes are investigated. Facets and valid inequalities are identi¯ed based on combinatorial arguments. A branch-and-cut framework is designed and implemented to solve the optimization problems exactly. Specialized preprocessing techniques are developed that, in con- junction with the branch-and-cut algorithm, optimally solve the problems on real-life power law graphs, which is a general class of graphs that include social and biological networks. Computational experiments are performed to study the e®ectiveness of the proposed solution procedures on benchmark instances and real-life instances. The relationship of these models to the classical maximum clique problem is studied, leading to several interesting observations including a new compact integer programming formulation. We also prove new continuous non-linear formulations for the classical maximum independent set problem which maximize continuous functions over the unit hypercube, and characterize its local and global maxima. Finally, clustering and network design extensions of the clique relaxation models are explored. v Dedicated to my parents vi ACKNOWLEDGMENTS I consider myself truly lucky to have worked with Dr. Sergiy Butenko for my doctoral research. Sergiy has been a resourceful, insightful and patient advisor, a valuable guide in my professional development and most importantly, a true friend and colleague. I am grateful to Sergiy for making my doctoral experience a rich and memorable one, and my admiration and respect go to him. I would like to express my sincere thanks to Dr. Illya Hicks, my committee member and collaborator, for taking a keen interest in my research and professional development. His expertise in polyhedral combinatorics was a tremendous support for me and guided several research directions taken in this dissertation. My thanks are also due to my wonderful committee members Dr. Wilbert Wilhelm and Dr. Catherine Yan, for their patience and constant support. I am ever grateful to Dr. G. Srinivasan, my undergraduate mentor, for introducing me to the fascinating ¯eld of operations research, and for encouraging me to pursue a doctorate. The ISE department at Texas A&M provided me with a wonderful learning atmosphere and opportunities to develop the skills I need in academia. I would especially like to thank Drs. Brett Peters, Guy Curry and Richard Feldman, for providing me with several opportunities to teach, and for their guidance and support. I am also indebted to Drs. Amarnath Banerjee, Gautam Natarajan, Yu Ding, Lewis Ntaimo, Gary Gaukler, Eylem Tekin, Andrew Johnson and Eric Bickel, for guiding me and supporting me through my search for a faculty position. Without the support from the e±cient and friendly administrative and technical sta® at ISE, my doctoral program would have been a lot more di±cult. In particular, my special thanks are due to Judy Meeks, Michele Bork, Claudia Samford, Katherine vii Edwards, Mark Henry, Mark Hopcus and Dennis Allen. I would also like to thank the ISE department for ¯nancially supporting my graduate studies. I would like to thank Deepak Warrier, Sharat Bulusu, Brijesh Vasudeva Rao, Svyatoslav Trukhanov, Oleksii Ursulenko, Reza Seyedshohadaie, Sera Kahruman, Sandeep Sachdeva, Homarjun Agrahari, Abhishek Shrivastava, Elif Kolotoglu and Jung Jin Cho, for being wonderful colleagues, and friends. I would also like to thank Sujan Dan, Gabriel Krishnamoorthy, Smriti Jayara- man, Vijay Ramakrishnan, Taraka Donti and Karambir Kalsi, for their lasting friend- ships which made my life in College Station, a memorable one. There are numerous others who have helped me personally and professionally, and it would be impossible for me to name all of them. But my sincere thanks are due to them all. Words cannot express the love and pride I have for my parents for making me who I am. It is their ambition, encouragement and support that has always kept me on the right track. I am ever grateful and indebted to my parents for always giving me more than I wanted, more than I deserved and more than they could. viii TABLE OF CONTENTS CHAPTER Page I INTRODUCTION :::::::::::::::::::::::::: 1 II BACKGROUND :::::::::::::::::::::::::: 10 II.1. Social Network Analysis . 10 II.2. Graph Theory . 17 II.3. Complexity Theory . 20 II.4. Polyhedral Theory and Combinatorial Optimization . 23 II.5. Branch-and-cut . 28 II.6. Cliques and Independent Sets . 30 II.6.1. Polyhedral Results . 32 II.6.2. Continuous Approaches . 35 III CLIQUE RELAXATIONS ::::::::::::::::::::: 39 III.1. Distance-Based and Diameter-Based Relaxations . 39 III.2. Degree-Based Relaxation . 43 III.3. Comparison of the Models . 44 III.4. Existing Approaches . 48 IV COMPUTATIONAL COMPLEXITY ::::::::::::::: 52 IV.1. Complexity of k-Clique and k-Club . 53 IV.2. Complexity of k-Plex . 57 IV.3. Some Special Cases . 60 V THE MAXIMUM k-CLIQUE AND k-CLUB PROBLEMS ::: 64 V.1. The Maximum k-Clique Problem . 64 V.2. The Maximum k-Club Problem . 66 V.3. The Maximum 2 -Club Problem . 68 V.4. Solving the Maximum 2 -Club Problem on Power Law Graphs . 71 VI THE MAXIMUM k-PLEX PROBLEM :::::::::::::: 75 VI.1. The Maximum k-Plex Problem . 75 VI.2. Facets and Valid Inequalities . 77 ix CHAPTER Page VI.3. Solving the Maximum k-Plex Problem . 84 VI.4. On the Maximum Clique Problem . 89 VII COMPUTATIONAL EXPERIMENTS :::::::::::::: 92 VII.1. General Implementation Details . 92 VII.2. Description of the Test-bed . 94 VII.3. Numerical Results: Maximum k-Plex Problem . 98 VII.3.1. BC Algorithms on Group I Instances . 99 VII.3.2. IPBC Algorithm on Group II Instances . 103 VII.4. Maximum Clique Problem: Formulation Study . 113 VII.5. Numerical Results: Maximum 2-Club Problem . 115 VIII CONTINUOUS GLOBAL OPTIMIZATION FORMULATIONS FOR INDEPENDENCE NUMBER OF A GRAPH ::::::: 119 VIII.1.Continuous Formulation for Independence Number . 120 VIII.2.Local Maxima . 124 VIII.3.Modi¯ed Formulation . 130 VIII.4.Numerical Experiments . 136 VIII.4.1. Global Optimization . 136 VIII.4.2. Local Optimization . 139 IX NETWORK CLUSTERING AND DESIGN EXTENSIONS ::: 143 IX.1. Network Clustering . 143 IX.2. The Clustering Problem . 145 IX.3. Clique-based Clustering . 146 IX.3.1. Clique Partitioning and Covering . 146 IX.3.2. Min-Max d-Clustering . 147 IX.4. Clique Relaxations in Clustering . 149 IX.4.1. k-Clique and k-Club Clustering . 149 IX.4.2. k-Plex Clustering . 152 IX.5. Network Design Problem . 153 X CONCLUSION AND FUTURE WORK :::::::::::::: 158 REFERENCES ::::::::::::::::::::::::::::::::::: 166 APPENDIX A ::::::::::::::::::::::::::::::::::: 178 VITA :::::::::::::::::::::::::::::::::::::::: 192 x LIST OF TABLES TABLE Page 1 Dimacs benchmarks ::::::::::::::::::::::::::: 95 2 Parameter settings :::::::::::::::::::::::::::: 98 3 Summary of results on Sanchis-log instances :::::::::::::: 101 4 Summary of results on Sanchis-linear instances ::::::::::::

Load more