Engineering Graph Clustering Algorithms

Total Page:16

File Type:pdf, Size:1020Kb

Engineering Graph Clustering Algorithms Engineering Graph Clustering Algorithms zur Erlangung des akademischen Grades eines Doktors der Naturwissenschaften der Fakult¨at f¨ur Informatik des Karlsruher Instituts f¨ur Technologie (KIT) genehmigte Dissertation von Andrea Kappes aus Schw¨abisch Hall Tag der m¨undlichen Pr¨ufung: 28. April 2015 Erster Gutachter: Prof. Dr. Dorothea Wagner Zweiter Gutachter: Prof. Dr. Ulrik Brandes Acknowledgments First of all, I would like to thank my advisor Dorothea Wagner for giving me the possibil- ity to work in her group. Among many things, I am especially grateful for her patience, encouragement, and support during the last year, making it possible for me to finish my dissertation after my son was born. My work as a PhD student would have been much more cumbersome and boring without all of my former and current colleagues. In particular, I would like to thank all coauthors of my publications for the good discussions and fruitful cooperation. Special thanks go to Robert G¨orke for introducing me to the concepts behind graph clustering and teaching me how to write scientific texts. A big thank-you goes to Tanja Hartmann, Andreas Gemsa, and Markus V¨olker for proof-reading parts of this thesis and giving valuable feedback. Furthermore, I would like to thank Lilian Beckert, Elke Sauer, Bernd Giesinger and Rolf K¨olmel for their friendly help with administrative tasks and technical issues, Ulrik Brandes for willingly taking over the task to review this thesis, and David Bader for giving me the opportunity to visit his group at Georgia Tech. Finally, I would like to thank everyone that supported and encouraged me outside the office, especially my parents and my husband Andr´e. It is still a mystery to me how Andr´emanaged to find time to proof-read parts of this thesis somewhere in between changing diapers and cooking delicious meals to keep me well-nourished in the last phase of the write-up. Last but not least, thanks are due to my son Johann who always managed to brighten me up with his smiles and who reminded me from time to time what life is all about. iii Deutsche Zusammenfassung Netzwerke im Sinne von Objekten, die in Beziehung zueinander stehen, sind allge- genw¨artig. Paradebeispiele sind soziale Netzwerke, die Beziehungen zwischen Individuen modellieren, seien es Freundschaftsbeziehungen in Online-Communities wie Facebook oder Google+ oder pers¨onliche Netzwerke, die Freunde, Arbeitskollegen oder Familien- angeh¨orige verbinden. Aber auch in vielen anderen Zusammenh¨angen treten Netzwerke auf: Produkte k¨onnen miteinander gekauft werden, Proteine miteinander interagieren oder wissenschaftliche Publikationen andere Artikel zitieren, um nur einige Beispiele zu nennen. In vielen Bereichen kommt dabei dem Begriff einer dicht vernetzten Gruppe eine semantisch beachtenswerte Bedeutung zu. In den oben genannten Netzwerken k¨onnen solche Gruppen Freundeskreisen, Gruppen ¨ahnlicher Produkte, funktionellen Gruppen von Proteinen oder Wissenschaftsbereichen entsprechen. Auf einer abstrakten Ebene werden Netzwerke als Graphen modelliert und dicht vernetz- te Gruppen Cluster genannt. Grob gesprochen liegt dem Begriff Cluster zugrunde, dass der damit verbundene Teilgraph viele Kanten enth¨alt und es gleichzeitig wenige Kanten gibt, die Knoten im Cluster mit dem restlichen Graphen verbinden. Da dieses Paradig- ma vergleichsweise vage ist, existieren verschiedene Formalisierungen, die versuchen, die Gute¨ einer Clusterung zu quantifizieren. In der vorliegenden Arbeit werden im Wesentlichen zwei verschiedene Ans¨atze betrach- tet. Der erste bemisst Clusterungen anhand der der Dunne¨ der durch die Cluster in- duzierten Schnitte, w¨ahrend der zweite Ansatz sich mit der Optimierung des Maßes Surprise [5] besch¨aftigt. Schnittbasiertes Clustern. Eines der grundlegenden Probleme der Algorithmik ist die Bestimmung eines minimales Schnittes in einem Graphen, das heißt, einer Zerlegung des Graphen in zwei Teile, die mit einer minimalen Anzahl Kanten verbunden sind. Ein naiver Algorithmus zum Clustern von Graphen k¨onnte dies benutzen, um den Graphen iterativ anhand von minimalen Schnitten zu zerlegen, so dass m¨oglichst wenige Kanten getrennt werden. In der Praxis erweisen sich minimale Schnitte jedoch als problematisch, da diese oftmals nur einen oder weniger Knoten vom Rest des Graphen trennen. Des- halb empfehlen sich fur¨ Algorithmen zum Clustern von Graphen Maße, die die Gr¨oße der einzelnen Teile miteinbeziehen, wie die conductance, expansion oder density eines Schnittes. Aufbauend auf diesen Schnittmaßen k¨onnen konkrete Separiertheitsmaße de- finiert werden, die bemessen, wie gut die einzelnen Cluster voneinander getrennt sind; dies bringt allerdings mehrere Freiheitsgrade mit sich. Im ersten Teil der Arbeit werden systematisch alle m¨oglichen Kombinationen dieser Freiheitsgrade untersucht, was zu einer Familie von Maßen fur¨ die Separiertheit von v vi Clustern fuhrt.¨ Wir betrachten das generische Problem, die Separiertheit der Cluster zu optimieren und gleichzeitig eine Gutegarantie¨ an die Dichte der einzelnen Cluster nicht zu unterschreiten. Dazu wird eine agglomerative Heuristik fur¨ distanzbasiertes Clustern betrachtet [204], die in jedem Schritt zwei Cluster mit minimaler Distanz vereinigt und untersucht, inwieweit sich effiziente Algorithmen fur¨ diese Heuristik auf das Clustern von Graphen ubertragen¨ lassen. Um diese Frage zu beantworten, werden Eigenschaften identifiziert, die das dabei verwendete Gutemaߨ erfullen¨ muss und die betrachteten Maße im Hinblick auf diese Eigenschaften klassifiziert. Die Intuition hinter dem betrachteten Optimierungsproblem ist, dass die schnittbasier- ten Maße große Cluster bevorzugen, die nur durch wenige Kanten verbunden sind, wo- hingegen Garantien an die Dichte leichter durch die Erzeugung vieler kleiner Cluster zu erfullen¨ sind. Fur¨ alle vorgeschlagenen Maße wird untersucht, ob diese Intuition zu- treffend ist, indem uberpr¨ uft¨ wird, ob bei der Verwendung agglomerativer Heuristiken lokale Optima auftreten k¨onnen. Diese theoretischen Betrachtungen werden durch eine umfassende, empirische Studie erg¨anzt. Ein Vergleich der erw¨ahnten Heuristik mit einer zweiten Heuristik, die im We- sentlichen auf dem Verschieben von Knoten und Teilmengen von Knoten zwischen Clus- tern basiert, zeigt, dass die letztere oft zu besserer Qualit¨at fuhrt.¨ Im zweiten Teil der Experimente wird untersucht, wie sich die Wahl der verwendeten Maße fur¨ die Sepa- riertheit von Clustern und die Dichte der einzelnen Cluster auf die Anzahl der Cluster, die Clustergr¨oßenverteilung und das Verhalten in Hinblick auf synthetische Daten mit einer eingepflanzten Clusterstruktur auswirkt. Surprise. Einen anderen Ansatz zum Clustern von Graphen liefern Clustermaße, die auf einem Nullmodell basieren. Die Idee ist hierbei, zu einem gegebenen Graphen und ei- ner gegebenen Clusterung einen Zufallsgraphen auf derselben Knotenmenge zu betrach- ten, der einige Eigenschaften des Ausgangsgraphen beibeh¨alt. Nun wird die tats¨achliche Anzahl der Kanten innerhalb von Clustern mit demselben Wert auf dem Zufallsgraphen verglichen. Ist dieser Wert wesentlich h¨oher, als zu erwarten w¨are, ist die Clusterung gut an den Graphen angepasst. Ein Clustermaß, das auf diesem Prinzip basiert und das von einer Reihe oft benutzter Algorithmen optimiert wird, ist die Modularity einer Cluste- rung [173]. Ein verwandtes, relativ neues Maß ist Surprise, das die Wahrscheinlichkeit minimiert, dass mindestens so viele Kanten zuf¨allig innerhalb von Clustern erzeugt wer- den wie tats¨achlich beobachtet werden. Experimentelle Studien [5, 7, 8] liefern Indizien dafur,¨ dass dieses Maß einige der in letzter Zeit aufgekommenen Kritikpunkte an Mo- dularity wie den Hang zu uberm¨ ¨aßig großen Clustern in großen Graphen nicht zu teilen scheint. Der zweite Teil der Arbeit widmet sich sowohl der Komplexit¨at des Problems, eine Clus- terung mit optimaler Surprise zu finden als auch praktischen Algorithmen, um dieses Problem zu l¨osen. Aufbauend auf der Beobachtung, dass Optimall¨osungen bezuglich¨ Sur- prise immer Pareto-optimal bezuglich¨ der Maximierung der Anzahl der Kanten und der Minimierung der Anzahl der Knotenpaare innerhalb von Clustern ist, wird gezeigt, dass das assoziierte Entscheidungsproblem -vollst¨andig ist, Optimall¨osungen in Graphen NP mit konstanter Baumbreite jedoch mit polynomieller Laufzeit gefunden werden k¨onnen. Aus praktischer Sicht werden Methoden vorgeschlagen, die eine optimale L¨osung auf kleinen Instanzen mit Hilfe von ganzzahligen linearen Programmen berechnen, sowie Algorithmen betrachtet, die heuristisch gute L¨osungen fur¨ große Graphen finden. Beide vii Ans¨atze werden experimentell evaluiert und mit einer bereits bestehenden Metaheuris- tik [9] verglichen. Zus¨atzlich wird ein neues Clustermaß abgeleitet und diskutiert, das Eigenschaften von Modularity und Surprise miteinander kombiniert. Generierung von Testinstanzen. Wie bei allen algorithmischen Fragestellungen stellt sich bei der Evaluierung von Algorithmen zum Clustern von Graphen die Frage nach geeigneten Testinstanzen. Um die praktische Anwendbarkeit eines Algorithmus zu belegen, eignen sich hierzu in erster Linie Echtweltdaten aus verschiedenen Anwendungs- bereichen. Es gibt jedoch auch eine Reihe von Grunden,¨ die fur¨ die (zus¨atzliche) Verwen- dung von synthetischen Graphen sprechen. Synthetische Graphen k¨onnen ublicherweise¨ in jeder beliebigen Gr¨oße generiert werden, was hilfreich in Verbindung mit Skalier- barkeitsstudien ist. Weiterhin erm¨oglichen es Graphgeneratoren, einen Algorithmus
Recommended publications
  • Strong Triadic Closure in Cographs and Graphs of Low Maximum Degree
    Strong Triadic Closure in Cographs and Graphs of Low Maximum Degree Athanasios L. Konstantinidis1, Stavros D. Nikolopoulos2, and Charis Papadopoulos1 1 Department of Mathematics, University of Ioannina, Greece. [email protected], [email protected] 2 Department of Computer Science & Engineering, University of Ioannina, Greece. [email protected] Abstract. The MaxSTC problem is an assignment of the edges with strong or weak labels having the maximum number of strong edges such that any two vertices that have a common neighbor with a strong edge are adjacent. The Cluster Deletion problem seeks for the minimum number of edge removals of a given graph such that the remaining graph is a disjoint union of cliques. Both problems are known to be NP-hard and an optimal solution for the Cluster Deletion problem provides a solu- tion for the MaxSTC problem, however not necessarily an optimal one. In this work we give the first systematic study that reveals graph families for which the optimal solutions for MaxSTC and Cluster Deletion coincide. We first show that MaxSTC coincides with Cluster Dele- tion on cographs and, thus, MaxSTC is solvable in quadratic time on cographs. As a side result, we give an interesting computational charac- terization of the maximum independent set on the cartesian product of two cographs. Furthermore we study how low degree bounds influence the complexity of the MaxSTC problem. We show that this problem is polynomial-time solvable on graphs of maximum degree three, whereas MaxSTC becomes NP-complete on graphs of maximum degree four. The latter implies that there is no subexponential-time algorithm for MaxSTC unless the Exponential-Time Hypothesis fails.
    [Show full text]
  • Complexity of the Cluster Deletion Problem on Subclasses of Chordal Graphs∗
    Complexity of the cluster deletion problem on subclasses of chordal graphs∗ Flavia Bonomoy Guillermo Duranz Mario Valencia-Pabonx Abstract We consider the following vertex-partition problem on graphs, known as the CLUSTER DELETION (CD) problem: given a graph with real nonnegative edge weights, partition the vertices into clusters (in this case, cliques) to minimize the total weight of edges outside the clusters. The decision version of this optimization problem is known to be NP-complete even for unweighted graphs and has been studied extensively. We investigate the complexity of the decision CD problem for the family of chordal graphs, showing that it is NP-complete for weighted split graphs, weighted interval graphs and unweighted chordal graphs. We also prove that the problem is NP-complete for weighted cographs. Some polynomial-time solvable cases of the optimization problem are also identified, in particular CD for unweighted split graphs, unweighted proper interval graphs and weighted block graphs. Keywords:Block graphs, Cliques, Edge-deletion, Cluster deletion, Interval graphs, Split graphs, Submodular functions, Chordal graphs, Cographs, NP-completeness. 1 Introduction Clustering is an important task in the data analysis process. It can be viewed as a data modeling technique that provides an attractive mechanism for automatically finding the hidden structure of large data sets. The input to the problem is typically a set of elements and pairwise similarity values between elements. The goal is to partition these elements into subsets called clusters such that two meta-criteria are satisfied: homogeneity{elements in a given cluster are highly similar to each other; and separation{elements from different clusters have low similarity to each other.
    [Show full text]
  • Graph-Based Data Clustering with Overlaps
    Proc. 15th COCOON, 2009 Graph-Based Data Clustering with Overlaps Michael R. Fellows1⋆, Jiong Guo2, Christian Komusiewicz2⋆⋆, Rolf Niedermeier2, and Johannes Uhlmann2⋆ ⋆ ⋆ 1 PC Research Unit, Office of DVC (Research), University of Newcastle, Callaghan, NSW 2308, Australia. [email protected] 2 Institut f¨ur Informatik, Friedrich-Schiller-Universit¨at Jena Ernst-Abbe-Platz 2, D-07743 Jena, Germany {jiong.guo,c.komus,rolf.niedermeier,johannes.uhlmann}@uni-jena.de Abstract. We introduce overlap cluster graph modification problems where, other than in most previous work, the clusters of the target graph may overlap. More precisely, the studied graph problems ask for a mini- mum number of edge modifications such that the resulting graph consists of clusters (maximal cliques) that may overlap up to a certain amount specified by the overlap number s. In the case of s-vertex overlap, each vertex may be part of at most s maximal cliques; s-edge overlap is anal- ogously defined in terms of edges. We provide a complete complexity dichotomy (polynomial-time solvable vs NP-complete) for the underly- ing edge modification problems, develop forbidden subgraph characteri- zations of “cluster graphs with overlaps”, and study the parameterized complexity in terms of the number of allowed edge modifications, achiev- ing fixed-parameter tractability results (in case of constant s-values) and parameterized hardness (in case of unbounded s-values). 1 Introduction Graph-based data clustering is an important tool in exploratory data analy- sis [21,25]. The applications range from bioinformatics [2,22] to image process- ing [24]. The formulation as a graph-theoretic problem relies on the notion of a similarity graph, where vertices represent data items and an edge between two vertices expresses high similarity between the corresponding data items.
    [Show full text]
  • Subgraph Complementation Fedor V
    Subgraph Complementation Fedor V. Fomin, Petr A. Golovach, Torstein Strømme, Dimitrios M. Thilikos To cite this version: Fedor V. Fomin, Petr A. Golovach, Torstein Strømme, Dimitrios M. Thilikos. Subgraph Comple- mentation. Algorithmica, Springer Verlag, 2020, 82 (7), pp.1859-1880. 10.1007/s00453-020-00677-8. hal-03002656 HAL Id: hal-03002656 https://hal.archives-ouvertes.fr/hal-03002656 Submitted on 20 Nov 2020 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Distributed under a Creative Commons Attribution| 4.0 International License Algorithmica https://doi.org/10.1007/s00453-020-00677-8 Subgraph Complementation Fedor V. Fomin1 · Petr A. Golovach1 · Torstein J. F. Strømme1 Dimitrios M. Thilikos2 Received: 30 January 2019 / Accepted: 8 January 2020 © The Author(s) 2020 Abstract A subgraph complement of the graph G is a graph obtained from G by complement- ing all the edges in one of its induced subgraphs. We study the following algorithmic question: for a given graph G and graph class G , is there a subgraph complement of G which is in G ? We show that this problem can be solved in polynomial time for various choices of the graphs class G , such as bipartite, d-degenerate, or cographs.
    [Show full text]
  • Cluster Deletion on Interval Graphs and Split Related Graphs Athanasios L
    Cluster Deletion on Interval Graphs and Split Related Graphs Athanasios L. Konstantinidis Department of Mathematics, University of Ioannina, Greece [email protected] Charis Papadopoulos Department of Mathematics, University of Ioannina, Greece [email protected] Abstract In the Cluster Deletion problem the goal is to remove the minimum number of edges of a given graph, such that every connected component of the resulting graph constitutes a clique. It is known that the decision version of Cluster Deletion is NP-complete on (P5-free) chordal graphs, whereas Cluster Deletion is solved in polynomial time on split graphs. However, the existence of a polynomial-time algorithm of Cluster Deletion on interval graphs, a proper subclass of chordal graphs, remained a well-known open problem. Our main contribution is that we settle this problem in the affirmative, by providing a polynomial-time algorithm for Cluster Deletion on interval graphs. Moreover, despite the simple formulation of the algorithm on split graphs, we show that Cluster Deletion remains NP-complete on a natural and slight generalization of split graphs that constitutes a proper subclass of P5-free chordal graphs. Although the later result arises from the already-known reduction for P5-free chordal graphs, we give an alternative proof showing an interesting connection between edge-weighted and vertex-weighted variations of the problem. To complement our results, we provide faster and simpler polynomial-time algorithms for Cluster Deletion on subclasses of such a generalization of split
    [Show full text]
  • Extremal Questions in Graph Theory
    Extremal questions in graph theory vorgelegt von Maya Jakobine Stein Habilitationschrift des Fachbereichs Mathematik der Universit¨atHamburg Hamburg 2009 Contents 1 Introduction 1 2 The Loebl–Koml´os–S´osconjecture 5 2.1 History of the conjecture . 5 2.2 Special cases of the Loebl–Koml´os–S´osconjecture . 6 2.3 The regularity approach . 6 2.4 Discussion of the bounds . 7 3 An approximate version of the LKS conjecture 9 3.1 An approximate version and an extension . 9 3.2 Preliminaries . 10 3.2.1 Regularity . 10 3.2.2 The matching . 13 3.3 Proof of Theorem 2.3.3 . 15 3.3.1 Overview . 15 3.3.2 Preparations . 18 3.3.3 Partitioning the tree . 20 3.3.4 The switching . 23 3.3.5 Partitioning the matching . 25 3.3.6 Embedding lemmas for trees . 28 3.3.7 The embedding in Case 1 . 33 3.3.8 The embedding in Case 2 . 34 3.4 Proof of Theorem 3.1.1 . 36 4 Solution of the LKS conjecture for special classes of trees 39 4.1 Trees of small diameter and caterpillars . 39 4.2 Proof of Theorem 4.1.1 . 40 4.3 Proof of Theorem 4.1.2 . 45 5 An application of the LKS conjecture in Ramsey Theory 49 5.1 Ramsey numbers . 49 5.2 Ramsey numbers of trees . 50 5.3 Proof of Proposition 5.2.2 . 51 6 t-perfect graphs 53 6.1 An introduction to t-perfect graphs . 53 6.2 The polytopes SSP and TSTAB .
    [Show full text]
  • On the Relation of Strong Triadic Closure and Cluster Deletion
    Accepted for publication in Algorithmica 2019 On the Relation of Strong Triadic Closure and Cluster Deletion⋆ Niels Gr¨uttemeier and Christian Komusiewicz Abstract We study the parameterized and classical complexity of two problems that are concerned with induced paths on three vertices, called P3s, in undirected graphs G =(V,E). In Strong Triadic Closure we aim to label the edges in E as strong and weak such that at most k edges are weak and G contains no induced P3 with two strong edges. In Cluster Deletion we aim to destroy all induced P3s by a minimum number of edge deletions. We first show that Strong Triadic Closure admits a 4k-vertex kernel. Then, we study parameterization by ℓ := E k and | |− show that both problems are fixed-parameter tractable and unlikely to admit a polynomial kernel with respect to ℓ. Finally, we give a dichotomy of the classical complexity of both problems on H-free graphs for all H of order at most four. 1 Introduction We study two related graph problems arising in social network analysis and data clustering. Assume we are given a social network where vertices represent agents and edges represent relationships between these agents, and want to predict which of the relationships are important. In online social networks for example, one could aim to distinguish between close friends and spurious relationships. Sin- tos and Tsaparas [28] proposed to use the notion of strong triadic closure for this problem. This notion goes back to Granovetter’s sociological work [10]. Infor- mally, it is the assumption that if one agent has strong relationships with two other agents, then these two other agents should have at least a weak relationship.
    [Show full text]
  • Survey on Graph Algorithms Parameterized by Edge Edit Distances?
    Survey on graph algorithms parameterized by edge edit distances? Christophe Crespelle University of Bergen, Department of Informatics, N-5020 Bergen, Norway [email protected] Abstract. This survey is intended to list algorithmic results on graph problems that are param- eterized by the edit distance of the input graph G to some graph class G, i.e. the number of edges to be deleted and/or added to G in order to obtain a graph belonging to G. 1 Introduction and necessary background The goal of a survey is always to identify what has been done and what has not been done on a given topic. Here the topic we cover may seem quite precise and a bit restricted to the reader. The reason behind this is that the boundaries of this survey have been settled in order to determine to which extent one precise idea has been explored in the FPT framework. This idea is very simple. Special graph classes, such as chordal graphs or cographs for example, have much more properties than arbitrary graphs, which even stands for their definition. Because of these properties, they often admit more efficient solutions to algorithmic problems than those we can design for arbitrary graphs in general. This is in particular the case for many difficult (NP- hard) algorithmic problems which sometimes become polynomially solvable on special classes of graphs, such as coloring on chordal graphs for example. What about graphs that are not chordal but almost chordal, in the sense that removing and/or adding a few edges, say k, in the graph will make it chordal? Are they also easy to color, as chordal graphs are? One can legitimately expect that the difficulty to solve the coloring problem should depend on k: the smaller k, the more one should retrieve the good behaviour of chordal graphs which can be colored in polynomial time, and the larger k, the more one should retrieve the behaviour of arbitrary graphs which need exponential time to be colored (assuming P 6= NP ).
    [Show full text]
  • Cluster Deletion on Interval Graphs and Split Related Graphs
    Cluster Deletion on Interval Graphs and Split Related Graphs Athanasios L. Konstantinidis∗ Charis Papadopoulosy Abstract In the Cluster Deletion problem the goal is to remove the minimum number of edges of a given graph, such that every connected component of the resulting graph constitutes a clique. It is known that the decision version of Cluster Deletion is NP-complete on (P5-free) chordal graphs, whereas Cluster Deletion is solved in poly- nomial time on split graphs. However, the existence of a polynomial-time algorithm of Cluster Deletion on interval graphs, a proper subclass of chordal graphs, remained a well-known open problem. Our main contribution is that we settle this problem in the affirmative, by providing a polynomial-time algorithm for Cluster Deletion on inter- val graphs. Moreover, despite the simple formulation of the algorithm on split graphs, we show that Cluster Deletion remains NP-complete on a natural and slight gen- eralization of split graphs that constitutes a proper subclass of P5-free chordal graphs. To complement our results, we provide two polynomial-time algorithms for Cluster Deletion on subclasses of such generalizations of split graphs. 1 Introduction In graph theoretic notions, clustering is the task of partitioning the vertices of the graph into subsets, called clusters, in such a way that there should be many edges within each cluster and relatively few edges between the clusters. In many applications, the clusters are restricted to induce cliques, as the represented data of each edge corresponds to a similarity value between two objects [17, 18]. Under the term cluster graph.
    [Show full text]
  • A Graph Modification Approach for Finding Core–Periphery Structures in Protein Interaction Networks Sharon Bruckner1, Falk Hüffner2 and Christian Komusiewicz2*
    Bruckner et al. Algorithms for Molecular Biology (2015) 10:16 DOI 10.1186/s13015-015-0043-7 RESEARCH Open Access A graph modification approach for finding core–periphery structures in protein interaction networks Sharon Bruckner1, Falk Hüffner2 and Christian Komusiewicz2* Abstract The core–periphery model for protein interaction (PPI) networks assumes that protein complexes in these networks consist of a dense core and a possibly sparse periphery that is adjacent to vertices in the core of the complex. In this work, we aim at uncovering a global core–periphery structure for a given PPI network. We propose two exact graph-theoretic formulations for this task, which aim to fit the input network to a hypothetical ground truth network by a minimum number of edge modifications. In one model each cluster has its own periphery, and in the other the periphery is shared. We first analyze both models from a theoretical point of view, showing their NP-hardness. Then, we devise efficient exact and heuristic algorithms for both models and finally perform an evaluation on subnetworks of the S. cerevisiae PPI network. Keywords: Protein complexes, Graph classes, NP-hard problems Background core or its attachment (in each step, a vertex maximizing A fundamental task in the analysis of PPI networks is some specific objective function is chosen). The aim of the identification of protein complexes and functional these approaches was to predict protein complexes [4,5] modules. Herein, a basic assumption is that complexes or to reveal biological features that are correlated with in a PPI network are strongly connected among them- topological properties of core–periphery structures in selves and weakly connected to other complexes [1].
    [Show full text]
  • Complexity and Approximation of Some Graph Modi Cation Problems
    Tel Aviv University The Raymond and Beverly Sackler Faculty of Exact Sciences Scho ol of Mathematical Sciences Complexity and Approximation of Some Graph Mo dication Problems Thesis submitted in partial fulllment of graduate requirements for the degree MSc in Tel Aviv University Department of Computer Science By Assaf Natanzon Prepared under the sup ervision of Prof Ron Shamir Acknowledgments Iwould like to thank my advisor Prof Ron Shamir for directing me and giving me a helping hand and for his clarifying comments and reformulations Iwould liketothankmy b eloved Mom and Dad for their supp ort Abstract In an edge mo dication problem one has to change the edge set of a given graph as little as p ossible so as to satisfy a certain prop ertyWe prove in this pap er the NP hardness of a variety of edge mo dication problems with resp ect to some wellstudied classes of graphs These include p erfect chordal comparability split threshold and cluster graphs Weshow that some of these problems b ecome p olynomial when the input graph has b ounded degree We give a general constant factor approximation algorithm for deletion and editing problems on b ounded degree graphs with resp ect to prop erties that can b e characterized by a nite set of forbidden induced subgraphs We giveanapproximation algorithm for Cluster Deletion and Editing We also prove some combinatorial b ounds on the sizes of the minimum completion and deletion sets of a graph Contents Intro duction and Summary Intro duction Denitions Previous results
    [Show full text]
  • Graph Modification Problems and Their Applications to Genomic Research
    Sackler Faculty of Exact Sciences, School of Computer Science Graph Modification Problems and their Applications to Genomic Research THESIS SUBMITTED FOR THE DEGREE OF “DOCTOR OF PHILOSOPHY” by Roded Sharan The work on this thesis has been carried out under the supervision of Prof. Ron Shamir Submitted to the Senate of Tel-Aviv University August 2002 2 Abstract Edge modification problems call for making small changes to the edge set of an input graph in order to obtain a graph with a desired property. These problems play an important role in computer science and have applications in several fields, including molecular biology. In many application areas a graph is used to model experimental data, and then edge modifications correspond to correcting errors in the data: Adding an edge corrects a false negative error, and deleting an edge corrects a false positive error. This thesis deals with theoretical and practical modification problems. We first study the complexity and approximability of edge modification problems on some structured classes of graphs. We show that most of the studied problems are compu- tationally hard, but some have efficient solutions when restricting the degrees in the input graph. We then give a polynomial approximation algorithm for the classical minimum fill-in problem which has applications in numerical algebra. We provide fast algorithms for recognizing certain properties on dynamically changing graphs, with applications to physical mapping of DNA. We study a graph sandwich problem arising in phylogeny reconstruction and devise an efficient algorithm for it. Finally, we develop a new clustering algorithm which combines probabilistic and graph the- oretic reasoning.
    [Show full text]