On Threshold Editing

On Threshold Editing Bachelor Thesis of Matthias Schimek At the Department of Informatics Institute of Theoretical Informatics Reviewers: Prof. Dr. Dorothea Wagner Prof. Dr. Peter Sanders Advisor: Michael Hamann, M.Sc. Time Period: 15th June 2016 – 14th October 2016 KIT – University of the State of Baden-Wuerttemberg and National Laboratory of the Helmholtz Association www.kit.edu Statement of Authorship I hereby declare that this document has been composed by myself and describes my own work, unless otherwise acknowledged in the text. Karlsruhe, 10th October 2016 iii Abstract In this thesis we consider the problems threshold deletion and threshold editing on quasi-threshold graphs. Both problems ask to transform an input graph into a threshold graph using a minimal number of edge operations. In threshold deletion these are restricted to the deletion of edges only, whereas in threshold editing both the deletion and the insertion of edges are permitted. Threshold graphs are graphs that do not contain a P4, C4 or 2K2 as a vertex-induced subgraph. Both problems are NP-complete on general graphs. We focus on quasi-threshold graphs as input graphs, i.e. graphs that do not contain a P4 or C4 as a vertex-induced subgraph. For the problem of threshold deletion we will give an algorithm that needs O(|V | + |E|) time, where Q = (V, E) is the input graph. In the case of threshold editing we present an algorithm that needs exponential time in the size of the vertex set of the input graph. However, our evaluation shows that the running time of this algorithm is feasible for a set of randomly generated input graphs consisting of up to 300 vertices. Furthermore, we give a heuristic for this problem with a quadratic running time. Deutsche Zusammenfassung In dieser Arbeit werden die Probleme Threshold Deletion und Threshold Editing auf Quasi-Threshold-Graphen untersucht. Das Problem Threshold Deletion besteht darin, eine minimale Anzahl von Kanten des Eingabegraphen zu löschen, um diesen in einen Threshold-Graphen umzuwandeln. Für Threshold Editing dürfen nicht nur Kanten gelöscht, sondern auch Kanten hinzugefügt werden, um den Eingabegraphen in einen Threshold-Graphen zu überführen. Threshold-Graphen können durch verbotene Subgraphen charakterisiert werden. Es sind genau die Graphen, welche keinen P4, C4 oder 2K2 als knoteninduzierten Subgraphen enthalten. Auf allgemeinen Graphen sind beide Probleme NP-vollständig. Als Eingabegraphen werden in dieser Arbeit nur Quasi-Threshold-Graphen betrach- tet. Diese sind Graphen, die keinen P4 oder C4 als knoteninduzierten Subgraphen enthalten. Für Threshold Deletion wird ein Algorithmus vorgestellt, der das Problem in O(|V | + |E|) löst, wobei Q = (V, E) der Eingabegraph ist. Ebenso wird ein Algorithmus für das Problem Threshold Editing gegeben, dieser hat eine exponen- tielle Laufzeit in der Knotenanzahl des Eingabegraphen. Jedoch erweist sich der Algorithmus auf den zur Evaluation verwendeten zufällig generierten Graphen mit bis zu 300 Knoten als praktikabel. Weiterhin wird eine Heuristik für dieses Problem vorgestellt, deren Laufzeit quadratisch ist. v Contents 1 Introduction 1 2 Editing to Threshold Graphs 5 2.1 Quasi-Threshold Graphs . .5 2.2 Threshold Graphs . .6 2.3 Quasi-Threshold to Threshold Deletion . .7 2.4 Quasi-Threshold to Threshold Editing . 11 2.4.1 Properties of Nearest Threshold Graphs . 11 2.4.2 Exact Algorithm . 21 2.4.2.1 Enumerating-Algorithm . 22 2.4.2.2 Leaf-Positioning-Algorithm . 26 2.4.2.3 Running Time . 31 2.4.3 Heuristic . 31 3 Experimental Evaluation 35 3.1 Number of Edits . 36 3.2 Running Time . 37 3.3 Other Graphs . 38 4 Conclusion 41 Bibliography 43 vii 1. Introduction The main topics of this thesis are the graph modification problems threshold editing and threshold deletion. In general, a graph modification problem asks to transform an input graph G into a graph G0 with certain properties using a number of operations such as the insertion or deletion of vertices or edges. We denote by threshold editing the modification problem that only allows the insertion/deletion of edges in order to obtain a threshold graph from an input graph G. A threshold graph is a graph that does not contain P4, C4 or 2K2 as vertex-induced subgraphs (see Section 2.2 for further details and Figure 1.1 for an illustration). In the following, we define the problem more precisely: Threshold Editing Input: A graph G = (V, E) and k ∈ N. Problem: Is there a set F ⊆ V × V of size at most k such that G0 = (V, E∆F ) is a threshold graph? If the only modification allowed is the deletion of edges, the corresponding problem is called threshold deletion. Threshold Deletion Input: A graph G = (V, E) and k ∈ N. Problem: Is there a set F ⊆ E of size at most k such that G0 = (V, E\F ) is a threshold graph? The NP-completeness of threshold deletion is shown in [Mar94]. In [DDLS15], Drange et al. show that the same holds for threshold editing, even on split graphs. Those are graphs whose vertex sets can be partioned into a clique and an independent set [BLS99]. These graphs can also be characterized as the graph class that does not contain C4,C5 or 2K2 as vertex-induced subgraphs. Furthermore, they give algorithms√ solving threshold editing and threshold deletion for an input graph G = (V, E) in 2O( k log k)+ poly(|V |) time, where k is the parameter from the problem definitions. This shows in particular that threshold editing and threshold deletion are fixed-parameter tractable, i.e. their complexity can be described 1 1. Introduction (a) (b) (c) (d) Figure 1.1: P4 (a), C4 (b), C5 (c) and 2K2 (d). c by f(k) · |P | , where P is a problem instance, f : N → N, c ∈ N and k is a parameter of P . This result can be genearalized. In [Cai96], Cai shows that all graph modification problems are fixed-parameter tractable if the target graph class can be characterized by a finite set of forbidden subgraphs. In this thesis we will focus on threshold editing and threshold deletion with quasi-threshold graphs as our problem input, i.e. graphs that do not contain P4 or C4 as vertex-induced subgraphs. As threshold editing is NP-complete on split graphs, the question arises whether the same holds true for quasi-threshold graphs, whose set of forbidden subgraphs is related. Quasi-threshold graphs can be recognized in linear time [JHJJC96]. One of the applications of quasi-threshold graphs is in graph clustering. In [NG13] Nastos et al. present the following idea in order to find clusters in social networks: a graph G representing a social network is transformed into a nearest quasi-threshold graph Q, i.e. the editing distance from G to Q is minimal. The connected components of Q are viewed as clusters of G. In the field of bioinformatics, the approach of transforming an input graph to a set of disjoint cliques and interpreting these cliques as clusters is called cluster editing and widely discussed (see for instance [BB13]). One could adapt this concept for threshold graphs. Hence, theoretical considerations in this area might be of interest. Our contribution. In the case of threshold deletion, we give an algorithm that computes for any quasi-threshold graph Q = (V, E) a nearest threshold graph G0 = (V, E\F ), i.e. the set F has minimal size among all threshold graphs with vertex set V . The algorithm has O(|V | + |E|) complexity, whereas the problem is NP-complete on general graphs. Similarly for threshold editing we present an algorithm that computes a nearest threshold graph G0 = (V, E∆F ), i.e. F has minimal size among all threshold graphs with vertex set V . This algorithm needs O(2n(log n+1)+log n + |V | + |E|) time, where n is the number of vertices that are not leaves in a skeleton of the input graph (see Section 2.1 for details about the skeleton representation of quasi-threshold graphs). Our evaluation shows that its running time is feasible for a set of randomly generated graphs consisting of up to 300 vertices. This suggests that our algorithm can solve threshold editing for quasi-threshold graphs for which the use of the algorithm given in [DDLS15] would not be practical as the quasi-threshold graphs of this size that we examined in our evaluation required about 800 edits. The question whether threshold editing is NP-complete on quasi-threshold graphs remains open. Furthermore we give a heuristic for threshold editing. It is based on the Quasi-Threshold Mover given in [BHSW15], a heuristic for quasi-threshold editing. Additionally, we compare the exact algorithms for threshold deletion, threshold editing and the heuristic for threshold editing with respect to running time and to differences in the number of calculated edits. Preliminaries. Let G be a graph. We define VG as its vertex set and EG as its edge set. For a vertex v ∈ VG, by NG(v) we denote the neighbourhood of v, i.e. N(v) = {u | {u, v} ∈ EG}. 2 Further we define N¯ G(v) := NG(v) ∪ {v}. For a vertex v ∈ VG, the graph G − v is the subgraph of G induced by VG\{v}. The graph G + v, where v∈ / VG, has the same edge set as G and the vertex set VG ∪{v}. For a directed tree T and v ∈ VT , DepthT (v) denotes the depth of v in T . With LeavesT as the set of all leaves in T we define InnerT := VT \ LeavesT as the set of all ‘inner’ vertices, i.e. the vertices that are not leaves. For a vertex v, AncestorsT (v) shall be the set of all ancestors of v in T , i.e.

Load more