On Threshold Editing
Bachelor Thesis of
Matthias Schimek
At the Department of Informatics Institute of Theoretical Informatics
Reviewers: Prof. Dr. Dorothea Wagner Prof. Dr. Peter Sanders Advisor: Michael Hamann, M.Sc.
Time Period: 15th June 2016 – 14th October 2016
KIT – University of the State of Baden-Wuerttemberg and National Laboratory of the Helmholtz Association www.kit.edu
Statement of Authorship
I hereby declare that this document has been composed by myself and describes my own work, unless otherwise acknowledged in the text.
Karlsruhe, 10th October 2016
iii
Abstract In this thesis we consider the problems threshold deletion and threshold editing on quasi-threshold graphs. Both problems ask to transform an input graph into a threshold graph using a minimal number of edge operations. In threshold deletion these are restricted to the deletion of edges only, whereas in threshold editing both the deletion and the insertion of edges are permitted. Threshold graphs are graphs that do not contain a P4, C4 or 2K2 as a vertex-induced subgraph. Both problems are NP-complete on general graphs. We focus on quasi-threshold graphs as input graphs, i.e. graphs that do not contain a P4 or C4 as a vertex-induced subgraph. For the problem of threshold deletion we will give an algorithm that needs O(|V | + |E|) time, where Q = (V,E) is the input graph. In the case of threshold editing we present an algorithm that needs exponential time in the size of the vertex set of the input graph. However, our evaluation shows that the running time of this algorithm is feasible for a set of randomly generated input graphs consisting of up to 300 vertices. Furthermore, we give a heuristic for this problem with a quadratic running time.
Deutsche Zusammenfassung In dieser Arbeit werden die Probleme Threshold Deletion und Threshold Editing auf Quasi-Threshold-Graphen untersucht. Das Problem Threshold Deletion besteht darin, eine minimale Anzahl von Kanten des Eingabegraphen zu löschen, um diesen in einen Threshold-Graphen umzuwandeln. Für Threshold Editing dürfen nicht nur Kanten gelöscht, sondern auch Kanten hinzugefügt werden, um den Eingabegraphen in einen Threshold-Graphen zu überführen. Threshold-Graphen können durch verbotene Subgraphen charakterisiert werden. Es sind genau die Graphen, welche keinen P4, C4 oder 2K2 als knoteninduzierten Subgraphen enthalten. Auf allgemeinen Graphen sind beide Probleme NP-vollständig. Als Eingabegraphen werden in dieser Arbeit nur Quasi-Threshold-Graphen betrach- tet. Diese sind Graphen, die keinen P4 oder C4 als knoteninduzierten Subgraphen enthalten. Für Threshold Deletion wird ein Algorithmus vorgestellt, der das Problem in O(|V | + |E|) löst, wobei Q = (V,E) der Eingabegraph ist. Ebenso wird ein Algorithmus für das Problem Threshold Editing gegeben, dieser hat eine exponen- tielle Laufzeit in der Knotenanzahl des Eingabegraphen. Jedoch erweist sich der Algorithmus auf den zur Evaluation verwendeten zufällig generierten Graphen mit bis zu 300 Knoten als praktikabel. Weiterhin wird eine Heuristik für dieses Problem vorgestellt, deren Laufzeit quadratisch ist.
v
Contents
1 Introduction 1
2 Editing to Threshold Graphs 5 2.1 Quasi-Threshold Graphs ...... 5 2.2 Threshold Graphs ...... 6 2.3 Quasi-Threshold to Threshold Deletion ...... 7 2.4 Quasi-Threshold to Threshold Editing ...... 11 2.4.1 Properties of Nearest Threshold Graphs ...... 11 2.4.2 Exact Algorithm ...... 21 2.4.2.1 Enumerating-Algorithm ...... 22 2.4.2.2 Leaf-Positioning-Algorithm ...... 26 2.4.2.3 Running Time ...... 31 2.4.3 Heuristic ...... 31
3 Experimental Evaluation 35 3.1 Number of Edits ...... 36 3.2 Running Time ...... 37 3.3 Other Graphs ...... 38
4 Conclusion 41
Bibliography 43
vii
1. Introduction
The main topics of this thesis are the graph modification problems threshold editing and threshold deletion. In general, a graph modification problem asks to transform an input graph G into a graph G0 with certain properties using a number of operations such as the insertion or deletion of vertices or edges. We denote by threshold editing the modification problem that only allows the insertion/deletion of edges in order to obtain a threshold graph from an input graph G. A threshold graph is a graph that does not contain P4, C4 or 2K2 as vertex-induced subgraphs (see Section 2.2 for further details and Figure 1.1 for an illustration). In the following, we define the problem more precisely:
Threshold Editing
Input: A graph G = (V,E) and k ∈ N. Problem: Is there a set F ⊆ V × V of size at most k such that G0 = (V,E∆F ) is a threshold graph?
If the only modification allowed is the deletion of edges, the corresponding problem is called threshold deletion.
Threshold Deletion
Input: A graph G = (V,E) and k ∈ N. Problem: Is there a set F ⊆ E of size at most k such that G0 = (V,E\F ) is a threshold graph?
The NP-completeness of threshold deletion is shown in [Mar94]. In [DDLS15], Drange et al. show that the same holds for threshold editing, even on split graphs. Those are graphs whose vertex sets can be partioned into a clique and an independent set [BLS99]. These graphs can also be characterized as the graph class that does not contain C4,C5 or 2K2 as vertex-induced subgraphs. Furthermore, they give algorithms√ solving threshold editing and threshold deletion for an input graph G = (V,E) in 2O( k log k)+ poly(|V |) time, where k is the parameter from the problem definitions. This shows in particular that threshold editing and threshold deletion are fixed-parameter tractable, i.e. their complexity can be described
1 1. Introduction
(a) (b) (c)
(d)
Figure 1.1: P4 (a), C4 (b), C5 (c) and 2K2 (d).
c by f(k) · |P | , where P is a problem instance, f : N → N, c ∈ N and k is a parameter of P . This result can be genearalized. In [Cai96], Cai shows that all graph modification problems are fixed-parameter tractable if the target graph class can be characterized by a finite set of forbidden subgraphs. In this thesis we will focus on threshold editing and threshold deletion with quasi-threshold graphs as our problem input, i.e. graphs that do not contain P4 or C4 as vertex-induced subgraphs. As threshold editing is NP-complete on split graphs, the question arises whether the same holds true for quasi-threshold graphs, whose set of forbidden subgraphs is related. Quasi-threshold graphs can be recognized in linear time [JHJJC96]. One of the applications of quasi-threshold graphs is in graph clustering. In [NG13] Nastos et al. present the following idea in order to find clusters in social networks: a graph G representing a social network is transformed into a nearest quasi-threshold graph Q, i.e. the editing distance from G to Q is minimal. The connected components of Q are viewed as clusters of G. In the field of bioinformatics, the approach of transforming an input graph to a set of disjoint cliques and interpreting these cliques as clusters is called cluster editing and widely discussed (see for instance [BB13]). One could adapt this concept for threshold graphs. Hence, theoretical considerations in this area might be of interest. Our contribution. In the case of threshold deletion, we give an algorithm that computes for any quasi-threshold graph Q = (V,E) a nearest threshold graph G0 = (V,E\F ), i.e. the set F has minimal size among all threshold graphs with vertex set V . The algorithm has O(|V | + |E|) complexity, whereas the problem is NP-complete on general graphs. Similarly for threshold editing we present an algorithm that computes a nearest threshold graph G0 = (V,E∆F ), i.e. F has minimal size among all threshold graphs with vertex set V . This algorithm needs O(2n(log n+1)+log n + |V | + |E|) time, where n is the number of vertices that are not leaves in a skeleton of the input graph (see Section 2.1 for details about the skeleton representation of quasi-threshold graphs). Our evaluation shows that its running time is feasible for a set of randomly generated graphs consisting of up to 300 vertices. This suggests that our algorithm can solve threshold editing for quasi-threshold graphs for which the use of the algorithm given in [DDLS15] would not be practical as the quasi-threshold graphs of this size that we examined in our evaluation required about 800 edits. The question whether threshold editing is NP-complete on quasi-threshold graphs remains open. Furthermore we give a heuristic for threshold editing. It is based on the Quasi-Threshold Mover given in [BHSW15], a heuristic for quasi-threshold editing. Additionally, we compare the exact algorithms for threshold deletion, threshold editing and the heuristic for threshold editing with respect to running time and to differences in the number of calculated edits.
Preliminaries. Let G be a graph. We define VG as its vertex set and EG as its edge set. For a vertex v ∈ VG, by NG(v) we denote the neighbourhood of v, i.e. N(v) = {u | {u, v} ∈ EG}.
2 Further we define N¯ G(v) := NG(v) ∪ {v}. For a vertex v ∈ VG, the graph G − v is the subgraph of G induced by VG\{v}. The graph G + v, where v∈ / VG, has the same edge set as G and the vertex set VG ∪{v}. For a directed tree T and v ∈ VT , DepthT (v) denotes the depth of v in T . With LeavesT as the set of all leaves in T we define InnerT := VT \ LeavesT as the set of all ‘inner’ vertices, i.e. the vertices that are not leaves. For a vertex v, AncestorsT (v) shall be the set of all ancestors of v in T , i.e. all vertices on the path from the root of T to v but not the vertex v itself. Let the set ChildrenT (v) consist of all children of v in T and let the graph SubtreeT (v) be the subtree of v in T including v.
Let G be a graph and F a set with F ⊆ VQ ×VQ. Let T be a threshold graph with VT = VG and ET = EG∆F . The edit distance from G to T is the size of F . For a set F ⊆ EG and a threshold graph T with VT = VG and ET = EG\F , the size of F is called deletion distance from G to T . When there is no ‘target’ threshold graph mentioned, the terms edit or deletion distance of Q mean the edit or deletion distance to a nearest threshold graph of G.
3
2. Editing to Threshold Graphs
2.1 Quasi-Threshold Graphs First we define the class of quasi-threshold graphs.
Definition 2.1. A quasi-threshold graph (QTG) is a graph that does not contain a P4 or C4 as a vertex-induced subgraph.
The following is an equivalent inductive characterization [JHJJC96]:
1. A single vertex is a QTG. 2. Adding a universal vertex to a QTG results in a QTG. A universal vertex is a vertex that is adjacent to all other vertices of a graph. 3. The disjoint union of two QTGs is a QTG.
In [JHJJC96], it is shown as well that every quasi-threshold graph is induced by a directed forest. A directed forest F = (V 0,E0) is a set of directed trees. A quasi-threshold graph Q is induced in the following way: The vertex set of F is taken as that of Q and {u, v} ∈ EQ if and only if there is a path from v to u or a path from u to v in F . A directed forest which induces a quasi-threshold graph is called a skeleton representation of this graph (see Figure 2.1). It also holds that every directed forest induces a quasi-threshold graph. In [JHJJC96] and [BHSW15] linear time recognition algorithms are given that yield a skeleton representation if the input graph is a quasi-threshold graph. In the following we prove some structural properties of skeleton representations. Lemma 2.2. Let S(Q) be a skeleton of a connected QTG Q, then the following holds: ¯ ¯ 1. If DepthS(Q)(v) < DepthS(Q)(w) and {v, w} ∈ EQ then NQ(w) ⊆ NQ(v).
2. If DepthS(Q)(v) = DepthS(Q)(w) then {v, w} ∈/ EQ.
Proof. 1. If {v, w} ∈ EQ, then there must be a path from v to w in S(Q) or vice versa. As DepthS(Q)(v) < DepthS(Q)(w), the path goes from v to w. But then there is a path from v to every vertex x that can be reached from vertex w in S(Q). Also, every vertex x, that can reach v, has a path to w. Therefore we have N¯ Q(w) ⊆ N¯ Q(v).
5 2. Editing to Threshold Graphs
Figure 2.1: Skeleton of a QTG with implicit edges.
2. As two vertices v, w that have the same level in a directed tree cannot reach each other, it follows from the defintion of a skeleton that {v, w} ∈/ EQ.
Lemma 2.3. Iff a QTG Q contains a universal vertex u, then a skeleton representation S(Q) of Q is a tree.
Proof. If S(Q) is a tree, then there is a path from its root u to every vertex v ∈ VQ\{u} in S(Q). This implies that there is an edge {u, v} in Q. Hence, u is a universal vertex of Q. If u is a universal vertex in Q then it has an edge to every v ∈ VQ\{u}. Thus, there is a path from u to v. It follows that the representing forest must be connected.
Lemma 2.4. Let Q be a quasi-threshold graph with at least one universal vertex. Any skeleton representation S(Q) of Q is a tree with root r and r is a universal vertex of Q.
Proof. We know from Lemma 2.3 that any skeleton representation of a quasi-threshold graph that has a universal vertex is a tree. Further Lemma 2.2 (1) shows that DepthS(Q)(u) ≤ DepthS(Q)(v) for all universal vertices u and non-universal vertices v. Since the root of a tree is the only vertex with a depth of zero, it follows that r must be a universal vertex of Q.
2.2 Threshold Graphs Similar to quasi-threshold graphs, threshold graphs can be defined by forbidden subgraphs.
Definition 2.5. A threshold graph (TG) is a graph that does not contain a P4, C4 or 2K2 as a vertex-induced subgraph.
The following inductive characterization is equivalent [CH73].
1. A single vertex is a TG. 2. Adding an isolated vertex results in a TG. 3. Adding a universal vertex results in a TG.
Since every threshold graph is also a quasi-threshold graph, it has a skeleton representation. But a skeleton of a threshold graph can be characterized more restrictively. A directed forest that induces a threshold graph cannot have multiple connected components with more than two vertices as this would imply that the corresponding threshold graph would contain 2K2 as an induced subgraph. The connected component that consists of more than one vertex is a directed tree in which all vertices are either on the central path or leaves
6 2.3. Quasi-Threshold to Threshold Deletion
of vertices on the central path. Otherwise the induced graph would contain 2K2 as an induced subgrah. It follows that a threshold skeleton consists of a directed caterpillar and possibly some additional isolated vertices (Figure 2.2). Furthermore, an algorithm for the recognition of threshold graphs can be obtained from an algorithm recognazing quasi-threshold graphs to which we add the test whether the generated skeleton has the strucutre of a caterpillar (plus possibly additional isolated vertices). This can be achieved in linear time.
2.3 Quasi-Threshold to Threshold Deletion
In this section we treat the problem threshold deletion on quasi-threshold graphs. We will give a linear time algorithm that computes a nearest threshold graph for a given quasi-threshold input graph. Recall that the problem of threshold deletion is NP-complete on general graphs. As isolated vertices form a threshold graph we could delete all edges in Q to obtain a threshold graph from Q. But this is certainly not an optimal solution. Before giving the algorithm we prove two lemmas. They are both not restricted to threshold deletion but also work for the more general case of threshold editing.
Lemma 2.6. Let Q be a quasi-threshold graph and u a universal vertex in Q. Then u is also a universal vertex in all nearest threshold graphs of Q.
Proof. Let T be a nearest threshold graph of Q that can be reached by adding or deleting edges of Q. Assume that u is not a universal vertex of T (from which follows that e edges incident to u have been deleted). As threshold graphs are defined by forbidden subgraphs, T 0 = T − u is also threshold. If we add u as a universal graph to T 0, the resulting graph T 00 is still a threshold graph. The edit distance from Q to T 00 is smaller than the distance to T as we do not have to delete e edges. This contradicts the minimality of T . Thus, u has to be a universal vertex in T .
Lemma 2.7. Let Q be a quasi-threshold graph. We can add a universal vertex u obtaining a connected quasi-threshold graph Q0 that has the same edit distance as Q.
Proof. As Q is quasi-threshold, it follows from the inductive characterization that Q0 is also quasi-threshold. The graph Q0 is connected because u is a universal vertex. Let T 0 be a nearest threshold graph of Q0. It follows from Lemma 2.6 that u is a universal vertex in T 0. Removing u results in a threshold graph T with the vertex set of Q. Thus, the edit distance of Q0 is an upper bound for that of Q. As adding a universal vertex u to any nearest threshold graph T of Q results in a threshold graph with the vertex set of Q0 it follows that the edit distance of Q is also an upper bound for that of Q0. Hence, the edit distances of Q and Q0 are equal.
Figure 2.2: Skeleton of a threshold graph with implicit edges. The two vertices on the left are isolated vertices.
7 2. Editing to Threshold Graphs
In the following we restrict ourselves again to the problem of threshold deletion. Definition 2.8. Let S(Q) be a skeleton of a connected quasi-threshold graph, i.e. S(Q) is a tree. We define the following measures for all vertices v of S(Q). 1. Number of edges #Edges(v) The measure #Edges(v) denotes the number of edges in the subgraph of Q induced by the vertices in SubtreeS(Q)(v). 2. Deletion distance Dist(v) This measure is the deletion distance of the quasi-threshold graph induced by the subgraph of v in S(Q), i.e. the minimal number of deletions that have to be performed to obtain a threshold graph.
In the following paragraph we set C(v) := ChildrenS(Q)(v). The definition of the skeleton representation directly yields that X #Edges(v) = #Edges(c) + | SubtreeS(Q)(c)| c∈C(v) for every v ∈ S(Q).
Lemma 2.9. Let Q be a connected QTG with a skeleton S(Q). For all v ∈ VQ the following holds: ! ! min P #Edges(d) − #Edges(c) + Dist(c) if C(v) 6= ∅ Dist(v) = c∈C(v) d∈C(v) 0 if C(v) = ∅
Proof. We prove this lemma by induction on the number of vertices in Q: Base case n = 1: As C(v) is empty and a graph with a single vertex is already threshold, it follows that Dist(v) = 0. Induction hypothesis: The claim holds for a connected QTG Q with at most n ≥ 1 vertices. Inductive step: Let Q0 be a connected QTG with n + 1 vertices and S(Q0) a skeleton of Q0. The root u of S(Q0) is a universal vertex of Q0 (see Lemma 2.4). We have to consider the following two cases: 1. u has one child c. Let Q be the quasi-threshold graph induced by S(Q) := SubtreeS(Q0)(c). The formula for the deletion distance holds for all v ∈ VQ0 \{u} by the induction hypothesis as
SubtreeS(Q0)(v) = SubtreeS(Q0)−u(v) = SubtreeS(Q)(v)
and |VQ| ≤ n. Thus, we only have to show the formula’s correctnes for u. From Lemma 2.7 we know that Q0 has the same deletion distance as Q. Therefore
Dist(u) = Dist(c) = #Edges(c) − #Edges(c) + Dist(c) X = min #Edges(d) − #Edges(c) + Dist(c) c∈C(u) d∈C(u)
8 2.3. Quasi-Threshold to Threshold Deletion
holds. Furthermore, we can calculate a nearest threshold graph of Q0 by adding u as a universal vertex to the nearest threshold graph of Q. A skeleton of this graph can be obtained by adding u as a root to a skeleton of a nearest threshold graph of Q.
2. u has the children ci, where i ∈ {1, . . . , n} and n ≥ 2.
Let Qi be the quasi-threshold graph induced by SubtreeS(Q0)(ci). For all Qi we find |Qi| < n. Hence, it follows with
0 SubtreeS(Q )(v) = SubtreeS(Qi)(v)
0 for all v ∈ VQi , where i ∈ {1, . . . , n}, that the formula holds for all v ∈ VQ \{u}. Now we show the formula for u. 0 If all Qi are single vertices, Q is already threshold and we have X D(u) = 0 = min 0 − 0 + 0 c∈C(v) d∈C(v) X = min #Edges(d) − #Edges(c) + Dist(c) . c∈C(v) d∈C(v)
Otherwise, there is at least one Qi that is not a single vertex. Consider a nearest S threshold graph of the quasi-threshold graph U := Qi. A skeleton of a threshold i graph consists of a caterpillar and isolated vertices. In a skeleton of a nearest threshold graph of U the vertices in the caterpillar all belong to one Qi as we are only allowed
to delete edges and no vertex v ∈ VQi is adjacent to a vertex w ∈ VQj for i 6= j. Hence, we have to choose as our ’caterpillar graph’ the Qi which causes the lowest costs. The costs are composed of the deletion distance of Qi and the costs of isolating all other Qj, i.e. deleting all their edges. Thus, the nearest threshold graph of U has the deletion distance X 0 d = min #Edges(r ) − #Edges(r) + Dist(r) , r∈R r0∈R
where R is the set that contains all roots of the SubtreeS(Q0)(ci). Since R = C(u) and Q0 has the same deletion distance as Q, it follows that the claim holds in this case as well. Furthermore, a nearest threshold graph of Q0 can be obtained by adding u as a universal vertex to the nearest threshold graph of the lowest-cost Qi and isolating the vertices of all the other Qj with i =6 j. A skeleton of this tree can be obtained by adding u as a root to the skeleton of the lowest-cost threshold graph and attaching the vertices from the other subgraphs as leaves to u. Note that in all cases a skeleton of a nearest threshold graph of Q0 can be obtained by adding u as a root to the skeleton of the lowest-cost child and making all vertices in the other subgraphs (if there are any) leaves of u.
The proof does not only show the correctness of the formula, but it delivers an algorithm to compute a nearest threshold graph T of a quasi-threshold graph Q. For a given skeleton of a connected QTG we can perform a DFS on this tree. In this DFS we recursively calculate the measures #Edges(c), Dist(c) and the size of Subtree(c) for all children c of a vertex v. Using these measures we then calculate the measures for v. For each vertex v we store its lowest-cost child. In order to obtain the implicitly computed skeleton we traverse the
9 2. Editing to Threshold Graphs quasi-threshold skeleton following the path that is given by the stored lowest-cost children, starting from the root. All children and (their descendants in the quasi-threshold skeleton) of a visited vertex v are made leaves of v except for its lowest-cost child. The following pseudo code 2.1 describes this algorithm in detail.
Algorithm 2.1: Threshold Deletion Input: Array children containing all children of a vertex in the skeleton of a connected quasi-threshold graph Q with root r. Output: Array parent containing each vertex’s parent in a skeleton of a nearest threshold graph of Q. Dist(r) contains the deletion distance of Q. // Initialization 1 forall v ∈ VQ do 2 #Edges(v) ← 0 3 Dist(v) ← 0 4 sizeSubtree(v) ← 1 5 parent(v) ← −1 6 bestChild(v) ← ⊥
7 calculate(vertex v) 8 s, i, p, sum← 0 9 forall u ∈ children(v) do 10 calculate(u) 11 s ← s + sizeSubtree(u) 12 i ← i + #Edges(u) 13 sum ← sum + #Edges(u)
14 #Edges(v) ← i + s 15 sizeSubtree(v) ← 1 + s 16 j ← ∞ 17 forall u ∈ children(v) do 18 if sum−#Edges(u) + Dist(u) < j then 19 j ← sum−#Edges(u) + Dist(u) 20 Dist(v) ← j 21 bestChild(v) ← u
22 calculate(r) // calculate measures // create skeleton 23 cur ← r 24 while cur 6= ⊥ do 25 forall v ∈ children(cur) and v 6= bestChild(cur) do 26 forall x ∈ subtree(v) do 27 parent(x) ← cur
28 parent(bestChild(cur)) ← cur 29 cur ← bestChild(cur)
As we only traverse the skeleton two times, the complexity of this algorithm is linear in the number of vertices of Q. Note that this algorithm can even be sublinear in the number of deletions. The algorithm expects a skeleton of the input graph Q. It follows that the overall complexity is in O(|VQ| + |EQ|) as we have to compute a skeleton of Q first.
10 2.4. Quasi-Threshold to Threshold Editing
2.4 Quasi-Threshold to Threshold Editing
In this section we address the problem of threshold editing. First we will give an example of a quasi-threshold graph Q whose edit distance is smaller than its deletion distance. The QTG Q consists of two connected components: a clique of size n and a star graph of size 2n with inner vertex c (see Figure 2.3). Its edit distance is n as we can insert an edge from c to each vertex in the clique obtaining a treshold graph T . In the context of threshold deletion we have to isolate one of the two connected componets in order to obtain a threshold graph. Since the clique has n(n − 1)/2 and the star graph has 2n edges, the deletion distance of Q is 2n if n ≥ 5. The difference between edit and deletion distance of graphs with this structure is n. Hence, there are graphs for which the deletion distance is twice as large as the edit distance. Before giving an algorithm for this problem we prove some properties of nearest threshold graphs in the context of threshold editing.
2.4.1 Properties of Nearest Threshold Graphs We will show that for every QTG Q with a skeleton S(Q) there is a nearest threshold graph T and a skeleton S(T ) of T such that the quadruple (Q, S(Q),T,S(T )) has certain properties. Furthermore, we only consider connected QTGs as we can add a universal vertex to a QTG that is not connected preserving the edit distance (Lemma 2.7). The central result of this section is Theorem 2.10.
Theorem 2.10. Let Q be a connected QTG and S(Q) be a skeleton of Q. There is a nearest threshold graph T (and a skeleton S(T ) of this graph) such that the following holds: 1. A leaf in S(Q) remains a leaf in S(T ). 2. If w is an inner vertex in S(T ), then all ancestors v of w in S(Q) are also inner vertices in S(T ). 3. For two inner vertices v, w of S(Q) and S(T ) if v is an ancestor of w in S(Q), then the same holds for S(T ).
c
Figure 2.3: Skeleton of a graph Q whose edit distance (5) is smaller than its deletion distance (10).
11 2. Editing to Threshold Graphs
Before proving this theorem, we want to il- ... lustrate it. Figure 2.4 shows a segment of a skeleton of a quasi-threshold graph Q. The 0 leaves of the skeleton are coloured in blue. Let T be a nearest threshold graph of Q with 10 1 v skeleton S(T ) such that propositions (1) - (3) from Theorem 2.10 hold. 2 3 ... 5 9 Part (1) states that none of the blue coloured vertices can be an inner vertex of S(T ). Part (2) says that if w is an inner vertex of S(T ), 6 7 10 then for the vertices 6, 5, v, 0 the same holds true. The third part of this theorem is that if w 8 12 11 13 v and w are inner vertices of S(T ), the vertex w is situated below v in the skeleton S(T ). We ...... will show Theorem 2.10 by using Lemma 2.11, 2.12, 2.13, 2.14 and 2.15, which we prove first. Figure 2.4: Segment of a skeleton of a QTG Lemma 2.11. Let Q be a quasi-threshold graph, T a threshold graph of Q with skeleton S(T ) and v, w ∈ VQ vertices with NQ(w) ⊆ N¯ Q(v) and NT (v) ⊆ N¯ T (w). We denote the skeleton generated by a swap in the positions of v and w in S(T ) by S(T 0). It holds that the edit distance from Q to the threshold graph T 0 induced by S(T 0) is at most as great as that to T .