Graph modification problems related to graph classes
Federico Mancini
Dissertation for the degree of Philosophiae Doctor (PhD)
University of Bergen Norway
May 2008
Acknowledgements
The first person I need to thank is my supervisor Pinar Heggernes. Without her guidance, encouragement and scolding from time to time, this work would not exist. Thank you for taking me as your student, teaching me so much and believing in me from the very start. I have never told you how much this meant to me, but I hope this thesis can make up for at least some of it. These three years gave me the opportunity to fulfill many of my dreams, and for this I will always be thankful to you. Another person to whom I owe a lot for his unconditional help, even when he hardly knew me, is Marc Bezem. Your support has been critical on many occasions, including when I had to decide whether to apply for this PhD. Thank you for convincing me to do it, or I would have regretted it forever. I would also like to thank all my co-authors Hans L. Bodlaender, Michael R. Fellows, Fedor V. Fomin, Pinar Heggernes, Jan Kratochvil, Daniel Lokshtanov, Charis Papadopoulos, Frances Rosamond and Jan Arne Telle for the fruitful collaborations and interesting discussions. Solving problems alone is boring :-). And speaking of collaborations, I cannot forget Andreas Brandstädt, Vang Ban Le, Christian Hundt, Peter Wagner and Nguyen Ngoc Tuy for making the two months at Rostock University extremely enjoyable for me and my family. If my PhD has been such a great experience, it is due mostly to all my colleagues at the Algorithms Group. I would like to thank in particular Alexey for all the fun time we spent together fishing, hiking, drinking or talking about just anything. All of this before we both got married, of course... Joanna, thanks for never letting things get boring, and Morten, one day we will catch that giant cod, I promise. Special thanks to Yngve, who is always there with a reasonable answer to practically any question one could think about. 
Maybe too many stupid questions are also the reason why he moved out of our office, but thanks to this I got a very thoughtful new office mate. Daniel, thanks for always making sure that I do not work too much, and Saket, thank you for keeping Daniel out of the office. Serge, Rodica and Daniel, thanks to you too for being around and helping me whenever I needed it. I also want to thank my parents for always supporting my choices and making sure that I have everything I need, despite the distance that separates us. Finally I want to dedicate this work to my lovely wife Hilde and our beautiful son Alessandro. Hilde, thanks for taking care of Alessandro on your own while I was away, giving me the chance to focus on this thesis and finish it in time.
Bergen, May 2008
Federico Mancini

Contents
Part I
1 Introduction
2 Notation and definitions
2.1 Graph classes
2.1.1 Perfect graphs and their subclasses
2.1.2 Other graph classes
3 Practical applications of graph modification problems
3.1 Minimizing the number of modifications
3.1.1 Chordal graphs
3.1.2 Interval graphs
3.1.3 Cluster graphs
3.1.4 Planar graphs
3.1.5 Connected graphs
3.1.6 Graphs characterized by cycles
3.1.7 Comparability graphs
3.2 Minimizing the maximum clique size
3.2.1 Treewidth and Pathwidth
3.2.2 Bandwidth
3.3 Modification with restrictions
3.3.1 Sandwich problem
3.3.2 Extending colorings
3.3.3 Probe graphs
4 Methods to solve graph modification problems
4.1 Restricted inputs
4.2 Minimality
4.2.1 Characterizing minimality
4.2.2 Vertex incremental approach
4.2.3 Extraction and sandwich problem
4.2.4 Minimality for other modification problems
4.3 Parameterized algorithms
5 Solving problems on modified graphs
5.1 Parameterized graph families
5.2 Getting more realistic: Fuzzy Graphs
6 Conclusions
Part II
7 Paper I
8 Paper II
9 Paper III
10 Paper IV
11 Paper V

Part I
Chapter 1
Introduction
This thesis consists of two parts. In the second part, the research papers that constitute the new results of the thesis are presented. In this first part we want to put these results in a broader perspective and provide a better background on the general topic of graph modification problems. When speaking about graph modification problems, we mean problems concerned with deleting or adding edges or vertices from and to a graph, so that the resulting graph has some specified properties. As graphs can be used to represent various real world and theoretical structures, it is not difficult to see that these modification problems can model a large number of practical applications in several different fields. Some examples are: network reliability; numerical algebra; molecular biology; computer vision; and relational databases. It is thus natural that such problems have been widely studied, but often presented in different terms and contexts, so that similar results have been rediscovered independently several times. This is one of the reasons why we want to give an organized presentation of all these different, but sometimes equivalent, works. In Chapter 2 we give basic notation and definitions about graph theory and graph classes. Chapter 3 consists of a brief survey of complexity results, with some discussion of the main motivations behind graph modification problems. In Chapter 4 we provide a more technical insight into various methods used to attack these problems, with special focus on those related to the papers that constitute the second part of this thesis. One main approach (Chapter 4.2) consists in relaxing the objective function and asking for a minimal, rather than a minimum, set of edges or vertices that, when added to or deleted from a graph, gives the desired property. Another one (Chapter 4.3) consists in designing algorithms that are optimized for instances of the problem that can be solved by applying a small number of modifications. 
Finally in Chapter 5 we describe two special kinds of graph families on which modification problems have not been studied extensively yet. We give some introductory discussion on the properties of these families and some basic results. Our aim is to point out their connection
with graph modification problems, and why we think they are worth investigating from this specific point of view. The five papers that the author produced during his PhD studies, and that form the second part of this thesis, are the following.
• Paper I [130]: Minimal Split Completion of Graphs. Pinar Heggernes and Federico Mancini. (Accepted for publication in Disc. Appl. Math.) This paper deals with the problem of adding an inclusion minimal set of edges to a graph, in order to obtain a split graph. We give an algorithm that solves this problem in time linear in the size of the input. In addition we show that split graphs are sandwich monotone, and, as a consequence, the Minimal Split Extraction problem can be solved in linear time as well.
• Paper II [131]: Minimal Comparability Completions of Arbitrary Graphs. Pinar Heggernes, Federico Mancini, and Charis Papadopoulos. (Disc. Appl. Math. 156(1):705–71, 2008.) Here we show that the Minimal Comparability Completion problem is polynomial time solvable. The result is interesting considering that comparability graphs do not have the sandwich monotone property. This gives, in fact, further indication that completion problems into graph classes without this property are harder to solve.
• Paper III [178]: Characterizing and Computing Minimal Cograph Completions. Daniel Lokshtanov, Federico Mancini and Charis Papadopoulos. (The short version of this paper appeared in Springer LNCS Proceedings of FAW’08.) The aim of this work is to characterize minimal cograph completions in order to design an efficient algorithm for minimality testing. We accomplish this by showing that testing whether a cograph completion is minimal, and computing a minimal cograph completion of an arbitrary input graph, can be done in time linear in the size of the completion.
• Paper IV [179]: Minimum Fill-in and Treewidth of Split+ke and Split+kv Graphs. Federico Mancini (The short version of this paper appeared in Springer LNCS Proceedings of ISAAC’07.) A parameterized graph class Π+kv is the class consisting of all graphs that can be obtained from some graph in the class Π by adding at most k
vertices to it. We study the parameterized complexity of Minimum Fill-in and Treewidth on Split+kv graphs, when the parameter is k. We get the surprising result that, when k = 1, Minimum Fill-in is NP-complete, while Treewidth can be solved in linear time.
• Paper V [20]: Clustering with partial information. Hans L. Bodlaender, Michael R. Fellows, Pinar Heggernes, Federico Mancini, Charis Papadopoulos and Frances Rosamond. (The short version of this paper has appeared in Springer LNCS Proceedings of MFCS’08.) In this paper we consider graphs that lack some information and we investigate the parameterized complexity of some clustering problems on them. The lack of information is represented in the form of undetermined edges, which can be freely turned into either edges or non-edges. This adds a new level of difficulty to the problems, but also makes the model more realistic. We propose some natural parameters to cope with the hardness arising from the lack of information, and show various complexity results depending on which parameters are used.

Chapter 2
Notation and definitions
A simple and undirected graph G is a pair (V, E) where V is the set of vertices, E ⊆ V × V is the set of edges, and two vertices u, v ∈ V have an edge between them, or are said to be adjacent, if and only if uv ∈ E. If uv ∉ E, we say that uv is a non-edge. If the edges have a direction, then we call each of them an arc and the graph is directed. In that case we denote an arc by (x, y) ∈ E if the direction is from x to y, and by (y, x) ∈ E otherwise. If there is more than one edge, or arc, between the same pair of vertices, these are called parallel edges, and the graph is no longer simple. For a graph G = (V, E), we let n = |V| and m = |E|. The set of neighbors of a vertex v ∈ V, namely its neighborhood, is denoted by N_G(v) = {u | uv ∈ E}, and the degree of v is denoted by d_G(v) = |N_G(v)|. We also define N_G[v] = N_G(v) ∪ {v}, the closed neighborhood of v. We will omit the subscript when there is no risk of ambiguity.
A subgraph of G = (V, E) is a graph G1 = (V, E1) with E1 ⊆ E. Similarly, a supergraph of G = (V, E) is a graph G2 = (V, E2) with E2 ⊇ E. We will denote these relations informally by the notation G1 ⊆ G ⊆ G2 (the proper subgraph relation is denoted by G1 ⊂ G). A partial subgraph, instead, is a graph G′ = (V′, E′) such that both V′ ⊆ V and E′ ⊆ E. Keeping this distinction in mind, note that what we call a “subgraph” is sometimes called a “spanning subgraph” in the literature. We distinguish between subgraphs and induced subgraphs. An induced subgraph of G = (V, E) is a graph G′ = (V′, E′), where V′ ⊆ V and E′ = {uv | uv ∈ E ∧ u, v ∈ V′}. We denote such an induced subgraph by G[V′], or, if V′ = V \ {x}, by G − x. A graph H is a minor of another graph G if it can be obtained by edge contractions from some partial subgraph of G. Contracting an edge uv means replacing the vertices u and v with a new vertex x such that N(x) = (N(u) ∪ N(v)) \ {u, v}. The complement of G = (V, E) is denoted by Ḡ = (V, Ē), where Ē = {uv | u, v ∈ V ∧ u ≠ v ∧ uv ∉ E}.

A graph is said to be connected if there exists a path connecting any two vertices of the graph. If a graph is disconnected, we refer to each maximal connected induced subgraph as a connected component of the graph. Let us now define some types of vertex sets. Given a graph G = (V, E), we call a set of vertices S ⊆ V a clique if G[S] is a complete graph, i.e., a graph with an edge between every pair of vertices. Similarly, we call S an independent set if G[S] is an edgeless graph. The dual of an independent set is called a vertex cover. That is, a set S is a vertex cover if V \ S is an independent set. In other words, every edge of the graph has at least one endpoint in S, i.e., it is “covered” by S. A related concept is that of a dominating set, namely a set that “covers” all vertices of the graph. Formally, a set S is a dominating set of G if N[S] = V, where N[S] is the union of the closed neighborhoods of the vertices in S. A set S ⊂ V is said to be a separator for two vertices u, v ∈ V \ S, or a u, v-separator, if u and v belong to two different connected components of G[V \ S]. Furthermore, S is a minimal u, v-separator if no proper subset of S is a u, v-separator. Finally, a set S is a minimal separator of G if there exist at least two vertices u, v such that S is a minimal u, v-separator. This notation is sufficient to start defining the graph modification problems that we will study. Let us first point out that saying that a graph has a certain property is equivalent to saying that it belongs to a certain graph class. In fact, every graph property defines a graph class, namely the set of graphs that satisfy the property. Therefore we will say both that a graph satisfies the property Π and that it belongs to (the class) Π, without explicitly distinguishing between the two. Given a property (graph class) Π, we define the following four graph modification problems.
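The separator definitions can be checked directly on small examples. The sketch below is our own illustration, not taken from the papers in Part II; the adjacency-dictionary representation and the function names are assumptions. It relies on one observation that follows from the definition: adding further vertices (other than u and v) to a u, v-separator keeps it a u, v-separator, so S is minimal exactly when no set S \ {x}, for x ∈ S, is a u, v-separator.

```python
from collections import deque

def reachable(adj, source, removed):
    """Vertices reachable from source once the vertices in `removed` are deleted."""
    seen = {source}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in removed and y not in seen:
                seen.add(y)
                queue.append(y)
    return seen

def is_uv_separator(adj, S, u, v):
    """True if u, v lie outside S and in different components of G[V minus S]."""
    S = set(S)
    if u in S or v in S:
        return False
    return v not in reachable(adj, u, S)

def is_minimal_uv_separator(adj, S, u, v):
    """Minimality only needs the |S| sets obtained by dropping a single vertex."""
    S = set(S)
    if not is_uv_separator(adj, S, u, v):
        return False
    return all(not is_uv_separator(adj, S - {x}, u, v) for x in S)

# Hypothetical example: two a-d routes, a-b-c-d and a-x-c-d.
adj = {"a": {"b", "x"}, "b": {"a", "c"}, "x": {"a", "c"},
       "c": {"b", "x", "d"}, "d": {"c"}}
print(is_minimal_uv_separator(adj, {"b", "x"}, "a", "d"))       # True
print(is_minimal_uv_separator(adj, {"b", "x", "c"}, "a", "d"))  # False
```

The second call returns False because {b, x, c} does separate a from d, but so does its proper subset {b, x}.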
• Π-Completion: Add a set of edges F to an input graph G = (V, E), with E ∩ F = ∅, so that the resulting supergraph H = (V, E ∪ F) has the property (belongs to) Π. The graph H is also called a Π-completion of G, and the edges in F are referred to as fill edges.
• Π-Deletion: Remove a set of edges F ⊆ E from an input graph G = (V, E), so that the resulting subgraph H = (V, E \ F) has the property (belongs to) Π. The graph H is also called a Π-deletion of G, and the edges in F are referred to as deleted edges.
• Π-Editing: Given an input graph G = (V, E), find a set F of edges and non-edges such that the graph H = (V, (E \ F) ∪ (F \ E)) has the property (belongs to) Π.
• Π-Vertex Deletion: Given an input graph G = (V, E), find a set V′ ⊂ V such that the resulting induced subgraph H = G[V \ V′] has the property (belongs to) Π.
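The four operations differ only in how the vertex and edge sets of H are formed. A minimal sketch, assuming edges are stored as Python frozensets of two vertices (a representation chosen here only for illustration), makes the set expressions above concrete; in particular, Π-Editing is the symmetric difference of E and F.

```python
def e(u, v):
    """An undirected edge uv as an unordered pair."""
    return frozenset((u, v))

def completion(E, F):
    if E & F:
        raise ValueError("fill edges must be non-edges of G")
    return E | F                      # H = (V, E ∪ F)

def deletion(E, F):
    if not F <= E:
        raise ValueError("deleted edges must be edges of G")
    return E - F                      # H = (V, E \ F)

def editing(E, F):
    return (E - F) | (F - E)          # remove the edges of F, add the non-edges of F

def vertex_deletion(V, E, X):
    keep = V - X                      # H = G[V \ X]
    return keep, {f for f in E if f <= keep}

E = {e("a", "b"), e("b", "c")}
print(editing(E, {e("b", "c"), e("a", "c")}) == {e("a", "b"), e("a", "c")})  # True
```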
The problems stated in this form are not really interesting, because we do not have a proper objective function to optimize, and the answer might become trivial. For example, for most graph classes for which the Π-Completion problem is interesting, the complete graph is always a valid Π-completion of the input graph. The same is true of the edgeless graph for the Π-Deletion problem. The most natural parameter one might want to optimize, in particular minimize, is the number of modified edges or vertices, i.e., |F| or |V′|. In this case we speak of the Minimum Π-Completion/Deletion/Editing/Vertex Deletion problems. Notice that rather than minimizing the number of added or deleted edges or vertices, we can reformulate the problems equivalently, so that the objective function is the size of the resulting graph. That is:
• Minimum Π-Completion ≡ Minimum Π Supergraph: Find the small- est supergraph with the property Π.
• Minimum Π-Deletion ≡ Maximum Π Subgraph: Find the biggest subgraph with the property Π.
• Minimum Π-Editing ≡ Closest Π Graph: Find the graph with the property Π that differs from the input graph in the minimum number of edges and non-edges.
• Minimum Π-Vertex Deletion ≡ Maximum Π Induced Subgraph: Find the biggest induced subgraph with the property Π.
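For very small graphs, Minimum Π-Completion can be solved by brute force: try sets of non-edges in order of increasing size, and the first set that yields a graph in Π is a minimum completion. The sketch below is exponential and only meant as an illustration. The predicate `has_property` is an assumed interface, and as an example property we use cluster graphs (disjoint unions of cliques, defined later in this chapter), tested through the fact that they are exactly the P3-free graphs.

```python
from itertools import combinations

def is_cluster(V, E):
    """P3-free test: every two neighbours of a vertex must be adjacent."""
    adj = {v: set() for v in V}
    for edge in E:
        x, y = tuple(edge)
        adj[x].add(y)
        adj[y].add(x)
    return all(frozenset((a, b)) in E
               for v in V for a, b in combinations(sorted(adj[v]), 2))

def minimum_completion(V, E, has_property):
    """Return a smallest set F of non-edges such that (V, E ∪ F) has the property."""
    non_edges = [frozenset(p) for p in combinations(sorted(V), 2)
                 if frozenset(p) not in E]
    for k in range(len(non_edges) + 1):
        for F in combinations(non_edges, k):
            if has_property(V, E | set(F)):
                return set(F)
    return None  # only possible if even the complete graph lacks the property

V = {"a", "b", "c", "d"}
P4 = {frozenset(p) for p in [("a", "b"), ("b", "c"), ("c", "d")]}
print(len(minimum_completion(V, P4, is_cluster)))  # 3: the only option is K4
```

Since a completion can only add edges, a connected input such as P4 must become a single clique, which is why three fill edges are needed here.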
In Figure 2.1, we give an example of these problems. Notice that from the point of view of an optimal solution, the two formulations of each problem are completely equivalent, but they might lead to different results when considered, for instance, from a parameterized point of view (see Chapter 4.3). Other combinations of modifications can also be considered, for example vertex and edge deletion together, i.e., finding the maximum partial subgraph with a certain property. However, we will focus only on the ones defined above. Throughout this work, we will consider many graph classes. For this reason, in the next section we describe the most relevant ones, giving one or more characterizations for each of them.
2.1 Graph classes
First we introduce some general notions, and then we describe in detail the graph classes most relevant to this work. A property is said to be non-trivial if it holds for at least one graph, but not for all graphs, while it is said to be interesting if there are infinite families of graphs for which it is true, and infinite families of graphs for which it is false. A monotone property is a property preserved under deletion of edges and vertices. Hence a graph class is monotone if every partial subgraph of a graph in the class is also in the class. Bipartite graphs (Chapter 2.1.1) and planar graphs (Chapter 2.1.2), for example, are monotone. If a property is preserved under vertex deletion, it is called hereditary. A graph class is hereditary if, given a graph in the class, every induced subgraph is also in the class. Clearly every monotone property is also hereditary, but not vice versa. For example, the class of complete graphs is hereditary, but not monotone.

[Figure 2.1. Panels: input graph; minimum chordal completion; minimum chordal deletion; minimum chordal editing; minimum chordal vertex deletion.]
Figure 2.1: The property Π that we want the modified graph to have is that of being chordal (see Chapter 2.1.1), i.e., the graph does not contain induced chordless cycles of length 4 or more. The dashed edges represent fill edges.

Another useful way to look at hereditary graph classes is the following characterization.
Theorem 2.1.1 ([43]) A graph class Π is hereditary if and only if it can be characterized by a (possibly infinite) set of forbidden induced subgraphs F .
When we characterize a hereditary graph class Π by the set of its forbidden induced subgraphs F = {F1, F2, ...}, we say that Π is the class of (F1, F2, ...)-free graphs. An example of a property that is neither hereditary nor monotone is connectivity: even though a graph is connected, it might have disconnected induced or partial subgraphs. The graph classes we are going to present can be separated into two main categories: perfect graphs and their subclasses, and non-perfect graphs. For proofs and more details on properties and characterizations of perfect and other graphs, we refer the reader to the books by Golumbic [107]; Brandstädt, Le and Spinrad [36]; and McKee and McMorris [186]. Since all results we cite in this chapter are collected in the above mentioned books, we will not give specific references for each result. Before continuing, we define some basic graph names. A path on k vertices is denoted by Pk, Ck is the chordless cycle on k vertices, and Kk is the complete graph on k vertices. Recall that saying that a graph contains a path or a cycle as a subgraph is different from saying that it contains an induced path or cycle. For instance, a complete graph contains every graph (on at most the same number of vertices) as a subgraph, but only complete graphs as induced subgraphs. Other basic graphs are given in Table 2.1. We will often mention that some graphs are defined as intersection graphs of a family of sets. This means that the graph is obtained by representing each set of the family by a vertex, and has an edge between two vertices if and only if their corresponding sets intersect. Finally, two graphs G and H are isomorphic if there is a bijection f : V(G) → V(H) such that two vertices u, v ∈ V(G) are adjacent if and only if the vertices f(u) and f(v) are adjacent in H. In other words, the two graphs are identical up to the labeling of their vertices.
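The last two notions of this paragraph are easy to state in code. The following sketch is our own illustration (the function names and the dictionary-of-sets input are assumptions): it builds the intersection graph of a family of sets, and tests isomorphism by trying all bijections, which is only feasible for a handful of vertices.

```python
from itertools import combinations, permutations

def intersection_graph(family):
    """family maps each vertex to its set; uv is an edge iff the sets intersect."""
    return {frozenset((u, v)) for u, v in combinations(family, 2)
            if family[u] & family[v]}

def are_isomorphic(V1, E1, V2, E2):
    """Brute force over all bijections f: V1 -> V2 (n! candidates)."""
    V1, V2 = list(V1), list(V2)
    if len(V1) != len(V2) or len(E1) != len(E2):
        return False
    for image in permutations(V2):
        f = dict(zip(V1, image))
        # f maps E1 injectively into E2; equal sizes make it a bijection of edges
        if all(frozenset(f[x] for x in edge) in E2 for edge in E1):
            return True
    return False

family = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5}}
E1 = intersection_graph(family)           # the path a - b - c
E2 = {frozenset(("x", "z")), frozenset(("z", "y"))}
print(are_isomorphic(family, E1, {"x", "y", "z"}, E2))  # True
```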
Table 2.1: Some basic graphs (drawings not reproduced; a bar denotes the complement).
Paths: P2 = K2, P3, P4 = P̄4, P5, Pk
Cliques: 2K2 (two disjoint copies of K2), K3, K4, K5, Kk
Cycles and holes: C3, C4 (C̄4 = 2K2), C5 = C̄5, Ck
Chordal AT's: T2, net = XF_2^1, X31, XF_2^k (k > 0), XF_3^k (k ≥ 0)
Wheels: W3, W4, Wk (k ≥ 5)
Complete bipartite graphs: K3,3, Kn,m
Suns: S3 = XF_3^0, S4, S5, Sk
Stars: claw = K1,3, k-star = K1,k
Others: gem, house = P̄5, domino
Table 2.2: Forbidden induced subgraphs of comparability graphs (drawings not reproduced). This list is taken from [37]. The graphs on the left are some forbidden induced subgraphs of comparability graphs, while the graphs on the right are the complements of the remaining forbidden induced subgraphs of comparability graphs.
2.1.1 Perfect graphs and their subclasses
A graph G is perfect if for every induced subgraph H, the chromatic number χ(H) equals the size ω(H) of a largest complete subgraph. The chromatic number of a graph is the minimum number of colors that can be assigned to its vertices so that no two adjacent vertices have the same color. In general, the size of a maximum clique is only a lower bound for the chromatic number, and it is NP-complete to decide whether a graph can be colored with k colors, even when k is a fixed constant greater than 2 [102]. Perfect graphs have the useful property that both the maximum clique size and the chromatic number can be computed in polynomial time [115]. Only very recently it was proved that perfect graphs are exactly the Berge graphs, i.e., the graphs that contain neither odd holes nor odd anti-holes [60]. A hole is an induced cycle on at least 5 vertices, and an anti-hole is simply the complement of a hole. So perfect graphs are the (C2k+1, C̄2k+1)-free graphs for every k > 1. Since this set of forbidden subgraphs is closed under complementation, the complement of a perfect graph is also perfect, that is, this is a self-complementary graph class. Therefore a maximum independent set can also be found in polynomial time on perfect graphs, as it is a maximum clique in the complement graph.
Perfect graphs are so interesting and well studied that most of the other classes presented here are actually subclasses of perfect graphs (see Fig. 2.2 for an overview).
Comparability graphs
Given a graph with directed edges, we say that it is transitive if, whenever (x, y) and (y, z) are arcs of the graph, (x, z) is also an arc. Comparability graphs are the undirected graphs whose edges admit a transitive orientation, and they are also called transitively orientable graphs. The set of forbidden induced subgraphs that characterizes them is also known, thanks to a notable paper by Gallai [99]. We report them in Table 2.2, to give an example of a class for which the set of forbidden induced subgraphs is infinite, and definitely not trivial to obtain. The class of co-comparability graphs is the family of graphs whose complement is a comparability graph.
d-Trapezoid graphs
d-Trapezoid graphs are a subclass of co-comparability graphs. They are defined as intersection graphs of d-trapezoids, where a d-trapezoid is a polygon defined by intervals on d parallel lines. More specifically, take d parallel lines ordered from 1 to d, and an interval on each of them. Then, for each 1 ≤ i ≤ d − 1, connect the left (right) endpoint of the interval on line i to the left (right) endpoint of the interval on line i + 1 to get the trapezoid. Each d-trapezoid is a vertex of the graph, and two vertices are adjacent if the corresponding d-trapezoids intersect. One useful property of these graphs is that they have O((2n − 3)^d) minimal separators.
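To make the definition of comparability graphs concrete, the following sketch tries all 2^m orientations of the edges and checks the transitivity condition directly. It is an exponential illustration of ours (the names are hypothetical), not a recognition algorithm from the literature; polynomial-time recognition is possible but considerably more involved.

```python
from itertools import product

def is_transitive(arcs):
    """Whenever (x, y) and (y, z) are arcs, (x, z) must be an arc too."""
    return all((x, z) in arcs
               for (x, y) in arcs for (y2, z) in arcs if y == y2)

def transitive_orientation(edges):
    """Return a transitive orientation as a set of arcs, or None if none exists."""
    edges = [tuple(edge) for edge in edges]
    for flips in product((False, True), repeat=len(edges)):
        arcs = {(v, u) if flip else (u, v)
                for (u, v), flip in zip(edges, flips)}
        if is_transitive(arcs):
            return arcs
    return None

P4 = [("a", "b"), ("b", "c"), ("c", "d")]
C5 = [(i, (i + 1) % 5) for i in range(5)]
print(transitive_orientation(P4) is not None)  # True
print(transitive_orientation(C5) is None)      # True
```

C5 fails because any orientation of an odd cycle has a directed path x → y → z along the cycle, and in C5 the vertices x and z are not adjacent.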
Permutation graphs
Permutation graphs are a subclass of comparability graphs. In particular, they are the graphs that are both comparability and co-comparability graphs. Therefore they form a self-complementary graph class, and their forbidden induced subgraphs are those of comparability graphs together with their complements. The name permutation comes from the definition of this graph class. Consider a permutation π of the numbers 1, 2, 3, ..., n, and let π⁻¹(i) denote the position in π where the number i can be found. So, for example, given π = (4, 3, 1, 2, 5), we have π⁻¹(3) = 2, π⁻¹(5) = 5, π⁻¹(1) = 3 and so on. Now let us define G[π] = (V, E) as the undirected graph with vertices labeled V = {1, 2, 3, ..., n} where ij ∈ E if the larger of i and j appears to the left of the other one in π. More formally, ij ∈ E if (i − j)(π⁻¹(i) − π⁻¹(j)) < 0. Then an undirected graph G is called a permutation graph if there exists a labeling of its vertices V = {1, 2, 3, ..., n} and a corresponding permutation π such that G is isomorphic to G[π]. By using a graphical interpretation of the permutation defining a permutation graph, it is straightforward to see that permutation graphs are also a subclass of 2-trapezoid graphs. Take two parallel lines and choose the intervals on each line as single points. On the first line we choose the intervals I_i = {i}, and on the second line the intervals J_i = {π⁻¹(i)}. The 2-trapezoid of vertex i then degenerates to the segment joining I_i and J_i, and two such segments cross exactly when (i − j)(π⁻¹(i) − π⁻¹(j)) < 0, that is, exactly when ij ∈ E.
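The definition of G[π] translates directly into code. In this sketch (the function name is ours), the dictionary `pos` plays the role of π⁻¹, and the example reproduces the permutation π = (4, 3, 1, 2, 5) used above.

```python
from itertools import combinations

def permutation_graph(pi):
    """Edges of G[pi] for a permutation pi of 1..n, as pairs (i, j) with i < j."""
    pos = {value: index for index, value in enumerate(pi, start=1)}  # pos[i] = π⁻¹(i)
    return {(i, j) for i, j in combinations(sorted(pos), 2)
            if (i - j) * (pos[i] - pos[j]) < 0}

pi = (4, 3, 1, 2, 5)
print(sorted(permutation_graph(pi)))
# [(1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

Vertex 5 is isolated, since it is both the largest number and in the last position.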
Weakly Chordal graphs
A graph G is weakly chordal if neither G nor its complement Ḡ contains an induced cycle Ck with k ≥ 5. That is, they are exactly the graphs that contain neither holes nor anti-holes, i.e., the (Ck, C̄k)-free graphs for k ≥ 5. Hence this class is also self-complementary.
HHD-free graphs
The HHD-free graphs are the graphs that contain no house, hole or domino as an induced subgraph. Every cycle Ck with k ≥ 6 contains an induced P5, so every anti-hole C̄k with k ≥ 6 contains an induced house (the complement of P5); moreover, C̄5 = C5 is itself a hole. Hence an HHD-free graph contains no anti-holes either, and this class forms a subclass of weakly chordal graphs.
Distance Hereditary graphs
A graph is called distance hereditary if it is connected and every induced path is isometric, that is, if the distance function in every connected induced subgraph of G is the same as in G itself. In other words, every induced path is a shortest path. The characterization of this class by forbidden induced subgraphs shows that distance hereditary graphs form a subclass of HHD-free graphs. They are, in fact, the house, hole, domino and gem free graphs, or HHDG-free graphs.
Chordal graphs
A graph G = (V, E) is chordal if it does not contain any induced Ck with k ≥ 4, or, equivalently, if every cycle of length at least 4 has a chord. Many other characterizations are known for this well studied graph class, but we will give only the ones most relevant to our work. In order to do this, we need to introduce some more definitions. A vertex v is said to be simplicial if N[v] is a clique. A perfect elimination ordering of a graph G = (V, E) is an ordering β = (v1, v2, ..., vn) of its vertices such that vi is a simplicial vertex in G[{vi, vi+1, ..., vn}] for each 1 ≤ i ≤ n. It turns out that a graph is chordal if and only if it has a perfect elimination ordering.
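The perfect elimination ordering characterization yields a simple recognition procedure. The sketch below uses Maximum Cardinality Search, a standard technique due to Tarjan and Yannakakis that is not described in the text above: it repeatedly visits an unvisited vertex with the most visited neighbors, and the reverse of the visiting order is a perfect elimination ordering if and only if the graph is chordal. The ordering is then verified against the definition. This quadratic-time version favors clarity over the linear-time bound achievable with careful bookkeeping; graphs are assumed to be given as adjacency dictionaries.

```python
def is_perfect_elimination_ordering(adj, order):
    """Each vertex must be simplicial among itself and the vertices after it."""
    position = {v: i for i, v in enumerate(order)}
    for v in order:
        later = [u for u in adj[v] if position[u] > position[v]]
        for a in range(len(later)):
            for b in range(a + 1, len(later)):
                if later[b] not in adj[later[a]]:
                    return False
    return True

def maximum_cardinality_search(adj):
    """Return the reverse of the MCS visiting order."""
    weight = {v: 0 for v in adj}   # number of already visited neighbours
    unvisited, visited = set(adj), []
    while unvisited:
        v = max(unvisited, key=lambda x: weight[x])
        unvisited.remove(v)
        visited.append(v)
        for u in adj[v]:
            if u in unvisited:
                weight[u] += 1
    return visited[::-1]

def is_chordal(adj):
    return is_perfect_elimination_ordering(adj, maximum_cardinality_search(adj))

C4 = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
C4_chord = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
print(is_chordal(C4), is_chordal(C4_chord))  # False True
```

Ties in the search can be broken arbitrarily; the correctness of the test does not depend on the tie-breaking rule.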
Another characterization that is central to graph modification problems into chordal graphs involves the minimal separators of a graph: a graph is chordal if and only if all its minimal separators are cliques. Since chordal graphs contain neither C4's nor holes, they cannot contain the domino or the house either (both contain an induced C4); therefore they are also HHD-free graphs.
Strongly Chordal graphs
Strongly chordal graphs are a proper subclass of chordal graphs. They are defined as the chordal graphs in which every cycle C of even length at least 6 has a chord that divides C into two paths of odd length. They also have a nice characterization through forbidden induced subgraphs: they are, in fact, the chordal graphs that do not contain k-suns (also called trampolines) as induced subgraphs (see Table 2.1).
Interval graphs
Interval graphs are a very important subclass of chordal graphs, or, more precisely, of strongly chordal graphs. They are defined as the intersection graphs of intervals on the real line, hence the name. More formally, a graph G = (V, E) is an interval graph if there exists a set of closed intervals on the real line {[i1, j1], [i2, j2], ..., [in, jn]}, where each [ik, jk] is associated to a vertex vk ∈ V, and there is an edge vsvt ∈ E if and only if [is, js] and [it, jt] intersect. Their characterization by forbidden induced subgraphs is also known. All the forbidden subgraphs are represented in Table 2.1, and they are Ck for k > 3, T2, X31, XF_2^k (k > 0) and XF_3^k (k ≥ 0). One more characterization that might be interesting is that interval graphs are exactly the chordal graphs whose complements are comparability graphs. Moreover, they are exactly the 1-trapezoid graphs.
Proper and Unit Interval graphs
Proper interval graphs are the interval graphs that admit an interval representation in which no interval is properly contained in another interval. Unit interval graphs are those that admit an interval representation in which all intervals have the same length. It turns out that these two classes actually coincide. This graph class can be characterized as the claw-free interval graphs.
Split graphs
Split graphs are yet another subclass of chordal graphs, incomparable to both interval and strongly chordal graphs. They are defined as the graphs whose vertex set can be partitioned into a clique and an independent set (not necessarily in a unique way). They can also be characterized by a set of forbidden induced subgraphs, but unlike for interval and chordal graphs, this set is finite. It consists, in fact, of only three graphs: C4, C5 and 2K2. Since C̄5 = C5, C̄4 = 2K2 and the complement of 2K2 is C4, split graphs are self-complementary. Another way to see this is that a graph G is a split graph if and only if both G and Ḡ are chordal.
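Split graphs can even be recognized from the degree sequence alone, by a classical theorem of Hammer and Simeone that is not stated in the text above: if d1 ≥ d2 ≥ ... ≥ dn are the vertex degrees and m is the largest index with dm ≥ m − 1, then G is a split graph if and only if d1 + ... + dm = m(m − 1) + dm+1 + ... + dn. A minimal sketch, assuming an adjacency-dictionary input:

```python
def is_split(adj):
    """Hammer-Simeone test on the degree sequence."""
    d = sorted((len(adj[v]) for v in adj), reverse=True)
    # largest 1-based index m with d_m >= m - 1 (these indices form a prefix)
    m = max((i for i, deg in enumerate(d, start=1) if deg >= i - 1), default=0)
    return sum(d[:m]) == m * (m - 1) + sum(d[m:])

P4 = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
two_K2 = {1: {2}, 2: {1}, 3: {4}, 4: {3}}
print(is_split(P4), is_split(two_K2))  # True False
```

The test runs in O(n log n) time after computing degrees, but it only answers yes or no; it does not return the clique and the independent set.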
Cographs
Cographs are one of the classes that have several different characterizations, often discovered independently and in different fields. We are interested mainly in three of them. They are the graphs that can be obtained by two basic operations through the following recursive definition:
• A vertex is a cograph.
• If G1 and G2 are cographs, then G3 = G1 ∪ G2 is also a cograph.
• If G1 and G2 are cographs, then G3 = G1 + G2 is also a cograph.
Here ∪ denotes the disjoint union, and “+” is the operation that consists of adding all edges between the vertices of G1 and G2. In the original definition, complementation was used in place of the + operation, that is:
• If G is a cograph, then its complement Ḡ is also a cograph.
The equivalence is straightforward, since G1 + G2 is the complement of Ḡ1 ∪ Ḡ2; hence a cograph built using the + operation can also be built using complementation and union. The second characterization we report is through forbidden induced subgraphs: cographs are exactly the P4-free graphs. The last characterization we are interested in is the following. A graph G = (V, E) is a cograph if and only if for each V′ ⊆ V such that |V′| > 1, either G[V′] or its complement is disconnected. Connected cographs are distance hereditary graphs.
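The last characterization suggests a simple recursive test, sketched below as an illustration (names and input format are our assumptions; this is not one of the linear-time cograph recognition algorithms). If G[V′] is disconnected we recurse into its components; otherwise the complement of G[V′] must be disconnected, and we recurse into the components of the complement. Recursing is sound because cographs are closed under disjoint union, the + operation and taking induced subgraphs.

```python
def components(vertices, neighbours):
    """Connected components of the graph on `vertices` with adjacency neighbours(x)."""
    remaining, comps = set(vertices), []
    while remaining:
        start = remaining.pop()
        comp, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for y in neighbours(x) & remaining:
                remaining.discard(y)
                comp.add(y)
                stack.append(y)
        comps.append(comp)
    return comps

def is_cograph(adj, vertices=None):
    vertices = set(adj) if vertices is None else vertices
    if len(vertices) <= 1:
        return True
    parts = components(vertices, lambda x: adj[x])
    if len(parts) == 1:
        # G[vertices] is connected, so its complement must be disconnected
        parts = components(vertices, lambda x: vertices - adj[x] - {x})
    if len(parts) == 1:
        return False
    return all(is_cograph(adj, part) for part in parts)

P4 = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
C4 = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(is_cograph(P4), is_cograph(C4))  # False True
```

P4 is rejected because both P4 and its complement (again a P4) are connected; C4 is accepted because its complement is 2K2.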
Trivially perfect graphs
Trivially perfect graphs are the cographs that are also chordal. In other words, they are the (P4, Ck)-free graphs for k > 3. Since every chordless cycle on at least 5 vertices contains a P4, this is exactly the class of (P4, C4)-free graphs.
Threshold graphs
The name of this class is due to the following definition. A graph is a threshold graph if there is a real number s (the threshold) and for every vertex v there is a real weight a_v such that vw is an edge if and only if a_v + a_w ≥ s. A more intuitive characterization is that threshold graphs are exactly the graphs that are both split graphs and cographs. Hence they are exactly the (P4, C4, 2K2)-free graphs. Moreover, since split graphs are also chordal, threshold graphs are actually a subclass of trivially perfect graphs. As they are split graphs, their vertex set can be partitioned into a clique and an independent set, but, in addition, the neighborhoods of the vertices of the independent set have the special property of being orderable by inclusion. That is, there is an ordering (v1, v2, ..., vl) of the vertices of the independent set such that N(vi) ⊆ N(vi+1) for each 1 ≤ i ≤ l − 1.
Cluster graphs
Cluster graphs are defined as the graphs that are a disjoint union of cliques, that is, every connected component of the graph is a clique. As they are exactly the P3-free graphs, they do not contain P4's, and therefore they form a subclass of cographs.
Bipartite graphs This class is somewhat similar to split graphs, as the graphs belonging to it are exactly the graphs whose vertex set can be partitioned into two independent sets. For this reason, a bipartite graph G = (V, E) is often written as G = (A ∪ B, E), where A and B are the two independent sets into which V can be partitioned, also called the bipartition. Bipartite graphs are also known as 2-colorable graphs, and as odd-cycle-free graphs, i.e., C2k+1-free graphs for k ≥ 1. Many classes are defined as the intersection of bipartite graphs and some other class. For example, bipartite distance hereditary graphs are the graphs that are both bipartite and distance hereditary, bipartite permutation graphs are the graphs that are both bipartite and permutation graphs, and so on. Other such classes are listed separately.
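Since bipartite graphs are exactly the 2-colorable graphs, a breadth-first search that alternates two colors either finds the bipartition or runs into an edge whose endpoints received the same color, which certifies an odd cycle. A minimal Python sketch, assuming the graph is given as an adjacency dictionary; `bipartition` is a hypothetical name.

```python
from collections import deque


def bipartition(adj):
    """Return (A, B) if the graph is bipartite, otherwise None.
    adj maps each vertex to an iterable of its neighbours."""
    color = {}
    for root in adj:
        if root in color:
            continue
        color[root] = 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in color:
                    color[w] = 1 - color[u]  # opposite side of u
                    queue.append(w)
                elif color[w] == color[u]:
                    return None              # same color on an edge: odd cycle
    A = {v for v, c in color.items() if c == 0}
    return A, set(adj) - A
```

On C4 with vertices 0, 1, 2, 3 the sketch returns ({0, 2}, {1, 3}), while on C5 it returns `None`.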
Chordal bipartite graphs Chordal bipartite graphs are the intersection class of weakly chordal and bipartite graphs. In other words, they are the bipartite graphs that do not contain induced cycles on more than 4 vertices. However, despite the name, these graphs are not chordal, because they can contain C4's.
The name comes from weakly chordal graphs and from the many similarities with chordal graphs, as well as analogous applications to matrices. For example, a graph is chordal bipartite if and only if every minimal separator induces a complete bipartite graph, and it is possible to define chordal bipartite graphs through special elimination orderings as well. This class also contains the bipartite permutation graphs as a subclass.
Chain graphs Chain graphs are to bipartite graphs what threshold graphs are to split graphs. A graph is a chain graph if and only if it is bipartite and the vertices of each independent set are orderable by neighborhood inclusion.
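Given a bipartition, the chain condition can be checked by sorting each side by degree: if the neighborhoods of a side form a chain under inclusion, sorting by size recovers the inclusion order, so it suffices to compare consecutive vertices. A sketch, assuming (A, B) is already known to be a valid bipartition (for instance from a 2-coloring) and with a hypothetical function name.

```python
def is_chain_graph(A, B, edges):
    """Assumes (A, B) is a bipartition and every edge joins A and B.
    Checks that each side's neighborhoods form a chain under inclusion."""
    nbhd = {v: set() for v in A | B}
    for u, v in edges:
        nbhd[u].add(v)
        nbhd[v].add(u)

    def nested(side):
        # If the neighborhoods form a chain, sorting by size gives the inclusion order.
        order = sorted(side, key=lambda v: len(nbhd[v]))
        return all(nbhd[x] <= nbhd[y] for x, y in zip(order, order[1:]))

    return nested(A) and nested(B)
```

For instance, the bipartite graph 2K2 fails the test, because its two vertices on the same side have disjoint, non-empty neighborhoods.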
Forests and trees A forest is a graph that does not contain any cycle, and trees are connected forests. Equivalently, a forest is a disjoint union of trees. Notice that there is a big difference between these two classes in terms of being hereditary or not. Forests are both monotone and hereditary, while trees are neither. Removing an edge or a vertex from a tree yields a forest, which is not necessarily a tree. The vertices of degree 1 in a forest are called leaves. As forests can be defined as the Ck-free graphs with k > 2, they are both chordal and bipartite graphs. Trees are also distance hereditary graphs.
Caterpillars A caterpillar is a tree such that, if we remove all its leaves, we are left with a path. A generalization of caterpillars is obtained by allowing paths, rather than just leaves, to be attached to the vertices of the main path. These paths are called hairs. So a caterpillar with hair length at most k consists of a main path with paths of length at most k attached to its vertices. Caterpillars can also be seen as the trees that are interval graphs. Therefore, interval bipartite graphs are actually forests of caterpillars.
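The definition translates into a direct test: remove the leaves and check that what remains is a path. Since removing the leaves of a tree leaves a tree, it is enough to check that every remaining vertex has at most two remaining neighbors. The sketch assumes the input is already known to be a tree given as an adjacency dictionary; checking that is a separate step, and `is_caterpillar` is a hypothetical name.

```python
def is_caterpillar(tree_adj):
    """Assumes tree_adj is the adjacency dictionary of a tree.
    The non-leaf vertices form a tree, which is a path exactly
    when its maximum degree is at most 2."""
    spine = {v for v, nbrs in tree_adj.items() if len(nbrs) > 1}
    return all(sum(1 for w in tree_adj[v] if w in spine) <= 2 for v in spine)
```

A spider with three legs of length 2 is rejected, since its center keeps three non-leaf neighbors after the leaves are removed.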
2.1.2 Other graph classes Planar graphs Loosely speaking, planar graphs are the graphs that can be drawn in the plane so that no two edges cross. However, the most famous characterization of this graph class is through forbidden minors: planar graphs are exactly the graphs that contain neither K5 nor K3,3 as a minor.
AT-free graphs
We say that three vertices {x1, x2, x3} form an asteroidal triple, or AT, in a graph G if, for every choice of distinct indices i, j, k ∈ {1, 2, 3}, there exists a path Pij from xi to xj such that N[xk] ∩ Pij = ∅. That is, we can go from any of the three vertices to another of them without passing through the closed neighborhood of the remaining one. AT-free graphs are the graphs that do not contain an asteroidal triple. In Table 2.1 there are some examples of asteroidal triples. Let us note that a C5 does not contain ATs, so AT-free graphs are not a subclass of perfect graphs; however, they contain many subclasses of perfect graphs, for example co-comparability graphs, permutation graphs, interval graphs, cographs and co-bipartite graphs. About this last class, notice that the complement of a bipartite graph consists of two cliques with some edges in between; hence it cannot contain an independent set of size more than 2, and consequently it cannot contain an asteroidal triple.
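The definition of an asteroidal triple can be checked directly: for each vertex of the triple, search for a path between the other two that avoids its closed neighborhood. The brute-force sketch below (hypothetical names, adjacency-dictionary input, all triples examined) confirms the remark above that C5 has no asteroidal triple, while C6 has one.

```python
from itertools import combinations


def reachable_avoiding(adj, s, t, forbidden):
    """Is there an s-t path using no vertex of `forbidden` (endpoints included)?"""
    if s in forbidden or t in forbidden:
        return False
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        if u == t:
            return True
        for w in adj[u]:
            if w not in seen and w not in forbidden:
                seen.add(w)
                stack.append(w)
    return False


def is_asteroidal_triple(adj, triple):
    for k in triple:
        i, j = (x for x in triple if x != k)
        closed_nbhd = set(adj[k]) | {k}  # N[x_k]
        if not reachable_avoiding(adj, i, j, closed_nbhd):
            return False
    return True


def has_asteroidal_triple(adj):
    return any(is_asteroidal_triple(adj, t) for t in combinations(adj, 3))


def cycle(n):
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
```

In C6 the triple (0, 2, 4) is asteroidal, whereas in C5 every triple contains two adjacent vertices, so one of the required paths would have to start inside a closed neighborhood.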
Circle graphs
Circle graphs are a superclass of permutation graphs and distance hereditary graphs, but they are not perfect. In fact, they are the intersection graphs of chords of a circle, and therefore every chordless cycle, including the odd ones, is a circle graph. A chord of a circle is a straight line segment that joins two points of the circle. Circle graphs are also exactly the graphs that are k-polygon graphs for some k, where a k-polygon graph is the intersection graph of chords of a polygon with k sides.
Circular arc graphs
This graph class is defined similarly to interval graphs, but it is more general. Circular arc graphs are the intersection graphs of a family of closed arcs of a circle, or, equivalently, of a family of connected subgraphs of a cycle. Clearly interval graphs are circular arc graphs, but not vice versa. Actually, circular arc graphs are not even perfect, because every chordless cycle is a circular arc graph, including the odd-length ones. Another similarity with interval graphs is that we can define proper and unit circular arc graphs analogously to proper and unit interval graphs. However, in this case the two classes are not equivalent: unit circular arc graphs are a subclass of the proper circular arc graphs. Furthermore, unit circular arc graphs are a subclass of circle graphs, and contain the proper interval graphs.
k-Connected graphs A graph G is k-connected if the minimum number of vertices that must be removed to make G disconnected is at least k. In this sense, every connected graph is at least 1-connected. A similar class of graphs can be defined when removing edges rather than vertices to disconnect the graph. If the minimum number of edges that must be removed to make G disconnected is at least k, then we say that the graph is k-edge connected. To avoid confusion, we will sometimes refer to k-connected graphs as k-vertex connected graphs. Let us note that these classes are neither hereditary nor monotone.
Figure 2.2: This diagram represents the inclusion relations among the graph classes described in Chapter 2.1. An arrow from class A to class B means that B ⊂ A, and if there are arrows from two classes A and B to a class C, it means that C ⊆ A ∩ B. The thicker ovals represent the classes that are not perfect.
Chapter 3
Practical applications of graph modification problems
Many fundamental problems in graph theory can be expressed as graph modification problems. Already in the classic book of Garey and Johnson [102] (Sec. A1.2), 18 graph modification problems were explicitly mentioned, and many others can easily be reformulated as such. For instance, the Connectivity problem is the problem of finding the minimum number of vertices or edges whose removal disconnects the graph. The Maximum Induced Matching, Vertex Cover, Maximum Clique and Feedback Vertex Set problems can be seen as the problems of removing the smallest set of vertices from the graph to obtain, respectively, a collection of disjoint edges, an independent set, a complete graph and a forest. Over time, the number of interesting graph modification problems has increased dramatically, and in this chapter we give an overview of some of them. First we consider graph modification problems where the goal is the most natural one, namely that of minimizing the number of modifications. Then we also survey graph modification problems where other parameters are to be optimized, or where some restrictions are applied.
Notice that we examine in detail only the classes for which there are practical applications that can be reformulated directly as a modification problem into one of these classes. This does not mean that the other graph classes mentioned in the previous chapter are uninteresting from this point of view. For any graph class that has some useful application, it is natural to ask whether general graphs can be turned into it by some modifications. In fact, as we will see later, investigating graph modification problems has often led to the discovery of new and very useful structural properties and characterizations of the graph class in question. These new tools were then used to produce new and more efficient algorithms for specific practical problems related to that graph class.
3.1 Minimizing the number of modifications
In this section we will focus on problems that can be generally classified as Minimum Graph Modification problems, i.e., modification problems where the goal is to minimize the number of modifications. Before describing in detail the practical motivations behind their study, we would like to summarize some general complexity results. In the late 1970's a lot of effort was put into establishing the complexity of large families of problems, rather than considering only one specific problem at a time. Graph modification problems are one of the families that were investigated. As most interesting graph properties are hereditary, the efforts were particularly concentrated on proving results for graph modification problems into hereditary graph classes. Vertex deletion problems turned out to be the best candidates for this kind of investigation, and in a series of subsequent papers the following theorem was derived by Yannakakis and Lewis.
Theorem 3.1.1 ([247, 174, 173]) For every non-trivial interesting hereditary property Π, finding the largest induced subgraph with property Π is NP-hard.
Thanks to this very general theorem, the complexity of the Minimum Π-Vertex Deletion problem was settled for all hereditary properties. This theorem was later extended to take into consideration one more requirement on the induced subgraph, connectivity.
Theorem 3.1.2 ([248]) For every non-trivial interesting property Π that holds on every connected induced subgraph, finding the largest connected induced subgraph with property Π is NP-hard.
This implies that finding a maximum connected induced subgraph with a property Π is NP-hard not only for all hereditary properties Π, but also for some properties Π that are not hereditary. For example, being a tree and being a star are not hereditary properties, but every connected induced subgraph of a tree is a tree and every connected induced subgraph of a star is a star. Hence removing the minimum number of vertices to obtain a tree or a star is NP-hard as well by Theorem 3.1.2, even though this did not follow directly from Theorem 3.1.1. For the Minimum Π-Deletion problem there are no such general results, as it seems to be easier than its vertex deletion counterpart for some non-trivial and interesting hereditary properties Π. Consider, for instance, the property Π of being a forest. In this case the Minimum Π-Deletion problem is equivalent to finding a spanning tree of each connected component of the input graph, i.e., a spanning forest. This problem is well known to be polynomial [64]. Consistently with Theorem 3.1.1, instead, it is NP-hard to obtain an induced forest by removing the minimum number of vertices, i.e., to solve the Minimum Feedback Vertex Set problem.
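For the forest property, the polynomial solution can be sketched with a union-find structure: scan the edges, keep an edge if it joins two different trees of the current forest, and delete it otherwise. The number of deleted edges is m − n + c, where c is the number of connected components. A sketch under the assumption that vertices are numbered 0, ..., n − 1; the function name is hypothetical.

```python
def forest_edge_deletion(n, edges):
    """Return a minimum set of edges whose deletion leaves a spanning forest."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    deleted = []
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            deleted.append((u, v))  # u and v already connected: this edge closes a cycle
        else:
            parent[ru] = rv         # merge the two trees
    return deleted


k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(len(forest_edge_deletion(4, k4)))  # 3 = 6 - 4 + 1
```

The kept edges form a spanning forest with n − c edges, which is the maximum number of edges any forest on these vertices and components can keep.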
There are, however, some NP-hardness results for the Minimum Π-Deletion problem that hold for some quite large families of hereditary properties. The first result of this type is due to Asano and Hirata in 1982 [6], and it completes some previous partial results [250, 196]. They proved NP-hardness for a class of properties that can be characterized by certain forbidden minors. New results were added by Colbourn and El-Mallah in [63], the most relevant of which, for us, is the following.
Theorem 3.1.3 ([63]) Making a graph Pk-free by deleting the minimum number of edges is NP-hard for every k ≥ 3.
Theorem 3.1.3 implies that the Minimum Cluster and Cograph Deletion problems are NP-complete. Recently, another very general result for a large class of monotone properties was shown in [3], where the authors prove NP-hardness of the Minimum Π-Deletion problem for all monotone properties Π that hold for all bipartite graphs. An example is the property of being triangle-free.
Theorem 3.1.4 ([3]) Given a property Π such that Π holds for every bipartite graph, the Minimum Π-Deletion problem is NP-hard.
There is no general result known for editing problems, while the previous results for edge deletion carry over, to some extent, to completion problems as well. Notice that if deleting the minimum number of edges to obtain a graph with a certain property is NP-hard, then adding the minimum number of edges to obtain the complementary property is also NP-hard. By Theorem 2.1.1, a hereditary property Π can be defined by a set of forbidden induced subgraphs F = {F1, F2, ..., Fl}. So we define the complement Π̄ of a hereditary property Π as the property characterized by the set of forbidden subgraphs F̄ = {F̄1, F̄2, ..., F̄l}, where F̄i denotes the complement of Fi. A property is invariant under complement if Π̄ = Π, i.e., if the corresponding graph class is closed under complementation. Since the complement of a P4 is again a P4, cographs are such a class, and hence, by Theorem 3.1.3, it also follows that the Minimum Cograph Completion problem is NP-hard.
The general motivation that links together most of the graph modification problems that we are going to present is that many real world structures can be modeled as graphs, and there are some special properties that such structures should have in order to guarantee reliability, functionality or correctness. However, as real instances are obtained through empirical methods, they often do not satisfy all the requirements that we expect them to have, due to errors in data collection, loss of data, wrong measurements and so on. So we might want to modify them as little as possible in order to fit the theoretical model as closely as possible. Using the graph representation, this translates into minimum graph modification problems.
3.1.1 Chordal graphs
One of the oldest and most well-known graph modification problems, explicitly stated as such, is probably that of making a graph chordal by using the minimum possible number of fill edges, namely the Minimum Chordal Completion problem. This problem is also known in the literature under other names, like Minimum Triangulation and Minimum Fill-in. Triangulating a graph, in this case, means making an arbitrary graph chordal, and refers to the fact that the only chordless cycles in a chordal graph are triangles. It has nothing to do with the probably better known problem of triangulating planar graphs, i.e., adding edges to a planar graph so that each face is a triangle. In Figure 3.1 we show the difference between the two definitions. The name Minimum Fill-in actually refers to the original problem that started the study of chordal completion, i.e., the Gaussian elimination method for sparse matrices. Gaussian elimination is a method used to solve systems of linear equations by repeatedly manipulating the matrix representing the system. However, this process can turn zero entries of the matrix into non-zero entries, called fills, increasing the space required to store the matrix. It is therefore desirable to minimize the number of fills produced during the process, especially when the original matrix is sparse, i.e., contains many zero entries. It was shown in [211] that if we consider the matrix representing the system as the adjacency matrix of a graph G = (V, E), then the Gaussian elimination process is equivalent to the Elimination Game on G.
Given an ordering α = {v1, v2, ..., vn} of the vertices in V, the Elimination Game consists of removing the vertices from the graph in the given order, where before each removal the neighborhood of the removed vertex in the current graph is made into a clique by adding fill edges. It was proved in [97] that the graph resulting from the addition of these edges to G is always chordal. Furthermore, the same paper also proves that the set of graphs that can be produced by the Elimination Game is exactly the class of chordal graphs. Hence, minimizing the fill produced during the Gaussian elimination of a matrix is equivalent to making the corresponding graph chordal by adding the minimum number of fill edges. Later, other applications of chordal completions were found in relational databases [239], computer vision [61] and expert systems [170]. The Minimum Fill-in problem was proved NP-complete by Yannakakis [249], whose proof shows that the problem remains NP-complete even when the input is the complement of a bipartite graph. Actually, this result implies that the Minimum Chain Completion of bipartite graphs and the Minimum Interval, Proper Interval and Trivially Perfect Completion problems are NP-complete as well, because they are all equivalent to Minimum Fill-in when the input is a co-bipartite graph (see Theorem 4.2.3). As we will see in the next section, the NP-completeness of some of these problems has also been proved independently.
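The Elimination Game can be transcribed directly. The sketch below assumes the graph is an adjacency dictionary of sets and returns the fill edges produced by a given ordering; `elimination_game` is a hypothetical name. An ordering produces no fill exactly when it is a perfect elimination ordering.

```python
def elimination_game(adj, order):
    """Play the Elimination Game on `adj` (dict of neighbour sets) with the
    vertex ordering `order`; return the set of fill edges that are added."""
    g = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    fill = set()
    for v in order:
        nbrs = list(g[v])
        # Make the current neighbourhood of v a clique.
        for i, a in enumerate(nbrs):
            for b in nbrs[i + 1:]:
                if b not in g[a]:
                    g[a].add(b)
                    g[b].add(a)
                    fill.add(frozenset((a, b)))
        # Remove v from the current graph.
        for w in nbrs:
            g[w].discard(v)
        del g[v]
    return fill


c4 = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
print(elimination_game(c4, ["a", "b", "c", "d"]))  # {frozenset({'b', 'd'})}
```

Eliminating a first forces the fill edge bd, after which the remaining graph is chordal and no further fill is needed.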
Practical applications have also been proposed for the Minimum Chordal Deletion problem, for example as a heuristic for the Maximum Clique problem, since a maximum clique in a chordal graph can be found in linear time. Unfortunately, this problem is also NP-complete [201], as is Minimum Chordal Editing [229].
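[Figure 3.1: the input graph on vertices a, b, c, d, e (top), its minimum triangulation (left) and its planar triangulation (right).]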
Figure 3.1: Here we can see the difference between a minimum triangulation (on the left) and a planar triangulation (on the right) of the same input graph (on the top). In the input graph there is only one C4, induced by the vertices {a, b, c, d}, so it is enough to add the fill edge ac (or bd) to make it chordal. This does not make all faces triangles, since now the external face is defined by {a, b, e, c}. On the other hand, the external face of the input graph consists of 5 vertices, so we need to add ea and ed to make all faces into triangles. Notice that this does not destroy the C4, so the resulting graph is not chordal.
3.1.2 Interval graphs The most well-known motivation for Minimum Interval Modification problems comes from molecular biology, and it is one of the main reasons why interval graphs started being studied in the first place. In a paper from 1959 [14], Benzer first gave strong evidence that the collection of DNA composing a bacterial gene was linear, just like the structure of the genes themselves in the chromosome. This linear structure could be represented as overlapping intervals on the real line, and therefore as an interval graph. However, mapping of the genetic structure is done by indirect observation. That is, such a linear structure is not observed directly, but is inferred from how various fragments of the original genome can be recombined. In order to study various properties of a certain DNA sequence, the original piece of DNA is fragmented into smaller pieces. These fragments are then cloned many times using various biological methods, and are called clones. In this process the position of each clone on the original stretch of DNA is lost, but since usually many copies of the same piece of DNA are fragmented in different ways, some clones will overlap. The problem of reconstructing the original arrangement of the clones in the original sequence is called physical mapping of DNA. Deciding whether two clones overlap or not is the critical step where errors may arise, since it is a process based on partial information. We know that once we decide on an arrangement of these clones consistent with the overlaps, the resulting model should represent an interval graph. However, there might be some false positives or false negatives, due to erroneous interpretation of some data. Correcting the model to get rid of inconsistencies is then equivalent to removing or adding edges to the graph representing the dataset, so that it becomes interval. Of course we want to change it as little as possible.
When all the clones have the same size, i.e., the DNA sequence has been fragmented into equal parts, the resulting graph should be not only interval, but proper interval. All these problems were shown to be NP-complete. It was noted in [106] that the NP-completeness of Minimum Interval and Proper Interval Completion actually follows from the NP-completeness proof of the Chordal Completion problem [249], even though the result was also proved independently in [102, 155]. Other problems shown to be NP-complete are the Minimum Interval and Proper Interval Deletion problems [106], and the Minimum Interval and Proper Interval Editing problems [41]. The vertex deletion version of the problem is also interesting for physical mapping of DNA. In this case the assumption is that the errors may be caused by some clones that were corrupted during the cloning process, rather than by the overlap detection. The NP-hardness of this problem follows directly from Theorem 3.1.1. Later, another model for a different, more restricted, kind of physical mapping was proposed [114]. In this case we map two sets of clones that have fixed positions on the original DNA sequence. However, we do not know the relative order of the clones in each set, and we can deduce it only by looking at the overlaps between clones in different sets. This means that the corresponding overlap graph has to be both interval and bipartite. Additionally, the authors pose as an open problem that of removing the minimum number of edges to make a graph consistent with this model, that is, the Minimum Interval Bipartite Deletion problem. This problem was also shown to be NP-hard [244], along with its editing version [62]. In this case it does not make sense to ask for completion. As reported in [41], there are cases in animals where bacterial DNA and cytoplasmic DNA have a closed circular form.
Furthermore, giant DNA molecules in higher organisms form loop structures held together by protein fasteners, in which each loop is largely analogous to closed circular DNA. This can motivate graph modification problems into circular-arc graphs and related subclasses. However, all the modification problems related to these classes are also NP-complete, as shown in [41, 229]. Other important problems equivalent to the Minimum Interval Completion problem include applications from sparse matrices and search games. During Gaussian elimination of sparse matrices, it is standard procedure to permute the rows and columns of the matrix so that non-zero elements are gathered close to the main diagonal. This helps reduce the number of non-zero entries created during the elimination process [105]. The profile of a matrix is the smallest number of entries that can be enveloped within off-diagonal non-zero elements of the matrix. Translated to graphs, the profile of a graph G is exactly the minimum number of edges in an interval supergraph of G [237]. In [90], Golovach and Fomin show that the minimum number of fill edges needed to make a graph interval is equal to the search cost of the graph. The search cost is defined as the minimum total number of steps that a group of searchers needs to perform in order to clean a contaminated graph. A searcher can be placed on or removed from a vertex. Initially all edges of the graph are contaminated, and an edge is "cleaned" when both its endpoints are occupied by searchers. A clean edge e can be recontaminated if there is a path without any searcher leading from e to a contaminated edge.
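The profile can be made concrete under the usual row-envelope definition, which we assume here: for an ordering v1, ..., vn, row i contributes i − f_i, where f_i is the smallest index of a neighbor of v_i that precedes it (f_i = i if there is none), and the profile is the minimum total over all orderings. The brute-force sketch below, with hypothetical names, is only meant for very small graphs.

```python
from itertools import permutations


def profile_of_ordering(adj, order):
    """Sum over positions i of i - f_i, where f_i is the earliest position of a
    neighbour of order[i] that precedes it (f_i = i when there is none)."""
    pos = {v: i for i, v in enumerate(order)}
    total = 0
    for v in order:
        earlier = [pos[w] for w in adj[v] if pos[w] < pos[v]]
        total += pos[v] - min(earlier, default=pos[v])
    return total


def profile(adj):
    """Brute force over all orderings: only feasible for very small graphs."""
    return min(profile_of_ordering(adj, p) for p in permutations(adj))
```

A path on three vertices, which is already interval, has profile 2, equal to its number of edges, while C4 has profile 5: its 4 edges plus the single chord needed to make it interval.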
3.1.3 Cluster graphs Another type of graph that is widely used to represent special data sets is the cluster graph. Clustering information according to similarity is a routine step in many applications. An easy example comes from an area of artificial intelligence called text categorization. The aim of this field is to design systems that can automatically sort texts into categories, without knowing the definition of what the particular category actually is. To do this, the texts are compared with each other, and the probability that two texts belong to the same category is computed. Assume that we build a graph where the vertices are the texts, and there is an edge between two vertices if the corresponding texts have been found similar enough to belong to the same category. If we assume that each text belongs to exactly one category, then in the graph of an ideal solution all texts in the same category form a clique, and there are no edges between different categories. However, this is rarely the case, as the algorithm that compares the texts and decides which edges to put in the graph is not perfect (or there would not be any problem). What we want to do at this point is to remove and add the minimum number of edges, so that our graph becomes a collection of disjoint cliques, i.e., a cluster graph. In graph theory this problem takes the name of Cluster Editing, and it has been shown NP-hard in [228]. In our terminology, this is the Minimum Cluster Editing problem. We can, of course, also be concerned only with addition or deletion of edges, if we consider models that admit only false negatives or false positives, respectively. In contrast to almost all other modification problems, even though the edge deletion version is NP-hard by Theorem 3.1.3, it is easy to see that adding the minimum number of edges to a graph to make it a cluster graph is polynomial: the best possible solution is to make each connected component into a clique.
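The polynomial cluster completion described above amounts to a component search followed by filling in every missing pair inside each component. A minimal sketch, assuming an adjacency dictionary of neighbor sets; `cluster_completion` is a hypothetical name.

```python
def cluster_completion(adj):
    """Fill edges turning every connected component into a clique.
    adj maps each vertex to the set of its neighbours."""
    seen, fill = set(), set()
    for root in adj:
        if root in seen:
            continue
        seen.add(root)
        comp, stack = [root], [root]
        while stack:                      # depth-first search for the component
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    comp.append(w)
                    stack.append(w)
        for i, a in enumerate(comp):      # add every missing pair inside it
            for b in comp[i + 1:]:
                if b not in adj[a]:
                    fill.add(frozenset((a, b)))
    return fill
```

Any cluster supergraph must contain a clique on each connected component, so these fill edges are unavoidable, which is why the completion is optimal.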
The Cluster Editing problem arises naturally in so many practical situations that it has been rediscovered and proved NP-hard many times. For example, its NP-hardness is already implicit in the results of [168], in the context of Hierarchical Tree Clustering.
In biology this problem arises when the expression levels of many genes are analyzed simultaneously and we want to group together the genes that manifest similar expression patterns. Clustering this data correctly is a key step in the analysis of gene expression data [13].
In phylogenetics, the Cluster Editing problem is a special case of the Closest Phylogenetic k-th Root problem, namely when k = 2. The problem was proved NP-hard in this context by Chen et al. [56]. Given a graph G = (V, E), the Phylogenetic k-th Root problem asks whether there is a tree T = (N, ET) with no internal vertices of degree 2, such that the leaves of T are in one-to-one correspondence with V, and two vertices of G are adjacent if and only if the corresponding leaves in T are at distance at most k in T. This problem is very similar to that of finding the k-th leaf root of a graph; the only difference is that a leaf root can have internal nodes of degree 2. The Closest Phylogenetic k-th Root problem is then the problem of editing a graph in a minimum way so that it has a Phylogenetic k-th root. It is not too difficult to see that a graph has a Phylogenetic 2nd root if and only if it is a cluster graph.
The last work where this problem was once more proved NP-hard is the famous paper by Blum et al. [11] on the Correlation Clustering problem. Here the work is motivated by agnostic learning and machine learning. The problem as defined in [11] is exactly the Cluster Editing problem in disguise. Rather than considering edge modifications on a graph, they reformulate the problem as an optimization problem. The input is a complete graph with weights 1 and 0 on the edges, and the goal is to find a partition (clustering) of the vertices such that the number of 0-edges inside the clusters plus the number of 1-edges between clusters is minimized. Clearly, if we replace the 0-edges with non-edges and remove the weights, an optimum clustering also defines a minimum editing. The problem as defined by Blum et al. has also been generalized to graphs with missing information, i.e., when the input graph is not complete. In [20] (Paper V) we call this kind of graphs fuzzy graphs, and we briefly discuss them in Chapter 5.2.
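The correspondence between clusterings and editings can be checked on small examples by counting the disagreements of a given partition. In the sketch below (hypothetical names), the weighted complete graph is represented implicitly: listed edges play the role of 1-edges and all other pairs the role of 0-edges.

```python
from itertools import combinations


def disagreements(vertices, edges, clusters):
    """Number of non-edges inside clusters plus edges between clusters,
    i.e. the number of edits turning the graph into the cluster graph
    defined by `clusters` (a partition of `vertices`)."""
    label = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    edge_set = {frozenset(e) for e in edges}
    cost = 0
    for u, v in combinations(vertices, 2):
        same_cluster = label[u] == label[v]
        is_edge = frozenset((u, v)) in edge_set
        if same_cluster != is_edge:  # a 0-edge inside or a 1-edge between clusters
            cost += 1
    return cost
```

For the path 0-1-2, both the clustering {0, 1}, {2} (delete edge 12) and the single cluster {0, 1, 2} (add edge 02) cost 1, while splitting into singletons costs 2.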
3.1.4 Planar graphs When we want to make a graph planar, the only modification that makes sense is deletion, either of edges or of vertices. Both these problems are NP-hard when we want to minimize the number of deletions. For Minimum Planar Vertex Deletion, the result is a consequence of Theorem 3.1.1, while Minimum Planar Deletion has been proved NP-hard independently in [247, 196, 104]. Planarity is a very desirable property in many applications that deal with visualization of graphs, or where we need to minimize the crossings of edges in a network-like structure. When the drawing of a graph is as planar as possible, it is much easier for a user to understand its topology. This translates into better readability and understanding when the graph represents some kind of diagram or real world structure. Hence, making a graph planar by removing edges can be used as a subroutine in graph drawing applications to get a layout of the graph that minimizes crossings or emphasizes an underlying structure [236]. In general, planarity of graphs has applications in areas such as graph drawing, circuit design and facility layout [148, 126, 12], and so does the problem of making a graph planar by modifying it as little as possible. However, the problem of removing vertices was not considered much, as it was seen as a much more drastic operation than just removing edges, causing loss of information useful for the mentioned applications. Only in recent years have Edwards and Farr [80, 81] shown that Minimum Planar Vertex Deletion is useful in determining the fragmentability of a graph class, awakening interest in this problem as well. Fragmentability is defined in terms of the minimum number of vertices that need to be removed from a graph so that all resulting connected components have bounded size. Formally it is defined as follows. Let G = (V, E) be our input graph, C a natural number and ε ∈ [0, 1].
Then G is said to be (C, ε)-fragmentable if there is a set V′ ⊂ V such that |V′| ≤ ε · |V|, and all connected components of G[V \ V′] have at most C vertices. This is itself a vertex deletion problem with additional constraints. As a further note on the Minimum Planar Deletion problem, the minimum number of edges to be removed in order to make a graph planar is often referred to in the literature as the skewness of the graph.
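Verifying that a given set V′ witnesses (C, ε)-fragmentability is straightforward; the hard part is finding such a set. A minimal sketch of the verification, assuming an adjacency-dictionary input and a hypothetical function name.

```python
def is_fragmenting_set(adj, removed, C, eps):
    """Check |V'| <= eps * |V| and that every component of G[V \\ V'] has at most C vertices."""
    removed = set(removed)
    if len(removed) > eps * len(adj):
        return False
    remaining = set(adj) - removed
    seen = set()
    for root in remaining:
        if root in seen:
            continue
        seen.add(root)
        size, stack = 0, [root]
        while stack:                      # explore one component of G[V \ V']
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w in remaining and w not in seen:
                    seen.add(w)
                    stack.append(w)
        if size > C:
            return False
    return True


path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
```

On the path with vertices 0, ..., 5, removing vertex 2 leaves components of sizes 2 and 3, so {2} certifies (3, 0.2)-fragmentability but not (2, 0.2)-fragmentability.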
3.1.5 Connected graphs When working with networks, one is often concerned with connectivity properties, since they are closely related to the reliability of the network. For example, we might want to upgrade the network so that the failure of any k nodes or links will not compromise the connectivity of the whole network. This is equivalent to adding links so that the graph representing the network topology is (k + 1)-vertex or (k + 1)-edge connected. Of course, it is natural to wish to minimize the total cost of the links we add or, if they all cost the same, their number. When the property we wish to obtain has to do with connectivity, the term completion is often replaced with augmentation: minimum augmentation refers to the minimization of the total cost of the added edges, while when we minimize the number of edges the problem is called smallest augmentation.
Recall that connectivity properties are not hereditary. However, as in the case of most interesting graph properties, many natural augmentation problems have also been proved to be NP-hard. The most general results are due to Watanabe and Nakamura [197]. They show that the minimum augmentation problem for k-edge or k-vertex connectivity is NP-hard for any k ≥ 2. In particular, when k = 3, the problems remain NP-hard even if the input graph is already 2-connected [195]. When k = 1, the problem is polynomial, as it is equivalent to finding a minimum spanning tree that connects the graph [240]. In contrast, there are various polynomial algorithms to solve smallest augmentation problems. For the smallest k-edge connectivity augmentation problem, there were initially polynomial and linear algorithms for k = 2 [82] and k = 3 [195, 207], but afterwards more general algorithms for arbitrary k were given, first on restricted inputs, like trees [241], and then on general graphs [42, 197]. More efficient algorithms also exist for increasing the connectivity of the input graph by a specific value [198, 98], or for reaching arbitrary edge-connectivity requirements [95]. In this last case the vertices of the graph have weights, and the weight of an edge is defined as the sum of the weights of its endpoints. For vertex connectivity requirements, the situation is not as nice as for edge connectivity. There are polynomial algorithms for the problem of adding the smallest set of edges to a general graph to obtain 2-connectivity [156] and 3-connectivity [141], or 4-connectivity of 3-connected graphs [140]. However, the complexity of the problem is open for k > 4. Furthermore, it remains open even in the special case in which the input graph is (k−1)-vertex connected. If the output graph is required to be planar (and therefore the input as well), even the Smallest 2-Vertex Connectivity Augmentation problem is NP-hard [151].
Some authors also consider deletion problems for connectivity properties. In this case the input graph must already have the desired property, because if a graph is not k-edge or k-vertex connected, then none of its spanning subgraphs can be. Thus the natural goal is to remove as many edges as possible from the input graph without destroying the given property. This clearly differs from the edge deletion problems we defined previously, where we wanted to remove the minimum possible number of edges in order to make the graph have the property. However, the complexity is similar: for the property of being k-edge or k-vertex connected, the problem of finding a minimum spanning subgraph with the property is NP-hard for k ≥ 2 ([102], problem GT31). If k = 1, the problem is again that of finding a spanning tree of the graph. In Chapter 4.2.3, we will discuss these problems again.
3.1.6 Graphs characterized by cycles
Various graph modification problems concern destroying or creating cycles. We have already mentioned one of these problems, namely Minimum Fill-in. Recall that completing a graph into a chordal graph means destroying all induced cycles of length at least 4 by adding edges to the graph. Other important problems are defined when the cycles are not required to be induced, or when there are restrictions on their lengths. For example, the Feedback Vertex/Edge Set problem asks for the smallest set of vertices/edges whose removal from the graph eliminates all cycles, triangles included. In other words, the output graph must be a forest. We have already mentioned that while the vertex version is NP-hard, the edge version is polynomial. The same problems are also formulated on directed graphs, considering directed cycles rather than cycles. In contrast to the undirected case, both the Directed Feedback Vertex Set and the Feedback Arc Set problems are NP-hard [102, 154]. The Feedback Vertex Set problem was one of the 21 NP-complete problems in Karp's list [154], and has important applications to deadlock problems in both databases [100] and operating systems [230]. For instance, in the Deadlock Recovery problem, a deadlock is represented by a cycle in a system resource-allocation graph. Therefore, in order to recover the system from deadlocks, we need to abort a set of processes such that removing the corresponding vertices from the resource-allocation graph breaks all cycles. A similar problem, where we want to destroy all odd cycles of the graph rather than all cycles by removing the smallest set of edges or vertices, is called the Odd Cycle Transversal problem. These problems are equivalent to finding a maximum bipartite subgraph and a maximum induced bipartite subgraph, respectively. In our terminology, they are the Minimum Bipartite Deletion and Minimum Bipartite Vertex Deletion problems.
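The polynomial edge version can be illustrated in a few lines: the complement of any spanning forest is a minimum feedback edge set, so it has exactly m − n + c edges, where c is the number of connected components. The Python sketch below (a hypothetical helper of our own, not taken from the cited works) scans the edges with a union-find structure and collects every edge that closes a cycle with the edges kept so far.

```python
def min_feedback_edge_set(n, edges):
    """Return a minimum set of edges whose removal leaves a forest.

    The edges that are kept form a spanning forest, so the returned set has
    exactly m - n + c edges (c = number of connected components).
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    feedback = []
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            feedback.append((u, v))  # u and v already connected: edge closes a cycle
        else:
            parent[ru] = rv          # keep the edge in the spanning forest
    return feedback
```

No comparably simple approach exists for the vertex version, which, as noted above, is NP-hard.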
Unlike the previous problem, here both the vertex and the edge versions are NP-hard on undirected graphs. In particular, the Minimum Bipartite Deletion problem is just a different formulation of the problem of partitioning the vertices into two sets so that the number of edges between these sets is maximized, i.e., the (unweighted) Max-Cut problem [154]. The NP-hardness of the Minimum Bipartite Vertex Deletion problem also follows from a reduction from Max-Cut, or directly from Theorem 3.1.1. Apart from the equivalence to Max-Cut, these problems have applications in computational biology [7], register allocation [208], and VLSI design [58]. Another very important class of graphs that can be characterized by the absence of certain induced cycles is that of perfect graphs, as we have seen in Chapter 2.1.1. Since these graphs have so many interesting properties, it would be natural to wish to modify a graph in a minimum way to make it perfect, even though all graph modification problems into perfect graphs are NP-hard [201, 247]. However, to our knowledge, no attempt to design any algorithm has been made, even for approximation or restricted inputs. The only exception is a paper by Natanzon and Shamir that considers perfect deletion and completion of random graphs [199]. This might be due to the fact that until recently it was not even certain whether this class could be recognized in polynomial time. In fact, even though there was a polynomial algorithm to recognize Berge graphs [59], the equivalence between these graphs and perfect graphs was proved only recently, as we mentioned in Chapter 2.1.1. A more studied problem, and the last one we will present in this section, is the Minimum Hamiltonian Completion problem (HCP). In contrast with the previous problems, here we want to create cycles rather than destroy them. In particular, we want to create a cycle (not necessarily induced) that goes through all vertices of the graph.
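The equivalence with Max-Cut can be checked by brute force on tiny graphs: deleting the edges that do not cross a cut leaves a bipartite graph with that cut as bipartition, so the optimum of Minimum Bipartite Deletion equals |E| minus the size of a maximum cut. The following Python sketch (exponential in n, for illustration only; the function names are ours) computes both quantities.

```python
from itertools import product

def max_cut(n, edges):
    """Brute force: largest number of edges crossing a 2-partition of 0..n-1."""
    best = 0
    for side in product((0, 1), repeat=n):
        best = max(best, sum(side[u] != side[v] for u, v in edges))
    return best

def min_bipartite_deletion(n, edges):
    """Fewest edges to delete so that the remaining graph is bipartite.

    The non-crossing edges of a maximum cut can all be deleted, and the
    rest is bipartite; hence the optimum is |E| - maxcut.
    """
    return len(edges) - max_cut(n, edges)
```

For example, one edge must be deleted from the 5-cycle and two from K4.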
Such a cycle is called Hamiltonian, and so is a graph that contains one. The decision version of the problem can be formulated as follows: Given a graph G = (V, E) and an integer k ≥ 0, is there a Hamiltonian graph G′ = (V, E′) such that E ⊆ E′ and |E′ \ E| ≤ k? It is easy to see that the problem is NP-complete, because already for k = 0 it is equivalent to the Hamiltonian Cycle problem, which is well known to be NP-complete [154]. This implies that a special version of the Traveling Salesman Problem (TSP), i.e., the problem of finding a minimum weight Hamiltonian cycle in a weighted complete graph, is also NP-hard, namely the version where the edges of the complete graph have weights 0 or 1. Indeed, HCP reduces directly to it. Take a graph G and consider the complete graph G′ on the same vertex set, with the following edge weights: an edge of G′ has weight 0 if it is also in G, and weight 1 otherwise. If the optimal value of a TSP tour on G′ is k, then the minimum number of edges we have to add to G to make it Hamiltonian is k, since the weight-1 edges of an optimal tour are exactly the edges that need to be added. However, when the input is a tree, the line graph of a tree, a cactus, or a series-parallel graph, the problem can be solved in linear time [2, 219, 75, 204]. The line graph L(G) of a graph G is the graph that has a vertex for each edge of G, and an edge between two vertices if the corresponding edges in G share an endpoint. A cactus is a graph where each edge belongs to at most one cycle. Finally, a series-parallel graph is a graph that can be obtained recursively from a K2 by two operations: edge subdivision (turning a K2 into a P3) and replacing an edge with a set of parallel edges between the same endpoints. This means that the graph might have multiple parallel edges between two vertices.
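The reduction to TSP with weights 0 and 1 can be sketched directly. The Python code below is an illustration under our own naming, assuming vertices 0, ..., n − 1 with n ≥ 3; it enumerates all tours by brute force, so it is only meant for very small graphs, and returns the minimum tour weight, which is the minimum number of edges to add to make G Hamiltonian.

```python
from itertools import permutations

def hamiltonian_completion_number(n, edges):
    """Minimum number of edges to add so that the graph on 0..n-1 (n >= 3)
    has a Hamiltonian cycle, computed as the optimum of the 0/1-weighted TSP
    on the complete graph (brute force over all tours).
    """
    present = {frozenset(e) for e in edges}

    def weight(u, v):
        return 0 if frozenset((u, v)) in present else 1  # non-edges cost 1

    best = n  # a tour has n edges, so its weight is at most n
    for rest in permutations(range(1, n)):  # fix vertex 0 to skip rotations
        tour = (0,) + rest
        cost = sum(weight(tour[i], tour[(i + 1) % n]) for i in range(n))
        best = min(best, cost)
    return best
```

For example, on the path on three vertices it returns 1 (the missing edge of the triangle), and on the star K1,3 it returns 2.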
3.1.7 Comparability graphs
One more graph class for which the corresponding Minimum Completion problem has a direct application is the class of comparability graphs. The Maximum Reachability problem consists in finding an orientation of the input graph G that maximizes the number of pairs of vertices (u, v) such that there exists a directed path from u to v. Hakimi, Schmeichel and Young show in [123] that when we reverse the problem and ask for an orientation that minimizes the number of such pairs, i.e., the Minimum Reachability Orientation problem, then this problem can be reduced to the Minimum Comparability Completion problem. In particular, if we define r(G) as the minimum reachability number of a graph G, then r(G) = |E(G)| + c(G), where c(G) is the minimum number of fill edges whose addition makes G a comparability graph. In the same paper it is also shown that the Minimum Comparability Completion problem is NP-complete. The Minimum Comparability Deletion problem had been shown NP-complete in [250] and appears as one of the 18 basic subgraph problems in the book by Garey and Johnson [102], while the editing version of the problem has been proved NP-complete in [201]. In Table 3.1 we report a summary of the NP-completeness results both for the graph modification problems that we discussed in this section and for other classes for which complexity results are known.
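The quantity r(G) can be computed by brute force on small graphs, which gives a way to check the relation r(G) = |E(G)| + c(G) on examples. The Python sketch below is our own illustrative code (exponential in |E|): it tries all 2^|E| orientations and counts the ordered pairs (u, v), u ≠ v, joined by a directed path.

```python
from itertools import product

def reachable_pairs(n, arcs):
    """Number of ordered pairs (u, v), u != v, joined by a directed u-v path."""
    out = {u: [] for u in range(n)}
    for u, v in arcs:
        out[u].append(v)
    total = 0
    for s in range(n):
        seen, stack = {s}, [s]
        while stack:                      # depth-first search from s
            for w in out[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        total += len(seen) - 1            # do not count s itself
    return total

def min_reachability(n, edges):
    """r(G): minimum of reachable_pairs over all 2^|E| orientations (brute force)."""
    return min(
        reachable_pairs(n, [(u, v) if d else (v, u) for (u, v), d in zip(edges, dirs)])
        for dirs in product((0, 1), repeat=len(edges))
    )
```

On a 5-cycle, which is not a comparability graph, it returns 6 = |E| + 1, consistent with a single chord being enough to make C5 a comparability graph.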
3.2 Minimizing the maximum clique size
We mentioned that the most natural objective function to optimize when considering graph modification problems is the number of modifications needed to obtain a certain property. However, it is not the only interesting one. In the case of the Π-Completion problem, a very important parameter associated with the output graph is the size of its maximum clique. When Π is the class of chordal, interval or proper interval graphs, finding a completion that minimizes the size of the maximum clique is equivalent to computing three fundamental parameters of the input graph, namely treewidth, pathwidth and bandwidth, respectively. We formalize this equivalence in the following theorem.
Theorem 3.2.1 Let G be any graph, and let Hc, Hi and Hpi be the minimal chordal, interval and proper interval completions of G, respectively, of minimum largest clique size. Then:
• The treewidth of G, denoted by tw(G), is equal to the treewidth of Hc. Since in chordal graphs the treewidth is equal to the size of the maximum clique minus one, we have that tw(G)= ω(Hc) − 1 [24].
• The pathwidth of G, denoted by pw(G), is equal to the pathwidth of Hi. Since in interval graphs the pathwidth is equal to the size of the maximum clique minus one, we have that pw(G) = ω(Hi) − 1 [24].
• The bandwidth of G, denoted by bw(G), is equal to the bandwidth of Hpi. Since in proper interval graphs the bandwidth is equal to the size of the maximum clique minus one, we have that bw(G) = ω(Hpi) − 1 [57].
Π                 Π-DEL.           Π-COMP.          Π-EDIT.      Π-VDL.
Perfect           NPC [201]        NPC [201]        NPC [201]    NPC [247]
Chordal           NPC [201]        NPC [249]        NPC [229]    NPC [247]
Interval          NPC [106]        NPC [155, 102]   NPC [41]     NPC [247]
Split             NPC [201]        NPC [201]        P [124]      NPC [247]
Trivially Perf.   NPC [229]        NPC [249, 229]   ?            NPC [247]
Cographs          NPC [63]         NPC [63]         ?            NPC [247]
Bipartite         NPC [103]        -                NPC [103]    NPC [247]
Threshold         NPC [180]        NPC [180]        ?            NPC [247]
Cluster           NPC [63]         P [229]          NPC [229]    NPC [247]
Chain             NPC [201]        NPC [249]        ?            NPC [247]
Planar            NPC [196, 104]   -                -            NPC [247]
k-Ver. Conn.      -                ?                -            -
Forest            P [64]           -                -            NPC [247]
k-Edge Conn.      -                P [42]           -            -
Comparability     NPC [250]        NPC [123]        NPC [201]    NPC [247]
Permutation       NPC [41]         NPC [41]         NPC [41]     NPC [247]
Strongly Chordal  ?                NPC [153]        ?            NPC [247]
Weakly Chordal    NPC [41]         NPC [41]         ?            NPC [247]
Proper Interval   NPC [106]        NPC [249, 102]   NPC [41]     NPC [247]
Interval Bip.     NPC [244]        -                NPC [62]     NPC [247]
Chordal Bip.      ?                ?                ?            NPC [247]
Circular Arc      NPC [201]        NPC [201]        NPC [41]     NPC [247]
Proper Circ. Arc  NPC [201]        NPC [201]        NPC [41]     NPC [247]
Unit Circ. Arc    NPC [201]        NPC [201]        NPC [41]     NPC [247]
Circle            NPC [41]         NPC [41]         NPC [41]     NPC [247]
AT-free           NPC [201]        ?                ?            NPC [247]
Trees             P [64]           -                -            NPC [248]
Table 3.1: Summary of the complexity results for the Minimum Π-Deletion, Completion, Editing and Vertex Deletion problems, respectively. The “-” entries mean that the problem is not interesting, while “?” indicates an open problem. NPC stands for NP-complete, since here we consider only the decision versions of these problems, and P for polynomial time solvable.
Needless to say, all three problems are NP-complete [5, 96, 101]. However, when the parameter is fixed to a constant k, checking whether the treewidth or the pathwidth of a graph is at most k can be done in linear time for every k [23, 159], while for bandwidth only an O(n^k) algorithm [120] is known. One should note that a minimum completion does not necessarily coincide with a completion that minimizes the size of the maximum clique. We give an example in Figure 3.2.
Figure 3.2: The top graph consists of a clique C and an independent set X such that |C| > (|X|^2 − |X|)/2, with all edges between X and C present, and a vertex x completely adjacent to X. This means that the set {x, x1, x2, c} induces a C4 for any x1, x2 ∈ X and c ∈ C. The way to achieve Minimum Fill-in is to make X into a clique, as shown in the left graph. This creates a chordal graph with maximum clique size |C| + |X|, using (|X|^2 − |X|)/2 fill edges. If instead we add a fill edge from x to each vertex in C, as in the graph to the right, we still destroy all these induced C4s and obtain a chordal graph, but we use |C| fill edges, hence strictly more than the optimal solution for Minimum Fill-in. On the other hand, the maximum clique of the chordal graph we obtain in this way, formed by C together with x and one vertex of X, has size |C| + 2, which is smaller than |C| + |X| whenever |X| ≥ 3. No chordal completion of this graph can do better: if x stays non-adjacent to some c ∈ C, then every pair of vertices of X must become adjacent, which creates the clique C ∪ X.
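The example of Figure 3.2 can be verified mechanically on a small instance. In the Python sketch below we assume, hypothetically, |C| = 4 and |X| = 3 (so that |C| > (|X|^2 − |X|)/2 = 3), and test chordality by repeatedly deleting simplicial vertices; in a chordal graph this elimination also yields the maximum clique size. The helper names are ours.

```python
from itertools import combinations

def chordal_and_clique_number(adj):
    """Return (True, omega) if the graph is chordal, else (False, None).

    Repeatedly delete a simplicial vertex (one whose remaining neighbours form
    a clique); a graph is chordal iff this empties it. The largest
    |neighbours| + 1 seen along the way is the maximum clique size.
    """
    adj = {v: set(nb) for v, nb in adj.items()}  # work on a copy
    omega = 0
    while adj:
        v = next((u for u, nb in adj.items()
                  if all(b in adj[a] for a, b in combinations(nb, 2))), None)
        if v is None:
            return False, None                   # no simplicial vertex left
        omega = max(omega, len(adj[v]) + 1)
        for u in adj.pop(v):
            adj[u].discard(v)
    return True, omega

def add_edges(adj, pairs):
    """Return a copy of adj with the given pairs added as edges."""
    new = {v: set(nb) for v, nb in adj.items()}
    for u, v in pairs:
        new[u].add(v)
        new[v].add(u)
    return new

# Hypothetical instance: |C| = 4, |X| = 3, and the vertex x.
C, X, x = [0, 1, 2, 3], [4, 5, 6], 7
G = add_edges({v: set() for v in range(8)},
              list(combinations(C, 2))               # C is a clique
              + [(c, y) for c in C for y in X]       # all edges between C and X
              + [(x, y) for y in X])                 # x is adjacent to all of X

left = add_edges(G, combinations(X, 2))    # make X a clique: 3 fill edges
right = add_edges(G, [(x, c) for c in C])  # join x to C: 4 fill edges
```

Here the input graph is reported as not chordal, the left completion has maximum clique size 7 = |C| + |X|, and the right one has maximum clique size 6 = |C| + 2.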
3.2.1 Treewidth and Pathwidth
Treewidth and pathwidth were introduced by Robertson and Seymour during their Graph Minors project as a tool to decompose graphs into treelike and pathlike structures [222, 223]. We give the formal definition here.
Definition 3.2.2 A tree decomposition of a graph G = (V, E) is a pair ({X_i | i ∈ I}, T = (I, M)), where {X_i | i ∈ I} is a collection of subsets of V (also called bags), and T is a tree such that:
(i) $\bigcup_{i \in I} X_i = V$;
(ii) $(u, v) \in E \Rightarrow \exists\, i \in I$ with $u, v \in X_i$;
(iii) for all vertices $v \in V$, $\{i \in I \mid v \in X_i\}$ induces a connected subtree of T.
The width of a tree decomposition ({X_i | i ∈ I}, T = (I, M)) is $\max_{i \in I} |X_i| - 1$. The treewidth of a graph G, denoted tw(G), is the minimum width over all tree decompositions of G. The pathwidth of a graph G, denoted pw(G), is defined analogously to treewidth; the only difference is that the tree is replaced with a path in the definition, obtaining what is called a path decomposition. Thus, path decompositions are restricted types of tree decompositions, which implies that tw(G) ≤ pw(G) for every graph G. Treewidth and pathwidth are so well studied mostly because a very large class of NP-hard problems can be solved in polynomial time on graphs of bounded treewidth or pathwidth. In particular, Courcelle proved the following very powerful theorem.
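The three conditions of Definition 3.2.2 are easy to check mechanically. The Python sketch below is our own illustration; its input format (a dictionary from tree nodes to bags, plus a list of tree edges assumed to form a tree) is an assumption. It verifies conditions (i) to (iii) and returns the width, or None when the input is not a tree decomposition. A path decomposition is checked in the same way, with the tree edges forming a path.

```python
def decomposition_width(vertices, edges, bags, tree_edges):
    """Width of a tree decomposition, or None if a condition of Definition
    3.2.2 fails. bags maps each tree node i to its bag X_i; tree_edges are
    the edges of T, assumed to be a tree on the keys of bags."""
    # (i) the bags cover V
    if set().union(*bags.values()) != set(vertices):
        return None
    # (ii) every edge has both endpoints in a common bag
    if not all(any(u in b and v in b for b in bags.values()) for u, v in edges):
        return None
    # (iii) for each vertex, the nodes whose bags contain it are connected in T
    neighbours = {i: set() for i in bags}
    for a, b in tree_edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    for v in vertices:
        nodes = {i for i, b in bags.items() if v in b}
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:  # search T restricted to the nodes containing v
            for j in neighbours[stack.pop()] & nodes:
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        if seen != nodes:
            return None
    return max(len(b) for b in bags.values()) - 1
```

For the 4-cycle 0-1-2-3, the two bags {0, 1, 2} and {0, 2, 3} joined by a tree edge form a decomposition of width 2.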
Theorem 3.2.3 ([66]) Let k be a fixed constant and Π be a property of graphs that is definable in monadic second order logic. Then Π can be decided in linear time on graphs of treewidth at most k.
Monadic second order logic (MSOL) is a logical language for expressing properties, especially graph-theoretic ones. For example, we can express the Maximum Independent Set problem in MSOL:
$$\max_{V' \subseteq V} |V'| \;:\; \forall u\, \forall v\, \big( (u \in V' \wedge v \in V') \Rightarrow \neg (uv \in E) \big)$$
Thus, by Theorem 3.2.3, the Maximum Independent Set problem is solvable in linear time on graphs of bounded treewidth. Theorem 3.2.3 also holds when replacing treewidth with pathwidth, since if a graph has bounded pathwidth, then it has bounded treewidth (but not vice versa). The fact that for various problems there exist algorithms with running time O(c^k (n + m)), where k is the pathwidth or treewidth of the graph and c is a constant, is also used to speed up exact (exponential time) algorithms. Exact algorithms are usually based on branching, and the idea is that we gain a lot by branching on high degree vertices, but not so much when branching on low degree vertices. However, after we get rid of high degree vertices, the remaining graph is likely to have small treewidth or pathwidth, so we can use the above mentioned algorithms to solve the problem exactly and more efficiently than if we kept branching. See the survey on exact algorithms by Fomin, Grandoni, and Kratsch for more details [92]. The Treewidth and Pathwidth problems also have a game-theoretic interpretation. Consider the search game where searchers are placed on and removed from vertices of the graph, while looking for an invisible fugitive that moves arbitrarily fast along paths in the graph. The goal is to find a strategy that guarantees that the searchers capture the fugitive. The fugitive is captured if a searcher is on the same node as the fugitive, and all its neighbors are also occupied by searchers. The minimum number of searchers needed for the strategy to succeed is equal to the pathwidth of the graph plus one [18]. If the fugitive is visible, instead, then this number is equal to the treewidth of the graph plus one [18]. For a comprehensive survey on treewidth we refer the reader to the work of Bodlaender [24].
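To illustrate where running times of the form c^k · n come from, the following Python sketch solves Maximum Independent Set by dynamic programming over a path decomposition given as a list of bags in path order. It is our own simplified illustration: it assumes the decomposition is valid and does not use the nice path decompositions usually employed in the literature. For each bag it stores, for every independent subset S of the bag, the size of a largest independent set among the vertices seen so far whose intersection with the bag is S. With bags of size at most k + 1, each step costs O(4^(k+1)) set operations in this naive version.

```python
from itertools import combinations

def independent_subsets(bag, adj):
    """All subsets of bag with no edge inside (at most 2^|bag| of them)."""
    bag = sorted(bag)
    for r in range(len(bag) + 1):
        for combo in combinations(bag, r):
            if all(v not in adj[u] for u, v in combinations(combo, 2)):
                yield frozenset(combo)

def mis_on_path_decomposition(adj, bags):
    """Size of a maximum independent set, by dynamic programming over the
    bags X_1, ..., X_r of a path decomposition (listed in path order).

    table[S] = largest independent set among the vertices seen so far whose
    intersection with the current bag is exactly S.
    """
    bags = [frozenset(b) for b in bags]
    table = {S: len(S) for S in independent_subsets(bags[0], adj)}
    for prev, cur in zip(bags, bags[1:]):
        new_table = {}
        for S in independent_subsets(cur, adj):
            # predecessors must agree with S on the shared vertices prev & cur
            best = max(val for T, val in table.items() if T & cur == S & prev)
            # vertices of cur - prev are new; by condition (iii) they have no
            # neighbours among the vertices already forgotten
            new_table[S] = best + len(S - prev)
        table = new_table
    return max(table.values())
```

For example, with the bags {0, 1, 4}, {1, 2, 4}, {2, 3, 4} of the 5-cycle the sketch returns 2.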
3.2.2 Bandwidth
The Bandwidth problem can be expressed as a layout problem. For a graph G = (V, E) with |V| = n, a layout of G is a bijection L : V → {1, 2, ..., n}. The bandwidth of L is $bw_L(G) = \max_{uv \in E} |L(u) - L(v)|$. The bandwidth of G is the minimum bandwidth over all layouts of G, namely $bw(G) = \min_L bw_L(G)$. Its importance is due to its connections with sparse matrix computation [67]. Bandwidth is defined in a way similar to Profile, but rather than minimizing the total number of non-zero entries around the diagonal, the goal here is to minimize the distance of the furthest non-zero entry from the diagonal in every row and column. The Bandwidth problem is NP-complete even on binary trees and on caterpillars with hair length at most 3 [101, 192]. One more graph modification problem that has been studied, motivated by bandwidth, is the Minimum Degree Interval Completion problem. The problem is introduced in [91], and consists in finding an interval supergraph of the input graph with the smallest maximum degree. The motivation is that, if we define id(G) as the minimum, over all interval supergraphs of G, of their maximum degree, then bw(G) ≤ id(G) ≤ 2bw(G). This also implies that the Minimum Degree Interval Completion problem is NP-complete, or there would be a polynomial 2-approximation algorithm for bandwidth, which is not the case unless P = NP [242].
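The layout definition translates directly into a brute-force computation, useful for checking small examples. The Python sketch below (function name ours; it tries all n! layouts, so it is only for tiny graphs) evaluates bw_L(G) for every layout L and returns the minimum.

```python
from itertools import permutations

def bandwidth(n, edges):
    """bw(G) = min over layouts L of max over uv in E of |L(u) - L(v)|.

    A layout is a bijection from the vertices 0..n-1 to the positions 1..n;
    here perm[i] is the vertex placed at position i + 1. Brute force.
    """
    if not edges:
        return 0
    best = n - 1  # no layout can do worse than this
    for perm in permutations(range(n)):
        pos = {v: i + 1 for i, v in enumerate(perm)}
        best = min(best, max(abs(pos[u] - pos[v]) for u, v in edges))
    return best
```

For instance, paths have bandwidth 1, the 4-cycle and the star K1,3 have bandwidth 2, and K4 has bandwidth 3.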
3.3 Modification with restrictions
A variation of modification problems with important applications consists in limiting the set of possible modifications. That is, only some edges are allowed to be added to or removed from the graph. We will see that this problem is often hard even though there is no optimization involved.
3.3.1 Sandwich problem
The Π-Sandwich problem was formally introduced by Golumbic in [110], but it was already considered explicitly [109] or implicitly in other papers [26, 233]. Formally the problem takes as input two graphs, one a supergraph of the other, and asks whether there exists a graph “sandwiched” between the two input ones, with a specific property Π.
If we call G1 = (V, E1) and G2 = (V, E2) the input graphs, with E1 ⊆ E2, then the problem is equivalent to either finding a Π-completion of G1 by adding only edges belonging to E2 \ E1, or finding a Π-deletion of G2 by removing only edges from E2 \ E1. This problem can be seen either as a generalization of the Π-Recognition problem, or as a generalization of the Π-Completion and Π-Deletion problems. In the first case, if we set G = G1 = G2, what we are actually doing is checking whether G has the property Π. In the latter case, if G2 is the complete graph or G1 is the edgeless graph, the problem becomes exactly the Π-Completion or Π-Deletion problem, respectively. By the definition of the problem, some instances are not interesting. For example, if the property Π is closed under the addition or the deletion of edges, then the answer to the problem is yes if and only if G2 or G1, respectively, already has the property Π. Hence monotone properties are not interesting in this case. A useful observation is also that the answer to the Π-Sandwich problem for G1 and G2 is yes if and only if the answer to the co-Π-Sandwich problem is yes for the complements of G2 and G1, where a graph has property co-Π if its complement has property Π. For other hereditary properties there are various complexity results, both polynomial and NP-complete. Most of them are listed in [110], where the Comparability, Permutation, Chordal, Interval, Circle and Circular-arc Sandwich problems are shown NP-complete, while polynomial algorithms for the Split, Threshold and Cograph Sandwich problems are provided. The result for cographs has been generalized in [70, 71], where the authors give a polynomial time algorithm for the P4-sparse Sandwich problem. A graph is said to be P4-sparse if every subset of 5 vertices induces at most one P4. Later, the Strongly Chordal and Chordal Bipartite Sandwich problems were also shown NP-complete [72]. Another interesting graph class considered for this problem is that of (k, l)-graphs [69].
A (k, l) partition of a graph is a partition of its vertex set into k independent sets and l cliques. A (k, l)-graph is a graph that admits a (k, l) partition. This graph class generalizes many other important classes. For example, the (k, 0)-graphs are exactly the k-colorable graphs. Hence for each fixed k > 2 it is NP-complete to recognize them, while (2, 0)-graphs are exactly the bipartite graphs. The same complexity results hold for (0, l)-graphs: it is NP-complete to recognize them when l > 2 and polynomial otherwise. The class of (1, 1)-graphs is exactly the class of split graphs, and therefore polynomial time recognizable. Since the Sandwich problem is a generalization of the Recognition problem, when k > 2 or l > 2 the Sandwich problem for (k, l)-graphs is NP-complete. In [69], the authors settle the complexity of the (k, l)-Sandwich problem for all values of k and l: it is NP-complete when k + l > 2 and polynomial otherwise. In the same paper they also investigate the complexity of other variants of the (k, l)-Sandwich problem, including optimization versions. For example, they consider input graphs of bounded degree, and prove that if k > 2 or the maximum degree of the input graphs is greater than 3, then the problem is NP-complete. They show that it is also NP-complete to find a maximum induced subgraph of the input graphs for which there exists a (2, 1)-sandwich graph. Other properties investigated for the Sandwich problem are related to decompositions arising in perfect graph theory, like homogeneous sets [45] and join compositions [87]. There is even an example in which the Sandwich problem is hard for a co-NP property, namely that of being Pk-free for large enough k [227]. Finally, Habib et al. in [122] investigated some restrictions of the problem, namely when G1 and G2 have orientations on the edges, both transitive and not.
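For the (k, l)-Sandwich problem there is a simple way to see what is being asked: a sandwich graph with a (k, l) partition exists if and only if V can be split into k sets that are independent in G1 and l sets that are cliques in G2, since we can then take E1 together with all pairs inside the clique parts. This observation follows directly from the definitions; the brute-force Python sketch below builds on it, assumes that empty parts are allowed, and is exponential in n (function name ours).

```python
from itertools import combinations, product

def kl_sandwich(n, e1, e2, k, l):
    """Brute-force (k, l)-Sandwich test for G1 = (V, e1) and G2 = (V, e2), V = 0..n-1.

    Looks for an assignment of every vertex to one of k parts that must be
    independent in G1 (labels 0..k-1) or one of l parts that must be cliques
    in G2 (labels k..k+l-1); empty parts are allowed. Returns the assignment
    as a tuple, or None. A sandwich graph is then E1 plus all pairs inside
    the clique parts.
    """
    e1 = {frozenset(e) for e in e1}
    e2 = {frozenset(e) for e in e2}

    def allowed(u, v, part):
        pair = frozenset((u, v))
        return pair not in e1 if part < k else pair in e2

    for assign in product(range(k + l), repeat=n):
        if all(allowed(u, v, assign[u])
               for u, v in combinations(range(n), 2)
               if assign[u] == assign[v]):
            return assign
    return None
```

On the 4-cycle 0-1-2-3 with G1 = G2, no (1, 1) partition exists (C4 is not split), but once the chord {0, 2} is allowed in G2, the independent set {1, 3} and the clique {0, 2} work.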
3.3.2 Extending colorings
A special case of the Π-Completion problem that has attracted much attention is that of making a colored graph into a graph with property Π by adding only edges that do not violate the given coloring. In other words, we are only allowed to add edges between vertices with different colors. The first problem of this kind was introduced in 1974, when it was shown that the Perfect Phylogeny problem was equivalent to the problem of triangulating a colored graph [40]. The problem has been shown to be polynomial time solvable when the input graph is colored with 2, 3 [149, 27], or 4 colors [150], but NP-complete in general [26]. The Perfect Phylogeny problem deals with the construction of evolutionary trees. For a more detailed explanation the reader can refer to [86]. It is easy to see that the Colored Π-Completion problem, as we will call it, is equivalent to the Π-Sandwich problem where G1 is the input colored graph, and G2 is the graph obtained by adding all possible edges that do not violate the coloring. Hence if the Colored Π-Completion problem is NP-complete, so is the Π-Sandwich problem. Moreover, if it can be proved that the Colored Π-Completion problem is NP-complete when the input graph has at most k colors for a fixed k, this implies that the Π-Sandwich problem is hard also when we require the sandwiched graph to have clique size at most k. Motivated by physical mapping of DNA, there has also been some research on the Colored Interval Completion problem. We described the physical mapping of DNA in Chapter 3.1.2, and we said that many copies of the same piece of DNA are fragmented and recombined. In this case all fragments of the same copy have the same color. This problem is not only NP-complete in general [32], but also when the number of colors of the input graph is greater than or equal to 4 [32], or when the input graph is a caterpillar with hair length at most 2 [4].
In [32] the authors give a polynomial algorithm for the case of 2 and 3 colors. Restricting Π to the class of proper interval graphs does not simplify the problem. In fact, the Colored Proper Interval Completion problem is also NP-complete [152].
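The reduction from Colored Π-Completion to Π-Sandwich described above only requires building G2. A minimal Python sketch, assuming vertices 0, ..., n − 1, a proper coloring given as a list, and edges stored as frozensets (all our own conventions):

```python
from itertools import combinations

def sandwich_instance_from_coloring(edges, color):
    """Turn a Colored Pi-Completion instance into a Pi-Sandwich instance.

    edges -- edge list of the colored input graph G1 on vertices 0..n-1
    color -- color[v] is the color of vertex v (assumed to be a proper coloring)
    Returns (E1, E2): E1 is the edge set of G1, and E2 adds every pair of
    differently colored vertices, i.e. every edge allowed by the coloring.
    """
    e1 = {frozenset(e) for e in edges}
    e2 = set(e1)
    for u, v in combinations(range(len(color)), 2):
        if color[u] != color[v]:
            e2.add(frozenset((u, v)))
    return e1, e2
```

Any sandwich graph between the two returned edge sets is then exactly a completion of G1 that respects the coloring.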
3.3.3 Probe graphs
Graph modification problems can also take the form of recognition problems for some special graph classes. That is, given a graph class Π, we can generalize this class by defining the set of graphs that can be obtained from the graphs in Π through some particular modifications. One straightforward example is given by parameterized graphs, which we will cover in detail in Chapter 5.1. For example, given a hereditary property Π, we define the parameterized graph class Π+ke to be the class of graphs that can be obtained by adding at most k edges to some graph with the property Π. Recognizing such graphs is equivalent to solving the Π-Deletion problem, with the restriction that at most k edges can be deleted. Another example is given by probe graphs. Given a property Π, we say that a graph G = (V, E) is probe-Π if we can partition V into two sets, probes and non-probes, with the following properties: the non-probes form an independent set, and a graph with property Π can be obtained from G by adding edges only between non-probe vertices. The set of probes might be given as part of the input or not, giving two different recognition problems: the partitioned and the unpartitioned version, respectively. This kind of graph was introduced in 1994 in molecular biology as a new, more flexible tool for the physical mapping of DNA sequences [35]. In particular, the class of probe interval graphs was introduced [187]. The main open problem was whether such graphs could be recognized in polynomial time. For the partitioned case, first Johnson and Spinrad, and later McConnell and Spinrad, answered affirmatively, giving an O(n^2) [147] and an O(n + m log n) [185] algorithm, respectively. The problem for unpartitioned probe interval graphs remained open until 2005, when Chang et al. showed that it is solvable in polynomial time [50]. Meanwhile, many other classes of probe graphs began to be investigated.
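For a concrete instance of the partitioned recognition problem, the following Python sketch decides whether a graph is probe-chordal for a given set of non-probes, by trying every set of edges between non-probes and testing chordality via simplicial-vertex elimination. It is a brute-force illustration of the definition (exponential in the number of non-probe pairs), not one of the polynomial algorithms cited in this section; the function names are ours.

```python
from itertools import combinations

def is_chordal(adj):
    """Chordal iff repeatedly deleting simplicial vertices empties the graph."""
    adj = {v: set(nb) for v, nb in adj.items()}  # work on a copy
    while adj:
        v = next((u for u, nb in adj.items()
                  if all(b in adj[a] for a, b in combinations(nb, 2))), None)
        if v is None:
            return False
        for u in adj.pop(v):
            adj[u].discard(v)
    return True

def is_probe_chordal_partitioned(adj, nonprobes):
    """Can edges be added only between non-probes so that G becomes chordal?"""
    nonprobes = sorted(nonprobes)
    if any(v in adj[u] for u, v in combinations(nonprobes, 2)):
        return False                      # non-probes must be independent
    pairs = list(combinations(nonprobes, 2))
    for r in range(len(pairs) + 1):       # chordality is not monotone: try all sets
        for added in combinations(pairs, r):
            h = {v: set(nb) for v, nb in adj.items()}
            for u, v in added:
                h[u].add(v)
                h[v].add(u)
            if is_chordal(h):
                return True
    return False
```

For the 4-cycle 0-1-2-3 with non-probes {0, 2}, adding the edge {0, 2} yields a chordal graph, so the answer is yes; with the single non-probe 1 no edge can be added and the answer is no.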
The first new class to be introduced was that of probe chordal graphs [108], motivated by some applications in phylogenetics. Polynomial time algorithms for their recognition in both the partitioned and the unpartitioned case were given one year later by Berry et al. [16]. Subsequently, probe permutation [46], probe cograph [48, 171], probe split [172], probe distance hereditary [48], probe chordal bipartite [15], probe interval bigraphs [51], probe ptolemaic [49], probe comparability [47] and probe self complementary graphs [53] were also introduced and studied. All of them admit polynomial recognition algorithms, for one or both versions of the problem, but most interestingly, some of them also have characterizations through a finite set of forbidden subgraphs. Such characterizations are very desirable, as they imply polynomial time recognition algorithms and give a lot of information on the structure of the graph. It is rare to be able to characterize the yes-instances of a graph modification problem in this way, and that is why looking for a forbidden subgraph characterization of probe interval graphs is a very interesting open problem. There is also an interesting conjecture about probe graphs: if a graph class Π is recognizable in polynomial time, so is the corresponding probe class. The conjecture initially concerned only perfect graphs, but it was extended in [171] to include all polynomial time recognizable graph classes.
Chapter 4
Methods to solve graph modification problems
As we mentioned in the previous chapter, most interesting graph modification problems have been proved NP-hard. For this reason, even though there are some optimized exponential time algorithms that solve the most applicable of these problems exactly [243, 94, 93, 89, 215, 83], research has mainly developed in other directions, looking for efficient solutions at the cost of optimality or generality. We can summarize them in four main areas.
• Approximation: The optimality of the solution is traded for a more efficient algorithm that can still find a solution guaranteed to be “good enough” with respect to the optimum.
• Restricted inputs: In this case what we lose is generality. We assume that the input graph is not arbitrary, but has specific properties that might allow us to solve the problem optimally in polynomial time.
• Minimal completions or deletions: We require the set of edges or vertices whose addition or removal gives the graph the desired property to be inclusion minimal. There is no guarantee about how far the solution might be from the optimal one, but we obtain other interesting properties.
• Parameterized algorithms: We use some extra information related to the problem (the parameter) in order to design algorithms that can compute an optimal solution in polynomial time, when the parameter is small enough.
In this work we will leave out the approximation results, as they are extremely numerous and would require a survey of their own, and no paper of this thesis concerns approximation. We will focus instead on the other three methods.
However, the reader can refer to the following surveys and papers for an overview of the approximation results for some problems: Minimum Fill-in [200]; Minimum Interval Completion [217]; Minimum Cluster Editing [11]; Planar Deletion [175]; and various connectivity properties [139].
4.1 Restricted inputs
In this section we present a summary of the complexity results for some graph modification problems when the input graph is not arbitrary, but belongs to a particular class. Often, considering restricted inputs helps, and efficient polynomial algorithms can be designed for problems that are in general NP-complete. However, this is not always the case, as we are going to see. For the Interval Vertex Deletion problem there exists a polynomial time algorithm when the input is a distance hereditary graph [213]. Making a graph interval bipartite (a forest where each tree is a caterpillar) by editing the minimum number of edges is NP-complete even when the input is bipartite with bounded degree [62]. The Minimum Interval Bipartite Deletion problem, however, becomes polynomial when the input graph is a tree [244]. Another case where taking a restricted input does not help is when we want to make a split graph threshold by adding a minimum number of edges [212]. The fact that the Minimum Threshold Completion (or Deletion) problem is NP-complete even on split graphs is particularly surprising, since threshold graphs are also split. As we can see in Table 4.1, however, this is not an isolated case. The Minimum Interval Completion problem restricted to chordal graphs also remains NP-complete, even though interval graphs are chordal. On the bright side, for any hereditary property Π characterized by a finite set of forbidden induced subgraphs, a maximum (induced) subgraph with property Π can be found in linear time on series-parallel graphs [204]. The remaining results are summarized in Table 4.1. We consider the modification problems that are probably the most interesting, and therefore also studied on a large number of special graph classes. They are: Minimum Fill-in (MIN FILL); Treewidth (TW); Pathwidth (PW); the Minimum Interval Completion problem (INT. COMP.); and Bandwidth (BW).
However, we need to clarify some details to help the correct reading of Table 4.1. When we write that a result is “by” some reference [X], we mean that the result follows from the results of [X], even though it might not be mentioned explicitly in that paper. For instance, in the case of Bandwidth, it is proved in [192] that this problem is NP-complete for trees. This implies NP-completeness for all superclasses of trees, like chordal, weakly chordal and chordal bipartite graphs. Positive results, i.e., polynomial time algorithms for a problem on a certain class, instead imply polynomial algorithms for the same problem on all the subclasses of this class. This is the case for r-AT-free graphs, namely the AT-free graphs with at most r minimal separators. When r is polynomial in the size of the graph, the algorithm given in [165] for Minimum Fill-in, Treewidth, Pathwidth and Minimum Interval Completion on r-AT-free graphs is polynomial as well. Hence, for all subclasses of AT-free graphs that have a polynomial number of minimal separators, like permutation graphs, all these problems can be solved in polynomial time. The diagram given in Figure 2.2 might help in reading the results in Table 4.1. Another note about AT-free graphs is that, for this graph class and all its subclasses, the treewidth equals the pathwidth, and Minimum Fill-in is equivalent to Minimum Interval Completion [191]. Furthermore, for the AT-free graphs that are also claw-free, bandwidth equals treewidth and therefore pathwidth as well [209]. Thus all NP-completeness results for AT-free graphs follow from those for co-bipartite graphs, which are AT-free and claw-free. Similarly, the NP-completeness of bandwidth on circle graphs follows from their equivalence to k-polygon graphs [194].
When we write modular decomposable graphs, we mean the graphs that have a special modular decomposition, i.e., one where every prime graph either has a polynomial number of minimal separators or is a clique-matching graph (see [31] for details and definitions). We can consider these graphs as a generalization of the graphs with a polynomial number of minimal separators. Hence, if a problem is NP-hard on a graph class like chordal graphs, where all graphs have a polynomial number of minimal separators, then it is NP-hard on modular decomposable graphs. Finally, the entries marked with a “-” indicate a trivial problem, as when the input to the Minimum Fill-in or Minimum Interval Completion problem is already chordal or interval, respectively. Similarly, Treewidth, Pathwidth and Bandwidth are equivalent to the Maximum Clique problem on chordal, interval and proper interval graphs, respectively. Thus they are all solvable in linear time on these graph classes and their subclasses, since a maximum clique can be found in linear time on chordal graphs [107]. An interesting fact that can be deduced from Table 4.1 is that Minimum Fill-in and Treewidth seem to always have the same computational complexity. The first counterexample to this was given in [179] (Paper IV), for the class of split+1v graphs. Similar behavior can be observed for the Minimum Interval Completion problem and Pathwidth, which have the same complexity on most graph classes but differ on split graphs.
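The equivalence with Maximum Clique on chordal graphs can be made concrete. The Python sketch below computes the size of a maximum clique of a chordal graph with Maximum Cardinality Search (MCS), using the standard fact that the reverse of an MCS order is a perfect elimination ordering; the treewidth is then this value minus one. It assumes the input is chordal, uses a simple quadratic-time implementation rather than the linear-time algorithm of [107], and the function names are ours.

```python
def max_clique_chordal(adj):
    """Size of a maximum clique of a chordal graph (adj: vertex -> set of neighbours).

    Maximum Cardinality Search repeatedly picks an unvisited vertex with the most
    visited neighbours. The reverse of the visiting order is a perfect elimination
    ordering, so in a chordal graph the visited neighbours of the picked vertex form
    a clique, and every maximal clique arises this way. The answer is therefore
    1 + the largest such count. The result is meaningless for non-chordal input.
    """
    weight = {v: 0 for v in adj}  # number of already visited neighbours
    unvisited = set(adj)
    best = 0
    while unvisited:
        v = max(unvisited, key=lambda u: weight[u])
        best = max(best, weight[v] + 1)
        unvisited.remove(v)
        for u in adj[v]:
            if u in unvisited:
                weight[u] += 1
    return best


def treewidth_chordal(adj):
    """Treewidth of a chordal graph: maximum clique size minus one."""
    return max_clique_chordal(adj) - 1


# Two triangles sharing the edge b-c (a "diamond"): chordal, clique size 3, treewidth 2.
diamond = {"a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b", "d"}, "d": {"b", "c"}}
print(max_clique_chordal(diamond), treewidth_chordal(diamond))  # 3 2
```

Tie-breaking in MCS does not matter here: any maximum-weight choice yields a valid ordering on chordal input.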
4.2 Minimality
A first specific approach to dealing with the hardness of graph modification problems was proposed for chordal completions in 1976 by Rose, Tarjan and Lueker, and by Ohtsuki [226, 205]. The idea was to relax optimality in the following way.

Table 4.1 (partially recovered): The entries marked with a “?” indicate an open problem; “-” marks a trivial problem.
INPUT CLASS | MIN FILL | TW | PW | INT. COMP. | BW
Proper interval | - | - | - | - | -
Chordal | - | - | NPC [121] | NPC [212] | NPC by [192]
Split | - | - | P [121] | NPC [212] | NPC [164]
Bipartite | NPC [238] | NPC [161] | NPC [161] | ? | NPC by [192]
Co-bipartite | NPC [249] | NPC [5] | NPC [5] | NPC [251] | NPC [209]
AT-free | NPC by [249] | NPC by [5] | NPC by [5] | NPC by [251] | NPC by [209]
[The rows for bipartite permutation, modular decomposable, weakly chordal, distance hereditary, chordal bipartite, permutation, circular arc, d-trapezoid, HHD-free, cograph, split+1v, interval, r-AT-free, planar, circle and tree inputs, and the polynomial running-time bounds with their references, are not legible in the source.]
Rather than looking for a minimum set of fill edges whose addition makes the input graph chordal, which we can consider the global optimum, one might look for an inclusion-minimal set of fill edges, comparable to a local optimum. A set of fill edges is minimal if no proper subset of it, added to the original input graph, produces a chordal graph. This approach is quite different from other classical ways of tackling hard problems, and it is characteristic of graph modification problems. The main reasons why it attracted much interest are that minimal chordal completions are computable in polynomial time, in contrast to minimum ones, and that every perfect elimination ordering of a minimal triangulation produces the same triangulation when applied to the original input graph [19]. This is a very desirable property in sparse matrix computations, because it guarantees that the planned data storage scheme is not disturbed [176]. Moreover, every minimum completion is also minimal. Hence, being able to compute and generate all minimal completions of an input graph is very appealing for the design of exact exponential-time algorithms or heuristics for computing a minimum one. Besides, being able to characterize and search through all minimal completions efficiently might yield polynomial-time algorithms for computing a minimum completion when the input is restricted to some special graph class with few, or special, possible minimal completions. We will discuss further motivations throughout the chapter. Here we will focus mostly on the Minimal Π-Completion problem, as it is the most studied, and we will discuss the minimal versions of the other modification problems in Chapter 4.2.4. However, we formally define all the problems here, and give an example in Table 4.2.
• Minimal Π-Completion: Given a graph G = (V, E), find a set of fill edges F such that H = (V, E ∪ F) has the property Π, and for every F′ ⊂ F, G′ = (V, E ∪ F′) does not have the property Π.
• Minimal Π-Deletion: Given a graph G = (V, E), find a set of edges F ⊂ E such that H = (V, E \ F) has the property Π, and for every F′ ⊂ F, G′ = (V, E \ F′) does not have the property Π.
• Minimal Π-Editing: Given a graph G = (V, E), find a set F ⊂ V × V such that H = (V, (E \ F) ∪ (F \ E)) has the property Π, and for every F′ ⊂ F, G′ = (V, (E \ F′) ∪ (F′ \ E)) does not have the property Π.
• Minimal Π-Vertex Deletion: Given a graph G = (V, E), find a set of vertices V′ ⊂ V such that G[V \ V′] has the property Π, and for every V′′ ⊂ V′, G[V \ V′′] does not have the property Π.
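For small graphs the definitions above can be checked literally. The Python sketch below tests whether a given set of fill edges is a minimal Π-completion by trying every proper subset of F, which takes time exponential in |F|. The property Π is passed as a predicate; as an illustrative choice of ours we use cographs, recognized as P4-free graphs by brute force. Edges are frozensets of two vertices, and all function names are hypothetical.

```python
from itertools import combinations


def is_cograph(n, edges):
    """Cographs are exactly the P4-free graphs; check every 4-vertex subset."""
    for quad in combinations(range(n), 4):
        present = [p for p in combinations(quad, 2) if frozenset(p) in edges]
        if len(present) == 3:
            # three edges on four vertices form a P4 iff the degrees are 1, 1, 2, 2
            degrees = sorted(sum(v in p for p in present) for v in quad)
            if degrees == [1, 1, 2, 2]:
                return False
    return True


def is_minimal_completion(n, edges, fill, has_property):
    """Check the definition of a minimal Pi-completion by brute force.

    Vertices are 0..n-1; edges and fill are sets of frozenset({u, v}).
    Returns True iff E + F has the property and no proper subset F' of F does.
    """
    if not has_property(n, edges | fill):
        return False  # not a Pi-completion at all
    fill = list(fill)
    for k in range(len(fill)):  # sizes 0 .. |F|-1, i.e. proper subsets only
        for subset in combinations(fill, k):
            if has_property(n, edges | set(subset)):
                return False
    return True


c4 = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 0)]}
print(is_minimal_completion(4, set(), c4, is_cograph))  # False
print(any(is_cograph(4, c4 - {f}) for f in c4))         # False
```

In the example, the 4-cycle is a cograph completion of the edgeless graph on four vertices. It is not minimal, since the edgeless graph is already a cograph, and yet no single fill edge can be removed: each removal leaves an induced P4.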
Table 4.2 (Chordal modification problems; panels for an input graph: minimum vs. minimal completion, minimum vs. minimal deletion, minimum vs. minimal editing, minimum vs. minimal vertex deletion; drawings not reproduced): We give some examples that show the difference between a minimum and a minimal solution. The property we want in this case is chordality. The dashed edges are, as usual, fill edges.

For the Minimal Chordal Completion problem, also called the Minimal Triangulation problem, there has been a long sequence of results. After the first algorithms showing that the problem was solvable in O(nm) time [205, 226], there was not much progress for some years, and the problem was almost abandoned until the mid-1990s, when the first connection between the minimal separators of a graph, its minimal chordal completions and treewidth was discovered. The new characterizations that followed revived interest in the problem. Still, it was not until 2004 that Kratsch and Spinrad gave the first algorithm running in $o(n^3)$ time [167]. At the same time, another group gave an even faster algorithm, running in $O(n^{\omega}\log n)$ time [136], where $O(n^{\omega})$ is the running time of the best matrix multiplication algorithm. The algorithm given in [136] is currently the fastest known for computing minimal triangulations. For a survey of results and a more detailed history of the problem we refer the reader to [127].
Encouraged by this series of positive results, other researchers started studying minimal completions into other interesting graph classes, trying to better understand the problem and its complexity. Until then, in fact, essentially only the chordal case was known to be polynomial. The one exception was a paper by Fujisawa and Wakabashi, written a few years after the one by Rose and Tarjan, where the Minimal Interval Completion problem was shown to be polynomial [206]. That result did not attract much attention, as shown by the fact that the problem was rediscovered and studied again independently only a few years ago [134, 234]. In particular, in [234] the authors improve the running time given in [206] from O((m + |F|) · n) to O(mn).
To our knowledge, computing minimal completions or deletions into every class studied so far has turned out to be polynomial-time solvable. This includes minimal chordal [205, 226, 167, 136], interval [206, 134, 234], proper interval [218], split [130] (Paper I), comparability [131] (Paper II), threshold [132], chain [132], co-bipartite [132] and cograph [178] (Paper III) completions.
As many results show, a key step in the design of an efficient algorithm for minimal completions is to characterize minimality in a practical way. For example, in the case of minimal chordal completions, the breakthrough results came when the connection between the minimal separators of a graph and its minimal triangulations was well understood, as we will explain in Chapter 4.2.1. The same holds for minimal split and cograph completions. Therefore, the first problem related to minimal completions that we will take into consideration is the characterization problem. Afterwards, we will discuss the possibility of designing general algorithms for minimal completions into graph classes with special, but still very general, properties. We will consider both a direct approach that computes minimal completions by adding edges to the input graph, in Chapter 4.2.2, and a more general one that makes a given completion minimal by deleting fill edges, in Chapter 4.2.3.
4.2.1 Characterizing minimality

Let us define the Π-Minimality problem as the problem of deciding whether a given Π-completion H = (V, E ∪ F) of an input graph G = (V, E) is minimal. Thus, when we ask for a good characterization of minimality for a specific property Π, we mean one that implies a polynomial-time algorithm for the Π-Minimality problem.

As we mentioned at the beginning of Chapter 4.2, listing all possible minimal completions, or all the interesting ones, is a common method for computing a minimum completion, either in exact exponential-time algorithms or when the input is restricted to a particular class. Being able to characterize minimal completions in a good way is a key step in designing efficient algorithms that produce all minimal completions of a given graph. For example, in the case of split graphs, it has been proved in [179] (Paper IV) that every minimal split completion can be obtained by taking some maximal independent set of the input graph and turning the rest of the graph into a clique by adding fill edges. This implies a straightforward $O(3^{n/3})$ algorithm for computing a minimum split completion of any graph, by listing all its maximal independent sets. For chordal graphs, there exists a characterization of minimality through the minimal separators of the input graph.
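The split-graph characterization of [179] mentioned above translates into a short enumeration algorithm. The Python sketch below lists the maximal independent sets as the maximal cliques of the complement, using the Bron-Kerbosch algorithm with Tomita-style pivoting (our choice; any listing algorithm would do, and a graph has at most $3^{n/3}$ maximal independent sets). For each set it counts the fill edges needed to turn the remaining vertices into a clique. This is an illustration, not the implementation of [179], and the names are hypothetical.

```python
def maximal_independent_sets(adj):
    """Yield every maximal independent set of G (adj: vertex -> set of neighbours).

    Maximal independent sets of G are the maximal cliques of its complement,
    listed here with Bron-Kerbosch and Tomita-style pivoting.
    """
    vertices = set(adj)
    co = {v: vertices - adj[v] - {v} for v in vertices}  # complement neighbourhoods

    def expand(r, p, x):
        if not p and not x:
            yield r
            return
        pivot = max(p | x, key=lambda u: len(co[u] & p))
        for v in list(p - co[pivot]):
            yield from expand(r | {v}, p & co[v], x & co[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), vertices, set())


def minimum_split_completion(adj):
    """Fill edges of a minimum split completion of G.

    For each maximal independent set, the remaining vertices are made into a
    clique; by the characterization cited above, the cheapest choice is minimum.
    """
    best = None
    for independent in maximal_independent_sets(adj):
        rest = list(set(adj) - independent)
        fill = {frozenset((u, v))
                for i, u in enumerate(rest) for v in rest[i + 1:]
                if v not in adj[u]}
        if best is None or len(fill) < len(best):
            best = fill
    return best


c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(len(minimum_split_completion(c4)))  # 1
```

On the 4-cycle the maximal independent sets are the two pairs of opposite vertices, and making the other pair adjacent adds a single diagonal.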
Theorem 4.2.1 ([210]) A triangulation of a graph is minimal if and only if it can be obtained by making a maximal set of non-crossing minimal separators into cliques by adding edges.
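For small graphs, the objects in Theorem 4.2.1 can be computed by brute force. The Python sketch below lists all minimal separators using the standard full-component characterization, which we assume here since it is not stated in the text: S is a minimal separator if and only if G − S has at least two connected components C with N(C) = S. It runs in exponential time and only serves as an illustration; the names are hypothetical.

```python
from itertools import combinations


def components(adj, removed):
    """Connected components of G - removed, as a list of vertex sets."""
    seen = set(removed)
    comps = []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            v = stack.pop()
            comp.add(v)
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        comps.append(comp)
    return comps


def minimal_separators(adj):
    """All minimal separators of G, by brute force over vertex subsets.

    S is a minimal separator iff G - S has at least two full components,
    i.e. components C whose neighbourhood N(C) is exactly S.
    """
    vertices = list(adj)
    result = []
    for k in range(len(vertices) + 1):
        for subset in combinations(vertices, k):
            s = set(subset)
            full = 0
            for comp in components(adj, s):
                neighbourhood = set().union(*(adj[v] for v in comp)) - comp
                if neighbourhood == s:
                    full += 1
            if full >= 2:
                result.append(s)
    return result


c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(minimal_separators(c4))  # [{0, 2}, {1, 3}]
```

On the 4-cycle the two minimal separators cross, so a maximal set of non-crossing minimal separators contains only one of them; saturating it adds a single chord, which gives the two minimal triangulations of the 4-cycle.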
Thanks to this result and the concept of potential maximal clique, i.e., a set of vertices that becomes a maximal clique in some minimal triangulation of the graph, it was possible to prove the following very general result.
Theorem 4.2.2 ([34]) Minimum Fill-in and Treewidth can be computed in polynomial time on graphs with a polynomial number of minimal separators.
The idea is that, given an input graph G = (V, E), once a potential maximal clique C of G has been made into a clique by adding fill edges, yielding a new graph G′, the problem can be separated into smaller independent subproblems. Each subproblem consists of the graph induced by $N_{G'}[K]$, where K is a connected component of the graph $G'[V \setminus C]$. Thus, if G has a polynomial number of potential maximal cliques, then it is possible to generate all “promising” minimal completions in polynomial time by dynamic programming. Since a graph has a polynomial number of minimal separators if and only if it has a polynomial number of potential maximal cliques, the result follows.

Practically all results given in Table 4.1 for Minimum Fill-in and Treewidth use, implicitly or explicitly, the idea behind Theorem 4.2.2, even though they exploit optimizations related to the specific graph class considered. Also the result about modular decomposable graphs, which generalizes Theorem 4.2.2 to graphs where all prime graphs have a polynomial number of minimal separators, is based on Theorem 4.2.2 itself. Listing all potential maximal cliques of a graph is also the basic technique used in the fastest exact algorithms currently available for computing Treewidth and Minimum Fill-in [93, 243, 94].

Besides the development of new tools for the design of efficient algorithms for Minimum Fill-in and Treewidth, the characterization of the minimal chordal completions of particular graph classes has also led to new characterizations of the graph classes themselves:
Theorem 4.2.3 Graph classes characterized through minimal chordal completions:
• A graph is $2K_2$-free if and only if all its minimal triangulations are split graphs [189].
• A graph is a cograph if and only if all its minimal triangulations are trivially perfect graphs [30].
• A graph is AT-free if and only if all its minimal triangulations are interval graphs [191, 210].
• A graph is AT-free and claw-free if and only if all its minimal triangulations are proper interval graphs [209].
• Let k ≤ 5. A graph is $P_k$-free if and only if all its minimal triangulations are $P_k$-free [210].
• A graph is a permutation graph if and only if all its minimal triangulations are interval graphs and their connector graph is bipartite [190].
One further example of how tools developed to characterize minimal completions have found applications in the design of other algorithms is the 3-partition of split graphs introduced in [130] (Paper I). This new way of looking at split graphs made it possible to develop an efficient algorithm for dynamically maintaining a split graph under edge and vertex deletions and additions [129]. A similar algorithm was also given for chordal graphs in [17].

The first example of a practical characterization was given for minimal triangulations.
Theorem 4.2.4 ([226]) A chordal completion is minimal if and only if no single fill edge can be removed without creating a chordless 4-cycle.
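Theorem 4.2.4 suggests a direct polynomial-time minimality test. The Python sketch below assumes a standard lemma that is not stated in the text: for a chordal graph H and an edge uv, the graph H − uv is chordal if and only if the common neighbours of u and v in H form a clique. If two common neighbours x and y are non-adjacent, removing uv leaves the chordless 4-cycle u, x, v, y. The function name and input format are hypothetical.

```python
from itertools import combinations


def is_minimal_triangulation(h_adj, fill):
    """Minimality test for a chordal completion H, following Theorem 4.2.4.

    h_adj maps each vertex to its neighbours in H (assumed chordal);
    fill is an iterable of fill edges (u, v) of H.
    """
    for u, v in fill:
        common = h_adj[u] & h_adj[v]
        if all(y in h_adj[x] for x, y in combinations(common, 2)):
            return False  # uv can be removed and H - uv stays chordal
    return True


# The 4-cycle 0-1-2-3 plus one chord is minimal; plus both chords (K4) it is not.
one_chord = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
k4 = {v: {u for u in range(4) if u != v} for v in range(4)}
print(is_minimal_triangulation(one_chord, [(0, 2)]))   # True
print(is_minimal_triangulation(k4, [(0, 2), (1, 3)]))  # False
```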
One direction is trivial, since it holds for any minimal Π-completion: if a completion is minimal, then by the general definition of minimality no single fill edge can be removed while preserving the property Π. However, being able to claim that if no single fill edge can be removed then the completion is minimal is much stronger. In general this is not true, as the cases of minimal comparability, interval and cograph completions show (Figure 4.1). Most importantly, this property guarantees the existence of a polynomial-time algorithm for computing minimal chordal completions: just add to the input graph any set of fill edges F that makes it chordal, and then try to remove them one by one until none of them can be removed anymore. The output is clearly a minimal chordal completion by Theorem 4.2.4. The running time is polynomial for two reasons: every time we try to remove one fill edge, we can check whether the resulting graph is chordal in linear time [226]; and each fill edge is tried at most |F| times, since every pass over the remaining fill edges, except the last, removes at least one of them. If we tried to use such a naive approach with only the general definition of minimality, we might of course end up with an exponential running time, since we would have to try all possible subsets of the set F. So the first thing one might want to check when studying minimal completions into some new graph class Π is whether minimal Π-completions behave like minimal triangulations. This behavior has been formalized and discussed in [132], where the authors call it sandwich monotonicity.
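The completion-then-deletion procedure just described can be sketched in Python as follows. As simplifying assumptions of ours, the sketch starts from the complete graph (always a valid chordal completion) and tests chordality with a straightforward quadratic Maximum Cardinality Search check, rather than the linear-time test of [226]; all names are hypothetical.

```python
from itertools import combinations


def is_chordal(adj):
    """Chordality test: run Maximum Cardinality Search, then check that the
    reverse of the visiting order is a perfect elimination ordering."""
    weight = {v: 0 for v in adj}
    unvisited = set(adj)
    order = []
    while unvisited:
        v = max(unvisited, key=lambda u: weight[u])
        order.append(v)
        unvisited.remove(v)
        for u in adj[v]:
            if u in unvisited:
                weight[u] += 1
    position = {v: i for i, v in enumerate(order)}
    for v in order:
        earlier = [u for u in adj[v] if position[u] < position[v]]
        if not all(y in adj[x] for x, y in combinations(earlier, 2)):
            return False  # earlier neighbours of v are not a clique
    return True


def minimal_triangulation(adj):
    """Fill edges of a minimal triangulation of G (adj: vertex -> neighbour set).

    Starts from the complete graph, which is chordal, and repeatedly tries to
    delete each fill edge, keeping the deletion if the graph stays chordal.
    It stops after a full pass with no deletion, so no single fill edge is
    removable, and the result is minimal by Theorem 4.2.4.
    """
    vertices = list(adj)
    h = {v: set(vertices) - {v} for v in vertices}
    fill = {frozenset((u, v)) for u, v in combinations(vertices, 2) if v not in adj[u]}
    changed = True
    while changed:
        changed = False
        for edge in list(fill):
            u, v = tuple(edge)
            h[u].discard(v)
            h[v].discard(u)
            if is_chordal(h):
                fill.remove(edge)
                changed = True
            else:
                h[u].add(v)  # removal destroyed chordality: restore the edge
                h[v].add(u)
    return fill


c5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(len(minimal_triangulation(c5)))  # 2
```

On the 5-cycle the sketch returns two chords, as every minimal triangulation of a cycle on n vertices adds n − 3 chords.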
Definition 4.2.5 We say that a property Π is sandwich monotone if, for any two graphs $G_1 = (V, E_1)$ and $G_2 = (V, E_2)$ with property Π such that $E_1 \subset E_2$, there exists an ordering $f_1, f_2, \ldots, f_{|E_2 \setminus E_1|}$ of the edges in $E_2 \setminus E_1$ such that $G_i = (V, E_2 \setminus \{f_1, f_2, \ldots, f_i\})$ has the property Π for each $1 \le i \le |E_2 \setminus E_1|$.
It is not difficult to see that the following two statements are equivalent: (i) the class Π is sandwich monotone; (ii) a Π-completion is minimal if and only if no single fill edge can be removed from it while preserving Π. Hence, if a graph class Π is sandwich monotone, we get for free a good characterization of minimal Π-completions, and a polynomial-time algorithm for the Minimal Π-Completion problem. Of course, we are implicitly assuming two things. The first is that the property Π is polynomial-time recognizable; otherwise the Minimal Π-Completion problem would not be polynomial-time solvable either. Notice, in fact, that the Minimal Π-Completion problem is a generalization of the Π-Recognition problem, i.e., a graph G is in Π if and only if for every minimal Π-completion H of G we have H = G. The second is that we are able to compute a Π-completion (not necessarily minimal) of any input graph in polynomial time. For most interesting properties Π the complete graph is a valid Π-completion, which also guarantees that every input has a minimal Π-completion, but there are exceptions. For instance, when we want to complete a graph into a chain graph, it makes sense to consider only bipartite input graphs, and in this case the complete bipartite graph on the same bipartition is a valid chain completion of any input.

Figure 4.1 (panels: comparability; interval; proper interval; cograph and trivially perfect; permutation; drawings not reproduced): This figure is borrowed from [132]. Each graph given with solid edges belongs to the graph class mentioned above it. The addition of the dashed edges gives a supergraph in the same class, thus a non-minimal completion. If we remove any single fill edge, the resulting graph is no longer in the class, unlike what happens when removing both of them. This means that, although the completions are not minimal, no single fill edge can be removed while keeping the completion in the desired class, as can be done with non-minimal chordal completions.

Nevertheless, it is not easy to show that a graph class is sandwich monotone. At the moment we know that chordal [9, 226], split [130] and threshold graphs [132] are sandwich monotone, and that interval, comparability, permutation, trivially perfect and proper interval graphs and cographs are not (see Figure 4.1). Interesting open problems are whether weakly chordal and strongly chordal graphs [132], and chordal bipartite graphs [8], are sandwich monotone; the last would follow from either of the other two. If they were, the corresponding minimal completion problems would be polynomial-time solvable, which is unknown at the moment. As a further note, the family of sandwich monotone properties contains that of monotone ones, but for monotone properties the completion problem is not interesting: either the input graph has the desired property, or no supergraph of it can have it. For instance, if a graph is not planar, we cannot make it planar by adding edges. Also, not all sandwich monotone properties are hereditary. One example is the class of k-connected graphs, for each fixed k: given two k-connected graphs, one a supergraph of the other, all graphs in between are also k-connected.
However, an induced subgraph of a k-connected graph may not even be connected.

For all other properties Π, those that are not sandwich monotone, the task of characterizing minimality seems to become much harder. The first non-sandwich monotone class for which the Π-Minimality problem was solved is that of interval graphs [135]. This result is far from trivial, and it appeared only some time after the Minimal Interval Completion problem was well studied and understood. Also, the resulting algorithm for testing minimality has a much higher running time than that for directly computing a minimal interval completion. At the moment, the only other non-sandwich monotone class with a good characterization of minimality for the corresponding completions is that of cographs [178] (Paper III). In this case the characterization actually yields a linear-time algorithm for the Cograph Minimality problem. We think that proper interval completions [218] are good candidates to be investigated next. A characterization that leads to an efficient algorithm for listing all minimal proper interval completions could, in fact, yield better algorithms for Bandwidth, in the same way as potential maximal cliques did for Treewidth.

All approaches known at the moment for characterizing minimality depend heavily on the specific graph class taken into consideration. It is appealing to think that there might be a general way to test minimality for some large family of properties. Even though this problem does not appear to be trivial at all, we think that any solution should rely on some concept of locality. For sandwich monotone properties, the locality is given by the single edge that cannot be removed, and for cographs by a certain connectivity property of the graphs induced by the co-connected components of the completion (see Paper III).
For a general approach we would need to somehow bound the number of subsets of fill edges that we try to remove while preserving the desired property. In this regard, properties characterized by a finite set of forbidden induced subgraphs seem good candidates for an attempt. The reason is that the obstructions that can destroy a property when fill edges are removed have bounded size, and they might be the key to defining some local minimality property. The problem is that removing some fill edges locally can affect other parts of the graph. The real challenge is to understand whether these interactions can be controlled and bounded, or whether they are arbitrary. Let us look at a naive approach to the problem, to understand why it does not work. Let us assume that Π is a hereditary property characterized by a finite set of forbidden induced subgraphs $\mathcal{F} = \{F_1, F_2, \ldots, F_l\}$, where $s = \max_{1 \le i \le l} |V(F_i)|$. Then, as s is a constant, many problems become polynomial-time solvable on Π, for instance Recognition. Let us denote by $R_\Pi(n, m)$ the running time of a polynomial recognition algorithm for Π. Then, if a graph does not have the property Π, we can find a forbidden subgraph in $O(n \cdot R_\Pi(n, m))$ time, or list all of them in $O(n^s \cdot R_\Pi(n, m))$ time.
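The $O(n \cdot R_\Pi(n, m))$ bound for finding one forbidden subgraph can be obtained, for hereditary Π, with a standard vertex-deletion technique; the text does not say which method is intended, so this is one possible way. The Python sketch below uses brute-force cograph (P4-free) recognition as a stand-in for the recognition algorithm, and assumes that $\mathcal{F}$ lists the minimal forbidden induced subgraphs. The names are hypothetical.

```python
from itertools import combinations


def is_p4_free(vertices, adj):
    """Brute-force cograph recognition on the subgraph induced by `vertices`."""
    for quad in combinations(vertices, 4):
        present = [(x, y) for x, y in combinations(quad, 2) if y in adj[x]]
        if len(present) == 3:
            degrees = sorted(sum(v in e for e in present) for v in quad)
            if degrees == [1, 1, 2, 2]:  # the three edges form a P4
                return False
    return True


def find_forbidden_subgraph(adj, has_property):
    """Vertex set of a minimal forbidden induced subgraph, or None if G has Pi.

    Each vertex is tried once: it is deleted if what remains still lacks Pi.
    This uses n + 1 recognition calls. For hereditary Pi, every proper induced
    subgraph of the survivors has Pi, so they induce some forbidden graph F_i.
    """
    keep = list(adj)
    if has_property(keep, adj):
        return None
    for v in list(keep):
        trial = [u for u in keep if u != v]
        if not has_property(trial, adj):
            keep = trial  # v is not needed to witness the violation
    return set(keep)


p5 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(find_forbidden_subgraph(p5, is_p4_free))  # {1, 2, 3, 4}
```

One pass suffices: when a vertex is kept, the graph without it has Π, and by heredity this remains true after later deletions.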
The first thing that might come to mind at this point, given a Π-completion H = (V, E ∪ F) of some input graph G = (V, E), is to select a set S ⊂ V of size at most s and try to remove all possible subsets of fill edges in H[S], so that the resulting induced subgraph still has the property Π. The good news is that this can be done in polynomial time, namely $O(n^s 2^{s^2} R_\Pi(n, m))$. The problem is that, even though locally we did not create any forbidden subgraph, the whole graph might no longer have the property Π. This is because there might exist another set S′ that shares some fill edges with S, and after removing these fill edges, H[S′] becomes a forbidden subgraph. An example is given using cograph completions in Figure 4.2. Thus, there might be cases in which no subset of the fill edges can be removed locally from any induced subgraph of size at most s without destroying Π, and yet the given completion is not minimal.
The concept of locality we just described is evidently too weak, but we believe that it is a promising direction to investigate. The solution, if any exists, might consist of an extension of the previous idea that takes into consideration a set of intersecting induced subgraphs of bounded size, rather than just one. Alternatively, one might start by considering a more restricted family of properties, like those that have both a finite set of forbidden subgraphs and the universal vertex property that we discuss in the next chapter. This property, in fact, implies by itself an interesting minimality property of the induced subgraphs of a completion.
We leave as an open problem whether there is a characterization of minimal completions into all graph classes that are characterized by a finite set of forbidden subgraphs.