A fast algorithm for finding junction trees 81

A sufficiently fast algorithm for finding close to optimal junction trees

Ann Becker and Dan Geiger
Computer Science Department
Technion
Haifa 32000, ISRAEL
anyuta@cs.technion.ac.il, dang@cs.technion.ac.il

Abstract

An algorithm is developed for finding a close to optimal junction tree of a given graph G. The algorithm has a worst case complexity $O(c^k n^a)$ where a and c are constants, n is the number of vertices, and k is the size of the largest clique in a junction tree of G in which this size is minimized. The algorithm guarantees that the logarithm of the size of the state space of the heaviest clique in the junction tree produced is less than a constant factor off the optimal value. When $k = O(\log n)$, our algorithm yields a polynomial inference algorithm for Bayesian networks.

1 Introduction

All exact inference algorithms for the computation of a posterior probability in general Bayesian networks have two conceptual phases. One phase handles operations on the graphical structure itself and the other performs probabilistic computations. The junction tree algorithm [LS88] requires us to first find a "good" junction tree and then perform probabilistic computations on the junction tree, and the method of conditioning [Pe86] requires us to find a "good" loop cutset and then perform a calculation using the loop cutset. In [BG94], we offered an algorithm that finds a loop cutset for which the logarithm of the state space is guaranteed to be a constant factor off the optimal value. In this paper, we provide a similar optimization for the junction tree algorithm.

We shall first restrict our discussion to networks for which all vertices have the same state space size and to the optimality criterion which we call cliquewidth. The cliquewidth of an undirected graph G is the size of the largest clique in a junction tree of G in which the size of the largest clique is minimized. A more common term is treewidth, which is the cliquewidth minus 1.

To date all methods in the AI and Statistics communities for finding a junction tree had no guarantee of performance and could perform rather poorly when presented with an appropriate example. One algorithm, due to Rose (1974), is as follows: repeatedly, select a vertex v with minimum number of neighbors N(v), delete v from the graph, and make a clique out of N(v). The resulting sequence of cliques creates a junction tree. This greedy algorithm minimizes the size of each clique as it is being created. However, it could easily make a mistake at the first step that would lead it to a junction tree far off the optimal size. Another algorithm, investigated by Kjaerulff (1990), is simulated annealing, which takes a long time to run and has no guarantees on the quality of the output.

Finding an optimal junction tree is NP-complete, but for a graph with n vertices and a cliquewidth k there exists an $O(n^{k+1})$ algorithm that finds an optimal junction tree [ACP87]. This algorithm is not practical for the size of Bayesian networks dealt with in practice. Other algorithms for finding an optimal junction tree have a complexity of $O(f(k)n)$ where f(k) is a super-exponential function of k [Bo93]. These algorithms are practical for cliques of size k = 5 at most. A more practical algorithm for constructing an optimal junction tree when the largest clique size is 4 is given in [AP86]. For larger values of k there is no algorithm to date that can find the optimum junction tree quickly. The exponential dependency in k cannot be improved unless P = NP because finding an optimal clique tree for $k = O(n)$ is NP-complete [ACP87].

Kloks in his book [Kl94], which is devoted to finding junction trees in various graphs, gives a polynomial algorithm that finds a junction tree of a given graph G such that its maximal clique size is at most $12\Delta \log n$ off optimal where $\Delta$ is a large unspecified constant (see also [BGHK91]). Kloks states that finding a polynomial algorithm that constructs a junction tree such that its maximal clique is a constant factor off optimal is a major open problem. The importance of this problem stems from the fact that many NP-complete problems on graphs can be solved polynomially if the input graph has a junction tree with fixed sized cliques and if such a junction tree can be found efficiently [Ar85, ALS91]. Some of these problems are: INDEPENDENT SET, DOMINATING SET, GRAPH K-COLORABILITY, HAMILTONIAN CIRCUIT and CONSTRAINT SATISFACTION PROBLEMS [DP89].

Robertson and Seymour [RS95], among other key results, were the first to present an algorithm that finds a junction tree of a given graph G such that its maximal clique size is at most a constant factor off optimal (they actually used a slightly different concept termed branchwidth). Reed [Re92] presents Robertson and Seymour's algorithm in a more accessible form and shows that its output is always less than 4 times the cliquewidth and the complexity is $O(k^2 3^{3k} n^2)$. Reed also gives an algorithm that errs by a factor of 5 and has a complexity $O(k^2 3^{4k} n \log n)$. Lagergren [La96] presents efficient parallel algorithms for this problem.

We offer an algorithm that finds a junction tree such that its largest clique is at most $(2\alpha + 1)$ times the cliquewidth where $\alpha$ is the approximation ratio for any approximation algorithm for the 3-way vertex cut problem. When using a $\frac{4}{3}$-approximation algorithm for the 3-way vertex cut problem ($\alpha = \frac{4}{3}$) due to [GVY94], our algorithm's complexity is $O(2^{4.66k} n \cdot \mathrm{poly}(n))$ and it errs by a factor of 3.66, where poly(n) is the running time of linear programming.

When $k = O(\log n)$, our algorithm, like previous ones, is polynomial. Consequently, it yields a polynomial inference algorithm for the class of Bayesian networks that have a logarithmic cliquewidth. Of course, one does not know a priori what the cliquewidth of a given network is, and so a user must terminate the algorithm if the running time is too long, in which case, however, the running time of exact inference must be quite large as well. We show that for the class of Bayesian networks having a slightly larger than logarithmic cliquewidth, there exists no polynomial inference algorithm unless all NP-complete problems are solvable in less than exponential time.

In Section 3, we describe the algorithm and prove its performance guarantee. This algorithm is made as simple as possible to facilitate the proof. In Section 4, we describe several heuristics that improve the algorithm's average case performance. In Section 5, we describe the changes needed so that the algorithm takes into account vertices with different state space sizes. The modified algorithm guarantees that the logarithm of the size of the state space of the heaviest clique in the junction tree found is less than a constant factor off the optimal value. In Section 6 we describe experiments made using the graph Medianus I. In most instances our algorithm was superior to an enhanced greedy algorithm both in terms of the largest state space and in terms of the total state space. In Section 7 we discuss the extent to which our results can be improved.

2 The Junction Tree Algorithm

The junction tree algorithm is currently the most practical inference method for Bayesian networks. In this section we provide the relevant highlights of the junction tree algorithm. For details, consult [LS88, JLO90].

Definition A directed acyclic graph (DAG) is a graph with no directed cycles. In a DAG, pa(v) denotes the set of parents of a vertex v. A Bayesian network is a DAG such that with each vertex v we associate a finite set D(v) called the state space of v and a probability distribution $P(v \mid pa(v))$. The joint distribution of V is given by $P(V) = \prod_{v \in V} P(v \mid pa(v))$.

The updating problem is to compute the posterior probability of a certain vertex given specific values to a set of other vertices.

The junction tree algorithm solves the updating problem as follows. For every vertex v, it connects every pair of v's parents and removes the direction of all edges in the graph. The resulting graph is undirected (called the moral graph). Then, the moral graph is triangulated; edges are added until every cycle of length more than three has a chord. These are called fill-in edges. Once the graph is triangulated (or chordal), a tree of cliques, called the junction tree, is constructed. The junction tree algorithm then loads all probabilities into the junction tree and performs the calculations on the new structure.

Definition Let G = (V, E) be a chordal graph. A junction tree of G is a tree $H$ such that each maximal clique C of G is a node in $H$, and for every vertex v of G, if we remove from $H$ all nodes not containing v, the remaining (hyper) graph stays connected.

The single most important step of this algorithm is triangulation. There are many ways to add edges to a given graph until it becomes chordal. In particular, one can simply make a single large clique. However, the time for loading the probabilities and performing the calculations is proportional to the total state space given by $\sum_{C \in H} \prod_{v \in C} |D(v)|$, which is dominated by the size of the maximal clique if all vertices have the same state space size. For example, if a maximal clique contains m vertices and if their state space is of size two, then the probability table for this clique is of size $2^m$. The objective of triangulation is to find a triangulation such that the maximal clique size is as small as possible. Sections 3 and 4 do just that. In Section 5, we describe the changes needed in order to account for varying state space sizes.

3 The Triangulation Algorithm

A natural approach to triangulate a graph G = (V, E) is to use a divide and conquer technique. In each iteration a minimum set of vertices X is found whose removal from G splits G into two disconnected components having vertex sets A and B such that $A \cup B \cup X = V$. The set X is called a minimum vertex cut. The algorithm proceeds on the two smaller problems $G[X \cup A]$ and $G[X \cup B]$, the subgraphs induced from G by the vertex sets $X \cup A$ and $X \cup B$ respectively.
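The split step just described can be illustrated with a minimal Python sketch. It does not search for a minimum vertex cut; the cut X is supplied by hand, and the helper names `components_without` and `induced` as well as the adjacency-dictionary representation are assumptions made for this illustration only.

```python
from collections import deque

def components_without(adj, removed):
    """Connected components of the graph after deleting the vertices in `removed`."""
    seen, comps = set(removed), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:  # breadth-first search that never enters `removed`
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps

def induced(adj, keep):
    """Subgraph of `adj` induced by the vertex set `keep`."""
    return {v: adj[v] & keep for v in keep}

# Hypothetical input: the chain a - b - c - d - e with the hand-picked cut X = {c}.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
         "d": {"c", "e"}, "e": {"d"}}
X = {"c"}
A, B = components_without(chain, X)
sub_a = induced(chain, A | X)   # G[X ∪ A]
sub_b = induced(chain, B | X)   # G[X ∪ B]
```

Both subproblems keep the cut X, which is what allows the two triangulations to be glued back together along X.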

Each subgraph is triangulated such that X becomes a clique in it.

While this approach yields a triangulated graph, the size of the cliques produced may grow up to an O(n) factor off their initial size if in each step one of the graphs shrinks only by a constant number of vertices and the vertex cut found in each step has many edges connecting it to previously found cuts. Robertson and Seymour, Reed, and Kloks all provide clever modifications that prevent the initial clique X from growing beyond a constant factor off its initial size.

We provide an algorithm that is similar to previous ones except that rather than dividing the graph into two subproblems, we divide it into three subproblems. As a procedure, we use an $\alpha$-approximation algorithm for the 3-way vertex cut problem. The 3-way vertex cut problem is defined as follows: given a weighted undirected graph and three vertices, find a set of vertices of minimum weight whose removal leaves each of the three vertices disconnected from the other two. This problem is known to be NP-hard [Cu91]. There exists a simple 2-approximation algorithm, that is, the weight of its output is no more than twice the weight of an optimal 3-way vertex cut. A polynomial $\frac{4}{3}$-approximation algorithm for the 3-way vertex cut problem is reported in [GVY94] (actually, their algorithm is a $(2 - \frac{2}{k})$-approximation algorithm that finds k-way vertex cuts).

Our algorithm produces a triangulated graph whose maximal clique size is less than $(2\alpha + 1)k$ where k is the cliquewidth of G and $\alpha$ is the ratio between the weight of the 3-way vertex cut found by the algorithm we use and the optimal 3-way vertex cut. For $\alpha = \frac{4}{3}$, obtained by using Garg et al's algorithm, our approximation algorithm yields a triangulation having a cliquewidth bounded by $3\frac{2}{3}k$.

Definition Let G = (V, E) be a graph. A decomposition of G is a partition (X, A, B, C) of V, where A and B are non-empty sets, such that there are no edges between A, B and C.

Definition Given an integer $k \ge 1$, a real number $\alpha \ge 1$, a graph G = (V, E) such that $|V| \ge (2\alpha + 1)k$, and a subset of vertices $W \subseteq V$, a decomposition (X, A, B, C) of G is called a W-decomposition wrt $(k, \alpha)$ if $|W| < (\alpha + 1)k$, $|X| \le \alpha k$, $|(W \cap A) \cup X| < (\alpha + 1)k$, $|(W \cap B) \cup X| < (\alpha + 1)k$, and $|(W \cap C) \cup X| < (\alpha + 1)k$.

For example, suppose G is the chain a - b - c - d - e. The triplet X = {c}, A = {a, b}, B = {d, e} and $C = \emptyset$ is a decomposition of G. Given W = {b, d}, this decomposition is a W-decomposition of G wrt k = 1 and $\alpha = 2$. Given W = {b, c}, the triplet X = {d}, A = {a, b, c}, B = {e} and $C = \emptyset$ is not a W-decomposition of G wrt $(k = 1, \alpha = 2)$ because $|(W \cap A) \cup X| = 3$.

The triangulation algorithm is given in Figure 1. In order to triangulate a graph G having a cliquewidth k we call Triangulate(G, $\emptyset$, k). When the algorithm is called with $W = \emptyset$, the size of the second argument of Triangulate in each recursive call is (strictly) less than $(\alpha + 1)k$ because, when $|V| \ge (2\alpha + 1)k$, by definition of W-decompositions, the sets $W_A \cup X$, $W_B \cup X$, $W_C \cup X$, which are the arguments passed in the recursive calls, each contain less than $(\alpha + 1)k$ vertices. Figure 2 shows a graph and how it splits into three subgraphs in a recursive call of Triangulate. The set W serves to monitor the shrinking rate of the size of the subproblems in each recursive call.

ALGORITHM Triangulate(G, W, k)
Input: An undirected graph G(V, E), $W \subseteq V$, k.
Output: A triangulation of G such that W is made a clique and such that the size of the largest clique $< (2\alpha + 1)k$ (Success) or a valid statement that the cliquewidth of G is larger than k (Failure).

If $|V| < (2\alpha + 1)k$ then make a clique out of G
else
    Find a W-decomposition (X, A, B, C) of G wrt $(k, \alpha)$;
    If not found return "cliquewidth > k"
    $W_A \leftarrow W \cap A$, $W_B \leftarrow W \cap B$, $W_C \leftarrow W \cap C$;
    call Triangulate($G[A \cup X]$, $W_A \cup X$, k);
    call Triangulate($G[B \cup X]$, $W_B \cup X$, k);
    call Triangulate($G[C \cup X]$, $W_C \cup X$, k);
    make a clique of $G[W \cup X]$.

Figure 1: The triangulation algorithm

Figure 2: An example of one level of a recursive call with k = 3 and $\alpha = 1$. Highlighted vertices are in W and X = {f, g}. The three graphs at the bottom are passed as arguments to the next level of recursion.

The next two lemmas show that a W-decomposition must exist or the cliquewidth is greater than k, in which case the algorithm correctly outputs this fact. Consequently, a naive way to use this algorithm is to repeatedly call Triangulate(G, $\emptyset$, k) starting with k = 1 and incrementing k by 1 whenever the algorithm fails to triangulate G. In the next section, we provide implementation details and a complexity analysis.

Lemma 1 Given a graph G(V, E) with a cliquewidth $\le k$, $|V| \ge k + 2$, and a subset of vertices W, $|W| > 1$, there exists a decomposition (X, A, B, C) of G such that $|X| \le k$, $|W \cap A| \le \frac{1}{2}|W|$, $|W \cap B| \le \frac{1}{2}|W|$ and $|W \cap C| \le \frac{1}{2}|W|$.

Proof: A constructive proof of similar claims is given in [Kl94, Lemma 2.2.9]. Let H(G) be a junction tree of G with a cliquewidth $\le k$. Add edges until all cliques in this junction tree become of size k. Call the resulting junction tree T(G). Now consider the following algorithm. Start with any clique X in T(G). If there is no connected component in $G[V \setminus X]$ which has more than $\frac{1}{2}|W|$ vertices of W, then stop. Otherwise, let S be a component in $G[V \setminus X]$ which has more than $\frac{1}{2}|W|$ vertices of W. There exists a vertex y in S which has k - 1 neighbors in X in the graph T(G) (viewed as a chordal graph). Let x be the vertex in X that is not a neighbor of y in T(G). Define $Y = X \setminus \{x\} \cup \{y\}$. Note that Y also has k vertices. The algorithm continues with Y.

To show that this algorithm terminates, we prove that in each step of the algorithm one of two conditions is met. Hereafter, the component which includes the largest part of W will be called the main component. The first condition is that the number of vertices in the main component decreases and the number of vertices of W in the main component does not increase. The second condition is that the number of vertices of W in the main component decreases.

Notice that $G[V \setminus Y]$ has two types of components. One type consists only of vertices in $S \setminus \{y\}$. If the main component of $G[V \setminus Y]$ is among these, the number of vertices is decreased and the number of vertices of W does not increase. The other type of components consists only of vertices of $\{x\} \cup V \setminus (S \cup X)$. The total number of vertices of W in this set is less than $\frac{1}{2}|W|$ because S contains more than half the vertices of W. Hence, in this case, the number of vertices from W in the main component decreases by at least one. Consequently, the algorithm terminates.

Suppose now that X is the final clique considered by this algorithm. If $G[V \setminus X]$ has two or more non-empty components, then group them into three sets to form the desired decomposition. Otherwise, there is only one component in $G[V \setminus X]$. Consequently, the clique X is a leaf in the junction tree T(G). Since V contains at least k + 2 vertices, and there is only one component in $G[V \setminus X]$, there exists a unique clique Y that contains k - 1 vertices of X and which is not a leaf in T(G). The graph $G[V \setminus Y]$ has at least two connected components and each contains less than half the vertices of W (because $|W| > 1$). □

Lemma 2 Given an integer $k \ge 1$, a real number $\alpha \ge 1$, a graph G(V, E) with $|V| \ge (2\alpha + 1)k$ and a subset of vertices W such that $|W| < (\alpha + 1)k$, there exists a W-decomposition (X, A, B, C) of G wrt $(k, \alpha)$ or the cliquewidth of G is larger than k.

Proof: Let G be a graph with a cliquewidth $\le k$. If $|W| \le 1$, then let X be any minimal vertex cut that does not contain a vertex of W. If $|X| \le k$, the resulting decomposition is a W-decomposition wrt $(k, \alpha)$. Otherwise, the cliquewidth is larger than k.

Suppose $|W| > 1$. Let (X, A, B, C) be a decomposition of G with the properties guaranteed by Lemma 1. We will prove that (X, A, B, C) is also a W-decomposition wrt $(k, \alpha)$. If it were not, then assume that $|(W \cap A) \cup X| \ge (\alpha + 1)k$. This inequality implies that $|W \cap A| \ge \alpha k$ because $|X| \le k$. But according to Lemma 1, we have $|W| \ge 2|W \cap A|$. Consequently, $|W| \ge 2\alpha k$, in contradiction to its given size which is smaller than $(\alpha + 1)k$. Hence, if the cliquewidth of G is $\le k$, then there is a W-decomposition wrt $(k, \alpha)$. Equivalently, if there isn't a W-decomposition wrt $(k, \alpha)$, the cliquewidth must be larger than k. □

Theorem 3 If G(V, E) is a graph with n vertices, $k \ge 1$ an integer, $\alpha \ge 1$ a real number, and W is a subset of V such that $|W| < (\alpha + 1)k$, then Triangulate(G, W, k) triangulates G such that the vertices of W form a clique and such that the size of a largest clique of the triangulated graph $< (2\alpha + 1)k$ or the algorithm correctly outputs that the cliquewidth of G is larger than k.

Proof: If the algorithm outputs that the cliquewidth of G is larger than k, then this is a valid statement by Lemma 2. Assume the algorithm does not produce this output.

The algorithm always terminates because in every recursive call of Triangulate the graphs $G[A \cup X]$, $G[B \cup X]$ and $G[C \cup X]$ have fewer vertices than $G[A \cup B \cup C \cup X]$ since A and B are not empty.

Next, we show that the algorithm returns a triangulated graph. We prove this by induction using the recursive structure of the algorithm. Clearly the claim is true if $|V| < (2\alpha + 1)k$. Assume $|V| \ge (2\alpha + 1)k$. By induction the recursive call Triangulate($G[A \cup X]$, $W_A \cup X$, k) returns a triangulation of $G[A \cup X]$ such that $W_A \cup X$ is a clique. Similarly for B and C. The algorithm then makes a clique of $W \cup X$. Consequently, the graphs $G[A \cup W \cup X]$, $G[B \cup W \cup X]$ and $G[C \cup W \cup X]$ are triangulated as well. Since the intersection of these triangulated graphs is a clique, the union must also be triangulated.

It remains to show that the cliquewidth of the triangulated graph is less than $(2\alpha + 1)k$. This is clearly true if $|V| < (2\alpha + 1)k$. Hence assume $|V| \ge (2\alpha + 1)k$. Let M be a largest clique in the triangulated graph. There are two cases to consider. If M contains no vertex of $A \setminus W$, $B \setminus W$ and $C \setminus W$, then M contains only vertices of $W \cup X$. Consequently, $|M| \le |W \cup X| \le |W| + |X| < (\alpha + 1)k + \alpha k$, and the cliquewidth is less than $(2\alpha + 1)k$ as claimed. If M contains a vertex of $A \setminus W$, then it contains no vertex of $B \cup C$ because there are no edges between A and $B \cup C$. Hence M is a clique in the triangulation of $G[A \cup X]$. By induction we know that $|M| < (2\alpha + 1)k$. □

Note that Lemma 2 and Theorem 3 hold for every $\alpha \ge 1$. However, in order to find a W-decomposition wrt $(k, \alpha)$ sufficiently fast (Lemma 2 only guarantees existence), we choose $\alpha$ to be the approximation factor of an algorithm for the 3-way vertex cut problem, an algorithm which we employ for finding a W-decomposition. We now give an algorithm that finds a W-decomposition wrt $(k, \alpha)$ where $\alpha$ is chosen as just described. Then we will argue for correctness.

For every possible selection of four disjoint subsets $W_A, W_B, W_C, W_X$ of W, such that $|W_A| \ge |W_B| \ge |W_C|$, we show how to check if there exists a W-decomposition (X, A, B, C) wrt $(k, \alpha)$, such that $W_A \subseteq A$, $W_B \subseteq B$, $W_C \subseteq C$ and $W_X \subseteq X$. There are at most $4^{|W|}$ choices to divide W into four sets of vertices $W_A, W_B, W_C, W_X$.

Let $W_A, W_B, W_C, W_X$ be a particular selection. We consider two cases, 1) $|W_A| < k$ and 2) $|W_A| \ge k$. Each case uses a different procedure.

Procedure I ($|W_A| < k$): Remove $W_X$ from the graph, [...]

[...] $|X| < (\alpha + 1)k - |W_A|$, $|X| < (\alpha + 1)k - |W_B \cup W_C|$ and $|X| \le \alpha k$ then output (X, A, B, C) as the desired W-decomposition wrt $(k, \alpha)$.

Now we will show that if a W-decomposition wrt $(k, \alpha)$ exists, as guaranteed by Lemma 2, then either procedure I or procedure II will find a W-decomposition wrt $(k, \alpha)$ for some choice of $W_A, W_B, W_C, W_X$. Let (X', A', B', C') be a decomposition of G with the properties guaranteed by Lemma 1. Let $W_A = A' \cap W$, $W_B = B' \cap W$, $W_C = C' \cap W$ and $W_X = X' \cap W$, and assume without loss of generality that $|W_A| \ge |W_B| \ge |W_C|$. Procedure I for $|W_A| < k$, and procedure II for $|W_A| \ge k$, both generate for this choice of $W_A, W_B, W_C, W_X$ a decomposition (X, A, B, C). We now show that in either case this decomposition is a W-decomposition wrt $(k, \alpha)$.

Case 1: $|W_A| < k$. The set of vertices $X' \setminus W_X$ is a 3-way vertex cut for the sets $W_A$, $W_B$, and $W_C$ in the graph $G[V \setminus W_X]$. An $\alpha$-approximation algorithm for the 3-way vertex cut problem outputs a cut Y such that $|Y| \le \alpha |X' \setminus W_X|$. Since $\alpha \ge 1$, we get $|Y \cup W_X| \le \alpha |X'|$. Consequently, $|Y \cup W_X| \le \alpha k$ because $|X'| \le k$. Finally, $|W_A \cup (Y \cup W_X)| < k + \alpha k = (\alpha + 1)k$. Thus all the conditions for a W-decomposition wrt $(k, \alpha)$ are met.

Case 2: $|W_A| \ge k$. The set of vertices $X' \setminus W_X$ is a vertex cut for the sets $W_A$, $W_B \cup W_C$ in $G[V \setminus W_X]$. A max flow/min-cut algorithm outputs a vertex cut Y such that $|Y| \le |X' \setminus W_X|$. Consequently $|Y \cup W_X| \le k$ because $|X'| \le k$. Finally, since $|W| \ge 2|W_A|$ (Lemma 1), we get $|W_A \cup (Y \cup W_X)| < \frac{\alpha + 1}{2}k + k \le (\alpha + 1)k$. Hence from $|W_B \cup W_C|$ [...]
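The two definitions of Section 3 (decomposition and W-decomposition wrt $(k, \alpha)$) can be checked mechanically. The following Python sketch is only an illustration of those definitions, not the search procedure of the paper; the function names and the adjacency-dictionary representation are assumptions. It replays the chain example a - b - c - d - e given after the definitions.

```python
def is_decomposition(adj, X, A, B, C):
    """True if (X, A, B, C) partitions V, A and B are non-empty,
    and no edge joins two different sets among A, B and C."""
    parts = (X, A, B, C)
    covered = set().union(*parts)
    # Equal union and equal total size together imply the parts are disjoint.
    if covered != set(adj) or sum(len(p) for p in parts) != len(adj):
        return False
    if not A or not B:
        return False
    side = {v: i for i, part in enumerate((A, B, C)) for v in part}
    return all(side[u] == side[v]
               for u in side for v in adj[u] if v in side)

def is_w_decomposition(adj, W, X, A, B, C, k, alpha):
    """Check the W-decomposition conditions wrt (k, alpha)."""
    if len(adj) < (2 * alpha + 1) * k:          # requires |V| >= (2*alpha + 1)k
        return False
    if not is_decomposition(adj, X, A, B, C):
        return False
    bound = (alpha + 1) * k
    return (len(W) < bound
            and len(X) <= alpha * k
            and all(len((W & part) | X) < bound for part in (A, B, C)))

chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
         "d": {"c", "e"}, "e": {"d"}}
first = is_w_decomposition(chain, {"b", "d"}, {"c"}, {"a", "b"}, {"d", "e"}, set(), 1, 2)
second = is_w_decomposition(chain, {"b", "c"}, {"d"}, {"a", "b", "c"}, {"e"}, set(), 1, 2)
```

Here `first` corresponds to the positive example in the text and `second` to the negative one, where $|(W \cap A) \cup X| = 3$ violates the strict bound $(\alpha + 1)k = 3$.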

[...] whenever $|V| < (2\alpha + 1)k$ a clique is formed out of G. However, instead of a clique, it suffices to produce a junction tree of G in which W is a clique. This step is done by forming a clique of W and then completing it to a junction tree of G by any of the known greedy heuristics. The proof of Theorem 3 remains valid without any change. Consequently, the worst case approximation is not affected. However, in many instances the approximation is improved.

The output of the algorithm is a triangulated graph T(G) which is not necessarily minimal. This means that some edges that were added (fill-in edges) might possibly be removed while the resulting graph remains triangulated. Kjaerulff provides an algorithm that, given a triangulation of a graph G and an ordering on its vertices, produces a minimal triangulated graph [Kj90]. We use Kjaerulff's algorithm with an ordering that is determined as follows. First in the ordering are the simplicial vertices in the order in which they are removed from G. The order of the remaining vertices is determined recursively while running Triangulate: in each level of the recursion, the vertices in $X \setminus W$ follow those in $A \setminus W$, those in $B \setminus W$ and those in $C \setminus W$.

We now demonstrate the effects of these modifications on the graph depicted in Figure 2 (assuming $W = \emptyset$). If simplicial vertices are removed, then the remaining graph does not contain the vertices a and b. The next phase, when k = 3 and $\alpha = 1$, creates three cliques: {c, d, e, f, g}, {f, g, h, i} and {f, g, j, k}, in addition to {a, c} and {b, c} due to the simplicial vertices. Finally, applying Kjaerulff's minimization algorithm removes the edges (f, i), (f, j), (c, f), (c, g), (d, g), yielding an optimal junction tree.

The total complexity of running Triangulate with a given k is the time it takes to find a W-decomposition times the number of nodes in the recursion tree, which is at most n. Verifying whether a choice $W_A, W_B, W_C, W_X$ can generate a W-decomposition with respect to $(k, \alpha = \frac{4}{3})$ takes poly(n), which is the time it takes to run Garg et al's $\frac{4}{3}$-approximation algorithm for the 3-way vertex cut problem. This polynomial is quite high as it is the complexity of linear programming. (A more practical algorithm, without a complexity guarantee, is the simplex algorithm.) Thus the complexity of running Triangulate with a given k is $O(4^{(1+\alpha)k} n \cdot \mathrm{poly}(n))$ where $\alpha = \frac{4}{3}$, because there are at most $4^{|W|}$ choices and $|W| < (\alpha + 1)k$. Since, in the worst case, the algorithm is run for i = 1 up to the cliquewidth of G, the total running time is $O(\sum_{i=1}^{k} 4^{2.33 i} n \cdot \mathrm{poly}(n))$, which is $O(2^{4.66k} n \cdot \mathrm{poly}(n))$. The size of the largest clique in the output is at most $2\alpha + 1 = 3.66$ times the cliquewidth.

In a simpler implementation we use a straightforward 2-approximation algorithm for finding a 3-way vertex cut: find a minimum vertex cut a-b between $v_a$ and $v_b$, a minimum vertex cut a-c between $v_a$ and $v_c$, and a minimum vertex cut b-c between $v_b$ and $v_c$. The output vertex cut is the union of any two of the three vertex cuts. This output is clearly a 3-way cut and it is at most twice the optimal weight because each of the three cuts weighs no more than an optimal 3-way vertex cut. Finding each vertex cut is done using a max flow/min-cut algorithm which takes $O(kn^2 \log n)$. This algorithm for the 3-way vertex cut is analogous to the one described in [DJPSY92] for the edge multiway cut. A more clever implementation using Reed's arguments can find an appropriate vertex cut in $O(k^2 n)$. Consequently, since $\alpha = 2$, the total complexity is $O(k^2 4^{3k} n^2)$ and the largest clique in the output is at most 5 times the cliquewidth.

In practice, the complexity our algorithm encounters is substantially less. The size of the set W is almost always less than $(1 + \alpha)k$ and in most cases it is less than k, which implies that the complexity encountered is proportional to $2^{2k}$ rather than to $2^{4.66k}$. Furthermore, when a W-decomposition (X, A, B, C) exists, it is often the case that W consists of two subsets and the third is empty, in which case the algorithm for finding a 3-way vertex cut is not activated (as is the case in the graph of Section 6). In addition, instead of increasing k by one whenever Triangulate fails on the input k, we can increase it to the minimal value k* for which a decomposition that was tested wrt $(k, \alpha)$ was found to be a W-decomposition wrt $(k^*, \alpha)$ ($k^* > k$).

Finally, note that Theorem 3 provides only a worst case bound of $2\alpha + 1$ for the ratio between the size of the largest clique and the cliquewidth of the given graph. However, if for an integer k, Triangulate produces a triangulation having a largest clique of size l and the algorithm fails for k - 1 (it is run for i = 1..k until it succeeds), then the ratio l/k is an upper bound for the ratio between the output and the cliquewidth of G because the cliquewidth must be greater than k - 1. This bound is much tighter than $2\alpha + 1$ because it takes into account the given graph and the specific steps made by Triangulate. It is an instance-specific a posteriori bound rather than a worst case a priori bound. The bound l/k is produced by the algorithm in order to inform the user about the quality of the junction tree found.

5 The Weighted Problem

It remains to describe the changes needed in order to account for different state spaces of each vertex. The weight w(v) of a vertex v is the logarithm (base 2) of its state space size and the weight of a clique is the sum of the weights of its constituent vertices. Note that the weight of a vertex with a binary state space is 1 and the weight of other vertices is larger than 1. Our optimality criterion is now the weighted cliquewidth of G. The weighted cliquewidth of G is the weight of the heaviest clique in a junction tree of G in which the weight of the heaviest clique is minimized.

To minimize the heaviest clique, we modify the algorithm as follows. We find a weighted W-decomposition wrt $(m, \alpha)$, namely, a decomposition (X, A, B, C) of G = (V, E), where $w(V) \ge (2\alpha + 1)m$, such that $w(W) < (\alpha + 1)m$, $w(X) \le \alpha m$, $w((W \cap A) \cup X) < (\alpha + 1)m$, $w((W \cap B) \cup X) < (\alpha + 1)m$ and $w((W \cap C) \cup X) < (\alpha + 1)m$. Once the termination condition is met, namely, $w(V) < (2\alpha + 1)m$, we apply the following greedy algorithm which is called the minimum weight heuristics: repeatedly, select a vertex v which forms with its neighbors N(v) a set of minimum weight, remove it from the current graph, and make N(v) a clique. We call this modified algorithm W-Triangulate.

The following claim holds.

Theorem 4 If G is a graph with n vertices, m and $\alpha \ge 1$ are real numbers, and W is a subset of V such that $w(W) < (\alpha + 1)m$, then W-Triangulate(G, W, m) triangulates G such that the vertices of W form a clique and such that the weight of a heaviest clique of the triangulated graph $< (2\alpha + 1)m$ or the algorithm correctly outputs that the weighted cliquewidth of G is larger than m.

Proof: Theorem 3 and Lemmas 1 and 2 remain valid when the cardinality of sets is replaced with their weight and k is replaced with m. □

Theorem 4 states that in the junction tree found by W-Triangulate the weight of the heaviest clique is less [...]

The algorithm W*-Triangulate gets two parameters k and m. In each step, it finds a decomposition which is bounded both by $(2\alpha + 2)m$ and by $(2\alpha + 2)k$. Thus, it guarantees that the weight and, simultaneously, the size of W will not grow too much. This algorithm cannot outperform W-Triangulate (in the experiments reported herein) because in all our experiments whenever the weight of W was small, the size was small as well.

6 Experimental Results

Kjaerulff (1990) has tested several heuristic algorithms for constructing junction trees for two graphs that were used for a medical application: Medianus I and Medianus II. His experiments show that the minimum weight heuristics enhanced by removing redundant fill-in edges is superior to all other heuristics that were considered. We will compare W-Triangulate with this enhanced [...]

         W-Triangulate                              Eq    Greedy
         #    $\Delta_{ave}$   $\Delta_{max}$       #     #    $\Delta_{ave}$   $\Delta_{max}$
    M    75   .62              2.35                 16    9    .3               .93
    T    73   .64              2.37                 11    16   .42              1.22

Figure 3: The results for 100 runs on Medianus I. The first line records the differences on the average and in the extreme case of the logarithm of the heaviest clique. In 16 instances the algorithms produced equal output. The second line records the same information regarding the logarithm of the total state space.

[...] two graphs in which T is approximately 30 using W-Triangulate and T is approximately 34.5 using the enhanced greedy algorithm, which implies that instead of 1GB of memory which we need for storing the conditional probabilities, the greedy algorithm would have used over 20GB. In general, as the state spaces increase our algorithm becomes far better than the enhanced minimum weight heuristics.
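For readers who want to experiment with the baseline, the minimum weight heuristics of Section 5 can be sketched in a few lines of Python. This is a plain version without the removal of redundant fill-in edges used in the enhanced variant, and it is not the code used in the experiments; the function name, the adjacency-dictionary input and the `states` mapping (vertex to state space size) are assumptions of this sketch. It returns the weight of the heaviest clique and the total state space of the resulting maximal cliques.

```python
import math

def min_weight_elimination(adj, states):
    """Greedy minimum weight heuristic (sketch).
    adj: vertex -> set of neighbours, states: vertex -> state space size.
    Returns (weight of heaviest clique, total state space)."""
    adj = {v: set(nb) for v, nb in adj.items()}       # work on a copy
    weight = {v: math.log2(states[v]) for v in adj}   # w(v) = log2 |D(v)|
    cliques = []
    while adj:
        # Pick the vertex whose closed neighbourhood has minimum weight.
        v = min(adj, key=lambda u: weight[u] + sum(weight[x] for x in adj[u]))
        nbrs = adj.pop(v)
        cliques.append(frozenset(nbrs | {v}))
        for a in nbrs:                                # make N(v) a clique
            adj[a].discard(v)
            adj[a] |= nbrs - {a}
    # Keep only cliques that are not strictly contained in another one.
    maximal = [c for c in cliques if not any(c < d for d in cliques)]
    heaviest = max(sum(weight[x] for x in c) for c in maximal)
    total = sum(math.prod(states[x] for x in c) for c in maximal)
    return heaviest, total

# Hypothetical input: a 4-cycle with binary variables.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
binary = {v: 2 for v in cycle}
result = min_weight_elimination(cycle, binary)
```

Because T is a base-2 logarithm, the gap between T of about 34.5 and T of about 30 reported above corresponds to a factor of $2^{4.5} \approx 22.6$ in table size, which matches the "over 20GB" versus 1GB comparison.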

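The memory comparison above (1GB versus over 20GB) can be checked with a short calculation. It rests on two assumptions that the text does not state explicitly: that $T$ denotes the base-2 logarithm of the total state space, and that each table entry occupies one unit of storage, so that $T \approx 30$ corresponds to roughly 1GB.

```latex
\[
\frac{2^{34.5}}{2^{30}} \;=\; 2^{4.5} \;=\; 16\sqrt{2} \;\approx\; 22.6
\]
```

Under these assumptions the enhanced greedy triangulation would need roughly 22.6 times the storage of the W-Triangulate result, which is consistent with the stated figure of over 20GB.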
Recall that the algorithm W-Triangulate($G$, $\emptyset$, $k$) is run with increasing values of $k$ until a triangulation is found. We have recorded the number of vertices $l$ in the largest clique (in size) of the junction tree found by W-Triangulate($G$, $\emptyset$, $k$) when it succeeded and compared it to the value of $k$. Let $\Delta = l - k$. The maximal clique size found is of size $l$ while the cliquewidth is larger than $k - 1$ (because the algorithm failed with $k - 1$ as an input). Then, $\Delta$ was 0 in 6 graphs (provably optimum size), 1 in 14 graphs (at most one vertex off optimum), 2 in 29 graphs, 3 in 48 graphs and 4 in 3 graphs. The worst case upper bound on the ratio between the size of the largest clique and the unknown cliquewidth was $l/k = 10/6$ rather than 3.66 which is guaranteed in theory. Indeed, one cannot hope to improve the junction tree too much on this graph.

We also collected some statistics on the running time complexity. We counted the number of partitions made each time a W-decomposition is constructed. The count for Medianus I was always far less than $4^{k}$ rather than $4^{3k}$ which is the worst case bound. The recursion depth was at most 3. The algorithm runs in less than a minute for most graph instances but occasionally it takes up to two minutes. On these examples Robertson and Seymour's algorithm runs faster and obtains identical results.

7 Discussion

We presented an algorithm that finds a junction tree in which the largest clique is no more than 3.66 times the cliquewidth. If the cliquewidth of $G$ is of size $k = O(\log n)$, then our approximation algorithm is polynomial since its complexity is $O(2^{4.66k} n \cdot poly(n))$ where $poly(n)$ is the complexity of linear programming. It is well known that inference in an optimal junction tree with binary variables takes $O(2^{k} n)$ which is polynomial for a logarithmic cliquewidth. Thus, inference done using the junction tree produced by our algorithm, as well as by Robertson and Seymour's algorithm, is guaranteed to be polynomial as well because if we err at most by a constant factor, the time of inference is at most the optimal time raised to some power and so inference stays polynomial. Note that the heuristic constructions of junction trees which do not guarantee a constant error bound are not polynomial.

The claim that finding the cliquewidth of a graph is polynomial if $k = O(\log n)$ means that for every sequence of graphs $G_{n,k_n}$, $n = 1, \ldots$, with $n$ vertices and a cliquewidth $k_n$, our algorithm finds the cliquewidth in polynomial time if $k_n < c \log n$ for $n \geq n_0$ where $n_0$ and $c$ are constants.

The natural question to raise is whether a polynomial inference algorithm exists if the cliquewidth grows a bit faster than logarithmic, say $k_n = O(\log^{1+\epsilon} n)$ for $\epsilon > 0$. We now show that if a polynomial inference algorithm exists for all networks having such a cliquewidth growth, then every inference problem can be solved in subexponential time which implies that every NP-complete problem can be solved in subexponential time due to Cooper's reduction from 3-SAT [Co90]. Let $G_{l,k_l}$ be a sequence of graphs for which the cliquewidth grows at a slightly faster rate than logarithmic. Suppose an inference problem is given on each network in this sequence. Examine the network in the sequence with $l$ vertices. Add isolated vertices to the given network. The cliquewidth remains unchanged and is at most $l$. When enough vertices are added (i.e., $l = O(\log^{1+\epsilon} n)$), we use the assumed polynomial inference algorithm to solve the inference problem of the augmented graph which also solves the original inference problem. The complexity of this assumed algorithm is polynomial in $n$ (the number of vertices including the added isolated ones), which is subexponential in $l$. Consequently, this algorithm solves an arbitrary inference problem in less than exponential time (in $l$).

One must emphasize that this negative result means that probably there are some hard graphs for inference among those having a super logarithmic cliquewidth. We believe that actually all such graphs are hard for inference if the proper conditional tables are used (e.g. polytrees can have an arbitrary large cliquewidth and they are still solvable for specific conditional tables, i.e., the noisy-or model [Pe88]).

Our results could possibly be improved in the following direction. One extension of our work is to construct an algorithm that finds an optimal junction tree wrt the weighted cliquewidth with a complexity of optimal inference, i.e., $O(2^{k} n)$, or errs by a factor smaller than 3.66. Our algorithm can yield at best a factor of 3 if an efficient exact algorithm is found for the 3-way vertex cut problem for graphs with bounded cliquewidth (the existence of such an algorithm is hinted parenthetically in [DJPSY92] but we have not yet pursued this direction).

As a final comment, let us shed light on the common utterance used by the AI community, that "inference is easy in sparse graphs". Recall that if the cliquewidth is of size $k$, then the graph has no more than $kn$ edges (see e.g., Section 4). Hence, sparse graphs in the context of easy inference should mean that the cliquewidth is of size $O(\log n)$, which allows a polynomial inference algorithm, and implies that there are no more than $O(n \log n)$ edges in the graph.

Acknowledgment

We thank Seffi Naor for pointing us to the term treewidth, for helping us prove that a polynomial inference algorithm is not likely to exist if the cliquewidth is larger than logarithmic, and for pointing us to [GVY94]. We thank Leonid Zusin for showing us examples of poor junction trees produced by various greedy algorithms. We thank the reviewers for their helpful comments and for the references they provided us, in particular, Reed's paper.

References

[Ar85] Arnborg S., Efficient algorithms for combinatorial problems on graphs with bounded decomposability, BIT 25, pp. 2-23, 1985.

[ACP87] Arnborg S., Corneil D.G., and Proskurowski A., Complexity of finding embeddings in a k-tree, SIAM J. Alg. Disc. Meth. 8, pp. 277-284, 1987.

[ALS91] Arnborg S., Lagergren J., and Seese D., Easy problems for tree-decomposable graphs, J. of Alg. 12, pp. 308-340, 1991.

[AP86] Arnborg S. and Proskurowski A., Characterization and recognition of partial 3-trees, SIAM J. Alg. Disc. Meth. 7, pp. 305-314, 1986.

[BG94] Becker A. and Geiger D., The loop cutset problem, Proceedings of the tenth conference on Uncertainty in artificial intelligence, Seattle, Washington, pp. 60-68, 1994.

[BGHK91] Bodlaender H.L., Gilbert J.R., Hafsteinsson H., and Kloks T., Approximating treewidth, pathwidth, and minimum elimination tree height, Graph-theoretic concepts in computer science, In proceedings: 17th International workshop, WG '91, Fischbachau, Germany, Lecture notes in Computer Science, Springer-Verlag, #570, 1991.

[Bo93] Bodlaender H.L., A linear time algorithm for finding tree-decompositions of small treewidth, Proceedings of the 25th ACM STOC, pp. 226-234, 1993.

[Cu91] Cunningham W.H., The optimal multiterminal cut problem, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 5, pp. 105-120, 1991.

[CLR90] Cormen T.H., Leiserson C.E., and Rivest R.L., Introduction to algorithms, The MIT press, London, England, 1990.

[Co90] Cooper G., The computational complexity of probabilistic inference using Bayesian belief networks, Artificial Intelligence, 42, pp. 393-405, 1990.

[DJPSY92] Dahlhaus E., Johnson D.S., Papadimitriou C.H., Seymour P.D., and Yannakakis M., The complexity of multiway cuts, In Proceedings: 24th Annual ACM STOC, pp. 241-251, 1992.

[DP89] Dechter R. and Pearl J., Tree clustering for constraint networks, Artificial Intelligence, 38 (1989), 353-366.

[GVY94] Garg N., Vazirani V.V., and Yannakakis M., Multiway cuts in directed and node weighted graphs, In Proceedings: Automata, Languages and Programming, 21st International Colloquium, ICALP94, Jerusalem, Israel, 1994, Lecture Notes in Computer Science #820, pp. 487-498.

[JLO90] Jensen F.V., Lauritzen S.L., and Olesen K.G., Bayesian updating in causal probabilistic networks by local computations, Computational Statistics Quarterly 4 (1990), 269-282.

[Kj90] Kjaerulff U., Triangulation of graphs - algorithms giving small total state space, Technical Report R 90-09, Department of Mathematics and Computer Science, Aalborg University, Denmark, March 1990.

[Kl94] Kloks T., Treewidth, Springer-Verlag, Lecture notes in Computer Science #842, 1994.

[La96] Lagergren J., Efficient parallel algorithms for graphs of bounded treewidth, Journal of Algorithms, 20:20-44, 1996.

[LS88] Lauritzen S.L. and Spiegelhalter D.J., Local Computations with Probabilities on Graphical Structures and Their Application to Expert Systems (with discussion), Journal Royal Statistical Society, B, 1988, 50(2):157-224.

[Pe88] Pearl J., Probabilistic reasoning in intelligent systems: Networks of plausible inference, Morgan Kaufmann, San Mateo, California, 1988.

[Pe86] Pearl J., Fusion, propagation and structuring in belief networks, Artificial Intelligence, 29:3 (1986), 241-288.

[Re92] Reed B., Finding approximate separators and computing treewidth quickly, Proceedings of the 24th Annual Symposium on Theory of Computing, pp. 221-228, NY, 1992, ACM Press.

[RS95] Robertson N. and Seymour P.D., Graph minors XIII. The disjoint paths problem, J. Comb. Theory, Series B, 63:65-110, 1995.

[Ro74] Rose D., Triangulated graphs and the elimination process, J. Math. anal. appl., 32 (1974), 597-609.