A Sufficiently Fast Algorithm for Finding Close to Optimal Junction Trees

A fast algorithm for finding junction trees 81 A sufficiently fast algorithm for finding close to optimal junction trees Ann Becker and Dan Geiger Computer Science Department Technion Haifa 32000, ISRAEL anyuta(Qlcs. technion.ac.il, dangC9lcs. technion.ac.il Abstract presented with an appropriate example. One algo rithm, due to Rose (1974), is as fo llows: repeatedly, select a vertex v with minimum number of neighbors An algorithm is developed for finding a close N(v), delete v fr om the graph, and make a clique out to optimal junction tree of a given graph G. of N ( v) . The resulting sequence of cliques creates a The algorithm has a worst case complexity junction tree. This greedy algorithm minimizes the 0 ( ckn a) where a and c are constants, n is size of each clique as it is being created. However, it the number of vertices , and k is the size of the could easily make a mistake at the first step that would largest clique in a junction tree of Gin which lead it to a junction tree far offthe optimal size. An this size is minimized. The algorithm guaran other algorithm, investigated by Kjaerulff (1990), is tees that the logarithm of the size of the state simulating annealing which takes a long time to run space of the heaviest clique in the junction and has no guarantees on the quality of the output. tree produced is less than a constant factor off tl;e optimal value. When k = O(logn), Finding an optimal junction tree is NP-complete but our algorithm yields a polynomial inference for a graph with n vertices and a cliquewidth k there algorithm fo r Bayesian networks. exits an O(nk+ 1) algorithm that finds an optimal junc tion tree [ACP87]. This algorithm is not practical for the size of Bayesian networks dealt in practice. 1 Introduction Other algorithms for finding an optimal junction tree have a complexity of O(f(k)n) where f(k) is a super exponential function of k [Bo93]. These algorithms All exact inference algorithms for the computation are practical for cliques of size k 5 at most. A more of a posterior probability in general Bayesian net = practical algorithm for constructing an optimal j unc works have two conceptual phases. One phase handles e 4 operations on the graphical structure itself and the tion tree when th largest clique size is is given in [AP86]. For larger values of k there is no algorithm to other performs probabilistic computations; The junc date that can find the optimum junction tree quickly. tion tree algorithm [LS88] requires us to first find a The exponential dependency in k cannot be improved "good" junction tree and then perform probabilistic computations on the junction tree and the method of unless P = N P because finding an optimal clique tree conditioning [Pe86] requires to find a "good" loop cut fork= O(n) is NP-complete [ACP87]. set and then perform a calculation using the loop cut Kloks in his book treewidth [Kl94], which is devoted to set. In [BG94], we offered an algorithm that finds a finding junction trees in various graphs, gives a poly loop cutset for which the logarithm of the state space nomial algorithm that finds a junction tree of a given is guaranteed to be a constant factor off the optimal graph G such that its maximal clique size is at most value. In this paper, we provide a similar optimization 12t.logn off optimal where t. is a large unspecified for the junction tree algorithm. constant (See also, [BGHK91]). Kloks states that find We shall first restrict our discussion to networks for ing a polynomial algorithm that constructs a junction which all vertices have the same state space size and tree such that its maximal clique is a constant fa ctor to the optimality criterion which we call cliquewidth. off optimal is a major open problem. The importance The cliquewidth of an undirected graph G is the size of of this problem stems from the fa ct that many NP the largest clique in a junction tree of G in which the complete problems on graphs can be solved polynomi size of the largest clique is minimized. A more common ally if the input graph has a junction tree with fixed sized cliques and if such a junction tree can be fo und term is treewidth which is the cliquewidth minus L efficiently [Ar85, ALS91]. Some of these problems To date all methods in the AI and Statistics commu are: INDEPENDENT SET, DOMINATING SET, GRAPH nities for finding a junction tree had no guarantee of K-COLORABILITY, HAMILTONIAN CIRCUIT and CON- performance and could perfo rm rather poorly when 82 Becker and Geiger STRAINT SATISFACTION PROBLEMS [DP89]. the junction tree algorithm. For details, consult [LS88, .JL090]. Robertson and Seymour [RS95], among other key re sults, were the first to present an algorithm that finds Definition A directed acyclie graph (DAG) is a graph a junction tree of a given graph G such that its max with no directed cycles. In a DAG, pa( v) denotes the imal clique size is at most a constant factor off opti set of parents of a vertex v. A Bayesian network is a mal (They actually used a slightly different concept DAG such that with each vertex v we associate a finite termed branchwidth). Reed [Re92] presents Robertson set D(v) called the state space of v and a probability and Seymour's algorithm in a more accessible form and distribution P( vlpa(v)). The joint distribution of V is shows that its output is always less than 4 times the given by P(V) = P(vlpa(v)). cliquewidth and the complexity is O(k233kn2). Reed ITvEV also gives an algorithm that errs by a factor of 5 and The updating problem is to compute the posterior prob has a complexity O(k234kn log n . Lagergren [La96] ) ability of a certain vertex given specific values to a set presents efficient parallel algorithms for this problem. of other vertices. We offer an algorithm that finds a junction tree such The junction tree algorithm solves the updating prob that its largest clique is at most (2a: + 1) times the lem as follows. For every vertex v, it connects every cliquewidth where a is the approximation ratio for pair of v 's parents and removes the direction of all any approximation algorithm for the 3-way vertex cut edges in the graph. The resulting graph is undirected problem. When using a -approximation algorithm for � (called the moral graph). Then, the moral graph is tri the 3-way vertex cut problem (a = �) due to [GVY94], angulated; edges are added until every cycle of length our algorithm's complexity is 0(24·66kn · poly (n)) and more than three has a chord. These are called fill-in it errs by a factor of 3.66 where poly(n) is the running edges. Once the graph is triangulated (or chordal), a time of linear programming. tree of cliques, called the junction tree, is constructed. The junction tree algorithm then loads all probabilities When k = O(log n ) , our algorithm, like previous ones, into the junction tree and performs the calculations on is polynomial. Consequently, it yields a polynomial the new structure. inference algorithm for the class of Bayesian networks have a cliquewidth. Of course, one that logarithmic Definition Let G = (V, E) be a chordal graph. A does not know a priori what is the cliquewidth of a junction tree of G is a tree 1{ such that each maximal given network and so a user must terminate the algo clique C of G is a node in H, and for every vertex v rithm if the running time is too long, in which case, of G, if we remove from 1{ all nodes not containing v, however, the running time of exact inference must be the remaining (hyper) graph stays connected. quite large as well. We show that for the class of Bayesian networ s having a slightly larger than loga k The single most important step of this algorithm is rithmic cliquewidth, there exists no polynomial infer triangulation. There are many ways to add edges to ence algorithm unless all NP-complete problems are a given graph until it becomes chordal. In particular, solvable in less than exponential time. one can simply make a single large clique. However, In Section 3, we describe the algorithm and prove its the time for loading the probabilities and performing performance guarantee. This algorithm is made as the calculations is proportional to the total state space simple as possible to facilitate the proof. In Section 4, given by LCEH ITvEC ID(v)l, which is dominated by we describe several heuristics that improve the algo the size of the maximal clique if all vertices have the rithm's average case performance. In Section 5, we de same state space size. For example, if a maximal clique scribe the changes needed so that the algorithm takes contains m vertices and if their state space is of size into account vertices with different state space sizes. two, then the probability table for this clique is of size The modified algorithm guarantees that the logarithm 2m. The objective of triangulation is to find a trian of the size of the state space of the heaviest clique in gulation such that the maximal clique size is as small the junction tree found is less than a constant factor as possible. Sections 3 and 4 are doing just that.

A Sufficiently Fast Algorithm for Finding Close to Optimal Junction Trees

Bayesian Methods for Unsupervised Learning

Neural Trees for Learning on Graphs Rajat Talak, Siyi Hu, Lisa Peng, and Luca Carlone

Duality of Graphical Models and Tensor Networks Arxiv:1710.01437V1

Machine Learning

Grammar Factorization by Tree Decomposition

Performance Evaluation of Algorithms for Soft Evidential Update in Bayesian Networks: First Results

Preventing Unnecessary Groundings in the Lifted Dynamic Junction Tree Algorithm

Region Based Approximation for High Dimensional Bayesian Network Models

Generative and Discriminative Approaches to Graphical Models CMSC 35900 Topics in AI Lecture 2

The Inclusion-Exclusion Rule and Its Application to the Junction Tree

6 : Factor Graphs, Message Passing and Junction Trees

Duality of Graphical Models and Tensor Networks Arxiv:1710.01437V1