Constraint-Based Inference Using in Junction Graphs

Le Chang and Alan K. Mackworth 1

Abstract. In this paper we take a step beyond the generalized arc consis- The concept of local consistency plays a central role in constraint tency approach and propose a new family of generalized local con- satisfaction and has been extended to handle general constraint-based sistency concepts for the junction graph representation of CBI prob- inference (CBI) problems. We propose a family of novel generalized lems. These concepts are based on a general condition that depends local consistency concepts for the junction graph representation of only on the existence and properties of the multiplicative absorbing CBI problems. These concepts are based on a general condition that element and does not depend on other semiring properties of CBI depends only on the existence and property of the multiplicative ab- problems. We present several local consistency enforcing algorithms sorbing element and does not depend on the other semiring proper- with various levels of enforcement and corresponding theoretic and ties of CBI problems. We present several local consistency enforcing empirical complexity analyses. We show in this paper that some of algorithms and their approximation variants. Theoretical complex- these algorithms can be seen as generalized versions of well-known ity analyses and empirical experimental results for the application local consistency enforcing techniques in CSPs and can be exported of these algorithms to both MaxCSP and probability inference are to other domains. Other abstract local consistency concepts are novel given. We also discuss the relationship between these local consis- to the community and provide efficient pre- tency concepts and message passing schemes such as junction tree processing results with a user-specified approximation threshold. We algorithms and loopy message propagation. also discuss the relationship between these local consistency con- cepts and message passing schemes such as junction tree algorithms and loopy message propagation. Local consistencies can be achieved 1 Introduction along with message propagation and can improve the efficiency of The concept of local consistency plays a central role in constraint message passing schemes. satisfaction. Given a problem (CSP), local In this paper, we use bold letters to denote sets of elements and consistency can be characterized as deriving new, possibly tighter, regular letters to denote individual elements. Given a set of elements constraints based on local information. The derived constraints sim- X and an element Z ∈ X, X−Z denotes the set of elements X − plify the representation of the original CSP without the loss of solu- {Z}. tions. This can be seen as a preprocessing procedure. For example, a value may be removed from a variable domain by the preprocess- ing because it violates these derived constraints. Both systematic ap- 2 The Constraint-Based Inference Framework proaches, such as inference or propagation algorithms, and stochas- Constraint-Based Inference (CBI) is an umbrella term for a class of tic approaches, such as local searches, benefit from these simplifi- various superficially different problems including probabilistic in- cations or domain size reduction. Among the family of local con- ference, decision-making under uncertainty, constraint satisfaction sistency enforcing algorithms or filtering algorithms, arc consistency problems, propositional satisfiability problems, decoding problems, [16] is one of the most important techniques for binary classic CSPs. and possibility inference. We abstract these problems into a sin- It is straightforward to extend it as generalized arc consistency [17] gle formal framework [5] using the algebraic semiring structure to handle non-binary classic CSPs. Many stronger local consisten- S = hA, ⊕, ⊗i where constraint combination (constraint aggrega- cies [18, 25, 14] have been studied within the constraint program- tion) is represented by the abstract multiplicative operator ⊗ and ming community. Based on the Semiring CSP [4] and Valued CSP constraint marginalization (domain reduction) is represented by the [22] frameworks, arc consistency has also been extended, as soft arc abstract additive operator ⊕. This framework, based on the synthesis consistency [7, 3], to handle over-constrained and preference-based of the existing abstract representation and algorithmic frameworks problems that can be modelled as soft CSPs. Recently, we presented from various fields [4, 22, 11, 10, 1, 21], provides a broader cover- a weaker condition [6] based on a semiring based framework for age of both the problem space and the inference algorithm space. constraint-based inference (CBI) problems [5]. More specifically, we A CBI problem P in this framework is a tuple (X, D, S, F), reduce a CBI problem to its underlying classic CSP according to the where X is a set of variables, D is a corresponding set of fi- weaker condition and then apply the generalized arc consistency ap- nite domains for the variables, S = hA, ⊕, ⊗i is a commutative proach to general CBI problems beyond classic and soft CSPs. The semiring, and F is a set of constraints. Each constraint is a func- weaker condition proposed in [6] has also been relaxed to fit gener- tion that maps a subset of variables to values in A. More specif- alized approximate preprocessing schemes. ically, we use Scope(f) to denote the subset of variables that is 1 Department of Computer Science, University of British Columbia, Canada, in the scope of the constraint f and DX to represent the domain email: {lechang,mack}@cs.ubc.ca of variable X. Two constraint operations then are defined: (1) A constraint g = f1 ⊗ f2 is the combination of two constraints Input: A CBI problem P = (X, D, S, F) and its initialized junc- f1 and f2 if Scope(g) = Scope(f1) ∪ Scope(f2) and g(w) = tion graph representation J = (C, S) ′ f1(w↓Scope(f1)) ⊗ f2(w↓Scope(f2)) for every value assignment w Output: A Single Cluster Consistent CBI problem P = Ä ′ ′ of variables in Scope(g); and (2) A constraint g = X f is the (X, D , S, F )

constraint that marginalizes out X ∈ Scope(f) from a constraint f 1: for each Ci ∈C do Ä if Scope(g) = Scope(f) − {X} and g(w) = f(xi, w) 2: for each X Domain φ do xi∈DX ∈ ( Ci ) for every value assignment w of variables in Scope(g). Given a CBI 3: if REVISE(X, φCi ) then

problem and the two constraint operations specified above, the infer- 4: for each Cj ∈C do

Ä Æ ence task is defined as computing gCBI (Z) = Y f∈F f, where 5: if X ∈ Scope(φCj ) then

Z is a subset of variables of interest and Y = X − Z. If ⊕ of the 6: Remove all tuples in φCj with the value that is re-

semiring S is idempotent, the allocation task is defined as computing moved from X

Ä Æ y = arg Y f∈F f, where arg is a prefix of operator ⊕. In other 7: end if words, arg ⊕ is an operator that returns the arguments of the ⊕ op- 8: end for erator. For example, the classic CSP can be seen as a CBI problem 9: end if using the commutative semiring SCSP = h{FALSE, TRUE}, ∨, ∧i 10: end for

to embed it into the framework. The inference task of classic CSP is 11: end for Î Ï ′ to compute the truth value of gCBI () = X f∈F f and the allo- 12: Return P := P cation task is find a complete assignment of variables that satisfies g . CBI Figure 1. Single cluster consistency enforcing algorithm (SCC-Enforcing). Our local consistency concepts are based on this CBI framework and apply to CBI problems with commutative semirings that are eliminative. A commutative semiring S = hA, ⊕, ⊗i is elimina- tive [6] if there exists a multiplicative absorbing element α⊗ ∈ A the multiplicative absorbing element α⊗ that is equal to the addi- (α⊗ ⊗ a = α⊗, ∀a ∈ A) and α⊗ is equal to the additive identity tive identity element 0 for an eliminative commutative semiring, and element 0 (0 ⊕ a = a, ∀a ∈ A). Furthermore, our approximate lo- using ≤S to replace 6= in the following definitions. cal consistency concepts apply to CBI problems with commutative semirings that are eliminative and monotonic. A commutative semir- 3.1 Single, Directional and Neighborhood Cluster ing S = hA, ⊕, ⊗i is monotonic [6] if there exists a total order ≤S on A, the additive identity element 0 is the minimum element w.r.t. Consistencies ≤S, a≤Sb implies a⊕c≤Sb⊕c and a⊗c≤Sb⊗c, ∀a, b, c ∈ A. For The fundamental concept of local consistency for an initialized junc- example, Weighted CSPs can be embedded into the CBI framework tion graph of a CBI problem with an eliminative commutative semir- N using the semiring SWCSP = h , min, +i. Because the multiplica- ing is single cluster consistency. Here we consider only the local tive absorbing element α⊗ of the semiring SWCSP is equal to the constraints attached to a single cluster and do not consider the effects additive identity element 0 that is equal to +∞, SWCSP is elimina- of other clusters. Formally: tive. Also we can show that SWCSP is monotonic. More details on eliminative and monotonic semirings can be found in [6]. Definition 1 (Single Cluster Consistency (SCC)) A cluster Ci of , P A junction graph J = (C S) of a CBI problem = an initialized junction graph is locally consistent if ∀X ∈ (X, D, S, F) is defined as follows: C = {C1, · · · ,Cn} is a set of Scope(φCi ), ∀x ∈ DX , ∃w, a value assignment of variables clusters, each cluster Ci is an aggregation of variables that is a sub- Scope(φC ) , s.t. φC (x, w) 6= α⊗. An initialized junction graph set of X and has attached initially a local constraint φ with null i −X i Ci of a CBI problem is Single Cluster Consistent if all the clusters are scope and 1 as the default constraint value or cost (1 is the iden- consistent. tity element for ⊗); S = {Sij |Ci,Cj ∈ C} is a set of separators between C and C if C C and S is an aggregation of i j i ∩ j 6= ∅ ij Single cluster consistency covers the definition of Generalization variables that consists of C C . A junction graph satisfies the con- i ∩ j of Generalized Arc Consistency (GGAC) [6], which abstracts Gen- dition that for any constraint f F, there exists a cluster C s.t. ∈ i ∈C eralized Arc Consistency (or Hyper-Arc Consistency) in constraint Scope f C . The definition of junction graph ensures that the ( ) ⊆ i programming. If the junction graph of a CBI problem is primal, in subgraph induced by any variable is connected. We say a junction other words, there is one cluster that corresponds to exactly one con- graph is initialized if for each constraint f ∈ F, we choose a cluster 2 straint (bijection), SCC is identical to GGAC. If the junction graph Ci s.t. Scope(f) ⊆ Ci and update φC by φC ⊗ f . i i is constructed without satisfying this special structural requirement, SCC is stronger than GGAC in general. 3 Local Consistency for CBI Problems Figure 1 shows a generalized routine for enforcing single clus- ter consistency for an initialized junction graph of a CBI problem We present here novel local consistency concepts for initialized junc- P = (X, D, S, F) with an eliminative commutative semiring S. The tion graphs of a CBI problem with an eliminative semiring. If the procedure REVISE of SCC-Enforcing is shown in Figure 2. semiring used to represent a CBI problem is both eliminative and We also introduce two other local consistencies for an initialized monotonic, it is straightforward to modify these concepts to approx- junction graph of a CBI problem that are stronger than Single Cluster imate local consistencies using an element ǫ ∈ A to approximate Consistency. They are Directional Cluster Consistency and Neigh- borhood Cluster Consistency. Effects of other clusters in the junction 2 Alternatively, we can keep a set of constraints inside each cluster and not initialize the potential with a combination of the constraints. Combination graph are taken into account. The distinction between these two local of constraints can be computed along with the local consistency enforce- consistencies is based on which clusters are selected for considera- ment using more space-efficient methods. tion. Input: A variable X ∈ X and a constraint f with X ∈ Scope(f) Input: A CBI problem P = (X, D, S, F), its initialized junction Output: TRUE if a value is removed from the domain of X, FALSE graph representation J = (C, S), and a total ordering OC of for else clusters in C 1: flag := TRUE Output: A Directional Cluster Consistent CBI problem P′ = ′ ′ 2: for each x ∈ DX do (X, D , S, F )

3: for each value assignment w of Scope(f)−X do 1: for i = |C| − 1 to 1 do 4: if f(x, w) 6= α⊗ then 2: Let Ci = OC[i]

5: flag := FALSE 3: φold := φCi 6: Break loop 4: for j = |C| to i + 1 do

7: end if 5: Let Cj = OC[j] Ä 8: end for 6: φC := φC ⊗ ( φC ) i i Cj −Sij j 9: if flag then 7: end for 10: Remove x from D X 8: if REVISE(X, φCi ) then 11: Return TRUE 9: for each Ck ∈C do 12: end if 10: if X ∈ Scope(φCk ) then 13: end for 11: Remove all tuples in φCj with the value that is re- 14: Return FALSE moved from X 12: end if Figure 2. Procedure REVISE(X, f) for eliminating a domain value from a 13: end for variable X according to the local constraint f. 14: end if

15: φCi := φold 16: end for Definition 2 (Directional Cluster Consistency (DCC)) Given 17: Return P′ := P a total ordering of the clusters and a cluster Ci of an ini- tialized junction graph. Let Sij be a separator that connects Figure 3. Directional cluster consistency (DCC) enforcing algorithm. another cluster Cj to Ci and L(Ci) be a subset of clusters

that consist of lower order neighbor clusters of Ci. Define

Æ Ä gi = φC ⊗ ( φC ). We say Ci is directional i Cj ∈L(Ci) Cj −Sij j consistent if ∀X ∈ Scope(gi), ∀x ∈ DX , ∃w, a value assignment of variables Scope(gi)−X , s.t. gi(x, w) 6= α⊗. An initialized junction graph of a CBI problem is Directional Cluster Consistent Input: A CBI problem P = (X, D, S, F) and its initialized junc- given a total ordering of the clusters if all clusters of the graph are tion graph representation J = (C, S) directional consistent. Output: A Single Cluster Consistent CBI problem P′ = (X, D′, S, F′) Definition 3 (Neighborhood Cluster Consistency (NCC)) Given 1: for each Ci ∈C do a cluster Ci of an initialized junction graph, Let N(Ci) be 2: φold := φCi

a subset of clusters that are neighbor clusters of Ci. Define 3: for each Cj ∈ N(Ci) do

Æ Ä Ä gi = φCi ⊗ C ∈N C ( C −S φCj ). We say Ci is neigh- 4: φC := φC ⊗ ( φC ) j ( i) j ij i i Cj −Sij j borhood consistent if ∀X ∈ Scope(gi), ∀x ∈ DX , ∃w, a value 5: end for assignment of variables Scope(gi)−X , s.t. gi(x, w) 6= α⊗. An 6: if REVISE(X, φCi ) then initialized junction graph of a CBI problem is Neighborhood Cluster 7: for each Ck ∈C do

Consistent if all clusters are neighborhood consistent. 8: if X ∈ Scope(φCk ) then

9: Remove all tuples in φCj with the value that is re- We revise the single cluster consistency enforcing algorithm in moved from X Figure 1 to directional cluster consistency and neighborhood cluster 10: end if consistency enforcing algorithms by updating the local potential φCi 11: end for according to the definition, as shown in Figure 3 and 4, respectively. 12: end if

13: φCi := φold 3.2 Approximate Local Consistencies 14: end for 15: Return P′ := P Given a CBI problem P = (X, D, S, F), if the commutative semir- ing S = hA, ⊕, ⊗i is both eliminative and monotonic, we propose Figure 4. Neighborhood cluster consistency (NCC) enforcing algorithm. an approximation scheme to enforce local consistency for its initial- ized junction graph representation with a user-controlled threshold. More specifically, we use an element ǫ ∈ A to approximate the mul- tiplicative absorbing element α⊗ that is equal to the additive identity element 0 for an eliminative commutative semiring, and use ≤S to replace 6= in the previous local consistency definitions. The mono- [6]. Correspondingly, the procedure REVISE in Figure 2 is modified tonic properties for both multiplicative and additive operators in a to handle approximate local consistency enforcing tasks, as shown in monotonic semiring ensure that this approximation always returns a Figure 5. All the local consistency enforcing algorithms discussed in lower bound estimate of the inference task for a given CBI problem the previous section then can be modified respectively. Input: A variable X ∈ X, a constraint f, an element ǫ ∈ A The junction tree representation is a special case of junction graphs Output: TRUE if a value is removed from the domain of X; FALSE that satisfies the tree property. The junction tree algorithm [23] is a if else widely studied inference algorithm in probability inference that uti- 1: flag := TRUE lizes the properties of the junction tree structure. It is also general- 2: for each x ∈ DX do ized to handle constraint-based inference problems [5], based on the

3: for each value assignment w of Scope(f)−X do seminal work on constraint programming [8, 26] and the latest gen- 4: if ǫ≤Sf(x, w) then eral algorithmic framework [10]. Given the identical message rep- 5: flag := FALSE resentation and updating scheme in the inward phase of the junc- 6: Break loop tion tree algorithm and our directional cluster consistency enforc- 7: end if ing algorithm (with a cluster order given by the width-first traverse 8: end for starting from the root cluster), it is straightforward to show that di- 9: if flag then rectional cluster consistency can be achieved along with the inward 10: Remove x from DX message passing in the junction tree algorithm, if the junction graph 11: Return TRUE of a given CBI problem satisfies the junction tree properties. In this 12: end if case, the potential reset steps (Lines 3 and 15 in Figure 3) are not 13: end for necessary if the constraint combination operator ⊗ is idempotent. 14: Return FALSE If it is not, the complement operator ⊘ of ⊗ should be introduced to cancel the duplicated costs or potentials passing from the chil- dren of the child clusters [5]. We refer readers to [12] for further Figure 5. Procedure ǫ-REVISE(X,f,ǫ) for eliminating a domain value from a variable X according to the approximate threshold ǫ of a local discussion of this technique from the semiring perspective. Analo- constraint f. gous approaches to cancelling the double-counted information in lo- cal consistency methods can also be found in [13, 7]. A CBI problem processed by such a DCC-enforcing procedure then can be solved 4 Complexities and Discussion by a search starting from the root cluster, which is a process equiva- lent to outward message propagation in the junction tree algorithm. The worst case space complexities of all three local consistency en- This observation ensures that we can perform the message passing forcing algorithms are the same: linear in the number of clusters in of junction tree algorithms and at the same time simplify the original the junction graph and exponential in the maximal cluster size. The problem representation according to the DCC enforcement. Perform- worst case time complexities are linear in the size of the junction ing the message propagation and the simplification together reduces graph and exponential in maximal cluster size too. We compare their both the time and space complexities of the junction tree algorithm. upper bounds for time and space of local consistency enforcing al- The nature of the message passing scheme in the junction tree algo- gorithms for initialized junction graphs in Table 1. All of them use rithm ensures that the directional cluster consistency enforcing can the same space, though achieving Single Cluster Consistency uses be performed in parallel for clusters in different branches of the tree. the least time, followed by Directional Cluster Consistency, and then We plan to investigate different parallel and hybrid DCC-enforcing Neighborhood Cluster Consistency. techniques following the results of [26] in future work. Loopy message propagation [19] is another widely studied ap- Table 1. Time and space upper bound comparison among various local proximate inference approach based on the junction graph represen- consistency enforcing algorithms for a junction graph J =(C, S) of a given tation in probability inferences. It is also generalized to apply to other CBI problem, where d = max D |D | and k = max |C |. Di∈ i Ci∈C i CBI problems [5] using the semiring concepts. Neighborhood cluster SCC DCC NCC consistency can be achieved along with each message updating step Time |C|dk+1 (|S| + |C|)dk+1 (2|S| + |C|)dk+1 in the generalized loopy message propagation. This can be achieved Space |C|dk+1 |C|dk+1 |C|dk+1 without additional computational cost except for invalid value detec- tion at each cluster. Similar to the junction tree algorithm, the poten- As shown in Table 1, the upper bounds of both time and space tial reset steps (Lines 2 and 13 in Figure 4) are not necessary if the for achieving local consistencies using cluster consistency enforc- constraint combination operator ⊗ is idempotent. If ⊗ is not idem- ing algorithms proposed in this paper are bounded by the maximum potent, we can cancel the duplicated costs or potentials coming from cluster size as well as the structure of the junction graph for a given the same cluster via different paths by introducing the complement CBI problem. Intuitively a simple junction graph implies large clus- operator ⊘ of ⊗. The time and space complexities of loopy message ter sizes, so there is a tradeoff between the size of the graph and the propagation are reduced after invalid values are removed from the largest cluster when constructing a junction graph. Various heuristic CBI problem following NCC enforcement. The message updating search approaches that can be used to construct junction graphs are step as well as NCC enforcement of the generalized loopy message discussed in [5]. propagation can be performed in all clusters in parallel, saving sig- The space complexity of SCC can be improved using the GAC ap- nificant computational cost if parallel computing is feasible. proaches for global constraints such as in [2] and [24] for constraint networks. The basic idea is not to initialize with a combination of 5 Experimental Results the constraints that are allocated to a single cluster into a large con- straint, but to keep a set of constraints inside each cluster. When we We discuss in this section experimental results of applying the local need to compute a combination of constraints during SCC enforcing, consistency enforcing algorithms proposed in this paper to the junc- we only need to compute new tuples that are not marked as deleted. tion graph representation of Weighted CSP and Probability Assess- The same approach can also be applied to DCC and NCC enforcing, ment that can be modelled as CBI problems. These preprocessing or though the upper bounds for space will not change. filtering algorithms simplify the original problem so that inference algorithms can then be applied with less computational complexity. 1 0.4 A workstation with a Pentium 4 3.0GHz CPU and 1 GB memory run- SCC SCC DCC DCC 0.9 NCC NCC ning SuSE Linux 9.1 was used to run the experiments in this section. 0.35

0.8 0.3

0.7 5.1 Weighted CSP 0.25 0.6 Weighted CSP is a direct extension of MaxCSP where each value 0.2 0.5 Average Error assignment in a constraint corresponds to a non-negative integer or 0.15 weight instead of 0 for legal and 1 for forbidden in MaxCSP. Two Number of Operations (%) 0.4 0.1 constraint tuple weights are combined with arithmetic plus and the 0.3

0.05 goal of the inference is to find a value assignment of all variables 0.2 that minimizes the combination of all constraints in the problem. 0.1 0 0 0.005 0.01 0.015 0.02 0 0.005 0.01 0.015 0.02 Weighted CSPs can be easily embedded into the semiring-based CBI ε ε N framework using the semiring SWCSP = h , min, +i. Because (a) (b) the multiplicative absorbing element α⊗ of the semiring SWCSP is equal to the additive identity element 0 that is equal to +∞, SWCSP is eliminative. Also we can show that SWCSP is monotonic, so both Figure 6. (a) Number of operations: the number of binary operations the exact and approximate local consistency enforcing schemes in required for probability assessment after using the local cluster consistency this paper apply to Weighted CSPs. enforcing algorithms (shown as a fraction of the number required without preprocessing) as a function of ǫ and (b) Average error: the resultant average We study a random binary Weighted CSP with 100 variables and error of the marginal probability for the Insurance network as a function of ǫ 200 constraints. The domain of each variable consists of 5 values. We choose randomly a weight from 0 to 10 for each value assign- ment of every constraint. We construct junction graphs through re- combination of the CPDs using the arithmetic product. The prob- stricting the maximum cluster size from 2 to 4. Then we apply SCC, ability assessment problem computes the posterior marginal prob- DCC and NCC enforcing algorithms to preprocess this Weighted ability of a subset of variables, given values for some variables as CSP with various ǫ that approximate the multiplicative absorbing el- known evidence. The probability assessment problem can be repre- ement α⊗ = ∞. The efficiency of the preprocessing algorithms is sented as a CBI problem using the commutative semiring SPROB = + characterized as the average variable domain size. We show the ex- hR ∪ {0}, +, ×i. It is easy to show that α⊗ = 0 = 0 in SPROB perimental results in Table 2. For the purpose of comparison, we nor- and SPROB is monotonic that both the exact and approximate lo- malize the local constraint at each cluster before performing invalid cal consistency enforcing schemes in this paper apply to probability value detection. Given these experimental results, we conclude: (1) assessment problems. For all of these approximate local cluster consistency enforcing algo- The Bayesian network used here is the Insurance network from rithms, the closer ǫ is to the exact multiplicative absorbing element the Bayesian Network Repository [9]. The network has 27 variables α⊗ = ∞, the fewer domain values are eliminated during the prepro- and 27 non-binary constraints (CPDs). In our experiments, we ran- cessing. (2) The preprocessing time for each local cluster consistency domly choose one variable as observed. The approximate local clus- enforcing algorithm is affected by the structure of the junction graph, ter consistency enforcing algorithms are used to preprocess the prob- but it does not change monotonically with the maximal cluster size. lem based on a junction graph representation with the maximal clus- (3) In sequential computing schemes, SCC uses the least preprocess- ter size of 5. The junction tree algorithm in Lauritzen-Spiegelhalter ing time, followed by DCC and then NCC. The time used by DCC architecture [15] is used to infer the marginal probability of every or NCC can be reduced if parallel computing is introduced. (4) DCC unobserved variable. We compare the number of binary operations has the strongest preprocessing ability due to its “global” property. required for probability assessment after using the preprocessing al- In other words, message passing from lower order clusters contains gorithms (shown as a fraction of the number required without pre- information from clusters that are lower than them. NCC with one processing) and the resultant error of the marginal probability for the step message updating is slightly better than SCC in that a cluster Insurance network as a function of ǫ in Figure 6. At each value of in NCC collects information from all its immediate neighbors. If we ǫ, we collect data for 5 runs. It is clear that ǫ controls the tradeoff perform several steps (3 in our experiments) of message updating, of the precision and the speed of the inference. DCC and NCC have more values are removed. stronger ability to speed up the inference but introduce more errors than SCC, so they are more sensitive to the selection of the approxi- mation threshold ǫ. 5.2 Probability Assessment Probability inference problems can be seen as constraint-based in- 6 Conclusion ference by treating conditional probability distributions (CPDs) as soft constraints over variables. A Bayesian network (BN) [20] is a As the first contribution of this paper we propose a family of novel graphical representation for probability inference under conditions generalized local consistency concepts for the junction graph repre- of uncertainty. BN is defined as a directed acyclic graph (DAG) sentation of CBI problems. These concepts apply to a broader cov- where vertices X = {X1, · · · ,Xn} denote n random variables and erage of inference problems from various fields based only on the directed edges denote causal influences between variables. D = general condition that depends on the properties of semirings that {D1, · · · , Dn} is a collection of finite domains for the variables. A are used to abstract these problems. Second, we present several local set of conditional probability distributions F = {f1, · · · , fn}, where consistency enforcing algorithms, including single, directional and fi = P (Xi|P arents(Xi)) is attached to each variable (vertex) Xi. neighborhood cluster consistency enforcing algorithms and their cor- The probability distribution over all the variables is given by the responding approximation variants. Third, theoretical complexities Table 2. Average variable domain sizes for a Weighted CSP that is preprocessed by three approximate local cluster consistency enforcing algorithms with different junction graph representations and different approximate element ǫ. Here SCC stands for Single Cluster Consistency; DCC stands for Directional Cluster Consistency; NCC-n stands for Neighborhood-Cluster Consistency with n steps of message updating. k is the size of the maximal cluster in a junction graph J =(C, S). The original average domain size is 5 and we normalize the local constraint at each cluster for comparison.

k = 2 k = 3 k = 4 |C| 200 107 69 |S| 1500 772 520 Max Degree 26 24 26 Algorithm SCC DCC NCC-1 NCC-3 SCC DCC NCC-1 NCC-3 SCC DCC NCC-1 NCC-3 ǫ = 7 4.84 2.40 3.17 1.38 4.70 2.69 2.90 1.44 4.41 2.94 3.02 1.50 ǫ = 10 5.00 2.88 4.21 1.58 4.97 3.36 3.97 1.65 4.94 3.51 4.05 1.87 ǫ = 15 5.00 3.35 4.85 1.87 5.00 4.05 4.79 2.02 4.98 4.22 4.76 2.28 ǫ = 25 5.00 3.97 5.00 2.36 5.00 4.44 4.96 2.61 5.00 4.65 4.97 2.88 ǫ = 50 5.00 4.32 5.00 3.09 5.00 4.88 5.00 3.39 5.00 4.91 5.00 3.75 ǫ = 75 5.00 4.55 5.00 3.53 5.00 4.94 5.00 3.93 5.00 4.97 5.00 4.22 ǫ = 100 5.00 4.66 5.00 3.88 5.00 4.97 5.00 4.22 5.00 4.98 5.00 4.46 ǫ = 125 5.00 4.77 5.00 4.08 5.00 4.98 5.00 4.36 5.00 4.99 5.00 4.59 ǫ = 150 5.00 4.82 5.00 4.24 5.00 4.99 5.00 4.53 5.00 5.00 5.00 4.66 ǫ = 200 5.00 4.90 5.00 4.46 5.00 5.00 5.00 4.65 5.00 5.00 5.00 4.75 ǫ = 500 5.00 4.94 5.00 4.89 5.00 5.00 5.00 4.99 5.00 5.00 5.00 4.92 Avg. Time (s) 0.16 16.25 42.88 180.65 2.62 12.72 30.86 137.48 3.55 16.25 43.85 234.80 of these preprocessing or consistency enforcing algorithms are dis- [11] J. Kohlas and P.P. Shenoy, ‘Computation in valuation algebras’, in cussed and experimental results of applying them to both MaxCSP Handbook of Defeasible Reasoning and Uncertainty Management Sys- and probability assessment problems are given. Finally, we discuss tems, Volume 5: Algorithms for Uncertainty and Defeasible Reasoning, 5–40, Kluwer, Dordrecht, (2000). the relationship between these local cluster consistency concepts and [12] J. Kohlas and N. Wilson, ‘Exact and approximate local computation in message passing schemes such as junction tree algorithms and loopy semiring induced valuation algebras’, Technical Report 06-06, Depart- message propagation. We intend to study efficient approaches to ment of Informatics, University of Fribourg, (2006). combining local cluster consistency enforcing with message prop- [13] Javier Larrosa, ‘Node and arc consistency in weighted CSP’, in Proc. of AAAI02, pp. 48–53, Menlo Park, CA, USA, (2002). agation for general CBI problems in future work. [14] Javier Larrosa and Thomas Schiex, ‘In the quest of the best form of local consistency for weighted CSP’, in Proc. of IJCAI-03, pp. 239– 244, Acapulco, Mexico, (2003). ACKNOWLEDGEMENTS [15] S. L. Lauritzen and D. J. Spiegelhalter, ‘Local computations with prob- We thank Dr. Juerg Kohlas for his comments on this paper. This re- abilities on graphical structures and their application to expert systems’, Journal of the Royal Statistical Society, Series B, 50, 157–224, (1988). search was supported by NSERC and IRIS. Le Chang is a Precarn [16] Alan K. Mackworth., ‘Consistency in networks of relations’, Artificial Scholar and Alan K. Mackworth holds a Canada Research Chair. Intelligence, 8, 99–118, (1977). [17] Alan K. Mackworth, ‘On reading sketch maps.’, in IJCAI77, pp. 598– 606, (1977). REFERENCES [18] R. Mohr and T.C. Henderson, ‘Arc and path consistency revisited’, Ar- [1] Srinivas M. Aji and Robert J. McEliece, ‘The generalized distributive tificial Intelligence, 28(2), 225–233, (1986). law’, IEEE Transactions on Information Theory, 46, 325–343, (2000). [19] Kevin P. Murphy, Yair Weiss, and Michael I. Jordan, ‘Loopy belief [2] Christian Bessiere` and Jean-Charles Regin,´ ‘Enforcing arc consistency propagation for approximate inference: An empirical study’, in UAI99, on global constraints by solving subproblems on the fly.’, in CP, pp. pp. 467–475, (1999). 103–117, (1999). [20] Judea Pearl, Probabilistic reasoning in intelligent systems: networks of [3] Stefano Bistarelli, Semirings for Soft Constraint Solving and Program- plausible inference, Morgan Kaufmann Publishers Inc., 1988. ming, Springer-Verlag, 2004. [21] Cedric Pralet, Gerard Verfaillie, and Thomas Schiex, ‘Composite [4] Stefano Bistarelli, Ugo Montanari, and Francesca Rossi, ‘Semiring- graphical models for reasoning about uncertainties, feasibilities, and based constraint satisfaction and optimization’, J. ACM, 44(2), 201– utilities’, in CP05 Workshop on Preferences and Soft Constraints, 236, (1997). (2005). [5] Le Chang, Generalized Constraint-Based Inference, Master’s thesis, [22] Thomas Schiex, Hel´ ene` Fargier, and Gerard Verfaillie, ‘Valued con- Dept. of Computer Science, Univ. of British Columbia, 2005. straint satisfaction problems: Hard and easy problems’, in IJCAI95, pp. [6] Le Chang and Alan K. Mackworth, ‘A generalization of generalized 631–637, Montreal, (1995). arc consistency: From constraint satisfaction to constraint-based infer- [23] P. P. Shenoy and G. Shafer, ‘Axioms for probability and belief-function ence’, in IJCAI05 Workshop on Modelling and Solving Problems with propagation’, in UAI90, 169–198, (1990). Constraints, pp. 68–75, Edinburgh, (July 2005). [24] Gerard Verfaillie, David Martinez, and Christian Bessiere,` ‘A generic [7] Martin Cooper and Thomas Schiex, ‘Arc consistency for soft con- customizable framework for inverse local consistency’, in AAAI ’99, straints’, Artificial Intelligence, 154(1-2), 199–227, (2004). pp. 169–174, Menlo Park, CA, USA, (1999). [8] Rina Dechter and Judea Pearl, ‘Tree clustering for constraint networks’, [25] Richard Wallace, ‘Directed arc consistency preprocessing as a strategy Artif. Intell., 38(3), 353–366, (1989). for maximal constraint satisfaction’, in Proc. of ECAI’94 Workshop on [9] N. Friedman, M. Goldszmidt, D. Heckerman, and S. Russell. Bayesian Constraint Processing, Amsterdam, (1994). network repository, http://www.cs.huji.ac.il/labs/compbio/repository/. [26] Ying Zhang and Alan K. Mackworth, ‘Parallel and distributed finite [10] Kalev Kask, Rina Dechter, Javier Larrosa, and Avi Dechter, ‘Unifying constraint satisfaction: Complexity, algorithms and experiments’, in cluster-tree decompositions for reasoning in graphical models’, Artifi- Parallel Processing for Artificial Intelligence, chapter 1, 305–334, El- cial Intelligence, 166, 165–193, (August 2005). sevier Science Publishers, (1994).