Application of a Hill-Climbing Algorithm to Exact and Approximate Inference in Credal Networks
4th International Symposium on Imprecise Probabilities and Their Applications, Pittsburgh, Pennsylvania, 2005

Andrés Cano, Manuel Gómez, Serafín Moral
Department of Computer Science and Artificial Intelligence
E.T.S. Ingeniería Informática, University of Granada

Abstract

This paper proposes two new algorithms for inference in credal networks. These algorithms enable probability intervals to be obtained for the states of a given query variable. The first algorithm is approximate and uses the hill-climbing technique in the Shenoy-Shafer architecture to propagate in join trees; the second is exact and is a modification of Rocha and Cozman's branch-and-bound algorithm, but applied to general directed acyclic graphs.

Keywords. Credal network, probability intervals, Bayesian networks, strong independence, hill-climbing, branch-and-bound algorithms.

1 Introduction

A credal network is a graphical structure (a directed acyclic graph (DAG) [9]) which is similar to a Bayesian network [18], where imprecise probabilities are used to represent quantitative knowledge. Each node in the DAG represents a random event, and is associated with a set of convex sets of conditional probability distributions.

There are several mathematical models for imprecise probabilities [25]. We consider convex sets of probabilities to be the most suitable for calculating and representing imprecise probabilities, as they are powerful enough to represent the results of basic operations within the model without having to make approximations which cause loss of information (as in interval probabilities).

In this paper, we shall consider the problem of computing upper and lower probabilities for each state of a specific query variable, given a set of observations. A very common hypothesis is that of separately specified credal sets [17, 20] (there is no restriction between the set of probabilities associated to a variable given two different configurations of its parents). Under this hypothesis, the size of the conditional convex sets is a function of the number of parents [20]. This hypothesis will be accepted in this paper.

Some authors have considered the propagation of probabilistic intervals directly in graphical structures [1, 12, 23, 24]. In the proposed procedures, however, there is no guarantee that the calculated intervals are always the same as those obtained when a global computation using the associated convex sets of probabilities is used. In general, it can be said that the calculated bounds are wider than the exact ones. The problem is that exact bounds need a computation with the associated convex sets of probabilities. Following this approach, different approximate and exact methods have been proposed [5, 3, 2, 8, 20, 19]. These assume that there is a convex set of conditional probabilities for each configuration of the parent variables in the dependence graph, and provide a model for obtaining exact bounds through the use of local computations. Working with convex sets, however, may be extremely inefficient: if we have $n$ variables and each variable $X_i$ has a convex set with $l_i$ extreme points as the conditional information, the propagation algorithm has a complexity of order $O(K \cdot \prod_{i=1}^{n} l_i)$, where $K$ is the complexity of carrying out a simple probabilistic propagation. This is so since the propagation of convex sets is equivalent to the propagation of all the global probabilities that can be obtained by choosing an exact conditional probability in each convex set.
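To make the cost of this equivalence concrete, the following Python sketch (an illustration of the counting argument above, not code from the paper; the number of variables and the vertex counts are invented) enumerates the choices of one extreme point per conditional convex set and counts how many precise probabilistic propagations a global exact computation would require.

```python
from itertools import product

# Illustration only (not the authors' code): choosing one extreme point per
# conditional convex set yields one precise Bayesian network, so an exact
# global computation must perform one probabilistic propagation per choice,
# i.e. prod(l_i) propagations in total.

def number_of_precise_networks(extreme_point_counts):
    """Product of the l_i values: how many vertex combinations exist."""
    total = 1
    for l_i in extreme_point_counts:
        total *= l_i
    return total

# Hypothetical example: 10 variables, each with a 4-vertex conditional credal set.
counts = [4] * 10
print(number_of_precise_networks(counts))  # 4**10 = 1048576 propagations

# The combinations themselves (indices of the chosen vertices) can be enumerated:
for choice in product(*(range(l) for l in counts)):
    pass  # each `choice` selects one exact conditional distribution per variable
```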
In this paper, we shall propose a new approximate algorithm for inference in credal networks. This is a hill-climbing procedure which quickly obtains good inner approximations to the correct intervals. The algorithm is based on the Shenoy-Shafer architecture [22] for propagation in join trees. Rocha, Campos and Cozman [13] present another hill-climbing search, inspired by the Lukatskii-Shapot algorithm, for obtaining accurate inner approximations. We shall also propose a branch-and-bound procedure (also used in [19]) for inference in credal networks, where the initial solution is computed with our hill-climbing algorithm, reducing its running time. This approach is similar to the one in [13]. Campos and Cozman [11] also give a multilinear programming method which provides a very fast branch-and-bound procedure.

The rest of the paper is organized as follows: Section 2 describes the basics of probability propagation in Bayesian networks and the Shenoy-Shafer architecture; Section 3 presents basic notions about credal sets and credal networks; Section 4 introduces probability trees and briefly describes two algorithms based on variable elimination and probability trees for inference in credal networks; Section 5 details the proposed hill-climbing algorithm for inference in credal networks; Section 6 explains how to apply the proposed algorithm in Rocha and Cozman's branch-and-bound algorithm; Section 7 shows the experimental work; and finally Section 8 presents the conclusions.

2 Probability propagation in Bayesian networks

Figure 1: A Bayesian network and its join tree

Let $\mathbf{X} = \{X_1, \ldots, X_n\}$ be a set of variables. Let us assume that each variable $X_i$ takes values on a finite set $\Omega_{X_i}$ (the frame of $X_i$). We shall use $x_i$ to denote one of the values of $X_i$, $x_i \in \Omega_{X_i}$. If $I$ is a set of indices, we shall write $\mathbf{X}_I$ for the set $\{X_i \mid i \in I\}$. Let $N = \{1, \ldots, n\}$ be the set of all the indices. The Cartesian product $\prod_{i \in I} \Omega_{X_i}$ will be denoted by $\Omega_{\mathbf{X}_I}$. The elements of $\Omega_{\mathbf{X}_I}$ are called configurations of $\mathbf{X}_I$ and will be written as $\mathbf{x}$ or $\mathbf{x}_I$. In Shenoy and Shafer's [22] terminology, a mapping from a set $\Omega_{\mathbf{X}_I}$ into $\mathbb{R}_0^+$ will be called a valuation $h$ for $\mathbf{X}_I$. Shenoy and Shafer define two operations on valuations: combination $h_1 \otimes h_2$ (multiplication) and marginalization $h^{\downarrow J}$ (adding over the removed variables).

A Bayesian network is a directed acyclic graph (see the left side of Figure 1), where each node represents a random event $X_i$, and the topology of the graph shows the independence relations between variables according to the d-separation criterion [18]. Each node $X_i$ also has a conditional probability distribution $p_i(X_i \mid \mathbf{Y}_i)$ for that variable given its parents $\mathbf{Y}_i$. Following Shafer and Shenoy's terminology, these conditional distributions can be considered as valuations. A Bayesian network determines a unique joint probability distribution:

$$p(\mathbf{X} = \mathbf{x}) = \prod_{i \in N} p_i(x_i \mid \mathbf{y}_i) \qquad \forall \mathbf{x} \in \Omega_{\mathbf{X}} \qquad (1)$$

An observation is the knowledge of the exact value $X_i = x_i^j$ of a variable. The set of observed variables is denoted by $\mathbf{X}_E$; the configuration of these variables is denoted by $\mathbf{x}_E$ and is called the evidence set. $E$ will be the set of indices of the observed variables.

Each observation, $X_i = x_i^j$, is represented by means of a valuation which is a Dirac function defined on $\Omega_{X_i}$ as $\delta_{X_i}(x_i; x_i^j) = 1$ if $x_i = x_i^j$, $x_i \in \Omega_{X_i}$, and $\delta_{X_i}(x_i; x_i^j) = 0$ if $x_i \neq x_i^j$.

The aim of probability propagation algorithms is to calculate the a posteriori probability function $p(x_q \mid \mathbf{x}_E)$ (for every $x_q \in \Omega_{X_q}$) by making local computations, where $X_q$ is a given query variable. This distribution verifies:

$$p(x_q \mid \mathbf{x}_E) \propto \left( \prod_{i \in N} p_i(x_i \mid \mathbf{y}_i) \prod_{x_i^j \in \mathbf{x}_E} \delta_{X_i}(x_i; x_i^j) \right)^{\downarrow X_q} \qquad (2)$$

In fact, the previous formula is the expression for $p(x_q, \mathbf{x}_E)$; $p(x_q \mid \mathbf{x}_E)$ can be obtained from $p(x_q, \mathbf{x}_E)$ by normalization.

A known propagation algorithm can be constructed by transforming the DAG into a join tree. For example, the right-hand tree in Figure 1 shows a possible join tree for the DAG on the left. There are several schemes [16, 22, 21, 14] for propagating on join trees.
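As a concrete reading of these definitions, the following Python sketch (my own illustration; the two-variable network, its frames, the probability values, and all function names are invented for the example) represents valuations as tables over finite frames, implements combination as pointwise multiplication and marginalization as summing out variables, and evaluates expression (2) by brute force for a single observation.

```python
from itertools import product

# A minimal sketch (assumed representation, not the paper's code): a valuation
# over variables `vs` is stored as a dict mapping configurations (tuples of
# values, one per variable in `vs`) to non-negative reals.

FRAMES = {"A": (0, 1), "B": (0, 1)}      # hypothetical frames Omega_Xi

def valuation(vs, fn):
    """Tabulate fn over all configurations of the variables in vs."""
    return {cfg: fn(dict(zip(vs, cfg))) for cfg in product(*(FRAMES[v] for v in vs))}

def combine(vs1, h1, vs2, h2):
    """Combination h1 (x) h2: pointwise product on the union of the variables."""
    vs = vs1 + tuple(v for v in vs2 if v not in vs1)
    def fn(assign):
        c1 = tuple(assign[v] for v in vs1)
        c2 = tuple(assign[v] for v in vs2)
        return h1[c1] * h2[c2]
    return vs, valuation(vs, fn)

def marginalize(vs, h, keep):
    """Marginalization h^{down J}: sum out the variables not in `keep`."""
    out = {}
    for cfg, val in h.items():
        key = tuple(cfg[vs.index(v)] for v in keep)
        out[key] = out.get(key, 0.0) + val
    return tuple(keep), out

# Tiny example: p(A), p(B | A), observation B = 1, query variable A (expression (2)).
pA  = valuation(("A",), lambda a: {0: 0.3, 1: 0.7}[a["A"]])
pBA = valuation(("B", "A"), lambda a: 0.9 if a["B"] == a["A"] else 0.1)
dB  = valuation(("B",), lambda a: 1.0 if a["B"] == 1 else 0.0)   # Dirac delta

vs, joint = combine(("A",), pA, ("B", "A"), pBA)
vs, joint = combine(vs, joint, ("B",), dB)
_, pA_e = marginalize(vs, joint, ("A",))          # p(A, B=1), before normalizing
z = sum(pA_e.values())
print({a: v / z for a, v in pA_e.items()})        # posterior p(A | B=1)
```

The Shenoy-Shafer architecture described next performs the same computation, but organizes it as local message passing on a join tree instead of building the full joint table.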
We shall follow Shafer and Shenoy's scheme [22]. Every node in the join tree has a valuation $\Psi_{C_i}$ attached, initially set to the identity mapping. There are two messages (valuations), $M_{C_i \to C_j}$ and $M_{C_j \to C_i}$, between every two adjacent nodes $C_i$ and $C_j$: $M_{C_i \to C_j}$ is the message that $C_i$ sends to $C_j$, and $M_{C_j \to C_i}$ is the message that $C_j$ sends to $C_i$. Every conditional distribution $p_i$ and every observation $\delta_i$ will be assigned to one node $C_i$ which contains all its variables. Each node obtains a (possibly empty) set of conditional distributions and Dirac functions, which must then be combined into a single valuation.

The propagation algorithm is performed by crossing the join tree from leaves to root and then from root to leaves, updating messages as follows:

$$M_{C_i \to C_j} = \left( \Psi_{C_i} \otimes \left( \bigotimes_{C_k \in Ady(C_i, C_j)} M_{C_k \to C_i} \right) \right)^{\downarrow C_i \cap C_j} \qquad (3)$$

where $Ady(C_i, C_j)$ is the set of nodes adjacent to $C_i$ with the exception of $C_j$.

Once the propagation has been performed, the a posteriori probability distribution for $X_q$, $p(X_q \mid \mathbf{x}_E)$, can be calculated by looking for a node $C_i$ containing $X_q$ and normalizing the following expression:

$$p(X_q, \mathbf{x}_E) = \left( \Psi_{C_i} \otimes \left( \bigotimes_{C_j \in Ady(C_i)} M_{C_j \to C_i} \right) \right)^{\downarrow X_q} \qquad (4)$$

where $Ady(C_i)$ is the set of nodes adjacent to $C_i$. The normalization factor is the probability of the given evidence, which can be calculated from the valuation.

The propagation of credal sets is completely analogous to the propagation of probabilities; the procedures are the same. Here, we shall only describe the main differences; further details can be found in [5]. Valuations are convex sets of possible probabilities, with a finite number of vertices. A conditional valuation is a convex set of conditional probability distributions. An observation of a value for a variable will be represented in the same way as in the probabilistic case.
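To illustrate the convex-set valuations just described, here is a small Python sketch (again my own illustration, not the paper's procedure; all numbers and names are invented). A credal valuation is stored as a finite list of extreme points, and a posterior interval for a binary query variable is obtained by enumerating every choice of one vertex per set, i.e. the global enumeration that the propagation of convex sets is equivalent to.

```python
from itertools import product

# Illustration under assumed data: a credal valuation is a finite list of
# extreme points (vertices).  Combining credal valuations amounts to combining
# every choice of one vertex per set, so the posterior interval for the query
# is the min/max over all such choices.

# Vertices of the credal set for p(A=1)  (A is binary: 0/1).
prior_vertices = [0.2, 0.4]
# Vertices of the credal set for p(B=1 | A), one entry per value of A.
cond_vertices = [{0: 0.1, 1: 0.8},
                 {0: 0.3, 1: 0.9}]

def posterior_A1_given_B1(pA1, pB1_given_A):
    """Exact posterior p(A=1 | B=1) for one precise choice of the parameters."""
    joint = {a: (pA1 if a == 1 else 1 - pA1) * pB1_given_A[a] for a in (0, 1)}
    return joint[1] / (joint[0] + joint[1])

# Enumerate every combination of extreme points (global enumeration).
values = [posterior_A1_given_B1(p, c)
          for p, c in product(prior_vertices, cond_vertices)]
print(min(values), max(values))   # lower and upper posterior probability of A=1
```

The hill-climbing and branch-and-bound procedures proposed in the paper are motivated precisely by the cost of this exhaustive enumeration.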