Introduction to NeuraBASE: Forming Join Trees in Bayesian Networks

Robert Hercus, Choong-Ming Chin, Kim-Fong Ho

Introduction

A Bayesian network, or belief network, is a condensed representation of the probabilistic relationships amongst a set of variables, and is a popular representation for encoding uncertainties in expert systems.

In general, Bayesian networks can be designed and trained to provide future classifications using a set of attributes derived from statistical data analysis. Other methods for learning Bayesian networks from data have also been developed (see Heckerman [2]). In some cases, they can also handle situations where data entries are missing.

Bayesian networks are often associated with the notion of causality and for a network to be considered a Bayesian network, the following requirements (see Jensen [3]) must hold:

● A set of random variables and a set of directed edges between variables must exist.
● Each variable must have a finite set of mutually exclusive states.
● The variables together with the directed edges must form a directed acyclic graph (DAG).

(A directed graph is acyclic if there is no directed path A1 → . . . → An such that A1 = An.)

● For each variable A with parents B1, B2, . . . , Bn, there is an attached conditional probability table P(A | B1, B2, . . . , Bn).
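To make these requirements concrete, the following is a minimal sketch in Python of how such a network could be represented. The three-variable fragment (Fire → Smoke, Fire → Alarm) and all probability values are hypothetical illustrations only; they are not the network of Figure 1.

variables = ["Fire", "Smoke", "Alarm"]
edges = [("Fire", "Smoke"), ("Fire", "Alarm")]   # directed edges

# Each variable has a finite set of mutually exclusive states.
states = {v: (True, False) for v in variables}

# The directed edges must form a DAG: no directed path A1 -> ... -> An with A1 = An.
def is_acyclic(vertices, directed_edges):
    children = {v: [w for (u, w) in directed_edges if u == v] for v in vertices}
    seen, on_path = set(), set()
    def visit(v):
        if v in on_path:                 # a directed cycle has been found
            return False
        if v in seen:
            return True
        seen.add(v)
        on_path.add(v)
        ok = all(visit(w) for w in children[v])
        on_path.discard(v)
        return ok
    return all(visit(v) for v in vertices)

# Each variable A with parents B1..Bn carries a table P(A | B1, ..., Bn);
# the numbers below are invented purely for illustration.
cpt = {
    "Fire":  {():       {True: 0.01, False: 0.99}},
    "Smoke": {(True,):  {True: 0.90, False: 0.10},
              (False,): {True: 0.05, False: 0.95}},
    "Alarm": {(True,):  {True: 0.95, False: 0.05},
              (False,): {True: 0.02, False: 0.98}},
}

assert is_acyclic(variables, edges)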

The main objective of this paper is to provide an alternative method for finding clique trees, which are paramount for the efficient computation of probabilistic queries posed to Bayesian networks. For the simple Bayesian network shown in Figure 1, given the evidence that people are leaving, we may want to know the probability that a fire has occurred, or the probability that there is smoke given that the alarm has gone off.

Much of the research in Bayesian networks is focused on developing efficient algorithms for computing probabilities and updating them whenever evidence becomes available. There are currently two families of algorithms for computing such probabilities. The first is attributed to Pearl [5] and to Russell and Norvig [7]. In those works, the focus was on polytrees, which are directed acyclic graphs (DAGs) with at most one path between any two variables, as illustrated in Figure 2.

Figure 1. A Bayesian network with six Boolean variables

Figure 2. An example of a polytree

The other family of algorithms follows the techniques of Lauritzen and Spiegelhalter [4] and is applied to general DAGs. To compute the required probabilities, the DAG must first be converted into a tree of cliques, or join tree. For DAGs that are polytrees, techniques attributed to Pearl [6] have shown that there is no need to convert such DAGs into join trees before calculating probabilities on the Bayesian network. Therefore, join trees are usually needed when the Bayesian network cannot be expressed as a polytree.

On the other hand, polytrees can be converted into join trees to compute the required probabilities of the Bayesian networks using the techniques of Lauritzen and Spiegelhalter [4]. In theory, a join tree of cliques is a connected acyclic graph comprising connected nodes. In this case, each node (or clique) is a collection of joined network variables. The current technique for forming a join tree uses the graph theory approach described in the next section.

This paper examines an alternative neuronal network modelling method, based on Hercus [9], which uses a simple concept that can be applied to model large-scale Bayesian network problems.

Graph Theory Approach

In the methods of Lauritzen and Spiegelhalter [4], the first stage of creating clique trees is the conversion of the DAG into a moral graph, which is an undirected graph. An undirected graph is a collection of vertices and edges. In Bayesian networks, the set of variables corresponds to the set of vertices, and if there is a causal connection between two variables, the pair of variables is joined by an edge, which is indicated on a diagram by a line (without an arrow).

To form a moral (undirected) graph of a DAG, the following steps are used:

1. For each vertex in the DAG that has two or more parents, add undirected edges joining all pairs of its parents;
2. Replace all directed edges in the DAG with undirected ones.

To illustrate how the procedure works, an example of a DAG is shown in Figure 3. To form the moral graph shown in Figure 4, since vertices d, g and h each have two parents, the parent pairs a and b, d and e, and d and f are first joined with undirected edges; subsequently, all directed edges are replaced by undirected ones.
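As a rough illustration, the two moralisation steps can be sketched in Python as below. The edge list is an assumption: it is reconstructed from the parent pairs mentioned in the text (d with parents a and b, g with parents d and e, h with parents d and f, plus a → e and c → f implied by the cliques listed later), since Figure 3 itself is not reproduced here.

from itertools import combinations

# Assumed reconstruction of the Figure 3 DAG (see the note above).
dag_edges = [("a", "d"), ("b", "d"), ("a", "e"),
             ("d", "g"), ("e", "g"),
             ("d", "h"), ("f", "h"),
             ("c", "f")]

def moral_graph(directed_edges):
    """Return the moral graph as a set of undirected edges (frozensets)."""
    parents = {}
    for u, v in directed_edges:
        parents.setdefault(v, set()).add(u)

    # Step 1: add undirected edges joining the parents of every vertex
    # that has two or more parents ("marrying" the parents).
    undirected = set()
    for ps in parents.values():
        for p, q in combinations(sorted(ps), 2):
            undirected.add(frozenset((p, q)))

    # Step 2: replace every directed edge by an undirected one.
    undirected.update(frozenset(e) for e in directed_edges)
    return undirected

moral = moral_graph(dag_edges)
# The moral edges added for this DAG are a-b, d-e and d-f.
assert {frozenset("ab"), frozenset("de"), frozenset("df")} <= moral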

Figure 3. An example of a DAG

Figure 4. The moral graph corresponding to Figure 3

A clique is a collection of vertices that are all connected to each other and to which no other vertex is completely connected. The general requirement for forming a tree of cliques is that the moral graph has to be triangulated (also called chordal or decomposable), meaning there are no cycles of four or more vertices without an edge joining two non-consecutive vertices. To triangulate a moral graph, extra edges are usually added between certain vertices.

The following procedure is based on Tarjan and Yannakakis [8] to triangulate the moral graph and to form the tree of cliques:

1. Number the vertices by a maximum cardinality search: start anywhere, and at each step number next a vertex with the largest number of already-numbered neighbours.
2. Starting with the highest-numbered vertex, check that all the lower-numbered neighbours of each vertex are neighbours of one another. Add any missing edges to produce a triangulated graph.
3. Identify all cliques, and number each by the highest-numbered vertex in the clique.

4. Form the join tree (a connected graph with no cycles) by connecting each clique to a predecessor (according to the numbering in step 3) with which it shares the most vertices. The variables that two neighbouring cliques have in common are called their separator.

From the moral graph given in Figure 4, we can see that it is already a triangulated graph, since every cycle of four or more vertices (for example the one through a, b, d and e) has a chord. Using the steps described above, we can begin the maximum cardinality search. Starting, for example, from a, one possible ordering is a, b, d, e, g, f, h, c, and the cliques are C1 = abd, C2 = ade, C3 = deg, C4 = dfh and C5 = cf, with separators S2 = ad, S3 = de, S4 = d and S5 = f.
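The steps above can be sketched in Python as follows. This is a simplified illustration rather than a faithful implementation of Tarjan and Yannakakis [8], and it uses the same assumed moral graph as the previous snippet (written out again so the fragment is self-contained). For this graph the fill-in step adds no edges, and the maximal cliques recovered match those listed above.

from itertools import combinations

# Assumed moral graph of Figure 4 (the DAG edges made undirected, plus the
# moral edges a-b, d-e and d-f); each two-letter string names one edge.
moral = {frozenset(e) for e in
         ["ad", "bd", "ae", "dg", "eg", "dh", "fh", "cf",   # undirected DAG edges
          "ab", "de", "df"]}                                 # moral edges

def neighbours(graph_edges):
    nbrs = {}
    for edge in graph_edges:
        u, v = tuple(edge)
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs

def max_cardinality_order(graph_edges, start):
    """Step 1: repeatedly number a vertex with the most already-numbered neighbours."""
    nbrs = neighbours(graph_edges)
    order = [start]
    while len(order) < len(nbrs):
        remaining = [v for v in nbrs if v not in order]
        order.append(max(remaining, key=lambda v: len(nbrs[v] & set(order))))
    return order

def fill_in(graph_edges, order):
    """Step 2: make the earlier-numbered neighbours of each vertex pairwise adjacent."""
    edges = set(graph_edges)
    nbrs = neighbours(graph_edges)
    for i in range(len(order) - 1, -1, -1):
        earlier = [u for u in order[:i] if u in nbrs[order[i]]]
        for p, q in combinations(earlier, 2):
            edges.add(frozenset((p, q)))
    return edges

def cliques(graph_edges, order):
    """Step 3: each vertex with its earlier-numbered neighbours is a candidate
    clique; keep only those not contained in a larger candidate."""
    nbrs = neighbours(graph_edges)
    candidates = [frozenset([v]) | (nbrs[v] & set(order[:i]))
                  for i, v in enumerate(order)]
    return [c for c in candidates if not any(c < d for d in candidates)]

order = max_cardinality_order(moral, "a")   # e.g. a, b, d, e, g, f, h, c (ties may differ)
triangulated = fill_in(moral, order)        # no extra edges are needed for this graph
print([sorted(c) for c in cliques(triangulated, order)])
# the cliques abd, ade, deg, dfh and cf, in some order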

There are three choices for forming the join tree, as C4 can be joined to any of C1, C2 or C3. Figure 5 shows the various clique trees, where the separators are marked on the edges. From steps 1 and 2, we can see that the technique for identifying the cliques of a Bayesian network is quite subjective, since a different ordering or numbering of the vertices of the moral graph will tend to produce different trees of cliques.

Hence, in order to find a join tree different from the one already found, the entire procedure has to be repeated. This is especially true for more complex DAGs, such as those in Figure 15 and Figure 21.

Figure 5. Three clique trees corresponding to Figure 4 where the separators are marked on the edges.

In the next section, we will look into an alternate strategy of forming various types of join trees by following the same procedure once, with the steps involved being less subjective than the graph theory approach.

NeuraBASE Neuronal Network Approach

In this section, we will discuss a technique to form join trees of Bayesian networks using a neuronal network approach described in Hercus [9]. Rather than setting up a built-in unsupervised learning approach, the neuronal network nodes are built based on causal relationships between the variables inherited from the Bayesian network problem.

Based on the association rules analysis terminology (see Hastie et al. [1], pp. 439 - 444), if there is a causal relationship between the network variables A and B, which follows the direction A → B, such that A ≠ B, the neuronal network setup will use the following notation, where A is called the “antecedent" and B is referred to as the “consequent".

A ⟹ B

In essence, the method of forming the neuronal network is based on linking two parent nodes at Level 1 of the neuronal network (variables from the causal probability Bayesian network) to form an association at Level 2, i.e. a child node (a causal relationship between two variables). This new node then encapsulates the information of both its parents and will, in turn, form links with other nodes if it has a causal relationship with other variables. For a visual explanation of how the neuronal network works, see Figure 6 below.

Figure 6. An example of a linking process to form causal relationships encapsulated in the child node A → B, and also a child node B → A at Level 2. Both parents A and B are at Level 1 of the neuronal network framework.

In Figure 6, the central point of this technique is the directional associations from the parent nodes (representing variables in a Bayesian network) to the child node. The child node is viewed as a representation of the causal relationship between its parent variables. Instead of directly associating the relationship of both parents onto the child node, the proposed neuronal network strategy works one step at a time. For example, in Figure 6, if there is a network causal relationship A → B, then the node A → B is formed by first linking variable A to the child node with a directional association, followed by another directional association to node B. Conversely, if there is a link from B to A, the linking process is based on a directional association from node B to the child node, followed by a directional association to node A, which then forms the child node B → A.
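A minimal sketch of this linking step is given below. Each child node is represented simply as a pair (antecedents, consequents); this representation, and the helper names link and show, are illustrative assumptions rather than NeuraBASE's actual internal structures.

def link(cause, effect):
    """Form a Level-2 child node cause => effect from two Level-1 variables."""
    return (frozenset([cause]), frozenset([effect]))

def show(node):
    antecedents, consequents = node
    return "{%s} => {%s}" % (", ".join(sorted(antecedents)),
                             ", ".join(sorted(consequents)))

# Level 1 holds the Bayesian-network variables themselves (here A and B);
# Level 2 holds one child node per causal relationship present in the DAG.
print(show(link("A", "B")))   # {A} => {B}, formed when the DAG contains A -> B
print(show(link("B", "A")))   # {B} => {A}, formed only if the DAG also has B -> A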

The NeuraBASE neuronal network can be used to define various types of child nodes using the rules of connection in Bayesian networks. For a serial connection of a Bayesian network with three variables A, B and C, the relationship can be expressed visually as shown in Figure 7.

Figure 7. A serial connection of a Bayesian network where A has an influence on B, which in turn has an influence on C.

The corresponding neuronal network to Figure 7 is shown in Figure 8, where the child node at Level 3 shows that the variables A and C are d-separated given B, in Bayesian network terminology (see Jensen [3]). Note that in an actual neuronal network setup, this child node will be treated as an inherent node.

Figure 8. A neuronal network representing a serial connection corresponding to Figure 7. The node at Level 3 shows that A and C are d-separated given B.

In a diverging connection involving three variables A, B and C, shown in Figure 9, where both B and C are children of A, the corresponding neuronal network is shown in Figure 10. The relationship at Level 3 of the network shows that A has an impact on both B and C. In Figure 10, the node A ⟹ C can also be linked to A ⟹ B to form A ⟹ {B, C}. Because the consequents of both child nodes A ⟹ B and A ⟹ C are congruent in relation to the influence of the parent A, we can choose either of these two nodes to represent a Level 3 child node.

Note that the child node formed from A ⟹ B and A ⟹ C is only treated as an inherent node in an actual neuronal network setup.

Figure 9. A diverging connection of a Bayesian network where A has an influence on B as well as on C.

Figure 10. A neuronal network setup of a diverging connection corresponding to Figure 9. The node A → {B, C} at Level 3 shows that A has an influence on B and C.

Figure 11. A converging connection of a Bayesian network where A and B have an influence on C

Finally, in a converging connection of a Bayesian network (see Figure 11), the neuronal network architecture can be represented as in Figure 12, where the child node at Level 3, {A, B} ⟹ C, shows that both A and B have an impact on C. From Figure 12, we can also link the node B ⟹ C to A ⟹ C to form {A, B} ⟹ C. Because the antecedents of both child nodes A ⟹ C and B ⟹ C are congruent in relation to the parent node C, we can choose either of these two nodes to represent a Level 3 child node as well. Note also that the child node formed from A ⟹ C and B ⟹ C is only treated as an inherent node in an actual neuronal network setup.
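The Level 3 merges for diverging and converging connections can be sketched as below, using the same pair representation as the earlier linking example. The merge rule (take the union of the consequents when the antecedents agree, and vice versa) is inferred from Figures 10 and 12 and should be read as an assumption.

def merge(node1, node2):
    """Merge two Level-2 nodes that share all antecedents or all consequents."""
    (a1, c1), (a2, c2) = node1, node2
    if a1 == a2:                 # diverging: A => B and A => C  become  A => {B, C}
        return (a1, c1 | c2)
    if c1 == c2:                 # converging: A => C and B => C  become  {A, B} => C
        return (a1 | a2, c1)
    return None                  # not congruent: no Level-3 child node is formed

A_to_B = (frozenset("A"), frozenset("B"))
A_to_C = (frozenset("A"), frozenset("C"))
B_to_C = (frozenset("B"), frozenset("C"))

print(merge(A_to_B, A_to_C))     # antecedent {A}, consequents {B, C}: the node A => {B, C}
print(merge(A_to_C, B_to_C))     # antecedents {A, B}, consequent {C}: the node {A, B} => C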

Figure 12. A neuronal network setup of a converging connection corresponding to Figure 11. The node {A, B} → C at Level 3 shows that A and B have an influence on C

The following summarises the initial steps for forming a tree of cliques from an n-variable DAG of a Bayesian network using the NeuraBASE neuronal network, without recourse to the graph theory approach:

1. List all n variables of the DAG as Level 1 neuronal network nodes. Join the nodes to form child nodes at Level 2 based on the causal relationships between the variables in the DAG.

2. Form converging (diverging) child node connections at Level 3. If there are no such connections, designate such nodes as output nodes and go to step 4.

3. At Level 3, if there exist two different nodes sharing the same consequents (antecedents), or if the intersection of their sets of antecedents (consequents) is not empty, join the two nodes to form a child node at Level 4. Otherwise, designate such nodes as output nodes and go to step 4.

4. Identify the cliques (clusters of variables) from the output nodes. Use the least number of cliques to form all possible join trees by connecting each clique to another clique with which it shares the most vertices. The variables that two neighbouring cliques have in common are the separators.

Step 3 of the procedure can also be explained as a combinatorial arrangement of the variables involved. If there are m ≥ 3 unique variables converging on (or diverging from) a particular variable M, then step 3 is equivalent to finding the distinct nodes at Level 4 of NeuraBASE, one for each selection of three of the m variables together with M.

As a result of this linking process, each child node at Level 4 will contain four variables in total (antecedents plus consequents), which constitute the clique information. In an actual implementation, child nodes generated from Level 3 onwards are treated as inherent nodes in our neuronal network setup.
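Under the combinatorial reading of step 3 described above, steps 1 to 4 can be sketched as below. The edge list is the same assumed reconstruction of the Figure 3 DAG used earlier; both the reconstruction and this reading of the merging rule are assumptions, but the cliques they produce match the six quoted in the text.

from itertools import combinations

# Assumed reconstruction of the Figure 3 DAG (as in the moralisation sketch).
dag_edges = [("a", "d"), ("b", "d"), ("a", "e"),
             ("d", "g"), ("e", "g"),
             ("d", "h"), ("f", "h"),
             ("c", "f")]

def neurabase_cliques(edges):
    parents, children = {}, {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
        children.setdefault(u, set()).add(v)

    found = set()

    def expand(group, centre):
        # Step 3 read combinatorially: m >= 3 congruent variables give one
        # Level-4 node (a four-variable clique) per choice of three of them.
        if len(group) >= 3:
            for triple in combinations(sorted(group), 3):
                found.add(frozenset(triple) | {centre})
        elif len(group) == 2:
            found.add(frozenset(group) | {centre})

    for v, ps in parents.items():        # converging connections (shared consequent)
        expand(ps, v)
    for u, cs in children.items():       # diverging connections (shared antecedent)
        expand(cs, u)

    # Level-2 nodes that never take part in a merge remain output nodes.
    for u, v in edges:
        if len(parents[v]) == 1 and len(children[u]) == 1:
            found.add(frozenset((u, v)))
    return found

print(sorted("".join(sorted(c)) for c in neurabase_cliques(dag_edges)))
# ['abd', 'ade', 'cf', 'deg', 'dfh', 'dgh']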

Using the example of a DAG given in Figure 3, the procedure works by first listing out all the variables as neuronal network nodes. Next, form child nodes at Level 2 based on the causal relationship of the variables, and subsequently, try to form converging or diverging connections according to the steps given in the procedure.

For further details, see Figure 13 below, which shows the different types of Bayesian network connections expressed in NeuraBASE.

Figure 13. A neuronal network setup for the DAG (in Figure 3), showing converging and diverging connections at Level 3, and Level 2. The identification of cliques abd, deg, dgh, dfh, ade and cf comes from the output nodes.

In Figure 13, all the child nodes at Level 3 and one child node at Level 2 constitute the output nodes according to the neuronal network procedure as no two nodes can form either a converging or diverging connection for the next level. Hence, the building up of new child nodes will stop at this stage. From the information given in the output nodes, we then obtain the required cliques abd, deg, dgh, dfh, ade and cf.

By comparing the cliques obtained using NeuraBASE with those from the graph theory approach, it can be seen that the new procedure locates all the cliques found using the graph theory approach. It also identifies an additional clique, dgh, which the graph theory approach did not find.

By using the clique information, we can form join trees by utilising the least number of cliques available to us. One such combination is to use only the cliques abd, ade, deg, dfh and cf, which yields the join trees found using the graph theory approach (see Figure 5).
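Step 4's join-tree construction can be sketched as a maximum-weight spanning tree over the cliques, where the weight of a pair is the number of variables it shares. The greedy pairing below, and the tie-breaking it implies, are assumptions; the paper only requires each clique to join a clique with which it shares the most vertices.

from itertools import combinations

cliques = [frozenset(c) for c in ("abd", "ade", "deg", "dfh", "cf")]

def join_tree(cliques):
    """Greedily connect cliques sharing the most variables until a tree is formed."""
    pairs = sorted(combinations(range(len(cliques)), 2),
                   key=lambda p: len(cliques[p[0]] & cliques[p[1]]),
                   reverse=True)
    component = list(range(len(cliques)))           # crude union-find by relabelling
    tree = []
    for i, j in pairs:
        if component[i] != component[j]:
            separator = cliques[i] & cliques[j]
            tree.append(("".join(sorted(cliques[i])),
                         "".join(sorted(cliques[j])),
                         "".join(sorted(separator))))
            old, new = component[j], component[i]
            component = [new if c == old else c for c in component]
    return tree

for left, right, sep in join_tree(cliques):
    print(left, "--[" + sep + "]--", right)
# abd --[ad]-- ade,  ade --[de]-- deg,  abd --[d]-- dfh,  dfh --[f]-- cf;
# dfh could equally join ade or deg, giving the three variants of Figure 5.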

Apart from the join trees given in Figure 5, the additional clique dgh allows other types of clique trees to be formed by substituting dgh for deg. The two join trees that can be formed in this way are given in Figure 14.

Figure 14. Two additional clique trees corresponding to Figure 4. The separators are marked on the edges.

In this example, the new procedure is not only capable of producing various types of join trees, but is also far more efficient than the graph theory approach in identifying all possible cliques of the Bayesian network, without the need to repeat the procedure more than once.

In the next example (see Figure 15 below), the new procedure can also locate all possible cliques based on the rules for forming converging or diverging connections in a neuronal network. In Figure 16 and Figure 18, which relate to the formation of child nodes using converging and diverging connections (of the DAG) respectively, we can see that, based on step 3 of the procedure, the neuronal network for the two types of connection stops propagating new child nodes beyond Level 4. Hence, at Level 4, all the child nodes are output nodes, with the variable information in each node constituting the required clique. For an alternative view of forming cliques as a combinatorial arrangement, according to step 3 of the procedure, see also Figure 17 and Figure 19.

From the output nodes of the neuronal network setup given in Figure 16 and Figure 18, the total number of cliques obtained is more than adequate to form a variety of join trees. To form the clique trees themselves, we rely on the DAG structure (Figure 15) to produce the necessary trees (see Figure 20 for further details).

If the steps described in the previous section (Graph Theory Approach) are used, adding appropriate undirected edges to triangulate the moral graph, the clique trees obtained will be the same as those in Figure 20.

This shows that, without following the graph theory approach to identify the correct cliques, the new procedure is also capable of producing clique trees that correspond to a triangulated moral graph of a DAG.

Figure 15. An example of a DAG.

Figure 16. A converging connection neuronal network setup for Figure 15 with cliques: abce, abde, acde and bcde from the output nodes. Take note that we stop forming new converging connections beyond level 4 of the neuronal network setup.

Figure 17. An alternative way of observing how the cliques abce, abde, acde and bcde can also be identified for a converging connection neuronal network setup for Figure 15.

In Figure 17, we assume all child nodes at Level 2 “converge" (dotted line with directional arrows) to a “quasi-node" storing the information e. To obtain the required cliques, find all the necessary combinations of antecedents by selecting three variables from the set {a, b, c, d}.
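This selection can be checked directly; the snippet below simply enumerates the three-variable choices from {a, b, c, d} and appends e, reproducing the four cliques named in Figure 16.

from itertools import combinations

print(["".join(triple) + "e" for triple in combinations("abcd", 3)])
# ['abce', 'abde', 'acde', 'bcde']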

Figure 18. A diverging connection neuronal network setup for Figure 15 with cliques defg, defh, degh and dfgh from the output nodes. Take note that we stop forming new diverging connections beyond level 4 of the neuronal network setup.

Figure 19. An alternative way of observing how the cliques defg, defh, degh and dfgh can also be identified for a diverging connection neuronal network setup for Figure 15.

In Figure 19, we assume all the child nodes at Level 2 “converge" (dotted line with directional arrows) to a “quasi-node" storing the information d ⟹ {e, f, g, h}. To obtain the required cliques, find all necessary combinations of consequents by selecting three variables from the set {e, f, g, h}.

Figure 20. Four clique trees corresponding to Figure 15. The separators are marked on the edges.

Although the procedure using a simple neuronal network setup is capable of producing trees of cliques, in some cases the cliques obtained may not be able to form trees, because the connected graph of cliques has a cycle. In order for the join trees to be equivalent to those of the graph theory approach, we can “triangulate" the DAG by adding an edge connecting two non-consecutive vertices in the affected cliques (i.e. those having no separators between them).

We do this by adding two extra steps to the procedure discussed.

5. For nodes having converging (diverging) connections, join the variable antecedents (consequents) with an undirected edge in the DAG if they have no causal relationship. Add an undirected edge connecting two non-consecutive vertices in the affected cliques. Form causal relationship nodes for any two newly connected vertices in the DAG, and form converging or diverging connection child nodes using nodes from the affected cliques. If possible, form child nodes up to Level 4 of the neuronal network setup.

6. Identify the cliques from the output nodes and form join trees by connecting each clique to a clique, which shares the most vertices.

To illustrate the above steps of forming trees of cliques corresponding to a “triangulated" DAG, an example is shown below in Figure 21.

Figure 21. A DAG example showing that the connected graph of cliques found using Steps 1 – 4 of the neuronal network procedure can have a cycle.

Using the neuronal network procedure (steps 1 - 4), the cliques obtained are abc, bd, ce and def, and the connected graph formed from these cliques has a cycle, as illustrated in Figure 22. The affected cliques, which do not have any separators between them, are bd and ce.

Figure 22. A connected graph of cliques corresponding to Figure 21, which has a cycle (using steps 1 - 4 of the neuronal network procedure). The affected cliques are “bd” and “ce”
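A quick way to see the cycle is to connect the four cliques whenever they share at least one variable, as sketched below; a tree on four cliques would have only three links, so four links in a connected graph imply a cycle. The connection rule used here is an assumption made for illustration.

cliques = ["abc", "bd", "ce", "def"]

# Link two cliques whenever they share at least one variable.
links = [(x, y) for x in cliques for y in cliques
         if x < y and set(x) & set(y)]

print(links)       # [('abc', 'bd'), ('abc', 'ce'), ('bd', 'def'), ('ce', 'def')]
print(len(links))  # 4 links among 4 connected cliques, hence a cycle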

To avoid forming cycles in a connected graph of cliques, based on step 5 of the extended procedure, we can connect the vertices b and c, and also d and e, with undirected edges; in addition, we also connect, say, vertices b and e with a diagonal undirected edge (see Figure 23), since b and e are non-consecutive vertices in the DAG.

Figure 23. Adding two edges (dashed line) on the converging and diverging connections on the DAG, and also a diagonal edge (dashed line), joining two non-consecutive vertices b and e in the affected cliques of Figure 21.

Hence, using step 5 of the extended procedure, we obtain new nodes b ⟹ c, c ⟹ b, d ⟹ e, e ⟹ d, and also b ⟹ e and e ⟹ b. By connecting these nodes via converging or diverging connections with the existing nodes b ⟹ d and c ⟹ e, we obtain the required cliques given in Figure 24 below, with the join trees given in Figure 25.
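The two new cliques can be traced with the converging rule from the earlier merge sketch: nodes that share a consequent are combined, and the clique is the union of their variables. The pairings below follow the text (c ⟹ e with b ⟹ e, and b ⟹ d with e ⟹ d); the helper name converge is illustrative.

def converge(node1, node2):
    """Combine two nodes sharing the same consequent; return the clique's variables."""
    (ants1, cons), (ants2, _) = node1, node2
    return frozenset(ants1 | ants2 | cons)

b_to_e = (frozenset("b"), frozenset("e"))    # new node from the added edge between b and e
e_to_d = (frozenset("e"), frozenset("d"))    # new node from the added edge between d and e
c_to_e = (frozenset("c"), frozenset("e"))    # existing node c => e
b_to_d = (frozenset("b"), frozenset("d"))    # existing node b => d

print(sorted(converge(c_to_e, b_to_e)))      # ['b', 'c', 'e']  -> clique bce
print(sorted(converge(b_to_d, e_to_d)))      # ['b', 'd', 'e']  -> clique bde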

Of course, the join trees presented in Figure 25 are by no means exhaustive, as there are other ways to form clique trees. One such clique tree can be constructed using the cliques abd, ade, ace and def. In order for the neuronal network procedure to consistently find cliques that do not form cycles, we can always “triangulate" the DAG by first adding edges before joining the nodes in the usual converging or diverging connections to obtain the required cliques.

Figure 24. New cliques bce and bde for “triangulated" DAG in Figure 23.

Figure 25. Two types of clique trees that can be formed using neuronal network procedure corresponding to the DAG in Figure 21. The separators are marked on the edges.

This alternate strategy does not pose much of a problem to our procedure (steps 1 - 4) since it is still able to identify the required cliques based on the converging or diverging connections of the vertices in a “triangulated" DAG. On the other hand, we can also use the serial connection strategy (see Figure 7) in the neuronal network setup to form cliques from a “triangulated" DAG. However, in order to do so in a systematic way, there is a need to introduce an alternate procedure or even add a few more extra steps to the procedure discussed.

In this white paper, we have discussed the NeuraBASE neuronal network approach to building trees of cliques based on the causal relationships among the variables of a Bayesian network. Using a straightforward node-linking process, the NeuraBASE approach serves as a new model in which Bayesian networks can be constructed and understood in probabilistic reasoning systems.

References

[1] T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning. Data Mining, Inference and Prediction, Springer-Verlag, New York, (2001).

[2] D. Heckerman, A tutorial on learning with Bayesian networks, Technical Report MSR-TR-95-06, Microsoft Research, Advanced Technology Research, Microsoft Corporation, (1996).

[3] F.V. Jensen, Bayesian Networks and Decision Graphs, Springer-Verlag, New York, (2001).

[4] S. Lauritzen and D.J. Spiegelhalter, Local computations with probabilities on graphical structures and their applications to expert systems (with discussion), Journal of the Royal Statistical Society Series B, 50, pp. 157 - 224, (1988).

[5] J. Pearl, Probabilistic Inference in Intelligent Systems. Networks of Plausible Inference, Morgan Kaufmann Publishers, Inc., San Mateo, California, (1988).

[6] B.D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge, United Kingdom, (1996).

[7] S.J. Russell and P. Norvig, Artificial Intelligence. A Modern Approach, Prentice-Hall, Englewood Cliffs, New Jersey, (1995).

[8] R.E. Tarjan and M. Yannakakis, Simple linear-time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce acyclic hypergraphs, SIAM Journal of Computing, 13, pp. 566 - 579, (1984).

[9] R.G. Hercus, Neural Network with Learning and Expression Capability, U.S. Patent 20090119236, (2009).