Introduction to NeuraBASE: Forming Join Trees in Bayesian Networks
Robert Hercus, Choong-Ming Chin, Kim-Fong Ho

Introduction

A Bayesian network, or belief network, is a condensed graphical model of probabilistic relationships amongst a set of variables, and a popular representation for encoding uncertainty in expert systems. In general, Bayesian networks can be designed and trained to provide future classifications using a set of attributes derived from statistical data analysis. Other methods for learning Bayesian networks from data have also been developed (see Heckerman [2]); in some cases, these can also handle situations where data entries are missing. Bayesian networks are often associated with the notion of causality, and for a network to be considered a Bayesian network, the following requirements (see Jensen [3]) must hold:

● A set of random variables and a set of directed edges between variables must exist.
● Each variable must have a finite set of mutually exclusive states.
● The variables together with the directed edges must form a directed acyclic graph (DAG). (A directed graph is acyclic if there is no directed path A1 → … → An such that A1 = An.)
● For each variable A with parents B1, B2, …, Bn there is an attached potential table P(A | B1, B2, …, Bn).

The main objective of this paper is to provide an alternative method for finding clique trees, which are paramount for the efficient computation of probabilistic queries posed to Bayesian networks. For a simple Bayesian network, as shown in Figure 1, given the evidence that people are leaving, we may want to know the probability that a fire has occurred, or the probability that there is smoke, given that the alarm has gone off. Much of the research in Bayesian networks is focused on developing efficient algorithms for computing such probabilities and updating them whenever new evidence becomes available. There are currently two families of algorithms for computing these probabilities.
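Queries of this kind can be answered, at least for small networks, by brute-force enumeration over the joint distribution defined by the potential tables. The sketch below illustrates this; the network structure (fire → smoke, fire → alarm, alarm → leaving) and all the numbers are illustrative assumptions, not values taken from Figure 1.

```python
# A minimal Bayesian network as conditional probability tables, queried
# by brute-force enumeration. Structure and probabilities are assumed
# for illustration only.
from itertools import product

# parent lists define the DAG; each CPT stores P(var=True | parent values)
parents = {"fire": (), "smoke": ("fire",), "alarm": ("fire",), "leaving": ("alarm",)}
cpt = {
    "fire":    {(): 0.01},
    "smoke":   {(True,): 0.9,  (False,): 0.05},
    "alarm":   {(True,): 0.95, (False,): 0.02},
    "leaving": {(True,): 0.8,  (False,): 0.1},
}

def prob(var, value, assignment):
    # P(var=value | its parents' values under `assignment`)
    p = cpt[var][tuple(assignment[u] for u in parents[var])]
    return p if value else 1.0 - p

def joint(assignment):
    # chain rule: product of P(var | parents) over every variable
    result = 1.0
    for var in parents:
        result *= prob(var, assignment[var], assignment)
    return result

def posterior(query, evidence):
    # P(query=True | evidence) by summing the joint over all assignments
    free = [v for v in parents if v not in evidence and v != query]
    num = den = 0.0
    for qval in (True, False):
        for values in product((True, False), repeat=len(free)):
            a = dict(evidence, **dict(zip(free, values)), **{query: qval})
            p = joint(a)
            den += p
            if qval:
                num += p
    return num / den

print(posterior("fire", {"leaving": True}))
```

Enumeration is exponential in the number of variables, which is precisely why the clique-tree methods discussed next matter for larger networks.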
The first family of algorithms is attributed to Pearl [5] and to Russell and Norvig [7]. In those works, the research focused on polytrees: directed acyclic graphs (DAGs) with at most one path between any two variables, as illustrated in Figure 2.

Figure 1. A Bayesian network with six Boolean variables
Figure 2. An example of a polytree

The other family of algorithms follows the techniques of Lauritzen and Spiegelhalter [4] and applies to general DAGs. To compute the required probabilities, the DAG must first be converted into a tree of cliques, or join tree. For DAGs that are polytrees, techniques attributed to Pearl [6] show that there is no need to convert the DAG into a join tree before calculating probabilities on the Bayesian network. Join trees are therefore usually needed only when the Bayesian network cannot be expressed as a polytree; on the other hand, polytrees can still be converted into join trees and the required probabilities computed using the techniques of Lauritzen and Spiegelhalter [4]. In graph-theoretic terms, a join tree of cliques is a connected acyclic graph comprising connected nodes, where each node (or clique) is a collection of joined network variables. The current technique for forming a join tree uses a graph theory approach. This paper examines an alternative neuronal network modelling method based on Hercus [9], built on a simple concept that can be applied to model large-scale Bayesian network problems.

Graph Theory Approach

In the method of Lauritzen and Spiegelhalter [4], the first stage of creating clique trees is the conversion of the DAG into a moral graph, which is an undirected graph. An undirected graph is a collection of vertices and edges.
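The polytree property above has a convenient restatement in terms of undirected graphs: a DAG is a polytree exactly when its underlying undirected graph contains no cycle, i.e. there is at most one path between any two variables. A small sketch of that test (the function name and edge lists are mine, for illustration):

```python
# Test whether a DAG is a polytree: ignoring edge directions, adding an
# edge between two already-connected vertices would close a cycle.

def is_polytree(edges):
    """edges: iterable of directed (u, v) pairs forming a DAG."""
    parent = {}

    def find(x):
        # union-find representative lookup with path halving
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:      # u and v already connected (ignoring direction),
            return False  # so this edge creates an undirected cycle
        parent[ru] = rv
    return True

print(is_polytree([("a", "c"), ("b", "c"), ("c", "d")]))              # True
print(is_polytree([("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]))  # False: diamond
```

The second example is the classic "diamond" DAG: two directed paths from a to d, so the network needs the join-tree machinery below.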
In Bayesian networks, the variables correspond to the vertices, and if there is a causal connection between two variables, the pair is joined by an edge, indicated on a diagram by a line without an arrow. To form the moral (undirected) graph of a DAG, the following steps are used:

1. For each vertex in the DAG with two or more parents, add undirected edges joining the parents;
2. Replace all directed edges in the DAG with undirected ones.

To illustrate the procedure, an example of a DAG is shown in Figure 3. To form the moral graph shown in Figure 4, since vertices d, g and h each have two parents, a is first joined to b, d to e, and d to f with undirected edges, and all directed edges are then replaced by undirected ones.

Figure 3. An example of a DAG
Figure 4. The moral graph corresponding to Figure 3

A clique is a collection of vertices that are all connected to each other and to which no other vertex is completely connected, i.e. a maximal complete subgraph. The general requirement for forming a tree of cliques is that the moral graph be triangulated (chordal, or decomposable): there must be no cycle of four or more vertices without an edge joining two non-consecutive vertices. To triangulate a moral graph, extra edges are usually added between certain vertices. The following procedure, based on Tarjan and Yannakakis [8], triangulates the moral graph and forms the tree of cliques:

1. Number the vertices by a maximum cardinality search. Start anywhere, and always number next a vertex with the largest number of already-numbered neighbours.
2. Starting with the highest-numbered vertex, check that all the lower-numbered neighbours of each vertex are themselves neighbours. Add any missing edges to produce a triangulated graph.
3. Identify all cliques, and number them by the highest-numbered vertex in the clique.
4.
Form the join tree (a connected graph with no cycles) by connecting each clique to a predecessor (according to the numbering in step 3) that shares the most vertices with it. The variables that two neighbouring cliques have in common are called their separator.

The moral graph given in Figure 4 is already triangulated: in particular, the cycle through the four vertices a, b, d and e has a chord. Using the steps described above, we can begin the maximum cardinality search. Starting, for example, from a, one possible ordering is abdegfhc, and the cliques are C1 = abd, C2 = ade, C3 = deg, C4 = dfh and C5 = cf, with separators S2 = ad, S3 = de, S4 = d and S5 = f. There are three choices for forming the join tree, as C4 can be joined to any of C1, C2 or C3. Figure 5 shows the resulting clique trees, with the separators marked on the edges.

From steps 1 and 2, we can see that the technique for identifying the cliques of a Bayesian network is somewhat subjective: different orderings or numberings of the vertices of the moral graph will tend to produce different trees of cliques. Hence, to find a join tree different from the one already found, the entire procedure must be repeated. This is especially true for more complex DAGs, such as those in Figure 15 and Figure 21.

Figure 5. Three clique trees corresponding to Figure 4, with the separators marked on the edges.

In the next section, we look into an alternative strategy for forming various join trees by following the same procedure once, with steps that are less subjective than the graph theory approach.

NeuraBASE Neuronal Network Approach

In this section, we discuss a technique for forming join trees of Bayesian networks using the neuronal network approach described in Hercus [9].
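The whole pipeline above (moralisation, maximum cardinality search, fill-in triangulation, clique identification, join-tree formation) can be sketched compactly. The DAG below is a reconstruction from the running example: d, g and h are given the parent pairs implied by Figures 3 and 4, and a → e supplies the a, d, e clique. Under this reconstruction, starting the search at a reproduces the ordering abdegfhc and the cliques C1 to C5; ties between predecessors are broken by taking the first, which is one of the valid choices noted above.

```python
from itertools import combinations

# DAG reconstructed from the running example: d, g and h each have two parents.
dag = {"a": [], "b": [], "c": [], "d": ["a", "b"],
       "e": ["a"], "f": ["c"], "g": ["d", "e"], "h": ["d", "f"]}

def moralise(dag):
    """Marry each vertex's parents (step 1), then drop directions (step 2)."""
    nbrs = {v: set() for v in dag}
    for child, pars in dag.items():
        for p in pars:                      # undirect each DAG edge
            nbrs[child].add(p); nbrs[p].add(child)
        for p, q in combinations(pars, 2):  # join the parents
            nbrs[p].add(q); nbrs[q].add(p)
    return nbrs

def max_cardinality_order(nbrs, start):
    """Step 1: repeatedly number a vertex with the most numbered neighbours."""
    order = [start]
    while len(order) < len(nbrs):
        rest = [v for v in nbrs if v not in order]
        order.append(max(rest, key=lambda v: len(nbrs[v] & set(order))))
    return order

def triangulate(nbrs, order):
    """Step 2: make each vertex's lower-numbered neighbours pairwise adjacent."""
    pos = {v: i for i, v in enumerate(order)}
    for v in reversed(order):
        lower = [u for u in nbrs[v] if pos[u] < pos[v]]
        for p, q in combinations(lower, 2):
            nbrs[p].add(q); nbrs[q].add(p)
    return nbrs

def cliques_and_join_tree(nbrs, order):
    """Steps 3-4: maximal cliques, numbered by highest vertex, each joined
    to the predecessor sharing the most vertices (separator)."""
    pos = {v: i for i, v in enumerate(order)}
    cand = [frozenset({v} | {u for u in nbrs[v] if pos[u] < pos[v]})
            for v in order]
    cliques = sorted({c for c in cand if not any(c < o for o in cand)},
                     key=lambda c: max(pos[v] for v in c))
    tree = []
    for i in range(1, len(cliques)):
        pred = max(range(i), key=lambda j: len(cliques[i] & cliques[j]))
        tree.append((cliques[pred], cliques[i], cliques[pred] & cliques[i]))
    return cliques, tree

moral = moralise(dag)
order = max_cardinality_order(moral, "a")
tri = triangulate(moral, order)
cliques, tree = cliques_and_join_tree(tri, order)
for p, c, sep in tree:
    print(sorted(p), "--", sorted(sep), "--", sorted(c))
```

Because the moral graph of the example is already triangulated, `triangulate` adds no fill-in edges here; on a non-chordal moral graph it would.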
Rather than relying on a built-in unsupervised learning approach, the neuronal network nodes are built from the causal relationships between the variables inherited from the Bayesian network problem. Following the terminology of association rule analysis (see Hastie et al. [1], pp. 439-444), if there is a causal relationship between the network variables A and B in the direction A → B, with A ≠ B, the neuronal network setup uses the notation

A ⟹ B

where A is called the "antecedent" and B the "consequent".

In essence, the neuronal network is formed by linking two parent nodes at Level 1 of the network (variables from the causal probability Bayesian network) to form an association at Level 2, i.e. a child node representing the causal relationship between the two variables. This new node then encapsulates the information of both its parents and, in turn, forms links with other nodes if it has a causal relationship with other variables. For a visual explanation of how the neuronal network works, see Figure 6 below.

Figure 6. An example of a linking process to form causal relationships encapsulated in the child node A → B, and also a child node B → A, at Level 2. Both parents A and B are at Level 1 of the neuronal network framework.

The central point of this technique, shown in Figure 6, is the directional associations from the parent nodes (representing variables in a Bayesian network) to the child node. The child node is viewed as a representation of the causal relationship between its parent variables. Instead of directly associating the relationship of both parents onto the child node, the proposed neuronal network strategy works one step at a time. For example, in Figure 6, if there is a causal relationship A → B in the network, the node A → B is formed by first linking variable A to the child node with a directional association, followed by another directional association from the child node to B.
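The two-step linking process can be sketched as follows. This is a hypothetical illustration of the idea only: the class, method names and link representation are mine, not the NeuraBASE API.

```python
# A hypothetical sketch of the Level-1/Level-2 linking idea: two parent
# nodes (network variables) are joined, one directional association at a
# time, into a Level-2 child node that encapsulates their relationship.

class Node:
    def __init__(self, name, level=1, parents=()):
        self.name = name
        self.level = level            # 1 = variable, 2 = causal relationship
        self.parents = tuple(parents)
        self.links = []               # outgoing directional associations

def link(antecedent, consequent):
    """Form the child node antecedent -> consequent one step at a time:
    first a directional association from the antecedent to the child,
    then one from the child to the consequent."""
    child = Node(f"{antecedent.name}->{consequent.name}",
                 level=2, parents=(antecedent, consequent))
    antecedent.links.append(child)    # step 1: antecedent to child node
    child.links.append(consequent)    # step 2: child node to consequent
    return child

A, B = Node("A"), Node("B")
ab = link(A, B)   # encapsulates A => B, as in Figure 6
ba = link(B, A)   # the reverse association B => A
print(ab.name, ab.level)   # A->B 2
```

A Level-2 node such as `ab` can itself serve as a parent in later links, which is how larger structures are built up from pairwise causal relationships.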