Extracting Decision Trees from Diagnostic Bayesian Networks to Guide Test Selection
Total Page:16
File Type:pdf, Size:1020Kb
Extracting Decision Trees from Diagnostic Bayesian Networks to Guide Test Selection Scott Wahl 1, John W. Sheppard 1 1 Department of Computer Science, Montana State University, Bozeman, MT, 59717, USA [email protected] [email protected] ABSTRACT procedure is a natural extension of the general process In this paper, we present a comparison of five differ- used in troubleshooting systems. Given no prior knowl- ent approaches to extracting decision trees from diag- edge, tests are performed sequentially, continuously nar- nostic Bayesian nets, including an approach based on rowing down the ambiguity group of likely faults. Re- the dependency structure of the network itself. With this sulting decision trees are called “fault trees” in the sys- approach, attributes used in branching the decision tree tem maintenance literature. are selected by a weighted information gain metric com- In recent years, tools have emerged that apply an al- puted based upon an associated D-matrix. Using these ternative approach to fault diagnosis using diagnostic trees, tests are recommended for setting evidence within Bayesian networks. One early example of such a net- the diagnostic Bayesian nets for use in a PHM applica- work can be found in the creation of the the QMR knowl- tion. We hypothesized that this approach would yield edge base (Shwe et al., 1991), used in medical diagno- effective decision trees and test selection and greatly re- sis. Bayesian networks provide a means for incorpo- duce the amount of evidence required for obtaining ac- rating uncertainty into the diagnostic process; however, curate classification with the associated Bayesian net- Bayesian networks by themselves provide no guidance works. The approach is compared against three alter- on which tests to perform when. Rather, test information natives to creating decision trees from probabilistic net- is applied as evidence whenever it becomes available. In works such as ID3 using a dataset forward sampled from this paper, we compare approaches to using Bayesian the network, KL-divergence, and maximum expected networks to derive decision trees to integrate the ad- utility. In addition, the effects of using χ2 statistics and vantage of each approach.1 Our approach weights the probability measures for pre-pruning are examined. The classes based on prior probability distributions and then results of our comparison indicate that our approach pro- derives a weighted decision tree using the associated D- vides compact decision trees that lead to high accuracy matrix characterizing the structure of the Bayesian net- classification with the Bayesian networks when com- work. We hypothesize that this method will yield com- pared to trees of similar size generated by the other meth- pact decision trees (thus reducing the amount of evidence ods, thus supporting our hypothesis. required to be evaluated) that result in high classification accuracy relative to alternative methods that we evalu- ated. 1. INTRODUCTION Historically, creating decision trees is usually based Proper support of a system is wisely regarded as vital around a static set of data (Casey & Nagy, 1984; Heck- to its function. In general, support of a system involves erman, Geiger, & Chickering, 1995; Martin, 1997; both corrective and preventative maintenance. The pri- Murthy, 1997; Quinlan, 1986). It has been noted by such mary goal of corrective maintenance is to repair faults previous work that information theory and other methods while preventative maintenance attempts to avoid faults are based upon the assumption that the data set accu- or improve the useful lifetime of parts. Satisfying these rately represents the true underlying distribution of the goals requires isolating what faults have occurred or are data. To completely do so requires an infinitely large most likely to occur. data set. However, by using diagnostic Bayesian net- Decision trees have been used extensively in perform- works directly, it is possible to use the distributions and ing fault diagnosis during corrective maintenance. This information provided by the network to create a decision tree directly. Without adequate stopping criteria, how- This is an open-access article distributed under the terms of ever, the resulting trees are likely to be extraordinarily the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction 1While this paper focuses on deriving static decision trees, in any medium, provided the original author and source are the algorithms used can be adapted easily to an online setting credited. where tests are recommended dynamically. 1 Annual Conference of the Prognostics and Health Management Society, 2010 large. Another potential method for creating decision of these measures are used directly in the generation of trees from probabilistic networks is based upon using the decision trees in this paper. Specifically, information the structure of the network. The work here uses a spe- theory is used in generating the decision tree from a D- cialized adjacency matrix used for diagnostic networks, Matrix and by forward sampling the diagnostic network. called a D-matrix, to accomplish this task. Much of the prior work in creating decision trees In describing the results of our study, this paper has and test selection from Bayesian networks focuses on been organized as follows. Section 2 provides motiva- reducing the expected cost of performing a series of tion behind the current study. Section 3 describes related actions, both tests and repairs (Heckerman, Breese, & work with decision trees and Bayesian networks. Sec- Rommelse, 1995; Mussi, 2004; Skaanning, Jensen, & tion 4 provides background information on probabilistic Kjærulff, 2000; Jensen et al., 2001; Vomlel, 2003). In models and decision trees. Section 5 specifies the ap- the SACSO system, tests and actions are recommended proach used in creating the decision trees. Sections 6 based upon the expected cost of taking any action or per- and 7 give the results and discussion, respectively. Fi- forming of tests using the probability the problem will nally, we conclude in section 8. be corrected and a cost function. Given the nature of the problem, the SACSO system relies on a set of networks 2. MOTIVATION where each reported problem, e.g. spots on a page, is the root of its own tree. A similar approach was used Recent trends in developing diagnostic tools have fo- by Mussi for the generation of sequential decision sup- cused on providing online methods for selecting tests to port systems. These approaches are similar to what is evaluate and on incorporating uncertainty into the rea- defined by Koller for measures of maximum expected soning process. Legacy diagnostic systems rely on static utility (MEU) (Koller & Friedman, 2009). Given that fault trees to guide the test and diagnosis process. The performing the inference for such measures is intractable primary issue with using static trees is that they are not for large problems, Zheng, Rish, and Beygelzimer devel- able to adapt to changing conditions such as the loss of a oped an algorithm for calculating approximate entropy test resource. Even so, evaluating the performance of an (Zheng, Rish, & Beygelzimer, 2005). adaptive system is difficult in that the reasons for select- Use of probabilities in creating decision trees has been ing certain sources of evidence may be unclear. analyzed before by Casey and Nagy for use in optical Our research is focused on developing Bayesian diag- character recognition (Casey & Nagy, 1984). Based nostic systems that can be assessed through sound em- upon a data set, the probability of a single pixel having pirical analysis. In an attempt to reduce the amount of a specific color was determined for each possible letter. evidence to be evaluated and maintain control, we found This information was then used in creating the decision it beneficial to use static decision trees in the evaluation tree. process. Such trees can be used to control the amount Additional work has been performed in optimal cre- of evidence evaluated, can be examined to justify the ation of AND/OR decision trees. Research by Patti- test choices, and can be applied consistently across all pati and Alexandridis has shown that using a heuristic methods studied. Unfortunately, as we started looking search with AND/OR graphs and the Gini approach can for existing approaches to generate decision trees to be be used to create optimal and near-optimal test sequences used with Bayesian networks, we found very little direct, for problems (Pattipati & Alexandridis, 1990). comparative work had been done in this area. There- fore, we sought to evaluate alternative approaches with There has also been research performed in extract- the goal of finding one that had minimal computational ing rules and decision trees from neural networks in an burden (both in generating the tree and in using the tree) attempt to provide context to learned neural networks and still yielded accurate results once evidence was eval- (Craven. & Shavlik, 1996). Additionally, combining the uated. two approaches, decision trees and Bayesian networks, has been performed by Zhou, Zheng, and Chen (Zhou, In order to do so, three existing methods were se- Zhang, & Chen, 2006). Their research associated a prior lected based upon information theory and value of in- distribution to the leaves of the decision tree and uti- formation: forward sampling and ID3, maximum ex- lized a Monte-Carlo method to perform inference.