Investigating the Use of Bayesian Networks for Small Dataset Problems Anastacia Maria Macallister Iowa State University

Iowa State University Capstones, Theses and Graduate Theses and Dissertations Dissertations 2018 Investigating the use of Bayesian networks for small dataset problems Anastacia Maria Macallister Iowa State University Follow this and additional works at: https://lib.dr.iastate.edu/etd Part of the Computer Engineering Commons Recommended Citation Macallister, Anastacia Maria, "Investigating the use of Bayesian networks for small dataset problems" (2018). Graduate Theses and Dissertations. 16629. https://lib.dr.iastate.edu/etd/16629 This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected]. Investigating the use of Bayesian networks for small dataset problems by Anastacia M. MacAllister A dissertation submitted to the graduate faculty in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Majors: Mechanical Engineering; Human Computer Interaction Program of Study Committee: Eliot Winer, Major Professor Eric Davis-Rozier Stephen Gilbert James Oliver Adarsh Krishnamurthy The student author, whose presentation of the scholarship herein was approved by the program of study committee, is solely responsible for the content of this dissertation. The Graduate College will ensure this dissertation is globally accessible and will not permit alterations after a degree is conferred. Iowa State University Ames, Iowa 2018 Copyright © Anastacia M. MacAllister, 2018. All rights reserved. ii TABLE OF CONTENTS ABSTRACT ............................................................................................................................ iv CHAPTER 1: INTRODUCTION .................................................................................................. 1 Supervised Machine Learning ........................................................................................................ 4 Linear Classification .................................................................................................................. 5 Decision Trees ........................................................................................................................... 5 Support vector machines (SVM) ................................................................................................ 7 Neural networks ....................................................................................................................... 8 Bayesain networks .................................................................................................................... 9 Motivation .................................................................................................................................. 11 Dissertation Organization ............................................................................................................ 12 CHAPTER 2: BACKGROUND .................................................................................................. 13 Bayesian Network Applications ................................................................................................... 14 Bayesain networks for small sample sizes................................................................................ 16 Bayesian Statistics ....................................................................................................................... 19 Bayesian Networks ...................................................................................................................... 21 Research Issues ........................................................................................................................... 29 CHAPTER 3: MANIPULATING PRIOR PROBABILITES TO IMPROVE THE PERFROMANCE OF BAYESAIN NETWORKS WITH SMALL DATASETS ................................................................... 32 Abstract ...................................................................................................................................... 32 Introduction ................................................................................................................................ 33 Background ................................................................................................................................. 36 Bayesian Statistics ................................................................................................................... 36 Bayesian Networks.................................................................................................................. 37 Methods ..................................................................................................................................... 42 Data Collection and Processing ............................................................................................... 42 Constructing the Bayesian Network ........................................................................................ 44 Training the Bayesian Network................................................................................................ 46 Prior Manipulation Using Particle Swarm Optimization (PSO) .................................................. 47 Results and Discussion ................................................................................................................ 49 Manual Manipulation of Prior Probabilities ............................................................................. 49 PSO Optimization of Prior Probabilities ................................................................................... 54 Conclusion and Future Work ....................................................................................................... 58 References .................................................................................................................................. 60 iii CHAPTER 4: USING HIGH-FIDELITY MEATA-MODELS TO IMPROVE PERFORMANCE OF SMALL DATASET TRAINED BAYESAIN NETWORKS ................................................................ 64 Abstract ...................................................................................................................................... 64 Introduction ................................................................................................................................ 65 Background ................................................................................................................................. 67 Bayesian Statistics ................................................................................................................... 68 Bayesian Networks.................................................................................................................. 69 Methodology .............................................................................................................................. 70 Data Collection and Processing ............................................................................................... 70 Data Generation ..................................................................................................................... 72 Kriging .................................................................................................................................... 72 Radial Basis Functions (RBF) .................................................................................................... 75 Network Construction ............................................................................................................. 77 Training the Bayesian Networks .............................................................................................. 78 Prior Manipulation Using Particle Swarm Optimization (PSO) .................................................. 79 Results and Discussion ................................................................................................................ 81 Conclusions and Future Work ...................................................................................................... 91 References .................................................................................................................................. 93 CHAPTER 5: CONCLUSIONS AND FUTURE WORK ................................................................. 98 REFERENCES ...................................................................................................................... 101 iv ABSTRACT Benefits associated with machine learning are extensive. Industry is increasingly beginning to recognize the wealth of information stored in the data they are collecting. To sort through and analyze all of this data specialized tools are required to come up with actionable strategies. Often this is done with supervised machine learning algorithms. While these algorithms can be extremely powerful data analysis tools, they require considerable understanding, expertise, and a significant amount of data to use. Selecting the appropriate data analysis method is important to coming up with valid strategies based on the collected data. In addition, a characteristic of machine learning is the need to have large amounts of data to train a system’s behavior. Large quantities of data, thousands to millions of data points, ensure that automated

Investigating the Use of Bayesian Networks for Small Dataset Problems Anastacia Maria Macallister Iowa State University

Foundations of Bayesian Learning from Synthetic Data

Synthpop: Bespoke Creation of Synthetic Data in R

Other Departments and Institutes Courses 1

Synthetic Data Generation with Probabilistic Bayesian Networks

Fidelity and Privacy of Synthetic Medical Data Review of Methods and Experimental Results

Federated Principal Component Analysis

Synthetic Data Sharing and Estimation of Viable Dynamic Treatment Regimes with Observational Data

Benchmarking Unsupervised Outlier Detection with Realistic Synthetic Data

Anomaly Detection Based on Wavelet Domain GARCH Random Field Modeling Amir Noiboar and Israel Cohen, Senior Member, IEEE

SYNC: a Copula Based Framework for Generating Synthetic Data from Aggregated Sources

Optimal Principal Component Analysis of STEM XEDS Spectrum Images Pavel Potapov1,2* and Axel Lubk2

A Dissertation Entitled Big and Small Data for Value Creation And