
University of South Carolina Scholar Commons

Theses and Dissertations

Fall 2019

Properties, Learning Algorithms, and Applications of Chain Graphs and Bayesian Hypergraphs

Mohammad Ali Javidian

Follow this and additional works at: https://scholarcommons.sc.edu/etd

Part of the Computer Sciences Commons, and the Engineering Commons

Recommended Citation: Javidian, M. A. (2019). Properties, Learning Algorithms, and Applications of Chain Graphs and Bayesian Hypergraphs. (Doctoral dissertation). Retrieved from https://scholarcommons.sc.edu/etd/5558

This Open Access Dissertation is brought to you by Scholar Commons. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of Scholar Commons. For more information, please contact [email protected].

Properties, Learning Algorithms, and Applications of Chain Graphs and Bayesian Hypergraphs

by Mohammad Ali Javidian

Bachelor of Science Shahid Bahonar University of Kerman 2003

Master of Science Sharif University of Technology 2013

Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Computer Science and Engineering
College of Engineering and Computing
University of South Carolina
2019
Accepted by:
Marco Valtorta, Major Professor
Pooyan Jamshidi, Committee Member
Linyuan Lu, Committee Member
Chin-Tser Huang, Committee Member
John Rose, Committee Member
Cheryl L. Addy, Vice Provost and Dean of the Graduate School
© Copyright by Mohammad Ali Javidian, 2019

All Rights Reserved.

Dedication

To Marco Valtorta, my dearest professor.

Abstract

Probabilistic graphical models (PGMs) use graphs, either undirected, directed, or mixed, to represent possible dependencies among the variables of a multivariate probability distribution. PGMs, such as Bayesian networks and Markov networks, are now widely accepted as a powerful and mature framework for reasoning and decision making under uncertainty in knowledge-based systems. With the increase of their popularity, the range of graphical models being investigated and used has also expanded. Several types of graphs with different conditional independence interpretations - also known as Markov properties - have been proposed and used in graphical models. The graphical structure of a Bayesian network has the form of a directed acyclic graph (DAG), which has the advantage of supporting an interpretation of the graph in terms of cause-effect relationships. However, a limitation is that only asymmetric relationships, such as cause and effect relationships, can be modeled between variables in a DAG. Chain graphs, which admit both directed and undirected edges, can be used to overcome this limitation. Today there exist three main different interpretations of chain graphs in the literature. These are the Lauritzen-Wermuth-Frydenberg, the Andersson-Madigan-Perlman, and the multivariate regression interpretations. In this thesis, we study these interpretations based on their separation criteria and the intuition behind their edges. Since structure learning is a critical component in constructing an intelligent system based on a chain graph model, we propose new feasible and efficient structure learning algorithms to learn chain graphs from data under the faithfulness assumption. The proliferation of different PGMs that allow factorizations of different kinds leads us to consider a more general graphical structure in this thesis, namely directed acyclic hypergraphs.

Directed acyclic hypergraphs are the graphical structure of a new probabilistic graphical model that we call Bayesian hypergraphs. Since there are many more hypergraphs than DAGs, undirected graphs, chain graphs, and, indeed, other graph-based networks, Bayesian hypergraphs can model much finer factorizations and thus are more computationally efficient. Bayesian hypergraphs also allow a modeler to represent causal patterns of interaction such as Noisy-OR graphically (without additional annotations). We introduce global, local and pairwise Markov properties of Bayesian hypergraphs and prove under which conditions they are equivalent. We also extend the causal interpretation of LWF chain graphs to Bayesian hypergraphs and provide corresponding formulas and a graphical criterion for intervention. The framework of graphical models, which provides algorithms for discovering and analyzing structure in complex distributions to describe them succinctly and extract unstructured information, allows them to be constructed and utilized effectively. Two of the most important applications of graphical models are causal inference and information extraction. To address these abilities of graphical models, we conduct a causal analysis, comparing the performance behavior of highly-configurable systems across environmental conditions (changing workload, hardware, and software versions), to explore when and how causal knowledge can be commonly exploited for performance analysis.

Table of Contents

Dedication ...... iii

Abstract ...... iv

List of Tables ...... ix

List of Figures ...... xi

Chapter 1 Introduction ...... 1

Chapter 2 LWF Chain Graphs ...... 8

2.1 Basic Definitions and Concepts ...... 9

2.2 On the Properties of LWF Chain Graphs ...... 11

2.3 Finding Minimal Separators in LWF Chain Graphs ...... 13

2.4 Efficient Learning of LWF Chain Graphs under the Faithfulness Assumption 21

Chapter 3 MVR Chain Graphs ...... 42

3.1 Basic Definitions and Concepts ...... 43

3.2 On the Properties of MVR Chain Graphs ...... 48

3.3 Finding Minimal Separators in MVR Chain Graphs ...... 65

3.4 Order-Independent Structure Learning of Multivariate Regression Chain Graphs ...... 74

3.5 A Decomposition-Based Algorithm for Structure Learning of MVR Chain Graphs ...... 93

3.6 Discussion and Conclusion ...... 117

Chapter 4 AMP Chain Graphs ...... 120

4.1 Basic Definitions and Concepts ...... 121

4.2 On the Properties of AMP Chain Graphs ...... 124

4.3 Finding Minimal Separators in AMP Chain Graphs ...... 126

4.4 Order-Independent Structure Learning of AMP Chain Graphs ...... 134

4.5 A Decomposition-Based Algorithm for Learning the Structure of AMP CGs 144

4.6 Discussion and Conclusion ...... 161

Chapter 5 Bayesian Hypergraphs ...... 166

5.1 Basic Definitions and Concepts ...... 168

5.2 Markov properties for undirected graphs ...... 177

5.3 Bayesian Hypergraphs ...... 179

5.4 Intervention in Bayesian hypergraphs ...... 193

Chapter 6 Causal Transfer Learning ...... 205

6.1 Introduction ...... 205

6.2 Causal Graphs ...... 208

6.3 Research Questions and Methodology ...... 210

6.4 Identification of Causal Effects (RQ1) ...... 212

6.5 Transportability of Causal and Statistical Relations Across Environments (RQ2) ...... 215

6.6 Generalizing Statistical Findings Across Sampling Conditions (RQ3) ...... 217

6.7 Threats to Validity ...... 218

Chapter 7 Concluding Remarks ...... 220

Bibliography ...... 225

Appendix A Proofs of Correctness of the Theorems and Algorithms in Section 2.4 ...... 239

Appendix B Proof of Theorem 3.10 ...... 243

Appendix C Proofs of Theoretical Results in Section 3.5 ...... 247

Appendix D Proofs of Correctness of Algorithms 11 and 12 ...... 253

Appendix E Proofs of Correctness of the Algorithms in Sections 4.4 and 4.5 ...... 255

List of Tables

Table 2.1 The trace table of Algorithm 4 for i = 3 and order1(V) = (d, e, a, c, b). . . 29

Table 2.2 The trace table of Algorithm 4 for i = 3 and order2(V) = (d, c, e, a, b). . . 29

Table 2.3 Results for discrete samples from the ASIA, INSURANCE, ALARM, and HAILFINDER networks respectively. Each row corresponds to the significance level: α = 0.05/0.01/0.005 respectively...... 41

Table 3.1 The trace table of Algorithm 9 for i = 3 and order1(V) = (d, e, a, c, b). . . 79

Table 3.2 The trace table of Algorithm 9 for i = 3 and order2(V) = (d, c, e, a, b). . . 79

Table 3.3 Order-dependence issues and corresponding modifications of the PC-like algorithm that remove the problem. "Yes" indicates that the corresponding aspect of the graph is estimated order-independently in the sample version. ...... 88

Table 3.4 Results for discrete samples from the ASIA network. Each row corresponds to the significance level: α = 0.05/0.01/0.005 respectively. ...... 115

Table 3.5 Results for discrete samples from the INSURANCE network. Each row corresponds to the significance level: α = 0.05/0.01/0.005 respectively. ...... 116

Table 3.6 Results for discrete samples from the ALARM network. Each row corresponds to the significance level: α = 0.05/0.01/0.005 respectively. ...... 116

Table 3.7 Results for discrete samples from the HAILFINDER network. Each row corresponds to the significance level: α = 0.05/0.01/0.005 respectively. ...... 117

Table 4.1 The trace table of Algorithm 17 for i = 1 and order1(V) = (d, c, b, a, e). For simplicity, we define ADJ_H(u) := [ad_H(u) ∪ ad_H(ad_H(u))] \ {u, v}. ...... 138

Table 4.2 The trace table of Algorithm 17 for i = 1 and order2(V) = (d, e, a, c, b). For simplicity, we define ADJ_H(u) := [ad_H(u) ∪ ad_H(ad_H(u))] \ {u, v}. ...... 138

Table 4.3 The trace table of Algorithm 18 for i = 1, order1(V) = (d, c, b, a, e), and order2(V) = (d, e, a, c, b). For simplicity, we define ADJ_H(u) := [ad_H(u) ∪ ad_H(ad_H(u))] \ {u, v}. ...... 142

Table 4.4 Order-dependence issues and corresponding modifications of the PC-like algorithm that remove the problem. "Yes" indicates that the corresponding aspect of the graph is estimated order-independently in the sample version. ...... 144

Table 4.5 Results for discrete samples from the ASIA, INSURANCE, ALARM, and HAILFINDER networks respectively. Each row corresponds to the significance level: α = 0.05/0.01/0.005 respectively. ...... 162

Table 5.1 The conditional probability distribution P(C | A, B) for a model with: (a) noisy functional dependence (Noisy-OR), and therefore P(C = n | A = y, B = y) = P(C = n | A = y) P(C = n | B = y); (b) non-noisy functional dependence. ...... 167

Table 5.2 Factorizations and corresponding BH representations ...... 185

Table 6.1 Overview of the real-world subject systems ...... 213

Table 7.1 Properties of chain graphs under different interpretations ...... 223

Table 7.2 List of publications for each chapter ...... 224

List of Figures

Figure 2.1 Test for minimal separation in LWF CGs ...... 16

Figure 2.2 Finding restricted separation in LWF CGs ...... 19

Figure 2.3 The procedure of learning the structure of an LWF chain graph from a faithful distribution...... 21

Figure 2.4 (a) The DAG G, (b) the skeleton returned by Algorithm 4 with order1(V), (c) the skeleton returned by Algorithm 4 with order2(V). . . 28

Figure 2.5 (a) The DAG G, (b) the CG returned after the complex recovery phase of Algorithm 4 with order1(V), (c) the CG returned after the complex recovery phase of Algorithm 4 with order3(V)...... 30

Figure 2.6 Performance of the original and stable LCD and PC-like algorithms for randomly generated Gaussian chain graph models: average over 30 repetitions with 50 variables, expected degree N = 3, and signif- icance levels α = 0.05, 0.005...... 38

Figure 2.7 Performance of the original and stable LCD and PC-like algorithms for randomly generated Gaussian chain graph models: average over 30 repetitions with 100 variables, expected degree N = 3, and sig- nificance levels α = 0.05, 0.005...... 39

Figure 3.1 (Lauritzen and Richardson, 2002) A nonmaximal ancestral graph. . . . . 46

Figure 3.2 An MVR CG with chain components: T = {T1 = {a, b}, T2 = {c, d}, T3 = {e, f}, T4 = {g, h}}. ...... 47

Figure 3.3 (a) A DAG including a hidden variable H for an unobserved variable, (b) the independence structure encoded by the MVR CG. ...... 49

Figure 3.4 A gene and disease example with MVR CG representation, BN representation and MN representation (Sonntag and Peña, 2015b). ...... 50

Figure 3.5 An MVR CG with chain components: T = {T1 = {1, 2, 3, 4}, T2 = {5, 6}, T3 = {7}}. ...... 55

Figure 3.6 An MVR chain graph with chain components: T = {{a, b, c}, {d, e, f}}. ...... 59

Figure 3.7 Finding minimal separators in a maximal ancestral graph. ...... 69

Figure 3.8 Finding restricted separator in a maximal ancestral graph...... 72

Figure 3.9 The Rules (Sonntag and Peña, 2012) ...... 75

Figure 3.10 (a) The DAG G, (b) the skeleton returned by Algorithm 9 with order1(V), (c) the skeleton returned by Algorithm 9 with order2(V). . . 78

Figure 3.11 (a) The DAG G, (b) the CG returned after the v-structure recovery phase of Algorithm 9 with order1(V), (c) the CG returned after the v-structure recovery phase of Algorithm 9 with order3(V)...... 80

Figure 3.12 Possible mixed graph after v-structure recovery phase of the sample version of the PC-like algorithm...... 80

Figure 3.13 The first two rows show the performance of the original (OPC) and stable PC-like (SPC) algorithms for randomly generated Gaussian chain graph models: average over 30 repetitions with 50 variables correspond to N = 2, and the significance level α = 0.001. The last two rows show the performance of the original (OPC) and stable PC-like (SPC) algorithms for randomly generated Gaussian chain graph models: average over 30 repetitions with 1000 variables cor- respond to N = 2, sample size S=50, and the significance level α = 0.05, 0.01, 0.005, 0.001...... 92

Figure 3.14 The procedure of learning the structure of an essential MVR chain graph from a faithful distribution...... 93

Figure 3.15 (a) An MVR CG G. (b) The augmented graph Ga, which is also a triangulated graph Gt...... 96

Figure 3.16 An m-separation tree. ...... 97

Figure 3.17 Construction of the m-separation tree. (a) An MVR CG. (b) Domain knowledge of associations. (c) The undirected graph and triangulation. (d) The m-separation tree T. ...... 102

Figure 3.18 Insufficiency of having a hypergraph that contains both u and its parent set for every u ∈ V. (a) An MVR CG. (b) Domain knowledge of associations. (c) The undirected graph constructed by union of complete graphs corresponding to each hyperedge, which is also a triangulated graph. (d) The junction tree T. (e) Local skeleton for every node of T. (f) The global skeleton and all v-structures. ...... 103

Figure 3.19 Insufficiency of having a hypergraph that contains both u and its boundary set for every u ∈ V. (a) An MVR CG. (b) Domain knowledge of associations. (c) The undirected graph constructed by union of complete graphs corresponding to each hyperedge, which is also a triangulated graph. (d) The junction tree T, which is not an m-separation tree. ...... 104

Figure 3.20 (a) Local skeletons for every node of T. (b) The global skeleton and all v-structures. ...... 104

Figure 3.21 The Rules (Sonntag and Peña, 2012) ...... 104

Figure 3.22 Error measures of the decomposition-based and PC-like algorithms for randomly generated Gaussian chain graph models: average over 25 repetitions with 30 variables. The four rows correspond to N = 2 and 8. The three columns give three error measures: TPR, FPR and ACC in each setting respectively. In each plot, the solid (blue)/dashed (green)/dotted (red) lines correspond to significance levels α = 0.05/0.01/0.005...... 112

Figure 3.23 Error measure SHD of the decomposition-based and PC-like algo- rithms for randomly generated Gaussian chain graph models: av- erage over 25 repetitions with 30 variables. The first row corre- spond to N = 2, the second row correspond to N=5, and the third row correspond to N=8. The first two columns correspond to the essential recovery while the last two columns correspond to the minimum bidirected recovery respectively. In each plot, the solid (blue)/dashed (green)/dotted (red) lines correspond to significance levels α = 0.05/0.01/0.005...... 113

Figure 3.24 Running times of the decomposition-based and PC-like algorithms for randomly generated Gaussian chain graph models: average over 25 repetitions with 30 variables correspond to N = 2. The first two columns correspond to the essential recovery algorithm while the last two columns correspond to the minimum bidirected recovery re- spectively. In each plot, the solid (blue)/dashed (green)/dotted (red) lines correspond to significance levels α = 0.05/0.01/0.005...... 114

Figure 4.1 (a) Triplexes and (b) the corresponding augmented triplex; (c) the four configurations that define the bi-flag; (d) the corresponding augmented bi-flag. The "?" indicates that either X − Y ∈ G, X → Y ∈ G, Y → X ∈ G, or X and Y are not adjacent in G. ...... 123

Figure 4.2 (a) The AMP CG G, (b) An(X ∪ Y ∪ A), (c) the undirected edges in Co(An(X ∪ Y ∪ A)), (d) G[X ∪ Y ∪ A], and (e) (G[X ∪ Y ∪ A])^a. ...... 125

Figure 4.3 Finding a minimal separator in an AMP chain graph. ...... 130

Figure 4.4 Finding restricted separator in an AMP chain graph...... 132

Figure 4.5 Rules R1-R4 (Peña, 2012) ...... 136

Figure 4.6 (a) The DAG G, (b) the skeleton returned by Algorithm 17 with order1(V), (c) the skeleton returned by Algorithm 17 with order2(V). . . 137

Figure 4.7 (a) The DAG G, (b) the CG returned by Algorithm 17 with order1(V), (c) the CG returned by Algorithm 17 with order3(V)...... 139

Figure 4.8 The procedure for learning the structure of the largest deflagged AMP CG from a faithful distribution...... 145

Figure 4.9 (a) An AMP CG G. (b) The augmented graph Ga, which is also an undirected independence graph. (c) The triangulated graph (Ga)t. . . . . 146

Figure 4.10 The p-separation tree of CG G in Figure 4.9 ...... 147

Figure 4.11 ...... 158

Figure 4.12 The first two columns show the running times of the decomposition-based and PC-like algorithms for randomly generated Gaussian chain graph models: aver- age over 30 repetitions with 50 variables correspond to N = 2,3 and significance levels α = 0.005. In each plot, the solid blue line corresponds to the LCD algo- rithm, the dashed red line corresponds to the original PC-like algorithm, and the dotted grey line corresponds to the stable PC-like algorithm. The third column shows the running times of the decomposition based (LCD) algorithm for ran- domly generated Gaussian chain graph models: average over 30 repetitions with 50 variables correspond to N = 2, and the significance level α = 0.05, 0.01, 0.005. In each plot, the solid blue line corresponds to the α = 0.05, the dashed red line corresponds to α = 0.01, and the dotted grey line corresponds to α = 0.005...... 159

Figure 4.13 The first two columns show the number of independence tests used by the decomposition-based and PC-like algorithms for randomly generated Gaussian chain graph models: average over 30 repetitions with 50 variables corresponding to average degrees N = 2, 3 and significance level α = 0.005. In each plot, the solid blue line corresponds to the LCD algorithm, the dashed red line corresponds to the original PC-like algorithm, and the dotted grey line corresponds to the stable PC-like algorithm. The third column shows the number of independence tests used by the decomposition-based (LCD) algorithm for randomly generated Gaussian chain graph models: average over 30 repetitions with 50 variables corresponding to average degree N = 2, and significance levels α = 0.05, 0.01, 0.005. In each plot, the solid blue line corresponds to α = 0.05, the dashed red line corresponds to α = 0.01, and the dotted grey line corresponds to α = 0.005. ...... 160

Figure 4.14 Original ASIA network; networks learned using LCD and PC-like algorithms, respectively...... 163

Figure 5.1 A simple DAG G...... 167

Figure 5.2 (1) A DAH H. (2) The canonical DAG of H. (3) The shadow of H. ...... 173

Figure 5.3 (1) A simple chain graph G; (2) the corresponding DAH of G in the LWF interpretation. ...... 175

Figure 5.4 Relationship between chain graphs and directed acyclic hypergraphs ...... 176

Figure 5.5 A simple Bayesian hypergraph H ...... 182

Figure 5.6 (1) A chain graph G; (2) a Bayesian hypergraph H ...... 183

Figure 5.7 A model of heart disease ...... 186

Figure 5.8 An example of Noisy-OR: obesity. ...... 186

Figure 5.9 An example of Noisy-OR: atherosclerosis. ...... 187

Figure 5.10 Two Bayesian hypergraphs H1 (left), H2 (right) with the same global Markov properties but different forms of factorizations. ...... 189

Figure 5.11 (1) A simple chain graph G; (2) the moral graph G^m of G; (3) M(H), where H is the canonical LWF DAH of G. ...... 193

Figure 5.12 (a) A chain graph G; (b) the graph Ĝ obtained from G through the redirection procedure; (c) the graph G′ obtained from G by deleting variables in A. ...... 197

Figure 5.13 (a) A chain graph G; (b) the canonical LWF DAH H of G; (c) the resulting hypergraph Ĥ after performing the graphical procedure on H when the variable c is intervened upon. ...... 201

Figure 5.14 Commutative diagram of factorization equivalence ...... 202

Figure 6.1 Exploiting causal inference for performance analysis...... 206

Figure 6.2 Causal graph for x264 deployed on internal server Feature1, using version 2.76.2 of x264 and a small video for encoding. For all figures we do not show options that do not affect performance. ...... 214

Figure 6.3 Causal graph for XGBoost12 with CNAE-9 data set, deployed on Feature 4. Performance nodes are: train-time, test-time, and accuracy. . 214

Figure 6.4 Selection diagram for SPEAR in two environments: one with mea- sured solving time, deployed on a private server, version 2.7, SAT size 10286, and another deployed on Azure Cloud...... 216

Figure 6.5 Selection diagram for XGBoost12 deployed on two environments: one deployed on a private server Feature 4, with covtype dataset, and another with the same characteristics but deployed on Azure Cloud. Performance nodes are: train-time, test-time, and accuracy. . . . 217

Figure 6.6 The causal graph Gs for SQLite in the environment with Feature 20 and version 3.7.6.3...... 219

Figure A.1 (a) The LWF CG G, (b) the skeleton of G, (c) H∗ before execut- ing line 19 in Algorithm 4, and (d) H∗ after executing line 19 in Algorithm 4...... 240

Figure E.1 (a) AMP CG G, (b) augmented graph Ga, (c) triangulated graph (Ga)t, and (d) p-separation tree T...... 259

Chapter 1

Introduction

Probabilistic graphical models (PGMs), and their use for reasoning intelligently under uncertainty, emerged in the 1980s within the statistics and artificial intelligence communities (Pearl, 1988; Neapolitan, 1990). Probabilistic graphical models are now widely accepted as powerful and mature tools for reasoning under uncertainty. Unlike some of the ad hoc approaches taken in early expert systems, PGMs are based on the strong mathematical foundations of graph and probability theory. In fact, any PGM consists of two main components: (1) a graph that defines the structure of that model; and (2) a joint distribution over random variables of the model. They can be used for a wide range of reasoning tasks including prediction, monitoring, diagnosis, risk assessment and decision making (Spirtes, Glymour, and Scheines, 2000; Xiang, 2002; Jensen and Nielsen, 2007; Fenton and Neil, 2018). The main advantages of using PGMs compared to other models are that the representation is intuitive, inference can be done efficiently, and efficient learning algorithms exist. This has led PGMs to arguably become the most important architecture for reasoning with uncertainty in artificial intelligence (Koller and Friedman, 2009; Neapolitan and Jiang, 2018). There are many efficient algorithms for both inference and learning available in open-source (e.g., (Højsgaard, Edwards, and Lauritzen, 2012; Nagarajan, Scutari, and Lèbre, 2013; Scutari and Denis, 2015)) and commercial software (e.g., Hugin, Netica, GeNIe, and BayesiaLab). Moreover, their power and efficacy have been proven through their successful application to an enormous range of real-world problem domains. One of the most basic subclasses of PGMs is Markov networks. The graphical framework of Markov networks is undirected graphs (UGs), in which each undirected edge

represents a direct correlation between the two variables it connects, while no edge means that the variables are not directly correlated. The best known and most widely used PGM class, however, is Bayesian networks. The graphical structures of Bayesian networks are directed acyclic graphs (DAGs). In a DAG the directed edges can be seen as representing cause and effect relationships. Despite the fact that PGMs, such as Bayesian networks and Markov networks, have been largely developed and used in real world applications, there is an increasing tendency by researchers to go beyond Bayesian networks and Markov networks. For example, systems containing both causal and non-causal relationships are mostly modeled with directed acyclic graphs. An alternative approach is using chain graphs (CGs). Chain graphs may have both directed and undirected edges under the constraint that there do not exist any semi-directed (partially directed) cycles (Drton, 2009). So, CGs may contain two types of edges, the directed type that corresponds to the causal relationship in DAGs and a second type of edge representing a symmetric relationship (Sonntag, 2016). In particular, X1 is a direct cause of X2 only if X1 → X2 (i.e., X1 is a parent of X2), and X1 is a (possibly indirect) cause of X2 only if there is a directed path from X1 to X2 (i.e., X1 is an ancestor of X2). So, while the interpretation of the directed edge in a CG is quite clear, the second type of edge can represent different types of relations and, depending on how we interpret it in the graph, we say that we have different CG interpretations with different separation criteria, i.e. different ways of reading conditional independences from the graph, and different intuitive meaning behind their edges. The three following interpretations are the best known in the literature. The first interpretation (LWF) was introduced by Lauritzen, Wermuth and Frydenberg (Lauritzen and Wermuth, 1989; Frydenberg, 1990) to combine DAGs and undirected graphs (UGs). The second interpretation (AMP) was introduced by Andersson, Madigan and Perlman, and also combines DAGs and UGs but with a separation criterion that more closely resembles the one of DAGs (Andersson, Madigan, and Perlman, 1996). The third interpretation, the multivariate regression interpretation (MVR), was introduced

by Cox and Wermuth (Cox and Wermuth, 1993, 1996) to combine DAGs and bidirected (covariance) graphs.

Problem 1. Properties of Chain Graphs (Chapters 2-4). The different chain graph interpretations have been studied independently and over time different conditional independence interpretations - also known as Markov properties - have been proposed for each of them. This has however led to confusion regarding what Markov properties exist for what interpretation and under which conditions they are equivalent, especially in the case of MVR chain graphs. In this thesis we therefore review some fundamental concepts (e.g., Markov properties and unique representation of those chain graphs that induce the same conditional independence restrictions in a certain CG interpretation) and study how they are defined for different CG interpretations to give a coherent overview of the research performed. Also, we address the problem of finding a minimal separator in a chain graph, namely, finding a set Z of nodes that separates a given non-adjacent pair of nodes such that no proper subset of Z separates that pair. We analyze several versions of this problem and offer polynomial time algorithms for each. These include finding a minimal separator from a restricted set of nodes, finding a minimal separator for two given disjoint sets, and testing whether a given separator is minimal.

Problem 2. Learning Chain Graphs (Chapters 2-4). One important aspect of PGMs in general, and chain graphs especially, is the possibility of learning the structure of models directly from sampled data. Four constraint-based learning algorithms, that use a statistical analysis to test the presence of a conditional independency, exist for learning CGs: the inductive causation like (IC-like)/PC-like algorithms (Studený, 1997; Peña and Gómez-Olmedo, 2016; Sonntag and Peña, 2012), the answer set programming (ASP) algorithms (Peña, 2018b; Sonntag et al., 2015a), the decomposition-based algorithm called LCD (Ma, Xie, and Geng, 2008), and the inclusion optimal (CKES) algorithm (Peña, Sonntag, and Nielsen, 2014). The former two have implementations for all three CG interpretations, while the latter two are only available for LWF CGs (Sonntag, 2016). In this dissertation,

we present an order-independent PC-like algorithm (a modified version of the IC-like algorithm proposed in (Studený, 1997)) for learning the structure of all three different interpretations of chain graphs and two decomposition-based algorithms, one for AMP CGs and another for MVR CGs.

Problem 3. Bayesian Hypergraphs (Chapter 5). A PGM can be seen as a factorization of a joint probability distribution of the state of a system. Factorizing a large joint probability distribution has many benefits (Sonntag, 2016), e.g.: (1) it illuminates the conditional independences between the variables in the distribution, and (2) instead of having one large joint probability distribution we get multiple smaller probability distributions. This allows for efficient use of space since the size of a joint probability distribution grows exponentially with the number of nodes while the total size of local probability distributions only grows quasi-linearly if most variables are conditionally independent. Multiple small probability distributions generally also allow us to do calculations faster than using a single joint probability distribution. The factorization of a joint distribution not only is central for representation but also plays a key role in making inference feasible (Koller and Friedman, 2009). The proliferation of different PGMs that allow factorizations of different kinds leads us to consider a more general graphical structure. For this purpose, we propose a directed acyclic hypergraph framework for a probabilistic graphical model that we call Bayesian hypergraphs. The space of directed acyclic hypergraphs is much larger than the space of chain graphs. Hence Bayesian hypergraphs can model much finer factorizations than Bayesian networks or LWF chain graphs and provide simpler and more computationally efficient procedures for factorization and intervention.

Problem 4. Causal Transfer Learning (Chapter 6). The framework of graphical models, which provides algorithms for discovering and analyzing structure in complex distributions to describe them succinctly and extract the unstructured information, allows them to be constructed and utilized effectively. Causal inference and information extraction are two of the most important applications of graphical models. To demonstrate the abilities

of graphical models in these fields, we conduct a causal analysis, comparing performance behavior of highly-configurable systems across environmental conditions (changing workload, hardware, and software versions), to explore when and how causal knowledge can be commonly exploited for performance analysis. We list below the specific contributions achieved in this thesis by chapter and include the publications in which they were presented:

• Chapters 2-4, properties of chain graphs (Javidian and Valtorta, 2018c,b; Javidian, Valtorta, and Jamshidi, 2019a):

1. introduce an alternative local Markov property for MVR chain graphs, which is equivalent to other Markov properties in the literature for compositional semigraphoids.

2. show a comparison of different proposed Markov properties for MVR chain graphs in the literature and conditions under which they are equivalent.

3. propose an alternative explicit factorization criterion for MVR chain graphs based on the proposed factorization criterion for acyclic directed mixed graphs.

4. address the problem of finding a minimal separator in (AMP, LWF, and MVR) chain graphs, namely, finding a set Z of nodes that separates a given non-adjacent pair of nodes such that no proper subset of Z separates that pair. Several versions of this problem are analyzed and polynomial time algorithms are offered for each. These include finding a minimal separator from a restricted set of nodes, finding a minimal separator for two given disjoint sets, and testing whether a given separator is minimal.

• Chapters 2-4, learning chain graphs (Javidian and Valtorta, 2019a; Javidian, Valtorta, and Jamshidi, 2019b,a):

1. propose an order-independent PC-like algorithm for learning the structure of all three different interpretations of chain graphs under the faithfulness assumption;

2. propose a decomposition approach for recovering structures of MVR CGs;

3. propose a decomposition approach for recovering structures of AMP CGs;

4. experimentally compare the performance of proposed algorithms with other existing algorithms for each interpretation of CGs, and show that the proposed algorithms are comparable (or superior) to the algorithms in the literature in terms of error measures and runtime;

5. provide data and an R package that implements the proposed algorithms.

• Chapter 5, Bayesian hypergraphs (Javidian et al., 2018; Wang et al., 2019):

1. proposes a directed acyclic hypergraph framework for a probabilistic graphical model that is called Bayesian hypergraphs;

2. proves that Bayesian hypergraphs can model much finer factorizations than Bayesian networks or LWF chain graphs;

3. explains the power of Bayesian hypergraphs to represent causal patterns of interaction such as Noisy-OR graphically (without additional annotations);

4. introduces global, local and pairwise Markov properties of Bayesian hypergraphs and proves under which conditions they are equivalent;

5. defines a projection operator, called shadow, that maps Bayesian hypergraphs to chain graphs, and shows that the Markov properties of a Bayesian hypergraph are equivalent to those of its corresponding chain graph;

6. extends the causal interpretation of LWF chain graphs to Bayesian hypergraphs and provides corresponding formulas and a graphical criterion for intervention.

• Chapter 6, causal transfer learning (Javidian, Jamshidi, and Valtorta, 2019):

1. conducts a causal analysis, comparing performance behavior of highly configurable systems across environmental conditions (changing workload, hardware, and software versions), to explore when and how causal knowledge can be commonly exploited for performance analysis;

2. shows the possibility of identifiability of causal effects of configuration options on performance from observational studies alone;

3. shows that many causal or statistical relations about performance behavior can be transferred across environments even in the most severe changes we explored, and that transportability is actually trivial for many environmental changes;

4. indicates the recoverability of conditional probabilities from selection-biased data to the entire population.

Chapter 2

LWF Chain Graphs

LWF Chain graphs were introduced by Lauritzen, Wermuth and Frydenberg (Frydenberg, 1990; Lauritzen and Wermuth, 1989) as a generalization of graphical models based on undirected graphs and directed acyclic graphs and widely studied in (Lauritzen, 1996; Richardson, 1998; Lauritzen and Richardson, 2002; Cowell et al., 1999; Drton, 2009; Ma, Xie, and Geng, 2008; Peña, Sonntag, and Nielsen, 2014; Peña, 2015; Sonntag and Peña, 2015a; Sonntag, 2014; Studený, 1997; Volf and Studený, 1999; Studený, 2005; Studený, Roverato, and Štěpánová, 2009; Roverato, 2005; Roverato and Rocca, 2006; Roverato, 2017). In this chapter, we address the problem of finding minimal separators in LWF chain graphs, because minimality is a desirable property to ensure efficiency and usability. Moreover, finding minimal separators is useful for learning and inference tasks (Acid and Campos, 1996; Tian, Paz, and Pearl, 1998). As we mentioned in Chapter 1, the possibility of learning the structure of graphical models directly from sampled data is an important aspect of PGMs. So, in this chapter, we propose a PC-like algorithm for learning the structure of LWF chain graphs under the faithfulness assumption and we show that our proposed approach is comparable to the existing methods in terms of error measures and runtime. The chapter is organized as follows. In section 2.1, we define the notation and terminology used throughout the chapter. In section 2.2, we take a closer look at LWF CGs in terms of separation criterion, Markov properties, factorization, Markov equivalence class, and largest chain graphs. In section 2.3, we propose and solve an optimization problem related to the separation in LWF chain graphs. In section 2.4, we present a PC-like algorithm

that finds the structure of chain graphs under the faithfulness assumption. We show that our PC-like algorithm is order dependent, in the sense that the output can depend on the order in which the variables are given. This order dependence can be very pronounced in high-dimensional settings. We propose two modifications of the PC-like algorithm that remove part or all of this order dependence. Simulation results under a variety of settings demonstrate the competitive performance of the modified PC-like algorithms in comparison with the original PC-like algorithm in low-dimensional settings and improved performance in high-dimensional settings. We show that our approach is comparable to the decomposition-based method proposed in (Ma, Xie, and Geng, 2008) in terms of error measures and runtime and that using our order-independent skeleton recovery method turns the LCD algorithm into a stable method.

2.1 Basic Definitions and Concepts

In this chapter, we consider graphs containing both directed (→) and undirected (−) edges and largely use the terminology of (Lauritzen, 1996), where the reader can also find further details. Below we briefly list some of the central concepts used in this chapter. If A ⊆ V is a subset of the vertex set in a graph G = (V, E), it induces a subgraph G_A = (A, E_A), where the edge set E_A = E ∩ (A × A) is obtained from G by keeping edges with both endpoints in A. If there is an arrow from a pointing towards b, a is said to be a parent of b. The set of parents of b is denoted as pa(b). If there is an undirected edge between a and b, a and b are said to be adjacent or neighbors. The set of neighbors of a vertex a is denoted as ne(a). The expressions pa(A) and ne(A) denote the collection of parents and neighbors of vertices in A that are not themselves elements of A. The boundary bd(A) of a subset A of vertices is the set of vertices in V \ A that are parents or neighbors to vertices in A. The closure of A is cl(A) = bd(A) ∪ A.
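As an illustration of these neighborhood operators, the following minimal Python sketch (not part of the dissertation; the class name ChainGraph and its two-edge-set representation are assumptions made for illustration) stores a chain graph as a set of directed edges and a set of undirected edges and computes pa, ne, bd, and cl for a set of vertices.

```python
# Sketch of pa, ne, bd, cl for a chain graph stored as two edge sets.
class ChainGraph:
    def __init__(self, vertices, directed, undirected):
        self.V = set(vertices)
        self.D = {(a, b) for a, b in directed}       # a -> b
        self.U = {frozenset(e) for e in undirected}  # a - b

    def pa(self, A):
        """Parents of vertices in A that are not themselves in A."""
        A = set(A)
        return {a for (a, b) in self.D if b in A} - A

    def ne(self, A):
        """Neighbors (via undirected edges) of vertices in A, outside A."""
        A, out = set(A), set()
        for e in self.U:
            a, b = tuple(e)
            if a in A and b not in A:
                out.add(b)
            if b in A and a not in A:
                out.add(a)
        return out

    def bd(self, A):
        """Boundary: parents or neighbors of A outside A."""
        return self.pa(A) | self.ne(A)

    def cl(self, A):
        """Closure: boundary together with A itself."""
        return self.bd(A) | set(A)

# Example chain graph: a -> c, b -> d, c - d.
G = ChainGraph("abcd", [("a", "c"), ("b", "d")], [("c", "d")])
print(G.pa({"c"}), G.ne({"c"}), G.bd({"c"}), G.cl({"c"}))
```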

A path of length n from a to b is a sequence a = a_0, ..., a_n = b of distinct vertices such that (a_i, a_{i+1}) ∈ E for all i = 1, ..., n. If there is a path from a to b we say that a

leads to b and write a ↦ b. The vertices a such that a ↦ b but not b ↦ a are the ancestors an(b) of b. If bd(a) ⊆ A for all a ∈ A, we say that A is an ancestral set. The smallest ancestral set containing A is denoted by An(A). A chain of length n from a to b is a sequence a = a_0, ..., a_n = b of distinct vertices such that (a_i, a_{i+1}) ∈ E, or (a_{i+1}, a_i) ∈ E, or {a_i, a_{i+1}} ∈ E, for all i = 1, ..., n. It is called a cycle if a_{n+1} ≡ a_0 and n ≥ 3. A chord of a cycle C is an edge not in C whose endpoints lie in C. A chordless cycle in G is a cycle of length at least 4 in G that has no chord (that is, the cycle is an induced subgraph). A cycle of length 3 is both chordal and chordless. A partially directed cycle (or semi-directed cycle) in a graph G is a sequence of n distinct vertices v_1, v_2, ..., v_n (n ≥ 3), and v_{n+1} ≡ v_1, such that

(a) for all i (1 ≤ i ≤ n) either v_i − v_{i+1} or v_i → v_{i+1}, and

(b) there exists a j (1 ≤ j ≤ n) such that v_j → v_{j+1}.

An LWF chain graph is a graph in which there are no partially directed cycles. The chain components T of a chain graph are the connected components of the undirected graph obtained by removing all directed edges from the chain graph. A minimal complex (or simply a complex) in a chain graph is an induced subgraph of the form a → v_1 − ··· − v_r ← b. The skeleton (underlying graph) of an LWF CG G is obtained from G by changing all directed edges of G into undirected edges. We say that two LWF CGs G and H are Markov equivalent or that they are in the same Markov equivalence class if they induce the same conditional independence restrictions. Two chain graphs G and H are Markov equivalent if and only if they have the same skeletons and the same minimal complexes (Frydenberg, 1990). Every class of Markov equivalent CGs has a CG with the greatest number of undirected edges (or dually with the least number of directed edges). This graph is called the largest CG of the corresponding class of Markov equivalent CGs (Frydenberg, 1990).
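To make the notions of chain components and skeleton concrete, here is a short Python sketch (illustrative only, under the same two-edge-set representation assumed above): chain components are obtained as the connected components of the undirected part, and the skeleton by dropping edge directions.

```python
# Sketch: chain components and skeleton of a chain graph.
from collections import deque

def chain_components(vertices, undirected):
    """Connected components after all directed edges have been removed."""
    adj = {v: set() for v in vertices}
    for a, b in undirected:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for v in vertices:
        if v in seen:
            continue
        comp, queue = {v}, deque([v])
        seen.add(v)
        while queue:                      # BFS over the undirected part
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps

def skeleton(directed, undirected):
    """All edges made undirected, as a set of unordered pairs."""
    return {frozenset(e) for e in directed} | {frozenset(e) for e in undirected}

# Example: a -> c, b -> d, c - d has chain components {a}, {b}, {c, d}.
print(chain_components("abcd", [("c", "d")]))
print(skeleton([("a", "c"), ("b", "d")], [("c", "d")]))
```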

Given an undirected graph G, two vertices are said to be adjacent if they are connected by an edge. A subset S ⊆ V that does not contain a or b is said to be an (a, b)-separator if all paths from a to b intersect S. A set S of nodes that separates a given pair of nodes such that no proper subset of S separates that pair is called a minimal separator. Note that removing an (a, b)-separator disconnects a graph into two connected components, one containing a, and another containing b. Conversely, if a set S disconnects a graph into a connected component including a and another connected component including b, then S is an (a, b)-separator. Similarly, two disjoint vertex subsets A and B of V are adjacent if there is at least one pair of adjacent vertices u ∈ A and v ∈ B. Let A and B be two disjoint non-adjacent subsets of V. Similarly, we define an (A, B)-separator to be any subset of V \ (A ∪ B) whose removal separates A and B in distinct connected components. A minimal (A, B)-separator does not contain any other (A, B)-separator.
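The separator definition above can be checked directly by deleting S and testing reachability. The short sketch below is an illustration of that check (not code from the dissertation); it assumes the undirected graph is given as an adjacency dict.

```python
# Sketch: S is an (a, b)-separator iff b is unreachable from a once S is removed.
from collections import deque

def is_separator(adj, a, b, S):
    S = set(S)
    if a in S or b in S:
        raise ValueError("S must not contain a or b")
    seen, queue = {a}, deque([a])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in S or w in seen:
                continue
            if w == b:
                return False          # found an a-b path avoiding S
            seen.add(w)
            queue.append(w)
    return True

# Square graph x - u - y and x - v - y: {u, v} separates x and y, {u} does not.
adj = {"x": {"u", "v"}, "u": {"x", "y"}, "v": {"x", "y"}, "y": {"u", "v"}}
print(is_separator(adj, "x", "y", {"u", "v"}))   # True
print(is_separator(adj, "x", "y", {"u"}))        # False
```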

2.2 On the Properties of LWF Chain Graphs

Recall that an independence model ⊥ is a ternary relation over subsets of a finite set V. The following properties have been defined for the conditional independencies of probability distributions. Note that every probability distribution p satisfies the first four properties (Studený, 1989). Let A, B, C, D be disjoint subsets of V where C may be the empty set.

S1 (Symmetry) A ⊥ B | C ⇒ B ⊥ A | C;

S2 (Decomposition) A ⊥ BD | C ⇒ (A ⊥ B | C and A ⊥ D | C);

S3 (Weak Union) A ⊥ BD | C ⇒ (A ⊥ B | DC and A ⊥ D | BC);

S4 (Contraction) (A ⊥ B | DC and A ⊥ D | C) ⇔ A ⊥ BD | C;

S5 (Intersection) (A ⊥ B | DC and A ⊥ D | BC) ⇒ A ⊥ BD | C;

S6 (Composition) (A ⊥ B | C and A ⊥ D | C) ⇔ A ⊥ BD | C;

Let G be a chain graph and P be a probability measure defined on some product space X = ×_{α ∈ V(G)} X_α. Then P satisfies the

(CP) pairwise Markov property, if for every pair (v, u) of non-adjacent vertices with u ∈ nd(v),

v ⊥ u | nd(v) \ {v, u}.   (2.1)

(CL) local Markov property, relative to G, if for any vertex v ∈ V(G),

v ⊥ nd(v) \ cl(v) | bd(v).   (2.2)

(CG) global Markov property, relative to G, if for all A, B, C ⊆ V such that C separates A and B in (G_{An(A ∪ B ∪ C)})^m, the moral graph of the smallest ancestral set containing A ∪ B ∪ C, we have A ⊥ B | C.
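The global Markov property can be put into executable form. The sketch below (an illustration with an assumed representation, not code from the dissertation) builds the smallest ancestral set of A ∪ B ∪ C, moralizes the induced subgraph by completing the parent set of every chain component and dropping directions, and then tests whether C separates A and B in the resulting undirected graph.

```python
# Sketch of the (CG) check: does C separate A and B in the moral graph of the
# smallest ancestral set containing A, B, and C?
from collections import deque
from itertools import combinations

def smallest_ancestral_set(S, directed, undirected):
    """Close S under adding boundaries (parents and undirected neighbors)."""
    An, changed = set(S), True
    while changed:
        changed = False
        for a, b in directed:
            if b in An and a not in An:
                An.add(a); changed = True
        for a, b in undirected:
            if a in An and b not in An:
                An.add(b); changed = True
            if b in An and a not in An:
                An.add(a); changed = True
    return An

def moral_graph(vertices, directed, undirected):
    """Complete the parent set of every chain component, then drop directions."""
    und_adj = {v: set() for v in vertices}
    for a, b in undirected:
        und_adj[a].add(b); und_adj[b].add(a)
    seen, comps = set(), []
    for v in vertices:                       # chain components via BFS
        if v in seen:
            continue
        comp, queue = {v}, deque([v]); seen.add(v)
        while queue:
            u = queue.popleft()
            for w in und_adj[u]:
                if w not in seen:
                    seen.add(w); comp.add(w); queue.append(w)
        comps.append(comp)
    edges = {frozenset(e) for e in undirected} | {frozenset(e) for e in directed}
    for comp in comps:
        parents = {a for (a, b) in directed if b in comp} - comp
        for p, q in combinations(parents, 2):   # "marry" the parents
            edges.add(frozenset((p, q)))
    adj = {v: set() for v in vertices}
    for e in edges:
        a, b = tuple(e)
        adj[a].add(b); adj[b].add(a)
    return adj

def c_separated(A, B, C, vertices, directed, undirected):
    An = smallest_ancestral_set(set(A) | set(B) | set(C), directed, undirected)
    d = [(a, b) for a, b in directed if a in An and b in An]
    u = [(a, b) for a, b in undirected if a in An and b in An]
    adj = moral_graph(An, d, u)
    seen, queue = set(A), deque(A)           # BFS from A avoiding C
    while queue:
        x = queue.popleft()
        for w in adj[x]:
            if w in C or w in seen:
                continue
            if w in B:
                return False
            seen.add(w); queue.append(w)
    return True

# Example: in a -> c, b -> d, c - d we have a separated from b given the empty
# set, but not given {c} (moralization adds the edge a - b).
V, D, U = "abcd", [("a", "c"), ("b", "d")], [("c", "d")]
print(c_separated({"a"}, {"b"}, set(), V, D, U))   # True
print(c_separated({"a"}, {"b"}, {"c"}, V, D, U))   # False
```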

The factorization in the case of a chain graph involves two parts. Suppose {τ : τ ∈ D} is the set of chain components of G. Then P is said to factorize according to G if it has density f that satisfies:

(i) f factorizes as in the directed acyclic case:

f(x) = ∏_{τ ∈ D} f(x_τ | x_{pa(τ)}).

(ii) For each τ ∈ D, f factorizes in the moral graph of G_{τ ∪ pa(τ)}:

f(x_τ | x_{pa(τ)}) = Z^{-1}(x_{pa(τ)}) ∏_{h ∈ C} ψ_h(x),

where C is the set of maximal cliques in the moral graph of G_{τ ∪ pa(τ)}, ψ_h(x) depends only on x_h, and the normalizing constant is

Z(x_{pa(τ)}) = ∫_{X_τ} ∏_{h ∈ C} ψ_h(x) μ_τ(dx_τ).

If a probability measure P factorizes according to G, then we say P satisfies (CF). From arguments analogous to the directed and undirected cases, we have that in general

(CF) ⇒ (CG) ⇒ (CL) ⇒ (CP).

If we assume (S5), then all Markov properties are equivalent.

Theorem 2.1. (Frydenberg, 1990) Assume that a probability measure P defined on a chain graph G is such that (S5) holds for disjoint subsets of V(G), then

(CF) ⇔ (CG) ⇔ (CL) ⇔ (CP).
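As a worked illustration of the factorization above (this example is not part of the original text), consider the chain graph G with edges a → c, b → d and c − d. Its chain components are {a}, {b} and {c, d}, with pa({c, d}) = {a, b}, so part (i) gives

f(x) = f(x_a) f(x_b) f(x_c, x_d | x_a, x_b).

Moralizing G_{{c,d} ∪ {a,b}} completes the parent set {a, b} into a clique and drops directions, yielding the undirected graph with edges a − b, a − c, b − d, c − d, whose maximal cliques are {a, b}, {a, c}, {b, d}, {c, d}. Part (ii) then gives

f(x_c, x_d | x_a, x_b) = Z^{-1}(x_a, x_b) ψ_{ac}(x_a, x_c) ψ_{bd}(x_b, x_d) ψ_{cd}(x_c, x_d),

where the clique potential on {a, b} depends only on the conditioning variables and is absorbed into the normalizing constant Z(x_a, x_b) = ∑_{x_c, x_d} ψ_{ac}(x_a, x_c) ψ_{bd}(x_b, x_d) ψ_{cd}(x_c, x_d) (a sum in the discrete case).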

2.3 Finding Minimal Separators in LWF Chain Graphs

In this section we propose and solve an optimization problem related to the separation in LWF chain graphs. The basic problem may be formulated as follows: given a pair of non-adjacent nodes, x and y, in an LWF chain graph, G, find a minimal set of nodes that separates x and y. We analyze several versions of this problem and offer polynomial time algorithms for each. These include the following problems:

Problem 1. (test for minimal separation) Given two non-adjacent nodes X and Y in an LWF chain graph G and a set Z that separates X from Y , test if Z is minimal i.e., no proper subset of Z separates X from Y.

Problem 2. (minimal separation) Given two non-adjacent nodes X and Y in an LWF chain graph G, find a minimal separating set between X and Y, namely, find a set Z such that Z, and no proper subset of Z, separates X from Y.

Problem 3. (restricted separation) Given two non-adjacent nodes X and Y in an LWF chain graph G and a set S of nodes not containing X and Y, find a subset Z of S that separates X from Y.

Problem 4. (restricted minimal separation) Given two non-adjacent nodes X and Y in an LWF chain graph G and a set S of nodes not containing X and Y, find a subset Z of S which is minimal and separates X from Y.

Problem 5. (minimal separation of two disjoint non-adjacent sets) Given two disjoint non-adjacent sets X and Y in an LWF chain graph G, find a minimal separating set between X and Y, namely, find a set Z such that Z, and no proper subset of Z, separates X from Y.

Problem 6. (enumeration of all minimal separators) Given two non-adjacent nodes (or disjoint subsets) X and Y in an LWF chain graph G, enumerate all minimal separating sets between X and Y.

We prove that it is possible to transform our problem into a separation problem, where the undirected graph in which we have to look for the minimal set separating X from Y depends only on X and Y. We propose and analyze an algorithm for each of the above-mentioned problems that, taking into account the previous results, solves it.

2.3.1 Main Theorem

In this subsection we prove that it is possible to transform our problem into a separation problem, where the undirected graph in which we have to look for the minimal set separating X from Y depends only on X and Y. Later, in the next subsections, we shall apply this result to developing an efficient algorithm that solves our problems. The next proposition shows that if we want to test a separation relationship between two disjoint sets of nodes X and Y in an LWF chain graph, where the separating set is included in the smallest ancestral set of X ∪ Y, then we can test this relationship in a smaller chain graph, whose set of nodes is formed only by the ancestors of X and Y.

Proposition 2.2. Given an LWF chain graph G = (V, E), consider that X, Y, and Z are three disjoint subsets of V, and Z ⊆ An(X ∪ Y). Let H = G_{An(X ∪ Y)} be the subgraph of G induced by An(X ∪ Y). Then ⟨X, Y | Z⟩_G ⇔ ⟨X, Y | Z⟩_H.

Proof. (⇒) The necessary condition is obvious, because a separator in a graph is also a separator in all of its subgraphs.

(⇐) Let ⟨X, Y | Z⟩_H and Z ⊆ An(X ∪ Y); then An(X ∪ Y ∪ Z) = An(X ∪ Y). Consider that ⟨X, Y | Z⟩_G does not hold. This means that X is not separated from Y given Z in (G_{An(X ∪ Y ∪ Z)})^m ≡ (G_{An(X ∪ Y)})^m. In other words, there is a chain C between X and Y in H^m = (G_{An(X ∪ Y)})^m that bypasses Z. Once again using Z ⊆ An(X ∪ Y), we obtain that X and Y are not separated by Z in H, in contradiction to the assumption ⟨X, Y | Z⟩_H. Therefore, it has to be ⟨X, Y | Z⟩_G. □

The following proposition establishes the basic result necessary to solve our optimization problems.

Proposition 2.3. Given an LWF chain graph G = (V, E), consider that X, Y, and Z are three disjoint subsets of V such that ⟨X, Y | Z⟩ holds and ⟨X, Y | Z′⟩ fails for every Z′ ⊊ Z. Then Z ⊆ An(X ∪ Y).

Proof. Suppose that Z ⊄ An(X ∪ Y). Define Z′ = Z ∩ An(X ∪ Y). Then, by assumption, ⟨X, Y | Z′⟩ does not hold. Since Z′ ⊆ An(X ∪ Y), it is obvious that An(X ∪ Y ∪ Z′) = An(X ∪ Y). So, X and Y are not separated by Z′ in (G_{An(X ∪ Y)})^m, hence there is a chain C between X and Y in (G_{An(X ∪ Y)})^m that bypasses Z′, i.e., the chain C is formed from nodes in An(X ∪ Y) that are outside of Z. Since An(X ∪ Y) ⊆ An(X ∪ Y ∪ Z), (G_{An(X ∪ Y)})^m is a subgraph of (G_{An(X ∪ Y ∪ Z)})^m. Then, the previously found chain C is also a chain in (G_{An(X ∪ Y ∪ Z)})^m that bypasses Z, which means that X and Y are not separated by Z in (G_{An(X ∪ Y ∪ Z)})^m, in contradiction to the assumption ⟨X, Y | Z⟩. Therefore, it has to be Z ⊆ An(X ∪ Y). □

The next proposition shows that, by combining the results in propositions 2.2 and 2.3, we can reduce our problems to a simpler one, which involves a smaller graph.

Proposition 2.4. Let G = (V, E) be an LWF chain graph, and let X, Y ⊆ V be two disjoint subsets. Then the problem of finding a minimal separating set for X and Y in G is equivalent to the problem of finding a minimal separating set for X and Y in the induced subgraph G_{An(X ∪ Y)}.

Proof. The proof is very similar to the proof of Proposition 3 in (Acid and Campos, 1996). Let H = G_{An(X ∪ Y)}, and let us define the sets S_G = {Z ⊆ V : ⟨X, Y | Z⟩_G} and S_H = {Z ⊆ An(X ∪ Y) : ⟨X, Y | Z⟩_H}. Then we have to prove that min_{Z ∈ S_G} |Z| = min_{Z ∈ S_H} |Z|, and therefore, by Proposition 2.3, the sets of minimal separators are the same. From Proposition 2.2, we deduce that S_H ⊆ S_G, and therefore min_{Z ∈ S_H} |Z| ≥ min_{Z ∈ S_G} |Z|.

(⇒) Let T = min(Z ∈ S_G), i.e., a minimum-size member of S_G. Then for every T′ ⊊ T we have T′ ∉ S_G, and from Proposition 2.3 we obtain T ⊆ An(X ∪ Y); now using Proposition 2.2 we get T ∈ S_H. So, we have |T| ≥ min_{Z ∈ S_H} |Z| ≥ min_{Z ∈ S_G} |Z| = |T|, hence |T| = min_{Z ∈ S_H} |Z|.

(⇐) Let T = min(Z ∈ S_H). If |T| = min_{Z ∈ S_H} |Z| > min_{Z ∈ S_G} |Z| = |Z′|, where Z′ is a minimum-size member of S_G, then for every Z″ ⊊ Z′ we have Z″ ∉ S_G, and therefore, once again using Propositions 2.3 and 2.2, we get Z′ ∈ S_H, so that |Z′| ≥ min_{Z ∈ S_H} |Z| = |T|, which is a contradiction. Thus, |T| = min_{Z ∈ S_G} |Z|. □

Theorem 2.5. The problem of finding a minimal separating set for X and Y in an LWF chain graph G is equivalent to the problem of finding a minimal separating set for X and Y in the undirected graph (G_{An(X ∪ Y)})^m.

Proof. The proof is very similar to the proof of Theorem 1 in (Acid and Campos, 1996). Using the same notation as in Proposition 2.4, let H^m be the moral graph of H = G_{An(X ∪ Y)}, and let S_{H^m} = {Z ⊆ An(X ∪ Y) : ⟨X, Y | Z⟩_{H^m}}. Let Z be any subset of An(X ∪ Y). Then, taking into account the characteristics of ancestral sets, it is clear that H_{An(X ∪ Y ∪ Z)} = H. Then, we have

Z ∈ S_H ⇔ ⟨X, Y | Z⟩_H ⇔ ⟨X, Y | Z⟩_{(H_{An(X ∪ Y ∪ Z)})^m} ⇔ ⟨X, Y | Z⟩_{H^m} ⇔ Z ∈ S_{H^m}.

Hence, S_H = S_{H^m}. Now, using Proposition 2.4, we obtain |T| = min_{Z ∈ S_G} |Z| ⇔ |T| = min_{Z ∈ S_{H^m}} |Z|. □

Figure 2.1: Test for minimal separation in LWF CGs

2.3.2 Algorithms for Finding Minimal Separators

In undirected graphs we have efficient methods of testing whether a separation set is minimal, which are based on the following criterion.

Theorem 2.6. Given two nodes X and Y in an undirected graph, a separating set Z between X and Y is minimal if and only if for each node u in Z, there is a path from X to Y which passes through u and does not pass through any other nodes in Z.

Proof. See the proof of Theorem 5 in (Tian, Paz, and Pearl, 1998). □

This theorem leads to Algorithm 1 for Problem 1. The idea is that if Z is minimal then all nodes in Z can be reached using Breadth First Search (BFS) that starts from both X and Y without passing any other nodes in Z.

Algorithm 1: Test for minimal separation (Problem 1)
Input: A set Z that separates two non-adjacent nodes X, Y in the LWF chain graph G.
Output: If Z is minimal then the algorithm returns TRUE; otherwise, it returns FALSE.
1  if Z contains a node that is not in An(X ∪ Y) then
2      return FALSE;
3  else
4      Construct G_{An(X ∪ Y)};
5      Construct (G_{An(X ∪ Y)})^m;
6      Starting from X, run BFS. Whenever a node in Z is met, mark it if it is not already marked, and do not continue along that path. When BFS stops:
7      if not all nodes in Z are marked then
8          return FALSE;
9      else
10         Remove all markings. Starting from Y, run BFS. Whenever a node in Z is met, mark it if it is not already marked, and do not continue along that path. When BFS stops:
11         if not all nodes in Z are marked then
12             return FALSE;
13         else
14             return TRUE;
15         end
16     end
17 end

Analysis (Tian, Paz, and Pearl, 1998): Let |E^m_{An}| stand for the number of edges in (G_{An(X ∪ Y)})^m. Steps 3-5 each require O(|E^m_{An}|) time. Thus, the complexity of Algorithm 1 is O(|E^m_{An}|). A variant of Algorithm 1 solves Problem 2.
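For concreteness, here is a small Python sketch of the two BFS passes at the heart of Algorithm 1 (an illustration only; it assumes the moral graph (G_{An(X ∪ Y)})^m has already been built as an undirected adjacency dict and that Z is known to be a separator contained in An(X ∪ Y); the function names are hypothetical).

```python
# Sketch of Algorithm 1's core: Z is minimal iff every node of Z is reachable
# from X and from Y without passing through any other node of Z (Theorem 2.6).
from collections import deque

def _reachable_members_of_Z(adj, start, Z):
    """BFS from `start`; stop at nodes of Z and collect the ones that are hit."""
    hit, seen, queue = set(), {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in seen:
                continue
            seen.add(w)
            if w in Z:
                hit.add(w)        # mark, but do not continue along this path
            else:
                queue.append(w)
    return hit

def is_minimal_separator(adj, X, Y, Z):
    Z = set(Z)
    return (_reachable_members_of_Z(adj, X, Z) == Z and
            _reachable_members_of_Z(adj, Y, Z) == Z)

# Moralized graph with paths x - u - y and x - v - w - y:
adj = {"x": {"u", "v"}, "u": {"x", "y"}, "v": {"x", "w"},
       "w": {"v", "y"}, "y": {"u", "w"}}
print(is_minimal_separator(adj, "x", "y", {"u", "v"}))       # True
print(is_minimal_separator(adj, "x", "y", {"u", "v", "w"}))  # False (w is redundant)
```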

Algorithm 2: Minimal separation (Problem 2)
Input: Two non-adjacent nodes X, Y in the LWF chain graph G.
Output: A set Z that is a minimal separator for X, Y.
1  Construct G_{An(X ∪ Y)};
2  Construct (G_{An(X ∪ Y)})^m;
3  Set Z′ to be ne(X) (or ne(Y)) in (G_{An(X ∪ Y)})^m;
   /* Z′ is a separator because, according to the local Markov property of an undirected graph, a vertex is conditionally independent of all other vertices in the graph, given its neighbors (Lauritzen, 1996). */
4  Starting from X, run BFS. Whenever a node in Z′ is met, mark it if it is not already marked, and do not continue along that path. When BFS stops, let Z″ be the set of nodes which are marked. Remove all markings;
5  Starting from Y, run BFS. Whenever a node in Z″ is met, mark it if it is not already marked, and do not continue along that path. When BFS stops, let Z be the set of nodes which are marked;
6  return Z;

Analysis: Steps 2-5 each require O(|E^m_{An}|) time. Thus, the overall complexity of Algorithm 2 is O(|E^m_{An}|).
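The following Python sketch mirrors Algorithm 2 under the same assumptions as the previous sketch (the moralized ancestral graph is given as an adjacency dict; the helper names are illustrative): start from Z′ = ne(X), keep the members of Z′ reached by a BFS from X, then keep the members of that set reached by a BFS from Y.

```python
# Sketch of Algorithm 2: shrink ne(X) to a minimal (X, Y)-separator with two
# BFS passes over (G_{An(X ∪ Y)})^m, given as an undirected adjacency dict.
from collections import deque

def _members_hit(adj, start, blockers):
    hit, seen, queue = set(), {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in seen:
                continue
            seen.add(w)
            if w in blockers:
                hit.add(w)          # mark and stop along this path
            else:
                queue.append(w)
    return hit

def minimal_separator(adj, X, Y):
    Z1 = set(adj[X])                # ne(X) separates X from Y (local Markov property)
    Z2 = _members_hit(adj, X, Z1)   # drop members of Z1 not reachable from X
    return _members_hit(adj, Y, Z2) # drop members of Z2 not reachable from Y

adj = {"x": {"u", "v"}, "u": {"x", "y"}, "v": {"x", "w"},
       "w": {"v", "y"}, "y": {"u", "w"}}
print(minimal_separator(adj, "x", "y"))   # {'u', 'v'}
```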

Theorem 2.7. Given two nodes X and Y in an LWF chain graph G and a set S of nodes not containing X and Y, there exists some subset of S which separates X and Y if and only if the set S′ = S ∩ An(X ∪ Y) separates X and Y.

Proof. (⇒) Proof by contradiction. Let S′ = S ∩ An(X ∪ Y) and suppose that ⟨X, Y | S′⟩ does not hold. Since S′ ⊆ An(X ∪ Y), it is obvious that An(X ∪ Y ∪ S′) = An(X ∪ Y). So, X and Y are not separated by S′ in (G_{An(X ∪ Y)})^m, hence there is a chain C between X and Y in (G_{An(X ∪ Y)})^m that bypasses S′, i.e., the chain C is formed from nodes in An(X ∪ Y) that are outside of S. Since An(X ∪ Y) ⊆ An(X ∪ Y ∪ S″) for all S″ ⊆ S, (G_{An(X ∪ Y)})^m is a subgraph of (G_{An(X ∪ Y ∪ S″)})^m. Then, the previously found chain C is also a chain in (G_{An(X ∪ Y ∪ S″)})^m that bypasses S″, which means that X and Y are not separated by any S″ ⊆ S in (G_{An(X ∪ Y ∪ S″)})^m, which is a contradiction.

(⇐) It is obvious. □

Figure 2.2: Finding restricted separation set in LWF CGs

Therefore, Problem 3 is solved by testing if S 0 = S An(X Y) separates X and Y. ∩ ∪ Algorithm 3: Restricted separation (Problem 3) Input: A set S of nodes not containing X and Y in the LWF chain graph G. Output: If there is a subset of S that separates X from Y then the algorithm returns Z S that separates X from Y otherwise, returns FALSE. ⊆

1 Construct GAn(X Y); ∪ m 2 Construct (GAn(X Y)) ; ∪

3 Set S 0 = S An(X Y); ∩ ∪ m 4 Remove S 0 from (GAn(X Y)) ; ∪

5 Starting from X, run BFS;

6 if Y is met then

7 return FALSE

8 else

9 return Z = S 0

10 end

Analysis: This requires O(|E^m_{An}|) time.

According to Theorem 2.7, Problem 4 is solved by running Algorithm 3 and then, if FALSE is not returned, Algorithm 2 with Z′ = S ∩ An(X∪Y). The complexity of this algorithm is also O(|E^m_{An}|).

In order to solve Problem 5, i.e., to find the minimal set separating two disjoint non-adjacent subsets of nodes X and Y (instead of two single nodes) in an LWF chain graph G, we first build the undirected graph (G_{An(X∪Y)})^m. Next, starting from this graph, we construct a new undirected graph Aug[G : αX, αY] by adding two artificial (dummy) nodes αX and αY and connecting them to those nodes that are adjacent to some node in X and Y, respectively. So, the separation of X and Y in (G_{An(X∪Y)})^m is equivalent to the separation of αX and αY in Aug[G : αX, αY]. Moreover, the minimal separating set for αX and αY in Aug[G : αX, αY] cannot contain nodes from X∪Y. Therefore, in order to find the minimal separating set for X and Y in G, it suffices to find the minimal separating set for αX and αY in Aug[G : αX, αY]. In this way, we have reduced the problem to one of separation for single nodes, which can be solved using Algorithm 2.

Shen and Liang (Shen and Liang, 1997) present an efficient algorithm for enumerating all minimal (X, Y)-separators separating given non-adjacent vertices X and Y in an undirected connected simple graph G = (V, E). This algorithm requires O(n^3 R_{XY}) time, where |V| = n and R_{XY} is the number of minimal (X, Y)-separators. The algorithm can be generalized for enumerating all minimal (X, Y)-separators that separate non-adjacent vertex sets X, Y ⊆ V, and it requires O(n^2 (n − n_X − n_Y) R_{XY}) time. In this case, |X| = n_X, |Y| = n_Y, and R_{XY} is the number of all minimal (X, Y)-separators. According to Theorem 2.5, using this algorithm for (G_{An(X∪Y)})^m solves Problem 6.
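A minimal sketch of the dummy-node construction Aug[G : αX, αY] described above, assuming the moral graph is given as an adjacency dict; the dummy-node labels are placeholders of our choosing. The returned graph can then be handed to a single-pair minimal-separator routine such as the sketch following Algorithm 2.

```python
def augment_with_dummies(moral_adj, X, Y, alpha_x="alpha_X", alpha_y="alpha_Y"):
    """Sketch of Aug[G : αX, αY] for Problem 5.

    `moral_adj` is assumed to be (G_{An(X ∪ Y)})^m as an adjacency dict of sets;
    αX (resp. αY) is connected to every node outside X (resp. Y) that is
    adjacent to some node of X (resp. Y).
    """
    aug = {u: set(nbrs) for u, nbrs in moral_adj.items()}
    aug[alpha_x], aug[alpha_y] = set(), set()
    for dummy, block in ((alpha_x, set(X)), (alpha_y, set(Y))):
        for u in block:
            for v in moral_adj[u]:
                if v not in block:               # boundary node of the block
                    aug[dummy].add(v)
                    aug[v].add(dummy)
    # A minimal (X, Y)-separator can now be found as, e.g.,
    # minimal_separator(aug, alpha_x, alpha_y).
    return aug
```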

Remark. Since DAGs (directed acyclic graphs) are a subclass of chain graphs, one can use the same technique to enumerate all minimal separators in DAGs.

Conclusion and Summary

We have studied and solved the problem of finding minimal separating sets for pairs of variables in LWF chain graphs. We have also studied several extensions of the basic problem, including finding a minimal separator from a restricted set of nodes, finding a minimal separator for two given disjoint sets, testing whether a given separator is minimal, and listing all minimal separators, given two non-adjacent nodes (or disjoint subsets) X and Y in an LWF chain graph G. Potential applications of this research include learning chain graphs from data and problems related to the selection of the variables to be instantiated when using chain graphs for inference tasks.

2.4 Efficient Learning of LWF Chain Graphs under the Faithfulness Assumption

LWF chain graphs were introduced by Lauritzen, Wermuth, and Frydenberg in the middle eighties to combine directed acyclic graphs (representing the structure of Bayesian networks) and undirected graphs (representing the structure of Markov networks). Every class of Markov equivalent chain graphs (that is, those chain graphs that induce the same conditional independence restrictions) has a unique natural representative, which is called the largest chain graph.

[Figure: the three stages of skeleton recovery, complex recovery, and LCG recovery applied to observational data over variables a, b, c, d, e.]

Figure 2.3: The procedure of learning the structure of an LWF chain graph from a faithful distribution.

As mentioned earlier, one important aspect of PGMs in general, and chain graphs especially, is the possibility of learning the structure of models directly from sampled data. Four constraint-based learning algorithms, which use a statistical analysis to test the presence of a conditional independency, exist for learning LWF CGs: (1) the inductive causation like (IC-like) algorithm (Studený, 1997), (2) the decomposition-based algorithm called

LCD (Learn Chain graphs via Decomposition) (Ma, Xie, and Geng, 2008), (3) the answer set programming (ASP) algorithm (Sonntag et al., 2015a), and (4) the inclusion optimal (CKES) algorithm (Peña, Sonntag, and Nielsen, 2014). The learned graph of the LCD algorithm is only the pattern of an LWF CG, i.e., the graph that contains the same skeleton and complexes (a.k.a. U-structures). However, it may also contain semi-directed cycles. On the other hand, the IC-like algorithm finds the largest CG (that is, a CG with the greatest number of undirected edges) of the corresponding class of Markov equivalent CGs (that is, those chain graphs that induce the same conditional independence structure). Similar to the inductive causation (IC) algorithm (Verma and Pearl, 1991), the IC-like algorithm (Studený, 1997) cannot be applied to large numbers of variables because, for testing whether there is a set separating X and Y in the skeleton recovery, the IC-like algorithm might search all 2^{n−2} subsets of all n random variables not including X and Y. In order to overcome this scalability limitation of the IC-like algorithm, we propose a constraint-based method for learning the structure of chain graphs based on the idea of the PC algorithm proposed by Peter Spirtes and Clark Glymour (Spirtes, Glymour, and Scheines, 2000), which is used for learning the structure of Bayesian networks (BNs). Our method modifies the IC-like algorithm to make it computationally feasible in the phase of skeleton recovery and to avoid the time-consuming procedure of complex recovery.

We show that the proposed PC-like algorithm is order-dependent, in the sense that the output can depend on the order in which the variables are given. We propose several modifications of the PC-like algorithm that remove part or all of this order-dependence, but do not change the result when perfect conditional independence information is used. When applied to data, the modified algorithms are partly or fully order-independent.

The R package lcd that implements the LCD algorithm uses an order-dependent skeleton recovery procedure for local skeleton recovery. We empirically show that using the order-independent version of our proposed skeleton recovery algorithm for local skeleton recovery in the lcd package improves the precision of the LCD algorithm in high-dimensional settings.

2.4.1 PC-LIKE ALGORITHM

In this section, we discuss how the IC-like algorithm (Studený, 1997) can be modified to obtain a computationally feasible algorithm for LWF CG recovery. A brief review of the IC-like algorithm is presented first; then we present a PC-like algorithm, which is a constraint-based algorithm that learns a CG from a probability distribution faithful to some CG. The IC-like algorithm (Studený, 1997) is a constraint-based structural learning algorithm presented for LWF CGs and is based on three sequential phases. The first phase finds the adjacencies (skeleton recovery), the second phase orients the edges that must be oriented the same in every CG in the Markov equivalence class (complex recovery), and the third phase transforms this graph into a largest CG (LCG recovery). The skeleton recovery of the IC-like algorithm works as follows: construct an undirected graph H such that vertices u and v are connected with an undirected edge if and only if no set S_uv can be found such that u ⊥⊥ v | S_uv. This procedure is very inefficient because it requires a number of independence tests that increases exponentially with the number of vertices. In other words, to determine whether there is a set separating u and v, we might

search all 2^{n−2} subsets of all n random variables excluding u and v. So, the complexity of investigating each possible edge in the skeleton is O(2^n), and hence the complexity of constructing the skeleton is O(n^2 2^n), where n is the number of vertices in the LWF CG.

Since it is enough to find one set S making u and v independent in order to remove the undirected edge u − v, one obvious short-cut is to do the tests in some order and skip unnecessary tests. In the PC algorithm for BNs, the revised edge-removal step is done as in the following pseudocode.

1 for i ← 0 to |V_H| − 2 do
2   while possible do
3     Select any ordered pair of nodes u and v in H such that u ∈ ad_H(v) and |ad_H(u) \ v| ≥ i;
      /* ad_H(x) := {y ∈ V | x → y, y → x, or x − y} */
4     if there exists S ⊆ (ad_H(u) \ v) s.t. |S| = i and u ⊥⊥_p v | S (i.e., u is independent of v given S in the probability distribution p) then
5       Set S_uv = S_vu = S;
6       Remove the edge u − v from H;
7     end
8   end
9 end

Since the PC algorithm only looks at adjacencies of u and v in the current stage of the algorithm, rather than all possible subsets, the PC algorithm performs fewer independence tests than the IC algorithm. The computational complexity of the PC algorithm for DAGs is difficult to evaluate exactly, but under the sparseness assumption the worst case is with high probability bounded by O(n^q), where n is the number of vertices and q is the maximal number of adjacent vertices of the true underlying DAG (Kalisch and Bühlmann, 2007). Our main intuition is that replacing the skeleton recovery phase in the IC-like algorithm with a PC-like approach will speed up this phase and make it computationally scalable when the true underlying LWF CG is sparse (see the skeleton recovery phase of Algorithm 4). The looping procedure of the IC-like algorithm for complex recovery is computationally expensive. We use a polynomial-time approach similar to the algorithm proposed by (Ma, Xie, and Geng, 2008) to reduce the computational cost of the complex recovery (see the complex recovery phase of Algorithm 4).

Algorithm 4: PC-like algorithm for LWF CGs
Input: A set V of nodes and a probability distribution p faithful to an unknown LWF CG G.
Output: The largest CG of the corresponding class of Markov equivalent CGs.
1 Let H denote the complete undirected graph over V;
  /* Skeleton Recovery */
2 for i ← 0 to |V_H| − 2 do
3   while possible do
4     Select any ordered pair of nodes u and v in H such that u ∈ ad_H(v) and |ad_H(u) \ v| ≥ i;
      /* ad_H(x) := {y ∈ V | x → y, y → x, or x − y} */
5     if there exists S ⊆ (ad_H(u) \ v) s.t. |S| = i and u ⊥⊥_p v | S (i.e., u is independent of v given S in the probability distribution p) then
6       Set S_uv = S_vu = S;
7       Remove the edge u − v from H;
8     end
9   end
10 end
  /* Complex Recovery from (Ma, Xie, and Geng, 2008) */
11 Initialize H* = H;
12 for each vertex pair {u, v} s.t. u and v are not adjacent in H do
13   for each u − w in H* do
14     if u ⊥̸⊥_p v | (S_uv ∪ {w}) then
15       Orient u − w as u → w in H*;
16     end
17   end
18 end
19 Take the pattern of H*;
  /* To get the pattern of H* in line 19, at each step we consider a pair of candidate complex arrows u1 → w1 and u2 → w2 with u1 ≠ u2; then we check whether there is an undirected path from w1 to w2 such that none of its intermediate vertices is adjacent to either u1 or u2. If there exists such a path, then u1 → w1 and u2 → w2 are labeled (as complex arrows). We repeat this procedure until all possible candidate pairs are examined. The pattern is then obtained by removing the directions of all unlabeled arrows in H* (Ma, Xie, and Geng, 2008). */
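As a rough illustration of the skeleton-recovery phase (lines 2-10 of Algorithm 4), the Python sketch below makes a single pass over ordered pairs at each level i instead of the literal "while possible" loop, which is how PC-style searches are usually implemented; the conditional-independence callback ci_test and all names are our own assumptions, not part of the thesis code.

```python
from itertools import combinations

def pc_like_skeleton(variables, ci_test):
    """Sketch of the skeleton-recovery phase of a PC-like search.

    `ci_test(u, v, S)` is assumed to return True when u is judged independent
    of v given the conditioning set S (a CI oracle or a statistical test).
    Returns the estimated skeleton (adjacency dict) and the separating sets.
    """
    adj = {v: set(variables) - {v} for v in variables}   # line 1: complete undirected graph
    sepset = {}
    for i in range(len(variables) - 1):                  # i = 0, ..., |V_H| - 2
        for u in variables:                              # ordered pairs; order matters (Sec. 2.4.2)
            for v in list(adj[u]):
                if v not in adj[u] or len(adj[u] - {v}) < i:
                    continue
                for S in combinations(sorted(adj[u] - {v}), i):
                    if ci_test(u, v, set(S)):            # lines 5-7: record S_uv, drop the edge
                        sepset[(u, v)] = sepset[(v, u)] = set(S)
                        adj[u].discard(v)                # removal immediately changes the
                        adj[v].discard(u)                # adjacency sets used by later tests
                        break
    return adj, sepset
```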

Finally, the IC-like algorithm uses three basic rules, namely the transitivity rule, the necessity rule, and the double-cycle rule, for changing the pattern obtained in the previous phase into the corresponding largest CG (see (Studený, 1997) for details). Both the IC-like and LCD algorithms recover the structure of the model correctly if the probability distribution of the data is faithful to some LWF CG, i.e., all conditional independencies among variables can be represented by an LWF CG. The entire process is formally described in Algorithm 4. The correctness of Algorithm 4 is proved in Appendix B.

Computational Complexity Analysis of Algorithm 4. The complexity of the algorithm for a graph G is bounded by the largest degree in G. Let k be the maximal degree of any vertex and let n be the number of vertices. Then, in the worst case, the number of conditional independence tests required by the algorithm is bounded by
2\binom{n}{2}\sum_{i=0}^{k}\binom{n-2}{i} \le \frac{n^2(n-2)^k}{(k-1)!}.
To derive the inequality, use induction on k (Neapolitan, 2003, p. 552). So, Algorithm 4 has a worst-case running time of O(n^{k+2}). This is a loose upper bound even in the worst case; it assumes that, in the worst case for n and k, no two variables are c-separated by a set of cardinality less than k, and for many values of n and k we have been unable to find graphs with that property. The worst case is rare, and the average number of conditional independence tests required for graphs of maximal degree k is much smaller. In practice, our simulations show that it is possible to recover sparse graphs with a hundred variables in a few seconds.

2.4.2 STABLE PC-LIKE ALGORITHM

In this section, we show that the PC-like algorithm proposed in the previous section is order-dependent, in the sense that the output can depend on the order in which the variables are given. In applications, we do not have perfect conditional independence information. Instead, we assume that we have an i.i.d. sample of size n of variables V = (X1,..., Xp). In the PC-

26 like algorithm all conditional independence queries are estimated by statistical conditional independence tests at some pre-specified significance level (p value) α. For example, if the distribution of V is multivariate Gaussian, one can test for zero partial correlation, see, e.g., Kalisch and Bühlmann (2007). Hence, we use the gaussCItest() function from the R package pcalg throughout this paper. Let order(V) denote an ordering on the variables in V. We now consider the role of order(V) in every step of the algorithm. In the skeleton recovery phase of the PC-like algorithm, the order of variables affects the estimation of the skeleton and the separating sets. In particular, as noted for the special case of BNs in Colombo and Maathuis (2014), for each level of i, the order of variables determines the order in which pairs of adjacent vertices and subsets S of their adjacency sets are considered (see lines 4 and 5 in Algorithm 4). The skeleton H is updated after each edge removal. Hence, the adjacency sets typically change within one level of i, and this affects which other conditional independencies are checked, since the algorithm only con- ditions on subsets of the adjacency sets. When we have perfect conditional independence information, all orderings on the variables lead to the same output. In the sample version, however, we typically make mistakes in keeping or removing edges, because conditional independence relationships have to be estimated from data. In such cases, the resulting changes in the adjacency sets can lead to different skeletons, as illustrated in Example 1. Moreover, different variable orderings can lead to different separating sets in the skele- ton recovery phase. When we have perfect conditional independence information, this is not important, because any valid separating set leads to the correct U-structure decision in the complex recovery phase. In the sample version, however, different separating sets in the skeleton recovery phase may yield different decisions about U-structures in the complex recovery phase. This is illustrated in Example 2.
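For concreteness, the sketch below shows the standard Fisher z test for zero partial correlation that underlies such Gaussian conditional-independence tests; it only illustrates the idea behind gaussCItest() and is not the pcalg implementation, and the interface (column indices, return value) is our own choice.

```python
import numpy as np
from scipy import stats

def gauss_ci_test(data, u, v, S, alpha=0.05):
    """Fisher's z test for zero partial correlation (illustrative sketch).

    `data` is an (n x p) array; u, v are column indices; S is a list of indices.
    Returns True when the null hypothesis u ⊥⊥ v | S is NOT rejected at level alpha.
    """
    n = data.shape[0]
    cols = [u, v] + list(S)
    corr = np.corrcoef(data[:, cols], rowvar=False)
    prec = np.linalg.pinv(corr)                     # partial correlation from the precision matrix
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
    r = min(0.999999, max(-0.999999, r))
    z = 0.5 * np.log((1 + r) / (1 - r))             # Fisher z-transform
    stat = np.sqrt(n - len(S) - 3) * abs(z)
    p_value = 2 * (1 - stats.norm.cdf(stat))
    return p_value > alpha
```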

Example 1. (Order-dependent skeleton of the PC-like algorithm.) Suppose that the distribution of V = {a, b, c, d, e} is faithful to the DAG in Figure 2.4(a). This DAG encodes the following conditional independencies with minimal separating sets: a ⊥⊥ d | {b, c} and a ⊥⊥ e | {b, c}.

Suppose that we have an i.i.d. sample of (a, b, c, d, e), and that the following conditional independencies with minimal separating sets are judged to hold at some significance level α: a ⊥⊥ d | {b, c}, a ⊥⊥ e | {b, c, d}, and c ⊥⊥ e | {a, b, d}. Thus, the first two are correct, while the third is false. We now apply the skeleton recovery phase of the PC-like algorithm with two different orderings: order1(V) = (d, e, a, c, b) and order2(V) = (d, c, e, a, b). The resulting skeletons are shown in Figures 2.4(b) and 2.4(c), respectively.


Figure 2.4: (a) The DAG G, (b) the skeleton returned by Algorithm 4 with order1(V), (c) the skeleton returned by Algorithm 4 with order2(V).

We see that the skeletons are different, and that both are incorrect as the edge c − e is missing. The skeleton for order2(V) contains an additional error, as there is an additional edge a − e. We now go through Algorithm 4 to see what happened. We start with a complete undirected graph on V. When i = 0, variables are tested for marginal independence, and the algorithm correctly does not remove any edge. Also, when i = 1, the algorithm correctly does not remove any edge. When i = 2, there is a pair of vertices that is thought to be conditionally independent given a subset of size two, and the algorithm correctly removes the edge between a and d. When i = 3, there are two pairs of vertices that are thought to be conditionally independent given a subset of size three. Table 2.1 shows the trace table of Algorithm 4 for i = 3 and order1(V) = (d, e, a, c, b). Table 2.2 shows the trace table of

Table 2.1: The trace table of Algorithm 4 for i = 3 and order1(V) = (d, e, a, c, b).

Ordered pair (u, v)   ad_H(u)        S_uv        Is S_uv ⊆ ad_H(u) \ {v}?   Is u − v removed?
(e, a)                {a, b, c, d}   {b, c, d}   Yes                        Yes
(e, c)                {b, c, d}      {a, b, d}   No                         No
(c, e)                {a, b, d, e}   {a, b, d}   Yes                        Yes

Table 2.2: The trace table of Algorithm 4 for i = 3 and order2(V) = (d, c, e, a, b).

Ordered pair (u, v)   ad_H(u)        S_uv        Is S_uv ⊆ ad_H(u) \ {v}?   Is u − v removed?
(c, e)                {a, b, d, e}   {a, b, d}   Yes                        Yes
(e, a)                {a, b, d}      {b, c, d}   No                         No
(a, e)                {b, c, e}      {b, c, d}   No                         No

Algorithm 4 for i = 3 and order2(V) = (d, c, e, a, b).

Example 2. (Order-dependent separating sets and U-structures of the PC-like algorithm.) Suppose that the distribution of V = {a, b, c, d, e} is faithful to the DAG in Figure 2.5(a). This DAG encodes the following conditional independencies with minimal separating sets: a ⊥⊥ d | b, a ⊥⊥ e | {b, c}, a ⊥⊥ e | {c, d}, b ⊥⊥ c, b ⊥⊥ e | d, and c ⊥⊥ d.

Suppose that we have an i.i.d. sample of (a, b, c, d, e). Assume that all true conditional independencies are judged to hold except c ⊥⊥ d. Suppose that c ⊥⊥ d | b and c ⊥⊥ d | e are thought to hold. Thus, the first is correct, while the second is false. We now apply the complex recovery phase of Algorithm 4 with two different orderings: order1(V) = (d, c, b, a, e) and order3(V) = (c, d, e, a, b). The resulting CGs are shown in Figures 2.5(b) and 2.5(c), respectively. Note that while the separating set for vertices c and d with order1(V) is S_dc = S_cd = {b}, the separating set for them with order3(V) is S_cd = S_dc = {e}. This illustrates that order-dependent separating sets in the skeleton recovery phase of the sample version of Algorithm 4 can lead to order-dependent U-structures.

Figure 2.5: (a) The DAG G, (b) the CG returned after the complex recovery phase of Algorithm 4 with order1(V), (c) the CG returned after the complex recovery phase of Algorithm 4 with order3(V).

We now propose several modifications of the original PC-like algorithm (and hence also of the related algorithms) that remove the order-dependence in the various stages of the algorithm, analogously to what Colombo and Maathuis (2014) did for the original PC algorithm in the case of DAGs.

Order-Independent Skeleton Recovery

We first consider estimation of the skeleton in the adjacency search of the PC-like algorithm. The pseudocode for our modification is given in Algorithm 5. The resulting algorithm is called stable PC-like. The main difference between Algorithms 4 and 5 is given by the for-loop on lines 3-5 in the latter, which computes and stores the adjacency sets a_H(v_i) of all variables after each new size i of the conditioning sets. These stored adjacency sets a_H(v_i) are used whenever we search for conditioning sets of this given size i. Consequently, an edge deletion on line 10 no longer affects which conditional independencies are checked for other pairs of variables at this level of i. In other words, at each level of i, Algorithm 5 records which edges should be removed, but for the purpose of the adjacency sets it removes these edges only when it goes to the next value of i. Besides resolving the order-dependence in the estimation of the skeleton, our algorithm has the advantage that it is easily parallelizable at each level of i. The stable PC-like algorithm is correct, i.e., it returns an LWF CG to which the given probability distribution is faithful (Theorem 2.8), and it yields order-independent skeletons in the sample version (Theorem 2.9). We illustrate the algorithm in Example 3.

Algorithm 5: The order-independent (stable) PC-like algorithm for learning LWF CGs.
Input: A set V of nodes, a probability distribution p faithful to an unknown LWF CG G, and an ordering order(V) on the variables.
Output: An LWF CG G′ s.t. G and G′ are Markov equivalent and G′ has ....
1 Let H denote the complete undirected graph over V = {v_1, ..., v_n};
  /* Skeleton Recovery */
2 for i ← 0 to |V_H| − 2 do
3   for j ← 1 to |V_H| do
4     Set a_H(v_j) = ad_H(v_j);
5   end
6   while possible do
7     Select any ordered pair of nodes u and v in H such that u ∈ a_H(v) and |a_H(u) \ v| ≥ i, using order(V);
8     if there exists S ⊆ (a_H(u) \ v) s.t. |S| = i and u ⊥⊥_p v | S (i.e., u is independent of v given S in the probability distribution p) then
9       Set S_uv = S_vu = S;
10      Remove the edge u − v from H;
11    end
12  end
13 end
  /* Complex Recovery and orientation rules */
14 Follow the same procedures as in Algorithm 4 (lines 11-19).
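The only change relative to the earlier skeleton sketch is the snapshot of the adjacency sets taken at the start of each level i (lines 3-5 of Algorithm 5); a minimal Python sketch of that modification, with the same assumed ci_test callback, is given below.

```python
from itertools import combinations

def stable_pc_like_skeleton(variables, ci_test):
    """Sketch of the stable skeleton recovery (Algorithm 5, lines 2-13).

    Identical to the earlier pc_like_skeleton sketch except that the adjacency
    sets a_H(v) are frozen at the start of each level i, so edge deletions
    within a level cannot influence which tests are performed.
    """
    adj = {v: set(variables) - {v} for v in variables}
    sepset = {}
    for i in range(len(variables) - 1):
        a = {v: set(adj[v]) for v in variables}          # lines 3-5: snapshot a_H(v)
        for u in variables:
            for v in list(adj[u]):
                if v not in adj[u] or len(a[u] - {v}) < i:
                    continue
                for S in combinations(sorted(a[u] - {v}), i):
                    if ci_test(u, v, set(S)):
                        sepset[(u, v)] = sepset[(v, u)] = set(S)
                        adj[u].discard(v)                # removal recorded in adj ...
                        adj[v].discard(u)                # ... but the snapshot `a` stays fixed
                        break
    return adj, sepset
```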

Theorem 2.8. Let the distribution of V be faithful to an LWF CG G, and assume that we are given perfect conditional independence information about all pairs of variables (u, v) in V given subsets S ⊆ V \ {u, v}. Then the output of the stable PC-like algorithm is the pattern of G.

Theorem 2.9. The skeleton resulting from the sample version of the stable PC-like algo- rithm is order-independent.

Example 3 (Order-independent skeletons). We go back to Example 1, and consider the sample version of Algorithm 5 (the stable PC-like algorithm). The algorithm now outputs the skeleton shown in Figure

2.4(b) for both orderings order1(V) and order2(V). We again go through the algorithm step by step. We start with a complete undirected graph on V. No conditional independence found when i = 0. Also, when i = 1, the algorithm correctly does not remove any edge.

When i = 2, the algorithm first computes the new adjacency sets: a_H(v) = V \ {v}, ∀v ∈ V. There is a pair of variables that is thought to be conditionally independent given a subset of size two, namely (a, d). Since the sets a_H(v) are not updated after edge removals, it does not matter in which order we consider the ordered pair. Any ordering leads to the removal of the edge between a and d. When i = 3, the algorithm first computes the new adjacency sets: a_H(a) = a_H(d) = {b, c, e} and a_H(v) = V \ {v}, for v = b, c, e. There are two pairs of variables that are thought to be conditionally independent given a subset of size three, namely (a, e) and (c, e). Since the sets a_H(v) are not updated after edge removals, it does not matter in which order we consider the ordered pairs. Any ordering leads to the removal of both edges a − e and c − e.

Order-Independent Complex Recovery

We propose two methods to resolve the order-dependence in the complex recovery phase, using the conservative PC algorithm (CPC) of Ramsey, Spirtes, and Zhang (2006) and the majority rule PC algorithm (MPC) of Colombo and Maathuis (2014).

The Conservative PC-like algorithm (CPC-like algorithm) works as follows. Let H be the undirected graph resulting from the skeleton recovery phase of Algorithm 4. For each vertex pair {u, v} s.t. u and v are not adjacent in H, determine all subsets S of ad_H(u) that make u and v conditionally independent, i.e., that satisfy u ⊥⊥_p v | S. We refer to such sets as separating sets. The undirected edge u − w is labelled as unambiguous if at least one such separating set is found and either for each S the set S ∪ {w} c-separates u from v or for none of them S ∪ {w} c-separates u from v; otherwise it is labelled as ambiguous. If u − w is unambiguous, it is oriented as u → w if and only if for none of the separating sets S, S ∪ {w} c-separates u from v. Moreover, in the complex recovery phase of Algorithm 4, lines 11-18, the orientation rule is adapted so that only unambiguous undirected edges are oriented. The output of the CPC-like algorithm is a chain graph in which ambiguous undirected edges are marked. We refer to the combination of the stable PC-like and CPC-like algorithms as the stable CPC-like algorithm.

In the case of DAGs, Colombo and Maathuis (2014) found that the CPC algorithm can be very conservative, in the sense that very few unshielded triples (v-structures) are unambiguous in the sample version, where conditional independence relationships have to be estimated from data. They proposed a minor modification of the CPC approach, called the majority rule PC algorithm (MPC), to mitigate the (unnecessary) severity of the CPC-like approach. We similarly propose the majority rule PC-like algorithm (MPC-like) for LWF CGs. As in the CPC-like algorithm, we first determine all subsets S of ad_H(u) that make non-adjacent vertices u and v conditionally independent, i.e., that satisfy u ⊥⊥_p v | S. The undirected edge u − w is labelled as (α, β)-unambiguous if at least one such separating set is found and no more than α% or no less than β% of the sets S ∪ {w} c-separate u from v, for 0 ≤ α ≤ β ≤ 100. Otherwise it is labelled as ambiguous. (As an example, consider α = 30 and β = 60.) If an undirected edge u − w is unambiguous, it is oriented as u → w if and only if less than α% of the sets S ∪ {w} c-separate u from v. As in the CPC-like algorithm, the orientation rule in the complex recovery phase of the PC-like algorithm (Algorithm 4, lines 11-18) is adapted so that only unambiguous undirected edges u − w are oriented, and the output is a chain graph in which ambiguous undirected edges u − w are marked. Note that the CPC-like algorithm is the special case of the MPC-like algorithm with α = 0 and β = 100. We refer to the combination of the stable PC-like and MPC-like algorithms as the stable MPC-like algorithm.
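The (α, β) decision rule described above can be summarized in a few lines; the following Python sketch applies it to one candidate complex arrow u → w, assuming the separating sets and a c-separation oracle are supplied by machinery not shown here, and all names are our own.

```python
def label_edge_majority_rule(separating_sets, w, c_separates, alpha=30.0, beta=60.0):
    """Sketch of the (α, β) majority-rule decision for a candidate arrow u → w.

    `separating_sets` is the collection of sets S with u ⊥⊥_p v | S, and
    `c_separates(T)` is assumed to report whether the set T c-separates u from v.
    Returns "orient", "keep undirected", or "ambiguous".
    """
    if not separating_sets:
        return "ambiguous"                       # no separating set found at all
    hits = sum(1 for S in separating_sets if c_separates(set(S) | {w}))
    frac = 100.0 * hits / len(separating_sets)
    if alpha < frac < beta:
        return "ambiguous"                       # neither clearly low nor clearly high
    return "orient" if frac < alpha else "keep undirected"
```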

Theorem 2.10. Let the distribution of V be faithful to an LWF CG G, and assume that we are given perfect conditional independence information about all pairs of variables (u, v) in V given subsets S ⊆ V \ {u, v}. Then the output of the (stable) CPC/MPC-like algorithm is the pattern of G.

Theorem 2.11. The decisions about U-structures in the sample version of the stable CPC / MPC-like algorithm are order-independent. In addition, the sample versions of stable CPC-like and stable MPC-like algorithms are fully order-independent.

2.4.3 Evaluation

In this section, we evaluate the performance of our algorithm in various setups using simulated/synthetic data sets. We first compare the performance of our algorithm with the LCD algorithm (Ma, Xie, and Geng, 2008) by running them on randomly generated LWF CGs (a brief description of the LCD algorithm is provided at the beginning of section 2.4.3). Empirical simulations show that our PC-like algorithm achieves competitive results with the LCD algorithm in terms of error measures and runtime. We have also compared our method with the LCD algorithm on different discrete Bayesian networks such as ASIA, INSURANCE, ALARM, and HAILFINDER, which have been widely used in evaluating the performance of structural learning algorithms. The algorithms have been implemented in R. All the results reported here are based on our R implementation. The R code and complete results are reported in our supplementary materials (Javidian, Valtorta, and Jamshidi, 2019d).

Performance Evaluation on Random LWF CGs

To investigate the performance of the proposed algorithms, we use the same approach as in (Ma, Xie, and Geng, 2008) for evaluating the performance of the LCD algorithm on LWF CGs. We run our algorithms, the LCD algorithm, and the stable LCD (SLCD) algo- rithm (which uses the same order-independent skeleton recovery procedure as Algorithm 5) on randomly generated LWF CGs and we compare the results and report summary error measures.

Data Generation Procedure. First we explain the way in which the random LWF CGs and random samples are generated. Given a vertex set V, let p = |V| and let N denote the average degree of edges (including undirected, pointing out, and pointing in) for each vertex. We generate a random LWF CG on V as follows:

1. Order the p vertices and initialize a p × p matrix A with zeros;

2. For each element in the lower triangle part of A, set it to be a random number generated from a Bernoulli distribution with probability of occurrence s = N/(p − 1);

3. Symmetrize A according to its lower triangle;

4. Select an integer k randomly from {1, ..., p} as the number of chain components;

5. Split the interval [1, p] into k equal-length subintervals I_1, ..., I_k so that the set of variables falling into each subinterval I_m forms a chain component C_m;

6. Set A_ij = 0 for any (i, j) pair such that i ∈ I_l, j ∈ I_m with l > m.

This procedure yields an adjacency matrix A for a chain graph, with (A_ij = A_ji = 1) representing an undirected edge between V_i and V_j and (A_ij = 1, A_ji = 0) representing a directed edge from V_i to V_j. Moreover, it is not difficult to see that E[vertex degree] = N, where an adjacent vertex can be linked by either an undirected or a directed edge.

Given a randomly generated chain graph G with ordered chain components C_1, ..., C_k, we generate a Gaussian distribution on it via the rnorm.cg function from the LCD R package.
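A minimal Python sketch of steps 1-6 of this data-generation procedure is given below; the thesis uses an R implementation (including rnorm.cg for sampling), so the function below is only our own illustration of the graph-generation part and does not produce the Gaussian samples.

```python
import numpy as np

def random_lwf_cg(p, N, k=None, rng=None):
    """Sketch of the random LWF CG generator (steps 1-6 above).

    Returns an adjacency matrix A with A[i, j] = A[j, i] = 1 for an undirected
    edge and A[i, j] = 1, A[j, i] = 0 for a directed edge V_i -> V_j, plus the
    chain-component index of each vertex.
    """
    rng = np.random.default_rng() if rng is None else rng
    s = N / (p - 1)                                        # step 2: edge probability
    lower = np.tril(rng.binomial(1, s, size=(p, p)), k=-1)
    A = lower + lower.T                                    # steps 2-3: symmetrized lower triangle
    k = k if k is not None else int(rng.integers(1, p + 1))   # step 4: number of components
    bounds = np.linspace(0, p, k + 1).astype(int)          # step 5: (roughly) equal subintervals
    comp = np.zeros(p, dtype=int)
    for m in range(k):
        comp[bounds[m]:bounds[m + 1]] = m
    for i in range(p):
        for j in range(p):
            if comp[i] > comp[j]:                          # step 6: keep only arrows pointing
                A[i, j] = 0                                # from earlier to later components
    return A, comp
```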

Experimental Results: We evaluate the performance of the proposed algorithms in terms of the six measurements that are commonly used (Colombo and Maathuis, 2014; Kalisch and Bühlmann, 2007; Ma, Xie, and Geng, 2008; Tsamardinos et al., 2006) for constraint- based learning algorithms: (a) the true positive rate (TPR) (also known as sensitivity, re- call, and hit rate), (b) the false positive rate (FPR) (also known as fall-out), (c) the true discovery rate (TDR) (also known as precision or positive predictive value), (d) accuracy (ACC) for the skeleton, (e) the structural Hamming distance (SHD) (this is the metric de- scribed in Tsamardinos et al. (2006) to compare the structure of the learned and the original graphs), and (f) run-time for the pattern recovery algorithms. In principle, large values of TPR, TDR, and ACC, and small values of FPR and SHD indicate good performance. In

short, TPR = TP/Pos (true positives over the number of real positive cases in the data) is the ratio of the number of correctly identified edges to the total number of true edges; FPR = FP/Neg (false positives over the number of real negative cases in the data) is the ratio of the number of incorrectly identified edges to the total number of gaps; TDR = TP/(the total number of edges in the recovered CG) is the ratio of the number of correctly identified edges to the total number of edges in the estimated graph; ACC = (TP + TN)/(Pos + Neg); and SHD is the number of legitimate operations needed to change the current resulting graph into the true CG, where legitimate operations are: (a) add or delete an edge, and (b) insert, delete or reverse an edge orientation. In our simulation, we change three parameters p (the number of vertices), n (sample size), and N (expected number of adjacent vertices) as follows:

• p ∈ {50, 100},

• n ∈ {200, 2000}, and

• N ∈ {2, 3}.
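As a concrete reading of the skeleton error measures defined above, the following Python sketch computes TPR, FPR, TDR, and ACC from 0/1 adjacency matrices of the true and estimated skeletons; the interface is our own convention, and SHD, which also depends on edge orientations, is omitted.

```python
import numpy as np

def skeleton_error_measures(true_adj, est_adj):
    """Sketch: TPR, FPR, TDR, and ACC for a recovered skeleton.

    Both arguments are assumed to be symmetric 0/1 numpy adjacency matrices
    of the true and estimated skeletons.
    """
    p = true_adj.shape[0]
    iu = np.triu_indices(p, k=1)                 # count each vertex pair once
    truth, est = true_adj[iu] == 1, est_adj[iu] == 1
    tp = int(np.sum(truth & est))
    fp = int(np.sum(~truth & est))
    tn = int(np.sum(~truth & ~est))
    fn = int(np.sum(truth & ~est))
    pos, neg = tp + fn, fp + tn
    return {
        "TPR": tp / pos if pos else 0.0,         # correct edges / true edges
        "FPR": fp / neg if neg else 0.0,         # wrongly added edges / true gaps
        "TDR": tp / (tp + fp) if (tp + fp) else 0.0,   # precision over estimated edges
        "ACC": (tp + tn) / (pos + neg),
    }
```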

For each (p, N) combination, we first generate 30 random LWF CGs. We then generate a random Gaussian distribution based on each graph and draw an independent and identically distributed (i.i.d.) sample of size n from this distribution for each possible n. For each sample, two different significance levels (α = 0.05, 0.005) are used to perform the hypothesis tests. The null hypothesis H_0 is "two variables u and v are conditionally independent given a set C of variables" and the alternative H_1 is that H_0 may not hold. We then compare the results to assess the influence of the significance level on the performance of our algorithms. Figures 2.6 and 2.7 show that: (a) as we expected (Ma, Xie, and Geng, 2008; Kalisch and Bühlmann, 2007), both algorithms work well on sparse graphs (N = 2, 3), (b) for both algorithms, typically the TPR and TDR increase with sample size, (c) for both algorithms, typically the SHD decreases with sample size, (d) a large significance level (α = 0.05) typically yields a large TPR and SHD, (e) in almost all cases, the performance of the decomposition-based algorithm based on all error measures is better than the

performance of the PC-like algorithm, and (f) generally, our empirical results suggest that in order to obtain a better performance, we can choose a small value (say α = 0.005) for the significance level of individual tests along with a large sample size (say n = 2000). However, the optimal value for a desired overall error rate may depend on the sample size, significance level, and the sparsity of the underlying graph.

Since both the PC-like algorithm and the LCD algorithm assume faithfulness and the CKES algorithm (Peña, Sonntag, and Nielsen, 2014) does not assume the faithfulness requirement, the comparison between our proposed algorithms and the CKES algorithm may seem unfair (for a detailed discussion see (Peña, Sonntag, and Nielsen, 2014)). Also, we did not compare the proposed algorithms with the ASP algorithm due to the scalability issues discussed in (Sonntag et al., 2015a).

Performance on Discrete Bayesian Networks

Bayesian networks are special cases of LWF CGs. It is of interest to see whether the decomposition-based algorithms still work well when the data are actually generated from a Bayesian network. For this purpose, we perform simulation studies for four well-known Bayesian networks from Bayesian Network Repository: ASIA, INSURANCE, ALARM, and HAILFINDER. We purposefully selected these networks because they have different sizes (from small to large number of nodes, edges, and parameters). We briefly introduce these networks here:

• ASIA (Lauritzen and Spiegelhalter, 1988), with 8 nodes, 8 edges, and 18 parameters: it describes the diagnosis of a patient at a chest clinic who may have just come back from a trip to Asia and may be showing dyspnea. Standard learning algorithms are not able to recover the true structure of the network because of the presence of a functional node.

• INSURANCE (Binder et al., 1997), with 27 nodes, 52 edges, and 984 parameters: it evaluates car insurance risks.


[Plots: recall (TPR), precision (TDR), FPR, SHD, ACC, and runtime of the original and stable PC-like (OPC, SPC) and LCD (OLCD, SLCD) algorithms at α = 0.05 and α = 0.005, for sample sizes n = 200 and n = 2000.]

Figure 2.6: Performance of the original and stable LCD and PC-like algorithms for randomly generated Gaussian chain graph models: average over 30 repetitions with 50 variables, expected degree N = 3, and significance levels α = 0.05, 0.005.


[Plots: recall (TPR), precision (TDR), FPR, SHD, ACC, and runtime of the original and stable PC-like (OPC, SPC) and LCD (OLCD, SLCD) algorithms at α = 0.05 and α = 0.005, for sample sizes n = 200 and n = 2000.]

Figure 2.7: Performance of the original and stable LCD and PC-like algorithms for randomly generated Gaussian chain graph models: average over 30 repetitions with 100 variables, expected degree N = 3, and significance levels α = 0.05, 0.005.

• ALARM (Beinlich et al., 1989), with 37 nodes, 46 edges, and 509 parameters: it was designed by medical experts to provide an alarm message system for intensive care unit patients based on the output of a number of vital-sign monitoring devices.

• HAILFINDER (Abramson et al., 1996), with 56 nodes, 66 edges, and 2656 parameters: it was designed to forecast severe summer hail in northeastern Colorado.

We compared the performance of our algorithms against the LCD algorithm for these Bayesian networks for three different significance levels (α = 0.05/0.01/0.005). The results of all learning methods are summarized in Table 2.3. The results indicate that the performance of the two algorithms in terms of FPR, ACC, and SHD is very similar. However, the LCD algorithm outperforms the PC-like algorithm in terms of the TPR.

Discussion and Conclusion

In this section, we presented a computationally feasible algorithm for learning the structure of LWF chain graphs. We compared the performance of our PC-like algorithm with that of the LCD algorithm proposed by (Ma, Xie, and Geng, 2008), in the Gaussian and discrete cases. Both the PC-like algorithm and the LCD algorithm are constraint-based algorithms that learn the structure of the underlying LWF chain graph in three steps: (a) determining the skeleton: the resulting undirected graph in this phase contains an undirected edge u − v iff there is no set S ⊆ V \ {u, v} such that u ⊥⊥ v | S; the LCD algorithm achieves this by a divide-and-conquer approach; (b) determining the U-structures (minimal complexes); the LCD algorithm uses a localized search in this phase; (c) orienting some of the undirected edges into directed edges according to a set of rules applied iteratively to obtain the corresponding largest CG. The correctness of both algorithms relies upon the assumption that the probability distribution p is faithful to some LWF CG. Empirical simulations in the Gaussian and discrete cases show that both algorithms yield good results when the underlying graph is sparse. The PC-like algorithm achieves competitive results with the

Table 2.3: Results for discrete samples from the ASIA, INSURANCE, ALARM, and HAILFINDER networks, respectively. Within each algorithm, the three rows correspond to the significance levels α = 0.05/0.01/0.005, respectively.

Network      Algorithm            TPR    FPR     ACC    SHD
ASIA         LCD Algorithm        0.625  0.2     0.75    9
                                  0.625  0.2     0.75    9
                                  0.625  0.2     0.75    9
             PC-Like Algorithm    0.625  0       0.893   6
                                  0.625  0       0.893   6
                                  0.625  0       0.893   6
INSURANCE    LCD Algorithm        0.731  0.023   0.94   45
                                  0.731  0.036   0.93   44
                                  0.731  0.03    0.93   45
             PC-Like Algorithm    0.62   0       0.94   30
                                  0.65   0       0.95   29
                                  0.62   0       0.94   30
ALARM        LCD Algorithm        0.78   0.032   0.95   46
                                  0.74   0.038   0.95   52
                                  0.78   0.03    0.96   48
             PC-Like Algorithm    0.43   0       0.96   43
                                  0.46   0       0.96   43
                                  0.41   0       0.96   43
HAILFINDER   LCD Algorithm        0.82   0.004   0.988  43
                                  0.986  0.006   0.987  46
                                  0.82   0.004   0.988  44
             PC-Like Algorithm    0.515  0.0007  0.98   42
                                  0.515  0.0007  0.98   44
                                  0.515  0.0007  0.98   42

LCD learning algorithm in both Gaussian and discrete cases. In fact, the LCD method usu- ally outperforms (in many cases slightly) the PC-like algorithm. We also provided stable (order-independent) versions of both algorithms and tested them.

Chapter 3

MVR Chain Graphs

Multivariate regression CGs (MVR CGs) were originally introduced by Cox and Wermuth (Cox and Wermuth, 1993, 1996) and widely studied in (Drton, 2009; Marchetti and Lupparelli, 2008, 2011; Wermuth and Cox, 2004; Sonntag and Peña, 2012, 2015a; Sonntag and Peña, 2015b). Cox and Wermuth represented these graphs using directed edges and dashed edges, but we follow Richardson (Richardson, 2003) because bidirected edges allow the m-separation criterion (defined in section 3.2) to be viewed more directly as an extension of d-separation than is possible with dashed edges (Richardson, 2003). The use of bidirected edges is also more in keeping with the path-diagram notation introduced in (Wright, 1934). The most important difference between MVR CGs and both AMP CGs and LWF CGs is that MVR CG components contain bidirected instead of undirected edges. As a result, MVR CGs are a superclass of directed acyclic graphs (DAGs) and bidirected graphs (BGs), also known as covariance graphs, instead of DAGs and undirected graphs (UGs) as in the case of AMP and LWF CGs.

In this chapter, we address the problem of finding minimal separators in MVR chain graphs, because minimality is a desirable property to ensure efficiency and usability. Moreover, finding minimal separators is useful for learning and inference tasks (Acid and Campos, 1996; Tian, Paz, and Pearl, 1998). As we mentioned in Chapter 1, the possibility of learning the structure of graphical models directly from sampled data is an important aspect of PGMs. So, in this chapter, we propose a decomposition-based algorithm for learning the structure of MVR chain graphs

42 under the faithfulness assumption and we show that our proposed approach is comparable to the existing methods in terms of error measures and runtime. The chapter is organized as follows. In section 3.1, we define the notation and termi- nology used throughout the chapter. In section 3.2, we take a closer look at MVR CGs in terms of separation criterion, Markov properties, factorization, Markov equivalence class, and essential chain graphs. In section 3.3, we propose and solve an optimization problem related to the separation in MVR chain graphs. In section 3.4 and 3.5, we consider the chal- lenging task of recovering the structure of MVR CGs from sampled data. For this purpose, we consider the PC-like algorithm for structure learning of MVR CGs, a constraint-based method proposed by Sonntag and Peña in (Sonntag and Peña, 2012). We show that the PC-like algorithm is order-dependent, because the output can depend on the order in which the variables are given. This order-dependence is a minor issue in low-dimensional set- tings. However, it can be very pronounced in high-dimensional settings, where it can lead to highly variable results. We propose two modifications of the PC-like algorithm that remove part or all of this order-dependence. Also, we present a decomposition-based algo- rithm that finds the essential chain graph from data under the faithfulness assumption and we show that our proposed algorithm achieves competitive/better results with the existing methods in terms of error measures and runtime.

3.1 Basic Definitions and Concepts

In this subsection, we describe the notation and some basic concepts used throughout the chapter. If A ⊆ V is a subset of the vertex set in a graph G = (V, E), it induces a subgraph G_A = (A, E_A), where the edge set E_A = E ∩ (A × A) is obtained from G by keeping edges with both endpoints in A.

If there is an arrow from a pointing towards b, a is said to be a parent of b. The set of parents of b is denoted as pa(b). If there is an undirected edge between a and b, a and

b are said to be adjacent. The set of vertices adjacent to a vertex a is denoted as adj(a). The expressions pa(A) and adj(A) denote the collection of parents and adjacents of vertices in A that are not themselves elements of A. The bound of a subset A of vertices, bound(A), is the set of vertices in V \ A that are parents of or adjacent to vertices in A. The closure of A is cl(A) = bound(A) ∪ A.

A path of length n from a to b is a sequence a = a_0, ..., a_n = b of distinct vertices such that (a_{i−1}, a_i) ∈ E for all i = 1, ..., n. A chain of length n from a to b is a sequence a = a_0, ..., a_n = b of distinct vertices such that (a_{i−1}, a_i) ∈ E, or (a_i, a_{i−1}) ∈ E, or {a_{i−1}, a_i} ∈ E, for all i = 1, ..., n. A vertex α is said to be an ancestor of a vertex β if either there is a directed path α → · · · → β from α to β, or α = β. A vertex α is said to be anterior to a vertex β if there is a path µ from α to β on which every edge is either of the form γ − δ, or γ → δ with δ between γ and β, or α = β; that is, there are no edges γ ↔ δ and there are no edges γ ← δ pointing toward α. Such a path is said to be an anterior path from α to β. We apply these definitions disjunctively to sets: an(X) = {α | α is an ancestor of β for some β ∈ X}, and ant(X) = {α | α is an anterior of β for some β ∈ X}. If necessary we specify the graph by a subscript, as in ant_G(X). The usage of the terms "ancestor" and "anterior" differs from Lauritzen (Lauritzen, 1996), but follows Frydenberg (Frydenberg, 1990). A vertex α is said to be antecedent to a vertex β if there is a path µ from α to β on which every edge is either of the form γ ↔ δ, or γ → δ with δ between γ and β, or α = β; that is, there are no edges of the form γ − δ. Such a path is said to be an antecedent path from α to β. We apply this definition to sets: antec(X) = {α | α is an antecedent of β for some β ∈ X}. If antec(a) ⊆ A for all a ∈ A, we say that A is an antecedental set. The smallest antecedental set containing A is denoted by Antec(A).
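Several of the criteria below are phrased in terms of ancestor (or anterior) sets, so the following small Python sketch computes an(X) by reverse reachability over the directed edges; the edge representation is our own choice, and undirected or bidirected edges are simply ignored, since they do not create ancestors under the definition above.

```python
def ancestors(directed_edges, X):
    """Sketch: the ancestor set an(X) of a set of vertices X.

    `directed_edges` is assumed to be an iterable of pairs (a, b) meaning a -> b;
    an(X) contains X itself plus every vertex with a directed path into X.
    """
    parents = {}
    for a, b in directed_edges:
        parents.setdefault(b, set()).add(a)
    result, frontier = set(X), list(X)
    while frontier:
        v = frontier.pop()
        for p in parents.get(v, set()):
            if p not in result:
                result.add(p)
                frontier.append(p)
    return result
```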

Definition 3.1. A mixed graph is a graph containing three types of edges: undirected (−), directed (→), and bidirected (↔). An ancestral graph G is a mixed graph in which the following conditions hold for all vertices α in G:
(i) if α and β are joined by an edge with an arrowhead at α, then α is not anterior to β.

44 (ii) there are no arrowheads present at a vertex which is an endpoint of an undirected edge.

Definition 3.2. A nonendpoint vertex ζ on a path is a collider on the path if the edges preceding and succeeding ζ on the path both have an arrowhead at ζ, that is, → ζ ←, → ζ ↔, ↔ ζ ←, or ↔ ζ ↔. A nonendpoint vertex ζ on a path which is not a collider is a noncollider on the path. A path between vertices α and β in an ancestral graph G is said to be m-connecting given a set Z (possibly empty), with α, β ∉ Z, if:
(i) every noncollider on the path is not in Z, and

(ii) every collider on the path is in ant_G(Z).
If there is no path m-connecting α and β given Z, then α and β are said to be m-separated given Z. Sets X and Y are m-separated given Z if, for every pair α, β with α ∈ X and β ∈ Y, α and β are m-separated given Z (X, Y, and Z are disjoint sets; X, Y are nonempty); we indicate this by ⟨X, Y | Z⟩_G, where G is omitted if clear from context. This criterion is referred to as a global Markov property. We denote the independence model resulting from applying the m-separation criterion to G by I_m(G). This is an extension of Pearl's d-separation criterion to mixed graphs, in that in a DAG D a path is d-connecting if and only if it is m-connecting.

Let G_A denote the induced subgraph of G on the vertex set A, formed by removing from G all vertices that are not in A and all edges that do not have both endpoints in A. Two vertices x and y in an MVR chain graph G are said to be collider connected if there is a path from x to y in G on which every non-endpoint vertex is a collider; such a path is called a collider path. (Note that a single edge trivially forms a collider path, so if x and y are adjacent in an MVR chain graph then they are collider connected.) The augmented graph derived from G, denoted (G)^a, is an undirected graph with the same vertex set as G such that c − d in (G)^a ⇔ c and d are collider connected in G.

Definition 3.3. Disjoint sets X, Y ≠ ∅, and Z (Z may be empty) are said to be m*-separated if X and Y are separated by Z in (G_{ant(X∪Y∪Z)})^a. Otherwise X and Y are said to be m*-

connected given Z. The resulting independence model is denoted by I_{m*}(G).

Richardson and Spirtes (Richardson and Spirtes, 2002, Theorem 3.18) show that for an ancestral graph G, I_m(G) = I_{m*}(G). Note that in the case of ADMGs and MVR CGs, the anterior sets in Definitions 3.2 and 3.3 can be replaced by ancestor sets, because in both cases anterior sets and ancestor sets are the same. An ancestral graph G is said to be maximal if, for every pair of vertices α, β, if α and β are not adjacent in G then there is a set Z (α, β ∉ Z) such that ⟨{α}, {β} | Z⟩ ∈ I_m(G). Thus a graph is maximal if every missing edge corresponds to at least one independence in the corresponding independence model. A simple example of a nonmaximal ancestral graph is shown in Figure 3.1: γ and δ are not adjacent, but are m-connected given every subset of {α, β}, hence I_m(G) = ∅. If G is an undirected graph or a directed acyclic graph, then G is

Figure 3.1: (Lauritzen and Richardson, 2002) A nonmaximal ancestral graph.

a maximal ancestral graph (Richardson and Spirtes, 2002, Proposition 3.19). Richardson and Spirtes (Richardson and Spirtes, 2002, Theorem 5.1) prove that if G is an ancestral graph then there exists a unique maximal ancestral graph G′, formed by adding ↔ edges to G, such that I_m(G) = I_m(G′). Therefore, from now on, without loss of generality we assume that the graphs given in the above-mentioned problems are maximal. Acyclic directed mixed graphs (ADMGs), also known as semi-Markov(ian) models (Pearl, 2009), contain directed (→) and bidirected (↔) edges subject to the restriction that there are no directed cycles (Richardson, 2003; Evans and Richardson, 2014). An ADMG that has no partially directed cycle is called a multivariate regression (MVR) chain

graph. ADMGs are a subclass of maximal ancestral graphs. In the next subsection, we prove that MVR chain graphs are a subclass of maximal ancestral graphs. The absence of partially directed cycles in MVR CGs implies that the vertex set of a chain graph can be partitioned into so-called chain components such that edges within a chain component are bidirected, whereas the edges between two chain components are directed and point in the same direction. So, any chain graph yields a directed acyclic graph D of its chain components, having the set 𝒯 of chain components as its node set and an edge T_1 → T_2 whenever there exists in the chain graph G at least one edge u → v connecting a node u in T_1 with a node v in T_2. In this directed graph, we may define for each T the set pa_D(T) as the union of all the chain components that are parents of T in the directed graph D. This concept is distinct from the usual notion of the parents pa_G(A) of a set of nodes A in the chain graph, that is, the set of all the nodes w outside A such that w → v with v ∈ A (Marchetti and Lupparelli, 2011).

Figure 3.2: An MVR CG with chain components: = T = a, b , T = c, d , T = T { 1 { } 2 { } 3 e, f , T = g, h . { } 4 { }}

For each T, the set of all components preceding T is known and we may define the cumulative set pre(T) = T T T 0 of nodes contained in the predecessors of component ∪ ≺ 0 T, which we sometimes call the past of T. The set pre(T) captures the notion of all the

47 potential explanatory variables of the response variables within T (Marchetti and Luppar- elli, 2011). In fact, MVR CGs can model the possible presence of residual associations among the responses using a bidirected graph, and this is consistent with an interpretation of bidirected edges in terms of latent variables (Roverato, 2017; Evans, 2016). Given an undirected graph G. Two vertices are said to be adjacent if they are connected by an edge. A subset S V that does not contain a or b is said to be an (a, b)-separator ⊆ if all paths from a to b intersect S . A set S of nodes that separates a given pair of nodes such that no proper subset of S separates that pair is called a minimal separator. Note that removing an (a, b)-separator disconnects a graph into two connected components, one containing a, and another containing b. Conversely, if a set S disconnects a graph into a connected component including a and another connected component including b, then S is an (a, b)-separator. Two disjoint vertex subsets A and B of V are adjacent if there is at least one pair of adjacent vertices u A and v B. Let A and B be two disjoint non-adjacent ∈ ∈ subsets of V. Similarly, we define an (A, B)-separator to be any subset of V (A B) whose \ ∪ removal separates A and B in distinct connected components. A minimal (A, B)-separator does not contain any other (A, B)-separator.

3.2 On the Properties of MVR Chain Graphs

In the present subsection we deal with the Markov properties for an MVR chain graph G = (V, E) thereby unifying the directed and bidirected cases. Unlike in the other CG interpretations, the bidirected edge in MVR CGs has a strong intuitive meaning. It can be seen to represent one or more hidden common causes between the variables connected by it. In other words, in an MVR CG any bidirected edge X Y ↔ can be replaced by X H Y to obtain a Bayesian network representing the same ← → independence model over the original variables, i.e. excluding the new variables H. These variables are called hidden, or latent, and have been marginalized away in the CG model (Sonntag, 2014).

48 Latent variables, which are often present in practice, cause several complications. First, causal inference based on structural learning (model selection) algorithms such as the PC algorithm (Spirtes, Glymour, and Scheines, 2000) may be incorrect. Second, if a distribu- tion is faithful1 to a DAG, then the distribution obtained by marginalizing on some of the variables may not be faithful to any DAG on the observed variables, i.e., the space of DAGs is not closed under marginalization (Colombo et al., 2012).

Example 4. Consider that the DAG G in Figure 3.3(a) is a perfect map of the distribution of (X, Y, U, V, H), and suppose that H is latent. There is no DAG on X, Y, U, V that encodes { } exactly the same d-separation relations among X, Y, U, V as G. Hence, there does not exist { } a perfect map of the marginal distribution of (X, Y, U, V, H).

Figure 3.3: (a) A directed graph including a vertex H for an unobserved variable, (b) the independence structure encoded by the MVR CG

Mixed graphs provide a useful approach to address these problems without explicit modeling of latent variables (e.g., (Richardson and Spirtes, 2002; Pearl, 2009; Wermuth and Sadeghi, 2012)). The nodes of these graphs index the observed variables only. The edges, however, may be of two types, directed and bidirected. This added flexibility allows one to represent the more complicated dependence structures arising from a DAG with latent variables. A straightforward generalization of d-separation determines conditional independencies in mixed graph models (Drton and Maathuis, 2017). For instance, the MVR

1A distribution P is faithful to DAG G if any independency in P implies a corresponding d-separation property in G (Spirtes, Glymour, and Scheines, 2000).

49 chain graph in Figure 3.3 (b) is a perfect map for the distribution in Example 4. As a result, one possibility for solving the above mentioned problems is exploiting MVR chain graphs that cope with these problems without explicit modeling of latent variables. This motivates the development of studies on MVR CGs, and (Drton and Maathuis, 2017) emphasize that methods that account for the effects of latent variables need to be developed further. An example of a situation for which CG is useful is if we have a system containing two genes and two diseases caused by these such that Gene1 is the cause of Disease1, Gene2 is the cause of Disease2, and the diseases are correlated. In this case we might suspect the presence of an unknown factor inducing the correlation between Disease1 and Disease2, such as being exposed to a stressful environment. Having such a hidden variable results in the independence model described in the information above. The MVR CG representing the information above is shown in Figure 3.4 (a) while the best (inclusion optimal) BN and MN are shown in Figure 3.4 (b) and (c), respectively. We can now see that it is only the MVR CG that describes the relations in the system correctly (Sonntag and Peña, 2015b).

Figure 3.4: A gene and disease example with MVR CG representation, BN representation and MN representation (Sonntag and Peña, 2015b).

The causal interpretation of bidirected edges in MVR CGs along with the above dis- cussion provide strong motivation for the importance of MVR CGs. In the first decade of the 21st century, several Markov property (global, pairwise, block recursive, and so on) were introduced by authors and researchers (Richardson and Spirtes, 2002; Wermuth and Cox, 2004; Marchetti and Lupparelli, 2008, 2011; Drton, 2009). Al- though Lauritzen, Wermuth, and Sadeghi (Sadeghi and Lauritzen, 2014; Sadeghi and Wer- muth, 2016) proved that the global and (four) pairwise Markov properties of an MVR chain graph are equivalent for any independence model that is a compositional graphoid,

The major contributions of this subsection may be summarized as follows:

• An alternative local Markov property for MVR chain graphs, which is equivalent to the other Markov properties in the literature for compositional semi-graphoids.

• A comparison of the different Markov properties proposed for MVR chain graphs in the literature, and conditions under which they are equivalent.

• An alternative explicit factorization criterion for MVR chain graphs, based on the factorization criterion proposed for acyclic directed mixed graphs in (Evans and Richardson, 2014).

3.2.1 Markov Properties for MVR Chain Graphs

In this subsection, we first show formally that MVR chain graphs are a subclass of the maximal ancestral graphs of Richardson and Spirtes (Richardson and Spirtes, 2002) that include only observed and latent variables.

Theorem 3.4. If G is an MVR chain graph, then G is an ancestral graph.

Proof. Obviously, every MVR chain graph is a mixed graph without undirected edges. So, it is enough to show that condition (i) in Definition 3.1 is satisfied. For this purpose, suppose that α and β are joined by an edge with an arrowhead at α in an MVR chain graph G. Two cases are possible. First, if α ↔ β is an edge in G, then by the definition of an MVR chain graph both vertices belong to the same chain component. Since all edges on a path between two nodes of a chain component are bidirected, by definition α cannot be an anterior of β. Second, if α ← β is an edge in G, then by the definition of an MVR chain graph α and β belong to two different chain components (β is in a chain component that is to the right of the chain component containing α). We know that all directed edges in an MVR chain graph point from right to left, so there is no directed path from α to β in G, i.e., α cannot be an anterior of β in this case. We have shown that α cannot be an anterior of β in either case, and therefore condition (i) in Definition 3.1 is satisfied. In other words, every MVR chain graph is an ancestral graph. □

The following result is often mentioned in the literature (Wermuth and Sadeghi, 2012; Peña, 2015; Sadeghi and Lauritzen, 2014; Sonntag, 2014), but we know of no published proof.

Corollary 3.5. Every MVR chain graph has the same independence model as a DAG under marginalization.

Proof. From Theorem 3.4, we know that every MVR chain graph is an ancestral graph. The result follows directly from (Richardson and Spirtes, 2002, Theorem 6.3). 

Corollary 3.6. If G is an MVR chain graph, then G is a maximal ancestral graph.

Proof. To characterize maximal ancestral graphs, we need the following notion: a chain ⟨r, q_1, …, q_p, s⟩ is a primitive inducing chain between r and s if and only if for every i, 1 ≤ i ≤ p:

• q_i is a collider on the chain; and

• q_i ∈ an({r} ∪ {s}).

Based on Corollary 4.4 in (Richardson and Spirtes, 2002), every nonmaximal ancestral graph contains a primitive inducing chain between a pair of nonadjacent vertices. So, it is enough to show that an MVR chain graph G does not contain a primitive inducing chain between any pair of nonadjacent vertices of G. For this purpose, suppose that r and s are a pair of nonadjacent vertices in an MVR chain graph G such that the chain ⟨r, q_1, …, q_p, s⟩ is a primitive inducing chain between r and s. Then, for every i, 1 ≤ i ≤ p, q_i is a collider on the chain. Since, for every i, 1 ≤ i ≤ p, q_i ∈ an({r} ∪ {s}), there is a partially directed cycle in G, which is a contradiction. □

Global and Pairwise Markov Properties

The following properties have been defined for conditional independences of probability distributions. Let A, B, C and D be disjoint subsets of V_G, where C may be the empty set.

1. Symmetry: A ⊥⊥ B ⇒ B ⊥⊥ A;
2. Decomposition: A ⊥⊥ BD | C ⇒ (A ⊥⊥ B | C and A ⊥⊥ D | C);
3. Weak union: A ⊥⊥ BD | C ⇒ (A ⊥⊥ B | DC and A ⊥⊥ D | BC);
4. Contraction: (A ⊥⊥ B | DC and A ⊥⊥ D | C) ⇔ A ⊥⊥ BD | C;
5. Intersection: (A ⊥⊥ B | DC and A ⊥⊥ D | BC) ⇒ A ⊥⊥ BD | C;
6. Composition: (A ⊥⊥ B | C and A ⊥⊥ D | C) ⇒ A ⊥⊥ BD | C.

An independence model is a semi-graphoid if it satisfies the first four independence properties listed above. Note that every probability distribution p satisfies the semi-graphoid properties (Studený, 1989). If a semi-graphoid further satisfies the intersection property, we say it is a graphoid (Pearl and Paz, 1987; Studený, 2005, 1989). A compositional graphoid further satisfies the composition property (Sadeghi and Wermuth, 2016). If a semi-graphoid further satisfies the composition property, we say it is a compositional semi-graphoid.

For a node i in the connected component T, its past, denoted by pst(i), consists of all nodes in components having a higher order than T. To define pairwise Markov properties for MVR CGs, we use the following notation for the parents, anteriors and past of a node pair i, j: pa_G(i, j) = pa_G(i) ∪ pa_G(j) \ {i, j}, ant(i, j) = ant(i) ∪ ant(j) \ {i, j}, and pst(i, j) = pst(i) ∪ pst(j) \ {i, j}. The distribution P of (X_n)_{n ∈ V} satisfies a pairwise Markov property (Pm), for m = 1, 2, 3, 4, with respect to MVR CG G if, for every uncoupled pair of nodes i and j (i.e., there is no directed or bidirected edge between i and j):

(P1): i ⊥⊥ j | pst(i, j),
(P2): i ⊥⊥ j | ant(i, j),
(P3): i ⊥⊥ j | pa_G(i, j), and
(P4): i ⊥⊥ j | pa_G(i) if i ≺ j.

Notice that in (P4), pa_G(i) may be replaced by pa_G(j) whenever the two nodes are in the same connected component. Sadeghi and Wermuth (Sadeghi and Wermuth, 2016) proved that all of the above pairwise Markov properties are equivalent for compositional graphoids. Also, they show that each of the above pairwise Markov properties is equivalent to the global Markov properties in Definitions 3.2 and 3.3 (Sadeghi and Wermuth, 2016, Corollary 1). The necessity of the intersection and composition properties follows from (Sadeghi and Lauritzen, 2014, Section 6.3).
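To make the conditioning sets in (P2) and (P3) concrete, the following Python sketch computes pa_G(i, j) and ant(i, j) for an uncoupled pair in a small, hypothetical MVR CG (the graph, the node names, and the helper functions are illustrative assumptions, not part of the thesis implementation). Directed edges are stored in a networkx DiGraph and bidirected edges in an undirected Graph; since an MVR CG has no undirected edges, the anterior set of a node is taken here to be its set of ancestors in the directed part.

import networkx as nx

# Hypothetical MVR CG: directed and bidirected parts kept separately.
directed = nx.DiGraph([("a", "c"), ("b", "c"), ("b", "d")])   # a -> c, b -> c, b -> d
bidirected = nx.Graph([("c", "d")])                           # c <-> d

def pa(v):
    return set(directed.predecessors(v))

def ant(v):
    # In an MVR CG (no undirected edges) the anterior set reduces to the ancestors.
    return nx.ancestors(directed, v)

def pairwise_sets(i, j):
    # Conditioning sets used by (P3) and (P2) for an uncoupled pair (i, j).
    uncoupled = not (directed.has_edge(i, j) or directed.has_edge(j, i)
                     or bidirected.has_edge(i, j))
    assert uncoupled, "pairwise properties apply only to uncoupled pairs"
    pa_ij = (pa(i) | pa(j)) - {i, j}
    ant_ij = (ant(i) | ant(j)) - {i, j}
    return pa_ij, ant_ij

print(pairwise_sets("a", "d"))   # ({'b'}, {'b'}): (P3) and (P2) both condition on {b}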

Block-recursive, Multivariate Regression (MR), and Ordered Local Markov Properties

The following definition explains the meaning of the multivariate regression interpretation of a chain graph:

Definition 3.7. (multivariate regression (MR) Markov property for MVR CGs (Marchetti and Lupparelli, 2011))² Let G be a chain graph with chain components (T | T ∈ 𝒯). The set Nb_G(A) is the union of A itself and the set of nodes w that are neighbors of A, that is, coupled by a bidirected edge to some node v in A. Moreover, the set of non-descendants nd_D(T) of a chain component T is the union of all components T′ such that there is no directed path from T to T′ in the directed graph of chain components D. A joint distribution P of the random vector X obeys the multivariate regression (MR) Markov property with respect to G if it satisfies the following independencies. For all T ∈ 𝒯 and for all A ⊆ T:
(MR1) if A is connected: A ⊥⊥ [pre(T) \ pa_G(A)] | pa_G(A).
(MR2) if A is disconnected with connected components A_1, ..., A_r: A_1 ⊥⊥ ... ⊥⊥ A_r | pre(T).

²A generalization of this property for regression graphs is the ordered regression graph Markov property in (Roverato, 2017).

Remark 1. (Marchetti and Lupparelli, 2011, Remark 2) One immediate consequence of Definition 3.7 is that if the probability density p(x) is strictly positive, then it factorizes according to the directed acyclic graph of the chain components: p(x) = ∏_{T ∈ 𝒯} p(x_T | x_{pa_D(T)}).

Drton discussed four different block-recursive Markov properties for chain graphs, of which we discuss here the Markov property of type IV (Drton, 2009).

Definition 3.8. (Chain graph Markov property of type IV (Drton, 2009)) Let G be a chain graph with chain components (T | T ∈ 𝒯) and directed acyclic graph D of components. The joint probability distribution of X obeys the block-recursive Markov property of type IV if it satisfies the following independencies:
(IV0): T ⊥⊥ [nd_D(T) \ pa_D(T)] | pa_D(T), for all T ∈ 𝒯;
(IV1): A ⊥⊥ [pa_D(T) \ pa_G(A)] | pa_G(A), for all T ∈ 𝒯 and for all A ⊆ T;
(IV2): A ⊥⊥ [T \ Nb_G(A)] | pa_D(T), for all T ∈ 𝒯 and for all connected subsets A ⊆ T.

The following example shows that, in general, the independence models resulting from Definitions 3.7 and 3.8 are different.

Example 5. Consider the MVR chain graph G in Figure 3.5.

Figure 3.5: An MVR CG with chain components 𝒯 = {T_1 = {1, 2, 3, 4}, T_2 = {5, 6}, T_3 = {7}}.

For the connected set A = {1, 2}, condition (MR1) implies that {1, 2} ⊥⊥ 6, 7 | 5, while condition (IV2) implies that {1, 2} ⊥⊥ 6 | 5, which is not implied directly by (MR1) and (MR2). Also, condition (MR2) implies that 1 ⊥⊥ 3, 4 | 5, 6, 7, while condition (IV2) implies that 1 ⊥⊥ 3, 4 | 5, 6, which is not implied directly by (MR1) and (MR2).

Theorem 1 in (Marchetti and Lupparelli, 2011) states that for a given chain graph G, the multivariate regression Markov property is equivalent to the block-recursive Markov property of type IV. Also, Drton in (Drton, 2009, Section 7 Discussion) claims (without proof) that the block-recursive Markov property of type IV can be shown to be equivalent to the global Markov property proposed in (Richardson and Spirtes, 2002; Richardson, 2003). Now, we introduce a local Markov property for ADMGs proposed by Richardson in (Richardson, 2003), which is an extension of the local well-numbering Markov property for DAGs introduced in (Lauritzen et al., 1990). For this purpose, we need the following definitions and notation.

For a given acyclic directed mixed graph (ADMG) G, the induced bidirected graph (G)_↔ is the graph formed by removing all directed edges from G. The district (aka c-component) of a vertex x in G is the connected component of x in (G)_↔, or equivalently

dis_G(x) = {y | y ↔ · · · ↔ x in G, or x = y}.

As usual, we apply the definition disjunctively to sets: dis_A(B) = ∪_{x ∈ B} dis_A(x). A set C is path-connected in (G)_↔ if every pair of vertices in C is connected via a path in (G)_↔; equivalently, every vertex in C has the same district in G. In an ADMG, a set A is said to be ancestrally closed if x → · · · → a in G with a ∈ A implies that x ∈ A. The set of ancestrally closed sets is defined as follows:

𝒜(G) = {A | an_G(A) = A}.

If A is an ancestrally closed set in an ADMG G, and x is a vertex in A that has no children in A, then we define the Markov blanket of x with respect to the induced subgraph on A as mb(x, A) = pa_G(dis_{G_A}(x)) ∪ (dis_{G_A}(x) \ {x}), where dis_{G_A}(x) is the district of x in the induced subgraph G_A.

Let G be an acyclic directed mixed graph. Specify a total ordering (≺) on the vertices of G such that x ≺ y ⇒ y ∉ an(x); such an ordering is said to be consistent with G. Define

pre_{G,≺}(x) = {v | v ≺ x or v = x}.

Definition 3.9 (Ordered local Markov property). Let G be an acyclic directed mixed graph. An independence model 𝒥 over the node set of G satisfies the ordered local Markov property for G, with respect to the ordering ≺, if for any x and ancestrally closed set A such that x ∈ A ⊆ pre_{G,≺}(x),

{x} ⊥⊥ [A \ (mb(x, A) ∪ {x})] | mb(x, A).
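The district and Markov blanket used in Definition 3.9 can be computed directly. The following Python sketch (using networkx; the example ADMG and the helper names are illustrative assumptions) computes dis_{G_A}(x) and mb(x, A) for a vertex x with no children in the ancestrally closed set A.

import networkx as nx

# Hypothetical ADMG: directed and bidirected parts kept separately.
directed = nx.DiGraph([("a", "b"), ("b", "c")])   # a -> b -> c
bidirected = nx.Graph([("c", "d"), ("d", "e")])   # c <-> d <-> e

def district(x, A):
    # dis_{G_A}(x): the connected component of x in the bidirected part of G_A.
    bi_A = bidirected.subgraph(A)
    if x not in bi_A:
        return {x}
    return set(nx.node_connected_component(bi_A, x))

def markov_blanket(x, A):
    # mb(x, A) = pa_G(dis_{G_A}(x)) union (dis_{G_A}(x) minus {x}),
    # for a vertex x with no children in A.
    dis = district(x, A)
    parents = {p for v in dis if v in directed for p in directed.predecessors(v)}
    return (parents | dis) - {x}

A = {"a", "b", "c", "d"}                 # an ancestrally closed set; d has no children in A
print(sorted(district("d", A)))          # ['c', 'd']
print(sorted(markov_blanket("d", A)))    # ['b', 'c']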

Since MVR chain graphs are a subclass of ADMGs, the ordered local Markov property in Definition 3.9 can be used as a local Markov property for MVR chain graphs. Five of the Markov properties introduced in this and the previous subsection are equiv- alent for all probability distributions, as shown in the following theorem.

Theorem 3.10. Let G be an MVR chain graph. For an independence model 𝒥 over the node set of G, the following conditions are equivalent:
(i) 𝒥 satisfies the global Markov property w.r.t. G in Definition 3.2;
(ii) 𝒥 satisfies the global Markov property w.r.t. G in Definition 3.3;
(iii) 𝒥 satisfies the block-recursive Markov property w.r.t. G in Definition 3.8;
(iv) 𝒥 satisfies the MR Markov property w.r.t. G in Definition 3.7;
(v) 𝒥 satisfies the ordered local Markov property w.r.t. G in Definition 3.9.

Proof. See Appendix A for the proof of this theorem. 

3.2.2 An Alternative Global Markov Property for MVR Chain Graphs

In this subsection we formulate an alternative global Markov property for MVR chain graphs. This property is different from the global Markov property resulting from the m*-separation criterion proposed in (Richardson, 2003; Richardson and Spirtes, 2002).

First, we show that this global Markov property implies the block-recursive Markov property of type IV in (Drton, 2009; Marchetti and Lupparelli, 2008) and the MR Markov property in (Marchetti and Lupparelli, 2011). Finally, we show that they are equivalent.

Definition 3.11. (Alternative global Markov property for MVR chain graphs) For any triple (A, B, S) of disjoint subsets of V such that S separates A from B in (G_{Antec(A∪B∪S)})^a, the augmented graph of the smallest antecedental set containing A ∪ B ∪ S, we have A ⊥⊥ B | S.

Proposition 3.12. The alternative global Markov property in Definition 3.11 implies the block-recursive Markov property in Definition 3.8.

Proof. The proof consists of the following three steps:

1. τ ∪ nd_D(τ) is an antecedental set, and pa_D(τ) separates τ from nd_D(τ) \ pa_D(τ) in (G_{τ ∪ nd_D(τ)})^a; this shows that the global Markov property in Definition 3.11 implies (IV0) in Definition 3.8.

2. Assume that σ ⊆ τ, τ ∈ 𝒯. Let A be the smallest antecedental set containing σ and pa_D(τ). We know that for each vertex v ∈ pa_D(τ) \ pa_G(σ), v ∈ antec(σ) and pa_G(σ) ⊆ pa_D(τ). Also, we know that there is no directed edge from pa_D(τ) \ pa_G(σ) to elements of σ. So, every connecting path between pa_D(τ) \ pa_G(σ) and σ in (G_A)^a intersects pa_G(σ), which means that pa_G(σ) separates pa_D(τ) \ pa_G(σ) from σ in (G_A)^a; this shows that the global Markov property in Definition 3.11 implies (IV1) in Definition 3.8.

3. Assume that σ ⊆ τ, τ ∈ 𝒯, and that σ is a connected subset of τ. Obviously, σ and τ \ Nb_G(σ) are two subsets of τ such that there is no connection between their elements. Let A be the smallest antecedental set containing σ and τ \ Nb_G(σ). Clearly, pa_D(τ) ⊆ A. Since σ and τ \ Nb_G(σ) are disconnected in τ, any connecting path between them (if it exists) must pass through pa_D(τ) in (G_A)^a; this shows that the global Markov property in Definition 3.11 implies (IV2) in Definition 3.8.



Proposition 3.13. The alternative global Markov property in Definition 3.11 implies the MR Markov property in Definition 3.7.

Proof. The proof is very similar to that of Proposition 3.12 and is omitted. 

A special case of Definition 3.11 is the case in which G is a DAG. In this case the augmented graph (G_{Antec(A∪B∪S)})^a is the same as the moral graph (G_{An(A∪B∪S)})^m. In fact, A and B are separated by S if and only if A and B are d-separated by S. Thus we have the following corollary:

Corollary 3.14. In a DAG D, A is d-separated from B given S if and only if S separates A and B in (D_{Antec(A∪B∪S)})^a.
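For the DAG case of Corollary 3.14, the separation test can be carried out by moralizing the ancestral subgraph, as in the following Python sketch (networkx-based; the function name and the small example DAG are assumptions made for illustration).

import networkx as nx

def separated_in_moral_ancestral_graph(dag, A, B, S):
    # Corollary 3.14 for a DAG: A is d-separated from B given S iff S separates
    # A and B in the moral graph of the subgraph induced by An(A ∪ B ∪ S).
    relevant = set(A) | set(B) | set(S)
    closure = set(relevant)
    for v in relevant:
        closure |= nx.ancestors(dag, v)                       # ancestral (antecedental) set
    moral = nx.Graph(nx.moral_graph(dag.subgraph(closure)))   # marry parents, drop arrows
    moral.remove_nodes_from(S)
    return not any(nx.has_path(moral, a, b) for a in A for b in B)

# Assumed example: a -> c <- b, c -> d
dag = nx.DiGraph([("a", "c"), ("b", "c"), ("c", "d")])
print(separated_in_moral_ancestral_graph(dag, {"a"}, {"b"}, set()))   # True
print(separated_in_moral_ancestral_graph(dag, {"a"}, {"b"}, {"d"}))   # False (d is a descendant of the collider c)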

The following example shows that the proposed alternative global Markov property in Definition 3.11 is different from the pathwise m-separation criterion and the augmentation separation criterion in (Richardson, 2003; Richardson and Spirtes, 2002), in general.

Example 6. Consider the MVR chain graph G in Figure 3.6.

Figure 3.6: An MVR chain graph with chain components 𝒯 = {{a, b, c}, {d, e, f}}.

According to the pathwise m-separation criterion and the augmentation separation criterion in (Richardson, 2003; Richardson and Spirtes, 2002), a and f are marginally independent. However, according to Definition 3.11, a and f are not marginally independent, but a ⊥⊥ f | d. Also, according to the pathwise m-separation criterion and the augmentation separation criterion in (Richardson, 2003; Richardson and Spirtes, 2002), {a, b} and f are marginally independent. However, according to Definition 3.11, we cannot obtain directly that {a, b} and f are marginally independent.

Theorem 3.15. Let G be an MVR chain graph. An independence model 𝒥 over the node set of G satisfies the alternative global Markov property w.r.t. G in Definition 3.11 if and only if it satisfies the global Markov property w.r.t. G in Definition 3.3.

Proof. Assume that S separates A from B in (G_{Antec(A∪B∪S)})^a, where A, B, S are disjoint subsets of V_G. Since there is no undirected edge in an MVR CG, by the definitions of anterior and antecedent, ant(A ∪ B ∪ S) ⊆ Antec(A ∪ B ∪ S), and hence (G_{ant(A∪B∪S)})^a is a subgraph of (G_{Antec(A∪B∪S)})^a. Therefore, S separates A from B in (G_{ant(A∪B∪S)})^a. In other words, the independence model induced by the global Markov property w.r.t. G in Definition 3.11 is a subset of the independence model induced by the global Markov property w.r.t. G in Definition 3.3. The result follows from Proposition 3.12 and Theorem 3.10. □

3.2.3 An Alternative Local Markov Property for MVR Chain Graphs

In this subsection we formulate an alternative local Markov property for MVR chain graphs. This property is different from, and much more concise than, the ordered Markov property proposed in (Richardson, 2003). The new local Markov property can be used to parameterize distributions efficiently when MVR chain graphs are learned from data, as done, for example, in (Javidian and Valtorta, 2019a, Lemma 9). While the new local Markov property is not equivalent to the five properties in Theorem 3.10 in general, we show that it is equivalent to the global and ordered local Markov properties of MVR chain graphs for compositional graphoids.

If there is a bidirected edge between vertices u and v, u and v are said to be neighbors. The boundary bd(u) of a vertex u is the set of vertices in V \ {u} that are parents or neighbors of u. The descendants of a vertex u are de(u) = {v | u is an ancestor of v}. The non-descendants of u are nd(u) = V \ (de(u) ∪ {u}).

Definition 3.16. The local Markov property for an MVR chain graph G with vertex set V holds if, for every v ∈ V: v ⊥⊥ [nd(v) \ bd(v)] | pa_G(v).
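The sets appearing in Definition 3.16 are straightforward to compute. The Python sketch below (a hypothetical MVR CG with illustrative node names and helper functions, not the thesis code) lists the local independence statements v ⊥⊥ nd(v) \ bd(v) | pa_G(v) for each vertex.

import networkx as nx

# Hypothetical MVR CG: directed and bidirected parts kept separately.
directed = nx.DiGraph([("a", "c"), ("b", "d")])   # a -> c, b -> d
bidirected = nx.Graph([("c", "d")])               # c <-> d
V = set(directed) | set(bidirected)

def pa(v):
    return set(directed.predecessors(v)) if v in directed else set()

def bd(v):
    nbrs = set(bidirected.neighbors(v)) if v in bidirected else set()
    return pa(v) | nbrs

def nd(v):
    de = nx.descendants(directed, v) if v in directed else set()
    return V - de - {v}

for v in sorted(V):
    rest = nd(v) - bd(v)
    if rest:
        print(f"{v} ⊥⊥ {sorted(rest)} | {sorted(pa(v))}")
# prints, among others:  a ⊥⊥ ['b', 'd'] | []   and   c ⊥⊥ ['b'] | ['a']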

Remark 2. In DAGs, bd(v) = pa_G(v), and the local Markov property given above reduces to the directed local Markov property introduced by Lauritzen et al. in (Lauritzen et al., 1990). Also, in covariance graphs³ the local Markov property given above reduces to the dual local Markov property introduced by Kauermann in (Kauermann, 1996, Definition 2.1).

Theorem 3.17. Let G be an MVR chain graph. If an independence model 𝒥 over the node set of G is a compositional semi-graphoid, then 𝒥 satisfies the alternative local Markov property w.r.t. G in Definition 3.16 if and only if it satisfies the global Markov property w.r.t. G in Definition 3.3.

Proof. (Global ⇒ Local): Let X = {v}, Y = nd(v) \ bd(v), and Z = pa_G(v). Then an(X ∪ Y ∪ Z) = {v} ∪ (nd(v) \ bd(v)) ∪ pa_G(v) is an ancestor set, and pa_G(v) separates v from nd(v) \ bd(v) in (G_{{v} ∪ (nd(v)\bd(v)) ∪ pa_G(v)})^a; this shows that the global Markov property in Definition 3.3 implies the local Markov property in Definition 3.16.
(Local ⇒ MR): We prove this by considering the following two cases:
Case 1): Let A ⊆ T be connected. Using the alternative local Markov property for each x ∈ A implies that {x} ⊥⊥ [nd(x) \ bd(x)] | pa_G(x). Since (pre(T) \ pa_G(A)) ⊆ (nd(x) \ bd(x)), using the decomposition and weak union properties gives {x} ⊥⊥ (pre(T) \ pa_G(A)) | pa_G(A), for all x ∈ A. Using the composition property leads to (MR1): A ⊥⊥ (pre(T) \ pa_G(A)) | pa_G(A).

³Equivalently, bidirected graphs, as explained in (Richardson, 2003, Section 4.1).

Case 2): Let A ⊆ T be disconnected with connected components A_1, ..., A_r. For 1 ≤ i ≠ j ≤ r we have {x} ⊥⊥ [nd(x) \ bd(x)] | pa_G(x), for all x ∈ A_i. Since [(pre(T) \ pa_G(A)) ∪ A_j] ⊆ (nd(x) \ bd(x)), using the decomposition and weak union properties gives {x} ⊥⊥ A_j | pre(T), for all x ∈ A_i. Using the composition property leads to (MR2): A_i ⊥⊥ A_j | pre(T), for all 1 ≤ i ≠ j ≤ r.
(MR ⇒ Global): The result follows from Theorem 3.10. □

The necessity of the composition property in Theorem 3.17 follows from the fact that the local and global Markov properties for bidirected graphs, which are a subclass of MVR CGs, are equivalent only for compositional semi-graphoids (Kauermann, 1996; Banerjee and Richardson, 2003).

3.2.4 An Alternative Factorization for MVR Chain Graphs

According to the definition of MVR chain graphs, it is obvious that they are a subclass of acyclic directed mixed graphs (ADMGs). In this section, we derive an explicit factorization criterion for MVR chain graphs based on the proposed factorization criterion for acyclic directed mixed graphs in (Evans and Richardson, 2014). For this purpose, we need to consider the following definition and notations:

Definition 3.18. An ordered pair of sets (H, T) forms the head and tail of a term associated with an ADMG G if and only if all of the following hold:
1. H = barren(H), where barren(H) = {v ∈ H | de(v) ∩ H = {v}};
2. H is contained within a single district of G_{an(H)};
3. T = tail(H) = (dis_{an(H)}(H) \ H) ∪ pa(dis_{an(H)}(H)).

Evans and Richardson (Evans and Richardson, 2014, Theorem 4.12) prove that a probability distribution P obeys the global Markov property for an ADMG G if and only if, for every A ∈ 𝒜(G),

p(X_A) = ∏_{H ∈ [A]_G} p(X_H | tail(H)),     (3.1)

where [A]_G denotes a partition of A into sets {H_1, ..., H_k} ⊆ ℋ(G) (for a graph G, the set of heads is denoted by ℋ(G)), defined with tail(H) as above. The following theorem provides an alternative factorization criterion for MVR chain graphs based on the proposed factorization criterion for acyclic directed mixed graphs in (Evans and Richardson, 2014).

Theorem 3.19. Let G be an MVR CG with chain components (T | T ∈ 𝒯). If a probability distribution P obeys the global Markov property for G, then p(x) = ∏_{T ∈ 𝒯} p(x_T | x_{pa_G(T)}).

Proof. According to Theorem 4.12 in (Evans and Richardson, 2014), since G ∈ 𝒜(G), it is enough to show that ℋ(G) = {T | T ∈ 𝒯} and tail(T) = pa_G(T), where T ∈ 𝒯. In other words, it is enough to show that for every T in 𝒯, (T, pa_G(T)) satisfies the three conditions in Definition 3.18.
1. Let x, y ∈ T and T ∈ 𝒯. Then y is not a descendant of x. Also, we know that x ∈ de(x), by definition. Therefore, T = barren(T).
2. Let T ∈ 𝒯. Then, from the definitions of an MVR chain graph and the induced bidirected graph, it is obvious that T is a single connected component of the forest (G_{an(T)})_↔. So, T is contained within a single district of (G_{an(T)})_↔.
3. T ⊆ an(T) by definition. So, ∀x ∈ T: dis_{an(T)}(x) = {y | y ↔ · · · ↔ x in an(T), or x = y} = T. Therefore, dis_{an(T)}(T) = T and dis_{an(T)}(T) \ T = ∅. In other words, tail(T) = pa_G(T). □

Example 7. Consider the MVR chain graph G in Figure 3.5 (Example 5). Since [G]_G = {{1, 2, 3, 4}, {5, 6}, {7}}, we have tail({1, 2, 3, 4}) = {5}, tail({5, 6}) = {7}, and tail({7}) = ∅. Therefore, based on Theorem 3.19 we have p = p_{1234|5} p_{56|7} p_7. However, the corresponding factorization of G based on the formula in (Drton, 2009; Marchetti and Lupparelli, 2011) is p = p_{1234|56} p_{56|7} p_7.
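The contrast between the two factorizations can be read off the graph. The sketch below uses a hypothetical encoding consistent with Example 7 (the within-component bidirected edges are omitted because they do not affect these sets, and the directed edges shown are one possibility consistent with the example); it computes, for every chain component T, the graphical parents pa_G(T) used by Theorem 3.19 and the flattened parent components pa_D(T) used by the outer factorization of Remark 1 and (Drton, 2009; Marchetti and Lupparelli, 2011).

import networkx as nx

components = [{1, 2, 3, 4}, {5, 6}, {7}]
directed = nx.DiGraph([(5, 1), (5, 2), (7, 5)])   # 5 -> 1, 5 -> 2, 7 -> 5 (illustrative)

def pa_G(T):
    # Graphical parents of the component T (Theorem 3.19).
    return {p for v in T if v in directed for p in directed.predecessors(v)} - set(T)

def pa_D(T):
    # Parent components of T, flattened (Remark 1 / Drton's factorization).
    parents = pa_G(T)
    return set().union(*(C for C in components if C & parents)) if parents else set()

for T in components:
    print(sorted(T), "pa_G:", sorted(pa_G(T)), "pa_D:", sorted(pa_D(T)))
# [1, 2, 3, 4] pa_G: [5] pa_D: [5, 6]
# [5, 6] pa_G: [7] pa_D: [7]
# [7] pa_G: [] pa_D: []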

The advantage of the new factorization is that it requires only graphical parents, rather than parent components, in each factor, resulting in smaller variable sets for each factor and therefore speeding up belief propagation. Moreover, the new factorization is the same as the outer factorization of LWF and AMP CGs, as described in (Lauritzen, 1996; Lauritzen and Richardson, 2002; Cowell et al., 1999; Andersson, Madigan, and Perlman, 1996).


3.2.5 Intervention in MVR Chain Graphs

In the absence of a theory of intervention for chain graphs, a researcher would be unable to answer questions concerning the consequences of intervening in a system with the structure of a chain graph (Richardson, 1998). Fortunately, an intuitive account of the causal interpretation of MVR chain graphs is as follows. We interpret the edge A → B as A being a cause of B. We interpret the edge A ↔ B as A and B having an unobserved common cause λ_AB, i.e., a confounder. Given the above causal interpretation of an MVR CG G, intervening on X ⊆ V so that X is no longer under the influence of its usual causes amounts to replacing the right-hand sides of the equations for the random variables in X with expressions that do not involve their usual causes, and normalizing. Graphically, it amounts to modifying G as follows: delete from G all edges A → B and A ↔ B with B ∈ X (Peña, 2016a).
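A minimal sketch of the graphical surgery just described, on a hypothetical MVR CG (the edge lists and node names are assumptions for illustration): every directed or bidirected edge into the intervened set X is deleted.

import networkx as nx

def intervene(directed, bidirected, X):
    # Post-intervention graph: drop every A -> B and A <-> B with B in X.
    d = directed.copy()
    b = bidirected.copy()
    d.remove_edges_from([(a, v) for v in X if v in d for a in list(d.predecessors(v))])
    b.remove_edges_from([(v, w) for v in X if v in b for w in list(b.neighbors(v))])
    return d, b

# Hypothetical MVR CG: a -> c, b -> c, b -> d, c <-> d
directed = nx.DiGraph([("a", "c"), ("b", "c"), ("b", "d")])
bidirected = nx.Graph([("c", "d")])

d2, b2 = intervene(directed, bidirected, {"c"})
print(sorted(d2.edges()))   # [('b', 'd')]  -- a -> c and b -> c removed
print(sorted(b2.edges()))   # []            -- c <-> d removed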

Conclusion and Summary

Based on the interpretation of the types of edges in a chain graph, there are different conditional independence structures among random variables in the corresponding probabilistic model. Other than the pairwise Markov properties, we showed that for MVR chain graphs all Markov properties in the literature are equivalent for semi-graphoids. We proposed an alternative local Markov property for MVR chain graphs, and we proved that it is equivalent to the other Markov properties for compositional semi-graphoids. Also, we obtained an alternative factorization formula for MVR chain graphs.

3.3 Finding Minimal Separators in MVR Chain Graphs

In this section we propose and solve an optimization problem related to the separation in (maximal) ancestral graphs, which include MVR CGs as a special case (Theorem 3.4). (Tian, Paz, and Pearl, 1998) and (Acid and Campos, 1996) proposed efficient algorithms to find minimal d-separators in a directed acyclic graph (DAG). The motivation for our work is similar to that of (Tian, Paz, and Pearl, 1998): finding d-separators is necessary to apply the back-door criterion (Pearl, 2009), which is a way to estimate causal effects in causal Bayesian networks, and minimality is a desirable property to ensure efficiency and usability. We believe that the same method can be used in estimating causal effects with ancestral graphs, as suggested in (Malinsky and Spirtes, 2016). Of course, finding minimal separators is useful for other learning and inference tasks. For example, we used some of the results in this section for learning the structure of MVR CGs from sampled data (Javidian and Valtorta, 2019a). The basic problem may be formulated as follows: given a pair of non-adjacent nodes, x and y, in a maximal ancestral graph, G, find the set of nodes with minimum size that separates x and y. We analyze several versions of this problem and offer polynomial time algorithms for each. These include the following problems:

Problem 7. (test for minimal separation) Given two non-adjacent nodes X and Y in a maximal ancestral graph G and a set Z that separates X from Y, test whether Z is minimal, i.e., whether no proper subset of Z separates X from Y.

Problem 8. (minimal separation) Given two non-adjacent nodes X and Y in a maximal ancestral graph G, find a minimal separating set between X and Y, namely, find a set Z such that Z, and no proper subset of Z, separates X from Y.

Problem 9. (restricted separation) Given two non-adjacent nodes X and Y in a maximal ancestral graph G and a set S of nodes not containing X and Y, find a subset Z of S that separates X from Y.

Problem 10. (restricted minimal separation) Given two non-adjacent nodes X and Y in a maximal ancestral graph G and a set S of nodes not containing X and Y, find a subset Z of S which is minimal and separates X from Y.

Problem 11. (minimal separation of two disjoint non-adjacent sets) Given two disjoint non-adjacent sets X and Y in a maximal ancestral graph G, find a minimal separating set between X and Y, namely, find a set Z such that Z, and no proper subset of Z, separates X from Y.

Problem 12. (enumeration of all minimal separators) Given two non-adjacent nodes (or disjoint subsets) X and Y in a maximal ancestral graph G, enumerate all minimal separating sets between X and Y.

We prove that it is possible to transform our problem into a separation problem, where the undirected graph in which we have to look for the minimal set separating X from Y depends only on X and Y. We propose and analyze an algorithm for each of the above-mentioned problems that, taking into account the previous results, solves it.

3.3.1 Main Theorem

In this subsection we prove that it is possible to transform our problem into a separation problem, where the undirected graph in which we have to look for the minimal set separating X from Y depends only on X and Y. Later, in the next subsections, we shall apply this result to developing efficient algorithms that solve our problems. The next proposition shows that if we want to test a separation relationship between two disjoint sets of nodes X and Y in a maximal ancestral graph, where the separating set is included in the anterior set of X ∪ Y, then we can test this relationship in a smaller maximal ancestral graph, whose set of nodes is formed only by the anteriors of X and Y. Note that in all of the following propositions and the theorem, similar arguments can be applied for acyclic directed mixed graphs and MVR chain graphs, using the ancestral set instead of the anterior set.


Proposition 3.20. Let G = (V, E) be a maximal ancestral graph, let X, Y, Z be three disjoint subsets of V, let Z ⊆ ant(X ∪ Y), and let H = G_{ant(X∪Y)} be the subgraph of G induced by ant(X ∪ Y). Then ⟨X, Y | Z⟩_G ⇔ ⟨X, Y | Z⟩_H.

Proof. (⇒) The necessary condition is obvious, because a separator in a graph is also a separator in all of its subgraphs.
(⇐) Let ⟨X, Y | Z⟩_H and Z ⊆ ant(X ∪ Y); then ant(X ∪ Y ∪ Z) = ant(X ∪ Y). Suppose that ⟨X, Y | Z⟩_G does not hold. This means that X is not separated from Y given Z in (G_{ant(X∪Y∪Z)})^a ≡ (G_{ant(X∪Y)})^a. In other words, there is a chain C between X and Y in H^a = (G_{ant(X∪Y)})^a that bypasses Z. Once again using Z ⊆ ant(X ∪ Y), we obtain that X and Y are not separated by Z in H, in contradiction to the assumption ⟨X, Y | Z⟩_H. Therefore, it must be that ⟨X, Y | Z⟩_G. □

The following proposition establishes the basic result necessary to solve our optimization problems.

Proposition 3.21. Let G = (V, E) be a maximal ancestral graph, and let X, Y, and Z be three disjoint subsets of V such that ⟨X, Y | Z⟩ holds and ⟨X, Y | Z′⟩ fails for every Z′ ⊊ Z. Then Z ⊆ ant(X ∪ Y).

Proof. Suppose that Z ⊄ ant(X ∪ Y). Define Z′ = Z ∩ ant(X ∪ Y). Then, by assumption, ⟨X, Y | Z′⟩ does not hold. Since Z′ ⊆ ant(X ∪ Y), it is obvious that ant(X ∪ Y ∪ Z′) = ant(X ∪ Y). So, X and Y are not separated by Z′ in (G_{ant(X∪Y)})^a; hence there is a chain C between X and Y in (G_{ant(X∪Y)})^a that bypasses Z′, i.e., the chain C is formed from nodes in ant(X ∪ Y) that are outside of Z. Since ant(X ∪ Y) ⊆ ant(X ∪ Y ∪ Z), (G_{ant(X∪Y)})^a is a subgraph of (G_{ant(X∪Y∪Z)})^a. Then, the previously found chain C is also a chain in (G_{ant(X∪Y∪Z)})^a that bypasses Z, which means that X and Y are not separated by Z in (G_{ant(X∪Y∪Z)})^a, in contradiction to the assumption ⟨X, Y | Z⟩. Therefore, it must be that Z ⊆ ant(X ∪ Y). □

The next proposition shows that, by combining the results in Propositions 3.20 and 3.21, we can reduce our problems to a simpler one, which involves a smaller graph.

Proposition 3.22. Let G = (V, E) be a maximal ancestral graph, and let X, Y ⊆ V be two disjoint subsets. Then the problem of finding a minimal separating set for X and Y in G is equivalent to the problem of finding a minimal separating set for X and Y in the induced subgraph G_{ant(X∪Y)}.

Proof. The proof is very similar to the proof of Proposition 3 in (Acid and Campos, 1996). Let H = G_{ant(X∪Y)}, and define the sets S_G = {Z ⊆ V | ⟨X, Y | Z⟩_G} and S_H = {Z ⊆ ant(X ∪ Y) | ⟨X, Y | Z⟩_H}. Then we have to prove that |T| = min_{Z ∈ S_G} |Z| ⇔ |T| = min_{Z ∈ S_H} |Z|, and therefore, by Proposition 3.21, the sets of minimal separators are the same. From Proposition 3.20, we deduce that S_H ⊆ S_G, and therefore min_{Z ∈ S_H} |Z| ≥ min_{Z ∈ S_G} |Z|.

(⇒) If |T| = min_{Z ∈ S_G} |Z|, then for all T′ ⊊ T we have T′ ∉ S_G, and from Proposition 3.21 we obtain T ⊆ ant(X ∪ Y); now, using Proposition 3.20, we get T ∈ S_H. So, we have |T| ≥ min_{Z ∈ S_H} |Z| ≥ min_{Z ∈ S_G} |Z| = |T|, hence |T| = min_{Z ∈ S_H} |Z|.

(⇐) If |T| = min_{Z ∈ S_H} |Z| > min_{Z ∈ S_G} |Z| = |Z_0|, then for all Z′_0 ⊊ Z_0 we have Z′_0 ∉ S_G, and therefore, once again using Propositions 3.21 and 3.20, we get Z_0 ∈ S_H, so that |Z_0| ≥ min_{Z ∈ S_H} |Z| = |T|, which is a contradiction. Thus, |T| = min_{Z ∈ S_G} |Z|. □

Theorem 3.23. The problem of finding a minimal separating set for X and Y in a maximal ancestral graph G is equivalent to the problem of finding a minimal separating set for X and Y in the undirected graph (G_{ant(X∪Y)})^a.

Proof. The proof is very similar to the proof of Theorem 1 in (Acid and Campos, 1996). Using the same notation as in Proposition 3.22, let H^a be the augmented graph of H = G_{ant(X∪Y)}, and let S_{H^a} = {Z ⊆ ant(X ∪ Y) | ⟨X, Y | Z⟩_{H^a}}. Let Z be any subset of ant(X ∪ Y). Then, taking into account the characteristics of ancestral sets, it is clear that H_{ant(X∪Y∪Z)} = H.

Figure 3.7: Finding minimal separators in a maximal ancestral graph.

Then, we have Z ∈ S_H ⇔ ⟨X, Y | Z⟩_H ⇔ ⟨X, Y | Z⟩_{(H_{ant(X∪Y∪Z)})^a} ⇔ ⟨X, Y | Z⟩_{H^a} ⇔ Z ∈ S_{H^a}. Hence, S_H = S_{H^a}. Now, using Proposition 3.22, we obtain |T| = min_{Z ∈ S_G} |Z| ⇔ |T| = min_{Z ∈ S_{H^a}} |Z|. □

3.3.2 Algorithms for Finding Minimal Separators

In undirected graphs we have efficient methods for testing whether a separating set is minimal, based on the following criterion (see Theorem 2.6): given two nodes X and Y in an undirected graph, a separating set Z between X and Y is minimal if and only if, for each node u in Z, there is a path from X to Y that passes through u and does not pass through any other node in Z. This criterion leads to the following algorithm for Problem 7. The idea is that if Z is minimal, then all nodes in Z can be reached by a Breadth First Search (BFS) that starts from both X and Y without passing through any other node in Z. Note that for all of the following algorithms, a similar algorithm can be applied for acyclic directed mixed graphs and MVR chain graphs using the ancestral set instead of the anterior set.

Analysis (Tian, Paz, and Pearl, 1998): Let |E^a_ant| stand for the number of edges in (G_{ant(X∪Y)})^a. Steps 3–5 each require O(|E^a_ant|) time. Thus, the complexity of Algorithm 6 is O(|E^a_ant|).

Algorithm 6: Test for minimal separation (Problem 7)
Input: A set Z that separates two non-adjacent nodes X, Y in the maximal ancestral graph G.
Output: If Z is minimal, then the algorithm returns TRUE; otherwise, it returns FALSE.
1  if Z contains a node that is not in ant(X ∪ Y) then
2    return FALSE;
3  else
4    Construct G_{ant(X∪Y)};
5    Construct (G_{ant(X∪Y)})^a;
6    Starting from X, run BFS. Whenever a node in Z is met, mark it if it is not already marked, and do not continue along that path. When BFS stops:
7    if not all nodes in Z are marked then
8      return FALSE;
9    else
10     Remove all markings. Starting from Y, run BFS. Whenever a node in Z is met, mark it if it is not already marked, and do not continue along that path. When BFS stops:
11     if not all nodes in Z are marked then
12       return FALSE;
13     else
14       return TRUE;
15     end
16   end
17 end
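Once the augmented graph (G_{ant(X∪Y)})^a has been constructed, the marking procedure of Algorithm 6 is short to write out. The following Python sketch (networkx-based; the graph H, the function names, and the example are assumptions for illustration, not the thesis implementation) performs steps 6–14: Z is minimal exactly when every node of Z is marked by both searches.

import networkx as nx
from collections import deque

def marked_nodes(H, start, Z):
    # BFS from `start` in the undirected graph H; mark nodes of Z when met
    # and do not continue past them (steps 6 and 10 of Algorithm 6).
    seen, marked, queue = {start}, set(), deque([start])
    while queue:
        u = queue.popleft()
        for w in H.neighbors(u):
            if w in seen:
                continue
            seen.add(w)
            if w in Z:
                marked.add(w)
            else:
                queue.append(w)
    return marked

def is_minimal_separator(H, x, y, Z):
    # H is assumed to be (G_ant(x, y))^a, and Z a separator of x and y in H.
    Z = set(Z)
    return marked_nodes(H, x, Z) == Z and marked_nodes(H, y, Z) == Z

# Small assumed example: two x-y paths through a and b, plus a pendant node c.
H = nx.Graph([("x", "a"), ("a", "y"), ("x", "b"), ("b", "y"), ("x", "c")])
print(is_minimal_separator(H, "x", "y", {"a", "b"}))        # True
print(is_minimal_separator(H, "x", "y", {"a", "b", "c"}))   # False: c is never reached from y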

A variant of Algorithm 6 solves Problem 8. Analysis: Steps 2–5 each require O(|E^a_ant|) time. Thus, the overall complexity of Algorithm 7 is O(|E^a_ant|).

Suppose that the distribution P of (X_n)_{n ∈ V} is a compositional graphoid. Then, for every valid ordering of the nodes of an MVR chain graph G such that i < j and i and j are non-adjacent vertices of V, we have ⟨i, j | pa(i)⟩. Notice that in ⟨i, j | pa(i)⟩, pa(i) may be replaced by pa(j) whenever the two nodes are in the same connected component (Sadeghi and Wermuth, 2016). Therefore, one can set Z_0 in step 3 of Algorithm 7 to be pa(X) (or pa(Y), whenever the two nodes are in the same connected component).

Theorem 3.24. Given two nodes X and Y in a maximal ancestral graph G and a set S of nodes not containing X and Y, there exists some subset of S which separates X and Y if and only if the set S′ = S ∩ ant(X ∪ Y) separates X and Y.

Algorithm 7: Minimal separation (Problem 8)
Input: Two non-adjacent nodes X, Y in the maximal ancestral graph G.
Output: A set Z that is a minimal separator for X and Y.
1  Construct G_{ant(X∪Y)};
2  Construct (G_{ant(X∪Y)})^a;
3  Set Z_0 to be ne(X) (or ne(Y)) in (G_{ant(X∪Y)})^a;
   /* Z_0 is a separator because, according to the local Markov property of an undirected graph, a vertex is conditionally independent of all other vertices in the graph given its neighbors (Lauritzen, 1996). */
4  Starting from X, run BFS. Whenever a node in Z_0 is met, mark it if it is not already marked, and do not continue along that path. When BFS stops, let Z″ be the set of nodes which are marked. Remove all markings;
5  Starting from Y, run BFS. Whenever a node in Z″ is met, mark it if it is not already marked, and do not continue along that path. When BFS stops, let Z be the set of nodes which are marked;
6  return Z;
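A Python sketch of Algorithm 7, operating directly on the augmented graph H = (G_{ant(X∪Y)})^a (the construction of H, the helper names, and the example are assumptions for illustration). As noted above, Z_0 may equally be taken to be pa(X) when the distribution is a compositional graphoid.

import networkx as nx
from collections import deque

def reached_blockers(H, start, blockers):
    # BFS that marks nodes of `blockers` when met and does not continue past them.
    seen, hit, queue = {start}, set(), deque([start])
    while queue:
        u = queue.popleft()
        for w in H.neighbors(u):
            if w in seen:
                continue
            seen.add(w)
            if w in blockers:
                hit.add(w)
            else:
                queue.append(w)
    return hit

def minimal_separator(H, x, y):
    # Steps 3-5 of Algorithm 7 on H = (G_ant(x, y))^a, for non-adjacent x and y.
    Z0 = set(H.neighbors(x))           # a separator, by the local Markov property
    Z2 = reached_blockers(H, x, Z0)    # Z'' in the pseudocode
    return reached_blockers(H, y, Z2)  # the minimal separator Z

H = nx.Graph([("x", "a"), ("a", "y"), ("x", "b"), ("b", "c"), ("c", "y"), ("x", "d")])
print(sorted(minimal_separator(H, "x", "y")))   # ['a', 'b'] -- d is pruned away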


Proof. (⇒) Proof by contradiction. Let S′ = S ∩ ant(X ∪ Y) and suppose ⟨X, Y | S′⟩ does not hold. Since S′ ⊆ ant(X ∪ Y), it is obvious that ant(X ∪ Y ∪ S′) = ant(X ∪ Y). So, X and Y are not separated by S′ in (G_{ant(X∪Y)})^a; hence there is a chain C between X and Y in (G_{ant(X∪Y)})^a that bypasses S′, i.e., the chain C is formed from nodes in ant(X ∪ Y) that are outside of S. Since ant(X ∪ Y) ⊆ ant(X ∪ Y ∪ S″) for all S″ ⊆ S, (G_{ant(X∪Y)})^a is a subgraph of (G_{ant(X∪Y∪S″)})^a. Then, the previously found chain C is also a chain in (G_{ant(X∪Y∪S″)})^a that bypasses S″, which means that X and Y are not separated by any S″ ⊆ S in (G_{ant(X∪Y∪S″)})^a, which is a contradiction.
(⇐) It is obvious. □

Therefore, Problem 9 is solved by testing whether S′ = S ∩ ant(X ∪ Y) separates X and Y. Analysis: This requires O(|E^a_ant|) time. According to Theorem 3.24, Problem 10 is solved by using Algorithm 8 and then, if FALSE is not returned, Algorithm 7 with Z_0 = S ∩ ant(X ∪ Y). The time complexity of this procedure is also O(|E^a_ant|).

Figure 3.8: Finding a restricted separator in a maximal ancestral graph.

Algorithm 8: Restricted separation (Problem 9)
Input: A set S of nodes not containing X and Y in the maximal ancestral graph G.
Output: If there is a subset of S that separates X from Y, then the algorithm returns a set Z ⊆ S that separates X from Y; otherwise, it returns FALSE.
1  Construct G_{ant(X∪Y)};
2  Construct (G_{ant(X∪Y)})^a;
3  Set S′ = S ∩ ant(X ∪ Y);
4  Remove S′ from (G_{ant(X∪Y)})^a;
5  Starting from X, run BFS;
6  if Y is met then
7    return FALSE
8  else
9    return Z = S′
10 end
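A Python sketch of Algorithm 8 on the augmented graph H = (G_{ant(X∪Y)})^a (the argument ant_xy, the helper name, and the example are assumptions for illustration).

import networkx as nx

def restricted_separator(H, ant_xy, x, y, S):
    # Steps 3-6 of Algorithm 8: keep only S' = S ∩ ant(X ∪ Y), remove it from H,
    # and check by search whether y is still reachable from x.
    S1 = set(S) & set(ant_xy)
    H1 = H.copy()
    H1.remove_nodes_from(S1)
    if nx.has_path(H1, x, y):
        return None        # stands for FALSE: no subset of S separates x and y
    return S1

H = nx.Graph([("x", "a"), ("a", "y"), ("x", "b"), ("b", "y")])
print(restricted_separator(H, H.nodes, "x", "y", {"a", "b"}))  # {'a', 'b'}
print(restricted_separator(H, H.nodes, "x", "y", {"a"}))       # None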

In order to solve Problem 11, i.e., to find the minimal set separating two disjoint non-adjacent subsets of nodes X and Y (instead of two single nodes) in a maximal ancestral graph G, first we build the undirected graph (G_{ant(X∪Y)})^a. Next, starting from this graph, we construct a new undirected graph Aug[G : α_X, α_Y] by adding two artificial (dummy) nodes α_X, α_Y and connecting them to the nodes that are adjacent to some node in X and Y, respectively. Then, the separation of X and Y in (G_{ant(X∪Y)})^a is equivalent to the separation of α_X and α_Y in Aug[G : α_X, α_Y]. Moreover, the minimal separating set for α_X and α_Y in Aug[G : α_X, α_Y] cannot contain nodes from X ∪ Y. Therefore, in order to find a minimal separating set for X and Y in G, it suffices to find a minimal separating set for α_X and α_Y in Aug[G : α_X, α_Y]. So, we have reduced this problem to one of separation for single nodes, which can be solved using Algorithm 7.

Shen and Liang (Shen and Liang, 1997) present an efficient algorithm for enumerating all minimal (X, Y)-separators separating given non-adjacent vertices X and Y in an undirected connected simple graph G = (V, E). This algorithm requires O(n³ R_XY) time, where |V| = n and R_XY is the number of minimal (X, Y)-separators. The algorithm can be generalized to enumerate all minimal (X, Y)-separators that separate non-adjacent vertex sets X, Y ⊆ V, and it requires O(n²(n − n_X − n_Y) R_XY) time. In this case, |X| = n_X, |Y| = n_Y, and R_XY is the number of all minimal (X, Y)-separators. According to Theorem 3.23, using this algorithm on (G_{ant(X∪Y)})^a solves Problem 12.

Remark 3. Since DAGs (directed acyclic graphs), MVR chain graphs, and acyclic directed mixed graphs are subclasses of ancestral graphs, one can use the same technique to enumerate all minimal separators in DAGs, MVR chain graphs, and acyclic directed mixed graphs.

Conclusion and Summary

We have studied and solved the problem of finding minimal separating sets for pairs of variables in maximal ancestral graphs. We have also studied some extensions of the basic problem, including finding a minimal separator from a restricted set of nodes, finding a minimal separator for two given disjoint sets, testing whether a given separator is minimal, and listing all minimal separators, given two non-adjacent nodes (or disjoint subsets) X and Y in a maximal ancestral graph G. Applications of this research include learning ancestral graphs from data (e.g., some of the results in this section have been used for structure learning of MVR chain graphs (Javidian and Valtorta, 2019a)) and problems related to the selection of variables to be instantiated when using ancestral graphs for inference tasks, such as testing causal assumptions.

3.4 Order-Independent Structure Learning of Multivariate Regression Chain Graphs

This section deals with multivariate regression chain graphs (MVR CGs), which were introduced by Cox and Wermuth in the nineties to represent linear causal models with correlated errors. Every class of Markov equivalent MVR chain graphs (that is, those MVR chain graphs that induce the same conditional independence restrictions) has a natural representative, which is called the essential chain graph.

Definition 3.25. A graph G* is said to be the essential MVR CG of an MVR CG G if it has the same skeleton as G and contains all and only the arrowheads common to every MVR CG in the Markov equivalence class of G.

An essential MVR CG does not need to be an MVR CG. Instead, essential graphs can contain three types of edges: undirected, directed and bidirected. It can, however, be shown that no unique representation that is an MVR CG can exist for a Markov equivalence class of MVR CGs unless we assume some ordering of the nodes (Sonntag and Peña, 2015a). There is an algorithm for transforming an MVR chain graph into its essential graph (Sonntag, Peña, and Gómez-Olmedo, 2015). Two constraint-based learning algorithms, which use a statistical analysis to test the presence of a conditional independency, exist for learning MVR CGs: (1) the PC-like algorithm (Sonntag and Peña, 2012), and (2) the answer set programming (ASP) algorithm (Peña, 2016b). The PC-like algorithm extends the original learning algorithm for Bayesian networks by Peter Spirtes and Clark Glymour (Spirtes, Glymour, and Scheines, 2000).

It learns the structure of the underlying MVR chain graph in four steps: (a) determining the skeleton: the resulting undirected graph in this phase contains an undirected edge u − v iff there is no set S ⊆ V \ {u, v} such that u ⊥⊥ v | S; (b) determining the v-structures (unshielded colliders); (c) orienting some of the undirected/directed edges into directed/bidirected edges according to a set of rules applied iteratively; (d) transforming the graph resulting from the previous step into an MVR CG.

The essential recovery algorithm obtained after step (c) contains all directed and bidirected edges that are present in every MVR CG of the same Markov equivalence class. In this section, we show that the PC-like algorithm is order-dependent, in the sense that the output can depend on the order in which the variables are given. We propose several modifications of the PC-like algorithm that remove part or all of this order-dependence, but do not change the result when perfect conditional independence information is used. When applied to data, the modified algorithms are partly or fully order-independent. Implementations in R and details of experimental results can be found at https://github.com/majavid/SUM2019.

3.4.1 Order-Dependent PC-like Algorithm

In this subsection, we show that the PC-like algorithm proposed by Sonntag and Peña in (Sonntag and Peña, 2012) is order-dependent, in the sense that the output can depend on the order in which the variables are given. The PC-like algorithm for learning MVR CGs under the faithfulness assumption is formally described in Algorithm 9.

Figure 3.9: The Rules (Sonntag and Peña, 2012)

In applications, we do not have perfect conditional independence information. Instead, we assume that we have an i.i.d. sample of size n of variables V = (X1, ..., Xp). In the PC-like algorithm (Sonntag and Peña, 2012), all conditional independence queries are estimated by statistical conditional independence tests at some pre-specified significance level (p-value) α. For example, if the distribution of V is multivariate Gaussian, one can test for zero partial correlation; see, e.g., (Kalisch and Bühlmann, 2007). For this purpose, we use the gaussCItest() function from the R package pcalg throughout this section.
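Since the examples below hinge on such statistical decisions at level α, it may help to see what this kind of test computes. The following Python sketch is a minimal stand-in for pcalg's gaussCItest(), namely a Fisher-z test of zero partial correlation; the function name, defaults, and data layout are assumptions for illustration and this is not the pcalg implementation used in the experiments.

import numpy as np
from scipy.stats import norm

def gauss_ci_test(data, i, j, S, alpha=0.01):
    # Test X_i _||_ X_j | X_S for (assumed) Gaussian data; `data` is an (n, p) array.
    # Returns True if the independence hypothesis is NOT rejected at level alpha.
    n = data.shape[0]
    idx = [i, j] + sorted(S)
    prec = np.linalg.pinv(np.corrcoef(data[:, idx], rowvar=False))
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])      # partial correlation
    r = np.clip(r, -1 + 1e-12, 1 - 1e-12)
    z = 0.5 * np.log((1 + r) / (1 - r))                     # Fisher z-transform
    p_value = 2 * norm.sf(np.sqrt(n - len(S) - 3) * abs(z))
    return p_value > alpha

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = x + rng.normal(size=2000)
w = y + rng.normal(size=2000)
sample = np.column_stack([x, y, w])
print(gauss_ci_test(sample, 0, 2, {1}))    # expect True:  x _||_ w | y
print(gauss_ci_test(sample, 0, 2, set()))  # expect False: x and w are marginally dependent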

Algorithm 9: The order-dependent PC-like algorithm for learning MVR chain graphs (Sonntag and Peña, 2012)
Input: A set V of nodes and a probability distribution p faithful to an unknown MVR CG G, and an ordering order(V) on the variables.
Output: An MVR CG G′ such that G and G′ are Markov equivalent and G′ has exactly the minimum set of bidirected edges for its equivalence class.
1  Let H denote the complete undirected graph over V;
   /* Skeleton Recovery */
2  for i ← 0 to |V_H| − 2 do
3    while possible do
4      Select any ordered pair of nodes u and v in H such that u ∈ ad_H(v) and |ad_H(u) \ v| ≥ i, using order(V);
       /* ad_H(x) := {y ∈ V | x → y, y → x, or x ↔ y} */
5      if there exists S ⊆ (ad_H(u) \ v) such that |S| = i and u ⊥⊥_p v | S (i.e., u is independent of v given S in the probability distribution p) then
6        Set S_uv = S_vu = S;
7        Remove the edge u − v from H;
8      end
9    end
10 end
   /* v-structure Recovery */
11 for each m-separator S_uv do
12   if u − w − v appears in the skeleton and w is not in S_uv then
       /* Here u → w means u → w or u ↔ w; likewise, w ← v means w ← v or w ↔ v. */
13     Determine a v-structure u → w ← v;
14   end
15 end
   /* Essential Graph Recovery */
16 Apply rules 1–3 in Figure 3.9 while possible;
   /* After this line, the learned graph is the essential graph of MVR CG G. */
   /* MVR Chain Graph Recovery */
17 Let G′_u be the subgraph of G′ containing only the nodes and the undirected edges in G′;
18 Let T be the junction tree of G′_u;
   /* If G′_u is disconnected, the cliques belonging to different connected components can be linked with empty separators, as described in (Golumbic, 1980, Theorem 4.8). */
19 Order the cliques C_1, ..., C_n of G′_u such that C_1 is the root of T and, if C_i is closer to the root than C_j in T, then C_i < C_j;
20 Order the nodes such that if A ∈ C_i, B ∈ C_j, and C_i < C_j, then A < B;
21 Orient the undirected edges in G′ according to the ordering obtained in line 20.

Let order(V) denote an ordering on the variables in V. We now consider the role of order(V) in every step of the algorithm. In the skeleton recovery phase of the PC-like algorithm (Sonntag and Peña, 2012), the order of variables affects the estimation of the skeleton and the separating sets. In particular, as noted for the special case of Bayesian networks in (Colombo and Maathuis, 2014), for each level of i, the order of variables determines the order in which pairs of adjacent vertices and subsets S of their adjacency sets are considered (see lines 4 and 5 in Algorithm 9). The skeleton H is updated after each edge removal. Hence, the adjacency sets typically change within one level of i, and this affects which other conditional independencies are checked, since the algorithm only conditions on subsets of the adjacency sets. When we have perfect conditional independence information, all orderings on the variables lead to the same output. In the sample version, however, we typically make mistakes in keeping or removing edges, because conditional independence relationships have to be estimated from data. In such cases, the resulting changes in the adjacency sets can lead to different skeletons, as illustrated in Example 8.
Moreover, different variable orderings can lead to different separating sets in the skeleton recovery phase. When we have perfect conditional independence information, this is not important, because any valid separating set leads to the correct v-structure decision in the orientation phase. In the sample version, however, different separating sets in the skeleton recovery phase of the algorithm may yield different decisions about v-structures in the orientation phase. This is illustrated in Example 9.
Finally, we consider the role of order(V) on the orientation rules in the essential graph recovery phase of the sample version of the PC-like algorithm. Example 10 illustrates that different variable orderings can lead to different orientations, even if the skeleton and separating sets are order-independent.

Example 8 (Order-dependent skeleton of the PC-like algorithm). Suppose that the distribution of V = {a, b, c, d, e} is faithful to the DAG in Figure 3.10(a). This DAG encodes the following conditional independencies (using the notation defined in line 5 of Algorithm 9) with minimal separating sets: a ⊥⊥ d | {b, c} and a ⊥⊥ e | {b, c}.
Suppose that we have an i.i.d. sample of (a, b, c, d, e), and that the following conditional independencies with minimal separating sets are judged to hold at some significance level α: a ⊥⊥ d | {b, c}, a ⊥⊥ e | {b, c, d}, and c ⊥⊥ e | {a, b, d}. Thus, the first two are correct, while the third is false. We now apply the skeleton recovery phase of the PC-like algorithm with two different orderings: order1(V) = (d, e, a, c, b) and order2(V) = (d, c, e, a, b). The resulting skeletons are shown in Figures 3.10(b) and 3.10(c), respectively.


Figure 3.10: (a) The DAG G, (b) the skeleton returned by Algorithm 9 with order1(V), (c) the skeleton returned by Algorithm 9 with order2(V).

We see that the skeletons are different, and that both are incorrect as the edge c − e is missing. The skeleton for order2(V) contains an additional error, as there is an additional edge a − e. We now go through Algorithm 9 to see what happened. We start with a complete undirected graph on V. When i = 0, variables are tested for marginal independence, and the algorithm correctly does not remove any edge. Also, when i = 1, the algorithm correctly does not remove any edge. When i = 2, there is a pair of vertices that is thought to be conditionally independent given a subset of size two, and the algorithm correctly removes the edge between a and d. When i = 3, there are two pairs of vertices that are thought to be conditionally independent given a subset of size three. Table 3.1 shows the trace table of Algorithm 9 for i = 3 and order1(V) = (d, e, a, c, b).

Table 3.1: The trace table of Algorithm 9 for i = 3 and order1(V) = (d, e, a, c, b).

Ordered pair (u, v) | ad_H(u)      | S_uv      | Is S_uv ⊆ ad_H(u) \ {v}? | Is u − v removed?
(e, a)              | {a, b, c, d} | {b, c, d} | Yes                      | Yes
(e, c)              | {b, c, d}    | {a, b, d} | No                       | No
(c, e)              | {a, b, d, e} | {a, b, d} | Yes                      | Yes

Table 3.2 shows the trace table of Algorithm 9 for i = 3 and order2(V) = (d, c, e, a, b).

Table 3.2: The trace table of Algorithm 9 for i = 3 and order2(V) = (d, c, e, a, b).

Ordered pair (u, v) | ad_H(u)      | S_uv      | Is S_uv ⊆ ad_H(u) \ {v}? | Is u − v removed?
(c, e)              | {a, b, d, e} | {a, b, d} | Yes                      | Yes
(e, a)              | {a, b, d}    | {b, c, d} | No                       | No
(a, e)              | {b, c, e}    | {b, c, d} | No                       | No

Example 9 (Order-dependent separating sets and v-structures of the PC-like algorithm). Suppose that the distribution of V = {a, b, c, d, e} is faithful to the DAG in Figure 3.11(a). This DAG encodes the following conditional independencies with minimal separating sets: a ⊥⊥ d | b, a ⊥⊥ e | {b, c}, a ⊥⊥ e | {c, d}, b ⊥⊥ c, b ⊥⊥ e | d, and c ⊥⊥ d.
Suppose that we have an i.i.d. sample of (a, b, c, d, e). Assume that all true conditional independencies are judged to hold except c ⊥⊥ d. Suppose that c ⊥⊥ d | b and c ⊥⊥ d | e are thought to hold. Thus, the first is correct, while the second is false. We now apply the v-structure recovery phase of the PC-like algorithm with two different orderings: order1(V) = (d, c, b, a, e) and order3(V) = (c, d, e, a, b). The resulting CGs are shown in Figures 3.11(b) and 3.11(c), respectively. Note that while the separating set for vertices c and d with order1(V) is S_dc = S_cd = {b}, the separating set for them with order3(V) is S_cd = S_dc = {e}.


Figure 3.11: (a) The DAG G, (b) the CG returned after the v-structure recovery phase of Algorithm 9 with order1(V), (c) the CG returned after the v-structure recovery phase of Algorithm 9 with order3(V).

This illustrates that order-dependent separating sets in the skeleton recovery phase of the sample version of the PC-algorithm can lead to order-dependent v-structures.

Example 10 (Order-dependent orientation rules of the PC-like algorithm). Consider the graph in Figure 3.12, and assume that this is the output of the sample version of the PC-like algorithm after v-structure recovery. Also, assume that c ∈ S_ad and d ∈ S_bf. Thus, we have two v-structures, namely a → c ← e and b → d ← f, and four unshielded triples, namely (e, c, d), (c, d, f), (a, c, d), and (b, d, c). We then apply the orientation rules in the essential recovery phase of the algorithm, starting with rule R1. If one of the two unshielded triples (e, c, d) or (a, c, d) is considered first, we obtain c → d. On the other hand, if one of the unshielded triples (b, d, c) or (c, d, f) is considered first, then we obtain c ← d. Note that we have no issues with overwriting of edges here, since as soon as the edge between c and d is oriented, all edges are oriented and no further orientation rules are applied. These examples illustrate that the essential graph recovery phase of the PC-like algorithm can be order-dependent regardless of the output of the previous steps.


Figure 3.12: Possible mixed graph after v-structure recovery phase of the sample version of the PC-like algorithm.

3.4.2 Order-Independent Algorithms for Learning MVR CGs

We now propose several modifications of the original PC-like algorithm (and hence also of the related algorithms) that remove the order-dependence in the various stages of the algorithm, analogously to what Colombo and Maathuis (Colombo and Maathuis, 2014) did for the original PC algorithm in the case of DAGs. For this purpose, we discuss the skeleton, v-structures, and the orientation rules, respectively.

Order-Independent Skeleton Recovery

We first consider estimation of the skeleton in the adjacency search of the PC-like algorithm. The pseudocode for our modification is given in Algorithm 10. The resulting PC-like algorithm in Algorithm 10 is called stable PC-like.
The main difference between Algorithms 9 and 10 is given by the for-loop on lines 3-5 in the latter, which computes and stores the adjacency sets a_H(v_i) of all variables after each new size i of the conditioning sets. These stored adjacency sets a_H(v_i) are used whenever we search for conditioning sets of this given size i. Consequently, an edge deletion on line 10 no longer affects which conditional independencies are checked for other pairs of variables at this level of i. In other words, at each level of i, Algorithm 10 records which edges should be removed, but for the purpose of the adjacency sets it removes these edges only when it goes to the next value of i. Besides resolving the order-dependence in the estimation of the skeleton, our algorithm has the advantage that it is easily parallelizable at each level of i. The stable PC-like algorithm is correct, i.e., it returns an MVR CG to which the given probability distribution is faithful (Theorem 3.26), and it yields order-independent skeletons in the sample version (Theorem 3.27). We illustrate the algorithm in Example 11.

Theorem 3.26. Let the distribution of V be faithful to an MVR CG G, and assume that we are given perfect conditional independence information about all pairs of variables (u, v) in V given subsets S ⊆ V \ {u, v}. Then the output of the stable PC-like algorithm is an MVR CG that has exactly the minimum set of bidirected edges for its equivalence class.

Algorithm 10: The order-independent (stable) PC-like algorithm for learning MVR chain graphs
Input: A set V of nodes and a probability distribution p faithful to an unknown MVR CG G, and an ordering order(V) on the variables.
Output: An MVR CG G′ such that G and G′ are Markov equivalent and G′ has exactly the minimum set of bidirected edges for its equivalence class.
1  Let H denote the complete undirected graph over V = {v_1, ..., v_n};
   /* Skeleton Recovery */
2  for i ← 0 to |V_H| − 2 do
3    for j ← 1 to |V_H| do
4      Set a_H(v_j) = ad_H(v_j);
5    end
6    while possible do
7      Select any ordered pair of nodes u and v in H such that u ∈ a_H(v) and |a_H(u) \ v| ≥ i, using order(V);
8      if there exists S ⊆ (a_H(u) \ v) such that |S| = i and u ⊥⊥_p v | S (i.e., u is independent of v given S in the probability distribution p) then
9        Set S_uv = S_vu = S;
10       Remove the edge u − v from H;
11     end
12   end
13 end
   /* v-structure Recovery and orientation rules */
14 Follow the same procedures as in Algorithm 9 (lines 11-21).

Proof. The proof of Theorem 3.26 is completely analogous to the proofs of Theorems 3 and 4 for the original PC-like algorithm in (Sonntag and Peña, 2012).

Theorem 3.27. The skeleton resulting from the sample version of the stable PC-like algorithm is order-independent.

Proof. We consider the removal or retention of an arbitrary edge u − v at some level i. The ordering of the variables determines the order in which the edges (line 7 of Algorithm 10) and the subsets S of a_H(u) and a_H(v) (line 8 of Algorithm 10) are considered. By construction, however, the order in which edges are considered does not affect the sets a_H(u) and a_H(v).
If there is at least one subset S of a_H(u) or a_H(v) such that u ⊥⊥_p v | S, then any ordering of the variables will find a separating set for u and v. (Different orderings may lead to different separating sets, as illustrated in Example 9, but all edges that have a separating set will eventually be removed, regardless of the ordering.) Conversely, if there is no subset S′ of a_H(u) or a_H(v) such that u ⊥⊥_p v | S′, then no ordering will find a separating set. Hence, any ordering of the variables leads to the same edge deletions, and therefore to the same skeleton.

Example 11 (Order-independent skeletons). We go back to Example 8, and consider the sample version of Algorithm 10. The algorithm now outputs the skeleton shown in Figure 3.10(b) for both orderings order1(V) and order2(V).
We again go through the algorithm step by step. We start with a complete undirected graph on V. No conditional independence is found when i = 0. Also, when i = 1, the algorithm correctly does not remove any edge. When i = 2, the algorithm first computes the new adjacency sets: a_H(v) = V \ {v} for all v ∈ V. There is a pair of variables that is thought to be conditionally independent given a subset of size two, namely (a, d). Since the sets a_H(v) are not updated after edge removals, it does not matter in which order we consider the ordered pair. Any ordering leads to the removal of the edge between a and d. When i = 3, the algorithm first computes the new adjacency sets: a_H(a) = a_H(d) = {b, c, e} and a_H(v) = V \ {v} for v = b, c, e. There are two pairs of variables that are thought to be conditionally independent given a subset of size three, namely (a, e) and (c, e). Since the sets a_H(v) are not updated after edge removals, it does not matter in which order we consider the ordered pairs. Any ordering leads to the removal of both edges a − e and c − e.

Order-Independent v-structure Recovery

We propose two methods to resolve the order-dependence in the determination of the v- structures, using the conservative PC algorithm (CPC) of Ramsey et al. (Ramsey, Spirtes, and Zhang, 2006) and the majority rule PC-like algorithm (MPC) of Colombo & Maathuis (Colombo and Maathuis, 2014).

The Conservative PC-like algorithm (CPC-like algorithm) works as follows. Let H be the undirected graph resulting from the skeleton recovery phase of the PC-like algorithm (Algorithm 9). For all unshielded triples (X_i, X_j, X_k) in H, determine all subsets S of ad_H(X_i) and of ad_H(X_k) that make X_i and X_k conditionally independent, i.e., that satisfy X_i ⊥⊥_p X_k | S. We refer to such sets as separating sets. The triple (X_i, X_j, X_k) is labelled as unambiguous if at least one such separating set is found and either X_j is in all separating sets or in none of them; otherwise it is labelled as ambiguous. If the triple is unambiguous, it is oriented as a v-structure if and only if X_j is in none of the separating sets. Moreover, in the v-structure recovery phase of the PC-like algorithm (Algorithm 9, lines 11-15), the orientation rules are adapted so that only unambiguous triples are oriented. The output of the CPC-like algorithm is a mixed graph in which ambiguous triples are marked. We refer to the combination of the stable PC-like and CPC-like algorithms as the stable CPC-like algorithm.
In the case of DAGs, Colombo and Maathuis (Colombo and Maathuis, 2014) found that the CPC algorithm can be very conservative, in the sense that very few unshielded triples are unambiguous in the sample version, where conditional independence relationships have to be estimated from data. They proposed a minor modification of the CPC approach, called the majority rule PC algorithm (MPC), to mitigate the (unnecessary) severity of the CPC-like approach. We similarly propose the majority rule PC-like algorithm (MPC-like) for MVR CGs. As in the CPC-like algorithm, we first determine all subsets S of ad_H(X_i) and of ad_H(X_k) that make X_i and X_k conditionally independent, i.e., that satisfy X_i ⊥⊥_p X_k | S. The triple (X_i, X_j, X_k) is labelled as (α, β)-unambiguous if at least one such separating set is found and X_j is in no more than α% or in no less than β% of the separating sets, for 0 ≤ α ≤ β ≤ 100. Otherwise it is labelled as ambiguous. (As an example, consider α = 30 and β = 60.) If a triple is unambiguous, it is oriented as a v-structure if and only if X_j is in less than α% of the separating sets. As in the CPC-like algorithm, the orientation rules in the v-structure recovery phase of the PC-like algorithm (Algorithm 9, lines 11-15) are adapted so that only unambiguous triples are oriented, and the output is a mixed graph in which ambiguous triples are marked. Note that the CPC-like algorithm is the special case of the MPC-like algorithm with α = 0 and β = 100. We refer to the combination of the stable PC-like and MPC-like algorithms as the stable MPC-like algorithm.
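The (α, β)-decision rule can be written in a few lines. The following R sketch is our own illustration (the helper classifyTriple and its inputs are not part of the algorithms above); it takes the separating sets found for a pair (X_i, X_k) and reports how the middle node X_j is treated.

```r
# Sketch of the (alpha, beta)-unambiguity decision used by the MPC-like
# algorithm; classifyTriple is our own illustration.
# sepsets: the separating sets found for the pair (Xi, Xk), as character vectors.
classifyTriple <- function(Xj, sepsets, alpha = 30, beta = 60) {
  if (length(sepsets) == 0)                    # no separating set found at all
    return(list(label = "ambiguous", v_structure = NA))
  frac <- 100 * mean(vapply(sepsets, function(S) Xj %in% S, logical(1)))
  if (frac <= alpha || frac >= beta)           # Xj clearly out or clearly in
    list(label = "unambiguous", v_structure = frac < alpha)
  else
    list(label = "ambiguous", v_structure = NA)
}

# With separating sets {b}, {e}, {b, e} for the pair (c, d), as in Example 12
# below, and alpha = beta = 50: e is in about 67% of the sets, so the triple
# (c, e, d) is unambiguous and is not oriented as a v-structure; with the
# CPC-like settings alpha = 0, beta = 100 the same triple is ambiguous.
classifyTriple("e", list("b", "e", c("b", "e")), alpha = 50, beta = 50)
```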

Theorem 3.28. Let the distribution of V be faithful to an MVR CG G, and assume that we are given perfect conditional independence information about all pairs of variables (u, v) in V given subsets S ⊆ V \ {u, v}. Then the output of the (stable) CPC/MPC-like algorithm is an MVR CG that is Markov equivalent to G and has exactly the minimum set of bidirected edges for its equivalence class.

Proof. The skeleton of the learned CG is correct by Theorem 3.26. Now, we prove that for any unshielded triple (X_i, X_j, X_k) in an MVR CG G, X_j is either in all sets that m-separate X_i and X_k or in none of them. Since X_i and X_k are not adjacent and any MVR chain graph is a maximal ancestral graph (Javidian and Valtorta, 2018c), they are m-separated given some subset S ⊆ V \ {X_i, X_k}, due to the maximal property. Based on the pathwise m-separation criterion for MVR CGs (see Section 3.1), X_j is a collider node in G if and only if X_j ∉ An(S); so X_j ∉ S. On the other hand, if X_j is a non-collider node, then X_j ∈ S for all S that m-separate X_i and X_k: in this case X_j ∈ An(X_i ∪ X_k ∪ S), and so there is an undirected path X_i − X_j − X_k in (G_{An(X_i ∪ X_k ∪ S)})^a. Any set S ⊆ V \ {X_i, X_k} that does not contain X_j will fail to m-separate X_i and X_k because of this undirected path. As a result, all unshielded triples are unambiguous. Since all unshielded triples are unambiguous, the orientation rules are as in the (stable) PC-like algorithm. Therefore, the output of the (stable) CPC/MPC-like algorithm is an MVR CG that is Markov equivalent to G and has exactly the minimum set of bidirected edges for its equivalence class, and the soundness and completeness of these rules follow from Sonntag and Peña (Sonntag and Peña, 2012).

Theorem 3.29. The decisions about v-structures in the sample version of the stable CPC/MPC-like algorithm are order-independent.

Proof. The stable CPC/MPC-like algorithms have order-independent skeletons, by Theorem 3.27. In particular, this means that their unshielded triples and adjacency sets are order-independent. The decision about whether an unshielded triple is unambiguous and/or a v-structure is based on the adjacency sets of the nodes in the triple, which are order-independent.

Example 12 (Order-independent decisions about v-structures). We consider the sample versions of the stable CPC/MPC-like algorithms, using the same input as in Example 9. In particular, we assume that all conditional independencies induced by the MVR CG in Figure 3.11(a) are judged to hold except c ⊥⊥ d. Suppose that c ⊥⊥ d | b and c ⊥⊥ d | e are thought to hold. Let α = β = 50. Denote the skeleton after the skeleton recovery phase by H. We consider the unshielded triple (c, e, d). First, we compute a_H(c) = {a, d, e} and a_H(d) = {a, b, c, e}, when i = 1. We now consider all subsets S of these adjacency sets, and check whether c ⊥⊥ d | S. The following separating sets are found: {b}, {e}, and {b, e}. Since e is in some but not all of these separating sets, the stable CPC-like algorithm determines that the triple is ambiguous, and no orientations are performed. Since e is in more than half of the separating sets, the stable MPC-like algorithm determines that the triple is unambiguous and not a v-structure. The output of both algorithms is given in Figure 3.11(c).

At this point it should be clear why the modified PC-like algorithm is labeled "conservative": it is more cautious than the (stable) PC-like algorithm in drawing unambiguous conclusions about orientations. As we showed in Example 12, the output of the (stable) CPC-like algorithm may not be collider equivalent with the true MVR CG G, if the resulting CG contains an ambiguous triple.

Order-Independent Orientation Rules

Even when the skeleton and the determination of the v-structures are order-independent, Example 10 showed that there might be some order-dependent steps left in the sample version. Regarding the orientation rules, we note that the PC-like algorithm does not suffer from conflicting v-structures (as shown in (Colombo and Maathuis, 2014) for the PC-algorithm in the case of DAGs), because bi-directed edges are allowed. However, the three orientation rules still suffer from order-dependence issues (see Example 10 and Fig- ure 3.12). To solve this problem, we can use lists of candidate edges for each orientation rule as follows: we first generate a list of all edges that can be oriented by rule R1. We orient all these edges, creating bi-directed edges if there are conflicts. We do the same for rules R2 and R3, and iterate this procedure until no more edges can be oriented. When using this procedure, we add the letter L (standing for lists), e.g., (stable) LCPC- like and (stable) LMPC-like. The (stable) LCPC-like and (stable) LMPC-like algorithms are fully order-independent in the sample versions. The procedure is illustrated in Example 13.
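The following R sketch is our own illustration of the list-based mechanism (it is not the thesis implementation, and the edge-mark encoding in amat is our own convention): all orientations proposed by a rule are collected first and then applied together, so conflicting proposals automatically produce a bidirected edge.

```r
# Sketch (our own illustration) of list-based application of an orientation
# rule. Edge marks in 'amat': amat[u, v] is the mark at v on the edge u ~ v,
# with 0 = no edge, 1 = tail, 2 = arrowhead. Thus u -> v has amat[u, v] = 2 and
# amat[v, u] = 1, and a bidirected edge has amat[u, v] = amat[v, u] = 2.
applyCandidates <- function(amat, candidates) {
  # candidates: list of pairs c(u, v), each meaning "put an arrowhead at v"
  for (e in candidates) amat[e[1], e[2]] <- 2
  amat                                       # conflicting proposals become u <-> v
}

# Example 13 in this convention: rule R1 proposes both c -> d and d -> c, so
# applying the whole list yields c <-> d regardless of the processing order.
nodes <- c("a", "b", "c", "d", "e", "f")
amat <- matrix(0, 6, 6, dimnames = list(nodes, nodes))
amat["c", "d"] <- amat["d", "c"] <- 1        # the undirected edge c - d
amat <- applyCandidates(amat, list(c("c", "d"), c("d", "c")))
amat[c("c", "d"), c("c", "d")]               # both marks are 2: a bidirected edge
```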

Theorem 3.30. Let the distribution of V be faithful to an MVR CG G, and assume that we are given perfect conditional independence information about all pairs of variables (u, v) in V given subsets S ⊆ V \ {u, v}. Then the output of the (stable) LCPC/LMPC-like algorithm is an MVR CG that is Markov equivalent to G and has exactly the minimum set of bidirected edges for its equivalence class.

Proof. By Theorem 3.28, we know that the (stable) CPC-like and (stable) MPC-like algorithms are correct. With perfect conditional independence information, there are no conflicts between orientation rules in the essential graph recovery phase of the algorithms. Therefore, the (stable) LCPC-like and (stable) LMPC-like algorithms are identical to the (stable) CPC-like and (stable) MPC-like algorithms.

Theorem 3.31. The sample versions of the stable LCPC-like and stable LMPC-like algorithms are fully order-independent.

Proof. This follows straightforwardly from Theorems 3.27 and 3.29 and the procedure with lists and bi-directed edges discussed above. 

Table 3.3 summarizes the three order-dependence issues explained above and the corresponding modifications of the PC-like algorithm that remove the given order-dependence problem.

Table 3.3: Order-dependence issues and corresponding modifications of the PC-like algorithm that remove the problem. "Yes" indicates that the corresponding aspect of the graph is estimated order-independently in the sample version.

                         skeleton   v-structure decisions   edge orientations
PC-like                    No               No                     No
stable PC-like             Yes              No                     No
stable CPC/MPC-like        Yes              Yes                    No
stable LCPC/LMPC-like      Yes              Yes                    Yes

Example 13. Consider the structure shown in Figure 3.12. As a first step, we construct a list containing all candidate structures eligible for orientation rule R1 in the essential graph recovery phase. The list contains the unshielded triples (e, c, d), (c, d, f), (a, c, d), and (b, d, c). Now, we go through each element in the list and orient the edges accordingly, allowing bi-directed edges. This yields the edge orientation c ↔ d, regardless of the ordering of the variables.

3.4.3 Evaluation

In this subsection, we compare the performance of our algorithms (Table 3.3) with the original PC-like learning algorithm by running them on randomly generated MVR chain graphs in low-dimensional and high-dimensional data, respectively. We report on the Gaussian case only because of space limitations.

Performance Evaluation on Random MVR CGs (Gaussian case)

To investigate the performance of the proposed learning methods in this paper, we use the same approach that Ma, Xie, and Geng (2008) used in evaluating the performance of the LCD algorithm on LWF chain graphs. We run our algorithms on randomly generated MVR chain graphs and then we compare the results and report summary error measures in all cases.

Data Generation Procedure. First we explain the way in which the random MVR chain graphs and random samples are generated. Given a vertex set V, let p = |V| and let N denote the average degree of edges (including bidirected, pointing out, and pointing in) for each vertex. We generate a random MVR chain graph on V as follows:

• Order the p vertices and initialize a p × p adjacency matrix A with zeros;

• Set each element in the lower triangle part of A to be a random number generated from a Bernoulli distribution with probability of occurrence s = N/(p − 1);

• Symmetrize A according to its lower triangle;

• Select an integer k randomly from {1, ..., p} as the number of chain components;

• Split the interval [1, p] into k equal-length subintervals I_1, ..., I_k so that the set of variables falling into each subinterval I_m forms a chain component C_m;

• Set A_{ij} = 0 for any (i, j) pair such that i ∈ I_l, j ∈ I_m with l > m.

This procedure yields an adjacency matrix A for a chain graph, with (A_{ij} = A_{ji} = 1) representing a bidirected edge between V_i and V_j and (A_{ij} = 1, A_{ji} = 0) representing a directed edge from V_i to V_j. Moreover, it is not difficult to see that E[vertex degree] = N, where an adjacent vertex can be linked by either a bidirected or a directed edge.

In order to sample the artificial CGs, we first transform them into DAGs and then generate samples from these DAGs under marginalization, as indicated in (Javidian and Valtorta, 2018c), using Hugin.
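The following base-R sketch covers only the graph-generation steps listed above (the transformation to a DAG and the sampling with Hugin are omitted); the function name randMVRadj and its defaults are our own illustrative choices.

```r
# A base-R sketch of the adjacency-matrix generation described above;
# illustrative only, not the code used for the experiments.
randMVRadj <- function(p, N) {
  s <- N / (p - 1)                                    # Bernoulli edge probability
  A <- matrix(0L, p, p)
  A[lower.tri(A)] <- rbinom(sum(lower.tri(A)), 1, s)  # random lower triangle
  A[upper.tri(A)] <- t(A)[upper.tri(A)]               # symmetrize from the lower part
  k <- sample.int(p, 1)                               # number of chain components
  comp <- cut(seq_len(p), breaks = k, labels = FALSE) # k equal-length subintervals
  for (i in seq_len(p)) for (j in seq_len(p))
    if (comp[i] > comp[j]) A[i, j] <- 0L              # drop backward-pointing arrows
  A  # A[i,j] = A[j,i] = 1: bidirected edge; A[i,j] = 1, A[j,i] = 0: edge i -> j
}

set.seed(1)
A <- randMVRadj(p = 10, N = 2)
```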

Experimental Results. We evaluate the performance of the proposed algorithms in terms of six measurements that are commonly used for constraint-based learning algorithms: (a) the true positive rate (TPR) (also known as sensitivity, recall, and hit rate), (b) the false positive rate (FPR) (also known as fall-out), (c) the true discovery rate (TDR) (also known as precision or positive predictive value), (d) accuracy (ACC) for the skeleton, (e) the structural Hamming distance (SHD) (this is the metric described in (Tsamardinos et al., 2006) to compare the structure of the learned and the original graphs), and (f) run-time for the LCG recovery algorithms.
In short, TPR = TP / Pos is the ratio of the number of correctly identified edges (true positives, TP) over the total number of edges in the true graph (Pos, the number of positive cases in the data); FPR = FP / Neg is the ratio of the number of incorrectly identified edges (false positives, FP) over the total number of gaps (Neg, the number of negative cases in the data); TDR = TP / (total number of edges in the recovered CG) is the ratio of the number of correctly identified edges over the total number of edges in the estimated graph; ACC = (TP + TN) / (Pos + Neg), where TN is the number of true negatives; and SHD is the number of legitimate operations needed to change the current resulting graph to the true essential graph, where legitimate operations are: (a) add or delete an edge and (b) insert, delete or reverse an edge orientation. In principle, large values of TPR, TDR, and ACC and small values of FPR and SHD indicate good performance. All six measurements are computed on the essential graphs of the CGs, rather than on the CGs directly, to avoid spurious differences due to random orientation of undirected edges.
In our simulation, for low-dimensional settings, we set N (the expected number of adjacent vertices) to 2 and vary the parameters p (the number of vertices) and n (sample size) as follows:

• p ∈ {10, 20, 30, 40, 50},

• n ∈ {500, 1000, 5000, 10000}.
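The skeleton error measures defined above can be computed directly from the true and estimated adjacency matrices; the following base-R sketch (the function name skeletonMetrics and its interface are our own, not part of the reported implementation) shows one way to do it. SHD is omitted here since it also depends on the edge orientations.

```r
# Sketch (illustrative only) of the skeleton error measures, given the true and
# estimated skeletons as symmetric 0/1 adjacency matrices over the same vertices.
skeletonMetrics <- function(trueAdj, estAdj) {
  ut <- upper.tri(trueAdj)                  # each vertex pair counted once
  true_edges <- trueAdj[ut] == 1
  est_edges  <- estAdj[ut] == 1
  TP <- sum(true_edges & est_edges)
  FP <- sum(!true_edges & est_edges)
  TN <- sum(!true_edges & !est_edges)
  Pos <- sum(true_edges); Neg <- sum(!true_edges)
  c(TPR = TP / Pos,                         # correctly recovered edges / true edges
    FPR = FP / Neg,                         # wrongly added edges / gaps
    TDR = TP / sum(est_edges),              # precision over the estimated edges
    ACC = (TP + TN) / (Pos + Neg))
}
```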

For each (p, N) combination, we first generate 30 random MVR CGs. We then generate a random Gaussian distribution based on each graph (transformed DAG) and draw an independent and identically distributed (i.i.d.) sample of size n from this distribution for each possible n. For each sample, four different significance levels (α = 0.001, 0.005, 0.01, 0.05) are used to perform the hypothesis tests. The null hypothesis H_0 is "two variables u and v are conditionally independent given a set C of variables" and the alternative H_1 is that H_0 may not hold. We then compare the results to assess the influence of the significance level on the performance of our algorithms. For the high-dimensional setting, we generate 30 random MVR CGs with 1000 vertices for which the expected number of adjacent vertices for each vertex is 2. We then generate a random Gaussian distribution based on each graph (transformed DAG) and draw an i.i.d. sample of size 50 from this distribution for each DAG. These numbers are similar to ones that could be encountered in gene regulatory network experiments (Colombo and Maathuis, 2014, Section 6).
Figure 3.13 shows that: (a) as we expected (Ma, Xie, and Geng, 2008; Kalisch and Bühlmann, 2007), all algorithms work well on sparse graphs (N = 2); (b) for all algorithms, typically the TPR, TDR, and ACC increase with sample size; (c) for all algorithms, typically the SHD and FPR decrease with sample size; (d) a large significance level (α = 0.05) typically yields large TPR, FPR, and SHD; (e) while the stable PC-like algorithm has a better TDR and FPR in comparison with the original PC-like algorithm, the original PC-like algorithm has a better TPR (as observed in the case of DAGs by Colombo and Maathuis (2014)), which can be explained by the fact that the stable PC-like algorithm tends to perform more tests than the original PC-like algorithm; and (f) while the original PC-like algorithm has a (slightly) better SHD in comparison with the stable PC-like algorithm in low-dimensional data, the stable PC-like algorithm has a better SHD in high-dimensional data. Also, the (very) small variances indicate that the order-independent versions of the PC-like algorithm in high-dimensional data are stable.



Figure 3.13: The first two rows show the performance of the original (OPC) and stable PC-like (SPC) algorithms for randomly generated Gaussian chain graph models: average over 30 repetitions with 50 variables, N = 2, and significance level α = 0.001. The last two rows show the performance of the original (OPC) and stable PC-like (SPC) algorithms for randomly generated Gaussian chain graph models: average over 30 repetitions with 1000 variables, N = 2, sample size 50, and significance levels α = 0.05, 0.01, 0.005, 0.001.

When considering average running times versus sample sizes, as shown in Figure 3.13, we observe that: (a) the average run time increases when the sample size increases; (b) generally, the average run time for the original PC-like algorithm is (slightly) better than that for the stable PC-like algorithm in both low- and high-dimensional settings.
In summary, empirical simulations show that our algorithms achieve competitive results with the original PC-like learning algorithm; in particular, in the Gaussian case the order-independent algorithms achieve output of better quality than the original PC-like algorithm, especially in high-dimensional settings. Algorithm 9 and the stable PC-like algorithms have been implemented in the R language (https://github.com/majavid/SUM2019).

3.5 A Decomposition-Based Algorithm for Structure Learning of MVR Chain Graphs

Figure 3.14: The procedure of learning the structure of an essential MVR chain graph from a faithful distribution.

MVR chain graphs were introduced by Cox and Wermuth in the nineties to combine directed acyclic graphs (representing the structure of Bayesian networks) and bidirected graphs (covariance graphs). As mentioned earlier, one important aspect of PGMs in general, and chain graphs especially, is the possibility of learning the structure of models directly from sampled data. Two constraint-based learning algorithms, which use statistical tests for the presence of conditional independencies, exist for learning MVR CGs: (1) the PC-like algorithm (Sonntag and Peña, 2012), and (2) the answer set programming (ASP) algorithm (Peña, 2016b). In this section, we extend the decomposition approach for learning Bayesian networks (BNs) proposed by (Xie, Zheng, and Zhao, 2006) to learning multivariate regression chain graphs (MVR CGs), which include BNs as a special case. The same advantages of this decomposition approach hold in the more general setting: reduced computational complexity and increased power of the conditional independence tests. Moreover, latent (hidden) variables can be represented in MVR CGs by using bidirected edges, and our algorithm correctly recovers any independence structure that is faithful to an MVR CG, thus greatly extending the range of applications of decomposition-based model selection techniques. Simulations under a variety of settings demonstrate the competitive performance of our method in comparison with the PC-like algorithm (Sonntag and Peña, 2012). In fact, the decomposition-based algorithm usually outperforms the PC-like algorithm except in running time. The performance of both algorithms is much better when the underlying graph is sparse.

3.5.1 Construction of Undirected Independence Graphs and m-Separation Trees

In this section we consider graphs containing both directed (→) and bidirected (↔) edges and largely use the terminology of (Xie, Zheng, and Zhao, 2006; Richardson, 2003), where the reader can also find further details. Below we briefly list some of the most central concepts used in this section.

Let Ḡ_V = (V, Ē_V) denote an undirected graph, where Ē_V is a set of undirected edges. An undirected edge between two vertices u and v is denoted by (u, v). For a subset A of V, let Ḡ_A = (A, Ē_A) be the subgraph induced by A, where Ē_A = {e ∈ Ē_V | e ∈ A × A} = Ē_V ∩ (A × A). An undirected graph is called complete if any pair of vertices is connected by an edge. For an undirected graph, we say that vertices u and v are separated by a set of vertices Z if each path between u and v passes through Z. We say that two distinct vertex sets X and Y are separated by Z if and only if Z separates every pair of vertices u and v for any u ∈ X and v ∈ Y. We say that an undirected graph Ḡ_V is an undirected independence graph (UIG) for CG G if the fact that a set Z separates X and Y in Ḡ_V implies that Z m-separates X and Y in G. Note that the augmented graph derived from CG G, (G)^a, is an undirected independence graph for G. We say that Ḡ_V can be decomposed into subgraphs Ḡ_A and Ḡ_B if

(1) A ∪ B = V, and

(2) C = A ∩ B separates V \ A and V \ B in Ḡ_V.

The above decomposition does not require that the separator C be complete, which is required for the weak decomposition defined in (Lauritzen, 1996). We show that a problem of structural learning of a CG can also be decomposed into problems for its decomposed subgraphs even if the separator is not complete. A triangulated (chordal) graph is an undirected graph in which all cycles of four or more vertices have a chord, which is an edge that is not part of the cycle but connects two vertices of the cycle (see, for example, Figure 3.15). For an undirected graph Ḡ_V which is not triangulated, we can add extra ("fill-in") edges to it such that it becomes a triangulated graph, denoted by Ḡ_V^t. Let X ⊥⊥ Y denote the independence of X and Y, and X ⊥⊥ Y | Z (or ⟨X, Y | Z⟩) the conditional independence of X and Y given Z. We assume that all independencies of a probability distribution of variables in V can be checked by m-separations of G, called the faithfulness assumption (Spirtes, Glymour, and Scheines, 2000). The faithfulness assumption means that all independencies and conditional independencies among variables can be represented by G.
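As a small practical illustration of the fill-in step, the following R sketch assumes the igraph package and its is_chordal function (an assumption on our part; any triangulation routine, e.g. the methods of Berry et al. (2004), would serve the same purpose).

```r
# Sketch: triangulating an undirected graph by adding the fill-in edges
# suggested by a maximum cardinality search, assuming the igraph package.
library(igraph)

g <- graph_from_literal(a - b, a - c, b - d, c - d)      # a chordless 4-cycle
chk <- is_chordal(g, fillin = TRUE)
g_t <- if (chk$chordal) g else add_edges(g, chk$fillin)  # the triangulated graph
is_chordal(g_t)$chordal                                  # TRUE
```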

Figure 3.15: (a) An MVR CG G. (b) The augmented graph G^a, which is also a triangulated graph G^t.

The global skeleton is an undirected graph obtained by dropping the directions of the edges of a CG. Note that the absence of an edge (u, v) implies that there is a variable subset S of V such that u and v are independent conditional on S, that is, u ⊥⊥ v | S for some S ⊆ V \ {u, v} (Javidian and Valtorta, 2018c). Two MVR CGs over the same variable set are called Markov equivalent if they induce the same conditional independence restrictions. Two MVR CGs are Markov equivalent if and only if they have the same global skeleton and the same set of v-structures (unshielded colliders) (Wermuth and Sadeghi, 2012). An equivalence class of MVR CGs consists of all MVR CGs which are Markov equivalent, and it is represented as a partially directed graph (i.e., a graph containing directed, undirected, and bidirected edges and no directed cycles) where the directed/bidirected edges represent edges that are common to every MVR CG in it, while the undirected edges represent edges for which any appropriate orientation leads to a Markov equivalent MVR CG. Therefore the goal of structural learning is to construct a partially directed graph to represent the equivalence class. A local skeleton for a subset A of variables is an undirected subgraph for A in which the absence of an edge (u, v) implies that there is a subset S of A such that u ⊥⊥ v | S.
Now, we introduce the notion of m-separation trees, which is used to facilitate the representation of the decomposition. The concept is similar to the junction tree of cliques and the independence tree introduced for DAGs as d-separation trees in (Xie, Zheng, and Zhao, 2006). Let C = {C_1, ..., C_H} be a collection of distinct variable sets such that C_h ⊆ V for h = 1, ..., H. Let T be a tree where each node corresponds to a distinct variable set in C, to be displayed as an oval (see, for example, Figure 3.16). The term 'node' is used for an m-separation tree to distinguish it from the term 'vertex' for a graph in general. An undirected edge e = (C_i, C_j) connecting nodes C_i and C_j in T is labeled with a separator S = C_i ∩ C_j, which is displayed as a rectangle. Removing an edge e, or equivalently removing a separator S from T, splits T into two subtrees T_1 and T_2 with node sets C_1 and C_2, respectively. We use V_i to denote the union of the vertices contained in the nodes of the subtree T_i, for i = 1, 2.

Definition 3.32. A tree T with node set C is said to be an m-separation tree for chain graph G = (V, E) if

• ∪_{C_i ∈ C} C_i = V, and

• for any separator S in T, with V_1 and V_2 defined as above by removing S, we have ⟨V_1 \ S, V_2 \ S | S⟩_G.

Figure 3.16: An m-separation tree.

Notice that a separator is defined in terms of a tree whose nodes consist of variable sets, while an m-separator is defined based on a chain graph. In general, these two concepts are not related, though for an m-separation tree its separator must be some corresponding m-separator in the underlying MVR chain graph. The definition of m-separation trees for MVR chain graphs is similar to that of junction trees of cliques; see (Cowell et al., 1999; Lauritzen, 1996). Actually, it is not difficult to see that a junction tree of chain graph G is also an m-separation tree. However, as in (Ma, Xie, and Geng, 2008), we point out two differences here: (a) an m-separation tree is defined with m-separation and it does not require that every node is a clique or that every separator is complete on the augmented graph; (b) junction trees are mostly used as inference engines, while our interest in m-separation trees is mainly derived from their power in facilitating the decomposition of structural learning.
A collection of variable sets C = {C_1, ..., C_H} is said to be a hypergraph on V where each hyperedge C_h is a nonempty subset of variables, and ∪_{h=1}^{H} C_h = V. A hypergraph is a reduced hypergraph if C_i ⊄ C_j for i ≠ j. In this section, only reduced hypergraphs are used, and they are thus simply called hypergraphs.
As proposed in (Xie, Zheng, and Zhao, 2006), one can construct a d-separation tree from observed data, from domain or prior knowledge of conditional independence relations, or from a collection of databases. However, their arguments are not valid for constructing an m-separation tree from domain knowledge or from observed data patterns in the current setting. In this subsection, we first extend Theorem 2 of (Xie, Zheng, and Zhao, 2006), which guarantees that their method for constructing a separation tree from data is valid for MVR chain graphs. Then we investigate sufficient conditions for constructing m-separation trees from domain or prior knowledge of conditional independence relations or from a collection of databases.

Constructing an m-Separation Tree from Observed Data

In several algorithms for structural learning of PGMs, the first step is to construct an undirected independence graph in which the absence of an edge (u, v) implies u ⊥⊥ v | V \ {u, v}. To construct such an undirected graph, we can start with a complete undirected graph, and then for each pair of variables u and v, an undirected edge (u, v) is removed if u and v are independent conditional on the set of all other variables (Xie, Zheng, and Zhao, 2006). For normally distributed data, the undirected independence graph can be efficiently constructed by removing an edge (u, v) if and only if the corresponding entry in the concentration matrix (inverse covariance matrix) is zero (Lauritzen, 1996, Proposition 5.2). For this purpose, a conditional independence test for each pair of random variables using the partial correlation coefficient can be used. If the p-value of the test is smaller than the given threshold, then there will be an edge in the output graph. For discrete data, a test of conditional independence given a large number of discrete variables may be of extremely low power. To cope with such difficulty, a local discovery algorithm called Max-Min Parents and Children (MMPC) (Tsamardinos, Aliferis, and Statnikov, 2003) or the forward selection procedure described in (Edwards, 2000) can be applied.
An m-separation tree can be built by constructing a junction tree (Jensen and Nielsen, 2007) from an undirected independence graph. In fact, we generalize Theorem 2 of (Xie, Zheng, and Zhao, 2006) as follows.

Theorem 3.33. A junction tree constructed from an undirected independence graph for MVR CG G is an m-separation tree for G.

An m-separation tree T only requires that all m-separation properties of T also hold for the MVR CG G, but the reverse is not required. Thus we only need to construct an undirected independence graph that may have fewer conditional independencies than the moral graph, and this means that the undirected independence graph may have extra edges added to the augmented graph. As (Xie, Zheng, and Zhao, 2006) observe for d-separation in DAGs, if all nodes of an m-separation tree contain only a few variables, "the null hypothesis of the absence of an undirected edge may be tested statistically at a larger significance level." Since there are standard algorithms for constructing junction trees from UIGs (Cowell et al., 1999, Chapter 4, Section 4), the construction of separation trees reduces to the construction of UIGs. In this sense, Theorem 3.33 enables us to exploit various techniques for learning UIGs to serve our purpose. More suggested methods for learning UIGs from data, in addition to the above-mentioned techniques, can be found in (Ma, Xie, and Geng, 2008).
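For the Gaussian case, the UIG construction described above can be sketched in a few lines of base R; the function name buildUIG and the choice of a Fisher z-test are our own illustrative choices, not the implementation used in this thesis.

```r
# Base-R sketch (illustrative) of building a UIG for Gaussian data: the edge
# (u, v) is kept unless the partial correlation of u and v given all remaining
# variables, read off the concentration matrix, is judged to be zero by a
# Fisher z-test. Assumes n > p so that the sample covariance is invertible.
buildUIG <- function(data, alpha = 0.05) {
  n <- nrow(data); p <- ncol(data)
  K <- solve(cov(data))                              # concentration matrix
  pcor <- -K / sqrt(outer(diag(K), diag(K)))         # partial correlations
  z <- abs(0.5 * log((1 + pcor) / (1 - pcor)))       # Fisher z-transform
  crit <- qnorm(1 - alpha / 2) / sqrt(n - p - 1)     # threshold of the z-test
  adj <- z > crit
  diag(adj) <- FALSE
  adj                                                # TRUE = keep the edge (u, v)
}
```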

Example 14. To construct an m-separation tree for the MVR CG G in Figure 3.15(a), we first construct an undirected independence graph by starting with a complete graph and removing an edge (u, v) if u ⊥⊥ v | V \ {u, v}. An undirected graph obtained in this way is the augmented graph of the MVR CG G. In fact, we only need to construct an undirected independence graph, which may have extra edges added to the augmented graph. Next, we triangulate the undirected graph and finally obtain the m-separation tree, as shown in Figure 3.15(b) and Figure 3.16, respectively.

Constructing an m-Separation Tree from Domain Knowledge or from Observed Data Patterns

Algorithm 2 of (Xie, Zheng, and Zhao, 2006) proposes an algorithm for constructing a d-separation tree T from domain knowledge or from observed data patterns such that a correct skeleton can be constructed by combining subgraphs for nodes of T. In this subsection, we propose an approach for constructing an m-separation tree from domain knowledge or from observed data patterns without conditional independence tests. Domain knowledge of variable dependencies can be represented as a collection of variable sets C = {C_1, ..., C_H}, in which variables contained in the same set may associate with each other directly, but variables contained in different sets associate with each other only through other variables. This means that two variables that are not contained in the same set are independent conditionally on all other variables. On the other hand, in an application study, observed data may have a collection of different observed patterns, C = {C_1, ..., C_H}, where C_h is the set of observed variables for the hth group of individuals. In both cases, the condition that makes our algorithms correct for structural learning from a collection C is that C must contain sufficient data such that the parameters of the underlying MVR CG are estimable.
For a DAG, parameters are estimable if, for each variable u, there is an observed data pattern C_h in C that contains both u and its parent set. Thus a collection C of observed patterns has sufficient data for correct structural learning if there is a pattern C_h in C for each u such that C_h contains both u and its parent set in the underlying DAG. Also, domain knowledge is legitimate if, for each variable u, there is a hyperedge C_h in C that contains both u and its parent set (Xie, Zheng, and Zhao, 2006). However, these conditions are not valid in the case of MVR chain graphs. In fact, for MVR CGs, domain knowledge is legitimate if, for each connected component τ, there is a hyperedge C_h in C that contains both τ and its parent set pa_G(τ). Also, a collection C of observed patterns has sufficient data for correct structural learning if there is a pattern C_h in C for each connected component τ such that C_h contains both τ and its parent set pa_G(τ) in the underlying MVR CG.

Algorithm 11: Construct an m-separation tree from a hypergraph
Input: a hypergraph C = {C_1, ..., C_H}, where each hyperedge C_h is a variable set such that for each connected component τ, there is a hyperedge C_h in C that contains both τ and its parent set pa_G(τ).
Output: T, an m-separation tree for the hypergraph C.
1  For each hyperedge C_h, construct a complete undirected graph Ḡ_h with edge set Ē_h = {(u, v) | ∀ u, v ∈ C_h} = C_h × C_h;
2  Construct the entire undirected graph Ḡ_V = (V, Ē), where Ē = Ē_1 ∪ ⋯ ∪ Ē_H;
3  Construct a junction tree T by triangulating Ḡ_V.
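Lines 1-2 of Algorithm 11 amount to taking the union of complete graphs on the hyperedges; the short base-R sketch below illustrates this (the function name hyperUIG is our own, and the triangulation and junction-tree construction of line 3 are left to a standard routine).

```r
# Sketch (illustrative) of lines 1-2 of Algorithm 11: turn a hypergraph, given
# as a list of hyperedges over variable names, into the union of complete graphs.
hyperUIG <- function(hyperedges) {
  V <- sort(unique(unlist(hyperedges)))
  adj <- matrix(FALSE, length(V), length(V), dimnames = list(V, V))
  for (Ch in hyperedges)
    adj[Ch, Ch] <- TRUE                    # complete graph on the hyperedge C_h
  diag(adj) <- FALSE
  adj                                      # line 3 would triangulate this graph
}

# Example with hyperedges {a, b, c} and {c, d}
hyperUIG(list(c("a", "b", "c"), c("c", "d")))
```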

The correctness of Algorithm 11 is proven in Appendix C. Note that we do not need any conditional independence test in Algorithm 11 to construct an m-separation tree. In this algorithm, we can use the algorithm proposed in (Berry et al., 2004) to construct a minimal triangulated graph. Algorithm 11 is illustrated in Figure 3.17.
Guaranteeing the presence of both τ and its parent set pa(τ) in at least one hyperedge, as required in Algorithm 11, is a strong requirement, which may prevent the use of domain knowledge as a practical source of information for constructing MVR chain graphs. In addition, we remark that answering the question "how can one obtain this information?" is beyond the scope of this paper. The two examples that follow show that restricting the hyperedge contents in two natural ways leads to errors.
The example illustrated in Figure 3.18 shows that, if for each variable u there is a hyperedge C_h in C that contains both u and its parent set, we cannot guarantee the correctness of our algorithm. Note that vertices a and d are separated in the tree T of Figure 3.18 part (d) by removing vertex b, but a and d are not m-separated given b, as can be verified using Figure 3.18 part (a).

Figure 3.17: Construction of the m-separation tree. (a) An MVR CG. (b) Domain knowledge of associations. (c) The undirected graph and its triangulation. (d) The m-separation tree T.

The example illustrated in Figure 3.19 shows that, if for each variable u there is a hyperedge C_h in C that contains both u and its boundary set, Algorithm 11 does not necessarily give an m-separation tree because, for example, S = {a, b} separates c and d in the tree T of Figure 3.19 part (d), but S does not m-separate c and d in the MVR CG G in Figure 3.19 part (a).

3.5.2 Decomposition of Structural Learning

Applying the following theorem to structural learning, we can split a problem of searching for m-separators and building the skeleton of a CG into small problems for every node of an m-separation tree T.

Theorem 3.34. Let T be an m-separation tree for CG G. Vertices u and v are m-separated by S ⊆ V in G if and only if (i) u and v are not contained together in any node C of T, or (ii) there exists a node C that contains both u and v such that a subset S′ of C m-separates u and v.

Figure 3.18: Insufficiency of having a hypergraph that contains both u and its parent set for every u ∈ V. (a) An MVR CG. (b) Domain knowledge of associations. (c) The undirected graph constructed by the union of complete graphs corresponding to each hyperedge, which is also a triangulated graph. (d) The junction tree T. (e) Local skeleton for every node of T. (f) The global skeleton and all v-structures.

Figure 3.19: Insufficiency of having a hypergraph that contains both u and its boundary set for every u ∈ V. (a) An MVR CG. (b) Domain knowledge of associations. (c) The undirected graph constructed by the union of complete graphs corresponding to each hyperedge, which is also a triangulated graph. (d) The junction tree T, which is not an m-separation tree.

According to Theorem 3.34, a problem of searching for an m-separator S of u and v in all possible subsets of V is localized to all possible subsets of nodes in an m-separation tree that contain u and v. For a given m-separation tree T with node set C = {C_1, ..., C_H}, we can recover the skeleton and all v-structures for a CG as follows. First, we construct a local skeleton for every node C_h of T, by starting with a complete undirected subgraph and removing an undirected edge (u, v) if there is a subset S of C_h such that u and v are independent conditional on S. Then, in order to construct the global skeleton, we combine all these local skeletons together and remove edges that are present in some local skeletons but absent in other local skeletons. Then we determine a v-structure whenever two non-adjacent vertices u and v have a common neighbor in the global skeleton and that neighbor is not contained in the m-separator of u and v. Finally, we can orient more undirected edges if none of these orientations creates either a partially directed cycle or a new v-structure (see, for example, Figure 3.20). This process is formally described in Algorithm 12.

Figure 3.20: (a) Local skeletons for every node of T. (b) The global skeleton and all v-structures.

Algorithm 13 below returns an MVR chain graph that contains exactly the minimum set of bidirected edges for its Markov equivalence class. For the correctness of lines 2-7 in Algorithm 13, see (Sonntag and Peña, 2012).

Figure 3.21: The Rules (Sonntag and Peña, 2012)

Algorithm 12: A recovery algorithm for MVR chain graphs
Input: a probability distribution p faithful to an unknown MVR CG G.
Output: the pattern of MVR CG G.
1  Construct an m-separation tree T with node set C = {C_1, ..., C_H}, as discussed in Section 3.5.1;
2  Set S = ∅;
3  for h ← 1 to H do
4      Start from a complete undirected graph Ḡ_h with vertex set C_h;
5      for each vertex pair {u, v} ⊆ C_h do
6          if ∃ S_uv ⊆ C_h such that u ⊥⊥ v | S_uv then
7              Delete the edge (u, v) in Ḡ_h;
8              Add S_uv to S;
9          end
10     end
11 end
12 Initialize the edge set Ē_V of Ḡ_V as the union of all edge sets of Ḡ_h, h = 1, ..., H;
13 for each vertex pair {u, v} contained in more than one tree node and (u, v) ∈ Ḡ_V do
14     if ∃ C_h such that {u, v} ⊆ C_h and (u, v) ∉ Ē_h then
15         Delete the edge (u, v) in Ḡ_V;
16     end
17 end
18 for each m-separator S_uv in the list S do
19     if u ∘− w −∘ v appears in the global skeleton and w is not in S_uv then
           /* u ∘− w means u ← w or u − w. Also, w −∘ v means w → v or w − v. */
20         Determine a v-structure u ∘→ w ←∘ v;
21     end
22 end

According to Theorem 3.34, we can prove that the global skeleton and all v-structures obtained by applying the decomposition in Algorithm 12 are correct, that is, they are the same as those obtained from the joint distribution of V; see Appendix C for the details of the proof. Note that separators in an m-separation tree may not be complete in the augmented graph. Thus the decomposition is weaker than the decomposition usually defined for parameter estimation (Cowell et al., 1999; Lauritzen, 1996).
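The combination step of Algorithm 12 (lines 12-17) can be illustrated with the following base-R sketch; the function name combineLocalSkeletons and the input format are our own illustrative choices, not the thesis implementation.

```r
# Sketch (illustrative) of the combination step of Algorithm 12 (lines 12-17).
# 'locals' is a list with one entry per tree node C_h; each entry holds the
# node's variable names (vars) and its local skeleton (adj), a logical matrix
# indexed in the same order as vars.
combineLocalSkeletons <- function(locals) {
  V <- sort(unique(unlist(lapply(locals, `[[`, "vars"))))
  global <- matrix(FALSE, length(V), length(V), dimnames = list(V, V))
  for (L in locals)                                  # union of all local edge sets
    global[L$vars, L$vars] <- global[L$vars, L$vars] | L$adj
  for (L in locals) {                                # remove an edge if some node
    sub <- global[L$vars, L$vars]                    # contains both endpoints but
    sub[!L$adj] <- FALSE                             # its local skeleton lacks it
    diag(sub) <- FALSE
    global[L$vars, L$vars] <- sub
  }
  global
}
```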

Algorithm 13: A recovery algorithm for MVR chain graphs with the minimum set of bidirected edges for its equivalence class
Input: a probability distribution p faithful to an unknown MVR CG G.
Output: an MVR CG G′ s.t. G and G′ are Markov equivalent and G′ has exactly the minimum set of bidirected edges for its equivalence class.
1  Call Algorithm 12 to construct G′, which is the equivalence class of MVR CGs for G;
2  Apply rules 1-3 in Figure 3.21 while possible;
   /* After this line, the learned graph is the essential graph of MVR CG G, i.e., it has the same skeleton as G and contains all and only the arrowheads that are shared by all MVR CGs in the Markov equivalence class of G (Sonntag, Peña, and Gómez-Olmedo, 2015). */
3  Let G′_u be the subgraph of G′ containing only the nodes and the undirected edges in G′;
4  Let T be the junction tree of G′_u;
   /* If G′_u is disconnected, the cliques belonging to different connected components can be linked with empty separators, as described in (Golumbic, 1980, Theorem 4.8). */
5  Order the cliques C_1, ..., C_n of G′_u s.t. C_1 is the root of T and, if C_i is closer to the root than C_j in T, then C_i < C_j;
6  Order the nodes such that if A ∈ C_i, B ∈ C_j, and C_i < C_j, then A < B;
7  Orient the undirected edges in G′ according to the ordering obtained in line 6.

3.5.3 Complexity Analysis and Advantages

In this subsection, we start by comparing our algorithm with the main algorithm in (Xie, Zheng, and Zhao, 2006), which is designed specifically for structural learning when the underlying graph structure is a DAG. We make this choice of the DAG-specific algorithm so that both algorithms can have the same separation tree as input and hence are directly comparable. In a DAG, all chain components are singletons. Therefore, the sufficiency of having a hypergraph that contains both τ and its parent set for every chain component is equivalent to having a hypergraph that contains both u and its parent set for every u ∈ V when the underlying graph structure is a DAG. It follows that our algorithm has the same effect and the same complexity as the main algorithm in (Xie, Zheng, and Zhao, 2006).

The same advantages mentioned by (Xie, Zheng, and Zhao, 2006) for their BN structural learning algorithm hold for our algorithm when applied to MVR CGs. For the reader's convenience, we list them here. First, by using the m-separation tree, independence tests are performed only conditionally on smaller sets contained in a node of the m-separation tree rather than on the full set of all other variables. Thus our algorithm has higher power for statistical tests. Second, the computational complexity can be reduced. This complexity analysis focuses only on the number of conditional independence tests for constructing the equivalence class. Decomposition of graphs is a computationally simple task compared to the task of testing conditional independence for a large number of triples of sets of variables. The triangulation of an undirected graph is used in our algorithms to construct an m-separation tree from an undirected independence graph. Although the problem of optimally triangulating an undirected graph is NP-hard, sub-optimal triangulation methods (Berry et al., 2004) may be used, provided that the obtained tree does not contain nodes that are too large to test conditional independencies. Two of the best known algorithms are lexicographic search and maximum cardinality search, and their complexities are O(|V||E|) and O(|V| + |E|), respectively (Berry et al., 2004). Thus, in our algorithms, the conditional independence tests dominate the algorithmic complexity. The complexity of Algorithm 12 is O(H m² 2^m), as claimed in (Xie, Zheng, and Zhao, 2006, Section 6), where H is the number of hyperedges (usually H ≪ |V|) and m = max_h |C_h|, where |C_h| denotes the number of variables in C_h (m is usually much smaller than |V|).

3.5.4 Evaluation

In this subsection, we evaluate the performance of our algorithms in various setups using simulated/synthetic data sets. We first compare the performance of our algorithm with the PC-like learning algorithm (Sonntag and Peña, 2012) by running them on randomly generated MVR chain graphs. (A brief description of the PC-like algorithm is provided at the beginning of Section 3.6.) We then compare our method with the PC-like algorithm on different discrete Bayesian networks such as ASIA, INSURANCE, ALARM, and HAILFINDER that have been widely used in evaluating the performance of structural learning algorithms. Empirical simulations show that our algorithm achieves competitive results with the PC-like learning algorithm; in particular, in the Gaussian case the decomposition-based algorithm outperforms (except in running time) the PC-like algorithm. Algorithms 12, 13, and the PC-like algorithm have been implemented in the R language. All the results reported here are based on our R implementation (Javidian and Valtorta, 2019b).

Performance Evaluation on Random MVR Chain Graphs (Gaussian case)

To investigate the performance of the decomposition-based learning method, we use the same approach that (Ma, Xie, and Geng, 2008) used in evaluating the performance of the LCD algorithm on LWF chain graphs. We run our algorithms and the PC-like algorithm on randomly generated MVR chain graphs and then we compare the results and report summary error measures in all cases.

Data Generation Procedure. First we explain the way in which the random MVR chain graphs and random samples are generated. Given a vertex set V, let p = |V| and let N denote the average degree of edges (including bidirected, pointing out, and pointing in) for each vertex. We generate a random MVR chain graph on V as follows:

• Randomly choose one element, say k, of the vector c = (0.1, 0.2, 0.3, 0.4, 0.5).⁴

• Use the randDAG function from the pcalg R package to generate an unweighted random Erdos-Renyi graph, which is a DAG with p + (k × p) nodes and N expected neighbours per node.

4In the case of p = 40, 50 we use c = (0.1, 0.2).

• Use the AG function from the ggm R package and marginalize out k × p nodes to obtain a random MVR chain graph with p nodes and N expected neighbours per node. If the obtained graph is not an MVR chain graph, repeat this procedure until an MVR CG is obtained.

The rnorm.cg function from the lcd R package was used to generate the desired number of normal random samples from the canonical DAG (Richardson and Spirtes, 2002) corresponding to the MVR chain graph obtained in the first step. Notice that faithfulness is not necessarily guaranteed by the current sampling procedure (Ma, Xie, and Geng, 2008).

Experimental Results for Random MVR Chain Graphs (Gaussian case). We evaluate the performance of the decomposition-based and PC-like algorithms in terms of five measurements: (a) the true positive rate (TPR)⁵, (b) the false positive rate (FPR)⁶, (c) accuracy (ACC) for the skeleton, (d) the structural Hamming distance (SHD)⁷, and (e) run-time for the pattern recovery algorithms. In short, TPR = TP / Pos is the ratio of the number of correctly identified edges (true positives, TP) over the total number of edges (Pos, the number of real positive cases in the data); FPR = FP / Neg is the ratio of the number of incorrectly identified edges (false positives, FP) over the total number of gaps (Neg, the number of real negative cases in the data); ACC = (TP + TN) / (Pos + Neg), where TN is the number of true negatives; and SHD is the number of legitimate operations needed to change the current pattern to the true one, where legitimate operations are: (a) add or delete an edge and (b) insert, delete or reverse an edge orientation. In principle, a large TPR and ACC and a small FPR and SHD indicate good performance. In our simulation, we vary three parameters, p (the number of vertices), n (sample size), and N (expected number of adjacent vertices), as follows:

• p ∈ {10, 20, 30, 40, 50},

⁵Also known as sensitivity, recall, and hit rate.
⁶Also known as fall-out.
⁷This is the metric described in (Tsamardinos et al., 2006) to compare the structure of the learned and the original graphs.

• n ∈ {300, 1000, 3000, 10000}, and

• N ∈ {2, 3, 5, 8, 10}.

For each (p, N) combination, we first generate 25 random MVR chain graphs. We then generate a random Gaussian distribution based on each corresponding canonical DAG, draw an independent and identically distributed (i.i.d.) sample of size n from this distribution for each possible n, and finally remove those columns (if any exist) that correspond to the hidden variables. For each sample, three different significance levels (α = 0.05/0.01/0.005) are used to perform the hypothesis tests. For the decomposition-based algorithm we consider two different versions: the first version uses Algorithm 12 and the three rules in Algorithm 13, while the second version uses both Algorithms 12 and 13. Since the learned graph of the first version may contain some undirected edges, we call it the essential recovery algorithm. However, removing all directed and bidirected edges from the learned graph results in a chordal graph (Sonntag and Peña, 2012). Furthermore, the learned graph has exactly the (unique) minimum set of bidirected edges for its Markov equivalence class (Sonntag and Peña, 2012). The second version of the decomposition-based algorithm returns an MVR chain graph that has exactly the minimum set of bidirected edges for its equivalence class. A similar approach is used for the PC-like algorithm. We then compare the results to assess the performance of the decomposition-based algorithm against the PC-like algorithm. The entire set of plots of the error measures and running times can be seen in the supplementary document (Javidian and Valtorta, 2019b).
From the plots, we infer that: (a) both algorithms yield better results on sparse graphs (N = 2, 3) than on dense graphs (N = 5, 8, 10), see for example Figures 3.22 and 3.23; (b) for both algorithms, typically the TPR and ACC increase with sample size, see for example Figure 3.22; (c) for both algorithms, typically the SHD decreases with sample size for sparse graphs (N = 2, 3); for N = 5 the SHD decreases with sample size for the decomposition-based algorithm while the SHD has no clear dependence on the sample size for the PC-like algorithm in this case; typically, for the PC-like algorithm the SHD increases with sample size for dense graphs (N = 8, 10) while the SHD has no clear dependence on the sample size for the decomposition-based algorithm in these cases, see for example Figure 3.23; (d) a large significance level (α = 0.05) typically yields large TPR, FPR, and SHD, see for example Figures 3.22 and 3.23; (e) in almost all cases, the performance of the decomposition-based algorithm on all error measures, i.e., TPR, FPR, ACC, and SHD, is better than the performance of the PC-like algorithm, see for example Figures 3.22 and 3.23; (f) in most cases, error measures based on α = 0.01 and α = 0.005 are very close, see for example Figures 3.22 and 3.23. Generally, our empirical results suggest that in order to obtain a better performance, we can choose a small value (say α = 0.005 or 0.01) for the significance level of individual tests along with a large sample (say n = 3000 or 10000). However, the optimal value for a desired overall error rate may depend on the sample size, the significance level, and the sparsity of the underlying graph.
Considering average running times vs. sample sizes, it can be seen (see for example Figure 3.24) that: (a) the average run time increases with sample size; (b) the average run times based on α = 0.01 and α = 0.005 are very close and in all cases are better than α = 0.05, while choosing α = 0.005 yields a consistently (albeit slightly) lower average run time across all the settings in the current simulation; (c) generally, the average run time for the PC-like algorithm is better than that for the decomposition-based algorithm. One possible justification is related to the details of the implementation: the PC algorithm implementation in the pcalg R package is very well optimized, while we have not concentrated on optimizing our implementation of the LCD algorithm; therefore the comparison on run time may be unfair to the new algorithm. For future work, one may consider both optimization of the LCD implementation and instrumentation of the code to allow counting characteristic operations, thereby reducing the dependence of the run-time comparison on program optimization. The simulations were run on an Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz. An R language package that implements our algorithms is available in

Figure 3.22: Error measures of the decomposition-based and PC-like algorithms for randomly generated Gaussian chain graph models: average over 25 repetitions with 30 variables. The four rows correspond to N = 2 and 8. The three columns give three error measures: TPR, FPR and ACC in each setting, respectively. In each plot, the solid (blue)/dashed (green)/dotted (red) lines correspond to significance levels α = 0.05/0.01/0.005.

Figure 3.23: Error measure SHD of the decomposition-based and PC-like algorithms for randomly generated Gaussian chain graph models: average over 25 repetitions with 30 variables. The first row corresponds to N = 2, the second row to N = 5, and the third row to N = 8. The first two columns correspond to the essential recovery, while the last two columns correspond to the minimum bidirected recovery, respectively. In each plot, the solid (blue)/dashed (green)/dotted (red) lines correspond to significance levels α = 0.05/0.01/0.005.

Figure 3.24: Running times of the decomposition-based and PC-like algorithms for randomly generated Gaussian chain graph models: average over 25 repetitions with 30 variables and N = 2. The first two columns correspond to the essential recovery algorithm, while the last two columns correspond to the minimum bidirected recovery, respectively. In each plot, the solid (blue)/dashed (green)/dotted (red) lines correspond to significance levels α = 0.05/0.01/0.005.

It is worth noting that since our implementation of the decomposition-based algorithms is based on the LCD R package, the normal random samples generated from a given MVR chain graph are not necessarily faithful to it. So, one can expect a better performance if we only consider faithful probability distributions in the experiments. Also, the LCD R package uses the asymptotic χ² approximation for the G² test (Ma, Xie, and Geng, 2008). Again, one can expect better results if we replace the asymptotic test used in the LCD R package with an exact test. However, there is a trade-off between accuracy and computational time (Ma, Xie, and Geng, 2008).
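To make the distinction concrete, the following base-R sketch shows the asymptotic G² (likelihood-ratio) test of marginal independence between two discrete variables; the conditional version used in practice computes the same statistic within each configuration of the conditioning set and sums the statistics and degrees of freedom, and an exact alternative would replace the χ² tail probability with, e.g., a permutation p-value. The function name and interface below are ours, not part of the LCD package.

# A minimal sketch of the asymptotic G^2 test of marginal independence for
# discrete data; x and y are factors or vectors of equal length.
g2_test <- function(x, y) {
  tab <- table(x, y)                                  # observed counts
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  nonzero <- tab > 0                                  # 0 * log(0) treated as 0
  G2 <- 2 * sum(tab[nonzero] * log(tab[nonzero] / expected[nonzero]))
  df <- (nrow(tab) - 1) * (ncol(tab) - 1)             # degrees of freedom
  p.value <- pchisq(G2, df = df, lower.tail = FALSE)  # asymptotic chi-square tail
  list(statistic = G2, df = df, p.value = p.value)
}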

Performance on Discrete Bayesian Networks

Bayesian networks are special cases of MVR chain graphs. It is of interest to see whether the decomposition-based algorithms still work well when the data are actually generated from a Bayesian network. For this purpose, in this subsection, we perform simulation studies for four well-known Bayesian networks from the Bayesian Network Repository:

• ASIA (Lauritzen and Spiegelhalter, 1988): with 8 nodes, 8 edges, and 18 parameters, it describes the diagnosis of a patient at a chest clinic who may have just come back from a trip to Asia and may be showing dyspnea. Standard learning algorithms are not able to recover the true structure of the network because of the presence of a functional node (either, representing logical or); see the documentation of the R package bnlearn.

Table 3.4: Results for discrete samples from the ASIA network. The three rows for each method correspond to significance levels α = 0.05/0.01/0.005, respectively.

Method                                            TPR    FPR    ACC    SHD
Decomposition-Based essential recovery algorithm  0.625  0.2    0.75   9
                                                  0.625  0.2    0.75   9
                                                  0.625  0.2    0.75   9
PC-Like essential recovery algorithm              0.625  0      0.893  6
                                                  0.625  0      0.893  6
                                                  0.625  0      0.893  6
Decomposition-Based Algorithm                     0.625  0.2    0.75   8
with Minimum bidirected Edges                     0.625  0.2    0.75   7
                                                  0.625  0.2    0.75   8
PC-Like Algorithm                                 0.625  0      0.893  4
with Minimum bidirected Edges                     0.625  0      0.893  4
                                                  0.625  0      0.893  4

• INSURANCE (Binder et al., 1997): with 27 nodes, 52 edges, and 984 parameters, it evaluates car insurance risks.


Table 3.5: Results for discrete samples from the INSURANCE network. The three rows for each method correspond to significance levels α = 0.05/0.01/0.005, respectively.

Method                                            TPR    FPR     ACC    SHD
Decomposition-Based essential recovery algorithm  0.635  0.0167  0.932  31
                                                  0.635  0.020   0.926  32
                                                  0.654  0.0134  0.937  28
PC-Like essential recovery algorithm              0.558  0       0.934  37
                                                  0.519  0       0.929  37
                                                  0.519  0       0.929  37
Decomposition-Based Algorithm                     0.635  0.0167  0.932  30
with Minimum bidirected Edges                     0.635  0.020   0.926  32
                                                  0.654  0.0134  0.937  27
PC-Like Algorithm                                 0.558  0       0.934  27
with Minimum bidirected Edges                     0.519  0       0.929  29
                                                  0.519  0       0.929  29

• ALARM (Beinlich et al., 1989): with 37 nodes, 46 edges and 509 parameters, it was designed by medical experts to provide an alarm message system for intensive care unit patients based on the outputs of a number of vital-sign monitoring devices.

Table 3.6: Results for discrete samples from the ALARM network. The three rows for each method correspond to significance levels α = 0.05/0.01/0.005, respectively.

Method                                            TPR    FPR     ACC    SHD
Decomposition-Based essential recovery algorithm  0.783  0.0194  0.967  34
                                                  0.783  0.0161  0.967  32
                                                  0.761  0.021   0.964  36
PC-Like essential recovery algorithm              0.457  0       0.962  38
                                                  0.435  0       0.961  38
                                                  0.413  0       0.959  41
Decomposition-Based Algorithm                     0.783  0.0194  0.967  30
with Minimum bidirected Edges                     0.783  0.0161  0.967  28
                                                  0.761  0.021   0.964  35
PC-Like Algorithm                                 0.457  0       0.962  33
with Minimum bidirected Edges                     0.435  0       0.961  33
                                                  0.413  0       0.959  36

• HAILFINDER (Abramson et al., 1996): with 56 nodes, 66 edges, and 2656 parameters, it was designed to forecast severe summer hail in northeastern Colorado.

Table 3.7: Results for discrete samples from the HAILFINDER network. The three rows for each method correspond to significance levels α = 0.05/0.01/0.005, respectively.

Method                                            TPR    FPR     ACC    SHD
Decomposition-Based essential recovery algorithm  0.758  0.003   0.986  26
                                                  0.742  0.002   0.987  24
                                                  0.757  0.002   0.988  22
PC-Like essential recovery algorithm              0.457  0       0.962  38
                                                  0.515  0.0007  0.979  40
                                                  0.515  0.0007  0.979  40
Decomposition-Based Algorithm                     0.758  0.003   0.986  42
with Minimum bidirected Edges                     0.742  0.002   0.987  41
                                                  0.757  0.002   0.988  24
PC-Like Algorithm                                 0.457  0       0.962  38
with Minimum bidirected Edges                     0.515  0.0007  0.979  38
                                                  0.515  0.0007  0.979  39

We compare the performance of our algorithms against the PC-like algorithm for these Bayesian networks for three different significance levels (α = 0.05/0.01/0.005). The results of all learning methods are summarized in Tables 3.4, 3.5, 3.6, and 3.7. For the decomposition-based methods, all three error measures TPR, FPR, and SHD are similar to those of the PC-like algorithms, but the results indicate that the decomposition-based method outperforms the PC-like algorithms as the size of the Bayesian network becomes larger, especially in terms of TPR and SHD.
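For reference, discrete samples from a repository network can be generated with the bnlearn package as in the following sketch; this is not our experiment code, and the file name below is a placeholder for a local copy of the network in BIF format.

# A minimal sketch of drawing discrete samples from a repository network.
library(bnlearn)

asia.fit <- read.bif("asia.bif")      # load the fitted ASIA network (placeholder path)
sim.data <- rbn(asia.fit, n = 10000)  # draw 10,000 i.i.d. discrete samples
head(sim.data)                        # one column per node of the network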

3.6 Discussion and Conclusion

In this chapter, we presented a computationally feasible algorithm for learning the structure of MVR chain graphs via decomposition. We compared the performance of our algorithm with that of the PC-like algorithm proposed by (Sonntag and Peña, 2012), in the Gaussian and discrete cases. The PC-like algorithm is a constraint-based algorithm that learns the structure of the underlying MVR chain graph in four steps: (a) determining the skeleton: the resulting undirected graph in this phase contains an undirected edge u − v iff there is no set S ⊆ V \ {u, v} such that u ⊥⊥ v | S; (b) determining the v-structures (unshielded colliders); (c) orienting some of the undirected/directed edges into directed/bidirected edges according to a set of rules applied iteratively; (d) transforming the resulting graph in the previous step into an MVR CG. The essential recovery algorithm obtained after step (c) contains all directed and bidirected edges that are present in every MVR CG of the same Markov equivalence class. The decomposition-based algorithm is also a constraint-based algorithm that is based on a divide-and-conquer approach and contains four steps: (a) determining the skeleton by a divide-and-conquer approach; (b) determining the v-structures (unshielded colliders) with localized search for m-separators; continuing with steps (c) and (d) exactly as in the PC-like algorithm. The correctness of both algorithms relies on the assumption that the probability distribution p is faithful to some MVR CG. As for the PC-like algorithm, unless the probability distribution p of the data is faithful to some MVR CG, the learned CG cannot be ensured to factorize p properly. Empirical simulations in the Gaussian case show that both algorithms yield good results when the underlying graph is sparse. The decomposition-based algorithm achieves competitive results with the PC-like learning algorithm in both Gaussian and discrete cases. In fact, the decomposition-based method usually outperforms the PC-like algorithm in all four error measures, i.e., TPR, FPR, ACC, and SHD. Such simulation results confirm that our method is reliable both when latent variables are present (and the underlying graph is an MVR CG) and when there are no such variables (and the underlying graph is a DAG). The algorithm works reliably when latent variables are present and only fails when selection bias variables are present. Informally, our algorithm allows relaxing the causal sufficiency assumption, because only selection bias needs to be represented explicitly. Since our implementation of the decomposition-based algorithm is based on the LCD R package, with a fixed number of samples, one can expect a better performance if we replace the asymptotic test used in the LCD R package with an exact test. However, there is a trade-off between accuracy and computational time.

Also, one can expect better results if we only consider faithful probability distributions in the experiments. The natural continuation of the work presented here would be to develop a learning algorithm with weaker assumptions than the one presented. This could, for example, be a learning algorithm that only assumes that the probability distribution satisfies the composition property. It should be mentioned that (Peña, Sonntag, and Nielsen, 2014) developed an algorithm for learning LWF CGs under the composition property. However, (Peña, 2014a) proved that the same technique cannot be used for MVR chain graphs.

Chapter 4

AMP Chain Graphs

This chapter deals with chain graphs under the alternative Andersson-Madigan-Perlman (AMP) interpretation (Andersson, Madigan, and Perlman, 1996, 2001). AMP CGs are useful when we have a set of variables whose internal relations have no causal ordering, so that these relations should be modelled as a Markov network, together with a second set of variables that can be seen as causes for some of the variables in the first set. The internal structure of the first set of variables can then be modelled as a Markov network, creating a chain component in an AMP CG, and the causes as parents of some of the variables in the chain component. Note that for AMP CGs the parents only affect the direct children in the chain component, not all the nodes in the chain component as in the case of LWF CGs. An example in medicine (Sonntag and Peña, 2015b) where such a model might be appropriate is when we are modelling pain levels on different areas of the body of a patient. The pain levels can then be seen as correlated "geographically" over the body, and hence be modelled as a Markov network. Certain other factors do, however, exist that alter the pain levels locally at some of these areas, such as the type of body part the area is located on or whether local anaesthetic has been administered in that area, and so on. These outside factors can then be modelled as parents affecting the pain levels locally. AMP chain graphs have been widely studied in different areas, from applications in biology (Sonntag and Peña, 2015b) to more advanced theoretical investigations (Richardson, 1998; Levitz, Perlman, and Madigan, 2001; Roverato, 2005; Roverato and Rocca, 2006; Drton, 2009; Studený, Roverato, and Štěpánová, 2009; Peña, 2014b, 2015; Sonntag and Peña, 2015b; Peña, 2016b; Peña and Gómez-Olmedo, 2016; Peña, 2018b,a).

In this chapter, we address the problem of finding a minimal separator in an Andersson-Madigan-Perlman chain graph (AMP CG), namely, finding a set Z of nodes that separates a given non-adjacent pair of nodes such that no proper subset of Z separates that pair. We analyze several versions of this problem and offer polynomial-time algorithms for each. These include finding a minimal separator from a restricted set of nodes, finding a minimal separator for two given disjoint sets, and testing whether a given separator is minimal. We apply these results to extend the decomposition approach for learning Bayesian networks (BNs) proposed by (Xie, Zheng, and Zhao, 2006) to learn AMP CGs, which include BNs as a special case, under the faithfulness assumption. The advantages of this decomposition approach hold in the more general setting: reduced complexity and increased power of computational independence tests. We show that the PC-like algorithm is order-dependent, in the sense that the output can depend on the order in which the variables are given. We propose two modifications of the PC-like algorithm that remove part or all of this order-dependence. Simulations under a variety of settings demonstrate the competitive performance of our decomposition-based method in comparison with the (modified version of the) PC-like algorithm. In fact, the decomposition-based algorithm usually outperforms the PC-like algorithm. We empirically show that both algorithms work very well when the underlying graph is sparse.

4.1 Basic Definitions and Concepts

In this section, we consider graphs containing both directed (→) and undirected (−) edges and largely use the terminology of (Andersson, Madigan, and Perlman, 2001), where the reader can also find further details. Below we briefly list some of the central concepts used in this chapter.

If A ⊆ V is a subset of the vertex set in a graph G = (V, E), the induced subgraph G_A = (A, E_A) is the graph in which the edge set E_A = E ∩ (A × A) is obtained from G by keeping edges with both endpoints in A.

If there is an arrow from a pointing towards b, a is said to be a parent of b. The set of parents of b is denoted as pa(b). If there is an undirected edge between a and b, a and b are said to be adjacent or neighbors. The set of neighbors of a vertex a is denoted as ne(a). The expressions pa(A) and ne(A) denote the collection of parents and neighbors of vertices in A that are not themselves elements of A. The boundary bd(A) of a subset A of vertices is the set of vertices in V \ A that are parents or neighbors of vertices in A. The closure of A is cl(A) = bd(A) ∪ A.

A path of length n from a to b is a sequence a = a_0, ..., a_n = b of distinct vertices such that (a_i, a_{i+1}) ∈ E for all i = 1, ..., n. A chain of length n from a to b is a sequence a = a_0, ..., a_n = b of distinct vertices such that (a_i, a_{i+1}) ∈ E, or (a_{i+1}, a_i) ∈ E, or {a_i, a_{i+1}} ∈ E, for all i = 1, ..., n. A vertex α is said to be an ancestor of a vertex β if there is a directed path α → ··· → β from α to β. We define the smallest ancestral set containing A as An(A) := an(A) ∪ A. A vertex α is said to be anterior to a vertex β if there is a chain µ from α to β on which every edge is either of the form γ − δ, or γ → δ with δ between γ and β, or α = β; that is, there are no edges γ ← δ pointing toward α. We apply this definition to sets: ant(X) = {α | α is an anterior of β for some β ∈ X}.

A partially directed cycle (or semi-directed cycle) in a graph G is a sequence of n distinct vertices v_1, v_2, ..., v_n (n ≥ 3), with v_{n+1} ≡ v_1, such that (a) for all i (1 ≤ i ≤ n) either v_i − v_{i+1} or v_i → v_{i+1}, and (b) there exists a j (1 ≤ j ≤ n) such that v_j → v_{j+1}. An AMP chain graph is a graph in which there are no partially directed cycles. The chain components T of a chain graph are the connected components of the undirected graph obtained by removing all directed edges from the chain graph. We define the smallest coherent set containing A as Co(A) := ∪{τ ∈ T | τ ∩ A ≠ ∅}. Let G̃ denote the graph obtained by deleting all directed edges of G; for A ⊆ V the extended subgraph G[A] is defined by G[A] := G_{An(A)} ∪ G̃_{Co(An(A))}.

Figure 4.1: (a) Triplexes and (b) the corresponding augmented triplex; (c) the four configurations that define the bi-flag; (d) the corresponding augmented bi-flag. The "?" indicates that either X − Y ∈ G, X → Y ∈ G, Y → X ∈ G, or X and Y are not adjacent in G.

A triple of vertices {X, Y, Z} is said to form a triplex in a CG G if the induced subgraph G_{X∪Y∪Z} is either X → Y − Z, X → Y ← Z, or X − Y ← Z. A triplex is augmented by adding the X − Z edge. A set of four vertices {X, A, B, Y} is said to form a bi-flag if the edges X → A, Y → B, and A − B are present in the induced subgraph over {X, A, B, Y}. A bi-flag is augmented by adding the edge X − Y. A minimal complex (or simply a complex) in a chain graph is an induced subgraph of the form a → v_1 − ··· − v_r ← b. The augmented CG G^a is the undirected graph formed by augmenting all triplexes and bi-flags in G and replacing all directed edges with undirected edges (see Fig. 4.1). The skeleton (underlying graph) of a CG G is obtained from G by changing all directed edges of G into undirected edges. Vertex Y is an unshielded collider (or V-structure) in a DAG G if G contains the induced subgraph U → Y ← V.

Given an undirected graph G = (V, E), a subset S ⊆ V that does not contain a or b is said to be an (a, b)-separator if all paths from a to b intersect S. A set S of nodes that separates a given pair of nodes such that no proper subset of S separates that pair is called a minimal separator. Note that removing an (a, b)-separator disconnects a graph into two connected components, one containing a and another containing b. Conversely, if a set S disconnects a graph into a connected component including a and another connected component including b, then S is an (a, b)-separator. Two disjoint vertex subsets A and B of V are adjacent if there is at least one pair of adjacent vertices u ∈ A and v ∈ B. Let A and B be two disjoint non-adjacent subsets of V. Similarly, we define an (A, B)-separator to be any subset of V \ (A ∪ B) whose removal separates A and B into distinct connected components. A minimal (A, B)-separator does not contain any other (A, B)-separator.
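As a small illustration of the anterior set ant(X) defined above, the following base-R sketch stores a chain graph as two edge lists and collects all vertices that can reach X along chains whose edges are undirected or directed toward X. The representation and the function names are ours, not part of any package.

# A minimal sketch of the anterior-set computation.
cg <- list(
  directed   = rbind(c("a", "b"), c("c", "b")),  # rows are edges u -> v
  undirected = rbind(c("b", "d"), c("d", "e"))   # rows are edges u - v
)

anterior <- function(cg, X) {
  ant <- unique(X)
  frontier <- ant
  while (length(frontier) > 0) {
    v <- frontier[1]; frontier <- frontier[-1]
    # vertices with an edge u -> v or u - v are one step "more anterior"
    preds <- c(cg$directed[cg$directed[, 2] == v, 1],
               cg$undirected[cg$undirected[, 1] == v, 2],
               cg$undirected[cg$undirected[, 2] == v, 1])
    new <- setdiff(preds, ant)
    ant <- c(ant, new)
    frontier <- c(frontier, new)
  }
  sort(ant)
}

anterior(cg, "e")   # for this toy graph: "a" "b" "c" "d" "e"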

4.2 On the Properties of AMP Chain Graphs

In this section an alternative Markov property (AMP) for CGs is introduced that is a natural extension of the moralization criterion, called augmentation, to AMP chain graphs. We then introduce an extension of Pearl’s d-separation criterion, called p-separation, which is equivalent to the global property resulting from the augmentation for arbitrary distributions.

Definition 4.1. (Global Markov property for AMP chain graphs) For any triple (A, B, S) of disjoint subsets of V such that S separates A from B in (G[A ∪ B ∪ S])^a, the augmented graph of the extended subgraph of A ∪ B ∪ S, we have A ⊥⊥ B | S (or ⟨A, B | S⟩), i.e., A is independent of B given S.

An equivalent pathwise separation criterion that identifies all valid conditional independencies under the AMP Markov property was introduced in (Levitz, Perlman, and Madigan, 2001):

Definition 4.2. (The pathwise p-separation criterion for AMP chain graphs) A node B in a chain ρ in an AMP CG G is called a triplex node in ρ if A → B ← C, A → B − C, or A − B ← C is a subchain of ρ. Moreover, ρ is said to be Z-open with Z ⊆ V when

• every triplex node in ρ is in An(Z), and

• every non-triplex node B in ρ is outside Z, unless A − B − C is a subchain of ρ and pa_G(B) \ Z ≠ ∅.

Figure 4.2: (a) The AMP CG G, (b) An(X ∪ Y ∪ A), (c) the undirected edges in Co(An(X ∪ Y ∪ A)), (d) G[X ∪ Y ∪ A], and (e) (G[X ∪ Y ∪ A])^a.

Let X, Y, and Z (Z may be empty) denote three disjoint subsets of V. When there is no Z-open chain in an AMP CG G between a node in X and a node in Y, we say that X is separated from Y given Z in G and denote it as X ⊥⊥ Y | Z.

Theorem 4.1 in (Levitz, Perlman, and Madigan, 2001) establishes the equivalence of the p-separation criterion and the augmentation criterion occurring in the AMP global Markov property for CGs.

Example 15. Consider the AMP CG G in Fig. 4.2(a). The global Markov property of AMP chain graphs implies that X ⊥⊥ Y | A (see Fig. 4.2). There is no A-open chain in the AMP CG G between X and Y because the only chain between X and Y, i.e., X → A − B ← Y, is blocked at B (B is a triplex node in the chain and B ∉ An(A)).

We say that two AMP CGs G and H are Markov equivalent or that they are in the same Markov equivalence class if they induce the same conditional independence restrictions. Two AMP chain graphs G and H are Markov equivalent if and only if they have the same skeletons and the same triplexes (Andersson, Madigan, and Perlman, 2001). Two LWF chain graphs G and H are Markov equivalent if and only if they have the same skeletons and the same minimal complexes (Frydenberg, 1990). Two DAGs G and H are Markov equivalent if and only if they have the same skeletons and the same unshielded colliders (Pearl, 1988). The condition for AMP Markov equivalence of CGs more closely resembles that for DAG Markov equivalence than does the condition for LWF Markov equivalence of CGs, in the sense that triplexes involve only three vertices, while complexes can involve arbitrarily many vertices.

We say that AMP chain graphs G and H belong to the same strong Markov equivalence class iff G and H are Markov equivalent and contain the same flags. An AMP CG G* is said to be the AMP essential graph of its Markov equivalence class iff for every directed edge A → B that exists in G* there exists no AMP CG H such that G* and H are Markov equivalent and A ← B is in H. An AMP CG G* is said to be the largest deflagged graph of its Markov equivalence class iff there exists no other AMP CG H such that G* and H are Markov equivalent and either H contains fewer flags than G* or G* and H belong to the same strong Markov equivalence class but H contains more undirected edges. Any largest deflagged graph or AMP essential graph is an AMP CG, and both have been proven to be unique for the Markov equivalence class they represent (Roverato and Rocca, 2006; Andersson and Perlman, 2006).

4.3 Finding Minimal Separators in AMP Chain Graphs

In this section we propose and solve an optimization problem related to the separation in AMP chain graphs. The basic problem may be formulated as follows: given a pair of non-adjacent nodes, x and y, in an AMP chain graph, G, find a minimal set of nodes that separates x and y. We analyze several versions of this problem and offer polynomial time algorithms for each. These include the following problems:

Problem 13. (test for minimal separation) Given two non-adjacent nodes X and Y in an AMP chain graph G and a set Z that separates X from Y, test if Z is minimal, i.e., if no proper subset of Z separates X from Y.

Problem 14. (minimal separation) Given two non-adjacent nodes X and Y in an AMP chain graph G, find a minimal separating set between X and Y, namely, find a set Z such that Z, and no proper subset of Z, separates X from Y.

Problem 15. (restricted separation) Given two non-adjacent nodes X and Y in an AMP chain graph G and a set S of nodes not containing X and Y, find a subset Z of S that separates X from Y.

Problem 16. (restricted minimal separation) Given two non-adjacent nodes X and Y in an AMP chain graph G and a set S of nodes not containing X and Y, find a subset Z of S which is minimal and separates X from Y.

Problem 17. (minimal separation of two disjoint non-adjacent sets) Given two disjoint non-adjacent sets X and Y in an AMP chain graph G, find a minimal separating set between X and Y, namely, find a set Z such that Z, and no proper subset of Z, separates X from Y.

Problem 18. (enumeration of all minimal separators) Given two non-adjacent nodes (or disjoint subsets) X and Y in an AMP chain graph G, enumerate all minimal separating sets between X and Y.

We prove that it is possible to transform our problem into a separation problem, where the undirected graph in which we have to look for the minimal set separating X from Y depends only on X and Y. For each of the above-mentioned problems, we propose and analyze an algorithm that, taking into account the previous results, solves it.

4.3.1 Main Theorem

In this subsection we prove that it is possible to transform our problem into a separation problem, where the undirected graph in which we have to look for the minimal set separating X from Y depends only on X and Y. Later, in the next subsections, we will apply this result to developing an efficient algorithm that solves our problems.

The next proposition shows that if we want to test a separation relationship between two disjoint sets of nodes X and Y in an AMP chain graph, where the separating set is included in the anterior set of X ∪ Y, then we can test this relationship in a smaller AMP chain graph, whose set of nodes is formed only by the anteriors of X and Y.

Proposition 4.3. Given an AMP chain graph G = (V, E), consider that X, Y, and Z are three disjoint subsets of V, Z ⊆ ant(X ∪ Y), and H = G_{ant(X∪Y)} is the subgraph of G induced by ant(X ∪ Y). Then ⟨X, Y | Z⟩_G ⇔ ⟨X, Y | Z⟩_H.

Proof. (⇒) The necessary condition is obvious, because a separator in a graph is also a separator in all of its subgraphs.
(⇐) Since bd(ant(X ∪ Y)) = ∅, we have Co(An(ant(X ∪ Y))) = ant(X ∪ Y). Let ⟨X, Y | Z⟩_H and Z ⊆ ant(X ∪ Y); then Co(An(X ∪ Y ∪ Z)) ⊆ ant(X ∪ Y). Suppose that ⟨X, Y | Z⟩_G does not hold. This means that X is not separated from Y given Z in (G[X ∪ Y ∪ Z])^a, which is a subgraph of (G[ant(X ∪ Y)])^a. In other words, there is a chain C between X and Y in H^a = (G[ant(X ∪ Y)])^a = (G_{ant(X∪Y)})^a that bypasses Z. Once again using Z ⊆ ant(X ∪ Y), we obtain that X and Y are not separated by Z in H, in contradiction to the assumption ⟨X, Y | Z⟩_H. Therefore, it has to be ⟨X, Y | Z⟩_G. □

The following proposition establishes the basic result necessary to solve our optimization problems.

Proposition 4.4. Given an AMP chain graph G = (V, E), consider that X, Y, and Z are three disjoint subsets of V such that ⟨X, Y | Z⟩ and, for all Z′ ⊊ Z, ⟨X, Y | Z′⟩ does not hold. Then Z ⊆ ant(X ∪ Y).

Proof. Suppose that Z ⊄ ant(X ∪ Y). Define Z′ = Z ∩ ant(X ∪ Y). Then, by assumption, ⟨X, Y | Z′⟩ does not hold. Since Z′ ⊆ ant(X ∪ Y), it is obvious that Co(An(X ∪ Y ∪ Z′)) ⊆ ant(X ∪ Y). So, X and Y are not separated by Z′ in (G[X ∪ Y ∪ Z′])^a; hence there is a chain C between X and Y in (G[X ∪ Y ∪ Z′])^a that bypasses Z′, i.e., the chain C is formed from nodes in ant(X ∪ Y) that are outside of Z. Since Co(An(X ∪ Y ∪ Z′)) ⊆ ant(X ∪ Y), (G[X ∪ Y ∪ Z′])^a is a subgraph of (G[ant(X ∪ Y)])^a. Then the previously found chain C is also a chain in (G[ant(X ∪ Y)])^a that bypasses Z, which means that X and Y are not separated by Z in (G[ant(X ∪ Y)])^a = (G_{ant(X∪Y)})^a. So, X and Y are not p-separated by Z in G_{ant(X∪Y)}. This implies that X and Y are not p-separated by Z in G, in contradiction to the assumption ⟨X, Y | Z⟩. Therefore, it has to be Z ⊆ ant(X ∪ Y). □

The next proposition shows that, by combining the results in propositions 4.3 and 4.4, we can reduce our problems to a simpler one, which involves a smaller graph.

Proposition 4.5. Let G = (V, E) be an AMP chain graph, and let X, Y ⊆ V be two disjoint subsets. Then the problem of finding a minimal separating set for X and Y in G is equivalent to the problem of finding a minimal separating set for X and Y in the induced subgraph G_{ant(X∪Y)}.

Proof. The proof is very similar to the proof of Proposition 3 in (Acid and Campos, 1996; Javidian and Valtorta, 2018a) and Proposition 9 in (Javidian and Valtorta, 2018b). Let H = G_{ant(X∪Y)}, and define the sets S_G = {Z ⊆ V | ⟨X, Y | Z⟩_G} and S_H = {Z ⊆ ant(X ∪ Y) | ⟨X, Y | Z⟩_H}. Then we have to prove that min_{Z∈S_G} |Z| = min_{Z∈S_H} |Z|, and therefore, by Proposition 4.4, the sets of minimal separators are the same. From Proposition 4.3, we deduce that S_H ⊆ S_G, and therefore min_{Z∈S_H} |Z| ≥ min_{Z∈S_G} |Z|.

(⇒) Let T = min(Z ∈ S_G). Then for all T′ ⊊ T we have T′ ∉ S_G, and from Proposition 4.4 we obtain T ⊆ ant(X ∪ Y); now using Proposition 4.3 we get T ∈ S_H. So we have |T| ≥ min_{Z∈S_H} |Z| ≥ min_{Z∈S_G} |Z| = |T|, hence |T| = min_{Z∈S_H} |Z|.

(⇐) Let T = min(Z ∈ S_H). If |T| = min_{Z∈S_H} |Z| > min_{Z∈S_G} |Z| = |Z₀|, then for all Z′₀ ⊊ Z₀ we have Z′₀ ∉ S_G, and therefore, once again using Propositions 4.4 and 4.3, we get Z₀ ∈ S_H, so that |Z₀| ≥ min_{Z∈S_H} |Z| = |T|, which is a contradiction. Thus, |T| = min_{Z∈S_G} |Z|. □

Theorem 4.6. The problem of finding a minimal separating set for X and Y in an AMP chain graph G is equivalent to the problem of finding a minimal separating set for X and Y in the undirected graph (G_{ant(X∪Y)})^a.

Proof. The proof is very similar to the proof of Theorem 1 in (Acid and Campos, 1996; Javidian and Valtorta, 2018a) and Theorem 10 in (Javidian and Valtorta, 2018b). Using the same notation as in Proposition 4.5, let H^a be the augmented graph of H = G_{ant(X∪Y)}, and let S_{H^a} = {Z ⊆ ant(X ∪ Y) | ⟨X, Y | Z⟩_{H^a}}. Let Z be any subset of ant(X ∪ Y). Then, taking into account the characteristics of anterior sets, it is clear that H_{ant(X∪Y∪Z)} = H. Then we have Z ∈ S_H ⇔ ⟨X, Y | Z⟩_H ⇔ ⟨X, Y | Z⟩_{(H_{ant(X∪Y∪Z)})^a} ⇔ ⟨X, Y | Z⟩_{H^a} ⇔ Z ∈ S_{H^a}. Hence, S_H = S_{H^a}.

Figure 4.3: Finding a minimal separator in an AMP chain graph.

Now, using Proposition 4.5, we obtain |T| = min_{Z∈S_G} |Z| ⇔ |T| = min_{Z∈S_{H^a}} |Z|. □

4.3.2 Algorithms for Finding Minimal Separators

In undirected graphs we have efficient methods of testing whether a separation set is minimal, which are based on the following criterion.

Theorem 4.7. Given two nodes X and Y in an undirected graph, a separating set Z between X and Y is minimal if and only if for each node u in Z, there is a path from X to Y which passes through u and does not pass through any other nodes in Z.

Proof. See the proof of Theorem 5 in (Tian, Paz, and Pearl, 1998). □

This theorem leads to Algorithm 14 for Problem 13. The idea is that if Z is minimal, then all nodes in Z can be reached using breadth-first search (BFS) starting from both X and Y without passing through any other nodes in Z.

Algorithm 14: Test for minimal separation (Problem 13)
Input: A set Z that separates two non-adjacent nodes X, Y in the AMP chain graph G.
Output: If Z is minimal then the algorithm returns TRUE; otherwise, it returns FALSE.
1  if Z contains a node that is not in ant(X ∪ Y) then
2    return FALSE;
3  else
4    Construct G_{ant(X∪Y)};
5    Construct (G_{ant(X∪Y)})^a;
6    Starting from X, run BFS. Whenever a node in Z is met, mark it if it is not already marked, and do not continue along that path. When BFS stops;
7    if not all nodes in Z are marked then
8      return FALSE;
9    else
10     Remove all markings. Starting from Y, run BFS. Whenever a node in Z is met, mark it if it is not already marked, and do not continue along that path. When BFS stops;
11     if not all nodes in Z are marked then
12       return FALSE;
13     else
14       return TRUE;
15     end
16   end
17 end
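As an illustration of the BFS marking used in Algorithm 14 (and reused in Algorithm 15), the following base-R sketch tests minimality of a separator Z in the augmented undirected graph (G_{ant(X∪Y)})^a, represented here as a plain adjacency list; the preliminary check that Z ⊆ ant(X ∪ Y) and the construction of the augmented graph are omitted. Function and argument names are ours.

# BFS from 'start' that never expands nodes of Z; returns the nodes of Z it touches.
marked_by_bfs <- function(adj, start, Z) {
  visited <- character(0)
  marked  <- character(0)
  queue   <- start
  while (length(queue) > 0) {
    v <- queue[1]; queue <- queue[-1]
    if (v %in% visited) next
    visited <- c(visited, v)
    for (w in adj[[v]]) {
      if (w %in% Z) {
        marked <- union(marked, w)     # mark separator node, do not continue past it
      } else if (!(w %in% visited)) {
        queue <- c(queue, w)
      }
    }
  }
  marked
}

# Z is minimal iff every node of Z is marked by a BFS from X and by a BFS from Y.
is_minimal_separator <- function(adj, X, Y, Z) {
  all(Z %in% marked_by_bfs(adj, X, Z)) && all(Z %in% marked_by_bfs(adj, Y, Z))
}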

Analysis of Algorithm 14 (Tian, Paz, and Pearl, 1998): Let H = G_{ant(X∪Y)} and let |E^a_H| stand for the number of edges in H^a = (G_{ant(X∪Y)})^a. Steps 4-5 each require O(|E^a_H|) time. Thus, the complexity of Algorithm 14 is O(|E^a_H|).

A variant of Algorithm 14 solves Problem 14 (Algorithm 15 below).

Analysis of Algorithm 15: Each of steps 2-5 requires O(|E^a_H|) time. Thus, the overall complexity of Algorithm 15 is O(|E^a_H|).

Algorithm 15: Minimal separation (Problem 14)
Input: Two non-adjacent nodes X, Y in the AMP chain graph G.
Output: A set Z that is a minimal separator for X, Y.
1  Construct G_{ant(X∪Y)};
2  Construct (G_{ant(X∪Y)})^a;
3  Set Z′ to be ne(X) (or ne(Y)) in (G_{ant(X∪Y)})^a;
   /* Z′ is a separator because, according to the local Markov property of an undirected graph, a vertex is conditionally independent of all other vertices in the graph, given its neighbors (Lauritzen, 1996). */
4  Starting from X, run BFS. Whenever a node in Z′ is met, mark it if it is not already marked, and do not continue along that path. When BFS stops, let Z″ be the set of nodes which are marked. Remove all markings;
5  Starting from Y, run BFS. Whenever a node in Z″ is met, mark it if it is not already marked, and do not continue along that path. When BFS stops, let Z be the set of nodes which are marked;
6  return Z;

Theorem 4.8. Given two nodes X and Y in an AMP chain graph G and a set S of nodes not containing X and Y, there exists some subset of S which separates X and Y if and only if the set S′ = S ∩ ant(X ∪ Y) separates X and Y.

Proof. (⇒) Proof by contradiction. Let S′ = S ∩ ant(X ∪ Y) and suppose ⟨X, Y | S′⟩ does not hold. Since S′ ⊆ ant(X ∪ Y), it is obvious that ant(X ∪ Y ∪ S′) = ant(X ∪ Y). So, X and Y are not separated by S′ in (G_{ant(X∪Y)})^a; hence there is a chain C between X and Y in (G_{ant(X∪Y)})^a that bypasses S′, i.e., the chain C is formed from nodes in ant(X ∪ Y) that are outside of S. Since ant(X ∪ Y) ⊆ ant(X ∪ Y ∪ S″) for all S″ ⊆ S, (G_{ant(X∪Y)})^a is a subgraph of (G_{ant(X∪Y∪S″)})^a. Then the previously found chain C is also a chain in (G_{ant(X∪Y∪S″)})^a that bypasses S″, which means that X and Y are not separated by any S″ ⊆ S in (G_{ant(X∪Y∪S″)})^a, which is a contradiction.

Figure 4.4: Finding restricted separator in an AMP chain graph.

(⇐) It is obvious. □

Therefore, Problem 15 is solved by testing whether S′ = S ∩ ant(X ∪ Y) separates X and Y.

Algorithm 16: Restricted separation (Problem 15)
Input: A set S of nodes not containing X and Y in the AMP chain graph G.
Output: If there is a subset of S that separates X from Y, then the algorithm returns a set Z ⊆ S that separates X from Y; otherwise, it returns FALSE.
1  Construct G_{ant(X∪Y)};
2  Construct (G_{ant(X∪Y)})^a;
3  Set S′ = S ∩ ant(X ∪ Y);
4  Remove S′ from (G_{ant(X∪Y)})^a;
5  Starting from X, run BFS;
6  if Y is met then
7    return FALSE
8  else
9    return Z = S′
10 end

Analysis of Algorithm 16: This requires O(|E^a_H|) time.

According to Theorem 4.8, Problem 16 is solved by first running Algorithm 16 and then, if FALSE is not returned, Algorithm 15 with Z′ = S ∩ ant(X ∪ Y). The time complexity of this algorithm is also O(|E^a_H|).

In order to solve Problem 17, i.e., to find the minimal set separating two disjoint non-adjacent subsets of nodes X and Y (instead of two single nodes) in an AMP chain graph G, first we build the undirected graph (G_{ant(X∪Y)})^a. Next, starting out from this graph, we construct a new undirected graph Aug(G : α_X, α_Y) by adding two artificial (dummy) nodes α_X, α_Y and connecting them to those nodes that are adjacent to some node in X and Y, respectively. So, the separation of X and Y in (G_{ant(X∪Y)})^a is equivalent to the separation of α_X and α_Y in Aug(G : α_X, α_Y). Moreover, the minimal separating set for α_X and α_Y in Aug(G : α_X, α_Y) cannot contain nodes from X ∪ Y. Therefore, in order to find the minimal separating set for X and Y in G, it suffices to find the minimal separating set for α_X and α_Y in Aug(G : α_X, α_Y). So, we have reduced this problem to one of separation for single nodes, which can be solved using Algorithm 15.

Shen and Liang (Shen and Liang, 1997) present an efficient algorithm for enumerating all minimal (X, Y)-separators separating given non-adjacent vertices X and Y in an undirected connected simple graph G = (V, E). This algorithm requires O(n³ R_XY) time, where |V| = n and R_XY is the number of minimal (X, Y)-separators. The algorithm can be generalized for enumerating all minimal (X, Y)-separators that separate non-adjacent vertex sets X, Y ⊆ V, and it requires O(n²(n − n_X − n_Y) R_XY) time. In this case, |X| = n_X, |Y| = n_Y, and R_XY is the number of all minimal (X, Y)-separators. According to Theorem 4.6, using this algorithm on (G_{ant(X∪Y)})^a solves Problem 18.

Remark 4. Since DAGs (directed acyclic graphs) are a subclass of AMP chain graphs, one can use the same technique to enumerate all minimal separators in DAGs.

4.4 Order-Independent Structure Learning of AMP Chain Graphs

In this section, we explain the original PC-like algorithm proposed in (Peña, 2012) briefly, and we show that this version of the PC-like algorithm is order-dependent, in the sense that the output can depend on the order in which the variables are given. We propose a modification of the PC-like algorithm that removes (part or) all of this order-dependence.

4.4.1 Order-Dependent PC-like Algorithm

The PC-like algorithm for learning AMP CGs under the faithfulness assumption proposed in (Peña, 2012) is formally described in Algorithm 17. In applications, we do not have perfect conditional independence information. Instead, we assume that we have an i.i.d. sample of size n of V = (X_1, ..., X_p). In the PC-like algorithm (Peña, 2012) all conditional independence queries are estimated by statistical conditional independence tests at some pre-specified significance level (p-value) α. For example, if the distribution of V is multivariate Gaussian, one can test for zero partial correlation; see, e.g., (Kalisch and Bühlmann, 2007). For this purpose, we used the gaussCItest() function from the R package pcalg throughout this chapter.
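For concreteness, a single conditional independence query of the kind used on line 5 of Algorithm 17 below can be issued as in the following sketch; the wrapper function and its defaults are ours, and only gaussCItest() and its sufficient-statistics interface come from pcalg.

# A minimal sketch of one Gaussian conditional independence query.
library(pcalg)

ci_query <- function(data, x, y, S, alpha = 0.005) {
  suffStat <- list(C = cor(data), n = nrow(data))  # sufficient statistics for gaussCItest
  p.value <- gaussCItest(x, y, S, suffStat)        # test of zero partial correlation
  p.value >= alpha                                 # TRUE: treat x and y as independent given S
}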

Algorithm 17: The order-dependent PC-like algorithm for learning AMP chain graphs (Peña, 2012)
Input: A set V of nodes and a probability distribution p faithful to an unknown AMP CG G, and an ordering order(V) on the variables.
Output: A CG H that is triplex equivalent to G.
1  Let H denote the complete undirected graph over V;
   /* Skeleton recovery */
2  for i ← 0 to |V_H| − 2 do
3    while possible do
4      Select any ordered pair of nodes u and v in H such that u ∈ ad_H(v) and |[ad_H(u) ∪ ad_H(ad_H(u))] \ {u, v}| ≥ i, using order(V);
       /* ad_H(x) := {y ∈ V | x → y, y → x, or x − y} */
5      if there exists S ⊆ ([ad_H(u) ∪ ad_H(ad_H(u))] \ {u, v}) s.t. |S| = i and u ⊥⊥_p v | S (i.e., u is independent of v given S in the probability distribution p) then
6        Set S_uv = S_vu = S;
7        Remove the edge u − v from H;
8      end
9    end
10 end
   /* Orientation phase */
11 while possible do
12   Apply the rules R1-R4 in Figure 4.5 to H.
13 end
14 Replace every edge ( ) in H with ( );

Figure 4.5: Rules R1-R4 (Peña, 2012).

Let order(V) denote an ordering on the variables in V. We now consider the role of order(V) in every step of the algorithm. In the skeleton recovery phase of the PC-like algorithm (Peña, 2012; Peña and Gómez-Olmedo, 2016), the order of variables affects the estimation of the skeleton and the separating sets. In particular, at each level of i, the order of variables determines the order in which pairs of adjacent vertices and subsets S of their adjacency sets are considered (see lines 4 and 5 in Algorithm 17). The skeleton H is updated after each edge removal. Hence, the adjacency sets typically change within one level of i, and this affects which other conditional independencies are checked, since the algorithm only conditions on subsets of the adjacency sets. When we have perfect conditional independence information, all orderings on the variables lead to the same output. In the sample version, however, we typically make mistakes in keeping or removing edges. In such cases, the resulting changes in the adjacency sets can lead to different skeletons, as illustrated in Example 16.

Moreover, different variable orderings can lead to different separating sets in the skeleton recovery phase. When we have perfect conditional independence information, this is not important, because any valid separating set leads to the correct triplex decision in the orientation phase. In the sample version, however, different separating sets in the skeleton recovery phase of the algorithm may yield different decisions about triplexes in the orientation phase. This is illustrated in Example 17. The examples were encountered when testing the PC-like algorithm by generating synthesized samples from the DAGs in Figures 4.6(a) and 4.7(a).

Example 16 (Order-dependent skeleton of the PC-like algorithm). Suppose that the distribution of V = {a, b, c, d, e} is faithful to the DAG in Figure 4.6(a). This DAG encodes the following conditional independencies with minimal separating sets: b ⊥⊥ c | a and a ⊥⊥ e | {b, c, d}.

Suppose that we have an i.i.d. sample of (a, b, c, d, e), and that the following conditional independencies with minimal separating sets are judged to hold at some significance level α: b ⊥⊥ c | a, a ⊥⊥ e | d, a ⊥⊥ b | d, a ⊥⊥ c | d, b ⊥⊥ d | e, and c ⊥⊥ d | e. Thus, the first conditional independence relation is correct, while the rest of them are false. We now apply the skeleton recovery phase of the PC-like algorithm with two different orderings: order1(V) = (d, c, b, a, e) and order2(V) = (d, e, a, c, b). The resulting skeletons are shown in Figures 4.6(b) and 4.6(c), respectively.

Figure 4.6: (a) The DAG G, (b) the skeleton returned by Algorithm 17 with order1(V), (c) the skeleton returned by Algorithm 17 with order2(V).

We see that the skeletons are different, and that both are incorrect as the edges a − b, a − c, b − d, and c − d are missing. The skeleton for order2(V) contains an additional error, as there is an additional edge b − c. We now go through Algorithm 17 to see what happened. We start with a complete undirected graph on V. When i = 0, variables are tested for marginal independence, and the algorithm correctly does not remove any edge. When i = 1, there are six pairs of vertices that are thought to be conditionally independent given a subset of size one. Table 4.1 shows the trace table of Algorithm 17 for i = 1 and order1(V) = (d, c, b, a, e).

Table 4.2 shows the trace table of Algorithm 17 for i = 1 and order2(V) = (d, e, a, c, b).

No conditional independency is found when i = 2.

Example 17 (Order-dependent separating sets and triplexes of the PC-like algorithm). Suppose that the distribution of V = {a, b, c, d, e} is faithful to the DAG in Figure 4.7(a).

Table 4.1: The trace table of Algorithm 17 for i = 1 and order1(V) = (d, c, b, a, e). For simplicity, we define ADJ_H(u) := [ad_H(u) ∪ ad_H(ad_H(u))] \ {u, v}.

Ordered pair (u, v)   ADJ_H(u)     S_uv   Is S_uv ⊆ ADJ_H(u)?   Is u − v removed?
(d, c)                {a, b, e}    {e}    Yes                   Yes
(d, b)                {a, c, e}    {e}    Yes                   Yes
(c, b)                {a, d, e}    {a}    Yes                   Yes
(c, a)                {b, d, e}    {d}    Yes                   Yes
(b, a)                {c, d, e}    {d}    Yes                   Yes
(a, e)                {d}          {d}    Yes                   Yes

Table 4.2: The trace table of Algorithm 17 for i = 1 and order2(V) = (d, e, a, c, b). For simplicity, we define ADJ_H(u) := [ad_H(u) ∪ ad_H(ad_H(u))] \ {u, v}.

Ordered pair (u, v)   ADJ_H(u)     S_uv   Is S_uv ⊆ ADJ_H(u)?   Is u − v removed?
(d, c)                {a, b, e}    {e}    Yes                   Yes
(d, b)                {a, c, e}    {e}    Yes                   Yes
(e, a)                {b, c, d}    {d}    Yes                   Yes
(a, c)                {b, d, e}    {d}    Yes                   Yes
(a, b)                {d, e}       {d}    Yes                   Yes
(c, b)                {d, e}       {a}    No                    No
(b, c)                {c, e}       {a}    No                    No

This DAG encodes the following conditional independencies with minimal separating sets: a ⊥⊥ d | b, a ⊥⊥ e | {b, c}, a ⊥⊥ e | {c, d}, b ⊥⊥ c, b ⊥⊥ e | d, and c ⊥⊥ d.

Suppose that we have an i.i.d. sample of (a, b, c, d, e). Assume that all true conditional independencies are judged to hold except c ⊥⊥ d. Suppose that c ⊥⊥ d | b and c ⊥⊥ d | e are thought to hold. Thus, the first is correct, while the second is false. We now apply the orientation phase of the PC-like algorithm with two different orderings: order1(V) = (d, c, b, a, e) and order3(V) = (c, d, e, a, b). The resulting CGs are shown in Figures 4.7(b) and 4.7(c), respectively. Note that while the separating set for vertices c and d with order1(V) is S_dc = S_cd = {b}, the separating set for them with order3(V) is S_cd = S_dc = {e}.

This illustrates that order-dependent separating sets in the skeleton recovery phase of the sample version of the PC-like algorithm can lead to order-dependent triplexes in the orientation phase of the algorithm.

Figure 4.7: (a) The DAG G, (b) the CG returned by Algorithm 17 with order1(V), (c) the CG returned by Algorithm 17 with order3(V).

4.4.2 Order-Independent (Stable) PC-like Algorithm

We now propose several modifications of the original PC-like algorithm (and hence also of the related algorithms) that remove the order-dependence in the various stages of the algorithm, analogously to what (Colombo and Maathuis, 2014) did for the original PC algorithm in the case of DAGs. For this purpose, we discuss the skeleton and the orientation rules, respectively. We first consider estimation of the skeleton in the adjacency search (skeleton recovery phase) of the PC-like algorithm. The pseudocode for our modification is given in Algorithm 18. The resulting PC-like algorithm in Algorithm 18 is called stable PC-like. The main difference between Algorithms 17 and 18 is given by the for-loop on lines

3-5 in the latter one, which computes and stores the adjacency sets a_H(v_j) of all variables after each new size i of the conditioning sets. These stored adjacency sets a_H(v_j) are used whenever we search for conditioning sets of this given size i. Consequently, an edge deletion on line 10 no longer affects which conditional independencies are checked for other pairs of variables at this level of i.

Algorithm 18: The order-independent (stable) PC-like algorithm for learning AMP CGs
Input: A set V of nodes and a probability distribution p faithful to an unknown AMP CG G, and an ordering order(V) on the variables.
Output: A CG H that is triplex equivalent to G.
1  Let H denote the complete undirected graph over V = {v_1, ..., v_n};
   /* Skeleton recovery */
2  for i ← 0 to |V_H| − 2 do
3    for j ← 1 to |V_H| do
4      Set a_H(v_j) = ad_H(v_j) ∪ ad_H(ad_H(v_j));
       /* ad_H(x) := {y ∈ V | x → y, y → x, or x − y} */
5    end
6    while possible do
7      Select any ordered pair of nodes u and v in H such that u ∈ ad_H(v) and |a_H(u) \ {u, v}| ≥ i, using order(V);
8      if there exists S ⊆ (a_H(u) \ {u, v}) s.t. |S| = i and u ⊥⊥_p v | S (i.e., u is independent of v given S in the probability distribution p) then
9        Set S_uv = S_vu = S;
10       Remove the edge u − v from H;
11     end
12   end
13 end
   /* Orientation phase */
14 while possible do
15   Apply the rules R1-R4 in Figure 4.5 to H.
16 end
17 Replace every edge ( ) in H with ( );
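The following base-R sketch illustrates the level-wise freezing of adjacency sets that makes the skeleton phase order-independent. It is a simplification, not our implementation: for readability the conditioning candidates are taken from the frozen neighborhood of u only, whereas line 4 of Algorithm 18 uses ad_H(u) ∪ ad_H(ad_H(u)), and ci_test() stands for any conditional independence test returning a p-value (e.g., the gaussCItest() wrapper above).

# A minimal sketch of stable skeleton recovery over p variables.
stable_skeleton <- function(p, ci_test, alpha = 0.01) {
  H <- matrix(TRUE, p, p); diag(H) <- FALSE           # complete undirected graph
  for (i in 0:(p - 2)) {
    a <- lapply(1:p, function(v) which(H[v, ]))       # freeze neighborhoods for this level
    for (u in 1:p) for (v in 1:p) {
      if (!H[u, v]) next
      cand <- setdiff(a[[u]], c(u, v))                # conditioning candidates of size i
      if (length(cand) < i) next
      for (S in if (i == 0) list(integer(0)) else combn(cand, i, simplify = FALSE)) {
        if (ci_test(u, v, S) >= alpha) {              # p-value >= alpha: remove the edge
          H[u, v] <- H[v, u] <- FALSE
          break
        }
      }
    }
  }
  H
}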

In other words, at each level of i, Algorithm 18 records which edges should be removed, but for the purpose of the adjacency sets it removes these edges only when it goes to the next value of i. Besides resolving the order-dependence in the estimation of the skeleton, our algorithm has the advantage that it is easily parallelizable at each level of i. The stable PC-like algorithm is correct, i.e., it returns an AMP CG to which the given probability distribution is faithful (Theorem 4.9), and yields order-independent skeletons in the sample version (Theorem 4.10). We illustrate the algorithm in Example 18.

Theorem 4.9. Let the distribution of V be faithful to an AMP CG G, and assume that we are given perfect conditional independence information about all pairs of variables (u, v) in V given subsets S ⊆ V \ {u, v}. Then the output of the stable PC-like algorithm is an AMP CG that is Markov equivalent with G.

Proof. The proof of Theorem 4.9 is completely analogous to the proof of Theorem 1 for the original PC-like algorithm in (Peña, 2012). 

Theorem 4.10. The skeleton resulting from the sample version of the stable PC-like algorithm is order-independent.

Proof. We consider the removal or retention of an arbitrary edge u − v at some level i. The ordering of the variables determines the order in which the edges (line 7 of Algorithm 18) and the subsets S of a_H(u) and a_H(v) (line 8 of Algorithm 18) are considered. By construction, however, the order in which edges are considered does not affect the sets a_H(u) and a_H(v).

If there is at least one subset S of a_H(u) or a_H(v) such that u ⊥⊥_p v | S, then any ordering of the variables will find a separating set for u and v (but different orderings may lead to different separating sets, as illustrated in Example 17). Conversely, if there is no subset S′ of a_H(u) or a_H(v) such that u ⊥⊥_p v | S′, then no ordering will find a separating set. Hence, any ordering of the variables leads to the same edge deletions, and therefore to the same skeleton. □

Example 18 (Order-independent skeletons). We go back to Example 16 and consider the sample version of Algorithm 18. The algorithm now outputs the skeleton shown in Figure 4.6(b) for both orderings order1(V) and order2(V).

We again go through the algorithm step by step. We start with a complete undirected graph on V. No conditional independence is found when i = 0. When i = 1, the algorithm first computes the new adjacency sets: a_H(v) = V \ {v}, for all v ∈ V. There are six pairs of variables that are thought to be conditionally independent given a subset of size 1 (see Table 4.3). Since the sets a_H(v) are not updated after edge removals, it does not matter in which order we consider the ordered pairs. Any ordering leads to the removal of six edges.

Table 4.3: The trace table of Algorithm 18 for i = 1, order1(V) = (d, c, b, a, e), and order2(V) = (d, e, a, c, b). For simplicity, we define ADJ_H(u) := [ad_H(u) ∪ ad_H(ad_H(u))] \ {u, v}.

Ordered pair (u, v)   ADJ_H(u)     S_uv   Is S_uv ⊆ ADJ_H(u)?   Is u − v removed?
(d, c)                {a, b, e}    {e}    Yes                   Yes
(d, b)                {a, c, e}    {e}    Yes                   Yes
(c, b)                {a, d, e}    {a}    Yes                   Yes
(c, a)                {b, d, e}    {d}    Yes                   Yes
(b, a)                {c, d, e}    {d}    Yes                   Yes
(a, e)                {b, c, d}    {d}    Yes                   Yes

Now, we propose a method to resolve the order-dependence in the determination of the triplexes, using an approach similar to that proposed in (Ramsey, Spirtes, and Zhang, 2006).

The conservative PC-like algorithm (CPC-like algorithm) works as follows. Let H be the undirected graph resulting from the skeleton recovery phase of the PC-like algorithm (Algorithm 17). For all unshielded triples (X_i, X_j, X_k) in H, determine all subsets S of ad_H(X_i) ∪ ad_H(ad_H(X_i)) and of ad_H(X_k) ∪ ad_H(ad_H(X_k)) that make X_i and X_k conditionally independent, i.e., that satisfy X_i ⊥⊥_p X_k | S. We refer to such sets as separating sets. The triple (X_i, X_j, X_k) is labelled as unambiguous if at least one such separating set is found and either X_j is in all separating sets or in none of them; otherwise it is labelled as ambiguous. If the triple is unambiguous, it is labelled and then oriented as described in Algorithm 17. So, the orientation rules are adapted so that only unambiguous triples are oriented. We refer to the combination of the stable PC-like and CPC-like algorithms as the stable CPC-like algorithm.
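The decision rule for a single unshielded triple can be sketched as follows in base R; the function name, its arguments, and the returned labels are ours, and 'sepsets' is the list of all separating sets of X_i and X_k found among the candidate subsets described above.

# A minimal sketch of the CPC-like classification of an unshielded triple (Xi, Xj, Xk).
classify_triple <- function(Xj, sepsets) {
  if (length(sepsets) == 0) return("ambiguous")            # no separating set found
  inAll  <- all(vapply(sepsets, function(S) Xj %in% S, logical(1)))
  inNone <- all(vapply(sepsets, function(S) !(Xj %in% S), logical(1)))
  if (inAll)  return("non-triplex (unambiguous)")          # Xj in every separating set
  if (inNone) return("triplex (unambiguous)")              # Xj in no separating set
  "ambiguous"                                              # Xj in some but not all sets
}

classify_triple("e", list(c("b"), c("e"), c("b", "e")))    # "ambiguous", as in Example 19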

Theorem 4.11. Let the distribution of V be faithful to an AMP CG G, and assume that we are given perfect conditional independence information about all pairs of variables (u, v) in V given subsets S ⊆ V \ {u, v}. Then the output of the (stable) CPC-like algorithm is an AMP CG that is Markov equivalent with G.

Proof. The skeleton of the learned CG is correct by Theorem 4.9. Now we prove that, for any unshielded triple (X_i, X_j, X_k) in an AMP CG G, X_j is either in all sets that p-separate X_i and X_k or in none of them. Since X_i and X_k are not adjacent, they are p-separated given some subset S \ {X_i, X_k} (see Algorithm 15). Based on the pathwise p-separation criterion for AMP CGs (see Definition 4.2), X_j is a triplex node in G if and only if X_j ∉ An(S). So X_j ∉ S. On the other hand, if X_j is a non-triplex node then X_j ∈ S for all S that p-separate X_i and X_k, because in this case X_j ∈ Co(An(X_i ∪ X_k ∪ S)) and so there is an undirected path X_i − X_j − X_k in (G[X_i ∪ X_k ∪ S])^a. Any set S \ {X_i, X_k} that does not contain X_j will fail to p-separate X_i and X_k because of this undirected path. As a result, unshielded triples are all unambiguous. Since all unshielded triples are unambiguous, the orientation rules are as in the original (stable) PC-like algorithm. Therefore, the output of the stable CPC-like algorithm is an AMP CG that is Markov equivalent with G. □

Theorem 4.12. The decisions about triplexes in the sample version of the stable CPC-like algorithm are order-independent.

Proof. The stable CPC-like algorithm has an order-independent skeleton, by Theorem 4.10. In particular, this means that its unshielded triples and adjacency sets are order-independent. The decision about whether an unshielded triple is unambiguous and/or a triplex is based on the adjacency sets of nodes in the triple, which are order-independent. □

Example 19 (Order-independent decisions about triplexes). We consider the sample version of the stable CPC-like algorithm, using the same input as in Example 17. In particular, we assume that all conditional independencies induced by the AMP CG in Figure 4.7(a) are judged to hold except c ⊥⊥ d. Suppose that c ⊥⊥ d | b and c ⊥⊥ d | e are thought to hold. Denote the skeleton after the skeleton recovery phase by H. We consider the unshielded triple (c, e, d). First, we compute a_H(c) = {a, b, d, e} and a_H(d) = {a, b, c, e}. We now consider all subsets S of these adjacency sets and check whether c ⊥⊥ d | S. The following separating sets are found: {b}, {e}, and {b, e}. Since e is in some but not all of these separating sets, the stable CPC-like algorithm determines that the triple is ambiguous, and no orientations are performed. The output of the algorithm is given in Figure 4.7(c).

At this point it should be clear why the modified PC-like algorithm is labeled “conservative": it is more cautious than the (stable) PC-like algorithm in drawing unambiguous conclusions about orientations. As we showed in Example 19, the output of the (stable) CPC-like algorithm may not be triplex equivalent with the true AMP CG G if the resulting CG contains an ambiguous triple. Table 4.4 summarizes all order-dependence issues explained above and the corresponding modifications of the PC-like algorithm that remove the given order-dependence problem.

Table 4.4: Order-dependence issues and corresponding modifications of the PC-like algorithm that remove the problem. “Yes" indicates that the corresponding aspect of the graph is estimated order-independently in the sample version.

                      skeleton    triplexes decisions    edges orientations
  PC-like             No          No                     No
  stable PC-like      Yes         No                     No
  stable CPC-like     Yes         Yes                    Yes

4.5 A Decomposition-Based Algorithm for Learning the Structure of AMP CGs

Recently, Javidian and Valtorta (Javidian and Valtorta, 2019a) have developed an algorithm for learning MVR chain graphs that uses a decomposition-based approach. An observation similar to Theorem 4.8 (Javidian and Valtorta, 2018a, Theorem 3) has been used to design the main algorithm in (Javidian and Valtorta, 2019a, Algorithm 2). This algorithm not only reduces complexity and increases the power of computational independence tests but also achieves a better quality with respect to the learned structure. Similarly, we use some of our findings regarding minimal separators in AMP CGs (section 4.3) to design an efficient algorithm for learning AMP chain graphs. Our decomposition-based algorithm is a natural extension of the algorithm in (Xie, Zheng, and Zhao, 2006). In particular, the rule in (Xie, Zheng, and Zhao, 2006) for combining local structures into a global skeleton is still applicable. Then the orientation rules in (Peña and Gómez-Olmedo, 2016) are used for identifying triplexes. The results of the experiments show that our decomposition-based algorithm consistently outperforms the (stable) PC-like algorithm. Code for reproducing our results is available at https://github.com/majavid/AMPCGs2019. Below we briefly list some of the most central concepts used in this section.

Figure 4.8: The procedure for learning the structure of the largest deflagged AMP CG from a faithful distribution.

Let Ḡ_V = (V, Ē_V) denote an undirected graph where Ē_V is a set of undirected edges. An undirected edge between two vertices u and v is denoted by (u, v). For a subset A of V, let Ḡ_A = (A, Ē_A) be the subgraph induced by A, where Ē_A = {e ∈ Ē_V | e ∈ A × A} = Ē_V ∩ (A × A). An undirected graph is called complete if any pair of vertices is connected by an edge. For an undirected graph, we say that vertices u and v are separated by a set of vertices Z if each path between u and v passes through Z. We say that two distinct vertex sets X and Y are separated by Z if and only if Z separates every pair of vertices u and v for any u ∈ X and v ∈ Y. We say that an undirected graph Ḡ_V is an undirected independence graph (UIG) for

CG G if the fact that a set Z separates X and Y in Ḡ_V implies that Z p-separates X and Y in G. Note that the augmented graph derived from CG G, (G)^a, is an undirected independence graph for G. We say that Ḡ_V can be decomposed into subgraphs Ḡ_A and Ḡ_B if

(1) A ∪ B = V, and

(2) C = A ∩ B separates V \ A and V \ B in Ḡ_V.

The above decomposition does not require that the separator C be complete, which is required for the weak decomposition defined in (Lauritzen, 1996). In this section, we show that a problem of learning the structure of a CG can also be decomposed into problems for its decomposed subgraphs even if the separator is not complete. A triangulated (chordal) graph is an undirected graph in which all cycles of four or more vertices have a chord, which is an edge that is not part of the cycle but connects two vertices of the cycle (see, for example, Figure 4.9). For an undirected graph Ḡ_V which is not triangulated, we can add extra (“fill-in") edges to it such that it becomes a triangulated graph, denoted by Ḡ_V^t.

Figure 4.9: (a) An AMP CG G. (b) The augmented graph G^a, which is also an undirected independence graph. (c) The triangulated graph (G^a)^t.

In this section, we assume that all independencies of a probability distribution of variables in V can be checked by p-separations of G, called the faithfulness assumption (Spirtes, Glymour, and Scheines, 2000). The faithfulness assumption means that all independencies and conditional independencies among variables can be represented by G.

The global skeleton is an undirected graph obtained by dropping the directions of the CG. A local skeleton for a subset A of variables is an undirected subgraph for A in which the absence of an edge (u, v) implies that there is a subset S of A such that u ⊥⊥ v | S. Now, we introduce the notion of p-separation trees, which is used to facilitate the representation of the decomposition. The concept is similar to the junction tree of cliques and the independence tree introduced for DAGs as d-separation trees in (Xie, Zheng, and Zhao, 2006). Let C = {C_1, ..., C_H} be a collection of distinct variable sets such that for h = 1, ..., H, C_h ⊆ V. Let T be a tree where each node corresponds to a distinct variable set in C, to be displayed as an oval (see, for example, Figure 4.10). An undirected edge e = (C_i, C_j) connecting nodes C_i and C_j in T is labeled with a separator S = C_i ∩ C_j, which is displayed as a rectangle. Removing an edge e or, equivalently, removing a separator S from T splits T into two subtrees T_1 and T_2 with node sets C_1 and C_2 respectively. We use V_i to denote the union of the vertices contained in the nodes of the subtree T_i for i = 1, 2.

Figure 4.10: The p-separation tree of CG G in Figure 4.9

Notice that a separator is defined in terms of a tree whose nodes consist of variable sets, while the p-separator is defined based on a chain graph. In general, these two concepts are not related, though for a p-separation tree its separator must be some corresponding p-separator in the underlying AMP chain graph. The definition of p-separation trees for AMP chain graphs is similar to that of junction trees of cliques; see (Cowell et al., 1999; Lauritzen, 1996). Actually, it is not difficult to see that a junction tree of a chain graph G is also a p-separation tree. However, as in (Ma, Xie, and Geng, 2008), we point out two differences here: (a) a p-separation tree is defined with p-separation and it does not require that every node be a clique or that every separator be complete on the augmented graph; (b) junction trees are mostly used in inference engines, while our interest in p-separation trees is mainly derived from their power in facilitating the decomposition of structural learning.

4.5.1 Constructing a p-Separation Tree from Observed Data

As proposed in (Xie, Zheng, and Zhao, 2006), one can construct a d-separation tree from observed data. In this section, we extend Theorem 2 of (Xie, Zheng, and Zhao, 2006), showing that their method for constructing a separation tree from data is also valid for AMP chain graphs. To construct an undirected independence graph in which the absence of an edge (u, v) implies u ⊥⊥ v | V \ {u, v}, we can start with a complete undirected graph, and then for each pair of variables u and v, an undirected edge (u, v) is removed if u and v are independent conditional on the set of all other variables (Xie, Zheng, and Zhao, 2006). For normally distributed data, the undirected independence graph can be efficiently constructed by removing an edge (u, v) if and only if the corresponding entry in the concentration matrix (inverse covariance matrix) is zero (Lauritzen, 1996, Proposition 5.2). For this purpose, a conditional independence test for each pair of random variables using the partial correlation coefficient can be used. If the p-value of the test is smaller than the given threshold, then there will be an edge in the output graph. For discrete data, a test of conditional independence given a large number of discrete variables may be of extremely low power. To cope with such difficulty, a local discovery algorithm called Max-Min Parents and Children (MMPC) (Tsamardinos, Aliferis, and Statnikov, 2003) or the forward selection procedure described in (Edwards, 2000) can be applied.
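For the Gaussian case just described, building the undirected independence graph reduces to thresholding partial correlations obtained from the concentration matrix. The following is a minimal sketch (assuming numpy/scipy, a simple Fisher z-test, and n > p + 1), not the R implementation used for the experiments in this chapter.

    import numpy as np
    from scipy import stats

    def gaussian_uig(data, alpha=0.01):
        """Undirected independence graph for Gaussian data (sketch).

        Tests u _||_ v | V \ {u, v} via the partial correlation implied by the
        concentration (inverse covariance) matrix; `data` is an (n, p) sample
        matrix.  Returns the set of retained edges as (u, v) index pairs.
        """
        n, p = data.shape
        omega = np.linalg.pinv(np.cov(data, rowvar=False))   # concentration matrix
        edges = set()
        for u in range(p):
            for v in range(u + 1, p):
                # Partial correlation of u and v given all remaining variables.
                rho = -omega[u, v] / np.sqrt(omega[u, u] * omega[v, v])
                rho = np.clip(rho, -0.999999, 0.999999)
                z = 0.5 * np.log((1 + rho) / (1 - rho))       # Fisher transform
                stat = np.sqrt(n - (p - 2) - 3) * abs(z)
                p_value = 2 * (1 - stats.norm.cdf(stat))
                if p_value < alpha:       # reject independence -> keep the edge
                    edges.add((u, v))
        return edges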

Theorem 4.13. A junction tree constructed from an undirected independence graph for AMP CG G is a p-separation tree for G.

Proof. See Appendix D. 

A p-separation tree T only requires that all p-separation properties of T also hold for AMP CG G, but the reverse is not required. Thus we only need to construct an undirected independence graph that may have fewer conditional independencies than the augmented graph, and this means that the undirected independence graph may have extra edges added to the augmented graph. As (Xie, Zheng, and Zhao, 2006) observe for d-separation in DAGs, if all nodes of a p-separation tree contain only a few variables, “the null hypothesis of the absence of an undirected edge may be tested statistically at a larger significance level." Since there are standard algorithms for constructing junction trees from UIGs (Cowell et al., 1999, Chapter 4, Section 4), the construction of separation trees reduces to the construction of UIGs. In this sense, Theorem 4.13 enables us to exploit various techniques for learning UIGs to serve our purpose. More suggested methods for learning UIGs from data, in addition to the above mentioned techniques, can be found in (Ma, Xie, and Geng, 2008).
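A minimal sketch of the standard pipeline (triangulate the UIG, collect the maximal cliques of the chordal cover, and connect them by a maximum-weight spanning tree on separator sizes) is given below using networkx. It illustrates the construction discussed above and is not the implementation used in our experiments.

    import networkx as nx
    from itertools import combinations

    def separation_tree_skeleton(uig: nx.Graph) -> nx.Graph:
        """Junction tree (used here as a p-separation tree) built from a UIG."""
        chordal, _ = nx.complete_to_chordal_graph(uig)        # add fill-in edges
        cliques = [frozenset(c) for c in nx.chordal_graph_cliques(chordal)]

        clique_graph = nx.Graph()
        clique_graph.add_nodes_from(cliques)
        for c1, c2 in combinations(cliques, 2):
            sep = c1 & c2
            if sep:                                            # label with separator
                clique_graph.add_edge(c1, c2, weight=len(sep), separator=sep)

        # For a connected UIG, a maximum-weight spanning tree of the clique
        # graph satisfies the running intersection property (junction tree).
        return nx.maximum_spanning_tree(clique_graph)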

Example 20. To construct a p-separation tree for the AMP CG G in Figure 4.9(a), at first an undirected independence graph is constructed by starting with a complete graph and removing an edge (u, v) if u ⊥⊥ v | V \ {u, v}. An undirected graph obtained in this way is the augmented graph of the AMP CG G. In fact, we only need to construct an undirected independence graph, which may have extra edges added to the augmented graph. Next we triangulate the undirected graph and finally obtain the p-separation tree, as shown in Figure 4.9(c) and Figure 4.10 respectively.

4.5.2 Structural Learning by Decomposition

Applying the following theorem to structural learning, we can split a problem of searching for p-separators and building the skeleton of a CG into small problems for every node of the p-separation tree T.

Theorem 4.14. Let T be a p-separation tree for AMP CG G and u and v be two vertices that do not belong to the same chain component. Then vertices u and v are p-separated by S ⊆ V in G if and only if (i) u and v are not contained together in any node C of T or (ii) there exists a node C that contains both u and v such that a subset S′ of C p-separates u and v.

Proof. See Appendix D. 

According to Theorem 4.14, a problem of searching for a p-separator S of u and v in all possible subsets of V is localized to all possible subsets of nodes in a p-separation tree that contain u and v. For a given p-separation tree T with the node set C = {C_1, ..., C_H}, we can recover the skeleton and all triplexes for an AMP CG as follows. First we construct a local skeleton for every node C_h of T by starting with a complete undirected subgraph and removing an undirected edge (u, v) if there is a subset S of C_h such that u and v are independent conditional on S. For this purpose, we can use the PC-like algorithm in (Peña, 2012) or the stable PC-like algorithm (Algorithm 18). Then, in order to construct the global skeleton, we first combine all these local skeletons together. Note that it is possible that some edges that are present in some local skeletons may be absent in other local skeletons. Also, two non-adjacent vertices u and v in the AMP CG G that belong to the same chain component may be adjacent in the temporary global skeleton. (Note that Theorem 4.14 only guarantees the existence of the p-separators for those non-adjacent vertices that do not belong to the same chain component. In Appendix D, we provide an example that shows that Theorem 4.14 cannot be strengthened.) In order to get rid of the extra edges in the resulting undirected graph, we apply a removal procedure that is similar to the skeleton recovery phase of the PC-like algorithm. However, instead of the complete undirected graph we use the resulting undirected graph obtained in the previous step. Then we orient undirected edges using rules R1-R4 in Figure 4.5 (Peña, 2012; Peña and Gómez-Olmedo, 2016). This process is formally described in Algorithm 19. We prove that the global skeleton and all triplexes obtained by applying the decomposition in Algorithm 19 are correct, that is, they are the same as those obtained from the joint distribution of V; see Appendix D for proof details. Note that separators in a p-separation tree may not be complete in the augmented graph. Thus the decomposition is weaker than the decomposition usually defined for parameter estimation (Cowell et al., 1999; Lauritzen, 1996).

Remark 5. One can apply Algorithm 3 in (Roverato and Rocca, 2006) to the resulting chain graph of Algorithm 19 to obtain the largest deflagged graph. Also, one can apply Algorithm 1 in (Sonntag and Peña, 2015a) to the resulting chain graph of Algorithm 19 to obtain the AMP essential graph.

4.5.3 Complexity Analysis and Advantages

In this section, we start by comparing our algorithm with the main algorithm in (Xie, Zheng, and Zhao, 2006), which is designed specifically for structural learning when the underlying graph is a DAG. We make this choice of the DAG-specific algorithm so that both algorithms can take the same separation tree as input and hence are directly comparable.

The same advantages mentioned by (Xie, Zheng, and Zhao, 2006) for their BN structural learning algorithm hold for our algorithm when applied to AMP CGs. For the reader's convenience, we list them here. First, by using the p-separation tree, independence tests are performed only conditionally on smaller sets contained in a node of the p-separation tree rather than on the full set of all other variables. Thus our algorithm has higher power for statistical tests. Second, the computational complexity can be reduced. The number of conditional independence tests for constructing the equivalence class is used as the characteristic operation for this complexity analysis. Decomposition of graphs is a computationally simple task compared to the task of testing conditional independence for a large number of triples of sets of variables. The triangulation of an undirected graph is used in our algorithms to construct a p-separation tree from an undirected independence graph. Although the problem of optimally triangulating an undirected graph is NP-hard, sub-optimal triangulation methods (Berry et al., 2004) may be used provided that the obtained tree does not contain nodes that are too large for testing conditional independencies. Two of the best known algorithms are lexicographic search and maximum cardinality search, and their complexities are O(|V||E|) and O(|V| + |E|), respectively (Berry et al., 2004). Thus in our algorithms, the conditional independence tests dominate the algorithmic complexity. The complexity of Algorithm 19 is O(Hm²2^m) as claimed in (Xie, Zheng, and Zhao, 2006, Section 6), where H is the number of p-separation tree nodes (usually H ≪ |V|) and m = max_h |C_h|, where |C_h| denotes the number of variables in C_h (m is usually much less than |V|).

Algorithm 19: A decomposition-based recovery algorithm for AMP chain graphs
Input: a probability distribution p faithful to an unknown AMP CG G.
Output: the largest deflagged graph corresponding to the AMP CG G.
 1  Construct a p-separation tree T with a node set C = {C_1, ..., C_I} as discussed in Section 4.5.1;
 2  Set S = ∅;
    /* Local skeleton recovery: */
 3  for i ← 1 to I do
 4      Start from a complete undirected graph Ḡ_i with vertex set C_i;
 5      for each vertex pair {u, v} ⊆ C_i do
 6          if ∃ S_uv ⊆ C_i such that u ⊥⊥ v | S_uv then
 7              Delete the edge (u, v) in Ḡ_i;
 8              Add S_uv to S;
 9          end
10      end
11  end
    /* Global skeleton recovery: */
12  Initialize the edge set Ē_V of Ḡ_V as the union of all edge sets of Ḡ_i, i = 1, ..., I;
13  Set H = Ḡ_V;
14  for i ← 0 to |V_H| − 2 do
15      while possible do
16          Select any ordered pair of nodes u and v in H such that u ∈ ad_H(v) and |[ad_H(u) ∪ ad_H(ad_H(u))] \ {u, v}| ≥ i;
            /* ad_H(x) := {y ∈ V | x → y, y → x, or x − y} */
17          if there exists S ⊆ ([ad_H(u) ∪ ad_H(ad_H(u))] \ {u, v}) s.t. |S| = i and u ⊥⊥_p v | S (i.e., u is independent of v given S in the probability distribution p) then
18              Set S_uv = S_vu = S;
19              Remove the edge u − v from H;
20          end
21      end
22  end
    /* Orientation phase (Peña, 2012): */
23  while possible do
24      Apply the rules R1-R4 in Figure 4.5 to H.
        /* A block is represented by a perpendicular line at the edge end, and it means that the edge cannot be a directed edge pointing in the direction of the block. An edge blocked at both ends must be undirected. The ends of some of the edges in the rules are labeled with a circle; the circle represents an unspecified end, i.e. a block or nothing. */
25  end
26  Replace every edge ( ) in H with ( ).
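For illustration, the local skeleton recovery phase of Algorithm 19 (lines 3-11) can be sketched as follows; the conditional independence oracle `ci_test` is a hypothetical placeholder for the statistical test actually used.

    from itertools import combinations

    def local_skeletons(tree_nodes, ci_test):
        """Local skeleton recovery over the p-separation tree nodes (sketch).

        `tree_nodes` is the list [C_1, ..., C_I] of variable sets; returns one
        edge set per node together with the recorded separators S_uv.
        """
        skeletons, separators = [], {}
        for C in tree_nodes:
            # Start from the complete undirected graph on C.
            edges = {frozenset((u, v)) for u, v in combinations(C, 2)}
            for u, v in combinations(C, 2):
                rest = [w for w in C if w not in (u, v)]
                for r in range(len(rest) + 1):
                    found = False
                    for S in combinations(rest, r):
                        if ci_test(u, v, set(S)):
                            edges.discard(frozenset((u, v)))
                            separators[frozenset((u, v))] = set(S)
                            found = True
                            break
                    if found:
                        break
            skeletons.append(edges)
        return skeletons, separators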

4.5.4 Evaluation

In this subsection, we evaluate the performance of our algorithms in various setups using simulated/synthetic data sets. We first compare the performance of our algorithm with the (stable) PC-like learning algorithm by running them on randomly generated AMP chain graphs. We then compare our method with the (stable) PC-like algorithm on different discrete Bayesian networks such as ASIA, INSURANCE, ALARM, and HAILFINDER that have been widely used in evaluating the performance of structural learning algorithms. Empirical simulations show that our algorithm achieves competitive results with the (stable) PC-like learning algorithm; in particular, in the Gaussian case the decomposition-based algorithm outperforms the (stable) PC-like algorithm. Algorithm 19 and the (stable) PC-like algorithm have been implemented in the R language. All the results reported here are based on our R implementation: https://github.com/majavid/AMPCGs2019.

Performance Evaluation on Random AMP Chain Graphs (Gaussian case)

To investigate the performance of the proposed learning methods in this chapter, we use the same approach that (Ma, Xie, and Geng, 2008) used in evaluating the performance of the LCD algorithm on LWF chain graphs. We run our algorithms on randomly generated AMP chain graphs and then we compare the results and report summary error measures in all cases.

Data Generation Procedure. First we explain the way in which the random AMP chain graphs and random samples are generated. Given a vertex set V, let p = |V| and let N denote the average degree of edges (including undirected, pointing out, and pointing in) for each vertex. We generate a random AMP chain graph on V as follows:

• Order the p vertices and initialize a p × p adjacency matrix A with zeros;

• For each element in the lower triangle part of A, set it to be a random number generated from a Bernoulli distribution with probability of occurrence s = N/(p − 1);

• Symmetrize A according to its lower triangle;

• Select an integer k randomly from {1, ..., p} as the number of chain components;

• Split the interval [1, p] into k equal-length subintervals I_1, ..., I_k so that the set of variables falling into each subinterval I_m forms a chain component C_m;

• Set A_ij = 0 for any (i, j) pair such that i ∈ I_l, j ∈ I_m with l > m.

This procedure yields an adjacency matrix A for a chain graph with (A_ij = A_ji = 1) representing an undirected edge between V_i and V_j and (A_ij = 1, A_ji = 0) representing a directed edge from V_i to V_j. Moreover, it is not difficult to see that E[vertex degree] = N, where an adjacent vertex can be linked by either an undirected or a directed edge. In order to sample the artificial CGs, we first transformed them into DAGs and then sampled these DAGs under marginalization and conditioning as indicated in (Peña, 2014b).
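The generation procedure above can be sketched in a few lines of Python; the random number generator and the exact assignment of vertices to equal-length intervals are implementation assumptions, not the code used for the reported experiments.

    import numpy as np

    def random_amp_cg(p, N, rng=None):
        """Random AMP chain graph adjacency matrix (sketch of the procedure).

        A[i, j] = A[j, i] = 1 encodes an undirected edge; A[i, j] = 1 with
        A[j, i] = 0 encodes a directed edge i -> j.
        """
        rng = np.random.default_rng(rng)
        s = N / (p - 1)                        # Bernoulli edge probability
        A = np.zeros((p, p), dtype=int)
        lower = np.tril_indices(p, k=-1)
        A[lower] = rng.binomial(1, s, size=len(lower[0]))
        A = A + A.T                            # symmetrize: start fully undirected

        k = rng.integers(1, p + 1)             # number of chain components
        # Assign vertices 0..p-1 to k (roughly) equal-length intervals.
        component = np.minimum((np.arange(p) * k) // p, k - 1)
        for i in range(p):
            for j in range(p):
                if component[i] > component[j]:
                    A[i, j] = 0                # keep only the arc j -> i across components
        return A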

The transformation of an AMP CG G into a DAG H is as follows: First, every node X in G gets a new parent representing an error term, which by definition is never observed. Then, every undirected edge X − Y in G is replaced by X → S_XY ← Y, where S_XY denotes a selection bias node, i.e., a node that is always observed. Given a randomly generated chain graph G with ordered chain components C_1, ..., C_k, we generate a Gaussian distribution on the corresponding transformed DAG H using the Hugin API. Note that the probability distributions of the samples are likely to satisfy the faithfulness assumption, but there is no guarantee; i.e., samples can have additional independencies that cannot be represented by the CG G.

Experimental Results. We evaluate the performance of the proposed algorithms in terms of six measurements: (a) the true positive rate (TPR)¹, (b) the false positive rate (FPR)², (c) the true discovery rate (TDR)³, (d) accuracy (ACC) for the skeleton, (e) the structural Hamming distance (SHD)⁴, and (f) run-time for the LCG recovery algorithms. In short, TPR = TP/Pos is the ratio of the number of correctly identified edges (true positives, TP) over the total number of edges in the true graph (the number of real positive cases in the data, Pos); FPR = FP/Neg is the ratio of the number of incorrectly identified edges (false positives, FP) over the total number of gaps (the number of real negative cases in the data, Neg); TDR is the ratio of the number of correctly identified edges (TP) over the total number of edges in the recovered CG (the estimated graph); ACC = (TP + TN)/(Pos + Neg); and SHD is the number of legitimate operations needed to change the current resulting graph to the true CG, where legitimate operations are: (a) add or delete an edge and (b) insert, delete or reverse an edge orientation. In principle, a large TPR and ACC together with a small FPR and SHD indicate good performance.

¹ Also known as sensitivity, recall, and hit rate.

² Also known as fall-out.

³ Also known as precision or positive predictive value.

⁴ This is the metric described in (Tsamardinos et al., 2006) to compare the structure of the learned and the original graphs.
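The skeleton-level error measures defined above amount to simple set arithmetic on edge sets. A minimal sketch follows (SHD is omitted because it also compares orientations); it is not the R code used to produce the reported numbers.

    def skeleton_error_measures(true_edges, learned_edges, p):
        """TPR, FPR, TDR, and ACC for a learned skeleton over p vertices.

        Edges are given as sets of frozensets; Pos is the number of edges in
        the true graph and Neg the number of gaps.
        """
        all_pairs = p * (p - 1) // 2
        pos, neg = len(true_edges), all_pairs - len(true_edges)
        tp = len(true_edges & learned_edges)
        fp = len(learned_edges - true_edges)
        tn = neg - fp
        return {
            "TPR": tp / pos if pos else 0.0,
            "FPR": fp / neg if neg else 0.0,
            "TDR": tp / len(learned_edges) if learned_edges else 0.0,
            "ACC": (tp + tn) / all_pairs,
        }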

In our simulation, we change three parameters p (the number of vertices), n (sample size), and N (expected number of adjacent vertices) as follows:

• p ∈ {10, 20, 30, 40, 50},

• n ∈ {500, 1000, 5000, 10000}, and

• N ∈ {2, 3}.

For each (p, N) combination, we first generate 30 random AMP CGs. We then generate a random Gaussian distribution based on each graph and draw an identically independently distributed (i.i.d.) sample of size n from this distribution for each possible n. For each sample, three different significance levels (α = 0.005, 0.01, 0.05) are used to perform the hypothesis tests. The null hypothesis H0 is “two variables u and v are conditionally independent given a set C of variables" and the alternative H1 is that H0 may not hold. We then compare the results to assess the influence of the significance level on the performance of our algorithms.

Figure 4.11 shows that: (a) as we expected (Ma, Xie, and Geng, 2008; Kalisch and Bühlmann, 2007; Javidian and Valtorta, 2019a; Javidian, Valtorta, and Jamshidi, 2019b), both algorithms work well on sparse graphs (N = 2, 3), (b) for both algorithms, typically the TPR, TDR, and ACC increase with sample size, (c) for both algorithms, typically the SHD and FPR decrease with sample size, (d) a large significance level (α = 0.05) typically yields large TPR, FPR, and SHD, (e) in almost all cases, the performance of the decomposition-based algorithm based on all error measures, i.e., TPR, FPR, TDR, ACC, and SHD, is better than the performance of the (stable) PC-like algorithm, (f) in most cases, error measures based on α = 0.01 and α = 0.005 are very close. Generally, our empirical results suggest that in order to obtain a better performance, we can choose a small value (say α = 0.005 or 0.01) for the significance level of individual tests along with a large sample (say n = 5000 or 10000). However, the optimal value for a desired overall error rate may depend on the sample size, significance level, and the sparsity of the underlying graph, (g) while the stable PC-like algorithm has a better TDR and FPR in comparison with the original PC-like algorithm, the original PC-like algorithm has a better TPR, as observed in the case of DAGs (Colombo and Maathuis, 2014). This can be explained by the fact that the stable PC-like algorithm tends to perform more tests than the original PC-like algorithm, and (h) there is no meaningful difference between the performance of the stable PC-like algorithm and the original PC-like algorithm in terms of the error measures ACC and SHD.

When considering average running times versus sample sizes, as shown in Figure 4.12, we observe that: (a) the average run time increases when sample size increases; (b) the average run times based on α = 0.01 and α = 0.005 are very close and in all cases better than α = 0.05, while choosing α = 0.005 yields a consistently (albeit slightly) lower average run time across all the settings; (c) generally, the average run time for the decomposition-based algorithm is better than that for the (stable) PC-like algorithm. In Figure 4.13, the algorithms are compared by counting the number of independence tests, rather than run-time, in order to reduce the impact of different implementations (R packages). We observe that: (a) the average number of independence tests increases when sample size increases; (b) the average numbers of independence tests based on α = 0.01 and α = 0.005 are close and in all cases better than α = 0.05, while choosing α = 0.005 yields a consistently lower average number of independence tests across all the settings; (c) generally, the average number of independence tests for the decomposition-based algorithm is better than that for the (stable) PC-like algorithm. These observations are consistent with the theoretical complexity analysis that we discussed in section 3.5.3. In fact, our findings confirm that the decomposition-based algorithm reduces complexity and increases the power of computational independence tests.

Performance on Discrete Bayesian Networks

Bayesian networks are special cases of AMP CGs. It is of interest to see whether the decomposition-based algorithms still work well when the data are actually generated from a Bayesian network.

[Figure 4.11 panels: TPR, FPR, TDR, ACC, and SHD plotted against sample size for the settings described in the caption below.]

Figure 4.11: The first two columns show the performance of the decomposition-based (LCD) and (stable) PC-like algorithms for randomly generated Gaussian chain graph models: averages over 30 repetitions with 50 variables, corresponding to N = 2, 3, and significance level α = 0.005. In each plot, the solid blue line corresponds to the LCD algorithm, the dashed red line corresponds to the original PC-like algorithm, and the dotted grey line corresponds to the stable PC-like algorithm. The third column shows the performance of the decomposition-based (LCD) algorithm for randomly generated Gaussian chain graph models: averages over 30 repetitions with 50 variables, corresponding to N = 2, and significance levels α = 0.05, 0.01, 0.005. In each plot, the solid blue line corresponds to α = 0.05, the dashed red line corresponds to α = 0.01, and the dotted grey line corresponds to α = 0.005.

[Figure 4.12 panels: average run time (seconds) plotted against sample size for the settings described in the caption below.]

Figure 4.12: The first two columns show the running times of the decomposition-based and PC-like algorithms for randomly generated Gaussian chain graph models: averages over 30 repetitions with 50 variables, corresponding to N = 2, 3, and significance level α = 0.005. In each plot, the solid blue line corresponds to the LCD algorithm, the dashed red line corresponds to the original PC-like algorithm, and the dotted grey line corresponds to the stable PC-like algorithm. The third column shows the running times of the decomposition-based (LCD) algorithm for randomly generated Gaussian chain graph models: averages over 30 repetitions with 50 variables, corresponding to N = 2, and significance levels α = 0.05, 0.01, 0.005. In each plot, the solid blue line corresponds to α = 0.05, the dashed red line corresponds to α = 0.01, and the dotted grey line corresponds to α = 0.005.

[Figure 4.13 panels: number of independence tests plotted against sample size for the settings described in the caption below.]

Figure 4.13: The first two columns show the number of independence tests used by the decomposition-based and PC-like algorithms for randomly generated Gaussian chain graph models: averages over 30 repetitions with 50 variables, corresponding to average degrees N = 2, 3, and significance level α = 0.005. In each plot, the solid blue line corresponds to the LCD algorithm, the dashed red line corresponds to the original PC-like algorithm, and the dotted grey line corresponds to the stable PC-like algorithm. The third column shows the number of independence tests used by the decomposition-based (LCD) algorithm for randomly generated Gaussian chain graph models: averages over 30 repetitions with 50 variables, corresponding to average degree N = 2, and significance levels α = 0.05, 0.01, 0.005. In each plot, the solid blue line corresponds to α = 0.05, the dashed red line corresponds to α = 0.01, and the dotted grey line corresponds to α = 0.005.

For this purpose, we perform simulation studies for four well-known Bayesian networks from the Bayesian Network Repository: ASIA, INSURANCE, ALARM, and HAILFINDER. We purposefully selected these networks because they have different sizes (from small to large numbers of nodes, edges, and parameters). We briefly introduce these networks here:

• ASIA (Lauritzen and Spiegelhalter, 1988), with 8 nodes, 8 edges, and 18 parameters, describes the diagnosis of a patient at a chest clinic who may have just come back from a trip to Asia and may be showing dyspnea. Standard learning algorithms are not able to recover the true structure of the network because of the presence of a functional node.

• INSURANCE (Binder et al., 1997), with 27 nodes, 52 edges, and 984 parameters, evaluates car insurance risks.

• ALARM (Beinlich et al., 1989), with 37 nodes, 46 edges, and 509 parameters, was designed by medical experts to provide an alarm message system for intensive care unit patients based on the output of a number of vital signs monitoring devices.

• HAILFINDER (Abramson et al., 1996), with 56 nodes, 66 edges, and 2656 parameters, was designed to forecast severe summer hail in northeastern Colorado.

We compared the performance of our algorithms on these Bayesian networks for three different significance levels (α = 0.05/0.01/0.005). The results of all learning methods are summarized in Table 4.5. The results indicate that the performance of the two algorithms in terms of FPR, ACC, and SHD is very similar. However, the decomposition-based algorithm outperforms the PC-like algorithm in terms of the TPR. The structural Hamming distance (SHD) directly compares the structure of the learned and the original networks. Although the SHD of the CG learned for the ASIA network by the PC-like algorithm is smaller than that of the LCD algorithm, this does not mean that the quality of the CG learned by the PC-like algorithm is better. In fact, as one can see in Figure 4.14, the CG learned by the LCD algorithm is more informative and resembles the ASIA network more than the CG learned by the PC-like algorithm. So, based on this observation, in order to evaluate the outputs of the different algorithms more accurately, one should consider all of the error measures together.

4.6 Discussion and Conclusion

This chapter addresses two main problems in the context of AMP chain graphs (CGs): finding minimal separators and structure learning. The solution of the first problem is used to design efficient algorithms for the second one. We first studied and solved the problem of finding minimal separating sets for pairs of variables in an AMP CG. We also studied some extensions of the basic problem that include finding a minimal separator from a restricted set of nodes, finding a minimal separator for two given disjoint sets, testing whether a given separator is minimal, and listing all minimal separators, given two non-adjacent nodes (or disjoint subsets) X and Y.

Table 4.5: Results for discrete samples from the ASIA, INSURANCE, ALARM, and HAILFINDER networks respectively. Each row corresponds to the significance level: α = 0.05/0.01/0.005 respectively.

                                    TPR     FPR     ACC     SHD
ASIA
  LCD Algorithm                     0.625   0.2     0.75    9
                                    0.625   0.2     0.75    9
                                    0.625   0.2     0.75    9
  Stable PC-Like Algorithm          0.5     0       0.8571  6
                                    0.5     0       0.8571  6
                                    0.5     0       0.8571  6
  Original PC-Like Algorithm        0.5     0       0.8571  6
                                    0.5     0       0.8571  6
                                    0.5     0       0.8571  6
INSURANCE
  LCD Algorithm                     0.577   0.020   0.920   40
                                    0.577   0.013   0.926   40
                                    0.596   0.010   0.931   34
  Stable PC-Like Algorithm          0.288   0       0.89    50
                                    0.288   0       0.89    48
                                    0.288   0       0.89    48
  Original PC-Like Algorithm        0.385   0       0.909   45
                                    0.385   0       0.909   45
                                    0.385   0       0.909   45
ALARM
  LCD Algorithm                     0.587   0.011   0.961   46
                                    0.609   0.019   0.955   47
                                    0.630   0.014   0.96    47
  Stable PC-Like Algorithm          0.348   0       0.955   40
                                    0.348   0       0.955   40
                                    0.348   0       0.955   40
  Original PC-Like Algorithm        0.609   0       0.973   29
                                    0.609   0       0.973   29
                                    0.609   0       0.973   29
HAILFINDER
  LCD Algorithm                     0.667   0.001   0.984   53
                                    0.742   0.007   0.988   54
                                    0.727   0.001   0.987   50
  Stable PC-Like Algorithm          0.394   0       0.974   47
                                    0.394   0       0.974   47
                                    0.394   0       0.974   47
  Original PC-Like Algorithm        0.455   0.005   0.972   49
                                    0.455   0.005   0.971   50
                                    0.455   0.005   0.971   50

[Figure 4.14 panels: ASIA; ASIA, LCD (p.value = 0.05); ASIA, PC-like (p.value = 0.05).]

Figure 4.14: Original ASIA network; networks learned using the LCD and PC-like algorithms, respectively.

Applications of this research include learning chain graphs from data (as done in this chapter) and problems related to the selection of the variables to be instantiated when using chain graphs for inference tasks (a topic for future work).

This chapter also contains a detailed study of two constraint-based approaches to learning AMP chain graphs (CGs): PC-like and decomposition-based (LCD). We showed that the proposed PC-like algorithm for learning AMP CGs in (Peña, 2012; Peña and Gómez-Olmedo, 2016) is order-dependent, in the sense that the output can depend on the order in which the variables are given. We propose two modifications of the PC-like algorithm that remove part or all of this order-dependence. The PC-like algorithm is a constraint-based algorithm that learns the structure of the underlying AMP chain graph in three steps: (a) determining the skeleton: the resulting undirected graph in this phase contains an undirected edge u − v iff there is no set S ⊆ V \ {u, v} such that u ⊥⊥ v | S; (b) determining triplexes and orienting some of the undirected edges into directed edges according to a set of rules applied iteratively; (c) transforming the resulting graph in the previous step into the essential AMP CG (or largest deflagged graph) after step (b). The decomposition-based algorithm is also a constraint-based algorithm that is based on a divide-and-conquer approach

and contains three steps: (a) determining the skeleton by a divide-and-conquer approach; (b) determining triplexes and orienting some of the undirected edges into directed edges according to a set of rules applied iteratively, with a localized search for p-separators; continuing with step (c) exactly as in the PC-like algorithm. The correctness of both algorithms rests upon the assumption that the probability distribution p is faithful to some AMP CG. As for the PC-like algorithms, unless the probability distribution p of the data is faithful to some AMP CG, the learned CG cannot be ensured to factorize p properly. Empirical simulations in the Gaussian case show that both algorithms yield good results when the underlying graph is sparse; this holds also in the discrete case, according to experiments with typical Bayesian networks. The decomposition-based algorithm achieves competitive results with the PC-like learning algorithms in both the Gaussian and discrete cases. In fact, the decomposition-based method usually outperforms the PC-like algorithms in all five error measures, i.e., TPR, FPR, TDR, ACC, and SHD. Since our implementation of the decomposition-based algorithm is based on the LCD R package, with a fixed number of samples, one can expect better performance if we replace the asymptotic test used in the LCD R package with an exact test. However, there is a trade-off between accuracy and computational time. The decomposition-based algorithm exploits our results on separating sets. It exhibits reduced complexity, as measured by run time and number of conditional independence tests; it enhances the power of conditional independence tests by reducing the number of separating sets that need to be considered; and, according to our experimental evaluation, it achieves better quality with respect to the learned structure.

The natural continuation of the work presented here would be to develop a learning algorithm with weaker assumptions than the one presented. This could, for example, be a learning algorithm that only assumes that the probability distribution satisfies the composition property. It should be mentioned that (Peña, Sonntag, and Nielsen, 2014) developed an algorithm for learning LWF CGs under the composition property. However, (Peña, 2014a) proved that the same technique cannot be used for AMP chain graphs. We believe

that our decomposition-based approach is extendable to the structural learning of marginal AMP chain graphs (Peña and Gómez-Olmedo, 2016) and ancestral graphs (Richardson and Spirtes, 2002). So, the natural continuation of the work presented here would be to develop a learning algorithm via decomposition for marginal AMP chain graphs and ancestral graphs under the faithfulness assumption.

Chapter 5

Bayesian Hypergraphs

Probabilistic Graphical Models (PGMs) enjoy a well-deserved popularity because they allow explicit representation of structural constraints in the language of graphs and similar structures. From the perspective of efficient belief update, factorization of the joint probability distribution of random variables corresponding to variables in the graph is paramount, because it allows decomposition of the calculation of the evidence or of the posterior probability (Lauritzen and Jensen, 1997). The proliferation of different PGMs that allow factorizations of different kinds leads us to consider a more general graphical structure in this chapter, namely directed acyclic hypergraphs. Since there are many more hypergraphs than DAGs, undirected graphs, chain graphs, and, indeed, other graph-based networks, as discussed in Remark 13, Bayesian hypergraphs can model much finer factorizations and thus are more computationally efficient. When tied to probability distributions, directed acyclic hypergraphs specify independence (and possibly other) constraints through their Markov properties; we call the new PGM resulting from the directed acyclic hypergraphs and their Markov properties Bayesian hypergraphs. We provide such properties and show that they are consistent with the ones used in Bayesian networks, Markov networks, and LWF chain graphs, when the directed acyclic hypergraphs are suitably restricted. In particular, we define a projection operator, called shadow, that maps a Bayesian hypergraph to a chain graph, and show that the Markov properties of a Bayesian hypergraph are equivalent to those of its shadow (which is a chain graph). This also allows people to work with familiar separation criteria even in the case of Bayesian hypergraphs. There are situations that may be of interest to a probabilistic or causal modeler that can

be modeled more explicitly using Bayesian hypergraphs. In particular, some causal patterns, such as independence of causal influence (e.g., Noisy-OR), can be expressed graphically in Bayesian hypergraphs, while they require a numerical specification in DAGs or chain graphs. This is one of the important limitations to the Causal Representation Convention (Spirtes, Glymour, and Scheines, 2000). For example, suppose diseases A and B both increase symptoms C, but the effect of B without A is quite trivial, while the effect of A alone is not (see Table 5.1(b)). The directed graph representation we have considered in Figure 5.1, which is a DAG, offers no means to represent this interaction and to distinguish it from other circumstances in which A and B alone each have an effect on C (see Table 5.1(a)). Both interactions are only represented through the probability distribution associated with the graph. We describe this case in detail in Section 5.3.2, and we show how using hypergraphs can help modelers to overcome this problem.


Figure 5.1: A simple DAG G.

Table 5.1: The conditional probability distribution P(C | A, B) for a model with: (a) noisy functional dependence (Noisy-OR), and therefore P(C = n | A = y, B = y) = P(C = n | A = y)P(C = n | B = y); (b) non-noisy functional dependence.

(a) Noisy-OR
              A = n               A = y
              B = n    B = y      B = n    B = y
    C = n     1        0.1        0.2      0.02
    C = y     0        0.9        0.8      0.98

(b) Non-noisy
              A = n               A = y
              B = n    B = y      B = n    B = y
    C = n     0.95     0.3        0.5      0.2
    C = y     0.05     0.7        0.5      0.8
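As a quick sanity check of the Noisy-OR reading of Table 5.1(a), the entry for A = y, B = y factorizes as the product of the single-cause entries, whereas the corresponding entries in Table 5.1(b) do not:

    # Table 5.1(a): P(C = n | A = y, B = y) = P(C = n | A = y) * P(C = n | B = y)
    assert abs(0.2 * 0.1 - 0.02) < 1e-12   # Noisy-OR factorization holds
    # Table 5.1(b): 0.5 * 0.3 = 0.15 != 0.2, so no such factorization exists.
    assert abs(0.5 * 0.3 - 0.2) > 1e-12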

We provide a causal interpretation of Bayesian hypergraphs that extends the causal interpretation of LWF chain graphs (Lauritzen and Richardson, 2002), by giving corresponding formulas and a graphical criterion for intervention.

The chapter is organized as follows: In Section 5.1 we introduce some common notations, terminology and concepts on graphs and hypergraphs. In Section 5.2 we review the Markov properties and factorizations in the case of undirected graphs. In Section 5.3, we introduce the Bayesian hypergraphs model, discuss the factorizations, Markov properties and its relations to chain graphs. In Section 5.4, we discuss how interventions can be achieved in Bayesian hypergraphs. Section 7 concludes the chapter and includes some directions for further work.

5.1 Basic Definitions and Concepts

In this chapter, we use [n] to denote the set {1, 2, ..., n}. For a, b ∈ Z, we use [a, b] to denote {k ∈ Z : a ≤ k ≤ b}. Given a set h, we use |h| to denote the number of elements in h.

5.1.1 Graphs

A graph G = (V, E) is an ordered pair (V, E) where V is a finite set of vertices (or nodes) and E ⊆ V × V consists of a set of ordered pairs of vertices (v, w) ∈ V × V. Given a graph G, we will use V(G), E(G) to denote the set of vertices and edges of G respectively. An edge (v, w) ∈ E is directed if (w, v) ∉ E and undirected if (w, v) ∈ E. We write v → w if (v, w) is directed and v − w if (v, w) is undirected. If v − w then we call v a neighbor of w and vice versa. If v → w, then we call v a parent of w and w a child of v. Let pa_G(v) and nb_G(v) denote the set of parents and neighbors of v, respectively. We say v and w are adjacent if either (v, w) ∈ E or (w, v) ∈ E, i.e., either v → w, w → v, or v − w. We say an edge e is incident to a vertex v if v is contained in e. We also define the boundary bd(v) of v by

bd(v) = nb(v) ∪ pa(v).

Moreover, given τ ⊆ V, define

pa_G(τ) = (⋃_{v∈τ} pa_G(v)) \ τ,
nb_G(τ) = (⋃_{v∈τ} nb_G(v)) \ τ,
bd_G(τ) = (⋃_{v∈τ} bd_G(v)) \ τ,
cl_G(τ) = bd_G(τ) ∪ τ.

For every graph G = (V, E), we will denote the underlying undirected graph by G^u = (V, E^u), i.e., E^u = {(v, u) : (v, u) ∈ E or (u, v) ∈ E}. A path in G is a sequence of distinct vertices v_0, ..., v_k such that (v_i, v_{i+1}) ∈ E for all 0 ≤ i ≤ k − 1. A path v_0, ..., v_k is directed if for all 0 ≤ i < k, (v_i, v_{i+1}) is a directed edge, i.e., (v_i, v_{i+1}) ∈ E but (v_{i+1}, v_i) ∉ E. A cycle is a path with the modification that v_k = v_0. A cycle is partially directed if at least one of the edges in the cycle is a directed edge. A graph G is acyclic if G contains no partially directed cycle. A vertex v is said to be an anterior of a vertex u if there is a path from v to u. We remark that every vertex is also an anterior of itself. If there is a directed path from v to u, we call v an ancestor of u and u a descendent of v. Moreover, u is a non-descendent of v if u is not a descendent of v. Let ant(u) and an(u) denote the set of anteriors and ancestors of u in G respectively. Let de(v) and nd(v) denote the set of descendents and non-descendents of v in G respectively. For a set of vertices τ, we also define ant(τ) = ⋃{ant(v) : v ∈ τ}. Again, note that τ ⊆ ant(τ).

A subgraph of a graph G is a graph H such that V(H) ⊆ V(G) and each edge present in H is also present in G and has the same type. An induced subgraph of G by a subset A ⊆ V(G), denoted by G_A or G[A], is a subgraph of G that contains all and only vertices in A and all edges of G that contain only vertices in A. A clique or complete graph with n vertices, denoted by K_n, is a graph such that every pair of vertices is connected by an undirected edge. Now we can define several basic graph representations used in probabilistic graphical models. An undirected graph is a graph such that every edge is undirected. A directed acyclic graph (DAG) is a graph such that every edge is directed and contains no directed cycles. A chain graph is a graph without partially directed cycles. Define two vertices v and

u to be equivalent if there is an undirected path from v to u. Then the equivalence classes under this equivalence relation are the chain components of G. For a vertex set S, define E*(S) as the edge set of the complete undirected graph on S. Given a graph G = (V, E) with chain components {τ : τ ∈ D}, the moral graph of G, denoted by G^m = (V, E^m), is a graph such that V(G^m) = V(G) and E^m = E^u ∪ (⋃_{τ∈D} E*(bd(τ))), i.e., the underlying undirected graph, where the boundary w.r.t. G of every chain component is made complete. The moral graphs are natural generalizations to chain graphs of the similar concept for DAGs given in (Lauritzen and Spiegelhalter, 1988; Lauritzen et al., 1990).
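The moralization just defined can be sketched directly from a chain graph stored as a directed graph in which an undirected edge appears as a pair of opposite arcs (an encoding assumed here only for illustration):

    import networkx as nx
    from itertools import combinations

    def moral_graph(chain_graph: nx.DiGraph) -> nx.Graph:
        """Moral graph of a chain graph (sketch of the definition above)."""
        # Chain components = connected components of the undirected part.
        undirected = nx.Graph()
        undirected.add_nodes_from(chain_graph.nodes)
        undirected.add_edges_from((u, v) for u, v in chain_graph.edges
                                  if chain_graph.has_edge(v, u))

        moral = chain_graph.to_undirected()            # underlying graph E^u
        for tau in nx.connected_components(undirected):
            # bd(tau): parents of the component (undirected neighbours lie in tau).
            boundary = {w for v in tau
                        for w in chain_graph.predecessors(v)} - set(tau)
            moral.add_edges_from(combinations(boundary, 2))   # complete E*(bd(tau))
        return moral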

5.1.2 Hypergraphs

Hypergraphs are generalizations of graphs such that each edge is allowed to contain more than two vertices. Formally, an (undirected) hypergraph is a pair H = (V, E), where V = {v_1, v_2, ..., v_n} is the set of vertices (or nodes) and E = {h_1, h_2, ..., h_m} is the set of hyperedges, where h_i ⊆ V for all i ∈ [m]. If |h_i| = k for every i ∈ [m], then we say H is a k-uniform (undirected) hypergraph. A directed hyperedge or hyperarc h is an ordered pair, h = (X, Y), of (possibly empty) subsets of V where X ∩ Y = ∅; X is called the tail of h while Y is the head of h. We write X = T(h) and Y = H(h). We say a directed hyperedge h is fully directed if none of H(h) and T(h) are empty. A directed hypergraph is a hypergraph such that all of the hyperedges are directed. An (s, t)-uniform directed hypergraph is a directed hypergraph such that the tail and head of every directed edge have size s and t respectively. For example, any DAG is a (1, 1)-uniform hypergraph (but not vice versa). An undirected graph is a (0, 2)-uniform hypergraph. Given a hypergraph H, we use V(H) and E(H) to denote the vertex set and edge set of H respectively.

170 denote them by u v. Given v V, we define parent (pa(v)), neighbor (nb(v)), boundary − ∈ (bd(v)), ancestor (an(v)), anterior (ant(v)), descendant (de(v)), and non-descendant (nd(v)) for hypergraphs exactly the same as for graphs (and therefore use the same names). The same holds for the equivalent concepts for τ V. Note that it is possible that some vertex ⊆ u is both the parent and neighbor of v. A partially directed cycle in is a sequence v , v ,... v satisfying that v is either H { 1 2 k} i a neighbor or a parent of v for all 1 i k and v v for some 1 i k. Here i+1 ≤ ≤ i → i+1 ≤ ≤ v v . We say a directed hypergraph is acyclic if contains no partially directed k+1 ≡ 1 H H cycle. For ease of reference, we call a directed acyclic hypergraph a DAH or a Bayesian hypergraph structure (as defined in Section 5.3). Note that for any two vertices u, v in a directed acyclic hypergraph , u can not be both the parent and neighbor of v otherwise H we would have a partially directed cycle.

Remark 6. DAHs are generalizations of undirected graphs, DAGs and chain graphs. In particular an undirected graph can be viewed as a DAH in which every hyperedge is of the form ( , u, v ). A DAG is a DAH in which every hyperedge is of the form ( u , v ). A chain ∅ { } { } { } graph is a DAH in which every hyperedge is of the form ( , u, v ) or ( u , v ). ∅ { } { } { }

We define the chain components of as the equivalence classes under the equivalence H relation where two vertices v1, vt are equivalent if there exists a sequence of distinct vertices v , v ,..., v such that v and v are co-head for all i [t 1]. The chain components 1 2 t i i+1 ∈ − S τ : τ yields an unique natural partition of the vertex set V( ) = τ τ with the { ∈ D} H ∈D following properties:

Proposition 5.1. Let be a DAH and τ : τ be its chain components. Let G be H { ∈ D} a graph obtained from by contracting each element of τ : τ into a single vertex H { ∈ D} and creating a directed edge from τ V(G) to τ V(G) in G if and only if there exists a i ∈ j ∈ hyperedge h E( ) such that T(h) τi , and H(h) τ j , . Then G is a DAG. ∈ H ∩ ∅ ∩ ∅

171 Proof. First of all, clearly G is a directed graph. Now since is a DAH, there is no directed H hyperedge such that both its head and tail intersect a common chain component. Hence G has no self-loop. It remains to show that there is no directed cycle in G. Supporse for contradiction that there is a directed cycle τ1, τ2, . . . , τk in G. Then by the construction of G, there is a sequence of hyperedges h , h ,..., hk such that T(hi) τi , and H(hi) τi+ , { 1 2 } ∩ ∅ ∩ 1 ∅ (with τ τ ). Since there is a path between any two vertices in the same component, k+1 ≡ 1 it follows that there is a partially directed cycle in , which contradicts that is acyclic. H H Hence we can conclude that G is indeed a DAG. 

Note that the DAG obtained in Proposition 5.1 is unique and given a DAH we call H such G the canonical DAG of . A chain component τ of is terminal if the out degree H H of τ in G is 0, i.e., there is no τ0 , τ such that τ τ0 in G. A chain component τ is initial → if the in degree of τ in G is 0, i.e., there is no τ0 , τ such that τ0 τ in G. We call a vertex → set A V( ) an anterior set if it can be generated by stepwise removal of terminal chain ⊆ H components. We call A an ancestral set if bd(A) = in . We remark that given a set A, ∅ H ant(A) is also the smallest ancestral set containing A.

A sub-hypergraph of = (V, ) is a directed hypergraph 0 = (V0, 0) such that H E H E V0 V and 0 . Given S V( ), we say a directed hypergraph 0 is a sub-hypergraph ⊆ E ⊆ E ⊆ H H of induced by S , denoted by or [S ], if V( 0) = S and h E( 0) if and only if H HS H H ∈ H h E( ) and H(h) T(h) S . ∈ H ∪ ⊆ To illustrate the relationship between a directed acyclic hypergraph and a chain graph, we will introduce the concept of a shadow of a directed acyclic hypergraph. Given a di- rected acyclic hypergraph , let the (directed) shadow of , denoted by ∂( ), be a graph H H H G such that V(G) = V( ), and for every hyperedge h = (X, Y) E( ), G[Y] is a clique H ∈ H (i.e. every two vertices in G[Y] are neighbors) and there is a directed edge from each vertex of X to each vertex of Y in G.

Proposition 5.2. Suppose is a directed acyclic hypergraph and G is the shadow of . H H

172 Then

(i) G is a chain graph.

(ii) For every vertex v V( ) = V(G), nbG(v) = nb (v) and paG(v) = pa (v). ∈ H H H

Proof. For (i), note that since is acyclic, there is no partially directed cycle in . It H H follows by definition that there is no partially directed cycle in G. Hence, the shadow of a directed acyclic hypergraph is a chain graph. (ii) is also clear from the definition of the shadow. 

b b d f b d f c, d e, f

a c e a a c e

Figure 5.2: (1) a DAH . (2) the canonical DAG of . (3) the shadow of . H H H

5.1.3 Hypergraph drawing

In this subsection, we present how directed edges are drawn in this paper and illustrate the concepts with an example. For a fully directed hyperedge with two vertices (both head and tail contain exactly one vertex), we use the standard arrow notation. For a fully directed hyperedge with at least three vertices, we use a shaded polygon to represent that edge, with the darker side as the head and the lighter side as the tail. For hyperedges of the type ( , A), we use an undirected line segment (i.e. ) to denote the hyperedge if A = 2 and a ∅ − | | shaded polygon with uniform gray color if A 3. For example, in Figure 5.2, the directed | | ≥ hyperedges are ( a, b , c ), ( a , c, d ), ( d , e, f ), ( c , e ). Here a and b are co-tail, c and { } { } { } { } { } { } { } { } d, e and f are co-head. Figure 5.2 (2) shows the canonical DAG associated to with four H chain components: a , b , c, d , e, f . Figure 5.2 (3) shows the shadow of . { } { } { } { } H

173 5.1.4 Construction of a directed acyclic hypergraph from chain graph

In this subsection, we show how to construct a directed acyclic hypergraph from a H chain graph G according to the LWF interpretation. We will then show later in Section 5.3.3 that the DAH model constructed from G in this section admits the same Markov H properties and factorization decomposition as the chain graph model G. This shows that the DAH model indeed generalizes the chain graph model (at least according to the LWF interpretation). Due to the expressiveness and generality of a directed hypergraph, other constructions may exist too. Let G be a chain graph with n vertices. We will explicitly construct a directed acyclic hypergraph on n vertices that correspond to G. We remark H that the construction essentially creates a hyperedge for each maximal clique in the moral graph of G for every chain component τ of . cl(τ) H

Construction:

V( ) = V(G). H The edge set of is constructed in two phases: H Phase I:

For each v V(G), let S be the set of children of v in G. Consider the subgraph • ∈ v

G0 of G[S v] induced by the undirected edges in G[S v]. For each maximal clique

(with vertex set K) in G0, add the directed hyperedge ( v , K) into . { } H

Let 0 be the resulting hypergraph after performing the above procedure for • H every v V(G). Now for every maximal clique K (every edge in K is undi- ∈ rected) in G, if K * H(h) for every h E( 0), add the directed hyperedge ∈ H ( , K) into . ∅ H

Phase II: Let 0 be the resulting hypergraph constructed from Phase I and τ : τ H { ∈ D} be the chain components of G. Given τ, let ∗ be the hypergraph containing all the Hτ edges h in 0 such that H(h) τ , . Hcl(τ) ∩ ∅

174 Define [ ( [ ! \ ) E( ) = T(h), B : B = H(h), E( ∗) . H F ⊆ Hτ τ h E( τ∗) h ∈D B∈ HH(h) ∈F ⊆ Note that the resulting hypergraph is a directed acyclic hypergraph since a partially H directed cycle C in corresponds to a directed cycle in G. Moreover, the above construc- H tion gives us an injection from the family of chain graphs with n vertices to the family of directed acyclic hypergraphs with n vertices.
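The two phases above can be prototyped directly. The following is a rough, illustrative sketch (not the thesis's implementation) that uses networkx only for maximal-clique enumeration; the chain graph is given by its vertex set, undirected edges, and directed edges, hyperedges are (tail, head) pairs of frozensets, and Phase II enumerates all subsets of the relevant hyperedges, so it is exponential and only suitable for small examples. Details such as the treatment of singleton chain components are glossed over.

```python
# Illustrative sketch of the two-phase construction of the canonical LWF DAH.
from itertools import combinations
import networkx as nx

def canonical_lwf_dah(vertices, undirected_edges, directed_edges):
    UG = nx.Graph()
    UG.add_nodes_from(vertices)
    UG.add_edges_from(undirected_edges)
    children = {v: {w for (u, w) in directed_edges if u == v} for v in vertices}

    # Phase I: one hyperedge ({v}, K) per maximal undirected clique K among v's
    # children, plus (empty, K) for maximal undirected cliques not covered by a head.
    edges = set()
    for v in vertices:
        for K in nx.find_cliques(UG.subgraph(children[v])):
            edges.add((frozenset({v}), frozenset(K)))
    for K in map(frozenset, nx.find_cliques(UG)):
        if not any(K <= head for (_, head) in edges):
            edges.add((frozenset(), K))

    # Phase II: within each chain component tau, pair every nonempty intersection B
    # of heads with the union of the tails of all hyperedges whose head contains B.
    final = set()
    for tau in nx.connected_components(UG):           # chain components of G
        touching = [e for e in edges if e[1] & tau]
        for r in range(1, len(touching) + 1):
            for F in combinations(touching, r):        # exponential: small examples only
                B = frozenset.intersection(*[h for (_, h) in F])
                if B:
                    T = frozenset().union(*[t for (t, h) in touching if B <= h])
                    final.add((T, B))
    return final
```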

Figure 5.3: (1) a simple chain graph $G$; (2) the corresponding DAH of $G$ in the LWF interpretation.

Figure 5.3 contains an example of a simple chain graph and its corresponding version in the hypergraph representation. Recall that every fully directed hyperedge is represented (in the drawing) by a colored convex region. The darker side is the head and the lighter side is the tail. We will detail the hyperedges existing in every phase of the construction:

• Phase I: the hyperedges in $\mathcal{H}$ are $\{a, d, e\}$, $\{b, e, f\}$, and $\{c, f\}$.

• Phase II: For each chain component $\tau$, we obtain all subsets $B$ of $\tau$ which are the intersections of the heads of some set of hyperedges intersecting $\tau$. For each such $B$ obtained, we create a hyperedge whose head is $B$ and whose tail is the union of the tails of the hyperedges containing $B$ in their head. In Figure 5.3, for the chain component $\{d, e, f\}$, the set of such $B$'s is $\{\{d, e\}, \{e\}, \{e, f\}, \{f\}\}$. This is because $\{d, e\}$ is the head of the hyperedge $\{a, d, e\}$; $\{e\}$ is the intersection of the heads of the hyperedges $\{a, d, e\}$, $\{a, b, e\}$ and $\{b, e, f\}$, etc. Hence
\[
E(\mathcal{H}) = \bigl\{ \{a, d, e\}, \{e, a, b\}, \{b, e, f\}, \{b, c, f\} \bigr\}.
\]

In this particular example, the hyperedges $\{a, b, e\}$ and $\{b, c, f\}$ are created so that the resulting factorization according to the DAH is consistent with the LWF chain graph model: in particular, the factorization decomposition includes the cliques created due to the moralization operation in the chain graph model.

Hence the resulting hypergraph $\mathcal{H}$ is the one in Figure 5.3(2). For ease of reference, given a chain graph $G$, we will call the hypergraph $\mathcal{H}$ constructed above the canonical LWF DAH of $G$. We say $\mathcal{H}$ is hypermoralized from $G$ if $\mathcal{H}$ is the canonical LWF DAH of $G$. Moreover, we call the family of all such hypergraphs (i.e., the canonical LWF DAHs of some chain graph) LWF DAHs.

Figure 5.4: Relationship between chain graphs and directed acyclic hypergraphs

Remark 7. In this section, we gave an injective mapping from the space of chain graphs to the space of directed acyclic hypergraphs. The LWF DAH constructed from a LWF chain graph $G$ has the same Markov properties as $G$; we show this later in Theorem 5.10. We believe some other types of chain graphs can be modeled by DAHs too (e.g., MVR DAHs), but we do not explore them in this chapter.

Proposition 5.3. Let $G$ be a chain graph and $\mathcal{H}$ be its canonical LWF DAH. Then we have

(i) For each vertex $v \in V(G) = V(\mathcal{H})$, $nb_G(v) = nb_{\mathcal{H}}(v)$ and $pa_G(v) = pa_{\mathcal{H}}(v)$.

(ii) $G$ is the shadow of $\mathcal{H}$.

(iii) $\mathcal{H}$ is a directed acyclic hypergraph.

Proof. We will first show (i). Note that by our construction in Phase I, if two vertices are neighbors in $G$, then they are co-head in $\mathcal{H}$. Moreover, if $u$ is the parent of $v$ in $G$, then $u$ is still the parent of $v$ in $\mathcal{H}$. These relations remain true in Phase II. Hence we obtain that $nb_G(v) \subseteq nb_{\mathcal{H}}(v)$ and $pa_G(v) \subseteq pa_{\mathcal{H}}(v)$ for all $v \in V(\mathcal{H})$. It remains to show that for each $v \in V(\mathcal{H})$, no additional neighbor or parent of $v$ (compared to the case in $G$) is added in the construction. In Phase I, every hyperedge added is either of the form $(\emptyset, K)$ or $(\{w\}, K)$, where $K \subseteq V$ induces a complete undirected graph in $G$ and $w$ is the parent of every element in $K$. Hence for every $v \in V(\mathcal{H})$, no additional neighbor or parent of $v$ is added in Phase I. Now let us examine Phase II. Given an edge $h = (A, B) \in E(\mathcal{H})$, there exists some $\tau \in \mathcal{D}(G)$ such that $B = \bigcap_{h \in \mathcal{F}} H(h)$ for some $\mathcal{F} \subseteq E(\mathcal{H}^*_\tau)$. Moreover, $A = \bigcup_{h \in E(\mathcal{H}^*_\tau),\, B \subseteq H(h)} T(h)$. Note that for every pair of elements $u, v \in B$, $u$ and $v$ are already neighbors in $G$, since $u, v \in H(h)$ for some $h \in \mathcal{F}$ from Phase I. Moreover, for every $v \in A$ and $u \in B$, $v$ is already a parent of $u$ in $G$, since there exists some $h$ constructed in Phase I such that $u \in H(h)$ and $v \in T(h)$. Therefore, it follows that any edge defined in Phase II does not create any new neighbor or parent for any $v \in V(G)$. Thus, we can conclude that for all $v \in V(G) = V(\mathcal{H})$, $nb_G(v) = nb_{\mathcal{H}}(v)$ and $pa_G(v) = pa_{\mathcal{H}}(v)$.

(ii) is implied by (i) by the definition of a shadow. (iii) is implied by (ii) since $G$ is acyclic and $G$ is the shadow of $\mathcal{H}$. $\square$

5.2 Markov properties for undirected graphs

In this section, we summarize some basic results on the Markov properties of undirected graphs. Let us first introduce some notation. In the rest of this work, let $(X_\alpha)_{\alpha \in V}$ be a collection of random variables taking values in some product space $\mathcal{X} = \times_{\alpha \in V} \mathcal{X}_\alpha$. Let $P$ denote a probability measure on $\mathcal{X}$. For a subset $A$ of $V$, we use $\mathcal{X}_A$ to denote $\times_{\alpha \in A} \mathcal{X}_\alpha$, and $P_A$ is the marginal measure on $\mathcal{X}_A$. A typical element of $\mathcal{X}_A$ is denoted by $x_A = (x_\alpha)_{\alpha \in A}$. We will use the short notation $A \perp B \mid C$ for $X_A \perp X_B \mid X_C$.

Given an undirected graph $G$, we say $C$ separates $A$ and $B$ in $G$ if there is no path from any vertex in $A$ to any vertex in $B$ in $G[V(G) \setminus C]$. If $G$ is an undirected graph, then a probability measure $P$ is said to be:

(UP) pairwise $G$-Markovian if $\alpha \perp \beta \mid V \setminus \{\alpha, \beta\}$ whenever $\alpha$ and $\beta$ are non-adjacent in $G$.

(UL) local $G$-Markovian if $\alpha \perp V \setminus cl(\alpha) \mid bd(\alpha)$ for all $\alpha \in V(G)$.

(UG) global $G$-Markovian if $A \perp B \mid C$ whenever $C$ separates $A$ and $B$ in $G$.
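The separation statement used in (UG) is easy to test mechanically: remove $C$ and check whether any path joins $A$ and $B$. The following small sketch (illustrative only; the function name is mine) does this with networkx.

```python
# Sketch: C separates A and B in an undirected graph G iff no path joins a vertex
# of A to a vertex of B once the vertices in C are removed.
import networkx as nx

def separates(G, A, B, C):
    H = G.subgraph(set(G.nodes) - set(C))
    return not any(nx.has_path(H, a, b)
                   for a in A for b in B if a in H and b in H)

G = nx.Graph([("a", "c"), ("b", "c"), ("c", "d")])
print(separates(G, {"a"}, {"b"}, {"c"}))   # True: c blocks every a-b path
```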

The following theorem by Pearl and Paz (Pearl and Paz, 1987) gives a sufficient condition for the equivalence of (UG), (UL) and (UP):

Theorem 5.4. If G is an undirected graph and P satisfies the intersection property (S5), then (UG), (UL) and (UP) are equivalent and P is said to be G-Markovian if they hold.

Conditional independences, and thus Markov properties, are closely related to factorizations. A probability measure $P$ on $\mathcal{X}$ is said to factorize according to $G$ if for each clique $h$ in $G$ there exists a non-negative function $\psi_h$ depending on $x_h$ only, and there exists a product measure $\mu = \times_{\alpha \in V} \mu_\alpha$ on $\mathcal{X}$, such that $P$ has density $f$ with respect to $\mu$, where $f$ has the form
\[
f(x) = \prod_{h \in \mathcal{C}} \psi_h(x), \qquad (5.1)
\]
where $\mathcal{C}$ is the set of maximal cliques in $G$. If $P$ factorizes according to $G$, we say $P$ has property (UF). It is known (see Lauritzen (1996)) that

(UF) $\Rightarrow$ (UG) $\Rightarrow$ (UL) $\Rightarrow$ (UP).

Moreover, in the case that $P$ has a positive and continuous density, it can be proven using the Möbius inversion lemma that (UP) $\Rightarrow$ (UF). This result seems to have been discovered in various forms by a number of authors and is usually attributed to Hammersley and Clifford, who proved the result in the discrete case (Lauritzen, 1996).
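A clique factorization of the form (5.1) is straightforward to evaluate once the clique potentials are given. The sketch below is illustrative only (the example graph and potentials are made up); it multiplies potentials over the maximal cliques found by networkx.

```python
# Sketch: evaluate an unnormalized (UF)-style density f(x) = prod over maximal
# cliques of psi_h(x).  The potentials and the example graph are illustrative.
import networkx as nx

def uf_density(G, psi, x):
    val = 1.0
    for clique in nx.find_cliques(G):
        h = frozenset(clique)
        val *= psi[h]({v: x[v] for v in h})   # psi_h depends on x_h only
    return val

G = nx.Graph([("a", "c"), ("b", "c")])
psi = {frozenset({"a", "c"}): lambda z: 1.0 + z["a"] * z["c"],
       frozenset({"b", "c"}): lambda z: 2.0 if z["b"] == z["c"] else 0.5}
print(uf_density(G, psi, {"a": 1, "b": 0, "c": 1}))   # 2.0 * 0.5 = 1.0
```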

5.3 Bayesian Hypergraphs

A Bayesian hypergraph (BH) is a probabilistic graphical model that represents a set of variables and their conditional dependencies through a directed acyclic hypergraph $\mathcal{H}$. Hypergraphs contain many more edges than chain graphs. Thus a Bayesian hypergraph is a more general and powerful framework for studying conditional independence relations that arise in various statistical contexts.

5.3.1 Markov Properties of Bayesian hypergraphs

Analogous to the chain graph case, we can define the Markov properties of a Bayesian hypergraph in a variety of ways. Let $\mathcal{H}$ be a directed acyclic hypergraph with chain components $\{\tau : \tau \in \mathcal{D}\}$. We say that a probability measure $P$ defined on $\mathcal{X} = \times_{\alpha \in V(\mathcal{H})} \mathcal{X}_\alpha$ is:

(HP) pairwise $\mathcal{H}$-Markovian, relative to $\mathcal{H}$, if for every pair $(v, u)$ of non-adjacent vertices in $\mathcal{H}$ with $u \in nd(v)$,
\[
v \perp u \mid nd(v) \setminus \{v, u\}. \qquad (5.2)
\]

(HL) local $\mathcal{H}$-Markovian, relative to $\mathcal{H}$, if for any vertex $v \in V(\mathcal{H})$,
\[
v \perp nd(v) \setminus cl(v) \mid bd(v). \qquad (5.3)
\]

(HG) global $\mathcal{H}$-Markovian, relative to $\mathcal{H}$, if for all $A, B, C \subseteq V$ such that $C$ separates $A$ and $B$ in $\partial(\mathcal{H}_{ant(A \cup B \cup C)})^m$, the moral graph of the (directed) shadow of the smallest ancestral set containing $A \cup B \cup C$, we have
\[
A \perp B \mid C.
\]

Definition 5.5. A Bayesian hypergraph is a triple $(V, \mathcal{H}, P)$ such that $V$ is a set of random variables, $\mathcal{H}$ is a DAH on the vertex set $V$, and $P$ is a multivariate probability distribution on $V$ such that the local Markov property, i.e., (HL), holds with respect to the DAH $\mathcal{H}$.

Given a Bayesian hypergraph $(V, \mathcal{H}, P)$, we call $\mathcal{H}$ the Bayesian hypergraph structure or the underlying DAH of the Bayesian hypergraph. For ease of reference, we simply use $\mathcal{H}$ to denote the Bayesian hypergraph. Moreover, for a Bayesian hypergraph whose underlying DAH is a LWF DAH, we call $\mathcal{H}$ a LWF Bayesian hypergraph.

Remark 8. Observe that by Proposition 5.2 and the definitions of the hypergraph Markov properties, a Bayesian hypergraph has the same pairwise, local and global Markov properties as its shadow, which is a chain graph. Consequently, any separation criteria (e.g., d-separation) for LWF chain graphs also apply to the shadow of the LWF Bayesian hypergraphs. This allows people to work with familiar separation criteria even in the case of Bayesian hypergraphs.

By Remark 8, we can derive the following corollaries from results on the Markov properties of chain graphs:

Corollary 5.6. (HG) $\Rightarrow$ (HL) $\Rightarrow$ (HP).

Furthermore, if we assume (2.2), then the global, local and pairwise Markov properties are equivalent.

Corollary 5.7. Assume that P is such that (2.2) holds for disjoint subsets of V. Then

(HG) $\Longleftrightarrow$ (HL) $\Longleftrightarrow$ (HP).

Proof. This follows from Remark 8 and Theorem 2.1. 

Given a chain graph $G$, a triple $(\alpha, B, \beta)$ is a complex (also called a U-structure (Cox and Wermuth, 1996)) in $G$ if $B$ is a connected subset of a chain component $\tau$, and $\alpha, \beta$ are two non-adjacent vertices in $bd(\tau) \cap bd(B)$. Moreover, $(\alpha, B, \beta)$ is a minimal complex if $B = B'$ whenever $B'$ is a subset of $B$ and $(\alpha, B', \beta)$ is a complex. Frydenberg (Frydenberg, 1990) showed that two chain graphs have the same Markov properties if they have the same underlying undirected graph and the same minimal complexes. In the case of a Bayesian hypergraph, by Remark 8 and the result on the Markov equivalence of chain graphs, we obtain the following conclusion on the Markov equivalence of Bayesian hypergraphs.

Corollary 5.8. Two Bayesian hypergraphs have the same Markov properties if their shadows are Markov equivalent, i.e., their shadows have the same underlying undirected graph and the same minimal complexes.

5.3.2 Factorization according to Bayesian hypergraphs

The factorization of a probability measure $P$ according to a Bayesian hypergraph is similar to that of a chain graph. Before we present the factorization property, let us introduce some additional terminology.

Given a DAH $\mathcal{H}$, we use $\mathcal{H}^u$ to denote the undirected hypergraph obtained from $\mathcal{H}$ by replacing each directed hyperedge $h = (A, B)$ of $\mathcal{H}$ with an undirected hyperedge $A \cup B$. Given a family of sets $\mathcal{F}$, define a partial order $(\mathcal{F}, \leq)$ on $\mathcal{F}$ such that for two sets $A, B \in \mathcal{F}$, $A \leq B$ if and only if $A \subseteq B$. Let $\mathcal{M}(\mathcal{F})$ denote the set of maximal elements in $\mathcal{F}$, i.e., no element in $\mathcal{M}(\mathcal{F})$ contains another element as a subset. When $\mathcal{F}$ is a set of directed hyperedges, we abuse the notation to denote $\mathcal{M}(\mathcal{F}) = \mathcal{M}(\mathcal{F}^u)$.

Let $\mathcal{H}$ be a directed acyclic hypergraph and $\mathcal{D}$ be the canonical DAG of the chain components of $\mathcal{H}$. Assume that a probability distribution $P$ has a density $f$ with respect to some product measure $\mu = \times_{\alpha \in V} \mu_\alpha$ on $\mathcal{X} = \times_{\alpha \in V} \mathcal{X}_\alpha$. We say a probability measure $P$ factorizes according to $\mathcal{H}$ if it has density $f$ such that

(i) $f$ factorizes as in the directed acyclic case:
\[
f(x) = \prod_{\tau \in \mathcal{D}} f(x_\tau \mid x_{pa(\tau)}). \qquad (5.4)
\]

(ii) For each $\tau \in \mathcal{D}$, define $\mathcal{H}^*_\tau$ to be the subhypergraph of $\mathcal{H}_{\tau \cup pa(\tau)}$ containing all edges $h$ in $\mathcal{H}_{\tau \cup pa(\tau)}$ such that $H(h) \subseteq \tau$. Then
\[
f(x_\tau \mid x_{pa(\tau)}) = \prod_{h \in \mathcal{M}(\mathcal{H}^*_\tau)} \psi_h(x), \qquad (5.5)
\]
where the $\psi_h$ are non-negative functions depending only on $x_h$ and
\[
\int_{\mathcal{X}_\tau} \prod_{h \in \mathcal{M}(\mathcal{H}^*_\tau)} \psi_h(x)\, \mu_\tau(dx_\tau) = 1.
\]
Equivalently, we can also write $f(x_\tau \mid x_{pa(\tau)})$ as
\[
f(x_\tau \mid x_{pa(\tau)}) = Z^{-1}(x_{pa(\tau)}) \prod_{h \in \mathcal{M}(\mathcal{H}^*_\tau)} \psi_h(x), \qquad (5.6)
\]
where $Z(x_{pa(\tau)}) = \int_{\mathcal{X}_\tau} \prod_{h \in \mathcal{M}(\mathcal{H}^*_\tau)} \psi_h(x)\, \mu_\tau(dx_\tau)$.

Remark 9. Note that although (LWF) Bayesian hypergraphs are generalizations of Bayesian networks and LWF chain graph models, the underlying graph structures that represent the same factorizations may differ. The motivation behind this representational choice is that it is easier for a modeler to represent induced dependencies explicitly via directed hyperedges, rather than representing the absence of induced dependencies via directed hyperedges. As a consequence, the underlying graph structures of Bayesian networks and chain graphs do not directly migrate to Bayesian hypergraphs.
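For discrete variables, the normalization in (5.5)–(5.6) can be carried out by brute force. The sketch below is illustrative only (the function name and the dictionary-based encoding of potentials are mine, not the thesis's); it computes $f(x_\tau \mid x_{pa(\tau)})$ by multiplying the potentials of the maximal hyperedges of $\mathcal{H}^*_\tau$ and dividing by the sum over all configurations of $\tau$.

```python
# Sketch of Eq. (5.5)/(5.6) for discrete variables: the conditional density of a
# chain component tau is a normalized product of potentials.  Illustrative only.
from itertools import product

def conditional_of_component(tau, potentials, x, states=(0, 1)):
    """tau: ordered list of variables in the chain component;
    potentials: {tuple_of_vars: psi} for the maximal hyperedges of H*_tau;
    x: assignment of all variables involved (including pa(tau))."""
    def unnormalized(assign):
        full = {**x, **assign}
        p = 1.0
        for h, psi in potentials.items():
            p *= psi({v: full[v] for v in h})
        return p
    numer = unnormalized({v: x[v] for v in tau})
    Z = sum(unnormalized(dict(zip(tau, vals)))
            for vals in product(states, repeat=len(tau)))
    return numer / Z
```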

Figure 5.5: A simple Bayesian hypergraph $\mathcal{H}$.

We will illustrate with an example. Consider the graph in Figure 5.5, which can be interpreted as a chain graph structure $G$ or a Bayesian hypergraph structure $\mathcal{H}$. Note that the factorizations under the two interpretations are different. In particular, the factorization according to $G$ is
\[
f_G(x) = f(x_a) f(x_b) \psi_{abc}(x)
\]
for some non-negative function $\psi_{abc}$. On the other hand, the factorization according to $\mathcal{H}$ is
\[
f_{\mathcal{H}}(x) = f(x_a) f(x_b) \psi_{ac}(x) \psi_{bc}(x)
\]
for some non-negative functions $\psi_{ac}, \psi_{bc}$.

Figure 5.6: (1) a chain graph $G$; (2) a Bayesian hypergraph $\mathcal{H}$.

Remark 10. One of the key advantages of Bayesian hypergraphs is that they allow much finer factorizations of probability distributions compared to chain graph models. We illustrate with a simple example in Figure 5.6. Note that in Figure 5.6 (1), the factorization according to $G$ is
\[
f(x) = f(x_a) f(x_b) f(x_{cd} \mid x_{ab}) = f(x_a) f(x_b) \psi_{abcd}(x).
\]
In Figure 5.6 (2), the factorization according to $\mathcal{H}$ is
\[
f(x) = f(x_a) f(x_b) f(x_{cd} \mid x_{ab}) = f(x_a) f(x_b) \psi_{abc}(x) \psi_{abd}(x) \psi_{cd}(x).
\]
Note that although $G$ and $\mathcal{H}$ have the same global Markov properties, the factorization according to $\mathcal{H}$ is finer than the factorization according to $G$. Suppose each of the variables in $\{a, b, c, d\}$ can take $k$ values. Then the factorization according to $G$ will require a conditional probability table of size $k^4$, while the factorization according to $\mathcal{H}$ only needs tables of size $\Theta(k^3)$ asymptotically. Hence, a Bayesian hypergraph model allows much finer factorizations and thus achieves higher memory efficiency.
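The memory comparison in Remark 10 is easy to make concrete. The short computation below (purely illustrative) counts the table entries required by the single factor $\psi_{abcd}$ versus the three factors $\psi_{abc}, \psi_{abd}, \psi_{cd}$ when every variable takes $k$ values.

```python
# Back-of-the-envelope table sizes for Remark 10 (k values per variable).
def table_sizes(k):
    chain_graph = k ** 4                 # one factor over {a, b, c, d}
    hypergraph = 2 * k ** 3 + k ** 2     # psi_abc + psi_abd + psi_cd
    return chain_graph, hypergraph

print(table_sizes(10))   # (10000, 2100): k^4 versus Theta(k^3)
```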

Remark 11. We remark that the factorization formula defined in (5.5) is in fact the most general possible, in the sense that it allows all possible factorizations of a probability distribution admitted by a DAH. In particular, given a Bayesian hypergraph $\mathcal{H}$ and one of its chain components $\tau$, the factorization scheme in (5.5) allows a distinct function for each maximal subset of $\tau \cup pa_{\mathcal{D}}(\tau)$ that intersects $\tau$ ($pa_{\mathcal{D}}$ denotes the parent of $\tau$ in the canonical DAG of $\mathcal{H}$). For each subset $S$ of $\tau \cup pa_{\mathcal{D}}(\tau)$ that does not intersect $\tau$, recall that the factorization in (5.5) can be rewritten as follows:
\[
f(x_\tau \mid x_{pa(\tau)}) = \left( \prod_{h \in \mathcal{M}(\mathcal{H}^*_\tau)} \psi_h(x) \right) \Big/ \left( \int_{\mathcal{X}_\tau} \prod_{h \in \mathcal{M}(\mathcal{H}^*_\tau)} \psi_h(x)\, \mu_\tau(dx_\tau) \right).
\]
Observe that $\psi_S(x)$ is a function that does not depend on the values of variables in $\tau$. Hence $\psi_S(x)$ can be factored out of the integral above and cancels out with itself in $f(x_\tau \mid x_{pa(\tau)})$. Thus, the factorization formula in (5.5) or (5.6) in fact allows distinct functions for all possible maximal subsets of $\tau \cup pa_{\mathcal{D}}(\tau)$.

Table 5.2 lists some factorizations of three random variables and the corresponding BH representation. Entry 1 (top left) corresponds to a three-node Bayesian network: an uncoupled converging connection (unshielded collider) at c. Entry 3 (below entry 1) corresponds to a three-node Bayesian network like the one in entry 1, with the constraint that the conditional probability table factorizes as, for example, in a Noisy-OR functional dependence and, more generally, in a situation for which compositionality holds, such as MIN, MAX, or probabilistic sum (Pearl, 1993; Hájek, Havránek, and Jirousek, 1992; Jensen and Nielsen, 2007). Graphical modeling languages should capture assumptions graphically in a transparent and explicit way, as opposed to hiding them in tables or functions. By this criterion, the Bayesian hypergraph of entry 3 shows the increased power of our new PGM with respect to Bayesian networks and chain graphs. For a detailed example of Noisy-OR functional dependence, consider the (much simplified) heart disease model of (Ghosh and Valtorta, 2000), shown in Figure 5.7, and the family of nodes Obesity (O, with values Yes, No), Diet (D, with values Bad, Good), and Moderate Exercise (M, with values Yes, No).

Table 5.2: Factorizations and corresponding BH representations (only the factorization formulas are reproduced here; the accompanying drawings of the BH structures on vertices a, b, c are omitted, and entries are numbered left to right, top to bottom)

Entry 1: $f(x) = f(x_a) f(x_b) \psi_{abc}(x)$  ·  Entry 2: $f(x) = f(x_{ab}) \psi_{abc}(x)$
Entry 3: $f(x) = f(x_a) f(x_b) \psi_{ac}(x) \psi_{bc}(x)$  ·  Entry 4: $f(x) = f(x_{ab}) \psi_{ac}(x) \psi_{bc}(x)$
Further entries: $f(x) = f(x_{ab}) \psi_{ac}(x)$;  $f(x) = f(x_a) f(x_b) \psi_{ac}(x)$;  $f(x) = f(x_a) f(x_b) f(x_c)$;  $f(x) = f(x_{ab}) f(x_c)$;  $f(x) = \psi_{ac}(x) \psi_{bc}(x)$;  $f(x) = f(x_c) \psi_{ac}(x) \psi_{ab}(x)$;  $f(x) = f(x_c) \psi_{ac}(x) \psi_{bc}(x)$;  $f(x) = f(x_c) \psi_{ab}(x) \psi_{ac}(x) \psi_{bc}(x)$

The Noisy-OR model is used to compute the conditional probability of O given M and D. Good diet prevents obesity, except when an inhibiting mechanism prevents that with probability $q_{D \to O}$; moderate exercise prevents obesity except when an inhibiting mechanism prevents that with probability $q_{M \to O}$. The inhibiting mechanisms are independent, and therefore the probability of being obese given both a good diet and moderate exercise is $1 - q_{D \to O}\, q_{M \to O}$. Equivalently, the probability of not being obese given both a good diet and moderate exercise is $q_{D \to O}\, q_{M \to O}$. If we consider a situation with only the variables just described, the joint probability of Diet, Moderate Exercise, and Obesity factorizes exactly as in the Bayesian hypergraph of entry 3, with the caution that only half of the entries in the joint probability table are computed directly; the others are computed by the complement to one.
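The conditional probability table just described can be assembled with a few lines of code. The sketch below is illustrative only: the function and the parameter values q_d, q_m are made up; it multiplies the inhibitor probabilities of the active preventers and takes the complement, matching the entry stated above for a good diet combined with moderate exercise.

```python
# Illustrative Noisy-OR style computation for the Obesity example.
# q_d and q_m are hypothetical inhibitor probabilities, not values from the thesis.
def p_obese(good_diet, moderate_exercise, q_d=0.2, q_m=0.3):
    q = 1.0
    if good_diet:
        q *= q_d              # inhibiting mechanism for Diet -> Obesity
    if moderate_exercise:
        q *= q_m              # inhibiting mechanism for Exercise -> Obesity
    return 1.0 - q            # P(Obesity = yes); P(Obesity = no) = q

print(p_obese(True, True))    # approximately 1 - q_d * q_m = 0.94
```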

Figure 5.7: A model of heart disease.

Figure 5.8: An example of Noisy-OR: obesity (Moderate Exercise and Diet as parents of Obesity).

Similarly, entry 2 corresponds to a three-node chain graph, while entry 4 may be used to model a situation in which variables a and b are related by being effects of a common latent cause, while the mechanisms by which they, in turn, affect variable c are causally independent. While such a situation may be unusual, it is notable that it can be represented graphically in Bayesian hypergraphs. Therefore, the Bayesian hypergraph of entry 4 shows the increased power of our new PGM with respect to Bayesian networks and chain graphs. For a detailed example, consider again the model shown in Figure 5.7 and, this time, the structure in which Moderate Exercise, Serum LDL (S-LDL), Serum Triglicerides (S-T), and Cholesterol HDL (C-HDL) Ratio are parents (possible causes) of Atheriosclerosis, and Diet is a parent of S-LDL, S-T, and C-HDL.

Figure 5.9: An example of Noisy-OR: Atheriosclerosis.

As in the previous example, the Noisy-OR assumption is made, and therefore, after marginalization of Diet, the computation of the joint probability of Moderate Exercise, S-LDL, S-T, C-HDL, and Atheriosclerosis factorizes as in entry 4, with a slight generalization due to the presence of four parents instead of two. As in entry 4, the parents (causes) are not marginally independent, due to their common dependence on Diet, but the conditional probability of the effect decomposes multiplicatively.

Moreover, as illustrated in Remark 10 and Table 5.2, a Bayesian hypergraph enables much finer factorization than a chain graph. In the factorization w.r.t. a chain graph $G$ with chain components $\{\tau : \tau \in \mathcal{D}\}$, $f(x_\tau \mid x_{pa(\tau)})$ is only allowed to be further factorized based on the maximal cliques in the moral graph of $G_{\tau \cup pa(\tau)}$, which is rather restrictive. In comparison, a Bayesian hypergraph $\mathcal{H}$ allows factorization based on the maximal elements in all subsets of the power set of $\tau \cup pa_{\mathcal{D}}(\tau)$. Finer factorizations have the advantage of memory saving in terms of the size of the probability table required. Moreover, factorizations according to Bayesian hypergraphs can be obtained directly by reading off the hyperedges instead of having to search for all maximal cliques in the moral graph (as in the chain graph case). Hence, Bayesian hypergraphs enjoy an advantage in heuristic adequacy as well as representational adequacy. Next, we investigate the relationship between the factorization property and the Markov properties of Bayesian hypergraphs.

Proposition 5.9. Let $P$ be a probability measure with density $f$ that factorizes according to a DAH $\mathcal{H}$. Then
\[
\text{(HF)} \Rightarrow \text{(HG)} \Rightarrow \text{(HL)} \Rightarrow \text{(HP)}.
\]

Proof. It suffices to show (HF) $\Rightarrow$ (HG), since the other implications are proven in Corollary 5.6. Let $A, B, C \subseteq V(\mathcal{H})$ be such that $C$ separates $A$ and $B$ in $G = \partial(\mathcal{H}_{ant(A \cup B \cup C)})^m$. Let $\tilde{A}$ be the connectivity components in $G \setminus C$ containing $A$ and let $\tilde{B} = V \setminus (\tilde{A} \cup C)$. Note that in $\partial(\mathcal{H}_{ant(A \cup B \cup C)})^m$, every hyperedge $h = (T, H)$ becomes a complete graph on the vertex set $T \cup H$ because of moralization. Observe that since $C$ separates $A$ and $B$ in $G$, for every hyperedge $h = (T, H)$, $T \cup H$ is either a subset of $\tilde{A} \cup C$ or of $\tilde{B} \cup C$. Let $\mathcal{H}' = \mathcal{H}_{ant(A \cup B \cup C)}$ and $\{\tau : \tau \in \mathcal{D}'\}$ be the chain components of $\mathcal{H}'$. For each $\tau \in \mathcal{D}'$, define $\mathcal{H}^*_\tau$ to be the subhypergraph of $\mathcal{H}'_{\tau \cup pa(\tau)}$ containing all edges $h$ in $\mathcal{H}'_{\tau \cup pa(\tau)}$ such that $H(h) \subseteq \tau$. We then obtain from the (HF) property that
\[
f_{\mathcal{H}'}(x) = \prod_{\tau \in \mathcal{D}'} \prod_{h \in \mathcal{M}(\mathcal{H}^*_\tau)} \psi_h(x) = \phi_1(x_{\tilde{A} \cup C})\, \phi_2(x_{\tilde{B} \cup C})
\]
for some non-negative functions $\phi_1, \phi_2$. By integrating over the chain components not in $ant(A \cup B \cup C)$, it follows that
\[
f(x) = \psi_1(x_{\tilde{A} \cup C})\, \psi_2(x_{\tilde{B} \cup C})
\]
for some non-negative functions $\psi_1, \psi_2$. Hence, we have that
\[
\tilde{A} \perp \tilde{B} \mid C.
\]
By the (S2: Decomposition) property of conditional independences, it follows that $A \perp B \mid C$. $\square$

Remark 12. Due to the generality of factorizations according to Bayesian hypergraphs, the reverse direction of the implication (HF) $\Rightarrow$ (HG) in Proposition 5.9 is generally not true. We will illustrate with the following example.

Figure 5.10: Two Bayesian hypergraphs $\mathcal{H}_1$ (left) and $\mathcal{H}_2$ (right) with the same global Markov properties but different forms of factorizations.

Consider the two Bayesian hypergraphs $\mathcal{H}_1$ and $\mathcal{H}_2$ in Figure 5.10. Note that they have the same global Markov properties, since the shadows of $\mathcal{H}_1$ and $\mathcal{H}_2$ are the same. However, the factorizations according to $\mathcal{H}_1$ and $\mathcal{H}_2$ are different. If we let $f_1, f_2$ denote the factorizations represented by $\mathcal{H}_1$ and $\mathcal{H}_2$, then
\[
f_1(x) = f_1(x_a) f_1(x_b) f_1(x_{cd} \mid x_{ab}) = f_1(x_a) f_1(x_b) \psi_{abcd}(x),
\]
while
\[
f_2(x) = f_2(x_a) f_2(x_b) f_2(x_{cd} \mid x_{ab}) = f_2(x_a) f_2(x_b) \psi_{abc}(x) \psi_{abd}(x) \psi_{cd}(x).
\]
This shows that (HG) does not generally imply (HF).

Remark 13. We remark that in our model, two Bayesian hypergraphs that are Markov equivalent may not be factorization-equivalent. This also shows that in general (HG) does not imply (HF). Below we present a combinatorial argument for why this cannot be resolved. We claim that the number of possible forms of factorizations admitted by Bayesian hypergraphs is much larger than the number of conditional independence statements over the same set of variables. First, observe that the number of conditional independence statements on $n$ variables is upper bounded by the number of ways to partition $n$ elements into four disjoint sets $A, B, C, D$. Each such partition induces a conditional independence statement $A \perp B \mid C$, and $D$ is the set of unused variables. There are $4^n$ ways to partition $n$ distinct elements into four ordered pairwise disjoint sets. Hence there are at most $4^n$ conditional independence statements on $n$ variables.

On the other hand, we give a simple lower bound on the number of directed acyclic hypergraphs $\mathcal{H}$ by counting only those whose vertex sets can be partitioned into two sets $A, B$ such that $|A| = |B| = n/2$ and every fully directed edge has its tail only from $A$ and its head only from $B$. Observe that there are $2^{n/2}$ subsets of $A$ and of $B$ respectively. By Sperner's theorem (Sperner, 1928), the largest number of subsets of $A$ none of which contains any other is $\binom{n/2}{n/4}$; the same holds for $B$. Hence there are at least $\binom{n/2}{n/4}^2$ possible directed hyperedges such that, when viewed as undirected hyperedges, no edge contains any other as a subset. Therefore, there are at least
\[
2^{\binom{n/2}{n/4}^2} = \Theta\!\left( 2^{\frac{2^{n+2}}{\pi n}} \right)
\]
distinct factorizations admitted by DAHs whose directed edges have their tails only from $A$ and their heads only from $B$. Note that this number is much less than the total number of distinct factorizations admitted by DAHs, but is already much bigger than $4^n$, the upper bound on the number of conditional independence statements on $n$ variables. Hence, there are many more factorizations allowed by Bayesian hypergraphs than there are conditional independence statements on $n$ variables, which suggests that (HG) does not imply (HF) in general.
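The gap in Remark 13 is visible even for small $n$. The snippet below (illustrative; pick $n$ divisible by 4) compares the $4^n$ upper bound on conditional independence statements with the $2^{\binom{n/2}{n/4}^2}$ lower bound on factorizations.

```python
# Compare the two counts from Remark 13 for a small n divisible by 4.
from math import comb

def counts(n):
    ci_upper = 4 ** n                                        # at most 4^n CI statements
    factorization_lower = 2 ** (comb(n // 2, n // 4) ** 2)   # Sperner-based lower bound
    return ci_upper, factorization_lower

print(counts(8))   # (65536, 68719476736): the factorization count already dwarfs 4^n
```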

5.3.3 Comparison between LWF chain graph and LWF Bayesian hypergraph

Let $G$ be a LWF chain graph and $\mathcal{H}$ be the canonical LWF DAH constructed from $G$ as described in Section 5.1.4. In this section, we show that $\mathcal{H}$ and $G$ admit the same Markov properties and factorization decomposition. Hence Bayesian hypergraphs generalize the LWF chain graphs.

Theorem 5.10. Let $G$ be a chain graph and $\mathcal{H}$ be its canonical (LWF) DAH. Then a probability measure $P$ satisfies the following:

(i) $P$ is pairwise $G$-Markovian if and only if $P$ is pairwise $\mathcal{H}$-Markovian.

(ii) $P$ is local $G$-Markovian if and only if $P$ is local $\mathcal{H}$-Markovian.

(iii) $P$ is global $G$-Markovian if and only if $P$ is global $\mathcal{H}$-Markovian.

Proof. By Proposition 5.3, $nb_G(v) = nb_{\mathcal{H}}(v)$ and $pa_G(v) = pa_{\mathcal{H}}(v)$. Hence the same equality holds for $nd_G(v)$, $bd_G(v)$, $cl_G(v)$, which gives us (i) and (ii) by the definition of the Markov properties. (iii) results from the fact that for all $A, B, C \subseteq V(G) = V(\mathcal{H})$, $G_{ant(A \cup B \cup C)} = \partial(\mathcal{H}_{ant(A \cup B \cup C)})$. $\square$

Theorem 5.11. Let $G$ be a chain graph and $\mathcal{H}$ be its canonical LWF DAH. Then a probability measure $P$ with density $f$ factorizes according to $G$ if and only if $f$ factorizes according to $\mathcal{H}$.

Proof. Note that by Proposition 5.3, $G$ and $\mathcal{H}$ have the same set of chain components $\{\tau : \tau \in \mathcal{D}\}$. It suffices to show that for every $\tau \in \mathcal{D}$, there exists a bijective map $\phi$ from the set of maximal edges in $(\mathcal{H}^*_\tau)^u$ to the set of maximal cliques in $(G_{\tau \cup pa(\tau)})^m$ such that for each maximal edge $h$ in $(\mathcal{H}^*_\tau)^u$, $\phi(h) = h$. For ease of reference, let $\mathcal{H}' = (\mathcal{H}^*_\tau)^u$ and let $G' = (G_{\tau \cup pa(\tau)})^m$. Define $\phi(h) = h$. We need to show two things: (1) for every maximal edge $h$ in $\mathcal{H}'$, $h$ induces a maximal clique in $G'$; (2) for every maximal clique $h$ in $G'$, $h$ is a maximal edge in $\mathcal{H}'$.

We first show (1). Suppose that $h$ is a maximal edge in $\mathcal{H}'$. Clearly, $h$ induces a clique in $G'$ because of the moralization. Suppose for the sake of contradiction that $h$ is not maximal in $G'$, i.e., there is a maximal clique $h'$ in $G'$ such that $h \subsetneq h'$. Let $h' = A \cup B$ where $A \subseteq pa(\tau)$ and $B \subseteq \tau$. There are two cases:

Case 1: $A = \emptyset$ or $B = \emptyset$. Note that $B$ cannot be an empty set, since $h$ is an edge in $\mathcal{H}'$ and every edge in $\mathcal{H}' = (\mathcal{H}^*_\tau)^u$ intersects $\tau$ by definition. If $A = \emptyset$, then $h'$ is a maximal clique in $\tau$. By Phase I of the construction, $h'$ either is a hyperedge in $\mathcal{H}'$ or is contained in the head of a hyperedge. In either case, since $h \subsetneq h'$, this contradicts the fact that $h$ is a maximal edge in $\mathcal{H}'$.

Case 2: $A \neq \emptyset$ and $B \neq \emptyset$. Since $A \cup B$ induces a maximal clique in $G'$, it follows that for every $a \in A$, $b \in B$, $a \in pa(b)$. Hence $B$ is contained in the common children of the elements of $A$. Recall that in Phase I of our construction, for every $v$, $(\{v\}, K_v)$ is a hyperedge in $\mathcal{H}$, where $K_v$ is a maximal clique in the children of $v$ in $G$. Hence there exists $\mathcal{F} \subseteq E(\mathcal{H}^*_\tau)$ such that $B \subseteq \bigcap_{h \in \mathcal{F}} H(h)$. By maximality of $h'$, $B = \bigcap_{h \in \mathcal{F}} H(h)$. Now by our construction in Phase II, there exists a hyperedge
\[
h'' = \Bigl( \bigcup_{\substack{h \in E(\mathcal{H}^*_\tau) \\ B \subseteq H(h)}} T(h),\; B \Bigr) \in E(\mathcal{H}^*_\tau).
\]
Since every element in $A$ is a parent of every element in $B$, it follows that
\[
A \subseteq \bigcup_{\substack{h \in E(\mathcal{H}^*_\tau) \\ B \subseteq H(h)}} T(h).
\]
By maximality of $A$, it follows that
\[
h \subsetneq h' = h'' \in E(\mathcal{H}^*_\tau),
\]
which contradicts the maximality of $h$ again.

Hence in both cases, we obtain by contradiction that $h$ induces a maximal clique in $G'$.

It remains to show (2). Suppose $h$ induces a maximal clique in $G'$. Observe that every hyperedge in $\mathcal{H}'$ induces a clique in $G'$. Similar logic and the case analysis above apply, and it is not hard to see that $h$ is a maximal edge in $\mathcal{H}'$. We leave the details to the reader. $\square$

Example 21. In Figure 5.11, both $G$ and its canonical LWF DAH $\mathcal{H}$ have chain components
\[
\{\{a\}, \{b\}, \{c\}, \{d, e, f\}\}.
\]

Figure 5.11: (1) a simple chain graph $G$; (2) the moral graph $G^m$ of $G$; (3) $\mathcal{M}(\mathcal{H})$, where $\mathcal{H}$ is the canonical LWF DAH of $G$.

Figure 5.11 (2) shows the moral graph $G^m$ of $G$. The maximal cliques in $G^m$ are $\{ade, abce, cef\}$. Thus, by the factorization property of LWF chain graphs, a probability measure $P$ with density $f$ that factorizes according to $G$ satisfies
\[
f(x) = f(x_a) f(x_b) f(x_c) f(x_{d,e,f} \mid x_{a,b,c}) = f(x_a) f(x_b) f(x_c) \psi_{ade}(x) \psi_{abce}(x) \psi_{cef}(x).
\]
Figure 5.11 (3) gives the undirected hypergraph with edge set $\mathcal{M}(\mathcal{H})$. Observe that $\mathcal{M}(\mathcal{H})$ has the same members as the set of maximal cliques in $G^m$. Hence, by the factorization property of Bayesian hypergraphs, they admit the same factorization.

5.4 Intervention in Bayesian hypergraphs

Formally, intervention in Bayesian hypergraphs can be defined analogously to intervention in LWF chain graphs (Lauritzen and Richardson, 2002). In this section, we give graphical procedures that are consistent with the intervention formulas for chain graphs (Equations (5.7), (5.8)) and for Bayesian hypergraphs (Equations (5.9), (5.10)). Before we present the details, we need some additional definitions and tools to determine when factorizations according to two chain graphs or DAHs are equivalent in the sense that they can be written as products of the same type of functions (functions that depend on the same set of variables).

We say two chain graphs $G_1, G_2$ admit the same factorization decomposition if for every probability density $f$ that factorizes according to $G_1$, $f$ also factorizes according to $G_2$, and vice versa. Similarly, two DAHs $\mathcal{H}_1, \mathcal{H}_2$ admit the same factorization decomposition if for every probability density $f$ that factorizes according to $\mathcal{H}_1$, $f$ also factorizes according to $\mathcal{H}_2$, and vice versa.

5.4.1 Factorization equivalence and intervention in chain graphs

In this subsection, we give graphical procedures to model intervention based on the formula introduced by Lauritzen and Richardson in (Lauritzen and Richardson, 2002). Let us first give some background. In many statistical contexts, we would like to modify the distribution of a variable $Y$ by intervening externally and forcing the value of another variable $X$ to be $x$. This is commonly referred to as conditioning by intervention or conditioning by action and denoted by $\Pr(y \,\|\, x)$ or $\Pr(y \mid X \leftarrow x)$. Other expressions such as $\Pr(Y_x = y)$, $P_{man(x)}(y)$, $set(X = x)$, $X = \hat{x}$ or $do(X = x)$ have also been used to denote intervention conditioning (Splawa-Neyman, Dabrowska, and Speed, 1990; Rubin, 1974; Spirtes, Glymour, and Scheines, 2000; Pearl, 1993, 1995, 2009).

Let $G$ be a chain graph with chain components $\{\tau : \tau \in \mathcal{D}\}$. Moreover, assume further that a subset $A$ of variables in $V(G)$ is set such that for every $a \in A$, $x_a = a_0$. Lauritzen and Richardson, in (Lauritzen and Richardson, 2002), generalized the conditioning-by-intervention formula for DAGs and gave the following formula for intervention in chain graphs (where it is understood that the probability of any configuration of variables inconsistent with the intervention is zero). A probability density $f$ factorizes according to $G$ (with $A$ intervened) if

\[
f(x \,\|\, x_A) = \prod_{\tau \in \mathcal{D}} f(x_{\tau \setminus A} \mid x_{pa(\tau)}, x_{\tau \cap A}). \qquad (5.7)
\]
Moreover, for each $\tau \in \mathcal{D}$,
\[
f(x_{\tau \setminus A} \mid x_{pa(\tau)}, x_{\tau \cap A}) = Z^{-1}(x_{pa(\tau)}, x_{\tau \cap A}) \prod_{h \in \mathcal{C}} \psi_h(x), \qquad (5.8)
\]
where $\mathcal{C}$ is the set of maximal cliques in $(G_{\tau \cup pa(\tau)})^m$ and $Z(x_{pa(\tau)}, x_{\tau \cap A}) = \int_{\mathcal{X}_{\tau \setminus A}} \prod_{h \in \mathcal{C}} \psi_h(x)\, \mu_{\tau \setminus A}(dx_{\tau \setminus A})$.

Definition 5.12. Let $G_1$ and $G_2$ be two chain graphs. Given subsets $A_1 \subseteq V(G_1)$ and $A_2 \subseteq V(G_2)$, we say $(G_1, A_1)$ and $(G_2, A_2)$ are factorization-equivalent (this term was defined for a different purpose in Studený, Roverato, and Štěpánová (2009)) if they become the same chain graph after removing from $G_i$ all vertices in $A_i$ together with the edges incident to vertices in $A_i$, for $i \in \{1, 2\}$. Typically, $A_i$ is a set of constant variables in $V(G_i)$ created by intervention. Moreover, we say $(G_1, A_1)$, $(G_2, A_2)$ admit the same factorization decomposition if every probability density $f$ that factorizes according to $G_1$ with $A_1$ intervened also factorizes according to $G_2$ with $A_2$ intervened, and vice versa.

Theorem 5.13. Let $G_1$ and $G_2$ be two chain graphs defined on the same set of variables $V$. Moreover, a common set of variables $A$ in $V$ is set by intervention such that for every $a \in A$, $x_a = a_0$. If $(G_1, A)$ and $(G_2, A)$ are factorization-equivalent, then $G_1$ and $G_2$ admit the same factorization decomposition.

Proof. Let $G_0$ be the chain graph obtained from $G_1$ by removing all vertices in $A$ and the edges incident to $A$. It suffices to show that $G_1$ and $G_2$ both admit the same factorization decomposition as $G_0$. Let $\mathcal{D}_1$, $\mathcal{D}_0$ be the sets of chain components of $G_1$ and $G_0$ respectively. Let $\tau \in \mathcal{D}_1$ be an arbitrary chain component of $G_1$. By the factorization formula in (5.8), it follows that
\[
f(x_{\tau \setminus A} \mid x_{pa(\tau)}, x_{\tau \cap A}) = Z^{-1}(x_{pa(\tau)}, x_{\tau \cap A}) \prod_{h \in \mathcal{C}} \psi_h(x),
\]
where $\mathcal{C}$ is the set of maximal cliques in $(G_{\tau \cup pa(\tau)})^m$ and $Z(x_{pa(\tau)}, x_{\tau \cap A}) = \int_{\mathcal{X}_{\tau \setminus A}} \prod_{h \in \mathcal{C}} \psi_h(x)\, \mu_{\tau \setminus A}(dx_{\tau \setminus A})$. Notice that for any maximal clique $h_1 \in \mathcal{C}$ such that $h_1 \cap A = \emptyset$, $h_1$ is also a clique in $(G_0[\tau \setminus A])^m$. For $h_1 \in \mathcal{C}$ with $h_1 \cap A \neq \emptyset$, there are two cases:

Case 1: $(h_1 \cap \tau) \setminus A \neq \emptyset$. In this case, observe that $h_1 \setminus A$ is also a clique in $(G_0[\tau \setminus A])^m$, and thus is contained in some maximal clique $h'$ in $(G_0[\tau \setminus A])^m$. Since all variables in $A$ are pre-set as constants, it follows that $\psi_{h_1}(x)$ also appears in a factor in the factorization of $f$ according to $G_0$.


Case 2: $h_1 \cap \tau \subseteq A$. In this case, note that $h_1 \cap \tau$ is disjoint from $\tau \setminus A$. Hence $\psi_{h_1}(x)$ appears as a factor independent of $x_{\tau \setminus A}$ in both $Z^{-1}(x_{pa(\tau)}, x_{\tau \cap A})$ and $\prod_{h \in \mathcal{C}} \psi_h(x)$, and so cancels out with itself.

Thus it follows that every probability density $f$ that factorizes according to $G_1$ also factorizes according to $G_0$. On the other hand, it is easy to see that for every $\tau' \in \mathcal{D}_0$ and every maximal clique $h'$ in $(G_0[\tau'])^m$, $h'$ is contained in some maximal clique $h$ in $(G_1[\tau])^m$ for some $\tau \in \mathcal{D}_1$. Hence we can conclude that $G_1$ and $G_0$ admit the same factorization decomposition. The above argument also works for $G_2$ and $G_0$. Thus, $G_1$ and $G_2$ admit the same factorization decomposition. $\square$

We now define a graphical procedure (call it the redirection procedure) that is consistent with the intervention formulas in Equations (5.7) and (5.8). Let $G$ be a chain graph. Given an intervened set of variables $A \subseteq V(G)$, let $\hat{G}$ be the chain graph obtained from $G$ by performing the following operation: for every $u \in A$ and every undirected edge $e = \{u, w\}$ containing $u$, replace $e$ by a directed edge from $u$ to $w$; finally, remove all the directed edges that point to some vertex in $A$. By replacing the undirected edge with a directed edge, we replace any feedback mechanisms that include a variable in $A$ with a causal mechanism. The intuition behind the procedure is the following. Since a variable that is set by intervention cannot be modified, the symmetric feedback relation is turned into an asymmetric causal one. Similarly, we can justify this graphical procedure as equivalent to removing the variables in $A$ from some equations in the Gibbs process on top of p. 338 of (Lauritzen and Richardson, 2002), as Lauritzen and Richardson (Richardson, 2018) did for Equation (18) in (Lauritzen and Richardson, 2002).
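The redirection procedure itself is a purely syntactic graph transformation, so it is easy to prototype. The following is a small illustrative sketch (the function name and edge encodings are mine): undirected edges incident to an intervened vertex become arrows out of it, and arrows into intervened vertices are deleted.

```python
# Sketch of the redirection procedure for a chain graph under intervention on A.
def redirect_chain_graph(undirected, directed, A):
    A = set(A)
    new_directed = set(directed)
    new_undirected = set()
    for (u, w) in undirected:
        if u in A:
            new_directed.add((u, w))      # u -- w becomes u -> w
        elif w in A:
            new_directed.add((w, u))      # u -- w becomes w -> u
        else:
            new_undirected.add((u, w))
    # finally, drop every directed edge pointing into an intervened vertex
    new_directed = {(u, w) for (u, w) in new_directed if w not in A}
    return new_undirected, new_directed
```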

Theorem 5.14. Let $G$ be a chain graph with a subset of variables $A \subseteq V(G)$ set by intervention such that for every $a \in A$, $x_a = a_0$. Let $\hat{G}$ be obtained from $G$ by the redirection procedure. Then $G$ and $\hat{G}$ admit the same factorization decomposition.

Proof. It is not hard to see that removing from $\hat{G}$ and $G$ all vertices in $A$ and all edges incident to $A$ results in the same chain graph. Hence, by Theorem 5.13, $G$ and $\hat{G}$ admit the same factorization decomposition. $\square$

Figure 5.12: (a) a chain graph $G$; (b) the graph $\hat{G}$ obtained from $G$ through the redirection procedure; (c) the graph $G_0$ obtained from $G$ by deleting the variables in $A$.

Example 22. Consider the chain graph $G$ shown in Figure 5.12. Let $\hat{G}$ be the graph obtained from $G$ through the redirection procedure described in this subsection. Let $G_0$ be the chain graph obtained from $G$ by deleting the vertex $c$ and the edges incident to $c$. We will compare the factorization decomposition according to the formulas (5.7), (5.8) as well as the graph structures $\hat{G}$ and $G_0$.

By the formulas (5.7) and (5.8) proposed in (Lauritzen and Richardson, 2002), when $x_c$ is set to $c_0$ by intervention,
\[
f(x \,\|\, x_c) = f(x_a) f(x_b) f(x_{de} \mid x_{abc_0}) = f(x_a) f(x_b) \frac{\psi_{ac_0d}(x)\, \psi_{abde}(x)}{\sum_{d,e} \psi_{ac_0d}(x)\, \psi_{abde}(x)}.
\]
Now consider the factorization according to $\hat{G}$. The chain components of $\hat{G}$ are $\{\{a\}, \{b\}, \{c\}, \{d, e\}\}$, with $x_c$ set to be $c_0$. The factorization according to $\hat{G}$ is as follows:
\[
f_{\hat{G}}(x \,\|\, x_c) = f_{\hat{G}}(x_a) f_{\hat{G}}(x_b) f_{\hat{G}}(x_c) f_{\hat{G}}(x_{de} \mid x_{abc_0}) = f_{\hat{G}}(x_a) f_{\hat{G}}(x_b) f_{\hat{G}}(x_c) \frac{\psi_{ac_0d}(x)\, \psi_{abde}(x)}{\sum_{d,e} \psi_{ac_0d}(x)\, \psi_{abde}(x)},
\]
where $f_{\hat{G}}(x_c) = 1$ when $x_c = c_0$ and 0 otherwise. Hence $G$ and $\hat{G}$ admit the same factorization.

Now consider the factorization according to $G_0$. The chain components of $G_0$ are $\{\{a\}, \{b\}, \{d, e\}\}$. The factorization according to $G_0$ is as follows:
\[
f_0(x) = f_0(x_a) f_0(x_b) f_0(x_{de} \mid x_{ab}) = f_0(x_a) f_0(x_b) \frac{\psi_{ad}(x)\, \psi_{abde}(x)}{\sum_{d,e} \psi_{ad}(x)\, \psi_{abde}(x)}.
\]
Observe that $f_0(x)$ has the same form of decomposition as $f(x \,\|\, x_c)$, since $x_c$ is set to be $c_0$ in $\psi_{ac_0d}(x)$ (with the understanding that the probability of any configuration of variables with $x_c \neq c_0$ is zero). Hence we can conclude that $G$, $\hat{G}$ (with $x_c$ intervened) and $G_0$ admit the same factorization decomposition.

5.4.2 Factorization equivalence and intervention in Bayesian hypergraphs

Intervention in Bayesian hypergraphs can be modeled analogously to the case of chain graphs. We use the same notation as before. Let $\mathcal{H}$ be a DAH and $\{\tau : \tau \in \mathcal{D}\}$ be its chain components. Moreover, assume further that a subset $A$ of variables in $V(\mathcal{H})$ is set such that for every $a \in A$, $x_a = a_0$. Then a probability density $f$ factorizes according to $\mathcal{H}$ (with $A$ intervened) as follows (where it is understood that the probability of any configuration of variables inconsistent with the intervention is zero):
\[
f(x \,\|\, x_A) = \prod_{\tau \in \mathcal{D}} f(x_{\tau \setminus A} \mid x_{pa(\tau)}, x_{\tau \cap A}). \qquad (5.9)
\]
For each $\tau \in \mathcal{D}$, define $\mathcal{H}^*_\tau$ to be the subhypergraph of $\mathcal{H}_{\tau \cup pa_{\mathcal{D}}(\tau)}$ containing all edges $h$ in $\mathcal{H}_{\tau \cup pa(\tau)}$ such that $H(h) \subseteq \tau$; then
\[
f(x_{\tau \setminus A} \mid x_{pa(\tau)}, x_{\tau \cap A}) = Z^{-1}(x_{pa(\tau)}, x_{\tau \cap A}) \prod_{h \in \mathcal{M}(\mathcal{H}^*_\tau)} \psi_h(x), \qquad (5.10)
\]
where $Z(x_{pa(\tau)}, x_{\tau \cap A}) = \int_{\mathcal{X}_{\tau \setminus A}} \prod_{h \in \mathcal{M}(\mathcal{H}^*_\tau)} \psi_h(x)\, \mu_{\tau \setminus A}(dx_{\tau \setminus A})$ and the $\psi_h$ are non-negative functions that depend only on $x_h$.

Definition 5.15. Let $\mathcal{H}_1$ and $\mathcal{H}_2$ be two Bayesian hypergraphs. Given a subset of variables $A_1 \subseteq V(\mathcal{H}_1)$ and $A_2 \subseteq V(\mathcal{H}_2)$, we say $(\mathcal{H}_1, A_1)$ and $(\mathcal{H}_2, A_2)$ are factorization-equivalent if performing the following operations to $\mathcal{H}_1$ and $\mathcal{H}_2$ results in the same directed acyclic hypergraph:

(i) Deleting all hyperedges with empty head, i.e., hyperedges of the form $(S, \emptyset)$.

(ii) Deleting every hyperedge that is contained in some other hyperedge, i.e., deleting $h$ if there is another $h'$ such that $T(h) \subseteq T(h')$ and $H(h) \subseteq H(h')$.

(iii) Shrinking all hyperedges of $\mathcal{H}_i$ containing vertices in $A_i$, i.e., replacing every hyperedge $h$ of $\mathcal{H}_i$ by $h' = (T(h) \setminus A_i, H(h) \setminus A_i)$, for $i \in \{1, 2\}$.

Typically, $A_i$ is a set of constant variables in $V(\mathcal{H}_i)$ created by intervention.
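The three reduction operations in Definition 5.15 can be applied mechanically; two DAHs are then factorization-equivalent exactly when their reduced forms coincide. The sketch below is illustrative only (a single-pass reduction, with hyperedges encoded as (tail, head) pairs of frozensets).

```python
# Sketch of the reduction in Definition 5.15: shrink by A, drop empty heads,
# and drop hyperedges contained in another hyperedge.
def reduce_dah(hyperedges, A):
    A = frozenset(A)
    shrunk = {(t - A, h - A) for (t, h) in hyperedges}             # (iii) shrink
    shrunk = {(t, h) for (t, h) in shrunk if h}                     # (i) empty heads
    return {e for e in shrunk                                       # (ii) dominated edges
            if not any(e != f and e[0] <= f[0] and e[1] <= f[1] for f in shrunk)}

def factorization_equivalent(h1, a1, h2, a2):
    return reduce_dah(h1, a1) == reduce_dah(h2, a2)
```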

Theorem 5.16. Let $\mathcal{H}_1$ and $\mathcal{H}_2$ be two DAHs defined on the same set of variables $V$. Moreover, a common set of variables $A$ in $V$ is set by intervention such that for every $a \in A$, $x_a = a_0$. If $(\mathcal{H}_1, A)$ and $(\mathcal{H}_2, A)$ are factorization-equivalent, then $\mathcal{H}_1$ and $\mathcal{H}_2$ admit the same factorization decomposition.

Proof. In a way similar to the proof of Theorem 5.13, let $\mathcal{H}_0$ be the DAH obtained from $\mathcal{H}_1$ (or $\mathcal{H}_2$) by performing the operations above repeatedly. Let $\mathcal{D}_1$ and $\mathcal{D}_0$ be the sets of chain components of $\mathcal{H}_1$ and $\mathcal{H}_0$ respectively. First, note that performing operation (i) does not affect the factorization, since hyperedges of the form $h = (S, \emptyset)$ never appear in the factorization decomposition, due to the fact that $H(h) \cap \tau = \emptyset$ for every $\tau \in \mathcal{D}_1$. Secondly, (ii) does not change the factorization decomposition either, since if one hyperedge $h$ is contained in another hyperedge $h'$ as defined, then $\psi_h(x)$ can simply be absorbed into $\psi_{h'}(x)$ by replacing $\psi_{h'}(x)$ with $\psi_{h'}(x) \cdot \psi_h(x)$.

Now let $\tau$ be an arbitrary chain component of $\mathcal{D}_1$ and $h_1 \in \mathcal{H}_1[\tau]^*$, i.e., the set of hyperedges in $\mathcal{H}_1$ whose head intersects $\tau$. Suppose that $\tau$ is separated into several chain components $\tau'_1, \tau'_2, \ldots, \tau'_t$ in $\mathcal{H}_0$ because of the shrinking operation. If $h_1 \cap A = \emptyset$, then $h_1$ is also a hyperedge in $\mathcal{H}_0[\tau \setminus A]^*$. If $h_1 \cap A \neq \emptyset$, there are two cases:

Case 1: $H(h_1) \subseteq A$. Then, since the variables in $A$ are constants, it follows that in Equation (5.10), $\psi_{h_1}(x)$ does not depend on the variables in $\tau \setminus A$. Hence $\psi_{h_1}(x)$ appears as a factor independent of the variables in $\tau \setminus A$ in both $Z^{-1}(x_{pa(\tau)}, x_{\tau \cap A})$ and $\prod_{h \in \mathcal{M}(\mathcal{H}^*_\tau)} \psi_h(x)$, and thus cancels out with itself. Note that $h_1$ does not exist in $\mathcal{H}_0$ either, since $h_1$ becomes a hyperedge with empty head after being shrunk and is thus deleted by operation (i).

Case 2: $H(h_1) \setminus A \neq \emptyset$. In this case, $H(h_1) \setminus A$ must be entirely contained in one of $\{\tau'_1, \ldots, \tau'_t\}$. Without loss of generality, say $H(h_1) \setminus A \subseteq \tau'_1$ in $\mathcal{H}_0$. Then note that $h_1 \setminus A$ must be contained in some maximal hyperedge $h'$ in $E(\mathcal{H}_0)$ such that $H(h') \cap \tau'_1 \neq \emptyset$. Moreover, recall that the variables in $A$ are constants. Hence $\psi_{h_1}$ must appear in some factor in the factorization of $f$ according to $\mathcal{H}_0$.

Thus it follows that every probability density $f$ that factorizes according to $\mathcal{H}_1$ also factorizes according to $\mathcal{H}_0$. On the other hand, it is not hard to see that for every $\tau' \in \mathcal{D}_0$ and every hyperedge $h'$ in $\mathcal{H}_0[\tau']^*$, $h'$ is contained in some maximal hyperedge $h$ in $\mathcal{H}_1[\tau]^*$ for some $\tau \in \mathcal{D}_1$. Hence we can conclude that $\mathcal{H}_1$ and $\mathcal{H}_0$ admit the same factorization decomposition. The above argument also works for $\mathcal{H}_2$ and $\mathcal{H}_0$. Thus, $\mathcal{H}_1$ and $\mathcal{H}_2$ admit the same factorization decomposition. $\square$

We now present a graphical procedure (again called the redirection procedure) for modeling intervention in a Bayesian hypergraph. Let $\mathcal{H}$ be a DAH and $\{\tau : \tau \in \mathcal{D}\}$ be its chain components. Suppose a set of variables $x_A$ is set by intervention. We then modify $\mathcal{H}$ as follows: for each hyperedge $h \in E(\mathcal{H})$ such that $S = H(h) \cap A \neq \emptyset$, replace the hyperedge $h$ by $h' = (T(h) \cup S, H(h) \setminus S)$. If a hyperedge has the empty set as its head, delete that hyperedge. Call the resulting hypergraph $\hat{\mathcal{H}}_A$. We will show that the factorization according to $\hat{\mathcal{H}}_A$ is consistent with Equation (5.10).
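As with chain graphs, the hypergraph redirection procedure is a small syntactic rewrite. The following illustrative sketch (my own encoding, not the thesis's code) moves the intervened head vertices into the tail and drops hyperedges whose head becomes empty; the example applies it to the hyperedges used in Example 23 below.

```python
# Sketch of the redirection procedure for a DAH under intervention on A.
def redirect_dah(hyperedges, A):
    A = frozenset(A)
    result = set()
    for tail, head in hyperedges:
        s = head & A                      # intervened vertices in the head
        new_head = head - s
        if new_head:                      # drop hyperedges with empty head
            result.add((tail | s, new_head))
    return result

# ({a},{c,d}) and ({a,b},{d,e}) with c intervened become ({a,c},{d}) and ({a,b},{d,e}).
print(redirect_dah([(frozenset("a"), frozenset("cd")),
                    (frozenset("ab"), frozenset("de"))], {"c"}))
```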

Theorem 5.17. Let $\mathcal{H}$ be a Bayesian hypergraph and $\{\tau : \tau \in \mathcal{D}\}$ be its chain components. Given an intervened set of variables $x_A$, let $\hat{\mathcal{H}}_A$ be the DAH obtained from $\mathcal{H}$ by replacing each hyperedge $h \in E(\mathcal{H})$ satisfying $S = H(h) \cap A \neq \emptyset$ by the hyperedge $h' = (T(h) \cup S, H(h) \setminus S)$ and removing hyperedges with empty head. Then $\mathcal{H}$ and $\hat{\mathcal{H}}_A$ admit the same factorization decomposition.

Proof. This is a corollary of Theorem 5.16, since performing the operations (i), (ii), (iii) in the definition of factorization equivalence of DAHs to $\mathcal{H}$ and $\hat{\mathcal{H}}_A$ results in the same DAH. $\square$

Figure 5.13: (a) a chain graph $G$; (b) the canonical LWF DAH $\mathcal{H}$ of $G$; (c) the resulting hypergraph $\hat{\mathcal{H}}$ after performing the graphical procedure on $\mathcal{H}$ when the variable $c$ is intervened upon.

Example 23. Let $G$ be a chain graph as shown in Figure 5.13(a) and $\mathcal{H}$ be the canonical LWF Bayesian hypergraph of $G$ as shown in Figure 5.13(b), constructed based on the procedure in Section 5.1.4. $\mathcal{H}$ has two directed hyperedges $(\{a\}, \{c, d\})$ and $(\{a, b\}, \{d, e\})$. Applying the redirection procedure for intervention in Bayesian hypergraphs leads to the Bayesian hypergraph $\hat{\mathcal{H}}$ in Figure 5.13(c). We show that using equations (5.7) and (5.8) for Figure 5.13(a) leads to the same result as using the factorization formula for the Bayesian hypergraph in Figure 5.13(c).

First, we compute $f(x \,\|\, x_c)$ for the chain graph in Figure 5.13(a). Based on equation (5.7) we have
\[
f(x \,\|\, x_c) = f(x_a) f(x_b) f(x_{de} \mid x_{abc_0})
\]
as the effect of the atomic intervention $do(X_c = c_0)$. Then, using equation (5.8) gives
\[
f(x \,\|\, x_c) = f(x_a) f(x_b) \frac{\psi_{ac_0d}(x)\, \psi_{abde}(x)}{\sum_{d,e} \psi_{ac_0d}(x)\, \psi_{abde}(x)}. \qquad (5.11)
\]
Now we compute $f(x)$ for the Bayesian hypergraph in Figure 5.13(c). Using equation (5.4) gives
\[
f(x \,\|\, x_c) = f(x_a) f(x_b) f(x_{de} \mid x_{abc_0}).
\]
Applying formula (5.5) gives
\[
f(x \,\|\, x_c) = f(x_a) f(x_b) f(x_c) \frac{\psi_{ac_0d}(x)\, \psi_{abde}(x)}{\sum_{d,e} \psi_{ac_0d}(x)\, \psi_{abde}(x)}. \qquad (5.12)
\]
Note that $f(x_c) = 1$ when $x_c = c_0$, and otherwise $f(x_c) = 0$. As a result, the right-hand sides of equations (5.11) and (5.12) are the same.

Figure 5.14: Commutative diagram of factorization equivalence

Remark 14. Figure 5.14 summarizes all the results in Section 5.4. Given a chain graph $G$ and its canonical LWF DAH $\mathcal{H}$, Theorem 5.11 shows that $G$ and $\mathcal{H}$ admit the same factorization decomposition. Suppose a set of variables $A$ is set by intervention. Theorems 5.13 and 5.14 show that the chain graphs obtained from $G$ by the redirection procedure or by deleting the variables in $A$ admit the same factorization decomposition, which is also consistent with the intervention formula introduced in (Lauritzen and Richardson, 2002). Similarly, Theorems 5.16 and 5.17 show that the DAHs obtained from $\mathcal{H}$ by the redirection procedure or by shrinking the variables in $A$ admit the same factorization decomposition, which is consistent with a hypergraph analogue of the formula in (Lauritzen and Richardson, 2002).

Conclusion and Future Work

This chapter presents Bayesian hypergraphs, a new probabilistic graphical model. We showed that the model generalizes Bayesian networks, Markov networks, and LWF chain graphs, in the following sense: when the shadow of a Bayesian hypergraph is a chain graph, its Markov properties are the same as those of its shadow. We extended the causal interpretation of LWF chain graphs to Bayesian hypergraphs and provided corresponding formulas and two graphical procedures for intervention (as defined in (Lauritzen and Richardson, 2002)). Directed acyclic hypergraphs can admit much finer factorizations than chain graphs, and thus are more computationally efficient. The Bayesian hypergraph model also allows simpler and more general procedures for factorization as well as intervention. Furthermore, it allows a modeler to express independence of causal influence and other useful patterns, such as Noisy-OR, directly (i.e., graphically), rather than through annotations or the structure of a conditional probability table or function. We conjecture that the greater expressive power of Bayesian hypergraphs can be used to represent other PGMs and plan to explore the conjecture in future work.

Although the Bayesian hypergraph model admits much finer factorizations than the chain graph model, we note that our model does not yet expand the set of independence lists that can be represented by chain graphs. This is due to the fact that the Markov properties of Bayesian hypergraphs are defined based on the shadow of the Bayesian hypergraph (which is a chain graph). One direction for future work is to design a hypergraph graphical model that admits a much larger set of independence lists than those allowed by chain graphs. Learning the structure and the parameters of Bayesian hypergraphs is

another direction for future work. For this purpose, we will need to provide a criterion for Markov equivalence of Bayesian hypergraphs. The success of constraint-based structure learning algorithms for chain graphs leads us to hope that similar techniques would work for learning Bayesian hypergraphs. Of course, one should also explore whether a closed-form decomposable likelihood function can be derived in the discrete finite case.

Chapter 6

Causal Transfer Learning

Modern systems (e.g., deep neural networks and big data analytics) are highly configurable, which means they expose different performance behavior under different configurations. The fundamental challenge is that one cannot simply measure all configurations due to the sheer size of the configuration space. Transfer learning has been used to reduce the measurement efforts by transferring knowledge about the performance behavior of systems across environments. Previously, research has shown that statistical models are indeed transferable across environments. In this chapter, we investigate identifiability and transportability of causal effects and statistical relations in highly-configurable systems. Our causal analysis agrees with previous exploratory analysis (Jamshidi et al., 2017) and confirms that the causal effects of configuration options can be carried over across environments with high confidence. We expect that the ability to carry over causal relations will enable effective performance analysis of highly-configurable systems.

6.1 Introduction

To understand and predict the effect of configuration options in configurable systems, different sampling and learning strategies have been proposed (Siegmund et al., 2015; Valov et al., 2017; Sarkar et al., 2015), albeit often with significant cost to cover the highly dimensional configuration space. Recently, we performed an exploratory analysis to understand why and when transfer learning works for configurable systems (Jamshidi et al., 2017). In this chapter, instead of statistical analysis, we employ causal analysis to address the possibility of identifying influential configuration options that have a causal relation with the

performance of configurable systems (identifiability), and whether such causal relations are transferable across environments (transportability), which can contribute to faster, better, more reliable, and, more importantly, less costly performance behavior analysis in configurable systems. As a future research direction, it would also be interesting to explore how causal analysis can be employed for developing effective sampling methods and for providing explainable performance analysis in configurable systems.

6.2 Intuition

Understanding the performance behavior of configurable software systems can enable (i) performance debugging, (ii) performance tuning, (iii) design-time evolution, or (iv) runtime adaptation [11]. We lack empirical understanding of how the performance behavior of a system will vary when the environment of the system changes. Such empirical understanding will provide important insights to develop faster and more accurate learning techniques that allow us to make predictions and optimizations of performance for highly configurable systems in changing environments [10]. For instance, we can learn the performance behavior of a system on cheap hardware in a controlled lab environment and use that to understand the performance behavior of the system on a production server before shipping to the end user. More specifically, we would like to know what the relationship is between the performance of a system in a specific environment (characterized by software configuration, hardware, workload, and system version) and the one in which we vary its environmental conditions.

In this research, we aim for an empirical understanding of performance behavior to improve learning via an informed sampling process. In other words, we aim at learning a performance model in a changed environment based on a well-suited sampling set that has been determined by the knowledge we gained in other environments. Therefore, the main research question is whether there exists common information (transferable/reusable knowledge) that applies to both source and target environments of systems and therefore can be carried over from either environment to the other.

Preliminary concepts. In this section, we provide formal definitions of four concepts that we use throughout this study. The formal notations enable us to concisely convey these concepts throughout the chapter.

1) Configuration and environment space: Let $F_i$ indicate the $i$-th feature of a configurable system, which is either enabled or disabled, and one of them holds by default. The configuration space is mathematically a Cartesian product of all the features, $\mathcal{C} = Dom(F_1) \times \cdots \times Dom(F_d)$, where $Dom(F_i) = \{0, 1\}$. A configuration of a system is then a member of the configuration space (feature space), where all the parameters are assigned a specific value in their range (i.e., complete instantiations of the system's parameters). We also describe an environment instance by 3 variables $e = [w, h, v]$ drawn from a given environment space $\mathcal{E} = W \times H \times V$, whose components respectively represent sets of possible values for workload, hardware and system version.

2) Performance model: Given a software system $\mathcal{A}$ with configuration space $\mathcal{F}$ and environmental instances $\mathcal{E}$, a performance model is a black-box function $f : \mathcal{F} \times \mathcal{E} \to \mathbb{R}$ given some observations of the system performance for each combination of the system's features $x \in \mathcal{F}$ in an environment $e \in \mathcal{E}$. To construct a performance model for a system $\mathcal{A}$ with configuration space $\mathcal{F}$, we run $\mathcal{A}$ in environment instance $e \in \mathcal{E}$ on various combinations of configurations $x_i \in \mathcal{F}$, and record the resulting performance values $y_i = f(x_i) + \epsilon_i$, $x_i \in \mathcal{F}$, where $\epsilon_i \sim \mathcal{N}(0, \sigma_i)$. The training data for our regression models is then simply $\mathcal{D}_{tr} = \{(x_i, y_i)\}_{i=1}^{n}$. In other words, a response function is simply a mapping from the input space to a measurable performance metric that produces interval-scaled data.
transferable We (here lackin changingempirical [10 we]. Forassume environmentsunderstandingdata instance, (here it weproduces [ of we10 can]. how assumeFor learn the instance, real it numbers). produceswei can learni reali numbers).i A gained in other environments. Therefore, the main research enabledrange (i.e. or, complete disabled instantiations andrange oneCausalpr (i.e. of( PE,2ffthem complete ofectdo( theof holdsC system’si)) instantiations by=? default. parameters). of The the system’s parameters).the r package pcalg. Journal of Statistical Software, knowledge is a caseknowledge for transfer is a case learning for transfer[10]. learningperformance[10]. behavior behaviorwhereofperformance of a a system✏ systemi will on behavior(0 varya, cheap wheni of). hardware a The the system environ- training in on a a cheap data hardware for our in a regressionConfig. Options F 3) Performance⇠ N 3) distribution: PerformanceFor distribution:configurationWe then also performance describe spaceFor an isWe mathematically environment the model, also performance describe| instance a Cartesian an environment by model, 3 product variables instanceof by 3 variables question is whether there exists a common informationmentcontrolled (trans- of the lab systemmodels environment changes.controlled is and thenSuch use lab empirical simply that environment to understand understanding and= theuse( willper- thatx ,y to understand) Data . In the other per- words,on Performance a we measured andwe associated measuredtr the and performancei associatedallei =[ thei=1w, features h, the v response] drawn performance= fromeDom to=[ a each(w,F given1 h,) vresponse] environmentdrawnDom from to(F space a eachd) given, where environment= spaceArticles= 47(11):1–26. ferable/reusableLet us first introduce knowledgeLet us different first) that introduce applies changes different to that both we source changes considerprovideformance and that important behavior we insights of consider theformance system to develop behavior on afaster production of and theD moresystem server accurate{ onbefore a production} serverC beforeC ⇥ ··· ⇥ E E response function is simply a mappingDomWInterventional fromH(Fi)=V the, where input0, 1 they.W A space respectively configurationH V to, where represent of they a system sets respectively of possible is then represent sets of possible in this work: (i)in Configuration: this work: (i) A Configuration: configuration A is configuration alearningshipping set of techniquesto theconfiguration, end is auser. thatshipping set Moreallow of to specifically, us now theconfiguration, to end make let user. we predictions introduce wouldMore specifically, like now and to another let we⇥ introduce would⇥ concept like{ to another} where⇥ ⇥ weconcept where we target environments of systems and therefore can be carried avalues member for workload,Extract of the configuration hardwarevaluesReuse for and space workload, system (feature version. hardware space) and where system version. optimizationsknow, what thevarya of measurable relationship performance theknow, environment is what for between performance highly thevary relationship the configurable the performance and environment metric is we systemsbetween measure of that a the produces performance and the we performance. interval-scaledmeasure of a the An performance. An [Pearl and Bareinboim 2011] Pearl, J., and Barein- overdecisions from over either configurationdecisions environment over options. configuration to the This other. 
is Thisoptions.the primary transferable This vari- is the primary vari- all2) the Performance parameters model: are assigned2)Given Performance to a a software specific model: system valueGiven in theirawith software system with insystem changing in a specific environmentsdata (here environmentsystem [10 we in]. a For assume (characterized specific instance, environment it weproduces by can software learn (characterized realCausal numbers). by software A A empirical performanceempirical distributionperformanceStructurerangeconfigurationis a distribution (i.e. stochastic, complete space instantiationsconfigurationisand process, a environmental stochastic of space the system’s instancesand process, parameters). environmental,aper- instances ,aper- knowledgeation in the issystem a caseation that for in we thetransfer consider system learning that to understand we[ consider10]. performance to understandperformanceconfiguration, behavior hardware,performanceconfiguration, of workload, a system andon hardware, a system cheap version)workload, hardware to in and the a system version) to the F F E Eboim, E. 2011. Transportability of causal and statis- 3) Performance distribution: ForWeformance the also performanceO1 describe modelO2 is anO a3 formance black-box environment model,O4 O model5 function instanceis af black-box: by 3 variables functionR f : R behavior. Morebehavior. specifically, More we specifically, would like we to wouldunderstandcontrolledone that like we lab to varypd environment understand its: environmentalone that and weuse(R vary that) conditions.pd, that to its: understand environmental defines the(R a per-) conditions., probability that defines distribution a probability over distributionF ⇥ overE ! F ⇥ E ! E ! E ! egiven=[ somew, h, v observations] drawn fromgiven of a the some given system observations environmentTransferable performance of space the for system each= performance for each Let us first introduce different changes that we considerformance behaviorwe measured of the system and on a production associated server the before performance response to each Knowledge E tical relations: A formal approach. In Proceedings of how the performancehow the of performance the system of under the study system will underIn this be research, studyperformance we will aimIn forthis be an measures research, empiricalperformance we understanding aim for for each an measures empirical environmental of Wcombination understanding forH eachV , where of conditions. system’senvironmental of theycombination respectively features Tox of represent conditions. system’sin an sets features environment of possible Tox in an environment in this work: (i) Configuration: A configuration is ashipping set of to theconfiguration, end user. More specifically, now let we introduce would like to another⇥ ⇥ concept where we 2 F 2 F influenced as ainfluenced result of configuration as a result of changes. configuration This kindperformance changes. of construct behavior This kindperformance to improve a of performance learningconstruct behavior via to improve an a distribution informed performance learningvaluese via for. foran distribution To aCausal informedworkload, construct system hardwaree a performance forwith. Toand a construct system system model version. a for performance awith system model for a system know, what the relationship is between the performance of a 2 E 2 E A the 25thA AAAI Conference on Artificial Intelligence, decisions over configuration options. 
This is the primarysampling vari- process.vary In thesampling other environment words, process. we at In learning other and words, we a perfor- measure we atwith learning2) configuration thePerformanceModel a performance. perfor- space model:withA, configurationGiven we An run a softwarein environment space systemA, we instance runwithin environment instance change is the primarychange focusis the of primary previous focus work of in previous thissystem area work in a specificconfiguration in this environment area space (characterizedconfiguration, similarly by software space to the, similarly process of toP derivingtheF processS A of derivingF A A mance modelempirical in a changedmance environmentperformance model in a based changedF on distribution a environment well-suited basedconfigurationeFis on aon a stochastic well-suited various space combinationseand process, environmentalon of various configurations combinations instancesxi ,a of, configurationsper- and xi , and [ation18], in [19 the], [ system26],[18 [9 that],], however, [19 we], consider [26], they [9], to assumed however, understand a they predetermined performance assumedconfiguration, a predeterminedthe hardware, performance workload, andthe models, system performance version) we run to the models,on2 E various we run combinationsF on2 E various combinations2E F 2247–254.F sampling set that has beensampling determined set that by has the been knowledge determined we byformancerecord the knowledge the model resulting weis performance arecord black-box the valuesresulting functionyi performance=f f: (xi)+✏i values, xi R yi = f(xi)+✏i, xi behavior. More specifically, we would like to understandone that we varyconfigurationspd its: environmental(Rx) conditions.configurations, that defines, for ax a specific probabilityA , environment for distribution a specificA instance environment over instanceF ⇥ E ! 2 2 environment (i.e.,environment a specific workload, (i.e., a specific hardware, workload, and software hardware,gained in other and environments. softwareEgained! in Therefore, otheri environments. the main research Therefore,Figurei 6.1:given the Exploitingwhere main some✏i research causal observations(0 inference, i).where Theof for the performancetraining✏ systemi data(0 performance analysis., fori). our The regression training for each data for our regression how the performance of the system under study willIn this be research,performance we aim for an measures empirical2 F understanding for each environmental of 2FF ⇠ conditions.N F To⇠ Nn n version). (ii) Workload:version). The (ii) workloadWorkload: describes The workload the inputquestion describes of is whethere the there inputquestionand exists record of is a common whethere the there information resultingand exists record a (trans- performancecommon thecombinationmodels information resulting is values then ( oftrans- performance simply system’syi.models Wetr features= then is( valuesx theni,yx i) simplyi=1yin.i. In anWetr other environment= then words,(xi,yi) a i=1. In other[Pearl words, a and Bareinboim 2014] Pearl, J., and Barein- influenced as a result of configuration changes. This kindperformanceferable/reusable of behaviorconstruct2 knowledgeE ferable/reusable to improve) a that performance applies learning2 knowledgeE to via bothRecently, an) distributionsource that informed applies transfer and eresponse to learning both for. 
has sourceTo functiona been construct system and used is simply to aresponseD decrease performance awith mapping{ function the cost2 model} from ofF is learning simply theD for input a aby system mapping{ transfer- space to} from the input space to the system on whichthe system it operates on which on. The it operates performance on. The behavior performancefit a probability behavior distributionfit a probability to the distribution set of2 measuredE to the performance set ofA measured performance A change is the primary focus of previous work in thissamplingtarget area environments process.configuration In of othertarget systems words, environments and space we therefore at oflearning, systems can similarly be a perfor-carried and therefore towitha the measurable configurationcan process be carried performance space ofa deriving measurable metric, we run that performance producesin environment interval-scaled metric instance that produces interval-scaledboim, E. 2014. External validity: From do-calculus of the system canof the vary system under can different vary workloadunder different conditions.manceover workload from model eithervalues in a conditions. environmentchangedovere environment from= to either theyvaluesi other. basedusing environmentF Thisring one kernel a knowledge transferable well-suited= to theyi density about other.usingedata performance This (hereestimation kernelon transferable various we behavior assume density combinations [ across2data it]F produces (in estimation(here environments the of we configurationsrealA assume numbers). (Jamshidi [2 it] produces (inx et al., the 2018;, real and numbers). [18], [19], [26], [9], however, they assumed a predetermined the performanceD {Acknowledgments.} models,D we{ run} on2 E variousThis combinations work has beeni 2 F sup- (iii) Hardware:(iii) The Hardware: deployment The configuration deployment in configuration whichsamplingknowledge the set is in thatsame a case which has for way beenknowledgetransfer the determined as histograms learning issame a case by[ the10 for way]. knowledgetransfer are as constructed histograms learning weA record[103)]. in are Performance the statistics). constructedresulting distribution: performance We3) in Performance have statistics).For values theyi performance distribution:= f We(xi)+ have✏ model,iFor, xi the performanceto model, transportability across populations. Statistical configurations x Valov, for et al., a 2017). specific Fortunately, environment performance models instance typically exhibit similarities across2 softwareenvironment system (i.e.,software is a running. specific system workload, The is performance running. hardware, The behavior and performance softwaregained ofLet the us in first behaviorotherdefined introduce environments. of thisLet different the us concept Therefore, firstported changesdefinedi introduce here the that by this main different because we AFRL concept considerresearch changes it helps and herewe thatwhere measured because we usDARPA✏i consider to and investigate(0 associated it, helpsiwe) (FA8750-16-2-0042).. The measured the us training the performance to and investigate data associated for response our the regression the toperformance each response to each question is whether there exists a common2 informationF (trans- F ⇠ N n version). 
(ii) Workload: The workload describes the inputin this of work:e (i) Configuration:inand this record work: A configuration (i) the Configuration: resulting is a set performanceA configuration of modelsconfiguration, is is values then a set now simply ofy leti.configuration, We introducetr = then(xi another,yi now) i=1 letconcept. In introduce other where words, another we a concept whereScience we 29(4):579–595. system under studysystem can under differ study when can it is differ deployed when on it a is differ- deployedsimilarity on2 E a differ- of performancesimilarityenvironments, of distributions performance even environments across distributions that di environments,ffer substantiallyD across{ in terms environments,} of hardware, workload, the system on which it operates on. The performance behaviorferable/reusabledecisions overfit configuration knowledge a probabilitydecisions) options. that over applies distribution Thisconfiguration to is the both primary source options. to the vari- and This setresponsevary of is the measured the primary function environment vari- performance is simplyvary and we athe mapping measure environment from the the performance. and input we space measure An to the performance. An ent hardware withent different hardware resource with different constraints. resource (iv) constraints. Version:targetation in environments the systemallowing (iv) that Version: ofation we systems us consider in tothe and assesssystem toallowing therefore understand that the we can potentials us consider performance be to carried assess to understand foraempirical the measurable transfer potentials performanceperformance performance learning forempirical distribution transfer acrossmetricperformance that learningis produces a stochastic distribution interval-scaled across process,is a stochastic process, of the system can vary under different workload conditions. values e = yi usingor version kernel (Jamshidi density et al., 2017). estimation The challenge [2] is (in to (i) the identify similarities and (ii) make The version of aThe software version system of a software or library system refers or to library theoverbehavior. state from refers More eitherenvironments. to specifically, environment thebehavior. state we to More the wouldenvironments. other. specifically, like This to transferable understand we woulddatapd like: (here to understand we(R assume), thatpd it defines produces: a probability real(R) numbers)., that distribution defines a probability over distribution[Pearl over 1995] Pearl, J. 1995. Causal diagrams for D { } ReferencesE ! E ! of(iii) the Hardware: code baseof The at the deployment a code certain base point configuration at a in certain time. When point in which in partknowledgehow time. the the of performance is Whensame a case4) for wayTransfer parthow oftransfer the asthe of histogramssystem performance learning learning4) under[10 Transferuse]. of study across are of the them constructed will system to learning environments: ease be learningunderperformance3) in studyacross Performance of performancestatistics). will measuresLet environments: be distribution: models. usperformance Wefor assume each have environmentalFor measuresLet the performance us for assumeconditions. each model,environmental To conditions. To influencedLet us first asdefined a introduce resultinfluenced of this configurationdifferent concept as changes a result changes. here that of configuration Thisbecause we consider kind of it changes. 
helpsweconstruct measured This us a to kind performance and investigate of associatedconstruct distribution the the performance a performance for a response system distribution to eachwith for a systemempiricalwith research (with discussion). Biometrika thesoftware system system undergoesthe is running. system some undergoesThe updates, performance for some example, updates,behavior when forof the example, a f (c) whencorresponds a f (c to) corresponds the response to functions the response in the functionssource in the sourceA A inchange this work: is the (i)s primary Configuration:change focus[Bareinboim isof Athe previous configurations primary work focusTo in is estimate this of a and set previous area causal of Pearlconfiguration,configuration e workffects, inscientists 2012] this space now area normally letBareinboim,configuration, introduce similarly perform randomized another to space the process concept experiments, similarly E., of where deriving where and to we the process of deriving system under study can differ when it is deployed on a differ- similarity of performance distributions across environments,F F library that is usedlibrary in thethat system is used boosts in the its system performance boostsdecisions[18], [its19 in], over performance [26environment], configuration [9], however,[18], [ options.19 in they],e [s26environment assumed], This [9], is, however, the and a predetermined primaryg theye=s vari- assumedft(cvarythe,) and arefers performance thepredetermined environmentg = tof themodels,t(cthe and response) werefers performance we run measure toonthe models, variousthe performance. response wecombinations run Anon various combinations82(4):669–710. ent hardware with different resource constraints. (iv) Version: allowing us to assess2 E thea sample potentials of units2 drawn forE transferfrom the population learning of interest across is subjectedA to the specifiedA ma- a recent versiona update, recent the version overall update, performance the overall of the performanceation systemenvironment in the system (i.e., ofof the athe that specific systemenvironment wetarget consider workload,Pearl, environment to(i.e.,of understand hardware, athe specific J.target and performance workload, 2012.e softwaret environment hardware,empiricalconfigurations. Transportability Transfer andperformancee softwaret x learningi configurations., distributionTransfer for[ a22 of specific] causalisx learningi aenvironment stochastic, for e [↵ a22 process,instance specificects:] environment instance environments. pd : ( ) 2 F 2 F willThe change.version of Of awill course, software change. other system Of environmental course, or library other referschanges environmental to might thebehavior.version). state be changes (ii) More Workload:is specifically, a might learningversion). The be workload we (ii) mechanism wouldis Workload: describes a like learning The to the understandthat workload input mechanism exploits2 of describesEe anand the that additionalrecord inputR exploits2, ofthe thatE resultinge defines source anand performance a additionalrecord probability the values resulting distribution sourceyi. performance We over then values y[Pearli. We then 2009] Pearl, J. 2009. . Models, rea- Completenessnipulation directly. results. Inperformancefit many2 a probabilityEE cases,! 
In measureshowever, distributionProceedings suchfit for2 a probabilityE aeach to direct the environmental set approach of distribution measured of is not conditions.the performancepossible to the 26th set due Toof measured performance of the code base at a certain point in time. When parthowthe system the of performance on which4) it Transferthe operates of system the on.system on learning The which performanceunder it operates study across behavior will on. environments: The be performance behaviorLet us assumee e possible as wellpossible (e.g., changes as well to (e.g., the operating changes to system). the operatinginfluencedof the But, system as system).of a can result information vary of under configurationthe But, system different apartof can changes. information workload vary from under This conditions. the different kind apart standard of workloadconstructvalues from training conditions.e athe= performanceyi standardusing datavalues kernel distributionin traininge densityt=: yi for estimationusing data a system kernel in [2] density (int:with the estimation [2] (in the f (c) to expense or ethical considerations.D { Instead,} investigatorssourceD have{ to} rely on observationalA soning, and inference. Cambridge University Press. wethe limit system this undergoes studywe limit to this some this selection study updates, to as this for we selection considerexample, theas whenchange(iii) we most Hardware: consider ais theknowledge primarys The the deployment(iii)corresponds focus most Hardware: that ofAAAI previous configurationknowledge can The to be work deployment Conference the gained in in which response that this configuration area from the canconfigurationsame be functions the on gained way in source which Artificialas histograms space in from the environment thesame, the aresimilarly wayconstructed Intelligence source as to histograms the inenvironment process statistics). are of, constructed We deriving 698– have in statistics). We have F library that is used in the system boosts its performance[software18], [19 in], system [26environment], [is9], running. however,software The they systeme performances assumed is running., anda behavior predetermined Theg = performanceof theft(cthedefined) refers performance behavior this concept to of the models,the heredefined response because we run this it concept helpson various us here to investigate because combinations it helpsthe us to investigate the important and commonimportant environmental and common changes environmental in practice. changes ines practice.. The aim ofe transfer2s.E Thestudies aim learning to infer of e transferffects. is to Oneimprove of learning the fundamental learning is questionsto improvethat in causalA learning analysis is to determinethat environmentsystem under (i.e., study a specificcansystem differ workload, underwhen704. itstudy ishardware, deployed canToronto, differ and on when software a differ- it Ontario, is deployedconfigurationssimilarity on of Canada: a differ- performancexi similarity, for AAAI distributions a specific of performance Press. environment across environments, distributions instance across environments, a recent version update, the overall performance of the system of the target environment et . Transfer learning2 F [22] [Richardson and Spirtes 2002] Richardson, T. S., version).ent hardware (ii) Workload: with differentent The hardware resource workload with constraints. 
describes different (iv) the resource Version:input2 ofconstraints.Eeallowingand (iv) us record Version: to assess the resultingtheallowing potentials performance us to for assess transfer values the learning potentialsyi. We across then for transfer learning across will change. Of course, other environmental changes might be is a learning mechanismwhen e thatffects can exploits be inferred2 E anfrom statisticaladditional information, source encoded as a joint probability dis- theThe system version on of which a software itThe operates system version[Bareinboim on. or of The library a software performance refers system to behavior the and or state library Pearlfitenvironments. refers a probability 2016]to the state distributionBareinboim,environments. to the set of measured E., performance and and Spirtes, P. 2002. Ancestral graph markov possible as well (e.g., changes to the operating system).of the the But, system code baseof can information at vary aof certain under the code different point apartbase in workload attime. froma certain When conditions. the pointpart standard of in time.values4) When Transfer traininge = party learningi ofusing data across4) kernel in Transfere environments: densityt: learning estimation acrossLet [us2] environments: assume (in the Let us assume D { } we limit this study to this selection as we consider the(iii)the most system Hardware: undergoesknowledge The deploymentthe some system that updates,Pearl, configuration undergoes can for be example, J. some gained in 2016. which updates, when from the a Causal forsamefs example,the(c) waycorresponds source inference as when histograms environment a tof thes are(c) response andconstructedcorresponds the functions in data-fusionto statistics). the in response the Wesource have functions in themodels.source The Annals of Statistics 30(4):962–1030. softwarelibrary that system is used is running. inlibrary the system The that performance is boosts used its in the behaviorperformance system of boosts the in definedenvironment its performance this concepte in206 here,environment and becauseg = f it(ec helps) refers us, to and to investigate theg = responsef (c the) refers to the response important and common environmental changes in practice. es. The aim of transfer learning is to improves 2 learningE thatt s 2 E t systema recent under version study update, cana differ therecent overall when versionproblem. performance it is update, deployed the of onoverall theProceedings a system differ- performancesimilarityof the oftarget the of of system performance environment theof National the distributionsettarget. environment Transfer across Academy environments, learninget [22. of] Transfer learning [22] entwill hardware change. Of with course, differentwill other change. resource environmental Of constraints. course, changes other (iv) environmental might Version: be allowingis changes a learning us might to mechanismassess be theis potentials a that learning exploits2 forE mechanism transfer an additional learning that exploits2 sourceacrossE an additional[Sarkar source et al. 2015] Sarkar, A.; Guo, J.; Siegmund, Thepossible version as well of a (e.g., softwarepossible changes system astoSciences or wellthe library operating (e.g., refers changes system).113(27):7345–7352. to the to the state But, operatingenvironments.of information system). 
But, apart fromof information the standard apart training from datathe standard in et: training data in et: ofwe the limit code this base study at to awe this certain limit selection thispoint study as in we time. to consider this When selection the part most asof weknowledge consider4) Transfer the that mostlearning can beknowledge acrossgained environments: from that the can source beLet gained environment us assumefrom the source environmentN.; Apel, S.; and Czarnecki, K. 2015. Cost- theimportant system and undergoes commonimportant some environmental updates, and common changes for example, environmental in practice. when a changesfess(.c The) incorresponds practice. aim of transfer toe thes. learning The response aim is of to functions transferimprove inlearning learning the source isthat to improve learning that library that is used in the system[Bareinboim, boosts its performance Tian, in environment and Pearle 2014], and g = fBareinboim,(c) refers to the response E.; ecient sampling for performance prediction of s 2 E t a recent version update, the overall performance of the system of the target environment et . Transfer learning [22] will change. Of course, other environmentalTian, J.; changes and might Pearl, be is a learning J. 2014. mechanism Recovering that exploits2 E an additional from source se- configurable systems. In Proceedings of the 30th possible as well (e.g., changes to the operating system). But, of information apart from the standard training data in et: we limit this study to this selectionlection as we consider bias the in most causalknowledge and that can statistical be gained from inference. the source environment In IEEE/ACM International Conference on Automated important and common environmentalProceedings changes in practice. of thees. Twenty-Eighth The aim of transfer learning AAAI is to improve Conference learning that Software Engineering (ASE), 342–352. on Artificial Intelligence, AAAI’14, 2410–2416. [Shpitser and Pearl 2006a] Shpitser, I., and Pearl, J. [Huang and Valtorta 2006a] Huang, Y., and Valtorta, 2006a. Identification of conditional interventional M. 2006a. Identifiability in causal bayesian net- distributions. In Proceedings of the 22nd Confer- works: A sound and complete algorithm. In Pro- ence on Uncertainty in Artificial Intelligence, UAI ceedings of the 21st AAAI Conference on Artificial 2006, 437–444. Intelligence, 1149–1154. [Shpitser and Pearl 2006b] Shpitser, I., and Pearl, J. [Huang and Valtorta 2006b] Huang, Y., and Valtorta, 2006b. Identification of joint interventional distri- M. 2006b. Pearl’s calculus of intervention is com- butions in recursive semi-markovian causal models. tribution, obtained under normal, intervention-free measurement. Pearl and his colleagues have made major contributions in solving the problem of identifiability. Pearl (Pearl, 1995) established a calculus of interventions known as do-calculus, consisting of- three infer- ence rules by which probabilistic equations involving interventions and observations can be transformed into other such equations, thus providing a syntactic method of deriving claims about interventions. Later, do-calculus was shown to be complete for identifying causal effects, that is, every causal effect that can be identified can be derived using the three do-calculus rules (Huang and Valtorta, 2006a,b, 2008; Shpitser and Pearl, 2006a,b). 
Pearl and Bareinboim (Pearl and Bareinboim, 2011; Bareinboim and Pearl, 2012; Pearl and Bareinboim, 2014; Bareinboim and Pearl, 2016) provided strategies for inferring information about new populations from trial results that are more general than re-weighting. They supposed that we have available both causal information and probabilistic information for population A (i.e., the source), while for population B (i.e., the target) we have only (some) probabilistic information, and also that we know that certain probabilistic and causal facts are shared between the two and certain ones are not. They offered theorems describing what causal conclusions about population B are thereby fixed. Whether conclusions about one population can be supported by information about another depends on exactly which causal and probabilistic facts they have in common. In this chapter, we conduct a causal analysis, comparing the performance behavior of highly-configurable systems across environmental conditions (changing workload, hardware, and software versions), to explore when and how causal knowledge can be commonly exploited for performance analysis. We use the formal language of causal graphs proposed in the literature for identifiability and transportability to answer:

Is it possible to identify causal relations from observational data and how generalizable are they in highly-configurable systems?

Our results indicate the possibility of identifying causal effects in general. Also, our results show that many of the causal/statistical relations about performance behavior can be transferred across environments, even under the most severe changes we explored, and that transportability is actually trivial for many environmental changes. Our empirical results also indicate the recoverability of conditional probabilities from selection-biased data in many cases. The results indicate that causal information can be used as a guideline for cost-efficient sampling for performance prediction of configurable systems. The supplementary materials, including data and empirical results, are available at: https://github.com/majavid/AAAI-WHY-2019.

6.2 Causal Graphs

A causal graphical model is a special type of Bayesian network in which edges are interpreted as direct causal effects. This interpretation facilitates predictions under arbitrary (unseen) interventions, and hence the estimation of causal effects (Pearl, 2009). In this section, we consider two constraint-based methods to estimate the causal structure from observational data. For this purpose, we discuss the PC algorithm and the fast causal inference (FCI) algorithm (Spirtes, Glymour, and Scheines, 2000).
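To illustrate how such a constraint-based estimate can be obtained in practice, the sketch below uses the pcalg package that is named later in this chapter; the file name and column layout (one 0/1 column per configuration option plus one performance column) are hypothetical, and the Gaussian test is used only to keep the example short.

# Minimal sketch: constraint-based structure learning with pcalg (assumed data layout).
# Requires the pcalg package and its Bioconductor dependencies (graph, RBGL).
library(pcalg)

perf_data <- read.csv("x264_measurements.csv")  # hypothetical file: 0/1 option columns + a performance column
suff_stat <- list(C = cor(perf_data), n = nrow(perf_data))

# PC assumes causal sufficiency (no hidden confounders, no selection variables)
# and returns a CPDAG, i.e., the equivalence class of DAGs compatible with the data.
cpdag <- pc(suffStat = suff_stat, indepTest = gaussCItest,
            alpha = 0.01, labels = colnames(perf_data))

# FCI drops the causal-sufficiency assumption and returns a PAG instead.
pag <- fci(suffStat = suff_stat, indepTest = gaussCItest,
           alpha = 0.01, labels = colnames(perf_data))

For binary configuration options, a discrete test such as binCItest would be statistically more appropriate; the Gaussian test above merely keeps the sketch compact.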

6.2.1 Estimating causal structures

A causal structure without feedback loops and without hidden or selection variables can be visualized using a directed acyclic graph (DAG) in which the edges indicate direct cause-effect relationships. Under some assumptions, Pearl (Pearl, 2009) showed that there is a link between causal structures and graphical models. Roughly speaking, if the underlying causal structure is a DAG and we observe data generated from this DAG and then estimate a DAG model (i.e., a graphical model) on these data, then the estimated complete partially directed acyclic graph (CPDAG) represents the equivalence class of the DAG model describing the causal structure. This holds if we have enough samples and assuming that the true underlying causal structure is indeed a DAG without unobserved common causes (confounders) or selection variables. Note that even given an infinite amount of data, we usually cannot identify the true DAG itself, but only its equivalence class. Every DAG in this equivalence class can be the true causal structure (Kalisch et al., 2012).

In the case of unobserved variables, one could still visualize the underlying causal structure with a DAG that includes all observed, unobserved cause, and unobserved selection variables. However, when inferring the DAG from observational data, we do not know all unobserved variables. We, therefore, seek to find a structure that represents all conditional independence relationships among the observed variables given the selection variables of the underlying causal structure. It turns out that this is possible. However, the resulting object is in general not a DAG, for the following reason. Suppose we have a DAG including observed and unobserved variables, and we would like to visualize the conditional independencies among the observed variables only. We could marginalize out all unobserved cause variables and condition on all unobserved selection variables. It turns out that the resulting list of conditional independencies can in general not be represented by a DAG, since DAGs are not closed under marginalization or conditioning (Richardson and Spirtes, 2002). A class of graphical independence models that is closed under marginalization and conditioning and that contains all DAG models is the class of ancestral graphs (Richardson and Spirtes, 2002). A mixed graph is a graph containing three types of edges: undirected (−), directed (→), and bidirected (↔). An ancestral graph G is a mixed graph in which the following conditions hold for all vertices in G:

(i) if α and β are joined by an edge with an arrowhead at α, then α is not anterior to β.

(ii) there are no arrowheads present at a vertex which is an endpoint of an undirected edge.

Maximal ancestral graphs (MAGs), which we will use from now on, also obey a third rule:

(iii) every missing edge corresponds to a conditional independence.

An equivalence class of a MAG can be uniquely represented by a partial ancestral graph (PAG) (Zhang, 2008). Edge directions are marked with "−" and ">" if the direction is the same for all graphs belonging to the PAG, and with "◦" otherwise. The bidirected edges come from hidden variables, and the undirected edges come from selection variables.

We use the Hugin PC algorithm and the FCI algorithm in the R package pcalg to recover the causal graph of each environment for our subject systems. Since all possible configurations of options are present in the first and last subject systems in Table 6.1, and all data sets have been sampled on the basis of configuration settings alone, we can assume that there are no unobserved common causes and selection variables, i.e., the causal sufficiency assumption (Spirtes, Glymour, and Scheines, 2000) holds. In other cases, due to the sparsity of the data, we cannot exclude the presence of hidden variables; therefore, we use the FCI algorithm to recover the causal graphs.
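A sketch of the latter case (hidden variables cannot be excluded, so FCI is used instead of PC) is shown below; the file name is hypothetical, and all columns are assumed to be coded as 0/1 so that the binary conditional-independence test applies.

# Minimal sketch: FCI with a discrete CI test when causal sufficiency cannot be assumed.
library(pcalg)

d  <- read.csv("sqlite_measurements_binary.csv")  # hypothetical: all columns coded 0/1
dm <- data.matrix(d)
suff_stat <- list(dm = dm, adaptDF = FALSE)       # sufficient statistic expected by binCItest

pag <- fci(suffStat = suff_stat, indepTest = binCItest,
           alpha = 0.05, labels = colnames(d))

# In the resulting PAG, circle marks denote orientations left open by the data,
# bidirected edges point to possible hidden common causes, and undirected edges
# point to possible selection variables.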

6.3 Research Questions and Methodology

The overall question that we explore in this chapter is "why and when can identifiability and transportability of causal effects be exploited in configurable systems?" We hypothesize that estimating causal effects from observational studies alone, without performing randomized experiments or manipulations of any kind (causal inference of this sort is called identification (Pearl, 2009)), is possible for configurable software systems. Also, we speculate that causal relations in the source and the target are somehow related. To understand the notion of identification and relatedness that we find for environmental changes, we explore three questions.

RQ1. Is it possible to estimate causal effects of configuration options on performance from observational studies alone?

If we can establish with RQ1 that the causal effects of configuration options on performance are estimable, this would be promising for performance modeling in configurable systems, because it would allow us to estimate causal effects in an environment accurately, reliably, and at low cost. Even if not all causal effects are estimable, we explore which configuration options are influential on performance.

RQ2. Is the causal effect of configuration options on performance transportable across environments?

RQ2 concerns transferable knowledge from the source that can be exploited to learn an accurate and less costly performance model for the target environment. Specifically, we explore whether the causal effects of influential options are transportable across environments and how they can be estimated.

RQ3. Is it possible to recover conditional probabilities from selection-biased data to the entire population?

RQ3 concerns transferable knowledge that can be exploited for recovering conditional probabilities from selection-biased data to the population. Specifically, we explore whether causal/statistical relations between configuration options and performance measures are recoverable from a biased sample without resorting to external information.

6.3.1 Methodology

Design: We investigate the causal effects of configuration options on performance measures across environments. We therefore need to establish the performance of a system and how it is affected by configuration options in multiple environments. As in Jamshidi et al. (2017), we measure the performance of each system using standard benchmarks and repeat the measurements across a large number of configurations. We then repeat this process for several changes to the environment: using different hardware, workloads, and versions of the system. Finally, we perform the analysis of relatedness by comparing the performance and how it is affected by options across environments. In total, we compare 65 environment changes. Analysis: To answer the research questions, we formulate three hypotheses about:

• Identifiability: The causal effect of X on Y is identifiable from a causal graph G if the quantity P(y | do(x)) can be computed uniquely from any positive probability of the observed variables (Pearl, 2009).

• Transportability: Given two environments, denoted Π and Π∗, characterized by probability distributions P and P∗, and causal diagrams G and G∗, respectively, a causal relation R is said to be transportable from Π to Π∗ if R(Π) is estimable from the set I of interventions on Π, and R(Π∗) is identified from P, P∗, I, G, and G∗ (Pearl and Bareinboim, 2011).

• Recovering conditional probabilities: Given a causal graph Gs augmented with a node S encoding the selection mechanism, the distribution Q = P(y | x) is said to be s-recoverable from selection-biased data in Gs if the assumptions embedded in the causal graph render Q expressible in terms of the distribution under selection bias P(v | S = 1) (Bareinboim, Tian, and Pearl, 2014).

For each hypothesis, we recover the corresponding causal graph and analyze the 65 environment changes in the four subject systems mentioned below. We then discuss how commonly we identify this kind of estimation and whether we can identify classes of changes for which this estimation is characteristic. If we find that a hypothesis holds for an environmental change, it means that enough knowledge is available to estimate causal effects/conditional probabilities across environments.

6.3.2 Subject systems

In this study, we selected four configurable software systems from different domains, with different functionalities, and written in different programming languages (Table 6.1). Further details can be found in Jamshidi et al. (2017).

6.4 Identification of Causal Effects (RQ1)

We can derive a complete solution to the problem of identification whenever the assumptions are expressible in DAG form. This entails (i) graphical and algorithmic criteria for deciding identifiability of causal effects, and (ii) automated procedures for extracting all identifiable estimands (Pearl, 1995; Huang and Valtorta, 2006a; Shpitser and Pearl, 2006a).
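The best-known of these graphical criteria is back-door adjustment, stated here only for orientation (a standard result, not specific to our setting): if a set Z of observed variables satisfies the back-door criterion relative to (X, Y) in G, then

\[
P(y \mid do(x)) \;=\; \sum_{z} P(y \mid x, z)\, P(z),
\]

so the interventional quantity is expressed entirely in terms of the observational distribution. The do-calculus rules and the algorithms cited above generalize this idea to arbitrary DAGs, including those with latent variables.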

Table 6.1: Overview of the real-world subject systems

System    Domain          d    |C|     |H|   |W|   |V|
SPEAR     SAT solver      14   16384   3     4     2
SQLite    Database        14   1000    2     14    2
x264      Video encoder   16   4000    2     3     3
XGBoost                   12   4096    3     3     1

Here, we investigate the possibility of estimating causal effects of configuration options on performance from observational studies alone. For this purpose, we consider a hypothesis about the possibility of identifiability in experiments with a single performance metric (e.g., response time) and with multiple performance metrics (e.g., response time and throughput). We expect that this hypothesis holds for (almost) all cases, which would enable an easy estimation of causal effects from the available data.

H1: The causal effect of options Oi on performance perf from observed data is identifiable. Importance: If the causal effect of configuration options on performance is identifiable from available data, we can predict the performance behavior of a system in the presence/absence of a configuration option just from the available observational data. Also, we may get rid of the curse of dimensionality in highly configurable systems when running and testing new experiments, because the recovered causal structure from the observed data indicates whether a given configuration option is influential on performance.

Methodology: We evaluate whether P(perf | do(Oi = o′)) is identifiable. We used the PC or FCI algorithms (with the two commonly used significance levels, 0.01 and 0.05), along with a set of background knowledge (derived from experts' opinions) that explains the observed independence facts in a sample, to learn the corresponding causal graph. For example, Figure 6.2 shows the obtained causal graph for x264 in the corresponding environment. We use this causal graph to estimate the causal effect of the configuration option visualize on the encoding time of the system, i.e., P(encoding-time-feature1-2762-8 | do(visualize)). Also, Figure 6.3 shows the obtained causal graph for XGBoost12 in the corresponding environment. We use this causal graph to estimate P(test-time | do(max-depth)).

Figure 6.2: Causal graph for x264 deployed on the internal server Feature1, using version 2.76.2 of x264 and a small video for encoding (nodes: no-8x8dct, no-scenecut, no-mbtree, visualize, encoding-time-feature1-2762-8). For all figures, we do not show options that do not affect performance.

Figure 6.3: Causal graph for XGBoost12 with the CNAE-9 data set, deployed on Feature 4 (option nodes: min-child-weight, colsample-bytree, colsample-bylevel, learning-rate, max-depth, subsample, n-estimators, nthread). Performance nodes are: train-time, test-time, and accuracy.

Results: First, the obtained causal graph in each case indicates which configuration options are influential on performance for the corresponding environment. In all instances (see supplementary material), the number of configuration options that affect the corresponding performance metric is remarkably small (usually fewer than 6), indicating that the dimensionality of the configuration space for sampling and running new experiments can be reduced drastically. This observation confirms the exploratory analysis in Jamshidi et al. (2017), showing that only a small proportion of possible interactions have an effect on performance and so are relevant. For example, Figure 6.2 shows that only four (out of 16) configuration options affect the encoding time in the corresponding environment. Second, P(perf | do(Oi = o′)) is estimable in all environments with a single measurement, because in all cases the pre-intervention and post-intervention (Pearl, 2009) causal graphs are the same, and so P(perf | do(Oi = o′)) = P(perf | Oi = o′), indicating that hypothesis H1 holds in general. For example, for x264 deployed on the internal server Feature1, using version 2.76.2 of x264 and a small video for encoding, using do-calculus and Hugin gives: P(encoding-time-feature1-2762-8 | do(visualize = 1)) = P(encoding-time-feature1-2762-8 | visualize = 1), with a mean of 0.37 and a variance of 0.14. Also, Figure 6.3 shows those configuration options that affect the performance nodes in the corresponding environment. Similarly, we observed that P(perf | do(Oi = o′)) is estimable in all environments with multiple measurements. For example, for XGBoost12, using Rule 2 of do-calculus gives: P(test-time | do(max-depth)) = P(test-time | max-depth).

Implications: The results indicate that such information can be used to find (causal) influential options, leading to effective exploration strategies.
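To make the estimation step concrete, the following sketch (plain R with hypothetical column names, not the Hugin workflow used above) summarizes the interventional distribution directly from the observational sample, which is valid here precisely because P(perf | do(Oi = o′)) = P(perf | Oi = o′).

# Sketch: when the pre- and post-intervention graphs coincide, the causal effect of an
# option reduces to a conditional distribution over the measured data.
d <- read.csv("x264_measurements.csv")       # hypothetical: column visualize in {0,1}, column encoding_time

post <- d$encoding_time[d$visualize == 1]    # observational slice standing in for do(visualize = 1)
c(mean = mean(post), var = var(post))        # summary of the estimated interventional distribution

# Average causal effect of toggling the option, estimated as a difference of conditional means:
ate <- mean(d$encoding_time[d$visualize == 1]) - mean(d$encoding_time[d$visualize == 0])
ate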

6.5 Transportability of Causal and Statistical Relations Across Environments (RQ2)

Here, we investigate the possibility of transportability of causal effects across environments. For this purpose, we consider a hypothesis about the possibility of transportability of causal/statistical relations across environments. We observed that this hypothesis holds for some cases with both small and even severe environmental changes, which would enable an easy generalization (trivial transportability1) of causal and statistical relations from the source to the target environment.

H2: The causal/statistical relation R is transportable across environments. Importance: When experiments cannot be conducted in the target environment, and despite severe differences between the two environments, it might still be possible to compute causal relations by borrowing experimental knowledge from the source environment. Also, if transportability is feasible, the investigator may select the essential measurements in both experimental and observational studies, and thus minimize measurement costs.

1This kind of transportability allows us to estimate causal/statistical relations directly from passive observations on the target environment, unaided by causal/statistical information from the source environment (Pearl and Bareinboim, 2011).

Figure 6.4: Selection diagram for SPEAR in two environments: one with measured solving time, deployed on a private server, version 2.7, SAT size 10286, and another deployed on Azure Cloud (nodes: spset-autotuned-fh-1-1, spset-autotuned-fh-1-0, spset-sw-verif, spset-old-default, spset-hw-bmc, sp-var-dec-heur, S1, performance).

Methodology: We investigate whether P(per f do(Oi = o0)) (or P(per f Oi = o0)) is trans- | | portable across environments. For this purpose, we first recover the corresponding causal graphs for source and target environments in a similar way to that described in H1. Since the S-variables in the selection diagram2 locate the mechanisms where structural discrep- ancies between the two environments are suspected to take place, we only add the selection node to the measurement metric node(s). For example, Figure 6.4 shows the selection dia- gram for SPEAR deployed on two different environments. We use this selection diagram to verify the transportability of P(per f do(spset hw bmc)) and P(per f spset hw bmc) | − − | − − across mentioned environments. Also, Figure 6.5 shows the obtained selection diagram for XGBoost12 in two environments. We use this selection diagram to verify the transporta- bility of P(test time do(colsample bylevel)). − | − Results: We observed that H2 holds for those environments (with single measurement met- ric) that share the same causal graph while the presence of a selection node pointing to the variable, say per f , in the selection diagram indicates that the local mechanism that assigns values to per f may not the same in both environments. In these cases, the corresponding selection diagram is O per f S , and so the causal/statistical relation is trivially trans- i → ← portable Pearl and Bareinboim (2011). This observation is consistent with the exploratory

² A selection diagram is a causal diagram augmented with a set, S, of "selection variables," where each member of S corresponds to a mechanism by which the two domains differ (Pearl and Bareinboim, 2011).
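To make the construction above concrete, the sketch below builds a selection diagram for SPEAR as a directed graph. It assumes NetworkX is available; the option-to-performance edges are assumed purely for illustration (in the study they come from the causal graph recovered for the source environment, cf. Figure 6.4), and only the measurement-metric node receives the selection node S1.

import networkx as nx

# Hypothetical causal graph for the source environment: every option is
# assumed (for illustration only) to point at the performance metric.
spear = nx.DiGraph()
options = ["spset-autotuned-fh-1-1", "spset-autotuned-fh-1-0", "spset-sw-verif",
           "spset-old-default", "spset-hw-bmc", "sp-var-dec-heur"]
for opt in options:
    spear.add_edge(opt, "performance")

# Selection diagram: add a selection node only at the measurement-metric node,
# since that is where the two environments are suspected to differ.
spear.add_edge("S1", "performance")

print(sorted(spear.predecessors("performance")))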

Figure 6.5: Selection diagram for XGBoost12 deployed on two environments: one deployed on a private server Feature 4, with the covtype dataset, and another with the same characteristics but deployed on Azure Cloud. Performance nodes are: train-time, test-time, and accuracy. (Other nodes: colsample-bytree, subsample, colsample-bylevel, min-child-weight, learning-rate, n-estimators, alpha, nthread, S1, S2, S3.)

However, our observations show that despite glaring differences between the two environments, it might still be possible to infer causal effects/statistical relations across environments. Also, we observed that causal/statistical relations are transportable across environments with multiple measurement metrics. In such cases, the complete algorithm in Bareinboim and Pearl (2012) can be used to derive the transport formula. Nevertheless, our observations indicate that the transportable causal/statistical relations are trivial. For example, based on Figure 6.5, we have: P(test-time | do(colsample-bylevel)) = P(test-time | colsample-bylevel).
Implications: Transportability of causal relations can be exploited to avoid running new costly experiments in the target environment.
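As a minimal sketch of what a trivial transport estimate looks like in practice, the snippet below estimates the relation P(test-time | colsample-bylevel) directly from passive observations collected in the target environment, without any experimental data from the source. The data frame is hypothetical and purely illustrative.

import pandas as pd

# Hypothetical observational samples from the target environment.
target_obs = pd.DataFrame({
    "colsample_bylevel": [0.5, 0.5, 0.8, 0.8, 1.0, 1.0],
    "test_time":         [12.1, 11.8, 9.7, 10.2, 8.9, 9.1],
})

# Empirical estimate of E[test-time | colsample-bylevel] in the target
# environment; no source-environment experiments are needed.
estimate = target_obs.groupby("colsample_bylevel")["test_time"].mean()
print(estimate)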

6.6 Generalizing Statistical Findings Across Sampling Conditions (RQ3)

Here, we examine the possibility of recovering conditional probabilities from selection-biased data. We consider a hypothesis about the possibility of recoverability without external data. We observed that this hypothesis holds for some cases, thus enabling the generalization of causal/statistical relations estimated from selection-biased data to the entire population.

H3: The causal relations from selection-biased data are transportable to the population.

Importance: Since selection bias challenges the validity of inferences in statistical analysis, it is valuable to be able to remove selection bias and estimate the causal/statistical relations of the entire population without resorting to external information.

Methodology: We use the causal graph G_s augmented with a node S that encodes the selection mechanism. According to Theorem 1 in Bareinboim, Tian, and Pearl (2014), the distribution P(y | x) is s-recoverable from G_s if and only if (S ⊥⊥ Y | X), which gives a powerful test for s-recoverability.
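To illustrate how this criterion can be checked mechanically, the sketch below encodes a small augmented graph and queries d-separation. It assumes NetworkX 2.4 or later (which provides nx.d_separated), and the particular structure, with the selection node attached only to the configuration option, is an assumption chosen to mirror the benign SQLite case discussed next rather than the exact graph recovered by FCI.

import networkx as nx

# Hypothetical augmented causal graph G_s for one option and one metric.
G_s = nx.DiGraph()
G_s.add_edge("sqlite-omit-quickbalance", "fillseq")  # option -> performance metric
G_s.add_edge("S", "sqlite-omit-quickbalance")        # selection acts on the option

# Theorem 1 of Bareinboim, Tian, and Pearl (2014): P(y | x) is s-recoverable
# iff S is d-separated from Y given X in the augmented graph.
recoverable = nx.d_separated(G_s, {"S"}, {"fillseq"}, {"sqlite-omit-quickbalance"})
print(recoverable)  # True: in this structure the selection bias is benign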

Results: As we observed, in most cases the causal graph recovered by the FCI algorithm does not contain a non-chordal undirected component, indicating that FCI has not detected any selection bias in the sampled data. In such cases, s-recoverability is the same as transportability, so H3 holds for many cases in our study. For example, P(fillseq | sqlite-omit-quickbalance) is not s-recoverable in Figure 6.6 (a) and (c), but it is s-recoverable in Figure 6.6 (b) and (d). In the data collected for the performance analysis of configurable systems, the authors of Jamshidi et al. (2017, 2018) sampled on the basis of configuration settings alone; therefore the conditions of Figure 6.6 (b) and (d) hold, i.e., the selection bias is benign and the distribution of performance given configuration settings is recoverable. In these cases, knowledge from a sampled subpopulation can be generalized to the entire population. However, FCI recovered some structures of the type of Figure 6.6 (a), indicating that the sample size is small enough that there is some (implicit) selection bias connecting performance with one or more configuration settings.

Implications: Causal information can be used as a guideline for cost-efficient sampling for performance prediction of configurable systems and for avoiding biased estimates of causal/statistical effects in cases where recoverability is not possible.

6.7 Threats to Validity

1) External validity: We selected a diverse set of subject systems and a large number of purposefully selected environment changes, but, as usual, one has to be careful when generalizing to other subject systems and environmental changes.
2) Internal and construct validity: Due to the size of configuration spaces, we could only measure configurations exhaustively in two subject systems and had to rely on sampling (with a substantial sample size) for the others, which may miss causal effects in parts of the configuration space that we did not sample.

Figure 6.6: The causal graph G_s for SQLite in the environment with Feature 20 and version 3.7.6.3. (The four panels (a)–(d) show structures over sqlite-omit-quickbalance and fillseq, with the selection node S attached in some of them.)

Conclusion and Future Work

To the best of our knowledge, this is the first paper that uses causal analysis to identify the key knowledge pieces that can be exploited for transfer learning in highly-configurable systems. Our empirical study demonstrates the existence of diverse forms of transferable causal effects across environments that can contribute to faster, better, more reliable, and, more importantly, less costly performance behavior analysis in configurable systems. For a future research direction, it would be interesting to explore how causal analysis can be employed for developing effective sampling methods and for providing explainable performance analysis in configurable systems.

Chapter 7

Concluding Remarks

Graphical models provide a strong and clear formalism for studying the conditional independence relations, probabilistic reasoning, and decision making that arise in different research areas. Originally, graphs with a single type of edge were used, i.e., undirected graphs and DAGs. However, in the case of undirected graphs only symmetric relations, i.e., correlations between variables, can be represented, and in the case of DAGs only asymmetric relations, i.e., cause-and-effect relations between variables, can be represented. Chain graphs were introduced as a unification of directed and undirected graphs to model systems containing both symmetric and asymmetric relations. However, with the introduction of chain graphs (Lauritzen and Wermuth, 1989) as well as different interpretations of chain graphs (Andersson, Madigan, and Perlman, 1996; Cox and Wermuth, 1993), a very large number of Markov properties have emerged. This has led to confusion regarding the Markov properties of chain graphs, especially in the case of MVR chain graphs. In this thesis we therefore studied some of the fundamental concepts regarding chain graphs (e.g., Markov properties and minimal separators), and in the case of MVR CGs we showed under which conditions the proposed Markov properties for them are equivalent. In addition, we presented an order-independent PC-like algorithm for learning the structure of all three different interpretations of chain graphs and two decomposition-based algorithms, one for learning the structure of AMP CGs and another for learning the structure of MVR CGs, from sampled data under the faithfulness assumption. Although chain graphs are considerably more expressive than DAGs and undirected graphs, i.e., the space of independence models representable by CGs is much larger than

that of DAGs and undirected graphs (Sonntag, Peña, and Gómez-Olmedo, 2015), there are many independence models that cannot be represented by CGs. For this purpose, we proposed a directed acyclic hypergraph framework for a novel probabilistic graphical model that we call Bayesian hypergraphs. The space of directed acyclic hypergraphs is much larger than the space of chain graphs. Hence Bayesian hypergraphs can model much finer factorizations than Bayesian networks or LWF chain graphs and provide simpler and more computationally efficient procedures for factorizations and interventions. Using an expressive PGM class has both advantages and disadvantages. The main advantage is that a model of a more expressive class is more likely to capture the true relations between the variables in the system, while less expressive classes make assumptions such as, for example, that only causal relations exist between variables. The disadvantage of using an expressive class is that it can be harder to find the correct model, since the number of possible models is much larger. This also makes it easier to overfit the learning data. Hence, to get an accurate model, more data is generally needed when learning expressive PGM classes compared to less expressive classes (Sonntag, 2014). Graphs with multiple types of edges can also be harder to interpret, since the interpretation of what an edge represents is not always clear. For example, see (Lauritzen and Richardson, 2002) for the valid interpretation of LWF CGs. In addition, the more basic classes, such as Bayesian networks and Markov networks, have received more attention in research, and hence more efficient learning and inference algorithms exist for these compared to the more general classes. One important question when discussing different PGM classes is: why are CGs or Bayesian hypergraphs interesting when there exist more general and expressive PGM classes such as marginal AMP graphs (Peña, 2014b) or ancestral graphs (Richardson and Spirtes, 2002)? The more expressive PGMs are, the more difficult the corresponding concepts and problems become. For example, though necessary and sufficient graphical conditions under which two ancestral graphs are Markov equivalent had been given previously in (Ali and Richardson, 2002) and (Zhao, Zheng, and Liu, 2005), the criterion which leads to an algorithm

that runs in polynomial time in the size of the graph was provided later, in (Ali, Richardson, and Spirtes, 2009). Furthermore, studying less expressive PGMs such as CGs can be used as a guideline to find out whether concepts and theoretical results regarding BNs can be extended to more general and expressive PGMs such as marginal AMP graphs and ancestral graphs. For example, (Peña, Sonntag, and Nielsen, 2014) developed an algorithm for learning LWF CGs under the composition property. However, (Peña, 2014a) proved that the same technique cannot be used for MVR chain graphs. Since MVR CGs are a subclass of ancestral graphs (Theorem 3.4), we conclude that the same approach used in (Peña, Sonntag, and Nielsen, 2014) cannot be used for learning the structure of ancestral graphs from sampled data.
Table 7.1 summarizes some of the most important attributes of the common interpretations of chain graphs.
To address the applicability of graphical models, we conducted a causal analysis, comparing the performance behavior of highly-configurable systems across environmental conditions (changing workload, hardware, and software versions), to explore when and how causal knowledge can be commonly exploited for performance analysis.
There are many tasks that were not investigated in this thesis and could benefit from the results obtained here. For example, a natural continuation of the work presented here would be to develop a learning algorithm via decomposition for ancestral graphs and marginal AMP graphs under the faithfulness assumption. Learning the structure and the parameters of Bayesian hypergraphs is another direction for future work. The success of constraint-based structure learning algorithms for chain graphs leads us to hope that similar techniques would work for learning Bayesian hypergraphs. Since all of the proposed algorithms in the literature for learning CGs are constraint-based, another important area for future research is designing efficient scoring functions and score-based learning algorithms for CGs.

Table 7.1: Properties of chain graphs under different interpretations

MVR CGs: Cox & Wermuth (Cox and Wermuth, 1993, 1996; Wermuth and Cox, 2004), Peña & Sonntag (Sonntag and Peña, 2015a; Sonntag and Peña, 2015b; Sonntag, 2014), Sadeghi & Lauritzen (Sadeghi and Lauritzen, 2014), Drton (type IV) (Drton, 2009), Marchetti & Lupparelli (Marchetti and Lupparelli, 2008, 2011).
Causal interpretation: Directed edges represent direct causal effects. Any bidirected edge X ↔ Y represents one or more hidden common causes between X and Y (claimed in (Cox and Wermuth, 1996; Wermuth and Sadeghi, 2012; Sadeghi and Lauritzen, 2014; Sonntag, 2014), proved in Corollary 3.5).
Global Markov property: (1) X ⊥⊥ Y | Z if X is separated from Y by Z in (G_{ant(X∪Y∪Z)})^a or (G_{an(X∪Y∪Z)})^a (Richardson and Spirtes, 2002; Richardson, 2003); (2) X ⊥⊥ Y | Z if X is separated from Y by Z in (G_{Antec(X∪Y∪Z)})^a. (1) and (2) are equivalent (see Theorem 3.15).
Factorization of p(x): (1) Theorem 3.19, p(x) = ∏_{T∈𝒯} p(x_T | x_{pa(T)}); (2) p(x) = ∏_{T∈𝒯} p(x_T | x_{paD(T)}), where paD(T) is the union of all the chain components that are parents of T in the directed graph D (Drton, 2009; Marchetti and Lupparelli, 2011).
Model selection (constraint-based structural learning): PC-like algorithm in (Sonntag and Peña, 2012; Javidian, Valtorta, and Jamshidi, 2019c), decomposition-based algorithm in (Javidian and Valtorta, 2019a), answer set programming (ASP) algorithm in (Peña, 2016b).

LWF CGs (Frydenberg, 1990; Lauritzen and Wermuth, 1989), Drton (type I) (Drton, 2009).
Causal interpretation: Directed edges represent direct causal effects. Undirected edges represent causal effects due to interference (Shpitser, Tchetgen, and Andrews, 2017; Ogburn, Shpitser, and Lee, 2018; Richardson and Spirtes, 2002).
Global Markov property: X ⊥⊥ Y | Z if X is separated from Y by Z in (G_{An(X∪Y∪Z)})^m (Lauritzen, 1996).
Factorization of p(x): p(x) = ∏_{τ∈𝒯} p(x_τ | x_{pa(τ)}), where p(x_τ | x_{pa(τ)}) = Z^{-1}(x_{pa(τ)}) ∏_{c∈C} φ_c(x_c) and C are the complete sets in the moral graph (τ ∪ pa(τ))^m (Cowell et al., 1999; Lauritzen and Richardson, 2002).
Model selection (constraint-based structural learning): IC-like algorithm in (Studený, 1997), LCD algorithm in (Ma, Xie, and Geng, 2008), CKES algorithm in (Peña, Sonntag, and Nielsen, 2014; Sonntag, 2014), PC-like algorithm in (Javidian, Valtorta, and Jamshidi, 2019b), ASP algorithm in (Sonntag et al., 2015b).

AMP CGs (Andersson, Madigan, and Perlman, 1996), Drton (type II) (Drton, 2009).
Causal interpretation: Every AMP CG is Markov equivalent to some DAG with error and selection nodes under marginalization of the error (deterministic) nodes and conditioning of the selection nodes (Peña, 2014b).
Global Markov property: X ⊥⊥ Y | Z if X is separated from Y by Z in the undirected graph Aug[CG; X, Y, Z] (Andersson, Madigan, and Perlman, 1996; Richardson, 1998).
Factorization of p(x): p(x) = ∏_{τ∈𝒯} p(x_τ | x_{pa(τ)}), where no further factorization similar to the LWF model appears to hold in general (Andersson, Madigan, and Perlman, 1996); for positive distributions see (Peña, 2018b).
Model selection (constraint-based structural learning): PC-like algorithm in (Peña and Gómez-Olmedo, 2016; Javidian, Valtorta, and Jamshidi, 2019a), ASP algorithm in (Peña, 2016b), decomposition-based algorithm in (Javidian, Valtorta, and Jamshidi, 2019a).

Table 7.2: List of publications for each chapter

LWF CGs:
• Finding Minimal Separators in LWF Chain Graphs (Javidian and Valtorta, 2018b) [section 2.3].
• Learning LWF Chain Graphs: An Order-Independent Algorithm (Javidian, Valtorta, and Jamshidi, 2019b) [section 2.4].
MVR CGs:
• On the properties of MVR Chain Graphs (Javidian and Valtorta, 2018c) [section 3.2].
• Finding Minimal Separators in MVR Chain Graphs (Javidian and Valtorta, 2018a) [section 3.3].
• Order-Independent Structure Learning of Multivariate Regression Chain Graphs (Javidian, Valtorta, and Jamshidi, 2019c) [section 3.4].
• A Decomposition-Based Algorithm for Learning the Structure of MVR Chain Graphs (Javidian and Valtorta, 2019a) [section 3.5].
AMP CGs:
• AMP CGs: Minimal Separators and Structure Learning Algorithms (Javidian, Valtorta, and Jamshidi, 2019a) [Chapter 4].
Bayesian hypergraphs:
• On a hypergraph probabilistic graphical model (Javidian et al., 2018; Wang et al., 2019) [Chapter 5].
Causal transfer learning:
• Transfer Learning for Performance Modeling of Configurable Systems: A Causal Analysis (Javidian, Jamshidi, and Valtorta, 2019) [Chapter 6].

Bibliography

Abramson, B.; Brown, J.; Edwards, W.; Murphy, A.; and Winkler, R. L. 1996. Hailfinder: A bayesian system for forecasting severe weather. International Journal of Forecasting 12(1):57 – 71. Probability Judgmental Forecasting.

Acid, S., and Campos, L. M. D. 1996. An algorithm for finding minimum d-separating sets in belief networks. Proceedings of the Twelfth international conference on Uncertainty in artificial intelligence 3–10.

Ali, R. A., and Richardson, T. S. 2002. Markov equivalence classes for maximal ancestral graphs. In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, UAI'02, 1–9.

Ali, R. A.; Richardson, T. S.; and Spirtes, P. 2009. Markov equivalence for ancestral graphs. The Annals of Statistics 37(5B):2808–2837.

Andersson, S. A., and Perlman, M. D. 2006. Characterizing markov equivalence classes for amp chain graph models. The Annals of Statistics 34(2):939–972.

Andersson, S. A.; Madigan, D.; and Perlman, M. D. 1996. An alternative Markov property for chain graphs. In Horvitz, E., and Jensen, F. V., eds., Proceedings of the Twelfth Conference on Uncertainty in artificial intelligence, 40–48.

Andersson, S. A.; Madigan, D.; and Perlman, M. D. 2001. Alternative markov properties for chain graphs. Scandinavian Journal of Statistics 28(1):33–85.

Banerjee, M., and Richardson, T. 2003. On a dualization of graphical gaussian models: A correction note. Scandinavian Journal of Statistics 30(4):817–820.

Bareinboim, E., and Pearl, J. 2012. Transportability of causal effects: Completeness results. In Proceedings of the 26th AAAI Conference on Artificial Intelligence, 698–704. Toronto, Ontario, Canada: AAAI Press.

Bareinboim, E., and Pearl, J. 2016. Causal inference and the data-fusion problem. Proceedings of the National Academy of Sciences 113(27):7345–7352.

Bareinboim, E.; Tian, J.; and Pearl, J. 2014. Recovering from selection bias in causal and statistical inference. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, AAAI’14, 2410–2416.

Beinlich, I. A.; Suermondt, H. J.; Chavez, R. M.; and Cooper, G. F. 1989. The alarm monitoring system: A case study with two probabilistic inference techniques for belief networks. In Hunter, J.; Cookson, J.; and Wyatt, J., eds., AIME 89, 247–256. Berlin, Heidelberg: Springer Berlin Heidelberg.

Berry, A.; Blair, J.; Heggernes, P.; and Peyton, B. 2004. Maximum cardinality search for computing minimal triangulations of graphs. Algorithmica 39:287–298.

Binder, J.; Koller, D.; Russell, S.; and Kanazawa, K. 1997. Adaptive probabilistic networks with hidden variables. Machine Learning 29(2):213–244.

Colombo, D., and Maathuis, M. H. 2014. Order-independent constraint-based causal structure learning. The Journal of Machine Learning Research 15(1):3741–3782.

Colombo, D.; Maathuis, M. H.; Kalisch, M.; and Richardson, T. S. 2012. Learning high-dimensional directed acyclic graphs with latent and selection variables. The Annals of Statistics 40(1):294–321.

Cowell, R.; Dawid, A. P.; Lauritzen, S.; and Spiegelhalter, D. J. 1999. Probabilistic networks and expert systems. Statistics for Engineering and Information Science. Springer-Verlag.

Cox, D. R., and Wermuth, N. 1993. Linear dependencies represented by chain graphs. Statistical Science 8(3):204–218.

Cox, D. R., and Wermuth, N. 1996. Multivariate Dependencies-Models, Analysis and Interpretation. Chapman and Hall.

Drton, M., and Maathuis, M. H. 2017. Structure learning in graphical modeling. Annual Review of Statistics and Its Application 15(4):365–393.

Drton, M. 2009. Discrete chain graph models. Bernoulli 15(3):736–753.

Edwards, D. 2000. Introduction to Graphical Modelling. 2nd Ed. Springer-Verlag, New York.

Evans, R., and Richardson, T. S. 2014. Markovian acyclic directed mixed graphs for discrete data. The Annals of Statistics 42(4):1452–1482.

Evans, R. 2016. Graphs for margins of bayesian networks. Scandinavian Journal of Statistics 43(3):625–648.

Fenton, N., and Neil, M. 2018. Risk Assessment and Decision Analysis with Bayesian Networks. New York: Chapman and Hall/CRC, 2nd edition.

Frydenberg, M. 1990. The chain graph markov property. Scandinavian Journal of Statistics 17(4):333–353.

Ghosh, J. K., and Valtorta, M. 2000. Building a bayesian network model of heart disease. In Proceedings of the 38th Annual on Southeast Regional Conference, ACM-SE 38, 239–240. New York, NY, USA: ACM.

Golumbic, M. C. 1980. Algorithmic Graph Theory and Perfect Graphs. Academic Press.

Hájek, P.; Havránek, T.; and Jirousek, R. 1992. Uncertain information processing in expert systems. Taylor & Francis.

Højsgaard, S.; Edwards, D.; and Lauritzen, S. 2012. Graphical Models with R. Springer.

Huang, Y., and Valtorta, M. 2006a. Identifiability in causal bayesian networks: A sound and complete algorithm. In Proceedings of the 21st AAAI Conference on Artificial Intelligence, 1149–1154.

Huang, Y., and Valtorta, M. 2006b. Pearl's calculus of intervention is complete. In Proceedings of the Twenty-Second Annual Conference on Uncertainty in Artificial Intelligence (UAI-06), 217–224.

Huang, Y., and Valtorta, M. 2008. On the completeness of an identifiability algorithm for semi-markovian models. Annals of Mathematics and Artificial Intelligence 54(4):363–408.

Jamshidi, P.; Siegmund, N.; Velez, M.; Kästner, C.; Patel, A.; and Agarwal, Y. 2017. Transfer learning for performance modeling of configurable systems: An exploratory analysis. 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE) 497–508.

Jamshidi, P.; Velez, M.; Kästner, C.; and Siegmund, N. 2018. Learning to sample: Exploiting similarities across environments to learn performance models for configurable systems. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 71–82.

Javidian, M. A., and Valtorta, M. 2018a. Finding minimal separators in ancestral graphs. In Seventh Causal Inference Workshop at the 34th Conference on Uncertainty in Artificial Intelligence (UAI-18).

Javidian, M. A., and Valtorta, M. 2018b. Finding minimal separators in LWF chain graphs. In The 9th International Conference on Probabilistic Graphical Models (PGM 2018), 193–200.

Javidian, M. A., and Valtorta, M. 2018c. On the properties of MVR chain graphs. In Workshop proceedings of the 9th International Conference on Probabilistic Graphical Models, 13–24.

Javidian, M. A., and Valtorta, M. 2019a. A decomposition-based algorithm for learning the structure of MVR chain graphs. Submitted to the IJAR.

Javidian, M. A., and Valtorta, M. 2019b. Supplementary materials for "A decomposition-based algorithm for learning the structure of MVR chain graphs". https://www.dropbox.com/sh/iynnlwyu8il7m3v/AACk8SyIEn7s-W9NRlLnz0DDa?dl=0.

Javidian, M. A.; Lu, L.; Valtorta, M.; and Wang, Z. 2018. On a hypergraph probabilistic graphical model. https://arxiv.org/abs/1811.08372. Submitted to Annals of Mathematics and Artificial Intelligence.

Javidian, M. A.; Jamshidi, P.; and Valtorta, M. 2019. Transfer learning for performance modeling of configurable systems: A causal analysis. https://why19.causalai.net/papers/SSS19_Paper_Upload_200.pdf.

Javidian, M. A.; Valtorta, M.; and Jamshidi, P. 2019a. AMP CGs: Minimal separators and structure learning algorithms. Submitted to JMLR.

Javidian, M. A.; Valtorta, M.; and Jamshidi, P. 2019b. Learning LWF chain graphs: An order-independent algorithm. Submitted to AISTATS 2020.

Javidian, M. A.; Valtorta, M.; and Jamshidi, P. 2019c. Order-independent structure learning of multivariate regression chain graphs. Accepted for publication in the SUM 2019 proceedings.

Javidian, M. A.; Valtorta, M.; and Jamshidi, P. 2019d. Supplementary materials for "Learning LWF chain graphs: An order-independent algorithm". https://www.dropbox.com/sh/cvcc9dar4v9kvt0/AAAx5dSIWxyjGdBUgLcJfBEqa?dl=0. Submitted to AISTATS 2020.

Jensen, F. V., and Nielsen, T. D. 2007. Bayesian Networks and Decision Graphs. Springer, 2nd edition.

Kalisch, M., and Bühlmann, P. 2007. Estimating high-dimensional directed acyclic graphs with the pc-algorithm. J. Mach. Learn. Res. 8:613–636.

Kalisch, M.; Mächler, M.; Colombo, D.; Maathuis, M.; and Bühlmann, P. 2012. Causal inference using graphical models with the R package pcalg. Journal of Statistical Software, Articles 47(11):1–26.

Kauermann, G. 1996. On a dualization of graphical gaussian models. Scandinavian Journal of Statistics 23(1):105–116.

Koller, D., and Friedman, N. 2009. Probabilistic Graphical Models: Principles and Techniques. The MIT Press.

Lauritzen, S., and Jensen, F. 1997. Local computation with valuations from a commutative semigroup. Annals of Mathematics and Artificial Intelligence 21(1):51–69.

Lauritzen, S., and Richardson, T. 2002. Chain graph models and their causal interpretations. Journal of the Royal Statistical Society. Series B, Statistical Methodology 64(3):321–348.

Lauritzen, S. L., and Spiegelhalter, D. J. 1988. Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society. Series B (Methodological) 50(2):157–224.

Lauritzen, S., and Wermuth, N. 1989. Graphical models for associations between variables, some of which are qualitative and some quantitative. The Annals of Statistics 17(1):31– 57.

Lauritzen, S.; Dawid, A. P.; Larsen, B. N.; and Leimer, H.-G. 1990. Independence properties of directed markov fields. Networks 20(5):491–505.

Lauritzen, S. 1996. Graphical Models. Oxford Science Publications.

Levitz, M.; Perlman, M. D.; and Madigan, D. 2001. Separation and completeness properties for amp chain graph markov models. The Annals of Statistics 29(6):1751–1784.

Ma, Z.; Xie, X.; and Geng, Z. 2008. Structural learning of chain graphs via decomposition. Journal of Machine Learning Research 9:2847–2880.

Malinsky, D., and Spirtes, P. 2016. Estimating causal effects with ancestral graph markov models. JMLR workshop and conference proceedings 52:299–309.

Marchetti, G., and Lupparelli, M. 2008. Parameterization and fitting of a class of discrete graphical models. COMPSTAT: Proceedings in Computational Statistics. P. Brito. Heidelberg, Physica-Verlag HD 117–128.

Marchetti, G., and Lupparelli, M. 2011. Chain graph models of multivariate regression type for categorical data. Bernoulli 17(3):827–844.

Nagarajan, R.; Scutari, M.; and Lèbre, S. 2013. Bayesian Networks in R: With Applications in Systems Biology. Springer.

Neapolitan, R. E., and Jiang, X. 2018. Artificial Intelligence: With an Introduction to Machine Learning. Chapman and Hall, 2nd edition.

Neapolitan, R. E. 1990. Probabilistic Reasoning in Expert Systems: Theory and Algorithms. New York, NY, USA: John Wiley & Sons, Inc.

Neapolitan, R. E. 2003. Learning Bayesian Networks. Upper Saddle River, NJ, USA: Prentice-Hall, Inc.

Ogburn, E.; Shpitser, I.; and Lee, Y. 2018. Causal inference, social networks, and chain graphs. https://arxiv.org/abs/1812.04990.

Pearl, J., and Bareinboim, E. 2011. Transportability of causal and statistical relations: A formal approach. In Proceedings of the 25th AAAI Conference on Artificial Intelligence, 247–254.

Pearl, J., and Bareinboim, E. 2014. External validity: From do-calculus to transportability across populations. Statistical Science 29(4):579–595.

Pearl, J., and Paz, A. 1987. Graphoids: a graph based logic for reasoning about relevancy relations. Advances in Artificial Intelligence II Boulay, BD, Hogg, D & Steel, L (eds), North Holland, Amsterdam 357–363.

Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers Inc. San Francisco, CA, USA.

Pearl, J. 1993. [Bayesian analysis in expert systems]: Comment: Graphical models, causality and intervention. Statistical Science 8(3):266–269.

Pearl, J. 1995. Causal diagrams for empirical research. Biometrika 82(4):669–688.

Pearl, J. 2009. Causality. Models, reasoning, and inference. Cambridge University Press.

Peña, J. M., and Gómez-Olmedo, M. 2016. Learning marginal AMP chain graphs under faithfulness revisited. International Journal of Approximate Reasoning 68:108 – 126.

Peña, J. M.; Sonntag, D.; and Nielsen, J. 2014. An inclusion optimal algorithm for chain graph structure learning. In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, 778–786.

Peña, J. M. 2012. Learning AMP chain graphs under faithfulness. In Proceedings of the Sixth European Workshop on Probabilistic Graphical Models, 251–258.

Peña, J. M. 2014a. Learning multivariate regression chain graphs under faithfulness: Addendum. Available at the author’s website.

Peña, J. M. 2014b. Marginal amp chain graphs. International Journal of Approximate Reasoning 55(5):1185–1206.

Peña, J. M. 2015. Every LWF and AMP chain graph originates from a set of causal models. Symbolic and quantitative approaches to reasoning with uncertainty, Lecture Notes in Comput. Sci., 9161, Lecture Notes in Artificial Intelligence, Springer, 325–334.

Peña, J. M. 2016a. Learning acyclic directed mixed graphs from observations and interventions. Proceedings of the Eighth International Conference on Probabilistic Graphical Models, PMLR 52:392–402.

Peña, J. M. 2016b. Alternative markov and causal properties for acyclic directed mixed graphs. In Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, UAI’16, 577–586. Arlington, Virginia, United States: AUAI Press.

Peña, J. M. 2018a. Identification of strong edges in AMP chain graphs. In Globerson, A., and Silva, R., eds., Proceedings of the 34th Conference on Uncertainty in artificial intelligence, 33–42.

Peña, J. M. 2018b. Reasoning with alternative acyclic directed mixed graphs. Behaviormetrika 1–34.

Ramsey, J.; Spirtes, P.; and Zhang, J. 2006. Adjacency-faithfulness and conservative causal inference. In Proceedings of UAI Conference, 401–408.

Richardson, T. S., and Spirtes, P. 2002. Ancestral graph markov models. The Annals of Statistics 30(4):962–1030.

Richardson, T. S. 1998. Chain graphs and symmetric associations. In: Jordan, M. I. (ed.), Learning in Graphical Models. NATO ASI Series (Series D: Behavioural and Social Sciences), vol. 89, 229–259.

Richardson, T. S. 2003. Markov properties for acyclic directed mixed graphs. Scandinavian Journal of Statistics 30(1):145–157.

Richardson, T. S. 2018. Personal communication.

Roverato, A., and Rocca, L. L. 2006. On block ordering of variables in graphical modelling. Scandinavian Journal of Statistics 33(1):65–81.

Roverato, A. 2005. A unified approach to the characterization of equivalence classes of dags, chain graphs with no flags and chain graphs. Scandinavian Journal of Statistics 32(2):295–312.

Roverato, A. 2017. Graphical Models for Categorical Data. Cambridge University Press.

Rubin, D. B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Sadeghi, K., and Lauritzen, S. 2014. Markov properties for mixed graphs. Bernoulli 20(2):676–696.

Sadeghi, K., and Wermuth, N. 2016. Pairwise markov properties for regression graphs. Stat 5:286–294.

Sarkar, A.; Guo, J.; Siegmund, N.; Apel, S.; and Czarnecki, K. 2015. Cost-efficient sampling for performance prediction of configurable systems. In Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), 342–352.

Scutari, M., and Denis, J.-B. 2015. Bayesian Networks with Examples in R. Chapman and Hall.

Shen, H., and Liang, W. 1997. Efficient enumeration of all minimal separators in a graph. Theoretical Computer Science 180:169–180.

Shpitser, I., and Pearl, J. 2006a. Identification of conditional interventional distributions. In Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence, UAI 2006, 437–444.

Shpitser, I., and Pearl, J. 2006b. Identification of joint interventional distributions in recursive semi-markovian causal models. In Proceedings of the 21st AAAI Conference on Artificial Intelligence, 1219–1226.

Shpitser, I.; Tchetgen, E. T.; and Andrews, R. 2017. Modeling interference via symmetric treatment decomposition. https://arxiv.org/abs/1709.01050.

Siegmund, N.; Grebhahn, A.; Apel, S.; and Kästner, C. 2015. Performance-influence models for highly configurable systems. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2015, 284–294. New York, NY, USA: ACM.

Sonntag, D., and Peña, J. M. 2012. Learning multivariate regression chain graphs under faithfulness. Proceedings of the 6th European Workshop on Probabilistic Graphical Models 299–306.

Sonntag, D., and Peña, J. M. 2015a. Chain graph interpretations and their relations revisited. International Journal of Approximate Reasoning 58:39–56. Special Issue of the Twelfth European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU 2013).

Sonntag, D., and Peña, J. M. 2015b. Chain graphs and gene networks. In Hommersom, A., and Lucas, P. J., eds., Foundations of Biomedical Knowledge Representation: Methods and Applications. Springer. 159–178.

Sonntag, D.; Järvisalo, M.; Peña, J. M.; and Hyttinen, A. 2015a. Learning optimal chain graphs with answer set programming. In Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence, 822–831.

Sonntag, D.; Järvisalo, M.; Peña, J. M.; and Hyttinen, A. 2015b. Learning optimal chain graphs with answer set programming. In Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, UAI'15, 822–831. Arlington, Virginia, United States: AUAI Press.

Sonntag, D.; Peña, J. M.; and Gómez-Olmedo, M. 2015. Approximate counting of graphical models via MCMC revisited. International Journal of Intelligent Systems 30(3):384–420.

Sonntag, D. 2014. A Study of Chain Graph Interpretations (Licentiate dissertation). Linköping University.

Sonntag, D. 2016. Chain Graphs: Interpretations, Expressiveness and Learning Algorithms. Ph.D. Dissertation, Linköping University.

Sperner, E. 1928. Ein satz über untermengen einer endlichen menge. Mathematische Zeitschrift 27(1):544–548.

Spirtes, P.; Glymour, C.; and Scheines, R. 2000. Causation, Prediction and Search, second ed. MIT Press, Cambridge, MA.

Splawa-Neyman, J.; Dabrowska, D. M.; and Speed, T. P. 1990. On the application of probability theory to agricultural experiments. essay on principles. section 9. Statistical Science 5(4):465–472.

Studený, M.; Roverato, A.; and Štěpánová, Š. 2009. Two operations of merging and splitting components in a chain graph. Kybernetika 45(2):208–248.

Studený, M. 1989. Multiinformation and the problem of characterization of conditional independence relations. Problems of Control and Information Theory 18:3–16.

Studený, M. 1997. A recovery algorithm for chain graphs. International Journal of Approximate Reasoning 17:265–293.

Studený, M. 2005. Probabilistic Conditional Independence Structures. Springer-Verlag London.

Tian, J.; Paz, A.; and Pearl, J. 1998. Finding minimal d-separators. Technical report, R-254.

Tsamardinos, I.; Aliferis, C. F.; and Statnikov, A. 2003. Time and sample efficient discovery of markov blankets and direct causal relations. The Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 673–678.

Tsamardinos, I.; Brown, L. E.; and Aliferis, C. F. 2006. The max-min hill-climbing bayesian network structure learning algorithm. Machine Learning 65(1):31–78.

Valov, P.; Petkovich, J.-C.; Guo, J.; Fischmeister, S.; and Czarnecki, K. 2017. Transferring performance prediction models across different hardware platforms. In Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering, ICPE ’17, 39–50. ACM.

Verma, T., and Pearl, J. 1991. Equivalence and synthesis of causal models. In Proceedings of the Sixth Annual Conference on Uncertainty in Artificial Intelligence, UAI ’90, 255– 270.

Volf, M., and Studený, M. 1999. A graphical characterization of the largest chain graphs. International Journal of Approximate Reasoning 20(3):209 – 236.

Wang, Z.; Javidian, M. A.; Lu, L.; and Valtorta, M. 2019. The causal interpretations of bayesian hypergraphs. AAAI Spring Symposium. https://why19.causalai.net/papers/SSS19_Paper_Upload_167.pdf.

Wermuth, N., and Cox, D. R. 2004. Joint response graphs and separation induced by triangular systems. Journal of the Royal Statistical Society. Series B, Statistical Methodology 66(3):687–717.

Wermuth, N., and Sadeghi, K. 2012. Sequences of regressions and their independences. Test 21:215–252.

Wright, S. 1934. The method of path coefficients. The Annals of Mathematical Statistics 5(3):161–215.

Xiang, Y. 2002. Probabilistic Reasoning in Multi-Agent Systems: A Graphical Models Approach. New York, NY, USA: Cambridge University Press.

Xie, X.; Zheng, Z.; and Zhao, Q. 2006. Decomposition of structural learning about directed acyclic graphs. Artificial Intelligence 170(4-5):422–439.

Zhang, J. 2008. On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias. Artificial Intelligence 172(16):1873 – 1896.

Zhao, H.; Zheng, Z.; and Liu, B. 2005. On the markov equivalence of maximal ancestral graphs. Science in China Series A: Mathematics 48(4):548–562.

Appendix A

Proofs of Correctness of the Theorems and Algorithms in

Section 2.4

Before proving the correctness of Algorithm 4, we need several lemmas.

Lemma 1. After line 10 of Algorithm 4, G and H have the same adjacencies.

Proof. Consider any pair of nodes A and B in G. If A ∈ ad_G(B), then A ⊥⊥ B | S does not hold for any S ⊆ V \ (A ∪ B), by the faithfulness assumption. Consequently, A ∈ ad_H(B) at all times.

On the other hand, if A ∉ ad_G(B) (equivalently B ∉ ad_G(A)), Algorithm 2 (Javidian and Valtorta, 2018b) returns a set Z ⊆ ad_H(A) \ B (or Z ⊆ ad_H(B) \ A) such that A ⊥⊥_p B | Z. This means there exists 0 ≤ i ≤ |V_H| − 2 such that the edge A − B is removed from H in line 7.

Consequently, A ∉ ad_H(B) after line 10. □

Lemma 2. G and H∗ have the same minimal complexes and adjacencies after line 19 of Algorithm 4.

Proof. G and H∗ have the same adjacencies by Lemma 1. Now we show that any arrow that belongs to a minimal complex in G is correctly oriented in line 15 of Algorithm 4, in the sense that it is an arrow with the same orientation in G. For this purpose, consider the following two cases:

Case 1: u → w ← v is an induced subgraph in G. So u and v are not adjacent in H (by Lemma 1), u − w ∈ H∗ (by Lemma 1), and u ⊥⊥_p v | (S_uv ∪ {w}) does not hold, by the faithfulness assumption. So u − w is oriented as u → w in H∗ in line 15. Obviously, we will not orient it as w → u.

Case 2: u → w − · · · − z ← v, where w ≠ z, is a minimal complex in G. So u and v are not adjacent in H (by Lemma 1), u − w ∈ H∗ (by Lemma 1), and u ⊥⊥_p v | (S_uv ∪ {w}) does not hold, by the faithfulness assumption. So u − w is oriented as u → w in H∗ in line 15. Since u ∈ S_vw and w ⊥⊥_p v | (S_wv ∪ {u}) by the faithfulness assumption, u, v, and w do not satisfy the conditions, and hence we will not orient u − w as w → u.




Figure A.1: (a) The LWF CG G, (b) the skeleton of G, (c) H∗ before executing line 19 in Algorithm 4, and (d) H∗ after executing line 19 in Algorithm 4.

Consider the chain graph G in Figure A.1(a). After applying the skeleton recovery of Algorithm 4, we obtain H, the skeleton of G, in Figure A.1(b). In the execution of the complex recovery of Algorithm 4, when we pick A, B in line 12 and C in line 13, we have A ⊥⊥ B | ∅, that is, S_AB = ∅, and find that A ⊥⊥ B | C does not hold. Hence we orient B − C as B → C in line 15, which is not a complex arrow in G. Note that we do not orient C − B as C → B: the only chance we might do so is when u = C, v = A, and w = B in the inner loop of the complex recovery of Algorithm 4, but we have B ∈ S_AC and the condition in line 14 is not satisfied. Hence, the graph we obtain before the last step of complex recovery in Algorithm 4 must be the one given in Figure A.1(c), which differs from the recovered pattern in Figure A.1(d). This illustrates the necessity of the last step of complex recovery in Algorithm 4. To see how the edge B → C is removed in the last step of complex recovery in Algorithm 4, we observe that, if we follow the procedure described in the comment after line 19 of Algorithm 4, the only chance that B → C becomes one of the candidate complex arrow pairs is when it is considered together with A → D. However, the only undirected path

between C and D is simply C − D, with D adjacent to B. Hence B → C stays unlabeled and will finally be removed in the last step of complex recovery in Algorithm 4.

Consequently, G and H∗ have the same minimal complexes and adjacencies after line 19. 

Proof of Theorem 2.8. The proof of Theorem 2.8 is completely analogous to the proof of the correctness of the original PC-like algorithm. 

Proof of Theorem 2.9. We consider the removal or retention of an arbitrary edge u − v at some level i. The ordering of the variables determines the order in which the edges

(line 7 of Algorithm 5) and the subsets S of a_H(u) and a_H(v) (line 8 of Algorithm 5) are considered. By construction, however, the order in which edges are considered does not affect the sets a_H(u) and a_H(v). If there is at least one subset S of a_H(u) or a_H(v) such that u ⊥⊥_p v | S, then any ordering of the variables will find a separating set for u and v. (Different orderings may lead to different separating sets, as illustrated in Example 2, but all edges that have a separating set will eventually be removed, regardless of the ordering.) Conversely, if there is no subset S′ of a_H(u) or a_H(v) such that u ⊥⊥_p v | S′, then no ordering will find a separating set. Hence, any ordering of the variables leads to the same edge deletions, and therefore to the same skeleton. □

Proof of Theorem 2.10. The skeleton of the learned pattern is correct by Theorem 2.8. Since u, v are not adjacent, they are c-separated given some subset S of V \ {u, v} (see Algorithm 2). Based on the c-separation criterion for LWF CGs (see Section 2), if w is a node on a minimal complex in G such that u and w are adjacent, then u ⊥⊥_p v | S ∪ {w} does not hold for any such S, due to the moralization procedure. As a result, the u − w edges are all unambiguous, and so the U-structures are correct as in the CPC/MPC-like algorithm. Therefore, the output of the (stable) CPC/MPC-like algorithm is a pattern that is Markov equivalent to G. □

Proof of Theorem 2.11. The stable CPC/MPC-like algorithms have order-independent skeletons, by Theorem 2.9. In particular, this means that their adjacency sets are order-independent. For non-adjacent nodes u and v, the decision about whether the undirected edge u − w is unambiguous and/or a U-structure is based on the adjacency sets of nodes u and v, which are order-independent. The rest of the theorem follows straightforwardly from Theorem 2.9 and the first part of this proof. □

Appendix B

Proof of Theorem 3.10

Proof. (i) ⇒ (ii): This has already been proved in (Richardson, 2003, Theorem 1).
(ii) ⇒ (iii): Assume that the independence model over the node set of the MVR CG G satisfies the global Markov property w.r.t. G in Definition 3.3. We have the following three cases:
Case 1: Let X = τ ∈ 𝒯, Y = nd_D(τ) \ pa_D(τ), and Z = pa_D(τ). So an(X ∪ Y ∪ Z) = τ ∪ nd_D(τ) is an ancestor set, and pa_D(τ) separates τ from nd_D(τ) \ pa_D(τ) in (G_{τ ∪ nd_D(τ)})^a; this shows that the global Markov property in Definition 3.3 implies (IV0) in Definition 3.8.
Case 2: Assume that X = σ ⊆ τ ∈ 𝒯, Y = pa_D(τ) \ pa_G(σ), and Z = pa_G(σ). Consider W = an(X ∪ Y ∪ Z) = an(σ ∪ pa_D(τ)). We know that there is no directed edge from pa_D(τ) \ pa_G(σ) to elements of σ, and also there is no collider path between nodes of Y and σ in W. So every connecting path that connects pa_D(τ) \ pa_G(σ) to σ in (G_W)^a has an intersection with pa_G(σ), which means pa_G(σ) separates pa_D(τ) \ pa_G(σ) from σ in (G_W)^a; this shows that the global Markov property in Definition 3.3 implies (IV1) in Definition 3.8.
Case 3: Assume that X = σ ⊊ τ ∈ 𝒯 is a connected subset of τ. Also, assume that Y = τ \ Nb_G(σ) and Z = pa_D(τ). Obviously, σ and τ \ Nb_G(σ) are two subsets of τ such that there is no connection between their elements. Consider the ancestor set A containing σ, τ \ Nb_G(σ), and pa_D(τ). Clearly, pa_D(τ) ⊆ A. Since σ and τ \ Nb_G(σ) are disconnected in τ, any connecting path between them in A (if it exists) must pass through

pa_D(τ) in (G_A)^a; this shows that the global Markov property in Definition 3.3 implies (IV2) in Definition 3.8.

(iii) ⇒ (iv): Assume that the independence model over the node set of the MVR CG G

satisfies the block recursive Markov property w.r.t. G in Definition 3.8. We show that it satisfies the MR Markov property w.r.t. G in Definition 3.7 by considering the following two cases:
Case 1 (IV0 and IV1 ⇒ MR1): Assume that A is a connected subset of τ. From (IV1) we have:

A ⊥⊥ (pa_D(τ) \ pa_G(A)) | pa_G(A)    (B.1)

Also, from (IV0) we have τ ⊥⊥ (nd_D(τ) \ pa_D(τ)) | pa_D(τ); the decomposition property implies that

A ⊥⊥ (nd_D(τ) \ pa_D(τ)) | pa_D(τ)    (B.2)

Using the contraction property for (B.1) and (B.2) gives: A ⊥⊥ [(nd_D(τ) \ pa_G(τ)) ∪ (pa_D(τ) \ pa_G(A))] | pa_G(A). Using the decomposition property for this independence relationship gives (MR1): A ⊥⊥ (pre(τ) \ pa_G(A)) | pa_G(A), because (pre(τ) \ pa_G(A)) ⊆ [(nd_D(τ) \ pa_G(τ)) ∪ (pa_D(τ) \ pa_G(A))].
Case 2 (IV0 and IV2 ⇒ MR2): Consider that A is a disconnected subset of τ that contains r connected components A_1, ..., A_r, i.e., A = A_1 ∪ · · · ∪ A_r. From (IV2) we have: A_1 ⊥⊥ τ \ Nb_G(A_1) | pa_D(τ). Using the decomposition property gives:

A_1 ⊥⊥ A_2 | pa_D(τ)    (B.3)

Also, using decomposition for (IV0) gives: (A_1 ∪ A_2) ⊥⊥ (pre(τ) \ pa_D(τ)) | pa_D(τ). Applying the weak union property to this independence relation gives: A_1 ⊥⊥ (pre(τ) \ pa_D(τ)) | [A_2 ∪ pa_D(τ)]. Using the contraction property for this and (B.3) gives: A_1 ⊥⊥ [A_2 ∪ (pre(τ) \ pa_D(τ))] | pa_D(τ). Using the weak union property leads to A_1 ⊥⊥ A_2 | [(pa_D(τ) ∪ (pre(τ) \ pa_D(τ))) = pre(τ)]. Similarly, we can prove that for every 1 ≤ i ≠ j ≤ r: A_i ⊥⊥ A_j | pre(τ).
(iv) ⇒ (v): Assume that the independence model over the node set of the MVR CG G satisfies the MR Markov property w.r.t. G in Definition 3.7, and ≺ is an ordering that is consistent with G. Let x ∈ A ⊆ pre_{G,≺}(x). We show that the independence model satisfies the ordered local Markov property w.r.t. G in Definition 3.9 by considering the following two cases:

Case 1: There is a chain component T such that x ∈ T. Consider that A ∩ T is a connected subset of T. From (MR1) we have: dis_{G_A}(x) ⊥⊥ [pre(T) \ pa_G(dis_{G_A}(x))] | pa_G(dis_{G_A}(x)). Using the weak union property gives: {x} ⊥⊥ [pre(T) \ pa_G(dis_{G_A}(x))] | [pa_G(dis_{G_A}(x)) ∪ (dis_{G_A}(x) \ {x})]. Since [A \ (mb(x, A) ∪ {x})] ⊆ [pre(T) \ pa_G(dis_{G_A}(x))], using the decomposition property leads to: {x} ⊥⊥ [A \ (mb(x, A) ∪ {x})] | mb(x, A).
Case 2: There is a chain component T such that x ∈ T, and A ∩ T is a disconnected subset of T with connected components A_1, ..., A_k, i.e., A ∩ T = A_1 ∪ · · · ∪ A_k. It is clear that there is a 1 ≤ d ≤ k such that A_d = dis_{G_A}(x). We have the following two sub-cases:
Sub-case I): σ := T \ Nb_G(A_d) is a connected subset of T.

A [(σ σ ) (pre(T) pa (A ))] pa (A ) d ⊥⊥ 1 ∪ 2 ∪ \ G d | G d

245 . Using the decomposition property gives: x [(σ σ ) (pre(T) pa (A ))] mb(x, A). { } ⊥⊥ 1 ∪ 2 ∪ \ G d | Since [A (mb(x, A) x )] [(σ σ ) (pre(T) pa (A ))], using the decomposition \ ∪ { } ⊆ 1 ∪ 2 ∪ \ G d property leads to: x [A (mb(x, A) x )] mb(x, A). { } ⊥⊥ \ ∪ { } | (v) (i): This has already been proved in (Richardson, 2003, Theorem 2).  ⇒

246 Appendix C

Proofs of Theoretical Results in Section 3.5

Lemma 3. Let ρ be a chain from u to v, and W be the set of all vertices on ρ (W may or may not contain u and v). Suppose that (the endpoints of) a chain ρ is blocked by S . If W S , then the chain ρ is blocked by W and by any set containing W. ⊆

Proof. Since the blocking of the chain ρ depends on those vertices between u and v that are contained in the m-separator, and since W contains all vertices on ρ, ρ is also blocked by S W = W if ρ is blocked by S . Since all colliders on ρ have already been activated ∩ conditionally on W, adding other vertices into the conditional set does not make any new collider active on ρ. This implies that ρ is blocked by any set containing W. 

Lemma 4. Let T be an m-separation tree for CG G, and K be a separator of T that separates T into two subtrees T1 and T2 with variable sets V1 and V2 respectively. Suppose that ρ is a chain from u to v in G where u V K and v V K. Let W denote the set ∈ 1 \ ∈ 2 \ of all vertices on ρ (W may or may not contain u and v). Then the chain ρ is blocked by W K and by any set containing W K. ∩ ∩

Proof. Since u V K and v V K, there is a sequence from s (may be u) to y (may ∈ 1 \ ∈ 2 \ be v) in ρ = (u,..., s, t,..., x, y,..., v) such that s V K and y V K and all vertices ∈ 1 \ ∈ 2 \ from t to x are contained in K. Let ρ0 be the sub-chain of ρ from s to y and W0 the vertex set from t to x, so W0 K. Since s V K and y V K, we have from definition ⊆ ∈ 1 \ ∈ 2 \ of m-separation tree that K m-separates s and y in G, i.e., K blocks ρ0. By lemma 3, we obtain that ρ0 is blocked by W0( K) and any set containing W0. Since W0 (K W), ρ0 ⊆ ⊆ ∩

247 is blocked by K W and by any set containing K W. Thus ρ( ρ0) is also blocked by ∩ ∩ ⊇ them. 

a Remark 15. Javidian and Valtorta showed that if we find a separator over S in (GAn(u v)) ∪ then it is an m-separator in G. On the other hand, if there exists an m-separator over S in

a G then there must exist a separator over S in (GAn(u v)) by removing all nodes which are ∪ not in An(u v) from it (Javidian and Valtorta, 2018a). ∪

Observations in Remark 15 yield the following results.

Lemma 5. Let u and v be two non-adjacent vertices in MVR CG G, and let ρ be a chain from u to v. If ρ is not contained in An(u v), then ρ is blocked by any subset S of an(u v). ∪ ∪

Proof. Since ρ * An(u v), there is a sequence from s (may be u) to y (may be v) in ∪ ρ = (u,..., s, t,..., x, y,..., v) such that s and y are contained in An(u v) and all vertices ∪ from t to x are out of An(u v).Then the edges s t and x y must be oriented as s t ∪ − − ◦→ and x y, otherwise t or x belongs to an(u v). Thus there exist at least one collider ←◦ ∪ between s and y on ρ. The middle vertex w of the collider closest to s between s and y is not contained in an(u v), and any descendant of w is not in an(u v), otherwise there ∪ ∪ is a (partially) directed cycle. So ρ is blocked by the collider, and it cannot be activated conditionally on any vertex in S where S an(u v).  ⊆ ∪

Lemma 6. Let T be an m-separation tree for CG G. For any vertex u there exists at least one node of T that contains u and bd(u).

Proof. If bd(u) is empty, it is trivial. Otherwise let C denote the node of T which contains u and the most number of elements of u’s boundary. Since no set can separate u from a parent (or neighbor), there must be a node of T that contains u and the parent (or neighbor). If u has only one parent (or neighbor), then we obtain the lemma. If u has two or more elements in its boundary, we choose two arbitrary elements v and w of u’s boundary that are not contained in a single node but are contained in two different nodes of T, say u, v C and { } ⊆

248 u, w C0 respectively, since all vertices in V appear in T. On the chain from C to C0 in { } ⊆ T, all separators must contain u, otherwise they cannot separate C from C0. However, any separator containing u cannot separate v and w because v u w is an active chain ◦→ ←◦ between v and w in G. Thus we got a contradiction. 

Lemma 7. Let T be an m-separation tree for CG G and C a node of T. If u and v are two vertices in C that are non-adjacent in G, then there exists a node C0 of T containing u, v and a set S such that S m-separates u and v in G.

Proof. Without loss of generality, we can suppose that v is not a descendant of the vertex u in G, i.e., v < nd(u). According to the local Markov property for MVR chain graphs proposed by Javidian and Valtorta in (Javidian and Valtorta, 2018c), we know that u ⊥⊥ [nd(u) bd(u)] pa (u). By Lemma 6, there is a node C of T that contains u and bd(u). If \ | G 1 v C , then S defined as the parents of u m-separates u from v. ∈ 1

If v < C1, choose the node C2 that is the closest node in T to the node C1 and that contains u and v. Consider that there is at least one parent (or neighbor) p of u that is not contained in C2. Thus there is a separator K connecting C2 toward C1 in T such that K m-separates p from all vertices in C K. Note that on the chain from C to C in T, all 2 \ 1 2 separators must contain u, otherwise they cannot separate C from C . So, we have u K 1 2 ∈ but v < K (if v K, then C is not the closest node of T to the node C ). In fact, for every ∈ 2 1 parent (or neighbor) p0 of u that is contained in C1 but not in C2, K separates p0 from all vertices in C K, especially the vertex v. 2 \ Define S = (an(u v) C ), which is a subset of C . We need to show that u and v are ∪ ∩ 2 2 m-separated by S , that is, every chain between u and v in G is blocked by S . If ρ is not contained in An(u v), then we obtain from Lemma 5 that ρ is blocked by S . ∪ When ρ is contained in An(u v), let x be adjacent to u on ρ, that is, ρ = (u, x, y,..., v). ∪ We consider the three possible orientations of the edge between u and x. We now show that ρ is blocked in all three cases.

249 i: u x, so we know that x is not a collider and we have two possible sub-cases: ←

1. x C . In this case the chain ρ is blocked at x. ∈ 2

2. x < C2. In this case K m-separates x from v. By Lemma 4, we can obtain that

the sub-chain ρ0 from x to v can be blocked by W K where W denotes the set of ∩ all vertices between x and v (not containing x and v) on ρ0. Since S (W K), ⊇ ∩ we obtain from Lemma 4 that S also blocks ρ0. Hence the chain ρ is blocked by S .

ii: u x. We have the following sub-cases: →

1. x an(u). This case is impossible because a directed cycle would occur. ∈ 2. x an(v). This case is impossible because v cannot be a descendant of u. ∈ iii: u x. We have the following sub-cases: ↔

1. x an(u). This case is impossible because a partially directed cycle would ∈ occur.

2. x an(v) and v is in the same chain component τ that contains u, x. This is ∈ impossible, because in this case we have a partially directed cycle.

3. x an(v) and v is not in the same chain component τ that contains u, x. We ∈ have the following sub-cases:

– x < C2. In this case K m-separates x from v. By Lemma 4, we can obtain

that the sub-chain ρ0 from x to v can be blocked by W K where W denotes ∩ the set of all vertices between x and v (not containing x and v) on ρ0. Since

S (W K), we obtain from Lemma 4 that S also blocks ρ0. Hence the ⊇ ∩ chain ρ is blocked by S .

– x C2. We have the three following sub-cases: ∈ u x y. In this case x S blocks the chain. Note that in this case ∗ ↔ → ∈ it is possible that y = v.

∗ u ↔ x ← y. So, y (≠ v, otherwise a directed cycle would occur) is not a collider. If y ∈ C2 then the chain ρ is blocked at y. Otherwise, we have the two following sub-cases:

· There is a node C' between C1 and C2 that contains y (note that it is possible that C' = C1), so K m-separates y from v and the same argument used for case i.2 holds.

· There is no node between C1 and C2 that contains y. In this case K m-separates y from p (p ∈ bd(u) ∩ C1 and p ∉ C2), which is impossible because the chain p ◦→ u ↔ x ← y is active (note that u, x ∈ K).

∗ u ↔ x ↔ y. If there is an outgoing (→) edge from y (≠ v, otherwise a partially directed cycle would occur), then the same argument as in the previous sub-case (u ↔ x ← y) holds. Otherwise, y is a collider. If y ∉ C2 then the chain ρ is blocked at y. If y ∈ C2, there must be a non-collider vertex on the chain ρ between y and v to prevent a (partially) directed cycle. The same argument as in the previous sub-case (u ↔ x ← y) holds.
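The case analysis above repeatedly argues that a chain is blocked either at a non-collider that belongs to S or at a collider none of whose descendants (including itself) belongs to S. The following Python fragment is a minimal sketch of that blocking test for a single chain; it is illustrative only, it assumes the standard m-separation criterion for MVR chain graphs viewed as ancestral graphs, and the edge-mark encoding and function names are not taken from the thesis.

```python
# Minimal sketch of the chain-blocking test used in the case analysis above.
# Assumptions (not from the thesis): edge marks between consecutive vertices
# are encoded as '->', '<-', '<->', or '-', and descendants[x] is the set of
# descendants of x including x itself.

def has_arrowhead_at(mark, side):
    """Does the edge mark have an arrowhead at its 'left' or 'right' endpoint?"""
    return mark in ('<-', '<->') if side == 'left' else mark in ('->', '<->')

def chain_is_blocked(vertices, marks, S, descendants):
    """vertices = (v0, ..., vk); marks[i] is the mark between vertices[i] and
    vertices[i+1]; S is the conditioning set."""
    for i in range(1, len(vertices) - 1):
        w = vertices[i]
        collider = (has_arrowhead_at(marks[i - 1], 'right')
                    and has_arrowhead_at(marks[i], 'left'))
        if collider:
            if not (descendants[w] & S):
                return True   # collider with no descendant in S blocks the chain
        elif w in S:
            return True       # non-collider in S blocks the chain
    return False
```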



Proof of Theorem 3.33. From (Cowell et al., 1999), we know that any separator S in the junction tree T separates V1 \ S and V2 \ S in the triangulated graph Ḡ^t_V, where V_i denotes the variable set of the subtree T_i induced by removing the edge with the separator S attached, for i = 1, 2. Since the edge set of Ḡ^t_V contains that of the undirected independence graph Ḡ_V for G, V1 \ S and V2 \ S are also separated in Ḡ_V. Since Ḡ_V is an undirected independence graph for G, using Definition 3.32 we obtain that T is an m-separation tree for G.
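For concreteness, the separator property quoted from (Cowell et al., 1999) can be verified directly on an undirected graph. The following Python sketch (using networkx) is illustrative only; the function name and the representation of the junction tree as a tree whose nodes are frozensets of variables are assumptions, not the thesis implementation.

```python
# Sketch: check that the separator S of a junction-tree edge disconnects the
# variable sets of the two subtrees obtained by removing that edge.
import networkx as nx

def separator_disconnects(ug, jt, edge):
    """ug: undirected (independence) graph over the variables;
    jt: junction tree whose nodes are frozensets of variables;
    edge: a pair (C1, C2) of adjacent tree nodes with separator S = C1 & C2."""
    C1, C2 = edge
    S = C1 & C2
    jt_cut = jt.copy()
    jt_cut.remove_edge(C1, C2)
    side1 = nx.node_connected_component(jt_cut, C1)   # subtree containing C1
    side2 = nx.node_connected_component(jt_cut, C2)   # subtree containing C2
    V1 = set().union(*side1) - S                      # variables of T1, minus S
    V2 = set().union(*side2) - S                      # variables of T2, minus S
    rest = ug.copy()
    rest.remove_nodes_from(S)                         # delete the separator
    return not any(nx.has_path(rest, a, b) for a in V1 for b in V2)
```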

Proof of Theorem 3.34. (⇒) If condition (i) is the case, nothing remains to prove. Otherwise, Lemma 7 implies condition (ii).

(⇐) Assume that u and v are not contained together in any node C of T. Also, assume that C1 and C2 are two nodes of T that contain u and v, respectively. Consider that C1' is the most distant node from C1, between C1 and C2, that contains u, and C2' is the most distant node from C2, between C1 and C2, that contains v. Note that it is possible that C1' = C1 or C2' = C2. By condition (i) we know that C1' ≠ C2'. Any separator between C1' and C2' satisfies the assumptions of Lemma 4. The sufficiency of condition (i) is given by Lemma 4. The sufficiency of condition (ii) is trivial by the definition of m-separation.

Appendix D

Proofs of Correctness of Algorithms 11 and 12

Correctness of Algorithm 11. Since an augmented graph for CG G is an undirected independence graph, by the definition of an undirected independence graph it is enough to show that Ḡ_V defined in step 3 contains all edges of (G_V)^a. It is obvious that Ē contains all edges obtained by dropping the directions of the directed edges in G, since no set can m-separate two vertices that are adjacent in G.

Now we show that Ē also contains any augmented edge that connects two vertices u and v having a collider chain between them, that is, (u, v) ∈ Ē. Any chain graph yields a directed acyclic graph D of its chain components, having T as its node set and an edge T1 → T2 whenever there exists in the chain graph G at least one edge u → v connecting a node u in T1 with a node v in T2 (Marchetti and Lupparelli, 2011). So, there is a collider chain between two nodes u and v if and only if there is a chain component τ ∈ T such that

1. u, v ∈ τ, or

2. u ∈ τ and v ∈ pa_G(τ), or vice versa, or

3. u, v ∈ pa_G(τ).

Since for each connected component τ ∈ T there is a C_h ∈ C containing both τ and its parent set pa_G(τ), in all of the above-mentioned cases we have a (u, v) edge in step 2. Therefore, Ḡ_V defined in step 3 contains all edges of (G_V)^a.
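For illustration, the construction whose correctness was just argued can be sketched as follows. This Python fragment (using networkx) is not the thesis implementation; the function name and the representation of the chain graph by two edge lists are assumptions. It drops the directions of all edges and completes every chain component together with its parent set, which covers the three collider-chain cases listed above.

```python
# Sketch of the augmented-graph construction: drop edge directions and make
# each chain component together with its parents a complete subgraph.
import itertools
import networkx as nx

def augmented_graph(vertices, within_component_edges, directed_edges):
    """within_component_edges: edges inside chain components;
    directed_edges: pairs (u, v) meaning u -> v."""
    # Chain components are the connected components of the within-component part.
    ug = nx.Graph()
    ug.add_nodes_from(vertices)
    ug.add_edges_from(within_component_edges)
    components = nx.connected_components(ug)

    aug = nx.Graph()
    aug.add_nodes_from(vertices)
    aug.add_edges_from(within_component_edges)              # keep skeleton edges
    aug.add_edges_from((u, v) for u, v in directed_edges)   # drop directions
    for tau in components:
        parents = {u for u, v in directed_edges if v in tau}
        # Cases 1-3 above: any pair inside tau together with pa(tau) gets an edge.
        for a, b in itertools.combinations(tau | parents, 2):
            aug.add_edge(a, b)
    return aug
```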

Correctness of Algorithm 12. By the sufficiency of Theorem 3.34, the initializations at steps 2 and 3 for creating edges guarantee that no edge is created between any two variables which are not in the same node of the m-separation tree. Also, by the sufficiency of Theorem 3.34, deleting edges at steps 2 and 3 guarantees that any other edge between two m-separated variables can be deleted in some local skeleton. Thus the global skeleton obtained at step 3 is correct. In a maximal ancestral graph, every missing edge corresponds to at least one independency in the corresponding independence model (Richardson and Spirtes, 2002), and MVR CGs are a subclass of maximal ancestral graphs (Javidian and Valtorta, 2018c). Therefore, according to the necessity of Theorem 3.34, each augmented edge (u, v) in the undirected independence graph must be deleted in some subgraph over a node of the m-separation tree. Furthermore, according to Lemma 6, for every v-structure (u ◦→ w ←◦ v) there is a node in the m-separation tree T that contains u, v, and w, and obviously w ∉ S_uv. Therefore, we can determine all v-structures at step 4, which completes the proof.
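The v-structure determination referred to above (step 4) follows the usual PC-style rule: orient u ◦→ w ←◦ v whenever u and v are non-adjacent, both are adjacent to w, and w does not lie in the separator S_uv. The following Python fragment is an illustrative sketch, not the thesis code; the adjacency-set and separator-map representations are assumptions.

```python
# Sketch of v-structure determination from a recovered skeleton and the
# separators found while deleting edges.
import itertools

def find_v_structures(skeleton_adj, separators):
    """skeleton_adj: dict vertex -> set of adjacent vertices;
    separators: dict frozenset({u, v}) -> separating set S_uv."""
    v_structures = []
    for w in skeleton_adj:
        for u, v in itertools.combinations(skeleton_adj[w], 2):
            if v not in skeleton_adj[u]:                    # u, v non-adjacent
                S_uv = separators.get(frozenset((u, v)), set())
                if w not in S_uv:                           # w outside S_uv
                    v_structures.append((u, w, v))          # u *-> w <-* v
    return v_structures
```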

Appendix E

Proofs of Correctness of the Algorithms in Sections 4.4 and 4.5

In Theorem 4.8, we showed that if we find a separator over S in (G_{ant(u ∪ v)})^a, then it is a p-separator in G. On the other hand, if there exists a p-separator over S in G, then there must exist a separator over S in (G_{ant(u ∪ v)})^a, obtained by removing from it all nodes which are not in ant(u ∪ v). This observation yields the following results.

Lemma 8. Let u and v be two non-adjacent vertices in AMP CG G, and let ρ be a chain from u to v. If ρ is not contained in ant(u ∪ v), then ρ is blocked by any subset S of ant(u ∪ v) \ {u, v}.

Proof. Since ρ ⊈ ant(u ∪ v), there is a sequence from s (which may be u) to y (which may be v) in ρ = (u, ..., s, t, ..., x, y, ..., v) such that s and y are contained in ant(u ∪ v) and all vertices from t to x are outside ant(u ∪ v). Then the edges s − t and x − y must be oriented as s → t and x ← y, otherwise t or x would belong to ant(u ∪ v). Thus there exists at least one triplex between s and y on ρ. The middle vertex w of the triplex closest to s between s and y is not contained in ant(u ∪ v), and no descendant of w is in ant(u ∪ v). So ρ is blocked by this triplex, and it cannot be activated by conditioning on any vertex in S, where S ⊆ ant(u ∪ v) \ {u, v}.
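The argument above uses the closure property of anterior sets: an undirected neighbour of an anterior vertex, or a vertex with a directed edge into an anterior vertex, is itself anterior. Under that reading of ant(·), which is an assumption made explicit here, the anterior set can be computed by a reverse-reachability pass, as in the following illustrative Python sketch (not the thesis code).

```python
# Sketch: compute ant(X) as reverse reachability over directed edges (toward X)
# and undirected edges, assuming X is contained in ant(X) by convention.
from collections import deque

def anterior_set(X, undirected_edges, directed_edges):
    """X: iterable of target vertices; (a, b) in directed_edges means a -> b."""
    into = {}                              # vertex -> vertices that can enter it
    for a, b in directed_edges:            # a -> b: a is anterior to b
        into.setdefault(b, set()).add(a)
    for a, b in undirected_edges:          # a - b: anteriority flows both ways
        into.setdefault(b, set()).add(a)
        into.setdefault(a, set()).add(b)

    ant = set(X)
    queue = deque(ant)
    while queue:
        v = queue.popleft()
        for w in into.get(v, ()):
            if w not in ant:
                ant.add(w)
                queue.append(w)
    return ant
```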

Lemma 9. Let T be a p-separation tree for the AMP CG G. For any vertex u there exists at least one node of T that contains u and pa(u).

Proof. If pa(u) is empty, the result is trivial. Otherwise, let C denote the node of T which contains u and the largest number of u's parents. Since no set can separate u from a parent, for each parent of u there must be a node of T that contains u and that parent. If u has only one parent, then we obtain the lemma. If u has two or more parents, suppose that two of u's parents, v and w, are not contained together with u in a single node of T; they are then contained in two different nodes of T, say {u, v} ⊆ C and {u, w} ⊆ C', respectively, since all vertices in V appear in T. On the chain from C to C' in T, all separators must contain u, otherwise they cannot separate C from C'. However, any separator containing u cannot separate v and w because v → u ← w is an active triplex between v and w in G. Thus we obtain a contradiction.

Lemma 10. Let T be a p-separation tree for AMP CG G and C a node of T. If u and v are two vertices in C that are non-adjacent in G and belong to two different chain components, then there exists a node C' of T containing u and v, and a set S such that S p-separates u and v in G.

Proof. Assume that u and v are two vertices in G that are non-adjacent and belong to two different chain components. Without loss of generality, we can suppose that v is not a descendant of the vertex u in G, i.e., v ∈ nd(u). According to the pairwise Markov property for AMP chain graphs in (Andersson, Madigan, and Perlman, 2001), u ⊥⊥ v | pa(u). By Lemma 9, there is a node C1 of T that contains u and pa(u). If v ∈ C1, then S defined as the parents of u p-separates u from v.

If v ∉ C1, choose the node C2 that is the closest node in T to the node C1 and that contains u and v. Consider that there is at least one parent p of u that is not contained in C2. Thus there is a separator K connecting C2 toward C1 in T such that K p-separates p from all vertices in C2 \ K. Note that on the chain from C1 to C2 in T, all separators must contain u, otherwise they cannot separate C1 from C2. So, we have u ∈ K but v ∉ K (if v ∈ K, then C2 is not the closest node of T to the node C1). In fact, for every parent p' of u that is contained in C1 but not in C2, K separates p' from all vertices in C2 \ K, in particular the vertex v.

Define S = [ant(u ∪ v) ∩ (K ∪ {p ∈ pa(u) | p ∈ C2})] \ τ_u, where τ_u is the chain component that includes u. It is not difficult to see that S is a subset of C2. We need to show that u and

v are p-separated by S, that is, every chain between u and v in G, say ρ, is blocked by S. If ρ is not contained in ant(u ∪ v), then we obtain from Lemma 8 that ρ is blocked by S. When ρ is contained in ant(u ∪ v), let x be adjacent to u on ρ, that is, ρ = (u, x, y, ..., v). We consider the three possible orientations of the edge between u and x. We now show that ρ is blocked by S in all three cases.

i: u ← x, so it is obvious that x is not a triplex node and we have two possible sub-cases:

1. x ∈ C2. In this case the chain ρ is blocked at x.

2. x ∉ C2. In this case K p-separates x from v. Theorem 4.8 guarantees that the set S' = K ∩ ant(x ∪ v) also p-separates x from v. Note that S' ∩ τ_u = ∅ to prevent a partially directed cycle, and S' ⊆ S. So, S p-separates x from v, i.e., the chain between v and x is blocked by S. Hence the chain ρ is blocked by S.

ii: u → x. We have the following sub-cases:

1. x ∈ ant(u). This case is impossible because a partially directed cycle would occur.

2. x ∈ an(v). This case is impossible because v cannot be a descendant of u.

iii: u − x, so x ∈ τ_u. In this case the chain ρ between u and v has a triplex node at some y ∈ τ_u that is not in S. So, the chain ρ is blocked at y and cannot be activated by S.



Proof of Theorem 4.13. From (Cowell et al., 1999), we know that any separator S in the junction tree T separates V1 \ S and V2 \ S in the triangulated graph Ḡ^t_V, where V_i denotes the variable set of the subtree T_i induced by removing the edge with the separator S attached, for i = 1, 2. Since the edge set of Ḡ^t_V contains that of the undirected independence graph Ḡ_V for G, V1 \ S and V2 \ S are also separated in Ḡ_V. Since Ḡ_V is an undirected independence graph for G, using the definition of p-separation tree we obtain that T is a p-separation tree for G.

Proof of Theorem 4.14. (⇒) If condition (i) is the case, nothing remains to prove. Otherwise, Lemma 10 implies condition (ii).

(⇐) Assume that u and v are not contained together in any chain component or in any node C of T. Also, assume that C1 and C2 are two nodes of T that contain u and v, respectively. Consider that C1' is the most distant node from C1, between C1 and C2, that contains u, and C2' is the most distant node from C2, between C1 and C2, that contains v. Note that it is possible that C1' = C1 or C2' = C2. By condition (i) we know that C1' ≠ C2'. The sufficiency of condition (i) is given by the definition of the p-separation tree, because any separator between C1' and C2' p-separates u from v. The sufficiency of condition (ii) is trivial by the definition of p-separation.

The following example shows that Theorem 4.14 cannot be strengthened.

Example 24. Consider the AMP CG G in Figure E.1(a). Vertices f and h are not adjacent, but both of them belong to the same chain component. As one can see in Figure E.1(d), vertices f and h belong to the tree nodes C1 = {b, c, f, g, h} and C2 = {b, e, f, h}. However, neither of them contains a subset of V_G that p-separates f from h.

Correctness of Algorithm 19. By the definition of p-separation trees and Theorem 4.14, the initializations in the local and global skeleton recovery phases guarantee that no edge is created between any two variables which are not in the same node of the p-separation tree. Also, deleting edges in the local and global skeleton recovery phases guarantees that any other edge between two p-separated variables can be deleted in some local skeleton or in the removal procedure of the global skeleton recovery phase. Thus the global skeleton obtained after line 22 is correct. Note that, in an AMP CG, every missing edge corresponds to at least one independency in the corresponding independence model. Therefore, each augmented edge (u, v) in the undirected independence graph must be deleted in some subgraph over a node

of the p-separation tree or at some point of the removal procedure of the global skeleton recovery. The proof of the correctness of the orientation rules R1-R4 can be found in (Peña, 2012).

Figure E.1: (a) AMP CG G, (b) augmented graph G^a, (c) triangulated graph (G^a)^t, and (d) p-separation tree T.
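The two-phase skeleton recovery argued correct in the proof above can be sketched generically as follows. This Python fragment is illustrative only and is not Algorithm 19; the ci_test oracle, the representation of tree nodes as sets of variables, and the function name are assumptions.

```python
# Sketch: recover local skeletons inside each node of a separation tree and
# combine them, deleting an edge globally as soon as some node separates it.
import itertools

def tree_based_skeleton(tree_nodes, ci_test):
    """tree_nodes: iterable of sets of variables (nodes of the separation tree);
    ci_test(u, v, S): assumed conditional-independence oracle."""
    candidate_edges = set()
    separators = {}
    for C in tree_nodes:
        for u, v in itertools.combinations(C, 2):
            separated = False
            for k in range(len(C) - 1):
                for S in itertools.combinations(C - {u, v}, k):
                    if ci_test(u, v, set(S)):
                        separators[frozenset((u, v))] = set(S)
                        separated = True
                        break
                if separated:
                    break
            if not separated:
                candidate_edges.add(frozenset((u, v)))
    # Keep an edge only if no node of the tree separated its endpoints.
    skeleton = {e for e in candidate_edges if e not in separators}
    return skeleton, separators
```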
