
entropy

Article Graphical Models in Reconstructability Analysis and Bayesian Networks

Marcus Harris * and Martin Zwick

Systems Science Program, Portland State University, Portland, OR 97207, USA; [email protected] * Correspondence: [email protected]

Abstract: Reconstructability Analysis (RA) and Bayesian Networks (BN) are both probabilistic graphical modeling methodologies used in machine learning and artificial intelligence. There are RA models that are statistically equivalent to BN models and there are also models unique to RA and models unique to BN. The primary goal of this paper is to unify these two methodologies via a lattice of structures that offers an expanded set of models to represent complex systems more accurately or more simply. The conceptualization of this lattice also offers a framework for additional innovations beyond what is presented here. Specifically, this paper integrates RA and BN by developing and visualizing: (1) a BN neutral system lattice of general and specific graphs, (2) a joint RA-BN neutral system lattice of general and specific graphs, (3) an augmented RA directed system lattice of prediction graphs, and (4) a BN directed system lattice of prediction graphs. Additionally, it (5) extends RA notation to encompass BN graphs and (6) offers an algorithm to search the joint RA-BN neutral system lattice to find the best representation of system structure from underlying system variables. All lattices shown in this paper are for four variables, but the theory and methodology presented in this paper are general and apply to any number of variables. These methodological innovations are contributions to machine learning and artificial intelligence and more generally to complex systems analysis. The paper also reviews some relevant prior work of others so that the innovations offered here can be understood in a self-contained way within the context of this paper.

Keywords: probabilistic graphical models; Reconstructability Analysis; Bayesian networks; information theory; maximum entropy; artificial intelligence; machine learning; lattice of general structures; directed acyclic graph

Citation: Harris, M.; Zwick, M. Graphical Models in Reconstructability Analysis and Bayesian Networks. Entropy 2021, 23, 986. https://doi.org/10.3390/e23080986

Academic Editor: Christopher J. Fonnesbeck

Received: 5 June 2021
Accepted: 26 July 2021
Published: 30 July 2021

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Reconstructability Analysis (RA) and Bayesian Networks (BN) are both probabilistic graphical modeling methodologies. A probabilistic graphical model uses a graph (or hypergraph) to encode independencies and dependencies between variables and probability theory to encode the precise nature of the relations between variables. Graphs are either undirected or directed. RA graphs include undirected graphs (or hypergraphs) that have loops or do not have loops. BN graphs are directed graphs that do not have cycles. (“Loops” here refer to undirected graphs; “cycles” refer to directed graphs.) RA and BN graphs can represent independence structures that are unique to each methodology, and also independence structures that are the same in both methodologies. For RA models without loops and for all BN models, variable independencies can be represented in closed algebraic (factorized) form. For RA models with loops, solutions require iterative calculations. The value of integrating these two methodologies lies in the fact that the RA lattice of structures offers potential models of complex systems not found in BNs, while BNs are a more widely used analytical approach than RA and also include unique models. Combining the candidate models of the two methodologies thus offers a more expressive framework than either alone. It also does so in an organized and coherent way that allows for future possible extensions discussed in Section 6.

Entropy 2021, 23, 986. https://doi.org/10.3390/e23080986 https://www.mdpi.com/journal/entropy

RA is a data modeling approach developed in the systems community [1–17] that combines graph theory and information theory. Its applications are diverse, including time-series analysis, classification, decomposition, compression, pattern recognition, prediction, control, and decision analysis [14]. It is designed especially for nominal variables, but continuous variables can be accommodated if their values are discretized. RA could in theory accommodate continuous variables directly; however, this extension of the methodology has yet to be formalized. Graph theory specifies the structure of the model: if the relations between the variables are all dyadic (pairwise), the structure is a graph; if some relations have higher ordinality, the structure is a hypergraph. In speaking of RA, the word ‘graph’ will henceforth include the possibility that the structure is a hypergraph. The structure is independent of the data except for specification of variable cardinalities. In RA, information theory uses the data to characterize the precise nature and the strength of the relations. Data applied to a graph structure yields a probabilistic graphical model of the data.

RA has three primary types of models: variable-based models without loops, variable-based models with loops, and state-based models (where individual states of variables specify model constraints) that nearly always have loops. Models that do not have loops have closed-form algebraic solutions; those that have loops require iterative proportional fitting. In RA, graphs are undirected, although directions are implicit if one variable is designated as the response variable (dependent variable or DV), while all other variables are designated as explanatory variables (independent variables or IVs). In principle, there could be more than one DV, but in the discussion that follows, a single DV is assumed.
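The iterative proportional fitting mentioned above for models with loops can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `marginal` and `ipf`, the toy counts, and the fixed number of sweeps are all hypothetical choices made for the example. It fits the loop model AB:AC:BC to a three-variable binary distribution by repeatedly rescaling the estimate so that it matches each observed margin in turn.

```python
from itertools import product

def marginal(dist, axes):
    """Sum a joint distribution {state tuple: prob} onto the given axes."""
    out = {}
    for state, p in dist.items():
        key = tuple(state[i] for i in axes)
        out[key] = out.get(key, 0.0) + p
    return out

def ipf(data, relations, sweeps=200):
    """Rescale a uniform start until it matches the data margin of every relation."""
    q = {s: 1.0 / len(data) for s in data}  # uniform start (maximum entropy)
    for _ in range(sweeps):
        for axes in relations:
            target = marginal(data, axes)
            current = marginal(q, axes)
            for s in q:
                key = tuple(s[i] for i in axes)
                if current[key] > 0:
                    q[s] *= target[key] / current[key]
    return q

# Hypothetical counts for three binary variables (A, B, C).
states = list(product((0, 1), repeat=3))
weights = [8, 1, 2, 5, 3, 6, 4, 7]
total = sum(weights)
data = {s: w / total for s, w in zip(states, weights)}

relations = [(0, 1), (0, 2), (1, 2)]  # AB:AC:BC, which has a loop
model = ipf(data, relations)
```

The fitted `model` agrees with `data` on the AB, AC, and BC margins but in general differs from it on the full joint, which is what makes AB:AC:BC a simplification of ABC.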
If the IV-DV distinction is made, the system is ‘directed’ and the primary aim is prediction of the DV given the IVs; if no IV-DV distinction is made, the system is ‘neutral’ and the primary aim is to characterize the nature of relations among all variables.

RA models are undirected graphs that either have or do not have loops, where a ‘loop’ is the presence of circularity in a set of undirected links. We reserve the words ‘cyclic’ and ‘acyclic’ for circularity or lack thereof in directed graphs, which are used in BN and not in RA. An undirected graph having a loop can become an acyclic graph for certain assignments of link directions. For example, an RA model that posits relations between A and B, between B and C, and between A and C has a loop, but if directions are assigned in a BN model so that these relations are A→B, B→C, and A→C, the resulting graph is acyclic.

Graphs are general or specific. A general graph identifies relations among variables that are unlabeled, i.e., variables whose identity is not specified; a specific graph labels (identifies) the variables. For example, for a system consisting of variables A, B, and C, AB:BC is a specific graph where nodes A and B are linked and B and C are also linked. Specific graphs AB:BC, BA:AC and AC:CB are all instances of the same general graph that has a unique independence structure regardless of variable labels. In this notation, the order of variables in any relation is arbitrary, as is the order of the relations. For example, CB:BA is identical to AB:BC. Relations include all of their embedded relations. For example, ABC includes embedded relations AB, AC and BC and the univariate margins A, B, and C.

The lattice of graphs for a neutral or a directed system with or without loops depends upon the number of variables in the data. For a three-variable neutral system allowing loops there are five general graphs and nine specific graphs; for four variables there are 20 general graphs and 114 specific graphs.
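The notational conventions above (order-insensitivity, embedded relations, loops) can be made concrete with a short sketch. The function names `parse`, `embedded`, and `has_loop` are hypothetical, and the loop test is Graham (GYO) reduction from hypergraph theory; it is an assumption of this sketch, not a statement from the paper, that RA's notion of a loop coincides with hypergraph cyclicity in that sense.

```python
from itertools import combinations

def parse(model):
    """Turn RA notation such as 'AB:BC' into a set of relations (sets of variables)."""
    return frozenset(frozenset(rel) for rel in model.split(":"))

def embedded(relation):
    """Relations embedded in a relation: ABC -> AB, AC, BC, A, B, C."""
    vs = sorted(relation)
    return {"".join(c) for r in range(1, len(vs)) for c in combinations(vs, r)}

def has_loop(model):
    """Graham reduction: a structure is loopless iff it reduces to nothing."""
    edges = [set(r) for r in parse(model)]
    changed = True
    while changed:
        changed = False
        # Step 1: drop any variable that occurs in exactly one relation.
        for e in edges:
            for v in list(e):
                if sum(v in other for other in edges) == 1:
                    e.discard(v)
                    changed = True
        # Step 2: drop empty relations and relations contained in another one
        # (for exact duplicates, keep only the first copy).
        kept = []
        for i, e in enumerate(edges):
            redundant = not e or any(
                e < other or (e == other and j < i)
                for j, other in enumerate(edges) if j != i)
            if redundant:
                changed = True
            else:
                kept.append(e)
        edges = kept
    return bool(edges)

same = parse("CB:BA") == parse("AB:BC")   # order of variables and relations is arbitrary
triangle = has_loop("AB:AC:BC")           # the three-relation example from the text
chain = has_loop("AB:BC")
```

Representing a model as a set of sets makes the equality of CB:BA and AB:BC automatic, since neither the order of variables within a relation nor the order of relations is stored.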
The number of graphs increases hyper-exponentially with the number of variables. In the confirmatory mode, RA can test the significance of a single model (a hypothesis being tested) relative to another model used as a reference. In the exploratory mode, RA can search the lattice of graphs for models that are statistically significant and best represent the data with maximal information captured and minimal complexity.

Bayesian Networks (BN) are another probabilistic graphical modeling approach to data modeling that is closely related to RA. Indeed, where BN overlaps RA the two methods are equivalent, but with respect to neutral systems, RA and BN each has distinctive features absent in the other methodology. For directed systems, however, where prediction of a single dependent variable is the aim, RA encompasses all models found in BN under the convention used in this paper that all nodes except for parent nodes within a V-structure are allowed to be the variable being predicted; this inclusion of the BN directed system lattice within the RA lattice will be shown later in this paper.

BNs have origins in the type of model described by Wright [18,19], but it was not until the 1980s that BNs became more formally established [20–23]. As does RA, BN combines graph theory and probability theory: graph theory provides the structure and probability theory characterizes the nature of relationships between variables. BNs are represented by a single type of graph structure: a directed acyclic graph, which is a subset of chain graphs, also known as block recursive models [24]. BNs can be represented more generally by partially directed acyclic graphs (PDAG), a subset of chain graphs where edge directions are removed when directionality has no effect on the underlying independence structure. Discrete variables are most common in BNs, but BNs accommodate continuous variables without discretization [25]. In principle RA could also accommodate continuous variables, but this feature has not yet been implemented. For a three-variable BN lattice, there are 5 general graphs and 11 specific graphs; for four variables there are 20 general graphs and 185 specific graphs with unique probability distributions. In the confirmatory mode, BNs can test the significance of a model relative to another model used as a reference [26]; in the exploratory mode, BNs can search for the best possible model given a scoring metric. BNs are used to model expert knowledge about uncertainty and [20,21] and are also used for exploratory data analysis with no use of expert knowledge [27]. Like RA, BN applications in machine learning and artificial intelligence are broad, including classification, prediction, compression, pattern recognition, image processing, time-series, decision analysis and many others.
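The counts of 11 and 185 BN specific graphs can be checked by brute force. The sketch below assumes that a BN "specific graph with a unique probability distribution" corresponds to a Markov equivalence class of DAGs, and it uses the standard characterization that two DAGs are Markov equivalent exactly when they share the same skeleton and the same V-structures. The function names are hypothetical.

```python
from itertools import combinations, product

def is_acyclic(n, edges):
    """Kahn-style check: repeatedly strip nodes that have no incoming edge."""
    remaining = set(range(n))
    live = set(edges)
    while remaining:
        sources = {v for v in remaining if not any(c == v for _, c in live)}
        if not sources:
            return False  # every remaining node has a parent, so there is a cycle
        remaining -= sources
        live = {(p, c) for p, c in live if p not in sources}
    return True

def dags(n):
    """Yield every DAG on nodes 0..n-1 as a frozenset of (parent, child) edges."""
    pairs = list(combinations(range(n), 2))
    # Each pair of nodes is unlinked, linked a->b, or linked b->a.
    for choice in product((None, 0, 1), repeat=len(pairs)):
        edges = set()
        for (a, b), c in zip(pairs, choice):
            if c == 0:
                edges.add((a, b))
            elif c == 1:
                edges.add((b, a))
        if is_acyclic(n, edges):
            yield frozenset(edges)

def equivalence_key(edges):
    """Skeleton plus V-structures (a -> c <- b with a and b not adjacent)."""
    skeleton = frozenset(frozenset(e) for e in edges)
    v_structures = frozenset(
        (frozenset((a, b)), c)
        for (a, c) in edges for (b, c2) in edges
        if c == c2 and a < b and frozenset((a, b)) not in skeleton)
    return skeleton, v_structures

def count_classes(n):
    """Number of distinct Markov equivalence classes over n labeled variables."""
    return len({equivalence_key(g) for g in dags(n)})
```

Under these assumptions, `count_classes(3)` and `count_classes(4)` reproduce the 11 and 185 specific graphs quoted above. Grouping these further into general graphs would require an additional reduction over variable relabelings, which the sketch does not attempt.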
The joint RA-BN lattice of neutral system general and specific graphs and the accompanying search algorithm developed in this paper expands both RA and BN beyond what was previously available by either RA alone or BN alone, thus providing a more complete ensemble of models for the representation of complex systems. When prediction of a single dependent variable (DV) is the aim, the RA directed system lattice encompasses the BN directed system lattice under the strict convention used in this paper that excludes a parent node with a V-structure being the DV. However, we also show that when this constraint is relaxed so that a parent node within a V-structure can be the DV, BN models can offer predictions unique to BN. We also show that (under the above convention) the BN directed system lattice reduces the size of the full BN neutral system lattice by retaining only graphs that give unique predictions of the DV, significantly reducing the search space to find the best BN when prediction of a single DV is the aim. Finally, this paper develops an augmented RA directed system lattice which expands the conventional RA lattice of prediction graphs to include naïve Bayes equivalent graphs. This augmented lattice encompasses graphs in the BN directed system lattice and allows for models of complex systems which are (i) more predictive and/or (ii) simpler and thus both more comprehensible and more generalizable than models restricted to the conventional RA directed system lattice.

2. RA Lattice

2.1. RA Neutral Systems

All lattices shown in this paper are for four variables, but the theory and methodology presented in this paper are general and apply to any number of variables. RA neutral systems include only independent variables, i.e., there is no concept in such systems of a dependent variable. A neutral system model thus represents the relationships, graphically and probabilistically, between all the (independent) variables. The graphical representation specifies the independencies among variables. When data are then applied, probabilities represent the strength of the relationships between variables that are dependent on one another. Neutral system graphs are commonly used in applications where variable clustering is important, such as computer vision and social and biological network analyses. Neutral system analysis is more computationally demanding than directed system analysis, so when one is really interested in predicting specific variables, directed system models are more convenient.

The four-variable RA lattice of neutral system general graphs (Figure 1) [7,9] represents all four-variable RA graphs with unique independence structures. Bold graphs do not have loops while non-bold graphs have loops. In these graphs, lines (including branching lines) are variables; boxes are relations. Where only two lines extend from a box, the relation is dyadic. If more than two lines extend from a box, the graph is a hypergraph. Where two or more specific graphs have the same independence structure, regardless of variable labels, they are part of the same general graph equivalence class. For example, the left-most and right-most variables in G7 are independent of one another given the two central variables that connect both relations; this results in the general independence structure (. ⊥ .. | ..., ....), where each different number of dots indicates a different variable, but does not specify its actual identity. The expression says that the first variable is independent (“⊥” is the symbol used in this paper for independence) of the second variable given (“|” is the symbol used in this paper for “given”) the third and (the comma “,” represents a logical “and”) fourth variables. G1 is the most complex general graph, in which the variables are connected in a tetradic relation. Graphs below G1 reflect increasingly less complex decompositions of G1, ending with G20 which has complete independence among the variables. Arrows from one general graph to another represent hierarchy such that going from the parent graph (the source of the arrow) to the child graph (the terminus of the arrow) results from deleting one relation from the parent graph. In this paper, when the variables of a general graph are labeled in RA or BN, it is called a specific graph, which is a unique probabilistic model given the data. For RA, given data and after labeling all the variables, there is only one specific graph for any general graph.
By contrast, as explained in Section 3 (beginning in Section 3.2.1), two or more topologically different BN general graphs can have the same probability distribution; such equivalent graphs have the same underlying set of independencies even though they are topologically different; they are said to constitute a ‘Markov equivalence class’ [28].

RA graphs can include pairwise and non-pairwise relations. For example, graph G15 has four lines (variables) and three boxes (relations). One line connects to all three boxes, meaning one variable is included in all three relations, and separately a single line representing one of the other three variables extends from each box. Because only two lines extend from any given box, all relations in G15 are pairwise (dyadic). Figure 2 shows G15 with labels (A, B, C, D) added for the variables, yielding a specific structure having dyadic relations AD, BD, and CD. In RA notation, this graph is AD:BD:CD, where the colon represents independence among relations. The notation AD:BD:CD encodes the independencies (A ⊥ B, C | D), (B ⊥ C | D). The example in Figure 2 represents one of four specific graphs for the general graph G15; the other possible permutations are AB:AC:AD, BA:BC:BD, and CA:CB:CD. These permutations have the same general independence structure (. ⊥ .., ... | ....), (.. ⊥ ... | ....), but given data, produce different conditional probability distributions.

In contrast to graph G15, which includes dyadic relations only, graph G13 in Figure 1 is a hypergraph, with three lines extending from one box and a single line extending from the other. This could, for example, represent four variables A, B, C and D, where A, B, and C label the three lines extending from one box, and D labels the single line extending from the other box. Figure 3 shows this specific graph, which in RA notation is ABC:D with the independence structure of (D ⊥ A, B, C).
This example represents one of four specific graphs for general graph G13, the other three being ABD:C, ACD:B, and BCD:A with independencies of (C ⊥ A, B, D), (B ⊥ A, C, D), and (A ⊥ B, C, D), respectively. Given data, each of these four specific graphs (ABC:D, ABD:C, ACD:B, and BCD:A) generates a unique probability distribution.
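For a loopless graph such as AD:BD:CD, the model distribution has a closed form, here q(A,B,C,D) = p(A,D) p(B,D) p(C,D) / p(D)^2. The sketch below applies this factorization to hypothetical counts and then confirms numerically that the result satisfies the independency (A ⊥ B | D) stated above. The variable names and the toy data are assumptions made for the example.

```python
from itertools import product

def marginal(dist, axes):
    """Sum a joint distribution {state tuple: prob} onto the given axes."""
    out = {}
    for state, p in dist.items():
        key = tuple(state[i] for i in axes)
        out[key] = out.get(key, 0.0) + p
    return out

# Hypothetical counts over four binary variables, states ordered (a, b, c, d).
states = list(product((0, 1), repeat=4))
weights = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]
total = sum(weights)
p = {s: w / total for s, w in zip(states, weights)}

# Margins of the data on each relation of AD:BD:CD and on the shared variable D.
p_ad = marginal(p, (0, 3))
p_bd = marginal(p, (1, 3))
p_cd = marginal(p, (2, 3))
p_d = marginal(p, (3,))

# Closed-form model: product of relation margins divided by the overlap p(D) twice.
q = {
    (a, b, c, d): p_ad[a, d] * p_bd[b, d] * p_cd[c, d] / p_d[(d,)] ** 2
    for a, b, c, d in states
}

# Margins of the model, used to verify (A independent of B given D).
q_abd = marginal(q, (0, 1, 3))
q_ad = marginal(q, (0, 3))
q_bd = marginal(q, (1, 3))
q_d = marginal(q, (3,))
```

The model `q` keeps the AD, BD, and CD margins of the data while discarding every other dependency, so `q_abd[a, b, d] * q_d[(d,)]` equals `q_ad[a, d] * q_bd[b, d]` for every state.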

[Figure 1: lattice diagram of the general graphs G1 (top) through G20 (bottom).]

Figure 1. Lattice of four-variable RA neutral system general graphs. Structures with bold boxes (relations) are loopless. All lattices in this paper are for four variables.

Figure 2. RA specific graph G15, AD:BD:CD.

Figure 3. RA specific graph G13.

Figure 4 shows all of the general graphs from Figure 1 as well as all of the specific graphs associated with each general graph. There are 20 general graphs in the RA lattice and 114 specific graphs.
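The counts of general and specific graphs can be reproduced by enumeration. The sketch below assumes that a specific graph is a set of relations in which no relation is embedded in another and every variable appears at least once, and that a general graph is an equivalence class of specific graphs under relabeling of the variables. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, permutations

def specific_graphs(variables):
    """All sets of relations covering the variables, with no relation inside another."""
    n = len(variables)
    subsets = [frozenset(c) for r in range(1, n + 1)
               for c in combinations(variables, r)]
    graphs = []
    for mask in range(1, 1 << len(subsets)):
        family = [s for i, s in enumerate(subsets) if mask >> i & 1]
        if frozenset().union(*family) != frozenset(variables):
            continue  # some variable is missing
        if any(a < b for a in family for b in family):
            continue  # a relation is embedded in another relation
        graphs.append(frozenset(family))
    return graphs

def general_key(graph, variables):
    """Canonical label of a graph: the smallest form over all variable relabelings."""
    best = None
    for perm in permutations(variables):
        relabel = dict(zip(variables, perm))
        key = tuple(sorted(tuple(sorted(relabel[v] for v in rel)) for rel in graph))
        if best is None or key < best:
            best = key
    return best

def lattice_counts(variables):
    """Return (number of general graphs, number of specific graphs, class sizes)."""
    graphs = specific_graphs(variables)
    sizes = Counter(general_key(g, variables) for g in graphs)
    return len(sizes), len(graphs), sizes

general3, specific3, _ = lattice_counts("ABC")
general4, specific4, sizes4 = lattice_counts("ABCD")
```

Under these assumptions the enumeration gives 5 general and 9 specific graphs for three variables and 20 general and 114 specific graphs for four, in agreement with the text. The `sizes4` counter records how many specific graphs belong to each general graph.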


[Figure 4: lattice diagram. The specific graphs listed under each general graph are:]

G1: ABCD
G2: ABC:ABD:ACD:BCD
G3: ABC:ABD:ACD, ABC:ABD:BCD, ABC:ACD:BCD, ABD:ACD:BCD
G4: ABC:ABD:CD, ABC:ACD:BD, ABC:BCD:AD, ABD:ACD:BC, ABD:BCD:AC, ACD:BCD:AB
G5: ABC:DA:DB:DC, ABD:CA:CB:CD, ACD:BA:BC:BD, BCD:AB:AC:AD
G6: AB:AC:AD:BC:BD:CD
G7: ABC:ABD, ABC:ACD, ABC:BCD, ABD:ACD, ABD:BCD, ACD:BCD
G8: ABC:AD:BD, ABC:AD:CD, ABC:BD:CD, ABD:AC:BC, ABD:AC:DC, ABD:BC:DC, ACD:AB:CB, ACD:AB:DB, ACD:CB:DB, BCD:AB:AC, BCD:AB:AD, BCD:AC:AD
G9: AB:AC:AD:BC:BD, AB:AC:AD:BC:CD, AB:AC:AD:BD:CD, AB:AC:BC:BD:CD, AB:AD:BC:BD:CD, AC:AD:BC:BD:CD
G10: ABC:AD, ABC:BD, ABC:CD, ABD:AC, ABD:BC, ABD:DC, ACD:AB, ACD:CB, ACD:DB, BCD:BA, BCD:CA, BCD:DA
G11: AC:AD:BC:BD, AB:AD:BC:CD, AB:AC:BD:CD
G12: AB:AC:AD:BC, AB:AC:AD:BD, AB:AC:BC:BD, AB:AD:BC:BD, AB:AC:AD:CD, AB:AC:BC:CD, AC:AD:BC:CD, AB:AD:BD:CD, AC:AD:BD:CD, AB:BC:BD:CD, AD:BC:BD:CD, AC:BC:BD:CD
G13: ABC:D, ABD:C, ACD:B, BCD:A
G14: AB:AC:BC:D, AB:AD:BD:C, AC:AD:CD:B, BC:BD:CD:A
G15: AB:AC:AD, BA:BC:BD, CA:CB:CD, DA:DB:DC
G16: AB:BC:CD, AB:BD:DC, AC:CB:BD, AC:CD:DB, AD:DB:BC, AD:DC:CB, CA:AB:BD, DA:AB:BC, BA:AC:CD, DA:AC:CB, BA:AD:DC, CA:AD:DB
G17: AB:AC:D, AB:BC:D, AC:BC:D, AB:AD:C, AB:BD:C, AD:BD:C, AC:AD:B, AC:CD:B, AD:CD:B, BC:BD:A, BC:CD:A, BD:CD:A
G18: AB:CD, AC:BD, AD:BC
G19: AB:C:D, AC:B:D, AD:B:C, BC:A:D, BD:A:C, CD:A:B
G20: A:B:C:D

Figure 4. Lattice of RA neutral system general and specific graphs.


Searching the RA Neutral System Lattice

The data are the top of the lattice, i.e., G1 ABCD, and one searches the lattice to find a good representation of the data. The lattice can be searched from the top down or from the bottom up or from some other starting model. Typically, a reference model, a specific graph, is selected to begin the search. Commonly, it is the independence (bottom) model, G20 A:B:C:D from Figure 4, that is selected as the reference model, and the lattice is searched upward to find the best model. The lattice may also be searched downward starting from the saturated (top) model, G1, or from a reference model in between the bottom and top, searching up or down. The starting model does not have to be the reference model, but this is often the case.

Commonly, when the lattice is being searched, the goal is to find a model (a specific graph) that adequately represents but is less complex than the data. This best characterizes a search downwards that (typically) starts from G1. When searching down the lattice, the goal is to search as far down the lattice as possible, resulting in the greatest complexity reduction from the reference model, while incurring the least amount of information loss, so that the model still adequately represents the data. Finding a simpler representation of the data reduces the complexity of the system under observation, allowing for greater understanding of the most important underlying relations. Alternatively, the goal is to find a model, a specific graph, that captures as much of the information in the data as possible, as long as its difference from mutual independence of the variables, i.e., G20 in Figure 4, is defensible, so the model is not overfit, and its application to new data is likely to be more successful. This best characterizes a search upwards that (typically) starts from G20.

For directed systems where prediction of a single DV is the aim, a high information model is one that gives maximal reduction of the Shannon entropy (uncertainty) of the DV. Given data, specific graphs can be tested for statistical significance. The chi-square statistical test can be used to test the difference between any candidate model and a reference model, usually the data, G1, or the independence model, G20. As an alternative or in addition to such a statistical test, the Bayesian Information Criterion (BIC) and the Akaike Information Criterion (AIC) are among the other measures that can be used to decide on the best model.
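The reduction of Shannon entropy of the DV can be illustrated with a small calculation. This sketch covers only the simplest case, a model with a single predictive relation between a chosen set of IVs and the DV, where H(DV | IVs) can be computed directly from the data. The function names, the dict-per-row data layout, and the toy data (Z is the exclusive-or of A and B) are assumptions made for the example.

```python
from collections import Counter
from itertools import product
from math import log2

def entropy(counts):
    """Shannon entropy in bits of a frequency table {state: count}."""
    total = sum(counts.values())
    return -sum(n / total * log2(n / total) for n in counts.values() if n)

def uncertainty_reduction(rows, ivs, dv):
    """Fraction of H(DV) removed by knowing the listed IVs (rows are dicts)."""
    h_dv = entropy(Counter(r[dv] for r in rows))
    h_iv = entropy(Counter(tuple(r[v] for v in ivs) for r in rows))
    h_joint = entropy(Counter(tuple(r[v] for v in ivs) + (r[dv],) for r in rows))
    h_cond = h_joint - h_iv  # H(DV | IVs)
    return (h_dv - h_cond) / h_dv

# Toy data: Z is fully determined by A and B together, and C is irrelevant.
rows = [{"A": a, "B": b, "C": c, "Z": a ^ b}
        for a, b, c in product((0, 1), repeat=3)]

with_ab = uncertainty_reduction(rows, "AB", "Z")  # A and B jointly predict Z
with_a = uncertainty_reduction(rows, "A", "Z")    # A alone predicts nothing
```

In this toy case the relation ABZ removes all of the uncertainty in Z while AZ removes none, which shows why a search restricted to single-IV relations can miss a strong higher-order relation.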

2.2. RA Directed Systems

2.2.1. Conventional Directed System Lattice

The RA lattice of directed systems shown in Figure 5 is a sub-lattice of the complete neutral system lattice of Figure 1. The purpose of the directed system lattice is to organize models that make an IV-DV (explanatory-response) distinction and where prediction of the DV is the sole aim. In contrast, the neutral system lattice organizes models that do not make any IV-DV distinction; these models do not focus on a single response variable. There are fewer general graphs in the directed system lattice than in the neutral system lattice because, by convention, we care only about models whose predictions of the DV are different and are not interested in identifying relations among the IVs.

The word 'directed' in RA 'directed systems' has a meaning that is different from the meaning of the same word in BN 'directed acyclic graphs'. In RA 'directed systems', this word means that the focus of modeling is on the relation of the dependent variable to the independent variables. It does not imply directionality of edges from the IVs to the DV, as it does in BN 'directed acyclic graphs'.

In the neutral system lattice of Figure 1, any of the variables can be part of any relation. In contrast, in the standard directed system lattice, by convention, all of the IVs are always included in one of the relations (the "IV relation"); the other relations in the model include predictive IV-DV interactions (or the DV alone if there are no such interactions). In this paper, the DV in directed system specific graphs is called "Z" and the IVs are called A, B, C, and so on. For example, the first specific graph listed under G3 in Figure 5, ABC:ABZ:ACZ, has all three IVs in the first relation, followed by two IV-DV relations. Aside from allowing for the presence of relations among the IVs (without specifying any such relations), the model says that there is a relation in which A and B might predict Z and another relation in which A and C might predict Z; the net predictive relation between A, B, C and Z is a maximum entropy fusion of these two predictive relations.
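One standard way to compute such a maximum entropy fusion is iterative proportional fitting (IPF), which repeatedly rescales a distribution until it matches the data marginals of every relation in the model. The sketch below is illustrative only: the paper does not state that IPF is the algorithm used, and the function name `ipf`, the axis convention (A, B, C, Z), and the randomly generated "data" distribution are all assumptions.

```python
import numpy as np

def ipf(p, relations, iters=200):
    """Maximum-entropy distribution matching the marginals of `p` on each relation.

    `relations` is a list of axis tuples, e.g. (0, 1, 2) for the relation ABC.
    """
    q = np.full(p.shape, 1.0 / p.size)            # start from the uniform distribution
    all_axes = set(range(p.ndim))
    for _ in range(iters):
        for rel in relations:
            drop = tuple(sorted(all_axes - set(rel)))
            target = p.sum(axis=drop, keepdims=True)    # observed marginal
            current = q.sum(axis=drop, keepdims=True)   # model marginal
            ratio = np.divide(target, current,
                              out=np.zeros_like(target), where=current > 0)
            q = q * ratio                               # rescale to match this marginal
    return q

# Hypothetical data distribution over binary A, B, C, Z (axes 0..3).
rng = np.random.default_rng(0)
p = rng.random((2, 2, 2, 2))
p /= p.sum()

# Model ABC:ABZ:ACZ as axis tuples.
q = ipf(p, [(0, 1, 2), (0, 1, 3), (0, 2, 3)])

# Model prediction of the DV: q(Z | A, B, C).
q_z_given_abc = q / q.sum(axis=3, keepdims=True)
```

The fitted distribution `q` agrees with the data on the ABC, ABZ, and ACZ marginals and is otherwise as unconstrained as possible; the last line extracts the conditional distribution used to predict Z.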

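[Figure 5 lattice contents. General graphs and their specific graphs:]
G1: ABCZ
G2: ABC:ABZ:ACZ:BCZ
G3: ABC:ABZ:ACZ, ABC:ABZ:BCZ, ABC:ACZ:BCZ
G4: ABC:ABZ:CZ, ABC:ACZ:BZ, ABC:BCZ:AZ
G5: ABC:AZ:BZ:CZ
G7: ABC:ABZ, ABC:ACZ, ABC:BCZ
G8: ABC:AZ:BZ, ABC:AZ:CZ, ABC:BZ:CZ
G10: ABC:AZ, ABC:BZ, ABC:CZ
G13: ABC:Z
[Greyed general graphs, shown without specific graphs: G6, G9, G11, G12, G14, G15, G16, G17, G18, G19, G20]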

Figure 5. Conventional RA directed system lattice. Structures with boxes (relations) in bold are loopless.

General graph G13 (ABC:Z) from Figure 5 represents independence between the IVs (ABC) and the DV (Z); that is, there is no relation between the IVs and the DV. Graph G1 (ABCZ, which is not written as ABC:ABCZ because ABC is embedded in ABCZ) represents complete dependence among the IVs and the DV. It should be noted, however, that the directed system lattice of Figure 5 is not entirely exhaustive. What restricts this lattice is that all models include the "IV relation"; this makes these models hierarchically nested and amenable to standard statistical tests. There are additional predictive graphs where this restriction is dropped that produce different predictions of Z than the models of Figure 5; these additional graphs are discussed in the following Section 2.2.2.

Figure 5 shows all directed system general and specific graphs for four variables. The graphs that are greyed in Figure 5 are graphs from the neutral system lattice that are not part of the directed system lattice because they do not offer unique predictions of the DV. There are nine directed system general graphs and 19 specific graphs, in contrast to the neutral system lattice, which has 20 general graphs and 114 specific graphs.

2.2.2. Augmented Directed System Lattice

Figure 6 augments the conventional directed system lattice of Figure 5 (on the left) with a lattice of additional predictive graphs (on the right). These additional graphs offer unique analytical results but are not typically included when searching the hierarchically restricted directed system lattice.

Figure 6. Conventional RA directed system lattice and additional predictive specific graphs. Structures with bold boxes (relations) are loopless.

In Figure 6, the graphs in the Additional Predictive Graphs lattice are denoted by an apostrophe to identify that the original graph was altered by removing the IV relation. For example, the bottom relation in graph G2, interpreted as the IV relation, ABC, was removed to produce an additional predictive graph G2′. This general graph has only one specific graph, ABZ:ACZ:BCZ, which is analytically different from G2 (ABC:ABZ:ACZ:BCZ) because the ABC term in graph G2 imposes a constraint among the IVs that is not imposed in graph G2′. Because G2′ does not follow the standard directed system convention of including the IV terms in a first relation, it produces a different prediction of Z. The apostrophe-marked graphs are less complex than the graphs from which they are derived, and so should also be considered in searches for good predictive models. G5′ and G8′ from Figure 6 represent naïve Bayes equivalent RA graphs; G4′ is also a naïve Bayes-like graph. This is discussed in Section 3.3.

A merger of the conventional directed system lattice with the additional predictive graphs of Figure 6 gives the augmented directed system lattice in Figure 7. The specific graphs from G2′ and G3′ from Figure 6 are members of general graphs G3 and G7, respectively. Three general graphs are added to the augmented lattice, namely G10, G15 and G17; these are G4′, G5′, and G8′ from Figure 6, the naïve Bayes or naïve Bayes-like equivalent RA general graphs. All of the specific structures that are added to the augmented lattice are denoted in bold letters in Figure 7. G13 is the independence model for the conventional directed system graphs. The augmented lattice also includes G20 A:B:C:D, which is the natural independence model for the additional predictive graphs that do not include the IV term (ABC). Including these additional predictive graphs in the directed system lattice increases the number of predictive general graphs from nine in the conventional directed system lattice to 12 in the augmented lattice, and the number of specific graphs from 19 in the conventional lattice to 31 in the augmented lattice.
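The naïve Bayes character of G5′ can be made explicit with a short derivation. This is a sketch that assumes the usual closed-form solution for loopless RA models (product of relation marginals divided by the marginals of their overlaps); the symbol q denotes the model distribution and p the data distribution.

```latex
% G5' has the single specific graph AZ:BZ:CZ; its three relations overlap only in Z.
\begin{align}
q_{AZ:BZ:CZ}(A,B,C,Z)
  &= \frac{p(A,Z)\,p(B,Z)\,p(C,Z)}{p(Z)^{2}} \\
  &= p(Z)\,p(A \mid Z)\,p(B \mid Z)\,p(C \mid Z), \\
q_{AZ:BZ:CZ}(Z \mid A,B,C)
  &\propto p(Z)\,p(A \mid Z)\,p(B \mid Z)\,p(C \mid Z).
\end{align}
```

The last line is the familiar naïve Bayes classifier: the IVs are treated as conditionally independent of one another given the DV, and no constraint among the IVs (such as ABC) is imposed.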

Figure 7. Augmented RA directed system lattice. Structures with bold boxes are loopless; model names in bold are augmentations.


3. BN Lattice

3.1. BN Introduction

A Bayesian Network model, like an RA model, is a type of probabilistic graphical model. BN modeling originated from path models in the early 1900s [18,19] and was expanded as a field of study in the late 1900s by Pearl [21], Neapolitan [20] and others.

BNs are directed graphs: nodes represent variables, and edges represent relations. The graph structure or topology (variables, edges, orientations of edges) encodes independencies, and thus also dependencies, among the variables identified in a particular graph. Since BNs are directed graphs, edges typically have arrows or some form of notation representing directionality: A→B means that variable B is dependent upon variable A. (This dependency might be interpreted as a causal influence of A on B, but in this paper, we will not address such causal interpretations of BNs.) A is the 'parent' of B, which means that they are dependent. Each variable is independent of its non-descendants given its parents. For example, in the BN A→B→C, variable C is independent of A given B, since B is the parent of C. A BN graph provides the structure from which a probability expression can be derived that describes the relation between variables. For example, the graph A→B provides the structure identifying the dependence between A and B, and probability values define the nature and strength of the relation between A and B.

A unique feature of BNs versus other graphical models is in the independencies that are encoded when two edges converge. For example, in A→B←C the edges converge on variable B. If A and C are not directly connected by an edge, this convergence is called a V-structure [29]. This V-structure is interpreted as yielding the joint distribution p(A,B,C) = p(B|A,C)p(A)p(C), which encodes dependence among A, B, and C, but marginal independence between A and C. The interpretation is that A and C, while independent of one another, together influence or cause or allow one to predict B.

BNs are also acyclic graphs, meaning they have no closed paths following the arrows. For example, graph A→B→C→A is disallowed because it contains a cycle. Because BNs are acyclic, inference on all BN graphs can be performed in closed algebraic form.

The primary differences between RA and BN are two-fold: (1) BNs are directed and acyclic, whereas RA graphs are undirected and may or may not have loops, and (2) some BN graphs contain converging edges, that is, one or more V-structures that encode unique independence relations not found in RA graphs. The absence of a V-structure in a BN graph results in this graph being equivalent to some (loopless) RA graph. The presence of a V-structure results in the graph not having an RA equivalent and thus being unique to BN. This is discussed below in Section 3.2.6, in connection with Table 3.
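The independence pattern encoded by the V-structure A→B←C can be checked numerically. The following sketch builds the joint distribution p(B|A,C)p(A)p(C) for binary variables and confirms that A and C are marginally independent but become dependent once B is observed. All probability values are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical parameters for binary A, B, C in the V-structure A -> B <- C.
p_a = np.array([0.3, 0.7])
p_c = np.array([0.6, 0.4])
# p_b_given_ac[a, c, b] = p(B = b | A = a, C = c); each innermost row sums to 1.
p_b_given_ac = np.array([[[0.9, 0.1], [0.5, 0.5]],
                         [[0.4, 0.6], [0.2, 0.8]]])

# joint[a, b, c] = p(a) * p(c) * p(b | a, c)
joint = np.einsum("a,c,acb->abc", p_a, p_c, p_b_given_ac)

# Summing out B recovers p(A)p(C): A and C are marginally independent.
p_ac = joint.sum(axis=1)
marginally_independent = np.allclose(p_ac, np.outer(p_a, p_c))

# Conditioning on B = 0 couples A and C: p(A, C | B=0) no longer factorizes.
p_ac_given_b0 = joint[:, 0, :] / joint[:, 0, :].sum()
product_of_marginals = np.outer(p_ac_given_b0.sum(axis=1),
                                p_ac_given_b0.sum(axis=0))
dependent_given_b = not np.allclose(p_ac_given_b0, product_of_marginals)
```

With these values both flags evaluate to True, which is the combination of marginal independence and conditional dependence that has no counterpart in an undirected RA relation over A, B, and C.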

3.2. BN Neutral Systems

3.2.1. Lattice of BN General Graphs

As in RA, there are general BN graphs and specific BN graphs; in the BN literature, general graphs are referred to as maximally oriented graphs [30], essential graphs [31], equivalence classes of directed acyclic graphs [32], and partially directed graphs (PDAGs) [29]. In BN general graphs, the graph structure (variables, edges, and orientations of edges) results in a unique independence structure, where specific identities are not assigned to the variables. Figure 8, developed by Harris and Zwick [33], shows all BN general graphs with four variables and their hierarchy. There are 20 BN general graphs in the lattice, i.e., 20 unique independence structures. The procedure to generate this lattice is outlined in Section 3.2.6.

Figure 8. Lattice of BN neutral system general graphs.

In Figure 8, general graphs are labeled BN1, BN2 ... BN19, BN20. Solid squares represent variables; edges are represented by directed arrows from one square to another, representing a parent–child dependency relationship. The dashed lines with arrows from one general graph to another represent the hierarchy of general graphs, with parent graphs being above child graphs. Child graphs result from the deletion of one edge from the parent graph. The insert on the bottom right indicates structures that are topologically different from graphs in the lattice marked with asterisks but have identical independence structures to these marked graphs and thus are Markov equivalent (the topological difference cannot be removed by any labeling of the variables). For example, BN2b and BN2c in the insert are topologically different but have the same independence structure as BN2* in the lattice. These additional representations are discussed below in Section 3.2.2.

Table 1 summarizes the RA and BN terminology and supports the discussion of BN that follows. Entries in the table for RA general and specific graphs (the lattices of general and specific graphs from Figures 1 and 4, respectively) have already been discussed above. The discussion that follows this table will explain the additional representations of BN general graphs in the insert of Figure 8, and will derive the lattice of specific BN graphs summarized in Figure 14 presented below in Section 3.2.5.

Table 1. RA and BN terminology.

Methodology | Our Terminology | Literature Terminology | Lattice Name, RA-Like Notation | Visuals
RA | General RA graph | G-structures [7] | G15 (Figure 1) | [graph image]
RA | Specific RA graph | Specific RA graph [15] | G15 (Figure 4), AD:BD:CD | [graph image]
BN | General BN graph | Maximally oriented graphs, essential graphs, equivalence classes of directed acyclic graphs, partially directed graphs [29–32] | BN11* & BN11b (Figure 8) | [graph image]
BN | Specific BN graph (no V-structure) | Labeled maximally oriented graphs, essential graphs, equivalence classes of directed acyclic graphs, partially directed graphs | BN11*, BN11b (Figure 14), AD:BD:CD | [graph image]
BN | Specific BN graph (V-structure) | Labeled maximally oriented graphs, essential graphs, equivalence classes of directed acyclic graphs, partially directed graphs | BN17 (Figure 14), BCD_{B:C}:A | [graph image]

3.2.2. Additional Representations of BN General Graphs

There are 20 general graphs in the BN lattice. However, eight of these, marked with asterisks in Figure 8, namely BN2*, BN4*, BN5*, BN9*, BN11*, BN14*, BN15*, and BN16*, represent Markov equivalence classes that include additional unique edge topologies that have identical probability distributions when applied to data. These additional topologies, shown in the insert at the bottom right of Figure 8, cannot be made equivalent to the representative graphs (those with asterisks) by any 1:1 mapping of unlabeled variables. This property was described by Heckerman [34], who showed that BNs with differing edge topologies can have the same independence structure and thus the same probability distribution. It is unique to BN and is not found in RA, where there is a single unique representation of each RA general graph. All general graphs in Figure 8 without an asterisk have no Markov equivalent representations.

Two Bayesian Networks are Markov equivalent if and only if they have the same skeleton and the same V-structures [28], resulting in the same underlying independence structure. The skeleton of a graph is its undirected representation. As already defined, a V-structure occurs when two or more directed edges converge on a single node and the nodes from which they originate are not themselves directly connected by an edge. Figure 9 shows an example of Markov non-equivalent (Example 1) and equivalent (Example 2) BN general graphs.

Figure 9. Examples of Markov equivalence tests.

BNs that are Markov equivalent define an equivalence class; this is illustrated by BN2* in Figure 10, for which two other general graphs (BN2b and BN2c) included in the insert at the bottom of Figure 8 are in the same equivalence class. All three general graphs are Markov equivalent because they have the same skeleton and V-structures, and thus the same independence structure, but they have semantically different edge orientations. BN2* was chosen arbitrarily to represent this equivalence class and its unique independence structure. BN2b and BN2c have the same independence structure and, for corresponding variable labels, have identical probability distributions.
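The Markov equivalence criterion (same skeleton, same V-structures) translates directly into a small test. The sketch below represents a DAG as a list of (parent, child) pairs; the function names are hypothetical, and the test assumes both graphs are defined over the same set of variables, since isolated nodes do not appear in an edge list.

```python
from itertools import combinations

def skeleton(edges):
    """Undirected version of a DAG given as (parent, child) pairs."""
    return {frozenset(e) for e in edges}

def v_structures(edges):
    """Triples (parent, child, parent) whose two parents are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, ps in parents.items():
        for x, y in combinations(sorted(ps), 2):
            if frozenset((x, y)) not in skel:   # converging parents are unconnected
                found.add((x, child, y))
    return found

def markov_equivalent(edges1, edges2):
    """True if both DAGs share the same skeleton and the same V-structures."""
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))

# Three orientations of the skeleton A - B - C without a V-structure, and one with.
chain = [("A", "B"), ("B", "C")]          # A -> B -> C
reverse_chain = [("C", "B"), ("B", "A")]  # A <- B <- C
fork = [("B", "A"), ("B", "C")]           # A <- B -> C
collider = [("A", "B"), ("C", "B")]       # A -> B <- C

print(markov_equivalent(chain, fork))      # True
print(markov_equivalent(chain, collider))  # False
```

The chain, reversed chain, and fork all belong to one equivalence class, while the collider has the same skeleton but a V-structure at B and therefore a different independence structure.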

Figure 10. BN2*, BN2b, BN2c.

A BN general graph is represented in the literature by an unlabeled PDAG [29], also known as a Maximally Oriented Graph [30], an Essential Graph [31], or an equivalence class of directed acyclic graphs [32]. In a PDAG, edges can be directed, undirected or a mix of directed and undirected. A PDAG includes edge direction when a V-structure is present and removes edge direction when no V-structure is present. If there are no V-structures in a given BN, all edges are undirected in its PDAG representation. Figure 11 shows the PDAG representation of the graphs shown in the insert at the bottom of Figure 8. (PDAG2 encompasses BN2b and BN2c, etc.) Undirected edges can have either direction as long as no cycle is created and no V-structure is created that is represented by another BN general graph. For example, PDAG16, labeling variables A, B, C, D in order of left to right, top to bottom, could be oriented as B←D→C (BN16*) or B→D→C (BN16b) (or its mirror image) but could not be oriented as B→D←C, because that creates a V-structure resulting in a different independence structure represented separately by BN17.
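The PDAG16 example can be checked mechanically. The sketch below uses the standard criterion that two DAGs over the same variables are Markov equivalent when they share the same undirected skeleton and the same V-structures; treating this as the test the paper has in mind is an assumption, and the function names are hypothetical.

```python
from itertools import combinations

def skeleton(edges):
    """Undirected version of a DAG's edge set; edges are (parent, child) pairs."""
    return {frozenset(e) for e in edges}

def v_structures(edges):
    """All colliders x -> z <- y whose two parents x and y are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, pa in parents.items():
        for x, y in combinations(sorted(pa), 2):
            if frozenset((x, y)) not in skel:
                found.add((x, child, y))
    return found

def markov_equivalent(edges1, edges2):
    """Same skeleton and same V-structures (both graphs assumed to share one node set)."""
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))

# Orientations of the path B - D - C from the PDAG16 example (A is isolated).
bn16_star = {("D", "B"), ("D", "C")}   # B <- D -> C
bn16_b = {("B", "D"), ("D", "C")}      # B -> D -> C
bn17 = {("B", "D"), ("C", "D")}        # B -> D <- C, a V-structure

same_class = markov_equivalent(bn16_star, bn16_b)
different_class = not markov_equivalent(bn16_star, bn17)
```

Under this criterion the first two orientations fall in one class, while the collider orientation is separated from them, which is the distinction the PDAG encodes by keeping or dropping edge direction.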

Figure 11. PDAGs for graphs in Figure 8 insert.

Although representation of an entire Markov equivalence class in a single PDAG is useful, the PDAG does not visibly display the fact that semantically different edge topologies inhere in many BN general graphs (in 8 of 20 general graphs in the four-variable lattice). Figure 8 therefore displays the BN general graph lattice by showing representatives of these classes, together with their alternative topologies in the insert at the bottom of the figure.

3.2.3. BN Specific Graph Notation

A BN specific graph is simply a labeled BN general graph. As summarized in Table 1, we use the terminology of “specific graph” for what in the BN literature is called a labeled maximally oriented graph or essential graph or equivalence class of directed acyclic graphs or partially directed graph; these four different terms all refer to the same thing. All specific graphs for a given BN general graph class can be generated by permuting all possible variable labels. Given data, two BN specific graphs with different labels from the same BN general graph class will produce different probability distributions.

The notation that we use for BN specific graphs is derived from the RA notation described previously. As in RA, the colon represents marginal or conditional independence among variables and relations. For example, Figure 12 shows a labeled version of RA general graph G15 and BN general graph BN11*, which can also equivalently be represented by BN11b, both of which have the same independencies (A ⊥ B, C | D), (B ⊥ C | D), the same conditional probability distribution p(A|D)p(B|D)p(C|D)p(D), and thus the same notation AD:BD:CD.

Figure 12. RA and BN notation example, without subscripts.

RA notation must be modified to accommodate the V-structures that are unique to BNs and not found in RA; this is done by adding subscripts that specify the independence relations encoded by the V-structures. (For a BN graph without a V-structure, BN notation is identical to the RA notation.) For example, BN17 in Figure 13a has the notation BCD_{B:C}:A, where the colon between BCD_{B:C} and A states the independency (A ⊥ B, C, D), namely that A is marginally independent of B, C, and D. The subscript B:C states marginal independence between B and C within the triadic, dependent, BCD relation. Figure 13b shows the more complex BN4, which has a V-structure in which A, B, and C have arrows going to D; this means that it has a tetradic dependency between A, B, C and D, which will be reflected in a p(D|ABC) in the probability expression for this graph. The graph also has the single independency (A ⊥ B | C). The notation for this graph is thus ABCD_{AC:BC}, which preserves the dependency between A, B, C, and D, and also encodes the conditional independence between A and B given C. (In RA, this conditional independence is expressed by saying that T(AC:BC) = T_C(A:B) = 0, where T is information-theoretic transmission.)
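The transmission statement can be illustrated numerically. The sketch below assumes the usual entropy form T(AC:BC) = H(AC) + H(BC) - H(C) - H(ABC), which equals the conditional transmission T_C(A:B); the probability tables are hypothetical values chosen only for illustration.

```python
from itertools import product
from math import log2

def entropy(joint, keep):
    """Shannon entropy (bits) of the marginal over the state positions in `keep`."""
    marginal = {}
    for state, p in joint.items():
        key = tuple(state[i] for i in keep)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)

def transmission_ac_bc(joint):
    """T(AC:BC) = H(AC) + H(BC) - H(C) - H(ABC); states are ordered (a, b, c)."""
    A, B, C = 0, 1, 2
    return (entropy(joint, (A, C)) + entropy(joint, (B, C))
            - entropy(joint, (C,)) - entropy(joint, (A, B, C)))

# Hypothetical binary tables (illustrative values, not from the paper).
p_c = {0: 0.3, 1: 0.7}
p_a_given_c = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}   # indexed [c][a]
p_b_given_c = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.5, 1: 0.5}}   # indexed [c][b]

# A joint satisfying (A ⊥ B | C) by construction: p(a,b,c) = p(c) p(a|c) p(b|c).
independent = {(a, b, c): p_c[c] * p_a_given_c[c][a] * p_b_given_c[c][b]
               for a, b, c in product((0, 1), repeat=3)}

# A joint in which B always copies A, so A and B stay dependent given C.
dependent = {(a, a, c): p_c[c] * p_a_given_c[c][a]
             for a, c in product((0, 1), repeat=2)}

t_independent = transmission_ac_bc(independent)
t_dependent = transmission_ac_bc(dependent)
```

For the first joint the transmission vanishes up to floating-point error, while the second gives a strictly positive value, which is the contrast the parenthetical remark describes.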

Figure 13. BN notation examples with subscripts. (a) BCD_{B:C}:A; (b) ABCD_{AC:BC}.

3.2.4. BN Independencies and Probability Distributions

As has been repeatedly stated in the above discussion, the marginal or conditional independence between variables and relations is what uniquely specifies an RA or BN model. “It is known that the statistical meaning of any causal model can be described economically by its stratified protocol, which is a list of independence statements that completely characterize the model” [22,23,28]. The method to determine BN independencies is known as D-separation, and is described in Appendix A.2. To determine the list of independence statements that completely describe any BN, D-separation is applied to all possible independence statements for a given BN. Those satisfying independence among variables are retained and represent the set of independencies that fully describe the structure of relations within a given BN. For four variables, Table 2 provides all possible independence statements. For a given BN, with node labels and directed edges, all independence statements from this table need to be tested. Independence statements that are satisfied are kept, and represent the set of independencies that fully describe that BN.

Table 2. Four-variable independence statements.

                      Marginal Independence                            Conditional Independence
General Expression    (. ⊥ ..)   (. ⊥ .., ...)   (. ⊥ .., ..., ....)    (. ⊥ .. | ...)   (. ⊥ .. | ..., ....)
Specific Expression
 1                    (A ⊥ B)    (A ⊥ B, C)      (A ⊥ B, C, D)          (A ⊥ B | C)      (A ⊥ B | C, D)
 2                    (A ⊥ C)    (A ⊥ B, D)      (B ⊥ A, C, D)          (A ⊥ B | D)      (A ⊥ C | B, D)
 3                    (A ⊥ D)    (A ⊥ C, D)      (C ⊥ A, B, D)          (A ⊥ C | D)      (A ⊥ D | B, C)
 4                    (B ⊥ C)    (B ⊥ A, C)      (D ⊥ A, B, C)          (B ⊥ A | C)      (B ⊥ A | C, D)
 5                    (B ⊥ D)    (B ⊥ A, D)      -                      (B ⊥ A | D)      (B ⊥ C | A, D)
 6                    (C ⊥ D)    (B ⊥ C, D)      -                      (B ⊥ C | D)      (B ⊥ D | A, C)
 7                    -          (C ⊥ A, B)      -                      (C ⊥ A | B)      (C ⊥ A | B, D)
 8                    -          (C ⊥ A, D)      -                      (C ⊥ A | D)      (C ⊥ B | A, D)
 9                    -          (C ⊥ B, D)      -                      (C ⊥ B | D)      (C ⊥ D | A, B)
10                    -          (D ⊥ A, B)      -                      (D ⊥ A | B)      (D ⊥ A | B, C)
11                    -          (D ⊥ A, C)      -                      (D ⊥ A | C)      (D ⊥ B | A, C)
12                    -          (D ⊥ B, C)      -                      (D ⊥ B | C)      (D ⊥ C | A, B)

D-separation can also be used to test the Markov equivalence of any labeled BNs. If two BNs have the same independencies as revealed by D-separation tests, they are in the same Markov equivalence class and thus the same BN general graph. The prior section, however, provided a simpler way, illustrated above in Figure 9, to test for Markov equivalence of two BNs with different edge topologies.
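As an illustration of testing individual independence statements against a labeled BN, the sketch below implements D-separation through the ancestral moral graph, a standard equivalent formulation; it is not necessarily the step-by-step procedure given in Appendix A.2, and the function name is hypothetical. It is applied to BN17 (B→D←C with A isolated).

```python
from itertools import combinations

def d_separated(edges, xs, ys, zs=()):
    """True if xs and ys are d-separated given zs (three disjoint node sets).

    edges: iterable of (parent, child) pairs describing a DAG.
    """
    xs, ys, zs = set(xs), set(ys), set(zs)
    parents = {}
    for p, c in edges:
        parents.setdefault(c, set()).add(p)

    # 1. Keep only the nodes in the statement together with all their ancestors.
    keep, stack = set(), list(xs | ys | zs)
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents.get(node, ()))

    # 2. Moralize: join each child to its parents, join co-parents, drop directions.
    nbrs = {n: set() for n in keep}
    for child in keep:
        pa = parents.get(child, set())
        for p in pa:
            nbrs[p].add(child)
            nbrs[child].add(p)
        for p, q in combinations(pa, 2):
            nbrs[p].add(q)
            nbrs[q].add(p)

    # 3. Delete the conditioning nodes and search for any path from xs to ys.
    seen, stack = set(), list(xs)
    while stack:
        node = stack.pop()
        if node in seen or node in zs:
            continue
        seen.add(node)
        stack.extend(nbrs[node] - zs)
    return not (seen & ys)

bn17 = {("B", "D"), ("C", "D")}                      # B -> D <- C, A isolated
b_indep_c = d_separated(bn17, {"B"}, {"C"})           # (B ⊥ C)
b_indep_c_given_d = d_separated(bn17, {"B"}, {"C"}, {"D"})   # (B ⊥ C | D)
a_indep_rest = d_separated(bn17, {"A"}, {"B", "C", "D"})     # (A ⊥ B, C, D)
```

Running every statement of Table 2 through such a test and keeping those that hold is one way to obtain the list of independencies for a given labeled graph; here conditioning on the collider D is what breaks the independence of B and C.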

3.2.5. Lattice of BN General and Specific Graphs

The BN literature on lattices predominately focuses on search to find the best BN given a scoring metric. Implicit in these search algorithms is a lattice of candidate graphs being explored in search of the best model. Chickering [35] and others have shown the search problem to be NP-hard: with four variables there are 543 possible BNs, and with 10 variables there are O(10^18) [36]. Because of this, research in this area has focused less on characterizing exhaustively the lattice of BN graphs, and more on advancing search heuristics to efficiently traverse the lattice to identify the best BN given a scoring metric [37–46], among others. Heckerman [34] first showed that BNs with differing edge topologies can have the same independence structure and the same probability distribution, herein described as BN specific graphs. In contrast to heuristics that search all BNs, search heuristics for BN specific graphs have proven to be more efficient because they reduce the dimensionality of the search space [29,31,32,40,47–50], among others. For four variables, this approach reduces the search space from 543 BNs to 185 BN specific graphs [31]. These 185 BN specific graphs can be summarized by 20 BN general graphs, all with unique independence structures when variable labels are removed.

Building from the RA work of Klir [8] and Zwick [14], and the BN work of Pearl [21–23,51], Verma [28], Heckerman [34], Chickering [29,35,40,52,53], Andersson [31], Rubin [54], and others, the following procedure was used to generate the four variable BN general and specific graph lattice of Figure 14 in a way that can be integrated with the RA general graph lattice. While this procedure is applied in this paper to four variables, it could in principle be used for any number of variables, although of course as the number of variables increases the effort required increases exponentially.
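The DAG counts quoted in this section can be reproduced with Robinson's recurrence for the number of labeled directed acyclic graphs. Whether reference [36] uses this particular formula is not stated here, so the sketch below should be read as one independent way to check the figures; the function name is hypothetical.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count_labeled_dags(n):
    """Number of labeled DAGs on n nodes, by inclusion-exclusion over source nodes."""
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k))
               * count_labeled_dags(n - k)
               for k in range(1, n + 1))

four_variable_bns = count_labeled_dags(4)
ten_variable_bns = count_labeled_dags(10)
```

The four-variable count is small enough to enumerate exhaustively, whereas the ten-variable count already has nineteen digits, which is why the literature turns to search heuristics rather than enumeration.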

3.2.6. BN Neutral System General and Specific Graph Procedure

The procedure to generate the BN neutral system general and specific graph lattice for any number of variables is as follows:

1. Assign labels arbitrarily to the n solid squares representing variables.
2. Generate all graphs for these n variables by permuting all possible edge connections and edge orientations. Eliminate graphs with cycles. The result is the set of all labeled directed acyclic graphs for n variables.
3. For each directed acyclic graph, determine its independence structure using the D-separation procedure [55] detailed in Appendix A.2. This identifies which of the independence statements in Table 2 apply to the graph.
4. Collect together all graphs with the same unlabeled independencies. The set of these DAGs comprises a general graph equivalence class.
5. For each general graph equivalence class, collect together all graphs with the same labeled independencies into specific graph equivalence classes. List the RA notation for each of these specific graphs.
6. Select one specific graph equivalence class to represent the general graph, and from this specific graph equivalence class, select a single edge topology to represent the general graph. List any additional equivalent general graphs with unique edge topologies separately, as was done in the insert in Figures 8 and 14.
7. Organize general graphs into levels based upon the number of edges in each general graph and link hierarchically nested general graphs in the lattice to reflect parent-child general graphs.

Figure 14 shows the result of following this procedure for four variables. This BN general and specific graph lattice can be directly compared with the RA general and specific graph lattice. The RA lattice can also be extended to include the BN lattice. The comparison and extension will be discussed in Section 4.

Table 3 lists specific graph representatives for each of the general graphs in Figure 14. These specific graphs, highlighted in bold in Figure 14, assume that nodes are labeled in the order A, B, C, D from left to right, top to bottom, which is the labeling convention throughout this paper. The notation for a BN specific graph without a V-structure is identical to the RA notation. As in RA, the colon represents marginal or conditional independence among variables. For a BN graph with a V-structure, the notation adds subscripts to represent the independence relations encoded by the V-structure, which are unique to BNs and not found in RA. (See Section 3.2.3 for more details on this notation.) Thus, graphs in Table 3 without subscripts are equivalent to an RA graph and graphs with subscripts are unique to BN. Equivalence and non-equivalence between RA and BN graphs will be discussed in Section 4.
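Steps 2 to 5 of the procedure can be sketched for four variables as follows. Two simplifications are assumed here and are not taken from the paper: graphs are grouped by skeleton plus V-structures as a stand-in for comparing D-separation results, and labeled (specific) classes are formed before unlabeled (general) ones, which yields the same grouping. All function names are hypothetical.

```python
from itertools import combinations, permutations, product

NODES = range(4)
PAIRS = list(combinations(NODES, 2))

def is_acyclic(edges):
    """Repeatedly strip nodes with no incoming edge; a DAG strips down to nothing."""
    remaining, edges = set(NODES), set(edges)
    while remaining:
        sources = {n for n in remaining if not any(c == n for _, c in edges)}
        if not sources:
            return False
        remaining -= sources
        edges = {(p, c) for p, c in edges if p not in sources}
    return True

def all_dags():
    """Step 2: each node pair is unconnected, oriented i -> j, or oriented j -> i."""
    for states in product(range(3), repeat=len(PAIRS)):
        edges = set()
        for (i, j), s in zip(PAIRS, states):
            if s == 1:
                edges.add((i, j))
            elif s == 2:
                edges.add((j, i))
        if is_acyclic(edges):
            yield frozenset(edges)

def labeled_key(edges):
    """Skeleton plus V-structures, used here in place of a full D-separation listing."""
    skel = frozenset(frozenset(e) for e in edges)
    vs = set()
    for child in NODES:
        pa = sorted(p for p, c in edges if c == child)
        for x, y in combinations(pa, 2):
            if frozenset((x, y)) not in skel:
                vs.add(((x, y), child))
    return skel, frozenset(vs)

def unlabeled_key(key):
    """Smallest relabeled form of a key over all permutations of the node labels."""
    skel, vs = key
    forms = []
    for perm in permutations(NODES):
        s = tuple(sorted(tuple(sorted(perm[n] for n in e)) for e in skel))
        v = tuple(sorted((tuple(sorted((perm[x], perm[y]))), perm[c])
                         for (x, y), c in vs))
        forms.append((s, v))
    return min(forms)

dags = set(all_dags())
specific_classes = {labeled_key(d) for d in dags}
general_classes = {unlabeled_key(k) for k in specific_classes}
counts = (len(dags), len(specific_classes), len(general_classes))
```

If the sketch is faithful to the procedure, the three entries of `counts` should correspond to the numbers of BNs, BN specific graphs, and BN general graphs reported in Section 3.2.5 for four variables.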

Figure 14. Lattice of general and specific BN neutral system graphs.


Table 3. Probability distribution and independencies of BN specific graph examples.

BN General Graph   Specific Graph Example
                   RA Notation            Probability Distribution       Independencies
BN1                ABCD                   p(B|A)p(A)p(C|AB)p(D|ABC)      none
BN2                ACD:BCD                p(A|CD)p(C)p(B|CD)p(D|C)       (A ⊥ B | C, D)
BN3                ABCD_{A:B}             p(C|AB)p(A)p(B)p(D|ABC)        (A ⊥ B)
BN4                ABCD_{AC:BC}           p(A|C)p(C)p(B|C)p(D|ABC)       (A ⊥ B | C)
BN5                BCD:AD                 p(A|D)p(D)p(B|CD)p(C|D)        (A ⊥ B, C | D)
BN6                ABCD_{BC:A}            p(B|C)p(C)p(D|ABC)p(A)         (A ⊥ B, C)
BN7                BCD:ABD_{A:B}          p(C|BD)p(B)p(D|AB)p(A)         (A ⊥ B), (A ⊥ C | B, D)
BN8                ACD_{C:D}:BCD_{C:D}    p(A|CD)p(C)p(D)p(B|CD)         (C ⊥ D), (A ⊥ B | C, D)
BN9                ABD:ABC_{AC:BC}        p(A|C)p(C)p(B|C)p(D|AB)        (A ⊥ B | C), (C ⊥ D | A, B)
BN10               BCD:A                  p(B|C)p(C)p(D|BC)p(A)          (A ⊥ B, C, D)
BN11               AD:BD:CD               p(A|D)p(D)p(B|D)p(C|D)         (A ⊥ B, C | D), (B ⊥ C | D)
BN12               ABCD_{A:B:C}           p(D|ABC)p(A)p(B)p(C)           (A ⊥ B, C), (B ⊥ C)
BN13               ACD_{A:C}:BD           p(B|D)p(D|AC)p(A)p(C)          (A ⊥ C), (B ⊥ A, C | D)
BN14               AD:BC:BD               p(A|D)p(D)p(B|D)p(C|B)         (A ⊥ B | D), (C ⊥ A, D | B)
BN15               ABD_{A:B}:BC           p(C|B)p(B)p(D|AB)p(A)          (A ⊥ B, C), (C ⊥ D | A, B)
BN16               BD:CD:A                p(B|D)p(D)p(C|D)p(A)           (B ⊥ C | D), (A ⊥ B, C, D)
BN17               BCD_{B:C}:A            p(D|BC)p(B)p(C)p(A)            (B ⊥ C), (A ⊥ B, C, D)
BN18               AD:BC                  p(C|B)p(B)p(D|A)p(A)           (A, D ⊥ B, C)
BN19               CD:A:B                 p(D|C)p(C)p(A)p(B)             (B ⊥ C, D), (A ⊥ B, C, D)
BN20               A:B:C:D                p(A)p(B)p(C)p(D)               (A ⊥ B, C, D), (B ⊥ C, D), (C ⊥ D)

Table 3 shows for each BN general graph from Figure 14 a specific graph with its RA notation, probability distribution, and minimal list of independencies resulting from the D-separation procedure. The probability distribution is obtained as follows: (1) For each labeled node of a BN specific graph, list each node’s individual probability expression as the probability of the node given its parents, i.e., p(node | parents); if there are no parents, simply p(node). (2) Join the list of probability expressions. For example, for BN2* in Figure 15, the individual probability expressions are p(A|C, D) for A, p(B|C, D) for B, p(C) for C, and p(D|C) for D. Joining these gives p(A|C, D)p(B|C, D)p(C)p(D|C). (The table omits the commas for variables that are given in conditional probability terms.)

Figure 15. Probability distribution for BN2* example: p(A|C,D)p(B|C,D)p(C)p(D|C).

The equivalence or non-equivalence of RA and BN graphs is discussed in detail in Section 4, below, but Table 3 provides an advance look at this issue. Any BN general graph with a specific graph example whose RA notation does not include subscripts is equivalent to some general RA graph; there are 10 of these BN general graphs. Any BN general graph with a specific graph example whose notation includes subscripts is not equivalent to any general RA graph; there are also 10 of these BN general graphs, which all have V-structures.
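The two-step rule for writing down a probability distribution (one p(node | parents) term per node, then join the terms) is simple enough to sketch directly. The function name is hypothetical, and the edge list for BN2* is inferred from the parent sets named in the individual expressions quoted above.

```python
def factorization(nodes, edges):
    """Join one p(node|parents) term per node; edges are (parent, child) pairs."""
    parents = {n: [] for n in nodes}
    for parent, child in edges:
        parents[child].append(parent)
    terms = []
    for n in nodes:
        pa = sorted(parents[n])
        terms.append(f"p({n}|{','.join(pa)})" if pa else f"p({n})")
    return "".join(terms)

# BN2*: A and B each have parents C and D; D has parent C; C has no parents.
bn2_star = [("C", "A"), ("D", "A"), ("C", "B"), ("D", "B"), ("C", "D")]
expression = factorization("ABCD", bn2_star)
```

For BN2* this yields the same product of terms as Figure 15, listed in node order A, B, C, D.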

3.3. BN Directed Systems

The BN discussion so far has focused on BN neutral systems in which an IV-DV distinction is not made. This section narrows the focus to BN predictive graphs, analogous to RA directed systems, where the aim is to predict a single DV given the IVs. As in RA, we define Z as the dependent variable in the BN directed system lattice, replacing variable D in the neutral system lattice. We designate as the DV in a given BN any node with the exception of a parent node within a V-structure. That is, we do not consider here the possibility that a parent node within a V-structure could be designated as a DV; this will be discussed further in Section 5.

As is the case for RA, many graphs in the neutral system lattice are redundant when the aim is only to predict the DV. The BN directed system lattice of Figure 16, where only graphs with unique predictions of Z are highlighted, is thus a subset of the BN neutral system lattice of Figure 14. For each general graph in Figure 16 with a unique prediction, associated specific graphs are listed. Specific graphs that are bolded correspond to the displayed BN edge orientation and edge connections, assuming labeling of nodes from top left, top right, bottom left, bottom right as A, B, C, Z respectively. These bolded specific graphs also correspond to the examples in Table 4 below. Graphs not highlighted in Figure 16 are equivalent in their predictions to highlighted graphs. (Asterisks in this figure have the same meaning they have in Figures 8 and 14.) For two graphs with identical predictions, the graph with the fewest degrees of freedom was selected. There are eight general graphs and 18 specific graphs in the BN directed system lattice; this is a significant compression of the BN neutral system lattice, which includes 20 general graphs and 185 specific graphs.
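To make "predicting Z given the IVs" concrete, the sketch below computes p(Z | a, b, c) for one directed specific graph, AZ:BZ:CZ with distribution p(A|Z)p(B|Z)p(C|Z)p(Z), by evaluating the factored joint for each value of Z and normalizing. The conditional probability tables are hypothetical values, and the function name is an assumption made for illustration.

```python
def predict_z(p_z, p_a_given_z, p_b_given_z, p_c_given_z, a, b, c):
    """p(Z | a, b, c) for the factorization p(A|Z)p(B|Z)p(C|Z)p(Z).

    Conditional tables are indexed [z][value of the IV].
    """
    joint = {z: p_z[z] * p_a_given_z[z][a] * p_b_given_z[z][b] * p_c_given_z[z][c]
             for z in p_z}
    total = sum(joint.values())
    return {z: value / total for z, value in joint.items()}

# Hypothetical binary tables (illustrative values, not from the paper).
p_z = {0: 0.5, 1: 0.5}
p_a_given_z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}
p_b_given_z = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 1: 0.5}}   # B carries no information on Z
p_c_given_z = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}

posterior = predict_z(p_z, p_a_given_z, p_b_given_z, p_c_given_z, a=1, b=0, c=1)
posterior_other_b = predict_z(p_z, p_a_given_z, p_b_given_z, p_c_given_z, a=1, b=1, c=1)
```

In this toy setting the prediction does not change with the value of B, which gives a small-scale picture of how two graphs that differ only in non-predictive relations can yield identical predictions of Z.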

Table 4. BN directed system graphs.

BN General Graph   Predictively Equivalent   Specific Graph Example
                   Simpler Graph             RA Notation              Probability Distribution
BN1                BN12                      ABCZ                     p(Z|ABC)p(C|AB)p(B|A)p(A)
BN2                BN7                       ABZ:BCZ                  p(C|BZ)p(Z|AB)p(A|B)p(B)
BN2                BN17                      ABC:BCZ                  p(Z|BC)p(B|CA)p(A|C)p(C)
BN3                BN12                      ABCZ_{A:B}               p(Z|ABC)p(C|AB)p(A)p(B)
BN4                BN12                      ABCZ_{AC:BC}             p(Z|ABC)p(A|C)p(B|C)p(C)
BN5                BN13                      ACZ:BZ                   p(Z|AC)p(B|Z)p(C|A)p(A)
BN5                BN19                      ABC:CZ                   p(Z|C)p(B|CA)p(C|A)p(A)
BN6                BN12                      ABCZ_{BC:A}              p(Z|ABC)p(B|C)p(C)p(A)
BN7                -                         BCZ:ABZ_{A:B}            p(C|BZ)p(Z|AB)p(B)p(A)
BN7                BN17                      BCZ:ABC_{A:C}            p(Z|BC)p(B|CA)p(C)p(A)
BN8                BN17                      ABC_{B:C}:BCZ_{B:C}      p(Z|BC)p(A|BC)p(B)p(C)
BN9                BN17                      BCZ:ABC_{AB:AC}          p(Z|BC)p(B|A)p(C|A)p(A)
BN10               BN17                      BCZ:A                    p(Z|BC)p(B|C)p(C)p(A)
BN11               -                         AZ:BZ:CZ                 p(A|Z)p(B|Z)p(C|Z)p(Z)
BN11               BN19                      AC:BC:CZ                 p(Z|C)p(B|C)p(C|A)p(A)
BN12               -                         ABCZ_{A:B:C}             p(Z|ABC)p(A)p(B)p(C)
BN13               -                         ACZ_{A:C}:BZ             p(Z|AC)p(B|Z)p(A)p(C)
BN13               BN19                      ABC_{A:B}:CZ             p(Z|C)p(C|AB)p(A)p(B)
BN14               BN16                      AB:BZ:CZ                 p(Z|B)p(C|Z)p(B|A)p(A)
BN14               BN19                      AB:BC:CZ                 p(Z|C)p(B|A)p(A)p(C|B)
BN15               BN17                      BCZ_{B:C}:AB             p(Z|BC)p(A|B)p(B)p(C)
BN15               BN19                      ABC_{A:C}:CZ             p(Z|C)p(B|CA)p(A)p(C)
BN16               -                         BZ:CZ:A                  p(B|Z)p(C|Z)p(Z)p(A)
BN16               BN19                      BC:CZ:A                  p(Z|C)p(C|B)p(B)p(A)
BN17               -                         BCZ_{B:C}:A              p(Z|BC)p(B)p(C)p(A)
BN18               BN19                      AB:CZ                    p(Z|C)p(B|A)p(A)p(C)
BN19               -                         CZ:A:B                   p(Z|C)p(C)p(A)p(B)
BN20               -                         A:B:C:Z                  p(Z)p(A)p(B)p(C)

Figure 16. BN directed system lattice.

Table 4 lists all BN directed system general graphs. When BN graphs are greyed in column 1 it means the graph is equivalent in terms of prediction to a simpler (fewer degrees of freedom) general graph. Column 2 identifies which simpler graph it is equivalent to. General graphs with a blank row in column 2 have no simpler equivalently predicting graph, and are included in the directed system lattice of Figure 16. Column 3 provides specific graph examples of these general graphs and column 4 shows the specific graph probability distributions. Within column 4, only the expressions that are used to predict the dependent variable are highlighted in black. All other non-predictive relations are greyed. For example, BN1, BN3, BN4, BN6, and BN12 all predict Z in the same way, i.e., p(Z|ABC); thus they are all equivalent in terms of prediction. However, BN12 has the fewest degrees of freedom and is therefore selected to represent all five of these equivalent general graphs.

4. Joint RA-BN Neutral System Lattice

4.1. Joint RA-BN Neutral System Lattice Introduction

This section integrates the RA and BN neutral system general graph lattices using the four variable Rho lattice [7]. Combining the Rho, RA and BN lattices creates a larger and more descriptive lattice than any previously identified in the literature. The lattice identifies independence structures unique to RA or to BNs, and independence structures that are equivalent across RA and BN. Equivalence is in terms of independence structure as described separately for RA in Section 2, and for BN in Section 3. Where two or more graphs, RA or BN, have the same general independence structure regardless of variable labels, they are equivalent. General independence structure is represented with independence statements without labels, for example (. ⊥ .. | ...), meaning that one variable is independent of another, given a third. For example, RA general graph G15 and BN general graph BN11 have the same general independence structure (. ⊥ .., ... | ....), (.. ⊥ ... | ....); thus they are equivalent. Two specific graphs are equivalent if they have the same independence structure given variable labels. For example, using RA general graph G15 and BN general graph BN11 again, Figure 17 shows these general graphs with variable labels added, making them specific graphs. Given these labels, they have equivalent general and specific independence structures, (. ⊥ .., ... | ....), (.. ⊥ ... | ....) and (A ⊥ B, C | D), (B ⊥ C | D), respectively.

Figure 17. G15 and BN11* specific graph example.

4.2. RA-BN Rho Neutral System Graphs

The Rho (ρ) lattice of Figure 18 (adapted from Klir [7] (p. 237)) is a simplification of the RA lattice of general graphs and is used here to integrate the RA neutral system lattice with the BN neutral system lattice. The Rho lattice is an even more general lattice than the RA general graph lattice and can map both RA and BN general graphs to one of its eleven structures. A solid dot represents a variable; a line connects two variables in the Rho lattice if they are directly connected by any box (relation) in the RA general graph lattice. Arrows from one Rho graph to another represent hierarchy, i.e., the generation of a child graph from a parent graph. ρ1 represents maximal connectedness, or dependence, between variables, and ρ11 represents independence among all variables. Graphs in between ρ1 and ρ11 represent a mix of dependence and independence among variables. Each RA or BN general or specific graph corresponds to one, and only one, of the eleven Rho graphs.
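The mapping from an RA specific graph to its Rho graph can be illustrated with a minimal Python sketch. It assumes single-letter variable names and colon-separated relations, as in the RA notation used here; the function names `rho_edges` and `show` are hypothetical and not part of any RA software.

```python
from itertools import combinations


def rho_edges(ra_specific_graph):
    """Return the undirected Rho-graph edges implied by an RA specific graph.

    Every pair of variables that appears together in some relation
    (e.g. the relation "ACD") is joined by a line in the Rho graph.
    """
    edges = set()
    for relation in ra_specific_graph.split(":"):
        for u, v in combinations(sorted(relation), 2):
            edges.add(frozenset((u, v)))
    return edges


def show(edges):
    """Render an edge set as a sorted list of two-letter strings."""
    return sorted("".join(sorted(edge)) for edge in edges)


print(show(rho_edges("ABCD")))     # all six pairs: maximal connectedness
print(show(rho_edges("ACD:BCD")))  # five pairs; only A-B is missing
print(show(rho_edges("A:B:C:D")))  # no edges: all variables independent
```

Because the edge set ignores how variables are grouped into relations, several RA graphs can share one Rho graph, which is why the Rho lattice is the coarser of the two.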


Figure 18. Lattice of four-variable Rho graphs.

4.3. Rho and Equivalent RA and BN General Graphs

Out of 20 RA neutral system general graphs and 20 BN neutral system general graphs, there are 10 RA general graphs, comprising all of the graphs with no loops in the RA lattice, that are equivalent to BN general graphs. Each of these RA-BN equivalent pairs corresponds to one of the 11 Rho graphs from Figure 18, with the exception of ρ4. ρ4 has corresponding RA and BN general graphs, but these do not have equivalent independence structures, and are discussed in Section 4.4.

ρ1 reflects maximal connectedness among all four variables. For both the RA general graph G1 and the BN general graph BN1, from Figures 1 and 8 respectively, there are no independencies among the variables and thus the graphs are equivalent. Both graphs have only one specific graph, ABCD. This is summarized in Figure 19.


Figure 19. Rho1, G1 and BN1 specific graph.

ρ2 corresponds to RA general graph G7 and BN general graph BN2*, as shown in Figure 20. It is clear how BN2* corresponds to Rho graph ρ2 because visually they are represented in the same way, with the exception that the Rho graph has undirected edges. There are two additional BN general graphs (BN3 and BN4*) that correspond to ρ2; however, they have no equivalent RA general graph, so they are discussed in the next section, which concerns non-equivalent RA and BN general graphs. ρ2, G7, and BN2* represent two three-variable relations with conditional independence between two variables, with general independence structure (. ⊥ .. | ..., ....). Assigning labels to variables makes it easier to interpret the RA association with ρ2. Figure 20 shows an example with variable labels (one of six possible permutations of variable labels) assigned to RA graph G7, which results in RA specific graph ACD:BCD, in which A is independent of B given C and D, (A ⊥ B | C, D). Assigning labels to the BN graph in Figure 20 yields the same specific graph. Other label permutations yield five other equivalent RA and BN specific graphs: ABC:ABD, ABC:ACD, ABC:BCD, ABD:ACD, ABD:BCD.
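The enumeration of specific graphs by label permutation can be sketched in a few lines of Python. This is only an illustration under the assumption that a specific graph is an unordered set of unordered relations over the variables A, B, C, D; the helper names `canonical` and `relabelings` are hypothetical.

```python
from itertools import permutations


def canonical(graph):
    """Sort variables within each relation, then sort the relations."""
    return ":".join(sorted("".join(sorted(rel)) for rel in graph.split(":")))


def relabelings(graph, variables="ABCD"):
    """All distinct specific graphs obtained by permuting variable labels."""
    seen = set()
    for perm in permutations(variables):
        mapping = dict(zip(variables, perm))
        relabeled = ":".join(
            "".join(mapping[v] for v in rel) for rel in graph.split(":")
        )
        seen.add(canonical(relabeled))
    return sorted(seen)


# The six specific graphs of G7, starting from the example ACD:BCD.
print(relabelings("ACD:BCD"))

# Number of specific graphs for one example from each equivalent general graph.
for example in ["ABCD", "ACD:BCD", "BCD:AD", "BCD:A", "AD:BD:CD",
                "AD:BC:BD", "BD:CD:A", "AD:BC", "CD:A:B", "A:B:C:D"]:
    print(example, len(relabelings(example)))
```

Although there are 24 permutations of four labels, many of them produce the same specific graph once the order of relations and of variables within a relation is ignored, which is why G7 has only six.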

Figure 20. Rho2, G7, and BN2* example.

ρ3 represents RA graph G10 and BN graph BN5*, which have the same independence structure, (. ⊥ .., ... | ....). Figure 21 shows an example of one of twelve RA G10 and BN5* specific graphs, BCD:AD, with independencies (A ⊥ B, C | D). The full list of twelve RA G10 and BN5* specific graphs is: ABC:AD, ABC:BD, ABC:CD, ABD:AC, ABD:BC, ABD:DC, ACD:AB, ACD:CB, ACD:DB, BCD:BA, BCD:CA, and BCD:DA. G10 has been previously characterized as naïve BN-like.

ρ4 is discussed later in the section on non-equivalent RA and BN general graphs.

ρ5 represents RA graph G13 and BN graph BN10, which have the same independence structure, (. ⊥ .., ..., ....), in that they have no independencies in the triadic relation and the fourth variable is independent of all three variables in the triadic relation. Figure 22 shows an example of one of four RA G13 and BN10 specific graphs, BCD:A, with independencies (A ⊥ B, C, D). The full list of RA G13 and BN10 specific graphs is: ABC:D, ABD:C, ACD:B, and BCD:A.


Figure 21. Rho3, G10, and BN5* specific graph example.

Figure 22. Rho5, G13, and BN10* specific graph example.

ρ6 represents RA general graph G15 and BN graph BN11*, which have the same independence structure, (. ⊥ .., ... | ....), (.. ⊥ ... | ....). There are three dyadic relations in these graphs, with one variable present in all three dyadic relations and each of the other three variables present in only one of the three dyadic relations. This graph is described in the literature (Zhang 2004) as a naïve BN, simple Bayes, or independence Bayes, because of its simple dyadic relations among variables. It is also clear that RA general graph G15 represents a naïve BN because of its equivalent independence structure. Figure 23 shows an example of one of the four RA G15 and BN11* specific graphs, AD:BD:CD, with independencies (A ⊥ B, C | D), (B ⊥ C | D), and conditional probability distribution p(A|D)p(B|D)p(C|D)p(D). The full list of RA G15 and BN11* specific graphs is: AB:AC:AD, AB:BC:BD, AC:BC:CD, and AD:BD:CD.

Figure 23. Rho 6, G15 and BN11* specific graph example.

ρ7 represents RA graph G16 and BN graph BN14*, which have the same independence structure, (. ⊥ .. | ....), (... ⊥ ., .... | ..). Figure 24 shows an example of one of twelve RA G16 and BN14* specific graphs, AD:BC:BD, with independencies (A ⊥ B | D), (C ⊥ A, D | B) and conditional probability distribution p(A|D)p(B|D)p(C|B)p(D). The full list of RA G16 and BN14* specific graphs is: AB:AC:BD, AB:AC:CD, AB:AD:BC, AB:AD:CD, AB:BC:CD, AB:BD:CD, AC:AD:BC, AC:AD:BD, AC:BC:BD, AC:BD:CD, AD:BC:BD, and AD:BC:CD.

Figure 24. Rho7, G16 and BN14* specific graph example.

ρ8 represents RA general graph G17 and BN general graph BN16*, which have the same independence structure, (.. ⊥ ... | ....), (. ⊥ .., ..., ....). There are two dyadic relations in these graphs, with one variable present in both dyadic relations, and the fourth variable not present in either dyadic relation and thus independent of the three other variables. This graph is also representative of a naïve BN. Figure 25 shows an example of one of twelve RA G17 and BN16* specific graphs, BD:CD:A, with independencies (B ⊥ C | D), (A ⊥ B, C, D), and conditional probability distribution p(A)p(B|D)p(C|D)p(D). The full list of RA G17 and BN16* specific graphs is: AB:AC:D, AB:BC:D, AC:BC:D, AB:AD:C, AB:BD:C, AD:BD:C, AC:AD:B, AC:CD:B, AD:CD:B, BC:BD:A, BC:CD:A, and BD:CD:A.

Figure 25. Rho 8, G17 and BN16* specific graph example.

ρ9 represents RA general graph G18 and BN general graph BN18, which have the same independence structure, (., .... ⊥ .., ...). There are two dyadic relations in these graphs, with two variables included in one dyadic relation and the other two included in the other. Figure 26 shows an example of one of three RA G18 and BN18 specific graphs, AD:BC, with independencies (A, D ⊥ B, C), and conditional probability distribution p(C|B)p(B)p(D|A)p(A). The full list of RA G18 and BN18 specific graphs is: AB:CD, AC:BD, and AD:BC.

ρ10 represents RA graph G19 and BN graph BN19, which have the same independence structure, (.. ⊥ ..., ....), (. ⊥ .., ..., ....). There is one dyadic relation and two variables independent of all other variables. Figure 27 shows an example of one of six RA G19 and BN19 specific graphs, CD:A:B, with independencies (B ⊥ C, D), (A ⊥ B, C, D) and conditional probability distribution p(D|C)p(C)p(A)p(B). The full list of RA G19 and BN19 specific graphs is: AB:C:D, AC:B:D, AD:B:C, BC:A:D, BD:A:C, and CD:A:B.

ρ11 represents RA graph G20 and BN graph BN20, which have the same independence structure (. ⊥ .., ..., ....), (.. ⊥ ..., ....), (... ⊥ ....), in which all variables are independent of one another, (A ⊥ B, C, D), (B ⊥ C, D), (C ⊥ D). Figure 28 shows the only specific graph for RA G20 and BN20, A:B:C:D.


Figure 26. Rho 9, G18 and BN18* specific graph example.

Figure 27. Rho 10, G19 and BN19 specific graph example.

Figure 28. Rho 11, G20 and BN20 specific graph.

Table 5 summarizes all equivalent RA and BN general graphs, with their associated Rho graph, an example of their specific graph notation, and their independencies. These specific graph examples align with the BN general graphs of Figure 8, assuming labeling of nodes A, B, C, D in the order of top left, top right, bottom left, bottom right.

Table 5. Equivalent Rho, RA and BN neutral system general graphs.

Rho Graph   RA General Graph   BN General Graph   Specific Graph Example (RA Notation)   Independencies
ρ1          G1                 BN1                ABCD                                   no independencies
ρ2          G7                 BN2*               ACD:BCD                                (A ⊥ B | C, D)
ρ3          G10                BN5*               BCD:AD                                 (A ⊥ B, C | D)
ρ5          G13                BN10               BCD:A                                  (A ⊥ B, C, D)
ρ6          G15                BN11*              AD:BD:CD                               (A ⊥ B, C | D), (B ⊥ C | D)
ρ7          G16                BN14*              AD:BC:BD                               (A ⊥ B | D), (C ⊥ A, D | B)
ρ8          G17                BN16*              BD:CD:A                                (B ⊥ C | D), (A ⊥ B, C, D)
ρ9          G18                BN18               AD:BC                                  (A, D ⊥ B, C)
ρ10         G19                BN19               CD:A:B                                 (B ⊥ C, D), (A ⊥ B, C, D)
ρ11         G20                BN20               A:B:C:D                                (A ⊥ B, C, D), (B ⊥ C, D), (C ⊥ D)


4.4. Rho and Non-Equivalent RA and BN General Graphs

In addition to the 10 equivalent RA and BN general graphs, there are 10 general graphs unique to the RA lattice and 10 general graphs unique to the BN lattice. All 10 non-equivalent RA general graphs in the four-variable lattice have loops and require iteration to generate their probability distributions. BNs are acyclic and have analytic solutions, so there are no BN general graphs that are equivalent to the RA graphs with loops. Since RA graphs are undirected, one might think that there could be some equivalent acyclic directed BN graphs, but this is not the case, because BN graphs that are acyclic when directions are considered but cyclic if directions are ignored have V-structure interpretations, as described previously. All 10 non-equivalent BN general graphs have such V-structures, which encode independence relations unique to BNs. To illustrate: the structure A→B, B→C, C→D, D→A is cyclic and not a legitimate BN structure, but the directed structure A→B, B→C, C→D, A→D (BN9b from Figure 8), which has the same undirected links, is not cyclic and is a legitimate BN structure. However, this latter structure is not interpreted as a set of dyadic relations, which would be written in RA notation as AB:BC:CD:AD and contains a loop (RA general graph G12 from Figure 1). Rather, the V-structure consisting of C→D and A→D is interpreted as a triadic relation, which contributes a p(D|AC) to the probability expression, p(A)p(B|A)p(C|B)p(D|AC), which does not correspond to any RA structure.
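The two directed structures in this illustration can be examined with a short Python sketch. It assumes a BN is given as a list of (parent, child) pairs; the function names are hypothetical, and the acyclicity test is a plain implementation of Kahn's topological-sort algorithm rather than anything specific to the paper.

```python
def is_acyclic(nodes, edges):
    """Kahn's algorithm: True if the directed graph has no directed cycle."""
    indegree = {n: 0 for n in nodes}
    for _, child in edges:
        indegree[child] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for parent, child in edges:
            if parent == node:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
    return visited == len(nodes)


def parents(node, edges):
    """Sorted list of the parents of `node`."""
    return sorted(p for p, c in edges if c == node)


def v_structures(nodes, edges):
    """Colliders X -> Z <- Y whose parents X and Y are not directly linked."""
    linked = {frozenset(e) for e in edges}
    found = []
    for z in nodes:
        ps = parents(z, edges)
        for i, x in enumerate(ps):
            for y in ps[i + 1:]:
                if frozenset((x, y)) not in linked:
                    found.append((x, z, y))
    return found


def factorization(nodes, edges):
    """Product of p(node | parents), written in the notation of the text."""
    terms = []
    for n in nodes:
        ps = parents(n, edges)
        terms.append(f"p({n}|{''.join(ps)})" if ps else f"p({n})")
    return "".join(terms)


nodes = ["A", "B", "C", "D"]
cyclic = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
bn9b = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]

print(is_acyclic(nodes, cyclic))   # False
print(is_acyclic(nodes, bn9b))     # True
print(v_structures(nodes, bn9b))   # [('A', 'D', 'C')]
print(factorization(nodes, bn9b))  # p(A)p(B|A)p(C|B)p(D|AC)
```

The single V-structure found at D is what introduces the triadic term p(D|AC), so the directed graph cannot be read as the four dyadic relations AB:BC:CD:AD.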

4.5. Lattice of Rho, RA, BN Neutral System General Graphs

The lattice of Rho, RA and BN equivalent and non-equivalent general graphs in Figure 29 was developed from the RA lattice in Figure 1 and the BN lattice in Figure 8. This lattice includes all 10 unique RA general graphs, 10 unique BN general graphs, and 10 RA and BN equivalent general graphs, for a total of 30 unique general graphs. The lattice is organized using the Rho lattice [7]. All 20 RA general graphs and all 20 BN general graphs for each Rho graph are represented in the joint lattice. Within each Rho graph, where RA and BN graphs are equivalent, that is, when their independence structures are identical, the BN graph is placed under the RA equivalent graph. Where RA or BN graphs are not equivalent, representing an independence structure unique to RA or BN, they stand alone. Arrows from one graph to another in the joint lattice represent the hierarchy of the RA lattice only. As can be seen in Section 3, the hierarchy of the BN lattice has many more links from parent to child graphs and thus is not a useful representation in the joint lattice. Additionally, Figure A6 in Appendix A includes the joint RA-BN lattice of general and specific graphs. This lattice shows 53 unique RA specific graphs, 124 unique BN specific graphs, and 61 RA-BN equivalent specific graphs, for a total of 238 combined, unique, RA and BN specific graphs.

4.6. Joint RA-BN Lattice Algorithm

This section defines an algorithm for generating the joint RA-BN lattice of neutral system general and specific graphs.

4.6.1. Procedure to Generate the RA Neutral System General and Specific Graphs from a Single Rho Graph

This is done in three steps: in Step 1, generate the most complex set of specific graphs that correspond to the Rho graph; in Step 2, generate all their less complex specific graph descendants; in Step 3, collect the specific graphs together into general graphs.

Step 1 begins with (Step 1.1) labeling the Rho graph, as shown in Figure 30. The most complex specific graph that corresponds to this labeled Rho graph is obtained (Step 1.2) by representing each clique with a single relation encompassing all the variables in the clique and then joining these relations with a “:”. For example, in Figure 30, A, B, and C are in a clique, i.e., are fully linked to one another, and this is also the case for B, C, and D, but A and D are not linked. The resulting specific graph is ABC:BCD, which is encompassed in RA general graph G7. Next (Step 1.3), permute all the variables in this specific graph, which generates the other five specific graphs that are encompassed within G7, as shown in the RA lattice of Figure 4.

Figure 29. Lattice of 4-variable general Rho, RA and BN neutral system graphs.

4.6.1. Procedure to Generate the RA Neutral System General and Specific Graphs from a

Single Rho Graph Figure 30. Example,Example, Rho 2. This is done in three steps: in Step 1, generate the most complex set of specific graphs that correspondStep 2 then togenerates the Rho the graph; simpler in Step RA 2,representations generate all their of G7 less that complex map to specific Rho2, namely graph descendants;the specific graphs in Step that 3, specific are encompassed graphs are collectedwithin the together RA general in general graphs graphs. G8 and G9. Klir [7] (p. 231) details the procedure for this step. In Step 3, specific graphs with the same independence structure are then collected together in general graph equivalence classes. Doing this for Rho 2 results in general graphs G7, G8 and G9 and their specific graphs as shown in Figure 4.
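Steps 1.2 and 1.3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the edge list for Rho 2 is read off the clique description above (A-B, A-C, B-C, B-D, C-D), variables are single letters, and the helper names `maximal_cliques`, `ra_specific_graph`, and `relabelings` are hypothetical, not part of any RA software.

```python
from itertools import combinations, permutations


def maximal_cliques(nodes, edges):
    """Brute-force maximal cliques of a small undirected graph."""
    linked = {frozenset(e) for e in edges}
    cliques = []
    # Visit subsets from largest to smallest so that a clique contained
    # in an already accepted (larger) clique is skipped.
    for size in range(len(nodes), 0, -1):
        for subset in combinations(sorted(nodes), size):
            is_clique = all(frozenset(p) in linked
                            for p in combinations(subset, 2))
            if is_clique and not any(set(subset) <= c for c in cliques):
                cliques.append(set(subset))
    return cliques


def ra_specific_graph(nodes, edges):
    """Step 1.2: one relation per maximal clique, joined with ':'."""
    relations = sorted("".join(sorted(c))
                       for c in maximal_cliques(nodes, edges))
    return ":".join(relations)


def relabelings(graph, variables):
    """Step 1.3: distinct specific graphs obtained by permuting labels."""
    seen = set()
    for perm in permutations(variables):
        mapping = dict(zip(variables, perm))
        relations = sorted("".join(sorted(mapping[v] for v in rel))
                           for rel in graph.split(":"))
        seen.add(":".join(relations))
    return sorted(seen)


# Assumed edge list for the labeled Rho 2 graph of Figure 30.
rho2_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D")]
most_complex = ra_specific_graph("ABCD", rho2_edges)
g7_specific_graphs = relabelings(most_complex, "ABCD")
```

Under these assumptions, `most_complex` is the specific graph ABC:BCD and `g7_specific_graphs` holds it together with the five other relabelings mentioned in Step 1.3. The brute-force clique search is exponential in the number of variables and is meant only for the small graphs discussed here.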

4.6.2. Procedure to Generate the BN Neutral System General and Specific Graphs from a Single Rho Graph

In contrast to RA graphs, BNs are just Rho graphs with directions added to edges, as shown in Figure 31. To generate all BN specific graphs for a given Rho graph, simply permute all possible edge directions and variable combinations, and follow the BN neutral system general and specific graph procedure outlined above in Section 3.2.6. Essentially, the process entails discarding redundant specific graphs and graphs with cycles from all these permutations, and collecting together BN specific graphs with unique independence structures into a general graph.

Figure 31. Rho 2 example, with associated BN general graphs.
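The first part of this procedure, orienting every edge of a Rho graph and discarding orientations with cycles, can be sketched as follows. The edge list for Rho 2 and the function names are assumptions made for illustration. The sketch stops before the redundancy step: it does not group the surviving directed graphs by independence structure, which is what the procedure of Section 3.2.6 does.

```python
from itertools import product


def is_acyclic(nodes, directed_edges):
    """Repeatedly strip nodes with no incoming edge; fail if none exists."""
    remaining = set(nodes)
    edges = set(directed_edges)
    while remaining:
        sources = [n for n in remaining
                   if not any(child == n for _, child in edges)]
        if not sources:
            return False  # every remaining node has a parent: a cycle
        for n in sources:
            remaining.discard(n)
        # Drop edges that leave a removed node.
        edges = {(p, c) for p, c in edges if p in remaining}
    return True


def acyclic_orientations(nodes, undirected_edges):
    """All ways to direct the edges of a Rho graph without creating a cycle."""
    dags = []
    for flips in product([False, True], repeat=len(undirected_edges)):
        directed = [(b, a) if flip else (a, b)
                    for (a, b), flip in zip(undirected_edges, flips)]
        if is_acyclic(nodes, directed):
            dags.append(directed)
    return dags


# Assumed edge list for the labeled Rho 2 graph.
rho2_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D")]
rho2_dags = acyclic_orientations("ABCD", rho2_edges)
```

For this labeling, 18 of the 32 possible orientations are acyclic. Many of these share the same independence structure, so the number of distinct BN specific graphs is smaller.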


4.6.3. Generating the Joint RA-BN General and Specific Graph Lattice

The following provides a general algorithm to generate the joint RA-BN lattice of neutral system general and specific graphs for any number of variables from some specific starting graph, either downwards or upwards.

1. Identify a starting Rho graph.
2. Generate all possible RA and BN specific graphs for the given Rho graph.
   a. For RA, follow the procedure detailed in the prior Section 4.6.1.
   b. For BN, follow the procedure detailed in the prior Section 4.6.2.
   c. Organize all RA and BN general graph equivalence classes into three categories: RA graphs with loops, BN graphs with V-structures, and equivalent RA-BN graphs containing no loops or V-structures.
3. If searching the lattice upward, add an edge to the prior Rho graph. If searching the lattice downward, delete an edge from the prior Rho graph.
4. Repeat steps 2 and 3 until the top or bottom of the lattice is reached.

Consider for example the results of the RA and BN procedures for Rho 2. Organizing these results via step 2c gives the following six general structures: G8 and G9 for RA graphs with loops, BN3 and BN4* for BN graphs with V-structures, and G7 and BN2* for equivalent RA-BN graphs.

Specific structures can be simply obtained from these general structures by listing all permutations of variable labels. Following these procedures for any number of variables will result in the exhaustive, non-redundant lattice of joint RA-BN neutral system general and specific graphs.
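For the BN side of step 2c, a directed graph can be checked for V-structures with a short routine. The sketch below assumes the standard definition of a V-structure (two parents of a common child that are not themselves joined by an edge). The function name `v_structures` and the edge-list representation are illustrative choices, not notation from this paper.

```python
from itertools import combinations


def v_structures(directed_edges):
    """Return (parent, child, parent) triples whose two parents are not adjacent."""
    parents = {}
    adjacent = set()
    for parent, child in directed_edges:
        parents.setdefault(child, set()).add(parent)
        adjacent.add(frozenset((parent, child)))
    found = []
    for child, pars in parents.items():
        for a, b in combinations(sorted(pars), 2):
            if frozenset((a, b)) not in adjacent:
                found.append((a, child, b))
    return found


# A->Z<-B with A and B unlinked contains one V-structure.
collider = v_structures([("A", "Z"), ("B", "Z")])
# Adding an edge between the parents removes it.
shielded = v_structures([("A", "Z"), ("B", "Z"), ("A", "B")])
```

A BN graph for which this routine returns an empty list falls in the category of graphs without V-structures. The RA side of step 2c relies on the loop detection procedure of Appendix A.1.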

5. Comparing RA and BN Directed System Graphs

Figure 32 shows side-by-side for comparison the RA augmented directed system lattice from Figure 7 and the BN directed system lattice from Figure 16. To the left or right of each BN directed system general graph is the equivalent RA directed system general graph. For example, BN7 is equivalent to RA general graph G7. Equivalence in this context is in terms of statistical equivalence of prediction results given data. Two directed system general graphs are equivalent if they predict the DV (Z) in the same way. Each of the BN directed system general graphs in the lattice is equivalent to an RA general graph in the augmented RA directed system general graph lattice. In addition, the RA directed system lattice includes additional predictive graphs, those with loops, that are not found in the BN lattice. Thus, restricting BN directed systems to those where the DV is not a parent in a V-structure, the RA augmented directed system lattice fully encompasses the BN directed system lattice and offers additional predictive graphs.

Figure 32. Comparison of RA augmented directed system lattice to BN directed system lattice.

Table 6 shows all BN directed system general graphs and their RA equivalents as well as specific graph examples with their associated probability distributions. In these probability distributions, only the terms used to predict the DV (Z) are highlighted in black; non-predictive terms are greyed. All equivalences necessarily involve loopless RA models; half of these involve RA graphs in the standard directed system lattice, where every model has an IV component, and the other half involve graphs in the augmentation of this lattice. Prior to development of the BN directed system lattice in this paper, the RA directed system lattice did not include naïve Bayes equivalent graphs, e.g., G15 and G17, and the naïve Bayes-like graph, G10. The development of the BN directed system lattice in this paper in part inspired the augmentation of the standard RA directed system lattice to include naïve Bayes type graphs.

Table 6. BN directed system graphs and RA equivalent example.

BN general graph  BN specific graph (RA notation)  BN probability distribution  Equivalent RA graph  RA notation  RA probability distribution
BN7               BCZ:ABZ_{A:B}                    p(C|BZ)p(Z|AB)p(B)p(A)       G7 (augmentation)    BCZ:ABZ      p(C|BZ)p(Z|AB)p(B|A)p(A)
BN11              AZ:BZ:CZ                         p(A|Z)p(B|Z)p(C|Z)p(Z)       G15 (augmentation)   AZ:BZ:CZ     p(A|Z)p(B|Z)p(C|Z)p(Z)
BN12              ABCZ_{A:B:C}                     p(Z|ABC)p(A)p(B)p(C)         G1                   ABCZ         p(Z|ABC)p(ABC)
BN13              ACZ_{A:C}:BZ                     p(Z|AC)p(B|Z)p(A)p(C)        G10 (augmentation)   ACZ:BZ       p(Z|AC)p(B|Z)p(C|A)p(A)
BN16              BZ:CZ:A                          p(B|Z)p(C|Z)p(Z)p(A)         G17 (augmentation)   BZ:CZ:A      p(B|Z)p(C|Z)p(Z)p(A)
BN17              BCZ_{B:C}:A                      p(Z|BC)p(B)p(C)p(A)          G7                   ABC:BCZ      p(Z|BC)p(B|CA)p(A|C)p(C)
BN19              CZ:A:B                           p(Z|C)p(C)p(A)p(B)           G10                  ABC:CZ       p(Z|C)p(ABC)
BN20              A:B:C:Z                          p(Z)p(A)p(B)p(C)             G13                  ABC:Z        p(Z)p(ABC)

However, as pointed out above, the BN directed system lattice developed in this paper was constrained to disallow any DV that is a parent node within a V-structure. If this constraint were to be relaxed to allow DVs that are parent nodes in V-structures, then there are BN predictive models that give different analytical results than RA predictive models. Therefore, the BN directed system lattice developed in this paper is preliminary and incomplete.

To illustrate this point, consider BN17 from Figure 16 with its specific graph ABZ_{A:B}:C, and removing variable “C” for simplicity, resulting in ABZ_{A:B} with edge orientations A->Z<-B and with probability distribution p(Z|AB)p(A)p(B). Here, Z is the DV and is the child node within the V-structure and is thus included within the BN directed system lattice developed in this paper. This graph is equivalent in terms of prediction to RA directed system graph G7 with specific graph ABC:ABZ. In contrast, consider BN17 with its specific graph ABZ_{A:Z}:C. Again, for simplicity and comparability, removing variable “C” results in ABZ_{A:Z} with edge orientations A->B<-Z and with probability distribution p(B|AZ)p(A)p(Z). Here, Z is the DV, and is a parent node within the V-structure; therefore, this specific graph was not considered in the BN directed system lattice developed in this paper. However, the predicting components within the probability distribution are different and thus will result in a different statistical result. The differences between ABZ_{A:B} and ABZ_{A:Z} are illustrated in Figure 33, in which a hypothetical joint probability distribution p(ABZ), shown in (a), yields a conditional distribution p(Z|AB) for RA model ABZ and BN model ABZ_{A:B}, shown in (b), that is different from the conditional distribution q(Z|AB) for BN model ABZ_{A:Z}, shown in (c). ABZ_{A:Z} is an unconventional BN model in its choice of the parent node Z as the DV. These non-conventional BN models are not considered in this paper, but are a promising topic for future research that will extend the work reported here.

(a) p(ABZ), joint distribution for ABZ

          Z0            Z1
       B0    B1      B0    B1
A0    0.01  0.19    0.06  0.14
A1    0.17  0.31    0.03  0.10

(b) p(Z|AB), conditional distribution for ABZ & ABZ_{A:B}

          Z0            Z1
       B0    B1      B0    B1
A0    0.11  0.58    0.89  0.42
A1    0.83  0.75    0.17  0.25

(c) q(Z|AB), conditional distribution for ABZ_{A:Z}, where q(ABZ) = p(B|AZ) p(A) p(Z) and q(Z|AB) = q(ABZ) / q(AB)

          Z0            Z1
       B0    B1      B0    B1
A0    0.21  0.74    0.79  0.26
A1    0.74  0.64    0.26  0.36

Figure 33. BN Directed System Prediction Example.
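The calculation behind panels (b) and (c) can be reproduced with a short script. The sketch below is illustrative: it takes the joint distribution as printed in panel (a), renormalizes it because the rounded entries sum to 1.01, and uses hypothetical helper names. Because the inputs are rounded to two decimals, the results agree with the printed panels only approximately.

```python
# Joint distribution p(A, B, Z) keyed by (a, b, z), as printed in panel (a).
joint = {
    (0, 0, 0): 0.01, (0, 1, 0): 0.19, (0, 0, 1): 0.06, (0, 1, 1): 0.14,
    (1, 0, 0): 0.17, (1, 1, 0): 0.31, (1, 0, 1): 0.03, (1, 1, 1): 0.10,
}
total = sum(joint.values())  # rounded entries do not sum exactly to 1
p = {key: value / total for key, value in joint.items()}


def marginal(dist, keep):
    """Sum a distribution over all positions not listed in `keep`."""
    out = {}
    for key, value in dist.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + value
    return out


def conditional_z_given_ab(dist):
    """Return dist(Z | A, B), keyed by (a, b, z)."""
    ab = marginal(dist, (0, 1))
    return {(a, b, z): dist[(a, b, z)] / ab[(a, b)] for a, b, z in dist}


p_a = marginal(p, (0,))
p_z = marginal(p, (2,))
p_az = marginal(p, (0, 2))

# BN model ABZ_{A:Z}: q(ABZ) = p(B|AZ) p(A) p(Z).
q = {
    (a, b, z): (p[(a, b, z)] / p_az[(a, z)]) * p_a[(a,)] * p_z[(z,)]
    for a, b, z in p
}

p_cond = conditional_z_given_ab(p)  # panel (b): RA model ABZ, BN model ABZ_{A:B}
q_cond = conditional_z_given_ab(q)  # panel (c): BN model ABZ_{A:Z}
```

Comparing `p_cond` and `q_cond` entry by entry shows the point made in the text: once Z is a parent in the V-structure, the model's conditional distribution for Z given A and B is no longer the one implied directly by the data.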


6. Discussion

6.1. Neutral Systems

This paper builds on the RA work of Harris and Zwick [33], which developed the BN neutral system general graph lattice of Figure 8, expanding it here to offer the BN neutral system specific graph lattice of Figure 14. This paper also builds on the joint RA-BN neutral system general graph lattice of Figure 29 developed in that earlier work, expanding it here to offer the joint RA-BN neutral system specific graph lattice of Figure A6. In developing these new lattices, this paper extends RA notation to encompass BN graphs (see Section 3.2.3).

For four variables, the joint RA-BN neutral system general graph lattice increases the number of general graphs from 20 in the RA lattice and 20 in the BN lattice to 30 in the joint RA-BN lattice, and unique specific graphs from 114 in the RA lattice and 185 in the BN lattice to 238 in the joint lattice. The integration of the two lattices offers a richer and more expansive way to model and represent complex systems, leveraging the V-structure unique to BN graphs and the ability to accommodate loops and hypergraphs in the RA lattice.

This paper also develops an algorithm to generate the joint RA-BN neutral system general and specific graph lattices for any number of variables in both upward and downward directions (Section 4.6). The exhaustive and non-redundant RA and BN lattices follow the more general Rho lattice. Figure A6 shows the results of this algorithm for four variables. Although this algorithm is exhaustive, it does not create a hierarchical nesting of general or specific graphs. Such nesting is a desirable feature, so future extensions of this work could enhance the algorithm by enabling it to develop sequentially with each new graph being hierarchically nested. Given data, such an extension would allow statistical significance tests to be performed at each incremental step of lattice generation. Additionally, the current algorithm produces the exhaustive lattice, but searching the exhaustive lattice to find best candidate graphs is inefficient, so algorithms to efficiently search the joint lattice for best candidate graphs would be a useful extension.

Another promising extension of this work would be to develop hybrid RA-BN general graphs [13] for neutral systems to further extend the expression of the joint RA-BN neutral system lattice developed in this paper. Such hybrid graphs could incorporate directed edges to encode BN V-structures with loops and hypergraphs found in RA. Other possible extensions of this work could explore the application of Bayesian networks to hypergraphs [56] and under appropriate conditions to certain types of cycles [57].

6.2. Directed Systems

This paper develops the RA augmented directed system lattice (Figure 7), which is an extension of the conventional RA directed system lattice (Figure 5). While the conventional RA directed system lattice encompasses all prediction graphs in the BN directed system lattice (under the restriction that DVs in BN models are not parent variables in V-structures), the RA conventional directed system lattice did not include naïve Bayes graphs. Doing so, as shown in Figure 7, increases the number of general graphs from nine in the conventional RA lattice to 12 in the augmented lattice, and the number of specific graphs from 19 to 31. The augmented RA directed system lattice thus offers more candidate graphs, and this allows for the possibility of more accurate or simpler and thus more generalizable RA prediction models. Augmentation of the conventional RA directed system lattice was inspired in part by the BN directed system lattice developed in this paper.

Future extension of this work could examine whether BN graphs with predictions equivalent to RA models but with fewer degrees of freedom than RA predictive equivalents (because of independence constraints among the IVs) offer any advantage in calculations of statistical significance. If so, such BN graphs might replace their RA equivalents in the augmented directed system RA lattice. A related statistical issue that should be explored is how to compare augmenting directed RA models whose natural reference is A:B: ... , the neutral system independence reference, with conventional directed system models whose natural reference is AB ... :Z, i.e., a reference that has an IV component that joins together all IVs in a single relation.

This paper develops the BN directed system lattice of prediction graphs for four variables (Figure 16), reducing the number of possible specific graphs from 185 in the BN neutral system lattice to 18 in the BN directed system lattice, a significant compression of the BN neutral system lattice when prediction of a single DV is the goal. This paper also shows that all of the graphs in the BN directed system lattice (where this lattice disallows graphs where the DV is a V-structure parent) are equivalent in their predictions to RA graphs, although many of them have fewer degrees of freedom than their RA-equivalent counterparts. The augmented RA directed system lattice thus encompasses all of the BN directed system general graphs in terms of prediction, and offers additional predictive graphs, those including loops, that are not in the BN lattice.

However, the restriction that disallows BN graphs where the DV is a V-structure parent might be relaxed, so a future extension of this work could consider expanding the BN directed system lattice to include such unusual BN predictive graphs. An additional extension could be to develop an algorithm to generate the BN directed system lattice of general and specific graphs for any number of variables, allowing for efficient search of the BN lattice for graphs that uniquely predict a single DV.

Author Contributions: Conceptualization, M.H. and M.Z.; formal analysis, M.H.; writing—original draft preparation, M.H.; writing—review and editing, M.Z.; visualization, M.H.; supervision, M.Z. All authors have read and agreed to the published version of the manuscript.

Funding: This research received no external funding.

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.

Acknowledgments: Many thanks to Rajesh Venkatachalapathy for stimulating discussions of Bayesian Networks.

Conflicts of Interest: The authors declare no conflict of interest.

Appendix A

The contents of this Appendix are the work of other researchers and are included here to allow this paper to be understood in a self-contained way.

Appendix A.1. RA Loop Detection Procedure

In the RA graphs of Figures 1 and 4, graphs that do not have loops are highlighted in bold, while graphs that have loops are non-bold. Graphs without loops are fitted algebraically, whereas graphs with loops are fitted using iterative proportional fitting. To determine if a graph has a loop, the following procedure is performed. Given the set of relations of a specific graph:

1. Remove all variables that are unique to any individual relation.
2. Remove any relation that is equal to or embedded in any other relation of the (remaining) set.
3. Repeat 1 and 2 until either
   a. No variables remain, in which case there are no loops, or
   b. The remainder is unalterable by steps 1 or 2, in which case there are loops.

For example, in Figure 4, graph G7, illustrated by specific graph ABC:ABD, does not have a loop. Krippendorff’s loop detection algorithm [11] produces the following results. First, removing variables unique to both ABC and ABD removes C from ABC and D from ABD. What remains is AB:AB, for which the second AB is redundant and thus removed, leaving AB. Then, removing the variables unique to AB removes both A and B, leaving the null set, and thus this specific graph does not contain a loop.

By contrast, in Figure 4, graph G8, illustrated by specific graph ABC:AD:BD, does have a loop. Krippendorff’s loop detection algorithm [11] produces the following results. First, removing variables unique to one relation removes C from ABC. What remains is AB:AD:BD. There is no redundant relation, i.e., no relation that is repeated or embedded in another relation. There are also no variables unique to one relation. Therefore, nothing further can be removed; because the remaining set is unalterable, the specific graph has a loop.
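The loop detection procedure lends itself to a direct implementation. The following is a minimal sketch, assuming specific graphs are written in RA notation with single-letter variables and relations separated by ":"; the function name `has_loop` is hypothetical.

```python
def has_loop(specific_graph):
    """Apply the two reduction steps until nothing changes or nothing is left."""
    relations = [set(rel) for rel in specific_graph.split(":")]
    while True:
        changed = False
        # Step 1: remove variables that occur in exactly one relation.
        for var in set().union(*relations):
            holders = [rel for rel in relations if var in rel]
            if len(holders) == 1:
                holders[0].discard(var)
                changed = True
        # Step 2: remove relations equal to or embedded in another relation.
        reduced = []
        for i, rel in enumerate(relations):
            embedded = any(
                rel < other or (rel == other and j < i)
                for j, other in enumerate(relations) if j != i
            )
            if embedded:
                changed = True
            else:
                reduced.append(rel)
        relations = reduced
        # Step 3a: no variables remain, so there is no loop.
        if not any(relations):
            return False
        # Step 3b: the remainder cannot be reduced further, so there is a loop.
        if not changed:
            return True


g7_has_loop = has_loop("ABC:ABD")
g8_has_loop = has_loop("ABC:AD:BD")
```

For the two worked examples above, the sketch follows the same reductions: ABC:ABD shrinks to AB:AB, then AB, then the null set, while ABC:AD:BD stops at AB:AD:BD.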

Appendix A.2. D-Separation Procedure The following provides the procedure for determining all independencies for a BN [55] Step 1. List all possible independence statements for a given BN. Step 2. For each independence statement, construct the ‘ancestral graph’ [58] for the variables mentioned in the independence statement. Step 3. ‘Moralize’ the ancestral graph by adding an undirected edge between two nodes if Entropy 2021, 23, xEntropy FOR PEER 2021 REVIEW, 23, x FOR PEER REVIEWthey have a common child. 38 of 42 38 of 42

Step 4. ‘Disorient’ the moralized, ancestral graph, by making all edges undirected. Step 5. Delete the givens (nodes) and any of their edges from the independence statement Step 6. Read theStepbeing answer 6. tested. Read to thethe answerindependen to thece independenstatement questionce statement from question the remaining from the remaining graph, if the variablesgraph,Step 6. if Read are the disconnected thevariables answer are to in thedisconnected the independence remaining in thegraph, statement remaining the questionanswer graph, to from the theanswerinde- remaining to the graph,inde- pendence statementpendenceif the variablesis in statement the affirmative. are disconnected is in the affirmative. in the remaining graph, the answer to the independence statement is in the affirmative. The following Theprovides following examples provides of the examples D-separation of the procedure D-separation for BN12procedure and BN9*: for BN12 and BN9*: The following provides examples of the D-separation procedure for BN12 and BN9*: Example 1 Example 1 Example 1 Step 1. List all SteppossibleStep 1. 1. List List independence all all possible possible independencestat independenceements for stat statementsa givenements BN. for for For a a givenfour given variables, BN. BN. For For four four variables, variables, Table 2 is the completeTableTable 22 isis list. thethe complete complete list.list. Step 2. For eachStepStep independence 2. 2. For For each each independencestatement, independence construct statement, statement, the ‘ancestral construct construct graph’the the ‘ancestral ‘ancestral [58] for graph’ graph’the [58] [58] for the variables mentionedvariablesvariables in thementioned mentioned independence in the independence statement. statement. 
An ancestral graphAnAn ancestral ancestral of the probability graph graph of of the theexpressi probability probabilityon includes expressi expression all onnodes includes includes listed all allin nodes nodesthe inde- listed listed in in the the inde- inde- pendence statementpendencependence that statementis being tested thatthat isis and being being all tested testedparents, and and grandparents, all all parents, parents, grandparents, grandparents,great-grandpar- great-grandparents, great-grandpar- ents, etc., of thoseents,etc., nodes. ofetc., those of those nodes. nodes. BN12 and theBN12 BN12independence and and the the independence statement (A ⊥statement statement B | D) will(A (A ⊥⊥be BB used| | D) D) aswill will an be beexample used as an example throughout thethroughoutthroughout remaining theprocedure the remaining remaining (Steps procedure 2–5, Figures (Steps A1–A5). 2–5, Figures A1A1–A5).–A5).

Figure A1. Step Figure2, create A1. the FigureStep ancestral 2, A1.create graph.Step the 2, ancestral create the graph. ancestral graph.

Figure A2. Step Figure3, moralize A2. StepStepthe ancestral 3, 3, moralize graph. the ancestral graph.

Figure A3. Step Figure4, disorient A3. Step the moralized, 4, disorient ancestral the moralized, graph. ancestral graph.

Entropy 2021, 23, x FOR PEER REVIEW 38 of 42 Entropy 2021, 23, x FOR PEER REVIEW 38 of 42

Step 6. Read the answer to the independence statement question from the remaining Step 6. Read the answer to the independence statement question from the remaining graph, if the variables are disconnected in the remaining graph, the answer to the inde- graph, if the variables are disconnected in the remaining graph, the answer to the inde- pendence statement is in the affirmative. pendence statement is in the affirmative. The following provides examples of the D-separation procedure for BN12 and BN9*: The following provides examples of the D-separation procedure for BN12 and BN9*: Example 1 Example 1 Step 1. List all possible independence statements for a given BN. For four variables, Step 1. List all possible independence statements for a given BN. For four variables, Table 2 is the complete list. Table 2 is the complete list. Step 2. For each independence statement, construct the ‘ancestral graph’ [58] for the Step 2. For each independence statement, construct the ‘ancestral graph’ [58] for the variables mentioned in the independence statement. variables mentioned in the independence statement. An ancestral graph of the probability expression includes all nodes listed in the inde- An ancestral graph of the probability expression includes all nodes listed in the inde- pendence statement that is being tested and all parents, grandparents, great-grandpar- pendence statement that is being tested and all parents, grandparents, great-grandpar- ents, etc., of those nodes. ents, etc., of those nodes. BN12 and the independence statement (A ⊥ B | D) will be used as an example BN12 and the independence statement (A ⊥ B | D) will be used as an example throughout the remaining procedure (Steps 2–5, Figures A1–A5). throughout the remaining procedure (Steps 2–5, Figures A1–A5).

Figure A1. Step 2, create the ancestral graph. Figure A1. Step 2, create the ancestral graph.

Entropy 2021, 23, 986 36 of 39 Figure A2. Step 3, moralize the ancestral graph. Figure A2. Step 3, moralize the ancestral graph.

Figure A3. Step 4, disorient the moralized, ancestral graph. FigureFigure A3. A3. StepStep 4, 4, disorient disorient the the moralized, moralized, ancestral ancestral graph. graph.

Entropy 2021, 23, x FOR PEER REVIEW 39 of 42

Figure A4. Step 5, delete the givens.Figure A4. Step 5, delete the givens.

FigureFigure A5. A5. Example,Example, full full D-separation D-separation proced procedureure for for independence independence statement statement (C (C ⊥ ⊥D|A,D|A, B). B).

Step 3. ‘Moralize’ the ancestral graph by adding an undirected edge between two nodes if they have a common child.

Step 4. ‘Disorient’ the moralized ancestral graph by making all edges undirected.

Step 5. Delete the givens and any of their edges from the graph. In the continuing example, D is the ‘given’ (A ⊥ B | D), thus D and its connected edges to A, B, and C are removed.

Step 6. Read the answer to the independence statement question from the remaining graph: if the variables are disconnected in the remaining graph, the answer to the independence statement is in the affirmative. In this example, the independence statement being tested is the assertion that A and B are independent given D. This assertion is false because A and B are connected in the remaining graph; thus, they are not conditionally independent given D.

Example 2

Consider a second example, using BN9*. The independence statement being tested here is the assertion: C independent of D given A and B (C ⊥ D | A, B). Figure A5 shows all steps in the procedure for this example, affirming that C and D are indeed independent given A and B.
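The procedure above (restrict to the ancestral graph, moralize, disorient, delete the givens, then read off connectivity) can be illustrated with a minimal Python sketch. The function names `ancestors_of` and `d_separated`, the parent-map representation of the graph, and the four-variable DAG used below are all assumptions made for illustration; in particular, the DAG is hypothetical and is not claimed to be BN9* or any other graph from the lattice.

```python
from collections import deque
from itertools import combinations


def ancestors_of(dag, nodes):
    """Return `nodes` together with all of their ancestors.

    `dag` maps each child to the set of its parents (hypothetical representation).
    """
    seen = set(nodes)
    stack = list(nodes)
    while stack:
        node = stack.pop()
        for parent in dag.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def d_separated(dag, x, y, givens):
    """Test the statement (x ⊥ y | givens) on the ancestral moral graph.

    Assumes x and y are distinct and are not themselves among the givens.
    """
    givens = set(givens)

    # Ancestral graph: keep only the variables in the statement and their ancestors.
    keep = ancestors_of(dag, {x, y} | givens)

    # Moralize and disorient: store every parent-child edge as undirected and
    # add an undirected edge between any two parents of a common child.
    undirected = {node: set() for node in keep}
    for child in keep:
        parents = [p for p in dag.get(child, ()) if p in keep]
        for p in parents:
            undirected[child].add(p)
            undirected[p].add(child)
        for p, q in combinations(parents, 2):
            undirected[p].add(q)
            undirected[q].add(p)

    # Delete the givens and all of their edges.
    for g in givens:
        for neighbor in undirected.pop(g, set()):
            undirected[neighbor].discard(g)

    # Read the answer: x and y are independent iff no path connects them.
    reached = {x}
    queue = deque([x])
    while queue:
        node = queue.popleft()
        if node == y:
            return False
        for neighbor in undirected[node]:
            if neighbor not in reached:
                reached.add(neighbor)
                queue.append(neighbor)
    return True


# Hypothetical DAG for illustration only: A -> C, B -> C, A -> D, B -> D.
example_dag = {"C": {"A", "B"}, "D": {"A", "B"}}

c_d_given_ab = d_separated(example_dag, "C", "D", {"A", "B"})
a_b_given_d = d_separated(example_dag, "A", "B", {"D"})
a_b_marginal = d_separated(example_dag, "A", "B", set())
```

In this hypothetical graph, C and D are separated once A and B are deleted, whereas conditioning on the common child D leaves the moral edge between A and B in place, so A and B are not independent given D. With no givens, D is not in the ancestral graph and A and B are disconnected.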


Figure A6. Joint RA-BN lattice of 4-variable general and specific graphs.

