An Algorithm to Learn Polytree Networks with Hidden Nodes
Firoozeh Sepehr
Department of EECS, University of Tennessee Knoxville
1520 Middle Dr, Knoxville, TN 37996
[email protected]

Donatello Materassi
Department of EECS, University of Tennessee Knoxville
1520 Middle Dr, Knoxville, TN 37996
[email protected]

Abstract

Ancestral graphs are a prevalent mathematical tool to take into account latent (hidden) variables in a probabilistic graphical model. In ancestral graph representations, the nodes are only the observed (manifest) variables and the notion of m-separation fully characterizes the conditional independence relations among such variables, bypassing the need to explicitly consider latent variables. However, ancestral graph models do not necessarily represent the actual causal structure of the model, and do not contain information about, for example, the precise number and location of the hidden variables. Detecting the presence of latent variables while also inferring their precise location within the actual causal structure is a more challenging task that provides more information about the actual causal relationships among all the model variables, including the latent ones. In this article, we develop an algorithm to exactly recover graphical models of random variables with underlying polytree structures when the latent nodes satisfy specific degree conditions. Therefore, this article proposes an approach for the full identification of hidden variables in a polytree. We also show that the algorithm is complete in the sense that when such degree conditions are not met, there exists another polytree with fewer latent nodes that satisfies the degree conditions and entails the same independence relations among the observed variables, making it indistinguishable from the actual polytree.

1 Introduction

The presence of unmeasured variables is a fundamental challenge in the discovery of causal relationships [1, 2, 3].
When the causal diagram is a Directed Acyclic Graph (DAG) with unmeasured variables, a common approach is to use ancestral graphs to describe the independence relations among the measured variables [2]. The main advantage of ancestral graphs is that they involve only the measured variables and successfully encode all their conditional independence relations via m-separation. Furthermore, complete algorithms have been devised to obtain ancestral graphs from observational data, e.g., the work in [3]. However, ancestral graphs sidestep the recovery of the actual structure of the original DAG. For example, it might be known that the actual causal diagram, including the hidden nodes, has a polytree structure, but the ancestral graph associated with the measured variables might not even be a polytree [4]. In contrast, the recovery of causal diagrams including the location of their hidden variables is a very challenging task, and algorithmic solutions are available only for specific scenarios [5, 6, 7, 8]. For example, in the case of specific distributions (i.e., Gaussian and Binomial) when the causal diagram is known to be a rooted tree, the problem has been solved by exploiting the additivity of a metric along the paths of the tree [6, 7, 8, 9]. In the case of generic distributions, though, additive metrics might be too difficult to define or cannot be defined in general. Furthermore, rooted trees can be considered a rather limiting class of networks since they represent probability distributions which can only be factorized according to second order conditional distributions [10].

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

This article makes a novel contribution towards the recovery of more general causal diagrams.
Indeed, it provides an algorithm to learn causal diagrams making no assumptions on the underlying probability distribution, and considering polytree structures which can represent factorizations involving conditional distributions of arbitrarily high order. Furthermore, it is shown that a causal diagram with a polytree structure can be exactly recovered if and only if each hidden node satisfies the following conditions: (i) the node has at least two children; (ii) if the node has exactly one parent, such a parent is not hidden; (iii) the node has degree at least 3, or each of its two children has at least one other parent. The provided algorithm recovers every polytree structure with hidden nodes satisfying these conditions and, remarkably, makes use only of third order statistics. If the degree conditions are not satisfied, then it is shown that there exists another polytree with fewer hidden random variables which entails the same independence relations among the observed variables. In this case, when no additional information or observations are provided, no test can be constructed to determine the true structure. Another main advantage of the proposed approach is that it follows a form of Occam's razor: when the degree conditions on the hidden nodes are not met, a polytree with a minimal number of hidden nodes is selected. We find this property quite relevant in application scenarios since Occam's razor is arguably one of the cardinal principles in all sciences.

2 Preliminaries, Assumptions and Problem Definition

In order to formulate our problem, we first introduce a generalization of the notions of directed and undirected graphs (see for example [11, 12]) which also considers a partition of the set of nodes into visible and hidden nodes.

Definition 1 (Latent partially directed graph).
A latent partially directed graph $\bar{G}_\ell$ is a 4-ple $(V, L, E, \vec{E})$ where
• the disjoint sets $V$ and $L$ are named the set of visible nodes and the set of hidden nodes,
• the set $E$ is the set of undirected edges containing unordered pairs of $(V \cup L) \times (V \cup L)$,
• the set $\vec{E}$ is the set of directed edges containing ordered pairs of $(V \cup L) \times (V \cup L)$.
We denote the unordered pair of two elements $y_i, y_j \in V \cup L$ as $y_i - y_j$, and the ordered pair of $y_i, y_j$ (when $y_i$ precedes $y_j$) as $y_i \to y_j$.

In a latent partially directed graph the sets $E$ and $\vec{E}$ do not share any edges. Namely, $y_i - y_j \in E$ implies that neither $y_i \to y_j$ nor $y_j \to y_i$ is in $\vec{E}$. A latent partially directed graph is a fully undirected graph when $\vec{E} = \emptyset$, and we simplify the notation by writing $G_\ell = (V, L, E)$. Similarly, when $E = \emptyset$, we have a fully directed graph, and we denote it by $\vec{G}_\ell = (V, L, \vec{E})$. Furthermore, if we drop the distinction between visible and hidden nodes and consider $V \cup L$ as the set of nodes, we recover the standard notions of undirected and directed graphs. Thus, latent partially directed graphs inherit, in a natural way, all notions associated with standard graphs (e.g., path, degree, neighbor, etc., see for example [11]). In the scope of this article, we denote degree, outdegree, indegree, children, parents, descendants and ancestors of a node $y$ in graph $\vec{G}$ using $\deg_{\vec{G}}(y)$, $\deg^{+}_{\vec{G}}(y)$, $\deg^{-}_{\vec{G}}(y)$, $\mathrm{ch}_{\vec{G}}(y)$, $\mathrm{pa}_{\vec{G}}(y)$, $\mathrm{de}_{\vec{G}}(y)$ and $\mathrm{an}_{\vec{G}}(y)$, respectively (see [11, 12] for precise definitions). Furthermore, the notion of restriction of a graph to a subset of nodes follows immediately.

Definition 2 (Restriction of a latent partially directed graph). The restriction of a latent partially directed graph $\bar{G}_\ell = (V, L, E, \vec{E})$ with respect to a set of nodes $A \subseteq V \cup L$ is the latent partially directed graph obtained by considering only the nodes in $A$ and the edges linking pairs of nodes which are both in $A$.
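To make Definitions 1 and 2 concrete, the following is a minimal Python sketch of a latent partially directed graph with its restriction operation. The class name `LatentPDG`, its field names, and the helper `satisfies_degree_conditions` are hypothetical and do not come from the article; the helper is only one possible reading of conditions (i)-(iii) stated in the Introduction, written in terms of the children, parents, and degree notation above, and it assumes a fully directed graph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatentPDG:
    """Illustrative container for a latent partially directed graph (V, L, E, E_dir)."""
    visible: frozenset     # V
    hidden: frozenset      # L
    undirected: frozenset  # E: each edge is a frozenset {a, b}
    directed: frozenset    # E_dir: each edge is a tuple (tail, head)

    def __post_init__(self):
        # V and L must be disjoint; E and E_dir must not share an edge.
        if self.visible & self.hidden:
            raise ValueError("V and L must be disjoint")
        nodes = self.visible | self.hidden
        for edge in self.undirected:
            if len(edge) != 2 or not edge <= nodes:
                raise ValueError(f"bad undirected edge: {set(edge)}")
        for tail, head in self.directed:
            if tail not in nodes or head not in nodes:
                raise ValueError(f"bad directed edge: {(tail, head)}")
            if frozenset((tail, head)) in self.undirected:
                raise ValueError("E and E_dir must not share an edge")

    def parents(self, y):
        return {a for a, b in self.directed if b == y}

    def children(self, y):
        return {b for a, b in self.directed if a == y}

    def degree(self, y):
        # Number of edges (directed or undirected) incident to y.
        incident_undirected = sum(1 for e in self.undirected if y in e)
        return incident_undirected + len(self.parents(y)) + len(self.children(y))

    def restrict(self, subset):
        """Restriction (Definition 2): keep nodes in `subset` and edges inside it."""
        keep = frozenset(subset) & (self.visible | self.hidden)
        return LatentPDG(
            visible=self.visible & keep,
            hidden=self.hidden & keep,
            undirected=frozenset(e for e in self.undirected if e <= keep),
            directed=frozenset(
                (a, b) for a, b in self.directed if a in keep and b in keep
            ),
        )


def satisfies_degree_conditions(g, h):
    """One reading of conditions (i)-(iii) for a hidden node h (assumed directed graph)."""
    ch, pa = g.children(h), g.parents(h)
    cond_i = len(ch) >= 2                            # at least two children
    cond_ii = len(pa) != 1 or not (pa <= g.hidden)   # a single parent is not hidden
    cond_iii = g.degree(h) >= 3 or all(len(g.parents(c)) >= 2 for c in ch)
    return cond_i and cond_ii and cond_iii


# Hypothetical example: hidden node "h" points to 1 and 2; 2 - 3 is undirected.
g = LatentPDG(
    visible=frozenset({1, 2, 3}),
    hidden=frozenset({"h"}),
    undirected=frozenset({frozenset({2, 3})}),
    directed=frozenset({("h", 1), ("h", 2)}),
)
sub = g.restrict({"h", 1, 2})  # drops node 3 and the edge 2 - 3
```

In this example `sub` is fully directed, and its hidden node has degree 2 with children that have no other parent, so `satisfies_degree_conditions(sub, "h")` evaluates to `False` under this reading of condition (iii).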
Moreover, a latent partially directed graph is called a latent partially directed tree when there exists exactly one path connecting any pair of nodes.

Definition 3 (Latent partially directed tree). A latent partially directed tree $\vec{P}_\ell$ is a latent partially directed graph $\bar{G}_\ell = (V, L, E, \vec{E})$ where every pair of nodes $y_i, y_j \in V \cup L$ is connected by exactly one path.

Trivially, latent partially directed trees generalize the notions of undirected trees and polytrees (directed trees) [13]. In a latent partially directed tree, we define a hidden cluster as a group of hidden nodes that are connected to each other via a path constituted exclusively of hidden nodes.

Definition 4 (Hidden cluster). A hidden cluster in a latent partially directed tree $\vec{P}_\ell = (V, L, E, \vec{E})$ is a set $C \subseteq L$ such that for each distinct pair of nodes $y_i, y_j \in C$ the unique path connecting them contains only nodes in $C$ and no node in $C$ is linked to a node which is in $L \setminus C$.

Observe that each node in a hidden cluster has neighbors which are either visible or hidden nodes of the same cluster. Figure 1 (a) depicts a latent directed tree (or a latent polytree) and its hidden clusters $C_1$ and $C_2$ highlighted by the dotted lines.

Figure 1: A hidden polytree (a) and its collapsed hidden polytree (b), a minimal hidden polytree (c).

Furthermore, we introduce the set of (visible) neighbors of a hidden cluster, its closure and its degree.

Definition 5 (Neighbors, closure, and degree of a hidden cluster). In a latent partially directed tree, the set of all visible nodes linked to any of the nodes of a hidden cluster $C$ is the set of neighbors of $C$ and is denoted by $N(C)$. We define the degree of the hidden cluster as $|N(C)|$, namely, the number of neighbors of the cluster.
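As an illustration of hidden clusters and their neighbors, the sketch below computes the clusters as the maximal connected groups of hidden nodes and then collects the visible nodes linked to each cluster. The function names and the small example tree are hypothetical (the example is not the polytree of Figure 1), and edge direction is ignored since only adjacency matters here.

```python
from collections import defaultdict


def hidden_clusters(hidden, edges):
    """Return the hidden clusters: maximal sets of hidden nodes connected via hidden nodes."""
    adj = defaultdict(set)
    for a, b in edges:  # direction is irrelevant for adjacency
        adj[a].add(b)
        adj[b].add(a)
    seen, clusters = set(), []
    for start in hidden:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first search restricted to hidden nodes
            y = stack.pop()
            if y in component:
                continue
            component.add(y)
            stack.extend(n for n in adj[y] if n in hidden and n not in component)
        seen |= component
        clusters.append(frozenset(component))
    return clusters


def cluster_neighbors(cluster, visible, edges):
    """Return N(C): the visible nodes linked to some node of the cluster."""
    neighbors = set()
    for a, b in edges:
        if a in cluster and b in visible:
            neighbors.add(b)
        if b in cluster and a in visible:
            neighbors.add(a)
    return neighbors


# Hypothetical latent tree with 8 nodes and 7 edges (tail, head).
visible = {1, 2, 3, 4, 5}
hidden = {"h1", "h2", "h3"}
edges = [(1, "h1"), ("h1", "h2"), ("h2", 2), ("h2", 3),
         (3, "h3"), ("h3", 4), ("h3", 5)]

clusters = hidden_clusters(hidden, edges)
degrees = {c: len(cluster_neighbors(c, visible, edges)) for c in clusters}
```

Here `h1` and `h2` form one cluster because they are directly linked, while `h3` is separated from them by the visible node 3 and so forms its own cluster; `degrees` maps each cluster to $|N(C)|$.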