On the Parameterized Complexity of Polytree Learning

On the Parameterized Complexity of Polytree Learning Niels Grüttemeier , Christian Komusiewicz and Nils Morawietz∗ Philipps-Universität Marburg, Marburg, Germany {niegru, komusiewicz, morawietz}@informatik.uni-marburg.de Abstract In light of the hardness of handling general Bayesian networks, the learning and inference prob- A Bayesian network is a directed acyclic graph that lems for Bayesian networks fulfilling some spe- represents statistical dependencies between vari- cific structural constraints have been studied exten- ables of a joint probability distribution. A funda- sively [Pearl, 1989; Korhonen and Parviainen, 2013; mental task in data science is to learn a Bayesian Korhonen and Parviainen, 2015; network from observed data. POLYTREE LEARN- Grüttemeier and Komusiewicz, 2020a; ING is the problem of learning an optimal Bayesian Ramaswamy and Szeider, 2021]. network that fulfills the additional property that its underlying undirected graph is a forest. In One of the earliest special cases that has received atten- this work, we revisit the complexity of POLYTREE tion are tree networks, also called branchings. A tree is a LEARNING. We show that POLYTREE LEARN- Bayesian network where every vertex has at most one par- ING can be solved in 3n · |I|O(1) time where n is ent. In other words, every connected component is a di- the number of variables and |I| is the total in- rected in-tree. Learning and inference can be performed in stance size. Moreover, we consider the influence polynomial time on trees [Chow and Liu, 1968; Pearl, 1989]. of the number of variables d that might receive a Trees are, however, very limited in their modeling power nonempty parent set in the final DAG on the com- since every variable may depend only on at most one other plexity of POLYTREE LEARNING. We show that variable. To overcome this problem, a generalization of branchings called polytrees has been proposed. A poly- POLYTREE LEARNING has no f(d) · |I|O(1)-time algorithm, unlike Bayesian network learning which tree is a DAG whose underlying undirected graph is a forest. An advantage of polytrees is that the inference task can be solved in d O(1) time. We show that, in 2 · |I| can be performed in polynomial time on them [Pearl, 1989; contrast, if d and the maximum parent set size are Guo and Hsu, 2002]. POLYTREE LEARNING, the problem bounded, then we can obtain efficient algorithms. of learning an optimal polytree structure from parent scores, however, remains NP-hard [Dasgupta, 1999]. We study exact 1 Introduction algorithms for POLYTREE LEARNING. Bayesian networks are the most important tool for modelling statistical dependencies in joint probability distributions. A Related Work. POLYTREE LEARNING is NP-hard even if every parent set with strictly positive score has size arXiv:2105.09675v1 [cs.DS] 20 May 2021 Bayesian network consists of a DAG (N, A) over the vari- [ ] able set N and a set of condiditional probability tables. Given at most 2 Dasgupta, 1999 . Motivated by the contrast a Bayesian network and the observed values on some of its between the NP-hardness of POLYTREE LEARNING and variables, one may infer the probability distributions of the the fact that learning a tree has a polynomial-time algo- remaining variables under the observations. One of the draw- rithm, the problem of optimally learning polytrees that are backs of using Bayesian networks is that this inference task close to trees has been considered. More precisely, it is NP-hard. Moreover, the task of constructing a Bayesian has been shown that the best polytree among those that network with an optimal network structure is NP-hard as can be transformed into a tree by at most k edge dele- O(k) O(1) well, even on very restricted instances [Chickering, 1995]. tions can be foundin n |I| time [Gaspers et al., 2015; In this problem, we are given a local parent score func- Safaei et al., 2013] where n is the number of variables and |I| N\{v} is the overall input size. Thus, the running time of these algo- tion fv : 2 → N0 for each variable v and the task is to learn a DAG (N, A) such that the sum of the parent scores rithms is polynomial for every fixed k. As noted by Gaspers over all variables is maximal. et al. [2015], a brute-force algorithm for POLYTREE LEARN- ING would need to consider nn−2 · 2n−1 directed trees. Poly- ∗Supported by the Deutsche Forschungsgemeinschaft (DFG), trees have been used, for example, in image-segmentation for project OPERAH, KO 3669/5-1. microscopy data [Fehri et al., 2019]. A Our Results. We obtain an algorithm that solves POLY- that has no children is a sink.Given a vertex v, we let Pv := TREE LEARNING in 3n · |I|O(1) time. This is the first algo- {u | (u, v) ∈ A} denote the parent set of v. The skeleton rithm for POLYTREE LEARNING where the running time is of a directed graph (N, A) is the undirected graph (N, E) singly-exponential in the number of variables n. Therunning where {u, v} ∈ E if and only if (u, v) ∈ A or (v,u) ∈ time is a substantial improvement over the brute-force algo- A. A directed acyclic graph is a polytree if its skeleton is a rithm mentioned above, thus positively answering a question forest, that is, the skeleton is acyclic [Dasgupta, 1999]. Asa of Gaspers et al. [2015] on the existence of such algorithms. shorthand, we write P × v := P ×{v} for a vertex v and a We then study whether POLYTREE LEARNING is amenable set P ⊆ N. to polynomial-time data reduction that shrinks the instance I Problem Definition. Given a vertex set N, a family F := if |I| is much bigger than n. Using tools from parameter- {f : 2N\{v} → N | v ∈ N} is a family of local scores ized complexity theory, we show that such a data reduction v 0 for N. Intuitively, fv(P ) is the score that a vertex v obtains algorithm is unlikely. More precisely, we show that (un- if P is its parent set. Given a directed graph D := (N, A) we der standard complexity-theoretic assumptions) there is no A define score(A) := v∈N fv(Pv ). We study the following polynomial-time algorithm that replaces any n-variable in- computational problem. stance I by an equivalent one of size nO(1). In other words, P it seems necessary to keep an exponential number of parent POLYTREE LEARNING sets to compute the optimal polytree. Input: A set of vertices N, local scores F = {fv | v ∈ We then consider a parameter that is potentially much N}, and an integer t ∈ N0. Question smaller than n and determine whether POLYTREE LEARN- : Is there an arc-set A ⊆ N × N such ING can be solved efficiently when this parameter is small. that (N, A) is a polytree and score(A) ≥ t? The parameter d, whichwe callthenumberof dependent vari- Given an instance I := (N, F,t) of POLYTREE LEARN- ables, is the number of variables v for which at least one en- ING, an arc-set A is a solution of I if (N, A) is a try of fv is strictly positive. The parameter essentially counts polytree and score(A) ≥ t. Without loss of general- how many variables might receive a nonemptyparent set in an ity we may assume that fv(∅) = 0 for every v ∈ optimal solution. We show that POLYTREE LEARNING can N [Grüttemeier and Komusiewicz, 2020a]. Throughout this be solved in polynomial time when d is constant but that an work, we let n := |N|. O(1) algorithm with running time g(d) · |I| is unlikely for any Solution Structure and Input Representation. We as- computable function g. Consequently, in order to obtain pos- sume that the local scores F are given in non-zero represen- itive results for the parameter d, one needs to consider further tation. That is, f (P ) is part of the input if it is different restrictions on the structure of the input instance. We make a v from 0. For N = {v1,...,vn}, the local scores F are repre- first step in this direction and consider the case where all par- sented by a two-dimensional array [Q ,Q ,...,Q ], where ent sets with a strictly positive score have size at most p. Us- 1 2 n each Qi is an array containing all triples (fvi (P ), |P |, P ) ing this parameterization, we show that every input instance where f (P ) > 0. The size |F| is defined as the number ωdp O(1) vi can be solved in 2 ·|I| time where ω is the matrix mul- of bits needed to store this two-dimensional array. Given an tiplication constant. With the current-best known value for ω instance I := (N, F,t), we define |I| := n + |F| + log(t). dp O(1) this gives a running time of 5.18 · |I| . We then consider For a vertex v, we call PF (v) := {P ⊆ N \ {v} | again data reduction approaches. This time we obtain a posi- fv(P ) > 0} ∪ {∅} the set of potential parents of v. Given tive result: Any instance of POLYTREE LEARNING where p is a yes-instance I := (N, F,t) of POLYTREE LEARNING, constant can be reduced in polynomial time to an equivalent A there exists a solution A such that Pv ∈ PF (v) for ev- O(1) A one of size d . Informally, this means that if the instance ery v ∈ N: If there is a vertex with Pv 6∈ PF (v) has only few dependent variables, the parent sets with strictly we can simply replace its parent set by ∅.

On the Parameterized Complexity of Polytree Learning

Context-Bounded Analysis of Concurrent Queue Systems*

Bayesian Networks

Lecture 10: Variable Eliminaton, Contnued

Polytree-Augmented Classifier Chains for Multi-Label Classification

Arxiv:1802.06316V1 [Math.AC] 18 Feb 2018 Ne Yl,Eg Ideal

Introduction to Neurabase: Forming Join Trees in Bayesian Networks

The Study of the Theoretical Size and Node Probability of the Loop Cutset in Bayesian Networks

Learning Polytrees

An Algorithm to Learn Polytree Networks with Hidden Nodes

Arxiv:1507.02608V6 [Math.ST] 3 Feb 2018 Represented by a Completed Partially Directed Acyclic Graph (CPDAG)

Graph Theory

Inference in Bayesian Networks