arXiv:2105.09675v1 [cs.DS] 20 May 2021

On the Parameterized Complexity of Polytree Learning

Niels Grüttemeier∗, Christian Komusiewicz, Nils Morawietz
Philipps-Universität Marburg, Marburg, Germany
{niegru, komusiewicz, morawietz}@informatik.uni-marburg.de

Abstract

A Bayesian network is a directed acyclic graph that represents statistical dependencies between variables of a joint probability distribution. A fundamental task in data science is to learn a Bayesian network from observed data. POLYTREE LEARNING is the problem of learning an optimal Bayesian network that fulfills the additional property that its underlying undirected graph is a forest. In this work, we revisit the complexity of POLYTREE LEARNING. We show that POLYTREE LEARNING can be solved in 3^n · |I|^{O(1)} time where n is the number of variables and |I| is the total instance size. Moreover, we consider the influence of the number of variables d that might receive a nonempty parent set in the final DAG on the complexity of POLYTREE LEARNING. We show that POLYTREE LEARNING has no f(d) · |I|^{O(1)}-time algorithm, unlike Bayesian network learning which can be solved in 2^d · |I|^{O(1)} time. We show that, in contrast, if d and the maximum parent set size are bounded, then we can obtain efficient algorithms.

∗ Supported by the Deutsche Forschungsgemeinschaft (DFG), project OPERAH, KO 3669/5-1.

1 Introduction

Bayesian networks are the most important tool for modelling statistical dependencies in joint probability distributions. A Bayesian network consists of a DAG (N, A) and a set of conditional probability tables. Given a Bayesian network, one may infer the probability distributions of the remaining variables under observations of the values of some of its variables. One of the drawbacks of using Bayesian networks is that this inference task is NP-hard. Moreover, the task of constructing a Bayesian network with an optimal network structure is NP-hard as well, even on very restricted instances [Chickering, 1995]. In this problem, we are given a local parent score function f_v : 2^{N\{v}} → N_0 for each variable v ∈ N, and the task is to learn a DAG (N, A) such that the sum of the parent scores over all variables is maximal.

Related Work. In light of the hardness of handling general Bayesian networks, the learning and inference problems for Bayesian networks fulfilling some specific structural constraints have been studied extensively [Pearl, 1989; Korhonen and Parviainen, 2013; Korhonen and Parviainen, 2015; Grüttemeier and Komusiewicz, 2020a; Ramaswamy and Szeider, 2021]. One of the earliest special cases that has received attention is tree networks, also called branchings [Chow and Liu, 1968; Pearl, 1989]. A tree is a Bayesian network where every vertex has at most one parent. In other words, every connected component is a directed tree. Learning and inference can be performed in polynomial time on trees. Trees are, however, very limited in their modeling power since every variable may depend on at most one other variable. To overcome this problem, a generalization of branchings called polytrees has been proposed. A polytree is a DAG whose underlying undirected graph is a forest. An advantage of polytrees is that the inference task can be performed in polynomial time on them; a survey of inference algorithms can be found in [Guo and Hsu, 2002]. Polytrees have also been used, for example, in image segmentation for microscopy data [Fehri et al., 2019]. The problem of learning an optimal polytree structure from parent scores, however, remains NP-hard [Dasgupta, 1999], even if every parent set with strictly positive score has size at most 2. Motivated by the contrast between the NP-hardness of POLYTREE LEARNING and the fact that learning a tree is possible in polynomial time, the problem of learning polytrees that are close to trees has been considered: among those polytrees that can be transformed into a tree by at most k edge deletions, the best one can be found in n^{O(k)} · |I|^{O(1)} time [Gaspers et al., 2015], where n is the number of variables and |I| is the overall input size. Thus, the running time of these algorithms is polynomial for every fixed k. Safaei et al. [2013] considered learning polytrees with a constant number of roots. As noted by Gaspers et al. [2015], a brute-force algorithm for POLYTREE LEARNING would need to consider up to n^{n−2} · 2^{n−1} directed trees. We study exact algorithms for POLYTREE LEARNING.

Our Results. We obtain an algorithm that solves POLYTREE LEARNING in 3^n · |I|^{O(1)} time. This is the first algorithm for POLYTREE LEARNING where the running time is singly-exponential in the number of variables n. The running time is a substantial improvement over the brute-force algorithm mentioned above, thus positively answering a question of Gaspers et al. [2015] on the existence of such algorithms. We then study whether POLYTREE LEARNING is amenable to polynomial-time data reduction that shrinks the instance I if |I| is much bigger than n. Using tools from parameterized complexity theory, we show that such a data reduction algorithm is unlikely. More precisely, we show that (under standard complexity-theoretic assumptions) there is no polynomial-time algorithm that replaces any n-variable instance I by an equivalent one of size n^{O(1)}. In other words, it seems necessary to keep an exponential number of parent sets to compute the optimal polytree.

We then consider a parameter that is potentially much smaller than n and determine whether POLYTREE LEARNING can be solved efficiently when this parameter is small. The parameter d, which we call the number of dependent variables, is the number of variables v for which at least one entry of f_v is strictly positive. The parameter essentially counts how many variables might receive a nonempty parent set in an optimal solution. We show that POLYTREE LEARNING can be solved in polynomial time when d is constant but that an algorithm with running time g(d) · |I|^{O(1)} is unlikely for any computable function g. Consequently, in order to obtain positive results for the parameter d, one needs to consider further restrictions on the structure of the input instance. We make a first step in this direction and consider the case where all parent sets with a strictly positive score have size at most p. Using this parameterization, we show that every input instance can be solved in 2^{ωdp} · |I|^{O(1)} time, where ω is the matrix multiplication constant. With the current-best known value for ω this gives a running time of 5.18^{dp} · |I|^{O(1)}. We then consider again data reduction approaches. This time we obtain a positive result: Any instance of POLYTREE LEARNING where p is constant can be reduced in polynomial time to an equivalent one of size d^{O(1)}. Informally, this means that if the instance has only few dependent variables, the parent sets with strictly positive score are small, and there are many nondependent variables, then we can identify some nondependent variables that are irrelevant for an optimal polytree representing the input data. We note that this result is tight in the following sense: Under standard complexity-theoretic assumptions it is impossible to replace each input instance in polynomial time by an equivalent one with (d + p)^{O(1)} variables. Thus, the assumption that p is a constant is necessary.
2 Preliminaries

Notation. An undirected graph is a tuple (V, E), where V is a set of vertices and E ⊆ {{u, v} | u, v ∈ V} is a set of edges. Given a vertex v ∈ V, we define N_G(v) := {u ∈ V | {u, v} ∈ E} as the neighborhood of v. A directed graph is a tuple (N, A), where N is a set of vertices and A ⊆ N × N is a set of arcs. If (u, v) ∈ A we call u a parent of v and v a child of u. A vertex that has no parents is a source and a vertex that has no children is a sink. Given a vertex v, we let P_v^A := {u | (u, v) ∈ A} denote the parent set of v. The skeleton of a directed graph (N, A) is the undirected graph (N, E) where {u, v} ∈ E if and only if (u, v) ∈ A or (v, u) ∈ A. A directed graph is a polytree if its skeleton is a forest, that is, the skeleton is acyclic [Dasgupta, 1999]. As a shorthand, we write P × v := P × {v} for a vertex v and a set P ⊆ N.

Problem Definition. Given a vertex set N, a family F := {f_v : 2^{N\{v}} → N_0 | v ∈ N} is a family of local scores for N. Intuitively, f_v(P) is the score that a vertex v obtains if P is its parent set. Given a directed graph D := (N, A) we define score(A) := Σ_{v∈N} f_v(P_v^A). We study the following computational problem.

POLYTREE LEARNING
Input: A set of vertices N, local scores F = {f_v | v ∈ N}, and an integer t ∈ N_0.
Question: Is there an arc-set A ⊆ N × N such that (N, A) is a polytree and score(A) ≥ t?

Given an instance I := (N, F, t) of POLYTREE LEARNING, an arc-set A is a solution of I if (N, A) is a polytree and score(A) ≥ t. Without loss of generality we may assume that f_v(∅) = 0 for every v ∈ N [Grüttemeier and Komusiewicz, 2020a]. Throughout this work, we let n := |N|.

Solution Structure and Input Representation. We assume that the local scores F are given in non-zero representation. That is, f_v(P) is part of the input if it is different from 0. For N = {v_1, ..., v_n}, the local scores F are represented by a two-dimensional array [Q_1, Q_2, ..., Q_n], where each Q_i is an array containing all triples (f_{v_i}(P), |P|, P) where f_{v_i}(P) > 0. The size |F| is defined as the number of bits needed to store this two-dimensional array. Given an instance I := (N, F, t), we define |I| := n + |F| + log(t).

For a vertex v, we call P_F(v) := {P ⊆ N \ {v} | f_v(P) > 0} ∪ {∅} the set of potential parents of v. Given a yes-instance I := (N, F, t) of POLYTREE LEARNING, there exists a solution A such that P_v^A ∈ P_F(v) for every v ∈ N: if there is a vertex with P_v^A ∉ P_F(v), we can simply replace its parent set by ∅. The running times presented in this paper will also be measured in the maximum number of potential parent sets δ_F := max_{v∈N} |P_F(v)| [Ordyniak and Szeider, 2013].
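To make the representation concrete, the following minimal Python sketch stores local scores in non-zero representation and evaluates score(A) and the polytree property. The dictionary layout, function names, and toy scores are our illustration and are not part of the paper.

```python
# A minimal sketch of the non-zero representation: local_scores[v] maps each
# parent set P with f_v(P) > 0 to its score; absent sets implicitly score 0.
local_scores = {
    "a": {frozenset({"b", "c"}): 5},
    "b": {frozenset({"c"}): 2},
    "c": {},
}

def score(arcs):
    """score(A) = sum over all v of f_v(P_v^A)."""
    parents = {v: frozenset(u for (u, w) in arcs if w == v) for v in local_scores}
    return sum(local_scores[v].get(parents[v], 0) for v in local_scores)

def is_polytree(vertices, arcs):
    """(N, A) is a polytree iff its skeleton is a forest: grow a union-find
    over the skeleton and reject any arc that closes a cycle (antiparallel
    arc pairs are rejected as well)."""
    rep = {v: v for v in vertices}
    def find(v):
        while rep[v] != v:
            rep[v] = rep[rep[v]]
            v = rep[v]
        return v
    for (u, v) in arcs:
        ru, rv = find(u), find(v)
        if ru == rv:      # second connection between the same components
            return False
        rep[ru] = rv
    return True

A = {("b", "a"), ("c", "a")}
assert is_polytree({"a", "b", "c"}, A) and score(A) == 5
```

Storing only the positive entries is what makes |F|, and hence |I|, potentially much smaller than the up to 2^{n−1} conceivable parent sets per vertex.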
A tool for designing algorithms for POLYTREE LEARNING is the directed superstructure [Ordyniak and Szeider, 2013], which is the directed graph S_F := (N, A_F) with A_F := {(u, v) | ∃P ∈ P_F(v) : u ∈ P}. In other words, there is an arc (u, v) ∈ A_F if and only if u is a potential parent of v.

Parameterized Complexity. A problem is in the class XP for a parameter k if it can be solved in |I|^{g(k)} time for some computable function g. In other words, a problem is in XP if it can be solved within polynomial time for every fixed k. A problem is fixed-parameter tractable (FPT) for a parameter k if it can be solved in g(k) · |I|^{O(1)} time for some computable g. If a problem is W[1]-hard for a parameter k, then it is assumed to not be fixed-parameter tractable for k. A problem kernel is an algorithm that, given an instance I with parameter k, computes in polynomial time an equivalent instance I′ with parameter k′ such that |I′| + k′ ≤ h(k) for some computable function h. If h is a polynomial, then the problem admits a polynomial kernel. Strong conditional running time bounds can also be obtained by assuming the Exponential Time Hypothesis (ETH), a standard conjecture in complexity theory [Impagliazzo et al., 2001]. For a detailed introduction into parameterized complexity we refer to the standard textbook [Cygan et al., 2015].

3 Parameterization by the Number of Vertices n

In this section we study the complexity of POLYTREE LEARNING when parameterized by n, the number of vertices. Note that there are up to n · 2^{n−1} entries in F and thus, the total input size of an instance of POLYTREE LEARNING might be exponential in n. On the positive side, we show that the problem can be solved in 3^n · |I|^{O(1)} time. On the negative side, we complement the result above by proving that POLYTREE LEARNING does not admit a polynomial kernel when parameterized by n. In other words, it is presumably impossible to transform an instance of POLYTREE LEARNING in polynomial time into an equivalent instance of size |I| = n^{O(1)}. This shows that it is sometimes necessary to keep an exponential number of parent scores to compute an optimal polytree.

3.1 An FPT Algorithm

Theorem 3.1. POLYTREE LEARNING can be solved in 3^n · |I|^{O(1)} time.

Proof. Let I := (N, F, t) be an instance of POLYTREE LEARNING. We describe a dynamic programming algorithm to solve I. We suppose an arbitrary fixed total ordering on the vertices of N. For every P ⊆ N and every i ∈ [1, |P|], we denote with p_i the ith smallest element of P according to the total ordering.

The dynamic programming table T has entries of type T[v, P, S, i] with v ∈ N, P ∈ P_F(v), S ⊆ N \ (P ∪ {v}), and i ∈ [0, |P|]. Each entry stores the maximal score of an arc set A of a polytree on {v} ∪ P ∪ S where v has no children, v learns exactly the parent set P under A, and, for each j ∈ [i + 1, |P|], the parent p_j has no neighbors besides v.

Figure 1: Illustration of an entry T[v, P, S, i] where i = 3.

The recurrence for an entry T[v, P, S, i] with i ∈ [1, |P|] is

T[v, P, S, i] := max_{S′ ⊆ S} ( T[v, P, S \ S′, i − 1] + max_{v′ ∈ S′ ∪ {p_i}} max_{P′ ∈ P_F(v′), P′ ⊆ S′ ∪ {p_i}} T[v′, P′, (S′ ∪ {p_i}) \ (P′ ∪ {v′}), |P′|] ).

Note that the two vertex sets P ∪ (S \ S′) ∪ {v} and P′ ∪ ((S′ ∪ {p_i}) \ (P′ ∪ {v′})) ∪ {v′} = S′ ∪ {p_i} share only the vertex p_i. Hence, combining the polytrees on these two vertex sets results in a polytree. If i is equal to zero, then the recurrence is

T[v, P, S, 0] := f_v(P) + max_{v′ ∈ S} max_{P′ ∈ P_F(v′), P′ ⊆ S} T[v′, P′, S \ (P′ ∪ {v′}), |P′|],

where the maximum is 0 if S = ∅. Thus, to determine if I is a yes-instance of POLYTREE LEARNING, it remains to check if T[v, P, N \ (P ∪ {v}), |P|] ≥ t for some v ∈ N and some P ∈ P_F(v). The corresponding polytree can be found via traceback. The correctness proof is straightforward and thus omitted.

The dynamic programming table T has 2^n · n^2 · δ_F entries. Each of these entries can be computed in 2^{|S|} · n^2 · δ_F time. Consequently, all entries can be computed in Σ_{i=0}^{n} \binom{n}{i} · 2^i · |I|^{O(1)} = 3^n · |I|^{O(1)} time in total. Evaluating whether there is some v ∈ N and some P ∈ P_F(v) such that T[v, P, N \ (P ∪ {v}), |P|] ≥ t can afterwards be done in O(n · δ_F) time. Hence, the total running time is 3^n · |I|^{O(1)}.
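The recurrence translates directly into a memoized program. The sketch below is our brute-force transcription of the table (without the traceback); the names best_polytree_score, PF, and f are ours, and the final line checks the table exactly as described in the proof above.

```python
from functools import lru_cache
from itertools import combinations

def subsets(s):
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def best_polytree_score(N, PF, f):
    """Memoized transcription of the dynamic program behind Theorem 3.1.
    PF[v] is the set of potential parent sets of v (always containing
    frozenset()); f[v][P] stores the strictly positive scores f_v(P)."""
    def local(v, P):
        return f.get(v, {}).get(P, 0)

    @lru_cache(maxsize=None)
    def T(v, P, S, i):
        if i == 0:
            # all parents of v are attached; place the leftover vertices of S
            best = 0
            for vp in S:
                for Pp in PF[vp]:
                    if Pp <= S - {vp}:
                        best = max(best, T(vp, Pp, S - Pp - {vp}, len(Pp)))
            return local(v, P) + best
        pi = sorted(P)[i - 1]              # i-th parent w.r.t. the fixed order
        best = 0
        for Sp in subsets(S):
            rest = T(v, P, S - Sp, i - 1)  # handle p_1, ..., p_{i-1} with S \ S'
            ext = Sp | {pi}                # second polytree shares only p_i
            for vp in ext:
                for Pp in PF[vp]:
                    if Pp <= ext - {vp}:
                        best = max(best, rest + T(vp, Pp, ext - Pp - {vp}, len(Pp)))
        return best

    N = frozenset(N)
    return max(T(v, P, N - P - {v}, len(P)) for v in N for P in PF[v])

# Toy instance: learning {b, c} as parents of a is optimal (score 5).
PF = {"a": {frozenset(), frozenset({"b", "c"})},
      "b": {frozenset()},
      "c": {frozenset(), frozenset({"b"})}}
f = {"a": {frozenset({"b", "c"}): 5}, "c": {frozenset({"b"}): 2}}
assert best_polytree_score({"a", "b", "c"}, PF, f) == 5
```

The enumeration of the pairs (S, S′) is the source of the 3^n factor: relative to a fixed entry, every vertex lies in S \ S′, in S′, or outside S.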
3.2 A Kernel Lower Bound

In this subsection we prove the following kernel lower bound. The proof is closely related to a similar kernel lower bound for BAYESIAN NETWORK STRUCTURE LEARNING parameterized by n [Grüttemeier and Komusiewicz, 2020a].

Theorem 3.2. POLYTREE LEARNING does not admit a polynomial kernel when parameterized by n, unless NP ⊆ coNP/poly.

Proof. We give a polynomial parameter transformation [Bodlaender et al., 2011] from MULTICOLORED INDEPENDENT SET.

MULTICOLORED INDEPENDENT SET
Input: An integer k, a graph G = (V, E), and a partition (V_1, ..., V_k) of V such that G[V_i] is a clique for each i ∈ [1, k].
Question: Does G contain an independent set of size k?

The parameter for this polynomial parameter transformation is the number of vertices not contained in V_k. Given an instance I = (G = (V, E), k) of MULTICOLORED INDEPENDENT SET with partition (V_1, ..., V_k) and ℓ := |V| − |V_k|, we construct in polynomial time an equivalent instance I′ = (N, F, t) of POLYTREE LEARNING as follows.

The vertex set is N := {v_1, ..., v_k} ∪ {v*} ∪ {w_e | e ∈ E}, and for every vertex u ∈ V we let P_u := {v*} ∪ {w_e | e ∈ E and u ∈ e}. For each i ∈ [1, k] and each u ∈ V_i, we set f_{v_i}(P_u) := 1. All other local scores are set to 0. Finally, we set t := k. This completes the construction of I′.
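Spelled out in Python (our reading of the garbled source: the vertex names and the definition of P_u are reconstructed from the equivalence proof below, so treat the details as an illustration):

```python
def reduce_mis_to_polytree(V, E, parts):
    """Sketch of the reduction: one vertex v_i per color class, a hub v*, and
    a vertex w_e per edge e; choosing u in V_i is encoded by letting v_i
    learn P_u = {v*} | {w_e : u in e} for score 1. The target is t := k."""
    k = len(parts)                                     # parts = (V_1, ..., V_k)
    w = {e: "w_" + "".join(sorted(e)) for e in E}      # e is a frozenset {u1, u2}
    N = {f"v{i}" for i in range(1, k + 1)} | {"v*"} | set(w.values())
    f = {x: {} for x in N}
    for i, Vi in enumerate(parts, start=1):
        for u in Vi:
            P_u = frozenset({"v*"} | {w[e] for e in E if u in e})
            f[f"v{i}"][P_u] = 1    # f_{v_i}(P_u) := 1; all other scores stay 0
    return N, f, k
```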

Next, we show that I is a yes-instance of INDEPENDENT SET if and only if I′ is a yes-instance of POLYTREE LEARNING.

(⇒) Let S ⊆ V be an independent set of size k in G and let S = {u_1, ..., u_k} with u_i ∈ V_i. We set A := ⋃_{i=1}^{k} P_{u_i} × v_i and show that D = (N, A) is a polytree of score t = k. Since S is an independent set in G, for every edge e ∈ E, the vertex w_e is the endpoint of at most one arc in A. Moreover, there is no arc (v_i, v_j) contained in A and v* is only an endpoint of the arcs (v*, v_i) for each i ∈ [1, k]. As a consequence, D is a polytree and has score k due to the fact that f_{v_i}(P_{u_i}) = 1 for all i ∈ [1, k]. Hence, I′ is a yes-instance of POLYTREE LEARNING.

(⇐) Let A ⊆ N × N be an arc set such that D = (N, A) is a polytree with score at least t = k. By construction, only the vertices v_1, ..., v_k can learn a nonempty potential parent set. Moreover, no parent set has score larger than one. As a consequence, for every i ∈ [1, k] there is some u_i ∈ V such that v_i learns the parent set P_{u_i} under A. We show that S := {u_1, ..., u_k} is an independent set of size k in G. By construction of the local scores, each parent set P_{u_i} contains the vertex v*. Since every vertex of G has degree at least one, each such parent set has size at least two. Hence, the vertices u_i and u_j are distinct if i ≠ j as, otherwise, the skeleton of D would contain the cycle (v_i, v*, v_j, w_e) for each w_e ∈ P_{u_i} \ {v*}. Moreover, since D is a polytree, distinct vertices u_i and u_j are not adjacent in G as, otherwise, the vertex w_{{u_i,u_j}} is contained in both of the learned parent sets P_{u_i} and P_{u_j} and, hence, the cycle (v_i, v*, v_j, w_{{u_i,u_j}}) is contained in the skeleton of D. Consequently, I is a yes-instance of INDEPENDENT SET since S is an independent set of size k in G.

4 Parameterization by the Number of Dependent Vertices

In the instance constructed above, d = k and δ_F = n + 1. Unless the ETH fails, INDEPENDENT SET cannot be solved in n^{o(k)} time [Chen et al., 2006] and, hence, POLYTREE LEARNING cannot be solved in (δ_F)^{o(d)} · |I|^{O(1)} time.

Theorem 4.2 points out a difference between POLYTREE LEARNING and BAYESIAN NETWORK STRUCTURE LEARNING (BNSL), where we aim to learn a DAG. In BNSL, a nondependent vertex v can be easily removed from the input instance (N, F, t) by setting N′ := N \ {v} and modifying the local scores to f′_u(P) := max(f_u(P), f_u(P ∪ {v})).
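On the non-zero representation from Section 2, this BNSL-specific reduction rule is a few lines of Python (names ours). For POLYTREE LEARNING no such rule is available: whether v occurs in a parent set changes the skeleton and therefore which other arcs may still be added.

```python
def remove_nondependent_bnsl(N, f, v):
    """BNSL-only data reduction: delete the nondependent vertex v and give
    every remaining vertex u the score f'_u(P) = max(f_u(P), f_u(P | {v}))."""
    new_f = {}
    for u in N - {v}:
        merged = {}
        for P, val in f.get(u, {}).items():
            Q = P - {v}                      # P and P | {v} collapse to Q
            merged[Q] = max(merged.get(Q, 0), val)
        new_f[u] = merged
    return N - {v}, new_f
```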
5 Dependent Vertices and Small Parent Sets

Due to Theorem 4.2, fixed-parameter tractability for POLYTREE LEARNING parameterized by d is presumably not possible. However, in the instances constructed in the proof of Theorem 4.2 the maximum parent set size p is not bounded by some computable function in d. In practice there are many instances where p is relatively small or upper-bounded by some small constant [van Beek and Hoffmann, 2015]. First, we provide an FPT algorithm for the parameter d + p, the sum of the number of dependent vertices and the maximum parent set size. Second, we provide a polynomial kernel for the parameter d if the maximum parent set size p is constant. Both results are based on computing max q-representative sets in a matroid [Fomin et al., 2014; Lokshtanov et al., 2018].

To apply the technique of representative sets we assume that there is a solution with exactly d · p arcs and that every nonempty potential parent set contains exactly p vertices. This can be obtained with the following simple modification of an input instance (N, F, t): For every dependent vertex v, we add vertices v_1, v_2, ..., v_p to N and set f_{v_i}(P) := 0 for all their local scores. Then, for every potential parent set P ∈ P_F(v) with |P| < p, we set f_v(P ∪ {v_1, ..., v_{p−|P|}}) := f_v(P) and, afterwards, we set f_v(P) := 0. Then, the given instance is a yes-instance if and only if the modified instance has a solution with exactly d · p arcs. Furthermore, note that f_v(∅) = 0 for every dependent vertex and every nonempty potential parent set has size exactly p after applying the modification.
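The padding step is straightforward on the non-zero representation; in the following sketch (names ours, and p is assumed to be the maximum parent set size) the fresh padding vertices for v are the pairs (v, 1), ..., (v, p):

```python
def pad_to_uniform_size(N, f, p):
    """After padding, every nonempty potential parent set has exactly p
    vertices, so a solution in which all d dependent vertices learn a
    nonempty set has exactly d * p arcs."""
    new_N, new_f = set(N), {}
    for v in N:
        scores = f.get(v, {})
        if not scores:                       # v is nondependent: nothing to pad
            new_f[v] = {}
            continue
        pads = [(v, j) for j in range(1, p + 1)]
        new_N.update(pads)                   # fresh vertices, all-zero scores
        new_f[v] = {P | frozenset(pads[: p - len(P)]): val
                    for P, val in scores.items()}
    for u in new_N - set(new_f):
        new_f[u] = {}
    return new_N, new_f
```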
Before we present the results of this section we state the definition of a matroid.

Definition 5.1. A pair M = (E, I), where E is a set and I is a family of subsets of E, is a matroid if
1. ∅ ∈ I,
2. if A ∈ I and B ⊆ A, then B ∈ I, and
3. if A, B ∈ I and |A| < |B|, then there exists some b ∈ B \ A such that A ∪ {b} ∈ I.

Given a matroid M = (E, I), the sets in I are called independent sets. A representation of M over a field F is a mapping ϕ : E → V, where V is some vector space over F, such that A ∈ I if and only if the vectors ϕ(a) with a ∈ A are linearly independent in V. A matroid with a representation is called a linear matroid. Given a set B ⊆ E, a set A ⊆ E fits B if A ∩ B = ∅ and A ∪ B ∈ I.

Definition 5.2. Let M = (E, I) be a matroid, let A be a family of subsets of E, and let w : A → N_0 be a weight function. A subfamily Â ⊆ A max q-represents A (with respect to w) if for every set B ⊆ E with |B| = q the following holds: If there is a set A ∈ A that fits B, then there exists some Â ∈ Â that fits B and w(Â) ≥ w(A). If Â max q-represents A, we write Â ⊆^q_max A.

We refer to a set family A where every A ∈ A has size exactly x ∈ N_0 as an x-family. Our results rely on the fact that max q-representative sets of an x-family can be computed efficiently, as stated in a theorem by Lokshtanov et al. [2018] that is based on an algorithm by Fomin et al. [2014]. In the following, ω < 2.373 is the matrix multiplication constant [Williams, 2012].

Theorem 5.3 ([Lokshtanov et al., 2018]). Let M = (E, I) be a linear matroid whose representation can be encoded with a k × |E| matrix over the field F_2 for some k ∈ N. Let A be an x-family containing ℓ sets, and let w : A → N_0 be a weight function. Then,
a) there exists some Â ⊆^q_max A of size \binom{x+q}{x} that can be computed with O(\binom{x+q}{x}^2 · ℓx^3k^2 + ℓ · \binom{x+q}{q}^ω · kx + (k + |E|)^{O(1)}) operations in F_2, and
b) there exists some Â ⊆^q_max A of size \binom{x+q}{x} · k · x that can be computed with O(\binom{x+q}{x} · ℓx^3k^2 + ℓ · \binom{x+q}{q}^{ω−1} · (kx)^{ω−1} + (k + |E|)^{O(1)}) operations in F_2.

We next define the matroid we use in this work. Recall that, given an instance (N, F, t) of POLYTREE LEARNING, the directed superstructure S_F is defined as S_F := (N, A_F), where A_F is the set of arcs that are potentially present in a solution, and we set m := |A_F|. In this work we consider the super matroid M_F, which we define as the graphic matroid [Tutte, 1965] of the superstructure. Formally, M_F := (A_F, I), where A ⊆ A_F is independent if and only if (N, A) is a polytree.

The super matroid is closely related to the acyclicity matroid that has been used for a constrained version of POLYTREE LEARNING [Gaspers et al., 2015]. The proof of the following proposition is along the lines of the proof that the graphic matroid is a linear matroid. We provide it here for the sake of completeness.

Proposition 5.4. Let (N, F, t) be an instance of POLYTREE LEARNING. Then, the super matroid M_F is a linear matroid and its representation can be encoded by an n × m matrix over the field F_2.

Proof. We first show that M_F is a matroid. Since (N, ∅) contains no cycles, it holds that ∅ ∈ I. Next, if (N, A) is a polytree for some A ⊆ A_F, then (N, B) is a polytree for every B ⊆ A. Thus, Conditions 1 and 2 from Definition 5.1 hold. We next show that Condition 3 holds. Consider A, B ∈ I with |A| < |B|. Let N′ ⊆ N be the vertices of a connected component of (N, A). Since (N, B) is a polytree, the number of arcs in B between vertices of N′ is at most the number of arcs in A between the vertices of N′. Then, since |B| > |A|, there exists some (u, v) ∈ B \ A that has endpoints in two distinct connected components of (N, A). Thus, (N, A ∪ {(u, v)}) is a polytree. Consequently, M_F is a matroid.

We next consider the representation of M_F. Let v_1, ..., v_n be the elements of N. We define the mapping ϕ : A_F → F_2^n by letting ϕ((v_i, v_j)) be the vector where the i-th and the j-th entry equal 1 and all other entries equal 0. It is easy to see that an arc-set A is independent in M_F if and only if the ϕ(a) with a ∈ A are linearly independent in F_2^n. Clearly, ϕ can be encoded by an n × m matrix over F_2.
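The representation from Proposition 5.4 is convenient to work with in code: each arc becomes a vector over F_2 with two ones, and independence (equivalently, the polytree property) can be tested by Gaussian elimination on bitmasks. A sketch under the same naming assumptions as before:

```python
def phi(arc, index):
    """phi((v_i, v_j)): the F_2 vector with exactly entries i and j set,
    packed into an integer bitmask."""
    u, v = arc
    return (1 << index[u]) | (1 << index[v])

def independent_in_super_matroid(arcs, vertices):
    """By Proposition 5.4, {phi(a) : a in arcs} is linearly independent over
    F_2 iff (N, arcs) is a polytree; test this with an XOR basis."""
    index = {v: i for i, v in enumerate(sorted(vertices))}
    basis = []
    for a in arcs:
        vec = phi(a, index)
        for b in basis:
            vec = min(vec, vec ^ b)   # reduce by b if it lowers the leading bit
        if vec == 0:
            return False              # phi(a) lies in the span: a cycle closes
        basis.append(vec)
    return True

# The triangle a-b-c is dependent, a star is not.
verts = {"a", "b", "c"}
assert not independent_in_super_matroid([("b", "a"), ("c", "a"), ("c", "b")], verts)
assert independent_in_super_matroid([("b", "a"), ("c", "a")], verts)
```

This mirrors the union-find test sketched in Section 2; the linear-algebraic view is the one required by Theorem 5.3.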
5.1 An FPT Algorithm

We now use the super matroid M_F to show that POLYTREE LEARNING can be solved in 2^{ωdp} · |I|^{O(1)} time, where ω is the matrix multiplication constant. The idea of the algorithm is simple: Let H := {v_1, ..., v_d} be the set of dependent vertices, and for i ∈ {0, 1, ..., d} let H_i := {v_1, ..., v_i} be the set containing the first i dependent vertices. The idea is that, for every H_i, we compute a family A_i of possible polytrees where only the vertices from H_i learn a nonempty potential parent set. We use the algorithm behind Theorem 5.3 as a subroutine to delete arc-sets from A_i that are not necessary to find a solution. We next define the operation ⊕. Intuitively, A ⊕_v P means that we extend each possible solution in the family A by the arc-set that defines P as the parent set of the vertex v.

Definition 5.5. Let v ∈ N, and let A be an x-family of subsets of A_F such that P_v^A = ∅ for every A ∈ A. For a vertex set P ⊆ N we define A ⊕_v P := {A ∪ (P × v) | A ∈ A and A ∪ (P × v) ∈ I}.

Observe that for every A ∈ A, the set P × v is disjoint from A since P_v^A = ∅. Consequently, A ⊕_v P is an (x + |P|)-family. The next lemma ensures that some operations (including ⊕) are compatible with representative sets.

Lemma 5.6. Let w : 2^{A_F} → N_0 be the weight function with w(A) := score(A), and let A be an x-family of subsets of A_F.
a) If Ã ⊆^q_max A and Â ⊆^q_max Ã, then Â ⊆^q_max A.
b) If Â ⊆^q_max A and B is an x-family of subsets of A_F with B̂ ⊆^q_max B, then Â ∪ B̂ ⊆^q_max A ∪ B.
c) Let v ∈ N and let P ⊆ N such that P_v^A = ∅ for every A ∈ A. Then, if Â ⊆^{q+|P|}_max A, it follows that Â ⊕_v P ⊆^q_max A ⊕_v P.

Proof. Statements a) and b) are well-known facts [Cygan et al., 2015; Fomin et al., 2014]. We prove Statement c). Let B be a set of size q and let there be a set A ∪ (P × v) ∈ A ⊕_v P that fits B. That is,

B ∩ (A ∪ (P × v)) = ∅, and (1)
B ∪ (A ∪ (P × v)) ∈ I. (2)

We show that there is some Â ∪ (P × v) ∈ Â ⊕_v P that fits B and w(Â ∪ (P × v)) ≥ w(A ∪ (P × v)). To this end, observe that Property (1) implies

B ∩ A = ∅, and (3)
B ∩ (P × v) = ∅. (4)

We define B̃ := B ∪ (P × v). Observe that Property (3) together with A ∩ (P × v) = ∅ implies B̃ ∩ A = ∅, and that Property (2) implies A ∪ B̃ ∈ I. Consequently, A fits B̃. Then, since |B̃| = q + |P| due to Property (4) and Â ⊆^{q+|P|}_max A, there exists some Â ∈ Â that fits B̃ and w(Â) ≥ w(A). Consider Â ∪ (P × v). Since Â fits B̃ it holds that (Â ∪ (P × v)) ∪ B ∈ I. Moreover, observe that w(Â ∪ (P × v)) = w(Â) + f_v(P) ≥ w(A) + f_v(P) = w(A ∪ (P × v)). It remains to show that (Â ∪ (P × v)) ∩ B = ∅. Since Â fits B̃ and B ⊆ B̃, it holds that Â ∩ B = ∅. Then, Property (4) implies (Â ∪ (P × v)) ∩ B = (Â ∩ B) ∪ ((P × v) ∩ B) = ∅. Thus, Â ∪ (P × v) fits B and therefore Â ⊕_v P ⊆^q_max A ⊕_v P.

We now describe the FPT algorithm. Let I := (N, F, t) be an instance of POLYTREE LEARNING. Let H := {v_1, v_2, ..., v_d} denote the set of dependent vertices of I, and for i ∈ {0, 1, ..., d} let H_i := {v_1, ..., v_i} contain the first i dependent vertices. Observe that H_0 = ∅ and H_d = H. We define A_i as the family of possible directed graphs (even the graphs that are not polytrees) where only the vertices in H_i learn a nonempty potential parent set. Formally, this is

A_i := {A ⊆ A_F | P_v^A ∈ P_F(v) \ {∅} for all v ∈ H_i and P_v^A = ∅ for all v ∈ N \ H_i}.

The algorithm is based on the following recurrence.

Lemma 5.7. If i = 0, then A_i = {∅}. If i > 0, then A_i can be computed by A_i = ⋃_{P ∈ P_F(v_i) \ {∅}} A_{i−1} ⊕_{v_i} P.

Intuitively, Lemma 5.7 states that A_i can be computed by considering A_{i−1} and combining every A ∈ A_{i−1} with every arc-set that defines a nonempty potential parent set of v_i. The correctness proof is straightforward and thus omitted. We next present the FPT algorithm.

Algorithm 1: FPT-algorithm for parameter d + p
Input: (N, F, t) and dependent vertices v_1, ..., v_d
1 Â_0 := {∅}
2 for i = 1 ... d do
3   Ã_i := ⋃_{P ∈ P_F(v_i) \ {∅}} Â_{i−1} ⊕_{v_i} P
4   Â_i := ComputeRepresentation(Ã_i, (d − i) · p)
5 return A ∈ Â_d such that (N, A) is a polytree and score(A) is maximal
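The loop of Algorithm 1 is short once ⊕ and the polytree test are available. In the sketch below (names ours, reusing independent_in_super_matroid from the previous sketch), compute_representation is stubbed to return its input, so the family sizes are never truncated and the run degenerates to exhaustive search; plugging in a real max q-representative computation in the sense of Theorem 5.3 b) is exactly what yields the 2^{ωdp} bound.

```python
def compute_representation(family, q):
    return family        # stub: a real implementation follows Theorem 5.3 b)

def score_of(A, f):
    parents = {}
    for (u, v) in A:
        parents.setdefault(v, set()).add(u)
    return sum(f.get(v, {}).get(frozenset(Ps), 0) for v, Ps in parents.items())

def algorithm1(N, PF, f, dependents, p):
    """Structure of Algorithm 1: start from {emptyset}, extend by every
    nonempty potential parent set of the next dependent vertex (the
    operation +_v keeps polytrees only), then shrink representatively."""
    d = len(dependents)
    fam = {frozenset()}                                    # \hat{A}_0
    for i, v in enumerate(dependents, start=1):
        extended = set()
        for A in fam:
            for P in PF[v]:
                if not P:
                    continue
                B = A | {(u, v) for u in P}                # A u (P x v)
                if independent_in_super_matroid(B, N):     # i.e., B is in I
                    extended.add(frozenset(B))
        fam = compute_representation(extended, (d - i) * p)
    return max(fam, key=lambda A: score_of(A, f), default=frozenset())
```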
Theorem 5.8. POLYTREE LEARNING can be solved in 2^{ωdp} · |I|^{O(1)} time, where ω is the matrix multiplication constant.

Proof. Let I := (N, F, t) be an instance of POLYTREE LEARNING with dependent vertices H = {v_1, ..., v_d}, let the families A_i for i ∈ {0, 1, ..., d} be defined as above, and let w : 2^{A_F} → N_0 be defined by w(A) := score(A). All representing families considered in this proof are max representing families with respect to w. We prove that Algorithm 1 computes an arc-set A such that (N, A) is a polytree with maximal score.

The subroutine ComputeRepresentation(Ã_i, (d − i) · p) in Algorithm 1 is an application of the algorithm behind Theorem 5.3 b). It computes a max ((d − i) · p)-representing family for Ã_i. As a technical remark we mention that the algorithm as described by Lokshtanov et al. [2018] evaluates the weight w(A) for |Ã_i| many arc-sets A. We assume that each such evaluation of w(A) is replaced by the computation of score(A) = Σ_{v∈H} f_v(P_v^A), which can be done in |I|^{O(1)} time.

Correctness. We first prove the following invariant.

Claim 1. The family Â_i max ((d − i) · p)-represents A_i and |Â_i| ≤ max(1, \binom{dp}{ip} · n · i · p).

Proof. The loop invariant holds before entering the loop since Â_0 := {∅} in Line 1 and A_0 = {∅}. Suppose that the loop invariant holds for the (i − 1)th execution of the loop. We show that the loop invariant holds after the ith execution.

First, consider Line 3. Since we assume that every nonempty potential parent set contains exactly p vertices, Lemma 5.6 and Lemma 5.7 imply that Ã_i max ((d − i) · p)-represents A_i. Note that at this point Ã_i contains up to max(1, \binom{dp}{(i−1)p} · n · (i − 1) · p) · δ_F sets.

Next, consider Line 4. Since we assume that every nonempty potential parent set contains exactly p vertices, the family Ã_i is an (i · p)-family. Then, the algorithm behind Theorem 5.3 b) computes a ((d − i) · p)-representing family Â_i. Then, by Theorem 5.3 b) and Lemma 5.6 a), Â_i max ((d − i) · p)-represents A_i and |Â_i| ≤ \binom{dp}{ip} · n · i · p after the execution of Line 4. ♦

We next show that Â_d contains an arc-set that defines a polytree with maximum score and thus, a solution is returned in Line 5. Since we assume that there is an optimal solution A that consists of exactly d · p arcs, this solution is an element of the family A_d. Then, since Â_d ⊆^0_max A_d, there exists some Â ∈ Â_d with Â ∪ ∅ ∈ I and w(Â) ≥ w(A). Since Â ∪ ∅ ∈ I, the graph (N, Â) is a polytree, and since w(Â) ≥ w(A), the score of Â is maximal.

Running time. We next analyze the running time of the algorithm. For this analysis, we use the inequality \binom{a}{b} ≤ 2^a for every b ≤ a. Let i be fixed.

We first analyze the running time of one execution of Line 3. Since Â_{i−1} has size at most \binom{dp}{(i−1)p} · n · i · p due to Claim 1, Line 3 can be executed in 2^{dp} · |I|^{O(1)} time.

We next analyze the running time of one execution of Line 4. Recall that Ã_i is an (i · p)-family of size at most \binom{dp}{(i−1)p} · n · i · p · δ_F. Furthermore, recall that there are |Ã_i| many evaluations of the weight function. Combining the running time from Theorem 5.3 b) with the time for evaluating w, the subroutine takes time

O( \binom{dp}{ip} · \binom{dp}{(i−1)p} · δ_F · (i · p)^4 · n^3 + \binom{dp}{(i−1)p} · δ_F · \binom{dp}{ip}^{ω−1} · (n · i · p)^ω + (n + m)^{O(1)} + \binom{dp}{(i−1)p} · n · i · p · δ_F · |I|^{O(1)} ),

where the last summand accounts for evaluating w. Therefore, one execution of Line 4 can be done in 2^{ωdp} · |I|^{O(1)} time. Since there are d repetitions of Lines 3–4, and Line 5 can be executed in |I|^{O(1)} time, the algorithm runs within the claimed running time.
5.2 Problem Kernelization

We now study problem kernelization for POLYTREE LEARNING parameterized by d when the maximum parent set size p is constant. We provide a problem kernel consisting of at most (dp)^{p+1} + d vertices where each vertex has at most (dp)^p potential parent sets, which can be computed in (dp)^{ωp} · |I|^{O(1)} time. Observe that both the running time and the kernel size are polynomial for every constant p. Note also that, since d + p ∈ O(n), Theorem 3.2 implies that there is presumably no kernel of size (d + p)^{O(1)}.

The basic idea of the kernelization is that we use max q-representations to identify nondependent vertices that are not necessary to find a solution.

Theorem 5.9. There is an algorithm that, given an instance (N, F, t) of POLYTREE LEARNING, computes in (dp)^{ωp} · |I|^{O(1)} time an equivalent instance (N′, F′, t) such that |N′| ≤ (dp)^{p+1} + d and δ_{F′} ≤ (dp)^p.

Proof. Let H be the set of dependent vertices of (N, F, t).

Computation of the reduced instance. We describe how we compute (N′, F′, t). We define the family A_v := {P × v | P ∈ P_F(v)} for every v ∈ H and the weight function w : A_v → N_0 by w(P × v) := f_v(P). We then apply the algorithm behind Theorem 5.3 a) and compute a max ((d − 1) · p)-representing family Â_v for every A_v.

Given all Â_v, a vertex w is called necessary if w ∈ H or if there exists some v ∈ H such that (w, v) ∈ A for some A ∈ Â_v. We then define N′ as the set of necessary vertices. Next, F′ consists of local score functions f′_v : 2^{N′\{v}} → N_0 with f′_v(P) := f_v(P) for every P ∈ 2^{N′\{v}}. In other words, f′_v is the limitation of f_v to parent sets that contain only necessary vertices.

Next, consider the running time of the computation of (N′, F′, t). Since each A_v contains at most δ_F arc sets and we assume that every potential parent set has size exactly p, each Â_v can be computed in time

O( \binom{dp}{p}^2 · δ_F · p^3 · n^2 + δ_F · \binom{dp}{p}^ω · n · p ) + (n + m)^{O(1)}.

Observe that we use the symmetry of the binomial coefficient to analyze this running time. After computing all Â_v, we compute N′ and F′ in time polynomial in |I|. The overall running time is (dp)^{ωp} · |I|^{O(1)}.

Correctness. We next show that (N, F, t) is a yes-instance if and only if (N′, F′, t) is a yes-instance.

(⇐) Let (N′, F′, t) be a yes-instance. Then, there exists an arc-set A′ such that (N′, A′) is a polytree with score at least t. Since N′ ⊆ N and f′_v(P) = f_v(P) for every v ∈ N′ and P ⊆ N′ \ {v}, we conclude that (N, A′) is a polytree with score at least t.

(⇒) Let (N, F, t) be a yes-instance. We choose a solution A for (N, F, t) such that P_v^A ⊆ N′ for as many dependent vertices v as possible. We prove that this implies that P_v^A ⊆ N′ for all dependent vertices. Assume towards a contradiction that there is some v ∈ H with P_v^A ⊄ N′. Observe that (P_v^A × v) ∈ A_v \ Â_v. We then define the arc set B := ⋃_{w∈H\{v}} P_w^A × w. Since we assume that all nonempty potential parent sets have size exactly p, we conclude |B| = (d − 1)p. Then, since Â_v max ((d − 1) · p)-represents A_v and (P_v^A × v) ∈ A_v fits B, we conclude that there is some (P × v) ∈ Â_v such that B ∩ (P × v) = ∅, (N, B ∪ (P × v)) is a polytree, and f_v(P) ≥ f_v(P_v^A). Thus, C := B ∪ (P × v) is a solution of (N, F, t) and the number of vertices v that satisfy P_v^C ⊆ N′ is bigger than the number of vertices v that satisfy P_v^A ⊆ N′. This contradicts the choice of A.

Bound on the size of |N′| and δ_{F′}. By Theorem 5.3, each Â_v has size at most \binom{(d−1)p+p}{p} = \binom{dp}{p}. Consequently, δ_{F′} ≤ (dp)^p and |N′| ≤ d · p · \binom{dp}{p} + d ≤ (dp)^{p+1} + d.

Observe that the instance I := (N′, F′, t) from Theorem 5.9 is technically not a kernel since the encoding of the integer t and the values of f_v(P) might not be bounded in d and thus the size of the instance |I| is not bounded in d. We use the following lemma [Etscheid et al., 2017; Frank and Tardos, 1987] to show that Theorem 5.9 implies an actual polynomial kernel for the parameter d when p is constant.

Lemma 5.10 ([Etscheid et al., 2017]). There is an algorithm that, given a vector w ∈ Q^r and some W ∈ Q, computes in polynomial time a vector w̄ = (w̄_1, ..., w̄_r) ∈ Z^r with max_{i∈{1,...,r}} |w̄_i| ∈ 2^{O(r^3)} and an integer W̄ ∈ Z with total encoding length O(r^4) such that w · x ≥ W if and only if w̄ · x ≥ W̄ for every x ∈ {0, 1}^r.

Corollary 5.11. POLYTREE LEARNING with constant parent set size admits a polynomial kernel when parameterized by d.

Proof. Let (N′, F′, t) be the reduced instance from Theorem 5.9. Let r be the number of triples (f_v(P), |P|, P) in the two-dimensional array representing F′. Clearly, r ≤ |N′| · δ_{F′}. Let w be the r-dimensional vector containing all values f_v(P) for all v and P. Applying the algorithm behind Lemma 5.10 on w and t computes a vector w̄ and an integer t̄ that has encoding length O((|N′| · δ_{F′})^4) with the property stated in Lemma 5.10. Substituting all local scores stored in F′ with the corresponding values in w̄ and substituting t by t̄ converts the instance (N′, F′, t) into an equivalent instance whose size is polynomially bounded in d if p is constant.
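The reduction behind Theorem 5.9 has the same shape as Algorithm 1. The sketch below (names ours) again stubs the representative computation; a real max ((d − 1)p)-representative family per Theorem 5.3 a) is what shrinks each A_v to at most \binom{dp}{p} arc sets before the scores are restricted to the necessary vertices.

```python
def reduce_instance(N, f, PF, dependents, p):
    """Skeleton of the kernelization: mark every vertex that occurs in some
    kept arc set of some dependent vertex as necessary, then drop all parent
    sets that use a discarded vertex. Reuses the compute_representation stub
    from the sketch of Algorithm 1."""
    d = len(dependents)
    necessary = set(dependents)
    for v in dependents:
        A_v = {frozenset((u, v) for u in P) for P in PF[v] if P}
        for A in compute_representation(A_v, (d - 1) * p):
            necessary.update(u for (u, _) in A)
    new_f = {v: {P: s for P, s in f.get(v, {}).items() if P <= necessary}
             for v in necessary}
    return necessary, new_f
```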
6 Conclusion

We believe that there is potential for practically relevant exact POLYTREE LEARNING algorithms and that this work could constitute a first step. Next, one should aim for improved parameterizations. For example, Theorem 4.1 gives an FPT algorithm for the parameter δ_F + d. Can we replace δ_F or d by smaller parameters? Instead of δ_F, one could consider parameters that are small when only few dependent vertices have many potential parent sets. Instead of d, one could consider parameters of the directed superstructure, for example the size of a smallest vertex cover; this parameter never exceeds d. We think that the algorithm with running time 3^n · |I|^{O(1)} might be practical for n up to 20, based on experience with dynamic programs with a similar running time [Komusiewicz et al., 2011]. A next step should be to combine this algorithm with heuristic data reduction and pruning rules to further increase the range of tractable values of n.

References

[Bodlaender et al., 2011] Hans L. Bodlaender, Stéphan Thomassé, and Anders Yeo. Kernel bounds for disjoint cycles and disjoint paths. Theor. Comput. Sci., 412(35):4570–4578, 2011.

[Chen et al., 2006] Jianer Chen, Xiuzhen Huang, Iyad A. Kanj, and Ge Xia. Strong computational lower bounds via parameterized complexity. J. Comput. Syst. Sci., 72(8):1346–1367, 2006.

[Chickering, 1995] David Maxwell Chickering. Learning Bayesian networks is NP-complete. In Proceedings of the Fifth International Conference on Artificial Intelligence and Statistics (AISTATS '95), pages 121–130. Springer, 1995.

[Chow and Liu, 1968] C. K. Chow and C. N. Liu. Approximating discrete probability distributions with dependence trees. IEEE Trans. Inf. Theory, 14(3):462–467, 1968.

[Cygan et al., 2015] Marek Cygan, Fedor V. Fomin, Lukasz Kowalik, Daniel Lokshtanov, Dániel Marx, Marcin Pilipczuk, Michal Pilipczuk, and Saket Saurabh. Parameterized Algorithms. Springer, 2015.

[Dasgupta, 1999] Sanjoy Dasgupta. Learning polytrees. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence (UAI '99), pages 134–141. Morgan Kaufmann, 1999.

[Downey and Fellows, 2013] Rodney G. Downey and Michael R. Fellows. Fundamentals of Parameterized Complexity. Texts in Computer Science. Springer, 2013.

[Etscheid et al., 2017] Michael Etscheid, Stefan Kratsch, Matthias Mnich, and Heiko Röglin. Polynomial kernels for weighted problems. J. Comput. Syst. Sci., 84:1–10, 2017.

[Fehri et al., 2019] Hamid Fehri, Ali Gooya, Yuanjun Lu, Erik Meijering, Simon A. Johnston, and Alejandro F. Frangi. Bayesian polytrees with learned deep features for multi-class cell segmentation. IEEE Trans. Image Process., 28(7):3246–3260, 2019.

[Fomin et al., 2014] Fedor V. Fomin, Daniel Lokshtanov, and Saket Saurabh. Efficient computation of representative sets with applications in parameterized and exact algorithms. In Proceedings of the 25th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '14), pages 142–151. SIAM, 2014.

[Frank and Tardos, 1987] András Frank and Éva Tardos. An application of simultaneous diophantine approximation in combinatorial optimization. Comb., 7(1):49–65, 1987.

[Gaspers et al., 2015] Serge Gaspers, Mikko Koivisto, Mathieu Liedloff, Sebastian Ordyniak, and Stefan Szeider. On finding optimal polytrees. Theor. Comput. Sci., 592:49–58, 2015.

[Grüttemeier and Komusiewicz, 2020a] Niels Grüttemeier and Christian Komusiewicz. Learning Bayesian networks under sparsity constraints: A parameterized complexity analysis. In Proceedings of the 29th International Joint Conference on Artificial Intelligence (IJCAI '20), pages 4245–4251. ijcai.org, 2020.

[Grüttemeier and Komusiewicz, 2020b] Niels Grüttemeier and Christian Komusiewicz. On the relation of strong triadic closure and cluster deletion. Algorithmica, 82(4):853–880, 2020.

[Guo and Hsu, 2002] Haipeng Guo and William Hsu. A survey of algorithms for real-time Bayesian network inference. In Working Notes of the Joint AAAI/UAI/KDD Workshop on Real-Time Decision Support and Diagnosis Systems, 2002.

[Impagliazzo et al., 2001] Russell Impagliazzo, Ramamohan Paturi, and Francis Zane. Which problems have strongly exponential complexity? J. Comput. Syst. Sci., 63(4):512–530, 2001.

[Komusiewicz et al., 2011] Christian Komusiewicz, Rolf Niedermeier, and Johannes Uhlmann. Deconstructing intractability - A multivariate complexity analysis of interval constrained coloring. J. Discrete Algorithms, 9(1):137–151, 2011.

[Korhonen and Parviainen, 2013] Janne H. Korhonen and Pekka Parviainen. Exact learning of bounded tree-width Bayesian networks. In Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics (AISTATS '13), pages 370–378. JMLR.org, 2013.

[Korhonen and Parviainen, 2015] Janne H. Korhonen and Pekka Parviainen. Tractable Bayesian network structure learning with bounded vertex cover number. In Proceedings of the Twenty-Eighth Annual Conference on Neural Information Processing Systems (NIPS '15), pages 622–630. MIT Press, 2015.

[Lokshtanov et al., 2018] Daniel Lokshtanov, Pranabendu Misra, Fahad Panolan, and Saket Saurabh. Deterministic truncation of linear matroids. ACM Trans. Algorithms, 14(2):14:1–14:20, 2018.

[Ordyniak and Szeider, 2013] Sebastian Ordyniak and Stefan Szeider. Parameterized complexity results for exact Bayesian network structure learning. J. Artif. Intell. Res., 46:263–302, 2013.

[Pearl, 1989] Judea Pearl. Probabilistic reasoning in intelligent systems - networks of plausible inference. Morgan Kaufmann series in representation and reasoning. Morgan Kaufmann, 1989.

[Ramaswamy and Szeider, 2021] Vaidyanathan Peruvemba Ramaswamy and Stefan Szeider. Turbocharging treewidth-bounded Bayesian network structure learning. In Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI '21), 2021.

[Safaei et al., 2013] Javad Safaei, Ján Manuch, and Ladislav Stacho. Learning polytrees with constant number of roots from data. In Proceedings of the 26th Australasian Joint Conference on Advances in Artificial Intelligence (AI '13), volume 8272 of Lecture Notes in Computer Science, pages 447–452. Springer, 2013.

[Tutte, 1965] William Thomas Tutte. Lectures on matroids. J. Research of the National Bureau of Standards (B), 69:1–47, 1965.

[van Beek and Hoffmann, 2015] Peter van Beek and Hella-Franziska Hoffmann. Machine learning of Bayesian networks using constraint programming. In Proceedings of the Twenty-First Conference on Principles and Practice of Constraint Programming (CP '15), volume 9255 of Lecture Notes in Computer Science, pages 429–445. Springer, 2015.

[Williams, 2012] Virginia Vassilevska Williams. Multiplying matrices faster than Coppersmith-Winograd. In Proceedings of the 44th Symposium on Theory of Computing Conference (STOC '12), pages 887–898. ACM, 2012.