Parameterized Complexity Results for Probabilistic Network Structure Learning
Stefan Szeider Vienna University of Technology, Austria
Workshop on Applications of Parameterized Algorithms and Complexity (APAC 2012), Warwick, UK, July 8, 2012. Joint work with:
Serge Gaspers (UNSW, Sydney), Mikko Koivisto (U Helsinki), Mathieu Liedloff (U d'Orléans), Sebastian Ordyniak (Masaryk U Brno)
Papers:
• Algorithms and Complexity Results for Exact Bayesian Structure Learning. Ordyniak and Szeider. Conference on Uncertainty in Artificial Intelligence (UAI 2010).
• An Improved Dynamic Programming Algorithm for Exact Bayesian Network Structure Learning. Ordyniak and Szeider. NIPS Workshop on Discrete Optimization in Machine Learning (DISCML 2011).
• On Finding Optimal Polytrees. Gaspers, Koivisto, Liedloff, Ordyniak, and Szeider. Conference on Artificial Intelligence (AAAI 2012).
APAC 2012 — Stefan Szeider

Probabilistic Networks
• Bayesian Networks (BNs) were introduced by Judea Pearl in 1985 (2011 Turing Award winner)
• A BN is a DAG D = (V, A) plus probability tables associated with the nodes of the network
• In addition to Bayesian networks, other probabilistic networks have also been considered: Markov Random Fields, Factor Graphs, etc.
[Photo: Judea Pearl]
Example: A Bayesian Network

Rain:
  P(Rain=T) = 0.2, P(Rain=F) = 0.8

Sprinkler | Rain:
  Rain=F: P(Sprinkler=T) = 0.4,  P(Sprinkler=F) = 0.6
  Rain=T: P(Sprinkler=T) = 0.01, P(Sprinkler=F) = 0.99

Grass Wet | Sprinkler, Rain:
  S=F, R=F: P(Wet=T) = 0.8,  P(Wet=F) = 0.2
  S=F, R=T: P(Wet=T) = 0.01, P(Wet=F) = 0.99
  S=T, R=F: P(Wet=T) = 0.9,  P(Wet=F) = 0.1
  S=T, R=T: P(Wet=T) = 0.99, P(Wet=F) = 0.01

(DAG: Rain → Sprinkler, Rain → Grass Wet, Sprinkler → Grass Wet)
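The joint distribution encoded by this BN factorizes as P(R)·P(S|R)·P(G|S,R), so marginals can be computed by summing out variables. A minimal sketch, using the CPT entries as transcribed above (the slide's exact table layout is partly conjectural, so the numbers should be treated as illustrative):

```python
from itertools import product

# CPTs transcribed from the slide (True = T, False = F)
P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {False: {True: 0.4, False: 0.6},    # keyed by Rain value
               True: {True: 0.01, False: 0.99}}
P_wet = {(False, False): 0.8, (False, True): 0.01,  # keyed by (Sprinkler, Rain),
         (True, False): 0.9, (True, True): 0.99}    # value = P(GrassWet=T | S, R)

# marginal P(GrassWet = T): sum the factorized joint over Rain and Sprinkler
p_wet = sum(P_rain[r] * P_sprinkler[r][s] * P_wet[(s, r)]
            for r, s in product([True, False], repeat=2))
print(round(p_wet, 5))
```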
Applications

• diagnosis
• computational biology
• document classification
• information retrieval
• image processing
• decision support, etc.

[Figure: a BN showing the main pathophysiological relationships in the diagnostic reasoning focused on a suspected pulmonary embolism event]

Computational Problems

• BN Reasoning: given a BN, compute the probability of a variable taking a specific value
• BN Learning: given a set of sample data, find a BN that fits the data best
• BN Structure Learning: given sample data, find the best DAG
• BN Parameter Learning: given sample data and a DAG, find the best probability tables

BN Learning and Local Scores
[Figure: a BN over nodes n1,…,n6 next to a table of 0/1 sample data over variables n1,…,n6, with some entries missing (*)]

Local score function: f(n, P) = score of node n with P ⊆ V as its in-neighbors (parents)
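How such a local score can be derived from sample data can be sketched with a minimal count-based log-likelihood score. This is illustrative only: the talk treats f as given input, and in practice standard scoring functions (e.g., BDeu or BIC) are used; the row format (dicts mapping variable names to 0/1 values, complete data) is an assumption:

```python
import math
from collections import Counter

def local_score(data, n, parents):
    """Illustrative log-likelihood local score f(n, P) from complete 0/1
    sample data; rows are dicts variable -> value. Not the scoring
    function used in the papers."""
    # count joint occurrences of (parent configuration, value of n)
    joint = Counter((tuple(row[p] for p in parents), row[n]) for row in data)
    # count occurrences of each parent configuration alone
    marg = Counter(tuple(row[p] for p in parents) for row in data)
    return sum(c * math.log(c / marg[cfg]) for (cfg, _), c in joint.items())
```

With data where n perfectly tracks its parent, the conditional log-likelihood is 0 (its maximum); with no parents, it is strictly negative.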
BN n2 Sample Data n1 n1 n2 n3 n4 n5 n6 .... 0 1 0 1 1 0 .... n4 1 1 0 1 1 0 .... n3 * 1 0 0 * 1 ....
0 * 1 1 * 1 .... n6 1 0 1 1 0 1 ...... n5 .... APAC 2012 7 ... Stefan Szeider Our Combinatorial Model
• BN Structure Learning:
  Input: a set N of nodes and a local score function f (given by explicitly listing its nonzero tuples)
  Task: find a DAG D = (N, A) such that the sum of f(n, P_D(n)) over all nodes n is maximum
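Under this model an exact (exponential-time) solver is easy to sketch: every DAG is consistent with some topological order, and given an order each node can independently pick its best-scoring parent set among its predecessors. A minimal brute-force sketch (function name hypothetical, for illustration only):

```python
from itertools import combinations, permutations

def best_dag(nodes, f):
    """Exhaustive solver for the combinatorial model: maximize the sum of
    f(n, P_D(n)) over all DAGs D on `nodes`. f maps (node, frozenset of
    parents) -> score; unlisted tuples score 0. Exponential time."""
    score = lambda n, P: f.get((n, frozenset(P)), 0.0)
    best_total, best_parents = float("-inf"), None
    for order in permutations(nodes):
        total, parents = 0.0, {}
        for i, n in enumerate(order):
            preds = order[:i]
            # best parent set for n among its predecessors in this order
            cands = [c for r in range(len(preds) + 1)
                     for c in combinations(preds, r)]
            s, P = max(((score(n, c), c) for c in cands), key=lambda t: t[0])
            total += s
            parents[n] = frozenset(P)
        if total > best_total:
            best_total, best_parents = total, parents
    return best_total, best_parents
```

The per-order decomposition is what makes this correct: acyclicity is enforced by the order, so parent-set choices do not interact.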
[Poster] An Improved Dynamic Programming Algorithm for Exact Bayesian Network Structure Learning
Sebastian Ordyniak and Stefan Szeider, 3rd Workshop on Discrete Optimization in Machine Learning (DISCML'11), Sierra Nevada, Spain, Dec 17, 2011. Research supported by the European Research Council (COMPLEX REASON, 239962).

Problem: given a local score function f over a set of nodes N, find a Bayesian network (BN) on N whose score is maximum with respect to f. Exact Bayesian Network Structure Learning is NP-hard. Heuristic algorithms first find an undirected graph, then greedily orient the edges, then do local search to improve the orientation.

Super-structure: restrict the search to BNs whose skeleton is contained inside an undirected graph (the super-structure).
• A good super-structure contains an optimal solution.
• A super-structure can be efficiently obtained, e.g., using the tiling algorithm.

Dynamic programming over a tree decomposition: use a tree decomposition of a super-structure to find an optimal BN via dynamic programming (Ordyniak and Szeider, UAI 2010).
• Compactly represent locally optimal (partial) BNs via records.
• Compute the set of all records that represent locally optimal (partial) BNs for each node of the tree decomposition in a bottom-up manner.
Bottleneck: large number of records!

Dynamic lower bound approach: ignore records that do not represent optimal BNs.
• For every record R compute a lower bound and an upper bound for the score of any BN represented by R.
• Ignore all records whose upper bound is below the minimum of the lower bounds over all records seen so far.

Experimental results: we compared the algorithm without (no LB) to the algorithm with the dynamic lower bound (LB) on the Alarm network.

super-structure   running time (s)       memory usage (MB)
                  no LB     LB           no LB     LB
S_SKEL            14        7            337       180
S_TIL(0.01)       20785     6393         15679     5346
S_TIL(0.05)       46560     16712        38525     18786
S_TIL(0.1)        44554     16928        38520     18918

S_SKEL is the skeleton of the Alarm network; S_TIL(α) is a super-structure learned with the tiling algorithm for significance level α. The dynamic lower bound approach reduces the running time and the memory usage by about a half or more!

References:
• S. Ordyniak, S. Szeider. Algorithms and complexity results for exact Bayesian structure learning. In: P. Grünwald, P. Spirtes (Eds.), Proceedings of UAI 2010, The 26th Conference on Uncertainty in Artificial Intelligence, Catalina Island, California, USA, July 8–11, 2010. AUAI Press, Corvallis, Oregon, 2010.
• I. Tsamardinos, L. Brown, C. Aliferis. The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning 65 (2006) 31–78.
• E. Perrier, S. Imoto, S. Miyano. Finding optimal Bayesian network given a super-structure. J. Mach. Learn. Res. 9 (2008) 2251–2286.
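The record-pruning rule can be sketched abstractly. Records are abstracted here to (lower bound, upper bound) pairs, and the "seen so far" phrasing from the poster is taken literally as a streaming rule; this is a hedged sketch of the stated idea, not the authors' implementation:

```python
def prune_stream(records):
    """Process records one at a time; drop a record whose upper bound is
    below the minimum of the lower bounds of all records seen so far
    (such a record cannot beat any record already kept)."""
    kept, seen_lbs = [], []
    for lb, ub in records:
        if seen_lbs and ub < min(seen_lbs):
            continue  # dominated: every kept record guarantees a higher score
        seen_lbs.append(lb)
        kept.append((lb, ub))
    return kept
```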
NP-Hardness

Theorem: BN Structure Learning is NP-hard. [Chickering 1996]
• Proof: by reduction from Feedback Arc Set (FAS; NP-hard for maximum degree 4 [Karp 1972])
[Figure: the reduction gadget for nodes u, u′: auxiliary nodes x, x′ and v with local scores f(x, {u}) = 1, f(x′, {u′}) = 1, f(v, {x}) = 1, f(v, {x′}) = 1, f(v, {x, x′}) = 2]
[Poster] Towards Finding Optimal Polytrees
Serge Gaspers, Mikko Koivisto, Mathieu Liedloff, Sebastian Ordyniak and Stefan Szeider, 3rd Workshop on Discrete Optimization in Machine Learning (DISCML'11), Sierra Nevada, Spain, Dec 17, 2011.

Problem: given a local score function f over a set of nodes N, find a DAG on N whose score is maximum with respect to f. Exact Probabilistic Network Structure Learning is NP-hard.

Known results (obtained by restricting the structure of the resulting DAG):
• An optimal branching can be found in polynomial time (Chu and Liu 1965; Edmonds 1967).
• Finding an optimal polytree is NP-hard (Dasgupta 1999).
Can parameters make the problem tractable?

Main result: for every constant k, an optimal k-branching — a polytree that can be turned into a branching by the deletion of at most k arcs — can be found in polynomial time.

Method:
(I) Guess an augmented arc set S := D ∪ D′: guess two sets D and D′ of at most k arcs each, such that
• the undirected graph induced by the arcs in S = D ∪ D′ is acyclic,
• D′ contains only arcs whose heads are also the heads of arcs in D, and
• no two arcs in D′ have the same head.
(II) Find an optimal extension A for S: using the weight function w(u, v) := f_v({u}) − f_v(∅), compute a set A of arcs that is a heaviest common independent set of the following two matroids:
• the in-degree matroid M1(S) contains all arc sets B such that no arc in B has a head that is also a head of an arc in S, and no two arcs of B share the same head;
• the acyclicity matroid M2(S) contains all arc sets B such that the undirected graph induced by the arcs in B ∪ S is acyclic.
An arc set S ∪ A with maximum score is an optimal k-branching.

Running time: there are at most n^{3k} possible arc sets S, and a heaviest common independent set A for M1(S) and M2(S) can be found in time O(n^4) (Brezovec et al. 1986), so the overall running time of the algorithm is O(n^{3k+4}).

Open problems:
• Can we find an optimal k-branching in time O(f(k)·p(|N|)) for some computable function f and polynomial p, i.e., is finding an optimal k-branching fixed-parameter tractable for parameter k?
• Can we generalize k-branchings from polytrees to DAGs, i.e., can we efficiently find a maximum-score DAG that can be turned into a branching by the deletion of at most k arcs?

References:
• Y. J. Chu, T. H. Liu. On the shortest arborescence of a directed graph. Science Sinica 14 (1965) 1396–1400.
• J. R. Edmonds. Optimum branchings. Journal of Research of the National Bureau of Standards 71B (4) (1967) 233–240.
• S. Dasgupta. Learning polytrees. In: K. B. Laskey, H. Prade (Eds.), UAI '99: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, Stockholm, Sweden, July 30–August 1, 1999. Morgan Kaufmann, 1999, pp. 134–141.
• C. Brezovec, G. Cornuéjols, F. Glover. Two algorithms for weighted matroid intersection. Mathematical Programming 36 (1) (1986) 39–53.
Remark: k-branchings are in between branchings and polytrees.
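The two matroids in step (II) can be made concrete. This is a brute-force sketch with the independence tests taken from the matroid definitions above; it is exponential in the number of candidate arcs (the poster instead uses an O(n^4) weighted matroid intersection algorithm), and the function names are hypothetical:

```python
from itertools import combinations

def acyclic_undirected(arcs):
    """Union-find forest check on the undirected version of the arcs."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for u, v in arcs:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # adding {u, v} would close a cycle
        parent[ru] = rv
    return True

def independent(B, S):
    """B is independent iff it is independent in both M1(S) and M2(S)."""
    heads_S = {v for (_, v) in S}
    heads_B = [v for (_, v) in B]
    in_degree_ok = (not set(heads_B) & heads_S
                    and len(heads_B) == len(set(heads_B)))     # M1(S)
    return in_degree_ok and acyclic_undirected(list(B) + list(S))  # M2(S)

def best_extension(nodes, S, w):
    """Heaviest common independent set A of M1(S) and M2(S) by brute force
    (illustration only); w(u, v) plays the role of f_v({u}) - f_v(empty)."""
    cand = [(u, v) for u in nodes for v in nodes if u != v]
    best_wt, best_A = 0.0, ()
    for r in range(len(cand) + 1):
        for B in combinations(cand, r):
            if independent(B, S):
                wt = sum(w(u, v) for (u, v) in B)
                if wt > best_wt:
                    best_wt, best_A = wt, B
    return best_A
```

For example, with nodes {1, 2, 3}, S = {(1, 2)} and unit weights, only a single arc can be added: any two admissible arcs would either repeat a head or close an undirected cycle with S.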
(Research supported by the European Research Council (COMPLEX REASON, 239962), the Academy of Finland (Grant 125637), and the French Agence Nationale de la Recherche (ANR AGAPE ANR-09-BLAN-0159-03).)

Parameterized Complexity
• Identify a parameter k of the problem
• Aim: solve the problem in time f(k)·n^c — fixed-parameter tractable (FPT)
• XP: solvable in time n^{f(k)}
• W-classes: FPT ⊆ W[1] ⊆ W[2] ⊆ ··· ⊆ XP
(Downey and Fellows: Parameterized Complexity, Springer 1999)
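What "time f(k)·n^c" buys can be illustrated with the textbook FPT example (not from the talk): Vertex Cover solved by a bounded search tree, where the exponential part depends only on the parameter k:

```python
def vertex_cover(edges, k):
    """Classic bounded-search-tree FPT algorithm: decide whether the graph
    given by `edges` has a vertex cover of size <= k, in O(2^k * |E|) time,
    by branching on the two endpoints of an uncovered edge."""
    if not edges:
        return True   # nothing left to cover
    if k == 0:
        return False  # an edge remains but the budget is exhausted
    u, v = next(iter(edges))
    for pick in (u, v):  # any cover must contain u or v
        rest = [e for e in edges if pick not in e]
        if vertex_cover(rest, k - 1):
            return True
    return False
```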
Parameters for BN Structure Learning?
1. Parameterized by the treewidth (of the super-structure)
2. Parameterized local search
3. Parameterized by the distance from being a branching
1. BN Structure Learning parameterized by the treewidth (of the super-structure)
The Super-Structure

• The super-structure S_f of a local score function f is the undirected graph S_f = (N, E) containing all edges that can participate in an optimal DAG [Perrier, Imoto, Miyano 2008]

(local score function f → super-structure → BN)
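A simple over-approximation of S_f is easy to compute from the explicit listing of f: include the edge {u, n} whenever u occurs in some parent set P with f(n, P) > 0. Every edge of S_f is of this form (an optimal DAG only gains by using listed parent sets), but not every such edge need appear in an optimal DAG; the function name below is hypothetical:

```python
def superstructure_approx(f):
    """Over-approximate the super-structure S_f. f maps
    (node, frozenset of parents) -> score, listing only nonzero tuples.
    Returns a set of undirected edges, each a 2-element frozenset."""
    edges = set()
    for (n, P), score in f.items():
        if score > 0:
            # n may receive an arc from each u in P, giving edge {u, n}
            edges.update(frozenset({u, n}) for u in P)
    return edges
```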
Hardness

Theorem: BN Structure Learning is W[1]-hard when parameterized by the treewidth of the super-structure.

Proof: by reduction from Multi-Colored Clique (MCC).
[Figure 7: An example of the graph G (an MCC instance with k = 3, vertices v11,…,v33) and the super-structure S_f (with additional nodes a12, a13, a23) according to the construction used in the proof of Theorem 3]

[Figure 8: The example graph G from Figure 7 together with an edge set E′, given as the bold edges in the illustration of G, and the resulting DAG D(E′)]
Before we go on to show Claim 5 we give some further notation and explanation. Let E′ ⊆ E(G). We denote by V(E′) the set of all nodes incident to edges in E′, i.e., the set ⋃_{e∈E′} e. We say that E′ is representable if for every 1 ≤ i …