Parameterized Complexity Results for Probabilistic Network Structure Learning

Stefan Szeider
Vienna University of Technology, Austria

Workshop on Applications of Parameterized Algorithms and Complexity (APAC 2012)
Warwick, UK, July 8, 2012

Joint work with:
• Serge Gaspers (UNSW, Sydney)
• Mikko Koivisto (U Helsinki)
• Mathieu Liedloff (U d’Orleans)
• Sebastian Ordyniak (Masaryk U Brno)

Papers:
• Algorithms and Complexity Results for Exact Bayesian Structure Learning. Ordyniak and Szeider. Conference on Uncertainty in Artificial Intelligence (UAI 2010).
• An Improved Dynamic Programming Algorithm for Exact Bayesian Network Structure Learning. Ordyniak and Szeider. NIPS Workshop on Discrete Optimization in Machine Learning (DISCML 2011).
• On Finding Optimal Polytrees. Gaspers, Koivisto, Liedloff, Ordyniak, and Szeider. Conference on Artificial Intelligence (AAAI 2012).

Probabilistic Networks
• Bayesian Networks (BNs) were introduced by Judea Pearl (2011 Turing Award winner) in 1985.
• A BN is a DAG D=(V,A) plus tables associated with the nodes of the network.
• In addition to Bayesian networks, other probabilistic networks have also been considered: Markov Random Fields, Factor Graphs, etc.

Example: A Bayesian Network (Rain, Sprinkler, Grass Wet)

Rain:
  T      F
  0.8    0.2

Sprinkler, given Rain:
  Rain   T      F
  F      0.4    0.6
  T      0.01   0.99

Grass Wet, given Sprinkler and Rain:
  Sprinkler   Rain   T      F
  F           F      0.8    0.2
  F           T      0.01   0.99
  T           F      0.9    0.1
  T           T      0.99   0.01

Applications
• diagnosis
• computational biology
• document classification
• information retrieval
• image processing
• decision support
• etc.
(Figure: a BN showing the main pathophysiological relationships in the diagnostic reasoning focused on a suspected pulmonary embolism event.)

Computational Problems
• BN Reasoning: Given a BN, compute the probability of a variable taking a specific value.
• BN Learning: Given a set of sample data, find a BN that fits the data best.
• BN Structure Learning: Given sample data, find the best DAG.
• BN Parameter Learning: Given sample data and a DAG, find the best probability tables.

BN Learning and Local Scores

Local Score Function: f(n,P) = score of node n with P ⊆ V as its in-neighbors (parents).

Sample Data:
  n1   n2   n3   n4   n5   n6   ...
  0    1    0    1    1    0    ...
  1    1    0    1    1    0    ...
  *    1    0    0    *    1    ...
  0    *    1    1    *    1    ...
  1    0    1    1    0    1    ...
  ...

(Figure: a BN on the nodes n1, n2, n3, n4, n5, n6, ...)
Our Combinatorial Model
• BN Structure Learning:
  Input: a set N of nodes and a local score function f (given by explicit listing of nonzero tuples).
  Task: find a DAG D=(N,A) such that the sum of f(n,P_D(n)) over all nodes n is maximum, where P_D(n) denotes the set of parents of n in D.

Poster: An Improved Dynamic Programming Algorithm for Exact Bayesian Network Structure Learning
Sebastian Ordyniak and Stefan Szeider
3rd Workshop on Discrete Optimization in Machine Learning (DISCML'11), Sierra Nevada, Spain, Dec 17, 2011

Exact Bayesian Network Structure Learning
Problem: Given a local score function f over a set of nodes N, find a Bayesian network (BN) on N whose score is maximum with respect to f.

Example: a local score function f
  n    P          f(n,P)
  a    {d}        1
  a    {b,c,d}    0.5
  b    {a,f}      1
  c    {e}        1
  d    ∅          1
  e    {b}        1
  f    {d}        1
  g    {c,d}      1

(Figure: an optimal BN for f on the nodes a, ..., g.)

Exact Bayesian Network Structure Learning is NP-hard.

Heuristic Algorithms: first find an undirected graph, then greedily orient the edges, then do local search to improve the orientation.

Super-structure
Idea: Restrict the search to BNs whose skeleton is contained inside an undirected graph (the super-structure).
• A good super-structure contains an optimal solution.
• A super-structure can be efficiently obtained, e.g., using the tiling algorithm.

(Figure: a super-structure on the nodes a, ..., g.)

Dynamic Programming over a Tree Decomposition
Idea: Use a tree decomposition of a super-structure to find an optimal BN via dynamic programming (Ordyniak and Szeider, UAI 2010).

(Figure: a tree decomposition with the bags {a,b,c,d}, {b,c,d}, {e,b,c}, {f,b,d}, {g,c,d}.)

Method:
• Compactly represent locally optimal (partial) BNs via records.
• Compute the set of all records that represent locally optimal (partial) BNs for each node of the tree decomposition in a bottom-up manner.

Bottleneck: Large Number of Records!

Dynamic Lower Bound Approach
Idea: Ignore records that do not represent optimal BNs.

Method:
• For every record R compute a lower bound and an upper bound for the score of any BN represented by R.
• Ignore all records whose upper bound is below the minimum of the lower bounds over all records seen so far.

Experimental Results
We compared the algorithm without the dynamic lower bound (no LB) to the algorithm with the dynamic lower bound (LB) on the Alarm network.

                    running time (s)      memory usage (MB)
  super-structure   no LB      LB         no LB      LB
  S_SKEL            14         7          337        180
  S_TIL(0.01)       20785      6393       15679      5346
  S_TIL(0.05)       46560      16712      38525      18786
  S_TIL(0.1)        44554      16928      38520      18918

• S_SKEL is the skeleton of the Alarm network.
• S_TIL(α) is a super-structure learned with the tiling algorithm, for the significance level α.

References
• S. Ordyniak, S. Szeider, Algorithms and complexity results for exact Bayesian structure learning, in: P. Grünwald, P. Spirtes (Eds.), Proceedings of UAI 2010, The 26th Conference on Uncertainty in Artificial Intelligence, Catalina Island, California, USA, July 8-11, 2010, AUAI Press, Corvallis, Oregon, 2010.
• I. Tsamardinos, L. Brown, C. Aliferis, The max-min hill-climbing Bayesian network structure learning algorithm, Machine Learning 65 (2006) 31–78.
• E. Perrier, S. Imoto, S. Miyano, Finding optimal Bayesian network given a super-structure, J.
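The combinatorial model stated earlier (choose a parent set for every node so that the result is a DAG and the sum of the local scores is maximum) can be illustrated with a small brute-force sketch. This is not the tree-decomposition dynamic program from the poster; it is an exhaustive search that is only feasible for tiny inputs. The names `LOCAL_SCORES`, `is_acyclic`, `score`, and `best_dag` are hypothetical, the score values are the poster's example table as far as it can be read from the extracted text, and the search assumes that all listed scores are nonnegative, so unlisted parent sets (score 0) never need to be tried.

```python
from itertools import product

# Local score function given by its nonzero tuples: (node, parent set) -> score.
# Assumption: values are read from the poster's example table and are illustrative.
LOCAL_SCORES = {
    ("a", frozenset("d")): 1.0,
    ("a", frozenset("bcd")): 0.5,
    ("b", frozenset("af")): 1.0,
    ("c", frozenset("e")): 1.0,
    ("d", frozenset()): 1.0,
    ("e", frozenset("b")): 1.0,
    ("f", frozenset("d")): 1.0,
    ("g", frozenset("cd")): 1.0,
}
NODES = "abcdefg"


def is_acyclic(parents):
    """Return True if the digraph given as node -> parent set has no directed cycle."""
    remaining = set(parents)
    while remaining:
        # Nodes with no parent left in the remaining graph can be peeled off.
        free = [n for n in remaining if not (parents[n] & remaining)]
        if not free:
            return False  # every remaining node has a remaining parent: a cycle exists
        remaining.difference_update(free)
    return True


def score(parents, f):
    """Sum of f(n, P_D(n)) over all nodes; unlisted tuples score 0."""
    return sum(f.get((n, p), 0.0) for n, p in parents.items())


def best_dag(nodes, f):
    """Exhaustively try every combination of candidate parent sets.

    Candidates per node are the listed parent sets plus the empty set.
    Assumption: listed scores are nonnegative, so this restriction is safe.
    """
    options = {n: {frozenset()} for n in nodes}
    for (n, p) in f:
        options[n].add(p)
    best_value, best_parents = float("-inf"), None
    for choice in product(*(sorted(options[n], key=sorted) for n in nodes)):
        parents = dict(zip(nodes, choice))
        if is_acyclic(parents):
            value = score(parents, f)
            if value > best_value:
                best_value, best_parents = value, parents
    return best_value, best_parents


if __name__ == "__main__":
    value, parents = best_dag(NODES, LOCAL_SCORES)
    print(value, {n: sorted(p) for n, p in parents.items()})
```

Calling `best_dag(NODES, LOCAL_SCORES)` enumerates 96 parent-set combinations for this table. The number of combinations grows exponentially with the number of nodes, which is the motivation for restricting the search with a super-structure and a tree decomposition as described in the poster.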