Efficient Frequent Subtree Mining Beyond Forests


Dissertation for the attainment of the doctoral degree (Dr. rer. nat.) of the Faculty of Mathematics and Natural Sciences of the Rheinische Friedrich-Wilhelms-Universität Bonn

Submitted by Pascal Welke from Bonn

Bonn, 2018

Prepared with the approval of the Faculty of Mathematics and Natural Sciences of the Rheinische Friedrich-Wilhelms-Universität Bonn

This thesis was supervised by Dr. Tamás Horváth and Prof. Dr. Stefan Wrobel.
First reviewer: Prof. Dr. Stefan Wrobel
Second reviewer: Prof. Dr. Christian Bauckhage
Date of the doctoral examination: 20 March 2019
Year of publication: 2019

Pascal Welke
Department of Computer Science III
[email protected]

Declaration

I, Pascal Welke, confirm that this work is my own and is expressed in my own words. Any uses made within it of the works of other authors in any form (e.g. ideas, equations, figures, text, tables, programs) are properly acknowledged at the point of their use. A full list of the references employed has been included.

Acknowledgments

This thesis would not have been possible without the support of many people. In particular, I would like to thank my supervisors Tamás Horváth and Stefan Wrobel. Without them, this thesis would not exist. I don’t know many people who have the opportunity to talk to their supervisor as frequently as I did with Tamás. I know even fewer [1] who were additionally invited to several intense weeks of discussions, writing, no Internet, and great food and wine in Nemesvita. I’d like to thank Stefan for the freedom, the generous support of my work, and for creating our motivating working environment.

During the course of my work on this thesis I was surrounded by a great group of people. CAML became KDML, countless coffees were imbibed, discussions were had, and slowly an idea for this thesis manifested.
Thank you, Krisztian Buza, Thomas Gärtner, Laurentiu Ilici, Michael Kamp, Olana Missura, Daniel Paurat, Till Schulz, Florian Seiffarth, Katrin Ullrich, and all the great people at Fraunhofer IAIS. A special thank you also goes to Ionut Andone, Konrad Błaszkiewicz, and Alexander Markowetz for the fruitful distractions.

I’d like to thank Lars Borutzky, Moritz Fürneisen, Tamás Horváth, Michael Kamp, Rajkumar Ramamurthy, Florian Seiffarth, Till Schulz, and Stefan Welke, who have read various draft versions of my thesis and gave valuable feedback, pointed out errors, and asked nasty helpful questions. All remaining mistakes are of course my own responsibility.

Last but not least, I want to thank my parents Daniela and Stefan Welke and my girlfriend Hanna Hünert for everything. You are the best!

[1] All of them were supervised by Tamás.

Abstract

A common paradigm in distance-based learning is to embed the instance space into some appropriately chosen feature space equipped with a metric and to define the dissimilarity between instances by the distance of their images in the feature space. If the instances are graphs, then frequent connected subgraphs are a well-suited pattern language to define such feature spaces. Identifying the set of frequent connected subgraphs and subsequently computing embeddings for graph instances, however, is computationally intractable. As a result, existing frequent subgraph mining algorithms either restrict the structural complexity of the instance graphs or require exponential delay between the output of subsequent patterns. Hence distance-based learners lack an efficient way to operate on arbitrary graph data. To resolve this problem, in this thesis we present a mining system that gives up the demand on the completeness of the pattern set to instead guarantee a polynomial delay between subsequent patterns.
Complementing this, we devise efficient methods to compute the embedding of arbitrary graphs into the Hamming space spanned by our pattern set. As a result, we present a system that allows distance-based learning methods to be applied efficiently to arbitrary graph databases.

To overcome the computational intractability of the mining step, we consider only frequent subtrees for arbitrary graph databases. This restriction alone, however, does not suffice to make the problem tractable. We reduce the mining problem from arbitrary graphs to forests by replacing each graph by a polynomially sized forest obtained from a random sample of its spanning trees. This results in an incomplete mining algorithm. However, we prove that the probability of missing a frequent subtree pattern is low. We show empirically that this is true in practice even for very small forests. As a result, our algorithm is able to mine frequent subtrees in a range of graph databases where state-of-the-art exact frequent subgraph mining systems fail to produce patterns in reasonable time or even at all. Furthermore, the predictive performance of our patterns is comparable to that of exact frequent connected subgraphs, where available.

The above method considers polynomially many spanning trees for the forest, while many graphs have exponentially many spanning trees. The number of patterns found by our mining algorithm can be negatively influenced by this exponential gap. We hence propose a method that can (implicitly) consider forests of exponential size while remaining computationally tractable. This results in a higher recall for our incomplete mining algorithm. Furthermore, the methods extend the known positive results on the tractability of exact frequent subtree mining to a novel class of transaction graphs. We conjecture that the next natural extension of our results to a larger transaction graph class is at least as difficult as deciding whether P = NP.
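
The reduction from a graph to a forest of sampled spanning trees can be illustrated with a minimal sketch. It assumes that a spanning tree is drawn by running Kruskal's algorithm on a randomly shuffled edge list, which is easy to write down but does not sample spanning trees uniformly; this excerpt does not say which sampling procedure the thesis uses. The function names `random_spanning_tree` and `sample_spanning_forest` are hypothetical.

```python
import random


def random_spanning_tree(vertices, edges, rng):
    """Return one spanning tree of a connected graph as a frozenset of edges.

    Assumption: Kruskal's algorithm on a shuffled edge list. This does NOT
    sample spanning trees uniformly at random.
    """
    parent = {v: v for v in vertices}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    shuffled = list(edges)
    rng.shuffle(shuffled)
    tree = set()
    for u, v in shuffled:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two components, so no cycle
            parent[root_u] = root_v
            tree.add((min(u, v), max(u, v)))
    return frozenset(tree)


def sample_spanning_forest(vertices, edges, k, seed=0):
    """Draw k spanning trees and keep the distinct ones as the forest."""
    rng = random.Random(seed)
    return {random_spanning_tree(vertices, edges, rng) for _ in range(k)}


# Usage: a 4-cycle with one chord, represented by at most 5 spanning trees.
vertices = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
forest = sample_spanning_forest(vertices, edges, k=5, seed=1)
```

Storing each tree as a set of normalized edges makes duplicates collapse, so the forest never holds more than k trees, which keeps its size polynomial when k is.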
Regarding the graph embedding step, we apply a strategy similar to that of the mining step. We represent a novel graph by a forest of its spanning trees and decide whether the frequent trees from the mining step are subgraph isomorphic to this forest. As a result, the embedding computation has one-sided error with respect to the exact subgraph isomorphism test but is computationally tractable. Furthermore, we show that we can leverage a partial order on the pattern set. This structure can be used to reduce the runtime of the embedding computation dramatically. For the special case of Jaccard similarity between graph embeddings, a further substantial reduction of runtime can be achieved using min-hashing. The Jaccard distance can be approximated using small sketch vectors that can be computed fast, again using the partial order on the tree patterns.

The world will not end if we do not enumerate completely. (Tamás Horváth)

Contents

1. Introduction
  1.1. A Motivating Experiment
  1.2. Contributions
    1.2.1. Efficient Frequent Subtree Mining
    1.2.2. Fast Computation in Probabilistic Subtree Feature Spaces
  1.3. Outline
  1.4. Previously Published Work
2. Preliminaries
  2.1. Notions and Notation
  2.2. Frequent Connected Subgraph Mining
    2.2.1. A Generic Levelwise Mining Algorithm
    2.2.2. The Computational Complexity of Frequent Subtree Mining
  2.3. Embedding Computation
  2.4. Datasets
3. Related Work
  3.1. Algorithms for the Subgraph Isomorphism Problem
    3.1.1. Embedding Lists and Exponential Algorithms
  3.2. Algorithms for the FCSM Problem
    3.2.1. Frequent Tree Mining Algorithms
    3.2.2. Frequent Subgraph Mining Algorithms
    3.2.3. Algorithms for Relaxed Problems
4. Probabilistic Frequent Subtrees
  4.1. Mining Probabilistic Frequent Subtrees
    4.1.1. The Relaxed Frequent Subtree Mining Problem
    4.1.2. Probabilistic Bounds and the Importance of Subtrees
    4.1.3. Implementation Issues and Runtime Analysis
  4.2. Experimental Evaluation
    4.2.1. Runtime
    4.2.2. Recall
    4.2.3. Stability of Probabilistic Subtree Patterns
    4.2.4. Predictive Performance
  4.3. Summary
5. Boosted Probabilistic Frequent Subtrees
  5.1. An Efficient Embedding Operator for Trees
  5.2. Mining Boosted Probabilistic Frequent Subtrees
    5.2.1. Implementation Issues
    5.2.2. Experimental Evaluation
  5.3. Exact Frequent Subtree Mining on Locally Easy Graphs
  5.4. Summary and Open Questions
6. Fast Computation in Probabilistic Subtree Feature Spaces
  6.1. Complete Embeddings into Subtree Feature Spaces
  6.2. Min-Hashing in Subtree Feature Spaces
  6.3. Experimental Evaluation
    6.3.1. Efficiency Gains
    6.3.2. Predictive and Retrieval Performance
  6.4. Summary and Open Questions
7. Conclusion
  7.1. Discussion
  7.2. Outlook
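
The min-hashing step mentioned in the abstract can be illustrated with a generic sketch. It assumes that each graph embedding is given as the set of indices of the patterns it contains, and it uses explicit random permutations of the pattern indices. For a random permutation, two sets have the same minimum rank with probability equal to their Jaccard similarity, so the fraction of agreeing sketch positions estimates that similarity. The function names are hypothetical, and the sketch does not exploit the partial order on tree patterns that the thesis uses to speed up the computation.

```python
import random


def random_permutations(num_patterns, sketch_size, seed=0):
    """Create sketch_size random rankings of the pattern indices."""
    rng = random.Random(seed)
    permutations = []
    for _ in range(sketch_size):
        ranks = list(range(num_patterns))
        rng.shuffle(ranks)
        permutations.append(ranks)
    return permutations


def minhash_sketch(pattern_ids, permutations):
    """For each permutation, keep the smallest rank among the given patterns.

    An empty pattern set gets the sentinel rank len(perm) in every position.
    """
    return [
        min((perm[p] for p in pattern_ids), default=len(perm))
        for perm in permutations
    ]


def estimate_jaccard(sketch_a, sketch_b):
    """Fraction of sketch positions on which the two sketches agree."""
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


# Usage: two graphs that share two of six patterns (true Jaccard 2/6).
perms = random_permutations(num_patterns=10, sketch_size=64, seed=0)
sketch_a = minhash_sketch({0, 1, 2, 3}, perms)
sketch_b = minhash_sketch({2, 3, 4, 5}, perms)
similarity = estimate_jaccard(sketch_a, sketch_b)
distance = 1.0 - similarity  # estimated Jaccard distance
```

A longer sketch gives a more accurate estimate at the cost of more permutations; the sketch length of 64 here is an arbitrary choice for illustration.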