Big Graph Processing: Partitioning and Aggregated Querying

Total Page:16

File Type:pdf, Size:1020Kb

Big Graph Processing: Partitioning and Aggregated Querying Big Graph Processing : Partitioning and Aggregated Querying Ghizlane Echbarthi To cite this version: Ghizlane Echbarthi. Big Graph Processing : Partitioning and Aggregated Querying. Databases [cs.DB]. Université de Lyon, 2017. English. NNT : 2017LYSE1225. tel-01707153 HAL Id: tel-01707153 https://tel.archives-ouvertes.fr/tel-01707153 Submitted on 12 Feb 2018 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. No d’ordre NNT : 2017LYSE1225 THESEDEDOCTORATDEL’UNIVERSIT` EDELYON´ op´er´ee au sein de l’Universit´e Claude Bernard Lyon 1 Ecole´ Doctorale ED512 Infomath Sp´ecialit´e de doctorat : Informatique Soutenue publiquement le 23/10/2017, par : Ghizlane ECHBARTHI Big Graph Processing: Partitioning and Aggregated Querying Devant le jury compos´ede: Nom Pr´enom, grade/qualit´e, ´etablissement/entreprise Pr´esident(e) Karine Zeitouni, Professeur, Universit´e de Versaille Rapporteure Raphael Couturier, Professeur, Universit´e de Franche-Comt´e Rapporteur Angela Bonifati, Professeur, Universit´e Lyon 1 Examinatrice Mohand Boughanem, Professeur, Universit´e Paul Sabatier Examinateur Hamamache Kheddouci, Professeur, Universit´e Lyon 1 Directeur de th`ese Remerciements Tout d’abord je tiens `a remercier mon directeur de th`ese Hamamache KHEDDOUCI, de m’avoir acceuilli au sein de l’´equipe GOAL, d’avoir constamment veill´e sur la qualit´e de mon travail et de m’avoir apport´e un soutien scientifique et morale d’une grande importance. Mes sinc`eres remerciements vont aux membres de jury, pour avoir accept´e d’´evaluer mon travail et de r´eviser ce manuscrit. Mes remerciements chaleureux vont `a l’ensemble du personnel du laboratoire LIRIS et de l’universit´e de Lyon, et en particulier le personnel administratif que j’ai pu cˆotoyer, Isabelle, Brigitte, Jean-Pierre, Catherine, pour leur efficacit´e et leur bienveillance. Je tiensa ` remercier mon amie Kaoutar Klaye, pour sa fid´elit´e et son soutien intarissable et aussi pour l’inspiration qu’elle me donne pour toujours pers´ev´erer et donner le meilleur de moi-mˆeme. Je remercie ma famille, mon p`ere, ma m`ere, mon fr`ere et sa petite famille, ma soeur et sa petite famille, mes deux soeurs Fati et Meriem d’ˆetre l`a pour moi et de m’apporter toujours un soutien tr`es chaleureux. Je tiensa ` remercier mon mari Ouadie, je lui dois beaucoup dans ce travail de th`ese, je tiens `a le remercier pour sa patience, son ´ecoute, sa disponibilit´e, sa g´en´erosit´eetson pr´ecieux soutien. Ghizlane ii Abstract The advent of ”big data” has tremendous repercussions on a broad range of data related domains, and it has impelled the use of novel techniques that achieve the best tradeoff between computational cost and precision. In the graph theory field, graphs represent a powerful tool for data and problem modeling where all kinds of problems, starting from the very simple to the very complicated, can be effectively formalized. To resolve NP-complete or NP-hard problems in this field, research is being directed to approximation algorithms and heuristic solutions rather than exact solutions, which exhibit a very high computational cost that makes them impossible to use for massive graph datasets. In this thesis, we tackle two main problems: first, the graph partitioning problem is studied in the context of ”big data”, where the focus is put on large graph streaming partitioning. In fact, the graph partitioning is an important and challenging problem when performing computation tasks over large distributed graphs, the reason is that a good partitioning leads to faster computations. We studied and proposed several streaming partitioning models and heuristics, and we studied their performances theoretically and experimentally as well. The second problem that we tackle in this thesis, is the querying of partitioned/dis- tributed graphs. In fact, querying graph datasets proves to be an important task as most of the real-life datasets are represented by networks of labeled entities. In this context, we study the problem of ”aggregated graph search” that aims to answer queries on several graph fragments, and has the task to build a coherent final answer such that it forms an ”approximate matching” to the initial query. We proposed a new method for ”aggregated graph search”, and we studied its performances theoretically and experimentally on different real word graph datasets. Key words: Balanced graph partitioning, Streaming partitioning, Streaming heuristics, Graph querying, Graph matching, Graph similarity metric, Aggregated search. iii R´esum´e Avec l’av`enement du ”big data”, de nombreuses r´epercussions ont eu lieu dans tous les domaines de la technologie de l’information, pr´econisant des solutions innovantes rem- portant le meilleur compromis entre coˆuts et pr´ecision. En th´eorie des graphes, o`ules graphes constituent un support de mod´elisation puissant qui permet de formaliser des probl`emes allant des plus simples aux plus complexes, la recherche pour des probl`emes NP-complet ou NP-difficil se tourne plutˆot vers des solutions approch´ees, mettant ainsi en avant les algorithmes d’approximation et les heuristiques alors que les solutions ex- actes deviennent extrˆemement coˆuteuses et impossible d’utilisation. Nous abordons dans cette th`ese deux probl´ematiques principales: dans un premier temps, le probl`eme du partitionnement des graphes est abord´e d’une perspective ”big data”, o`u les graphes massifs sont partitionn´es en streaming. En effet, le partitionnement de graphe est une tˆache cruciale et importante lors de la mise en place des graphes de donn´ees distribu´ees, pour la principale raison est qu’un bon partitionnement permet d’acc´elerer les calculs sur ces graphes de donn´ees distribu´ees. Dans ce contexte, nous ´etudions et proposons plusieurs mod`eles et heuristiques de partitionnement en streaming et nous ´evaluons leurs performances autant sur le plan th´eorique qu’empirique. Dans un second temps, nous nous int´eressons au requˆetage des graphes distribu´es ou par- titionn´es. En effet, une grande majorit´ededonn´ees est mod´elis´ee sous formes d’entit´es ´etiquet´ees et interconnect´ees, ce qui fait du requˆetage des graphes distribu´es une tˆache importante dans une large liste d’applications. Dans ce cadre, nous ´etudions la probl´ematique de la ”recherche agr´egative dans les graphes” qui a pour but de r´epondre `adesrequˆetes interrogeant plusieurs fragments de graphes et qui se charge de la reconstruction de la r´eponse finale tel que l’on obtient un ”matching approch´e” avec la requˆete initiale. Mots cl´es: Requˆete de graphes, Matching de graphes, Mesure de similarit´edansles graphes, Recherche agr´egative dans les graphes, Partitionnement des graphes, Partition- nement en streaming, Heuristiques de streaming, Partitionnement ´equilibr´e des graphes. iv Contents Remerciements ii Abstract iii R´esum´e iv List of Figures ix List of Tables xi 1 Introduction 1 1.1 Scope of the thesis and contributions ..................... 2 I Massive Graph Partitioning 4 2 Graph Partitioning Problem 6 2.1 Graph Partitioning: Problem Definition and Application .......... 6 2.1.1 Notations ................................ 6 2.1.2 Problem Definition ........................... 7 2.1.2.1 Balanced & Unbalanced Graph Partitioning ....... 7 2.1.2.2 Hardness Results and Approximation ........... 8 2.1.3 Hypergraph Partitioning ........................ 8 2.1.4 Graph Partitioning Applications ................... 9 2.1.4.1 Parallel Processing ...................... 10 2.1.4.2 Complex networks ...................... 10 2.1.4.3 Road networks ........................ 11 2.1.4.4 Image Segmentation ..................... 11 2.1.4.5 VLSI Physical Design .................... 11 2.2 Offline Partitioning ............................... 12 2.2.1 Spectral Partitioning .......................... 12 2.2.2 Multilevel Partitioning ......................... 14 2.2.2.1 Coarsening phase ...................... 15 2.2.2.2 Partitioning phase ...................... 17 2.2.2.3 Uncoarsening and refinement phase ............ 18 2.2.3 Geometric Partitioning ......................... 19 2.2.3.1 Geometric partitioning: Linear Embedding based Graph Partitioning ......................... 20 2.3 Online Partitioning: Streaming Graph Partitioning ............. 21 v Contents vi 2.3.1 Problem Setting & Definition ..................... 21 2.3.1.1 Streaming Model ....................... 21 2.3.1.2 Problem Definition ...................... 23 2.3.2 One Pass Streaming Partitioning ................... 23 2.3.2.1 Linear Deterministic Greedy heuristic ........... 24 2.3.2.2 FENNEL heuristic ...................... 24 2.3.2.3 Fractional Greedy heuristic ................. 25 2.3.3 Restreaming Partitioning ....................... 25 2.3.3.1 Restreaming the one-pass streaming partitioning heurtis- tics .............................. 26 2.4 Chapter Summary ............................... 27 3 Fractional Greedy Heuristic & Partial Restreaming Partitioning 28 3.1 Fractional Greedy: a proposed streaming partitioning heuristic for large graphs .....................................
Recommended publications
  • Error-Tolerant Graph Matching: a Formal Framework and Algorithms
    Error-Tolerant Graph Matching: A Formal Framework and Algorithms H. Bunke Department of Computer Science,University of Bern, Neubr/ickstr. 10, CH-3012 Bern, Switzerland [email protected] Abstract. This paper first reviews some theoretical results in error- tolerant graph matching that were obtained recently. The results in- clude a new metric for error-tolerant graph matching based on maximum common subgraph, a relation between maximum common subgraph and graph edit distance, and the existence of classes of cost functions for error-tolerant graph matching. Then some new optimal algorithms for error-tolerant graph matching are discussed. Under specific conditions, the new algorithms may be significantly more efficient than traditional methods. Keywords: structural pattern recognition, graphs, graph matching, error-tolerant matching, edit distance, maximum common subgraph, cost function 1 Introduction In structural pattern recognition, graphs are widely used for object representa- tion. Typically, parts of a complex object, or pattern, are represented by nodes, and relations between the various parts by edges. Labels and/or attributes for nodes and edges are used to further distinguish between different classes of nodes and edges, or to incorporate numerical information in the symbolic representa- tion. Using graphs to represent both known models from a database and un- known input patterns, the recognition task turns into a graph matching problem. That is, the database is searched for models that are similar to the unknown input graph. Standard algorithms for graph matching include graph isomorphism, sub- graph isomorphism, and maximum common subgraph search [1, 2]. However, in real world applications we can't always expect a perfect match between the input and one of the graphs in the database.
    [Show full text]
  • Typical Snapshots Selection for Shortest Path Query in Dynamic Road Networks
    Typical Snapshots Selection for Shortest Path Query in Dynamic Road Networks Mengxuan Zhang, Lei Li, Wen Hua, Xiaofang Zhou School of Information Technology and Electrical Engineering The University of Queensland, Brisbane, Australia [email protected], fl.li3, [email protected],[email protected] Abstract. Finding the shortest paths in road network is an important query in our life nowadays, and various index structures are constructed to speed up the query answering. However, these indexes can hardly work in real-life scenario because the traffic condition changes dynamically, which makes the pathfinding slower than in the static environment. In order to speed up path query answering in the dynamic road network, we propose a framework to support these indexes. Firstly, we view the dynamic graph as a series of static snapshots. After that, we propose two kinds of methods to select the typical snapshots. The first kind is time-based and it only considers the temporal information. The second category is the graph representation-based, which considers more insights: edge-based that captures the road continuity, and vertex-based that reflects the region traffic fluctuation. Finally, we propose the snapshot matching to find the most similar typical snapshot for the current traffic condition and use its index to answer the query directly. Extensive experiments on real-life road network and traffic conditions validate the effectiveness of our approach. Keywords: Shortest Path, Snapshot Selection, Dynamic Road Network 1 Introduction Shortest path query is a fundamental operation in road network routing and navigation. A road network can be denoted as a directed graph G(V; E) where V is the set of road intersections, and E ⊆ V × V is the set of road segments.
    [Show full text]
  • On the Unification of the Graph Edit Distance and Graph Matching Problems Romain Raveaux
    On the unification of the graph edit distance and graph matching problems Romain Raveaux To cite this version: Romain Raveaux. On the unification of the graph edit distance and graph matching problems. Pattern Recognition Letters, Elsevier, 2021, 145, 10.1016/j.patrec.2021.02.014. hal-03163084v2 HAL Id: hal-03163084 https://hal.archives-ouvertes.fr/hal-03163084v2 Submitted on 19 Mar 2021 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. On the unification of the graph edit distance and graph matching problems Romain Raveaux1 1Universit´ede Tours, Laboratoire d'Informatique Fondamentale et Appliqu´eede Tours (LIFAT - EA 6300), 64 Avenue Jean Portalis, 37000 Tours, France 7th March 2020 Abstract Error-tolerant graph matching gathers an important family of problems. These problems aim at finding correspondences between two graphs while integrating an error model. In the Graph Edit Distance (GED) problem, the insertion/deletion of edges/nodes from one graph to another is explicitly expressed by the error model. At the opposite, the problem commonly referred to as \graph matching" does not explicitly express such operations. For decades, these two problems have split the research community in two separated parts.
    [Show full text]
  • A Binary Linear Programming Formulation of the Graph Edit Distance
    1200 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 28, NO. 8, AUGUST 2006 A Binary Linear Programming Formulation of the Graph Edit Distance Derek Justice and Alfred Hero, Fellow, IEEE Abstract—A binary linear programming formulation of the graph edit distance for unweighted, undirected graphs with vertex attributes is derived and applied to a graph recognition problem. A general formulation for editing graphs is used to derive a graph edit distance that is proven to be a metric, provided the cost function for individual edit operations is a metric. Then, a binary linear program is developed for computing this graph edit distance, and polynomial time methods for determining upper and lower bounds on the solution of the binary program are derived by applying solution methods for standard linear programming and the assignment problem. A recognition problem of comparing a sample input graph to a database of known prototype graphs in the context of a chemical information system is presented as an application of the new method. The costs associated with various edit operations are chosen by using a minimum normalized variance criterion applied to pairwise distances between nearest neighbors in the database of prototypes. The new metric is shown to perform quite well in comparison to existing metrics when applied to a database of chemical graphs. Index Terms—Graph algorithms, similarity measures, structural pattern recognition, graphs and networks, linear programming, continuation (homotopy) methods. æ 1INTRODUCTION TTRIBUTED graphs provide convenient structures for chemicals or medicines [6]. Various techniques have been Arepresenting objects when relational properties are of designed for processing the graphs for structural features and interest.
    [Show full text]
  • 16.6 Graph Partition 541
    16.6 GRAPH PARTITION 541 INPUT OUTPUT 16.6 Graph Partition Input description: A (weighted) graph G =(V,E) and integers k and m. Problem description: Partition the vertices into m roughly equal-sized subsets such that the total edge cost spanning the subsets is at most k. Discussion: Graph partitioning arises in many divide-and-conquer algorithms, which gain their efficiency by breaking problems into equal-sized pieces such that the respective solutions can easily be reassembled. Minimizing the number of edges cut in the partition usually simplifies the task of merging. Graph partition also arises when we need to cluster the vertices into logical components. If edges link “similar” pairs of objects, the clusters remaining after partition should reflect coherent groupings. Large graphs are often partitioned into reasonable-sized pieces to improve data locality or make less cluttered drawings. Finally, graph partition is a critical step in many parallel algorithms. Consider the finite element method, which is used to compute the physical properties (such as stress and heat transfer) of geometric models. Parallelizing such calculations requires partitioning the models into equal-sized pieces whose interface is small. This is a graph-partitioning problem, since the topology of a geometric model is usually represented using a graph. Several different flavors of graph partitioning arise depending on the desired objective function: 542 16. GRAPH PROBLEMS: HARD PROBLEMS Figure 16.1: The maximum cut of a graph • Minimum cut set –Thesmallest set of edges to cut that will disconnect a graph can be efficiently found using network flow or randomized algorithms.
    [Show full text]
  • Bayesian Graph Edit Distance
    This is a repository copy of Bayesian graph edit distance. White Rose Research Online URL for this paper: https://eprints.whiterose.ac.uk/1988/ Article: Myers, R., Hancock, E.R. orcid.org/0000-0003-4496-2028 and Wilson, R.C. (2000) Bayesian graph edit distance. IEEE Transactions on Pattern Analysis and Machine Intelligence. pp. 628-635. ISSN 0162-8828 https://doi.org/10.1109/34.862201 Reuse Items deposited in White Rose Research Online are protected by copyright, with all rights reserved unless indicated otherwise. They may be downloaded and/or printed for private study, or other acts as permitted by national copyright laws. The publisher or other rights holders may allow further reproduction and re-use of the full text version. This is indicated by the licence information on the White Rose Research Online record for the item. Takedown If you consider content in White Rose Research Online to be in breach of UK law, please notify us by emailing [email protected] including the URL of the record and the reason for the withdrawal request. [email protected] https://eprints.whiterose.ac.uk/ 628 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 22, NO. 6, JUNE 2000 Bayesian Graph Edit Distance strings [10]. This idea has been extended to form a basis for comparing trees and graphs on a global level [11], [12], [1], [7]. Richard Myers, Richard C. Wilson, and More recently, the idea of actively editing graphs during the matching process to eliminate relational clutter has proved very Edwin R. Hancock successful [13], [8], [14].
    [Show full text]
  • A Survey of Graph Edit Distance
    Pattern Anal Applic (2010) 13:113–129 DOI 10.1007/s10044-008-0141-y THEORETICAL ADVANCES A survey of graph edit distance Xinbo Gao Æ Bing Xiao Æ Dacheng Tao Æ Xuelong Li Received: 15 November 2007 / Accepted: 16 October 2008 / Published online: 13 January 2009 Ó Springer-Verlag London Limited 2009 Abstract Inexact graph matching has been one of the 1 Originality and contribution significant research foci in the area of pattern analysis. As an important way to measure the similarity between pair- Graph edit distance is an important way to measure the wise graphs error-tolerantly, graph edit distance (GED) is similarity between pairwise graphs error-tolerantly in the base of inexact graph matching. The research advance inexact graph matching and has been widely applied to of GED is surveyed in order to provide a review of the pattern analysis and recognition. However, there is scarcely existing literatures and offer some insights into the studies any survey of GED algorithms up to now, and therefore of GED. Since graphs may be attributed or non-attributed this paper is novel to a certain extent. The contribution of and the definition of costs for edit operations is various, the this paper focuses on the following aspects: existing GED algorithms are categorized according to these On the basis of authors’ effort for studying almost all two factors and described in detail. After these algorithms GED algorithms, authors firstly expatiate on the idea of are analyzed and their limitations are identified, several GED with illustrations in order to make the explanation promising directions for further research are proposed.
    [Show full text]
  • A Metric Learning Approach to Graph Edit Costs for Regression Linlin Jia, Benoit Gaüzère, Florian Yger, Paul Honeine
    A Metric Learning Approach to Graph Edit Costs for Regression Linlin Jia, Benoit Gaüzère, Florian Yger, Paul Honeine To cite this version: Linlin Jia, Benoit Gaüzère, Florian Yger, Paul Honeine. A Metric Learning Approach to Graph Edit Costs for Regression. Proceedings of IAPR Joint International Workshops on Statistical techniques in Pattern Recognition (SPR 2020) and Structural and Syntactic Pattern Recognition (SSPR 2020), Jan 2021, Venise, Italy. hal-03128664 HAL Id: hal-03128664 https://hal-normandie-univ.archives-ouvertes.fr/hal-03128664 Submitted on 2 Feb 2021 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. A Metric Learning Approach to Graph Edit Costs for Regression Linlin Jia1;4, Benoit Ga¨uz`ere1;4, Florian Yger2?, and Paul Honeine3;4 1 LITIS Lab, INSA Rouen Normandie, France 2 LAMSADE, Universit´eParis Dauphine-PSL, France 3 LITIS Lab, Universit´ede Rouen Normandie, France 4 Normandie Universit´e,France Abstract. Graph edit distance (GED) is a widely used dissimilarity measure between graphs. It is a natural metric for comparing graphs and respects the nature of the underlying space, and provides interpretability for operations on graphs. As a key ingredient of the GED, the choice of edit cost functions has a dramatic effect on the GED and therefore the classification or regression performances.
    [Show full text]
  • Theory of Graph Traversal Edit Distance, Extensions, and Applications
    THESIS THEORY OF GRAPH TRAVERSAL EDIT DISTANCE, EXTENSIONS, AND APPLICATIONS Submitted by Ali Ebrahimpour Boroojeny Department of Computer Science In partial fulfillment of the requirements For the Degree of Master of Science Colorado State University Fort Collins, Colorado Spring 2019 Master’s Committee: Advisor: Hamidreza Chitsaz Asa Ben-Hur Zaid Abdo Copyright by Ali Ebrahimpour Boroojeny 2019 All Rights Reserved ABSTRACT THEORY OF GRAPH TRAVERSAL EDIT DISTANCE, EXTENSIONS, AND APPLICATIONS Many problems in applied machine learning deal with graphs (also called networks), including social networks, security, web data mining, protein function prediction, and genome informatics. The kernel paradigm beautifully decouples the learning algorithm from the underlying geometric space, which renders graph kernels important for the aforementioned applications. In this paper, we give a new graph kernel which we call graph traversal edit distance (GTED). We introduce the GTED problem and give the first polynomial time algorithm for it. Informally, the graph traversal edit distance is the minimum edit distance between two strings formed by the edge labels of respective Eulerian traversals of the two graphs. Also, GTED is motivated by and provides the first mathematical formalism for sequence co-assembly and de novo variation detection in bioinformatics. We demonstrate that GTED admits a polynomial time algorithm using a linear program in the graph product space that is guaranteed to yield an integer solution. To the best of our knowledge, this is the first approach to this problem. We also give a linear programming relaxation algorithm for a lower bound on GTED. We use GTED as a graph kernel and evaluate it by computing the accuracy of an SVM classifier on a few datasets in the literature.
    [Show full text]
  • Tensor Spectral Clustering for Partitioning Higher-Order Network Structures
    Tensor Spectral Clustering for Partitioning Higher-order Network Structures Austin R. Benson∗ David F. Gleichy Jure Leskovecz Abstract normalized) number of first-order structures (i.e., edges) that Spectral graph theory-based methods represent an important need to be cut in order to split the graph into two parts. In a class of tools for studying the structure of networks. Spec- similar spirit, a higher-order generalization of spectral clus- tral methods are based on a first-order Markov chain de- tering would try to minimize cutting higher-order structures rived from a random walk on the graph and thus they cannot that involve multiple nodes (e.g., triangles). take advantage of important higher-order network substruc- Incorporating higher-order graph information (that is, tures such as triangles, cycles, and feed-forward loops. Here network motifs/graphlets) into the partitioning process can we propose a Tensor Spectral Clustering (TSC) algorithm significantly improve our understanding of the underlying that allows for modeling higher-order network structures in a network. For example, triangles (three-dimensional network graph partitioning framework. Our TSC algorithm allows the structures involving three nodes) have proven fundamental to user to specify which higher-order network structures (cy- understanding social networks [14, 21] and their community cles, feed-forward loops, etc.) should be preserved by the structure [10, 26, 29]. Most importantly, higher-order spec- network clustering. Higher-order network structures of in- tral clustering would allow for greater modeling flexibility terest are represented using a tensor, which we then partition as the application would drive which higher-order network by developing a multilinear spectral method.
    [Show full text]
  • Graph Edit Networks
    Published as a conference paper at ICLR 2021 GRAPH EDIT NETWORKS Benjamin Paassen Daniele Grattarola The University of Sydney Università della Svizzera italiana [email protected] [email protected] Daniele Zambon Cesare Alippi Università della Svizzera italiana Università della Svizzera italiana [email protected] Politecnico di Milano [email protected] Barbara Hammer Bielefeld University [email protected] ABSTRACT While graph neural networks have made impressive progress in classification and regression, few approaches to date perform time series prediction on graphs, and those that do are mostly limited to edge changes. We suggest that graph edits are a more natural interface for graph-to-graph learning. In particular, graph edits are general enough to describe any graph-to-graph change, not only edge changes; they are sparse, making them easier to understand for humans and more efficient computationally; and they are local, avoiding the need for pooling layers in graph neural networks. In this paper, we propose a novel output layer - the graph edit network - which takes node embeddings as input and generates a sequence of graph edits that transform the input graph to the output graph. We prove that a mapping between the node sets of two graphs is sufficient to construct training data for a graph edit network and that an optimal mapping yields edit scripts that are almost as short as the graph edit distance between the graphs. We further provide a proof-of-concept empirical evaluation on several graph dynamical systems, which are difficult to learn for baselines from the literature.
    [Show full text]
  • Preconditioned Spectral Clustering for Stochastic Block Partition
    Preconditioned Spectral Clustering for Stochastic Block Partition Streaming Graph Challenge (Preliminary version at arXiv.) David Zhuzhunashvili Andrew Knyazev University of Colorado Mitsubishi Electric Research Laboratories (MERL) Boulder, Colorado 201 Broadway, 8th Floor, Cambridge, MA 02139-1955 Email: [email protected] Email: [email protected], WWW: http://www.merl.com/people/knyazev Abstract—Locally Optimal Block Preconditioned Conjugate The graph partitioning problem can be formulated in terms Gradient (LOBPCG) is demonstrated to efficiently solve eigen- of spectral graph theory, e.g., using a spectral decomposition value problems for graph Laplacians that appear in spectral of a graph Laplacian matrix, obtained from a graph adjacency clustering. For static graph partitioning, 10–20 iterations of LOBPCG without preconditioning result in ˜10x error reduction, matrix with non-negative entries that represent positive graph enough to achieve 100% correctness for all Challenge datasets edge weights describing similarities of graph vertices. Most with known truth partitions, e.g., for graphs with 5K/.1M commonly, a multi-way graph partitioning is obtained from (50K/1M) Vertices/Edges in 2 (7) seconds, compared to over approximated “low frequency eigenmodes,” i.e. eigenvectors 5,000 (30,000) seconds needed by the baseline Python code. Our corresponding to the smallest eigenvalues, of the graph Lapla- Python code 100% correctly determines 98 (160) clusters from the Challenge static graphs with 0.5M (2M) vertices in 270 (1,700) cian matrix. Alternatively and, in some cases, e.g., normalized seconds using 10GB (50GB) of memory. Our single-precision cuts, equivalently, one can operate with a properly scaled graph MATLAB code calculates the same clusters at half time and adjacency matrix, turning it into a row-stochastic matrix that memory.
    [Show full text]