Applications of Structural Equivalence to Subgraph Isomorphism on Multichannel Multigraphs

2019 IEEE International Conference on Big Data (Big Data) Applications of Structural Equivalence to Subgraph Isomorphism on Multichannel Multigraphs Thien Nguyen Dominic Yang, Yurun Ge, Hao Li, Andrea L. Bertozzi University of California, Irvine University of California, Los Angeles [email protected] domyang, yurun, lihao0809, bertozzi @math.ucla.edu Abstract—Structural Equivalence refers to the ability to ex- ordering that favors template vertices adjacent to already change two vertices in a graph without changing the structure of matched template vertices and adding pruning rules based on the graph. We provide basic definitions and properties applicable the degrees of vertices. The authors more recently published to the subgraph isomorphism problem. We show examples of structural equivalence that reduce the size of the search tree the VF3 algorithm [15] that further extends this approach. for subgraph isomorphism counting and enumeration, applied In a different approach, Solnon [8] draws upon constraint to multichannel networks. propagation from artificial intelligence, and her algorithm Index Terms—graph matching, multichannel multigraphs, internally stores candidate lists for every template vertex. multiplex network, structural equivalence, graph compression By way of repeated application of filters interspersed into a tree search, her algorithm can effectively prune branches I. INTRODUCTION to enumerate solutions. She expands on this work in [14] to This paper considers subgraph isomorphisms (SI): given a incorporate more powerful filters. world graph and a smaller template graph, the goal is to find Significant work has been done to exploit symmetry to a copy of the template graph within the world graph. This compress graphs and count isomorphisms. TurboIso [10] ex- problem, due to its simple description, finds application in a ploits basic symmetry in the template graph and optimizes the wide variety of areas. The most immediate applications occur matching order based on a selection of candidate regions and in the fields of biology and chemistry where certain networks exploration within those regions. CFL-Match [13] proposes of cells, proteins, or molecules are searched for the presence a match ordering based on a decomposition of the template of specific substructures [6] [7] [11]. Subgraph isomorphism graph into the core, a highly connected subgraph, and a forest has also been used in the field of pattern recognition [3] for which is further decomposed into a forest and leaves. It problems such as handwriting recognition, facial recognition, also exploits the symmetry in the template graph. BoostIso and 2D and 3D image analysis. Another application is social [12] exploits symmetry in the world graph and presents a network analysis [9]. method by which other tree-search-based approaches may be The subgraph isomorphism problem is NP-complete [2], accelerated by using their methodology. suggesting that there is no obvious efficient and broadly applicable algorithm, and that solving subgraph isomorphism Our work is part of the DARPA Modeling Adversial Activ- on graphs is in general computationally intractable. In spite of ity (MAA) program [21]. The data referenced in this paper are this difficulty, significant progress has been made in the de- synthetically generated by three teams of researchers (Pacific velopment of algorithms for detecting subgraph isomorphism. Northwest National Laboratory [17], the GORDIAN (Graph- As in other NP-complete problems, the literature addressing ing Observables from Realistic Distributions in Activity Net- the enumeration of exact subgraph isomorphisms has focused works) project [18], and Ivysys Technologies [16]). In practice, on performing a full tree search of the solution space. The the examples developed by these teams possess symmetry that state-of-the-art algorithms optimize the variable ordering and results in computational complexity when applying standard pruning of branches of the search tree. subgraph isomorphism approaches. For example, consider the The first significant subgraph isomorphism algorithm pro- graph in Fig. 1. Vertices of the same color may be interchanged posed by Ullmann [1] generates candidate vertices from the without changing the structure of the graph, implying that any neighbors of already matched world vertices. The widely used isomorphism can generate many more, simply by swapping VF2 algorithm [4] improves on this by choosing a match vertices. It is our goal to exploit this knowledge to fully determine all possible solutions. This material is based on research sponsored by the Air Force Research This paper builds upon previous work in [19] and develops Laboratory and DARPA under agreement number FA8750-18-2-0066. The U.S. Government is authorized to reproduce and distribute reprints for a theory of equivalence for both the template and world Governmental purposes notwithstanding any copyright notation thereon. The graph in order to accelerate isomorphism identification. We views and conclusions contained herein are those of the authors and should not wish to adequately characterize the larger solution space and be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Air Force Research Laboratory and DARPA present a compact solution representation that can be orders or the U.S. Government. 978-1-7281-0858-2/19/31:00 © 2019IEEE of magnitude smaller than the total count. 1 978-1-7281-0858-2/19/$31.00 ©2019 IEEE 4913 equivalence. Finally, we denote the set of neighbors of a vertex v by N v u V G v; u or u; v E G . Similarly, we define the set of incoming neighbors, Nin u and outgoing neighbors,( ) N= out ∈u .( )S( ) ( ) ∈ ( ) The central topic of interest in this paper is( the) problem of finding all subgraph( ) isomorphisms. Definition 1 (Subgraph isomorphism). Given graphs T V T ;E T and W V W ;E W , and a map f V T V W , we say that f is a subgraph isomorphism= if( it( is) injective( )) and edge-preserving,= ( ( ) that( )) is, for each u; v ∶ E(T ),Ð→ we have( that) f u ; f v E W . In the case of mul- c c tichannel multigraphs, we need AT u; v AW f u (; f v) ∈ in( each) channel c. ( ( ) ( )) ∈ ( ) [ ] ≤ [ ( ) ( )] Related terms to this are subgraph homomorphism which relaxes the injectivity requirement, and the induced subgraph Fig. 1. Example template graph from Ivysys Technologies [16]. Non-gray isomorphism which also requires it to be non-edge preserving vertices of the same color may be interchanged without changing the structure of the graph. (if u; v E T ; f u ; f v E W , and equality in the multichannel multigraph case). We( denote) ∉ ( the) graph( ( )T (above)) ∉ as( the) template graph and II. DEFINITIONS AND TERMINOLOGIES the graph W as the world graph. Any subgraph of W which is isomorphic to T is labeled as a signal of T in W . Given a A graph G is an ordered pair V G ;E G where V G pair of graphs T;W , we say that the pair is satisfiable under is the set of vertices or nodes of a graph, and E G V G the subgraph isomorphism problem if there exists at least one V G is the set of edges. Graphs of( this) form( ) are referred( to) signal of T in W . In order to identify potential signal nodes as simple directed graphs. We only consider directed( ) ⊆ graphs( )× for template vertices, we introduce candidate sets. A candidate in( this) paper, so if u; v E G , this does not necessarily set is the set of world vertices that could correspond to a imply v; u E G . We also consider additional structure template vertex under an isomorphism map. The candidate set in the form of multiedges( ) ∈ (or( parallel) edges) in separate for each template node evolves during steps of the algorithm. channels.( In) order∈ ( to) describe these additional features, given a Formally, we define a candidate set for a vertex v V T to fixed ordering of the vertices v ; : : : ; v , we introduce the 1 TV (G)T be any subset of V W . We denote the candidate set of v as adjacency matrix for a graph A which is a V G by V G C v . We say that C v is a minimum complete∈ candidate( ) matrix where the entry A i; j N denotes the number of set if it contains only( ) w V W for which there exists T T T T edges that start at vertex vi and end at vertex(vj). For graphs( ) some( ) subgraph isomorphism( ) f such that f v w. Having with multiple channels, we[ have] ∈ multiple adjacency matrices introduced appropriate terminologies∈ ( ) and notations, we turn 1 m A ;:::;A , one for each channel describing the adjacency to the main topic of this paper. ( ) = structure in that channel, with the non-superscripted matrix A A1;:::;Am referring to the tuple of all channels. In general, III. STRUCTURAL EQUIVALENCE we use superscripts to denote a graph property for a specific c channel. The notation( AG denotes) the adjacency matrix of a Structural Equivalence is well-known for modeling social graph G in channel c. We may omit G if the graph is clear networks [5]. Below we will give a motivating example and from the context. In this paper, we may also slightly abuse show several facts that can be useful in computing subgraph notation by writing the number of edges going from vertex u isomorphisms. Intuitively, we say that two vertices are struc- to vertex v in channel c as Ac u; v , and the tuple of number of turally equivalent to each other if they can be “swapped” edges across all channels A u; v A1 u; v ;:::;Am u; v . without changing the graph structure. Consider the example Given this, we let u; v E[G ]if A u; v 0;:::; 0 . in Fig. 2. We see that vertices B and C in the template graph We say H V H ;E[ H] = (is a[ subgraph] of[ G ])if and 2 and 3 in the world graph are interchangeable.

Load more