On the Unification of the Graph Edit Distance and Graph Matching Problems Romain Raveaux
Total Page:16
File Type:pdf, Size:1020Kb
On the unification of the graph edit distance and graph matching problems Romain Raveaux To cite this version: Romain Raveaux. On the unification of the graph edit distance and graph matching problems. Pattern Recognition Letters, Elsevier, 2021, 145, 10.1016/j.patrec.2021.02.014. hal-03163084v2 HAL Id: hal-03163084 https://hal.archives-ouvertes.fr/hal-03163084v2 Submitted on 19 Mar 2021 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. On the unification of the graph edit distance and graph matching problems Romain Raveaux1 1Universit´ede Tours, Laboratoire d'Informatique Fondamentale et Appliqu´eede Tours (LIFAT - EA 6300), 64 Avenue Jean Portalis, 37000 Tours, France 7th March 2020 Abstract Error-tolerant graph matching gathers an important family of problems. These problems aim at finding correspondences between two graphs while integrating an error model. In the Graph Edit Distance (GED) problem, the insertion/deletion of edges/nodes from one graph to another is explicitly expressed by the error model. At the opposite, the problem commonly referred to as \graph matching" does not explicitly express such operations. For decades, these two problems have split the research community in two separated parts. It resulted in the design of different solvers for the two problems. In this paper, we propose a unification of both problems thanks to a single model. We give the proof that the two problems are equivalent under a reformulation of the error models. This unification makes possible the use on both problems of existing solving methods from the two communities. Keywords| Graph edit distance, graph matching, error-correcting graph matching, discrete optimiza- tion Graphical Abstract 1 1 Introduction then Yi;k = 1, and Yi;k = 0 otherwise. We denote by y 2 f0; 1gn1:n2, a column-wise vectorized replica Graphs are frequently used in various fields of com- of Y . With this notation, graph matching problems puter science, since they constitute a universal mod- can be expressed as finding the assignment vector ∗ eling tool which allows the description of structured y that maximizes a score function S(G1;G2; y) as data. The handled objects and their relations are follows: described in a single and human-readable formal- ism. Hence, tools for graphs supervised classification Model 1. Graph matching model (GMM) and graph mining are required in many applications ∗ such as pattern recognition (Riesen, 2015), chemi- y =argmax S(G1;G2; y) (1a) cal components analysis, transfer learning (Das and y Lee, 2018). In such applications, comparing graphs subject to yi;k 2 f0; 1g 8(ui; vk) 2 V1 × V2 (1b) is of first interest. The similarity or dissimilarity be- X yi;k ≤ 1 8vk 2 V2 (1c) tween two graphs requires the computation and the ui2V1 evaluation of the \best" matching between them. X Since exact isomorphism rarely occurs in pattern yi;k ≤ 1 8ui 2 V1 (1d) analysis applications, the matching process must be vk2V2 error-tolerant, i.e., it must tolerate differences on the where equations (1c),(1d) induces the matching topology and/or its labeling. The Graph Edit Dis- constraints, thus making y an assignment vector. tance (GED)(Riesen, 2015) problem and the Graph The function S(G ;G ; y) measures the similarity Matching problem (GM) (Swoboda et al., 2017) pro- 1 2 of graph attributes, and is typically decomposed into vide two different error models. These two problems a first order similarity function s(u ! v ) for a have been deeply studied but they have split the re- i k node pair u 2 V and v 2 V , and a second-order search community into two groups of people devel- i 1 k 2 similarity function s(e ! e ) for an edge pair e 2 oping separately quite different methods. ij kl ij E and e 2 E . Thus, the objective function of In this paper, we propose to unify the GED prob- 1 kl 2 graph matching is defined as: lem and the GM problem in order to unify the work force in terms of methods and benchmarks. We show X X S(G ;G ; y) = s(u ! v ) · y that the GED problem can be equivalent to the GM 1 2 i k i;k u 2V v 2V problem under certain (permissive) conditions. The i 1 k 2 X X paper is organized as follows: Section 2, we give the + s(eij ! ekl) · yik · yjl definitions of the problems. Section 3, the state of eij 2E1 ekl2E2 the art on GM and GED is presented along with the (2) literature comparing GED and GM to other prob- In essence, the score accumulates all the similarity lems. Section 4, a specific related work is detailed values that are relevant to the assignment. The GM since it is the basement of our reasoning. Section problem has been proven to be NP-hard by (Garey 5, our proposal is described and a proof is given. and Johnson, 1979). Section 6, experimental results are presented to val- idate empirically our proposal. Finally, conclusions are drawn. 2.2 Graph Edit Distance The graph edit distance (GED) was first reported in (Tsai et al., 1979). GED is a dissimilarity mea- 2 Definitions and problems sure for graphs that represents the minimum-cost sequence of basic editing operations to transform a In this section, we define the problems to be studied. graph into another graph by means classically in- An attributed graph is considered as a set of 4 tuples cluded operations: insertion, deletion and substitu- (V ,E,µ,ζ) such that: G = (V ,E,µ,ζ). V is a set of tion of vertices and/or edges. Therefore, GED can vertices. E is a set of edges such as E ⊆ V × V . µ be formally represented by the minimum cost edit is a vertex labeling function which associates a label path transforming one graph into another. Edge to a vertex. ζ is an edge labeling function which operations are taken into account in the matching associates a label to an edge. process when substituting, deleting or inserting their adjacent vertices. From now on and for simplicity, 2.1 Graph matching problem we denote the substitution of two vertices ui and vk by (ui ! vk), the deletion of vertex ui by (ui ! ) The objective of graph matching is to find correspon- and the insertion of vertex vk by ( ! vk). Like- dences between two attributed graphs G1 and G2.A wise for edges eij and ekl,(eij ! ekl) denotes edges solution of graph matching is defined as a subset of substitution, (eij ! ) and ( ! ekl) denote edges possible correspondences Y ⊆ V1 × V2, represented deletion and insertion, respectively. by a binary assignment matrix Y 2 f0; 1gn1×n2, An edit path (λ) is a set of edit operations o. where n1 and n2 denote the number of nodes in G1 This set is referred to as Edit Path and it is defined and G2, respectively. If ui 2 V1 matches vk 2 V2, in Definition1. 2 Definition 1. Edit Path Property 1. The edges matching are driven by the A set λ = fo1; ··· ; okg of k edit operations o that vertices matching transform G1 completely into G2 is called an (com- Assuming that constraint defined in Definition4 plete) edit path. is satisfied then three cases can appear : Case 1: If there is an edge eij = (ui; uj ) 2 E1 and an Let c(o) be the cost function measuring the edge ekl = (vk; vl) 2 E2, edges substitution between strength of an edit operation o. Let Γ(G1;G2) be (ui; uj ) and (vk; vl) is performed (i.e., (eij ! ekl)). the set of all possible edit paths (λ). The graph edit Case 2: If there is an edge eij = (ui; uj ) 2 E1 and distance problem is defined by Problem1. there is no edge between vk and vl then an edge dele- tion of (ui; uj ) is performed (i.e., (eij ! )). Problem 1. Graph Edit Distance Case 3: If there is no edge between ui and uj Let G1 = (V1,E1,µ1,ζ1) and G2 = (V2,E2,µ2,ζ2) be and there is an edge between and an edge ekl = two graphs, the graph edit distance between G1 and (vk; vl) 2 E2 then an edge insertion of (vk; vl) is G2 is defined as: performed (i.e., ( ! ekl)). X dmin(G1;G2) = min c(o) (3) The GED problem defined in Problem1 and re- λ2Γ(G1;G2) o2λ fined with constraints defined in Definitions2,3 and 4 is referred in the literature and in this paper as the GED problem. The GED problem has been proven The GED problem is a minimization problem and to be NP-hard by (Zeng et al., 2009). dmin is the best distance. In its general form, the GED problem (Problem1) is very versatile. The problem has to be refined to cope with the con- 2.3 Related problems and models straints of an assignment problem. First, let us de- fine constraints on edit operations (oi) in Definition GED and GM problems fall into the family of 2. error-tolerant graph matching problems. GED and GM problems can be equivalent to another prob- Definition 2. Edit operations constraints lem called Quadratic Assignment Problem (QAP) (Bougleux et al., 2017; Cho et al., 2013).