IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 28, NO. 8, AUGUST 2006
A Binary Linear Programming Formulation of the Graph Edit Distance
Derek Justice and Alfred Hero, Fellow, IEEE
Abstract—A binary linear programming formulation of the graph edit distance for unweighted, undirected graphs with vertex attributes is derived and applied to a graph recognition problem. A general formulation for editing graphs is used to derive a graph edit distance that is proven to be a metric, provided the cost function for individual edit operations is a metric. Then, a binary linear program is developed for computing this graph edit distance, and polynomial time methods for determining upper and lower bounds on the solution of the binary program are derived by applying solution methods for standard linear programming and the assignment problem. A recognition problem of comparing a sample input graph to a database of known prototype graphs in the context of a chemical information system is presented as an application of the new method. The costs associated with various edit operations are chosen by using a minimum normalized variance criterion applied to pairwise distances between nearest neighbors in the database of prototypes. The new metric is shown to perform quite well in comparison to existing metrics when applied to a database of chemical graphs.
Index Terms—Graph algorithms, similarity measures, structural pattern recognition, graphs and networks, linear programming, continuation (homotopy) methods.
The authors are with the Department of Electrical Engineering and Computer Science, University of Michigan, 1301 Beal Avenue, Ann Arbor, MI 48109-2122. E-mail: {justiced, hero}@umich.edu.
Manuscript received 5 Jan. 2005; revised 14 Oct. 2005; accepted 3 Jan. 2006; published online 13 June 2006. Recommended for acceptance by A. Rangarajan.
For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TPAMI-0014-0105.
0162-8828/06/$20.00 © 2006 IEEE. Published by the IEEE Computer Society.

1 INTRODUCTION

ATTRIBUTED graphs provide convenient structures for representing objects when relational properties are of interest. Such representations are frequently useful in applications ranging from computer-aided drug design to machine vision. A familiar machine vision problem is to recognize specific objects within an image. In this case, the image is processed to generate a representative graph based on structural characteristics, such as a region adjacency graph or a line adjacency graph, and vertex attributes may be assigned according to characteristics of the region to which each vertex corresponds [1]. This representative graph is then compared to a database of prototype or model graphs in order to identify and classify the object of interest. Face identification [2] and symbol recognition [3] are among the problems in machine vision where graphs have been utilized recently. In this context, a reliable and speedy method for comparing graphs is important. Many heuristics and simplifications have been developed and employed for this purpose in a variety of applications [4].

Comparing graphs in the context of graph database searching has also found a significant application in the pharmaceutical and agrochemical industries. The attributed graphs of interest are so-called chemical graphs which are derived from chemical structure diagrams. Graphical representations are of great utility here because of the similar property principle, which states that molecules with similar structures will exhibit similar chemical properties [5]. Thus, databases of these chemical graphs are often searched by comparing with a query graph to aid in the design of new chemicals or medicines [6]. Various techniques have been designed for processing the graphs for structural features and generating bit strings (referred to as fingerprints) based on the presence or absence of such features [7]. Since the fingerprints can be rapidly compared, some prescreening or clustering is often done based on these to eliminate all but the most similar graphs to a given input graph. The remaining graphs may then be compared to the query using a more sophisticated (and more computationally demanding) method. Distance metrics based on the maximum common subgraph are frequently used in this role [8], [9].

Although computing the maximum common subgraph (MCS) is no small task (indeed, it is an NP-Hard problem [10]), several graph distance metrics have been proposed that use the size of the MCS. One such metric is given in [11], with a slight modification presented in [12]. A different MCS-based metric is presented in [13] specifically for application to chemical graphs. An alternate metric that uses the MCS along with the minimum common supergraph has also been proposed [14]. For related structures, such as attributed trees, it is often possible to derive distance metrics based on the maximum common substructure that operate in polynomial time [15]. An intimate relationship between graph comparison and graph (or subgraph) isomorphism is readily apparent in these examples because the MCS defines subgraphs in the two graphs being compared that are isomorphic. Indeed, computing a graph distance metric often requires the computation of some sort of isomorphism (also known as matching) between graphs.

An exact graph isomorphism defines a mapping between the nodes of two attributed graphs so that their structures (vertex attributes along with edges) exactly coincide. As one might expect, this is also a challenging computational problem, in general, although it has not been shown to be NP-Complete [10]. As with subgraph isomorphism, polynomial time algorithms are available for certain restricted classes of graphs [16]. Algorithms for general graph isomorphism that are shown to be quite speedy in practice are given in [17], [18]. Such algorithms typically take advantage of vertex attributes to efficiently prune a search tree constructed for finding an isomorphism. Graphs obtained from real objects are rarely isomorphic, however, so it is useful to consider inexact or error-correcting graph isomorphisms (ECGI) that allow for graphs to nearly (but not exactly) coincide [19]. As the name suggests, the lack of exact isomorphism can be caused by measurement noise or errors in a sample graph when compared to a model graph. On the other hand, when comparing two model graphs, one might interpret such "errors" as capturing the essential differences between the two graphs.

Error-correcting graph matching attempts to compute a mapping between the vertices of two graphs so that they approximately coincide, realizing that the graphs may not be isomorphic. Many suboptimal approaches exist to tackle this problem [20], [21], [22], [23]. The adjacency matrix eigendecomposition approach of [21] gives fast suboptimal results; however, it is only applicable to graphs having adjacency matrices with no repeated eigenvalues. Graphs with a low degree of connectivity will often have adjacency matrices with multiple zero eigenvalues. Heuristics are used in the graduated assignment type methods of [22], [23] to significantly reduce the exponential complexity of the original problem. These methods can be applied to very large graphs; however, they require several tuning parameters to which the performance of the algorithm is quite sensitive. Unfortunately, no systematic method for choosing these parameters is provided. The linear programming approach of [20] gives good results in a reasonable amount of time for graphs having the same number of vertices. The authors of [20] use the linear program to minimize a matrix norm similarity metric.

The graph edit distance (GED) is a convenient and logical graph distance metric that arises naturally in the context of error-correcting graph matching [19], [24], [25]. It can also be viewed as an extension of the string edit distance [26]. The basic idea is to define graph edit operations such as insertion or deletion of a node/vertex or relabeling of a vertex along with costs associated with each operation. The graph edit distance between two graphs is then just the cost associated with the least costly series of edit operations needed to make the two graphs isomorphic. The optimal error-correcting graph isomorphism can be defined as the resulting isomorphism after performing this optimal series of edits. Furthermore, it has been shown that the optimal ECGI under a certain graph edit cost function will find the MCS [24]. Enumeration procedures for computing such optimal matchings have been proposed [27], [28], [19]. These procedures are applicable for only small graphs. In [29], [30], [31], probabilistic models of the edit operations are proposed and used to develop MAP estimates of the optimal ECGI. It is not clear in all applications, however, what is the appropriate model to use for the edit probabilities. As with previous metrics, efficient algorithms have been developed for computing edit distances on trees with certain structures [32], [33], [34].

The graph edit distance is parameterized by a set of edit costs. The flexibility provided by these costs can be very useful in the context of the standard recognition problem described earlier of matching a sample input graph to a database of known prototype graphs [35]. If chosen appropriately, the costs can capture the essential features that characterize differences among the prototype graphs. Recently, methods for choosing these costs that are best from a recognition point of view have been presented. In [36], the EM algorithm is applied to assumed Gaussian mixture models for edit events in order to choose costs that enforce similarity (or dissimilarity) between specific pairs of graphs in a training set. An application for matching images based on their shock graphs [37] uses the tree edit distance algorithm in [34] and chooses edit costs based on local shape differences within shock graphs corresponding to similar images. In a chemical graph recognition application, heuristics are used to choose the edit costs of a string edit distance between strings formed from the maximal paths between vertices in the graphs [38]. Related studies have also been done into the effectiveness of weighting the presence or absence of certain substructures differently when comparing fingerprints derived from chemical graphs [39].

In this paper, we provide a formulation of the graph edit distance whereby error-correcting graph matching may be performed by solving a binary linear program (BLP, that is, a linear program where all variables must take values from the set $\{0, 1\}$). We first present a general framework for computing the GED between attributed, unweighted graphs by treating them as subgraphs of a larger graph referred to as the edit grid. It is argued that the edit grid need only have as many vertices as the sum of the total number of vertices in the graphs being compared. We show that graph editing is equivalent to altering the state of the edit grid and prove that the GED derived in this way is a metric, provided the cost function for individual edit operations is a metric. We then use the adjacency matrix representation to formulate a binary linear program to solve for the GED. Since solving a BLP is NP-Hard [10], we show how to obtain upper and lower bounds for the GED in polynomial time by using solution techniques for standard linear programming and the assignment problem [40]. These bounds may be useful in the event that the problem is so large that solving the BLP is impractical.

We also present a recognition problem [35] that demonstrates the utility of the new method in the context of a chemical information system. Suppose there is a database of prototype chemical graphs to which a sample graph is to be compared as described earlier. The experiment proceeds in two stages: edit cost selection followed by recognition of a perturbed prototype graph via a minimum distance classifier. We provide a method for choosing the edit costs that is purely nonparametric and is based on the assumption (or prior information) that the graphs in the database should be uniformly distributed. The edit costs are chosen as those that minimize the normalized variance of pairwise distances between nearest neighbor prototypes, thereby uniformly distributing them in the metric space of graphs defined by the GED. Note that a metric which uniformly distributes nearest neighbors in the database essentially equalizes the probability of classification error with a minimum distance classifier, thereby minimizing the worst case error. This method is similar to the use of spherical packings for error-correcting code design, where the distances between all nearest-neighbor code points are the same [41]. Also, providing such homogeneous sets of graphs is desirable in chemical applications for certain structure-activity experiments [6]. This computation involves matching all pairs of prototypes in a neighborhood and tabulating the edits between them. These are provided as inputs to a single convex program to solve for the optimal edit costs. This is one
possible method for choosing edit costs; other methods might certainly be concocted to accommodate whatever prior information about the data at hand is available.

We test our algorithm on a database of 135 chemical graphs derived from a set of similar biochemical molecules [42]. Our GED metric is compared with the MCS-based metrics proposed in [13], [11]. Indeed, the similarity of molecules in this database indicates it is a good candidate for our method of edit cost selection. We first compute the optimal edit costs as previously described and show that our metric equipped with these costs more uniformly distributes the prototype graphs than either MCS metric. The recognition problem is investigated next by generating sample graphs through random perturbations on the prototype graphs; thus, we consider a scenario where the ECGI is used to "fix" errors between sample and model graphs. Each sample graph is matched to every prototype in the database in an effort to recognize which prototype was perturbed to create the sample graph and a classification ambiguity index is computed. The GED metric is found to perform better with respect to certain types of edits and worse with respect to others than the MCS metrics. However, when random edits are applied, the GED typically performs better.

This paper is organized as follows: Section 2 presents the bulk of the theory. Within Section 2, we first present the general framework for computing the GED and prove that it results in a metric, provided the edit cost function is a metric. This is followed by the development of the binary linear program to compute the graph edit distance along with a description of polynomial-time solutions for upper and lower bounds on the GED. Finally, a description of edit cost selection for a graph recognition problem is given and it is shown that the resulting problem is a convex program. Section 3 presents the results of the graph recognition problem applied to a database of chemical graphs and Section 4 provides some concluding remarks.

2 THEORY

We first introduce a framework for edits on the set of unweighted, undirected graphs with vertex attributes based on the relabeling of graph elements (vertices and edges). Suppose we wish to find the graph edit distance between graphs $G_0$ and $G_1$. The graph to be edited $G_0$ is embedded in a labeled complete graph $G_\Omega$ referred to as the "edit grid." Vertices and edges in $G_\Omega$ may possess the special null label indicating the element is not part of the embedded graph; such an element is referred to as "virtual" and allows for insertion and deletion edits by simply swapping a null label for a nonnull label or vice versa. The state of the edit grid is altered by relabeling its elements until the graph $G_1$ surfaces somewhere on the grid. Assuming a cost metric on the set of labels (including the null label), we prove the existence of a set of graph edits with minimum cost that occur in one transition of the edit grid state. We also show that the graph edit distance implied by this cost is a metric on the set of graphs.

Next, we consider the adjacency matrix representation in order to develop the binary linear programming formulation of the graph edit optimization as a practical implementation of the general framework. We show how to use this formulation to obtain upper and lower bounds on the graph edit distance in polynomial time.

Finally, we offer a minimum variance method for choosing a cost metric. This metric is appropriate for a graph recognition problem wherein an input graph is compared to a database of prototypes. It is based on the assumption that the prototype graphs should be roughly uniformly distributed in the metric space described by the graph edit distance.

2.1 Editing Graphs and the Graph Edit Distance

Let $G_0(V_0, E_0, l_0)$ be an undirected graph to be edited where $V_0$ is a finite set of vertices, $E_0 \subseteq V_0 \times V_0$ is a set of unweighted edges, and $l_0 : V_0 \to \Sigma$ is a labeling function that assigns a label from the alphabet $\Sigma$ to each vertex. We assume there is at most one edge between any pair of vertices. The vertex labels in $\Sigma$ capture the attribute information. We define the label $\phi \notin \Sigma$ as a special vertex label whose purpose will be introduced shortly. These assumptions are made implicitly for every graph in this paper so that we need not mention them again. Some example graphs are shown in Fig. 1.

Fig. 1. Example undirected unweighted graphs with vertex attributes. The attribute alphabet $\Sigma$ contains three labels. (a) $G_0(V_0, E_0, l_0)$ and (b) $G_1(V_1, E_1, l_1)$.

Let $\Omega = \{\omega_i\}_{i=1}^{N}$ denote a set of vertices; accordingly, $\Omega \times \Omega$ is the set of undirected edges connecting all pairs of vertices in $\Omega$. We refer to the complete graph $G_\Omega(\Omega, \Omega \times \Omega, l_\Omega)$ as the edit grid. $N$, the number of vertices in the edit grid, may be as large as necessary. We will argue later that for computing the graph edit distance between $G_0(V_0, E_0, l_0)$ and $G_1(V_1, E_1, l_1)$, $N$ needs to be no larger than $|V_0| + |V_1|$. An example edit grid with five vertices is shown in Fig. 2.

Fig. 2. Example edit grid $G_\Omega(\Omega, \Omega \times \Omega, l_\Omega)$ with five vertices.

For the purposes of editing, we let the graph $G_0(V_0, E_0, l_0)$ be situated on the edit grid, i.e., $V_0 \subseteq \Omega$ and $E_0 \subseteq \Omega \times \Omega$; equivalently, $G_0$ is a subgraph of $G_\Omega$. Vertices in $V_0$ are assigned the appropriate label from $\Sigma$ determined by the labeling function $l_0$, that is, $l_\Omega(\omega_i) = l_0(v_i)$ for all $\omega_i \in V_0$. Vertices in $\Omega - V_0$ are assigned the vertex null label $\phi$ so $l_\Omega(\omega_i) = \phi$ for all $\omega_i \in \Omega - V_0$. The null label indicates a virtual vertex that may be made "real" during editing by changing its label to something in $\Sigma$. Since edges are unweighted, they take labels from the set $\{0, 1\}$, where 1 indicates a real edge and 0 (the edge null label) indicates a virtual edge.
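The embedding described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `edit_grid_state`, the string `"phi"` standing in for the vertex null label, and the labels `"a"`, `"b"`, `"c"` are hypothetical and do not come from the paper. The graph is placed on the first vertices of the grid, every remaining grid vertex receives the null label, and every pair of grid vertices receives an edge label of 1 (real) or 0 (virtual).

```python
from itertools import combinations

NULL = "phi"  # hypothetical stand-in for the vertex null label


def edit_grid_state(vertex_labels, edges, n_grid):
    """Place a graph on the first len(vertex_labels) vertices of an
    n_grid-vertex edit grid and return (vertex_state, edge_state)."""
    n = len(vertex_labels)
    if n_grid < n:
        raise ValueError("edit grid is too small for the graph")
    # Real vertices keep their labels; the remaining grid vertices are virtual.
    vertex_state = list(vertex_labels) + [NULL] * (n_grid - n)
    real = {frozenset(e) for e in edges}
    # Every unordered pair of grid vertices is an edge of the complete graph:
    # label 1 if the edge is real, 0 (the edge null label) if it is virtual.
    edge_state = {
        (i, j): int(frozenset((i, j)) in real)
        for i, j in combinations(range(n_grid), 2)
    }
    return vertex_state, edge_state


# Hypothetical triangle with made-up labels on a five-vertex grid.
vertices, edge_labels = edit_grid_state(
    ["a", "b", "c"], [(0, 1), (0, 2), (1, 2)], 5
)
```

In this sketch, inserting or deleting a graph element amounts to overwriting one entry of `vertex_state` or `edge_state`, which is the relabeling view of editing that the section adopts.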
Fig. 3. Isomorphisms of the graph $G_0$ on the edit grid $G_\Omega$. Vertex labels are noted; dotted lines indicate virtual edges (label 0) while solid lines indicate real edges (label 1). The vertex numbering in Fig. 2 is used, therefore the standard placement is shown in (a).
Accordingly, $l_\Omega(\omega_i, \omega_j) = 1$ for all edges $(\omega_i, \omega_j) \in E_0$ and $l_\Omega(\omega_i, \omega_j) = 0$ for all edges $(\omega_i, \omega_j) \in (\Omega \times \Omega) - E_0$. When the graph $G_0$ is placed on the first $|V_0|$ vertices of $G_\Omega$ (i.e., $\omega_i = v_i$ for $i = 1, 2, \ldots, |V_0|$), we refer to this as the standard placement. Some placements of the graph $G_0$ from Fig. 1 on the edit grid of Fig. 2 are shown in Fig. 3. These are clearly isomorphisms of $G_0$ on the edit grid.

Here, it is appropriate to provide a quick note on indexing notation used throughout this paper. Superscript indices index elements within a vector while subscript indices index the entire vector (such as when it occurs in a sequence). For example, $x_5^{2}$ refers to the second element in the $x_5$ vector (fifth vector in a sequence $\{x_k\}$). Similar indexing schemes are adopted for matrices: $A_1^{34}$ refers to the $(3, 4)$ element in matrix $A_1$. Also, a single superscript index on a matrix indexes the entire row, so that $A_1^{3}$ denotes the third row of matrix $A_1$.

Let $\eta \in (\Sigma \cup \phi)^{N} \times \{0, 1\}^{\frac{1}{2}(N^2 - N)}$ denote the state vector of the edit grid. We assign an ordering to the graph elements (vertices and edges) of the edit grid so that the $i$th element of $\eta$ contains the label of the $i$th element of the edit grid (i.e., $\eta^{i} = l_\Omega(\epsilon^{i})$ for $\epsilon^{i} \in \Omega \cup (\Omega \times \Omega)$). For example, the element orderings and state vectors for the graphs in Fig. 3 are shown in Table 1.

TABLE 1
Element Orderings, State Vectors, and Corresponding State Vector Permutations for the Isomorphisms of $G_0$ on the Edit Grid as Shown in Fig. 3
The vector of edit grid graph elements is denoted by $\epsilon$, the state vectors are denoted by $\eta_A$, $\eta_B$, and $\eta_C$, and the state vector permutations are denoted by $\pi_A$, $\pi_B$, and $\pi_C$ for the corresponding isomorphism in Fig. 3. Note that the numbering of the edit grid vertices $\omega_i$ shown in Fig. 2 is used.

We perform a finite sequence of graph edits to transform the graph $G_0(V_0, E_0, l_0)$ situated on the edit grid $G_\Omega(\Omega, \Omega \times \Omega, l_\Omega)$ into the graph $G_1(V_1, E_1, l_1)$ (such that $V_1 \subseteq \Omega$ and $E_1 \subseteq \Omega \times \Omega$). Vertex edits consist of insertion, deletion, or relabeling to some other symbol in $\Sigma$. Edge edits consist of insertion or deletion. Using the null labels introduced above, we may interpret all graph edits as the relabeling of real and virtual elements. For example, changing the label of a virtual edge from 0 to 1 corresponds to the insertion of that edge into the graph $G(V, E, l)$. Similarly, relabeling an edge from 1 to 0 amounts to deleting that edge. Vertex insertion or deletion is a bit more complex in that it also typically involves edge edits; however, there is a natural decomposition of the vertex edit that is consistent with this framework. Consider a vertex deletion whereby a vertex is removed from the graph along with all edges adjacent to that vertex. We may delete the vertex by changing its label in $\Sigma$ to $\phi$ and relabeling all edges adjacent to it from 1 to 0. Vertex insertion may involve attaching the new vertex to the existing graph via an edge. Again, this process is easily decomposed by relabeling a virtual vertex from $\phi$ to some desired label in $\Sigma$ and changing the label of an appropriate virtual edge from 0 to 1. Thus, it suffices to consider the transforming of edge and vertex labels as the fundamental operation for editing.

Edits essentially serve to alter the state of the edit grid. Thus, we may specify a sequence of edits by noting the sequence of edit grid state vectors $\{\eta_k\}_{k=0}^{M}$ resulting from these edits. Suppose we wish to transform a graph $G_0$ into a graph $G_1$ by performing edit operations. Assume at this point that the initial state of the edit grid $\eta_0$ contains $G_0$ in its standard placement. We must have the final state $\eta_M$ be such that it describes $G_1$ situated in some fashion on the edit grid. Thus, if $\Gamma_1$ is the set of state vectors corresponding to all isomorphisms of $G_1$ on the edit grid, we must have $\eta_M \in \Gamma_1$. Two different state sequences for transforming the example $G_0$ into the example $G_1$ of Fig. 1 are shown in Fig. 4. The set of all isomorphisms of a graph $G_n$ on the edit grid, $\Gamma_n$, may be defined in terms of the standard placement of $G_n$ denoted by $\eta_n$ and $\Pi$, the set of all permutation mappings describing isomorphisms of the edit grid $G_\Omega$, as in (1).

$$\Gamma_n = \left\{ \eta \;\middle|\; \exists\, \pi \in \Pi \ \text{s.t.}\ \eta^{i} = \eta_n^{\pi^{i}} \right\}. \tag{1}$$

Note that $\Pi$ does not contain all possible permutations of the elements of the state vector because elements of $\Pi$ must describe an isomorphism of the edit grid. For example, an edit grid with two vertices $\Omega = \{\omega_1, \omega_2\}$ has only two isomorphisms: $\omega_1' = \omega_1$, $\omega_2' = \omega_2$ and $\omega_1' = \omega_2$, $\omega_2' = \omega_1$. Assuming the graph elements are indexed as $\epsilon = (\omega_1, \omega_2, (\omega_1, \omega_2))$, there are only two permutations of the state vector that comprise $\Pi$: $\pi_1 = (1, 2, 3)$ and $\pi_2 = (2, 1, 3)$. Indeed, it will always be the case that $|\Pi| = N!$. The permutations of the state vector corresponding to the isomorphisms of $G_0$ in Fig. 3 are given in Table 1.

Fig. 4. Two different edit grid state sequences for transforming $G_0$ of Fig. 1 into $G_1$. The upper sequence requires only one state transition, while the lower sequence requires two. In both cases, the initial state is the standard placement of $G_0$, and the final state represents an isomorphism of $G_1$ on the edit grid. Using the vertex numbering scheme of Fig. 2, the following nontrivial edits are made in the single transition of the upper sequence: vertex relabelings at $\omega_1$, $\omega_2$, and $\omega_4$, and the edge relabelings $((\omega_1, \omega_2), 1 \to 0)$, $((\omega_1, \omega_3), 1 \to 0)$, $((\omega_2, \omega_3), 1 \to 0)$, and $((\omega_3, \omega_4), 0 \to 1)$. In the first transition of the lower sequence, we have vertex relabelings at $\omega_1$, $\omega_2$, and $\omega_3$ together with $((\omega_1, \omega_2), 1 \to 0)$, $((\omega_1, \omega_3), 1 \to 0)$, and $((\omega_2, \omega_3), 1 \to 0)$, and in the second transition of the lower sequence: vertex relabelings at $\omega_1$ and $\omega_2$ together with $((\omega_1, \omega_2), 0 \to 1)$. For a metric cost function $c$, the cost of the upper sequence is the sum of three vertex relabeling costs plus $4c(0, 1)$, and the cost of the lower sequence is the sum of five vertex relabeling costs plus $4c(0, 1)$.

We define a cost function $c : (\Sigma \cup \phi)^2 \cup \{0, 1\}^2 \to \mathbb{R}_+$ that assigns a nonnegative cost to each graph edit. The cost of an edit grid state transition denoted as $C(\eta_{k-1}, \eta_k)$ is simply the sum of all edits separating the two states, i.e.,

$$C(\eta_{k-1}, \eta_k) = \sum_{i=1}^{I} c(\eta_{k-1}^{i}, \eta_k^{i}), \tag{2}$$

where $I = N + \frac{1}{2}(N^2 - N)$ is the total number of graph elements (vertices and edges) in the edit grid. Similarly, the cost of a sequence of state transitions is simply the sum of the costs of individual transitions. We consider only cost functions that are metrics on the set of vertex and edge labels as characterized by Definition 1.

Definition 1. A cost function $c : (\Sigma \cup \phi)^2 \cup \{0, 1\}^2 \to \mathbb{R}_+$ is a metric if the following conditions hold for all $(x, y) \in (\Sigma \cup \phi)^2 \cup \{0, 1\}^2$:

1. Positive definiteness: $c(x, y) = 0$ if and only if $x = y$.
2. Symmetry: $c(x, y) = c(y, x)$.
3. Triangle inequality: $c(x, y) \le c(x, z) + c(z, y)$ for all $(x, z)$ and $(z, y)$ in $(\Sigma \cup \phi)^2 \cup \{0, 1\}^2$.

Assuming a metric cost function, the cost of the upper sequence in Fig. 4 is the sum of three vertex relabeling costs plus $4c(0, 1)$, and the cost of the lower sequence is the sum of five vertex relabeling costs plus $4c(0, 1)$. With such a cost function $c$, we have the following simple result:

Proposition 1. If $c : (\Sigma \cup \phi)^2 \cup \{0, 1\}^2 \to \mathbb{R}_+$ is a metric on the set of labels, then $C$ as defined in (2) is a metric on the edit grid state space.

Proof. Expand $C$ in terms of $c$ using the definition in (2) and apply the metric properties of $c$ to trivially obtain the corresponding properties for $C$. $\square$

We thus have the following useful lemma:

Lemma 1. Let $c$ be a metric and $\{\eta_k\}_{k=0}^{M}$ be a sequence of edit grid state vectors. Then, for all $M \ge 1$,

$$C(\eta_0, \eta_M) \le \sum_{k=1}^{M} C(\eta_{k-1}, \eta_k).$$

Proof. The $M = 1$ case is trivial and the $M = 2$ case follows from the triangle inequality. Assume the claim holds for some value $M$ and proceed by induction:

$$\begin{aligned} C(\eta_0, \eta_{M+1}) &\le C(\eta_0, \eta_M) + C(\eta_M, \eta_{M+1}) \\ &\le \sum_{k=1}^{M} C(\eta_{k-1}, \eta_k) + C(\eta_M, \eta_{M+1}) \\ &= \sum_{k=1}^{M+1} C(\eta_{k-1}, \eta_k), \end{aligned} \tag{3}$$

where the first line in (3) follows from the triangle inequality and the second line uses the induction hypothesis. $\square$

We now define the graph edit distance with respect to a cost function $c$ as

$$d_c(G_0, G_1) = \min_{\{\eta_k\}_{k=1}^{M} \,\mid\, \eta_M \in \Gamma_1} \; \sum_{k=1}^{M} C(\eta_{k-1}, \eta_k) \tag{4}$$
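The transition cost in (2), the bound of Lemma 1, and the minimization in (4) can be checked numerically on a toy example. The Python sketch below is an illustration, not the binary linear program developed in the paper. It assumes a hypothetical discrete cost (`unit_cost`, 0 for equal labels and 1 otherwise, which satisfies Definition 1), made-up labels `"a"`, `"b"`, `"c"`, the string `"phi"` for the vertex null label, and a state vector ordered as vertex labels followed by edge labels. Because Lemma 1 says any sequence costs at least as much as the direct transition between its endpoints, the hypothetical helper `brute_force_ged` only examines single transitions, one per placement of the second graph on a grid with $|V_0| + |V_1|$ vertices. The enumeration grows factorially, so it is usable only for very small graphs.

```python
from itertools import combinations, permutations

NULL = "phi"  # hypothetical stand-in for the vertex null label


def unit_cost(x, y):
    """Hypothetical label metric: 0 for identical labels, 1 otherwise."""
    return 0.0 if x == y else 1.0


def state_vector(labels, edges, n_grid, placement=None):
    """Vertex labels followed by edge labels for a graph placed on the grid.

    placement[v] is the grid vertex that hosts graph vertex v; the default
    is the standard placement on the first len(labels) grid vertices.
    """
    if placement is None:
        placement = range(len(labels))
    vertex_part = [NULL] * n_grid
    for v, label in enumerate(labels):
        vertex_part[placement[v]] = label
    real = {frozenset((placement[u], placement[v])) for u, v in edges}
    edge_part = [int(frozenset(pair) in real)
                 for pair in combinations(range(n_grid), 2)]
    return vertex_part + edge_part


def transition_cost(state_a, state_b, cost=unit_cost):
    """C in (2): total cost of relabeling every grid element."""
    return sum(cost(a, b) for a, b in zip(state_a, state_b))


def brute_force_ged(g0, g1, cost=unit_cost):
    """Minimize the single-transition cost over all placements of g1."""
    (labels0, edges0), (labels1, edges1) = g0, g1
    n_grid = len(labels0) + len(labels1)
    eta0 = state_vector(labels0, edges0, n_grid)
    return min(
        transition_cost(eta0, state_vector(labels1, edges1, n_grid, p), cost)
        for p in permutations(range(n_grid), len(labels1))
    )


triangle = (["a", "b", "c"], [(0, 1), (0, 2), (1, 2)])
edge = (["a", "b"], [(0, 1)])

# Lemma 1 on a concrete sequence: a detour through an empty grid
# is never cheaper than the direct transition.
eta_start = state_vector(*triangle, 5)
eta_empty = [NULL] * 5 + [0] * 10
eta_end = state_vector(*edge, 5)
direct = transition_cost(eta_start, eta_end)
detour = (transition_cost(eta_start, eta_empty)
          + transition_cost(eta_empty, eta_end))

distance = brute_force_ged(triangle, edge)
```

With these inputs the direct transition deletes one vertex and two edges, whereas the detour first deletes everything and then rebuilds the smaller graph, so `direct` does not exceed `detour`, as Lemma 1 requires.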
In (4), $\eta_0$ is the standard placement of $G_0$ on the edit grid, $\Gamma_1$ is the set of state vectors corresponding to isomorphisms of $G_1$ on the edit grid as in (1), and $M$ is the maximum number of allowed state transitions (this can be as large as desired, but we are only concerned with finite graphs, implying $M$ will be finite). There is no loss of generality by fixing the number of terms in the summation of (4) since, for M

greedy algorithm for computing the graph edit distance in this case.

If solves the minimization in (7), then we have XI i i dcðG0;G1Þ¼ cð 0; 1 Þ i¼1 0 X ð8Þ