Exact Probabilistic Inference for Inexact Graph Matching

Rupert Brooks 110247534 April 30, 2003

Inexact graph matching is a fundamental problem in computer vision applications. It crops up whenever a correspondence between two structures must be determined in the presence of structural and attribute noise. Despite the importance of the problem, graph matching techniques in computer vision tend to be extremely task specific, and sometimes lack the power a general approach would have. For example, this project was motivated by the matching of human cortical sulci described in [1]. Rivière et al. achieve a fairly reliable solution for this problem using a congregation of neural networks. Unfortunately, their technique does not yield any sort of quality measure for the match, or the degree to which other matches are likely. A more general approach which could deliver information on match quality and report the existence of multiple valid matches would be desirable.

1 Theoretical background

The notation and terminological definitions for various elements of graph matching vary considerably between different authors. While this is to be expected to some degree, it does cause some confusion when comparing the different algorithms. In particular, the difference between a graph and the embedding of a graph is glossed over by all the approaches reviewed here, which leads to confusion. This section defines the terms used in this report in a mathematically strict¹ fashion.

A graph consists of a structure G = (V, E), where V is a set of vertices, and the predicate E ⊆ V × V is a set of edges. Two vertices are called adjacent if they form the elements of an edge in E. A graph G is a subgraph of a graph H if VG ⊆ VH and EG ⊆ EH. The degree of a vertex is the number of edges which connect to it. A graph is called bipartite if V can be divided into two parts V1 and V2 such that every edge in E has an endpoint in V1 and an endpoint in V2 [3]. A homomorphism (or simply morphism) is a mapping that preserves predicates and functions [2]. Thus a homomorphism between two graphs G, H is a mapping between VG and VH so that adjacent vertices in G map to vertices that are also adjacent in H, and non-adjacent vertices in G map to vertices that are not adjacent in H. An isomorphism is a one-to-one homomorphism.

Given two graphs, G and H, the graph isomorphism problem is to find a one-to-one, adjacency-preserving mapping between them if such a mapping exists. This problem holds an unusual place in complexity theory. No polynomial algorithm is known, but it has not been shown to be NP-complete. Some mathematicians suspect that it lies in a complexity class between P and NP [3]. The common subgraph isomorphism problem, which is to determine the largest graph F that is a subgraph of both G and H, has been shown to be NP-complete. There are a range of graph homomorphism problems, most of which also have high complexity [4].
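Under the strict definition above, whether a given mapping is a homomorphism can be checked directly. A minimal Python sketch (the original implementation was in Matlab; the function name and edge representation here are illustrative):

```python
def is_homomorphism(mapping, edges_g, edges_h):
    """Check that a vertex mapping preserves both adjacency and non-adjacency.

    mapping: dict from vertices of G to vertices of H.
    edges_g, edges_h: sets of (u, v) pairs, treated as undirected.
    """
    # Symmetrize edge sets so (u, v) and (v, u) count as the same edge.
    g = edges_g | {(b, a) for (a, b) in edges_g}
    h = edges_h | {(b, a) for (a, b) in edges_h}
    verts = set(mapping)
    # Adjacent vertices must map to adjacent vertices, and (per the strict
    # definition above) non-adjacent pairs must map to non-adjacent pairs.
    return all(((mapping[u], mapping[v]) in h) == ((u, v) in g)
               for u in verts for v in verts if u != v)
```

Note that this predicate does not require the mapping to be one-to-one; injectivity is the extra condition that distinguishes an isomorphism.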

¹Using terminology and notation from [2].

The graph and subgraph isomorphism problems are often referred to as exact graph matching. For many practical problems a true homomorphism of the graphs involved cannot be found. The area of inexact graph matching involves finding mappings between the elements of two graphs that are approximately homomorphic. Practical results depend on careful definition of what is meant by an “approximate” homomorphism. Although some authors claim that exact and inexact matching are entirely separate problems [5], in this report it will be assumed that an inexact matching technique should yield a homomorphism or isomorphism in the limiting case where such a mapping can be found.

In most inexact matching approaches, the matching is not symmetric. One graph is treated as the model graph, while the other is treated as the sample graph. The algorithm may give different results if the roles of the two graphs are reversed. In particular, it is accepted that not all the vertices in one graph can be matched to the other. Thus a null vertex, φ, is added to the model graph. Nodes that cannot be matched to a model vertex will be matched to the null vertex.

The algebraic definition of a graph says nothing about its representation. However, graphs are usually visualized as a series of dots and lines on paper. The practical examples of graph matching in computer vision use graphs as representations of structures in data. A structure X is embedded in a space Y when Y restricted to X has the properties of X. Thus, a particular representation of a graph is called an embedding of the graph. Our intuitive understanding of graph matching often involves the matching of embedded graphs rather than just the graphs themselves. In this case, some spatial properties of the embedding should be preserved as well as the graph structure. The Attributed Relational Graph (ARG) extends the notion of graph by adding a vector of attributes defined at each node [6, 7].
Thus an ARG is a triple, G = (V, E, A) where V and E are defined as before and A = {x¯u, ∀u ∈ V }. These attributes can, in principle, be anything, and there is no particular requirement to have the same attributes at each vertex although this is usually done. These attributes usually represent the spatial properties of the vertices. The word relational is for semantic purposes only. It refers to the fact that the edges in an ARG represent relationships between elements being matched.
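An ARG needs only a very small data structure; a Python sketch, under the assumption that edges are stored as vertex pairs (names are hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class ARG:
    """Attributed Relational Graph: G = (V, E, A)."""
    vertices: set
    edges: set                                       # pairs (u, v) of related vertices
    attributes: dict = field(default_factory=dict)   # attribute vector x̄_u per vertex u

    def neighbors(self, u):
        """All vertices sharing an edge with u."""
        return {b if a == u else a
                for (a, b) in self.edges if u in (a, b)}
```

The attribute vectors typically hold the spatial properties of the embedding, as described above.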

2 Approaches to inexact graph matching

Fundamentally, graph matching techniques have two components. First, some definition of the quality of a match must be made. Second, a search must be made through the space of possible matches to find an optimal, or at least acceptable, match. This report focuses on defining the quality measure, and relies on available tools for probabilistic inference to search the space. It is quite likely that specific cases can be solved more efficiently using specialized techniques.

2.1 Bayesian Approaches

Of the numerous approaches to graph matching, explicitly probabilistic ones are relatively rare. For a number of years, they were considered difficult to formulate correctly, and risky to use when formulated incorrectly (e.g. [8]). In 1995, Christmas [6] and Wilson and Hancock [9] proposed probabilistic formulations for graph matching that used relaxation techniques to optimize a reasonable initial guess.

Christmas [6] matched road network features extracted from aerial photography to features in a database. Geometric properties of the features were stored as a set of attributes in an ARG. The segments in the image are treated as the vertices of the graph, and the relationships between segments make up the edges of the graph. The probabilities were calculated locally on the attributes at a particular node, and on individual consideration of relational edges.

This method appears to work extremely well on the test images, but there are two theoretical annoyances. First, the local nature of the relaxation process means that it will converge to a local minimum. Second, and of more concern, a feedback process is used, where the posterior distribution from one iteration is used as the prior distribution for the next iteration. The system works well when initialized near the right answer, but it seems improbable that the resulting joint probability distribution can be trusted.

In [7], Wilson and Hancock propose a Bayesian criterion for assessing the validity of a structural match. They propose that a correct structural match between a sample vertex, u, and a model vertex, v, will be characterized by three properties:

• the neighbors of u will be matched to neighbors of v,

• conversely, the neighbors of v will be matched to neighbors of u, and

• there will be a minimum of matches to the null vertex.

They explore both an exponential (Gibbs) distribution and a linear distribution as models for the weighted sum of these three properties. Like Christmas, the update method is local. The prior probabilities of matching are first computed based on the local attribute information. A matching model vertex is then selected at each node that maximizes the a posteriori probability given the attribute values and the neighborhood. They test their method on Delaunay triangulations to which they may add or delete nodes. The success is evaluated based on the number of correctly matched nodes. Results using their exponential criteria are impressive, as their system can correctly match the graphs even when initialized with up to 50% of the matches being incorrect, and in the presence of significant noise. The exponential distribution strikingly outperforms a linear distribution.

In her PhD thesis proposal [10], Farmer describes several possible models for using Bayesian networks for graph matching. The most promising of these has been used as the foundation for this project. It formalizes the local Bayesian criteria of Wilson and Hancock [7] as a Bayesian network. The proposed structure is shown in Figure 1. The network consists of three types of node. Each vertex ui in the sample graph has a corresponding node in the network. Each sample node has a child node f(ui) that describes the correctness of the attribute match. Each sample node also induces a neighborhood node Γui. Each Γui has ui and all the neighbors of ui in the sample graph as parents. Farmer develops the conditional probability tables for each node using the equations from [7], and assuming a constant error probability (that is, that it is half as likely to make two errors as it is to make one error). The Maximum A Posteriori (MAP) estimate of the sample-to-model matchings is proposed as the obvious best answer.
This is the mode(s) of the posterior joint distribution, P, over the sample nodes, where:

P : (Vmodel ∪ φ)^|Vsample| → [0, 1]    (1)

P assigns a probability to all possible matchings of the vertices in the sample with the vertices in the model, and of course:

Σ P = 1.    (2)
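Equations (1) and (2) can be made concrete by brute-force enumeration, which is what makes the approach exponential in the number of sample vertices. A Python sketch, with a caller-supplied unnormalized score standing in for the Bayesian network (names are illustrative):

```python
from itertools import product

def joint_distribution(sample_vertices, model_vertices, score):
    """Enumerate every assignment Vsample -> (Vmodel ∪ {phi}), score it with a
    non-negative unnormalized function, and normalize so the values sum to 1,
    as in equations (1) and (2).  Exponential in |Vsample|: a sketch only."""
    labels = list(model_vertices) + ["phi"]
    dist = {}
    for assignment in product(labels, repeat=len(sample_vertices)):
        m = dict(zip(sample_vertices, assignment))
        dist[assignment] = score(m)
    z = sum(dist.values())
    return {a: p / z for a, p in dist.items()}

def modes(dist, tol=1e-12):
    """The MAP estimate(s): every assignment attaining the maximum probability."""
    best = max(dist.values())
    return [a for a, p in dist.items() if abs(p - best) < tol]
```

Reporting all modes, rather than a single maximizer, is what allows multiple equally valid matches to be detected.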

2.2 Fuzzy Approaches

In [11], Perchant and Bloch formalize matchings between graphs in terms of fuzzy set theory. Although the implementation described in this report does not use a fuzzy approach, there are several elements of

Figure 1: The Bayesian network model proposed in [10]

interest. They define two types of fuzzy graph. A fuzzy graph of type I is a fuzzy relation µ² between two sets of vertices S1 and S2. A fuzzy graph of type II, G = (σ, µ), is defined as a fuzzy membership function σ over a set of vertices, together with a fuzzy relation, µ, defined on S × S. This can be intuitively understood as an arbitrary graph where both the nodes and edges have fuzzy degrees of existence. No edge can have a higher weight than the conjunction of its two endpoints, which makes intuitive sense. (If it seems awkward to have two meanings for the same term, observe that a fuzzy graph of type I can be defined as a bipartite fuzzy graph of type II, with normalized vertex weights. Thus any results for fuzzy graphs of type II apply to all fuzzy graphs.)

Perchant and Bloch define the notion of fuzzy morphism to be the analogue of an algebraic morphism between fuzzy graphs. A fuzzy morphism has two parts, a vertex morphism, ρσ, and an edge morphism, ρµ. Both of these morphisms are fuzzy graphs: the vertex morphism of type I (bipartite) and the edge morphism of type II. The vertex morphism assigns links between sets of vertices. The edge morphism maps S1 × S2 × S1 × S2 → [0, 1] and has two different interpretations. The first, internal, interpretation, (S1 × S2) × (S1 × S2), corresponds to the association compatibility between the vertex correspondences. It can be imagined as a weighting over the set of statements like “Matching vertex x to vertex y implies that vertex a must be matched to vertex b”. The second, external, interpretation, (S1 × S1) × (S2 × S2), corresponds to matchings between edge elements in two different graphs [11].

Their development is relevant to a probabilistic approach, as a joint probability distribution defining the matching between two graphs can be viewed as a fuzzy graph morphism (although the converse is not necessarily true).
The vertex morphism is simply the marginal probability of each possible match, while each element in the edge morphism can be viewed as the conditional probabilities obtained by fixing each node in turn.
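Under the assumption that the joint distribution is stored as a table from assignment tuples to probabilities, both views fall out of simple marginalization and conditioning; a Python sketch (names are illustrative):

```python
def vertex_morphism(dist, sample_vertices, target):
    """Marginal probability that one sample vertex takes each label:
    the probabilistic analogue of the vertex morphism."""
    i = sample_vertices.index(target)
    marg = {}
    for assignment, p in dist.items():
        marg[assignment[i]] = marg.get(assignment[i], 0.0) + p
    return marg

def conditional(dist, sample_vertices, fixed_vertex, fixed_label):
    """Condition the joint on one vertex match being fixed: one slice
    of the edge morphism."""
    i = sample_vertices.index(fixed_vertex)
    sub = {a: p for a, p in dist.items() if a[i] == fixed_label}
    z = sum(sub.values())
    return {a: p / z for a, p in sub.items()} if z > 0 else {}
```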

²A fuzzy relation, µ, is defined between fuzzy membership functions, σ1, σ2, defined over two sets, S1, S2. It is a function over the cross-product of the two sets that maps into the range [0, 1]. To be a relation, this function must map them to a value less than the conjunction of the two fuzzy membership functions. For details, see [11].

There are two important points to gain from this. First, that a matching between two graphs is comprised of two graphs. Simply treating a matching as a bipartite graph between the vertices of the sample and the model graph (i.e., the vertex morphism alone) is not sufficient, as it does not capture the information in the edge morphism. An example where a vertex morphism alone would utterly fail is given in the experiments section. Secondly, that the dependencies between vertex matches are equivalent to a set of edge matches. This implies that there is probably little to be gained by modelling both the neighborhood constraints of vertices and the matching between edges. One or the other is probably sufficient.

Bengoetxea et al. apply the framework of Perchant and Bloch in a probabilistic way in [12]. Unlike the approaches of Farmer, Wilson and Hancock, they do not take neighborhood information into account explicitly. Instead they match edges of the graph as well as vertices. (From the above discussion, more or less equivalent results are expected.) They define a Bayesian network representing the relationships between vertex and edge attributes. Rather than using exact inference to solve for the modes of the probability distribution, they use estimation of distribution algorithms (EDA).

Briefly, the EDA approach is a population-based stochastic search which was designed to overcome some of the shortcomings of genetic algorithms (GA). In both EDA and GA a population of individuals is evolved. Unlike GA, in EDA the fittest individuals in a population are used to infer a probability distribution. New individuals are then created by sampling from this distribution. The thrust of [5] is a comparison between EDA and GA for graph matching. While GA have shown promise for graph matching in the past, in these experiments the EDA approach systematically performs better.
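A minimal univariate EDA (in the style of UMDA) can be sketched in a few lines of Python; this is a simplification of the algorithms used in [12], not a reproduction of them:

```python
import random

def sample_from(rng, probs):
    """Draw one label from a {label: probability} table."""
    r, acc = rng.random(), 0.0
    for label, p in probs.items():
        acc += p
        if r <= acc:
            return label
    return label  # guard against floating-point round-off

def umda(fitness, n_vars, labels, pop_size=60, elite=20, iterations=30, seed=0):
    """Keep the fittest individuals, estimate an independent per-variable label
    distribution from them, and sample the next population from that estimate
    (no crossover or mutation, unlike a GA)."""
    rng = random.Random(seed)
    population = [tuple(rng.choice(labels) for _ in range(n_vars))
                  for _ in range(pop_size)]
    for _ in range(iterations):
        population.sort(key=fitness, reverse=True)
        best = population[:elite]
        model = []
        for i in range(n_vars):
            counts = {l: 1 for l in labels}  # Laplace smoothing
            for ind in best:
                counts[ind[i]] += 1
            total = sum(counts.values())
            model.append({l: c / total for l, c in counts.items()})
        population = [tuple(sample_from(rng, model[i]) for i in range(n_vars))
                      for _ in range(pop_size)]
    return max(population, key=fitness)
```

For graph matching, an individual would be one assignment of model labels to sample vertices, and the fitness would score how well the assignment preserves structure and attributes.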

2.3 The bigger picture

Many of the approaches described have focussed on relatively limited domains. By situating graph matching in its wider theoretical context it may be possible to apply results from other fields to the solution. Graph matching problems are a type of labelling or constraint satisfaction problem. In a labelling problem, a set of labels must be assigned to a set of vertices while obeying a set of constraints. In the discrete labelling problem [8], or equivalently a finite domain constraint satisfaction problem [13], the constraints are binary, and a labelling must be found which does not violate any of them. If such a labelling does not exist, then the constraints are inconsistent and the problem is unsatisfiable. These problems have a wide literature which is relevant to our search for a general graph matching solution. Two points of particular interest are discussed below.

Hummel and Zucker describe a general foundation for relaxation labelling problems in [8]. They extend the notion of constraint satisfaction to the case of the continuous labelling problem. In the continuous labelling problem, instead of binary label assignments, labels are assigned with weights that sum to one, and the objective is to minimize a cost function defined by the constraints. A label assignment is consistent if there is no local change to the labelling assignments which would decrease the cost function of the local constraints at that vertex. They show that consistent label assignments do not necessarily coincide with minima of the cost function. Therefore the modes of the joint probability distribution may actually define locally inconsistent matchings of the graphs.

The literature on discrete constraint satisfaction is most developed in the field of constraint logic programming (e.g. [13]).
Since the satisfiability problem SAT³ is a constraint satisfaction problem, it is clear that the general case of determining the solutions, if any, to a set of constraints is NP-complete. To deal with the heavy computational load, methods for quickly pruning a search space have been developed. The k-consistency methods use tests over limited subspaces of the problem to eliminate impossible combinations of labels. A set of possible labellings is k-consistent if, for any k−1 variables, there exists a label for any arbitrary kth variable such that all constraints among the variables are satisfied. The most common

³SAT is the original problem proved to be NP-complete by Cook in 1971.

consistency checks are node-consistency (or 1-consistency) and arc-consistency (2-consistency) [14]. The constraints used by Christmas [6] correspond to arc-consistency. The constraints used by Wilson and Hancock [7] are stronger, corresponding to 3-consistency.
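Arc-consistency itself is simple to sketch; the following is a minimal Python rendering of the AC-3 scheme (the variable and constraint representations are illustrative):

```python
from collections import deque

def ac3(domains, constraints):
    """AC-3 arc-consistency: prune values that have no support in a
    neighboring variable's domain.

    domains: dict variable -> set of candidate labels (pruned in place).
    constraints: dict (x, y) -> predicate(label_x, label_y) -> bool.
    """
    queue = deque(constraints)
    while queue:
        x, y = queue.popleft()
        allowed = constraints[(x, y)]
        revised = False
        for vx in set(domains[x]):
            # vx survives only if some vy in y's domain supports it.
            if not any(allowed(vx, vy) for vy in domains[y]):
                domains[x].discard(vx)
                revised = True
        if revised:
            # Re-examine every arc pointing at x.
            queue.extend((z, w) for (z, w) in constraints if w == x)
    return domains
```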

3 Project Scope

This project was primarily motivated by the search for a method of graph matching which would give information on the match quality and on alternate matchings. Problem specific graph matching techniques such as [1] produce a single match without any qualifying information. Therefore a general rather than problem specific approach was sought. Probabilistic reasoning with Bayesian networks is capable of both modelling multimodal information and quantifying the certitude of a given result. Farmer’s [10] proposed graph matching network was chosen for investigation. There were three main goals in undertaking this project:

• To review, and extend if necessary, the theoretical model proposed by Farmer, allowing the specification of quality measures and the detection of multiple modes in graph matching.

• To connect this model to whatever degree possible with related problems and approaches

• To test this model by implementation

In order to achieve a general approach, solidly connected to the theory of probabilistic inference, optimization issues were ignored. Graph matching is, like most artificial intelligence problems, a search of a vast solution space. Pruning that search is critical to getting an efficient implementation. This issue was ignored for the general treatment, and as a result the implementation is computationally intensive. Some proposals for optimization based on existing approaches to constraint satisfaction problems are discussed in the conclusion.

The implementation was in Matlab [15] using the Bayes Net Toolbox (BNT) [16]. The GraphViz [17] library was used to visualize some of the results. The BNT allows exact inference on Bayesian networks using either a junction tree or variable elimination. For this project, regrettably, variable elimination had to be used, as the junction tree algorithm fails when the queried vertices are not all in the same clique⁴.

Slight modifications were made to the system described by Farmer [10] and Wilson and Hancock [7]. Note that although a number of weight parameters are specified, the system works well when they are set to one, and they do not need adjustment in general.

• Farmer proposed to use a fixed probability of error for determining the probabilities of the attribute nodes. However, [7] demonstrates much better results with exponential distributions. Therefore all the attribute nodes were given an exponential distribution. The probability of a true match at a given attribute node is given by:

P(f(ui) = true | ui = vj) = e^(−watt ||x̄ui − X̄vj||)    (3)

where:

– watt is a weighting parameter for the attributes, used for scaling the values relative to the expected error in the observations

– x̄ui is the vector of observations at ui

⁴This is the same problem described in Assignment 4, question 2c of COMP-526.

– X̄vj is the vector of model parameters at vj

– ||x̄|| is the L2 norm of x̄

• Wilson and Hancock combine several weighting factors under products of parameters. For clarity, their formula for the probability was simplified to:

P(Γi | Ni = Mj) = e^(−wfw H(Ni) − wbw H(Mj) − wdummy Ψ)    (4)

where

– Ni = Mj indicates a set of assignments between the sample vertex neighborhood Ni and a set of model vertices Mj.

– wfw is the weighting of the neighbors of ui that are missing from the match set

– H(Ni) is the count of the neighbors of ui that are not matched to neighbors of vj. (This was referred to as the Hamming distance in [7].)

– wbw is the weighting of the extra neighbors of vj

– H(Mj) is the count of the neighbors of vj that are not matched to neighbors of ui.

– wdummy is the weighting of the number of dummy nodes.

– Ψ is the count of dummy nodes in the neighborhood of ui.

• There is actually no constraint keeping multiple sample vertices from matching to the same model vertex. When using spatial attributes, this rarely happens, but in the experiments involving structural matching only it happened frequently. As described in [18], the uniqueness constraint was implemented by creating an XOR node. This node receives input from all the sample nodes, and the probability of it being true is:

P(XOR = true | Vsample → (Vmodel ∪ φ)) = e^(−wXOR χ)    (5)

where:

– Vsample → (Vmodel ∪ φ) represents a complete mapping from the sample vertices to the model vertices.

– wXOR is a weight parameter for the XOR node.

– χ is the number of duplicated matchings between sample vertices and model vertices.

The system takes as input a sample graph and a model graph, specified as adjacency matrices. Optionally, observations may be provided for each sample vertex, in which case expected values must be provided for each model vertex as well. The user must also specify whether or not to use an XOR node, and if necessary, any non-default values for the weighting parameters. Under normal circumstances the only weight parameter that needs to be set is watt, which controls the spatial scale of the residuals between the observations and the expected values.

Once all necessary data is provided, a Bayesian network as described above is created. Figure 2 shows the Bayes network created using this process for matching two four-vertex triangulations. Once the network is created, all the nodes except the sample nodes are observed as having a correct match. Unlike other inexact graph matching approaches, no initialization of the matching is required. The matching is determined from uniform priors over the possible label assignments. The variable elimination inference engine in BNT is used to obtain the joint distribution over the possible label assignments to the sample vertices. The modes of this distribution are reported as the possible answers. If desired, the entire joint distribution may be obtained.
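The three probability terms (3)-(5) reduce to small functions; a Python sketch, written with negative exponents so that each expression is a valid probability (the actual implementation was in Matlab, and the names here are illustrative):

```python
import math

def attribute_prob(x_u, X_v, w_att=1.0):
    """Equation (3): the probability of a correct attribute match decays
    exponentially with the residual between observation and model."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_u, X_v)))  # L2 norm
    return math.exp(-w_att * dist)

def neighborhood_prob(h_fw, h_bw, psi, w_fw=1.0, w_bw=1.0, w_dummy=1.0):
    """Equation (4): exponential penalty on unmatched sample neighbors H(Ni),
    unmatched model neighbors H(Mj), and dummy-node matches Psi."""
    return math.exp(-w_fw * h_fw - w_bw * h_bw - w_dummy * psi)

def xor_prob(chi, w_xor=1.0):
    """Equation (5): penalty on the number of duplicated matchings chi."""
    return math.exp(-w_xor * chi)
```

With all weights set to one, as the text notes is usually sufficient, each extra structural violation multiplies the match probability by e⁻¹.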



Figure 2: A Bayes net generated for matching two four-vertex triangulations

Figure 3: A four point graph which can be matched to itself in many ways due to symmetry

4 Experiments

Due to time constraints, relatively small graph matching exercises were performed. Nevertheless, they show that the algorithm gives good results, even in the presence of both structural and attribute noise.

4.1 Structural Matching

For testing and experimentation purposes, a number of small graphs were matched based on structure alone. This differs from all the inexact graph matching approaches discussed earlier, as no spatial information about the embedding of the graph was used. This problem is much more related to the exact graph and subgraph homomorphism problems.

4.1.1 Matching a diamond - Graph isomorphism

Consider the exact graph isomorphism problem of matching the two graphs (see Figure 3):

Gsample = Gmodel = {V = {1, 2, 3, 4}; E = {(1, 2), (2, 3), (3, 4), (4, 1)}} (6)

A mapping from Gsample to Gmodel can be described as a list of the vertices in Gmodel in the order that the sample vertices map to them. By inspection, the isomorphic matches are 1234, 2341, 3412, 4123, 4321, 3214, 2143, 1432. However, if the graph matching is run without using the XOR node, then the modes of the joint probability distribution include such matches as 1232, 2343, and so forth. The graph has “folded up” on itself because none of the neighborhood constraints have enforced the uniqueness of the nodes. If the XOR node is used, then the problem disappears and the eight correct modes are found. Note that the individual matching probabilities for each node tell us little. Each node in the sample has an equal probability of being matched to any node in the model. This is an example of a case where the edge morphism described earlier is essential for describing the match.
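This example is small enough to check exhaustively; a Python sketch enumerating all one-to-one adjacency-preserving mappings of the diamond onto itself:

```python
from itertools import permutations

V = [1, 2, 3, 4]
E = {(1, 2), (2, 3), (3, 4), (4, 1)}
sym = E | {(b, a) for (a, b) in E}  # undirected adjacency

def isomorphisms():
    """Every permutation of V that preserves adjacency and non-adjacency,
    written in the list notation used above (e.g. '2341')."""
    found = []
    for perm in permutations(V):
        m = dict(zip(V, perm))
        if all(((m[u], m[v]) in sym) == ((u, v) in sym)
               for u in V for v in V if u != v):
            found.append("".join(str(m[v]) for v in V))
    return found
```

Running it returns exactly the eight matches listed above: the four rotations and four reflections of the diamond.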

4.1.2 Finding triangles - Subgraph isomorphism

Figure 4a) shows a classic children’s puzzle where the child is asked to count the triangles. The correct answer is 5: the four small triangles plus the large outer triangle. This question can be posed to the graph matching system by making the sample graph a triangle, and the model graph the structure in Figure 4a). In this case, the system returns three modes, which correspond to the three corner triangles (highlighted

Figure 4: A simple puzzle that is an example of subgraph isomorphism

in different colors in Figure 4b). The middle triangle is not a mode of the distribution because there are ’extra’ connections on each of its corners. If wbw is reduced to zero, then the middle triangle can be found as well. Can the outer triangle be found using only structural constraints? Unfortunately, it cannot. It is visible to us as a triangle only because of the way it is drawn. Figure 4c) is the same graph, but in this case there is no outer triangle.
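The structural claim can be verified by counting 3-cliques; a Python sketch over an assumed reconstruction of the Figure 4a) graph (corner vertices A, B, C and edge-midpoint vertices a, b, c are hypothetical labels):

```python
from itertools import combinations

# Assumed reconstruction of the puzzle graph: a triangle whose sides are
# subdivided at midpoints, with the midpoints joined into a central triangle.
vertices = ["A", "B", "C", "a", "b", "c"]
edges = {("A", "c"), ("c", "B"), ("B", "a"), ("a", "C"),
         ("C", "b"), ("b", "A"), ("a", "b"), ("b", "c"), ("c", "a")}
adj = edges | {(v, u) for (u, v) in edges}

def triangles():
    """Every 3-clique, i.e. every structurally present triangle."""
    return [t for t in combinations(vertices, 3)
            if all((u, v) in adj for u, v in combinations(t, 2))]
```

This count finds four structural triangles (the three corner triangles plus the middle one); the outer triangle (A, B, C) never appears, since its sides are not edges of the graph, which is exactly the point made above.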

4.2 Delaunay Triangulations

In [7], Wilson and Hancock test their structural matching algorithm by generating Delaunay triangulations of random point sets (approximately 40 points). They perturb the resulting graphs by adding or removing points, which causes adjustments in the triangulation structure. The same type of test has been used to test this implementation, although much smaller triangulations have been used (6 points).

Delaunay triangulations of six different random point sets were generated. Each of these was randomly perturbed, and the graph matching algorithm was used to match the perturbed result to the original graph. The perturbation took the form of 0, 1, or 2 edits, after which the triangulation was recomputed. The coordinates of the resulting graph were then perturbed by uniform noise with ranges of 0.025, 0.05, or 0.125. An edit was either a node deletion or addition. The random points were created in the interval [0,1], so noise of 0.125 units could move points quite far relative to their original configuration. There were thus nine different levels of perturbation. For each level, each of the 6 different graphs was randomly perturbed and matched 5 times. The matching algorithm made use of the spatial attributes and the neighborhood information, but did not require an XOR node.

In all cases, a unimodal solution was obtained. Not surprisingly, the entropy of the distribution was lowest, and the mean matching success rate was highest, with the least amount of perturbation. The results are summarized in the following tables. A typical example match is shown in Figure 5, while an example of a poor matching result is shown in Figure 6. The system is far more sensitive to the perturbations due to graph edits than due to noise. Even with the highest amount of noise used, the matching was perfect for all cases with zero edits.
While the matches to the edited graphs are poorer, one must bear in mind that in many cases it was not possible to match six out of six nodes as one or more of the original nodes had been deleted.

Number of edits   noise=0.025   noise=0.05   noise=0.125
0                 6/6/6         6/6/6        6/6/6
1                 3/4.97/7      1/4.83/7     3/4.80/7
2                 2/4.20/6      0/3.83/8     0/3.63/6

Table 1: Results of matching a triangulation to a corrupted version of itself. The numbers indicate the minimum/mean/maximum number of correct matches.

Number of edits   noise=0.025   noise=0.05   noise=0.125
0                 1.5451        1.6006       1.7529
1                 5.4213        5.7622       6.0153
2                 7.0979        6.6123       6.0229

Table 2: Mean entropy of the joint probability distribution of the computed match between graphs.

[Plot: “Matching 5 2 with 2 edits, noise 0.1; Modes 1; Entropy 5.2513; Matched 5/6”]

Figure 5: Typical matching between a triangulation (left) and a corrupted version of it (right)

[Plot: “Matching 2 3 with 2 edits, noise 0.25; Modes 1; Entropy 5.4452; Matched 1/6”]

Figure 6: Poor matching between a triangulation (left) and a corrupted version of it (right)

4.3 Performance analysis

The Matlab profiler was used to keep track of the amount of time used by various components of the graph matching process. The 270 Delaunay triangulation matches described above took an average of 42.5s to run⁵. Considering that the great majority of the Matlab functions used were interpreted, not compiled, this performance is not too bad. It is interesting to note that the inference itself did not require the most time. Of the processing time, 65.1% was spent building the Bayes nets and their probability distributions. Only 14.4% was spent doing inference. The remainder was spent on various housekeeping tasks such as plotting graphics, loading and saving results, and so forth. Since many of the nodes in the Bayes net have the same distributions, there is an immediate potential for speed-up by caching the distributions when the network is being built.

5 Discussion

This project has demonstrated a prototype system that implements the Bayesian network model for graph matching proposed in [10]. Testing similar to that done in [7] was conducted. Relatively simple datasets were used due to processing time constraints, but results indicate that the algorithm works well. The use of spatial attributes as constraints tends to greatly simplify the matching problem. Multimodal results are uncommon, and vertex uniqueness does not become an issue. When structural matching is used alone, however, there are often multiple equally valid results. While this may appear to cause confusion, there may also be hidden benefits. It is interesting to note that the multimodal nature of the matching of the diamond graph, for example, corresponds to its possible symmetries and reflections.

5.1 Controlling Complexity

This method achieves the goals of generality and global optimization. Unfortunately, these come with a price in algorithmic complexity. Even with small numbers of nodes, our graph matching becomes

⁵On Matlab 6.5 Release 13, running on Windows 2000 on a 400 MHz Pentium III with 256 MB of RAM.

computationally onerous. It will be necessary to find ways to prune the search space before this method can be used on practical problems. Two proposals for reducing the search space in a coherent way are made here.

5.1.1 Limiting the XOR function

When it is required, the XOR node connects to every single sample node in the network. Needless to say, this creates a very large clique, which leads to long processing times. However, it may not be strictly necessary to enforce a uniqueness constraint on every node. A trivial example is that the uniqueness of a node and its neighbors is already enforced by the neighborhood relationships. Thus a node really only needs to be kept unique from its neighbors once removed. The problem of finding a set of nodes which are once removed from each other is known as the vertex cover problem. Regrettably, it is NP-complete; however, a number of tractable methods for graph decimation do exist (e.g. [19]).
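The standard greedy 2-approximation for vertex cover would be one tractable way to choose which nodes receive uniqueness checks; a Python sketch:

```python
def greedy_vertex_cover(edges):
    """Classic 2-approximation: walk the edge list, and whenever an edge is
    still uncovered, add both of its endpoints to the cover.  The result is
    at most twice the size of an optimal vertex cover."""
    cover = set()
    for (u, v) in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover
```

Connecting the XOR node only to such a cover, rather than to every sample node, would shrink the largest clique while still touching every edge of the graph.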

5.1.2 Limiting the possible set of labellings

Several authors have pointed out [6, 7] that reducing the possible set of labellings of the sample graph greatly improves the search time, and a number of ad hoc approaches have been used. This problem has been addressed in other research areas, particularly in constraint satisfaction. It would be interesting to apply the node-consistency and arc-consistency algorithms used in constraint programming to the graph matching problem. Highly efficient algorithms exist for these, and they are known to prune the search space well [14, 13]. These algorithms could give a formal framework to existing heuristics for reducing the number of labels. For instance, applying node consistency would remove those candidate labels whose attributes are too incompatible with a given sample vertex, while arc consistency would remove from consideration labellings where the two endpoints of an edge are incompatible with the labelling.
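The arc-consistency step could be sketched along the lines of the classic AC-3 algorithm [13]. The code below is an illustration only, assuming a hypothetical `compatible(label_u, label_v)` predicate derived from the edge attributes; node consistency would already have filtered each domain against unary attribute checks before this runs.

```python
from collections import deque

def ac3(domains, edges, compatible):
    """AC-3 style pruning of candidate label sets.

    domains:    dict mapping each sample node to its set of candidate labels
    edges:      iterable of (u, v) pairs, the sample graph's edges
    compatible: predicate on a pair of labels (assumed symmetric here)

    Removes, in place, any label that has no compatible support in a
    neighboring node's domain, and returns the pruned domains.
    """
    neighbors = {u: set() for u in domains}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    queue = deque((u, v) for u in domains for v in neighbors[u])
    while queue:
        u, v = queue.popleft()
        # Labels of u with no compatible label left in v's domain.
        pruned = {a for a in domains[u]
                  if not any(compatible(a, b) for b in domains[v])}
        if pruned:
            domains[u] -= pruned
            # u's domain shrank, so arcs into u must be re-examined.
            queue.extend((w, u) for w in neighbors[u] if w != v)
    return domains
```

Run on the candidate label sets before inference, this would shrink the sample nodes' state spaces, and hence the Bayesian network's clique sizes, without excluding any globally consistent labelling.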

5.2 Further investigation

There are a number of interesting questions that could not be investigated within the scope of this project. Perchant and Bloch showed that matching edges is more or less equivalent to matching based on neighborhood constraints [11]. Nevertheless, it would be interesting to investigate approaches to graph matching that match on edges as well as vertices; there may well be applications where one approach or the other has an advantage. In the triangle matching problem, a human observer sees the outer triangle in the graph very strongly due to the process of perceptual grouping. The outer boundary is grouped together by the principle of good continuation: a line which continues straight through a junction has more merit than one that bends at the junction. This property is not dealt with at all in the present graph matching approach. It would be interesting to encode additional constraints that group together subunits of a graph based on their perceptual properties. There are numerous examples where perceptual grouping could be useful, such as the analysis of maps, diagrams, or handwriting.

References

[1] D. Rivière, J.-F. Mangin, D. Papadopoulos-Orfanos, J.-M. Martinez, V. Frouin, and J. Régis, “Automatic recognition of cortical sulci of the human brain using a congregation of neural networks,” Medical Image Analysis, vol. 6, no. 2, pp. 77–92, 2002.

[2] H. B. Enderton, A Mathematical Introduction to Logic. Academic Press, 1972.

[3] E. Weisstein, Eric Weisstein's World of Mathematics. Online book, http://mathworld.wolfram.com/topics/GraphTheory.html.

[4] P. Crescenzi, V. Kann, M. Halldórsson, M. Karpinski, and G. Woeginger, “A compendium of NP optimization problems.” Online book, http://www.nada.kth.se/~viggo/wwwcompendium/wwwcompendium.html, 2002.

[5] E. Bengoetxea, Inexact Graph Matching Using Estimation of Distribution Algorithms. PhD thesis, École Nationale Supérieure des Télécommunications (Paris), October 2002.

[6] W. J. Christmas, Structural Matching in Computer Vision using Probabilistic Reasoning. PhD thesis, University of Surrey, September 1995.

[7] R. C. Wilson and E. R. Hancock, “Structural matching by discrete relaxation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 6, pp. 634–648, 1997.

[8] R. A. Hummel and S. W. Zucker, “On the foundation of relaxation labeling processes,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. PAMI-5, no. 3, pp. 267–287, 1983.

[9] R. C. Wilson and E. R. Hancock, “A Bayesian compatibility model for graph matching,” Pattern Recognition Letters, vol. 17, pp. 263–276, 1996.

[10] S.-J. Farmer, “Using probabilistic networks for graph matching.” http://www-users.cs.york.ac.uk/~sara/project/publications/00submission/netmatchart.ps, 2001.

[11] A. Perchant and I. Bloch, “Fuzzy morphisms between graphs,” Fuzzy Sets and Systems, vol. 128, pp. 149–168, 2002.

[12] E. Bengoetxea, P. Larrañaga, I. Bloch, A. Perchant, and C. Boeres, “Inexact graph matching by means of estimation of distribution algorithms,” Pattern Recognition, vol. 35, pp. 2867–2880, 2002.

[13] K. Marriott and P. J. Stuckey, Programming with Constraints. MIT Press, 1998.

[14] R. Barták, “Constraint programming: In pursuit of the holy grail,” in Proceedings of the Week of Doctoral Students, Part IV, pp. 555–564, MatFyzPress, Prague, 1999.
[15] “Matlab.” Software program published by Mathworks Inc., http://www.mathworks.com, 2003.

[16] K. Murphy, “Bayes net toolbox for Matlab.” Software program available at http://www.ai.mit.edu/~murphyk/Software/BNT/bnt.html, 2003.

[17] “Graphviz: graph visualization software.” Software program published by AT&T Research, available at http://www.graphviz.org, 2003.

[18] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.

[19] J.-M. Jolion, “Stochastic pyramid revisited,” Pattern Recognition Letters, vol. 24, no. 8, pp. 1035–1042, 2003.
