2.5D Elastic Graph Matching Algorithms
Stefanos Zafeiriou†, Maria Petrou†, Vasileios Argyriou∗
†Imperial College London, ∗Kingston University
{s.zafeiriou,maria.petrou}@imperial.ac.uk, {v.argyriou}@kingston.ac.uk
2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops. 978-1-4244-4441-0/09/$25.00 ©2009 IEEE

Abstract

In this paper, we propose a series of advances in Elastic Graph Matching (EGM) for face recognition assisted by the availability of 3D facial geometry. More specifically, we conceptually extend the EGM algorithm in order to exploit the 3D nature of human facial geometry for face recognition/verification. To achieve that, we first extend the matching module of the EGM algorithm so that it capitalizes on the 2.5D facial data. Moreover, we incorporate the 3D geometry into the multiscale analysis and build a novel geodesic multiscale morphological pyramid of dilations/erosions in order to fill the graph jets. We demonstrate the efficiency of the proposed advances in the face recognition/verification problem.

1. Introduction

Dynamic link architecture (DLA) has been proposed as an abstract methodology for distortion invariant object recognition [9]. In DLA, an object is represented as a connected graph. Each graph node is located at a certain spatial image location $\mathbf{x}$. A feature vector, the so-called jet, is attached to each graph node. The jet elements can be local brightness values that represent the image region around the node. However, it is desirable to have more complex types of jet that are produced by multiscale image analysis [9]. The representation of an object in DLA can be summarized in the following steps: 1) Group all the features that correspond to the same graph node of the object into a jet. 2) Group all the nodes and jets that belong to the object in order to form the object graph. 3) Define neighborhood relationships for each graph node.

One of the most well-studied implementations of DLA is the Elastic Graph Matching (EGM) algorithm. In [9] EGM was initially proposed for arbitrary object recognition from images, and it has since been a very popular topic of research for various facial image characterization applications. In EGM, a reference object graph is created by overlaying a rectangular elastic sparse graph on the object image and then calculating a Gabor wavelet bank response at each graph node. The graph matching process is implemented by a stochastic optimization of a cost function which takes into account both jet similarities and grid deformations. One of the basic advantages of EGM algorithms is that they can be applied in a fully automatic manner when combined with a face detector and/or a fully automatic facial feature detection algorithm [14, 11, 13].

A lot of research has been conducted in order to boost the performance of EGM for face recognition, face verification and facial expression recognition [5, 8, 14, 13, 11]. In [5], EGM has been proposed and tested for frontal face verification. A variant of the typical EGM, the so-called Morphological Elastic Graph Matching (MEGM), has been proposed for frontal face verification and tested under various recording conditions [8, 14, 13]. EGM with morphological feature vectors for facial expression recognition was proposed in [12].

In this paper, we present EGM methods that exploit the 3D facial geometry. Nowadays, the acquisition of 3D faces is a relatively simple procedure due to the low cost devices that are available. This has made 3D face recognition a very rapidly growing research topic. Another reason for the increased popularity of 3D face recognition as a research topic is that, although 2D face recognition systems can achieve good performance in constrained environments, they still encounter difficulties in handling large amounts of facial variation due to head pose, lighting conditions and facial expressions. However, the acquisition of 3D face geometry is itself quite restricted due to the presence of noise and the difficulty presented in image acquisition [3]. For the acquisition of 3D data, different techniques have been used, such as laser and structured light scanners. The use of these techniques is not very practical, as it requires the cooperation of the subject. We therefore use photometric stereo [2] for 3D acquisition, which does not depend significantly on the user's cooperation. The actual model that we acquire this way is a 2.5D model. 2.5D is a simplified 3D $(x, y, z)$ surface representation that contains at most one depth value ($z$ direction) for every point in the $(x, y)$ plane.

None of the EGM based methods proposed so far takes into consideration the 3D nature of the objects. In this paper we propose methods for exploiting the 3D nature of the objects inside the EGM architecture. That is, the proposed algorithm introduces a new structure for object representation combining both image intensity and 3D geometry information. Moreover, we propose algorithms which can be used for robustly matching the object structure in novel instances. The algorithm is general and can be used for arbitrary object recognition. In this paper we demonstrate the power of the proposed method in face recognition assisted by the available 3D face geometry. We also exploit the available 3D face information in order to advance the matching procedure. Finally, we adopt the assumption that facial expressions are isometries upon the facial surface and we incorporate this inside the matching algorithm. This assumption has been exploited in [4, 10] for creating expression invariant face representations.

The proposed method in its generalized version can be applied in a fully automatic manner, combined with a face detection algorithm. The simplified version of the proposed algorithm requires the detection of the nose. The nose is the most easily detected facial feature when the 3D facial geometry is available. On the contrary, most of the 3D face recognition algorithms proposed so far require the robust detection [7] of a set of facial features, which is in general a difficult task. Furthermore, we incorporate the 3D facial geometry inside the multiscale analysis and we propose a novel Geodesic Morphological Multiscale analysis in order to fill the graph jets.

2. 2D Elastic Graph Matching Techniques

In the first step of the EGM algorithm, a sparse graph suitable for face representation is selected. An evenly distributed rectangular graph is one of the easiest graph forms to handle automatically, since only a face detection algorithm is needed in order to find an initial approximation of the rectangular facial region. The facial image region is analyzed and a set of local descriptors (called jets) is extracted at each graph node. Analysis is usually performed by building an information pyramid using scale-space techniques. In the standard EGM, a 2D Gabor based filter bank has been used for image analysis. The output of multiscale morphological dilation-erosion operations or the morphological signal decomposition at several scales are nonlinear alternatives to the Gabor filters for multiscale analysis, and both have been successfully used for facial image analysis [8]. Morphological feature vectors are robust against plane rotations. A jet based on Gabor filters that is robust against rotation and scaling has been recently proposed in [11].

Formally, at each graph node that is located at image coordinates $\mathbf{x}$, a jet (feature vector) $\mathbf{j}(\mathbf{x})$ is formed:

$$\mathbf{j}(\mathbf{x}) = [f_1(\mathbf{x}), \ldots, f_M(\mathbf{x})]^T, \tag{1}$$

where $f_i(\mathbf{x})$ denotes the output of a local operator applied to image $f$ at the $i$-th scale or at the $i$-th pair (scale, orientation) and $M$ is the jet dimensionality.

The next step of EGM is to match the reference graph on the test facial image in order to find the correspondences of the reference graph nodes on the test image. This is accomplished by minimizing a cost function that employs node jet similarities while preserving at the same time the node neighborhood relationships. Let the subscripts $t$ and $r$ denote a test and a reference facial image (or graph), respectively. The $L_2$ norm between the feature vectors at the $l$-th graph node of the reference and the test graph is used as a similarity measure between jets. Another similarity measure that is used is the cosine between the two vectors:

$$C_f(\mathbf{j}(\mathbf{x}_t^l), \mathbf{j}(\mathbf{x}_r^l)) = \frac{\mathbf{j}(\mathbf{x}_r^l)^T \mathbf{j}(\mathbf{x}_t^l)}{\|\mathbf{j}(\mathbf{x}_r^l)\| \, \|\mathbf{j}(\mathbf{x}_t^l)\|}. \tag{2}$$

Let $\mathcal{V}$ be the set of all graph vertices of a certain facial image. For the rectangular graphs, all nodes apart from the boundary nodes have exactly four connected nodes. Let $\mathcal{N}(l)$ be the four-connected neighborhood of node $l$. In order to quantify the node neighborhood relationships, the relative normalized local node deformation is used:

$$C_d(\mathbf{x}_t^l, \mathbf{x}_r^l) = \sum_{\xi \in \mathcal{N}(l)} \frac{\|(\mathbf{x}_t^l - \mathbf{x}_t^\xi) - (\mathbf{x}_r^l - \mathbf{x}_r^\xi)\|}{\|\mathbf{x}_r^l - \mathbf{x}_r^\xi\|}. \tag{3}$$

The set of vertices $\{\mathbf{x}_t^l(r)_{\mathrm{opt}}\}_{l=1}^{L}$ can be found by minimizing:

$$\{\mathbf{x}_t^l(r)_{\mathrm{opt}}\}_{l=1}^{L} = \arg\min_{\{\mathbf{x}_t^l\}} C(\{\mathbf{x}_t^l\}) = \arg\min_{\{\mathbf{x}_t^l\}} \sum_{l \in \mathcal{V}} \left\{ -C_f(\mathbf{j}(\mathbf{x}_t^l), \mathbf{j}(\mathbf{x}_r^l)) + \lambda C_d(\mathbf{x}_t^l, \mathbf{x}_r^l) \right\}. \tag{4}$$

The choice of $\lambda$ in (4) controls the rigidity/plasticity of the graph [5], [8].

In [8], the optimization of (4) was implemented as a Simulated Annealing (SA) procedure that imposes a global translation on the graph and local node deformations. In [12], in order to deal with face translation, rotation and scaling, the following optimization problem has been formulated:

$$\{\{\boldsymbol{\delta}_l\}_{l=1}^{L}, \mathbf{t}, \mathbf{A}\}_{\mathrm{opt}} = \arg\min_{\{\boldsymbol{\delta}_l\}_{l=1}^{L}, \mathbf{t}, \mathbf{A}} C(\{\mathbf{A}\mathbf{x}_r^l + \mathbf{t} + \boldsymbol{\delta}_l\}). \tag{5}$$

The above problem can be interpreted as follows: find a wrapping matrix $\mathbf{A}$ (which in our case is a scaling and rotation matrix), a global translation $\mathbf{t}$ and a set of local node perturbations $\{\boldsymbol{\delta}_l\}_{l=1}^{L}$ such that the cost (4) is minimized.
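As a rough illustration of the cost that is being minimized, the following minimal sketch evaluates the cosine similarity (2), the local deformation (3) and the total cost (4) on a toy 3 × 3 rectangular graph. All function names, the random jets and the value of `lam` are assumptions made for illustration only; in particular, the jets are kept fixed when nodes move, whereas a real implementation would re-sample them from the test image.

```python
import numpy as np

def jet_similarity(j_t, j_r):
    """Cosine similarity C_f of eq. (2) between a test jet and a reference jet."""
    return float(j_r @ j_t / (np.linalg.norm(j_r) * np.linalg.norm(j_t)))

def node_deformation(x_t, x_r, l, neighbours):
    """Relative normalized local deformation C_d of eq. (3) for node l."""
    total = 0.0
    for xi in neighbours[l]:
        edge_t = x_t[l] - x_t[xi]
        edge_r = x_r[l] - x_r[xi]
        total += np.linalg.norm(edge_t - edge_r) / np.linalg.norm(edge_r)
    return total

def egm_cost(jets_t, jets_r, x_t, x_r, neighbours, lam):
    """Total cost of eq. (4): sum over all nodes of -C_f + lam * C_d."""
    cost = 0.0
    for l in range(len(x_r)):
        cost += -jet_similarity(jets_t[l], jets_r[l])
        cost += lam * node_deformation(x_t, x_r, l, neighbours)
    return cost

def grid_neighbours(rows, cols):
    """Four-connected neighbourhood N(l) on a rows x cols rectangular graph."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            nbrs[r * cols + c] = [rr * cols + cc for rr, cc in cand
                                  if 0 <= rr < rows and 0 <= cc < cols]
    return nbrs

# Toy data (hypothetical): unit-spaced 3 x 3 grid with random 8-dimensional jets.
rows, cols = 3, 3
x_r = np.array([[c, r] for r in range(rows) for c in range(cols)], dtype=float)
rng = np.random.default_rng(0)
jets_r = rng.random((rows * cols, 8)) + 0.1
nbrs = grid_neighbours(rows, cols)

# Identical graphs: every cosine is 1 and there is no deformation.
cost_identity = egm_cost(jets_r, jets_r, x_r, x_r, nbrs, lam=0.5)
# A pure translation leaves all edge vectors, and hence C_d, unchanged.
cost_shifted = egm_cost(jets_r, jets_r, x_r + 2.0, x_r, nbrs, lam=0.5)
# Moving only the centre node is penalized through its four edges.
x_bent = x_r.copy()
x_bent[4] += [0.3, 0.0]
cost_bent = egm_cost(jets_r, jets_r, x_bent, x_r, nbrs, lam=0.5)
```

In this toy setting a global translation costs nothing, while bending a single node raises the cost in proportion to `lam`, which is the rigidity/plasticity trade-off mentioned above.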
The optimal set of test node coordinate vectors is given by $\mathbf{x}_t^l = \mathbf{A}\mathbf{x}_r^l + \mathbf{t} + \boldsymbol{\delta}_l$. The above optimization problem is solved using simulated annealing, as well.

In [12] the above optimization problem has been solved following an SA approach where random translations, scalings and rotations were tested. In order to find the local node perturbation, a local optimization problem is solved for each one of the nodes, as:

$$\{\boldsymbol{\delta}_l\}_{\mathrm{opt}} = \arg\min_{\boldsymbol{\delta}_l} -C_f(\mathbf{j}(\mathbf{x}_r^l), \mathbf{j}(\mathbf{A}\mathbf{x}_r^l + \mathbf{t} + \boldsymbol{\delta}_l)) \quad \text{s.t.} \quad C_d(\mathbf{x}_r^l, \mathbf{A}\mathbf{x}_r^l + \mathbf{t} + \boldsymbol{\delta}_l) \leq \tau_{\max}, \ \text{and} \ \|\boldsymbol{\delta}_l\| \leq \delta_{\max}. \tag{6}$$

The above algorithm can be summarized as follows.

• Initialize the algorithm using an initial guess $\{\mathbf{x}_t^{l(1)}\}_{l=1}^{L}$ (using the result of a face detector).
• For a number of steps
  – choose a global random similarity transformation (i.e., rotation, translation and scaling) in a certain range;
  – for all nodes solve the local constrained optimization problem (6);
  – check whether you accept the new solution $\{\mathbf{x}_t^{l(k)}\}_{l=1}^{L}$.
• End

2.1. A Robust Method using Simulated Annealing with a Reliability Criterion for Nodes

Another EGM method for handling rotation and scaling using Gabor based features was proposed in [11]. Initially a coarse step is applied in order to find an initial graph position. Afterwards, the nodes are randomly visited in order to match them locally. A set of reliable nodes is then selected. The reliability of the nodes is tested as follows. Let $i_1, i_2, i_3$ be three linked nodes. A relative angle, around a test node, is calculated as:

$$\theta^t(i_1, i_2, i_3) = \cos^{-1}\left( \frac{(\mathbf{x}_t^{i_2} - \mathbf{x}_t^{i_1})^T (\mathbf{x}_t^{i_3} - \mathbf{x}_t^{i_1})}{\|\mathbf{x}_t^{i_2} - \mathbf{x}_t^{i_1}\| \, \|\mathbf{x}_t^{i_3} - \mathbf{x}_t^{i_1}\|} \right). \tag{7}$$

Then, a test node is determined as reliable if it satisfies the following criterion:

$$|\theta^t(i_1, i_2, i_3) - \theta^r(i_1, i_2, i_3)| < \tau_\theta. \tag{8}$$

Having found this set of reliable nodes, we extract a wrapping matrix $\mathbf{A}$ and a translation vector $\mathbf{t}$ by solving a simple optimization problem. The optimization problem is as follows:

$$(\mathbf{A}, \mathbf{t})_{\mathrm{opt}} = \arg\min_{\mathbf{A}, \mathbf{t}} \sum_{k \in \mathcal{L}} \|\mathbf{x}_t^k - (\mathbf{A}\mathbf{x}_r^k + \mathbf{t})\|^2, \tag{9}$$

where $\mathcal{L}$ is the set of reliable nodes and $\mathbf{t}$ represents the translational vector between graphs. For a set of six parameters, at least 3 reliable nodes should be identified. After solving (9), the optimal wrapping matrix $\mathbf{A}_{\mathrm{opt}}$ and translation vector $\mathbf{t}_{\mathrm{opt}}$ are applied to all nodes and the whole procedure is repeated. A series of steps, including scaling, rotation, translation and local node perturbations, is then calculated. The algorithm is summarized as follows.

• Initialize the algorithm using a coarse matching procedure $\{\mathbf{x}_t^{l(1)}\}_{l=1}^{L}$.
• For a number of steps
  – for all nodes solve the local optimization problem (6);
  – if the reliable nodes (reliable according to (8)) are more than 3, then estimate the wrapping matrix $\mathbf{A}_{\mathrm{opt}}$ and the translation vector $\mathbf{t}_{\mathrm{opt}}$;
  – re-estimate the position of all nodes according to the wrapping matrix $\mathbf{A}_{\mathrm{opt}}$ and the translation vector $\mathbf{t}_{\mathrm{opt}}$.
• If $\left| \sum_{l \in \mathcal{V}} \{-C_f(\mathbf{j}(\mathbf{x}_t^{l(k)}), \mathbf{j}(\mathbf{x}_r^l))\} - \sum_{l \in \mathcal{V}} \{-C_f(\mathbf{j}(\mathbf{x}_t^{l(k-1)}), \mathbf{j}(\mathbf{x}_r^l))\} \right| < \epsilon$, then stop.

In [11] the nodes were visited randomly and were locally optimized based on an SA strategy. Finally, the SA was cooled after the re-initialization of all nodes. The final measure $\sum_{l \in \mathcal{V}} \{-C_f(\mathbf{j}(\mathbf{x}_t^l), \mathbf{j}(\mathbf{x}_r^l))\}$ was used for quantifying the distance between the reference graph and the test image.

3. 2.5D Matching Algorithm

In this Section we describe the proposed matching algorithm, which makes use of the available 2.5D facial information. The geodesic distances between the points can be calculated given the facial surface. Moreover, they are robust against isometry mappings (or isometries) of the facial surfaces. The facial pose variations are isometries of the facial surfaces. Up to a certain extent, facial expressions can be safely regarded as isometries as well, as long as the mouth remains closed. That is, the geodesic distances before and after the development of an expression are considered to remain (approximately) the same. Recently, a lot of research has been conducted in order to create expression invariant representations for face recognition [4, 10], where facial expressions have been modelled as isometries of the facial surface. The notions of geodesic distance, geodesic path and angle between geodesic paths are pictorially described in Figure 1 a.

By incorporating the isometry assumption in the EGM architecture, a matching procedure is implemented in which
the nodes can be deformed in such a way that the geodesic distances are approximately preserved during the matching procedure. In our case, the 2.5D information was derived from photometric stereo [2] and the integration of the normal field [6]. Thus a height map $g(\mathbf{x}) \in \mathbb{R}$ was acquired for every pixel $\mathbf{x} \in \mathbb{Z}^2$ of the image. Let $X_g$ be a set of connections between the points of the surface which defines the topology of the surface. The set of connections is derived from a triangulation of the image domain (in our implementation we used a simple Delaunay triangulation). Let $G = (g, X_g)$ denote the tuple that contains both the depth map and the topology of the facial surface.

Figure 1. a) Geodesic paths and geodesic angles; b) the facial geometry and the graph; c) the 3D grid; d) the 3D geodesic graph.

Optimization problem (4) was modified in such a way as to find a set of vertices $\{\mathbf{x}_t^l(r), l \in \mathcal{V}\}$ in the test image that minimizes the following cost function:

$$C_G(\{\mathbf{x}_t^l\}) = \sum_{l \in \mathcal{V}} \left\{ -C_f(\mathbf{j}(\mathbf{x}_t^l), \mathbf{j}(\mathbf{x}_r^l)) + \lambda \sum_{\xi \in \mathcal{N}(l)} \frac{|d_G(\mathbf{x}_t^l, \mathbf{x}_t^\xi) - d_G(\mathbf{x}_r^l, \mathbf{x}_r^\xi)|}{d_G(\mathbf{x}_r^l, \mathbf{x}_r^\xi)} \right\}, \tag{10}$$

where $d_G(\mathbf{x}_1, \mathbf{x}_2)$ is the geodesic distance between points $\mathbf{x}_1$ and $\mathbf{x}_2$ of the triangulated surface $G$. If we consider that the neighborhood of node $l \in \mathcal{V}$ is sufficiently local (i.e., we have a dense graph and a topology without holes) then we may approximate:

$$d_G(\mathbf{x}_1, \mathbf{x}_2) \approx \|\mathbf{z}(\mathbf{x}_1) - \mathbf{z}(\mathbf{x}_2)\|, \tag{11}$$

where $\mathbf{z}(\mathbf{x}) = [\mathbf{x}^T \ g(\mathbf{x})]^T$.

The corresponding local optimization problem (6), used to estimate the local perturbation, is now formulated as:

$$\{\boldsymbol{\delta}_l\}_{\mathrm{opt}} = \arg\min_{\boldsymbol{\delta}_l} -C_f^2(\mathbf{j}(\mathbf{x}_r^l), \mathbf{j}(\mathbf{A}\mathbf{x}_r^l + \mathbf{t} + \boldsymbol{\delta}_l)) \quad \text{s.t.} \quad d_G(\mathbf{x}_r^l, \mathbf{A}\mathbf{x}_r^l + \mathbf{t} + \boldsymbol{\delta}_l) \leq \tau, \ \text{and} \ \|\boldsymbol{\delta}_l\| \leq \delta_{\max}. \tag{12}$$

The local optimization problem is schematically described in Figure 4 a.

The proposed EGM algorithm that uses the available 3D facial geometry is as follows. First, a coarse step is applied and a first approximation of the graph is calculated. The coarse step can be a simple implementation of the step described in the previous Section. For every node a constrained local search is performed in order to solve (12). Moreover, instead of using $\mathbf{x}$, $\mathbf{z}(\mathbf{x})$ can be used so as to find a $3 \times 3$ wrapping matrix $\mathbf{A}$. In this case at least 3 reliable nodes should be identified. After all nodes have been visited, the set of reliable nodes is extracted. In our case, since 2.5D information is available and in order to retain robustness against pose variations, criterion (8) is generalized as follows. Let $i_1, i_2, i_3$ be three linked nodes. The relative angle around a test node is calculated as:

$$\theta^t(i_1, i_2, i_3) = \cos^{-1}(\theta_G(i_1, i_2, i_3)), \tag{13}$$

where $\theta_G$ is the angle between the geodesic path $i_1, i_2$ and the geodesic path $i_1, i_3$ (as shown in Figure 2). In case the nodes are close enough, then:

$$\theta^t(i_1, i_2, i_3) = \cos^{-1}(\theta_G(i_1, i_2, i_3)) \approx \cos^{-1}\left( \frac{(\mathbf{z}(\mathbf{x}_t^{i_3}) - \mathbf{z}(\mathbf{x}_t^{i_1}))^T (\mathbf{z}(\mathbf{x}_t^{i_2}) - \mathbf{z}(\mathbf{x}_t^{i_1}))}{\|\mathbf{z}(\mathbf{x}_t^{i_3}) - \mathbf{z}(\mathbf{x}_t^{i_1})\| \, \|\mathbf{z}(\mathbf{x}_t^{i_2}) - \mathbf{z}(\mathbf{x}_t^{i_1})\|} \right). \tag{14}$$

In any case, all angles are robust against out of plane facial pose changes. Examples of the relative geodesic and grid angles are depicted in Figure 2 a and b.

After the set of reliable nodes is found, a $3 \times 3$ wrapping matrix is calculated by solving an optimization problem similar to (9). Afterwards this wrapping matrix is used for recalculating all node positions. The whole procedure is repeated in a similar way as the algorithm in the previous Section. A matching result is shown in Figures 3 a and b.

In order to simplify the procedure, we shall introduce a modification of the generalized algorithm proposed above. Assume that only certain facial features, like the nose, are available. The nose is one of the most easily detected features upon the facial surface. Given the position of the nose (points $\mathbf{x}^l_{\mathrm{nose}}$ and $\mathbf{x}^t_{\mathrm{nose}}$), an initial estimation for the position of the graph is provided so that we can search for small translations. Afterwards, under the assumption that facial expressions are isometries on the facial surface, we confine our local search, in the local matching procedure, to areas in which the geodesic distance between the position of the reference node and the position of the test node does not significantly change. Given the position of the node we can
modify the local search problem as:

$$\min_{\boldsymbol{\delta}_l} \{-C_f(\mathbf{j}(\mathbf{x}_r^l + \boldsymbol{\delta}_l), \mathbf{j}(\mathbf{x}_r^l))\} \quad \text{s.t.} \quad \frac{|d_G(\mathbf{x}_r^l, \mathbf{x}^l_{\mathrm{nose}}) - d_G(\mathbf{x}_r^l + \boldsymbol{\delta}_l, \mathbf{x}^t_{\mathrm{nose}})|}{d_G(\mathbf{x}_r^l, \mathbf{x}^l_{\mathrm{nose}})} \leq \tau_1, \quad \|\boldsymbol{\delta}_l\| \leq \delta_{\max}. \tag{15}$$

The new local search is pictorially described in Figure 4 b.

Figure 2. a) Relative geodesic angles in (13); b) relative angles in case (14). (Both panels are drawn on X, Y, Z axes.)

4. Geodesic Multiscale Morphological Elastic Graph Matching

In the previous Section we described a method that incorporates the available 2.5D information in the matching procedure in order to obtain a pose invariant matching. Moreover, the matching is to some extent invariant to facial expressions as well. In this Section, we present features that are robust against facial pose variations and facial expressions. To that end we describe a Multiscale Morphological Analysis (MMA) that is robust against pose variations, the so-called Geodesic MMA (GMMA). Moreover, under the isometry assumption (i.e., preservation of geodesic distances) the proposed MMA can be, to some extent, robust under facial expression development.

MMA for facial image analysis was initially proposed in [8] and has been well incorporated in a Morphological EGM (MEGM) architecture. Another modified MEGM has been presented in [13]. In MEGM an information pyramid is built using multiscale morphological dilations-erosions. Given an image $f(\mathbf{x}) : \mathcal{D} \subseteq \mathbb{Z}^2 \to \mathbb{R}$ and a structuring function $h(\mathbf{x}) : \mathcal{H} \subseteq \mathbb{Z}^2 \to \mathbb{R}$, the dilation of the image $f(\mathbf{x})$ by $h(\mathbf{x})$ is denoted by $(f \oplus h)(\mathbf{x})$. Its complementary operation, the erosion, is denoted by $(f \ominus h)(\mathbf{x})$ [8]. The multiscale dilation-erosion pyramid of the image $f(\mathbf{x})$ by $h_\sigma(\mathbf{x})$ is defined as:

$$(f \ast h_\sigma)(\mathbf{x}) = \begin{cases} (f \oplus h_\sigma)(\mathbf{x}) & \text{if } \sigma > 0 \\ f(\mathbf{x}) & \text{if } \sigma = 0 \\ (f \ominus h_{|\sigma|})(\mathbf{x}) & \text{if } \sigma < 0 \end{cases} \tag{16}$$

where $\sigma$ denotes the scale parameter of the structuring function. In [8] it was shown that the choice of the structuring function does not lead to statistically significant changes in the verification performance. However, it affects the computational complexity of feature calculation. The efficient computation of dilations and erosions is a factor of great importance in practical applications.

In this paper we use, for simplicity of computations, only flat structuring elements (i.e., $h_\sigma(\mathbf{v}) = 0, \ \forall \mathbf{v} \in \mathcal{H}_\sigma, \ \|\mathbf{v}\| \leq |\sigma|$). For this type of structuring elements, the dilation and the erosion can be written as:

$$(f \oplus h_\sigma)(\mathbf{x}) = \sup_{\mathbf{y} \in \mathcal{H}_\sigma} f(\mathbf{x} - \mathbf{y}), \qquad (f \ominus h_{|\sigma|})(\mathbf{x}) = \inf_{\mathbf{y} \in \mathcal{H}_\sigma} f(\mathbf{x} - \mathbf{y}). \tag{17}$$

For a flat structuring function, dilations can be efficiently computed by applying running min/max calculations in which the computation of one scale exploits the previous outcome. That is, scale-recursive computations that considerably speed up feature calculation can be used. For a flat structuring function, scale-recursive max computations (i.e., multiscale image dilations) are based on the observation that:

$$(f \oplus h_{\sigma+1})(\mathbf{x}) = \max\left\{ (f \oplus h_\sigma)(\mathbf{x}), \ \max_{\mathbf{y} \in \Delta\mathcal{H}(\sigma+1)} \{f(\mathbf{x} + \mathbf{y})\}, \ \max\{f(\mathbf{x} \pm \mathbf{s})\} \right\}, \tag{18}$$

where $\mathbf{s} = [\sigma + 1 \ \ \sigma + 1]^T$ and the set $\Delta\mathcal{H}(\sigma + 1) = \{\mathbf{y} = [y_1 \ y_2]^T \in \mathbb{Z}^2 : \|\mathbf{y}\|^2 > \sigma^2, \ \|\mathbf{y}\|^2 \leq (\sigma + 1)^2, \ |y_1| \leq \sigma, \ |y_2| \leq \sigma\}$. A similar multiscale analysis can be applied for the calculation of the erosion.

In this paper we capitalize on the available 2.5D information and propose a novel multiscale analysis. First, we define the surface-dependent flat structuring elements around $\mathbf{x}$ as:

$$g_{\sigma, \mathbf{x}}(\mathbf{v}) = 0, \quad \forall \mathbf{v} \in \{\mathbf{y} \in \mathbb{Z}^2 : d_G(\mathbf{v}, \mathbf{x}) \leq |\sigma|\}. \tag{19}$$

The new multiscale morphological analysis, given a height map $g$ with a topology $X_g$, is as follows:

$$(f \oplus g_{\sigma_i})_G(\mathbf{x}) = \max\left\{ (f \oplus g_{\sigma_{i-1}})_G(\mathbf{x}), \ \max_{\mathbf{v} \in \Delta\mathcal{H}_G(\mathbf{x}, \sigma_i)} \{f(\mathbf{v})\} \right\} \tag{20}$$
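To make the surface-dependent structuring element of (19) concrete, the sketch below computes a flat dilation whose support around each pixel is limited by distance along the height map. It is a minimal illustration under two stated assumptions: the geodesic distance is replaced by the local approximation (11), and each scale is computed directly rather than with the scale recursion of (20). The function names `geodesic_dilation` and `geodesic_pyramid` and the toy data are hypothetical.

```python
import numpy as np

def geodesic_dilation(f, g, sigma):
    """Flat dilation of f with support {v : d_G(v, x) <= sigma} around each x.

    Assumption: d_G is approximated as in eq. (11) by the Euclidean distance
    between z = [x, y, g(x, y)] vectors, which only holds locally.
    """
    height, width = f.shape
    out = np.empty_like(f)
    r = int(np.floor(sigma))  # the 3D distance is never below the planar one
    for y in range(height):
        for x in range(width):
            y0, y1 = max(0, y - r), min(height, y + r + 1)
            x0, x1 = max(0, x - r), min(width, x + r + 1)
            yy, xx = np.mgrid[y0:y1, x0:x1]
            d2 = ((yy - y) ** 2 + (xx - x) ** 2
                  + (g[y0:y1, x0:x1] - g[y, x]) ** 2)
            # The centre pixel always satisfies d2 == 0, so the mask is non-empty.
            out[y, x] = f[y0:y1, x0:x1][d2 <= sigma ** 2].max()
    return out

def geodesic_pyramid(f, g, scales):
    """Dilation-erosion pyramid in the spirit of eq. (16), keyed by signed scale."""
    slices = {0: f.copy()}
    for s in scales:
        slices[s] = geodesic_dilation(f, g, s)
        slices[-s] = -geodesic_dilation(-f, g, s)  # erosion by duality
    return slices

# Toy data (hypothetical): a single bright pixel on a 7 x 7 image.
f = np.zeros((7, 7))
f[3, 3] = 1.0
g_flat = np.zeros((7, 7))                       # planar surface
g_steep = np.tile(3.0 * np.arange(7.0), (7, 1)) # slope of 3 along the x axis

flat = geodesic_dilation(f, g_flat, 2)
steep = geodesic_dilation(f, g_steep, 2)
pyr = geodesic_pyramid(f, g_flat, [1, 2])
```

On the planar height map the result coincides with an ordinary dilation by a disk of radius 2, whereas on the steep surface the support shrinks along the sloped direction, because neighbouring pixels are farther apart on the surface than in the image plane.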
Figure 3. a) The 2D reference graph; b) the facial geometry and the graph; c) the matched graph on a test image; d) the facial geometry and the resultant graph.
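A matched graph such as the one in Figure 3 is obtained by repeatedly selecting reliable nodes and re-fitting a global transformation. The 2D version of that step, the angle criterion (7)-(8) followed by the least-squares problem (9), can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names, the threshold value, the choice of testing every pair of neighbours of a node, and the rule that a node is reliable only when all of its angle triples pass (8) are assumptions.

```python
import numpy as np
from itertools import combinations

def grid_neighbours(rows, cols):
    """Four-connected neighbourhood on a rows x cols rectangular graph."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            nbrs[r * cols + c] = [rr * cols + cc for rr, cc in cand
                                  if 0 <= rr < rows and 0 <= cc < cols]
    return nbrs

def neighbour_triples(neighbours):
    """All triples (i1, i2, i3) where i2 and i3 are both linked to i1."""
    return [(l, a, b) for l, nb in neighbours.items()
            for a, b in combinations(nb, 2)]

def relative_angle(x, i1, i2, i3):
    """Angle at node i1 between the edges i1->i2 and i1->i3, eq. (7)."""
    a = x[i2] - x[i1]
    b = x[i3] - x[i1]
    c = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(c, -1.0, 1.0))

def reliable_nodes(x_t, x_r, triples, tau_theta):
    """Nodes i1 for which every tested triple satisfies criterion (8)."""
    ok = {}
    for i1, i2, i3 in triples:
        diff = abs(relative_angle(x_t, i1, i2, i3)
                   - relative_angle(x_r, i1, i2, i3))
        ok[i1] = ok.get(i1, True) and diff < tau_theta
    return sorted(i for i, good in ok.items() if good)

def fit_affine(x_r, x_t, nodes):
    """Least-squares (A, t) of eq. (9), using only the reliable nodes."""
    P = np.hstack([x_r[nodes], np.ones((len(nodes), 1))])  # rows [x_r^T, 1]
    sol, *_ = np.linalg.lstsq(P, x_t[nodes], rcond=None)   # stacked [A^T; t^T]
    return sol[:2].T, sol[2]

# Toy data (hypothetical): a 3 x 3 grid, rotated, scaled and translated,
# with the centre node pushed away from its ideal position.
rows, cols = 3, 3
x_r = np.array([[c, r] for r in range(rows) for c in range(cols)], dtype=float)
phi = np.deg2rad(10.0)
A_true = 1.1 * np.array([[np.cos(phi), -np.sin(phi)],
                         [np.sin(phi),  np.cos(phi)]])
t_true = np.array([2.0, -1.0])
offset = np.zeros_like(x_r)
offset[4] = [0.4, 0.3]
x_t = (x_r + offset) @ A_true.T + t_true

triples = neighbour_triples(grid_neighbours(rows, cols))
reliable = reliable_nodes(x_t, x_r, triples, tau_theta=0.1)
if len(reliable) >= 3:
    A_est, t_est = fit_affine(x_r, x_t, reliable)
```

In this toy case the displaced centre node and its four neighbours fail the angle test, and the four corner nodes that remain are enough to recover the six parameters of the transformation.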
Figure 4. a) Local optimization problem (12); b) local optimization problem (15). (Labels in the panels: Optimal Solution, Search Space.)
where $\Delta\mathcal{H}_G(\mathbf{v}, \sigma_i) = \{\mathbf{y} \in \mathbb{Z}^2 : \sigma_{i-1}$