Kernel Principal Component Analysis and Its Applications in Face Recognition and Active Shape Models

Quan Wang ([email protected])
Rensselaer Polytechnic Institute, 110 Eighth Street, Troy, NY 12180 USA
arXiv:1207.3538v3 [cs.CV] 31 Aug 2014

Abstract

Principal component analysis (PCA) is a popular tool for linear dimensionality reduction and feature extraction. Kernel PCA is the nonlinear form of PCA, which better exploits the complicated spatial structure of high-dimensional features. In this paper, we first review the basic ideas of PCA and kernel PCA. Then we focus on the reconstruction of pre-images for kernel PCA. We also give an introduction on how PCA is used in active shape models (ASMs), and discuss how kernel PCA can be applied to improve traditional ASMs. Then we show some experimental results to compare the performance of kernel PCA and standard PCA for classification problems. We also implement kernel PCA-based ASMs, and use them to construct human face models.

(This work originally appeared as the final project of Professor Qiang Ji's course Pattern Recognition at RPI, Troy, NY, USA, 2011. Copyright 2011 by Quan Wang.)

1. Introduction

In this section, we briefly review the principal component analysis method and the active shape models.

1.1. Principal Component Analysis

Principal component analysis, or PCA, is a very popular technique for dimensionality reduction and feature extraction. PCA attempts to find a linear subspace of lower dimensionality than the original feature space, where the new features have the largest variance (Bishop, 2006).

Consider a dataset \{x_i\}, i = 1, 2, \cdots, N, where each x_i is a D-dimensional vector. We want to project the data onto an M-dimensional subspace, where M < D. We write the projection as y = Ax, where A = [u_1^T; \cdots; u_M^T] and u_k^T u_k = 1 for k = 1, 2, \cdots, M. We want to maximize the variance of \{y_i\}, which is the trace of the covariance matrix of \{y_i\}. Thus, we want to find

    A^* = \arg\max_A \, \mathrm{tr}(S_y),    (1)

where

    S_y = \frac{1}{N} \sum_{i=1}^{N} (y_i - \bar{y})(y_i - \bar{y})^T,    (2)

and

    \bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i.    (3)

Let S_x be the covariance matrix of \{x_i\}. Since \mathrm{tr}(S_y) = \mathrm{tr}(A S_x A^T), by using Lagrange multipliers and taking the derivative, we get

    S_x u_k = \lambda_k u_k,    (4)

which means that each u_k is an eigenvector of S_x. Now x_i can be represented as

    x_i = \sum_{k=1}^{D} (x_i^T u_k) u_k.    (5)

x_i can also be approximated by

    \tilde{x}_i = \sum_{k=1}^{M} (x_i^T u_k) u_k,    (6)

where u_k is the eigenvector of S_x corresponding to the kth largest eigenvalue.
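To make the linear case concrete, the following is a minimal NumPy sketch of Eqs. (4)-(6): it estimates the covariance matrix, keeps the M leading eigenvectors, and computes the projected features and the rank-M approximation. The function and variable names are our own, and the snippet is an illustration under these definitions rather than a reference implementation.

```python
import numpy as np

def pca_approximation(X, M):
    """Approximate each row of X (N x D) with its M leading principal
    components, following Eqs. (4)-(6)."""
    S_x = np.cov(X, rowvar=False, bias=True)    # covariance matrix S_x of {x_i}
    eigvals, eigvecs = np.linalg.eigh(S_x)      # S_x u_k = lambda_k u_k, Eq. (4)
    order = np.argsort(eigvals)[::-1]           # sort by decreasing eigenvalue
    U = eigvecs[:, order[:M]]                   # D x M matrix of top-M eigenvectors u_k
    Y = X @ U                                   # features y_k = x^T u_k
    X_tilde = Y @ U.T                           # Eq. (6): sum_k (x^T u_k) u_k
    return Y, X_tilde
```

Note that Eqs. (5)-(6) expand the raw x_i in the eigenvector basis without subtracting the mean; in practice the data is often centered first.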
1.2. Active Shape Models

The active shape model, or ASM, is one of the most popular top-down object fitting approaches. It is designed to represent the complicated deformation patterns of the object shape, and to locate the object in new images. ASMs use the point distribution model (PDM) to describe the shape (Cootes et al., 1995). If a shape consists of n points, and (x_j, y_j) denotes the coordinates of the jth point, then the shape can be represented as a 2n-dimensional vector

    x = [x_1, y_1, \cdots, x_n, y_n]^T.    (7)

To simplify the problem, we now assume that all shapes have already been aligned. Otherwise, a rotation by \theta, a scaling by s, and a translation by t should be applied to x. Given N aligned shapes as training data, the mean shape can be calculated by

    \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i.    (8)

For each shape x_i in the training set, its deviation from the mean shape \bar{x} is

    dx_i = x_i - \bar{x}.    (9)

Then the 2n \times 2n covariance matrix S can be calculated by

    S = \frac{1}{N} \sum_{i=1}^{N} dx_i \, dx_i^T.    (10)

Now we perform PCA on S:

    S p_k = \lambda_k p_k,    (11)

where p_k is the eigenvector of S corresponding to the kth largest eigenvalue \lambda_k, and

    p_k^T p_k = 1.    (12)

Let P be the matrix of the first t eigenvectors:

    P = [p_1, p_2, \cdots, p_t].    (13)

Then we can approximate a shape in the training set by

    x = \bar{x} + P b,    (14)

where b = [b_1, b_2, \cdots, b_t]^T is the vector of weights for different deformation patterns. By varying the parameters b_k, we can generate new examples of the shape. We can also limit each b_k to constrain the deformation patterns of the shape. Typical limits are

    -3\sqrt{\lambda_k} \le b_k \le 3\sqrt{\lambda_k},    (15)

where k = 1, 2, \cdots, t. Another important issue of ASMs is how to search for the shape in new images using point distribution models. This problem is beyond the scope of our paper, and here we only focus on the statistical model itself.

2. Kernel PCA

Standard PCA only allows linear dimensionality reduction. However, if the data has more complicated structures which cannot be well represented in a linear subspace, standard PCA will not be very helpful. Fortunately, kernel PCA allows us to generalize standard PCA to nonlinear dimensionality reduction (Schölkopf et al., 1999).

2.1. Constructing the Kernel Matrix

Assume we have a nonlinear transformation \phi(x) from the original D-dimensional feature space to an M-dimensional feature space, where usually M \gg D. Then each data point x_i is projected to a point \phi(x_i). We can perform standard PCA in the new feature space, but this can be extremely costly and inefficient. Fortunately, we can use kernel methods to simplify the computation (Schölkopf et al., 1998).

First, we assume that the projected new features have zero mean:

    \frac{1}{N} \sum_{i=1}^{N} \phi(x_i) = 0.    (16)

The covariance matrix of the projected features is M \times M, calculated by

    C = \frac{1}{N} \sum_{i=1}^{N} \phi(x_i) \phi(x_i)^T.    (17)

Its eigenvalues and eigenvectors are given by

    C v_k = \lambda_k v_k,    (18)

where k = 1, 2, \cdots, M. From Eq. (17) and Eq. (18), we have

    \frac{1}{N} \sum_{i=1}^{N} \phi(x_i) \{\phi(x_i)^T v_k\} = \lambda_k v_k,    (19)

which can be rewritten as

    v_k = \sum_{i=1}^{N} a_{ki} \phi(x_i).    (20)

Now by substituting v_k in Eq. (19) with Eq. (20), we have

    \frac{1}{N} \sum_{i=1}^{N} \phi(x_i) \phi(x_i)^T \sum_{j=1}^{N} a_{kj} \phi(x_j) = \lambda_k \sum_{i=1}^{N} a_{ki} \phi(x_i).    (21)

If we define the kernel function

    \kappa(x_i, x_j) = \phi(x_i)^T \phi(x_j),    (22)

and multiply both sides of Eq. (21) by \phi(x_l)^T, we have

    \frac{1}{N} \sum_{i=1}^{N} \kappa(x_l, x_i) \sum_{j=1}^{N} a_{kj} \kappa(x_i, x_j) = \lambda_k \sum_{i=1}^{N} a_{ki} \kappa(x_l, x_i).    (23)

We can use the matrix notation

    K^2 a_k = \lambda_k N K a_k,    (24)

where

    K_{i,j} = \kappa(x_i, x_j),    (25)

and a_k is the N-dimensional column vector of a_{ki}:

    a_k = [a_{k1}, a_{k2}, \cdots, a_{kN}]^T.    (26)

a_k can be solved by

    K a_k = \lambda_k N a_k,    (27)

and the resulting kernel principal components can be calculated using

    y_k(x) = \phi(x)^T v_k = \sum_{i=1}^{N} a_{ki} \kappa(x, x_i).    (28)

If the projected dataset \{\phi(x_i)\} does not have zero mean, we can use the Gram matrix \tilde{K} to substitute the kernel matrix K. The Gram matrix is given by

    \tilde{K} = K - 1_N K - K 1_N + 1_N K 1_N,    (29)

where 1_N is the N \times N matrix with all elements equal to 1/N (Bishop, 2006).

The power of kernel methods is that we do not have to compute \phi(x_i) explicitly. We can directly construct the kernel matrix from the training data set \{x_i\} (Weinberger et al., 2004). Two commonly used kernels are the polynomial kernel

    \kappa(x, y) = (x^T y)^d,    (30)

or

    \kappa(x, y) = (x^T y + c)^d,    (31)

where c > 0 is a constant, and the Gaussian kernel

    \kappa(x, y) = \exp\left(-\|x - y\|^2 / 2\sigma^2\right),    (32)

with parameter \sigma.
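The following is a minimal NumPy sketch of this subsection for the Gaussian kernel of Eq. (32): it builds the kernel matrix (Eq. (25)), centers it with the Gram matrix of Eq. (29), solves the eigenproblem of Eq. (27), and projects new points with Eq. (28). The function and variable names are ours; the centering of the test kernel values follows standard kernel PCA practice (Schölkopf et al., 1998) even though the text above only states the training-set formula, so treat this as an illustrative sketch rather than the paper's implementation.

```python
import numpy as np

def gaussian_kernel(X, Z, sigma):
    """Pairwise Gaussian kernel kappa(x, z) = exp(-||x - z||^2 / (2 sigma^2)), Eq. (32)."""
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * sigma ** 2))

def kernel_pca_fit(X, m, sigma):
    """Fit kernel PCA on X (N x D); return coefficients A (N x m) for Eq. (28)."""
    N = X.shape[0]
    K = gaussian_kernel(X, X, sigma)                            # kernel matrix, Eq. (25)
    one_N = np.full((N, N), 1.0 / N)
    K_t = K - one_N @ K - K @ one_N + one_N @ K @ one_N         # Gram matrix, Eq. (29)
    eigvals, eigvecs = np.linalg.eigh(K_t)                      # K a_k = N lambda_k a_k, Eq. (27)
    order = np.argsort(eigvals)[::-1][:m]
    # Rescale so that v_k^T v_k = 1, i.e. a_k^T K a_k = 1.
    A = eigvecs[:, order] / np.sqrt(np.maximum(eigvals[order], 1e-12))
    return A, K

def kernel_pca_project(X_new, X_train, A, K_train, sigma):
    """Project new points via Eq. (28), centering the test kernel values as well."""
    N = X_train.shape[0]
    K_new = gaussian_kernel(X_new, X_train, sigma)              # kappa(x, x_i)
    one_N = np.full((N, N), 1.0 / N)
    ones_t = np.full((K_new.shape[0], N), 1.0 / N)
    K_new_t = K_new - ones_t @ K_train - K_new @ one_N + ones_t @ K_train @ one_N
    return K_new_t @ A                                          # y_k(x) = sum_i a_ki kappa(x, x_i)
```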
2.2. Reconstructing Pre-Images

So far, we have discussed how to generate new features y_k(x) using kernel PCA. This is enough for applications such as feature extraction and data classification. However, for some other applications, we need to approximately reconstruct the pre-images \{x_i\} from the kernel PCA features \{y_i\}. This is the case in active shape models, where we not only need to use PCA features to describe the deformation patterns, but also have to reconstruct the shapes from the PCA features (Romdhani et al., 1999; Twining & Taylor, 2001).

In standard PCA, the pre-image x_i can simply be approximated by Eq. (6). However, Eq. (6) cannot be used for kernel PCA (Bakır et al., 2004). For kernel PCA, we define a projection operator P_m which projects \phi(x) to its approximation

    P_m \phi(x) = \sum_{k=1}^{m} y_k(x) v_k,    (33)

where v_k is the eigenvector of the matrix C defined by Eq. (17). If m is large enough, we have P_m \phi(x) \approx \phi(x). Since finding the exact pre-image x is difficult, we turn to finding an approximation z such that

    \phi(z) \approx P_m \phi(x).    (34)

This can be done by minimizing

    \rho(z) = \|\phi(z) - P_m \phi(x)\|^2.    (35)

2.3. Pre-Images for Gaussian Kernels

There are some existing techniques to compute z for specific kernels (Mika et al., 1999). For a Gaussian kernel \kappa(x, y) = \exp(-\|x - y\|^2 / 2\sigma^2), z should satisfy

    z = \frac{\sum_{i=1}^{N} \gamma_i \exp(-\|z - x_i\|^2 / 2\sigma^2) \, x_i}{\sum_{i=1}^{N} \gamma_i \exp(-\|z - x_i\|^2 / 2\sigma^2)},    (36)

where, combining Eq. (20) with Eq. (33), \gamma_i = \sum_{k=1}^{m} y_k(x) a_{ki}.
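A common way to solve Eq. (36) is to iterate it as a fixed-point update, as in Mika et al. (1999). The sketch below, with names and an initial-guess strategy of our own choosing, computes the coefficients \gamma_i from the kernel PCA features and coefficients, then repeatedly applies the right-hand side of Eq. (36) until convergence; it is an illustrative sketch under these assumptions, not the authors' implementation.

```python
import numpy as np

def gaussian_pre_image(y, A, X_train, sigma, z0, n_iter=100, tol=1e-8):
    """Approximate the pre-image z of kernel PCA features y (length m)
    by fixed-point iteration of Eq. (36) for the Gaussian kernel.

    A       : N x m matrix of coefficients a_ki from Eq. (27)
    X_train : N x D training data x_i
    z0      : D-dimensional initial guess (e.g. the mean shape or a training sample)
    """
    gamma = A @ y                                      # gamma_i = sum_k y_k a_ki, from Eqs. (20), (33)
    z = z0.astype(float)
    for _ in range(n_iter):
        sq = ((X_train - z) ** 2).sum(axis=1)          # ||z - x_i||^2
        w = gamma * np.exp(-sq / (2.0 * sigma ** 2))   # weights in Eq. (36)
        denom = w.sum()
        if abs(denom) < 1e-12:                         # numerically degenerate; stop early
            break
        z_new = (w[:, None] * X_train).sum(axis=0) / denom
        if np.linalg.norm(z_new - z) < tol:            # converged
            z = z_new
            break
        z = z_new
    return z
```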
