Whitened LDA for Face Recognition ∗

Vo Dinh Minh Nhat, Ubiquitous Computing Lab, Kyung Hee University, Suwon, Korea
SungYoung Lee, Ubiquitous Computing Lab, Kyung Hee University, Suwon, Korea
Hee Yong Youn, Mobile Computing Lab, SungKyunKwan University, Suwon, Korea

∗ This research was supported by the MIC (Ministry of Information and Communication), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Advancement) (IITA-2006-C1090-0602-0002).

ABSTRACT

Over the years, many Linear Discriminant Analysis (LDA) algorithms have been proposed for the study of high dimensional data in a large variety of problems. An intrinsic limitation of classical LDA is the so-called "small sample size (3S) problem": it fails when all scatter matrices are singular. Many LDA extensions were proposed in the past to overcome the 3S problem. However, none of the previous methods solves the 3S problem completely, in the sense of keeping all the discriminative features at a low computational cost. By applying LDA after whitening the data, we propose Whitened LDA (WLDA), which can find the most discriminant features without facing the 3S problem. In WLDA, only eigenvalue problems, instead of generalized eigenvalue problems, need to be solved, which leads to the low computational cost of WLDA. Experimental results are reported on the two most popular databases, Yale and ORL, with comparisons against Linear Discriminant Analysis (LDA), Direct LDA (DLDA), Null space LDA (NLDA) and several recently developed matrix-based subspace analysis approaches. Our method achieves the best recognition accuracy in all these comparisons.

Categories and Subject Descriptors

I.5.4 [PATTERN RECOGNITION]: Applications

General Terms

Algorithms, Theory, Experimentation, Performance

Keywords

Face Recognition, Data Whitening, LDA

1. INTRODUCTION

A facial recognition system is a computer-driven application for automatically identifying a person from a digital image. It does so by comparing selected facial features in the live image with a facial database. With the rapidly increasing demand for face recognition technology, it is not surprising to see an overwhelming amount of research publications on this topic in recent years. As in other classical statistical pattern recognition tasks, we usually represent data samples with n-dimensional vectors, i.e., the data is vectorized before applying any technique. However, in many real applications the dimension of those 1D data vectors is very high, leading to the "curse of dimensionality". The curse of dimensionality is a significant obstacle in pattern recognition and machine learning problems that involve learning from few data samples in a high-dimensional feature space. In face recognition, Principal Component Analysis (PCA) [5] and Linear Discriminant Analysis (LDA) [1] are the most popular subspace analysis approaches for learning the low-dimensional structure of high-dimensional data. PCA is a subspace projection technique widely used for face recognition. It finds a set of representative projection vectors such that the projected samples retain most of the information about the original samples; the most representative vectors are the eigenvectors corresponding to the largest eigenvalues of the covariance matrix. Unlike PCA, LDA finds a set of vectors that maximizes the Fisher Discriminant Criterion: it simultaneously maximizes the between-class scatter while minimizing the within-class scatter in the projected feature space. While PCA can be called an unsupervised learning technique, LDA is a supervised learning technique because it needs class information for each image in the training process. This method overcomes the limitations of the Eigenface method by applying Fisher's Linear Discriminant criterion.
This criterion tries to maximize the ratio

\frac{w^T S_b w}{w^T S_w w}    (1)

where S_b is the between-class scatter matrix and S_w is the within-class scatter matrix. Thus, by applying this method, we find the projection directions that on the one hand maximize the Euclidean distance between the face images of different classes and on the other hand minimize the distance between the face images of the same class. This ratio is maximized when the column vectors of the projection matrix W are the eigenvectors of S_w^{-1} S_b.
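For concreteness, the following is a minimal NumPy sketch of the classical LDA solution just described (the leading eigenvectors of S_w^{-1} S_b). It assumes S_w is nonsingular, and the function and variable names are our own illustration rather than code from the paper.

```python
import numpy as np

def lda_directions(X, labels, m):
    """Classical LDA: return the m leading eigenvectors of Sw^{-1} Sb.

    X      : (N, n) matrix, one vectorized face image per row
    labels : (N,) array of class labels
    m      : number of discriminant directions (at most C - 1)
    Assumes Sw is nonsingular, which is exactly what fails in the
    small-sample-size setting discussed next.
    """
    N, n = X.shape
    mu = X.mean(axis=0)
    Sb = np.zeros((n, n))
    Sw = np.zeros((n, n))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mu_c = Xc.mean(axis=0)
        Sb += len(Xc) * np.outer(mu_c - mu, mu_c - mu)
        Sw += (Xc - mu_c).T @ (Xc - mu_c)
    Sb /= N
    Sw /= N
    # Columns of W are the eigenvectors of Sw^{-1} Sb with largest eigenvalues.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1]
    return evecs[:, order[:m]].real          # projection matrix W, shape (n, m)
```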

In face recognition tasks, this method cannot be applied directly, since the dimension of the sample space is typically larger than the number of samples in the training set. As a consequence, S_w is singular. This problem is known as the "small sample size problem" [3]. Many methods have been proposed to solve this problem, and reviews of those methods can be found in the related literature. For the sake of completeness, we summarize here the most notable approaches to overcoming the 3S problem. In [1], a two-stage PCA+LDA method, also known as the Fisherface method, was proposed, in which PCA is first used for dimension reduction so as to make S_w nonsingular before the application of LDA. However, in order to make S_w nonsingular, some directions corresponding to the small eigenvalues of S_w are thrown away in the PCA step; thus, applying PCA for dimensionality reduction has the potential to remove dimensions that contain discriminative information. In [12], S_w is made nonsingular by adding a small perturbation matrix ∆ to it. However, this method is computationally very expensive and is not considered in the experimental part of this paper. The Direct-LDA method is proposed in [11]. First, the null space of S_b is removed and, then, the projection vectors that minimize the within-class scatter in the transformed space are selected from the range space of S_b. However, removing the null space of S_b by dimensionality reduction also removes part of the null space of S_w and may result in the loss of important discriminative information. In [2], Chen et al. proposed the null space based LDA (NLDA), where the between-class scatter is maximized in the null space of the within-class scatter matrix; the singularity problem is thus implicitly avoided. Huang et al. [4] improved the efficiency of the algorithm by first removing the null space of the total scatter matrix. In orthogonal LDA (OLDA) [8], a set of orthogonal discriminant vectors is computed.

In this paper, we propose Whitened LDA (WLDA). The key idea of WLDA is that we apply data whitening before performing LDA. Using some nice properties of the whitened data, we show how to turn the generalized eigenvalue problem of LDA into a simple eigenvalue problem, leading to the low computational cost of the algorithm. While previous methods lose some discriminant information along the way, WLDA is able to keep all discriminant information. The main contributions of this paper are: a solution to LDA that does not face the 3S problem, the retention of all discriminative features, and a low computational cost. The outline of this paper is as follows. In Section 2, the most important previous and related works are described: PCA, LDA, 2DPCA, GLRAM and 2DLDA. The proposed method is described in Section 3. In Section 4, experimental results are presented for the ORL and Yale face image databases to demonstrate the effectiveness of our method. Finally, conclusions are presented in Section 5.

2. SUBSPACE ANALYSIS

One approach to cope with the problem of excessive dimensionality of the image space is to reduce the dimensionality by combining features. Linear combinations are particularly attractive because they are simple to compute and analytically tractable. In effect, linear methods project the high-dimensional data onto a lower dimensional subspace. Basic notations are described in Table 1 for reference. Suppose that we have N sample images {x_1, x_2, ..., x_N} taking values in an n-dimensional image space, and consider a linear transformation mapping the original n-dimensional image space into an m-dimensional feature space, where m < n. The new feature vectors y_k ∈ ℝ^m are defined by the linear transformation y_k = W^T x_k, k = 1, ..., N, with W ∈ ℝ^{n×m}.

Table 1: Basic Notations

Notation         Description
x_i ∈ ℝ^n        the i-th image point in vector form
X_i ∈ ℝ^{r×c}    the i-th image point in matrix form
Π_i              the i-th class of data points (both in vector and matrix form)
n                dimension of x_i
m                dimension of the reduced feature vector y_i
r                number of rows in X_i
c                number of columns in X_i
N                number of data samples
C                number of classes
N_i              number of data samples in class Π_i
L                transformation on the left side
R                transformation on the right side
l_1              number of rows in Y_i
l_2              number of columns in Y_i

Let µ denote the mean of all samples, µ_i the mean of the samples in class Π_i, and N_i the number of samples in class Π_i. Then the between-class scatter matrix S_b and the within-class scatter matrix S_w are defined as

S_b = \frac{1}{N}\sum_{i=1}^{C} N_i (\mu_i - \mu)(\mu_i - \mu)^T    (4)

S_w = \frac{1}{N}\sum_{i=1}^{C}\sum_{x_k \in \Pi_i} (x_k - \mu_i)(x_k - \mu_i)^T    (5)

In the typical face recognition setting we have

rank(S_t) = N - 1
rank(S_b) = C - 1    (6)
rank(S_w) = N - C
S_t = S_b + S_w

where S_t is the total scatter matrix. In LDA, the projection W_opt is chosen to maximize the ratio of the determinant of the between-class scatter matrix of the projected samples to the determinant of the within-class scatter matrix of the projected samples, i.e., the Fisher criterion

\frac{w^T S_b w}{w^T S_w w}

Several LDA extensions (the Fisherface method, DLDA, NLDA) are compared in our experiments; due to the limited length of the paper, details of those algorithms can be found in the respective references.

2.3 Two-dimensional PCA - 2DPCA

In the 2D approach, the image matrix does not need to be transformed into a vector beforehand, so a set of N sample images is represented as {X_1, X_2, ..., X_N} with X_i ∈ ℝ^{r×c}. The image covariance (scatter) matrix T_t is computed directly from the image matrices,

T_t = \frac{1}{N}\sum_{i=1}^{N} (X_i - M)^T (X_i - M)

with M = \frac{1}{N}\sum_{i=1}^{N} X_i ∈ ℝ^{r×c} the mean image of all samples. A linear transformation maps the original r × c image space into an r × m feature space, where m < c. The new feature matrices Y_i ∈ ℝ^{r×m} are defined by the following linear transformation:

Y_i = (X_i - M) W ∈ ℝ^{r×m}    (10)

where i = 1, 2, ..., N and W ∈ ℝ^{c×m} is formed by the eigenvectors of T_t corresponding to its m largest eigenvalues.
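As an illustration of the 2DPCA construction above, here is a small NumPy sketch that builds the image covariance matrix from the centered image matrices and projects each image onto its top-m eigenvectors; the function name and array layout are our own assumptions.

```python
import numpy as np

def two_d_pca(images, m):
    """2DPCA sketch: project each r x c image onto the top-m eigenvectors of
    the image covariance matrix Tt = (1/N) sum_i (Xi - M)^T (Xi - M).

    images : (N, r, c) stack of image matrices
    m      : number of projection columns (m < c)
    Returns (W, Y) with W of shape (c, m) and Y of shape (N, r, m).
    """
    M = images.mean(axis=0)                                   # mean image
    centered = images - M
    Tt = np.einsum('nij,nik->jk', centered, centered) / len(images)  # (c, c)
    evals, evecs = np.linalg.eigh(Tt)                         # ascending order
    W = evecs[:, ::-1][:, :m]                                 # top-m eigenvectors
    Y = centered @ W                                          # Yi = (Xi - M) W
    return W, Y
```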

2.4 Generalized Low Rank Approximations of Matrices - GLRAM

In [7], Ye considers the problem of computing low rank approximations of matrices. By solving an optimization problem which aims to minimize the reconstruction (approximation) error, an iterative algorithm is derived, namely GLRAM, which stands for the Generalized Low Rank Approximations of Matrices. GLRAM reduces the reconstruction error sequentially, and the resulting approximation is thus improved during successive iterations. Formally, the following optimization problem is considered:

\min_{L,R,Y_i} \sum_{i=1}^{N} \| X_i - L Y_i R^T \|_F^2
\text{s.t. } L^T L = I_1, \; R^T R = I_2    (12)

where L ∈ ℝ^{r×l_1}, R ∈ ℝ^{c×l_2}, Y_i ∈ ℝ^{l_1×l_2} for i = 1..N, and I_1 ∈ ℝ^{l_1×l_1} and I_2 ∈ ℝ^{l_2×l_2} are identity matrices, with l_1 ≤ r and l_2 ≤ c. An iterative procedure for computing L and R is presented in Table 2.

Table 2: Algorithm – GLRAM
Step 0: Initialize L^{(0)} = [I_1, 0]^T and set k = 0.
Step 1: Compute the l_2 eigenvectors {\Phi_i^{R(k+1)}}_{i=1}^{l_2} of the matrix S_R = \sum_{i=1}^{N} X_i^T L^{(k)} L^{(k)T} X_i corresponding to the largest l_2 eigenvalues and form R^{(k+1)} = [\Phi_1^{R(k+1)} .. \Phi_{l_2}^{R(k+1)}].
Step 2: Compute the l_1 eigenvectors {\Phi_i^{L(k+1)}}_{i=1}^{l_1} of the matrix S_L = \sum_{i=1}^{N} X_i R^{(k+1)} R^{(k+1)T} X_i^T corresponding to the largest l_1 eigenvalues and form L^{(k+1)} = [\Phi_1^{L(k+1)} .. \Phi_{l_1}^{L(k+1)}].
Step 3: If L^{(k+1)} and R^{(k+1)} have not converged, increase k by 1 and go to Step 1; otherwise proceed to Step 4.
Step 4: Let L^* = L^{(k+1)}, R^* = R^{(k+1)} and compute Y_i = L^{*T} X_i R^* for i = 1..N.
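The iterative procedure of Table 2 can be sketched in NumPy as follows; this is an illustrative reading of the algorithm, with our own function name and a fixed number of sweeps standing in for the convergence test.

```python
import numpy as np

def glram(images, l1, l2, n_iter=10):
    """GLRAM sketch (Table 2): find L (r x l1), R (c x l2) and Yi = L^T Xi R
    that reduce sum_i ||Xi - L Yi R^T||_F^2 via alternating eigenproblems.

    images : (N, r, c) stack of image matrices.
    """
    N, r, c = images.shape
    L = np.eye(r, l1)                                # Step 0: L = [I, 0]^T
    for _ in range(n_iter):
        # Step 1: top-l2 eigenvectors of SR = sum_i Xi^T L L^T Xi
        P = images.transpose(0, 2, 1) @ L            # (N, c, l1) = Xi^T L
        SR = np.einsum('nij,nkj->ik', P, P)          # (c, c)
        R = np.linalg.eigh(SR)[1][:, ::-1][:, :l2]
        # Step 2: top-l1 eigenvectors of SL = sum_i Xi R R^T Xi^T
        Q = images @ R                               # (N, r, l2) = Xi R
        SL = np.einsum('nij,nkj->ik', Q, Q)          # (r, r)
        L = np.linalg.eigh(SL)[1][:, ::-1][:, :l1]
    Y = L.T @ images @ R                             # (N, l1, l2) reduced matrices
    return L, R, Y
```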

2.5 Two-dimensional LDA - 2DLDA

In 2DLDA [9], class separability is measured directly on the matrix samples through the class mean matrices M_j and the overall mean matrix M. The within-class and between-class distances are

D_w = \sum_{j=1}^{C}\sum_{X_i \in \Pi_j} \| X_i - M_j \|_F^2 = tr\left( \sum_{j=1}^{C}\sum_{X_i \in \Pi_j} (X_i - M_j)(X_i - M_j)^T \right)    (13)

D_b = \sum_{j=1}^{C} N_j \| M_j - M \|_F^2 = tr\left( \sum_{j=1}^{C} N_j (M_j - M)(M_j - M)^T \right)    (14)

In the low-dimensional space resulting from the linear transformations L and R, the within- and between-class distances \tilde{D}_w and \tilde{D}_b can be computed as follows:

\tilde{D}_w = tr\left( \sum_{j=1}^{C}\sum_{X_i \in \Pi_j} L^T (X_i - M_j) R R^T (X_i - M_j)^T L \right)    (15)

\tilde{D}_b = tr\left( \sum_{j=1}^{C} N_j L^T (M_j - M) R R^T (M_j - M)^T L \right)    (16)

The optimal transformations L and R would maximize F(L, R) = \tilde{D}_b / \tilde{D}_w. Let us define

S_w^R = \sum_{j=1}^{C}\sum_{X_i \in \Pi_j} (X_i - M_j) R R^T (X_i - M_j)^T    (17)

S_b^R = \sum_{j=1}^{C} N_j (M_j - M) R R^T (M_j - M)^T    (18)

S_w^L = \sum_{j=1}^{C}\sum_{X_i \in \Pi_j} (X_i - M_j)^T L L^T (X_i - M_j)    (19)

S_b^L = \sum_{j=1}^{C} N_j (M_j - M)^T L L^T (M_j - M)    (20)

After defining those matrices, we can derive the 2DLDA algorithm as in Table 3.

Table 3: Algorithm – 2DLDA
Step 0: Initialize R = R^{(0)} = [I_2, 0]^T and set k = 0.
Step 1: Compute
S_w^{R(k)} = \sum_{j=1}^{C}\sum_{X_i \in \Pi_j} (X_i - M_j) R^{(k)} R^{(k)T} (X_i - M_j)^T
S_b^{R(k)} = \sum_{j=1}^{C} N_j (M_j - M) R^{(k)} R^{(k)T} (M_j - M)^T
Step 2: Compute the l_1 eigenvectors {\Phi_i^{L(k)}}_{i=1}^{l_1} of the matrix (S_w^{R(k)})^{-1} S_b^{R(k)} and form L^{(k)} = [\Phi_1^{L(k)} .. \Phi_{l_1}^{L(k)}].
Step 3: Compute
S_w^{L(k)} = \sum_{j=1}^{C}\sum_{X_i \in \Pi_j} (X_i - M_j)^T L^{(k)} L^{(k)T} (X_i - M_j)
S_b^{L(k)} = \sum_{j=1}^{C} N_j (M_j - M)^T L^{(k)} L^{(k)T} (M_j - M)
Step 4: Compute the l_2 eigenvectors {\Phi_i^{R(k)}}_{i=1}^{l_2} of the matrix (S_w^{L(k)})^{-1} S_b^{L(k)} and form R^{(k+1)} = [\Phi_1^{R(k)} .. \Phi_{l_2}^{R(k)}].
Step 5: If L^{(k)} and R^{(k+1)} have not converged, increase k by 1 and go to Step 1; otherwise proceed to Step 6.
Step 6: Let L^* = L^{(k)}, R^* = R^{(k+1)} and compute Y_i = L^{*T} X_i R^* for i = 1..N.
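Similarly, a hedged NumPy sketch of the 2DLDA iteration in Table 3; it assumes the intermediate matrices S_w^R and S_w^L are nonsingular, and the function name and the fixed sweep count are our own choices.

```python
import numpy as np

def two_d_lda(images, labels, l1, l2, n_iter=5):
    """2DLDA sketch (Table 3): alternately update L (r x l1) and R (c x l2)
    from the leading eigenvectors of (Sw_R)^-1 Sb_R and (Sw_L)^-1 Sb_L.

    images : (N, r, c) array, labels : (N,) array of class labels.
    """
    N, r, c = images.shape
    classes = np.unique(labels)
    M = images.mean(axis=0)
    means = {k: images[labels == k].mean(axis=0) for k in classes}
    counts = {k: int((labels == k).sum()) for k in classes}
    R = np.eye(c, l2)                                 # Step 0: R = [I, 0]^T

    def leading(Sw, Sb, num):
        # eigenvectors of Sw^{-1} Sb with the largest eigenvalues
        vals, vecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
        order = np.argsort(vals.real)[::-1]
        return vecs[:, order[:num]].real

    for _ in range(n_iter):
        SwR = np.zeros((r, r)); SbR = np.zeros((r, r))
        for k in classes:
            D = (images[labels == k] - means[k]) @ R          # (Nk, r, l2)
            SwR += np.einsum('nij,nkj->ik', D, D)             # sum (Xi-Mj) R R^T (Xi-Mj)^T
            B = (means[k] - M) @ R                            # (r, l2)
            SbR += counts[k] * B @ B.T
        L = leading(SwR, SbR, l1)                             # Step 2
        SwL = np.zeros((c, c)); SbL = np.zeros((c, c))
        for k in classes:
            D = np.swapaxes(images[labels == k] - means[k], 1, 2) @ L   # (Nk, c, l1)
            SwL += np.einsum('nij,nkj->ik', D, D)             # sum (Xi-Mj)^T L L^T (Xi-Mj)
            B = (means[k] - M).T @ L                          # (c, l1)
            SbL += counts[k] * B @ B.T
        R = leading(SwL, SbL, l2)                             # Step 4
    Y = L.T @ images @ R                                      # Step 6: Yi = L^T Xi R
    return L, R, Y
```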

3. WHITENED LDA

In this section, we first review data whitening and then derive WLDA in detail.

3.1 Data Whitening

Let x ∈ ℝ^n denote a random vector with mean µ and positive semi-definite covariance matrix C_x. We wish to whiten the vector x, i.e., to transform it so that it has zero mean and identity covariance. Let

C_x = U \Lambda U^T

where U = [u_1, ..., u_r] contains the orthonormal eigenvectors of C_x corresponding to its non-zero eigenvalues and \Lambda = diag(\sigma_1, ..., \sigma_r) with \sigma_1 \ge ... \ge \sigma_r > 0 is the diagonal matrix of non-zero eigenvalues. Then whitening a random vector x with mean µ and covariance matrix C_x can be obtained by performing the following calculation with the whitening transformation matrix P = \Lambda^{-1/2} U^T:

y = P(x - \mu) = \Lambda^{-1/2} U^T (x - \mu)    (23)

Thus, the output of this transformation has expectation

E\{y\} = \Lambda^{-1/2} U^T (E\{x\} - \mu) = \Lambda^{-1/2} U^T (\mu - \mu) = 0    (24)

Due to E\{y\} = 0, the covariance matrix C_y can be written as

C_y = E\{y y^T\} = E\{\Lambda^{-1/2} U^T (x - \mu)(x - \mu)^T U \Lambda^{-1/2}\} = \Lambda^{-1/2} U^T C_x U \Lambda^{-1/2} = I    (25)

Thus, with the above transformation, we can whiten the random vector to have zero mean and the identity covariance matrix.
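A minimal sketch of this whitening transformation, assuming the covariance is estimated from N samples stored as rows and that eigenvalues below a small tolerance are treated as zero; all names are illustrative.

```python
import numpy as np

def whiten(X, tol=1e-10):
    """Whiten samples stored as rows of X (shape (N, n)).

    Returns the whitening matrix P = Lambda^{-1/2} U^T built from the
    non-zero eigenpairs of the sample covariance, the mean mu, and the
    whitened data Y with rows y_i = P (x_i - mu).  Y has (approximately)
    zero mean and identity covariance.
    """
    mu = X.mean(axis=0)
    Xc = X - mu
    C = Xc.T @ Xc / X.shape[0]               # covariance estimate C_x
    evals, U = np.linalg.eigh(C)             # eigenvalues in ascending order
    keep = evals > tol * evals.max()         # treat tiny eigenvalues as zero
    U, evals = U[:, keep], evals[keep]
    P = (U / np.sqrt(evals)).T               # P = Lambda^{-1/2} U^T, shape (r, n)
    Y = Xc @ P.T                             # whitened samples, shape (N, r)
    return P, mu, Y
```

For raw face images, where n is far larger than N, the same non-zero eigenpairs would in practice be obtained from the N × N Gram matrix (1/N) Xc Xc^T rather than the n × n covariance matrix.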

3.2 Whitened LDA

The key idea of WLDA is that we apply data whitening before performing LDA. So we first perform data whitening by diagonalizing the total covariance matrix S_t. With r = rank(S_t), let

S_t = U \Lambda U^T

where U = [u_1, ..., u_r] contains the orthonormal eigenvectors of S_t corresponding to its non-zero eigenvalues and \Lambda = diag(\lambda_1, ..., \lambda_r) with \lambda_1 \ge ... \ge \lambda_r > 0 is the diagonal matrix of non-zero eigenvalues. Then we can form the whitening transformation matrix P = \Lambda^{-1/2} U^T and obtain the whitened data as

y_i = P(x_i - \mu) = \Lambda^{-1/2} U^T (x_i - \mu) \in ℝ^r    (27)

where i = 1..N. Then the between-class scatter matrix G_b and the within-class scatter matrix G_w after data whitening are re-defined as

G_b = \frac{1}{N} \sum_{i=1}^{C} N_i \eta_i \eta_i^T    (28)

G_w = \frac{1}{N} \sum_{i=1}^{C} \sum_{y_k \in \Pi_i} (y_k - \eta_i)(y_k - \eta_i)^T    (29)

where \eta_i is the mean of the whitened data in class i, i = 1..C, defined as

\eta_i = \frac{1}{N_i} \sum_{k=1}^{N_i} y_k = \frac{1}{N_i} \sum_{k=1}^{N_i} P(x_k - \mu) = P\left( \frac{1}{N_i} \sum_{k=1}^{N_i} x_k - \mu \right) = P(\mu_i - \mu)    (30)

Theorem 1. Given the whitened data and the between-class scatter matrix G_b and within-class scatter matrix G_w defined as in (28) and (29), we have

G_b + G_w = I_r

where I_r ∈ ℝ^{r×r} is an identity matrix.

Proof. We can re-write the between-class scatter matrix G_b as

G_b = \frac{1}{N} \sum_{i=1}^{C} N_i \eta_i \eta_i^T = \frac{1}{N} \sum_{i=1}^{C} N_i P(\mu_i - \mu)(\mu_i - \mu)^T P^T = P S_b P^T    (31)

and the within-class scatter matrix G_w as

G_w = \frac{1}{N} \sum_{i=1}^{C} \sum_{y_k \in \Pi_i} \left( P(x_k - \mu) - P(\mu_i - \mu) \right)\left( P(x_k - \mu) - P(\mu_i - \mu) \right)^T = \frac{1}{N} \sum_{i=1}^{C} \sum_{y_k \in \Pi_i} P(x_k - \mu_i)(x_k - \mu_i)^T P^T = P S_w P^T    (32)

From (31) and (32), we have

G_b + G_w = P S_b P^T + P S_w P^T = P(S_b + S_w)P^T = P S_t P^T = I_r    (33)

Proof is done.

Now we can formulate the Whitened LDA problem as the following optimization problem:

J_{WLDA}(v) = \max_{v} \frac{v^T G_b v}{v^T G_w v}    (34)

where v ∈ ℝ^r. The following derivation shows that (34) can be reduced to a simple eigenvalue problem.

Proof. Applying the fact that G_b + G_w = I_r from Theorem 1, we can re-write the optimization problem in (34) as

\max_{v} \frac{v^T G_b v}{v^T G_w v} = \max_{v} \frac{v^T G_b v}{v^T (I_r - G_b) v} = \max_{v} \frac{v^T G_b v}{v^T v - v^T G_b v}    (36)

We can see that if v = v_0 is an optimal vector of (36), then c v_0 is also an optimal solution of (36) for any non-zero scalar c. So we constrain v to unit length, i.e. v^T v = 1. Now, the problem in (36) can be written with constraints as follows:

\max_{v^T v = 1} \frac{v^T G_b v}{v^T v - v^T G_b v}    (37)

It is easy to see from (37) that the solution of the optimization problem in (34) can be obtained by solving the following simple eigenvalue problem:

\max_{v^T v = 1} v^T G_b v    (38)

Proof is done.

It is easy to see that the optimal projection matrix V = [v_1 v_2 ... v_m] for (38) is the set of r-dimensional eigenvectors of G_b corresponding to the m largest eigenvalues. Finally, the optimal projection matrix for WLDA can be calculated as W_{WLDA} = P^T V. One should note that in WLDA we find the optimal projection vectors w_{opt} among all vectors w of the form

w = P^T v, \quad \forall v \in ℝ^r

i.e., in the range space of S_t. The following argument shows that all discriminant projection vectors can indeed be found in the range space of S_t.

Proof. Let V be the range space of S_t and V^{\perp} be the null space of S_t. Equivalently,

V = span\{\alpha_k \,|\, S_t \alpha_k \ne 0, k = 1, .., r\}    (41)
V^{\perp} = span\{\alpha_k \,|\, S_t \alpha_k = 0, k = r + 1, .., n\}    (42)

where r is the rank of S_t, \{\alpha_1, .., \alpha_n\} is an orthonormal set, and \{\alpha_1, .., \alpha_r\} is the set of orthonormal eigenvectors corresponding to the non-zero eigenvalues of S_t. Since ℝ^n = V \oplus V^{\perp}, every vector a ∈ ℝ^n has a unique decomposition of the form a = b + c, where b ∈ V and c ∈ V^{\perp}. The projected value corresponding to x_k on the projection vector a can be formed as

y_k^{(a)} = a^T (x_k - \mu) = b^T (x_k - \mu) + c^T (x_k - \mu) = b^T (x_k - \mu)    (43)

since c lies in the null space of S_t and therefore c^T (x_k - \mu) = 0 for all k. From (43) we see that we can find all discriminant projection vectors in the range space of S_t. Proof is done.
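Putting the derivation together, the following is a hedged end-to-end sketch of WLDA: whiten with P built from the non-zero eigenpairs of S_t, form G_b on the whitened data, take its top-m eigenvectors V, and return W_WLDA = P^T V. The function name and the tolerance used to decide the rank of S_t are our own assumptions.

```python
import numpy as np

def wlda(X, labels, m, tol=1e-10):
    """Whitened LDA sketch: W_WLDA = P^T V, with P whitening the total
    scatter St and V the top-m eigenvectors of the whitened between-class
    scatter Gb.

    X : (N, n) vectorized images, labels : (N,) class labels, m <= C - 1.
    Returns (W, mu); a new image x is projected as (x - mu) @ W.
    """
    N = X.shape[0]
    mu = X.mean(axis=0)
    Xc = X - mu
    St = Xc.T @ Xc / N                          # total scatter matrix
    evals, U = np.linalg.eigh(St)
    keep = evals > tol * evals.max()            # range space of St (rank r)
    P = (U[:, keep] / np.sqrt(evals[keep])).T   # whitening matrix, shape (r, n)
    Y = Xc @ P.T                                # whitened data, shape (N, r)
    r = Y.shape[1]
    Gb = np.zeros((r, r))                       # between-class scatter of Y
    for c in np.unique(labels):
        eta = Y[labels == c].mean(axis=0)       # class mean of whitened data
        Gb += (labels == c).sum() * np.outer(eta, eta)
    Gb /= N
    gvals, gvecs = np.linalg.eigh(Gb)
    V = gvecs[:, ::-1][:, :m]                   # top-m eigenvectors of Gb
    return P.T @ V, mu                          # W_WLDA has shape (n, m)
```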

4. EXPERIMENTAL RESULTS

This section evaluates the performance of PCA [5], Fisherface [1], Direct-LDA [11], NLDA [4], 2DPCA [6], GLRAM [7], 2DLDA [9] and our new approach WLDA on the Yale face database and the ORL face database. In this paper, we apply the nearest-neighbor classifier for its simplicity.

4.1 Yale Face Database

The Yale face database contains images of 15 individuals, with 11 images per subject taken under different facial expressions and configurations: center-light, w/glasses, happy, left-light, w/no glasses, normal, right-light, sad, sleepy, surprised, and wink.

Figure 2: Ten sample images from Yale face database.

A random subset with k (k = 2, 3, 4, 5) images per individual was taken with labels to form the training set. The rest of the database was considered to be the testing set. Ten rounds of random selection of training examples were performed and the average recognition result was recorded. The training samples were used to learn the subspace; the testing samples were then projected into the low-dimensional representation subspace, and recognition was performed using a nearest-neighbor classifier. We tested the recognition rates with different numbers of training samples. The best results obtained by PCA [5], Fisherface [1], Direct-LDA [11], NLDA [4], 2DPCA [6], GLRAM [7], 2DLDA [9] and our new approach WLDA are shown in Table 4.

Table 4: Comparison of the top recognition accuracy (%) on the Yale database.

k            2       3       4       5
PCA          75.556  83.333  84.762  87.778
Fisherfaces  84.444  86.667  93.333  93.333
DLDA         80      83.333  91.429  91.111
NLDA         85.185  87.5    94.286  93.333
2DPCA        76.296  83.333  88.571  88.889
GLRAM        74.2    84.35   84.762  88.1
2DLDA        82.1    84.3    89.2    90
WLDA         86.481  89.167  94.524  96.111

4.2 ORL Face Database

In the ORL database, there are ten different images of each of 40 distinct subjects. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open / closed eyes, smiling / not smiling) and facial details (glasses / no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement).

Figure 1: Twenty sample images from ORL face database.

A random subset with k (k = 2, 3, 4, 5) images per individual was taken with labels to form the training set. The rest of the database was considered to be the testing set. Ten rounds of random selection of training examples were performed and the average recognition result was recorded. The experimental protocol is the same as before. The best recognition results of each method are shown in Table 5.

Table 5: Comparison of the top recognition accuracy (%) on the ORL database.

k            2       3       4       5
PCA          81.875  87.5    90      90.5
Fisherfaces  81.563  85.714  86.667  82.5
DLDA         84.375  87.857  90.833  92.5
NLDA         84.063  87.857  91.25   91.5
2DPCA        84.063  86.071  89.167  91.5
GLRAM        81.875  86.75   89.45   90
2DLDA        82.175  84.234  90      90.75
WLDA         88.875  89.857  92.417  94

Next, in order to test the performance of these LDA-based algorithms versus the reduced dimension, we vary the reduced dimension m from 1 to 39 and run these approaches on the ORL database. Figs. 3, 4, 5 and 6 show the recognition accuracy versus the reduced dimension for the LDA-based algorithms and our algorithm, corresponding to the number of training samples k = 2, 3, 4, 5.

Figure 3: Comparison of the recognition accuracy (%) versus reduced dimension m (1 to 39) on ORL database with training sample k = 2.
Figure 4: Comparison of the recognition accuracy (%) versus reduced dimension m (1 to 39) on ORL database with training sample k = 3.
Figure 5: Comparison of the recognition accuracy (%) versus reduced dimension m (1 to 39) on ORL database with training sample k = 4.
Figure 6: Comparison of the recognition accuracy (%) versus reduced dimension m (1 to 39) on ORL database with training sample k = 5.
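A sketch of the evaluation protocol described above (k randomly chosen training images per subject, several random trials, nearest-neighbor classification in the learned subspace); the helper names and the fit_projection interface are hypothetical and only illustrate the stated protocol.

```python
import numpy as np

def nn_accuracy(train_feats, train_labels, test_feats, test_labels):
    """1-nearest-neighbour accuracy with Euclidean distance (labels as arrays)."""
    d = ((test_feats[:, None, :] - train_feats[None, :, :]) ** 2).sum(axis=2)
    pred = train_labels[d.argmin(axis=1)]
    return float((pred == test_labels).mean())

def random_split(labels, k, rng):
    """Pick k random training indices per class; the rest are test indices."""
    train = []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        train.extend(idx[:k])
    train = np.array(train)
    test = np.setdiff1d(np.arange(len(labels)), train)
    return train, test

def evaluate(X, labels, fit_projection, k, n_trials=10, seed=0):
    """Average accuracy over n_trials random splits with k training images
    per subject.  fit_projection(X_train, y_train) is assumed to return an
    (n, m) projection matrix W; features are computed as samples @ W."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_trials):
        tr, te = random_split(labels, k, rng)
        W = fit_projection(X[tr], labels[tr])
        accs.append(nn_accuracy(X[tr] @ W, labels[tr], X[te] @ W, labels[te]))
    return float(np.mean(accs))
```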

5. CONCLUSIONS

In this paper, we propose a new LDA algorithm that overcomes some disadvantages of previous work. The key idea of WLDA is that we apply data whitening before performing LDA. Using some nice properties of the whitened data, we show how to turn the generalized eigenvalue problem of LDA into a simple eigenvalue problem, leading to the low computational cost of the algorithm. While previous methods lose some discriminant information along the way, WLDA is able to keep all discriminant information. Experimental results show the effectiveness of our algorithm.

6. REFERENCES

[1] P. Belhumeur, J. Hespanha, and D. Kriegman. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):711–720, July 1997.
[2] L. Chen, H. Liao, M. Ko, J. Lin, and G. Yu. A new LDA-based face recognition system which can solve the small sample size problem. Pattern Recognition, 33(10):1713–1726, October 2000.
[3] K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press Professional, Inc., San Diego, CA, USA, 1990.
[4] R. Huang, Q. Liu, H. Lu, and S. Ma. Solving the small sample size problem of LDA. In Proceedings of the 16th International Conference on Pattern Recognition, volume 3, pages 29–32, 2002.
[5] M. Turk and A. Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71–86, 1991.
[6] J. Yang, D. Zhang, A. F. Frangi, and J.-Y. Yang. Two-dimensional PCA: A new approach to appearance-based face representation and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(1):131–137, January 2004.
[7] J. Ye. Generalized low rank approximations of matrices. In Proceedings of the Twenty-First International Conference on Machine Learning, page 112, 2004.
[8] J. Ye. Characterization of a family of algorithms for generalized discriminant analysis on undersampled problems. Journal of Machine Learning Research, 6:483–502, 2005.
[9] J. Ye, R. Janardan, and Q. Li. Two-dimensional linear discriminant analysis. In Advances in Neural Information Processing Systems, 2004.
[10] J. Ye and T. Xiong. Null space versus orthogonal linear discriminant analysis. In Proceedings of the 23rd International Conference on Machine Learning, pages 1073–1080, 2006.
[11] H. Yu and J. Yang. A direct LDA algorithm for high-dimensional data with application to face recognition. Pattern Recognition, 34:2067–2070, 2001.
[12] W. Zhao, R. Chellappa, and A. Krishnaswamy. Discriminant analysis of principal components for face recognition. In Proceedings of the 3rd International Conference on Face and Gesture Recognition, page 336, 1998.